4 min read

Seasonal Bump Functions

Motivated by the nice talk on Winning with Simple, even Linear, Models by Vincent D. Warmerdam, I briefly describe how to construct certain class of bump functions to encode seasonal variables in R.

Prepare Notebook

library(glue)
library(lubridate)
library(magrittr)
library(tidyverse)

Generate Data

Let us generate a time sequence variable stored in a tibble.

# Define time sequence. 
t <- seq.Date(from = as.Date("2017-07-01"), to = as.Date("2019-04-01"), by = "day")
# Store it in a tibble. 
df <- tibble(t = t)

Symmetric Bump Function

We want to generate a gaussian-like curve around a specific date. The width of the bump function is parametrized by a non-negative number \(\varepsilon \geq 0\). For \(\varepsilon = 0\) we generate a dummy variable. Namely, for \(\varepsilon \neq 0\),

\[ f_{x_0, \varepsilon}(x) = \exp\left(- \left(\frac{x - x_0}{\varepsilon}\right)^2 \right) \]

In R,

bump_func <- function(x, x_0 = 0, epsilon = 1) {
  # If `epsilon` = 0, we encode it as a dummy variable.
  if (epsilon == 0) {
    as.numeric(x == x_0)
  } else {
    delta <-  as.numeric(x - x_0)
    exp(- (delta / epsilon)^2)
  }
}

Examples

Let us asumme we want to model the effect of Christmas 2018-12-24.

  • \(\varepsilon = 0\) will generate a dummy variable.
epsilon <- 0

df %>% 
  mutate(y = bump_func(x = t, 
                       x_0 = as.Date("2018-12-24"), 
                       epsilon = epsilon)) %>% 
  ggplot() +
  geom_line(mapping = aes(x = t, y = y)) +
  xlab("Date") +
  ylab("") +
  ggtitle(label = glue("Bump Seasonal Variable - epsilon = {epsilon}"))

  • \(\varepsilon = 8\) will generate a bump function around 2018-12-24.
epsilon <- 8

df %>% 
  mutate(y = bump_func(x = t, 
                       x_0 = as.Date("2018-12-24"),
                       epsilon = 8)) %>% 
  ggplot() +
  geom_line(mapping = aes(x = t, y = y)) +
  xlab("Date") +
  ylab("") +
  ggtitle(label = glue("Bump Seasonal Variable - epsilon = {epsilon}"))

Let us see how to do it for repeated dates and events:

epsilon <- 8

christmas_dates <- c(as.Date("2017-12-24"), as.Date("2018-12-24"))
halloween_dates <- c(as.Date("2017-10-31"), as.Date("2018-10-31"))

df %>% 
  mutate(
    y_christmas = map(
      .x = christmas_dates, 
      .f = ~ bump_func(x = t, x_0 = .x, epsilon = epsilon)) %>% 
      reduce(.f = ~ .x + .y),
    
    y_halloween = map(
      .x = halloween_dates, 
      .f = ~ bump_func(x = t, x_0 = .x, epsilon = epsilon)) %>% 
      reduce(.f = ~ .x + .y)) %>% 
  gather(key = Event, value = value, ... = y_christmas, y_halloween) %>% 
  # Plot variables.
  ggplot() +
  geom_line(mapping = aes(x = t, y = value, color = Event)) +
  xlab("Date") +
  ylab("") +
  ggtitle(label = glue("Bump Seasonal Variable - epsilon = {epsilon}"))

Asymmetric Bump Function

Sometimes the effect before and after the seasonal variable is note the same. To model this we can tweak the bump function defined above so that:

  • It is not symmetric.
  • It can have a drop after the seasonal variable (an potentially have different effect sizes).

A natural candidate to fulfill these requirements is

\[ a_{-} f_{x_0, \varepsilon_{-}}(x)I_{\{x \leq x_0\}} \pm a_{+} f_{x_0, \varepsilon_{+}}(x) I_{\{x > x_0\}} \] where:

  • \(I_{\{x \leq x_0\}}\) and \(I_{\{x > x_0\}}\) denote the corresponing indicator functions.

  • \(f_{x_0, \varepsilon_{\pm}}\) denotes the bump function defined above.

  • \(\varepsilon_{-}, \varepsilon_{+} >0\) denote the width (variance) before and after \(x_0\) respectively.

  • \(a_{-}, a_{+} >0\) denote the amplitudes before and after \(x_0\) respectively.

We now write this function in R.

asymmestric_bump_func <- function(
  # Input vector.
  x,
  # Maximum of bump function.
  x_0, 
  # Width of bump function before `x_0`.
  epsilon_minus = 1, 
  # Width of bump function after `x_0`.
  epsilon_plus = 1, 
  # Amplitude of bump function before `x_0`.
  a_minus = 1, 
  # Amplitude of bump function after `x_0`.
  a_plus = 1, 
  # Wether to have a drom after `x_0`.
  drop = FALSE
  ) {
  
  delta <- as.numeric(x - x_0)
  # Define indicator functions.
  indicator_minus <- as.numeric(delta < 0)
  indicator_plus <- as.numeric(delta >= 0)
  
  # Calculate bump function components. 
  y_minus <- a_minus * indicator_minus * bump_func(x = x, x_0 = x_0, epsilon = epsilon_minus) 
  y_plus <- a_plus*indicator_plus * bump_func(x = x, x_0 = x_0, epsilon = epsilon_plus)
  # Calculate sign depending on the drop choice. 
  drop_sign <- 2*as.numeric(drop) - 1
  
  y <- y_minus  - drop_sign*y_plus
  
  return(y)
}

Examples

Let us ilustrate the results for Christmas 2017-12-24.

Without Drop

df  %>% 
  mutate(y =  asymmestric_bump_func(x = t,
                                    x_0 = as.Date("2017-12-24"),
                                    epsilon_minus = 20,
                                    epsilon_plus = 4, 
                                    drop = FALSE)) %>% 
  # Filter for visualization (zoom in). 
  filter(t < "2018-02-01") %>% 
  ggplot() +
  geom_line(mapping = aes(x = t, y = y)) +
  xlab("Date") +
  ylab("") +
  ggtitle(label = glue("Asymmetric Bump Function"))

With Drop

df  %>% 
  mutate(y =  asymmestric_bump_func(x = t,
                                    x_0 = as.Date("2017-12-24"),
                                    epsilon_minus = 15,
                                    epsilon_plus = 4, 
                                    a_minus = 1, 
                                    a_plus = 0.3,
                                    drop = TRUE)) %>% 
  # Filter for visualization (zoom in). 
  filter(t < "2018-02-01") %>% 
  ggplot() +
  geom_line(mapping = aes(x = t, y = y)) +
  xlab("Date") +
  ylab("") +
  ggtitle(label = glue("Asymmetric Bumb Function with Drop"))