Probabilistic Time Series Forecasting

Opportunities and Applications

Outline

  1. Introduction
  2. Scan in NumPyro
  3. Exponential Smoothing
  4. ARIMA
  5. Hierarchical Models
  6. Intermittent Demand
  7. Censored Demand
  8. Price Elasticities
  9. Prior Model Calibration
  10. References

Why Probabilistic Forecasting?

NumPyro is a lightweight probabilistic programming library that provides a NumPy backend for Pyro. It relies on JAX for automatic differentiation and JIT compilation to GPU / CPU.

  • Interpretability
    • Trust in the results
  • Uncertainty quantification
    • Risk Assessment
    • Decision Making
  • Customization
    • Feature engineering
    • Special constraints
    • Calibrate with domain knowledge
  • Scale
    • Good results in production environments
    • Take advantage of GPUs

Pyro Forecasting Module 🔥

NumPyro - SGT Example Model 🫠

😅 …

Scan ⭐

An efficient, JIT-compatible implementation of for loops (see jax.lax.scan and numpyro.contrib.control_flow.scan)

import numpy as np


def scan(f, init, xs):
  """Pure Python reference implementation of scan.

  Parameters
  ----------
  f : A Python function to be scanned.
  init : An initial loop carry value.
  xs : The values over which to scan along the leading axis.
  """
  carry = init
  ys = []
  for x in xs:
    carry, y = f(carry, x)
    ys.append(y)
  return carry, np.stack(ys)

We need recursive relationships! \(y_t \longmapsto y_{t+1}\)
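
As a quick illustration (the numbers below are made up, not from the talk), the pure-Python scan above can unroll exactly this kind of recursion:

    # Exponential-smoothing recursion l_t = alpha * y_t + (1 - alpha) * l_{t-1},
    # run with the reference scan implementation above.
    alpha = 0.3

    def smooth_step(carry, x):
        level = alpha * x + (1 - alpha) * carry
        return level, level  # (new carry, emitted value)

    y = np.array([10.0, 12.0, 11.0, 13.0])
    last_level, levels = scan(smooth_step, y[0], y)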

Example: Exponential Smoothing

Example: Exponential Smoothing

\[ \begin{align*} \hat{y}_{t+h|t} = & \: l_t \\ l_t = & \: \alpha y_t + (1 - \alpha)l_{t-1} \end{align*} \]

  • \(y_t\) is the observed value at time \(t\).
  • \(\hat{y}_{t+h|t}\) is the forecast of the value at time \(t+h\) given the information up to time \(t\).
  • \(l_t\) is the level at time \(t\).
  • \(\alpha\) is the smoothing parameter. It is a value between 0 and 1.

Example: Exponential Smoothing

\[ \begin{align*} \hat{y}_{t+h|t} = & \: l_t \\ l_t = & \: \alpha y_t + (1 - \alpha)l_{t-1} \end{align*} \]

    def transition_fn(carry, t):
        previous_level = carry

        # Update the level with the observation for in-sample steps (t < t_max);
        # past the observed data the level is simply carried forward.
        level = jnp.where(
            t < t_max,
            level_smoothing * y[t] + (1 - level_smoothing) * previous_level,
            previous_level,
        )

        # One-step-ahead prediction centered at the previous level.
        mu = previous_level
        pred = numpyro.sample("pred", dist.Normal(loc=mu, scale=noise))

        return level, pred

Example: Exponential Smoothing

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from jax.typing import ArrayLike
from numpyro.contrib.control_flow import scan


def level_model(y: ArrayLike, future: int = 0) -> None:
    # Get time series length
    t_max = y.shape[0]

    # --- Priors ---
    ## Level
    level_smoothing = numpyro.sample(
        "level_smoothing", dist.Beta(concentration1=1, concentration0=1)
    )
    level_init = numpyro.sample("level_init", dist.Normal(loc=0, scale=1))

    ## Noise
    noise = numpyro.sample("noise", dist.HalfNormal(scale=1))

    # --- Transition Function ---
    def transition_fn(carry, t):
        ...  # transition function as defined above

    # --- Run Scan ---
    with numpyro.handlers.condition(data={"pred": y}):
        _, preds = scan(
            transition_fn,
            level_init,
            jnp.arange(t_max + future),
        )

    # --- Forecast ---
    if future > 0:
        # Return the forecast so the model can be reused as a component
        # (e.g. in the Croston model later on).
        return numpyro.deterministic("y_forecast", preds[-future:])
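
A minimal sketch of fitting and forecasting this model (the data `y_train` and horizon `h` below are made-up assumptions; the NumPyro inference API calls are standard):

    import numpy as np
    import jax.numpy as jnp
    from jax import random
    from numpyro.infer import MCMC, NUTS, Predictive

    # Hypothetical training data and forecast horizon.
    y_train = jnp.asarray(np.random.default_rng(42).normal(loc=10.0, scale=1.0, size=80))
    h = 12

    # Inference with NUTS.
    mcmc = MCMC(NUTS(level_model), num_warmup=1_000, num_samples=1_000)
    mcmc.run(random.PRNGKey(0), y=y_train)

    # Posterior predictive forecast h steps ahead.
    predictive = Predictive(level_model, posterior_samples=mcmc.get_samples())
    forecast = predictive(random.PRNGKey(1), y=y_train, future=h)["y_forecast"]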

Example: Exponential Smoothing

Posterior Distribution Parameters

Example: Exponential Smoothing

Example: Exponential Smoothing

Trend + Seasonal + Damped

Example: ARMA(1, 1) Model

    def transition_fn(carry, t):
        y_prev, error_prev = carry
        ar_part = phi * y_prev        # AR(1) term
        ma_part = theta * error_prev  # MA(1) term
        pred = mu + ar_part + ma_part
        error = y[t] - pred           # one-step-ahead error
        return (y[t], error), error
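
One way this transition function can be wrapped into a full model (the priors and the Gaussian likelihood on the one-step-ahead errors are assumptions for illustration, not the talk's exact code):

    import jax.numpy as jnp
    import numpyro
    import numpyro.distributions as dist
    from jax.typing import ArrayLike
    from numpyro.contrib.control_flow import scan

    def arma_1_1_model(y: ArrayLike) -> None:
        t_max = y.shape[0]

        # --- Priors (assumed) ---
        mu = numpyro.sample("mu", dist.Normal(loc=0, scale=1))
        phi = numpyro.sample("phi", dist.Uniform(low=-1, high=1))
        theta = numpyro.sample("theta", dist.Uniform(low=-1, high=1))
        sigma = numpyro.sample("sigma", dist.HalfNormal(scale=1))

        def transition_fn(carry, t):
            y_prev, error_prev = carry
            pred = mu + phi * y_prev + theta * error_prev
            error = y[t] - pred
            return (y[t], error), error

        # Collect the one-step-ahead errors and give them a Gaussian likelihood.
        _, errors = scan(transition_fn, (y[0], jnp.zeros(())), jnp.arange(1, t_max))
        numpyro.sample("likelihood", dist.Normal(loc=0, scale=sigma), obs=errors)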

Hierarchical Exponential Smoothing

Hierarchical Exponential Smoothing

Hierarchical Exponential Smoothing

Baseline Model

Local Level Model + Seasonality + Covariates

  • Local level model to capture the trend component.
  • Seasonality using Fourier modes (see the sketch below).
  • Covariates to account for promotions and discounts.
  • Global factor to account for the availability of the product.

Scalability: \(\sim 40\)K time series can be fitted in less than \(10\) minutes on a GPU.
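
A sketch of the seasonal component via Fourier modes (the period, number of modes, and the way coefficients enter the model are illustrative assumptions):

    import jax.numpy as jnp

    def fourier_features(t, period, n_modes):
        # Sine/cosine basis evaluated at the time index t.
        omega = 2 * jnp.pi * jnp.arange(1, n_modes + 1) / period
        angles = omega[None, :] * t[:, None]
        return jnp.concatenate([jnp.cos(angles), jnp.sin(angles)], axis=-1)

    # Yearly seasonality on weekly data: period 52, a handful of modes.
    features = fourier_features(jnp.arange(104), period=52, n_modes=4)  # shape (104, 8)
    # Inside the model, the seasonal component would be features @ beta, with beta a
    # vector of regression coefficients (e.g. a Normal prior per mode), and covariates
    # such as promotions/discounts entering as an additional X @ gamma term.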

Intermittent Time Series

Croston’s Method

The method is based on the idea of separating the demand size \(z_t\) and the demand interval \(p_t\), and then forecasting them separately using simple exponential smoothing.

  • \(z_t\): the non-zero values of \(y_t\) (demand sizes).
  • \(p_t\): the time between consecutive non-zero values of \(y_t\) (demand intervals).

\[ \hat{y}_{t+h} = \frac{\hat{z}_{t+h}}{\hat{p}_{t+h}} \]
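
A sketch of the preprocessing step (conventions for the first interval vary; this is just one assumed choice):

    import numpy as np

    def croston_preprocess(y):
        # Split an intermittent series into demand sizes and demand intervals.
        nonzero_idx = np.flatnonzero(y)
        z = y[nonzero_idx]                    # demand sizes
        p = np.diff(nonzero_idx, prepend=-1)  # periods between demands
        return z, p

    y = np.array([0, 0, 3, 0, 0, 0, 2, 5, 0, 0, 1])
    z, p = croston_preprocess(y)  # z = [3, 2, 5, 1], p = [3, 4, 1, 3]
    p_inv = 1.0 / p               # the Croston model below works on the inverse intervals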

Croston’s Method

Croston’s Method

from numpyro.handlers import scope


def croston_model(z: ArrayLike, p_inv: ArrayLike, future: int = 0) -> None:
    # Fit two independent level models; `scope` prefixes their sample sites
    # ("demand/..." and "period_inv/...") so the parameters stay separate.
    z_forecast = scope(level_model, "demand")(z, future)
    p_inv_forecast = scope(level_model, "period_inv")(p_inv, future)

    if future > 0:
        numpyro.deterministic("z_forecast", z_forecast)
        numpyro.deterministic("p_inv_forecast", p_inv_forecast)
        numpyro.deterministic("forecast", z_forecast * p_inv_forecast)

TSB Method

  • The TSB method is similar to Croston's method: it constructs two time series out of the original one and forecasts each of them separately, so that the final forecast combines the two component forecasts (the update equations are sketched below).

  • The main difference between the two methods is that the TSB method uses the demand probability instead of the demand intervals.
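
For reference, the standard TSB updates (notation assumed to match the Croston slides, with smoothing parameters \(\alpha, \beta\) and demand probability \(p_t\)):

\[ \begin{align*} z_t = & \: \begin{cases} \alpha y_t + (1 - \alpha) z_{t-1} & \text{if } y_t \neq 0 \\ z_{t-1} & \text{if } y_t = 0 \end{cases} \\ p_t = & \: \beta \, \mathbb{1}(y_t \neq 0) + (1 - \beta) p_{t-1} \\ \hat{y}_{t+h|t} = & \: p_t z_t \end{align*} \]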

TSB Method

1-Step-Ahead Time-Slice Cross Validation

Zero-Inflated TSB Model

🧪 We can modify the TSB model to include zero-inflation by using a Zero-Inflated Negative Binomial Distribution.

    def transition_fn(carry, t):
        z_prev, p_prev = carry

        z_next = ...  # TSB update of the demand level
        p_next = ...  # TSB update of the demand probability

        mu = z_next
        gate = 1 - p_next  # probability of an excess zero
        pred = numpyro.sample(
            "pred",
            dist.ZeroInflatedNegativeBinomial2(
                mean=mu, concentration=concentration, gate=gate
            ),
        )

        return (z_next, p_next), pred

Time-Slice Cross Validation

Why Do Zeros Happen?

Simulation Study: Availability Constraints

Hacking the TSB Model 🪛

Hacking the TSB Model

Forecasting Unseen Demand

ARIMA Model

Censored Distributions

Censored Likelihood

def censored_normal(loc, scale, y, censored):
    distribution = dist.Normal(loc=loc, scale=scale)
    # Censored points only tell us the true value is at least y: they
    # contribute the survival probability P(Y > y) through a Bernoulli site.
    ccdf = 1 - distribution.cdf(y)
    numpyro.sample(
        "censored_label",
        dist.Bernoulli(probs=ccdf).mask(censored == 1),
        obs=censored,
    )
    # Non-censored points contribute the usual Normal density.
    return numpyro.sample("pred", distribution.mask(censored != 1))
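
How the `censored` indicator is constructed depends on the data; a hypothetical example, assuming we also observe the available stock:

    import jax.numpy as jnp

    # Sales are censored whenever they hit the available stock.
    sales = jnp.array([3.0, 5.0, 4.0, 5.0, 2.0])
    stock = jnp.array([10.0, 5.0, 10.0, 5.0, 10.0])
    censored = (sales >= stock).astype(jnp.int32)  # [0, 1, 0, 1, 0]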

Changing the likelihood distribution in a time-series model:

    ## Transition function for AR(2)
    def transition_fn(carry, t):
        y_prev_1, y_prev_2 = carry
        ar_part = phi_1 * y_prev_1 + phi_2 * y_prev_2
        pred_mean = mu + ar_part + seasonal[t]
        # Censored likelihood
        pred = censored_normal(pred_mean, sigma, y[t], censored[t])
        # Shift the lags: the new prediction becomes lag 1, lag 1 becomes lag 2.
        return (pred, y_prev_1), pred

Censored Time Series Forecast 💡

Hierarchical Pricing Elasticity Models

Idea 🤓

Use a hierarchical structure to regularize the demand elasticity parameters.
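
A sketch of what such a hierarchical prior could look like (a partially pooled log-log demand regression; the structure and priors below are assumptions for illustration, not the talk's model):

    import numpyro
    import numpyro.distributions as dist
    from jax.typing import ArrayLike

    def elasticity_model(
        log_price: ArrayLike, item_idx: ArrayLike, log_demand: ArrayLike, n_items: int
    ) -> None:
        # Global (pooled) elasticity and its spread across items.
        beta_loc = numpyro.sample("beta_loc", dist.Normal(loc=0, scale=1))
        beta_scale = numpyro.sample("beta_scale", dist.HalfNormal(scale=0.5))

        with numpyro.plate("item", n_items):
            intercept = numpyro.sample("intercept", dist.Normal(loc=0, scale=1))
            # Non-centered parameterization of the item-level elasticities.
            beta_offset = numpyro.sample("beta_offset", dist.Normal(loc=0, scale=1))

        beta = numpyro.deterministic("beta", beta_loc + beta_scale * beta_offset)

        sigma = numpyro.sample("sigma", dist.HalfNormal(scale=1))
        mu = intercept[item_idx] + beta[item_idx] * log_price
        numpyro.sample("obs", dist.Normal(loc=mu, scale=sigma), obs=log_demand)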

Hierarchical Pricing Elasticity Models

Dynamic Time-Series Model

Dynamic Coefficients

Hilbert Space Gaussian Processes for Dynamic Coefficients

Inferring Effect of Temperature on Electricity Demand

Calibrating a Demand Model 🧪

Let us assume we know from domain knowledge that the effect of temperature on demand above 32°C is fairly stable, at around a value of 0.13.
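
One way to encode this kind of knowledge is to add a soft calibration likelihood on the model-implied effect; a sketch (the site names, the 0.05 tolerance, and the averaging over the hot regime are assumptions for illustration):

    import jax.numpy as jnp
    import numpyro
    import numpyro.distributions as dist

    def calibrate_temperature_effect(temperature_effect, temperature):
        # `temperature_effect` is the dynamic coefficient inside the demand model;
        # average it over the >32°C regime and softly pin it to the expert value 0.13.
        effect_above_32 = numpyro.deterministic(
            "effect_above_32", jnp.mean(temperature_effect, where=temperature > 32)
        )
        numpyro.sample("calibration", dist.Normal(loc=effect_above_32, scale=0.05), obs=0.13)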

MMM Calibration with Lift Tests

MMM Calibration with Lift Tests

References 📚

NumPyro Examples

References 📚

Packages:

References 📚

Other Blogposts

Books

Papers

Thank you!

juanitorduz.github.io