Zero-Inflated TSB Model - Dr. Juan Camilo Orduz

After going through the fundamentals of the TSB Method for Intermittent Time Series Forecasting in NumPyro in the previous notebook, we explore a variation of it that might be useful for certain applications. In a nutshell, we keep the same model structure of the TSB model, but we modify the likelihood function to account for the sparsity of the time series. Concretely, we replace the classic Gaussian likelihood function with a zero-inflated Negative Binomial likelihood function. One clear conceptual advantage of this approach that all our prediction and credible intervals will be non-negative.

Model Specification

Recall that the TSB method is similar to the Croston’s method in the sense that is constructs two different time series out of the original one and then forecast each of them separately, so that the final forecast is generated by combining the forecasts of the two time series. The main difference between the two methods is that the TSB method uses the demand probability instead of the demand periods. Let \(y_{t}\) denote the input time series then the TSB method is specified by the following equations:

If \(y_{t} > 0\), then

\[ \begin{align*} z_{t + 1} & = \alpha y_{t} + (1 - \alpha) z_{t} \\ p_{t + 1} & = \beta + (1 - \beta) p_{t} \end{align*} \]

If \(y_{t} = 0\), then

\[ \begin{align*} z_{t + 1} & = z_{t} \\ p_{t + 1} & = (1 - \beta) p_{t} \end{align*} \]

where \(z_{t}\) is the demand (level) of the time series at time \(t\), \(p_{t}\) is the probability of observing a non-zero demand at time \(t\), and \(\alpha\) and \(\beta\) are the smoothing parameters. The forecast is then given by the product.

\[ \hat{y}_{t + 1} = z_{t} p_{t} \]

Well, to be more precise, one generate predictions by sampling from the distribution:

\[ \hat{y}_{t + 1} \sim \text{Normal}(z_{t} p_{t}, \sigma^2) \]

The idea behind the zero-inflated TSB model is to replace the Gaussian likelihood function with a Zero-Inflated Negative Binomial likelihood function.

\[ \hat{y}_{t + 1} \sim \text{ZeroInflatedNegativeBinomial}(\text{mean}=z_{t}, \text{concentration}=\sigma, \text{gate}=1 - p_{t}), \]

where the parameter gate is the probability of observing a zero demand at time \(t\). Hence, it is natural to consider \(1 - p_{t}\). Note that we use \(z_{t}\) as the man since we account for the demand probability in the gate parameter.

Prepare Notebook

from collections.abc import Callable

import arviz as az
import jax.numpy as jnp
import matplotlib.pyplot as plt
import numpyro
import numpyro.distributions as dist
import pandas as pd
import preliz as pz
import xarray as xr
from jax import random
from jaxlib.xla_extension import ArrayImpl
from numpyro.contrib.control_flow import scan
from numpyro.infer import MCMC, NUTS, Predictive
from pydantic import BaseModel, Field
from statsforecast import StatsForecast
from statsforecast.models import TSB, CrostonOptimized
from statsforecast.utils import ConformalIntervals
from tqdm.notebook import tqdm

az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100
plt.rcParams["figure.facecolor"] = "white"

numpyro.set_host_device_count(n=4)

rng_key = random.PRNGKey(seed=42)

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "retina"

Generate Synthetic Data

We consider thee same synthetic data as in the previous post TSB Method for Intermittent Time Series Forecasting in NumPyro:

n = 80
lam = 0.3

y = random.poisson(key=rng_key, lam=lam, shape=(n,)).astype(jnp.float32)
t = jnp.arange(y.size)

fig, ax = plt.subplots()
ax.plot(t, y)
ax.set(xlabel="time", ylabel="y", title="Time Series Data")

Train-Test Split

Similarly as before we do a simple train-test split.

n = y.size

prop_train = 0.85
n_train = round(prop_train * n)

y_train = y[:n_train]
t_train = t[:n_train]

y_test = y[n_train:]
t_test = t[n_train:]

fig, ax = plt.subplots()
ax.plot(t_train, y_train, color="C0", label="train")
ax.plot(t_test, y_test, color="C1", label="test")
ax.axvline(x=t_train[-1], c="black", linestyle="--")
ax.legend()
ax.set(xlabel="time", ylabel="y", title="Time Series Data Split")

Croston and TSB Model with StatsForecast

For the sake of comparison, let’s start by looking into the results of the statsforecast package. We keep the parameters from the previous posts.

sf = StatsForecast(
    models=[CrostonOptimized(), TSB(alpha_d=0.311, alpha_p=0.57)],
    freq=1,
    n_jobs=-1,
)

train_df = pd.DataFrame({"unique_id": "a", "ds": t_train, "y": y_train})

sf_forecast = sf.forecast(
    h=y_test.size,
    df=train_df,
    level=[94],
    prediction_intervals=ConformalIntervals(n_windows=5),
)

fig, ax = plt.subplots()
ax.plot(t_train, y_train, color="C0", label="train")
ax.plot(t_test, y_test, color="C1", label="test")
ax.fill_between(
    t_test,
    sf_forecast["TSB-lo-94"],
    sf_forecast["TSB-hi-94"],
    color="C2",
    alpha=0.3,
    label="TSB $94\\%$ Conformal Interval",
)
ax.plot(t_test, sf_forecast["TSB"], color="C2", label="TSB mean forecast")
ax.fill_between(
    t_test,
    sf_forecast["CrostonOptimized-lo-94"],
    sf_forecast["CrostonOptimized-hi-94"],
    color="C4",
    alpha=0.3,
    label="Croston $94\\%$ Conformal Interval",
)
ax.plot(
    t_test, sf_forecast["CrostonOptimized"], color="C4", label="sCroston mean forecast"
)
ax.axvline(x=t_train[-1], c="black", linestyle="--")
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=3)
ax.set(xlabel="time", ylabel="y", title="Time Series Data Split")

As explained in the previous post, for this specific smoothing parameters the forecast in significantly higher then the Croston’s method. The reason is that at the end of the training set we have a big spike.

Zero-Inflated TSB Model with NumPyro

As described in the introduction, the structure of the Zero-Inflated TSB model is thee same as in the one presented in TSB Method for Intermittent Time Series Forecasting in NumPyro. We just change the likelihood structure.

Prior Specification

For this specific case, we need to constrain the priors a bit by making the support of the Beta distribution close to \([0.05, 0.3]\).

fig, ax = plt.subplots()
pz.Beta(10, 60).plot_pdf(ax=ax)
ax.axvline(x=(10 / (60 + 10)), c="black", linestyle="--")  # mean
ax.axvline(x=0.05, c="C3", linestyle=":")  # lower bound
ax.axvline(x=0.3, c="C3", linestyle=":")  # upper bound
ax.axvspan(xmin=0.05, xmax=0.3, color="C3", alpha=0.2)
ax.set(title="Beta(10, 60) PDF", xlabel="$\\alpha$")

Model Specification

We now specify the model using the Zero-Inflated Negative Binomial likelihood function.

def zi_tsb_model(ts_trim: ArrayImpl, z0: float, p0: float, future: int = 0) -> None:
    t_max_trim = ts_trim.size

    # --- Priors ---

    z_smoothing = numpyro.sample(
        "z_smoothing", dist.Beta(concentration1=10, concentration0=60)
    )
    p_smoothing = numpyro.sample(
        "p_smoothing", dist.Beta(concentration1=10, concentration0=60)
    )
    concentration = numpyro.sample("concentration", dist.HalfNormal(scale=1))

    # --- Transition Function ---

    def transition_fn(carry, t):
        z_prev, p_prev = carry

        z_next = jnp.where(
            t < t_max_trim,
            jnp.where(
                ts_trim[t] > 0,
                z_smoothing * ts_trim[t] + (1 - z_smoothing) * z_prev,
                z_prev,
            ),
            z_prev,
        )

        p_next = jnp.where(
            t < t_max_trim,
            jnp.where(
                ts_trim[t] > 0,
                p_smoothing + (1 - p_smoothing) * p_prev,
                (1 - p_smoothing) * p_prev,
            ),
            p_prev,
        )

        mu = z_next
        gate = 1 - p_next
        pred = numpyro.sample(
            "pred",
            dist.ZeroInflatedNegativeBinomial2(
                mean=mu, concentration=concentration, gate=gate
            ),
        )

        return (z_next, p_next), pred

    # --- Run Scan ---

    with numpyro.handlers.condition(data={"pred": ts_trim}):
        _, preds = scan(
            transition_fn,
            (z0, p0),
            jnp.arange(t_max_trim + future),
        )

    # --- Forecast ---

    if future > 0:
        return numpyro.deterministic("ts_forecast", preds[-future:])
    return None

Inference

We now fit the model:

def get_model_args(y_train: ArrayImpl) -> tuple[ArrayImpl, float, float]:
    y_train_trim = jnp.trim_zeros(y_train, trim="f")
    p_idx = jnp.flatnonzero(y_train)
    p_diff = jnp.diff(p_idx, prepend=-1)
    z0 = y_train[p_idx[0]]
    p0 = 1 / p_diff.mean()
    return y_train_trim, z0, p0


class InferenceParams(BaseModel):
    num_warmup: int = Field(2_000, ge=1)
    num_samples: int = Field(2_000, ge=1)
    num_chains: int = Field(4, ge=1)


def run_inference(
    rng_key: ArrayImpl,
    model: Callable,
    args: InferenceParams,
    *model_args,
    **nuts_kwargs,
) -> MCMC:
    sampler = NUTS(model, **nuts_kwargs)
    mcmc = MCMC(
        sampler=sampler,
        num_warmup=args.num_warmup,
        num_samples=args.num_samples,
        num_chains=args.num_chains,
    )
    mcmc.run(rng_key, *model_args)
    return mcmc


y_train_trim, z0, p0 = get_model_args(y_train)

inference_params = InferenceParams()

rng_key, rng_subkey = random.split(key=rng_key)
mcmc = run_inference(
    rng_subkey,
    zi_tsb_model,
    inference_params,
    y_train_trim,
    z0,
    p0,
)

idata = az.from_numpyro(posterior=mcmc)

az.summary(data=idata)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
concentration	1.956	0.592	0.903	3.054	0.007	0.005	6240.0	4385.0	1.0
p_smoothing	0.245	0.053	0.147	0.345	0.001	0.000	7912.0	5831.0	1.0
z_smoothing	0.144	0.041	0.071	0.223	0.000	0.000	8171.0	5759.0	1.0

All diagnostics look good:

print(f"""Divergences: {idata["sample_stats"]["diverging"].sum().item()}""")

Divergences: 0

axes = az.plot_trace(
    data=idata,
    compact=True,
    kind="rank_bars",
    backend_kwargs={"figsize": (12, 7), "layout": "constrained"},
)
plt.gcf().suptitle("Zero-Inflated TSB Model Trace", fontsize=16)

Forecast

We now generate the forecast:

def forecast(
    rng_key: ArrayImpl, model: Callable, samples: dict[str, ArrayImpl], *model_args
) -> dict[str, ArrayImpl]:
    predictive = Predictive(
        model=model,
        posterior_samples=samples,
        return_sites=["ts_forecast"],
    )
    return predictive(rng_key, *model_args)


rng_key, rng_subkey = random.split(key=rng_key)
tsb_forecast = forecast(
    rng_subkey,
    zi_tsb_model,
    mcmc.get_samples(),
    y_train_trim.astype(jnp.int32),
    z0,
    p0,
    y_test.size,
)

posterior_predictive = az.from_numpyro(
    posterior_predictive=tsb_forecast,
    coords={"t": t_test},
    dims={"ts_forecast": ["t"]},
)

Let’s see the forecast and compare it with the statsforecast models:

fig, ax = plt.subplots()
ax.plot(t_train, y_train, color="C0", label="train")
ax.plot(t_test, y_test, color="C1", label="test")
ax.axvline(x=t_train[-1], c="black", linestyle="--")
az.plot_hdi(
    x=t_test,
    y=posterior_predictive["posterior_predictive"]["ts_forecast"],
    hdi_prob=0.94,
    color="C3",
    smooth=False,
    fill_kwargs={"alpha": 0.1, "label": "$94\\%$ HDI"},
    ax=ax,
)
ax.plot(
    t_test,
    posterior_predictive["posterior_predictive"]["ts_forecast"].mean(
        dim=("chain", "draw")
    ),
    color="C3",
    label="mean forecast",
)
ax.plot(t_test, sf_forecast["TSB"], color="C2", label="TSB [statsforecast]")
ax.plot(
    t_test,
    sf_forecast["CrostonOptimized"],
    color="C4",
    label="CrostonOptimized [statsforecast]",
)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=4)
ax.set(xlabel="time", ylabel="y", title="Zero-Inflated TSB Model Forecast")

Observes that the mean forecast lies in between the Croston and TSB forecast.

We can zoom in into the one-step ahead forecast posterior distribution:

fig, ax = plt.subplots(figsize=(10, 6))
az.plot_posterior(
    data=posterior_predictive["posterior_predictive"]["ts_forecast"].sel(t=n_train),
    ax=ax,
)
ax.set(
    title="Zero-Inflated TSB Model Forecast Posterior",
    xlabel="y",
    ylabel="Density",
)

Note that, in contrast to the Gaussian likelihood case, zero is in the \(94\%\) credible interval. More over, it is in most of the credible intervals as the distribution is skewed to the right. This goes in alignment with the fact that the input time series is very sparse.

Time Slice Cross Validation

Similarly as in the previous post, we run a time slice cross validation to compare the performance of the Zero-Inflated TSB model with the Croston and TSB models on the one-step ahead forecast.

StatsForecast

def statsforecast_time_slice_cross_validation(
    y: ArrayImpl, n_splits: int, sf: StatsForecast
) -> pd.DataFrame:
    forecast_list = []
    for i in tqdm(range(n_splits)):
        y_train = y[: -(n_splits - i)]
        train_df = pd.DataFrame(
            {"unique_id": "a", "ds": jnp.arange(y_train.size), "y": y_train}
        )
        forecast_list.append(
            sf.forecast(
                h=1,
                df=train_df,
                level=[94],
                prediction_intervals=ConformalIntervals(n_windows=5),
            )
        )
    return pd.concat(forecast_list, axis=0)


sf_cv = statsforecast_time_slice_cross_validation(y=y, n_splits=y_test.size, sf=sf)

Zero-Inflated TSB

def tsb_time_slice_cross_validation(
    rng_key: ArrayImpl, y: ArrayImpl, n_splits: int, inference_params: InferenceParams
) -> xr.Dataset:
    forecast_list = []
    for i in tqdm(range(n_splits)):
        # Prepare data
        y_train = y[: -(n_splits - i)]
        y_train_trim, z0, p0 = get_model_args(y_train)
        # Inference
        rng_key, rng_subkey = random.split(key=rng_key)
        mcmc = run_inference(
            rng_subkey,
            zi_tsb_model,
            inference_params,
            y_train_trim,
            z0,
            p0,
        )
        # Forecast
        rng_key, rng_subkey = random.split(key=rng_key)
        tsb_forecast = forecast(
            rng_subkey,
            zi_tsb_model,
            mcmc.get_samples(),
            y_train_trim.astype(jnp.int32),
            z0,
            p0,
            1,
        )
        forecast_list.append(
            az.from_numpyro(
                posterior_predictive=tsb_forecast,
                coords={"t": [y_train_trim.size]},
                dims={"ts_forecast": ["t"]},
            )
        )
    # Concatenate forecasts
    return xr.concat(
        [x["posterior_predictive"] for x in forecast_list],
        dim=("t"),
    )


rng_key, rng_subkey = random.split(key=rng_key)
forecast_cv = tsb_time_slice_cross_validation(
    rng_key=rng_subkey,
    y=y,
    n_splits=y_test.size,
    inference_params=inference_params,
)

We plot the results:

fig, ax = plt.subplots()
ax.plot(t_train, y_train, color="C0", label="train")
ax.plot(t_test, y_test, marker="o", markersize=4, color="C1", label="test (cv)")
ax.axvline(x=t_train[-1], c="black", linestyle="--")
az.plot_hdi(
    x=t_test,
    y=forecast_cv["ts_forecast"],
    hdi_prob=0.94,
    color="C3",
    smooth=False,
    fill_kwargs={"alpha": 0.2, "label": "$94\\%$ HDI"},
    ax=ax,
)
ax.plot(
    t_test,
    forecast_cv["ts_forecast"].mean(dim=("chain", "draw")),
    marker="o",
    markersize=4,
    color="C3",
    label="mean forecast",
)
ax.plot(
    t_test,
    sf_cv["TSB"],
    color="C2",
    marker="o",
    markersize=4,
    label="TSB [statsforecast]",
)
ax.plot(
    t_test,
    sf_cv["CrostonOptimized"],
    marker="o",
    markersize=4,
    color="C4",
    label="CrostonOptimized [statsforecast]",
)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=3)
ax.set(
    xlabel="time",
    ylabel="y",
    title="Zero-Inflated TSB Model Forecast - 1 Step Ahead Cross-Validation",
)

Here are some remarks about the results:

The Zero-Inflated TSB model forecast lies in between the Croston’s and TSB forecast. It keeps the property of updating the forecast when there is no demand as in the TSB model case. However, for the given parameters, the decay rate is much more conservative and goes to zero slower than the TSB model.
The credible intervals take integer values (as we are assuming we are modeling count data) and they are never negative.

Depending on the application, and the forecasting metric to optimize for, the Zero-Inflated TSB model might be a better choice than the Croston’s and TSB models as it somehow lies in between and the likelihood function better captures the discreteness of the input data. Note that this approach is relative natural once we write the model in a probabilistic programming language.