
Time-Varying Regression Coefficients via Gaussian Random Walk in PyMC

In this notebook we want to illustrate how to use PyMC to fit a time-varying coefficient regression model. The motivation comes from the post Exploring Tools for Interpretable Machine Learning, where we studied a time series problem, the prediction of the number of bike rentals, from a machine learning perspective. Concretely, we fitted and compared two machine learning models: a linear regression with interactions and a gradient boosting model (XGBoost). The models' regressors were mainly meteorological data and seasonality features. One interesting feature we saw, through PDP and ICE plots, was that the temperature had a non-constant effect on bike rentals (see here). Indeed, when the temperature is high (above roughly 25 degrees), it negatively impacts bike rentals on average (to be fair, this is when controlling for the other regressors). What we want to do in this notebook is to tackle the same problem from a different perspective: namely, to use a GaussianRandomWalk to model the interaction effect between temperature and bike rentals. We of course start with a simple regression baseline for comparison.

Remark: The content of this post was part of my talk Introduction to Bayesian Analysis with PyMC at PyCon Colombia 2022. Here you can find the slides.

Prepare Notebook

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pymc.sampling_jax
from pymc.distributions.continuous import Exponential
from sklearn.preprocessing import StandardScaler
import seaborn as sns


plt.style.use("bmh")
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.dpi"] = 100
plt.rcParams["figure.facecolor"] = "white"

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "svg"

Read Data

For a detailed description of the data set, please see here (from the book Interpretable Machine Learning: A Guide for Making Black Box Models Explainable by Christoph Molnar). For a detailed EDA, see the previous post Exploring Tools for Interpretable Machine Learning.

data_path = "https://raw.githubusercontent.com/christophM/interpretable-ml-book/master/data/bike.csv"

raw_data_df = pd.read_csv(data_path)

raw_data_df.head()

Data Formatting

data_df = raw_data_df.copy().assign(
    date=pd.date_range(
        start=pd.to_datetime("2011-01-01"), end=pd.to_datetime("2012-12-31"), freq="D"
    )
)

data_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   season           731 non-null    object        
 1   yr               731 non-null    int64         
 2   mnth             731 non-null    object        
 3   holiday          731 non-null    object        
 4   weekday          731 non-null    object        
 5   workingday       731 non-null    object        
 6   weathersit       731 non-null    object        
 7   temp             731 non-null    float64       
 8   hum              731 non-null    float64       
 9   windspeed        731 non-null    float64       
 10  cnt              731 non-null    int64         
 11  days_since_2011  731 non-null    int64         
 12  date             731 non-null    datetime64[ns]
dtypes: datetime64[ns](1), float64(3), int64(3), object(6)
memory usage: 74.4+ KB

Let’s plot the time development of the bike rentals and temperature over time.

fig, ax = plt.subplots(
    nrows=2, ncols=1, figsize=(10, 6), sharex=True, sharey=False, layout="constrained"
)
sns.lineplot(x="date", y="cnt", data=data_df, color="black", ax=ax[0])
sns.lineplot(x="date", y="temp", data=data_df, color="C0", ax=ax[1])
ax[0].set(title="Count of bikes")
ax[1].set(title="Temperature");

As we are going to fit a linear model, it is always good to scale the data.

target = "cnt"
target_scaled = f"{target}_scaled"

endog_scaler = StandardScaler()
exog_scaler = StandardScaler()


data_df[target_scaled] = endog_scaler.fit_transform(X=data_df[[target]])
data_df[["temp_scaled", "hum_scaled", "windspeed_scaled"]] = exog_scaler.fit_transform(
    X=data_df[["temp", "hum", "windspeed"]]
)

Finally, we extract the features we want to include in the model.

n = data_df.shape[0]
# target
cnt = data_df[target].to_numpy()
cnt_scaled = data_df[target_scaled].to_numpy()
# date feature
date = data_df["date"].to_numpy()
# model regressors
temp_scaled = data_df["temp_scaled"].to_numpy()
hum_scaled = data_df["hum_scaled"].to_numpy()
windspeed_scaled = data_df["windspeed_scaled"].to_numpy()
holiday_idx, holiday = data_df["holiday"].factorize(sort=True)
workingday_idx, workingday = data_df["workingday"].factorize(sort=True)
weathersit_idx, weathersit = data_df["weathersit"].factorize(sort=True)
t = data_df["days_since_2011"].to_numpy() / data_df["days_since_2011"].max()

Base Model

Before we jump into the time-varying coefficient model, let us first fit a baseline regression model. Let us follow the Bayesian workflow for model development.

1. Model Specification

As a first step (after EDA and data pre-processing) we define the model structure. In particular, we choose:

  • Prior distributions for the regression coefficients and the noise.
  • Model parametrization, i.e. the structure of the linear model.
  • Likelihood function, which in this case we choose to be a StudentT distribution.

coords = {
    "date": date,
    "workingday": workingday,
    "weathersit": weathersit,
}


with pm.Model(coords=coords) as base_model:

    # --- priors ---
    intercept = pm.Normal(name="intercept", mu=0, sigma=2)
    b_temp = pm.Normal(name="b_temp", mu=0, sigma=2)
    b_hum = pm.Normal(name="b_hum", mu=0, sigma=2)
    b_windspeed = pm.Normal(name="b_windspeed", mu=0, sigma=2)
    b_workingday = pm.Normal(name="b_workingday", mu=0, sigma=2, dims="workingday")
    b_weathersit = pm.Normal(name="b_weathersit", mu=0, sigma=2, dims="weathersit")
    b_t = pm.Normal(name="b_t", mu=0, sigma=3)
    nu = pm.Gamma(name="nu", alpha=8, beta=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)

    # --- model parametrization ---
    mu = pm.Deterministic(
        name="mu",
        var=(
            intercept
            + b_t * t
            + b_temp * temp_scaled
            + b_hum * hum_scaled
            + b_windspeed * windspeed_scaled
            + b_workingday[workingday_idx]
            + b_weathersit[weathersit_idx]
        ),
        dims="date",
    )

    # --- likelihood ---
    likelihood = pm.StudentT(
        name="likelihood", mu=mu, nu=nu, sigma=sigma, dims="date", observed=cnt_scaled
    )


pm.model_to_graphviz(base_model)

2. Prior Predictive Analysis

We can analyze what the model expects before seeing the data.

Remark: The plotting approach below is adapted from pymc/gp/util.py, see plot_gp_dist.
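
For reference, the percentile-band loop used several times below could be wrapped in a small helper along these lines (a sketch that is not part of the original code, assuming samples of shape (n_points, n_samples)):

def plot_percentile_bands(samples, x, ax, palette="viridis_r"):
    # Shade nested percentile bands around the median (in the spirit of plot_gp_dist).
    cmap = plt.get_cmap(palette)
    percs = np.linspace(51, 99, 100)
    colors = (percs - percs.min()) / (percs.max() - percs.min())
    for color_val, p in zip(colors, percs[::-1]):
        upper = np.percentile(samples, p, axis=1)
        lower = np.percentile(samples, 100 - p, axis=1)
        ax.fill_between(x=x, y1=upper, y2=lower, color=cmap(color_val), alpha=0.1)
    return ax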

with base_model:
    # --- prior samples ---
    prior_predictive_base = pm.sample_prior_predictive(samples=200)


palette = "viridis_r"
cmap = plt.get_cmap(palette)
percs = np.linspace(51, 99, 100)
colors = (percs - np.min(percs)) / (np.max(percs) - np.min(percs))


fig, ax = plt.subplots(figsize=(12, 6))

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(
        prior_predictive_base.prior_predictive["likelihood"], p, axis=1
    )
    lower = np.percentile(
        prior_predictive_base.prior_predictive["likelihood"], 100 - p, axis=1
    )
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper.flatten(),
        y2=lower.flatten(),
        color=cmap(color_val),
        alpha=0.1,
    )
sns.lineplot(x=date, y=cnt_scaled, color="black", ax=ax)
ax.set(
    title="Base Model - Prior Predictive Samples",
    xlabel="date",
    ylabel=target_scaled,
);

The chosen prior distributions are not very restrictive, but the range of the resulting predictions is still reasonable.

3. Model Fit

We sample from the posterior distribution using the JAX-based NUTS sampler from NumPyro.

with base_model:
    idata_base = pm.sampling_jax.sample_numpyro_nuts(
        target_accept=0.8, draws=1000, chains=4
    )
    posterior_predictive_base = pm.sample_posterior_predictive(trace=idata_base)

4. Model Diagnostics

Now we look into some diagnostic metrics and plots.

az.summary(data=idata_base, var_names=["~mu"])
axes = az.plot_trace(
    data=idata_base,
    var_names=["~mu"],
    compact=True,
    kind="rank_bars",
    backend_kwargs={"figsize": (12, 15), "layout": "constrained"},
)
plt.gcf().suptitle("Base Model - Trace", fontsize=16);

Overall, the model looks good.

axes = az.plot_forest(
    data=idata_base,
    var_names=["~mu"],
    combined=True,
    r_hat=True,
    ess=True,
    figsize=(10, 6),
)
axes[0].axvline(x=0.0, color="black", linestyle="--")
plt.gcf().suptitle("Base Model - Posterior Distributions", fontsize=16);

Note that in this base model the temperature feature has a positive effect on bike rentals on average.
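
For example (a quick check that is not part of the original post), we can summarize the posterior of b_temp directly:

az.summary(data=idata_base, var_names=["b_temp"])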

5. Posterior Predictive Distribution

Finally, let us take a look at the fitted values.

posterior_predictive_likelihood = posterior_predictive_base.posterior_predictive[
    "likelihood"
].stack(sample=("chain", "draw"))

posterior_predictive_likelihood_inv = endog_scaler.inverse_transform(
    X=posterior_predictive_likelihood.to_numpy()
)

fig, ax = plt.subplots(figsize=(12, 6))

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(posterior_predictive_likelihood_inv, p, axis=1)
    lower = np.percentile(posterior_predictive_likelihood_inv, 100 - p, axis=1)
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper,
        y2=lower,
        color=cmap(color_val),
        alpha=0.1,
    )

sns.lineplot(
    x=date,
    y=posterior_predictive_likelihood_inv.mean(axis=1),
    color="C2",
    label="posterior predictive mean",
    ax=ax,
)
sns.lineplot(
    x=date,
    y=cnt,
    color="black",
    label=target,
    ax=ax,
)
ax.legend(loc="upper left")
ax.set(
    title="Base Model - Posterior Predictive Samples",
    xlabel="date",
    ylabel=target,
);

Observe that for certain points in July the predictions and the observations do not coincide and even move in opposite directions. This will be solved by adding the time-varying coefficient for the temperature regressor to the linear model (see below).

Remark: Note that the model predicts negative values for the bike rentals. This is of course not good! An alternative choice of likelihood, suited to count data, would resolve this (e.g. a Poisson or Negative Binomial likelihood).
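
As a rough illustration of this alternative (a hedged sketch, not part of the original analysis, with only the temperature regressor shown for brevity), one could model the raw counts with a Negative Binomial likelihood and a log link so that predictions stay non-negative:

with pm.Model(coords=coords) as count_model:
    # --- priors (subset of regressors for illustration) ---
    intercept = pm.Normal(name="intercept", mu=0, sigma=1)
    b_temp = pm.Normal(name="b_temp", mu=0, sigma=1)
    alpha = pm.Exponential(name="alpha", lam=1.0)

    # --- model parametrization: log link keeps the expected count positive ---
    log_mu = intercept + b_temp * temp_scaled

    # --- likelihood on the raw (unscaled) counts ---
    likelihood = pm.NegativeBinomial(
        name="likelihood",
        mu=pm.math.exp(log_mu),
        alpha=alpha,
        dims="date",
        observed=cnt,
    )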

Time-Varying Coefficients Model

We now follow the same workflow as above. The main difference is that we replace the constant regression coefficient of the temperature with a GaussianRandomWalk.

1. Model Specification
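
Concretely, the GaussianRandomWalk prior on the temperature slope (matching the sigma_slopes and Exponential(1) initial distribution used in the code below) can be written as

$$\beta_0 \sim \operatorname{Exponential}(1), \qquad \beta_t = \beta_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \operatorname{Normal}(0, \sigma_{\text{slopes}}),$$

where $\sigma_{\text{slopes}}$ is the innovation standard deviation, so that the temperature contribution to the mean on day $t$ is $\beta_t \cdot \text{temp\_scaled}_t$ instead of the constant $b_{\text{temp}} \cdot \text{temp\_scaled}_t$ of the base model.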

with pm.Model(coords=coords) as model:

    # --- priors ---
    intercept = pm.Normal(name="intercept", mu=0, sigma=2)
    b_hum = pm.Normal(name="b_hum", mu=0, sigma=2)
    b_windspeed = pm.Normal(name="b_windspeed", mu=0, sigma=2)
    b_workingday = pm.Normal(name="b_workingday", mu=0, sigma=2, dims="workingday")
    b_weathersit = pm.Normal(name="b_weathersit", mu=0, sigma=2, dims="weathersit")
    b_t = pm.Normal(name="b_t", mu=0, sigma=3)
    sigma_slopes = pm.HalfNormal(name="sigma_slope", sigma=0.5)
    nu = pm.Gamma(name="nu", alpha=8, beta=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)

    # --- model parametrization ---
    slopes = pm.GaussianRandomWalk(
        name="slopes",
        sigma=sigma_slopes,
        init_dist=Exponential.dist(lam=1.0),
        dims="date",
    )
    mu = pm.Deterministic(
        name="mu",
        var=(
            intercept
            + b_t * t
            + slopes * temp_scaled
            + b_hum * hum_scaled
            + b_windspeed * windspeed_scaled
            + b_workingday[workingday_idx]
            + b_weathersit[weathersit_idx]
        ),
        dims="date",
    )

    # --- likelihood ---
    likelihood = pm.StudentT(
        name="likelihood", mu=mu, nu=nu, sigma=sigma, dims="date", observed=cnt_scaled
    )


pm.model_to_graphviz(model)

2. Prior Predictive Analysis

with model:
    # --- prior samples ---
    prior_predictive = pm.sample_prior_predictive(samples=200)


fig, ax = plt.subplots(figsize=(12, 6))

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(
        prior_predictive.prior_predictive["likelihood"], p, axis=1
    )
    lower = np.percentile(
        prior_predictive.prior_predictive["likelihood"], 100 - p, axis=1
    )
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper.flatten(),
        y2=lower.flatten(),
        color=cmap(color_val),
        alpha=0.1,
    )
sns.lineplot(x=date, y=cnt_scaled, color="black", ax=ax)
ax.set(
    title="Time-Varying Coefficient Model - Prior Predictive Samples",
    xlabel="date",
    ylabel=target_scaled
);

3. Model Fit

with model:
    idata = pm.sampling_jax.sample_numpyro_nuts(
        target_accept=0.8, draws=1000, chains=4
    )
    posterior_predictive = pm.sample_posterior_predictive(trace=idata)

4. Model Diagnostics

az.summary(data=idata, var_names=["~mu", "~slopes"])
axes = az.plot_trace(
    data=idata,
    var_names=["~mu", "~slopes"],
    compact=True,
    kind="rank_bars",
    backend_kwargs={"figsize": (12, 15), "layout": "constrained"},
)
plt.gcf().suptitle("Time-Varying Coefficient Model - Trace", fontsize=16);

Let us now compare the two models in a forest plot.

axes = az.plot_forest(
    data=[idata_base, idata],
    model_names=["base", "time-varying"],
    var_names=["~mu", "~slopes"],
    combined=True,
    r_hat=True,
    ess=True, figsize=(10, 6),
)
axes[0].axvline(x=0.0, color="black", linestyle="--")
plt.gcf().suptitle("Posterior Distributions", fontsize=16);

Overall, there is no major change in the estimated regression coefficients besides the trend component.

We can also use the az.compare method to compare the two models.

az.compare(compare_dict={"base": idata_base, "time-varying": idata})
              rank         loo       p_loo       d_loo    weight         se        dse  warning  loo_scale
time-varying     0 -234.726306  131.925724    0.000000  0.989431  26.230529   0.000000    False        log
base             1 -488.674408   10.541945  253.948102  0.010569  26.755712  22.971437    False        log

It seems that the time-varying coefficient model is better at predicting the bike rentals.
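
Optionally (not shown in the original post), the same comparison can be visualized with az.plot_compare:

comparison_df = az.compare(compare_dict={"base": idata_base, "time-varying": idata})
az.plot_compare(comparison_df, figsize=(10, 4));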

5. Posterior Predictive Distribution

posterior_predictive_likelihood = posterior_predictive.posterior_predictive[
    "likelihood"
].stack(sample=("chain", "draw"))

posterior_predictive_likelihood_inv = endog_scaler.inverse_transform(
    X=posterior_predictive_likelihood.to_numpy()
)

fig, ax = plt.subplots(figsize=(12, 6))

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(posterior_predictive_likelihood_inv, p, axis=1)
    lower = np.percentile(posterior_predictive_likelihood_inv, 100 - p, axis=1)
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper,
        y2=lower,
        color=cmap(color_val),
        alpha=0.1,
    )

sns.lineplot(
    x=date,
    y=posterior_predictive_likelihood_inv.mean(axis=1),
    color="C2",
    label="posterior predictive mean",
    ax=ax,
)
sns.lineplot(
    x=date,
    y=cnt,
    color="black",
    label=target,
    ax=ax,
)
ax.legend(loc="upper left")
ax.set(
    title="Time-Varying Coefficient Model - Posterior Predictive Samples",
    xlabel="date",
    ylabel=target,
);

6. Temperature Effect Deep-Dive

Next, we want to compare the inferred temperature effect from both models. To begin with, let us compare the mean effect of both models.

base_tmp_mean = (
    idata_base.posterior["b_temp"].stack(sample=("chain", "draw")).to_numpy().mean()
)
time_varying_tmp_mean = (
    idata.posterior["slopes"].stack(sample=("chain", "draw")).to_numpy().mean()
)
print(
    f"""
base model mean effect = {base_tmp_mean: 0.3f}
------------------------------------------
time-varying model mean effect  = {time_varying_tmp_mean: 0.3f}
------------------------------------------
"""
)
base model mean effect =  0.512
------------------------------------------
time-varying model mean effect  =  0.611
------------------------------------------

It seems that the mean effect in the time-varying coefficient model is higher. Still, this is just one statistic; it is always better to look at the data. The following plot shows the effect as a function of the temperature.

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(
    x="temp",
    y="slopes",
    data=data_df.assign(
        slopes=idata.posterior["slopes"].stack(sample=("chain", "draw")).mean(axis=1)
    ),
    hue="mnth",
    palette="tab20",
    ax=ax,
)
sns.lineplot(
    x="temp",
    y="b_temp",
    data=data_df.assign(
        b_temp=idata_base.posterior["b_temp"]
        .stack(sample=("chain", "draw"))
        .mean()
        .to_numpy()
    ),
    color="black",
    label="base model",
    ax=ax,
)
ax.axhline(y=0.0, color="gray", linestyle="--")
ax.set(title="Temperature Effect", xlabel="temp", ylabel="effect");

We see many interesting features from this plot:

  • The time-varying coefficient model indeed finds a non-constant effect of temperature on bike rentals.

  • The effect in the second model is always positive for temperatures lower than 25 degrees. For higher temperatures the effect is sometimes negative, which is consistent with the results found in the previous post Exploring Tools for Interpretable Machine Learning. This happens mainly in the months of June and July (a quick numerical check follows this list).

  • The variance of the time-varying effect decreases with the temperature.
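
As a quick numerical check of the second point (a snippet that is not in the original post), we can compute the share of days above 25 degrees whose posterior mean slope is negative:

slopes_mean = idata.posterior["slopes"].mean(dim=("chain", "draw")).to_numpy()
share_negative = (
    data_df.assign(slope_mean=slopes_mean).query("temp > 25")["slope_mean"].lt(0).mean()
)
print(f"share of days above 25 degrees with a negative mean effect: {share_negative:.0%}")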

Finally, let us plot the temperature effect as a function of time.

palette = "cividis_r"
cmap = plt.get_cmap(palette)
percs = np.linspace(51, 99, 100)
colors = (percs - np.min(percs)) / (np.max(percs) - np.min(percs))


posterior_predictive_slopes = idata.posterior["slopes"].stack(sample=("chain", "draw"))

fig, ax = plt.subplots(
    nrows=2, ncols=1, figsize=(10, 6), sharex=True, sharey=False, layout="constrained"
)

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(posterior_predictive_slopes, p, axis=1)
    lower = np.percentile(posterior_predictive_slopes, 100 - p, axis=1)
    color_val = colors[i]
    ax[0].fill_between(
        x=date,
        y1=upper,
        y2=lower,
        color=cmap(color_val),
        alpha=0.1,
    )

sns.lineplot(
    x="date",
    y="b_temp",
    data=data_df.assign(
        b_temp=idata_base.posterior["b_temp"]
        .stack(sample=("chain", "draw"))
        .mean()
        .to_numpy()
    ),
    color="black",
    label="base model",
    ax=ax[0]
)
ax[0].axhline(y=0.0, color="gray", linestyle="--")
ax[0].legend(loc="upper left")
ax[0].set(title="Temperature Effect")
sns.lineplot(x="date", y="temp", data=data_df, color="C0", ax=ax[1])
ax[1].set(title="Temperature");

We see that the effect is quite seasonal.

Note that these results are compatible with the insights found in the previous post Exploring Tools for Interpretable Machine Learning. For example, take a look at the PDP and ICE plots:

Specifically, here are the same plots restricted to the month of July:

Here we see the negative effect of temperature on bike counts during the summer months.

Remark: It would be interesting to compare these results with Orbit’s KTR model (see Media Effect Estimation with Orbit’s KTR Model).