In this notebook we want to illustrate how to use PyMC to fit a time-varying coefficient regression model. The motivation comes from the post Exploring Tools for Interpretable Machine Learning, where we studied a time series problem, the prediction of the number of bike rentals, from a machine learning perspective. Concretely, we fitted and compared two machine learning models: a linear regression with interactions and a gradient boosting model (XGBoost). The models' regressors were mainly meteorological data and seasonality features. One interesting feature we saw through PDP and ICE plots was that the temperature had a non-constant effect on the bike rentals (see here). Indeed, when the temperature is high (more than 25 degrees approximately), the bike rentals are on average negatively impacted by the temperature (to be fair, this is when controlling for other regressors). What we want to do in this notebook is to tackle the same problem from a different perspective: namely, to use a GaussianRandomWalk to model the interaction effect between the temperature and the bike rentals. We of course start with a simple regression baseline for comparison.
Remark: The content of this post was part of my talk on Introduction to Bayesian Analysis with PyMC at PyCon Colombia 2022. Here you can find the slides.
Prepare Notebook
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pymc.sampling_jax
from pymc.distributions.continuous import Exponential
from sklearn.preprocessing import StandardScaler
import seaborn as sns
plt.style.use("bmh")
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.dpi"] = 100
plt.rcParams["figure.facecolor"] = "white"
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "svg"
Read Data
For a detailed description of the data set please see here (from the book Interpretable Machine Learning: A Guide for Making Black Box Models Explainable by Christoph Molnar). For a detailed EDA, see the previous post Exploring Tools for Interpretable Machine Learning.
data_path = "https://raw.githubusercontent.com/christophM/interpretable-ml-book/master/data/bike.csv"
raw_data_df = pd.read_csv(data_path)
raw_data_df.head()

Data Formatting
data_df = raw_data_df.copy().assign(
date=pd.date_range(
start=pd.to_datetime("2011-01-01"), end=pd.to_datetime("2012-12-31"), freq="D"
)
)
data_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 season 731 non-null object
1 yr 731 non-null int64
2 mnth 731 non-null object
3 holiday 731 non-null object
4 weekday 731 non-null object
5 workingday 731 non-null object
6 weathersit 731 non-null object
7 temp 731 non-null float64
8 hum 731 non-null float64
9 windspeed 731 non-null float64
10 cnt 731 non-null int64
11 days_since_2011 731 non-null int64
12 date 731 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(3), int64(3), object(6)
memory usage: 74.4+ KB
Let’s plot the development of the bike rentals and the temperature over time.
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(10, 6), sharex=True, sharey=False, layout="constrained"
)
sns.lineplot(x="date", y="cnt", data=data_df, color="black", ax=ax[0])
sns.lineplot(x="date", y="temp", data=data_df, color="C0", ax=ax[1])
ax[0].set(title="Count of bikes")
ax[1].set(title="Temperature");
As we are going to fit a linear model, it is always good to scale the data.
target = "cnt"
target_scaled = f"{target}_scaled"
endog_scaler = StandardScaler()
exog_scaler = StandardScaler()
data_df[target_scaled] = endog_scaler.fit_transform(X=data_df[[target]])
data_df[["temp_scaled", "hum_scaled", "windspeed_scaled"]] = exog_scaler.fit_transform(
X=data_df[["temp", "hum", "windspeed"]]
)
Finally, we extract the features we want to include in the model.
n = data_df.shape[0]
# target
cnt = data_df[target].to_numpy()
cnt_scaled = data_df[target_scaled].to_numpy()
# date feature
date = data_df["date"].to_numpy()
# model regressors
temp_scaled = data_df["temp_scaled"].to_numpy()
hum_scaled = data_df["hum_scaled"].to_numpy()
windspeed_scaled = data_df["windspeed_scaled"].to_numpy()
holiday_idx, holiday = data_df["holiday"].factorize(sort=True)
workingday_idx, workingday = data_df["workingday"].factorize(sort=True)
weathersit_idx, weathersit = data_df["weathersit"].factorize(sort=True)
t = data_df["days_since_2011"].to_numpy() / data_df["days_since_2011"].max()
Base Model
Before we jump into the time-varying coefficient model, let us first fit a baseline regression model, following the Bayesian workflow for model development.
1. Model Specification
In a first step (after EDA and data pre-processing) we define the model structure. In particular, we choose:
- Prior distributions for the regression coefficients and the noise.
- Model parametrization, i.e. the structure of the linear model.
- Likelihood function, which in this case we choose to be a StudentT distribution.
coords = {
"date": date,
"workingday": workingday,
"weathersit": weathersit,
}
with pm.Model(coords=coords) as base_model:
    # --- priors ---
    intercept = pm.Normal(name="intercept", mu=0, sigma=2)
    b_temp = pm.Normal(name="b_temp", mu=0, sigma=2)
    b_hum = pm.Normal(name="b_hum", mu=0, sigma=2)
    b_windspeed = pm.Normal(name="b_windspeed", mu=0, sigma=2)
    b_workingday = pm.Normal(name="b_workingday", mu=0, sigma=2, dims="workingday")
    b_weathersit = pm.Normal(name="b_weathersit", mu=0, sigma=2, dims="weathersit")
    b_t = pm.Normal(name="b_t", mu=0, sigma=3)
    nu = pm.Gamma(name="nu", alpha=8, beta=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)
    # --- model parametrization ---
    mu = pm.Deterministic(
        name="mu",
        var=(
            intercept
            + b_t * t
            + b_temp * temp_scaled
            + b_hum * hum_scaled
            + b_windspeed * windspeed_scaled
            + b_workingday[workingday_idx]
            + b_weathersit[weathersit_idx]
        ),
        dims="date",
    )
    # --- likelihood ---
    likelihood = pm.StudentT(
        name="likelihood", mu=mu, nu=nu, sigma=sigma, dims="date", observed=cnt_scaled
    )
pm.model_to_graphviz(base_model)
2. Prior Predictive Analysis
We can analyze what the model expects before seeing the data.
Remark: The following plotting code was taken from pymc/gp/util.py, see plot_gp_dist.
with base_model:
    # --- prior samples ---
    prior_predictive_base = pm.sample_prior_predictive(samples=200)
palette = "viridis_r"
cmap = plt.get_cmap(palette)
percs = np.linspace(51, 99, 100)
colors = (percs - np.min(percs)) / (np.max(percs) - np.min(percs))
fig, ax = plt.subplots(figsize=(12, 6))
for i, p in enumerate(percs[::-1]):
    upper = np.percentile(
        prior_predictive_base.prior_predictive["likelihood"], p, axis=1
    )
    lower = np.percentile(
        prior_predictive_base.prior_predictive["likelihood"], 100 - p, axis=1
    )
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper.flatten(),
        y2=lower.flatten(),
        color=cmap(color_val),
        alpha=0.1,
    )
sns.lineplot(x=date, y=cnt_scaled, color="black", ax=ax)
ax.set(
title="Base Model - Prior Predictive Samples",
xlabel="date",
ylabel=target_scaled,
);
The chosen prior distributions are not very restrictive, but the image of the predictions still lies within a reasonable range.
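As an alternative to the manual percentile loop above, a very similar plot could be generated directly with the helper mentioned in the remark. This is just a sketch: it reshapes the prior predictive samples into the (samples, points) layout that plot_gp_dist expects and passes the date array as the x-axis.
from pymc.gp.util import plot_gp_dist

# samples must have shape (n_samples, n_points)
prior_samples = (
    prior_predictive_base.prior_predictive["likelihood"]
    .stack(sample=("chain", "draw"))
    .transpose("sample", "date")
    .to_numpy()
)
fig, ax = plt.subplots(figsize=(12, 6))
plot_gp_dist(ax=ax, samples=prior_samples, x=date, palette="viridis_r", plot_samples=False)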
3. Model Fit
We sample from the posterior distribution using the JAX-based NUTS sampler from NumPyro.
with base_model:
    idata_base = pm.sampling_jax.sample_numpyro_nuts(
        target_accept=0.8, draws=1000, chains=4
    )
    posterior_predictive_base = pm.sample_posterior_predictive(trace=idata_base)
4. Model Diagnostics
Now we look into some diagnostic metrics and plots.
az.summary(data=idata_base, var_names=["~mu"])

axes = az.plot_trace(
data=idata_base,
var_names=["~mu"],
compact=True,
kind="rank_bars",
backend_kwargs={"figsize": (12, 15), "layout": "constrained"},
)
plt.gcf().suptitle("Base Model - Trace", fontsize=16);
Overall, the model looks good.
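As a small additional check (a sketch; it assumes the NumPyro sampler stores divergence flags in sample_stats under "diverging", as the default PyMC sampler does), we can count divergent transitions per chain:
# number of divergent transitions per chain (ideally zero)
print(idata_base.sample_stats["diverging"].sum(dim="draw").to_numpy())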
axes = az.plot_forest(
data=idata_base,
var_names=["~mu"],
combined=True,
r_hat=True,
ess=True,
figsize=(10, 6),
)
axes[0].axvline(x=0.0, color="black", linestyle="--")
plt.gcf().suptitle("Base Model - Posterior Distributions", fontsize=16);
Note that in this base model the temperature feature has a positive effect on bike rentals on average.
5. Posterior Predictive Distribution
Finally, let us take a look at the fitted values.
posterior_predictive_likelihood = posterior_predictive_base.posterior_predictive[
"likelihood"
].stack(sample=("chain", "draw"))
posterior_predictive_likelihood_inv = endog_scaler.inverse_transform(
X=posterior_predictive_likelihood.to_numpy()
)
fig, ax = plt.subplots(figsize=(12, 6))
for i, p in enumerate(percs[::-1]):
    upper = np.percentile(posterior_predictive_likelihood_inv, p, axis=1)
    lower = np.percentile(posterior_predictive_likelihood_inv, 100 - p, axis=1)
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper,
        y2=lower,
        color=cmap(color_val),
        alpha=0.1,
    )
sns.lineplot(
x=date,
y=posterior_predictive_likelihood_inv.mean(axis=1),
color="C2",
label="posterior predictive mean",
ax=ax,
)
sns.lineplot(
x=date,
y=cnt,
color="black",
label=target,
ax=ax,
)
ax.legend(loc="upper left")
ax.set(
title="Base Model - Posterior Predictive Samples",
xlabel="date",
ylabel=target,
);
Observe that for certain points in July the predictions and the observed data do not coincide and even move in opposite directions. This will be improved by adding the time-varying coefficient for the temperature regressor to the linear model (see below).
Remark: Note that the model predicts negative values for the bike rentals. This is of course not good! An alternative likelihood suited for count data would resolve this (e.g. a Poisson or Negative Binomial likelihood).
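For illustration, here is a minimal sketch of how such a count likelihood could look. This is an assumption for illustration only, not something fitted in this post: it keeps the standardized regressors, drops the categorical terms for brevity, and uses a log link so the expected counts stay positive.
with pm.Model(coords=coords) as nb_model:
    # --- priors (same spirit as the base model) ---
    intercept = pm.Normal(name="intercept", mu=0, sigma=2)
    b_temp = pm.Normal(name="b_temp", mu=0, sigma=2)
    b_hum = pm.Normal(name="b_hum", mu=0, sigma=2)
    b_windspeed = pm.Normal(name="b_windspeed", mu=0, sigma=2)
    b_t = pm.Normal(name="b_t", mu=0, sigma=3)
    alpha = pm.HalfNormal(name="alpha", sigma=10)
    # --- log link keeps the expected counts positive ---
    log_mu = (
        intercept
        + b_t * t
        + b_temp * temp_scaled
        + b_hum * hum_scaled
        + b_windspeed * windspeed_scaled
    )
    # --- likelihood on the raw counts (no scaling needed) ---
    pm.NegativeBinomial(
        name="likelihood", mu=pm.math.exp(log_mu), alpha=alpha, dims="date", observed=cnt
    )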
Time-Varying Coefficients Model
We now follow the same workflow as above. The main difference is that we replace the constant regression coefficient of the temperature with a GaussianRandomWalk.
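Concretely, a Gaussian random walk prior lets the temperature coefficient drift from one day to the next; in symbols (with $\sigma_{\text{slope}}$ the scale hyperparameter defined in the model below),

$$\beta_{\text{temp}, t} = \beta_{\text{temp}, t - 1} + \eta_t, \qquad \eta_t \sim \operatorname{Normal}(0, \sigma_{\text{slope}}),$$

so $\sigma_{\text{slope}}$ controls how quickly the effect is allowed to change over time.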
1. Model Specification
with pm.Model(coords=coords) as model:
    # --- priors ---
    intercept = pm.Normal(name="intercept", mu=0, sigma=2)
    b_hum = pm.Normal(name="b_hum", mu=0, sigma=2)
    b_windspeed = pm.Normal(name="b_windspeed", mu=0, sigma=2)
    b_workingday = pm.Normal(name="b_workingday", mu=0, sigma=2, dims="workingday")
    b_weathersit = pm.Normal(name="b_weathersit", mu=0, sigma=2, dims="weathersit")
    b_t = pm.Normal(name="b_t", mu=0, sigma=3)
    sigma_slopes = pm.HalfNormal(name="sigma_slope", sigma=0.5)
    nu = pm.Gamma(name="nu", alpha=8, beta=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)
    # --- model parametrization ---
    slopes = pm.GaussianRandomWalk(
        name="slopes",
        sigma=sigma_slopes,
        init_dist=Exponential.dist(lam=1.0),
        dims="date",
    )
    mu = pm.Deterministic(
        name="mu",
        var=(
            intercept
            + b_t * t
            + slopes * temp_scaled
            + b_hum * hum_scaled
            + b_windspeed * windspeed_scaled
            + b_workingday[workingday_idx]
            + b_weathersit[weathersit_idx]
        ),
        dims="date",
    )
    # --- likelihood ---
    likelihood = pm.StudentT(
        name="likelihood", mu=mu, nu=nu, sigma=sigma, dims="date", observed=cnt_scaled
    )
pm.model_to_graphviz(model)
2. Prior Predictive Analysis
with model:
    # --- prior samples ---
    prior_predictive = pm.sample_prior_predictive(samples=200)
fig, ax = plt.subplots(figsize=(12, 6))
for i, p in enumerate(percs[::-1]):
    upper = np.percentile(
        prior_predictive.prior_predictive["likelihood"], p, axis=1
    )
    lower = np.percentile(
        prior_predictive.prior_predictive["likelihood"], 100 - p, axis=1
    )
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper.flatten(),
        y2=lower.flatten(),
        color=cmap(color_val),
        alpha=0.1,
    )
sns.lineplot(x=date, y=cnt_scaled, color="black", ax=ax)
ax.set(
title="Time-Varying Coefficient Model - Prior Predictive Samples",
xlabel="date",
ylabel=target_scaled
);
3. Model Fit
with model:
    idata = pm.sampling_jax.sample_numpyro_nuts(
        target_accept=0.8, draws=1000, chains=4
    )
    posterior_predictive = pm.sample_posterior_predictive(trace=idata)
4. Model Diagnostics
az.summary(data=idata, var_names=["~mu", "~slopes"])

axes = az.plot_trace(
data=idata,
var_names=["~mu", "~slopes"],
compact=True,
kind="rank_bars",
backend_kwargs={"figsize": (12, 15), "layout": "constrained"},
)
plt.gcf().suptitle("Time-Varying Coefficient Model - Trace", fontsize=16);
Let us now compare the two models in a forest plot.
axes = az.plot_forest(
data=[idata_base, idata],
model_names=["base", "time-varying"],
var_names=["~mu", "~slopes"],
combined=True,
r_hat=True,
ess=True, figsize=(10, 6),
)
axes[0].axvline(x=0.0, color="black", linestyle="--")
plt.gcf().suptitle("Posterior Distributions", fontsize=16);
Overall, there is no major change in the estimated regression coefficients besides the trend component.
We can also use the az.compare method to compare the two models.
az.compare(compare_dict={"base": idata_base, "time-varying": idata})
|              | rank | loo | p_loo | d_loo | weight | se | dse | warning | loo_scale |
|--------------|------|-----|-------|-------|--------|----|-----|---------|-----------|
| time-varying | 0 | -234.726306 | 131.925724 | 0.000000 | 0.989431 | 26.230529 | 0.000000 | False | log |
| base | 1 | -488.674408 | 10.541945 | 253.948102 | 0.010569 | 26.755712 | 22.971437 | False | log |
It seems that the time-varying coefficient model is better at predicting the bike rentals.
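If we want a visual summary of this comparison, we could also pass the resulting data frame to ArviZ's plotting helper (a small sketch on top of the table above):
comp_df = az.compare(compare_dict={"base": idata_base, "time-varying": idata})
# dots are the elpd_loo estimates, horizontal bars their standard errors
az.plot_compare(comp_df, figsize=(8, 3));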
5. Posterior Predictive Distribution
posterior_predictive_likelihood = posterior_predictive.posterior_predictive[
"likelihood"
].stack(sample=("chain", "draw"))
posterior_predictive_likelihood_inv = endog_scaler.inverse_transform(
X=posterior_predictive_likelihood.to_numpy()
)
fig, ax = plt.subplots(figsize=(12, 6))
for i, p in enumerate(percs[::-1]):
    upper = np.percentile(posterior_predictive_likelihood_inv, p, axis=1)
    lower = np.percentile(posterior_predictive_likelihood_inv, 100 - p, axis=1)
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper,
        y2=lower,
        color=cmap(color_val),
        alpha=0.1,
    )
sns.lineplot(
x=date,
y=posterior_predictive_likelihood_inv.mean(axis=1),
color="C2",
label="posterior predictive mean",
ax=ax,
)
sns.lineplot(
x=date,
y=cnt,
color="black",
label=target,
ax=ax,
)
ax.legend(loc="upper left")
ax.set(
title="Time-Varying Coefficient Model - Posterior Predictive Samples",
xlabel="date",
ylabel=target,
);
6. Temperature Effect Deep-Dive
Next, we want to compare the inferred temperature effect from both models. To begin with, let us compare the mean effect of each model.
base_tmp_mean = (
idata_base.posterior["b_temp"].stack(sample=("chain", "draw")).to_numpy().mean()
)
time_varying_tmp_mean = (
idata.posterior["slopes"].stack(sample=("chain", "draw")).to_numpy().mean()
)
print(
f"""
base model mean effect = {base_tmp_mean: 0.3f}
------------------------------------------
time-varying model mean effect = {time_varying_tmp_mean: 0.3f}
------------------------------------------
"""
)
base model mean effect = 0.512
------------------------------------------
time-varying model mean effect = 0.611
------------------------------------------
It seems that the mean effect of the time-varying coefficient model is higher. Still, this is just one statistic; it is always better to look at the data. The following plot shows the effect as a function of the temperature.
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(
x="temp",
y="slopes",
data=data_df.assign(
slopes=idata.posterior["slopes"].stack(sample=("chain", "draw")).mean(axis=1)
),
hue="mnth",
palette="tab20",
)
sns.lineplot(
x="temp",
y="b_temp",
data=data_df.assign(
b_temp=idata_base.posterior["b_temp"]
.stack(sample=("chain", "draw"))
.mean()
.to_numpy()
),
color="black",
label="base model",
ax=ax,
)
ax.axhline(y=0.0, color="gray", linestyle="--")
ax.set(title="Temperature Effect", xlabel="temp", ylabel="effect");
We see many interesting features in this plot:

- The time-varying coefficient model indeed finds a non-constant effect of temperature on bike rentals.
- The effect in the second model is always positive for temperatures lower than 25 degrees. For higher temperatures the effect is sometimes negative, which is consistent with the results found in the previous post Exploring Tools for Interpretable Machine Learning. This happens mainly in the months of June and July (a quick way to verify this is sketched right after this list).
- The variance of the time-varying effect decreases with the temperature.
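As a hypothetical quick check of the second point above (not part of the original analysis; it only uses the data frame and posterior already computed in this notebook), we can compute the share of days per month whose posterior-mean temperature effect is negative:
slopes_mean = idata.posterior["slopes"].mean(dim=("chain", "draw")).to_numpy()
# share of days per month with a negative posterior-mean temperature effect
(
    data_df.assign(slope_mean=slopes_mean)
    .groupby("mnth")["slope_mean"]
    .apply(lambda s: (s < 0).mean())
)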
Finally, let us plot the temperature effect as a function of time.
palette = "cividis_r"
cmap = plt.get_cmap(palette)
percs = np.linspace(51, 99, 100)
colors = (percs - np.min(percs)) / (np.max(percs) - np.min(percs))
posterior_predictive_slopes = idata.posterior["slopes"].stack(sample=("chain", "draw"))
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(10, 6), sharex=True, sharey=False, layout="constrained"
)
for i, p in enumerate(percs[::-1]):
    upper = np.percentile(posterior_predictive_slopes, p, axis=1)
    lower = np.percentile(posterior_predictive_slopes, 100 - p, axis=1)
    color_val = colors[i]
    ax[0].fill_between(
        x=date,
        y1=upper,
        y2=lower,
        color=cmap(color_val),
        alpha=0.1,
    )
sns.lineplot(
x="date",
y="b_temp",
data=data_df.assign(
b_temp=idata_base.posterior["b_temp"]
.stack(sample=("chain", "draw"))
.mean()
.to_numpy()
),
color="black",
label="base model",
ax=ax[0]
)
ax[0].axhline(y=0.0, color="gray", linestyle="--")
ax[0].legend(loc="upper left")
ax[0].set(title="Temperature Effect")
sns.lineplot(x="date", y="temp", data=data_df, color="C0", ax=ax[1])
ax[1].set(title="Temperature");
We see that the effect is quite seasonal.
Note that these results are compatible with the insights found in the previous post Exploring Tools for Interpretable Machine Learning. For example, take a look at the PDP and ICE Plots, and specifically here are the same plots restricted to the month of July:
Here we see the negative effect of temperature on bike count for the summer months.
Remark: It would be interesting to compare these results with Orbit’s KTR model (see Media Effect Estimation with Orbit’s KTR Model).