
Time-Varying Regression Coefficients via Gaussian Random Walk in PyMC

In this notebook we want to illustrate how to use PyMC to fit a time-varying coefficient regression model. The motivation comes from the post Exploring Tools for Interpretable Machine Learning, where we studied a time series problem, the prediction of the number of bike rentals, from a machine learning perspective. Concretely, we fitted and compared two machine learning models: a linear regression with interactions and a gradient boosting model (XGBoost). The models' regressors were mainly meteorological data and seasonality features. One interesting feature we saw, through PDP and ICE plots, was that the temperature had a non-constant effect on bike rentals (see here). Indeed, when the temperature is high (above roughly 25 degrees), it negatively impacts bike rentals on average (to be fair, this is when controlling for the other regressors). What we want to do in this notebook is to tackle the same problem from a different perspective: namely, to use a GaussianRandomWalk to model the interaction effect between temperature and bike rentals. We of course start with a simple regression baseline for comparison.

Remark: The content of this post was part of my talk Introduction to Bayesian Analysis with PyMC at PyCon Colombia 2022. Here you can find the slides.

Prepare Notebook

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pymc.sampling_jax
from pymc.distributions.continuous import Exponential
from sklearn.preprocessing import StandardScaler
import seaborn as sns


plt.style.use("bmh")
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.dpi"] = 100
plt.rcParams["figure.facecolor"] = "white"

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "svg"

Read Data

For a detailed description of the data set, please see here (from the book Interpretable Machine Learning: A Guide for Making Black Box Models Explainable by Christoph Molnar). For a detailed EDA, see the previous post Exploring Tools for Interpretable Machine Learning.

data_path = "https://raw.githubusercontent.com/christophM/interpretable-ml-book/master/data/bike.csv"

raw_data_df = pd.read_csv(data_path)

raw_data_df.head()

Data Formatting

data_df = raw_data_df.copy().assign(
    date=pd.date_range(
        start=pd.to_datetime("2011-01-01"), end=pd.to_datetime("2012-12-31"), freq="D"
    )
)

data_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   season           731 non-null    object        
 1   yr               731 non-null    int64         
 2   mnth             731 non-null    object        
 3   holiday          731 non-null    object        
 4   weekday          731 non-null    object        
 5   workingday       731 non-null    object        
 6   weathersit       731 non-null    object        
 7   temp             731 non-null    float64       
 8   hum              731 non-null    float64       
 9   windspeed        731 non-null    float64       
 10  cnt              731 non-null    int64         
 11  days_since_2011  731 non-null    int64         
 12  date             731 non-null    datetime64[ns]
dtypes: datetime64[ns](1), float64(3), int64(3), object(6)
memory usage: 74.4+ KB

Let’s plot the time development of the bike rentals and temperature over time.

fig, ax = plt.subplots(
    nrows=2, ncols=1, figsize=(10, 6), sharex=True, sharey=False, layout="constrained"
)
sns.lineplot(x="date", y="cnt", data=data_df, color="black", ax=ax[0])
sns.lineplot(x="date", y="temp", data=data_df, color="C0", ax=ax[1])
ax[0].set(title="Count of bikes")
ax[1].set(title="Temperature");

As we are going to fit a linear model, it is always good to scale the data.

target = "cnt"
target_scaled = f"{target}_scaled"

endog_scaler = StandardScaler()
exog_scaler = StandardScaler()


data_df[target_scaled] = endog_scaler.fit_transform(X=data_df[[target]])
data_df[["temp_scaled", "hum_scaled", "windspeed_scaled"]] = exog_scaler.fit_transform(
    X=data_df[["temp", "hum", "windspeed"]]
)

Finally, we extract the features we want to include in the model.

n = data_df.shape[0]
# target
cnt = data_df[target].to_numpy()
cnt_scaled = data_df[target_scaled].to_numpy()
# date feature
date = data_df["date"].to_numpy()
# model regressors
temp_scaled = data_df["temp_scaled"].to_numpy()
hum_scaled = data_df["hum_scaled"].to_numpy()
windspeed_scaled = data_df["windspeed_scaled"].to_numpy()
holiday_idx, holiday = data_df["holiday"].factorize(sort=True)
workingday_idx, workingday = data_df["workingday"].factorize(sort=True)
weathersit_idx, weathersit = data_df["weathersit"].factorize(sort=True)
t = data_df["days_since_2011"].to_numpy() / data_df["days_since_2011"].max()

Base Model

Before we jump into the time-varying coefficient model, let us first fit a baseline regression model. Let us follow the Bayesian workflow for model development.

1. Model Specification

As a first step (after EDA and data pre-processing) we define the model structure. In particular, we choose:

  • Prior distributions for the regression coefficients and the noise.
  • Model parametrization, i.e. the structure of the linear model.
  • Likelihood function, which in this case we choose to be a StudentT distribution.

coords = {
    "date": date,
    "workingday": workingday,
    "weathersit": weathersit,
}


with pm.Model(coords=coords) as base_model:

    # --- priors ---
    intercept = pm.Normal(name="intercept", mu=0, sigma=2)
    b_temp = pm.Normal(name="b_temp", mu=0, sigma=2)
    b_hum = pm.Normal(name="b_hum", mu=0, sigma=2)
    b_windspeed = pm.Normal(name="b_windspeed", mu=0, sigma=2)
    b_workingday = pm.Normal(name="b_workingday", mu=0, sigma=2, dims="workingday")
    b_weathersit = pm.Normal(name="b_weathersit", mu=0, sigma=2, dims="weathersit")
    b_t = pm.Normal(name="b_t", mu=0, sigma=3)
    nu = pm.Gamma(name="nu", alpha=8, beta=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)

    # --- model parametrization ---
    mu = pm.Deterministic(
        name="mu",
        var=(
            intercept
            + b_t * t
            + b_temp * temp_scaled
            + b_hum * hum_scaled
            + b_windspeed * windspeed_scaled
            + b_workingday[workingday_idx]
            + b_weathersit[weathersit_idx]
        ),
        dims="date",
    )

    # --- likelihood ---
    likelihood = pm.StudentT(
        name="likelihood", mu=mu, nu=nu, sigma=sigma, dims="date", observed=cnt_scaled
    )


pm.model_to_graphviz(base_model)

2. Prior Predictive Analysis

We can analyze what the model expects before seeing the data.

Remark: The plotting approach below is adapted from pymc/gp/util.py, see plot_gp_dist.
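
For reference, the percentile-band loop used several times below could be wrapped in a small helper along these lines (a sketch that is not part of the original code, assuming samples of shape (n_points, n_samples)):

def plot_percentile_bands(samples, x, ax, palette="viridis_r"):
    # Shade nested percentile bands around the median (in the spirit of plot_gp_dist).
    cmap = plt.get_cmap(palette)
    percs = np.linspace(51, 99, 100)
    colors = (percs - percs.min()) / (percs.max() - percs.min())
    for color_val, p in zip(colors, percs[::-1]):
        upper = np.percentile(samples, p, axis=1)
        lower = np.percentile(samples, 100 - p, axis=1)
        ax.fill_between(x=x, y1=upper, y2=lower, color=cmap(color_val), alpha=0.1)
    return ax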

with base_model:
    # --- prior samples ---
    prior_predictive_base = pm.sample_prior_predictive(samples=200)


palette = "viridis_r"
cmap = plt.get_cmap(palette)
percs = np.linspace(51, 99, 100)
colors = (percs - np.min(percs)) / (np.max(percs) - np.min(percs))


fig, ax = plt.subplots(figsize=(12, 6))

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(
        prior_predictive_base.prior_predictive["likelihood"], p, axis=1
    )
    lower = np.percentile(
        prior_predictive_base.prior_predictive["likelihood"], 100 - p, axis=1
    )
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper.flatten(),
        y2=lower.flatten(),
        color=cmap(color_val),
        alpha=0.1,
    )
sns.lineplot(x=date, y=cnt_scaled, color="black", ax=ax)
ax.set(
    title="Base Model - Prior Predictive Samples",
    xlabel="date",
    ylabel=target_scaled,
);

The chosen prior distributions are not very restrictive, but the range of the resulting predictions is still reasonable.

3. Model Fit

We sample from the posterior distribution using the JAX-based NUTS sampler from NumPyro.

with base_model:
    idata_base = pm.sampling_jax.sample_numpyro_nuts(
        target_accept=0.8, draws=1000, chains=4
    )
    posterior_predictive_base = pm.sample_posterior_predictive(trace=idata_base)

4. Model Diagnostics

Now we look into some diagnostic metrics and plots.

az.summary(data=idata_base, var_names=["~mu"])
axes = az.plot_trace(
    data=idata_base,
    var_names=["~mu"],
    compact=True,
    kind="rank_bars",
    backend_kwargs={"figsize": (12, 15), "layout": "constrained"},
)
plt.gcf().suptitle("Base Model - Trace", fontsize=16);

Overall, the model looks good.

axes = az.plot_forest(
    data=idata_base,
    var_names=["~mu"],
    combined=True,
    r_hat=True,
    ess=True,
    figsize=(10, 6),
)
axes[0].axvline(x=0.0, color="black", linestyle="--")
plt.gcf().suptitle("Base Model - Posterior Distributions", fontsize=16);

Note that in this base model the temperature feature has a positive effect on bike rentals on average.
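
For example (a quick check that is not part of the original post), we can summarize the posterior of b_temp directly:

az.summary(data=idata_base, var_names=["b_temp"])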

5. Posterior Predictive Distribution

Finally, let us take a look at the fitted values.

posterior_predictive_likelihood = posterior_predictive_base.posterior_predictive[
    "likelihood"
].stack(sample=("chain", "draw"))

posterior_predictive_likelihood_inv = endog_scaler.inverse_transform(
    X=posterior_predictive_likelihood.to_numpy()
)

fig, ax = plt.subplots(figsize=(12, 6))

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(posterior_predictive_likelihood_inv, p, axis=1)
    lower = np.percentile(posterior_predictive_likelihood_inv, 100 - p, axis=1)
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper,
        y2=lower,
        color=cmap(color_val),
        alpha=0.1,
    )

sns.lineplot(
    x=date,
    y=posterior_predictive_likelihood_inv.mean(axis=1),
    color="C2",
    label="posterior predictive mean",
    ax=ax,
)
sns.lineplot(
    x=date,
    y=cnt,
    color="black",
    label=target,
    ax=ax,
)
ax.legend(loc="upper left")
ax.set(
    title="Base Model - Posterior Predictive Samples",
    xlabel="date",
    ylabel=target,
);

Observe that for certain points in July the predictions and the observations do not coincide and even move in opposite directions. This will be solved by adding the time-varying coefficient for the temperature regressor to the linear model (see below).

Remark: Note that the model predicts negative values for the bike rentals. This is of course not good! An alternative choice of likelihood, suited to count data, would resolve this (e.g. a Poisson or Negative Binomial likelihood).
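
As a rough illustration of this alternative (a hedged sketch, not part of the original analysis, with only the temperature regressor shown for brevity), one could model the raw counts with a Negative Binomial likelihood and a log link so that predictions stay non-negative:

with pm.Model(coords=coords) as count_model:
    # --- priors (subset of regressors for illustration) ---
    intercept = pm.Normal(name="intercept", mu=0, sigma=1)
    b_temp = pm.Normal(name="b_temp", mu=0, sigma=1)
    alpha = pm.Exponential(name="alpha", lam=1.0)

    # --- model parametrization: log link keeps the expected count positive ---
    log_mu = intercept + b_temp * temp_scaled

    # --- likelihood on the raw (unscaled) counts ---
    likelihood = pm.NegativeBinomial(
        name="likelihood",
        mu=pm.math.exp(log_mu),
        alpha=alpha,
        dims="date",
        observed=cnt,
    )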

Time-Varying Coefficients Model

We now follow the same workflow as above. The main difference is that we replace the constant regression coefficient of the temperature with a GaussianRandomWalk.

1. Model Specification
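
Concretely, the GaussianRandomWalk prior on the temperature slope (matching the sigma_slopes and Exponential(1) initial distribution used in the code below) can be written as

$$\beta_0 \sim \operatorname{Exponential}(1), \qquad \beta_t = \beta_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \operatorname{Normal}(0, \sigma_{\text{slopes}}),$$

where $\sigma_{\text{slopes}}$ is the innovation standard deviation, so that the temperature contribution to the mean on day $t$ is $\beta_t \cdot \text{temp\_scaled}_t$ instead of the constant $b_{\text{temp}} \cdot \text{temp\_scaled}_t$ of the base model.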

with pm.Model(coords=coords) as model:

    # --- priors ---
    intercept = pm.Normal(name="intercept", mu=0, sigma=2)
    b_hum = pm.Normal(name="b_hum", mu=0, sigma=2)
    b_windspeed = pm.Normal(name="b_windspeed", mu=0, sigma=2)
    b_workingday = pm.Normal(name="b_workingday", mu=0, sigma=2, dims="workingday")
    b_weathersit = pm.Normal(name="b_weathersit", mu=0, sigma=2, dims="weathersit")
    b_t = pm.Normal(name="b_t", mu=0, sigma=3)
    sigma_slopes = pm.HalfNormal(name="sigma_slope", sigma=0.5)
    nu = pm.Gamma(name="nu", alpha=8, beta=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)

    # --- model parametrization ---
    slopes = pm.GaussianRandomWalk(
        name="slopes",
        sigma=sigma_slopes,
        init_dist=Exponential.dist(lam=1.0),
        dims="date",
    )
    mu = pm.Deterministic(
        name="mu",
        var=(
            intercept
            + b_t * t
            + slopes * temp_scaled
            + b_hum * hum_scaled
            + b_windspeed * windspeed_scaled
            + b_workingday[workingday_idx]
            + b_weathersit[weathersit_idx]
        ),
        dims="date",
    )

    # --- likelihood ---
    likelihood = pm.StudentT(
        name="likelihood", mu=mu, nu=nu, sigma=sigma, dims="date", observed=cnt_scaled
    )


pm.model_to_graphviz(model)

2. Prior Predictive Analysis

with model:
    # --- prior samples ---
    prior_predictive = pm.sample_prior_predictive(samples=200)


fig, ax = plt.subplots(figsize=(12, 6))

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(
        prior_predictive.prior_predictive["likelihood"], p, axis=1
    )
    lower = np.percentile(
        prior_predictive.prior_predictive["likelihood"], 100 - p, axis=1
    )
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper.flatten(),
        y2=lower.flatten(),
        color=cmap(color_val),
        alpha=0.1,
    )
sns.lineplot(x=date, y=cnt_scaled, color="black", ax=ax)
ax.set(
    title="Time-Varying Coefficient Model - Prior Predictive Samples",
    xlabel="date",
    ylabel=target_scaled
);

3. Model Fit

with model:
    idata = pm.sampling_jax.sample_numpyro_nuts(
        target_accept=0.8, draws=1000, chains=4
    )
    posterior_predictive = pm.sample_posterior_predictive(trace=idata)

4. Model Diagnostics

az.summary(data=idata, var_names=["~mu", "~slopes"])
axes = az.plot_trace(
    data=idata,
    var_names=["~mu", "~slopes"],
    compact=True,
    kind="rank_bars",
    backend_kwargs={"figsize": (12, 15), "layout": "constrained"},
)
plt.gcf().suptitle("Time-Varying Coefficient Model - Trace", fontsize=16);

Let us now compare the two models in a forest plot.

axes = az.plot_forest(
    data=[idata_base, idata],
    model_names=["base", "time-varying"],
    var_names=["~mu", "~slopes"],
    combined=True,
    r_hat=True,
    ess=True, figsize=(10, 6),
)
axes[0].axvline(x=0.0, color="black", linestyle="--")
plt.gcf().suptitle("Posterior Distributions", fontsize=16);

Overall, there is no major change in the estimated regression coefficients besides the trend component.

We can also use the az.compare method to compare the two models.

az.compare(compare_dict={"base": idata_base, "time-varying": idata})
              rank         loo       p_loo       d_loo    weight         se        dse  warning  loo_scale
time-varying     0 -234.726306  131.925724    0.000000  0.989431  26.230529   0.000000    False        log
base             1 -488.674408   10.541945  253.948102  0.010569  26.755712  22.971437    False        log

It seems that the time-varying coefficient model is better at predicting the bike rentals.
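
Optionally (not shown in the original post), the same comparison can be visualized with az.plot_compare:

comparison_df = az.compare(compare_dict={"base": idata_base, "time-varying": idata})
az.plot_compare(comparison_df, figsize=(10, 4));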

5. Posterior Predictive Distribution

posterior_predictive_likelihood = posterior_predictive.posterior_predictive[
    "likelihood"
].stack(sample=("chain", "draw"))

posterior_predictive_likelihood_inv = endog_scaler.inverse_transform(
    X=posterior_predictive_likelihood.to_numpy()
)

fig, ax = plt.subplots(figsize=(12, 6))

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(posterior_predictive_likelihood_inv, p, axis=1)
    lower = np.percentile(posterior_predictive_likelihood_inv, 100 - p, axis=1)
    color_val = colors[i]
    ax.fill_between(
        x=date,
        y1=upper,
        y2=lower,
        color=cmap(color_val),
        alpha=0.1,
    )

sns.lineplot(
    x=date,
    y=posterior_predictive_likelihood_inv.mean(axis=1),
    color="C2",
    label="posterior predictive mean",
    ax=ax,
)
sns.lineplot(
    x=date,
    y=cnt,
    color="black",
    label=target,
    ax=ax,
)
ax.legend(loc="upper left")
ax.set(
    title="Time-Varying Coefficient Model - Posterior Predictive Samples",
    xlabel="date",
    ylabel=target,
);

6. Temperature Effect Deep-Dive

Next, we want to compare the inferred temperature effect from both models. To begin with, let us compare the mean effect of both models.

base_tmp_mean = (
    idata_base.posterior["b_temp"].stack(sample=("chain", "draw")).to_numpy().mean()
)
time_varying_tmp_mean = (
    idata.posterior["slopes"].stack(sample=("chain", "draw")).to_numpy().mean()
)
print(
    f"""
base model mean effect = {base_tmp_mean: 0.3f}
------------------------------------------
time-varying model mean effect  = {time_varying_tmp_mean: 0.3f}
------------------------------------------
"""
)
base model mean effect =  0.512
------------------------------------------
time-varying model mean effect  =  0.611
------------------------------------------

It seems that the mean effect in the time-varying coefficient model is higher. Still, this is just one statistic; it is always better to look at the data. The following plot shows the effect as a function of the temperature.

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(
    x="temp",
    y="slopes",
    data=data_df.assign(
        slopes=idata.posterior["slopes"].stack(sample=("chain", "draw")).mean(axis=1)
    ),
    hue="mnth",
    palette="tab20",
    ax=ax,
)
sns.lineplot(
    x="temp",
    y="b_temp",
    data=data_df.assign(
        b_temp=idata_base.posterior["b_temp"]
        .stack(sample=("chain", "draw"))
        .mean()
        .to_numpy()
    ),
    color="black",
    label="base model",
    ax=ax,
)
ax.axhline(y=0.0, color="gray", linestyle="--")
ax.set(title="Temperature Effect", xlabel="temp", ylabel="effect");

We see many interesting features from this plot:

  • The time-varying coefficient model indeed finds a non-constant effect of temperature on bike rentals.

  • The effect in the second model is always positive for temperatures lower than 25 degrees. For higher temperatures the effect is sometimes negative, which is consistent with the results found in the previous post Exploring Tools for Interpretable Machine Learning. This happens mainly in the months of June and July (a quick numerical check follows this list).

  • The variance of the time-varying effect decreases with the temperature.
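
As a quick numerical check of the second point (a snippet that is not in the original post), we can compute the share of days above 25 degrees whose posterior mean slope is negative:

slopes_mean = idata.posterior["slopes"].mean(dim=("chain", "draw")).to_numpy()
share_negative = (
    data_df.assign(slope_mean=slopes_mean).query("temp > 25")["slope_mean"].lt(0).mean()
)
print(f"share of days above 25 degrees with a negative mean effect: {share_negative:.0%}")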

Finally, let us plot the temperature effect as a function of time.

palette = "cividis_r"
cmap = plt.get_cmap(palette)
percs = np.linspace(51, 99, 100)
colors = (percs - np.min(percs)) / (np.max(percs) - np.min(percs))


posterior_predictive_slopes = idata.posterior["slopes"].stack(sample=("chain", "draw"))

fig, ax = plt.subplots(
    nrows=2, ncols=1, figsize=(10, 6), sharex=True, sharey=False, layout="constrained"
)

for i, p in enumerate(percs[::-1]):
    upper = np.percentile(posterior_predictive_slopes, p, axis=1)
    lower = np.percentile(posterior_predictive_slopes, 100 - p, axis=1)
    color_val = colors[i]
    ax[0].fill_between(
        x=date,
        y1=upper,
        y2=lower,
        color=cmap(color_val),
        alpha=0.1,
    )

sns.lineplot(
    x="date",
    y="b_temp",
    data=data_df.assign(
        b_temp=idata_base.posterior["b_temp"]
        .stack(sample=("chain", "draw"))
        .mean()
        .to_numpy()
    ),
    color="black",
    label="base model",
    ax=ax[0]
)
ax[0].axhline(y=0.0, color="gray", linestyle="--")
ax[0].legend(loc="upper left")
ax[0].set(title="Temperature Effect")
sns.lineplot(x="date", y="temp", data=data_df, color="C0", ax=ax[1])
ax[1].set(title="Temperature");

We see that the effect is quite seasonal.

Note that these results are compatible with the insights found in the previous post Exploring Tools for Interpretable Machine Learning. For example, take a look at the PDP and ICE plots:

Specifically, here are the same plots restricted to the month of July:

Here we see the negative effect of temperature on bike counts during the summer months.

Remark: It would be interesting to compare these results with Orbit’s KTR model (see Media Effect Estimation with Orbit’s KTR Model).