In this notebook, we present a complete simulation study of the media mix model (MMM) and experimental calibration method presented in the paper “Media Mix Model Calibration With Bayesian Priors”, by Zhang, et al., where the authors propose a convenient parametrization of the regression model in terms of the ROAS (return on advertising spend) instead of the classical regression (beta) coefficients. The benefit of this parametrization is that it allows for Bayesian priors on the ROAS, which typically come from previous experiments or domain knowledge. Providing this information to the model is essential in real-life MMM applications, as biases and noise can easily fool us. Similar to the authors’ paper, we show that the proposed method can provide better ROAS estimation when we have a bias due to missing covariates or omitted variables. We work out an example based on the classical media mix model presented in the paper “Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects”, by Jin, et al. I strongly recommend taking a look at these two papers before reading this notebook. Reading them in parallel with this notebook is also a good alternative.
For an introduction to the topic I recommend my two previous posts:
- Media Effect Estimation with Orbit’s KTR Model
- Media Effect Estimation with PyMC: Adstock, Saturation & Diminishing Returns
and the PyMC-Marketing
MMM example.
Outline
Here is the outline of the study.
Data Generating Process
ROAS Computation (Analytically)
Data Preparation
Models:
Model 1 (Causal Model): Model with all the variables needed to identify the causal effects of the media channels. We test parameter recovery from the data generating process.
Model 2 (Unobserved Confounder): Model without the confounding variable. We show how the ROAS estimation in this case is quite off due to omitted variable bias (confounding).
Model 3 (ROAS Calibration): We keep the confounding variable out of the model as in Model 2, but we set priors on the ROAS instead of on the regression (beta) coefficients of the media variables. We show that, when setting informative priors on the ROAS, the bias due to the missing confounding variable decreases significantly. This implies this technique can be very effective for calibrating media mix models.
All the results are fully reproducible!
Prepare Notebook
import arviz as az
import graphviz as gr
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
import pandas as pd
import preliz as pz
import pymc as pm
import pytensor.tensor as pt
import seaborn as sns
import xarray as xr
from dowhy import CausalModel
from sklearn import set_config
from sklearn.preprocessing import MaxAbsScaler
from tqdm.notebook import tqdm
set_config(transform_output="pandas")
az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100
plt.rcParams["figure.facecolor"] = "white"
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "retina"
seed: int = sum(map(ord, "mmm_roas"))
rng: np.random.Generator = np.random.default_rng(seed=seed)
Data Generating Process
We begin by generating a synthetic data set to be able to compare the estimates of various models against the ground truth. We follow a similar strategy as in the PyMC-Marketing
MMM example. However, the main difference is that we add a confounder variable which has a causal effect on a specific channel and on the target. The following DAG represents the data generating process and the causal dependencies between the variables.
g = gr.Digraph()
g.node(name="seasonality", label="seasonality", color="lightgray", style="filled")
g.node(name="trend", label="trend")
g.node(name="z", label="z", color="lightgray", style="filled")
g.node(name="x1", label="x1", color="#2a2eec80", style="filled")
g.node(name="x2", label="x2", color="#fa7c1780", style="filled")
g.node(name="y", label="y", color="#328c0680", style="filled")
g.edge(tail_name="seasonality", head_name="x1")
g.edge(tail_name="z", head_name="x1")
g.edge(tail_name="x1", head_name="y")
g.edge(tail_name="seasonality", head_name="y")
g.edge(tail_name="trend", head_name="y")
g.edge(tail_name="z", head_name="y")
g.edge(tail_name="x2", head_name="y")
g
Here \(x_1\) and \(x_2\) represent two media channels and \(y\) our target variable (e.g. sales). The confounder \(z\) has a causal effect on \(x_1\) and \(y\).
Remark: If you are a color geek like me, you can get the color palette color codes of the arviz
theme here. Note that you can add them with transparency as described here; the transparency is specified by the last two characters of the hex code.
We now proceed to generate weekly data for \(2.5\) years. In the following snippet of code we generate the date range, the seasonality and trend components, and the covariate \(z\), which is always positive.
# date range
min_date = pd.to_datetime("2021-10-02")
max_date = pd.to_datetime("2024-03-30")
data_df = pd.DataFrame(
data={"date": pd.date_range(start=min_date, end=max_date, freq="W-SAT")}
)
n = data_df.shape[0]
data_df["dayofyear"] = data_df["date"].dt.dayofyear
data_df["quarter"] = data_df["date"].dt.to_period("Q").dt.strftime("%YQ%q")
data_df["trend"] = (np.linspace(start=0.0, stop=50, num=n) + 10) ** (1 / 3) - 1
data_df["cs"] = -np.sin(2 * 2 * np.pi * data_df["dayofyear"] / 365.25)
data_df["cc"] = np.cos(1 * 2 * np.pi * data_df["dayofyear"] / 365.25)
data_df["seasonality"] = 0.3 * (data_df["cs"] + data_df["cc"])
data_df["z"] = 0.3 * rng.gamma(shape=1, scale=2 / 3, size=n)
The following plot shows the time series development of these features (and their sum).
fig, ax = plt.subplots(
nrows=2,
ncols=1,
figsize=(12, 9),
sharex=True,
sharey=False,
layout="constrained",
)
sns.lineplot(
data=data_df,
x="date",
y="trend",
color="C0",
label="trend",
ax=ax[0],
)
sns.lineplot(
data=data_df,
x="date",
y="seasonality",
color="C1",
label="seasonality",
ax=ax[0],
)
sns.lineplot(data=data_df, x="date", y="z", color="C2", label="z", ax=ax[0])
ax[0].legend(loc="upper left")
ax[0].set(xlabel="date", ylabel=None)
sns.lineplot(
data=data_df.eval("sum = trend + seasonality + z"),
x="date",
y="sum",
color="black",
label="trend + seasonality + z",
ax=ax[1],
)
ax[1].legend(loc="upper left")
fig.suptitle(t="Data Components", fontsize=18, fontweight="bold")
Next, we generate the media channels \(x_1\) and \(x_2\). Note that the spend of channel \(x_1\) depends on both the seasonality component and the \(z\) variable. We also generate Gaussian noise to be added to the target variable \(y\).
data_df["x1"] = 1.2 * (
0.5
+ 0.4 * data_df["seasonality"]
+ 0.6 * data_df["z"]
+ 0.1 * rng.poisson(lam=1 / 2, size=n)
)
data_df["x2"] = 0.3 * rng.gamma(shape=1, scale=1, size=n)
data_df["epsilon"] = rng.normal(loc=0, scale=0.1, size=n)
We now can visualize all the components:
fig, ax = plt.subplots(
nrows=2,
ncols=1,
figsize=(15, 10),
sharex=True,
sharey=False,
layout="constrained",
)
sns.lineplot(
data=data_df,
x="date",
y="trend",
color="C0",
alpha=0.5,
label="trend",
ax=ax[0],
)
sns.lineplot(
data=data_df,
x="date",
y="seasonality",
color="C1",
alpha=0.5,
label="seasonality",
ax=ax[0],
)
sns.lineplot(data=data_df, x="date", y="z", color="C2", alpha=0.5, label="z", ax=ax[0])
sns.lineplot(
data=data_df, x="date", y="x1", color="C3", linewidth=2.5, label="x1", ax=ax[0]
)
sns.lineplot(
data=data_df, x="date", y="x2", color="C4", linewidth=2.5, label="x2", ax=ax[0]
)
sns.lineplot(
data=data_df,
x="date",
y="epsilon",
color="C5",
linewidth=2.5,
label="epsilon",
ax=ax[0],
)
ax[0].legend(loc="upper left")
ax[0].set(xlabel="date", ylabel=None)
sns.lineplot(
data=data_df.eval("sum = trend + seasonality + z + x1 + x2 + epsilon"),
x="date",
y="sum",
color="black",
linewidth=2.5,
label="trend + seasonality + z + x1 + x2 + epsilon",
ax=ax[1],
)
ax[1].legend(loc="upper left")
fig.suptitle(t="Data Components", fontsize=18, fontweight="bold")
Note that we have not yet added the non-linear effects of the media channels. This is a key component in media mix models, as we expect the media signal to have carryover (adstock) and saturation effects (for a detailed description of these transformations and alternative parametrizations, see the paper “Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects”, by Jin, et al.). We add these two transformations using some helper functions (the code itself is not that important):
def geometric_adstock(x, alpha, l_max, normalize):
"""Vectorized geometric adstock transformation."""
cycles = [
pt.concatenate(tensor_list=[pt.zeros(shape=x.shape)[:i], x[: x.shape[0] - i]])
for i in range(l_max)
]
x_cycle = pt.stack(cycles)
x_cycle = pt.transpose(x=x_cycle, axes=[1, 2, 0])
w = pt.as_tensor_variable([pt.power(alpha, i) for i in range(l_max)])
w = pt.transpose(w)[None, ...]
w = w / pt.sum(w, axis=2, keepdims=True) if normalize else w
return pt.sum(pt.mul(x_cycle, w), axis=2)
def logistic_saturation(x, lam):
"""Logistic saturation transformation."""
return (1 - pt.exp(-lam * x)) / (1 + pt.exp(-lam * x))
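As a quick sanity check (a toy example of my own, not part of the data generating process), we can push a unit spend impulse through geometric_adstock and verify that it decays geometrically at rate \(\alpha\):
# toy example: a unit spend impulse at t=0 for a single channel
impulse = np.zeros(shape=(8, 1))
impulse[0, 0] = 1.0
impulse_response = geometric_adstock(
    x=impulse, alpha=np.array([0.5]), l_max=4, normalize=True
).eval()
print(impulse_response[:5, 0].round(3))
# normalized weights: 0.5**i / (1 + 0.5 + 0.25 + 0.125) for i = 0, ..., 3,
# i.e. approximately [0.533, 0.267, 0.133, 0.067, 0.]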
First, we apply the adstock transformation. The strength of the carryover effect is controlled by the decay parameter \(0\leq \alpha \leq 1\).
# adstock maximum lag
l_max = 4
# apply geometric adstock transformation
alpha1 = 0.3
alpha2 = 0.5
alpha = np.array([alpha1, alpha2])
data_df[["x1_adstock", "x2_adstock"]] = geometric_adstock(
x=data_df[["x1", "x2"]], alpha=alpha, l_max=l_max, normalize=True
).eval()
Let’s visualize the raw and transformed data:
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
sns.lineplot(x="date", y="x1", data=data_df, color="C0", label="x1", ax=ax[0])
sns.lineplot(
x="date",
y="x1_adstock",
data=data_df,
color="C1",
label="x1_adstock",
ax=ax[0],
)
sns.lineplot(x="date", y="x2", data=data_df, color="C0", label="x2", ax=ax[1])
sns.lineplot(
x="date",
y="x2_adstock",
data=data_df,
color="C1",
label="x2_adstock",
ax=ax[1],
)
fig.suptitle("Adstock Transformation", fontsize=18, fontweight="bold")
Observe that the adstock transformation has the effect of smoothing the media signal.
We continue by applying the saturation transformation:
# apply saturation transformation
lam1 = 1.0
lam2 = 2.5
lam = np.array([lam1, lam2])
data_df[["x1_adstock_saturated", "x2_adstock_saturated"]] = logistic_saturation(
x=data_df[["x1_adstock", "x2_adstock"]], lam=lam
).eval()
This transformation simply compresses peaks. Let’s first visualize the saturation function (which depends on a parameter \(\lambda > 0\)).
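For reference, the saturation function implemented above is

$$s(x; \lambda) = \frac{1 - e^{-\lambda x}}{1 + e^{-\lambda x}} = \tanh\left(\frac{\lambda x}{2}\right),$$

which maps \([0, \infty)\) onto \([0, 1)\); the larger \(\lambda\), the faster the signal saturates.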
x_range = np.linspace(start=0, stop=1.4, num=100)
fig, ax = plt.subplots(figsize=(8, 7))
sns.lineplot(
data=data_df,
x="x1_adstock",
y="x1_adstock_saturated",
linewidth=3,
label="x1",
ax=ax,
)
sns.lineplot(
x=x_range,
y=logistic_saturation(x=x_range, lam=lam1).eval(),
color="C0",
linestyle="--",
ax=ax,
)
sns.lineplot(
data=data_df,
x="x2_adstock",
y="x2_adstock_saturated",
linewidth=3,
label="x2",
ax=ax,
)
sns.lineplot(
x=x_range,
y=logistic_saturation(x=x_range, lam=lam2).eval(),
color="C1",
linestyle="--",
ax=ax,
)
ax.axline(
(0, 0), slope=1, color="black", linestyle="--", linewidth=2.5, label="diagonal"
)
ax.legend(loc="upper left")
ax.set_title("Saturation Transformation", fontsize=18, fontweight="bold")
We can now compare the initial signal and how it is transformed by these two non-linear transformations:
fig, ax = plt.subplots(
nrows=3, ncols=2, figsize=(15, 10), sharex=True, sharey=False, layout="constrained"
)
sns.lineplot(x="date", y="x1", data=data_df, color="C0", ax=ax[0, 0])
sns.lineplot(x="date", y="x2", data=data_df, color="C1", ax=ax[0, 1])
sns.lineplot(x="date", y="x1_adstock", data=data_df, color="C0", ax=ax[1, 0])
sns.lineplot(x="date", y="x2_adstock", data=data_df, color="C1", ax=ax[1, 1])
sns.lineplot(x="date", y="x1_adstock_saturated", data=data_df, color="C0", ax=ax[2, 0])
sns.lineplot(x="date", y="x2_adstock_saturated", data=data_df, color="C1", ax=ax[2, 1])
fig.autofmt_xdate()
fig.suptitle("Media Costs Data - Transformed", fontsize=18, fontweight="bold")
Now we are ready to put everything together and generate the target variable \(y\). We just need to apply some weights \(\beta\) to the transformed media data (i.e. specifying the regression coefficients).
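Concretely, writing \(\tilde{x}_i\) for the adstocked-and-saturated version of channel \(i\), the snippet below generates

$$y = 100 \times \left(\text{trend} + \text{seasonality} + z + \beta_1 \tilde{x}_1 + \beta_2 \tilde{x}_2 + \varepsilon\right)$$

with \(\beta_1 = 2.0\) and \(\beta_2 = 1.5\).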
beta1 = 2.0
beta2 = 1.5
amplitude = 100
data_df = data_df.eval(
f"""
x1_effect = {beta1} * x1_adstock_saturated
x2_effect = {beta2} * x2_adstock_saturated
y = {amplitude} * (trend + seasonality + z + x1_effect + x2_effect + epsilon)
y01 = {amplitude} * (trend + seasonality + z + x2_effect + epsilon)
y02 = {amplitude} * (trend + seasonality + z + x1_effect + epsilon)
"""
)
Finally, we can see the components and the final target variable:
fig, ax = plt.subplots(
nrows=2,
ncols=1,
figsize=(15, 10),
sharex=True,
sharey=False,
layout="constrained",
)
sns.lineplot(
data=data_df,
x="date",
y="trend",
color="C0",
alpha=0.5,
label="trend",
ax=ax[0],
)
sns.lineplot(
data=data_df,
x="date",
y="seasonality",
color="C1",
alpha=0.5,
label="seasonality",
ax=ax[0],
)
sns.lineplot(data=data_df, x="date", y="z", color="C2", alpha=0.5, label="z", ax=ax[0])
sns.lineplot(
data=data_df,
x="date",
y="x1_effect",
color="C3",
linewidth=2.5,
label="x1_effect",
ax=ax[0],
)
sns.lineplot(
data=data_df,
x="date",
y="x2_effect",
color="C4",
linewidth=2.5,
label="x2_effect",
ax=ax[0],
)
sns.lineplot(
data=data_df,
x="date",
y="epsilon",
color="C5",
linewidth=2.5,
label="epsilon",
ax=ax[0],
)
ax[0].legend(loc="upper left")
ax[0].set(title="Components", xlabel="date", ylabel=None)
sns.lineplot(
data=data_df,
x="date",
y="y",
color="black",
linewidth=2.5,
label="100 x (trend + seasonality + z + x1_effect + x2_effect + epsilon)",
ax=ax[1],
)
ax[1].legend(loc="lower right")
ax[1].set(title="Target")
fig.suptitle(t="Data Generating Process", fontsize=18, fontweight="bold")
Remark: We are computing the quantities y01 and y02 as the potential outcome sales when channels \(x_1\) and \(x_2\), respectively, are set to \(0\). This is useful to compute the global ROAS (see below).
data_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131 entries, 0 to 130
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 131 non-null datetime64[ns]
1 dayofyear 131 non-null int32
2 quarter 131 non-null object
3 trend 131 non-null float64
4 cs 131 non-null float64
5 cc 131 non-null float64
6 seasonality 131 non-null float64
7 z 131 non-null float64
8 x1 131 non-null float64
9 x2 131 non-null float64
10 epsilon 131 non-null float64
11 x1_adstock 131 non-null float64
12 x2_adstock 131 non-null float64
13 x1_adstock_saturated 131 non-null float64
14 x2_adstock_saturated 131 non-null float64
15 x1_effect 131 non-null float64
16 x2_effect 131 non-null float64
17 y 131 non-null float64
18 y01 131 non-null float64
19 y02 131 non-null float64
dtypes: datetime64[ns](1), float64(17), int32(1), object(1)
memory usage: 20.1+ KB
ROAS
One of the key outputs of media mix models is the ROAS (return on advertising spend) which is defined as the ratio between the incremental sales and the media spend. In the following snippet of code we compute the ROAS analytically for each channel.
Recall from “Media Mix Model Calibration With Bayesian Priors”, by Zhang, et al., the definition of ROAS:
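In the notation of this notebook (the paper states it more generally), the ROAS of channel \(m\) over a change period \([t_0, t_1]\) reads

$$\text{ROAS}_m = \frac{\sum_{t} \left(y_t - \tilde{y}^{(m)}_t\right)}{\sum_{t_0 \leq t \leq t_1} x_{m, t}},$$

where \(\tilde{y}^{(m)}\) denotes the potential outcome sales with channel \(m\) turned off during the change period. Note that the numerator sums over all dates, so that carryover beyond the change period is captured.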
We compute a global ROAS as well as a quarterly ROAS. One has to be careful with the computation as we want to make sure we consider the carryover effects of the media channels. For example, if we compute the ROAS for the first week of the year, we need to consider the carryover effects of the media spend from the previous year.
The following function generates the sales when a given channel is turned off during a given period (note that it mimics the data generating process described above). In the language of causal inference, this function computes the potential outcome when a given channel is turned off.
def generate_potential_y(
dataset, alpha, l_max, lambda_, beta1, beta2, amplitude, channel, t0, t1
):
dataset = dataset.copy()
dataset["mask"] = ~dataset["date"].between(left=t0, right=t1, inclusive="both")
dataset[channel] = dataset[channel].where(cond=dataset["mask"], other=0)
dataset[["x1_adstock", "x2_adstock"]] = geometric_adstock(
x=dataset[["x1", "x2"]], alpha=alpha, l_max=l_max, normalize=True
).eval()
dataset[["x1_adstock_saturated", "x2_adstock_saturated"]] = logistic_saturation(
x=dataset[["x1_adstock", "x2_adstock"]], lam=lambda_
).eval()
return dataset.eval(
f"""
x1_effect = {beta1} * x1_adstock_saturated
x2_effect = {beta2} * x2_adstock_saturated
y = {amplitude} * (trend + seasonality + z + x1_effect + x2_effect + epsilon)
"""
)
Let’s verify that this function reproduces our dataset above when we do not turn off any channel. We achieve this by setting the t0 and t1 arguments (which define the turn-off period) outside of the data range.
for channel in ["x1", "x2"]:
temp_df = generate_potential_y(
dataset=data_df,
alpha=alpha,
l_max=l_max,
lambda_=lam,
beta1=beta1,
beta2=beta2,
amplitude=amplitude,
channel=channel,
t0=min_date - pd.Timedelta(days=7),
t1=min_date - pd.Timedelta(days=7),
)
pd.testing.assert_frame_equal(left=temp_df[data_df.columns], right=data_df)
We proceed to generate the potential outcome sales for each channel and each quarter.
quarter_ranges = data_df.groupby("quarter", as_index=False).agg(
{"date": ["min", "max"]}
)
quarter_ranges.head()
|   | quarter | date (min) | date (max) |
|---|---------|------------|------------|
| 0 | 2021Q4  | 2021-10-02 | 2021-12-25 |
| 1 | 2022Q1  | 2022-01-01 | 2022-03-26 |
| 2 | 2022Q2  | 2022-04-02 | 2022-06-25 |
| 3 | 2022Q3  | 2022-07-02 | 2022-09-24 |
| 4 | 2022Q4  | 2022-10-01 | 2022-12-31 |
roas_data = {
channel: {
row["quarter"].item(): generate_potential_y(
dataset=data_df,
alpha=alpha,
l_max=l_max,
lambda_=lam,
beta1=beta1,
beta2=beta2,
amplitude=amplitude,
channel=channel,
t0=row["date"]["min"],
t1=row["date"]["max"],
)
for _, row in tqdm(quarter_ranges.iterrows())
}
for channel in ["x1", "x2"]
}
We can visualize the potential outcome against the true sales for each quarter for the first channel \(x_1\).
fig, axes = plt.subplots(
nrows=quarter_ranges.quarter.nunique(),
ncols=1,
figsize=(15, 17),
sharex=True,
sharey=True,
layout="constrained",
)
axes = axes.flatten()
for i, (q, q_df) in enumerate(roas_data["x1"].items()):
ax = axes[i]
ax_twin = ax.twinx()
sns.lineplot(
data=data_df, x="date", y="y", color="black", label="$y (actual)$", ax=ax
)
sns.lineplot(data=q_df, x="date", y="y", color="C0", label="$y (potential)$", ax=ax)
sns.lineplot(data=q_df, x="date", y="mask", color="C1", ax=ax_twin)
ax.set_title(f"Quarter {q}")
ax_twin.grid(None)
fig.suptitle(t="Actual vs Potential - x1", fontsize=18, fontweight="bold")
Observe that the potential sales and the true sales differ even immediately after the mask period, as a result of the carryover effect of the media spend. Next, we compute the ROAS for each quarter by simply dividing the difference between the true sales and the potential sales by the media spend in that quarter.
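That is, for a quarter \(q\),

$$\text{ROAS}_{x_1, q} = \frac{\sum_{t} y_t - \sum_{t} \tilde{y}^{(x_1, q)}_t}{\sum_{t \in q} x_{1, t}},$$

where the sums in the numerator run over the full date range (not just the quarter), so that the post-quarter carryover is included.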
roas_x1 = pd.DataFrame.from_dict(
data={
q: (
(data_df["y"].sum() - q_df["y"].sum())
/ data_df.query("quarter == @q")["x1"].sum()
)
for q, q_df in roas_data["x1"].items()
},
orient="index",
columns=["roas"],
).assign(channel="x1")
We can do the exact same procedure for channel \(x_2\).
fig, axes = plt.subplots(
nrows=quarter_ranges.quarter.nunique(),
ncols=1,
figsize=(15, 17),
sharex=True,
sharey=True,
layout="constrained",
)
axes = axes.flatten()
for i, (q, q_df) in enumerate(roas_data["x2"].items()):
ax = axes[i]
ax_twin = ax.twinx()
sns.lineplot(
data=data_df, x="date", y="y", color="black", label="$y (actual)$", ax=ax
)
sns.lineplot(data=q_df, x="date", y="y", color="C0", label="$y (potential)$", ax=ax)
sns.lineplot(data=q_df, x="date", y="mask", color="C1", ax=ax_twin)
ax.set_title(f"Quarter {q}")
ax_twin.grid(None)
fig.suptitle(t="Actual vs Potential - x2", fontsize=18, fontweight="bold")
roas_x2 = pd.DataFrame.from_dict(
data={
q: (
(data_df["y"].sum() - q_df["y"].sum())
/ data_df.query("quarter == @q")["x2"].sum()
)
for q, q_df in roas_data["x2"].items()
},
orient="index",
columns=["roas"],
).assign(channel="x2")
Moreover, we can easily compute the global ROAS:
roas_all_1 = (data_df["y"] - data_df["y01"]).sum() / data_df["x1"].sum()
roas_all_2 = (data_df["y"] - data_df["y02"]).sum() / data_df["x2"].sum()
Let’s visualize the quarterly and global ROAS for each channel:
roas = (
pd.concat(objs=[roas_x1, roas_x2], axis=0)
.reset_index(drop=False)
.rename(columns={"index": "quarter"})
)
g = sns.catplot(
data=roas.assign(quarter_num=lambda df: df.quarter.str[4:]),
x="quarter",
y="roas",
row="channel",
hue="quarter_num",
kind="bar",
height=4,
aspect=3,
alpha=0.7,
sharey=False,
)
ax = g.axes.flatten()
ax[0].axhline(
y=roas_all_1, color="black", linestyle="--", linewidth=2.5, label="roas (all)"
)
ax[1].axhline(
y=roas_all_2, color="black", linestyle="--", linewidth=2.5, label="roas (all)"
)
g.fig.suptitle(t="True ROAS", fontsize=18, fontweight="bold", y=1.03)
We see that channel \(x_2\) has an approximately \(75\%\) higher ROAS than channel \(x_1\).
Data Preparation
Now that we have generated the data, we can proceed to prepare it for the media mix model. We start by selecting the features we would have in a real scenario. Note that in such a case we would not have x1_effect
or x2_effect
as this is precisely what we want to estimate.
columns_to_keep = [
"date",
"dayofyear",
"z",
"x1",
"x2",
"y",
]
model_df = data_df[columns_to_keep]
model_df.head()
|   | date | dayofyear | z | x1 | x2 | y |
|---|------|-----------|---|----|----|---|
| 0 | 2021-10-02 | 275 | 0.053693 | 0.646554 | 0.336188 | 199.329637 |
| 1 | 2021-10-09 | 282 | 1.045243 | 1.411917 | 0.203931 | 371.237041 |
| 2 | 2021-10-16 | 289 | 0.179703 | 0.837610 | 0.024026 | 272.215933 |
| 3 | 2021-10-23 | 296 | 0.140496 | 0.973612 | 0.120257 | 291.104040 |
| 4 | 2021-10-30 | 303 | 0.869155 | 1.415985 | 0.084630 | 386.243000 |
Next, we proceed to scale the data with a MaxAbsScaler
. For more details about the scaling please see the previous post Media Effect Estimation with PyMC: Adstock, Saturation & Diminishing Returns.
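As a quick reminder (a toy example, not part of our dataset), the MaxAbsScaler simply divides each column by its maximum absolute value, and stores that value in the scale_ attribute, which we will use later to map results back to the original scale:
# toy example: x is scaled by its maximum absolute value, 4.0
toy_scaler = MaxAbsScaler()
toy_scaled = toy_scaler.fit_transform(pd.DataFrame(data={"x": [1.0, 2.0, 4.0]}))
print(toy_scaled["x"].to_list())  # [0.25, 0.5, 1.0]
print(toy_scaler.scale_)  # [4.]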
date = model_df["date"]
index_scaler = MaxAbsScaler()
index_scaled = index_scaler.fit_transform(model_df.reset_index(drop=False)[["index"]])
target = "y"
target_scaler = MaxAbsScaler()
target_scaled = target_scaler.fit_transform(model_df[[target]])
channels = ["x1", "x2"]
channels_scaler = MaxAbsScaler()
channels_scaled = channels_scaler.fit_transform(model_df[channels])
controls = ["z"]
controls_scaler = MaxAbsScaler()
controls_scaled = controls_scaler.fit_transform(model_df[controls])
To model the yearly seasonality, we generate some Fourier modes to include in the regression model.
n_order = 3
periods = model_df["dayofyear"] / 365.25
fourier_features = pd.DataFrame(
{
f"{func}_order_{order}": getattr(np, func)(2 * np.pi * periods * order)
for order in range(1, n_order + 1)
for func in ("sin", "cos")
}
)
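These features enter the model linearly, so the implied seasonal component has the form

$$\text{seasonality}(t) = \sum_{k=1}^{3} \left(a_k \sin\left(\frac{2\pi k \, d_t}{365.25}\right) + b_k \cos\left(\frac{2\pi k \, d_t}{365.25}\right)\right),$$

where \(d_t\) is the day of the year and the coefficients \(a_k\) and \(b_k\) are estimated by the model (beta_fourier below).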
Now we are ready to start with the media mix model.
Model 1: Causal Model
In this first model we simply add the features from our causal graph. We should be able to recover the parameters and the ROAS of the data generating process. Before jumping into the model, let’s take a closer look at the causal structure.
Causal Model
It is very important to view media mix models from a causal perspective. We cannot simply add a bunch of variables and hope for the best! See for example Using Data Science for Bad Decision-Making: A Case Study. There is a systematic way to select the variables we should (and should not) add to the model. To see how this works in practice, let’s use DoWhy to get the set of adjustment variables for our model. First, we need to define the DAG (this is where domain knowledge about the marketing mechanism comes into play!).
gml_graph = """
graph [
directed 1
node [
id x1
label "x1"
]
node [
id x2
label "x2"
]
node [
id z
label "z"
]
node [
id seasonality
label "seasonality"
]
node [
id trend
label "trend"
]
node [
id y
label "y"
]
edge [
source seasonality
target x1
]
edge [
source z
target x1
]
edge [
source seasonality
target y
]
edge [
source trend
target y
]
edge [
source z
target y
]
edge [
source x1
target y
]
edge [
source x2
target y
]
]
"""
We want to see if we can find a set of variables which we can put in a regression model to estimate the effects of \(x_1\) and \(x_2\) on \(y\). Hence, we define two causal models:
causal_model_1 = CausalModel(
data=model_df,
graph=gml_graph,
treatment="x1",
outcome="y",
)
causal_model_2 = CausalModel(
data=model_df,
graph=gml_graph,
treatment="x2",
outcome="y",
)
causal_model_1.view_model()
We can now query for backdoor paths between the treatment and the outcome:
causal_model_1._graph.get_backdoor_paths(nodes1=["x1"], nodes2=["y"])
[['x1', 'z', 'y'], ['x1', 'seasonality', 'y']]
causal_model_2._graph.get_backdoor_paths(nodes1=["x2"], nodes2=["y"])
[]
Note that for \(x_1\) we need to adjust for \(z\) and seasonality, otherwise we would have open backdoor paths:
causal_model_1._graph.check_valid_backdoor_set(nodes1=["x1"], nodes2=["y"], nodes3=[])
{'is_dseparated': False}
Therefore, the set of variables we should include in the model is \(x_1\), \(x_2\), \(z\), seasonality, and trend (the trend component will allow us to reduce the variance of the estimate).
causal_model_1._graph.check_valid_backdoor_set(
nodes1=["x1"], nodes2=["y"], nodes3=["z", "seasonality", "trend"]
)
{'is_dseparated': True}
causal_model_1._graph.check_valid_backdoor_set(
nodes1=["x2"], nodes2=["y"], nodes3=["z", "seasonality", "trend"]
)
{'is_dseparated': True}
Prior Specification
Bayesian models provide great flexibility to add domain knowledge into the model through the prior distributions. PreliZ
is a great library to study and set priors.
For example, what should the prior for the adstock parameter \(\alpha\) be? We know it should be between zero and one. We also do not expect the carryover to be very strong. Hence, we can use a Beta distribution with the mass shifted a bit towards zero.
fig, ax = plt.subplots(figsize=(10, 6))
pz.Beta(alpha=2, beta=3).plot_pdf(ax=ax)
ax.set(xlabel=r"$\alpha$")
ax.set_title(label=r"Prior Distribution $\alpha$", fontsize=18, fontweight="bold")
For \(\lambda\) we can argue similarly: it is a positive number and we expect a mild saturation. Hence, we can use a Gamma distribution with mean close to one. Still, we make sure it is not concentrated too much around one, in order to allow for some flexibility.
fig, ax = plt.subplots(figsize=(10, 6))
pz.Gamma(alpha=2, beta=2).plot_pdf(ax=ax)
ax.set(xlabel=r"$\lambda$")
ax.set_title(label=r"Prior Distribution $\lambda$", fontsize=18, fontweight="bold")
A much harder choice of prior is for the regression coefficients of the media variables. Without any prior information on channel performance, a common choice is a weakly informative prior driven by the investment share. The motivation behind it is that we expect (without seeing the data!) channels where we spend the most to contribute more to sales. This might turn out not to be the case once we see the data. Nevertheless, it is good guidance if you do not have any other information. One could use the same prior for all channels, but I have seen in practice that, given the small sample sizes, smaller channels can get very high contributions just because of noise (I have also tested this via simulations). It is important to emphasize that there is no right answer here. It is a matter of what you believe and what you want to test.
For this example, let’s use such a prior. In addition, note that we expect the effect to always be positive. Hence, we can use a half-normal distribution to enforce this.
channel_share = data_df[channels].sum() / data_df[channels].sum().sum()
fig, ax = plt.subplots(figsize=(10, 6))
sns.barplot(
x=channel_share.index, y=channel_share.values, hue=channel_share.index, ax=ax
)
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1))
ax.set(xlabel="channel", ylabel="share")
ax.set_title(label="Channel Spend Share", fontsize=18, fontweight="bold")
fig, ax = plt.subplots(figsize=(10, 6))
pz.HalfNormal(sigma=channel_share.loc["x1"]).plot_pdf(ax=ax)
pz.HalfNormal(sigma=channel_share.loc["x2"]).plot_pdf(ax=ax)
ax.set(xlabel=r"$\beta$")
ax.set_title(label=r"Prior Distribution $\beta$", fontsize=18, fontweight="bold")
Model Specification
We are now ready to specify the model. We will not go into details, as we are implementing the classical model from “Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects”, by Jin, et al., which is described in detail in the previous post Media Effect Estimation with PyMC: Adstock, Saturation & Diminishing Returns.
The only difference is that we do not add an explicit trend variable; instead, we use a Gaussian process to model this latent component (and therefore we also remove the intercept), as described in the post Estimating the Long-Term Base in Marketing-Mix Models. We are using the same technique as described in the example Time Series Modeling with HSGP: Baby Births Example.
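To summarize, on the scaled space the model below has the structure

$$\mu_t = f_{\text{trend}}(t) + \sum_{c} \beta_c \, s\big(\text{adstock}(x_c; \alpha_c)_t; \lambda_c\big) + \beta_z z_t + \sum_{k} \beta^{\text{fourier}}_k \phi_k(t), \qquad y_t \sim \text{StudentT}(\nu, \mu_t, \sigma),$$

where \(f_{\text{trend}}\) is the HSGP trend component, \(s\) is the logistic saturation function, and \(\phi_k\) are the Fourier features defined above.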
coords = {"date": date, "channel": channels, "fourier_mode": np.arange(2 * n_order)}
with pm.Model(coords=coords) as model_1:
# --- Data Containers ---
index_scaled_data = pm.Data(
name="index_scaled",
value=index_scaled.to_numpy().flatten(),
mutable=True,
dims="date",
)
channels_scaled_data = pm.Data(
name="channels_scaled",
value=channels_scaled,
mutable=True,
dims=("date", "channel"),
)
z_scaled_data = pm.Data(
name="z_scaled",
value=controls_scaled.to_numpy().flatten(),
mutable=True,
dims="date",
)
fourier_features_data = pm.Data(
name="fourier_features",
value=fourier_features,
mutable=True,
dims=("date", "fourier_mode"),
)
y_scaled_data = pm.Data(
name="y_scaled",
value=target_scaled.to_numpy().flatten(),
mutable=True,
dims="date",
)
# --- Priors ---
amplitude_trend = pm.HalfNormal(name="amplitude_trend", sigma=1)
ls_trend_params = pm.find_constrained_prior(
distribution=pm.InverseGamma,
lower=0.1,
upper=0.9,
init_guess={"alpha": 3, "beta": 1},
mass=0.95,
)
ls_trend = pm.InverseGamma(name="ls_trend", **ls_trend_params)
alpha = pm.Beta(name="alpha", alpha=2, beta=3, dims="channel")
lam = pm.Gamma(name="lam", alpha=2, beta=2, dims="channel")
beta_channel = pm.HalfNormal(
name="beta_channel",
sigma=channel_share.to_numpy(),
dims="channel",
)
beta_z = pm.Normal(name="beta_z", mu=0, sigma=1)
beta_fourier = pm.Normal(name="beta_fourier", mu=0, sigma=1, dims="fourier_mode")
sigma = pm.HalfNormal(name="sigma", sigma=1)
nu = pm.Gamma(name="nu", alpha=25, beta=2)
# --- Parametrization ---
cov_trend = amplitude_trend * pm.gp.cov.ExpQuad(input_dim=1, ls=ls_trend)
gp_trend = pm.gp.HSGP(m=[20], c=1.5, cov_func=cov_trend)
f_trend = gp_trend.prior(name="f_trend", X=index_scaled_data[:, None], dims="date")
channel_adstock = pm.Deterministic(
name="channel_adstock",
var=geometric_adstock(
x=channels_scaled_data,
alpha=alpha,
l_max=l_max,
normalize=True,
),
dims=("date", "channel"),
)
channel_adstock_saturated = pm.Deterministic(
name="channel_adstock_saturated",
var=logistic_saturation(x=channel_adstock, lam=lam),
dims=("date", "channel"),
)
channel_contributions = pm.Deterministic(
name="channel_contributions",
var=channel_adstock_saturated * beta_channel,
dims=("date", "channel"),
)
fourier_contribution = pm.Deterministic(
name="fourier_contribution",
var=pt.dot(fourier_features_data, beta_fourier),
dims="date",
)
z_contribution = pm.Deterministic(
name="z_contribution",
var=z_scaled_data * beta_z,
dims="date",
)
mu = pm.Deterministic(
name="mu",
var=f_trend
+ channel_contributions.sum(axis=-1)
+ z_contribution
+ fourier_contribution,
dims="date",
)
# --- Likelihood ---
y = pm.StudentT(
name="y",
nu=nu,
mu=mu,
sigma=sigma,
observed=y_scaled_data,
dims="date",
)
pm.model_to_graphviz(model=model_1)
Prior Predictive Checks
Before fitting the model, it is always a good idea to check the prior predictive distribution. This is a way to see if the model is able to generate data that is consistent with our prior beliefs. We can do this by sampling from the prior and plotting the results.
with model_1:
prior_predictive_1 = pm.sample_prior_predictive(samples=2_000, random_seed=rng)
fig, ax = plt.subplots()
az.plot_ppc(data=prior_predictive_1, group="prior", kind="kde", ax=ax)
ax.set_title(label="Prior Predictive - Model 1", fontsize=18, fontweight="bold")
It looks quite reasonable.
Model Fitting
We now fit the model using the No-U-Turn Sampler (NUTS) algorithm. We will run four chains in parallel and sample \(2000\) draws from each chain. We use NumPyro
as the backend for the model fitting (it is super fast).
with model_1:
idata_1 = pm.sample(
target_accept=0.9,
draws=2_000,
chains=4,
nuts_sampler="numpyro",
random_seed=rng,
)
posterior_predictive_1 = pm.sample_posterior_predictive(
trace=idata_1, random_seed=rng
)
Model Diagnostics
First we verify we do not have divergences:
idata_1["sample_stats"]["diverging"].sum().item()
\(\displaystyle 0\)
Next, we look into the posterior distribution of the parameters.
var_names = [
"amplitude_trend",
"ls_trend",
"alpha",
"lam",
"beta_channel",
"beta_z",
"beta_fourier",
"sigma",
"nu",
]
az.summary(data=idata_1, var_names=var_names, round_to=3)
|   | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|------|----|--------|---------|-----------|---------|----------|----------|-------|
| amplitude_trend | 0.380 | 0.339 | 0.031 | 1.024 | 0.006 | 0.004 | 2216.743 | 3547.131 | 1.001 |
| ls_trend | 0.307 | 0.042 | 0.224 | 0.380 | 0.001 | 0.001 | 2068.505 | 2892.961 | 1.003 |
| alpha[x1] | 0.318 | 0.072 | 0.181 | 0.449 | 0.001 | 0.001 | 7797.458 | 6545.114 | 1.000 |
| alpha[x2] | 0.493 | 0.028 | 0.440 | 0.544 | 0.000 | 0.000 | 10193.826 | 5924.861 | 1.001 |
| lam[x1] | 0.973 | 0.452 | 0.276 | 1.806 | 0.006 | 0.004 | 5677.231 | 6177.363 | 1.000 |
| lam[x2] | 2.812 | 0.687 | 1.548 | 4.070 | 0.009 | 0.006 | 6103.450 | 5004.784 | 1.001 |
| beta_channel[x1] | 0.709 | 0.312 | 0.262 | 1.288 | 0.004 | 0.003 | 5211.349 | 6102.915 | 1.000 |
| beta_channel[x2] | 0.404 | 0.094 | 0.268 | 0.581 | 0.001 | 0.001 | 5946.573 | 4792.955 | 1.001 |
| beta_z | 0.190 | 0.017 | 0.158 | 0.222 | 0.000 | 0.000 | 7142.269 | 5758.734 | 1.000 |
| beta_fourier[0] | 0.000 | 0.008 | -0.015 | 0.012 | 0.000 | 0.000 | 3680.755 | 3103.355 | 1.000 |
| beta_fourier[1] | 0.047 | 0.007 | 0.034 | 0.060 | 0.000 | 0.000 | 5009.843 | 3406.016 | 1.001 |
| beta_fourier[2] | -0.055 | 0.004 | -0.063 | -0.047 | 0.000 | 0.000 | 7868.972 | 6689.172 | 1.001 |
| beta_fourier[3] | -0.004 | 0.002 | -0.008 | 0.000 | 0.000 | 0.000 | 10720.589 | 7004.293 | 1.001 |
| beta_fourier[4] | -0.004 | 0.002 | -0.008 | -0.000 | 0.000 | 0.000 | 10558.562 | 6143.012 | 1.000 |
| beta_fourier[5] | -0.000 | 0.002 | -0.004 | 0.004 | 0.000 | 0.000 | 10858.069 | 5358.714 | 1.000 |
| sigma | 0.015 | 0.001 | 0.013 | 0.018 | 0.000 | 0.000 | 9595.037 | 5908.872 | 1.000 |
| nu | 12.606 | 2.482 | 8.049 | 17.155 | 0.023 | 0.017 | 12003.132 | 6203.788 | 1.001 |
axes = az.plot_trace(
data=idata_1,
var_names=var_names,
compact=True,
kind="rank_bars",
backend_kwargs={"figsize": (15, 17), "layout": "constrained"},
)
plt.gcf().suptitle("Trace - Model 1", fontsize=16)
The trace looks fine!
Posterior Predictive Checks
We now want to see the model fit. We want to do it in the original scale. Therefore we need to scale our posterior samples back.
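Because the inverse transform of a MaxAbsScaler is just multiplication by its scale_ attribute, the back-transformation for a single variable is equivalent to, for example:
# equivalent shortcut for the one-dimensional target scaler
mu_original_scale = idata_1["posterior"]["mu"] * target_scaler.scale_.item()
Below we use xr.apply_ufunc instead, so that the same pattern works for variables of any shape (e.g. the per-channel contributions).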
pp_vars_original_scale_1 = {
var_name: xr.apply_ufunc(
target_scaler.inverse_transform,
idata_1["posterior"][var_name].expand_dims(dim={"_": 1}, axis=-1),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
for var_name in ["mu", "f_trend", "fourier_contribution", "z_contribution"]
}
pp_vars_original_scale_1["channel_contributions"] = xr.apply_ufunc(
target_scaler.inverse_transform,
idata_1["posterior"]["channel_contributions"],
input_core_dims=[["date", "channel"]],
output_core_dims=[["date", "channel"]],
vectorize=True,
)
# We define the baseline as f_trend + fourier_contribution + z_contribution
pp_vars_original_scale_1["baseline"] = xr.apply_ufunc(
target_scaler.inverse_transform,
(
idata_1["posterior"]["f_trend"]
+ idata_1["posterior"]["fourier_contribution"]
+ idata_1["posterior"]["z_contribution"]
).expand_dims(dim={"_": 1}, axis=-1),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
We also scale back the likelihood.
pp_likelihood_original_scale_1 = xr.apply_ufunc(
target_scaler.inverse_transform,
posterior_predictive_1["posterior_predictive"]["y"].expand_dims(
dim={"_": 1}, axis=-1
),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
We can now visualize the results:
fig, ax = plt.subplots()
sns.lineplot(data=model_df, x="date", y="y", color="black", label=target, ax=ax)
az.plot_hdi(
x=date,
y=pp_likelihood_original_scale_1,
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.1, "label": r"likelihood $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["mu"],
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.3, "label": r"$\mu$ $94\%$ HDI"},
smooth=False,
ax=ax,
)
ax.legend(loc="upper left")
ax.set_title(label="Posterior Predictive - Model 1", fontsize=18, fontweight="bold")
The model does explain the data variance quite well. Let’s look into the distribution:
fig, ax = plt.subplots()
az.plot_ppc(
data=posterior_predictive_1,
num_pp_samples=1_000,
observed_rug=True,
random_seed=seed,
ax=ax,
)
ax.set_title(label="Posterior Predictive - Model 1", fontsize=18, fontweight="bold")
Looks good!
Model Components
We now visualize the model components.
fig, ax = plt.subplots()
sns.lineplot(data=model_df, x="date", y="y", color="black", label=target, ax=ax)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["channel_contributions"].sel(channel="x1"),
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.5, "label": r"$x1$ contribution $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["channel_contributions"].sel(channel="x2"),
hdi_prob=0.94,
color="C1",
fill_kwargs={"alpha": 0.5, "label": r"$x2$ contribution $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["f_trend"],
hdi_prob=0.94,
color="C2",
fill_kwargs={"alpha": 0.3, "label": r"$f_\text{trend}$ $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["fourier_contribution"],
hdi_prob=0.94,
color="C3",
fill_kwargs={"alpha": 0.3, "label": r"seasonality $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["z_contribution"],
hdi_prob=0.94,
color="C4",
fill_kwargs={"alpha": 0.3, "label": r"$z$ contribution $94\%$ HDI"},
smooth=False,
ax=ax,
)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=3)
ax.set_title(label="Components Contributions - Model 1", fontsize=18, fontweight="bold")
Remarks:
- The Gaussian process is able to capture the trend component quite well.
- Note that the uncertainty of the \(x_1\) estimate is much larger than the one for \(x_2\). This is due to the fact that \(x_1\) has a high correlation with the seasonality component. Multicollinearity is a common issue in media mix models.
We now plot the baseline against the media contribution:
fig, ax = plt.subplots()
sns.lineplot(data=model_df, x="date", y="y", color="black", label=target, ax=ax)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["channel_contributions"].sel(channel="x1"),
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.5, "label": r"$x1$ contribution $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["channel_contributions"].sel(channel="x2"),
hdi_prob=0.94,
color="C1",
fill_kwargs={"alpha": 0.5, "label": r"$x2$ contribution $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["baseline"],
hdi_prob=0.94,
color="C7",
fill_kwargs={"alpha": 0.5, "label": r"baseline $94\%$ HDI"},
smooth=False,
ax=ax,
)
ax.legend(loc="upper left")
ax.set_title(label="Baseline vs Channels - Model 1", fontsize=18, fontweight="bold")
Let’s now deep dive into the media contributions:
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
sns.lineplot(
x="date",
y="x1_effect",
data=data_df.assign(x1_effect=lambda x: amplitude * x["x1_effect"]),
color="C0",
label=r"$x1$ effect",
ax=ax[0],
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["channel_contributions"].sel(channel="x1"),
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.3, "label": r"$x1$ contribution $94\%$ HDI"},
smooth=False,
ax=ax[0],
)
ax[0].legend(loc="upper right")
sns.lineplot(
x="date",
y="x2_effect",
data=data_df.assign(x2_effect=lambda x: amplitude * x["x2_effect"]),
color="C1",
label=r"$x2$ effect",
ax=ax[1],
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_1["channel_contributions"].sel(channel="x2"),
hdi_prob=0.94,
color="C1",
fill_kwargs={"alpha": 0.3, "label": r"$x2$ contribution $94\%$ HDI"},
smooth=False,
ax=ax[1],
)
ax[1].legend(loc="upper right")
fig.suptitle("Channel Contributions - Model 1", fontsize=18, fontweight="bold")
Overall, the model recovered the true effects. Still, the effect of \(x_2\) is slightly underestimated (the true value is not in the middle of the HDI).
Media Parameters
We now compare the estimated values against the true values for the parameters \(\alpha\), \(\lambda\), and the regression coefficients \(\beta\). Recall that, since we are working in the scaled space, we need to scale \(\lambda\) and \(\beta\) back:
alpha_posterior_1 = idata_1["posterior"]["alpha"]
lam_posterior_1 = idata_1["posterior"]["lam"] * channels_scaler.scale_
beta_channel_posterior_1 = (
idata_1["posterior"]["beta_channel"]
* (1 / channels_scaler.scale_)
* (target_scaler.scale_)
* (1 / amplitude)
)
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(
data=alpha_posterior_1, combined=True, hdi_prob=0.94, colors="black", ax=ax
)
ax.axvline(
x=alpha1, color="C0", linestyle="--", linewidth=2, label=r"$\alpha_1$ (true)"
)
ax.axvline(
x=alpha2, color="C1", linestyle="--", linewidth=2, label=r"$\alpha_2$ (true)"
)
ax.legend(loc="upper left")
ax.set_title(
label=r"Adstock Parameter $\alpha$ ($94\%$ HDI) - Model 1",
fontsize=18,
fontweight="bold",
)
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(
data=lam_posterior_1, combined=True, hdi_prob=0.94, colors="black", ax=ax
)
ax.axvline(x=lam1, color="C0", linestyle="--", linewidth=2, label=r"$\lambda_1$ (true)")
ax.axvline(x=lam2, color="C1", linestyle="--", linewidth=2, label=r"$\lambda_2$ (true)")
ax.legend(loc="upper right")
ax.set_title(
label=r"Saturation Parameter $\lambda$ ($94\%$ HDI) - Model 1",
fontsize=18,
fontweight="bold",
)
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(
data=beta_channel_posterior_1, combined=True, hdi_prob=0.94, colors="black", ax=ax
)
ax.axvline(x=beta1, color="C0", linestyle="--", linewidth=2, label=r"$\beta_1$ (true)")
ax.axvline(x=beta2, color="C1", linestyle="--", linewidth=2, label=r"$\beta_2$ (true)")
ax.legend(loc="upper right")
ax.set_title(
label=r"Channel $\beta$ ($94\%$ HDI) - Model 1", fontsize=18, fontweight="bold"
)
Remarks:
- We were able to recover the true values of the carryover parameter \(\alpha\).
- For \(x_1\), the true values of \(\lambda\) and \(\beta\) are in the \(94\%\) HDI.
- The \(94\%\) HDI of the regression coefficient \(\beta\) for \(x_1\) is much wider than the one for \(x_2\). As mentioned before, this is probably related to multicollinearity with the seasonal component.
- For \(x_2\) the model overestimated the saturation parameter \(\lambda\) and underestimated the regression coefficient \(\beta\). Maybe we should have used stricter priors? This is a true challenge in practice, as we do not know the true values. This is a motivation for this notebook! Being able to pass information through the ROAS, rather than through beta coefficients in the scaled space, is a much more convenient way to inform the model.
ROAS Estimation
Once the model is fitted, we can use the posterior samples to compute the ROAS as above. We simply need to generate predictions after setting the media data to zero. Let’s start with the global ROAS:
predictions_roas_data_all_1 = {"x1": {}, "x2": {}}
for channel in tqdm(["x1", "x2"]):
with model_1:
pm.set_data(
new_data={
"channels_scaled": channels_scaler.transform(
data_df[channels].assign(**{channel: 0})
)
}
)
predictions_roas_data_all_1[channel] = pm.sample_posterior_predictive(
trace=idata_1, var_names=["y", "mu"], progressbar=False, random_seed=rng
)
We now scale the predictions back to the original scale:
predictions_roas_data_scaled_diff_all_1 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_data_scaled_diff_all_1[channel] = pp_vars_original_scale_1[
"mu"
] - xr.apply_ufunc(
target_scaler.inverse_transform,
predictions_roas_data_all_1[channel]["posterior_predictive"]["mu"].expand_dims(
dim={"_": 1}, axis=-1
),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(
dim="_"
)
We can now visualize the difference in the potential outcomes:
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
az.plot_hdi(
x=date,
y=predictions_roas_data_scaled_diff_all_1["x1"],
color="C0",
smooth=False,
fill_kwargs={"alpha": 0.3, "label": r"$E(y - y_{01})$ $94\%$ HDI"},
ax=ax[0],
)
sns.lineplot(
data=data_df.assign(x1_effect=lambda x: amplitude * x["x1_effect"]),
x="date",
y="x1_effect",
color="C0",
label=r"$x1$ effect",
ax=ax[0],
)
az.plot_hdi(
x=date,
y=predictions_roas_data_scaled_diff_all_1["x2"],
color="C1",
smooth=False,
fill_kwargs={"alpha": 0.3, "label": r"$E(y - y_{02})$ $94\%$ HDI"},
ax=ax[1],
)
sns.lineplot(
data=data_df.assign(x2_effect=lambda x: amplitude * x["x2_effect"]),
x="date",
y="x2_effect",
color="C1",
label=r"$x2$ effect",
ax=ax[1],
)
ax[0].legend(loc="upper left")
ax[1].legend(loc="upper left")
ax[0].set(ylabel=None)
ax[1].set(ylabel=None)
fig.suptitle("Potential Outcomes Difference - Model 1", fontsize=18, fontweight="bold")
It is no surprise that this is precisely the same plot as the one we saw before, namely the media contributions from the model 🙃.
We now compute the ROAS:
predictions_roas_all_1 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_all_1[channel] = (
predictions_roas_data_scaled_diff_all_1[channel].sum(dim="date")
/ data_df[channel].sum()
)
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
az.plot_posterior(predictions_roas_all_1["x1"], ref_val=roas_all_1, ax=ax[0])
ax[0].set(title="x1")
az.plot_posterior(predictions_roas_all_1["x2"], ref_val=roas_all_2, ax=ax[1])
ax[1].set(title="x2", xlabel="ROAS")
fig.suptitle("Global ROAS - Model 1", fontsize=18, fontweight="bold")
We can proceed in an analogous way for the quarterly ROAS. This time we need to loop over the time series in which we set the quarterly spend to zero for a given channel.
predictions_roas_data_1 = {"x1": {}, "x2": {}}
for channel in tqdm(["x1", "x2"]):
for q in tqdm(roas_data[channel].keys()):
with model_1:
pm.set_data(
new_data={
"channels_scaled": channels_scaler.transform(
roas_data[channel][q][channels]
)
}
)
predictions_roas_data_1[channel][q] = pm.sample_posterior_predictive(
trace=idata_1, var_names=["y", "mu"], progressbar=False, random_seed=rng
)
predictions_roas_data_scaled_diff_1 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_data_scaled_diff_1[channel] = {
q: pp_vars_original_scale_1["mu"]
- xr.apply_ufunc(
target_scaler.inverse_transform,
idata_q["posterior_predictive"]["mu"].expand_dims(dim={"_": 1}, axis=-1),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
for q, idata_q in tqdm(predictions_roas_data_1[channel].items())
}
In order to get a better understanding of this computation, we can plot one such time series difference:
fig, ax = plt.subplots()
ax.axvspan(
xmin=quarter_ranges.loc[quarter_ranges.quarter == "2022Q3"]["date"]["min"].item(),
xmax=quarter_ranges.loc[quarter_ranges.quarter == "2022Q3"]["date"]["max"].item(),
color="C1",
label="2022-Q3",
alpha=0.2,
)
az.plot_hdi(
x=date,
y=predictions_roas_data_scaled_diff_1["x2"]["2022Q3"],
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.7, "label": r"$x2$ difference against actuals $94\%$ HDI"},
smooth=False,
ax=ax,
)
ax.legend(loc="upper right")
ax.set_title(
label=r"Potential Outcomes Difference - Model 1 (Channel $x2$ (2023-Q3))",
fontsize=18,
fontweight="bold",
)
Observe that the difference remains non-zero for a short time after the selected quarter. This is due to the carryover effect of the media spend.
We now divide the differences by the spend to get the ROAS:
predictions_roas_1 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_1[channel] = {
q: idata_q.sum(dim="date") / data_df.query("quarter == @q")[channel].sum()
for q, idata_q in predictions_roas_data_scaled_diff_1[channel].items()
}
Here are the results for \(x_1\), compared with the true values:
x1_ref_vals = (
roas.query("channel == 'x1'")
.drop("channel", axis=1)
.rename(columns={"roas": "ref_val"})
.to_dict(orient="records")
)
axes = az.plot_posterior(
data=xr.concat(
predictions_roas_1["x1"].values(),
dim=pd.Index(data=predictions_roas_1["x1"].keys(), name="quarter"),
),
figsize=(18, 15),
grid=(5, 2),
ref_val={"mu": x1_ref_vals},
backend_kwargs={"sharex": True, "layout": "constrained"},
)
fig = axes.ravel()[0].get_figure()
fig.suptitle(
t="Posterior Predictive - ROAS x1 - Model 1", fontsize=18, fontweight="bold"
)
Here are the results for \(x_2\):
x2_ref_vals = (
roas.query("channel == 'x2'")
.drop("channel", axis=1)
.rename(columns={"roas": "ref_val"})
.to_dict(orient="records")
)
axes = az.plot_posterior(
data=xr.concat(
predictions_roas_1["x2"].values(),
dim=pd.Index(data=predictions_roas_1["x2"].keys(), name="quarter"),
),
figsize=(18, 15),
grid=(5, 2),
ref_val={"mu": x2_ref_vals},
backend_kwargs={"sharex": True, "layout": "constrained"},
)
fig = axes.ravel()[0].get_figure()
fig.suptitle(
t="Posterior Predictive - ROAS x2 - Model 1", fontsize=18, fontweight="bold"
)
These results are consistent with the ones from the global ROAS. The model underestimates the ROAS for \(x_2\) (for \(x_1\) the differences lie within reasonable uncertainty estimates).
Model 2: Unobserved Confounder
In this second model we keep the same model structure as above but we remove the confounder \(z\) as a covariate. We want to see if the model is able to recover the true effects of the media channels. We should expect the model to overestimate the effect of \(x_1\) as a result of omitted variable bias.
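To build intuition for the expected direction of the bias, recall the classical omitted variable bias formula for linear regression (ignoring the non-linear media transformations): if the true model is \(y = \beta_1 x_1 + \beta_z z + \varepsilon\) and we regress \(y\) on \(x_1\) alone, then

$$E[\hat{\beta}_1] = \beta_1 + \beta_z \delta,$$

where \(\delta\) is the slope of a regression of \(z\) on \(x_1\). In our data generating process \(z\) has a positive effect on both \(x_1\) and \(y\), so \(\delta > 0\) and \(\beta_z > 0\), and we expect an upward bias.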
with pm.Model(coords=coords) as model_2:
# --- Data Containers ---
index_scaled_data = pm.Data(
name="index_scaled",
value=index_scaled.to_numpy().flatten(),
mutable=True,
dims="date",
)
channels_scaled_data = pm.Data(
name="channels_scaled",
value=channels_scaled,
mutable=True,
dims=("date", "channel"),
)
fourier_features_data = pm.Data(
name="fourier_features",
value=fourier_features,
mutable=True,
dims=("date", "fourier_mode"),
)
y_scaled_data = pm.Data(
name="y_scaled",
value=target_scaled.to_numpy().flatten(),
mutable=True,
dims="date",
)
# --- Priors ---
amplitude_trend = pm.HalfNormal(name="amplitude_trend", sigma=1)
ls_trend_params = pm.find_constrained_prior(
distribution=pm.InverseGamma,
lower=0.1,
upper=0.9,
init_guess={"alpha": 3, "beta": 1},
mass=0.95,
)
ls_trend = pm.InverseGamma(name="ls_trend", **ls_trend_params)
alpha = pm.Beta(name="alpha", alpha=2, beta=3, dims="channel")
lam = pm.Gamma(name="lam", alpha=2, beta=2, dims="channel")
beta_channel = pm.HalfNormal(
name="beta_channel",
sigma=channel_share.to_numpy(),
dims="channel",
)
beta_fourier = pm.Normal(name="beta_fourier", mu=0, sigma=1, dims="fourier_mode")
sigma = pm.HalfNormal(name="sigma", sigma=1)
nu = pm.Gamma(name="nu", alpha=25, beta=2)
# --- Parametrization ---
cov_trend = amplitude_trend * pm.gp.cov.ExpQuad(input_dim=1, ls=ls_trend)
gp_trend = pm.gp.HSGP(m=[20], c=1.5, cov_func=cov_trend)
f_trend = gp_trend.prior(name="f_trend", X=index_scaled_data[:, None], dims="date")
channel_adstock = pm.Deterministic(
name="channel_adstock",
var=geometric_adstock(
x=channels_scaled_data,
alpha=alpha,
l_max=l_max,
normalize=True,
),
dims=("date", "channel"),
)
channel_adstock_saturated = pm.Deterministic(
name="channel_adstock_saturated",
var=logistic_saturation(x=channel_adstock, lam=lam),
dims=("date", "channel"),
)
channel_contributions = pm.Deterministic(
name="channel_contributions",
var=channel_adstock_saturated * beta_channel,
dims=("date", "channel"),
)
fourier_contribution = pm.Deterministic(
name="fourier_contribution",
var=pt.dot(fourier_features_data, beta_fourier),
dims="date",
)
mu = pm.Deterministic(
name="mu",
var=f_trend + channel_contributions.sum(axis=-1) + fourier_contribution,
dims="date",
)
# --- Likelihood ---
y = pm.StudentT(
name="y",
nu=nu,
mu=mu,
sigma=sigma,
observed=y_scaled_data,
dims="date",
)
pm.model_to_graphviz(model=model_2)
The prior predictive check should be similar to the one above, so we skip it and fit the model directly.
with model_2:
idata_2 = pm.sample(
target_accept=0.9,
draws=2_000,
chains=4,
nuts_sampler="numpyro",
random_seed=rng,
)
posterior_predictive_2 = pm.sample_posterior_predictive(
trace=idata_2, random_seed=rng
)
Model Diagnostics
We inspect the trace of the model for divergences:
idata_2["sample_stats"]["diverging"].sum().item()
\(\displaystyle 0\)
var_names = [
"amplitude_trend",
"ls_trend",
"alpha",
"lam",
"beta_channel",
"beta_fourier",
"sigma",
"nu",
]
az.summary(data=idata_2, var_names=var_names, round_to=3)
|   | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|------|----|--------|---------|-----------|---------|----------|----------|-------|
| amplitude_trend | 0.296 | 0.298 | 0.017 | 0.867 | 0.006 | 0.005 | 1669.534 | 3017.828 | 1.001 |
| ls_trend | 0.337 | 0.049 | 0.243 | 0.426 | 0.001 | 0.001 | 1712.131 | 1506.935 | 1.001 |
| alpha[x1] | 0.140 | 0.037 | 0.068 | 0.206 | 0.000 | 0.000 | 5437.078 | 3453.508 | 1.001 |
| alpha[x2] | 0.481 | 0.043 | 0.396 | 0.560 | 0.000 | 0.000 | 8357.205 | 5956.070 | 1.001 |
| lam[x1] | 0.799 | 0.202 | 0.467 | 1.201 | 0.003 | 0.002 | 4217.968 | 5296.788 | 1.001 |
| lam[x2] | 2.090 | 0.584 | 1.041 | 3.153 | 0.009 | 0.006 | 4337.649 | 3596.350 | 1.001 |
| beta_channel[x1] | 1.485 | 0.334 | 0.915 | 2.099 | 0.005 | 0.004 | 4258.979 | 5367.265 | 1.001 |
| beta_channel[x2] | 0.488 | 0.127 | 0.289 | 0.730 | 0.002 | 0.001 | 4267.566 | 5275.431 | 1.001 |
| beta_fourier[0] | 0.006 | 0.007 | -0.004 | 0.017 | 0.000 | 0.000 | 3971.823 | 2878.822 | 1.001 |
| beta_fourier[1] | 0.028 | 0.007 | 0.016 | 0.038 | 0.000 | 0.000 | 4614.429 | 2660.869 | 1.001 |
| beta_fourier[2] | -0.027 | 0.005 | -0.036 | -0.019 | 0.000 | 0.000 | 6689.125 | 5829.273 | 1.001 |
| beta_fourier[3] | -0.001 | 0.003 | -0.007 | 0.005 | 0.000 | 0.000 | 10266.507 | 4704.094 | 1.000 |
| beta_fourier[4] | -0.004 | 0.003 | -0.009 | 0.002 | 0.000 | 0.000 | 13133.854 | 5397.527 | 1.001 |
| beta_fourier[5] | -0.004 | 0.003 | -0.010 | 0.002 | 0.000 | 0.000 | 11668.364 | 5770.083 | 1.002 |
| sigma | 0.022 | 0.002 | 0.019 | 0.026 | 0.000 | 0.000 | 6113.067 | 5754.287 | 1.000 |
| nu | 12.444 | 2.408 | 7.858 | 16.776 | 0.027 | 0.019 | 7514.359 | 5171.298 | 1.000 |
axes = az.plot_trace(
data=idata_2,
var_names=var_names,
compact=True,
kind="rank_bars",
backend_kwargs={"figsize": (15, 17), "layout": "constrained"},
)
plt.gcf().suptitle("Trace - Model 2", fontsize=16)
Overall, the trace looks fine. One big difference we immediately see is that the regression coefficient \(\beta_1\) of \(x_1\) increased significantly as compared to the previous model.
Posterior Predictive Checks
We run the same analysis to visualize the model fit. As before, we need to rescale the predictions back to the original scale.
pp_vars_original_scale_2 = {
var_name: xr.apply_ufunc(
target_scaler.inverse_transform,
idata_2["posterior"][var_name].expand_dims(dim={"_": 1}, axis=-1),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
for var_name in ["mu", "f_trend", "fourier_contribution"]
}
pp_vars_original_scale_2["channel_contributions"] = xr.apply_ufunc(
target_scaler.inverse_transform,
idata_2["posterior"]["channel_contributions"],
input_core_dims=[["date", "channel"]],
output_core_dims=[["date", "channel"]],
vectorize=True,
)
pp_likelihood_original_scale_2 = xr.apply_ufunc(
target_scaler.inverse_transform,
posterior_predictive_2["posterior_predictive"]["y"].expand_dims(
dim={"_": 1}, axis=-1
),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
fig, ax = plt.subplots()
sns.lineplot(data=model_df, x="date", y="y", color="black", label=target, ax=ax)
az.plot_hdi(
x=date,
y=pp_likelihood_original_scale_2,
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.1, "label": r"likelihood $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_2["mu"],
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.3, "label": r"$\mu$ $94\%$ HDI"},
smooth=False,
ax=ax,
)
ax.legend(loc="upper left")
ax.set_title(label="Posterior Predictive - Model 2", fontsize=18, fontweight="bold")
fig, ax = plt.subplots()
az.plot_ppc(
data=posterior_predictive_2,
num_pp_samples=1_000,
observed_rug=True,
random_seed=seed,
ax=ax,
)
ax.set_title(label="Posterior Predictive - Model 2", fontsize=18, fontweight="bold")
The models seem to have a comparable in-sample fit.
Model Components
Now we inspect the model components.
fig, ax = plt.subplots()
sns.lineplot(data=model_df, x="date", y="y", color="black", label=target, ax=ax)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_2["channel_contributions"].sel(channel="x1"),
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.5, "label": r"$x1$ contribution $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_2["channel_contributions"].sel(channel="x2"),
hdi_prob=0.94,
color="C1",
fill_kwargs={"alpha": 0.5, "label": r"$x2$ contribution $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_2["f_trend"],
hdi_prob=0.94,
color="C2",
fill_kwargs={"alpha": 0.3, "label": r"$f_\text{trend}$ $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_2["fourier_contribution"],
hdi_prob=0.94,
color="C3",
fill_kwargs={"alpha": 0.3, "label": r"seasonality $94\%$ HDI"},
smooth=False,
ax=ax,
)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=3)
ax.set_title(label="Components Contributions - Model 2", fontsize=18, fontweight="bold")
Here we clearly see that the model overestimates the effect of \(x_1\) by a lot! 🫠
Let’s compare against the true values for the media contributions:
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
sns.lineplot(
x="date",
y="x1_effect",
data=data_df.assign(x1_effect=lambda x: amplitude * x["x1_effect"]),
color="C0",
label=r"$x1$ effect",
ax=ax[0],
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_2["channel_contributions"].sel(channel="x1"),
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.3, "label": r"$x1$ contribution $94\%$ HDI"},
smooth=False,
ax=ax[0],
)
ax[0].legend(loc="upper right")
sns.lineplot(
x="date",
y="x2_effect",
data=data_df.assign(x2_effect=lambda x: amplitude * x["x2_effect"]),
color="C1",
label=r"$x2$ effect",
ax=ax[1],
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_2["channel_contributions"].sel(channel="x2"),
hdi_prob=0.94,
color="C1",
fill_kwargs={"alpha": 0.3, "label": r"$x2$ contribution $94\%$ HDI"},
smooth=False,
ax=ax[1],
)
ax[1].legend(loc="upper right")
fig.suptitle("Channel Contributions - Model 2", fontsize=18, fontweight="bold")
The overestimation of \(x_1\) is close to \(100\%\)! The estimate of \(x_2\) is not as far off, but it is still underestimated.
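We can put a rough number on this by comparing the posterior mean of the total \(x_1\) contribution with the true total effect (a quick sketch using the objects defined above):
# Relative bias of the estimated x1 contribution (posterior mean vs. truth).
x1_contribution_mean = (
    pp_vars_original_scale_2["channel_contributions"]
    .sel(channel="x1")
    .mean(dim=("chain", "draw"))
)
x1_true_effect = amplitude * data_df["x1_effect"]
relative_bias = (x1_contribution_mean.sum().item() / x1_true_effect.sum()) - 1
print(f"Relative bias of the x1 contribution: {relative_bias:.0%}")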
Media Parameters
We now look at the posterior distributions of the media parameters and compare them against the true values from the data generating process.
alpha_posterior_2 = idata_2["posterior"]["alpha"]
lam_posterior_2 = idata_2["posterior"]["lam"] * channels_scaler.scale_
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(
data=[alpha_posterior_2], combined=True, hdi_prob=0.94, colors="black", ax=ax
)
ax.axvline(
x=alpha1, color="C0", linestyle="--", linewidth=2, label=r"$\alpha_1$ (true)"
)
ax.axvline(
x=alpha2, color="C1", linestyle="--", linewidth=2, label=r"$\alpha_2$ (true)"
)
ax.legend(loc="upper left")
ax.set_title(
label=r"Adstock Parameter $\alpha$ ($94\%$ HDI) - Model 2",
fontsize=18,
fontweight="bold",
)
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(
data=lam_posterior_2, combined=True, hdi_prob=0.94, colors="black", ax=ax
)
ax.axvline(x=lam1, color="C0", linestyle="--", linewidth=2, label=r"$\lambda_1$ (true)")
ax.axvline(x=lam2, color="C1", linestyle="--", linewidth=2, label=r"$\lambda_2$ (true)")
ax.legend(loc="upper right")
ax.set_title(
label=r"Saturation Parameter $\lambda$ ($94\%$ HDI) - Model 2",
fontsize=18,
fontweight="bold",
)
beta_channel_posterior_2 = (
idata_2["posterior"]["beta_channel"]
* (1 / channels_scaler.scale_)
* (target_scaler.scale_)
* (1 / amplitude)
)
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(
data=beta_channel_posterior_2, combined=True, hdi_prob=0.94, colors="black", ax=ax
)
ax.axvline(x=beta1, color="C0", linestyle="--", linewidth=2, label=r"$\beta_1$ (true)")
ax.axvline(x=beta2, color="C1", linestyle="--", linewidth=2, label=r"$\beta_2$ (true)")
ax.legend(loc="upper right")
ax.set_title(
label=r"Channel $\beta$ ($94\%$ HDI) - Model 1", fontsize=18, fontweight="bold"
)
The carryover parameter \(\alpha_1\) and the regression coefficient \(\beta_1\) for \(x_1\) are the ones that are most affected by the omitted variable bias. For \(x_2\) the model is able to recover the true values within the \(94\%\) HDI.
ROAS Estimation
We now compute the ROAS as before. We start with the global ROAS:
predictions_roas_data_all_2 = {"x1": {}, "x2": {}}
for channel in tqdm(["x1", "x2"]):
with model_2:
pm.set_data(
new_data={
"channels_scaled": channels_scaler.transform(
data_df[channels].assign(**{channel: 0})
)
}
)
predictions_roas_data_all_2[channel] = pm.sample_posterior_predictive(
trace=idata_2, var_names=["y", "mu"], progressbar=False, random_seed=rng
)
predictions_roas_data_scaled_diff_all_2 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_data_scaled_diff_all_2[channel] = pp_vars_original_scale_2[
"mu"
] - xr.apply_ufunc(
target_scaler.inverse_transform,
predictions_roas_data_all_2[channel]["posterior_predictive"]["mu"].expand_dims(
dim={"_": 1}, axis=-1
),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(
dim="_"
)
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
az.plot_hdi(
x=date,
y=predictions_roas_data_scaled_diff_all_2["x1"],
color="C0",
smooth=False,
fill_kwargs={"alpha": 0.3, "label": r"$E(y - y_{01})$ $94\%$ HDI"},
ax=ax[0],
)
sns.lineplot(
data=data_df.assign(x1_effect=lambda x: amplitude * x["x1_effect"]),
x="date",
y="x1_effect",
color="C0",
label=r"$x1$ effect",
ax=ax[0],
)
az.plot_hdi(
x=date,
y=predictions_roas_data_scaled_diff_all_2["x2"],
color="C1",
smooth=False,
fill_kwargs={"alpha": 0.3, "label": r"$E(y - y_{02})$ $94\%$ HDI"},
ax=ax[1],
)
sns.lineplot(
data=data_df.assign(x2_effect=lambda x: amplitude * x["x2_effect"]),
x="date",
y="x2_effect",
color="C1",
label=r"$x2$ effect",
ax=ax[1],
)
ax[0].legend(loc="upper left")
ax[1].legend(loc="upper left")
ax[0].set(ylabel=None)
ax[1].set(ylabel=None)
fig.suptitle("Potential Outcomes Difference - Model 2", fontsize=18, fontweight="bold")
predictions_roas_all_2 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_all_2[channel] = (
predictions_roas_data_scaled_diff_all_2[channel].sum(dim="date")
/ data_df[channel].sum()
)
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
az.plot_posterior(predictions_roas_all_2["x1"], ref_val=roas_all_1, ax=ax[0])
ax[0].set(title="x1")
az.plot_posterior(predictions_roas_all_2["x2"], ref_val=roas_all_2, ax=ax[1])
ax[1].set(title="x2", xlabel="ROAS")
fig.suptitle("Global ROAS - Model 2", fontsize=18, fontweight="bold")
The ROAS overestimation of \(x_1\) is massive: it is of the order of \(100\%\)! For \(x_2\) the estimated global ROAS is lower than the true value, but the bias is less dramatic than for \(x_1\).
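As a quick sketch, we can quantify the bias directly from the posterior (roas_all_1 and roas_all_2 are the true global ROAS values computed earlier):
# Posterior mean of the global ROAS vs. the true values.
for channel, roas_true in zip(["x1", "x2"], [roas_all_1, roas_all_2]):
    roas_mean = predictions_roas_all_2[channel].mean().item()
    print(f"{channel}: posterior mean ROAS {roas_mean:.2f} vs. true {roas_true:.2f}")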
We now compute the quarterly ROAS as before.
predictions_roas_data_2 = {"x1": {}, "x2": {}}
for channel in tqdm(["x1", "x2"]):
for q in tqdm(roas_data[channel].keys()):
with model_2:
pm.set_data(
new_data={
"channels_scaled": channels_scaler.transform(
roas_data[channel][q][channels]
)
}
)
predictions_roas_data_2[channel][q] = pm.sample_posterior_predictive(
trace=idata_2, var_names=["y", "mu"], progressbar=False, random_seed=rng
)
predictions_roas_data_scaled_diff_2 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_data_scaled_diff_2[channel] = {
q: pp_vars_original_scale_2["mu"]
- xr.apply_ufunc(
target_scaler.inverse_transform,
idata_q["posterior_predictive"]["mu"].expand_dims(dim={"_": 1}, axis=-1),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
for q, idata_q in tqdm(predictions_roas_data_2[channel].items())
}
predictions_roas_2 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_2[channel] = {
q: idata_q.sum(dim="date") / data_df.query("quarter == @q")[channel].sum()
for q, idata_q in predictions_roas_data_scaled_diff_2[channel].items()
}
axes = az.plot_posterior(
data=xr.concat(
predictions_roas_2["x1"].values(),
dim=pd.Index(data=predictions_roas_2["x1"].keys(), name="quarter"),
),
figsize=(18, 15),
grid=(5, 2),
ref_val={"mu": x1_ref_vals},
backend_kwargs={"sharex": True, "layout": "constrained"},
)
fig = axes.ravel()[0].get_figure()
fig.suptitle(
t="Posterior Predictive - ROAS x1 - Model 2", fontsize=18, fontweight="bold"
)
axes = az.plot_posterior(
data=xr.concat(
predictions_roas_2["x2"].values(),
dim=pd.Index(data=predictions_roas_2["x2"].keys(), name="quarter"),
),
figsize=(18, 15),
grid=(5, 2),
ref_val={"mu": x2_ref_vals},
backend_kwargs={"sharex": True, "layout": "constrained"},
)
fig = axes.ravel()[0].get_figure()
fig.suptitle(
t="Posterior Predictive - ROAS x2 - Model 2", fontsize=18, fontweight="bold"
)
The quarterly ROAS present the same pattern as the global ROAS. The model overestimates the ROAS for \(x_1\) heavily!
Model 3: ROAS Calibration
In this third model we use the parametrization presented in the paper “Media Mix Model Calibration With Bayesian Priors”, by Zhang, et al. The key idea is to write each channel’s regression coefficient as a deterministic function of its ROAS, so that the prior can be placed directly on the ROAS.
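In our notation (this matches the beta_channel definition in the model code below, where the sums run over all time periods), the parametrization reads

\[
\beta_c = \frac{\mathrm{ROAS}_c \cdot \sum_t \mathrm{cost}_{c,t}}{\sum_t \mathrm{saturation}\left(\mathrm{adstock}(x_c)\right)_t},
\]

so a prior on \(\mathrm{ROAS}_c\) induces a prior on the regression coefficient \(\beta_c\).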
Remark: In the paper the authors use a Hill saturation function instead of a logistic one. This is not important: the method works in both cases.
We show that by adding ROAS estimations (e.g. coming from previous experiments or domain knowledge) as priors we are able to mitigate a significant amount of the omitted variable bias.
We start with the ROAS estimates in the original scale. We set them close to the true values (i.e. assuming the experiment worked). In order to include them in the model we need to scale them (as always, the devil is in the details).
roas_x1_bar = 95
roas_x2_bar = 170
roas_x1_bar_scaled = roas_x1_bar / target_scaler.scale_.item()
roas_x2_bar_scaled = roas_x2_bar / target_scaler.scale_.item()
roas_bar_scaled = np.array([roas_x1_bar_scaled, roas_x2_bar_scaled])
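Since the target was scaled by target_scaler while the spend stays in its original units, dividing by target_scaler.scale_ converts the ROAS to the model scale. A quick round-trip check (a minimal sketch using the scalers defined above):
# Scaling back to the original units should recover the prior ROAS values.
assert np.allclose(
    roas_bar_scaled * target_scaler.scale_.item(),
    np.array([roas_x1_bar, roas_x2_bar]),
)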
For the prior we choose a distribution centered at the (scaled) ROAS with a standard deviation approximately equal to the uncertainty of the estimation. We use a log-normal distribution to ensure the ROAS values are positive.
error = 30
error_scaled = error / target_scaler.scale_.item()
fig, ax = plt.subplots(figsize=(10, 6))
pz.LogNormal(mu=np.log(roas_x1_bar_scaled), sigma=error_scaled).plot_pdf(
color="C0", ax=ax
)
ax.axvline(x=roas_x1_bar_scaled, color="C0", linestyle="--", linewidth=2)
pz.LogNormal(mu=np.log(roas_x2_bar_scaled), sigma=error_scaled).plot_pdf(
color="C1", ax=ax
)
ax.axvline(x=roas_x2_bar_scaled, color="C1", linestyle="--", linewidth=2)
ax.set(xlabel="ROAS (scaled)")
ax.set_title(label=r"Prior Distribution ROAS", fontsize=18, fontweight="bold")
Model Specification
In the following model we set priors on the ROAS instead of the regression coefficients.
eps = np.finfo(float).eps
with pm.Model(coords=coords) as model_3:
# --- Data Containers ---
index_scaled_data = pm.Data(
name="index_scaled",
value=index_scaled.to_numpy().flatten(),
mutable=True,
dims="date",
)
channels_scaled_data = pm.Data(
name="channels_scaled",
value=channels_scaled,
mutable=True,
dims=("date", "channel"),
)
channels_cost_data = pm.Data(
name="channels_cost",
value=data_df[channels].to_numpy(),
mutable=True,
dims=("date", "channel"),
)
fourier_features_data = pm.Data(
name="fourier_features",
value=fourier_features,
mutable=True,
dims=("date", "fourier_mode"),
)
y_scaled_data = pm.Data(
name="y_scaled",
value=target_scaled.to_numpy().flatten(),
mutable=True,
dims="date",
)
# --- Priors ---
amplitude_trend = pm.HalfNormal(name="amplitude_trend", sigma=1)
ls_trend_params = pm.find_constrained_prior(
distribution=pm.InverseGamma,
lower=0.1,
upper=0.9,
init_guess={"alpha": 3, "beta": 1},
mass=0.95,
)
ls_trend = pm.InverseGamma(name="ls_trend", **ls_trend_params)
alpha = pm.Beta(name="alpha", alpha=2, beta=3, dims="channel")
lam = pm.Gamma(name="lam", alpha=2, beta=2, dims="channel")
roas = pm.LogNormal(
name="roas",
mu=np.log(roas_bar_scaled),
sigma=error_scaled,
dims="channel",
)
beta_fourier = pm.Normal(name="beta_fourier", mu=0, sigma=1, dims="fourier_mode")
sigma = pm.HalfNormal(name="sigma", sigma=1)
nu = pm.Gamma(name="nu", alpha=25, beta=2)
# --- Parametrization ---
cov_trend = amplitude_trend * pm.gp.cov.ExpQuad(input_dim=1, ls=ls_trend)
gp_trend = pm.gp.HSGP(m=[20], c=1.5, cov_func=cov_trend)
f_trend = gp_trend.prior(name="f_trend", X=index_scaled_data[:, None], dims="date")
channel_adstock = pm.Deterministic(
name="channel_adstock",
var=geometric_adstock(
x=channels_scaled_data,
alpha=alpha,
l_max=l_max,
normalize=True,
),
dims=("date", "channel"),
)
channel_adstock_saturated = pm.Deterministic(
name="channel_adstock_saturated",
var=logistic_saturation(x=channel_adstock, lam=lam),
dims=("date", "channel"),
)
beta_channel = pm.Deterministic(
name="beta_channel",
var=(channels_cost_data.sum(axis=0) * roas)
/ (channel_adstock_saturated.sum(axis=0) + eps),
dims="channel",
)
channel_contributions = pm.Deterministic(
name="channel_contributions",
var=channel_adstock_saturated * beta_channel,
dims=("date", "channel"),
)
fourier_contribution = pm.Deterministic(
name="fourier_contribution",
var=pt.dot(fourier_features_data, beta_fourier),
dims="date",
)
mu = pm.Deterministic(
name="mu",
var=f_trend + channel_contributions.sum(axis=-1) + fourier_contribution,
dims="date",
)
# --- Likelihood ---
y = pm.StudentT(
name="y",
nu=nu,
mu=mu,
sigma=sigma,
observed=y_scaled_data,
dims="date",
)
pm.model_to_graphviz(model=model_3)
Remark: The channels_cost_data container holds the cost of each channel without any scaling. Note that you could use impressions to model the contribution of each channel (channels_scaled_data). This method is agnostic to the type of data you use.
Prior Predictive Checks
Let’s see how this parametrization changes the prior predictive distribution:
with model_3:
prior_predictive_3 = pm.sample_prior_predictive(samples=2_000, random_seed=rng)
fig, ax = plt.subplots()
az.plot_ppc(data=prior_predictive_3, group="prior", kind="kde", ax=ax)
ax.set_title(label="Prior Predictive - Model 3", fontsize=18, fontweight="bold")
It actually looks very similar to the first model.
Model Fitting
We now fit the model.
with model_3:
idata_3 = pm.sample(
target_accept=0.9,
draws=2_000,
chains=4,
nuts_sampler="numpyro",
random_seed=rng,
)
posterior_predictive_3 = pm.sample_posterior_predictive(
trace=idata_3, random_seed=rng
)
The sampling time is comparable with the models above.
Model Diagnostics
We now take a look into the results.
idata_3["sample_stats"]["diverging"].sum().item()
\(\displaystyle 0\)
var_names = [
"amplitude_trend",
"ls_trend",
"alpha",
"lam",
"roas",
"beta_channel",
"beta_fourier",
"sigma",
"nu",
]
az.summary(data=idata_3, var_names=var_names, round_to=3)
parameter | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
---|---|---|---|---|---|---|---|---|---|
amplitude_trend | 0.390 | 0.354 | 0.026 | 1.066 | 0.008 | 0.005 | 1577.531 | 2360.063 | 1.002 |
ls_trend | 0.328 | 0.047 | 0.235 | 0.410 | 0.001 | 0.001 | 1510.604 | 1477.460 | 1.003 |
alpha[x1] | 0.035 | 0.022 | 0.002 | 0.074 | 0.000 | 0.000 | 8148.597 | 4067.383 | 1.000 |
alpha[x2] | 0.527 | 0.041 | 0.452 | 0.603 | 0.000 | 0.000 | 8590.509 | 6258.338 | 1.000 |
lam[x1] | 0.288 | 0.160 | 0.020 | 0.569 | 0.002 | 0.001 | 7728.718 | 3689.815 | 1.001 |
lam[x2] | 1.362 | 0.880 | 0.073 | 2.986 | 0.010 | 0.007 | 6400.545 | 4857.139 | 1.000 |
roas[x1] | 0.217 | 0.012 | 0.194 | 0.239 | 0.000 | 0.000 | 7185.918 | 6334.511 | 1.000 |
roas[x2] | 0.294 | 0.013 | 0.269 | 0.319 | 0.000 | 0.000 | 9086.750 | 5634.668 | 1.000 |
beta_channel[x1] | 3.640 | 4.206 | 0.713 | 8.634 | 0.071 | 0.050 | 7509.496 | 3694.108 | 1.001 |
beta_channel[x2] | 1.600 | 7.306 | 0.256 | 3.892 | 0.131 | 0.093 | 6502.806 | 4805.879 | 1.000 |
beta_fourier[0] | 0.008 | 0.008 | -0.009 | 0.020 | 0.000 | 0.000 | 2930.040 | 1823.914 | 1.002 |
beta_fourier[1] | 0.042 | 0.008 | 0.029 | 0.055 | 0.000 | 0.000 | 3880.937 | 2236.211 | 1.000 |
beta_fourier[2] | -0.051 | 0.004 | -0.059 | -0.043 | 0.000 | 0.000 | 9493.599 | 6236.470 | 1.001 |
beta_fourier[3] | -0.002 | 0.004 | -0.009 | 0.005 | 0.000 | 0.000 | 17322.683 | 5873.648 | 1.000 |
beta_fourier[4] | -0.003 | 0.004 | -0.010 | 0.004 | 0.000 | 0.000 | 16458.519 | 5724.276 | 1.000 |
beta_fourier[5] | 0.002 | 0.004 | -0.005 | 0.009 | 0.000 | 0.000 | 14683.029 | 5765.216 | 1.001 |
sigma | 0.027 | 0.002 | 0.023 | 0.031 | 0.000 | 0.000 | 7344.703 | 5838.703 | 1.000 |
nu | 12.098 | 2.441 | 7.685 | 16.658 | 0.025 | 0.018 | 9119.262 | 4900.607 | 1.001 |
axes = az.plot_trace(
data=idata_3,
var_names=var_names,
compact=True,
backend_kwargs={"figsize": (15, 17), "layout": "constrained"},
)
plt.gcf().suptitle("Trace - Model 3", fontsize=16)
Everything looks in order. Note that the ROAS posterior distribution is very close to the prior.
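To make this comparison concrete, we can rescale the ROAS posterior back to the original scale and compare it against the prior means of 95 and 170 (a quick sketch):
# Posterior ROAS in the original scale (prior means: 95 for x1, 170 for x2).
roas_posterior_original_scale = idata_3["posterior"]["roas"] * target_scaler.scale_.item()
az.summary(data=roas_posterior_original_scale, kind="stats", round_to=2)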
Posterior Predictive Checks
We now evaluate the model fit.
pp_vars_original_scale_3 = {
var_name: xr.apply_ufunc(
target_scaler.inverse_transform,
idata_3["posterior"][var_name].expand_dims(dim={"_": 1}, axis=-1),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
for var_name in ["mu", "f_trend", "fourier_contribution"]
}
pp_vars_original_scale_3["channel_contributions"] = xr.apply_ufunc(
target_scaler.inverse_transform,
idata_3["posterior"]["channel_contributions"],
input_core_dims=[["date", "channel"]],
output_core_dims=[["date", "channel"]],
vectorize=True,
)
pp_vars_original_scale_3["baseline"] = xr.apply_ufunc(
target_scaler.inverse_transform,
(
idata_3["posterior"]["f_trend"] + idata_3["posterior"]["fourier_contribution"]
).expand_dims(dim={"_": 1}, axis=-1),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
pp_likelihood_original_scale_3 = xr.apply_ufunc(
target_scaler.inverse_transform,
posterior_predictive_3["posterior_predictive"]["y"].expand_dims(
dim={"_": 1}, axis=-1
),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
fig, ax = plt.subplots()
sns.lineplot(data=model_df, x="date", y="y", color="black", label=target, ax=ax)
az.plot_hdi(
x=date,
y=pp_likelihood_original_scale_3,
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.1, "label": r"likelihood $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_3["mu"],
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.3, "label": r"$\mu$ $94\%$ HDI"},
smooth=False,
ax=ax,
)
ax.legend(loc="upper left")
ax.set_title(label="Posterior Predictive - Model 3", fontsize=18, fontweight="bold")
fig, ax = plt.subplots()
az.plot_ppc(
data=posterior_predictive_3,
num_pp_samples=1_000,
observed_rug=True,
random_seed=seed,
ax=ax,
)
ax.set_title(label="Posterior Predictive - Model 3", fontsize=18, fontweight="bold")
The posterior predictive distribution looks a bit worse than for the previous models. This is expected: the tight ROAS priors keep the media coefficients from absorbing the variation left unexplained by the missing confounder.
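If we wanted to put a number on the fit difference, one option would be a PSIS-LOO comparison. A minimal sketch, assuming we are willing to pay the extra computation (the log-likelihoods are not stored when sampling with the numpyro sampler, so we compute them first):
# Hypothetical fit comparison via PSIS-LOO.
with model_2:
    pm.compute_log_likelihood(idata_2)
with model_3:
    pm.compute_log_likelihood(idata_3)
# Higher elpd_loo means better expected out-of-sample fit.
az.compare(compare_dict={"model_2": idata_2, "model_3": idata_3}, ic="loo")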
Model Components
We now plot the model components.
fig, ax = plt.subplots()
sns.lineplot(data=model_df, x="date", y="y", color="black", label=target, ax=ax)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_3["channel_contributions"].sel(channel="x1"),
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.5, "label": r"$x1$ contribution $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_3["channel_contributions"].sel(channel="x2"),
hdi_prob=0.94,
color="C1",
fill_kwargs={"alpha": 0.5, "label": r"$x2$ contribution $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_3["f_trend"],
hdi_prob=0.94,
color="C2",
fill_kwargs={"alpha": 0.3, "label": r"$f_\text{trend}$ $94\%$ HDI"},
smooth=False,
ax=ax,
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_3["fourier_contribution"],
hdi_prob=0.94,
color="C3",
fill_kwargs={"alpha": 0.3, "label": r"seasonality $94\%$ HDI"},
smooth=False,
ax=ax,
)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=3)
ax.set_title(label="Components Contributions - Model 3", fontsize=18, fontweight="bold")
Observe that the \(94\%\) HDI for both channels is much narrower than in the previous models. This is because the tight ROAS priors constrain the channel contributions.
Let’s compare now the media contributions against the true values:
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
sns.lineplot(
x="date",
y="x1_effect",
data=data_df.assign(x1_effect=lambda x: amplitude * x["x1_effect"]),
color="C0",
label=r"$x1$ effect",
ax=ax[0],
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_3["channel_contributions"].sel(channel="x1"),
hdi_prob=0.94,
color="C0",
fill_kwargs={"alpha": 0.3, "label": r"$x1$ contribution $94\%$ HDI"},
smooth=False,
ax=ax[0],
)
ax[0].legend(loc="upper right")
sns.lineplot(
x="date",
y="x2_effect",
data=data_df.assign(x2_effect=lambda x: amplitude * x["x2_effect"]),
color="C1",
label=r"$x2$ effect",
ax=ax[1],
)
az.plot_hdi(
x=date,
y=pp_vars_original_scale_3["channel_contributions"].sel(channel="x2"),
hdi_prob=0.94,
color="C1",
fill_kwargs={"alpha": 0.3, "label": r"$x2$ contribution $94\%$ HDI"},
smooth=False,
ax=ax[1],
)
ax[1].legend(loc="upper right")
fig.suptitle("Channel Contributions - Model 3", fontsize=18, fontweight="bold")
The result for both channels is much better than in the previous models (remember we do not have \(z\) in the model!).
Media Parameters
As before, we compare the posterior media parameters against the true values.
alpha_posterior_3 = idata_3["posterior"]["alpha"]
lam_posterior_3 = idata_3["posterior"]["lam"] * channels_scaler.scale_
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(
data=alpha_posterior_3, combined=True, hdi_prob=0.94, colors="black", ax=ax
)
ax.axvline(
x=alpha1, color="C0", linestyle="--", linewidth=2, label=r"$\alpha_1$ (true)"
)
ax.axvline(
x=alpha2, color="C1", linestyle="--", linewidth=2, label=r"$\alpha_2$ (true)"
)
ax.legend(loc="upper left")
ax.set_title(
label=r"Adstock Parameter $\alpha$ ($94\%$ HDI) - Model 2",
fontsize=18,
fontweight="bold",
)
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(
data=lam_posterior_3, combined=True, hdi_prob=0.94, colors="black", ax=ax
)
ax.axvline(x=lam1, color="C0", linestyle="--", linewidth=2, label=r"$\lambda_1$ (true)")
ax.axvline(x=lam2, color="C1", linestyle="--", linewidth=2, label=r"$\lambda_2$ (true)")
ax.legend(loc="upper right")
ax.set_title(
label=r"Saturation Parameter $\lambda$ ($94\%$ HDI) - Model 2",
fontsize=18,
fontweight="bold",
)
beta_channel_posterior_3 = (
idata_3["posterior"]["beta_channel"]
* (1 / channels_scaler.scale_)
* (target_scaler.scale_)
* (1 / amplitude)
)
fig, ax = plt.subplots(figsize=(10, 6))
az.plot_forest(
data=beta_channel_posterior_3, combined=True, hdi_prob=0.94, colors="black", ax=ax
)
ax.axvline(x=beta1, color="C0", linestyle="--", linewidth=2, label=r"$\beta_1$ (true)")
ax.axvline(x=beta2, color="C1", linestyle="--", linewidth=2, label=r"$\beta_2$ (true)")
ax.legend(loc="upper right")
ax.set_title(
label=r"Channel $\beta$ ($94\%$ HDI) - Model 1", fontsize=18, fontweight="bold"
)
- For \(x_2\) we recovered all the media parameters within the \(94\%\) HDI.
- For \(x_1\) the model heavily underestimated the carryover parameter \(\alpha_1\) and on the other hand overestimated the regression coefficient \(\beta_1\). The saturation parameter \(\lambda_1\) was closer to the true value, but still outside of the \(94\%\) HDI.
ROAS Estimation
Finally, we compute the ROAS. We should expect them to be much closer to the true values than in Model 2, since we now have explicit, tight priors on them. Let’s start with the global ROAS:
predictions_roas_data_all_3 = {"x1": {}, "x2": {}}
for channel in tqdm(["x1", "x2"]):
with model_3:
pm.set_data(
new_data={
"channels_scaled": channels_scaler.transform(
data_df[channels].assign(**{channel: 0}),
),
"channels_cost": data_df[channels].assign(**{channel: 0}),
}
)
predictions_roas_data_all_3[channel] = pm.sample_posterior_predictive(
trace=idata_3, var_names=["y", "mu"], progressbar=False, random_seed=rng
)
Note that we had to replace both the media model data (channels_scaled) and the media cost data (channels_cost). Here you can see that we could use impressions instead of cost to model the contribution of each channel.
predictions_roas_data_scaled_diff_all_3 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_data_scaled_diff_all_3[channel] = pp_vars_original_scale_3[
"mu"
] - xr.apply_ufunc(
target_scaler.inverse_transform,
predictions_roas_data_all_3[channel]["posterior_predictive"]["mu"].expand_dims(
dim={"_": 1}, axis=-1
),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
az.plot_hdi(
x=date,
y=predictions_roas_data_scaled_diff_all_3["x1"],
color="C0",
smooth=False,
fill_kwargs={"alpha": 0.3, "label": r"$E(y - y_{01})$ $94\%$ HDI"},
ax=ax[0],
)
sns.lineplot(
data=data_df.assign(x1_effect=lambda x: amplitude * x["x1_effect"]),
x="date",
y="x1_effect",
color="C0",
label=r"$x1$ effect",
ax=ax[0],
)
az.plot_hdi(
x=date,
y=predictions_roas_data_scaled_diff_all_3["x2"],
color="C1",
smooth=False,
fill_kwargs={"alpha": 0.3, "label": r"$E(y - y_{02})$ $94\%$ HDI"},
ax=ax[1],
)
sns.lineplot(
data=data_df.assign(x2_effect=lambda x: amplitude * x["x2_effect"]),
x="date",
y="x2_effect",
color="C1",
label=r"$x2$ effect",
ax=ax[1],
)
ax[0].legend(loc="upper left")
ax[1].legend(loc="upper left")
ax[0].set(ylabel=None)
ax[1].set(ylabel=None)
fig.suptitle("Potential Outcomes Difference - Model 3", fontsize=18, fontweight="bold")
predictions_roas_all_3 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_all_3[channel] = (
predictions_roas_data_scaled_diff_all_3[channel].sum(dim="date")
/ data_df[channel].sum()
)
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 7), sharex=True, sharey=False, layout="constrained"
)
az.plot_posterior(predictions_roas_all_3["x1"], ref_val=roas_all_1, ax=ax[0])
ax[0].set(title="x1")
az.plot_posterior(predictions_roas_all_3["x2"], ref_val=roas_all_2, ax=ax[1])
ax[1].set(title="x2", xlabel="ROAS")
fig.suptitle("Global ROAS - Model 3", fontsize=18, fontweight="bold")
The results look as expected 🚀! The ROAS estimate for \(x_1\) is much closer to the true value than in the second model.
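We can repeat the quick bias check we did for Model 2 (a sketch):
# Posterior mean of the global ROAS vs. the true values (Model 3).
for channel, roas_true in zip(["x1", "x2"], [roas_all_1, roas_all_2]):
    roas_mean = predictions_roas_all_3[channel].mean().item()
    print(f"{channel}: posterior mean ROAS {roas_mean:.2f} vs. true {roas_true:.2f}")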
We wrap up the analysis by computing the quarterly ROAS.
predictions_roas_data_3 = {"x1": {}, "x2": {}}
for channel in tqdm(["x1", "x2"]):
for q in tqdm(roas_data[channel].keys()):
with model_3:
pm.set_data(
new_data={
"channels_scaled": channels_scaler.transform(
roas_data[channel][q][channels]
),
"channels_cost": roas_data[channel][q][channels],
}
)
predictions_roas_data_3[channel][q] = pm.sample_posterior_predictive(
trace=idata_3, var_names=["y", "mu"], progressbar=False, random_seed=rng
)
predictions_roas_data_scaled_diff_3 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_data_scaled_diff_3[channel] = {
q: pp_vars_original_scale_3["mu"]
- xr.apply_ufunc(
target_scaler.inverse_transform,
idata_q["posterior_predictive"]["mu"].expand_dims(dim={"_": 1}, axis=-1),
input_core_dims=[["date", "_"]],
output_core_dims=[["date", "_"]],
vectorize=True,
).squeeze(dim="_")
for q, idata_q in tqdm(predictions_roas_data_3[channel].items())
}
predictions_roas_3 = {"x1": {}, "x2": {}}
for channel in ["x1", "x2"]:
predictions_roas_3[channel] = {
q: idata_q.sum(dim="date") / data_df.query("quarter == @q")[channel].sum()
for q, idata_q in predictions_roas_data_scaled_diff_3[channel].items()
}
axes = az.plot_posterior(
data=xr.concat(
predictions_roas_3["x1"].values(),
dim=pd.Index(data=predictions_roas_3["x1"].keys(), name="quarter"),
),
figsize=(18, 15),
grid=(5, 2),
ref_val={"mu": x1_ref_vals},
backend_kwargs={"sharex": True, "layout": "constrained"},
)
fig = axes.ravel()[0].get_figure()
fig.suptitle(
t="Posterior Predictive - ROAS x1 - Model 3", fontsize=18, fontweight="bold"
)
axes = az.plot_posterior(
data=xr.concat(
predictions_roas_3["x2"].values(),
dim=pd.Index(data=predictions_roas_3["x2"].keys(), name="quarter"),
),
figsize=(18, 15),
grid=(5, 2),
ref_val={"mu": x2_ref_vals},
backend_kwargs={"sharex": True, "layout": "constrained"},
)
fig = axes.ravel()[0].get_figure()
fig.suptitle(
t="Posterior Predictive - ROAS x2 - Model 3", fontsize=18, fontweight="bold"
)
The results at quarterly level are consistent with the global ROAS.
Model Comparison
In this final section we compare the posterior distribution of the channel ROAS for the three models.
fig, ax = plt.subplots(
nrows=2, ncols=1, figsize=(12, 9), sharex=True, sharey=False, layout="constrained"
)
az.plot_forest(
data=[
predictions_roas_all_1["x1"].rename("roas"),
predictions_roas_all_2["x1"].rename("roas"),
predictions_roas_all_3["x1"].rename("roas"),
],
model_names=[
"Model 1 (causal)",
"Model 2 (unobserved confounder)",
"Model 3 (roas prior)",
],
combined=True,
ax=ax[0],
)
ax[0].get_legend().update({"loc": "upper right"})
ax[0].axvline(x=roas_all_1, color="black", linestyle="--", linewidth=2)
ax[0].set(title="x1")
az.plot_forest(
data=[
predictions_roas_all_1["x2"].rename("roas"),
predictions_roas_all_2["x2"].rename("roas"),
predictions_roas_all_3["x2"].rename("roas"),
],
model_names=[
"Model 1 (causal)",
"Model 2 (unobserved confounder)",
"Model 3 (roas prior)",
],
combined=True,
ax=ax[1],
)
ax[1].get_legend().update({"loc": "upper left"})
ax[1].axvline(x=roas_all_2, color="black", linestyle="--", linewidth=2)
ax[1].set(title="x2")
fig.suptitle("Global ROAS - Model Comparison", fontsize=18, fontweight="bold")
Lastly, we compare the estimated quarterly ROAS for \(x_1\):
fig, ax = plt.subplots()
az.plot_forest(
data=[
xr.concat(
predictions_roas_1["x1"].values(),
dim=pd.Index(data=predictions_roas_1["x1"].keys(), name="quarter"),
).rename("roas"),
xr.concat(
predictions_roas_2["x1"].values(),
dim=pd.Index(data=predictions_roas_2["x1"].keys(), name="quarter"),
).rename("roas"),
xr.concat(
predictions_roas_3["x1"].values(),
dim=pd.Index(data=predictions_roas_3["x1"].keys(), name="quarter"),
).rename("roas"),
],
model_names=[
"Model 1 (causal)",
"Model 2 (unobserved confounder)",
"Model 3 (roas prior)",
],
combined=True,
ax=ax,
)
ax.axvline(x=roas_all_1, color="black", linestyle="--", linewidth=2)
ax.get_legend().update({"loc": "center left", "bbox_to_anchor": (1, 0.5)})
ax.set_title(
label=r"Posterior Predictive - $x_1$ ROAS - Model Comparison",
fontsize=18,
fontweight="bold",
)
Conclusion
With this simulated case study we have seen the benefit of setting priors on the ROAS instead of on the regression coefficients. The model with ROAS priors was able to partially recover the true ROAS values in the presence of an unobserved confounder. We also saw that the posterior ROAS does not simply reproduce the specified prior: the prior is balanced against the evidence from the media mix regression model. Hence, this approach offers a more holistic ROAS estimation for real-world scenarios.