
Multilevel Elasticities for a Single SKU - Part III.

In this notebook we continue our simulation study for elasticities; see the first two notebooks in this series for an introduction to the problem and some models. We extend the covariance model to allow two covariance components coupling the intercepts and slopes: one for their baseline terms and one for their median_income terms. Also, to abstract from a specific framework, we do the implementation in NumPyro. The corresponding PyMC implementation is very similar.

Prepare Notebook

import arviz as az
import jax.numpy as jnp
import matplotlib.pyplot as plt
import numpy as np
import numpyro
import numpyro.distributions as dist
import polars as pl
import seaborn as sns
from jax import random
from numpyro.infer import MCMC, NUTS, Predictive
from sklearn.preprocessing import LabelEncoder

az.style.use("arviz-darkgrid")
plt.rcParams["figure.figsize"] = [12, 7]
plt.rcParams["figure.dpi"] = 100
plt.rcParams["figure.facecolor"] = "white"

numpyro.set_host_device_count(n=4)

rng_key = random.PRNGKey(seed=42)

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "retina"

Read Data

We read the data from our case study. You can find an explicit data generating process in the first notebook, Multilevel Elasticities for a Single SKU - Part I.

market_df = pl.read_csv(
    "https://raw.githubusercontent.com/juanitorduz/website_projects/master/data/multilevel_elasticities_single_sku_data.csv",
    schema_overrides={"region_id": pl.Categorical},
)

market_df.head()
shape: (5, 10)
item_id price quantities time_step store_id region_store_id region_id median_income log_price log_quantities
i64 f64 f64 i64 i64 str cat f64 f64 f64
0 1.335446 6.170862 16 0 "r-0_s-0" "0" 5.873343 0.289265 1.819839
0 1.702792 3.715124 15 0 "r-0_s-0" "0" 5.873343 0.532269 1.312412
0 1.699778 3.290962 14 0 "r-0_s-0" "0" 5.873343 0.530498 1.19118
0 1.335844 5.702928 13 0 "r-0_s-0" "0" 5.873343 0.289563 1.74098
0 1.517213 4.264949 12 0 "r-0_s-0" "0" 5.873343 0.416875 1.45043

Recall that our objective is to estimate the elasticity of a single SKU across different regions.
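
On the log-log scale this elasticity is simply the slope of the regression line, since

\[\begin{align*} \beta = \frac{\partial \log(q)}{\partial \log(p)} = \frac{\partial q / q}{\partial p / p} \approx \frac{\% \Delta q}{\% \Delta p} \end{align*}\]

so the regression lines of log quantities on log prices per region in the plot below already hint at how the elasticities differ across regions.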

g = sns.lmplot(
    data=market_df,
    x="log_price",
    y="log_quantities",
    hue="region_id",
    height=8,
    aspect=1.2,
    scatter=False,
)
g.set_axis_labels(x_var="Log Prices", y_var="Log Quantities")
legend = g.legend
legend.set_title(title="Region ID", prop={"size": 10})
g.fig.suptitle(
    "Log Prices vs Log Quantities by Region", y=1.05, fontsize=18, fontweight="bold"
);

We expect this elasticity to be influenced by the region’s median income.

fig, ax = plt.subplots(figsize=(9, 6))
(
    market_df.group_by("region_id")
    .agg(pl.col("median_income").mean())
    .to_pandas()
    .pipe(
        (sns.barplot, "data"), x="region_id", y="median_income", hue="region_id", ax=ax
    )
)
ax.set(xlabel="Region ID", ylabel="Median Income")
ax.set_title(label="Median Income by Region", fontsize=18, fontweight="bold");

Model Specification

As described in the previous notebooks, we can use a multilevel covariance model to estimate the elasticity by constraining how the intercepts and slopes vary. In the first notebook we did this just for the median_income variable. Here we extend it so that we have two covariance components: one coupling the baseline (intercept) terms of \(\alpha\) and \(\beta\), and one coupling their median_income terms. We do this in the non-centered parametrization (see Statistical Rethinking in NumPyro - Chapter 14. Adventures in Covariance). Here is a mathematical description of the model:

\[\begin{align*}
\log(q) & \sim \text{StudentT}(\nu, \mu, \sigma) \\
\mu & = \alpha + \beta \log(p) \\
\alpha & = \alpha_{\text{intercept}, \text{region}} + \alpha_{\text{slope}, \text{region}} \: m \\
\beta & = \beta_{\text{intercept}, \text{region}} + \beta_{\text{slope}, \text{region}} \: m \\
\alpha_{\text{intercept}, \text{region}} & = z'_{\text{intercept}}[0] \\
\alpha_{\text{slope}, \text{region}} & = z'_{\text{slope}}[0] \\
\beta_{\text{intercept}, \text{region}} & = z'_{\text{intercept}}[1] \\
\beta_{\text{slope}, \text{region}} & = z'_{\text{slope}}[1] \\
z'_{\text{intercept}} & = \text{diag}(\sigma_{\alpha_{\text{intercept}}}, \sigma_{\beta_{\text{intercept}}}) \: L_{\text{intercept}} \: z_{\text{intercept}} \\
z'_{\text{slope}} & = \text{diag}(\sigma_{\alpha_{\text{slope}}}, \sigma_{\beta_{\text{slope}}}) \: L_{\text{slope}} \: z_{\text{slope}} \\
L_{\text{intercept}} & \sim \text{LKJCholesky}(1) \\
L_{\text{slope}} & \sim \text{LKJCholesky}(1) \\
z_{\text{intercept}} & \sim \text{Normal}(0, 1) \\
z_{\text{slope}} & \sim \text{Normal}(0, 1)
\end{align*}\]

Here \(m\) denotes the region-level median income, \(\nu\) and \(\sigma\) are the degrees of freedom and scale of the Student-T likelihood, and the \(\sigma\) terms inside the \(\text{diag}\) factors are the effect standard deviations.
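
To build some intuition for the non-centered parameterization, here is a small NumPy sketch (purely illustrative and independent of the model below; all names are made up): scaling standard normal draws by \(\text{diag}(\sigma) L\) yields draws whose covariance is \(\text{diag}(\sigma) L L^{T} \text{diag}(\sigma)\), so we can place simple \(\text{Normal}(0, 1)\) priors on \(z\) and still recover correlated effects.

import numpy as np

rng = np.random.default_rng(seed=42)

sigmas = np.array([0.3, 0.5])  # standard deviations of the two effects
rho = 0.4  # target correlation between the two effects
corr = np.array([[1.0, rho], [rho, 1.0]])
chol = np.linalg.cholesky(corr)  # lower-triangular Cholesky factor L

# Non-centered construction: start from iid standard normal draws ...
z = rng.normal(size=(100_000, 2))
# ... and scale/correlate them deterministically.
samples = z @ (np.diag(sigmas) @ chol).T

# The empirical covariance matches diag(sigma) @ corr @ diag(sigma).
print(np.cov(samples, rowvar=False))
print(np.diag(sigmas) @ corr @ np.diag(sigmas))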

We prepare the data (very similar to the previous notebooks) and then implement the model in NumPyro.

obs_idx = np.arange(market_df.shape[0])

# For the region_id we use a label encoder to transform the categorical
# variable into integers (in most applications these labels won't always be
# integers, so it is good to get used to these encoders).
regions_encoder = LabelEncoder()
regions_encoder.fit(market_df["region_id"].to_numpy())
regions_idx = regions_encoder.transform(market_df["region_id"].to_numpy())
regions = regions_encoder.classes_

log_price = market_df["log_price"].to_jax()
log_quantities = market_df["log_quantities"].to_jax()

# This is just an auxiliary variable to indicate we are modeling
# two covariance components.
effect = ["intercept", "median_income"]
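
As a quick (optional) sanity check, we can verify that the encoder round-trips the region labels and that the resulting index runs from 0 to the number of regions minus one:

# Optional sanity check: the encoder should round-trip the labels.
assert (
    regions_encoder.inverse_transform(regions_idx)
    == market_df["region_id"].to_numpy()
).all()
assert regions_idx.min() == 0
assert regions_idx.max() == len(regions) - 1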

Besides the usual data preparation, we need a map from the encoded region categories to the corresponding median income values:

region_median_income_mapping_df = market_df.select(
    ["region_id", "median_income"]
).unique()

region_median_income_mapping_df = region_median_income_mapping_df.with_columns(
    region_idx=regions_encoder.transform(
        region_median_income_mapping_df["region_id"].to_numpy()
    )
).sort("region_idx")

median_income = region_median_income_mapping_df["median_income"].to_jax()
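
Since we sorted the mapping by the encoded index, median_income[i] corresponds to regions[i]. A quick (optional) check:

# Optional sanity check: the sorted mapping must line up with the encoder classes,
# so that median_income[i] is the median income of regions[i].
assert (
    region_median_income_mapping_df["region_id"].to_numpy() == regions
).all()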

Now we implement the model:

def model(log_price, regions_idx, median_income, log_quantities=None):
    dim = 2
    effects = 2
    n_obs = log_price.size
    n_regions = np.unique(regions_idx).size

    # Prior for the degrees of freedom of the Student-T distribution.
    nu = numpyro.sample("nu", dist.Gamma(concentration=20.0, rate=3))
    # Prior for the scale parameter of the Student-T distribution.
    sigma = numpyro.sample("sigma", dist.Exponential(rate=1.0 / 0.5))

    # Concentration parameter for the LKJ prior (a value of 1 implies a uniform
    # prior over correlation matrices).
    concentration = jnp.ones(1)

    with numpyro.plate("effect", effects, dim=-1):
        # standard deviation of the effect.
        sigma_effect = numpyro.sample(
            "sigma_effect", dist.HalfNormal(scale=0.2 * jnp.ones(dim))
        )
        # correlation matrix
        chol_corr = numpyro.sample(
            "chol_corr",
            dist.LKJCholesky(dimension=dim, concentration=concentration),
        )

        with numpyro.plate("regions", n_regions, dim=-2):
            # z is the latent variable that we will use to compute the covariance matrix
            # in the non-centered parameterization.
            z = numpyro.sample("z", dist.Normal(loc=0, scale=1))

    # We build the correlated region effects from the Cholesky factors
    # (non-centered parameterization).
    chol_cov = z @ (sigma_effect[None, ...] * chol_corr)
    # We extract the region effects
    alpha_region_effect = chol_cov[:, :, 0]
    beta_region_effect = chol_cov[:, :, 1]
    # Global intercept component (intercept and median income term)
    alpha_region_intercept = alpha_region_effect[0, :]
    alpha_region_median_income = alpha_region_effect[1, :]
    # Global slope factor of `log_price` (intercept and median income term)
    beta_region_intercept = beta_region_effect[0, :]
    beta_region_median_income = beta_region_effect[1, :]

    # Region specific intercept and slope
    alpha_region = numpyro.deterministic(
        "alpha_region",
        alpha_region_intercept + alpha_region_median_income * median_income,
    )
    beta_region = numpyro.deterministic(
        "beta_region",
        beta_region_intercept + beta_region_median_income * median_income,
    )

    # Likelihood mean
    mu = alpha_region[regions_idx] + beta_region[regions_idx] * log_price

    with numpyro.plate("data", n_obs):
        # We use a Student-T likelihood to model the residuals.
        numpyro.sample(
            "log_quantities",
            dist.StudentT(df=nu, loc=mu, scale=sigma),
            obs=log_quantities,
        )

Let’s visualize the model:

numpyro.render_model(
    model=model,
    model_kwargs={
        "log_price": log_price,
        "regions_idx": regions_idx,
        "median_income": median_income,
        "log_quantities": log_quantities,
    },
    render_distributions=True,
    render_params=True,
)
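
Before running inference, one could also draw from the prior predictive distribution to verify that the priors imply reasonable log-quantities. Here is a minimal (optional) sketch using NumPyro's Predictive utility without posterior samples:

# Optional: prior predictive samples (no observed data is passed to the model).
prior_predictive = Predictive(model=model, num_samples=500)

rng_key, rng_subkey = random.split(rng_key)
prior_samples = prior_predictive(rng_subkey, log_price, regions_idx, median_income)
prior_samples["log_quantities"].shape  # (500, n_obs)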

Inference

We now use NUTS to sample from the posterior distribution of the model.

sampler = NUTS(model)
mcmc = MCMC(
    sampler=sampler,
    num_warmup=1_000,
    num_samples=4_000,
    num_chains=4,
)

rng_key, rng_subkey = random.split(rng_key)
mcmc.run(rng_subkey, log_price, regions_idx, median_income, log_quantities)

We collect the results in an az.InferenceData object:

idata = az.from_numpyro(
    mcmc,
    coords={"effect": effect, "regions": regions},
    dims={
        "sigma_effect": ["effect"],
        "z": ["regions", "effect"],
        "alpha_region": ["regions"],
        "beta_region": ["regions"],
    },
)

Model Diagnostics

# Count divergences
idata.sample_stats.diverging.sum().item()
0
az.summary(idata, var_names=["~z", "~chol_corr"])
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha_region[0] 2.752 0.052 2.654 2.848 0.000 0.000 18213.0 11582.0 1.0
alpha_region[1] 2.994 0.056 2.890 3.100 0.000 0.000 16905.0 11383.0 1.0
alpha_region[2] 2.440 0.035 2.376 2.507 0.000 0.000 17436.0 12368.0 1.0
alpha_region[3] 3.606 0.139 3.344 3.867 0.001 0.001 16645.0 11384.0 1.0
alpha_region[4] 2.989 0.052 2.894 3.088 0.000 0.000 18849.0 11999.0 1.0
alpha_region[5] 2.827 0.057 2.720 2.935 0.000 0.000 16608.0 11715.0 1.0
alpha_region[6] 3.082 0.049 2.994 3.178 0.000 0.000 18674.0 13358.0 1.0
alpha_region[7] 2.343 0.029 2.291 2.398 0.000 0.000 18050.0 13305.0 1.0
alpha_region[8] 2.541 0.039 2.470 2.615 0.000 0.000 16718.0 12278.0 1.0
beta_region[0] -3.558 0.124 -3.789 -3.324 0.001 0.001 18427.0 11093.0 1.0
beta_region[1] -4.074 0.127 -4.321 -3.844 0.001 0.001 17216.0 11973.0 1.0
beta_region[2] -2.983 0.083 -3.141 -2.829 0.001 0.000 17442.0 12151.0 1.0
beta_region[3] -5.362 0.344 -5.990 -4.692 0.003 0.002 16356.0 10542.0 1.0
beta_region[4] -4.011 0.121 -4.240 -3.786 0.001 0.001 19029.0 11793.0 1.0
beta_region[5] -3.601 0.134 -3.854 -3.348 0.001 0.001 16588.0 9217.0 1.0
beta_region[6] -4.203 0.114 -4.421 -3.995 0.001 0.001 18968.0 12457.0 1.0
beta_region[7] -2.784 0.067 -2.909 -2.659 0.000 0.000 18677.0 11903.0 1.0
beta_region[8] -3.220 0.092 -3.395 -3.050 0.001 0.000 17982.0 11875.0 1.0
nu 12.536 1.533 9.740 15.388 0.015 0.011 10519.0 9749.0 1.0
sigma 0.271 0.005 0.262 0.280 0.000 0.000 11093.0 10292.0 1.0
sigma_effect[intercept] 0.268 0.057 0.174 0.376 0.001 0.001 3469.0 6384.0 1.0
sigma_effect[median_income] 0.498 0.089 0.344 0.671 0.001 0.001 5370.0 7094.0 1.0

We do not have divergences and the r-hat values look good!

axes = az.plot_trace(
    data=idata,
    var_names=["~z", "~chol_corr"],
    compact=True,
    kind="rank_bars",
    backend_kwargs={"figsize": (15, 9), "layout": "constrained"},
)
plt.gcf().suptitle("Covariance Model", fontsize=20, fontweight="bold", y=1.05);
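
As an additional (optional) sampler diagnostic, we can also look at the energy plot, which can reveal problems with the Hamiltonian transitions (low BFMI) that divergences and r-hat alone might miss:

fig, ax = plt.subplots()
az.plot_energy(idata, ax=ax)
ax.set_title(label="Energy Plot - Covariance Model", fontsize=18, fontweight="bold");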

Intercepts and Elasticities

Let’s look into the estimates for the intercepts and elasticities:

ax, *_ = az.plot_forest(
    data=idata,
    var_names=["alpha_region"],
    combined=True,
    figsize=(10, 6),
)
ax.set_title(
    label="Posterior Distribution of the Intercepts",
    fontsize=18,
    fontweight="bold",
);
ax, *_ = az.plot_forest(
    data=idata,
    var_names=["beta_region"],
    combined=True,
    figsize=(10, 6),
)
ax.set_title(
    label="Posterior Distribution of the Elasticities",
    fontsize=18,
    fontweight="bold",
);

These results are very similar to the ones we obtained in the previous notebooks!
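
To make the comparison with the earlier models more quantitative, one option (an optional extra, not needed for what follows) is to compute the PSIS-LOO estimate of the expected log pointwise predictive density; assuming the log_likelihood group was stored (the az.from_numpyro default), this is a one-liner, and the analogous call on the earlier models would allow a comparison via az.compare.

# Optional: PSIS-LOO estimate, useful for comparing this model against the
# models from the previous notebooks (via az.compare).
az.loo(idata)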

Posterior Predictive Distribution

Finally, we compute the posterior predictive distribution and compare it to the observed data.

predictive = Predictive(
    model=model,
    posterior_samples=mcmc.get_samples(),
    return_sites=["log_quantities"],
)

rng_key, rng_subkey = random.split(rng_key)
predictions = predictive(rng_subkey, log_price, regions_idx, median_income)
idata.extend(
    az.from_numpyro(
        posterior_predictive=predictions,
        coords={"obs_idx": obs_idx},
        dims={"log_quantities": ["obs_idx"]},
    )
)
fig, ax = plt.subplots()
az.plot_ppc(idata, kind="kde", num_pp_samples=1_000, ax=ax)
ax.set_title(label="Posterior Predictive", fontsize=18, fontweight="bold");

The model seems to be capturing most of the variation in the data.