In this notebook we describe how to fit Fader’s and Hardie’s *gamma-gamma model* presented in the paper “RFM and CLV: Using Iso-value Curves
for Customer Base Analysis” and the note “The Gamma-Gamma Model of Monetary
Value”. The approach is very similar as the one presented in the previous post BG/NBD Model in PyMC where we simply ported the log-likelihood of the `lifetimes`

package from `numpy`

to `theano`

.

## Prepare Notebook

```
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from lifetimes.datasets import load_cdnow_summary_data_with_monetary_value
from lifetimes import GammaGammaFitter
import pymc3 as pm
import theano.tensor as tt
plt.style.use("bmh")
plt.rcParams["figure.figsize"] = [8, 6]
plt.rcParams["figure.dpi"] = 100
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = "svg"
```

## Load Data

We are going to use an existing data set from the `lifetimes`

package documentation, see here.

```
data_df = load_cdnow_summary_data_with_monetary_value()
data_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2357 entries, 1 to 2357
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 frequency 2357 non-null int64
1 recency 2357 non-null float64
2 T 2357 non-null float64
3 monetary_value 2357 non-null float64
dtypes: float64(3), int64(1)
memory usage: 92.1 KB
```

From the package’s documentation:

`frequency`

: Number of repeat purchases the customer has made. More precisely, It’s the count of time periods the customer had a purchase in.`monetary_value`

: represents the average value of a given customer’s purchases.

LEt us plot the distribution of the features of interest:

```
fig, ax = plt.subplots(
nrows=1,
ncols=2,
figsize=(9, 4),
sharex=False,
sharey=False,
layout="constrained"
)
sns.kdeplot(x="frequency", data=data_df, color="C2", clip=(0.0, None), fill=True, ax=ax[0])
ax[0].set(title="Frequency",xlabel="frequency (log)",xscale="log")
sns.kdeplot(x="monetary_value", data=data_df, color="C3", clip=(0.0, None), fill=True, ax=ax[1])
ax[1].set(title="Monetary Value",xlabel="monetary value (log)",xscale="log")
fig.suptitle("Data Distribution (log scale)", fontsize=16);
```

## Gamma-Gamma Model (Lifetimes)

In this first section we we fit the model using the `GammaGammaFitter`

class from the `lifetimes`

package. Before fitting the model, let us recall the model specification (see “The Gamma-Gamma Model of Monetary
Value”). First, here are the assumptions:

- The monetary value of a customer’s given transaction varies randomly around their average transaction value.
- Average transaction values vary across customers but do not vary over time for any given individual.
- The distribution of average transaction values across customers is independent of the transaction process.

Let \(\bar{z}\) be the observed average transaction value for a given customer. One is generally interested in the expected value \(E(Z|\bar{z}, x)\), where \(Z\) denotes the random variable of the customer monetary value per transaction and \(x\) is the observed frequency (see last section below). Now let us look into the concrete model parametrization:

- We assume that \(z_i \sim \text{Gamma}(p, \nu)\), with \(E(Z_i| p, \nu) = \xi = p/\nu\).

- Given the convolution properties of the gamma, it follows that total spend across \(x\) transactions is distributed \(\text{Gamma}(px, \nu)\).
- Given the scaling property of the gamma distribution, it follows \(\bar{z} \sim \text{Gamma}(px, νx)\).
- We assume \(\nu \sim \text{Gamma}(q, \gamma)\).

**Remark:** The notation used in the paper is slightly different from the one used in the `lifetimes`

package. The parameter \(\gamma\) is denoted by \(v\).

For more details in the theory and the consequences of this parametrization please see the references above.

When fitting the model, we need to be careful with two conditions:

- We must remove users without repeated purchases.
- The Gamma-Gamma model assumes that there is no relationship between the monetary value and the purchase frequency.

For the first condition we simply filter out the data:

`data_df.query("frequency > 0", inplace=True)`

For the second condition we compute the correlation:

`data_df.filter(["frequency", "monetary_value"]).corr()`

frequency | monetary_value | |
---|---|---|

frequency | 1.000000 | 0.113884 |

monetary_value | 0.113884 | 1.000000 |

The correlation is not very high. Let us see it ins a scatter plot:

```
fig, ax = plt.subplots()
sns.scatterplot(x="frequency", y="monetary_value", data=data_df, ax=ax)
ax.set(title="Frequency vs Monetary Value",xlabel="frequency",ylabel="monetary value");
```

There does not seem to be a high relation between these features, so we can move and fir the `lifetimes`

model.

```
# extract the data using the paper's (and lifetimes) notation
x = data_df["frequency"].to_numpy()
m = data_df["monetary_value"].to_numpy()
```

```
ggf = GammaGammaFitter(penalizer_coef=0.0)
ggf.fit(frequency=x, monetary_value=m)
ggf.summary
```

coef | se(coef) | lower 95% bound | upper 95% bound | |
---|---|---|---|---|

p | 6.248802 | 1.189687 | 3.917016 | 8.580589 |

q | 3.744588 | 0.290166 | 3.175864 | 4.313313 |

v | 15.447748 | 4.159994 | 7.294160 | 23.601336 |

These errors are computed using the Hessian (see for example here).

## Bayesian Model

We use the same strategy of the post BG/NBD Model in PyMC where we rewrite the log-likelihood of the `lifetimes`

package from `numpy`

to `theano`

. In this case the expression is much simpler, see the `staticmethod`

of the class `GammaGammaFitter._negative_log_likelihood`

.

```
with pm.Model() as model:
p = pm.HalfNormal(name="p", sd=10)
q = pm.HalfNormal(name="q", sd=10)
v = pm.HalfNormal(name="v", sd=10)
def logp(x, m):
return (
tt.gammaln(p * x + q)
- tt.gammaln(p * x)
- tt.gammaln(q)
+ q * tt.log(v)
+ (p * x - 1) * tt.log(m)
+ (p * x) * tt.log(x)
- (p * x + q) * tt.log(x * m + v)
)
likelihood = pm.DensityDist(
name="likelihood",
logp=logp,
observed = {"x":x, "m": m}
)
```

Next, we estimate the posterior distribution.

```
with model:
trace = pm.sample(
tune=2000,
draws=6000,
chains=4,
target_accept=0.95,
return_inferencedata=True
)
```

Let us take a look at the model summary:

`az.summary(data=trace, var_names=["p", "q", "v"])`

mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|

p | 6.991 | 1.352 | 4.733 | 9.541 | 0.018 | 0.013 | 5455.0 | 6269.0 | 1.0 |

q | 3.663 | 0.257 | 3.198 | 4.150 | 0.003 | 0.002 | 5666.0 | 6580.0 | 1.0 |

v | 14.021 | 3.466 | 7.963 | 20.688 | 0.049 | 0.035 | 5034.0 | 5506.0 | 1.0 |

We now plot the posterior distributions and compare them against the point estimate of the `GammaFitter`

model.

```
axes = az.plot_trace(
data=trace,
lines=[(k, {}, [v]) for k, v in ggf.summary["coef"].items()],
compact=True,
backend_kwargs={
"figsize": (10, 7),
"layout": "constrained"
},
)
fig = axes[0][0].get_figure()
fig.suptitle("Gamma-Gamma Model Trace");
```

The results look comparable!

Finally, let’s look into the pair-plot of the posterior distributions:

```
axes = az.plot_pair(data=trace, var_names=["p", "q", "v"], figsize=(8, 8))
fig = plt.gcf()
fig.suptitle("Gamma-Gamma Model Parameters Pairplot", y=0.95, fontsize=16);
```

It is interesting to see that the `p`

and `v`

distributions seem to obey a non-linear relationship.

## Conditional Expected Average Profit

Now that we have posterior samples at our disposal, we can compute quantities of interest using the expressions from the `lifetimes`

package (and the papers of course). For example, let us compute the *conditional expected average profit* defined by the expected value \(E(Z|\bar{z}, x)\). One can show that the expected average profit is a weighted average of individual monetary value and the population mean (see Equation 5 in “The Gamma-Gamma Model of Monetary
Value”).

First we compute it using the method `GammaGammaFitter.conditional_expected_average_profit`

:

`ggf_ceap = ggf.conditional_expected_average_profit(frequency=x, monetary_value=m)`

Next, we compare it to the results from the `pymc`

model. We do this by simply re-writing (specifically, vectorizing) the method above.

```
# pymc-examples/examples/time_series/Air_passengers-Prophet_with_Bayesian_workflow.html
def _sample(array, n_samples):
"""Little utility function, sample n_samples with replacement."""
idx = np.random.choice(np.arange(len(array)), n_samples, replace=True)
return array[idx]
def conditional_expected_average_profit(trace, x, m, n_samples):
posterior = trace.posterior.stack(sample=('chain', 'draw'))
p = _sample(array=posterior["p"], n_samples=n_samples).to_numpy()[..., None]
q = _sample(array=posterior["q"], n_samples=n_samples).to_numpy()[..., None]
v = _sample(array=posterior["v"], n_samples=n_samples).to_numpy()[..., None]
individual_weight = p * x[None, ...] / (p * x[None, ...] + q - 1)
population_mean = (v * p) / (q - 1)
return (population_mean * (1 - individual_weight)) + (individual_weight * m)
n_samples = 1000
pymc_ceap = conditional_expected_average_profit(trace, x, m, n_samples)
```

Finally, we can compare the results:

```
fig, ax = plt.subplots(figsize=(8, 8))
ax.errorbar(
x=ggf_ceap,
y=pymc_ceap.mean(axis=0),
yerr=az.hdi(ary=pymc_ceap).T,
fmt="o",
ecolor="C6",
markeredgecolor="C1",
markerfacecolor="C1",
markersize=5,
alpha=0.5,
capsize=2,
)
ax.axline(
xy1=(0, 0),
slope=1,
color="black",
linestyle="--",
label="diagonal"
)
ax.legend(loc="upper left")
ax.set(title="Model Comparison", xlabel="lifetimes", ylabel="pymc");
```

Note that the credible intervals increase as the monetary value increases. This is no surprise given the data distribution (see first plot above).

**Remark:** Note that the credible intervals are always positive:

`assert az.hdi(ary=pymc_ceap).min().min() > 0`