PyCon Colombia 2022
Dr. Juan Orduz
Learn more: PyMC 4.0 Release Announcement
Suppose you see a person with long hair and want to estimate the probability that this person is a woman. That is, for \(A = \text{woman}\) and \(B = \text{long hair}\), we want to estimate \(P(A|B)\).
Prior Information
You believe \(P(A) = 0.5\), \(P(B)=0.4\) and \(P(B|A) = 0.7\).
Bayes Rule
\[ P(A|B) = \frac{P(A)\times P(B|A)}{P(B)} = \frac{0.5 \times 0.7}{0.4} = 0.875 \]
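The computation above is simple enough to check directly:

```python
# Bayes' rule for the long-hair example above
p_a = 0.5          # prior: P(woman)
p_b = 0.4          # evidence: P(long hair)
p_b_given_a = 0.7  # likelihood: P(long hair | woman)

p_a_given_b = p_a * p_b_given_a / p_b
print(p_a_given_b)  # 0.875
```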
Assume \(y\sim p(y|\theta)\), where \(\theta\) denotes the parameter(s) of the distribution (e.g. \(y\sim N(\mu, \sigma^2)\)). From Bayes' theorem:
\[ p(\theta|y)=\frac{p(y|\theta) \times p(\theta)}{p(y)} = \displaystyle{\frac{p(y|\theta)\times p(\theta)}{\color{red}{\int p(y|\theta)p(\theta)d\theta}}} \]
Here \(p(y|\theta)\) is the likelihood, \(p(\theta)\) is the prior, and \(p(\theta|y)\) is the posterior distribution of \(\theta\): \[ p(\theta|y) \propto \text{likelihood} \times \text{prior}. \]
Integrals are hard to compute \(\Longrightarrow\) we need samplers.
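To see why, a toy illustration (not from the talk): for a single parameter the normalizing integral can be approximated on a grid, but this brute-force approach scales exponentially with the number of parameters, which is why we need samplers. A numpy sketch with made-up data, assuming \(y_i \sim N(\theta, 1)\) and prior \(\theta \sim N(0, 2^2)\):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

y = np.array([1.2, 0.8, 1.5, 0.9])        # toy observations (assumed)
theta_grid = np.linspace(-3.0, 3.0, 1001)
d_theta = theta_grid[1] - theta_grid[0]

prior = normal_pdf(theta_grid, 0.0, 2.0)  # theta ~ Normal(0, 2^2)
likelihood = np.prod(
    normal_pdf(y[:, None], theta_grid, 1.0), axis=0  # y_i ~ Normal(theta, 1)
)

unnormalized = likelihood * prior
# approximate the integral in the denominator numerically
posterior = unnormalized / (unnormalized.sum() * d_theta)
```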
\[\begin{align*} y & \sim \text{Normal}(\mu, \sigma^2)\\ \mu & = a + bx \end{align*}\]
Objective:
We want to estimate the (posterior) distributions of \(a\), \(b\) (and hence \(\mu\)) and \(\sigma\) given \(x\) and \(y\).
Model Parametrization
: \[\begin{align*}
y & \sim \text{Normal}(\mu, \sigma^2)\\
\mu & = a + bx \\
\end{align*}\]
Prior Distributions
: \[\begin{align*}
a & \sim \text{Normal}(0, 2)\\
b & \sim \text{Normal}(0, 2) \\
\sigma & \sim \text{HalfNormal}(2)
\end{align*}\]
with pm.Model(coords={"idx": range(n_train)}) as model:
    # --- Data Containers ---
    x = pm.MutableData(name="x", value=x_train)
    y = pm.MutableData(name="y", value=y_train)
    # --- Priors ---
    a = pm.Normal(name="a", mu=0, sigma=2)
    b = pm.Normal(name="b", mu=0, sigma=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)
    # --- Model Parametrization ---
    mu = pm.Deterministic(name="mu", var=a + b * x, dims="idx")
    # --- Likelihood ---
    likelihood = pm.Normal(
        name="likelihood", mu=mu, sigma=sigma, observed=y, dims="idx"
    )
Compare to:
Prior predictive samples, drawn before conditioning the model on the data.
with model:
    idata = pm.sample(target_accept=0.8, draws=1_000, chains=4)
    posterior_predictive = pm.sample_posterior_predictive(trace=idata)
Posterior distributions sampled via the NUTS sampler in PyMC. For each parameter we run 4 independent chains with 1,000 samples each.
with pm.Model(coords={"idx": range(n_train)}) as model:
    # --- Data Containers ---
    x = pm.MutableData(name="x", value=x_train)
    y = pm.MutableData(name="y", value=y_train)
    # --- Priors ---
    a = pm.Normal(name="a", mu=0, sigma=2)
    b = pm.HalfNormal(name="b", sigma=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)
    # --- Model Parametrization ---
    mu = pm.Deterministic(name="mu", var=a + b * x, dims="idx")
    # --- Likelihood ---
    likelihood = pm.Normal(
        name="likelihood", mu=mu, sigma=sigma, observed=y, dims="idx"
    )
Compare to:
sklearn.linear_model.LinearRegression with positive=True.
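The HalfNormal prior on \(b\) constrains the slope to be non-negative, analogous to the following sklearn fit (a sketch with made-up data, not from the talk):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=0)
x_train = rng.uniform(0, 1, size=(100, 1))
y_train = 1.0 + 2.0 * x_train[:, 0] + rng.normal(0, 0.1, size=100)

# positive=True forces non-negative coefficients,
# playing the role of the HalfNormal prior on b
linreg = LinearRegression(positive=True).fit(x_train, y_train)
print(linreg.coef_)
```

Unlike the Bayesian model, the sklearn fit returns only a point estimate, not a posterior distribution for the slope.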
with pm.Model(coords={"idx": range(n_train)}) as model:
    # --- Data Containers ---
    x = pm.MutableData(name="x", value=x_train)
    y = pm.MutableData(name="y", value=y_train)
    # --- Priors ---
    a = pm.Normal(name="a", mu=0, sigma=2)
    b = pm.Laplace(name="b", mu=0, b=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)
    # --- Model Parametrization ---
    mu = pm.Deterministic(name="mu", var=a + b * x, dims="idx")
    # --- Likelihood ---
    likelihood = pm.Normal(
        name="likelihood", mu=mu, sigma=sigma, observed=y, dims="idx"
    )
Compare to:
with pm.Model(coords={"idx": range(n_train)}) as model:
    # --- Data Containers ---
    x = pm.MutableData(name="x", value=x_train)
    y = pm.MutableData(name="y", value=y_train)
    # --- Priors ---
    a = pm.Normal(name="a", mu=0, sigma=2)
    b = pm.Normal(name="b", mu=0, sigma=2)
    sigma = pm.HalfNormal(name="sigma", sigma=2)
    nu = pm.Gamma(name="nu", alpha=10, beta=10)
    # --- Model Parametrization ---
    mu = pm.Deterministic(name="mu", var=a + b * x, dims="idx")
    # --- Likelihood ---
    likelihood = pm.StudentT(
        name="likelihood", mu=mu, sigma=sigma, nu=nu, observed=y, dims="idx"
    )
Compare to:
\[ \text{cnt} = \text{intercept} + b_{\text{temp}}\text{temp} + \cdots \]
Two ML models: both see a negative effect of temperature on bike rentals in the month of July.
\[ b(t) \sim N(b(t - 1), \sigma^2) \]
Effect of temperature on bike rentals as a function of time for a time-varying coefficient model (via Gaussian random walk).
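The random-walk prior \(b(t) \sim N(b(t-1), \sigma^2)\) can be simulated directly as a cumulative sum of Gaussian increments; a minimal numpy sketch (the innovation scale and number of steps are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
sigma = 0.1    # innovation scale (assumed for illustration)
n_steps = 200

# b(t) ~ Normal(b(t - 1), sigma^2): each coefficient value is the
# previous one plus a Normal(0, sigma^2) step
innovations = rng.normal(loc=0.0, scale=sigma, size=n_steps)
b = np.cumsum(innovations)
```

In PyMC this prior is available directly as `pm.GaussianRandomWalk`.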
MMM structure: media data (cost, impressions or clicks) is modeled using carryover effects (adstock) and saturation effects. In addition, one can control for seasonality and external regressors. In this example, we allow time-varying coefficients to capture how the media effects develop over time.
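A minimal sketch of the two media transformations mentioned above, with hypothetical parameter values (a real MMM learns these from data; the function names are ours):

```python
import numpy as np

def geometric_adstock(x, alpha):
    """Carryover: each period retains a fraction alpha of the previous adstock."""
    out = np.zeros_like(x, dtype=float)
    carry = 0.0
    for t, spend in enumerate(x):
        carry = spend + alpha * carry
        out[t] = carry
    return out

def logistic_saturation(x, lam):
    """Diminishing returns: maps non-negative spend into [0, 1) at rate lam."""
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))

spend = np.array([100.0, 0.0, 0.0, 50.0])
effect = logistic_saturation(geometric_adstock(spend, alpha=0.5), lam=0.01)
```

A spend of 100 in period 0 still contributes 50 and 25 units of adstock in the following two periods, and the saturation curve then compresses large adstock values toward 1.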