import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import bernoulli, expon
"viridis", 3)
sns.set_palette(
import warnings
"ignore") warnings.filterwarnings(
Bayesian A/B Testing Analysis Using Python
As data scientists, we are frequently confronted with questions like: “Does X affect Y?” Here, Y is the outcome we care about, while X could be a new feature, product, policy, or experience. How do we go about answering such causal questions? For online platforms, the answer lies in experimentation.
By randomly assigning some kind of treatment (a new interface design, a new product, etc.) to a subset of users (Group A) and comparing their behavior or outcomes (such as revenue, visits, clicks, etc.) to those who did not receive the treatment (Group B), we can effectively isolate the causal impact of the change. This is at the core of A/B testing (also known as Randomized Controlled Trials), which is considered the workhorse method of experimentation for platforms like Spotify, Uber, Netflix, and more.
In my previous article, we learned the basics of A/B testing using Python, employing traditional methods. However, one distinguishing feature of platforms like Spotify is that they run thousands of tests simultaneously. With so many experiments available, is there a way to leverage information from previous tests and improve the inferences we make?
In this article, I aim to address this question by introducing the Bayesian approach to A/B testing. The Bayesian framework proves to be highly suitable for this task as it inherently facilitates the integration of prior knowledge with new data, enabling more robust conclusions. Let’s see how!
1. Case Study: E-beer Catalog Campaign
As a case study, consider E-Beer, a fictional online beer retailer that serves approximately 50,000 customers and operates on a swift delivery model: customers select their favorite brand, make a purchase, and receive their order within an hour. In an effort to enhance sales, E-Beer devised a catalog campaign to distribute to its customer base. Each catalog included a flyer to reinforce awareness of the services offered and a key ring bearing the company’s name and website.
To evaluate the efficacy of the catalog campaign, an A/B test was conducted in which users were randomly assigned to a treated group or to a control group that did not receive any catalog. The design of the test therefore enables us to address our main question:
- Does the catalog campaign lead to an increase in purchases compared to the control group (Treated vs. Control)?
But platforms are also interested in studying the number of responses the catalog mailing generates. Response rates are one of the most common metrics for evaluating the effectiveness of mailing campaigns. Although response rates do not directly address causal relationships, they serve as valuable indicators of campaign performance. Here, the Bayesian approach can also help us better assess the uncertainty around our response rates. We will therefore start the analysis by answering the following question:
- What is the estimated response rate of the catalog campaign, and how confident are we in this estimate?
Table 1 presents a sample of the dataset containing all the information we need for our analysis:
|   | acctnum | gender | R | F | M | firstpur | age_class | single | student | mailing | respmail |
|---|---------|--------|---|---|---|----------|-----------|--------|---------|---------|----------|
| 0 | 10001 | 1 | 30 | 10 | 39.2 | 50 | 3 | 1 | 1 | 1 | 0.0 |
| 1 | 10005 | 0 | 16 | 1 | 152.5 | 16 | 0 | 0 | 1 | 1 | 0.0 |
| 2 | 10010 | 0 | 12 | 1 | 126.5 | 12 | 0 | 1 | 0 | 1 | 0.0 |
| 3 | 10011 | 0 | 6 | 2 | 150.5 | 8 | 0 | 1 | 1 | 1 | 0.0 |
| 4 | 10014 | 1 | 6 | 3 | 99.5 | 18 | 0 | 0 | 1 | 1 | 1.0 |
2. Binary Case: Response Rates
What is the estimated response rate of the catalog campaign, and how confident are we in this estimate?
2.1. Classical Approaches: Normal Distribution vs Nonparametric Bootstrap
We want to know the uncertainty around the response rate p. One standard way to measure the uncertainty of the possible values of the response rate is the classical framework of inference, which assumes that p comes from a normal distribution.
We start by calculating the mean and the standard error of the mean. Then, we’ll use those parameters to construct confidence intervals (CI) and get a sense of the lay of the land:

\text{CI} = \bar{x} \pm 1.96 \times \text{SE}_{\bar{x}}, \qquad \text{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}
It’s classic statistical inference!
# Calculate the mean
xbar = df['respmail'].mean()
print("Sample Response Rate:", xbar.round(3))

# Calculate the standard error of the mean (SE)
xbse = np.sqrt(df['respmail'].var() / len(df))

# Calculate the confidence interval
ci_lower = xbar - 1.96 * xbse
ci_upper = xbar + 1.96 * xbse

print("95% Confidence Interval:")
print("Lower bound:", ci_lower.round(3))
print("Upper bound:", ci_upper.round(3))
Sample Response Rate: 0.124
95% Confidence Interval:
Lower bound: 0.118
Upper bound: 0.131
Our estimated response rate for the catalog campaign falls within a confidence interval ranging from 11.8% to 13.1% with 95% confidence. This means that, assuming normality, we’re pretty confident that the true response rate lies somewhere in this range. Now, let’s visualize these bounds:
Code
# Generate xx values
xx = np.linspace(df['respmail'].min(), df['respmail'].max(), 10000)

# Calculate normal density
norm_density = pd.DataFrame({'xx': xx, 'm': np.exp(-(xx - xbar) ** 2 / (2 * xbse ** 2)) / (xbse * np.sqrt(2 * np.pi))})

# Plot using Seaborn with confidence interval
plt.figure(figsize=(8, 6))
sns.lineplot(data=norm_density, x='xx', y='m', color="#00A087B2")
plt.axvline(xbar, color="black", linestyle="--", label="Mean")
plt.axvline(ci_lower, color="#481f70", linestyle="--", label="95% CI Lower Bound")
plt.axvline(ci_upper, color="#481f70", linestyle="--", label="95% CI Upper Bound")
plt.xlim(0.108, 0.14)  # Limit x-axis range
plt.xlabel("Response Rate (p)")
plt.ylabel("Density")
plt.title("Normal Distribution")
plt.legend()
plt.grid(True)
plt.show()
If we don’t want to make any distributional assumptions, we can give the nonparametric bootstrap method a shot. This approach allows us to leverage bootstrap sampling techniques for robust estimates of response rates. Here is the code to do re-sampling with replacement:
# Define the number of bootstrap samples
B = 10000

# Define a function to calculate response rate
def calculate_response_rate(data):
    return data.mean()

# Initialize an array to store bootstrap sample response rates
bootstrap_response_rates = np.zeros(B)

# Generate bootstrap samples and calculate response rates
for i in range(B):
    bootstrap_sample = df['respmail'].sample(frac=1, replace=True)  # Sampling with replacement
    bootstrap_response_rates[i] = calculate_response_rate(bootstrap_sample)

# Calculate mean and standard error of bootstrap response rates
bootstrap_mean = bootstrap_response_rates.mean()

# Calculate 95% confidence interval
confidence_interval = np.percentile(bootstrap_response_rates, [2.5, 97.5])
print("Bootstrap Mean:", bootstrap_mean.round(3))
print("95% Confidence Interval:", confidence_interval.round(3))
Bootstrap Mean: 0.124
95% Confidence Interval: [0.115 0.134]
Code
# Plot using Seaborn
plt.figure(figsize=(8, 6))
sns.histplot(bootstrap_response_rates, bins=100, kde=True, color="#00A087B2", edgecolor="black", fill=True)
plt.axvline(np.mean(bootstrap_response_rates), color="black", linestyle="--", linewidth=1.5)
plt.axvline(ci_lower, color="#481f70", linestyle="--", label="95% CI Lower Bound")
plt.axvline(ci_upper, color="#481f70", linestyle="--", label="95% CI Upper Bound")
plt.xlabel("Response Rate (p)")
plt.ylabel("Density")
plt.title("Bootstrap Distribution")
plt.grid(True)
plt.show()
Upon running the resampling, we obtain a 95% confidence interval of [0.115, 0.134]. Interestingly, the interval is slightly wider than in the normal case, reflecting greater uncertainty. This is a common trade-off when opting for fewer assumptions in our analysis.
2.2. Bayesian Approach
So far, everything has been very standard. But let’s consider this: what if our catalog campaign isn’t the only experiment we’ve conducted to boost sales (and ultimately revenue)? Among the thousands of experiments we’ve run in the past, this campaign is just one idea in the mix. Can we efficiently leverage this wealth of additional information?
This is where Bayesian statistics offers a powerful advantage over the frequentist approach: the seamless integration of additional information into our model. The idea follows directly from the main result behind all of Bayesian statistics: Bayes’ Theorem. By inverting the inference problem, Bayes’ Theorem allows us to transition from the probability of the data given the model (the likelihood) to the probability of the model given the data (the posterior):
\underbrace{ \Pr \big( \text{model} \ \big| \ \text{data} \big) }_{\text{posterior}} = \underbrace{ \Pr(\text{model}) }_{\text{prior}} \ \underbrace{ \frac{ \Pr \big( \text{data} \ \big| \ \text{model} \big) }{ \Pr(\text{data}) } }_{\text{likelihood}}
We can split the right-hand side of Bayes’ Theorem (or Rule) into two components: the prior and the likelihood. The likelihood is the information about the model that comes from the data; the prior, instead, is any additional information about the model.
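To make the mechanics concrete, here is a minimal sketch of Bayes’ rule on a discrete grid of candidate response rates; the grid and the toy counts (12 responses out of 100 mailings) are purely illustrative and not taken from the E-Beer data:

import numpy as np
from scipy.stats import binom

# Candidate "models": a grid of possible response rates
p_grid = np.linspace(0.01, 0.99, 99)

# Prior: flat, i.e. every response rate equally plausible before seeing data
prior = np.ones_like(p_grid) / len(p_grid)

# Likelihood of the toy data (12 responses out of 100 mailings) under each candidate rate
likelihood = binom.pmf(k=12, n=100, p=p_grid)

# Bayes' rule: posterior is proportional to prior times likelihood, then normalized
posterior = prior * likelihood
posterior /= posterior.sum()

print("Posterior mean response rate:", (p_grid * posterior).sum().round(3))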
Priors
As the name suggests, priors contain information that was available even before looking at the data. This leads to one of the most relevant questions in Bayesian statistics: How do you choose a prior? In practice, there are several considerations to keep in mind:
- Beta Distribution Assumption: Under the Bayesian framework, we assume that the true response rate follows a Beta distribution. This particular distribution is chosen for its flexibility and its ability to capture uncertainty on the [0, 1] interval.
- Uninformative or Weakly Informative Priors: these priors are characterized by low values for alpha and beta. For example, alpha = 1 and beta = 1 lead to a uniform distribution as a prior. If we were considering one distribution in isolation, setting this prior is a statement that we don’t know anything about the value of the parameter or how confident we are about it.
- Strong Priors: Conversely, strong priors are characterized by high values for alpha and beta, implying that the prior distribution is narrow (concentrated), i.e. our prior belief is that the variants are not very different from each other. A minimal sketch contrasting the two choices follows this list.
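To see what these two choices imply in practice, here is a minimal sketch comparing a weakly informative Beta(1, 1) prior with a stronger Beta(20, 140) prior; these hyperparameters are illustrative and not the ones used for E-Beer below:

from scipy.stats import beta

# Weakly informative prior: Beta(1, 1) is flat over [0, 1]
weak_prior = beta(1, 1)
# Strong prior: Beta(20, 140) concentrates its mass around 20 / 160 = 0.125
strong_prior = beta(20, 140)

print("Beta(1, 1)    95% interval:", weak_prior.ppf([0.025, 0.975]).round(3))
print("Beta(20, 140) 95% interval:", strong_prior.ppf([0.025, 0.975]).round(3))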
Going back to our mailing campaign, let’s assume our response rate p comes from a Beta distribution with parameters a and b. The mean of the Beta distribution is:
\text{E}[p|a,b] = \frac{a}{a+b}
Drawing insights from past experiments, we’ve observed that p tends to hover around 12.3%. To reflect this knowledge, we set the parameters a and b of the Beta distribution accordingly: a = 9 and b = 64.
prior_a = 9
prior_b = 64

# Mean
p_hat = prior_a / (prior_a + prior_b)
print(round(p_hat, 3))
0.123
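One way to read these hyperparameters: a + b behaves like a prior “sample size”. If we pick a target prior mean and decide how many pseudo-observations of past evidence the prior should carry (here 73), we can back out a and b. This is just one elicitation convention, sketched below with the numbers used above:

# Back out Beta hyperparameters from a target prior mean and a chosen
# prior "sample size" (pseudo-observations) -- one common elicitation convention.
target_mean = 0.123
prior_n = 73               # pseudo-observations carried by the prior

a = target_mean * prior_n  # approximately 9
b = prior_n - a            # approximately 64
print(round(a, 1), round(b, 1))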
Code
# Create DataFrame with samples
B = 10000  # number draws from distribution
samples = np.random.beta(prior_a, prior_b, size=B)
beta_df = pd.DataFrame({'Samples': samples})

# Plot using Seaborn
plt.figure(figsize=(8, 6))
sns.histplot(beta_df, bins=100, kde=True, color="#00A087B2", edgecolor="black", fill=True)
plt.axvline(p_hat, color="black", linestyle="--", linewidth=1.5)
plt.xlabel("Response Rate (p)")
plt.ylabel("Density")
plt.title("Prior Distribution")
plt.grid(True)
plt.show()
It’s worth noting that this Beta prior is skewed, with most of its mass at lower response rates and a longer right tail. This suggests that lower response rates are considered more likely a priori than higher ones, a common pattern in scenarios with limited customer engagement or market saturation.
Posterior
Bayes rule tells you how you should update your beliefs after you see some data:
\textrm{posterior} \propto \textrm{likelihood} \times \textrm{prior}
We have already assumed a prior; now we run the experiment and observe the data. With this, we can update our prior belief by calculating the posterior. Our observed experiment had the following characteristics:
Code
print('Number Emails sent:', df.mailing.sum() )
print('Number of responses:', df[df['mailing'] == 1].respmail.sum() )
print('Response Rate', df[df['mailing'] == 1].respmail.mean().round(4)*100 )
Number Emails sent: 4952
Number of responses: 616.0
Response Rate 12.44
If we observe s responses in n observations, we can update our prior using the following result:

f(p) \propto p^{a+s-1}(1-p)^{b+n-s-1} \;\Longrightarrow\; p \mid \text{data} \sim \text{Beta}(a+s,\, b+n-s)
# Define parameters
B = 10000
n = df.mailing.sum()  # number in test sample
s = df[df['mailing'] == 1].respmail.sum()  # number of responses

post_a = prior_a + s
post_b = prior_b + n - s
from scipy.stats import beta

# Calculate mean posterior response rate
p_hat1 = post_a / (post_a + post_b)

# Print mean response rate
print("The Posterior Mean response rate is", round(p_hat1, 3))

# Calculate prior and posterior confidence intervals
prior_ci = beta.ppf([0.025, 0.975], prior_a, prior_b)
posterior_ci = beta.ppf([0.025, 0.975], post_a, post_b)

# Print confidence intervals
print("Prior Response Rate 95% CI:", prior_ci.round(3))
print("Posterior Response Rate 95% CI:", posterior_ci.round(3))
The Posterior Mean response rate is 0.124
Prior Response Rate 95% CI: [0.059 0.207]
Posterior Response Rate 95% CI: [0.115 0.134]
Code
# Generate posterior density samples
posterior_density = np.random.beta(a=post_a, b=post_b, size=B)

# Plot using Seaborn
plt.figure(figsize=(8, 6))
sns.histplot(posterior_density, bins=100, kde=True, color="#00A087B2", edgecolor="black", fill=True)
plt.axvline(p_hat1, color="black", linestyle="--", linewidth=1.5)
plt.xlabel("Response Rate (p)")
plt.ylabel("Density")
plt.title("Posterior Distribution")
plt.grid(True)
plt.show()
The Bayesian approach provides a posterior mean response rate of 0.124, along with a 95% confidence interval (CI) for the response rate ranging from 0.115 to 0.134. In this case, the standard approaches reported similar confidence intervals.
Breakeven Point
Platforms often seek to determine whether the response rate from a campaign surpasses a minimum threshold necessary to cover the costs incurred. This minimum threshold is known as the breakeven point. For instance, let’s consider a scenario where the cost per mailing (c) amounts to $5.75, while the profits generated per response (m) stand at $50.
We can calculate the probability that the posterior response rate is above the breakeven point, which in this case is: p > \frac{c}{m} = \frac{5.75}{50} = 0.115
Equivalently, we can look at the probability of falling below the breakeven point, \Pr(p < 0.115). We estimate this probability by drawing from a Beta distribution with the posterior parameters obtained above.
c = 5.75  # cost per mailing
m = 50    # profit if respond

# Calculate breakeven point
brk = c / m
print("Breakeven point:", brk)
Breakeven point: 0.115
# Set random seed
np.random.seed(19312)

# Generate random deviates
post_draws = np.random.beta(post_a, post_b, size=B)

# Calculate probability of falling below the breakeven point
prob_passing = np.sum(post_draws < brk) / B

# Print probability
print("The Probability that the response rate is below the breakeven point is", round(prob_passing, 3))
The Probability that the response rate is below the breakeven point is 0.02
With only a 2% probability that the response rate falls below the breakeven point, the results point to a favorable outcome: the campaign’s effectiveness in generating responses looks promising relative to its cost.
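Since we already have posterior draws for p, we can also translate them into a posterior distribution of the expected profit per catalog mailed, m·p − c, instead of a simple pass/fail probability; this is a small illustrative extension that reuses the draws generated above:

# Posterior distribution of expected profit per catalog mailed: m * p - c
profit_draws = m * post_draws - c

print("Posterior mean profit per catalog:", profit_draws.mean().round(2))
print("95% interval for profit per catalog:", np.percentile(profit_draws, [2.5, 97.5]).round(2))
print("Probability of a positive expected profit:", (profit_draws > 0).mean().round(3))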
3. Continuous Case: A/B Testing
We are now ready to tackle our primary question:
Does the catalog campaign effectively drive purchases compared to the control group (Treated vs. Control)?
Before delving into the A/B testing analysis, it is good practice to build a balance table to check the comparability of our treatment and control groups across key covariates (see my previous post here):
Code
import numpy as np
import pandas as pd

def smd(feature, treatment):
    """Calculate the standard mean difference (SMD) of a feature between the
    treatment and control groups.

    Args:
        feature (pandas.Series): a column of a feature to calculate SMD for
        treatment (pandas.Series): a column that indicates whether a row is in
            the treatment group or not

    Returns:
        (float): The SMD of the feature
    """
    t = feature[treatment == 1]
    c = feature[treatment == 0]
    return (t.mean() - c.mean()) / np.sqrt(0.5 * (t.var() + c.var()))

def create_table_one(data, treatment_col, features):
    """Report balance in input features between the treatment and control groups.

    Args:
        data (pandas.DataFrame): total or matched sample data
        treatment_col (str): the column name for the treatment
        features (list of str): the column names of features

    Returns:
        (pandas.DataFrame): A table with the means and standard deviations in
            the treatment and control groups, and the SMD between the two groups
            for the features.
    """
    t1 = pd.pivot_table(
        data[features + [treatment_col]],
        columns=treatment_col,
        aggfunc=[lambda x: "{:.2f} ({:.2f})".format(x.mean(), x.std())],
    )
    t1.columns = t1.columns.droplevel(level=0)
    t1["SMD"] = data[features].apply(lambda x: smd(x, data[treatment_col])).round(4)

    n_row = pd.pivot_table(
        data[[features[0], treatment_col]], columns=treatment_col, aggfunc=["count"]
    )
    n_row.columns = n_row.columns.droplevel(level=0)
    n_row["SMD"] = ""
    n_row.index = ["n"]

    t1 = pd.concat([n_row, t1], axis=0)
    t1.columns.name = ""
    t1.columns = ["Control", "Treatment", "SMD"]
    t1.index.name = "Variable"

    return t1

my_cov = ['gender', 'firstpur', 'age_class', 'single', 'student']
create_table_one(df, 'mailing', my_cov)
| Variable  | Control       | Treatment     | SMD     |
|-----------|---------------|---------------|---------|
| n         | 5012          | 4952          |         |
| age_class | 0.85 (1.10)   | 0.85 (1.10)   | -0.0031 |
| firstpur  | 26.33 (18.15) | 26.62 (18.28) | 0.0156  |
| gender    | 0.30 (0.46)   | 0.30 (0.46)   | 0.0096  |
| single    | 0.31 (0.46)   | 0.30 (0.46)   | -0.0064 |
| student   | 0.53 (0.50)   | 0.53 (0.50)   | -0.0033 |
From Table 2 we observe that the covariates are well balanced across groups, indicating successful randomization. All good so far.
3.1. Classical Approach: Linear Regression
As explored in my previous post on A/B testing, we can quantify the impact of our catalog campaign using a standard linear regression model. In this case, I would like to know the impact of the campaign on the purchase value (M) of customers in the E-Beer store during a given period:
Y_{i} = \alpha + \beta D_i + \varepsilon_i
where D_i is the treatment indicator (equal to 1 for treated units and 0 for control units), \alpha is the intercept term and \beta is our coefficient of interest representing the Average Treatment Effect.
import statsmodels.formula.api as smf

smf.ols('M ~ mailing', data=df).fit().summary().tables[1]
|           | coef    | std err | t      | P>|t| | [0.025 | 0.975] |
|-----------|---------|---------|--------|-------|--------|--------|
| Intercept | 92.2839 | 1.066   | 86.589 | 0.000 | 90.195 | 94.373 |
| mailing   | 2.9149  | 1.512   | 1.928  | 0.054 | -0.049 | 5.878  |
According to Table 3, while the coefficient on the treatment variable (mailing) suggests a positive impact of the catalog campaign, the effect does not achieve statistical significance. The 95% confidence interval for the coefficient ranges from -0.049 to 5.878, indicating substantial uncertainty surrounding the estimated treatment effect.
Can we do better? Let’s leverage information from previous tests with the Bayesian framework.
3.2. Bayesian Approach: Uninformative Prior
Prior
When choosing a prior, one analytically appealing restriction is to have a prior distribution such that the posterior belongs to the same family (a so-called conjugate prior). For example, with a continuous dependent variable (M), I can assume the mean purchase value in each group is normally distributed a priori, and I would like it to remain normally distributed after incorporating the information contained in the data.
Now, before observing any data, I knew nothing about how much people might spend on beers online. They might spend 2 euros, or they might spend 200 euros. Formally, I can describe my weakly informative prior beliefs with a prior distribution:
\textrm{mean Purchase-Value by group} \sim N(0, 100^2)
Here is what the distribution looks like:
Code
# Prior parameters
prior_mean = 0
prior_sd = 100

# Generate x-values for plotting
xx = np.linspace(-300, 300, num=1000)

# Calculate prior density
prior_density = (1 / (prior_sd * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((xx - prior_mean) / prior_sd) ** 2)

# Plot prior density using Seaborn
plt.figure(figsize=(10, 6))
sns.lineplot(x=xx, y=prior_density, color='#00A087B2')
plt.axvline(prior_mean, color="black", linestyle="--", linewidth=1.5)
plt.title('Prior Density')
plt.xlabel('Mean Purchase Value (M)')
plt.ylabel('Density')
plt.grid(True)
plt.show()
Posterior
Then Bayes’ rule tells us that the posterior distribution of the mean purchase value for each group should be:

\textrm{mean Purchase-Value by group} \sim \mathcal{N}\left(\mu, \sigma^2\right)

where

\sigma^2 = \left(\frac{1}{\sigma_{0}^2} + \frac{n}{s^2}\right)^{-1}

and

\mu = \sigma^2 \left(\frac{\mu_0}{\sigma_{0}^2} + \frac{n \bar{y}}{s^2}\right)

with \mu_0 and \sigma_0 the prior mean and standard deviation, n the group size, \bar{y} the group sample mean, and s the standard deviation of the data.
# Posterior parameters
n_A = df.shape[0] - df.mailing.sum()
n_B = df.mailing.sum()
s = df.M.std()  # Standard deviation of the data
ybar_A = df[df['mailing'] == 0].M.mean()
ybar_B = df[df['mailing'] == 1].M.mean()

# Posterior standard deviation
posterior_sd_A = (1 / (prior_sd ** 2) + n_A / (s ** 2)) ** (-0.5)
posterior_sd_B = (1 / (prior_sd ** 2) + n_B / (s ** 2)) ** (-0.5)

# Posterior mean
posterior_mean_A = posterior_sd_A ** 2 * ((prior_mean / (prior_sd ** 2)) + (n_A * ybar_A / (s ** 2)))
posterior_mean_B = posterior_sd_B ** 2 * ((prior_mean / (prior_sd ** 2)) + (n_B * ybar_B / (s ** 2)))

print("Posterior Mean Estimate for Group A:", posterior_mean_A.round(1))
print("Posterior Mean Estimate for Group B:", posterior_mean_B.round(1))
Posterior Mean Estimate for Group A: 92.3
Posterior Mean Estimate for Group B: 95.2
We can plot each posterior distribution to compare:
Code
# Plot posterior distributions
xx = np.linspace(min(df['M']), max(df['M']), 1000)
posterior_distribution_A = (1 / (posterior_sd_A * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((xx - posterior_mean_A) / posterior_sd_A) ** 2)
posterior_distribution_B = (1 / (posterior_sd_B * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((xx - posterior_mean_B) / posterior_sd_B) ** 2)

plt.figure(figsize=(10, 6))
sns.lineplot(x=xx, y=posterior_distribution_A, label='Group A', color='#440154')
sns.lineplot(x=xx, y=posterior_distribution_B, label='Group B', color='#00A087B2')
plt.xlim(85, 102)  # Limit x-axis range

plt.title('Posterior Distributions of Monetary Value (M)')
plt.xlabel('Monetary Value (M)')
plt.ylabel('Density')

plt.legend()
plt.grid(True)
plt.show()
Excellent! With the Bayesian approach, we’ve estimated the posterior mean purchase value for Group A to be $92.3 and for Group B to be $95.2. Once we have these distributions, we can compute the posterior distribution of the difference in means, the probability that the mean of B is greater than the mean of A, and confidence intervals:
from scipy.stats import norm

# Calculate the difference in means
post_mean_diff = posterior_mean_B - posterior_mean_A
print("Posterior Average Treatment Effect:", post_mean_diff.round(3))

# Calculate the standard deviation of the difference
post_sd_diff = np.sqrt(posterior_sd_B ** 2 + posterior_sd_A ** 2)

# Compute the probability
prob = 1 - norm.cdf(0, loc=post_mean_diff, scale=post_sd_diff)
print("Probability that mean of B is greater than mean of A:", prob.round(3))

# Calculate the 95% confidence interval
ci_lower = norm.ppf(0.025, loc=post_mean_diff, scale=post_sd_diff)
ci_upper = norm.ppf(0.975, loc=post_mean_diff, scale=post_sd_diff)
print("95% Confidence Interval for the Difference in Means:", (ci_lower.round(3), ci_upper.round(3)))
Posterior Average Treatment Effect: 2.914
Probability that mean of B is greater than mean of A: 0.973
95% Confidence Interval for the Difference in Means: (-0.049, 5.878)
On average, the treatment (catalog campaign) leads to an increase in purchase value by approximately $2.914 compared to the control group. The confidence interval for the difference in means ranges from -0.049 to 5.878. Since this interval includes zero, we cannot conclude with 95% confidence that there is a statistically significant difference in purchase values between the treated and control groups.
This suggests that the catalog campaign may not have a significant impact on purchase behavior compared to the control group, based on the observed data.
Code
# Generate x-values for plotting
xx_diff = np.linspace(-2, 7, num=100)

# Calculate the posterior density of the difference
post_density_diff = norm.pdf(xx_diff, loc=post_mean_diff, scale=post_sd_diff)

# Plot the posterior distribution of the difference using Seaborn
plt.figure(figsize=(10, 6))
sns.lineplot(x=xx_diff, y=post_density_diff, color='#00A087B2')
plt.title('Posterior Distribution of Difference in Means (B - A)')
plt.axvline(x=0, color='black', linestyle='--', label='Zero difference')
plt.xlabel('Difference in Means')
plt.ylabel('Density')
plt.grid(True)

# Shade the area to the right of 0 to visualize the probability
plt.fill_between(xx_diff, post_density_diff, where=(xx_diff >= 0), color='#00A087B2', alpha=0.2)

plt.show()
3.3. Bayesian Approach: Student-t Prior
What if we have a more informative prior? What if past experiments tell us, for example, that the distribution is actually closer to a Student-t distribution?
While we previously assumed a normal distribution for our prior, which is appropriate for continuous data with a large sample size, we can also explore the Student-t distribution, which is more robust to outliers and has heavier tails than the normal distribution. This is a setting that occurs often in industry.
from scipy.stats import t

# Assuming you have a DataFrame df with columns 'mailing' and 'M'
# Filter the data for group A and group B
group_A = df[df['mailing'] == 0]['M']
group_B = df[df['mailing'] == 1]['M']

# Prior parameters for Student's t-distribution
prior_df = 1.3      # degrees of freedom
prior_loc = 0       # mean
prior_scale = 100   # scale (similar to standard deviation)

# Likelihood function (assuming a normal likelihood)
def likelihood(data, mean):
    sd = data.std()  # Standard deviation from the data
    return np.prod(1 / (sd * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((data - mean) / sd) ** 2))

# Posterior parameters for group A
posterior_mean_A = group_A.mean()
posterior_sd_A = np.sqrt(prior_scale ** 2 / (prior_scale ** 2 + (group_A.std() ** 2) / len(group_A)))

# Posterior parameters for group B
posterior_mean_B = group_B.mean()
posterior_sd_B = np.sqrt(prior_scale ** 2 / (prior_scale ** 2 + (group_B.std() ** 2) / len(group_B)))
Code
from scipy.stats import norm

# Prior parameters
prior_mean = 0
prior_sd = 100

# Generate x-values for plotting
xx = np.linspace(-300, 300, num=1000)

# Calculate prior density
prior_density = (1 / (prior_sd * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((xx - prior_mean) / prior_sd) ** 2)

# Calculate the difference in means
post_mean_diff = posterior_mean_B - posterior_mean_A

# Calculate the standard deviation of the difference
post_sd_diff = np.sqrt(posterior_sd_B ** 2 + posterior_sd_A ** 2)

# Generate x-values for plotting
xx_diff = np.linspace(-2, 7, num=100)

# Calculate the posterior density of the difference using Normal prior
post_density_diff_normal = norm.pdf(xx_diff, loc=post_mean_diff, scale=post_sd_diff)

# Plot prior density and posterior distribution of the difference using Seaborn
plt.figure(figsize=(10, 6))

# Plot prior density
sns.lineplot(x=xx, y=prior_density, color='#00A087B2', label='Normal Prior')
plt.axvline(prior_mean, color="black", linestyle="--", linewidth=1.5)

# Plot posterior distribution of the difference
sns.lineplot(x=xx_diff, y=post_density_diff_normal, color='#440154', label='t-Student Prior')

plt.title('Comparison of Distributions')
plt.xlabel('Mean Purchase Value (M)')
plt.ylabel('Density')
plt.grid(True)

plt.legend()
plt.show()
As we can see from this plot, the distributions have very different shapes and scales. Let’s estimate our confidence intervals:
print("Posterior Average Treatment Effect:", post_mean_diff.round(3))
# Calculate the 95% confidence interval using the t-distribution
= t.ppf(0.025, df=(len(group_A) + len(group_B)) - 2, loc=post_mean_diff, scale=post_sd_diff)
ci_lower = t.ppf(0.975, df=(len(group_A) + len(group_B)) - 2, loc=post_mean_diff, scale=post_sd_diff)
ci_upper print("95% Confidence Interval for the Difference in Means:", (ci_lower.round(3), ci_upper.round(3)))
Posterior Average Treatment Effect: 2.915
95% Confidence Interval for the Difference in Means: (0.143, 5.687)
Code
from scipy.stats import t

# Generate x-values for plotting
xx_diff = np.linspace(-2, 7, num=100)

# Calculate the posterior density of the difference using Student's t-distribution
post_density_diff = t.pdf(xx_diff, df=(len(group_A) + len(group_B)) - 2, loc=post_mean_diff, scale=post_sd_diff)

# Plot the posterior distribution of the difference using Seaborn
plt.figure(figsize=(10, 6))
sns.lineplot(x=xx_diff, y=post_density_diff, color='#00A087B2')
plt.title('Posterior Distribution of Difference in Means (B - A) with Student-t Prior')
plt.xlabel('Difference in Means')
plt.ylabel('Density')
plt.grid(True)

# Shade the area to the right of 0 to visualize the probability
plt.fill_between(xx_diff, post_density_diff, where=(xx_diff >= 0), color='#00A087B2', alpha=0.2)

# Add a vertical line at the point where the difference in means is zero
plt.axvline(x=0, color='black', linestyle='--', label='Zero difference')

# Add confidence interval lines
plt.axvline(x=ci_lower, color='blue', linestyle='--', label='95% CI')
plt.axvline(x=ci_upper, color='blue', linestyle='--')

plt.legend()
plt.show()
At first glance, the Posterior Average Treatment Effect (ATE) appears to be similar to our previous findings. But now, these results are statistically significant. Unlike our previous analysis, the 95% Confidence Interval for the Difference in Means now ranges from 0.143 to 5.687. The change in significance might be attributed to the adoption of a more informative prior distribution, which has provided greater precision in our estimates.
4. Conclusions
In this post, we’ve explored Bayesian statistics as a way to leverage information from previous A/B tests and improve the inferences we draw about our parameters of interest. We focused our attention on how to choose appropriate priors and on their influence on precision. While we only touched on it briefly, the choice of the prior’s distributional family is just as important as the fine-tuning of its parameters. Indeed, the shape of the prior distribution exerts a significant influence on our inference, particularly in scenarios characterized by skewed distributions.