📏

Calculate the sample size for A/B testing

Category
Statistics
Published on
April 28, 2023

General approximation

For a 95% confidence level and 80% power, the sample size per group can be approximated by the formula:

$$n \approx \frac{16 \sigma^2}{d^2}$$

where:

  • z is the z-score: the constant 16 comes from $2(z_{\alpha/2} + z_\beta)^2 = 2(1.96 + 0.84)^2 \approx 15.7$
  • σ is the standard deviation
    • which is $\sqrt{p(1-p)}$ for a binomial distribution
  • d is the effect size
    • which is $\mu_1 - \mu_0$ for a continuous metric
    • and $p_1 - p_0$ for a binomial distribution
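
As a quick check on where the constant 16 comes from, here is a minimal sketch comparing the rule of thumb with the exact constant. It assumes α = 0.05 (two-sided) and 80% power. The function names `exact_constant` and `rule_of_16` are illustrative, and the standard library's `statistics.NormalDist` stands in for a normal quantile function.

```python
from statistics import NormalDist


def exact_constant(alpha=0.05, power=0.8):
    """Return 2 * (z_{alpha/2} + z_beta)^2, the constant that 16 approximates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.8
    return 2 * (z_alpha + z_beta) ** 2


def rule_of_16(variance, d):
    """Approximate sample size per group: n ≈ 16 * sigma^2 / d^2."""
    return 16 * variance / d ** 2


if __name__ == "__main__":
    print("Exact constant: {:.2f}".format(exact_constant()))
    print("Rule-of-16 sample size: {:.0f}".format(rule_of_16(variance=5, d=0.05)))
```

The exact constant is about 15.7, so rounding it up to 16 overestimates the required sample size by roughly 2%.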

For continuous metrics

Formula

Applied to continuous metrics, the formula is:

$$n = 2 \left( \frac{(z_{\alpha/2} + z_\beta) \sigma}{d} \right)^2$$

where

  • σ is the standard deviation
  • $z_{\alpha/2}$ and $z_\beta$ are the z-scores associated with α and 1 − β
  • d is the minimum detectable effect in absolute value, e.g. $\mu_1 - \mu_0$
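
The factor of 2 can be traced back to the variance of a difference between two independent sample means. A short sketch of the derivation, assuming two groups of equal size n, a common standard deviation σ, and a normal approximation:

```latex
\begin{aligned}
\operatorname{Var}(\bar{x}_1 - \bar{x}_0) &= \frac{\sigma^2}{n} + \frac{\sigma^2}{n} = \frac{2\sigma^2}{n} \\
d &= (z_{\alpha/2} + z_\beta) \sqrt{\frac{2\sigma^2}{n}} \\
n &= \frac{2 (z_{\alpha/2} + z_\beta)^2 \sigma^2}{d^2}
   = 2 \left( \frac{(z_{\alpha/2} + z_\beta)\, \sigma}{d} \right)^2
\end{aligned}
```

Here n is the sample size of each group, so the whole experiment needs 2n observations.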

Python implementation

# Function to get minimum sample size per group (two-sided test)
import numpy as np
import scipy.stats as st

def sample_size_cont(mde_abs, variance, power=.8, alpha=.05):
    z_alpha = st.norm.ppf(1-alpha/2)   # z-score for the two-sided significance level
    z_beta = st.norm.ppf(power)        # z-score for the desired power (1 - beta)
    result = 2*((z_alpha + z_beta) * np.sqrt(variance) / mde_abs)**2
    print("Sample size: {:.0f}".format(result))

Let’s apply our function for an MDE of 0.05 and a (pooled) variance of 5:

mde_abs = .05   # Minimum detectable effect in absolute value
variance = 5    # Pooled variance
sample_size_cont(mde_abs, variance)
Sample size: 31396

This can also be implemented with the statsmodels TTestIndPower().solve_power() method:

# Statsmodels equivalent function
import numpy as np
import statsmodels.stats.power as smpr

effect_size = mde_abs / np.sqrt(variance)
smresult = smpr.TTestIndPower().solve_power(
    effect_size=effect_size, 
    power=.8, 
    alpha=.05, 
    ratio=1, 
    alternative='two-sided'
)
print("Sample size: {:.0f}".format(smresult))
Sample size: 31396

For proportions

Formula

Applied to proportions, the minimum sample size is calculated as:

$$n = 2 \left( \frac{(z_{\alpha/2} + z_\beta) \sqrt{p(1-p)}}{d} \right)^2$$

where

  • p is the pooled proportion, i.e. total number of successes over total number of observations
  • $z_{\alpha/2}$ and $z_\beta$ are the z-scores associated with α and 1 − β
  • d is the minimum detectable effect in absolute value, e.g. $p_1 - p_0$

Python implementation

# Function to get minimum sample size
import scipy.stats as st

def sample_size_ratio(p, mde_abs, alpha=.05, power=.8):
    t_alpha = st.norm.ppf(1-alpha/2)
    t_beta = st.norm.ppf(power)
    result = 2 * (((t_alpha + t_beta)**2 * p*(1-p))/mde_abs**2)
    print("Sample size: {:.0f}".format(result))

Let’s try this with a proportion p of 0.20 and an absolute MDE of 0.01:

p = .20     # Baseline rate
mde = .01   # Absolute minimum detectable effect
sample_size_ratio(p=p, mde_abs=mde)
Sample size: 25116

Again, it can be implemented with statsmodels:

# Statsmodels equivalent function
import statsmodels.stats.power as smpr
import statsmodels.stats.proportion as smp

effect_size = smp.proportion_effectsize(p, p+mde)
smresult = smpr.NormalIndPower().solve_power(
    effect_size=effect_size, 
    power=.8, 
    alpha=.05, 
    ratio=1, 
    alternative='two-sided'
)
print("Sample size: {:.0f}".format(smresult))
Sample size: 25580
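
The gap between 25116 and 25580 comes from the effect size definition: proportion_effectsize returns Cohen's h, which is based on an arcsine transformation, rather than d / √(p(1 − p)). Below is a minimal sketch of that calculation using only the standard library. The names `cohen_h` and `sample_size_from_h` are illustrative and are not statsmodels functions.

```python
from math import asin, sqrt
from statistics import NormalDist


def cohen_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def sample_size_from_h(h, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided z-test with standardized effect h."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) / h) ** 2


if __name__ == "__main__":
    h = cohen_h(0.20, 0.21)  # same baseline and absolute MDE as above
    print("Cohen's h: {:.4f}".format(h))
    print("Sample size: {:.0f}".format(sample_size_from_h(h)))
```

With the same baseline of 0.20 and absolute MDE of 0.01, this should land close to the statsmodels figure rather than the closed-form one.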

When randomization unit ≠ analysis unit

When the randomization unit is different from the analysis unit, observations within the same randomization unit are correlated. The minimum required sample size can then be estimated by inflating the usual formula by the Design Effect, which depends on the ratio of the number of analysis units to the number of randomization units:

$$n = 2 \cdot \text{DE} \cdot \left( \frac{(z_{\alpha/2} + z_\beta) \cdot \sigma}{d} \right)^2$$

where:

  • $z_{\alpha/2}$ is the 1 − α/2 quantile of the standard normal distribution. For example, if α = 0.05, then $z_{\alpha/2}$ = 1.96.
  • $z_\beta$ is the 1 − β quantile of the standard normal distribution. For example, if β = 0.20, then $z_\beta$ = 0.84.
  • $\sigma^2$ is the variance of the outcome variable
  • d is the effect size you want to detect. This is the difference in means between the two groups that you want to be able to detect with your test.
  • DE is the Design Effect

The Design Effect DE is calculated as:

$$DE = 1 + (m - 1) \times ICC$$

where:

  • m is the average cluster size (number of analysis units per randomization unit)
  • ICC is the Intraclass Correlation Coefficient
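
To make the adjustment concrete, here is a minimal sketch under the assumption that the design effect multiplies the sample size given by the simple formula (the standard cluster-sampling inflation). The function names and the example values of m and ICC are hypothetical. In practice the ICC would have to be estimated, for instance from historical data.

```python
from statistics import NormalDist


def design_effect(m, icc):
    """DE = 1 + (m - 1) * ICC, with m the average cluster size."""
    return 1 + (m - 1) * icc


def clustered_sample_size(mde_abs, variance, m, icc, alpha=0.05, power=0.8):
    """Analysis units per group, inflating the simple formula by DE."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_simple = 2 * (z_alpha + z_beta) ** 2 * variance / mde_abs ** 2
    return design_effect(m, icc) * n_simple


if __name__ == "__main__":
    # Hypothetical inputs: 10 analysis units per randomization unit, ICC of 0.05
    n = clustered_sample_size(mde_abs=0.05, variance=5, m=10, icc=0.05)
    print("Design effect: {:.2f}".format(design_effect(10, 0.05)))
    print("Analysis units per group: {:.0f}".format(n))
```

When m = 1 or ICC = 0 the design effect equals 1 and the result reduces to the usual formula.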

👉 Since computing the sample size for this use case is not trivial, you can fall back on the usual simple formula and treat the result as the minimum required number of randomization units. This is conservative: each randomization unit contributes m analysis units, and DE ≤ m whenever ICC ≤ 1, so you are guaranteed to have at least the minimum required size.
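
A quick numerical check of why this fallback is conservative, assuming every randomization unit contributes the same number m of analysis units (the values below are hypothetical):

```python
def design_effect(m, icc):
    """DE = 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc


def fallback_is_conservative(n_simple, m, icc):
    """Check that the fallback collects at least the required analysis units."""
    collected = n_simple * m                      # n_simple randomization units, m analysis units each
    required = n_simple * design_effect(m, icc)   # design-effect-adjusted requirement
    return collected >= required


if __name__ == "__main__":
    for icc in (0.0, 0.1, 0.5, 1.0):
        print(icc, fallback_is_conservative(n_simple=31396, m=10, icc=icc))
```

The two quantities only coincide in the extreme case ICC = 1, where all analysis units within a randomization unit behave identically.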
