# Difference-in-differences method

Category: Statistics
Published on May 13, 2023

# Introduction

The difference-in-differences (DiD) method is a quasi-experimental research design that is used to estimate the causal effect of a treatment on an outcome.

The method compares two groups of units: a treatment group that received the treatment and a control group that did not. It then compares how the outcomes of the two groups change over time.

If the treatment group experiences a greater change in the outcome than the control group, then the DiD method can be used to infer that the treatment caused the change in the outcome.

# Calculation

The DiD method assumes that the two groups would have experienced similar changes in the outcome if the treatment had not been implemented. This is sometimes referred to as the "parallel trends assumption". In other words, the only difference between the two groups should be the treatment.

To assess this difference, a regression of the following form can be used:

Y = β₀ + β₁ · treated + β₂ · after + β₃ · (treated × after) + ε

The impact is assessed through the interaction coefficient β₃: if it is not significant, the hypothesis of parallel trends cannot be rejected. Conversely, if the coefficient is significant, it can be inferred that the treatment has an effect, positive or negative.
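The regression's interaction coefficient estimates exactly the simple "double difference" of group means, which can be computed by hand. A minimal sketch, with made-up numbers for illustration:

```python
# Hypothetical group means, before and after treatment (illustrative values only)
y_treated_before, y_treated_after = 0.28, 0.25
y_control_before, y_control_after = 0.26, 0.20

# DiD estimate: change in the treatment group minus change in the control group
did_estimate = (y_treated_after - y_treated_before) - (y_control_after - y_control_before)
print(round(did_estimate, 4))  # 0.03
```

Here both groups declined, but the treatment group declined by 3 points less than the control group, so the estimated treatment effect is +0.03.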

# When should DiD be used?

The DiD method can be used as a substitute when it is not possible to conduct a randomized controlled trial. However, there are two requirements:

1. Treatment and control group should be similar before the treatment is implemented.
2. Treatment should be implemented at the same time for all units in the treatment group.

Another, more advanced option, which can be thought of as an evolution of the DiD method, is the Synthetic Control method.

# Python implementation

1. Let’s start with example data from a website experiment, where each user is randomly assigned to a control or target group.
2. We aggregate the data by date and compute the daily `conversion_rate` of each group, simply as the number of conversions over the number of sessions.

We need a numeric `day` column rather than a datetime: the rank of the day since the beginning of the available data, including the pre-experiment period.

We also create two binary columns: `is_target`, an indicator for the target group, and `is_after`, which indicates the period after the experiment started.
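As a sketch, these columns could be built from aggregated daily data along the following lines. The raw column names, dates, and values here are assumptions for illustration, not the article's actual dataset:

```python
import pandas as pd

# Hypothetical daily data per group (dates and values are made up)
raw = pd.DataFrame({
    'date': pd.to_datetime(['2023-04-01', '2023-04-01', '2023-04-02', '2023-04-02']),
    'group': ['control', 'target', 'control', 'target'],
    'sessions': [584, 610, 378, 402],
    'conversions': [144, 150, 112, 130],
})
experiment_start = pd.Timestamp('2023-04-02')  # assumed experiment start date

df_did = raw.copy()
df_did['conversion_rate'] = df_did['conversions'] / df_did['sessions']
# Rank of the day since the beginning of the available data (0-based)
df_did['day'] = df_did['date'].rank(method='dense').astype(int) - 1
# Binary indicators for group and period
df_did['is_target'] = (df_did['group'] == 'target').astype(int)
df_did['is_after'] = (df_did['date'] >= experiment_start).astype(int)
```

Using a dense rank of the date makes `day` a consecutive integer even when some calendar days are missing from the data.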

```python
# Example data
df_did
```

```
 day    group  is_target  is_after  sessions  conversions  conversion_rate
   0  control          0         0       584          373         0.246575
   1  control          0         0       378          332         0.314815
   2  control          0         0       331          290         0.280967
   3  control          0         0       445          376         0.328090
   4  control          0         0       645          479         0.289922
 ...      ...        ...       ...       ...          ...              ...
  57   target          1         1       654          331         0.191131
  58   target          1         1       568          364         0.285211
  59   target          1         1       763          422         0.245085
  60   target          1         1      1054          537         0.245731
  61   target          1         1      1885          735         0.187798
```

3. We can plot the conversion rates of each group before and after the start of the experiment, as well as regression lines for each period and group. This gives a sense of a possible divergence between groups after the experiment start. In this example, it’s possible but not visually obvious.
4. The corresponding code (imports added for completeness):

   ```python
   # Plot conversion rate by group and period (before/after)
   import matplotlib.pyplot as plt
   import seaborn as sns

   fig, ax = plt.subplots(figsize=(6, 8))
   sns.lineplot(data=df_did, x='day', y='conversion_rate', hue='group')
   sns.regplot(
       data=df_did.loc[lambda x: (x['is_target'] == 0) & (x['is_after'] == 0)],
       x='day', y='conversion_rate', marker='.', color='steelblue')
   sns.regplot(
       data=df_did.loc[lambda x: (x['is_target'] == 1) & (x['is_after'] == 0)],
       x='day', y='conversion_rate', marker='.', color='peru')
   sns.regplot(
       data=df_did.loc[lambda x: (x['is_target'] == 0) & (x['is_after'] == 1)],
       x='day', y='conversion_rate', marker='.', color='steelblue')
   sns.regplot(
       data=df_did.loc[lambda x: (x['is_target'] == 1) & (x['is_after'] == 1)],
       x='day', y='conversion_rate', marker='.', color='peru')
   plt.show()
   ```

5. Finally, we apply the actual difference-in-differences test by fitting a linear regression and looking at the coefficients.
6. The corresponding code:

   ```python
   # Import library
   import statsmodels.formula.api as smf

   # Fit DiD model
   did_model = smf.ols(
       'conversion_rate ~ is_target * is_after',
       data=df_did.reset_index()
   )
   results = did_model.fit()

   # Display results
   print(results.summary())
   ```
```
                            OLS Regression Results
================================================================================
Dep. Variable:          conversion_rate   R-squared:                       0.336
Method:                   Least Squares   F-statistic:                     20.26
Date:                  Sat, 13 May 2023   Prob (F-statistic):           1.09e-10
Time:                          14:50:56   Log-Likelihood:                 202.05
No. Observations:                   124   AIC:                            -396.1
Df Residuals:                       120   BIC:                            -384.8
Df Model:                             3
Covariance Type:              nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept           0.2559      0.009     27.578      0.000       0.238       0.274
is_target           0.0289      0.013      2.201      0.030       0.003       0.055
is_after           -0.0634      0.012     -5.136      0.000      -0.088      -0.039
is_target:is_after  0.0255      0.017      1.460      0.147      -0.009       0.060
==============================================================================
Omnibus:                        3.972   Durbin-Watson:                   1.186
Prob(Omnibus):                  0.137   Jarque-Bera (JB):                3.736
Skew:                          -0.425   Prob(JB):                        0.154
Kurtosis:                       3.016   Cond. No.                         7.33
==============================================================================

Notes:
Standard Errors assume that the covariance matrix of the errors is correctly specified.
```
What we’re looking at is the p-value on the `is_target:is_after` line, which indicates whether the interaction coefficient differs from zero, i.e. whether there is a significant divergence between the trends of the two groups after the experiment started.

In this case, the p-value is 0.147 and the 95% confidence interval includes 0, so the coefficient is not significant. We cannot conclude that the treatment had an effect.
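Instead of reading the printed summary, the interaction coefficient, its p-value, and its confidence interval can also be pulled out programmatically from the fitted results. A self-contained sketch on simulated data (the dataset below is made up with a known +0.02 interaction effect, it is not the article's data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated dataset with a built-in interaction effect of +0.02
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    'is_target': rng.integers(0, 2, n),
    'is_after': rng.integers(0, 2, n),
})
df['conversion_rate'] = (
    0.25
    + 0.03 * df['is_target']
    - 0.06 * df['is_after']
    + 0.02 * df['is_target'] * df['is_after']
    + rng.normal(0, 0.01, n)
)

results = smf.ols('conversion_rate ~ is_target * is_after', data=df).fit()

# Access the interaction term directly instead of parsing the summary table
coef = results.params['is_target:is_after']
pval = results.pvalues['is_target:is_after']
ci_low, ci_high = results.conf_int().loc['is_target:is_after']
```

With this much data and little noise, `coef` lands close to the true 0.02 and the p-value is far below 0.05; on the article's noisier real data, the same lookup returns the insignificant 0.0255 seen above.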