Difference-in-differences method

Category: Statistics
Published on: May 13, 2023

Introduction

The difference-in-differences (DiD) method is a quasi-experimental research design that is used to estimate the causal effect of a treatment on an outcome.

The method compares two groups of units: a treatment group that received the treatment and a control group that did not. It then compares how the outcomes of the two groups change over time.

If the treatment group experiences a greater change in the outcome than the control group, the DiD method can be used to infer that the treatment caused the extra change in the outcome.

Calculation

The DiD method assumes that the two groups would have experienced similar changes in the outcome if the treatment had not been implemented. This is referred to as the “parallel trends” assumption. In other words, the only difference between the two groups should be the treatment itself.

To assess this effect, a regression like the following can be used:

    Y = β₀ + β₁ · Treatment + β₂ · After + β₃ · (Treatment × After) + ε

where Treatment indicates units in the treatment group, After indicates the post-treatment period, and Y is the outcome.

👉 The impact is assessed through the interaction coefficient β₃: if it is not significant, the hypothesis of parallel trends after the treatment cannot be rejected, and no effect can be claimed. Conversely, if the coefficient is significant, it can be inferred that the treatment has an effect, positive or negative.
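For intuition, the same estimate can be computed by hand as a difference of differences of group means. A minimal sketch, with made-up numbers:

```python
# Minimal 2x2 difference-in-differences sketch (means are made up).
# Keys: (group, period) -> mean outcome.
means = {
    ("control", "before"): 0.26, ("control", "after"): 0.20,
    ("treatment", "before"): 0.28, ("treatment", "after"): 0.25,
}

# Change over time within each group
control_change = means[("control", "after")] - means[("control", "before")]        # -0.06
treatment_change = means[("treatment", "after")] - means[("treatment", "before")]  # -0.03

# DiD estimate: how much more (or less) the treatment group changed
did_estimate = treatment_change - control_change
print(round(did_estimate, 2))  # → 0.03
```

Here both groups declined, but the treatment group declined less; the DiD estimate attributes that gap (+0.03) to the treatment, under the parallel trends assumption.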

When should DiD be used?

The DiD method can be used as a substitute when it is not possible to conduct a randomized controlled trial. However, there are two requirements:

  1. The treatment and control groups should be similar before the treatment is implemented.
  2. The treatment should be implemented at the same time for all units in the treatment group.

Another more advanced option, that can be thought of as an evolution of the DiD method, is the Synthetic Control method.

Python implementation

  1. Let’s start with example data from a website experiment, where each user is randomly assigned to a Control or Target group.
  2. We aggregate the data by date and compute the daily conversion_rate of each group, simply as the number of conversions over the number of sessions.

    We need a numeric column day rather than a datetime: the rank of the day since the beginning of the available data, including the pre-experiment period.

    We also create two binary columns: is_target, an indicator for the target group, and is_after, which indicates the experiment period.

    # Example data
    df_did
     day    group  is_target  is_after  sessions  conversions  conversion_rate
       0  control          0         0       584          373         0.246575
       1  control          0         0       378          332         0.314815
       2  control          0         0       331          290         0.280967
       3  control          0         0       445          376         0.328090
       4  control          0         0       645          479         0.289922
     ...      ...        ...       ...       ...          ...              ...
      57   target          1         1       654          331         0.191131
      58   target          1         1       568          364         0.285211
      59   target          1         1       763          422         0.245085
      60   target          1         1      1054          537         0.245731
      61   target          1         1      1885          735         0.187798
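As a sketch, the aggregation described above could be done with pandas along these lines. The raw column names, dates, and figures here are made-up assumptions, not the actual experiment data:

```python
import pandas as pd

# Hypothetical raw data: one row per (date, group), with assumed column names.
sessions = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-01", "2023-03-01", "2023-03-02", "2023-03-02"]),
    "group": ["control", "target", "control", "target"],
    "sessions": [500, 520, 480, 510],
    "conversions": [120, 130, 115, 140],
})

experiment_start = pd.Timestamp("2023-03-02")  # assumed experiment start date

df_did = (
    sessions
    # Aggregate by date and group
    .groupby(["date", "group"], as_index=False)[["sessions", "conversions"]].sum()
    .assign(
        # Rank of the day since the beginning of available data (0-based)
        day=lambda x: x["date"].rank(method="dense").astype(int) - 1,
        # Indicator columns for the DiD regression
        is_target=lambda x: (x["group"] == "target").astype(int),
        is_after=lambda x: (x["date"] >= experiment_start).astype(int),
        # Daily conversion rate per group
        conversion_rate=lambda x: x["conversions"] / x["sessions"],
    )
)
print(df_did[["day", "group", "is_target", "is_after", "conversion_rate"]])
```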

  3. We can plot the conversion rates of each group before and after the start of the experiment, as well as regression lines for each period and group. This gives a sense of a possible divergence between groups after the experiment start. In this example, it’s possible but not visually obvious.
  4. # Plot conversion rate by group and period (before/after)
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 8))
    sns.lineplot(data=df_did, x='day', y='conversion_rate', hue='group')

    # One regression line per group and period
    sns.regplot(
        data=df_did.loc[lambda x: (x['is_target'] == 0) & (x['is_after'] == 0)],
        x='day', y='conversion_rate', marker='.', color='steelblue')
    sns.regplot(
        data=df_did.loc[lambda x: (x['is_target'] == 1) & (x['is_after'] == 0)],
        x='day', y='conversion_rate', marker='.', color='peru')
    sns.regplot(
        data=df_did.loc[lambda x: (x['is_target'] == 0) & (x['is_after'] == 1)],
        x='day', y='conversion_rate', marker='.', color='steelblue')
    sns.regplot(
        data=df_did.loc[lambda x: (x['is_target'] == 1) & (x['is_after'] == 1)],
        x='day', y='conversion_rate', marker='.', color='peru')

    [Figure: conversion rates by group, with regression lines for each group and period]
  5. Finally, we apply the actual difference-in-differences test by fitting a linear regression and looking at the coefficients.
  6. # Import library
    import statsmodels.formula.api as smf

    # Fit DiD model
    did_model = smf.ols(
        'conversion_rate ~ is_target * is_after',
        data=df_did.reset_index()
    )
    results = did_model.fit()

    # Display results
    print(results.summary())
    OLS Regression Results                             
    ================================================================================
    Dep. Variable:          conversion_rate   R-squared:                       0.336
    Model:                              OLS   Adj. R-squared:                  0.320
    Method:                   Least Squares   F-statistic:                     20.26
    Date:                  Sat, 13 May 2023   Prob (F-statistic):           1.09e-10
    Time:                          14:50:56   Log-Likelihood:                 202.05
    No. Observations:                   124   AIC:                            -396.1
    Df Residuals:                       120   BIC:                            -384.8
    Df Model:                             3                                         
    Covariance Type:              nonrobust                                         
    ===================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
    -----------------------------------------------------------------------------------
    Intercept           0.2559      0.009     27.578      0.000       0.238       0.274
    is_target           0.0289      0.013      2.201      0.030       0.003       0.055
    is_after           -0.0634      0.012     -5.136      0.000      -0.088      -0.039
    is_target:is_after  0.0255      0.017      1.460      0.147      -0.009       0.060
    ==============================================================================
    Omnibus:                        3.972   Durbin-Watson:                   1.186
    Prob(Omnibus):                  0.137   Jarque-Bera (JB):                3.736
    Skew:                          -0.425   Prob(JB):                        0.154
    Kurtosis:                       3.016   Cond. No.                         7.33
    ==============================================================================
    
    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    👉 What we’re looking at is the p-value on the is_target:is_after line, which indicates whether the interaction coefficient differs from zero, i.e. whether there is a significant divergence between the trends of the two groups after the experiment started.

    In this case, the p-value is 0.147 and the 95% confidence interval includes 0, so it’s not significant. We cannot conclude that the treatment had an effect.
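The estimate, p-value, and confidence interval of the interaction term can also be read programmatically from the fitted results instead of the printed summary. A self-contained sketch on synthetic data standing in for df_did (the data-generating numbers are assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for df_did, so the snippet runs on its own.
rng = np.random.default_rng(0)
rows = []
for day in range(62):
    for is_target in (0, 1):
        is_after = int(day >= 31)
        # Group and period effects, no true interaction, plus noise
        rate = 0.26 + 0.03 * is_target - 0.06 * is_after + rng.normal(0, 0.03)
        rows.append({"is_target": is_target, "is_after": is_after,
                     "conversion_rate": rate})
df = pd.DataFrame(rows)

results = smf.ols("conversion_rate ~ is_target * is_after", data=df).fit()

# Pull out the interaction term directly instead of reading the summary table
effect = results.params["is_target:is_after"]
p_value = results.pvalues["is_target:is_after"]
conf_low, conf_high = results.conf_int().loc["is_target:is_after"]
print(f"DiD estimate: {effect:.4f} (p={p_value:.3f}, "
      f"95% CI [{conf_low:.4f}, {conf_high:.4f}])")
```

The same three lines applied to the real fitted model would return the 0.0255 estimate and 0.147 p-value shown in the summary above.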

Resources