
Estimate customer lifetime value

Category: Data Science
Published on: November 29, 2021

Introduction

The Python package lifetimes helps perform recency/frequency analysis of customers and estimate their lifetime value. Two models are used here: the BG/NBD model, which predicts future purchases from recency and frequency without using monetary value, and the Gamma-Gamma model, which estimates the expected value per purchase and, combined with BG/NBD, the customer lifetime value.

Prepare data

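The initial dataset, with one row per transaction (order_id), needs to be transformed into one row per client with four columns: frequency (number of repeat purchases, counted in distinct days), recency (time between the first and the last purchase), T (age of the customer, i.e. time between the first purchase and the end of the observation period), and monetary_value (average value of the repeat purchases). All durations are expressed in days.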

# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import lifetimes
from lifetimes.plotting import plot_frequency_recency_matrix, plot_probability_alive_matrix

# Import list of transactions
df = pd.read_csv('./exclude/eshop_transactions.csv', parse_dates=['date'])
df.head()
   order_id                         client_id                 date  total
0      1007  fe1a03b2b0e021bbac0ea050a1d216a7  2017-06-21 05:14:35   39.0
1      1008  c9bdedeb9ac367f11c77dc5753b2b939  2017-06-21 05:30:19   39.0
2      1009  47f4ef0684413a1f5a429e251cbc7261  2017-06-21 07:58:02   75.0
3      1010  9a4ee011e9639539955423f54b7d46ec  2017-06-21 08:23:34   75.0
4      1011  490037f2983ad86fab82bcb309a33ee1  2017-06-21 08:30:59   42.0
# Transform data into one row per client (frequency, recency, T, monetary_value)
df_rfm = lifetimes.utils.summary_data_from_transaction_data(
    df, 'client_id', 'date',
    monetary_value_col='total',
    observation_period_end='2019-03-31',
)
df_rfm.head(7)
                                  frequency  recency      T  monetary_value
client_id
0020c81355c2057acfb019eb2c18a9a1        1.0    236.0  619.0            94.5
003b1637ce45163ccb9b0ff278464fd1        0.0      0.0  535.0             0.0
00444e8c950c199b637e5dcdfe401e0c        0.0      0.0  226.0             0.0
0055eb3281238fa3388a6db46d7d2d01        0.0      0.0  163.0             0.0
006e783513f582e9657d18334d06df49        1.0    103.0  634.0            53.0
007062115a3cbd21d0b52af5e42e10cf        0.0      0.0  299.0             0.0
0085fabd8952f67be3efdeaddfbf6d43        2.0    327.0  439.0            94.5
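
For reference, the four columns can also be recomputed by hand with pandas. The sketch below approximates what summary_data_from_transaction_data does with its default daily frequency, under these assumptions: timestamps are truncated to days, same-day orders are summed, and the first purchase is excluded from both frequency and monetary_value. The toy transactions and the function name rfm_by_hand are hypothetical, not part of lifetimes.

```python
import pandas as pd

# Hypothetical toy transaction log (not the e-shop dataset)
tx = pd.DataFrame({
    "client_id": ["a", "a", "a", "b"],
    "date": pd.to_datetime(["2019-01-01 10:00", "2019-01-11 09:00",
                            "2019-01-21 18:00", "2019-03-01 12:00"]),
    "total": [50.0, 30.0, 40.0, 20.0],
})
end = pd.Timestamp("2019-03-31")


def rfm_by_hand(tx, end):
    # Truncate timestamps to days and sum same-day orders (assumed daily periods)
    daily = (tx.assign(day=tx["date"].dt.normalize())
               .groupby(["client_id", "day"], as_index=False)["total"].sum())
    daily = daily[daily["day"] <= end]
    g = daily.groupby("client_id")
    first, last = g["day"].min(), g["day"].max()
    out = pd.DataFrame({
        "frequency": (g["day"].count() - 1).astype(float),  # repeat purchase days
        "recency": (last - first).dt.days.astype(float),    # first to last purchase
        "T": (end - first).dt.days.astype(float),           # first purchase to end
    })
    # Mean value of repeat purchases only (first purchase day excluded), 0 if none
    is_first = daily["day"].eq(daily.groupby("client_id")["day"].transform("min"))
    out["monetary_value"] = (daily[~is_first].groupby("client_id")["total"].mean()
                             .reindex(out.index).fillna(0.0))
    return out


print(rfm_by_hand(tx, end))
```

Client a buys on January 1, 11 and 21, so it gets frequency 2, recency 20, T 89 and monetary_value 35 (the mean of its two repeat orders). Client b, with a single order, gets 0, 0, 30 and 0, the same pattern as the one-time buyers in df_rfm.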

Recency and frequency with BG/NBD model

# Fit BG/NBD model
bgf = lifetimes.BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(df_rfm['frequency'], df_rfm['recency'], df_rfm['T'])
# Expected future purchases given frequency/recency
fig, ax = plt.subplots(figsize=(8, 6))
ax = plot_frequency_recency_matrix(bgf, T=365)
# Probability of being alive given frequency/recency
fig, ax = plt.subplots(figsize=(16, 6))
ax = plot_probability_alive_matrix(bgf)
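
The probability-alive matrix comes from a closed-form expression of the BG/NBD model (Fader, Hardie and Lee, 2005). A minimal NumPy sketch of that formula, using hypothetical parameter values rather than the ones fitted on the e-shop data, shows why customers with no repeat purchase always get a probability of 1: the model only lets a customer drop out right after a repeat purchase. With the fitted model, r, alpha, a and b should correspond to the values stored in bgf.params_.

```python
import numpy as np


def prob_alive(frequency, recency, T, r, alpha, a, b):
    """Probability that a customer is still alive under the BG/NBD model.

    frequency, recency and T are in the model's time unit (days here);
    r, alpha, a and b are the BG/NBD parameters.
    """
    x = np.asarray(frequency, dtype=float)
    t_x = np.asarray(recency, dtype=float)
    T = np.asarray(T, dtype=float)
    # Odds of having already dropped out; np.maximum avoids dividing by
    # b - 1 for customers without repeat purchases (masked out below)
    odds_dead = (a / (b + np.maximum(x, 1) - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    # Dropout only happens right after a repeat purchase, so customers
    # with x == 0 are alive with probability 1
    return np.where(x > 0, 1.0 / (1.0 + odds_dead), 1.0)


# Hypothetical parameters, NOT the values fitted on the e-shop data
params = dict(r=0.25, alpha=40.0, a=0.8, b=2.5)
print(prob_alive([0, 1, 1], [0, 236, 103], [535, 619, 634], **params))
```

For two customers with one repeat purchase each, the one whose last purchase is further in the past relative to their age (recency 103 with T 634, versus recency 236 with T 619) gets the lower probability of being alive.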

For every customer, the expected number of purchases over the next t periods can be predicted with conditional_expected_number_of_purchases_up_to_time(), and the probability of being alive with conditional_probability_alive().

# Predict for every customer

## Number of periods (days) forward to predict the number of purchases
t = 365

## Create prediction dataframe
df_clv = df_rfm.copy()
df_clv['predicted_purchases'] = bgf.conditional_expected_number_of_purchases_up_to_time(
    t, df_rfm['frequency'], df_rfm['recency'], df_rfm['T']
)
df_clv['proba_alive'] = bgf.conditional_probability_alive(
    frequency=df_rfm['frequency'],
    recency=df_rfm['recency'],
    T=df_rfm['T']
)

## Show results
df_clv.head()
                                  frequency  recency      T  monetary_value  predicted_purchases  proba_alive
client_id
0020c81355c2057acfb019eb2c18a9a1        1.0    236.0  619.0            94.5             0.366430     0.632533
003b1637ce45163ccb9b0ff278464fd1        0.0      0.0  535.0             0.0             0.153914     1.000000
00444e8c950c199b637e5dcdfe401e0c        0.0      0.0  226.0             0.0             0.271410     1.000000
0055eb3281238fa3388a6db46d7d2d01        0.0      0.0  163.0             0.0             0.321769     1.000000
006e783513f582e9657d18334d06df49        1.0    103.0  634.0            53.0             0.277443     0.487711
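
The expected number of purchases also has a closed form, involving the Gaussian hypergeometric function 2F1 (available as scipy.special.hyp2f1). The sketch below follows the BG/NBD conditional expectation from Fader, Hardie and Lee (2005) with hypothetical parameters. It skips any numerical safeguards for extreme inputs, so treat it as an illustration rather than a replacement for conditional_expected_number_of_purchases_up_to_time().

```python
import numpy as np
from scipy.special import hyp2f1


def expected_purchases(t, frequency, recency, T, r, alpha, a, b):
    """BG/NBD expected number of purchases in the next t periods.

    Conditional on each customer's history (frequency, recency, T);
    r, alpha, a and b are the BG/NBD parameters. No numerical safeguards.
    """
    x = np.asarray(frequency, dtype=float)
    t_x = np.asarray(recency, dtype=float)
    T = np.asarray(T, dtype=float)
    # Gaussian hypergeometric term 2F1(r + x, b + x; a + b + x - 1; z)
    z = t / (alpha + T + t)
    hyp = hyp2f1(r + x, b + x, a + b + x - 1, z)
    numerator = (a + b + x - 1) / (a - 1) * (
        1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp
    )
    # Same dropout odds as in P(alive); zero when there is no repeat purchase
    odds_dead = np.where(
        x > 0,
        (a / (b + np.maximum(x, 1) - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x),
        0.0,
    )
    return numerator / (1.0 + odds_dead)


# Hypothetical parameters (this form requires a != 1)
params = dict(r=0.25, alpha=40.0, a=0.8, b=2.5)
history = dict(frequency=[0, 1, 1], recency=[0, 236, 103], T=[535, 619, 634])
print(expected_purchases(365, **history, **params))
```

The denominator is the same odds term as in the probability of being alive, so the prediction shrinks for customers who are likely to have already dropped out.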

Monetary value with Gamma-Gamma model

The Gamma-Gamma model takes the monetary value of each customer's past purchases into account and, combined with the BG/NBD purchase predictions, uses it to estimate the remaining customer lifetime value over a given period (12 months in this example).

The model assumes that there is no relationship between monetary value and purchase frequency. In practice, we check that the Pearson correlation between the two vectors is close to 0 before using it. Only returning customers (frequency > 0) are kept for this check and for fitting, since monetary_value is the average of repeat purchases and is 0 for everyone else.

# Keep only returning customers
df_rfm_return = df_rfm[df_rfm['frequency'] > 0]

# Check (absence of) correlation between frequency and monetary value
df_rfm_return[['frequency', 'monetary_value']].corr()
                frequency  monetary_value
frequency        1.000000        0.021878
monetary_value   0.021878        1.000000
# Fit Gamma-Gamma model
ggf = lifetimes.GammaGammaFitter()
ggf.fit(df_rfm_return['frequency'], df_rfm_return['monetary_value'])

# Predict expected value per transaction
df_clv['exp_avg_value'] = ggf.conditional_expected_average_profit(
    df_rfm['frequency'], df_rfm['monetary_value']
)

# Compare with actual average profit
print("Actual average profit: {:.2f}".format(df_rfm_return['monetary_value'].mean()))
print("Predicted expected value: {:.2f}".format(df_clv['exp_avg_value'].mean()))
Actual average profit: 82.45 
Predicted expected value: 82.40
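
The Gamma-Gamma expected value per transaction is a weighted average of the population mean and the customer's own average, where the weight grows with the number of repeat purchases. A minimal sketch of that formula, with hypothetical values for the parameters p, q and v (the names lifetimes uses for the Gamma-Gamma parameters), makes the behavior visible: a customer with frequency 0 gets exactly the population mean v * p / (q - 1), which is why all one-time buyers share the same exp_avg_value in the df_clv output below.

```python
import numpy as np


def expected_avg_value(frequency, monetary_value, p, q, v):
    """Gamma-Gamma conditional expected value per transaction.

    A weighted average of the population mean and the customer's observed
    average of repeat purchases; p, q and v are Gamma-Gamma parameters.
    """
    x = np.asarray(frequency, dtype=float)
    m_x = np.asarray(monetary_value, dtype=float)
    population_mean = v * p / (q - 1)   # population mean spend, requires q > 1
    weight = p * x / (p * x + q - 1)    # 0 when there is no repeat purchase
    return (1 - weight) * population_mean + weight * m_x


# Hypothetical parameters, not fitted on the e-shop data
params = dict(p=6.0, q=4.0, v=40.0)
print(expected_avg_value([0, 1, 2], [0.0, 94.5, 94.5], **params))
```

With these values the population mean is 80; a customer whose repeat purchases average 94.5 is estimated at about 89.67 after one repeat purchase and 91.6 after two, as their own history gets more weight.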
# Predict residual customer lifetime value
df_clv['clv'] = ggf.customer_lifetime_value(
    bgf,
    df_rfm['frequency'],
    df_rfm['recency'],
    df_rfm['T'],
    df_rfm['monetary_value'],
    time=12,             # months
    discount_rate=0.01,  # monthly discount rate (lifetimes default)
)
df_clv.head()
                                  frequency  recency      T  monetary_value  predicted_purchases  proba_alive  exp_avg_value        clv
client_id
0020c81355c2057acfb019eb2c18a9a1        1.0    236.0  619.0            94.5             0.366430     0.632533      90.215145  30.654873
003b1637ce45163ccb9b0ff278464fd1        0.0      0.0  535.0             0.0             0.153914     1.000000      82.352228  11.747738
00444e8c950c199b637e5dcdfe401e0c        0.0      0.0  226.0             0.0             0.271410     1.000000      82.352228  20.741905
0055eb3281238fa3388a6db46d7d2d01        0.0      0.0  163.0             0.0             0.321769     1.000000      82.352228  24.602803
006e783513f582e9657d18334d06df49        1.0    103.0  634.0            53.0             0.277443     0.487711      63.353343  16.298747

In addition to the probability of being alive (proba_alive) and the expected number of purchases (predicted_purchases), we now have the expected value per purchase (exp_avg_value) and, as a result, the estimated residual customer lifetime value (clv). The clv combines, month by month over the next 12 months, the number of purchases predicted by the BG/NBD model with the Gamma-Gamma expected value per purchase, discounted at a monthly rate (discount_rate, 0.01 by default). This is why clv stays close to, but below, the product of predicted_purchases and exp_avg_value.
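
The clv column can be understood as a discounted sum over months. The minimal sketch below assumes 30-day months and monthly discounting, which appears to be how lifetimes turns the daily BG/NBD predictions into monthly cash flows: for each of the next 12 months, it takes the purchases expected during that month, multiplies them by the expected value per purchase, and discounts the result back to today. The function clv_sketch and the linear toy purchase curve are hypothetical.

```python
def clv_sketch(cum_expected_purchases, exp_avg_value, time=12,
               discount_rate=0.01, days_per_month=30):
    """Discounted residual CLV over `time` months (hypothetical helper).

    cum_expected_purchases: callable t -> expected purchases in the next t days
    exp_avg_value: expected value per purchase (e.g. a Gamma-Gamma estimate)
    """
    clv = 0.0
    for month in range(1, time + 1):
        t_end = month * days_per_month
        t_start = t_end - days_per_month
        # Purchases expected during this month only (the prediction is cumulative)
        purchases = cum_expected_purchases(t_end) - cum_expected_purchases(t_start)
        # Value of this month's purchases, discounted back to today
        clv += exp_avg_value * purchases / (1 + discount_rate) ** month
    return clv


# Toy purchase curve: 0.001 expected purchases per day (hypothetical)
print(clv_sketch(lambda t: 0.001 * t, exp_avg_value=90.0))
```

With the fitted models, passing lambda t: bgf.conditional_expected_number_of_purchases_up_to_time(t, df_rfm['frequency'], df_rfm['recency'], df_rfm['T']) and df_clv['exp_avg_value'] should give values close to df_clv['clv'], provided the month length and discounting assumptions above match the installed lifetimes version.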