Introduction
The Python package lifetimes helps with recency/frequency analysis of customers and with estimating customer lifetime value. Two models are used here: the simple BG/NBD model, which relies only on recency and frequency and ignores monetary value, and the Gamma-Gamma model, which adds monetary value in order to estimate customer lifetime value.
Prepare data
The initial dataset, with one row per transaction (order_id), needs to be transformed to the appropriate shape: one row per client and four columns for frequency, recency, age (T), and monetary value.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import lifetimes
from lifetimes.plotting import plot_frequency_recency_matrix, plot_probability_alive_matrix
# Import list of transactions
df = pd.read_csv('./exclude/eshop_transactions.csv', parse_dates=['date'])
df.head()
| order_id | client_id | date | total |
0 | 1007 | fe1a03b2b0e021bbac0ea050a1d216a7 | 2017-06-21 05:14:35 | 39.0 |
1 | 1008 | c9bdedeb9ac367f11c77dc5753b2b939 | 2017-06-21 05:30:19 | 39.0 |
2 | 1009 | 47f4ef0684413a1f5a429e251cbc7261 | 2017-06-21 07:58:02 | 75.0 |
3 | 1010 | 9a4ee011e9639539955423f54b7d46ec | 2017-06-21 08:23:34 | 75.0 |
4 | 1011 | 490037f2983ad86fab82bcb309a33ee1 | 2017-06-21 08:30:59 | 42.0 |
# Transform data to appropriate shape
df_rfm = lifetimes.utils.summary_data_from_transaction_data(
df, 'client_id', 'date',
monetary_value_col = 'total',
observation_period_end='2019-03-31'
)
df_rfm.head(7)
client_id | frequency | recency | T | monetary_value |
0020c81355c2057acfb019eb2c18a9a1 | 1.0 | 236.0 | 619.0 | 94.5 |
003b1637ce45163ccb9b0ff278464fd1 | 0.0 | 0.0 | 535.0 | 0.0 |
00444e8c950c199b637e5dcdfe401e0c | 0.0 | 0.0 | 226.0 | 0.0 |
0055eb3281238fa3388a6db46d7d2d01 | 0.0 | 0.0 | 163.0 | 0.0 |
006e783513f582e9657d18334d06df49 | 1.0 | 103.0 | 634.0 | 53.0 |
007062115a3cbd21d0b52af5e42e10cf | 0.0 | 0.0 | 299.0 | 0.0 |
0085fabd8952f67be3efdeaddfbf6d43 | 2.0 | 327.0 | 439.0 | 94.5 |
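To make the meaning of the four columns concrete, here is a minimal standard-library sketch of the same kind of summary. It is not the library's implementation. It assumes that purchases made on the same day count as one purchase, that frequency counts only repeat purchase days, and that monetary value is averaged over those repeat days. The function name summarize_transactions and the sample transactions are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def summarize_transactions(transactions, observation_end):
    """Build {client_id: (frequency, recency, T, monetary_value)}.

    transactions: iterable of (client_id, purchase_date, amount) tuples.
    """
    # Sum amounts per client and per calendar day: purchases made on the
    # same day are assumed to count as a single purchase.
    daily_totals = defaultdict(lambda: defaultdict(float))
    for client_id, purchase_date, amount in transactions:
        if purchase_date <= observation_end:
            daily_totals[client_id][purchase_date] += amount

    summary = {}
    for client_id, totals in daily_totals.items():
        days = sorted(totals)
        first, last = days[0], days[-1]
        repeat_days = days[1:]  # the first purchase day is not a repeat
        frequency = len(repeat_days)
        recency = (last - first).days          # age at the last purchase
        age = (observation_end - first).days   # the T column
        monetary_value = (
            sum(totals[d] for d in repeat_days) / frequency if frequency else 0.0
        )
        summary[client_id] = (frequency, recency, age, monetary_value)
    return summary


# Made-up transactions, for illustration only
example = [
    ("a", date(2018, 1, 1), 40.0),
    ("a", date(2018, 1, 11), 60.0),
    ("a", date(2018, 1, 11), 20.0),
    ("a", date(2018, 1, 31), 100.0),
    ("b", date(2018, 1, 21), 75.0),
]
summary = summarize_transactions(example, date(2018, 3, 2))
print(summary["a"])  # (2, 30, 60, 90.0)
print(summary["b"])  # (0, 0, 40, 0.0)
```

Client "b" bought only once, so frequency, recency, and monetary value are all zero, which matches the pattern of the one-time buyers in the table above.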
Recency and frequency with BG/NBD model
# Fit BG/NBD model
bgf = lifetimes.BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(df_rfm['frequency'], df_rfm['recency'], df_rfm['T'])
# Expected future purchases given frequency/recency
fig, ax = plt.subplots(figsize=(8, 6))
ax = plot_frequency_recency_matrix(bgf, T=365)
# Probability of being alive given frequency/recency
fig, ax = plt.subplots(figsize=(16, 6))
ax = plot_probability_alive_matrix(bgf)
Predictions of the expected number of purchases, along with the probability of being alive, can be made for every customer with the methods conditional_expected_number_of_purchases_up_to_time() and conditional_probability_alive().
# Predict for every customer
## Number of periods (days) forward to predict the number of purchases
t = 365
## Create prediction dataframe
df_clv = df_rfm.copy()
df_clv['predicted_purchases'] = bgf.conditional_expected_number_of_purchases_up_to_time(
t, df_rfm['frequency'], df_rfm['recency'], df_rfm['T']
)
df_clv['proba_alive'] = bgf.conditional_probability_alive(
frequency=df_rfm['frequency'],
recency=df_rfm['recency'],
T=df_rfm['T']
)
## Show results
df_clv.head()
client_id | frequency | recency | T | monetary_value | predicted_purchases | proba_alive |
0020c81355c2057acfb019eb2c18a9a1 | 1.0 | 236.0 | 619.0 | 94.5 | 0.366430 | 0.632533 |
003b1637ce45163ccb9b0ff278464fd1 | 0.0 | 0.0 | 535.0 | 0.0 | 0.153914 | 1.000000 |
00444e8c950c199b637e5dcdfe401e0c | 0.0 | 0.0 | 226.0 | 0.0 | 0.271410 | 1.000000 |
0055eb3281238fa3388a6db46d7d2d01 | 0.0 | 0.0 | 163.0 | 0.0 | 0.321769 | 1.000000 |
006e783513f582e9657d18334d06df49 | 1.0 | 103.0 | 634.0 | 53.0 | 0.277443 | 0.487711 |
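The proba_alive column shows 1.0 for every customer without a repeat purchase. The sketch below of the standard BG/NBD probability-alive formula shows why. The parameter values r, alpha, a, and b are hypothetical and are not the values fitted above, so the printed numbers will not match the table.

```python
def probability_alive(frequency, recency, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active."""
    if frequency == 0:
        # The model assigns probability 1 to customers with no repeat purchase
        return 1.0
    odds_inactive = (a / (b + frequency - 1)) * (
        (alpha + T) / (alpha + recency)
    ) ** (r + frequency)
    return 1.0 / (1.0 + odds_inactive)


# Hypothetical parameter values, NOT the ones fitted by bgf above
params = dict(r=0.25, alpha=50.0, a=0.8, b=2.5)

p_no_repeat = probability_alive(0, 0, 535, **params)
p_recent = probability_alive(1, 236, 619, **params)
p_stale = probability_alive(1, 103, 634, **params)
print(p_no_repeat, round(p_recent, 3), round(p_stale, 3))
```

With the same frequency, the customer whose last purchase lies further back relative to T gets the lower probability, which is the ordering seen between the two repeat customers in the table.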
Monetary value with Gamma-Gamma model
The Gamma-Gamma model takes the monetary value of each customer's purchase history into account and uses it to predict the customer's remaining lifetime value over a given period (12 months in this example).
The model assumes that there is no relationship between monetary value and purchase frequency. In practice, we check that the Pearson correlation between the two vectors is close to 0 before using this model.
# Keep only returning customers
df_rfm_return = df_rfm[df_rfm['frequency'] > 0]
# Check (absence of) correlation between frequency and monetary value
df_rfm_return[['frequency', 'monetary_value']].corr()
| frequency | monetary_value |
frequency | 1.000000 | 0.021878 |
monetary_value | 0.021878 | 1.000000 |
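For reference, the coefficient reported by corr() can be written out by hand. The following is a small standard-library sketch with made-up numbers, chosen so that frequency and monetary value are uncorrelated; the helper name pearson is illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data: monetary value does not trend with frequency
frequency = [1, 2, 3, 4, 5]
monetary_value = [50.0, 120.0, 80.0, 120.0, 50.0]
r_value = pearson(frequency, monetary_value)
print(r_value)
```

A value near 0, like the 0.02 obtained above, is consistent with the independence assumption. A value near 1 or -1 would suggest that the Gamma-Gamma model is not appropriate for the data.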
# Fit Gamma-Gamma model
ggf = lifetimes.GammaGammaFitter()
ggf.fit(df_rfm_return['frequency'], df_rfm_return['monetary_value'])
# Predict expected value per transaction
df_clv['exp_avg_value'] = ggf.conditional_expected_average_profit(
df_rfm['frequency'], df_rfm['monetary_value']
)
# Compare with actual average profit
print("Actual average profit: {:.2f}".format(df_rfm_return['monetary_value'].mean()))
print("Predicted expected value: {:.2f}".format(df_clv['exp_avg_value'].mean()))
Actual average profit: 82.45
Predicted expected value: 82.40
# Predict residual customer lifetime value
df_clv['clv'] = ggf.customer_lifetime_value(
bgf,
df_rfm['frequency'],
df_rfm['recency'],
df_rfm['T'],
df_rfm['monetary_value'],
time=12, # months
)
df_clv.head()
client_id | frequency | recency | T | monetary_value | predicted_purchases | proba_alive | exp_avg_value | clv |
0020c81355c2057acfb019eb2c18a9a1 | 1.0 | 236.0 | 619.0 | 94.5 | 0.366430 | 0.632533 | 90.215145 | 30.654873 |
003b1637ce45163ccb9b0ff278464fd1 | 0.0 | 0.0 | 535.0 | 0.0 | 0.153914 | 1.000000 | 82.352228 | 11.747738 |
00444e8c950c199b637e5dcdfe401e0c | 0.0 | 0.0 | 226.0 | 0.0 | 0.271410 | 1.000000 | 82.352228 | 20.741905 |
0055eb3281238fa3388a6db46d7d2d01 | 0.0 | 0.0 | 163.0 | 0.0 | 0.321769 | 1.000000 | 82.352228 | 24.602803 |
006e783513f582e9657d18334d06df49 | 1.0 | 103.0 | 634.0 | 53.0 | 0.277443 | 0.487711 | 63.353343 | 16.298747 |
In addition to the probability of being alive (proba_alive) and the expected number of purchases (predicted_purchases), we now have the expected value of each purchase (exp_avg_value) and, as a result, the estimated remaining customer lifetime value (clv). The latter is the expected number of purchases in each future period multiplied by the expected average purchase value, discounted period by period and summed over the 12 months. The discount is controlled by the discount_rate parameter of customer_lifetime_value(), which is left at its default in the call above.
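The discounted sum behind clv can be illustrated with a short sketch. It is a simplified reading of the calculation rather than the library's code. It assumes 30-day months and a monthly discount rate of 0.01, and it uses a hypothetical function linear_purchases in place of the fitted BG/NBD prediction of cumulative purchases.

```python
def discounted_clv(cumulative_purchases, avg_value, months=12,
                   discount_rate=0.01, days_per_month=30):
    """Sum the discounted expected revenue of each future month.

    cumulative_purchases(t) returns the expected number of purchases
    made in the next t days.
    """
    clv = 0.0
    for month in range(1, months + 1):
        end = month * days_per_month
        start = end - days_per_month
        purchases_in_month = cumulative_purchases(end) - cumulative_purchases(start)
        clv += avg_value * purchases_in_month / (1 + discount_rate) ** month
    return clv


# Hypothetical customer: 0.001 expected purchases per day, 80.0 per purchase
def linear_purchases(t):
    return 0.001 * t


clv_undiscounted = discounted_clv(linear_purchases, 80.0, discount_rate=0.0)
clv_discounted = discounted_clv(linear_purchases, 80.0, discount_rate=0.01)
print(round(clv_undiscounted, 2), round(clv_discounted, 2))  # 28.8 27.01
```

Without discounting, the result is simply the expected number of purchases over the horizon times the average value. With discounting, each month's contribution shrinks a little more than the previous one, which is why clv in the table is slightly lower than predicted_purchases multiplied by exp_avg_value.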