Forward- or back-fill NA values in DataFrames

Handling missing data is an important step in the data preprocessing pipeline. In many scenarios, filling NA values with meaningful data can lead to more robust models. Two common techniques are forward-filling and back-filling, where missing values are filled with adjacent non-NA values.

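Back-filling replaces each NaN with the next non-NA value, while forward-filling takes the previous non-NA value. In pandas, this is done with the bfill() and ffill() methods. Older code often uses fillna(method='bfill') or fillna(method='ffill') instead, but the method argument of fillna() has been deprecated since pandas 2.1.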
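As a quick illustration on a toy Series (the values are made up for this example), ffill() propagates the last valid value forward and bfill() pulls the next valid value backward. A leading or trailing NaN with nothing to copy from is left as it is.

```python
import numpy as np
import pandas as pd

# Toy Series with missing values at the start, in the middle, and at the end
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

# Forward-fill: each NaN takes the previous non-NA value (the leading NaN has none)
forward = s.ffill()
# Back-fill: each NaN takes the next non-NA value (the trailing NaN has none)
backward = s.bfill()

print(forward.tolist())   # [nan, 1.0, 1.0, 1.0, 4.0, 4.0]
print(backward.tolist())  # [1.0, 1.0, 4.0, 4.0, 4.0, nan]
```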

First, let's create a sample DataFrame with random integer values and some missing data:

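# Import libraries
import pandas as pd
import numpy as np
from numpy.random import randint

# Set the seed so the random values and the sampled NaN positions are reproducible
np.random.seed(21)

# Create sample data: groups A to E, each with 2 to 4 random integers in [1000, 20000)
df = (
    pd.DataFrame({
        'group': list('ABCDE'),
        'value': [randint(1000, 20000, randint(2, 5)) for _ in range(5)]
    })
    # One row per value
    .explode('value')
    .reset_index(drop=True)
)
# Replace a random half of the values with NaN
df.loc[df.sample(frac=0.5).index, 'value'] = np.nan
df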
    group  value
0       A   6327
1       A    NaN
2       A    NaN
3       B    NaN
4       B  13898
5       C    NaN
6       C  18224
7       C   2646
8       D    NaN
9       D  16613
10      E  17118
11      E    NaN
12      E   3352

Back-fill and forward-fill

Calling bfill() and ffill() on the 'value' column fills it across the whole DataFrame, without regard to the 'group' column.

The snippet below returns a copy of the DataFrame with two new columns, 'value_bfill' and 'value_ffill', holding the back-filled and forward-filled 'value' column, respectively. Note that assign() does not modify df itself.

# Back-fill and forward-fill
df.assign(
    value_bfill=df['value'].bfill(),
    value_ffill=df['value'].ffill()
)
    group  value  value_bfill  value_ffill
0       A   6327         6327         6327
1       A    NaN        13898         6327
2       A    NaN        13898         6327
3       B    NaN        13898         6327
4       B  13898        13898        13898
5       C    NaN        18224        13898
6       C  18224        18224        18224
7       C   2646         2646         2646
8       D    NaN        16613         2646
9       D  16613        16613        16613
10      E  17118        17118        17118
11      E    NaN         3352        17118
12      E   3352         3352         3352
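The ungrouped fills above ignore the 'group' column: row 1 (group A), for instance, is back-filled with 13898 from group B, and row 8 (group D) is forward-filled with 2646 from group C. A minimal sketch for spotting this kind of leakage, on a small hypothetical DataFrame named toy (so the article's df is not overwritten), compares an ungrouped forward-fill with a group-wise one using groupby(...).ffill():

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group B has no value before its first NaN
toy = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'value': [1.0, np.nan, np.nan, 4.0],
})

# Ungrouped forward-fill ignores group boundaries
ungrouped = toy['value'].ffill()
# Group-wise forward-fill only copies values from earlier rows of the same group
grouped = toy.groupby('group')['value'].ffill()

# Rows that were filled only because a value leaked in from another group
leaked = ungrouped.notna() & grouped.isna()
print(toy.assign(ungrouped=ungrouped, grouped=grouped, leaked=leaked))
```

Here only row 2 is flagged: its ungrouped fill (1.0) comes from group A.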

Forward- or back-fill within groups

In some cases, you may want to apply these fill methods within specific groups, so that a missing value is only ever filled from rows of the same group. This can be done by grouping the DataFrame with groupby() before filling.

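This code snippet applies both back-fill and forward-fill within each group, then combines the two: the back-filled value is used where available, and the forward-filled value otherwise. This fills every NA value, as long as each group contains at least one non-NA value.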

# Back-fill and forward-fill within each group
df.assign(
    # Back-fill within each group
    value_bfill=df.groupby('group')['value'].transform(lambda x: x.bfill()),
    # Forward-fill within each group
    value_ffill=df.groupby('group')['value'].transform(lambda x: x.ffill()),
    # Use the back-filled value, falling back to the forward-filled one where it is still NaN
    value_filled=lambda x: x['value_bfill'].fillna(x['value_ffill'])
)
    group  value  value_bfill  value_ffill  value_filled
0       A   6327       6327.0       6327.0        6327.0
1       A    NaN          NaN       6327.0        6327.0
2       A    NaN          NaN       6327.0        6327.0
3       B    NaN      13898.0          NaN       13898.0
4       B  13898      13898.0      13898.0       13898.0
5       C    NaN      18224.0          NaN       18224.0
6       C  18224      18224.0      18224.0       18224.0
7       C   2646       2646.0       2646.0        2646.0
8       D    NaN      16613.0          NaN       16613.0
9       D  16613      16613.0      16613.0       16613.0
10      E  17118      17118.0      17118.0       17118.0
11      E    NaN       3352.0      17118.0        3352.0
12      E   3352       3352.0       3352.0        3352.0
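Keeping 'value_bfill' and 'value_ffill' as separate columns makes each fill easy to inspect. If only the final column is needed, a more compact variant (a sketch, not the approach used above) chains both fills inside a single group-wise transform. Within a group, bfill() fills everything except trailing NaN, and the following ffill() fills those from the last valid value, which matches value_bfill.fillna(value_ffill). The toy DataFrame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with leading, middle, and trailing NaN per group
toy = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value': [np.nan, 2.0, np.nan, 5.0, np.nan, np.nan],
})

by_group = toy.groupby('group')['value']

# Two-step version (same logic as above): back-fill, then fall back to forward-fill
two_step = by_group.bfill().fillna(by_group.ffill())

# One-pass version: back-fill, then forward-fill, both inside each group
one_pass = by_group.transform(lambda s: s.bfill().ffill())

print(toy.assign(value_filled=one_pass))
```

The chaining has to happen inside transform(): by_group.bfill() returns an ordinary Series, so calling .ffill() on that result would no longer respect group boundaries.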