Apply computation within groups in a pandas DataFrame

pandas can run a computation within each group of a DataFrame and return a result aligned with the original rows. This is done by calling groupby() followed by transform().
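The examples in this article use a small sample of iris sepal widths. As a minimal sketch, such a DataFrame could be built as follows; the variable names df and group_size, and the choice of 'count' as the function, are assumptions made for illustration.

```python
import pandas as pd

# Sample data: 5 setosa, 3 versicolor and 4 virginica rows
df = pd.DataFrame({
    "species": ["setosa"] * 5 + ["versicolor"] * 3 + ["virginica"] * 4,
    "sepal_width": [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5, 2.9, 2.8, 3.0, 3.4],
})

# groupby() + transform() returns one value per original row,
# so the result has the same length and index as df
group_size = df.groupby("species")["sepal_width"].transform("count")

print(df)
print(group_size.tolist())  # [5, 5, 5, 5, 5, 3, 3, 3, 4, 4, 4, 4]
```

Unlike an aggregation, which returns one row per group, the transformed result can be assigned straight back to the DataFrame as a new column.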

       species  sepal_width
0       setosa          3.4
1       setosa          3.7
2       setosa          3.2
3       setosa          3.1
4       setosa          3.8
5   versicolor          2.4
6   versicolor          2.7
7   versicolor          2.5
8    virginica          2.9
9    virginica          2.8
10   virginica          3.0
11   virginica          3.4

Compute the mean at group level

To attach a group-level statistic such as the mean, sum or count to every row, call transform() with the function name as a string, for example transform('mean').
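A minimal sketch of the group mean, assuming the sample data is held in a DataFrame named df and the new column is called mean (both names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["setosa"] * 5 + ["versicolor"] * 3 + ["virginica"] * 4,
    "sepal_width": [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5, 2.9, 2.8, 3.0, 3.4],
})

# Compute the mean sepal_width of each species and repeat it on every row of that species
df["mean"] = df.groupby("species")["sepal_width"].transform("mean")

print(df)
```

Every row of a species receives the same value, so the group statistic can be compared directly with the individual measurements.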

       species  sepal_width      mean
0       setosa          3.4  3.440000
1       setosa          3.7  3.440000
2       setosa          3.2  3.440000
3       setosa          3.1  3.440000
4       setosa          3.8  3.440000
5   versicolor          2.4  2.533333
6   versicolor          2.7  2.533333
7   versicolor          2.5  2.533333
8    virginica          2.9  3.025000
9    virginica          2.8  3.025000
10   virginica          3.0  3.025000
11   virginica          3.4  3.025000

Standardize values

You can center values within a group by subtracting the group mean from each row.
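Centering can be sketched as below, assuming the same sample data in a DataFrame named df. The lambda form is one possible way to write it, not necessarily the one originally used; the helper name group_mean is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["setosa"] * 5 + ["versicolor"] * 3 + ["virginica"] * 4,
    "sepal_width": [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5, 2.9, 2.8, 3.0, 3.4],
})

# Option 1: subtract the group mean inside transform()
df["standardized"] = (
    df.groupby("species")["sepal_width"].transform(lambda x: x - x.mean())
)

# Option 2: broadcast the group mean first, then subtract with plain arithmetic
group_mean = df.groupby("species")["sepal_width"].transform("mean")
alternative = df["sepal_width"] - group_mean

print(df)
```

Both options give the same values. The second avoids calling a Python function once per group, which may be faster on large data.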

       species  sepal_width  standardized
0       setosa          3.4     -0.040000
1       setosa          3.7      0.260000
2       setosa          3.2     -0.240000
3       setosa          3.1     -0.340000
4       setosa          3.8      0.360000
5   versicolor          2.4     -0.133333
6   versicolor          2.7      0.166667
7   versicolor          2.5     -0.033333
8    virginica          2.9     -0.125000
9    virginica          2.8     -0.225000
10   virginica          3.0     -0.025000
11   virginica          3.4      0.375000

Rank values inside groups

Besides computing group-wise values, you can also rank values within each group.
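Ranking within groups can be sketched as follows, again assuming a DataFrame named df with the sample data. The default ranking method is used here, which orders values ascending and gives tied values their average rank.

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["setosa"] * 5 + ["versicolor"] * 3 + ["virginica"] * 4,
    "sepal_width": [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5, 2.9, 2.8, 3.0, 3.4],
})

# Rank sepal_width within each species; 1.0 is the smallest value of the group
df["rank"] = df.groupby("species")["sepal_width"].transform("rank")

print(df)
```

The ranks restart at 1.0 for every species, and they are floats because tied values would share an average rank.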

       species  sepal_width  rank
0       setosa          3.4   3.0
1       setosa          3.7   4.0
2       setosa          3.2   2.0
3       setosa          3.1   1.0
4       setosa          3.8   5.0
5   versicolor          2.4   1.0
6   versicolor          2.7   3.0
7   versicolor          2.5   2.0
8    virginica          2.9   2.0
9    virginica          2.8   1.0
10   virginica          3.0   3.0
11   virginica          3.4   4.0

Error when output has multiple columns

If the transform returns more than one column and you try to assign the result to a single column of an existing DataFrame, pandas raises an error like ValueError: Wrong number of items passed X, placement implies 1. To avoid this, select a single column before calling transform() so that the result has exactly one column:
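The failure and its fix can be sketched as below. The extra column double_width is hypothetical, derived from sepal_width only so that the transform has two columns to return; the names ranks_all and raised are also illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["setosa"] * 5 + ["versicolor"] * 3 + ["virginica"] * 4,
    "sepal_width": [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5, 2.9, 2.8, 3.0, 3.4],
})
# Hypothetical second numeric column, added only to trigger the error
df["double_width"] = df["sepal_width"] * 2

# Without selecting a column, transform() returns one column per non-grouping column
ranks_all = df.groupby("species").transform("rank")
print(ranks_all.shape)  # (12, 2)

raised = False
try:
    # Assigning a two-column result to a single new column fails
    df["rank"] = ranks_all
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

# Fix: select one column before transforming, so the result is a single Series
df["rank"] = df.groupby("species")["sepal_width"].transform("rank")

print(df[["species", "sepal_width", "rank"]])
```

The exact wording of the error message may differ between pandas versions, but the remedy is the same: make sure the right-hand side holds exactly one column.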

       species  sepal_width  rank
0       setosa          3.4   3.0
1       setosa          3.7   4.0
2       setosa          3.2   2.0
3       setosa          3.1   1.0
4       setosa          3.8   5.0
5   versicolor          2.4   1.0
6   versicolor          2.7   3.0
7   versicolor          2.5   2.0
8    virginica          2.9   2.0
9    virginica          2.8   1.0
10   virginica          3.0   3.0
11   virginica          3.4   4.0