Setup
# Import libraries
import pandas as pd
import seaborn as sns
# Load sample data in a DataFrame
df = sns.load_dataset('iris')[['species', 'sepal_length']]
df
| species | sepal_length |
0 | setosa | 5.1 |
1 | setosa | 4.9 |
2 | setosa | 4.7 |
3 | setosa | 4.6 |
4 | setosa | 5.0 |
… | … | … |
145 | virginica | 6.7 |
146 | virginica | 6.3 |
147 | virginica | 6.5 |
148 | virginica | 6.2 |
149 | virginica | 5.9 |
Group and aggregate
If we want to get the minimum, average and maximum of sepal_length
for each species, a classic way would be:
(
    df
    .groupby('species')
    .agg({'sepal_length': ['min', 'mean', 'max']})
)
| sepal_length | | |
| min | mean | max |
species | | | |
setosa | 4.3 | 5.006 | 5.8 |
versicolor | 4.9 | 5.936 | 7.0 |
virginica | 4.9 | 6.588 | 7.9 |
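The dictionary form returns two-level (MultiIndex) columns, so each result column has to be addressed as a tuple such as ('sepal_length', 'min'). A minimal sketch of flattening them by hand is shown here; it uses a small made-up DataFrame called toy (an assumption, not the iris data) so that it runs without downloading anything:

```python
import pandas as pd

# Hypothetical stand-in for the iris data, small enough to check by eye
toy = pd.DataFrame({
    'species': ['a', 'a', 'b', 'b'],
    'sepal_length': [4.0, 5.0, 6.0, 8.0],
})

# Dictionary-style aggregation: columns come back as a two-level MultiIndex
wide = toy.groupby('species').agg({'sepal_length': ['min', 'mean', 'max']})

# Selecting one statistic requires the full (column, function) tuple
mins = wide[('sepal_length', 'min')]

# Join the two levels with an underscore to get flat column names
flat = wide.copy()
flat.columns = ['_'.join(col) for col in wide.columns]
```

After this step flat has ordinary string column names (sepal_length_min and so on), at the cost of an extra renaming step after every aggregation.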
An alternative style, which pandas calls named aggregation, lets us apply several functions to the same column without the pain of multi-index columns. Each keyword argument names an output column and maps it to a (column, function) tuple:
(
    df
    .groupby('species')
    .agg(
        sepal_length_min=('sepal_length', 'min'),
        sepal_length_mean=('sepal_length', 'mean'),
        sepal_length_max=('sepal_length', 'max'),
    )
)
species | sepal_length_min | sepal_length_mean | sepal_length_max |
setosa | 4.3 | 5.006 | 5.8 |
versicolor | 4.9 | 5.936 | 7.0 |
virginica | 4.9 | 6.588 | 7.9 |
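Because the keyword arguments are just a mapping from output name to (column, function), they can also be built programmatically and unpacked with **. The sketch below assumes a made-up DataFrame toy and a hypothetical helper named_specs (neither comes from the original example), and adds a custom function through pd.NamedAgg, the explicit spelling of the same tuple:

```python
import pandas as pd

# Hypothetical stand-in for the iris data, small enough to check by eye
toy = pd.DataFrame({
    'species': ['a', 'a', 'b', 'b'],
    'sepal_length': [4.0, 5.0, 6.0, 8.0],
})


def named_specs(column, funcs):
    """Hypothetical helper: build named-aggregation keyword arguments."""
    return {f'{column}_{func}': (column, func) for func in funcs}


summary = (
    toy
    .groupby('species')
    .agg(
        # Expands to sepal_length_min=('sepal_length', 'min'), and so on
        **named_specs('sepal_length', ['min', 'mean', 'max']),
        # pd.NamedAgg is equivalent to the plain tuple; the function can be any callable
        sepal_length_range=pd.NamedAgg(
            column='sepal_length',
            aggfunc=lambda s: s.max() - s.min(),
        ),
    )
)
```

The output columns appear in the order the keyword arguments are given, so summary has sepal_length_min, sepal_length_mean, sepal_length_max and sepal_length_range as flat columns.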