To get the first row, or first n rows, of each group in a pandas DataFrame, you can use `groupby()` followed by `head()`.
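`head()` keeps rows in their current order. If "first" should mean the smallest (or largest) value of some column, sort the DataFrame by that column before grouping.
After the operation, you may want to reset the index, because the result keeps the original row labels.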
Here is an example:
```python
# Import libraries
import pandas as pd
import seaborn as sns

# Load sample data in a DataFrame
df = (
    sns.load_dataset('iris')
    .sample(n=12, random_state=24)
    .sort_values('species')
    .reset_index(drop=True)
    [['species', 'sepal_width']]
)
df
```
| | species | sepal_width |
|---|---|---|
| 0 | setosa | 3.4 |
| 1 | setosa | 3.7 |
| 2 | setosa | 3.2 |
| 3 | setosa | 3.1 |
| 4 | setosa | 3.8 |
| 5 | versicolor | 2.4 |
| 6 | versicolor | 2.7 |
| 7 | versicolor | 2.5 |
| 8 | virginica | 2.9 |
| 9 | virginica | 2.8 |
| 10 | virginica | 3.0 |
| 11 | virginica | 3.4 |
```python
# Keep the row with the smallest sepal_width in each species
(
    df
    .sort_values('sepal_width')
    .groupby('species')
    .head(1)
    .reset_index(drop=True)
)
```
| | species | sepal_width |
|---|---|---|
| 0 | versicolor | 2.4 |
| 1 | virginica | 2.8 |
| 2 | setosa | 3.1 |
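The same pattern works for more than one row per group: pass the number of rows to `head()`. The sketch below runs offline because it hard-codes the 12 sampled rows shown earlier instead of downloading the iris data with seaborn. It keeps the two smallest widths per species. It also shows the original row labels that survive until `reset_index()` is called. The `kind='stable'` sort is an added assumption that makes tie handling deterministic.

```python
import pandas as pd

# Offline stand-in (assumption): the 12 sampled iris rows shown above, hard-coded
df = pd.DataFrame({
    'species': ['setosa'] * 5 + ['versicolor'] * 3 + ['virginica'] * 4,
    'sepal_width': [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5, 2.9, 2.8, 3.0, 3.4],
})

# A stable sort keeps tied widths in their original order
ordered = df.sort_values('sepal_width', kind='stable')

# head(2) keeps the first two rows of each group, i.e. the two smallest widths
two_smallest = ordered.groupby('species').head(2)

# Without reset_index the original row labels are kept
original_labels = two_smallest.index.tolist()
print(original_labels)  # → [5, 7, 9, 8, 3, 2]

# Replace them with a clean 0..n-1 index
two_smallest = two_smallest.reset_index(drop=True)
print(two_smallest)
```

Rows appear in the sorted order, not grouped by species. This is because `head()` filters rows and does not aggregate them.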
And that’s it! A couple of last remarks:
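- If you want the last row of each group instead, sort in the opposite order with `ascending=False` before calling `head()`, or keep the ascending sort and use `tail()` in place of `head()`.
- To get the nth row of each group, use `nth()` instead of `head()`. Note that `nth()` counts from 0.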
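The remarks above can be sketched on the same hard-coded sample, again standing in for the seaborn download. Both ways of getting the last (largest) row per group give the same rows. `nth(1)` picks the second row of each sorted group because counting starts at 0. The shape of the `nth()` result depends on your pandas version. In pandas 2.0 and later it keeps the original row labels and the `species` column, as `head()` does. Older versions return a result indexed by the group key. The checks below therefore only rely on the `sepal_width` column.

```python
import pandas as pd

# Offline stand-in (assumption): the same 12 sampled iris rows shown above
df = pd.DataFrame({
    'species': ['setosa'] * 5 + ['versicolor'] * 3 + ['virginica'] * 4,
    'sepal_width': [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5, 2.9, 2.8, 3.0, 3.4],
})

# Option 1: sort in descending order, then keep the first row per group
largest_desc = (
    df.sort_values('sepal_width', ascending=False)
    .groupby('species')
    .head(1)
)

# Option 2: keep the ascending sort and take the last row per group
largest_tail = (
    df.sort_values('sepal_width')
    .groupby('species')
    .tail(1)
)

# nth() is 0-based, so nth(1) selects the second-smallest width per group
second_smallest = (
    df.sort_values('sepal_width')
    .groupby('species')
    .nth(1)
)

print(largest_desc)
print(largest_tail)
print(second_smallest)
```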