Get first row(s) of each group

To get the first row, or first n rows, of each group in a pandas DataFrame, you can use groupby() followed by head().

head() returns rows in the order they already appear, so sort the DataFrame by the column that defines "first" before grouping.

head() also keeps the original row labels, so you may want to reset the index afterwards.

Here is an example:

# Import libraries
import pandas as pd
import seaborn as sns

# Load sample data in a DataFrame
df = (
    sns.load_dataset('iris')
    .sample(n=12, random_state=24)
    .sort_values('species')
    .reset_index(drop=True)
    [['species', 'sepal_width']]
)
df
       species  sepal_width
0       setosa          3.4
1       setosa          3.7
2       setosa          3.2
3       setosa          3.1
4       setosa          3.8
5   versicolor          2.4
6   versicolor          2.7
7   versicolor          2.5
8    virginica          2.9
9    virginica          2.8
10   virginica          3.0
11   virginica          3.4

# Get smallest value of each group
(
    df
    .sort_values('sepal_width')
    .groupby('species')
    .head(1)
    .reset_index(drop=True)
)
      species  sepal_width
0  versicolor          2.4
1   virginica          2.8
2      setosa          3.1
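
To get the first n rows of each group rather than just one, pass n to head(). The sketch below assumes the same twelve rows as the sample above, typed in by hand so it runs without seaborn; the name smallest_two is only illustrative.

```python
import pandas as pd

# Same values as the sample shown above, entered manually (assumption:
# this mirrors the sampled iris rows, it is not reloaded from seaborn).
df = pd.DataFrame({
    "species": ["setosa"] * 5 + ["versicolor"] * 3 + ["virginica"] * 4,
    "sepal_width": [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5, 2.9, 2.8, 3.0, 3.4],
})

# Two smallest sepal widths per species
smallest_two = (
    df
    .sort_values("sepal_width")               # defines what "first" means
    .groupby("species")
    .head(2)                                  # keep the first two rows of each group
    .sort_values(["species", "sepal_width"])  # optional: tidy the output order
    .reset_index(drop=True)
)
print(smallest_two)
```

The second sort_values() is purely cosmetic: without it, the rows come back in global sepal_width order, not grouped by species.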

And that’s it! A couple of last remarks:

  • If you want to get the last row, sort in reverse order with ascending=False (or keep the ascending sort and use tail(1) instead of head(1))
  • And to get the nth row, use nth() instead of head(); note that nth() counts from zero, so nth(0) is the first row
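
Both remarks can be sketched as follows. This again assumes the sample values entered by hand, and the variable names (by_width, largest, largest_tail, second_smallest) are only illustrative.

```python
import pandas as pd

# Same values as the sample shown above, entered manually.
df = pd.DataFrame({
    "species": ["setosa"] * 5 + ["versicolor"] * 3 + ["virginica"] * 4,
    "sepal_width": [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5, 2.9, 2.8, 3.0, 3.4],
})

by_width = df.sort_values("sepal_width")

# Largest value per group, option 1: sort descending, then take the first row
largest = (
    df
    .sort_values("sepal_width", ascending=False)
    .groupby("species")
    .head(1)
)

# Largest value per group, option 2: keep the ascending sort and take the last row
largest_tail = by_width.groupby("species").tail(1)

# Second smallest value per group: nth() is zero-based, so nth(1) is the second row
second_smallest = by_width.groupby("species").nth(1)

print(largest)
print(largest_tail)
print(second_smallest)
```

Depending on the pandas version, nth() may return the result indexed by the group key rather than by the original row labels, so it is worth checking the shape of the output before chaining reset_index().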