Get top rows or random rows within groups in a DataFrame

# Import libraries
import pandas as pd
import seaborn as sns

# Load sample data in a DataFrame
df = (
    sns.load_dataset('iris')
    .sample(n=20, random_state=20)
    .sort_values('species')
    .reset_index(drop=True)
    [['species', 'sepal_width']]
)
df
       species  sepal_width
0       setosa          3.2
1       setosa          3.0
2       setosa          3.4
3       setosa          3.2
4       setosa          3.7
5       setosa          3.0
6   versicolor          2.7
7   versicolor          2.8
8   versicolor          2.9
9   versicolor          2.8
10  versicolor          2.9
11  versicolor          2.5
12  versicolor          2.4
13   virginica          3.2
14   virginica          2.5
15   virginica          2.8
16   virginica          3.0
17   virginica          2.2
18   virginica          3.0
19   virginica          2.8

Get top N rows of each group

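One option is to sort the values, then call groupby() followed by head(n). Because head() keeps the existing row order within each group, it returns the first n rows of the sorted data for every group: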

# Get top 3 rows for each group, sorted by decreasing sepal width
(
    df
    .sort_values(['species', 'sepal_width'], ascending=[True, False])
    .groupby('species')
    .head(3)
)
       species  sepal_width
4       setosa          3.7
2       setosa          3.4
0       setosa          3.2
8   versicolor          2.9
10  versicolor          2.9
7   versicolor          2.8
13   virginica          3.2
16   virginica          3.0
18   virginica          3.0
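
As a possible alternative to sorting the whole DataFrame first, the sketch below uses nlargest() on the grouped column and then looks the full rows up by their original labels. The small `toy` frame and its values are made up for illustration so the snippet runs without seaborn; they are not the iris sample above.

```python
import pandas as pd

# Hypothetical stand-in data (not the iris sample used in this article)
toy = pd.DataFrame({
    'species': ['setosa', 'setosa', 'setosa',
                'versicolor', 'versicolor', 'versicolor'],
    'sepal_width': [3.2, 3.7, 3.4, 2.7, 2.9, 2.8],
})

# nlargest(2) on the grouped column keeps the 2 largest values per group;
# the result is indexed by (species, original row label)
top2 = toy.groupby('species')['sepal_width'].nlargest(2)

# Use the original row labels (second index level) to recover the full rows
top2_rows = toy.loc[top2.index.get_level_values(1)]
print(top2_rows)
```

Both approaches return exactly n rows per group when a group has at least n rows. If tied values at the cut-off matter, a ranking-based filter may be a better fit than either one.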

Get random rows within each group

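To retrieve n random rows from each group, call sample(n) on each group via groupby() followed by apply(). No random_state is set here, so the selected rows change from run to run and the output below is only one possible draw: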

# Get 1 random row within each group of the DataFrame
(
    df
    .groupby('species')
    .apply(lambda x: x.sample(1))
)
                  species  sepal_width
species
setosa     1       setosa          3.0
versicolor 10  versicolor          2.9
virginica  13   virginica          3.2
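
In pandas 1.1 and later, the grouped object also has its own sample() method, which avoids apply() and keeps the original flat index. The sketch below assumes such a version is installed and uses a made-up `toy` frame rather than the iris sample; the `random_state` value is arbitrary and is only there to make the draw repeatable.

```python
import pandas as pd

# Hypothetical stand-in data (not the iris sample used in this article)
toy = pd.DataFrame({
    'species': ['setosa', 'setosa', 'setosa',
                'versicolor', 'versicolor', 'versicolor'],
    'sepal_width': [3.2, 3.7, 3.4, 2.7, 2.9, 2.8],
})

# Draw 1 random row per group; requires pandas >= 1.1.
# Fixing random_state makes repeated runs return the same rows.
picked = toy.groupby('species').sample(n=1, random_state=0)
print(picked)
```

Unlike the apply() version, the result here is indexed only by the original row labels, with no extra species level added to the index.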