Cross-join two pandas DataFrames
Suppose you have a set of groups and a set of subgroups, and you want a single DataFrame that contains every possible combination of group and subgroup.
# Import libraries
import pandas as pd
from IPython.display import display
# Create DataFrames
df_groups = pd.DataFrame({'group': ['A', 'B', 'C']})
display(df_groups)
df_subgroups = pd.DataFrame({'subgroup': list(range(5))})
display(df_subgroups)
|   | subgroup |
|---|----------|
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
This can be done with a cross-join, using the merge() function with the argument how='cross'.
# Combine DataFrames with a cross-join
df_groups.merge(df_subgroups, how='cross')
|    | group | subgroup |
|----|-------|----------|
| 0  | A | 0 |
| 1  | A | 1 |
| 2  | A | 2 |
| 3  | A | 3 |
| 4  | A | 4 |
| 5  | B | 0 |
| 6  | B | 1 |
| 7  | B | 2 |
| 8  | B | 3 |
| 9  | B | 4 |
| 10 | C | 0 |
| 11 | C | 1 |
| 12 | C | 2 |
| 13 | C | 3 |
| 14 | C | 4 |
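The how='cross' option is only available in pandas 1.2 and later. On older versions, one common workaround is to merge on a temporary constant key and then drop it. The sketch below shows both approaches side by side and checks that they agree; the column name `_key` is a hypothetical placeholder chosen for this example and is assumed not to clash with any existing column.

```python
import pandas as pd

df_groups = pd.DataFrame({'group': ['A', 'B', 'C']})
df_subgroups = pd.DataFrame({'subgroup': list(range(5))})

# Built-in cross-join (requires pandas 1.2 or later)
df_cross = df_groups.merge(df_subgroups, how='cross')

# Workaround for older pandas: join on a temporary constant key, then drop it.
# '_key' is a hypothetical column name used only for this sketch.
df_fallback = (
    df_groups.assign(_key=1)
    .merge(df_subgroups.assign(_key=1), on='_key')
    .drop(columns='_key')
)

# A cross-join has len(left) * len(right) rows: 3 * 5 = 15 here
expected_rows = len(df_groups) * len(df_subgroups)
print(len(df_cross), len(df_fallback), expected_rows)
```

Comparing the row count with the product of the two input lengths is a cheap sanity check. It is also a reminder that a cross-join grows multiplicatively, so it can become very large for big inputs.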