Setup
# Import libraries
import pandas as pd
# Create sample dataframe
df = pd.DataFrame(
{'zip_codes':["75002, 75005, 75007, 75019, 75020", "13000, 13001, 13005", "69001, 69003, 69004"]},
index=['Paris', 'Marseille', 'Lyon']
)
df
zip_codes | |
Paris | 75002, 75005, 75007, 75019, 75020 |
Marseille | 13000, 13001, 13005 |
Lyon | 69001, 69003, 69004 |
Split values and expand into multiple rows
Use explode()
to break down a list into multiple rows
# Split values on commas, and expand into multiple rows
df = (
df
.assign(zip_codes=lambda x: x['zip_codes'].str.split(','))
.explode('zip_codes')
)
df
zip_codes | |
Paris | 75002 |
Paris | 75005 |
Paris | 75007 |
Paris | 75019 |
Paris | 75020 |
Marseille | 13000 |
Marseille | 13001 |
Marseille | 13005 |
Lyon | 69001 |
Lyon | 69003 |
Lyon | 69004 |
Regroup rows into a list
The inverse operation, regrouping multiple rows into a list, is achieved with a grouby()
followed by agg(list)
.
# Regroup rows with the same index into a list
(
df
.groupby(df.index)
.agg(list)
)
zip_codes | |
Lyon | [69001, 69003, 69004] |
Marseille | [13000, 13001, 13005] |
Paris | [75002, 75005, 75007, 75019, 75020] |