A common way of storing hierarchical data is the tree:
In SQL, the methods for going up a tree are very different between SQL dialects. In pandas however, it is relatively simple.
For this example, we’ll use the list of Google Ads geotargets, that stores locations in a hierarchical structure, and get the list of all parents for each location.
Setup
# Import libraries
import pandas as pd
import numpy as np
# Import Google Ads location data
df = (
pd.read_csv('https://developers.google.com/adwords/api/docs/appendix/geo/geotargets-2021-02-24.csv')
.rename(columns=lambda x: x.lower().replace(' ', '_'))
.loc[lambda x: x['country_code'] == 'FR']
.reset_index(drop=True)
.assign(parent_id=lambda x: x['parent_id'].fillna(0).astype('int'))
[['criteria_id', 'canonical_name', 'parent_id']]
)
df
criteria_id | canonical_name | parent_id | |
0 | 1005781 | Algolsheim,Grand Est,France | 9068896 |
1 | 1005782 | Cernay,Grand Est,France | 9068896 |
2 | 1005783 | Colmar,Grand Est,France | 9068896 |
3 | 1005784 | Ensisheim,Grand Est,France | 9068896 |
4 | 1005785 | Erstein,Grand Est,France | 9068896 |
… | … | … | … |
3755 | 9060754 | Petite-France,Grand Est,France | 9068896 |
3756 | 9072487 | Les Halles,Ile-de-France,France | 20321 |
3757 | 9072488 | Place Vendome,Ile-de-France,France | 20321 |
3758 | 9072489 | Sorbonne,Ile-de-France,France | 20321 |
3759 | 9072490 | Saint-Germain-des-Pres,Ile-de-France,France | 20321 |
Recursive function
# Recursive algorithm: list the next parent if any, otherwise exit
def recursive(row, parents):
if row['parent_id'] == 0:
return parents
else:
parents.append(row['parent_id'])
return recursive(df.loc[df['criteria_id'] == row['parent_id'], :].squeeze(), parents)
# Call the recursive function for each row
def get_parents(row):
parents = []
return recursive(row, parents)
# Apply the function and create a column with the parents list for each location
df['parents_list'] = df.apply(get_parents, axis=1)
# Display resulting DataFrame
df.sort_values('parents_list')
criteria_id | canonical_name | parent_id | parents_list | |
683 | 2250 | France | 0 | [] |
669 | 20320 | Franche-Comte,France | 2250 | [2250] |
675 | 20326 | Nord-Pas-de-Calais,France | 2250 | [2250] |
674 | 20325 | Midi-Pyrenees,France | 2250 | [2250] |
673 | 20324 | Lorraine,France | 2250 | [2250] |
… | … | … | … | … |
3088 | 9056091 | 73370,Auvergne-Rhone-Alpes,France | 9069525 | [9069525, 2250] |
3089 | 9056092 | 73400,Auvergne-Rhone-Alpes,France | 9069525 | [9069525, 2250] |
3090 | 9056093 | 73600,Auvergne-Rhone-Alpes,France | 9069525 | [9069525, 2250] |
638 | 1006423 | Saint-Agreve,Auvergne-Rhone-Alpes,France | 9069525 | [9069525, 2250] |
3112 | 9056115 | 74500,Auvergne-Rhone-Alpes,France | 9069525 | [9069525, 2250] |