Go up a hierarchical tree with pandas

A common way of storing hierarchical data is the tree:

image

In SQL, the methods for going up a tree are very different between SQL dialects. In pandas however, it is relatively simple.

For this example, we’ll use the list of Google Ads geotargets, that stores locations in a hierarchical structure, and get the list of all parents for each location.

Setup

# Import libraries
import pandas as pd
import numpy as np

# Import Google Ads location data
df = (
    pd.read_csv('https://developers.google.com/adwords/api/docs/appendix/geo/geotargets-2021-02-24.csv')
    .rename(columns=lambda x: x.lower().replace(' ', '_'))
    .loc[lambda x: x['country_code'] == 'FR']
    .reset_index(drop=True)
    .assign(parent_id=lambda x: x['parent_id'].fillna(0).astype('int'))
    [['criteria_id', 'canonical_name', 'parent_id']]
)
df
criteria_id
canonical_name
parent_id
0
1005781
Algolsheim,Grand Est,France
9068896
1
1005782
Cernay,Grand Est,France
9068896
2
1005783
Colmar,Grand Est,France
9068896
3
1005784
Ensisheim,Grand Est,France
9068896
4
1005785
Erstein,Grand Est,France
9068896
…
…
…
…
3755
9060754
Petite-France,Grand Est,France
9068896
3756
9072487
Les Halles,Ile-de-France,France
20321
3757
9072488
Place Vendome,Ile-de-France,France
20321
3758
9072489
Sorbonne,Ile-de-France,France
20321
3759
9072490
Saint-Germain-des-Pres,Ile-de-France,France
20321

Recursive function

# Recursive algorithm: list the next parent if any, otherwise exit
def recursive(row, parents):
    if row['parent_id'] == 0:
        return parents
    else:
        parents.append(row['parent_id'])
        return recursive(df.loc[df['criteria_id'] == row['parent_id'], :].squeeze(), parents)

# Call the recursive function for each row
def get_parents(row):
    parents = []
    return recursive(row, parents)

# Apply the function and create a column with the parents list for each location
df['parents_list'] = df.apply(get_parents, axis=1)
# Display resulting DataFrame
df.sort_values('parents_list')
criteria_id
canonical_name
parent_id
parents_list
683
2250
France
0
[]
669
20320
Franche-Comte,France
2250
[2250]
675
20326
Nord-Pas-de-Calais,France
2250
[2250]
674
20325
Midi-Pyrenees,France
2250
[2250]
673
20324
Lorraine,France
2250
[2250]
…
…
…
…
…
3088
9056091
73370,Auvergne-Rhone-Alpes,France
9069525
[9069525, 2250]
3089
9056092
73400,Auvergne-Rhone-Alpes,France
9069525
[9069525, 2250]
3090
9056093
73600,Auvergne-Rhone-Alpes,France
9069525
[9069525, 2250]
638
1006423
Saint-Agreve,Auvergne-Rhone-Alpes,France
9069525
[9069525, 2250]
3112
9056115
74500,Auvergne-Rhone-Alpes,France
9069525
[9069525, 2250]