When I need to apply successive steps of data transformation to a DataFrame (which is basically every data analysis), I favor method chaining for better readability and efficiency. However, some operations remained rather obscure to me for a long time, such as renaming all columns in a single operation. Here are some tips that may come in handy.
Setup
```python
# Import library
import pandas as pd

# Create sample DataFrame
df = pd.DataFrame({'First Name': ['John', 'Aby', 'Bob', 'Alice'],
                   'Last Name': ['Doe', 'Parker', 'Morris', 'Allen']})
df
```
|   | First Name | Last Name |
|---|---|---|
| 0 | John | Doe |
| 1 | Aby | Parker |
| 2 | Bob | Morris |
| 3 | Alice | Allen |
Classic, non-chained method
A standard, non-chained way of applying transformative functions to all column names is to copy the DataFrame and assign a new list to its `columns` attribute:
```python
# Classic way
df2 = df.copy()
df2.columns = [col.lower().replace(' ', '_') for col in df2.columns]
df2
```
|   | first_name | last_name |
|---|---|---|
| 0 | John | Doe |
| 1 | Aby | Parker |
| 2 | Bob | Morris |
| 3 | Alice | Allen |
Apply transformative functions
With pandas method chaining, you can transform column names by passing a function to `.rename(columns=<function>)`, which calls that function on each column label and returns a new DataFrame.
Applying a string method that needs no extra arguments, like `lower()` or `title()`, is very short: pass the method itself, e.g. `str.lower`:
```python
# Method chaining: convert to lowercase
df.rename(columns=str.lower)
```
|   | first name | last name |
|---|---|---|
| 0 | John | Doe |
| 1 | Aby | Parker |
| 2 | Bob | Morris |
| 3 | Alice | Allen |
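Because `rename()` simply calls the function on each label, any callable that takes a string and returns a string should work, including one you write yourself. The sketch below uses a hypothetical helper, `to_snake_case` (not part of pandas), to reproduce the result of the classic method in a single chainable call. It also shows that `rename()` leaves `df` untouched, so no explicit `copy()` is needed.

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['John', 'Aby', 'Bob', 'Alice'],
                   'Last Name': ['Doe', 'Parker', 'Morris', 'Allen']})


def to_snake_case(name: str) -> str:
    """Hypothetical helper: lowercase a label and replace spaces with underscores."""
    return name.lower().replace(' ', '_')


# rename() applies the function to every column label and returns a new DataFrame
snake = df.rename(columns=to_snake_case)

print(list(snake.columns))  # ['first_name', 'last_name']
print(list(df.columns))     # ['First Name', 'Last Name'] (original unchanged)
```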
For a method that requires arguments, like `str.replace()`, wrap the call in a `lambda`:
```python
# Method chaining: replace strings
df.rename(columns=lambda x: x.replace(' ', ''))
```
|   | FirstName | LastName |
|---|---|---|
| 0 | John | Doe |
| 1 | Aby | Parker |
| 2 | Bob | Morris |
| 3 | Alice | Allen |
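These calls pay off once they sit inside a longer pipeline. In the minimal sketch below, each step receives the output of the previous one, so the second `rename()` already sees lowercase names and `sort_values()` can use the final name. The sorting and index reset are only there to illustrate a longer chain; they are not part of the renaming technique itself.

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['John', 'Aby', 'Bob', 'Alice'],
                   'Last Name': ['Doe', 'Parker', 'Morris', 'Allen']})

result = (
    df
    .rename(columns=str.lower)                      # 'First Name' -> 'first name'
    .rename(columns=lambda x: x.replace(' ', '_'))  # 'first name' -> 'first_name'
    .sort_values('last_name')                       # refers to the renamed column
    .reset_index(drop=True)                         # renumber rows 0..n-1
)
print(result)
```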
Specify new column names
If you want to assign a complete list of new column names, use `set_axis()` with `axis=1`. The list must contain exactly one name per column, in order:
```python
# Method chaining: rename all columns
df.set_axis(['given_name', 'family_name'], axis=1)
```
|   | given_name | family_name |
|---|---|---|
| 0 | John | Doe |
| 1 | Aby | Parker |
| 2 | Bob | Morris |
| 3 | Alice | Allen |
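Assuming pandas 1.0 or later (where `set_axis()` no longer defaults to working in place), `set_axis()` returns a new DataFrame and can sit in the middle of a chain. The list then has to match the number of columns at that point of the chain, not in the original frame. In this sketch, the `initials` column added with `assign()` is purely illustrative, and the final call shows that a list of the wrong length raises a `ValueError` instead of renaming only some columns.

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['John', 'Aby', 'Bob', 'Alice'],
                   'Last Name': ['Doe', 'Parker', 'Morris', 'Allen']})

result = (
    df
    # Illustrative extra column: first letter of each name
    .assign(initials=lambda d: d['First Name'].str[0] + d['Last Name'].str[0])
    # Three columns exist at this point, so three names are needed
    .set_axis(['given_name', 'family_name', 'initials'], axis=1)
)
print(result)

# A list with the wrong number of names is rejected
try:
    df.set_axis(['given_name'], axis=1)
except ValueError as err:
    print(f'ValueError: {err}')
```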