Handle dates in pandas
Convert strings to datetime
import pandas as pd
# Create a dataframe with dates
df = pd.DataFrame(['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020', '01-05-2020'], columns=['date'])
df
date | |
---|---|
0 | 01-01-2020 |
1 | 01-02-2020 |
2 | 01-03-2020 |
3 | 01-04-2020 |
4 | 01-05-2020 |
pd.to_datetime()
is the most useful pandas function for converting strings to datetime. It tries to guess the date format if you don’t specify it. For a cheatsheet on Python’s strftime parameters, you can refer to strftime.org.
# Without specifying a date format
pd.to_datetime(df['date'])
0 2020-01-01
1 2020-01-02
2 2020-01-03
3 2020-01-04
4 2020-01-05
Name: date, dtype: datetime64[ns]
# With the format specified
df['date_converted'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df['date_converted']
0 2020-01-01
1 2020-02-01
2 2020-03-01
3 2020-04-01
4 2020-05-01
Name: date_converted, dtype: datetime64[ns]
Extract date parts
The pd.Series.dt
accessor extracts parts of a timestamp.
# Extract year
df['date_converted'].dt.year
0 2020
1 2020
2 2020
3 2020
4 2020
Name: date_converted, dtype: int64
# Extract week day
df['date_converted'].dt.weekday
0 2
1 5
2 6
3 2
4 4
Name: date_converted, dtype: int64
To get the name of the weekday, use dt.day_name()
. You can even specify the locale
you want, e.g.:
# Get name of the weekday in FR locale
df['date_converted'].dt.day_name(locale='fr_FR')
0 Mercredi
1 Samedi
2 Dimanche
3 Mercredi
4 Vendredi
Name: date_converted, dtype: object
Truncate date
To truncate a date or datetime to a less granular frequency, use dt.to_period()
with a specified offset, for example:
D
: calendar dayW
: weekM
: monthQ
: quarterY
: year
# Truncate date to month
df['date_converted'].dt.to_period('M')
0 2020-01
1 2020-02
2 2020-03
3 2020-04
4 2020-05
Name: date_converted, dtype: period[M]
# Truncate date to quarter
df['date_converted'].dt.to_period('Q')
0 2020Q1
1 2020Q1
2 2020Q1
3 2020Q2
4 2020Q2
Name: date_converted, dtype: period[Q-DEC]
Convert date to string
strftime()
is the Python function for converting datetime to strings. You can apply it to a Series or DataFrame with apply()
.
# Convert to string, with a different format
df['date_converted'].apply(lambda x: x.strftime('%B %-d, %Y'))
0 January 1, 2020
1 February 1, 2020
2 March 1, 2020
3 April 1, 2020
4 May 1, 2020
Name: date_converted, dtype: object
Convert epoch to date
pd.to_datetime()
can convert an UNIX epoch up to the nanosecond, specified with the unit parameter.
# Convert epoch in seconds to date
pd.to_datetime(1587239920, unit='s')
Timestamp('2020-04-18 19:58:40')
Change an element of a date
To change the element of date (year, month or day), you must first convert the date to a Timestamp
, then apply replace()
, and specify the new value of the element:
# Change the year of the date
(
df['date']
.apply(pd.Timestamp)
.apply(lambda y: y.replace(year=1996))
)
0 1996-01-01
1 1996-02-01
2 1996-03-01
3 1996-04-01
4 1996-05-01
Name: date, dtype: datetime64[ns]