Visualizing time-series data is a common task in data analysis, and bar plots are often used to represent data over time. When using dates as axis labels in a Pandas bar plot (plot.bar()), the default rendering includes both date and time (e.g., 2021-01-01 00:00:00). This can lead to cluttered and less readable plots. In this post, we'll explore two methods to format dates for a cleaner and more informative visualization.
First, let's create a sample DataFrame with dates and random integer values:
# Import libraries
import pandas as pd
from numpy.random import randint
# Create sample data
df = pd.DataFrame({'date': pd.date_range(start='01/01/2021', periods=15),
                   'value': randint(10, 20, 15)})
# Default plot with full datetime
df.plot.bar(x='date');
By default, the plot.bar() method renders each label as a full datetime, including the time component. That is rarely ideal when the time component is irrelevant to the analysis, as it is here, where every timestamp falls at midnight.
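To confirm what the axis actually shows, the tick labels can be read back from the Axes object that plot.bar() returns. The snippet below is a minimal sketch, assuming matplotlib is installed; the Agg backend and the fixed sample values are choices made here so the example runs without a display, not part of the setup above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Small sample frame (fixed values instead of random ones, for repeatability)
sample = pd.DataFrame({'date': pd.date_range(start='2021-01-01', periods=3),
                       'value': [12, 15, 11]})

ax = sample.plot.bar(x='date')

# Read back the text of each x-axis tick label
labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(labels)

# Release the figure once the labels have been inspected
plt.close(ax.figure)
```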
Keep only the date part with dt.date
One way to address this is to keep only the date part of the datetime object. This can be done using the dt.date attribute:
# Keep only the date
df.assign(date=lambda x: x['date'].dt.date).plot.bar(x='date');
This method removes the time component, leaving only the date in the YYYY-MM-DD format.
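As a quick check of what dt.date produces, the sketch below compares one value before and after the conversion. The variable names are illustrative. Note that the result holds plain Python date objects rather than pandas timestamps.

```python
import datetime
import pandas as pd

dates = pd.Series(pd.date_range(start='2021-01-01', periods=3))
date_only = dates.dt.date

print(str(dates.iloc[0]))      # string form of the original Timestamp
print(str(date_only.iloc[0]))  # string form after dropping the time

# Each element is now a plain datetime.date, not a pandas Timestamp
print(type(date_only.iloc[0]) is datetime.date)
```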
More control with dt.strftime
For even more control over the date format, you can use the dt.strftime method. This allows you to specify the exact format you want:
# Format date with strftime
df.assign(date=lambda x: x['date'].dt.strftime('%b %d')).plot.bar(x='date');
Here, the date is formatted to show the abbreviated month name and the zero-padded day of the month (e.g., Jan 01), providing a concise and readable representation.
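Other format strings work the same way. The sketch below applies a few standard strftime directives to a small series; the particular formats are arbitrary examples, and directives that produce names (such as %b) follow the active locale.

```python
import pandas as pd

dates = pd.Series(pd.date_range(start='2021-01-01', periods=3))

# A few illustrative format strings
formats = ['%b %d', '%d/%m/%Y', '%Y-%m', '%m-%d']

# Map each format string to the list of formatted labels it produces
formatted = {fmt: dates.dt.strftime(fmt).tolist() for fmt in formats}
for fmt, values in formatted.items():
    print(fmt, '->', values)
```

The result is a column of strings, so it is usually best to apply the formatting only for plotting (as the df.assign call above does) and keep the original datetime column for calculations.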