- Introduction
- Setup and data loading
- Prepare train, validation and test sets
- Simple forecasts
- Naive forecast
- Moving average
- Differentiated moving average
- Smoothed differentiated average
- Neural Networks
- Find optimal learning rate
- Train model with optimal learning rate
- Evaluate on test set
- Recurrent Neural Network
- Find optimal learning rate
- Train model with optimal learning rate
- Evaluate on test set
- LSTM model
- Find optimal learning rate
- Train model with optimal learning rate
- Evaluate on test set
- Models comparison
Introduction
This notebook is largely inspired by Laurence Moroney’s Coursera tutorial on time series prediction. The goal is to compare models for predicting daily temperatures in New York City, from the simplest methods up to TensorFlow neural networks, using 10 years of historical data available in the BigQuery public datasets.
Setup and data loading
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
# Settings
sns.set(rc={'figure.figsize':(10,6)})
%%bigquery df
# Get 10 years of NYC daily temperatures
SELECT DISTINCT
  PARSE_DATE("%Y%m%d", CONCAT(year, mo, da)) AS date,
  ROUND((temp - 32) * 5/9, 1) AS temp
FROM `bigquery-public-data.noaa_gsod.gsod*`
WHERE
  wban = '94728'
  AND _TABLE_SUFFIX BETWEEN '2010' AND '2019'
ORDER BY date
# Prepare dataframe
df = df.reset_index()
df['date'] = pd.to_datetime(df['date'])
df
|      | index | date       | temp |
|------|-------|------------|------|
| 0    | 0     | 2010-01-01 | 2.2  |
| 1    | 1     | 2010-01-02 | -1.8 |
| 2    | 2     | 2010-01-03 | -7.3 |
| 3    | 3     | 2010-01-04 | -5.0 |
| 4    | 4     | 2010-01-05 | -4.2 |
| …    | …     | …          | …    |
| 3647 | 3647  | 2019-12-27 | 9.7  |
| 3648 | 3648  | 2019-12-28 | 8.9  |
| 3649 | 3649  | 2019-12-29 | 5.9  |
| 3650 | 3650  | 2019-12-30 | 4.1  |
| 3651 | 3651  | 2019-12-31 | 3.7  |
Prepare train, validation and test sets
We have 10 years of daily temperature data. The first 8 years (2010 to 2017) will be used as the training set, 2018 as the validation set, and 2019 as the test set. For time series, it would be inappropriate to select rows at random for the training/validation/test sets, as the order of observations must be preserved.
# Helper function for plotting series
def plot_series(time, series, format="-", start=0, end=None):
    plt.plot(time[start:end], series[start:end], format)
    plt.xlabel("Time")
    plt.ylabel("Value")
    plt.grid(True)
# Helper function for MSE and MAE
def mse_mae(x_valid, x_pred, name=None):
    return pd.Series(
        data=[
            keras.metrics.mean_squared_error(x_valid, x_pred).numpy().round(2),
            keras.metrics.mean_absolute_error(x_valid, x_pred).numpy().round(2)
        ],
        index=['MSE', 'MAE'],
        name=name
    )
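For reference, over $n$ days with observations $x_t$ and forecasts $\hat{x}_t$, these two metrics are:

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(x_t - \hat{x}_t\right)^2 \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|x_t - \hat{x}_t\right|$$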
# Create numpy arrays of time steps and temperatures
time = df.loc[df['date'] >= '2010-01-01', 'index'].to_numpy()
series = df.loc[df['date'] >= '2010-01-01', 'temp'].to_numpy()
# Split between training, validation and test sets
valid_start = int(df.loc[df['date'] == '2018-01-01', 'index'].iloc[0])
test_start = int(df.loc[df['date'] == '2019-01-01', 'index'].iloc[0])
time_train = time[:valid_start]
x_train = series[:valid_start]
time_valid = time[valid_start:test_start]
x_valid = series[valid_start:test_start]
time_test = time[test_start:]
x_test = series[test_start:]
# Plot training set
plot_series(time_train, x_train)
# Plot validation set
plot_series(time_valid, x_valid)
# Plot test set
plot_series(time_test, x_test)
Simple forecasts
Naive forecast
A performance baseline can be established with a very basic model: predict each day’s temperature with the previous day’s value.
# Naive forecast is just day N-1
naive_forecast = series[valid_start-1:test_start-1]
# Plot result
plot_series(time_valid, x_valid)
plot_series(time_valid, naive_forecast)
# Zoom in on the first 50 days
plot_series(time_valid[:50], x_valid[:50])
plot_series(time_valid[:50], naive_forecast[:50])
# Performance
naive_forecast_perfs = mse_mae(x_valid, naive_forecast, name='Naive Forecast')
naive_forecast_perfs
MSE 12.01
MAE 2.61
Name: Naive Forecast, dtype: float64
Moving average
Let’s compare our baseline model with a simple 30-day moving average.
# Helper function to calculate moving average
def moving_average_forecast(series, window_size):
    forecast = []
    for time in range(len(series) - window_size):
        forecast.append(series[time:time + window_size].mean())
    return np.array(forecast)
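The Python loop above is fine for a series of this size. For reference, here is an equivalent vectorized sketch with pandas (not used below) that returns the same array:

# Vectorized equivalent: element i is the mean of series[i:i + window_size]
def moving_average_forecast_fast(series, window_size):
    return pd.Series(series).rolling(window_size).mean().dropna().to_numpy()[:-1]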
# Moving average over 30 days
moving_avg = moving_average_forecast(series, 30)[valid_start - 30:test_start-30]
# Plot result
plot_series(time_valid, x_valid)
plot_series(time_valid, moving_avg)
# Performance
moving_avg_perfs = mse_mae(x_valid, moving_avg, name="Moving average")
moving_avg_perfs
MSE 23.92
MAE 3.73
Name: Moving average, dtype: float64
The performance is not nearly as good as the baseline: the MSE is almost twice as high. The model captures the general trend but completely misses the daily variations, and lags the seasonality by about 30 days.
Differentiated moving average
The moving average smooths the past but does not anticipate future values. For seasonal time series like temperatures, the values from the previous year may be useful. One strategy is to predict the difference between day D and day D-365, then add back the value from one year earlier and see if the prediction improves.
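Writing $x_t$ for the temperature on day $t$ and $\mathrm{MA}_{w}$ for a trailing $w$-day moving average, the forecast computed below is:

$$\hat{x}_t = x_{t-365} + \mathrm{MA}_{50}\left(x_t - x_{t-365}\right)$$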
# Remove seasonality by subtracting the value from 365 days earlier
diff_series = (series[365:test_start] - series[:test_start-365])
diff_time = time[365:test_start]
plot_series(diff_time, diff_series)
# Differentiated moving average: 50-day moving average of the differenced series
diff_moving_avg = moving_average_forecast(diff_series, 50)[valid_start - 365 - 50:test_start - 365 - 50]
plot_series(time_valid, diff_series[valid_start - 365:test_start - 365])
plot_series(time_valid, diff_moving_avg)
# Add back the value from one year earlier to get the actual forecast
diff_moving_avg_plus_past = series[valid_start - 365:test_start - 365] + diff_moving_avg
plot_series(time_valid, x_valid)
plot_series(time_valid, diff_moving_avg_plus_past)
# Performance
diff_moving_avg_plus_past_perfs = mse_mae(x_valid, diff_moving_avg_plus_past, name="Differentiated moving average")
diff_moving_avg_plus_past_perfs
MSE 35.80
MAE 4.71
Name: Differentiated moving average, dtype: float64
The performance of the differentiated moving average is significantly worse than that of the simple moving average, so this strategy does not work here.
Smoothed differentiated average
To remove some of the noise of the previous method, the differentiated moving average, we can add back a moving average of the previous year's values instead of the raw daily values.
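In the notation above, the raw value $x_{t-365}$ is replaced by a 50-day average taken around one year earlier (days $t-370$ to $t-321$ in the code below):

$$\hat{x}_t = \frac{1}{50}\sum_{k=321}^{370} x_{t-k} \;+\; \mathrm{MA}_{50}\left(x_t - x_{t-365}\right)$$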
# To smooth the forecast, add a moving average of the values from about one year earlier
diff_moving_avg_plus_smooth_past = (
    moving_average_forecast(series[valid_start - 370:test_start - 320], 50)
    + diff_moving_avg
)
plot_series(time_valid, x_valid)
plot_series(time_valid, diff_moving_avg_plus_smooth_past)
# Performance
diff_moving_avg_plus_smooth_past_perfs = mse_mae(
    x_valid,
    diff_moving_avg_plus_smooth_past,
    name="Smoothed diff. moving average"
)
diff_moving_avg_plus_smooth_past_perfs
MSE 29.26
MAE 4.37
Name: Smoothed diff. moving average, dtype: float64
This improves the prediction performance, bringing it close to that of the 30-day moving average model.
Neural Networks
The first neural network, built with TensorFlow, is a relatively simple dense model (DNN) with three sequential layers.
# Helper function to prepare the dataset of (features, label) batches
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    dataset = tf.data.Dataset.from_tensor_slices(series)
    # Slide a window of window_size + 1 values over the series
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
    # Shuffle the windows and split each into (features, label)
    dataset = dataset.shuffle(shuffle_buffer).map(
        lambda window: (window[:-1], window[-1])
    )
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset
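As a quick sanity check (an addition to the original notebook), here is what windowed_dataset yields on a toy series; the exact windows in the batch vary with shuffling:

# One batch from the series 0..9, with windows of 3 features + 1 label
toy = windowed_dataset(
    np.arange(10, dtype=np.float32), window_size=3, batch_size=2, shuffle_buffer=5
)
for features, label in toy.take(1):
    print(features.numpy())  # e.g. [[3. 4. 5.] [0. 1. 2.]] -- shape (2, 3)
    print(label.numpy())     # e.g. [6. 3.] -- the value following each window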
# Prepare dataset
window_size = 20
batch_size = 32
shuffle_buffer_size = 1000
dataset = windowed_dataset(
    x_train, window_size, batch_size, shuffle_buffer_size
)
Find optimal learning rate
# Search for optimal learning rate
tf.keras.backend.clear_session()
dnn_grid_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, input_shape=[window_size], activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1)
])
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10**(epoch / 20))
dnn_grid_model.compile(
    loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-8, momentum=0.9)
)
dnn_grid_history = dnn_grid_model.fit(
    dataset, epochs=200, callbacks=[lr_schedule], verbose=0
)
# Plot loss vs learning rate
lrs = 1e-8 * (10 ** (np.arange(200) / 20))
plt.semilogx(lrs, dnn_grid_history.history["loss"])
plt.axis([1e-8, 1e-3, 0, 50]);
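The best learning rate is usually read off the plot, just before the loss becomes unstable. As a rough programmatic heuristic (an addition to the original tutorial), one can take the loss-minimizing rate and back off by an order of magnitude:

# Loss-minimizing rate; dividing by 10 backs off to the stabler region before it
best_lr = lrs[np.argmin(dnn_grid_history.history["loss"])]
print(best_lr, best_lr / 10)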
Train model with optimal learning rate
# Retrain using the best learning rate (i.e. 1e-4)
tf.keras.backend.clear_session()
dnn_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=[window_size]),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1)
])
dnn_model.compile(
    loss="mse", optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9)
)
dnn_history = dnn_model.fit(dataset, epochs=150, verbose=0)
# Look at training loss vs epochs
loss = dnn_history.history['loss']
epochs = range(len(loss))
plt.plot(epochs, loss, 'b', label='Training Loss');
# Predict on validation set
dnn_forecast = []
for t in range(valid_start - window_size, test_start - window_size):
    dnn_forecast.append(
        dnn_model.predict(series[t:t + window_size][np.newaxis])
    )
dnn_results = np.array(dnn_forecast)[:, 0, 0]
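Calling predict once per window is slow. An equivalent sketch (same results, given the variables defined above) stacks all validation windows and predicts them in a single batched call:

# Stack all validation windows and predict them in one batch
windows = np.stack([
    series[t:t + window_size]
    for t in range(valid_start - window_size, test_start - window_size)
])
dnn_results_batched = dnn_model.predict(windows).squeeze()  # same values as dnn_results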
# Performance on validation set
mse_mae(x_valid, dnn_results)
MSE    10.7
MAE     2.5
dtype: float64
Evaluate on test set
# Predict on test set
dnn_test_forecast = []
for t in range(test_start - window_size, len(series) - window_size):
    dnn_test_forecast.append(
        dnn_model.predict(series[t:t + window_size][np.newaxis])
    )
dnn_test_results = np.array(dnn_test_forecast)[:, 0, 0]
# Plot results
plot_series(time_test, x_test)
plot_series(time_test, dnn_test_results)
# Performance on test set
dnn_perfs = mse_mae(x_test, dnn_test_results, name="DNN")
dnn_perfs
MSE 11.65
MAE 2.43
Name: DNN, dtype: float64
A good start: this first DNN model already performs better than the naive forecast.
Recurrent Neural Network
RNNs use recurrent layers, which can perform well on time series prediction. Note the two Lambda layers in the models below: the first expands the input to the [batch, time steps, features] shape that recurrent layers expect, and the last scales the output up to the temperature range, since tanh activations keep the recurrent outputs in [-1, 1].
Find optimal learning rate
# Build model
tf.keras.backend.clear_session()
rnn_grid_model = tf.keras.models.Sequential([
    # Add a features dimension: [batch, time steps] -> [batch, time steps, 1]
    tf.keras.layers.Lambda(
        lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]
    ),
    tf.keras.layers.SimpleRNN(40, return_sequences=True),
    tf.keras.layers.SimpleRNN(40),
    tf.keras.layers.Dense(1),
    # Scale the output up to the temperature range
    tf.keras.layers.Lambda(lambda x: x * 100.0)
])
# Search for optimal learning rate
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10 ** (epoch / 20)
)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-8, momentum=0.9)
rnn_grid_model.compile(loss=tf.keras.losses.Huber(),
                       optimizer=optimizer,
                       metrics=["mae"])
rnn_grid_history = rnn_grid_model.fit(
    dataset, epochs=100, callbacks=[lr_schedule], verbose=0
)
# Plot losses
plt.semilogx(rnn_grid_history.history["lr"], rnn_grid_history.history["loss"])
plt.axis([1e-8, 1e-4, 0, 30]);
Train model with optimal learning rate
# Build optimal model
tf.keras.backend.clear_session()
rnn_model = tf.keras.models.Sequential([
    tf.keras.layers.Lambda(
        lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]
    ),
    tf.keras.layers.SimpleRNN(40, return_sequences=True),
    tf.keras.layers.SimpleRNN(40),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 100.0)
])
# Train model
rnn_model.compile(loss=tf.keras.losses.Huber(),
                  optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9),
                  metrics=["mae"])
rnn_history = rnn_model.fit(dataset, epochs=200, verbose=0)
# Look at training loss vs epochs
loss = rnn_history.history['loss']
epochs = range(20, len(loss))
plt.plot(epochs, loss[20:], 'b', label='Training Loss');
# Predict on validation set
rnn_forecast = []
for t in range(valid_start - window_size, test_start - window_size):
    rnn_forecast.append(
        rnn_model.predict(series[t:t + window_size][np.newaxis])
    )
rnn_results = np.array(rnn_forecast)[:, 0, 0]
# Performance
mse_mae(x_valid, rnn_results)
MSE    15.76
MAE     2.93
dtype: float64
Evaluate on test set
# Predict on test set
rnn_test_forecast = []
for t in range(test_start - window_size, len(series) - window_size):
    rnn_test_forecast.append(
        rnn_model.predict(series[t:t + window_size][np.newaxis])
    )
rnn_test_results = np.array(rnn_test_forecast)[:, 0, 0]
# Plot results
plot_series(time_test, x_test)
plot_series(time_test, rnn_test_results)
# Performance on test set
rnn_perfs = mse_mae(x_test, rnn_test_results, name="RNN")
rnn_perfs
MSE 16.01
MAE 2.99
Name: RNN, dtype: float64
This RNN actually performs significantly worse than the simple DNN, and does not outperform the naive baseline.
LSTM model
The model can be made more sophisticated with bidirectional Long Short-Term Memory (LSTM) layers of 32 units each.
Find optimal learning rate
# Build model
tf.keras.backend.clear_session()
lstm_grid_model = tf.keras.models.Sequential([
    tf.keras.layers.Lambda(
        lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]
    ),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(32, return_sequences=True)
    ),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 100.0)
])
# Search for optimal learning rate
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10**(epoch / 20))
lstm_grid_model.compile(
    loss=tf.keras.losses.Huber(),
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-8, momentum=0.9),
    metrics=["mae"]
)
lstm_grid_history = lstm_grid_model.fit(
    dataset, epochs=100, callbacks=[lr_schedule], verbose=0
)
# Plot losses
plt.semilogx(lstm_grid_history.history["lr"], lstm_grid_history.history["loss"])
plt.axis([1e-8, 1e-4, 0, 30]);
Train model with optimal learning rate
# Build optimized model
tf.keras.backend.clear_session()
lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.Lambda(
        lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]
    ),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(32, return_sequences=True)
    ),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Lambda(lambda x: x * 100.0)
])
# Train model
lstm_model.compile(loss=tf.keras.losses.Huber(),
                   optimizer=tf.keras.optimizers.SGD(learning_rate=5e-5, momentum=0.9),
                   metrics=["mae"])
lstm_history = lstm_model.fit(dataset, epochs=200, verbose=0)
# Look at training loss vs epochs
loss = lstm_history.history['loss']
epochs = range(20, len(loss))
plt.plot(epochs, loss[20:], 'b', label='Training Loss');
# Predict on validation set
lstm_forecast = []
for t in range(valid_start - window_size, test_start - window_size):
    lstm_forecast.append(
        lstm_model.predict(series[t:t + window_size][np.newaxis])
    )
lstm_results = np.array(lstm_forecast)[:, 0, 0]
# Performance
mse_mae(x_valid, lstm_results)
MSE    10.41
MAE     2.50
dtype: float64
Evaluate on test set
# Predict on test set
lstm_test_forecast = []
for t in range(test_start - window_size, len(series) - window_size):
    lstm_test_forecast.append(
        lstm_model.predict(series[t:t + window_size][np.newaxis])
    )
lstm_test_results = np.array(lstm_test_forecast)[:, 0, 0]
# Plot results
plot_series(time_test, x_test)
plot_series(time_test, lstm_test_results)
# Performance on test set
lstm_perfs = mse_mae(x_test, lstm_test_results, name="LSTM")
lstm_perfs
MSE 9.96
MAE 2.37
Name: LSTM, dtype: float64
This last model is the best performer, with an MSE well below that of the naive forecast. Looking at the predictions on the graph is quite fascinating: some daily variations seem to be predicted like magic!
Models comparison
# Sort models by MSE
pd.DataFrame([
    naive_forecast_perfs,
    moving_avg_perfs,
    diff_moving_avg_plus_past_perfs,
    diff_moving_avg_plus_smooth_past_perfs,
    dnn_perfs,
    rnn_perfs,
    lstm_perfs
]).sort_values('MSE')
|                               | MSE   | MAE  |
|-------------------------------|-------|------|
| LSTM                          | 9.96  | 2.37 |
| DNN                           | 11.65 | 2.43 |
| Naive Forecast                | 12.01 | 2.61 |
| RNN                           | 16.01 | 2.99 |
| Moving average                | 23.92 | 3.73 |
| Smoothed diff. moving average | 29.26 | 4.37 |
| Differentiated moving average | 35.80 | 4.71 |
The naive forecast model, which simply took the previous day's value as the prediction for each day, was surprisingly hard to beat! Only the DNN and LSTM models achieved better performance.