Time Series Cross Validation#

import pandas as pd
import numpy as np

#suppress ARIMA warnings
import warnings
warnings.filterwarnings('ignore')

import sys
# if running in Google Colab install forecast-tools
if 'google.colab' in sys.modules:
    !pip install forecast-tools

Up until now we have used a single validation period to select our best model. The weakness of that approach is that it gives you a sample size of one (better than nothing, but statistically weak!). Time series cross validation is an approach that provides more data points when comparing models. In the classical time series literature, time series cross validation is called a Rolling Forecast Origin. There may also be a benefit to taking a sliding window approach to cross validation. This second approach maintains a fixed-size training set; that is, it drops older values from the time series during validation.

Rolling Forecast Origin#

The following code and output provide a simplified view of how a rolling forecast origin works in practice.

def rolling_forecast_origin(train, min_train_size, horizon):
    '''
    Rolling forecast origin generator.

    Yields a (split_train, split_val) pair on each iteration; the
    training split grows by one observation each time.
    '''
    for i in range(len(train) - min_train_size - horizon + 1):
        split_train = train[:min_train_size+i]
        split_val = train[min_train_size+i:min_train_size+i+horizon]
        yield split_train, split_val
full_series = [2502, 2414, 2800, 2143, 2708, 1900, 2333, 2222, 1234, 3456]

test = full_series[-2:]
train = full_series[:-2]
print('full training set: {0}'.format(train))
print('hidden test set: {0}'.format(test))
full training set: [2502, 2414, 2800, 2143, 2708, 1900, 2333, 2222]
hidden test set: [1234, 3456]
cv_rolling = rolling_forecast_origin(train, min_train_size=4, horizon=2)
cv_rolling
<generator object rolling_forecast_origin at 0x7fc04d75cc80>
i = 0
for cv_train, cv_val in cv_rolling:
    print(f'CV[{i+1}]')
    print(f'Train:\t{cv_train}')
    print(f'Val:\t{cv_val}')
    print('-----')
    i += 1
CV[1]
Train:	[2502, 2414, 2800, 2143]
Val:	[2708, 1900]
-----
CV[2]
Train:	[2502, 2414, 2800, 2143, 2708]
Val:	[1900, 2333]
-----
CV[3]
Train:	[2502, 2414, 2800, 2143, 2708, 1900]
Val:	[2333, 2222]
-----
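The number of folds yielded follows directly from the loop: len(train) - min_train_size - horizon + 1. A quick sanity check, reusing the generator defined above:

# with 8 observations, min_train_size=4 and horizon=2 we expect 8 - 4 - 2 + 1 = 3 folds
expected = len(train) - 4 - 2 + 1
actual = sum(1 for _ in rolling_forecast_origin(train, min_train_size=4, horizon=2))
print(expected, actual)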

Sliding Window Cross Validation#

def sliding_window(train, window_size, horizon, step=1):
    '''
    Sliding window generator.
    
    Parameters:
    --------
    train: array-like
        training data for time series method
    
    window_size: int
        lookback - the fixed number of observations to include in each
        training split.
    
    horizon: int
        forecast horizon
        
    step: int, optional (default=1)
        step=1 means the window slides forward by a single observation on
        each iteration.  Increase step to run fewer splits.
        
    Yields:
        array-like, array-like
    
        split_training, split_validation
    '''
    for i in range(0, len(train) - window_size - horizon + 1, step):
        split_train = train[i:window_size+i]
        split_val = train[i+window_size:window_size+i+horizon]
        yield split_train, split_val

The following code tests it with step=1 (the default).

cv_sliding = sliding_window(train, window_size=4, horizon=1)

print('full training set: {0}\n'.format(train))

i = 0
for cv_train, cv_val in cv_sliding:
    print(f'CV[{i+1}]')
    print(f'Train:\t{cv_train}')
    print(f'Val:\t{cv_val}')
    print('-----')
    i += 1
full training set: [2502, 2414, 2800, 2143, 2708, 1900, 2333, 2222]

CV[1]
Train:	[2502, 2414, 2800, 2143]
Val:	[2708]
-----
CV[2]
Train:	[2414, 2800, 2143, 2708]
Val:	[1900]
-----
CV[3]
Train:	[2800, 2143, 2708, 1900]
Val:	[2333]
-----
CV[4]
Train:	[2143, 2708, 1900, 2333]
Val:	[2222]
-----

The following code tests it with step=2. Note that you get fewer splits. This reduces the computational cost, at the price of fewer validation folds for comparing models. In many cases that trade-off is acceptable.

cv_sliding = sliding_window(train, window_size=4, horizon=1, step=2)

print('full training set: {0}\n'.format(train))

i = 0
for cv_train, cv_val in cv_sliding:
    print(f'CV[{i+1}]')
    print(f'Train:\t{cv_train}')
    print(f'Val:\t{cv_val}')
    print('-----')
    i += 1
full training set: [2502, 2414, 2800, 2143, 2708, 1900, 2333, 2222]

CV[1]
Train:	[2502, 2414, 2800, 2143]
Val:	[2708]
-----
CV[2]
Train:	[2800, 2143, 2708, 1900]
Val:	[2333]
-----
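The number of splits yielded by sliding_window can be worked out up front: it is ceil((len(train) - window_size - horizon + 1) / step). A minimal sketch, using a hypothetical helper n_sliding_splits:

import math

def n_sliding_splits(n_obs, window_size, horizon, step=1):
    '''Number of folds sliding_window will yield for a series of length n_obs.'''
    return math.ceil((n_obs - window_size - horizon + 1) / step)

# 4 folds with step=1 and 2 folds with step=2, matching the output above
print(n_sliding_splits(len(train), window_size=4, horizon=1, step=1))
print(n_sliding_splits(len(train), window_size=4, horizon=1, step=2))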

Parallel Cross Validation Example using Naive1#

from forecast_tools.baseline import SNaive, Naive1
from forecast_tools.datasets import load_emergency_dept
#optimised version of the functions above...
from forecast_tools.model_selection import (rolling_forecast_origin, 
                                            sliding_window,
                                            cross_validation_score)
from sklearn.metrics import mean_absolute_error
train = load_emergency_dept()
model = Naive1()
#%%timeit runs the code multiple times to get an estimate of runtime.
#comment it out to run the code only once.

Run on a single core

%%time
cv = sliding_window(train, window_size=14, horizon=7, step=1)
results_1 = cross_validation_score(model, train, cv, mean_absolute_error, 
                                   n_jobs=1)
CPU times: user 812 ms, sys: 0 ns, total: 812 ms
Wall time: 809 ms

Run across multiple cores by setting n_jobs=-1

%%time 
cv = sliding_window(train, window_size=14, horizon=7, step=1)
results_2 = cross_validation_score(model, train, cv, mean_absolute_error,
                                    n_jobs=-1)
CPU times: user 369 ms, sys: 71.5 ms, total: 440 ms
Wall time: 990 ms
results_1.shape
(324, 1)
results_2.shape
(324, 1)
print(results_1.mean(), results_1.std())
26.653439153439148 10.346957901088238

Just to illustrate that the results are identical - the only difference is the runtime.

print(results_2.mean(), results_2.std())
26.653439153439148 10.346957901088238
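The same workflow is how you would compare candidate models: score each one on identical folds and compare the distributions of the error metric. The sketch below pits Naive1 against SNaive (imported above); the seasonal period of 7 is an assumption for daily data, and each model gets its own generator because a generator is exhausted after a single pass.

for candidate in [Naive1(), SNaive(period=7)]:
    # recreate the generator for each model - it is exhausted after one pass
    cv = sliding_window(train, window_size=14, horizon=7, step=1)
    scores = cross_validation_score(candidate, train, cv, mean_absolute_error,
                                    n_jobs=-1)
    print(type(candidate).__name__, scores.mean(), scores.std())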

Cross validation with multiple forecast horizons#

horizons = [7, 14, 21]
cv = sliding_window(train, window_size=14, horizon=max(horizons), step=1)
#note that we now pass in the horizons list to cross_validation_score
results_h = cross_validation_score(model, train, cv, mean_absolute_error,
                                   horizons=horizons, n_jobs=-1)
#results are returned as numpy array - easy to cast to dataframe and display
pd.DataFrame(results_h, columns=['7days', '14days', '21days']).head()
       7days     14days     21days
0  21.142857  22.357143  23.190476
1  14.857143  18.285714  19.000000
2  17.857143  22.357143  25.047619
3  17.142857  18.571429  19.285714
4  23.142857  19.500000  20.142857
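It is then straightforward to summarise the error at each horizon, for example by averaging each column of the results:

# mean and standard deviation of the MAE at each forecast horizon
df_h = pd.DataFrame(results_h, columns=['7days', '14days', '21days'])
print(df_h.mean())
print(df_h.std())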

Cross validation example using ARIMA - does it speed up when CV is run in parallel?#

#use ARIMA from pmdarima as that has a similar interface to baseline models.
from pmdarima import ARIMA, auto_arima
#auto_model = auto_arima(train, suppress_warnings=True, n_jobs=-1, m=7)
#auto_model
#create arima model - reasonably complex model
#order=(1, 1, 2), seasonal_order=(2, 0, 2, 7)
args = {'order':(1, 1, 2), 'seasonal_order':(2, 0, 2, 7)}
model = ARIMA(order=args['order'], seasonal_order=args['seasonal_order'],
              enforce_stationarity=False, suppress_warnings=True)
%%time
cv = rolling_forecast_origin(train, min_train_size=320, horizon=7)
results_1 = cross_validation_score(model, train, cv, mean_absolute_error, 
                                   n_jobs=1)
CPU times: user 39.8 s, sys: 373 ms, total: 40.1 s
Wall time: 13.4 s

Note: %%time reports a single run; if you switch to %%timeit the cell will be executed multiple times to estimate the runtime.

You should see a big improvement in performance; mine went from 12.3 seconds to 2.4 seconds.

%%time
cv = rolling_forecast_origin(train, min_train_size=320, horizon=7)
results_2 = cross_validation_score(model, train, cv, mean_absolute_error, 
                                   n_jobs=-1)
CPU times: user 1.63 s, sys: 252 ms, total: 1.88 s
Wall time: 4.32 s
results_1.shape
(18, 1)
results_2.shape
(18, 1)
results_1.mean()
15.58663509219668
results_2.mean()
15.586662939018318