Automatically selecting a naive model to use as a benchmark#

forecast-tools provides an auto_naive function that uses point-forecast cross-validation to select the ‘best’ naive model to use as a benchmark.

The function tests all of the naive forecasting methods provided in forecast_tools.baseline.

This notebook covers how to use auto_naive and also how to troubleshoot its use if there are conflicts between parameters.

Imports#

import sys

# if running in Google Colab install forecast-tools
if 'google.colab' in sys.modules:
    !pip install forecast-tools
import numpy as np
from forecast_tools.datasets import load_emergency_dept
from forecast_tools.model_selection import auto_naive
help(auto_naive)
Help on function auto_naive in module forecast_tools.model_selection:

auto_naive(y_train, horizon=1, seasonal_period=1, min_train_size='auto', method='cv', step=1, window_size='auto', metric='mae')
    Automatic selection of the 'best' naive benchmark on a 'single' series
    
    The selection process uses out-of-sample cv performance.
    
    By default auto_naive uses cross validation to estimate the mean
    point forecast performance of all naive methods.  It selects the method
    with the lowest point forecast metric on average.
    
    If there is limited data for training, a basic holdout sample could be
    used.
    
    Dev note: the plan is to update this to work with multiple series.
    It would be best to use MASE for multiple series comparison.
    
    Parameters:
    ----------
    y_train: array-like
        training data.  Typically in a pandas.Series, pandas.DataFrame
        or numpy.ndarray format.
    
    horizon: int, optional (default=1)
        Forecast horizon.
    
    seasonal_period: int, optional (default=1)
        Frequency of the data.  E.g. 7 for weekly pattern, 12 for monthly,
        365 for daily.
    
    min_train_size: int or str, optional (default='auto')
        The size of the initial training set (if method=='ro' or 'sw').
        If 'auto' then min_train_size is set to len(y_train) // 3
        If min_train_size='auto' and method='holdout' then
        min_train_size = len(y_train) - horizon.
    
    method: str, optional (default='cv')
        out of sample selection method.
        'ro' - rolling forecast origin
        'sw' - sliding window
        'cv' - scores from both ro and sw
        'holdout' - single train/test split
        Methods 'ro' and 'sw' are similar, however, 'sw' has a fixed
        window_size and drops older data from training.
    
    step: int, optional (default=1)
        The stride/step of the cross-validation. I.e. the number
        of observations to move forward between folds.
    
    window_size: str or int, optional (default='auto')
        The window_size if using sliding window cross validation
        When 'auto' and method='sw' then
        window_size=len(y_train) // 3
    
    metric: str, optional (default='mae')
        The metric to measure out of sample accuracy.
        Options: mase, mae, mape, smape, mse, rmse, me.
    
    Returns:
    --------
    dict
        'model': baseline.Forecast
        f'{metric}': float
    
        Contains the model and its CV performance.
    
    Raises:
    -------
    ValueError
        For invalid method, metric, window_size parameters
    
    See Also:
    --------
    forecast_tools.baseline.Naive1
    forecast_tools.baseline.SNaive
    forecast_tools.baseline.Drift
    forecast_tools.baseline.Average
    forecast_tools.baseline.EnsembleNaive
    forecast_tools.baseline.baseline_estimators
    forecast_tools.model_selection.rolling_forecast_origin
    forecast_tools.model_selection.sliding_window
    forecast_tools.model_selection.mase_cross_validation_score
    forecast_tools.metrics.mean_absolute_scaled_error
    
    Examples:
    ---------
    Measuring MAE and taking the best method using both
    rolling origin and sliding window cross validation
    of a 56 day forecast.
    
    >>> from forecast_tools.datasets import load_emergency_dept
    >>> y_train = load_emergency_dept()
    >>> best = auto_naive(y_train, seasonal_period=7, horizon=56)
    >>> best
    {'model': Average(), 'mae': 19.63791579700355}
    
    
    Take a step of 7 days between cv folds.
    
    >>> from forecast_tools.datasets import load_emergency_dept
    >>> y_train = load_emergency_dept()
    >>> best = auto_naive(y_train, seasonal_period=7, horizon=56,
    ...                   step=7)
    >>> best
    {'model': Average(), 'mae': 19.675635558539383}

Load the training data#

y_train = load_emergency_dept()

Select the best naive model for an h-step horizon of 7 days.#

Let’s select a method for the emergency department daily level data to predict 7 days ahead. By default the function uses the mean absolute error (MAE) to evaluate forecast accuracy.

best = auto_naive(y_train, horizon=7, seasonal_period=7)
best
{'model': Average(), 'mae': 19.679856211931035}
y_preds = best['model'].fit_predict(y_train, horizon=7)
y_preds
array([221.06395349, 221.06395349, 221.06395349, 221.06395349,
       221.06395349, 221.06395349, 221.06395349])
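
If it helps with reporting or plotting, the point forecasts can be wrapped in a pandas Series with a daily DatetimeIndex. A minimal sketch, assuming pandas is installed; the start date below is purely illustrative and in practice would be the day after the last training observation.

import pandas as pd

# illustrative start date only - replace with the day after your last
# training observation
forecast_idx = pd.date_range(start='2017-07-01', periods=len(y_preds), freq='D')
pd.Series(y_preds, index=forecast_idx, name='forecast')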

Using a different forecasting error metric#

best = auto_naive(y_train, horizon=7, seasonal_period=7, metric='mape')
best
{'model': Average(), 'mape': 8.69955926909263}
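
The winning method can depend on the metric chosen, so it can be worth comparing the selection across several of the supported metrics listed in the docstring. A short sketch:

# compare the selected benchmark across several supported metrics
for metric in ['mae', 'mape', 'smape', 'rmse']:
    result = auto_naive(y_train, horizon=7, seasonal_period=7, metric=metric)
    print(metric, result)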

Using a single train-test split when data are limited.#

If your forecast horizon means that h-step cross-validation is infeasible, then you can automatically select a benchmark using a single holdout sample.

best = auto_naive(y_train, horizon=7, seasonal_period=7, method='holdout')
best
{'model': Average(), 'mae': 30.182280627384486}
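
Per the docstring, when method='holdout' and min_train_size='auto' the single split is taken at len(y_train) - horizon. The sketch below reproduces that split manually, purely to illustrate which observations are held back for testing.

# manual illustration of the implied holdout split
horizon = 7
split_point = len(y_train) - horizon
holdout_train, holdout_test = y_train[:split_point], y_train[split_point:]
print(len(holdout_train), len(holdout_test))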

Troubleshooting the use of auto_naive#

Problem 1: Training data is shorter than min_train_size + horizon

For any validation to take place, including a simple holdout, the time series must be long enough to allow at least one train-test split. This can be a problem when seasonal_period is set to a length similar to the length of the time series.
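
A quick way to pre-empt the error is to check the rule directly before calling auto_naive. The helper below is hypothetical (it is not part of forecast-tools) and simply encodes the constraint described above.

# hypothetical helper - not part of forecast-tools
def can_validate(n_obs, min_train_size, horizon):
    """True if at least one train-test split is possible."""
    return n_obs >= min_train_size + horizon

print(can_validate(n_obs=365, min_train_size=365, horizon=7))      # False
print(can_validate(n_obs=365 + 7, min_train_size=365, horizon=7))  # True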

# generate a synthetic daily time series of exactly one year in length.
y_train = np.random.randint(100, 250, size=365)

Let’s set seasonal period to seasonal_period=365 (the length of the time series) and horizon=7.

We will also manually set min_train_size=365.

This will generate a ValueError reporting that the training data is shorter than min_train_size + horizon and that no validation can be performed.

best = auto_naive(y_train, horizon=7, seasonal_period=365, method='ro', 
                  min_train_size=365, metric='mae')

best
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-d9dc4e172979> in <module>
----> 1 best = auto_naive(y_train, horizon=7, seasonal_period=365, method='ro', 
      2                   min_train_size=365, metric='mae')
      3 
      4 best

~/opt/anaconda3/envs/forecast_dev/lib/python3.8/site-packages/forecast_tools/model_selection.py in auto_naive(y_train, horizon, seasonal_period, min_train_size, method, step, window_size, metric)
    432         msg = f"The training data is shorter than {min_train_size=} + {horizon=}" \
    433             + " No validation can be performed. "
--> 434         raise ValueError(msg)
    435     elif min_train_size < seasonal_period and (method == 'cv' or method == 'ro'):
    436         msg = "Seasonal period is longer than the minimum training size for" \

ValueError: The training data is shorter than min_train_size=365 + horizon=7 No validation can be performed. 

A longer time series or a shorter seasonal period will fix this problem.

# a longer synthetic time series.
y_train = np.random.randint(100, 250, size=365+7)
best = auto_naive(y_train, horizon=7, seasonal_period=365, method='ro', 
                  min_train_size=365, metric='mae')

best
{'model': Average(), 'mae': 43.29549902152642}
# a shorter seasonal period and minimum training size
y_train = np.random.randint(100, 250, size=365)
best = auto_naive(y_train, horizon=7, seasonal_period=7, method='ro', 
                  min_train_size=7, metric='mae')

best
{'model': Average(), 'mae': 37.50786553941686}