Using Facebook’s Prophet library to forecast the FTSE 100 index.
By Dan Lantos
This article (part of a short series) aims to introduce the Prophet library, discuss it at a high level and run through a basic example of forecasting the FTSE 100 index. Future articles will discuss exactly how Prophet achieves its results, how to interpret the output and how to improve the model.
Please see this article (by my talented colleague Gavita) for an introduction to time-series forecasting algorithms.
What is Facebook Prophet?
Prophet is an open-source time-series forecasting library developed by Facebook’s Core Data Science team.
The standard (and simplest) implementation uses a univariate model, where only one variable, time, is used to forecast results.
The forecast is achieved with an additive model of the form:

y(t) = g(t) + s(t) + h(t) + ε(t)

where:
y(t) is the target variable, the value that is being predicted
g(t) is the trend term, one of two models — “nonlinear, saturating growth” or “linear trend with changepoints”.
s(t) is the seasonality term, and will vary depending upon the periodicity of the data (daily, weekly and yearly seasonalities).
h(t) is the holidays term; Prophet allows for custom holidays (and windows either side of them) that may impact the model.
ε(t) is the error term, assumed to be a normally distributed random variable.
Don’t worry too much about these terms for now; they will be covered in more depth in future articles, but a high-level understanding of them is always helpful.
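To make the additive structure concrete, here is a minimal synthetic sketch (plain NumPy, not Prophet itself) that builds a series from a linear trend, a yearly seasonality and normally distributed noise:

import numpy as np
import pandas as pd

# Three years of daily timestamps.
ds = pd.date_range( start = '2016-01-01', periods = 3 * 365, freq = 'D' )
t = np.arange( len( ds ) )

g = 7000 + 0.5 * t                                      # g(t): linear trend
s = 50 * np.sin( 2 * np.pi * t / 365.25 )               # s(t): yearly seasonality
eps = np.random.normal( scale = 20, size = len( t ) )   # ε(t): normally distributed noise

# The observed series is simply the sum of its components.
df = pd.DataFrame( { 'ds': ds, 'y': g + s + eps } )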
How to use Facebook Prophet?
One of the big benefits of Prophet is the minimal setup. All we require to use Prophet is a pandas dataframe with 2 columns: "ds", our datestamp, and "y", our target variable.
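Before diving into real data, here is a minimal end-to-end sketch of the API (the dataframe is random noise, purely to show the shape Prophet expects):

from prophet import Prophet
import pandas as pd
import numpy as np

# Any dataframe with a 'ds' date column and a 'y' value column will do.
df = pd.DataFrame( {
    'ds': pd.date_range( start = '2016-01-01', periods = 100, freq = 'D' ),
    'y': np.random.normal( loc = 7000, scale = 50, size = 100 )
} )

# Fit a default model and forecast 30 days beyond the history.
model = Prophet()
model.fit( df )
future = model.make_future_dataframe( periods = 30 )
forecast = model.predict( future )

# yhat is the prediction, with yhat_lower/yhat_upper as its uncertainty interval.
print( forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail() )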
Below is a code block sorting out the initial config, fetching the FTSE 100 ticker information from yfinance and plotting the dataset.
# Installation of packages.
pip install prophet
pip install yfinance

# Pandas for pandas dataframe operations.
import pandas as pd
# NumPy package for numeric operations.
import numpy as np
# Prophet package used for forecasting.
from prophet import Prophet
# Prophet diagnostics for rolling cross-validation.
from prophet.diagnostics import cross_validation
from prophet.diagnostics import performance_metrics
from prophet.plot import plot_cross_validation_metric
# yfinance package used to import dataset.
import yfinance as yf
# Datetime package used for date functions.
from datetime import datetime, timedelta
# Matplotlib package used for altering default plots.
import matplotlib.pyplot as plt
import matplotlib.lines as lines
# Plotly package used for candlestick charts.
import plotly.graph_objects as go
# Logging package used to remove logging output.
import logging

# Command to remove logging messages from Prophet calls.
logging.getLogger("py4j").setLevel(logging.ERROR)

# Select up to yesterday's close - as "close" today is the current price if the market hasn't closed.
today = datetime.now() - timedelta( days = 1 )

# Pull information for the FTSE 100 index "ticker".
ftse = yf.Ticker( "^FTSE" )
ftse_df = ftse.history( start = '2016-01-01', end = today )

# Inspect the dataframe.
ftse_df.head()

# Reset the index to use it as our x variable.
ftse_df.reset_index( inplace = True )

# Create a candlestick chart of the dataset using plotly.graph_objects.
candlestick = go.Figure( data = [ go.Candlestick(
    x = ftse_df['Date'],
    open = ftse_df['Open'],
    high = ftse_df['High'],
    low = ftse_df['Low'],
    close = ftse_df['Close'] ) ] )

# Show the figure.
candlestick.update_xaxes( title_text = 'Date' )
candlestick.update_yaxes( title_text = 'FTSE 100 Index' )
candlestick.show()
Next we perform a tiny bit of data wrangling to squish the dataframe into the shape Prophet is expecting.
# Select only the 2 columns we want, and rename them appropriately to be passed to Facebook Prophet.
ftse_prophet = ftse_df[['Date', 'Close']].rename( columns = { 'Date': 'ds', 'Close': 'y' } )

# Inspect the new dataframe.
ftse_prophet.head()
We now have the required dataframe to apply our Prophet model: super simple to set up!
Next we will define our model and fit it to our dataset. For the purpose of this article, we will leave EVERY hyperparameter (the parameters that control how the model is trained) at its default value, to showcase the "out-of-the-box" solution.
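For contrast, tuning is done by passing arguments to the Prophet constructor. The parameters below are real Prophet hyperparameters, and the values shown are their defaults, given here purely as illustration:

model = Prophet(
    changepoint_prior_scale = 0.05,    # flexibility of the trend at changepoints
    seasonality_prior_scale = 10.0,    # strength of the seasonality components
    holidays_prior_scale = 10.0,       # strength of the holiday effects
    seasonality_mode = 'additive',     # or 'multiplicative'
    yearly_seasonality = 'auto',       # let Prophet decide whether to fit each seasonality
    weekly_seasonality = 'auto',
    daily_seasonality = 'auto'
)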
# Specify a cutoff day range.
days = 60

# Create a cutoff date using the days value.
today = datetime.now()
cutoff_date = today - timedelta( days )

# Use cutoff_date to split our dataset into history and actuals, which we will use to validate the model.
history_df = ftse_prophet[ ftse_prophet['ds'] <= cutoff_date ]
actuals_df = ftse_prophet[ ftse_prophet['ds'] > cutoff_date ]

# Define the model - passing no hyperparameters results in a default model being created.
# This is bad practice in reality, but great for showing how simple Prophet is "out of the box".
model = Prophet()

# Fit the model to our history dataset.
model.fit( history_df )

# Create a future dataframe using Prophet's function make_future_dataframe.
# Remove any non-trading (or similar) days not found in the base dataframe.
future_df = model.make_future_dataframe( periods = days, freq = 'd', include_history = True )
future_df = future_df[ future_df['ds'].isin( ftse_prophet['ds'] ) ]

# Use the model to predict values for our test dataset.
forecast_df = model.predict( future_df )

# Plot the predictions, and overlay our actuals as red points.
fig = model.plot( forecast_df )
ax = fig.gca()
ax.plot( actuals_df['ds'], actuals_df['y'], 'r.' )
ax.set_xlim( [ datetime( 2019, 1, 1 ), today ] )
How well did Prophet do?
The above plot shows the actual (historical) data points in black, the actual ("future") data points in red, the Prophet forecast as the blue line, and the uncertainty interval around the forecast in light blue.
As you can observe, the model does a generally good job of fitting the historical data, bar a few outliers, and has incredibly good predictive accuracy when plotted against the “future” actuals, at least by eye.
But how successful was the model? We need some metrics to assess performance; MAPE (Mean Absolute Percentage Error) is used here, as it provides a user-friendly error metric in percentage terms.
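For n actual values yᵢ and forecasts ŷᵢ, it is calculated as:

MAPE = (1 / n) × Σ | (yᵢ − ŷᵢ) / yᵢ |

which is then multiplied by 100 to express it as a percentage, exactly as the code below does.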
# Define a function to calculate the Mean Absolute Percentage Error (MAPE) - user friendly error metric.
def mape( actuals, forecast ):
    # Use .values so the calculation doesn't depend on the two series sharing an index.
    return np.mean( np.abs( ( actuals.values - forecast.values ) / actuals.values ) )

# Ensure our dataframes have only corresponding entries.
forecast = forecast_df[ forecast_df['ds'].isin( actuals_df['ds'] ) ]
actuals = actuals_df[ actuals_df['ds'].isin( forecast['ds'] ) ]

# Use our MAPE function to evaluate the success of our 60 day forecast.
forecast_mape = round( 100 * mape( actuals['y'], forecast['yhat'] ), 2 )
print( f'Forecast MAPE: {forecast_mape}%' )
The results show an MAPE of 1.06% — an amazingly low figure for such an unrefined model!
Unreasonably high accuracy is normally a cause for concern, so Prophet’s cross validation tools are used here to investigate further.
These functions allow the creation of “simulated historical forecasts” where we validate our results on subsets of the training data.
This is achieved by truncating the training dataset at each cutoff point, training the model, predicting over a horizon, validating the results against the actuals, and repeating over consecutive intervals.
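As a rough illustration of how those cutoff dates fall, here is a sketch that mirrors (rather than calls) Prophet's cutoff generation, using the initial/period/horizon values we are about to pass; the end date is an assumption, purely for illustration:

from datetime import datetime, timedelta

initial = timedelta( days = 1095 )   # 3 years of initial training data
period  = timedelta( days = 90 )     # spacing between cutoffs
horizon = timedelta( days = 180 )    # forecast length from each cutoff

start = datetime( 2016, 1, 1 )       # first date in our dataset
end   = datetime( 2021, 6, 1 )       # assumed last date, for illustration only

# Work backwards from the last usable cutoff, keeping every cutoff that
# still leaves at least 'initial' days of training data before it.
cutoff = end - horizon
cutoffs = []
while cutoff >= start + initial:
    cutoffs.append( cutoff )
    cutoff -= period
cutoffs.reverse()

for c in cutoffs:
    print( f'train to {c.date()}, forecast to {(c + horizon).date()}' )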
The code below uses this cross-validation approach, starting with 3 years of initial training data (initial = '1095 days'), forecasting 180 days ahead (horizon = '180 days') and repeating the process every 90 days (period = '90 days').
# Apply cross-validation on our model. This creates a forecast, for 180 days ahead, every 90 days, with 3 years of initial training data.
# As we have 5.5 years of data, this results in 8 forecasts (1 every quarter of a year, starting from 3 years -> 5 years).
crossv_df = cross_validation( model, initial = '1095 days', period = '90 days', horizon = '180 days' )
perf_df = performance_metrics( crossv_df )

# Use Prophet's plot_cross_validation_metric to visualise the MAPE as the horizon increases.
fig = plot_cross_validation_metric( crossv_df, metric = 'mape', color = 'red' )

# Evaluate the mean MAPE for our forecasts.
crossv_mape = round( 100 * perf_df['mape'].mean(), 2 )
print( f'Cross validation MAPE: {crossv_mape}%' )
The plot above shows a clear trend of increasing MAPE as the horizon length increases, which matches intuition: the further ahead we forecast, the less accurate we expect to be.
The cross-validation MAPE came out at 11.66% across this ensemble of forecasts. It's interesting to note that the 1.06% MAPE of our FTSE 100 forecast was outrageously low compared with the "typical" model on that horizon, so it's a good job we didn't get too excited about the model's performance!
Summary
In this article, we looked at Prophet from a very high level and both implemented and evaluated a simplistic model.
The next article in this series will take a deeper look at hyperparameter tuning, get "under the hood" of the model, and formulate exactly how these forecasts are created.