Machine Learning for Time Series Forecasting with Python. Francesca Lazzeri
as only one time step is to be predicted. In contrast to the one-step forecast are the multiple-step or multi-step time series forecasting problems, where the goal is to predict a sequence of values in a time series. Many time series problems involve the task of predicting a sequence of values using only the values observed in the past (Cheng et al. 2006). Examples of this task include predicting the time series for crop yield, stock prices, traffic volume, and electrical power consumption. There are at least four commonly used strategies for making multi-step forecasts (Brownlee 2017):Direct multi-step forecast: The direct method requires creating a separate model for each forecast time stamp. For example, in the case of predicting energy consumption for the next two hours, we would need to develop a model for forecasting energy consumption on the first hour and a separate model for forecasting energy consumption on the second hour.Recursive multi-step forecast: Multi-step-ahead forecasting can be handled recursively, where a single time series model is created to forecast next time stamp, and the following forecasts are then computed using previous forecasts. For example, in the case of forecasting energy consumption for the next two hours, we would need to develop a one-step forecasting model. This model would then be used to predict next hour energy consumption, then this prediction would be used as input in order to predict the energy consumption in the second hour.Direct-recursive hybrid multi-step forecast: The direct and recursive strategies can be combined to offer the benefits of both methods (Brownlee 2017). For example, a distinct model can be built for each future time stamp, however each model may leverage the forecasts made by models at prior time steps as input values. In the case of predicting energy consumption for the next two hours, two models can be built, and the output from the first model is used as an input for the second model.Multiple output forecast: The multiple output strategy requires developing one model that is capable of predicting the entire forecast sequence. For example, in the case of predicting energy consumption for the next two hours, we would develop one model and apply it to predict the next two hours in one single computation (Brownlee 2017).
Contiguous or noncontiguous time series values of your forecasting model – A time series that present a consistent temporal interval (for example, every five minutes, every two hours, or every quarter) between each other are defined as contiguous (Zuo et al. 2019). On the other hand, time series that are not uniform over time may be defined as noncontiguous: very often the reason behind noncontiguous timeseries may be missing or corrupt values. Before jumping to the methods of data imputation, it is important to understand the reason data goes missing. There are three most common reasons for this:Missing at random: Missing at random means that the propensity for a data point to be missing is not related to the missing data but it is related to some of the observed data.Missing completely at random: The fact that a certain value is missing has nothing to do with its hypothetical value and with the values of other variables.Missing not at random: Two possible reasons are that the missing value depends on the hypothetical value or the missing value is dependent on some other variable's value.In the first two cases, it is safe to remove the data with missing values depending upon their occurrences, while in the third case removing observations with missing values can produce a bias in the model. There are different solutions for data imputation depending on the kind of problem you are trying to solve, and it is difficult to provide a general solution. Moreover, since it has temporal property, only some of the statistical methodologies are appropriate for time series data.I have identified some of the most commonly used methods and listed them as a structural guide in Figure 1.7.Figure 1.7: Handling missing dataAs you can observe from the graph in Figure 1.7, listwise deletion removes all data for an observation that has one or more missing values. Particularly if the missing data is limited to a small number of observations, you may just opt to eliminate those cases from the analysis. However, in most cases it is disadvantageous to use listwise deletion. This is because the assumptions of the missing completely at random method are typically rare to support. As a result, listwise deletion methods produce biased parameters and estimates.Pairwise deletion analyses all cases in which the variables of interest are present and thus maximizes all data available by an analysis basis. A strength to this technique is that it increases power in your analysis, but it has many disadvantages. It assumes that the missing data is missing completely at random. If you delete pairwise, then you'll end up with different numbers of observations contributing to different parts of your model, which can make interpretation difficult.Deleting columns is another option, but it is always better to keep data than to discard it. Sometimes you can drop variables if the data is missing for more than 60 percent of the observations but only if that variable is insignificant. Having said that, imputation is always a preferred choice over dropping variables.
Regarding time series specific methods, there are a few options:Linear interpolation: This method works well for a time series with some trend but is not suitable for seasonal data.Seasonal adjustment and linear interpolation: This method works well for data with both trend and seasonality.Mean, median, and mode: Computing the overall mean, median, or mode is a very basic imputation method; it is the only tested function that takes no advantage of the time series characteristics or relationship between the variables. It is very fast but has clear disadvantages. One disadvantage is that mean imputation reduces variance in the data set.
In the next section of this chapter, we will discuss how to shape time series as a supervised learning problem and, as a consequence, get access to a large portfolio of linear and nonlinear machine learning algorithms.
Supervised Learning for Time Series Forecasting
Machine learning is a subset of artificial intelligence that uses techniques (such as deep learning) that enable machines to use experience to improve at tasks (aka.ms/deeplearningVSmachinelearning
). The learning process is based on the following steps:
1 Feed data into an algorithm. (In this step you can provide additional information to the model, for example, by performing feature extraction.)
2 Use this data to train a model.
3 Test and deploy the model.
4 Consume the deployed model to do an automated predictive task. In other words, call and use the deployed model to receive the predictions returned by the model (aka.ms/deeplearningVSmachinelearning).
Machine learning is a way to achieve artificial intelligence. By using machine learning and deep learning techniques, data scientists can build computer systems and applications that do tasks that are commonly associated with human intelligence. These tasks include time series forecasting, image recognition, speech recognition, and language translation (aka.ms/deeplearningVSmachinelearning
).
There are three main classes of machine learning: supervised learning, unsupervised learning, and reinforcement learning. In the following few paragraphs, we will have a closer look at each of these machine learning classes:
Supervised learning is a type of machine learning system in which both input (which is the collection of values for the variables in your data set) and desired output (the predicted values for the target variable) are provided. Data is identified and labeled a priori to provide the algorithm a learning memory for future data handling. An example of a numerical label is the sale price associated with a used car (aka.ms/MLAlgorithmCS). The goal of supervised learning is to study many labeled examples like these and then to be able to make predictions about future data points, like, for example, assigning accurate sale prices to other used cars that have similar characteristics to the one used during the labeling process. It is called supervised learning because data scientists supervise the process of an algorithm learning from the training data set (www.aka.ms/MLAlgorithmCS): they know the correct answers and they iteratively share them with the algorithm during the learning process. There are several specific types of supervised learning. Two of the most common are classification and regression:Classification: Classification