Applied Data Mining for Forecasting Using SAS. Tim Rey
discuss the model support issue in advance during the model definition phase. In the best-case scenario, the project sponsor signs a service contract for a specified period of time. The users must understand that, because of continuous changes in the economic environment, forecasting models deteriorate over time and professional service is needed to maintain high-quality forecasts. A short description of the corresponding substeps and deliverables is given below.
Statistical baseline definition
A necessary precondition for performance assessment is a defined statistical baseline. The accepted baseline is the naïve forecast, which assumes that the current observation can be used as the forecast for future periods. It is also very important to explain the meaning of a forecast to the final user, because inexperienced users tend to look only at the predicted number at the end of the forecast horizon and treat it as the sole performance metric. A forecast is defined as the combination of (1) predictions, (2) prediction standard errors, and (3) confidence limits at each time sample in the forecast horizon (Makridakis et al. 1998). The performance metric can be based on the difference between the forecast of the selected model and the accepted benchmark (the naïve forecast).
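The baseline comparison can be illustrated with a minimal Python sketch. It is not taken from the book and does not use any SAS procedure. The choice of mean absolute error (MAE) as the error measure, the function names, and the sample numbers are all assumptions made for illustration; the text only states that the metric is based on the difference between the model forecast and the naïve forecast.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Repeat the last observed value across the whole forecast horizon."""
    return [history[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical data: past observations, later actuals, and a model forecast.
history = [100, 104, 103, 108]
actual = [110, 112, 115]
model_pred = [109, 113, 114]

naive_pred = naive_forecast(history, len(actual))
model_mae = mean_absolute_error(actual, model_pred)
naive_mae = mean_absolute_error(actual, naive_pred)

# Assumed metric: how much error the model removes relative to the baseline.
# A positive value means the model beats the naive forecast.
mae_gain = naive_mae - model_mae
```

A gain close to zero or below zero would suggest that the selected model adds little over the naïve forecast on this sample.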
Performance tracking
Performance monitoring is usually scheduled on a regular basis, after every new data update. The tracking process includes two key evaluations: (1) data consistency checks and (2) forecast performance metric evaluation. The data consistency check verifies that the new data sample does not differ from the most recent data by more than a defined threshold. The forecast performance check compares the forecast of the selected model with the naïve forecast. Based on these two checks, a set of decision rules is defined for appropriate corrective actions. The potential corrective actions are either re-estimating the model parameters while keeping the existing structure, or completely redesigning the model and identifying a new forecast model structure.
It is also critically important to track the business impact of the forecast decisions on KPIs. One possible solution is to use business intelligence portals and dashboards (Chase 2009).
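One way such decision rules might look is sketched below in Python. This is an illustration under stated assumptions, not the procedure used in the book: the threshold values, the use of the mean of recent observations as the reference level, the MAE inputs, and the "investigate data" action are all hypothetical.

```python
def data_is_consistent(new_value, recent_values, max_relative_change=0.5):
    """Check that a new observation stays within a relative threshold
    of the mean of the most recent observations (assumed rule)."""
    reference = sum(recent_values) / len(recent_values)
    if reference == 0:
        return new_value == 0
    return abs(new_value - reference) / abs(reference) <= max_relative_change


def corrective_action(consistent, model_mae, naive_mae, tolerance=0.9):
    """Map the two tracking checks to an action (hypothetical rules)."""
    if not consistent:
        return "investigate data"        # assumed extra step, not in the text
    if model_mae <= tolerance * naive_mae:
        return "keep model"              # model clearly beats the baseline
    if model_mae <= naive_mae:
        return "re-estimate parameters"  # same structure, refreshed fit
    return "redesign model"              # model is worse than the baseline


# Hypothetical monitoring run after a data update.
consistent = data_is_consistent(120, [100, 110, 90])
action = corrective_action(consistent, model_mae=3.8, naive_mae=4.0)
```

In this run the new observation passes the consistency check, and the model error is only slightly below the naïve error, so the sketch recommends re-estimating the parameters rather than redesigning the model.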
Forecasting model maintenance deliverables
The key deliverable in this final block of the work process is a performance status report. It includes the corresponding tables and trend charts to track the discussed metrics as well as the action items if corrective actions are taken.
2.3 Work Process with SAS Tools
The objective of this section is to specify how the proposed generic work process can be implemented with the wide range of software tools developed by SAS. A generic overview of the key SAS software tools related to data mining and forecasting is shown in Figure 2.3.
The SAS tools are divided into two categories depending on the requirements for programming knowledge: (1) tools that require programming skills and (2) tools that are based on functional block schemes and do not require programming skills. The first category consists of the software kernel of all SAS products, Base SAS, with its set of operators and functions, as well as specific toolboxes of specialized functions in selected areas. Examples of such toolboxes related to data mining and forecasting are SAS/ETS (includes the key functions for time series analysis), SAS/STAT (includes procedures for a wide range of statistical methodologies), SAS/GRAPH (allows creating various high-resolution color plots and charts), SAS/IML (enables programming of new methods based on the powerful Interactive Matrix Language, IML), and SAS High-Performance Forecasting (includes a set of procedures for high-performance forecasting).
The second category of SAS tools, based on functional block schemes and shown in Figure 2.3, includes three main products: SAS Enterprise Guide, SAS Enterprise Miner, and SAS Forecast Server. SAS Enterprise Guide allows high-efficiency data preprocessing and development, basic statistical analysis, and forecasting by linking functional blocks. SAS Enterprise Miner is the main tool for developing data mining models based on built-in functional blocks, and SAS Forecast Server is a highly productive forecasting environment with a very high level of automation. Business clients can interact with all model development tools via the SAS Add-In for Microsoft Office.
Figure 2.3: SAS software tools related to data mining in forecasting
SAS also offers JMP, another product with statistical, data mining, and forecasting capabilities. However, because its functionality is similar to that of SAS Enterprise Guide and SAS Enterprise Miner, it is not discussed in this book. For readers interested in the forecasting capabilities of JMP, a good starting point is JMP Start Statistics: A Guide to Statistics and Data Analysis Using JMP (Sall, Creighton, and Lehman 2009).
2.3.1 Data Preparation Steps with SAS Tools
The wide range of SAS tools gives the developer many options to effectively implement all of the data preparation steps. Good examples at the Base SAS level are the DATA step for generic data collection and PROC SQL for writing specific data extracts.6 The specific functions or built-in functional blocks for data preparation in the SAS tools that are related to data mining and forecasting are discussed briefly below.
Data preparation using SAS/ETS
The key SAS/ETS procedures for data preparation are as follows:
DATASOURCE provides seamless access to time series data from commercial and governmental data vendors, such as Haver Analytics, Standard & Poor's Compustat Service, the U.S. Bureau of Labor Statistics, and so on. It enables you to select the time series with specific frequency over a selected time range across sections of the data.
EXPAND provides different types of time interval conversions, such as converting irregular observations into a periodic format or constructing quarterly estimates from annual data. Another important capability of this procedure is interpolating missing values in time series by the following methods: cubic splines, linear splines, step functions, and simple aggregation.
TIMESERIES has the ability to process large amounts of time-stamped data. It accumulates transactional data to time series and performs correlation, trend, and seasonal analysis on the accumulated time series. It also delivers descriptive statistics for the corresponding time series data.
X11 and X12 both provide seasonal adjustment of time series by decomposing monthly or quarterly data into trend, seasonal, and irregular components. The procedures are based on slightly different methods that were developed by the U.S. Census Bureau as the result of years of work by census researchers. X12 includes additional diagnostic tests to be run after the decomposition and the ability to remove the effect of input variables before the decomposition.7
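Two of the ideas in this list, accumulating transactional data into a time series and interpolating missing values, can be shown conceptually with a short Python sketch. It is a language-neutral illustration only: it does not call or reproduce the SAS/ETS procedures, and the function names and sample transactions are hypothetical. Treating a month without transactions as missing (rather than as zero) is also an assumption made for this example.

```python
from collections import defaultdict
from datetime import date


def accumulate_monthly(transactions):
    """Sum (date, amount) transactions into {(year, month): total}."""
    totals = defaultdict(float)
    for day, amount in transactions:
        totals[(day.year, day.month)] += amount
    return dict(totals)


def month_range(start, end):
    """List every (year, month) from start to end, inclusive."""
    months = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def interpolate_linear(values):
    """Fill interior None gaps linearly between the nearest known values."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Hypothetical transactions for one product; March has no records.
transactions = [
    (date(2023, 1, 5), 10.0),
    (date(2023, 1, 20), 5.0),
    (date(2023, 2, 11), 20.0),
    (date(2023, 4, 2), 30.0),
]

totals = accumulate_monthly(transactions)
months = month_range(min(totals), max(totals))
series = [totals.get(m) for m in months]   # the March entry is None
filled = interpolate_linear(series)
```

Here the January transactions are summed into one monthly value, and the empty March slot is filled halfway between the February and April totals. Whether an empty period should be interpolated or set to zero depends on the business meaning of the data.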
Data preparation using SAS Enterprise Guide
SAS Enterprise Guide has built-in functional blocks that enable you to automate many data manipulation procedures (such as filtering, sorting, transposing, ranking, and comparing) without writing programming code. The two functional blocks for time series data preparation are Create Time Series Data and Prepare Time Series Data. Each block is a functional user interface to SAS/ETS procedures. Create Time Series Data is the user interface to TIMESERIES and Prepare Time Series Data is the corresponding user interface to EXPAND.
The advantage of using functional block flows to implement different steps of the proposed work process is demonstrated with a simple example in Figure 2.4. The SAS Enterprise Guide flow shows the process of developing ARIMA forecasting models from the transactional data of 42 products. The transactional data for the 42 products are transformed into monthly time series by the Create Time Series Data block, and the forecasting models are generated by the ARIMA Modeling functional block. The results, with the corresponding graphical plots, are summarized and output to a Word document.
Figure 2.4: An example of SAS Enterprise Guide flow for time series data preparation and modeling