Applied Data Mining for Forecasting Using SAS. Tim Rey
methods for effective forecasting.
Six Sigma users – Six Sigma is a work process for developing high-quality processes and solutions in industry, and it has been accepted as a standard by the majority of global corporations. Its users are estimated at tens of thousands of project leaders, called black belts, and hundreds of thousands of technical experts, called green belts. They usually rely on classical statistics in their projects. Data mining for forecasting is a natural extension of Six Sigma for solving complex problems, and both black belts and green belts can take advantage of it.
Academics – This group includes the many academics in either field (data mining or forecasting) who are not familiar with the research and technical details of the other field. They can use the book to broaden their area of expertise and to understand the specific requirements for successful practical applications as defined by industrial experts.
Students – Undergraduate and graduate students in technical, economic, and even social disciplines can benefit from the book by understanding the advantages of using data mining in forecasting and its potential for implementation in their specific fields. In addition, the book will help students learn about the practical aspects of forecasting and data mining and the issues faced in real-world applications.
How This Book Is Structured
The first four chapters of the book focus on the main topic of applying data mining for industrial forecasting. Chapter 1 clarifies the business forces that drive the use of data mining for forecasting, while Chapter 2 presents a work process, akin to Six Sigma methodologies, that helps to integrate the proposed approach into corporate culture. Chapter 3 describes the critical efforts of building the hardware, software, and organizational infrastructures that are needed for the successful application of business forecasting. Chapter 4 gives a systematic view of the key technical and nontechnical application issues as well as a complete checklist for applying data mining for forecasting.
The next three chapters present the necessary processes and methods of data mining as they relate to forecasting. Chapter 5 focuses on data collection, while Chapter 6 identifies the main data preprocessing steps and emphasizes their critical role in high-quality forecasting. Chapter 7 defines, from a practical perspective, the key data mining methods for forecasting, such as similarity analysis, varcluster analysis, principal component analysis, stepwise regression, decision trees, co-integration analysis, and genetic programming.
Chapters 8 through 11 cover the most important topic of the book: how to define an implementation strategy for successful real-world applications of data mining for forecasting. These chapters present a practitioner's guide to time series forecasting methods that details univariate, multivariate, hierarchical, and nonlinear models. Finally, Chapter 12 illustrates the key topics in applying data mining for forecasting on a real business example.
What This Book Is NOT About
Detailed theoretical description of data mining and forecasting approaches – This book does not include a deep academic presentation of the various data mining and forecasting methods. Readers interested in more detailed knowledge of any individual approach are referred to the appropriate resources, such as books, critical papers, and websites. The focus of the book is on the application of related data mining and forecasting methods. All methods are described and analyzed at a level of detail that supports their broad practical implementation.
Introduction of new data mining and forecasting methods – The book does not propose new data mining and forecasting methods. Its novelty lies in integrating both methodologies and in applying data mining to forecasting.
Software manual for SAS products – This is not an introductory manual for the SAS software products used in the application of data mining for forecasting. It is assumed that the interested reader has some basic knowledge of the specific SAS software used herein: Base SAS, SAS Enterprise Guide, SAS Enterprise Miner, and SAS Forecast Server.
Features of the Book
The key features that differentiate this book from other titles on data mining and forecasting are:
1 Integrating data mining and forecasting – One of the main messages in the book is that a critical factor for improving forecasting is using data mining methods. The synergetic benefits of both approaches are mostly in the area of variable reduction and variable selection for building multivariate forecasting models.
2 A broader view of industrial forecasting – Another important topic of the book is the proposed broadening of the forecasting approaches by using nonlinear predictions in addition to the existing time series methods. This allows handling cases with short time series and extraordinary business or process conditions.
3 Emphasis on practical applications – The third key feature of the book is its predominantly practical view of all the topics discussed. The examples given are from real industrial applications, and the reader has the opportunity to “learn from the kitchen” how data mining for forecasting works in an industrial setting.
Acknowledgments
The authors would like to thank Jan Baumgras and Terry Woodfield, whose constructive comments substantially improved the final manuscript. We also greatly appreciate the comments and clarifications of our technical reviewers: Lorne Rothman, Abhijit Kulkarni, Sean Cai, Sara Vidal, and Udo Sglavo.
The staff of SAS Press has been most helpful, especially George McDaniel, who successfully managed the project and responded to our requests. We gratefully acknowledge the contributions of our copyeditor Brad Kellam, production specialist Candy Farrell, designer Jennifer Dilley, and marketing specialists Aimee Rodriguez and Shelly Goodin.
Chapter 1: Why Industry Needs Data Mining for Forecasting
1.1 Overview
1.2 Forecasting Capabilities as a Competitive Advantage
1.3 The Explosion of Available Time Series Data
1.4 Some Background on Forecasting
1.5 The Limitations of Classical Univariate Forecasting
1.6 What is a Time Series Database?
1.7 What is Data Mining for Forecasting?
1.8 Advantages of Integrating Data Mining and Forecasting
1.1 Overview
In today's economic environment there is ample opportunity to leverage the numerous sources of time series data that are readily available to the savvy decision maker. This