Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen
from design nominal, then Part 1 will consequently not be in its designed nominal position, as shown in Figure 1.2(b). After joining Part 1 and Part 2, the dimensions of the final assembly will deviate from the designed nominal values. One critical point that needs to be emphasized is that Figure 1.2(b) shows only one possible realization of the produced assemblies. If we produce another assembly, the deviation of the position of Part 1 could be different. For instance, if the diameter of a pin is reduced due to pin wear, then the fit between the pin and the corresponding hole will be loose, which will lead to random wobble in the final position of the part. This will in turn cause increased variation in the dimensions of the produced final assemblies. As a result, mislocation of a pin can manifest itself as either a mean shift or a variance change in dimensional quality measurements such as M1 and M2 in the figure. A mean shift error (for example, due to a fixed position shift of the pin) can be compensated for by process adjustment such as realignment of the locators. Variance change errors (for example, due to a worn-out pin or excessive looseness of a pin) cannot be easily compensated for in most cases. Also, note that each locator in the process is a potential source of variance change errors and is therefore referred to as a variation source. The variation sources are random effects in the process that affect the final assembly quality. In most assembly processes, pin wear is difficult to measure, so the random effects are not directly observed. In a modern automotive body assembly process, hundreds of locators are used to position a large number of parts and sub-assemblies. An important and challenging diagnosis problem is to estimate and identify the variation sources in the process based on the observed quality measurements.
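The distinction between a mean shift and a variance change can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the function name `simulate_measurements`, the parameters `pin_offset`, `clearance_sd`, and `noise_sd`, and all numerical values (in millimeters) are hypothetical and are not taken from the assembly process in Figure 1.2. A fixed pin offset moves the average of the measurement, whereas a loose pin leaves the average near nominal but inflates the spread.

```python
import random
import statistics


def simulate_measurements(n, pin_offset=0.0, clearance_sd=0.0,
                          noise_sd=0.05, seed=0):
    """Simulate n deviations from nominal at one measurement point.

    All values are hypothetical and in millimeters.
    pin_offset   : fixed mislocation of the pin (gives a mean shift)
    clearance_sd : std. dev. of the random wobble caused by a loose or
                   worn pin (gives a variance change)
    noise_sd     : baseline process and measurement noise
    """
    rng = random.Random(seed)
    return [
        pin_offset + rng.gauss(0.0, clearance_sd) + rng.gauss(0.0, noise_sd)
        for _ in range(n)
    ]


n = 2000
nominal = simulate_measurements(n)                    # healthy locator
shifted = simulate_measurements(n, pin_offset=0.5)    # fixed pin shift
worn = simulate_measurements(n, clearance_sd=0.3)     # worn, loose pin

for name, data in [("nominal", nominal), ("shifted", shifted), ("worn", worn)]:
    print(f"{name:8s} mean={statistics.fmean(data):+.3f} "
          f"sd={statistics.stdev(data):.3f}")
```

In this sketch the shifted case could be corrected by subtracting a constant (realigning the locator), while no constant adjustment reduces the spread in the worn case, which is why variance change errors are harder to compensate for.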
Example 1.2 Random effects in battery degradation processes
In industrial applications, the reliability of a critical unit is crucial to guarantee the overall functional capabilities of the entire system. Failure of such a unit can be catastrophic. Turbine engines of airplanes, power supplies of computers, and batteries of automobiles are typical examples where failure of the unit would lead to breakdown of the entire system. For these reasons, the working condition of such critical units must be monitored and the remaining useful life (RUL) of such units should be predicted so that we can take preventive actions before catastrophic failure occurs. Many system failure mechanisms can be traced back to some underlying degradation processes. An important prognosis problem is to predict RUL based on the degradation signals collected, which are often strongly associated with the failure of the unit. For example, Figure 1.3 shows the evolution of the internal resistance signals of multiple automotive lead-acid batteries. The internal resistance measurement is known to be one of the best condition monitoring signals for the battery life prognosis [Eddahech et al., 2012]. As we can see from Figure 1.3, the internal resistance measurement generally increases with the service time of the battery, which indicates that the health status of the battery is deteriorating.
Figure 1.3 Internal resistance measures from multiple batteries over time.
Figure 1.3 clearly shows that the progression paths of the internal resistance over time are similar across batteries but not identical. The difference is expected because many random factors in the material, the manufacturing processes, and the working environment vary from unit to unit. The random characteristics of the degradation paths are random effects, which affect the observed degradation signals of multiple batteries.
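One simple way to picture such unit-to-unit variation is a linear degradation path whose intercept and slope are drawn separately for each unit around common population values. The following sketch assumes this linear form purely for illustration; the function names `simulate_battery` and `ols_slope`, the parameter names, and every numerical value are hypothetical and are not estimated from the batteries in Figure 1.3.

```python
import random
import statistics


def simulate_battery(times, rng, mean_intercept=5.0, mean_slope=0.02,
                     sd_intercept=0.3, sd_slope=0.005, noise_sd=0.05):
    """Simulate one unit's internal resistance path (hypothetical units).

    The population means act as fixed effects shared by all units; the
    unit-specific deviations of intercept and slope are the random effects.
    """
    intercept = mean_intercept + rng.gauss(0.0, sd_intercept)
    slope = mean_slope + rng.gauss(0.0, sd_slope)
    return [intercept + slope * t + rng.gauss(0.0, noise_sd) for t in times]


def ols_slope(times, values):
    """Least-squares slope of values against times for a single unit."""
    t_bar = statistics.fmean(times)
    v_bar = statistics.fmean(values)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


rng = random.Random(1)
times = list(range(0, 100, 5))                 # 20 inspection times
paths = [simulate_battery(times, rng) for _ in range(200)]
slopes = [ols_slope(times, p) for p in paths]  # one fitted slope per unit

print(f"average slope across units: {statistics.fmean(slopes):.4f}")
print(f"spread of slopes (sd):      {statistics.stdev(slopes):.4f}")
```

The average of the fitted slopes reflects the behavior common to all units, while their spread reflects the random effect. A model that uses only the average would describe the population but would miss how quickly any individual battery degrades.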
The available data from multiple similar units/machines pose interesting intellectual opportunities and challenges for prognosis. As for opportunities, since we have observations from a potentially very large number of similar units, we can compare their operations/conditions, share the information, and extract common knowledge to enable accurate prediction and control at the individual level. As for challenges, because the data are collected in the field and not in a controlled environment, the data contain significant variation and heterogeneity due to the large variations in working conditions for different units. The data analytics approaches should be not only general (so that the common information can be learned and shared), but also flexible (so that the behavior of an individual subject can be captured and controlled).
Random effects always exist in industrial processes. The process variation caused by random effects is detrimental, and thus random effects should be modeled, analyzed, and controlled, particularly in system diagnosis and prognosis. However, due to limited data availability, data analytics approaches that consider random effects have not been widely adopted in industrial practice. Indeed, before the significant advancement in communication and information technology, data collection in industry often occurred locally in very similar environments. With such limited data, the impact of random effects could not be exposed and modeled easily. This situation has changed significantly in recent years due to the digital revolution, as mentioned at the beginning of the section.
The statistical methods for random effects provide a powerful set of tools for us to model and analyze the random variation in an industrial process. The goal of this book is to provide a textbook for engineering students and a reference book for researchers and industrial practitioners to adapt and bring the theory and techniques of random effects to the application area of industrial system diagnosis and prognosis. The detailed scope of the book is summarized in the next section.
1.2 Scope and Organization of the Book
This book focuses on industrial data analytics methods for system diagnosis and prognosis with an emphasis on random effects in the system. Diagnosis concerns identification of the root cause of a failure or an abnormal working condition. In the context of random effects, the goal of diagnosis is to identify the variation sources in the system. Prognosis concerns using data to predict what will happen in the future. Regarding random effects, prognosis focuses on addressing unit-to-unit variation and making degradation/failure predictions for each individual unit, taking into account the unique characteristics of that unit.
The book contains two main parts:
1 Statistical Methods and Foundation for Industrial Data Analytics
This part covers general statistical concepts, methods, and theory useful for describing and modeling the variation, the fixed effects, and the random effects for both univariate and multivariate data. This part provides the necessary background for the later chapters in Part II. In Part I, Chapter 2 introduces the basic statistical methods for visualizing and describing data variation. Chapter 3 introduces the concept of random vectors and the multivariate normal distribution. Basic concepts in statistical modeling and inference will also be introduced. Chapter 4 focuses on the principal component analysis (PCA) method. PCA is a powerful method to expose and describe the variations in multivariate data, and it has broad applications in variation source identification. Chapter 5 focuses on linear regression models, which are useful in modeling the fixed effects in a dataset. Statistical inference in linear regression, including parameter estimation and hypothesis testing approaches, will be discussed. Chapter 6 focuses on the basic theory of the linear mixed effects model, which captures both the fixed effects and the random effects in the data.
2 Random Effects Approaches for Diagnosis and Prognosis
This part covers the applications of the random effects modeling approach to diagnosis of variation sources and to failure prognosis in industrial processes/systems. Through industrial application examples, we will present variation-pattern-based variation source identification in Chapter 7. Variation source estimation methods based on the linear mixed effects model will be introduced in Chapter 8. A detailed performance comparison of different methods for practical applications is presented as well. In Chapter 9, the diagnosability issue for the variation source diagnosis problem will be studied. Chapter 10 introduces the mixed effects longitudinal modeling approach for forecasting system degradation and predicting remaining useful life based on the first time hitting probability. Some variations of the basic method, such as the method considering a mixture prior for unbalanced data in remaining useful life prediction, are also presented.