$$\operatorname{var}(\bar{X}) = \frac{\sigma^2}{n},$$
where σ² is the variance of X. An estimator of a parameter is called unbiased if its mean is equal to the true value of the parameter. X̄ is a commonly used estimator of µ because it is unbiased and its variance decreases as the sample size n increases.
This concept can be extended to a p-dimensional random vector X with mean vector µ. Consider a random sample X1, X2,…, Xn from the population of X. The sample mean vector X̄ is a random vector with population mean E(X̄) = µ and population covariance matrix
$$\operatorname{cov}(\bar{X}) = \frac{1}{n}\Sigma.$$
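As a quick numerical illustration of both statements, that X̄ is unbiased and that its variability shrinks as n grows, the following Python sketch simulates many samples and compares the empirical behavior of X̄ with µ and Σ/n. The 2-dimensional normal population, its parameters, and the sample sizes are arbitrary choices for illustration, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])            # assumed population mean vector
Sigma = np.array([[4.0, 1.5],         # assumed population covariance matrix
                  [1.5, 1.0]])
n, reps = 50, 20000                   # sample size and number of replications

# Draw `reps` independent samples of size n, keep the sample mean of each
xbars = np.array([rng.multivariate_normal(mu, Sigma, size=n).mean(axis=0)
                  for _ in range(reps)])

print(xbars.mean(axis=0))             # close to mu: X-bar is unbiased
print(np.cov(xbars, rowvar=False))    # close to Sigma / n
print(Sigma / n)
```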
The (population) covariance matrix of a random vector X is defined as
$$\Sigma = \operatorname{cov}(X) = E\left[(X - \mu)(X - \mu)^{T}\right].$$
The ith diagonal element of Σ is the population variance of Xi:
$$\sigma_{ii} = \operatorname{var}(X_i) = E\left[(X_i - \mu_i)^2\right].$$
The (j,k)th off-diagonal element of Σ is the population covariance between Xj and Xk:
$$\sigma_{jk} = \operatorname{cov}(X_j, X_k) = E\left[(X_j - \mu_j)(X_k - \mu_k)\right] = \begin{cases} \displaystyle\iint (x_j - \mu_j)(x_k - \mu_k)\, f_{jk}(x_j, x_k)\, dx_j\, dx_k, & \text{if } X_j, X_k \text{ are continuous,} \\[2mm] \displaystyle\sum_{x_j}\sum_{x_k} (x_j - \mu_j)(x_k - \mu_k)\, p_{jk}(x_j, x_k), & \text{if } X_j, X_k \text{ are discrete,} \end{cases}$$
where fjk(xj, xk) and pjk(xj, xk) are the joint density function and the joint probability mass function, respectively, of Xj and Xk. The population covariance measures the linear association between the two random variables. It is clear that σjk = σkj, so the covariance matrix Σ is symmetric. Like the sample covariance matrix, the population covariance matrix Σ is always positive semidefinite.
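To make the definition concrete, here is a minimal sketch that computes Σ directly from a small discrete joint distribution and then checks the symmetry and positive semidefiniteness just mentioned. The support points and probabilities are made-up illustrative values.

```python
import numpy as np

# A made-up discrete joint distribution for (X1, X2):
# rows of `vals` are support points, `p` is the joint pmf p12(x1, x2)
vals = np.array([[0.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])
p = np.array([0.4, 0.1, 0.1, 0.4])

mu = p @ vals                                 # population mean vector
centered = vals - mu
Sigma = (centered * p[:, None]).T @ centered  # E[(X - mu)(X - mu)^T]

print(Sigma)
print(np.allclose(Sigma, Sigma.T))            # symmetry: sigma_jk = sigma_kj
print(np.all(np.linalg.eigvalsh(Sigma) >= -1e-12))  # positive semidefinite
```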
Similar to the population mean, the population variance and covariance can be estimated by the sample variance and covariance introduced in Section 2.2. The sample variance and covariance are both random variables, and are unbiased estimators of the population variance and covariance. Consequently, the sample covariance matrix S is an unbiased estimator of the population covariance matrix Σ, that is, E(S) = Σ.
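The unbiasedness E(S) = Σ can also be checked by simulation. In the sketch below (population parameters and sample size are again arbitrary choices), the sample covariance matrices from many replications are averaged; with the usual 1/(n − 1) divisor the average settles near Σ.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.zeros(2)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
n, reps = 10, 50000

# Average the sample covariance matrix S over many replications;
# np.cov uses the unbiased 1/(n-1) divisor by default
S_avg = np.zeros((2, 2))
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    S_avg += np.cov(X, rowvar=False)
S_avg /= reps

print(S_avg)   # close to Sigma, consistent with E(S) = Sigma
```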
As with the sample covariance, the value of the population covariance of two random variables depends on their scales, for example on the measurement units of the variables. A scale-independent measure of the degree of linear association between the random variables Xj and Xk is given by the population correlation:
$$\rho_{jk} = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}}\sqrt{\sigma_{kk}}}.$$
It is clear that ρjk = ρkj, and the population correlation matrix of a random vector X is the symmetric matrix defined as
$$\boldsymbol{\rho} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix}.$$
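In code, the correlation matrix is obtained from Σ by dividing each σjk by the corresponding standard deviations, as the small sketch below does for an illustrative covariance matrix. The matrix identity ρ = D^(−1/2) Σ D^(−1/2), with D the diagonal of Σ, is a standard equivalent form rather than something stated here.

```python
import numpy as np

Sigma = np.array([[4.0, 1.5],       # an illustrative covariance matrix
                  [1.5, 1.0]])

# rho_jk = sigma_jk / (sqrt(sigma_jj) * sqrt(sigma_kk))
d = np.sqrt(np.diag(Sigma))
rho = Sigma / np.outer(d, d)

print(rho)                          # unit diagonal and rho_jk = rho_kj
```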
For univariate variables X and Y and a constant c, we have E(X + Y) = E(X) + E(Y) and E(cX) = cE(X). Similarly, for random vectors X and Y and a constant matrix C, it can be seen that
$$E(X + Y) = E(X) + E(Y), \qquad E(CX) = C\,E(X).$$
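A quick Monte Carlo check of these two vector identities, with arbitrary means and an arbitrary constant matrix C chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
C = np.array([[1.0, 2.0],
              [0.0, -1.0]])          # an arbitrary constant matrix

# Two independent random vectors with (arbitrary) known means
X = rng.multivariate_normal([1.0, 2.0], np.eye(2), size=100_000)
Y = rng.multivariate_normal([-3.0, 0.5], np.eye(2), size=100_000)

print((X + Y).mean(axis=0))   # close to E(X) + E(Y) = [-2.0, 2.5]
print((X @ C.T).mean(axis=0)) # close to C E(X) = [5.0, -2.0]
print(C @ X.mean(axis=0))     # the same quantity, computed from the sample mean
```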