Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen

is the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. The sample mean $\bar{X}$ follows $N(\mu, \sigma^2/n)$ and $(n-1)s^2/\sigma^2$ follows a $\chi^2$ distribution with n − 1 degrees of freedom. Consequently, under H0 the t statistic in (3.18) follows a Student’s t-distribution with n − 1 degrees of freedom. We reject H0 at significance level α and conclude that μ is not equal to μ0 if $|t| > t_{\alpha/2,\,n-1}$, where $t_{\alpha/2,\,n-1}$ denotes the upper 100(α/2)th percentile of the t-distribution with n − 1 degrees of freedom. Intuitively, if H0 is true, a value of |t| larger than $t_{\alpha/2,\,n-1}$ occurs only with the small probability α. Observing such a value is therefore strong evidence against H0, and we reject it.

      The test based on a fixed significance level α, say α = 0.05, has the disadvantage that it gives the decision maker no idea about whether the observed value of the test statistic is just barely in the rejection region or if it is far into the region. Instead, the p-value can be used to indicate how strong the evidence is in rejecting the null hypothesis H0. The p-value is the probability that the test statistic will take on a value that is at least as extreme as the observed value when the null hypothesis is true. The smaller the p-value, the stronger the evidence we have in rejecting H0. If the p-value is smaller than α, H0 will be rejected at the significance level of α. The p-value based on the t statistic in (3.18) can be found as

$$P = 2\,\Pr\left(T(n-1) > |t|\right),$$

      where T(n − 1) denotes a random variable following a t distribution with n − 1 degrees of freedom.
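      The fixed-level rule and the p-value can be compared side by side in a short Python sketch. The data values, `mu0`, and `alpha` below are made up for illustration, and the t statistic is assumed to have the usual form $(\bar{x} - \mu_0)/(s/\sqrt{n})$, since (3.18) itself is not reproduced in this excerpt.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements, for illustration only
x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 10.0])
mu0 = 10.0    # hypothesized mean under H0 (assumed value)
alpha = 0.05  # significance level

n = x.size
xbar = x.mean()
s = x.std(ddof=1)                         # sample standard deviation (divisor n - 1)
t_stat = (xbar - mu0) / (s / np.sqrt(n))  # assumed form of the t statistic

# Fixed-level rule: compare |t| with the upper 100(alpha/2)th percentile
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
reject = abs(t_stat) > t_crit

# p-value: P = 2 Pr(T(n - 1) > |t|), computed with the survival function
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p-value = {p_value:.3f}")
print("reject H0" if reject else "do not reject H0")
```

      The two rules always agree: |t| exceeds the critical value exactly when the p-value falls below α.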

      We can define the 100(1 − α)% confidence interval for μ as

$$\left[\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right].$$

      It is easy to see that the null hypothesis H0 is not rejected at level α if and only if μ0 is in the 100(1 − α)% confidence interval for μ. So the confidence interval consists of all those “plausible” values of μ0 that would not be rejected by the test of H0 at level α.
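      The duality between the test and the confidence interval can be checked numerically. The sketch below uses made-up data and two hypothetical helper functions, `t_confidence_interval` and `rejects`. It scans a grid of candidate values μ0 and reports whether each lies inside the interval and whether the level-α test rejects it.

```python
import numpy as np
from scipy import stats

def t_confidence_interval(x, alpha=0.05):
    """Return the 100(1 - alpha)% t confidence interval for the mean."""
    x = np.asarray(x, dtype=float)
    n = x.size
    half_width = stats.t.ppf(1 - alpha / 2, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    return x.mean() - half_width, x.mean() + half_width

def rejects(x, mu0, alpha=0.05):
    """Return True if the two-sided t test rejects H0: mu = mu0 at level alpha."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    return bool(abs(t_stat) > stats.t.ppf(1 - alpha / 2, df=n - 1))

# Hypothetical measurements, for illustration only
x = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 10.0]
lower, upper = t_confidence_interval(x)
print(f"95% confidence interval: [{lower:.3f}, {upper:.3f}]")

# Each candidate mu0 is either inside the interval or rejected, never both
for mu0 in np.linspace(9.5, 10.8, 14):
    inside = lower <= mu0 <= upper
    print(f"mu0 = {mu0:.1f}  inside CI: {inside}  rejected: {rejects(x, mu0)}")
```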

      To see the link to the test statistic used for a multivariate normal distribution, we consider an equivalent rule to reject H0, which is based on the square of the t statistic:

$$t^2 = n(\bar{x} - \mu_0)(s^2)^{-1}(\bar{x} - \mu_0) > t^2_{\alpha/2,\,n-1}. \qquad (3.19)$$

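      For a multivariate distribution with unknown mean μ and unknown covariance matrix Σ, we consider testing the following hypotheses:

$$H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0 \quad \text{versus} \quad H_1: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0.$$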

      Let X1, X2,…, Xn denote a random sample from a multivariate normal population. The test statistic in (3.19) can be naturally generalized to the multivariate distribution as

$$T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)^{T}\,\mathbf{S}^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0),$$

      where $\bar{\mathbf{x}}$ and $\mathbf{S}$ are the sample mean vector and the sample covariance matrix of X1, X2,…, Xn. The $T^2$ statistic in (3.19) is called Hotelling’s $T^2$ in honor of Harold Hotelling, who first obtained its distribution. Assuming H0 is true, we have the following result about the distribution of the $T^2$ statistic:

$$\frac{n-p}{(n-1)p}\,T^2 \sim F_{p,\,n-p},$$

      where $F_{p,\,n-p}$ denotes the F-distribution with p and n − p degrees of freedom. Based on the results on the distribution of $T^2$, we reject H0 at the significance level of α if

$$\frac{n-p}{(n-1)p}\,T^2 > F_{\alpha,\,p,\,n-p},$$

      where $F_{\alpha,\,p,\,n-p}$ denotes the upper (100α)th percentile of the F-distribution with p and n − p degrees of freedom. The p-value of the test based on the $T^2$ statistic is

$$P = \Pr\left(F(p,\,n-p) > \frac{n-p}{(n-1)p}\,T^2\right),$$

      where F(p, n − p) denotes a random variable distributed as $F_{p,\,n-p}$.
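      The multivariate test can be sketched in Python as follows. The code assumes the usual form $T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)^{T}\mathbf{S}^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$. The function name `hotelling_t2_test` is hypothetical, and the sample is simulated, so this illustrates the computation and is not a reference implementation.

```python
import numpy as np
from scipy import stats

def hotelling_t2_test(X, mu0, alpha=0.05):
    """One-sample Hotelling T^2 test of H0: mu = mu0.

    X is an (n, p) array with one observation per row; requires n > p.
    Returns (T^2, scaled F statistic, p-value, reject flag).
    """
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))  # sample covariance (divisor n - 1)
    d = xbar - mu0
    t2 = float(n * d @ np.linalg.solve(S, d))   # n (xbar - mu0)^T S^{-1} (xbar - mu0)
    f_stat = (n - p) / ((n - 1) * p) * t2       # follows F_{p, n-p} under H0
    p_value = float(stats.f.sf(f_stat, p, n - p))
    f_crit = stats.f.ppf(1 - alpha, p, n - p)   # upper (100 alpha)th percentile
    return t2, f_stat, p_value, bool(f_stat > f_crit)

# Simulated sample (illustration only): n = 30 observations, p = 3 variables
rng = np.random.default_rng(0)
true_mean = np.array([1.0, 2.0, 3.0])
X = rng.multivariate_normal(mean=true_mean, cov=np.eye(3), size=30)

t2, f_stat, p_value, reject = hotelling_t2_test(X, mu0=true_mean)
print(f"T^2 = {t2:.3f}, F = {f_stat:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

      For p = 1 the scaling factor (n − p)/((n − 1)p) equals 1 and the statistic reduces to the squared t statistic, so the function returns the same p-value as the two-sided t test.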

      The $T^2$ statistic can also be written as

$$\{\boldsymbol{\mu} \mid n(\bar{\mathbf{x}} - \boldsymbol{\mu})^{T}\,\mathbf{S}^{-1}(\bar{\mathbf{x}} \ldots$$