Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen


is the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. The sample mean X̄ follows N(μ, σ²/n), and (n − 1)s²/σ² follows a χ² distribution with n − 1 degrees of freedom, independently of X̄. Consequently, under H₀ the t statistic in (3.18) follows a Student’s t-distribution with n − 1 degrees of freedom. We reject H₀ at significance level α and conclude that μ ≠ μ₀ if $|t| > t_{\alpha/2,n-1}$, where $t_{\alpha/2,n-1}$ denotes the upper 100(α/2)th percentile of the t-distribution with n − 1 degrees of freedom. Intuitively, $|t| > t_{\alpha/2,n-1}$ means that, if we were sampling from a Student’s t-distribution with n − 1 degrees of freedom (as we would be under H₀), a value of |t| this large would occur with probability less than α. It is therefore very likely that H₀ is not correct, and we reject it.
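The decision rule above can be sketched in stdlib-only Python. This is a minimal illustration under stated assumptions, not code from the book: the helper names `betainc`, `t_two_sided_tail`, `t_critical` and `one_sample_t_test` are hypothetical, and the percentile $t_{\alpha/2,n-1}$ is found by bisection on the identity $\Pr(|T(\nu)| > t) = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)$, where $I_x(a,b)$ is the regularized incomplete beta function, evaluated with the standard Lentz continued fraction. With SciPy available, `scipy.stats.t.ppf` would replace the hand-written search.

```python
import math
import statistics


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) (Lentz continued fraction)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a); the fraction converges faster there.
        return 1.0 - betainc(b, a, 1.0 - x)
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 3e-14:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def t_two_sided_tail(t: float, df: float) -> float:
    """Pr(|T(df)| > |t|) = I_{df/(df + t^2)}(df/2, 1/2)."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def t_critical(alpha: float, df: float) -> float:
    """t_{alpha/2, df}: the point whose two-sided tail probability is alpha (bisection)."""
    lo, hi = 0.0, 1.0
    while t_two_sided_tail(hi, df) > alpha:  # widen the bracket until the tail drops below alpha
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if t_two_sided_tail(mid, df) > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def one_sample_t_test(sample, mu0, alpha=0.05):
    """Return (t, t_{alpha/2, n-1}, reject?) for H0: mu = mu0 vs H1: mu != mu0."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s2 = statistics.variance(sample)       # sample variance, n - 1 in the denominator
    t = (xbar - mu0) / math.sqrt(s2 / n)   # the t statistic of (3.18)
    crit = t_critical(alpha, n - 1)
    return t, crit, abs(t) > crit
```

For example, `one_sample_t_test([10.2, 9.8, 10.4, 10.1, 9.9, 10.3], mu0=10.0)` returns the statistic, the critical value for α = 0.05 and the decision; the data are made up for illustration.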

The test based on a fixed significance level α, say α = 0.05, has the disadvantage that it tells the decision maker nothing about whether the observed value of the test statistic lies just barely inside the rejection region or far into it. Instead, the p-value can be used to indicate how strong the evidence against the null hypothesis H₀ is. The p-value is the probability that the test statistic takes a value at least as extreme as the observed value when the null hypothesis is true. The smaller the p-value, the stronger the evidence against H₀. If the p-value is smaller than α, H₀ is rejected at significance level α. The p-value based on the t statistic in (3.18) can be found as

$$P = 2\Pr\left(T(n-1) > |t|\right),$$

where T(n − 1) denotes a random variable following a t distribution with n − 1 degrees of freedom.
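As a sketch (stdlib-only Python, with the hypothetical helper names `betainc` and `t_test_p_value`), the p-value can be evaluated without a table, because $2\Pr(T(\nu) > |t|) = \Pr(|T(\nu)| > |t|) = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)$ with ν = n − 1 and $I_x(a,b)$ the regularized incomplete beta function. This is an illustration, not the book's implementation; `scipy.stats.t.sf` would be the usual library route.

```python
import math
import statistics


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) (Lentz continued fraction)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # symmetry: faster convergence
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 3e-14:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def t_test_p_value(sample, mu0):
    """Return (t, P) with P = 2 Pr(T(n-1) > |t|) for H0: mu = mu0 vs H1: mu != mu0."""
    n = len(sample)
    t = (statistics.mean(sample) - mu0) / math.sqrt(statistics.variance(sample) / n)
    df = n - 1
    # 2 Pr(T(df) > |t|) = Pr(|T(df)| > |t|) = I_{df/(df + t^2)}(df/2, 1/2)
    return t, betainc(df / 2.0, 0.5, df / (df + t * t))
```

A p-value of 1 corresponds to X̄ = μ₀ exactly; the p-value shrinks as |t| grows.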

We can define the 100(1 − α)% confidence interval for μ as

$$\left[\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\; \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right].$$

It is easy to see that the null hypothesis H₀ is not rejected at level α if and only if μ₀ is in the 100(1 − α)% confidence interval for μ. So the confidence interval consists of all those “plausible” values of μ₀ that would not be rejected by the test of H₀ at level α.
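The duality between the interval and the test can be checked numerically. In this hypothetical sketch the critical value is passed in (taken from a t table, here $t_{0.025,4} \approx 2.7764$) to keep the code short; the data and the function names `t_confidence_interval` and `t_statistic` are illustrative only.

```python
import math
import statistics


def t_confidence_interval(sample, t_crit):
    """100(1 - alpha)% interval xbar -/+ t_crit * s / sqrt(n), with t_crit = t_{alpha/2, n-1}."""
    n = len(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    xbar = statistics.mean(sample)
    return xbar - half_width, xbar + half_width


def t_statistic(sample, mu0):
    """One-sample t statistic (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


sample = [10, 11, 12, 13, 14]   # hypothetical data, n = 5
t_crit = 2.7764                 # t_{0.025, 4} from a t table (alpha = 0.05)
lo, hi = t_confidence_interval(sample, t_crit)
for mu0 in [9.5, 10.5, 12.0, 13.5, 14.5]:
    # mu0 lies in the interval exactly when the level-alpha test does not reject it
    assert (lo <= mu0 <= hi) == (abs(t_statistic(sample, mu0)) <= t_crit)
```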

To see the link to the test statistic used for a multivariate normal distribution, we consider an equivalent rule to reject H₀, which is based on the square of the t statistic:

$$t^2 = \frac{(\bar{X} - \mu_0)^2}{s^2/n} = n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0). \tag{3.19}$$

We reject H₀ at significance level α if $t^2 > (t_{\alpha/2,n-1})^2$.
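Written as code, the quadratic form in (3.19) treats s² as a 1 × 1 covariance matrix, which is exactly the shape that generalizes to the multivariate case. A minimal sketch with hypothetical function names (`t_squared`, `reject_by_t_squared`), where the critical value is supplied by the caller:

```python
import statistics


def t_squared(sample, mu0):
    """t^2 as the quadratic form n (xbar - mu0) (s^2)^{-1} (xbar - mu0) of (3.19)."""
    n = len(sample)
    diff = statistics.mean(sample) - mu0
    s2_inv = 1.0 / statistics.variance(sample)  # inverse of the 1 x 1 "covariance matrix"
    return n * diff * s2_inv * diff


def reject_by_t_squared(sample, mu0, t_crit):
    """Equivalent rule: reject H0 when t^2 > (t_{alpha/2, n-1})^2."""
    return t_squared(sample, mu0) > t_crit ** 2
```

Squaring both sides of |t| > t_{α/2,n−1} loses nothing because both sides are non-negative, so the two rules always agree.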

For a multivariate normal distribution with unknown mean vector μ and unknown covariance matrix Σ, we consider testing the following hypotheses:

$$H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0 \quad \text{vs.} \quad H_1: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0. \tag{3.20}$$

Let X₁, X₂, …, X_n denote a random sample from a p-variate normal population. The test statistic in (3.19) can be naturally generalized to the multivariate case as

$$T^2 = n(\bar{\mathbf{X}} - \boldsymbol{\mu}_0)^T \mathbf{S}^{-1} (\bar{\mathbf{X}} - \boldsymbol{\mu}_0), \tag{3.21}$$

where X̄ and S are the sample mean vector and the sample covariance matrix of X₁, X₂, …, X_n. The T² statistic in (3.21) is called Hotelling’s T² in honor of Harold Hotelling, who first obtained its distribution. Assuming H₀ is true, we have the following result about the distribution of the T² statistic:

$$\frac{n-p}{(n-1)p}\,T^2 \sim F_{p,n-p},$$

where $F_{p,n-p}$ denotes the F-distribution with p and n − p degrees of freedom. Based on this distributional result, we reject H₀ at significance level α if

$$T^2 > \frac{(n-1)p}{n-p}\,F_{\alpha,p,n-p}, \tag{3.22}$$

where $F_{\alpha,p,n-p}$ denotes the upper (100α)th percentile of the F-distribution with p and n − p degrees of freedom. The p-value of the test based on the T² statistic is

$$P = \Pr\left(F(p, n-p) > \frac{n-p}{(n-1)p}\,T^2\right),$$

where F(p, n − p) denotes a random variable distributed as $F_{p,n-p}$.
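Putting (3.21), (3.22) and the p-value formula together gives the following stdlib-only sketch. It assumes the data arrive as a list of n rows, each a list of p numbers, with n > p so that S is invertible; `hotelling_t2_test`, `solve`, `f_critical` and the other helpers are hypothetical names, and the F tail uses the identity $\Pr(F(d_1,d_2) > f) = I_{d_2/(d_2+d_1 f)}(d_2/2, d_1/2)$. The code solves $\mathbf{S}\mathbf{y} = \bar{\mathbf{x}} - \boldsymbol{\mu}_0$ rather than forming $\mathbf{S}^{-1}$ explicitly, the usual numerically safer choice; in practice NumPy and SciPy would do this work.

```python
import math


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) (Lentz continued fraction)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # symmetry: faster convergence
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 3e-14:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def f_upper_tail(f: float, d1: float, d2: float) -> float:
    """Pr(F(d1, d2) > f) = I_{d2/(d2 + d1 f)}(d2/2, d1/2)."""
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def f_critical(alpha: float, d1: float, d2: float) -> float:
    """Upper (100 alpha)th percentile F_{alpha, d1, d2}, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, d1, d2) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_upper_tail(mid, d1, d2) > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("S is (numerically) singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def hotelling_t2_test(X, mu0, alpha=0.05):
    """Return (T^2, critical value of (3.22), p-value, reject?) for H0: mu = mu0."""
    n, p = len(X), len(X[0])
    if n <= p:
        raise ValueError("need n > p")
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    S = [[sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in X) / (n - 1)
          for k in range(p)] for j in range(p)]  # sample covariance, n - 1 denominator
    diff = [xbar[j] - mu0[j] for j in range(p)]
    t2 = n * sum(dj * yj for dj, yj in zip(diff, solve(S, diff)))  # (3.21)
    crit = (n - 1) * p / (n - p) * f_critical(alpha, p, n - p)   # (3.22)
    p_value = f_upper_tail((n - p) / ((n - 1) * p) * t2, p, n - p)
    return t2, crit, p_value, t2 > crit
```

With p = 1 the statistic reduces to the squared one-sample t statistic; for instance, `hotelling_t2_test([[10], [11], [12], [13], [14]], [0])` gives T² = 288 (up to rounding), which equals t² for the same data.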

The T² statistic can also be written as

$\{\boldsymbol{\mu} \mid n(\bar{\mathbf{x}} - \boldsymbol{\mu})^T \mathbf{S}^{-1}(\bar{\mathbf{x}} \ldots$
