Data Science in Theory and Practice. Maria Cristina Mariani


The sample mean $\bar{z}$ can be found either by averaging the values

$$z_1 = \mathbf{a}^T\mathbf{x}_1,\quad z_2 = \mathbf{a}^T\mathbf{x}_2,\quad \ldots,\quad z_n = \mathbf{a}^T\mathbf{x}_n$$

or as a linear combination of $\bar{\mathbf{X}}$, the sample mean vector of $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$:

$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i = \mathbf{a}^T\bar{\mathbf{X}}. \tag{3.12}$$
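The following NumPy sketch gives a quick numerical check of (3.12). The small data matrix `X` (one row per observation $\mathbf{x}_i$) and the coefficient vector `a` are made up for illustration and do not come from the text. The sketch computes $\bar{z}$ both ways.

```python
import numpy as np

# Hypothetical data: n = 5 observations (rows) on p = 3 variables (columns)
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 5.0, 2.0],
    [4.0, 3.0, 0.0],
    [5.0, 6.0, 3.0],
    [6.0, 2.0, 4.0],
])
a = np.array([1.0, -2.0, 0.5])  # hypothetical coefficient vector

z = X @ a                    # z_i = a^T x_i for every row x_i
z_bar_direct = z.mean()      # average of z_1, ..., z_n
x_bar = X.mean(axis=0)       # sample mean vector
z_bar_from_mean = a @ x_bar  # a^T x_bar, the right-hand side of (3.12)

print(z_bar_direct, z_bar_from_mean)
```

Both routes give the same value. The second route only needs $\bar{\mathbf{X}}$, which is convenient when many different coefficient vectors are applied to the same data.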

3.6.2 Linear Combinations of Sample Variance and Covariance

The sample variance of $z_i = \mathbf{a}^T\mathbf{x}_i$, $i = 1, 2, \ldots, n$, can be found either as the sample variance of $z_1, z_2, \ldots, z_n$ or directly from $\mathbf{a}$ and $\mathbf{S}$, where $\mathbf{S}$ is the sample covariance matrix of $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$:

$$s_z^2 = \frac{\sum_{i=1}^{n}(z_i - \bar{z})^2}{n-1} = \mathbf{a}^T\mathbf{S}\mathbf{a}. \tag{3.13}$$

We recall from Section 2.3 that variance is always nonnegative. Thus $s_z^2 \ge 0$, and therefore $\mathbf{a}^T\mathbf{S}\mathbf{a} \ge 0$ for every $\mathbf{a}$; that is, $\mathbf{S}$ is positive semidefinite.
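The NumPy sketch below checks (3.13) and the nonnegativity claim on randomly generated data. The data, seed, and coefficient vector are illustrative assumptions. `np.cov(..., rowvar=False)` treats columns as variables and uses the divisor $n-1$, matching (3.13).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))      # hypothetical data: n = 50 rows, p = 3 variables
a = np.array([1.0, -2.0, 0.5])    # hypothetical coefficient vector

S = np.cov(X, rowvar=False)       # p x p sample covariance matrix, divisor n - 1
z = X @ a                         # z_i = a^T x_i

s2_direct = z.var(ddof=1)         # sum of (z_i - z_bar)^2 divided by n - 1
s2_quad = a @ S @ a               # a^T S a, the right-hand side of (3.13)

# a^T S a >= 0 for every a is equivalent to S having no negative eigenvalues
# (checked up to floating-point rounding)
eigenvalues = np.linalg.eigvalsh(S)

print(s2_direct, s2_quad, eigenvalues)
```

Checking the eigenvalues of $\mathbf{S}$ covers every direction $\mathbf{a}$ at once, rather than testing one coefficient vector at a time.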

If we define another linear combination $u = \mathbf{b}^T\mathbf{X} = b_1x_1 + b_2x_2 + \cdots + b_px_p$, where $\mathbf{b}$ is a vector of constants different from $\mathbf{a}$, then the sample covariance of $z = \mathbf{a}^T\mathbf{X}$ and $u = \mathbf{b}^T\mathbf{X}$ is given by

$$s_{zu} = \frac{\sum_{i=1}^{n}(z_i - \bar{z})(u_i - \bar{u})}{n-1} = \mathbf{a}^T\mathbf{S}\mathbf{b}, \tag{3.14}$$

where $n$ is the number of measurements.
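Formula (3.14) can be checked numerically in the same way. In this NumPy sketch the data and both coefficient vectors `a` and `b` are hypothetical. The left-hand side is computed from the $n$ pairs $(z_i, u_i)$ and the right-hand side from $\mathbf{S}$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))             # hypothetical data: n = 40, p = 4
a = np.array([1.0, 0.0, -1.0, 2.0])      # hypothetical coefficients for z
b = np.array([0.5, 1.0, 1.0, 0.0])       # hypothetical coefficients for u

z, u = X @ a, X @ b
n = X.shape[0]

# Left-hand side of (3.14): sample covariance of the pairs (z_i, u_i)
s_zu_direct = np.sum((z - z.mean()) * (u - u.mean())) / (n - 1)

# Right-hand side of (3.14): a^T S b
S = np.cov(X, rowvar=False)
s_zu_matrix = a @ S @ b

print(s_zu_direct, s_zu_matrix)
```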

Please refer to Johnson and Wichern (2014) for the proof of (3.14).
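For readers without the reference at hand, a brief sketch of the argument follows. It assumes $\mathbf{S} = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{X}})(\mathbf{x}_i - \bar{\mathbf{X}})^T$ (divisor $n-1$, consistent with (3.13)). It also uses $\bar{z} = \mathbf{a}^T\bar{\mathbf{X}}$ and, likewise, $\bar{u} = \mathbf{b}^T\bar{\mathbf{X}}$. The key step is that centering commutes with taking a linear combination:

$$
\begin{aligned}
z_i - \bar{z} &= \mathbf{a}^T(\mathbf{x}_i - \bar{\mathbf{X}}), \qquad u_i - \bar{u} = \mathbf{b}^T(\mathbf{x}_i - \bar{\mathbf{X}}) = (\mathbf{x}_i - \bar{\mathbf{X}})^T\mathbf{b},\\
s_{zu} &= \frac{1}{n-1}\sum_{i=1}^{n}\mathbf{a}^T(\mathbf{x}_i - \bar{\mathbf{X}})(\mathbf{x}_i - \bar{\mathbf{X}})^T\mathbf{b}
= \mathbf{a}^T\left[\frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{X}})(\mathbf{x}_i - \bar{\mathbf{X}})^T\right]\mathbf{b}
= \mathbf{a}^T\mathbf{S}\mathbf{b}.
\end{aligned}
$$

Setting $\mathbf{b} = \mathbf{a}$ gives $u = z$ and recovers (3.13) as a special case.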

3.6.3 Linear Combinations of Sample Correlation

The sample correlation between $z = \mathbf{a}^T\mathbf{X}$ and $u = \mathbf{b}^T\mathbf{X}$ is obtained as follows:

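$$r_{zu} = \frac{s_{zu}}{\sqrt{s_z^2\, s_u^2}} = \frac{\mathbf{a}^T\mathbf{S}\mathbf{b}}{\sqrt{(\mathbf{a}^T\mathbf{S}\mathbf{a})(\mathbf{b}^T\mathbf{S}\mathbf{b})}}. \tag{3.15}$$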
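This NumPy sketch assumes the sample correlation of $z$ and $u$ is their sample covariance divided by the product of their sample standard deviations. It evaluates that ratio from $\mathbf{S}$ and compares it with `np.corrcoef` applied to the pairs $(z_i, u_i)$. The data and coefficient vectors are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))        # hypothetical data: n = 60, p = 3
a = np.array([1.0, 1.0, 0.0])       # hypothetical coefficients for z
b = np.array([0.0, 1.0, -1.0])      # hypothetical coefficients for u

S = np.cov(X, rowvar=False)
# r_zu = a^T S b / sqrt((a^T S a)(b^T S b))
r_zu_matrix = (a @ S @ b) / np.sqrt((a @ S @ a) * (b @ S @ b))

# Same quantity computed directly from z_i = a^T x_i and u_i = b^T x_i
r_zu_direct = np.corrcoef(X @ a, X @ b)[0, 1]

print(r_zu_matrix, r_zu_direct)
```

The $n-1$ divisors cancel in the ratio, so the result does not depend on whether $\mathbf{S}$ uses $n$ or $n-1$.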

We note that the sample results in Section 3.6 have population counterparts. We briefly state them below:

The population mean of $z = \mathbf{a}^T\mathbf{X}$ is defined as follows:

$$E(z) = E(\mathbf{a}^T\mathbf{X}) = \mathbf{a}^T E(\mathbf{X}) = \mathbf{a}^T\boldsymbol{\mu},$$

where $\boldsymbol{\mu}$ denotes the population mean vector. The population variance of $z$ is defined as follows:

$$\sigma_z^2 = \operatorname{var}(\mathbf{a}^T\mathbf{X}) = \mathbf{a}^T\operatorname{cov}(\mathbf{X})\,\mathbf{a} = \mathbf{a}^T\boldsymbol{\Sigma}\mathbf{a},$$

where $\boldsymbol{\Sigma}$ denotes the population covariance matrix of $\mathbf{X}$.
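This sketch connects the population formulas to their sample counterparts. It fixes hypothetical values of $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, and $\mathbf{a}$, then evaluates $E(z) = \mathbf{a}^T\boldsymbol{\mu}$ and $\sigma_z^2 = \mathbf{a}^T\boldsymbol{\Sigma}\mathbf{a}$ exactly. Those values are compared with the sample mean and variance of $z$ over a large simulated sample. Drawing from a multivariate normal is an assumption made only for the simulation; the two formulas hold for any distribution with this mean vector and covariance matrix.

```python
import numpy as np

# Hypothetical population parameters (not from the text)
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([
    [2.0, 0.5, 0.0],
    [0.5, 1.0, 0.3],
    [0.0, 0.3, 1.5],
])
a = np.array([1.0, 2.0, -1.0])

# Exact population values from the formulas above
mean_z = a @ mu          # E(z) = a^T mu
var_z = a @ Sigma @ a    # sigma_z^2 = a^T Sigma a

# Monte Carlo check with a large simulated sample (normality assumed here)
rng = np.random.default_rng(3)
X = rng.multivariate_normal(mu, Sigma, size=200_000)
z = X @ a

print(mean_z, z.mean())
print(var_z, z.var(ddof=1))
```

With this many draws, the sample mean and sample variance of $z$ should land close to $\mathbf{a}^T\boldsymbol{\mu}$ and $\mathbf{a}^T\boldsymbol{\Sigma}\mathbf{a}$. This mirrors how $\bar{z}$ and $s_z^2$ estimate their population counterparts.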
