Data Science in Theory and Practice. Maria Cristina Mariani
sample correlation coefficient is a measure of the linear association between two variables and does not depend on the units of measurement: the units cancel when the coefficient is constructed. The sample correlation matrix is analogous to the covariance matrix, with correlations in place of covariances:
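\[
\mathbf{R} = (r_{jk}) =
\begin{pmatrix}
1 & r_{12} & \cdots & r_{1p} \\
r_{21} & 1 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & 1
\end{pmatrix}
\tag{3.8}
\]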
The population correlation matrix similar to (3.8) is defined as follows:
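\[
\mathbf{P}_{\rho} = (\rho_{jk}) =
\begin{pmatrix}
1 & \rho_{12} & \cdots & \rho_{1p} \\
\rho_{21} & 1 & \cdots & \rho_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{p1} & \rho_{p2} & \cdots & 1
\end{pmatrix}
\tag{3.9}
\]
where $\rho_{jk} = \sigma_{jk} / (\sigma_j \sigma_k)$ is the population correlation between the $j$th and $k$th variables.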
We note that although the sample correlation and the sample covariance have the same sign, the correlation is easier to interpret because its magnitude is bounded: it lies in the closed interval $[-1, 1]$.
1 The value of the sample correlation $r$ must lie between $-1$ and $1$ inclusive. $r = 1$ indicates a perfect positive linear relationship and $r = -1$ indicates a perfect inverse linear relationship.
2 The sample correlation measures the strength of the linear association between two variables. If $r$ equals zero, there is no linear association between the components. Otherwise, the sign of $r$ indicates the direction of the association. If $r$ is positive, then as one variable gets larger the other tends to get larger. If $r$ is negative, then as one gets larger the other tends to get smaller (often called an “inverse” correlation). A larger value of $|r|$ implies a stronger linear association. In the extreme case $r = -1$, the two variables move in exactly opposite directions: if one variable increases, the other decreases by a proportional amount (and vice versa).
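As a rough illustration of the properties above, the following Python sketch builds a sample correlation matrix from the sample covariance matrix and checks three things: the entries are bounded by 1 in absolute value, rescaling a variable (a change of units) leaves the correlations unchanged, and reversing the sign of a variable reverses the sign of its correlations. The helper name `sample_correlation_matrix`, the use of NumPy, and the data matrix `Y` are assumptions made for illustration only; the numbers are hypothetical and are not the data of Example 3.1.

```python
import numpy as np


def sample_correlation_matrix(Y):
    """Return the p x p sample correlation matrix of an n x p data matrix Y."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    centered = Y - Y.mean(axis=0)          # subtract each column's sample mean
    S = centered.T @ centered / (n - 1)    # sample covariance matrix
    d = np.sqrt(np.diag(S))                # sample standard deviations
    return S / np.outer(d, d)              # r_jk = s_jk / (s_j * s_k)


# Hypothetical data: 5 observations on 2 variables (illustrative values only).
Y = np.array([
    [10.0, 2.0],
    [12.0, 3.0],
    [15.0, 3.0],
    [20.0, 5.0],
    [23.0, 6.0],
])

R = sample_correlation_matrix(Y)

# Change of units: express the first variable in units 100 times smaller.
R_rescaled = sample_correlation_matrix(Y * np.array([100.0, 1.0]))

# Reverse the direction of the second variable.
R_flipped = sample_correlation_matrix(Y * np.array([1.0, -1.0]))

print("R =\n", np.round(R, 4))
print("bounded in [-1, 1]:", bool(np.all(np.abs(R) <= 1 + 1e-12)))
print("unchanged by rescaling:", bool(np.allclose(R, R_rescaled)))
print("sign reverses when a variable is negated:",
      bool(np.isclose(R_flipped[0, 1], -R[0, 1])))
```

The diagonal entries equal 1 because each variable is perfectly correlated with itself, which is why only the off-diagonal entries carry information about pairwise association.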
Example 3.4 Consider the following data matrix introduced in Example 3.1:
Each receipt yields a pair of measurements, total dollar sales, and number of movies sold. We find the sample correlation
Therefore,
In this example, we observe the variables
3.6 Linear Combinations of Variables
Most often, we are interested in linear combinations of the variables
Let
(3.10)
where
(3.11)
For example, if
3.6.1 Linear Combinations of Sample Means
The sample mean of