The matched-pairs design is a very important concept in statistics and the design of experiments, because this simple design is the starting point for understanding more complicated designs and models, such as mixed effects and hierarchical models.
We analyze the hypothetical data in Table 2.8 using a paired-samples t-test in R by requesting paired = TRUE:
> treat <- c(10, 15, 20, 22, 25)
> control <- c(8, 12, 14, 15, 24)
> t.test(treat, control, paired = TRUE)

        Paired t-test

data:  treat and control
t = 3.2827, df = 4, p-value = 0.03042
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.5860324 7.0139676
sample estimates:
mean of the differences
                    3.8
The obtained p-value of 0.03 is statistically significant at the 0.05 level of significance. We reject the null hypothesis and conclude that the population means for the two conditions differ.
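Note that the paired-samples t-test is numerically identical to a one-sample t-test performed on the within-pair differences. As a quick check (a minimal sketch using the same data; the object name d is ours):

> d <- treat - control
> t.test(d)     # one-sample test on the differences

This reproduces t = 3.2827, df = 4, and p-value = 0.03042 exactly.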
As a nonparametric alternative, the Wilcoxon rank-sum test featured earlier can be adapted to paired observations, in which case it becomes the Wilcoxon signed-rank test. For our data, we have:
> wilcox.test(treat, control, paired = TRUE)

        Wilcoxon signed rank test

data:  treat and control
V = 15, p-value = 0.0625
alternative hypothesis: true location shift is not equal to 0
Table 2.9 Randomized Block Design
|  | Treatment 1 | Treatment 2 | Treatment 3 |
| --- | --- | --- | --- |
| Block 1 | 10 | 9 | 8 |
| Block 2 | 15 | 13 | 12 |
| Block 3 | 20 | 18 | 14 |
| Block 4 | 22 | 17 | 15 |
| Block 5 | 25 | 25 | 24 |
We notice that the obtained p-value is somewhat greater for the nonparametric test than for the parametric one. In terms of significance tests, this illustrates that there is usually a cost in statistical power when parametric assumptions cannot be made.
2.24 BLOCKING WITH SEVERAL CONDITIONS
We have said that in a blocking design, we expect the covariance between treatment conditions to be unequal to 0. Now consider a design in which we once again block, but this time across more than two treatment levels. The layout for such a design is given in Table 2.9.
Now, here is the trick to understanding advanced modeling, including a primary feature of mixed effects modeling. We know that we expect the covariance between treatments to be unequal to 0. This is analogous to what we expected in the simple matched-pairs design. It seems then that a reasonable assumption to make for the data in Table 2.9 is that the covariances between treatments are equal, or at minimum, follow some hypothesized correlational structure. In multilevel and hierarchical models, attempts are made to account for the correlation between treatment levels instead of assuming these correlations to equal 0 as is the case for classical between‐subjects designs. In Chapter 6, we elaborate on these ideas when we discuss randomized block and repeated measures models.
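To see this concretely, we can enter the data of Table 2.9 into R and inspect the sample covariances among the three treatments (a minimal sketch; the object names t1 through t3 and blocks are ours):

> t1 <- c(10, 15, 20, 22, 25)
> t2 <- c(9, 13, 18, 17, 25)
> t3 <- c(8, 12, 14, 15, 24)
> blocks <- cbind(t1, t2, t3)
> cov(blocks)   # off-diagonal entries are well above 0

Because the same blocks appear under every treatment, the off-diagonal covariances are clearly nonzero, which is exactly the feature that randomized block and mixed effects models are built to accommodate.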
2.25 COMPOSITE VARIABLES: LINEAR COMBINATIONS
In many statistical techniques, especially multivariate ones, statistical analyses take place not on individual variables, but rather on linear combinations of variables. A linear combination in linear algebra can be denoted simply as

$$\ell = a'y = a_1 y_1 + a_2 y_2 + \cdots + a_p y_p$$

where a′ = (a1, a2, …, ap). These values are scalars, and serve to weight the respective values of y1 through yp, which are the variables.
Just as we did for “ordinary” variables, we can compute a number of central tendency and dispersion statistics on linear combinations. For instance, we can compute the mean of a linear combination ℓi as

$$\bar{\ell} = \frac{1}{n}\sum_{i=1}^{n} \ell_i = a'\bar{y}$$
We can also compute the sample variance of a linear combination:

$$s_{\ell}^2 = a'Sa$$
for ℓi = a′yi, i = 1, 2, …, n, and where S is the sample covariance matrix. Though the form a′Sa for the variance may be difficult to decipher at this point, it will become clearer when we consider techniques such as principal components later in the book.
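To make the form a′Sa concrete, we can verify in R that it reproduces the ordinary sample variance of the combination scores, using the treatment and control data from earlier (a minimal sketch; the weight vector a = (1, −1) is an arbitrary illustrative choice):

> y <- cbind(treat, control)
> S <- cov(y)             # sample covariance matrix of the variables
> a <- c(1, -1)           # weights defining the combination treat - control
> t(a) %*% S %*% a        # a'Sa
> var(y %*% a)            # variance of the combination scores: identical
> t(a) %*% colMeans(y)    # a'ybar, the mean of the combination

The first two lines both return 6.7, and the mean of the combination is 3.8, matching the mean of the differences from the paired t-test earlier.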
For two linear combinations,

$$\ell_1 = a'y$$

and

$$\ell_2 = b'y$$

we can obtain the sample covariance between such linear combinations as follows:

$$s_{\ell_1 \ell_2} = a'Sb$$
The correlation of these linear combinations (Rencher and Christensen, 2012, p. 76) is simply the standardized version of this covariance:

$$r_{\ell_1 \ell_2} = \frac{a'Sb}{\sqrt{(a'Sa)(b'Sb)}}$$
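Continuing the sketch above, a second weight vector b yields the covariance a′Sb and the correlation just given, both of which we can confirm against R's cov() and cor() applied directly to the combination scores (b = (1, 1) is again an arbitrary illustrative choice):

> b <- c(1, 1)            # second combination: treat + control
> ell1 <- y %*% a
> ell2 <- y %*% b
> t(a) %*% S %*% b        # a'Sb
> cov(ell1, ell2)         # identical
> (t(a) %*% S %*% b) / sqrt((t(a) %*% S %*% a) * (t(b) %*% S %*% b))
> cor(ell1, ell2)         # identical to the line above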
As we will see later in the book, if we can assume multivariate normality of a distribution, that is, Y ∼ N[μ, Σ], then linear combinations of Y are also normally distributed, a fact that carries with it a host of other useful statistical properties (see Timm, 2002, pp. 86–88). In multivariate methods especially, we regularly need to make assumptions about such linear combinations.
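As a quick empirical illustration of this property (a sketch assuming the MASS package is available; the values of mu and Sigma below are made up for the example):

> library(MASS)
> set.seed(1)
> mu <- c(0, 0)
> Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)   # hypothetical covariance matrix
> Y <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)
> ell <- Y %*% c(2, -1)                      # an arbitrary linear combination
> qqnorm(ell); qqline(ell)                   # points should fall close to the reference line
> shapiro.test(ell)                          # formally assess normality of the combination

Since Y is multivariate normal, any such combination, here 2y1 − y2, is itself normally distributed.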