Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis

can say E(T) − θ ≠ 0. Since the bias will be a positive number, we can express this last statement as E(T) − θ > 0.

      Good estimators are, in general, unbiased. The most popular example of an unbiased estimator is that of the arithmetic sample mean since it can be shown that:

$$E(\bar{y}) = \mu$$
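      As an informal illustration of this property, the reader can simulate repeated sampling in R. The following sketch (with arbitrarily chosen values for μ, σ, and the sample size) averages a large number of sample means and shows that this average settles near μ:

# Draw many samples from a known population and average their sample means;
# the result should land very close to mu, illustrating E(ybar) = mu.
set.seed(123)
mu <- 50; sigma <- 10
sample_means <- replicate(10000, mean(rnorm(30, mean = mu, sd = sigma)))
mean(sample_means)    # approximately 50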

      An example of an estimator that is biased is the uncorrected sample variance, as we will soon discuss, since it can be shown that

$$E(S^2) \ne \sigma^2$$

      However, S² is not asymptotically biased. As sample size increases without bound, E(S²) converges to σ². Once the sample variance is corrected via the following, it leads to an unbiased estimator, even for smaller samples:

$$s^2 = \left(\frac{n}{n-1}\right)S^2$$

      where now,

$$E(s^2) = E\left[\left(\frac{n}{n-1}\right)S^2\right] = \sigma^2$$
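      A brief simulation sketch in R (the values chosen for σ², n, and the number of replications are illustrative only) makes the contrast concrete: the uncorrected variance S² averages below σ², while the corrected s² does not:

# Compare the uncorrected (divide by n) and corrected (divide by n - 1)
# sample variances against the true sigma^2 over many simulated samples.
set.seed(123)
sigma2 <- 4; n <- 10
S2 <- replicate(10000, {
  y <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  sum((y - mean(y))^2) / n          # uncorrected (biased) variance
})
mean(S2)                    # near ((n - 1)/n) * sigma2 = 3.6, not 4
mean(S2 * n / (n - 1))      # corrected version is centered near sigma2 = 4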

      An estimator is regarded as more efficient the lower its mean squared error. Estimators with lower variance are more efficient than estimators with higher variance. Fisher called this the criterion of efficiency, writing “when the distributions of the statistics tend to normality, that statistic is to be chosen which has the least probable error” (Fisher, 1922a, p. 316). Efficient estimators are generally preferred over less efficient ones.
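      For a concrete, if informal, illustration of relative efficiency, consider estimating the mean of a normal population with either the sample mean or the sample median. The following R sketch, with illustrative settings, compares their mean squared errors; under normality the median's MSE is larger (asymptotically by a factor of about π/2):

# Compare the mean squared errors of the sample mean and sample median
# as estimators of mu for normally distributed data.
set.seed(123)
mu <- 0; n <- 25
means   <- replicate(10000, mean(rnorm(n, mean = mu)))
medians <- replicate(10000, median(rnorm(n, mean = mu)))
mean((means - mu)^2)      # MSE of the sample mean
mean((medians - mu)^2)    # MSE of the sample median (noticeably larger)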

      An estimator is regarded as sufficient for a given parameter if the statistic “captures” everything we need to know about the parameter and our knowledge of the parameter could not be improved if we considered additional information (such as a secondary statistic) over and above the sufficient estimator. As Fisher (1922a, p. 316) described it, “the statistic chosen should summarize the whole of the relevant information supplied by the sample.” More specifically, Fisher went on to say:

      If θ be the parameter to be estimated, θ₁ a statistic which contains the whole of the information as to the value of θ, which the sample supplies, and θ₂ any other statistic, then the surface of distribution of pairs of values of θ₁ and θ₂, for a given value of θ, is such that for a given value of θ₁, the distribution of θ₂ does not involve θ. In other words, when θ₁ is known, knowledge of the value of θ₂ throws no further light upon the value of θ.

      (Fisher, 1922a, pp. 316–317)

      Returning to our discussion of moments, the variance is the second moment of a distribution, taken about the mean. For the discrete case, variance is defined as:

$$\sigma^2 = E(y_i - \mu)^2 = \sum_{i=1}^{n}(y_i - \mu)^2\, p(y_i)$$

      while for the continuous case,

$$\sigma^2 = \int_{-\infty}^{\infty}(y - \mu)^2 f(y)\, dy$$

      Since E(yᵢ) = μ, it stands that we may equivalently write the variance as E[yᵢ − E(yᵢ)]². We can also express σ² as E(yᵢ²) − μ² since, when we distribute expectations, we obtain:

$$\sigma^2 = E(y_i - \mu)^2 = E(y_i^2 - 2\mu y_i + \mu^2) = E(y_i^2) - 2\mu E(y_i) + \mu^2$$
$$= E(y_i^2) - 2\mu^2 + \mu^2 = E(y_i^2) - \mu^2$$
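      A quick numerical check of this identity can be carried out in R by treating a large simulated sample as the population (so that the divisor is n rather than n − 1); the settings below are illustrative only:

# Verify numerically that E(y - mu)^2 equals E(y^2) - mu^2 when mu is the
# mean of the data treated as a population.
set.seed(123)
y  <- rnorm(1e6, mean = 3, sd = 2)
mu <- mean(y)
mean((y - mu)^2)     # "population" variance, E(y - mu)^2
mean(y^2) - mu^2     # E(y^2) - mu^2, agrees with the line above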

      As earlier noted, taking the expectation of S², we find that E(S²) ≠ σ². The actual expectation of S² is equal to:

$$E(S^2) = \left(\frac{n-1}{n}\right)\sigma^2$$

      which implies the degree to which S² is biased is equal to:

$$E(S^2) - \sigma^2 = \left(\frac{n-1}{n}\right)\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}$$

      We have said that S² is biased, but you may have noticed that as n increases, (n − 1)/n approaches 1, and so E(S²) approaches σ² as n increases without bound. This was our basis for earlier writing that E(S²) converges to σ². That is, we say that the estimator S², though biased for small samples, is asymptotically unbiased because its expectation equals σ² as n → ∞.
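      The shrinking of the bias at rate −σ²/n is easy to see by simulation. The R sketch below (with arbitrary illustrative settings) computes the average of the uncorrected S² for several sample sizes and compares it with the theoretical value ((n − 1)/n)σ²:

# Show that the average of the uncorrected S^2 approaches sigma^2 = 4 as n grows.
set.seed(123)
sigma2 <- 4
for (n in c(5, 20, 100, 1000)) {
  S2 <- replicate(20000, {
    y <- rnorm(n, sd = sqrt(sigma2))
    sum((y - mean(y))^2) / n
  })
  cat("n =", n, "  mean(S2) =", round(mean(S2), 3),
      "  theory:", round((n - 1) / n * sigma2, 3), "\n")
}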

      When we lose a degree of freedom in the denominator and rename S² to s², we get

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}$$

      Recall that when we take the expectation of s², we find that E(s²) = σ² (see Wackerly, Mendenhall, and Scheaffer (2002, pp. 372–373) for a proof).

      The population standard deviation is given by the positive square root of σ², that is, σ = √σ². Analogously, the sample standard deviation is given by s = √s².
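      These definitions are easy to confirm in R: the built-in var() function uses the n − 1 denominator, and sd() returns its positive square root. The following sketch uses an arbitrary simulated sample:

# Confirm that var() divides by n - 1 and that sd() is sqrt(var()).
set.seed(123)
y <- rnorm(15)
sum((y - mean(y))^2) / (length(y) - 1)   # same value as var(y)
var(y)
sqrt(var(y))                             # same value as sd(y)
sd(y)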

      Recall the interpretation of a standard deviation. It tells us on average how much scores deviate from the mean. In computing a measure of dispersion, we initially squared deviations so as to avoid our measure of dispersion always equaling zero for any given set of observations, since the sum of deviations about the mean is always equal to 0. Taking the average of this sum of squares gave us the variance, but since this is in squared units, we wish to return them to “unsquared” units. This is how the standard deviation comes about. Studying the analysis of variance, the topic of the following chapter, will help in “cementing” some of these ideas of variance and the squaring of deviations, since ANOVA is all about generating different sums of squares and their averages, which go by the name of mean squares.

      The variance and standard deviation are easily obtained in R. We compute for parent in Galton's data:

      > var(parent)
      [1] 3.194561
      > sd(parent)
      [1] 1.787333
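      Since var() and sd() apply the n − 1 correction, the uncorrected (biased) versions can be recovered by rescaling, assuming the Galton data remain loaded and attached as in the earlier examples:

# Uncorrected variance and standard deviation for parent, obtained by
# rescaling the corrected values returned by var() and sd().
n <- length(parent)
var(parent) * (n - 1) / n        # uncorrected S^2 (slightly smaller)
sqrt(var(parent) * (n - 1) / n)  # uncorrected standard deviation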

