Biostatistics Decoded. A. Gouveia Oliveira
expected, as the variable we used had a normal distribution, the sample means also have a normal distribution. We can see that the average value of the sample means is, in all cases, the same value as the population mean, that is, 0. However, the standard deviations of the values of sample means are not the same in all four runs of the experiment. In samples of size 4 the standard error is 0.50, in samples of size 9 it is 0.33, in samples of size 16 it is 0.25, and in samples of size 25 it is 0.20.
Figure 1.29 Distribution of sample means of different sample sizes.
If we look more closely at these results, we realize that those values have something in common. Thus, 0.50 is 1 divided by 2, 0.33 is 1 divided by 3, 0.25 is 1 divided by 4, and 0.20 is 1 divided by 5. Now, can you see the relation between the divisors and the sample size, that is, 2 and 4, 3 and 9, 4 and 16, 5 and 25? The divisors are the square root of the sample size and 1 is the value of the population standard deviation. This means that the standard deviation of the sample means is equal to the population standard deviation divided by the square root of the sample size. Therefore, there is a fixed relationship between the standard deviation of the sample means of an attribute and the standard deviation of that attribute, where the former is equal to the latter divided by the square root of the sample size.
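This relationship is easy to verify with a short simulation (a sketch in Python with NumPy; the seed and the number of repetitions are arbitrary choices, not part of the original experiment):

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility
pop_sd = 1.0                    # population standard deviation (1, as in the experiment)

for n in (4, 9, 16, 25):
    # draw 100 000 samples of size n from a normal population with mean 0
    samples = rng.normal(loc=0.0, scale=pop_sd, size=(100_000, n))
    means = samples.mean(axis=1)         # one sample mean per sample
    observed_se = means.std()            # standard deviation of the sample means
    predicted_se = pop_sd / np.sqrt(n)   # population sd divided by sqrt(n)
    print(f"n={n:2d}  observed={observed_se:.3f}  predicted={predicted_se:.3f}")
```

The observed values come out close to 0.50, 0.33, 0.25, and 0.20, matching the four runs of the experiment.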
In the next section we will present an explanation for this relationship, but for now let us consolidate some of the concepts we have discussed so far.
The standard deviation of the sample means has a name of its own: the standard error of the mean or, simply, the standard error. Since the standard error is equal to the population standard deviation divided by the square root of the sample size, the variance of the sample means is equal to the population variance divided by the sample size.
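The variance version of this rule can be checked numerically as well (a sketch; the population variance of 1 and the sample size of 16 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
sigma2, n = 1.0, 16              # population variance and sample size (illustrative)

means = rng.normal(0.0, sigma2 ** 0.5, size=(100_000, n)).mean(axis=1)

print(means.var())   # close to sigma2 / n = 0.0625, the variance of the sample means
print(means.std())   # close to its square root, the standard error: 0.25
```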
Now we can begin to see why people tend to get confused with statistics. We have been talking about different means and different standard deviations, and students often become disoriented with so many measures. Let us review the meaning of each one of those measures.
There is the sample mean, which is not equal in value to the population mean. Sample means have a probability distribution whose mean has the same value as the population mean.
There is a statistical notation to represent the quantities we have encountered so far: population parameters, which are constants with unknown values, are represented by Greek characters, while statistics obtained from samples, which are variables with known values, are represented by Latin characters. Therefore, the value of the sample mean is represented by the letter m, and the value of the population mean by the letter μ (“m” in the Greek alphabet). Following the same rule for proportions, the symbol for the sample proportion is p and for the population proportion is π (“p” in the Greek alphabet).
Next, there is the sample standard deviation, which is not equal in value to the population standard deviation. Sample means have a distribution whose standard deviation, also known as standard error, is different from the sample standard deviation and from the population standard deviation. The usual notation for the sample standard deviation is the letter s, and for the population standard deviation is the letter σ (“s” in the Greek alphabet). There is no specific notation for the standard error.
Then there is the sample variance, which is also not equal to the population variance. These quantities are usually represented by the symbols s² and σ², respectively. In the case of proportions, the sample and population variances should also be represented by s² and σ², but instead the general practice is to represent them by the formulas used for their computation. Therefore, the sample variance is represented by p(1 − p) and the population variance by π(1 − π).
If one looks at the formulae for each of the above statistics, it becomes readily apparent why the sample statistics do not have the same value as their population counterparts. The reason is that they are computed differently, as shown in Figure 1.30.
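For interval data, the difference in computation is essentially the divisor: the conventional sample variance formula divides the sum of squared deviations by n − 1, while the population formula divides by n. The sketch below (the data values are illustrative, and the n − 1 divisor is the usual convention, assumed here since Figure 1.30 is not reproduced) shows that the two formulas give different values for the same data:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative data
n = len(x)
m = x.mean()                                # sample mean

s2 = ((x - m) ** 2).sum() / (n - 1)         # sample variance: divisor n - 1
sigma2_formula = ((x - m) ** 2).sum() / n   # population formula: divisor n

print(m, s2, sigma2_formula)   # same data, different divisors, different values
```

The sample variance is always a little larger than the value given by the population formula, and the difference shrinks as the sample size grows.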
Figure 1.30 Comparison of the computation of sample and population statistics.
Sample means also have a variance, which is the square of the standard error, but the variance of sample means has neither a specific name nor a specific notation.
From all of the above, we can conclude the following about sampling distributions:
Sample means have a normal distribution, regardless of the distribution of the attribute, provided the samples are large.
The means of small samples have a normal distribution only if the attribute itself has a normal distribution.
The mean of the sample means is the same as the population mean, regardless of the distribution of the variable or the sample size.
The standard deviation of the sample means, or standard error, is equal to the population standard deviation divided by the square root of the sample size, regardless of the distribution of the variable or the sample size.
Both the standard deviation and the standard error are measures of dispersion: the first measures the dispersion of the values of an attribute and the second measures the dispersion of the sample means of an attribute.
The above results are valid only if the observations in the sample are mutually independent.
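Because these conclusions hold regardless of the distribution of the attribute when the samples are large, we can check them with a deliberately non-normal population. The sketch below uses an exponential population, whose mean and standard deviation are both equal to its scale parameter; the choice of distribution, the seed, and the sample size of 30 are assumptions made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = sigma = 1.0          # an exponential population has mean = sd = its scale
n = 30                    # conventionally "large" enough for the normal approximation

means = rng.exponential(scale=mu, size=(100_000, n)).mean(axis=1)

print(means.mean())       # close to the population mean, 1.0
print(means.std())        # close to sigma / sqrt(n), about 0.183
```

Even though single observations from this population are strongly skewed, the mean of the sample means stays at μ and their standard deviation stays at σ divided by the square root of n.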
1.19 The Value of the Standard Error
Let us continue to view, as in Section 1.17, the sample mean as a random variable that results from the sum of identically distributed independent variables. The mean and variance of each of these identical variables are, of course, the same as the population mean and variance, respectively μ and σ².
When we compute sample means, we sum all observations and divide the result by the sample size. This is exactly the same as if, before we summed all the observations, we divided each one by the sample size. If we represent the sample mean by m, each observation by x, and the sample size by n, what was just said can be represented by

m = (x₁ + x₂ + ⋯ + xₙ)/n = x₁/n + x₂/n + ⋯ + xₙ/n
This is the same as if every one of the identical variables were divided by a constant equal to the sample size. From the properties of means, we know that if we divide a variable by a constant, its mean is divided by the same constant. Therefore, the mean of each xᵢ/n is equal to the population mean divided by n, that is, μ/n.
Now, from the properties of means we know that if we add independent variables, the mean of the resulting variable will be the sum of the means of the independent variables. Sample means result from adding together n variables, each one having a mean equal to μ/n. Therefore, the mean of the resulting variable will be n × μ/n = μ, the population mean. The conclusion, therefore, is that the distribution of sample means m has a mean equal to the population mean μ.
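The argument in the last two paragraphs can be checked numerically. In the sketch below, the population mean of 10, the standard deviation of 2, and the sample size of 5 are arbitrary illustrative choices: each scaled observation xᵢ/n has mean μ/n, and their sum, which is the sample mean, has mean μ.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sd, n = 10.0, 2.0, 5   # illustrative population mean, sd, and sample size

# each observation divided by n is a variable with mean mu / n
scaled = rng.normal(loc=mu, scale=sd, size=(200_000, n)) / n
print(scaled.mean())       # close to mu / n = 2.0

# the sample mean is the sum of the n scaled variables: its mean is n * (mu / n) = mu
m = scaled.sum(axis=1)
print(m.mean())            # close to mu = 10.0
```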
A similar reasoning may be used to find the value of the variance of sample means. We saw above that, to obtain a sample mean, we divide every