Compared with the corresponding table for the normal distribution in Section 1.2, the critical values are slightly larger in this table.
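The comparison can be checked numerically. The following is a minimal Python sketch, not taken from the text. The helper names `t_pdf`, `t_cdf`, and `t_ppf` are hypothetical; they integrate the t density with Simpson's rule and invert the result by bisection, using only the standard library. The degrees of freedom and probabilities shown are illustrative choices.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """Pr(t < x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_ppf(p, df):
    """Percentile of the t-distribution, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0  # assumed bracket; wide enough for the cases below
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


dfs = (2, 5, 29, 1000)   # illustrative degrees of freedom
probs = (0.90, 0.975)    # illustrative cumulative probabilities
z_crit = {p: NormalDist().inv_cdf(p) for p in probs}
t_crit = {(p, df): t_ppf(p, df) for p in probs for df in dfs}

for p in probs:
    row = "  ".join(f"df={df}: {t_crit[(p, df)]:.3f}" for df in dfs)
    print(f"p={p}: {row}  normal: {z_crit[p]:.3f}")
```

In each row the t critical values exceed the normal value and move toward it as the degrees of freedom grow.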
We will use the t‐distribution from this point on because it will allow us to use an estimate of the population standard deviation (rather than having to assume this value). A reasonable estimate to use is the sample standard deviation, \(s_Y\). Since we will be using an estimate of the population standard deviation, we will be a little less certain about our probability calculations; this is why the t‐distribution needs to be a little more spread out than the normal distribution, to adjust for this extra uncertainty. This extra uncertainty will be of particular concern when we are not too sure if our sample standard deviation is a good estimate of the population standard deviation (i.e., in small samples). So, it makes sense that the degrees of freedom increase as the sample size increases. In this particular application, we will use the t‐distribution with \(n-1\) degrees of freedom in place of a standard normal distribution in the following t‐version of the central limit theorem.
Suppose that a random sample of \(n\) data values, represented by \(Y_1, Y_2, \ldots, Y_n\), comes from a population that has a mean of \(E(Y)\). Imagine taking a large number of random samples of \(n\) data values and calculating the mean and standard deviation for each sample. As before, we will let \(M_Y\) represent the imagined list of repeated sample means, and similarly, we will let \(S_Y\) represent the imagined list of repeated sample standard deviations. Define
\[
t = \frac{M_Y - E(Y)}{S_Y / \sqrt{n}}.
\]
Under very general conditions, \(t\) has an approximate t‐distribution with \(n-1\) degrees of freedom. The two differences from the normal version of the central limit theorem that we used before are that the repeated sample standard deviations, \(S_Y\), replace an assumed population standard deviation, \(\mathrm{SD}(Y)\), and that the resulting sampling distribution is a t‐distribution (not a normal distribution).
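The "imagine taking a large number of random samples" idea can be mimicked by simulation. The sketch below is an illustration under stated assumptions rather than part of the text: the population is taken to be normal with a hypothetical mean of 280 and standard deviation of 50, the sample size is set to 30, and 1.311 is used as the 90th percentile of the t‐distribution with 29 degrees of freedom. If the t‐version of the central limit theorem holds, about 90% of the simulated t values should fall below 1.311.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

n = 30            # assumed sample size
pop_mean = 280.0  # hypothetical population mean
pop_sd = 50.0     # hypothetical population standard deviation
reps = 20000      # number of imagined repeated samples

t_values = []
for _ in range(reps):
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    m = sum(sample) / n  # sample mean
    # sample standard deviation, with n - 1 in the denominator
    s = math.sqrt(sum((y - m) ** 2 for y in sample) / (n - 1))
    # t statistic: the estimated standard deviation replaces the assumed one
    t_values.append((m - pop_mean) / (s / math.sqrt(n)))

# proportion of simulated t values below the 90th percentile of t with 29 df
prop_below = sum(t < 1.311 for t in t_values) / reps
print(f"Proportion of t values below 1.311: {prop_below:.3f}")
```

The population standard deviation is used only to generate the data; the t statistic itself relies on each sample's own standard deviation.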
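To illustrate, let us repeat the calculations from Section 1.4.1 based on an assumed population mean, \(E(Y) = 280\), but rather than using an assumed population standard deviation, \(\mathrm{SD}(Y)\), we will instead use our observed sample standard deviation, 53.8656 for \(s_Y\). To find the 90th percentile of the sampling distribution of the mean sale price, \(M_Y\), with \(n = 30\) and therefore 29 degrees of freedom:
\[
\Pr(M_Y < x) = 0.90
\]
\[
\Pr\!\left( \frac{M_Y - E(Y)}{s_Y/\sqrt{n}} < \frac{x - 280}{53.8656/\sqrt{30}} \right) = 0.90
\]
\[
\Pr\!\left( t_{29} < \frac{x - 280}{53.8656/\sqrt{30}} \right) = 0.90.
\]
Since \(\Pr(t_{29} < 1.311) = 0.90\), we have
\[
\frac{x - 280}{53.8656/\sqrt{30}} = 1.311,
\qquad\text{so}\qquad
x = 280 + 1.311 \left( 53.8656/\sqrt{30} \right) = 292.893.
\]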
Thus, the 90th percentile of the sampling distribution of \(M_Y\) is $292,893 (to the nearest $1).
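Turning this around, what is the probability that \(M_Y\) is greater than 292.893?
\[
\Pr(M_Y > 292.893)
= \Pr\!\left( t_{29} > \frac{292.893 - 280}{53.8656/\sqrt{30}} \right)
= \Pr(t_{29} > 1.311) = 0.10.
\]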
So, the probability that \(M_Y\) is greater than 292.893 is 0.10.
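Both calculations can be reproduced in a few lines. This is a sketch under assumptions, not the text's own method: the sample size of 30 is inferred from the stated result, 1.311 is the rounded table value for 29 degrees of freedom, and `t_pdf` and `t_cdf` are hypothetical helpers that integrate the t density numerically with the standard library.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """Pr(t < x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


n = 30                  # assumed sample size (29 degrees of freedom)
pop_mean = 280.0        # assumed population mean
s = 53.8656             # observed sample standard deviation
se = s / math.sqrt(n)   # estimated standard deviation of the sample mean
t90 = 1.311             # rounded 90th percentile of t with 29 df

check = t_cdf(t90, n - 1)   # should be close to 0.90
x90 = pop_mean + t90 * se   # 90th percentile of the sample mean
tail = 1 - t_cdf((292.893 - pop_mean) / se, n - 1)  # Pr(mean > 292.893)

print(f"Pr(t_29 < 1.311)  = {check:.4f}")
print(f"90th percentile   = {x90:.3f}")
print(f"Pr(mean > 292.893) = {tail:.4f}")
```

Because 1.311 is a rounded table value, the two probabilities agree with 0.90 and 0.10 only to about three or four decimal places.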
So far, we have focused on the sampling distribution of sample means, \(M_Y\), but what we would really like to do is infer what the observed sample mean, \(m_Y\), tells us about the population mean, \(E(Y)\). Thus, while the preceding calculations have been useful for building up intuition about sampling distributions and manipulating probability statements, their main purpose has been to prepare the ground for the next two sections, which cover how to make statistical inferences about the population mean, \(E(Y)\).
1.5 Interval Estimation
We have already seen that the sample mean, \(m_Y\), is a good point estimate of the population mean, \(E(Y)\) (in the sense that it is unbiased; see Section 1.4). It is also helpful to know how reliable this estimate is, that is, how much sampling uncertainty is associated with it. A useful way to express this uncertainty is to calculate an interval estimate or confidence interval for the population mean, \(E(Y)\). The interval should be centered at the point estimate (in this case, \(m_Y\)), and since we are probably equally uncertain that the population mean could be lower or higher than this estimate, it should have the same amount of uncertainty on either side of the point estimate. We quantify this uncertainty with a number called the “margin of error.” Thus, the confidence interval is of the form “point estimate ± margin of error” or “(point estimate − margin of error, point estimate + margin of error).”