Applied Regression Modeling. Iain Pardoe

Upper-tail area                0.1    0.05   0.025  0.01   0.005  0.001
Critical value of \(t_{29}\)   1.311  1.699  2.045  2.462  2.756  3.396
Two-tail area                  0.2    0.1    0.05   0.02   0.01   0.002

Compared with the corresponding table for the normal distribution in Section 1.2, the critical values are slightly larger in this table.
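The comparison can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes the table refers to 29 degrees of freedom, and the helper names (`t_pdf`, `t_upper_tail`, `t_critical`) are hypothetical. It uses only the Python standard library, integrating the t density with Simpson's rule and inverting by bisection; in practice a statistics library (for example, `scipy.stats.t.ppf`) would normally do this job.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=1000):
    """Pr(T > x) for x > 0: Simpson's rule on [0, x], then symmetry about 0."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(upper_tail_area, df):
    """Critical value with the given upper-tail area, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > upper_tail_area:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


areas = [0.1, 0.05, 0.025, 0.01, 0.005, 0.001]
t_values = [round(t_critical(a, 29), 3) for a in areas]
z_values = [round(NormalDist().inv_cdf(1 - a), 3) for a in areas]
for a, t, z in zip(areas, t_values, z_values):
    print(f"upper-tail area {a}: t (29 df) = {t}, normal = {z}")
```

Each t critical value printed should be larger than the normal critical value for the same upper-tail area, which is the pattern the text describes.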

We will use the t‐distribution from this point on because it allows us to use an estimate of the population standard deviation (rather than having to assume this value). A reasonable estimate to use is the sample standard deviation, \(s_Y\). Since we will be using an estimate of the population standard deviation, we will be a little less certain about our probability calculations. This is why the t‐distribution needs to be a little more spread out than the normal distribution: to adjust for this extra uncertainty. The extra uncertainty is of particular concern when we are not sure whether our sample standard deviation is a good estimate of the population standard deviation (i.e., in small samples). So, it makes sense that the degrees of freedom increase as the sample size increases. In this particular application, we will use the t‐distribution with \(n - 1\) degrees of freedom in place of a standard normal distribution in the following t‐version of the central limit theorem.

Suppose that a random sample of \(n\) data values, represented by \(Y_1, Y_2, \ldots, Y_n\), comes from a population that has a mean of \(E(Y)\). Imagine taking a large number of random samples of \(n\) data values and calculating the mean and standard deviation for each sample. As before, we will let \(M_Y\) represent the imagined list of repeated sample means, and similarly, we will let \(S_Y\) represent the imagined list of repeated sample standard deviations. Define

\[ t = \frac{M_Y - E(Y)}{S_Y / \sqrt{n}}. \]

Under very general conditions, t has an approximate t‐distribution with \(n - 1\) degrees of freedom. The two differences from the normal version of the central limit theorem that we used before are that the repeated sample standard deviations, \(S_Y\), replace an assumed population standard deviation, \(\mathrm{SD}(Y)\), and that the resulting sampling distribution is a t‐distribution (not a normal distribution).
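A small simulation can make the "imagined list" of repeated samples concrete. This is only a sketch under stated assumptions: the population is taken to be normal with mean 280 and standard deviation 50 (illustrative values chosen here, not taken from the passage), the sample size is 30, and 1.311 is the tabulated critical value with upper-tail area 0.1 for 29 degrees of freedom.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

n = 30            # sample size, giving n - 1 = 29 degrees of freedom
pop_mean = 280.0  # assumed population mean (illustrative)
pop_sd = 50.0     # assumed population standard deviation (illustrative)
reps = 20000      # number of imagined repeated samples

t_stats = []
for _ in range(reps):
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    m = sum(sample) / n  # sample mean
    # Sample standard deviation, using the n - 1 divisor
    s = math.sqrt(sum((y - m) ** 2 for y in sample) / (n - 1))
    # t-statistic: the estimated standard deviation replaces the population value
    t_stats.append((m - pop_mean) / (s / math.sqrt(n)))

# Share of simulated t-statistics above the critical value 1.311
share_above = sum(t > 1.311 for t in t_stats) / reps
print(f"share of t-statistics above 1.311: {share_above:.3f}")
```

If the t-version of the central limit theorem holds, the printed share should be close to 0.1, up to simulation noise.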

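To illustrate, let us repeat the calculations from Section 1.4.1 based on an assumed population mean, \(E(Y) = 280\), but rather than using an assumed population standard deviation, \(\mathrm{SD}(Y)\), we will instead use our observed sample standard deviation, 53.8656, for \(s_Y\). To find the 90th percentile of the sampling distribution of the mean sale price, \(M_Y\), with \(n = 30\):

\[
\begin{aligned}
\Pr(t_{29} < 1.311) &= 0.90 \\
\Pr\!\left( \frac{M_Y - 280}{53.8656 / \sqrt{30}} < 1.311 \right) &= 0.90 \\
\Pr\!\left( M_Y < 280 + 1.311 \left( 53.8656 / \sqrt{30} \right) \right) &= 0.90 \\
\Pr(M_Y < 292.893) &= 0.90.
\end{aligned}
\]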

Thus, the 90th percentile of the sampling distribution of \(M_Y\) is $292,893 (to the nearest $1).
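The arithmetic behind this percentile can be reproduced in a few lines. This is a sketch in which the sample size of 30 and the assumed mean of 280 (in thousands of dollars) are inferred from the 29 degrees of freedom and the stated result rather than quoted directly. The critical value 1.311 is read from the table, and the variable names are illustrative.

```python
import math

n = 30                # sample size (inferred from 29 degrees of freedom)
assumed_mean = 280.0  # assumed population mean, in thousands of dollars (inferred)
sample_sd = 53.8656   # observed sample standard deviation, from the text
t_crit = 1.311        # t critical value, 29 df, upper-tail area 0.1

# Estimated standard deviation of the sample mean
standard_error = sample_sd / math.sqrt(n)

# 90th percentile of the sampling distribution of the sample mean
percentile_90 = assumed_mean + t_crit * standard_error
print(round(standard_error, 4), round(percentile_90, 3))
```

Under these assumptions the second printed value agrees with the 292.893 quoted in the text.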

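Turning this around, what is the probability that \(M_Y\) is greater than 292.893?

\[
\begin{aligned}
\Pr(M_Y > 292.893) &= \Pr\!\left( \frac{M_Y - 280}{53.8656 / \sqrt{30}} > \frac{292.893 - 280}{53.8656 / \sqrt{30}} \right) \\
&= \Pr(t_{29} > 1.311) \\
&= 0.10.
\end{aligned}
\]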

So, the probability that \(M_Y\) is greater than 292.893 is 0.10.
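The reverse calculation standardizes the cutoff and then looks up the tail area. This is a minimal sketch that again assumes a sample size of 30 and an assumed mean of 280 (both inferred, not quoted), and it uses the tabulated critical values for 29 degrees of freedom as a lookup table instead of computing the t-distribution directly.

```python
import math

# Tabulated t critical values (29 df) mapped to their upper-tail areas
UPPER_TAIL_AREA = {1.311: 0.1, 1.699: 0.05, 2.045: 0.025,
                   2.462: 0.01, 2.756: 0.005, 3.396: 0.001}

n = 30                # sample size (inferred)
assumed_mean = 280.0  # assumed population mean, in thousands of dollars (inferred)
sample_sd = 53.8656   # observed sample standard deviation, from the text
cutoff = 292.893      # value of the sample mean we ask about

# Standardize the cutoff to obtain a t-statistic
t_stat = (cutoff - assumed_mean) / (sample_sd / math.sqrt(n))

# Look up the closest tabulated critical value and its upper-tail area
closest = min(UPPER_TAIL_AREA, key=lambda c: abs(c - t_stat))
prob_above = UPPER_TAIL_AREA[closest]
print(round(t_stat, 3), prob_above)
```

A table lookup only works when the t-statistic lands on (or very near) a tabulated value, as it does here; other cutoffs would need the t-distribution's cumulative distribution function.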

So far, we have focused on the sampling distribution of sample means, \(M_Y\), but what we would really like to do is infer what the observed sample mean, \(m_Y\), tells us about the population mean, \(E(Y)\). Thus, while the preceding calculations have been useful for building up intuition about sampling distributions and manipulating probability statements, their main purpose has been to prepare the ground for the next two sections, which cover how to make statistical inferences about the population mean, \(E(Y)\).

1.5 Interval Estimation

We have already seen that the sample mean, \(m_Y\), is a good point estimate of the population mean, \(E(Y)\) (in the sense that it is unbiased; see Section 1.4). It is also helpful to know how reliable this estimate is, that is, how much sampling uncertainty is associated with it. A useful way to express this uncertainty is to calculate an interval estimate or confidence interval for the population mean, \(E(Y)\). The interval should be centered at the point estimate (in this case, \(m_Y\)), and since we are probably equally uncertain whether the population mean is lower or higher than this estimate, it should have the same amount of uncertainty on either side of the point estimate. We quantify this uncertainty with a number called the “margin of error.” Thus, the confidence interval is of the form “point estimate ± margin of error” or “(point estimate − margin of error, point estimate + margin of error).”
