Applied Regression Modeling. Iain Pardoe

We can express the model we have been using to estimate the population mean, $E(Y)$, as

\[ Y_i = E(Y) + e_i, \quad i = 1, \dots, n. \]

In other words, each sample $Y$-value (the index $i$ keeps track of the sample observations) can be decomposed into two pieces, a deterministic part that is the same for all values, and a random error part that varies from observation to observation. A convenient choice for the deterministic part is the population mean, $E(Y)$, since then the random errors have a (population) mean of zero. Since $E(Y)$ is the same for all $Y$-values, the random errors, $e$, have the same standard deviation as the $Y$-values themselves, that is, $\mathrm{SD}(Y)$. We can use this decomposition to derive the confidence interval and hypothesis test results of Sections 1.5 and 1.6 (although it would take more mathematics than we really need for our purposes in this book). Moreover, we can also use this decomposition to motivate the precise form of the uncertainty needed for prediction intervals (without having to get into too much mathematical detail).

In particular, write the $Y$-value to be predicted as $Y^*$, and decompose this into two pieces as above:

\[ Y^* = E(Y) + e^*. \]

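Then subtract $m_Y$, which represents potential values of repeated sample means, from both sides of this equation:

\[ \underbrace{Y^* - m_Y}_{\text{prediction error}} = \underbrace{E(Y) - m_Y}_{\text{estimation error}} + \underbrace{e^*}_{\text{random error}} \tag{1.1} \]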

Thus, in estimating the population mean, the only error we have to worry about is estimation error, whereas in predicting an individual images ‐value, we have to worry about both estimation error and random error.

Recall from Section 1.5 that the form of a confidence interval for the population mean is

\[ m_Y \pm t\text{-percentile} \left( s_Y / \sqrt{n} \right). \]

The term $s_Y/\sqrt{n}$ in this formula is an estimate of the standard deviation of the sampling distribution of sample means, $m_Y$, and is called the standard error of estimation. The square of this quantity, $s_Y^2/n$, is the estimated variance of the sampling distribution of sample means, $m_Y$. Then, thinking of $E(Y)$ as some fixed, unknown constant, $s_Y^2/n$ is also the estimated variance of the estimation error, $E(Y) - m_Y$, in expression (1.1).

The estimated variance of the random error, $e^*$, in expression (1.1) is $s_Y^2$. It can then be shown that the estimated variance of the prediction error, $Y^* - m_Y$, in expression (1.1) is $s_Y^2/n + s_Y^2 = s_Y^2 (1 + 1/n)$. Then, $s_Y \sqrt{1 + 1/n}$ is called the standard error of prediction.
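The claim that the two variances add can be checked informally by simulation. The sketch below is an illustration only, not part of the original derivation: it assumes a normal population with a hypothetical mean and standard deviation (`MU`, `SIGMA`), and it works with the true population variance rather than the sample estimate, so it checks the population-level counterpart of the result: the variance of the prediction error is close to the variance of the estimation error plus the variance of the random error.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population and sample size (assumptions for illustration only).
MU, SIGMA, N = 100.0, 15.0, 30
REPS = 20_000

est_errors, rand_errors, pred_errors = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m_y = statistics.fmean(sample)       # sample mean for this repetition
    y_star = random.gauss(MU, SIGMA)     # new individual value to be predicted
    est_errors.append(MU - m_y)          # estimation error, E(Y) - m_Y
    rand_errors.append(y_star - MU)      # random error, e*
    pred_errors.append(y_star - m_y)     # prediction error, Y* - m_Y

var_est = statistics.pvariance(est_errors)
var_rand = statistics.pvariance(rand_errors)
var_pred = statistics.pvariance(pred_errors)
theory = SIGMA**2 * (1 + 1 / N)          # sigma^2/n + sigma^2

print(f"variance of estimation error: {var_est:.2f}")
print(f"variance of random error:     {var_rand:.2f}")
print(f"variance of prediction error: {var_pred:.2f}")
print(f"sigma^2 * (1 + 1/n):          {theory:.2f}")
```

The two variances add because the new value $Y^*$ is drawn independently of the sample used to compute $m_Y$.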

Thus, in general, we can write a prediction interval for an individual $Y$-value as

\[ m_Y \pm t\text{-percentile} \left( s_Y \sqrt{1 + 1/n} \right), \]

where $m_Y$ is the sample mean, $s_Y$ is the sample standard deviation, $n$ is the sample size, and the t-percentile comes from a t-distribution with $n - 1$ degrees of freedom.
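As a minimal sketch of this calculation, the function below computes a prediction interval from a sample and a supplied t-percentile. The function name `prediction_interval` and the five data values are hypothetical, chosen only for illustration. The t-percentile is passed in by hand (here 2.776, the 97.5th percentile of a t-distribution with 4 degrees of freedom), as it would be when read from a table.

```python
import math
import statistics


def prediction_interval(sample, t_percentile):
    """Return (lower, upper) limits of a prediction interval for an individual value.

    t_percentile must be the appropriate percentile of a t-distribution
    with len(sample) - 1 degrees of freedom (e.g., looked up in a table).
    """
    n = len(sample)
    m_y = statistics.fmean(sample)             # sample mean
    s_y = statistics.stdev(sample)             # sample standard deviation (n - 1 divisor)
    se_prediction = s_y * math.sqrt(1 + 1 / n)  # standard error of prediction
    half_width = t_percentile * se_prediction
    return m_y - half_width, m_y + half_width


# Hypothetical data: n = 5, so the t-distribution has 4 degrees of freedom.
values = [10.0, 12.0, 14.0, 16.0, 18.0]
lower, upper = prediction_interval(values, t_percentile=2.776)
print(f"95% prediction interval: ({lower:.3f}, {upper:.3f})")
```

Replacing `math.sqrt(1 + 1 / n)` with `math.sqrt(1 / n)` would give the narrower confidence interval for the population mean, which makes the extra random-error term in the prediction interval easy to see.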

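For example, for a 95% interval (i.e., with 2.5% in each tail), the 97.5th percentile would be needed, whereas for a 90% interval (i.e., with 5% in each tail), the 95th percentile would be needed. These percentiles can be obtained from Table C.1. For example, the 95% prediction interval for an individual value of $Y$ (sale price) picked at random from the population of single-family home sale prices is calculated as

\[ m_Y \pm 97.5\text{th percentile} \left( s_Y \sqrt{1 + 1/n} \right), \]

where, for the sample of size $n = 30$, the 97.5th percentile comes from a t-distribution with 29 degrees of freedom (approximately 2.045).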

What about the interpretation of a prediction interval? For the home prices example, loosely speaking, we can say that "we are 95% confident that the sale price for an individual home picked at random from all single-family homes in this housing market will be between the lower and upper limits of the calculated interval." More precisely, if we were to take a large number of random samples of size 30 from our population of sale prices and calculate a 95% prediction interval for each, then 95% of those prediction intervals would contain the (unknown) sale price for an individual home picked at random from the population.
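The repeated-sampling interpretation can be illustrated with a short simulation. This is a sketch under stated assumptions, not a reproduction of the home prices example: the population is taken to be normal with a hypothetical mean and standard deviation (`MU`, `SIGMA`), and the 97.5th percentile of a t-distribution with 29 degrees of freedom is entered as 2.045. For each repetition, the code draws a sample of size 30, computes a 95% prediction interval, and checks whether a freshly drawn individual value falls inside it.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

# Hypothetical normal population (assumption for illustration only).
MU, SIGMA = 100.0, 15.0
N = 30
T_975_DF29 = 2.045   # 97.5th percentile of a t-distribution with 29 degrees of freedom
REPS = 20_000

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m_y = statistics.fmean(sample)
    s_y = statistics.stdev(sample)
    half_width = T_975_DF29 * s_y * math.sqrt(1 + 1 / N)
    y_star = random.gauss(MU, SIGMA)     # individual value picked at random
    if m_y - half_width <= y_star <= m_y + half_width:
        hits += 1

coverage = hits / REPS
print(f"Proportion of intervals containing the new value: {coverage:.3f}")
```

Under these assumptions the proportion should come out close to 0.95. If the population were far from normal, the coverage of this interval could differ from the nominal level.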

Interpretation of a prediction interval for an individual $Y$-value:

Suppose we have calculated a 95% prediction interval for an individual $Y$-value to be $(a, b)$.
