Applied Regression Modeling. Iain Pardoe
The example above becomes
Computer help #23 in the software information files available from the book website shows how to use statistical software to calculate confidence intervals for the population mean. As further practice, calculate a 90% confidence interval for the population mean for the home prices example (see Problem 1.10)—you should find that it is (
Now that we have calculated a confidence interval, what exactly does it tell us? Well, for the home prices example, loosely speaking, we can say that “we are 95% confident that the mean single‐family home sale price in this housing market is between
Interpretation of a confidence interval for a univariate mean:
Suppose we have calculated a 95% confidence interval for a univariate mean,
Before moving on to Section 1.6, which describes another way to make statistical inferences about population means—hypothesis testing—let us consider whether we can now forget the normal distribution. The calculations in this section are based on the central limit theorem, which does not require the population to be normal. We have also seen that t‐distributions are more useful than normal distributions for calculating confidence intervals. For large samples, it does not make much difference (note how the percentiles for t‐distributions get closer to the percentiles for the standard normal distribution as the degrees of freedom get larger in Table C.1), but for smaller samples it can make a large difference. So for this type of calculation, we always use a t‐distribution from now on. However, we cannot completely forget about the normal distribution yet; it will come into play again in a different context in later chapters.
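The convergence of t-percentiles toward standard normal percentiles described above can be checked numerically. The following is a minimal sketch, not the book's software instructions: it uses only the Python standard library, approximates t-percentiles by numerical integration and bisection, and the helper names (`t_pdf`, `t_cdf`, `t_percentile`, `t_interval`) are hypothetical. The summary statistics passed to `t_interval` are made-up illustration values, not the home prices data.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_percentile(p, df):
    """Approximate t-percentile by bisection (intended for 0.5 <= p < 1)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(sample_mean, sample_sd, n, level=0.95):
    """Interval: sample mean +/- t-percentile * (sample sd / sqrt(n)), df = n - 1."""
    p = 1 - (1 - level) / 2
    half_width = t_percentile(p, n - 1) * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# 97.5th percentile of the standard normal, for comparison.
z_975 = NormalDist().inv_cdf(0.975)

for df in (2, 5, 10, 30, 100, 1000):
    t_975 = t_percentile(0.975, df)
    print(f"df={df:5d}  t={t_975:.3f}  normal={z_975:.3f}  ratio={t_975 / z_975:.3f}")

# Hypothetical summary statistics (assumed values for illustration only).
print(t_interval(sample_mean=100.0, sample_sd=15.0, n=10))
```

The ratio column shows how much wider a t-based interval is than one based on the normal percentile. The difference is substantial for a handful of observations and negligible for large samples, which is why using a t-distribution throughout costs little.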
When using a t‐distribution, how do we know how many degrees of freedom to use? One way to think about degrees of freedom is in terms of the information provided by the data we are analyzing. Roughly speaking, each data observation provides one degree of freedom (this is where the
1.6 Hypothesis Testing
Another way to make statistical inferences about a population parameter such as the mean is to use hypothesis testing to make decisions about the parameter's value. Suppose that we are interested in a particular value of the mean single‐family home sale price, for example, a claim from a realtor that the mean sale price in this market is
1.6.1 The rejection region method
To decide between two competing claims, we can conduct a hypothesis test as follows:
Express the claim about a specific value for the population parameter of interest as a null hypothesis, denoted NH. In this textbook, we use this notation in place of the traditional H0, which, in the author's experience, is unfamiliar and awkward for many students. The null hypothesis needs to be in the form “parameter = some hypothesized value,” for example, NH: population mean = the value claimed by the realtor. A frequently used legal analogy is that the null hypothesis is equivalent to a presumption of innocence in a trial before any evidence has been presented.
Express the alternative claim as an alternative hypothesis, denoted AH. Again, in this book, we use this notation in place of the traditional Ha. The alternative hypothesis can be in a lower‐tail form, for example, AH: population mean < hypothesized value, or an upper‐tail form, for example, AH: population mean > hypothesized value, or a two‐tail form, for example, AH: population mean ≠ hypothesized value. The alternative hypothesis, also sometimes called the research hypothesis, is what we would like to demonstrate to be the case, and needs to be stated before looking at the data. To continue the legal analogy, the alternative hypothesis is guilt, and we will only reject the null hypothesis (innocence) if we favor the alternative hypothesis (guilt) beyond a reasonable doubt. To illustrate, we will presume for the home prices example that we have some reason to suspect that the mean sale price is higher than claimed by the realtor (perhaps a political organization