rel="nofollow" href="#u4064aa44-1729-52c4-85bc-49c35cc5e76f">Appendix E provides a brief introduction to the R statistical software package. We present R code for doing analyses throughout the text. Without these skills, it is virtually impossible to successfully build a regression model.
Figure 1.8 Regression model-building process.
CHAPTER 2
SIMPLE LINEAR REGRESSION
2.1 SIMPLE LINEAR REGRESSION MODEL
This chapter considers the simple linear regression model, that is, a model with a single regressor x that is related to a response y by a straight line. The simple linear regression model is

y = \beta_0 + \beta_1 x + \varepsilon \qquad (2.1)
where the intercept β0 and the slope β1 are unknown constants and ε is a random error component. The errors are assumed to have mean zero and unknown variance σ². Additionally, we usually assume that the errors are uncorrelated. This means that the value of one error does not depend on the value of any other error.
It is convenient to view the regressor x as controlled by the data analyst and measured with negligible error, while the response y is a random variable. That is, there is a probability distribution for y at each possible value for x. The mean of this distribution is
E(y \mid x) = \beta_0 + \beta_1 x \qquad (2.2a)
and the variance is
\operatorname{Var}(y \mid x) = \operatorname{Var}(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2 \qquad (2.2b)
Thus, the mean of y is a linear function of x although the variance of y does not depend on the value of x. Furthermore, because the errors are uncorrelated, the responses are also uncorrelated.
The parameters β0 and β1 are usually called regression coefficients. These coefficients have a simple and often useful interpretation. The slope β1 is the change in the mean of the distribution of y produced by a unit change in x. If the range of data on x includes x = 0, then the intercept β0 is the mean of the distribution of the response y when x = 0. If the range of x does not include zero, then β0 has no practical interpretation.
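To make this interpretation concrete, the following R sketch simulates observations from model (2.1). The parameter values β0 = 10, β1 = 2, and σ = 1 are made up purely for illustration and are not taken from any data set in the text.

# Simulate n observations from the simple linear regression model (2.1):
# y = beta0 + beta1*x + epsilon, with E(epsilon) = 0 and Var(epsilon) = sigma^2.
# All parameter values here are illustrative choices, not estimates.
set.seed(1)
n     <- 25
beta0 <- 10    # mean of y when x = 0 (x values near 0 occur in the simulated data)
beta1 <- 2     # change in the mean of y per unit change in x
sigma <- 1     # standard deviation of the random error
x   <- runif(n, 0, 10)
eps <- rnorm(n, mean = 0, sd = sigma)
y   <- beta0 + beta1 * x + eps

plot(x, y)
abline(a = beta0, b = beta1, lty = 2)   # true regression line E(y) = beta0 + beta1*x

The scatter of the points around the dashed line reflects the error variance σ²; the line itself is the mean of the distribution of y at each x.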
2.2 LEAST-SQUARES ESTIMATION OF THE PARAMETERS
The parameters β0 and β1 are unknown and must be estimated using sample data. Suppose that we have n pairs of data, say (y1, x1), (y2, x2), …, (yn, xn). As noted in Chapter 1, these data may result either from a controlled experiment designed specifically to collect the data, from an observational study, or from existing historical records (a retrospective study).
2.2.1 Estimation of β0 and β1
The method of least squares is used to estimate β0 and β1. That is, we estimate β0 and β1 so that the sum of the squares of the differences between the observations yi and the straight line is a minimum. From Eq. (2.1) we may write

y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n \qquad (2.3)
Equation (2.1) may be viewed as a population regression model while Eq. (2.3) is a sample regression model, written in terms of the n pairs of data (yi, xi), i = 1, 2, …, n. Thus, the least-squares criterion is
S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \qquad (2.4)
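Before deriving the closed-form solution, it may help to see the criterion minimized directly. The R sketch below reuses the simulated x and y from the example in Section 2.1, evaluates S(β0, β1), and minimizes it numerically with optim(); this is only an illustration of what the least-squares criterion does, not the approach used in the remainder of the chapter.

# Least-squares criterion S(beta0, beta1) from Eq. (2.4), using the
# simulated x and y created earlier (any paired numeric vectors would do).
S <- function(b) sum((y - b[1] - b[2] * x)^2)

# Minimize S numerically; the starting values c(0, 0) are arbitrary.
fit_num <- optim(par = c(0, 0), fn = S)
fit_num$par     # numerical least-squares estimates of beta0 and beta1
fit_num$value   # minimized residual sum of squares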
The least-squares estimators of β0 and β1, say \hat{\beta}_0 and \hat{\beta}_1, must satisfy

\frac{\partial S}{\partial \beta_0}\bigg|_{\hat{\beta}_0,\hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

and

\frac{\partial S}{\partial \beta_1}\bigg|_{\hat{\beta}_0,\hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) x_i = 0

Simplifying these two equations yields

n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i \qquad (2.5)
Equations (2.5) are called the least-squares normal equations. The solution to the normal equations is

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad (2.6)

and

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \qquad (2.7)

where

\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \quad \text{and} \quad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

are the averages of yi and xi, respectively. Therefore, the fitted simple linear regression model is

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \qquad (2.8)
Equation (2.8) gives a point estimate of the mean of y for a particular x.
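As a check on these formulas, the following R sketch computes \hat{\beta}_0 and \hat{\beta}_1 from Eqs. (2.6) and (2.7) for the simulated data used earlier and evaluates the fitted equation (2.8) at an arbitrarily chosen value x = 5.

# Closed-form least-squares estimates from Eqs. (2.6) and (2.7),
# again using the simulated x, y, and n from the earlier sketch.
xbar <- mean(x)
ybar <- mean(y)

b1 <- (sum(y * x) - sum(y) * sum(x) / n) /
      (sum(x^2)   - sum(x)^2 / n)          # Eq. (2.7)
b0 <- ybar - b1 * xbar                     # Eq. (2.6)

# Fitted model (2.8): point estimate of the mean of y at a particular x, say x = 5
x0     <- 5
y0_hat <- b0 + b1 * x0
c(b0 = b0, b1 = b1, y0_hat = y0_hat)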
Since the denominator of Eq. (2.7) is the corrected sum of squares of the xi and the numerator is the corrected sum of cross products of xi and yi, we may write these quantities in a more compact notation as
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = \sum_{i=1}^{n} (x_i - \bar{x})^2

and

S_{xy} = \sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n} = \sum_{i=1}^{n} y_i (x_i - \bar{x}) \qquad (2.9)
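The compact notation lends itself to direct computation. The R sketch below computes Sxx and Sxy for the same simulated data, forms the slope estimate as Sxy/Sxx (which is algebraically identical to Eq. (2.7)), and compares the result with the coefficients returned by R's built-in lm() function; the two should agree to rounding error.

# Corrected sums of squares and cross products from Eq. (2.9),
# computed on the same simulated x and y as before.
Sxx <- sum((x - mean(x))^2)       # equivalently sum(x^2) - sum(x)^2 / n
Sxy <- sum(y * (x - mean(x)))     # equivalently sum(y * x) - sum(y) * sum(x) / n

b1_compact <- Sxy / Sxx                        # same slope estimate as Eq. (2.7)
b0_compact <- mean(y) - b1_compact * mean(x)   # Eq. (2.6)

# Check against R's built-in least-squares fit
coef(lm(y ~ x))
c(b0_compact, b1_compact)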