Introduction to Linear Regression Analysis. Douglas C. Montgomery
2061.30 17.00
2207.50 5.50
1708.30 19.00
1784.70 24.00
2575.00 2.50
2357.90 7.50
2256.70 11.00
2165.20 13.00
2399.55 3.75
1779.80 25.99
2336.75 9.75
1765.30 22.00
2053.50 18.00
2414.40 6.00
2200.50 12.50
2654.20 2.00
1753.70 21.50
proc reg;
model shear=age/p clm cli;
run;
TABLE 2.8 SAS Output for Analysis of Rocket Propellant Data.
SAS also produces a log file that provides a brief summary of the SAS session. The log file is nearly indispensable for debugging SAS code. Appendix D provides more details about this file.
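In the MODEL statement above, the options p, clm, and cli ask PROC REG to print, respectively, the predicted values, confidence limits on the mean response, and prediction limits for a new observation (95% limits by default). As a rough, non-authoritative illustration of what those limits are, the Python sketch below computes them for a simple linear regression using the usual intervals yhat0 ± t * sqrt(MS_Res * (1/n + (x0 - xbar)^2 / Sxx)) for the mean response and yhat0 ± t * sqrt(MS_Res * (1 + 1/n + (x0 - xbar)^2 / Sxx)) for a new observation. The function names fit_line and prediction_limits and the toy data are hypothetical, and because the Python standard library has no t distribution, the caller supplies the quantile t = t_{alpha/2, n-2}.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x.

    Returns (b0, b1, ms_res, xbar, sxx), where ms_res = SS_Res / (n - 2).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, ss_res / (n - 2), xbar, sxx


def prediction_limits(x, y, x0, t_crit):
    """Fitted value at x0 plus limits analogous to the CLM and CLI output.

    Hypothetical helper, not SAS code. t_crit is t_{alpha/2, n-2} and must be
    supplied by the caller (the standard library has no t distribution).
    """
    n = len(x)
    b0, b1, ms_res, xbar, sxx = fit_line(x, y)
    y0 = b0 + b1 * x0
    h0 = 1 / n + (x0 - xbar) ** 2 / sxx               # grows as x0 moves away from xbar
    half_mean = t_crit * math.sqrt(ms_res * h0)        # CI on the mean response (CLM)
    half_new = t_crit * math.sqrt(ms_res * (1 + h0))   # PI on a new observation (CLI)
    return {
        "predicted": y0,
        "clm": (y0 - half_mean, y0 + half_mean),
        "cli": (y0 - half_new, y0 + half_new),
    }


# Hypothetical toy data, not the rocket propellant data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
# With n = 5 observations, t_{0.025,3} is about 3.182.
print(prediction_limits(x, y, x0=3.0, t_crit=3.182))
```

For the 20 rocket propellant observations the corresponding quantile would be t_{0.025,18}, approximately 2.101. The prediction interval is always wider than the confidence interval at the same x0, because it adds the variance of the new observation itself.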
R is a popular statistical software package, primarily because it is freely available at www.r-project.org. R Commander provides an easier-to-use interface to R. R itself is a high-level programming language, and most of its commands are prewritten functions. It can also run loops and call routines written in other languages, such as C. Because it is primarily a programming language, it often presents challenges to novice users. This section shows the reader how to use R to analyze simple linear regression data sets.
The first step is to create the data set. The easiest way is to enter the data into a text file, using spaces as delimiters. Each row of the data file is a record. The top row should give the names of the variables; all other rows are the actual data records. For example, consider the rocket propellant data from Example 2.1 given in Table 2.1. Let propellant.txt be the name of the data file. The first row of the text file gives the variable names:
strength age
The next row is the first data record, with spaces delimiting each data item:
2158.70 15.50
The R code to read the data into the package is:
prop <- read.table("propellant.txt", header=TRUE, sep="")
The object prop is the R data set, and "propellant.txt" is the original data file. The argument header=TRUE tells R that the first row contains the variable names, and sep="" tells R that the data items are separated by white space.
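The file layout described above (one header row of names, then one whitespace-delimited record per row) is simple enough to parse by hand. The Python sketch below is only a loose analogue of read.table(..., header=TRUE, sep="") for purely numeric columns; the function name read_table is hypothetical, and it does none of R's type guessing, quoting, or missing-value handling.

```python
def read_table(text):
    """Parse whitespace-delimited numeric data whose first row holds the
    variable names. Hypothetical helper, loosely mirroring R's
    read.table(..., header=TRUE, sep="") for numeric columns only."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    names = rows[0]
    columns = {name: [] for name in names}
    for record in rows[1:]:
        # Every record must have one value per variable name.
        if len(record) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {record}")
        for name, value in zip(names, record):
            columns[name].append(float(value))
    return columns


# The header row and first record shown in the text.
sample = """strength age
2158.70 15.50
"""
prop = read_table(sample)
print(prop)  # {'strength': [2158.7], 'age': [15.5]}
```

To read the full data file instead, one could write `with open("propellant.txt") as f: prop = read_table(f.read())`.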
The commands
prop.model <- lm(strength ~ age, data=prop)
summary(prop.model)
tell R to estimate the model and to print the estimated coefficients and their tests, together with the overall F test from the analysis of variance.
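For a simple linear regression, the quantities that summary() reports follow from the least-squares formulas b1 = Sxy/Sxx and b0 = ybar - b1*xbar. The Python sketch below computes them directly from those textbook formulas; it is an illustration under stated assumptions, not how R implements lm() (which works through a QR decomposition). The function name simple_ols_summary and the toy data are hypothetical, and p-values are omitted because the standard library has no t or F distribution.

```python
import math


def simple_ols_summary(x, y):
    """Estimates, standard errors, t statistics, overall F, and R^2 for
    y = b0 + b1*x (hypothetical helper; p-values omitted)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ss_t = sum((yi - ybar) ** 2 for yi in y)       # total sum of squares
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_r = b1 * sxy                                # regression sum of squares
    ss_res = ss_t - ss_r                           # residual sum of squares
    ms_res = ss_res / (n - 2)                      # estimate of sigma^2
    se_b0 = math.sqrt(ms_res * (1 / n + xbar ** 2 / sxx))
    se_b1 = math.sqrt(ms_res / sxx)
    return {
        "b0": b0, "b1": b1,
        "se_b0": se_b0, "se_b1": se_b1,
        "t_b0": b0 / se_b0, "t_b1": b1 / se_b1,
        "F": ss_r / ms_res,                        # MS_R / MS_Res on 1 and n-2 df
        "R2": ss_r / ss_t,
    }


# Hypothetical toy data, not the rocket propellant data.
summary = simple_ols_summary([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(summary)
```

In simple linear regression the overall F statistic equals the square of the t statistic for the slope, so the two tests of H0: b1 = 0 are equivalent.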
R Commander is an add-on package to R. It is also freely available. It provides an easy-to-use interface, much like Minitab and JMP, to the parent R product. R Commander makes R much more convenient to use; however, it offers little flexibility in the analysis. R Commander is a good way for users to get familiar with R. Ultimately, however, we recommend the use of the parent R product.
Figure 2.9 Two influential observations.
Figure 2.10 A point remote in x space.
2.10 SOME CONSIDERATIONS IN THE USE OF REGRESSION
Regression analysis is widely used and, unfortunately, frequently misused. There are several common abuses of regression that should be mentioned:
1 Regression models are intended as interpolation equations over the range of the regressor variable(s) used to fit the model. As observed previously, we must be careful if we extrapolate outside of this range. Refer to Figure 1.5.
2 The disposition of the x values plays an important role in the least-squares fit. While all points have equal weight in determining the height of the line, the slope is more strongly influenced by the remote values of x. For example, consider the data in Figure 2.9. The slope in the least-squares fit depends heavily on either or both of the points A and B. Furthermore, the remaining data would give a very different estimate of the slope if A and B were deleted. Situations such as this often require corrective action, such as further analysis and possible deletion of the unusual points, estimation of the model parameters with some technique that is less seriously influenced by these points than least squares, or restructuring the model, possibly by introducing further regressors. A somewhat different situation is illustrated