Handbook of Regression Analysis With Applications in R. Samprit Chatterjee
(VIF) values are given for each predictor. It is apparent that there is virtually no collinearity among these predictors (recall that
Best subsets regression output; the candidate predictors (in the output's column order) are Bedrooms, Bathrooms, Living.area, Lot.size, Year.built, and Property.tax.

Vars  R-Sq  R-Sq(adj)  Mallows Cp    AICc      S
   1  35.3       34.6        21.2  1849.9  52576
   1  29.4       28.6        30.6  1857.3  54932
   1  10.6        9.5        60.3  1877.4  61828
   2  46.6       45.2         5.5  1835.7  48091
   2  38.9       37.5        17.5  1847.0  51397
   2  37.8       36.3        19.3  1848.6  51870
   3  49.4       47.5         3.0  1833.1  47092
   3  48.2       46.3         4.9  1835.0  47635
   3  46.6       44.7         7.3  1837.5  48346
   4  50.4       48.0         3.3  1833.3  46885
   4  49.5       47.0         4.7  1834.8  47304
   4  49.4       46.9         5.0  1835.1  47380
   5  50.6       47.5         5.0  1835.0  47094
   5  50.5       47.3         5.3  1835.2  47162
   5  49.6       46.4         6.7  1836.8  47599
   6  50.6       46.9         7.0  1836.9  47381
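Tables like the one above come from an exhaustive best-subsets search. Every subset of the candidate predictors is fit by least squares, and for each subset size only the few subsets with the largest $R^2$ are kept. The sketch below illustrates that idea in plain Python under stated assumptions and is not the software that produced the table. The function names `rss_of_fit` and `best_subsets` are hypothetical. The fit solves the normal equations for brevity, although a QR-based solver is numerically safer. The toy data `data` and `y` are invented for illustration.

```python
from itertools import combinations


def rss_of_fit(columns, y):
    """Residual sum of squares from regressing y on an intercept plus `columns`.

    Solves the normal equations X'X b = X'y by Gaussian elimination with
    partial pivoting; adequate for a small illustration only.
    """
    n, k = len(y), len(columns) + 1
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Forward elimination to upper-triangular form.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(c + 1, k):
            factor = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= factor * a[c][j]
    # Back substitution for the coefficients.
    b = [0.0] * k
    for c in reversed(range(k)):
        b[c] = (a[c][k] - sum(a[c][j] * b[j] for j in range(c + 1, k))) / a[c][c]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(rows, y))


def best_subsets(predictors, y, keep=3):
    """Map each subset size to its `keep` subsets with the largest R^2."""
    ybar = sum(y) / len(y)
    tss = sum((yi - ybar) ** 2 for yi in y)
    results = {}
    for size in range(1, len(predictors) + 1):
        fits = [(1 - rss_of_fit([predictors[v] for v in subset], y) / tss, subset)
                for subset in combinations(predictors, size)]
        fits.sort(reverse=True)  # largest R^2 first
        results[size] = fits[:keep]
    return results


# Invented toy data: y depends mainly on x1.
data = {"x1": [1, 2, 3, 4, 5, 6, 7, 8],
        "x2": [3, 1, 4, 1, 5, 9, 2, 6],
        "x3": [2, 7, 1, 8, 2, 8, 1, 8]}
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1]
y = [5 + 2 * a + e for a, e in zip(data["x1"], noise)]

for size, fits in best_subsets(data, y).items():
    for r2, subset in fits:
        print(size, round(100 * r2, 1), subset)
```

Exhaustive search fits $2^k - 1$ models for $k$ candidate predictors. That is 63 models for the six predictors here, which is cheap at this size, but the count doubles with every additional candidate.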
Output of this type provides the tools to choose among candidate models. It gives summary statistics for the three best-fitting models for each number of predictors. For example, the best one-predictor model is based on Bathrooms, while the second best is based on Living.area; the best two-predictor model is based on Bathrooms and Living.area; and so on. The principle of parsimony noted earlier implies moving down the table as long as the gain in fit is large enough, but no further, thereby encouraging simplicity. A reasonable model selection strategy would not rely on only one possible measure. It would look at all of the measures together and use various guidelines to narrow the choice to a few models (or only one) that best trade off strength of fit with simplicity, for example as follows:
1 Increase the number of predictors until the $R^2$ value levels off. Clearly, the highest $R^2$ for a given number of predictors $p$ cannot be smaller than that for a smaller value of $p$. If $R^2$ levels off, that implies that additional variables are not providing much additional fit. In this case, the largest $R^2$ values go from roughly 35% to 47% from $p = 1$ to $p = 2$, which is clearly a large gain in fit, but beyond that more complex models do not provide much additional fit (particularly past $p = 4$). Thus, this guideline suggests choosing either $p = 3$ or $p = 4$.
2 Choose the model that maximizes the adjusted $R^2$. Recall from equation (1.7) that the adjusted $R^2$ equals
$$R^2_a = R^2 - \frac{p}{n-p-1}\left(1 - R^2\right).$$
It is apparent that $R^2_a$ explicitly trades off strength of fit ($R^2$) versus simplicity [the multiplier $p/(n-p-1)$], and can decrease if predictors that do not add any predictive power are added to a model. Thus, it is reasonable not to complicate a model beyond the point where its adjusted $R^2$ stops increasing. For these data, $R^2_a$ is maximized at $p = 4$.
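The first two guidelines can be applied mechanically to the best model of each size in the table. In this sketch the sample size `N = 85` is an assumption. It is not stated in this excerpt, but it reproduces the tabulated $C_p$ values. The `min_gain` threshold in the hypothetical helper `levels_off` encodes the judgment of what counts as "leveling off". Because the tabulated $R^2$ values are rounded to one decimal, the recomputed adjusted $R^2$ values can differ from the table by about 0.1.

```python
# Best R^2 (percent) for each number of predictors p, read off the table above.
BEST_R2 = {1: 35.3, 2: 46.6, 3: 49.4, 4: 50.4, 5: 50.6, 6: 50.6}
# Tabulated adjusted R^2 (percent) for the same models, for comparison.
BEST_R2_ADJ = {1: 34.6, 2: 45.2, 3: 47.5, 4: 48.0, 5: 47.5, 6: 46.9}
N = 85  # assumed sample size (not stated in this excerpt)


def adjusted_r2(r2, n, p):
    """Adjusted R^2: R^2 minus the complexity penalty p/(n - p - 1) * (1 - R^2)."""
    return r2 - p / (n - p - 1) * (1 - r2)


def levels_off(best_r2, min_gain):
    """Guideline 1: smallest p after which adding a predictor raises the best
    R^2 by less than `min_gain` percentage points (a judgment-call threshold)."""
    sizes = sorted(best_r2)
    for p, nxt in zip(sizes, sizes[1:]):
        if best_r2[nxt] - best_r2[p] < min_gain:
            return p
    return sizes[-1]


# Guideline 2: maximize adjusted R^2, recomputed from the rounded R^2 values.
adj = {p: 100 * adjusted_r2(r2 / 100, N, p) for p, r2 in BEST_R2.items()}
best_p = max(adj, key=adj.get)

for p in BEST_R2:
    print(p, "recomputed", round(adj[p], 2), "tabulated", BEST_R2_ADJ[p])
print("guideline 1, min_gain=2.0:", levels_off(BEST_R2, 2.0))
print("guideline 1, min_gain=0.5:", levels_off(BEST_R2, 0.5))
print("guideline 2 (max adjusted R^2):", best_p)
```

With `min_gain=2.0` the scan stops at $p = 3$, and with `min_gain=0.5` it stops at $p = 4$. These are the two choices that guideline 1 leaves open.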
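The fourth column in the output refers to a criterion called Mallows' $C_p$, which equals
$$C_p = \frac{\text{Residual SS}_p}{\hat\sigma^2_{\text{full}}} - n + 2p + 2,$$
where $\text{Residual SS}_p$ is the residual sum of squares for the model being examined, $p$ is the number of predictors in that model, and $\hat\sigma^2_{\text{full}}$ is the residual mean square based on using all of the candidate predictors. Like the adjusted $R^2$, $C_p$ trades off strength of fit against simplicity, but small values are desirable and its penalty for complexity is stronger. This suggests another model selection rule: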
3 Choose the model that minimizes $C_p$. In case of tied values, the simplest model (smallest $p$) would be chosen. In these data, this rule implies choosing $p = 3$.
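The S column is the residual standard error, so each model's residual sum of squares is $S^2(n - p - 1)$. The full-model variance $\hat\sigma^2_{\text{full}}$ is the squared S of the six-predictor model. From these, the $C_p$ column can be recomputed from the definition $C_p = \text{Residual SS}_p/\hat\sigma^2_{\text{full}} - n + 2p + 2$, and the minimum-$C_p$ rule can then be applied. This is a sketch assuming `N = 85`, which is not given in this excerpt. `mallows_cp` is a hypothetical helper name.

```python
N = 85  # assumed sample size (not stated in this excerpt)

# (p, tabulated Cp, S) for every row of the best-subsets table above.
ROWS = [
    (1, 21.2, 52576), (1, 30.6, 54932), (1, 60.3, 61828),
    (2, 5.5, 48091), (2, 17.5, 51397), (2, 19.3, 51870),
    (3, 3.0, 47092), (3, 4.9, 47635), (3, 7.3, 48346),
    (4, 3.3, 46885), (4, 4.7, 47304), (4, 5.0, 47380),
    (5, 5.0, 47094), (5, 5.3, 47162), (5, 6.7, 47599),
    (6, 7.0, 47381),
]


def mallows_cp(rss_p, sigma2_full, n, p):
    """Mallows' Cp = Residual SS_p / sigma2_full - n + 2p + 2."""
    return rss_p / sigma2_full - n + 2 * p + 2


# Residual SS_p = S^2 (n - p - 1); sigma2_full is S^2 of the six-predictor model.
sigma2_full = 47381 ** 2
computed = [(p, mallows_cp(s ** 2 * (N - p - 1), sigma2_full, N, p))
            for p, _, s in ROWS]

# Minimum-Cp rule: smallest Cp, ties broken in favor of the smaller p.
chosen_p, chosen_cp = min(computed, key=lambda pc: (pc[1], pc[0]))

for (p, cp_table, _), (_, cp) in zip(ROWS, computed):
    print(p, cp_table, round(cp, 2))
print("minimum-Cp rule chooses p =", chosen_p)
```

The full model's $C_p$ is always exactly $p + 1$ (here 7), because its residual mean square is $\hat\sigma^2_{\text{full}}$ itself.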
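An additional operational rule for the use of $C_p$ has been suggested. When a particular model contains all of the necessary predictors, its residual mean square should be roughly equal to the error variance $\sigma^2$. The model that includes all of the candidate predictors should also include all of the necessary ones, so $\hat\sigma^2_{\text{full}}$ should also be roughly equal to $\sigma^2$. This implies that if a model includes all of the necessary predictors, then $C_p \approx p + 1$.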
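The approximation follows from the definition $C_p = \text{Residual SS}_p/\hat\sigma^2_{\text{full}} - n + 2p + 2$. Write $\hat\sigma^2_p = \text{Residual SS}_p/(n - p - 1)$ for the residual mean square of the model being examined. Then:

$$
\begin{aligned}
\hat\sigma^2_p \approx \sigma^2 \approx \hat\sigma^2_{\text{full}}
&\;\Longrightarrow\; \frac{\text{Residual SS}_p}{\hat\sigma^2_{\text{full}}} = (n-p-1)\,\frac{\hat\sigma^2_p}{\hat\sigma^2_{\text{full}}} \approx n-p-1 \\
&\;\Longrightarrow\; C_p \approx (n-p-1) - n + 2p + 2 = p + 1.
\end{aligned}
$$

For the full model the relation is exact, since there $\hat\sigma^2_p = \hat\sigma^2_{\text{full}}$ and so $C_p = p + 1$. This is why the six-predictor row of the table shows $C_p = 7.0$.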
This suggests the following model selection rule:
4 Choose the simplest model such that $C_p \approx p + 1$ or smaller. In these data, this rule implies choosing $p = 3$.
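The rule of choosing the simplest model whose $C_p$ is at or below roughly $p + 1$ can be sketched as a scan over subset sizes, using the $C_p$ column of the table. The hypothetical helper `simplest_adequate` and its `slack` argument are assumptions. `slack` makes explicit how loosely "approximately" is read, and `slack=0` takes the rule literally.

```python
# (p, Cp) for every row of the best-subsets table above.
CP_ROWS = [(1, 21.2), (1, 30.6), (1, 60.3), (2, 5.5), (2, 17.5), (2, 19.3),
           (3, 3.0), (3, 4.9), (3, 7.3), (4, 3.3), (4, 4.7), (4, 5.0),
           (5, 5.0), (5, 5.3), (5, 6.7), (6, 7.0)]


def simplest_adequate(rows, slack=0.0):
    """Smallest p having some model with Cp <= p + 1 + slack, or None.

    `slack` is a hypothetical tolerance for reading "approximately p + 1";
    slack=0 means "p + 1 or smaller" taken literally.
    """
    for p, cp in sorted(rows):  # scan from the simplest models upward
        if cp <= p + 1 + slack:
            return p
    return None


print(simplest_adequate(CP_ROWS))           # 3
print(simplest_adequate(CP_ROWS, slack=3))  # 2
```

Loosening the tolerance to `slack=3` admits the best two-predictor model ($C_p = 5.5 \le 2 + 1 + 3$). This shows how much this guideline depends on how "approximately" is interpreted.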
A weakness of the
where the