Statistics. David W. Scott

but not its brain weight.) Moreover, the relationship appears to be linear. In this re‐expressed scatter diagram, the two or three outliers identified in the first plot are no longer outliers.
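
The effect of re‐expressing both variables on a log scale can be sketched in a few lines. The data below are synthetic stand‐ins generated from an assumed power law, not the 62 mammal measurements, and the names `body`, `brain`, and `correlation` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the 62-mammal data set): body weights spanning
# several orders of magnitude, with brain weight following a noisy power law.
body = 10 ** rng.uniform(-2, 4, size=62)
brain = 10.0 * body ** 0.75 * 10 ** rng.normal(0, 0.2, size=62)


def correlation(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


r_raw = correlation(body, brain)
r_log = correlation(np.log10(body), np.log10(brain))

# Straight-line fit on the log-log scale: log10(brain) ~ a + b * log10(body).
b, a = np.polyfit(np.log10(body), np.log10(brain), deg=1)

print(f"correlation, raw scale: {r_raw:.3f}")
print(f"correlation, log scale: {r_log:.3f}")
print(f"fitted log-log slope:   {b:.3f}")
```

On the log scale, a power‐law relationship becomes a straight line whose slope is the exponent, which is why a few very heavy animals stop looking like outliers after the transformation.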

      1.2.2 Space Shuttle Flight 25

overnight, and it was 36°F when the launch was attempted at 11:38 a.m. During the first 90 s, several O‐rings on the solid rocket boosters failed, leading to a catastrophic explosion and the loss of all seven crew members. Scientists knew that previous shuttle flights had occasionally experienced one or two O‐ring failures, but a launch had never been attempted at freezing temperatures. Differing opinions about the safety of the launch were given to the launch director, who eventually decided to proceed. One of the data analyses is reproduced in the first row of Figure 1.5.

Graphs depict the raw and log‐transformed body and brain weights of 62 land mammals.

Graphs depict the analysis of the number of O-ring failures for the first twenty-four Space Shuttle launches.

      In the second frame, we have jittered the data by adding a little uniform noise. This reveals that two data points were superimposed at the same location; jittering broke that tie. In the third frame, the data are replotted, but with an expanded temperature axis to include 28°F. Would you have supported the decision to launch? A least‐squares line (discussed in Chapter 8.5) is superimposed. This line suggests that, if anything, lower temperatures might result in fewer O‐ring failures. Thus the launch was attempted.
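
      Jittering and the least‐squares extrapolation can be sketched as follows. The numbers are invented placeholders, not the shuttle record, so neither the sign of the fitted slope nor the predicted value should be read as a finding; `jitter` and its half‐widths are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical placeholder values, NOT the shuttle record: launch temperature
# and number of O-ring failures, with one exact duplicate pair (70, 1).
temperature = np.array([53.0, 57.0, 63.0, 70.0, 70.0, 75.0])
failures = np.array([2.0, 1.0, 1.0, 1.0, 1.0, 2.0])


def jitter(values, half_width, rng):
    """Add independent Uniform(-half_width, half_width) noise to each value."""
    return values + rng.uniform(-half_width, half_width, size=values.shape)


# Before jittering, tied points are plotted on top of one another.
n_distinct_before = len(set(zip(temperature, failures)))

jittered_t = jitter(temperature, 0.5, rng)
jittered_f = jitter(failures, 0.1, rng)
n_distinct_after = len(set(zip(jittered_t, jittered_f)))

# Least-squares line fitted to the original (unjittered) values.
slope, intercept = np.polyfit(temperature, failures, deg=1)

# Extrapolating the line to a temperature far below anything observed.
predicted_at_28 = intercept + slope * 28.0

print(f"distinct points before/after jitter: {n_distinct_before}/{n_distinct_after}")
print(f"fitted line: failures = {intercept:.3f} + {slope:.4f} * temperature")
print(f"extrapolated value at 28 degrees: {predicted_at_28:.3f}")
```

      Note that the line is fitted to the original values; the jittered copies are meant only for plotting. The prediction at 28 lies well outside the range of the fitted temperatures, which is exactly the situation the expanded axis makes visible.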

      1.2.3 Pearson's Father–Son Height Data Revisited

      In the top right frame, we have placed a red dot at the location of the average heights of the fathers and sons. We have also drawn the straight line given by the intuitive equation $y = x$, where $x$ denotes the father's height and $y$ the son's height. However, the equation $y = x + 1$ is an improvement, since we observed earlier that sons were 1 inch taller than their fathers on average. As a reference, we have also included a horizontal line at the average height of the sons. This line would be appropriate if there were no information about a son's height to be gleaned from his father's height; but a positive relationship (correlation) is clear.

Graphs depict scatter plots of the father and son height data collected by Karl Pearson.

      In the final frame, we take advantage of the large sample size to try to understand whether the prediction (as weak as it may be) is linear or nonlinear. For each integer value of the rounded fathers' heights, we compute a three‐point summary of the corresponding sons' heights. Each red dot is the arithmetic average of the sons' heights. The vertical lines display the (conditional) interquartile range. The final two red dots on each end are based on only a few points, so the IQR cannot be computed. These four red dots are drawn at a smaller size to indicate that even the averages are not very reliable.
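
      The conditional three‐point summary can be sketched as below. The heights are simulated stand‐ins, not Pearson's measurements, and the function name `conditional_summary` and the `min_count` threshold are assumptions introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for the father-son heights (NOT Pearson's data):
# heights in inches, sons about 1 inch taller on average, slope near 1/2.
n = 1000
father = rng.normal(68.0, 2.7, size=n)
son = 69.0 + 0.5 * (father - 68.0) + rng.normal(0.0, 2.4, size=n)


def conditional_summary(x, y, min_count=5):
    """For each integer value of round(x), return (count, mean, q1, q3) of y.

    The quartiles are reported as None when a group has fewer than
    `min_count` points, since the IQR is not meaningful for tiny groups.
    """
    rounded = np.rint(x).astype(int)
    summary = {}
    for value in np.unique(rounded):
        group = y[rounded == value]
        mean = float(group.mean())
        if group.size >= min_count:
            q1, q3 = (float(q) for q in np.percentile(group, [25, 75]))
        else:
            q1 = q3 = None
        summary[int(value)] = (int(group.size), mean, q1, q3)
    return summary


summary = conditional_summary(father, son)
for height, (count, mean, q1, q3) in summary.items():
    iqr = "n/a" if q1 is None else f"[{q1:.1f}, {q3:.1f}]"
    print(f"father ~{height} in: n={count:4d}  mean son={mean:.1f}  IQR={iqr}")
```

      Plotting the group means against the rounded fathers' heights gives a simple check on linearity: if the means fall close to a straight line, a linear predictor is a reasonable description of the trend.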

      We see that these summary points clearly suggest a linear rather than a nonlinear fit. We also see that the two blue reference lines from the second frame, namely $y = x$ and $y = x + 1$, both miss badly. A new (dashed) line with a slope of 1/2 appears to capture the linear trend quite well. The relationship between this slope and the correlation coefficient, as well as a genetic explanation, will be discussed in Chapter 4.1.5.
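
      As a preview of that relationship, the standard least‐squares line through the point of averages can be written as follows. The notation is an assumption made here (father's height $x$, son's height $y$, sample means $\bar{x}$ and $\bar{y}$, sample standard deviations $s_x$ and $s_y$, correlation coefficient $r$) and may differ from the symbols used in Chapter 4.1.5.

```latex
\[
  \hat{y} = \bar{y} + b\,(x - \bar{x}),
  \qquad
  b = r\,\frac{s_y}{s_x}.
\]
```

      If the fathers' and sons' heights have roughly equal spread, so that $s_x \approx s_y$, then $b \approx r$. Under that assumption, a slope near 1/2 corresponds to a correlation coefficient near 1/2.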

      1.2.4 Discussion

      These rather substantial examples illustrate the search for structure in distribution and prediction problems, as well as practical problems and cures that may be encountered. A

