Statistics. David W. Scott

Statistics - David W. Scott


Скачать книгу
Lord Rayleigh's Data

measurements from 1892 to 1894, with a mean of 2.30584 and a standard deviation of 0.00537. It is common to assume such measurements of a fundamental quantity are normally distributed. Multiple experiments are run and the results averaged in the presumption that a more accurate estimate will result.

2.29816 2.29849 2.29869 2.29889 2.29890
2.29940 2.30054 2.30074 2.30143 2.30182
2.30956 2.30986 2.31001 2.31010 2.31010
2.31012 2.31017 2.31024 2.31024 2.31026
2.31027 2.31028 2.31035 2.31163
command
.

.

      1.1.3 Discussion

      Finding structure in data is a primary goal of data science. Graphical methods are powerful approaches to discovering unexpected or hidden structure. Some of these methods are better suited to small datasets. In a multivariate statistics course, we will learn how to analyze data with more than one variable. Modern genetic datasets often result in more than

variables!

      The second fundamental task of statistics is prediction. Data for this task are typically ordered pairs,

. The goal is to predict the value of the
variable using the corresponding value of the
variable. For example, we might try to predict a son's height (
) knowing the father's height (
). Or a bank contemplating a mortgage loan may use a person's credit score to predict the probability the person will default on the loan.

data points in order to determine if there is a strong relationship between
and
. The relationship, if it exists, is linear or nonlinear. If knowledge of
does not convey any information about the value of
, then the scatter diagram will have no slope or trends, with
values just scattered around their average.

      1.2.1 Body and Brain Weights of Land Mammals

MASS library. The relationship, if any, is hard to discern since 59 of the measurements are overplotted near the origin. We might choose to exclude the two elephant measurements and even the human data point as outliers, and then replot.

      However, Tukey introduced a power transformation ladder to re‐express a variable

:

      see Problem 3 for an explanation of why

is used in place of
when
.

      In the right frame of Figure 1.4, we use the log function to dramatic effect. There clearly is a strong relationship that allows highly accurate prediction of the log(brain weight) of a land mammal knowing its log(body weight). (The body weight


Скачать книгу