Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis

may also wish to compute what is known as the coefficient of variation, which is a ratio of the standard deviation to the mean. We can estimate this coefficient for parent and child respectively in Galton's data:

      Computing the coefficient of variation is a way of comparing the variability of competing distributions relative to each distribution's mean. We can see that the dispersion of child relative to its mean (0.037) is slightly larger than the dispersion of parent relative to its mean (0.026).
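      The ratio described above can be sketched in a few lines of R. This is only an illustration: the helper name `cv` is hypothetical, the `heights` vector is made up, and the arithmetic check uses the rounded means and standard deviations for parent and child that appear in the `describe(Galton)` output in this section rather than the raw data.

```r
# Coefficient of variation: standard deviation relative to the mean.
# `cv` is a hypothetical helper name, not a base R function.
cv <- function(x) sd(x) / mean(x)

# Arithmetic check using the rounded summary values (sd / mean)
# reported for parent and child in Galton's data.
cv_parent <- 1.79 / 68.31
cv_child  <- 2.52 / 68.09
round(c(parent = cv_parent, child = cv_child), 3)

# Applying the helper to raw data (a small made-up vector for illustration).
heights <- c(64, 66, 68, 70, 72)
cv(heights)
```

      Because the coefficient is unit-free, it allows two variables measured on different scales to be compared directly, provided both means are well away from zero.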

      In our discussion of variance, we saw that if we wanted to use the sample variance as an estimator of the population variance, we needed to subtract 1 from the denominator. That is, $S^2$ was “corrected” into $s^2$:

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}$$

      We say we lost a degree of freedom in the denominator of the statistic. But what are degrees of freedom? They are the number of independent units of information in a sample that are relevant to the estimation of some parameter (Everitt, 2002). In the case of the sample variance, $s^2$, one degree of freedom is lost because we are using $s^2$ as an estimator of $\sigma^2$. The numerator, $\sum_{i=1}^{n}(y_i - \bar{y})^2$, is not based on $n$ independent pieces of information, since $\mu$ had to be estimated by $\bar{y}$. The values of $y_i$ are not independent of $\bar{y}$, because $\bar{y}$ is fixed by the given sample data: once $\bar{y}$ and any $n - 1$ of the values are known, the remaining value is determined. In general, estimating a parameter “costs” a degree of freedom. Had we known $\mu$, so that the numerator could be computed as $\sum_{i=1}^{n}(y_i - \mu)^2$, we would not have lost a degree of freedom, since $\mu$ would be a known (not estimated) parameter.
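      A small numerical sketch may make the lost degree of freedom concrete. The data below are made up for illustration. The code shows that deviations from the sample mean must sum to zero, so the last deviation carries no new information, and that R's built-in `var()` divides by n - 1.

```r
# Small made-up sample for illustration.
y <- c(2, 4, 6, 8, 10)
n <- length(y)

# Deviations from the sample mean always sum to zero ...
dev <- y - mean(y)
sum(dev)

# ... so the last deviation is fixed once the other n - 1 are known.
last_dev <- -sum(dev[1:(n - 1)])
last_dev == dev[n]

# Uncorrected (divide by n) versus corrected (divide by n - 1) variance.
S2 <- sum(dev^2) / n
s2 <- sum(dev^2) / (n - 1)
c(S2 = S2, s2 = s2)

# R's built-in var() uses the n - 1 denominator.
all.equal(s2, var(y))
```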

Schematic illustration of the Beautiful Triangle as a way of understanding degrees of freedom.

      Degrees of freedom occur throughout statistics in a variety of statistical tests. Working out degrees of freedom for more advanced designs and tests may still pose a challenge, but if you understand this basic example, you will have a conceptual base from which to build your comprehension.

      The third moment of a distribution is its skewness. Skewness of a random variable generally refers to the extent to which a distribution lacks symmetry. Skewness is defined as:

$$\gamma_1 = \frac{E[(y - \mu)^3]}{\sigma^3}$$

       Skewness for a normal distribution is equal to 0, as is skewness for a rectangular distribution (a bell‐shaped curve is not required for skewness to equal 0).

       Skewness for a positively skewed distribution is greater than 0; these distributions have tails that stretch out toward the largest values on the abscissa.

       Skewness for a negatively skewed distribution is less than 0; these distributions have tails that stretch out toward the smallest values on the abscissa.
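      The three cases listed above can be checked numerically. The sketch below assumes the common moment-based definition of skewness (the third central moment divided by the second central moment raised to the power 3/2). The helper name `skew_moment` and the three small data vectors are hypothetical and chosen only for illustration.

```r
# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 3/2.
# `skew_moment` is a hypothetical helper; packages such as psych may
# apply small-sample adjustments, so their values can differ slightly.
skew_moment <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

symmetric  <- c(1, 2, 3, 4, 5)    # mirror image around its mean
right_tail <- c(1, 1, 1, 2, 10)   # one large value stretches the upper tail
left_tail  <- -right_tail         # reflection stretches the lower tail

sapply(list(symmetric = symmetric,
            right_tail = right_tail,
            left_tail = left_tail),
       skew_moment)
```

      The symmetric vector yields a skewness of zero, the vector with a long upper tail a positive value, and its reflection a negative value of the same magnitude.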

      The fourth moment of a distribution is its kurtosis, generally referring to the peakedness of a distribution (Upton and Cook, 2002), but also having much to do with a distribution's tails (DeCarlo, 1997):

$$\gamma_2 = \frac{E[(y - \mu)^4]}{\sigma^4}$$

Graph depicts the F distribution on 2 and 5 degrees of freedom. It is positively skewed, since the tail stretches out toward numbers of greater value.

      With regard to kurtosis, distributions are described as:

       mesokurtic if the distribution exhibits kurtosis typical of a bell‐shaped normal curve;

       platykurtic if the distribution exhibits lighter tails and is flatter toward the center than a normal distribution;

       leptokurtic if the distribution exhibits heavier tails and is generally narrower in the center than a normal distribution, making it appear somewhat “peaked”.
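      These three categories can be illustrated with a short R sketch. It assumes the moment-based definition of kurtosis with 3 subtracted ("excess" kurtosis), so that a normal distribution scores about 0, a convention that much software output follows. The helper name `excess_kurtosis` is hypothetical, and the "samples" are idealized: they are built from evenly spaced quantiles rather than random draws, so the results are deterministic.

```r
# Excess kurtosis: fourth central moment over the squared second central
# moment, minus 3 so that a normal distribution scores about 0.
# `excess_kurtosis` is a hypothetical helper name.
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Evenly spaced probabilities give deterministic, idealized "samples"
# from each distribution through its quantile function.
p <- ppoints(1000)
normal_like  <- qnorm(p)        # bell-shaped reference (mesokurtic)
uniform_like <- qunif(p)        # flat with short tails (platykurtic)
t_like       <- qt(p, df = 5)   # heavier tails than normal (leptokurtic)

round(sapply(list(normal = normal_like,
                  uniform = uniform_like,
                  t5 = t_like),
             excess_kurtosis), 2)
```

      The normal quantiles give a value close to 0, the uniform quantiles a clearly negative value, and the t quantiles a clearly positive one, matching the mesokurtic, platykurtic, and leptokurtic labels respectively.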

      We can easily compute moments of empirical distributions in R or SPSS. Several packages in R are available for this purpose. We could compute skewness for parent on Galton's data by:

      > library(psych)
      > skew(parent)
      [1] -0.03503614

      The psych package (Revelle, 2015) also provides a range of descriptive statistics:

      > library(psych)
      > describe(Galton)
             vars   n  mean   sd median trimmed  mad  min  max range  skew kurtosis
      parent    1 928 68.31 1.79   68.5   68.32 1.48 64.0 73.0     9 -0.04     0.05
      child     2 928 68.09 2.52

