Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis
How to best relate mathematical models to reality is not at all straightforward. Read Hennig (2009), and discuss Hennig's account of the relation between reality and mathematical models. Do you agree with this account? What might be some problems with it?
Notes
1 B.F. Skinner was a psychologist known for his theory of operant conditioning within the behaviorist tradition in psychology. One of Skinner’s primary investigatory tools was that of observing and recording the conditions that would lead a rat, pigeon, or other animal, to press a lever for a food pellet in a small chamber. This chamber came to be known as the Skinner box. For a read of Skinner, see Rutherford (2009) and Fancher and Rutherford (2011).
2 See Friendly (2000, pp. 208–211) for an analysis of the O‐ring data. See Vaughan (1996) for an account of the social, political, and managerial influences at NASA that were also purportedly responsible for the disaster.
3 The reader is strongly encouraged to consult Kuhn’s excellent book The Structure of Scientific Revolutions in which an eminent philosopher of science argues for what makes some theories more longstanding than others and why some theories drop out of fashion. So‐called paradigm shifts are present in virtually all sciences. An awareness of such shifts can help one better put “theories of the day” into their proper context.
2 INTRODUCTORY STATISTICS
In spite of the immense amount of fruitful labour which has been expended in its practical applications, the basic principles of this organ of science are still in a state of obscurity, and it cannot be denied that, during the recent rapid development of practical methods, fundamental problems have been ignored and fundamental paradoxes left unresolved.
(Fisher, 1922a, p. 310)
Our statistics review includes topics that would customarily be seen in a first course in statistics at the undergraduate level, but depending on the given course and what was emphasized by the instructor, our treatment here may be at a slightly deeper level. We review these principles with demonstrations in R and SPSS where appropriate. Should any of the following material come across as entirely “new,” then a review of any introductory statistics text is recommended. For instance, Kirk (2008), Moore, McCabe, and Craig (2014), and Box, Hunter, and Hunter (1978) are relatively nontechnical sources, whereas DeGroot and Schervish (2002), Wackerly, Mendenhall III, and Scheaffer (2002), and Evans and Rosenthal (2010) are much deeper and more technically dense. Casella and Berger (2002), Hogg and Craig (1995), and Shao (2003) are much higher‐level, theoretically oriented texts targeted mainly at mathematical and theoretical statisticians. Other sources include Panik (2005), Berry and Lindgren (1996), and Rice (2006). For a lighter narrative on the role of statistics in social science, consult Abelson (1995).
Because of its importance in the interpretation of evidence, we close the chapter with an easy but powerful demonstration of what makes a p‐value small or large in the context of statistical significance testing and the testing of null hypotheses. It is imperative that, as a research scientist, you be knowledgeable about this material before you attempt to evaluate any research findings that employ statistical inference.
2.1 DENSITIES AND DISTRIBUTIONS
When we speak of density as it relates to distributions in statistics, we are referring generally to theoretical distributions in which probabilities are given by areas under their curves. There are numerous probability distributions or density functions. Empirical distributions, on the other hand, rarely go by the name of densities. They are, in contrast, “real” distributions of real empirical data. In some contexts, the identifier normal distribution may be given without indicating whether one is referring to a density or to an empirical distribution. It is usually evident from the context which of the two is meant. We survey only a few of the more popular densities and distributions in our discussion that follows.
The univariate normal density is given by:
$$f(x_i) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x_i - \mu)^2 / (2\sigma^2)} \qquad (2.1)$$
where,
μ is the population mean for the given density,
σ² is the population variance,
π is a constant equal to approximately 3.14,
e is a constant equal to approximately 2.72,
$x_i$ is a given value of the independent variable, assumed to be a real number.
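When μ is 0 and σ² is 1, which implies that the standard deviation σ is also equal to 1 (i.e., $\sigma = \sqrt{\sigma^2} = \sqrt{1} = 1$), the normal density reduces to the standard normal density:
$$f(x_i) = \frac{1}{\sqrt{2\pi}} \, e^{-x_i^2 / 2}$$
Notice that in (2.1), setting μ = 0 and σ² = 1 reduces the leading constant $1/(\sigma\sqrt{2\pi})$ to $1/\sqrt{2\pi}$ and the exponent $-(x_i - \mu)^2/(2\sigma^2)$ to $-x_i^2/2$.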
The standard normal distribution is the classic z‐distribution whose areas under the curve are given in the appendices of most statistics texts, and are more conveniently computed by software. An example of the standard normal is featured in Figure 2.1.
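As a rough illustration of how software computes these quantities, the following Python sketch evaluates the normal density and the area under the standard normal curve between −1 and +1 standard deviations. It is not taken from the text, which uses R and SPSS for its demonstrations. The function names `normal_pdf` and `standard_normal_cdf` are our own, and the area is obtained from the error function in Python's standard `math` module rather than from a printed table.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density at x: 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Height of the standard normal density at its mean (mu = 0, sigma = 1).
peak = normal_pdf(0.0)

# Area between -1 and +1 standard deviations from the mean.
area_within_one_sd = standard_normal_cdf(1.0) - standard_normal_cdf(-1.0)

print(round(peak, 4))                # 0.3989
print(round(area_within_one_sd, 4))  # 0.6827
```

Roughly 68% of the area under the standard normal curve therefore lies within one standard deviation of the mean.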
Scores in research often come in their own units, with distributions having means and variances different from 0 and 1. We can transform a score $x_i$ coming from a given distribution with mean μ and standard deviation σ by the familiar z‐score:
$$z = \frac{x_i - \mu}{\sigma}$$
A z‐score is expressed in units of the standard normal distribution. For example, a z‐score of +1 denotes that the given raw score lies one standard deviation above the mean. A z‐score of −1 means that the given raw score lies one standard deviation below the mean. In some settings (such as school psychology), t‐scores are also useful, having a mean of 50 and standard deviation of 10. In most contexts, however, z‐scores dominate.
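The conversion from a raw score to a z‐score, and from a z‐score to a t‐score with mean 50 and standard deviation 10, can be sketched in a few lines of Python. This is a minimal illustration rather than code from the text. The raw score of 85 and the distribution with mean 70 and standard deviation 10 are hypothetical values chosen only for the arithmetic, and the function names are our own.

```python
def z_score(x, mu, sigma):
    """Standardize raw score x from a distribution with mean mu and sd sigma."""
    return (x - mu) / sigma

def t_score(z):
    """Rescale a z-score to a scale with mean 50 and standard deviation 10."""
    return 50.0 + 10.0 * z

# Hypothetical example: raw score 85, distribution mean 70, sd 10.
z = z_score(85.0, 70.0, 10.0)
t = t_score(z)

print(z)  # 1.5  (one and a half standard deviations above the mean)
print(t)  # 65.0
```

A raw score equal to the mean gives z = 0 and hence a t‐score of 50, and each standard deviation above or below the mean shifts the t‐score by 10 points.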
Figure 2.1 Standard normal distribution with shaded area from −1 to +1 standard deviations from the mean.
A classic example of the utility of z‐scores typically goes like this. Suppose two sections of a