Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis
“disappear” and because of the nature of measured variables, we may no longer have physical recourse to justify the original relationship at all, external to the statistical model. This is why social models can be very “neurotic,” frustrating, and context‐dependent. Self‐esteem may predict achievement in one model but not in another. Many areas of psychological, political, and economic research, for instance, implicitly operate on such grounds. The phenomenon is literally “built” on the statistical model and often does not exist separate from it, or at least not in an easily observed manner such as the healing of a wound. Social scientists working in such areas must, if nothing else, be aware of this. An estimated statistical model may or may not correspond to the actual physical effects it seeks to account for.
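One way to see how a relationship can hold in one model and vanish in another is the first‐order partial correlation, a standard formula brought in here only for illustration. The sketch below is a minimal example under stated assumptions: the three correlations are hypothetical, the unnamed "third variable" is invented, and the function name is our own.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations, invented for illustration only.
r_esteem_achievement = 0.30  # self-esteem with achievement
r_esteem_third = 0.60        # self-esteem with a third variable
r_achievement_third = 0.50   # achievement with the same third variable

# Model 1: self-esteem alone is related to achievement (r = 0.30).
# Model 2: once the third variable is held constant, the relationship
# is recomputed as a partial correlation.
r_partial = partial_correlation(
    r_esteem_achievement, r_esteem_third, r_achievement_third
)

print("zero-order correlation:", r_esteem_achievement)
print("partial correlation:   ", round(r_partial, 3))
```

With these particular values the numerator is 0.30 minus 0.60 times 0.50, so the partial correlation is essentially zero: the same data support a relationship in one model and none in the other.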
1.11 UNDERSTANDING WHAT “APPLIED STATISTICS” MEANS
In this day and age of extraordinary computing power, the likes of which will probably seem laughable even a decade from the date of publication of this book, with a few clicks of the mouse and a software manual, one can obtain a principal components analysis, factor analysis, discriminant analysis, multiple regression, and a host of other relatively theoretically advanced statistical techniques in a matter of seconds. The advance of computers, and especially of easy‐to‐use software programs, has made performing statistical analyses seem quite easy, because even a novice can obtain output from a statistical procedure relatively quickly. One consequence of this, however, is that a misunderstanding seems to have arisen in some circles that “applied statistics” somehow equates with “statistics without mathematics” or, even worse, “statistics via software.”
The word “applied” in applied statistics should not be understood to necessarily imply the use of computers. What “applied” should mean is that the focus of the writing is on how to use statistics in the context of scientific investigation, oftentimes with demonstrations on real or hypothetical data. Whether that data is analyzed “by hand” or through the use of software does not make one approach more applied than the other. What analysis via computer does do is make the approach more computational than the by‐hand approach. Indeed, there is a whole field of study known as computational statistics that features a variety of software approaches to data analysis. For examples, see Dalgaard (2008) and Venables and Ripley (2002), as well as Friendly (1991, 2000) for an emphasis on data visualization. Fox (2002) also provides good coverage of functions in S‐Plus and R. And of course, computer science and the machine‐learning movement have contributed greatly to software development and to our ability to analyze data quickly and efficiently via algorithms, and to implement new and classic procedures that would be impossible otherwise.
On the opposite end of the spectrum, if a course in statistics is advertised as not being applied, this most often implies that the course is more theoretical or mathematical in nature, with a focus on proof and the justification of results. In essence, the course is usually more abstract than what would be expected in an applied course. In such theoretical courses, one will very seldom see applications to real data; instead, the course will feature proofs of essential statistical theorems and the justification of analytical propositions. This is the true distinction between applied and theoretical courses. The computer has really nothing to do with the distinction other than facilitating computation in either field.
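As a minimal sketch of this point, the following Python snippet computes a mean and a sample variance twice: once by writing out the arithmetic (the "by hand" route) and once by calling routines from Python's standard `statistics` module. The scores are hypothetical and the choice of language is ours, made only for illustration.

```python
import statistics

# Hypothetical scores, invented purely for illustration.
scores = [4.0, 7.0, 6.0, 5.0, 8.0]

# "By hand": spell out the arithmetic explicitly.
n = len(scores)
mean_by_hand = sum(scores) / n
# Sample variance: sum of squared deviations from the mean, divided by n - 1.
var_by_hand = sum((x - mean_by_hand) ** 2 for x in scores) / (n - 1)

# "Via software": let library routines carry out the same arithmetic.
mean_by_software = statistics.mean(scores)
var_by_software = statistics.variance(scores)

print("mean:    ", mean_by_hand, mean_by_software)
print("variance:", var_by_hand, var_by_software)
```

Both routes give the same numbers. The second is more computational, not more applied; what would make either one applied is the scientific question the scores are meant to answer.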
Review Exercises
1.1. Distinguish between rationalism and empiricism in accounting for different types of knowledge, and explain why being exclusively a rationalist or an empiricist is usually quite unreasonable and unrealistic.
1.2. Briefly discuss what is meant by a model in scientific research.
1.3. Compare and contrast the social and the so‐called “hard” sciences. How are they similar? Different? In this context, discuss the statement “Social science is a courageous attempt.”
1.4. Compare and contrast a physical quantity such as weight with a psychological one such as intelligence. How is one more “real” than the other? Can they be considered to be equally real? Why or why not?
1.5. Why would some people say that an attribute such as intelligence is not measurable?
1.6. Discuss George Box's famous statement “All models are wrong, some are useful.” What are the implications of this for your own research?
1.7. Consider an example from your own area of research in which two competing explanations, one simple and one complex, may equally well account for observed data. Then, discuss why the simpler explanation may be preferable to the more complex. Are there instances where the more complex explanation may be preferable to the simpler? Discuss.
1.8. Briefly discuss why using statistical methods to make causal statements about phenomena may be unrealistic and in most cases unattainable. Should the word “cause” be used at all in reference to nonexperimental social research?
1.9. Discuss why it is important to suspend one's beliefs about a subject such as applied statistics or mathematics in order to potentially learn more about it.
1.10. Statistical thinking is about relativity. Discuss what this statement means, first with reference to the pilot example and then by making up an example of your own.
1.11. Distinguish between experimental and statistical control, and explain why understanding the distinction between them is important when interpreting a statistical model.
1.12. Distinguish between statistical and physical effects, and discuss how the effect of a medication treating a wound might be considered different in nature from the correlation between intelligence and self‐esteem.
1.13. Distinguish between the domains of applied and theoretical statistics.
Further Discussion and Activities
1.14. William of Ockham (c. 1287–1347) is known for his famous principle, Ockham's razor, which essentially states that, all else being equal, given competing theories accounting for the same data, the simpler theory is the better theory. In other words, complex explanations for phenomena that could be explained by simpler means are not encouraged. Read Kelly (2007), and evaluate the utility of Ockham's razor as it applies to statistical modeling. Do you agree that the simpler statistical model is usually preferred over the more complex when it comes to modeling social phenomena? Why or why not?
1.15. Read Kuhn (2012). Discuss what Kuhn means by normal science and the essence of what constitutes a paradigm shift in science.
1.16. As briefly discussed in this chapter, statistical control is not the same thing as experimental control or a control group. Read Dehue (2005), and provide a brief commentary regarding what constitutes a real control group versus the concept of statistical controls.
1.17. The chapter briefly discussed potential problems with using the word cause, or speaking of causality at all, when describing findings in the social and (often) natural sciences. The topic of causality is a philosopher's career and a scientist's methodological nightmare. Epidemiology, the study of diseases in human and other populations, has, like so many other disciplines, had to grapple with the issue of causation. For example, if one is to make the statement smoking causes cancer, one must be able to defend one's philosophical position in advancing such a claim. Not everyone who smokes gets cancer. Further, some who smoke the most never get the disease, whereas some who smoke the least do. Tobacco companies have historically relied on the fact that not everyone who smokes gets cancer as a means of challenging the smoking‐cancer “link.” As an introduction to these issues, as well as a brief history of causal interpretations, read Morabia (2005). Summarize the historical interpretations of causality, as well as how epidemiology has generally dealt with the problem of causation.
1.18. Models