Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis
science should possess a healthy blend of both perspectives. On the one hand, science should, of course, be grounded in objective objects. The objects one studies should be independent of the psychical realm. A cup of coffee is a cup of coffee regardless of our belief or theory about the existence of the cup. On the other hand, void of any rationalist activity, science becomes the study of objects to which we are not allowed to assign meaning. For example, the behavior of a pigeon in a Skinner box (see Figure 1.1) can be documented as the number of times it presses the lever for the reward of a food pellet. That the pigeon presses the lever is empirical reality. Why the pigeon presses the lever is theoretical speculation, for which there could be many competing possibilities. Observing data is fine, but without theory, we have very little “guidance” to either explain current observations or predict new ones. B.F. Skinner’s theory of operant conditioning, which holds that the pigeon presses the lever because it is reinforced to do so, is a prime example of where a wedding of rationalism and empiricism takes place. The theory attempts to explain or account for the pigeon's behavior. It is a narrative for why the pigeon does what it does.
Figure 1.1 Observing the behavior of a pigeon in a Skinner box.
Source: Dtarazona (1998). https://commons.wikimedia.org/wiki/File:UNMSM_PsiExperimental_1998_2.jpg. Public Domain.
Of course, theorizing can go too far, much too far. One must be careful not to “over‐theorize” without acknowledging the absence of empirical backing. Is there anything wrong with hypothesizing that cloudy days are associated with depressive moods? No, so long as you are prepared to state what evidence exists that may support or contradict your theory. If no evidence exists, you may still theorize, but you owe it to your audience to admit the lack of current empirical support for your hypothesis.
As an example of “heightened theorizing,” recall the missing Malaysia Airlines Flight 370, in which a Boeing 777 aircraft en route from Kuala Lumpur to Beijing vanished, apparently without a trace, in March of 2014. Media were sometimes criticized for proposing numerous theories as to its disappearance, ranging from the plane being flown to a hidden location to it being hijacked or lost as a result of pilot suicide. One theory even speculated that the plane was swallowed by a black hole! Speculation is fine and theorizing is a necessary scientific as well as human activity, so long as one is up front about the available evidence supporting the theory one is advancing. Indeed, one could assign probabilities to competing theories and revise such probabilities as new data become available. This is precisely what Bayesian philosophers and statisticians are wont to do. A theory should only be considered credible, however, when empirical reality and the theory coincide (see Figure 1.2). The fit may not be perfect, and seldom if ever is, but when the rational coincides well with the empirical, credibility of the idea is at least tentatively assured, at least until potentially new evidence debunks it (e.g., the fall of Newtonian physics).
Figure 1.2 “Model fit” as an overlap of data with theory.
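The idea of assigning probabilities to competing theories and revising them as new data arrive can be illustrated with a small sketch of Bayes' rule. Everything in it is an assumption made for illustration: the theory labels, the prior probabilities, and the likelihoods are hypothetical numbers, not estimates drawn from any real investigation.

```python
def bayes_update(priors, likelihoods):
    """Revise the probabilities of competing theories using Bayes' rule.

    priors:      dict mapping theory name -> P(theory) before the new evidence
    likelihoods: dict mapping theory name -> P(new evidence | theory)
    """
    # Numerator of Bayes' rule for each theory: prior times likelihood.
    unnormalized = {t: priors[t] * likelihoods[t] for t in priors}
    # Denominator: total probability of the evidence across all theories.
    total = sum(unnormalized.values())
    return {t: value / total for t, value in unnormalized.items()}


# Hypothetical prior beliefs about three competing theories (sum to 1).
priors = {"theory_A": 0.5, "theory_B": 0.3, "theory_C": 0.2}

# Hypothetical probability of observing the new evidence under each theory.
likelihoods = {"theory_A": 0.10, "theory_B": 0.60, "theory_C": 0.30}

posteriors = bayes_update(priors, likelihoods)
for theory, probability in posteriors.items():
    print(f"{theory}: {probability:.3f}")
```

In this made-up case the evidence is far more probable under theory_B than under theory_A, so theory_B overtakes theory_A even though it began with the lower prior. The revised probabilities can in turn serve as the priors when further evidence arrives.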
We must also ensure that our theories are not narratives too conveniently fit to data. If you have ever witnessed a sporting event where the deciding point occurred by the lucky bounce of a puck in hockey or the breezy push of a tennis ball in midair, only to hear post‐match commentators laud the winning team or individual as suddenly so much better than the losing team, then you know what “convenient narratives” are all about. We must be careful not to exaggerate how well our given theory fits data simply because a few data points went “our way.” George Box once said that all models are wrong but some are useful. In any scientific endeavor, guard against falling in love with your theory or otherwise exaggerating it far beyond what the data suggest. Otherwise, it is no longer a legitimate theory, but rather simply your brand, and more a product of subjective bias and “career‐building” than anything scientific. After 20 years of advocating a theory, is the researcher you are speaking to really prepared to “accept” evidence that contradicts his or her theory? Such researchers have a great deal at stake in that theory, and their whole career may have been built upon it. Are they really willing to accept “defeat” of it? Indeed, I believe one reason economic predictions, for instance, are often looked upon with suspicion is that economists, like psychologists (and theoretical physicists, for that matter), are far too quick to advance theories as though they were near facts. “Sexy theories” sound great and may be marketable to uncritical consumers and media (make an outlandish claim on cable, you'll be a hero!), but to good scientists, theories are always only as good as the data that exist to support them. Science is exciting, to be sure, but should not be overly speculative. If you are looking for fireworks, then you would do best to choose a field other than science.
1.2 WHAT IS A “MODEL”?
The word “model” is perhaps the most popular word featured in textbooks, tutorials, and lectures having anything to do with the application of quantitative methods. Attempting to define just what a model is in statistics can be a bit challenging. We discuss the concept by referring to Everitt's definition:
A description of the assumed structure of a set of observations that can range from a fairly imprecise verbal account to, more usually, a formalized mathematical expression of the process assumed to have generated the observed data. The purpose of such a description is to aid in understanding the data.
(Everitt, 2002, p. 247)
Models are, essentially, and perhaps somewhat crudely, equations. They are equations fit to data that attempt to account for how the data came about or were generated in the first place. For example, if every hour a student studied for an exam corresponded to exactly a 1‐point increase in that student's grade, the model that would best explain how these data were generated would be a linear model. Even if the relationship between hours studied and student grade were not perfect, a straight line might still be the “best” summary. Models are often used to account for messy or imperfect data.
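The study‐hours example can be made concrete with a minimal sketch that fits a line of the form grade = intercept + slope × hours by ordinary least squares. The data below are invented purely for illustration, and `fit_line` is a hypothetical helper written for this sketch rather than a function from any particular statistics package.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: hours studied by six students.
hours = [0, 1, 2, 3, 4, 5]

# Case 1: each extra hour adds exactly 1 point (a perfect linear relationship).
perfect_grades = [60, 61, 62, 63, 64, 65]

# Case 2: the same upward trend, but with some scatter around the line.
noisy_grades = [61, 60, 63, 62, 65, 64]

perfect_intercept, perfect_slope = fit_line(hours, perfect_grades)
noisy_intercept, noisy_slope = fit_line(hours, noisy_grades)

print(f"perfect data: grade = {perfect_intercept:.2f} + {perfect_slope:.2f} * hours")
print(f"noisy data:   grade = {noisy_intercept:.2f} + {noisy_slope:.2f} * hours")
```

With the perfect data the fitted line passes through every point and its slope is 1. With the noisy data no line passes through every point, yet the fitted line still summarizes the upward trend, which is the sense in which a line can remain the “best” summary of imperfect data.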
Figure 1.3 Hebbian Yerkes–Dodson performance–arousal curve.
Source: Diamond et al. (2007). Licensed under CC BY 3.0.
Another example of a model is the classic Hebbian version of the Yerkes–Dodson curve expressing the relationship between performance and arousal, depicted in Figure 1.3.
The curve is an inverted “U” shape (an approximate parabola) that provides a useful model relating these two attributes (i.e., performance and arousal). If one exhibits very low arousal, performance will be minimal. If one exhibits a very high degree of arousal, performance will likely also suffer. However, if one exhibits a moderate range of arousal, performance will likely be optimal. The model in this case, as in most cases, does not account for all the data one might collect. The extent to