Practitioner's Guide to Using Research for Evidence-Informed Practice. Allen Rubin
alternative explanations by randomly assigning survivors to an experimental group that receives our innovative new therapy versus a control group that receives routine treatment as usual. If our treatment group has a significantly better outcome on average than the control group, we can rule out contemporaneous events or the passage of time as plausible explanations, since both groups had an equal opportunity to have been affected by such extraneous factors.
Suppose we did not randomly assign survivors to the two groups. Suppose instead we treated those survivors who were exhibiting the worst trauma symptoms in the immediate aftermath of the traumatic event and compared their outcomes to the outcomes of the survivors whom we did not treat. Even if the ones we treated had significantly better outcomes, our evidence would be more flawed than with random assignment. That's because the difference in outcome might have had more to do with differences between the two groups to begin with. Maybe our treatment group improved more simply because their immediate reaction to the trauma was so much more extreme that even without treatment their symptoms would have improved more than the less extreme symptoms of the other group.
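The logic of this example can be sketched with a small simulation. It is only an illustration under stated assumptions, not data from any study: every number below (the symptom scale, the amount of noise, treating the worst quarter of survivors) is hypothetical, and the simulated treatment is deliberately given no effect at all. Under those assumptions, the survivors selected for their extreme initial symptoms still appear to improve far more than the others, while randomly assigned groups show almost no difference.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Return (gap under worst-symptom selection, gap under random assignment).

    Each gap is mean improvement of the "treated" group minus mean improvement
    of the comparison group. All values are hypothetical assumptions.
    """
    rng = random.Random(seed)
    # Assumed: each survivor has a stable severity plus transient fluctuation.
    stable = [rng.gauss(50, 10) for _ in range(n)]
    baseline = [s + rng.gauss(0, 10) for s in stable]
    # Assumed: the treatment has NO effect, so follow-up is just stable + noise.
    followup = [s + rng.gauss(0, 10) for s in stable]
    # Higher scores mean worse symptoms, so improvement = baseline - followup.
    improvement = [b - f for b, f in zip(baseline, followup)]

    # Design A: "treat" the quarter of survivors with the worst baseline scores.
    cutoff = sorted(baseline)[int(0.75 * n)]
    worst = [imp for imp, b in zip(improvement, baseline) if b >= cutoff]
    rest = [imp for imp, b in zip(improvement, baseline) if b < cutoff]
    selected_gap = statistics.mean(worst) - statistics.mean(rest)

    # Design B: assign survivors to groups by a coin flip.
    flags = [rng.random() < 0.5 for _ in range(n)]
    treated = [imp for imp, f in zip(improvement, flags) if f]
    control = [imp for imp, f in zip(improvement, flags) if not f]
    random_gap = statistics.mean(treated) - statistics.mean(control)
    return selected_gap, random_gap


selected_gap, random_gap = simulate()
print("Gap when treating the worst cases:", round(selected_gap, 2))
print("Gap under random assignment:", round(random_gap, 2))
```

The large gap in the first design arises entirely from how the groups were formed, which is the kind of alternative explanation that random assignment lets us rule out.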
As another alternative to random assignment, suppose we simply compared the outcomes of the survivors we treated to the outcomes of the ones who declined our services. If the ones we treated had on average better outcomes, that result very plausibly could be due to the fact that the ones who declined our treatment had less motivation or fewer support resources than those who wanted to and were able to utilize our treatment.
In each of the previous two examples, the issue is whether the two groups being compared were really comparable. To the extent that doubt exists as to their comparability, the research design is said to have a selectivity bias. Evaluations of outcome that compare different treatment groups that have not been assigned randomly are called quasi-experiments. Quasi-experiments have the features of experimental designs, but without the random assignment.
Not all quasi-experimental designs are equally vulnerable to selectivity biases. A design that compares treatment recipients to treatment decliners, for example, would be much more vulnerable to a selectivity bias than a design that provides the new treatment versus the routine treatment depending solely on whether the new treatment therapists have caseload openings at the time of referral of new clients. (The latter type of quasi-experimental design is called an overflow design.)
So far we have developed a pecking order of four types of designs for answering EIP questions about effectiveness. Experiments are at the top, followed by quasi-experiments with relatively low vulnerabilities to selectivity biases. Next come quasi-experiments whose selectivity bias vulnerability represents a severe and perhaps fatal flaw. At the bottom are designs that assess client change without using any control or comparison group whatsoever.
But our hierarchy is not yet complete. Various other types of studies are used to assess effectiveness. One alternative is the single-case design. You may have seen similar labels, such as single-subject designs, single-system experiments, and so on. All these terms mean the same thing: a design in which a single client or group is assessed repeatedly at regular intervals before and after treatment commences. With enough repeated measurements in each phase, it can be possible to infer which explanation for any improvement in trauma symptoms is more plausible: treatment effects versus contemporaneous events or the passage of time. We examine this logic further later in this book. For now, it is enough to understand that when well executed, these designs can offer some useful, albeit tentative, evidence about whether an intervention really is the cause of a particular outcome. Therefore, these designs merit a sort of medium status on the evidentiary hierarchy for answering EIP questions about effectiveness.
Next on the hierarchy come correlational studies. Instead of manipulating logical arrangements to assess intervention effectiveness, correlational studies attempt to rely on statistical associations that can yield preliminary, but not conclusive, evidence about intervention effects. For example, suppose we want to learn what, if any, types of interventions may be effective in preventing risky sexual behavior among high school students. Suppose we know that in some places the students receive sex education programs that emphasize abstinence only, while in other places the emphasis is on safe-sex practices. Suppose we also know that some settings provide faith-based programs, others provide secular programs, and still others provide no sex education. We could conduct a large-scale survey with many students in many different schools and towns, asking them about the type of sex education they have received and about the extent to which they engage in safe and unsafe sex. If we find that students who received the safe-sex approach to sex education are much less likely to engage in unsafe sex than the students who received the abstinence-only approach, that would provide preliminary evidence as to the superior effectiveness of the safe-sex approach.
Correlational studies typically also analyze data on a variety of other experiences and background characteristics and then use multivariate statistical procedures to see if differences in the variable of interest hold up when those other experiences and characteristics are held constant. In the sex education example, we might find that the real explanation for the differences in unsafe-sex practices is the students' socioeconomic status or religion. Perhaps students who come from more affluent families are both more likely to have received the safe-sex approach and less likely to engage in unsafe sex. In that case, if we hold socioeconomic status constant using multivariate statistical procedures, we might find no difference in unsafe-sex practices among students at a particular socioeconomic level regardless of what type of sex education they received.
Suppose we had found that students who received the abstinence-only sex education approach, or a faith-based approach, were much less likely to engage in unsafe sex. Had we held religion constant in our analysis, we might have found that students of a certain religion or those who are more religious are both more likely to have received the abstinence-only or faith-based approach and less likely to engage in unsafe sex. By holding religion or religiosity constant, we might have found no difference in unsafe-sex practices among students who did and did not receive the abstinence-only or a faith-based approach.
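What it means to hold a variable constant can be illustrated with the socioeconomic status example. The sketch below uses invented survey data in which, by assumption, the type of sex education has no effect and affluence alone drives both the program received and the rate of unsafe sex. It also simplifies in two ways: it considers only two program types, and it uses simple stratification (comparing students within each socioeconomic level) as a stand-in for the multivariate statistical procedures a real study would use. All probabilities and names are hypothetical.

```python
import random


def simulate_survey(n=40000, seed=1):
    """Generate hypothetical survey rows; every probability is an assumption."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        affluent = rng.random() < 0.5
        # Assumed: affluent students more often receive the safe-sex approach.
        safe_sex = rng.random() < (0.8 if affluent else 0.2)
        # Assumed: unsafe sex depends only on affluence, not on the program.
        unsafe = rng.random() < (0.10 if affluent else 0.30)
        rows.append({"affluent": affluent, "safe_sex": safe_sex, "unsafe": unsafe})
    return rows


def unsafe_rate(rows):
    """Proportion of students in rows who report unsafe sex."""
    return sum(r["unsafe"] for r in rows) / len(rows)


def program_gap(rows):
    """Unsafe-sex rate of abstinence-only students minus that of safe-sex students."""
    safe = [r for r in rows if r["safe_sex"]]
    other = [r for r in rows if not r["safe_sex"]]
    return unsafe_rate(other) - unsafe_rate(safe)


rows = simulate_survey()
# Crude comparison that ignores socioeconomic status.
crude_gap = program_gap(rows)
# Same comparison made separately within each socioeconomic level.
gap_affluent = program_gap([r for r in rows if r["affluent"]])
gap_less_affluent = program_gap([r for r in rows if not r["affluent"]])

print("Crude gap:", round(crude_gap, 3))
print("Gap among affluent students:", round(gap_affluent, 3))
print("Gap among less affluent students:", round(gap_less_affluent, 3))
```

In this constructed example the crude comparison favors the safe-sex approach, yet the difference largely disappears once students are compared only with others at the same socioeconomic level.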
Although correlational studies are lower on the hierarchy than experiments and quasi-experiments, they derive value from studying larger samples of people under real-world conditions. (Some might place them on a par with, or slightly above or below, single-case experiments on an effectiveness research hierarchy; there is not complete agreement on the exact order of hierarchies.) Their main drawback is that correlation alone does not imply causality. As illustrated in the sex education example, some extraneous variable other than the intervention variable of interest might explain away a correlation between type of intervention and a desired outcome. All other methodological things being equal, such as quality of measurement, studies that control statistically for many extraneous variables that seem particularly likely to provide alternative explanations for correlations between type of intervention and outcome provide better evidence about possible intervention effects than studies that control for few or no such variables.
However, no matter how many extraneous variables are controlled for, there is always the chance of missing the one that really matters. Another limitation of correlational studies is the issue of time order. Suppose we find in a survey that the more contact youths have had with a volunteer mentor from a Big Brother/Big Sister program, the fewer antisocial behaviors they have engaged in. Conceivably, the differences in antisocial behaviors might explain differences in contact with mentors, instead of the other way around. That is, perhaps the less antisocial youths are to begin with, the more likely they are to spend time with a mentor, and the more motivated the mentor will be to spend time with them.
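The time-order problem in the mentoring example can also be sketched with simulated data. The sketch assumes, purely for illustration, that mentor contact has no effect whatsoever on behavior and that youths who are less antisocial to begin with simply end up with more contact. The coefficients, the standardized scales, and the function names are hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_mentoring(n=5000, seed=2):
    """Correlation between mentor contact and antisocial behavior.

    Assumed: behavior drives contact, and contact has NO effect on behavior.
    """
    rng = random.Random(seed)
    # Standardized antisocial tendency each youth has before any mentoring.
    tendency = [rng.gauss(0, 1) for _ in range(n)]
    # Less antisocial youths receive more contact (standardized score).
    contact = [-0.7 * t + rng.gauss(0, 1) for t in tendency]
    # Surveyed behavior reflects the prior tendency only, not the contact.
    behavior = [t + rng.gauss(0, 0.5) for t in tendency]
    return pearson(contact, behavior)


r = simulate_mentoring()
print("Correlation between contact and antisocial behavior:", round(r, 2))
```

A survey of these simulated youths would show that more contact goes with fewer antisocial behaviors, even though the direction of influence runs entirely from behavior to contact.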
Thus, our ability to draw causal inferences about intervention effects depends not just on correlation but also on time order and on eliminating plausible alternative explanations for differences in outcome. When experiments randomly assign an adequate number of participants to different treatment conditions, we can assume that the groups will be comparable in terms of plausible alternative explanations. Random assignment also lets us assume that the groups are comparable in terms of pretreatment differences in outcome variables. Moreover, most experiments administer pretests to handle possible pretreatment differences. This explains why experiments using random assignment