Experimental Evaluation Design for Program Improvement. Laura R. Peck
Countering this idea, Peck describes and illustrates, by way of practical examples, a variety of design options, each of which in its own way supports causal explanation of program impact. In doing so, the book significantly broadens the types of evaluation questions that experimental impact evaluations can pursue. As Peck opines, “using experimental evaluation designs to answer ‘black box’ type questions—what works, for whom, and under what circumstances—holds substantial promise” (p. 10). We agree.
Sebastian T. Lemire, Christina A. Christie, and Marvin C. Alkin
Volume Editors
Reference
Stern, E., Stame, N., Mayne, J., Forss, K., Davies, R., & Befani, B. (2012). Broadening the range of designs and methods for impact evaluations (Working Paper 38). Report of a study commissioned by the Department for International Development. Retrieved from https://www.oecd.org/derec/50399683.pdf
About the Author
Laura R. Peck, PhD, is a principal scientist at Abt Associates and has spent her career evaluating social welfare and employment policies and programs, in both research and academic settings. A policy analyst by training, Dr. Peck specializes in innovative ways to estimate program impacts in experimental and quasi-experimental evaluations, and she applies this expertise to many social safety net programs. Dr. Peck is currently the principal investigator, co-principal investigator, or director of analysis for several major national evaluations for the U.S. Departments of Health and Human Services, Labor, and Housing and Urban Development, and over her career she has been part of more than 35 informative evaluations of nonprofit, local, state, and federal programs and policies. Peck is a co-author of a public policy textbook and is well published on program evaluation topics. Prior to her work at Abt Associates, Dr. Peck was a tenured professor at the Arizona State University School of Public Affairs and also served as the founding associate dean of the Barrett Honors College, Downtown Phoenix campus. She earned her PhD from the Wagner Graduate School at New York University.
Acknowledgments
This book has been motivated by many years of scholarly and applied evaluation work. Along the way, I developed and tested my ideas in response to prodding from professors, practitioners, and policymakers, and it is my hope that articulating them in this way gives them more traction in the field. Through my graduate school training at New York University’s Wagner School, I especially valued the perspectives of Howard Bloom, Jan Blustein, and Dennis Smith. While at Arizona State University’s School of Public Affairs, I appreciated opportunities to engage many community agencies, which served as test beds for my graduate students as they learned about evaluation in practice.
Most recently, at Abt Associates, I am fortunate to have high-quality colleagues and projects where we have the chance not only to inform public policy but also to advance evaluation methods. I am grateful to the funders of that research (including the U.S. Departments of Labor, Health and Human Services, and Housing and Urban Development) who are incredibly supportive of advancing evaluation science: They encourage rigor and creativity and are deeply interested in opening up that black box—be it through advancing analytic approaches or design approaches. I am also grateful to have had the opportunity to write about some of these ideas in the context of my project work, for the American Evaluation Association’s AEA365 blog, and for Abt Associates’ Perspectives blog.
Evaluation is a team sport, and so the ideas in this book have evolved through teamwork over time. For example, some of the arguments, particularly regarding the justification for experiments and the ethics of experimentation (in Chapter 1), stem from work with Steve Bell (and appear in our joint JMDE publication in 2016). The discussion of whether a control group is needed (in Chapter 4), as well as some observations about design variants, also draws on earlier work (e.g., Peck, 2015, in JMDE; Bell & Peck, 2016, in NDE #152). In addition, some of the (Appendix) discussion of the trade-offs between intent-to-treat and treatment-on-the-treated impacts and the factors that determine minimum detectable effect sizes came from joint work with Shawn Moulton, Director of Analysis, and the project team for the HUD First-Time Homebuyer Education and Counseling Demonstration.
At Abt Associates, Rebecca Jackson provided research assistance, Bry Pollack provided editorial assistance, Daniel Litwok provided critical review of a draft manuscript, and the Work in Progress Seminar offered input on final revisions. I am also appreciative of input from five anonymous reviewers for the Evaluation in Practice Series and from the SAGE editors.
My most intellectually and personally enriching partnership—and longest-standing collaboration—is with Brad Snyder, who asks the right and hard questions, including the “and then,” which pushes further still. I also thank my parents for raising me to value questioning and my daughter for teaching me patience in answering.
I would also like to acknowledge the following reviewers for their feedback on the book:
Deven Carlson, University of Oklahoma
Roger A. Boothroyd, University of South Florida
Sebastian Galindo, University of Florida
Katrin Anacker, George Mason University
Christopher L. Atkinson, University of West Florida
Sharon Kingston, Dickinson College
Colleen M. Fisher, University of Minnesota
Regardt Ferreira, Tulane University
Chapter 1 Introduction
The concepts of cause and effect are critical to the field of program evaluation. After all, establishing a causal connection between a program and its effects is at the core of what impact evaluations do. The field of program evaluation has its roots in the social work research of the settlement house movement and in the business sector’s efficiency movement, both at the turn of the 20th century. Evaluation as we know it today emerged from the Great Society era, when large-scale demonstrations tested new, sweeping interventions to improve many aspects of our social, political, and economic worlds. Specifically, it was the Elementary and Secondary Education Act of 1965 that first stipulated evaluation requirements (Hogan, 2007). Thereafter, a slew of scholarly journals launched and, to accompany them, academic programs to train people in evaluation methods. Since then, scholars, practitioners, and policymakers have become increasingly aware of the diversity of questions that program evaluation pursues. This awareness has been coupled with a broadening range of evaluation approaches that address not only whether programs work but also what works, for whom, and under what circumstances (e.g., Stern et al., 2012). Program evaluation as a profession is diverse, and scholars and practitioners can be found in a wide array of settings, from small, community-based nonprofits to the largest of federal agencies.
As program administrators and policymakers seek to establish, implement, and evolve their programs and public policies, measuring the effectiveness of those programs or policies is essential to justifying ongoing funding, enacting policy changes to improve them, or terminating them. In doing so, impact evaluations must isolate a program’s impact from the many other possible explanations for any observed difference in outcomes. Determining how much of the improvement in outcomes (that is, the “impact”) is due to the program involves estimating what would have happened in the program’s absence (the “counterfactual”). As of 2019, we are amid an era of “evidence-based” policymaking, which implies that the results of evaluation research inform what we choose to implement, how we choose to improve, and whether we terminate certain public and nonprofit programs and policies.
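To make the counterfactual logic concrete, the following minimal Python sketch (not drawn from the book; the effect size, outcome distribution, and variable names are invented for illustration) simulates a randomized evaluation: because participants are randomly assigned, the control group’s average outcome stands in for what would have happened in the program’s absence, and the difference in group means recovers an estimate of the program’s impact.

```python
# Hypothetical simulation of a randomized impact evaluation.
# All numbers are invented for illustration only.
import random

random.seed(42)

TRUE_IMPACT = 5.0   # assumed effect of the program on the outcome
N = 10_000          # number of study participants

treatment_outcomes = []
control_outcomes = []

for _ in range(N):
    baseline = random.gauss(50, 10)        # outcome absent the program
    if random.random() < 0.5:              # random assignment to treatment
        treatment_outcomes.append(baseline + TRUE_IMPACT)
    else:                                  # control group: the counterfactual
        control_outcomes.append(baseline)

estimated_impact = (sum(treatment_outcomes) / len(treatment_outcomes)
                    - sum(control_outcomes) / len(control_outcomes))
print(f"Estimated impact: {estimated_impact:.2f} (true impact: {TRUE_IMPACT})")
```

In this sketch, the estimated impact lands close to the assumed true effect because random assignment makes the two groups comparable on average; in a nonexperimental comparison, other explanations for the outcome difference could not be ruled out so simply.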
Experimentally designed evaluations—those that randomize participants to treatment and control groups—offer a convincing means for establishing a causal connection between a program and its effects. Over roughly the last three decades, experimental