Spinning Out of Control with Randomized Controls

In my work, I hear the occasional drum beat for more “scientific evaluation techniques,” especially the need for randomized controls. You might have heard that beat from Campbell and Stanley, who set the stage for using treatment and control groups to judge whether interventions work. The beat has gotten louder over the last decade in the political world. The federal government, in establishing its “Scientifically Based Evaluation Methods” policy, has claimed primacy for randomized controlled trials (RCTs) as the only “true” route to judging whether education programs make a difference. Funders and foundations have echoed this mantra widely and even slavishly. Unfortunately, paralysis is never far away.

Others, including Michael Quinn Patton, have been seeking to widen educators’ knowledge about what is possible in showing that programs work. Disputing this false “Gold Standard,” Patton offers some common-sense advice well worth considering. Noting that the standards for evaluation differ from those for research, Patton points to utility, feasibility, propriety, and accuracy as key touchstones. How often do higher education researchers fold these touchstones into their work? Not often, in my opinion, especially when recommending RCTs. Remember, the goal of research in the area of student outcomes is to determine what types of students change in what ways under what circumstances.

Patton indicates that there are times when RCTs are appropriate: drug studies, fertilizer and crop yield studies, and single health practices. The fertilizer reference, of course, brings me a broad chuckle. In my keynote to the Association for Institutional Research in 2003, I pointed out that treatment and control group methodologies were a call to “explain a multivariate world with a two-variable model.” I still think I’m right, and Patton agrees. Times when RCTs are not appropriate include situations that are complex, multi-dimensional, and highly context-specific. Patton uses community health interventions as an example; more broadly, I would include any intervention that seeks to change complex human behavior, such as learning and skill interventions. My monograph in the late 1990s sought to explain, for example, how complex and interrelated factors come together to predict student learning and cognitive development. Readers wanting a quick overview of how complex events might tie together can find a visual here.

So where does this leave all those well-intended souls wanting to prove that their programs work? Having to learn much more than the common mantra, I suspect. Patton talks about both the possible and the appropriate, a good place to visit. Multiple sources of data about each case, triangulation of sources, modus operandi analysis, and epidemiological field methods are his keywords. To me, this all sounds a lot like attending to context and generating meaning from each program’s reality rather than hammering on an RCT. Patton goes further, though, and offers that RCTs aren’t needed when face validity is high, the observed changes are dramatic, and the link between treatment and outcome is direct. I think he’s arguing that educators and others too often reach for the RCT hammer and treat every evaluation question as a nail.

This is to say that there are times when RCTs are appropriate. Most often, however, the assumptions they carry limit what can be learned about the intervention under scrutiny. In higher education, they also assume that small, micro-level programs can either assign students randomly to control and treatment groups or have the sophistication to match treatment and control groups. The former is virtually impossible from a moral as well as logistical perspective, while the latter does violence to the complexity of an intervention. My experience working widely with programs throughout the United States has taught me that matching subjects on gender, age, and race/ethnicity tells you only whether gender, age, and race/ethnicity have a bearing on an intervention. Does anyone besides me think that student outcomes depend more on other key factors, such as the structure of the intervention, student motivation, and the quality of teaching? Does the lack of an RCT mean that any other data gathered about a program are meaningless? I hope not.

I’ve come lately to adopt the term developmental evaluation in my consulting practice to distinguish our approach from summative and formative evaluation. Most funders are interested in summative and formative evaluation, though some are moving toward developmental approaches. The difference? Summative evaluations are for making final judgments, and formative evaluations are directed at improving programs. Developmental evaluation, on the other hand, is about ongoing development and knowledge building. The programs I work with aspire to be innovative and cutting-edge; they don’t have a long history from which to draw. They’re not ready for rigid summative judgments, although they’re receptive to formative help. They are works in progress.

Innovative programs operate in a world that usually believes an RCT is the sine qua non of research and evaluation; they have much to prove and much to lose. Using other techniques to demonstrate their worth is attractive: focus groups; comparing outcome data to other, similar programs; journaling and analyzing faculty insights about what works; and benchmarking a program’s progress against its own historical performance. Each has pitfalls, certainly, but if RCTs are sold as the only answer for judging a program’s worth, and they consume all the time and resources available for evaluation in a small-scale program, we may never get to the bottom of what really works in a complex world.
