By July, the evidence was in on hydroxychloroquine: High-quality randomized controlled trials (RCTs) “indicate[d] no therapeutic efficacy” against COVID, in the words of Dr. Anthony S. Fauci, the nation’s top infectious disease expert. A month earlier, the Food and Drug Administration (FDA) had revoked its emergency use authorization of hydroxychloroquine for COVID patients based on the accumulating RCT evidence.
This is a classic case of medical reversal, in which multiple well-conducted RCTs — which are widely considered the most reliable method of evaluating a treatment’s effectiveness — overturned evidence from several less-rigorous, non-randomized studies suggesting that hydroxychloroquine reduced the rate of illness and death in COVID patients by 50 percent or more. The history of medicine is replete with such examples, including cases where RCTs overturned well-established treatments widely believed to be effective based on non-randomized studies, such as hormone replacement therapy to prevent heart disease in postmenopausal women and bone marrow transplants to treat women with advanced breast cancer.
The limitations of non-randomized studies are well understood in medicine, with experts from the National Institutes of Health and FDA routinely awaiting RCT results before making definitive judgments about the efficacy of medical treatments. But public officials and researchers in social policy (including areas such as education, poverty reduction, and crime and policing) should also take heed: The same principles apply to their fields.
The advantages of RCTs, and the limitations of non-randomized studies, transcend fields. An RCT randomly assigns a sizable number of people either to a treatment group that receives a new treatment or program or to a control group that does not. Unique among evaluation methods, this process ensures that there are no systematic differences between the two groups in observable characteristics (such as income, gender, age) or unobservable characteristics (such as motivation, psychological resilience, family support). Because the comparison is apples-to-apples, any difference in outcomes between the two groups can confidently be attributed to the treatment and not to other factors.
By contrast, a typical non-randomized study compares people who choose to participate in a program with a group of nonparticipants who have similar observable characteristics. Such a study cannot rule out the possibility that the two groups differ in unobservable characteristics such as motivation. Because this comparison may not be truly apples-to-apples, such studies are best viewed as a source of preliminary or promising hypotheses that, like the initial hydroxychloroquine findings, merit testing in RCTs wherever feasible.
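The contrast between the two designs can be illustrated with a small simulation. The sketch below is a hypothetical example, not data from any study cited here: it assumes a program with no true effect, an unobserved "motivation" trait that raises outcomes, and participants who enroll only when their motivation is above average. The function name, effect sizes, and sample size are all illustrative assumptions, and for simplicity the non-randomized comparison does no matching on observable characteristics.

```python
import random
import statistics


def simulate(n=20000, true_effect=0.0, seed=0):
    """Compare a self-selected study with an RCT of the same (hypothetical) program."""
    rng = random.Random(seed)

    # Unobserved trait: invented for illustration, never seen by the evaluator.
    motivation = [rng.gauss(0, 1) for _ in range(n)]

    def outcome(m, treated):
        # Assumed outcome model: motivation helps; the program adds true_effect.
        return 2.0 * m + true_effect * treated + rng.gauss(0, 1)

    def diff_in_means(assignments):
        treated, control = [], []
        for m, t in zip(motivation, assignments):
            (treated if t else control).append(outcome(m, t))
        return statistics.mean(treated) - statistics.mean(control)

    # Non-randomized design: more motivated people choose to participate.
    self_selected = [m > 0 for m in motivation]
    # RCT: a coin flip decides, so motivation is balanced across groups.
    randomized = [rng.random() < 0.5 for _ in range(n)]

    return diff_in_means(self_selected), diff_in_means(randomized)


naive_estimate, rct_estimate = simulate()
print(f"Self-selected comparison: {naive_estimate:+.2f}")
print(f"Randomized comparison:    {rct_estimate:+.2f}")
```

Under these assumptions, the self-selected comparison reports a large apparent benefit even though the program does nothing, while the randomized comparison lands close to zero. The gap comes entirely from the unobserved trait, which is the pattern the hydroxychloroquine reversal illustrates.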
Yet, in social policy, public officials and researchers routinely rely on such preliminary findings when championing and funding programs and policies, sometimes even in the face of contrary RCT evidence.
For example, the Trump Administration’s fiscal year 2021 budget request for charter schools cites non-randomized studies as evidence that charter schools improve student achievement, while never mentioning the disappointing findings from the only large RCT to measure the effects of charter schools as a general educational strategy. The RCT, a federally sponsored study with a sample of nearly 3,000 students who applied to 36 charter middle schools across 15 states, found that, on average, the schools produced no significant gains in reading or math achievement or, in a long-term follow-up, in college enrollment or completion.
Similarly, proponents of universal pre-k often cite non-randomized studies, or small preliminary RCTs from the 1960s and 1970s, as evidence that pre-k is an effective general strategy to improve children’s educational and other life outcomes, while overlooking or downplaying far more credible (and disappointing) findings from the two large RCTs on the topic. Both RCTs, the National Head Start Impact study (with a sample of 4,667 children nationwide) and the Tennessee Voluntary Pre-K study (with a sample of 3,131 children statewide), found that, on average, pre-k produced short-term gains in child outcomes during the pre-k year that unfortunately faded out in early elementary school.
If we wish to make progress in education and other social policy areas, public officials must take a page from the medical playbook by building and prioritizing evidence from large RCTs. For example, while the RCT evidence on charter schools as a general strategy is disappointing, RCTs show that specific types of public charter schools, such as the KIPP model, are highly effective, producing sizable gains in student achievement. These models should be widely expanded, as they now serve only a fraction of the students who could benefit. This would be analogous to the medical field’s rapid expansion of the drugs remdesivir and dexamethasone for the treatment of COVID, based on RCT findings of important improvements in patient health.
Similarly, initial RCTs have identified specific pre-k curricula, such as Breakthrough to Literacy, as having promising effects on student achievement through early elementary school. Policy officials should prioritize further research to determine whether these initial findings can be reproduced in larger, more definitive RCTs; if so, these curricula should be widely expanded.
More generally, social policy needs a major innovation and testing strategy aimed at building a body of proven-effective strategies, analogous to that in place for COVID. Through rigorous testing of treatments and vaccines, we will ultimately prevail over COVID, diminishing its toll of illness and mortality, as we have with AIDS, polio, and other diseases. The key to victory is trial-and-error testing of many approaches, with large RCTs as the final determinant of success.
In social policy, by contrast, we have been fighting major problems like educational failure, stagnant wages, poverty, crime, and opioid abuse for years with unproven strategies or even strategies already found to produce disappointing results. While RCTs are far less common in social policy than in medicine, they have shown the same ability to identify exceptional programs that produce important improvements in people’s lives. If we hope to make progress on the nation’s social problems, we must deploy RCTs on a much larger scale.