Harry Marks, “Trust and Mistrust in the Marketplace: Statistics and Clinical Research, 1945–60”
As part of a historical study of the development of double-blind controlled trials, Harry Marks seeks to identify the objects of mistrust within medical research, focusing on the question of who or what is mistrusted. Marks begins by tracing the roots of what he later dubs an “ideological cult of impersonality”, whose purpose was to purge the effects of individual subjectivity from researchers’ observations. These techniques evolved over the following centuries to the point where not only patients but also doctors and researchers were figuratively blindfolded in order to reduce subjective bias. By the 1950s, the objects of mistrust were no longer limited to patients; they included general medical practitioners, research physicians, nurses, and, most importantly, pharmaceutical manufacturers.
A campaign for therapeutic reform dates back to the end of the 19th century; its purpose was to provide clinicians with reliable evaluations of new drugs. Joined in the 1940s by mathematical statisticians, experimenters gained new ideas about experimental rigour and statistical inference. The focus of Marks’ paper is accordingly on the methodological reforms: objective measurements, randomized experiments, placebo controls, and double blinding. Researchers sought to impose on medical research the same kinds of standards applied to laboratory scientists, who planned series of experiments designed to eliminate errors through repetition. Two kinds of errors concerned researchers: first, errors rooted in human fallibility (errors of perception or judgment); second, errors due to prejudice and preconceptions.
Fears that practitioners were unable to discern the value of new drugs led to the introduction of the randomized controlled trial as the yardstick by which to measure such claims. Patient selection by physicians soon emerged as a potential problem, so randomization was introduced, and researchers promoted the mechanization of treatment assignment as a way to eliminate even researcher bias. The belief was that bias could be eliminated by keeping knowledge from contaminating the results, which is why, at every point in the research procedure, everyone was kept in the dark. Standing against the background of reasoned findings were the gullible physician, the misled researcher, the naturally sympathetic nurse, and the exploitative manufacturer. By filling researchers’ and practitioners’ minds with these characteristic figures of mistrust, reformers were able to garner support for new research techniques.
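The mechanized, blinded treatment assignment described above can be illustrated with a minimal sketch (not from Marks’ paper; all names and the two-arm setup are hypothetical). The point is the separation of roles: patients and clinicians see only opaque arm labels, while the unblinding key is held separately.

```python
import random

def assign_treatments(patient_ids, seed=None):
    """Randomly assign each patient to a blinded arm label.

    Returns (assignments, key): 'assignments' maps patient -> arm label
    ('A' or 'B'), while 'key' maps label -> actual treatment. In a
    double-blind design, only the trial statistician holds the key;
    clinicians, nurses, and patients see only the labels.
    """
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    key = {"A": "drug", "B": "placebo"}
    assignments = {pid: rng.choice(["A", "B"]) for pid in patient_ids}
    return assignments, key

assignments, key = assign_treatments(["p1", "p2", "p3", "p4"], seed=42)
```

Handing assignment to a random mechanism, rather than the treating physician, is precisely the reform that was meant to remove selection bias.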
Drawing on these objects of mistrust, reformers employed rhetoric as much as reason in an effort to persuade researchers to adopt the new techniques. Epistemic arguments extolling the virtues of randomization and statistical inference were seen as arcane, so reformers drew instead on concerns about potential (even unconscious) bias. The result was that statistical review became the new standard for medical journals, and the law came to mandate efficacy results based on randomized clinical trials. Yet, despite methodological progress, mistrust endures, which is why researchers are still required to disclose financial ties; the “marketplace remains suspect.”
Jason Grossman, “The Randomized Controlled Trial: gold standard, or merely standard?”
Jason Grossman argues that, contrary to popular belief, the randomized controlled trial (RCT) is not the gold standard; it may be a good experimental design in some cases, but certainly not all, even though it is routinely considered superior to every other type of evidence. With the rise of evidence-based medicine (EBM), the RCT has become the exclusive standard for determining how research funding is allocated, which studies are deemed worthy of publication, and which public health interventions to favour, because it is seen as the most scientific and rigorous study design available. Grossman argues that while the RCT may be the most valuable kind of study for phase III drug trials, it should not be considered superior to any and all studies not using this design.
He identifies two major problems: first, no single study design is the right choice for every situation; and second, researchers suffer from “blinkered vision” because, while they note the problems with observational studies, they fail to notice the problems inherent in RCTs. Proponents of alternative designs face almost insurmountable hurdles: RCTs are accepted without significant critical appraisal, while even good observational studies are dismissed without question. This attitude is reinforced by evidence hierarchies that place RCTs above all others, but there are some encouraging signs of change, as some propose evidence hierarchy matrices that assign different weights to each study design depending on the specific research question.
The problem with evidence hierarchies in general is that they fail to take notice of poor-quality results coming from RCTs, but even here the tide is turning. While Grossman acknowledges potential problems with non-RCT studies, the purpose of his paper is to dispel the myth of the RCT as the gold standard by citing some of its problems (the unit of measure and randomization serve as prime examples). Randomization is problematic because, in its attempt to reduce bias, it falls victim to the “illusion of homogeneity”, failing to account for other factors that can go wrong.
Grossman identifies social contexts as a particularly troublesome area in which to employ RCTs. In studies investigating the negative effects of smoking, for example, neither blinding nor randomization is appropriate. Advocates who concede these shortcomings often do so while adding the proviso that, although an RCT may be neither possible nor desirable here, the results of alternative observational or historically controlled trials ought nonetheless to be considered sub-optimal in comparison.
With the majority of funding, publication, and policy decisions based primarily on RCTs, Grossman’s attempt to temper enthusiasm for them should be seen as an attempt to revalidate alternative approaches. His final conclusion is not merely that RCTs are not always the best choice of study; he argues for the stronger claim that their results are often markedly worse than those of observational studies.
Linsey McGoey, “Profitable Failure: antidepressant drugs and the triumph of flawed experiments”
The context for Linsey McGoey’s article is the Maudsley debate held at the Institute of Psychiatry, King’s College London, to address claims regarding the usefulness of anti-depressant drugs by debating the following motion: anti-depressant drugs are no better than placebos. Her article looks at the ramifications of the debate, and at how awareness of the methodological limitations of RCTs makes it difficult to determine the value of the drugs.
Irving Kirsch and Joanna Moncrieff argued for the motion. Kirsch conceded that the drugs outperform placebos statistically, but emphasized that what matters is clinical effectiveness: while the trial results showed statistically significant benefits, they showed no clinically significant advantage over placebos. Moncrieff argued that any perceived benefits of the drugs had to do with their sedative qualities, not with the active ingredients’ ability to cure depression.
Guy Goodwin and Lewis Wolpert argued against the motion. Goodwin echoed Kirsch’s observation that clinical trials are often not representative of clinical practice, but called for improved clinical methodologies to overcome the criticism; his proposed solutions included overcoming the placebo effect and adjusting patient recruitment policies. Wolpert, a sufferer of depression himself, touted his own personal success with anti-depressants, suggesting that if you do not have depression, it may be difficult to see the value of the drugs.
Three things complicate Kirsch’s findings:
- Usefulness of rating scales – three problems emerge with the scales: (a) an apparent change may be the result of sedative properties; (b) changes in scale scores can be presented as evidence of a treatment’s efficacy; (c) RCT results are given disproportionate importance by conflating statistical significance with clinical usefulness.
- Problems determining clinical significance – even if it can be shown that clinical effectiveness is less than published clinical trial data suggest, how to interpret such results remains debatable. Some argue that the threshold for efficacy is arbitrary, and that even an improvement below the threshold may prove valuable to patients.
- Recruitment bias – those recruited for the trials are often not representative of the potential therapeutic population. Ethical restrictions preclude the recruitment of severely depressed patients, meaning that those taking part in a study may not be as depressed as recruiters purport them to be. Consequently, patients may respond well to placebos or drugs simply because of the relief they feel at taking part in the study, as opposed to the placebo effect or the active ingredient in the drug.
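The gap between statistical and clinical significance that runs through these three complications can be made concrete with a small worked example. The numbers below are hypothetical: a modest mean improvement on a depression rating scale, a conventional two-sided 5% significance test, and a 3-point clinical cutoff (a figure of the kind invoked in the antidepressant debate, used here purely as an illustrative threshold).

```python
import math

def z_statistic(mean_diff, sd, n_per_arm):
    """Two-sample z statistic for a difference in means,
    assuming equal group sizes and a common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    return mean_diff / standard_error

# Hypothetical trial: drug beats placebo by 1.5 points on the scale,
# sd = 8, with 1500 patients per arm.
mean_diff = 1.5
z = z_statistic(mean_diff, sd=8.0, n_per_arm=1500)

statistically_significant = abs(z) > 1.96   # two-sided p < 0.05
clinically_significant = mean_diff >= 3.0   # illustrative clinical cutoff
```

With a large enough sample, the 1.5-point difference is comfortably significant statistically while falling well short of the clinical threshold, which is exactly the conflation point (c) above warns against.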
Efforts have been undertaken to weed out patients prone to the placebo effect by conducting a single-blind pre-trial test, eliminating patients who immediately respond to the placebo through what is called the mouse-trap technique.
Researchers are placed in a paradoxical position: in order to show the weaknesses of an RCT, some sort of evidence is needed, but the only accepted source of evidence is further trials. Supporters and doubters must then debate their results without acknowledging the potential methodological weaknesses that render their conclusions questionable. Despite the inherent weaknesses of the trials, and the apparent paradox, researchers are more committed than ever to them, because they appear to be the only way to reduce bias – even though this is further problematized by the fact that those who conduct the trials are the manufacturers, who themselves have a financial stake. Ironically, the more useless RCTs turn out to be in practice, the more people advocate further trials as a way to remedy the failures of previous ones.