Bad Pharma: How Medicine is Broken, And How We Can Fix It. Ben Goldacre
place, because you need to run the experiment many more times to prove the absence of a finding, simply because of the way that the statistics of detecting weak effects work; and you also need to be absolutely certain that you’ve excluded all technical problems, to avoid getting egg on your face if your replication turns out to have been inadequate. These barriers to refutation may partly explain why it’s so easy to get away with publishing findings that ultimately turn out to be wrong.31
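The point about the statistics of weak effects can be made concrete with a back-of-the-envelope power calculation. The sketch below uses the standard normal-approximation formula for comparing two groups; the effect sizes and power levels are illustrative assumptions, not figures from the text.

```python
# Rough sample-size sketch (normal approximation, two-group comparison):
#   n per group ≈ 2 * (z_alpha + z_beta)^2 / d^2
# where d is the standardised effect size. Illustrative numbers only.
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # two-sided significance threshold
    z_beta = z(power)            # desired power
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2

strong = n_per_group(0.8)   # a large, obvious effect
weak = n_per_group(0.2)     # a weak effect
print(round(strong), round(weak))  # the weak effect needs ~16x the sample
```

Halving the detectable effect size quadruples the required sample, which is why a replication aiming to rule out even a small residual effect needs far more subjects than the original study did.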
Publication bias is not just a problem in the more abstract corners of psychology research. In 2012 a group of researchers reported in the journal Nature how they tried to replicate fifty-three early laboratory studies of promising targets for cancer treatments: forty-seven of the fifty-three could not be replicated.32 This study has serious implications for the development of new drugs in medicine, because such unreplicable findings are not simply an abstract academic issue: researchers build theories on the back of them, trust that they’re valid, and investigate the same idea using other methods. If they are simply being led down the garden path, chasing up fluke errors, then huge amounts of research money and effort are being wasted, and the discovery of new medical treatments is being seriously retarded.
The authors of the study were clear on both the cause of and the solution for this problem. Fluke findings, they explained, are often more likely to be submitted to journals – and more likely to be published – than boring, negative ones. We should give more incentives to academics for publishing negative results; but we should also give them more opportunity.
This means changing the behaviour of academic journals, and here we are faced with a problem. Although they are usually academics themselves, journal editors have their own interests and agendas, and have more in common with everyday journalists and newspaper editors than some of them might wish to admit, as the episode of the precognition experiment above illustrates very clearly. Whether journals like this are a sensible model for communicating research at all is a hotly debated subject in academia, but this is the current situation. Journals are the gatekeepers, they make decisions on what’s relevant and interesting for their audience, and they compete for readers.
This can lead them to behave in ways that don’t reflect the best interests of science, because an individual journal’s desire to provide colourful content might conflict with the collective need to provide a comprehensive picture of the evidence. In newspaper journalism, there is a well-known aphorism: ‘When a dog bites a man, that’s not news; but when a man bites a dog …’ These judgements on newsworthiness in mainstream media have even been demonstrated quantitatively. One study in 2003, for example, looked at the BBC’s health news coverage over several months, and calculated how many people had to die from a given cause for one story to appear. 8,571 people died from smoking for each story about smoking; but there were three stories for every death from new variant CJD, or ‘mad cow disease’.33 Another, in 1992, looked at print-media coverage of drug deaths, and found that you needed 265 deaths from paracetamol poisoning for one story about such a death to appear in a paper; but every death from MDMA received, on average, one piece of news coverage.34
If similar judgements are influencing the content of academic journals, then we have a problem. But can it really be the case that academic journals are the bottleneck, preventing doctors and academics from having access to unflattering trial results about the safety and effectiveness of the drugs they use? This argument is commonly deployed by industry, and researchers too are often keen to blame journals for rejecting negative findings en masse. Luckily, this has been the subject of some research; and overall, while journals aren’t blameless, it’s hard to claim that they are the main source of this serious public-health problem. This is especially so since there are whole academic journals dedicated to publishing clinical trials, with a commitment to publishing negative results written into their constitutions.
But to be kind, for the sake of completeness, and because industry and researchers are so keen to pass the blame on to academic journals, we can see if what they claim is true.
One survey simply asked the authors of unpublished work if they had ever submitted it for publication. One hundred and twenty-four unpublished results were identified, by following up on every study approved by a group of US ethics committees, and when the researchers contacted the teams behind the unpublished results, it turned out that only six papers had ever actually been submitted and rejected.35 Perhaps, you might say, this was a freak finding. Another approach is to follow up all the papers submitted to one journal, and see if those with negative results are rejected more often. Where this has been tried, the journals seem blameless: 745 manuscripts submitted to the Journal of the American Medical Association (JAMA) were followed up, and there was no difference in acceptance rate for significant and non-significant findings.36 The same thing has been tried with papers submitted to the BMJ, the Lancet, Annals of Internal Medicine and the Journal of Bone and Joint Surgery.37 Again and again, no effect was found. Some have argued that this might still represent evidence of editorial bias, if academics know that manuscripts with negative results have to be of higher quality before submission, to get past editors’ prejudices. It’s also possible that the journals played fair when they knew they were being watched, although turning around an entire publishing operation for one brief performance would be tough.
These studies all involved observing what has happened in normal practice. One last option is to run an experiment, sending identical papers to various journals, but changing the direction of the results at random, to see if that makes any difference to the acceptance rates. This isn’t something you’d want to do very often, because it wastes a lot of people’s time; but since publication bias matters, it has been regarded as a justifiable intrusion on a few occasions.
In 1990 a researcher called Epstein created a series of fictitious papers, with identical methods and presentation, differing only in whether they reported positive or negative results. He sent them at random to 146 social-work journals: the positive papers were accepted 35 per cent of the time, and the negative ones 26 per cent of the time, a difference that wasn’t large enough to be statistically significant.38
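The Epstein figures can be checked with a standard two-proportion z-test. The even 73/73 split between arms below is an assumption for illustration; the text gives only the total of 146 journals.

```python
# Two-proportion z-test sketch for the Epstein figures (35% vs 26%
# acceptance). The 73-per-arm split is an assumed, illustrative detail.
from math import sqrt
from statistics import NormalDist

def two_prop_z(p1, n1, p2, n2):
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)      # pooled acceptance rate
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value

z, p = two_prop_z(0.35, 73, 0.26, 73)
print(f"z = {z:.2f}, p = {p:.2f}")  # p well above 0.05: not significant
```

A nine-point gap in acceptance rates sounds substantial, but across roughly 146 submissions it is comfortably within the range that chance alone would produce.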
Other studies have tried something similar on a smaller scale, not submitting a paper to a journal, but rather, with the assistance of the journal, sending spoof academic papers to individual peer reviewers: these people do not make the final decision on publication, but they do give advice to editors, so a window into their behaviour would be useful. These studies have had more mixed results. In one from 1977, sham papers with identical methods but different results were sent to seventy-five reviewers. Some bias was found from reviewers against findings that disagreed with their own views.39
Another study, from 1994, looked at reviewers’ responses to a paper on TENS machines: these are fairly controversial devices sold for pain relief. Thirty-three reviewers with strong views one way or the other were identified, and again it was found that their judgements on the paper were broadly correlated with their pre-existing views, though the study was small.40 Another paper did the same thing with papers on quack treatments; it found that the direction of findings had no effect on reviewers from mainstream medical journals deciding whether to accept them.41
One final randomised trial from 2010 tried on a grand scale to see if reviewers really do reject ideas based on their pre-existing beliefs (a good indicator of whether journals are biased by results, when they should be focused simply on whether a study is properly designed and conducted). Fabricated papers were sent to over two hundred reviewers, and they were all identical, except for the results they reported: half of the reviewers got results they would like, half got results they wouldn’t. Reviewers were more likely to recommend publication if they received the version of the manuscript with results they’d like (97 per cent vs 80 per cent), more likely to detect errors in a manuscript whose results they didn’t like, and rated the methods more highly in papers whose results they liked.42
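Unlike the Epstein result, the 97 per cent versus 80 per cent gap here is far too large to be a fluke at this sample size, as a standard two-proportion z-test shows. The 100-per-arm split is an assumption for illustration; the text says only that there were over two hundred reviewers in total.

```python
# Two-proportion z-test for the reviewer trial's headline figures
# (97% vs 80% recommending publication). 100 per arm is assumed.
from math import sqrt
from statistics import NormalDist

n1 = n2 = 100
p1, p2 = 0.97, 0.80
pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, p = {p_value:.4f}")  # far below 0.05: a clear effect
```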
Overall, though, even if there are clearly rough edges in some domains, these results don’t suggest that the journals are the main cause of the problem of the disappearance of negative trials. In the experiments isolating the peer reviewers, those individual referees were biased in some studies, but they don’t have the last word on publication,