A vaccine against bias
In an exclusive extract from his new book, Chris Chambers (a professor of cognitive neuroscience at the school of psychology, Cardiff University) tells the story of his contribution to the development of ‘Registered Reports’; plus a Q+A.
10 May 2017
My earliest experience of publication bias came in 1999 with the rejection of my first scientific paper.3 "The results are only moderately interesting," chided an anonymous reviewer. "The methods are solid but the findings are not very important," said another. "We can only publish the most novel studies," declared the editor as he frog-marched me and my boring paper to the door. The experience was not what I expected. I was 22, a fresh-faced PhD student in his first year, instilled with the belief that science was a process of objective truth seeking. The very last thing I expected was that a scientific journal – and one so highly respected in my field – would reject a piece of theoretically and methodologically sound research based on the results. It just didn't make sense and seemed incredibly unfair. How could we be expected to control the results? Didn't this just turn publishing into a lottery? Surely there had to be some mistake?
My supervisor wasn't surprised. Although he had approved a draft of the manuscript before submission, he had harbored reservations about the way it was presented. "It didn't really tell a story," he said. "I could see what you were trying to do, but you have to give the reviewers a narrative – something they can get excited about – or they will find your paper boring and reject it." The conversation was formative in my training. "But the results are the results," I exclaimed. "Shouldn't we just let the data tell the story?" He shook his head. "That's not how science works, Chris. Data don't tell stories, scientists tell stories."
Over the next decade I heeded those words. We told all kinds of stories based on data. Big stories led to papers in zeitgeist journals like Nature Neuroscience, Neuron, and PNAS. Smaller stories led us to solid specialist journals. We'd carefully design and conduct experiments and then weave compelling narratives out of the results. It was fun and we published well. We were awarded grants. I got a career. Everybody won.
Or did they? Every paper had a story, but was that story the most unbiased representation of the data? Of course not. We would gloss over imperfections or noisy results that were hard to explain, burying the nuances in "supplementary information" or not mentioning them at all. I'm certain that I committed the sin of hidden flexibility many times, and no doubt at an unconscious level even more than I realize. Every paper had a pithy logline and a tale to tell. As hard as we worked on the science – and we worked very hard to design experiments that were theoretically meaningful – we worked tirelessly on the storytelling.
The strategy worked. In 2006 I was awarded a fellowship that allowed me to move to the UK and establish my own research team at a major London university. Initially I practiced the same tried-and-true formula, one that my new institute already embraced: novel experiment + great results + great storytelling = publication in a prestigious journal. To succeed you needed every element in place; if even one was missing or below par then your study would end up in a lower-ranking specialist journal or the file drawer. By now, however, this style of science was starting to grate on me. I had always found the publish-or-perish culture unappealing, and the pressure at my new institute was higher than ever. As scientists, the one part of an experiment we were supposed (in theory) to relinquish control over was the results. To teach this but to nevertheless pin success on "good results" was a devil's temptation toward bias, questionable research practices, and fraud. I was tired of watching colleagues analyzing their data a hundred different ways, praying like gamblers at a roulette wheel for signs of statistical significance. I was fed up with the inexorable analysis and reanalysis of brain imaging data until a publishable story emerged from an underpowered design. I also became dubious about the robustness of my own work. As a friend dryly observed, "You guys do high-impact work, but you're like magpies. You do one study and move on to something else without ever following it up." He was right, of course. "That's the game," I admitted. "Why would anyone waste time doing the same experiment twice when no funder will pay for it and no top journal will publish it? And why would you take the risk of failing to replicate yourself?" Talk about shooting yourself in the foot.
As my doubts grew, the idealism of my younger self began to reassert itself. My institute was packed to the rafters with brilliant people, but it felt like the scientific equivalent of an elite telemarketing company. Walking in the front door each morning you would face off against a wall-mounted screen listing this week's "top sellers" – the roll call of who published what in which prestigious journal. Nature Neuroscience. Current Biology. Science. The last-author credit would always belong to one of the professors, with the first-author slot filled by a tireless protégé who barely left the building. If your name wasn't on that list (and mine usually wasn't) you felt small, inadequate, an imposter. You pushed yourself harder. You pushed your staff and students harder. As the director of the institute at the time – and a valued mentor – once told me: "This place is powered by appetite. You keep the young researchers hungry. You keep them on fixed-term contracts with uncertain futures, and you place them in competition with the world and each other. The result is high-octane science." I was never convinced that he really believed in this philosophy, but it was the reality within those walls.
After two years in London I left for a more stable academic career, and admittedly a more relaxed professional lifestyle. For several years I continued cranking the handle before a series of events reawakened my idealism with a jolt. In 2011 we had a paper rejected by the Journal of Cognitive Neuroscience for the main reason that, because one of the experiments was a close replication of the other, the study and results weren't considered sufficiently novel or important to be publishable. One of the reviewers even told the editor in a private comment (obtained by us only after we had unsuccessfully appealed the rejection): "The methods are strong, but the outcome produced results that aren't particularly groundbreaking."
A year later I snapped. In September 2012 we received a rejection letter from the specialist journal Neuropsychologia, in which our paper was declined because one of the analyses (and not even a test that was important for the conclusions) returned a statistically nonsignificant outcome. We hadn't glossed over it; we hadn't massaged it away by p-hacking or HARKing; we had simply reported it honestly and transparently. It was like 1999 all over again. Unpublishable results. Boring study. Reject. A few days later I sat down and wrote a letter to the editors of Neuropsychologia, which I also posted on my personal blog.4 I thanked them for their work on our paper and made it clear that what I was about to say wasn't personal. I had reviewed for the journal and published in it many times, but I was now severing all ties. "My real problem is that by rejecting papers based on imperfect results, Neuropsychologia reinforces bad scientific practice and promotes false discoveries. . . . For this reason, my main purpose in writing is to inform you that I will no longer be submitting manuscripts to Neuropsychologia or reviewing them. I will be encouraging my colleagues similarly."
In the meantime I began to form an idea for a solution. I had been following the work of blogger Neuroskeptic, who for several years had been proposing study preregistration as a way to improve transparency and reproducibility.5 In his articles, and in the comments below them, you could always find a vigorous debate about the potential benefits and drawbacks of preregistration. Requiring authors to prespecify their hypotheses and analysis plans held great promise for reducing a wide range of questionable research practices, such as p-hacking and HARKing. Furthermore, if journals could somehow be convinced to accept preregistered studies without knowledge of the results then it would also prevent publication bias and eliminate the incentive for authors to engage in questionable practices in the first place. It seemed to me a brilliant solution to many problems. Suddenly there would be no further need for excessive storytelling, no need to gloss over inconsistencies and "messiness" of data. But how could you persuade authors and journals to actually do this? Should it be mandatory? Would the process be too cumbersome and bureaucratic? There were dozens of unanswered questions, but I found the debates fascinating, and I had an idea. I just needed a journal to try it in.
An opportunity soon came. A few weeks after publishing my open letter to Neuropsychologia, the chief editor of Cortex, Sergio Della Sala, invited me to join his editorial board. I accepted and immediately set to work on a proposal for a new type of empirical article called a Registered Report in which study protocols (including introduction and methods) would be reviewed before authors gathered their data.6 Registered Reports stem from the simple philosophy that the publishable quality of science should be judged according to the importance of the research question and rigor of the methodology, and never based on whether or not the hypothesis was supported. This wasn't a new idea and certainly wasn't mine. In addition to Neuroskeptic's blog posts calling for preregistration, there had been several proposals within the last 50 years for such a publishing mechanism. As far back as 1966, psychologist Robert Rosenthal wrote:
What we may need is a system for evaluating research based only on the procedures employed. If the procedures are judged appropriate, sensible, and sufficiently rigorous to permit conclusions from the results, the research cannot then be judged inconclusive on the basis of the results and rejected by the referees or editors. Whether the procedures were adequate would be judged independently of the outcome.7
A few years later, in 1970, G. William Walster and T. Anne Cleary from the University of Wisconsin offered the same idea:
In proposing this alternative policy, we argue that all decisions involving the treatment of data should be considered design decisions. Then, since the decision to publish the results of a study is particular treatment of data, it follows that the same limitations should be imposed on publication decisions as are imposed on all designs. When one views publication in this way, it becomes immediately clear that a specific change should be made in current policy. There is a cardinal rule in experimental design that any decision regarding the treatment of data must be made prior to an inspection of the data. If this rule is extended to publication decisions, it follows that when an article is submitted to a journal for review, the results should be withheld. This would insure that the decision to publish, or not to publish, would be unrelated to the outcome of the research.8
Neither Rosenthal's nor Walster and Cleary's proposals were ever implemented, but perhaps Registered Reports could be our chance to finally do so. In the model I had in mind, protocols considered scientifically important and robust and that met strict guidelines for prospective rigor would be offered "in principle acceptance." The journal would then commit to publishing the outcomes regardless of how the results turned out, provided that the authors adhered to their preregistered protocol, that various prespecified quality checks were passed, and that the conclusions were based on the evidence obtained.
Three days later my proposal was ready for feedback. I distributed it to the editorial board, and at the same time I also posted it as an open letter on my blog.9 I knew that publishing an open letter would be controversial as journals are accustomed to considering such proposals behind closed doors rather than in sight of the public. But I had resolved to take an open route for two reasons. My foremost concern was that the proposal might have a glaring flaw that I hadn't considered, and the best way to find that out was to crowdsource critical feedback from the wider community, including those who had commented on Neuroskeptic's earlier proposals. And secondly, I wanted the journal – and me – to be held accountable for whatever decision it reached about Registered Reports. I had heard on the grapevine that similar ideas had been mooted in the past at other journals and binned by conservative boards amid concerns that accepting papers before the results were known could force the journals to publish negative or inconclusive outcomes. And even though I had known Sergio Della Sala for many years and respected him, the wider Cortex editorial board was an unknown quantity. I had no idea how the board would react in private, but going public provided a crucial test of the idea. If the journal decided to reject Registered Reports then its reasons would need to be sufficiently defensible to survive public scrutiny. And if the journal adopted them then the community, having been involved since the beginning, could play a role in shaping the initiative. In one case we would learn something important; in the other we would hopefully do something important.10
Within a few days my open letter had attracted thousands of views and dozens of comments below the line. The feedback contained many constructive suggestions for improvement, most of which I integrated into the proposal. However, my strategy divided the Cortex editorial board. Some editors were supportive of Registered Reports, but many were not, though none of the strongest opponents ever expressed their views to me personally. Several board members also felt that I had risen above my station in proposing the initiative in such a public way, just a few days after being invited to join the editorial board.
Their response was understandable. I knew that the open letter would seem like a coup to some, but I decided that the ire of a few editors was a small price to pay for exposing Registered Reports to the crucible of public opinion and giving it the best possible chance of a fair hearing. Fortunately, the chief editor was strongly in favor. A month later, in November 2012, Cortex became the first journal to approve Registered Reports. We assembled an editorial subcommittee to handle submissions and prepared for the launch in May 2013.11
How do Registered Reports work? Unlike conventional papers where peer review happens after the entire study is finished and written up, here the review process is split into two stages (see figure 8.1). At Stage 1, authors submit an introduction, proposed methods and the analysis plan before they have collected data. These are initially triaged for scientific significance, clarity, and adherence to specific Stage 1 review criteria. I took a lot of time to shape these criteria based on ideas and feedback from Neuroskeptic and many others. Editors and reviewers at Stage 1 assess:
- The significance of the research question(s)
- The logic, rationale, and plausibility of the proposed hypotheses
- The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis)
- Whether the clarity and degree of methodological detail would be sufficient to replicate exactly the proposed experimental procedures and analysis pipeline
- Whether the authors provide a sufficiently clear and detailed description of the methods to prevent undisclosed flexibility in the experimental procedures or analysis pipeline
- Whether the authors have considered sufficient outcome-neutral conditions (e.g., absence of floor or ceiling effects; positive controls) for ensuring that the results obtained are able to test the stated hypotheses
Let's take a moment to explore these points in more detail. The first two criteria are designed to test the scientific credibility of the proposal. Is the research question important? Do the hypotheses arise logically from the literature? The third criterion tests whether the proposed methods are rigorous and realistic, with particular emphasis on statistical power. As we saw in chapter 3, underpowered experiments are common in psychology and neuroscience, increasing the rate of false negatives and false positives. At Cortex we decided to set a minimum statistical power of 0.9, meaning that the experiment must have a sufficiently large sample that all statistical hypothesis tests have a 90 percent chance of correctly rejecting a false null hypothesis. We also invite authors to consider alternative Bayesian sampling and hypothesis-testing methods in which statistical power is irrelevant and data are simply acquired until a sufficiently decisive answer is reached.
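To make the power requirement concrete, here is a minimal sketch, in Python, of the kind of calculation a Stage 1 protocol might report; it is not taken from the Cortex guidelines, and the assumed effect size and alpha level are purely illustrative.

```python
# Hypothetical Stage 1 power analysis: sample size per group needed for 90%
# power to detect an assumed effect of Cohen's d = 0.5 in a two-sided,
# two-sample t test. The effect size and alpha are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # assumed standardized mean difference (Cohen's d)
    alpha=0.05,               # conventional significance threshold
    power=0.90,               # the minimum power required at Cortex
    alternative="two-sided",
)
print(n_per_group)            # ~85; rounded up to whole participants in practice
```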
The fourth and fifth criteria focus on the extent of detail in the proposed methods. Are the study procedures elaborate enough to provide a replication "recipe" that could be repeated by other researchers? This addresses the major cause of unreliability in psychology discussed in chapter 3: method sections are often too vague to permit replication. In addition, are the proposed analyses sufficiently precise to prevent "wriggle room" that could allow authors to either consciously or unconsciously exploit researcher degrees of freedom, such as p-hacking? Finally, the sixth criterion requires authors to specify, in advance, what quality checks and positive controls are required in order for the study to provide a fair test of their hypothesis.12 An experiment might produce meaningless results because the equipment was wrongly calibrated, or because behavioral performance from the participants was at ceiling or floor levels, or because a condition with a known effect failed (i.e., a reality check or "positive control"). To avoid publication bias, any such tests would have to be outcome-neutral, which is to say they must be independent of the study hypotheses. Keeping these tests independent of the hypotheses prevents the common bias, seen in status quo publishing, of authors and reviewers using positive or negative outcomes to decide whether or not an experiment "worked," and rejecting or accepting it accordingly.
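As an illustration of what an outcome-neutral check might look like in practice, here is a minimal, hypothetical sketch of a prespecified floor/ceiling test on an accuracy measure; the function name and thresholds are my own illustrative assumptions rather than anything mandated by the format.

```python
# Hypothetical outcome-neutral check: does mean task accuracy fall within the
# preregistered floor and ceiling bounds, so the data can actually test the
# hypotheses? The thresholds are illustrative, not drawn from any journal.
import numpy as np

def passes_outcome_neutral_check(accuracy, floor=0.10, ceiling=0.90):
    """Return True if mean accuracy lies strictly between the preregistered
    floor and ceiling bounds (task neither impossible nor trivially easy)."""
    mean_acc = float(np.mean(accuracy))
    return floor < mean_acc < ceiling

# Example with simulated proportion-correct scores for 40 participants
scores = np.random.default_rng(seed=1).uniform(0.55, 0.80, size=40)
print(passes_outcome_neutral_check(scores))  # True: a check independent of the hypotheses
```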
If the submitted manuscript passes editorial triage, it is then sent for in-depth peer review, where experts assess the proposed study rationale and methods against the Stage 1 criteria. Following a favorable assessment, which might require revision of the protocol, the journal can then offer the paper "in-principle acceptance" or IPA. Only once IPA is obtained can researchers implement the preregistered experiment. Following data collection and analysis, they resubmit a Stage 2 manuscript that includes the introduction and methods from the original submission plus the results and discussion. The results section of the completed manuscript must include the outcome of all preregistered analyses. Crucially, however, it can also include any additional unregistered analyses, provided they are clearly distinguished and identified as exploratory (post hoc) rather than confirmatory (preregistered). The authors are also required to deposit their data in a publicly accessible archive, addressing the sin of data hoarding. At Stage 2 the reviewers, who are ideally the same as at Stage 1, consider the following criteria:
- Whether the data are able to test the authors' proposed hypotheses by passing the approved outcome-neutral criteria (such as absence of floor and ceiling effects)
- Whether the introduction, rationale, and stated hypotheses are the same as the approved Stage 1 submission (required)
- Whether the authors adhered precisely to the registered experimental procedures
- Whether any unregistered post hoc analyses added by the authors are justified, methodologically sound, and informative
- Whether the authors' conclusions are justified given the data
The first of these criteria assesses whether the experiment passed the preapproved quality checks and positive controls. For example, if the study was on the effect of alcohol on cognitive function, did the authors confirm that the alcohol had the necessary intoxicating effect through an established questionnaire or other measure? If this positive control were to fail, it would suggest that the intervention was administered incorrectly, and thus that the hypothesis was not properly tested. The second and third criteria check the introduction and methods sections of the Stage 2 manuscript for consistency with the same sections of the approved Stage 1 protocol. Finally, the fourth and fifth criteria check that any additional exploratory analyses, and the overall study conclusions, are sensible. The finished article is published only after this process is complete.13
Note what is missing from the Stage 2 review criteria. There is no assessment of impact, novelty, or originality of results. There is no consideration of how conclusive the results happen to be. There is no weight placed on whether or not the experimental hypothesis was supported. Such considerations may be relevant for scientists in deciding whether a study is exciting or newsworthy, but they tell us nothing about scientific quality or the longer-term contribution of the study. For Registered Reports, the outcomes of hypothesis tests are irrelevant to whether a piece of scientific research meets a sufficiently high standard to warrant publication.
At around the time that Cortex launched Registered Reports in May 2013, similar initiatives began popping up at Perspectives on Psychological Science and Attention, Perception, and Psychophysics.14 My impression was that preregistration was fast transforming from theory into reality, but I also became aware that some journals were quietly shelving the idea without transparent debate. Those of us working to promote Registered Reports were hearing that chief editors, petitioned behind closed doors to adopt the initiative, were rejecting it on the grounds that it would lead to the publication of negative findings that might be cited less than standard articles, leading to a consequent drop in the journal's impact factor. I found this worship of impact factors deeply disappointing, and it reinforced my belief that closed-door politics are the antithesis of rational decision making (and are probably the reason that Registered Reports had gone nowhere since the 1960s).
My colleague Marcus Munafò and I decided that something needed to be done or the initiative could be killed by the sin of bean counting before it got started. Three days after the Cortex launch we met at a pub in Bristol and devised a plan. Over the next month we would assemble dozens of scientists and members of journal editorial boards, and, in June 2013, we published a joint open letter article in the Guardian headlined "Trust in Science Would Be Improved by Study Pre-registration."15 The article, which was eventually signed by more than 80 senior academics, called for all life science journals to offer Registered Reports as a new option for authors. Registered Reports were not going to go quietly into the night.
The storm that followed was astonishing. The publicity from the Guardian article opened the debate to a much wider community of researchers, and two opposing camps took shape. On one side were the reformers pushing for greater transparency and reproducibility; they were generally in favor of Registered Reports being adopted, if only to find out how the articles the format produced compared with standard publishing. On the opposite side was a rearguard of (often powerful) traditionalists who argued that preregistration would "put science in chains."16 I found these criticisms puzzling. Many were based on what appeared to be simple misunderstandings or elementary errors about the initiative. Had these people even read what we had written? Other reactions struck me as disingenuous misrepresentations that appeared to have no purpose except to derail the initiative and preserve the status quo. What follows are some of the main objections that emerged, my personal interpretation of the objection, and a longer explanation in each case:
- Registered Reports prevent exploration of data and curb scientific creativity. Verdict: False. This is perhaps the greatest misconception of the initiative. Authors of Registered Reports are welcome to perform unregistered exploratory analyses with as much creativity as they can muster, just as they would with standard publishing. The only requirement is that such analyses are labeled transparently as exploratory rather than being presented as confirmatory. I was particularly surprised at how many traditionalists clung (and still cling) to this argument, despite exploratory analyses being formally invited as part of Registered Reports policies at all adopting journals. It appears that some traditionalists not only want the freedom to conduct exploratory analyses (which Registered Reports explicitly welcome), but also want to be able to present those exploratory analyses as confirmatory hypothesis testing. I reached the rather unsettling conclusion that the traditionalists who continued to oppose Registered Reports on these grounds simply wanted the freedom to HARK, and because HARKing is socially unacceptable the opponents had no choice but to argue that Registered Reports would somehow prevent exploration.
- Registered Reports could lead to the denigration of exploratory, observational research. Verdict: Unsubstantiated and probably false. This is a subtler version of the first counterargument. The concern here is that developing a more robust mechanism for hypothesis testing would somehow lead to exploratory research (that is, research that doesn't involve a priori hypotheses) being sidelined and seen as second-class. However, this concern is illogical and speaks to a peculiar insecurity held by those who value exploration in science. All that Registered Reports do is to clarify the distinction between confirmatory hypothesis testing and more exploratory forms of analysis and hypothesis generation, both of which are valuable to science and welcomed as part of the format. But if the mere act of distinguishing one from the other denigrates exploratory research then what does that say about the value our community places in exploratory research in the first place? High rates of HARKing in psychology, as identified by Norbert Kerr and Leslie John (see chapters 1 and 2), show that exploratory research is indeed regularly crammed into an ill-fitting confirmatory framework, shoehorning Thomas Kuhn into Karl Popper for the sole purpose of achieving publication. Rather than criticizing Registered Reports for adding clarity to a system that champions obfuscation, why weren't the traditionalists developing parallel publishing initiatives that celebrate purely exploratory science? Indeed, if exploration mattered so much to them, why hadn't they done so years ago?17
- Registered Reports can be gamed by "preregistering" a study that is already completed. Verdict: True only for fraudsters. It's a curious sociological phenomenon that many otherwise reputable psychologists genuinely believe this is a possibility. Do they really hold their colleagues in such low regard? In any case, for Registered Reports such a strategy would not only be fruitless but would be impossible without committing fraud. When authors submit a Stage 2 manuscript it must be accompanied by a laboratory log indicating the range of dates during which data collection took place, together with a certification on behalf of all authors that no data (other than pilot data in the Stage 1 protocol) was collected prior to the date of IPA. Time-stamped raw data files generated by the preregistered study must also be deposited in a public archive, with the time stamps postdating in-principle acceptance. Submitting a Stage 1 protocol for a study that had already been completed would therefore require complex falsification of laboratory records and data time stamps. Even putting aside the fact that such behavior would be clearly fraudulent, traditionalists who raise this concern overlook a major problem with such a strategy: based on the comments of reviewers, editors usually require changes to the proposed experimental procedures following Stage 1 review. Even minor changes to a protocol would be impossible if the experiment had already been conducted, and would therefore defeat the purpose of preregistration. Unless the authors were willing to engage in further dishonesty about what their experimental procedures involved – a degree of fraud that is beyond redemption – "preregistering" a completed study would be a highly ineffective publication strategy.
- Registered Reports won't stop fraud. Verdict: Straw man. This was a common reaction and another straw man because Registered Reports are not designed to stop fraud. No publishing mechanism, least of all the status quo, can protect science against complex and premeditated acts of misconduct. What Registered Reports achieve, above all, is to eliminate publication bias along with the pressure to massage data, reinvent hypotheses, or behave dishonestly in the first place.
- Registered Reports lock authors into publishing with a specific journal. Verdict: False. Authors are free to withdraw their Registered Report submissions at any time – there is no binding contract with the journal. The only requirement is that study withdrawal after IPA leads to the publication of a Withdrawn Registration, which includes the abstract from the Stage 1 submission together with a reason for the withdrawal. This ensures that the process is transparent to the scientific community.
- Registered Reports fail to lock authors into publishing with a specific journal. Verdict: Red herring. Some traditionalists have criticized Registered Reports for exactly the opposite reason: that it could be gamed by authors precisely because there is no binding contract with the journal. Their argument goes like this. Suppose a researcher has a Stage 1 protocol accepted in principle with a specialist journal. They conduct their study but find something amazing and unexpected in the results that they feel could be sold to Nature or Science. What would stop them from withdrawing their paper from the specialist journal and resubmitting it as a conventional (unregistered) article to a more prestigious outlet? The answer is: nothing. But there is a catch. Authors would do so knowing that their choice would be transparent to their peers, because withdrawing a paper after IPA triggers publication of a Withdrawn Registration. It would be interesting to see how an author's peers would react to a Registered Report being withdrawn on the grounds that: "After finding something remarkable and unexpected, we decided we could publish this in a more prestigious journal." In many ways, such transparent careerism would be refreshing.
- Registered Reports are not suitable for exploratory science or for developing new methods where there are no hypotheses. Verdict: Red herring. This is a common objection but an irrelevant one, because the format isn't designed to be applicable to anything other than hypothesis-driven science, which makes up the bulk of published research in psychology and beyond.
- Registered Reports are suitable only for one-shot experiments, not a series of sequential experiments where the outcomes of one experiment feed into the design of the next. Verdict: False. At many of the adopting journals, authors can register experiments sequentially, each time adding to the previous set. At each stage in the cycle the previous version of the paper is accepted, eliminating the risk that the addition of later registered experiments could jeopardize publication of the earlier ones.
- Reviewers of Stage 1 submissions could steal my ideas and scoop me. Verdict: Possible but highly unlikely. Scooping is the bogeyman of science. Everyone knows someone who overheard someone talking with someone about a story in which someone got scooped. Maybe. In fact such cases are very rare.18 Concerns about being scooped do not stop researchers applying for grant funding or presenting ideas at conferences, both of which involve releasing ideas to a much larger group of potential competitors than would typically see a Stage 1 Registered Report (which usually isn't published until the study is completed). It is also noteworthy that once in-principle acceptance is awarded, the journal cannot reject the Stage 2 submission because similar work was published elsewhere in the meantime. Therefore, even in the unlikely event of a reviewer rushing to complete a preregistered design ahead of the authors, such a strategy would bring little career advantage for the perpetrator, and would very possibly backfire.19
- If Registered Reports were mandatory or universal they would . . . Verdict: Straw man and slippery slope fallacy. However this sentence ends, the objection is irrelevant because we never proposed that Registered Reports should be mandatory or universal – indeed quite the opposite. The argument that Registered Reports should be a universal option for hypothesis-driven research is quite different to the argument (proposed by nobody) that it should be obligatory across all science.
- A major previous discovery (e.g., mirror neurons) would never have been possible under a Registered Reports model, therefore Registered Reports would hold back science. Verdict: Arguably false but irrelevant even if true. This concern is beside the point because Registered Reports have never been suggested as a replacement for exploratory science, only as an enhancement of hypothesis-driven science. But even putting that fact to one side, how can we be sure that discoveries such as mirror neurons wouldn't have emerged serendipitously from a Registered Report? A major misconception of Registered Reports is that they hinder serendipity when in fact they protect it. To illustrate this point, suppose you conducted a standard (unregistered) study and found something serendipitous that you believe is surprising and groundbreaking. What do you suppose will happen when you submit your findings to a journal through a conventional (unregistered) publishing route? Because the results are surprising the reviewers are likely to be skeptical about them, holding up publication for months or years while you argue your case or run additional experiments. You might even find that the barriers are too great and give up, dumping the results in a file drawer. Now suppose you conducted exactly the same study as a Registered Report. At Stage 2, reviewers can't recommend rejection on the basis of the results; therefore your serendipitous finding is protected.20 This prompts us to turn the tables and ask: how many serendipitous results such as mirror neurons might have been reported even sooner had they been revealed within Registered Reports?
- We don't need Registered Reports because we have replication. Verdict: False. This argument ignores the fact that direct replication in psychology is extremely rare and associated with many disincentives, not least of which is the contempt shown by journals for replication studies. Registered Reports, however, provide a perfect avenue for replications by provisionally accepting papers before authors invest the resources into conducting them. What better incentive could you have for persuading authors to consider conducting replication studies?
- We don't need Registered Reports because protocols are already assessed through grant reviews. Verdict: False. The first time I heard this criticism I couldn't believe the person was serious. Any psychological scientist who has reviewed or applied for a major grant knows that such applications contain nothing close to the level of technical detail that is required for a Stage 1 Registered Report. Grant applications propose the bigger picture of a planned program of work; they rarely drill down into the specific details of individual experiments to the degree that is required for a specific Stage 1 protocol. And even for the occasional cases where a protocol is sufficiently detailed, funded grant applications are almost never published, so who would know whether the researcher did what they said they would?21 A private preregistration that is never published is worthless to the scientific community.
- With publication virtually guaranteed, authors of Registered Reports will conduct their experiments poorly, leading to meaningless results. Verdict: False. This rather cynical objection emerges fairly regularly from traditionalists. As one put it: "If you're a young researcher and you get your good idea preaccepted based on the question and design, then it's just more efficient to do a quick, sloppy analysis and damn the results – after all, who cares? The paper was already accepted. Time to move on to the next one."22 Aside from portraying early-career scientists as little more than ladder climbers, this argument ignores the fact that Stage 1 submissions must include outcome-neutral tests and quality checks for ensuring that the proposed methods are capable of testing the stated hypotheses. Stage 2 submissions that fail any critical outcome-neutral tests can be rejected, providing an inherent safeguard against sloppy science. This objection also disregards the fact that Stage 1 review involves stringent and detailed assessment of the proposed analyses. It is no more possible for authors to conduct a "quick, sloppy analysis" as part of a Registered Report than for a conventional article – indeed it may be a lot less likely for a Registered Report.
- The case for Registered Reports assumes that scientists act dishonestly, and sends the message that there is no trust in the scientific community. Verdict: Non sequitur and red herring. This argument rests on the false premise that bad practice is synonymous with deliberate deceit. As we have seen, however, bias and questionable research practices can be unconscious or stem from ignorance without implying any dishonesty. At a deeper level, the objection misdirects us to place a greater emphasis on how psychological science is perceived externally, and how researchers feel, than on how the research is actually conducted. Regardless of whether bias and questionable practices are conscious or unconscious, the solutions are the same.
- Registered Reports are based on a naive view of the scientific method. Verdict: False. Registered Reports provide a way to protect the integrity of the deductive scientific method, but they do not elevate deductive science above alternative exploratory approaches. One might just as easily argue that a better drug for treating cancer is "naive" because it doesn't treat hepatitis. There is also a curious inconsistency inherent in this viewpoint. Some traditionalists may well believe that the hypothetico-deductive model is the wrong way to frame science, but if so, why do the very same researchers routinely publish articles that report p values and purport to test a priori hypotheses? Are they merely pretending to be deductive in order to get their papers published? Registered Reports ensure that when researchers are truly engaging in deductive science, they do it in as unbiased a way as possible and are rewarded appropriately for doing so. Those who criticize Registered Reports on these grounds are not actually arguing against Registered Reports. They are criticizing the fundamental way research is taught and published in the life sciences, despite supporting that very system themselves and without proposing any alternative.
- Registered Reports will overload the peer-review system. Verdict: Unknown but probably false. It is true that Registered Reports involve two stages of peer review at the same journal, each of which is likely to involve at least one round of manuscript revision. However, this is offset by the fact that authors are much less likely to be successively rejected by multiple journals, as pointed out nicely by neuroscientist Molly Crockett in response to my Cortex open letter:
[T]he value of this system is that a given manuscript will (ideally) only go through a single review process – so in terms of collective hours spent reviewing papers, your proposal may actually reduce the burden on the scientific community. Consider the process we have now. Papers often face a string of rejections before getting published (and often rejections are based on data, not methods – e.g., null findings). A given paper may go through the review process at 3 or 4 different journals before getting published – so anywhere from 6 to 12 (or more) reviewers may take the time to review the paper. This is extremely inefficient both for reviewers, and for authors, who must spend a substantial amount of time re-formatting the manuscript for different journals. None of this is time well spent. In contrast, the extra time involved for authors and reviewers in your proposed system *is* time well spent – the steps you outline guard against all sorts of problems that are rife in the scientific literature.23
- Registered Reports will lead to researchers bombarding journals with protocols that have no funding or ethics and will never happen. Verdict: False. As one critic said: "Pre-registration sets up a strong incentive to submit as many ideas/experiments as possible to as many high impact factor journals as possible."24 Armed with IPA, the researcher could then prepare grant applications to support only the successful protocols, discarding the rejected ones. However, this entire objection is beside the point because Stage 1 Registered Reports must include a statement confirming that all necessary support (e.g., funding, facilities) and approvals (e.g., ethics) are already in place and that the researchers could start promptly following IPA. Since these guarantees could not be made for unsupported proposals, the concern is moot.
As we can see, many of the critical reactions to Registered Reports were based on misunderstandings, logical fallacies, or ideological objections parading as rational counterarguments. In the wake of the Guardian letter we also faced a remarkable intensity of ad hominem attacks. Through various channels we were accused of being "self-righteous," "sanctimonious," "fascists," "a head prefect movement," "Nazis," "Stasi," "crusaders" on a "witch hunt," and worse. In one widely circulated e-mail, a professor whom I happened to know went so far as to belittle the 80 scientists who signed our Guardian letter, stating: "Looking at the Chambers letter, I was struck by the lack of scientific weight of the signatories."25 That the mere suggestion of a new type of article in science could provoke such aggression was telling about what the aggressors sought to protect. As Niccolò Machiavelli wrote over five centuries ago, "the innovator has for enemies all those who have done well under the old conditions."
Despite the fact that most of the critical reactions to Registered Reports were in my view deeply flawed, several points did have merit. One concern was that the time taken to review Stage 1 protocols could be incompatible with short-term student projects, where students would usually lack the time to wait for peer review and provisional acceptance before starting data collection. There are at least two possible solutions to this problem. The first is to accept that operating within such a rigid schedule is incompatible with Registered Reports, and either run such projects as standard unregistered studies or preregister their protocols in a database without peer review, such as the Open Science Framework. A more radical solution would be to reorganize undergraduate student projects into a daisy-chain system where students work for several months on a Stage 1 protocol while simultaneously implementing the provisionally accepted protocol from a previous year's student. Under this system, students would never implement the specific protocol that they submitted for peer review, but they would nevertheless gain intensive training in all aspects of deductive science.
A second concern is that the sin of bean counting (see chapter 7) could make Registered Reports an unattractive choice for younger scientists. Because Registered Reports set rigorous methodological standards, requiring large sample sizes and higher statistical power, researchers can find that their experiments take longer to complete than they otherwise would under the status quo. As we saw in chapter 3, psychology and neuroscience are endemically underpowered, which permits researchers to publish a higher volume of lower-quality papers, rich in post hoc storytelling but making only a limited and biased contribution to knowledge. Registered Reports turn this equation upside down. Researchers who publish Registered Reports are likely to publish fewer, larger, and more credible papers; however, as long as the community values quantity over quality, a more credible publication record is not guaranteed to provide young scientists with more secure careers in science – and this is a problem that can be fixed only by senior scientists changing the way they assess junior researchers for jobs, grants, and career progression. A related concern is the conservatism of the most prestigious journals, which despite claiming to publish the highest-quality research nevertheless rely on publication bias to select which papers to accept. To compete in the current academic system, young scientists need to be strategic about where they send their work, so if Registered Reports were offered only within specialist journals their reach and appeal would hit a glass ceiling. In the shorter term, the solution to this problem is the central goal of the Registered Reports initiative: to see the format launched within all journals that publish hypothesis-driven science, regardless of prestige. In the longer term, the solution – as we will discuss later – is to do away with journals altogether, rejecting the premise of "prestigious" outlets and allowing the quality and contribution of individual studies to speak for themselves.
A third limitation of Registered Reports is that it is unclear to what extent the format can be applied to analyses of preexisting data. We saw in chapter 4 how analysis of existing data archives can be used to answer important questions that may not occur to the investigators who conduct the original studies. This raises the question of whether analysis of preexisting data could be preregistered under a Registered Reports mechanism without the process being biased by the authors having prior knowledge of the outcomes. A potential solution to this problem – at least for sufficiently "big" data – is to consider such existing data sets as split-half discovery and replication samples. The system would be analogous to a card game in which one player deals while the other cuts the deck: after vowing on the record that they had never viewed the dataset in question, authors would submit a proposed analysis to the journal. If the proposal passes prestudy review and is provisionally accepted, then the journal would decide (using a random algorithm) which half of the data is the discovery sample and which is the replication sample – that is, a random cutting of the deck. Under this system, a registered secondary analysis would be considered to produce a finding of note only if it replicated in both subsamples.
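A minimal sketch of how this "cut the deck" procedure could work in code follows, assuming a tabular archive of participant-level data; the function name, columns, and seed handling are hypothetical illustrations rather than an agreed protocol.

```python
# Hypothetical "cut the deck" split of an existing archive into discovery and
# replication halves. The random seed would be supplied by the journal only
# after in-principle acceptance, not chosen by the authors.
import numpy as np
import pandas as pd

def cut_the_deck(archive: pd.DataFrame, journal_seed: int):
    """Shuffle the archive and split it into discovery and replication halves."""
    rng = np.random.default_rng(journal_seed)
    shuffled = archive.sample(frac=1.0, random_state=rng).reset_index(drop=True)
    midpoint = len(shuffled) // 2
    return shuffled.iloc[:midpoint], shuffled.iloc[midpoint:]

# Example: the registered secondary analysis is run on both halves, and a
# finding counts only if it replicates in the held-out half.
archive = pd.DataFrame({"participant": range(1000),
                        "score": np.random.default_rng(0).normal(size=1000)})
discovery, replication = cut_the_deck(archive, journal_seed=20151104)
```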
Finally, one important question is whether there is any evidence that Registered Reports will be effective in reducing publication bias and questionable research practices. As with any new initiative, prior evidence of effectiveness cannot and does not (yet) exist.26 From a logical point of view, barring failure of the peer-review process, prespecification of hypotheses and analysis plans is guaranteed to eliminate the practices of p-hacking and HARKing. Similarly, unless Daryl Bem was right all along about the existence of precognition, accepting papers before results are known renders them immune to publication bias. But whether eliminating these practices will lead to more reproducible science is an empirical question that has yet to be answered. In 2014, the International Journal of Radiation Oncology, Biology, Physics launched a randomized trial of Registered Reports, results of which are pending. Among other outcome measures, the editors are testing whether the rate of positive, negative, and indeterminate results differs between studies assigned randomly to either a standard unregistered format or to a Registered Report.27 There are also signs in the clinical trials literature that preregistration, in general, may reduce publication bias and/or researcher bias. A 2015 analysis of medical trials in the prevention of heart disease found that, since the advent of mandatory clinical trial registration in 2000, the percentage of published trials showing a statistically significant effect dropped from 57 percent (17 out of 30) to just 8 percent (2 out of 25).28 Preregistration might therefore help put the brakes on false discoveries.
Three years after the launch of Registered Reports the tone of the discussion has shifted. The storm of personal attacks has subsided, and the opposition from many traditionalists appears to have softened. The debate is far from over, but important progress has been made. Some who initially resisted the initiative have become supporters, and completed examples of Registered Reports are now appearing in the literature.29 At the time of writing, more than 40 journals have adopted the format, and not just in psychology and neuroscience but also in cancer biology, empirical accounting, nutrition research, political science, and psychiatry. In addition, by the time this book is in print, a number of "high-impact" journals are likely to be offering them, including Nature Human Behaviour.30 The initiative has been heralded by the UK Academy of Medical Sciences as one of several promising solutions for improving research transparency and eradicating publication bias.31 In parallel, more than 750 journals across the full range of sciences have agreed to review their adoption of open science as part of the Transparency and Openness Promotion (TOP) guidelines, a process that involves the consideration of Registered Reports.32 In March 2014 we also established the Registered Reports Committee at the Center for Open Science. This committee, which I currently chair, aims to develop and promote Registered Reports as a new way to improve the credibility of published research.
Perhaps the most significant step forward for Registered Reports came in November 2015. On the 350th anniversary of launching the world's first scientific journal, the Royal Society officially adopted the initiative within its journal Royal Society Open Science. This is a major development not only because it is the first endorsement of Registered Reports by a learned society, but also because it establishes the format well beyond psychology to cover the full spectrum of more than 200 physical and life sciences.33 From here a future beckons in which Registered Reports may become a popular format for science in general, offered at all journals that publish the outcomes of hypothesis testing. If our goal is to establish a reproducible knowledge base, then the literature must include at least some papers that are published based on theoretical value and methodological rigor, independently of results.
- Extracted from The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice by Chris Chambers, published by Princeton University Press.
Notes
- It should also be pointed out that just because an area of research doesn't meet all these conditions for being "science" doesn't mean that it is flawed, useless, or not research. Qualitative psychological research, for instance, can reveal rich insights into behavior and society even if such approaches do not seek to test hypotheses or quantify phenomena.
- For a discussion of wider reproducibility problems in biomedicine, see the 2015 report issued by the UK Academy of Medical Sciences: http://www.acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research/.
- Parts of this section are adapted from the following articles: Chris Chambers, "Are we finally getting serious about fixing science?," http://www.theguardian.com/science/head-quarters/2015/oct/29/are-we-finally-getting-serious-about-fixing-science; and Christopher D. Chambers, Eva Feredoes, Suresh Daniel Muthukumaraswamy, and Peter Etchells, "Instead of 'playing the game' it is time to change the rules: Registered reports at AIMS Neuroscience and beyond," AIMS Neuroscience 1, no. 1 (2014): 4–17. This article can be freely downloaded from http://www.aimspress.com/article/10.3934/Neuroscience.2014.1.4/pdf.
- Chris Chambers, "Why I will no longer review or publish for the journal Neuropsychologia," http://neurochambers.blogspot.co.uk/2012/09/why-i-will-no-longer-review-or-publish.html.
- Neuroskeptic, "Fixing science—systems and politics," Discover Magazine.
- In my original proposal it was called a registration report, which I soon changed because it sounded like some form of tedious government bureaucracy.
- See R. Rosenthal, Experimenter Effects in Behavioral Research (New York: Appleton-Century-Croft, 1966). When I wrote to Rosenthal in 2015 to inform him that we had finally put his plan in place after nearly 50 years, he said he was pleased that we had implemented his "pipe dream" from the mid-1960s.
- G. William Walster and T. Anne Cleary, "A proposal for a new editorial policy in the social sciences," American Statistician 24, no. 2 (1970): 16–19, http://dx.doi.org/10.1080/00031305.1970.10478884. Similar proposals were later mooted by Robert Newcombe in 1987 and Erick Turner in 2013. See Robert G. Newcombe, "Towards a reduction in publication bias," BMJ 295, no. 6599 (1987): 656–59, http://dx.doi.org/10.1136/bmj.295.6599.656; and Erick H. Turner, "Publication bias, with a focus on psychiatry: Causes and solutions," CNS Drugs 27, no. 6 (2013): 457–68.
- For my original open letter to Cortex, see http://neurochambers.blogspot.co.uk/2012/10/changing-culture-of-scientific.html.
- I considered the possibility of proposing the idea initially in private and then going public only if it was rejected, but I was concerned that this could reek of sour grapes. Also, this strategy wouldn't have allowed the proposal to benefit from wider peer review.
- At the time of writing, the Registered Reports subcommittee at Cortex includes me, Dr. Rob McIntosh from the University of Edinburgh, Dr. Pia Rotshtein from the University of Birmingham, Professor Klaus Willmes from RWTH Aachen University, and Zoltan Dienes from the University of Sussex.
- This feature was inspired by a remark from psychologist Hal Pashler, who, commenting on one of Neuroskeptic's posts on study preregistration (see Neuroskeptic, "Fixing science – systems and politics"), said, "reviewers should get to specify some outcome-neutral criteria for publishing the study, e.g., that you do not have a floor effect or ceiling effect, that manipulation checks turn out OK, etc. If you don't do this, then you are asking journals to precommit to publishing studies that fail to offer real tests of hypotheses."
- For an example of Registered Reports guidelines in full, the reader is directed to the Cortex format: http://cdn.elsevier.com/promis_misc/PROMIS%20pub_idt_CORTEX%20Guidelines_RR_29_04_2013.pdf.
- Perspectives on Psychological Science offers a unique twist on Registered Reports, focused on replications and calling for multisite collaborations. It also provides some funding to support these studies. The Perspectives initiative was devised independently of Registered Reports by Dan Simons, Alex Holcombe, and others.
- Chris Chambers, Marcus Munafò, and 83 signatories: "Trust in science would be improved by study pre-registration," http://www.theguardian.com/science/blog/2013/jun/05/trust-in-science-study-pre-registration.
- The main opposition to Registered Reports is exemplified in this response to our Guardian article by Sophie Scott, professor of cognitive neuroscience at University College London: https://www.timeshighereducation.com/comment/opinion/pre-registration-would-put-science-in-chains/2005954.article.
- When I make this point in seminars I sometimes get asked, "Why aren't you proposing an initiative to support exploratory science?" The answer is that we are. An initiative called Exploratory Reports is currently in development at Cortex, led by Rob McIntosh.
- Since February 2014, the Open Scoop Challenge has remained standing and unclaimed: http://software-carpentry.org/blog/2014/02/open-scoop-challenge.html.
- An additional disincentive for reviewers to scoop is that the "manuscript received" date in the final published Registered Report refers to the initial Stage 1 submission date and so will predate the "manuscript received" date of any standard submission published by a competitor. Therefore, even if a reviewer went ahead and stole the idea and published it first, the original authors could prove they had the idea first.
- By splitting the review process into two stages and preventing reviewers from assessing the quality of papers according to the results of hypothesis tests, we also prevent a form of bias by peer reviewers known as CARKing: "critiquing after results are known" (a term coined by Brian Nosek and Daniël Lakens: https://osf.io/vwfk2/). CARKing is a form of motivated reasoning in which a reviewer, objecting to the results, raises spurious concerns about the methods in order to prevent publication. For conventional (unregistered) articles, CARKing is impossible to prove – when methods and results are reviewed at the same time, there is no definitive way to distinguish a true methodological objection from CARKing. For Registered Reports, however, any CARKing is obvious and easy to neutralize. We see evidence of CARKing if a reviewer approves a submission at Stage 1 but then raises new objections to the same methods after results are presented at Stage 2. While reviewers are free to enter such comments into Stage 2 reviews, the validity of the (already approved) method is not one of the assessment criteria; therefore CARKing is not only transparent but powerless to block publication.
- Hint: they almost certainly wouldn't. As a former colleague and eminent neuroscientist once told me: "Never apply for a grant for something you haven't already done."
- See http://neurochambers.blogspot.co.uk/2012/10/changing-culture-of-scientific.html?showComment=1349772625668#c5836301548034236209.
- This insult was an extraordinary misfire given that the signatories included, among other luminaries, Dorothy Bishop, Morton Ann Gernsbacher, John Hardy, John Ioannidis, Steven Luck, Barbara Spellman, and Jeremy Wolfe.
- Some opponents of preregistration suggested that we should have preregistered the Registered Reports initiative (e.g., https://nucambiguous.wordpress.com/2013/07/25/preregistration-a-boring-ass-word-for-a-very-important-proposal/#comment-540). Interestingly, what these critics seemed not to consider is that by advocating preregistration as a way of presumably enhancing the credibility of Registered Reports, they tacitly assume that preregistration does something useful in the first place.
- Loren K. Mell and Anthony L. Zietman, "Introducing prospective manuscript review to address publication bias," International Journal of Radiation Oncology, Biology, Physics 90, no. 4 (2014): 729–732, http://dx.doi.org/10.1016/j.ijrobp.2014.07.052.
- Robert M. Kaplan and Veronica L. Irvin, "Likelihood of null effects of large NHLBI clinical trials has increased over time," PLOS ONE 10, no. 8 (2015): e0132382, http://dx.doi.org/10.1371/journal.pone.0132382.
- Recent examples at Cortex include Jona Sassenhagen and Ina Bornkessel-Schlesewsky, "The P600 as a correlate of ventral attention network reorientation," Cortex 66 (2015): A3–A20, http://dx.doi.org/10.1016/j.cortex.2014.12.019; Tim Paris, Jeesun Kim, and Chris Davis, "Using EEG and stimulus context to probe the modelling of auditory-visual speech," Cortex (2015). These and more are showcased in a virtual special issue of Registered Reports at Cortex. See also the special issue on Registered Reports of Replications in Social Psychology: http://econtent.hogrefe.com/toc/zsp/45/3.
- For a full list of currently participating journals and guidelines, see https://cos.io/rr/.
- http://www.acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research/.
- For more information on the TOP guidelines, see https://cos.io/top/.
- I am the current handling editor for Registered Reports at Royal Society Open Science. Even though the physical sciences suffer less from questionable research practices and researcher bias, they are nevertheless subject to publication bias and therefore stand to benefit from Registered Reports.
Our journalist Ella Rhodes asked Chris some questions:
Can you tell me what prompted you to write the book?
As a graduate student I had this idea of science as an objective journey of discovery, with the aim to discover the truth. The further I progressed in my career, the more that idealistic young scientist was beaten down by the system, realising that 'truth seeking' was actually quite a naive way of thinking about psychology research. Modern psychology is more like a game where the aim is to get famous from publishing great results in prestigious journals. I got quite good at this game but realised that I didn't like what I was turning into. I also reckoned that if the field goes on this way it is doomed to scientific oblivion. This book is a time machine written for my younger self. I want to tell him, you were right. It just took me a while to realise it.
Which of psychology's deadly sins are included?
The sins, in turn, are bias, hidden flexibility, unreliability, data hoarding, corruptibility, internment and bean counting.
In your experience, since the reproducibility crisis became more widely known, have attitudes/behaviours changed in a positive way (i.e. better research practices, using registered reports, carrying out more replications)? If so, how?
The state of psychology has definitely improved, but we have a long way to go. There are hundreds of psychology journals, and dozens of major ones. Registered Reports are offered only by a handful, and as yet by none of the 11 journals published by the British Psychological Society. We have a lot of work ahead of us.
How has the scientific community taken to using registered reports?
I'm very happy with the response. To date we have about 60 published submissions and many more in the pipeline. We're also seeing a great uptake by journals in psychology and beyond, with representation across the full spectrum of life, social and physical sciences.
How many journals now use study pre-registration? Do any use it solely?
There are currently 51 journals offering Registered Reports. Only one of these – a new journal called Comprehensive Results in Social Psychology – is dedicated exclusively to the format. The others offer it as a new article format for authors.
Aside from registered reports are there any other broad initiatives that may encourage better research practices?/more replication attempts?
Yes. The TOP guidelines are very important for changing journal policies. They are a self-certification initiative in which journals declare, publicly, their adherence to various transparency standards. It's a brilliant initiative because it has a low bar for entry: the journal simply needs to say what its level of transparency is, even if it is low – and over 2,900 journals and professional organisations have signed.
Another very important initiative is the Peer Reviewers' Openness Initiative (PRO). PRO is the brainchild of Richard Morey in Cardiff. It is a grassroots campaign that calls for peer reviewers to provide comprehensive reviews of papers only where authors publicly archive study data and materials or provide a public reason for not archiving. Since launching, PRO has attracted over 400 individual signatories and has prompted an increasing number of journals to adopt open data policies that are PRO compliant.