The dark side of impact
Emma Young, writer on our Research Digest, on ‘surprising’ findings, and moves towards science we can better trust.
02 February 2023
Every researcher wants their new paper to make an impact – to grab the attention of journalists as well as colleagues, and garner legions of citations. Surprising, counter-intuitive, or 'paradigm-shifting' findings are most likely to achieve this. Unfortunately, these are also the types of findings that are least likely to be confirmed in replication studies. This is the 'dark side' of impact. But, some argue, this isn't necessarily as bleak as it might sound.
Even people who don't work in academia are in fact dubious about 'surprising' findings. This was shown in work published in 2019, led by Suzanne Hoogeveen, which found that lay people were pretty good at predicting which of 27 high-profile social science findings would replicate, and which would not. These participants' decisions seemed to be based simply on judgements about which findings seemed more plausible. For example, a study finding that people are less likely to choose to donate to one charity over another when told a significant chunk of their donation would go to administrative costs was deemed (correctly) to be much more likely to replicate than a remarkable 'finding', published in Science in 2012, that simply looking at an image of Rodin's 'The Thinker' could 'promote religious disbelief' (this finding was not replicated).
If lay people are cautious about high-tariff claims, academics should be even more so. However, work published by Marta Serra-Garcia and Uri Gneezy in Science Advances in 2021 found that papers that failed to replicate were cited more often than papers whose findings were successfully replicated – even after a replication attempt had failed. And when a paper that had failed to replicate was cited, the failure was acknowledged only 12 per cent of the time. As the team noted, this means that findings that are unreliable, or at least less likely to prove reliable, are cited more often, giving them an even bigger impact over time.
The papers considered in this 2021 research were taken from an (enormously impactful) 2015 Science paper by the Open Science Collaboration. This multi-lab group replicated 100 psychology experiments that had been published in top journals in 2008. They found that they could replicate only between a third and a half of the original findings. 'Scientific claims should not gain credence because of the status or authority of their originator but by the replicability of their supporting evidence,' the group warned.
However, if a single paper that finds a surprising effect cannot be considered the final word on the subject, neither of course can a single failed replication attempt. Even the 2015 Science paper was challenged by a few researchers, and some authors of disputed findings have criticised individual replication studies. In 2019, for example, Carol Dweck, pioneer of the concept of the growth mindset, and co-author David Yeager at the University of Texas, published a paper arguing that a 'simple re-analysis' overturned a failure to replicate some key findings in the field.
All of this raises the question: what's the best way to go about investigating the replicability of impactful findings?
Beyond 'yes/no'
The field of social priming (the idea that subtle cues can have a big influence on behaviour) has taken some of the biggest replication hits. In 2022, a Swedish team led by Eric Mac Giolla reviewed 65 published replications of social priming studies and found that when an original author was involved, the replication was much more likely to be successful than when the team was fully independent. The Swedish researchers argued that this is bad news for social priming. They felt that only fully independent replications could be considered convincing.
Not all researchers share this view. In their 2019 paper, Dweck and Yeager argued for a collaborative approach: 'When replicators and original authors collaborate in good faith there is a unique opportunity for potentially important new knowledge to be generated, thus furthering the goals of science,' they wrote. The pair also argued in favour of moving away from seeking to make a 'yes-no' judgement about a body of work. While some findings might not be reproducible and may be deemed invalid, that doesn't necessarily mean that everything about the idea is wrong.
Thanks to a huge multi-national collaborative effort, exactly this was recently demonstrated for one controversial (or previously controversial) idea – that facial expressions can affect how you feel. The 'Many Smiles' Collaboration brought together researchers both for and against the theory (as well as researchers with no stance on it), including some who had worked on original research that had failed to replicate. But rather than setting out to run faithful replications, they worked together to establish what they felt would be the best tests of the theory. In 2022, they published their findings: that there is indeed evidence for a key claim in the field – that smiling can make you feel happier – but that holding a pen between your teeth is unlikely to have any effect. This cooperative study suggested that the idea that facial movements affect how you feel isn't entirely right or wrong; it's more nuanced than that.
A similar argument is being made in some circles for even more controversial ideas – even power posing. The idea that how you hold your body affects how you feel and behave has a storied history, packed with claims, counterclaims, viral TED talks, and often vitriolic arguments. One early reported finding – that power poses boost testosterone and reduce the stress hormone cortisol – has not been replicated. But there is evidence that power poses can make people feel more confident, and this might have an effect on their interactions. In his review in our June 2021 issue, Tom Loncar concluded that with some alterations to study design, power posing might be elevated 'from a potentially misinterpreted one-size-fits-all idea, to more specific and actionable understanding'.
Refining ideas
Even heavily disputed work suggesting that physical warmth affects feelings of warmth towards other people has been revisited recently. You'd have to have been hiding in a ditch not to know that replication studies (plural) have not supported the original findings. But in 2020, a pair of US psychologists, Adam Fay and Jon Maner, published work suggesting that the ambient temperature when this type of research is carried out matters. This wasn't considered in the original temperature/social feelings studies, or the replications – and it might explain the inconsistent results.
No one is suggesting that we should not be cautious about bold claims. But it's also worth bearing in mind that these kinds of findings do sometimes replicate – for example, the massively impactful discovery that we'd rather avoid a loss than make the equivalent gain. And, as with the facial feedback hypothesis, after a comprehensive follow-up, some controversial ideas have been refined, rather than ditched.
When the 2015 Science paper that really stimulated talk of the 'crisis' of replicability was published, Alan Kraut, now executive director emeritus of the Association for Psychological Science, made this point to The Guardian: 'The only finding that will replicate 100% of the time is likely to be trite, boring and probably already known'. The philosopher Alexander Bird at King's College London has also argued that in some cases, at least, 'bad luck' rather than 'bad science' could be to blame. In fact, Bird believes that replication failures should be expected, especially in an evolving field. 'Science in new and difficult fields is highly fallible,' he writes.
Many studies
Multi-lab collaborations, like the 'Many Smiles' group, are becoming more popular because they are seen as more likely to produce reliable findings. Though the original, influential 'Many Labs' replication project ceased in 2022, 'Many Babies' (which focuses on replicating findings in developmental psychology) and 'Many EEGs', for example, are still hard at work. The Psychological Science Accelerator represents another major collaborative effort, with labs in 84 countries in its distributed network. Its aim is to produce reliable findings that are truly generalisable – rather than being limited to sub-groups of American students, for example.
Findings from these kinds of studies will, in general, be better trusted. And, when they are surprising, counter-intuitive or paradigm-shifting, their impact may well be more deserved.