Updated: A re-replication of a psychological classic provides a cautionary tale about overhyped science
Update: On Twitter, some researchers argued, reasonably in my view, that I wasn’t sceptical enough in relating these findings. See the update at the end of this post for more.
15 August 2018
By Jesse Singal
If you wanted a poster child for the replication crisis and the controversy it has unleashed within the field of psychology, it would be hard to do much better than Fritz Strack's findings. In 1988, the German psychologist and his colleagues published research that appeared to show that if your mouth is forced into a smile, you become a bit happier, and if it's forced into a frown, you become a bit sadder. He pulled this off by asking volunteers to view a set of cartoons (paper ones, not animated) while holding a pen in their mouth, either with their teeth (forcing their mouth into a smile), or with their lips (forcing a frown), and to then use the pen in this position to rate how amused they were by the cartoons. The smilers were more amused, and the frowners less so – and best of all, they mostly didn't discern the true purpose of the experiment, eliminating potential placebo-effect explanations.
This basic idea, that our facial expressions can feed back into our psychological state and behavior, goes back at least as far as Darwin and William James, but "facial feedback", as it is known, had never been demonstrated in such an elegant and rigorous-seeming manner. Over time, this style of experiment was replicated and expanded upon, and soon it came to be considered a true blockbuster, so famous it found its way into psychology textbooks, as well as popular books and articles citing it as an example of the unexpectedly subtle ways our bodies and environments can affect us psychologically. Often, facial feedback has been popularised along the lines of "Maybe you can smile your way to happiness!", which added an irresistible self-help element that likely helped spread the idea. Either way, it seemed like a genuinely safe and solid psychological finding. That changed rather abruptly in 2016.
That was when a large, multinational replication attempt of the 1988 study, organised by E. J. Wagenmakers of the University of Amsterdam (Strack had bravely volunteered his study for such scrutiny), delivered some surprising results from 17 labs experimenting on almost 2,000 participants: There wasn't much evidence to support the effect after all. Nine of the labs found the expected effect, albeit in a much weaker form – a difference of just .1 or .2 points, on average, on the nine-point cartoon-amusement rating scale, between the smilers and frowners, as compared to an average difference of about .8 in the original study – while the rest found an about equally weak effect pointing in the other direction. Summing up the whole episode in an enjoyable and comprehensive Slate article, Daniel Engber writes, "When Wagenmakers put all the findings in a giant pile, the effect averaged out and disappeared. The difference between the smilers and frowners had been reduced to three-hundredths of a rating point, a random blip, a distant echo in the noise."
But a failed replication is rarely the end of the story, because failed replications often spark further controversy over what they mean – or don't. This was no exception: Some observers agreed that Wagenmakers and his colleagues' work really did call the idea of facial feedback into question. Others, including Strack himself, argued that because the researchers had altered certain aspects of the experimental setup, these weren't "true" replications and thus couldn't be counted as evidence against the original finding. "I don't see what we've learned," Strack told Engber. (Strack and Wolfgang Stroebe published a paper in 2014 making this replication-sceptical argument more generally.)
Now, a new paper adds a bit of evidence to the idea that the failure to replicate here might have more to do with methodological issues than with the absence of a real effect. For an article published in the Journal of Personality and Social Psychology, Tom Noah, Yaacov Schul, and Ruth Mayo of the Hebrew University of Jerusalem basically replicated the Wagenmakers team's replications, but with a newly introduced independent variable to toggle on or off: the presence of a video camera. As the authors explain, this was one of the key differences between the original studies and the replications: in the former, there was no video camera watching the participants, but in the latter there was (the footage was used to check whether the participants had followed the instructions correctly). Strack himself had cited the presence of the cameras as one reason he was sceptical of the failed replications.
Noah and her colleagues write that there are theoretical reasons to believe that the feeling of being observed – by a camera, in this case – could have certain effects that might disrupt facial feedback. Specifically, under such circumstances "people adopt an external perspective of themselves… [and] tend to neglect internal information." In other words, the act of smiling might cause certain internal cues that in turn cause an uptick in happiness or amusement, but the feeling of being observed could short-circuit this connection.
So in half the Noah team's (re-)replications, there was a video camera. In the other half, there wasn't one. And sure enough, there was a good-sized statistically significant effect in the no-camera group – a .83 difference between the smiling and frowning groups on that nine-point scale, which was much larger than the average effect sizes of about .1 or .2 in the successful Wagenmakers replications and right in line with what Strack had found originally – but not in the camera group, where the difference was minuscule and statistically non-significant. This, they write, provides evidence that, as per the paper's title, "Both the Original Study and Its Failed Replications Are Correct." In other words, it could be that facial feedback is real, but if you feel like you're being observed, the effect is stymied.
This could explain the whole sequence, from the original, exciting experiment to the dispiriting follow-ups from 2016: There were no cameras in the original paper (and the various replications of it that followed), so the effect was observed. Then Wagenmakers' replicators introduced a new feature of the experiment – cameras – that disrupted the effect, so poof, the effect went away.
Toward the end of their paper Noah, Schul, and Mayo write of the importance of "cumulative science", and their research is a good example of how that principle could be put to work to help resolve what has become something of a vexing controversy in social psychology. Now, researchers have a new theory to work with: being observed can disrupt the facial feedback effect. One study can't prove this, of course – it's time, as always, to run more of them, to get a few inches closer to the truth of the matter.
This sort of research also introduces an important cautionary note into the question of how science should be communicated to the public – as an ongoing process rather than as a generator of open-and-shut facts. It is often the case that fairly limited, difficult-to-generalise-from lab findings get popularised in overhyped ways – the reason the power posing controversy, for example, blew up the way it did has just as much to do with the way the findings were popularised and presented to the public (both by Amy Cuddy herself and by journalists and others) as with the core research itself. At the same time, when a famous finding fails to replicate, sceptics can be quick to label the entire topic as junk science. This new replication of a failed replication provides the latest reminder that psychological lab findings – particularly sexy, counterintuitive ones from social psych – can be quite context-dependent and can wither when even subtle changes to the experimental procedure are made. This should make us all the more sceptical about the big, bold claims made by popularisers in TED Talks and elsewhere – and all the more aware of the importance of careful, nuanced science reporting and communication.
About the author
Jesse Singal (@JesseSingal) is a contributing writer at New York Magazine. He is working on a book about why shoddy behavioral-science claims sometimes go viral for Farrar, Straus and Giroux.