Publication bias afflicts the whole of psychology
24 October 2014
By Alex Fradera
In the last few years the social sciences, including psychology, have been taking a good look at themselves. While incidents of fraud hit the headlines, pervasive issues are just as important to address. One is publication bias: the phenomenon where non-significant results never see the light of day, because editors reject them or because savvy researchers recast their experiments around unexpected results and quietly drop the disappointments. Statistical research has shown the extent of this misrepresentation in pockets of social science, such as specific journals, but a new meta-analysis suggests that the problem may infect the entire discipline of psychology.
A team of psychologists based in Salzburg looked at "effect sizes", which provide a measure of how much experimental variables actually change an outcome. The researchers randomly sampled the PsycINFO database to collect 1000 psychology articles across the discipline published in 2007, and then winnowed the list down to 395 by focusing only on those that used quantitative data to test hypotheses. For each main finding, the researchers extracted or calculated the effect size.
Studies with lots of participants (500 or more) had an average effect size in the moderate range (r = .25). But studies with smaller samples tended to have formidable effect sizes, as high as .48 for studies with fewer than 50 participants. The result was a strong negative relationship between number of participants and size of effect, when statistically the two should be unrelated. Because studies with more participants produce more precise estimates, .25 is the better guide to a typical psychology effect size, so the higher figures from smaller studies suggest some sort of inflation.
The authors, led by Anton Kühberger, argue that the literature is thin on modest effect sizes thanks to the non-publication of non-significant findings (rejection by journals would be especially plausible for non-significant smaller studies), and the over-representation of spurious large effects, due to researchers retrospectively constructing their papers around surprising effects that were only stumbled across thanks to inventive statistical methods.
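To see how selective publication alone could produce this pattern, here is a minimal simulation sketch (an illustration, not the authors' analysis): every simulated study measures the same true effect of r = .25, but only significant results make it to "publication". The true effect, the sample sizes and the numpy/scipy tooling are all assumptions chosen for illustration.

```python
# Minimal publication-bias simulation: a single true correlation everywhere,
# but only "significant" results get published. Small published studies end
# up with inflated effect sizes, reproducing the negative relationship
# between sample size and effect size described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
TRUE_R = 0.25  # assumed true effect size for every study

def run_study(n):
    """Draw n participant pairs with correlation TRUE_R; return sample r and p."""
    cov = [[1.0, TRUE_R], [TRUE_R, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return stats.pearsonr(x, y)

for n in (30, 100, 500):
    published = []
    for _ in range(2000):
        r, p = run_study(n)
        if p < 0.05:              # the journal accepts only significant findings
            published.append(abs(r))
    print(f"n = {n:>3}: mean published |r| = {np.mean(published):.2f}")

# Typical output: around .45 for n = 30, .30 for n = 100, and .25 for n = 500 --
# the small studies overstate the effect, the large ones recover the truth.
```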
The analysts rejected one alternative explanation. A small sample is sufficient to detect a powerful effect, so researchers who anticipate a big effect on the basis of an initial "power analysis" might deliberately plan on small samples. But only 13 per cent of the papers in this report mentioned power, and the pattern of correlation in those papers appears no different from that found in the papers that never mention power. Moreover, the authors of the original 1000 articles were surveyed about what relationship they expected between effect size and sample size. Many respondents expected none, and even more expected that studies with more participants would have larger effects. This makes it unlikely that principled, up-front power calculations were driving the main result.
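For readers unfamiliar with power analysis, here is a rough sketch of the logic the authors considered and rejected, using the textbook Fisher-z approximation (the formula is standard, not taken from the paper; the effect sizes plugged in are the article's .48 and .25):

```python
# Approximate sample size needed to detect a correlation r with 80% power
# at alpha = .05 (two-sided), via the standard Fisher-z approximation.
import math
from scipy import stats

def n_for_correlation(r, power=0.80, alpha=0.05):
    """Approximate n required to detect correlation r."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_power = stats.norm.ppf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

print(n_for_correlation(0.48))  # ~32: a big anticipated effect licenses a small study
print(n_for_correlation(0.25))  # ~124: a typical effect needs far more participants
```

So a researcher who genuinely expected a large effect would have a defensible reason to run a small study; the survey responses above suggest few were reasoning this way.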
Kühberger and his co-analysts recommend that in future we give more weight to how precise study findings are likely to be, by considering their sample size. One way of doing this is by reporting a statistic that takes sample size into account, the "confidence interval", which describes effect size not as a single value but as a range that we can be confident the true effect size falls within. As we all want to maintain confidence in psychological science, it's a recommendation worth considering (but see here for an alternative view).
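As a rough illustration of why this matters (assuming the common Fisher-z method for a correlation's confidence interval; the sample sizes below are hypothetical), the same observed effect of r = .25 carries very different uncertainty depending on how many participants were tested:

```python
# 95% confidence interval for a Pearson correlation via the Fisher z transformation:
# the interval narrows as the sample grows, making the estimate more trustworthy.
import math
from scipy import stats

def correlation_ci(r, n, conf=0.95):
    """Confidence interval for a correlation r observed in a sample of size n."""
    z = math.atanh(r)                     # Fisher z transform of r
    se = 1 / math.sqrt(n - 3)             # approximate standard error of z
    crit = stats.norm.ppf(1 - (1 - conf) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

print(correlation_ci(0.25, 50))   # roughly (-0.03, 0.49): wide, barely informative
print(correlation_ci(0.25, 500))  # roughly (0.17, 0.33): a precise estimate
```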