"Psychological measures aren’t toothbrushes.”
New paper calls for urgent change in the ways psychological researchers select and use measures.
07 December 2023
By Emma Young
"Psychological constructs and measures suffer from the toothbrush problem: no self-respecting psychologist wants to use anyone else's."
So begins a provocative comment paper in Communications Psychology. In it, Malte Elson at the University of Bern and colleagues call for an urgent change to this situation, proposing new Standardisation Of BEhaviour Research (SOBER) guidelines to promote the use of standardised measures.
Most psychological measures are used only once or twice, the team writes; the few that are widely re-used are almost all clinical psychology tests. They think there are various reasons for this, including a desire to promote one's own work, but also the difficulty of identifying re-usable measures in the literature. Though at first glance it might seem like a good thing to investigate a phenomenon using a variety of measures, the team argues that there are major downsides to varying them so regularly: "We argue that proliferation is in fact a serious barrier to cumulative science."
One problem the team identifies is that measures which seem to quantify the same thing often don't, at least not exactly. In the APA PsycTests database, for example, no fewer than 15 different tests are identified as a 'job satisfaction scale'. These tests may probe different aspects of job satisfaction, so two papers that each report using a 'job satisfaction scale', and so profess to be measuring exactly the same thing, could in fact be using quite different scales.
Another problem is that even when the authors of a paper write that they are using the same measure of the same construct as in another paper, this isn't necessarily strictly the case. In some studies, a few items might be dropped from a scale, for instance, while other items might be added. These variations could have "unknown psychometric consequences," the authors write.
Yet another problem, they argue, is that even widely used measures have typically never been normed in the population in which they are being used, or the test norms are badly outdated. Without current norms, it is impossible to know whether the selection process for the participants resulted in a skewed group, which in turn makes it harder to judge the generalisability of the findings.
"Psychology should be serious about standardising its measures — and currently it is not," the authors write. So, they are calling on journals to implement policies to address this, and in their paper, they lay out their proposed SOBER guidelines. Under these guidelines, among other things, authors would have to demonstrate that a new measure genuinely is needed, justify any modification to an existing measure (and clearly document any deviations) and report in their paper all the items, stimuli, instructions, and other detail that would be required for that protocol to be faithfully replicated in other work.
Elson and colleagues would now like to see an open repository of measurement protocols, detailing precisely how to implement each one, along with test norms and standard scoring rules, plus metadata that large language models could analyse to assess the protocols' reliability.
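To make that proposal a little more concrete, here is a minimal sketch of what one machine-readable record in such a repository might contain. The MeasureProtocol structure, its field names, and the example contents are all illustrative assumptions for this article, not something specified in the paper itself.

```python
# A minimal sketch of a single machine-readable record in a hypothetical open
# repository of measurement protocols. The MeasureProtocol structure and all
# field names are invented for illustration; the SOBER paper does not specify
# a schema.
from dataclasses import dataclass, field


@dataclass
class MeasureProtocol:
    name: str                    # canonical name of the measure
    construct: str               # the construct it claims to quantify
    items: list[str]             # full item wording, so studies can reuse it verbatim
    instructions: str            # exact instructions given to participants
    scoring_rule: str            # standard scoring procedure
    norm_population: str         # population in which norms were established
    norm_year: int               # when the norms were collected
    modifications: list[str] = field(default_factory=list)  # documented deviations


# Example entry (all contents invented for illustration):
protocol = MeasureProtocol(
    name="Example Job Satisfaction Scale",
    construct="job satisfaction",
    items=[
        "I find real enjoyment in my work.",
        "I feel fairly well satisfied with my job.",
    ],
    instructions="Rate each statement from 1 (strongly disagree) to 5 (strongly agree).",
    scoring_rule="Mean of all item responses; no reverse-scored items.",
    norm_population="Working adults, nationally representative sample",
    norm_year=2023,
)
print(protocol.name, "measures:", protocol.construct)
```

A record along these lines would capture the items, instructions, scoring rules, and norms the guidelines ask authors to report, while the explicit modifications field would make any deviations from the standard protocol visible rather than buried in a methods section.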
All of this, they argue, should allow for the creation of a truly cumulative evidence base, and for meta-analyses of papers that are all genuinely exploring the same thing, in the same way. "Psychologists need to stop remixing and recycling, and start reusing (measures, not toothbrushes)," they conclude.
Read the paper in full: https://doi.org/10.1038/s44271-023-00026-9