Work and occupational

IQ, Personality, and understanding test design

New research.

08 October 2012

We need to talk statistics. No, seriously, you'll thank me. You'll get a handle on Item Response Theory (IRT), something pretty crucial to occupational assessment, and be able to appreciate the important study we'll go on to discuss. It'll be fine...

If you've taken a modern occupational test, IRT was probably sitting beneath the bonnet making sense of the responses. Traditional tests count correct responses to give an estimate of your true ability: 30/40 means you ought to be better than if you'd scored 23/40. In contrast, IRT moves the unit of meaning from the test to the test item. Getting one question right or wrong gives us an estimate of ability right off the bat; a coarse one, admittedly, but increasingly accurate as further responses are given.

Let's say Item E is easy. Someone at or above average should get it, and those below average have a fighting chance. Item H is hard: the chances of a correct answer are low for most, but the chances rocket up for the sharpest. Each item has a different relationship between a test-taker's ability and the likelihood of them getting it right, and that relationship is described by the item's parameters. (Because I love my readers, I've bodged up a visual example.) You don't need to understand the maths to appreciate that, armed with these parameters, it quickly becomes possible to home in on the true performance behind the item responses. Potent stuff.
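For anyone who does fancy a peek at the maths, here is a minimal sketch in Python. It assumes the two-parameter logistic (2PL) model, one common IRT form; the post doesn't commit to a particular model, and the discrimination and difficulty values given to items E and H are invented purely for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL item curve: chance of a correct answer at ability theta.

    theta: ability (0 = average), a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters, made up for this example.
ITEM_E = (1.2, -1.0)  # easy: most people at or above average get it
ITEM_H = (2.0, 1.5)   # hard: chances only climb for the sharpest

def estimate_ability(responses):
    """Crude maximum-likelihood ability estimate over a fixed grid.

    responses: list of ((a, b), answered_correctly) pairs.
    Searches theta from -4.0 to 4.0 in steps of 0.1.
    """
    best_theta, best_loglik = None, -math.inf
    for i in range(-40, 41):
        theta = i / 10
        loglik = 0.0
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            loglik += math.log(p if correct else 1.0 - p)
        if loglik > best_loglik:
            best_theta, best_loglik = theta, loglik
    return best_theta

if __name__ == "__main__":
    for theta in (-2, 0, 2):
        print(theta,
              round(p_correct(theta, *ITEM_E), 2),
              round(p_correct(theta, *ITEM_H), 2))
    # Someone who gets E right but H wrong lands somewhere in between.
    print(estimate_ability([(ITEM_E, True), (ITEM_H, False)]))
```

With only two responses the estimate is coarse, and a test-taker who gets everything right simply hits the top of the grid; more items narrow things down, which is the "increasingly accurate" part.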

As well as powering the tests, IRT offers an investigative methodology for the following problem: if two populations differ in test performance, does this reflect genuine difference or simply artefacts of how those populations approach the test? Well, if parameters are similar for both groups - the verbally sharp and weak Montagues have the same pattern on items E and H as do their Capulet counterparts - then the items are functioning in the same way, making us more confident the differences are real. If not, we should start to wonder if the test is being contaminated by something else - perhaps Capulets get stressed and guess blindly at items that look tricky, even ones they ought to have got right on account of their raw ability.
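The comparison described above can be sketched roughly as follows. This again assumes a 2PL model, and every parameter value here is hypothetical: the Capulets are given a flatter curve on item H to mimic blind guessing. The gap measure (mean absolute difference between the two groups' curves over a grid of abilities) is a simple stand-in for illustration, not the method used in the study discussed below.

```python
import math

def p_correct(theta, a, b):
    """2PL item curve: chance of a correct answer at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def curve_gap(params_1, params_2, lo=-3.0, hi=3.0, steps=61):
    """Mean absolute gap between two groups' curves for one item."""
    total = 0.0
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        total += abs(p_correct(theta, *params_1) - p_correct(theta, *params_2))
    return total / steps

# Hypothetical (discrimination, difficulty) pairs fitted per group.
ITEM_E = {"montague": (1.2, -1.0), "capulet": (1.2, -0.9)}
ITEM_H = {"montague": (2.0, 1.5), "capulet": (0.6, 1.5)}

gap_e = curve_gap(ITEM_E["montague"], ITEM_E["capulet"])
gap_h = curve_gap(ITEM_H["montague"], ITEM_H["capulet"])

if __name__ == "__main__":
    # A small gap suggests the item works the same way in both groups;
    # a large one suggests something other than ability is in play.
    print("Item E gap:", round(gap_e, 3))
    print("Item H gap:", round(gap_h, 3))
```

In this made-up case item E behaves almost identically across the two houses, while item H does not, which is the cue to start asking what else the item is picking up.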

Put Verona aside. The real issue investigated by Chakadee Waiyavutti, Wendy Johnson, and Ian Deary is whether individuals with low IQ respond to personality tests differently. Personality? Yep, IRT is used for these assessments too, in a slightly fiddlier way - item 'difficulty' and right/wrong binaries need to be translated - with the concepts remaining solid. Higher and lower IQ groups do show slight personality differences in aggregate. If these differences were because personality items were understood differently by these different groups, it would call into question the validity of making judgements about personality when testing across ranges of IQ, which would impact occupational testing in a profound way.

Waiyavutti's team drew on a large data set of 683 individuals born in 1936, categorised into two groups with a mean IQ difference of 21 points. Participants completed two personality tests, the NEO-FFI and the IPIP (both based on the Big 5 personality factors). The researchers produced parameters for each item in each group, and analysed whether the averaged parameters differed significantly between the groups. They found that while the two groups did differ on average - in expected areas such as Intellect, Openness to Experience and Emotional Stability - the personality test items operated similarly. This gives reassurance that these are meaningful differences rather than artefacts of how the items were understood.

So: we can be more confident that personality tests (at least these) are operating in the same way in people of differing IQ, making it reasonable to use them to draw their intended insights. And along the way we've figured out something about how modern tests operate. If you want a fuller exploration of IRT, you may be interested in this open-access article in the Psychologist online.

Waiyavutti C, Johnson W, & Deary IJ (2012). Do personality scale items function differently in people with high and low IQ? Psychological Assessment, 24(3), 545-555. PMID: 22082036