Statistical significance explained in plain English

Warren Davies, a positive psychology MSc student at UEL, provides the latest in our ongoing series of guest features for students.

16 August 2010

By Christian Jarrett

Today I'm delighted to discuss an absolutely fascinating topic in psychology – statistical significance. I know you're as excited about this as I am!

Why is psychology a science? Why bother with complicated research methods and statistical analyses? The answer is that we want to be as sure as possible that our theories about the mind and behaviour are correct. These theories are important – many decisions in areas like psychotherapy, business and social policy depend on what psychologists say.

Despite the myriad rules and procedures of science, some research findings are pure flukes. Perhaps you're testing a new drug, and by chance alone, a large number of people spontaneously get better. The better your study is conducted, the lower the chance that your result was a fluke – but still, there is always a certain probability that it was.

In science we're always testing hypotheses. We never conduct a study just to 'see what happens', because there's always at least one way to make any useless set of data look important. We take a risk; we put our idea on the line and expose it to potential refutation. That's why statistical tests in psychology tell you the probability of obtaining your given set of results (and all those that are even more extreme) if there were really no effect to find – i.e. if the null hypothesis were true.

Say I create a loaded die that I believe will always roll a six. I've invited you round to my house tonight for a nice cup of tea and a spot of gambling. I plan to hustle you out of lots of money (don't worry, we're good friends and always playing tricks like this on each other). Before you arrive I want to test my hypothesis that the die is loaded against my null hypothesis that it isn't.

I roll the die. A six. Success! But wait… there's actually a 1 in 6 chance that I'd get this result even if the null hypothesis were correct. Not good enough. Better roll again. Another six! That's more like it; there's only a 1 in 36 chance of getting two sixes in a row, assuming the null hypothesis is true.
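If you want to check that arithmetic for yourself, here's a minimal Python sketch (just an illustration of the figures above, assuming a fair six-sided die under the null hypothesis):

```python
# Minimal sketch: probability of rolling n sixes in n rolls,
# assuming the null hypothesis (a fair six-sided die) is true.
def p_all_sixes(n_rolls: int) -> float:
    return (1 / 6) ** n_rolls

print(round(p_all_sixes(1), 4))  # 0.1667 -> a 1 in 6 chance
print(round(p_all_sixes(2), 4))  # 0.0278 -> a 1 in 36 chance
```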

The more sixes I roll, the lower the probability that my results came about by chance, and therefore the more confident I could be in rejecting the null hypothesis.

This is what statistical significance testing tells you – the probability that the result (and all those that are even more extreme) would have come about if the null hypothesis were true (in this case, if the die were truly fair and not loaded). It's given as a value between 0 and 1, and labelled p. So p = .01 means a 1% chance of getting the results if the null hypothesis were true; p = .5 means a 50% chance, p = .99 a 99% chance, and so on.
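To see the 'and all those that are even more extreme' part in action, imagine a hypothetical variation: I roll the die ten times and get six sixes, rather than a six every time. The p-value then adds up the probability of six sixes and of every rarer outcome (seven, eight, nine or ten sixes). Here's a sketch using SciPy's binomial test – the counts are made up purely for illustration:

```python
from scipy.stats import binomtest

# Hypothetical data: 6 sixes in 10 rolls of a possibly loaded die.
# Null hypothesis: the die is fair, so P(six) = 1/6 on every roll.
# alternative="greater" includes this result and all more extreme
# outcomes (7, 8, 9 or 10 sixes) in the p-value.
result = binomtest(k=6, n=10, p=1/6, alternative="greater")
print(round(result.pvalue, 4))  # ~0.0024 -> very unlikely with a fair die
```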

In psychology we usually look for p values lower than .05, or 5%. That's what you should look out for when reading journal papers. If there's less than a 5% chance of getting the result if the null hypothesis were true, a psychologist will be happy with that, and the result is more likely to get published.
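Back to the loaded die: since a run of nothing but sixes is already the most extreme possible result, its p-value is simply (1/6) raised to the number of rolls. A quick sketch of how many consecutive sixes it takes to fall below the conventional .05 cut-off:

```python
# How many consecutive sixes before p drops below the .05 cut-off?
ALPHA = 0.05

p = 1.0
n_rolls = 0
while p >= ALPHA:
    n_rolls += 1
    p = (1 / 6) ** n_rolls  # p-value for n sixes in n rolls of a fair die
    print(f"{n_rolls} roll(s): p = {p:.4f}")

# 1 roll:  p = 0.1667 -> not significant
# 2 rolls: p = 0.0278 -> below .05; reject the null hypothesis
```

So just two sixes in a row would already clear the conventional threshold – which is exactly why significance alone shouldn't be the whole story, as we'll see next.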

Significance testing is not perfect, though. Remember this: 'Statistical significance is not psychological significance.' You must look at other things too: the effect size, the statistical power, the theoretical underpinnings. Combined, they tell a story about how important the results are, and with time you'll get better and better at interpreting this story.

And that, in a nutshell, is what statistical significance is. Enthralling, isn't it?