A place for confidence intervals
From October 'Letters'.
15 September 2015
The debate on confidence intervals continues.
Van der Linden and Chryst's assertion ('Why the "new statistics" isn't new', August 2015) that confidence intervals (CIs) are based on null hypothesis significance testing (NHST) is clearly mistaken, as shown both by the history of the two approaches and by the information that each provides.
The history of the development of CIs and NHST makes clear that they are not equivalent, although both were developed as alternatives to applying Bayesian arguments in the absence of a priori expectations. These alternatives were developed at roughly the same time, by different people in different places: Fisher for NHST (1930, 1933, 1935, England) and Neyman for CIs (1934, 1941, Poland; see also Pytkowski, 1932). Neyman (1934) initially viewed Fisher's fiducial limits as essentially the same as his confidence intervals, but later, following careful examination of Fisher's work and discussions with him, both Neyman (1941) and Fisher (1935) determined that they were fundamentally different.
CIs and NHST provide very different information, although they sometimes rely on the same statistical information. With NHST, the significance value (the p value) is the probability of obtaining the observed data, or more extreme data, if the null hypothesis is true. NHST does not provide any information about the likelihood of the data if the null hypothesis is not true. CIs, on the other hand, straightforwardly provide a range of plausible values for a statistic (in future samples) or a parameter, without reference to a null hypothesis.
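To make the contrast concrete, the sketch below computes both quantities from the same sample. It is only an illustration under stated assumptions: the data are invented, the function name `z_test_and_ci` is hypothetical, and a normal (Z) approximation is used for simplicity, where a t distribution would normally be preferred for so small a sample.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_test_and_ci(sample, null_mean=0.0, level=0.95):
    """Return (p_value, (lower, upper)) using a normal approximation."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / sqrt(n)  # standard error of the mean
    std_normal = NormalDist()

    # NHST: two-sided probability of data at least this extreme
    # if the null hypothesis (population mean == null_mean) is true.
    z = (m - null_mean) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))

    # CI: a range of plausible values for the population mean.
    # Note that null_mean plays no part in this calculation.
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    return p_value, (m - z_crit * se, m + z_crit * se)


# Invented data: score differences for 12 hypothetical participants.
scores = [2.1, -0.4, 3.3, 1.8, 0.9, 2.7, -1.2, 1.5, 2.2, 0.3, 1.1, 2.9]
p_value, (lower, upper) = z_test_and_ci(scores)
print(f"p = {p_value:.4f}")
print(f"95% CI for the mean: [{lower:.2f}, {upper:.2f}]")
```

The p value answers a question about one specified null value, whereas the interval reports a range for the mean itself and would be the same whatever null value had been chosen.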
Catherine O. Fritz
Psychology Division, University of Northampton
References
Fisher, R.A. (1930). Inverse probability. Proceedings of the Cambridge Philosophical Society, 26, 528–535.
Fisher, R.A. (1933). The concepts of inverse probability and fiducial probability referring to unknown parameters. Proceedings of the Royal Society of London, Series A, 139, 343–348.
Fisher, R.A. (1935). The fiducial argument in statistical inference. Annals of Eugenics, 6, 391–398.
Neyman, J. (1934). On the two different aspects of the representative method. Journal of the Royal Statistical Society, 97, 558–625.
Neyman, J. (1941). Fiducial argument and the theory of confidence intervals. Biometrika, 32, 128–150.
Pytkowski, W. (1932). The dependence of the income in small farms upon their area, the outlay and the capital invested in cows. Warsaw: Series Bibljoteka Pulawska.
We were puzzled by van der Linden and Chryst's claim that confidence intervals (CIs) are based on null hypothesis significance testing (NHST). Their argument seems to be that both CIs and NHST can make use of the same statistical tools in their calculations – in their example, standard errors and Z-scores. By the same reasoning, one might argue that a grave and a flower bed are the same thing because both are made with the same tool, a spade.
The thrust of our article on building confidence in confidence intervals (June 2015) was that CIs, combined with effect sizes (ESs), provide researchers with much more useful information than NHST does. We acknowledge that some researchers, shaped by the NHST world, will use CIs to draw conclusions about the likelihood of their results occurring by chance, and will probably be encouraged to do so by journal editors. We do not believe that this is a bad thing, because all researchers must make the practical decision whether or not to continue with a particular line of research, and at least the decision then concerns the probable distribution of the effect, rather than a usually irrelevant null hypothesis. However, we hope that researchers will go much further in using CIs.
Of course, CIs cannot give certainty about the population parameter – but they can give a great deal more information than the point estimates that are usually all that is reported. Also, it is not surprising that, as Hoekstra et al. (2014) reported, many psychologists misinterpret CIs. Most psychologists are unfamiliar with them and their use. Perhaps more depressing are the misinterpretations of NHST that occur, despite researchers' familiarity with it.
It is wrong for van der Linden and Chryst to claim that one has no idea whether or not a CI contains the population value. That conclusion would be true only if one accepts the strict frequentist interpretation of CIs that they set out. But the interpretation is hotly debated among statisticians. Instead we hold, as does Cumming (2012), that it is logical to believe that values within a CI are relatively plausible potential population values and therefore CIs are much more intuitive than van der Linden and Chryst suggest.
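For readers unfamiliar with the strict frequentist reading referred to above, it concerns long-run coverage: across many repeated samples, about 95 per cent of 95% CIs should contain the population value. The simulation below is a minimal sketch of that idea only. The population values, sample size, number of trials and the function name `coverage` are all arbitrary assumptions chosen for illustration, and a normal approximation is used for the interval.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(true_mean=100.0, true_sd=15.0, n=50, trials=2000,
             level=0.95, seed=1):
    """Proportion of simulated CIs that contain the true population mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(trials):
        # Draw one sample from a normal population with known parameters.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = mean(sample)
        half_width = z_crit * stdev(sample) / sqrt(n)
        # Count the interval as a hit if it captures the true mean.
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials


rate = coverage()
print(f"Proportion of intervals containing the true mean: {rate:.3f}")
```

The simulation describes the procedure over many samples; how far that long-run property licenses statements about any single interval is precisely the point under debate.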
We would not want the casual reader to believe that van der Linden and Chryst have successfully defended current statistical practice from the arguments we presented. They are no fans of NHST, but instead want psychologists to adopt a third approach, Bayesian statistics. We have sympathy with Bayesian approaches, but we do not recommend a wholesale and enforced conversion of psychological data analysis to Bayesian methods, for two main reasons. First, the challenge of persuading all psychologists to retrain in sophisticated Bayesian techniques seems beyond what could be accomplished at the moment. We have found that even getting psychologists to think about effect sizes, rather than merely reporting them from statistical package output, is a challenging task. Secondly, moving to a Bayesian approach will not eliminate problems and disputes. Most Bayesian methods (but not all; see Wagenmakers et al., 2011) require the choice of prior probabilities on which the calculations build. Such priors are often contentious. As one recent example, Wagenmakers et al. argue that the prior probability for analysing a study on precognition should be .00000000000000000001. Not surprisingly, this makes finding supporting evidence for precognition very difficult! Others who are more sympathetic to the possibility of precognition would argue for a much more generous prior probability, leading to a greater likelihood of positive results.
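The influence of the prior can be seen with a little arithmetic: in the odds form of Bayes' rule, posterior odds equal prior odds multiplied by the Bayes factor (the weight of evidence supplied by the data). The sketch below applies that rule. The Bayes factor of 100 and the three priors are hypothetical values picked purely for illustration (the smallest is of the order of the tiny prior quoted above), and `posterior_probability` is an illustrative helper, not a function from any statistical package.

```python
def posterior_probability(prior, bayes_factor):
    """Posterior probability of a hypothesis, from its prior and a Bayes factor."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bayes_factor  # odds form of Bayes' rule
    return posterior_odds / (1 + posterior_odds)


# Hypothetical evidence: data 100 times more likely under the hypothesis
# than under its alternative.
bayes_factor = 100.0

# Three illustrative priors: extremely sceptical, mildly sceptical, neutral.
for prior in (1e-20, 0.01, 0.5):
    posterior = posterior_probability(prior, bayes_factor)
    print(f"prior = {prior:g}  ->  posterior = {posterior:.3g}")
```

With the same evidence, the most sceptical prior leaves the hypothesis all but impossible, while the more generous priors leave it roughly even or very probable, which is why the choice of prior is so often disputed.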
We believe that the calculation of CIs and ESs is well within the skills of all psychologists, and that, if they explore this approach, they will find that they have more insight into their data and are able to communicate more useful information to their readers.
Peter E. Morris
Graham D. Smith
Psychology Division, University of Northampton
References
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.
Hoekstra, R., Morey, R.D., Rouder, J.N. & Wagenmakers, E.J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.
Wagenmakers, E.J., Wetzels, R., Borsboom, D. & van der Maas, H.L.J. (2011). Why psychologists must change the way they analyse their data. Journal of Personality and Social Psychology, 100, 426–432.