Research

What some people say about what they think they think

We speak to Professor Brian M. Hughes about his new Palgrave book ‘Rethinking Psychology’, and run an exclusive extract.

30 January 2017

You write about the tension between science and pseudoscience which threatens psychology. How has this come about?

Psychology appeals to many audiences, not all of them well versed in, or even interested in, science. Most psychology graduates build their careers outside the research-oriented university environments they were trained in. And psychologists themselves often identify more with the helping professions than they do with science.

Psychology's empirical ethos can slip easily from people's minds, if indeed it is ever there.

There is always tension between rigorous science and the applied relevance, public sympathy, and vocational instinct pursued by most psychologists. Even full-time researchers are constantly urged to emphasise the 'relevance' of their work. Therefore, people interested in psychology – and many psychologists – often focus on the products of research rather than the process.

Pseudoscience is a philosophical term for describing activities that claim to be, but are not really, scientific. When psychologists champion psychology but ignore the nuts and bolts of research, they benefit from psychology's scientific status without truly valuing the science. To me, this at least resembles, if not constitutes, pseudoscience within psychology.

In the battle between sense and nonsense, who's winning?

If we take sense to mean 'assertions we can confidently defend (because they incorporate both evidence and logic)' and nonsense to mean 'assertions we cannot confidently defend (because they lack evidence and/or logic)' then we must admit that psychology displays plenty of both. In my view, nonsense is winning at this moment.

Sense involves hard work, ethics, self-evaluation and vigilance. Nonsense is frictionless. Therefore, nonsense is more successful. Sensational trashy newspapers are read far more than staid academic journals. In this era of fake news, post-truth, and alternative facts, popular antipathy towards evidence-based knowledge represents a significant cultural threat.

Psychologists should promote empiricism and criticise poor research. In the extract you have online, I discuss self-report methods. But throughout Rethinking Psychology I argue that experimental and biological psychologists also drift towards pseudoscience. We draw unwarranted conclusions about evolutionary influences on behaviour, and we oversell brain-imaging research.

How should the average psychologist rethink their discipline?

We should view psychology as a research discipline that can be applied, rather than an applied discipline that is based on research. Psychologists should champion the principle that all genuine knowledge must be traceable to sound evidence. We should see ourselves as part of an intellectual movement that promotes rationality, logic and defensible knowledge.

Secondly, psychologists should rethink their duty as defenders of empiricism. We should consider the relevance of psychology outside the academic or mental health bubble. Psychology gathers evidence that resolves uncertainties about how people think and behave. We should be able to inform public debate on many issues.

Finally, psychologists should not critique other disciplines until they adequately critique themselves. We should ask whether psychology has become too tolerant of non-rigorous research.

Psychology is a public good well worth defending. If we seriously rethink what we are doing and how we do it, we can make a genuine contribution to humanity.

- For your chance to win a copy of the book, keep an eye on @psychmag on Twitter. Now we turn to an exclusive extract, courtesy of Palgrave.

In previous chapters of this book, I considered biological reductionism. If it is a problem, then what might be the solution? According to many psychologists, these obstacles are best overcome by circumventing biological paradigms altogether and reverting to a more common touch. If biological approaches are divorced from the individual human perspective, then the solution must lie in restoring the link: simply put, if you wish to explain the human experience, then go talk to some humans.

The idea that psychologists can best explore human psychology simply by talking to people takes many forms. It also rests on many assumptions. First is the basic assumption that the human experience is made up of discrete thoughts that occur within the human mind, most likely in the form of an inner narrative or 'live commentary' that comprises the individual's autobiographical monologue. A second assumption is that thoughts are available to the person who has them, and can be called to consciousness both readily and on demand. A third assumption is that most people will be willing to describe these thoughts should a researcher ask them to do so. And a fourth assumption is that experiences so described will inform psychologists in their attempt to extract generalizable principles about how humans think, feel, and behave. On these premises rests a huge portion of psychological research conducted over the past century, in both pure (e.g., social, developmental, personality) and applied (e.g., health, occupational, educational) domains.

However, each of these assumptions can be questioned in elementary ways. The first is perhaps the most obscure, but also the most fundamental. Whether or not human minds generate discrete sequential thoughts is very much open to doubt. Far more likely is the prospect that minds continuously generate a multiplicity of simultaneous thoughts, in the form of parallel processing. The idea that there is an inner monologue constantly representing our psychological state at any given moment is unlikely to fully account for the way we think. This also means that the second assumption – that people can directly access thoughts in a manner that supports their reporting them to a researcher – is mechanically questionable. Thoughts are not produced in such a way as to present themselves neatly to the person in whose mind they are formed. People often use obtuse reasoning to produce logically indefensible conclusions, and frequently derive erroneous impressions of what they remember. Thoughts not afflicted by heuristic error will nonetheless be subject to self-serving bias. The generation of ideas in the human mind is not ordered, apparent, and easily parsed; on the contrary, it is fuzzily sporadic, camouflaged by mental shortcuts, and virtually automatic. Such automaticity makes it parlous to assume that people even know what it is that they think.

A commonly cited example of cognitive automaticity occurs when we engage in a well-learned behaviour, such as driving a car. Driving requires a person to monitor their perceptual environment on a moment-by-moment basis, to continuously make a series of if-then type decisions regarding this input, to initiate actions (such as steering, signalling, and braking) as and when they are required, and to engage in several other related cognitive processes. When they are well practised, these activities are achieved as though they were instinctive. Motorists will most likely not even notice the detailed thinking they engage in as they glide along the roadway. Indeed, so submerged are their cognitive elaborations that drivers often feel their minds are sufficiently unencumbered for them to concentrate adequately on other things, such as conversing with a passenger or listening to the radio. Only in the event of a sudden interruption, such as a near-miss encounter with another vehicle, will specific driving-related cognitions intrude into conscious awareness. (One way to illustrate this point is to recall what happens when such behaviours are not well learned. For example, when you are taking part in your very first driving lesson, the sheer complexity of the required cognitions will probably overwhelm your consciousness to an extent that will make the task of driving seem a terrifying ordeal.) The point is that people who drive cars do so without having an up-to-the-second awareness of each and every thought required for the job. Asking them why they have driven in a particular manner might not be the best way to find out what they were thinking at the time.

Of course, an interview with a motorist about specific acts of driving might not make for the most compelling psychological research. Yet cognitive automaticity is not confined to banal or routine activities. Take, for example, the cognitions of a person attempting to console a distressed friend. When offering support to someone who is distraught, we try to choose very carefully not only what to say but also how to say it. To ensure that we react sympathetically to their responses, we monitor and recalibrate our tone as we go. We will approach the interaction having taken account of our own previous experiences, both direct and vicarious, as well as any relevant advice that we can remember receiving from others. Our choice of words will be influenced by our assessment of the various available options as we imagine and understand them, and these in turn will be based on the extent of our vocabulary and the conditions under which we learned each phrase and idiom. And our overall approach will be influenced by our own emotions at the time, including the sadness we will feel when witnessing our friend's distress. In short, our task will be inordinately multi-faceted and heavily laden with nuance. Nonetheless, it is one which we will probably pursue in an intuitive, almost instantaneous, fashion. We go with the flow, we open our mouths, and things come out. We won't deconstruct all the considerations we have to make: all the if-then choice points, the individual inferences about our friend's facial reactions and vocal pitch, all the momentary readjustments of our own demeanour and our efforts to act with appropriate empathy. If called upon afterwards to explain each and every choice we made (Why did you use that tone of voice? Why did you choose that phrase? Why did you stand like that?), we would have great difficulty telling a researcher the exact thoughts that led us to our decisions. 
Such helping behaviour is of great interest to psychologists: many studies have examined the various informal ways we all try to support each other through distressing experiences, and many more have sought to identify the best way of doing so in a systematic therapeutic context. However, given the extent to which the critical details of these effectively automatic processes lie beyond the reach of our awareness, efforts to ask people about their thoughts when navigating these interactions seem almost pointless.

And then we have the third assumption – that people are willing to describe their thoughts when asked to do so. We have already considered how people moderate their utterances, engage in self-censorship, and embellish what they say for various reasons. When relaying events, witnesses frequently exaggerate some details in order to highlight what they feel is pertinent. Alternatively, they might choose to omit other details for the same purpose. The result of either approach is the same: an impression conveyed to listeners that is flawed in terms of accuracy. But perhaps of more concern is the way people's reports are affected by social motivations that arise from modesty and vanity. The number of respondents who will deliberately cast themselves in a negative light, or even run the risk of doing so, can be presumed to be very low. Indeed, very many people will expend effort to cast themselves in a deliberately positive (as opposed to neutral) light. For these reasons, when considering the reliability or accuracy of what people tell us, we must factor in the way they are trying to manage our impressions of them. This is hardly an unknown requirement: the problem of social desirability bias is one of the most discussed by research methodologists in psychology. However, the fact that psychologists readily recognize the issue, and ponder it at length, does not guarantee that they have any certain way of circumventing it.

With social desirability bias comes an inevitable interpretation paradox. Given the flimsy reliability and objectivity of anecdotal evidence, we can presume that researchers would prefer to use other methods should they be available. So the main reason researchers end up directly interrogating their participants (or giving them questionnaires, which amounts to much the same thing) is that they have concluded that no other such method is actually available. The subject matter is not something that can be observed using other approaches. The very fact that we must go to the trouble of asking people to tell us something indicates that their views on the matter are not ordinarily available for public scrutiny. We are inquiring about subject matter that the person has chosen not to broadcast, private thoughts that are hidden from view, presumably for a reason. Moreover, once disclosed these thoughts cannot be made private again. The very irreversibility of disclosure will likely make many respondents quite reticent. Thus, whether used to measure participants' feelings, attitudes, or past behaviours, self-report methods are most often targeted at precisely those subjects about which participants are cautious. In short, self-report is most likely to be used when social desirability bias is most likely to be a problem.

As an obvious example, take the issue of sexuality. Because human sex is primarily a private behaviour, nearly everything we know about it is filtered through the lens of social desirability bias. (In this regard, it is tempting to refer to a popular online parody video newsreel, which describes the results of a new Teen Sex Survey: 'Nearly 100% of boys, ages 12 to 15, report that they have sex all the time and are definitely not virgins.'; 'Teen boys losing virginity earlier and earlier, report teen boys', 2014.) It can be argued that any subject matter with an emotional aspect will fall foul of similar difficulties. Emotions are experienced privately and so, to establish their existence (or measure their intensity), we must rely on their being communicated second-hand by the person experiencing them. The problem is that human emotions are blatantly communicative: people regulate their expression in an attempt to control what they reveal. As a result, often what is communicated is not precisely what is felt. In one study conducted at my own laboratory, a group of college students were asked to perform a logic task on a computer (Hughes, 2007). Immediately afterwards, some of the participants who had performed quite well on the task were misled into believing that they had done very poorly (we told them that their scores were as bad as those of the weakest competitors in their age group). In essence, we fooled them into thinking they had flunked the task. When asked how this made them feel, the group reported no more distress than other students in the study. However, all the students were having their blood pressure monitored, so it was possible to examine other aspects of their state of mind. The cardiovascular data showed that these students exhibited large spikes in blood pressure, akin to those of people undergoing acute mental stress.
Students who were given neutral or positive feedback showed no such changes in blood pressure, suggesting that their feedback caused them little or no distress. Overall, therefore, the story told by the cardiovascular data cast doubt on the validity of the self-reports: students who thought they had flunked were unwilling to tell us they were disappointed. It is worth recalling that this was all in the context of a modest laboratory experiment at a university, involving anonymous participation in a five-minute computerized mental rotation task.

Nonetheless, social desirability served to undermine the value of self-report in this banal context. It is hard to imagine that participants would have been any more forthcoming had they been asked to talk about their political opinions, personal values, altruistic tendencies, mental health, hygiene habits, condom use, intentions to quit smoking, or any of the real-life topics addressed by studies in which self-report methods are typically employed.

The fourth and final assumption is no less of a minefield: the idea that the testimony of a given group of participants can be used as the basis to make broad inferences about people in general. This is the straightforward enough issue of sampling validity, where researchers must think about whether the people they are studying are sufficiently representative of the population they wish to describe. The problem of ensuring sampling validity is not unique to studies that employ self-report methods. However, it is an especially important consideration for these studies, not least because of the vast range of ideas and opinions that human beings can have. It is easy to imagine that even a large sample of participants in a particular research study might hold views that are different to those of people not involved in the study. Indeed, because of the social desirability problem, the very fact that some participants are willing to participate in a study at all can mean that the views they report reflect a frame of reference that other people may not share. Sexually conservative people may not wish to participate in a study of sexual attitudes; emotionally reserved people may not wish to participate in a study of mental health. This conflation of sampling with subject matter is an acute problem for studies where the subject matter is sensitive. And as is argued above, self-report methods are usually resorted to for precisely those types of subjects.

In conclusion, it can be sentimentally attractive to advocate the view that psychologists who wish to learn about the human condition should simply talk to people. However, the various questionable assumptions underlying such a strategy mean that the data collected will be hampered by a number of qualifications. Rather than yielding direct insights about what people think, it is safer to consider such data as comprising what some people say about what they think they think.

Our survey says…

Most self-report data are gathered through questionnaire-based methods. Sometimes this involves an attempt to solicit open-ended testimony, such as when participants are asked to describe their feelings about a specific topic. On other occasions, questionnaires are carefully prepared using a number of statistically-grounded methods that together are referred to as psychometrics. Psychometric tools usually require participants to supply a series of ratings or other quantitative responses to sets of rigidly structured requests. Technically, this latter style of instrument is not always a true 'questionnaire', in the sense that its various items may not be phrased as questions. Instead, the instruments might instruct participants to rank a list of options in order of their preferences, or to declare which of two (or more) statements best describes their feelings. A key advantage of these tools is that, by tightly structuring the format of participants' responses, it becomes possible to make reasonable assessments of response patterns using statistics. For example, it is possible to establish the statistically average response to a particular question, to compute numerical summaries of the respondent's own response tendency or style, and to identify profiles or patterns of responses across large numbers of questions in ways that then facilitate comparisons with other people. With this overall approach, psychologists have been able to develop relatively brief instruments that seek to measure a huge range of psychological variables, including attitudes, emotions, and mental health symptoms. By and large, these tools have proven to be valuable.

For example, when used in mental health, such psychometric instruments enable clinicians to form a confident view as to whether a client's self-reported feelings are close to the average of those reported by other similar people, or whether they are sufficiently far from the average as to warrant clinical attention. In the same way as a person's blood sugar or bodyweight can be classified as either normal or clinically high by comparing them with measures taken from healthy and unhealthy people, similar protocols can be developed using psychometrics to gauge a person's level of depression, anxiety, or (say) obsessive-compulsive thinking. Similarly, an occupational psychologist can employ a statistically-based psychometric tool to assess whether a candidate's attitudes resemble those of good managers, of effective innovators, or of incorrigible procrastinators. In these contexts, the strength of the psychometric approach lies in the ability to compare an individual participant's responses with statistical benchmarks derived from normative population samples. Without such data (including, critically, the benchmarks), practitioners would be left relying entirely on personal judgement to decide whether a client's psychological functioning was sufficiently distinct as to require comment or intervention.
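To make the idea of a statistical benchmark concrete, here is a minimal Python sketch of the comparison described above: one client's score is expressed as a distance from the normative mean, in standard-deviation units, and flagged if it lies beyond a cut-off. The normative scores, the client scores and the two-standard-deviation cut-off are all hypothetical, chosen only for illustration; real instruments publish their own norms and clinical thresholds, which this sketch does not reproduce.

```python
from statistics import mean, stdev

# Hypothetical normative sample: total scores from a reference group.
NORM_SCORES = [10, 12, 9, 11, 13, 10, 12, 11, 9, 13]


def z_score(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Number of standard deviations a score lies from the normative mean."""
    return (raw_score - norm_mean) / norm_sd


def flag_for_attention(raw_score, norm_scores, cutoff_sd=2.0):
    """Compare one client's score with a normative sample.

    Returns (z, flagged), where flagged is True when the score lies at
    least `cutoff_sd` standard deviations above the normative mean.
    The cut-off is an illustrative assumption, not a clinical standard.
    """
    m = mean(norm_scores)
    sd = stdev(norm_scores)  # sample standard deviation of the norms
    z = z_score(raw_score, m, sd)
    return z, z >= cutoff_sd
```

With these made-up norms (mean 11), a client scoring 15 sits roughly 2.7 standard deviations above the mean and is flagged, while a client scoring 12 is not. Without the normative sample, the raw score of 15 would carry no such meaning on its own.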

Such tools have also proven useful in pure research contexts, by allowing psychologists to develop systematic ways of studying differences across people. A field where this has very clearly been the case is the study of personality. By using psychometrically developed questionnaires, psychologists have refined the way we understand how human personality is expressed, its consistency over time, and the way dispositions are inherited by children from their parents. Statistical techniques such as factor analysis have derived strikingly stable patterns in personality questionnaire data, across time and across cultures. For instance, virtually a century's worth of data now supports the view that human beings around the world can usefully be described in terms of a dimension ranging from introversion to extraversion. Every person can be scored as being extremely introverted, extremely extraverted, or somewhere specific in between. Moreover, the data shows that wherever a person is ranked on the dimension between the extremes, it is highly probable that they will continue to be ranked at roughly the same point whenever they are assessed in the future. It is even possible to statistically establish that introversion–extraversion is around 54 per cent heritable (Bouchard Jr & McGue, 2003). This means that across a group of people, more than half of the variation in this trait will be attributable to genetic, rather than environmental, factors. By comparison, this is around the same level of genetic heritability as has been established for body-mass index (Silventoinen, Magnusson, Tynelius, Kaprio, & Rasmussen, 2008) and cardiovascular disease risk (Fischer et al., 2005), two health characteristics that most people recognize as being biologically ingrained. 
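The heritability figure can be read through the textbook definition of broad-sense heritability, written here as a worked equation. The split of phenotypic variance into purely genetic and environmental parts (ignoring gene-environment interaction and covariance) is a simplifying assumption of this sketch, not a description of how Bouchard and McGue arrived at their estimate.

```latex
H^2 = \frac{V_G}{V_P}, \qquad V_P \approx V_G + V_E

H^2 \approx 0.54 \;\Longrightarrow\; V_G \approx 0.54\,V_P > 0.5\,V_P
```

Here $V_P$ is the variance of the trait across the sampled population, $V_G$ the portion associated with genetic differences and $V_E$ the portion associated with environmental differences. The ratio describes variation within a group of people; it does not say that 54 per cent of any one individual's extraversion comes from their genes.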
As well as introversion–extraversion, psychometric research has found humans to be conspicuously characterized by other traits in ways that are similarly consistent and genetically heritable, such as their levels of emotional stability (which range from extremely stable to extremely unstable). In all, the number of major traits that have been identified as universally applicable to human personalities is believed to be relatively small, with the consensus largely favouring the idea that it is no more than five.

In sum, the psychometric approach to questionnaire research has been found to be useful. However, it is always worth remembering that this usefulness is almost entirely derived from the statistical nature of psychometrics. The scores produced by psychometric tools are contrived to be statistically consistent. Tools found to show poor reliability – that is, to yield scores that are not stable across repeated administrations – are deliberately overhauled or else simply abandoned. And when psychometricians modify a questionnaire, they do not dwell on the content of the questions; instead, they retain or drop questions on the basis of whether their responses exhibit the desired statistical stability. In this sense, the actual content of the question is almost irrelevant. Take the case of a psychometric questionnaire designed to measure a person's competence as a manager. If a particular question – for example, 'Do you enjoy watching television?' – consistently shows a high degree of statistical association with managerial competence (acclaimed managers always give one particular response, while notably poor managers always give a different one), then this question will be retained even though its phrasing appears to have little or nothing to do with management ability. On the other hand, if a question consistently shows a low degree of association with managerial competence – for example, 'Do you enjoy being a manager?' – then it will be discarded even though its phrasing appears, at face value, to be very relevant to management ability. The fact that the content of the latter question appears to be more related to management has no bearing on its usefulness in psychometric terms. The former question is more useful because, even though its wording concerns other things, good managers answer it in a way that makes them statistically discernible from bad managers. It is this statistical consistency that makes such a psychometric instrument actually work.
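The item-retention logic described above can be sketched in a few lines of Python. The sketch below keeps or drops items purely on the strength of their correlation with an external criterion (here, whether each hypothetical manager was independently rated good or poor), never looking at the wording. The item names, responses, ratings and the 0.3 retention threshold are all invented for illustration; real test construction uses far larger samples and additional checks.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # an item everyone answers identically cannot discriminate
    return cov / (sx * sy)


def select_items(item_responses, criterion, min_abs_r=0.3):
    """Retain items whose responses correlate with the criterion.

    The item text plays no part in the decision: only the statistical
    association between responses and the criterion does.
    """
    kept = {}
    for item, responses in item_responses.items():
        r = pearson_r(responses, criterion)
        if abs(r) >= min_abs_r:
            kept[item] = r
    return kept


# Hypothetical data: 1 = yes, 0 = no; criterion 1 = rated a good manager.
ITEM_RESPONSES = {
    "enjoys_watching_tv": [1, 1, 1, 0, 0, 0],
    "enjoys_being_a_manager": [1, 0, 1, 1, 0, 1],
}
CRITERION = [1, 1, 1, 0, 0, 0]
```

Run on this toy data, `select_items(ITEM_RESPONSES, CRITERION)` keeps the television item (which perfectly separates the two groups) and discards the item about enjoying management (which does not separate them at all), mirroring the example in the text.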

The big problem with questionnaires relates not so much to the method itself, but to the way casual observers view the field. To casual observers, the statistical underpinnings of psychometrics are typically invisible. They do not readily recognize that so long as a correlation is established between response patterns and outcomes, it hardly matters if the questions being responded to are semantically linked with those outcomes. Very often, people tasked with designing a questionnaire (such as psychology students embarking on a research project) become very invested in the content of the questions, and assume that the intrinsic value of the tool can be gauged from the way each is worded. They overlook the fact that usefulness is in fact determined by the comparative linking of responses with independent information concerning the construct that is to be measured (such as whether particular responses are consistently returned by good managers, but not by bad ones). It is only through examining statistical associations and comparisons with benchmarks that meaningful conclusions can be drawn. Casual observers tend to make the false inference that questionnaires, in their own right, are inherently useful in face-value terms. They feel that asking managers about management simply must be more informative than asking them about television, regardless of any amount of statistical data showing that what people say about television is more closely correlated with managerial prowess. Of course, when we refer to casual observers, we are not exclusively talking about amateur social scientists or lay readers of the psychological literature. It would appear that many mainstream researchers, including some who make questionnaire studies their life's work, observe the psychometric side of things quite casually indeed.

Ignoring the statistical potential of psychometrics, and instead relying on the semantic content of questionnaires to interpret responses, reduces the resulting data to the status of what some people say about what they think they think. It means that researchers are ignoring the various qualifications that make self-report testimony unreliable, and are holding firm to the four naïve assumptions that inform the idea that simply talking to people will generate useful insights about human psychology. This then leads to the problem of research findings that cannot be substantiated or, worse, the claim that particular findings are the result of scientific research when in fact they are based on evidence that is little more than anecdotal. Let us consider just a few illustrative examples of recent research appearing in major journals of psychology.

Example #1. How afraid are you of terrorists?

Surveying is a common way to attempt to gauge people's views on major political or cultural circumstances of the day. In one study, researchers used survey data to investigate whether British people's perceptions of terrorist threat were a factor in determining their levels of social prejudice (Greenaway, Louis, Hornsey, & Jones, 2014). The researchers also aimed to assess whether people's sense of control over life moderated this link, based on the theory that feelings of control serve to nullify the way threat leads to paranoia. Having analysed data from over 2,000 UK citizens, the authors concluded that their theory was borne out. However, the measurement of all relevant variables was clouded by the use of self-report methods.

Take the measure of perceived terrorist threat. This was based on responses to the following question: 'Do you think a terrorist threat somewhere in the UK during the next 12 months is [not at all likely/not likely/likely/very likely]?' At the very least, this question is quite vague. What, for example, might be meant by the term 'terrorist threat' in this context? An actual terrorist attack, or the mere existence of some unknowable and unrealized risk? Are we to take it that a threat can only be deemed highly likely if it is inevitably going to be followed through? Or is it legitimate to say that a threat is highly likely when there is some terrorist somewhere in the community who has some vague but as yet unconsummated intention to consider action in the future? It is certainly conceivable that different people will attach different meanings to such a question, which then makes it difficult (if not impossible) to interpret the responses given to it.

Secondly, take the measure of social prejudice. This was based on answers to a number of questions, including: 'Would you say it is generally good or bad for the UK's economy that people come to live here from other countries?' Once again there is a level of vagueness to such a query. For one thing, it is possible to take a purely economic approach to the issue without being influenced by social prejudice at all (in other words, it is possible to hold a generalized view about the impact of transnational migration on economic growth in tariff-bound territories, without being addled by racism). At the other extreme, responses may be economically uninformed but driven entirely by xenophobia. Therefore, it is impossible to know whether a person answering such a question in the negative is socially prejudiced or not. In this case, the problems are undoubtedly compounded by social desirability. We can easily imagine that participants might be reluctant to admit to opinions that others may perceive as prejudiced; we can even expect that people's honest utterances will not fully reflect the degree to which social prejudices are implicit in their worldviews (Greenwald, McGhee, & Schwartz, 1998). In short, even though 2,000 Britons have reported their attitudes to the survey-takers, it appears wholly unsafe to treat their responses as de facto measures of the variables under investigation.

Example #2. How afraid are you of death?

In a second study, which addressed somewhat overlapping themes, researchers sought to investigate the way people's attitudes to mortality affect their willingness to become political martyrs (Orehek, Sasota, Kruglanski, Dechesne, & Ridgeway, 2014). For attitudes to mortality, the researchers employed a previously standardized psychometric tool, which was certainly a strength of the study. However, for martyrdom, the researchers took the approach of simply asking their participants questions that, at face value, concerned the relevant subject matter. Specifically, they asked the participants to rate their agreement with the following two statements:

'If faced with circumstances that required as much, I would sacrifice my life for a cause that was important to me' and 'I would not sacrifice my life for a cause highly important to me' (responses to the latter statement were reverse-scored before being combined with responses to the former). It should be noted that the 119 participants were all students at a North American university, and so were unlikely to have had day-to-day contact with political martyrdom. Nonetheless, even in a study dealing in hypotheticals, the reliance on self-report once again produced anomalies.

As before, a major source of confusion concerned the vagueness of the questions – in this case, the use of the phrase 'sacrifice my life'. For some people, being willing to sacrifice one's life for a cause in circumstances that require as much might involve becoming a suicide bomber. For others, however, it might involve something far less malign: for example, refusing to leave one's civilian home despite knowing that an enemy is about to launch a deadly air strike (in fact, it may be reasonable to speculate that the latter interpretation would apply to more people than would the former). It can further be argued that answers to questions on martyrdom will inevitably overlap with those on mortality attitudes. Both sets of questions ask participants to indicate their willingness to die; the statistical association of the responses more likely reflects this overlap in meaning than it does the existence of two discrete cognitive constructs, one causally driving the other. Notwithstanding the effort invested in presenting these questions as part of a systematically structured research study, it seems doubtful that cognitions relating to martyrdom can be gauged simply by asking participants to report them.

Example #3. How violent are you? (And how much do you eat?)

In the next example, researchers attempted to apply self-report methods to younger participants. In a study of teenage girls, a team of researchers sought to establish whether a history of violent behaviour was associated with a history of weight-loss dieting (Shiraishi et al., 2014); their theory drew on previous studies suggesting links between adolescent weight control and aggression. Over 9,000 girls completed the questionnaires. On the basis of their statistical analyses, the researchers concluded that past engagement in weight-loss dieting was indeed associated with higher levels of violence towards both people and objects. However, as all variables were measured by self-report, the actual meaning of this statistical result is deeply unclear.

The main problems here relate to the fact that both target variables are socially sensitive. It is hard to imagine that all teenagers will be equally forthcoming in providing accurate information about their history of violence. Some participants may inflate reports of such behaviour out of teenage bravado; others may do the opposite, especially if they fear that admissions would expose them to a risk of criminal prosecution (a concern that reassurances about confidentiality are unlikely to fully assuage). Reports of dieting history are unlikely to be straightforward either. For one thing, participants with extremely disordered eating habits may be reluctant to report them to others, and some may be unable to admit their full extent, even to themselves. In fact, there may be a simple methodological reason for self-reported violence to be associated with self-reported weight-loss dieting in such datasets: people vary in the degree to which they are inhibited when asked to reveal risqué aspects of themselves, and most people who are shy about reporting one of these behaviours will likely be shy about reporting the other as well.
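The shared-inhibition argument can be made concrete with a small simulation. The sketch below is purely hypothetical and is not drawn from the Shiraishi et al. data: it assumes that actual violence and actual dieting are unrelated, normally distributed quantities, and that each girl's reports of both are shifted by the same made-up 'openness' trait (how freely she discloses risqué facts). The helper names `pearson`, `simulate` and the parameter `openness_weight` are illustrative inventions. Under these assumptions, the two self-reports end up correlated even though the behaviours themselves are not.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate(n=10_000, openness_weight=1.0, seed=1):
    """Hypothetical model: two unrelated behaviours, both reported through
    one shared willingness-to-disclose trait."""
    rng = random.Random(seed)
    true_violence, true_dieting = [], []
    reported_violence, reported_dieting = [], []
    for _ in range(n):
        violence = rng.gauss(0, 1)   # actual behaviour, independent of dieting
        dieting = rng.gauss(0, 1)    # actual behaviour, independent of violence
        openness = rng.gauss(0, 1)   # assumed disclosure tendency of this person
        true_violence.append(violence)
        true_dieting.append(dieting)
        # Each report is the true value shifted by the same disclosure tendency.
        reported_violence.append(violence + openness_weight * openness)
        reported_dieting.append(dieting + openness_weight * openness)
    return (pearson(true_violence, true_dieting),
            pearson(reported_violence, reported_dieting))


true_r, reported_r = simulate()
print(f"correlation between actual behaviours: {true_r:+.2f}")
print(f"correlation between self-reports:      {reported_r:+.2f}")
```

With unit variances and `openness_weight` set to 1, the expected correlation between the two reports is 1/(1 + 1) = 0.5, while the actual behaviours correlate at roughly zero; setting `openness_weight` to 0 removes the artefact. The additive model is a simplification chosen for clarity, not a claim about how disclosure actually works.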

Example #4. How lazy are you?

In the final example, researchers attempted to apply self-report methods to children. In this case, the researchers were interested in finding out whether physical activity (as opposed to sedentary behaviour) was associated with academic success in primary school (Haapala et al., 2014). Having analysed data from 186 school pupils, they concluded that children's physical activity levels were predictive of both their literacy and their numeracy: the more physically active the children were, the better they were at reading and counting. However, while academic abilities were derived from (presumably objective) school tests, the information on physical activity was based on self-reports. There are a number of reasons why such an approach might be unreliable.

Firstly, the children were very young – aged between 6 and 8 – and so may have lacked the concentration or conceptual understanding required to provide accurate answers to the many questions asked (for example, the children were asked to report separately the extent of their engagement – in minutes per day – in supervised exercise, organized sport, organized non-sport exercise, unsupervised physical activity, physically active commuting, physical activity during school breaks, and so on). Secondly, because they were so young, their parents helped them to complete the questionnaires. While this may have improved the accuracy of some responses (such as reports of physical activity in which the parents were involved), it may have had an adverse influence on others. In particular, the parents' concern with social desirability may have influenced the children's answers (after all, few parents would wish to be perceived as failing to encourage their children to be healthy). Ultimately, the correlation between self-reported physical activity and literacy and numeracy may simply reflect the possibility that children with weaker reading and counting skills understood or answered the questions (which called for estimates in minutes per day) differently from their peers.

Considering individual examples helps to illustrate the ways in which research reliant on self-report will always face certain limitations. These limitations correspond to the underlying assumptions: assumptions relating to the formation and accessibility of thoughts, the willingness of participants to share their thoughts, and the degree to which such thoughts truly represent the discrete constructs being investigated. These problems are not confined to a few recent (and hand-picked) cases. When the major journals of social psychology are scrutinized, similar methods appear prominently and frequently. Of articles describing empirical studies that appeared in the British Journal of Social Psychology during the last decade, the five most cited included the following: a study linking self-reported health habits to self-reported self-esteem (Verplanken, 2006); a study linking self-reported social identification with self-reported social support and self-reported life satisfaction, but not self-reported stress (Haslam, O'Brien, Jetten, Vormedal, & Penna, 2005); and a study linking self-reported cultural stereotypes with self-reported in-group favouritism (Cuddy et al., 2009). During the same period, the five most cited empirical papers in the Journal of Personality and Social Psychology included: a study linking self-reported positive emotions with blood pressure, but also with self-reported benefit-finding during negative situations (Tugade & Fredrickson, 2004); a non-blinded intervention study in which self-reported positive emotions were linked with self-reported social support, self-reported life satisfaction, and self-reported depression (Fredrickson & Cohn, 2008); and a study linking self-reported political orientation with self-reported values and moral judgements (Graham, Haidt, & Nosek, 2009). It is clear that self-report survey methods are heavily relied upon in certain areas of psychology. However, whether the problems associated with the underlying assumptions are ever properly dealt with, or are simply ignored, is less clear.

Researchers will often argue that surveys are the only feasible method with which certain variables of interest can be examined. Certainly, many variables – such as self-esteem, political orientation, personal value systems, intentions, life satisfaction, and depression – are difficult to examine without consulting participants directly. Private behaviours – such as sexual activity – are also hard to examine objectively, but for other reasons. However, these variables are rarely impossible to examine objectively (a person's political orientation might alternatively be inferred from their behaviour; and, while this still presents a minefield, a person's self-reported sexual activity might be corroborated by consulting their partner). It therefore occasionally appears in academic debates as though such defences of self-report are ritualistic rather than fully thought through. For example, in one high-profile article defending self-reports in health behaviour research, two prominent authors offered this observation:

It is virtually impossible to obtain objective measures of some health related behaviors (e.g., condom use), and for many others (e.g., exercise, physical check-up) objective measures are expensive and time consuming. (Ajzen & Fishbein, 2004, p. 432)

However, in the very same paragraph, the authors went on to argue the following:

In some behavioral domains, such as condom use (Jaccard, McDonald, Wan, Dittus, & Quinlan, 2002)…self-reports are found to be quite accurate… . (Ajzen & Fishbein, 2004, p. 432)

Note the authors' point about condom use: they simultaneously describe it as virtually impossible to measure objectively and yet amenable to quite accurate measurement by self-report. But how exactly can they know that self-reports are 'quite accurate' if it is 'virtually impossible' to produce objective measures with which they can be compared? The paper they cite to support their statement inferred accuracy of self-reported condom use from the fact that such self-reports are often highly correlated across individuals (Jaccard, McDonald, Wan, Dittus, & Quinlan, 2002). In other words, the merit of one person's self-report is to be determined by its resemblance to another person's self-report. This appears very similar to the basis on which conspiracy theorists attach credibility to claims that the Loch Ness Monster actually exists.
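The circularity of validating one self-report against another can be illustrated in the same way. In this hypothetical sketch (the model, its parameter values and the helper names `pearson` and `simulate_reports` are assumptions, not Jaccard et al.'s method or data), two reports of the same behaviour each carry only a weak trace of what actually happened but share a large common bias, such as the same pull towards socially desirable answers. The reports agree closely with each other while tracking the behaviour they are meant to measure only weakly.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate_reports(n=10_000, signal=0.3, noise_sd=0.3, seed=2):
    """Hypothetical model: each report = weak signal + shared bias + own noise."""
    rng = random.Random(seed)
    actual, report_a, report_b = [], [], []
    for _ in range(n):
        behaviour = rng.gauss(0, 1)     # what actually happened (unobserved in practice)
        shared_bias = rng.gauss(0, 1)   # distortion common to both reports
        actual.append(behaviour)
        report_a.append(signal * behaviour + shared_bias + rng.gauss(0, noise_sd))
        report_b.append(signal * behaviour + shared_bias + rng.gauss(0, noise_sd))
    agreement = pearson(report_a, report_b)  # observable: report against report
    accuracy = pearson(report_a, actual)     # what validation would really require
    return agreement, accuracy


agreement, accuracy = simulate_reports()
print(f"agreement between the two reports:  {agreement:.2f}")
print(f"agreement of a report with reality: {accuracy:.2f}")
```

With these assumed values, the expected agreement between the reports is 1.09/1.18, or about 0.92, whereas the expected correlation of either report with actual behaviour is 0.3/√1.18, or about 0.28: high consistency, weak accuracy.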

Put another way, the idea that self-reports are self-verifying exposes research to precisely those pitfalls of anecdotal positivism that the scientific method was intended to avoid. The fact that some researchers simply find it difficult to imagine other ways of examining their subject matter does not, in and of itself, make pure self-report – the kind intended to be read at face value, with no psychometric benchmarking against population norms – any more robust.

- Find the references in the book, Rethinking Psychology: Good science, bad science, pseudoscience, by Brian M. Hughes (Professor of Psychology at the National University of Ireland, Galway). This extract is reproduced by kind permission of Palgrave.