Visual attention – a fresh look
John M. Findlay and Iain D. Gilchrist describe their efforts to ensure eye movements play a central role in our understanding of ‘active vision’.
12 December 2012
Our understanding of vision has a firm biological grounding: adding in attention seeks to extend this understanding to include higher cognitive processes. But thinking of visual attention as solely brain-based neglects our roving eyes. Eye movements are arguably our most common behavioural act, and for each gaze shift the brain selects a new location to which the eyes are directed. Here, we attempt to put eye movements back at the centre of a theory of sensorial attention.
Sensorial attention
Many textbooks introduce the topic of attention with the quote from William James (1890):
Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. (pp.403–404)
Rather fewer explore further James's ideas on attention, such as the distinction he makes between intellectual attention and sensorial attention. Intellectual attention, as the quotation shows, allows you to think about why William James's brother was famous and then what you might choose for lunch today, but prevents these thoughts occurring at the same time and interfering. Sensorial attention explains why the 'millions of items of the outward order are present to my senses which never properly enter into my experience' (p.402). For visual attention, the focus of this article, the problem is well known to us all: for example, when faced with a crowd of people at an airport arrival gate, how do we select the friend we are meeting?
In the mid-1990s the two of us began collaborating, and we gradually realised that existing work on visual attention, whilst impressive, was also limited and in some respects misguided. Visual attention was conceived of as a mental process which operated on the retinal image. Michael Posner conceptualised a process sometimes termed a mental spotlight, which could pick out part of the retinal information (e.g. Posner, 1980), and Anne Treisman suggested a role that such a spotlight could play in visual search (e.g. Treisman, 1988). Almost no reference was made to the fact that the eye itself is mobile and indeed, under many normal circumstances of viewing, jumps around three or four times a second.
The oculomotor system is an exquisite example of neuro-engineering that enables the eyes to move almost instantaneously from one location to another in a jump-like way, known as a saccade, and then remain stable for a fraction of a second at the new location in a fixation. Normal vision consists of a continuous rapid sequence of fixation–saccade–fixation, and so on. The retinal fovea is the location of the highest visual resolution and visual ability drops off dramatically away from the fovea. The mobility of the eye allows the fovea to be continually redirected to sample new locations in the environment. We felt that this process of pointing the fovea at regions of interest was the primary mechanism by which James's millions of items presented to the visual sense never properly enter into experience.
Why were eye movements ignored in the then-dominant traditions? In part, this can be attributed to the seductive but slippery apparent correspondence between the camera-like retinal image and the perceptual experience of having a picture in the head. However, more specific factors were associated with the dominant work on attention. Posner's paradigm required the eyes to be kept still, to demonstrate that a purely mental process could operate. Treisman's analysis of search functions suggested that during visual search the spotlight moved at a rate that was much more rapid than that at which the eyes could move. Their experimental studies were interesting and important, but had segued to a viewpoint where all that went on in vision depended on subsequent mental processing of a fixed retinal image. We gradually became convinced that we could offer a better framework. One fruit of our collaboration was the book Active Vision: The Psychology of Looking and Seeing (Findlay & Gilchrist, 2003). In this article, we revisit the themes and comment briefly on developments over the past 10 years.
Active vision
The above sketch is oversimplified, since a number of workers had already argued that the mobile eye must have some connection with the mental spotlight. Notably, Giacomo Rizzolatti, now better known for his discovery of mirror neurons, had linked the perceptual and motor sides of vision by proposing a pre-motor theory of visual attention whereby the spotlight reflected the preparatory process of generating an eye movement, prior to its execution (Rizzolatti et al., 1987). His theory was given strong support by a finding of Heiner Deubel and Werner Schneider (1996). They showed that visual discrimination of material in peripheral vision is selectively enhanced at the target location of a saccade after the decision has been made to make the movement, but before the actual movement starts. We became convinced of the correctness of the pre-motor view. The mental spotlight process of attention, covert attention, should not be regarded as something that operates independently of eye movements. (Outside the laboratory, situations where covert attention is used without moving the eyes would seem to be predominantly social ones, for example, deception where actually moving the eyes would provide a cue that the individual wishes to conceal.) Instead, we argued that the study of covert visual attention should be seen as just one component, albeit an important one, of the process of active vision: seeing through looking around.
What about the apparent very rapid movements of the mental spotlight in Treisman's serial search? Jeremy Wolfe is one of the leading theorists in the area. His theory of guided visual search (Wolfe, 1994) was an important development of Anne Treisman's work in recognising that some process – perhaps degree of similarity to the target – must guide the serial deployment of attention. He supported the 'very rapid movement' view, although he noted that the movement rate seemed to be variable rather than fixed. However, attempts to demonstrate such very rapid attentional movements more directly have failed, including one by Wolfe himself (Wolfe et al., 2000), who devised an ingenious experiment requiring participants to move attention as rapidly as possible around a clock face. They could not make very rapid movements voluntarily, although Wolfe was not ready to abandon the idea of involuntary very fast movements.
In the 1990s several laboratories started measuring patterns of eye movements during visual search and showed a high correlation between the number of saccades made in search and the search time (e.g. Williams et al., 1997). Moreover, careful analysis of the data from this work failed to find a result that would be predicted if a separate, very rapid, scanning by covert attention was taking place. If that occurred, it might be expected that the duration of the fixation preceding the eye movement to the target would be reduced in comparison with other scanning fixations, since once the covert attentional scan had reached the target, the eyes would be drawn to it. Pre-target fixation durations proved to be no shorter than those made elsewhere in the scanning. We felt ready to criticise the 'standard model' of visual search. It was sometimes an uphill task, with one referee in a high-quality visual science journal telling us that eye movements were of minimal importance in visual search.
In formulating our views, we were greatly aided by the fact that one area of perceptual research had already developed through taking account of the mobility of the eyes. When reading text, the eyes make a series of saccades progressing along each line (with occasional reverse movements). In the early days of computer technology, two US workers, Keith Rayner and George McConkie, devised the highly productive gaze-contingency paradigm. This methodology involved presenting the text on a screen and manipulating it in a way that depended on the gaze direction. In this way they were able to demonstrate that information was taken in from a limited area, the perceptual span, extending (in left-to-right reading) to the right of the current fixation position. The span was a compound with detailed letter information only coming from about eight letters ahead, although some information (word boundaries, initial letters) was available from further out to the right. The information taken in from the span, as well as being used to comprehend the text, is also used to plan the next saccade. A striking finding is the tendency (not strong, but reliable) for the eyes to jump over a predictable short word. However, in general, the location where the eyes are directed is mainly determined by low-level physical features of the text such as word boundaries. In contrast, the process that determines when the eyes move is much more sensitive to high-level cognitive factors such as word frequency.
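The moving-window version of the gaze-contingency idea can be illustrated with a small Python sketch. It is an illustration only, not the procedure Rayner and McConkie used: a character index stands in for the gaze position that an eye tracker would supply, the function name `moving_window` is hypothetical, and the window extents are assumptions (eight letters to the right follows the figure quoted above; the three letters to the left is an arbitrary choice). Spaces are left visible outside the window, mimicking the word-boundary information that remains available further out.

```python
def moving_window(text, fixation, left=3, right=8, mask="x"):
    """Return text with characters outside the window replaced by mask.

    fixation is a character index into text (a stand-in for measured
    gaze position). Spaces are kept so that word boundaries stay
    visible outside the window. The default extents are assumptions.
    """
    lo = max(0, fixation - left)
    hi = min(len(text) - 1, fixation + right)
    out = []
    for i, ch in enumerate(text):
        if lo <= i <= hi or ch == " ":
            out.append(ch)
        else:
            out.append(mask)
    return "".join(out)


line = "the eyes make a series of saccades along each line"
# Redraw the line for three successive (hypothetical) fixation positions.
for fixation in (4, 16, 26):
    print(moving_window(line, fixation))
```

Shrinking `right` until reading slows down is, in spirit, how the size of the perceptual span can be estimated with this kind of display.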
Reading progresses, broadly speaking, in a one-dimensional way along the line of text. We believed that our understanding would be enhanced if we could extend the ideas to a two-dimensional case and develop a theory of how saccades and fixations were controlled in the visual search process. We used a framework termed biased competition (Desimone & Duncan, 1995), not unrelated to the work of Jeremy Wolfe, in which visual selection is the result of an array of interconnected neural networks, each having a retinotopic representation of the visual field, in which the interconnections are biased to promote features of the target being searched for. Such an interconnected set of networks is of course inspired by the known neuroanatomy of the visual system. The set of visual networks (V1, V2, etc.), retaining retinotopic mapping in each one, extends eventually through to the brain's superior colliculus, a key centre in saccade generation only two synapses away from the eye muscles. We argued that the biased competition allowed the neural activity in the superior colliculus to be considered as a salience map, a two-dimensional neural representation in which the level of activity at any point (i.e. any potential location for the direction of an eye movement) is related to the similarity at that location of the visual information to the search target. Neurophysiological work had already demonstrated support for this approach (Schall & Hanes, 1993).
It is worth clarifying what we mean by salience in this context. Salience can be used to describe the fact that in some situations a visual item will contrast strongly in a perceptual manner from the local background. Such items, to use Anne Treisman's serendipitously chosen term, pop out from their surroundings. We hadn't in fact intended such intrinsic salience, as we termed it, to be at the heart of our theory. We envisaged the salience map as simply the way in which similarity to the target was encoded. In practice, many other processes must also influence saccade generation and intrinsic salience will certainly be among them.
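A toy version of such a salience map can be written in a few lines of Python. This is a minimal sketch under strong assumptions, not the model itself: the feature names (`colour`, `shape`), the similarity measure (fraction of shared target features) and the rule that the next saccade goes to the single most active location are all simplifications introduced here, and the competitive interactions between networks are not simulated.

```python
def similarity(item, target):
    """Fraction of target features that the item shares (0.0 to 1.0)."""
    shared = sum(1 for key in target if item.get(key) == target[key])
    return shared / len(target)


def salience_map(display, target):
    """Map each (row, col) location to its similarity to the target."""
    return {loc: similarity(item, target) for loc, item in display.items()}


def next_saccade_target(salience):
    """Pick the location with the highest activity (ties: first listed)."""
    return max(salience, key=salience.get)


# Hypothetical search display: find the red T among other items.
target = {"colour": "red", "shape": "T"}
display = {
    (0, 0): {"colour": "green", "shape": "L"},
    (0, 1): {"colour": "red", "shape": "L"},
    (1, 0): {"colour": "green", "shape": "T"},
    (1, 1): {"colour": "red", "shape": "T"},
}
salience = salience_map(display, target)
print(salience)
print(next_saccade_target(salience))
```

Intrinsic salience, in the sense described above, could be added as a second term at each location; the sketch leaves it out to keep the similarity-based reading of the map explicit.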
Asking the right questions
We feel that three broad questions should be at the heart of studies of visual attention:
What visual information determines the target of the next eye movement?
What visual information determines when the eyes move?
What information is combined across eye movements to form a stable representation of the environment?
Returning to visual search, it is also important to consider how a sequence of saccades is generated that scans the array efficiently, avoiding locations that have been already viewed and finding the target in the minimum possible time.
It has been shown that such efficient scanning is lost in certain neuropsychological conditions such as visual neglect (Husain et al., 2001). We have attacked this problem in several ways. Our work, and the work of others, suggests that a number of processes are involved in delivering efficient search behaviour, including short-term memory (Gilchrist & Harvey, 2000; Körner & Gilchrist, 2007), inhibitory control (Farrell et al., 2010), and systematic or heuristic planning (Findlay & Brown, 2006). Visual search appears then to tap into a much wider pool of core cognitive processes than we previously thought.
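Why avoiding already-viewed locations matters can be seen in a simple simulation. The Python sketch below is an idealisation and not a model from the studies cited: it assumes that every fixation lands on a randomly chosen item, and it compares a searcher with perfect memory for visited locations against one with no memory at all. The function names and the display size of 12 items are hypothetical.

```python
import random


def fixations_to_find(n_items, target, remember_visited, rng):
    """Count fixations until the target location is fixated."""
    unvisited = list(range(n_items))
    count = 0
    while True:
        count += 1
        if remember_visited:
            # Perfect memory: never return to a location already viewed.
            loc = unvisited.pop(rng.randrange(len(unvisited)))
        else:
            # No memory: any location may be fixated again.
            loc = rng.randrange(n_items)
        if loc == target:
            return count


def mean_fixations(n_items, remember_visited, trials=2000, seed=0):
    """Average number of fixations over many simulated searches."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        target = rng.randrange(n_items)
        total += fixations_to_find(n_items, target, remember_visited, rng)
    return total / trials


print(mean_fixations(12, remember_visited=True))
print(mean_fixations(12, remember_visited=False))
```

Under these assumptions the searcher with perfect memory needs on average (n + 1)/2 fixations for n items, and the searcher with no memory needs n, so memory roughly halves the search. Real scanning presumably lies between these extremes.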
Moreover, our approach also offers a way to link to higher-level perceptual and cognitive processes. What is visually interesting (or salient) may vary from moment to moment and from individual to individual. In this way, the looking pattern of the eyes will reflect the interests, expectations and biases of each individual. A well-known demonstration comes from the Russian scientist Alfred Yarbus, who recorded eye scanning of a genre picture painted by Ilya Repin while observers viewed it after being asked different questions about it. Very different eye scans were produced (Yarbus, 1967).
Following the very elegant pioneering studies of Michael Land (e.g. Land et al., 1999), much work is now concerned with how eye movements are used in everyday life. Other recent developments have begun to relate active vision to social perception. It has long been known that when the eyes scan a scene containing human figures, these figures are very likely to be fixated. Recently it has been possible to show that even the very first eye movement when such a scene is presented is often directed to the human figure, indicating a rapid high-level selective process (Fletcher-Watson et al., 2008), a tendency also shown to exist in autistic individuals (Fletcher-Watson et al., 2009).
Our work has put movements of the eye at the heart of a model of sensorial attention. But what of James's other kind of attention: intellectual attention? This is still a somewhat mysterious process although an essential human characteristic. Recent work within the area of embodied cognition (e.g. Wilson, 2002) suggests that there is often no clear distinction between sensory processes and those of higher-level cognition. Whilst intellectual attention seems far removed from movements of the eyes, we are nevertheless fascinated by the possibility that the neural architecture that has evolved to achieve active vision may prove to be similar in some ways to that used to control thought processes. This is clearly an exciting and potentially fruitful area for further inquiry.
John M. Findlay is Professor Emeritus, University of Durham
Iain D. Gilchrist is Professor of Neuropsychology, University of Bristol
References
Desimone, R. & Duncan, J. (1995). Neural mechanisms of selective attention. Annual Review of Neuroscience, 18, 193–222.
Deubel, H. & Schneider, W.X. (1996). Saccade target selection and object recognition. Vision Research, 36, 1827–1837.
Farrell, S., Ludwig, C.J.H., Ellis, L.A. & Gilchrist, I.D. (2010). The influence of environmental statistics on inhibition of saccadic return. Proceedings of the National Academy of Sciences, 107, 929–934.
Findlay, J.M. & Brown, V. (2006). Eye scanning of multi-element displays. I. Scanpath planning. Vision Research, 46, 179–195.
Findlay, J.M. & Gilchrist, I.D. (2003). Active vision: The psychology of looking and seeing. Oxford: Oxford University Press.
Fletcher-Watson, S., Findlay, J.M., Leekam, S.R. & Benson, V. (2008). Rapid detection of person information in a naturalistic scene. Perception, 37, 571–583.
Fletcher-Watson, S., Leekam, S.R., Benson, V. et al. (2009). Eye movements reveal attention to social information in autistic spectrum disorder. Neuropsychologia, 47, 248–257.
Gilchrist, I.D. & Harvey, M. (2000). Refixation frequency and memory mechanisms in visual search. Current Biology, 10(19), 1209–1212.
Husain, M., Mannan, S., Hodgson, T. et al. (2001). Impaired spatial working memory across saccades contributes to abnormal search in parietal neglect. Brain, 124, 941–952.
Körner, C. & Gilchrist, I.D. (2007). Finding a new target in an old display. Psychonomic Bulletin & Review, 14, 846–851.
Land, M.F., Mennie, N. & Rusted, J. (1999). The roles of vision and eye movements in the control of activities of everyday living. Perception, 28, 1311–1328.
Posner, M.I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25.
Rizzolatti, G., Riggio, L., Dascola, I. & Umiltà, C. (1987). Reorienting attention across the horizontal and vertical meridians: evidence in favor of a premotor theory of attention. Neuropsychologia, 25, 31–40.
Schall, J.D. & Hanes, D.P. (1993). Neural basis of target selection in frontal eye field during visual search. Nature, 366, 467–469.
Treisman, A. (1988). Features and objects: The 14th Bartlett Memorial Lecture. Quarterly Journal of Experimental Psychology, 40A, 201–237.
Williams, D.E., Reingold, E.M., Moscovitch, M. & Behrmann, M. (1997). Patterns of eye movements during parallel and serial visual search tasks. Canadian Journal of Experimental Psychology, 51, 151–164.
Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review, 9, 625–636.
Wolfe, J.M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.
Wolfe, J.M., Alvarez, G.A. & Horowitz, T.S. (2000). Attention is fast but volition is slow. Nature, 406, 691.
Yarbus, A.L. (1967). Eye movements and vision. (English translation, L.A. Riggs, Ed.). New York: Plenum Press.