
    Problems with Saliency Maps

    Despite the popularity that saliency models have gained in the computer vision community, they are most often conceived, exploited and benchmarked without taking heed of a number of problems and subtle issues they bring about. When saliency maps are used as proxies for the likelihood of fixating a location in a viewed scene, one such issue is the temporal dimension of visual attention deployment. Through a simple simulation, it is shown how neglecting this dimension leads to results that at best cast doubt on a model's predictive performance and its assessment via benchmarking procedures.
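
    A minimal sketch of the kind of simulation the abstract alludes to (the map sizes, drift schedule and AUC scorer below are illustrative assumptions, not the author's code): fixations are drawn from a spatial density that drifts over time, and a single static saliency map is scored against each time slice. The per-slice score degrades as attention drifts, which a time-blind benchmark would average away.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    H = W = 64
    yy, xx = np.mgrid[0:H, 0:W]

    def gaussian_map(cy, cx, sigma=8.0):
        g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
        return g / g.sum()

    def auc(sal, fix_idx, n_neg=2000):
        """Mann-Whitney AUC of saliency at fixated vs. random pixels."""
        pos = sal.ravel()[fix_idx]
        neg = sal.ravel()[rng.integers(0, H * W, n_neg)]
        scores = np.r_[pos, neg]
        ranks = np.empty(scores.size)
        ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
        u = ranks[: pos.size].sum() - pos.size * (pos.size + 1) / 2
        return u / (pos.size * neg.size)

    static_sal = gaussian_map(32, 32, sigma=12)    # one static saliency map
    for t, cx in enumerate([32, 40, 48, 56]):      # true fixation density drifts over time
        fix = rng.choice(H * W, size=200, p=gaussian_map(32, cx).ravel())
        print(f"time slice {t}: AUC = {auc(static_sal, fix):.3f}")
    ```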

    Navigating the narrative: An eye-tracking study of readers' strategies when reading comic page layouts

    Acknowledgments: The authors wish to acknowledge the work of Elliot Balson, Yiannis Giagis, Rossi Gifford, Damon Herd, Cletus Jacobs, Norrie Millar, Gary Walsh and Letty Wilson as the artists and writers of the comics used in this study. Research funding: Economic and Social Research Council, grant number ES/M007081/1.

    How to look next? A data-driven approach for scanpath prediction

    By and large, current visual attention models mostly rely, when considering static stimuli, on the following procedure. Given an image, a saliency map is computed, which, in turn, might serve the purpose of predicting a sequence of gaze shifts, namely a scanpath instantiating the dynamics of visual attention deployment. The temporal pattern of attention unfolding is thus confined to the scanpath generation stage, whilst salience is conceived as a static map, at best conflating a number of factors (bottom-up information, top-down cues, spatial biases, etc.). In this note we propose a novel sequential scheme consisting of three processing stages that rely on a center-bias model, a context/layout model, and an object-based model, respectively. Each stage contributes, at different times, to the sequential sampling of the final scanpath. We compare the method against classic scanpath generation that exploits a state-of-the-art static saliency model. Results show that accounting for the structure of the temporal unfolding leads to gaze dynamics close to human gaze behaviour.
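
    A minimal sketch of the sequential scheme described above (the stand-in stage maps and the weighting schedule are assumptions, not the paper's exact models): early fixations are dominated by a centre-bias map, later ones by context and object maps, and each fixation is sampled from the time-weighted mixture.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    H = W = 64
    yy, xx = np.mgrid[0:H, 0:W]

    def norm(m):
        return m / m.sum()

    # Stage maps: a real system would plug in learned models here.
    center_bias = norm(np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 12 ** 2)))
    context = norm(rng.random((H, W)))        # stand-in for a context/layout model
    objects = norm(rng.random((H, W)) ** 4)   # stand-in for a sparse object-based model

    def stage_weights(t, n):
        """Quadratic Bezier schedule: centre bias fades, object salience grows."""
        u = t / max(n - 1, 1)
        return np.array([(1 - u) ** 2, 2 * u * (1 - u), u ** 2])

    def sample_scanpath(n_fix=8):
        path = []
        for t in range(n_fix):
            w = stage_weights(t, n_fix)
            mix = norm(w[0] * center_bias + w[1] * context + w[2] * objects)
            idx = rng.choice(H * W, p=mix.ravel())
            path.append(divmod(idx, W))       # (row, col) of the sampled fixation
        return path

    print(sample_scanpath())
    ```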

    Measures and Limits of Models of Fixation Selection

    Models of fixation selection are a central tool in the quest to understand how the human mind selects relevant information. Using this tool in the evaluation of competing claims often requires comparing different models' relative performance in predicting eye movements. However, studies use a wide variety of performance measures with markedly different properties, which makes comparison difficult. We make three main contributions to this line of research. First, we argue for a set of desirable properties, review commonly used measures, and conclude that no single measure unites all desirable properties. However, the area under the ROC curve (a classification measure) and the KL-divergence (a distance measure between probability distributions) combine many desirable properties and allow a meaningful comparison of critical model performance. We give an analytical proof of the linearity of the ROC measure with respect to averaging over subjects and demonstrate an appropriate correction of entropy-based measures like KL-divergence for small sample sizes in the context of eye-tracking data. Second, we provide a lower bound and an upper bound of these measures, based on image-independent properties of fixation data and between-subject consistency, respectively. Based on these bounds it is possible to give a reference frame to judge the predictive power of a model of fixation selection. We provide open-source Python code to compute the reference frame. Third, we show that the upper, between-subject consistency bound holds only for models that predict averages of subject populations. Departing from this, we show that incorporating subject-specific viewing behavior can generate predictions that surpass that upper bound. Taken together, these findings lay out the information required for a well-founded judgment of the quality of any model of fixation selection, and should therefore be reported when a new model is introduced.
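
    A hedged sketch of the two recommended measures (the paper derives a principled small-sample correction; the pseudo-count smoothing below is a common stand-in, and all names are illustrative): AUC treats the saliency map as a classifier separating fixated from non-fixated pixels, while KL-divergence compares the empirical fixation density with the model's predicted density.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    def auc_fixations(sal, fix_idx, n_neg=5000):
        """Mann-Whitney AUC: saliency at fixated pixels vs. randomly sampled pixels."""
        pos = sal.ravel()[fix_idx]
        neg = sal.ravel()[rng.integers(0, sal.size, n_neg)]
        scores = np.r_[pos, neg]
        ranks = np.empty(scores.size)
        ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
        u = ranks[: pos.size].sum() - pos.size * (pos.size + 1) / 2
        return u / (pos.size * neg.size)

    def kl_model_vs_fixations(model_map, fix_idx, eps=1.0):
        """KL(empirical || model); eps adds pseudo-counts to empty histogram bins."""
        counts = np.bincount(fix_idx, minlength=model_map.size).astype(float) + eps
        p = counts / counts.sum()                 # smoothed empirical fixation density
        q = model_map.ravel() / model_map.sum()   # model's predicted density
        return float(np.sum(p * np.log(p / q)))

    sal = rng.random((64, 64))                    # toy saliency map
    fix = rng.integers(0, sal.size, 200)          # toy flat fixation indices
    print(auc_fixations(sal, fix), kl_model_vs_fixations(sal, fix))
    ```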

    Individual differences in infant oculomotor behavior during the viewing of complex naturalistic scenes

    Little research hitherto has examined how individual differences in attention, as assessed using standard experimental paradigms, relate to individual differences in how attention is spontaneously allocated in more naturalistic contexts. Here, we analyzed the time intervals between refoveating eye movements (fixation durations) while typically developing 11-month-old infants viewed a 90-min battery ranging from complex dynamic to noncomplex static materials. The same infants also completed experimental assessments of cognitive control, psychomotor reaction times (RT), processing speed (indexed via peak look during habituation), and arousal (indexed via tonic pupil size). High test–retest reliability was found for fixation duration, across testing sessions and across types of viewing material. Increased cognitive control and increased arousal were associated with reduced variability in fixation duration. For fixations to dynamic stimuli, in which a large proportion of saccades may be exogenously cued, we found that psychomotor RT measures were most predictive of mean fixation duration; for fixations to static stimuli, in contrast, in which there is less exogenous attentional capture, we found that psychomotor RT did not predict performance, but that measures of cognitive control and arousal did. The implications of these findings for understanding the development of attentional control in naturalistic settings are discussed.

    Influence of Low-Level Stimulus Features, Task Dependent Factors, and Spatial Biases on Overt Visual Attention

    Visual attention is thought to be driven by the interplay between low-level visual features and the task-dependent information content of local image regions, as well as by spatial viewing biases. Though dependent on experimental paradigms and model assumptions, this idea has given rise to varying claims that either bottom-up or top-down mechanisms dominate visual attention. To contribute toward a resolution of this discussion, here we quantify the influence of these factors and their relative importance in a set of classification tasks. Our stimuli consist of individual image patches (bubbles). For each bubble we derive three measures: a measure of salience based on low-level stimulus features, a measure of salience based on the task-dependent information content derived from our subjects' classification responses, and a measure of salience based on spatial viewing biases. Furthermore, we measure the empirical salience of each bubble based on our subjects' measured eye gaze, thus characterizing the overt visual attention each bubble receives. A multivariate linear model relates the three salience measures to overt visual attention. It reveals that all three salience measures contribute significantly. The effect of spatial viewing biases is highest and rather constant across tasks. The contribution of task-dependent information is a close runner-up; specifically, it scores highly in a standardized task of judging facial expressions. The contribution of low-level features is, on average, somewhat lower. However, in a prototypical search task without an available template, it makes a strong contribution on par with the other two measures. Finally, the contributions of the three factors are only slightly redundant, and the semi-partial correlation coefficients are only slightly lower than the coefficients for full correlations. These data provide evidence that all three measures make significant and independent contributions and that none can be neglected in a model of human overt visual attention.
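
    The following sketch illustrates the style of analysis described above (data and variable names are made up): empirical salience is regressed on the three salience measures, and each predictor's unique contribution is gauged with a semi-partial correlation, i.e. the correlation between the outcome and the part of a predictor not explained by the other two.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n = 500
    low_level = rng.normal(size=n)
    task_info = rng.normal(size=n)
    spatial_bias = rng.normal(size=n)
    # Simulated empirical salience with contributions from all three factors
    empirical = (0.3 * low_level + 0.4 * task_info + 0.5 * spatial_bias
                 + rng.normal(scale=0.5, size=n))

    X = np.column_stack([low_level, task_info, spatial_bias])

    def semi_partial_r(X, y, j):
        """Correlate y with the part of predictor j not explained by the others."""
        others = np.delete(X, j, axis=1)
        A = np.column_stack([others, np.ones(len(y))])
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        return np.corrcoef(resid, y)[0, 1]

    for j, name in enumerate(["low-level", "task", "spatial bias"]):
        print(f"{name}: semi-partial r = {semi_partial_r(X, empirical, j):.3f}")
    ```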

    Effects of scene properties and emotional valence on brain activations: a fixation-related fMRI study

    Temporal and spatial characteristics of fixations are affected by image properties, including high-level scene characteristics, such as object-background composition, and low-level physical characteristics, such as image clarity. The influence of these factors is modulated by the emotional content of an image. Here, we aimed to establish whether brain correlates of fixations reflect these modulatory effects. To this end, we simultaneously scanned participants and measured their eye movements while presenting negative and neutral images in various image-clarity conditions, with controlled object-background composition. The fMRI data were analyzed using a novel fixation-based event-related (FIBER) method, which allows the tracking of brain activity linked to individual fixations. The results revealed that fixating an emotional object was linked to greater deactivation in the right lingual gyrus than fixating the background of an emotional image, while no difference between object and background was found for neutral images. We suggest that deactivation in the lingual gyrus might be linked to inhibition of saccade execution. This was supported by fixation duration results, which showed that in the negative condition, fixations falling on the object were longer than those falling on the background. Furthermore, an increase in image clarity was correlated with fixation-related activity within the lateral occipital complex, a structure linked to object recognition. This correlation was significantly stronger for negative images, presumably due to greater deployment of attention towards emotional objects. Our eye-tracking results are in line with these observations, showing that the chance of fixating an object rose faster for negative images than for neutral ones as the level of noise decreased. Overall, our study demonstrated that the emotional value of an image changes the way that low- and high-level scene properties affect the characteristics of fixations. Fixation-related brain activity is affected by low-level scene properties, and this impact differs between negative and neutral images. High-level scene properties also affect brain correlates of fixations, but only in the case of negative images.
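
    A rough sketch of the core idea behind fixation-based event-related analysis (the double-gamma HRF and GLM setup here are generic fMRI conventions, not the paper's exact FIBER implementation; onsets and timings are made up): fixation onsets become events, each convolved with a haemodynamic response to form one regressor per condition.

    ```python
    import numpy as np
    from scipy.stats import gamma

    TR = 2.0                              # repetition time in seconds (assumed)
    n_scans = 200
    kernel_t = np.arange(0, 30, TR)       # timebase of the HRF kernel

    def hrf(t):
        """Simple double-gamma haemodynamic response function."""
        return gamma.pdf(t, 6.0) - gamma.pdf(t, 16.0) / 6.0

    def fixation_regressor(onsets_s, n_scans, TR):
        """Stick function at fixation onsets, convolved with the HRF."""
        sticks = np.zeros(n_scans)
        for onset in onsets_s:
            idx = int(round(onset / TR))
            if idx < n_scans:
                sticks[idx] += 1.0
        return np.convolve(sticks, hrf(kernel_t), mode="full")[:n_scans]

    # Hypothetical onsets (seconds) for fixations on objects vs. background
    object_fix = fixation_regressor([12.3, 14.1, 55.0, 80.7], n_scans, TR)
    background_fix = fixation_regressor([20.5, 33.2, 61.8], n_scans, TR)
    design = np.column_stack([object_fix, background_fix, np.ones(n_scans)])
    ```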

    Object Detection Through Exploration With A Foveated Visual Field

    We present a foveated object detector (FOD) as a biologically inspired alternative to the sliding window (SW) approach, which is the dominant method of search in computer vision object detection. Similar to the human visual system, the FOD has higher resolution at the fovea and lower resolution in the visual periphery. Consequently, more computational resources are allocated at the fovea and relatively fewer at the periphery. The FOD processes the entire scene, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image, and integrates observations across multiple fixations. Our approach combines modern object detectors from computer vision with a recent model of the peripheral pooling regions found in the V1 layer of the human visual system. We assessed various eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD performs on par with the SW detector while bringing significant computational cost savings. Comment: an extended version of this manuscript was published in PLOS Computational Biology (October 2017) at https://doi.org/10.1371/journal.pcbi.100574
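
    A toy illustration of the foveated principle (not the paper's V1 pooling model; the linear eccentricity scaling is an assumption): feature responses are pooled over windows whose size grows with distance from the fovea, so central detail is preserved while the periphery is summarised coarsely, and cheaply.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    H = W = 128
    feature_map = rng.random((H, W))      # stand-in for a feature/response map

    def foveated_pool(feat, fy, fx, slope=0.25):
        """Average-pool each location over a window that widens with eccentricity."""
        out = np.empty_like(feat)
        for y in range(feat.shape[0]):
            for x in range(feat.shape[1]):
                ecc = np.hypot(y - fy, x - fx)
                r = int(slope * ecc) // 2 + 1   # pooling half-width grows with eccentricity
                out[y, x] = feat[max(0, y - r): y + r + 1,
                                 max(0, x - r): x + r + 1].mean()
        return out

    pooled = foveated_pool(feature_map, fy=64, fx=64)  # fovea aligned with image centre
    ```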

    Markov models for ocular fixation locations in the presence and absence of colour

    In response to the 2015 Royal Statistical Society's statistical analytics challenge, we propose to model the fixation locations of the human eye when observing a still image by a Markov point process in ℝ². Our approach is data-driven, using k-means clustering of the fixation locations to identify distinct salient regions of the image, which in turn correspond to the states of our Markov chain. Bayes factors are computed as the model selection criterion to determine the number of clusters. Furthermore, we demonstrate that the behaviour of the human eye differs from this model when colour information is removed from the given image. This work was supported by UK Engineering and Physical Sciences Research Council grant EP/H023348/1 for the University of Cambridge Centre for Doctoral Training, Cambridge Centre for Analysis.
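
    A compact sketch of the modelling pipeline described above (the cluster count is fixed by hand here, whereas the paper selects it via Bayes factors; the data are simulated): fixation locations are clustered with k-means, cluster labels become Markov states, and transition probabilities are estimated from consecutive-fixation label pairs.

    ```python
    import numpy as np
    from scipy.cluster.vq import kmeans2

    rng = np.random.default_rng(5)
    fixations = rng.random((300, 2))      # simulated (x, y) fixation locations in [0, 1]^2
    k = 4                                 # fixed by hand here, not by Bayes factors

    centroids, labels = kmeans2(fixations, k, minit="++", seed=5)

    # Consecutive fixations define transitions between the k salient-region states.
    transitions = np.zeros((k, k))
    for a, b in zip(labels[:-1], labels[1:]):
        transitions[a, b] += 1
    transitions /= np.maximum(transitions.sum(axis=1, keepdims=True), 1)
    print(np.round(transitions, 2))
    ```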

    Task-specific modulation of memory for object features in natural scenes

    The influence of visual tasks on short- and long-term memory for visual features was investigated using a change-detection paradigm. Subjects completed two tasks: (a) describing objects in natural images, reporting a specific property of each object when a crosshair appeared above it, and (b) viewing a modified version of each scene and detecting which of the previously described objects had changed. When tested over short delays (seconds), no task effects were found. Over longer delays (minutes), we found that the describing task influenced which types of changes were detected in a variety of explicit and incidental memory experiments. Furthermore, we found surprisingly high performance in the incidental memory experiment, suggesting that simple tasks are sufficient to instill long-lasting visual memories.