
    Fixation prediction with a combined model of bottom-up saliency and vanishing point

    By predicting where humans look in natural scenes, we can understand how they perceive complex natural scenes and prioritize information for further high-level visual processing. Several models have been proposed for this purpose, yet there is a gap between the best existing saliency models and human performance. While many researchers have developed purely computational models for fixation prediction, fewer attempts have been made to discover cognitive factors that guide gaze. Here, we study the effect of a particular type of scene structural information, known as the vanishing point, and show that human gaze is attracted to the vanishing point regions. We record eye movements of 10 observers over 532 images, out of which 319 have vanishing points. We then construct a combined model of traditional saliency and a vanishing point channel and show that our model outperforms state-of-the-art saliency models using three scores on our dataset.
    Comment: arXiv admin note: text overlap with arXiv:1512.0172

    Digging Deeper into Egocentric Gaze Prediction

    This paper digs deeper into factors that influence egocentric gaze. Instead of training deep models for this purpose in a blind manner, we propose to inspect factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed versus strong spatial prior baselines. Task-specific cues such as vanishing point, manipulation point, and hand regions are analyzed as representatives of top-down information. We also look into the contribution of these factors by investigating a simple recurrent neural model for egocentric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We also propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better than traditional features, (4) as opposed to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points and, in particular, the manipulation point results in the best gaze prediction accuracy over egocentric videos, (6) knowledge transfer works best for cases where the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) there should be more emphasis on hand-object interaction and (2) the egocentric vision community should consider larger datasets including diverse stimuli and more subjects.
    Comment: presented at WACV 201

    Visual attention in the real world

    Humans typically direct their gaze and attention at locations important for the tasks they are engaged in. By measuring the direction of gaze, the relative importance of each location can be estimated, which can reveal how cognitive processes choose where gaze is to be directed. For decades, this has been done in laboratory setups, which have the advantage of being well-controlled. Here, visual attention is studied in more life-like situations, which allows testing the ecological validity of laboratory results and allows the use of real-life setups that are hard to mimic in a laboratory. All four studies in this thesis contribute to our understanding of visual attention and perception in more complex situations than are found in traditional laboratory experiments. Bottom-up models of attention use the visual input to predict attention or even the direction of gaze. In such models the input image is first analyzed for each of several features. In the classic Saliency Map model, these features are color contrast, luminance contrast and orientation contrast. The “interestingness” of each location in the image is represented in ‘conspicuity maps’, one for each feature. The Saliency Map model then combines these conspicuity maps by linear addition, but this additivity has recently been challenged. The alternative is to use the maxima across all conspicuity maps. In the first study, the features color contrast and luminance contrast were manipulated in photographs of natural scenes to test which of these mechanisms is the best predictor of human behavior. It was shown that a linear addition, as in the original model, matches human behavior best. As all the assumptions of the Saliency Map model on the processes preceding the linear addition of the conspicuity maps are based on physiological research, this result constrains future models in their mechanistic assumptions.
If models of visual attention are to have ecological validity, comparing visual attention in laboratory and real-world conditions is necessary, and this is done in the second study. In the first condition, eye movements and head-centered, first-person perspective movies were recorded while participants explored 15 real-world environments (“free exploration”). Clips from these movies were shown to participants in two laboratory tasks. First, the movies were replayed as they were recorded (“video replay”), and second, a shuffled selection of frames was shown for 1 second each (“1s frame replay”). Eye-movement recordings from all three conditions revealed that, in comparison to 1s frame replay, the video replay condition was qualitatively more similar to the free exploration condition with respect to the distribution of gaze and the relationship between gaze and model saliency, and was quantitatively better able to predict free exploration gaze. Furthermore, the onset of a new frame in 1s frame replay evoked a reorientation of gaze towards the center. That is, the event of presenting a stimulus in a laboratory setup affects attention in a way unlikely to occur in real life. In conclusion, video replay is a better model for real-world visual input. The hypothesis that walking on more irregular terrain requires more visual attention to be directed at the path was tested on a local street (“Hirschberg”) in the third study. Participants walked on both sides of this inclined street: a cobbled road and the immediately adjacent, irregular steps. The environment and instructions were kept constant. Gaze was directed at the path more when participants walked on the steps as compared to the road. This was accomplished by pointing both the head and the eyes lower on the steps than on the road, while only eye-in-head orientation was spread out more along the vertical on the steps, indicating more or larger eye movements on the more irregular steps.
These results confirm earlier findings that eye and head movements play distinct roles in directing gaze in real-world situations. Furthermore, they show that implicit tasks (not falling, in this case) affect visual attention as much as explicit tasks do. The last study asks whether actions affect perception. An ambiguous stimulus that is alternately perceived as rotating clockwise or counterclockwise (the ‘percept’) was used. When participants had to rotate a manipulandum continuously in a pre-defined direction (either clockwise or counterclockwise) and reported their concurrent percept with a keyboard, percepts were not affected by movements. If participants had to use the manipulandum to indicate their percept, by rotating either congruently or incongruently with the percept, the movements did affect perception. This shows that ambiguity in visual input is resolved by relying on motor signals, but only when they are relevant for the task at hand. Either by using natural stimuli, by comparing behavior in the laboratory with behavior in the real world, by performing an experiment on the street, or by testing how two diverse but everyday sources of information are integrated, the faculty of vision was studied in more life-like situations. The validity of some laboratory work has been examined and confirmed, and some first steps in doing experiments in real-world situations have been made. Both seem to be promising approaches for future research.
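The two combination rules contrasted in the first study of this thesis (linear addition of conspicuity maps versus taking the maximum across them) can be illustrated with a minimal sketch. The function names and the toy 2x3 maps below are assumptions made for illustration only; they are not the thesis's implementation or data, and the real model includes processing stages not shown here.

```python
# Illustrative sketch only: toy conspicuity maps, hypothetical helper names.

def combine_linear(maps):
    """Add the conspicuity maps location by location (linear addition)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(cols)]
            for r in range(rows)]

def combine_max(maps):
    """Take the maximum across conspicuity maps at each location."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[max(m[r][c] for m in maps) for c in range(cols)]
            for r in range(rows)]

def argmax_location(saliency):
    """Return (row, col) of the most salient location (first one on ties)."""
    best_value, best_loc = None, None
    for r, row in enumerate(saliency):
        for c, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_loc = value, (r, c)
    return best_loc

# Toy maps: location (0, 0) is moderately conspicuous in two features,
# location (0, 2) is strongly conspicuous in one feature only.
color       = [[0.5, 0.0, 0.8], [0.0, 0.0, 0.0]]
luminance   = [[0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]
orientation = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
maps = [color, luminance, orientation]

linear_peak = argmax_location(combine_linear(maps))  # favors (0, 0)
max_peak = argmax_location(combine_max(maps))        # favors (0, 2)
```

In this toy case the two rules predict different most-salient locations, which is the kind of disagreement that manipulating color and luminance contrast independently makes it possible to test against human gaze.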

    Eye tracking and visual arts. Introduction to the special thematic issue

    There is no visual art without the eye, just as there is no music without the ear. Visual art does not happen in the eye, but it has to go through the eye. Even for artworks with little visual focus, as in Conceptual Art, we need eyes to create and receive them. In order to see we need to move our eyes. It is therefore not surprising that, for centuries, the eye and its movements have been a major topic of literature on art. It is equally unsurprising that, along with recent technological improvements in eye tracking, this technology has become widely used for studying visual arts. This special issue of the Journal of Eye Movement Research is the first platform that provides a broad picture of recent developments in this area. In this introduction we present a history of eye movement in art literature, followed by a sketch of some of the oculometric parameters used for studies of visual art. In the third section we showcase each contribution to this special issue.

    Perceptual modelling for 2D and 3D

    Deliverable D1.1 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D1.1 of the project.