Gaze Distribution Analysis and Saliency Prediction Across Age Groups
Knowledge of the human visual system helps to develop better computational
models of visual attention. State-of-the-art models have been developed to
mimic the visual attention system of young adults; however, they largely ignore
the variations that occur with age. In this paper, we investigated how visual
scene processing changes with age and we propose an age-adapted framework that
helps to develop a computational model that can predict saliency across
different age groups. Our analysis uncovers how the explorativeness of an
observer varies with age, how well saliency maps of an age group agree with
fixation points of observers from the same or different age groups, and how age
influences the center bias. We analyzed the eye movement behavior of 82
observers belonging to four age groups while they explored visual scenes.
Explorativeness was quantified in terms of the entropy of a saliency map, and
area under the curve (AUC) metrics were used to quantify both the agreement
and the center bias. These results were used to develop age-adapted saliency
models. Our results suggest that the proposed age-adapted saliency model
outperforms existing saliency models in predicting the regions of interest
across age groups.
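The two measurements named in this abstract can be illustrated with a minimal sketch. It assumes a saliency map stored as a list of rows of non-negative numbers and fixations given as (row, column) pairs. The function names `saliency_entropy` and `fixation_auc` are hypothetical, and the AUC shown is the generic rank-based formulation, which may differ from the specific AUC variant the authors used.

```python
import math


def saliency_entropy(saliency):
    """Shannon entropy in bits of a saliency map normalised to sum to 1.

    A flatter map (higher entropy) is read here as more exploratory viewing.
    """
    values = [v for row in saliency for v in row]
    total = sum(values)
    if total <= 0:
        raise ValueError("saliency map must contain positive mass")
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def fixation_auc(saliency, fixations):
    """Rank-based AUC: chance that a fixated pixel outscores a non-fixated one.

    Ties count as half a win, so a constant map scores 0.5 (chance level).
    """
    fixated = set(fixations)
    height, width = len(saliency), len(saliency[0])
    pos = [saliency[r][c] for (r, c) in fixated]
    neg = [
        saliency[r][c]
        for r in range(height)
        for c in range(width)
        if (r, c) not in fixated
    ]
    if not pos or not neg:
        raise ValueError("need at least one fixated and one non-fixated pixel")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Under these assumptions, scoring one age group's saliency map against another group's fixations with `fixation_auc` gives a cross-group agreement score, and scoring a fixed center-weighted map against fixations gives a simple estimate of center bias.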
Digging Deeper into Egocentric Gaze Prediction
This paper digs deeper into factors that influence egocentric gaze. Instead
of training deep models for this purpose in a blind manner, we propose to
inspect factors that contribute to gaze guidance during daily tasks. Bottom-up
saliency and optical flow are assessed versus strong spatial prior baselines.
Task-specific cues such as vanishing point, manipulation point, and hand
regions are analyzed as representatives of top-down information. We also look
into the contribution of these factors by investigating a simple recurrent
neural model for ego-centric gaze prediction. First, deep features are
extracted for all input video frames. Then, a gated recurrent unit is employed
to integrate information over time and to predict the next fixation. We also
propose an integrated model that combines the recurrent model with several
top-down and bottom-up cues. Extensive experiments over multiple datasets
reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up
saliency models perform poorly in predicting gaze and underperform spatial
biases, (3) deep features perform better compared to traditional features, (4)
as opposed to hand regions, the manipulation point is a strong influential cue
for gaze prediction, (5) combining the proposed recurrent model with bottom-up
cues, vanishing points and, in particular, manipulation point results in the
best gaze prediction accuracy over egocentric videos, (6) the knowledge
transfer works best for cases where the tasks or sequences are similar, and (7)
task and activity recognition can benefit from gaze prediction. Our findings
suggest that (1) there should be more emphasis on hand-object interaction and
(2) the egocentric vision community should consider larger datasets including
diverse stimuli and more subjects. Comment: presented at WACV 201
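The recurrent step described in this abstract (per-frame features integrated over time by a gated recurrent unit, then a gaze prediction) can be sketched as follows. This is an illustration only, not the authors' model: the class name `TinyGRU`, the random untrained weights, the omission of bias terms, and the sigmoid readout to normalised (x, y) coordinates are all assumptions, and the per-frame features are assumed to be precomputed vectors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """Hypothetical single-layer GRU without biases, with untrained weights."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Input (W) and recurrent (U) weights for the update gate z,
        # the reset gate r, and the candidate state n.
        self.W = {g: rand_matrix(hidden_dim, feat_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        # Assumed readout: hidden state -> two gaze coordinates.
        self.readout = rand_matrix(2, hidden_dim, rng)

    def step(self, x, h):
        """One GRU update: blend the previous state h with a candidate state."""
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b)
             for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def predict_gaze(self, frame_features):
        """Integrate features over time, then read out a normalised (x, y)."""
        h = [0.0] * self.hidden_dim
        for x in frame_features:
            h = self.step(x, h)
        gx, gy = (sigmoid(v) for v in matvec(self.readout, h))
        return gx, gy
```

A call such as `TinyGRU(3, 4).predict_gaze([[0.1, 0.2, 0.3], [0.0, 0.5, -0.2]])` returns a point in the unit square. With trained weights, that point would be interpreted as the predicted next fixation in normalised image coordinates.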
Objects predict fixations better than early saliency
Humans move their eyes while looking at scenes and pictures. Eye movements correlate with shifts in attention and are thought to be a consequence of optimal resource allocation for high-level tasks such as visual recognition. Models of attention, such as “saliency maps,” are often built on the assumption that “early” features (color, contrast, orientation, motion, and so forth) drive attention directly. We explore an alternative hypothesis: Observers attend to “interesting” objects. To test this hypothesis, we measure the eye position of human observers while they inspect photographs of common natural
scenes. Our observers perform different tasks: artistic evaluation, analysis of content, and search. Immediately after each presentation, our observers are asked to name objects they saw. Weighted with recall frequency, these objects predict fixations in individual images better than early saliency, irrespective of task. Also, saliency combined with object positions predicts which objects are frequently named. This suggests that early saliency has only an indirect effect on attention, acting
through recognized objects. Consequently, rather than treating attention as a mere preprocessing step for object recognition, models of both need to be integrated.
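The idea of objects "weighted with recall frequency" can be made concrete with a small sketch. It assumes each named object is described by a recall frequency and a rectangular bounding box; the abstract does not say how object regions were represented, so the boxes and the function name `object_map` are illustrative assumptions.

```python
def object_map(height, width, objects):
    """Build a fixation-prediction map from recalled objects.

    objects: list of (recall_frequency, (top, left, bottom, right)) entries,
    with bottom and right exclusive. Each object adds its recall frequency
    to every pixel it covers; the map is then normalised to sum to 1.
    """
    grid = [[0.0] * width for _ in range(height)]
    for freq, (top, left, bottom, right) in objects:
        for r in range(max(top, 0), min(bottom, height)):
            for c in range(max(left, 0), min(right, width)):
                grid[r][c] += freq
    total = sum(sum(row) for row in grid)
    if total > 0:
        grid = [[v / total for v in row] for row in grid]
    return grid
```

A map built this way can be scored against recorded fixations with the same metric used for an early-saliency map, which is the kind of comparison the abstract reports.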
Personalization of Saliency Estimation
Most existing saliency models use low-level features or task descriptions
when generating attention predictions. However, the link between observer
characteristics and gaze patterns is rarely investigated. We present a novel
saliency prediction technique which takes viewers' identities and personal
traits into consideration when modeling human attention. Instead of only
computing image salience for average observers, we consider the interpersonal
variation in the viewing behaviors of observers with different personal traits
and backgrounds. We present an enriched derivative of the generative adversarial network (GAN), which is
able to generate personalized saliency predictions when fed with image stimuli
and specific information about the observer. Our model contains a generator
which generates grayscale saliency heat maps based on the image and an observer
label. The generator is paired with an adversarial discriminator which learns
to distinguish generated salience from ground truth salience. The discriminator
also has the observer label as an input, which contributes to the
personalization ability of our approach. We evaluate the performance of our
personalized salience model by comparison with a benchmark model along with
other un-personalized predictions, and illustrate improvements in prediction
accuracy for all tested observer groups.
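One common way to feed an observer label to both a generator and a discriminator is to append it to the input as constant one-hot planes. The sketch below shows only that input construction. The abstract does not state how the label is encoded, so the one-hot planes and the names `one_hot`, `condition_on_observer`, and `discriminator_input` are assumptions, not the authors' design.

```python
def one_hot(label, num_observers):
    """Encode an observer index as a one-hot vector."""
    if not 0 <= label < num_observers:
        raise ValueError("observer label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_observers)]


def condition_on_observer(channels, label, num_observers):
    """Append one constant plane per observer to a list of H x W channels.

    The plane for the given observer is all ones; the others are all zeros.
    """
    height, width = len(channels[0]), len(channels[0][0])
    label_planes = [
        [[bit] * width for _ in range(height)]
        for bit in one_hot(label, num_observers)
    ]
    return channels + label_planes


def discriminator_input(image_channels, saliency_map, label, num_observers):
    """Stack image, saliency map (real or generated), and label planes."""
    return condition_on_observer(
        image_channels + [saliency_map], label, num_observers
    )
```

In this arrangement the generator would receive `condition_on_observer(image_channels, label, num_observers)`, while the discriminator would receive the output of `discriminator_input`, so both networks see which observer a saliency map is meant to match.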