Fixation prediction with a combined model of bottom-up saliency and vanishing point
By predicting where humans look in natural scenes, we can understand how they
perceive complex natural scenes and prioritize information for further
high-level visual processing. Several models have been proposed for this
purpose, yet there is a gap between the best existing saliency models and human
performance. While many researchers have developed purely computational models
for fixation prediction, fewer attempts have been made to discover cognitive
factors that guide gaze. Here, we study the effect of a particular type of
scene structural information, known as the vanishing point, and show that human
gaze is attracted to the vanishing point regions. We record eye movements of 10
observers over 532 images, out of which 319 have vanishing points. We then
construct a combined model of traditional saliency and a vanishing point
channel and show that our model outperforms state-of-the-art saliency models
using three scores on our dataset.
Comment: arXiv admin note: text overlap with arXiv:1512.0172
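The abstract above does not say how the vanishing point channel is built or fused with the saliency map. As a rough sketch only, the following assumes a Gaussian bump centred on a known vanishing point and a convex combination of the two normalized maps. The function names, `sigma`, and `weight` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def vanishing_point_channel(shape, vp_xy, sigma):
    """Gaussian bump centred on an assumed vanishing point (x, y), in pixels."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - vp_xy[0]) ** 2 + (ys - vp_xy[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = m.astype(float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def combine(saliency, vp_map, weight=0.5):
    """Convex combination of the two normalized channels (weight is hypothetical)."""
    return (1.0 - weight) * normalize(saliency) + weight * normalize(vp_map)

# Stand-in data: a random map plays the role of a bottom-up saliency map.
saliency = np.random.default_rng(0).random((48, 64))
vp_map = vanishing_point_channel(saliency.shape, vp_xy=(40, 20), sigma=8.0)
combined = combine(saliency, vp_map, weight=0.5)
```

In practice the vanishing point location would come from a detector and the weight would be tuned on held-out fixation data; both are outside the scope of this sketch.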
Perception of Motion and Architectural Form: Computational Relationships between Optical Flow and Perspective
Perceptual geometry refers to the interdisciplinary research whose objective
is to study geometry from the perspective of visual perception and, in
turn, to apply such geometric findings to the ecological study of vision.
Perceptual geometry attempts to answer fundamental questions in perception of
form and representation of space through synthesis of cognitive and biological
theories of visual perception with geometric theories of the physical world.
The perception of form, space and motion is among the fundamental problems in vision
science. In cognitive and computational models of human perception, the
theories for modeling motion are treated separately from models for perception
of form.
Comment: 10 pages, 13 figures, submitted and accepted in DoCEIS'2012
Conference: http://www.uninova.pt/doceis/doceis12/home/home.ph
Digging Deeper into Egocentric Gaze Prediction
This paper digs deeper into factors that influence egocentric gaze. Instead
of training deep models for this purpose in a blind manner, we propose to
inspect factors that contribute to gaze guidance during daily tasks. Bottom-up
saliency and optical flow are assessed versus strong spatial prior baselines.
Task-specific cues such as vanishing point, manipulation point, and hand
regions are analyzed as representatives of top-down information. We also look
into the contribution of these factors by investigating a simple recurrent
neural model for egocentric gaze prediction. First, deep features are
extracted for all input video frames. Then, a gated recurrent unit is employed
to integrate information over time and to predict the next fixation. We also
propose an integrated model that combines the recurrent model with several
top-down and bottom-up cues. Extensive experiments over multiple datasets
reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up
saliency models perform poorly in predicting gaze and underperform spatial
biases, (3) deep features outperform traditional features, (4)
unlike hand regions, the manipulation point is a strongly influential cue
for gaze prediction, (5) combining the proposed recurrent model with bottom-up
cues, vanishing points and, in particular, the manipulation point yields the
best gaze prediction accuracy over egocentric videos, (6) the knowledge
transfer works best for cases where the tasks or sequences are similar, and (7)
task and activity recognition can benefit from gaze prediction. Our findings
suggest that (1) there should be more emphasis on hand-object interaction and
(2) the egocentric vision community should consider larger datasets including
diverse stimuli and more subjects.
Comment: presented at WACV 201
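The recurrent pipeline described in the abstract above (per-frame deep features, then a gated recurrent unit that integrates over time and predicts the next fixation) can be illustrated with a minimal NumPy sketch. This is not the authors' model: the weights are random and untrained, biases are omitted, the random `features` array merely stands in for deep features, and the class name, layer sizes, and sigmoid read-out to normalized (x, y) coordinates are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyGRU:
    """Single-layer GRU (biases omitted) with a linear read-out to (x, y)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        shape = (n_hidden, n_in + n_hidden)
        self.Wz = rng.normal(0.0, 0.1, shape)  # update gate weights
        self.Wr = rng.normal(0.0, 0.1, shape)  # reset gate weights
        self.Wh = rng.normal(0.0, 0.1, shape)  # candidate state weights
        self.Wo = rng.normal(0.0, 0.1, (2, n_hidden))  # read-out to (x, y)
        self.n_hidden = n_hidden

    def step(self, x, h):
        """One GRU update: gate the old state against a candidate state."""
        z = sigmoid(self.Wz @ np.concatenate([x, h]))
        r = sigmoid(self.Wr @ np.concatenate([x, h]))
        h_new = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1.0 - z) * h + z * h_new

    def predict_next_fixation(self, frame_features):
        """Integrate features over time, then map the final state to (x, y) in (0, 1)."""
        h = np.zeros(self.n_hidden)
        for x in frame_features:
            h = self.step(x, h)
        return sigmoid(self.Wo @ h)

# Stand-in for deep features: 16 frames, 32 dimensions each.
features = np.random.default_rng(1).normal(size=(16, 32))
model = TinyGRU(n_in=32, n_hidden=8)
fixation = model.predict_next_fixation(features)
```

A trained version would learn the weights from recorded gaze and would typically use a deep learning framework rather than hand-written updates.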
The Visual Social Distancing Problem
One of the main and most effective measures to contain the recent viral
outbreak is maintaining so-called Social Distancing (SD). To comply
with this constraint, workplaces, public institutions, transport and schools
will likely adopt restrictions on the minimum inter-personal distance between
people. Given the current scenario, it is crucial to measure, at scale,
compliance with this physical constraint in daily life, in order to identify the
reasons for possible violations of such distance limits and to understand whether
they imply a possible threat given the scene context, all while complying
with privacy policies and keeping the measurement acceptable. To this end, we
introduce the Visual Social Distancing (VSD) problem, defined as the automatic
estimation of the inter-personal distance from an image, and the
characterization of the related people aggregations. VSD is pivotal for a
non-invasive analysis of whether people comply with the SD restriction, and for
providing statistics about the level of safety of specific areas whenever this
constraint is violated. We then discuss how VSD relates to previous
literature in Social Signal Processing and indicate which existing Computer
Vision methods can be used to address this problem. We conclude with future
challenges related to the effectiveness of VSD systems, ethical implications
and future application scenarios.
Comment: 9 pages, 5 figures. All authors contributed equally to this
manuscript and are listed in alphabetical order. Under submissio
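The abstract above defines VSD as estimating inter-personal distance from an image but does not prescribe a method. One common geometric approach, sketched here under strong assumptions, maps each person's foot point to the ground plane with a known homography and then thresholds pairwise distances. The homography `H`, the foot points, and the `min_dist` value below are made-up illustration values, and the function names are hypothetical.

```python
import numpy as np

def to_ground(points_px, H):
    """Map image points (N, 2) to ground-plane coordinates with homography H."""
    pts = np.hstack([points_px, np.ones((len(points_px), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def close_pairs(ground_xy, min_dist):
    """Return index pairs (i, j), i < j, that are closer than min_dist."""
    diff = ground_xy[:, None, :] - ground_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.where(np.triu(dist < min_dist, k=1))
    return [(int(a), int(b)) for a, b in zip(i, j)]

# Hypothetical calibration: a pure scaling of 0.01 ground units per pixel.
H = np.diag([0.01, 0.01, 1.0])
feet_px = np.array([[100.0, 200.0], [150.0, 200.0], [600.0, 200.0]])

ground = to_ground(feet_px, H)
pairs = close_pairs(ground, min_dist=1.0)
```

With these made-up values, only the first two people fall within the threshold, so `pairs` contains the single pair `(0, 1)`. A real system would also need person detection and camera calibration, which this sketch takes as given.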