Attention Allocation Aid for Visual Search
This paper outlines the development and testing of a novel, feedback-enabled
attention allocation aid (AAAD), which uses real-time physiological data to
improve human performance in a realistic sequential visual search task. Indeed,
by optimizing over search duration, the aid improves efficiency, while
preserving decision accuracy, as the operator identifies and classifies targets
within simulated aerial imagery. Specifically, using experimental eye-tracking
data and measurements about target detectability across the human visual field,
we develop functional models of detection accuracy as a function of search
time, number of eye movements, scan path, and image clutter. These models are
then used by the AAAD in conjunction with real-time eye position data to make
probabilistic estimations of attained search accuracy and to recommend that the
observer either move on to the next image or continue exploring the present
image. An experimental evaluation in a scenario motivated by human
supervisory control in surveillance missions confirms the benefits of the AAAD.
Comment: To be presented at the ACM CHI conference in Denver, Colorado in May
201
On the Distribution of Salient Objects in Web Images and its Influence on Salient Object Detection
It has become apparent that a Gaussian center bias can serve as an important
prior for visual saliency detection, which has been demonstrated for predicting
human eye fixations and salient object detection. Tseng et al. have shown that
the photographer's tendency to place interesting objects in the center is a
likely cause for the center bias of eye fixations. We investigate the influence
of the photographer's center bias on salient object detection, extending our
previous work. We show that the centroid locations of salient objects in
photographs of Achanta and Liu's data set in fact correlate strongly with a
Gaussian model. This is an important insight, because it provides an empirical
motivation and justification for the integration of such a center bias in
salient object detection algorithms and helps to understand why Gaussian models
are so effective. To assess the influence of the center bias on salient object
detection, we integrate an explicit Gaussian center bias model into two
state-of-the-art salient object detection algorithms. This way, first, we
quantify the influence of the Gaussian center bias on pixel- and segment-based
salient object detection. Second, we improve the performance in terms of F1
score, Fβ score, area under the recall-precision curve, area under the receiver
operating characteristic curve, and hit-rate on the well-known data set by
Achanta and Liu. Third, by debiasing Cheng et al.'s region contrast model, we
demonstrate by example that implicit center biases are partially responsible
for the outstanding performance of state-of-the-art algorithms. Last but not
least, as a result of debiasing Cheng et al.'s algorithm, we introduce a
non-biased salient object detection method, which is of interest for
applications in which the image data is not likely to have a photographer's
center bias (e.g., image data of surveillance cameras or autonomous robots).
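As a rough illustration of what an explicit Gaussian center bias can look like, the sketch below builds a Gaussian prior centered on the image and multiplies it into a saliency map. This is not the authors' implementation: the function names, the `sigma_frac` parameter, and the multiplicative combination are all assumptions made for the example, since the abstract does not say how the bias is integrated.

```python
import math


def gaussian_center_prior(height, width, sigma_frac=0.25):
    """Return a height x width grid of Gaussian weights peaking at the image center.

    sigma_frac is a hypothetical parameter: the standard deviation expressed
    as a fraction of each image dimension.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = sigma_frac * height, sigma_frac * width
    return [
        [
            math.exp(-0.5 * (((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def apply_center_bias(saliency, prior):
    """Weight a saliency map element-wise by the center prior (assumed combination rule)."""
    return [
        [s * p for s, p in zip(s_row, p_row)]
        for s_row, p_row in zip(saliency, prior)
    ]


if __name__ == "__main__":
    prior = gaussian_center_prior(5, 5)
    uniform = [[1.0] * 5 for _ in range(5)]
    biased = apply_center_bias(uniform, prior)
    # The center pixel keeps its full weight; corners are attenuated.
    print(biased[2][2], biased[0][0])
```

Removing the bias for data without a photographer's tendency (for example surveillance footage) would correspond to skipping the weighting step in this sketch.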
Pose Embeddings: A Deep Architecture for Learning to Match Human Poses
We present a method for learning an embedding that places images of humans in
similar poses nearby. This embedding can be used as a direct method of
comparing images based on human pose, avoiding potential challenges of
estimating body joint positions. Pose embedding learning is formulated under a
triplet-based distance criterion. A deep architecture is used to allow learning
of a representation capable of making distinctions between different poses.
Experiments on human pose matching and retrieval from video data demonstrate
the potential of the method.
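For readers unfamiliar with triplet-based distance criteria, the following is a minimal sketch of the commonly used hinge form of the triplet loss. The abstract does not give the exact formulation, so the Euclidean distance, the `margin` value, and the function names here are assumptions rather than the paper's definition.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form).

    The loss is zero once the anchor is closer to the positive (similar pose)
    than to the negative (different pose) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    similar_pose = [0.0, 1.0]    # distance 1 from the anchor
    different_pose = [3.0, 4.0]  # distance 5 from the anchor
    print(triplet_hinge_loss(anchor, similar_pose, different_pose))
    print(triplet_hinge_loss(anchor, different_pose, similar_pose))
```

In a learned embedding, the three vectors would be network outputs for three images, and the loss would be averaged over many sampled triplets.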
Object Referring in Videos with Language and Human Gaze
We investigate the problem of object referring (OR), i.e., localizing a target
object in a visual scene given a language description. Humans perceive
the world more as continued video snippets than as static images, and describe
objects not only by their appearance, but also by their spatio-temporal context
and motion features. Humans also gaze at the object when they issue a referring
expression. Existing works for OR mostly focus on static images only, which
fall short in providing many such cues. This paper addresses OR in videos with
language and human gaze. To that end, we present a new video dataset for OR,
with 30,000 objects over 5,000 stereo video sequences annotated for their
descriptions and gaze. We further propose a novel network model for OR in
videos, by integrating appearance, motion, gaze, and spatio-temporal context
into one network. Experimental results show that our method effectively
utilizes motion cues, human gaze, and spatio-temporal context. Our method
outperforms previous OR methods. For dataset and code, please refer to
https://people.ee.ethz.ch/~arunv/ORGaze.html.
Comment: Accepted to CVPR 2018, 10 pages, 6 figures