2,183 research outputs found
Digging Deeper into Egocentric Gaze Prediction
This paper digs deeper into factors that influence egocentric gaze. Instead
of training deep models for this purpose in a blind manner, we propose to
inspect factors that contribute to gaze guidance during daily tasks. Bottom-up
saliency and optical flow are assessed versus strong spatial prior baselines.
Task-specific cues such as vanishing point, manipulation point, and hand
regions are analyzed as representatives of top-down information. We also look
into the contribution of these factors by investigating a simple recurrent
neural model for ego-centric gaze prediction. First, deep features are
extracted for all input video frames. Then, a gated recurrent unit is employed
to integrate information over time and to predict the next fixation. We also
propose an integrated model that combines the recurrent model with several
top-down and bottom-up cues. Extensive experiments over multiple datasets
reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up
saliency models perform poorly in predicting gaze and underperform spatial
biases, (3) deep features perform better compared to traditional features, (4)
as opposed to hand regions, the manipulation point is a strong influential cue
for gaze prediction, (5) combining the proposed recurrent model with bottom-up
cues, vanishing points and, in particular, manipulation point results in the
best gaze prediction accuracy over egocentric videos, (6) the knowledge
transfer works best for cases where the tasks or sequences are similar, and (7)
task and activity recognition can benefit from gaze prediction. Our findings
suggest that (1) there should be more emphasis on hand-object interaction and
(2) the egocentric vision community should consider larger datasets including
diverse stimuli and more subjects.Comment: presented at WACV 201
The Cyborg Astrobiologist: Testing a Novelty-Detection Algorithm on Two Mobile Exploration Systems at Rivas Vaciamadrid in Spain and at the Mars Desert Research Station in Utah
(ABRIDGED) In previous work, two platforms have been developed for testing
computer-vision algorithms for robotic planetary exploration (McGuire et al.
2004b,2005; Bartolo et al. 2007). The wearable-computer platform has been
tested at geological and astrobiological field sites in Spain (Rivas
Vaciamadrid and Riba de Santiuste), and the phone-camera has been tested at a
geological field site in Malta. In this work, we (i) apply a Hopfield
neural-network algorithm for novelty detection based upon color, (ii) integrate
a field-capable digital microscope on the wearable computer platform, (iii)
test this novelty detection with the digital microscope at Rivas Vaciamadrid,
(iv) develop a Bluetooth communication mode for the phone-camera platform, in
order to allow access to a mobile processing computer at the field sites, and
(v) test the novelty detection on the Bluetooth-enabled phone-camera connected
to a netbook computer at the Mars Desert Research Station in Utah. This systems
engineering and field testing have together allowed us to develop a real-time
computer-vision system that is capable, for example, of identifying lichens as
novel within a series of images acquired in semi-arid desert environments. We
acquired sequences of images of geologic outcrops in Utah and Spain consisting
of various rock types and colors to test this algorithm. The algorithm robustly
recognized previously-observed units by their color, while requiring only a
single image or a few images to learn colors as familiar, demonstrating its
fast learning capability.Comment: 28 pages, 12 figures, accepted for publication in the International
Journal of Astrobiolog
Robot in the mirror: toward an embodied computational model of mirror self-recognition
Self-recognition or self-awareness is a capacity attributed typically only to
humans and few other species. The definitions of these concepts vary and little
is known about the mechanisms behind them. However, there is a Turing test-like
benchmark: the mirror self-recognition, which consists in covertly putting a
mark on the face of the tested subject, placing her in front of a mirror, and
observing the reactions. In this work, first, we provide a mechanistic
decomposition, or process model, of what components are required to pass this
test. Based on these, we provide suggestions for empirical research. In
particular, in our view, the way the infants or animals reach for the mark
should be studied in detail. Second, we develop a model to enable the humanoid
robot Nao to pass the test. The core of our technical contribution is learning
the appearance representation and visual novelty detection by means of learning
the generative model of the face with deep auto-encoders and exploiting the
prediction error. The mark is identified as a salient region on the face and
reaching action is triggered, relying on a previously learned mapping to arm
joint angles. The architecture is tested on two robots with a completely
different face.Comment: To appear in KI - K\"unstliche Intelligenz - German Journal of
Artificial Intelligence - Springe
Deep visible and thermal image fusion for enhanced pedestrian visibility
Reliable vision in challenging illumination conditions is one of the crucial requirements of future autonomous automotive systems. In the last decade, thermal cameras have become more easily accessible to a larger number of researchers. This has resulted in numerous studies which confirmed the benefits of the thermal cameras in limited visibility conditions. In this paper, we propose a learning-based method for visible and thermal image fusion that focuses on generating fused images with high visual similarity to regular truecolor (red-green-blue or RGB) images, while introducing new informative details in pedestrian regions. The goal is to create natural, intuitive images that would be more informative than a regular RGB camera to a human driver in challenging visibility conditions. The main novelty of this paper is the idea to rely on two types of objective functions for optimization: a similarity metric between the RGB input and the fused output to achieve natural image appearance; and an auxiliary pedestrian detection error to help defining relevant features of the human appearance and blending them into the output. We train a convolutional neural network using image samples from variable conditions (day and night) so that the network learns the appearance of humans in the different modalities and creates more robust results applicable in realistic situations. Our experiments show that the visibility of pedestrians is noticeably improved especially in dark regions and at night. Compared to existing methods we can better learn context and define fusion rules that focus on the pedestrian appearance, while that is not guaranteed with methods that focus on low-level image quality metrics
- …