9,067 research outputs found
Vision-Based Production of Personalized Video
In this paper we present a novel vision-based system for the automated production of personalised video souvenirs for visitors in leisure and cultural heritage venues. Visitors are visually identified and tracked through a camera network. The system produces a personalized DVD souvenir at the end of a visitor’s stay allowing visitors to relive their experiences. We analyze how we identify visitors by fusing facial and body features, how we track visitors, how the tracker recovers from failures due to occlusions, as well as how we annotate and compile the final product. Our experiments demonstrate the feasibility of the proposed approach
Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks
This paper presents to the best of our knowledge the first end-to-end object
tracking approach which directly maps from raw sensor input to object tracks in
sensor space without requiring any feature engineering or system identification
in the form of plant or sensor models. Specifically, our system accepts a
stream of raw sensor data at one end and, in real-time, produces an estimate of
the entire environment state at the output including even occluded objects. We
achieve this by framing the problem as a deep learning task and exploit
sequence models in the form of recurrent neural networks to learn a mapping
from sensor measurements to object tracks. In particular, we propose a learning
method based on a form of input dropout which allows learning in an
unsupervised manner, only based on raw, occluded sensor data without access to
ground-truth annotations. We demonstrate our approach using a synthetic dataset
designed to mimic the task of tracking objects in 2D laser data -- as commonly
encountered in robotics applications -- and show that it learns to track many
dynamic objects despite occlusions and the presence of sensor noise.Comment: Published in The Thirtieth AAAI Conference on Artificial Intelligence
(AAAI-16), Video: https://youtu.be/cdeWCpfUGWc, Code:
http://mrg.robots.ox.ac.uk/mrg_people/peter-ondruska
Towards dense object tracking in a 2D honeybee hive
From human crowds to cells in tissue, the detection and efficient tracking of
multiple objects in dense configurations is an important and unsolved problem.
In the past, limitations of image analysis have restricted studies of dense
groups to tracking a single or subset of marked individuals, or to
coarse-grained group-level dynamics, all of which yield incomplete information.
Here, we combine convolutional neural networks (CNNs) with the model
environment of a honeybee hive to automatically recognize all individuals in a
dense group from raw image data. We create new, adapted individual labeling and
use the segmentation architecture U-Net with a loss function dependent on both
object identity and orientation. We additionally exploit temporal regularities
of the video recording in a recurrent manner and achieve near human-level
performance while reducing the network size by 94% compared to the original
U-Net architecture. Given our novel application of CNNs, we generate extensive
problem-specific image data in which labeled examples are produced through a
custom interface with Amazon Mechanical Turk. This dataset contains over
375,000 labeled bee instances across 720 video frames at 2 FPS, representing an
extensive resource for the development and testing of tracking methods. We
correctly detect 96% of individuals with a location error of ~7% of a typical
body dimension, and orientation error of 12 degrees, approximating the
variability of human raters. Our results provide an important step towards
efficient image-based dense object tracking by allowing for the accurate
determination of object location and orientation across time-series image data
efficiently within one network architecture.Comment: 15 pages, including supplementary figures. 1 supplemental movie
available as an ancillary fil
Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras
Visual scene understanding is an important capability that enables robots to
purposefully act in their environment. In this paper, we propose a novel
approach to object-class segmentation from multiple RGB-D views using deep
learning. We train a deep neural network to predict object-class semantics that
is consistent from several view points in a semi-supervised way. At test time,
the semantics predictions of our network can be fused more consistently in
semantic keyframe maps than predictions of a network trained on individual
views. We base our network architecture on a recent single-view deep learning
approach to RGB and depth fusion for semantic object-class segmentation and
enhance it with multi-scale loss minimization. We obtain the camera trajectory
using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth
annotated frames in order to enforce multi-view consistency during training. At
test time, predictions from multiple views are fused into keyframes. We propose
and analyze several methods for enforcing multi-view consistency during
training and testing. We evaluate the benefit of multi-view consistency
training and demonstrate that pooling of deep features and fusion over multiple
views outperforms single-view baselines on the NYUDv2 benchmark for semantic
segmentation. Our end-to-end trained network achieves state-of-the-art
performance on the NYUDv2 dataset in single-view segmentation as well as
multi-view semantic fusion.Comment: the 2017 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS 2017
- …