Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
cover the working principle of event cameras, the sensors that are currently
available, and the tasks they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
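The event encoding described above (time, location and sign of a brightness change) can be illustrated with the idealized per-pixel generation model often used to explain event cameras. In this model, a pixel fires an event of polarity +1 or -1 each time its log intensity moves by a contrast threshold C away from the level at its last event. The sketch below is a minimal simulation under stated assumptions, not a sensor driver. The threshold value, the `eps` offset that avoids log(0), the frame-based input, and all names (`Event`, `generate_events`) are hypothetical. Because it samples intensity frames, every event inherits a frame timestamp, whereas a real sensor timestamps events asynchronously.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Event:
    x: int
    y: int
    t: float
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def generate_events(
    frames: Sequence[Sequence[Sequence[float]]],
    timestamps: Sequence[float],
    contrast_threshold: float = 0.2,  # hypothetical value of C
    eps: float = 1e-6,  # hypothetical offset so that log(0) is never taken
) -> List[Event]:
    """Simulate events from intensity frames with the idealized contrast-threshold model."""
    height, width = len(frames[0]), len(frames[0][0])
    # Per-pixel reference log intensity, i.e. the level at the pixel's last event.
    ref = [[math.log(frames[0][y][x] + eps) for x in range(width)] for y in range(height)]
    events: List[Event] = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y in range(height):
            for x in range(width):
                log_i = math.log(frame[y][x] + eps)
                diff = log_i - ref[y][x]
                # Emit one event per full threshold crossing and move the reference by C.
                while abs(diff) >= contrast_threshold:
                    polarity = 1 if diff > 0 else -1
                    events.append(Event(x, y, t, polarity))
                    ref[y][x] += polarity * contrast_threshold
                    diff = log_i - ref[y][x]
    return events


# A single pixel that brightens (1.0 -> 2.0) and then darkens (2.0 -> 0.5).
events = generate_events([[[1.0]], [[2.0]], [[0.5]]], [0.0, 0.001, 0.002])
print([(e.t, e.polarity) for e in events])
```

With C = 0.2 the example pixel produces three positive events at t = 0.001 and six negative events at t = 0.002. Changes smaller than C produce no output, which is why static scenes generate almost no data.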
CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark
A long-term visual object tracking performance evaluation methodology and a
benchmark are proposed. Performance measures are designed by following a
long-term tracking definition to maximize the strength of the analysis. The
new measures outperform existing ones in interpretability and in better
distinguishing between different tracking behaviors. We show that these
measures generalize the short-term performance measures, thus linking the two
tracking problems. Furthermore, the new measures are highly robust to temporal
annotation sparsity and allow annotation of sequences hundreds of times longer
than in the current datasets without increasing manual annotation labor. A new
challenging dataset of carefully selected sequences with many target
disappearances is proposed. A new tracking taxonomy is proposed to position
trackers on the short-term/long-term spectrum. The benchmark contains an
extensive evaluation of the largest number of long-term trackers and a comparison
to state-of-the-art short-term trackers. We analyze the influence of tracking
architecture implementations on long-term performance and explore various
re-detection strategies as well as the influence of visual model update strategies
on long-term tracking drift. The methodology is integrated in the VOT toolkit
to automate experimental analysis and benchmarking and to facilitate future
development of long-term trackers.
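The abstract does not give formulas for the long-term measures. As an assumption, the sketch below follows the tracking precision/recall/F-measure formulation used in the VOT long-term challenge. Precision averages the overlap over frames in which the tracker reports the target with confidence at least τ. Recall averages it over frames in which the target is actually visible. F is their harmonic mean, and in that setting τ is typically swept and the maximum F is reported. Boxes are assumed to be axis-aligned `(x, y, width, height)` tuples, and `None` marks an absent target or a missing prediction. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def long_term_pr_re_f(
    predictions: Sequence[Optional[Box]],
    confidences: Sequence[float],
    ground_truth: Sequence[Optional[Box]],
    tau: float,
) -> Tuple[float, float, float]:
    """Tracking precision, recall and F-measure of one sequence at confidence threshold tau."""
    reported_overlaps = []  # frames where the tracker claims the target is present
    visible_overlaps = []   # frames where the target is actually visible
    for pred, conf, gt in zip(predictions, confidences, ground_truth):
        reported = pred is not None and conf >= tau
        overlap = iou(pred, gt) if reported and gt is not None else 0.0
        if reported:
            reported_overlaps.append(overlap)
        if gt is not None:
            visible_overlaps.append(overlap)
    precision = sum(reported_overlaps) / len(reported_overlaps) if reported_overlaps else 0.0
    recall = sum(visible_overlaps) / len(visible_overlaps) if visible_overlaps else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f


# Four frames: the target disappears in frame 2 and the tracker loses it in frame 3.
gt = [(0, 0, 10, 10), (0, 0, 10, 10), None, (0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (5, 0, 10, 10), (0, 0, 10, 10), None]
confs = [0.9, 0.8, 0.7, 0.0]
print(long_term_pr_re_f(preds, confs, gt, tau=0.5))
best_f = max(long_term_pr_re_f(preds, confs, gt, t)[2] for t in sorted(set(confs)))
print(best_f)
```

A false detection while the target is absent (frame 2) lowers precision but not recall, while a missed re-detection (frame 3) lowers recall but not precision. This separation is what lets such measures distinguish different long-term tracking behaviors.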
ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Furthermore, the proposed ModDrop training technique makes the classifier robust
to missing signals in one or several channels, so that it produces meaningful
predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.
Comment: 14 pages, 7 figures
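The channel-dropping idea behind ModDrop can be sketched as follows. This is a simplified illustration, not the authors' training code. It assumes each modality arrives as a plain feature vector and that, at training time, every modality is independently zeroed with probability `p_drop`. The dictionary layout, the probability, and the function name `moddrop` are hypothetical, and the gradual-fusion schedule described above is not modeled.

```python
import random
from typing import Dict, List


def moddrop(
    sample: Dict[str, List[float]],
    p_drop: float,
    rng: random.Random,
) -> Dict[str, List[float]]:
    """Return a copy of `sample` in which each modality is independently zeroed with
    probability `p_drop`. Intended for training only; at test time inputs are used as-is."""
    if not 0.0 <= p_drop <= 1.0:
        raise ValueError("p_drop must be in [0, 1]")
    dropped: Dict[str, List[float]] = {}
    for name, features in sample.items():
        if rng.random() < p_drop:
            dropped[name] = [0.0] * len(features)  # simulate a missing channel
        else:
            dropped[name] = list(features)  # copy so the input is never mutated
    return dropped


rng = random.Random(0)
sample = {"depth": [0.3, 0.7], "video": [1.0, 0.2, 0.5], "audio": [0.9]}
print(moddrop(sample, p_drop=0.5, rng=rng))
```

Zeroing a modality rather than removing it keeps the input shape fixed, so one fused network can be trained on, and later fed with, any subset of available channels.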
Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters
Standard RGB-D trackers treat the target as an inherently 2D structure, which
makes modelling appearance changes related even to simple out-of-plane rotation
highly challenging. We address this limitation by proposing a novel long-term
RGB-D tracker, Object Tracking by Reconstruction (OTR). The tracker performs
online 3D target reconstruction to facilitate robust learning of a set of
view-specific discriminative correlation filters (DCFs). The 3D reconstruction
supports two performance-enhancing features: (i) generation of accurate spatial
support for constrained DCF learning from its 2D projection and (ii) point-cloud-based
estimation of 3D pose change for selection and storage of
view-specific DCFs which are used to robustly localize the target after
out-of-view rotation or heavy occlusion. Extensive evaluation of OTR on the
challenging Princeton RGB-D tracking and STC Benchmarks shows that it outperforms
the state of the art by a large margin.
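The storage and selection of view-specific filters can be illustrated with a small filter bank keyed by 3D orientation. This is a hedged sketch, not OTR's implementation. It assumes the target's rotation is already available as a 3x3 rotation matrix, so the point-cloud pose estimation itself is not shown. View change is measured with the geodesic angle arccos((tr(R_aᵀR_b) − 1)/2). A new filter is stored only when the current pose is more than a hypothetical `new_view_deg` away from every stored view, and the filters themselves are treated as opaque objects. `ViewFilterBank`, `rot_z`, and the 30° threshold are illustrative names and values.

```python
import math
from typing import Any, List, Optional, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def rotation_angle_deg(r_a: Matrix3, r_b: Matrix3) -> float:
    """Geodesic angle in degrees between two rotation matrices."""
    # trace(R_a^T R_b) equals the sum of element-wise products of R_a and R_b.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(cos_theta))


def rot_z(deg: float) -> Matrix3:
    """Rotation about the z axis, used here only to build example poses."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


class ViewFilterBank:
    """Hypothetical store of (pose, filter) pairs, one per sufficiently distinct view."""

    def __init__(self, new_view_deg: float = 30.0) -> None:
        self.new_view_deg = new_view_deg
        self.views: List[Tuple[Matrix3, Any]] = []

    def nearest(self, pose: Matrix3) -> Optional[Tuple[float, Any]]:
        """Return (angle, filter) of the stored view closest to `pose`, or None if empty."""
        if not self.views:
            return None
        return min(
            ((rotation_angle_deg(pose, stored), dcf) for stored, dcf in self.views),
            key=lambda pair: pair[0],
        )

    def maybe_store(self, pose: Matrix3, dcf: Any) -> bool:
        """Store `dcf` only if `pose` is farther than `new_view_deg` from every stored view."""
        closest = self.nearest(pose)
        if closest is None or closest[0] > self.new_view_deg:
            self.views.append((pose, dcf))
            return True
        return False


bank = ViewFilterBank(new_view_deg=30.0)
bank.maybe_store(rot_z(0), "filter_0deg")
bank.maybe_store(rot_z(20), "filter_20deg")  # within 30 deg of a stored view, not stored
bank.maybe_store(rot_z(45), "filter_45deg")
print(bank.nearest(rot_z(40)))
```

After an occlusion or an out-of-view rotation, `nearest` returns the stored filter whose view is closest to the re-estimated pose. This mirrors the role the abstract assigns to the view-specific DCFs in re-localizing the target.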
Articulated motion and deformable objects
This guest editorial introduces the twenty-two papers accepted for this Special Issue on Articulated Motion and Deformable Objects (AMDO). They are grouped into four main categories within the field of AMDO: human motion analysis (action/gesture), human pose estimation, deformable shape segmentation, and face analysis. For each of the four topics, a survey of the recent developments in the field is presented. The accepted papers are briefly introduced in the context of this survey. They contribute novel methods, algorithms with improved performance as measured on benchmarking datasets, as well as two new datasets for hand action detection and human posture analysis. The special issue should be of high relevance to the reader interested in AMDO recognition and promote future research directions in the field.