DSEC: A Stereo Event Camera Dataset for Driving Scenarios
Once an academic venture, autonomous driving has received unparalleled
corporate funding in the last decade. Still, the operating conditions of
current autonomous cars are mostly restricted to ideal scenarios. This means
that driving in challenging illumination conditions such as night, sunrise, and
sunset remains an open problem. In these cases, standard cameras are being
pushed to their limits in terms of low light and high dynamic range
performance. To address these challenges, we propose DSEC, a new dataset that
contains such demanding illumination conditions and provides a rich set of
sensory data. DSEC offers data from a wide-baseline stereo setup of two color
frame cameras and two high-resolution monochrome event cameras. In addition, we
collect lidar data and RTK GPS measurements, both hardware synchronized with
all camera data. One of the distinctive features of this dataset is the
inclusion of high-resolution event cameras. Event cameras have received
increasing attention for their high temporal resolution and high dynamic range
performance. However, due to their novelty, event camera datasets in driving
scenarios are rare. This work presents the first high-resolution, large-scale
stereo dataset with event cameras. The dataset contains 53 sequences collected
by driving in a variety of illumination conditions and provides ground truth
disparity for the development and evaluation of event-based stereo algorithms.
Comment: IEEE Robotics and Automation Letters
Semi-Dense 3D Reconstruction with a Stereo Event Camera
Event cameras are bio-inspired sensors that offer several advantages, such as
low latency, high-speed and high dynamic range, to tackle challenging scenarios
in computer vision. This paper presents a solution to the problem of 3D
reconstruction from data captured by a stereo event-camera rig moving in a
static scene, such as in the context of stereo Simultaneous Localization and
Mapping. The proposed method consists of the optimization of an energy function
designed to exploit small-baseline spatio-temporal consistency of events
triggered across both stereo image planes. To improve the density of the
reconstruction and to reduce the uncertainty of the estimation, a probabilistic
depth-fusion strategy is also developed. The resulting method has no special
requirements on either the motion of the stereo event-camera rig or on prior
knowledge about the scene. Experiments demonstrate our method can deal with
both texture-rich scenes as well as sparse scenes, outperforming
state-of-the-art stereo methods based on event data image representations.
Comment: 19 pages, 8 figures, Video: https://youtu.be/Qrnpj2FD1e
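The probabilistic depth-fusion step described above can be illustrated with the standard product-of-Gaussians update over per-pixel inverse-depth hypotheses. This is a minimal sketch under the assumption of Gaussian hypotheses; the function name is illustrative and the paper's exact fusion rule may differ:

```python
def fuse_inverse_depth(estimates):
    # Product of independent Gaussian inverse-depth hypotheses for one
    # pixel: a precision-weighted average. Fusing can only decrease the
    # variance, which is how fusion reduces estimation uncertainty.
    precision = sum(1.0 / var for _, var in estimates)
    mean = sum(mu / var for mu, var in estimates) / precision
    return mean, 1.0 / precision
```

For example, fusing two hypotheses with equal mean and equal variance keeps the mean and halves the variance.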
Focus Is All You Need: Loss Functions For Event-based Vision
Event cameras are novel vision sensors that output pixel-level brightness
changes ("events") instead of traditional video frames. These asynchronous
sensors offer several advantages over traditional cameras, such as, high
temporal resolution, very high dynamic range, and no motion blur. To unlock the
potential of such sensors, motion compensation methods have been recently
proposed. We present a collection and taxonomy of twenty-two objective
functions to analyze event alignment in motion compensation approaches (Fig.
1). We call them Focus Loss Functions since they have strong connections with
functions used in traditional shape-from-focus applications. The proposed loss
functions allow bringing mature computer vision tools to the realm of event
cameras. We compare the accuracy and runtime performance of all loss functions
on a publicly available dataset, and conclude that the variance, the gradient
and the Laplacian magnitudes are among the best loss functions. The
applicability of the loss functions is shown on multiple tasks: rotational
motion, depth and optical flow estimation. The proposed focus loss functions
unlock the outstanding properties of event cameras.
Comment: 29 pages, 19 figures, 4 tables
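As a rough illustration of the variance focus loss, the sketch below warps events under a candidate 2D image-plane velocity and measures the variance of the resulting image of warped events: the better the candidate motion, the sharper (higher-variance) the image. All names are illustrative, and the motion model is simplified to pure translational image flow:

```python
import numpy as np

def warped_image(events, v, shape):
    # events: (N, 3) array of (x, y, t); v: candidate 2D image-plane
    # velocity in pixels/second. Each event is transported back to t = 0
    # along the candidate flow and accumulated into the image of warped
    # events (IWE).
    x = events[:, 0] - v[0] * events[:, 2]
    y = events[:, 1] - v[1] * events[:, 2]
    xi = np.clip(np.round(x).astype(int), 0, shape[1] - 1)
    yi = np.clip(np.round(y).astype(int), 0, shape[0] - 1)
    iwe = np.zeros(shape)
    np.add.at(iwe, (yi, xi), 1.0)
    return iwe

def variance_loss(events, v, shape):
    # Variance focus loss: the correct motion aligns events along the
    # edges that caused them, yielding a sharp, high-variance IWE.
    return np.var(warped_image(events, v, shape))
```

Maximizing this loss over candidate velocities recovers the motion that best "focuses" the events.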
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have great potential for robotics
and computer vision in scenarios that are challenging for traditional cameras,
namely those demanding low latency, high speed, and high dynamic range.
However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
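The working principle summarized above can be sketched as an idealized per-pixel event generator: an event fires each time the log intensity moves by a contrast threshold C since the last event. This is a simplified, noise-free model (no refractory period, no sensor noise), and the names and default threshold are illustrative:

```python
def log_intensity_events(log_I, times, pixel, C=0.2):
    # Idealized event generation for a single pixel: emit an event of
    # polarity +1 or -1 every time the log intensity has moved by the
    # contrast threshold C since the last emitted event.
    events = []
    ref = log_I[0]
    for t, L in zip(times[1:], log_I[1:]):
        while L - ref >= C:
            ref += C
            events.append((pixel, t, +1))
        while ref - L >= C:
            ref -= C
            events.append((pixel, t, -1))
    return events
```

A brightening ramp thus yields a train of positive events whose rate grows with the speed of the intensity change, which is the origin of the microsecond-level temporal resolution.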
Exploiting Structural Regularities and Beyond: Vision-based Localization and Mapping in Man-Made Environments
Image-based estimation of camera motion, known as visual odometry
(VO), plays a very important role in many robotic applications
such as control and navigation of unmanned mobile robots,
especially when no external navigation reference signal is
available. The core problem of VO is the estimation of the
camera’s ego-motion (i.e. tracking) either between successive
frames, namely relative pose estimation, or with respect to a
global map, namely absolute pose estimation. This thesis aims to
develop efficient, accurate and robust VO solutions by taking
advantage of structural regularities in man-made environments,
such as piece-wise planar structures, Manhattan World and more
generally, contours and edges. Furthermore, to handle challenging
scenarios that are beyond the limits of classical sensor-based VO
solutions, we investigate a recently emerging sensor, the event
camera, and study event-based mapping, one of the key problems in
event-based VO/SLAM. The main achievements are summarized as
follows.
First, we revisit an old topic in relative pose estimation:
accurately and robustly estimating the fundamental matrix from a
collection of independently estimated homographies. Three
classical methods are reviewed, and we then show that a simple but
nontrivial two-step normalization within the direct linear method
achieves performance similar to that of the less attractive and
more computationally intensive hallucinated-points-based method.
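The direct linear method mentioned above builds on a classical compatibility constraint: a homography H induced by a scene plane satisfies F = [e']_x H, so H^T F is skew-symmetric, giving six linear equations in the nine entries of F per homography. Below is a minimal, unnormalized direct-linear sketch (the thesis's two-step normalization is deliberately omitted, and function names are illustrative):

```python
import numpy as np

def skew(v):
    # Cross-product matrix [v]_x.
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def f_from_homographies(Hs):
    # Each compatible homography H satisfies (H^T F) + (H^T F)^T = 0:
    # 6 linear equations per homography in the 9 entries of F.
    # Stack the equations and take the SVD null vector as F (up to scale).
    rows = []
    for H in Hs:
        for a in range(3):
            for b in range(a, 3):
                row = np.zeros(9)
                for k in range(3):
                    row[3 * k + b] += H[k, a]  # coefficient of F[k, b]
                    row[3 * k + a] += H[k, b]  # coefficient of F[k, a]
                rows.append(row)
    A = np.vstack(rows)
    return np.linalg.svd(A)[2][-1].reshape(3, 3)
```

Two or more homographies with distinct plane parameters determine F up to scale; in practice, the normalization step the thesis introduces is what makes this solver numerically competitive.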
Second, an efficient 3D rotation estimation algorithm for depth
cameras in piece-wise planar environments is presented. We show
that, using surface normal vectors as input, planar modes in the
corresponding density distribution function can be discovered and
continuously tracked with efficient non-parametric estimation
techniques. The relative rotation is then estimated by registering
entire bundles of planar modes using robust L1-norm minimization.
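Registering two bundles of matched unit normals can be sketched with the classical closed-form least-squares (Kabsch/Wahba) solution. Note this is an illustrative L2 sketch; the thesis instead uses robust L1-norm minimization over the tracked planar modes:

```python
import numpy as np

def rotation_from_normals(N_ref, N_cur):
    # Closed-form least-squares registration (Kabsch/Wahba) of matched
    # unit surface normals: find R minimizing sum ||n_cur - R n_ref||^2.
    # N_ref, N_cur: (N, 3) arrays of matched unit normals.
    M = N_cur.T @ N_ref
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))  # guard against reflections
    return U @ np.diag([1.0, 1.0, d]) @ Vt
```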
Third, an efficient alternative to the iterative closest point
algorithm for real-time tracking of modern depth cameras in
Manhattan Worlds is developed. We exploit the common orthogonal
structure of man-made environments in order to decouple the
estimation of the rotation and the three degrees of freedom of
the translation. The derived camera orientation is absolute and
thus free of long-term drift, which in turn benefits the accuracy
of the translation estimation as well.
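Once the absolute rotation is fixed by Manhattan-frame alignment, the remaining three translational degrees of freedom reduce to a linear problem; over matched 3D points it is simply an average of residuals. An illustrative sketch of the decoupling, not the thesis's exact formulation:

```python
import numpy as np

def decoupled_translation(R, P, Q):
    # With the absolute rotation R already fixed by Manhattan-frame
    # alignment, matched 3D points satisfy q = R p + t, so the 3-DoF
    # translation is the least-squares average of the residuals.
    # P, Q: (N, 3) arrays of matched points in the two frames.
    return (Q - P @ R.T).mean(axis=0)
```

Because R carries no long-term drift, errors in it do not accumulate into the translation estimate, which is the accuracy benefit claimed above.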
Fourth, we look into a more general structural regularity: edges.
A real-time VO system that uses Canny edges is proposed for RGB-D
cameras. Two novel alternatives to classical distance transforms
are developed that significantly improve on classical Euclidean
distance-field-based methods in terms of efficiency, accuracy and
robustness.
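For context, the classical Euclidean distance-field baseline works as follows: precompute the distance transform of the reference edge map, then score a candidate pose by the mean field value at the reprojected edge points. The sketch below is brute-force and purely illustrative (production systems use fast distance-transform algorithms):

```python
import numpy as np

def distance_field(edge_map):
    # Brute-force Euclidean distance transform of a binary edge map:
    # each pixel stores the distance to its nearest edge pixel.
    ys, xs = np.nonzero(edge_map)
    edges = np.stack([ys, xs], axis=1).astype(float)
    h, w = edge_map.shape
    yy, xx = np.mgrid[0:h, 0:w]
    pix = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    d = np.sqrt(((pix[:, None, :] - edges[None, :, :]) ** 2).sum(-1))
    return d.min(axis=1).reshape(h, w)

def alignment_cost(dist, pts):
    # Score a candidate pose by the mean field value at the reprojected
    # edge points, given as (row, col); lower means better alignment.
    idx = np.round(pts).astype(int)
    return dist[idx[:, 0], idx[:, 1]].mean()
```

Pose tracking then minimizes this cost over candidate camera motions; the thesis's alternatives replace the Euclidean field itself with better-behaved variants.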
Finally, to deal with challenging scenarios that go beyond what
standard RGB/RGB-D cameras can handle, we investigate the
recently emerging event camera and focus on the problem of 3D
reconstruction from data captured by a stereo event-camera rig
moving in a static
scene, such as in the context of stereo Simultaneous Localization
and Mapping.