Semi-Dense 3D Reconstruction with a Stereo Event Camera
Event cameras are bio-inspired sensors that offer several advantages, such as
low latency, high speed and high dynamic range, for tackling challenging scenarios
in computer vision. This paper presents a solution to the problem of 3D
reconstruction from data captured by a stereo event-camera rig moving in a
static scene, such as in the context of stereo Simultaneous Localization and
Mapping. The proposed method consists of the optimization of an energy function
designed to exploit small-baseline spatio-temporal consistency of events
triggered across both stereo image planes. To improve the density of the
reconstruction and to reduce the uncertainty of the estimation, a probabilistic
depth-fusion strategy is also developed. The resulting method has no special
requirements on either the motion of the stereo event-camera rig or on prior
knowledge about the scene. Experiments demonstrate that our method can deal with
both texture-rich and sparse scenes, outperforming state-of-the-art stereo
methods based on event data image representations. Comment: 19 pages, 8 figures, Video: https://youtu.be/Qrnpj2FD1e
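The probabilistic depth-fusion step can be illustrated with the textbook case of two independent Gaussian estimates of the same depth: the inverse-variance weighted mean is the maximum-likelihood combination, and the fused variance is smaller than either input variance, which is how repeated fusion reduces uncertainty. The sketch below assumes Gaussian noise on depth; the paper's actual fusion model (distribution, parametrization, any compatibility test before fusing) may differ, and the helper names and numbers are hypothetical.

```python
import math


def fuse_gaussian(mu_a, var_a, mu_b, var_b):
    """Fuse two independent Gaussian estimates of the same depth.

    Returns the inverse-variance weighted mean and the fused variance,
    which is never larger than either input variance.
    """
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    var = 1.0 / (w_a + w_b)
    mu = var * (w_a * mu_a + w_b * mu_b)
    return mu, var


def fuse_many(estimates):
    """Sequentially fuse (mean, variance) pairs observed for one pixel."""
    mu, var = estimates[0]
    for m, v in estimates[1:]:
        mu, var = fuse_gaussian(mu, var, m, v)
    return mu, var


# Hypothetical depth hypotheses (metres) for one pixel from three stereo
# observations; the fused standard deviation drops below every input's.
mu, var = fuse_many([(2.00, 0.04), (2.10, 0.04), (1.95, 0.01)])
print(f"depth = {mu:.3f} m, std = {math.sqrt(var):.3f} m")
# prints: depth = 1.983 m, std = 0.082 m
```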
Exploiting Structural Regularities and Beyond: Vision-based Localization and Mapping in Man-Made Environments
Image-based estimation of camera motion, known as visual odometry
(VO), plays a very important role in many robotic applications
such as control and navigation of unmanned mobile robots,
especially when no external navigation reference signal is
available. The core problem of VO is the estimation of the
camera’s ego-motion (i.e. tracking) either between successive
frames, namely relative pose estimation, or with respect to a
global map, namely absolute pose estimation. This thesis aims to
develop efficient, accurate and robust VO solutions by taking
advantage of structural regularities in man-made environments,
such as piece-wise planar structures, Manhattan World structure and,
more generally, contours and edges. Furthermore, to handle challenging
scenarios that are beyond the limits of VO solutions based on classical
sensors, we investigate a recently emerging sensor, the event camera,
and study event-based mapping, one of the key problems in event-based
VO/SLAM. The main achievements are summarized as follows.
First, we revisit an old topic in relative pose estimation:
accurately and robustly estimating the fundamental matrix given a
collection of independently estimated homographies. Three
classical methods are reviewed, and we then show that a simple but
nontrivial two-step normalization within the direct linear method
achieves performance similar to that of the less attractive, more
computationally intensive method based on hallucinated points.
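For context on the direct linear method: a homography H induced by a scene plane and the fundamental matrix F of the same image pair satisfy H^T F + F^T H = 0 (that is, H^T F is skew-symmetric), which gives six linear equations in the nine entries of F per homography; two homographies of distinct planes generically determine F up to scale. The plain-Python sketch below stacks these equations and extracts a null vector by Gauss-Jordan elimination in place of an SVD. It omits the two-step normalization discussed above, and the helper names and the noise-free scene are hypothetical.

```python
import math


def skew(v):
    """Cross-product matrix [v]_x, so that [v]_x w == v x w."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def homography_rows(H):
    """Six linear equations in vec(F) (row-major) from H^T F + F^T H = 0."""
    rows = []
    for a in range(3):
        for b in range(a, 3):
            row = [0.0] * 9
            for k in range(3):
                row[3 * k + b] += H[k][a]  # term from (H^T F)[a][b]
                row[3 * k + a] += H[k][b]  # term from (H^T F)[b][a]
            rows.append(row)
    return rows


def null_vector(rows, n, tol=1e-9):
    """One vector x with rows @ x ~= 0, via Gauss-Jordan elimination."""
    A = [list(r) for r in rows]
    pivots, r = [], 0
    for c in range(n):
        if r == len(A):
            break
        p = max(range(r, len(A)), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < tol:
            continue  # no usable pivot: column c is free
        A[r], A[p] = A[p], A[r]
        pv = A[r][c]
        A[r] = [x / pv for x in A[r]]
        for i in range(len(A)):
            if i != r:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    if not free:
        raise ValueError("matrix has full column rank")
    x = [0.0] * n
    x[free[0]] = 1.0
    for row_idx, pc in enumerate(pivots):
        x[pc] = -A[row_idx][free[0]]
    return x


def fundamental_from_homographies(Hs):
    """Unnormalized direct linear estimate of F from >= 2 homographies."""
    rows = [row for H in Hs for row in homography_rows(H)]
    f = null_vector(rows, 9)
    norm = math.sqrt(sum(x * x for x in f))
    return [[f[3 * i + j] / norm for j in range(3)] for i in range(3)]


def similarity(A, B):
    """|cosine| between two matrices, ignoring unknown scale and sign."""
    a = [x for row in A for x in row]
    b = [x for row in B for x in row]
    dot = sum(x * y for x, y in zip(a, b))
    return abs(dot) / (math.sqrt(sum(x * x for x in a)) *
                       math.sqrt(sum(y * y for y in b)))


# Hypothetical noise-free scene in normalized image coordinates:
# camera 1 = [I | 0], camera 2 = [R | t], planes n^T X = d in camera 1.
c, s = math.cos(0.1), math.sin(0.1)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.2, 0.1]
planes = [([0.0, 0.0, 1.0], 5.0), ([1.0, 0.0, 0.5], 4.0)]
# Plane-induced homography H = R + t n^T / d maps image 1 to image 2.
Hs = [[[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]
      for n, d in planes]
F_est = fundamental_from_homographies(Hs)
F_true = matmul(skew(t), R)
print(similarity(F_est, F_true))  # close to 1.0
```

With noisy homographies, a least-squares null vector from an SVD would be the natural replacement for the elimination step.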
Second, an efficient 3D rotation estimation algorithm for depth
cameras in piece-wise planar environments is presented. It is shown
that, with surface normal vectors as input, planar modes in the
corresponding density distribution function can be discovered and
continuously tracked using efficient non-parametric estimation
techniques. The relative rotation can then be estimated by registering
entire bundles of planar modes through robust L1-norm minimization.
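The mode-finding step can be sketched with mean shift on the unit sphere, one common non-parametric technique for locating density modes of directions. The abstract does not name the estimator, so this choice, the kernel width `kappa`, the helper names, and the data are assumptions. Tracking across frames would reuse the previous frame's modes as starting points; the L1-norm registration of mode bundles is not shown.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def mean_shift_on_sphere(normals, start, kappa=20.0, iters=100, tol=1e-12):
    """Climb to a mode of the density of unit surface normals.

    Kernel: exp(kappa * (m . n)), a von Mises-Fisher-style kernel on the
    sphere. Each iteration moves m to the kernel-weighted mean direction.
    Larger kappa means a narrower kernel (finer separation of modes).
    """
    m = normalize(start)
    for _ in range(iters):
        acc = [0.0, 0.0, 0.0]
        for n in normals:
            w = math.exp(kappa * sum(a * b for a, b in zip(m, n)))
            acc = [a + w * b for a, b in zip(acc, n)]
        new_m = normalize(acc)
        if sum((a - b) ** 2 for a, b in zip(new_m, m)) < tol:
            return new_m
        m = new_m
    return m


# Hypothetical normals from two planes: a floor (~+z) and a wall (~+x),
# each with small deterministic perturbations standing in for sensor noise.
offsets = (-0.05, 0.0, 0.05)
floor = [normalize([dx, dy, 1.0]) for dx in offsets for dy in offsets]
wall = [normalize([1.0, dy, dz]) for dy in offsets for dz in offsets]
normals = floor + wall

floor_mode = mean_shift_on_sphere(normals, start=[0.3, 0.1, 1.0])
wall_mode = mean_shift_on_sphere(normals, start=[1.0, 0.2, 0.3])
print([round(x, 3) for x in floor_mode], [round(x, 3) for x in wall_mode])
```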
Third, an efficient alternative to the iterative closest point
algorithm for real-time tracking of modern depth cameras in
Manhattan Worlds is developed. We exploit the common orthogonal
structure of man-made environments in order to decouple the
estimation of the rotation from that of the three translational
degrees of freedom. The derived camera orientation is absolute and
thus free of long-term drift, which in turn also benefits the
accuracy of the translation estimation.
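Why decoupling helps can be seen from a point-to-plane residual: once the rotation R is fixed (for example, from the Manhattan frame), the residual n_i^T (R p_i + t - q_i) is linear in the translation t, so t follows from a 3x3 linear system. The sketch below assumes known correspondences (p_i, q_i) with normals n_i and uses hypothetical helper names; it illustrates the decoupling idea and is not necessarily the thesis's translation estimator. The system is well conditioned only when the normals span all three directions, which Manhattan scenes often provide.

```python
import math


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def solve3(A, b):
    """Solve the 3x3 system A x = b with Cramer's rule."""
    D = det3(A)
    if abs(D) < 1e-12:
        raise ValueError("normals do not span 3D: translation unobservable")
    x = []
    for col in range(3):
        M = [row[:] for row in A]
        for i in range(3):
            M[i][col] = b[i]
        x.append(det3(M) / D)
    return x


def translation_given_rotation(R, src, dst, normals):
    """Least-squares t minimizing sum_i (n_i . (R p_i + t - q_i))^2.

    With R fixed the residual is linear in t, so the problem reduces to
    the 3x3 normal equations (sum n n^T) t = sum n (n . (q - R p)).
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, q, n in zip(src, dst, normals):
        Rp = [sum(R[i][k] * p[k] for k in range(3)) for i in range(3)]
        r = sum(n[i] * (q[i] - Rp[i]) for i in range(3))
        for i in range(3):
            b[i] += n[i] * r
            for j in range(3):
                A[i][j] += n[i] * n[j]
    return solve3(A, b)


# Hypothetical correspondences with Manhattan-aligned normals
# (floor: z, walls: x and y); dst = R src + t_true exactly.
c, s = math.cos(0.2), math.sin(0.2)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
t_true = [0.1, -0.2, 0.05]
src = [[1.0, 2.0, 0.0], [2.0, 1.0, 0.0], [3.0, 0.5, 1.0],
       [3.0, 1.5, 2.0], [0.5, 4.0, 1.0], [1.5, 4.0, 0.5]]
dst = [[sum(R[i][k] * p[k] for k in range(3)) + t_true[i] for i in range(3)]
       for p in src]
normals = [[0.0, 0.0, 1.0]] * 2 + [[1.0, 0.0, 0.0]] * 2 + [[0.0, 1.0, 0.0]] * 2
print(translation_given_rotation(R, src, dst, normals))
```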
Fourth, we look into a more general structural regularity: edges.
A real-time VO system that uses Canny edges is proposed for RGB-D
cameras. Two novel alternatives to classical distance transforms
are developed that significantly improve on classical methods based
on the Euclidean distance field in terms of efficiency, accuracy
and robustness.
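The classical baseline can be sketched as follows: a Euclidean distance field is computed once on the reference edge map (for example, Canny edges), and a candidate pose is scored by looking up the field at the reprojected edge points. This brute-force version is for illustration only; the helper names and the toy edge map are hypothetical, and the thesis's two alternative distance transforms are not reproduced.

```python
import math


def distance_transform(edges):
    """Exact Euclidean distance from each pixel to its nearest edge pixel.

    edges[y][x] is truthy on edge pixels. Brute force, O(pixels * edges):
    fine for illustration, far too slow for real-time tracking.
    """
    h, w = len(edges), len(edges[0])
    pts = [(x, y) for y in range(h) for x in range(w) if edges[y][x]]
    return [[min(math.hypot(x - ex, y - ey) for ex, ey in pts)
             for x in range(w)] for y in range(h)]


def alignment_cost(dt, points):
    """Mean distance-field value at reprojected edge points (lower is better).

    Points are rounded to the nearest pixel and points outside the image are
    ignored; a practical tracker would usually interpolate the field and
    down-weight outliers.
    """
    h, w = len(dt), len(dt[0])
    vals = [dt[int(round(y))][int(round(x))] for x, y in points
            if 0 <= round(x) < w and 0 <= round(y) < h]
    return sum(vals) / len(vals) if vals else float("inf")


# Hypothetical 8x8 reference edge map with a vertical edge at x = 3.
edges = [[x == 3 for x in range(8)] for _ in range(8)]
dt = distance_transform(edges)

aligned = [(3.0, float(y)) for y in range(8)]  # correct pose
shifted = [(5.0, float(y)) for y in range(8)]  # pose off by 2 px
print(alignment_cost(dt, aligned), alignment_cost(dt, shifted))  # 0.0 2.0
```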
Finally, to deal with challenging scenarios that go beyond what
standard RGB/RGB-D cameras can handle, we investigate the
recently emerging event camera and focus on the problem of 3D
reconstruction from data captured by a stereo event-camera rig
moving in a static scene, such as in the context of stereo
Simultaneous Localization and Mapping.
CED: Color Event Camera Dataset
Event cameras are novel, bio-inspired visual sensors, whose pixels output
asynchronous and independent timestamped spikes at local intensity changes,
called 'events'. Event cameras offer advantages over conventional frame-based
cameras in terms of latency, high dynamic range (HDR) and temporal resolution.
Until recently, event cameras have been limited to outputting events in the
intensity channel; however, recent advances have resulted in the development of
color event cameras, such as the Color-DAVIS346. In this work, we present and
release the first Color Event Camera Dataset (CED), containing 50 minutes of
footage with both color frames and events. CED features a wide variety of
indoor and outdoor scenes, which we hope will help drive forward event-based
vision research. We also present an extension of the event camera simulator
ESIM that enables simulation of color events. Finally, we present an evaluation
of three state-of-the-art image reconstruction methods that can be used to
convert the Color-DAVIS346 into a continuous-time, HDR, color video camera to
visualise the event stream and for use in downstream vision applications. Comment: Conference on Computer Vision and Pattern Recognition Workshop
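Under the common idealized model of an event pixel, which fires when its log intensity changes by a contrast threshold C, color events can be simulated in a simple way: each pixel observes only the channel passed by its color filter. The sketch below is a hypothetical, idealized generator under that assumption, not ESIM's implementation or API; the filter layout, threshold `C`, offset `eps`, timestamp interpolation, and helper names are all assumptions.

```python
import math

# Hypothetical 2x2 color-filter layout; check the actual sensor datasheet.
BAYER = (("R", "G"), ("G", "B"))
CHANNEL_INDEX = {"R": 0, "G": 1, "B": 2}


def events_between(frame0, frame1, t0, t1, C=0.2, eps=1e-3):
    """Idealized color events between two RGB frames.

    frame[y][x] = (r, g, b) in [0, 1]. Each pixel observes only the channel
    its color filter passes and emits one event per crossing of the
    log-intensity contrast threshold C, timestamped by linear interpolation.
    Simplification: the reference level is frame0 (no residual carried over).
    Returns events (x, y, t, polarity, channel) sorted by time.
    """
    events = []
    for y, (row0, row1) in enumerate(zip(frame0, frame1)):
        for x, (px0, px1) in enumerate(zip(row0, row1)):
            ch = BAYER[y % 2][x % 2]
            c = CHANNEL_INDEX[ch]
            d_log = math.log(px1[c] + eps) - math.log(px0[c] + eps)
            polarity = 1 if d_log > 0 else -1
            n_events = int(abs(d_log) // C)
            for k in range(1, n_events + 1):
                t = t0 + (t1 - t0) * (k * C) / abs(d_log)
                events.append((x, y, t, polarity, ch))
    events.sort(key=lambda e: e[2])
    return events


# Hypothetical 2x2 scene: uniform grey, then the red channel brightens.
grey = [[(0.2, 0.2, 0.2)] * 2 for _ in range(2)]
reddish = [[(0.8, 0.2, 0.2)] * 2 for _ in range(2)]
evs = events_between(grey, reddish, t0=0.0, t1=0.01)
print(len(evs), {e[4] for e in evs})
```

Only the red-filtered pixel responds here, which is the behavior that makes per-pixel color filtering visible in the event stream.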
Self-supervised Event-based Monocular Depth Estimation using Cross-modal Consistency
An event camera is a novel vision sensor that can capture per-pixel
brightness changes and output a stream of asynchronous "events". It has
advantages over conventional cameras in scenes with high-speed motion
and challenging lighting conditions because of its high temporal resolution,
high dynamic range, low bandwidth, low power consumption, and absence of motion blur.
Therefore, several supervised methods for monocular depth estimation from events
have been proposed to address scenes that are difficult for conventional cameras. However, depth
annotation is costly and time-consuming. In this paper, to lower the annotation
cost, we propose a self-supervised event-based monocular depth estimation
framework named EMoDepth. EMoDepth constrains the training process using
cross-modal consistency with intensity frames that are aligned with events in
pixel coordinates. At inference time, only events are used for
monocular depth prediction. Additionally, we design a multi-scale
skip-connection architecture to effectively fuse features for depth estimation
while maintaining high inference speed. Experiments on the MVSEC and DSEC datasets
demonstrate that our contributions are effective and that our method can
outperform existing supervised event-based and unsupervised frame-based
methods in accuracy. Comment: Accepted by IROS202
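The abstract does not spell out the consistency loss. A common form in self-supervised depth estimation, assumed here, is a photometric reprojection error: the predicted depth and a relative pose warp an aligned intensity frame into the target view, and the warped intensities should match the observed ones. The sketch below assumes a pinhole camera with hypothetical intrinsics, a known pose, and hypothetical helper names; it is not EMoDepth's loss.

```python
def photometric_l1(target, source, depth, K, R, t):
    """Mean |I_target - I_source(warp)| over pixels that land in the source.

    K = (fx, fy, cx, cy) pinhole intrinsics; (R, t) maps points from the
    target camera frame into the source camera frame. Nearest-neighbour
    sampling keeps the sketch short; trainable versions typically use
    bilinear sampling so the loss is differentiable with respect to depth.
    """
    fx, fy, cx, cy = K
    h, w = len(target), len(target[0])
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            if z <= 0:
                continue
            X = [(u - cx) * z / fx, (v - cy) * z / fy, z]  # back-project
            Y = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
            if Y[2] <= 0:
                continue  # behind the source camera
            us = fx * Y[0] / Y[2] + cx
            vs = fy * Y[1] / Y[2] + cy
            ui, vi = int(round(us)), int(round(vs))
            if 0 <= ui < w and 0 <= vi < h:
                total += abs(target[v][u] - source[vi][ui])
                count += 1
    return total / count if count else float("inf")


# Hypothetical 8x4 horizontal ramp; the source view sees it shifted 1 px,
# which a 0.2 m sideways offset produces at depth 2 m when fx = 10.
W, H = 8, 4
target = [[u / 10.0 for u in range(W)] for _ in range(H)]
source = [[(u - 1) / 10.0 for u in range(W)] for _ in range(H)]
K = (10.0, 10.0, 3.5, 1.5)
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.2, 0.0, 0.0]
right_depth = [[2.0] * W for _ in range(H)]
wrong_depth = [[1.0] * W for _ in range(H)]
print(photometric_l1(target, source, right_depth, K, I3, t),
      photometric_l1(target, source, wrong_depth, K, I3, t))
```

The correct depth yields a near-zero error while the wrong depth does not, which is the signal that supervises depth without annotations.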
Stereo Event-based Visual-Inertial Odometry
Event-based cameras are a new type of vision sensor whose pixels work
independently and respond asynchronously to brightness changes with microsecond
resolution, instead of providing standard intensity frames. Compared with
traditional cameras, event-based cameras have low latency, no motion blur, and
high dynamic range (HDR), which make it possible for robots to handle
some challenging scenes. We propose a visual-inertial odometry for stereo
event-based cameras based on an Error-State Kalman Filter (ESKF). The visual
module updates the pose by aligning the edges of a semi-dense 3D map with a
2D image, and the IMU module updates the pose by median (midpoint) integration. We evaluate our
method on public datasets with general 6-DoF motion and compare the results
against ground truth. We show that our proposed pipeline provides improved
accuracy over the state-of-the-art visual odometry for stereo
event-based cameras, while running in real time on a standard CPU
(for low-resolution cameras). To the best of our knowledge, this is the first
published visual-inertial odometry for stereo event-based cameras.
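The IMU propagation step, read here as midpoint ("median") integration, averages consecutive gyroscope and accelerometer samples over each interval. The sketch below propagates only the nominal state (orientation, velocity, position) under that interpretation; bias estimation, the ESKF error-state covariance, and the edge-alignment visual update are omitted, and all helper names and values are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def exp_so3(w, dt):
    """Rotation matrix for angular velocity w applied over dt (Rodrigues)."""
    theta = math.sqrt(sum(x * x for x in w)) * dt
    I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        return I
    kx, ky, kz = (x * dt / theta for x in w)  # unit rotation axis
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = matmul(K, K)
    a, b = math.sin(theta), 1.0 - math.cos(theta)
    return [[I[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]


def propagate_midpoint(R, v, p, imu0, imu1, dt, g=(0.0, 0.0, -9.81)):
    """One nominal-state step using midpoint integration.

    imu = (gyro, accel) in the body frame, biases already subtracted.
    R maps body to world; accel is specific force, so gravity g is added.
    """
    (w0, a0), (w1, a1) = imu0, imu1
    w_mid = [(x + y) / 2.0 for x, y in zip(w0, w1)]
    R_next = matmul(R, exp_so3(w_mid, dt))
    acc_world0 = matvec(R, a0)
    acc_world1 = matvec(R_next, a1)
    a_mid = [(x + y) / 2.0 + gi for x, y, gi in zip(acc_world0, acc_world1, g)]
    p_next = [pi + vi * dt + 0.5 * ai * dt * dt
              for pi, vi, ai in zip(p, v, a_mid)]
    v_next = [vi + ai * dt for vi, ai in zip(v, a_mid)]
    return R_next, v_next, p_next


# Hypothetical check: 1 s at 100 Hz, yawing at pi/2 rad/s while the
# accelerometer reads only the reaction to gravity (no linear motion).
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
v, p = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
sample = ([0.0, 0.0, math.pi / 2.0], [0.0, 0.0, 9.81])
for _ in range(100):
    R, v, p = propagate_midpoint(R, v, p, sample, sample, dt=0.01)
print([round(x, 6) for x in R[0]], p)
```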