20 research outputs found
PL-EVIO: Robust Monocular Event-based Visual Inertial Odometry with Point and Line Features
Event cameras are motion-activated sensors that capture pixel-level
illumination changes rather than intensity images at a fixed frame rate.
Compared with standard cameras, they provide reliable visual perception
during high-speed motion and in high-dynamic-range scenarios. However, event
cameras output little information, or even mostly noise, when the relative
motion between the camera and the scene is small, such as when the camera is
nearly still, whereas standard cameras provide rich perception in most
scenarios, especially under good lighting. The two sensors are therefore
complementary. In this paper, we propose a robust, accurate, and real-time
optimization-based monocular event-based visual-inertial odometry (VIO)
method with event-corner features, line-based event features, and
point-based image features. The proposed method leverages point-based
features in natural scenes and line-based features in human-made scenes to
provide additional structural constraints through well-designed feature
management. Experiments on public benchmark datasets show that our method
achieves superior performance compared with state-of-the-art image-based and
event-based VIO methods. Finally, we
used our method to demonstrate an onboard closed-loop autonomous quadrotor
flight and large-scale outdoor experiments. Videos of the evaluations are
presented on our project website: https://b23.tv/OE3QM6
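To make the feature-management idea concrete, here is a minimal, hypothetical bookkeeping sketch (class name and budgets are ours, not PL-EVIO's) that balances event-corner, image-point, and event-line tracks so the optimizer always has both point and line constraints:

```python
class FeatureManager:
    """Hypothetical sketch, not the PL-EVIO implementation: keep
    event-corner, image-point, and event-line tracks balanced so the
    sliding-window optimizer never runs out of constraints."""

    def __init__(self, max_points=150, max_lines=40):
        self.max_points = max_points  # illustrative budgets, not from the paper
        self.max_lines = max_lines
        self.event_corners = []       # point tracks from the event stream
        self.image_points = []        # point tracks from standard frames
        self.event_lines = []         # line-segment tracks from events

    def needs_more_points(self):
        # Replenish point features when tracking has lost too many;
        # points dominate in natural scenes with rich texture.
        return len(self.event_corners) + len(self.image_points) < self.max_points

    def needs_more_lines(self):
        # Line features add structural constraints in human-made scenes,
        # where long straight edges are plentiful.
        return len(self.event_lines) < self.max_lines
```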
ESVIO: Event-based Stereo Visual Inertial Odometry
Event cameras that asynchronously output low-latency event streams provide
great opportunities for state estimation under challenging situations.
Although event-based visual odometry has been studied extensively in recent
years, most existing approaches are monocular, and stereo event vision has
received little attention. In
this paper, we present ESVIO, the first event-based stereo visual-inertial
odometry, which leverages the complementary advantages of event streams,
standard images and inertial measurements. Our proposed pipeline achieves
temporal tracking and instantaneous matching between consecutive stereo event
streams, thereby obtaining robust state estimation. In addition, a motion
compensation method is designed to sharpen scene edges by warping each event
to a reference moment using the IMU and the ESVIO back-end. We validate that
both ESIO (purely event-based) and ESVIO (event- and image-aided) have superior
performance compared with other image-based and event-based baseline methods on
public and self-collected datasets. Furthermore, we use our pipeline to perform
onboard quadrotor flights under low-light environments. A real-world
large-scale experiment is also conducted to demonstrate long-term
effectiveness. We highlight that this work is a real-time, accurate system that
is aimed at robust state estimation in challenging environments.
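The motion-compensation step can be illustrated with the generic gyro-based event warping that such pipelines build on. The following sketch (our own simplification with an assumed constant angular velocity, not ESVIO's code) rotates each event back to a reference time so scene edges stack sharply:

```python
import numpy as np

def motion_compensate(events, t_ref, omega, K):
    """Warp events to time t_ref using a constant angular velocity.

    events: (N, 3) array of [x_pixel, y_pixel, t_seconds]
    omega:  (3,) gyroscope angular velocity in rad/s (assumed constant)
    K:      (3, 3) camera intrinsic matrix
    """
    K_inv = np.linalg.inv(K)
    warped = np.empty((len(events), 2))
    for i, (x, y, t) in enumerate(events):
        wx, wy, wz = omega * (t_ref - t)           # small rotation angle
        S = np.array([[0, -wz, wy],
                      [wz, 0, -wx],
                      [-wy, wx, 0]])               # skew-symmetric matrix
        R = np.eye(3) + S + 0.5 * S @ S            # 2nd-order exp(S) approximation
        ray = R @ (K_inv @ np.array([x, y, 1.0]))  # rotate the bearing ray
        uv = K @ (ray / ray[2])                    # reproject to the image plane
        warped[i] = uv[:2]
    return warped
```

Accumulating the warped coordinates into an image under the correct motion hypothesis produces sharp edges; a blurry accumulation indicates a wrong one.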
Event-based Simultaneous Localization and Mapping: A Comprehensive Survey
In recent decades, visual simultaneous localization and mapping (vSLAM) has
gained significant interest in both academia and industry. It estimates camera
motion and reconstructs the environment concurrently using visual sensors on a
moving robot. However, conventional cameras are limited by hardware, including
motion blur and low dynamic range, which can negatively impact performance in
challenging scenarios like high-speed motion and high dynamic range
illumination. Recent studies have demonstrated that event cameras, a new type
of bio-inspired visual sensor, offer advantages such as high temporal
resolution, high dynamic range, low power consumption, and low latency. This paper
presents a timely and comprehensive review of event-based vSLAM algorithms that
exploit the benefits of asynchronous and irregular event streams for
localization and mapping tasks. The review covers the working principle of
event cameras and various event representations for preprocessing event data.
It also categorizes event-based vSLAM methods into four main categories:
feature-based, direct, motion-compensation, and deep learning methods, with
detailed discussions and practical guidance for each approach. Furthermore, the
paper evaluates the state-of-the-art methods on various benchmarks,
highlighting current challenges and future opportunities in this emerging
research area. A public repository will be maintained to keep track of the
rapid developments in this field at
https://github.com/kun150kun/ESLAM-survey.
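As one concrete example of the event representations such surveys cover, the widely used voxel-grid encoding with bilinear temporal weighting can be sketched as follows (a generic formulation; parameter names are ours):

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events into a (num_bins, H, W) spatio-temporal grid.

    events: (N, 4) array of [t_seconds, x_pixel, y_pixel, polarity in {-1, +1}]
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 0]
    # Normalize timestamps to the range [0, num_bins - 1].
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    for (_t, x, y, p), tn in zip(events, t_norm):
        x, y = int(x), int(y)
        lo = int(np.floor(tn))
        w_hi = tn - lo                      # bilinear weight for the upper bin
        grid[lo, y, x] += p * (1.0 - w_hi)
        if lo + 1 < num_bins:
            grid[lo + 1, y, x] += p * w_hi
    return grid
```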
Deep Event Visual Odometry
Event cameras offer the exciting possibility of tracking the camera's pose
during high-speed motion and in adverse lighting conditions. Despite this
promise, existing event-based monocular visual odometry (VO) approaches
demonstrate limited performance on recent benchmarks. To address this
limitation, some methods resort to additional sensors such as IMUs, stereo
event cameras, or frame-based cameras. Nonetheless, these additional sensors
limit the application of event cameras in real-world devices since they
increase cost and complicate system requirements. Moreover, relying on a
frame-based camera makes the system susceptible to motion blur and failures in high-dynamic-range scenes. To
remove the dependency on additional sensors and to push the limits of using
only a single event camera, we present Deep Event VO (DEVO), the first
monocular event-only system with strong performance on a large number of
real-world benchmarks. DEVO sparsely tracks selected event patches over time. A
key component of DEVO is a novel deep patch selection mechanism tailored to
event data. We significantly decrease the pose tracking error on seven
real-world benchmarks by up to 97% compared to event-only methods and often
surpass or are close to stereo or inertial methods. Code is available at
https://github.com/tum-vision/DEVO
Comment: Accepted by 3DV 2024
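For intuition about the patch-selection problem, here is a hand-crafted stand-in (purely illustrative; DEVO's selector is learned, and this density heuristic is the kind of baseline a learned mechanism would be expected to beat) that scores patches by local event count and keeps the top-k:

```python
import numpy as np

def select_event_patches(event_count_map, patch_size=8, k=96):
    """Return centers of the k patches with the most events.

    event_count_map: (H, W) array of per-pixel event counts
    """
    h, w = event_count_map.shape
    scores, centers = [], []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = event_count_map[y:y + patch_size, x:x + patch_size]
            scores.append(patch.sum())
            centers.append((y + patch_size // 2, x + patch_size // 2))
    order = np.argsort(scores)[::-1][:k]      # highest-density patches first
    return [centers[i] for i in order]
```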
Event-aided Direct Sparse Odometry
We introduce EDS, a direct monocular visual odometry method using events and frames.
Our algorithm leverages the event generation model to track the camera motion
in the blind time between frames. The method formulates a direct probabilistic
approach of observed brightness increments. Per-pixel brightness increments are
predicted using a sparse set of selected 3D points and are compared to the
events via the brightness increment error to estimate camera motion. The method
recovers a semi-dense 3D map using photometric bundle adjustment. EDS is the
first method to perform 6-DOF VO using events and frames with a direct
approach. By design, it overcomes the problem of changing appearance in
indirect methods. We also show that, for a target error performance, EDS can
work at lower frame rates than state-of-the-art frame-based VO solutions. This
opens the door to low-power motion-tracking applications where frames are
sparingly triggered "on demand" and our method tracks the motion in between. We
release code and datasets to the public.
Comment: 16 pages, 14 figures. Page: https://rpg.ifi.uzh.ch/ed
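The residual EDS minimizes rests on the standard event generation model: under optical flow v over a short interval dt, the log-brightness change at a pixel is approximately -⟨∇L, v⟩·dt, which can be compared against the increment accumulated from real events. A minimal sketch in our own notation (not the paper's code):

```python
import numpy as np

def brightness_increment_residual(grad_L, flow, dt, event_increment):
    """Residual between model-predicted and event-measured increments.

    grad_L:          (N, 2) log-intensity image gradients at N sparse points
    flow:            (N, 2) optical flow induced by the motion hypothesis
    dt:              duration of the event window (seconds)
    event_increment: (N,) sum of polarity * contrast threshold per point
    """
    predicted = -np.einsum('ij,ij->i', grad_L, flow) * dt
    return predicted - event_increment   # minimized over the camera motion
```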
Event-Based Visual Odometry on Non-Holonomic Ground Vehicles
Despite the promise of superior performance under challenging conditions,
event-based motion estimation remains a hard problem owing to the difficulty of
extracting and tracking stable features from event streams. In order to
robustify the estimation, it is generally believed that fusion with other
sensors is a requirement. In this work, we demonstrate reliable, purely
event-based visual odometry on planar ground vehicles by employing the
constrained non-holonomic motion model of Ackermann steering platforms. We
extend single feature n-linearities for regular frame-based cameras to the case
of quasi time-continuous event-tracks, and achieve a polynomial form via
variable degree Taylor expansions. Robust averaging over multiple event tracks
is simply achieved via histogram voting. As demonstrated on both simulated and
real data, our algorithm achieves accurate and robust estimates of the
vehicle's instantaneous rotational velocity, and thus results that are
comparable to the delta rotations obtained by frame-based sensors under normal
conditions. We furthermore significantly outperform the more traditional
alternatives in challenging illumination scenarios. The code is available at
https://github.com/gowanting/NHEVO.
Comment: Accepted by 3DV 2024
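The histogram-voting step is simple to illustrate: each event track yields its own estimate of the vehicle's instantaneous yaw rate, and the histogram mode rejects outlier tracks. A sketch (bin count and range are illustrative choices, not values from the paper):

```python
import numpy as np

def vote_rotation_rate(track_estimates, bins=181, rate_range=(-0.9, 0.9)):
    """Return the yaw rate (rad/s) backed by the most event tracks.

    track_estimates: 1-D array of per-track rotational velocity estimates
    """
    hist, edges = np.histogram(track_estimates, bins=bins, range=rate_range)
    best = np.argmax(hist)                        # winning bin = consensus vote
    return 0.5 * (edges[best] + edges[best + 1])  # center of the winning bin
```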
Learning to Segment Dynamic Objects using SLAM Outliers
We present a method to automatically learn to segment dynamic objects using
SLAM outliers. It requires only one monocular sequence per dynamic object for
training and consists of localizing dynamic objects using SLAM outliers,
creating their masks, and using these masks to train a semantic segmentation
network. We integrate the trained network in ORB-SLAM 2 and LDSO. At runtime we
remove features on dynamic objects, making the SLAM unaffected by them. We also
propose a new stereo dataset and new metrics to evaluate SLAM robustness. Our
dataset includes consensus inversions, i.e., situations where the SLAM uses
more features on dynamic objects than on the static background. Consensus
inversions are challenging for SLAM as they may cause major SLAM failures. Our
approach performs better than the state of the art on the TUM RGB-D dataset in
monocular mode and on our dataset in both monocular and stereo modes.
Comment: Accepted to ICPR 2020
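The runtime filtering step can be illustrated with a small hypothetical helper (not the authors' code) that drops keypoints falling on the predicted dynamic-object mask before the SLAM front-end consumes them:

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask):
    """Keep only keypoints on the static background.

    keypoints:    (N, 2) array of [u, v] pixel coordinates, assumed in-bounds
    dynamic_mask: (H, W) boolean array, True on segmented dynamic objects
    """
    u = keypoints[:, 0].astype(int)
    v = keypoints[:, 1].astype(int)
    keep = ~dynamic_mask[v, u]        # mask indexed by row (v), column (u)
    return keypoints[keep]
```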