Learning Monocular Visual Odometry through Geometry-Aware Curriculum Learning
Inspired by the cognitive process of humans and animals, Curriculum Learning
(CL) trains a model by gradually increasing the difficulty of the training
data. In this paper, we study whether CL can be applied to complex geometry
problems like estimating monocular Visual Odometry (VO). Unlike existing CL
approaches, we present a novel CL strategy for learning the geometry of
monocular VO by gradually making the learning objective more difficult during
training. To this end, we propose a novel geometry-aware objective function by
jointly optimizing relative and composite transformations over small windows
via a bounded pose regression loss. A cascade optical flow network followed by a
recurrent network with a differentiable windowed composition layer, termed
CL-VO, is devised to learn the proposed objective. Evaluation on three
real-world datasets shows superior performance of CL-VO over state-of-the-art
feature-based and learning-based VO. Comment: accepted in IEEE ICRA 201
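The windowed objective above can be illustrated with a minimal sketch: relative pose errors are penalised per frame, their composition over the window is penalised as well, and each term is bounded. This is an illustrative guess at the structure, not the paper's implementation; the `bound` cap and the error metric are assumptions.

```python
import numpy as np

def pose_error(T_pred, T_gt):
    """Translation distance plus rotation angle between two 4x4 pose matrices."""
    t_err = np.linalg.norm(T_pred[:3, 3] - T_gt[:3, 3])
    R_rel = T_pred[:3, :3].T @ T_gt[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return t_err + np.arccos(cos_angle)

def windowed_pose_loss(rel_pred, rel_gt, bound=1.0):
    """Jointly penalise relative poses and their window-level composition,
    clipping each term at `bound` (hypothetical bounding choice)."""
    loss = 0.0
    comp_pred, comp_gt = np.eye(4), np.eye(4)
    for Tp, Tg in zip(rel_pred, rel_gt):
        loss += min(pose_error(Tp, Tg), bound)  # bounded relative term
        comp_pred = comp_pred @ Tp              # compose predictions over window
        comp_gt = comp_gt @ Tg                  # compose ground truth over window
    loss += min(pose_error(comp_pred, comp_gt), bound)  # bounded composite term
    return loss
```

Raising the weight of the composite term as training proceeds would be one way to make the objective gradually harder, in the spirit of the curriculum described.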
ATDN vSLAM: An all-through Deep Learning-Based Solution for Visual Simultaneous Localization and Mapping
In this paper, a novel solution is introduced for visual Simultaneous
Localization and Mapping (vSLAM) that is built up of Deep Learning components.
The proposed architecture is a highly modular framework in which each component
offers state-of-the-art results in its respective field of vision-based deep
learning. The paper shows that with the synergistic integration of these
individual building blocks, a functioning and efficient all-through deep neural
(ATDN) vSLAM system can be created. The Embedding Distance Loss function is
introduced, and the ATDN architecture is trained with it. The resulting system
achieved a 4.4% translation error and 0.0176 deg/m rotational error on a
subset of the KITTI dataset. The proposed architecture can be used for
efficient and low-latency autonomous driving (AD) aiding database creation as
well as a basis for autonomous vehicle (AV) control. Comment: Published in Periodica Polytechnica Electrical Engineering, 11 page
On Deep Learning Enhanced Multi-Sensor Odometry and Depth Estimation
In this thesis, we systematically study the integration of deep learning and simultaneous localization and mapping (SLAM) and advance the research frontier by making the following contributions. (1) We devise a unified information-theoretic framework for end-to-end learning methods aimed at odometry estimation, which not only improves the accuracy empirically, but also provides an elegant theoretical tool for performance evaluation and understanding in information-theoretic language. (2) For the integration of learning and geometry, we focus on the scale ambiguity problem in monocular SLAM and odometry systems. To this end, we first propose VRVO (Virtual-to-Real Visual Odometry), which retrieves the absolute scale from virtual data, adapts the learnt features between real and virtual domains, and establishes a mutual reinforcement pipeline between learning and optimization to further leverage their complementary information. The depth maps are used to carry the scale information and are then integrated with classical SLAM systems by providing initialization values and dense virtual stereo objectives. (3) Since modern sensor suites usually contain multiple sensors, including a camera and an IMU, we further propose DynaDepth, an unsupervised monocular depth estimation method that integrates IMU motion dynamics. A differentiable camera-centric extended Kalman filter (EKF) framework is derived to exploit the complementary information from both camera and IMU sensors, which also provides an uncertainty measure for the ego-motion predictions. The proposed depth network not only learns the absolute scale, but also exhibits better generalization ability and robustness against vision degradation. The resulting depth predictions can be integrated into classical SLAM systems in a similar way as VRVO to achieve a scale-aware monocular SLAM system during inference.
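The EKF fusion described in contribution (3) rests on the standard predict/update cycle: IMU dynamics propagate the state, and a vision-derived measurement corrects it. A minimal linear sketch of that cycle, with all matrices and noise models as stated assumptions (the thesis derives a differentiable, camera-centric variant):

```python
import numpy as np

def ekf_predict(x, P, F, Q):
    """Propagate state x and covariance P with a (linearised) motion model F,
    e.g. IMU dynamics, adding process noise Q."""
    return F @ x, F @ P @ F.T + Q

def ekf_update(x, P, z, H, R):
    """Correct the prediction with a measurement z (e.g. vision ego-motion)
    under observation model H and measurement noise R."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)                # state correction
    P = (np.eye(len(x)) - K @ H) @ P       # covariance shrinks after update
    return x, P
```

The posterior covariance P is what provides an uncertainty measure for the ego-motion estimate, as the abstract notes.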
DeepTIO: a deep thermal-inertial odometry with visual hallucination
Visual odometry shows excellent performance in a wide range of environments. However, in visually denied scenarios (e.g. heavy smoke or darkness), pose estimates degrade or even fail. Thermal cameras are commonly used for perception and inspection when the environment has low visibility. However, their use in odometry estimation is hampered by the lack of robust visual features, in part because the sensor measures the ambient temperature profile rather than scene appearance and geometry. To overcome this issue, we propose a Deep Neural Network model for thermal-inertial odometry (DeepTIO) that incorporates a visual hallucination network to provide the thermal network with complementary information. The hallucination network is taught to predict fake visual features from thermal images by using a Huber loss. We also employ selective fusion to attentively fuse the features from three different modalities, i.e., thermal, hallucination, and inertial features. Extensive experiments are performed on hand-held and mobile robot data in benign and smoke-filled environments, showing the efficacy of the proposed model.
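The Huber loss used to train the hallucination network is standard: quadratic for small residuals, linear beyond a threshold `delta`, which keeps the fake-feature regression robust to outliers. A minimal numpy sketch:

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    """Huber loss: 0.5*r^2 for |r| <= delta, delta*(|r| - 0.5*delta) beyond.
    The delta value here is an assumption, not the paper's setting."""
    r = np.abs(pred - target)
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return float(np.mean(np.where(r <= delta, quadratic, linear)))
```

Compared with a plain L2 loss, the linear tail prevents a few badly hallucinated features from dominating the gradient.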
AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation
Motion estimation approaches typically employ sensor fusion techniques, such
as the Kalman Filter, to handle individual sensor failures. More recently, deep
learning-based fusion approaches have been proposed, increasing performance
while requiring fewer model-specific implementations. However, current deep fusion
approaches often assume that sensors are synchronised, which is not always
practical, especially for low-cost hardware. To address this limitation, in
this work, we propose AFT-VO, a novel transformer-based sensor fusion
architecture to estimate VO from multiple sensors. Our framework combines
predictions from asynchronous multi-view cameras and accounts for the time
discrepancies of measurements coming from different sources.
Our approach first employs a Mixture Density Network (MDN) to estimate the
probability distributions of the 6-DoF poses for every camera in the system.
Then a novel transformer-based fusion module, AFT-VO, is introduced, which
combines these asynchronous pose estimations, along with their confidences.
More specifically, we introduce Discretiser and Source Encoding techniques
which enable the fusion of multi-source asynchronous signals.
We evaluate our approach on the popular nuScenes and KITTI datasets. Our
experiments demonstrate that multi-view fusion for VO estimation provides
robust and accurate trajectories, outperforming the state of the art under both
challenging weather and lighting conditions.
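The MDN pose head described above predicts a probability distribution over each pose quantity rather than a point estimate, typically trained by minimising the mixture's negative log-likelihood. A one-dimensional Gaussian-mixture sketch (the mixture form and parameterisation are assumptions; AFT-VO's exact head is not reproduced here):

```python
import numpy as np

def mdn_nll(pi, mu, sigma, y):
    """Negative log-likelihood of scalar y under a 1-D Gaussian mixture.
    pi, mu, sigma: (K,) mixture weights, means, std devs an MDN head
    would predict for one pose dimension."""
    densities = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return float(-np.log(np.sum(pi * densities)))  # mixture likelihood, then -log
```

The predicted variances double as per-camera confidences, which is the kind of signal a downstream fusion module can weight asynchronous estimates by.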