Visual Odometry Revisited: What Should Be Learnt?
In this work, we present a monocular visual odometry (VO) algorithm which
leverages both geometry-based methods and deep learning. Most existing VO/SLAM
systems with superior performance are based on geometry and have to be
carefully designed for different application scenarios. Moreover, most
monocular systems suffer from the scale-drift issue. Some recent deep learning
works learn VO in an end-to-end manner, but the performance of these deep
systems is still not comparable to that of geometry-based methods. In this
work, we revisit the basics of VO and explore the right way to integrate deep
learning with epipolar geometry and the Perspective-n-Point (PnP) method.
Specifically, we train two convolutional neural networks (CNNs) to estimate
single-view depths and two-view optical flows as intermediate outputs. With
these deep predictions, we design a simple but robust frame-to-frame VO
algorithm (DF-VO) which outperforms both pure deep learning-based and
geometry-based methods. More importantly, aided by a scale-consistent
single-view depth CNN, our system does not suffer from the scale-drift issue.
Extensive experiments on the KITTI dataset show the robustness of our system,
and a detailed ablation study shows the effect of different factors in our
system.

Comment: ICRA2020. Demo video: https://youtu.be/Nl8mFU4SJKY Code:
https://github.com/Huangying-Zhan/DF-V
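
The entry above pairs learned predictions with two classical solvers: an
essential matrix estimated from the 2D-2D correspondences induced by optical
flow, and PnP on 3D-2D correspondences obtained by back-projecting pixels with
the predicted depth. As a rough illustration only, and not the authors' DF-VO
implementation, the following sketch feeds hypothetical CNN outputs (a depth
map `depth`, a flow field `flow`, intrinsics `K`) into OpenCV's solvers:

import numpy as np
import cv2

def frame_to_frame_pose(depth, flow, K, grid_step=8):
    # Hypothetical inputs (not from the paper's code): depth is an (H, W)
    # single-view depth map for frame 1, flow is an (H, W, 2) optical flow
    # from frame 1 to frame 2, K is the 3x3 camera intrinsic matrix.
    H, W = depth.shape
    ys, xs = np.mgrid[0:H:grid_step, 0:W:grid_step]
    ys, xs = ys.ravel(), xs.ravel()
    pts1 = np.stack([xs, ys], axis=1).astype(np.float64)
    pts2 = pts1 + flow[ys, xs]  # flow-induced correspondences in frame 2

    # Epipolar branch: 2D-2D correspondences give the rotation and an
    # up-to-scale translation direction via the essential matrix.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t_dir, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # PnP branch: back-project frame-1 pixels with the predicted depth,
    # then solve 3D-2D PnP; the depth supplies the translation scale.
    z = depth[ys, xs]
    rays = np.linalg.inv(K) @ np.vstack([pts1.T, np.ones(len(pts1))])
    pts3d = np.ascontiguousarray((rays * z).T)
    _, rvec, tvec, _ = cv2.solvePnPRansac(pts3d, pts2, K, None)
    R_pnp, _ = cv2.Rodrigues(rvec)
    return R, t_dir, R_pnp, tvec

A complete system such as DF-VO additionally filters correspondences, chooses
between the two branches, and uses the scale-consistent depth to resolve the
up-to-scale translation; the sketch shows only the two geometric cores.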
SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation
Depth estimation from images serves as the fundamental step of 3D perception
for autonomous driving and is an economical alternative to expensive depth
sensors like LiDAR. Temporal photometric constraints enable self-supervised
depth estimation without labels, further facilitating its
application. However, most existing methods predict the depth solely based on
each monocular image and ignore the correlations among multiple surrounding
cameras, which are typically available for modern self-driving vehicles. In
this paper, we propose a SurroundDepth method to incorporate the information
from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views
and propose a cross-view transformer to effectively fuse the information from
multiple views. We apply cross-view self-attention to efficiently enable
global interactions between multi-camera feature maps. Different from
self-supervised monocular depth estimation, we are able to predict real-world
scales given multi-camera extrinsic matrices. To achieve this goal, we adopt
two-frame structure-from-motion to extract scale-aware pseudo depths to
pretrain the models. Further, instead of predicting the ego-motion of each
individual camera, we estimate a universal ego-motion of the vehicle and
transfer it to each view to achieve multi-view ego-motion consistency. In
experiments, our method achieves state-of-the-art performance on the
challenging multi-camera depth estimation datasets DDAD and nuScenes.

Comment: Accepted to CoRL 2022. Project page:
https://surrounddepth.ivg-research.xyz Code:
https://github.com/weiyithu/SurroundDept
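
The last step described above, transferring a single vehicle-level ego-motion
to every camera, is a change of reference frame. A minimal sketch, assuming
4x4 homogeneous transforms and camera-to-vehicle extrinsics (names
hypothetical, not the authors' code):

import numpy as np

def per_camera_egomotion(T_vehicle, extrinsics):
    # T_vehicle: (4, 4) rigid motion of the vehicle between two frames.
    # extrinsics: iterable of (4, 4) camera-to-vehicle transforms.
    poses = []
    for E in extrinsics:
        # Conjugating by the extrinsic expresses the same rigid motion in
        # camera coordinates: x_vehicle = E @ x_camera.
        poses.append(np.linalg.inv(E) @ T_vehicle @ E)
    return poses

Because every per-camera pose is derived from the one vehicle motion,
multi-view ego-motion consistency holds by construction, which is the point of
estimating a universal ego-motion rather than one per camera.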