DeepFactors: Real-time probabilistic dense monocular SLAM
The ability to estimate rich geometry and camera motion from monocular imagery is fundamental to future interactive robotics and augmented reality applications. Different approaches have been proposed that vary in scene geometry representation (sparse landmarks, dense maps), the consistency metric used for optimising the multi-view problem, and the use of learned priors. We present a SLAM system that unifies these methods in a probabilistic framework while still maintaining real-time performance. This is achieved through the use of a learned compact depth map representation and by reformulating three different types of errors (photometric, reprojection and geometric), which we make use of within standard factor graph software. We evaluate our system on trajectory estimation and depth reconstruction on real-world sequences and present various examples of estimated dense geometry.
Learning meshes for dense visual SLAM
Estimating motion and surrounding geometry of a moving camera remains a challenging inference problem. From an information theoretic point of view, estimates should get better as more information is included, such as is done in dense SLAM, but this is strongly dependent on the validity of the underlying models. In the present paper, we use triangular meshes as a geometry representation that is both compact and dense. To allow for simple and fast usage, we propose a view-based formulation for which we predict the in-plane vertex coordinates directly from images and then employ the remaining vertex depth components as free variables. Flexible and continuous integration of information is achieved through the use of a residual-based inference technique. This so-called factor graph encodes all information as a mapping from free variables to residuals, the squared sum of which is minimised during inference. We propose the use of different types of learnable residuals, which are trained end-to-end to increase their suitability as information-bearing models and to enable accurate and reliable estimation. Detailed evaluation of all components is provided on both synthetic and real data, which confirms the practicability of the presented approach.
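The residual-based inference mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two factor types (`prior_factor`, `difference_factor`), the three scalar variables, and all measurement values are hypothetical, chosen only to show how a factor graph maps free variables to residuals whose squared sum is minimised, here with plain Gauss-Newton in NumPy.

```python
import numpy as np

# Each factor maps the free variables x to a residual vector and its Jacobian.
# Factor types and measurement values are hypothetical, for illustration only.

def prior_factor(index, measured):
    """Residual x[index] - measured (anchors one variable)."""
    def evaluate(x):
        J = np.zeros((1, x.size))
        J[0, index] = 1.0
        return np.array([x[index] - measured]), J
    return evaluate

def difference_factor(i, j, measured):
    """Residual (x[j] - x[i]) - measured (relative constraint)."""
    def evaluate(x):
        J = np.zeros((1, x.size))
        J[0, i] = -1.0
        J[0, j] = 1.0
        return np.array([x[j] - x[i] - measured]), J
    return evaluate

def gauss_newton(factors, x0, iterations=10):
    """Minimise the sum of squared residuals over all factors."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iterations):
        residuals, jacobians = zip(*(f(x) for f in factors))
        r = np.concatenate(residuals)
        J = np.vstack(jacobians)
        # Solve the linearised least-squares problem J dx = -r.
        dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x += dx
        if np.linalg.norm(dx) < 1e-12:
            break
    return x

factors = [
    prior_factor(0, 1.0),
    difference_factor(0, 1, 0.5),
    difference_factor(1, 2, 0.5),
    difference_factor(0, 2, 1.1),  # slightly inconsistent with the two above
]
estimate = gauss_newton(factors, x0=np.zeros(3))
cost = sum(float(r @ r) for r, _ in (f(estimate) for f in factors))
```

Because the three relative measurements disagree slightly, no assignment drives every residual to zero; the minimiser spreads the inconsistency across the factors, which is the behaviour the squared-sum objective is meant to produce. In the paper's setting the residuals would be learned and nonlinear, so more than one Gauss-Newton iteration would be needed.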
CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth
In this work, we present a lightweight, tightly-coupled deep depth network
and visual-inertial odometry (VIO) system, which can provide accurate state
estimates and dense depth maps of the immediate surroundings. Leveraging the
proposed lightweight Conditional Variational Autoencoder (CVAE) for depth
inference and encoding, we provide the network with previously marginalized
sparse features from VIO to increase the accuracy of initial depth prediction
and generalization capability. The compact encoded depth maps are then updated
jointly with navigation states in a sliding window estimator in order to
provide the dense local scene geometry. We additionally propose a novel method
to obtain the CVAE's Jacobian which is shown to be more than an order of
magnitude faster than previous works, and we also leverage
First-Estimate Jacobians (FEJ) to avoid recalculation. As opposed to previous
works relying on completely dense residuals, we propose to only provide sparse
measurements to update the depth code and show through careful experimentation
that our choice of sparse measurements and FEJs can still significantly improve
the estimated depth maps. Our full system also exhibits state-of-the-art pose
estimation accuracy, and we show that it can run in real-time with
single-thread execution while utilizing GPU acceleration only for the network
and code Jacobian.
Comment: 6 figures
iDF-SLAM: End-to-End RGB-D SLAM with Neural Implicit Mapping and Deep Feature Tracking
We propose a novel end-to-end RGB-D SLAM, iDF-SLAM, which adopts a
feature-based deep neural tracker as the front-end and a NeRF-style neural
implicit mapper as the back-end. The neural implicit mapper is trained
on-the-fly, while the neural tracker, though pretrained on the ScanNet
dataset, is also finetuned along with the training of the neural implicit
mapper. Under such a design, our iDF-SLAM is capable of learning to use
scene-specific features for camera tracking, thus enabling lifelong learning of
the SLAM system. The training of both the tracker and the mapper is
self-supervised, without introducing ground truth poses. We test the performance
of our iDF-SLAM on the Replica and ScanNet datasets and compare the results to
those of two recent NeRF-based neural SLAM systems. The proposed iDF-SLAM
demonstrates state-of-the-art results in terms of scene reconstruction and
competitive performance in camera tracking.
Comment: 7 pages, 6 figures, 3 tables
DVI-SLAM: A Dual Visual Inertial SLAM Network
Recent deep learning based visual simultaneous localization and mapping
(SLAM) methods have made significant progress. However, how to make full use of
visual information and better integrate inertial measurement unit (IMU) data
in visual SLAM remains an open research question. This paper proposes a novel
deep SLAM network with dual visual factors. The basic idea is to integrate both
a photometric factor and a re-projection factor into an end-to-end differentiable
structure through a multi-factor data association module. We show that the
proposed network dynamically learns and adjusts the confidence maps of both
visual factors and it can be further extended to include the IMU factors as
well. Extensive experiments validate that our proposed method significantly
outperforms the state-of-the-art methods on several public datasets, including
TartanAir, EuRoC and ETH3D-SLAM. Specifically, when dynamically fusing the
three factors together, the absolute trajectory error on the EuRoC dataset is
reduced by 45.3% and 36.2% for the monocular and stereo configurations,
respectively.
Comment: 7 pages, 3 figures