DPC-Net: Deep Pose Correction for Visual Localization
We present a novel method to fuse the power of deep networks with the
computational efficiency of geometric and probabilistic localization
algorithms. In contrast to other methods that completely replace a classical
visual estimator with a deep network, we propose an approach that uses a
convolutional neural network to learn difficult-to-model corrections to the
estimator from ground-truth training data. To this end, we derive a novel loss
function for learning SE(3) corrections based on a matrix Lie groups approach,
with a natural formulation for balancing translation and rotation errors. We
use this loss to train a Deep Pose Correction network (DPC-Net) that predicts
corrections for a particular estimator, sensor and environment. Using the KITTI
odometry dataset, we demonstrate significant improvements to the accuracy of a
computationally-efficient sparse stereo visual odometry pipeline, rendering
it as accurate as a modern computationally-intensive dense estimator. Further,
we show how DPC-Net can be used to mitigate the effect of poorly calibrated
lens distortion parameters.
Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the
IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane,
Australia, May 21-25, 2018
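The idea of correcting a classical estimate with an SE(3) transform can be sketched as follows. This is a minimal illustration, not the loss derived in the paper: the left-multiplication convention (T_true = T_corr @ T_est), the helper names (`make_pose`, `rotation_angle`, `pose_error`), the weights, and the numeric poses are all assumptions made for the example.

```python
import numpy as np

def rotation_angle(R):
    """Rotation angle (radians) of a 3x3 rotation matrix, from its trace."""
    cos_theta = (np.trace(R) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def make_pose(yaw, t):
    """Build a 4x4 SE(3) matrix from a yaw angle (rad) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T

def pose_error(T_pred, T_true, rot_weight=1.0, trans_weight=1.0):
    """Weighted squared rotation + translation error (illustrative only)."""
    T_err = np.linalg.inv(T_true) @ T_pred
    angle = rotation_angle(T_err[:3, :3])
    dist = float(np.linalg.norm(T_err[:3, 3]))
    return rot_weight * angle**2 + trans_weight * dist**2

# Hypothetical values: a ground-truth motion and a slightly wrong estimate.
T_true = make_pose(0.10, [1.00, 0.00, 0.0])
T_est = make_pose(0.08, [0.95, 0.02, 0.0])

# The correction a network would be trained to predict (assumed convention).
T_corr = T_true @ np.linalg.inv(T_est)
T_fixed = T_corr @ T_est

print(pose_error(T_est, T_true), pose_error(T_fixed, T_true))
```

The two weights stand in for the balance between rotation and translation errors that the abstract mentions; the paper obtains that balance from its Lie-group formulation rather than from hand-set scalars.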
Learning monocular visual odometry with dense 3D mapping from dense 3D flow
This paper introduces a fully deep learning approach to monocular SLAM, which
can perform simultaneous localization using a neural network for learning
visual odometry (L-VO) and dense 3D mapping. Dense 2D flow and a depth image
are generated from monocular images by sub-networks, which are then used by a
3D flow associated layer in the L-VO network to generate dense 3D flow. Given
this 3D flow, the dual-stream L-VO network can then predict the 6DOF relative
pose and furthermore reconstruct the vehicle trajectory. In order to learn the
correlation between motion directions, bivariate Gaussian modelling is
employed in the loss function. The L-VO network achieves an overall performance
of 2.68% for average translational error and 0.0143 deg/m for average
rotational error on the KITTI odometry benchmark. Moreover, the learned depth
is fully leveraged to generate a dense 3D map. As a result, an entire visual
SLAM system, that is, learning monocular odometry combined with dense 3D
mapping, is achieved.
Comment: International Conference on Intelligent Robots and Systems (IROS 2018)
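A bivariate Gaussian term in a loss function is usually a negative log-likelihood. The sketch below shows the standard closed form for two correlated scalars; which motion components the L-VO network pairs up, and how it parameterises them, is not stated in the abstract, so the function name and arguments here are assumptions.

```python
import math

def bivariate_gaussian_nll(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Negative log-likelihood of (x, y) under a bivariate Gaussian.

    rho is the correlation coefficient between the two variables.
    """
    if sigma_x <= 0 or sigma_y <= 0 or not -1.0 < rho < 1.0:
        raise ValueError("need sigma_x > 0, sigma_y > 0 and -1 < rho < 1")
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho
    # Quadratic (Mahalanobis) part of the exponent.
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * one_minus_rho2)
    # Log of the normalising constant 2*pi*sigma_x*sigma_y*sqrt(1 - rho^2).
    log_norm = math.log(2.0 * math.pi * sigma_x * sigma_y) + 0.5 * math.log(one_minus_rho2)
    return quad + log_norm

# With rho = 0 the loss reduces to the sum of two independent 1D Gaussian terms.
print(bivariate_gaussian_nll(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0))
```

A nonzero rho lets the loss reward predictions whose errors in the two components move together, which is one way to read "learning the correlation between motion directions".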
Towards Visual Ego-motion Learning in Robots
Many model-based Visual Odometry (VO) algorithms have been proposed in the
past decade, often restricted to a particular type of camera optics or
underlying motion manifold. We envision robots that can learn and perform
these tasks, in a minimally supervised setting, as they gain more experience.
To this end, we propose a fully trainable solution to visual ego-motion
estimation for varied camera optics. We propose a visual ego-motion learning
architecture that maps observed optical flow vectors to an ego-motion density
estimate via a Mixture Density Network (MDN). By modeling the architecture as a
Conditional Variational Autoencoder (C-VAE), our model is able to provide
introspective reasoning and prediction for ego-motion induced scene-flow.
Additionally, our proposed model is especially amenable to bootstrapped
ego-motion learning in robots where the supervision in ego-motion estimation
for a particular camera sensor can be obtained from standard navigation-based
sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through
experiments, we show the utility of our proposed approach in enabling the
concept of self-supervised learning for visual ego-motion estimation in
autonomous robots.
Comment: Conference paper; Submitted to IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS) 2017, Vancouver CA; 8 pages, 8 figures,
2 tables
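A Mixture Density Network outputs the parameters of a mixture distribution and is typically trained by minimising the negative log-likelihood of the target under that mixture. The following is a minimal one-dimensional sketch of that loss, assuming Gaussian components; the function name and the scalar target are illustrative choices, and the paper's actual ego-motion parameterisation is not given in the abstract.

```python
import math

def mixture_nll(target, weights, means, sigmas):
    """Negative log-likelihood of a scalar under a 1D Gaussian mixture."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) <= 0.0:
        raise ValueError("weights must be positive and sum to 1")
    if min(sigmas) <= 0.0:
        raise ValueError("sigmas must be positive")
    log_terms = []
    for w, mu, s in zip(weights, means, sigmas):
        z = (target - mu) / s
        # log(w) + log N(target | mu, s^2)
        log_terms.append(math.log(w) - 0.5 * z * z - math.log(s)
                         - 0.5 * math.log(2.0 * math.pi))
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))

# Hypothetical mixture with two modes; the target sits near the second mode.
print(mixture_nll(0.9, [0.3, 0.7], [-1.0, 1.0], [0.5, 0.5]))
```

Predicting a density rather than a single value is what lets such a model express uncertainty about the ego-motion estimate.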
Non-iterative RGB-D-inertial Odometry
This paper presents a non-iterative solution for an RGB-D-inertial odometry
system. Traditional odometry methods resort to iterative algorithms which are
usually computationally expensive or require well-designed initialization. To
overcome this problem, this paper proposes to combine a non-iterative front-end
(odometry) with an iterative back-end (loop closure) for the RGB-D-inertial
SLAM system. The main contribution lies in the novel non-iterative front-end,
which leverages inertial fusion and kernel cross-correlators (KCC) to match
point clouds in the frequency domain. Dominated by the fast Fourier transform
(FFT), our method is only of complexity $O(n \log n)$, where $n$ is
the number of points. Map fusion is conducted by element-wise operations, so
that both time and space complexity are further reduced. Extensive experiments
show that, thanks to the lightweight front-end, the framework runs much
faster while achieving accuracy comparable to the state of the art.
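Matching in the frequency domain avoids iteration because a cross-correlation over all shifts can be computed with a few FFTs. The sketch below shows this for a one-dimensional signal with an integer circular shift. It is plain cross-correlation, not the kernel cross-correlator (KCC) used in the paper, and the function name and test signal are assumptions made for illustration.

```python
import numpy as np

def circular_shift_by_fft(reference, shifted):
    """Estimate the integer s such that shifted == np.roll(reference, s)."""
    F_ref = np.fft.fft(reference)
    F_shift = np.fft.fft(shifted)
    # Cross-correlation theorem: one element-wise product in the frequency
    # domain evaluates the correlation at every circular shift at once.
    corr = np.fft.ifft(F_shift * np.conj(F_ref)).real
    return int(np.argmax(corr))

# Hypothetical signal: random samples, circularly shifted by 5 positions.
rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
moved = np.roll(signal, 5)

print(circular_shift_by_fft(signal, moved))
```

The cost is dominated by the three FFT calls, each $O(n \log n)$ in the signal length, and the product between them is element-wise.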