DPC-Net: Deep Pose Correction for Visual Localization
We present a novel method to fuse the power of deep networks with the
computational efficiency of geometric and probabilistic localization
algorithms. In contrast to other methods that completely replace a classical
visual estimator with a deep network, we propose an approach that uses a
convolutional neural network to learn difficult-to-model corrections to the
estimator from ground-truth training data. To this end, we derive a novel loss
function for learning SE(3) corrections based on a matrix Lie groups approach,
with a natural formulation for balancing translation and rotation errors. We
use this loss to train a Deep Pose Correction network (DPC-Net) that predicts
corrections for a particular estimator, sensor and environment. Using the KITTI
odometry dataset, we demonstrate significant improvements to the accuracy of a
computationally-efficient sparse stereo visual odometry pipeline that render
it as accurate as a modern computationally-intensive dense estimator. Further,
we show how DPC-Net can be used to mitigate the effect of poorly calibrated
lens distortion parameters.
Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the
IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane,
Australia, May 21-25, 201
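As a rough illustration of the kind of loss described in the DPC-Net abstract above, a Lie-group pose-correction loss could take the following form. The symbols are assumptions introduced here, not taken from this listing: $\boldsymbol{\xi} \in \mathbb{R}^6$ is the predicted correction in the Lie algebra, $\mathbf{T}^{*} \in \mathrm{SE}(3)$ is the ground-truth correction, and $\boldsymbol{\Sigma}$ is a $6 \times 6$ covariance matrix.

```latex
\mathcal{L}(\boldsymbol{\xi}) = \tfrac{1}{2}\, g(\boldsymbol{\xi})^{\top} \boldsymbol{\Sigma}^{-1} g(\boldsymbol{\xi}),
\qquad
g(\boldsymbol{\xi}) = \ln\!\left( \exp\!\left(\boldsymbol{\xi}^{\wedge}\right) \mathbf{T}^{*-1} \right)^{\vee}
```

Under this form, $g$ is the 6-vector residual between prediction and target, and the translation and rotation blocks of $\boldsymbol{\Sigma}$ set the relative weighting of the two error types, which is one way to read the "natural formulation for balancing translation and rotation errors".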
How to Train a CAT: Learning Canonical Appearance Transformations for Direct Visual Localization Under Illumination Change
Direct visual localization has recently enjoyed a resurgence in popularity
with the increasing availability of cheap mobile computing power. The
competitive accuracy and robustness of these algorithms compared to
state-of-the-art feature-based methods, as well as their natural ability to
yield dense maps, make them an appealing choice for a variety of mobile
robotics applications. However, direct methods remain brittle in the face of
appearance change due to their underlying assumption of photometric
consistency, which is commonly violated in practice. In this paper, we propose
to mitigate this problem by training deep convolutional encoder-decoder models
to transform images of a scene such that they correspond to a previously-seen
canonical appearance. We validate our method in multiple environments and
illumination conditions using high-fidelity synthetic RGB-D datasets, and
integrate the trained models into a direct visual localization pipeline,
yielding improvements in visual odometry (VO) accuracy through time-varying
illumination conditions, as well as improved metric relocalization performance
under illumination change, where conventional methods normally fail. We further
provide a preliminary investigation of transfer learning from synthetic to real
environments in a localization context. An open-source implementation of our
method using PyTorch is available at https://github.com/utiasSTARS/cat-net.
Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the
IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane,
Australia, May 21-25, 201
Learning a Bias Correction for Lidar-only Motion Estimation
This paper presents a novel technique to correct for bias in a classical
estimator using a learning approach. We apply a learned bias correction to a
lidar-only motion estimation pipeline. Our technique trains a Gaussian process
(GP) regression model using data with ground truth. The inputs to the model are
high-level features derived from the geometry of the point-clouds, and the
outputs are the predicted biases between poses computed by the estimator and
the ground truth. The predicted biases are applied as a correction to the poses
computed by the estimator.
Our technique is evaluated on over 50 km of lidar data, which includes the
KITTI odometry benchmark and lidar datasets collected around the University of
Toronto campus. After applying the learned bias correction, we obtained
significant improvements to lidar odometry in all datasets tested. We achieved
a reduction of around 10% in errors on all datasets from an already accurate lidar
odometry algorithm, at the expense of a less than 1% increase in
computational cost at run-time.
Comment: 15th Conference on Computer and Robot Vision (CRV 2018)
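The learned bias correction described in the abstract above can be sketched with a small Gaussian process regressor. This is a minimal sketch under assumptions made here, not the authors' implementation: the kernel choice (squared exponential), the function names, the noise level, and the simplification to a scalar bias per frame are all hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def fit_gp(features, biases, noise=1e-2):
    """Precompute alpha = (K + noise * I)^-1 y with a Cholesky factorisation."""
    gram = rbf_kernel(features, features) + noise * np.eye(len(features))
    chol = np.linalg.cholesky(gram)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, biases))
    return features, alpha


def predict_bias(model, query_features):
    """Posterior mean of the bias at the query feature vectors."""
    train_features, alpha = model
    return rbf_kernel(query_features, train_features) @ alpha


def correct_estimate(estimate, predicted_bias):
    """Bias is defined here (an assumption) as estimate minus ground truth."""
    return estimate - predicted_bias
```

In use, `fit_gp` would be called once on training frames that have ground truth, and `predict_bias` followed by `correct_estimate` would run per frame at test time. Because prediction is a single kernel evaluation and a dot product, the added run-time cost of a correction like this is small.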
Inertial learning and haptics for legged robot state estimation in visually challenging environments
Legged robots have enormous potential to automate dangerous or dirty jobs because they are capable of traversing a wide range of difficult terrains such as up stairs or through mud. However, a significant challenge preventing widespread deployment of legged robots is a lack of robust state estimation, particularly in visually challenging conditions such as darkness or smoke.
In this thesis, I address these challenges by exploiting proprioceptive sensing from inertial, kinematic and haptic sensors to provide more accurate state estimation when visual sensors fail. Four different methods are presented, including the use of haptic localisation, terrain semantic localisation, learned inertial odometry, and deep learning to infer the evolution of IMU biases.
The first approach exploits haptics as a source of proprioceptive localisation by comparing geometric information to a prior map. The second method expands on this concept by fusing both semantic and geometric information, allowing for accurate localisation on diverse terrain.
Next, I combine new techniques in inertial learning with classical IMU integration and legged robot kinematics to provide more robust state estimation. This is further developed to use only IMU data, for an application entirely different from robotics: 3D reconstruction of bone with a handheld ultrasound scanner. Finally, I present the novel idea of using deep learning to infer the evolution of IMU biases, improving state estimation in exteroceptive systems where vision fails.
Legged robots have the potential to benefit society by automating dangerous, dull, or dirty jobs and by assisting first responders in emergency situations. However, there remain many unsolved challenges to the real-world deployment of legged robots, including accurate state estimation in vision-denied environments. The work presented in this thesis takes a step towards solving these challenges and enabling the deployment of legged robots in a variety of applications.
On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey
Stereo matching is one of the longest-standing problems in computer vision
with close to 40 years of studies and research. Throughout the years the
paradigm has shifted from local, pixel-level decisions to various forms of
discrete and continuous optimization to data-driven, learning-based methods.
Recently, the rise of machine learning and the rapid proliferation of deep
learning have enhanced stereo matching with exciting new trends and applications
unthinkable until a few years ago. Interestingly, the relationship between
these two worlds is two-way. While machine, and especially deep, learning
has advanced the state of the art in stereo matching, stereo itself has enabled new
ground-breaking methodologies such as self-supervised monocular depth
estimation based on deep networks. In this paper, we review recent research in
the field of learning-based depth estimation from single and binocular images
highlighting the synergies, the successes achieved so far and the open
challenges the community is going to face in the immediate future.
Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial:
"Learning-based depth estimation from stereo and monocular images: successes,
limitations and future challenges"
(https://sites.google.com/view/cvpr-2019-depth-from-image/home)
Self-Supervised Learning for Geometry
This thesis focuses on two fundamental problems in robotic vision: scene geometry understanding and camera tracking. Both tasks have long been the subject of research in robotic vision, and numerous geometric solutions have been proposed in the past decades. In this thesis, we cast these geometric problems as machine learning problems, specifically deep learning problems. Unlike conventional supervised learning methods that use expensive annotations as the supervisory signal, we advocate the use of geometry as a supervisory signal to improve the perceptual capabilities of robots, namely Geometry Self-supervision. With geometry self-supervision, robots can learn and infer the 3D structure of the scene and their ego-motion by watching videos, instead of relying on the expensive ground-truth annotation of traditional supervised learning. After showing the use of geometry for deep learning, we show the possibilities of integrating self-supervised models with traditional geometry-based methods as a hybrid solution to the mapping and tracking problem.
In the first part of this thesis, we focus on an end-to-end mapping problem from stereo data, namely Deep Stereo Matching. Stereo matching is one of the oldest problems in computer vision. Classical approaches to stereo matching typically rely on handcrafted features and a multiple-step solution. Recent deep learning methods use deep neural networks to achieve end-to-end trained approaches while significantly outperforming classical methods. We propose a novel data acquisition pipeline using an untethered device (Microsoft HoloLens) with a Time-of-Flight (ToF) depth camera and stereo cameras to collect real-world data. A novel semi-supervised method is proposed to train networks with ground-truth supervision and self-supervision. The large-scale real-world stereo dataset with semi-dense annotation and dense self-supervision allows our deep stereo matching network to generalize better than prior art.
Mapping and tracking using a single camera (monocular) is a harder problem than using a stereo camera due to various well-known challenges. In the second part of this thesis, we decouple the problem into single-view depth estimation (mapping) and two-view visual odometry (tracking) and propose a self-supervised framework, namely SelfTAM, which jointly learns the depth estimator and the odometry estimator. The self-supervised problem is usually formulated as an energy minimization problem consisting of a multi-view data consistency energy (e.g. photometric) and a prior regularization energy (e.g. a depth smoothness prior). We strengthen the supervision signal with a deep feature consistency energy term and a surface normal regularization term. Although our method trains models with stereo sequences so that a real-world scaling factor is naturally incorporated, only monocular data is required at inference.
In the last part of this thesis, we revisit the basics of visual odometry and explore the best practice for integrating deep learning models with geometry-based visual odometry methods. A robust visual odometry system, DF-VO, is proposed. We use deep networks to establish 2D-2D/3D-2D correspondences and pick the best correspondences from the dense predictions. Feeding the high-quality correspondences into traditional VO methods, e.g. epipolar geometry and Perspective-n-Point, we can solve the visual odometry problem within a more robust framework. With the proposed self-supervised training, we can even allow the models to perform online adaptation at run-time and take a step toward a lifelong learning visual odometry system.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
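The energy formulation mentioned in the abstract above can be sketched generically as follows. This is a commonly used form written under assumptions made here, not the thesis's exact objective: $I_t$ is the target image, $\hat{I}_{s \to t}$ is a source image warped into the target view using the predicted depth $D$ and relative pose $\mathbf{T}$, and $\lambda$ is a hypothetical weighting factor.

```latex
E(D, \mathbf{T}) =
\underbrace{\sum_{p} \left| I_t(p) - \hat{I}_{s \to t}(p) \right|}_{\text{photometric consistency}}
+ \lambda \underbrace{\sum_{p} \left| \nabla D(p) \right| \, e^{-\left| \nabla I_t(p) \right|}}_{\text{edge-aware depth smoothness}}
```

Additional terms, such as the deep feature consistency and surface normal regularization the abstract describes, would enter as further weighted summands of the same energy.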
Deep probabilistic methods for improved radar sensor modelling and pose estimation
Radar’s ability to sense under adverse conditions and at far range makes it a valuable alternative to vision and lidar for mobile robotic applications. However, its complex, scene-dependent sensing process and significant noise artefacts make working with radar challenging. Moving past classical rule-based approaches, which have dominated the literature to date, this thesis investigates deep and data-driven solutions across a range of tasks in robotics.
Firstly, a deep approach is developed for mapping raw sensor measurements to a grid-map of occupancy probabilities, outperforming classical filtering approaches by a significant margin. A distribution over the occupancy state is captured, additionally allowing uncertainty in predictions to be identified and managed. The approach is trained entirely using partial labels generated automatically from lidar, without requiring manual labelling.
Next, a deep model is proposed for generating stochastic radar measurements from simulated elevation maps. The model is trained by learning the forward and backward processes side-by-side, using adversarial and cyclical consistency constraints together with a partial alignment loss, with labels generated in lidar. By faithfully replicating the radar sensing process, new models can be trained for down-stream tasks, using labels that are readily available in simulation. In this case, segmentation models trained on simulated radar measurements, when deployed in the real world, are shown to approach the performance of a model trained entirely on real-world measurements.
Finally, the potential of deep approaches applied to the radar odometry task is explored. A learnt feature space is combined with a classical correlative scan matching procedure and optimised for pose prediction, allowing the proposed method to outperform the previous state-of-the-art by a significant margin. Through a probabilistic consideration the uncertainty in the pose is also successfully characterised. Building upon this success, properties of the Fourier Transform are then utilised to separate the search for translation and angle. It is shown that this decoupled search results in a significant boost to run-time performance, allowing the approach to run in real-time on CPUs and embedded devices, whilst remaining competitive with other radar odometry methods proposed in the literature.
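The Fourier property that makes a decoupled search possible can be illustrated with plain phase correlation. This is a generic sketch and not the thesis's method: the function name and the use of integer circular shifts are assumptions made here. A translation changes only the phase of the spectrum, so the magnitude spectrum can be used to look for rotation independently, while the phase difference yields the translation.

```python
import numpy as np


def phase_correlation_shift(reference, moved):
    """Return the integer (row, col) circular shift mapping reference onto moved."""
    cross_power = np.fft.fft2(moved) * np.conj(np.fft.fft2(reference))
    # Keep only the phase; the small constant avoids division by zero.
    cross_power /= np.abs(cross_power) + 1e-12
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peak indices past the midpoint of an axis correspond to negative shifts.
    return tuple(
        int(p) if p <= n // 2 else int(p) - n
        for p, n in zip(peak, correlation.shape)
    )
```

For example, shifting an image with `np.roll(image, shift=(3, -5), axis=(0, 1))` leaves `np.abs(np.fft.fft2(image))` unchanged, and the function recovers the shift from the phase alone. The cost is dominated by a few FFTs, which is much cheaper than exhaustively scoring every translation and angle pair.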