
    DPC-Net: Deep Pose Correction for Visual Localization

    We present a novel method to fuse the power of deep networks with the computational efficiency of geometric and probabilistic localization algorithms. In contrast to other methods that completely replace a classical visual estimator with a deep network, we propose an approach that uses a convolutional neural network to learn difficult-to-model corrections to the estimator from ground-truth training data. To this end, we derive a novel loss function for learning SE(3) corrections based on a matrix Lie groups approach, with a natural formulation for balancing translation and rotation errors. We use this loss to train a Deep Pose Correction network (DPC-Net) that predicts corrections for a particular estimator, sensor and environment. Using the KITTI odometry dataset, we demonstrate significant improvements to the accuracy of a computationally efficient sparse stereo visual odometry pipeline, rendering it as accurate as a modern, computationally intensive dense estimator. Further, we show how DPC-Net can be used to mitigate the effect of poorly calibrated lens distortion parameters. Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane, Australia, May 21-25, 201
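The matrix Lie group machinery behind "learning SE(3) corrections" can be sketched as follows. This is a minimal illustration, assuming the network outputs a 6-vector correction xi = (rho, phi) (translation first, rotation second) that is applied on the left of the estimator's pose via the exponential map. The names `se3_exp`, `se3_log` and `pose_correction_loss` are hypothetical, and the covariance-weighted log-map loss is a generic way to balance translation against rotation error, not a reproduction of the loss derived for DPC-Net.

```python
import numpy as np


def hat(phi):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    x, y, z = phi
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def vee(M):
    """Inverse of hat: extract the 3-vector from a skew-symmetric matrix."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])


def so3_exp(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-10:
        return np.eye(3) + hat(phi)
    K = hat(phi / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def so3_left_jacobian(phi):
    """Left Jacobian of SO(3); couples rotation and translation in SE(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-10:
        return np.eye(3) + 0.5 * hat(phi)
    K = hat(phi / theta)
    return (np.eye(3) + (1.0 - np.cos(theta)) / theta * K
            + (theta - np.sin(theta)) / theta * (K @ K))


def se3_exp(xi):
    """xi = (rho, phi) in se(3) -> 4x4 homogeneous transform."""
    rho, phi = xi[:3], xi[3:]
    T = np.eye(4)
    T[:3, :3] = so3_exp(phi)
    T[:3, 3] = so3_left_jacobian(phi) @ rho
    return T


def so3_log(R):
    """Rotation matrix -> rotation vector (valid away from theta = pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-10:
        return 0.5 * vee(R - R.T)
    return theta / (2.0 * np.sin(theta)) * vee(R - R.T)


def se3_log(T):
    """4x4 homogeneous transform -> xi = (rho, phi)."""
    phi = so3_log(T[:3, :3])
    rho = np.linalg.solve(so3_left_jacobian(phi), T[:3, 3])
    return np.concatenate([rho, phi])


def pose_correction_loss(xi, T_est, T_gt, Sigma):
    """Hypothetical loss: apply the correction exp(xi^) on the left of the estimate,
    then penalise the remaining SE(3) error with a Mahalanobis norm (Sigma is 6x6)."""
    T_corr = se3_exp(xi) @ T_est
    err = se3_log(T_gt @ np.linalg.inv(T_corr))
    return 0.5 * float(err @ np.linalg.solve(Sigma, err))


# Demo: the ground truth differs from the estimate by exactly exp(xi^)
xi = np.array([0.1, -0.2, 0.3, 0.05, -0.02, 0.1])
T_est = se3_exp(np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.2]))
T_gt = se3_exp(xi) @ T_est
print(pose_correction_loss(xi, T_est, T_gt, np.eye(6)))           # close to 0
print(pose_correction_loss(np.zeros(6), T_est, T_gt, np.eye(6)))  # uncorrected: > 0
```

Choosing a non-identity `Sigma` is what trades off metres of translation error against radians of rotation error; with `np.eye(6)` both are weighted equally, which is rarely appropriate in practice.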

    How to Train a CAT: Learning Canonical Appearance Transformations for Direct Visual Localization Under Illumination Change

    Direct visual localization has recently enjoyed a resurgence in popularity with the increasing availability of cheap mobile computing power. The competitive accuracy and robustness of these algorithms compared to state-of-the-art feature-based methods, as well as their natural ability to yield dense maps, make them an appealing choice for a variety of mobile robotics applications. However, direct methods remain brittle in the face of appearance change due to their underlying assumption of photometric consistency, which is commonly violated in practice. In this paper, we propose to mitigate this problem by training deep convolutional encoder-decoder models to transform images of a scene such that they correspond to a previously seen canonical appearance. We validate our method in multiple environments and illumination conditions using high-fidelity synthetic RGB-D datasets, and integrate the trained models into a direct visual localization pipeline, yielding improvements in visual odometry (VO) accuracy through time-varying illumination conditions, as well as improved metric relocalization performance under illumination change, where conventional methods normally fail. We further provide a preliminary investigation of transfer learning from synthetic to real environments in a localization context. An open-source implementation of our method using PyTorch is available at https://github.com/utiasSTARS/cat-net. Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane, Australia, May 21-25, 201
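Why a violated photometric-consistency assumption hurts direct methods, and why mapping both images to a common appearance helps, can be shown on a toy pair of perfectly aligned images. In this sketch the learned CAT encoder-decoder is replaced by a hand-crafted zero-mean, unit-variance normalization (`normalize_appearance`, a hypothetical stand-in), which only undoes a global gain/offset change.

```python
import numpy as np


def photometric_error(ref, tgt):
    """Sum of squared intensity differences between two pixel-aligned images."""
    r = np.asarray(ref, dtype=np.float64) - np.asarray(tgt, dtype=np.float64)
    return float(np.sum(r * r))


def normalize_appearance(img):
    """Hand-crafted stand-in for a canonical appearance transform:
    rescale the image to zero mean and unit variance."""
    img = np.asarray(img, dtype=np.float64)
    return (img - img.mean()) / (img.std() + 1e-12)


rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 1.0, size=(32, 32))
# Same viewpoint (perfect alignment), but a global gain/offset illumination change
tgt = 0.6 * ref + 0.2

raw = photometric_error(ref, tgt)
canonical = photometric_error(normalize_appearance(ref), normalize_appearance(tgt))
print(f"raw: {raw:.3f}  after normalization: {canonical:.3e}")
```

Even with zero geometric misalignment the raw error is large, so a direct method would try to "explain" it by moving the pose. Real illumination change (shadows, specularities, time of day) is spatially varying rather than a global affine map, which is what motivates learning the transformation instead of hand-crafting it.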

    Learning a Bias Correction for Lidar-only Motion Estimation

    This paper presents a novel technique to correct for bias in a classical estimator using a learning approach. We apply a learned bias correction to a lidar-only motion estimation pipeline. Our technique trains a Gaussian process (GP) regression model using data with ground truth. The inputs to the model are high-level features derived from the geometry of the point clouds, and the outputs are the predicted biases between poses computed by the estimator and the ground truth. The predicted biases are applied as a correction to the poses computed by the estimator. Our technique is evaluated on over 50 km of lidar data, which includes the KITTI odometry benchmark and lidar datasets collected around the University of Toronto campus. After applying the learned bias correction, we obtained significant improvements to lidar odometry in all datasets tested. We achieved an error reduction of around 10% on all datasets relative to an already accurate lidar odometry algorithm, with less than a 1% increase in run-time computational cost. Comment: 15th Conference on Computer and Robot Vision (CRV 2018)
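The learn-then-subtract structure described above can be sketched with a zero-mean GP using a squared-exponential kernel. This is a minimal sketch under stated assumptions: a single scalar pose component, fixed hyperparameters (in practice they would be fit, e.g. by maximizing the marginal likelihood), and two hypothetical geometric features per scan with a synthetic bias function. `GPBiasModel` and `true_bias` are illustrative names, not the paper's code.

```python
import numpy as np


def rbf_kernel(A, B, length_scale, signal_var):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


class GPBiasModel:
    """Zero-mean GP regression from (hypothetical) geometric features to estimator bias."""

    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=1e-6):
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.noise_var = noise_var

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X, self.length_scale, self.signal_var)
        K += self.noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} y, computed through the Cholesky factor K = L L^T
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, X_new):
        """Posterior mean and variance of the bias at new feature vectors."""
        K_s = rbf_kernel(X_new, self.X, self.length_scale, self.signal_var)
        mean = K_s @ self.alpha
        v = np.linalg.solve(self.L, K_s.T)
        var = self.signal_var - np.sum(v**2, axis=0)
        return mean, var


def true_bias(X):
    """Synthetic, smooth, feature-dependent bias (assumption for the demo)."""
    return 0.05 * np.sin(X[:, 0]) + 0.02 * X[:, 1]


rng = np.random.default_rng(1)
X_train = rng.uniform(-3.0, 3.0, size=(200, 2))
# Training targets: estimator output minus ground truth
model = GPBiasModel(signal_var=0.01).fit(X_train, true_bias(X_train))

X_test = rng.uniform(-3.0, 3.0, size=(50, 2))
gt = rng.normal(size=50)              # ground-truth pose component
est = gt + true_bias(X_test)          # biased estimator output
pred_bias, _ = model.predict(X_test)
corrected = est - pred_bias           # apply the learned correction
print(np.abs(est - gt).mean(), np.abs(corrected - gt).mean())
```

The run-time cost of the correction is one kernel row per query plus a dot product, which is consistent with the abstract's point that the correction adds little computation on top of the estimator.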

    Inertial learning and haptics for legged robot state estimation in visually challenging environments

    Legged robots have enormous potential to automate dangerous or dirty jobs because they are capable of traversing a wide range of difficult terrains, such as climbing stairs or crossing mud. However, a significant challenge preventing the widespread deployment of legged robots is a lack of robust state estimation, particularly in visually challenging conditions such as darkness or smoke. In this thesis, I address these challenges by exploiting proprioceptive sensing from inertial, kinematic and haptic sensors to provide more accurate state estimation when visual sensors fail. Four different methods are presented: haptic localisation, terrain semantic localisation, learned inertial odometry, and deep learning to infer the evolution of IMU biases. The first approach exploits haptics as a source of proprioceptive localisation by comparing geometric information to a prior map. The second method expands on this concept by fusing both semantic and geometric information, allowing for accurate localisation on diverse terrain. Next, I combine new techniques in inertial learning with classical IMU integration and legged robot kinematics to provide more robust state estimation. This is further developed to use only IMU data, for an application entirely different from robotics: 3D reconstruction of bone with a handheld ultrasound scanner. Finally, I present the novel idea of using deep learning to infer the evolution of IMU biases, improving state estimation in exteroceptive systems where vision fails. Legged robots have the potential to benefit society by automating dangerous, dull, or dirty jobs and by assisting first responders in emergency situations. However, there remain many unsolved challenges to the real-world deployment of legged robots, including accurate state estimation in vision-denied environments. The work presented in this thesis takes a step towards solving these challenges and enabling the deployment of legged robots in a variety of applications.
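Why inferring IMU biases matters for classical IMU integration can be seen in one dimension: an unmodelled constant accelerometer bias, double-integrated, produces position drift that grows quadratically with time. In this sketch the robot is standing still and the bias estimate is simply given; it is a hypothetical stand-in for what a learned bias model would supply, and `integrate_1d` is an illustrative name.

```python
import numpy as np


def integrate_1d(accel, dt, bias_estimate=0.0):
    """Dead-reckon 1-D position from accelerometer samples, starting at rest.
    The (assumed) bias estimate is subtracted from every sample before integration."""
    p, v = 0.0, 0.0
    positions = []
    for a in accel:
        a_corr = a - bias_estimate
        p += v * dt + 0.5 * a_corr * dt * dt   # exact for piecewise-constant acceleration
        v += a_corr * dt
        positions.append(p)
    return np.array(positions)


dt = 0.01          # 100 Hz IMU
n = 1000           # 10 s of data
true_bias = 0.05   # m/s^2, constant accelerometer bias (assumption)
accel_meas = np.zeros(n) + true_bias   # robot is stationary; only the bias is measured

drift = integrate_1d(accel_meas, dt)[-1]
corrected = integrate_1d(accel_meas, dt, bias_estimate=true_bias)[-1]
print(f"drift without bias model: {drift:.3f} m, with bias model: {corrected:.3f} m")
```

Here a 0.05 m/s^2 bias yields 0.5 * 0.05 * 10^2 = 2.5 m of drift after 10 s. Real biases also evolve over time, which is why inferring their evolution, rather than a single constant, is the harder and more useful problem.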

    State of the art in vision-based localization techniques for autonomous navigation systems


    On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey

    Stereo matching is one of the longest-standing problems in computer vision, with close to 40 years of research. Over the years, the paradigm has shifted from local, pixel-level decisions to various forms of discrete and continuous optimization, and then to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning have enriched stereo matching with exciting new trends and applications that were unthinkable until a few years ago. Interestingly, the relationship between these two worlds runs both ways. While machine learning, and deep learning in particular, has advanced the state of the art in stereo matching, stereo itself has enabled new, ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images, highlighting the synergies, the successes achieved so far and the open challenges the community will face in the immediate future. Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial: "Learning-based depth estimation from stereo and monocular images: successes, limitations and future challenges" (https://sites.google.com/view/cvpr-2019-depth-from-image/home)
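The "local, pixel-level decision" end of that paradigm shift can be sketched as winner-take-all block matching on a rectified scanline, followed by triangulation with Z = f * B / d. This is a generic classical baseline, not a method from the survey; the focal length, baseline and window sizes are illustrative assumptions. Learning-based methods typically replace the hand-crafted SAD cost and the winner-take-all step.

```python
import numpy as np


def disparity_sad(left_row, right_row, x, half_window, max_disp):
    """Winner-take-all disparity for pixel x of a rectified scanline using SAD.
    A left pixel at x is matched against right pixels at x - d."""
    patch = left_row[x - half_window: x + half_window + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half_window < 0:
            break
        cand = right_row[xr - half_window: xr + half_window + 1]
        cost = np.abs(patch - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline_m):
    """Triangulate metric depth Z = f * B / d for a rectified stereo pair."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d


rng = np.random.default_rng(0)
right_row = rng.uniform(0.0, 1.0, size=100)
left_row = np.roll(right_row, 7)   # synthetic scene with a true disparity of 7 px

d = disparity_sad(left_row, right_row, x=50, half_window=3, max_disp=16)
depth = depth_from_disparity(d, focal_px=700.0, baseline_m=0.12)
print(d, depth)
```

Because depth is inversely proportional to disparity, a one-pixel matching error costs far more depth accuracy at long range than at short range, one reason sub-pixel, learned matching matters.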

    Self-Supervised Learning for Geometry

    This thesis focuses on two fundamental problems in robotic vision: scene geometry understanding and camera tracking. Both tasks have long been studied in robotic vision, and numerous geometric solutions have been proposed over the past decades. In this thesis, we cast these geometric problems as machine learning problems, specifically deep learning problems. Unlike conventional supervised learning methods, which use expensive annotations as the supervisory signal, we advocate the use of geometry as a supervisory signal to improve the perceptual capabilities of robots, namely Geometry Self-supervision. With geometry self-supervision, robots can learn to infer the 3D structure of the scene and their ego-motion by watching videos, instead of relying on the expensive ground-truth annotation of traditional supervised learning. After showing how geometry can be used to supervise deep learning, we show the possibilities of integrating self-supervised models with traditional geometry-based methods as a hybrid solution to the mapping and tracking problem. The first part of this thesis focuses on an end-to-end mapping problem from stereo data, namely Deep Stereo Matching. Stereo matching is one of the oldest problems in computer vision. Classical approaches to stereo matching typically rely on handcrafted features and a multi-step solution. Recent deep learning methods use deep neural networks to achieve end-to-end trained approaches that significantly outperform classical methods. We propose a novel data acquisition pipeline that uses an untethered device (Microsoft HoloLens) with a Time-of-Flight (ToF) depth camera and stereo cameras to collect real-world data. A novel semi-supervised method is proposed to train networks with both ground-truth supervision and self-supervision.
The large-scale real-world stereo dataset, with semi-dense annotation and dense self-supervision, allows our deep stereo matching network to generalize better than prior art. Mapping and tracking with a single (monocular) camera is harder than with a stereo camera due to various well-known challenges. In the second part of this thesis, we decouple the problem into single-view depth estimation (mapping) and two-view visual odometry (tracking) and propose a self-supervised framework, SelfTAM, which jointly learns the depth estimator and the odometry estimator. The self-supervised problem is usually formulated as an energy minimization problem consisting of a multi-view data consistency energy (e.g. photometric) and a prior regularization energy (e.g. a depth smoothness prior). We strengthen the supervision signal with a deep feature consistency energy term and a surface normal regularization term. Although our method trains models on stereo sequences, so that a real-world scale factor is naturally incorporated, only monocular data is required at inference. In the last part of this thesis, we revisit the basics of visual odometry and explore best practices for integrating deep learning models with geometry-based visual odometry methods. A robust visual odometry system, DF-VO, is proposed. We use deep networks to establish 2D-2D/3D-2D correspondences and pick the best correspondences from the dense predictions. By feeding these high-quality correspondences into traditional VO methods, e.g. Epipolar Geometry and Perspective-n-Point, we can solve the visual odometry problem within a more robust framework. With the proposed self-supervised training, the models can even adapt online at run time, taking a step toward a lifelong-learning visual odometry system. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
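The "data consistency energy plus prior regularization energy" formulation mentioned above can be sketched as a scalar loss over arrays. This is a minimal sketch under assumptions: the source image has already been warped into the target view (warping from predicted depth and pose is not shown), the data term is a plain L1 photometric error, and the smoothness prior is an edge-aware first-order variant common in this literature, not necessarily the one used in SelfTAM. The weight `smooth_weight` and all function names are hypothetical.

```python
import numpy as np


def photometric_l1(target, warped, valid):
    """Data term: mean absolute intensity difference over valid (in-view) pixels."""
    return float(np.abs(target - warped)[valid].mean())


def edge_aware_smoothness(inv_depth, image):
    """Prior term: first-order smoothness on inverse depth, down-weighted at image edges."""
    dx_d = np.abs(np.diff(inv_depth, axis=1))
    dy_d = np.abs(np.diff(inv_depth, axis=0))
    dx_i = np.abs(np.diff(image, axis=1))
    dy_i = np.abs(np.diff(image, axis=0))
    return float((dx_d * np.exp(-dx_i)).mean() + (dy_d * np.exp(-dy_i)).mean())


def self_supervised_energy(target, warped, valid, inv_depth, smooth_weight=1e-3):
    """Total energy = data consistency + weighted regularization."""
    return (photometric_l1(target, warped, valid)
            + smooth_weight * edge_aware_smoothness(inv_depth, target))


img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
valid = np.ones_like(img, dtype=bool)
flat_inv_depth = np.full((8, 8), 0.5)

print(self_supervised_energy(img, img, valid, flat_inv_depth))        # perfect warp, flat depth
print(self_supervised_energy(img, img + 0.1, valid, flat_inv_depth))  # photometric mismatch
```

The extra terms the abstract describes (deep feature consistency, surface normal regularization) would enter as additional weighted summands in `self_supervised_energy`.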

    Deep probabilistic methods for improved radar sensor modelling and pose estimation

    Radar’s ability to sense under adverse conditions and at far range makes it a valuable alternative to vision and lidar for mobile robotic applications. However, its complex, scene-dependent sensing process and significant noise artefacts make working with radar challenging. Moving past the classical rule-based approaches that have dominated the literature to date, this thesis investigates deep, data-driven solutions across a range of tasks in robotics. Firstly, a deep approach is developed for mapping raw sensor measurements to a grid map of occupancy probabilities, outperforming classical filtering approaches by a significant margin. A distribution over the occupancy state is captured, additionally allowing uncertainty in predictions to be identified and managed. The approach is trained entirely using partial labels generated automatically from lidar, without requiring manual labelling. Next, a deep model is proposed for generating stochastic radar measurements from simulated elevation maps. The model is trained by learning the forward and backward processes side by side, using adversarial and cyclical consistency constraints together with a partial alignment loss, with labels generated from lidar. By faithfully replicating the radar sensing process, new models can be trained for downstream tasks using labels that are readily available in simulation. In this case, segmentation models trained on simulated radar measurements, when deployed in the real world, are shown to approach the performance of a model trained entirely on real-world measurements. Finally, the potential of deep approaches for the radar odometry task is explored. A learnt feature space is combined with a classical correlative scan matching procedure and optimised for pose prediction, allowing the proposed method to outperform the previous state of the art by a significant margin.
Through a probabilistic treatment, the uncertainty in the pose is also characterised. Building upon this success, properties of the Fourier Transform are then used to separate the search for translation and angle. It is shown that this decoupled search significantly boosts run-time performance, allowing the approach to run in real time on CPUs and embedded devices, whilst remaining competitive with other radar odometry methods proposed in the literature.
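One standard way the Fourier Transform separates translation from rotation is that the magnitude spectrum of an image is invariant to (circular) translation, so rotation can be searched on magnitudes alone, after which translation follows from phase correlation. The sketch below shows only those two properties on a synthetic 2-D scan; it is a generic technique under that assumption, not necessarily the exact procedure used in the thesis, and `phase_correlation` is an illustrative name.

```python
import numpy as np


def phase_correlation(a, b):
    """Estimate the integer (row, col) circular shift s such that b ~= np.roll(a, s)."""
    Fa = np.fft.fft2(a)
    Fb = np.fft.fft2(b)
    cross = Fb * np.conj(Fa)
    cross /= np.abs(cross) + 1e-12          # keep only the phase difference
    corr = np.real(np.fft.ifft2(cross))     # ideally a delta at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks in the upper half of each axis correspond to negative shifts
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, corr.shape))


rng = np.random.default_rng(0)
scan_a = rng.uniform(0.0, 1.0, size=(64, 64))           # synthetic Cartesian radar scan
scan_b = np.roll(scan_a, (5, -3), axis=(0, 1))          # same scene, translated

# 1) Magnitude spectra match regardless of translation, so a rotation search
#    (e.g. over a polar resampling of the magnitudes) can ignore translation.
same_magnitude = np.allclose(np.abs(np.fft.fft2(scan_a)), np.abs(np.fft.fft2(scan_b)))
# 2) Translation is then recovered in one FFT-based step instead of a brute-force search.
shift = phase_correlation(scan_a, scan_b)
print(same_magnitude, shift)
```

Each step costs a few FFTs rather than a joint search over (x, y, angle), which illustrates why decoupling the search can yield the run-time gains the abstract reports.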