Geometric loss functions for camera pose regression with deep learning
Deep learning has been shown to be effective for robust, real-time monocular
image relocalisation. In particular, PoseNet is a deep convolutional neural
network which learns to regress the 6-DOF camera pose from a single image. It
learns to localise using high-level features and is robust to difficult
lighting, motion blur and unknown camera intrinsics, where point-based SIFT
registration fails. However, it was trained using a naive loss function whose
hyper-parameters require expensive tuning. In this paper, we give the
problem a more fundamental theoretical treatment. We explore a number of novel
loss functions for learning camera pose which are based on geometry and scene
reprojection error. Additionally, we show how to automatically learn an optimal
weighting to simultaneously regress position and orientation. By leveraging
geometry, we demonstrate that our technique significantly improves PoseNet's
performance across datasets ranging from indoor rooms to a small city.
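The automatically learned weighting of position and orientation errors can be sketched as a loss with two learnable log-variance parameters that balance the two terms. This is a minimal numpy sketch, assuming a homoscedastic-uncertainty-style weighting; the function name and signature are illustrative, not from the paper:

```python
import numpy as np

def weighted_pose_loss(pos_pred, pos_true, quat_pred, quat_true, s_x, s_q):
    """Combine position and orientation errors with learned weights.

    s_x and s_q stand for learnable log-variance parameters that would be
    optimized alongside the network weights; the exp(-s) factors balance
    the two terms so no manual hyper-parameter tuning is needed.
    """
    pos_err = np.linalg.norm(pos_true - pos_pred)  # metres
    # Compare against the unit-normalized predicted quaternion.
    quat_err = np.linalg.norm(quat_true - quat_pred / np.linalg.norm(quat_pred))
    return pos_err * np.exp(-s_x) + s_x + quat_err * np.exp(-s_q) + s_q
```

With `s_x = s_q = 0` the loss reduces to the plain sum of the two errors; during training the optimizer lowers `s` for the noisier term, down-weighting it automatically.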
CNN-BASED INITIAL LOCALIZATION IMPROVED BY DATA AUGMENTATION
Image-based localization, or camera re-localization, is a fundamental task in computer vision and a prerequisite for navigation in robotics and autonomous driving, as well as for virtual and augmented reality. Such image pose regression in six degrees of freedom (DoF) has recently been tackled by Convolutional Neural Networks (CNNs). However, well-established methods based on feature matching still achieve higher accuracies so far. We therefore investigate how data augmentation could further improve CNN-based pose regression. Data augmentation is a valuable and widespread technique in the computer vision community for boosting the performance of training-based methods. Our aim in this paper is to show the benefit of data augmentation for pose regression by CNNs. For this purpose, images are rendered from a 3D model of the actual test environment. This model is itself generated from the original training data set, so no additional information or data is required. Furthermore, we introduce different training sets composed of rendered and real images. We show that enhancing the training of CNNs with 3D models of the environment improves image localization accuracy. On our investigated data set, pose regression accuracy improved by up to 69.37 % for the position component and 61.61 % for the rotation component.
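The "different training sets composed of rendered and real images" can be sketched as a function that mixes the two sample pools at a chosen ratio. This is a minimal sketch; the function name and the sample representation are illustrative assumptions, not the paper's pipeline:

```python
import random

def build_training_set(real_samples, rendered_samples, rendered_fraction, seed=0):
    """Compose a training set mixing real and rendered (image, pose) pairs.

    `rendered_fraction` of the output is drawn from the rendered pool
    (images synthesized from the 3D model of the environment), the rest
    from the real training images. The output size matches the real set.
    """
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    n = len(real_samples)
    n_rendered = int(round(n * rendered_fraction))
    mixed = (rng.sample(rendered_samples, n_rendered)
             + rng.sample(real_samples, n - n_rendered))
    rng.shuffle(mixed)
    return mixed
```

Sweeping `rendered_fraction` over several values would reproduce the kind of training-set comparison the abstract describes.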
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists of the concurrent
construction of a model of the environment (the map) and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and a tutorial for users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? And
is SLAM solved?
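The de-facto standard formulation the survey presents is maximum-a-posteriori estimation over a factor graph, which under Gaussian noise reduces to nonlinear least squares over the robot poses. A minimal 1-D sketch with linear factors (so a single least-squares solve is exact); the measurement values are illustrative, not from the paper:

```python
import numpy as np

def solve_pose_graph():
    """Solve a tiny 1-D pose graph by linear least squares.

    Unknowns: positions x0, x1, x2 of a robot moving along a line.
    Factors (rows of A, entries of b): a prior x0 = 0, two odometry
    constraints x1 - x0 = 1.1 and x2 - x1 = 0.9, and a slightly
    inconsistent "loop closure" x2 - x0 = 2.1. Least squares spreads
    the residual across the trajectory, which is the core of the
    factor-graph SLAM formulation.
    """
    A = np.array([
        [ 1.0,  0.0, 0.0],  # prior on x0
        [-1.0,  1.0, 0.0],  # odometry x0 -> x1
        [ 0.0, -1.0, 1.0],  # odometry x1 -> x2
        [-1.0,  0.0, 1.0],  # loop closure x0 -> x2
    ])
    b = np.array([0.0, 1.1, 0.9, 2.1])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

Real SLAM back-ends solve the same structure with nonlinear factors (rotations, landmarks) by iterating linearized solves of exactly this form.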