908 research outputs found
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
Visual localization enables autonomous vehicles to navigate in their
surroundings and augmented reality applications to link virtual to real worlds.
Practical visual localization approaches need to be robust to a wide variety of
viewing condition, including day-night changes, as well as weather and seasonal
variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera
pose estimates. In this paper, we introduce the first benchmark datasets
specifically designed for analyzing the impact of such factors on visual
localization. Using carefully created ground truth poses for query images taken
under a wide variety of conditions, we evaluate the impact of various factors
on 6DOF camera pose estimation accuracy through extensive experiments with
state-of-the-art localization approaches. Based on our results, we draw
conclusions about the difficulty of different conditions, showing that
long-term localization is far from solved, and propose promising avenues for
future work, including sequence-based localization approaches and the need for
better local features. Our benchmark is available at visuallocalization.net.Comment: Accepted to CVPR 2018 as a spotligh
The simultaneous localization and mapping (SLAM):An overview
Positioning is a need for many applications related to mapping and navigation either in civilian or military domains. The significant developments in satellite-based techniques, sensors, telecommunications, computer hardware and software, image processing, etc. positively influenced to solve the positioning problem efficiently and instantaneously. Accordingly, the mentioned development empowered the applications and advancement of autonomous navigation. One of the most interesting developed positioning techniques is what is called in robotics as the Simultaneous Localization and Mapping SLAM. The SLAM problem solution has witnessed a quick improvement in the last decades either using active sensors like the RAdio Detection And Ranging (Radar) and Light Detection and Ranging (LiDAR) or passive sensors like cameras. Definitely, positioning and mapping is one of the main tasks for Geomatics engineers, and therefore it's of high importance for them to understand the SLAM topic which is not easy because of the huge documentation and algorithms available and the various SLAM solutions in terms of the mathematical models, complexity, the sensors used, and the type of applications. In this paper, a clear and simplified explanation is introduced about SLAM from a Geomatical viewpoint avoiding going into the complicated algorithmic details behind the presented techniques. In this way, a general overview of SLAM is presented showing the relationship between its different components and stages like the core part of the front-end and back-end and their relation to the SLAM paradigm. Furthermore, we explain the major mathematical techniques of filtering and pose graph optimization either using visual or LiDAR SLAM and introduce a summary of the deep learning efficient contribution to the SLAM problem. Finally, we address examples of some existing practical applications of SLAM in our reality
Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image
We consider the problem of dense depth prediction from a sparse set of depth
measurements and a single RGB image. Since depth estimation from monocular
images alone is inherently ambiguous and unreliable, to attain a higher level
of robustness and accuracy, we introduce additional sparse depth samples, which
are either acquired with a low-resolution depth sensor or computed via visual
Simultaneous Localization and Mapping (SLAM) algorithms. We propose the use of
a single deep regression network to learn directly from the RGB-D raw data, and
explore the impact of number of depth samples on prediction accuracy. Our
experiments show that, compared to using only RGB images, the addition of 100
spatially random depth samples reduces the prediction root-mean-square error by
50% on the NYU-Depth-v2 indoor dataset. It also boosts the percentage of
reliable prediction from 59% to 92% on the KITTI dataset. We demonstrate two
applications of the proposed algorithm: a plug-in module in SLAM to convert
sparse maps to dense maps, and super-resolution for LiDARs. Software and video
demonstration are publicly available.Comment: accepted to ICRA 2018. 8 pages, 8 figures, 3 tables. Video at
https://www.youtube.com/watch?v=vNIIT_M7x7Y. Code at
https://github.com/fangchangma/sparse-to-dens
Camera Pose Estimation from Street-view Snapshots and Point Clouds
This PhD thesis targets on two research problems: (1) How to efficiently and robustly estimate the camera pose of a query image with a map that contains street-view snapshots and point clouds; (2) Given the estimated camera pose of a query image, how to create meaningful and intuitive applications with the map data.
To conquer the first research problem, we systematically investigated indirect, direct and hybrid camera pose estimation strategies. We implemented state-of-the-art methods and performed comprehensive experiments in two public benchmark datasets considering outdoor environmental changes from ideal to extremely challenging cases. Our key findings are: (1) the indirect method is usually more accurate than the direct method when there are enough consistent feature correspondences; (2) The direct method is sensitive to initialization, but under extreme outdoor environmental changes, the mutual-information-based direct method is more robust than the feature-based methods; (3) The hybrid method combines the strength from both direct and indirect method and outperforms them in challenging datasets.
To explore the second research problem, we considered inspiring and useful applications by exploiting the camera pose together with the map data. Firstly, we invented a 3D-map augmented photo gallery application, where images’ geo-meta data are extracted with an indirect camera pose estimation method and photo sharing experience is improved with the augmentation of 3D map. Secondly, we designed an interactive video playback application, where an indirect method estimates video frames’ camera pose and the video playback is augmented with a 3D map. Thirdly, we proposed a 3D visual primitive based indoor object and outdoor scene recognition method, where the 3D primitives are accumulated from the multiview images
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes
Understanding the Limitations of CNN-based Absolute Camera Pose Regression
Visual localization is the task of accurate camera pose estimation in a known
scene. It is a key problem in computer vision and robotics, with applications
including self-driving cars, Structure-from-Motion, SLAM, and Mixed Reality.
Traditionally, the localization problem has been tackled using 3D geometry.
Recently, end-to-end approaches based on convolutional neural networks have
become popular. These methods learn to directly regress the camera pose from an
input image. However, they do not achieve the same level of pose accuracy as 3D
structure-based methods. To understand this behavior, we develop a theoretical
model for camera pose regression. We use our model to predict failure cases for
pose regression techniques and verify our predictions through experiments. We
furthermore use our model to show that pose regression is more closely related
to pose approximation via image retrieval than to accurate pose estimation via
3D structure. A key result is that current approaches do not consistently
outperform a handcrafted image retrieval baseline. This clearly shows that
additional research is needed before pose regression algorithms are ready to
compete with structure-based methods.Comment: Initial version of a paper accepted to CVPR 201
- …