170 research outputs found

    Understanding the Limitations of CNN-based Absolute Camera Pose Regression

    Visual localization is the task of accurate camera pose estimation in a known scene. It is a key problem in computer vision and robotics, with applications including self-driving cars, Structure-from-Motion, SLAM, and Mixed Reality. Traditionally, the localization problem has been tackled using 3D geometry. Recently, end-to-end approaches based on convolutional neural networks have become popular. These methods learn to directly regress the camera pose from an input image. However, they do not achieve the same level of pose accuracy as 3D structure-based methods. To understand this behavior, we develop a theoretical model for camera pose regression. We use our model to predict failure cases for pose regression techniques and verify our predictions through experiments. We furthermore use our model to show that pose regression is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure. A key result is that current approaches do not consistently outperform a handcrafted image retrieval baseline. This clearly shows that additional research is needed before pose regression algorithms are ready to compete with structure-based methods. Comment: Initial version of a paper accepted to CVPR 2019.
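The paper's key finding is that pose regression behaves like pose approximation via image retrieval. A minimal numpy sketch of such a retrieval baseline, under the simplifying assumption that camera positions (not full 6 DoF poses) are averaged over the nearest database images; the function name and weighting scheme are illustrative, not from the paper:

```python
import numpy as np

def retrieve_pose(query_desc, db_descs, db_poses, k=2):
    """Approximate the query camera pose by inverse-distance weighting
    of the k nearest database images in descriptor space. Positions only,
    for simplicity; rotations would need proper rotation averaging."""
    # L2 distances from the query descriptor to every database descriptor
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    nearest = np.argsort(dists)[:k]
    # Inverse-distance weights over the retrieved camera positions
    w = 1.0 / (dists[nearest] + 1e-8)
    w /= w.sum()
    return (w[:, None] * db_poses[nearest]).sum(axis=0)

# Toy database: 4 images with 2-D descriptors and 3-D camera positions
db_descs = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.]])
db_poses = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [5., 5., 0.]])
pose = retrieve_pose(np.array([0.1, 0.1]), db_descs, db_poses)
```

Because the estimate is interpolated from database poses, it can never be more accurate than the database coverage allows, which is the paper's explanation for the accuracy ceiling of pose regression.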

    Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization

    Image-based camera relocalization is an important problem in computer vision and robotics. Recent works utilize convolutional neural networks (CNNs) to regress, for each pixel in a query image, its corresponding 3D world coordinate in the scene. The final pose is then solved via a RANSAC-based optimization scheme using the predicted coordinates. Usually, the CNN is trained with ground-truth scene coordinates, but it has also been shown that the network can discover the 3D scene geometry automatically by minimizing a single-view reprojection loss. However, due to the deficiencies of the reprojection loss, the network needs to be carefully initialized. In this paper, we present a new angle-based reprojection loss, which resolves the issues of the original reprojection loss. With this new loss function, the network can be trained without careful initialization, and the system achieves more accurate results. The new loss also enables us to utilize available multi-view constraints, which further improve performance. Comment: ECCV 2018 Workshop (Geometry Meets Deep Learning).
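The core idea of an angle-based reprojection loss can be sketched as follows: instead of projecting the predicted 3D point to the image and measuring a pixel distance (which degenerates for points predicted behind the camera), measure the angle between the viewing ray through the observed pixel and the ray to the predicted point. This is a minimal numpy illustration of that general idea, not the paper's exact formulation:

```python
import numpy as np

def angle_reprojection_loss(pred_xyz, pixel, K):
    """Angle (radians) between the viewing ray through the observed
    pixel and the ray to the predicted scene coordinate, both in the
    camera frame. Unlike pixel-space reprojection error, this stays
    bounded for points predicted behind the camera."""
    # Back-project the pixel into a unit viewing ray via the intrinsics K
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray /= np.linalg.norm(ray)
    p = pred_xyz / np.linalg.norm(pred_xyz)
    cos = np.clip(ray @ p, -1.0, 1.0)
    return np.arccos(cos)

K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
# A point exactly on the ray through the principal point gives zero loss
loss = angle_reprojection_loss(np.array([0., 0., 2.0]), (320., 240.), K)
```

A point at depth -2 along the same ray yields a finite loss of π rather than an unbounded or undefined pixel error, which is why training needs no careful initialization.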

    Learning camera localization via dense scene matching

    This thesis presents a method for camera localization. Given a set of reference images with known camera poses, camera localization aims to estimate the 6 DoF camera pose for an arbitrary query image captured in the same environment. It can also be generalized to recover the 6 DoF pose of each frame of an input query video. Traditional methods detect and match interest points between the query image and a pre-built 3D model, and then solve for the camera pose with the PnP algorithm combined with RANSAC. The recent development of deep learning has motivated end-to-end approaches for camera localization. Those methods encode scene structures into the parameters of a specific convolutional neural network (CNN) and are thus able to predict, for a query image, a dense coordinate map whose pixels record 3D scene coordinates. This dense coordinate map can be used to estimate camera poses in the same way as traditional methods. However, most of these learning-based methods require re-training or re-adaptation for a new scene and have difficulties in handling large-scale scenes due to limited network capacity. In this thesis, we present a new method for scene-agnostic camera localization which can be applied to a novel scene without retraining. This scene-agnostic localization is achieved with our dense scene matching (DSM) technique, where a cost volume is constructed between a query image and a scene. The cost volume is fed to a CNN to predict the dense coordinate map used to compute the 6 DoF camera pose. In addition, our method can be directly applied to query video clips, which yields an extra performance boost at test time by exploiting temporal constraints between neighboring frames. Our method achieves state-of-the-art performance on several benchmarks.
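The cost-volume idea behind dense scene matching can be illustrated with a toy numpy sketch: correlate every query-pixel feature with every scene-point feature, then read off a scene coordinate per pixel. In DSM the cost volume is processed by a CNN; the hard argmax below is an illustrative simplification, and all names are hypothetical:

```python
import numpy as np

def dense_scene_matching(query_feats, scene_feats, scene_xyz):
    """Toy cost-volume matching. query_feats: (H*W, D) per-pixel
    descriptors, scene_feats: (N, D) descriptors of N scene points,
    scene_xyz: (N, 3) their 3D coordinates. Returns an (H*W, 3) dense
    coordinate map by picking the best-matching scene point per pixel."""
    # Normalize so the cost volume holds cosine similarities
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    s = scene_feats / np.linalg.norm(scene_feats, axis=1, keepdims=True)
    cost = q @ s.T                      # cost volume, shape (H*W, N)
    best = cost.argmax(axis=1)          # best scene point per pixel
    return scene_xyz[best]              # dense 3D coordinate map

# Toy scene: three points with one-hot descriptors
coords = dense_scene_matching(np.array([[0., 1., 0.]]), np.eye(3),
                              np.array([[0., 0., 0.], [1., 1., 1.], [2., 2., 2.]]))
```

Because the scene enters only through `scene_feats` and `scene_xyz`, nothing scene-specific is baked into network weights, which is what makes the approach scene-agnostic.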

    To Learn or Not to Learn: Visual Localization from Essential Matrices

    Visual localization is the problem of estimating the camera pose within a known scene, and a key component in computer vision applications such as self-driving cars and Mixed Reality. State-of-the-art approaches for accurate visual localization use scene-specific representations, resulting in the overhead of constructing these models when applying the techniques to new scenes. Recently, deep learning-based approaches based on relative pose estimation have been proposed, carrying the promise of easily adapting to new scenes. However, it has been shown that such approaches are currently significantly less accurate than state-of-the-art approaches. In this paper, we are interested in analyzing this behavior. To this end, we propose a novel framework for visual localization from relative poses. Using a classical feature-based approach within this framework, we show state-of-the-art performance. Replacing the classical approach with learned alternatives at various levels, we then identify the reasons why deep-learned approaches do not perform well. Based on our analysis, we make recommendations for future work. Comment: Accepted to ICRA 2020.
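One geometric step in localization from relative poses can be sketched concretely: an essential matrix between the query and a reference image with known pose yields the direction from the reference camera to the query camera, and intersecting such bearing rays from two or more references recovers the query camera center. A minimal numpy sketch of that ray intersection, under stated assumptions (directions already rotated into the world frame; function name is hypothetical):

```python
import numpy as np

def triangulate_camera_center(centers, dirs):
    """Least-squares intersection of bearing rays: reference camera i at
    centers[i] sees the query camera along unit direction dirs[i] (e.g.
    the translation direction recovered from an essential matrix, rotated
    into the world frame). Minimizes sum_i ||(I - d_i d_i^T)(x - c_i)||^2."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, dirs):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P
        b += P @ c
    return np.linalg.solve(A, b)

centers = np.array([[0., 0., 0.], [4., 0., 0.]])
dirs = np.array([[1., 1., 0.], [-1., 1., 0.]])   # rays meet at (2, 2, 0)
x = triangulate_camera_center(centers, dirs)
```

Note that the translation from an essential matrix is only known up to scale, which is exactly why multiple reference images are needed to pin down the absolute camera position.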