29,548 research outputs found
Camera pose estimation in unknown environments using a sequence of wide-baseline monocular images
In this paper, a feature-based technique for camera pose estimation in a sequence of wide-baseline images is proposed. Camera pose estimation is an important issue in many computer vision and robotics applications, such as augmented reality and visual SLAM. The proposed method can track images captured by a hand-held camera in room-sized workspaces with a maximum scene depth of 3-4 meters. The system can be used in unknown environments with no additional information available from the outside world, except in the first two images that are used for initialization. Pose estimation is performed using only natural feature points extracted and matched in successive images. In wide-baseline images, unlike consecutive frames of a video stream, the displacement of feature points between consecutive images is notable and hence cannot be traced easily using patch-based methods. To handle this problem, a hybrid strategy is employed to obtain accurate feature correspondences: initial correspondences are first found using the similarity of their descriptors, and outlier matches are then removed by applying the RANSAC algorithm. Further, to provide the required set of feature matches, a mechanism based on the side results of the robust estimator is employed. The proposed method is applied to indoor real data with images in VGA quality (640×480 pixels); on average, the translation error of the camera pose is less than 2 cm, which indicates the effectiveness and accuracy of the proposed approach.
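The match-then-filter strategy described above can be sketched in a few lines. The toy example below substitutes a simple 2D translation model for the epipolar geometry a real pipeline would estimate, purely to illustrate how RANSAC separates inlier matches from outliers; all names and data are illustrative, not from the paper.

```python
import numpy as np

def ransac_filter(src, dst, n_iters=200, thresh=2.0, seed=0):
    """Keep matches consistent with a dominant 2D translation (toy model;
    a real pipeline would fit a fundamental matrix instead)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))           # minimal sample: one match
        t = dst[i] - src[i]                  # candidate translation
        resid = np.linalg.norm(dst - (src + t), axis=1)
        inliers = resid < thresh             # matches agreeing with t
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# toy data: 8 matches following a (5, -3) shift, plus 2 gross outliers
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(10, 2))
dst = src + np.array([5.0, -3.0])
dst[8] += 40.0
dst[9] -= 35.0
mask = ransac_filter(src, dst)
print(mask.sum())  # 8 inliers survive
```

The same pattern scales to the real problem: only the minimal sample size and the fitted model change.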
Learning to Navigate the Energy Landscape
In this paper, we present a novel and efficient architecture for addressing
computer vision problems that use 'Analysis by Synthesis'. Analysis by
synthesis involves the minimization of the reconstruction error which is
typically a non-convex function of the latent target variables.
State-of-the-art methods adopt a hybrid scheme where discriminatively trained
predictors like Random Forests or Convolutional Neural Networks are used to
initialize local search algorithms. While these methods have been shown to
produce promising results, they often get stuck in local optima. Our method
goes beyond the conventional hybrid architecture by not only proposing multiple
accurate initial solutions but by also defining a navigational structure over
the solution space that can be used for extremely efficient gradient-free local
search. We demonstrate the efficacy of our approach on the challenging problem
of RGB Camera Relocalization. To make the RGB camera relocalization problem
particularly challenging, we introduce a new dataset of 3D environments which
are significantly larger than those found in other publicly-available datasets.
Our experiments reveal that the proposed method is able to achieve
state-of-the-art camera relocalization results. We also demonstrate the
generalizability of our approach on Hand Pose Estimation and Image Retrieval
tasks.
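The navigational structure the abstract describes can be pictured as a graph over candidate solutions, with gradient-free local search following lowest-energy neighbors. The sketch below is a minimal illustration of that idea on a toy 1D state space, not the paper's architecture; the energy and neighbor functions are stand-ins.

```python
# Toy sketch of gradient-free local search over a navigational structure:
# candidate solutions are graph nodes, and the search greedily moves to the
# neighbor with the lowest reconstruction error until no neighbor improves.

def graph_descent(energy, neighbors, start):
    """Greedy descent: follow the lowest-energy neighbor until stuck."""
    current = start
    while True:
        best = min(neighbors(current), key=energy, default=current)
        if energy(best) >= energy(current):
            return current
        current = best

# 1D chain of states 0..99 with a toy energy minimized at 42
energy = lambda x: (x - 42) ** 2
neighbors = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 99]
print(graph_descent(energy, neighbors, start=7))  # -> 42
```

On a non-convex energy this greedy walk can still stall in a local optimum, which is why the paper pairs it with multiple discriminatively proposed starting points.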
2D-3D Pose Tracking with Multi-View Constraints
Camera localization in 3D LiDAR maps has gained increasing attention due to
its promising ability to handle complex scenarios, surpassing the limitations
of visual-only localization methods. However, existing methods mostly focus on
addressing the cross-modal gaps, estimating camera poses frame by frame without
considering the relationship between adjacent frames, which makes the pose
tracking unstable. To alleviate this, we propose to couple the 2D-3D
correspondences between adjacent frames using the 2D-2D feature matching,
establishing the multi-view geometrical constraints for simultaneously
estimating multiple camera poses. Specifically, we propose a new 2D-3D pose
tracking framework, which consists of a front-end hybrid flow estimation network
for consecutive frames and a back-end pose optimization module. We further
design a cross-modal consistency-based loss to incorporate the multi-view
constraints during the training and inference process. We evaluate our proposed
framework on the KITTI and Argoverse datasets. Experimental results demonstrate
its superior performance compared to existing frame-by-frame 2D-3D pose
tracking methods and state-of-the-art vision-only pose tracking algorithms.
More online pose tracking videos are available at
\url{https://youtu.be/yfBRdg7gw5M}
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
L6DNet: Light 6 DoF Network for Robust and Precise Object Pose Estimation with Small Datasets
Estimating the 3D pose of an object is a challenging task that can be
considered within augmented reality or robotic applications. In this paper, we
propose a novel approach to perform 6 DoF object pose estimation from a single
RGB-D image. We adopt a hybrid pipeline in two stages: data-driven and
geometric respectively. The data-driven step consists of a classification CNN
to estimate the object 2D location in the image from local patches, followed by
a regression CNN trained to predict the 3D location of a set of keypoints in
the camera coordinate system. To extract the pose information, the geometric
step consists of aligning the 3D points in the camera coordinate system with
the corresponding 3D points in the world coordinate system by minimizing a
registration error, thus computing the pose. Our experiments on the standard
LineMod dataset show that our approach is more robust and accurate than
state-of-the-art methods. The approach is also validated to achieve a 6 DoF
positioning task by visual servoing.
Comment: This work has been accepted at IEEE Robotics and Automation Letters
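The geometric step this abstract describes, aligning predicted camera-frame keypoints with their known world-frame counterparts, is a rigid registration problem with a standard closed-form solution (the Kabsch/SVD method). The sketch below assumes noise-free one-to-one correspondences and is an illustration of that general technique, not the paper's exact implementation.

```python
import numpy as np

def rigid_align(P_cam, P_world):
    """Closed-form rigid registration (Kabsch): find R, t with
    P_cam ≈ P_world @ R.T + t, minimizing least-squares error."""
    mu_c, mu_w = P_cam.mean(0), P_world.mean(0)
    H = (P_world - mu_w).T @ (P_cam - mu_c)        # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_c - R @ mu_w
    return R, t

# sanity check: recover a known rotation about z and a translation
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
t_true = np.array([0.1, -0.2, 0.5])
pts_w = np.random.default_rng(0).uniform(-1, 1, (6, 3))
pts_c = pts_w @ R_true.T + t_true                  # simulated predictions
R, t = rigid_align(pts_c, pts_w)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # prints: True True
```

With noisy keypoint predictions, the same solver is typically wrapped in an outlier-rejection loop before the final pose is accepted.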
Camera Pose Estimation from Street-view Snapshots and Point Clouds
This PhD thesis targets two research problems: (1) how to efficiently and robustly estimate the camera pose of a query image with a map that contains street-view snapshots and point clouds; (2) given the estimated camera pose of a query image, how to create meaningful and intuitive applications with the map data.
To address the first research problem, we systematically investigated indirect, direct and hybrid camera pose estimation strategies. We implemented state-of-the-art methods and performed comprehensive experiments on two public benchmark datasets, considering outdoor environmental changes from ideal to extremely challenging cases. Our key findings are: (1) the indirect method is usually more accurate than the direct method when there are enough consistent feature correspondences; (2) the direct method is sensitive to initialization, but under extreme outdoor environmental changes, the mutual-information-based direct method is more robust than the feature-based methods; (3) the hybrid method combines the strengths of both the direct and indirect methods and outperforms them on challenging datasets.
To explore the second research problem, we considered inspiring and useful applications that exploit the camera pose together with the map data. Firstly, we invented a 3D-map-augmented photo gallery application, where images' geo-meta data are extracted with an indirect camera pose estimation method and the photo sharing experience is improved with the augmentation of a 3D map. Secondly, we designed an interactive video playback application, where an indirect method estimates video frames' camera poses and the video playback is augmented with a 3D map. Thirdly, we proposed a 3D-visual-primitive-based indoor object and outdoor scene recognition method, where the 3D primitives are accumulated from multi-view images.
Hybrid Focal Stereo Networks for Pattern Analysis in Homogeneous Scenes
In this paper we address the problem of multiple camera calibration in the
presence of a homogeneous scene, and without the possibility of employing
calibration object based methods. The proposed solution exploits salient
features present in a larger field of view, but instead of employing active
vision we replace the cameras with stereo rigs featuring a long focal analysis
camera, as well as a short focal registration camera. Thus, we are able to
propose an accurate solution which does not require intrinsic variation models
as in the case of zooming cameras. Moreover, the availability of the two views
simultaneously in each rig allows for pose re-estimation between rigs as often
as necessary. The algorithm has been successfully validated in an indoor
setting, as well as on a difficult scene featuring a highly dense pilgrim crowd
in Makkah.
Comment: 13 pages, 6 figures, submitted to Machine Vision and Applications
Geometrically-driven underground camera modeling and calibration with coplanarity constraints for Boom-type roadheader
The conventional calibration methods based on the perspective camera model are not suitable for an underground camera with two-layer glasses, which is specially designed for explosion-proofing and dust removal in coal mines. Underground camera modeling and calibration algorithms are urgently needed to improve the precision and reliability of underground visual measurement systems. This paper presents a novel geometrically-driven underground camera calibration algorithm for the boom-type roadheader. The underground camera model is established under coplanarity constraints, explicitly considering the impact of refraction caused by the two-layer glasses and deriving the geometrical relationship of the equivalent collinearity equations. On this basis, we perform parameter calibration based on a geometrically-driven calibration model, which establishes 2D-2D correspondences between the image points and the object coordinates of the planar target. A hybrid LM-PSO algorithm is further proposed in terms of the dynamic combination of Levenberg-Marquardt (LM) and Particle Swarm Optimization (PSO), which optimizes the underground camera calibration results by minimizing the error of the nonlinear underground camera model. The experimental results demonstrate that the pose errors caused by the two-layer glass refraction are well corrected by the proposed method. The accuracy of the cutting-head pose estimation is increased by 55.73%, meeting the requirements of underground excavation.
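The PSO half of the hybrid LM-PSO optimizer can be sketched compactly. The toy example below runs a plain particle swarm on a stand-in quadratic cost in place of the nonlinear underground camera residual, and omits the Levenberg-Marquardt refinement the paper combines it with; all parameters and names are illustrative.

```python
import numpy as np

def pso(cost, dim, n=30, iters=200, seed=0):
    """Plain particle swarm optimization; the paper hybridizes this with
    Levenberg-Marquardt refinement, which is omitted in this sketch."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n, dim))         # particle positions
    v = np.zeros_like(x)                     # particle velocities
    pbest = x.copy()                         # per-particle best positions
    pbest_f = np.array([cost(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()   # swarm-wide best
    for _ in range(iters):
        r1, r2 = rng.random((2, n, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        f = np.array([cost(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest

# stand-in for the calibration residual: minimum at (1, -2, 0.5)
target = np.array([1.0, -2.0, 0.5])
cost = lambda p: float(np.sum((p - target) ** 2))
est = pso(cost, dim=3)
print(est)
```

In the hybrid scheme, the swarm provides a good basin of attraction and a derivative-based LM step then polishes the estimate, which is why the combination tolerates the highly nonlinear refraction model.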
- …