Accurate Single Image Multi-Modal Camera Pose Estimation
Abstract. A well-known problem in photogrammetry and computer vision is the precise and robust determination of camera poses with respect to a given 3D model. In this work we propose a novel multi-modal method for single-image camera pose estimation with respect to 3D models with intensity information (e.g., LiDAR data with reflectance information). We utilize a direct point-based rendering approach to generate synthetic 2D views from 3D datasets in order to bridge the dimensionality gap. The proposed method then establishes 2D/2D point and local region correspondences based on a novel self-similarity distance measure. Correct correspondences are robustly identified by searching for small regions with a similar geometric relationship of local self-similarities using a Generalized Hough Transform. After backprojection of the generated features into 3D, a standard Perspective-n-Points problem is solved to yield an initial camera pose. The pose is then accurately refined using an intensity-based 2D/3D registration approach. An evaluation on Vis/IR 2D and airborne and terrestrial 3D datasets shows that the proposed method is applicable to a wide range of different sensor types. In addition, the approach outperforms standard global multi-modal 2D/3D registration approaches based on Mutual Information with respect to robustness and speed. Potential applications are widespread and include, for instance, multispectral texturing of 3D models, SLAM applications, sensor data fusion, multi-spectral camera calibration, and super-resolution applications.
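For readers unfamiliar with the Mutual Information baseline that this abstract compares against, the following is a minimal sketch of how an MI score between two equally sized grayscale images might be computed. It illustrates the baseline similarity measure only, not the self-similarity method proposed in the paper; the function name `mutual_information`, the bin count, and the list-of-rows image format are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, max_val=256):
    """Mutual information (in bits) between two equally sized grayscale images.

    Images are lists of rows with intensities in [0, max_val). This is a
    hypothetical helper for illustration, not code from the paper.
    """
    flat_a = [v for row in img_a for v in row]
    flat_b = [v for row in img_b for v in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and have the same size")

    # Quantize intensities into a small number of histogram bins.
    width = max_val / bins
    q_a = [min(int(v / width), bins - 1) for v in flat_a]
    q_b = [min(int(v / width), bins - 1) for v in flat_b]

    n = len(q_a)
    joint = Counter(zip(q_a, q_b))   # joint histogram counts
    marg_a = Counter(q_a)            # marginal histogram of image A
    marg_b = Counter(q_b)            # marginal histogram of image B

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) simplifies to count * n / (count_a * count_b).
        mi += p_ab * math.log2(count * n / (marg_a[a] * marg_b[b]))
    return mi


visible = [[0, 255], [0, 255]]
inverted = [[255, 0], [255, 0]]   # same structure, inverted contrast
print(mutual_information(visible, inverted))
```

Because MI depends only on the statistical relationship between intensities, an image and its contrast-inverted copy score as highly as two identical images, which is why MI is a common choice for multi-modal registration.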
Increasing the Efficiency of 6-DoF Visual Localization Using Multi-Modal Sensory Data
Localization is a key requirement for mobile robot autonomy and human-robot
interaction. Vision-based localization is accurate and flexible; however, it
incurs a high computational burden which limits its application on many
resource-constrained platforms. In this paper, we address the problem of
performing real-time localization in large-scale 3D point cloud maps of
ever-growing size. While most systems using multi-modal information reduce
localization time by employing side-channel information in a coarse manner (e.g.,
WiFi for a rough prior position estimate), we propose to inter-weave the map
with rich sensory data. This multi-modal approach achieves two key goals
simultaneously. First, it enables us to harness additional sensory data to
localise against a map covering a vast area in real-time; and secondly, it also
allows us to roughly localise devices which are not equipped with a camera. The
key to our approach is a localization policy based on a sequential Monte Carlo
estimator. The localiser uses this policy to attempt point-matching only in
nodes where it is likely to succeed, significantly increasing the efficiency of
the localization process. The proposed multi-modal localization system is
evaluated extensively in a large museum building. The results show that our
multi-modal approach not only increases the localization accuracy but also
significantly reduces computational time.
Comment: Presented at IEEE-RAS International Conference on Humanoid Robots
(Humanoids) 201
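The idea of a sequential Monte Carlo estimator that decides where to attempt point-matching can be sketched as a small particle filter over map nodes. This is an illustrative sketch under stated assumptions, not the authors' implementation: the node graph `neighbours`, the `likelihood` function (standing in for the side-channel sensory data), and the helper names `smc_step` and `nodes_to_match` are all hypothetical.

```python
import random
from collections import Counter


def smc_step(particles, neighbours, likelihood, rng):
    """One predict/update/resample step of a particle filter over map nodes.

    particles  -- list of node ids, one per particle
    neighbours -- dict mapping a node id to a list of adjacent node ids (assumed)
    likelihood -- function giving a non-negative score for a node (assumed to
                  come from non-visual sensory data stored in the map)
    """
    # Predict: each particle either stays put or moves to an adjacent node.
    predicted = [rng.choice([node] + neighbours[node]) for node in particles]

    # Update: weight each particle by how well its node explains the sensor data.
    weights = [likelihood(node) for node in predicted]
    if sum(weights) <= 0.0:
        return predicted  # no usable information; keep the prediction

    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def nodes_to_match(particles, top_k=2):
    """Return the top_k most populated nodes, where matching is most likely to succeed."""
    return [node for node, _ in Counter(particles).most_common(top_k)]


# A corridor of four map nodes; the side-channel data favours node 2.
rng = random.Random(0)
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
particles = [0, 1, 2, 3] * 25
particles = smc_step(particles, neighbours,
                     lambda node: 0.9 if node == 2 else 0.1, rng)
print(nodes_to_match(particles))
```

Restricting the expensive visual point-matching to the few nodes that hold most of the particles is what would save computation in such a scheme; a device without a camera could still report the most populated node as a rough position.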
Robust Photogeometric Localization over Time for Map-Centric Loop Closure
Map-centric SLAM is emerging as an alternative to conventional graph-based
SLAM for its accuracy and efficiency in long-term mapping problems. However, in
map-centric SLAM, the process of loop closure differs from that of conventional
SLAM and the result of incorrect loop closure is more destructive and is not
reversible. In this paper, we present a tightly coupled photogeometric metric
localization for the loop closure problem in map-centric SLAM. In particular,
our method combines complementary constraints from LiDAR and camera sensors,
and validates loop closure candidates with sequential observations. The
proposed method provides a visual evidence-based outlier rejection where
failures caused by either place recognition or localization outliers can be
effectively removed. We demonstrate the proposed method is not only more
accurate than the conventional global ICP methods but is also robust to
incorrect initial pose guesses.
Comment: To appear in IEEE Robotics and Automation Letters, accepted January
201
Recovering 6D Object Pose: A Review and Multi-modal Analysis
A large number of studies analyse object detection and pose estimation at the
visual level in 2D, discussing the effects of challenges such as occlusion,
clutter, and texture on the performance of methods that work in the RGB
modality. Interpreting the depth data, the study in this paper
presents thorough multi-modal analyses. It discusses the above-mentioned
challenges for full 6D object pose estimation in RGB-D images comparing the
performances of several 6D detectors in order to answer the following
questions: What is the current position of the computer vision community for
maintaining "automation" in robotic manipulation? What next steps should the
community take for improving "autonomy" in robotics while handling objects? Our
findings include: (i) Reasonably accurate results are obtained on textured
objects at varying viewpoints with cluttered backgrounds. (ii) Heavy occlusion
and clutter severely affect the detectors, and similar-looking distractors are
the biggest challenge in recovering instances' 6D pose. (iii) Template-based
methods and random forest-based learning algorithms underlie object detection
and 6D pose estimation. The recent paradigm is to learn deep discriminative
feature representations and to adopt CNNs taking RGB images as input. (iv)
Depending on the availability of large-scale 6D annotated depth datasets,
feature representations can be learnt on these datasets, and then the learnt
representations can be customized for the 6D problem.
Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
We describe the first method to automatically estimate the 3D pose of the
human body as well as its 3D shape from a single unconstrained image. We
estimate a full 3D mesh and show that 2D joints alone carry a surprising amount
of information about body shape. The problem is challenging because of the
complexity of the human body, articulation, occlusion, clothing, lighting, and
the inherent ambiguity in inferring 3D from 2D. To solve this, we first use a
recently published CNN-based method, DeepCut, to predict (bottom-up) the 2D
body joint locations. We then fit (top-down) a recently published statistical
body shape model, called SMPL, to the 2D joints. We do so by minimizing an
objective function that penalizes the error between the projected 3D model
joints and detected 2D joints. Because SMPL captures correlations in human
shape across the population, we are able to robustly fit it to very little
data. We further leverage the 3D model to prevent solutions that cause
interpenetration. We evaluate our method, SMPLify, on the Leeds Sports,
HumanEva, and Human3.6M datasets, showing superior pose accuracy with respect
to the state of the art.
Comment: To appear in ECCV 201
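The data term described in this abstract, an error between projected 3D model joints and detected 2D joints, can be sketched as follows. This is a simplified illustration assuming a pinhole camera and a plain sum of squared pixel errors; the penalty, camera model, and remaining terms of the published objective are not spelled out in the abstract, and the names `project` and `joint_reprojection_error` are hypothetical.

```python
def project(point, focal, center):
    """Pinhole projection of a 3D point (x, y, z) in camera coordinates to pixels."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z + center[0], focal * y / z + center[1])


def joint_reprojection_error(joints_3d, joints_2d, focal, center):
    """Sum of squared pixel distances between projected 3D joints and 2D detections.

    A plain squared error is used here for simplicity (an assumption); a fitting
    procedure would minimise this value over the body model's pose and shape.
    """
    if len(joints_3d) != len(joints_2d):
        raise ValueError("need one 2D detection per 3D joint")
    total = 0.0
    for joint, (u, v) in zip(joints_3d, joints_2d):
        pu, pv = project(joint, focal, center)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


# Two model joints 2 m in front of the camera and two slightly offset detections.
model_joints = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
detections = [(50.0, 50.0), (103.0, 54.0)]
print(joint_reprojection_error(model_joints, detections,
                               focal=100.0, center=(50.0, 50.0)))
```

In a full fitting pipeline the 3D joints would be a function of the body model's parameters, so lowering this error moves the model toward agreement with the 2D detections.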