LoopSmart: Smart Visual SLAM Through Surface Loop Closure
We present a visual simultaneous localization and mapping (SLAM) framework for
closing surface loops. It combines sparse feature matching with dense surface
alignment: sparse feature matching is used for visual odometry and for global
camera-pose fine-tuning once dense loops are detected, while dense surface
alignment closes large loops and resolves surface mismatches. To achieve smart
dense surface loop closure, we propose a highly efficient CUDA-based global
point cloud registration method and a map-content-dependent loop verification
method. In extensive experiments on different datasets, our method outperforms
state-of-the-art approaches in terms of both camera-trajectory and
surface-reconstruction accuracy.
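As a concrete illustration of what a map-content-dependent loop verification can look like, the minimal sketch below accepts a dense loop candidate only when enough points of one surface fragment land near the other fragment after the candidate alignment. The function and thresholds (verify_loop, dist_thresh, min_inlier_ratio) are illustrative assumptions, not the paper's exact criterion.

```python
# Hypothetical sketch of dense loop-closure verification: after a candidate
# global registration maps fragment B into fragment A's frame, the loop is
# accepted only if enough of B's points land near A's surface. Thresholds
# are illustrative, not from the paper.
import numpy as np
from scipy.spatial import cKDTree

def verify_loop(points_a, points_b, T_ab, dist_thresh=0.02, min_inlier_ratio=0.6):
    """points_a: (N,3), points_b: (M,3); T_ab: 4x4 transform taking B into A's frame."""
    b_h = np.hstack([points_b, np.ones((len(points_b), 1))])  # homogeneous coordinates
    b_in_a = (T_ab @ b_h.T).T[:, :3]                          # B expressed in A's frame
    dists, _ = cKDTree(points_a).query(b_in_a)                # nearest-surface distances
    inlier_ratio = float(np.mean(dists < dist_thresh))
    return inlier_ratio >= min_inlier_ratio, inlier_ratio
```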
Pose Estimation using Local Structure-Specific Shape and Appearance Context
We address the problem of estimating the alignment pose between two models
using structure-specific local descriptors. Our descriptors are generated using
a combination of 2D image data and 3D contextual shape data, resulting in a set
of semi-local descriptors containing rich appearance and shape information for
both edge and texture structures. This is achieved by defining feature space
relations which describe the neighborhood of a descriptor. By quantitative
evaluations, we show that our descriptors provide high discriminative power
compared to state-of-the-art approaches. In addition, we show how to utilize
this for the estimation of the alignment pose between two point sets. We
present experiments in both controlled and real-life scenarios to validate our
approach.
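As one plausible precursor to the pose estimation the abstract describes, the sketch below matches descriptors via a mutual nearest-neighbour check with a Lowe-style ratio test. This is a standard matching scheme, not necessarily the paper's; match_descriptors and ratio are illustrative names.

```python
# A standard matching scheme (not necessarily the paper's): mutual nearest
# neighbours in descriptor space plus a Lowe-style ratio test.
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """desc_a: (N,D), desc_b: (M,D) descriptor arrays. Returns (i, j) index pairs."""
    tree_b, tree_a = cKDTree(desc_b), cKDTree(desc_a)
    d_ab, nn_ab = tree_b.query(desc_a, k=2)   # two nearest neighbours in B for each A
    _, nn_ba = tree_a.query(desc_b, k=1)      # nearest neighbour in A for each B
    matches = []
    for i in range(len(desc_a)):
        j = nn_ab[i, 0]
        if d_ab[i, 0] < ratio * d_ab[i, 1] and nn_ba[j] == i:  # ratio + mutual check
            matches.append((i, j))
    return matches
```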
Physics-based Scene-level Reasoning for Object Pose Estimation in Clutter
This paper focuses on vision-based pose estimation for multiple rigid objects
placed in clutter, especially in cases involving occlusions and objects resting
on each other. Progress has been achieved recently in object recognition given
advancements in deep learning. Nevertheless, such tools typically require a
large amount of training data and significant manual effort to label objects.
This limits their applicability in robotics, where solutions must scale to a
large number of objects and a variety of conditions. Moreover, the
combinatorial nature of the scenes that can arise from the placement of
multiple objects is hard to capture in a training dataset. Thus, the learned
models might not produce the level of precision required for tasks such as
robotic manipulation. This work proposes an autonomous process for pose estimation that
spans from data generation to scene-level reasoning and self-learning. In
particular, the proposed framework first generates a labeled dataset for
training a Convolutional Neural Network (CNN) for object detection in clutter.
These detections are used to guide a scene-level optimization process, which
considers the interactions between the different objects present in the clutter
to output pose estimates of high precision. Furthermore, confident estimates
are used to label online real images from multiple views and re-train the
process in a self-learning pipeline. Experimental results indicate that this
process quickly identifies physically consistent object poses in cluttered
scenes that are more precise than those found by reasoning over individual
object instances. Furthermore, the quality of the pose estimates increases over
time through the self-learning process. (International Journal of Robotics
Research (IJRR), 2019.)
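The self-learning loop can be pictured with the hedged sketch below: confident predictions on unlabeled data are folded back into the training set and the model is retrained. A scikit-learn classifier on toy features stands in for the paper's CNN, and the 0.9 confidence threshold is an illustrative assumption.

```python
# Toy self-training loop: a scikit-learn classifier stands in for the CNN.
# Confident predictions on unlabeled samples become pseudo-labels for retraining.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(50, 8)), rng.integers(0, 2, size=50)
X_unlab = rng.normal(size=(200, 8))

model = LogisticRegression().fit(X_lab, y_lab)
for _ in range(3):                              # a few self-learning rounds
    proba = model.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.9         # illustrative confidence threshold
    if not confident.any():
        break                                   # nothing confident enough to pseudo-label
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]               # remove newly labeled samples
    model = LogisticRegression().fit(X_lab, y_lab)
```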
Learning to Fuse Local Geometric Features for 3D Rigid Data Matching
This paper presents a simple yet very effective data-driven approach to fuse
both low-level and high-level local geometric features for 3D rigid data
matching. It is a common practice to generate distinctive geometric descriptors
by fusing low-level features from various viewpoints or subspaces, or enhance
geometric feature matching by leveraging multiple high-level features. In prior
work, such fusion is typically performed via simple operations such as
concatenation and min-pooling. We show that more compact and distinctive representations can
be achieved by optimizing a neural network (NN) model under the triplet
framework that non-linearly fuses local geometric features in Euclidean spaces.
The NN model is trained by an improved triplet loss function that fully
leverages all pairwise relationships within the triplet. Moreover, the
descriptor fused by our approach is competitive with descriptors deep-learned
from raw data, while being more lightweight and rotation-invariant.
Experimental results on four standard datasets with various data modalities and
application contexts confirm the advantages of our approach in terms of both
feature matching and geometric registration.
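One plausible form of a triplet loss that leverages all pairwise relationships within the triplet is sketched below: the positive pair is pushed closer than both negative pairs by a margin. The exact loss in the paper may differ; full_triplet_loss and margin are illustrative names.

```python
# One plausible "full" triplet loss: the anchor-positive distance must beat
# BOTH the anchor-negative and positive-negative distances by a margin.
# Forward pass only; names are illustrative.
import numpy as np

def full_triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor, positive, negative: (B,D) batches of fused descriptors."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    d_pn = np.linalg.norm(positive - negative, axis=1)
    loss = np.maximum(0.0, d_ap - d_an + margin) + np.maximum(0.0, d_ap - d_pn + margin)
    return loss.mean()
```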
Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes
We introduce a novel robust hybrid 3D face tracking framework from RGBD video
streams, which is capable of tracking head pose and facial actions without
pre-calibration or intervention from a user. In particular, we focus on
improving the tracking performance in instances where the tracked subject is
far from the camera and the quality of the point cloud deteriorates severely.
This is accomplished by combining a flexible 3D shape
regressor and the joint 2D+3D optimization on shape parameters. Our approach
fits facial blendshapes to the point cloud of the human head, while being
driven by an efficient and rapid 3D shape regressor trained on generic RGB
datasets. As an online tracking system, it adapts the identity of the unknown
user on the fly, resulting in improved 3D model reconstruction and,
consequently, better tracking performance. The result is a robust RGBD face
tracker, capable of handling a wide range of target scene depths, beyond those
that can be afforded by traditional depth or RGB face trackers. Lastly, since
the blendshape model cannot accurately recover the true facial shape, we use
the tracked 3D face model as a prior in a novel filtering process to further
refine the depth map for use in other tasks, such as 3D reconstruction.
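The model-as-prior depth refinement can be loosely sketched as follows: where the raw depth roughly agrees with depth rendered from the tracked face model, the two are blended, and elsewhere the raw measurement is kept. This is only an illustration of the idea, not the paper's filter; refine_depth, agree_thresh, and prior_weight are assumptions.

```python
# Illustration only, not the paper's filter: blend raw depth with depth
# rendered from the tracked face model where the two roughly agree.
import numpy as np

def refine_depth(raw_depth, model_depth, agree_thresh=0.01, prior_weight=0.5):
    """raw_depth, model_depth: (H,W) arrays in meters; model_depth is NaN
    where the rendered face model does not cover the pixel."""
    agree = np.isfinite(model_depth) & (np.abs(raw_depth - model_depth) < agree_thresh)
    refined = raw_depth.copy()
    refined[agree] = ((1 - prior_weight) * raw_depth[agree]
                      + prior_weight * model_depth[agree])
    return refined
```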
Joint Layout Estimation and Global Multi-View Registration for Indoor Reconstruction
In this paper, we propose a novel method to jointly solve scene layout
estimation and global registration problems for accurate indoor 3D
reconstruction. Given a sequence of range data, we first build a set of scene
fragments using KinectFusion and register them through pose graph optimization.
Afterwards, we alternate between layout estimation and layout-based global
registration so that the two processes complement each other. We extract the
scene layout through hierarchical agglomerative clustering and energy-based
multi-model fitting that account for noisy measurements. With the estimated
scene layout in hand, we register all the range data via a global iterative
closest point (ICP) algorithm in which the positions of 3D points belonging to
layout elements, such as walls and the ceiling, are constrained to lie close to
the layout. We experimentally verify the proposed method on publicly available
synthetic and real-world datasets, both quantitatively and qualitatively.
(Accepted to the 2017 IEEE International Conference on Computer Vision, ICCV.)
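The layout constraint can be sketched as an extra point-to-plane penalty appended to the global ICP residuals for points labeled as belonging to layout planes. The names (layout_residuals, planes, lambda_layout) are illustrative assumptions, not the paper's notation.

```python
# Illustrative layout constraint: points labeled as lying on a layout plane
# (wall/ceiling) contribute a signed point-to-plane residual, appended to the
# global ICP residual vector with weight lambda_layout.
import numpy as np

def layout_residuals(points, plane_ids, planes, lambda_layout=1.0):
    """points: (N,3); plane_ids: (N,) indices into planes; planes: list of
    (n, d) with unit normal n and offset d, i.e. the plane {x : n.x + d = 0}."""
    res = np.empty(len(points))
    for i, (p, k) in enumerate(zip(points, plane_ids)):
        n, d = planes[k]
        res[i] = n @ p + d               # signed point-to-plane distance
    return lambda_layout * res
```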
A 3D Object Detection and Pose Estimation Pipeline Using RGB-D Images
3D object detection and pose estimation have been studied extensively in recent
decades for their potential applications in robotics. However, challenges
remain when detecting multiple objects in cluttered environments while
retaining a low false-positive rate. This paper proposes a robust 3D object
detection and pose estimation pipeline based on RGB-D images, which can detect
multiple objects simultaneously while reducing false positives.
Detection begins with template matching and yields a set of template matches. A
clustering algorithm then groups templates of similar spatial location and
produces multiple-object hypotheses. A scoring function evaluates the
hypotheses using their associated templates and non-maximum suppression is
adopted to remove duplicate results based on the scores. Finally, a combination
of point cloud processing algorithms is used to compute the objects' 3D poses.
Existing object hypotheses are verified by computing the overlap between model
and scene points. Experiments demonstrate that our approach provides results
comparable to the state of the art and can be applied to robotic random bin
picking.
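The duplicate-removal step can be illustrated with a simple score-ordered non-maximum suppression over object hypotheses, using the distance between estimated object centers as the overlap proxy; the radius parameter is illustrative.

```python
# Score-ordered non-maximum suppression over object hypotheses, with distance
# between estimated object centers as the overlap proxy; radius is illustrative.
import numpy as np

def nms_hypotheses(centers, scores, radius=0.05):
    """centers: (N,3) hypothesis positions; scores: (N,). Returns kept indices."""
    order = np.argsort(scores)[::-1]     # best-scoring hypothesis first
    kept = []
    for i in order:
        if all(np.linalg.norm(centers[i] - centers[j]) > radius for j in kept):
            kept.append(i)
    return kept
```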
Multi-view registration of unordered range scans by fast correspondence propagation of multi-scale descriptors
This paper proposes a global approach for the multi-view registration of
unordered range scans. Pair-wise registration is pivotal as the basis of
multi-view registration, so we first select a good descriptor and accelerate
its correspondence propagation for pair-wise registration. Then,
we design an effective rule to judge the reliability of pair-wise registration
results. Subsequently, we propose a model augmentation method, which can
utilize reliable results of pair-wise registration to augment the model shape.
Finally, multi-view registration is accomplished by alternating pair-wise
registration, reliability judgment, and model augmentation, as sketched below.
Experimental results on publicly available datasets show that this approach can
automatically achieve multi-view registration of unordered range scans with
good accuracy and effectiveness.
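The sketch below gives only the control flow of the alternating scheme: pair-wise registration against the growing model shape, a reliability judgment, and model augmentation, repeated until no further scan registers. register_fn and is_reliable_fn are caller-supplied stand-ins; all names are illustrative.

```python
# Control-flow skeleton of the alternating scheme; register_fn and
# is_reliable_fn are stand-ins for the paper's pair-wise registration and
# reliability judgment. All names are illustrative.
import numpy as np

def multiview_register(scans, register_fn, is_reliable_fn):
    """scans: list of (Ni,3) arrays. register_fn(scan, model) -> (pose, aligned);
    is_reliable_fn(aligned, model) -> bool."""
    model, poses = scans[0], {0: np.eye(4)}   # seed the model shape with one scan
    pending = list(range(1, len(scans)))
    progress = True
    while pending and progress:               # alternate until nothing new registers
        progress = False
        for idx in list(pending):
            pose, aligned = register_fn(scans[idx], model)
            if is_reliable_fn(aligned, model):       # keep only reliable results
                model = np.vstack([model, aligned])  # model augmentation
                poses[idx] = pose
                pending.remove(idx)
                progress = True
    return poses, model
```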
3D Scan Registration using Curvelet Features in Planetary Environments
Topographic mapping in planetary environments relies on accurate 3D scan
registration methods. However, most global registration algorithms relying on
features such as FPFH and Harris-3D show poor alignment accuracy in these
settings, owing to the weak structure of Mars-like terrain and to
variable-resolution, occluded, sparse range data that are hard to register
without a priori knowledge of the environment. In this paper, we propose an alternative
approach to 3D scan registration using the curvelet transform that performs
multi-resolution geometric analysis to obtain a set of coefficients indexed by
scale (coarsest to finest), angle and spatial position. Features are detected
in the curvelet domain to take advantage of the directional selectivity of the
transform. A descriptor is computed for each feature by calculating the 3D
spatial histogram of the image gradients, and nearest neighbor based matching
is used to calculate the feature correspondences. Correspondence rejection
using Random Sample Consensus identifies inliers, and a locally optimal
Singular Value Decomposition-based estimation of the rigid-body transformation
aligns the laser scans given the re-projected correspondences in the metric
space. Experimental results on a publicly available dataset of a
planetary-analogue indoor facility, as well as simulated and real-world scans
from Neptec Design Group's IVIGMS 3D laser rangefinder at the outdoor CSA Mars
yard, demonstrate improved performance over existing methods in challenging,
sparse Mars-like terrain. (Journal of Field Robotics.)
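The last two stages named above, RANSAC-based correspondence rejection followed by an SVD (Kabsch) fit, are standard techniques and can be sketched as below; the iteration count and inlier threshold are illustrative.

```python
# Standard RANSAC over putative correspondences followed by an SVD (Kabsch)
# refit on the consensus set; iteration count and threshold are illustrative.
import numpy as np

def kabsch(src, dst):
    """Closed-form rigid fit: returns R, t with dst ~ src @ R.T + t."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # reflection guard
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def ransac_register(src, dst, iters=500, thresh=0.05, seed=0):
    """src, dst: (N,3) matched points (same index = putative correspondence)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        R, t = kabsch(src[idx], dst[idx])
        inliers = np.linalg.norm(src @ R.T + t - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return kabsch(src[best_inliers], dst[best_inliers])    # locally optimal refit
```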
A Polynomial-time Solution for Robust Registration with Extreme Outlier Rates
We propose a robust approach for the registration of two sets of 3D points in
the presence of a large amount of outliers. Our first contribution is to
reformulate the registration problem using a Truncated Least Squares (TLS) cost
that makes the estimation insensitive to a large fraction of spurious
point-to-point correspondences. The second contribution is a general framework
to decouple rotation, translation, and scale estimation, which allows solving
in cascade for the three transformations. Since each subproblem (scale,
rotation, and translation estimation) is still non-convex and combinatorial in
nature, our third contribution is to show that (i) TLS scale and
(component-wise) translation estimation can be solved exactly and in polynomial
time via an adaptive voting scheme, (ii) TLS rotation estimation can be relaxed
to a semidefinite program and the relaxation is tight in practice, even in the
presence of an extreme amount of outliers. We validate the proposed algorithm,
named TEASER (Truncated least squares Estimation And SEmidefinite Relaxation),
on standard registration benchmarks, showing that the algorithm outperforms
RANSAC and robust local optimization techniques and compares favorably with
Branch-and-Bound methods while being a polynomial-time algorithm. TEASER can
tolerate up to 99% outliers and returns highly accurate solutions. (Accepted
for publication in Robotics: Science and Systems.)
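The adaptive voting idea for the component-wise TLS translation subproblem can be sketched as a 1-D interval-overlap maximization: each correspondence votes for an interval around its implied translation, and the most-covered region yields the consensus set and estimate. This is a sketch in the spirit of the paper's scheme, not the authors' implementation; beta is the per-measurement noise bound.

```python
# 1-D interval-overlap voting in the spirit of the TLS translation step: each
# correspondence votes for [t_i - beta, t_i + beta]; the most-covered region
# gives the consensus set, whose mean is the TLS estimate. This is a sketch,
# not the authors' code.
import numpy as np

def tls_translation_1d(t_obs, beta):
    """t_obs: (N,) translations implied by each correspondence along one axis."""
    events = [(t - beta, +1) for t in t_obs] + [(t + beta, -1) for t in t_obs]
    events.sort(key=lambda e: (e[0], -e[1]))  # opens before closes at equal x
    best_count, count, best_x = 0, 0, float(t_obs[0])
    for x, step in events:
        count += step
        if count > best_count:
            best_count, best_x = count, x
    inliers = np.abs(np.asarray(t_obs) - best_x) <= beta
    return np.asarray(t_obs)[inliers].mean(), inliers
```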