SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation
In this paper, we introduce an SE(3) diffusion model-based point cloud
registration framework for 6D object pose estimation in real-world scenarios.
Our approach formulates the 3D registration task as a denoising diffusion
process, which progressively refines the pose of the source point cloud to
obtain a precise alignment with the model point cloud. Training our framework
involves two operations: an SE(3) diffusion process and an SE(3) reverse
process. The SE(3) diffusion process gradually perturbs the optimal rigid
transformation of a pair of point clouds by continuously injecting noise
(perturbation transformation). By contrast, the SE(3) reverse process focuses
on learning a denoising network that refines the noisy transformation
step-by-step, bringing it closer to the optimal transformation for accurate
pose estimation. Unlike standard diffusion models used in linear Euclidean
spaces, our diffusion model operates on the SE(3) manifold. This requires
exploiting the linear Lie algebra associated with SE(3) to
constrain the transformation transitions during the diffusion and reverse
processes. Additionally, to effectively train our denoising network, we derive
a registration-specific variational lower bound as the optimization objective
for model learning. Furthermore, we show that our denoising network can be
constructed with a surrogate registration model, making our approach applicable
to different deep registration networks. Extensive experiments demonstrate that
our diffusion registration framework presents outstanding pose estimation
performance on the real-world TUD-L, LINEMOD, and Occluded-LINEMOD datasets.
Comment: Accepted by NeurIPS-202
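The forward SE(3) diffusion step described in this abstract (perturbing the optimal rigid transformation by injecting noise through the Lie algebra) can be illustrated with a minimal sketch. This is not the authors' implementation: the noise model (an isotropic Gaussian twist scaled by a hypothetical `sigma` and applied by left multiplication) and the helper names `se3_exp` and `perturb` are assumptions made for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def se3_exp(xi):
    """Map a twist xi = (rho, omega) in se(3) to a 4x4 matrix in SE(3)."""
    rho, (x, y, z) = xi[:3], xi[3:]
    theta = math.sqrt(x * x + y * y + z * z)
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew-symmetric form of omega
    K2 = matmul(K, K)
    if theta < 1e-8:  # Taylor limits near zero rotation
        a, b, c = 1.0, 0.5, 1.0 / 6.0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
        c = (theta - math.sin(theta)) / theta ** 3
    eye = [[float(i == j) for j in range(3)] for i in range(3)]
    # Rodrigues formula gives R; V couples the rotation with the translation.
    R = [[eye[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)] for i in range(3)]
    V = [[eye[i][j] + b * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]
    t = [sum(V[i][j] * rho[j] for j in range(3)) for i in range(3)]
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def perturb(T, sigma, rng):
    """Left-multiply T by exp(sigma * eps), eps ~ N(0, I_6) (assumed noise model)."""
    xi = [sigma * rng.gauss(0.0, 1.0) for _ in range(6)]
    return matmul(se3_exp(xi), T)


rng = random.Random(0)
# Stand-in "optimal" pose: a 90-degree rotation about z plus a small translation.
T_opt = se3_exp([0.1, 0.0, -0.2, 0.0, 0.0, math.pi / 2])
T_noisy = perturb(T_opt, sigma=0.1, rng=rng)
```

Sampling the noise in the six-dimensional Lie algebra and mapping it back with the exponential keeps every perturbed transformation a valid rigid motion, which adding Gaussian noise directly to the matrix entries would not guarantee.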
TransPose: 6D Object Pose Estimation with Geometry-Aware Transformer
Estimating the 6D object pose is an essential task in many applications. Due
to the lack of depth information, existing RGB-based methods are sensitive to
occlusion and illumination changes. Extracting and using the geometric
features in depth information is crucial for accurate predictions. To
this end, we propose TransPose, a novel 6D pose framework that uses a
Transformer encoder with a geometry-aware module to learn better
point cloud feature representations. Specifically, we first uniformly sample
the point cloud and extract local geometry features with a local feature
extractor based on a graph convolution network. To improve robustness to
occlusion, we adopt a Transformer to exchange global information,
so that each local feature contains global information. Finally, we introduce
a geometry-aware module into the Transformer encoder, which forms an effective
constraint for point cloud feature learning and couples the global information
exchange more tightly with point cloud tasks. Extensive experiments
indicate the effectiveness of TransPose: our pose estimation pipeline achieves
competitive results on three benchmark datasets.
Comment: 10 pages, 5 figures, IEEE Journa
Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors) has
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single shot-based 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single shot-based 6D object pose estimation and next-best-view prediction
based on Hough Forests, the state-of-the-art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features, we a) propose an unsupervised feature learnt from
depth-invariant patches using a Sparse Autoencoder and b) offer an extensive
evaluation of various state-of-the-art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypotheses verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired by realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms the state of
the art on both public datasets and our own.
Comment: CVPR 2016 accepted paper, project page: http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm