HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios
Estimating the 6D pose of objects is a major 3D computer vision problem.
Following the promising outcomes of instance-level approaches, research has
also moved towards category-level pose estimation for more practical
application scenarios. However, unlike well-established instance-level pose
datasets, available category-level datasets lack annotation quality and
sufficient pose quantity. We propose HouseCat6D, a new category-level 6D pose
dataset featuring 1) multi-modality of polarimetric RGB and depth (RGBD+P),
2) 194 highly diverse objects across 10 household object categories, including
2 photometrically challenging categories, 3) high-quality pose annotations with
an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with
extensive viewpoint coverage and occlusions, 5) a checkerboard-free environment
throughout every scene, and 6) additionally annotated dense 6D parallel-jaw
grasps. Furthermore, we provide benchmark results of state-of-the-art
category-level pose estimation networks.
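The millimetre-scale annotation error quoted above is typically measured by comparing an estimated pose against the ground truth as rigid 4x4 transforms. A minimal sketch of that convention follows; the function names are illustrative, not part of the dataset's toolkit:

```python
import numpy as np

def pose_to_matrix(R, t):
    """Compose a 6D pose (3x3 rotation R, translation t) into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def pose_errors(T_est, T_gt):
    """Rotation error in degrees and translation error in the unit of t (e.g. mm)."""
    # Relative rotation between estimate and ground truth.
    R_err = T_est[:3, :3] @ T_gt[:3, :3].T
    # Angle of the relative rotation via the trace formula.
    cos_angle = np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0)
    rot_deg = np.degrees(np.arccos(cos_angle))
    trans = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    return rot_deg, trans
```

A 90-degree in-plane rotation with a 1.5 mm offset, for instance, would score (90.0, 1.5) under this metric.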
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
In this paper, we focus on category-level 6D pose and size estimation from a
monocular RGB-D image. Previous methods suffer from inefficient category-level
pose feature extraction, which leads to low accuracy and inference speed. To
tackle this problem, we propose a fast shape-based network (FS-Net) with
efficient category-level feature extraction for 6D pose estimation. First, we
design an orientation-aware autoencoder with 3D graph convolution for latent
feature extraction. The learned latent feature is insensitive to point shift
and object size thanks to the shift- and scale-invariance properties of the 3D
graph convolution. Then, to efficiently decode category-level rotation
information from the latent feature, we propose a novel decoupled rotation
mechanism that employs two decoders to complementarily access the rotation
information. Meanwhile, we estimate translation and size by two residuals: the
difference between the mean of the object points and the ground-truth
translation, and the difference between the mean size of the category and the
ground-truth size, respectively. Finally, to increase the generalization
ability of FS-Net, we propose an online box-cage-based 3D deformation mechanism
to augment the training data. Extensive experiments on two benchmark datasets
show that the proposed method achieves state-of-the-art performance in both
category- and instance-level 6D object pose estimation. In particular, in
category-level pose estimation, without extra synthetic data, our method
outperforms existing methods by 6.3% on the NOCS-REAL dataset. (Comment:
accepted by CVPR 2021)
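The residual decoding and two-decoder rotation mechanism described above can be sketched roughly as follows. The Gram-Schmidt combination of the two predicted axes is an assumed simplification for illustration, not the paper's exact formulation, and all names are hypothetical:

```python
import numpy as np

def recover_translation_and_size(points, t_residual, mean_category_size, s_residual):
    """FS-Net-style residual decoding (sketch): the network predicts the
    translation as a residual from the mean of the observed object points,
    and the size as a residual from the category's mean size."""
    t = points.mean(axis=0) + t_residual
    size = mean_category_size + s_residual
    return t, size

def rotation_from_two_axes(a1, a2):
    """Assemble a rotation matrix from two decoded, possibly noisy axis
    predictions via Gram-Schmidt orthonormalization."""
    x = a1 / np.linalg.norm(a1)
    y = a2 - np.dot(a2, x) * x      # remove the component along x
    y = y / np.linalg.norm(y)
    z = np.cross(x, y)              # right-handed third axis
    return np.stack([x, y, z], axis=1)
```

Predicting residuals rather than absolute values keeps the regression targets small and centered, which is the usual motivation for this kind of decomposition.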
Recovering 6D pose of rigid object from point cloud at the level of instance and category
Estimating the 3D orientation and 3D position, i.e. the 6D pose, of rigid objects plays an essential role in computer vision tasks. This field has made huge progress with the development of deep learning techniques. However, some challenges still need to be addressed, such as occlusion, viewpoint variation, and intra-class variation in category-level pose estimation. This thesis is built upon addressing the aforementioned problems in 3D space via a point cloud representation. In addressing these problems, the thesis makes three main findings: a point cloud representation in 3D space is more suitable for 6D object pose estimation; feature design is essential to pose estimation tasks; and the rotation representation has an important impact on pose estimation results.

For the first finding, all three proposed pipelines use RGB information for the 2D location of the target object and estimate the 6D pose of the object in the detected region from point cloud input. The experimental results show that this strategy focuses the network on learning useful 3D information from the point cloud, which benefits pose estimation tasks.

As to the second finding, we design different features for different challenges. For the occlusion challenge at the instance level, we propose to extract dense local features by regressing point-wise vectors for pose hypothesis generation, and to select the best pose candidate based on 3D geometry constraints via RANSAC. In this way, the network can better utilize the local 3D information from the point cloud. However, due to the large number of hypotheses, this generation-and-verification strategy is time-consuming. To mitigate the runtime cost and the viewpoint variation problem, we then propose the embedding vector feature. With this newly designed feature, the proposed method can effectively extract the viewpoint information from the training dataset, which leads to fast, over 20 fps, 6D pose estimation of the target object.

However, we still need a large amount of labelled data to train the model. To make the model less dependent on labelled data, this thesis then addresses the category-level pose estimation problem. To handle the intra-class variation in the categorical 6D pose estimation task, we propose to use 3D graph convolution for category-level latent rotation feature learning. Finally, to fully decode the rotation information from the latent feature, we employ two decoders based on the newly designed rotation representation. With this new rotation representation and learned feature, the proposed method achieves state-of-the-art performance at almost real-time speed at the category level.
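The hypothesis-generation-and-verification idea above (point-wise vectors scored under geometric constraints by RANSAC) can be illustrated with a much simplified center-voting sketch. Reducing the problem to recovering a 3D object center, and all names here, are illustrative assumptions rather than the thesis's actual pipeline:

```python
import numpy as np

def ransac_center(points, offsets, n_iters=100, inlier_thresh=0.05, seed=None):
    """Each point votes for the object center via a predicted point-wise
    offset vector; RANSAC picks the candidate center supported by the most
    votes and refines it on the inliers."""
    rng = np.random.default_rng(seed)
    votes = points + offsets                      # each point's predicted center
    best_center, best_count = None, -1
    for _ in range(n_iters):
        cand = votes[rng.integers(len(votes))]    # hypothesis from one vote
        inliers = np.linalg.norm(votes - cand, axis=1) < inlier_thresh
        if inliers.sum() > best_count:
            best_count = inliers.sum()
            best_center = votes[inliers].mean(axis=0)  # refine on inliers
    return best_center
```

The abstract's runtime concern is visible even in this toy version: cost grows with the number of sampled hypotheses, which is what motivates the faster embedding-vector feature discussed next.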
Recovering 6D Object Pose: A Review and Multi-modal Analysis
A large number of studies analyse object detection and pose estimation at the
visual level in 2D, discussing the effects of challenges such as occlusion,
clutter, texture, etc., on the performance of methods that work in the RGB
modality. Interpreting the depth data as well, the study in this paper
presents thorough multi-modal analyses. It discusses the above-mentioned
challenges for full 6D object pose estimation in RGB-D images, comparing the
performance of several 6D detectors in order to answer the following
questions: What is the current position of the computer vision community for
maintaining "automation" in robotic manipulation? What next steps should the
community take for improving "autonomy" in robotics while handling objects? Our
findings include: (i) reasonably accurate results are obtained on textured
objects at varying viewpoints with cluttered backgrounds. (ii) Heavy occlusion
and clutter severely affect the detectors, and similar-looking distractors are
the biggest challenge in recovering instances' 6D pose. (iii) Template-based
methods and random-forest-based learning algorithms underlie object detection
and 6D pose estimation. The recent paradigm is to learn deep discriminative
feature representations and to adopt CNNs taking RGB images as input. (iv)
Given the availability of large-scale 6D annotated depth datasets, feature
representations can be learnt on these datasets, and the learnt
representations can then be customized for the 6D problem.