
    HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios

    Estimating the 6D pose of objects is a major problem in 3D computer vision. Following the promising results of instance-level approaches, research is also moving towards category-level pose estimation for more practical application scenarios. However, unlike the well-established instance-level pose datasets, available category-level datasets fall short in both annotation quality and the number of provided poses. We propose HouseCat6D, a new category-level 6D pose dataset featuring 1) multi-modal polarimetric RGB and depth data (RGBD+P), 2) 194 highly diverse objects across 10 household categories, including 2 photometrically challenging ones, 3) high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage and occlusions, 5) a checkerboard-free environment throughout every scene, and 6) additionally annotated dense 6D parallel-jaw grasps. Furthermore, we provide benchmark results of state-of-the-art category-level pose estimation networks.
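
    The annotations described above are rigid-body (SE(3)) poses. As a purely illustrative sketch (not the dataset's actual API; the helpers apply_pose and pose_errors and the metre-scale pose convention are assumptions), the following shows how such a pose transforms an object point cloud and how a millimetre-scale annotation error could be quantified:

        import numpy as np

        def apply_pose(points, R, t):
            # Transform an (N, 3) object-frame point cloud into the camera frame
            # using a 6D pose given as rotation R (3x3) and translation t (3,).
            return points @ R.T + t

        def pose_errors(R_est, t_est, R_gt, t_gt):
            # Rotation error in degrees from the trace of the relative rotation,
            # translation error in millimetres (poses assumed to be in metres).
            cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
            rot_err_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
            trans_err_mm = np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)) * 1000.0
            return rot_err_deg, trans_err_mm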

    FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

    In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and slow inference. To tackle this problem, we propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation-aware autoencoder with 3D graph convolution for latent feature extraction. Thanks to the shift- and scale-invariance properties of 3D graph convolution, the learned latent feature is insensitive to point shift and object size. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. Meanwhile, we estimate translation and size from two residuals: the difference between the mean of the object points and the ground-truth translation, and the difference between the mean size of the category and the ground-truth size, respectively. Finally, to increase the generalization ability of FS-Net, we propose an online box-cage-based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. In particular, for category-level pose estimation, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset without extra synthetic data. (Comment: accepted by CVPR 2021, oral.)
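
    The residual parameterization described above is simple enough to sketch directly: translation is recovered as the mean of the observed object points plus a predicted offset, and size as the category's precomputed mean size plus a predicted offset. A minimal sketch follows; the function name and signature are assumptions for illustration, not FS-Net's actual code:

        import numpy as np

        def recover_translation_and_size(points, t_residual, s_residual, mean_size):
            # points:     (N, 3) observed object points
            # t_residual: (3,) predicted offset between the point mean and the
            #             ground-truth translation
            # s_residual: (3,) predicted offset between the category mean size
            #             and the ground-truth size
            # mean_size:  (3,) precomputed mean bounding-box size of the category
            translation = points.mean(axis=0) + t_residual
            size = mean_size + s_residual
            return translation, size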

    Recovering 6D pose of rigid object from point cloud at the level of instance and category

    Estimating the 3D orientation and 3D position, i.e. the 6D pose, of rigid objects plays an essential role in computer vision tasks. The field has made huge progress with the development of deep learning techniques, but several challenges still need to be addressed, such as occlusion, viewpoint variation, and intra-class variation in category-level pose estimation. This thesis addresses the aforementioned problems in 3D space via point cloud representations, and reaches three main findings: point cloud representation in 3D space is more suitable for 6D object pose estimation; feature design is essential to pose estimation tasks; and the rotation representation has an important impact on pose estimation results. For the first finding, all three proposed pipelines use RGB information only for the 2D localization of the target object and estimate its 6D pose from the point cloud in the detected region. The experimental results show that this design focuses the network on learning useful 3D information from the point cloud, which benefits pose estimation. As to the second finding, we design different features for different challenges. For the occlusion challenge at the instance level, we propose to extract dense local features by regressing point-wise vectors for pose hypothesis generation, and to select the best pose candidate under 3D geometry constraints with RANSAC. In this way, the network can better exploit local 3D information from the point cloud. However, owing to the large number of hypotheses, this generate-and-verify strategy is time-consuming. To mitigate both the runtime cost and the viewpoint-variation problem, we then propose an embedding vector feature. With this newly designed feature, the proposed method can effectively extract viewpoint information from the training dataset, which leads to fast (over 20 fps) 6D pose estimation of the target object. A large amount of labelled data is still needed to train the model, however; to make the model less dependent on labelled data, the thesis then addresses category-level pose estimation. To handle intra-class variation in the categorical 6D pose estimation task, we propose 3D graph convolution for category-level latent rotation feature learning. Finally, to fully decode the rotation information from the latent feature, we employ two decoders based on a newly designed rotation representation. With this new rotation representation and the learned feature, the proposed method achieves state-of-the-art performance at almost real-time speed at the category level.
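
    The instance-level generate-and-verify step described above can be sketched as follows, assuming each model point already has a tentative (possibly wrong) scene correspondence, e.g. derived from the regressed point-wise vectors. The helper names, iteration count, and inlier threshold are illustrative assumptions, not the thesis's actual implementation:

        import numpy as np

        def kabsch(src, dst):
            # Best-fit rotation and translation aligning src to dst (both (N, 3)).
            src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
            U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
            d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
            R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
            t = dst.mean(0) - R @ src.mean(0)
            return R, t

        def ransac_pose(model_pts, scene_pts, iters=200, inlier_thresh=0.01, seed=None):
            # Generate pose hypotheses from minimal 3-point samples and keep the
            # one with the most points satisfying the 3D geometry constraint.
            rng = np.random.default_rng(seed)
            best_R, best_t, best_inliers = None, None, -1
            for _ in range(iters):
                idx = rng.choice(len(model_pts), size=3, replace=False)
                R, t = kabsch(model_pts[idx], scene_pts[idx])
                residuals = np.linalg.norm(model_pts @ R.T + t - scene_pts, axis=1)
                inliers = int((residuals < inlier_thresh).sum())
                if inliers > best_inliers:
                    best_R, best_t, best_inliers = R, t, inliers
            return best_R, best_t, best_inliers

    With exact correspondences the Kabsch step alone recovers the pose; the RANSAC loop only pays off, at the runtime cost noted above, when many correspondences are noisy or wrong.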

    Recovering 6D Object Pose: A Review and Multi-modal Analysis

    A large number of studies analyse object detection and pose estimation at the visual level in 2D, discussing the effects of challenges such as occlusion, clutter, and texture on the performance of methods that work with the RGB modality. Interpreting depth data as well, the study in this paper presents thorough multi-modal analyses. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images, comparing the performance of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining "automation" in robotic manipulation? What next steps should the community take to improve "autonomy" in robotics while handling objects? Our findings include: (i) reasonably accurate results are obtained for textured objects at varying viewpoints with cluttered backgrounds; (ii) severe occlusion and clutter strongly degrade the detectors, and similar-looking distractors are the biggest challenge in recovering instances' 6D poses; (iii) template-based methods and random-forest-based learning algorithms underlie object detection and 6D pose estimation, while the recent paradigm is to learn deep discriminative feature representations and to adopt CNNs taking RGB images as input; (iv) given the availability of large-scale 6D-annotated depth datasets, feature representations can be learnt on these datasets and then customized for the 6D problem.
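
    As a concrete illustration of the template-based paradigm in finding (iii), the sketch below scores a descriptor of the observed patch against descriptors rendered from the object at known poses and returns the best-matching pose. The descriptor representation and names are assumptions, not any specific detector's implementation:

        import numpy as np

        def match_template(query_desc, template_descs, template_poses):
            # query_desc:     (D,) descriptor of the observed image patch
            # template_descs: (M, D) descriptors rendered at known object poses
            # template_poses: length-M sequence of 4x4 pose matrices
            q = query_desc / np.linalg.norm(query_desc)
            T = template_descs / np.linalg.norm(template_descs, axis=1, keepdims=True)
            scores = T @ q  # cosine similarity against every template
            best = int(np.argmax(scores))
            return template_poses[best], float(scores[best])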