In this paper, we focus on the problem of category-level object pose
estimation, which is challenging due to the large intra-category shape
variation. 3D graph convolution (3D-GC) based methods have been widely used to
extract local geometric features, but they have limitations for complex-shaped
objects and are sensitive to noise. Moreover, the scale- and
translation-invariant properties of 3D-GC restrict the perception of an object's
size and translation information. We propose a simple network structure,
the HS-layer, which extends 3D-GC to extract hybrid scope latent features from
point cloud data for category-level object pose estimation tasks. The proposed
HS-layer: 1) is able to perceive local-global geometric structure and global
information, 2) is robust to noise, and 3) can encode size and translation
information. Our experiments show that simply replacing the 3D-GC layer with
the proposed HS-layer in the baseline method (GPV-Pose) yields a significant
improvement: performance increases by 14.5% on the 5d2cm metric and 10.3% on
IoU75. Our method outperforms the state-of-the-art methods
by a large margin (8.3% on 5d2cm, 6.9% on IoU75) on the REAL275 dataset and
runs in real time (50 FPS).

Comment: Accepted by the 2023 IEEE/CVF Computer Vision and Pattern Recognition
Conference (CVPR)