DCPoint: Global-Local Dual Contrast for Self-Supervised Representation Learning of 3D Point Clouds

Abstract

In recent years, 3D vision has gained increasing prominence in practical applications such as autonomous driving and robotics. However, the scarcity of large labeled point cloud datasets remains a bottleneck for deep networks. Self-supervised representation learning (SRL) has emerged as an effective way to alleviate this issue by pre-training general feature encoders without human annotations. Existing contrastive SRL methods for 3D point clouds have predominantly concentrated on object representations from a global or point perspective, overlooking essential local geometric information and thereby constraining the generalizability of pre-trained models. To address these challenges, we propose a local contrast module as an intermediate level between the scene and point levels, and integrate it with a global contrast module to form a dual contrast method called DCPoint. The local contrast module operates on point-wise representations of objects and constructs contrastive pairs based on the spatial structure of point clouds, effectively handling the sparsity and irregularity of point clouds as well as imperfect partitioning. The point-wise local contrast module strengthens the internal connections between components within a point cloud, while the global contrast module introduces semantic information about individual instances. Experimental results demonstrate the effectiveness of DCPoint across various downstream tasks on synthetic and real-world datasets: it consistently outperforms previously reported SRL methods and randomly initialized counterparts. Additionally, the proposed local contrast module can improve the performance of other SRL methods.
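Since the abstract does not spell out the loss formulation, the following is a minimal sketch of a generic global-local dual contrastive objective of the kind described above. It is an assumption-laden illustration, not the authors' DCPoint implementation: the names `info_nce` and `dual_contrast_loss`, the tensor shapes, the temperature, and the weighting factor `alpha` are all hypothetical, and the actual method additionally constructs local contrastive pairs from spatial information rather than assuming index-aligned correspondences.

```python
# Illustrative sketch only, NOT the authors' exact DCPoint formulation.
# Assumes two augmented views of each point cloud have already been encoded
# and that point-wise features of the two views are index-aligned
# (e.g. matched beforehand by nearest-neighbour correspondence).
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.07):
    """InfoNCE loss: row i of `positives` is the positive for row i of
    `anchors`; all other rows act as negatives."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.t() / temperature  # (N, N) similarity matrix
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)

def dual_contrast_loss(global_a, global_b, point_a, point_b, alpha=1.0):
    """Combine an instance-level (global) and a point-level (local) contrast.

    global_a, global_b: (B, D) pooled embeddings of two augmented views.
    point_a, point_b:   (B, P, D) point-wise features, assumed index-aligned
                        across views.
    """
    loss_global = info_nce(global_a, global_b)
    # Flatten points across the batch so each corresponding point pair is a
    # positive and every other point serves as a negative.
    loss_local = info_nce(point_a.reshape(-1, point_a.size(-1)),
                          point_b.reshape(-1, point_b.size(-1)))
    return loss_global + alpha * loss_local
```

In this sketch the local term is simply a point-wise InfoNCE added to the global one; the relative weighting between the two levels is a free design choice.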