Unsupervised Feature Learning for Point Cloud by Contrasting and Clustering with Graph Convolutional Neural Network
Recently, deep graph neural networks (GNNs) have attracted significant attention for point cloud understanding tasks, including classification, segmentation, and detection. However, training such deep networks still requires a large amount of annotated data, which is both expensive and time-consuming. To alleviate the cost of collecting and annotating large-scale point cloud datasets, we propose an unsupervised learning approach that learns features from an unlabeled "3D object" point cloud dataset by using part contrasting and object clustering with GNNs. In the contrast learning step, every sample in the 3D object dataset is cut into two parts, which are placed into a "part" dataset. A contrast learning GNN (ContrastNet) is then trained to verify whether two randomly sampled parts from the part dataset belong to the same object. In the cluster learning step, the trained ContrastNet is applied to all samples in the original 3D object dataset to extract features, which are used to group the samples into clusters. Another GNN for cluster learning (ClusterNet) is then trained to predict the cluster IDs of all training samples. The contrastive learning forces ContrastNet to learn high-level semantic features of objects but may ignore low-level features, while ClusterNet improves the quality of the learned features by being trained, via the cluster IDs, to discover objects that likely belong to the same semantic categories. We have conducted extensive experiments to evaluate the proposed framework on point cloud classification tasks. The proposed unsupervised approach obtains performance comparable to state-of-the-art unsupervised learning methods that use much more complicated network structures. The code of this work is publicly available via: https://github.com/lingzhang1/ContrastNe
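The part-pair construction described in the abstract can be illustrated with a minimal sketch. This is a hypothetical simplification (the function names and the median-split cut are my own; the paper cuts objects with GNN-based processing of randomly sampled parts), showing only how positive same-object and negative cross-object pairs are formed for the contrast step:

```python
def cut_into_parts(points, axis=0):
    """Split a point cloud into two parts along the median of one axis.
    A stand-in for the paper's object cutting; any bisection works for
    illustrating the pairing logic."""
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return pts[:mid], pts[mid:]

def make_contrast_pairs(dataset):
    """Build (part_a, part_b, label) tuples: label 1 if both parts come
    from the same object, 0 otherwise."""
    parts = [cut_into_parts(obj) for obj in dataset]
    pairs = []
    for i, (a, b) in enumerate(parts):
        pairs.append((a, b, 1))            # same-object positive pair
        j = (i + 1) % len(parts)           # pick a different object
        pairs.append((a, parts[j][1], 0))  # cross-object negative pair
    return pairs
```

ContrastNet would then be trained as a binary verifier over such pairs, before its features seed the clustering step.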
Unsupervised 3D Learning for Shape Analysis via Multiresolution Instance Discrimination
Although unsupervised feature learning has demonstrated its advantages in
reducing the workload of data labeling and network design in many fields,
existing unsupervised 3D learning methods still cannot offer a generic network
for various shape analysis tasks with competitive performance to supervised
methods. In this paper, we propose an unsupervised method for learning a
generic and efficient shape encoding network for different shape analysis
tasks. The key idea of our method is to jointly encode and learn shape and
point features from unlabeled 3D point clouds. For this purpose, we adapt
HR-Net to octree-based convolutional neural networks for jointly encoding shape
and point features with fused multiresolution subnetworks and design a
simple-yet-efficient Multiresolution Instance Discrimination (MID) loss for
jointly learning the shape and point features. Our network takes a 3D point
cloud as input and outputs both shape and point features. After training, the
network is concatenated with simple task-specific back-end layers and
fine-tuned for different shape analysis tasks. We evaluate the efficacy and
generality of our method and validate our network and loss design with a set of
shape analysis tasks, including shape classification, semantic shape
segmentation, as well as shape registration tasks. With simple back-ends, our
network demonstrates the best performance among all unsupervised methods and
achieves competitive performance to supervised methods, especially in tasks
with a small labeled dataset. For fine-grained shape segmentation, our method
even surpasses existing supervised methods by a large margin.
Comment: Accepted by AAAI 2021. Code:
https://github.com/microsoft/O-CNN/blob/master/docs/unsupervised.m
Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering
Semantic segmentation of point clouds usually requires exhaustive human
annotation, so learning from unlabeled data or weaker forms of annotation has
attracted wide attention as a challenging topic. In this paper, we make the
first attempt at fully unsupervised semantic segmentation of point clouds,
which aims to delineate semantically meaningful objects without any form of
annotation. Previous unsupervised pipelines for 2D images fail on this task
for point clouds, due to: 1) Clustering Ambiguity caused by the limited
magnitude of data and imbalanced class distributions; 2) Irregularity
Ambiguity caused by the irregular sparsity of point clouds. Therefore, we
propose a novel framework, PointDC, comprising two steps that address these
problems respectively: Cross-Modal Distillation (CMD) and
Super-Voxel Clustering (SVC). In the first stage, CMD, multi-view visual
features are back-projected into 3D space and aggregated into a unified point
feature that distills supervision into the training of the point
representation. In the second stage, SVC, the point features are aggregated
into super-voxels and then fed to an iterative clustering process that
excavates semantic classes. PointDC
yields significant improvements over prior state-of-the-art unsupervised
methods on both the ScanNet-v2 (+18.4 mIoU) and S3DIS (+11.5 mIoU) semantic
segmentation benchmarks.
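The SVC step described above can be sketched in two parts: averaging point features within each super-voxel, then one assignment step of the iterative clustering. All names here are hypothetical simplifications of the paper's pipeline:

```python
def supervoxel_features(point_feats, sv_ids):
    """Average point features within each super-voxel (the SVC
    aggregation step)."""
    sums, counts = {}, {}
    for f, s in zip(point_feats, sv_ids):
        acc = sums.setdefault(s, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[s] = counts.get(s, 0) + 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}

def assign_clusters(sv_feats, centroids):
    """One iteration of the clustering: assign each super-voxel to its
    nearest centroid by squared Euclidean distance. The full method
    alternates such assignments with centroid/feature updates."""
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return {s: min(range(len(centroids)), key=lambda c: d2(f, centroids[c]))
            for s, f in sv_feats.items()}
```

Operating on super-voxels rather than raw points is what counters the irregular sparsity the abstract calls Irregularity Ambiguity.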
U3DS: Unsupervised 3D Semantic Scene Segmentation
Contemporary point cloud segmentation approaches largely rely on richly
annotated 3D training data. However, it is both time-consuming and challenging
to obtain consistently accurate annotations for such 3D scene data. Moreover,
there is still a lack of investigation into fully unsupervised scene
segmentation for point clouds, especially for holistic 3D scenes. This paper
presents U3DS, a step towards completely unsupervised point cloud
segmentation for any holistic 3D scene. To achieve this, U3DS applies a
generalized unsupervised segmentation method to both objects and background
across indoor and outdoor static 3D point clouds, requires no model
pre-training, and relies only on the inherent information of the point cloud
to achieve full 3D scene segmentation. The initial step of our proposed
approach involves generating superpoints based on the geometric characteristics
of each scene. Subsequently, it undergoes a learning process through a spatial
clustering-based methodology, followed by iterative training using
pseudo-labels generated in accordance with the cluster centroids. Moreover, by
leveraging the invariance and equivariance of the volumetric representations,
we apply the geometric transformation on voxelized features to provide two sets
of descriptors for robust representation learning. Finally, our evaluation
provides state-of-the-art results on the ScanNet and SemanticKITTI benchmark
datasets, and competitive results on S3DIS.
Comment: 10 pages, 4 figures, accepted to IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV) 202
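The invariance idea behind the two-descriptor construction can be illustrated with a toy example: apply a geometric transformation (here a z-axis rotation) and check that a rotation-invariant descriptor of the point set is unchanged. The descriptor below is a deliberately simple stand-in for the voxelized features on which U3DS enforces invariance/equivariance; nothing here is the paper's actual implementation:

```python
import math

def rotate_z(points, theta):
    """Rotate a set of (x, y, z) points about the z axis by `theta`."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def invariant_descriptor(points):
    """Toy rotation-invariant descriptor: sorted distances of points to
    their centroid. Rotations preserve these distances, so the two
    'views' of a scene agree under this descriptor."""
    n = len(points)
    centroid = (sum(p[0] for p in points) / n,
                sum(p[1] for p in points) / n,
                sum(p[2] for p in points) / n)
    d = [math.dist(p, centroid) for p in points]
    return sorted(round(v, 6) for v in d)
```

Training the network so that features of the original and transformed voxels agree pushes it toward the same kind of invariance, without any labels.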
Joint Data and Feature Augmentation for Self-Supervised Representation Learning on Point Clouds
To reduce the exhausting annotation effort, self-supervised representation
learning from unlabeled point clouds has drawn much attention, especially
for augmentation-based contrastive methods. However, specific augmentations
rarely provide sufficient transferability to high-level tasks across
different datasets. Moreover, augmentations on point clouds may also change
the underlying semantics. To address these issues, we propose a simple but
efficient
augmentation fusion contrastive learning framework to combine data
augmentations in Euclidean space and feature augmentations in feature space. In
particular, we propose a data augmentation method based on sampling and graph
generation. Meanwhile, we design a data augmentation network to enable a
correspondence of representations by maximizing consistency between augmented
graph pairs. We further design a feature augmentation network that encourages
the model to learn representations invariant to the perturbations using an
encoder perturbation. We conduct extensive object classification and object
part segmentation experiments to validate the transferability of the
proposed framework. Experimental results demonstrate that the proposed
framework effectively learns point cloud representations in a
self-supervised manner and yields state-of-the-art results in the community.
The source code is publicly available at:
https://zhiyongsu.github.io/Project/AFSRL.html
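The agreement-maximization objective between the two branches (data-augmented and feature-augmented views) can be sketched as a simple consistency loss. This is a generic cosine-based formulation, an assumption for illustration rather than the framework's exact loss:

```python
import math

def cosine_consistency_loss(z1, z2):
    """1 - cosine similarity between two view representations; minimized
    (at 0) when the augmented views produce the same direction in
    feature space, i.e. when representations are invariant to the
    perturbations."""
    dot = sum(a * b for a, b in zip(z1, z2))
    n1 = math.sqrt(sum(a * a for a in z1))
    n2 = math.sqrt(sum(b * b for b in z2))
    return 1.0 - dot / (n1 * n2)
```

Driving this toward zero for pairs built from the same input, across both the Euclidean-space and feature-space augmentations, is what encourages the perturbation-invariant representations the abstract describes.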