Direct Superpoints Matching for Fast and Robust Point Cloud Registration
Although deep neural networks endow the downsampled superpoints with
discriminative feature representations, directly matching them is usually not
used alone in state-of-the-art methods, mainly for two reasons. First, the
correspondences are inevitably noisy, so RANSAC-like refinement is usually
adopted. Such ad hoc postprocessing, however, is slow and not differentiable,
so it cannot be jointly optimized with feature learning. Second, superpoints
are sparse, and thus more RANSAC iterations are needed. Existing approaches use
a coarse-to-fine strategy to propagate the superpoint correspondences to the
point level, but the resulting point-level correspondences are not
discriminative enough and further necessitate postprocessing refinement. In
this paper, we present a simple yet effective
approach to extract correspondences by directly matching superpoints using a
global softmax layer in an end-to-end manner; these correspondences are then
used to determine the rigid transformation between the source and target point
clouds. Compared with methods that directly predict corresponding points, our
approach leverages the rich information in the superpoint matches to obtain a
more accurate estimate of the transformation and to filter out outliers without
any postprocessing refinement. As a result, our approach is not only fast, but
also achieves state-of-the-art results on the challenging ModelNet and 3DMatch
benchmarks. Our code and model weights will be publicly released.
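As a rough illustration of the pipeline described in the abstract above (match superpoints with a softmax over all source-target pairs, then estimate a rigid transformation from the weighted matches), the following Python sketch works on toy 2D data. It is not the authors' implementation: the function names, the temperature value, the per-source argmax selection, and the 2D closed-form fit (used here in place of a 3D SVD-based solver to keep the example dependency-free) are all assumptions made for illustration.

```python
import math


def global_softmax(sim):
    """Softmax over ALL entries of a similarity matrix (assumed form of a 'global' softmax)."""
    peak = max(s for row in sim for s in row)
    exps = [[math.exp(s - peak) for s in row] for row in sim]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def match_superpoints(src_feats, tgt_feats, temperature=0.1):
    """Return (source index, target index, weight) for the best target of each source."""
    sim = [
        [sum(a * b for a, b in zip(f, g)) / temperature for g in tgt_feats]
        for f in src_feats
    ]
    prob = global_softmax(sim)
    matches = []
    for i, row in enumerate(prob):
        j = max(range(len(row)), key=row.__getitem__)
        matches.append((i, j, row[j]))
    return matches


def fit_rigid_2d(src, tgt, matches):
    """Weighted least-squares rotation angle and translation mapping src onto tgt (2D only)."""
    w_sum = sum(w for _, _, w in matches)
    # Weighted centroids of the matched source and target superpoints.
    scx = sum(w * src[i][0] for i, _, w in matches) / w_sum
    scy = sum(w * src[i][1] for i, _, w in matches) / w_sum
    tcx = sum(w * tgt[j][0] for _, j, w in matches) / w_sum
    tcy = sum(w * tgt[j][1] for _, j, w in matches) / w_sum
    cross = 0.0
    dot = 0.0
    for i, j, w in matches:
        ax, ay = src[i][0] - scx, src[i][1] - scy
        bx, by = tgt[j][0] - tcx, tgt[j][1] - tcy
        cross += w * (ax * by - ay * bx)
        dot += w * (ax * bx + ay * by)
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation aligns the rotated source centroid with the target centroid.
    translation = (tcx - (c * scx - s * scy), tcy - (s * scx + c * scy))
    return theta, translation
```

Because each match carries its softmax weight into the fit, low-confidence pairs contribute little to the estimated transformation, which is one simple way to damp outliers without a RANSAC loop.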
When 3D Bounding-Box Meets SAM: Point Cloud Instance Segmentation with Weak-and-Noisy Supervision
Learning from bounding-box annotations has shown great potential in
weakly-supervised 3D point cloud instance segmentation. However, we observe
that existing methods suffer severe performance degradation when the
bounding box annotations are perturbed. To tackle this issue, we propose a
complementary image prompt-induced weakly-supervised point cloud instance
segmentation (CIP-WPIS) method. CIP-WPIS leverages pretrained knowledge
embedded in the 2D foundation model SAM and a 3D geometric prior to obtain
accurate point-wise instance labels from the bounding box annotations.
Specifically, CIP-WPIS first selects image views in which the 3D candidate
points of an instance are fully visible. Then, we generate complementary
background and foreground prompts from projections to obtain SAM 2D instance
mask predictions. Based on these predictions, we assign each point a confidence
value indicating the likelihood that it belongs to the instance. Furthermore, we utilize the 3D
geometric homogeneity provided by superpoints to decide the final instance
label assignments. In this fashion, we achieve high-quality 3D point-wise
instance labels. Extensive experiments on both the ScanNet-v2 and S3DIS benchmarks
demonstrate that our method is robust against noisy 3D bounding-box annotations
and achieves state-of-the-art performance.
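The last step of the abstract above, using superpoints to turn per-point confidences into final instance labels, can be sketched as a simple vote within each superpoint. This is only an illustrative guess at the idea: the function name, the use of a mean, and the 0.5 threshold are assumptions, not details taken from the paper.

```python
from collections import defaultdict


def assign_by_superpoint(confidence, superpoint_id, threshold=0.5):
    """Label every point in a superpoint as 'instance' if the superpoint's mean confidence passes the threshold."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for conf, sp in zip(confidence, superpoint_id):
        sums[sp] += conf
        counts[sp] += 1
    # Superpoints whose average confidence is high enough are kept as a whole.
    kept = {sp for sp in sums if sums[sp] / counts[sp] >= threshold}
    return [sp in kept for sp in superpoint_id]
```

Deciding per superpoint rather than per point means a few noisy confidences (for example, from an imperfect 2D mask) cannot split a geometrically homogeneous region.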
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
In 3D Referring Expression Segmentation (3D-RES), earlier approaches adopt
a two-stage paradigm, extracting segmentation proposals and then matching them
with referring expressions. However, this conventional paradigm faces
significant challenges, most notably low-quality initial proposals and
markedly slower inference. Recognizing
these limitations, we introduce an innovative end-to-end Superpoint-Text
Matching Network (3D-STMN) that is enriched by dependency-driven insights. One
of the keystones of our model is the Superpoint-Text Matching (STM) mechanism.
Unlike traditional methods that navigate through instance proposals, STM
directly correlates linguistic cues with their respective superpoints, i.e.,
clusters of semantically related points. This design lets
our model efficiently harness cross-modal semantic relationships, primarily
by leveraging densely annotated superpoint-text pairs rather than the
sparser instance-text pairs. To strengthen the role of text in guiding
the segmentation process, we further incorporate the Dependency-Driven
Interaction (DDI) module to deepen the network's semantic comprehension of
referring expressions. Guided by dependency trees, this module
discerns the intricate relationships between primary terms and their associated
descriptors in expressions, thereby improving both the localization and
segmentation capacities of our model. Comprehensive experiments on the
ScanRefer benchmark show that our model not only sets new performance
standards, registering an mIoU gain of 11.7 points, but also achieves a
substantial improvement in inference speed, surpassing traditional methods by
95.7 times. The code and models are available at
https://github.com/sosppxo/3D-STMN
Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation
Weakly supervised point cloud segmentation, i.e. semantically segmenting a
point cloud with only a few labeled points in the whole 3D scene, is highly
desirable due to the heavy burden of collecting abundant dense annotations for
the model training. However, existing methods still struggle to accurately
segment 3D point clouds since limited annotated data may lead to insufficient
guidance for label propagation to unlabeled data. Considering that
smoothness-based methods have achieved promising progress, in this paper, we
advocate applying the consistency constraint under various perturbations to
effectively regularize unlabeled 3D points. Specifically, we propose a novel
DAT (Dual Adaptive Transformations) model for weakly
supervised point cloud segmentation, where the dual adaptive transformations
are performed via an adversarial strategy at both the point level and the region level,
aiming to enforce local and structural smoothness constraints on 3D point
clouds. We evaluate our proposed DAT model with two popular backbones on the
large-scale S3DIS and ScanNet-V2 datasets. Extensive experiments demonstrate
that our model can effectively leverage the unlabeled 3D points and achieve
significant performance gains on both datasets, setting new state-of-the-art
performance for weakly supervised point cloud segmentation.
Comment: ECCV 202
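The consistency constraint mentioned in the abstract above can be illustrated with a small loss that penalizes disagreement between predictions on an original point cloud and on a perturbed copy. This is a minimal sketch under stated assumptions: the KL divergence, the function names, and the plain-list inputs are illustrative choices, and the adversarial search for the perturbation itself is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one point's class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(logits_clean, logits_perturbed):
    """Mean KL(p_clean || p_perturbed) over points; zero when predictions agree."""
    total = 0.0
    for clean, perturbed in zip(logits_clean, logits_perturbed):
        p = softmax(clean)
        q = softmax(perturbed)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return total / len(logits_clean)
```

Such a term needs no labels, so it can be applied to every unlabeled point and added to the supervised loss computed on the few annotated points.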