Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
Self-driving cars need to understand 3D scenes efficiently and accurately in
order to drive safely. Given the limited hardware resources, existing 3D
perception models are not able to recognize small instances (e.g., pedestrians,
cyclists) very well due to the low-resolution voxelization and aggressive
downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv),
a lightweight 3D module that equips the vanilla Sparse Convolution with the
high-resolution point-based branch. With negligible overhead, this point-based
branch is able to preserve the fine details even from large outdoor scenes. To
explore the spectrum of efficient 3D models, we first define a flexible
architecture design space based on SPVConv, and we then present 3D Neural
Architecture Search (3D-NAS) to search the optimal network architecture over
this diverse design space efficiently and effectively. Experimental results
validate that the resulting SPVNAS model is fast and accurate: it outperforms
the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive
SemanticKITTI leaderboard. It also achieves 8x computation reduction and 3x
measured speedup over MinkowskiNet with higher accuracy. Finally, we transfer
our method to 3D object detection, and it achieves consistent improvements over
the one-stage detection baseline on KITTI.
Comment: ECCV 2020. The first two authors contributed equally to this work.
Project page: http://spvnas.mit.edu
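Below is a minimal pure-PyTorch sketch of the point-voxel idea: a coarse voxel branch aggregates neighborhood context while a per-point MLP branch preserves fine detail, and the two are fused point-wise. The real SPVConv builds on sparse convolution (torchsparse); the dense Conv3d, the grid size, and all names here are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class PointVoxelBlock(nn.Module):
        # Dense stand-in for SPVConv: sparse convolution is replaced by a
        # Conv3d on a small dense grid to keep the sketch self-contained.
        def __init__(self, channels: int, grid: int = 16):
            super().__init__()
            self.grid = grid
            self.voxel_conv = nn.Conv3d(channels, channels, 3, padding=1)
            self.point_mlp = nn.Sequential(
                nn.Linear(channels, channels), nn.ReLU(),
                nn.Linear(channels, channels),
            )

        def forward(self, xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
            # xyz: (N, 3) coordinates in [0, 1); feats: (N, C) point features.
            N, C = feats.shape
            idx = (xyz.clamp(0, 1 - 1e-6) * self.grid).long()
            flat = (idx[:, 0] * self.grid + idx[:, 1]) * self.grid + idx[:, 2]
            # Voxelize: average the features of points sharing a cell.
            cells = torch.zeros(self.grid ** 3, C).index_add_(0, flat, feats)
            count = torch.zeros(self.grid ** 3).index_add_(0, flat, torch.ones(N))
            cells = cells / count.clamp(min=1).unsqueeze(1)
            vol = self.voxel_conv(cells.t().reshape(1, C, self.grid, self.grid, self.grid))
            # Devoxelize: gather each point's coarse voxel feature and fuse
            # it with the high-resolution point branch by addition.
            coarse = vol.reshape(C, -1).t()[flat]
            return coarse + self.point_mlp(feats)

    out = PointVoxelBlock(32)(torch.rand(1024, 3), torch.randn(1024, 32))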
Improving 3D Semantic Segmentation withTwin-Representation Networks
The growing importance of 3D scene understanding and interpretation is inherently connected to the rise of autonomous driving and robotics. Semantic segmentation of 3D point clouds is a key enabler for this task, providing geometric information enhanced with semantics. To use Convolutional Neural Networks, a proper representation of the point clouds must be chosen. Various representations have been proposed, with different advantages and disadvantages. In this work, we present a twin-representation architecture, which is composed of a 3D point-based and a 2D range image branch, to efficiently extract and refine point-wise features, supported by strong context information. Additionally, a feature propagation strategy is proposed to connect both branches. The approach is evaluated on the challenging SemanticKITTI dataset [2] and considerably outperforms the baseline overall as well as for every individual class. Especially the predictions for distant points are significantly improved.
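A hedged sketch of the twin-representation idea follows: points are spherically projected onto a 2D range image, a small CNN extracts context there, and that context is gathered back per point and fused with the point-wise features. The projection parameters (fov_up, fov_down), layer sizes, and names are our assumptions, not the paper's.

    import math
    import torch
    import torch.nn as nn

    def range_projection(xyz, H=64, W=512, fov_up=3.0, fov_down=-25.0):
        # Map (N, 3) LiDAR points to (row, col) pixels of an H x W range image.
        depth = xyz.norm(dim=1).clamp(min=1e-6)
        yaw = torch.atan2(xyz[:, 1], xyz[:, 0])
        pitch = torch.asin(xyz[:, 2] / depth)
        up, down = math.radians(fov_up), math.radians(fov_down)
        u = ((1 - (pitch - down) / (up - down)) * (H - 1)).long().clamp(0, H - 1)
        v = ((0.5 * (1 + yaw / math.pi)) * (W - 1)).long().clamp(0, W - 1)
        return u, v, depth

    class TwinBranchFusion(nn.Module):
        def __init__(self, c_point=32, c_img=32):
            super().__init__()
            self.img_cnn = nn.Sequential(               # 2D context branch
                nn.Conv2d(1, c_img, 3, padding=1), nn.ReLU(),
                nn.Conv2d(c_img, c_img, 3, padding=1))
            self.fuse = nn.Linear(c_point + c_img, c_point)

        def forward(self, xyz, point_feats, H=64, W=512):
            u, v, depth = range_projection(xyz, H, W)
            img = torch.zeros(1, 1, H, W)
            img[0, 0, u, v] = depth                     # rasterize ranges
            fmap = self.img_cnn(img)[0]                 # (C_img, H, W)
            ctx = fmap[:, u, v].t()                     # gather per-point context
            return self.fuse(torch.cat([point_feats, ctx], dim=1))

    out = TwinBranchFusion()(torch.randn(2048, 3) * 10, torch.randn(2048, 32))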
Panoster: End-to-end Panoptic Segmentation of LiDAR Point Clouds
Panoptic segmentation has recently unified semantic and instance
segmentation, previously addressed separately, thus taking a step further
towards creating more comprehensive and efficient perception systems. In this
paper, we present Panoster, a novel proposal-free panoptic segmentation method
for LiDAR point clouds. Unlike previous approaches relying on several steps to
group pixels or points into objects, Panoster proposes a simplified framework
incorporating a learning-based clustering solution to identify instances. At
inference time, this acts as a class-agnostic segmentation, allowing Panoster
to be fast, while outperforming prior methods in terms of accuracy. Without any
post-processing, Panoster reached state-of-the-art results among published
approaches on the challenging SemanticKITTI benchmark, and further increased
its lead by exploiting heuristic techniques. Additionally, we showcase how our
method can be flexibly and effectively applied on diverse existing semantic
architectures to deliver panoptic predictions.
Comment: Preprint of IEEE RA-L article.
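As an illustration of the proposal-free idea (not Panoster's actual heads), the sketch below pairs a semantic branch with a class-agnostic instance branch; both emit per-point argmax labels, so instances fall out without any grouping post-processing. The slot count and the panoptic label encoding are assumptions.

    import torch
    import torch.nn as nn

    class ProposalFreePanopticHead(nn.Module):
        def __init__(self, c_in=64, num_classes=20, num_slots=50):
            super().__init__()
            self.sem = nn.Linear(c_in, num_classes)   # semantic branch
            self.ins = nn.Linear(c_in, num_slots)     # class-agnostic instances

        @torch.no_grad()
        def forward(self, feats):
            sem_id = self.sem(feats).argmax(dim=1)    # (N,) class per point
            ins_id = self.ins(feats).argmax(dim=1)    # (N,) instance slot per point
            return sem_id * 1000 + ins_id             # fused panoptic id

    pano = ProposalFreePanopticHead()(torch.randn(4096, 64))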
COARSE3D: Class-Prototypes for Contrastive Learning in Weakly-Supervised 3D Point Cloud Segmentation
Annotation of large-scale 3D data is notoriously cumbersome and costly. As an
alternative, weakly-supervised learning alleviates such a need by reducing the
annotation effort by several orders of magnitude. We propose COARSE3D, a novel
architecture-agnostic contrastive learning strategy for 3D segmentation. Since
contrastive learning requires rich and diverse examples as keys and anchors, we
leverage a prototype memory bank capturing class-wise global dataset
information efficiently into a small number of prototypes acting as keys. An
entropy-driven sampling technique then allows us to select good pixels from
predictions as anchors. Experiments on three projection-based backbones show we
outperform baselines on three challenging real-world outdoor datasets, working
with as little as 0.001% of the annotations.
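The entropy-driven anchor selection admits a very small sketch: per-pixel prediction entropy ranks confidence, and the most confident pixels are kept as contrastive anchors. The function name and the choice of k are ours.

    import torch

    def entropy_anchor_sampling(logits: torch.Tensor, k: int) -> torch.Tensor:
        # logits: (N, C) per-pixel class scores; returns indices of k anchors.
        p = logits.softmax(dim=1)
        entropy = -(p * p.clamp(min=1e-12).log()).sum(dim=1)  # Shannon entropy
        # Low entropy = confident prediction: keep the k most confident pixels.
        return entropy.topk(k, largest=False).indices

    anchors = entropy_anchor_sampling(torch.randn(10000, 19), k=256)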
Two Heads are Better than One: Geometric-Latent Attention for Point Cloud Classification and Segmentation
We present an innovative two-headed attention layer that combines geometric
and latent features to segment a 3D scene into semantically meaningful subsets.
Each head combines local and global information, using either the geometric or
latent features, of a neighborhood of points and uses this information to learn
better local relationships. This Geometric-Latent attention layer (Ge-Latto) is
combined with a sub-sampling strategy to capture global features. Our method is
invariant to permutation thanks to the use of shared-MLP layers, and it can
also be used with point clouds with varying densities because the local
attention layer does not depend on the neighbor order. Our proposal is simple
yet robust, which allows it to achieve competitive results in the ShapeNetPart
and ModelNet40 datasets, and the state-of-the-art when segmenting the complex
dataset S3DIS, with 69.2% IoU on Area 5, and 89.7% overall accuracy using
K-fold cross-validation on the 6 areas.
Comment: Accepted at BMVC 2021.
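A hedged sketch in the spirit of the two-headed layer: one head scores each neighbor from its geometric offset, the other from its latent feature difference; shared MLPs keep the layer permutation invariant, and the two scores are summed before a softmax over the neighborhood. Dimensions and the brute-force kNN are illustrative assumptions.

    import torch
    import torch.nn as nn

    class GeoLatentAttention(nn.Module):
        def __init__(self, c: int = 32):
            super().__init__()
            self.geo_score = nn.Sequential(nn.Linear(3, c), nn.ReLU(), nn.Linear(c, 1))
            self.lat_score = nn.Sequential(nn.Linear(c, c), nn.ReLU(), nn.Linear(c, 1))
            self.value = nn.Linear(c, c)

        def forward(self, xyz, feats, k=16):
            # xyz: (N, 3); feats: (N, C). Brute-force kNN, illustration only.
            nn_idx = torch.cdist(xyz, xyz).topk(k, largest=False).indices
            n_xyz, n_feat = xyz[nn_idx], feats[nn_idx]         # (N, k, 3), (N, k, C)
            geo = self.geo_score(n_xyz - xyz.unsqueeze(1))     # offsets -> scores
            lat = self.lat_score(n_feat - feats.unsqueeze(1))  # feature diffs -> scores
            w = (geo + lat).softmax(dim=1)                     # (N, k, 1) weights
            return (w * self.value(n_feat)).sum(dim=1)         # (N, C)

    out = GeoLatentAttention()(torch.rand(512, 3), torch.randn(512, 32))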
Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion
LiDAR point cloud analysis is a core task for 3D computer vision, especially
for autonomous driving. However, due to the severe sparsity and noise
interference in the single sweep LiDAR point cloud, the accurate semantic
segmentation is non-trivial to achieve. In this paper, we propose a novel
sparse LiDAR point cloud semantic segmentation framework assisted by learned
contextual shape priors. In practice, an initial semantic segmentation (SS) of
a single sweep point cloud can be achieved by any appealing network and then
flows into the semantic scene completion (SSC) module as the input. By merging
multiple frames in the LiDAR sequence as supervision, the optimized SSC module
has learned the contextual shape priors from sequential LiDAR data, completing
the sparse single sweep point cloud to the dense one. Thus, it inherently
improves SS optimization through fully end-to-end training. Besides, a
Point-Voxel Interaction (PVI) module is proposed to further enhance the
knowledge fusion between SS and SSC tasks, i.e., promoting the interaction of
incomplete local geometry of point cloud and complete voxel-wise global
structure. Furthermore, the auxiliary SSC and PVI modules can be discarded
during inference without extra burden for SS. Extensive experiments confirm
that our JS3C-Net achieves superior performance on both SemanticKITTI and
SemanticPOSS benchmarks, i.e., 4% and 3% improvement respectively.
Comment: To appear in AAAI 2021. Code is available at
https://github.com/yanx27/JS3C-Net
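The multi-frame supervision step admits a small sketch: successive LiDAR sweeps are transformed into one reference frame to form a denser completion target. The 4x4 pose convention and function name are our assumptions, not the JS3C-Net codebase's.

    import torch

    def merge_sweeps(sweeps, poses, ref_pose):
        # sweeps: list of (Ni, 3) point sets; poses: matching 4x4
        # world-from-sensor transforms. Returns all points expressed in the
        # reference sweep's frame, i.e. a denser pseudo-complete scene.
        ref_inv = torch.linalg.inv(ref_pose)
        merged = []
        for pts, pose in zip(sweeps, poses):
            homo = torch.cat([pts, torch.ones(len(pts), 1)], dim=1)  # (Ni, 4)
            merged.append((homo @ (ref_inv @ pose).T)[:, :3])
        return torch.cat(merged, dim=0)

    eye = torch.eye(4)  # toy data: identity poses
    dense = merge_sweeps([torch.randn(100, 3)] * 3, [eye] * 3, eye)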