Permutation Matters: Anisotropic Convolutional Layer for Learning on Point Clouds
There is a growing demand for efficient representation learning on
point clouds in many 3D computer vision applications. Behind the success story
of convolutional neural networks (CNNs) is that the data (e.g., images) are
Euclidean structured. However, point clouds are irregular and unordered.
Various point neural networks have been developed with isotropic filters or
using weighting matrices to overcome the structure inconsistency on point
clouds. However, isotropic filters or weighting matrices limit the
representation power. In this paper, we propose a permutable anisotropic
convolutional operation (PAI-Conv) that calculates soft-permutation matrices
for each point using dot-product attention according to a set of evenly
distributed kernel points on a sphere's surface and performs shared anisotropic
filters. The dot product with kernel points is analogous to the dot product
with keys in the Transformer widely used in natural language processing (NLP).
From this perspective, PAI-Conv can be regarded as a Transformer for point
clouds: it is physically meaningful and works robustly with the efficient
random point sampling method. Comprehensive
experiments on point clouds demonstrate that PAI-Conv produces competitive
results in classification and semantic segmentation tasks compared to
state-of-the-art methods.
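The core operation described above can be pictured with a minimal NumPy sketch. This is not the paper's code: the function names, shapes, and the scaling factor are illustrative assumptions. It computes a soft-permutation matrix by dot-product attention between neighbor directions (queries) and fixed kernel points on a sphere (keys), then applies a shared anisotropic filter:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pai_conv(neighbor_feats, neighbor_dirs, kernel_points, W):
    """Sketch of a permutable anisotropic convolution for one point.

    neighbor_feats: (K, C)  features of the K neighbors
    neighbor_dirs:  (K, 3)  unit directions from the point to its neighbors
    kernel_points:  (M, 3)  fixed, evenly distributed points on the unit sphere
    W:              (M*C, C_out) shared anisotropic filter weights
    """
    # Dot-product attention: each kernel point scores every neighbor,
    # yielding a soft-permutation matrix of shape (M, K).
    P = softmax(kernel_points @ neighbor_dirs.T / np.sqrt(3.0), axis=-1)
    aligned = P @ neighbor_feats       # (M, C): neighbors softly re-ordered onto kernel slots
    return aligned.reshape(-1) @ W     # (C_out,): shared anisotropic filter
```

Because each row of `P` is a softmax over all neighbors, the output is invariant to the input ordering of the neighbors, which is what lets an anisotropic (position-dependent) filter be applied to unordered points.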
GAPNet: Graph Attention based Point Neural Network for Exploiting Local Feature of Point Cloud
Exploiting fine-grained semantic features on point clouds is still challenging
due to their irregular and sparse structure in a non-Euclidean space. Among
existing studies, PointNet provides an efficient and promising approach to
learn shape features directly on unordered 3D point clouds and has achieved
competitive performance. However, local features, which are helpful for better
contextual learning, are not considered. Meanwhile, the attention mechanism has
shown its efficiency in capturing node representations on graph-based data by attending
over neighboring nodes. In this paper, we propose a novel neural network for
point cloud, dubbed GAPNet, to learn local geometric representations by
embedding graph attention mechanism within stacked Multi-Layer-Perceptron (MLP)
layers. Firstly, we introduce a GAPLayer that learns attention features for each
point by assigning different attention weights to its neighborhood. Secondly, in
order to exploit sufficient features, a multi-head mechanism is employed to
allow GAPLayer to aggregate different features from independent heads. Thirdly,
we propose an attention pooling layer over neighbors to capture local signature
aimed at enhancing network robustness. Finally, GAPNet applies stacked MLP
layers to attention features and local signature to fully extract local
geometric structures. The proposed GAPNet architecture is tested on the
ModelNet40 and ShapeNet part datasets, and achieves state-of-the-art
performance in both shape classification and part segmentation tasks.
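A single head of such a neighborhood attention layer can be sketched in NumPy. This is a hypothetical simplification, not GAPNet's implementation; the names `gap_head`, `W`, and `a` are my assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gap_head(x_i, neighbors, W, a):
    """One attention head of a simplified, GAPLayer-style module.

    x_i:       (C,)   feature of the center point
    neighbors: (K, C) features of its K neighbors
    W:         (C, D) shared linear transform
    a:         (2*D,) attention vector scoring center/neighbor pairs
    """
    h_i = x_i @ W                                    # (D,)
    h_j = neighbors @ W                              # (K, D)
    pair = np.concatenate([np.tile(h_i, (len(h_j), 1)), h_j], axis=-1)  # (K, 2D)
    alpha = softmax(np.tanh(pair @ a))               # attention weights over the neighborhood
    return alpha @ h_j                               # (D,) attention feature for this point
```

A multi-head mechanism, as the abstract describes, would run several such heads with independent `(W, a)` pairs and aggregate (e.g., concatenate) their outputs.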
Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review
Recently, the advancement of deep learning in discriminative feature learning
from 3D LiDAR data has led to rapid development in the field of autonomous
driving. However, automated processing of uneven, unstructured, noisy, and massive
3D point clouds is a challenging and tedious task. In this paper, we provide a
systematic review of existing compelling deep learning architectures applied to
LiDAR point clouds, with a focus on specific tasks in autonomous driving such as
segmentation, detection, and classification. Although several published
research papers focus on specific topics in computer vision for autonomous
vehicles, to date, no general survey on deep learning applied in LiDAR point
clouds for autonomous vehicles exists. Thus, the goal of this paper is to
narrow the gap in this topic. More than 140 key contributions in the recent
five years are summarized in this survey, including the milestone 3D deep
architectures, the remarkable deep learning applications in 3D semantic
segmentation, object detection, and classification; specific datasets,
evaluation metrics, and the state-of-the-art performance. Finally, we conclude
with the remaining challenges and future research directions.
Comment: 21 pages, submitted to IEEE Transactions on Neural Networks and
Learning Systems
Deep Closest Point: Learning Representations for Point Cloud Registration
Point cloud registration is a key problem for computer vision applied to
robotics, medical imaging, and other applications. This problem involves
finding a rigid transformation from one point cloud into another so that they
align. Iterative Closest Point (ICP) and its variants provide simple and
easily-implemented iterative methods for this task, but these algorithms can
converge to spurious local optima. To address local optima and other
difficulties in the ICP pipeline, we propose a learning-based method, titled
Deep Closest Point (DCP), inspired by recent techniques in computer vision and
natural language processing. Our model consists of three parts: a point cloud
embedding network; an attention-based module combined with a pointer generation
layer to approximate combinatorial matching; and a differentiable singular
value decomposition (SVD) layer to extract the final rigid transformation. We
train our model end-to-end on the ModelNet40 dataset and show in several
settings that it performs better than ICP, its variants (e.g., Go-ICP, FGR),
and the recently-proposed learning-based method PointNetLK. Beyond providing a
state-of-the-art registration technique, we evaluate the suitability of our
learned features transferred to unseen objects. We also provide preliminary
analysis of our learned model to help understand whether domain-specific and/or
global features facilitate rigid registration.
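The final SVD step is the classical orthogonal-Procrustes (Kabsch) solution, which the abstract says DCP makes differentiable. A NumPy sketch of that closed-form step, assuming the matched pairs have already been produced by the attention/pointer stage (the function name and interface are mine):

```python
import numpy as np

def svd_rigid_align(src, tgt_matched):
    """Closed-form rigid transform (R, t) minimizing ||R @ src_i + t - tgt_i||^2.

    src, tgt_matched: (N, 3) corresponding point pairs.
    """
    mu_s, mu_t = src.mean(axis=0), tgt_matched.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - mu_s).T @ (tgt_matched - mu_t)
    U, _, Vt = np.linalg.svd(H)
    # Fix a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_s
    return R, t
```

In a deep pipeline this layer is differentiable with respect to the (soft) correspondences, so gradients flow back into the embedding and attention modules.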
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition
Capturing long-range dependencies in feature representations is crucial for
many visual recognition tasks. Despite recent successes of deep convolutional
networks, it remains challenging to model non-local context relations between
visual features. A promising strategy is to model the feature context by a
fully-connected graph neural network (GNN), which augments traditional
convolutional features with an estimated non-local context representation.
However, most GNN-based approaches require computing a dense graph affinity
matrix and hence have difficulty in scaling up to tackle complex real-world
visual problems. In this work, we propose an efficient and yet flexible
non-local relation representation based on a novel class of graph neural
networks. Our key idea is to introduce a latent space to reduce the complexity
of the graph, which allows us to use a low-rank representation for the graph
affinity matrix and to achieve a linear complexity in computation. Extensive
experimental evaluations on three major visual recognition tasks show that our
method outperforms prior works by a large margin while maintaining a low
computation cost.
Comment: ICML 2019
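The low-rank idea can be illustrated with a small NumPy sketch (a hypothetical simplification; `latent_nonlocal` and `Phi` are my names, and the latent-space message passing is reduced to identity): instead of an N x N affinity matrix, features are aggregated into d latent nodes and broadcast back, for O(N*d*C) cost rather than O(N^2*C).

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_nonlocal(X, Phi):
    """Sketch of a low-rank non-local block.

    X:   (N, C) visible node features (e.g., flattened conv features)
    Phi: (N, d) learned visible-to-latent affinities, with d << N
    """
    # Aggregate the N visible nodes into d latent nodes...
    Z = softmax(Phi, axis=0).T @ X      # (d, C)
    # ...then broadcast the latent context back to every visible node.
    ctx = softmax(Phi, axis=1) @ Z      # (N, C)
    return X + ctx                      # residual connection
```

Both matrix products touch only N*d entries, which is what gives the linear complexity in N claimed above.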
Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling
Geometric deep learning is increasingly important thanks to the popularity of
3D sensors. Inspired by recent advances in the NLP domain, we introduce the
self-attention Transformer to process point clouds. We develop Point
Attention Transformers (PATs), using a parameter-efficient Group Shuffle
Attention (GSA) to replace the costly Multi-Head Attention. We demonstrate its
ability to process size-varying inputs, and prove its permutation equivariance.
Besides, prior work relies on data-dependent heuristics (e.g.,
Furthest Point Sampling) to hierarchically select subsets of input points.
We therefore propose, for the first time, an end-to-end learnable and
task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select
a representative subset of input points. Equipped with Gumbel-Softmax, it
produces a "soft" continuous subset in the training phase, and a "hard" discrete
subset in the test phase. By selecting representative subsets in a hierarchical
fashion, the networks learn a stronger representation of the input sets with
lower computation cost. Experiments on classification and segmentation
benchmarks show the effectiveness and efficiency of our methods. Furthermore,
we propose a novel application: processing an event camera stream as point clouds,
and achieve state-of-the-art performance on the DVS128 Gesture Dataset.
Comment: CVPR 2019
Dynamic Graph CNN for Learning on Point Clouds
Point clouds provide a flexible geometric representation suitable for
countless applications in computer graphics; they also comprise the raw output
of most 3D data acquisition devices. While hand-designed features on point
clouds have long been proposed in graphics and vision, the recent
overwhelming success of convolutional neural networks (CNNs) for image analysis
suggests the value of adapting insights from CNNs to the point cloud world. Point
clouds inherently lack topological information so designing a model to recover
topology can enrich the representation power of point clouds. To this end, we
propose a new neural network module dubbed EdgeConv suitable for CNN-based
high-level tasks on point clouds including classification and segmentation.
EdgeConv acts on graphs dynamically computed in each layer of the network. It
is differentiable and can be plugged into existing architectures. Compared to
existing modules operating in extrinsic space or treating each point
independently, EdgeConv has several appealing properties: It incorporates local
neighborhood information; it can be stacked to learn global shape
properties; and in multi-layer systems, affinity in feature space captures
semantic characteristics over potentially long distances in the original
embedding. We show the performance of our model on standard benchmarks
including ModelNet40, ShapeNetPart, and S3DIS.
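A minimal NumPy sketch of a single EdgeConv layer, under assumed shapes (this is an illustration, not the paper's implementation): edge features are built from each point and its k nearest neighbors, and the k-NN graph is recomputed from the current features, which is what makes the graph dynamic when layers are stacked.

```python
import numpy as np

def knn_graph(X, k):
    """Indices of the k nearest neighbors of each row of X (excluding itself)."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def edge_conv(X, k, W):
    """One EdgeConv-style layer.

    X: (N, C) point features; W: (2*C, C_out) shared edge-MLP weights.
    """
    idx = knn_graph(X, k)                            # graph from *current* features
    x_i = np.repeat(X[:, None, :], k, axis=1)        # (N, k, C) center features
    x_j = X[idx]                                     # (N, k, C) neighbor features
    edges = np.concatenate([x_i, x_j - x_i], axis=-1)  # edge feature h(x_i, x_j - x_i)
    return np.maximum(edges @ W, 0.0).max(axis=1)    # ReLU + max aggregation -> (N, C_out)
```

In the first layer `X` holds coordinates, so neighborhoods are metric; in deeper layers neighborhoods form in feature space, which is how semantically similar but spatially distant points become connected.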
RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds
We study the problem of efficient semantic segmentation for large-scale 3D
point clouds. Due to their reliance on expensive sampling techniques or
computationally heavy pre/post-processing steps, most existing approaches can
only be trained on and operate over small-scale point clouds. In this paper, we introduce
RandLA-Net, an efficient and lightweight neural architecture to directly infer
per-point semantics for large-scale point clouds. The key to our approach is to
use random point sampling instead of more complex point selection approaches.
Although remarkably computation- and memory-efficient, random sampling can
discard key features by chance. To overcome this, we introduce a novel local
feature aggregation module to progressively increase the receptive field for
each 3D point, thereby effectively preserving geometric details. Extensive
experiments show that our RandLA-Net can process 1 million points in a single
pass, up to 200x faster than existing approaches. Moreover, our RandLA-Net
clearly surpasses state-of-the-art approaches for semantic segmentation on two
large-scale benchmarks, Semantic3D and SemanticKITTI.
Comment: CVPR 2020 Oral. Code and data are available at:
https://github.com/QingyongHu/RandLA-Net
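The cost argument behind the abstract's key design choice can be made concrete with a sketch (function names are mine): random sampling is O(m), while the farthest point sampling used by many prior pipelines is O(N*m), which becomes prohibitive for millions of points.

```python
import numpy as np

def random_sample(points, m, rng):
    """O(m) sampling: the cheap strategy RandLA-Net builds on."""
    return points[rng.choice(len(points), size=m, replace=False)]

def farthest_point_sample(points, m):
    """O(N*m) farthest point sampling, shown here only for cost contrast:
    each of the m rounds scans all N distances."""
    d = ((points - points[0]) ** 2).sum(-1)
    sel = [0]
    for _ in range(m - 1):
        nxt = int(d.argmax())
        sel.append(nxt)
        d = np.minimum(d, ((points - points[nxt]) ** 2).sum(-1))
    return points[sel]
```

The abstract's local feature aggregation module then compensates for the features that random sampling may drop by chance, by widening each remaining point's receptive field.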
MeshCNN: A Network with an Edge
Polygonal meshes provide an efficient representation for 3D shapes. They
explicitly capture both shape surface and topology, and leverage non-uniformity
to represent large flat regions as well as sharp, intricate features. This
non-uniformity and irregularity, however, inhibits mesh analysis efforts using
neural networks that combine convolution and pooling operations. In this paper,
we utilize the unique properties of the mesh for a direct analysis of 3D shapes
using MeshCNN, a convolutional neural network designed specifically for
triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized
convolution and pooling layers that operate on the mesh edges, by leveraging
their intrinsic geodesic connections. Convolutions are applied on edges and the
four edges of their incident triangles, and pooling is applied via an edge
collapse operation that retains surface topology, thereby generating new mesh
connectivity for the subsequent convolutions. MeshCNN learns which edges to
collapse, thus forming a task-driven process where the network exposes and
expands the important features while discarding the redundant ones. We
demonstrate the effectiveness of our task-driven pooling on various learning
tasks applied to 3D meshes.
Comment: For a two-minute explanation video see https://bit.ly/meshcnnvideo
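The edge convolution described above has a subtlety worth sketching: the two edges of each incident triangle have no canonical order, so order-invariant (symmetric) combinations of each pair are convolved instead of the raw pair. A minimal NumPy illustration in that spirit (the interface and weight layout are my assumptions, not the paper's code):

```python
import numpy as np

def mesh_edge_conv(e, a, b, c, d, W):
    """Convolution over an edge e and the four edges (a, b) and (c, d)
    of its two incident triangles.

    e, a, b, c, d: (C,) edge features; W: (5*C, C_out) filter weights.
    Symmetric combinations (|a - b|, a + b) remove the ordering
    ambiguity within each triangle's edge pair.
    """
    feats = np.concatenate([e, np.abs(a - b), a + b, np.abs(c - d), c + d])
    return feats @ W
```

Pooling by edge collapse would then merge edge features along the mesh while keeping the surface topology intact, so the next convolution sees a coarser but still valid edge neighborhood.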
Relation-Shape Convolutional Neural Network for Point Cloud Analysis
Point cloud analysis is very challenging, as the shape implied in irregular
points is difficult to capture. In this paper, we propose RS-CNN, namely,
Relation-Shape Convolutional Neural Network, which extends regular grid CNN to
irregular configuration for point cloud analysis. The key to RS-CNN is learning
from relation, i.e., the geometric topology constraint among points.
Specifically, the convolutional weights for a local point set are forced to
learn a high-level relation expression from predefined geometric priors between
a sampled point of this point set and the others. In this way, an inductive
local representation with explicit reasoning about the spatial layout of points
can be obtained, which leads to much better shape awareness and robustness. With this
convolution as a basic operator, a hierarchical architecture, RS-CNN, can be
developed to achieve contextual shape-aware learning for point cloud analysis.
Extensive experiments on challenging benchmarks across three tasks verify
that RS-CNN achieves state-of-the-art performance.
Comment: Accepted to CVPR 2019 as an oral presentation. Project page at
https://yochengliu.github.io/Relation-Shape-CNN
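The "learning from relation" idea can be sketched in NumPy (a hypothetical simplification; the relation vector below uses one plausible set of geometric priors, and all names and shapes are mine): a small MLP maps low-level geometric relations to per-neighbor convolutional weights.

```python
import numpy as np

def rs_conv(center, neighbors, feats, W1, W2):
    """Relation-shape convolution sketch for one sampled point.

    center:    (3,)   xyz of the sampled point
    neighbors: (K, 3) xyz of its local point set
    feats:     (K, C) features of the local point set
    W1: (10, H), W2: (H, C) weights of the relation-learning MLP
    """
    diff = neighbors - center
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)
    # Predefined geometric prior: a low-level relation vector per neighbor.
    rel = np.concatenate(
        [np.broadcast_to(center, neighbors.shape), neighbors, diff, dist], axis=-1)  # (K, 10)
    w = np.maximum(rel @ W1, 0.0) @ W2    # (K, C): weights learned *from relation*
    return (w * feats).max(axis=0)        # permutation-invariant aggregation -> (C,)
```

Because the weights are a function of geometry rather than of a fixed grid position, the operator adapts to irregular point configurations, which is the extension of grid CNNs the abstract describes.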