ConvPoint: Continuous Convolutions for Point Cloud Processing
Point clouds are unstructured and unordered data, as opposed to images. Thus, most machine learning approaches developed for images cannot be directly transferred to point clouds. In this paper, we propose a generalization of discrete convolutional neural networks (CNNs) to point clouds by replacing discrete kernels with continuous ones. This formulation is simple, allows arbitrary point cloud sizes, and can easily be used to design neural networks similarly to 2D CNNs. We present experimental results with various architectures, highlighting the flexibility of the proposed approach. We obtain results competitive with the state of the art on shape classification, part segmentation, and semantic segmentation of large-scale point clouds.
Comment: 12 pages
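The core idea, replacing a discrete kernel with a weighting function of the continuous relative positions of neighbors, can be sketched in numpy as follows. The tiny MLP kernel and its random parameters are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_mlp(rel_pos, w1, w2):
    """Continuous kernel: maps relative 3D offsets to scalar weights.

    Stands in for a learned weighting function; w1, w2 are the (here
    random, in practice trained) MLP parameters.
    """
    h = np.tanh(rel_pos @ w1)          # (k, hidden)
    return h @ w2                      # (k,) one weight per neighbor

def continuous_conv(points, feats, center_idx, k, w1, w2):
    """One continuous-convolution output for a single center point.

    Gathers the k nearest neighbors of the center, weights each
    neighbor's feature by a function of its relative position, and sums.
    """
    center = points[center_idx]
    d = np.linalg.norm(points - center, axis=1)
    nbr = np.argsort(d)[:k]                     # k nearest neighbors
    rel = points[nbr] - center                  # continuous relative positions
    w = kernel_mlp(rel, w1, w2)                 # (k,) kernel weights
    return (w[:, None] * feats[nbr]).sum(axis=0) / k

points = rng.normal(size=(100, 3))
feats = rng.normal(size=(100, 8))
w1 = rng.normal(size=(3, 16))
w2 = rng.normal(size=16)
out = continuous_conv(points, feats, 0, k=16, w1=w1, w2=w2)
```

Because the kernel is a function of position rather than a grid lookup, the same operator applies to any point cloud size, which is what lets such networks be assembled like 2D CNNs.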
Dilated Point Convolutions: On the Receptive Field Size of Point Convolutions on 3D Point Clouds
In this work, we propose Dilated Point Convolutions (DPC). In a thorough
ablation study, we show that the receptive field size is directly related to
the performance of 3D point cloud processing tasks, including semantic
segmentation and object classification. Point convolutions are widely used to
efficiently process 3D data representations such as point clouds or graphs.
However, we observe that the receptive field size of recent point convolutional networks is inherently limited. Our dilated point convolutions alleviate this issue by significantly increasing the receptive field size of point convolutions. Importantly, our dilation mechanism can easily be integrated into most existing point convolutional networks. To evaluate the resulting network architectures, we visualize the receptive field and report competitive scores on popular point cloud benchmarks.
Comment: ICRA 2020. Video: https://www.youtube.com/watch?v=JDfFmuOvMkM Project: https://francisengelmann.github.io/DPC
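A dilated point neighborhood can be sketched as follows, a minimal numpy version: take the k*d nearest neighbors and keep every d-th. This matches the general idea of dilation (wider receptive field at the same neighbor count), not necessarily the paper's exact scheme:

```python
import numpy as np

def dilated_knn(points, center_idx, k, d):
    """Dilated k-NN: from the k*d nearest neighbors, keep every d-th.

    d=1 reduces to ordinary k-NN; larger d widens the receptive field
    while the neighbor count (and thus compute per point) stays at k.
    """
    center = points[center_idx]
    dist = np.linalg.norm(points - center, axis=1)
    order = np.argsort(dist)[:k * d]   # k*d nearest, sorted by distance
    return order[::d]                  # every d-th -> k neighbors

rng = np.random.default_rng(1)
pts = rng.normal(size=(200, 3))
plain = dilated_knn(pts, 0, k=8, d=1)     # ordinary 8-NN
dilated = dilated_knn(pts, 0, k=8, d=4)   # same count, reaches farther out
```

Since the neighbor indices feed into whatever point convolution follows, swapping `d=1` for `d>1` is the drop-in integration the abstract describes.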
A Review on Deep Learning Techniques Applied to Semantic Segmentation
Image semantic segmentation is of increasing interest to computer vision and machine learning researchers. Many emerging applications need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems, to name a few. This
demand coincides with the rise of deep learning approaches in almost every
field or application target related to computer vision, including semantic
segmentation or scene understanding. This paper provides a review on deep
learning methods for semantic segmentation applied to various application
areas. First, we describe the terminology of this field as well as the necessary background concepts. Next, the main datasets and challenges are presented to help researchers decide which ones best suit their needs and their
targets. Then, existing methods are reviewed, highlighting their contributions
and their significance in the field. Finally, quantitative results are given
for the described methods and the datasets in which they were evaluated,
following up with a discussion of the results. Lastly, we point out a set of promising directions for future work and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.
Comment: Submitted to TPAMI on Apr. 22, 201
Octree guided CNN with Spherical Kernels for 3D Point Clouds
We propose an octree guided neural network architecture and spherical
convolutional kernel for machine learning from arbitrary 3D point clouds. The
network architecture capitalizes on the sparse nature of irregular point
clouds, and hierarchically coarsens the data representation with space
partitioning. At the same time, the proposed spherical kernels systematically
quantize point neighborhoods to identify local geometric structures in the
data, while maintaining the properties of translation-invariance and asymmetry.
We specify spherical kernels with the help of network neurons that in turn are associated with spatial locations. We exploit this association to avert dynamic kernel generation during network training, which enables efficient learning with high-resolution point clouds. The effectiveness of the proposed technique is established on the benchmark tasks of 3D object classification and segmentation, achieving new state-of-the-art results on the ShapeNet and RueMonge2014 datasets.
Comment: Accepted in IEEE CVPR 2019. arXiv admin note: substantial text overlap with arXiv:1805.0787
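Quantizing a point neighborhood into spherical bins, each of which would own a learned kernel weight, can be sketched as follows. Bin counts and radius are illustrative choices, not values from the paper:

```python
import numpy as np

def spherical_bin(rel, n_r, n_az, n_el, radius):
    """Map a relative offset to a spherical-kernel bin index.

    The neighborhood sphere is split into n_r radial shells, n_az
    azimuthal sectors, and n_el elevation bands; a learned weight per
    bin then plays the role a grid cell plays in a discrete kernel.
    """
    r = np.linalg.norm(rel)
    az = np.arctan2(rel[1], rel[0])                        # [-pi, pi]
    el = np.arcsin(np.clip(rel[2] / max(r, 1e-9), -1, 1))  # [-pi/2, pi/2]
    i_r = min(int(r / radius * n_r), n_r - 1)
    i_az = min(int((az + np.pi) / (2 * np.pi) * n_az), n_az - 1)
    i_el = min(int((el + np.pi / 2) / np.pi * n_el), n_el - 1)
    return (i_r * n_az + i_az) * n_el + i_el               # flat bin index

# A neighbor on the +x axis at range 0.3: innermost shell, middle
# azimuth sector, upper elevation band.
idx = spherical_bin(np.array([0.3, 0.0, 0.0]), n_r=3, n_az=8, n_el=2, radius=1.0)
```

The convolution itself would then be a weighted sum of neighbor features, each neighbor contributing through the weight of the bin its offset falls into.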
Sensor Fusion for Joint 3D Object Detection and Semantic Segmentation
In this paper, we present an extension to LaserNet, an efficient and
state-of-the-art LiDAR based 3D object detector. We propose a method for fusing
image data with the LiDAR data and show that this sensor fusion method improves
the detection performance of the model especially at long ranges. The addition
of image data is straightforward and does not require image labels.
Furthermore, we expand the capabilities of the model to perform 3D semantic
segmentation in addition to 3D object detection. On a large benchmark dataset,
we demonstrate that our approach achieves state-of-the-art performance on both object detection and semantic segmentation while maintaining a low runtime.
Comment: Accepted for publication at CVPR Workshop on Autonomous Driving 201
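The generic mechanism behind this kind of fusion, projecting each LiDAR point into the camera and sampling the image feature map at that pixel, can be sketched as follows. The intrinsics matrix and feature map are hypothetical stand-ins for a calibrated rig; this is a textbook pinhole projection, not LaserNet's specific fusion module:

```python
import numpy as np

def fuse_image_features(points, image_feats, K):
    """Attach image features to LiDAR points via pinhole projection.

    Each point (already in the camera frame) is projected with
    intrinsics K; points behind the camera or outside the image get
    zero features.
    """
    h, w, c = image_feats.shape
    fused = np.zeros((len(points), c))
    for i, p in enumerate(points):
        if p[2] <= 0:                    # behind the camera plane
            continue
        u, v, z = K @ p
        u, v = int(u / z), int(v / z)    # perspective division
        if 0 <= u < w and 0 <= v < h:
            fused[i] = image_feats[v, u]  # sample the feature map
    return fused

# Illustrative intrinsics for a 640x480 camera.
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
rng = np.random.default_rng(7)
pts = rng.normal(size=(100, 3)) + [0, 0, 5]   # points in front of the camera
img = rng.normal(size=(480, 640, 8))          # an 8-channel feature map
fused = fuse_image_features(pts, img, K)
```

Note that this only needs the camera calibration, not image labels, which is consistent with the abstract's claim that adding image data is straightforward.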
RIU-Net: Embarrassingly simple semantic segmentation of 3D LiDAR point cloud
This paper proposes RIU-Net (for Range-Image U-Net), the adaptation of a
popular semantic segmentation network for the semantic segmentation of a 3D
LiDAR point cloud. The point cloud is turned into a 2D range-image by
exploiting the topology of the sensor. This image is then used as input to a
U-Net. This architecture has already proven its effectiveness for the task of semantic segmentation of medical images. We demonstrate how it can also be used
for the accurate semantic segmentation of a 3D LiDAR point cloud and how it
represents a valid bridge between image processing and 3D point cloud
processing. Our model is trained on range-images built from KITTI 3D object
detection dataset. Experiments show that RIU-Net, despite being very simple,
offers results that are comparable to the state-of-the-art of range-image based
methods. Finally, we demonstrate that this architecture is able to operate at 90 fps on a single GPU, which enables deployment for real-time segmentation.
Comment: This version of the article contains revised scores to match the evaluation process presented in SqueezeSegV
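The range-image construction exploited above can be sketched as a spherical projection of the scan: rows come from elevation, columns from azimuth, and each pixel stores the range of the point that lands there. The field-of-view values below are illustrative (roughly HDL-64-like), not taken from the paper:

```python
import numpy as np

def to_range_image(points, h=64, w=512, fov_up=3.0, fov_down=-25.0):
    """Project LiDAR points (N, 3) onto an (h, w) range image.

    Azimuth maps to columns, elevation to rows; empty pixels stay 0 and
    the last point projected into a pixel wins.
    """
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                                  # [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1, 1))
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = ((1 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = ((fu - pitch) / (fu - fd) * h).clip(0, h - 1).astype(int)
    img = np.zeros((h, w))
    img[v, u] = r
    return img

rng = np.random.default_rng(2)
pts = rng.normal(size=(1000, 3)) * [10, 10, 1]   # a flat-ish toy scan
img = to_range_image(pts)
```

Once the cloud is in this 2D form, any image segmentation network (here a U-Net) applies unchanged, which is the bridge the abstract describes.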
RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds
We study the problem of efficient semantic segmentation for large-scale 3D
point clouds. Because they rely on expensive sampling techniques or computationally heavy pre-/post-processing steps, most existing approaches can only be trained on and operated over small-scale point clouds. In this paper, we introduce
RandLA-Net, an efficient and lightweight neural architecture to directly infer
per-point semantics for large-scale point clouds. The key to our approach is to
use random point sampling instead of more complex point selection approaches.
Although remarkably computation- and memory-efficient, random sampling can
discard key features by chance. To overcome this, we introduce a novel local
feature aggregation module to progressively increase the receptive field for
each 3D point, thereby effectively preserving geometric details. Extensive
experiments show that our RandLA-Net can process 1 million points in a single pass, up to 200x faster than existing approaches. Moreover, our RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks, Semantic3D and SemanticKITTI.
Comment: CVPR 2020 Oral. Code and data are available at: https://github.com/QingyongHu/RandLA-Ne
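The combination of cheap random sampling with a compensating aggregation step can be sketched as follows. The max-pooling over k-NN features is a toy stand-in for the paper's local feature aggregation module; the point is that a kept point absorbs its neighborhood's features before the rest are discarded:

```python
import numpy as np

def random_downsample_with_aggregation(points, feats, ratio, k, rng):
    """Randomly keep a fraction of points, pooling each kept point's
    k-NN features first so local detail survives the discard.

    Random choice is O(N), unlike farthest-point sampling, which is
    what makes the approach scale to very large clouds.
    """
    n_keep = max(1, int(len(points) * ratio))
    keep = rng.choice(len(points), size=n_keep, replace=False)
    out = np.empty((n_keep, feats.shape[1]))
    for i, idx in enumerate(keep):
        d = np.linalg.norm(points - points[idx], axis=1)
        nbr = np.argsort(d)[:k]
        out[i] = feats[nbr].max(axis=0)   # pool the local neighborhood
    return points[keep], out

rng = np.random.default_rng(3)
pts = rng.normal(size=(1000, 3))
fts = rng.normal(size=(1000, 16))
sub_pts, sub_fts = random_downsample_with_aggregation(pts, fts, 0.25, 16, rng)
```

Repeating this sample-then-aggregate step across layers progressively enlarges each surviving point's receptive field, which is how geometric detail is preserved despite the random discards.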
Multi-Kernel Diffusion CNNs for Graph-Based Learning on Point Clouds
Graph convolutional networks are a promising new approach to learning on data over irregular domains. They are well suited to overcoming certain limitations of conventional grid-based architectures and will enable efficient handling of point clouds or related graph-based data representations, e.g. superpixel graphs. Learning feature extractors and classifiers on 3D point clouds is still an underdeveloped area, and existing approaches may be restricted to identical graph topologies. In this work, we derive a new architectural design that
combines rotationally and topologically invariant graph diffusion operators and
node-wise feature learning through 1x1 convolutions. By combining multiple
isotropic diffusion operations based on the Laplace-Beltrami operator, we can
learn an optimal linear combination of diffusion kernels for effective feature
propagation across the nodes of an irregular graph. We validate our approach for learning point descriptors as well as semantic classification on real 3D point clouds of human poses and demonstrate an improvement from 85% to 95% in Dice overlap with our multi-kernel approach.
Comment: accepted for ECCV 2018 Workshop Geometry Meets Deep Learning
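Combining several isotropic diffusion operators with node-wise 1x1 feature mixing can be sketched as follows. Powers of the symmetric-normalized adjacency serve here as a polynomial stand-in for the paper's Laplace-Beltrami heat kernels, and the combination weights and mixing matrix are random where in practice they would be learned:

```python
import numpy as np

def multi_kernel_diffusion(adj, feats, alphas, w):
    """Diffuse node features with several isotropic kernels and mix.

    Each extra power of the normalized adjacency is one more diffusion
    scale; alphas form the linear combination of kernels, and w is the
    node-wise 1x1 convolution (pure feature mixing, no neighborhood).
    """
    deg = adj.sum(axis=1)
    d_inv = 1.0 / np.sqrt(np.maximum(deg, 1e-9))
    a_norm = d_inv[:, None] * adj * d_inv[None, :]   # D^-1/2 A D^-1/2
    out = np.zeros_like(feats)
    h = feats
    for alpha in alphas:          # t-th kernel is A_norm^t
        h = a_norm @ h
        out = out + alpha * h     # linear combination of diffusion scales
    return out @ w                # 1x1 conv: per-node feature mixing

rng = np.random.default_rng(4)
adj = (rng.random((30, 30)) < 0.2).astype(float)
adj = np.maximum(adj, adj.T)      # undirected graph
np.fill_diagonal(adj, 1)
x = rng.normal(size=(30, 8))
y = multi_kernel_diffusion(adj, x, alphas=[0.5, 0.3, 0.2], w=rng.normal(size=(8, 8)))
```

Because diffusion depends only on graph connectivity, the operator is rotation-invariant and indifferent to node ordering, the properties the abstract highlights.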
A-CNN: Annularly Convolutional Neural Networks on Point Clouds
Analyzing the geometric and semantic properties of 3D point clouds through deep networks is still challenging due to the irregularity and sparsity with which their geometric structures are sampled. This paper presents a new method to
define and compute convolution directly on 3D point clouds by the proposed
annular convolution. This new convolution operator can better capture the local
neighborhood geometry of each point by specifying the (regular and dilated)
ring-shaped structures and directions in the computation. It can adapt to the
geometric variability and scalability at the signal processing level. We apply
it to the developed hierarchical neural networks for object classification,
part segmentation, and semantic segmentation in large-scale scenes. The
extensive experiments and comparisons demonstrate that our approach outperforms
the state-of-the-art methods on a variety of standard benchmark datasets (e.g.,
ModelNet10, ModelNet40, ShapeNet-part, S3DIS, and ScanNet).
Comment: 17 pages, 14 figures. To appear in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 201
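Selecting a ring-shaped (annular) neighborhood can be sketched as follows. The radii are illustrative, and a full annular convolution would additionally order the points along each ring and convolve over them; this sketch covers only the neighborhood definition:

```python
import numpy as np

def annular_neighbors(points, center_idx, r_inner, r_outer):
    """Select the annular neighborhood of a point: all neighbors whose
    distance from the center lies in (r_inner, r_outer].

    Stacking rings with growing radii (or skipping radii, the dilated
    variant) captures local geometry at several scales around a point.
    """
    d = np.linalg.norm(points - points[center_idx], axis=1)
    return np.where((d > r_inner) & (d <= r_outer))[0]

rng = np.random.default_rng(5)
pts = rng.normal(size=(500, 3))
inner = annular_neighbors(pts, 0, 0.0, 0.5)   # innermost region (a ball)
ring = annular_neighbors(pts, 0, 0.5, 1.0)    # second, ring-shaped region
# the two neighbor sets are disjoint by construction
```

Keeping the rings disjoint is what lets each ring's convolution see a distinct band of local geometry rather than re-covering the same points.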
SPLATNet: Sparse Lattice Networks for Point Cloud Processing
We present a network architecture for processing point clouds that directly
operates on a collection of points represented as a sparse set of samples in a
high-dimensional lattice. Naively applying convolutions on this lattice scales
poorly, both in terms of memory and computational cost, as the size of the
lattice increases. Instead, our network uses sparse bilateral convolutional
layers as building blocks. These layers maintain efficiency by using indexing
structures to apply convolutions only on occupied parts of the lattice, and
allow flexible specifications of the lattice structure enabling hierarchical
and spatially-aware feature learning, as well as joint 2D-3D reasoning. Both
point-based and image-based representations can be easily incorporated in a
network with such layers and the resulting model can be trained in an
end-to-end manner. We present results on 3D segmentation tasks where our
approach outperforms existing state-of-the-art techniques.
Comment: Camera-ready, accepted to CVPR 2018 (oral); project website: http://vis-www.cs.umass.edu/splatnet
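Splatting point features onto only the occupied cells of a lattice can be sketched with a hash map as the indexing structure. A plain cubic grid stands in here for the paper's high-dimensional lattice, but the efficiency argument is the same: storage and work scale with occupancy, not lattice volume:

```python
import numpy as np

def splat_to_sparse_lattice(points, feats, cell):
    """Splat point features onto the occupied cells of a sparse lattice.

    Only cells that actually receive points are stored (a dict keyed by
    integer lattice coordinates), so memory scales with the number of
    occupied cells rather than with the full grid; features landing in
    the same cell are averaged.
    """
    cells = {}
    for p, f in zip(points, feats):
        key = tuple((p // cell).astype(int))     # integer lattice coords
        acc = cells.setdefault(key, [np.zeros_like(f), 0])
        acc[0] += f                              # accumulate features
        acc[1] += 1                              # count points per cell
    return {k: s / n for k, (s, n) in cells.items()}

rng = np.random.default_rng(6)
pts = rng.normal(size=(1000, 3))
fts = rng.normal(size=(1000, 4))
lattice = splat_to_sparse_lattice(pts, fts, cell=0.5)
# far fewer occupied cells than a dense grid over the same extent
```

A convolution on this structure would then visit only the keys of the dict (and their neighboring keys), which is what keeps the cost from growing with lattice resolution.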