Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review
Recently, the advancement of deep learning in discriminative feature learning
from 3D LiDAR data has led to rapid development in the field of autonomous
driving. However, the automated processing of uneven, unstructured, noisy, and massive
3D point clouds is a challenging and tedious task. In this paper, we provide a
systematic review of compelling deep learning architectures applied to
LiDAR point clouds, detailing their use in specific autonomous driving tasks such as
segmentation, detection, and classification. Although several published
research papers focus on specific topics in computer vision for autonomous
vehicles, to date, no general survey on deep learning applied in LiDAR point
clouds for autonomous vehicles exists. Thus, the goal of this paper is to
narrow the gap in this topic. More than 140 key contributions from the past
five years are summarized in this survey, including the milestone 3D deep
architectures; the notable deep learning applications in 3D semantic
segmentation, object detection, and classification; and the relevant datasets,
evaluation metrics, and state-of-the-art performance. Finally, we conclude with
the remaining challenges and directions for future research.
Comment: 21 pages, submitted to IEEE Transactions on Neural Networks and
Learning Systems
Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN
Recent advances in deep convolutional neural networks (CNNs) have motivated
researchers to adapt CNNs to directly model points in 3D point clouds. Modeling
local structure has been proven to be important for the success of
convolutional architectures, and researchers exploited the modeling of local
point sets in the feature extraction hierarchy. However, limited attention has
been paid to explicitly modeling the geometric structure among points in a local
region. To address this problem, we propose Geo-CNN, which applies a generic
convolution-like operation dubbed GeoConv to each point and its local
neighborhood. Local geometric relationships among points are captured when
extracting edge features between the center and its neighboring points. We
first decompose the edge feature extraction process onto three orthogonal
bases, and then aggregate the extracted features based on the angles between
the edge vector and the bases. This encourages the network to preserve the
geometric structure in Euclidean space throughout the feature extraction
hierarchy. GeoConv is a generic and efficient operation that can be easily
integrated into 3D point cloud analysis pipelines for multiple applications. We
evaluate Geo-CNN on ModelNet40 and KITTI and achieve state-of-the-art
performance.
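The basis decomposition described above can be made concrete with a small sketch. The snippet below is a hedged reconstruction from the abstract only: the shapes, the six signed-axis bases, and the squared-cosine weighting are assumptions made for illustration, not the authors' released implementation.

```python
import numpy as np

def geoconv_sketch(center, neighbors, feats, W_bases):
    """Simplified GeoConv-style aggregation (illustrative only).

    center    : (3,) coordinates of the center point
    neighbors : (K, 3) coordinates of its K neighbors
    feats     : (K, C) input features of the neighbors
    W_bases   : (6, C, C_out) one weight matrix per signed coordinate axis
    """
    edges = neighbors - center                           # (K, 3) edge vectors
    norms = np.linalg.norm(edges, axis=1, keepdims=True) + 1e-9
    cos = edges / norms                                   # direction cosines w.r.t. x, y, z
    out = np.zeros((feats.shape[0], W_bases.shape[-1]))
    for axis in range(3):
        for sign, b in ((1.0, 2 * axis), (-1.0, 2 * axis + 1)):
            # weight each basis response by how well the edge aligns with that axis
            w = np.clip(sign * cos[:, axis], 0.0, None) ** 2    # (K,)
            out += w[:, None] * (feats @ W_bases[b])
    return out.mean(axis=0)                               # aggregate over the neighborhood
```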
A Fully Convolutional Network for Semantic Labeling of 3D Point Clouds
When classifying point clouds, a large amount of time is devoted to the
process of engineering a reliable set of features which are then passed to a
classifier of choice. Generally, such features, usually derived from the local
3D covariance matrix, are computed using the surrounding neighborhood of
points. While these features capture local information, the process is usually
time-consuming and requires application at multiple scales combined with
contextual methods in order to adequately describe the diversity of objects
within a scene. In this paper we present a 1D-fully convolutional network that
consumes terrain-normalized points directly with the corresponding spectral
data, if available, to generate point-wise labeling while implicitly learning
contextual features in an end-to-end fashion. Our method uses only the
3D-coordinates and three corresponding spectral features for each point.
Spectral features may either be extracted from 2D-georeferenced images, as
shown here for Light Detection and Ranging (LiDAR) point clouds, or extracted
directly for passively derived point clouds, i.e., from multiple-view imagery. We
train our network by splitting the data into square regions, and use a pooling
layer that respects the permutation-invariance of the input points. Evaluated
using the ISPRS 3D Semantic Labeling Contest, our method scored second place
with an overall accuracy of 81.6%. We ranked third place with a mean F1-score
of 63.32%, surpassing the F1-score of the method with the highest accuracy by
1.69%. In addition to labeling 3D-point clouds, we also show that our method
can be easily extended to 2D-semantic segmentation tasks, with promising
initial results.
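To make the permutation-invariance of the pooling layer concrete, the toy sketch below applies a shared per-point transform to one square region, max-pools over the points (so the result is independent of point order), and appends the pooled context back to each point before classification. The shapes and the single pooling stage are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def pointwise_labels_sketch(points, spectral, W1, W2):
    """Toy per-point labeling with a symmetric (order-invariant) pooling step.

    points   : (N, 3) terrain-normalized coordinates of one square region
    spectral : (N, 3) per-point spectral features
    W1       : (6, H) shared per-point weights (acts like a 1x1 / "1D" convolution)
    W2       : (2 * H, num_classes) point-wise classifier weights
    """
    x = np.concatenate([points, spectral], axis=1)     # (N, 6)
    h = np.maximum(x @ W1, 0.0)                        # shared per-point transform, (N, H)
    g = h.max(axis=0, keepdims=True)                   # max-pool: invariant to point order
    ctx = np.repeat(g, len(h), axis=0)                 # broadcast region context to every point
    logits = np.concatenate([h, ctx], axis=1) @ W2     # (N, num_classes)
    return logits.argmax(axis=1)                       # one label per input point
```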
MVX-Net: Multimodal VoxelNet for 3D Object Detection
Many recent works on 3D object detection have focused on designing neural
network architectures that can consume point cloud data. While these approaches
demonstrate encouraging performance, they are typically based on a single
modality and are unable to leverage information from other modalities, such as
a camera. Although a few approaches fuse data from different modalities, these
methods either use a complicated pipeline to process the modalities
sequentially, or perform late fusion and are unable to learn interactions
between different modalities at early stages. In this work, we present
PointFusion and VoxelFusion: two simple yet effective early-fusion approaches
to combine the RGB and point cloud modalities, by leveraging the recently
introduced VoxelNet architecture. Evaluation on the KITTI dataset demonstrates
significant improvements in performance over approaches which only use point
cloud data. Furthermore, the proposed method provides results competitive with
the state-of-the-art multimodal algorithms, achieving top-2 ranking in five of
the six bird's eye view and 3D detection categories on the KITTI benchmark, by
using a simple single-stage network.
Comment: 7 pages
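A rough sketch of the PointFusion idea, as described in the abstract: project each LiDAR point into the camera image and append the 2D CNN feature at that pixel to the point before voxelization. The projection handling, nearest-pixel sampling, and shapes are simplifying assumptions, not the released MVX-Net code.

```python
import numpy as np

def point_fusion_sketch(points, image_feats, P):
    """Illustrative early fusion of image features into LiDAR points.

    points      : (N, 4) x, y, z, reflectance in the LiDAR frame
    image_feats : (H, W, C) feature map from a pretrained 2D CNN
    P           : (3, 4) projection matrix mapping LiDAR points to image pixels
    """
    xyz1 = np.concatenate([points[:, :3], np.ones((len(points), 1))], axis=1)
    uvw = xyz1 @ P.T                                   # (N, 3) homogeneous pixel coordinates
    u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)    # column index
    v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)    # row index
    H, W, _ = image_feats.shape
    u, v = np.clip(u, 0, W - 1), np.clip(v, 0, H - 1)  # crude clamping for the sketch
    img_part = image_feats[v, u]                       # (N, C) sampled image features
    return np.concatenate([points, img_part], axis=1)  # per-point fused features
```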
Augmented Semantic Signatures of Airborne LiDAR Point Clouds for Comparison
LiDAR point clouds provide rich geometric information, which is particularly
useful for the analysis of complex scenes of urban regions. Finding structural
and semantic differences between two different three-dimensional point clouds,
say, of the same region but acquired at different time instances is an
important problem. A comparison of point clouds involves computationally
expensive registration and segmentation. We are interested in capturing the
relative differences in the geometric uncertainty and semantic content of the
point cloud without the registration process. Hence, we propose an
orientation-invariant geometric signature of the point cloud, which integrates
its probabilistic geometric and semantic classifications. We study different
properties of the geometric signature, which is an image-based encoding of
geometric uncertainty and semantic content. We explore different metrics to
determine differences between these signatures, which in turn compare point
clouds without performing point-to-point registration. Our results show that
the differences in the signatures are consistent with the geometric and semantic
differences of the point clouds.
Comment: 18 pages, 6 figures, 1 table
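To illustrate how two such signatures might be compared without registration, the snippet below treats each signature as a normalized image and computes a chi-squared distance between them; this particular metric and signature layout are illustrative assumptions rather than the metrics studied in the paper.

```python
import numpy as np

def signature_distance_sketch(sig_a, sig_b, eps=1e-9):
    """Chi-squared distance between two image-based signatures (illustrative).

    sig_a, sig_b : (R, C) non-negative signature images, e.g. encoding
                   geometric-uncertainty classes against semantic classes
    """
    a = sig_a / (sig_a.sum() + eps)                # normalize to a distribution
    b = sig_b / (sig_b.sum() + eps)
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))
```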
PointIT: A Fast Tracking Framework Based on 3D Instance Segmentation
Most popular recent tracking frameworks focus on 2D image sequences and
seldom track 3D objects in point clouds. In this paper, we propose PointIT,
a fast, simple tracking method based on 3D on-road instance segmentation.
Firstly, we transform the 3D LiDAR data into a spherical image of size
64 x 512 x 4 and feed it into an instance segmentation model to get the predicted
instance mask for each class. Then we use MobileNet as our primary encoder
instead of the original ResNet to reduce the computational complexity. Finally,
we extend the Sort algorithm with this instance framework to realize tracking
in the 3D LiDAR point cloud data. The model is trained on the spherical image
dataset with the corresponding instance label masks provided by the KITTI
3D Object Tracking dataset. According to the experimental results, our network
achieves an Average Precision (AP) of 0.617, and the performance of the
multi-object tracking task is also improved.
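The 64 x 512 x 4 spherical image mentioned above is a range-image projection of the LiDAR sweep. The sketch below shows one common way such a projection is computed; the channel layout and the field-of-view values are assumptions typical of a 64-beam sensor, not details taken from the paper.

```python
import numpy as np

def spherical_projection_sketch(points, H=64, W=512, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project a LiDAR sweep onto an H x W x 4 spherical image (illustrative).

    points : (N, 4) x, y, z, intensity. The channel layout (range, height,
             intensity, occupancy) and the vertical field of view are assumed.
    """
    x, y, z, intensity = points.T
    r = np.sqrt(x**2 + y**2 + z**2) + 1e-9              # range per point
    yaw = np.arctan2(y, x)                               # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                             # elevation angle
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    u = ((yaw + np.pi) / (2 * np.pi) * W).astype(int) % W
    v = np.clip(((fov_up - pitch) / (fov_up - fov_down) * H).astype(int), 0, H - 1)
    img = np.zeros((H, W, 4), dtype=np.float32)
    img[v, u] = np.stack([r, z, intensity, np.ones_like(r)], axis=1)
    return img
```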
Multi-Modal Obstacle Detection in Unstructured Environments with Conditional Random Fields
Reliable obstacle detection and classification in rough and unstructured
terrain such as agricultural fields or orchards remains a challenging problem.
These environments involve large variations in both geometry and appearance,
challenging perception systems that rely on only a single sensor modality.
Geometrically, tall grass, fallen leaves, or terrain roughness can mistakenly
be perceived as nontraversable or might even obscure actual obstacles.
Likewise, traversable grass or dirt roads and obstacles such as trees and
bushes might be visually ambiguous. In this paper, we combine appearance- and
geometry-based detection methods by probabilistically fusing lidar and camera
sensing with semantic segmentation using a conditional random field. We apply a
state-of-the-art multimodal fusion algorithm from the scene analysis domain and
adjust it for obstacle detection in agriculture with moving ground vehicles.
This involves explicitly handling sparse point cloud data and exploiting
spatial, temporal, and multimodal links between corresponding 2D and 3D
regions. The proposed method was evaluated on a diverse data set, comprising a
dairy paddock and different orchards gathered with a perception research robot
in Australia. Results showed that for a two-class classification problem
(ground and nonground), only the camera benefited from information provided by
the other modality, with an increase in the mean classification score of 0.5%.
However, as more classes were introduced (ground, sky, vegetation, and object),
both modalities complemented each other with improvements of 1.4% in 2D and
7.9% in 3D. Finally, introducing temporal links between successive frames
resulted in improvements of 0.2% in 2D and 1.5% in 3D.
Comment: This is the accepted version of the following article: Kragh M,
Underwood J. Multimodal obstacle detection in unstructured environments with
conditional random fields. J Field Robotics. 2019, 1-20, which has been
published in final form at https://doi.org/10.1002/rob.2186
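As a rough illustration of what probabilistically fusing lidar and camera with a conditional random field can look like, the toy energy below combines per-segment unary costs from both modalities with a Potts penalty on multimodal links between corresponding 2D and 3D regions. The potentials, weights, and the missing inference step are simplifications; the paper's model and inference are considerably richer.

```python
import numpy as np

def crf_energy_sketch(unary_2d, unary_3d, links, labels_2d, labels_3d, w_pair=1.0):
    """Toy CRF energy for multimodal fusion (illustrative only).

    unary_2d : (M, K) negative log-probabilities for M image segments, K classes
    unary_3d : (N, K) negative log-probabilities for N point-cloud segments
    links    : list of (i, j) pairs linking image segment i to 3D segment j
    labels_2d, labels_3d : integer label assignments to score
    """
    energy = unary_2d[np.arange(len(labels_2d)), labels_2d].sum()
    energy += unary_3d[np.arange(len(labels_3d)), labels_3d].sum()
    for i, j in links:                                 # multimodal Potts term
        energy += w_pair * (labels_2d[i] != labels_3d[j])
    return energy                                       # lower energy = more consistent labeling
```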
Learning 3D Segment Descriptors for Place Recognition
In the absence of global positioning information, place recognition is a key
capability for enabling localization, mapping and navigation in any
environment. Most place recognition methods rely on images, point clouds, or a
combination of both. In this work we leverage a segment extraction and matching
approach to achieve place recognition in Light Detection and Ranging (LiDAR)
based 3D point cloud maps. One challenge related to this approach is the
recognition of segments despite changes in point of view or occlusion. We
propose a learning-based method in order to reach a higher recall
accuracy than previously proposed methods. Using Convolutional Neural Networks
(CNNs), which are state-of-the-art classifiers, we propose a new approach to
segment recognition based on learned descriptors. In this paper we compare the
effectiveness of three different structures and training methods for CNNs. We
demonstrate through several experiments on real-world data collected in an
urban driving scenario that the proposed learning based methods outperform
hand-crafted descriptors.
Comment: Presented at IROS 2017 Workshop on Learning for Localization and
Mapping
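Once descriptors have been learned, place recognition largely reduces to matching segment descriptors from the current scan against those stored in the map. The brute-force nearest-neighbor search and distance threshold below are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def match_segments_sketch(query_desc, map_desc, max_dist=0.5):
    """Match query segment descriptors to map descriptors (illustrative).

    query_desc : (Q, D) descriptors of segments in the current scan
    map_desc   : (M, D) descriptors of segments stored in the map
    """
    # pairwise Euclidean distances between query and map descriptors
    d = np.linalg.norm(query_desc[:, None, :] - map_desc[None, :, :], axis=2)
    nn = d.argmin(axis=1)                              # nearest map segment per query
    keep = d[np.arange(len(query_desc)), nn] < max_dist
    return [(int(q), int(m)) for q, m in zip(np.flatnonzero(keep), nn[keep])]
```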
Real-time Dynamic Object Detection for Autonomous Driving using Prior 3D-Maps
Lidar has become an essential sensor for autonomous driving as it provides
reliable depth estimation. Lidar is also the primary sensor used to build 3D
maps, which can then be used even by low-cost systems that do not carry a
Lidar. Computation on Lidar point clouds is intensive, as it requires processing
of millions of points per second. Additionally, there are many subsequent tasks
such as clustering, detection, tracking, and classification, which make
real-time execution challenging. In this paper, we discuss a real-time dynamic
object detection algorithm that leverages previously mapped Lidar point
clouds to reduce processing. The prior 3D maps provide a static background
clouds to reduce processing. The prior 3D maps provide a static background
model and we formulate dynamic object detection as a background subtraction
problem. Computation and modeling challenges in the mapping and online
execution pipeline are described. We propose a rejection cascade architecture
to subtract road regions and other 3D regions separately. We implemented an
initial version of our proposed algorithm and evaluated its accuracy on the CARLA
simulator.
Comment: Preprint submission to ECCVW AutoNUE 2018 - v2 author name accent
correction
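The background-subtraction formulation can be illustrated with a minimal sketch: points of the live sweep that lie close to the prior static map are treated as background, the rest as dynamic candidates. The KD-tree query and the distance threshold are illustrative choices and do not reflect the paper's rejection cascade.

```python
import numpy as np
from scipy.spatial import cKDTree

def dynamic_points_sketch(scan, static_map, dist_thresh=0.2):
    """Background subtraction against a prior static map (illustrative only).

    scan       : (N, 3) points of the current Lidar sweep, already in the map frame
    static_map : (M, 3) points of the prior 3D map
    """
    tree = cKDTree(static_map)            # index the static background once
    d, _ = tree.query(scan, k=1)          # distance to the nearest static point
    return scan[d > dist_thresh]          # points not explained by the background
```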
3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds
Semantic parsing of large-scale 3D point clouds is an important research
topic in computer vision and remote sensing fields. Most existing approaches
utilize hand-crafted features for each modality independently and combine them
in a heuristic manner. They often fail to consider the consistency and
complementary information among features adequately, which makes it difficult
for them to capture high-level semantic structures. The features learned by most
current deep learning methods yield high-quality image classification
results. However, these methods are difficult to apply to the recognition of 3D point
clouds because of their unorganized distribution and varying point density. In
this paper, we propose a 3DCNN-DQN-RNN method which fuses the 3D convolutional
neural network (CNN), Deep Q-Network (DQN) and Residual recurrent neural
network (RNN) for an efficient semantic parsing of large-scale 3D point clouds.
In our method, an eye window under the control of the 3D CNN and DQN can localize
and segment the points of the object class efficiently. The 3D CNN and Residual
RNN further extract robust and discriminative features of the points in the eye
window, and thus greatly enhance the parsing accuracy of large-scale point
clouds. Our method provides an automatic process that maps the raw data to the
classification results. It also integrates object localization, segmentation
and classification into one framework. Experimental results demonstrate that
the proposed method outperforms the state-of-the-art point cloud classification
methods.
Comment: IEEE International Conference on Computer Vision (ICCV) 2017
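As a loose illustration of the eye-window mechanism, the sketch below crops the points inside the current 3D window and moves the window according to the action with the highest Q-value. The action set, window parameterization, and the networks producing the Q-values are placeholders; the paper's 3D CNN, DQN, and Residual RNN are far more involved.

```python
import numpy as np

def eye_window_step_sketch(points, center, size, q_values):
    """One hypothetical eye-window step (illustrative placeholders only).

    points   : (N, 3) full point cloud
    center   : (3,) current window center
    size     : (3,) current window extents
    q_values : (6,) scores for the moves +x, -x, +y, -y, +z, -z
    """
    lo, hi = center - size / 2, center + size / 2
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    window_points = points[inside]                      # points seen by the eye window
    moves = np.array([[1, 0, 0], [-1, 0, 0],
                      [0, 1, 0], [0, -1, 0],
                      [0, 0, 1], [0, 0, -1]], dtype=float)
    new_center = center + 0.5 * size * moves[int(np.argmax(q_values))]
    return window_points, new_center                    # cropped points and the next window position
```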