Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review
Recently, the advancement of deep learning in discriminative feature learning
from 3D LiDAR data has led to rapid development in the field of autonomous
driving. However, the automated processing of uneven, unstructured, noisy, and
massive 3D point clouds remains a challenging task. In this paper, we provide a
systematic review of compelling deep learning architectures applied to LiDAR
point clouds, focusing on specific tasks in autonomous driving such as
segmentation, detection, and classification. Although several published
research papers focus on specific topics in computer vision for autonomous
vehicles, to date, no general survey on deep learning applied in LiDAR point
clouds for autonomous vehicles exists. Thus, the goal of this paper is to
narrow the gap in this topic. More than 140 key contributions from the last
five years are summarized in this survey, including the milestone 3D deep
architectures; notable deep learning applications in 3D semantic segmentation,
object detection, and classification; and the relevant datasets, evaluation
metrics, and state-of-the-art performance. Finally, we discuss the remaining
challenges and future research directions.
Comment: 21 pages, submitted to IEEE Transactions on Neural Networks and Learning Systems
Orientation-boosted Voxel Nets for 3D Object Recognition
Recent work has shown good results in 3D object recognition using
3D convolutional networks. In this paper, we show that the object orientation
plays an important role in 3D recognition. More specifically, we argue that
objects induce different features in the network under rotation. Thus, we
approach the category-level classification task as a multi-task problem, in
which the network is trained to predict the pose of the object in addition to
the class label as a parallel task. We show that this yields significant
improvements in the classification results. We test our suggested architecture
on several datasets representing various 3D data sources: LiDAR data, CAD
models, and RGB-D images. We report state-of-the-art results on classification
as well as significant improvements in precision and speed over the baseline on
3D detection.
Comment: BMVC'17 version. Added some experiments + auto-alignment of ModelNet40
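To make the multi-task idea concrete, here is a minimal sketch of a voxel network with parallel class and pose heads. The layer sizes, the 32^3 occupancy grid, and the discretization of orientation into 12 bins are illustrative assumptions, not the authors' exact architecture:

```python
# Minimal sketch: a 3D CNN with parallel classification and orientation heads.
# Layer sizes and the 12 orientation bins are illustrative assumptions.
import torch
import torch.nn as nn

class OrientationBoostedVoxNet(nn.Module):
    def __init__(self, num_classes=40, num_pose_bins=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv3d(32, 32, kernel_size=3), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Flatten(),
        )
        feat_dim = 32 * 6 * 6 * 6            # feature size for a 32^3 input grid
        self.class_head = nn.Linear(feat_dim, num_classes)   # category label
        self.pose_head = nn.Linear(feat_dim, num_pose_bins)  # parallel pose task

    def forward(self, voxels):               # voxels: (B, 1, 32, 32, 32)
        f = self.features(voxels)
        return self.class_head(f), self.pose_head(f)

# Training would simply sum the two cross-entropy losses:
#   loss = ce(class_logits, y_class) + ce(pose_logits, y_pose)
model = OrientationBoostedVoxNet()
class_logits, pose_logits = model(torch.rand(4, 1, 32, 32, 32))
```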
Classification of Aerial Photogrammetric 3D Point Clouds
We present a powerful method to extract per-point semantic class labels from
aerial photogrammetry data. Labeling this kind of data is important for tasks
such as environmental modelling, object classification and scene understanding.
Unlike previous point cloud classification methods that rely exclusively on
geometric features, we show that incorporating color information yields a
significant increase in accuracy in detecting semantic classes. We test our
classification method on three real-world photogrammetry datasets that were
generated with Pix4Dmapper Pro and have varying point densities. We show that
off-the-shelf machine learning techniques coupled with our new features allow
us to train highly accurate classifiers that generalize well to unseen data,
processing point clouds containing 10 million points in less than 3 minutes on
a desktop computer.
Comment: ISPRS 2017
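As a rough illustration of the "geometry plus color" recipe, the sketch below pairs standard covariance-based eigenvalue features with per-point RGB and an off-the-shelf classifier. The specific eigenvalue features and the choice of a random forest are assumptions made for illustration; the paper's actual feature set is richer:

```python
# Sketch: per-point geometric features + color, fed to an off-the-shelf
# classifier. Feature choice and RandomForest are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def local_geometric_features(neighbors):
    """Eigenvalue features of a point's neighborhood (N x 3 array)."""
    cov = np.cov(neighbors.T)                      # 3x3 covariance of the patch
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1] # descending eigenvalues
    evals = evals / (evals.sum() + 1e-9)
    linearity = (evals[0] - evals[1]) / (evals[0] + 1e-9)
    planarity = (evals[1] - evals[2]) / (evals[0] + 1e-9)
    sphericity = evals[2] / (evals[0] + 1e-9)
    return [linearity, planarity, sphericity]

# Per point: concatenate geometric features with the RGB color, e.g.
#   X = [local_geometric_features(nbrs) + list(rgb) for nbrs, rgb in samples]
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
# clf.fit(X_train, y_train); labels = clf.predict(X_test)
```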
3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds
Semantic parsing of large-scale 3D point clouds is an important research
topic in the computer vision and remote sensing fields. Most existing approaches
utilize hand-crafted features for each modality independently and combine them
in a heuristic manner. They often fail to adequately consider the consistency
and complementarity among features, which makes it difficult for them to
capture high-level semantic structures. The features learned by most current
deep learning methods achieve high-quality image classification results.
However, these methods are difficult to apply to 3D point clouds due to their
unorganized distribution and varying point density. In this paper, we propose
a 3DCNN-DQN-RNN method which fuses a 3D convolutional neural network (CNN), a
deep Q-network (DQN), and a residual recurrent neural network (RNN) for
efficient semantic parsing of large-scale 3D point clouds.
In our method, an eye window under the control of the 3D CNN and DQN can
efficiently localize and segment the points of an object class. The 3D CNN and Residual
RNN further extract robust and discriminative features of the points in the eye
window, and thus greatly enhance the parsing accuracy of large-scale point
clouds. Our method provides an automatic process that maps the raw data to the
classification results. It also integrates object localization, segmentation
and classification into one framework. Experimental results demonstrate that
the proposed method outperforms the state-of-the-art point cloud classification
methods.
Comment: IEEE International Conference on Computer Vision (ICCV) 2017
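A schematic sketch of the eye-window control loop described above is given below. The window parametrization, fixed extent, and step count are illustrative assumptions, and cnn_score, dqn_policy, and rnn_classify are hypothetical stand-ins for the trained networks:

```python
# Schematic sketch of the eye-window loop: a DQN steers a 3D window over the
# cloud while CNN/RNN modules label the points inside it. All three callables
# are hypothetical stand-ins for trained networks.
import numpy as np

def parse_scene(points, cnn_score, dqn_policy, rnn_classify, steps=50):
    """points: (N, 3) array; returns per-point integer labels."""
    labels = np.zeros(len(points), dtype=int)
    center = points.mean(axis=0)           # start the eye window at the centroid
    extent = np.array([5.0, 5.0, 5.0])     # fixed window size (illustrative)
    for _ in range(steps):
        inside = np.all(np.abs(points - center) < extent / 2, axis=1)
        if inside.any():
            state = cnn_score(points[inside])              # 3D CNN encodes window
            labels[inside] = rnn_classify(points[inside])  # residual RNN labels
            center = center + dqn_policy(state)            # DQN moves the window
    return labels
```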
Deconvolutional Networks for Point-Cloud Vehicle Detection and Tracking in Driving Scenarios
Vehicle detection and tracking is a core ingredient for developing autonomous
driving applications in urban scenarios. Recent image-based Deep Learning (DL)
techniques are obtaining breakthrough results in these perception tasks.
However, DL research has not yet advanced much towards processing 3D point
clouds from lidar range-finders. These sensors are very common in autonomous
vehicles since, despite not providing as semantically rich information as
images, their performance is more robust under harsh weather conditions than
vision sensors. In this paper we present a full vehicle detection and tracking
system that works with 3D lidar information only. Our detection step uses a
Convolutional Neural Network (CNN) that receives as input a featured
representation of the 3D information provided by a Velodyne HDL-64 sensor and
returns a per-point classification of whether it belongs to a vehicle or not.
The classified point cloud is then geometrically processed to generate
observations for a multi-object tracking system implemented via a number of
Multi-Hypothesis Extended Kalman Filters (MH-EKF) that estimate the position
and velocity of the surrounding vehicles. The system is thoroughly evaluated on
the KITTI tracking dataset, and we show the performance boost provided by our
CNN-based vehicle detector over a standard geometric approach. Our lidar-based
approach uses only about 4% of the data needed by an image-based detector while
achieving similarly competitive results.
Comment: Presented at IEEE ECMR 2017. IEEE Copyrights: Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses
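The tracking stage can be pictured with one constant-velocity filter per hypothesis. Under a linear motion and observation model the EKF reduces to the plain Kalman filter below; the state layout and noise magnitudes are illustrative assumptions, not the paper's tuned values:

```python
# Constant-velocity Kalman filter sketch of the tracking stage; with linear
# models the EKF reduces to this form. Noise values are illustrative.
import numpy as np

dt = 0.1  # lidar frame period (s), illustrative
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)  # state: [x, y, vx, vy]
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # we observe position only
Q = np.eye(4) * 0.01                         # process noise
R = np.eye(2) * 0.1                          # measurement noise

def kf_step(x, P, z):
    # Predict with the constant-velocity model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the detector's vehicle-centroid observation z = [x, y].
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```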
Real-time Dynamic Object Detection for Autonomous Driving using Prior 3D-Maps
Lidar has become an essential sensor for autonomous driving as it provides
reliable depth estimation. Lidar is also the primary sensor used in building 3D
maps, which can then be used even by low-cost systems that do not carry a
Lidar. Computation on Lidar point clouds is intensive, as it requires processing
of millions of points per second. Additionally, there are many subsequent tasks
such as clustering, detection, tracking, and classification, which make
real-time execution challenging. In this paper, we discuss real-time dynamic
object detection algorithms which leverage previously mapped Lidar point
clouds to reduce processing. The prior 3D maps provide a static background
model and we formulate dynamic object detection as a background subtraction
problem. Computation and modeling challenges in the mapping and online
execution pipeline are described. We propose a rejection cascade architecture
to subtract road regions and other 3D regions separately. We implemented an
initial version of our proposed algorithm and evaluated its accuracy in the
CARLA simulator.
Comment: Preprint submission to ECCVW AutoNUE 2018 - v2 author name accent correction
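A minimal sketch of the background-subtraction formulation: scan points falling into voxels occupied by the prior static map are rejected, leaving dynamic candidates. The voxel size and the hash-set map representation are illustrative assumptions, not the paper's rejection-cascade implementation:

```python
# Sketch: background subtraction against a prior 3D map. Points in voxels
# occupied by the static map are rejected. Voxel size is illustrative.
import numpy as np

VOXEL = 0.2  # voxel edge length in meters (assumption)

def voxelize(points):
    """Quantize (N, 3) points into a set of integer voxel keys."""
    return set(map(tuple, np.floor(points / VOXEL).astype(int)))

def dynamic_points(scan, static_map_voxels):
    """Keep only scan points whose voxel is absent from the prior map."""
    keys = np.floor(scan / VOXEL).astype(int)
    mask = np.array([tuple(k) not in static_map_voxels for k in keys])
    return scan[mask]

# static_map_voxels = voxelize(prior_map_points)   # built offline
# moving = dynamic_points(current_scan, static_map_voxels)
```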
A Fully Convolutional Network for Semantic Labeling of 3D Point Clouds
When classifying point clouds, a large amount of time is devoted to the
process of engineering a reliable set of features which are then passed to a
classifier of choice. Generally, such features - usually derived from the
3D-covariance matrix - are computed using the surrounding neighborhood of
points. While these features capture local information, the process is usually
time-consuming and requires application at multiple scales combined with
contextual methods in order to adequately describe the diversity of objects
within a scene. In this paper we present a 1D-fully convolutional network that
consumes terrain-normalized points directly with the corresponding spectral
data, if available, to generate point-wise labels while implicitly learning
contextual features in an end-to-end fashion. Our method uses only the
3D-coordinates and three corresponding spectral features for each point.
Spectral features may either be extracted from 2D-georeferenced images, as
shown here for Light Detection and Ranging (LiDAR) point clouds, or extracted
directly for passively derived point clouds, i.e., from multiple-view imagery. We
train our network by splitting the data into square regions, and use a pooling
layer that respects the permutation-invariance of the input points. Evaluated
using the ISPRS 3D Semantic Labeling Contest, our method scored second place
with an overall accuracy of 81.6%. We ranked third place with a mean F1-score
of 63.32%, surpassing the F1-score of the method with the highest accuracy by
1.69%. In addition to labeling 3D point clouds, we also show that our method
can be easily extended to 2D semantic segmentation tasks, with promising
initial results.
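The sketch below illustrates the network's core ingredients: point-wise (kernel size 1) 1D convolutions over a six-channel input (three coordinates plus three spectral values) and a max-pooling layer that is invariant to point order. Channel widths, the nine output classes, and the way global context is broadcast back to points are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of a 1D fully convolutional point network with permutation-invariant
# pooling. Channel sizes and class count are illustrative assumptions.
import torch
import torch.nn as nn

class PointFCN(nn.Module):
    def __init__(self, num_classes=9, in_ch=6):  # 3 coords + 3 spectral values
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv1d(in_ch, 64, 1), nn.ReLU(),   # point-wise convolutions
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        self.head = nn.Conv1d(128 + 128, num_classes, 1)

    def forward(self, x):                          # x: (B, 6, N) points
        f = self.local(x)                          # per-point features
        g = f.max(dim=2, keepdim=True).values      # permutation-invariant pooling
        g = g.expand(-1, -1, x.shape[2])           # broadcast global context
        return self.head(torch.cat([f, g], dim=1)) # per-point class logits

logits = PointFCN()(torch.rand(2, 6, 1024))        # -> (2, num_classes, 1024)
```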
Augmented Semantic Signatures of Airborne LiDAR Point Clouds for Comparison
LiDAR point clouds provide rich geometric information, which is particularly
useful for the analysis of complex scenes of urban regions. Finding structural
and semantic differences between two three-dimensional point clouds, say, of
the same region but acquired at different time instances, is an important
problem. A comparison of point clouds involves computationally
expensive registration and segmentation. We are interested in capturing the
relative differences in the geometric uncertainty and semantic content of the
point cloud without the registration process. Hence, we propose an
orientation-invariant geometric signature of the point cloud, which integrates
its probabilistic geometric and semantic classifications. We study different
properties of the geometric signature, which is an image-based encoding of
geometric uncertainty and semantic content. We explore different metrics to
determine differences between these signatures, which in turn compare point
clouds without performing point-to-point registration. Our results show that
the differences in the signatures are consistent with the geometric and
semantic differences of the point clouds.
Comment: 18 pages, 6 figures, 1 table
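Since the signatures are image-based encodings, comparing two point clouds reduces to comparing two images. The toy metrics below (pixel-wise L2 and a histogram intersection over signature values assumed normalized to [0, 1]) are illustrative assumptions, not the specific metrics studied in the paper:

```python
# Toy metrics for comparing two signature images without registration.
# Both metrics assume equally sized signatures with values in [0, 1].
import numpy as np

def signature_distance_l2(sig_a, sig_b):
    """Pixel-wise L2 distance between two signature images."""
    return float(np.linalg.norm(sig_a - sig_b))

def histogram_intersection(sig_a, sig_b, bins=32):
    """Overlap of the value distributions of the two signatures (1 = identical)."""
    ha, edges = np.histogram(sig_a, bins=bins, range=(0, 1), density=True)
    hb, _ = np.histogram(sig_b, bins=bins, range=(0, 1), density=True)
    return float(np.minimum(ha, hb).sum() * (edges[1] - edges[0]))
```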
Traffic Sign Timely Visual Recognizability Evaluation Based on 3D Measurable Point Clouds
The timely provision of traffic sign information to drivers is essential for
them to respond in time, ensure safe driving, and avoid traffic accidents. We
propose a quantitative method for evaluating the timely visual recognizability
of traffic signs in large-scale transportation environments. To achieve this
goal, we first introduce the concept of a visibility field to reflect the
visibility distribution in three-dimensional (3D) space and construct a
traffic sign Visibility Evaluation Model (VEM) to measure traffic sign
visibility from a given viewpoint. Then, based on the VEM, we propose the
concept of the Visual Recognizability Field (VRF) to reflect the visual
recognizability distribution in 3D space and establish a Visual
Recognizability Evaluation Model (VREM) to measure a traffic sign's visual
recognizability from a given viewpoint. Next, we propose a Traffic Sign Timely
Visual Recognizability Evaluation Model (TSTVREM) that combines the VREM, the
actual maximum continuous recognizable distance, and traffic big data to
measure a traffic sign's visual recognizability in different lanes. Finally, we
present an automatic algorithm to implement the TSTVREM model through traffic
sign and road marking detection and classification, traffic sign environment
point cloud segmentation, viewpoint calculation, and TSTVREM model
realization. The performance of our method for traffic sign timely visual
recognizability evaluation is tested on three road point clouds acquired by a
mobile laser scanning system (RIEGL VMX-450) according to Road Traffic Signs
and Markings (GB 5768-1999 in China), showing that our method is feasible and
efficient.
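As a toy illustration of a viewpoint-dependent visibility score in the spirit of the VEM, the sketch below combines viewing distance with the angle between the sign's normal and the sight line. The particular weighting, and the fact that occlusion is ignored, are simplifying assumptions rather than the paper's model:

```python
# Toy viewpoint visibility score: frontal, nearby signs score high; oblique or
# distant signs score low. Weighting is an illustrative assumption.
import numpy as np

def visibility(viewpoint, sign_center, sign_normal, max_dist=100.0):
    """Score in [0, 1] for how visible a sign is from a viewpoint."""
    sight = sign_center - viewpoint
    dist = np.linalg.norm(sight)
    if dist > max_dist:
        return 0.0
    cos_angle = float(np.dot(-sight / dist, sign_normal))  # frontal view = 1
    return max(cos_angle, 0.0) * (1.0 - dist / max_dist)

v = visibility(np.array([0.0, 0.0, 1.5]),   # driver's eye position
               np.array([30.0, 2.0, 3.0]),  # sign center
               np.array([-1.0, 0.0, 0.0]))  # sign faces the driver
```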
A Review on Deep Learning Techniques Applied to Semantic Segmentation
Image semantic segmentation is of increasing interest to computer vision and
machine learning researchers. Many emerging applications need
accurate and efficient segmentation mechanisms: autonomous driving, indoor
navigation, and even virtual or augmented reality systems to name a few. This
demand coincides with the rise of deep learning approaches in almost every
field or application target related to computer vision, including semantic
segmentation or scene understanding. This paper provides a review on deep
learning methods for semantic segmentation applied to various application
areas. First, we describe the terminology of this field as well as the
necessary background concepts. Next, the main datasets and challenges are
presented to help researchers decide which ones best suit their needs and
targets. Then, existing methods are reviewed, highlighting their contributions
and their significance in the field. Finally, quantitative results are given
for the described methods and the datasets in which they were evaluated,
followed by a discussion of the results. Lastly, we point out a set of
promising future research directions and draw our own conclusions about the
state of the art of semantic segmentation using deep learning techniques.
Comment: Submitted to TPAMI on Apr. 22, 2017