Small, Versatile and Mighty: A Range-View Perception Framework
Despite its compactness and information integrity, the range view
representation of LiDAR data rarely occurs as the first choice for 3D
perception tasks. In this work, we further push the envelope of the range-view
representation with a novel multi-task framework, achieving unprecedented 3D
detection performance. Our proposed Small, Versatile, and Mighty (SVM) network
utilizes a pure convolutional architecture to fully unleash the efficiency and
multi-tasking potential of the range-view representation. To boost detection
performance, we first propose a range-view-specific Perspective Centric Label
Assignment (PCLA) strategy, and a novel View Adaptive Regression (VAR) module
to further refine hard-to-predict box properties. In addition, our framework
seamlessly integrates semantic segmentation and panoptic segmentation tasks for
the LiDAR point cloud, without extra modules. Among range-view-based methods,
our model achieves new state-of-the-art detection performance on the Waymo
Open Dataset. In particular, it improves vehicle-class mAP by over 10 points
relative to its convolutional counterparts. Our results on the other tasks
further reveal the multi-task capabilities of the proposed small but mighty
framework.
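A range view is typically obtained by spherically projecting each LiDAR return onto an image indexed by elevation (rows) and azimuth (columns). The Python sketch below shows such a projection under stated assumptions: the function name `range_view_projection`, the image size, and the vertical field of view (roughly that of a 64-beam spinning sensor) are illustrative choices, not details taken from the SVM paper.

```python
import math


def range_view_projection(points, height=64, width=2048,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points onto an H x W range image (hypothetical helper).

    Each pixel stores the range of the closest point that falls into it;
    empty pixels stay at 0.0. FOV defaults are illustrative assumptions.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate points at the sensor origin
        yaw = math.atan2(y, x)    # azimuth in [-pi, pi]
        pitch = math.asin(z / r)  # elevation angle
        # Column: azimuth mapped to [0, width); row: elevation mapped to [0, height)
        u = 0.5 * (1.0 - yaw / math.pi) * width
        v = (1.0 - (pitch - fov_down) / fov) * height
        col = min(width - 1, max(0, math.floor(u)))
        row = min(height - 1, max(0, math.floor(v)))
        # Keep the nearest return when several points share a pixel
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image
```

For example, `range_view_projection([(10.0, 0.0, 0.0)])` places a point 10 m straight ahead in the middle column of the image. Keeping the nearest return per pixel is one common way to resolve collisions; other pipelines simply keep the last point written.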
PIXOR: Real-time 3D Object Detection from Point Clouds
We address the problem of real-time 3D object detection from point clouds in
the context of autonomous driving. Computation speed is critical as detection
is a necessary component for safety. Existing approaches, however, are
computationally expensive due to the high dimensionality of point clouds. We utilize
the 3D data more efficiently by representing the scene from the Bird's Eye View
(BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs
oriented 3D object estimates decoded from pixel-wise neural network
predictions. The input representation, network architecture, and model
optimization are especially designed to balance high accuracy and real-time
efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection
benchmark, and a large-scale 3D vehicle detection benchmark. On both datasets,
we show that the proposed detector surpasses other state-of-the-art methods
notably in terms of Average Precision (AP), while still running at over 28 FPS.
Comment: Update of CVPR2018 paper: correct timing, fix typos, add
acknowledgement
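PIXOR's Bird's Eye View input can be pictured as a stack of 2D occupancy maps, one per height slice. The sketch below is a minimal, dependency-free approximation under assumptions: the crop ranges, the 0.1 m resolutions, and the name `bev_occupancy` are illustrative; any extra channels the paper may use (for example reflectance) are omitted; and occupied cells are returned sparsely as a set of `(height_slice, row, col)` indices instead of a dense tensor.

```python
def bev_occupancy(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                  z_range=(-2.5, 1.0), xy_res=0.1, z_res=0.1):
    """Encode points as a sparse binary BEV occupancy grid (hypothetical helper).

    Returns (shape, occupied) where shape = (n_z, n_x, n_y) with height
    slices as channels, and occupied is a set of (k, i, j) cell indices.
    Ranges and resolutions are illustrative assumptions.
    """
    n_x = round((x_range[1] - x_range[0]) / xy_res)
    n_y = round((y_range[1] - y_range[0]) / xy_res)
    n_z = round((z_range[1] - z_range[0]) / z_res)
    occupied = set()
    for x, y, z in points:
        # Crop to the region of interest in front of the vehicle
        if not (x_range[0] <= x < x_range[1]
                and y_range[0] <= y < y_range[1]
                and z_range[0] <= z < z_range[1]):
            continue
        i = int((x - x_range[0]) / xy_res)  # BEV row
        j = int((y - y_range[0]) / xy_res)  # BEV column
        k = int((z - z_range[0]) / z_res)   # height slice, used as a channel
        # Clamp guards against floating-point spill at the upper edges
        occupied.add((min(k, n_z - 1), min(i, n_x - 1), min(j, n_y - 1)))
    return (n_z, n_x, n_y), occupied
```

Because each height slice becomes an ordinary image channel, a standard 2D convolutional backbone can process the grid directly, which is the kind of efficiency the abstract attributes to the BEV representation.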
Deep Lidar CNN to Understand the Dynamics of Moving Vehicles
Perception technologies in autonomous driving are experiencing a golden
age thanks to advances in deep learning. Yet, most of these systems rely on
the semantically rich information of RGB images. Deep learning solutions
applied to data from other sensors typically mounted on autonomous cars (e.g.
lidars or radars) remain largely unexplored. In this paper we propose a novel
solution for understanding the dynamics of moving vehicles in the scene using
only lidar information. The main challenge of this problem stems from the fact that
we need to disambiguate the proprio-motion of the 'observer' vehicle from that
of the external 'observed' vehicles. For this purpose, we devise a CNN
architecture which at testing time is fed with pairs of consecutive lidar
scans. However, to properly learn the parameters of this network, we introduce
during training a series of so-called pretext tasks that also leverage image
data. These tasks include semantic information about vehicleness and a novel
lidar-flow feature that combines standard image-based optical flow with lidar
scans. We obtain very promising results and show that including distilled image
information only during training improves the inference results of the network
at test time, even when image data is no longer used.
Comment: Presented at IEEE ICRA 2018. IEEE Copyrights: Personal use of this
material is permitted. Permission from IEEE must be obtained for all other
uses. (V2 just corrected comments on arXiv submission.)
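One plausible way to pair lidar scans with image-based optical flow is to project each lidar point into the camera image and read the flow vector at that pixel. The sketch below illustrates only that geometric step; it is an assumption about how such a feature could be built, not the paper's exact lidar-flow definition. The function name `lidar_flow`, the pinhole intrinsics `fx, fy, cx, cy`, and the requirement that points are already in the camera frame are all hypothetical.

```python
import math


def lidar_flow(points_cam, flow, fx, fy, cx, cy):
    """Attach image optical flow (du, dv) to lidar points (hypothetical helper).

    points_cam: (x, y, z) points already in the camera frame (z forward);
                the lidar-to-camera extrinsic transform is assumed applied.
    flow:       flow[row][col] -> (du, dv), a dense optical-flow field.
    Returns (x, y, z, du, dv) tuples for points that land inside the image.
    """
    height, width = len(flow), len(flow[0])
    out = []
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # behind the camera: no valid projection
        # Pinhole projection to pixel coordinates
        col = math.floor(fx * x / z + cx)
        row = math.floor(fy * y / z + cy)
        if 0 <= row < height and 0 <= col < width:
            du, dv = flow[row][col]
            out.append((x, y, z, du, dv))
    return out
```

Points behind the camera or outside the image receive no flow and are simply skipped here; a real pipeline might instead keep them with a zero or masked value.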
SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving
In this paper, we introduce a deep encoder-decoder network, named SalsaNet,
for efficient semantic segmentation of 3D LiDAR point clouds. SalsaNet segments
the road, i.e. drivable free-space, and vehicles in the scene by employing the
Bird's-Eye-View (BEV) image projection of the point cloud. To overcome the lack
of annotated point cloud data, in particular for the road segments, we
introduce an auto-labeling process which transfers automatically generated
labels from the camera to LiDAR. We also explore the role of image-like
projection of LiDAR data in semantic segmentation by comparing BEV with
spherical-front-view projection and show that SalsaNet is projection-agnostic.
We perform quantitative and qualitative evaluations on the KITTI dataset, which
demonstrate that the proposed SalsaNet outperforms other state-of-the-art
semantic segmentation networks in terms of accuracy and computation time. Our
code and data are publicly available at
https://gitlab.com/aksoyeren/salsanet.git
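The camera-to-LiDAR label transfer described above can be sketched as projecting each point through a camera projection matrix and copying the class id of the pixel it lands on. This is a minimal sketch under assumptions: the function name `transfer_labels`, the 3x4 matrix `proj` (intrinsics times the LiDAR-to-camera extrinsics), and the `unlabeled` sentinel are hypothetical, and occlusion handling, which a practical auto-labeler would need, is ignored.

```python
import math


def transfer_labels(points, label_image, proj, unlabeled=-1):
    """Label LiDAR points from a camera label image (hypothetical helper).

    points:      (x, y, z) in the LiDAR frame.
    label_image: label_image[row][col] -> integer class id from a camera model.
    proj:        3x4 projection matrix (intrinsics @ [R | t]), LiDAR -> pixels.
    """
    height, width = len(label_image), len(label_image[0])
    labels = []
    for x, y, z in points:
        p = (x, y, z, 1.0)  # homogeneous coordinates
        u, v, w = (sum(proj[r][c] * p[c] for c in range(4)) for r in range(3))
        if w <= 0.0:
            labels.append(unlabeled)  # point is behind the camera
            continue
        col, row = math.floor(u / w), math.floor(v / w)
        if 0 <= row < height and 0 <= col < width:
            labels.append(label_image[row][col])
        else:
            labels.append(unlabeled)  # outside the camera frustum
    return labels
```

With `proj = [[2, -2, 0, 0], [2, 0, -2, 0], [1, 0, 0, 0]]` (a toy camera looking along the LiDAR x axis with principal point (2, 2)), a point 1 m straight ahead maps to row 2, column 2 of the label image.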
Deep Generative Modeling of LiDAR Data
Building models capable of generating structured output is a key challenge
for AI and robotics. While generative models have been explored on many types
of data, little work has been done on synthesizing lidar scans, which play a
key role in robot mapping and localization. In this work, we show that one can
adapt deep generative models for this task by unravelling lidar scans into a 2D
point map. Our approach can generate high quality samples, while simultaneously
learning a meaningful latent representation of the data. We demonstrate
significant improvements against state-of-the-art point cloud generation
methods. Furthermore, we propose a novel data representation that augments the
2D signal with absolute positional information. We show that this improves
robustness to noisy and imputed input; the learned model can recover the
underlying lidar scan from seemingly uninformative data.
Comment: Presented at IROS 201
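To make the idea of augmenting a 2D lidar map with absolute positional information concrete, the sketch below converts each cell of a range map into Cartesian `(x, y, z)` coordinates using the elevation and azimuth implied by its row and column. This is one assumed form of positional augmentation, not necessarily the representation used in the paper; the function name `range_map_to_xyz` and the field-of-view defaults are hypothetical.

```python
import math


def range_map_to_xyz(range_map, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Turn an H x W range map into an H x W grid of (x, y, z) triples.

    Each pixel's elevation and azimuth are implied by its row and column,
    so every cell ends up carrying its absolute 3D position. Empty pixels
    (range 0.0) map to the origin. FOV values are illustrative assumptions.
    """
    height, width = len(range_map), len(range_map[0])
    fov_up, fov_down = math.radians(fov_up_deg), math.radians(fov_down_deg)
    grid = []
    for row in range(height):
        # Elevation at the centre of this row (row 0 = top beam)
        pitch = fov_up - (row + 0.5) / height * (fov_up - fov_down)
        cells = []
        for col in range(width):
            # Azimuth at the centre of this column, sweeping from +pi to -pi
            yaw = math.pi * (1.0 - 2.0 * (col + 0.5) / width)
            r = range_map[row][col]
            cells.append((r * math.cos(pitch) * math.cos(yaw),
                          r * math.cos(pitch) * math.sin(yaw),
                          r * math.sin(pitch)))
        grid.append(cells)
    return grid
```

Row 0 is taken to be the top beam and columns sweep azimuth from +pi to -pi, matching a common spherical-projection convention; a sensor with a different layout would need different angle formulas.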