Search CORE

5,642 research outputs found

Small, Versatile and Mighty: A Range-View Perception Framework

Author: Meng Qiang
Wang JiaBao
Wang Ke
Wang Xiao
Yan Liujiang
Publication venue
Publication date: 01/03/2024
Field of study

Despite its compactness and information integrity, the range view representation of LiDAR data rarely occurs as the first choice for 3D perception tasks. In this work, we further push the envelop of the range-view representation with a novel multi-task framework, achieving unprecedented 3D detection performances. Our proposed Small, Versatile, and Mighty (SVM) network utilizes a pure convolutional architecture to fully unleash the efficiency and multi-tasking potentials of the range view representation. To boost detection performances, we first propose a range-view specific Perspective Centric Label Assignment (PCLA) strategy, and a novel View Adaptive Regression (VAR) module to further refine hard-to-predict box properties. In addition, our framework seamlessly integrates semantic segmentation and panoptic segmentation tasks for the LiDAR point cloud, without extra modules. Among range-view-based methods, our model achieves new state-of-the-art detection performances on the Waymo Open Dataset. Especially, over 10 mAP improvement over convolutional counterparts can be obtained on the vehicle class. Our presented results for other tasks further reveal the multi-task capabilities of the proposed small but mighty framework

arXiv.org e-Print Archive

PIXOR: Real-time 3D Object Detection from Point Clouds

Author: Luo Wenjie
Urtasun Raquel
Yang Bin
Publication venue
Publication date: 01/03/2019
Field of study

We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. Computation speed is critical as detection is a necessary component for safety. Existing approaches are, however, expensive in computation due to high dimensionality of point clouds. We utilize the 3D data more efficiently by representing the scene from the Bird's Eye View (BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions. The input representation, network architecture, and model optimization are especially designed to balance high accuracy and real-time efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection benchmark, and a large-scale 3D vehicle detection benchmark. In both datasets we show that the proposed detector surpasses other state-of-the-art methods notably in terms of Average Precision (AP), while still runs at >28 FPS.Comment: Update of CVPR2018 paper: correct timing, fix typos, add acknowledgemen

arXiv.org e-Print Archive

Crossref

Deep Lidar CNN to Understand the Dynamics of Moving Vehicles

Author: Moreno-Noguer Francesc
Sanfeliu Alberto
Vaquero Victor
Publication venue
Publication date: 01/01/2018
Field of study

Perception technologies in Autonomous Driving are experiencing their golden age due to the advances in Deep Learning. Yet, most of these systems rely on the semantically rich information of RGB images. Deep Learning solutions applied to the data of other sensors typically mounted on autonomous cars (e.g. lidars or radars) are not explored much. In this paper we propose a novel solution to understand the dynamics of moving vehicles of the scene from only lidar information. The main challenge of this problem stems from the fact that we need to disambiguate the proprio-motion of the 'observer' vehicle from that of the external 'observed' vehicles. For this purpose, we devise a CNN architecture which at testing time is fed with pairs of consecutive lidar scans. However, in order to properly learn the parameters of this network, during training we introduce a series of so-called pretext tasks which also leverage on image data. These tasks include semantic information about vehicleness and a novel lidar-flow feature which combines standard image-based optical flow with lidar scans. We obtain very promising results and show that including distilled image information only during training, allows improving the inference results of the network at test time, even when image data is no longer used.Comment: Presented in IEEE ICRA 2018. IEEE Copyrights: Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses. (V2 just corrected comments on arxiv submission

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving

Author: Aksoy Eren Erdal
Baci Saimir
Cavdar Selcuk
Publication venue
Publication date: 18/09/2019
Field of study

In this paper, we introduce a deep encoder-decoder network, named SalsaNet, for efficient semantic segmentation of 3D LiDAR point clouds. SalsaNet segments the road, i.e. drivable free-space, and vehicles in the scene by employing the Bird-Eye-View (BEV) image projection of the point cloud. To overcome the lack of annotated point cloud data, in particular for the road segments, we introduce an auto-labeling process which transfers automatically generated labels from the camera to LiDAR. We also explore the role of imagelike projection of LiDAR data in semantic segmentation by comparing BEV with spherical-front-view projection and show that SalsaNet is projection-agnostic. We perform quantitative and qualitative evaluations on the KITTI dataset, which demonstrate that the proposed SalsaNet outperforms other state-of-the-art semantic segmentation networks in terms of accuracy and computation time. Our code and data are publicly available at https://gitlab.com/aksoyeren/salsanet.git

arXiv.org e-Print Archive

Crossref

Deep Generative Modeling of LiDAR Data

Author: Caccia Lucas
Courville Aaron
Pineau Joelle
van Hoof Herke
Publication venue
Publication date: 01/01/2019
Field of study

Building models capable of generating structured output is a key challenge for AI and robotics. While generative models have been explored on many types of data, little work has been done on synthesizing lidar scans, which play a key role in robot mapping and localization. In this work, we show that one can adapt deep generative models for this task by unravelling lidar scans into a 2D point map. Our approach can generate high quality samples, while simultaneously learning a meaningful latent representation of the data. We demonstrate significant improvements against state-of-the-art point cloud generation methods. Furthermore, we propose a novel data representation that augments the 2D signal with absolute positional information. We show that this helps robustness to noisy and imputed input; the learned model can recover the underlying lidar scan from seemingly uninformative dataComment: Presented at IROS 201

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE