Deconvolutional networks for point-cloud vehicle detection and tracking in driving scenarios
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Vehicle detection and tracking is a core ingredient for developing autonomous driving applications in urban scenarios. Recent image-based Deep Learning (DL) techniques are obtaining breakthrough results in these perception tasks. However, DL research has not yet advanced much towards processing 3D point clouds from lidar range-finders. These sensors are very common in autonomous vehicles since, despite not providing information as semantically rich as images, they are more robust than vision sensors under harsh weather conditions. In this paper we present a full vehicle detection and tracking system that works with 3D lidar information only. Our detection step uses a Convolutional Neural Network (CNN) that receives as input a feature representation of the 3D information provided by a Velodyne HDL-64 sensor and returns a per-point classification of whether each point belongs to a vehicle or not. The classified point cloud is then geometrically processed to generate observations for a multi-object tracking system implemented via a number of Multi-Hypothesis Extended Kalman Filters (MH-EKF) that estimate the position and velocity of the surrounding vehicles. The system is thoroughly evaluated on the KITTI tracking dataset, and we show the performance boost provided by our CNN-based vehicle detector over a standard geometric approach. Our lidar-based approach uses about 4% of the data needed for an image-based detector, with similarly competitive results.
Peer Reviewed. Postprint (author's final draft)
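To make the tracking step above more concrete, here is a minimal sketch of a constant-velocity Kalman filter that estimates position and velocity from position-only observations. It is a deliberate simplification: a single linear filter along one axis, not the multi-hypothesis EKF the abstract describes. The class `Track1D`, the functions `predict`, `update`, and `track`, and the noise values `q` and `r` are all hypothetical names and settings chosen for illustration.

```python
class Track1D:
    """State of one tracked vehicle along a single axis (illustrative only)."""

    def __init__(self, pos, vel, p_pp, p_pv, p_vv):
        self.pos = pos    # estimated position (m)
        self.vel = vel    # estimated velocity (m/s)
        self.p_pp = p_pp  # variance of the position estimate
        self.p_pv = p_pv  # covariance between position and velocity
        self.p_vv = p_vv  # variance of the velocity estimate


def predict(t, dt, q):
    """Propagate the state by dt seconds with a constant-velocity model.

    q is an assumed process-noise term added to the velocity variance.
    """
    pos = t.pos + t.vel * dt
    # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
    p_pp = t.p_pp + 2.0 * dt * t.p_pv + dt * dt * t.p_vv
    p_pv = t.p_pv + dt * t.p_vv
    p_vv = t.p_vv + q
    return Track1D(pos, t.vel, p_pp, p_pv, p_vv)


def update(t, z, r):
    """Correct the state with a position observation z of variance r."""
    s = t.p_pp + r      # innovation variance
    k_p = t.p_pp / s    # Kalman gain for position
    k_v = t.p_pv / s    # Kalman gain for velocity
    innovation = z - t.pos
    # P <- (I - K H) P with H = [1, 0]
    return Track1D(
        pos=t.pos + k_p * innovation,
        vel=t.vel + k_v * innovation,
        p_pp=(1.0 - k_p) * t.p_pp,
        p_pv=(1.0 - k_p) * t.p_pv,
        p_vv=t.p_vv - k_v * t.p_pv,
    )


def track(measurements, dt=0.1, q=0.01, r=0.1):
    """Run predict/update over a sequence of position observations."""
    t = Track1D(pos=0.0, vel=0.0, p_pp=1.0, p_pv=0.0, p_vv=10.0)
    for z in measurements:
        t = update(predict(t, dt, q), z, r)
    return t
```

Feeding `track` the positions of a vehicle moving at a steady 2 m/s, sampled every 0.1 s, should drive the velocity estimate towards 2 m/s even though velocity is never observed directly.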
Deep Lidar CNN to Understand the Dynamics of Moving Vehicles
Perception technologies in Autonomous Driving are experiencing their golden
age due to the advances in Deep Learning. Yet, most of these systems rely on
the semantically rich information of RGB images. Deep Learning solutions
applied to data from other sensors typically mounted on autonomous cars (e.g.,
lidars or radars) remain largely unexplored. In this paper we propose a novel
solution to understand the dynamics of moving vehicles in the scene from lidar
information alone. The main challenge of this problem stems from the fact that
we need to disambiguate the proprio-motion of the 'observer' vehicle from that
of the external 'observed' vehicles. For this purpose, we devise a CNN
architecture which at testing time is fed with pairs of consecutive lidar
scans. However, in order to properly learn the parameters of this network,
during training we introduce a series of so-called pretext tasks which also
leverage image data. These tasks include semantic information about
vehicleness and a novel lidar-flow feature which combines standard image-based
optical flow with lidar scans. We obtain very promising results and show that
including distilled image information only during training improves the
inference results of the network at test time, even when image data is no
longer used.
Comment: Presented at IEEE ICRA 2018. IEEE Copyrights: Personal use of this
material is permitted. Permission from IEEE must be obtained for all other
uses. (V2 just corrected comments on arXiv submission)
Dual-Branch CNNs for Vehicle Detection and Tracking on LiDAR Data
We present a novel vehicle detection and tracking system that works solely on 3D LiDAR information. Our approach segments vehicles using a dual-view representation of the 3D LiDAR point cloud on two independently trained convolutional neural networks, one for each view. A bounding box growing algorithm is applied to the fused output of the networks to properly enclose the segmented vehicles. Bounding boxes are grown using a probabilistic method that also takes occluded areas into account. The final vehicle bounding boxes act as observations for a multi-hypothesis tracking system that estimates the position and velocity of the observed vehicles. We thoroughly evaluate our system on the KITTI benchmarks for detection and tracking separately, and show that our dual-branch classifier consistently outperforms previous single-branch approaches, improving on or directly competing with other state-of-the-art LiDAR-based methods.
This work was supported in part by the EU Project LOGIMATIC under Grant H2020-Galileo-2015-1-687534, in part by the Spanish State Research Agency through the María de Maeztu Seal of Excellence to IRI under Grant MDM-2016-0656, in part by the ColRobTransp Project under Grant DPI2016-78957-RAEI/FEDER EU, in part by the EB-SLAM Project under Grant DPI2017-89564-P, and in part by the FPU Grant under Grant FPU15/04446.
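As an illustration of what a 2D view of a 3D point cloud can look like, the sketch below rasterizes points into a bird's-eye-view grid that stores the maximum height seen in each cell. The abstract does not specify how its two views are built, so the function name `birds_eye_view`, the grid extents, the cell size, and the choice of maximum height as the cell feature are all assumptions made for this example.

```python
def birds_eye_view(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Project (x, y, z) points onto a top-down grid of maximum heights.

    Rows index the forward axis x, columns the lateral axis y.
    Cells that receive no point are left as None.
    The ranges and cell size (in metres) are illustrative defaults.
    """
    rows = int(round((x_range[1] - x_range[0]) / cell))
    cols = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[None] * cols for _ in range(rows)]

    for x, y, z in points:
        # Discard points that fall outside the region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        r = min(int((x - x_range[0]) / cell), rows - 1)
        c = min(int((y - y_range[0]) / cell), cols - 1)
        if grid[r][c] is None or z > grid[r][c]:
            grid[r][c] = z
    return grid
```

A grid like this can be fed to an ordinary 2D CNN; other per-cell features (point count, mean reflectance) could be stacked as extra channels in the same way.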
Dual-branch CNNs for vehicle detection and tracking on LiDAR data
© 2021 IEEE.
We present a novel vehicle detection and tracking system that works solely on 3D LiDAR information. Our approach segments vehicles using a dual-view representation of the 3D LiDAR point cloud on two independently trained convolutional neural networks, one for each view. A bounding box growing algorithm is applied to the fused output of the networks to properly enclose the segmented vehicles. Bounding boxes are grown using a probabilistic method that also takes occluded areas into account. The final vehicle bounding boxes act as observations for a multi-hypothesis tracking system that estimates the position and velocity of the observed vehicles. We thoroughly evaluate our system on the KITTI benchmarks for detection and tracking separately, and show that our dual-branch classifier consistently outperforms previous single-branch approaches, improving on or directly competing with other state-of-the-art LiDAR-based methods.
This work was supported in part by the Spanish State Research Agency through the María de Maeztu Seal of Excellence to IRI under Grant MDM-2016-0656, in part by the ColRobTransp Project under Grant DPI2016-78957-RAEI/FEDER EU, in part by the EB-SLAM Project under Grant DPI2017-89564-P, and in part by the FPU Grant under Grant FPU15/04446. The Associate Editor for this article was Z. Duric. (Víctor Vaquero and Iván del Pino contributed equally to this work.) (Corresponding author: Víctor Vaquero.)
Peer Reviewed. Postprint (author's final draft)
Lidar-based scene understanding for autonomous driving using deep learning
With over 1.35 million fatalities related to traffic accidents worldwide, autonomous driving was foreseen at the beginning of this century as a feasible solution to improve safety on our roads. Moreover, it is expected to disrupt our transportation paradigm, reducing congestion, pollution, and costs while increasing the accessibility, efficiency, and reliability of transportation for both people and goods. Although some advances have gradually been transferred to commercial vehicles in the form of Advanced Driving Assistance Systems (ADAS), such as adaptive cruise control, blind spot detection, or automatic parking, the technology is far from mature. A full understanding of the scene is needed so that vehicles are aware of their surroundings, knowing the existing elements of the scene as well as their motion, intentions, and interactions.
In this PhD dissertation, we explore new approaches for understanding driving scenes from 3D LiDAR point clouds by using Deep Learning methods. To this end, in Part I we analyze the scene from a static perspective, using independent frames to detect the neighboring vehicles. Next, in Part II we develop new ways of understanding the dynamics of the scene. Finally, in Part III we apply all the developed methods to accomplish higher-level challenges such as segmenting moving obstacles while obtaining their rigid motion vector over the ground.
More specifically, in Chapter 2 we develop a 3D vehicle detection pipeline based on a multi-branch deep-learning architecture and propose a Front view (FR-V) and a Bird's Eye view (BE-V) as 2D representations of the 3D point cloud to serve as input for training our models. Later on, in Chapter 3 we apply and further test this method on two real use cases: pre-filtering moving obstacles while creating maps, so that we can better localize ourselves on subsequent days, and vehicle tracking. From the dynamic perspective, in Chapter 4 we learn from the 3D point cloud a novel dynamic feature that resembles optical flow from RGB images. For that, we develop a new approach that leverages RGB optical flow as pseudo ground truth for training purposes while requiring only 3D LiDAR data at inference time. Additionally, in Chapter 5 we explore the benefits of combining classification and regression learning problems to tackle the optical flow estimation task in a joint coarse-and-fine manner. Lastly, in Chapter 6 we gather the previous methods and demonstrate that with these independent tasks we can guide the learning of more challenging problems such as segmentation and motion estimation of moving vehicles from our own moving perspective.
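One common way to combine classification and regression in a coarse-and-fine manner is to classify a target value into a coarse bin and regress a small residual inside that bin. The sketch below shows only that target encoding and its inverse; it is a generic formulation and not necessarily the one used in the dissertation. The function names `encode` and `decode`, the value range, and the number of bins are hypothetical.

```python
def encode(value, lo, hi, n_bins):
    """Split a scalar (e.g. one flow component) into a coarse bin index
    and a fine residual expressed in bin widths, within [-0.5, 0.5].

    Values outside [lo, hi] are clipped to the range first.
    """
    width = (hi - lo) / n_bins
    clipped = min(max(value, lo), hi)
    idx = min(int((clipped - lo) / width), n_bins - 1)  # classification target
    center = lo + (idx + 0.5) * width
    residual = (clipped - center) / width               # regression target
    return idx, residual


def decode(idx, residual, lo, hi, n_bins):
    """Rebuild the scalar from a bin index and its residual."""
    width = (hi - lo) / n_bins
    return lo + (idx + 0.5 + residual) * width
```

In a network, the bin index would typically be trained with a classification loss and the residual with a regression loss, and `decode` would merge both predictions at inference time.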
LiDAR point-cloud processing based on projection methods: a comparison
An accurate and rapid-response perception system is fundamental for
autonomous vehicles to operate safely. 3D object detection methods handle point
clouds given by LiDAR sensors to provide accurate depth and position
information for each detection, together with its dimensions and
classification. The information is then used to track vehicles and other
obstacles in the surroundings of the autonomous vehicle, and also to feed
control units that guarantee collision avoidance and motion planning. Nowadays,
object detection systems can be divided into two main categories. The first
comprises geometric methods, which retrieve the obstacles using geometric and
morphological operations on the 3D points. The second comprises deep
learning-based methods, which process the 3D points, or an elaboration of the
3D point cloud, with deep learning techniques to retrieve a set of obstacles.
This paper presents a comparison between those two approaches, presenting one
implementation of each class on a real autonomous vehicle. The accuracy of the
estimates of the algorithms has been evaluated with experimental tests carried
out at the Monza ENI circuit. The positions of the ego vehicle and the obstacle
are given by GPS sensors with RTK correction, which guarantees an accurate
ground truth for the comparison. Both algorithms have been implemented in ROS
and run on a consumer laptop.
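A comparison against an RTK-GPS ground truth, as described above, usually reduces to a position-error statistic over time-aligned samples. The sketch below computes the root-mean-square Euclidean error between estimated and reference 2D positions. The abstract does not state which metric the authors report, so both the metric and the function name `position_rmse` are assumptions.

```python
import math


def position_rmse(estimates, ground_truth):
    """Root-mean-square Euclidean error between paired (x, y) positions.

    Both sequences must be time-aligned and of equal, non-zero length.
    """
    if len(estimates) != len(ground_truth) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (ex - gx) ** 2 + (ey - gy) ** 2
        for (ex, ey), (gx, gy) in zip(estimates, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Running the same function on the output of each detector gives one comparable number per method; in practice the obstacle positions would first be expressed in the same reference frame as the GPS data.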
Deep lidar CNN to understand the dynamics of moving vehicles
Perception technologies in Autonomous Driving are experiencing their golden age due to the advances in Deep Learning. Yet, most of these systems rely on the semantically rich information of RGB images. Deep Learning solutions applied to data from other sensors typically mounted on autonomous cars (e.g., lidars or radars) remain largely unexplored. In this paper we propose a novel solution to understand the dynamics of moving vehicles in the scene from lidar information alone. The main challenge of this problem stems from the fact that we need to disambiguate the proprio-motion of the "observer" vehicle from that of the external "observed" vehicles. For this purpose, we devise a CNN architecture which at testing time is fed with pairs of consecutive lidar scans. However, in order to properly learn the parameters of this network, during training we introduce a series of so-called pretext tasks which also leverage image data. These tasks include semantic information about vehicleness and a novel lidar-flow feature which combines standard image-based optical flow with lidar scans. We obtain very promising results and show that including distilled image information only during training improves the inference results of the network at test time, even when image data is no longer used.
Peer Reviewed. Postprint (author's final draft)
Hallucinating dense optical flow from sparse lidar for autonomous vehicles
In this paper we propose a novel approach to estimate dense optical flow from sparse lidar data acquired on an autonomous vehicle. This is intended to be used as a drop-in replacement for any image-based optical flow system when images are not reliable, e.g., in adverse weather conditions or at night. In order to infer high-resolution 2D flows from discrete range data, we devise a three-block architecture of multiscale filters that combines multiple intermediate objectives, both in the lidar and image domains. To train this network we introduce a dataset with approximately 20K lidar samples from the KITTI dataset, which we have augmented with a pseudo ground-truth image-based optical flow computed using FlowNet2. We demonstrate the effectiveness of our approach on KITTI, and show that despite using the low-resolution and sparse measurements of the lidar, we can regress dense optical flow maps which are on par with those estimated with image-based methods.
Peer Reviewed. Postprint (author's final draft)
Deep Generative Modeling of LiDAR Data
Building models capable of generating structured output is a key challenge
for AI and robotics. While generative models have been explored on many types
of data, little work has been done on synthesizing lidar scans, which play a
key role in robot mapping and localization. In this work, we show that one can
adapt deep generative models for this task by unravelling lidar scans into a 2D
point map. Our approach can generate high quality samples, while simultaneously
learning a meaningful latent representation of the data. We demonstrate
significant improvements against state-of-the-art point cloud generation
methods. Furthermore, we propose a novel data representation that augments the
2D signal with absolute positional information. We show that this helps
robustness to noisy and imputed input; the learned model can recover the
underlying lidar scan from seemingly uninformative data.
Comment: Presented at IROS 201
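The idea of unravelling a lidar scan into a 2D map can be sketched as a spherical projection: each point is placed in a grid cell chosen by its elevation (row) and azimuth (column), and the cell stores the measured range. The abstract does not give the exact mapping, so the function name `unravel_scan`, the grid size, and the vertical field of view below are illustrative assumptions rather than the authors' settings.

```python
import math


def unravel_scan(points, n_rows=64, n_cols=512,
                 fov_up_deg=2.0, fov_down_deg=-24.8):
    """Project (x, y, z) points onto a rows-by-columns range image.

    Rows follow elevation from top to bottom, columns follow azimuth.
    Empty cells hold 0.0; when several points share a cell, the nearest
    one is kept. All default parameters are assumed values.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    grid = [[0.0] * n_cols for _ in range(n_rows)]

    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)       # in [-pi, pi]
        elevation = math.asin(z / rng)   # in [-pi/2, pi/2]
        if not (fov_down <= elevation <= fov_up):
            continue  # outside the assumed vertical field of view
        col = int((azimuth + math.pi) / (2.0 * math.pi) * n_cols) % n_cols
        row = min(int((fov_up - elevation) / fov * n_rows), n_rows - 1)
        if grid[row][col] == 0.0 or rng < grid[row][col]:
            grid[row][col] = rng
    return grid
```

Once a scan is in this form, image-style generative models can be applied to it, and Cartesian coordinates could be added as extra channels alongside the range.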