SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud
In this paper, we address semantic segmentation of road-objects from 3D LiDAR
point clouds. In particular, we wish to detect and categorize instances of
interest, such as cars, pedestrians and cyclists. We formulate this problem as
a point-wise classification problem, and propose an end-to-end pipeline called
SqueezeSeg based on convolutional neural networks (CNN): the CNN takes a
transformed LiDAR point cloud as input and directly outputs a point-wise label
map, which is then refined by a conditional random field (CRF) implemented as a
recurrent layer. Instance-level labels are then obtained by conventional
clustering algorithms. Our CNN model is trained on LiDAR point clouds from the
KITTI dataset, and our point-wise segmentation labels are derived from 3D
bounding boxes from KITTI. To obtain extra training data, we built a LiDAR
simulator into Grand Theft Auto V (GTA-V), a popular video game, to synthesize
large amounts of realistic training data. Our experiments show that SqueezeSeg
achieves high accuracy with astonishingly fast and stable runtime (8.7 ms per
frame), highly desirable for autonomous driving applications. Furthermore,
additionally training on synthesized data boosts validation accuracy on
real-world data. Our source code and synthesized data will be open-sourced.
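The "transformed LiDAR point cloud" the CNN consumes is a spherical (range-image) projection of the raw points, which turns the unordered 3D cloud into a dense 2D grid a standard CNN can process. A minimal sketch of such a projection; the 64×512 resolution and the vertical field of view are typical Velodyne HDL-64E values assumed here, not figures taken from the abstract:

```python
import numpy as np

def spherical_project(points, h=64, w=512, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.

    Each pixel stores the range (depth) of the point that falls into it;
    the same grid can also carry x, y, z, and intensity channels.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range of each point
    yaw = np.arctan2(y, x)                        # azimuth angle
    pitch = np.arcsin(z / np.maximum(r, 1e-8))    # elevation angle

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # Normalize angles into [0, 1), then scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r) * h
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r                               # later points overwrite earlier ones
    return image
```

The point-wise labels predicted on this grid can be mapped back to the original 3D points, since each point remembers its (u, v) pixel.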
Intent prediction of vulnerable road users for trusted autonomous vehicles
This study investigated how future autonomous vehicles could earn the trust of the vulnerable road users (such as pedestrians and cyclists) they will interact with in urban traffic environments. It focused on understanding the behaviour of these road users at a deeper level by predicting their future intentions using only vehicle-based sensors and AI techniques. The findings showed that the personal and body-language attributes of vulnerable road users, in addition to their past motion trajectories and the physical attributes of the environment, led to more accurate predictions of their intended actions.
Multimodal perception for autonomous driving
International Mention in the doctoral degree.
Autonomous driving is set to play an important role among intelligent
transportation systems in the coming decades. The advantages
of its large-scale implementation –reduced accidents, shorter commuting
times, or higher fuel efficiency– have made its development a priority
for academia and industry. However, there is still a long way to
go to achieve full self-driving vehicles, capable of dealing with any
scenario without human intervention. To this end, advances in control,
navigation and, especially, environment perception technologies
are yet required. In particular, the detection of other road users that
may interfere with the vehicle’s trajectory is a key element, since it
makes it possible to model the current traffic situation and, thus, to make decisions
accordingly.
The objective of this thesis is to provide solutions to some of
the main challenges of on-board perception systems, such as extrinsic
calibration of sensors, object detection, and deployment on
real platforms. First, a calibration method for obtaining the relative
transformation between pairs of sensors is introduced, eliminating
the complex manual adjustment of these parameters. The algorithm
makes use of an original calibration pattern and supports LiDARs,
and monocular and stereo cameras. Second, different deep learning
models for 3D object detection using LiDAR data in its bird’s eye
view projection are presented. Through a novel encoding, the use
of architectures tailored to image detection is proposed to process
the 3D information of point clouds in real time. Furthermore, the
effectiveness of using this projection together with image features is
analyzed. Finally, a method to mitigate the accuracy drop of LiDAR-based
detection networks when deployed in ad-hoc configurations is
introduced. For this purpose, the simulation of virtual signals mimicking
the specifications of the desired real device is used to generate
new annotated datasets that can be used to train the models.
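The bird's-eye-view encoding described above amounts to discretizing the point cloud onto a ground-plane grid whose cells carry a few statistics of the points falling into them. A minimal sketch of such an encoder; the grid extent, 0.1 m cell size, and the height/intensity/density channel choice are common conventions assumed here, not specifics from the thesis:

```python
import numpy as np

def bev_encode(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.1):
    """Encode an (N, 4) point cloud [x, y, z, intensity] as a 3-channel
    bird's-eye-view image: max height, max intensity, and point density."""
    h = int((x_range[1] - x_range[0]) / cell)
    w = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((3, h, w), dtype=np.float32)

    # Keep only points inside the grid.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    row = ((pts[:, 0] - x_range[0]) / cell).astype(np.int32)
    col = ((pts[:, 1] - y_range[0]) / cell).astype(np.int32)

    for r, c, z, i in zip(row, col, pts[:, 2], pts[:, 3]):
        bev[0, r, c] = max(bev[0, r, c], z)    # height channel
        bev[1, r, c] = max(bev[1, r, c], i)    # intensity channel
        bev[2, r, c] += 1.0                    # raw point count
    bev[2] = np.minimum(bev[2] / 16.0, 1.0)    # normalized density
    return bev
```

The resulting 3-channel image can then be fed to an off-the-shelf 2D detection architecture, which is what makes this projection attractive for real-time processing.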
The performance of the proposed methods is evaluated against
other existing alternatives using reference benchmarks in the field of
computer vision (KITTI and nuScenes) and through experiments in
open traffic with an automated vehicle. The results obtained demonstrate
the relevance of the presented work and its suitability for commercial
use.
Doctoral Program in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. Committee: President: Jesús García Herrero; Secretary: Ignacio Parra Alonso; Member: Gustavo Adolfo Peláez Coronad
Deep Learning Based Classification of Pedestrian Vulnerability Trained on Synthetic Datasets
The reliable detection of vulnerable road users and the assessment of their actual vulnerability is an important task for the collision-warning algorithms of driver assistance systems. Current systems make assumptions about road geometry that can lead to misclassification. We propose a deep learning-based approach to reliably detect pedestrians and classify their vulnerability based on the traffic area they are walking in. Since no pre-labeled datasets are available for this task, we developed a method to first train a network on custom synthetic data and then use that network to augment a customer-provided training dataset for a neural network working on real-world images. The evaluation shows that our network is able to accurately classify the vulnerability of pedestrians in complex real-world scenarios without making assumptions about road geometry.
Identificação de objetos para veículos autónomos com base em aprendizagem automática [Object identification for autonomous vehicles based on machine learning]
Autonomous driving is one of the most actively researched fields in artificial
intelligence. Autonomous vehicles are expected to significantly reduce
road accidents and casualties once they become a sufficiently mature
transport option. Currently, much effort is focused on proving the
concept of autonomous vehicles based on a suite of sensors that observe
their surroundings. In particular, camera and LiDAR are researched as an
efficient combination of sensors for on-line object identification on the road.
2D object identification is an already established field in Computer Vision.
The successful application of Deep Learning techniques has led to 2D vision
with human-level accuracy. However, for improved safety, more
advanced approaches suggest that the vehicle should not rely on a single
class of sensors. LiDAR has been proposed as an additional sensor, particularly
due to its 3D vision capability. 3D vision relies on LiDAR-captured data
to recognize objects in 3D. However, in contrast to 2D object identification,
3D object detection is a relatively immature field that still has many
challenges to overcome. In addition, LiDARs are expensive sensors, which
makes the acquisition of data required for training 3D object recognition
techniques an expensive task as well.
In this context, the major goal of this Master's thesis is to further facilitate
3D object identification for autonomous vehicles based on Deep Learning
(DL). The specific contributions of the present work are the following.
First, a comprehensive overview of state-of-the-art Deep Learning architectures
for 3D object identification based on point clouds is provided. The purpose
of this overview is to understand how to best approach such a problem in
the context of autonomous driving.
Second, synthetic but realistic LiDAR-captured data was generated in the
GTA V virtual environment. Tools were developed to convert the generated
data into the KITTI dataset format, which has become a standard for
evaluating 3D object detection techniques for autonomous driving.
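Converting the synthetic data to KITTI format means emitting one plain-text label file per frame, with one 15-field line per annotated object. A minimal writer for one such line; the example values in the usage note are purely illustrative, and this sketch covers only the label file, not the accompanying point-cloud and calibration files a full converter must also produce:

```python
def kitti_label_line(obj_type, bbox2d, dims, loc, rot_y,
                     truncated=0.0, occluded=0, alpha=0.0):
    """Format one object as a KITTI object-label line.

    obj_type: class name, e.g. "Car", "Pedestrian", "Cyclist"
    bbox2d:   (left, top, right, bottom) image box in pixels
    dims:     (height, width, length) of the 3D box in metres
    loc:      (x, y, z) box centre in camera coordinates
    rot_y:    rotation around the camera Y axis in radians
    """
    fields = [obj_type, truncated, occluded, alpha,
              *bbox2d, *dims, *loc, rot_y]
    # Floats are written with two decimals; the class name and the
    # integer occlusion flag are written verbatim.
    return " ".join(f"{f:.2f}" if isinstance(f, float) else str(f)
                    for f in fields)
```

For example, `kitti_label_line("Car", (10.0, 20.0, 100.0, 120.0), (1.5, 1.6, 3.9), (2.0, 1.5, 20.0), -1.57)` yields a single 15-field line ready to be appended to a `label_2` file.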
Third, some of the overviewed 3D object identification DL architectures
were evaluated with the generated data. Though their performance with
the generated data was worse than with the original KITTI data, the models
were still able to correctly process the synthetic data without being retrained.
The future benefit of this work is that the models can be further
trained with home-made data and varying testing scenarios.
The implemented GTA V mod has proved capable of providing rich,
well-structured datasets compatible with state-of-the-art 3D object
identification architectures.
The developed tool is publicly available and we hope it will be useful in
advancing 3D object identification for autonomous driving, as it removes
the dependency on datasets provided by a third party.
Master's Program in Computer and Telematics Engineering