3,563 research outputs found
Recommended from our members
Explainable and Advisable Learning for Self-driving Vehicles
Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable - they should provide easy-to-interpret rationales for their behavior - so that passengers, insurance companies, law enforcement, developers, etc., can understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller's output, namely rationalizations. Our work has focused on the challenge of generating introspective explanations of deep models for self-driving vehicles. In Chapter 3, we begin by exploring the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). In the first stage, we use a visual attention model to train a convolution network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior. In Chapter 4, we add an attention-based video-to-text model to produce textual explanations of model actions, e.g. "the car slows down because the road is wet". The attention maps of controller and explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment, strong- and weak-alignment. These explainable systems represent an externalization of tacit knowledge. The network's opaque reasoning is simplified to a situation-specific dependence on a visible object in the image. This makes them brittle and potentially unsafe in situations that do not match training data. In Chapter 5, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts the way it attends to the scene (visual attention) and the control (steering and speed). Further, in Chapter 6, we propose a new approach that learns vehicle control with the help of long-term (global) human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. "I see a pedestrian crossing, so I stop"), and predict the controls, accordingly
Lidar-based scene understanding for autonomous driving using deep learning
With over 1.35 million fatalities related to traffic accidents worldwide, autonomous driving was foreseen at the beginning of this century as a feasible solution to improve security in our roads. Nevertheless, it is meant to disrupt our transportation paradigm, allowing to reduce congestion, pollution, and costs, while increasing the accessibility, efficiency, and reliability of the transportation for both people and goods. Although some advances have gradually been transferred into commercial vehicles in the way of Advanced Driving Assistance Systems (ADAS) such as adaptive cruise control, blind spot detection or automatic parking, however, the technology is far from mature. A full understanding of the scene is actually needed so that allowing the vehicles to be aware of the surroundings, knowing the existing elements of the scene, as well as their motion, intentions and interactions.
In this PhD dissertation, we explore new approaches for understanding driving scenes from 3D LiDAR point clouds by using Deep Learning methods. To this end, in Part I we analyze the scene from a static perspective using independent frames to detect the neighboring vehicles. Next, in Part II we develop new ways for understanding the dynamics of the scene. Finally, in Part III we apply all the developed methods to accomplish higher level challenges such as segmenting moving obstacles while obtaining their rigid motion vector over the ground.
More specifically, in Chapter 2 we develop a 3D vehicle detection pipeline based on a multi-branch deep-learning architecture and propose a Front (FR-V) and a Bird’s Eye view (BE-V) as 2D representations of the 3D point cloud to serve as input for training our models. Later on, in Chapter 3 we apply and further test this method on two real uses-cases, for pre-filtering moving
obstacles while creating maps to better localize ourselves on subsequent days, as well as for vehicle tracking. From the dynamic perspective, in Chapter 4 we learn from the 3D point cloud a novel dynamic feature that resembles optical flow from RGB images. For that, we develop a new approach to leverage RGB optical flow as pseudo ground truth for training purposes but allowing the use of only 3D LiDAR data at inference time. Additionally, in Chapter 5 we explore the benefits of combining classification and regression learning problems to face the optical flow estimation task in a joint coarse-and-fine manner. Lastly, in Chapter 6 we gather the previous methods and demonstrate that with these independent tasks we can guide the learning of higher challenging problems such as segmentation and motion estimation of moving vehicles from our own moving perspective.Con más de 1,35 millones de muertes por accidentes de tráfico en el mundo, a principios de siglo se predijo que la conducción autónoma serÃa una solución viable para mejorar la seguridad en nuestras carreteras. Además la conducción autónoma está destinada a cambiar nuestros paradigmas de transporte, permitiendo reducir la congestión del tráfico, la contaminación y el coste, a la vez que aumentando la accesibilidad, la eficiencia y confiabilidad del transporte tanto de personas como de mercancÃas. Aunque algunos avances, como el control de crucero adaptativo, la detección de puntos ciegos o el estacionamiento automático, se han transferido gradualmente a vehÃculos comerciales en la forma de los Sistemas Avanzados de Asistencia a la Conducción (ADAS), la tecnologÃa aún no ha alcanzado el suficiente grado de madurez. Se necesita una comprensión completa de la escena para que los vehÃculos puedan entender el entorno, detectando los elementos presentes, asà como su movimiento, intenciones e interacciones. En la presente tesis doctoral, exploramos nuevos enfoques para comprender escenarios de conducción utilizando nubes de puntos en 3D capturadas con sensores LiDAR, para lo cual empleamos métodos de aprendizaje profundo. Con este fin, en la Parte I analizamos la escena desde una perspectiva estática para detectar vehÃculos. A continuación, en la Parte II, desarrollamos nuevas formas de entender las dinámicas del entorno. Finalmente, en la Parte III aplicamos los métodos previamente desarrollados para lograr desafÃos de nivel superior, como segmentar obstáculos dinámicos a la vez que estimamos su vector de movimiento sobre el suelo. EspecÃficamente, en el CapÃtulo 2 detectamos vehÃculos en 3D creando una arquitectura de aprendizaje profundo de dos ramas y proponemos una vista frontal (FR-V) y una vista de pájaro (BE-V) como representaciones 2D de la nube de puntos 3D que sirven como entrada para entrenar nuestros modelos. Más adelante, en el CapÃtulo 3 aplicamos y probamos aún más este método en dos casos de uso reales, tanto para filtrar obstáculos en movimiento previamente a la creación de mapas sobre los que poder localizarnos mejor en los dÃas posteriores, como para el seguimiento de vehÃculos. Desde la perspectiva dinámica, en el CapÃtulo 4 aprendemos de la nube de puntos en 3D una caracterÃstica dinámica novedosa que se asemeja al flujo óptico sobre imágenes RGB. Para ello, desarrollamos un nuevo enfoque que aprovecha el flujo óptico RGB como pseudo muestras reales para entrenamiento, usando solo information 3D durante la inferencia. Además, en el CapÃtulo 5 exploramos los beneficios de combinar los aprendizajes de problemas de clasificación y regresión para la tarea de estimación de flujo óptico de manera conjunta. Por último, en el CapÃtulo 6 reunimos los métodos anteriores y demostramos que con estas tareas independientes podemos guiar el aprendizaje de problemas de más alto nivel, como la segmentación y estimación del movimiento de vehÃculos desde nuestra propia perspectivaAmb més d’1,35 milions de morts per accidents de trà nsit al món, a principis de segle es va
predir que la conducció autònoma es convertiria en una solució viable per millorar la seguretat
a les nostres carreteres. D’altra banda, la conducció autònoma està destinada a canviar els
paradigmes del transport, fent possible aixà reduir la densitat del trà nsit, la contaminació i
el cost, alhora que augmentant l’accessibilitat, l’eficiència i la confiança del transport tant de
persones com de mercaderies. Encara que alguns avenços, com el control de creuer adaptatiu,
la detecció de punts cecs o l’estacionament automà tic, s’han transferit gradualment a vehicles
comercials en forma de Sistemes Avançats d’Assistència a la Conducció (ADAS), la tecnologia
encara no ha arribat a aconseguir el grau suficient de maduresa. És necessà ria, doncs, una
total comprensió de l’escena de manera que els vehicles puguin entendre l’entorn, detectant els
elements presents, aixà com el seu moviment, intencions i interaccions.
A la present tesi doctoral, explorem nous enfocaments per tal de comprendre les diferents
escenes de conducció utilitzant núvols de punts en 3D capturats amb sensors LiDAR, mitjançant
l’ús de mètodes d’aprenentatge profund. Amb aquest objectiu, a la Part I analitzem l’escena des
d’una perspectiva està tica per a detectar vehicles. A continuació, a la Part II, desenvolupem
noves formes d’entendre les dinà miques de l’entorn. Finalment, a la Part III apliquem els
mètodes prèviament desenvolupats per a aconseguir desafiaments d’un nivell superior, com, per
exemple, segmentar obstacles dinà mics al mateix temps que estimem el seu vector de moviment
respecte al terra.
Concretament, al CapÃtol 2 detectem vehicles en 3D creant una arquitectura d’aprenentatge
profund amb dues branques, i proposem una vista frontal (FR-V) i una vista d’ocell (BE-V)
com a representacions 2D del núvol de punts 3D que serveixen com a punt de partida per
entrenar els nostres models. Més endavant, al CapÃtol 3 apliquem i provem de nou aquest
mètode en dos casos d’ús reals, tant per filtrar obstacles en moviment prèviament a la creació
de mapes en els quals poder localitzar-nos millor en dies posteriors, com per dur a terme
el seguiment de vehicles. Des de la perspectiva dinà mica, al CapÃtol 4 aprenem una nova
caracterÃstica dinà mica del núvol de punts en 3D que s’assembla al flux òptic sobre imatges
RGB. Per a fer-ho, desenvolupem un nou enfocament que aprofita el flux òptic RGB com pseudo
mostres reals per a entrenament, utilitzant només informació 3D durant la inferència. Després,
al CapÃtol 5 explorem els beneficis que s’obtenen de combinar els aprenentatges de problemes
de classificació i regressió per la tasca d’estimació de flux òptic de manera conjunta. Finalment,
al CapÃtol 6 posem en comú els mètodes anteriors i demostrem que mitjançant aquests processos
independents podem abordar l’aprenentatge de problemes més complexos, com la segmentació
i estimació del moviment de vehicles des de la nostra pròpia perspectiva
Co-Clustering Network-Constrained Trajectory Data
Recently, clustering moving object trajectories kept gaining interest from
both the data mining and machine learning communities. This problem, however,
was studied mainly and extensively in the setting where moving objects can move
freely on the euclidean space. In this paper, we study the problem of
clustering trajectories of vehicles whose movement is restricted by the
underlying road network. We model relations between these trajectories and road
segments as a bipartite graph and we try to cluster its vertices. We
demonstrate our approaches on synthetic data and show how it could be useful in
inferring knowledge about the flow dynamics and the behavior of the drivers
using the road network
Learning Deep Representations of Appearance and Motion for Anomalous Event Detection
We present a novel unsupervised deep learning framework for anomalous event
detection in complex video scenes. While most existing works merely use
hand-crafted appearance and motion features, we propose Appearance and Motion
DeepNet (AMDN) which utilizes deep neural networks to automatically learn
feature representations. To exploit the complementary information of both
appearance and motion patterns, we introduce a novel double fusion framework,
combining both the benefits of traditional early fusion and late fusion
strategies. Specifically, stacked denoising autoencoders are proposed to
separately learn both appearance and motion features as well as a joint
representation (early fusion). Based on the learned representations, multiple
one-class SVM models are used to predict the anomaly scores of each input,
which are then integrated with a late fusion strategy for final anomaly
detection. We evaluate the proposed method on two publicly available video
surveillance datasets, showing competitive performance with respect to state of
the art approaches.Comment: Oral paper in BMVC 201
A computer vision approach to classification of birds in flight from video sequences
Bird populations are an important bio-indicator; so collecting reliable data is useful for ecologists helping conserve and manage fragile ecosystems. However, existing manual monitoring methods are labour-intensive, time-consuming, and error-prone. The aim of our work is to develop a reliable system, capable of automatically classifying individual bird species in flight from videos. This is challenging, but appropriate for use in the field, since there is often a requirement to identify in flight, rather than when stationary. We present our work in progress, which uses combined appearance and motion features to classify and present experimental results across seven species using Normal Bayes classifier with majority voting and achieving a classification rate of 86%
- …