Efficient Pedestrian Detection in Urban Traffic Scenes
Pedestrians are important participants in urban traffic environments and thus constitute a key category of objects for autonomous cars. Automatic pedestrian detection is an essential task for protecting pedestrians from collisions. In this thesis, we investigate and develop novel approaches that interpret the spatial and temporal characteristics of pedestrians in three different aspects: shape, cognition and motion. The distinctive upright human body shape, especially the geometry of the head and shoulder area, is the characteristic that best discriminates pedestrians from other object categories. Inspired by the success of Haar-like features in detecting human faces, which also exhibit a uniform shape structure, we propose to design Haar-like features specific to pedestrians. Tailored to a pre-defined statistical pedestrian shape model, Haar-like templates with multiple modalities are designed to describe local differences in the shape structure. Cognition theories aim to explain how the human visual system processes visual input both accurately and quickly. By emulating the center-surround mechanism of the human visual system, we design multi-channel, multi-direction and multi-scale contrast features, and boost them to respond to the appearance of pedestrians. In this way, our detector can be regarded as a top-down saliency system. In the last part of this thesis, we exploit the temporal characteristics of moving pedestrians and employ motion information both for feature design and for region-of-interest (ROI) selection. Motion segmentation on optical flow fields enables us to select the blobs most likely to contain moving pedestrians; a combination of Histogram of Oriented Gradients (HOG) and motion self-difference features further enables robust detection. We test our three approaches on image and video data captured in urban traffic scenes, which are rather challenging due to dynamic and complex backgrounds.
The achieved results demonstrate that our approaches reach and surpass state-of-the-art performance, and can also be employed in other applications, such as indoor robotics or public surveillance.
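Haar-like features of the kind described above are attractive because, given an integral image, any rectangular sum can be evaluated in constant time. The sketch below is a minimal illustration of that mechanism in plain NumPy; the template coordinates are hypothetical, not the thesis's actual head-shoulder templates.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns, zero-padded on top/left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of pixels in the rectangle [r0, r1) x [c0, c1) in O(1)."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_two_rect(ii, r0, c0, h, w):
    """Two-rectangle Haar-like feature: left half minus right half.
    A head-shoulder template would combine several such responses;
    these coordinates are illustrative only."""
    half = w // 2
    left = box_sum(ii, r0, c0, r0 + h, c0 + half)
    right = box_sum(ii, r0, c0 + half, r0 + h, c0 + w)
    return left - right

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 0, 0, 4, 4))       # 120.0: sum of 0..15
print(haar_two_rect(ii, 0, 0, 4, 4))  # -16.0
```

Each additional feature costs only a handful of lookups, which is what makes boosted cascades of such features fast enough for sliding-window detection.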
Driver Behavior Analysis Based on Real On-Road Driving Data in the Design of Advanced Driving Assistance Systems
The number of vehicles on the roads increases every day. According to the National Highway Traffic Safety Administration (NHTSA), the overwhelming majority of serious crashes (over 94 percent) are caused by human error. The broad aim of this research is to develop a driver behavior model from real on-road data for use in the design of Advanced Driving Assistance Systems (ADASs). For several decades, these systems have been a focus of many researchers and vehicle manufacturers seeking to increase vehicle and road safety and to assist drivers in different driving situations. Some studies have concentrated on the driver as the main actor in most driving circumstances. The way a driver monitors the traffic environment partially indicates the driver's level of awareness. Our objective is a quantitative and qualitative analysis of driver behavior to identify the relationship between a driver's intentions and his or her actions. The RoadLAB project developed an instrumented vehicle equipped with an On-Board Diagnostics system (OBD-II), a stereo imaging system, and a non-contact eye-tracking system to record synchronized data on the driver's cephalo-ocular behavior, the vehicle itself, and the traffic environment. We analyze several behavioral features of drivers to uncover relevant relationships between driver behavior and the anticipation of the next maneuver, and to reach a better understanding of driver behavior during the act of driving. Moreover, we detect and classify road lanes in urban and suburban areas, as they provide contextual information. Our experimental results show that our proposed models reach an F1 score of 84% for driver maneuver prediction and an accuracy of 94% for lane type classification.
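As a reminder of what the reported metrics measure, the F1 score is the harmonic mean of precision and recall, while accuracy is the fraction of correct predictions. The sketch below computes both from confusion-matrix counts; the counts are illustrative, not the thesis's data.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts (not from the thesis):
print(round(f1_score(tp=84, fp=16, fn=16), 2))       # 0.84
print(round(accuracy(tp=47, tn=47, fp=3, fn=3), 2))  # 0.94
```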
Predictive Model of Driver's Eye Fixation for Maneuver Prediction in the Design of Advanced Driving Assistance Systems
Over the last few years, Advanced Driver Assistance Systems (ADAS) have been shown to significantly reduce the number of vehicle accidents. According to the National Highway Traffic Safety Administration (NHTSA), driver errors contribute to 94% of road collisions. This research aims to develop a predictive model of driver eye fixation by analyzing the driver's eye and head information (cephalo-ocular) for maneuver prediction in an Advanced Driving Assistance System (ADAS). Several ADASs have been developed to help drivers perform driving tasks in complex environments, and many studies have been conducted on improving automated systems. Some research has relied on the fact that the driver plays a crucial role in most driving scenarios, recognizing the driver as the central element in ADASs. The way in which a driver monitors the surrounding environment is at least partially descriptive of the driver's situation awareness. This thesis's primary goal is the quantitative and qualitative analysis of driver behavior to determine the relationship between driver intent and actions. The RoadLab initiative provided an instrumented vehicle equipped with an on-board diagnostic system, an eye-gaze tracker, and a stereo vision system for the extraction of relevant features from the driver, the vehicle, and the environment. Several driver behavioral features are investigated to determine whether there is a relevant relation between the driver's eye fixations and the prediction of driving maneuvers.
Deep reinforcement learning for multi-modal embodied navigation
This work focuses on an Outdoor Micro-Navigation (OMN) task in which the goal is to
navigate to a specified street address using multiple modalities including images, scene-text,
and GPS. This task is a significant challenge to many Blind and Visually Impaired (BVI)
people, which we demonstrate through interviews and market research. To investigate the
feasibility of solving this task with Deep Reinforcement Learning (DRL), we first introduce
two partially observable grid-worlds, Grid-Street and Grid City, containing houses, street
numbers, and navigable regions. In these environments, we train an agent to find specific
houses using local observations under a variety of training procedures. We parameterize
our agent with a neural network and train using reinforcement learning methods. Next, we
introduce the Sidewalk Environment for Visual Navigation (SEVN), which contains panoramic
images with labels for house numbers, doors, and street name signs, and formulations for
several navigation tasks. In SEVN, we train another neural network model using Proximal
Policy Optimization (PPO) to fuse multi-modal observations in the form of variable resolution
images, visible text, and simulated GPS data, and to use this representation to navigate to
goal doors. Our best model used all available modalities and was able to navigate to over 100
goals with an 85% success rate. We found that models with access to only a subset of these
modalities performed significantly worse, supporting the need for a multi-modal approach to
the OMN task. We hope that this thesis provides a foundation for further research into the
creation of agents that assist members of the BVI community in navigating the world safely.
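The policy model above is trained with PPO, whose core is a clipped surrogate objective that discourages large policy updates. The sketch below computes that loss in plain NumPy under stated assumptions; it is a minimal illustration of the objective, not the SEVN training code.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio to
    [1 - eps, 1 + eps] keeps the new policy close to the old one.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# Two illustrative transitions: one with positive, one with
# negative advantage (values are made up for the example).
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.8, 0.4]))
adv = np.array([1.0, -1.0])
print(round(ppo_clip_loss(logp_new, logp_old, adv), 3))  # -0.2
```

In a real agent this loss is combined with a value-function loss and an entropy bonus, and the log-probabilities come from the multimodal policy network.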
A PhD Dissertation on Road Topology Classification for Autonomous Driving
Road topology classification is a crucial point if we want to develop complete and safe
autonomous driving systems. It is logical to think that a thorough understanding of
the environment surrounding the ego-vehicle, as it happens when a human being is a
decision-maker at the wheel, is an indispensable condition if we want to advance in the
achievement of level 4 or 5 autonomous vehicles. If the driver, either an autonomous
system or a human being, does not have access to the information of the environment,
the decrease in safety is critical, and the accident is almost instantaneous, i.e., when a
driver falls asleep at the wheel.
Throughout this doctoral thesis, we present two deep learning systems that help an
autonomous driving system understand its environment at each instant.
The first one, 3D-Deep and its optimization 3D-Deepest, is a new network architecture
for semantic road segmentation in which data sources of different types are integrated.
Road segmentation is vital in an autonomous vehicle since it is the medium on which
it should drive in 99.9% of cases. The second is an urban intersection classification
system using different approaches based on metric learning, temporal integration, and
synthetic image generation. Safety is a crucial point in any autonomous system, and
even more so in a driving system. Intersections are among the places within cities where
safety is most critical: vehicles follow intersecting trajectories and can therefore
collide, and pedestrians cross the road at most intersections regardless of whether
crosswalks exist, which alarmingly increases the risk of collisions and of pedestrians being struck.
The combination of both systems substantially improves the understanding of the
environment and can be considered to increase safety, paving the way in the research
towards a fully autonomous vehicle.
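The metric-learning approach mentioned for intersection classification typically trains an embedding in which images of the same intersection type lie close together and different types lie far apart. A common formulation is the triplet margin loss, sketched below in plain NumPy; the embeddings and margin are illustrative, not the thesis's model.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: pull same-class embeddings together,
    push different-class embeddings at least `margin` apart."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])           # anchor embedding
p = np.array([0.5, 0.0])           # same intersection type
n = np.array([3.0, 0.0])           # different intersection type
print(triplet_loss(a, p, n))                       # 0.0: already separated
print(triplet_loss(a, p, np.array([1.0, 0.0])))    # 0.5: margin violated
```

At inference time, an unseen intersection image is classified by comparing its embedding to labeled reference embeddings (e.g., by nearest neighbor).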
A monocular color vision system for road intersection detection
Urban driving has become a focus of autonomous robotics in recent years. Many groups seek to benefit from research in this field, including the military, which hopes to deploy autonomous rescue forces to battle-torn cities, and consumers, who will benefit from the safety and convenience of new technologies finding purpose in consumer automobiles. One key aspect of autonomous urban driving is localization: the ability of the robot to determine its position on a road network. Any information that can be obtained about the surrounding area, including stop signs, road lines, and intersecting roads, can aid this localization. The work here combines previously established computer vision methods for identifying roads and develops a new method that can identify both the road and any intersecting roads present in front of a vehicle using a single color camera. Computer vision systems rely on a few basic methods to understand and identify what they are looking at. Two valuable methods are the detection of edges present in an image and the analysis of the colors that compose it. The method described here uses edge information to find road lines and color information to find the road area and any similarly colored intersecting roads. This work demonstrates that combining edge detection and color analysis exploits their respective strengths, compensates for their weaknesses, and yields a method that can detect road lanes and intersecting roads fast enough for use in autonomous urban driving.
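The two ingredients named above, edge responses for road lines and color similarity for the road surface, can be sketched in a few lines of NumPy. The Sobel kernel and the reference-color threshold below are standard techniques but the values are assumptions for illustration, not the paper's calibrated parameters.

```python
import numpy as np

def sobel_edges(gray):
    """Horizontal-gradient Sobel response via explicit convolution."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(gray[i:i + 3, j:j + 3] * kx)
    return np.abs(out)

def road_color_mask(rgb, ref, tol=30.0):
    """Pixels within `tol` (Euclidean RGB distance) of a reference road
    color; in practice the reference is sampled in front of the vehicle."""
    return np.linalg.norm(rgb - ref, axis=-1) < tol

# Toy frame: dark-gray "road" on the left, bright "marking" on the right.
rgb = np.zeros((4, 4, 3)) + 100.0
rgb[:, 2:] = 250.0
gray = rgb.mean(axis=-1)
edges = sobel_edges(gray)
mask = road_color_mask(rgb, ref=np.array([100.0, 100.0, 100.0]))
print(edges.max() > 0)   # True: strong edge at the color boundary
print(int(mask.sum()))   # 8: the left half is road-colored
```

The combination works because the two cues fail differently: edges fire on painted lines but also on shadows, while color similarity tracks the asphalt surface but misses boundaries; intersecting each cue's evidence suppresses the other's false positives.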