Learning Birds-Eye View Representations for Autonomous Driving
Over the past few years, progress towards the ambitious goal of widespread fully-autonomous vehicles on our roads has accelerated dramatically. This progress has been spurred largely by the success of highly accurate LiDAR sensors, as well as the use of detailed high-resolution maps, which together allow a vehicle to navigate its surroundings effectively. Often, however, one or both of these resources may be unavailable, whether due to cost, sensor failure, or the need to operate in an unmapped environment. The aim of this thesis is therefore to demonstrate that it is possible to build detailed three-dimensional representations of traffic scenes using only 2D monocular camera images as input. Such an approach faces many challenges, most notably that 2D images do not provide explicit 3D structure. We overcome this limitation by applying a combination of deep learning and geometry to transform image-based features into an orthographic birds-eye view representation of the scene, allowing algorithms to reason in a metric, 3D space. This approach is applied to two challenging perception tasks central to autonomous driving.
The first part of this thesis addresses the problem of monocular 3D object detection: determining the size and location of all objects in the scene. Our solution was based on a novel convolutional network architecture that processed features in both the image and the birds-eye view perspective. Results on the KITTI dataset showed that this network outperformed existing works at the time, and although more recent works have improved on these results, extensive analysis found that our solution performed well in many difficult edge-case scenarios, such as objects very close to or far from the camera.
In the second part of the thesis, we consider the related problem of semantic map prediction. This consists of estimating a birds-eye view map of the world visible from a given camera, encoding both static elements of the scene such as pavement and road layout, as well as dynamic objects such as vehicles and pedestrians. This was accomplished using a second network that built on the experience from the previous work and achieved convincing performance on two real-world driving datasets. By formulating the maps as an occupancy grid map (a widely used representation from robotics), we were able to demonstrate how predictions could be accumulated across multiple frames, and that doing so further improved the robustness of maps produced by our system.
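The multi-frame accumulation of occupancy grid maps described above can be illustrated with the standard log-odds occupancy update from robotics. This is a generic sketch of the technique, not the thesis's exact formulation; the function names and probabilities are invented for illustration:

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def fuse_occupancy(frames, prior=0.5):
    """Fuse per-frame occupancy probability maps (H x W arrays in [0, 1])
    into one grid by summing log-odds, the standard independent-evidence
    update used for occupancy grid maps."""
    acc = np.full(frames[0].shape, logit(prior))
    for p in frames:
        p = np.clip(p, 1e-6, 1 - 1e-6)   # avoid infinite log-odds
        acc += logit(p) - logit(prior)   # add each frame's evidence
    return 1.0 / (1.0 + np.exp(-acc))    # back to probability

# Two noisy frames that agree a cell is occupied sharpen the estimate.
frames = [np.array([[0.7]]), np.array([[0.8]])]
fused = fuse_occupancy(frames)
```

Because evidence adds in log-odds space, frames that agree reinforce each other (here 0.7 and 0.8 fuse to about 0.90), which is one way accumulation can make the maps more robust than any single-frame prediction.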
Reliable localization methods for intelligent vehicles based on environment perception
International Mention in the doctoral degree.
In the recent past, autonomous vehicles and Intelligent Transport
Systems (ITS) were seen as a potential future of transportation. Today, thanks to the
technological advances in recent years, the feasibility of such systems is no longer a
question. Some of these autonomous driving technologies are already sharing our
roads, and commercial vehicles include more and more Advanced Driver-Assistance
Systems (ADAS) each year. As a result, transportation is becoming more efficient
and the roads are considerably safer.
One of the fundamental pillars of an autonomous system is self-localization. An
accurate and reliable estimation of the vehicle’s pose in the world is essential to
navigation. Within the context of outdoor vehicles, the Global Navigation Satellite
System (GNSS) is the predominant localization system. However, these systems are
far from perfect, and their performance is degraded in environments with limited
satellite visibility. Moreover, their dependence on the environment can make them
unreliable if that environment changes.
Accordingly, the goal of this thesis is to exploit the perception of the environment
to enhance localization systems in intelligent vehicles, with special attention to
their reliability. To this end, this thesis presents several contributions: First, a study
on exploiting 3D semantic information in LiDAR odometry is presented, providing
interesting insights regarding the contribution to the odometry output of each type
of element in the scene. The experimental results have been obtained using a public
dataset and validated on a real-world platform. Second, a method to estimate the
localization error using landmark detections is proposed, which is later exploited
by a landmark-placement optimization algorithm. This method, which has been
validated in a simulation environment, can determine a set of landmarks
such that the localization error never exceeds a predefined limit. Finally, a cooperative
localization algorithm based on a Genetic Particle Filter is proposed to utilize vehicle
detections in order to enhance the estimation provided by GNSS systems. Multiple
experiments are carried out in different simulation environments to validate the
proposed method.
Doctoral Programme in Electrical, Electronic and Automatic Engineering, Universidad Carlos III de Madrid.
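The cooperative-localization idea (weighting particles by both a GNSS fix and a detected range to a neighbouring vehicle, then applying genetic selection and mutation) can be sketched as below. All positions, noise models, and parameter values are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def genetic_pf_step(particles, gnss, neighbor_pos, detected_range,
                    gnss_sigma=2.0, range_sigma=0.5, mutation_sigma=0.2):
    """One update of a genetic particle filter (illustrative sketch):
    weight particles by GNSS and vehicle-detection likelihoods, then
    'select' by resampling and 'mutate' with small Gaussian noise."""
    # Likelihood of each particle under the GNSS position fix.
    w_gnss = np.exp(-np.sum((particles - gnss) ** 2, axis=1)
                    / (2 * gnss_sigma ** 2))
    # Likelihood under the measured range to a detected neighbour vehicle.
    ranges = np.linalg.norm(particles - neighbor_pos, axis=1)
    w_det = np.exp(-((ranges - detected_range) ** 2) / (2 * range_sigma ** 2))
    w = w_gnss * w_det
    w /= w.sum()
    # Selection: multinomial resampling; mutation: jitter the survivors.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx] + rng.normal(0, mutation_sigma, particles.shape)

# GNSS says roughly (2, 1); a neighbour at (6, 1) is detected at range 4.
particles = rng.uniform(-10, 10, size=(2000, 2))
out = genetic_pf_step(particles, gnss=np.array([2.0, 1.0]),
                      neighbor_pos=np.array([6.0, 1.0]), detected_range=4.0)
```

The vehicle detection constrains the estimate to a circle around the neighbour, so combining it with the (coarser) GNSS likelihood concentrates the particles far more tightly than GNSS alone would.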
Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning
Camera-based end-to-end driving neural networks bring the promise of a
low-cost system that maps camera images to driving control commands. These
networks are appealing because they replace laborious hand engineered building
blocks, but their black-box nature makes them difficult to diagnose in case of
failure. Recent works have shown the importance of using an explicit
intermediate representation that has the benefits of increasing both the
interpretability and the accuracy of networks' decisions. Nonetheless, these
camera-based networks reason in camera view where scale is not homogeneous and
hence not directly suitable for motion forecasting. In this paper, we introduce
a novel monocular camera-only holistic end-to-end trajectory planning network
with a Bird-Eye-View (BEV) intermediate representation that comes in the form
of binary Occupancy Grid Maps (OGMs). To ease the prediction of OGMs in BEV
from camera images, we introduce a novel scheme where the OGMs are first
predicted as semantic masks in camera view and then warped into BEV using the
homography between the two planes. The key element allowing this transformation
to be applied to 3D objects such as vehicles is to predict solely
their footprint in camera view, thereby respecting the flat-world hypothesis
implied by the homography.
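The footprint-warping scheme can be sketched in a few lines: occupied pixels of a camera-view footprint mask are pushed through the ground-plane homography into a BEV grid. This is a minimal illustration of the flat-world assumption, not the authors' code, and the toy homography below is invented:

```python
import numpy as np

def warp_footprint_to_bev(mask, H, bev_shape):
    """Warp a binary footprint mask from camera view into a BEV grid by
    mapping each occupied pixel through the homography H (image -> BEV).
    Valid only for points on the ground plane -- the flat-world
    assumption that motivates predicting footprints rather than full
    vehicle extents."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=0).astype(float)
    warped = H @ pts
    warped /= warped[2]                        # perspective divide
    u = np.round(warped[0]).astype(int)
    v = np.round(warped[1]).astype(int)
    bev = np.zeros(bev_shape, dtype=np.uint8)
    keep = (0 <= u) & (u < bev_shape[1]) & (0 <= v) & (v < bev_shape[0])
    bev[v[keep], u[keep]] = 1
    return bev

# Toy example: identity homography plus a translation of (+2, +1) cells.
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
mask = np.zeros((5, 5), dtype=np.uint8)
mask[2, 1] = 1
bev = warp_footprint_to_bev(mask, H, (6, 6))
```

A real pipeline would use a dense warp (e.g. inverse mapping over the BEV grid) rather than scattering pixels, but the geometry is the same: only ground-plane points map consistently, which is why footprints are predicted instead of full vehicle masks.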
Road terrain type classification based on laser measurement system data
For road vehicles, knowledge of terrain type is useful for improving passenger safety and comfort. Conventional methods are susceptible to vehicle speed variations; in this paper we present a method that uses Laser Measurement System (LMS) data for speed-independent road type classification. Experiments were carried out with an instrumented road vehicle (CRUISE) by manually driving on a variety of road terrain types, namely asphalt, concrete, grass, and gravel, at different speeds. A downward-looking LMS is used to capture the terrain data. The range data captures the structural differences, while the remission values are used to observe anomalies in surface reflectance properties. Both measurements are combined in a Support Vector Machine classifier to achieve an average accuracy of 95% across the different road types.
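A hedged sketch of the classification stage: a Support Vector Machine trained on per-scan features derived from range (structure) and remission (reflectance). The feature definitions and class means below are synthetic stand-ins invented for illustration, not the paper's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for LMS features per scan:
# [range roughness, mean remission]. The class means are invented.
means = {"asphalt": [0.02, 0.8], "concrete": [0.03, 0.6],
         "grass":   [0.30, 0.3], "gravel":   [0.15, 0.4]}
X, y = [], []
for label, mu in means.items():
    X.append(rng.normal(mu, [0.01, 0.05], size=(100, 2)))
    y += [label] * 100
X = np.vstack(X)

Xtr, Xte, ytr, yte = train_test_split(X, np.array(y), test_size=0.25,
                                      random_state=0)
clf = SVC(kernel="rbf", gamma="scale").fit(Xtr, ytr)
acc = clf.score(Xte, yte)
```

Combining a structural feature with a reflectance feature is what separates visually similar pairs here (asphalt vs. concrete differ mainly in remission, grass vs. gravel mainly in roughness), mirroring the paper's rationale for fusing both LMS channels.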
Semantic Mapping of Road Scenes
The problem of understanding road scenes has been at the forefront of the computer vision community
for the last couple of years. Such understanding enables autonomous systems to navigate
the surroundings in which they operate. It involves reconstructing the scene and estimating the objects
present in it, such as ‘vehicles’, ‘road’, ‘pavements’ and ‘buildings’. This thesis focuses on these
aspects and proposes solutions to address them.
First, we propose a solution to generate a dense semantic map from multiple street-level images.
This map can be imagined as the bird’s eye view of the region, with associated semantic labels for
tens of kilometres of street-level data. We generate the overhead semantic view from street-level
images. This is in contrast to existing approaches that classify urban regions from satellite/overhead
imagery, and it allows us to produce a detailed semantic map for a large-scale urban area. Then
we describe a method to perform large scale dense 3D reconstruction of road scenes with associated
semantic labels. Our method fuses depth maps, generated from stereo pairs across time, into a
global 3D volume in an online fashion, in order to accommodate arbitrarily long image
sequences. The object class labels estimated from the street-level stereo image sequence are used to
annotate the reconstructed volume. Then we exploit the scene structure in object class labelling by
performing inference over the meshed representation of the scene. Labelling over the
mesh solves two issues. Firstly, images often contain redundant information, with multiple images
describing the same scene; labelling these images separately is slow, whereas our method is approximately
an order of magnitude faster in the inference stage than standard inference in the image domain.
Secondly, multiple images of the same scene often result in inconsistent
labelling; by solving a single mesh, we remove this inconsistency across the images.
Our mesh-based labelling also takes into account the object layout in the scene, which is often
ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform
labelling and structure computation through a hierarchical robust P^N Markov Random Field
defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and
the object-class labels in a principled manner, through bounded approximate minimisation of a
well-defined and well-studied energy functional. In this thesis, we also introduce two object-labelled datasets
created from real world data. The 15 kilometre Yotta Labelled dataset consists of 8,000 images per
camera view of the roadways of the United Kingdom with a subset of them annotated with object
class labels and the second dataset is comprised of ground truth object labels for the publicly available
KITTI dataset. Both datasets are publicly available, and we hope they will be helpful to the vision
research community.
Autonomous navigation for guide following in crowded indoor environments
The requirements for assisted living are rapidly changing as the number of elderly
patients over the age of 60 continues to increase. This rise places a high level of stress on
nurse practitioners, who must care for more patients than they are able to. As this trend is
expected to continue, new technology will be required to help care for patients. Mobile
robots present an opportunity to help alleviate the stress on nurse practitioners by
monitoring and performing remedial tasks for elderly patients. In order to produce
mobile robots with the ability to perform these tasks, however, many challenges must be
overcome.
The hospital environment requires a high level of safety to prevent patient injury. Any
facility that uses mobile robots, therefore, must be able to ensure that no harm will come
to patients whilst in a care environment. This requires the robot to build a high level of
understanding of the environment and of the people in close proximity to the robot.
Hitherto, most mobile robots have used vision-based sensors or 2D laser range finders.
3D time-of-flight sensors have recently been introduced and provide dense 3D point
clouds of the environment at real-time frame rates. This provides mobile robots with
previously unavailable dense information in real-time. I investigate the use of time-of-flight
cameras for mobile robot navigation in crowded environments in this thesis. A
unified framework to allow the robot to follow a guide through an indoor environment
safely and efficiently is presented. Each component of the framework is analyzed in
detail, with real-world scenarios illustrating its practical use.
Time-of-flight cameras are relatively new sensors and, therefore, have inherent problems
that must be overcome to obtain consistent and accurate data. In this thesis, I propose a novel and
practical probabilistic framework to overcome many of these inherent problems.
The framework fuses multiple depth maps with color information, forming a
reliable and consistent view of the world. In order for the robot to interact with the
environment, contextual information is required. To this end, I propose a region-growing
segmentation algorithm to group points based on surface characteristics: surface normal
and surface curvature. The segmentation process creates a distinct set of surfaces;
however, only a limited amount of contextual information is available to allow for
interaction. Therefore, a novel classifier is proposed using spherical harmonics to
differentiate people from all other objects.
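The region-growing idea (grouping neighbouring points whose surface normals agree, letting low-curvature points continue the growth) can be sketched as follows. This is a generic illustration of the technique, not the thesis implementation; all thresholds are invented:

```python
import numpy as np
from collections import deque

def region_grow(points, normals, curvature, radius=0.15,
                angle_thresh=np.deg2rad(10), curv_thresh=0.05):
    """Group points whose neighbours have similar surface normals, seeding
    each region at the point of lowest curvature (flattest first)."""
    n = len(points)
    labels = np.full(n, -1)
    region = 0
    for seed in np.argsort(curvature):       # flattest points seed first
        if labels[seed] != -1:
            continue
        labels[seed] = region
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            # Unlabelled neighbours within `radius` of point i.
            d = np.linalg.norm(points - points[i], axis=1)
            for j in np.nonzero((d < radius) & (labels == -1))[0]:
                if np.dot(normals[i], normals[j]) > np.cos(angle_thresh):
                    labels[j] = region
                    if curvature[j] < curv_thresh:  # smooth points keep growing
                        queue.append(j)
        region += 1
    return labels

# Two flat, well-separated patches should form two distinct regions.
xy = np.array([[x * 0.1, 0.0] for x in range(5)])
pts = np.vstack([np.c_[xy, np.zeros(5)],        # patch at z = 0
                 np.c_[xy, np.full(5, 1.0)]])   # patch at z = 1
nrm = np.vstack([np.tile([0, 0, 1.0], (5, 1)),
                 np.tile([0, 1.0, 0], (5, 1))])
curv = np.zeros(10)
labels = region_grow(pts, nrm, curv)
```

Production implementations would use a spatial index (k-d tree) for the neighbour query instead of the brute-force distance computation shown here.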
The added ability to identify people allows the robot to find potential candidates to
follow. However, for safe navigation, the robot must continuously track all visible
objects to obtain positional and velocity information. A multi-object tracking system is
investigated to track visible objects reliably using multiple cues: shape and color. The
tracking system allows the robot to react to the dynamic nature of people by building an
estimate of the motion flow. This flow provides the robot with the necessary information
to determine where and at what speeds it is safe to drive. In addition, a novel search
strategy is proposed to allow the robot to recover a guide who has left the field-of-view.
To achieve this, a search map is constructed with areas of the environment ranked
according to how likely they are to reveal the guide’s true location. Then, the robot can
approach the most likely search area to recover the guide. Finally, all components
presented are joined to follow a guide through an indoor environment. The results
achieved demonstrate the efficacy of the proposed components.
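The search-map strategy, ranking areas of the environment by how likely they are to reveal the lost guide, can be sketched with a simple scoring rule. The scoring terms and weights below are illustrative assumptions, not the thesis's model:

```python
import numpy as np

def best_search_area(areas, last_seen, motion_dir, w_dist=1.0, w_heading=2.0):
    """Rank candidate search areas for recovering a lost guide: areas near
    the last observed position and along the guide's last motion direction
    score highest. Returns the index of the best-ranked area."""
    areas = np.asarray(areas, dtype=float)
    offsets = areas - last_seen
    dists = np.linalg.norm(offsets, axis=1)
    # Cosine alignment between each offset and the guide's last heading.
    headings = offsets @ motion_dir / np.maximum(dists, 1e-9)
    scores = w_heading * headings - w_dist * dists
    return int(np.argmax(scores))

# Guide last seen at the origin, moving in +x: the area ahead of the
# guide wins over a nearer area behind it.
areas = [(-1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
idx = best_search_area(areas, last_seen=np.array([0.0, 0.0]),
                       motion_dir=np.array([1.0, 0.0]))
```

The robot would then navigate to the top-ranked area and re-rank as areas are cleared, which matches the recovery loop described above.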
Perception of the urban environment and navigation using robotic vision: design and implementation applied to an autonomous vehicle
Advisors: Janito Vaqueiro Ferreira, Alessandro Corrêa Victorino. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Mecânica.
Abstract: The development of autonomous vehicles capable of driving on urban roads can provide important benefits: reducing accidents, increasing quality of life, and cutting costs. Intelligent vehicles often base their decisions on observations obtained from various sensors such as LiDAR, GPS, and cameras. Camera sensors currently receive particular attention because they are cheap, easy to deploy, and provide rich information. Inner-city environments represent an interesting but very challenging scenario in this context: the road layout may be complex; objects such as trees, bicycles, and cars can generate partial observations; and observations are often noisy or missing entirely due to heavy occlusions. The perception process therefore needs, by nature, to deal with uncertainty in the knowledge of the world around the car. While highway navigation and autonomous driving using prior knowledge of the environment have been demonstrated successfully, understanding and navigating general inner-city scenarios with little prior knowledge remains an unsolved problem. In this thesis, this perception problem is analysed for driving in inner-city environments, together with the capacity to perform safe displacement based on decision-making in autonomous navigation. A perception system is designed that allows robotic cars to drive autonomously on roads, without the need to adapt the infrastructure, without requiring prior knowledge of the environment, and accounting for the presence of dynamic objects such as cars. A novel method based on machine learning is proposed to extract the semantic context from a pair of stereo images; this context is merged into an evidential occupancy grid that models the uncertainties of an unknown urban environment using Dempster-Shafer theory. For decision-making in path planning, the virtual-tentacle approach is applied to generate possible paths starting from the ego-vehicle reference frame, and on this basis two new strategies are proposed: first, a strategy to select the correct path so as to better avoid obstacles and follow the local task in the context of hybrid navigation; and second, a closed-loop control based on visual odometry and virtual tentacles for path-following execution. Finally, a complete automotive system integrating the perception, path-planning, and control modules is implemented and experimentally validated in real conditions on an experimental autonomous car; the results show that the developed approach successfully performs safe local navigation based on camera sensors.
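The evidential grid's core operation, combining two sources' belief masses with Dempster's rule over the frame {Free, Occupied}, can be sketched per cell as below. The mass values are illustrative:

```python
def dempster_combine(m1, m2):
    """Combine two mass functions over the frame {Free, Occupied} with
    Dempster's rule. Each mass is a dict with keys 'F', 'O', and 'FO'
    (the ignorance mass assigned to the whole frame)."""
    # Conflict: one source says Free while the other says Occupied.
    K = m1["F"] * m2["O"] + m1["O"] * m2["F"]
    if K >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    norm = 1.0 - K
    return {
        "F":  (m1["F"] * m2["F"] + m1["F"] * m2["FO"] + m1["FO"] * m2["F"]) / norm,
        "O":  (m1["O"] * m2["O"] + m1["O"] * m2["FO"] + m1["FO"] * m2["O"]) / norm,
        "FO": (m1["FO"] * m2["FO"]) / norm,
    }

# Two sources that both lean towards 'Occupied' reinforce each other,
# while their shared ignorance mass shrinks.
cell = dempster_combine({"F": 0.1, "O": 0.6, "FO": 0.3},
                        {"F": 0.2, "O": 0.5, "FO": 0.3})
```

Unlike a Bayesian occupancy grid, the explicit ignorance mass `FO` lets the grid distinguish "unobserved" from "conflicting evidence", which is the usual motivation for the evidential formulation.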
Survey on video anomaly detection in dynamic scenes with moving cameras
The increasing popularity of compact and inexpensive cameras, e.g. dash
cameras, body cameras, and cameras equipped on robots, has sparked a growing
interest in detecting anomalies within dynamic scenes recorded by moving
cameras. However, existing reviews primarily concentrate on Video Anomaly
Detection (VAD) methods assuming static cameras. The VAD literature with moving
cameras remains fragmented, lacking comprehensive reviews to date. To address
this gap, we endeavor to present the first comprehensive survey on Moving
Camera Video Anomaly Detection (MC-VAD). We delve into the research papers
related to MC-VAD, critically assessing their limitations and highlighting
associated challenges. Our exploration encompasses three application domains:
security, urban transportation, and marine environments, which in turn cover
six specific tasks. We compile an extensive list of 25 publicly-available
datasets spanning four distinct environments: underwater, water surface,
ground, and aerial. We summarize the types of anomalies these datasets
correspond to or contain, and present five main categories of approaches for
detecting such anomalies. Lastly, we identify future research directions and
discuss novel contributions that could advance the field of MC-VAD. With this
survey, we aim to offer a valuable reference for researchers and practitioners
striving to develop and advance state-of-the-art MC-VAD methods.