Learning Birds-Eye View Representations for Autonomous Driving
Over the past few years, progress towards the ambitious goal of widespread fully-autonomous vehicles on our roads has accelerated dramatically. This progress has been spurred largely by the success of highly accurate LiDAR sensors, as well as the use of detailed high-resolution maps, which together allow a vehicle to navigate its surroundings effectively. Often, however, one or both of these resources may be unavailable, whether due to cost, sensor failure, or the need to operate in an unmapped environment. The aim of this thesis is therefore to demonstrate that it is possible to build detailed three-dimensional representations of traffic scenes using only 2D monocular camera images as input. Such an approach faces many challenges, most notably that 2D images do not provide explicit 3D structure. We overcome this limitation by applying a combination of deep learning and geometry to transform image-based features into an orthographic birds-eye view representation of the scene, allowing algorithms to reason in a metric, 3D space. This approach is applied to two challenging perception tasks central to autonomous driving.
The first part of this thesis addresses the problem of monocular 3D object detection: determining the size and location of all objects in the scene. Our solution was based on a novel convolutional network architecture that processed features in both the image and the birds-eye view perspective. Results on the KITTI dataset showed that this network outperformed existing works at the time, and although more recent works have improved on these results, extensive analysis found that our solution performed well in many difficult edge-case scenarios, such as objects very close to or far from the camera.
In the second part of the thesis, we consider the related problem of semantic map prediction. This consists of estimating a birds-eye view map of the world visible from a given camera, encoding both static elements of the scene such as pavement and road layout, as well as dynamic objects such as vehicles and pedestrians. This was accomplished using a second network that built on the experience from the previous work and achieved convincing performance on two real-world driving datasets. By formulating the maps as an occupancy grid map (a widely used representation from robotics), we were able to demonstrate how predictions could be accumulated across multiple frames, and that doing so further improved the robustness of maps produced by our system.
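The multi-frame accumulation of occupancy grid maps described above can be illustrated with the standard log-odds occupancy update from robotics. This is a generic sketch of the technique, not the thesis's exact formulation; the function names and probabilities are invented for illustration:

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def fuse_occupancy(frames, prior=0.5):
    """Fuse per-frame occupancy probability maps (H x W arrays in [0, 1])
    into one grid by summing log-odds, the standard independent-evidence
    update used for occupancy grid maps."""
    acc = np.full(frames[0].shape, logit(prior))
    for p in frames:
        p = np.clip(p, 1e-6, 1 - 1e-6)   # avoid infinite log-odds
        acc += logit(p) - logit(prior)   # add each frame's evidence
    return 1.0 / (1.0 + np.exp(-acc))    # back to probability

# Two noisy frames that agree a cell is occupied sharpen the estimate.
frames = [np.array([[0.7]]), np.array([[0.8]])]
fused = fuse_occupancy(frames)
```

Because evidence adds in log-odds space, frames that agree reinforce each other (here 0.7 and 0.8 fuse to about 0.90), which is one way accumulation can make the maps more robust than any single-frame prediction.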
Reliable localization methods for intelligent vehicles based on environment perception
International Mention in the doctoral degree.
In the recent past, autonomous vehicles and Intelligent Transport
Systems (ITS) were seen as a potential future of transportation. Today, thanks to the
technological advances in recent years, the feasibility of such systems is no longer a
question. Some of these autonomous driving technologies are already sharing our
roads, and commercial vehicles include more and more Advanced Driver-Assistance
Systems (ADAS) each year. As a result, transportation is becoming more efficient
and the roads are considerably safer.
One of the fundamental pillars of an autonomous system is self-localization. An
accurate and reliable estimation of the vehicle’s pose in the world is essential to
navigation. Within the context of outdoor vehicles, the Global Navigation Satellite
System (GNSS) is the predominant localization system. However, these systems are
far from perfect, and their performance is degraded in environments with limited
satellite visibility. Moreover, their dependence on the environment can make them
unreliable if that environment changes.
Accordingly, the goal of this thesis is to exploit the perception of the environment
to enhance localization systems in intelligent vehicles, with special attention to
their reliability. To this end, this thesis presents several contributions: First, a study
on exploiting 3D semantic information in LiDAR odometry is presented, providing
interesting insights regarding the contribution to the odometry output of each type
of element in the scene. The experimental results have been obtained using a public
dataset and validated on a real-world platform. Second, a method to estimate the
localization error using landmark detections is proposed, which is later exploited
by a landmark-placement optimization algorithm. This method, which has been
validated in a simulation environment, can determine a set of landmarks
such that the localization error never exceeds a predefined limit. Finally, a cooperative
localization algorithm based on a Genetic Particle Filter is proposed to utilize vehicle
detections in order to enhance the estimation provided by GNSS systems. Multiple
experiments are carried out in different simulation environments to validate the
proposed method.
Doctoral Programme in Electrical, Electronic and Automatic Engineering, Universidad Carlos III de Madrid.
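The cooperative-localization idea (weighting particles by both a GNSS fix and a detected range to a neighbouring vehicle, then applying genetic selection and mutation) can be sketched as below. All positions, noise models, and parameter values are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def genetic_pf_step(particles, gnss, neighbor_pos, detected_range,
                    gnss_sigma=2.0, range_sigma=0.5, mutation_sigma=0.2):
    """One update of a genetic particle filter (illustrative sketch):
    weight particles by GNSS and vehicle-detection likelihoods, then
    'select' by resampling and 'mutate' with small Gaussian noise."""
    # Likelihood of each particle under the GNSS position fix.
    w_gnss = np.exp(-np.sum((particles - gnss) ** 2, axis=1)
                    / (2 * gnss_sigma ** 2))
    # Likelihood under the measured range to a detected neighbour vehicle.
    ranges = np.linalg.norm(particles - neighbor_pos, axis=1)
    w_det = np.exp(-((ranges - detected_range) ** 2) / (2 * range_sigma ** 2))
    w = w_gnss * w_det
    w /= w.sum()
    # Selection: multinomial resampling; mutation: jitter the survivors.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx] + rng.normal(0, mutation_sigma, particles.shape)

# GNSS says roughly (2, 1); a neighbour at (6, 1) is detected at range 4.
particles = rng.uniform(-10, 10, size=(2000, 2))
out = genetic_pf_step(particles, gnss=np.array([2.0, 1.0]),
                      neighbor_pos=np.array([6.0, 1.0]), detected_range=4.0)
```

The vehicle detection constrains the estimate to a circle around the neighbour, so combining it with the (coarser) GNSS likelihood concentrates the particles far more tightly than GNSS alone would.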
Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning
Camera-based end-to-end driving neural networks bring the promise of a
low-cost system that maps camera images to driving control commands. These
networks are appealing because they replace laborious hand engineered building
blocks, but their black-box nature makes them difficult to diagnose in case of
failure. Recent works have shown the importance of using an explicit
intermediate representation that has the benefits of increasing both the
interpretability and the accuracy of networks' decisions. Nonetheless, these
camera-based networks reason in camera view where scale is not homogeneous and
hence not directly suitable for motion forecasting. In this paper, we introduce
a novel monocular camera-only holistic end-to-end trajectory planning network
with a Bird-Eye-View (BEV) intermediate representation that comes in the form
of binary Occupancy Grid Maps (OGMs). To ease the prediction of OGMs in BEV
from camera images, we introduce a novel scheme where the OGMs are first
predicted as semantic masks in camera view and then warped into BEV using the
homography between the two planes. The key element allowing this transformation
to be applied to 3D objects such as vehicles is to predict solely
their footprint in camera view, thereby respecting the flat-world hypothesis
implied by the homography.
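The footprint-warping scheme can be sketched in a few lines: occupied pixels of a camera-view footprint mask are pushed through the ground-plane homography into a BEV grid. This is a minimal illustration of the flat-world assumption, not the authors' code, and the toy homography below is invented:

```python
import numpy as np

def warp_footprint_to_bev(mask, H, bev_shape):
    """Warp a binary footprint mask from camera view into a BEV grid by
    mapping each occupied pixel through the homography H (image -> BEV).
    Valid only for points on the ground plane -- the flat-world
    assumption that motivates predicting footprints rather than full
    vehicle extents."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=0).astype(float)
    warped = H @ pts
    warped /= warped[2]                        # perspective divide
    u = np.round(warped[0]).astype(int)
    v = np.round(warped[1]).astype(int)
    bev = np.zeros(bev_shape, dtype=np.uint8)
    keep = (0 <= u) & (u < bev_shape[1]) & (0 <= v) & (v < bev_shape[0])
    bev[v[keep], u[keep]] = 1
    return bev

# Toy example: identity homography plus a translation of (+2, +1) cells.
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
mask = np.zeros((5, 5), dtype=np.uint8)
mask[2, 1] = 1
bev = warp_footprint_to_bev(mask, H, (6, 6))
```

A real pipeline would use a dense warp (e.g. inverse mapping over the BEV grid) rather than scattering pixels, but the geometry is the same: only ground-plane points map consistently, which is why footprints are predicted instead of full vehicle masks.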
Road terrain type classification based on laser measurement system data
For road vehicles, knowledge of terrain type is useful for improving passenger safety and comfort. Conventional methods are susceptible to vehicle speed variations; in this paper we present a method that uses Laser Measurement System (LMS) data for speed-independent road type classification. Experiments were carried out with an instrumented road vehicle (CRUISE) by manually driving on a variety of road terrain types, namely asphalt, concrete, grass, and gravel, at different speeds. A downward-looking LMS is used to capture the terrain data. The range data captures the structural differences, while the remission values are used to observe anomalies in surface reflectance properties. Both measurements are combined in a Support Vector Machine classifier to achieve an average accuracy of 95% across the different road types.
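A hedged sketch of the classification stage: a Support Vector Machine trained on per-scan features derived from range (structure) and remission (reflectance). The feature definitions and class means below are synthetic stand-ins invented for illustration, not the paper's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for LMS features per scan:
# [range roughness, mean remission]. The class means are invented.
means = {"asphalt": [0.02, 0.8], "concrete": [0.03, 0.6],
         "grass":   [0.30, 0.3], "gravel":   [0.15, 0.4]}
X, y = [], []
for label, mu in means.items():
    X.append(rng.normal(mu, [0.01, 0.05], size=(100, 2)))
    y += [label] * 100
X = np.vstack(X)

Xtr, Xte, ytr, yte = train_test_split(X, np.array(y), test_size=0.25,
                                      random_state=0)
clf = SVC(kernel="rbf", gamma="scale").fit(Xtr, ytr)
acc = clf.score(Xte, yte)
```

Combining a structural feature with a reflectance feature is what separates visually similar pairs here (asphalt vs. concrete differ mainly in remission, grass vs. gravel mainly in roughness), mirroring the paper's rationale for fusing both LMS channels.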
Semantic Mapping of Road Scenes
The problem of understanding road scenes has been at the forefront of the computer vision community
for the last couple of years. Such understanding enables autonomous systems to navigate
the surroundings in which they operate. It involves reconstructing the scene and estimating the objects
present in it, such as ‘vehicles’, ‘road’, ‘pavements’ and ‘buildings’. This thesis focuses on these
aspects and proposes solutions to address them.
First, we propose a solution to generate a dense semantic map from multiple street-level images.
This map can be imagined as the bird’s eye view of the region, with associated semantic labels for
tens of kilometres of street-level data. We generate the overhead semantic view from street-level
images. This is in contrast to existing approaches that classify urban regions from satellite/overhead
imagery, and it allows us to produce a detailed semantic map for a large-scale urban area. Then
we describe a method to perform large scale dense 3D reconstruction of road scenes with associated
semantic labels. Our method fuses depth maps, generated from stereo pairs across time, into a
global 3D volume in an online fashion, in order to accommodate arbitrarily long image
sequences. The object class labels estimated from the street-level stereo image sequence are used to
annotate the reconstructed volume. Then we exploit the scene structure in object class labelling by
performing inference over the meshed representation of the scene. Labelling over the
mesh solves two issues. Firstly, images often contain redundant information, with multiple images
describing the same scene; labelling these images separately is slow, whereas our method is approximately
an order of magnitude faster in the inference stage than standard inference in the image domain.
Secondly, multiple images of the same scene often result in inconsistent
labelling; by solving a single mesh, we remove this inconsistency across the images.
Our mesh-based labelling also takes into account the object layout in the scene, which is often
ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform
labelling and structure computation through a hierarchical robust P^N Markov Random Field
defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and
the object-class labels in a principled manner, through bounded approximate minimisation of a
well-defined and well-studied energy functional. In this thesis, we also introduce two object-labelled datasets
created from real world data. The 15 kilometre Yotta Labelled dataset consists of 8,000 images per
camera view of the roadways of the United Kingdom with a subset of them annotated with object
class labels and the second dataset is comprised of ground truth object labels for the publicly available
KITTI dataset. Both datasets are publicly available, and we hope they will be helpful to the vision
research community.
Autonomous navigation for guide following in crowded indoor environments
The requirements for assisted living are rapidly changing as the number of elderly
patients over the age of 60 continues to increase. This rise places a high level of stress on
nurse practitioners, who must care for more patients than they are able to. As this trend is
expected to continue, new technology will be required to help care for patients. Mobile
robots present an opportunity to help alleviate the stress on nurse practitioners by
monitoring and performing remedial tasks for elderly patients. In order to produce
mobile robots with the ability to perform these tasks, however, many challenges must be
overcome.
The hospital environment requires a high level of safety to prevent patient injury. Any
facility that uses mobile robots, therefore, must be able to ensure that no harm will come
to patients whilst in a care environment. This requires the robot to build a high level of
understanding of the environment and of the people in close proximity to the robot.
Hitherto, most mobile robots have used vision-based sensors or 2D laser range finders.
3D time-of-flight sensors have recently been introduced and provide dense 3D point
clouds of the environment at real-time frame rates. This provides mobile robots with
previously unavailable dense information in real-time. I investigate the use of time-of-flight
cameras for mobile robot navigation in crowded environments in this thesis. A
unified framework to allow the robot to follow a guide through an indoor environment
safely and efficiently is presented. Each component of the framework is analyzed in
detail, with real-world scenarios illustrating its practical use.
Time-of-flight cameras are relatively new sensors and, therefore, have inherent problems
that must be overcome to obtain consistent and accurate data. In this thesis, I propose a novel and
practical probabilistic framework to overcome many of these inherent problems.
The framework fuses multiple depth maps with color information, forming a
reliable and consistent view of the world. In order for the robot to interact with the
environment, contextual information is required. To this end, I propose a region-growing
segmentation algorithm to group points based on surface characteristics: surface normal
and surface curvature. The segmentation process creates a distinct set of surfaces;
however, only a limited amount of contextual information is available to allow for
interaction. Therefore, a novel classifier is proposed using spherical harmonics to
differentiate people from all other objects.
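The region-growing idea (grouping neighbouring points whose surface normals agree, letting low-curvature points continue the growth) can be sketched as follows. This is a generic illustration of the technique, not the thesis implementation; all thresholds are invented:

```python
import numpy as np
from collections import deque

def region_grow(points, normals, curvature, radius=0.15,
                angle_thresh=np.deg2rad(10), curv_thresh=0.05):
    """Group points whose neighbours have similar surface normals, seeding
    each region at the point of lowest curvature (flattest first)."""
    n = len(points)
    labels = np.full(n, -1)
    region = 0
    for seed in np.argsort(curvature):       # flattest points seed first
        if labels[seed] != -1:
            continue
        labels[seed] = region
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            # Unlabelled neighbours within `radius` of point i.
            d = np.linalg.norm(points - points[i], axis=1)
            for j in np.nonzero((d < radius) & (labels == -1))[0]:
                if np.dot(normals[i], normals[j]) > np.cos(angle_thresh):
                    labels[j] = region
                    if curvature[j] < curv_thresh:  # smooth points keep growing
                        queue.append(j)
        region += 1
    return labels

# Two flat, well-separated patches should form two distinct regions.
xy = np.array([[x * 0.1, 0.0] for x in range(5)])
pts = np.vstack([np.c_[xy, np.zeros(5)],        # patch at z = 0
                 np.c_[xy, np.full(5, 1.0)]])   # patch at z = 1
nrm = np.vstack([np.tile([0, 0, 1.0], (5, 1)),
                 np.tile([0, 1.0, 0], (5, 1))])
curv = np.zeros(10)
labels = region_grow(pts, nrm, curv)
```

Production implementations would use a spatial index (k-d tree) for the neighbour query instead of the brute-force distance computation shown here.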
The added ability to identify people allows the robot to find potential candidates to
follow. However, for safe navigation, the robot must continuously track all visible
objects to obtain positional and velocity information. A multi-object tracking system is
investigated to track visible objects reliably using multiple cues: shape and color. The
tracking system allows the robot to react to the dynamic nature of people by building an
estimate of the motion flow. This flow provides the robot with the necessary information
to determine where and at what speeds it is safe to drive. In addition, a novel search
strategy is proposed to allow the robot to recover a guide who has left the field-of-view.
To achieve this, a search map is constructed with areas of the environment ranked
according to how likely they are to reveal the guide’s true location. Then, the robot can
approach the most likely search area to recover the guide. Finally, all components
presented are joined to follow a guide through an indoor environment. The results
achieved demonstrate the efficacy of the proposed components.
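The search-map strategy, ranking areas of the environment by how likely they are to reveal the lost guide, can be sketched with a simple scoring rule. The scoring terms and weights below are illustrative assumptions, not the thesis's model:

```python
import numpy as np

def best_search_area(areas, last_seen, motion_dir, w_dist=1.0, w_heading=2.0):
    """Rank candidate search areas for recovering a lost guide: areas near
    the last observed position and along the guide's last motion direction
    score highest. Returns the index of the best-ranked area."""
    areas = np.asarray(areas, dtype=float)
    offsets = areas - last_seen
    dists = np.linalg.norm(offsets, axis=1)
    # Cosine alignment between each offset and the guide's last heading.
    headings = offsets @ motion_dir / np.maximum(dists, 1e-9)
    scores = w_heading * headings - w_dist * dists
    return int(np.argmax(scores))

# Guide last seen at the origin, moving in +x: the area ahead of the
# guide wins over a nearer area behind it.
areas = [(-1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
idx = best_search_area(areas, last_seen=np.array([0.0, 0.0]),
                       motion_dir=np.array([1.0, 0.0]))
```

The robot would then navigate to the top-ranked area and re-rank as areas are cleared, which matches the recovery loop described above.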
Perception of the urban environment and navigation using robotic vision: design and implementation applied to an autonomous vehicle
Advisors: Janito Vaqueiro Ferreira, Alessandro Corrêa Victorino. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Mecânica.
Abstract: The development of autonomous vehicles capable of driving on urban roads can provide important benefits: reducing accidents, increasing quality of life, and cutting costs. Intelligent vehicles often base their decisions on observations obtained from various sensors such as LiDAR, GPS, and cameras. Camera sensors currently receive particular attention because they are cheap, easy to deploy, and provide rich information. Inner-city environments represent an interesting but very challenging scenario in this context: the road layout may be complex; objects such as trees, bicycles, and cars can generate partial observations; and observations are often noisy or missing entirely due to heavy occlusions. The perception process therefore needs, by nature, to deal with uncertainty in the knowledge of the world around the car. While highway navigation and autonomous driving using prior knowledge of the environment have been demonstrated successfully, understanding and navigating general inner-city scenarios with little prior knowledge remains an unsolved problem. In this thesis, this perception problem is analysed for driving in inner-city environments, together with the capacity to perform safe displacement based on decision-making in autonomous navigation. A perception system is designed that allows robotic cars to drive autonomously on roads, without the need to adapt the infrastructure, without requiring prior knowledge of the environment, and accounting for the presence of dynamic objects such as cars. A novel method based on machine learning is proposed to extract the semantic context from a pair of stereo images; this context is merged into an evidential occupancy grid that models the uncertainties of an unknown urban environment using Dempster-Shafer theory. For decision-making in path planning, the virtual-tentacle approach is applied to generate possible paths starting from the ego-vehicle reference frame, and on this basis two new strategies are proposed: first, a strategy to select the correct path so as to better avoid obstacles and follow the local task in the context of hybrid navigation; and second, a closed-loop control based on visual odometry and virtual tentacles for path-following execution. Finally, a complete automotive system integrating the perception, path-planning, and control modules is implemented and experimentally validated in real conditions on an experimental autonomous car; the results show that the developed approach successfully performs safe local navigation based on camera sensors.
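The evidential grid's core operation, combining two sources' belief masses with Dempster's rule over the frame {Free, Occupied}, can be sketched per cell as below. The mass values are illustrative:

```python
def dempster_combine(m1, m2):
    """Combine two mass functions over the frame {Free, Occupied} with
    Dempster's rule. Each mass is a dict with keys 'F', 'O', and 'FO'
    (the ignorance mass assigned to the whole frame)."""
    # Conflict: one source says Free while the other says Occupied.
    K = m1["F"] * m2["O"] + m1["O"] * m2["F"]
    if K >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    norm = 1.0 - K
    return {
        "F":  (m1["F"] * m2["F"] + m1["F"] * m2["FO"] + m1["FO"] * m2["F"]) / norm,
        "O":  (m1["O"] * m2["O"] + m1["O"] * m2["FO"] + m1["FO"] * m2["O"]) / norm,
        "FO": (m1["FO"] * m2["FO"]) / norm,
    }

# Two sources that both lean towards 'Occupied' reinforce each other,
# while their shared ignorance mass shrinks.
cell = dempster_combine({"F": 0.1, "O": 0.6, "FO": 0.3},
                        {"F": 0.2, "O": 0.5, "FO": 0.3})
```

Unlike a Bayesian occupancy grid, the explicit ignorance mass `FO` lets the grid distinguish "unobserved" from "conflicting evidence", which is the usual motivation for the evidential formulation.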
Survey on video anomaly detection in dynamic scenes with moving cameras
The increasing popularity of compact and inexpensive cameras, e.g. dash
cameras, body cameras, and cameras equipped on robots, has sparked a growing
interest in detecting anomalies within dynamic scenes recorded by moving
cameras. However, existing reviews primarily concentrate on Video Anomaly
Detection (VAD) methods assuming static cameras. The VAD literature with moving
cameras remains fragmented, lacking comprehensive reviews to date. To address
this gap, we endeavor to present the first comprehensive survey on Moving
Camera Video Anomaly Detection (MC-VAD). We delve into the research papers
related to MC-VAD, critically assessing their limitations and highlighting
associated challenges. Our exploration encompasses three application domains:
security, urban transportation, and marine environments, which in turn cover
six specific tasks. We compile an extensive list of 25 publicly-available
datasets spanning four distinct environments: underwater, water surface,
ground, and aerial. We summarize the types of anomalies these datasets
correspond to or contain, and present five main categories of approaches for
detecting such anomalies. Lastly, we identify future research directions and
discuss novel contributions that could advance the field of MC-VAD. With this
survey, we aim to offer a valuable reference for researchers and practitioners
striving to develop and advance state-of-the-art MC-VAD methods.