Drivable Free Space Detection
Drivable free space detection is currently used both for driving assistance and for the development of fully autonomous driving systems. Typically, this problem is tackled by estimating depth in the image using sensors (LIDAR) or stereo cameras. This work develops a solution for estimating drivable free space by analyzing images captured with a monocular camera. Inspired by a previously proposed approach based on dynamic programming techniques and the scoring of image features, this work proposes a scalable solution to the problem. To this end, we analyze the use of geometric features based on contours and appearance. Finally, we present results of the proposed solution on image samples from the KITTI dataset for challenges oriented to autonomous driving.
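The abstract above mentions dynamic programming over image features to find the drivable region. A minimal sketch of one common formulation of that idea, a Stixel-style column-wise boundary search, is shown below; the cost function and smoothness weight are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def free_space_boundary(cost, smooth=1.0):
    """Column-wise dynamic programming for a free-space boundary.

    cost[r, c]: penalty for placing the free-space boundary at row r
    of column c (lower = more boundary-like). Returns one boundary row
    per column, trading the per-pixel data cost against a smoothness
    penalty between neighbouring columns.
    """
    rows, cols = cost.shape
    dp = cost[:, 0].astype(float).copy()
    back = np.zeros((rows, cols), dtype=int)
    row_idx = np.arange(rows)
    for c in range(1, cols):
        # transition cost |r - r_prev| * smooth, vectorised over rows
        trans = smooth * np.abs(row_idx[:, None] - row_idx[None, :])
        total = dp[None, :] + trans          # total[r, r_prev]
        back[:, c] = np.argmin(total, axis=1)
        dp = cost[:, c] + np.min(total, axis=1)
    # backtrack the optimal boundary from the last column
    boundary = np.zeros(cols, dtype=int)
    boundary[-1] = int(np.argmin(dp))
    for c in range(cols - 1, 0, -1):
        boundary[c - 1] = back[boundary[c], c]
    return boundary
```

Everything below the returned boundary row in each column would then be treated as drivable free space.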
A Sensor for Urban Driving Assistance Systems Based on Dense Stereovision
Advanced driving assistance systems (ADAS) form a complex multidisciplinary research field, aimed at improving traffic efficiency and safety. A realistic analysis of the requirements and of the possibilities of the traffic environment leads to the establishment of several goals for traffic assistance, to be implemented in the near future (ADASE, INVENT
Learning Birds-Eye View Representations for Autonomous Driving
Over the past few years, progress towards the ambitious goal of widespread fully-autonomous vehicles on our roads has accelerated dramatically. This progress has been spurred largely by the success of highly accurate LiDAR sensors, as well as the use of detailed high-resolution maps, which together allow a vehicle to navigate its surroundings effectively. Often, however, one or both of these resources may be unavailable, whether due to cost, sensor failure, or the need to operate in an unmapped environment. The aim of this thesis is therefore to demonstrate that it is possible to build detailed three-dimensional representations of traffic scenes using only 2D monocular camera images as input. Such an approach faces many challenges: most notably that 2D images do not provide explicit 3D structure. We overcome this limitation by applying a combination of deep learning and geometry to transform image-based features into an orthographic birds-eye view representation of the scene, allowing algorithms to reason in a metric, 3D space. This approach is applied to solving two challenging perception tasks central to autonomous driving.
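The geometric half of the image-to-birds-eye-view transform described above can be sketched with a flat-ground pinhole projection: each BEV cell on the ground plane maps to a pixel where image features can be sampled. This is a minimal illustration under an assumed flat-ground model and assumed intrinsics, not the thesis's learned architecture.

```python
import numpy as np

def ground_to_image(points_xz, K, cam_height):
    """Project ground-plane points into the image (flat-ground assumption).

    Each birds-eye-view cell (x, z) is assumed to lie on the ground plane
    at y = cam_height below the camera, and projects through the pinhole
    intrinsics K to a pixel (u, v) where image features can be sampled.
    """
    x, z = points_xz[:, 0], points_xz[:, 1]
    y = np.full_like(x, cam_height)          # ground plane below camera
    pts_cam = np.stack([x, y, z], axis=1)    # camera-frame 3D points
    uvw = pts_cam @ K.T                      # pinhole projection
    return uvw[:, :2] / uvw[:, 2:3]          # homogeneous divide -> (u, v)
```

Sampling image features at these pixel locations for every BEV cell yields an orthographic feature map in which metric reasoning is straightforward.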
The first part of this thesis addresses the problem of monocular 3D object detection, which involves determining the size and location of all objects in the scene. Our solution was based on a novel convolutional network architecture that processed features in both the image and birds-eye view perspective. Results on the KITTI dataset showed that this network outperformed existing works at the time, and although more recent works have improved on these results, we conducted extensive analysis to find that our solution performed well in many difficult edge-case scenarios such as objects close to or distant from the camera.
In the second part of the thesis, we consider the related problem of semantic map prediction. This consists of estimating a birds-eye view map of the world visible from a given camera, encoding both static elements of the scene such as pavement and road layout, as well as dynamic objects such as vehicles and pedestrians. This was accomplished using a second network that built on the experience from the previous work and achieved convincing performance on two real-world driving datasets. By formulating the maps as an occupancy grid map (a widely used representation from robotics), we were able to demonstrate how predictions could be accumulated across multiple frames, and that doing so further improved the robustness of maps produced by our system.
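The multi-frame accumulation mentioned above has a standard robotics formulation: per-frame occupancy probabilities are fused additively in log-odds space. The sketch below assumes the per-frame grids are already aligned in a common birds-eye frame; the clipping bounds and uniform prior are illustrative choices.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability."""
    return np.log(p / (1.0 - p))

def accumulate(grids, prior=0.5):
    """Fuse per-frame occupancy probability grids in log-odds space.

    Each frame's predicted grid (probabilities in [0, 1], already warped
    into a common birds-eye frame) contributes additively in log-odds,
    the standard occupancy-grid update from robotics.
    """
    fused = np.zeros_like(grids[0], dtype=float)
    for g in grids:
        fused += logit(np.clip(g, 1e-6, 1 - 1e-6)) - logit(prior)
    return 1.0 / (1.0 + np.exp(-fused))      # back to probabilities
```

Because agreement across frames compounds in log-odds, repeated consistent observations push cells confidently towards occupied or free, which is one way such accumulation improves map robustness.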
Footprints and Free Space from a Single Color Image
Understanding the shape of a scene from a single color image is a formidable computer vision task. However, most methods aim to predict the geometry of surfaces that are visible to the camera, which is of limited use when planning paths for robots or augmented reality agents. Such agents can only move when grounded on a traversable surface, which we define as the set of classes which humans can also walk over, such as grass, footpaths and pavement. Models which predict beyond the line of sight often parameterize the scene with voxels or meshes, which can be expensive to use in machine learning frameworks.

We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input. We learn from stereo video sequences, using camera poses, per-frame depth and semantic segmentation to form training data, which is used to supervise an image-to-image network. We train models from the KITTI driving dataset, the indoor Matterport dataset, and from our own casually captured stereo footage. We find that a surprisingly low bar for spatial coverage of training scenes is required. We validate our algorithm against a range of strong baselines, and include an assessment of our predictions for a path-planning task.

Comment: Accepted to CVPR 2020 as an oral presentation.
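The training labels described above start from a per-pixel semantic segmentation reduced to a binary traversability signal. A toy sketch of that reduction is below; the class names and mapping are hypothetical, since the paper's own label set is not given here.

```python
import numpy as np

# Hypothetical traversable class names; the paper's label set may differ.
TRAVERSABLE = {"road", "sidewalk", "grass", "terrain"}

def traversable_mask(seg, id_to_name):
    """Reduce a semantic segmentation (integer class ids per pixel) to a
    binary traversability mask: True where a human could walk."""
    names = np.vectorize(id_to_name.get)(seg)
    return np.isin(names, list(TRAVERSABLE))
```

Combined with per-frame depth and camera poses, masks like this can be reprojected across a stereo video sequence to supervise predictions for occluded ground as well as visible ground.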
Layered Interpretation of Street View Images
We propose a layered street view model to encode both depth and semantic information on street view images for autonomous driving. Recently, stixels, stix-mantics, and tiered scene labeling methods have been proposed to model street view images. We propose a 4-layer street view model, a compact representation over the recently proposed stix-mantics model. Our layers encode semantic classes like ground, pedestrians, vehicles, buildings, and sky in addition to the depths. The only input to our algorithm is a pair of stereo images. We use a deep neural network to extract the appearance features for semantic classes. We use a simple and efficient inference algorithm to jointly estimate both semantic classes and layered depth values. Our method outperforms other competing approaches on the Daimler urban scene segmentation dataset. Our algorithm is massively parallelizable, allowing a GPU implementation with a processing speed of about 9 fps.

Comment: The paper will be presented in the 2015 Robotics: Science and Systems Conference (RSS).
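A tiered labeling of the kind this abstract describes can be illustrated per image column: each column is split top-to-bottom into a few ordered layers, and the split maximising the summed class scores is chosen. The three-layer toy below (sky / object / ground) is a simplified stand-in for the paper's 4-layer model, with prefix sums making each candidate split O(1); since columns are independent, the scheme parallelizes naturally.

```python
import numpy as np

def best_layers(col_scores):
    """Brute-force a tiered labeling of one image column.

    col_scores has shape (rows, 3): per-pixel scores for sky, object,
    ground. Returns breakpoints (a, b): rows [0, a) are sky, [a, b)
    object, [b, rows) ground, maximising the total score.
    """
    rows = col_scores.shape[0]
    # prefix sums let each candidate split be scored in O(1)
    prefix = np.vstack([np.zeros(3), np.cumsum(col_scores, axis=0)])
    best, best_split = -np.inf, (0, 0)
    for a in range(rows + 1):              # sky/object breakpoint
        for b in range(a, rows + 1):       # object/ground breakpoint
            s = (prefix[a, 0]
                 + prefix[b, 1] - prefix[a, 1]
                 + prefix[rows, 2] - prefix[b, 2])
            if s > best:
                best, best_split = s, (a, b)
    return best_split
```

Running one such search per column, with scores coming from a learned appearance model and stereo depth, is the shape of inference the abstract's GPU implementation exploits.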