Learning Birds-Eye View Representations for Autonomous Driving
Over the past few years, progress towards the ambitious goal of widespread fully-autonomous vehicles on our roads has accelerated dramatically. This progress has been spurred largely by the success of highly accurate LiDAR sensors, as well as the use of detailed high-resolution maps, which together allow a vehicle to navigate its surroundings effectively. Often, however, one or both of these resources may be unavailable, whether due to cost, sensor failure, or the need to operate in an unmapped environment. The aim of this thesis is therefore to demonstrate that it is possible to build detailed three-dimensional representations of traffic scenes using only 2D monocular camera images as input. Such an approach faces many challenges, most notably that 2D images do not provide explicit 3D structure. We overcome this limitation by applying a combination of deep learning and geometry to transform image-based features into an orthographic birds-eye view representation of the scene, allowing algorithms to reason in a metric 3D space. This approach is applied to two challenging perception tasks central to autonomous driving.
The first part of this thesis addresses the problem of monocular 3D object detection, which involves determining the size and location of all objects in the scene. Our solution is based on a novel convolutional network architecture that processes features in both the image and birds-eye view perspectives. Results on the KITTI dataset showed that this network outperformed existing works at the time, and although more recent works have improved on these results, our extensive analysis found that the solution performs well in many difficult edge-case scenarios, such as objects very close to or distant from the camera.
In the second part of the thesis, we consider the related problem of semantic map prediction: estimating a birds-eye view map of the world visible from a given camera, encoding both static elements of the scene, such as pavement and road layout, and dynamic objects, such as vehicles and pedestrians. This was accomplished using a second network that built on the experience from the previous work and achieved convincing performance on two real-world driving datasets. By formulating the maps as an occupancy grid map (a widely used representation from robotics), we were able to demonstrate how predictions could be accumulated across multiple frames, and that doing so further improved the robustness of the maps produced by our system.
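The multi-frame accumulation described above is commonly implemented as a Bayesian log-odds update on the occupancy grid. The sketch below illustrates that idea; the function names, grid shapes, and uniform prior are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def logit(p):
    """Convert a probability to log-odds."""
    return np.log(p / (1.0 - p))

def accumulate_occupancy(frame_probs, eps=1e-4):
    """Fuse a sequence of per-frame occupancy probability maps (each an
    H x W array in [0, 1]) into one grid via a log-odds update, starting
    from a uniform 0.5 prior (log-odds zero)."""
    log_odds = np.zeros_like(frame_probs[0], dtype=float)
    for probs in frame_probs:
        # Clip to avoid infinite log-odds at exactly 0 or 1.
        log_odds += logit(np.clip(probs, eps, 1.0 - eps))
    return 1.0 / (1.0 + np.exp(-log_odds))  # back to probability
```

Repeated agreeing observations sharpen the map: three frames that each report occupancy 0.8 fuse to a confidence near 0.98, which is the kind of robustness gain that accumulating predictions across frames provides.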
Intent prediction of vulnerable road users for trusted autonomous vehicles
This study investigated how future autonomous vehicles could be further trusted by the vulnerable road users (such as pedestrians and cyclists) with whom they will interact in urban traffic environments. It focused on understanding the behaviour of such road users at a deeper level by predicting their future intentions based solely on vehicle-based sensors and AI techniques. The findings showed that incorporating personal and body-language attributes of vulnerable road users, in addition to their past motion trajectories and physical attributes of the environment, led to more accurate predictions of their intended actions.
CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity
Roadside camera-driven 3D object detection is a crucial task in intelligent
transportation systems, which extends the perception range beyond the
limitations of vision-centric vehicles and enhances road safety. While previous
studies have limitations in using only depth or height information, we find
both depth and height matter and they are in fact complementary. The depth
feature encompasses precise geometric cues, whereas the height feature is
primarily focused on distinguishing between various categories of height
intervals, essentially providing semantic context. This insight motivates the
development of Complementary-BEV (CoBEV), a novel end-to-end monocular 3D
object detection framework that integrates depth and height to construct robust
BEV representations. In essence, CoBEV estimates each pixel's depth and height
distribution and lifts the camera features into 3D space for lateral fusion
using the newly proposed two-stage complementary feature selection (CFS)
module. A BEV feature distillation framework is also seamlessly integrated to
further enhance the detection accuracy from the prior knowledge of the
fusion-modal CoBEV teacher. We conduct extensive experiments on the public 3D
detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D, as well as
the private Supremind-Road dataset. The results demonstrate that CoBEV not only
achieves new state-of-the-art accuracy, but is also markedly more robust than
previous methods in challenging long-distance scenarios and under noisy camera
disturbance, and generalizes far better in heterologous settings with drastic
changes in scene and camera parameters. For
the first time, the vehicle AP score of a camera model reaches 80% on
DAIR-V2X-I in terms of easy mode. The source code will be made publicly
available at https://github.com/MasterHow/CoBEV.
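The "lift" step the abstract describes, weighting each pixel's features by a predicted depth (or, analogously, height) distribution before projecting them into 3D space, can be sketched as follows. The array shapes and the softmax normalization are assumptions in the spirit of lift-splat-style methods, not CoBEV's actual code.

```python
import numpy as np

def lift_features(feat, depth_logits):
    """Lift image features into a depth-resolved volume.

    feat:         (C, H, W) per-pixel image features
    depth_logits: (D, H, W) unnormalised scores over D depth bins
    returns:      (D, C, H, W) features weighted by each pixel's depth
                  distribution, ready to splat into a BEV grid
    """
    # Softmax over the depth-bin axis gives each pixel a distribution.
    z = np.exp(depth_logits - depth_logits.max(axis=0, keepdims=True))
    depth_prob = z / z.sum(axis=0, keepdims=True)
    # Outer product: each depth bin receives the feature scaled by its weight.
    return depth_prob[:, None, :, :] * feat[None, :, :, :]
```

Because the weights sum to one along the depth axis, summing the volume over depth recovers the original features; the complementary idea in CoBEV is to build a second, height-conditioned volume and fuse the two.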
Advanced Map Matching Technologies and Techniques for Pedestrian/Wheelchair Navigation
Thanks to the steadily increasing capabilities of mobile devices such as smartphones, pedestrian/wheelchair navigation has recently attracted considerable interest as a potential smartphone application. While vehicle navigation systems have already reached a certain level of maturity, pedestrian/wheelchair navigation services are still in their infancy. By comparison with vehicle navigation systems, a set of map matching requirements and challenges unique to pedestrian/wheelchair navigation is identified. To provide navigation assistance to pedestrians and wheelchair users, new map matching techniques must therefore be designed and developed.
The main goal of this research is to investigate and develop advanced map matching technologies and techniques tailored to pedestrian/wheelchair navigation services. As the first step in map matching, an adaptive candidate segment selection algorithm is developed to efficiently find candidate segments. Furthermore, to narrow down the search for the correct segment, advanced mathematical models are applied: GPS-based chain-code map matching, Hidden Markov Model (HMM) map matching, and fuzzy-logic map matching algorithms are developed to estimate the real-time location of users in pedestrian/wheelchair navigation systems and services. Nevertheless, the GPS signal is not always available in areas with high-rise buildings, and even when a signal is present, its accuracy may not be sufficient to localize pedestrians and wheelchair users on sidewalks. To overcome these shortcomings of GPS, multi-sensor integrated map matching algorithms are also investigated and developed: a movement pattern recognition algorithm using accelerometer and compass data, and a vision-based positioning algorithm to fill gaps in GPS positioning.
Experiments are conducted to evaluate the developed algorithms using real field test data (GPS coordinates and other sensor data). The experimental results show that the developed algorithms and the integrated sensors, i.e., monocular visual odometry, GPS, an accelerometer, and a compass, can provide high-quality and uninterrupted localization services in pedestrian/wheelchair navigation systems. The map matching techniques developed in this work can be applied to various pedestrian/wheelchair navigation applications, such as tracking senior citizens and children or tourist service systems, and can be further utilized in building walking robots and automatic wheelchair navigation systems.
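One of the formulations mentioned above, HMM map matching, is typically decoded with the Viterbi algorithm: emission scores measure how well each GPS fix fits each candidate segment, and transition scores encode segment connectivity. A minimal sketch under those assumptions (segment sets, scores, and shapes here are illustrative, not the dissertation's implementation):

```python
import numpy as np

def viterbi_map_match(emission_logp, transition_logp):
    """Most likely segment sequence for a trace of GPS fixes.

    emission_logp:   (T, S) log-likelihood of each of T fixes on each
                     of S candidate segments
    transition_logp: (S, S) log-probability of moving between segments
    """
    T, S = emission_logp.shape
    dp = np.zeros((T, S))            # best log-score ending in segment s
    bp = np.zeros((T, S), dtype=int) # backpointers for path recovery
    dp[0] = emission_logp[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + transition_logp
        bp[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + emission_logp[t]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(bp[t][path[-1]]))
    return path[::-1]
```

Because the decoder scores whole sequences rather than individual fixes, a single noisy GPS point rarely pulls the matched path onto the wrong sidewalk segment.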
Single Image Human Proxemics Estimation for Visual Social Distancing
In this work, we address the problem of estimating the so-called "Social
Distancing" given a single uncalibrated image in unconstrained scenarios. Our
approach proposes a semi-automatic solution to approximate the homography
matrix between the scene ground and image plane. With the estimated homography,
we then leverage an off-the-shelf pose detector to detect body poses on the
image and to reason upon their inter-personal distances using the length of
their body-parts. Inter-personal distances are further locally inspected to
detect possible violations of the social distancing rules. We validate our
proposed method quantitatively and qualitatively against baselines on public
domain datasets for which we provide ground truth on inter-personal distances.
In addition, we demonstrate the application of our method deployed in a real
testing scenario, where statistics on inter-personal distances are currently
used to improve safety in a critical environment. Comment: Paper accepted at WACV 2021.
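The core geometric step described above, mapping detected image points through the estimated ground-plane homography and measuring metric distances there, can be sketched as below. The specific homography used in the example is a toy assumption for illustration.

```python
import numpy as np

def ground_distance(H, p1, p2):
    """Distance in metres between two image points (pixels), given a 3x3
    homography H mapping image coordinates to metric ground coordinates."""
    def to_ground(p):
        v = H @ np.array([p[0], p[1], 1.0])
        return v[:2] / v[2]  # dehomogenise
    return float(np.linalg.norm(to_ground(p1) - to_ground(p2)))
```

For example, with a toy homography in which 100 pixels correspond to 1 metre, feet detected at (0, 0) and (300, 400) lie 5 m apart on the ground plane; flagging a social-distancing violation is then a simple comparison against a threshold such as 2 m.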
Advances in Monocular Exemplar-based Human Body Pose Analysis: Modeling, Detection and Tracking
This thesis contributes to the analysis of human body pose from image sequences acquired with a single camera. This topic has a wide range of potential applications in video surveillance, video games, and biomedical applications. Exemplar-based techniques have been successful; however, their accuracy depends on the similarity of the camera viewpoint and the scene properties between the training and test images. Given a training dataset captured with a small number of fixed cameras parallel to the ground, three possible scenarios with increasing levels of difficulty are identified and analysed: 1) a static camera parallel to the ground, 2) a fixed surveillance camera with a considerably different viewing angle, and 3) a video sequence captured with a moving camera, or simply a single static image.
- …