
    Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

    As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundation models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems, enabling out-of-distribution, end-to-end, multimodal, and more explainable autonomy. Specifically, we present an approach to end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. To do so, we introduce a method to extract nuanced spatial (pixel/patch-aligned) features from transformers to enable the encapsulation of both spatial and semantic features. Our approach (i) demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations, and (ii) allows the incorporation of latent space simulation (via text) for improved training (data augmentation via text) and policy debugging. We encourage the reader to check our explainer video at https://www.youtube.com/watch?v=4n-DJf8vXxo&feature=youtu.be and to view the code and demos on our project webpage at https://drive-anywhere.github.io/. Comment: Project webpage: https://drive-anywhere.github.io Explainer video: https://www.youtube.com/watch?v=4n-DJf8vXxo&feature=youtu.be
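    The key ingredient described above is a set of pixel/patch-aligned transformer features that live in the same latent space as text, so driving-relevant content can be queried by language. The sketch below illustrates only that general idea, assuming a CLIP-style joint embedding; the function name and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def patch_text_similarity(patch_features, text_embedding):
    """Score each spatial patch against a text query via cosine similarity.

    patch_features: (H*W, D) patch-aligned features from a vision transformer
    text_embedding: (D,) embedding of a text query in the same latent space
    Returns a (H*W,) relevance map a downstream driving policy could consume.
    """
    patch_features = F.normalize(patch_features, dim=-1)
    text_embedding = F.normalize(text_embedding, dim=-1)
    return patch_features @ text_embedding

# Toy usage with random tensors standing in for real model outputs.
D = 512
patches = torch.randn(14 * 14, D)   # e.g. a 14x14 patch grid
query = torch.randn(D)              # e.g. an embedding of "a pedestrian crossing"
relevance = patch_text_similarity(patches, query)
print(relevance.shape)  # torch.Size([196])
```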

    Research on obstacle avoidance optimization and path planning of autonomous vehicles based on attention mechanism combined with multimodal information decision-making thoughts of robots

    With the development of machine perception and multimodal information decision-making techniques, autonomous driving technology has become a crucial area of advancement in the transportation industry. The optimization of vehicle navigation, path planning, and obstacle avoidance tasks is of paramount importance. In this study, we explore the use of attention mechanisms in an end-to-end architecture for optimizing obstacle avoidance and path planning in autonomous driving vehicles. We position our research within the broader context of robotics, emphasizing the fusion of information and decision-making capabilities. The introduction of attention mechanisms enables vehicles to perceive the environment more accurately by focusing on important information and making informed decisions in complex scenarios. By inputting multimodal information, such as images and LiDAR data, into the attention mechanism module, the system can automatically learn and weigh crucial environmental features, thereby placing greater emphasis on key information during obstacle avoidance decisions. Additionally, we leverage the end-to-end architecture and draw on classical theories and algorithms from the field of robotics to enhance the perception and decision-making abilities of autonomous driving vehicles. Furthermore, we address the optimization of path planning using attention mechanisms. We transform the vehicle's navigation task into a sequential decision-making problem and employ LSTM (Long Short-Term Memory) models to handle dynamic navigation in varying environments. By applying attention mechanisms to weigh key points along the navigation path, the vehicle can flexibly select the optimal route and dynamically adjust it based on real-time conditions. Finally, we conducted extensive experimental evaluations and software experiments of the proposed end-to-end architecture on real road datasets. The method effectively avoids obstacles, adheres to traffic rules, and achieves stable, safe, and efficient autonomous driving in diverse road scenarios. This research provides an effective solution for optimizing obstacle avoidance and path planning in the field of autonomous driving. Moreover, it contributes to the advancement and practical application of multimodal information fusion in navigation, localization, and human-robot interaction.
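    The core mechanism described here is attention over multimodal inputs: image and LiDAR features are weighted so the decision module emphasizes the most relevant information. The following PyTorch sketch shows one minimal way such a fusion block could look; the class name and layer sizes are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class MultimodalAttentionFusion(nn.Module):
    """Weigh image and LiDAR feature vectors with learned attention scores."""

    def __init__(self, img_dim=256, lidar_dim=128, fused_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, fused_dim)
        self.lidar_proj = nn.Linear(lidar_dim, fused_dim)
        self.attn = nn.Linear(fused_dim, 1)  # one scalar score per modality

    def forward(self, img_feat, lidar_feat):
        # Project both modalities into a shared space: (B, 2, fused_dim)
        tokens = torch.stack(
            [self.img_proj(img_feat), self.lidar_proj(lidar_feat)], dim=1)
        weights = torch.softmax(self.attn(tokens), dim=1)  # (B, 2, 1)
        return (weights * tokens).sum(dim=1)  # attention-weighted fusion

fusion = MultimodalAttentionFusion()
fused = fusion(torch.randn(4, 256), torch.randn(4, 128))
print(fused.shape)  # torch.Size([4, 256])
```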

    Multimodal End-to-End Learning for Autonomous Steering in Adverse Road and Weather Conditions

    Autonomous driving is challenging in adverse road and weather conditions, in which there might not be lane lines, the road might be covered in snow, and the visibility might be poor. We extend previous work on end-to-end learning for autonomous steering to operate in these adverse real-life conditions with multimodal data. We collected 28 hours of driving data in several road and weather conditions and trained convolutional neural networks to predict the car's steering wheel angle from front-facing color camera images and lidar range and reflectance data. We compared CNN model performance across the different modalities, and our results show that the lidar modality improves the performance of different multimodal sensor-fusion models. We also performed on-road tests with different models, and they support this observation.
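    The model described above maps stacked camera and lidar channels directly to a steering wheel angle. A minimal sketch of such an end-to-end steering regressor is shown below; the channel count, layer sizes, and input resolution are illustrative assumptions (loosely in the style of NVIDIA's PilotNet), not the paper's exact network.

```python
import torch
import torch.nn as nn

class SteeringCNN(nn.Module):
    """Predict a steering wheel angle from stacked camera and LiDAR channels.

    Input channels (illustrative): 3 RGB + 1 LiDAR range + 1 LiDAR reflectance.
    """

    def __init__(self, in_channels=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64, 50), nn.ReLU(), nn.Linear(50, 1))

    def forward(self, x):
        return self.head(self.features(x))  # (B, 1) steering angle

model = SteeringCNN()
angle = model(torch.randn(2, 5, 66, 200))  # assumed input resolution
print(angle.shape)  # torch.Size([2, 1])
```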

    Multimodal Manoeuvre and Trajectory Prediction for Autonomous Vehicles Using Transformer Networks

    Predicting the behaviour (i.e. manoeuvre/trajectory) of other road users, including vehicles, is critical for the safe and efficient operation of autonomous vehicles (AVs), a.k.a. automated driving systems (ADSs). Due to the uncertain future behaviour of vehicles, multiple future behaviour modes are often plausible for a vehicle in a given driving scene. Therefore, multimodal prediction can provide richer information than single-mode prediction, enabling AVs to perform a better risk assessment. To this end, we propose a novel multimodal prediction framework that can predict multiple plausible behaviour modes and their likelihoods. The proposed framework includes a bespoke problem formulation for manoeuvre prediction, a novel transformer-based prediction model, and a tailored training method for multimodal manoeuvre and trajectory prediction. The performance of the framework is evaluated using two public benchmark highway driving datasets, namely NGSIM and highD. The results show that the proposed framework outperforms the state-of-the-art multimodal methods in the literature in terms of prediction error and is capable of predicting plausible manoeuvre and trajectory modes. Comment: 8 pages, 3 figures, submitted to IEEE RA
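    At its core, such a model encodes the observed motion with a transformer and emits several candidate futures together with a probability for each mode. The sketch below is a generic, minimal version of that pattern in PyTorch; the dimensions, horizon, and number of modes are assumptions for illustration, not the framework proposed in the paper.

```python
import torch
import torch.nn as nn

class MultimodalTrajectoryPredictor(nn.Module):
    """Encode an observed track with a transformer and output K future
    trajectory modes plus their likelihoods (a generic sketch)."""

    def __init__(self, d_model=64, num_modes=3, horizon=25):
        super().__init__()
        self.num_modes, self.horizon = num_modes, horizon
        self.embed = nn.Linear(2, d_model)  # embed (x, y) history points
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.traj_head = nn.Linear(d_model, num_modes * horizon * 2)
        self.mode_head = nn.Linear(d_model, num_modes)

    def forward(self, history):                          # history: (B, T, 2)
        h = self.encoder(self.embed(history)).mean(dim=1)  # (B, d_model)
        trajs = self.traj_head(h).view(-1, self.num_modes, self.horizon, 2)
        mode_probs = torch.softmax(self.mode_head(h), dim=-1)
        return trajs, mode_probs

model = MultimodalTrajectoryPredictor()
trajs, probs = model(torch.randn(8, 15, 2))  # 15 observed (x, y) steps
print(trajs.shape, probs.shape)  # (8, 3, 25, 2) and (8, 3)
```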

    Integrating state-of-the-art CNNs for multi-sensor 3D vehicle detection in real autonomous driving environments

    2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27-30 Oct. 2019. This paper presents two new approaches to detect surrounding vehicles in 3D urban driving scenes and their corresponding Bird's Eye View (BEV). The proposals integrate two state-of-the-art Convolutional Neural Networks (CNNs), namely YOLOv3 and Mask-RCNN, into a framework presented by the authors in [1] for 3D vehicle detection that fuses semantic image segmentation and the LIDAR point cloud. Our proposals take advantage of multimodal fusion, geometric constraints, and pre-trained modules inside our framework. The methods have been tested using the KITTI object detection benchmark and a comparison is presented. Experiments show that the new approaches improve results with respect to the baseline and are on par with other competitive state-of-the-art proposals, while being the only ones that do not apply an end-to-end learning process. In this way, they remove the need to train on a specific dataset and show a good capability of generalization to any domain, a key point for self-driving systems. Finally, we have tested our best proposal on KITTI in our own driving environment, without any adaptation, obtaining results suitable for our autonomous driving application. Ministerio de Economía y Competitividad; Comunidad de Madrid.
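    The fusion step in this family of approaches typically projects the LIDAR point cloud into the image plane and keeps the points that fall inside the 2D detections returned by YOLOv3 or Mask-RCNN, from which a 3D box can then be estimated. The NumPy sketch below shows only that projection-and-filter idea, with a synthetic projection matrix and box; it is not the framework of [1].

```python
import numpy as np

def points_in_detection(points_cam, P, box_2d):
    """Project points into the image plane and keep those inside a 2D
    detection box (e.g. from YOLOv3 or Mask-RCNN).

    points_cam: (N, 3) LiDAR points already in the camera frame (synthetic)
    P:          (3, 4) camera projection matrix
    box_2d:     (x1, y1, x2, y2) detection in pixel coordinates
    """
    homog = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    uvw = homog @ P.T
    uv = uvw[:, :2] / uvw[:, 2:3]          # perspective divide
    x1, y1, x2, y2 = box_2d
    mask = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
            (uv[:, 1] >= y1) & (uv[:, 1] <= y2) &
            (uvw[:, 2] > 0))               # keep points in front of the camera
    return points_cam[mask]

# Synthetic example: identity-like projection and a box around the image centre.
P = np.hstack([np.eye(3), np.zeros((3, 1))])
pts = np.random.uniform(-5, 5, size=(1000, 3)) + np.array([0, 0, 10.0])
subset = points_in_detection(pts, P, box_2d=(-0.2, -0.2, 0.2, 0.2))
print(subset.shape)
```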

    Multimodal manoeuvre and trajectory prediction for automated driving on highways using transformer networks

    Predicting the behaviour (i.e., manoeuvre/trajectory) of other road users, including vehicles, is critical for the safe and efficient operation of autonomous vehicles (AVs), a.k.a. automated driving systems (ADSs). Due to the uncertain future behaviour of vehicles, multiple future behaviour modes are often plausible for a vehicle in a given driving scene. Therefore, multimodal prediction can provide richer information than single-mode prediction, enabling AVs to perform a better risk assessment. To this end, we propose a novel multimodal prediction framework that can predict multiple plausible behaviour modes and their likelihoods. The proposed framework includes a bespoke problem formulation for manoeuvre prediction, a novel transformer-based prediction model, and a tailored training method for multimodal manoeuvre and trajectory prediction. The performance of the framework is evaluated using three public highway driving datasets, namely NGSIM, highD, and exiD. The results show that our framework outperforms the state-of-the-art multimodal methods in terms of prediction error and is capable of predicting plausible manoeuvre and trajectory modes.
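    This entry extends the framework above (evaluated here on NGSIM, highD, and exiD). Multi-mode outputs of this kind are commonly scored with a minimum-over-modes displacement error; the snippet below sketches such a metric (minADE) as a generic example, not necessarily the exact error measure used in these papers.

```python
import numpy as np

def min_ade(pred_modes, ground_truth):
    """Minimum Average Displacement Error over K predicted modes.

    pred_modes:   (K, T, 2) predicted future trajectories
    ground_truth: (T, 2) observed future trajectory
    """
    errors = np.linalg.norm(pred_modes - ground_truth[None], axis=-1)  # (K, T)
    return errors.mean(axis=1).min()

# Toy example with 3 modes over a 5-step horizon.
gt = np.cumsum(np.ones((5, 2)), axis=0)      # straight diagonal path
modes = np.stack([gt, gt + 0.5, gt * 0.0])   # one mode matches exactly
print(min_ade(modes, gt))  # 0.0
```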

    Multimodal 3D Object Detection from Simulated Pretraining

    The need for simulated data in autonomous driving applications has become increasingly important, both for validation of pretrained models and for training new models. In order for these models to generalize to real-world applications, it is critical that the underlying dataset contains a variety of driving scenarios and that simulated sensor readings closely mimic real-world sensors. We present the Carla Automated Dataset Extraction Tool (CADET), a novel tool for generating training data from the CARLA simulator to be used in autonomous driving research. The tool is able to export high-quality, synchronized LIDAR and camera data with object annotations, and offers configuration to accurately reflect a real-life sensor array. Furthermore, we use this tool to generate a dataset consisting of 10,000 samples and use this dataset to train the 3D object detection network AVOD-FPN, with fine-tuning on the KITTI dataset in order to evaluate the potential for effective pretraining. We also present two novel LIDAR feature map configurations in Bird's Eye View for use with AVOD-FPN that can be easily modified. These configurations are tested on the KITTI and CADET datasets in order to evaluate their performance as well as the usability of the simulated dataset for pretraining. Although simulated data is insufficient to fully replace real-world data, and generally cannot exceed the performance of systems fully trained on real data, our results indicate that it can considerably reduce the amount of training on real data required to achieve satisfactory levels of accuracy. Comment: 12 pages, part of proceedings for the NAIS 2019 symposium
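    The LIDAR feature map configurations mentioned here are Bird's Eye View rasterisations of the point cloud. The NumPy sketch below shows one common way to build such a BEV input (a max-height channel and a log-density channel); the grid extent and cell size are illustrative assumptions, not the exact configurations proposed in the paper.

```python
import numpy as np

def lidar_to_bev(points, x_range=(0, 70), y_range=(-35, 35), cell=0.1):
    """Rasterise LiDAR points (x, y, z) into BEV height and density maps."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    height = np.zeros((nx, ny), dtype=np.float32)
    density = np.zeros((nx, ny), dtype=np.float32)

    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for x, y, z in zip(ix[valid], iy[valid], points[valid, 2]):
        height[x, y] = max(height[x, y], z)   # max height per cell
        density[x, y] += 1                    # point count per cell

    return np.stack([height, np.log1p(density)])  # (2, nx, ny) feature map

# Synthetic point cloud standing in for a real LiDAR sweep.
cloud = np.random.uniform([0, -35, -2], [70, 35, 1], size=(5000, 3))
bev = lidar_to_bev(cloud)
print(bev.shape)  # (2, 700, 700)
```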