Multi-modal Experts Network for Autonomous Driving
End-to-end learning from sensory data has shown promising results in
autonomous driving. While employing many sensors enhances world perception and
should lead to more robust and reliable behavior of autonomous vehicles, it is
challenging to train and deploy such a network, and at least two problems are
encountered in the considered setting. The first is the increase in
computational complexity with the number of sensing devices. The other is the
phenomenon of the network overfitting to the simplest and most informative input. We
address both challenges with a novel, carefully tailored multi-modal experts
network architecture and propose a multi-stage training procedure. The network
contains a gating mechanism, which selects the most relevant input at each
inference time step using a mixed discrete-continuous policy. We demonstrate
the plausibility of the proposed approach on our 1/6 scale truck equipped with
three cameras and one LiDAR.
Comment: Published at the International Conference on Robotics and Automation (ICRA), 202
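As a rough illustration of the gating idea described above (not the authors' implementation), the sketch below combines per-modality expert outputs with a softmax gate and can hard-select a single expert, mimicking the discrete half of a mixed discrete-continuous policy; all names, shapes, and values are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gated_experts(expert_outputs, gate_logits, hard=False):
    """Combine per-modality expert outputs with a gating policy.

    expert_outputs: (n_experts, d) array, one output vector per sensing modality.
    gate_logits:    (n_experts,) relevance scores produced by a gating network.
    hard=True picks a single expert (discrete selection); hard=False mixes
    all experts with continuous softmax weights.
    """
    weights = softmax(np.asarray(gate_logits, dtype=float))
    if hard:
        selected = np.zeros_like(weights)
        selected[np.argmax(weights)] = 1.0
        weights = selected
    return weights @ np.asarray(expert_outputs, dtype=float)

# Toy example: three camera experts and one LiDAR expert, 2-D control output.
outputs = np.random.randn(4, 2)
print(gated_experts(outputs, [0.2, 1.5, -0.3, 0.7], hard=True))
```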
Robust Deep Multi-Modal Sensor Fusion using Fusion Weight Regularization and Target Learning
Sensor fusion has wide applications in many domains including health care and
autonomous systems. While the advent of deep learning has enabled promising
multi-modal fusion of high-level features and end-to-end sensor fusion
solutions, existing deep learning based sensor fusion techniques including deep
gating architectures are not always resilient, leading to the issue of fusion
weight inconsistency. We propose deep multi-modal sensor fusion architectures
with enhanced robustness, particularly in the presence of sensor failures. At
the core of our gating architectures are fusion weight regularization and
fusion target learning operating on auxiliary unimodal sensing networks
appended to the main fusion model. The proposed regularized gating
architectures outperform the existing deep learning architectures with and
without gating under both clean and corrupted sensory inputs resulting from
sensor failures. The demonstrated improvements are particularly pronounced when
one or more sensory modalities are corrupted.
Comment: 8 pages
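A minimal sketch of one plausible form of fusion weight regularization (an assumed formulation, not the paper's exact loss): the fusion model's gate weights are pulled toward target weights derived from auxiliary unimodal networks, so that an unreliable or failed modality is not trusted heavily. The function name and parameters below are illustrative.

```python
import numpy as np

def fusion_weight_regularizer(gate_weights, unimodal_losses, beta=1.0, lam=0.1):
    """Penalize gating weights that disagree with unimodal evidence.

    gate_weights:    (n_modalities,) softmax weights produced by the fusion gate.
    unimodal_losses: (n_modalities,) losses of auxiliary single-sensor networks;
                     a high loss suggests that modality is unreliable (e.g. failed).
    Target weights down-weight unreliable modalities via a softmax over -beta*loss;
    the return value is lam * squared distance between gate and target weights.
    """
    scores = -beta * np.asarray(unimodal_losses, dtype=float)
    target = np.exp(scores - scores.max())
    target /= target.sum()
    return lam * float(np.sum((np.asarray(gate_weights, dtype=float) - target) ** 2))

# Example: modality 2 has a large unimodal loss (possibly a failed sensor),
# so a gate that still trusts it heavily incurs a larger penalty.
print(fusion_weight_regularizer([0.2, 0.6, 0.2], [0.4, 2.5, 0.5]))
```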
Multimodal End-to-End Learning for Autonomous Steering in Adverse Road and Weather Conditions
Autonomous driving is challenging in adverse road and weather conditions in which there might not be lane lines, the road might be covered in snow, and visibility might be poor. We extend previous work on end-to-end learning for autonomous steering to operate in these adverse real-life conditions with multimodal data. We collected 28 hours of driving data in several road and weather conditions and trained convolutional neural networks to predict the car's steering wheel angle from front-facing color camera images and lidar range and reflectance data. We compared the CNN models' performance across the different modalities, and our results show that the lidar modality improves the performance of the multimodal sensor-fusion models. We also performed on-road tests with different models, and the results support this observation.
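To make the fusion setup concrete, here is a toy late-fusion sketch in the spirit of the camera-plus-lidar steering networks described above; the linear "branches" stand in for the CNN encoders, and all feature sizes and weights are illustrative assumptions rather than the trained models from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(features, weights):
    """One sensor branch: a linear layer with ReLU, standing in for a CNN encoder."""
    return np.maximum(0.0, features @ weights)

# Illustrative feature sizes for flattened camera and lidar (range + reflectance) inputs.
w_cam = rng.standard_normal((128, 32))
w_lidar = rng.standard_normal((64, 32))
w_head = rng.standard_normal(64)

def predict_steering(cam_features, lidar_features):
    """Late fusion: encode each modality, concatenate, regress a steering angle."""
    fused = np.concatenate([branch(cam_features, w_cam), branch(lidar_features, w_lidar)])
    return float(fused @ w_head)

print(predict_steering(rng.standard_normal(128), rng.standard_normal(64)))
```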
Challenges and solutions for autonomous ground robot scene understanding and navigation in unstructured outdoor environments: A review
The capabilities of autonomous mobile robotic systems have been steadily improving due to recent advancements in computer science, engineering, and related disciplines such as cognitive science. In controlled environments, robots have achieved relatively high levels of autonomy. In more unstructured environments, however, the development of fully autonomous mobile robots remains challenging due to the complexity of understanding these environments. Many autonomous mobile robots use classical, learning-based or hybrid approaches for navigation. More recent learning-based methods may replace the complete navigation pipeline or selected stages of the classical approach. For effective deployment, autonomous robots must understand their external environments at a sophisticated level according to their intended applications. Therefore, in addition to robot perception, scene analysis and higher-level scene understanding (e.g., traversable/non-traversable, rough or smooth terrain, etc.) are required for autonomous robot navigation in unstructured outdoor environments. This paper provides a comprehensive review and critical analysis of these methods in the context of their applications to the problems of robot perception and scene understanding in unstructured environments and the related problems of localisation, environment mapping and path planning. State-of-the-art sensor fusion methods and multimodal scene understanding approaches are also discussed and evaluated within this context. The paper concludes with an in-depth discussion regarding the current state of the autonomous ground robot navigation challenge in unstructured outdoor environments and the most promising future research directions to overcome these challenges.
Artificial Intelligence based Robotic Platforms for Autonomous Precision Agriculture
As robotic applications continuously expand into every aspect of human life, it becomes paramount to leverage this trend for precision agriculture. The agricultural sector, despite its importance, has been slow to adopt new technology. The crude, manual processes conventionally used in agriculture have severe economic and social impacts. The inefficiency and low productivity of these methods result in food waste amid food shortages, inconsistencies, time consumption, higher labour expenses, and low yields. The world stands to benefit from automating agricultural processes. To address this, it becomes necessary to build on existing platforms and develop intelligent autonomous vehicles for precision agriculture, including intelligent drones, intelligent ground robots, and other systems working cooperatively. To achieve this, we leverage Artificial Intelligence (AI) and mathematical methods to impart sufficient intelligence to robotic platforms to make them suitable for precision agriculture.
This thesis explores the capabilities of AI for weed classification and detection, weed relative position estimation, fruit 6D pose estimation, and virtual reality for teleoperated systems in fruit picking. Weed infestation diminishes crop yields, and deep learning is becoming an increasingly popular approach for identifying weeds on farmland. However, precision agriculture requires that the object of interest (the weed) be precisely classified and detected to facilitate removal or spraying. An approach is presented that cascades a classification network (ResNet-50) with a detection network (YOLO) for weed classification and detection, which we term Fused-YOLO. Weeds can thus be precisely located and classified by type within an image frame.
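A schematic sketch of the detect-then-classify cascade underlying the Fused-YOLO idea: the stub detector and classifier below are placeholders for YOLO and ResNet-50, and the box format, threshold, and weed labels are illustrative assumptions, not the thesis' trained models.

```python
import numpy as np

def detect_weeds(image):
    """Stand-in for a YOLO-style detector: returns (x, y, w, h, score) boxes.
    Here it simply emits a fixed box so the pipeline runs end to end."""
    h, w = image.shape[:2]
    return [(w // 4, h // 4, w // 2, h // 2, 0.9)]

def classify_crop(crop):
    """Stand-in for a ResNet-50-style classifier: returns a weed-type label."""
    return "broadleaf" if crop.mean() > 0.5 else "grass"

def fused_yolo(image, score_thresh=0.5):
    """Cascade: detect candidate weeds, then classify each detected crop.
    This mirrors the detect-then-classify idea described above, not the exact model."""
    results = []
    for (x, y, w, h, score) in detect_weeds(image):
        if score < score_thresh:
            continue
        crop = image[y:y + h, x:x + w]
        results.append(((x, y, w, h), classify_crop(crop), score))
    return results

print(fused_yolo(np.random.rand(480, 640, 3)))
```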
Inspired by the precision of this detection model, the work extends to a novel monocular vision-based approach for drones to detect multiple types of weeds and estimate their positions autonomously for precision agriculture applications. A drone flies an elliptical trajectory while acquiring images from an onboard monocular camera, and the images are fed to the Fused-YOLO model in real time. The centre of each detection bounding box is taken as the centre of the detected weed. These centre pixels are converted into world coordinates, forming azimuth and elevation angles from the target to the UAV, which are used in an estimation scheme based on the Unscented Kalman Filter to estimate the relative positions of the weeds. The robustness of this algorithm allows for both indoor and outdoor implementation while achieving competitive results with affordable off-the-shelf sensors.
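For illustration, the conversion from a bounding-box centre pixel to bearing angles can be sketched with an ideal, distortion-free pinhole camera model; this is an assumption for the example only, and the thesis' exact geometry and the full Unscented Kalman Filter update are not reproduced here. The intrinsic values below are placeholders.

```python
import numpy as np

def pixel_to_bearings(u, v, fx, fy, cx, cy):
    """Convert a bounding-box centre pixel (u, v) to azimuth/elevation angles
    (radians) in the camera frame, assuming an ideal pinhole model.
    Bearing measurements of this kind, collected over an orbiting trajectory,
    are what a filter such as the UKF can fuse to triangulate a weed's position."""
    azimuth = np.arctan2(u - cx, fx)       # left/right angle from the optical axis
    elevation = np.arctan2(-(v - cy), fy)  # up/down angle from the optical axis
    return azimuth, elevation

# Example with illustrative intrinsics for a 640x480 camera.
az, el = pixel_to_bearings(400, 180, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
print(np.degrees(az), np.degrees(el))
```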
Artificial intelligence for autonomous 6D pose estimation makes valuable contributions to agricultural practices such as fruit picking, harvesting, remote operation, and other contact-related applications. Conventionally, approaches based on Convolutional Neural Networks (CNNs) are adopted for pose estimation, but precision agriculture demands higher accuracy at lower computational cost for real-time applications. Motivated by this, a novel transformer-based architecture called TransPose is proposed: an improved Transformer-based 6D pose estimator with depth refinement. Additional modalities often yield higher accuracy at the expense of computational cost; TransPose instead takes a single RGB image as input, with a lightweight depth estimation network incorporated into the model to estimate depth from the RGB image using a feature pyramid with an up-sampling method. The transformer regresses the 6D pose directly and also outputs object patches, and the estimated depth and patches are used to further refine the regressed pose. The performance of the model is extensively assessed and compared with state-of-the-art methods. As part of this research, a first-ever fruit-oriented 6D pose dataset was acquired.
Lastly, a seamless teleoperation pipeline that interfaces virtual reality with robots for precision agriculture tasks is proposed to pave the way for virtual agriculture. It uses the TransPose model to estimate the 6D pose of a fruit and render it in a virtual reality environment. A robotic manipulator is then controlled from within the virtual reality environment to pick or harvest the fruit while being guided by the TransPose model. The robustness of the pipeline is tested in simulation, and a real-time implementation with a physical robotic manipulator is also investigated.
Enabling Multi-LiDAR Sensing in GNSS-Denied Environments: SLAM Dataset, Benchmark, and UAV Tracking with LiDAR-as-a-camera
The rise of Light Detection and Ranging (LiDAR) sensors has profoundly impacted industries ranging from automotive to urban planning. As these sensors become increasingly affordable and compact, their applications are diversifying, driving precision and innovation. This thesis delves into LiDAR's advancements in autonomous robotic systems, with a focus on its role in simultaneous localization and mapping (SLAM) methodologies and on LiDAR-as-a-camera tracking of Unmanned Aerial Vehicles (UAVs).
Our contributions span two primary domains: the Multi-Modal LiDAR SLAM Benchmark and LiDAR-as-a-camera UAV Tracking. In the former, we expand our previous multi-modal LiDAR dataset by adding more data sequences from various scenarios. In contrast to the previous dataset, we employ different ground-truth-generation approaches: we propose a new multi-modal, multi-LiDAR, SLAM-assisted and ICP-based sensor fusion method for generating ground-truth maps, and we supplement the data with new open-road sequences with GNSS-RTK. This enriched dataset, supported by high-resolution LiDAR, provides detailed insights through an evaluation of ten configurations pairing diverse LiDAR sensors with state-of-the-art SLAM algorithms. In the latter contribution, we leverage a custom YOLOv5 model trained on panoramic low-resolution images formed from LiDAR reflectivity (LiDAR-as-a-camera) to detect UAVs, demonstrating the superiority of this approach over point-cloud or image-only methods. We also evaluate the real-time performance of our approach on the Nvidia Jetson Nano, a popular mobile computing platform.
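As a minimal sketch of the LiDAR-as-a-camera idea, the code below spherically projects a point cloud with per-point reflectivity into a panoramic image of the kind a detector such as YOLOv5 could then be run on; the vertical field of view, image resolution, and random data are assumed values, not the thesis' exact pipeline.

```python
import numpy as np

def lidar_to_panorama(points, reflectivity, h=64, w=1024, fov_up=16.6, fov_down=-16.6):
    """Project a 3-D point cloud into a panoramic reflectivity image
    (the 'LiDAR-as-a-camera' view), assuming a spinning sensor with the
    given vertical field of view in degrees.

    points:       (N, 3) array of x, y, z coordinates in the sensor frame.
    reflectivity: (N,) array of per-point reflectivity values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-9
    yaw = np.arctan2(y, x)                          # horizontal angle
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))    # vertical angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)

    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).astype(int)
    v = np.clip(v, 0, h - 1)

    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = reflectivity                      # last point wins per pixel
    return image

pts = np.random.randn(2048, 3) * 5.0
print(lidar_to_panorama(pts, np.random.rand(2048)).shape)
```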
Overall, our research underscores the transformative potential of integrating advanced LiDAR sensors with autonomous robotics. By bridging the gaps between different technological approaches, we pave the way for more versatile and efficient applications in the future.