Multi-modal Experts Network for Autonomous Driving
End-to-end learning from sensory data has shown promising results in
autonomous driving. While employing many sensors enhances world perception and
should lead to more robust and reliable behavior of autonomous vehicles, it is
challenging to train and deploy such a network, and at least two problems are
encountered in the considered setting. The first is the increase in
computational complexity with the number of sensing devices. The other is the
phenomenon of the network overfitting to the simplest and most informative input. We
address both challenges with a novel, carefully tailored multi-modal experts
network architecture and propose a multi-stage training procedure. The network
contains a gating mechanism, which selects the most relevant input at each
inference time step using a mixed discrete-continuous policy. We demonstrate
the plausibility of the proposed approach on our 1/6 scale truck equipped with
three cameras and one LiDAR.
Comment: Published at the International Conference on Robotics and Automation (ICRA), 202
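As a rough illustration of the gating idea described above (not the authors' implementation), the sketch below combines per-modality expert outputs with a softmax gate and can hard-select a single expert, mimicking the discrete half of a mixed discrete-continuous policy; all names, shapes, and values are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gated_experts(expert_outputs, gate_logits, hard=False):
    """Combine per-modality expert outputs with a gating policy.

    expert_outputs: (n_experts, d) array, one output vector per sensing modality.
    gate_logits:    (n_experts,) relevance scores produced by a gating network.
    hard=True picks a single expert (discrete selection); hard=False mixes
    all experts with continuous softmax weights.
    """
    weights = softmax(np.asarray(gate_logits, dtype=float))
    if hard:
        selected = np.zeros_like(weights)
        selected[np.argmax(weights)] = 1.0
        weights = selected
    return weights @ np.asarray(expert_outputs, dtype=float)

# Toy example: three camera experts and one LiDAR expert, 2-D control output.
outputs = np.random.randn(4, 2)
print(gated_experts(outputs, [0.2, 1.5, -0.3, 0.7], hard=True))
```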
Robust Deep Multi-Modal Sensor Fusion using Fusion Weight Regularization and Target Learning
Sensor fusion has wide applications in many domains including health care and
autonomous systems. While the advent of deep learning has enabled promising
multi-modal fusion of high-level features and end-to-end sensor fusion
solutions, existing deep learning based sensor fusion techniques including deep
gating architectures are not always resilient, leading to the issue of fusion
weight inconsistency. We propose deep multi-modal sensor fusion architectures
with enhanced robustness, particularly in the presence of sensor failures. At
the core of our gating architectures are fusion weight regularization and
fusion target learning operating on auxiliary unimodal sensing networks
appended to the main fusion model. The proposed regularized gating
architectures outperform the existing deep learning architectures with and
without gating under both clean and corrupted sensory inputs resulting from
sensor failures. The demonstrated improvements are particularly pronounced when
one or more sensory modalities are corrupted.
Comment: 8 pages
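A minimal sketch of one plausible form of fusion weight regularization (an assumed formulation, not the paper's exact loss): the fusion model's gate weights are pulled toward target weights derived from auxiliary unimodal networks, so that an unreliable or failed modality is not trusted heavily. The function name and parameters below are illustrative.

```python
import numpy as np

def fusion_weight_regularizer(gate_weights, unimodal_losses, beta=1.0, lam=0.1):
    """Penalize gating weights that disagree with unimodal evidence.

    gate_weights:    (n_modalities,) softmax weights produced by the fusion gate.
    unimodal_losses: (n_modalities,) losses of auxiliary single-sensor networks;
                     a high loss suggests that modality is unreliable (e.g. failed).
    Target weights down-weight unreliable modalities via a softmax over -beta*loss;
    the return value is lam * squared distance between gate and target weights.
    """
    scores = -beta * np.asarray(unimodal_losses, dtype=float)
    target = np.exp(scores - scores.max())
    target /= target.sum()
    return lam * float(np.sum((np.asarray(gate_weights, dtype=float) - target) ** 2))

# Example: modality 2 has a large unimodal loss (possibly a failed sensor),
# so a gate that still trusts it heavily incurs a larger penalty.
print(fusion_weight_regularizer([0.2, 0.6, 0.2], [0.4, 2.5, 0.5]))
```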
Multimodal End-to-End Learning for Autonomous Steering in Adverse Road and Weather Conditions
Autonomous driving is challenging in adverse road and weather conditions in which there might not be lane lines, the road might be covered in snow, and visibility might be poor. We extend previous work on end-to-end learning for autonomous steering to operate in these adverse real-life conditions with multimodal data. We collected 28 hours of driving data in several road and weather conditions and trained convolutional neural networks to predict the car's steering wheel angle from front-facing color camera images and lidar range and reflectance data. We compared the CNN models' performance across the different modalities, and our results show that the lidar modality improves the performance of the multimodal sensor-fusion models. We also performed on-road tests with different models, and the results support this observation.
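To make the fusion setup concrete, here is a toy late-fusion sketch in the spirit of the camera-plus-lidar steering networks described above; the linear "branches" stand in for the CNN encoders, and all feature sizes and weights are illustrative assumptions rather than the trained models from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(features, weights):
    """One sensor branch: a linear layer with ReLU, standing in for a CNN encoder."""
    return np.maximum(0.0, features @ weights)

# Illustrative feature sizes for flattened camera and lidar (range + reflectance) inputs.
w_cam = rng.standard_normal((128, 32))
w_lidar = rng.standard_normal((64, 32))
w_head = rng.standard_normal(64)

def predict_steering(cam_features, lidar_features):
    """Late fusion: encode each modality, concatenate, regress a steering angle."""
    fused = np.concatenate([branch(cam_features, w_cam), branch(lidar_features, w_lidar)])
    return float(fused @ w_head)

print(predict_steering(rng.standard_normal(128), rng.standard_normal(64)))
```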
Challenges and solutions for autonomous ground robot scene understanding and navigation in unstructured outdoor environments: A review
The capabilities of autonomous mobile robotic systems have been steadily improving due to recent advancements in computer science, engineering, and related disciplines such as cognitive science. In controlled environments, robots have achieved relatively high levels of autonomy. In more unstructured environments, however, the development of fully autonomous mobile robots remains challenging due to the complexity of understanding these environments. Many autonomous mobile robots use classical, learning-based or hybrid approaches for navigation. More recent learning-based methods may replace the complete navigation pipeline or selected stages of the classical approach. For effective deployment, autonomous robots must understand their external environments at a sophisticated level according to their intended applications. Therefore, in addition to robot perception, scene analysis and higher-level scene understanding (e.g., traversable/non-traversable, rough or smooth terrain, etc.) are required for autonomous robot navigation in unstructured outdoor environments. This paper provides a comprehensive review and critical analysis of these methods in the context of their applications to the problems of robot perception and scene understanding in unstructured environments and the related problems of localisation, environment mapping and path planning. State-of-the-art sensor fusion methods and multimodal scene understanding approaches are also discussed and evaluated within this context. The paper concludes with an in-depth discussion regarding the current state of the autonomous ground robot navigation challenge in unstructured outdoor environments and the most promising future research directions to overcome these challenges.
Artificial Intelligence based Robotic Platforms for Autonomous Precision Agriculture
As robotic applications continuously expand into every aspect of human life, it becomes paramount to leverage this trend for precision agriculture. The agricultural sector, despite its importance, has been slow to adopt new technology. The crude, manual processes conventionally used in agriculture have severe economic and social impacts. The inefficiency and low productivity of these methods result in food waste amid food shortages, inconsistencies, time consumption, higher labour expenses, and low yields. The world stands to benefit from automating agricultural processes. To address this, it becomes necessary to build on existing platforms and develop intelligent autonomous vehicles for precision agriculture, including intelligent drones, intelligent ground robots, and other systems working cooperatively. To achieve this, we leverage Artificial Intelligence (AI) and mathematical methods to impart sufficient intelligence to robotic platforms to make them suitable for precision agriculture.
This thesis explores the capabilities of AI for weed classification and detection, weed relative position estimation, fruit 6D pose estimation, and virtual reality for teleoperated systems in fruit picking. Weed infestation diminishes crop yields, and deep learning is becoming an increasingly popular approach for identifying weeds on farmland. However, precision agriculture requires that the object of interest (the weed) be precisely classified and detected to facilitate removal or spraying. An approach is presented that cascades a classification network (ResNet-50) with a detection network (YOLO) for weed classification and detection, which we term Fused-YOLO. Weeds can thus be precisely located and classified by type within an image frame.
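A schematic sketch of the detect-then-classify cascade underlying the Fused-YOLO idea: the stub detector and classifier below are placeholders for YOLO and ResNet-50, and the box format, threshold, and weed labels are illustrative assumptions, not the thesis' trained models.

```python
import numpy as np

def detect_weeds(image):
    """Stand-in for a YOLO-style detector: returns (x, y, w, h, score) boxes.
    Here it simply emits a fixed box so the pipeline runs end to end."""
    h, w = image.shape[:2]
    return [(w // 4, h // 4, w // 2, h // 2, 0.9)]

def classify_crop(crop):
    """Stand-in for a ResNet-50-style classifier: returns a weed-type label."""
    return "broadleaf" if crop.mean() > 0.5 else "grass"

def fused_yolo(image, score_thresh=0.5):
    """Cascade: detect candidate weeds, then classify each detected crop.
    This mirrors the detect-then-classify idea described above, not the exact model."""
    results = []
    for (x, y, w, h, score) in detect_weeds(image):
        if score < score_thresh:
            continue
        crop = image[y:y + h, x:x + w]
        results.append(((x, y, w, h), classify_crop(crop), score))
    return results

print(fused_yolo(np.random.rand(480, 640, 3)))
```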
Inspired by the precision of this detection model, the work extends to a novel monocular vision-based approach for drones to detect multiple types of weeds and estimate their positions autonomously for precision agriculture applications. A drone flies an elliptical trajectory while acquiring images from an onboard monocular camera, and the images are fed to the Fused-YOLO model in real time. The centre of each detection bounding box is taken as the centre of the detected weed. These centre pixels are converted into world coordinates, forming azimuth and elevation angles from the target to the UAV, which are used in an estimation scheme based on the Unscented Kalman Filter to estimate the relative positions of the weeds. The robustness of this algorithm allows for both indoor and outdoor implementation while achieving competitive results with affordable off-the-shelf sensors.
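For illustration, the conversion from a bounding-box centre pixel to bearing angles can be sketched with an ideal, distortion-free pinhole camera model; this is an assumption for the example only, and the thesis' exact geometry and the full Unscented Kalman Filter update are not reproduced here. The intrinsic values below are placeholders.

```python
import numpy as np

def pixel_to_bearings(u, v, fx, fy, cx, cy):
    """Convert a bounding-box centre pixel (u, v) to azimuth/elevation angles
    (radians) in the camera frame, assuming an ideal pinhole model.
    Bearing measurements of this kind, collected over an orbiting trajectory,
    are what a filter such as the UKF can fuse to triangulate a weed's position."""
    azimuth = np.arctan2(u - cx, fx)       # left/right angle from the optical axis
    elevation = np.arctan2(-(v - cy), fy)  # up/down angle from the optical axis
    return azimuth, elevation

# Example with illustrative intrinsics for a 640x480 camera.
az, el = pixel_to_bearings(400, 180, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
print(np.degrees(az), np.degrees(el))
```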
Artificial intelligence for autonomous 6D pose estimation makes valuable contributions to agricultural practices such as fruit picking, harvesting, remote operation, and other contact-related applications. Conventionally, approaches based on Convolutional Neural Networks (CNNs) are adopted for pose estimation, but precision agriculture demands higher accuracy at lower computational cost for real-time applications. Motivated by this, a novel transformer-based architecture called TransPose is proposed: an improved Transformer-based 6D pose estimator with depth refinement. Additional modalities often yield higher accuracy at the expense of computational cost; TransPose instead takes a single RGB image as input, with a lightweight depth estimation network incorporated into the model to estimate depth from the RGB image using a feature pyramid with an up-sampling method. The transformer regresses the 6D pose directly and also outputs object patches, and the estimated depth and patches are used to further refine the regressed pose. The performance of the model is extensively assessed and compared with state-of-the-art methods. As part of this research, a first-ever fruit-oriented 6D pose dataset was acquired.
Lastly, a seamless teleoperation pipeline that interfaces virtual reality with robots for precision agriculture tasks is proposed to pave the way for virtual agriculture. It uses the TransPose model to estimate the 6D pose of a fruit and render it in a virtual reality environment. A robotic manipulator is then controlled from within the virtual reality environment to pick or harvest the fruit while being guided by the TransPose model. The robustness of the pipeline is tested in simulation, and a real-time implementation with a physical robotic manipulator is also investigated.
Enabling Multi-LiDAR Sensing in GNSS-Denied Environments: SLAM Dataset, Benchmark, and UAV Tracking with LiDAR-as-a-camera
The rise of Light Detection and Ranging (LiDAR) sensors has profoundly impacted industries ranging from automotive to urban planning. As these sensors become increasingly affordable and compact, their applications are diversifying, driving precision and innovation. This thesis delves into LiDAR's advancements in autonomous robotic systems, with a focus on its role in simultaneous localization and mapping (SLAM) methodologies and on LiDAR-as-a-camera tracking of Unmanned Aerial Vehicles (UAVs).
Our contributions span two primary domains: the Multi-Modal LiDAR SLAM Benchmark and LiDAR-as-a-camera UAV Tracking. In the former, we expand our previous multi-modal LiDAR dataset by adding more data sequences from various scenarios. In contrast to the previous dataset, we employ different ground-truth-generation approaches: we propose a new multi-modal, multi-LiDAR, SLAM-assisted and ICP-based sensor fusion method for generating ground-truth maps, and we supplement the data with new open-road sequences with GNSS-RTK. This enriched dataset, supported by high-resolution LiDAR, provides detailed insights through an evaluation of ten configurations pairing diverse LiDAR sensors with state-of-the-art SLAM algorithms. In the latter contribution, we leverage a custom YOLOv5 model trained on panoramic low-resolution images formed from LiDAR reflectivity (LiDAR-as-a-camera) to detect UAVs, demonstrating the superiority of this approach over point-cloud or image-only methods. We also evaluate the real-time performance of our approach on the Nvidia Jetson Nano, a popular mobile computing platform.
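As a minimal sketch of the LiDAR-as-a-camera idea, the code below spherically projects a point cloud with per-point reflectivity into a panoramic image of the kind a detector such as YOLOv5 could then be run on; the vertical field of view, image resolution, and random data are assumed values, not the thesis' exact pipeline.

```python
import numpy as np

def lidar_to_panorama(points, reflectivity, h=64, w=1024, fov_up=16.6, fov_down=-16.6):
    """Project a 3-D point cloud into a panoramic reflectivity image
    (the 'LiDAR-as-a-camera' view), assuming a spinning sensor with the
    given vertical field of view in degrees.

    points:       (N, 3) array of x, y, z coordinates in the sensor frame.
    reflectivity: (N,) array of per-point reflectivity values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-9
    yaw = np.arctan2(y, x)                          # horizontal angle
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))    # vertical angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)

    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).astype(int)
    v = np.clip(v, 0, h - 1)

    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = reflectivity                      # last point wins per pixel
    return image

pts = np.random.randn(2048, 3) * 5.0
print(lidar_to_panorama(pts, np.random.rand(2048)).shape)
```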
Overall, our research underscores the transformative potential of integrating advanced LiDAR sensors with autonomous robotics. By bridging the gaps between different technological approaches, we pave the way for more versatile and efficient applications in the future.