    Visual Guidance for Unmanned Aerial Vehicles with Deep Learning

    Unmanned Aerial Vehicles (UAVs) have been widely applied in the military and civilian domains. In recent years, the operation mode of UAVs has been evolving from teleoperation to autonomous flight. Autonomous flight requires a reliable guidance system. The combination of the Global Positioning System (GPS) and an Inertial Navigation System (INS) cannot sustain autonomous flight in situations where GPS is degraded or unavailable. For this reason, computer vision has been widely explored as a primary method for UAV guidance. Moreover, GPS provides the robot with no information about the presence of obstacles. Stereo cameras have a complex architecture and need a minimum baseline to generate a disparity map. By contrast, monocular cameras are simple and require fewer hardware resources. Thanks to state-of-the-art Deep Learning (DL) techniques, especially Convolutional Neural Networks (CNNs), a monocular camera is sufficient to infer mid-level visual representations of the environment, such as depth maps and optical flow (OF) maps. Therefore, the objective of this thesis is to develop a real-time visual guidance method for UAVs in cluttered environments using a monocular camera and DL. The thesis performs three major tasks. The first is investigating developments in DL techniques and monocular depth estimation (MDE). The second is developing real-time CNNs for MDE. The third is developing visual guidance methods based on the resulting MDE system. A comprehensive survey is conducted, covering Structure from Motion (SfM)-based methods, traditional handcrafted feature-based methods, and state-of-the-art DL-based methods. More importantly, it also investigates the application of MDE in robotics. Based on the survey, two CNNs for MDE are developed. In addition to promising accuracy, these two CNNs run at high frame rates (126 fps and 90 fps, respectively) on a single modest-power Graphics Processing Unit (GPU).
As regards the third task, visual guidance for UAVs is first developed on top of the designed MDE networks. To improve the robustness of UAV guidance, OF maps are integrated into the developed visual guidance method. A cross-attention module fuses the features learned from the depth maps and OF maps. The fused features are then passed through a deep reinforcement learning (DRL) network to generate the policy that guides the UAV's flight. Additionally, a simulation framework is developed that integrates AirSim, Unreal Engine and PyTorch. The effectiveness of the developed visual guidance method is validated through extensive experiments in this simulation framework.
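
The abstract does not describe the internals of the cross-attention module. The sketch below is therefore only a hedged illustration of generic scaled dot-product cross-attention. In it, depth-map feature tokens act as queries and optical-flow feature tokens supply the keys and values. The token layout, the projection matrices `w_q`, `w_k`, `w_v`, and all function names are assumptions rather than the thesis's implementation. It is written in dependency-free Python for readability instead of PyTorch.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain row-by-column product; zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum for numerical stability.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(depth_tokens: Matrix, flow_tokens: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical fusion step: queries from depth features, keys/values from flow features."""
    q = matmul(depth_tokens, w_q)  # (N_depth, d)
    k = matmul(flow_tokens, w_k)   # (N_flow, d)
    v = matmul(flow_tokens, w_v)   # (N_flow, d_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)      # (N_depth, d_v): one fused vector per depth token


# Usage with identity projections and toy 2-D tokens.
I2 = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], I2, I2, I2, I2)
print(len(fused), len(fused[0]))  # prints 3 2
```

Each depth token receives a weighted mixture of flow features, so the output keeps the depth branch's token count. That property lets a downstream policy network consume the fused features in place of the depth features alone.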

    Portable Robotic Navigation Aid for the Visually Impaired

    This dissertation aims to address the limitations of existing visual-inertial (VI) SLAM methods, namely their lack of the robustness and accuracy needed for assistive navigation in large indoor spaces. Several improvements are made to existing SLAM technology. The improved methods enable two robotic assistive devices for the visually impaired, a robot cane and a robotic object manipulation aid, which support assistive wayfinding and object detection/grasping. First, depth measurements are incorporated into the optimization process for device pose estimation. This improves the success rate of VI SLAM's initialization and reduces scale drift. The improved method, called depth-enhanced visual-inertial odometry (DVIO), initializes itself immediately because the environment's metric scale can be derived from the depth data. Second, a hybrid PnP (perspective-n-point) method is introduced. It uses the 3D data from both camera frames to estimate the pose change between them more accurately. Third, to implement DVIO on a smartphone with variable camera intrinsic parameters (CIP), a method called CIP-VMobile is devised to simultaneously estimate the intrinsic parameters and motion states of the camera. The CIP vary with the smartphone's pose because of the camera's optical image stabilization mechanism. By estimating them in real time, CIP-VMobile yields more accurate device pose estimates. Various experiments are performed to validate the VI-SLAM methods with the two robotic assistive devices. Beyond these primary objectives, SM-SLAM is proposed as a potential extension of existing SLAM methods to dynamic environments. This forward-looking exploration is premised on the idea that adding dynamic object detection to the front end could improve SLAM's overall accuracy and robustness. Various experiments have been conducted on both public and self-collected datasets to validate the efficacy of this newly proposed method.
The results obtained substantiate the viability of this innovation, leaving a deeper investigation for future work.
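
The abstract states that DVIO can initialize immediately because depth data reveals metric scale. It does not give DVIO's formulation, so the following is only a hedged sketch of the underlying geometric fact. Under an ideal pinhole camera (an assumption here), a pixel and its measured depth back-project to a 3D point in metres. Distances between such points are therefore metric without any scale-estimation phase. The intrinsics `FX`, `FY`, `CX`, `CY` and the function name are hypothetical.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Lift pixel (u, v) with depth measured along the optical axis (metres)
    to a 3D point in the camera frame, assuming an ideal pinhole camera."""
    if depth <= 0.0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics for a 640x480 camera (not taken from the dissertation).
FX = FY = 500.0
CX, CY = 320.0, 240.0

# Two pixels 50 px apart, both observed at 2 m depth.
p = back_project(320.0, 240.0, 2.0, FX, FY, CX, CY)
q = back_project(370.0, 240.0, 2.0, FX, FY, CX, CY)
print(round(math.dist(p, q), 6))  # prints 0.2 (metres)
```

With a monocular VI system, the same two pixels would give only a direction, and scale would have to be inferred from IMU excitation over time.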

    Dense Visual Simultaneous Localisation and Mapping in Collaborative and Outdoor Scenarios

    Dense visual simultaneous localisation and mapping (SLAM) systems can produce 3D reconstructions that are digital facsimiles of the physical space they describe. Systems that can produce dense maps with this level of fidelity in real time provide foundational spatial reasoning capabilities for many downstream tasks in autonomous robotics. Over the past 15 years, one of the central focuses of dense visual SLAM research has been mapping small-scale indoor environments, such as desks and buildings, with a single, slow-moving, hand-held sensor. However, most dense visual SLAM systems exhibit a number of limitations that prevent them from being applied directly in collaborative or outdoor settings. This thesis addresses these limitations by developing new systems and algorithms for collaborative dense mapping, efficient dense alternation, and outdoor operation with fast camera motion and wide field of view (FOV) cameras. We use ElasticFusion, a state-of-the-art dense SLAM system, as our starting point and implement each of these contributions as a novel extension to it. We first present a collaborative dense SLAM system in which a number of cameras, starting from unknown initial relative positions, each maintain a local map with the original ElasticFusion algorithm. Visual place recognition across local maps yields constraints that allow the maps to be aligned into a common global reference frame. This facilitates collaborative mapping and the tracking of multiple cameras within a shared map. Within dense alternation-based SLAM systems, the standard approach is to fuse every frame into the dense model. This is done without considering whether the information the frame contains is already captured by the dense map and is therefore redundant. As the number of cameras or the scale of the map increases, this approach becomes inefficient.
In our second contribution, we address this inefficiency by introducing a novel information-theoretic approach to keyframe selection that allows the system to avoid processing redundant information. We implement the procedure within ElasticFusion and demonstrate a marked reduction in the number of frames the system requires to estimate an accurate, denoised surface reconstruction. Before dense SLAM techniques can be applied in outdoor scenarios, we must first address their reliance on active depth cameras and their unsuitability for fast camera motion. In our third contribution, we present an outdoor dense SLAM system. The system overcomes the need for an active sensor by employing neural-network-based depth inference to predict the geometry of the scene as it appears in each image. To handle camera tracking during fast motion, we employ a hybrid architecture that combines elements of both dense and sparse SLAM systems to perform camera tracking and achieve globally consistent dense mapping. Automotive applications present a particularly important setting for dense visual SLAM systems. Such applications are characterised by their use of wide-FOV cameras, which the standard pinhole camera model does not accurately describe. The fourth contribution of this thesis extends the above hybrid sparse-dense monocular SLAM system to large-FOV fisheye imagery. This is achieved by reformulating the mapping pipeline in terms of the Kannala-Brandt fisheye camera model. To estimate depth, we introduce a new version of the PackNet depth estimation neural network (Guizilini et al., 2020) adapted for fisheye inputs. To demonstrate the effectiveness of our contributions, we present experimental results on the synthetic ICL-NUIM dataset of Handa et al. (2014) as well as the real-world TUM-RGBD dataset of Sturm et al. (2012).
For outdoor SLAM, we show the results of our system on the autonomous driving datasets KITTI and KITTI-360 of Geiger et al. (2012a) and Liao et al. (2021), respectively.
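
The Kannala-Brandt model named above replaces the pinhole mapping with an odd polynomial in the ray's angle θ from the optical axis. Under the pinhole mapping, the image radius grows with tan θ and diverges as a ray approaches 90° off-axis. The following is a minimal sketch of the projection in the four-coefficient form also used by OpenCV's `cv::fisheye` module, with skew omitted. The function name and parameter names are illustrative assumptions. The thesis's own parameterization, and its unprojection step, may differ.

```python
import math
from typing import Sequence, Tuple


def kb_project(point: Sequence[float], fx: float, fy: float, cx: float, cy: float,
               k: Sequence[float] = (0.0, 0.0, 0.0, 0.0)) -> Tuple[float, float]:
    """Project a camera-frame point (x, y, z) with the Kannala-Brandt model:
    theta_d = theta * (1 + k1*theta^2 + k2*theta^4 + k3*theta^6 + k4*theta^8)."""
    x, y, z = point
    r = math.hypot(x, y)
    if r == 0.0:
        # Ray along the optical axis (assumes z > 0) lands on the principal point.
        return (cx, cy)
    theta = math.atan2(r, z)  # angle to the optical axis; finite even at 90 degrees
    t2 = theta * theta
    k1, k2, k3, k4 = k
    theta_d = theta * (1.0 + k1 * t2 + k2 * t2 ** 2 + k3 * t2 ** 3 + k4 * t2 ** 4)
    # Scale the unit direction (x/r, y/r) in the image plane by the distorted angle.
    return (fx * theta_d * x / r + cx, fy * theta_d * y / r + cy)


# Usage: a ray exactly 90 degrees off-axis, where a pinhole model would divide by zero.
print(kb_project((1.0, 0.0, 0.0), 300.0, 300.0, 640.0, 480.0))
```

With all coefficients zero the model reduces to the equidistant projection. A ray 90° off-axis therefore lands at a finite radius of f·π/2 pixels.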

    Simultaneous Localization and Mapping (SLAM) for Autonomous Driving: Concept and Analysis

    The Simultaneous Localization and Mapping (SLAM) technique has made astonishing progress over the last few decades and has generated considerable interest in the autonomous driving community. With its conceptual roots in navigation and mapping, SLAM outperforms some traditional positioning and localization techniques. It can support more reliable and robust localization, planning, and control, and so meets some key criteria for autonomous driving. In this study, the authors first give an overview of the different SLAM implementation approaches. They then discuss the applications of SLAM for autonomous driving with respect to different driving scenarios, vehicle system components, and the characteristics of the SLAM approaches. Next, the authors discuss some challenging issues, and current solutions, in applying SLAM to autonomous driving. Quantitative quality-analysis methods for evaluating the characteristics and performance of SLAM systems and for monitoring the risk in SLAM estimation are also reviewed. In addition, this study describes a real-world road test demonstrating a modernized multi-sensor SLAM procedure for autonomous driving. The numerical results show that the SLAM procedure, integrating Lidar and GNSS/INS, can generate a high-precision 3D point cloud map. An online localization solution with 4–5 cm accuracy can be achieved based on this pre-generated map and online Lidar scan matching with a tightly fused inertial system.

    Intelligent in-vehicle interaction technologies

    With rapid advances in the field of autonomous vehicles (AVs), the ways in which human–vehicle interaction (HVI) will take place inside the vehicle have attracted major interest. As a result, intelligent interiors are being explored to improve user experience, acceptance, and trust. This is also fueled by parallel research in areas such as perception and control of robots, safe human–robot interaction, wearable systems, and the underpinning flexible/printed electronics technologies. Some of these are being carried over to AVs. A growing number of sensor networks are being integrated into vehicles for multimodal interaction. These networks aim to draw correct inferences from the user's communicative cues and to vary the interaction dynamics depending on the user's cognitive state and the driving context. In response to this growing trend, this timely article presents a comprehensive review of the technologies being used or developed to perceive users' intentions for natural and intuitive in-vehicle interaction. The challenges that must be overcome to attain truly interactive AVs are discussed together with their potential solutions, along with various new avenues for future research.

    Forum Bildverarbeitung 2020

    Image processing plays a key role in fast and contact-free data acquisition in many technical areas, e.g., in quality control or robotics. These conference proceedings contain the contributed articles of the “Forum Bildverarbeitung”. The forum took place on 26–27 November 2020 in Karlsruhe as a joint event of the Karlsruhe Institute of Technology and the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation.

    Depth Estimation Using 2D RGB Images

    Single-image depth estimation is an ill-posed problem: it is not mathematically possible to uniquely recover the third dimension (depth) from a single 2D image. Hence, additional constraints need to be incorporated to restrict the solution space. Accordingly, the first part of this dissertation explores constraining the model for more accurate depth estimation. The constraint exploits the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene. Deep-learning-based methods are very successful in computer vision and handle noise very well. However, they generalize poorly when the training and test distributions differ. Geometric methods, by contrast, do not have this generalization problem, since they exploit temporal information in an unsupervised manner; they are, however, sensitive to noise. At the same time, explicitly modeling dynamic scenes and flexible objects is a major challenge for traditional computer vision methods. Weighing the advantages and disadvantages of each approach, this dissertation proposes a hybrid method that benefits from both. It extends traditional geometric models so that they can handle flexible and dynamic objects in the scene. This is made possible by relaxing the geometric rule of one motion model for each region of the scene to one motion model per pixel. This enables the model to detect even small, flexible, floating debris in a dynamic scene. However, it makes the optimization under-constrained. To make the optimization over-constrained again while preserving the model's flexibility, a “moving object detection loss” and a “synchrony loss” are designed. The algorithm is trained in an unsupervised fashion. The preliminary results are not yet competitive with the current state of the art. Because training is so slow, a thorough comparison with the state of the art is also difficult.
The algorithm also lacks stability, and the optical flow model is extremely noisy and naive. Finally, some solutions are suggested to address these issues.
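
The abstract does not give the exact form of its edge-based constraint. A common way to express the idea that depth discontinuities should coincide with image edges is the edge-aware smoothness term, used widely in unsupervised monocular depth estimation. The sketch below implements that generic term, not the dissertation's loss. It uses small grayscale images stored as nested lists, an assumption made for readability, and the function name is hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def edge_aware_smoothness(depth: Image, gray: Image) -> float:
    """Mean absolute depth difference between neighbouring pixels, each term
    down-weighted by exp(-|image difference|) so depth jumps at image edges cost little."""
    h, w = len(depth), len(depth[0])
    total, count = 0.0, 0
    for i in range(h):
        for j in range(w - 1):  # horizontal neighbour pairs
            total += abs(depth[i][j + 1] - depth[i][j]) * math.exp(-abs(gray[i][j + 1] - gray[i][j]))
            count += 1
    for i in range(h - 1):
        for j in range(w):  # vertical neighbour pairs
            total += abs(depth[i + 1][j] - depth[i][j]) * math.exp(-abs(gray[i + 1][j] - gray[i][j]))
            count += 1
    return total / count if count else 0.0


# A depth step that coincides with an image edge is penalised far less
# than the same step inside a textureless image region.
depth = [[1.0, 1.0, 5.0, 5.0], [1.0, 1.0, 5.0, 5.0]]
gray_edge = [[0.0, 0.0, 10.0, 10.0], [0.0, 0.0, 10.0, 10.0]]
gray_flat = [[0.0] * 4, [0.0] * 4]
print(edge_aware_smoothness(depth, gray_edge) < edge_aware_smoothness(depth, gray_flat))  # prints True
```

Minimising such a term narrows the solution space in the way the abstract describes. Depth is encouraged to stay smooth except where the RGB image itself shows a geometric edge.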

    UAV or Drones for Remote Sensing Applications in GPS/GNSS Enabled and GPS/GNSS Denied Environments

    The design of novel UAV systems and the use of UAV platforms integrated with robotic sensing and imaging techniques, together with the development of processing workflows and the capacity of ultra-high temporal and spatial resolution data, have enabled a rapid uptake of UAVs and drones across several industries and application domains. This book provides a forum for high-quality peer-reviewed papers that broaden awareness and understanding of single- and multiple-UAV developments for remote sensing applications. It also covers associated developments in sensor technology, data processing and communications, and UAV system design and sensing capabilities. These developments span GPS-enabled and, more broadly, Global Navigation Satellite System (GNSS)-enabled environments, as well as GPS/GNSS-denied ones. Contributions include: UAV-based photogrammetry, laser scanning, multispectral imaging, hyperspectral imaging, and thermal imaging; UAV sensor applications; spatial ecology; pest detection; reef; forestry; volcanology; precision agriculture; wildlife species tracking; search and rescue; target tracking; atmosphere monitoring; chemical, biological, and natural disaster phenomena; fire prevention, flood prevention; volcanic monitoring; pollution monitoring; microclimates; and land use; wildlife and target detection and recognition from UAV imagery using deep learning and machine learning techniques; and UAV-based change detection.

    Vision-based localization methods under GPS-denied conditions

    This paper reviews vision-based localization methods in GPS-denied environments and classifies the mainstream methods into Relative Vision Localization (RVL) and Absolute Vision Localization (AVL). For RVL, we discuss the broad application of optical flow in feature-extraction-based Visual Odometry (VO) solutions and introduce advanced optical flow estimation methods. For AVL, we review recent advances in Visual Simultaneous Localization and Mapping (VSLAM) techniques, from optimization-based methods to Extended Kalman Filter (EKF)-based methods. We also introduce the application of offline map registration and lane vision detection schemes to achieve AVL. This paper compares the performance and applications of mainstream methods for visual localization and provides suggestions for future studies. Comment: 32 pages, 15 figures.
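
The abstract mentions optical flow as a building block of feature-based VO. The classic Lucas-Kanade estimator is a useful concrete reference point. It solves the brightness-constancy constraint $I_x u + I_y v + I_t = 0$ in the least-squares sense over a small window. This is a generic textbook method and is not attributed to the review. The sketch takes precomputed spatial and temporal gradients for one window as flat lists, which is an assumption, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def lucas_kanade_window(ix: Sequence[float], iy: Sequence[float],
                        it: Sequence[float]) -> Tuple[float, float]:
    """Least-squares flow (u, v) for one window from its image gradients.
    Solves [[a, b], [b, c]] @ [u, v] = [p, q] (the 2x2 normal equations)."""
    a = sum(gx * gx for gx in ix)
    b = sum(gx * gy for gx, gy in zip(ix, iy))
    c = sum(gy * gy for gy in iy)
    p = -sum(gx * gt for gx, gt in zip(ix, it))
    q = -sum(gy * gt for gy, gt in zip(iy, it))
    det = a * c - b * b
    if abs(det) < 1e-12:
        # Gradients all point one way (e.g. a straight edge): the aperture problem.
        raise ValueError("ill-conditioned window")
    return ((c * p - b * q) / det, (a * q - b * p) / det)


# Synthesise temporal gradients consistent with a true flow of (1.5, -0.5).
ix = [1.0, 0.0, 1.0, 2.0]
iy = [0.0, 1.0, 1.0, -1.0]
it = [-(gx * 1.5 + gy * -0.5) for gx, gy in zip(ix, iy)]
print(lucas_kanade_window(ix, iy, it))  # approximately (1.5, -0.5)
```

The singular case in the sketch is why VO front ends usually track corners rather than edges. Only windows whose gradients vary in two directions give a well-conditioned 2×2 system.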

    Perception-driven approaches to real-time remote immersive visualization

    In remote immersive visualization systems, real-time 3D perception through RGB-D cameras, combined with modern Virtual Reality (VR) interfaces, enhances the user's sense of presence in a remote scene through rendered 3D reconstruction. This is particularly valuable when users need to visualize, explore, and perform tasks in environments that are inaccessible, too hazardous, or too distant. However, a remote visualization system requires the entire pipeline, from 3D data acquisition to VR rendering, to satisfy demands for speed, throughput, and high visual realism. Network latency and throughput limitations are especially problematic when using point clouds. They cause a fundamental quality difference between the data acquired from the physical world and the data displayed, which degrades the sense of presence and provokes cybersickness. This thesis presents state-of-the-art research that addresses these problems at every stage from sensor data acquisition to VR rendering, taking the human visual system as inspiration. The human visual system does not have uniform vision across the field of view. Acuity is sharpest at the center of the field of view and falls off towards the periphery. Peripheral vision provides lower resolution that guides eye movements so that central vision visits all the crucial parts of the scene. As a first contribution, the thesis developed remote visualization strategies that exploit this acuity fall-off. These strategies facilitate the processing, transmission, buffering, and VR rendering of 3D reconstructed scenes while reducing throughput requirements and latency. As a second contribution, the thesis investigated attentional mechanisms that select specific information from the dynamic spatio-temporal environment and draw the user's engagement to it.
It proposed a strategy that analyzes the remote scene with respect to its 3D structure, its layout, and the spatial, functional, and semantic relationships between objects in the scene. The strategy primarily analyzes the scene with models of the kind that human visual perception uses. It allocates a larger proportion of computational resources to objects of interest and creates a more realistic visualization. As a supplementary contribution, a new volumetric, point-cloud density-based Peak Signal-to-Noise Ratio (PSNR) metric is proposed to evaluate the introduced techniques. The methods were assessed through an in-depth evaluation of the presented systems, a comparative examination of the proposed point-cloud metric, user studies, and experiments. These demonstrated that the methods introduced in this thesis are visually superior while significantly reducing latency and throughput requirements.
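
The abstract does not state which acuity model the thesis uses. The following is therefore a hedged sketch of the general idea of exploiting acuity fall-off. It thins a point cloud stochastically, keeping each point with a probability that decreases with its angular distance (eccentricity) from the gaze direction. The sketch assumes a linear minimum-angle-of-resolution model, under which relative acuity falls as e2 / (e2 + e). The half-acuity eccentricity `e2 = 2.5` degrees and all function names are hypothetical placeholders, not values or APIs from the thesis.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def eccentricity_deg(point: Point, eye: Point, gaze: Sequence[float]) -> float:
    """Angle in degrees between the gaze direction and the ray from the eye to the point."""
    d = [p - e for p, e in zip(point, eye)]
    norm = math.sqrt(sum(a * a for a in d)) * math.sqrt(sum(g * g for g in gaze))
    if norm == 0.0:
        return 0.0
    cos_angle = sum(a * g for a, g in zip(d, gaze)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def keep_probability(ecc_deg: float, e2: float = 2.5) -> float:
    """Relative acuity under an assumed linear MAR model:
    1.0 at the fovea, 0.5 at ecc_deg == e2, decaying as e2 / (e2 + ecc_deg)."""
    return e2 / (e2 + ecc_deg)


def foveated_thin(points: Sequence[Point], eye: Point, gaze: Sequence[float],
                  e2: float = 2.5, seed: int = 0) -> List[Point]:
    """Keep each point with its acuity-based probability (seeded for repeatability)."""
    rng = random.Random(seed)
    return [p for p in points
            if rng.random() < keep_probability(eccentricity_deg(p, eye, gaze), e2)]


# Usage: foveal points survive; points 60 degrees off-gaze are heavily thinned.
eye, gaze = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
ang = math.radians(60.0)
periph = [(math.sin(ang) * d, 0.0, math.cos(ang) * d) for d in range(1, 1001)]
print(len(foveated_thin(periph, eye, gaze)))  # roughly 4% of the 1000 points
```

Because the keep probability depends only on eccentricity, this kind of thinning can be applied before transmission. That is where, according to the abstract, throughput and latency savings matter.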