703 research outputs found

    Robust Reinforcement Learning Algorithm for Vision-based Ship Landing of UAVs

    This paper addresses the problem of developing an algorithm for autonomous ship landing of vertical take-off and landing (VTOL) capable unmanned aerial vehicles (UAVs), using only a monocular camera on the UAV for tracking and localization. Ship landing is a challenging task due to the small landing space, six degrees of freedom (6-DOF) ship deck motion, limited visual references for localization, and adversarial environmental conditions such as wind gusts. We first develop a computer vision algorithm which estimates the relative position of the UAV with respect to a horizon reference bar on the landing platform using the image stream from a monocular camera on the UAV. Our approach is motivated by the actual ship landing procedure followed by Navy helicopter pilots, who track the horizon reference bar as a visual cue. We then develop a robust reinforcement learning (RL) algorithm for controlling the UAV towards the landing platform even in the presence of adversarial environmental conditions such as wind gusts. We demonstrate the superior performance of our algorithm compared to a benchmark nonlinear PID control approach, both in simulation experiments using the Gazebo environment and in a real-world setting using a Parrot ANAFI quad-rotor and a sub-scale ship platform undergoing 6-DOF deck motion.
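
    As a rough illustration of the geometric idea behind tracking a reference bar of known size with a monocular camera, the sketch below recovers a relative position from the bar's endpoint pixels via a pinhole model. It is not the paper's algorithm; the intrinsics, bar length, and upstream endpoint detector are all assumptions.

```python
# Minimal sketch (not the paper's implementation): estimate the UAV's
# offset from a horizon reference bar of known physical length with a
# pinhole camera model. Endpoint pixels are assumed to come from an
# upstream detector; intrinsics and bar length are placeholder values.
import numpy as np

FX, FY = 920.0, 920.0      # assumed focal lengths in pixels
CX, CY = 640.0, 360.0      # assumed principal point (1280x720 image)
BAR_LENGTH_M = 1.2         # assumed physical length of the reference bar

def relative_position(p_left, p_right):
    """Estimate (x, y, z) of the bar's midpoint in the camera frame."""
    p_left = np.asarray(p_left, dtype=float)
    p_right = np.asarray(p_right, dtype=float)
    pixel_len = np.linalg.norm(p_right - p_left)
    z = FX * BAR_LENGTH_M / pixel_len      # depth from apparent size
    mid = (p_left + p_right) / 2.0
    x = (mid[0] - CX) * z / FX             # lateral offset
    y = (mid[1] - CY) * z / FY             # vertical offset
    return np.array([x, y, z])

print(relative_position((600, 380), (700, 382)))  # bar ~11 m ahead, near-centred
```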

    An Adaptive Multi-Level Quantization-Based Reinforcement Learning Model for Enhancing UAV Landing on Moving Targets

    The autonomous landing of an unmanned aerial vehicle (UAV) on a moving platform is an essential functionality in various UAV-based applications. It can be added to a teleoperated UAV system or form part of an autonomous UAV control system. Various robust and predictive control systems based on traditional control theory are used for operating a UAV. Recently, some attempts were made to land a UAV on a moving target using reinforcement learning (RL). Vision is the typical way of sensing and detecting the moving target. Most related works have deployed a deep neural network (DNN) for RL, which takes the image as input and provides the optimal navigation action as output. However, the delay introduced by the multi-layer topology of the deep neural network affects the real-time performance of such control. This paper proposes an adaptive multi-level quantization-based reinforcement learning (AMLQ) model. The AMLQ model quantizes the continuous actions and states so that simple Q-learning can be applied directly, resolving the delay issue. This makes training faster and enables simple knowledge representation without needing a DNN. For evaluation, the AMLQ model was compared with state-of-the-art approaches and found to be superior in terms of root mean square error (RMSE), achieving an RMSE of 8.7052 versus 10.0592 for the proportional-integral-derivative (PID) controller.
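
    The core mechanism named here, quantizing continuous states and actions so that tabular Q-learning can replace a DNN, can be illustrated with a short sketch. This is not the AMLQ implementation; the bin counts, action set, error ranges, and learning rates below are illustrative assumptions.

```python
# Illustrative sketch of multi-level quantization + tabular Q-learning.
# Bin counts, action set, ranges, and rates are assumptions, not AMLQ's.
import numpy as np

STATE_BINS = 15                          # assumed bins per state dimension
ACTIONS = np.linspace(-1.0, 1.0, 7)      # assumed quantized velocity commands
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

Q = np.zeros((STATE_BINS, STATE_BINS, len(ACTIONS)))

def quantize(err, lo=-5.0, hi=5.0):
    """Map a continuous tracking error onto a discrete bin index."""
    return int((np.clip(err, lo, hi) - lo) / (hi - lo) * (STATE_BINS - 1))

def choose_action(s):
    """Epsilon-greedy action over the quantized action set."""
    if np.random.rand() < EPS:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q[s]))

def update(s, a, reward, s_next):
    """One tabular Q-learning backup; no DNN, so no inference delay."""
    td_target = reward + GAMMA * np.max(Q[s_next])
    Q[s + (a,)] += ALPHA * (td_target - Q[s + (a,)])

# usage: s = (quantize(err_x), quantize(err_y)); a = choose_action(s)
```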

    A Survey of Offline and Online Learning-Based Algorithms for Multirotor UAVs

    Multirotor UAVs are used for a wide spectrum of civilian and public domain applications. Navigation controllers endowed with different attributes and onboard sensor suites enable multirotor autonomous or semi-autonomous, safe flight, operation, and functionality under nominal and detrimental conditions and external disturbances, even when flying in uncertain and dynamically changing environments. During the last decade, given the faster-than-exponential increase of available computational power, different learning-based algorithms have been derived, implemented, and tested to navigate and control, among other systems, multirotor UAVs. Learning algorithms have been, and are, used to derive data-driven models, to identify parameters, to track objects, to develop navigation controllers, and to learn the environment in which multirotors operate. Learning algorithms combined with model-based control techniques have proven beneficial when applied to multirotors. This survey summarizes published research since 2015, dividing algorithms, techniques, and methodologies into offline and online learning categories, and then further classifying them into machine learning, deep learning, and reinforcement learning sub-categories. An integral part and focus of this survey is online learning algorithms as applied to multirotors, with the aim of registering the type of learning techniques that are either hard or almost-hard real-time implementable, as well as of understanding what information is learned, why, how, and how fast. The outcome of the survey offers a clear understanding of the recent state of the art and of the type and kind of learning-based algorithms that may be implemented, tested, and executed in real time. (Comment: 26 pages, 6 figures, 4 tables; survey paper.)

    Autonomous Drone Landings on an Unmanned Marine Vehicle using Deep Reinforcement Learning

    This thesis describes the integration of an Unmanned Surface Vehicle (USV) and an Unmanned Aerial Vehicle (UAV, also commonly known as a drone) in a single Multi-Agent System (MAS). In marine robotics, the advantage offered by a MAS consists of exploiting the key features of a single robot to compensate for the shortcomings in the other. In this way, a USV can serve as the landing platform to alleviate the need for a UAV to be airborne for long periods of time, whilst the latter can increase the overall environmental awareness thanks to its ability to cover large portions of the surrounding environment with one or more onboard cameras. There are numerous potential applications in which this system can be used, such as deployment in search and rescue missions, water and coastal monitoring, and reconnaissance and force protection, to name but a few. The theory developed is of a general nature. The landing manoeuvre has been accomplished mainly by identifying, through artificial vision techniques, a fiducial marker placed on a flat surface serving as a landing platform. The raison d'être of the thesis was to propose a new solution for autonomous landing that relies solely on onboard sensors and with minimal or no communication between the vehicles. To this end, initial work solved the problem using only data from the cameras mounted on the in-flight drone. When tracking of the marker is interrupted, the current position of the USV is estimated and integrated into the control commands. The limitations of the classic control theory used in this approach suggested the need for a new solution that exploited the flexibility of intelligent methods, such as fuzzy logic or artificial neural networks. The recent achievements of deep reinforcement learning (DRL) techniques in end-to-end control, such as in playing the Atari video-game suite, represented a fascinating yet challenging new way to see and address the landing problem. Therefore, novel architectures were designed for approximating the action-value function of a Q-learning algorithm and used to map raw input observations to high-level navigation actions. In this way, the UAV learnt how to land from high altitude without any human supervision, using only low-resolution grey-scale images, with a high level of accuracy and robustness. Both approaches have been implemented on a simulated test-bed based on the Gazebo simulator and the model of the Parrot AR-Drone. The solution based on DRL was further verified experimentally using the Parrot Bebop 2 in a series of trials. The outcomes demonstrate that these innovative methods are both feasible and practicable, not only in outdoor marine scenarios but also in indoor ones.
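
    The thesis's DQN-style mapping from low-resolution grey-scale frames to discrete navigation actions can be sketched with a small convolutional Q-network. The block below is a generic, Atari-style architecture in PyTorch, not the thesis's exact design; the 4-frame stack, layer sizes, and five-action set are assumptions.

```python
# Generic Atari-style convolutional Q-network: low-resolution grey-scale
# frames in, Q-values over discrete navigation actions out. Layer sizes,
# frame stack, and action set are assumptions, not the thesis's design.
import torch
import torch.nn as nn

class LandingDQN(nn.Module):
    def __init__(self, n_actions=5):      # e.g. fwd/back/left/right/descend
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),     # one Q-value per action
        )

    def forward(self, frames):             # frames: (B, 4, 84, 84) in [0, 1]
        return self.net(frames)

q = LandingDQN()
action = q(torch.rand(1, 4, 84, 84)).argmax(dim=1)  # greedy action index
```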

    Domain Adaptation in Unmanned Aerial Vehicles Landing using Reinforcement Learning

    Landing an unmanned aerial vehicle (UAV) on a moving platform is a challenging task that often requires exact models of the UAV dynamics, platform characteristics, and environmental conditions. In this thesis, we present and investigate three machine learning approaches with varying levels of domain knowledge: dynamics randomization, a universal policy with system identification, and reinforcement learning with no parameter variation. We first train the policies in simulation, then run experiments both in simulation, varying the system dynamics through wind and friction coefficients, and on a real robot system with wind variation. We initially expected that providing more information on environmental characteristics through system identification would improve the outcomes; however, we found that transferring a policy learned in simulation with domain randomization achieves the best results on both the real robot and in simulation, although in simulation the universal policy with system identification is faster in some cases. In this thesis we compare the results of multiple deep reinforcement learning approaches trained in simulation and transferred to robot experiments in the presence of external disturbances. We were able to create a policy to control a UAV, trained completely in simulation, and transfer it to a real system subject to external disturbances. In doing so, we evaluate the performance of dynamics randomization and the universal policy with system identification. Adviser: Carrick Detweiler
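
    A minimal sketch of the dynamics-randomization idea described above: sample wind and friction parameters afresh each training episode so the learned policy covers the range the real system may present. The parameter names, ranges, and `env.reset` interface are hypothetical, not the thesis's actual simulator API.

```python
# Dynamics randomization, sketched: draw wind and friction parameters per
# episode so the learned policy spans the real system's parameter range.
# Names, bounds, and the env/policy interface are hypothetical.
import random

def sample_dynamics():
    """Randomized simulator parameters for one training episode."""
    return {
        "wind_speed_ms": random.uniform(0.0, 6.0),    # assumed gust range
        "wind_heading_rad": random.uniform(0.0, 6.283),
        "friction_coeff": random.uniform(0.3, 1.2),   # platform friction
    }

def train(env, policy, episodes=10_000):
    for _ in range(episodes):
        env.reset(**sample_dynamics())  # hypothetical simulator interface
        # ... roll out `policy`, collect transitions, apply the RL update ...
```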

    Sim-to-Real Quadrotor Landing via Sequential Deep Q-Networks and Domain Randomization

    The autonomous landing of an Unmanned Aerial Vehicle (UAV) on a marker is one of the most challenging problems in robotics. Many solutions have been proposed, with the best results achieved via customized geometric features and external sensors. This paper discusses for the first time the use of deep reinforcement learning as an end-to-end learning paradigm to find a policy for autonomous UAV landing. Our method is based on a divide-and-conquer paradigm that splits a task into sequential sub-tasks, each one assigned to a Deep Q-Network (DQN), hence the name Sequential Deep Q-Network (SDQN). Each DQN in an SDQN is activated by an internal trigger, and it represents a component of a high-level control policy which can navigate the UAV towards the marker. Different technical solutions have been implemented, for example combining vanilla and double DQNs, and introducing a partitioned buffer replay to address the problem of sample efficiency. One of the main contributions of this work consists in showing how an SDQN trained in a simulator via domain randomization can effectively generalize to real-world scenarios of increasing complexity. The performance of SDQNs is comparable with a state-of-the-art algorithm and human pilots while being quantitatively better in noisy conditions.
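
    One idea named in this abstract, the partitioned buffer replay, can be sketched as a replay memory with separate partitions sampled at a fixed mix, so rare but informative transitions are not crowded out. The partition names, capacities, and sampling ratios below are assumptions, not the paper's design.

```python
# Sketch of a partitioned replay memory: separate partitions per outcome,
# sampled at a fixed mix so rare, informative transitions (e.g. successful
# landings) are not crowded out by common ones. Names, sizes, and the
# sampling mix are assumptions.
import random
from collections import deque

class PartitionedReplay:
    def __init__(self, capacity=50_000):
        self.parts = {
            "success": deque(maxlen=capacity // 4),
            "failure": deque(maxlen=capacity // 4),
            "neutral": deque(maxlen=capacity // 2),
        }

    def add(self, transition, outcome):
        self.parts[outcome].append(transition)

    def sample(self, batch_size, mix=(0.25, 0.25, 0.5)):
        """Draw a batch with a fixed fraction from each partition."""
        batch = []
        for part, frac in zip(self.parts.values(), mix):
            k = min(int(batch_size * frac), len(part))
            batch.extend(random.sample(list(part), k))
        return batch  # may be short early on, while partitions fill up
```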

    Twin Delayed Deep Deterministic Policy Gradient-Based Target Tracking for Unmanned Aerial Vehicle with Achievement Rewarding and Multistage Training

    Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem: it requires handling a high level of nonlinearity and dynamics. Model-free control effectively handles the uncertain nature of the problem, and reinforcement learning (RL)-based approaches are a good candidate for solving it. In this article, the Twin Delayed Deep Deterministic Policy Gradient algorithm (TD3), a recent and composite RL architecture, was explored as a tracking agent for the UAV-based target tracking problem. Several improvements were made to the original TD3. First, a proportional-differential controller was used to boost the exploration of the TD3 during training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward function. This was accomplished by incorporating two exponential functions to limit the effect of velocity and acceleration, preventing deformation of the policy function approximation. In addition, multistage training based on the dynamic variables was proposed as an alternative to one-stage combinatory training. Third, the reward function was enhanced with a piecewise decomposition to enable more stable learning behaviour of the policy and to move from a linear reward to an achievement formula. Training was conducted on fixed target tracking followed by moving target tracking, and flight testing covered three types of target trajectories: fixed, square, and blinking. Multistage training with both exponential and achievement rewarding achieved the best performance: the fixed-trained agent performed best on the fixed and square-moving targets, and the combined agent, also with exponential and achievement rewarding, performed best on the blinking target. Relative to the traditional proportional-differential controller, the maximum error reduction rate is 86%. The developed achievement rewarding and multistage training open the door to various applications of RL in target tracking.
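
    The reward idea described above, exponential terms that bound the influence of velocity and acceleration, can be written out as a short worked sketch. The weights and decay constants are illustrative assumptions, not the paper's tuned values.

```python
# Worked sketch of the reward idea: exponential terms bound the influence
# of velocity and acceleration so neither can dominate the value function.
# Weights and decay constants are assumptions, not the paper's values.
import numpy as np

def tracking_reward(pos_err, vel, acc,
                    w_p=1.0, w_v=0.3, w_a=0.1, k_v=0.5, k_a=0.5):
    r_pos = -w_p * np.linalg.norm(pos_err)            # drive error down
    r_vel = w_v * np.exp(-k_v * np.linalg.norm(vel))  # bounded in (0, w_v]
    r_acc = w_a * np.exp(-k_a * np.linalg.norm(acc))  # bounded in (0, w_a]
    return r_pos + r_vel + r_acc

# Unlike a linear penalty, the exponential terms saturate: an extreme
# velocity costs at most w_v of reward, keeping the shaping well-behaved.
print(tracking_reward(np.array([1.0, 0.5, 0.0]),
                      np.array([2.0, 0.0, 0.0]),
                      np.array([0.5, 0.0, 0.0])))
```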

    Vision-Based Autonomous Following of a Moving Platform and Landing for an Unmanned Aerial Vehicle

    Interest in Unmanned Aerial Vehicles (UAVs) has increased due to their versatility and variety of applications; however, their battery life limits these applications. Heterogeneous multi-robot systems can offer a solution to this limitation by allowing an Unmanned Ground Vehicle (UGV) to serve as a recharging station for the aerial one. Moreover, cooperation between aerial and terrestrial robots allows them to overcome other individual limitations, such as communication link coverage or accessibility, and to solve highly complex tasks, e.g., environment exploration, infrastructure inspection or search and rescue. This work proposes a vision-based approach that enables an aerial robot to autonomously detect, follow, and land on a mobile ground platform. For this purpose, ArUco fiducial markers are used to estimate the relative pose between the UAV and UGV by processing RGB images provided by a monocular camera on board the UAV. The pose estimate is fed to a trajectory planner and four decoupled controllers to generate speed set-points relative to the UAV. Using a cascade loop strategy, these set-points are then sent to the UAV autopilot for inner-loop control. The proposed solution has been tested both in simulation, with a digital twin of a solar farm using ROS, Gazebo and Ardupilot Software-in-the-Loop (SiL), and in the real world at IST Lisbon's outdoor facilities, with a UAV built on the basis of a DJI F550 hexacopter and a modified Jackal ground robot from Clearpath Robotics. Pose estimation, trajectory planning and speed set-points are computed on board the UAV, using a Single Board Computer (SBC) running Ubuntu and ROS, without the need for external infrastructure. This research was funded by the ISR/LARSyS Strategic Funding through the FCT project UIDB/50009/2020, the DURABLE project under the Interreg Atlantic Area Programme through the European Regional Development Fund (ERDF), the Andalusian project UMA18-FEDERJA-090 and the University of Málaga Research Plan. Partial funding for open access charge: Universidad de Málaga.
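
    A minimal sketch of the ArUco-based front end described here, using OpenCV's contrib aruco module (pre-4.7 API; in OpenCV 4.7+ detection moved to the ArucoDetector class). The camera intrinsics, distortion, and marker size are placeholder assumptions, not the paper's calibration.

```python
# Sketch of the ArUco front end with OpenCV's contrib aruco module
# (pre-4.7 API; 4.7+ uses cv2.aruco.ArucoDetector instead). Intrinsics,
# distortion, and marker size below are placeholder assumptions.
import cv2
import numpy as np

K = np.array([[920.0, 0.0, 640.0],
              [0.0, 920.0, 360.0],
              [0.0, 0.0, 1.0]])   # assumed camera matrix
DIST = np.zeros(5)                # assumed: negligible lens distortion
MARKER_LEN_M = 0.30               # assumed marker side length

DICTIONARY = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_100)

def marker_pose(frame_bgr):
    """Return (rvec, tvec) of the first detected marker, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, DICTIONARY)
    if ids is None:
        return None
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_LEN_M, K, DIST)
    return rvecs[0], tvecs[0]     # tvec: marker position in the camera frame
```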

    Drone deep reinforcement learning: A review

    Unmanned Aerial Vehicles (UAVs) are increasingly being used in many challenging and diversified applications, belonging to both the civilian and military fields. To name a few: infrastructure inspection, traffic patrolling, remote sensing, mapping, surveillance, rescuing humans and animals, environment monitoring, and Intelligence, Surveillance, Target Acquisition, and Reconnaissance (ISTAR) operations. However, the use of UAVs in these applications requires a substantial level of autonomy. In other words, UAVs should have the ability to accomplish planned missions in unexpected situations without requiring human intervention. To ensure this level of autonomy, many artificial intelligence algorithms have been designed, targeting the guidance, navigation, and control (GNC) of UAVs. In this paper, we described the state of the art of one subset of these algorithms: deep reinforcement learning (DRL) techniques. We gave a detailed description of them, and we deduced the current limitations in this area. We noted that most of these DRL methods were designed to ensure stable and smooth UAV navigation by training in computer-simulated environments. We realized that further research efforts are needed to address the challenges that restrain their deployment in real-life scenarios.