12 research outputs found

    Twin Delayed Deep Deterministic Policy Gradient-Based Target Tracking for Unmanned Aerial Vehicle with Achievement Rewarding and Multistage Training

    Get PDF
    Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem. It requires handling a high level of nonlinearity and dynamics. Model-free control effectively handles the uncertain nature of the problem, and reinforcement learning (RL)-based approaches are good candidates for solving it. In this article, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, a recent and composite RL architecture, was explored as a tracking agent for the UAV-based target tracking problem. Several improvements to the original TD3 were also made. First, a proportional-differential (PD) controller was used to boost the exploration of TD3 during training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward function. This was accomplished by incorporating two exponential functions to limit the effect of velocity and acceleration and prevent deformation of the policy function approximation. In addition, the concept of multistage training based on the dynamic variables was proposed as an alternative to one-stage combinatory training. Third, the reward function was enhanced with a piecewise decomposition to enable more stable learning behaviour of the policy and to move from a linear reward to an achievement formula. Training was conducted on fixed target tracking followed by moving target tracking. Flight testing was conducted on three types of target trajectories: fixed, square, and blinking. Multistage training achieved the best performance, with both exponential and achievement rewarding for the fixed-trained agent on the fixed and square moving targets, and for the combined agent with both exponential and achievement rewarding of a fixed-trained agent in the case of a blinking target. Compared with the traditional proportional-differential controller, the maximum error reduction rate is 86%. The developed achievement rewarding and the multistage training open the door to various applications of RL in target tracking.
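    A minimal sketch of the shaped reward described above, assuming hypothetical weights, scale constants, and an achievement threshold (the article's actual values are not given in this abstract): the exponential terms bound the contribution of velocity and acceleration, and the piecewise achievement bonus replaces a purely linear distance reward.

```python
import numpy as np

# Illustrative constants; the paper's actual reward weights are not stated in the abstract.
K_P, K_V, K_A = 1.0, 0.5, 0.25
SIGMA_V, SIGMA_A = 2.0, 4.0

def shaped_tracking_reward(pos_error, velocity, acceleration):
    """Position error dominates; exponential terms cap the influence of
    velocity and acceleration so they cannot deform the value estimate."""
    r_pos = -K_P * np.linalg.norm(pos_error)
    r_vel = K_V * np.exp(-np.linalg.norm(velocity) ** 2 / SIGMA_V)
    r_acc = K_A * np.exp(-np.linalg.norm(acceleration) ** 2 / SIGMA_A)
    return r_pos + r_vel + r_acc

def achievement_bonus(pos_error, threshold=0.5, bonus=10.0):
    """Piecewise 'achievement' term: a fixed bonus once the UAV is within a
    target radius, instead of a purely linear distance-based reward."""
    return bonus if np.linalg.norm(pos_error) < threshold else 0.0
```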

    A Review of DJI’s Mavic Pro Precision Landing Accuracy

    Get PDF
    Precision landing has the potential to increase the accuracy of autonomous landings. Unique applications require specific landing performance; for example, wireless charging loses efficiency with a misalignment of 100mm. Unfortunately, there is no publicly available information on the DJI Mavic Pro’s landing specifications. This research investigated the ability of a Mavic Pro to land accurately at a specified point. The purpose of this research is to determine whether precision landings are more accurate than non-precision autonomous landings and whether the Mavic Pro is capable of applications such as wireless charging when using precision landings. A total of 128 (64 precision and 64 non-precision) landings were recorded. A two-tailed two-sample t-test compared the differences between Precision Landing On and Precision Landing Off (PLON vs. PLOFF). The data provided statistical evidence to reject the null hypothesis, indicating a statistically significant improvement in mean landing accuracy with PLON (M = 3.45, SD = 1.30) over PLOFF (M = 4.40, SD = 1.89), t(109) = -3.313, p = 0.0013. A one-tailed one-sample t-test comparing the PLON landing distance to 100mm (the distance for effective wireless charging) produced statistical evidence to reject the null hypothesis, indicating the PLON landing accuracy (M = 87.63mm, SD = 33.02mm) was less than 100mm, t(62) = -2.98, p = 0.002. The evidence showed that precision landings increased landing performance and may allow for future applications, including wireless charging.
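    The two hypothesis tests reported above map onto standard routines; a sketch assuming SciPy, with placeholder arrays standing in for the recorded landing offsets (the study's raw measurements are not reproduced here, and Welch's form of the two-sample test is an assumption):

```python
import numpy as np
from scipy import stats

# Placeholder data: 64 precision-on (PLON) and 64 precision-off (PLOFF)
# landing offsets in mm; the study's recorded distances are not reproduced here.
rng = np.random.default_rng(0)
plon = rng.normal(87.63, 33.02, 64)
ploff = rng.normal(111.8, 48.0, 64)

# Two-tailed two-sample t-test: does mean landing error differ between modes?
t_ind, p_ind = stats.ttest_ind(plon, ploff, equal_var=False)

# One-tailed one-sample t-test: is the PLON mean below the 100 mm threshold
# required for effective wireless charging?
t_1s, p_1s = stats.ttest_1samp(plon, 100.0, alternative='less')

print(f"two-sample: t={t_ind:.3f}, p={p_ind:.4f}")
print(f"one-sample vs 100 mm: t={t_1s:.3f}, p={p_1s:.4f}")
```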

    A particle swarm optimization approach using adaptive entropy-based fitness quantification of expert knowledge for high-level, real-time cognitive robotic control

    Get PDF
    High-level, real-time mission control of semi-autonomous robots, deployed in remote and dynamic environments, remains a challenge. Control models learnt from a knowledgebase quickly become obsolete when the environment or the knowledgebase changes. This research study introduces a cognitive reasoning process to select the optimal action, using the most relevant knowledge from the knowledgebase, subject to observed evidence. The approach introduces an adaptive entropy-based, set-based particle swarm optimization (AE-SPSO) algorithm and a novel adaptive entropy-based fitness quantification (AEFQ) algorithm for evidence-based optimization of the knowledge. The performance of the AE-SPSO and AEFQ algorithms is experimentally evaluated with two unmanned aerial vehicle (UAV) benchmark missions: (1) relocating the UAV to a charging station and (2) collecting and delivering a package. Performance is measured by inspecting the success and completeness of the mission and the accuracy of autonomous flight control. The results show that the AE-SPSO/AEFQ approach successfully finds the optimal state transition for each mission task and that autonomous flight control is successfully achieved.
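    The entropy-based fitness idea can be illustrated with a small sketch; the function names, normalisation, and scoring below are assumptions made for illustration, as the abstract does not give the AEFQ formula:

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy of a discrete probability vector (0*log0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def aefq_fitness(rule_match_scores):
    """Illustrative entropy-based fitness: normalise how strongly a candidate
    knowledge subset matches the observed evidence, then prefer low-entropy
    (i.e. decisive) matches. This scoring is an assumption, not the paper's
    actual AEFQ definition."""
    p = np.asarray(rule_match_scores, dtype=float)
    if len(p) < 2 or p.sum() == 0:
        return 1.0
    p = p / p.sum()
    return 1.0 - shannon_entropy(p) / np.log2(len(p))

# Example: a decisive match (one rule dominates) scores higher than a diffuse one.
print(aefq_fitness([0.9, 0.05, 0.05]))  # closer to 1
print(aefq_fitness([0.34, 0.33, 0.33])) # closer to 0
```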

    Closing the Modelling Gap: Transfer Learning from a Low-Fidelity Simulator for Autonomous Driving

    Get PDF
    The behaviour planning subsystem, which is responsible for high-level decision making and planning, is an important aspect of an autonomous driving system. There are advantages to using a learned behaviour planning system instead of traditional rule-based approaches. However, high-quality labelled data for training behaviour planning models is hard to acquire. Thus, reinforcement learning (RL), which can learn a policy from simulations, is a viable option for this problem. However, modelling inaccuracies between the simulator and the target environment, called the ‘transfer gap’, hinder deployment in a real autonomous vehicle. High-fidelity simulators, which have a smaller transfer gap, come with large computational costs that are not favourable for RL training. Therefore, we often have to settle for a fast but lower-fidelity simulator that exacerbates the transfer learning problem. In this thesis, we study how a low-fidelity 2D simulator can be used in place of a slower 3D simulator for training RL behaviour planning models, and we analyze the resulting policies in comparison with a rule-based approach. We develop WiseMove, an RL framework for autonomous driving research that supports hierarchical RL, to serve as the low-fidelity source simulator. A transfer learning scenario is set up from WiseMove to an Unreal-based simulator for the Autonomoose system to study and close the transfer gap. We find that perception errors in the target simulator contribute the most to the transfer gap. These errors, when naively modelled in WiseMove, yield a policy that performs better in the target simulator than a carefully constructed rule-based policy. Applying domain randomization on the environment yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies less on velocity than the rule-based algorithm does, as its measurement is unreliable in the target simulator. To understand the exact learned behaviour, we also distill the RL policy using a decision tree to obtain an interpretable rule-based policy. We show that manually constructing a rule-based policy that efficiently handles perception errors is not trivial. Future work can explore more driving scenarios under fewer constraints to further validate this result.
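    One way to read the perception-error modelling plus domain randomization described above is as a randomized observation wrapper around the low-fidelity simulator. The sketch below is not the WiseMove API; the class name, gym-style interface, noise model, and parameter ranges are assumptions:

```python
import numpy as np

class PerceptionNoiseWrapper:
    """Illustrative sketch: inject randomized perception errors into the
    low-fidelity simulator's observations so the learned policy becomes
    robust to the target simulator's noisy perception."""

    def __init__(self, env, rng=None):
        self.env = env
        self.rng = rng or np.random.default_rng()

    def reset(self):
        # Resample noise parameters each episode (domain randomization).
        self.pos_sigma = self.rng.uniform(0.0, 1.0)   # metres, assumed range
        self.dropout_p = self.rng.uniform(0.0, 0.1)   # chance a detection is missed
        return self._corrupt(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._corrupt(obs), reward, done, info

    def _corrupt(self, obs):
        obs = obs + self.rng.normal(0.0, self.pos_sigma, size=obs.shape)
        if self.rng.random() < self.dropout_p:
            obs = np.zeros_like(obs)  # crude stand-in for a missed detection
        return obs
```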

    A Deep Reinforcement Learning Strategy for UAV Autonomous Landing on a Moving Platform

    No full text
    The use of multi-rotor UAVs in industrial and civil applications has been extensively encouraged by the rapid innovation in all the technologies involved. In particular, deep learning techniques for motion control have recently taken a major qualitative step, since the successful application of Deep Q-Learning to the continuous action domain in Atari-like games. Building on these ideas, the novel Deep Deterministic Policy Gradient (DDPG) algorithm was able to provide outstanding results with continuous state and action domains, which are a requirement in most robotics-related tasks. In this context, the research community lacks an integration of realistic simulation systems with the reinforcement learning paradigm that would enable the application of deep reinforcement learning algorithms to the robotics field. In this paper, a versatile Gazebo-based reinforcement learning framework has been designed and validated with a continuous UAV landing task. The UAV landing maneuver on a moving platform has been solved by means of the DDPG algorithm, which has been integrated into our reinforcement learning framework. Several experiments have been performed in a wide variety of conditions for both simulated and real flights, demonstrating the generality of the approach. As an indirect result, a powerful workflow for robotics has been validated, where robots can learn in simulation and perform properly in real operation environments. To the best of the authors' knowledge, this is the first work that addresses the continuous UAV landing maneuver on a moving platform by means of a state-of-the-art deep reinforcement learning algorithm, trained in simulation and tested in real flights. This work was supported by the Spanish Ministry of Science (Project DPI2014-60139-R). The LAL UPM and the MONCLOA Campus of International Excellence are also acknowledged for funding the predoctoral contract of one of the authors. An introductory version of this paper was presented at the 2017 International Conference on Unmanned Aircraft Systems (ICUAS), held in Miami, FL, USA, on 13–16 June 2017. Peer reviewed
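    For reference, the core DDPG update that such a framework wraps around the simulated environment can be sketched as follows. This is not the authors' implementation; PyTorch is assumed, and the network definitions, replay buffer, exploration noise, and Gazebo interface are omitted:

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                batch, actor_opt, critic_opt, gamma=0.99, tau=0.005):
    """One DDPG gradient step; `batch` holds tensors sampled from a replay buffer."""
    obs, act, rew, next_obs, done = batch  # done is a 0/1 float tensor

    # Critic: regress Q(s, a) toward the bootstrapped target.
    with torch.no_grad():
        target_q = rew + gamma * (1 - done) * target_critic(next_obs, target_actor(next_obs))
    critic_loss = F.mse_loss(critic(obs, act), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, maximise Q(s, pi(s)).
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak averaging of the target networks.
    for p, tp in zip(critic.parameters(), target_critic.parameters()):
        tp.data.mul_(1 - tau).add_(tau * p.data)
    for p, tp in zip(actor.parameters(), target_actor.parameters()):
        tp.data.mul_(1 - tau).add_(tau * p.data)
```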

    Modified Structured Domain Randomization in a Synthetic Environment for Learning Algorithms

    Get PDF
    Deep Reinforcement Learning (DRL) has the capability to solve many complex tasks in robotics, self-driving cars, smart grids, finance, healthcare, and intelligent autonomous systems. During training, DRL agents interact freely with the environment to arrive at an inference model. Under real-world conditions, this training raises safety, cost, and time concerns. Training in synthetic environments helps overcome these difficulties; however, such environments only approximate real-world conditions, resulting in a ‘reality gap’. The synthetic training of agents has proven advantageous but requires methods to bridge this reality gap. This work addressed the gap through a methodology which supports agent learning. A framework which incorporates a modifiable synthetic environment integrated with an unmodified DRL algorithm was used to train, test, and evaluate agents while using a modified Structured Domain Randomization (SDR+) technique. It was hypothesized that applying environment domain randomizations (DR) during the learning process would allow the agent to learn variability and adapt accordingly. Experiments using the SDR+ technique included naturalistic and physics-based DR while applying the concept of context-aware elements (CAE) to guide and speed up agent training. Drone racing served as the use case. The experimental framework workflow generated the following results. First, a baseline was established by training and validating an agent in a generic synthetic environment void of DR and CAE. The agent was then tested in environments with DR, which showed a degradation of performance. This validated the reality gap phenomenon under synthetic conditions and established a metric for comparison. Second, an SDR+ agent was successfully trained and validated under various applications of DR and CAE. Ablation studies determined that most of the applied DR and CAE effects had an equivalent impact on agent performance. Under comparison, the SDR+ agent’s performance exceeded that of the baseline agent in every test where single or combined DR effects were applied. These tests indicated that the SDR+ agent’s performance did improve in environments with applied DR of the same order as received during training. The last result came from testing the SDR+ agent’s inference model in a completely new synthetic environment with more extreme and additional DR effects applied. The SDR+ agent’s performance degraded to a point where it was inconclusive whether generalization occurred in the form of learning to adapt to variations. If the agent’s navigational capabilities, the control/feedback from the DRL algorithm, and the use of visual sensing were improved, it is assumed that future work could exhibit indications of generalization using the SDR+ technique.
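    A small sketch of the structured, context-aware randomization idea described above: rather than sampling every visual or physical parameter independently, parameters are grouped by scene context so that related elements stay mutually consistent. The context names, parameters, and ranges below are assumptions; the thesis' actual SDR+ configuration is not given in the abstract.

```python
import random

# Hypothetical contexts and per-context parameter ranges (illustrative only).
CONTEXTS = {
    "indoor_track":  {"light_lux": (200, 800),    "gate_texture": ["matte", "checker"]},
    "outdoor_track": {"light_lux": (5000, 60000), "gate_texture": ["matte", "reflective"]},
}

def sample_sdr_plus(context_name):
    """Sample one episode's randomization config, conditioned on scene context."""
    ctx = CONTEXTS[context_name]
    return {
        "context": context_name,
        "light_lux": random.uniform(*ctx["light_lux"]),
        "gate_texture": random.choice(ctx["gate_texture"]),
        "camera_noise_std": random.uniform(0.0, 0.02),  # applied in every context
    }

# One randomized configuration per training reset:
cfg = sample_sdr_plus(random.choice(list(CONTEXTS)))
print(cfg)
```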