Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time
This paper investigates how to use different forms of human interaction
to safely train autonomous systems in real time by learning from both human
demonstrations and interventions. We implement two components of the
Cycle-of-Learning for Autonomous Systems, our framework for combining
multiple modalities of human interaction. The current effort employs human
demonstrations to teach a desired behavior via imitation learning, then
leverages intervention data to correct undesired behaviors produced by the
imitation learner, so that novel tasks can be taught to an autonomous agent
safely after only minutes of training. We demonstrate this method in an autonomous perching
task using a quadrotor with continuous roll, pitch, yaw, and throttle commands
and imagery captured from a downward-facing camera in a high-fidelity simulated
environment. Our method improves task completion performance for the same
amount of human interaction when compared to learning from demonstrations
alone, while also requiring on average 32% less data to achieve that
performance. This provides evidence that combining multiple modes of human
interaction can increase both the training speed and overall performance of
policies for autonomous systems. Comment: 9 pages, 6 figures
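One simple way to combine the two data sources is to retrain the imitation learner by behavior cloning on the demonstration pairs plus the human's corrective actions recorded during interventions. The sketch below assumes exactly that. The `Step` record, `aggregate`, and the one-dimensional least-squares "policy" are hypothetical stand-ins, not the paper's implementation, which works from camera imagery and outputs four continuous commands.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    obs: float                     # observation (a scalar here for brevity; hypothetical)
    agent_action: float            # what the imitation learner commanded
    human_action: Optional[float]  # the human's override, or None if no intervention


def aggregate(demos: List[Tuple[float, float]], rollout: List[Step]) -> List[Tuple[float, float]]:
    """Demonstration (obs, action) pairs plus (obs, human_action) pairs from intervention steps."""
    data = list(demos)
    # Only steps where the human took over carry a corrective label;
    # the learner's own actions on other steps are not used as targets.
    data.extend((s.obs, s.human_action) for s in rollout if s.human_action is not None)
    return data


def fit_linear_policy(data: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of action = w * obs + b, a stand-in for behavior cloning."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    w = sxy / sxx if sxx > 0 else 0.0
    return w, mean_y - w * mean_x
```

For example, two demonstration pairs plus one intervention step (the non-intervened step is ignored) yield three training pairs for the refit.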
A Survey of Offline and Online Learning-Based Algorithms for Multirotor UAVs
Multirotor UAVs are used for a wide spectrum of civilian and public domain
applications. Navigation controllers endowed with different attributes and
onboard sensor suites enable safe autonomous or semi-autonomous multirotor
flight, operation, and functionality under nominal and adverse conditions and
external disturbances, even in uncertain and dynamically changing
environments. During the last decade, given the faster-than-exponential
increase in available computational power, different learning-based algorithms
have been derived, implemented, and tested to navigate and control multirotor
UAVs, among other systems. Learning algorithms have been, and continue to be,
used to derive data-driven models, to identify parameters, to track objects, to
develop navigation controllers, and to learn the environment in which
multirotors operate. Learning algorithms combined with model-based control
techniques have proven beneficial when applied to multirotors. This survey
summarizes research published since 2015, dividing algorithms, techniques, and
methodologies into offline and online learning categories and then further
classifying them into machine learning, deep learning, and reinforcement
learning sub-categories. A central focus of this survey is online learning
algorithms as applied to multirotors, with the aim of cataloguing the learning
techniques that are hard or nearly hard real-time implementable, and of
understanding what information is learned, why, how, and how fast. The survey
offers a clear picture of the recent state of the art and of the kinds of
learning-based algorithms that may be implemented, tested, and executed in real
time. Comment: 26 pages, 6 figures, 4 tables, Survey Paper
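The survey names parameter identification among the uses of online learning and emphasizes methods that are (nearly) hard real-time implementable. Recursive least squares (RLS) is one classic estimator with a fixed per-sample cost of O(n^2) for n parameters. The sketch below is an illustrative choice, not an algorithm taken from the survey. The class name and the forgetting factor `lam` and initial covariance `p0` defaults are assumptions.

```python
from typing import List, Sequence


class RecursiveLeastSquares:
    """Online estimate of theta in y = phi . theta, with forgetting factor lam (0 < lam <= 1)."""

    def __init__(self, n: int, lam: float = 1.0, p0: float = 1000.0) -> None:
        self.n = n
        self.lam = lam
        self.theta: List[float] = [0.0] * n
        # Large initial covariance = little confidence in the zero initial guess.
        self.P: List[List[float]] = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    def update(self, phi: Sequence[float], y: float) -> List[float]:
        n, P = self.n, self.P
        # P @ phi
        P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(phi[i] * P_phi[i] for i in range(n))
        gain = [v / denom for v in P_phi]
        # Prediction error of the current estimate on the new sample.
        err = y - sum(phi[i] * self.theta[i] for i in range(n))
        self.theta = [self.theta[i] + gain[i] * err for i in range(n)]
        # Covariance update: P <- (P - gain * phi^T P) / lam
        phi_P = [sum(phi[i] * P[i][j] for i in range(n)) for j in range(n)]
        self.P = [[(P[i][j] - gain[i] * phi_P[j]) / self.lam for j in range(n)] for i in range(n)]
        return self.theta
```

As a hypothetical use, a regressor `phi = [u, 1.0]` with a measured response `y` would let `theta` track an input gain and a bias as flight data arrives. Setting `lam < 1` discounts old samples so the estimate can follow slowly changing parameters.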
An Adaptive Multi-Level Quantization-Based Reinforcement Learning Model for Enhancing UAV Landing on Moving Targets
The autonomous landing of an unmanned aerial vehicle (UAV) on a moving platform is an essential capability in various UAV-based applications. It can be added to a teleoperated UAV system or be part of an autonomous UAV control system. Various robust and predictive control systems based on traditional control theory are used to operate UAVs. Recently, some attempts have been made to land a UAV on a moving target using reinforcement learning (RL), typically with vision as the means of sensing and detecting the target. Most related works deploy a deep neural network (DNN) for RL that takes the image as input and outputs the optimal navigation action. However, the inference delay of the DNN's multi-layer topology undermines the real-time performance of such control. This paper proposes an adaptive multi-level quantization-based reinforcement learning (AMLQ) model. The AMLQ model quantizes the continuous actions and states so that simple Q-learning can be applied directly, resolving the delay issue. This makes training faster and enables simple knowledge representation without a DNN. For evaluation, the AMLQ model was compared with state-of-the-art approaches and was found to be superior in terms of root mean square error (RMSE): it achieved an RMSE of 8.7052, compared with 10.0592 for the proportional-integral-derivative (PID) controller.
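The core idea of quantizing continuous states so that plain tabular Q-learning applies can be sketched as follows. This uses fixed, uniform bins and a fixed discrete action set. The paper's *adaptive multi-level* quantization scheme is not described in the abstract, so the binning, the names `quantize` and `QuantizedQLearner`, and the hyperparameter defaults are assumptions.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


def quantize(x: float, low: float, high: float, levels: int) -> int:
    """Map a continuous value to one of `levels` uniform bins (0 .. levels-1)."""
    x = min(max(x, low), high)               # clamp to the expected range
    idx = int((x - low) / (high - low) * levels)
    return min(idx, levels - 1)              # x == high falls into the last bin


class QuantizedQLearner:
    """Tabular Q-learning over quantized states and a fixed set of discrete actions."""

    def __init__(self, n_actions: int, alpha: float = 0.1, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: Optional[int] = None) -> None:
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Unvisited states start with all-zero action values.
        self.q: DefaultDict[Tuple[int, ...], List[float]] = defaultdict(lambda: [0.0] * n_actions)

    def act(self, state: Tuple[int, ...]) -> int:
        """Epsilon-greedy action selection; ties go to the lowest action index."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state: Tuple[int, ...], action: int, reward: float,
               next_state: Tuple[int, ...], done: bool) -> None:
        """One-step Q-learning update toward r + gamma * max_a' Q(s', a')."""
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

A state could be built as, for example, `(quantize(dx, -5, 5, 10), quantize(dy, -5, 5, 10))` from a hypothetical relative position of the target, with each action index mapped to a hypothetical velocity command. A table lookup replaces the DNN forward pass, which is the source of the latency savings the abstract describes.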
Robust Reinforcement Learning Algorithm for Vision-based Ship Landing of UAVs
This paper addresses the problem of developing an algorithm for autonomous
ship landing of vertical take-off and landing (VTOL) capable unmanned aerial
vehicles (UAVs), using only a monocular camera in the UAV for tracking and
localization. Ship landing is a challenging task due to the small landing
space, six-degree-of-freedom ship deck motion, limited visual references for
localization, and adversarial environmental conditions such as wind gusts. We
first develop a computer vision algorithm that estimates the relative position
of the UAV with respect to a horizon reference bar on the landing platform
using the image stream from the UAV's monocular camera. Our approach
is motivated by the actual ship landing procedure followed by Navy
helicopter pilots, who track the horizon reference bar as a visual cue. We
then develop a robust reinforcement learning (RL) algorithm for controlling the
UAV towards the landing platform even in the presence of adversarial
environmental conditions such as wind gusts. We demonstrate the superior
performance of our algorithm compared to a benchmark nonlinear PID control
approach, both in simulation experiments in the Gazebo environment and
in real-world tests using a Parrot ANAFI quadrotor and a sub-scale ship
platform undergoing six-degree-of-freedom deck motion.
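The following pinhole-camera sketch roughly illustrates monocular relative-position estimation from a reference bar of known length. It recovers depth from the bar's apparent length and recovers lateral and vertical offsets from the position of the bar's midpoint. It assumes calibrated intrinsics (`fx`, `fy`, `cx`, `cy`), a bar roughly parallel to the image plane, and a known bar length. It is a generic geometry example, not the authors' vision algorithm. The function name and the roll sign convention are also assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point x (pixels)
    cy: float  # principal point y (pixels)


def bar_relative_pose(p1: Tuple[float, float], p2: Tuple[float, float],
                      bar_length_m: float, K: Intrinsics) -> Tuple[float, float, float, float]:
    """Return (X, Y, Z, tilt) of the bar midpoint in the camera frame from its endpoint pixels."""
    # Convert endpoints to normalized image coordinates (divide out the focal lengths).
    x1, y1 = (p1[0] - K.cx) / K.fx, (p1[1] - K.cy) / K.fy
    x2, y2 = (p2[0] - K.cx) / K.fx, (p2[1] - K.cy) / K.fy
    # For a bar parallel to the image plane, normalized length = L / Z.
    norm_len = math.hypot(x2 - x1, y2 - y1)
    if norm_len == 0.0:
        raise ValueError("bar endpoints coincide; depth is unobservable")
    Z = bar_length_m / norm_len
    # Back-project the midpoint to metric lateral (X) and vertical (Y) offsets.
    X = 0.5 * (x1 + x2) * Z
    Y = 0.5 * (y1 + y2) * Z
    # In-image tilt of the bar; under these assumptions it relates to camera roll
    # (image y points down, so the sign convention is an assumption).
    tilt = math.atan2(y2 - y1, x2 - x1)
    return X, Y, Z, tilt
```

For a 1 m bar imaged 50 px long by a 500 px focal-length camera, the estimate is 10 m depth. The tracked (X, Y, Z, tilt) would then serve as the observation for a landing controller.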
Twin Delayed Deep Deterministic Policy Gradient-Based Target Tracking for Unmanned Aerial Vehicle with Achievement Rewarding and Multistage Training
Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem that requires handling highly nonlinear dynamics. Model-free control effectively handles the uncertain nature of the problem, making reinforcement learning (RL)-based approaches good candidates for solving it. In this article, the Twin Delayed Deep Deterministic Policy Gradient algorithm (TD3), a recent composite RL architecture, was explored as the tracking agent for the UAV-based target tracking problem. Several improvements to the original TD3 were also made. First, a proportional-derivative (PD) controller was used to boost the exploration of TD3 during training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward function. This was accomplished by incorporating two exponential functions that limit the effect of velocity and acceleration, preventing distortion of the policy function approximation. In addition, multistage training based on the dynamic variables was proposed as an alternative to one-stage combinatory training. Third, the reward function was enhanced with a piecewise decomposition, enabling more stable learning of the policy and moving from a linear reward to an achievement-based formulation. Training was conducted on fixed target tracking followed by moving target tracking. Flight testing was conducted on three types of target trajectories: fixed, square, and blinking. Multistage training with both exponential and achievement rewarding achieved the best performance for the fixed-trained agent on the fixed and square moving targets, and, in the case of a blinking target, for the combined agent with both exponential and achievement rewarding for a fixed-trained agent.
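TD3's critic target combines target policy smoothing (clipped noise added to the target actor's action) with clipped double-Q learning (the minimum of two target critics). This is a minimal scalar-action sketch. It assumes caller-supplied target networks, so `actor_target`, `q1_target`, and `q2_target` are hypothetical callables. It also includes a hypothetical PD-guided exploration rule. The abstract does not say how the PD controller was mixed into exploration, so the probability switch in `exploration_action` and the gains in `pd_action` are assumptions. TD3's delayed actor updates are not shown.

```python
import random
from typing import Callable, Optional


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward: float, next_state: float, done: bool,
               actor_target: Callable[[float], float],
               q1_target: Callable[[float, float], float],
               q2_target: Callable[[float, float], float],
               gamma: float = 0.99, noise_std: float = 0.2, noise_clip: float = 0.5,
               act_low: float = -1.0, act_high: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """Bootstrapped critic target y = r + gamma * (1 - done) * min(Q1', Q2')(s', a~)."""
    rng = rng or random.Random()
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(actor_target(next_state) + noise, act_low, act_high)
    # Clipped double-Q: the smaller target critic curbs overestimation bias.
    q_next = min(q1_target(next_state, next_action), q2_target(next_state, next_action))
    # Bootstrap only if the episode did not terminate.
    return reward + gamma * (0.0 if done else 1.0) * q_next


def pd_action(error: float, error_rate: float, kp: float = 1.0, kd: float = 0.2) -> float:
    """Hypothetical PD command u = kp * e + kd * de/dt."""
    return kp * error + kd * error_rate


def exploration_action(actor_action: float, pd_command: float, p_pd: float,
                       rng: random.Random) -> float:
    """Hypothetical PD-guided exploration: with probability p_pd, execute the PD command."""
    return pd_command if rng.random() < p_pd else actor_action
```

Annealing `p_pd` toward zero over training would let early episodes reach the target more often while the final policy relies on the actor alone. This is a design choice, not something the abstract specifies.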
Compared with the traditional proportional-derivative controller, the maximum error reduction is 86%. The developed achievement rewarding and multistage training open the door to various applications of RL in target tracking.
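The two reward ideas in this abstract can be sketched under stated assumptions. The first uses exponential terms that keep velocity and acceleration from dominating. The second is a piecewise "achievement" reward that replaces a purely linear one. The functional forms, weights (`w_dist`, `k_vel`, `k_acc`), radii, and bonus below are all hypothetical, since the abstract gives no formulas.

```python
import math


def exponential_tracking_reward(distance: float, velocity: float, acceleration: float,
                                w_dist: float = 1.0, k_vel: float = 0.5,
                                k_acc: float = 0.1) -> float:
    """Hypothetical shaped reward: linear distance penalty plus two bounded exponential penalties."""
    # Distance term: the farther from the target, the lower the reward.
    r_dist = -w_dist * distance
    # Each exponential penalty is bounded below by -1, so large velocities or
    # accelerations cannot swamp the distance signal.
    r_vel = -(1.0 - math.exp(-k_vel * abs(velocity)))
    r_acc = -(1.0 - math.exp(-k_acc * abs(acceleration)))
    return r_dist + r_vel + r_acc


def achievement_reward(distance: float, success_radius: float = 0.5,
                       near_radius: float = 2.0, bonus: float = 10.0) -> float:
    """Hypothetical piecewise 'achievement' reward replacing a single linear term."""
    if distance <= success_radius:
        # Goal achieved: fixed bonus.
        return bonus
    if distance <= near_radius:
        # Partial credit, falling linearly from bonus/2 to 0 across the near band.
        return 0.5 * bonus * (near_radius - distance) / (near_radius - success_radius)
    # Far from the target: linear penalty beyond the near band.
    return -(distance - near_radius)
```

The jump to the full bonus inside `success_radius` is deliberate. It marks reaching the target as a discrete achievement rather than just another point on a linear slope.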