Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time
This paper investigates how to utilize different forms of human interaction
to safely train autonomous systems in real-time by learning from both human
demonstrations and interventions. We implement two components of the
Cycle-of-Learning for Autonomous Systems, which is our framework for combining
multiple modalities of human interaction. The current effort employs human
demonstrations to teach a desired behavior via imitation learning, then
leverages intervention data to correct for undesired behaviors produced by the
imitation learner to teach novel tasks to an autonomous agent safely, after
only minutes of training. We demonstrate this method in an autonomous perching
task using a quadrotor with continuous roll, pitch, yaw, and throttle commands
and imagery captured from a downward-facing camera in a high-fidelity simulated
environment. Our method improves task completion performance for the same
amount of human interaction when compared to learning from demonstrations
alone, while also requiring on average 32% less data to achieve that
performance. This provides evidence that combining multiple modes of human
interaction can increase both the training speed and overall performance of
policies for autonomous systems.
Comment: 9 pages, 6 figures
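The two-stage data flow described in this abstract (clone a policy from demonstrations, then record only the steps where the human overrides the learner) can be illustrated with a toy example. This is a minimal sketch under loud assumptions, not the paper's implementation: the 1-D state, the linear policy `a = k * s`, the scripted `expert_action`, and the disagreement threshold `tol` are all hypothetical stand-ins for the quadrotor, the image-based network, and the human operator.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Aggregated (state, action) pairs from demonstrations and interventions."""
    states: list = field(default_factory=list)
    actions: list = field(default_factory=list)

    def add(self, state, action):
        self.states.append(state)
        self.actions.append(action)


def expert_action(state):
    # Hypothetical human: steer toward zero, but never faster than 0.4 per step.
    return max(-0.4, min(0.4, -0.5 * state))


def fit_gain(data):
    # Behavior cloning for a 1-D linear policy a = k * s (closed-form least squares).
    den = sum(s * s for s in data.states)
    num = sum(s * a for s, a in zip(data.states, data.actions))
    return num / den if den else 0.0


def demonstrate(data, s0, steps):
    # Stage 1: the human controls the system and every step is recorded.
    s = s0
    for _ in range(steps):
        a = expert_action(s)
        data.add(s, a)
        s += a


def rollout_with_interventions(k, data, s0, steps, tol=0.05):
    # Stage 2: the learner acts; the human overrides only when the learner's
    # action deviates from their own choice, and only those steps are recorded.
    s, count = s0, 0
    for _ in range(steps):
        a = k * s
        a_human = expert_action(s)
        if abs(a - a_human) > tol:
            a = a_human
            data.add(s, a)
            count += 1
        s += a
    return count


def run_sketch():
    data = Dataset()
    demonstrate(data, s0=0.5, steps=5)           # demonstrations near the goal
    k_demo = fit_gain(data)                      # policy from demonstrations only
    n = rollout_with_interventions(k_demo, data, s0=2.0, steps=10)
    k_new = fit_gain(data)                       # policy after adding interventions
    return k_demo, n, k_new
```

Calling `run_sketch()` returns the gain learned from demonstrations alone, the number of interventions recorded, and the gain after refitting on the aggregated data. Interventions contribute labels only in states the demonstrations never visited, which is the intuition behind the data savings the abstract reports.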
Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems
Recent successes combine reinforcement learning algorithms and deep neural
networks, despite reinforcement learning not being widely applied to robotics
and real world scenarios. This can be attributed to the fact that current
state-of-the-art, end-to-end reinforcement learning approaches still require
thousands or millions of data samples to converge to a satisfactory policy and
are subject to catastrophic failures during training. Conversely, in real world
scenarios and after just a few data samples, humans are able to either provide
demonstrations of the task, intervene to prevent catastrophic actions, or
simply evaluate if the policy is performing correctly. This research
investigates how to integrate these human interaction modalities into the
reinforcement learning loop, increasing sample efficiency and enabling
real-time reinforcement learning in robotics and real world scenarios. This
novel theoretical foundation is called Cycle-of-Learning, a reference to how
different human interaction modalities, namely, task demonstration,
intervention, and evaluation, are cycled and combined with reinforcement learning
algorithms. Results presented in this work show that the reward signal that is
learned based upon human interaction accelerates the rate of learning of
reinforcement learning algorithms and that learning from a combination of human
demonstrations and interventions is faster and more sample efficient when
compared to traditional supervised learning algorithms. Finally,
Cycle-of-Learning develops an effective transition from policies learned
using human demonstrations and interventions to reinforcement learning. The
theoretical foundation developed by this research opens new research paths to
human-agent teaming scenarios where autonomous agents are able to learn from
human teammates and adapt to mission performance metrics in real-time and in
real world scenarios.
Comment: PhD thesis, Aerospace Engineering, Texas A&M (2020). For more
information, see https://vggoecks.com
Interactive Imitation Learning in Robotics: A Survey
Interactive Imitation Learning (IIL) is a branch of Imitation Learning (IL)
where human feedback is provided intermittently during robot execution allowing
an online improvement of the robot's behavior. In recent years, IIL has
increasingly started to carve out its own space as a promising data-driven
alternative for solving complex robotic tasks. The advantages of IIL are its
data efficiency, as the human feedback guides the robot directly towards an
improved behavior, and its robustness, as the distribution mismatch between the
teacher and learner trajectories is minimized by providing feedback directly
over the learner's trajectories. Nevertheless, despite the opportunities that
IIL presents, its terminology, structure, and applicability are neither clear nor
unified in the literature, slowing down its development and, therefore, the
research of innovative formulations and discoveries. In this article, we
attempt to facilitate research in IIL and lower entry barriers for new
practitioners by providing a survey of the field that unifies and structures
it. In addition, we aim to raise awareness of its potential, what has been
accomplished, and which research questions remain open. We organize the most
relevant works in IIL in terms of human-robot interaction (i.e., types of
feedback), interfaces (i.e., means of providing feedback), learning (i.e.,
models learned from feedback and function approximators), user experience
(i.e., human perception about the learning process), applications, and
benchmarks. Furthermore, we analyze similarities and differences between IIL
and RL, providing a discussion on how the concepts of offline, online, off-policy,
and on-policy learning should be transferred to IIL from the RL literature. We
particularly focus on robotic applications in the real world and discuss their
implications, limitations, and promising future areas of research.
Learning from Interventions using Hierarchical Policies for Safe Learning
Learning from Demonstrations (LfD) via Behavior Cloning (BC) works well on
multiple complex tasks. However, a limitation of the typical LfD approach is
that it requires expert demonstrations for all scenarios, including those in
which the algorithm is already well-trained. The recently proposed Learning
from Interventions (LfI) overcomes this limitation by using an expert overseer.
The expert overseer only intervenes when it suspects that an unsafe action is
about to be taken. Although LfI significantly improves over LfD, the
state-of-the-art LfI fails to account for delay caused by the expert's reaction
time and only learns short-term behavior. We address these limitations by 1)
interpolating the expert's interventions back in time, and 2) by splitting the
policy into two hierarchical levels, one that generates sub-goals for the
future and another that generates actions to reach those desired sub-goals.
This sub-goal prediction forces the algorithm to learn long-term behavior while
also being robust to the expert's reaction time. Our experiments show that LfI
using sub-goals in a hierarchical policy framework trains faster and achieves
better asymptotic performance than typical LfD.
Comment: Accepted for publication at the Thirty-Fourth AAAI Conference on
Artificial Intelligence (AAAI-20)
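The first idea in this abstract, interpolating the expert's interventions back in time to compensate for reaction delay, can be sketched as a relabeling step over a recorded action sequence. This is an illustrative assumption rather than the paper's method: the abstract does not state the interpolation scheme, so the linear blend, the scalar actions, and the function name `backfill_intervention` with its `delay` parameter are all hypothetical.

```python
def backfill_intervention(actions, t_intervene, expert_action, delay):
    """Relabel the steps leading up to an intervention.

    actions:        learner actions recorded at each time step (scalars here)
    t_intervene:    index at which the expert actually took over
    expert_action:  action the expert applied at t_intervene
    delay:          assumed reaction time of the expert, in time steps

    Returns a new list of training labels in which the `delay` steps before
    the intervention ramp linearly from the learner's action to the expert's.
    """
    labels = list(actions)
    start = max(0, t_intervene - delay)
    span = t_intervene - start
    if span == 0:
        # No reaction delay to compensate for: label only the intervention step.
        labels[t_intervene] = expert_action
        return labels
    anchor = actions[start]  # last action assumed to be acceptable
    for t in range(start, t_intervene + 1):
        alpha = (t - start) / span  # 0 at the anchor, 1 at the intervention
        labels[t] = (1 - alpha) * anchor + alpha * expert_action
    return labels


# Usage: the expert reacts at step 4, but is assumed to have wanted a
# correction starting two steps earlier.
relabelled = backfill_intervention([0.0, 0.0, 0.0, 0.0, 0.0], 4, 1.0, delay=2)
```

In this usage example the labels for steps 2 through 4 become 0.0, 0.5, and 1.0, so the learner is trained to begin correcting before the moment the expert physically intervened.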