Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems
Recent successes have combined reinforcement learning algorithms with deep neural networks, yet reinforcement learning is still not widely applied to robotics and real-world scenarios. This can be attributed to the fact that current state-of-the-art, end-to-end reinforcement learning approaches still require thousands or millions of data samples to converge to a satisfactory policy and are subject to catastrophic failures during training. Conversely, in real-world scenarios and after just a few data samples, humans are able to provide demonstrations of the task, intervene to prevent catastrophic actions, or simply evaluate whether the policy is performing correctly. This research investigates how to integrate these human interaction modalities into the reinforcement learning loop, increasing sample efficiency and enabling real-time reinforcement learning in robotics and real-world scenarios. This novel theoretical foundation is called the Cycle-of-Learning, a reference to how the different human interaction modalities, namely task demonstration, intervention, and evaluation, are cycled through and combined with reinforcement learning algorithms. Results presented in this work show that a reward signal learned from human interaction accelerates the learning rate of reinforcement learning algorithms, and that learning from a combination of human demonstrations and interventions is faster and more sample-efficient than traditional supervised learning alone. Finally, the Cycle-of-Learning provides an effective transition from policies learned through human demonstrations and interventions to reinforcement learning. The theoretical foundation developed by this research opens new research paths toward human-agent teaming scenarios in which autonomous agents learn from human teammates and adapt to mission performance metrics in real time and in real-world scenarios.
Comment: PhD thesis, Aerospace Engineering, Texas A&M (2020). For more information, see https://vggoecks.com
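
To make the idea concrete, here is a minimal, self-contained sketch of cycling the three human interaction modalities (demonstration, intervention, evaluation) into a tabular Q-learning loop. The toy chain environment, the override rule, and the table-based reward model are illustrative assumptions, not the thesis' actual architecture:

    # Hypothetical sketch: demonstrations warm-start the policy, interventions
    # correct it during rollouts, and a reward model fit to human evaluations
    # drives the RL phase. Toy setting, not the thesis' implementation.
    N_STATES, ACTIONS = 5, [0, 1]          # tiny chain world: 0 = left, 1 = right
    GOAL = N_STATES - 1

    def step(s, a):                        # deterministic toy dynamics
        s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
        return s2, s2 == GOAL

    # 1) Task demonstration: the human shows "always move right";
    #    behavior cloning here amounts to memorizing the labels.
    demos = [(s, 1) for s in range(N_STATES)]
    policy = dict(demos)

    # 2) Intervention: the human overrides catastrophic actions.
    def human_override(s, a):
        return 1 if a != 1 else a

    # 3) Evaluation: a reward model fit to human scores of (state, action)
    #    pairs stands in for a hand-engineered reward (assumed values).
    reward_model = {(s, a): (1.0 if a == 1 else -1.0)
                    for s in range(N_STATES) for a in ACTIONS}

    # 4) RL phase: Q-learning on the learned reward, warm-started by the
    #    cloned policy, with interventions keeping training safe.
    Q = {(s, a): (1.0 if policy.get(s) == a else 0.0)
         for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma = 0.5, 0.9
    for _ in range(200):                   # episodes
        s = 0
        while True:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
            a = human_override(s, a)       # intervention during training
            s2, done = step(s, a)
            target = reward_model[(s, a)] + (
                0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if done:
                break
            s = s2
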
On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations
KL-regularized reinforcement learning from expert demonstrations has proved
successful in improving the sample efficiency of deep reinforcement learning
algorithms, allowing them to be applied to challenging physical real-world
tasks. However, we show that KL-regularized reinforcement learning with
behavioral reference policies derived from expert demonstrations can suffer
from pathological training dynamics that can lead to slow, unstable, and
suboptimal online learning. We show empirically that the pathology occurs for
commonly chosen behavioral policy classes and demonstrate its impact on sample
efficiency and online policy performance. Finally, we show that the pathology
can be remedied by non-parametric behavioral reference policies and that this
allows KL-regularized reinforcement learning to significantly outperform
state-of-the-art approaches on a variety of challenging locomotion and
dexterous hand manipulation tasks.
Comment: Published in Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
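
For intuition, a minimal sketch of the KL-regularized objective for discrete action distributions follows. The toy distributions and the penalty coefficient are illustrative assumptions; the paper studies continuous-control policies:

    # Sketch of KL-regularized RL from demonstrations: the environment reward
    # is penalized by the KL divergence from a behavioral reference policy
    # fit to expert data. Values below are assumed for illustration.
    import numpy as np

    def kl(p, q, eps=1e-8):
        p, q = np.asarray(p, float), np.asarray(q, float)
        return float(np.sum(p * np.log((p + eps) / (q + eps))))

    def kl_regularized_reward(env_reward, pi, pi_ref, alpha=0.1):
        # Shaped objective: r(s, a) - alpha * KL(pi(.|s) || pi_ref(.|s)).
        return env_reward - alpha * kl(pi, pi_ref)

    # Off the demonstration distribution, an overconfident parametric
    # reference assigns near-zero mass to the agent's preferred action, so
    # the KL penalty eats into the reward (the pathology described above);
    # a smoother, non-parametric-style reference keeps the penalty mild.
    pi = [0.7, 0.3]
    pi_ref_overconfident = [0.01, 0.99]
    pi_ref_smoothed = [0.4, 0.6]
    print(kl_regularized_reward(1.0, pi, pi_ref_overconfident))  # ~0.74
    print(kl_regularized_reward(1.0, pi, pi_ref_smoothed))       # ~0.98
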
Reinforcement Learning from Demonstration
Off-the-shelf Reinforcement Learning (RL) algorithms suffer from slow learning performance, partly because they are expected to learn a task from scratch merely through an agent's own experience. In this thesis, we show that learning from scratch is a limiting factor for learning performance, and that when prior knowledge is available RL agents can learn a task faster. We evaluate relevant previous work and our own algorithms in various experiments.

Our first contribution is the first implementation and evaluation of an existing interactive RL algorithm in a real-world domain with a humanoid robot. Interactive RL had previously been evaluated in a simulated domain, which motivated us to evaluate its practicality on a robot. Our evaluation shows that guidance reduces learning time, and that its positive effects increase with state-space size.

A natural follow-up question after our first evaluation was how other previous works compare to interactive RL. Our second contribution is an analysis of a user study in which naive human teachers demonstrated a real-world object-catching task with a humanoid robot. We present the first comparison of several previous works in a common real-world domain through a user study. One conclusion of the user study was the high potential of RL despite poor usability caused by its slow learning rate.

In an effort to improve the learning efficiency of RL learners, our third contribution is a novel human-agent knowledge transfer algorithm. Using demonstrations from three teachers with varying expertise in a simulated domain, we show that regardless of skill level, human demonstrations can improve the asymptotic performance of an RL agent.

As an alternative approach for encoding human knowledge in RL, we investigated the use of reward shaping. Our final contributions are the Static Inverse Reinforcement Learning Shaping and Dynamic Inverse Reinforcement Learning Shaping algorithms, which use human demonstrations to recover a shaping reward function. Our experiments in simulated domains show that our approach outperforms the state of the art in cumulative reward, learning rate, and asymptotic performance. Overall, we show that human demonstrators with varying skills can help RL agents learn tasks more efficiently.
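
As a concrete, heavily simplified illustration of demonstration-driven reward shaping in the spirit of the thesis' final contributions, the sketch below recovers a potential from expert state visitations and adds a potential-based shaping term. The visitation-count potential is an assumption standing in for the actual inverse-RL step:

    # Sketch: recover a shaping potential from demonstrations, then shape
    # the environment reward with it. Simplified stand-in, not the thesis'
    # Static/Dynamic Inverse Reinforcement Learning Shaping algorithms.
    from collections import Counter

    def potential_from_demos(demo_trajectories):
        # Assumption: use normalized expert state-visitation counts as the
        # potential recovered from demonstrations.
        counts = Counter(s for traj in demo_trajectories for s in traj)
        total = sum(counts.values())
        return {s: c / total for s, c in counts.items()}

    def shaped_reward(r, s, s2, phi, gamma=0.99):
        # Potential-based shaping, which preserves the optimal policy
        # (Ng et al., 1999): r + gamma * phi(s') - phi(s).
        return r + gamma * phi.get(s2, 0.0) - phi.get(s, 0.0)

    # Usage: the expert mostly follows the path 0 -> 1 -> 2 -> 3, so
    # transitions toward frequently demonstrated states earn a bonus.
    demos = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 2, 3]]
    phi = potential_from_demos(demos)
    print(shaped_reward(0.0, 1, 2, phi))   # positive shaping bonus
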
The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
We study the sample complexity of teaching, termed the "teaching dimension"
(TDim) in the literature, for the teaching-by-reinforcement paradigm, where the
teacher guides the student through rewards. This is distinct from the
teaching-by-demonstration paradigm motivated by robotics applications, where
the teacher teaches by providing demonstrations of state/action trajectories.
The teaching-by-reinforcement paradigm applies to a wider range of real-world
settings where a demonstration is inconvenient, but has not been studied
systematically. In this paper, we focus on a specific family of reinforcement
learning algorithms, Q-learning, and characterize the TDim under different
teachers with varying control power over the environment, and present matching
optimal teaching algorithms. Our TDim results provide the minimum number of
samples needed for reinforcement learning, and we discuss their connections to
standard PAC-style RL sample complexity and teaching-by-demonstration sample
complexity results. Our teaching algorithms have the potential to speed up RL
agent learning in applications where a helpful teacher is available.
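
To illustrate the paradigm (not the paper's optimal teaching algorithms), here is a toy sketch in which a teacher with full control over rewards and the order of experiences steers a tabular Q-learner to a target policy. The reward-construction rule and learner settings are assumptions chosen for illustration:

    # Hypothetical teaching-by-reinforcement sketch: the teacher picks both
    # the experiences and the rewards so the student's greedy policy matches
    # a target policy after few samples.
    N_STATES, ACTIONS = 4, [0, 1]
    target = {s: 1 for s in range(N_STATES)}      # policy the teacher wants taught

    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha = 1.0                                   # student adopts each sample at once

    def teacher_reward(s, a):
        # Teacher-chosen reward: make the target action's value dominate.
        return 1.0 if a == target[s] else -1.0

    samples = 0
    for s in range(N_STATES):
        for a in ACTIONS:                         # teacher controls the experiences
            r = teacher_reward(s, a)
            Q[(s, a)] += alpha * (r - Q[(s, a)])  # myopic (gamma = 0) Q-update
            samples += 1

    learned = {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES)}
    print(samples, learned == target)             # sample count; target reached
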