68,576 research outputs found
Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds
We describe a method to use discrete human feedback to enhance the
performance of deep learning agents in virtual three-dimensional environments
by extending deep reinforcement learning (DRL) to model the confidence and
consistency of human feedback. This enables DRL
algorithms to determine the most appropriate time to listen to the human
feedback, exploit the current policy model, or explore the agent's environment.
Managing the trade-off between these three strategies allows DRL agents to be
robust to inconsistent or intermittent human feedback. Through experimentation
using a synthetic oracle, we show that our technique improves the training
speed and overall performance of deep reinforcement learning in navigating
three-dimensional environments using Minecraft. We further show that our
technique is robust to highly inaccurate human feedback and can also operate
when no human feedback is given.
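
As a rough illustration of the explore/exploit/listen trade-off described above, the sketch below arbitrates among the three strategies with a confidence threshold and an epsilon-greedy rule. The action set, the Q-table, and the (action, confidence) feedback structure are illustrative assumptions, not the paper's actual model.

    import random

    ACTIONS = ["forward", "left", "right"]  # hypothetical discrete action set

    def select_action(state, q_values, feedback, epsilon=0.1, trust=0.7):
        """Arbitrate between listening, exploiting, and exploring.

        q_values: dict mapping (state, action) -> estimated return.
        feedback: dict mapping state -> (advised_action, confidence), where
                  confidence reflects how consistent the human's past advice
                  has been (both structures are assumptions, not the paper's
                  actual model).
        """
        advice = feedback.get(state)
        if advice is not None:
            action, confidence = advice
            if confidence >= trust:        # listen to the human
                return action
        if random.random() < epsilon:      # explore
            return random.choice(ACTIONS)
        # exploit the current policy model
        return max(ACTIONS, key=lambda a: q_values.get((state, a), 0.0))

An agent using such a rule degrades gracefully: with no feedback entries it reduces to plain epsilon-greedy, and low-confidence advice is simply ignored.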
Learning to Teach Reinforcement Learning Agents
In this article we study the transfer learning model of action advice under a
budget. We focus on reinforcement learning teachers providing action advice to
heterogeneous students playing the game of Pac-Man under a limited advice
budget. First, we examine several critical factors affecting advice quality in
this setting, such as the average performance of the teacher, its variance and
the importance of reward discounting in advising. The experiments show the
non-trivial importance of the coefficient of variation (CV) as a statistic for
choosing policies that generate advice. The CV statistic normalizes a
policy's standard deviation of returns by its mean, so variability is judged
relative to average performance. Second, the article studies policy learning for
distributing advice under a budget. Whereas most methods in the relevant
literature rely on heuristics for advice distribution, we formulate the problem
as a learning one and propose a novel RL algorithm capable of learning when to
advise, adapting to the student and the task at hand. Furthermore, we argue
that learning to advise under a budget is an instance of a more generic
learning problem: Constrained Exploitation Reinforcement Learning.
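
For concreteness, here is a minimal sketch of the CV statistic as a criterion for choosing an advising policy. The two hypothetical teachers and their evaluation returns are invented for illustration; the paper's actual selection procedure may differ.

    import statistics

    def coefficient_of_variation(returns):
        """CV = standard deviation / mean of a policy's episode returns.
        A lower CV means performance is more reliable per unit of
        average return."""
        return statistics.stdev(returns) / statistics.mean(returns)

    # Hypothetical teachers: pick the one with the lowest CV rather
    # than simply the highest mean return.
    candidate_returns = {
        "teacher_a": [900, 950, 910, 940],    # steady, slightly lower mean
        "teacher_b": [700, 1400, 300, 1600],  # higher mean, very erratic
    }
    best = min(candidate_returns,
               key=lambda k: coefficient_of_variation(candidate_returns[k]))
    print(best)  # -> "teacher_a"

Here teacher_b has the higher mean return (1000 vs. 925) but a CV roughly twenty times larger, so the CV criterion prefers the steadier teacher_a.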
Vision-based reinforcement learning using approximate policy iteration
A major issue for reinforcement learning (RL) applied to robotics is the time required to learn a new skill. While RL has been used to learn mobile robot control in many simulated domains, applications involving learning on real
robots are still relatively rare. In this paper, the Least-Squares Policy Iteration (LSPI) reinforcement learning algorithm and a new model-based algorithm, Least-Squares Policy Iteration with Prioritized Sweeping (LSPI+), are implemented on a mobile robot to acquire new skills quickly and efficiently. LSPI+ combines the benefits of LSPI and prioritized sweeping, which uses all previous experience to focus the computational effort on the most "interesting" or dynamic parts of the state space.
The proposed algorithms are tested on a household vacuum
cleaner robot for learning a docking task using vision as the only sensor modality. In experiments, these algorithms are compared to other model-based and model-free RL algorithms. The results show that the number of trials required to learn the docking task is significantly reduced using LSPI compared to the other RL algorithms investigated, and that LSPI+ further improves on the performance of LSPI.
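
The paper's LSPI+ variant and its vision features are not reproduced here; the following is a minimal sketch of the LSTDQ solve at the core of standard LSPI, assuming a linear Q-function with a hypothetical feature map phi(s, a) and a small discrete action set.

    import numpy as np

    def lstdq(samples, phi, actions, w, gamma=0.99):
        """One LSTDQ solve inside LSPI: fit linear Q-function weights from
        a batch of (s, a, r, s_next) transitions.  phi(s, a) is an assumed
        feature map returning a 1-D numpy array."""
        k = w.shape[0]
        A = np.zeros((k, k))
        b = np.zeros(k)
        for s, a, r, s_next in samples:
            f = phi(s, a)
            # Greedy action of the current policy at the next state.
            a_next = max(actions, key=lambda u: phi(s_next, u) @ w)
            A += np.outer(f, f - gamma * phi(s_next, a_next))
            b += r * f
        # Small ridge term guards against a singular A on small batches.
        return np.linalg.solve(A + 1e-6 * np.eye(k), b)

    def lspi(samples, phi, actions, k, iters=20, tol=1e-6):
        """Policy iteration: re-solve LSTDQ under the policy implied by the
        previous weights until the weights stop changing."""
        w = np.zeros(k)
        for _ in range(iters):
            w_new = lstdq(samples, phi, actions, w)
            if np.linalg.norm(w_new - w) < tol:
                return w_new
            w = w_new
        return w

Because LSTDQ reuses the same batch of transitions at every iteration, sample efficiency comes from repeated solves rather than repeated robot trials, which is what makes the approach attractive on real hardware.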
- …