Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
While recent advances in deep reinforcement learning have allowed autonomous
learning agents to succeed at a variety of complex tasks, existing algorithms
generally require a lot of training data. One way to increase the speed at
which agents are able to learn to perform tasks is by leveraging the input of
human trainers. Although such input can take many forms, real-time,
scalar-valued feedback is especially useful in situations where it proves
difficult or impossible for humans to provide expert demonstrations. Previous
approaches have shown the usefulness of human input provided in this fashion
(e.g., the TAMER framework), but they have thus far not considered
high-dimensional state spaces or employed the use of deep learning. In this
paper, we do both: we propose Deep TAMER, an extension of the TAMER framework
that leverages the representational power of deep neural networks in order to
learn complex tasks in just a short amount of time with a human trainer. We
demonstrate Deep TAMER's success by using it and just 15 minutes of
human-provided feedback to train an agent that performs better than humans on
the Atari game of Bowling - a task that has proven difficult for even
state-of-the-art reinforcement learning methods.
Comment: 9 pages, 6 figures
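The core idea the abstract builds on, learning a model of the trainer's scalar feedback and acting greedily on it, can be illustrated with a small sketch. This is not the Deep TAMER implementation: a linear model stands in for the deep network, and the class name, the simulated trainer, and all hyperparameters are assumptions made up for illustration.

```python
import random


class TamerStyleAgent:
    """Illustrative agent that models human reward as H(s, a) = w[a] . s.

    A linear model is an assumption made for brevity; Deep TAMER itself
    uses a deep neural network for this role.
    """

    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def predict(self, state, action):
        # Predicted scalar feedback for taking `action` in `state`.
        return sum(wi * si for wi, si in zip(self.w[action], state))

    def act(self, state):
        # Act greedily with respect to the predicted human feedback.
        scores = [self.predict(state, a) for a in range(len(self.w))]
        return scores.index(max(scores))

    def update(self, state, action, feedback):
        # One stochastic-gradient step on the squared prediction error.
        error = feedback - self.predict(state, action)
        self.w[action] = [
            wi + self.lr * error * si for wi, si in zip(self.w[action], state)
        ]


def simulated_trainer(x, action):
    """Hypothetical trainer: approves action 1 when x > 0, else action 0."""
    return 1.0 if action == (1 if x > 0 else 0) else -1.0


def train_demo(steps=2000, seed=0):
    rng = random.Random(seed)
    agent = TamerStyleAgent(n_features=2, n_actions=2)
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        state = [x, 1.0]  # the second feature is a constant bias term
        # Actions are sampled uniformly here so that both actions receive
        # feedback; an interactive agent would normally call agent.act().
        action = rng.randrange(2)
        agent.update(state, action, simulated_trainer(x, action))
    return agent
```

After `train_demo()`, calling `agent.act([x, 1.0])` should pick the action the simulated trainer approves of for states well away from `x = 0`.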
Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds
We describe a method to use discrete human feedback to enhance the
performance of deep learning agents in virtual three-dimensional environments
by extending deep reinforcement learning to model the confidence and
consistency of human feedback. This enables deep reinforcement learning
algorithms to determine the most appropriate time to listen to the human
feedback, exploit the current policy model, or explore the agent's environment.
Managing the trade-off between these three strategies allows DRL agents to be
robust to inconsistent or intermittent human feedback. Through experimentation
using a synthetic oracle, we show that our technique improves the training
speed and overall performance of deep reinforcement learning in navigating
three-dimensional environments using Minecraft. We further show that our
technique is robust to highly inaccurate human feedback and can also operate
when no human feedback is given.
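One way to picture the three-way choice between listening, exploiting, and exploring is the sketch below. It is a hypothetical arbitration rule, not the one from the paper: the trust estimate (a running measure of how often repeated feedback on the same state-action pair agrees), the function names, and the parameters are all assumptions made for illustration.

```python
import random


class FeedbackTrust:
    """Hypothetical running estimate of how self-consistent the feedback is."""

    def __init__(self, prior=0.5, rate=0.1):
        self.value = prior  # current trust in the feedback, in [0, 1]
        self.rate = rate    # step size of the exponential moving average
        self._last = {}     # last feedback seen for each (state, action)

    def observe(self, state, action, feedback):
        key = (state, action)
        if key in self._last:
            # Move trust toward 1 when the feedback repeats, toward 0 otherwise.
            agreed = 1.0 if self._last[key] == feedback else 0.0
            self.value += self.rate * (agreed - self.value)
        self._last[key] = feedback


def choose_mode(trust, feedback_available, epsilon, rng):
    """Return "explore", "listen", or "exploit" for the current step."""
    if rng.random() < epsilon:
        return "explore"  # take a random action
    if feedback_available and rng.random() < trust:
        return "listen"   # follow the human feedback
    return "exploit"      # follow the current policy model
```

With this rule, unreliable feedback lowers `trust` and the agent listens less often, and when no feedback is available the agent falls back to exploring or exploiting.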
- …