7,488 research outputs found
Agent Modeling as Auxiliary Task for Deep Reinforcement Learning
In this paper we explore how actor-critic methods in deep reinforcement
learning, in particular Asynchronous Advantage Actor-Critic (A3C), can be
extended with agent modeling. Inspired by recent works on representation
learning and multiagent deep reinforcement learning, we propose two
architectures to perform agent modeling: the first one based on parameter
sharing, and the second one based on agent policy features. Both architectures
aim to learn other agents' policies as auxiliary tasks, besides the standard
actor (policy) and critic (values). We performed experiments in both
cooperative and competitive domains. The former is a problem of coordinated
multiagent object transportation and the latter is a two-player mini version of
the Pommerman game. Our results show that the proposed architectures stabilize
learning and outperform the standard A3C architecture when learning a best
response in terms of expected rewards.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19)
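The core idea — learning other agents' policies as auxiliary tasks alongside the actor and critic — can be sketched as a combined loss. This is a minimal illustration, not the paper's architecture: the auxiliary weight `beta_aux` and the plain softmax cross-entropy on opponent actions are assumptions for the sketch.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def a3c_aux_loss(policy_logits, values, returns, actions,
                 opponent_logits, opponent_actions, beta_aux=0.5):
    """Actor-critic loss plus an auxiliary opponent-policy
    prediction term (illustrative weighting, not the paper's)."""
    n = len(actions)
    probs = softmax(policy_logits)
    advantages = returns - values
    # standard policy-gradient term on the agent's own actions
    policy_loss = -(np.log(probs[np.arange(n), actions] + 1e-8)
                    * advantages).mean()
    # critic regression toward the observed returns
    value_loss = 0.5 * (advantages ** 2).mean()
    # auxiliary task: cross-entropy on the other agent's actions
    opp_probs = softmax(opponent_logits)
    aux_loss = -np.log(opp_probs[np.arange(n), opponent_actions]
                       + 1e-8).mean()
    return policy_loss + value_loss + beta_aux * aux_loss
```

In a real implementation the opponent-policy head would share the representation trunk with the actor and critic, which is what lets the auxiliary gradients shape the learned features.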
Improving Search through A3C Reinforcement Learning based Conversational Agent
We develop a reinforcement learning based search assistant which can assist
users, through a set of actions and a sequence of interactions, to realize
their intent. Our approach caters to subjective search, where the user is
seeking digital assets such as images, which is fundamentally different from
tasks that have objective and limited search modalities. Labeled
conversational data is generally not available in such search tasks and
training the agent through human interactions can be time-consuming. We
propose a stochastic virtual user which impersonates a real user and can be
used to sample user behavior efficiently, accelerating the bootstrapping of
the agent. We develop an A3C-based context-preserving architecture which
enables the agent to provide contextual assistance to the
user. We compare the A3C agent with Q-learning and evaluate its performance on
average rewards and state values it obtains with the virtual user in validation
episodes. Our experiments show that the agent learns to achieve higher rewards
and better states.
Comment: 17 pages, 7 figures
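The stochastic virtual user idea — a simulated user that samples plausible responses so the agent can be trained without slow human interaction — can be sketched as below. The class name, response vocabulary, and reward values are hypothetical, chosen only to illustrate the sampling interface.

```python
import random

class VirtualUser:
    """Stochastic stand-in for a real user: holds a hidden intent
    and samples feedback to agent actions (illustrative sketch)."""

    def __init__(self, intent, seed=0):
        self.intent = intent
        self.rng = random.Random(seed)  # seeded for reproducible rollouts

    def respond(self, agent_action):
        # the agent is rewarded when it surfaces the hidden intent
        if agent_action == self.intent:
            return ("accept", 1.0)
        # otherwise sample stochastic feedback, as a real user might
        return self.rng.choice([("refine", 0.1), ("reject", -0.2)])
```

Training then loops over episodes of `respond` calls instead of live sessions, which is what makes bootstrapping the A3C (or Q-learning) agent cheap.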
Mutual Alignment Transfer Learning
Training robots for operation in the real world is a complex, time-consuming,
and potentially expensive task. Despite significant success of reinforcement
learning in games and simulations, research in real robot applications has not
been able to match similar progress. While sample complexity can be reduced by
training policies in simulation, such policies can perform sub-optimally on the
real platform given imperfect calibration of model dynamics. We present an
approach -- supplemental to fine-tuning on the real robot -- to further benefit
from parallel access to a simulator during training and reduce sample
requirements on the real robot. The developed approach harnesses auxiliary
rewards to guide the exploration for the real world agent based on the
proficiency of the agent in simulation and vice versa. In this context, we
demonstrate empirically that the reciprocal alignment for both agents provides
further benefit as the agent in simulation can adjust to optimize its behaviour
for states commonly visited by the real-world agent.
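The reciprocal auxiliary-reward idea can be illustrated with a discriminator-style sketch: a classifier scores whether a state came from simulation or the real platform, and each agent is rewarded for visiting states that resemble the other domain's visitation. The logistic form and the exact reward shaping are assumptions of this sketch, not the paper's precise formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mutual_alignment_rewards(disc_logits):
    """disc_logits: discriminator logits over a batch of states,
    high values meaning 'looks like a simulation state'.
    Returns auxiliary rewards for the real-world agent and the
    simulation agent (reciprocal alignment, illustrative form)."""
    p_sim = sigmoid(disc_logits)
    # real agent: bonus for reaching states the sim agent frequents
    r_real = np.log(p_sim + 1e-8)
    # sim agent: bonus for adjusting toward states the real agent visits
    r_sim = np.log(1.0 - p_sim + 1e-8)
    return r_real, r_sim
```

These bonuses would be added to each agent's task reward during training, so exploration on the real robot is guided by the simulator's proficiency and vice versa.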
Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces, and sparse rewards.