A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent scenarios and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone, raising
practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community.
Comment: Under review since Oct 2018. Earlier versions of this work had the
title: "Is multiagent deep reinforcement learning the answer or the question?
A brief survey"
Agent Modeling as Auxiliary Task for Deep Reinforcement Learning
In this paper we explore how actor-critic methods in deep reinforcement
learning, in particular Asynchronous Advantage Actor-Critic (A3C), can be
extended with agent modeling. Inspired by recent works on representation
learning and multiagent deep reinforcement learning, we propose two
architectures to perform agent modeling: the first one based on parameter
sharing, and the second one based on agent policy features. Both architectures
aim to learn other agents' policies as auxiliary tasks, besides the standard
actor (policy) and critic (values). We performed experiments in both
cooperative and competitive domains. The former is a problem of coordinated
multiagent object transportation and the latter is a two-player mini version of
the Pommerman game. Our results show that the proposed architectures stabilize
learning and outperform the standard A3C architecture when learning a best
response in terms of expected rewards.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital
Entertainment (AIIDE'19)
Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning
Deep reinforcement learning has achieved great successes in recent years, but
there are still open challenges, such as convergence to locally optimal
policies and sample inefficiency. In this paper, we contribute a novel
self-supervised auxiliary task, i.e., Terminal Prediction (TP), estimating
temporal closeness to terminal states for episodic tasks. The intuition is to
help representation learning by letting the agent predict how close it is to a
terminal state, while learning its control policy. Although TP could be
integrated with multiple algorithms, this paper focuses on Asynchronous
Advantage Actor-Critic (A3C) and demonstrates the advantages of A3C-TP. Our
extensive evaluation includes: a set of Atari games, the BipedalWalker domain,
and a mini version of the recently proposed multi-agent Pommerman game. Our
results on Atari games and the BipedalWalker domain suggest that A3C-TP
outperforms standard A3C in most of the tested domains and in others it has
similar performance. In Pommerman, our proposed method provides significant
improvement both in learning efficiency and converging to better policies
against different opponents.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital
Entertainment (AIIDE'19). arXiv admin note: text overlap with
arXiv:1812.0004
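As a rough illustration of the Terminal Prediction idea summarized above, the
sketch below builds per-step regression targets for one finished episode and
adds a weighted auxiliary loss to a base loss. The abstract does not give the
exact target or loss, so the t / T target, the mean squared error, the weight
value, and all function names here are assumptions made for illustration, not
the authors' implementation.

```python
from typing import List


def terminal_prediction_targets(episode_length: int) -> List[float]:
    """Assumed TP labels: fraction of the episode elapsed at each step (t / T)."""
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    # Step t (1-indexed) gets label t / T, so the final step is labeled 1.0.
    return [t / episode_length for t in range(1, episode_length + 1)]


def tp_loss(predictions: List[float], targets: List[float]) -> float:
    """Mean squared error between predicted and target closeness values."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("targets must not be empty")
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def combined_loss(base_loss: float, aux_loss: float, weight: float = 0.5) -> float:
    """Base (e.g., actor-critic) loss plus a weighted auxiliary loss (weight is hypothetical)."""
    return base_loss + weight * aux_loss


if __name__ == "__main__":
    targets = terminal_prediction_targets(4)
    predictions = [0.2, 0.5, 0.7, 0.9]  # made-up outputs of a prediction head
    print(targets)
    print(combined_loss(1.0, tp_loss(predictions, targets)))
```

The targets can only be computed once an episode has ended, because the
episode length T is not known beforehand.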
Action Guidance with MCTS for Deep Reinforcement Learning
Deep reinforcement learning has achieved great successes in recent years;
however, one main challenge is sample inefficiency. In this paper, we focus
on how to use action guidance by means of a non-expert demonstrator to improve
sample efficiency in a domain with sparse, delayed, and possibly deceptive
rewards: the recently-proposed multi-agent benchmark of Pommerman. We propose a
new framework where even a non-expert simulated demonstrator, e.g., planning
algorithms such as Monte Carlo tree search with a small number of rollouts, can be
integrated within asynchronous distributed deep reinforcement learning methods.
Compared to a vanilla deep RL algorithm, our proposed methods both learn faster
and converge to better policies on a two-player mini version of the Pommerman
game.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital
Entertainment (AIIDE'19). arXiv admin note: substantial text overlap with
arXiv:1904.05759, arXiv:1812.0004
On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman
How to best explore in domains with sparse, delayed, and deceptive rewards is
an important open problem for reinforcement learning (RL). This paper considers
one such domain, the recently-proposed multi-agent benchmark of Pommerman. This
domain is very challenging for RL: past work has shown that model-free RL
algorithms fail to achieve significant learning without artificially reducing
the environment's complexity. In this paper, we illuminate reasons behind this
failure by providing a thorough analysis of the hardness of random exploration
in Pommerman. While model-free random exploration is typically futile, we
develop a model-based automatic reasoning module that can be used for safer
exploration by pruning actions that will surely lead the agent to death. We
empirically demonstrate that this module can significantly improve learning.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital
Entertainment (AIIDE) 201
Residual Stresses in Alloy IN718 Produced through Modulated Laser Powder Bed Fusion
Open access via the Springer Agreement
Peer reviewed