Convergence of Multi-Agent Learning with a Finite Step Size in General-Sum Games
Learning in a multi-agent system is challenging because agents are
simultaneously learning and the environment is not stationary, undermining
convergence guarantees. To address this challenge, this paper presents a new
gradient-based learning algorithm, called Gradient Ascent with Shrinking Policy
Prediction (GA-SPP), which augments the basic gradient ascent approach with the
concept of shrinking policy prediction. The key idea behind this algorithm is
that an agent adjusts its strategy in response to the forecasted strategy of
the other agent, instead of its current one. GA-SPP is shown formally to have
Nash convergence in larger settings than existing gradient-based multi-agent
learning methods. Furthermore, unlike existing gradient-based methods, GA-SPP's
theoretical guarantees do not assume the learning rate to be infinitesimal. Comment: AAMAS 201
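The core idea, reacting to a forecast of the opponent's strategy rather than its current one, can be sketched for a two-player, two-action game. This is a minimal sketch under stated assumptions, not the paper's GA-SPP update rule. Each agent forecasts the other's next mixed strategy by taking one gradient step of length `gamma` along the opponent's own payoff gradient, which assumes each agent knows the other's payoff matrix. The prediction-length schedule `shrinking(t)` and its constants `gamma0`, `gamma_min` and `decay` are hypothetical, and the names `ga_pp_sketch`, `grad_row` and `grad_col` are illustrative.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def clip01(p: float) -> float:
    """Project a probability back onto [0, 1]."""
    return min(1.0, max(0.0, p))


def grad_row(R: Matrix, beta: float) -> float:
    """dV1/dalpha, where alpha = P(row plays 0) and beta = P(column plays 0)."""
    return beta * (R[0][0] - R[1][0]) + (1 - beta) * (R[0][1] - R[1][1])


def grad_col(C: Matrix, alpha: float) -> float:
    """dV2/dbeta for the column player's payoff matrix C."""
    return alpha * (C[0][0] - C[0][1]) + (1 - alpha) * (C[1][0] - C[1][1])


def shrinking(t: int, gamma0: float = 0.1, gamma_min: float = 0.05,
              decay: float = 0.999) -> float:
    """Hypothetical prediction-length schedule: geometric decay down to a floor."""
    return max(gamma_min, gamma0 * decay ** t)


def ga_pp_sketch(R: Matrix, C: Matrix, alpha: float, beta: float,
                 eta: float = 0.01, steps: int = 2000,
                 gamma: Callable[[int], float] = shrinking) -> Tuple[float, float]:
    for t in range(steps):
        g = gamma(t)
        # Forecast each opponent's next strategy with one gradient step of length g
        # (assumes each agent can compute the other's payoff gradient).
        beta_pred = clip01(beta + g * grad_col(C, alpha))
        alpha_pred = clip01(alpha + g * grad_row(R, beta))
        # Ascend one's own payoff against the forecast, not the current strategy;
        # both agents update simultaneously from the old (alpha, beta).
        alpha, beta = (clip01(alpha + eta * grad_row(R, beta_pred)),
                       clip01(beta + eta * grad_col(C, alpha_pred)))
    return alpha, beta


# Matching pennies: the row player wins by matching, the column player by mismatching.
R = [[1.0, -1.0], [-1.0, 1.0]]
C = [[-1.0, 1.0], [1.0, -1.0]]
a, b = ga_pp_sketch(R, C, alpha=0.7, beta=0.4)
print(round(a, 3), round(b, 3))  # 0.5 0.5
```

In this matching-pennies instance the forecast acts as a damping term. While the iterates stay inside the unit square, each step scales the distance to the equilibrium (0.5, 0.5) by sqrt((1 - 16*eta*gamma)^2 + 16*eta^2), which is below 1 for the step size and prediction lengths used here. Setting `gamma` to zero recovers plain gradient ascent with the same finite step size, for which the factor is sqrt(1 + 16*eta^2) > 1, so the iterates spiral outward instead.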
A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent scenarios and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone, raising
practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community. Comment: Under review since Oct 2018. Earlier versions of this work had the
title: "Is multiagent deep reinforcement learning the answer or the question?
A brief survey"
A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
The key challenge in multiagent learning is learning a best response to the
behaviour of other agents, which may be non-stationary: if the other agents
adapt their strategy as well, the learning target moves. Disparate streams of
research have approached non-stationarity from several angles, each making a
variety of implicit assumptions; this makes it hard to keep an overview of the
state of the art and to validate the novelty and significance of new works.
This survey presents a coherent overview of work that addresses
opponent-induced non-stationarity with tools from game theory, reinforcement
learning and multi-armed bandits. Further, we reflect on the principal
approaches by which algorithms model and cope with this non-stationarity, arriving
at a new framework and five categories (in increasing order of sophistication):
ignore, forget, respond to target models, learn models, and theory of mind. A
wide range of state-of-the-art algorithms is classified into a taxonomy, using
these categories and key characteristics of the environment (e.g.,
observability) and adaptation behaviour of the opponents (e.g., smooth,
abrupt). To clarify even further, we present illustrative variations of one
domain, contrasting the strengths and limitations of each category. Finally, we
discuss in which environments the different approaches have the most merit, and
point to promising avenues of future research. Comment: 64 pages, 7 figures. Under review since November 201
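The difference between the first two categories, "ignore" and "forget", shows up in a toy setting of our own construction, not an example taken from the survey. A hypothetical opponent in matching pennies plays one action and then switches abruptly to the other, and the learner best-responds to an estimate of the opponent's mixed strategy. The "ignore" learner treats the opponent as stationary and uses the full-history frequency. The "forget" learner uses an exponential moving average with an assumed rate of 0.05. All function names, the switch point and the constants are illustrative.

```python
from typing import List


def opponent_actions(n: int = 1000, switch_at: int = 600) -> List[int]:
    """Hypothetical opponent: plays action 0, then switches abruptly to action 1."""
    return [0 if t < switch_at else 1 for t in range(n)]


def ignore_estimate(history: List[int]) -> float:
    """'Ignore': assume stationarity and use the full-history frequency of action 0."""
    return sum(1 for a in history if a == 0) / len(history)


def forget_estimate(history: List[int], rate: float = 0.05, prior: float = 0.5) -> float:
    """'Forget': exponentially discount old observations so recent play dominates."""
    p = prior
    for a in history:
        p += rate * ((1.0 if a == 0 else 0.0) - p)
    return p


def best_response_matcher(p0: float) -> int:
    """Row player in matching pennies wins by matching: play 0 if the opponent favours 0."""
    return 0 if p0 > 0.5 else 1


history = opponent_actions()
print(best_response_matcher(ignore_estimate(history)),
      best_response_matcher(forget_estimate(history)),
      history[-1])  # 0 1 1
```

After the switch at step 600, the full-history frequency of action 0 is still 0.6, so the "ignore" learner keeps matching the stale action. The discounted estimate has decayed to roughly 0.95^400, on the order of 1e-9, so the "forget" learner tracks the opponent's current action.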