10,612 research outputs found
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback
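To make the method concrete, here is a minimal PyTorch sketch of an NFSP-style agent; this is an illustration, not the authors' code. One network is trained by Q-learning towards a best response, a second is trained by supervised learning on the agent's own best-response actions, and an anticipatory parameter eta mixes the two when acting. The paper's target network and reservoir sampling are simplified away, and all names are illustrative.

    import random
    from collections import deque

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NFSPAgent:
        def __init__(self, obs_dim, n_actions, eta=0.1, eps=0.06, lr=1e-3):
            # Best-response network (trained by DQN) and average-policy
            # network (trained supervised on the agent's own behaviour).
            self.q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                       nn.Linear(64, n_actions))
            self.avg_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                         nn.Linear(64, n_actions))
            self.opt_q = torch.optim.Adam(self.q_net.parameters(), lr=lr)
            self.opt_avg = torch.optim.Adam(self.avg_net.parameters(), lr=lr)
            self.rl_buffer = deque(maxlen=200_000)  # transitions for Q-learning
            self.sl_buffer = deque(maxlen=200_000)  # (obs, action) pairs; the
                                                    # paper uses reservoir sampling
            self.eta, self.eps, self.n_actions = eta, eps, n_actions

        def store(self, obs, a, r, obs2, done):
            self.rl_buffer.append((tuple(obs), a, r, tuple(obs2), float(done)))

        def act(self, obs):
            obs_t = torch.as_tensor(obs, dtype=torch.float32)
            if random.random() < self.eta:
                # Play the eps-greedy best response, and record the choice as
                # a supervised target for the average-policy network.
                if random.random() < self.eps:
                    a = random.randrange(self.n_actions)
                else:
                    a = int(self.q_net(obs_t).argmax())
                self.sl_buffer.append((tuple(obs), a))
            else:
                # Play the average policy.
                probs = F.softmax(self.avg_net(obs_t), dim=-1)
                a = int(torch.multinomial(probs, 1))
            return a

        def learn(self, batch=128, gamma=1.0):
            if len(self.rl_buffer) >= batch:
                s, a, r, s2, d = zip(*random.sample(self.rl_buffer, batch))
                s = torch.tensor(s, dtype=torch.float32)
                s2 = torch.tensor(s2, dtype=torch.float32)
                a = torch.tensor(a)
                r = torch.tensor(r, dtype=torch.float32)
                d = torch.tensor(d, dtype=torch.float32)
                with torch.no_grad():  # one-step Q-learning target, no target net
                    y = r + gamma * (1 - d) * self.q_net(s2).max(1).values
                q = self.q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss_q = F.mse_loss(q, y)
                self.opt_q.zero_grad(); loss_q.backward(); self.opt_q.step()
            if len(self.sl_buffer) >= batch:
                s, a = zip(*random.sample(self.sl_buffer, batch))
                logits = self.avg_net(torch.tensor(s, dtype=torch.float32))
                loss_sl = F.cross_entropy(logits, torch.tensor(a))
                self.opt_avg.zero_grad(); loss_sl.backward(); self.opt_avg.step()

In self-play, each player runs its own copy of such an agent; it is the average-policy networks that approach the Nash equilibrium.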
Deep Q-Learning for Nash Equilibria: Nash-DQN
Model-free learning for multi-agent stochastic games is an active area of
research. Existing reinforcement learning algorithms, however, are often
restricted to zero-sum games, and are applicable only in small state-action
spaces or other simplified settings. Here, we develop a new data-efficient
deep Q-learning methodology for model-free learning of Nash equilibria for
general-sum stochastic games. The algorithm uses a local linear-quadratic
expansion of the stochastic game, which leads to analytically solvable optimal
actions. The expansion is parametrized by deep neural networks to give it
sufficient flexibility to learn the environment without the need to experience
all state-action pairs. We study symmetry properties of the algorithm stemming
from label-invariant stochastic games and as a proof of concept, apply our
algorithm to learning optimal trading strategies in competitive electronic
markets.
Comment: 16 pages, 4 figures
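The analytically solvable step can be illustrated with a small sketch for two players with scalar actions; the names, shapes, and the concavity trick below are assumptions made for this example rather than the paper's exact parametrization. A network emits the coefficients of each player's local linear-quadratic Q-model, and the stage Nash action comes out of a 2x2 linear solve of the stacked first-order conditions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NashQHead(nn.Module):
        # Maps a state to a local linear-quadratic model
        # Q^i(s, a) = V^i(s) + b^i(s).a + 0.5 a^T H^i(s) a for players i = 1, 2.
        def __init__(self, obs_dim, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.V = nn.Linear(hidden, 2)   # one state value per player
            self.b = nn.Linear(hidden, 4)   # linear coefficients b^i in R^2
            self.H = nn.Linear(hidden, 8)   # quadratic coefficients H^i in R^{2x2}

        def forward(self, s):
            h = self.body(s)
            V = self.V(h)                    # (batch, 2)
            b = self.b(h).view(-1, 2, 2)     # b[:, i] is player i's linear term
            H = self.H(h).view(-1, 2, 2, 2)  # H[:, i] is player i's Hessian
            H = 0.5 * (H + H.transpose(-1, -2))          # symmetrize each H^i
            diag = torch.diagonal(H, dim1=-2, dim2=-1)
            # Force negative curvature so each first-order condition is a max.
            H = H - torch.diag_embed(diag) + torch.diag_embed(-F.softplus(diag))
            return V, b, H

    def nash_action(b, H):
        # Stack each player's first-order condition b^i_i + (H^i a)_i = 0 and
        # solve the resulting 2x2 linear system (assumed invertible).
        M = torch.stack([H[:, 0, 0, :], H[:, 1, 1, :]], dim=1)  # (batch, 2, 2)
        c = torch.stack([b[:, 0, 0], b[:, 1, 1]], dim=1)        # (batch, 2)
        return torch.linalg.solve(M, -c.unsqueeze(-1)).squeeze(-1)

    V, b, H = NashQHead(obs_dim=3)(torch.randn(4, 3))
    a_star = nash_action(b, H)   # (4, 2): the stage Nash action per state

A full training loop would then follow DQN, replacing the max over actions with the quadratic model evaluated at the Nash action.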
Approximately Solving Mean Field Games via Entropy-Regularized Deep Reinforcement Learning
The recent mean field game (MFG) formalism facilitates otherwise intractable
computation of approximate Nash equilibria in many-agent settings. In this
paper, we consider discrete-time finite MFGs subject to finite-horizon
objectives. We show that all discrete-time finite MFGs with non-constant fixed
point operators fail to be contractive as typically assumed in existing MFG
literature, barring convergence via fixed point iteration. Instead, we
incorporate entropy-regularization and Boltzmann policies into the fixed point
iteration. As a result, we obtain provable convergence to approximate fixed
points where existing methods fail, and reach the original goal of approximate
Nash equilibria. All proposed methods are evaluated with respect to their
exploitability, on both instructive examples with tractable exact solutions and
high-dimensional problems where exact methods become intractable. In
high-dimensional scenarios, we apply established deep reinforcement learning
methods and empirically combine fictitious play with our approximations.
Comment: Accepted to the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)
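As a toy illustration of the regularized iteration (a sketch with made-up dynamics and rewards, not the paper's benchmarks): soft backward induction against a frozen mean-field flow yields a Boltzmann policy, and pushing the initial distribution through that policy produces the next flow.

    import numpy as np

    S, A, T = 5, 3, 10                  # states, actions, horizon (toy sizes)
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
    mu0 = np.full(S, 1 / S)                     # initial state distribution

    def reward(s, a, mu):
        return -mu[s]                   # crowd aversion: busy states pay less

    def boltzmann_best_response(mu_flow, temp=1.0):
        # Soft backward induction against the fixed flow mu_flow[0..T-1].
        V = np.zeros(S)
        pi = np.zeros((T, S, A))
        for t in reversed(range(T)):
            Q = np.array([[reward(s, a, mu_flow[t]) + P[s, a] @ V
                           for a in range(A)] for s in range(S)])
            expQ = np.exp(Q / temp)
            pi[t] = expQ / expQ.sum(axis=1, keepdims=True)  # Boltzmann policy
            V = temp * np.log(expQ.sum(axis=1))             # soft (regularized) value
        return pi

    def induced_flow(pi):
        # Mean-field flow generated when the whole population plays pi.
        mu = [mu0]
        for t in range(T - 1):
            mu.append(sum(mu[t][s] * pi[t, s, a] * P[s, a]
                          for s in range(S) for a in range(A)))
        return np.array(mu)

    mu_flow = np.tile(mu0, (T, 1))
    for _ in range(50):                 # entropy-regularized fixed-point iteration
        mu_flow = induced_flow(boltzmann_best_response(mu_flow))

Sending temp toward zero recovers the unregularized iteration, which by the result above generally fails to be contractive; a positive temperature is what buys the provable convergence to an approximate fixed point.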
On the Convergence of Model Free Learning in Mean Field Games
Learning by experience in Multi-Agent Systems (MAS) is a difficult and
exciting task, due to the lack of stationarity of the environment, whose
dynamics evolve as the population learns. In order to design scalable
algorithms for systems with a large population of interacting agents (e.g.
swarms), this paper focuses on Mean Field MAS, where the number of agents is
asymptotically infinite. Recently, a very active, burgeoning field has studied
the effects of diverse reinforcement learning algorithms for agents that have
no prior information on a stationary Mean Field Game (MFG) and learn their
policies through repeated experience. We adopt a high-level perspective on this problem and
analyze in full generality the convergence of a fictitious iterative scheme
using any single-agent learning algorithm at each step. We quantify the quality
of the computed approximate Nash equilibrium in terms of the accumulated
errors arising at each learning iteration step. Notably, we show for the first
time the convergence of model-free learning algorithms towards non-stationary MFG
equilibria, relying only on classical assumptions on the MFG dynamics. We
illustrate our theoretical results with a numerical experiment in a continuous
action-space environment, where the approximate best response of the iterative
fictitious play scheme is computed with a deep RL algorithm.
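The scheme under analysis boils down to a short skeleton; the learner and simulator below are hypothetical placeholders, and in the paper's experiment the approximate best response is computed by a deep RL agent.

    import numpy as np

    def fictitious_play(mu_init, approx_best_response, induced_flow, n_iters=100):
        # approx_best_response(mu): any single-agent learner run against the
        #                           frozen mean field mu (e.g. a deep RL agent)
        # induced_flow(pi):         mean-field flow generated when the whole
        #                           population plays the policy pi
        mu_bar = np.asarray(mu_init, dtype=float)
        for k in range(1, n_iters + 1):
            pi_k = approx_best_response(mu_bar)   # epsilon_k-approximate BR
            mu_k = induced_flow(pi_k)
            mu_bar += (mu_k - mu_bar) / k         # uniform fictitious-play average
        return mu_bar

The convergence analysis then bounds the quality of the returned average in terms of the per-iteration errors epsilon_k accumulated by the learner.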
Scaling up Mean Field Games with Online Mirror Descent
We address scaling up equilibrium computation in Mean Field Games (MFGs)
using Online Mirror Descent (OMD). We show that continuous-time OMD provably
converges to a Nash equilibrium under a natural and well-motivated set of
monotonicity assumptions. This theoretical result nicely extends to
multi-population games and to settings involving common noise. A thorough
experimental investigation on various single and multi-population MFGs shows
that OMD outperforms traditional algorithms such as Fictitious Play (FP). We
empirically show that OMD scales up and converges significantly faster than FP
by solving, for the first time to our knowledge, examples of MFGs with hundreds
of billions of states. This study establishes the state of the art for learning
in large-scale multi-agent and multi-population games.
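The core update behind these results is compact. In the sketch below (computing the Q-values of the current policy against the current mean field is a policy-evaluation step left abstract here), OMD accumulates Q-values in a dual variable and maps them to a policy through a softmax:

    import numpy as np

    def omd_step(y, Q, lr=0.1):
        # y: cumulative Q-values (the dual variable), shape (..., n_actions)
        # Q: Q-values of the current policy against the current mean field,
        #    produced by a policy-evaluation step (not shown)
        y = y + lr * Q                                 # accumulate in dual space
        z = np.exp(y - y.max(axis=-1, keepdims=True))  # stabilized softmax is
        pi = z / z.sum(axis=-1, keepdims=True)         # the mirror map
        return y, pi

Each iteration thus needs only a policy evaluation rather than a full best-response computation, which is one way to read the speed advantage over FP reported above.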