On Learning Algorithms for Nash Equilibria
Third International Symposium, SAGT 2010, Athens, Greece, October 18-20, 2010. Proceedings
Can learning algorithms find a Nash equilibrium? This is a natural question for several reasons. Learning algorithms resemble the behavior of players in many naturally arising games, and thus results on the convergence or non-convergence properties of such dynamics may inform our understanding of the applicability of Nash equilibria as a plausible solution concept in some settings. A second reason for asking this question is the hope of proving an impossibility result, not dependent on complexity assumptions, for computing Nash equilibria via a restricted class of reasonable algorithms. In this work, we begin to answer this question by considering the dynamics of the standard multiplicative weights update learning algorithms (which are known to converge to a Nash equilibrium for zero-sum games). We revisit a 3×3 game, defined by Shapley [10] in the 1950s to establish that fictitious play does not converge in general games. For this simple game, we show via a potential function argument that in a variety of settings the multiplicative weights update algorithm fails to find the unique Nash equilibrium: the cumulative distributions of players produced by the learning dynamics actually drift away from the equilibrium.
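For intuition, here is a minimal Python sketch (my own, not from the paper) of the dynamics the abstract studies: multiplicative weights updates on one common presentation of Shapley's 3×3 game, whose unique Nash equilibrium is uniform play by both players. The payoff matrices, step size, and starting point are illustrative assumptions; the printed quantity lets you watch whether the cumulative (empirical) distributions approach uniform play or drift away from it.

```python
import numpy as np

# One common presentation of Shapley's 3x3 bimatrix game (its unique Nash
# equilibrium is uniform play by both players); the matrices and step size
# here are illustrative assumptions, not taken from the paper.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])   # row player's payoffs
B = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])   # column player's payoffs

eta = 0.1                                   # step size (assumed)
x = np.array([0.4, 0.3, 0.3])               # start slightly off equilibrium
y = np.ones(3) / 3
cum_x = np.zeros(3)

for t in range(1, 50001):
    gx, gy = A @ y, B.T @ x                 # expected payoff of each pure strategy
    x = x * np.exp(eta * gx); x /= x.sum()  # multiplicative weights update
    y = y * np.exp(eta * gy); y /= y.sum()
    cum_x += x
    if t % 10000 == 0:
        # L-infinity distance of the row player's empirical (cumulative)
        # distribution from the uniform equilibrium.
        print(t, np.abs(cum_x / t - 1 / 3).max())
```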
A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games
This paper proposes novel, end-to-end deep reinforcement learning algorithms
for learning two-player zero-sum Markov games. Our objective is to find Nash
equilibrium policies, which are free from exploitation by adversarial
opponents. Distinct from prior efforts on finding Nash equilibria in
extensive-form games such as Poker, which feature tree-structured transition
dynamics and discrete state spaces, this paper focuses on Markov games with
general transition dynamics and continuous state spaces. We propose (1) the
Nash DQN algorithm, which integrates DQN with a Nash-finding subroutine for
the joint value functions; and (2) the Nash DQN Exploiter algorithm, which
additionally adopts an exploiter to guide the agent's exploration. Our
algorithms are practical variants of theoretical algorithms that are
guaranteed to converge to Nash equilibria in the basic tabular setting.
Experimental evaluation on both tabular examples and two-player Atari games
demonstrates the robustness of the proposed algorithms against adversarial
opponents, as well as their advantage in performance over existing methods.
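In the tabular zero-sum case, the "Nash-finding subroutine" the abstract mentions can be understood as solving the stage matrix game induced by the joint Q-values. Below is a minimal sketch of such a subroutine (my own, not the paper's code) using the standard zero-sum linear program via scipy; the function name and the matching-pennies usage example are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(Q):
    """Nash value and max-player strategy for a zero-sum matrix game.

    Q[i, j] is the payoff to the row (max) player. Solves the standard LP:
    maximize v subject to sum_i x_i * Q[i, j] >= v for all j, x a distribution.
    (A sketch under the stated assumptions, not the paper's implementation.)
    """
    n, m = Q.shape
    # Variables z = (x_1..x_n, v); linprog minimizes, so c = (0, ..., 0, -1).
    c = np.zeros(n + 1); c[-1] = -1.0
    # For each column j: v - sum_i x_i * Q[i, j] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # x sums to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]              # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1], res.x[:-1]  # game value, row player's mixed strategy

# Usage on matching pennies: value ~0, strategy ~[0.5, 0.5].
v, x = solve_zero_sum(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(v, x)
```

In a Nash DQN-style update, the Bellman backup at each state would use this stage-game value computed over the joint Q-matrix in place of the single-agent max.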
A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
Algorithms designed for single-agent reinforcement learning (RL) generally
fail to converge to equilibria in two-player zero-sum (2p0s) games. On the
other hand, game-theoretic algorithms for approximating Nash and regularized
equilibria in 2p0s games are not typically competitive for RL and can be
difficult to scale. As a result, algorithms for these two cases are generally
developed and evaluated separately. In this work, we show that a single
algorithm can produce strong results in both settings, despite their
fundamental differences. This algorithm, which we call magnetic mirror descent
(MMD), is a simple extension of mirror descent and a special case of a
non-Euclidean proximal gradient algorithm. From a theoretical standpoint, we
prove a novel linear convergence result for this non-Euclidean proximal
gradient algorithm on a class of variational inequality problems. It follows
from this result that MMD converges linearly to quantal response equilibria
(i.e., entropy-regularized Nash equilibria) in extensive-form games; this is
the first time linear convergence has been proven for a first-order solver.
Moreover, we show empirically that MMD, applied as a tabular Nash equilibrium
solver via self-play, produces results competitive with CFR; this is the first
time a standard RL algorithm has done so. Furthermore, for single-agent deep
RL, on a small collection of Atari and Mujoco tasks, we show that MMD can
produce results competitive with those of PPO. Lastly, for multi-agent deep
RL, we show MMD can outperform NFSP in 3x3 Abrupt Dark Hex.
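For a feel of the update the abstract describes, here is a minimal numpy sketch of a magnetic-mirror-descent-style step on the probability simplex, run in self-play on rock-paper-scissors. The closed form follows from maximizing the payoff minus a KL penalty to a "magnet" policy and a KL proximal term to the current policy; the uniform magnet, temperature, and step size are assumptions, and this is a sketch of the general idea rather than the paper's implementation.

```python
import numpy as np

def mmd_step(pi, q, magnet, alpha, eta):
    """One magnetic-mirror-descent-style step over the simplex.

    Solves argmax_p <p, q> - alpha * KL(p, magnet) - (1/eta) * KL(p, pi),
    which has the closed form below (a sketch under the stated assumptions).
    """
    logits = (q + alpha * np.log(magnet) + np.log(pi) / eta) / (alpha + 1.0 / eta)
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    return p / p.sum()

# Self-play on rock-paper-scissors (A holds the row player's zero-sum payoffs).
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
magnet = np.ones(3) / 3            # uniform magnet (assumed)
alpha, eta = 0.05, 0.5             # temperature and step size (assumed)
x = np.array([0.8, 0.1, 0.1])      # start deliberately far from equilibrium
y = np.array([0.1, 0.1, 0.8])

for t in range(2000):
    # Simultaneous updates: both right-hand sides use the old x and y.
    x, y = (mmd_step(x, A @ y, magnet, alpha, eta),
            mmd_step(y, -A.T @ x, magnet, alpha, eta))
print(x, y)  # both approach the quantal response equilibrium (uniform here)
```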
- …