Actor-Critic Fictitious Play in Simultaneous Move Multistage Games
Fictitious play is a game-theoretic iterative procedure meant to learn an equilibrium in normal form games. However, this algorithm requires that each player has full knowledge of the other players' strategies. Using an architecture inspired by actor-critic algorithms, we build a stochastic approximation of the fictitious play process. This procedure is online, decentralized (an agent has no information about the others' strategies and rewards) and applies to multistage games (a generalization of normal form games). In addition, we prove convergence of our method towards a Nash equilibrium in both zero-sum two-player multistage games and cooperative multistage games. We also provide empirical evidence of the soundness of our approach on the game of Alesia, with and without function approximation.
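For readers unfamiliar with the baseline procedure this abstract builds on, the sketch below implements classical fictitious play on a two-player zero-sum matrix game, where each player best-responds to the empirical average of the opponent's past actions. It is not the paper's decentralized actor-critic approximation; the `fictitious_play` helper and the matching-pennies payoff matrix are illustrative choices.

```python
import numpy as np

def fictitious_play(A, n_iters=10_000, rng=None):
    """Classical fictitious play on a two-player zero-sum matrix game.

    A[i, j] is the payoff to the row player when row plays i and column
    plays j (the column player receives -A[i, j]).  Each player best-responds
    to the empirical average of the opponent's past actions.
    """
    rng = np.random.default_rng(rng)
    n, m = A.shape
    row_counts = np.zeros(n)   # how often the row player chose each action
    col_counts = np.zeros(m)   # how often the column player chose each action
    # Arbitrary initial actions.
    row_counts[rng.integers(n)] += 1
    col_counts[rng.integers(m)] += 1
    for _ in range(n_iters):
        col_avg = col_counts / col_counts.sum()   # empirical column strategy
        row_avg = row_counts / row_counts.sum()   # empirical row strategy
        row_counts[np.argmax(A @ col_avg)] += 1   # row player's best response
        col_counts[np.argmin(row_avg @ A)] += 1   # column player's best response
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching pennies: the empirical strategies converge to (1/2, 1/2).
x, y = fictitious_play(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```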
On the Convergence of Model Free Learning in Mean Field Games
Learning by experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the lack of stationarity of the environment, whose dynamics evolve as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g. swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently, a very active and burgeoning field has studied the effects of diverse reinforcement learning algorithms for agents that have no prior information on a stationary Mean Field Game (MFG) and learn their policy through repeated experience. We adopt a high-level perspective on this problem and analyze in full generality the convergence of a fictitious iterative scheme using any single-agent learning algorithm at each step. We quantify the quality of the computed approximate Nash equilibrium in terms of the accumulated errors arising at each learning iteration step. Notably, we show for the first time convergence of model-free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space environment, where the approximate best response of the iterative fictitious play scheme is computed with a deep RL algorithm.
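The fictitious iterative scheme described above is easy to state generically. The following is a minimal sketch of that loop, assuming the caller supplies a `best_response` learner (any single-agent RL algorithm, e.g. the deep RL method used in the paper's experiment) and an `induced_distribution` oracle; both names are placeholders, not the paper's API.

```python
from typing import Callable, Sequence

def mfg_fictitious_play(
    best_response: Callable,         # single-agent learner: mean-field flow -> policy
    induced_distribution: Callable,  # policy -> state-distribution flow it induces
    initial_flow: Sequence,          # initial guess of the population's flow
    n_iterations: int = 50,
):
    """Generic fictitious-play loop for a mean field game (sketch).

    At iteration k, an approximate best response is computed against the
    current averaged mean-field flow, and the flow is then updated with the
    1/(k+1) averaging rate characteristic of fictitious play.
    """
    flow = list(initial_flow)
    policies = []
    for k in range(n_iterations):
        pi_k = best_response(flow)          # any single-agent RL algorithm
        mu_k = induced_distribution(pi_k)   # flow induced by the new policy
        # Running average of the mean-field flow (fictitious play update).
        flow = [(k * mu_bar + mu) / (k + 1) for mu_bar, mu in zip(flow, mu_k)]
        policies.append(pi_k)
    return policies, flow
```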
A General Framework for Learning Mean-Field Games
This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash equilibrium for this GMFG and demonstrates that naively combining reinforcement learning with the fixed-point approach of classical MFGs yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms (GMF-V and GMF-P, respectively) with smoothed policies, together with an analysis of their convergence properties and computational complexities. Experiments on an equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P with Q-learning and TRPO respectively, are both efficient and robust in the GMFG setting. Moreover, their performance is superior in convergence speed, accuracy, and stability when compared with existing algorithms for multi-agent reinforcement learning in the N-player setting.
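As a rough illustration of the smoothed-policy idea, the sketch below runs a GMF-V-style fixed-point loop in which each round fits a Q-table against the current population distribution and then derives a softmax (Boltzmann) policy from it rather than a greedy one. The `q_learning` and `next_distribution` callables are assumed placeholders, and the exact update and hyperparameters differ from the paper's GMF-V-Q.

```python
import numpy as np

def gmfg_value_iteration(
    q_learning,          # learner: population distribution -> Q[s, a] table
    next_distribution,   # policy -> population distribution it induces
    mu0,                 # initial population distribution over states
    temperature=0.1,
    n_rounds=100,
):
    """Sketch of a GMF-V-style fixed-point loop with smoothed policies.

    Instead of the greedy (argmax) policy, which makes the naive fixed-point
    iteration unstable, each round uses a softmax ("smoothed") policy derived
    from the learned Q-table.
    """
    mu = mu0
    for _ in range(n_rounds):
        q = q_learning(mu)                            # Q[s, a] given current mu
        logits = q / temperature
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        policy = np.exp(logits)
        policy /= policy.sum(axis=1, keepdims=True)   # softmax over actions
        mu = next_distribution(policy)                # population update
    return policy, mu
```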
Competitive Policy Optimization
A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties. To tackle this, we propose competitive policy optimization (CoPO), a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates. Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective. In contrast, off-the-shelf policy gradient methods utilize only linear approximations and hence do not capture interactions among the players. We instantiate CoPO in two ways: (i) competitive policy gradient, and (ii) trust-region competitive policy optimization. We theoretically study these methods and empirically investigate their behavior on a set of comprehensive, yet challenging, competitive games. We observe that they provide stable optimization, convergence to sophisticated strategies, and higher scores when played against baseline policy gradient methods.
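To make the contrast with linear approximations concrete, here is a minimal sketch of the competitive gradient step that motivates CoPO: each player best-responds to a local bilinear model of the objective built from the gradients and mixed second derivatives. This is the underlying optimization update on a toy bilinear game, not the paper's policy-gradient estimator, and all function and variable names are illustrative.

```python
import numpy as np

def competitive_gradient_step(x, y, grad_x, grad_y, Dxy, Dyx, eta=0.1):
    """One competitive-gradient step for min_x max_y f(x, y) (sketch).

    grad_x, grad_y are the gradients and Dxy, Dyx the mixed second
    derivatives of f at the current point.  Each player's update solves the
    Nash equilibrium of the local bilinear approximation of the game.
    """
    dx = -eta * np.linalg.solve(
        np.eye(x.size) + eta**2 * Dxy @ Dyx,
        grad_x + eta * Dxy @ grad_y,
    )
    dy = eta * np.linalg.solve(
        np.eye(y.size) + eta**2 * Dyx @ Dxy,
        grad_y - eta * Dyx @ grad_x,
    )
    return x + dx, y + dy

# Bilinear game f(x, y) = x * y: simultaneous gradient descent-ascent spirals
# away from the saddle here, while the competitive step contracts toward (0, 0).
x, y = np.array([1.0]), np.array([1.0])
for _ in range(200):
    x, y = competitive_gradient_step(x, y, grad_x=y, grad_y=x,
                                     Dxy=np.eye(1), Dyx=np.eye(1))
```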