Open-ended Learning in Symmetric Zero-sum Games

Bachrach, Yoram; Balduzzi, David; Czarnecki, Wojciech M.; Garnelo, Marta; Graepel, Thore; Jaderberg, Max; Perolat, Julien

Publication date: 1 January 2019

Abstract

Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective: we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.

Comment: ICML 2019, final version
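
The contrast the abstract draws between transitive and nontransitive games can be illustrated with a small sketch. This is not the paper's algorithm (PSRO_rN is not implemented here). It only treats a symmetric zero-sum game as a function that evaluates pairs of strategies, and checks whether any single strategy beats all the others. All names (`RPS`, `TRANSITIVE`, `label`, `unbeaten_strategy`) and the rating-based transitive game are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Payoff to the row strategy against the column strategy in rock-paper-scissors
# (order: rock, paper, scissors; +1 win, -1 loss, 0 tie).
RPS = np.array([
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
])

# Hypothetical transitive game (an assumption for illustration): each strategy
# has a scalar rating and the higher rating always wins.
ratings = np.array([1.0, 2.0, 3.0])
TRANSITIVE = np.sign(ratings[:, None] - ratings[None, :])


def is_zero_sum_symmetric(payoffs):
    """A symmetric zero-sum game has an antisymmetric payoff matrix."""
    return np.array_equal(payoffs, -payoffs.T)


def label(payoffs, i, j):
    """Evaluate a pair of strategies, returning labels for (i, j)."""
    p = payoffs[i, j]
    if p > 0:
        return ("winner", "loser")
    if p < 0:
        return ("loser", "winner")
    return ("tie", "tie")


def unbeaten_strategy(payoffs):
    """Return the index of a strategy that beats every other one, or None."""
    for i in range(len(payoffs)):
        others = np.delete(payoffs[i], i)  # drop the self-play entry
        if np.all(others > 0):
            return i
    return None
```

Under these assumptions, `unbeaten_strategy(TRANSITIVE)` picks out the highest-rated strategy, while `unbeaten_strategy(RPS)` returns `None`: every strategy in the cycle loses to some other one, which is why "increase in strength" has no single reference opponent there.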

Available Versions

UCL Discovery

oai:eprints.ucl.ac.uk.OAI2:100...

Last time updated on 26/05/2020