Open-ended Learning in Symmetric Zero-sum Games
Zero-sum games such as chess and poker are, abstractly, functions that
evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If
the game is approximately transitive, then self-play generates sequences of
agents of increasing strength. However, nontransitive games, such as
rock-paper-scissors, can exhibit strategic cycles, and there is no longer a
clear objective -- we want agents to increase in strength, but against whom is
unclear. In this paper, we introduce a geometric framework for formulating
agent objectives in zero-sum games, in order to construct adaptive sequences of
objectives that yield open-ended learning. The framework allows us to reason
about population performance in nontransitive games, and enables the
development of a new algorithm (rectified Nash response, PSRO_rN) that uses
game-theoretic niching to construct diverse populations of effective agents,
producing a stronger set of agents than existing algorithms. We apply PSRO_rN
to two highly nontransitive resource allocation games and find that PSRO_rN
consistently outperforms the existing alternatives. Comment: ICML 2019, final version
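The strategic cycle described in this abstract can be made concrete with rock-paper-scissors. The following is a minimal sketch, not code from the paper: the names `PAYOFF`, `is_antisymmetric`, `beats`, and `expected_payoff` are illustrative assumptions. It encodes the game as an antisymmetric payoff matrix and shows that every pure strategy beats one opponent and loses to another, so no ordering by strength exists.

```python
# Rock-paper-scissors as a symmetric zero-sum game (illustrative sketch).
# PAYOFF[i][j] is the payoff to row strategy i when it meets column strategy j.
STRATEGIES = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],   # rock: loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper
]


def is_antisymmetric(matrix):
    """A symmetric zero-sum game satisfies matrix[i][j] == -matrix[j][i]."""
    n = len(matrix)
    return all(matrix[i][j] == -matrix[j][i] for i in range(n) for j in range(n))


def beats(i, j):
    """True when pure strategy i wins against pure strategy j."""
    return PAYOFF[i][j] > 0


def expected_payoff(p, q):
    """Expected payoff of mixed strategy p against mixed strategy q."""
    n = len(PAYOFF)
    return sum(p[i] * PAYOFF[i][j] * q[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    for i, name in enumerate(STRATEGIES):
        wins = [STRATEGIES[j] for j in range(3) if beats(i, j)]
        print(f"{name} beats {wins}")
    uniform = [1 / 3, 1 / 3, 1 / 3]
    print(expected_payoff(uniform, [1, 0, 0]))
```

Because paper beats rock, scissors beats paper, and rock beats scissors, "stronger than" is not transitive here, which is why a single self-play opponent does not define a clear objective.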
Modelling Behavioural Diversity for Learning in Open-Ended Games
Promoting behavioural diversity is critical for solving games with
non-transitive dynamics where strategic cycles exist, and there is no
consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous
treatment for defining diversity and constructing diversity-aware learning
dynamics. In this work, we offer a geometric interpretation of behavioural
diversity in games and introduce a novel diversity metric based on
determinantal point processes (DPP). By incorporating the diversity metric into
best-response dynamics, we develop diverse fictitious play and diverse
policy-space response oracle for solving normal-form games and open-ended
games. We prove the uniqueness of the diverse best response and the convergence
of our algorithms on two-player games. Importantly, we show that maximising the
DPP-based diversity metric is guaranteed to enlarge the gamescape -- the convex
polytopes spanned by agents' mixtures of strategies. To validate our
diversity-aware solvers, we test on tens of games that show strong
non-transitivity. Results suggest that our methods achieve exploitability at least as low as,
and in most games lower than, that of PSRO solvers by finding effective
and diverse strategies.
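The geometric intuition behind determinant-based diversity can be illustrated with a small sketch. This is an assumption-laden simplification, not the metric defined in the paper: it scores a set of strategies by the determinant of the Gram matrix of their payoff vectors, which equals the squared volume those vectors span. The helper names `gram_matrix`, `determinant`, and `diversity_score` are hypothetical.

```python
# Determinant-of-Gram-matrix diversity score (illustrative simplification).
# In a determinantal point process, a subset's probability is proportional to
# the determinant of a kernel submatrix, so near-duplicate items score low.


def gram_matrix(vectors):
    """Matrix of pairwise inner products between the given vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular: the vectors are linearly dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def diversity_score(payoff_vectors):
    """Squared volume spanned by the payoff vectors of a population."""
    return determinant(gram_matrix(payoff_vectors))


if __name__ == "__main__":
    rock, paper = [0, -1, 1], [1, 0, -1]
    print(diversity_score([rock, paper]))  # distinct strategies
    print(diversity_score([rock, rock]))   # duplicated strategy
```

Under this score a population containing two copies of the same strategy has zero volume, while adding a strategy with a different payoff profile increases it.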
Learning to Play Othello with N-Tuple Systems
This paper investigates the use of n-tuple systems as position value functions for the game of Othello. The architecture is described, and then evaluated for use with temporal difference learning. Performance is compared with previously developed weighted piece counters and multi-layer perceptrons. The n-tuple system is able to defeat the best performing of these after just five hundred games of self-play learning. The conclusion is that n-tuple networks learn faster and better than the other more conventional approaches.
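As a rough illustration of how an n-tuple system can act as a position value function, the sketch below maps each tuple of board squares to an index into a lookup table and sums the looked-up weights. It is a simplified linear version written under assumptions: the class `NTupleValue`, the square encoding, the example tuples, and the learning rate are all hypothetical and are not taken from the paper.

```python
# Minimal n-tuple value function with a temporal-difference style update
# (illustrative sketch; details are assumptions, not the paper's exact design).
EMPTY, BLACK, WHITE = 0, 1, 2  # three possible states per Othello square


class NTupleValue:
    def __init__(self, tuples):
        # Each tuple lists board squares (0..63); its table has 3**n weights.
        self.tuples = tuples
        self.tables = [[0.0] * (3 ** len(t)) for t in tuples]

    def _index(self, board, squares):
        """Read the squares as a base-3 number to address the lookup table."""
        idx = 0
        for sq in squares:
            idx = idx * 3 + board[sq]
        return idx

    def value(self, board):
        """Position value: sum of the weights selected by every tuple."""
        return sum(
            table[self._index(board, t)]
            for t, table in zip(self.tuples, self.tables)
        )

    def td_update(self, board, target, alpha=0.01):
        """Move the value of `board` toward `target` by step size alpha."""
        error = target - self.value(board)
        for t, table in zip(self.tuples, self.tables):
            table[self._index(board, t)] += alpha * error
        return error


if __name__ == "__main__":
    model = NTupleValue([(0, 1, 2), (0, 8, 16)])
    board = [EMPTY] * 64
    board[0] = BLACK
    model.td_update(board, target=1.0, alpha=0.1)
    print(model.value(board))
```

In a self-play setting the `target` would typically be the value of the next position or the final game outcome; that choice is left open here.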