No Press Diplomacy: Modeling Multi-Agent Gameplay
Diplomacy is a seven-player non-stochastic, non-cooperative game, where
agents acquire resources through a mix of teamwork and betrayal. Reliance on
trust and coordination makes Diplomacy the first non-cooperative multi-agent
benchmark for complex sequential social dilemmas in a rich environment. In this
work, we focus on training an agent that learns to play the No Press version of
Diplomacy where there is no dedicated communication channel between players. We
present DipNet, a neural-network-based policy model for No Press Diplomacy. The
model was trained on a new dataset of more than 150,000 human games. Our model
is trained by supervised learning (SL) from expert trajectories, which is then
used to initialize a reinforcement learning (RL) agent trained through
self-play. Both the SL and RL agents demonstrate state-of-the-art No Press
performance by beating popular rule-based bots.Comment: Accepted at NeurIPS 201
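The two-stage pipeline the abstract describes (SL pretraining on expert games, then self-play RL initialized from the SL weights) can be sketched as below. This is a minimal illustration, not the authors' code: the `DipNetPolicy` architecture, `expert_loader`, `rollout_fn`, and the plain REINFORCE update are all assumptions standing in for the paper's actual model and RL algorithm.

```python
# Hypothetical sketch (PyTorch) of an SL-pretrain-then-self-play-RL pipeline.
# Names and hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn


class DipNetPolicy(nn.Module):
    """Toy stand-in for the policy network: state features -> action logits."""

    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, num_actions)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def supervised_pretrain(policy, expert_loader, epochs=1, lr=1e-3):
    """Stage 1: imitate expert trajectories with a cross-entropy loss."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for states, expert_actions in expert_loader:
            opt.zero_grad()
            loss = loss_fn(policy(states), expert_actions)
            loss.backward()
            opt.step()


def self_play_finetune(policy, rollout_fn, iterations=100, lr=1e-4):
    """Stage 2: start RL from the SL weights and improve via self-play,
    here with a basic policy-gradient (REINFORCE) update."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(iterations):
        # rollout_fn plays one self-play game and returns the trajectory.
        states, actions, returns = rollout_fn(policy)
        log_probs = torch.log_softmax(policy(states), dim=-1)
        chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = -(chosen * returns).mean()  # maximize expected return
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The key point the sketch captures is only the ordering: the RL agent does not learn from scratch but inherits the SL policy's weights before self-play begins.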
Learning to Play No-Press Diplomacy with Best Response Policy Iteration
Recent advances in deep reinforcement learning (RL) have led to considerable
progress in many 2-player zero-sum games, such as Go, Poker and StarCraft. The
purely adversarial nature of such games allows for conceptually simple and
principled application of RL methods. However, real-world settings are
many-agent, and agent interactions are complex mixtures of common-interest and
competitive aspects. We consider Diplomacy, a 7-player board game designed to
accentuate dilemmas resulting from many-agent interactions. It also features a
large combinatorial action space and simultaneous moves, which are challenging
for RL algorithms. We propose a simple yet effective approximate best response
operator, designed to handle large combinatorial action spaces and simultaneous
moves. We also introduce a family of policy iteration methods that approximate
fictitious play. With these methods, we successfully apply RL to Diplomacy: we
show that our agents convincingly outperform the previous state-of-the-art, and
game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
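The two ingredients the abstract names, an approximate best-response operator for large combinatorial action spaces and policy iteration that approximates fictitious play, can be sketched as below. Everything here is an assumed illustration: `candidate_actions`, `simulate_return`, the Monte Carlo scoring, and the uniform sampling over past policies are stand-ins, not the paper's exact operator.

```python
# Hedged sketch of fictitious-play-style policy iteration with an
# approximate best response. All helper names are hypothetical.
import random
from typing import Callable, List

Policy = Callable[[object], object]  # maps a game state to an action


def approximate_best_response(
    opponent: Policy,
    candidate_actions: Callable[[object], list],
    simulate_return: Callable[[object, object, Policy], float],
    num_rollouts: int = 32,
) -> Policy:
    """Return a policy that, at each state, scores a tractable sample of
    candidate actions by Monte Carlo rollouts against the opponent and picks
    the best one, avoiding enumeration of the full combinatorial space."""

    def policy(state):
        best_action, best_value = None, float("-inf")
        for action in candidate_actions(state):
            value = sum(
                simulate_return(state, action, opponent)
                for _ in range(num_rollouts)
            ) / num_rollouts
            if value > best_value:
                best_action, best_value = action, value
        return best_action

    return policy


def fictitious_play(initial: Policy, best_response, iterations: int) -> List[Policy]:
    """Approximate fictitious play: repeatedly best-respond to a uniform
    mixture over all past policies and add the result to the population."""
    population = [initial]
    for _ in range(iterations):
        # Sampling uniformly from past policies stands in for playing
        # against the empirical average strategy.
        mixture = lambda s, pop=tuple(population): random.choice(pop)(s)
        population.append(best_response(mixture))
    return population
```

Here `best_response` could be wired up as `lambda opp: approximate_best_response(opp, candidate_actions, simulate_return)`; the population after the final iteration plays the role of the improved strategy profile.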