Solving Common-Payoff Games with Approximate Policy Iteration
For artificially intelligent learning systems to have widespread
applicability in real-world settings, it is important that they be able to
operate decentrally. Unfortunately, decentralized control is difficult --
computing even an epsilon-optimal joint policy is a NEXP-complete problem.
Nevertheless, a recently rediscovered insight -- that a team of agents can
coordinate via common knowledge -- has given rise to algorithms capable of
finding optimal joint policies in small common-payoff games. The Bayesian
action decoder (BAD) leverages this insight and deep reinforcement learning to
scale to games as large as two-player Hanabi. However, the approximations it
uses to do so prevent it from discovering optimal joint policies even in games
small enough to brute force optimal solutions. This work proposes CAPI, a novel
algorithm which, like BAD, combines common knowledge with deep reinforcement
learning. However, unlike BAD, CAPI prioritizes the propensity to discover
optimal joint policies over scalability. While this choice precludes CAPI from
scaling to games as large as Hanabi, empirical results demonstrate that, on the
games to which CAPI does scale, it is capable of discovering optimal joint
policies even when other modern multi-agent reinforcement learning algorithms
are unable to do so. Code is available at https://github.com/ssokota/capi .
Comment: AAAI 202
Measuring the impact of cooperative rewards on AI
Master's Project (M.S.), University of Alaska Fairbanks, 2020. We consider the effects of varying individualistic and team rewards on learning for a Deep Q-Network AI in a multi-agent system, using a synthetic team game ‘Futlol’ designed for this purpose. Experimental results with this game using the OpenSpiel framework indicate that mixed reward structures result in lower win rates. It is unclear whether this is due to faster learning on simpler reward structures or to a flaw in the nature of the reward system.
Learning to Play No-Press Diplomacy with Best Response Policy Iteration
Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
Welfare Diplomacy: Benchmarking Language Model Cooperation
The growing capabilities and increasingly widespread deployment of AI systems
necessitate robust benchmarks for measuring their cooperative capabilities.
Unfortunately, most multi-agent benchmarks are either zero-sum or purely
cooperative, providing limited opportunities for such measurements. We
introduce a general-sum variant of the zero-sum board game Diplomacy -- called
Welfare Diplomacy -- in which players must balance investing in military
conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both
a clearer assessment of and stronger training incentives for cooperative
capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules
and implementing them via an open-source Diplomacy engine; (2) constructing
baseline agents using zero-shot prompted language models; and (3) conducting
experiments where we find that baselines using state-of-the-art models attain
high social welfare but are exploitable. Our work aims to promote societal
safety by aiding researchers in developing and assessing multi-agent AI
systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is
available at https://github.com/mukobi/welfare-diplomacy