Modelling Behavioural Diversity for Learning in Open-Ended Games
Promoting behavioural diversity is critical for solving games with
non-transitive dynamics where strategic cycles exist, and there is no
consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous
treatment for defining diversity and constructing diversity-aware learning
dynamics. In this work, we offer a geometric interpretation of behavioural
diversity in games and introduce a novel diversity metric based on
determinantal point processes (DPP). By incorporating the diversity metric into
best-response dynamics, we develop diverse fictitious play and diverse
policy-space response oracle for solving normal-form games and open-ended
games. We prove the uniqueness of the diverse best response and the convergence
of our algorithms on two-player games. Importantly, we show that maximising the
DPP-based diversity metric is guaranteed to enlarge the gamescape, the convex
polytopes spanned by agents' mixtures of strategies. To validate our
diversity-aware solvers, we test on tens of games that show strong
non-transitivity. Results suggest that our methods achieve at least the same,
and in most games, lower exploitability than PSRO solvers by finding effective
and diverse strategies.
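The abstract above does not spell out the diversity metric, so the following is only a minimal sketch of the underlying idea, not the authors' definition. It assumes each strategy is described by its payoff vector against a fixed set of opponents, builds the Gram kernel L = M M^T from those vectors, and scores a population by det(L), the quantity to which an L-ensemble DPP assigns probability proportionally. The function names `gram`, `det`, and `diversity` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def gram(rows: Matrix) -> Matrix:
    """Kernel L = M M^T built from each strategy's payoff vector (assumed features)."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def det(m: Matrix) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular kernel: the population is degenerate
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def diversity(payoff_rows: Matrix) -> float:
    """Illustrative DPP-style score: determinant of the Gram kernel."""
    return det(gram(payoff_rows))


# Rock-Paper-Scissors payoff vectors (row player's payoff against R, P, S).
ROCK = [0.0, -1.0, 1.0]
PAPER = [1.0, 0.0, -1.0]

print(diversity([ROCK, PAPER]))  # two distinct strategies
print(diversity([ROCK, ROCK]))   # a duplicated strategy
```

Under these assumptions, a population containing two copies of the same strategy scores zero, while two distinct strategies score strictly more, which is the sense in which a determinant rewards populations that span a larger volume.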
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems
Multiagent reinforcement learning (MARL) has achieved a remarkable amount of
success in solving various types of video games. A cornerstone of this success
is the auto-curriculum framework, which shapes the learning process by
continually creating new challenging tasks for agents to adapt to, thereby
facilitating the acquisition of new skills. In order to extend MARL methods to
real-world domains outside of video games, we envision in this blue sky paper
that maintaining a diversity-aware auto-curriculum is critical for successful
MARL applications. Specifically, we argue that behavioural diversity is
a pivotal, yet under-explored, component for real-world multiagent learning
systems, and that significant work remains in understanding how to design a
diversity-aware auto-curriculum. We list four open challenges for
auto-curriculum techniques, which we believe deserve more attention from this
community. Towards validating our vision, we recommend modelling realistic
interactive behaviours in autonomous driving as an important test bed, and
recommend the SMARTS/ULTRA benchmark.
Comment: AAMAS 202
Neural Auto-Curricula in Two-Player Zero-Sum Games
When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game-theoretic principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework, Neural Auto-Curricula (NAC), that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve performance competitive with, or even better than, state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.
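To make the hand-designed baseline concrete, here is a minimal sketch of fictitious play on a zero-sum matrix game, together with an exploitability measure. It is not the NAC method itself. It assumes exploitability is the sum of both players' best-response gains (some papers halve this), and the function names and the Rock-Paper-Scissors matrix are illustrative choices.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def row_values(payoff: Matrix, col_mix: Sequence[float]) -> List[float]:
    """Expected payoff of each row action against the column player's mixture."""
    return [sum(a * q for a, q in zip(row, col_mix)) for row in payoff]


def col_values(payoff: Matrix, row_mix: Sequence[float]) -> List[float]:
    """Expected row-player payoff for each column action against the row mixture."""
    return [sum(row_mix[i] * payoff[i][j] for i in range(len(payoff)))
            for j in range(len(payoff[0]))]


def exploitability(payoff: Matrix, row_mix: Sequence[float],
                   col_mix: Sequence[float]) -> float:
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    return max(row_values(payoff, col_mix)) - min(col_values(payoff, row_mix))


def normalise(counts: Sequence[int]) -> List[float]:
    total = sum(counts)
    return [c / total for c in counts]


def fictitious_play(payoff: Matrix, iterations: int) -> Tuple[List[float], List[float]]:
    """Each player repeatedly best-responds to the opponent's empirical mixture."""
    row_counts = [0] * len(payoff)
    col_counts = [0] * len(payoff[0])
    row_counts[0] = col_counts[0] = 1  # arbitrary initial actions
    for _ in range(iterations):
        rv = row_values(payoff, normalise(col_counts))
        cv = col_values(payoff, normalise(row_counts))
        row_counts[rv.index(max(rv))] += 1  # row player maximises
        col_counts[cv.index(min(cv))] += 1  # column player minimises
    return normalise(row_counts), normalise(col_counts)


RPS = [[0.0, -1.0, 1.0],
       [1.0, 0.0, -1.0],
       [-1.0, 1.0, 0.0]]

row_mix, col_mix = fictitious_play(RPS, 5000)
print(exploitability(RPS, row_mix, col_mix))
```

Here "who to compete with" is the empirical average of past opponent actions and "how to beat them" is an exact argmax, which are the two components the abstract describes replacing with learned modules.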
Paired comparisons for games of chance
We present a Bayesian rating system based on the method of paired
comparisons. Our system is a flexible generalization of the well-known Glicko,
and in particular can better accommodate games with significant elements of
luck. Our system is currently in use in the online game Duelyst II, and in that
setting outperforms Glicko2.
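The abstract does not describe how luck enters the model, so the sketch below is purely illustrative and is not the authors' system. It takes the standard Elo-style logistic win probability used in paired-comparison ratings and blends it with a coin flip through a hypothetical `luck` parameter, showing one simple way a luck component can flatten the predicted outcome.

```python
def win_probability(rating_a: float, rating_b: float, luck: float = 0.0) -> float:
    """Probability that player A beats player B.

    `luck` in [0, 1] is a hypothetical mixing weight (an assumption of this
    sketch): 0 gives the plain logistic model, 1 gives a pure coin flip.
    """
    if not 0.0 <= luck <= 1.0:
        raise ValueError("luck must lie in [0, 1]")
    # Standard Elo-scale logistic curve: 400 points is a 10:1 odds ratio.
    skill = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    return luck * 0.5 + (1.0 - luck) * skill


print(win_probability(1900, 1500))            # skill only
print(win_probability(1900, 1500, luck=0.5))  # half of each game is chance
```

With a large luck weight, even a big rating gap predicts outcomes close to even, so individual results carry less information about skill.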
Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments
The Game Theory & Multi-Agent team at DeepMind studies several aspects of
multi-agent learning ranging from computing approximations to fundamental
concepts in game theory to simulating social dilemmas in rich spatial
environments and training 3-d humanoids in difficult team coordination tasks. A
signature aim of our group is to use the resources and expertise made available
to us at DeepMind in deep reinforcement learning to explore multi-agent systems
in complex environments and use these benchmarks to advance our understanding.
Here, we summarise the recent work of our team and present a taxonomy that we
feel highlights many important open challenges in multi-agent research.
Comment: Published in AI Communications 202
A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers
In this paper, we introduce a two-player zero-sum framework between a
trainable Solver and a Data Generator to improve the
generalization ability of deep learning-based solvers for the Traveling Salesman
Problem (TSP). Grounded in Policy Space Response Oracle (PSRO)
methods, our two-player framework outputs a population of best-responding
Solvers, over which we can mix and output a combined model that achieves the
least exploitability against the Generator, and thereby the most generalizable
performance on different TSP tasks. We conduct experiments on a variety of TSP
instances with different types and sizes. Results suggest that our Solvers
achieve state-of-the-art performance even on tasks the Solver has never seen,
whilst the performance of other deep learning-based Solvers drops sharply due
to over-fitting. To demonstrate the principle of our framework, we study the
learning outcome of the proposed two-player game and demonstrate that the
exploitability of the Solver population decreases during training, and it
eventually approximates the Nash equilibrium along with the Generator.
Comment: ICLR 2022 Gamification and Multiagent Solutions Workshop Spotlight Presentation