Resolving social dilemmas with minimal reward transfer
Multi-agent cooperation is an important topic, and is particularly
challenging in mixed-motive situations where it does not pay to be nice to
others. Consequently, self-interested agents often avoid collective behaviour,
resulting in suboptimal outcomes for the group. In response, in this paper we
introduce a metric to quantify the disparity between what is rational for
individual agents and what is rational for the group, which we call the general
self-interest level. This metric represents the maximum proportion of
individual rewards that all agents can retain while ensuring that achieving
the social welfare optimum becomes a dominant strategy. By aligning the individual
and group incentives, rational agents acting to maximise their own reward will
simultaneously maximise the collective reward. As agents transfer their rewards
to motivate others to consider their welfare, we diverge from traditional
concepts of altruism or prosocial behaviours. The general self-interest level
is a property of a game that is useful for assessing the propensity of players
to cooperate and understanding how features of a game impact this. We
illustrate the effectiveness of our method on several novel game
representations of social dilemmas with arbitrary numbers of players.
Comment: 34 pages, 13 tables, submitted to the Journal of Autonomous Agents
and Multi-Agent Systems: Special Issue on Citizen-Centric AI System
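The reward-transfer construction behind this metric can be made concrete on a two-player Prisoner's Dilemma. The sketch below is not the paper's general n-player definition: it assumes each player keeps a fraction s of its own reward and transfers the rest to the opponent, and computes the largest s for which cooperating is a dominant strategy. The payoff values T, R, P, S are canonical illustrative choices, not taken from the paper.

```python
def max_retained_fraction(T=5.0, R=3.0, P=1.0, S=0.0):
    """Largest fraction s of own reward each player can keep while
    cooperation remains dominant, when the remainder 1 - s is
    transferred to the opponent (two-player symmetric PD sketch).

    Effective payoff to a player is s*own + (1-s)*other, so:
      vs. a cooperator: R >= s*T + (1-s)*S  =>  s <= (R - S) / (T - S)
      vs. a defector:   s*S + (1-s)*T >= P  =>  s <= (T - P) / (T - S)
    """
    s_vs_cooperator = (R - S) / (T - S)
    s_vs_defector = (T - P) / (T - S)
    return min(s_vs_cooperator, s_vs_defector, 1.0)
```

With the canonical payoffs T=5, R=3, P=1, S=0 this yields 0.6: both players can keep 60% of their own reward and full cooperation is still a dominant strategy, in the spirit of the general self-interest level described above.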
Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments
The Game Theory & Multi-Agent team at DeepMind studies several aspects of
multi-agent learning ranging from computing approximations to fundamental
concepts in game theory to simulating social dilemmas in rich spatial
environments and training 3-d humanoids in difficult team coordination tasks. A
signature aim of our group is to use the resources and expertise made available
to us at DeepMind in deep reinforcement learning to explore multi-agent systems
in complex environments and use these benchmarks to advance our understanding.
Here, we summarise the recent work of our team and present a taxonomy that we
feel highlights many important open challenges in multi-agent research.
Comment: Published in AI Communications 202
Evolutionary games on graphs
Game theory is one of the key paradigms behind many scientific disciplines
from biology to behavioral sciences to economics. In its evolutionary form, and
especially when the interacting agents are linked in a specific social network,
the underlying solution concepts and methods are very similar to those applied
in non-equilibrium statistical physics. This review gives a tutorial-type
overview of the field for physicists. The first three sections introduce the
necessary background in classical and evolutionary game theory from the basic
definitions to the most important results. The fourth section surveys the
topological complications implied by non-mean-field-type social network
structures in general. The last three sections discuss in detail the dynamic
behavior of three prominent classes of models: the Prisoner's Dilemma, the
Rock-Scissors-Paper game, and Competing Associations. The major theme of the
review is in what sense and how the graph structure of interactions can modify
and enrich the picture of long term behavioral patterns emerging in
evolutionary games.
Comment: Review, final version, 133 pages, 65 figure
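A minimal simulation in the spirit of the models this review surveys can illustrate how graph structure enters the dynamics. The details below are assumptions for illustration, not the review's specific models: a Prisoner's Dilemma on a ring, synchronous "imitate the best-scoring neighbour" updates, and canonical payoff values.

```python
import random

T, R, P, S = 5.0, 3.0, 1.0, 0.0  # canonical PD payoffs (assumed values)

def payoff(a, b):
    """Row player's payoff; 1 = cooperate, 0 = defect."""
    return {(1, 1): R, (1, 0): S, (0, 1): T, (0, 0): P}[(a, b)]

def step(strats):
    """One synchronous imitate-the-best-neighbour update on a ring."""
    n = len(strats)
    # Each agent plays both ring neighbours and accumulates a score.
    scores = [payoff(strats[i], strats[(i - 1) % n]) +
              payoff(strats[i], strats[(i + 1) % n]) for i in range(n)]
    new = []
    for i in range(n):
        nbhd = [(i - 1) % n, i, (i + 1) % n]
        best = max(nbhd, key=lambda j: scores[j])
        new.append(strats[best])  # copy the highest scorer's strategy
    return new

random.seed(0)
strats = [random.randint(0, 1) for _ in range(50)]
for _ in range(20):
    strats = step(strats)
frac = sum(strats) / len(strats)  # surviving fraction of cooperators
```

Swapping the ring for a lattice or a scale-free graph changes only the neighbourhood lists, which is exactly the kind of topological variation whose consequences the review analyses.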
Get It in Writing: Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL
Multi-agent reinforcement learning (MARL) is a powerful tool for training
automated systems acting independently in a common environment. However, it can
lead to sub-optimal behavior when individual incentives and group incentives
diverge. Humans are remarkably capable of solving these social dilemmas. It is
an open problem in MARL to replicate such cooperative behaviors in selfish
agents. In this work, we draw upon the idea of formal contracting from
economics to overcome diverging incentives between agents in MARL. We propose
an augmentation to a Markov game where agents voluntarily agree to binding
state-dependent transfers of reward, under pre-specified conditions. Our
contributions are theoretical and empirical. First, we show that this
augmentation makes all subgame-perfect equilibria of all fully observed Markov
games exhibit socially optimal behavior, given a sufficiently rich space of
contracts. Next, we complement our game-theoretic analysis by showing that
state-of-the-art RL algorithms learn socially optimal policies given our
augmentation. Our experiments include classic static dilemmas like Stag Hunt,
Prisoner's Dilemma and a public goods game, as well as dynamic interactions
that simulate traffic, pollution management and common pool resource
management.
Comment: 12 pages, 8 figures, AAMAS 202
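One instance of such a state-dependent transfer can be sketched on a one-shot Prisoner's Dilemma. The contract space studied in the paper is far richer; here a single hypothetical contract is assumed, in which a defector pays a penalty theta to a cooperating opponent, and we check when this makes cooperation dominant. Payoff values are canonical illustrative choices.

```python
T, R, P, S = 5.0, 3.0, 1.0, 0.0  # canonical PD payoffs (assumed values)

def contracted_payoffs(a1, a2, theta):
    """Payoffs under a signed contract: a defector (0) transfers theta
    to a cooperating (1) opponent; mutual defection triggers no transfer."""
    base = {(1, 1): (R, R), (1, 0): (S, T), (0, 1): (T, S), (0, 0): (P, P)}
    p1, p2 = base[(a1, a2)]
    if a1 == 0 and a2 == 1:   # player 1 defected against a cooperator
        p1, p2 = p1 - theta, p2 + theta
    if a2 == 0 and a1 == 1:   # player 2 defected against a cooperator
        p2, p1 = p2 - theta, p1 + theta
    return p1, p2

def cooperation_is_dominant(theta):
    """Does cooperating beat defecting for player 1 against both actions?"""
    return all(
        contracted_payoffs(1, a2, theta)[0] >= contracted_payoffs(0, a2, theta)[0]
        for a2 in (0, 1)
    )
```

With these payoffs the binding constraint is deterring defection against a cooperator, which requires theta >= T - R = 2: `cooperation_is_dominant(3.0)` holds while `cooperation_is_dominant(1.0)` does not. This mirrors, in miniature, the paper's claim that a sufficiently rich contract space makes socially optimal behaviour an equilibrium.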
Intrinsic fluctuations of reinforcement learning promote cooperation
In this work, we ask and answer what makes classical reinforcement
learning cooperative. Cooperating in social dilemma situations is vital for
animals, humans, and machines. While evolutionary theory revealed a range of
mechanisms promoting cooperation, the conditions under which agents learn to
cooperate are contested. Here, we demonstrate which and how individual elements
of the multi-agent learning setting lead to cooperation. Specifically, we
consider the widely used temporal-difference reinforcement learning algorithm
with epsilon-greedy exploration in the classic environment of an iterated
Prisoner's dilemma with one-period memory. Each of the two learning agents
learns a strategy that conditions the following action choices on both agents'
action choices of the last round. We find that, alongside a strong weighting of
future rewards, a low exploration rate, and a small learning rate, it is primarily
intrinsic stochastic fluctuations of the reinforcement learning process which
double the final rate of cooperation to up to 80%. Thus, inherent noise is not
a necessary evil of the iterative learning process. It is a critical asset for
the learning of cooperation. However, we also point out the trade-off between a
high likelihood of cooperative behavior and achieving this in a reasonable
amount of time. Our findings are relevant for purposefully designing
cooperative algorithms and regulating undesired collusive effects.
Comment: 9 pages, 4 figure
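A bare-bones version of the setup described here can be written in a few dozen lines. The hyperparameters, the shared joint-state representation, and the payoff values below are assumptions for illustration, not the paper's exact configuration: two epsilon-greedy temporal-difference Q-learners play an iterated Prisoner's Dilemma, each conditioning on both players' actions from the previous round.

```python
import random

T, R, P, S = 5.0, 3.0, 1.0, 0.0  # canonical PD payoffs (assumed values)
ACTIONS = (0, 1)                 # 0 = defect, 1 = cooperate

def payoffs(a1, a2):
    return {(1, 1): (R, R), (1, 0): (S, T), (0, 1): (T, S), (0, 0): (P, P)}[(a1, a2)]

def choose(Q, state, eps):
    """Epsilon-greedy action selection: explore with probability eps."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

def run(steps=5000, alpha=0.05, gamma=0.95, eps=0.05, seed=0):
    random.seed(seed)
    states = [(a, b) for a in ACTIONS for b in ACTIONS]  # one-period memory
    Q1 = {s: {a: 0.0 for a in ACTIONS} for s in states}
    Q2 = {s: {a: 0.0 for a in ACTIONS} for s in states}
    state, coop = (1, 1), 0
    for _ in range(steps):
        a1, a2 = choose(Q1, state, eps), choose(Q2, state, eps)
        r1, r2 = payoffs(a1, a2)
        nxt = (a1, a2)
        for Q, a, r in ((Q1, a1, r1), (Q2, a2, r2)):
            # Standard Q-learning (temporal-difference) update.
            best_next = max(Q[nxt].values())
            Q[state][a] += alpha * (r + gamma * best_next - Q[state][a])
        coop += (a1 == 1) + (a2 == 1)
        state = nxt
    return coop / (2 * steps)  # average cooperation rate over the run

rate = run(steps=2000)
```

Rerunning with different seeds exposes the run-to-run variability that, per the abstract, is driven by the intrinsic stochastic fluctuations of the learning process itself.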
Welfare Diplomacy: Benchmarking Language Model Cooperation
The growing capabilities and increasingly widespread deployment of AI systems
necessitate robust benchmarks for measuring their cooperative capabilities.
Unfortunately, most multi-agent benchmarks are either zero-sum or purely
cooperative, providing limited opportunities for such measurements. We
introduce a general-sum variant of the zero-sum board game Diplomacy -- called
Welfare Diplomacy -- in which players must balance investing in military
conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both
a clearer assessment of and stronger training incentives for cooperative
capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules
and implementing them via an open-source Diplomacy engine; (2) constructing
baseline agents using zero-shot prompted language models; and (3) conducting
experiments where we find that baselines using state-of-the-art models attain
high social welfare but are exploitable. Our work aims to promote societal
safety by aiding researchers in developing and assessing multi-agent AI
systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is
available at https://github.com/mukobi/welfare-diplomacy
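The guns-versus-butter trade-off at the heart of the variant can be sketched as a scoring rule. The function names and the exact accounting below are assumptions for illustration and are not the open-source engine's API: the idea is that each year a player may build fewer units than its supply centres would allow, banking the surplus as welfare points, and the final score is the accumulated total rather than conquest.

```python
def yearly_welfare_gain(supply_centers, units_kept):
    """Surplus capacity converted into welfare points (never negative)."""
    return max(supply_centers - units_kept, 0)

def final_score(history):
    """Total welfare points over a game, given (centers, units) per year."""
    return sum(yearly_welfare_gain(c, u) for c, u in history)
```

For example, `final_score([(5, 3), (6, 6), (4, 2)])` is 4: two years bank a surplus of 2 while the fully militarised middle year banks nothing, which is the incentive structure that rewards restraint over conquest.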