Get It in Writing: Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL
Multi-agent reinforcement learning (MARL) is a powerful tool for training
automated systems acting independently in a common environment. However, it can
lead to sub-optimal behavior when individual incentives and group incentives
diverge. Humans are remarkably adept at solving these social dilemmas. It is
an open problem in MARL to replicate such cooperative behaviors in selfish
agents. In this work, we draw upon the idea of formal contracting from
economics to overcome diverging incentives between agents in MARL. We propose
an augmentation to a Markov game where agents voluntarily agree to binding
state-dependent transfers of reward, under pre-specified conditions. Our
contributions are theoretical and empirical. First, we show that this
augmentation makes all subgame-perfect equilibria of all fully observed Markov
games exhibit socially optimal behavior, given a sufficiently rich space of
contracts. Next, we complement our game-theoretic analysis by showing that
state-of-the-art RL algorithms learn socially optimal policies given our
augmentation. Our experiments include classic static dilemmas like the Stag
Hunt, the Prisoner's Dilemma, and a public goods game, as well as dynamic
interactions that simulate traffic, pollution management, and common-pool
resource management.
Comment: 12 pages, 8 figures, AAMAS 202
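To make the augmentation concrete, here is a minimal sketch of a contract applied on top of a one-shot Prisoner's Dilemma in place of a full Markov game; the payoff values and the "defection penalty" contract are our own illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: a binding transfer contract layered on a one-shot
# Prisoner's Dilemma. Payoffs and the contract below are illustrative only.

# Base payoffs: (row action, col action) -> (r_row, r_col); 0 = cooperate, 1 = defect.
BASE_REWARDS = {
    (0, 0): (3.0, 3.0),
    (0, 1): (0.0, 5.0),
    (1, 0): (5.0, 0.0),
    (1, 1): (1.0, 1.0),
}

def defection_penalty_contract(actions, theta=4.0):
    """Binding transfer: each defector pays `theta` to the other player."""
    t = [0.0, 0.0]
    if actions[0] == 1:
        t[0] -= theta
        t[1] += theta
    if actions[1] == 1:
        t[1] -= theta
        t[0] += theta
    return t

def contracted_rewards(actions, contract=defection_penalty_contract):
    (r1, r2), (t1, t2) = BASE_REWARDS[actions], contract(actions)
    return (r1 + t1, r2 + t2)

# With theta = 4, unilateral defection yields 5 - 4 = 1 < 3, so mutual
# cooperation becomes the dominant-strategy outcome of the augmented game.
for a in BASE_REWARDS:
    print(a, contracted_rewards(a))
```

With a sufficiently large penalty, the augmented payoff matrix makes cooperation dominant, which is the one-shot analogue of the paper's equilibrium claim.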
Stochastic Market Games
Some of the most relevant future applications of multi-agent systems, such as
autonomous driving or factories as a service, involve mixed-motive scenarios in
which agents may have conflicting goals. In these settings, independently
learning agents are likely to converge to undesirable outcomes in terms of
cooperation, such as overly greedy behavior. Motivated by real-world societies,
in this work we propose to utilize market forces to provide incentives for
agents to become cooperative. As demonstrated in an iterated version of the
Prisoner's Dilemma, the proposed market formulation can change the dynamics of
the game so that agents consistently learn cooperative policies. Further, we
evaluate our approach in spatially and temporally extended settings for varying
numbers of agents. We empirically find that the presence of markets can improve
both the overall result and individual agent returns via their trading
activities.
Comment: IJCAI-2
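As a rough illustration of how a market layer can reshape these dynamics, the sketch below adds a simple "pay for cooperation" offer to the Prisoner's Dilemma stage game; the offer mechanism and numbers are our assumptions, not the paper's Stochastic Market Game formulation.

```python
# Toy market layer on the Prisoner's Dilemma (our simplification): besides the
# game action, each agent posts an offer that is paid to the opponent only if
# the opponent cooperates.
BASE = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}  # 0=C, 1=D

def market_step(a1, a2, offer1, offer2):
    r1, r2 = BASE[(a1, a2)]
    if a2 == 0:                     # opponent cooperated: agent 1 pays its offer
        r1, r2 = r1 - offer1, r2 + offer1
    if a1 == 0:                     # agent 1 cooperated: agent 2 pays its offer
        r1, r2 = r1 + offer2, r2 - offer2
    return r1, r2

# With offers of 3, defecting against a cooperator pays 5 - 3 = 2, while
# cooperating pays 3, so trading makes cooperation individually rational.
print(market_step(0, 0, 3, 3))  # (3, 3): payments cancel out
print(market_step(1, 0, 3, 3))  # (2, 3): the defector still pays its offer
```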
Melting Pot 2.0
Multi-agent artificial intelligence research promises a path to develop
intelligent technologies that are more human-like and more human-compatible
than those produced by "solipsistic" approaches, which do not consider
interactions between agents. Melting Pot is a research tool developed to
facilitate work on multi-agent artificial intelligence, and provides an
evaluation protocol that measures generalization to novel social partners in a
set of canonical test scenarios. Each scenario pairs a physical environment (a
"substrate") with a reference set of co-players (a "background population"), to
create a social situation with substantial interdependence between the
individuals involved. For instance, some scenarios were inspired by
institutional-economics-based accounts of natural resource management and
public-good-provision dilemmas. Others were inspired by considerations from
evolutionary biology, game theory, and artificial life. Melting Pot aims to
cover a maximally diverse set of interdependencies and incentives. It includes
the commonly-studied extreme cases of perfectly-competitive (zero-sum)
motivations and perfectly-cooperative (shared-reward) motivations, but does not
stop with them. As in real life, a clear majority of scenarios in Melting Pot
have mixed incentives. They are neither purely competitive nor purely
cooperative, and thus demand that successful agents navigate the resulting
ambiguity. Here we describe Melting Pot 2.0, which revises and expands on
Melting Pot. We also introduce support for scenarios with asymmetric roles, and
explain how to integrate them into the evaluation protocol. This report also
contains: (1) details of all substrates and scenarios; (2) a complete
description of all baseline algorithms and results. Our intention is for it to
serve as a reference for researchers using Melting Pot 2.0.
Comment: 59 pages, 54 figures. arXiv admin note: text overlap with arXiv:2107.0685
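The evaluation protocol can be pictured with a small self-contained sketch (our toy stand-in, not the meltingpot package's actual API): a scenario fixes a substrate and a background population, and the score is the per-capita return of the focal agents.

```python
# Schematic version of Melting Pot's scoring idea, with a toy public-goods
# substrate standing in for a real one. All names here are illustrative.
from typing import Callable, List

Policy = Callable[[], int]  # toy policy: 1 = contribute/cooperate, 0 = free-ride

def toy_substrate(players: List[Policy]) -> List[float]:
    """Public-goods round: contributors pay 1; the pot is doubled and shared."""
    actions = [p() for p in players]
    share = 2.0 * sum(actions) / len(players)
    return [share - a for a in actions]

def focal_per_capita(focal: List[Policy], background: List[Policy]) -> float:
    """Score the focal agents when immersed in a fixed background population."""
    returns = toy_substrate(focal + background)
    return sum(returns[: len(focal)]) / len(focal)

cooperator: Policy = lambda: 1
defector: Policy = lambda: 0
# The same focal agent is scored against different background populations:
print(focal_per_capita([cooperator], [cooperator, cooperator, cooperator]))  # 1.0
print(focal_per_capita([cooperator], [defector, defector, defector]))        # -0.5
```

Varying the background population while holding the focal agents fixed is what turns a substrate into a test of generalization to novel social partners.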
Decentralized scheduling through an adaptive, trading-based multi-agent system
In multi-agent reinforcement learning systems, the actions of one agent can
have a negative impact on the rewards of other agents. One way to combat this
problem is to let agents trade their rewards amongst each other. Motivated by
this, this work applies a trading approach to a simulated scheduling
environment, where the agents are responsible for the assignment of incoming
jobs to compute cores. In this environment, reinforcement learning agents learn
to trade successfully. The agents can trade the usage right of computational
cores to process high-priority, high-reward jobs faster than low-priority,
low-reward jobs. However, due to combinatorial effects, the action and
observation spaces of a simple reinforcement learning agent in this environment
scale exponentially with key parameters of the problem size. This exponential
scaling behavior can be transformed into a linear one if the agent is split
into several independent sub-units. We further improve this distributed
architecture using agent-internal parameter sharing; moreover, it can be
extended to set the exchange prices autonomously. We show that in our
scheduling environment, a distributed agent architecture clearly outperforms
more aggregated approaches, and we demonstrate that it becomes even more
performant with agent-internal parameter sharing. Finally, we investigate how
two different reward functions affect autonomous pricing and the corresponding
scheduling.
Comment: Accepted at ABMHuB 2022 worksho
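The scaling argument is easy to see with a quick calculation (our illustration of the claim, with made-up numbers): a monolithic agent picks a joint assignment over all cores at once, while independent sub-units each handle one core.

```python
# Why factoring the agent helps: with C cores and A options per core, a
# monolithic agent faces A**C joint actions, but C independent sub-units face
# only A options each, i.e. C * A action outputs overall.
def joint_actions(num_cores: int, options_per_core: int) -> int:
    return options_per_core ** num_cores       # exponential in num_cores

def factored_actions(num_cores: int, options_per_core: int) -> int:
    return num_cores * options_per_core        # linear in num_cores

for cores in (2, 4, 8, 16):
    print(f"{cores:2d} cores: joint={joint_actions(cores, 4):>12,} "
          f"factored={factored_actions(cores, 4)}")
```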
Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games
In general-sum games, the interaction of self-interested learning agents
commonly leads to socially worse outcomes, such as defect-defect in the
iterated stag hunt (ISH). Previous works address this challenge by sharing
rewards or shaping their opponents' learning process, which requires overly
strong assumptions. In this paper, we demonstrate that agents trained to optimize
expected returns are more likely to choose a safe action that leads to
guaranteed but lower rewards. However, there typically exists a risky action
that leads to higher rewards in the long run only if agents cooperate, e.g.,
cooperate-cooperate in ISH. To overcome this, we propose using action value
distribution to characterize the decision's risk and corresponding potential
payoffs. Specifically, we present the Adaptable Risk-Sensitive Policy (ARSP).
ARSP learns a distribution over the agent's returns and estimates a dynamic
risk-seeking bonus to discover risky coordination strategies. Furthermore, to
avoid overfitting to its training opponents, ARSP learns an auxiliary opponent
modeling task to infer opponents' types and dynamically alters its strategy
during execution. Empirically, agents trained via ARSP achieve stable
coordination during training without accessing their opponents' rewards or
learning processes, and can adapt to non-cooperative opponents during
execution. To the best of our knowledge, it is the first method to learn
coordination strategies between agents both in the iterated prisoner's dilemma
(IPD) and the iterated stag hunt (ISH) without shaping opponents or rewards,
and to adapt to opponents with distinct strategies during execution.
Furthermore, we show that ARSP can be scaled to high-dimensional settings.
Comment: arXiv admin note: substantial text overlap with arXiv:2205.1585
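As one reading of the risk-seeking bonus, the sketch below scores an action by its mean return plus an upper-quantile upside term; the quantile form and all constants are our assumptions, not the ARSP estimator itself.

```python
# Minimal illustration of risk-seeking action selection from a return
# distribution (a simplified stand-in for ARSP's learned distributional
# values and dynamic bonus; tau and beta are arbitrary here).
import numpy as np

def risk_seeking_score(return_samples: np.ndarray, tau=0.9, beta=0.5) -> float:
    """Mean return plus a bonus for upper-tail (quantile) upside."""
    mean = float(return_samples.mean())
    upside = float(np.quantile(return_samples, tau)) - mean
    return mean + beta * max(upside, 0.0)

# Stag-hunt-like choice: "hare" is safe; "stag" pays only if the partner joins,
# so its return distribution is bimodal with a high upper tail.
rng = np.random.default_rng(0)
hare = np.full(1000, 1.0)                       # guaranteed low reward
stag = rng.choice([0.0, 4.0], size=1000)        # risky, high upside
print(risk_seeking_score(hare), risk_seeking_score(stag))  # stag scores higher
```

Under the plain expected return the safe action wins here; the upside bonus is what lets the risky, cooperation-dependent action be preferred.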
Resolving social dilemmas with minimal reward transfer
Multi-agent cooperation is an important topic, and is particularly
challenging in mixed-motive situations where it does not pay to be nice to
others. Consequently, self-interested agents often avoid collective behaviour,
resulting in suboptimal outcomes for the group. In response, in this paper we
introduce a metric to quantify the disparity between what is rational for
individual agents and what is rational for the group, which we call the general
self-interest level. This metric represents the maximum proportion of
individual rewards that all agents can retain while ensuring that achieving the
social welfare optimum becomes a dominant strategy. By aligning the individual
and group incentives, rational agents acting to maximise their own reward will
simultaneously maximise the collective reward. Because agents transfer rewards
to motivate others to consider their welfare, our approach diverges from
traditional concepts of altruism or prosocial behaviours. The general self-interest level
is a property of a game that is useful for assessing the propensity of players
to cooperate and understanding how features of a game impact this. We
illustrate the effectiveness of our method on several novel game
representations of social dilemmas with arbitrary numbers of players.
Comment: 34 pages, 13 tables, submitted to the Journal of Autonomous Agents and Multi-Agent Systems: Special Issue on Citizen-Centric AI System
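A toy instance of the metric can be computed directly for a Prisoner's Dilemma, under our simplifying assumption that each of two agents keeps a fraction s of its own reward and transfers the remainder to the other (the paper's transfer structures are more general):

```python
# Computing a two-player "self-interest level" for a Prisoner's Dilemma under
# the simple keep-s / transfer-(1-s) scheme described above (our toy version).
BASE = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}  # 0=C, 1=D

def effective_reward(a1, a2, s):
    """Agent 1's post-transfer reward when both agents keep fraction s."""
    r1, r2 = BASE[(a1, a2)]
    return s * r1 + (1 - s) * r2

def cooperation_dominant(s):
    # Cooperating must beat defecting against a cooperator and against a defector.
    return (effective_reward(0, 0, s) >= effective_reward(1, 0, s)
            and effective_reward(0, 1, s) >= effective_reward(1, 1, s))

# Largest retained proportion that still makes cooperation a dominant strategy:
s_max = max(i / 1000 for i in range(1001) if cooperation_dominant(i / 1000))
print(s_max)  # 0.6: agents can keep 60% of their rewards in this dilemma
```

The binding constraint is defection against a cooperator (3 >= 5s), giving s = 0.6 for these payoffs; games that incentivize defection more strongly force a lower retained proportion.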
Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments
The Game Theory & Multi-Agent team at DeepMind studies several aspects of
multi-agent learning ranging from computing approximations to fundamental
concepts in game theory to simulating social dilemmas in rich spatial
environments and training 3D humanoids in difficult team coordination tasks. A
signature aim of our group is to use the resources and expertise in deep
reinforcement learning made available to us at DeepMind to explore multi-agent
systems in complex environments, and to use these benchmarks to advance our
understanding. Here, we summarise the recent work of our team and present a
taxonomy that we feel highlights many important open challenges in multi-agent
research.
Comment: Published in AI Communications 202