Inference-Based Deterministic Messaging For Multi-Agent Communication
Communication is essential for coordination among humans and animals.
Therefore, with the introduction of intelligent agents into the world,
agent-to-agent and agent-to-human communication becomes necessary. In this
paper, we first study learning in matrix-based signaling games to empirically
show that decentralized methods can converge to a suboptimal policy. We then
propose a modification to the messaging policy, in which the sender
deterministically chooses the best message that helps the receiver to infer the
sender's observation. Using this modification, we see, empirically, that the
agents converge to the optimal policy in nearly all the runs. We then apply
this method to a partially observable gridworld environment which requires
cooperation between two agents and show that, with appropriate approximation
methods, the proposed sender modification can enhance existing decentralized
training methods for more complex domains as well.Comment: 13 pages, 10 figures. Accepted at accepted at the 35th AAAI
Conference on Artificial Intelligence, 202
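The sender modification above can be sketched in a few lines. This is our own minimal illustration, not the paper's implementation: we assume the receiver's inference is a count-based posterior over states given the message (with add-one smoothing), and the sender deterministically picks the message under which that posterior puts the most mass on its true observation. All function names are ours.

```python
# Toy 3-state, 3-message signaling game. The sender chooses the message
# that best lets a count-based receiver infer the sender's observation.

def receiver_posterior(counts, message, n_states):
    """P(state | message) from co-occurrence counts, with add-one smoothing."""
    column = [counts[s][message] + 1 for s in range(n_states)]
    total = sum(column)
    return [c / total for c in column]

def deterministic_message(counts, observation, n_states, n_messages):
    """Deterministically send the message that makes the receiver most
    likely to infer `observation` under its current posterior."""
    best, best_prob = 0, -1.0
    for m in range(n_messages):
        p = receiver_posterior(counts, m, n_states)[observation]
        if p > best_prob:
            best, best_prob = m, p
    return best

# Co-occurrence counts gathered so far: counts[state][message].
counts = [[9, 1, 0],   # state 0 has mostly co-occurred with message 0
          [0, 8, 2],   # state 1 with message 1
          [1, 1, 7]]   # state 2 with message 2
print(deterministic_message(counts, observation=2, n_states=3, n_messages=3))  # -> 2
```

Because the sender's choice is an argmax rather than a sample, the mapping from observations to messages cannot drift once the receiver's inference is accurate, which is the intuition behind the improved convergence the abstract reports.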
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
This work focuses on equilibrium selection in no-conflict multi-agent games,
where we specifically study the problem of selecting a Pareto-optimal
equilibrium among several existing equilibria. It has been shown that many
state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone
to converging to Pareto-dominated equilibria due to the uncertainty each agent
has about the policy of the other agents during training. To address
sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC),
an actor-critic algorithm that utilises a simple property of
no-conflict games (a superset of cooperative games): the Pareto-optimal
equilibrium in a no-conflict game maximises the returns of all agents and
therefore is the preferred outcome for all agents. We evaluate Pareto-AC in a
diverse set of multi-agent games and show that it converges to higher episodic
returns compared to seven state-of-the-art MARL algorithms and that it
successfully converges to a Pareto-optimal equilibrium in a range of matrix
games. Finally, we propose PACDCG, a graph neural network extension of
Pareto-AC which is shown to efficiently scale in games with a large number of
agents.
Comment: 20 pages, 12 figures.
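The no-conflict property the abstract relies on is easy to see in a toy matrix game. The sketch below is our own illustration of the property, not the Pareto-AC algorithm: in a no-conflict game all agents rank joint outcomes identically, so the Pareto-optimal equilibrium is simply the joint action maximising the common payoff. The stag-hunt payoffs are standard textbook values.

```python
# Stag hunt: a no-conflict game with two pure equilibria. Independent
# learners often settle on the risk-dominant (HARE, HARE) equilibrium;
# maximising the shared payoff over *joint* actions instead selects the
# Pareto-optimal (STAG, STAG) equilibrium.

STAG, HARE = 0, 1
# Identical payoff for both agents: rows = agent 1's action, cols = agent 2's.
payoff = [[4, 0],   # (STAG, STAG) = 4, (STAG, HARE) = 0
          [3, 3]]   # (HARE, STAG) = 3, (HARE, HARE) = 3

def pareto_optimal_joint_action(payoff):
    """Argmax over joint actions of the shared payoff."""
    best = max((payoff[a1][a2], (a1, a2))
               for a1 in range(len(payoff))
               for a2 in range(len(payoff[0])))
    return best[1]

print(pareto_optimal_joint_action(payoff))  # -> (0, 0): both hunt the stag
```

Pareto-AC itself learns this maximisation with function approximation rather than enumerating the payoff matrix; the enumeration here only demonstrates why the targeted equilibrium is well defined in no-conflict games.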
Learning in Cooperative Multiagent Systems Using Cognitive and Machine Models
Developing effective Multi-Agent Systems (MAS) is critical for many
applications requiring collaboration and coordination with humans. Despite the
rapid advance of Multi-Agent Deep Reinforcement Learning (MADRL) in cooperative
MAS, one major challenge is the simultaneous learning and interaction of
independent agents in dynamic environments in the presence of stochastic
rewards. State-of-the-art MADRL models struggle to perform well in Coordinated
Multi-agent Object Transportation Problems (CMOTPs), wherein agents must
coordinate with each other and learn from stochastic rewards. In contrast,
humans often learn rapidly to adapt to nonstationary environments that require
coordination among people. In this paper, motivated by the demonstrated ability
of cognitive models based on Instance-Based Learning Theory (IBLT) to capture
human decisions in many dynamic decision making tasks, we propose three
variants of Multi-Agent IBL models (MAIBL). The idea of these MAIBL algorithms
is to combine the cognitive mechanisms of IBLT and the techniques of MADRL
models to deal with coordination MAS in stochastic environments from the
perspective of independent learners. We demonstrate that the MAIBL models
exhibit faster learning and achieve better coordination in a dynamic CMOTP task
with various settings of stochastic rewards compared to current MADRL models.
We discuss the benefits of integrating cognitive insights into MADRL models.
Comment: 22 pages, 5 figures, 2 tables.
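The cognitive mechanisms IBLT contributes can be sketched with a stripped-down instance-based chooser. This follows the general ACT-R-style mechanics IBLT builds on (base-level activation from frequency and recency, Boltzmann retrieval, blended values); the parameter values and data layout are our illustrative choices, not the paper's MAIBL variants.

```python
import math

DECAY = 0.5         # memory decay d: older instances lose activation
TEMPERATURE = 0.25  # softness of the retrieval distribution

def activation(timestamps, now):
    """Base-level activation: frequently and recently observed instances
    are more active (power-law decay over elapsed time)."""
    return math.log(sum((now - t) ** (-DECAY) for t in timestamps))

def blended_value(memory, action, now):
    """Expected outcome of `action`: outcomes weighted by their
    retrieval probabilities (softmax over activations)."""
    acts = {u: activation(ts, now) for u, ts in memory[action].items()}
    weights = {u: math.exp(a / TEMPERATURE) for u, a in acts.items()}
    z = sum(weights.values())
    return sum(u * w / z for u, w in weights.items())

# memory[action] maps an observed outcome -> the timesteps it was observed.
memory = {
    "left":  {1.0: [1, 3, 5], 0.0: [2]},  # mostly rewarded
    "right": {0.0: [4, 6]},               # never rewarded so far
}
now = 7
choice = max(memory, key=lambda a: blended_value(memory, a, now))
print(choice)  # -> 'left'
```

Because recency and frequency drive activation, stale instances fade quickly, which is one reason IBL models adapt fast in the nonstationary, stochastic-reward settings the abstract describes.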
Context-Aware Sparse Deep Coordination Graphs
Learning sparse coordination graphs adaptive to the coordination dynamics
among agents is a long-standing problem in cooperative multi-agent learning.
This paper studies this problem and proposes a novel method using the variance
of payoff functions to construct context-aware sparse coordination topologies.
We theoretically support our method by proving that the smaller the
variance of a payoff function is, the less likely action selection is to change
after removing the corresponding edge. Moreover, we propose to learn action
representations to effectively reduce the influence of payoff functions'
estimation errors on graph construction. To empirically evaluate our method, we
present the Multi-Agent COordination (MACO) benchmark by collecting classic
coordination problems in the literature, increasing their difficulty, and
classifying them into different types. We carry out a case study and
experiments on the MACO and StarCraft II micromanagement benchmark to
demonstrate the dynamics of sparse graph learning, the influence of graph
sparseness, and the learning performance of our method.
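The variance criterion above admits a very small sketch. This is our own construction, not the paper's estimator (which must also handle approximation error in the learned payoffs via action representations): for each candidate edge (i, j), score the variance of its pairwise payoff table q_ij over joint actions, and keep only the highest-variance edges, since a near-constant payoff barely affects action selection.

```python
# Prune a coordination graph by the variance of each edge's payoff values.

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def sparse_graph(pairwise_payoffs, keep):
    """pairwise_payoffs: {(i, j): flat list of q_ij(a_i, a_j) values}.
    Returns the `keep` edges whose payoffs vary most across joint actions."""
    scored = sorted(pairwise_payoffs,
                    key=lambda e: variance(pairwise_payoffs[e]),
                    reverse=True)
    return set(scored[:keep])

q = {
    (0, 1): [5.0, -5.0, 2.0, -2.0],  # strongly interacting pair
    (0, 2): [1.0, 1.1, 0.9, 1.0],    # nearly constant: safe to drop
    (1, 2): [3.0, 0.0, -3.0, 0.0],   # moderate interaction
}
print(sparse_graph(q, keep=2))  # -> {(0, 1), (1, 2)}
```

Recomputing the scores per state makes the topology context-aware: an edge can be active in states where the two agents genuinely interact and pruned elsewhere.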
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
In many real-world settings, a team of agents must coordinate its behaviour
while acting in a decentralised fashion. At the same time, it is often possible
to train the agents in a centralised fashion where global state information is
available and communication constraints are lifted. Learning joint
action-values conditioned on extra state information is an attractive way to
exploit centralised learning, but the best strategy for then extracting
decentralised policies is unclear. Our solution is QMIX, a novel value-based
method that can train decentralised policies in a centralised end-to-end
fashion. QMIX employs a mixing network that estimates joint action-values as a
monotonic combination of per-agent values. We structurally enforce that the
joint-action value is monotonic in the per-agent values, through the use of
non-negative weights in the mixing network, which guarantees consistency
between the centralised and decentralised policies. To evaluate the performance
of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new
benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a
challenging set of SMAC scenarios and show that it significantly outperforms
existing multi-agent reinforcement learning methods.
Comment: Extended version of the ICML 2018 conference paper (arXiv:1803.11485).
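The monotonicity constraint at the heart of QMIX can be shown in miniature. This is a deliberately simplified sketch: in QMIX the mixing weights and bias come from hypernetworks conditioned on the global state and the mixing is a small nonlinear network, whereas here we use one fixed linear layer. The key mechanism is the same: stripping the sign of the weights makes Q_tot monotonically non-decreasing in every per-agent value, so each agent's greedy action with respect to its own Q agrees with the joint greedy action on Q_tot.

```python
# Q_tot as a monotonic combination of per-agent Q-values: taking the
# absolute value of the (hypernetwork-produced) weights guarantees
# dQ_tot / dQ_i >= 0 for every agent i.

def mix(per_agent_q, weights, bias):
    """Monotonic mixing of per-agent values into a joint action-value."""
    return sum(abs(w) * q for w, q in zip(weights, per_agent_q)) + bias

q_agents = [1.0, 2.0, -0.5]
w = [0.5, -1.0, 2.0]  # raw weights may be negative; abs() enforces monotonicity
print(mix(q_agents, w, bias=0.1))

# Monotonicity: raising any single agent's value never lowers Q_tot.
assert mix([1.0, 3.0, -0.5], w, 0.1) >= mix(q_agents, w, 0.1)
```

This consistency is what lets QMIX train centrally on Q_tot while executing decentrally: each agent can act greedily on its own value without consulting the mixer.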