R-MADDPG for Partially Observable Environments and Limited Communication
There are several real-world tasks that would benefit from applying
multiagent reinforcement learning (MARL) algorithms, including coordination
among self-driving cars. The real world poses challenging conditions for
multiagent learning systems, such as its partially observable and nonstationary
nature. Moreover, if agents must share a limited resource (e.g. network
bandwidth) they must all learn how to coordinate resource use. This paper
introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for
handling multiagent coordination under partially observable settings and limited
communication. We investigate the effects of recurrency on the performance and
communication use of a team of agents. We demonstrate that the resulting
framework learns time dependencies for sharing missing observations, handling
resource limitations, and developing different communication patterns among
agents.
Comment: Reinforcement Learning for Real Life (RL4RealLife) Workshop at the
36th International Conference on Machine Learning, Long Beach, California,
USA, 2019.
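To make the recurrency concrete, here is a minimal PyTorch sketch of a recurrent actor (layer sizes and names are illustrative assumptions, not the authors' architecture): an LSTM hidden state threads through consecutive timesteps, so the policy retains information even when the current observation is incomplete.

import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)  # memory across timesteps
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs, state):
        h, c = self.lstm(torch.relu(self.encoder(obs)), state)
        return torch.tanh(self.head(h)), (h, c)  # action plus updated state

actor = RecurrentActor(obs_dim=8, act_dim=2)
state = (torch.zeros(1, 64), torch.zeros(1, 64))
for _ in range(3):  # the hidden state persists even if observations are missing
    action, state = actor(torch.randn(1, 8), state)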
Improving Coordination in Small-Scale Multi-Agent Deep Reinforcement Learning through Memory-driven Communication
Deep reinforcement learning algorithms have recently been used to train
multiple interacting agents in a centralised manner whilst keeping their
execution decentralised. When the agents can only acquire partial observations
and are faced with tasks requiring coordination and synchronisation skills,
inter-agent communication plays an essential role. In this work, we propose a
framework for multi-agent training using deep deterministic policy gradients
that enables concurrent, end-to-end learning of an explicit communication
protocol through a memory device. During training, the agents learn to perform
read and write operations enabling them to infer a shared representation of the
world. We empirically demonstrate that concurrent learning of the communication
device and individual policies can improve inter-agent coordination and
performance in small-scale systems. Our experimental results show that the
proposed method achieves superior performance in scenarios with up to six
agents. We illustrate how different communication patterns can emerge on six
different tasks of increasing complexity. Furthermore, we study the effects of
corrupting the communication channel, provide a visualisation of the
time-varying memory content as the underlying task is being solved, and validate
the building blocks of the proposed memory device through ablation studies.
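As a minimal sketch of the read/write mechanism (PyTorch; module names, sizes, and the gated-write form are assumptions, since the paper's memory device is more elaborate), each agent conditions its action on a shared memory vector and then writes a gated update back to it:

import torch
import torch.nn as nn

class MemoryAgent(nn.Module):
    def __init__(self, obs_dim, act_dim, mem_dim=32):
        super().__init__()
        self.policy = nn.Linear(obs_dim + mem_dim, act_dim)  # read: act on memory
        self.write = nn.Linear(obs_dim + mem_dim, mem_dim)   # write: proposed update
        self.gate = nn.Linear(obs_dim + mem_dim, mem_dim)    # how much to overwrite

    def forward(self, obs, memory):
        x = torch.cat([obs, memory], dim=-1)
        action = torch.tanh(self.policy(x))
        g = torch.sigmoid(self.gate(x))
        memory = (1 - g) * memory + g * torch.tanh(self.write(x))  # gated write
        return action, memory

agents = [MemoryAgent(obs_dim=8, act_dim=2) for _ in range(2)]
memory = torch.zeros(1, 32)
for agent in agents:  # agents access the shared memory in turn
    action, memory = agent(torch.randn(1, 8), memory)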
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Reinforcement learning in multi-agent scenarios is important for real-world
applications but presents challenges beyond those seen in single-agent
settings. We present an actor-critic algorithm that trains decentralized
policies in multi-agent settings, using centrally computed critics that share
an attention mechanism which selects relevant information for each agent at
every timestep. This attention mechanism enables more effective and scalable
learning in complex multi-agent environments when compared to recent
approaches. Our approach is applicable not only to cooperative settings with
shared rewards but also to settings with individualized rewards, including
adversarial settings, as well as settings that do not provide global states, and it makes
no assumptions about the action spaces of the agents. As such, it is flexible
enough to be applied to most multi-agent learning problems.
Comment: ICML 2019 camera-ready version.
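The core mechanism can be sketched as scaled dot-product attention over the other agents' encoded observation-action pairs (a minimal sketch; the sizes, the single attention head, and the masking below are assumptions rather than the paper's hyperparameters):

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCritic(nn.Module):
    def __init__(self, enc_dim=32):
        super().__init__()
        self.query = nn.Linear(enc_dim, enc_dim, bias=False)
        self.key = nn.Linear(enc_dim, enc_dim, bias=False)
        self.value = nn.Linear(enc_dim, enc_dim, bias=False)
        self.q_head = nn.Linear(2 * enc_dim, 1)

    def forward(self, enc):  # enc: (n_agents, enc_dim) observation-action encodings
        q, k, v = self.query(enc), self.key(enc), self.value(enc)
        scores = q @ k.t() / k.shape[-1] ** 0.5
        scores.fill_diagonal_(float('-inf'))  # each agent attends to the *others*
        attended = F.softmax(scores, dim=-1) @ v
        return self.q_head(torch.cat([enc, attended], dim=-1))  # one Q per agent

critic = AttentionCritic()
q_values = critic(torch.randn(3, 32))  # (3, 1): a value estimate for each agent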
Learning Attentional Communication for Multi-Agent Cooperation
Communication could potentially be an effective way to enable multi-agent
cooperation. However, existing methods share information among all agents or
within predefined communication architectures, both of which can be problematic.
When there is a large number of agents, agents cannot differentiate valuable
information that helps cooperative decision making from globally shared
information. Therefore, communication barely helps, and could even impair the
learning of multi-agent cooperation. Predefined communication architectures, on
the other hand, restrict communication among agents and thus restrain potential
cooperation. To tackle these difficulties, in this paper, we propose an
attentional communication model that learns when communication is needed and
how to integrate shared information for cooperative decision making. Our model
leads to efficient and effective communication for large-scale multi-agent
cooperation. Empirically, we show the strength of our model in a variety of
cooperative scenarios, where agents are able to develop more coordinated and
sophisticated strategies than existing methods.
Comment: NIPS'18.
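A minimal sketch of the "when to communicate" part (illustrative only; the paper additionally learns how to integrate shared information, which is reduced here to mean pooling): a small classifier scores each agent's local encoding, and agents whose gate opens fold a group summary into their own encoding before acting.

import torch
import torch.nn as nn

class CommGate(nn.Module):
    def __init__(self, enc_dim=32):
        super().__init__()
        self.score = nn.Linear(enc_dim, 1)

    def forward(self, thought):
        return torch.sigmoid(self.score(thought))  # probability communication helps

gate = CommGate()
thoughts = torch.randn(3, 32)        # one local encoding ("thought") per agent
opens = gate(thoughts) > 0.5         # which agents decide to communicate
shared = thoughts.mean(dim=0, keepdim=True)  # stand-in for learned integration
integrated = torch.where(opens, thoughts + shared, thoughts)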
Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control
Centralised training with decentralised execution (CTDE) is an important
learning paradigm in multi-agent reinforcement learning (MARL). To make
progress in CTDE, we introduce Multi-Agent MuJoCo (MAMuJoCo), a novel benchmark
suite that, unlike the StarCraft Multi-Agent Challenge (SMAC), the predominant
benchmark environment, applies to continuous robotic control tasks. To
demonstrate the utility of MAMuJoCo, we present a range of benchmark results on
this new suite, including comparing the state-of-the-art actor-critic method
MADDPG against two novel variants of existing methods. These new methods
outperform MADDPG on a number of MAMuJoCo tasks. In addition, we show that, in
these continuous cooperative MAMuJoCo tasks, value factorisation plays a
greater role in performance than the underlying algorithmic choices. This
motivates extending the study of value factorisation from
Q-learning to actor-critic algorithms.
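To make the value-factorisation point concrete, here is a minimal additive decomposition of a joint critic into per-agent utilities (a VDN-style sketch under assumed sizes; the variants compared in the paper may mix utilities differently):

import torch
import torch.nn as nn

class FactoredCritic(nn.Module):
    def __init__(self, n_agents, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.utils = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_agents))

    def forward(self, obs, act):  # obs: (n, obs_dim), act: (n, act_dim)
        per_agent = [u(torch.cat([o, a], -1))
                     for u, o, a in zip(self.utils, obs, act)]
        return torch.stack(per_agent).sum()  # joint Q as a sum of utilities

critic = FactoredCritic(n_agents=2, obs_dim=8, act_dim=2)
q_joint = critic(torch.randn(2, 8), torch.randn(2, 2))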
A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent scenarios and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone, raising
practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community.
Comment: Under review since Oct 2018. Earlier versions of this work had the
title: "Is multiagent deep reinforcement learning the answer or the question?
A brief survey".
Decentralized Multi-Agent Actor-Critic with Generative Inference
Recent multi-agent actor-critic methods have utilized centralized training
with decentralized execution to address the non-stationarity of co-adapting
agents. This training paradigm constrains learning to the centralized phase
such that only pre-learned policies may be used during the decentralized phase,
which performs poorly when agent communications are delayed, noisy, or
disrupted. In this work, we propose a new system that can gracefully handle
partially-observable information due to communication disruptions during
decentralized execution. Our approach augments the multi-agent actor-critic
method's centralized training phase with generative modeling so that agents may
infer other agents' observations when provided with locally available context.
Our method is evaluated on three tasks that require agents to combine local and
remote observations communicated by other agents. We evaluate our approach by
introducing partial observability during decentralized execution and show
that decentralized training on inferred observations performs as well as or
better than existing actor-critic methods.
Comment: 8 pages. Accepted to the Deep Reinforcement Learning Workshop at
NeurIPS 2019.
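A minimal sketch of the inference step (PyTorch, with assumed names and sizes; the paper's generative model is richer than this deterministic predictor): a network trained during the centralized phase predicts a teammate's observation from locally available context and stands in for a message that never arrives.

import torch
import torch.nn as nn

class ObsInference(nn.Module):
    def __init__(self, local_dim, remote_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(local_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, remote_dim))

    def forward(self, local_obs):
        return self.net(local_obs)  # predicted remote observation

infer = ObsInference(local_dim=8, remote_dim=6)
local_obs = torch.randn(1, 8)
message = None  # the teammate's communication was disrupted this step
remote_obs = message if message is not None else infer(local_obs)
policy_input = torch.cat([local_obs, remote_obs], dim=-1)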
Multi-Agent Actor-Critic with Generative Cooperative Policy Network
We propose an efficient multi-agent reinforcement learning approach to derive
equilibrium strategies for multiple agents participating in a Markov game.
Mainly, we focus on obtaining decentralized policies for agents that maximize
the performance of a collaborative task carried out by all the agents, which is
similar to solving a decentralized Markov decision process. We propose to use
two different policy networks: (1) a decentralized greedy policy network used to
generate greedy actions during both the training and execution periods and (2) a
generative cooperative policy network (GCPN) used to generate action samples
that help other agents improve their objectives during the training period. We
show that the samples generated by the GCPN enable other agents to explore the
policy space more effectively and reach a better policy for achieving the
collaborative tasks.
Comment: 10 pages, 9 figures in total including all sub-figures.
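A minimal sketch of the two-network split (illustrative architectures and exploration noise; not the paper's exact training scheme): the greedy network produces the agent's own actions, while the cooperative network produces the stochastic samples that teammates train against.

import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
greedy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
coop = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

obs = torch.randn(1, obs_dim)
own_action = torch.tanh(greedy(obs))  # executed by this agent
# Samples offered to teammates during training: noise around the cooperative
# network's output encourages them to explore the policy space.
teammate_sample = torch.tanh(coop(obs)) + 0.1 * torch.randn(1, act_dim)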
Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning
In this paper, we propose a new learning technique named message-dropout to
improve the performance of multi-agent deep reinforcement learning under two
application scenarios: 1) classical multi-agent reinforcement learning with
direct message communication among agents and 2) centralized training with
decentralized execution. In the first application scenario of multi-agent
systems in which direct message communication among agents is allowed, the
message-dropout technique drops out the received messages from other agents in
a block-wise manner with a certain probability in the training phase and
compensates for this effect by multiplying the weights of the dropped-out block
units with a correction probability. The applied message-dropout technique
effectively handles the increased input dimension in multi-agent reinforcement
learning with communication and makes learning robust against communication
errors in the execution phase. In the second application scenario of
centralized training with decentralized execution, we particularly consider the
application of the proposed message-dropout to Multi-Agent Deep Deterministic
Policy Gradient (MADDPG), which uses a centralized critic to train a
decentralized actor for each agent. We evaluate the proposed message-dropout
technique on several games, and numerical results show that message-dropout
with a proper dropout rate significantly improves reinforcement learning
performance in terms of both the training speed and the steady-state
performance in the execution phase.
Comment: The 33rd AAAI Conference on Artificial Intelligence (AAAI) 2019.
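The block-wise dropout in the first scenario can be sketched in a few lines (standard inverted-dropout rescaling plays the role of the correction step here; the paper's exact weight correction may differ):

import torch

def message_dropout(messages, p=0.5, training=True):
    # messages: (n_senders, msg_dim); drop whole message blocks, not elements
    if not training or p == 0.0:
        return messages
    keep = (torch.rand(messages.shape[0], 1) > p).float()
    return messages * keep / (1.0 - p)  # rescale so expected magnitude matches

received = torch.randn(4, 16)  # messages received from four other agents
train_input = message_dropout(received, p=0.5)
exec_input = message_dropout(received, training=False)  # untouched at execution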
Learning Multi-agent Communication under Limited-bandwidth Restriction for Internet Packet Routing
Communication is an important factor for large multi-agent systems to stay
organized and productive. Recently, the AI community has applied Deep
Reinforcement Learning (DRL) to learn the communication strategy and the
control policy for multiple agents. However, when implementing communication
in real-world multi-agent applications, there is a more practical
limited-bandwidth restriction, which has been largely ignored by existing
DRL-based methods. Specifically, agents trained by most previous methods keep
sending messages incessantly in every control cycle; because they emit too
many messages, these methods are unsuitable for real-world systems that have
only a limited bandwidth to transmit the messages. To
handle this problem, we propose a gating mechanism to adaptively prune
unprofitable messages. Results show that the gating mechanism can prune more
than 80% of the messages with little damage to performance. Moreover, our method
outperforms several state-of-the-art DRL-based and rule-based methods by a
large margin in both the real-world packet routing tasks and four benchmark
tasks.
Comment: This paper proposes a gating mechanism with several crucial designs
for adaptively pruning the unprofitable communication messages among
multiple agents, such that the limited-bandwidth restriction existing in many
real-world multi-agent systems can be resolved. Experiments show that our
method can prune quite a lot of unprofitable messages with little damage to
the performance.
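A minimal sketch of such a gate (threshold, sizes, and names are illustrative assumptions): a learned scorer decides per control cycle whether the encoded message is worth the bandwidth, and pruned messages are simply never sent.

import torch
import torch.nn as nn

class GatedSender(nn.Module):
    def __init__(self, obs_dim, msg_dim=16):
        super().__init__()
        self.encode = nn.Linear(obs_dim, msg_dim)
        self.gate = nn.Linear(obs_dim, 1)

    def forward(self, obs, threshold=0.5):
        send = torch.sigmoid(self.gate(obs)) > threshold
        return self.encode(obs) if send else None  # None means pruned

sender = GatedSender(obs_dim=8)
sent = [sender(torch.randn(1, 8)) for _ in range(100)]
prune_rate = sum(m is None for m in sent) / len(sent)  # bandwidth saved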