Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning
Social psychology and real experiences show that cognitive consistency plays
an important role in keeping human society in order: if people have a more
consistent cognition about their environments, they are more likely to achieve
better cooperation. Meanwhile, only cognitive consistency within a neighborhood
matters because humans only interact directly with their neighbors. Inspired by
these observations, we take the first step to introduce \emph{neighborhood
cognitive consistency} (NCC) into multi-agent reinforcement learning (MARL).
Our NCC design is quite general and can be easily combined with existing MARL
methods. As examples, we propose neighborhood cognition consistent deep
Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation.
Extensive experiments on several challenging tasks (i.e., packet routing, wifi
configuration, and Google football player control) justify the superior
performance of our methods compared with state-of-the-art MARL approaches.
Comment: Accepted by AAAI2020 with oral presentation
(https://aaai.org/Conferences/AAAI-20/wp-content/uploads/2020/01/AAAI-20-Accepted-Paper-List.pdf).
Since AAAI2020 has started, I have the right to distribute this paper on
arXi
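
One way to picture neighborhood cognitive consistency is as a penalty that pulls each agent's internal representation toward the average representation of its direct neighbors. The sketch below is an illustration under that reading only: the abstract does not give the loss, so the function name `neighborhood_consistency_penalty`, the 0/1 adjacency matrix, and the mean-squared form are all assumptions.

```python
import numpy as np

def neighborhood_consistency_penalty(cognition, adjacency):
    """Hypothetical penalty: mean squared distance between each agent's
    vector and the mean vector of its neighbors (assumed form, not the
    paper's loss). Agents without neighbors are ignored."""
    cognition = np.asarray(cognition, dtype=float)   # shape (n_agents, dim)
    adjacency = np.asarray(adjacency, dtype=float)   # shape (n_agents, n_agents), 0/1
    degree = adjacency.sum(axis=1, keepdims=True)    # neighbors per agent
    has_neighbors = degree[:, 0] > 0
    if not has_neighbors.any():
        return 0.0
    # Average the neighbors' vectors; guard against division by zero.
    neighbor_mean = adjacency @ cognition / np.maximum(degree, 1.0)
    diff = (cognition - neighbor_mean)[has_neighbors]
    return float((diff ** 2).sum(axis=1).mean())
```

In a training loop, a term like this would be added to the usual Q-learning or actor-critic loss with a small weight, so that only agents linked in `adjacency` are pushed toward agreement.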
Cooperative Multiagent Attentional Communication for Large-Scale Task Space
Acknowledgments: This work was supported by the Dalian University Research Platform Project Funding: Dalian Wise Information Technology of Med and Health Key Laboratory, the National Natural Science Foundation of China: Research on the stability of multi-surface high-speed unmanned boat formation and the method of cooperative collision avoidance in complex sea conditions, No. 61673084.
Peer reviewed. Postprint. Publisher PD
Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?
Centralized Training with Decentralized Execution (CTDE) has recently emerged
as a popular framework for cooperative Multi-Agent Reinforcement Learning
(MARL), where agents can use additional global state information to guide
training in a centralized way and make their own decisions only based on
decentralized local policies. Despite the encouraging results achieved, CTDE
makes an independence assumption on agent policies, which prevents agents from
adopting global cooperative information from each other during centralized
training. Therefore, we argue that existing CTDE methods cannot fully utilize
global information for training, leading to inefficient joint-policy
exploration and even suboptimal results. In this paper, we introduce a novel
Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent
reinforcement learning that not only enables an efficacious message exchange
among agents during training but also guarantees independent policies for
execution. Firstly, CADP endows agents with an explicit communication channel to
seek and take advice from different agents for more centralized training. To
further ensure the decentralized execution, we propose a smooth model pruning
mechanism to progressively constrain the agent communication into a closed one
without degradation in agent cooperation capability. Empirical evaluations on
StarCraft II micromanagement and Google Research Football benchmarks
demonstrate that the proposed framework achieves superior performance compared
with the state-of-the-art counterparts. Our code will be made publicly
available
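
The idea of advising during training and then closing the communication channel can be sketched with a toy mixing step. This is not the CADP mechanism itself, which the abstract does not specify: the attention-style weights, the `prune` coefficient, and the linear interpolation toward an identity matrix are assumptions swapped in purely for illustration, and `advised_features` is a hypothetical name.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def advised_features(features, scores, prune):
    """Mix each agent's features with those of the other agents.

    prune = 0.0 -> open channel (weights come from softmax of `scores`)
    prune = 1.0 -> closed channel (each agent uses only its own features)
    The linear schedule between the two is an assumed stand-in for the
    paper's smooth pruning mechanism.
    """
    features = np.asarray(features, dtype=float)   # shape (n_agents, dim)
    scores = np.asarray(scores, dtype=float)       # shape (n_agents, n_agents)
    n = features.shape[0]
    attn = softmax(scores)                         # each row sums to 1
    weights = (1.0 - prune) * attn + prune * np.eye(n)
    return weights @ features
```

Raising `prune` from 0 to 1 over the course of training leaves every agent with a policy input that depends only on its own features, which is the property decentralized execution requires.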
Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
Each year, expert-level performance is attained in increasingly-complex
multiagent domains, notable examples including Go, Poker, and StarCraft II.
This rapid progression is accompanied by a commensurate need to better
understand how such agents attain this performance, to enable their safe
deployment, identify limitations, and reveal potential means of improving them.
In this paper we take a step back from performance-focused multiagent learning,
and instead turn our attention towards agent behavior analysis. We introduce a
model-agnostic method for discovery of behavior clusters in multiagent domains,
using variational inference to learn a hierarchy of behaviors at the joint and
local agent levels. Our framework makes no assumption about agents' underlying
learning algorithms, does not require access to their latent states or
policies, and is trained using only offline observational data. We illustrate
the effectiveness of our method for enabling the coupled understanding of
behaviors at the joint and local agent level, detection of behavior
changepoints throughout training, discovery of core behavioral concepts,
demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo
control domain, and also illustrate that the approach can disentangle
previously-trained policies in OpenAI's hide-and-seek domain