Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning
Social psychology and real-world experience show that cognitive consistency
plays an important role in keeping human society in order: if people hold a
more consistent cognition about their environment, they are more likely to
achieve better cooperation. Meanwhile, only cognitive consistency within a
neighborhood matters, because humans interact directly only with their neighbors. Inspired by
these observations, we take the first step to introduce \emph{neighborhood
cognitive consistency} (NCC) into multi-agent reinforcement learning (MARL).
Our NCC design is quite general and can be easily combined with existing MARL
methods. As examples, we propose neighborhood cognition consistent deep
Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation.
Extensive experiments on several challenging tasks (i.e., packet routing, Wi-Fi
configuration, and Google football player control) justify the superior
performance of our methods compared with state-of-the-art MARL approaches.
Comment: Accepted by AAAI2020 with oral presentation
(https://aaai.org/Conferences/AAAI-20/wp-content/uploads/2020/01/AAAI-20-Accepted-Paper-List.pdf).
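As an illustration of the consistency idea above, a toy neighborhood-consistency penalty can be sketched in plain Python. The quadratic penalty toward the neighborhood mean and all names below are assumptions for illustration, not the paper's actual NCC objective.

```python
def ncc_loss(cognitions, neighbors):
    """Toy neighborhood cognitive consistency penalty.

    cognitions: list of cognition vectors, one per agent
    neighbors:  neighbors[i] is the list of agent indices in agent i's
                neighborhood (assumed to include i itself)
    Returns the mean squared distance between each agent's cognition
    vector and the average cognition of its neighborhood.
    """
    loss = 0.0
    for i, cog in enumerate(cognitions):
        nbrs = neighbors[i]
        # neighborhood-mean cognition, computed dimension by dimension
        mean = [sum(cognitions[j][k] for j in nbrs) / len(nbrs)
                for k in range(len(cog))]
        loss += sum((a - b) ** 2 for a, b in zip(cog, mean))
    return loss / len(cognitions)
```

A penalty like this would be added to the usual RL objective, so that minimizing it pulls neighboring agents toward consistent cognition while agents outside the neighborhood are unaffected.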
Learning Agent Communication under Limited Bandwidth by Message Pruning
Communication is crucial for a large multi-agent world to stay organized and
productive. Recently, Deep Reinforcement Learning (DRL) has been
applied to learn the communication strategy and the control policy for multiple
agents. However, the practical \emph{\textbf{limited bandwidth}} in multi-agent
communication has been largely ignored by the existing DRL methods.
Specifically, many methods keep sending messages incessantly, which consumes
too much bandwidth. As a result, they are inapplicable to multi-agent systems
with limited bandwidth. To handle this problem, we propose a gating mechanism
to adaptively prune less beneficial messages. We evaluate the gating mechanism
on several tasks. Experiments demonstrate that it can prune a lot of messages
with little impact on performance. In fact, the performance may be greatly
improved by pruning redundant messages. Moreover, the proposed gating mechanism
is applicable to several previous methods, equipping them with the ability to
address bandwidth-restricted settings.
Comment: accepted as a regular paper with poster presentation @ AAAI20.
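The pruning idea can be sketched as a simple threshold gate over candidate messages. The linear scoring function, the sigmoid, and the fixed threshold below are illustrative assumptions; in the paper the gate is learned end-to-end together with the policy.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_messages(messages, weights, threshold=0.5):
    """Keep only messages whose gate score exceeds the threshold.

    messages:  list of message feature vectors, one per sending agent
    weights:   parameters of a hypothetical linear gate (one per feature)
    threshold: messages scoring below this are pruned, saving bandwidth
    """
    kept = []
    for m in messages:
        score = sigmoid(sum(w * x for w, x in zip(weights, m)))
        if score >= threshold:  # send only sufficiently valuable messages
            kept.append(m)
    return kept
```

With a learned gate, the score would estimate how much a message improves the receivers' decisions, so pruning low-scoring messages trades little performance for large bandwidth savings.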
Opponent Modelling in Multi-Agent Systems
Reinforcement Learning (RL) formalises a problem where an intelligent agent needs to learn and achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments because of adaptive opponents. Partial observation, caused by agents' differing private observations, introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to poor performance against another set. Non-stationarity, partial observation and unclear learning objectives are three critical problems in MARL that hinder agents' learning, and they share a common cause: the lack of knowledge about the other agents. Therefore, in this thesis, we propose to solve these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of the problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate non-stationarity in cooperative games. Then we study the partial observation problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear learning objective problems in zero-sum games. We propose a solution named EPSOM, which aims to find safe exploitation strategies to play against non-stationary opponents. We verify our proposed methods through varied experiments and show that they achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.
Learning to communicate in cooperative multi-agent reinforcement learning
Recent advances in deep reinforcement learning have produced unprecedented results. The success obtained in single-agent applications has led to exploring these techniques in the context of multi-agent systems, where several additional challenges must be considered. Communication has always been crucial to achieving cooperation in multi-agent domains, and learning to communicate represents a fundamental milestone for multi-agent reinforcement learning algorithms. In this thesis, different multi-agent reinforcement learning approaches are explored. These provide architectures that are learned end-to-end and capable of achieving effective communication protocols that can boost system performance in cooperative settings. Firstly, we investigate a novel approach where inter-agent communication happens through a shared memory device that the agents can use to exchange messages through learnable read and write operations. Secondly, we propose a graph-based approach where connectivities are shaped by exchanging pairwise messages, which are then aggregated through a novel form of attention mechanism based on a graph diffusion model. Finally, we present a new set of environments with real-world-inspired constraints that we use to benchmark the most recent state-of-the-art solutions. Our results show that communication can be a fundamental tool to overcome some of the intrinsic difficulties that characterise cooperative multi-agent systems.
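The pairwise-message aggregation described in the second approach can be sketched with plain softmax attention. The dot-product scoring and the function names below are assumptions for illustration; the thesis itself uses a graph-diffusion-based attention mechanism rather than this simplified form.

```python
import math

def aggregate_messages(queries, messages):
    """Softmax-attention aggregation of messages into one vector per agent.

    queries:  one query vector per receiving agent
    messages: one message vector per sending agent
    Each receiver scores every message by dot product with its query,
    softmaxes the scores, and returns the weighted average message.
    """
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) for m in messages]
        mx = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - mx) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        agg = [sum(w * m[k] for w, m in zip(weights, messages))
               for k in range(len(messages[0]))]
        out.append(agg)
    return out
```

In a graph-based variant, the sum would run only over each agent's graph neighbors, so the connectivity structure shapes which messages can influence which agents.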
Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?
Centralized Training with Decentralized Execution (CTDE) has recently emerged
as a popular framework for cooperative Multi-Agent Reinforcement Learning
(MARL), where agents can use additional global state information to guide
training in a centralized way and make their own decisions only based on
decentralized local policies. Despite the encouraging results achieved, CTDE
makes an independence assumption on agent policies, which prevents agents from
adopting global cooperative information from each other during centralized
training. Therefore, we argue that existing CTDE methods cannot fully utilize
global information for training, leading to inefficient joint-policy
exploration and even suboptimal results. In this paper, we introduce a novel
Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent
reinforcement learning that not only enables efficacious message exchange
among agents during training but also guarantees independent policies for
execution. First, CADP endows agents with an explicit communication channel to
seek and take advice from other agents for more centralized training. To
further ensure decentralized execution, we propose a smooth model pruning
mechanism that progressively constrains agent communication to a closed,
self-contained form without degrading agents' cooperation capability. Empirical
evaluations on
StarCraft II micromanagement and Google Research Football benchmarks
demonstrate that the proposed framework achieves superior performance compared
with the state-of-the-art counterparts. Our code will be made publicly
available.
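The smooth pruning idea (annealing cross-agent influence toward a self-only pattern so that execution needs no communication) can be sketched as a linear schedule over an attention matrix. The schedule and names below are illustrative assumptions, not CADP's actual pruning mechanism.

```python
def prune_attention(attn, progress):
    """Anneal cross-agent attention toward a self-only (identity) pattern.

    attn:     n x n row-stochastic matrix; attn[i][j] is how much agent i
              attends to agent j's advice during centralized training
    progress: training progress in [0, 1]; at 1.0 each agent attends
              only to itself, so execution is fully decentralized
    """
    n = len(attn)
    pruned = []
    for i in range(n):
        row = []
        for j in range(n):
            target = 1.0 if i == j else 0.0  # identity = no communication
            # convex combination of the learned row and the identity row
            row.append((1 - progress) * attn[i][j] + progress * target)
        pruned.append(row)
    return pruned
```

Because each output row is a convex combination of two row-stochastic rows, the rows stay stochastic throughout the schedule, which is why the anneal can be applied gradually without breaking the attention normalization.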