Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning
Social psychology and real-world experience show that cognitive consistency
plays an important role in keeping human society in order: if people hold a
more consistent cognition about their environment, they are more likely to
achieve better cooperation. Meanwhile, only cognitive consistency within a
neighborhood matters, because humans interact directly only with their neighbors. Inspired by
these observations, we take the first step to introduce \emph{neighborhood
cognitive consistency} (NCC) into multi-agent reinforcement learning (MARL).
Our NCC design is quite general and can be easily combined with existing MARL
methods. As examples, we propose neighborhood cognition consistent deep
Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation.
Extensive experiments on several challenging tasks (i.e., packet routing, Wi-Fi
configuration, and Google football player control) justify the superior
performance of our methods compared with state-of-the-art MARL approaches.
Comment: Accepted by AAAI2020 with oral presentation
(https://aaai.org/Conferences/AAAI-20/wp-content/uploads/2020/01/AAAI-20-Accepted-Paper-List.pdf).
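As an illustration of the consistency idea above, a toy neighborhood-consistency penalty can be sketched in plain Python. The quadratic penalty toward the neighborhood mean and all names below are assumptions for illustration, not the paper's actual NCC objective.

```python
def ncc_loss(cognitions, neighbors):
    """Toy neighborhood cognitive consistency penalty.

    cognitions: list of cognition vectors, one per agent
    neighbors:  neighbors[i] is the list of agent indices in agent i's
                neighborhood (assumed to include i itself)
    Returns the mean squared distance between each agent's cognition
    vector and the average cognition of its neighborhood.
    """
    loss = 0.0
    for i, cog in enumerate(cognitions):
        nbrs = neighbors[i]
        # neighborhood-mean cognition, computed dimension by dimension
        mean = [sum(cognitions[j][k] for j in nbrs) / len(nbrs)
                for k in range(len(cog))]
        loss += sum((a - b) ** 2 for a, b in zip(cog, mean))
    return loss / len(cognitions)
```

A penalty like this would be added to the usual RL objective, so that minimizing it pulls neighboring agents toward consistent cognition while agents outside the neighborhood are unaffected.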
Learning Agent Communication under Limited Bandwidth by Message Pruning
Communication is crucial for a large multi-agent world to stay organized and
productive. Recently, Deep Reinforcement Learning (DRL) has been
applied to learn the communication strategy and the control policy for multiple
agents. However, the practical \emph{\textbf{limited bandwidth}} in multi-agent
communication has been largely ignored by the existing DRL methods.
Specifically, many methods keep sending messages incessantly, which consumes
too much bandwidth. As a result, they are inapplicable to multi-agent systems
with limited bandwidth. To handle this problem, we propose a gating mechanism
to adaptively prune less beneficial messages. We evaluate the gating mechanism
on several tasks. Experiments demonstrate that it can prune a lot of messages
with little impact on performance. In fact, the performance may be greatly
improved by pruning redundant messages. Moreover, the proposed gating mechanism
is applicable to several previous methods, equipping them with the ability to
address bandwidth-restricted settings.
Comment: accepted as a regular paper with poster presentation @ AAAI20.
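The pruning idea can be sketched as a simple threshold gate over candidate messages. The linear scoring function, the sigmoid, and the fixed threshold below are illustrative assumptions; in the paper the gate is learned end-to-end together with the policy.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_messages(messages, weights, threshold=0.5):
    """Keep only messages whose gate score exceeds the threshold.

    messages:  list of message feature vectors, one per sending agent
    weights:   parameters of a hypothetical linear gate (one per feature)
    threshold: messages scoring below this are pruned, saving bandwidth
    """
    kept = []
    for m in messages:
        score = sigmoid(sum(w * x for w, x in zip(weights, m)))
        if score >= threshold:  # send only sufficiently valuable messages
            kept.append(m)
    return kept
```

With a learned gate, the score would estimate how much a message improves the receivers' decisions, so pruning low-scoring messages trades little performance for large bandwidth savings.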
Opponent Modelling in Multi-Agent Systems
Reinforcement Learning (RL) formalises a problem where an intelligent agent needs to learn and achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments because of adaptive opponents. Partial observation, caused by agents' differing private observations, introduces high variance during training, which exacerbates data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to poor performance against another set. Non-stationarity, partial observation and unclear learning objectives are three critical problems in MARL that hinder agents' learning, and they share a common cause: the lack of knowledge about the other agents. Therefore, in this thesis, we propose to solve these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of the problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate non-stationarity in cooperative games. Then we study the partial observation problem caused by agents' private observations and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear learning objective problems in zero-sum games. We propose a solution named EPSOM, which aims to find safe exploitation strategies to play against non-stationary opponents. We verify our proposed methods through varied experiments and show that they achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.
Learning to communicate in cooperative multi-agent reinforcement learning
Recent advances in deep reinforcement learning have produced unprecedented results. The success obtained in single-agent applications has led to exploring these techniques in the context of multi-agent systems, where several additional challenges must be considered. Communication has always been crucial to achieving cooperation in multi-agent domains, and learning to communicate represents a fundamental milestone for multi-agent reinforcement learning algorithms. In this thesis, different multi-agent reinforcement learning approaches are explored. These provide architectures that are learned end-to-end and capable of achieving effective communication protocols that can boost system performance in cooperative settings. Firstly, we investigate a novel approach where inter-agent communication happens through a shared memory device that the agents can use to exchange messages through learnable read and write operations. Secondly, we propose a graph-based approach where connectivities are shaped by exchanging pairwise messages, which are then aggregated through a novel form of attention mechanism based on a graph diffusion model. Finally, we present a new set of environments with real-world-inspired constraints that we use to benchmark the most recent state-of-the-art solutions. Our results show that communication can be a fundamental tool to overcome some of the intrinsic difficulties that characterise cooperative multi-agent systems.
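The pairwise-message aggregation described in the second approach can be sketched with plain softmax attention. The dot-product scoring and the function names below are assumptions for illustration; the thesis itself uses a graph-diffusion-based attention mechanism rather than this simplified form.

```python
import math

def aggregate_messages(queries, messages):
    """Softmax-attention aggregation of messages into one vector per agent.

    queries:  one query vector per receiving agent
    messages: one message vector per sending agent
    Each receiver scores every message by dot product with its query,
    softmaxes the scores, and returns the weighted average message.
    """
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) for m in messages]
        mx = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - mx) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        agg = [sum(w * m[k] for w, m in zip(weights, messages))
               for k in range(len(messages[0]))]
        out.append(agg)
    return out
```

In a graph-based variant, the sum would run only over each agent's graph neighbors, so the connectivity structure shapes which messages can influence which agents.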
Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?
Centralized Training with Decentralized Execution (CTDE) has recently emerged
as a popular framework for cooperative Multi-Agent Reinforcement Learning
(MARL), where agents can use additional global state information to guide
training in a centralized way and make their own decisions only based on
decentralized local policies. Despite the encouraging results achieved, CTDE
makes an independence assumption on agent policies, which prevents agents from
adopting global cooperative information from each other during centralized
training. Therefore, we argue that existing CTDE methods cannot fully utilize
global information for training, leading to inefficient joint-policy
exploration and even suboptimal results. In this paper, we introduce a novel
Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent
reinforcement learning that not only enables efficacious message exchange
among agents during training but also guarantees independent policies for
execution. First, CADP endows agents with an explicit communication channel to
seek and take advice from other agents for more centralized training. To
further ensure decentralized execution, we propose a smooth model pruning
mechanism that progressively constrains agent communication to a closed,
self-contained form without degrading agents' cooperation capability. Empirical
evaluations on
StarCraft II micromanagement and Google Research Football benchmarks
demonstrate that the proposed framework achieves superior performance compared
with the state-of-the-art counterparts. Our code will be made publicly
available.
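The smooth pruning idea (annealing cross-agent influence toward a self-only pattern so that execution needs no communication) can be sketched as a linear schedule over an attention matrix. The schedule and names below are illustrative assumptions, not CADP's actual pruning mechanism.

```python
def prune_attention(attn, progress):
    """Anneal cross-agent attention toward a self-only (identity) pattern.

    attn:     n x n row-stochastic matrix; attn[i][j] is how much agent i
              attends to agent j's advice during centralized training
    progress: training progress in [0, 1]; at 1.0 each agent attends
              only to itself, so execution is fully decentralized
    """
    n = len(attn)
    pruned = []
    for i in range(n):
        row = []
        for j in range(n):
            target = 1.0 if i == j else 0.0  # identity = no communication
            # convex combination of the learned row and the identity row
            row.append((1 - progress) * attn[i][j] + progress * target)
        pruned.append(row)
    return pruned
```

Because each output row is a convex combination of two row-stochastic rows, the rows stay stochastic throughout the schedule, which is why the anneal can be applied gradually without breaking the attention normalization.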