30 research outputs found

    Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

    Social psychology and real experiences show that cognitive consistency plays an important role in keeping human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans only interact directly with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation. Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration, and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches. Comment: Accepted by AAAI2020 with oral presentation (https://aaai.org/Conferences/AAAI-20/wp-content/uploads/2020/01/AAAI-20-Accepted-Paper-List.pdf). Since AAAI2020 has started, I have the right to distribute this paper on arXi

    Learning Agent Communication under Limited Bandwidth by Message Pruning

    Communication is a crucial factor for the big multi-agent world to stay organized and productive. Recently, Deep Reinforcement Learning (DRL) has been applied to learn the communication strategy and the control policy for multiple agents. However, the practical limited bandwidth in multi-agent communication has been largely ignored by the existing DRL methods. Specifically, many methods keep sending messages incessantly, which consumes too much bandwidth. As a result, they are inapplicable to multi-agent systems with limited bandwidth. To handle this problem, we propose a gating mechanism to adaptively prune less beneficial messages. We evaluate the gating mechanism on several tasks. Experiments demonstrate that it can prune a lot of messages with little impact on performance. In fact, the performance may be greatly improved by pruning redundant messages. Moreover, the proposed gating mechanism is applicable to several previous methods, equipping them with the ability to address bandwidth-restricted settings. Comment: accepted as a regular paper with poster presentation @ AAAI20. arXiv admin note: text overlap with arXiv:1903.0556

    Opponent Modelling in Multi-Agent Systems

    Reinforcement Learning (RL) formalises a problem where an intelligent agent needs to learn and achieve certain goals by maximising a long-term return in an environment. Multi-agent reinforcement learning (MARL) extends traditional RL to multiple agents. Many RL algorithms lose their convergence guarantees in non-stationary environments due to adaptive opponents. Partial observation caused by agents’ different private observations introduces high variance during training, which exacerbates the data inefficiency. In MARL, training an agent to perform well against one set of opponents often leads to bad performance against another set of opponents. Non-stationarity, partial observation and an unclear learning objective are three critical problems in MARL that hinder agents’ learning, and they all share a cause: the lack of knowledge of the other agents. Therefore, in this thesis, we propose to solve these problems with opponent modelling methods. We tailor our solutions by combining opponent modelling with other techniques according to the characteristics of the problems we face. Specifically, we first propose ROMMEO, an algorithm inspired by Bayesian inference, as a solution to alleviate the non-stationarity in cooperative games. Then we study the partial observation problem caused by agents’ private observations and design an implicit communication training method named PBL. Lastly, we investigate solutions to the non-stationarity and unclear learning objective problems in zero-sum games. We propose a solution named EPSOM, which aims to find safe exploitation strategies to play against non-stationary opponents. We verify our proposed methods through varied experiments and show they can achieve the desired performance. Limitations and future work are discussed in the last chapter of this thesis.

    Multi-agent reinforcement learning for character control


    Learning to communicate in cooperative multi-agent reinforcement learning

    Recent advances in deep reinforcement learning have produced unprecedented results. The success obtained on single-agent applications led to exploring these techniques in the context of multi-agent systems where several additional challenges need to be considered. Communication has always been crucial to achieving cooperation in multi-agent domains and learning to communicate represents a fundamental milestone for multi-agent reinforcement learning algorithms. In this thesis, different multi-agent reinforcement learning approaches are explored. These provide architectures that are learned end-to-end and capable of achieving effective communication protocols that can boost the system performance in cooperative settings. Firstly, we investigate a novel approach where intra-agent communication happens through a shared memory device that can be used by the agents to exchange messages through learnable read and write operations. Secondly, we propose a graph-based approach where connectivities are shaped by exchanging pairwise messages which are then aggregated through a novel form of attention mechanism based on a graph diffusion model. Finally, we present a new set of environments with real-world inspired constraints that we utilise to benchmark the most recent state-of-the-art solutions. Our results show that communication can be a fundamental tool to overcome some of the intrinsic difficulties that characterise cooperative multi-agent systems.

    Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

    Centralized Training with Decentralized Execution (CTDE) has recently emerged as a popular framework for cooperative Multi-Agent Reinforcement Learning (MARL), where agents can use additional global state information to guide training in a centralized way and make their own decisions based only on decentralized local policies. Despite the encouraging results achieved, CTDE makes an independence assumption on agent policies, which prevents agents from adopting global cooperative information from each other during centralized training. Therefore, we argue that existing CTDE methods cannot fully utilize global information for training, leading to inefficient joint-policy exploration and even suboptimal results. In this paper, we introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning that not only enables an efficacious message exchange among agents during training but also guarantees independent policies for execution. Firstly, CADP endows agents with an explicit communication channel to seek and take advice from different agents for more centralized training. To further ensure decentralized execution, we propose a smooth model pruning mechanism to progressively constrain the agent communication into a closed one without degradation in agent cooperation capability. Empirical evaluations on StarCraft II micromanagement and Google Research Football benchmarks demonstrate that the proposed framework achieves superior performance compared with state-of-the-art counterparts. Our code will be made publicly available.