355 research outputs found

    Counterfactual Multi-Agent Policy Gradients

    Cooperative multi-agent systems can naturally model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best-performing agents are competitive with state-of-the-art centralised controllers that have access to the full state.
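    As a rough illustration of the counterfactual baseline described in this abstract, the sketch below computes the advantage for a single agent by marginalising its own action out of the critic's Q-values. All names and tensor shapes are our own assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

# q_values: critic output for agent a, one Q-value per candidate action of
#           agent a with the other agents' actions held fixed -- (batch, n_actions).
# logits:   agent a's policy logits over its own actions      -- (batch, n_actions).
# actions:  the action agent a actually took                  -- (batch,).
def counterfactual_advantage(q_values, logits, actions):
    pi = F.softmax(logits, dim=-1)                 # pi^a(u^a | tau^a)
    baseline = (pi * q_values).sum(dim=-1)         # marginalise out agent a's action
    q_taken = q_values.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return q_taken - baseline                      # A^a(s, u) = Q(s, u) - b(s, u^{-a})
```

    Because the critic returns Q-values for all of agent a's candidate actions at once, the baseline is a single dot product, which is what makes the one-forward-pass computation possible.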

    Deep Multi-Critic Network for accelerating Policy Learning in multi-agent environments

    Humans live among other humans, not in isolation. The ability to learn and behave in multi-agent environments is therefore essential for any autonomous system that intends to interact with people. Due to the presence of multiple simultaneous learners in a multi-agent learning environment, the Markov assumption used for single-agent environments is not tenable, necessitating the development of new Policy Learning algorithms. Recent Actor-Critic algorithms proposed for multi-agent environments, such as Multi-Agent Deep Deterministic Policy Gradients and Counterfactual Multi-Agent Policy Gradients, reuse the mathematical framework of single-agent environments by augmenting the Critic with extra information. However, this extra information can slow down the learning process and afflict the Critic with the curse of dimensionality. To combat this, we propose a novel Deep Neural Network configuration called the Deep Multi-Critic Network. This architecture takes a weighted sum over the outputs of multiple critic networks of varying complexity and size. The configuration was tested on data collected from a real-world multi-agent environment. The results illustrate that with the Deep Multi-Critic Network, less data is needed to reach the same level of performance as without it. Because the configuration learns from less data, the Critic may learn Q-values faster, accelerating Actor training as well.
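    A minimal sketch of the architecture described here, assuming the "varying complexity and size" refers to critics of different widths and that the mixing weights are learned. The class and parameter names are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class DeepMultiCriticNetwork(nn.Module):
    """Ensemble of critics of varying size, combined by a learned weighted sum."""

    def __init__(self, obs_dim, n_actions, hidden_sizes=(32, 64, 128)):
        super().__init__()
        # one Q-network per hidden size -- smaller nets learn fast, larger
        # nets capture more detail (an assumption about the design intent)
        self.critics = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, h), nn.ReLU(), nn.Linear(h, n_actions))
            for h in hidden_sizes
        ])
        # learnable mixing weights, normalised with a softmax in forward()
        self.mix = nn.Parameter(torch.zeros(len(hidden_sizes)))

    def forward(self, obs):
        qs = torch.stack([c(obs) for c in self.critics])  # (n_critics, batch, n_actions)
        w = torch.softmax(self.mix, dim=0)                # (n_critics,)
        return torch.einsum('c,cba->ba', w, qs)           # weighted sum of critic outputs
```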

    Measuring collaborative emergent behavior in multi-agent reinforcement learning

    Multi-agent reinforcement learning (RL) has important implications for the future of human-agent teaming. We show that improved performance with multi-agent RL does not guarantee the collaborative behavior thought to be important for solving multi-agent tasks. To address this, we present a novel approach for quantitatively assessing collaboration in continuous spatial tasks with multi-agent RL. Such a metric is useful for measuring collaboration between computational agents and may serve as a training signal for collaboration in future RL paradigms involving humans. Comment: 1st International Conference on Human Systems Engineering and Design, 6 pages, 2 figures, 1 table

    Survey of Recent Multi-Agent Reinforcement Learning Algorithms Utilizing Centralized Training

    Much work has been dedicated to exploring Multi-Agent Reinforcement Learning (MARL) paradigms that implement a centralized learning with decentralized execution (CLDE) approach to achieve human-like collaboration in cooperative tasks. Here, we discuss variations of centralized training and describe a recent survey of algorithmic approaches. The goal is to explore how different implementations of the information-sharing mechanism in centralized learning may give rise to distinct group coordinated behaviors in multi-agent systems performing cooperative tasks. Comment: This article appeared in the news at: https://www.army.mil/article/247261/army_researchers_develop_innovative_framework_for_training_a
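    To make the CLDE split concrete, here is an illustrative skeleton of the idea, under the common interpretation that a centralized critic sees the joint observations and actions during training while each actor acts only on its local observation at execution time. The class names and sizes are assumptions, not drawn from any surveyed algorithm.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralised actor: conditions only on its own local observation."""

    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, local_obs):
        # usable at execution time with no shared information
        return torch.distributions.Categorical(logits=self.net(local_obs))

class CentralCritic(nn.Module):
    """Centralised critic: used only during training, sees all agents."""

    def __init__(self, n_agents, obs_dim, n_actions):
        super().__init__()
        joint_dim = n_agents * (obs_dim + n_actions)
        self.net = nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, joint_obs_actions):
        # scores the joint behaviour; discarded once training is done
        return self.net(joint_obs_actions)
```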