Survey of Recent Multi-Agent Reinforcement Learning Algorithms Utilizing Centralized Training
Much work has been dedicated to the exploration of Multi-Agent Reinforcement
Learning (MARL) paradigms implementing a centralized learning with
decentralized execution (CLDE) approach to achieve human-like collaboration in
cooperative tasks. Here, we discuss variations of centralized training and
describe a recent survey of algorithmic approaches. The goal is to explore how
different implementations of the information-sharing mechanism in centralized
learning may give rise to distinct coordinated group behaviors in multi-agent
systems performing cooperative tasks.
Comment: This article appeared in the news at:
https://www.army.mil/article/247261/army_researchers_develop_innovative_framework_for_training_a
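
To make the CLDE idea concrete, here is a minimal, hypothetical sketch (not tied to any surveyed algorithm): during training, a centralized critic consumes the joint observations and actions of all agents, while each actor conditions only on its own local observation, so execution can remain fully decentralized. All network names and sizes are illustrative assumptions.

```python
# Minimal CLDE/CTDE sketch (illustrative): a shared centralized critic
# scores the joint observation-action pair at training time, while each
# agent's actor uses only its local observation at execution time.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim))

    def forward(self, obs):                    # local observation only
        return torch.softmax(self.net(obs), dim=-1)

class CentralCritic(nn.Module):
    def __init__(self, joint_obs_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(joint_obs_dim + joint_act_dim, 128),
                                 nn.ReLU(), nn.Linear(128, 1))

    def forward(self, joint_obs, joint_act):   # global info, training only
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

n_agents, obs_dim, act_dim = 3, 8, 4
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critic = CentralCritic(n_agents * obs_dim, n_agents * act_dim)

obs = torch.randn(n_agents, obs_dim)
acts = torch.stack([a(o) for a, o in zip(actors, obs)])  # decentralized
value = critic(obs.flatten(), acts.flatten())            # centralized
print(float(value))
```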
Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning
Learning to collaborate has witnessed significant progress in multi-agent
reinforcement learning (MARL). However, promoting coordination among agents and
enhancing exploration capabilities remain challenges. In multi-agent
environments, interactions between agents are limited in specific situations.
Effective collaboration between agents thus requires a nuanced understanding of
when and how agents' actions influence others. To this end, in this paper, we
propose a novel MARL algorithm named Situation-Dependent Causal Influence-Based
Cooperative Multi-agent Reinforcement Learning (SCIC), which incorporates an
intrinsic reward mechanism based on a new cooperation criterion:
situation-dependent causal influence among agents. Our approach detects
inter-agent causal influence in specific situations using causal intervention
and conditional mutual information. This effectively assists agents in
exploring states that can positively impact other agents, thus promoting
cooperation. The resulting update links coordinated exploration to intrinsic
reward distribution, which enhances overall collaboration and performance.
Experimental results on various MARL benchmarks demonstrate the superiority of
our method over state-of-the-art approaches.
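
As an illustration of the kind of causal-influence intrinsic reward the abstract describes, the following sketch (our assumption, not the authors' code) scores agent i's action by the KL divergence between a transition model's prediction of agent j's next state given a_i and the same prediction with a_i intervened on and marginalized out. The `model` and `influence_reward` names are hypothetical.

```python
# Hedged sketch of a causal-influence intrinsic reward: the influence of
# agent i's action on agent j is approximated by KL(p(. | s, a_i) ||
# p(. | s)) where the second term marginalizes over counterfactual a_i
# (a simple causal intervention on agent i's action).
import numpy as np

def influence_reward(model, state, a_i, action_space_i):
    """model(state, a) -> categorical dist. over agent j's next state."""
    p_with = model(state, a_i)
    # Intervene: replace a_i with each counterfactual action, then average.
    p_marginal = np.mean([model(state, a) for a in action_space_i], axis=0)
    # KL divergence: large when a_i causally shifts agent j's outcome.
    return float(np.sum(p_with * np.log(p_with / (p_marginal + 1e-8) + 1e-8)))

# Toy transition model over 3 next-state outcomes for agent j.
rng = np.random.default_rng(0)
table = {a: rng.dirichlet(np.ones(3)) for a in range(4)}
model = lambda s, a: table[a]
print(influence_reward(model, state=None, a_i=2, action_space_i=range(4)))
```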
Multi-agent Deep Covering Option Discovery
The use of options can greatly accelerate exploration in reinforcement
learning, especially when only sparse reward signals are available. While
option discovery methods have been proposed for individual agents, in
multi-agent reinforcement learning settings, discovering collaborative options
that can coordinate the behavior of multiple agents and encourage them to visit
the under-explored regions of their joint state space has not been considered.
To fill this gap, we propose Multi-agent Deep Covering Option Discovery, which
constructs multi-agent options by minimizing the expected cover time of the
agents' joint state space. We also propose a novel framework for adopting these
multi-agent options in the MARL process. In practice, a
multi-agent task can usually be divided into some sub-tasks, each of which can
be completed by a sub-group of the agents. Therefore, our algorithm framework
first leverages an attention mechanism to find collaborative agent sub-groups
that would benefit most from coordinated actions. Then, a hierarchical
algorithm, namely HA-MSAC, is developed to learn the multi-agent options for
each sub-group to complete their sub-tasks first, and then to integrate them
through a high-level policy as the solution of the whole task. This
hierarchical option construction allows our framework to strike a balance
between scalability and effective collaboration among the agents. Evaluation on
multi-agent collaborative tasks shows that the proposed algorithm effectively
captures agent interactions with the attention mechanism, successfully
identifies multi-agent options, and significantly outperforms prior works using
single-agent options or no options, in terms of both faster exploration and
higher task rewards.
Comment: This paper was presented in part at the ICML Reinforcement Learning
for Real Life Workshop, July 202
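
For intuition, covering options are typically built from the Fiedler vector (the graph Laplacian's second-smallest eigenvector) of the state-transition graph, connecting the two most poorly connected regions. The sketch below applies that standard construction to a toy joint state space; it is illustrative only and assumes a tabular joint state space with known adjacency, which the deep, attention-based method above does not require.

```python
# Illustrative covering-option construction on a small joint-state graph:
# the option's initiation and termination sets sit at the extremes of the
# Fiedler vector, so executing it reduces the expected cover time.
import numpy as np

A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)  # toy joint-state adjacency
L = np.diag(A.sum(1)) - A                     # graph Laplacian
_, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
fiedler = vecs[:, 1]                          # second-smallest eigenvector

init_set = {int(np.argmin(fiedler))}          # option starts in one extreme
term_set = {int(np.argmax(fiedler))}          # and terminates in the other
print("initiate at", init_set, "terminate at", term_set)
```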
Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning
In multi-agent reinforcement learning, discovering successful collective
behaviors is challenging as it requires exploring a joint action space that
grows exponentially with the number of agents. While the tractability of
independent agent-wise exploration is appealing, this approach fails on tasks
that require elaborate group strategies. We argue that coordinating the agents'
policies can guide their exploration and we investigate techniques to promote
such an inductive bias. We propose two policy regularization methods: TeamReg,
which is based on inter-agent action predictability, and CoachReg, which relies
on synchronized behavior selection. We evaluate each approach on four challenging
continuous control tasks with sparse rewards that require varying levels of
coordination as well as on the discrete action Google Research Football
environment. Our experiments show improved performance across many cooperative
multi-agent problems. Finally, we analyze the effects of our proposed methods
on the policies that our agents learn and show that our methods successfully
enforce the qualities that we propose as proxies for coordinated behaviors.
Comment: 23 pages, 16 figures. This revised version contains additional
results and minor edit
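
A minimal sketch in the spirit of TeamReg's predictability objective (our assumption, not the authors' implementation): agent i trains a model of teammate j's action distribution, and the resulting cross-entropy is added to the policy loss so that minimizing it both makes j more predictable and makes i a better predictor.

```python
# Hypothetical predictability regularizer: penalize disagreement between
# teammate j's policy and agent i's prediction of j's action.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 4
policy_j = nn.Linear(obs_dim, act_dim)       # teammate j's policy (logits)
predictor_i = nn.Linear(obs_dim, act_dim)    # agent i's model of j's action

obs_i, obs_j = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
probs_j = torch.softmax(policy_j(obs_j), dim=-1)
log_pred = torch.log_softmax(predictor_i(obs_i), dim=-1)

# Cross-entropy between j's policy and i's prediction of it; minimizing it
# w.r.t. both networks pushes j toward predictable behavior.
predictability_loss = -(probs_j * log_pred).sum(-1).mean()

rl_loss = torch.tensor(0.0)      # stand-in for the usual policy-gradient loss
lambda_pred = 0.1                # regularization strength (hyperparameter)
total_loss = rl_loss + lambda_pred * predictability_loss
print(float(total_loss))
```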
Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement Learning
Multi-agent deep reinforcement learning (MADRL) problems often encounter the
challenge of sparse rewards. This challenge becomes even more pronounced when
coordination among agents is necessary. As performance depends not only on one
agent's behavior but rather on the joint behavior of multiple agents, finding
an adequate solution becomes significantly harder. In this context, a group of
agents can benefit from actively exploring different joint strategies in order
to determine the most efficient one. In this paper, we propose an approach for
rewarding strategies where agents collectively exhibit novel behaviors. We
present JIM (Joint Intrinsic Motivation), a multi-agent intrinsic motivation
method that follows the centralized learning with decentralized execution
paradigm. JIM rewards joint trajectories based on a centralized measure of
novelty designed to function in continuous environments. We demonstrate the
strengths of this approach both in a synthetic environment designed to reveal
shortcomings of state-of-the-art MADRL methods, and in simulated robotic tasks.
Results show that joint exploration is crucial for solving tasks where the
optimal strategy requires a high level of coordination.
Comment: 13 pages, 13 figures. Published as an extended abstract at AAMAS 202
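
One way to instantiate a centralized novelty measure for continuous joint states is random network distillation (RND) over the concatenated observations; the hedged sketch below uses that stand-in, though JIM's actual novelty measure may differ. Names and dimensions are illustrative.

```python
# Centralized novelty bonus over the JOINT observation, sketched with RND:
# a predictor is trained to match a fixed random target network, so its
# prediction error is high on novel joint states and shrinks on revisits.
import torch
import torch.nn as nn

joint_dim = 3 * 8                              # e.g., 3 agents x 8-dim obs
target = nn.Linear(joint_dim, 16)              # fixed random network
predictor = nn.Linear(joint_dim, 16)           # trained to match the target
for p in target.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

def joint_intrinsic_reward(joint_obs):
    err = (predictor(joint_obs) - target(joint_obs)).pow(2).mean(-1)
    opt.zero_grad(); err.mean().backward(); opt.step()  # one update step
    return err.detach()                        # high for novel joint states

print(joint_intrinsic_reward(torch.randn(4, joint_dim)))
```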
LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning
Efficient exploration is important for reinforcement learners to achieve high
rewards. In multi-agent systems, coordinated exploration and behaviour is
critical for agents to jointly achieve optimal outcomes. In this paper, we
introduce a new general framework for improving the coordination and
performance of multi-agent reinforcement learning (MARL) agents. Our framework,
the Learnable Intrinsic-Reward Generation Selection algorithm (LIGS),
introduces an adaptive learner, the Generator, which observes the agents and
learns to construct intrinsic rewards online that coordinate the agents' joint
exploration and joint behaviour. Using a novel combination of MARL and
switching controls, LIGS determines the best states at which to add intrinsic
rewards, leading to a highly efficient learning process. LIGS can subdivide
complex tasks, making them easier to solve, and enables systems of MARL agents
to quickly solve environments with sparse rewards. LIGS can seamlessly adopt
existing MARL algorithms, and our theory shows that it ensures convergence to
policies that
deliver higher system performance. We demonstrate its superior performance in
challenging tasks in Foraging and StarCraft II.
Comment: arXiv admin note: text overlap with arXiv:2103.0915
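
The following toy sketch (hypothetical names, not the paper's architecture) captures the switching-control flavor: a Generator maps a state to a binary switch deciding whether to inject an intrinsic reward there and to that reward's magnitude, and the shaped reward r_ext + r_int is what the MARL learners then see.

```python
# Toy Generator with a switching control: intrinsic reward is injected
# only at states where the learned gate fires (hard threshold here for
# simplicity; a real method would need a differentiable or RL-trained gate).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, state_dim):
        super().__init__()
        self.gate = nn.Linear(state_dim, 1)    # switching control
        self.value = nn.Linear(state_dim, 1)   # intrinsic-reward magnitude

    def forward(self, state):
        switch = (torch.sigmoid(self.gate(state)) > 0.5).float()
        return switch * self.value(state)      # zero where the switch is off

gen = Generator(state_dim=8)
state = torch.randn(5, 8)
r_ext = torch.randn(5, 1)
r_total = r_ext + gen(state)                   # shaped reward for MARL update
print(r_total.squeeze(-1))
```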
Cost Adaptation for Robust Decentralized Swarm Behaviour
Decentralized receding horizon control (D-RHC) provides a mechanism for
coordination in multi-agent settings without a centralized command center.
However, combining a set of different goals, costs, and constraints to form an
efficient optimization objective for D-RHC can be difficult. To alleviate this
problem, we use a meta-learning process -- cost adaptation -- which generates
the optimization objective for D-RHC to solve based on a set of human-generated
priors (cost and constraint functions) and an auxiliary heuristic. We use this
adaptive D-RHC method for control of mesh-networked swarm agents. This
formulation allows a wide range of tasks to be encoded and can account for
network delays, heterogeneous capabilities, and increasingly large swarms
through the adaptation mechanism. We leverage the Unity3D game engine to build
a simulator capable of introducing artificial networking failures and delays in
the swarm. Using the simulator we validate our method on an example coordinated
exploration task. We demonstrate that cost adaptation allows for more efficient
and safer task completion under varying environment conditions and increasingly
large swarm sizes. We release our simulator and code to the community for
future work.
Comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), 201
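
A rough sketch of cost adaptation under stated assumptions (not the released simulator code): a set of human-given cost priors is combined by adaptive weights into the D-RHC objective, one receding-horizon step is solved by sampling candidate actions, and the weights are then nudged by an auxiliary heuristic signal. The adaptation rule shown is deliberately simplified.

```python
# Simplified cost-adaptation loop: weighted prior costs form the D-RHC
# objective; a sampled candidate action minimizes it, and the weights are
# then adjusted using an auxiliary heuristic signal.
import numpy as np

priors = [lambda x: float(np.sum(x**2)),        # prior: stay near origin
          lambda x: float(abs(x[0] - x[1]))]    # prior: keep coords aligned
w = np.ones(len(priors))                        # adaptive weights

def composed_cost(x):
    return sum(wi * c(x) for wi, c in zip(w, priors))

def adapt(x, heuristic_score, lr=0.1):
    # Simplified rule: shrink the weight of each prior in proportion to its
    # cost at x, scaled by the heuristic signal (weights stay non-negative).
    global w
    grads = np.array([c(x) for c in priors])
    w = np.maximum(w - lr * heuristic_score * grads, 0.0)

# One receding-horizon step: pick the best of the sampled candidate actions.
rng = np.random.default_rng(1)
candidates = rng.normal(size=(64, 2))
best = min(candidates, key=composed_cost)
adapt(best, heuristic_score=0.5)
print(best, w)
```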