Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning
In multi-agent reinforcement learning, discovering successful collective
behaviors is challenging as it requires exploring a joint action space that
grows exponentially with the number of agents. While the tractability of
independent agent-wise exploration is appealing, this approach fails on tasks
that require elaborate group strategies. We argue that coordinating the agents'
policies can guide their exploration and we investigate techniques to promote
such an inductive bias. We propose two policy regularization methods: TeamReg,
which is based on inter-agent action predictability, and CoachReg, which relies on
synchronized behavior selection. We evaluate each approach on four challenging
continuous control tasks with sparse rewards that require varying levels of
coordination as well as on the discrete-action Google Research Football
environment. Our experiments show improved performance across many cooperative
multi-agent problems. Finally, we analyze the effects of our proposed methods
on the policies that our agents learn and show that our methods successfully
enforce the qualities that we propose as proxies for coordinated behaviors.
Comment: 23 pages, 16 figures. This revised version contains additional
results and minor edits.
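As a rough illustration of regularizing on inter-agent action predictability, the sketch below adds a penalty to a policy loss when one agent's prediction of a teammate's action is far from the action actually taken. The abstract does not give the TeamReg objective, so the mean-squared-error form, the `weight` parameter, and both function names are assumptions made for illustration only.

```python
def predictability_penalty(predicted, actual):
    """Mean squared error between predicted and actual teammate actions.

    Both arguments are lists of action vectors (lists of floats).
    """
    total, count = 0.0, 0
    for p_vec, a_vec in zip(predicted, actual):
        for p, a in zip(p_vec, a_vec):
            total += (p - a) ** 2
            count += 1
    return total / count if count else 0.0


def regularized_loss(policy_loss, predicted, actual, weight=0.1):
    """Policy loss plus a weighted predictability penalty (hypothetical form)."""
    return policy_loss + weight * predictability_penalty(predicted, actual)
```

A perfectly predictable teammate contributes no penalty, so the regularizer only changes the loss when behaviors diverge from what teammates expect.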
Learning to Coordinate with Anyone
In open multi-agent environments, the agents may encounter unexpected
teammates. Classical multi-agent learning approaches train agents that can only
coordinate with seen teammates. Recent studies attempted to generate diverse
teammates to enhance the generalizable coordination ability, but were
restricted by pre-defined teammates. In this work, our aim is to train agents
with strong coordination ability by generating teammates that fully cover the
teammate policy space, so that agents can coordinate with any teammate. Since
the teammate policy space is too large to enumerate, we search only for dissimilar
teammates that are incompatible with the controllable agents, which greatly reduces
the number of teammates the agents must train with. However, it is hard to
determine the number of such incompatible teammates beforehand. We therefore
introduce a continual multi-agent learning process, in which the agent learns
to coordinate with different teammates until no more incompatible teammates can
be found. The above idea is implemented in the proposed Macop (Multi-agent
compatible policy learning) algorithm. We conduct experiments in 8 scenarios
from 4 environments that have distinct coordination patterns. Experiments show
that Macop generates training teammates with much lower compatibility than
previous methods. As a result, in all scenarios Macop achieves the best overall
coordination ability while never performing significantly worse than the baselines,
showing strong generalization ability.
Multi-Agent Reinforcement Learning for the Low-Level Control of a Quadrotor UAV
This paper presents multi-agent reinforcement learning frameworks for the
low-level control of a quadrotor UAV. While single-agent reinforcement learning
has been successfully applied to quadrotors, training a single monolithic
network is often data-intensive and time-consuming. To address this, we
decompose the quadrotor dynamics into the translational dynamics and the yawing
dynamics, and assign a reinforcement learning agent to each part for efficient
training and performance improvements. The proposed multi-agent framework for
quadrotor low-level control that leverages the underlying structures of the
quadrotor dynamics is a unique contribution. Further, we introduce
regularization terms to mitigate steady-state errors and to avoid aggressive
control inputs. Through benchmark studies with sim-to-sim transfer, it is
illustrated that the proposed multi-agent reinforcement learning substantially
improves the convergence rate of the training and the stability of the
controlled dynamics.
Comment: 8 pages, 6 figures, 3 tables
Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization
In this paper, a novel Multi-agent Reinforcement Learning (MARL) approach,
Multi-Agent Continuous Dynamic Policy Gradient (MACDPP), is proposed to tackle
the issues of limited capability and sample efficiency in various scenarios
controlled by multiple agents. It alleviates the inconsistency of multiple
agents' policy updates by introducing the relative entropy regularization to
the Centralized Training with Decentralized Execution (CTDE) framework with the
Actor-Critic (AC) structure. Evaluated by multi-agent cooperation and
competition tasks and traditional control tasks including OpenAI benchmarks and
robot arm manipulation, MACDPP demonstrates significant superiority in learning
capability and sample efficiency compared with both related multi-agent and
widely implemented single-agent baselines and therefore expands the potential
of MARL in effectively learning challenging control scenarios.
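To make the idea of relative entropy regularization concrete, the following sketch penalizes an objective by the KL divergence between an updated policy and the previous one. It uses discrete action distributions for simplicity, whereas the abstract concerns continuous control; the function names and the `beta` coefficient are assumptions and do not reproduce the MACDPP update.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def kl_regularized_objective(expected_return, new_policy, old_policy, beta=1.0):
    """Expected return minus a penalty for moving away from the old policy."""
    return expected_return - beta * kl_divergence(new_policy, old_policy)
```

A larger `beta` keeps successive policies closer together, which is one common way to limit inconsistent updates.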
Reinforcement Learning-based Visual Navigation with Information-Theoretic Regularization
To enhance the cross-target and cross-scene generalization of target-driven
visual navigation based on deep reinforcement learning (RL), we introduce an
information-theoretic regularization term into the RL objective. The
regularization maximizes the mutual information between navigation actions and
visual observation transforms of an agent, thus promoting more informed
navigation decisions. This way, the agent models the action-observation
dynamics by learning a variational generative model. Based on the model, the
agent generates (imagines) the next observation from its current observation
and navigation target. In doing so, the agent learns to understand the causality
between navigation actions and the changes in its observations, which allows
the agent to predict the next action for navigation by comparing the current
and the imagined next observations. Cross-target and cross-scene evaluations on
the AI2-THOR framework show that our method attains at least a
improvement of average success rate over some state-of-the-art models. We
further evaluate our model in two real-world settings: navigation in unseen
indoor scenes from a discrete Active Vision Dataset (AVD) and continuous
real-world environments with a TurtleBot. We demonstrate that our navigation
model is able to successfully achieve navigation tasks in these scenarios.
Videos and models can be found in the supplementary material.
Comment: 11 pages, corresponding authors: Kai Xu and Jun Wang
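For readers unfamiliar with the quantity being maximized, the sketch below computes the mutual information between two discrete variables from their joint probability table. It illustrates the definition only: the abstract describes a variational generative model over visual observations, which this toy calculation does not implement, and the function name is an assumption.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) from a joint probability table.

    joint[i][j] is P(action = i, observation_change = j).
    """
    row_marginals = [sum(row) for row in joint]
    col_marginals = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (row_marginals[i] * col_marginals[j]))
    return mi
```

When actions and observation changes are independent the value is zero; it grows as actions become more informative about what the agent sees next.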
Human-Inspired Multi-Agent Navigation using Knowledge Distillation
Despite significant advancements in the field of multi-agent navigation,
agents still lack the sophistication and intelligence that humans exhibit in
multi-agent settings. In this paper, we propose a framework for learning a
human-like general collision avoidance policy for agent-agent interactions in
fully decentralized, multi-agent environments. Our approach uses knowledge
distillation with reinforcement learning to shape the reward function based on
expert policies extracted from human trajectory demonstrations through behavior
cloning. We show that agents trained with our approach can take human-like
trajectories in collision avoidance and goal-directed steering tasks not
provided by the demonstrations, outperforming the experts as well as
learning-based agents trained without knowledge distillation.
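One simple way to picture reward shaping from a distilled expert policy is to subtract a penalty proportional to how far the agent's action is from the expert's suggested action. The abstract does not specify the shaping term, so the squared-distance form, the `weight` parameter, and the function name below are assumptions for illustration.

```python
def shaped_reward(env_reward, agent_action, expert_action, weight=0.5):
    """Environment reward minus a penalty for deviating from the expert action.

    Actions are lists of floats, e.g. a 2D velocity command.
    """
    deviation = sum((a - e) ** 2 for a, e in zip(agent_action, expert_action))
    return env_reward - weight * deviation
```

With `weight` set to zero the agent falls back to the plain environment reward, so the coefficient controls how strongly the human-derived expert influences learning.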