Learning to Communicate with Deep Multi-Agent Reinforcement Learning
We consider the problem of multiple agents sensing and acting in environments
with the goal of maximising their shared utility. In these environments, agents
must learn communication protocols in order to share information that is needed
to solve the tasks. By embracing deep neural networks, we are able to
demonstrate end-to-end learning of protocols in complex environments inspired
by communication riddles and multi-agent computer vision problems with partial
observability. We propose two approaches for learning in these domains:
Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning
(DIAL). The former uses deep Q-learning, while the latter exploits the fact
that, during learning, agents can backpropagate error derivatives through
(noisy) communication channels. Hence, this approach uses centralised learning
but decentralised execution. Our experiments introduce new environments for
studying the learning of communication protocols and present a set of
engineering innovations that are essential for success in these domains.
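A minimal sketch of the DIAL idea, assuming a PyTorch setup; the network sizes, noise level, and stand-in loss are illustrative assumptions rather than the paper's implementation:

```python
# Sketch of differentiable inter-agent learning: during centralised training the
# message stays continuous and noisy so gradients flow back to the sender;
# at decentralised execution the message is discretised.
import torch
import torch.nn as nn

class DialAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, msg_dim=1, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim + msg_dim, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, n_actions)   # action-value estimates
        self.msg_head = nn.Linear(hidden, msg_dim)   # real-valued outgoing message

    def forward(self, obs, incoming_msg):
        h = self.body(torch.cat([obs, incoming_msg], dim=-1))
        return self.q_head(h), self.msg_head(h)

def channel(msg, sigma=0.3, training=True):
    if training:
        return msg + sigma * torch.randn_like(msg)   # noisy but differentiable
    return (msg > 0).float()                         # discretised at execution

agent_a, agent_b = DialAgent(obs_dim=4, n_actions=2), DialAgent(obs_dim=4, n_actions=2)
obs_a, obs_b = torch.randn(1, 4), torch.randn(1, 4)
q_a, msg_a = agent_a(obs_a, torch.zeros(1, 1))
q_b, _ = agent_b(obs_b, channel(msg_a))              # B conditions on A's message
loss = (q_b.max() - 1.0) ** 2                        # stand-in for a TD-style loss
loss.backward()                                      # gradient reaches A through the channel
```

The point of the sketch is only that the receiver's loss can shape the sender's message head while training is centralised, whereas execution needs no gradients and can use a discretised channel.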
Visual victim detection and quadrotor-swarm coordination control in search and rescue environment
We propose a distributed victim-detection algorithm that uses visual information from quadrotors, processed by convolutional neural networks (CNNs), in a search and rescue environment. We first describe the navigation algorithm that allows the quadrotors to avoid collisions. Secondly, when one quadrotor detects a possible victim, its closest neighbors disconnect from the main swarm and form a new sub-swarm around the victim to validate the victim's status. A formation control that permits information acquisition is then performed, based on the well-known rendezvous consensus algorithm. Finally, images are processed using a CNN to identify potential victims in the area. Given the uncertainty of the victim-detection measurements across the quadrotors' cameras during image processing, estimation consensus (EC) and max-estimation consensus (M-EC) algorithms are proposed so that the quadrotors agree on the victim-detection estimate. We show that M-EC delivers better results than EC in scenarios with poor visibility and uncertainty produced by fire and smoke. The distributed approach yields more accurate decisions on whether or not a victim is present, remaining robust to uncertainty and erroneous measurements compared with a single quadrotor performing the mission. The algorithm is evaluated in a simulation carried out in V-REP.
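As a rough illustration of the two consensus rules named above, the sketch below uses NumPy with a made-up three-quadrotor communication graph and detection scores; it contrasts averaging-style estimation consensus with a max-estimation rule that propagates the strongest detection:

```python
# Illustrative consensus updates (graph, step size, and scores are assumptions).
import numpy as np

adjacency = np.array([[0, 1, 0],        # quadrotor 0 communicates with 1
                      [1, 0, 1],        # quadrotor 1 communicates with 0 and 2
                      [0, 1, 0]])       # quadrotor 2 communicates with 1
confidence = np.array([0.2, 0.4, 0.9])  # local CNN victim-detection scores

def estimation_consensus(x, A, steps=30, eps=0.3):
    for _ in range(steps):
        # each agent moves toward its neighbours' estimates (Laplacian averaging)
        x = x + eps * (A @ x - A.sum(axis=1) * x)
    return x

def max_estimation_consensus(x, A, steps=5):
    for _ in range(steps):
        neighbour_best = np.where(A > 0, x, -np.inf).max(axis=1)
        x = np.maximum(x, neighbour_best)   # keep the strongest detection seen
    return x

print(estimation_consensus(confidence.copy(), adjacency))      # all near the mean (~0.5)
print(max_estimation_consensus(confidence.copy(), adjacency))  # all reach 0.9
```

This is the intuition the abstract points at for smoke-degraded views: averaging dilutes a single strong detection, while the max rule lets the quadrotor with the clearest view dominate the swarm's decision.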
Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?
Centralized Training with Decentralized Execution (CTDE) has recently emerged
as a popular framework for cooperative Multi-Agent Reinforcement Learning
(MARL), where agents can use additional global state information to guide
training in a centralized way and make their own decisions only based on
decentralized local policies. Despite the encouraging results achieved, CTDE
makes an independence assumption on agent policies, which prevents agents from adopting global cooperative information from each other during centralized training. We therefore argue that existing CTDE methods cannot fully utilize
global information for training, leading to an inefficient joint-policy
exploration and even suboptimal results. In this paper, we introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning that not only enables effective message exchange among agents during training but also guarantees independent policies for execution. First, CADP endows agents with an explicit communication channel through which they can seek and take advice from other agents for more centralized training. To further ensure decentralized execution, we propose a smooth model pruning mechanism that progressively restricts agent communication until each agent's policy is self-contained, without degrading cooperation capability. Empirical evaluations on
StarCraft II micromanagement and Google Research Football benchmarks
demonstrate that the proposed framework achieves superior performance compared
with the state-of-the-art counterparts. Our code will be made publicly
available.
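A hedged sketch of the advise-then-prune idea, assuming a PyTorch attention layer as the training-time communication channel; the interpolation schedule and every name here are assumptions, not the CADP implementation:

```python
# Sketch: agents exchange "advice" via attention during training, and a pruning
# coefficient gradually removes the cross-agent flow so execution is independent.
import torch
import torch.nn as nn

class AdvisingPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.act = nn.Linear(hidden, n_actions)

    def forward(self, obs, prune=0.0):
        # obs: (batch, n_agents, obs_dim); each agent attends to all agents.
        h = torch.relu(self.encode(obs))
        advised, _ = self.attn(h, h, h)
        # prune = 0 -> fully centralised advising; prune = 1 -> each agent uses
        # only its own encoding, as decentralised execution requires.
        mixed = (1.0 - prune) * advised + prune * h
        return self.act(mixed)

policy = AdvisingPolicy(obs_dim=8, n_actions=5)
obs = torch.randn(2, 3, 8)                 # batch of 2, 3 agents
logits_train = policy(obs, prune=0.0)      # early training: full advice exchange
logits_exec = policy(obs, prune=1.0)       # after pruning: independent policies
```

In such a scheme the pruning coefficient would be annealed smoothly from 0 towards 1 over training, which is the role the abstract's "smooth model pruning" plays.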
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
Autonomous agents must learn to collaborate. It is not scalable to develop a
new centralized agent every time a task's difficulty outpaces a single agent's
abilities. While multi-agent collaboration research has flourished in
gridworld-like environments, relatively little work has considered visually
rich domains. Addressing this, we introduce the novel task FurnMove in which
agents work together to move a piece of furniture through a living room to a
goal. Unlike existing tasks, FurnMove requires agents to coordinate at every
timestep. We identify two challenges when training agents to complete FurnMove:
existing decentralized action sampling procedures do not permit expressive
joint action policies and, in tasks requiring close coordination, the number of failed actions dominates the number of successful ones. To confront these challenges, we
introduce SYNC-policies (synchronize your actions coherently) and CORDIAL
(coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58%
completion rate on FurnMove, an impressive absolute gain of 25 percentage
points over competitive decentralized baselines. Our dataset, code, and
pretrained models are available at https://unnat.github.io/cordial-sync. Accepted to ECCV 2020 (spotlight).
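One way to read the SYNC idea is sketched below, under the assumption that agents share a source of randomness: the joint policy is a mixture of products of per-agent marginals, and coordination comes from all agents drawing the same mixture component at each step. Shapes, seeds, and distributions are illustrative, not the paper's code:

```python
# Sketch of a mixture-of-marginals joint policy with a shared random component.
import numpy as np

n_components, n_agents, n_actions = 4, 2, 3
shared = np.random.default_rng(123)                       # identical on every agent
mixture_weights = np.array([0.4, 0.3, 0.2, 0.1])          # weights over components
marginals = shared.dirichlet(np.ones(n_actions),          # pi_k^i(a): component k, agent i
                             size=(n_components, n_agents))

def sync_sample(step, agent_rngs):
    # All agents draw the SAME component k: the component generator is seeded
    # only from the shared timestep, never from anything agent-specific.
    k = np.random.default_rng(1000 + step).choice(n_components, p=mixture_weights)
    # Each agent then samples its own action from its k-th marginal, so no
    # message passing is needed at execution time.
    return [agent_rngs[i].choice(n_actions, p=marginals[k, i]) for i in range(n_agents)]

agent_rngs = [np.random.default_rng(i) for i in range(n_agents)]
print(sync_sample(step=0, agent_rngs=agent_rngs))   # one coordinated joint action
```

A single product of independent marginals cannot express "either we both push left or we both push right"; a mixture of such products with a shared component index can, which is the expressiveness gap the abstract highlights.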
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
We present Habitat 3.0: a simulation platform for studying collaborative
human-robot tasks in home environments. Habitat 3.0 offers contributions across
three dimensions: (1) Accurate humanoid simulation: addressing challenges in
modeling complex deformable bodies and diversity in appearance and motion, all
while ensuring high simulation speed. (2) Human-in-the-loop infrastructure:
enabling real human interaction with simulated robots via mouse/keyboard or a
VR interface, facilitating evaluation of robot policies with human input. (3)
Collaborative tasks: studying two collaborative tasks, Social Navigation and
Social Rearrangement. Social Navigation investigates a robot's ability to
locate and follow humanoid avatars in unseen environments, whereas Social
Rearrangement addresses collaboration between a humanoid and robot while
rearranging a scene. These contributions allow us to study end-to-end learned
and heuristic baselines for human-robot collaboration in-depth, as well as
evaluate them with humans in the loop. Our experiments demonstrate that learned
robot policies lead to efficient task completion when collaborating with unseen
humanoid agents and human partners that might exhibit behaviors that the robot
has not seen before. Additionally, we observe emergent behaviors during
collaborative task execution, such as the robot yielding space when obstructing
a humanoid agent, thereby allowing the effective completion of the task by the
humanoid agent. Furthermore, our experiments using the human-in-the-loop tool
demonstrate that our automated evaluation with humanoids can provide an
indication of the relative ordering of different policies when evaluated with
real human collaborators. Habitat 3.0 unlocks interesting new features in
simulators for Embodied AI, and we hope it paves the way for a new frontier of
embodied human-AI interaction capabilities. Project page: http://aihabitat.org/habitat
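Purely as an illustration of how a Social Navigation style metric could be scored, and explicitly not Habitat 3.0's actual API (all names and thresholds below are assumptions):

```python
# Toy "find and follow" success check: the robot succeeds if it gets within a
# radius of the humanoid and then stays there for a number of consecutive steps.
from dataclasses import dataclass
import math

@dataclass
class Pose:
    x: float
    y: float

def following_success(robot_traj, humanoid_traj, radius=2.0, follow_steps=10):
    consecutive = 0
    for r, h in zip(robot_traj, humanoid_traj):
        close = math.hypot(r.x - h.x, r.y - h.y) <= radius
        consecutive = consecutive + 1 if close else 0
        if consecutive >= follow_steps:
            return True
    return False

robot = [Pose(0.4 * t, 0.0) for t in range(30)]            # robot catching up
humanoid = [Pose(3.0 + 0.3 * t, 0.0) for t in range(30)]   # walking humanoid
print(following_success(robot, humanoid))                  # True in this toy case
```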
Evolutionary Learning of Goal-Driven Multi-agent Communication
Multi-agent systems are a common paradigm for building distributed systems in domains such as networking, health care, swarm sensing, robotics, and transportation. Such systems are usually designed or adjusted to reflect performance trade-offs made according to the characteristics of the mission requirements. Research has acknowledged the crucial role that communication plays in solving many performance problems. However, research efforts that address communication decisions are usually designed and evaluated with respect to a single predetermined performance goal. This work introduces Goal-Driven Communication, where communication in a multi-agent system is determined according to flexible performance goals. We propose an evolutionary approach that, given a performance goal, produces a communication strategy that improves a multi-agent system's performance with respect to that goal. The evolved strategy determines what, when, and to whom the agents communicate. The approach further enables tuning the trade-off between the performance goal and communication cost, producing a strategy that balances the two objectives according to the system designer's needs.
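A hedged sketch of the kind of evolutionary loop the abstract describes, with a stand-in fitness landscape; the genome fields (when, what, to whom), the cost weight, and the toy evaluation are assumptions made purely for illustration:

```python
# Evolve a communication-strategy genome whose fitness trades off a task score
# against communication cost (both computed by placeholder functions).
import random

random.seed(0)

def random_genome():
    return {"send_threshold": random.random(),     # when to communicate
            "feature": random.randrange(4),        # what to communicate
            "radius": random.uniform(1.0, 10.0)}   # to whom (neighbours in range)

def fitness(genome, cost_weight=0.1):
    # Placeholder: a real evaluation would simulate the multi-agent system and
    # measure the chosen performance goal for this strategy.
    task_score = 1.0 - abs(genome["send_threshold"] - 0.3)   # fictitious landscape
    comm_cost = genome["radius"] / 10.0
    return task_score - cost_weight * comm_cost

def mutate(genome):
    child = dict(genome)
    child["send_threshold"] = min(1.0, max(0.0, child["send_threshold"] + random.gauss(0, 0.05)))
    child["radius"] = min(10.0, max(1.0, child["radius"] + random.gauss(0, 0.5)))
    return child

population = [random_genome() for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    elite = population[:5]                                   # keep the best strategies
    population = elite + [mutate(random.choice(elite)) for _ in range(15)]
print(max(population, key=fitness))
```

Raising `cost_weight` in such a setup biases the search toward quieter strategies, which is the performance-versus-communication-cost tuning the abstract mentions.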