Learning to Communicate with Deep Multi-Agent Reinforcement Learning
We consider the problem of multiple agents sensing and acting in environments
with the goal of maximising their shared utility. In these environments, agents
must learn communication protocols in order to share information that is needed
to solve the tasks. By embracing deep neural networks, we are able to
demonstrate end-to-end learning of protocols in complex environments inspired
by communication riddles and multi-agent computer vision problems with partial
observability. We propose two approaches for learning in these domains:
Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning
(DIAL). The former uses deep Q-learning, while the latter exploits the fact
that, during learning, agents can backpropagate error derivatives through
(noisy) communication channels. Hence, this approach uses centralised learning
but decentralised execution. Our experiments introduce new environments for
studying the learning of communication protocols and present a set of
engineering innovations that are essential for success in these domains.
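A minimal sketch of the DIAL idea, assuming a PyTorch setup; the network sizes, noise level, and stand-in loss are illustrative assumptions rather than the paper's implementation:

```python
# Sketch of differentiable inter-agent learning: during centralised training the
# message stays continuous and noisy so gradients flow back to the sender;
# at decentralised execution the message is discretised.
import torch
import torch.nn as nn

class DialAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, msg_dim=1, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim + msg_dim, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, n_actions)   # action-value estimates
        self.msg_head = nn.Linear(hidden, msg_dim)   # real-valued outgoing message

    def forward(self, obs, incoming_msg):
        h = self.body(torch.cat([obs, incoming_msg], dim=-1))
        return self.q_head(h), self.msg_head(h)

def channel(msg, sigma=0.3, training=True):
    if training:
        return msg + sigma * torch.randn_like(msg)   # noisy but differentiable
    return (msg > 0).float()                         # discretised at execution

agent_a, agent_b = DialAgent(obs_dim=4, n_actions=2), DialAgent(obs_dim=4, n_actions=2)
obs_a, obs_b = torch.randn(1, 4), torch.randn(1, 4)
q_a, msg_a = agent_a(obs_a, torch.zeros(1, 1))
q_b, _ = agent_b(obs_b, channel(msg_a))              # B conditions on A's message
loss = (q_b.max() - 1.0) ** 2                        # stand-in for a TD-style loss
loss.backward()                                      # gradient reaches A through the channel
```

The point of the sketch is only that the receiver's loss can shape the sender's message head while training is centralised, whereas execution needs no gradients and can use a discretised channel.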
Visual victim detection and quadrotor-swarm coordination control in search and rescue environment
We propose a distributed victim-detection algorithm that uses visual information from quadrotors, processed by convolutional neural networks (CNNs), in a search and rescue environment. We first describe the navigation algorithm that allows the quadrotors to avoid collisions. Secondly, when one quadrotor detects a possible victim, its closest neighbors disconnect from the main swarm and form a new sub-swarm around the victim to validate the victim's status. A formation control that permits information acquisition is then performed, based on the well-known rendezvous consensus algorithm. Finally, images are processed using a CNN to identify potential victims in the area. Given the uncertainty of the victim-detection measurements across the quadrotors' cameras during image processing, estimation consensus (EC) and max-estimation consensus (M-EC) algorithms are proposed so that the quadrotors agree on the victim-detection estimate. We show that M-EC delivers better results than EC in scenarios with poor visibility and uncertainty produced by fire and smoke. The distributed approach yields more accurate decisions on whether or not a victim is present, remaining robust to uncertainty and erroneous measurements compared with a single quadrotor performing the mission. The algorithm is evaluated in a simulation carried out in V-REP.
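As a rough illustration of the two consensus rules named above, the sketch below uses NumPy with a made-up three-quadrotor communication graph and detection scores; it contrasts averaging-style estimation consensus with a max-estimation rule that propagates the strongest detection:

```python
# Illustrative consensus updates (graph, step size, and scores are assumptions).
import numpy as np

adjacency = np.array([[0, 1, 0],        # quadrotor 0 communicates with 1
                      [1, 0, 1],        # quadrotor 1 communicates with 0 and 2
                      [0, 1, 0]])       # quadrotor 2 communicates with 1
confidence = np.array([0.2, 0.4, 0.9])  # local CNN victim-detection scores

def estimation_consensus(x, A, steps=30, eps=0.3):
    for _ in range(steps):
        # each agent moves toward its neighbours' estimates (Laplacian averaging)
        x = x + eps * (A @ x - A.sum(axis=1) * x)
    return x

def max_estimation_consensus(x, A, steps=5):
    for _ in range(steps):
        neighbour_best = np.where(A > 0, x, -np.inf).max(axis=1)
        x = np.maximum(x, neighbour_best)   # keep the strongest detection seen
    return x

print(estimation_consensus(confidence.copy(), adjacency))      # all near the mean (~0.5)
print(max_estimation_consensus(confidence.copy(), adjacency))  # all reach 0.9
```

This is the intuition the abstract points at for smoke-degraded views: averaging dilutes a single strong detection, while the max rule lets the quadrotor with the clearest view dominate the swarm's decision.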
Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?
Centralized Training with Decentralized Execution (CTDE) has recently emerged
as a popular framework for cooperative Multi-Agent Reinforcement Learning
(MARL), where agents can use additional global state information to guide
training in a centralized way and make their own decisions only based on
decentralized local policies. Despite the encouraging results achieved, CTDE
makes an independence assumption on agent policies, which prevents agents from adopting global cooperative information from each other during centralized training. We therefore argue that existing CTDE methods cannot fully utilize
global information for training, leading to an inefficient joint-policy
exploration and even suboptimal results. In this paper, we introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning that not only enables effective message exchange among agents during training but also guarantees independent policies for execution. First, CADP endows agents with an explicit communication channel through which they can seek and take advice from other agents for more centralized training. To further ensure decentralized execution, we propose a smooth model pruning mechanism that progressively restricts agent communication until each agent's policy is self-contained, without degrading cooperation capability. Empirical evaluations on
StarCraft II micromanagement and Google Research Football benchmarks
demonstrate that the proposed framework achieves superior performance compared
with the state-of-the-art counterparts. Our code will be made publicly
available.
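A hedged sketch of the advise-then-prune idea, assuming a PyTorch attention layer as the training-time communication channel; the interpolation schedule and every name here are assumptions, not the CADP implementation:

```python
# Sketch: agents exchange "advice" via attention during training, and a pruning
# coefficient gradually removes the cross-agent flow so execution is independent.
import torch
import torch.nn as nn

class AdvisingPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.act = nn.Linear(hidden, n_actions)

    def forward(self, obs, prune=0.0):
        # obs: (batch, n_agents, obs_dim); each agent attends to all agents.
        h = torch.relu(self.encode(obs))
        advised, _ = self.attn(h, h, h)
        # prune = 0 -> fully centralised advising; prune = 1 -> each agent uses
        # only its own encoding, as decentralised execution requires.
        mixed = (1.0 - prune) * advised + prune * h
        return self.act(mixed)

policy = AdvisingPolicy(obs_dim=8, n_actions=5)
obs = torch.randn(2, 3, 8)                 # batch of 2, 3 agents
logits_train = policy(obs, prune=0.0)      # early training: full advice exchange
logits_exec = policy(obs, prune=1.0)       # after pruning: independent policies
```

In such a scheme the pruning coefficient would be annealed smoothly from 0 towards 1 over training, which is the role the abstract's "smooth model pruning" plays.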
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
Autonomous agents must learn to collaborate. It is not scalable to develop a
new centralized agent every time a task's difficulty outpaces a single agent's
abilities. While multi-agent collaboration research has flourished in
gridworld-like environments, relatively little work has considered visually
rich domains. Addressing this, we introduce the novel task FurnMove in which
agents work together to move a piece of furniture through a living room to a
goal. Unlike existing tasks, FurnMove requires agents to coordinate at every
timestep. We identify two challenges when training agents to complete FurnMove:
existing decentralized action sampling procedures do not permit expressive
joint action policies and, in tasks requiring close coordination, the number of failed actions dominates the number of successful ones. To confront these challenges, we
introduce SYNC-policies (synchronize your actions coherently) and CORDIAL
(coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58%
completion rate on FurnMove, an impressive absolute gain of 25 percentage
points over competitive decentralized baselines. Our dataset, code, and
pretrained models are available at https://unnat.github.io/cordial-sync. Accepted to ECCV 2020 (spotlight).
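One way to read the SYNC idea is sketched below, under the assumption that agents share a source of randomness: the joint policy is a mixture of products of per-agent marginals, and coordination comes from all agents drawing the same mixture component at each step. Shapes, seeds, and distributions are illustrative, not the paper's code:

```python
# Sketch of a mixture-of-marginals joint policy with a shared random component.
import numpy as np

n_components, n_agents, n_actions = 4, 2, 3
shared = np.random.default_rng(123)                       # identical on every agent
mixture_weights = np.array([0.4, 0.3, 0.2, 0.1])          # weights over components
marginals = shared.dirichlet(np.ones(n_actions),          # pi_k^i(a): component k, agent i
                             size=(n_components, n_agents))

def sync_sample(step, agent_rngs):
    # All agents draw the SAME component k: the component generator is seeded
    # only from the shared timestep, never from anything agent-specific.
    k = np.random.default_rng(1000 + step).choice(n_components, p=mixture_weights)
    # Each agent then samples its own action from its k-th marginal, so no
    # message passing is needed at execution time.
    return [agent_rngs[i].choice(n_actions, p=marginals[k, i]) for i in range(n_agents)]

agent_rngs = [np.random.default_rng(i) for i in range(n_agents)]
print(sync_sample(step=0, agent_rngs=agent_rngs))   # one coordinated joint action
```

A single product of independent marginals cannot express "either we both push left or we both push right"; a mixture of such products with a shared component index can, which is the expressiveness gap the abstract highlights.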
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
We present Habitat 3.0: a simulation platform for studying collaborative
human-robot tasks in home environments. Habitat 3.0 offers contributions across
three dimensions: (1) Accurate humanoid simulation: addressing challenges in
modeling complex deformable bodies and diversity in appearance and motion, all
while ensuring high simulation speed. (2) Human-in-the-loop infrastructure:
enabling real human interaction with simulated robots via mouse/keyboard or a
VR interface, facilitating evaluation of robot policies with human input. (3)
Collaborative tasks: studying two collaborative tasks, Social Navigation and
Social Rearrangement. Social Navigation investigates a robot's ability to
locate and follow humanoid avatars in unseen environments, whereas Social
Rearrangement addresses collaboration between a humanoid and robot while
rearranging a scene. These contributions allow us to study end-to-end learned
and heuristic baselines for human-robot collaboration in-depth, as well as
evaluate them with humans in the loop. Our experiments demonstrate that learned
robot policies lead to efficient task completion when collaborating with unseen
humanoid agents and human partners that might exhibit behaviors that the robot
has not seen before. Additionally, we observe emergent behaviors during
collaborative task execution, such as the robot yielding space when obstructing
a humanoid agent, thereby allowing the effective completion of the task by the
humanoid agent. Furthermore, our experiments using the human-in-the-loop tool
demonstrate that our automated evaluation with humanoids can provide an
indication of the relative ordering of different policies when evaluated with
real human collaborators. Habitat 3.0 unlocks interesting new features in
simulators for Embodied AI, and we hope it paves the way for a new frontier of
embodied human-AI interaction capabilities. Project page: http://aihabitat.org/habitat
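Purely as an illustration of how a Social Navigation style metric could be scored, and explicitly not Habitat 3.0's actual API (all names and thresholds below are assumptions):

```python
# Toy "find and follow" success check: the robot succeeds if it gets within a
# radius of the humanoid and then stays there for a number of consecutive steps.
from dataclasses import dataclass
import math

@dataclass
class Pose:
    x: float
    y: float

def following_success(robot_traj, humanoid_traj, radius=2.0, follow_steps=10):
    consecutive = 0
    for r, h in zip(robot_traj, humanoid_traj):
        close = math.hypot(r.x - h.x, r.y - h.y) <= radius
        consecutive = consecutive + 1 if close else 0
        if consecutive >= follow_steps:
            return True
    return False

robot = [Pose(0.4 * t, 0.0) for t in range(30)]            # robot catching up
humanoid = [Pose(3.0 + 0.3 * t, 0.0) for t in range(30)]   # walking humanoid
print(following_success(robot, humanoid))                  # True in this toy case
```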
Evolutionary Learning of Goal-Driven Multi-agent Communication
Multi-agent systems are a common paradigm for building distributed systems in domains such as networking, health care, swarm sensing, robotics, and transportation. Such systems are usually designed or adjusted to reflect performance trade-offs made according to the characteristics of the mission requirements. Research has acknowledged the crucial role that communication plays in solving many performance problems. However, research efforts that address communication decisions are usually designed and evaluated with respect to a single predetermined performance goal. This work introduces Goal-Driven Communication, where communication in a multi-agent system is determined according to flexible performance goals. We propose an evolutionary approach that, given a performance goal, produces a communication strategy that improves a multi-agent system's performance with respect to that goal. The evolved strategy determines what, when, and to whom the agents communicate. The approach further enables tuning the trade-off between the performance goal and communication cost, producing a strategy that balances the two objectives according to the system designer's needs.
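A hedged sketch of the kind of evolutionary loop the abstract describes, with a stand-in fitness landscape; the genome fields (when, what, to whom), the cost weight, and the toy evaluation are assumptions made purely for illustration:

```python
# Evolve a communication-strategy genome whose fitness trades off a task score
# against communication cost (both computed by placeholder functions).
import random

random.seed(0)

def random_genome():
    return {"send_threshold": random.random(),     # when to communicate
            "feature": random.randrange(4),        # what to communicate
            "radius": random.uniform(1.0, 10.0)}   # to whom (neighbours in range)

def fitness(genome, cost_weight=0.1):
    # Placeholder: a real evaluation would simulate the multi-agent system and
    # measure the chosen performance goal for this strategy.
    task_score = 1.0 - abs(genome["send_threshold"] - 0.3)   # fictitious landscape
    comm_cost = genome["radius"] / 10.0
    return task_score - cost_weight * comm_cost

def mutate(genome):
    child = dict(genome)
    child["send_threshold"] = min(1.0, max(0.0, child["send_threshold"] + random.gauss(0, 0.05)))
    child["radius"] = min(10.0, max(1.0, child["radius"] + random.gauss(0, 0.5)))
    return child

population = [random_genome() for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    elite = population[:5]                                   # keep the best strategies
    population = elite + [mutate(random.choice(elite)) for _ in range(15)]
print(max(population, key=fitness))
```

Raising `cost_weight` in such a setup biases the search toward quieter strategies, which is the performance-versus-communication-cost tuning the abstract mentions.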