Search CORE

20,564 research outputs found

Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

Author: A Martinoli
C Kube
C Moeslinger
F Arvin
FA Oliehoek
J Foerster
JK Gupta
L Bayındır
N Correll
P Basu
S Nouyan
V Mnih
Publication venue
Publication date: 01/01/2018
Field of study

Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.Comment: 13 pages, 4 figures, version 2, accepted at ANTS 201

arXiv.org e-Print Archive

TUbiblio

Crossref

Al-Robotics team: A cooperative multi-unmanned aerial vehicle approach for the Mohamed Bin Zayed International Robotic Challenge

Author: Amigoni
Beard
Burdakov
Capitan
Capitan
Cook
Dalal
Felzenszwalb
Furrer
Galceran
Goyal
Hsieh
Korsah
Stuckler
Tai
Viola
Publication venue: 'Wiley'
Publication date: 01/01/2019
Field of study

The Al-Robotics team was selected as one of the 25 finalist teams out of 143 applications received to participate in the first edition of the Mohamed Bin Zayed International Robotic Challenge (MBZIRC), held in 2017. In particular, one of the competition Challenges offered us the opportunity to develop a cooperative approach with multiple unmanned aerial vehicles (UAVs) searching, picking up, and dropping static and moving objects. This paper presents the approach that our team Al-Robotics followed to address that Challenge 3 of the MBZIRC. First, we overview the overall architecture of the system, with the different modules involved. Second, we describe the procedure that we followed to design the aerial platforms, as well as all their onboard components. Then, we explain the techniques that we used to develop the software functionalities of the system. Finally, we discuss our experimental results and the lessons that we learned before and during the competition. The cooperative approach was validated with fully autonomous missions in experiments previous to the actual competition. We also analyze the results that we obtained during the competition trials.Unión Europea H2020 73166

Crossref

idUS. Depósito de Investigación Universidad de Sevilla

Guided Deep Reinforcement Learning for Swarm Systems

Author: Hüttenrauch Maximilian
Neumann Gerhard
Šošić Adrian
Publication venue
Publication date: 01/01/2016
Field of study

In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each others and 2) locating a target.Comment: 15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots and Multirobot Systems (ARMS) Worksho

arXiv.org e-Print Archive

TUbiblio

Efficient exploration of unknown indoor environments using a team of mobile robots

Author: A. Meijster
B. Kuipers
B. Yamauchi
B.P. Gerkey
Cyrill Stachniss
D. Fox
D. Goldberg
D. Guzzoni
D. Lee
G. Dudek
G. Dudek
H. Choset
M. Schneider-Fontan
N. Roy
P. Althaus
R.C. Gonzalez
S. Albers
S. Koenig
S. Koenig
S. Oore
W. Burgard
Wolfram Burgard
X. Deng
X. Deng
Y. Freund
Y.U. Cao
Óscar Martínez Mozos
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2008
Field of study

Whenever multiple robots have to solve a common task, they need to coordinate their actions to carry out the task efficiently and to avoid interferences between individual robots. This is especially the case when considering the problem of exploring an unknown environment with a team of mobile robots. To achieve efficient terrain coverage with the sensors of the robots, one first needs to identify unknown areas in the environment. Second, one has to assign target locations to the individual robots so that they gather new and relevant information about the environment with their sensors. This assignment should lead to a distribution of the robots over the environment in a way that they avoid redundant work and do not interfere with each other by, for example, blocking their paths. In this paper, we address the problem of efficiently coordinating a large team of mobile robots. To better distribute the robots over the environment and to avoid redundant work, we take into account the type of place a potential target is located in (e.g., a corridor or a room). This knowledge allows us to improve the distribution of robots over the environment compared to approaches lacking this capability. To autonomously determine the type of a place, we apply a classifier learned using the AdaBoost algorithm. The resulting classifier takes laser range data as input and is able to classify the current location with high accuracy. We additionally use a hidden Markov model to consider the spatial dependencies between nearby locations. Our approach to incorporate the information about the type of places in the assignment process has been implemented and tested in different environments. The experiments illustrate that our system effectively distributes the robots over the environment and allows them to accomplish their mission faster compared to approaches that ignore the place labels

University of Lincoln Institutional Repository

Crossref

Planning for Decentralized Control of Multiple Robots Under Uncertainty

Author: Amato Christopher
Cruz Gabriel
How Jonathan P.
Kaelbling Leslie P.
Konidaris George D.
Maynor Christopher A.
Publication venue
Publication date: 12/02/2014
Field of study

We describe a probabilistic framework for synthesizing control policies for general multi-robot systems, given environment and sensor models and a cost function. Decentralized, partially observable Markov decision processes (Dec-POMDPs) are a general model of decision processes where a team of agents must cooperate to optimize some objective (specified by a shared reward or cost function) in the presence of uncertainty, but where communication limitations mean that the agents cannot share their state, so execution must proceed in a decentralized fashion. While Dec-POMDPs are typically intractable to solve for real-world problems, recent research on the use of macro-actions in Dec-POMDPs has significantly increased the size of problem that can be practically solved as a Dec-POMDP. We describe this general model, and show how, in contrast to most existing methods that are specialized to a particular problem class, it can synthesize control policies that use whatever opportunities for coordination are present in the problem, while balancing off uncertainty in outcomes, sensor information, and information about other agents. We use three variations on a warehouse task to show that a single planner of this type can generate cooperative behavior using task allocation, direct communication, and signaling, as appropriate

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Crossref

People tracking by cooperative fusion of RADAR and camera sensors

Author: Dimitrievski Martin
Jacobs Lennert
Philips Wilfried
Veelaert Peter
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Accurate 3D tracking of objects from monocular camera poses challenges due to the loss of depth during projection. Although ranging by RADAR has proven effective in highway environments, people tracking remains beyond the capability of single sensor systems. In this paper, we propose a cooperative RADAR-camera fusion method for people tracking on the ground plane. Using average person height, joint detection likelihood is calculated by back-projecting detections from the camera onto the RADAR Range-Azimuth data. Peaks in the joint likelihood, representing candidate targets, are fed into a Particle Filter tracker. Depending on the association outcome, particles are updated using the associated detections (Tracking by Detection), or by sampling the raw likelihood itself (Tracking Before Detection). Utilizing the raw likelihood data has the advantage that lost targets are continuously tracked even if the camera or RADAR signal is below the detection threshold. We show that in single target, uncluttered environments, the proposed method entirely outperforms camera-only tracking. Experiments in a real-world urban environment also confirm that the cooperative fusion tracker produces significantly better estimates, even in difficult and ambiguous situations

Crossref

Ghent University Academic Bibliography