427 research outputs found
Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well-known problems from the
swarm literature (rendezvous and pursuit evasion), in a globally and locally
observable setup. For the local setup we furthermore introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents facilitating the development of more complex collective
strategies. Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20
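The mean-embedding idea in the abstract above can be illustrated with a minimal sketch. It assumes scalar observations (for example, distances to neighbours) and Gaussian radial basis features; the function names, the centres and the width are hypothetical choices for illustration, not the paper's implementation. The point it shows is that the averaged feature vector has a fixed length and does not depend on the order or number of observed agents.

```python
import math

def rbf_features(x, centers, width):
    """Evaluate Gaussian radial basis functions at a scalar observation x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def mean_embedding(observations, centers, width):
    """Average the feature vectors of all observed neighbours.

    The result always has len(centers) entries, whatever the number of
    neighbours, and is unchanged when the neighbours are reordered.
    """
    if not observations:
        return [0.0] * len(centers)
    feats = [rbf_features(x, centers, width) for x in observations]
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]

# Hypothetical RBF centres covering a normalised distance range.
centers = [0.0, 0.5, 1.0]
a = mean_embedding([0.1, 0.4, 0.9], centers, width=0.25)
b = mean_embedding([0.9, 0.1, 0.4], centers, width=0.25)      # same neighbours, reordered
c = mean_embedding([0.1, 0.4, 0.9] * 2, centers, width=0.25)  # every neighbour duplicated
```

Here `a`, `b` and `c` agree up to floating-point rounding, which is the interchangeability and size-independence property the abstract asks of a swarm state representation. A concatenated state, by contrast, would change length from `a` to `c`.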
Implementation of distributed partitioning algorithms using mobile Wheelphones
This thesis presents the implementation process of partitioning algorithms, from the theoretical ideas to experimental results
Multi-Robot Persistent Coverage in Complex Environments
Recent advances in mobile robotics and the growing availability of affordable autonomous mobile robots have motivated extensive research in multi-robot systems. The complexity of these systems resides in the design of communication, coordination and control strategies to perform complex tasks that a single robot cannot. A particularly interesting task is persistent coverage, which aims to keep a given environment covered over time with a team of robotic agents. This problem is of interest in many applications such as vacuuming, cleaning a place where dust is continuously settling, lawn mowing or environmental monitoring. More recently, the emergence of unmanned aerial vehicles (UAVs) has encouraged the application of the coverage problem to surveillance, monitoring and rescue. This thesis focuses on the problem of persistently covering a continuous environment in increasingly difficult settings.
First, we propose a receding-horizon optimal solution for a centralized system in a convex environment using dynamic programming. Then we look for distributed solutions, which are more robust, scalable and efficient. To deal with the lack of global information, we present a communication-effective distributed estimation algorithm that allows the robots to have an accurate estimate of the coverage of the environment even when they cannot exchange information with all the members of the team. Using this estimation, we propose two different solutions based on coverage goals, which are the points of the environment in which the coverage can be improved the most. The first method is a motion controller that combines a gradient term with a term that drives the robots to the goals, and which performs well in convex environments. For environments with some obstacles, the second method plans open paths to the goals that are optimal in terms of coverage. Finally, for complex, non-convex environments we propose a distributed algorithm to find equitable partitions for the robots, i.e., with an amount of work proportional to their capabilities. To cover its region, each robot plans optimal, finite-horizon paths through a graph of sweep-like paths.
The final part of the thesis is devoted to discrete environments, in which only a finite set of points has to be covered. We propose a divide-and-conquer strategy that reduces the complexity of the problem by separating it into three smaller subproblems, each of which can be solved optimally. We first plan closed paths through the points, then calculate the optimal coverage times and actions to periodically satisfy the coverage required by the points, and finally join the individual plans of the robots into a collision-free team plan that minimizes simultaneous motions. This solution is eventually used for a novel application: domestic induction heating with mobile inductors. We adapt it to the particular setting of a domestic hob and demonstrate that it performs well on a real prototype.
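The notion of a coverage goal used in the abstract above can be sketched on a toy one-dimensional grid. The abstract does not give the coverage model, so everything here is an assumption for illustration: coverage in each cell decays by a constant factor per step, a robot adds a fixed amount to the cell it occupies, and the goal is the cell furthest below a desired level. The names `update_coverage`, `coverage_goal`, `decay` and `gain` are hypothetical.

```python
def update_coverage(coverage, robot_cells, decay=0.9, gain=0.5):
    """Decay every cell, then add coverage in the cells occupied by robots.

    Assumed model: multiplicative decay and a fixed additive gain, capped at 1.
    """
    new = [decay * c for c in coverage]
    for cell in robot_cells:
        new[cell] = min(1.0, new[cell] + gain)
    return new

def coverage_goal(coverage, desired=1.0):
    """Index of the cell whose coverage falls furthest below the desired level."""
    return max(range(len(coverage)), key=lambda i: desired - coverage[i])

coverage = [1.0, 1.0, 0.2, 1.0]
goal = coverage_goal(coverage)                        # the poorly covered cell
coverage = update_coverage(coverage, robot_cells=[goal])
```

Because coverage keeps decaying everywhere, the goal moves over time, which is what makes the task persistent rather than a one-off deployment.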
Probabilistic and Distributed Control of a Large-Scale Swarm of Autonomous Agents
We present a novel method for guiding a large-scale swarm of autonomous
agents into a desired formation shape in a distributed and scalable manner. Our
Probabilistic Swarm Guidance using Inhomogeneous Markov Chains (PSG-IMC)
algorithm adopts an Eulerian framework, where the physical space is partitioned
into bins and the swarm's density distribution over each bin is controlled.
Each agent determines its bin transition probabilities using a
time-inhomogeneous Markov chain. These time-varying Markov matrices are
constructed by each agent in real-time using the feedback from the current
swarm distribution, which is estimated in a distributed manner. The PSG-IMC
algorithm minimizes the expected cost of the transitions per time instant,
required to achieve and maintain the desired formation shape, even when agents
are added to or removed from the swarm. The algorithm scales well with a large
number of agents and complex formation shapes, and can also be adapted for area
exploration applications. We demonstrate the effectiveness of this proposed
swarm guidance algorithm by using results of numerical simulations and hardware
experiments with multiple quadrotors. Comment: Submitted to IEEE Transactions on Robotic
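The Eulerian view described above, where each agent moves between bins according to a Markov matrix so that the swarm density approaches a desired shape, can be sketched as follows. This is not the PSG-IMC construction: the abstract's matrices are time-inhomogeneous and use feedback from the estimated swarm distribution, whereas this sketch swaps in a fixed Metropolis-Hastings matrix whose stationary distribution is the target density. It also assumes every bin is reachable from every other bin in one step and that all target densities are positive. All names are hypothetical.

```python
import random

def metropolis_matrix(target):
    """Row-stochastic matrix whose stationary distribution is `target`.

    Uses a uniform proposal over all bins (an assumption) and the
    Metropolis acceptance ratio min(1, target[j] / target[i]).
    """
    n = len(target)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i][j] = min(1.0, target[j] / target[i]) / n
        P[i][i] = 1.0 - sum(P[i])  # remaining probability: stay in the bin
    return P

def propagate(density, P):
    """One step of the expected swarm density: d_next = d * P."""
    n = len(density)
    return [sum(density[i] * P[i][j] for i in range(n)) for j in range(n)]

def sample_next_bin(current, P, rng):
    """An individual agent draws its next bin from its row of P."""
    return rng.choices(range(len(P)), weights=P[current])[0]

target = [0.5, 0.3, 0.2]       # desired fraction of the swarm per bin
P = metropolis_matrix(target)
density = [1.0, 0.0, 0.0]      # all agents start in bin 0
for _ in range(200):
    density = propagate(density, P)

next_bin = sample_next_bin(0, P, random.Random(0))
```

Each agent only needs its own row of `P`, so the scheme involves no agent-to-agent assignment and is unaffected by agents joining or leaving. Reducing unnecessary transitions once the shape is reached is what the feedback-based matrices in the abstract are meant to address.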
Self-organizing Network Optimization via Placement of Additional Nodes
The main research area of the International Graduate School on Mobile
Communication (GS Mobicom) at Ilmenau University of Technology is
communication in disaster scenarios. Due to a disaster or an accident, the
network infrastructure can be damaged or even completely destroyed.
However, available communication networks play a vital role during the
rescue activities especially for the coordination of the rescue teams and
for the communication between their members. Such a communication service
can be provided by a Mobile Ad-Hoc Network (MANET). One of the typical
problems of a MANET is network partitioning, when separate groups of nodes
become isolated from each other. One possible solution for this problem is
the placement of additional nodes in order to reconstruct the communication
links between isolated network partitions. The primary goal of this work is
the research and development of algorithms and methods for the placement of
additional nodes. The focus of this research lies on the investigation of
distributed algorithms for the placement of additional nodes, which use
only the information from the nodes’ local environment and thus form a
self-organizing system. However, when the system is deployed in a disaster
scenario, global information about the topology of the network to be
recovered may be known or collected in advance. In this case, it is
reasonable to use this information to calculate the placement positions
more precisely. The work provides the description, the implementation
details and the evaluation of a self-organizing system that is able to
recover from network partitioning in both situations.
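The partitioning problem and the additional-node remedy described above can be illustrated with a small centralized sketch. It assumes a disc communication model with a common radio range and uses global node positions, so it corresponds to the case where topology information is known in advance, not to the distributed algorithms the thesis investigates. The placement rule (midpoint of the closest pair of nodes across two partitions) and all names are assumptions for illustration.

```python
import math

def partitions(nodes, radio_range):
    """Group node indices into connected components of the range graph."""
    unvisited = set(range(len(nodes)))
    groups = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = {start}, [start]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited
                    if math.dist(nodes[i], nodes[j]) <= radio_range}
            unvisited -= near
            group |= near
            frontier.extend(near)
        groups.append(sorted(group))
    return sorted(groups)

def relay_position(nodes, group_a, group_b):
    """Midpoint of the closest pair of nodes taken from two partitions."""
    i, j = min(((a, b) for a in group_a for b in group_b),
               key=lambda pair: math.dist(nodes[pair[0]], nodes[pair[1]]))
    return tuple((u + v) / 2.0 for u, v in zip(nodes[i], nodes[j]))

nodes = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
groups = partitions(nodes, radio_range=1.2)            # two isolated groups
relay = relay_position(nodes, groups[0], groups[1])
reconnected = partitions(nodes + [relay], radio_range=1.2)
```

A single relay at the midpoint only bridges a gap of at most twice the radio range; wider gaps would need a chain of additional nodes.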
Distributed, adaptive deployment for nonholonomic mobile sensor networks : theory and experiments
In this work we show the Lyapunov stability and convergence of an adaptive and decentralized coverage control for a team of mobile sensors. This new approach assumes nonholonomic sensors rather than the usual holonomic sensors found in the literature. The kinematics of the unicycle model and a nonlinear control law in polar coordinates are used to prove the stability of the controller applied to a team of mobile sensors. This controller is adaptive, which means that the mobile sensors are able to estimate and map a density function in the sampling space without prior knowledge of the environment. The controller is decentralized, which means that each mobile sensor has its own estimate and computes its own control input based on local information. To guarantee convergence of the estimate, the mobile sensors implement a consensus protocol in continuous time, assuming a fixed network topology and zero communication delays. The convergence and feasibility of the coverage control algorithm are verified through simulations in Matlab and Stage. The Matlab simulations consider only the kinematics of the mobile sensors, whereas the Stage simulations consider both the dynamics and the kinematics of the sensors. The Matlab simulations show successful results, since the sensor network carries out the coverage task and distributes itself over the estimated density function. The adaptive law, which is defined by a differential equation, must be approximated by a difference equation to be implementable in Stage. The Stage simulations show positive results; however, the system is not able to achieve an accurate estimation of the density function. In spite of that, the sensors carry out the coverage task, distributing themselves over the sampling space. Furthermore, some experiments are carried out using a team of four Pioneer 3-AT robots sensing a piecewise constant light distribution function.
The experimental results are satisfactory, since the robots carry out the coverage task. However, the accuracy of the estimation is affected by the approximation of the adaptation law by difference equations, the number of robots, and the sensor sensitivity. Based on the results of this research, the decentralized adaptive coverage control for nonholonomic vehicles has been analyzed from a theoretical approach and validated through simulation and experimentation with positive results. As future work we will investigate: (i) new techniques to improve the implementation of the adaptive law in real time, (ii) the consideration of the dynamics of the mobile sensors, and (iii) the stability and convergence of the adaptive law for continuous-time variant density function
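The step from a continuous-time consensus protocol to a difference equation, mentioned in the abstract above, can be sketched with a forward-Euler discretization. This is a generic illustration of the standard protocol dx_i/dt = sum over neighbours j of (x_j - x_i), not the thesis's adaptive law. It assumes scalar estimates, a fixed undirected line topology among four robots, zero delays, and a step size `dt` below one over the maximum node degree; the names and the initial values are hypothetical.

```python
def consensus_step(values, neighbors, dt):
    """Forward-Euler update: x_i <- x_i + dt * sum_j (x_j - x_i)."""
    return [x + dt * sum(values[j] - x for j in neighbors[i])
            for i, x in enumerate(values)]

# Assumed fixed line topology 0 - 1 - 2 - 3 (undirected links).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
estimates = [1.0, 4.0, 2.0, 5.0]           # hypothetical initial local estimates
average = sum(estimates) / len(estimates)  # preserved by the symmetric update

for _ in range(500):
    estimates = consensus_step(estimates, neighbors, dt=0.2)
```

With undirected links the sum of the estimates is unchanged at every step, so all robots converge to the initial average. A step size that is too large for the topology makes the discrete update diverge even though the continuous-time protocol is stable, which is one way a difference-equation approximation can degrade accuracy.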
- …