427 research outputs found

    Deep Reinforcement Learning for Swarm Systems

    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies. Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20
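
    The mean-embedding idea in this abstract can be illustrated with a minimal Python sketch, assuming radial basis function features. The function names, the RBF centers and the bandwidth are illustrative assumptions and are not taken from the paper; the sketch only shows why the representation is invariant to agent ordering and independent of swarm size.

```python
import math


def rbf_features(state, centers, bandwidth=1.0):
    """Map one agent state (a tuple of floats) to one RBF activation per center."""
    feats = []
    for center in centers:
        sq_dist = sum((s - c) ** 2 for s, c in zip(state, center))
        feats.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    return feats


def mean_embedding(agent_states, centers, bandwidth=1.0):
    """Average the per-agent features into one fixed-size vector.

    The output length equals len(centers), whatever the number of agents,
    and does not depend on the order in which the agents are listed.
    """
    if not agent_states:
        raise ValueError("need at least one agent state")
    total = [0.0] * len(centers)
    for state in agent_states:
        for k, value in enumerate(rbf_features(state, centers, bandwidth)):
            total[k] += value
    return [t / len(agent_states) for t in total]


# Hypothetical 2-D positions and RBF centers, chosen only for illustration.
CENTERS = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
SWARM = [(0.1, 0.2), (1.5, 0.5), (0.9, 1.1)]
embedding = mean_embedding(SWARM, CENTERS)
```

    In a decentralized policy, a vector like `embedding` would take the place of the concatenated neighbor states; the learned neural network features mentioned in the abstract would replace `rbf_features`.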

    Distributed Control of Multi-Robot Deployment Motion


    Implementation of distributed partitioning algorithms using mobile Wheelphones

    This thesis presents the implementation process of partitioning algorithms, from the theoretical ideas to the experimental results

    Multi-Robot Persistent Coverage in Complex Environments

    Recent advances in mobile robotics and the growing development of affordable autonomous mobile robots have motivated extensive research in multi-robot systems. The complexity of these systems resides in the design of communication, coordination and control strategies to perform complex tasks that a single robot cannot. A particularly interesting task is persistent coverage, which aims to keep a given environment covered over time with a team of robotic agents. This problem is of interest in many applications such as vacuuming, cleaning a place where dust is continuously settling, lawn mowing or environmental monitoring. More recently, the appearance of useful unmanned aerial vehicles (UAVs) has encouraged the application of the coverage problem to surveillance and monitoring. This thesis focuses on the problem of persistently covering a continuous environment in increasingly difficult settings. First, we propose a receding-horizon optimal solution for a centralized system in a convex environment using dynamic programming. Then we look for distributed solutions, which are more robust, scalable and efficient. To deal with the lack of global information, we present a communication-effective distributed estimation algorithm that allows the robots to have an accurate estimate of the coverage of the environment even when they cannot exchange information with all the members of the team. Using this estimation, we propose two different solutions based on coverage goals, which are the points of the environment in which the coverage can be improved the most. The first method is a motion controller that combines a gradient term with a term that drives the robots to the goals, and which performs well in convex environments. For environments with some obstacles, the second method plans open paths to the goals that are optimal in terms of coverage. Finally, for complex, non-convex environments we propose a distributed algorithm to find equitable partitions for the robots, i.e., with an amount of work proportional to their capabilities. To cover its region, each robot plans optimal, finite-horizon paths through a graph of sweep-like paths. The final part of the thesis is devoted to discrete environments, in which only a finite set of points has to be covered. We propose a divide-and-conquer strategy that reduces the complexity of the problem by separating it into three smaller subproblems, each of which can be solved optimally. We first plan closed paths through the points, then calculate the optimal coverage times and actions to periodically satisfy the coverage required by the points, and finally join the individual plans of the robots into a collision-free team plan that minimizes simultaneous motions. This solution is eventually used for a novel application: domestic induction heating with mobile inductors. We adapt it to the particular setting of a domestic hob and demonstrate that it performs well on a real prototype.

    Probabilistic and Distributed Control of a Large-Scale Swarm of Autonomous Agents

    We present a novel method for guiding a large-scale swarm of autonomous agents into a desired formation shape in a distributed and scalable manner. Our Probabilistic Swarm Guidance using Inhomogeneous Markov Chains (PSG-IMC) algorithm adopts an Eulerian framework, where the physical space is partitioned into bins and the swarm's density distribution over each bin is controlled. Each agent determines its bin transition probabilities using a time-inhomogeneous Markov chain. These time-varying Markov matrices are constructed by each agent in real-time using the feedback from the current swarm distribution, which is estimated in a distributed manner. The PSG-IMC algorithm minimizes the expected cost of the transitions per time instant, required to achieve and maintain the desired formation shape, even when agents are added to or removed from the swarm. The algorithm scales well with a large number of agents and complex formation shapes, and can also be adapted for area exploration applications. We demonstrate the effectiveness of this proposed swarm guidance algorithm by using results of numerical simulations and hardware experiments with multiple quadrotors. Comment: Submitted to IEEE Transactions on Robotic
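
    The Eulerian view in this abstract, where agents hop between bins according to a Markov matrix whose stationary distribution is the desired formation, can be sketched in Python. This is not the PSG-IMC construction: the sketch swaps in a fixed (time-homogeneous) Metropolis-Hastings chain on a line of bins, without feedback from the current swarm distribution or a transition-cost objective. The function names and the target density are assumptions made for illustration.

```python
import random


def metropolis_matrix(target):
    """Row-stochastic matrix on a line of bins whose stationary
    distribution is `target` (all entries must be positive)."""
    n = len(target)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):  # propose each adjacent bin with prob. 1/2
            if 0 <= j < n:
                matrix[i][j] = 0.5 * min(1.0, target[j] / target[i])
        # Rejected or out-of-range proposals keep the agent in its bin.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


def propagate(density, matrix):
    """One step of the swarm density: x_next = x * M (row vector times matrix)."""
    n = len(density)
    return [sum(density[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def step_agent(current_bin, matrix, rng=random):
    """Each agent independently samples its next bin from its current row."""
    bins = range(len(matrix))
    return rng.choices(bins, weights=matrix[current_bin])[0]


# Hypothetical desired formation over four bins.
TARGET = [0.1, 0.2, 0.4, 0.3]
M = metropolis_matrix(TARGET)

density = [1.0, 0.0, 0.0, 0.0]  # all agents start in bin 0
for _ in range(500):
    density = propagate(density, M)
```

    After the loop, `density` is close to `TARGET`. Because every agent only needs its own row of `M`, the scheme does not depend on the number of agents, which is the scalability property the abstract emphasizes.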

    Self-organizing Network Optimization via Placement of Additional Nodes

    The main research area of the International Graduate School on Mobile Communication (GS Mobicom) at Ilmenau University of Technology is communication in disaster scenarios. Due to a disaster or an accident, the network infrastructure can be damaged or even completely destroyed. However, available communication networks play a vital role during rescue activities, especially for the coordination of the rescue teams and for the communication between their members. Such a communication service can be provided by a Mobile Ad-Hoc Network (MANET). One of the typical problems of a MANET is network partitioning, in which separate groups of nodes become isolated from each other. One possible solution for this problem is the placement of additional nodes in order to reconstruct the communication links between isolated network partitions. The primary goal of this work is the research and development of algorithms and methods for the placement of additional nodes. The focus of this research lies on the investigation of distributed algorithms for the placement of additional nodes, which use only the information from the nodes' local environment and thus form a self-organizing system. However, in the specific setting of a disaster scenario, global information about the topology of the network to be recovered may be known or collected in advance. In this case, it is reasonable to use this information in order to calculate the placement positions more precisely. The work provides the description, the implementation details and the evaluation of a self-organizing system which is able to recover from network partitioning in both situations

    Distributed, adaptive deployment for nonholonomic mobile sensor networks : theory and experiments

    In this work we show the Lyapunov stability and convergence of an adaptive and decentralized coverage control for a team of mobile sensors. This new approach assumes nonholonomic sensors rather than the usual holonomic sensors found in the literature. The kinematics of the unicycle model and a nonlinear control law in polar coordinates are used to prove the stability of the controller applied to a team of mobile sensors. This controller is adaptive, which means that the mobile sensors are able to estimate and map a density function in the sampling space without prior knowledge of the environment. The controller is decentralized, which means that each mobile sensor has its own estimate and computes its own control input based on local information. To guarantee convergence of the estimate, the mobile sensors implement a consensus protocol in continuous time, assuming a fixed network topology and zero communication delays. The convergence and feasibility of the coverage control algorithm are verified through simulations in Matlab and Stage. The Matlab simulations consider only the kinematics of the mobile sensors, whereas the Stage simulations consider both the dynamics and the kinematics of the sensors. The Matlab simulations show successful results, since the sensor network carries out the coverage task and distributes itself over the estimated density function. The adaptive law, which is defined by a differential equation, must be approximated by a difference equation to be implementable in Stage. The Stage simulations show positive results; however, the system is not able to achieve an accurate estimation of the density function. In spite of that, the sensors carry out the coverage task, distributing themselves over the sampling space. Furthermore, some experiments are carried out using a team of four Pioneer 3-AT robots sensing a piecewise constant light distribution function. The experimental results are satisfactory, since the robots carry out the coverage task. However, the accuracy of the estimation is affected by the approximation of the adaptation law by difference equations, the number of robots and the sensor sensitivity. Based on the results of this research, the decentralized adaptive coverage control for nonholonomic vehicles has been analyzed theoretically and validated through simulation and experimentation with positive results. As future work we will investigate: (i) new techniques to improve the implementation of the adaptive law in real time, (ii) the consideration of the dynamics of the mobile sensors, and (iii) the stability and convergence of the adaptive law for continuous-time variant density function
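
    Two ingredients named in this abstract, a continuous-time consensus protocol on a fixed topology and the approximation of a differential equation by a difference equation, can be sketched together in Python. The sketch assumes the standard linear averaging protocol x_i' = sum_j (x_j - x_i) and a forward-Euler discretization; the thesis's actual adaptive law and consensus terms are not given in the abstract, so the function names, the graph and the step size are all hypothetical.

```python
def consensus_step(values, neighbors, dt):
    """One forward-Euler step of x_i' = sum over neighbors j of (x_j - x_i)."""
    return [
        x_i + dt * sum(values[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(values)
    ]


def run_consensus(values, neighbors, dt=0.1, steps=2000):
    """Iterate the difference equation; dt must be small relative to node degree."""
    for _ in range(steps):
        values = consensus_step(values, neighbors, dt)
    return values


# Assumed fixed, undirected path graph 0 - 1 - 2 - 3 with zero delays.
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
INITIAL = [4.0, 0.0, 2.0, 6.0]  # hypothetical local estimates held by four sensors
FINAL = run_consensus(INITIAL, NEIGHBORS)
```

    On this undirected graph each step preserves the average of the values, so all four entries of `FINAL` approach 3.0. A step size that is too large makes the difference equation diverge even though the continuous-time protocol is stable, which is one way a discretized law can lose accuracy in practice.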