1,288 research outputs found

    Deep Reinforcement Learning for Swarm Systems

    Full text link
    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents facilitating the development of more complex collective strategies.Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20

    Information-rich Task Allocation and Motion Planning for Heterogeneous Sensor Platforms

    Get PDF
    This paper introduces a novel stratified planning algorithm for teams of heterogeneous mobile sensors that maximizes information collection while minimizing resource costs. The main contribution of this work is the scalable unification of effective algorithms for de- centralized informative motion planning and decentralized high-level task allocation. We present the Information-rich Rapidly-exploring Random Tree (IRRT) algorithm, which is amenable to very general and realistic mobile sensor constraint characterizations, as well as review the Consensus-Based Bundle Algorithm (CBBA), offering several enhancements to the existing algorithms to embed information collection at each phase of the planning process. The proposed framework is validated with simulation results for networks of mobile sensors performing multi-target localization missions.United States. Air Force. Office of Scientific Research (Grant FA9550-08-1-0086)United States. Air Force. Office of Scientific Research. Multidisciplinary University Research Initiative (FA9550-08-1-0356

    Task allocation in group of nodes in the IoT: A consensus approach

    Get PDF
    The realization of the Internet of Things (IoT) paradigm relies on the implementation of systems of cooperative intelligent objects with key interoperability capabilities. In order for objects to dynamically cooperate to IoT applications' execution, they need to make their resources available in a flexible way. However, available resources such as electrical energy, memory, processing, and object capability to perform a given task, are often limited. Therefore, resource allocation that ensures the fulfilment of network requirements is a critical challenge. In this paper, we propose a distributed optimization protocol based on consensus algorithm, to solve the problem of resource allocation and management in IoT heterogeneous networks. The proposed protocol is robust against links or nodes failures, so it's adaptive in dynamic scenarios where the network topology changes in runtime. We consider an IoT scenario where nodes involved in the same IoT task need to adjust their task frequency and buffer occupancy. We demonstrate that, using the proposed protocol, the network converges to a solution where resources are homogeneously allocated among nodes. Performance evaluation of experiments in simulation mode and in real scenarios show that the algorithm converges with a percentage error of about±5% with respect to the optimal allocation obtainable with a centralized approach
    corecore