84,304 research outputs found

    Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

    Full text link
    Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.Comment: 13 pages, 4 figures, version 2, accepted at ANTS 201

    Learn to navigate: cooperative path planning for unmanned surface vehicles using deep reinforcement learning

    Get PDF
    Unmanned surface vehicle (USV) has witnessed a rapid growth in the recent decade and has been applied in various practical applications in both military and civilian domains. USVs can either be deployed as a single unit or multiple vehicles in a fleet to conduct ocean missions. Central to the control of USV and USV formations, path planning is the key technology that ensures the navigation safety by generating collision free trajectories. Compared with conventional path planning algorithms, the deep reinforcement learning (RL) based planning algorithms provides a new resolution by integrating a high-level artificial intelligence. This work investigates the application of deep reinforcement learning algorithms for USV and USV formation path planning with specific focus on a reliable obstacle avoidance in constrained maritime environments. For single USV planning, with the primary aim being to calculate a shortest collision avoiding path, the designed RL path planning algorithm is able to solve other complex issues such as the compliance with vehicle motion constraints. The USV formation maintenance algorithm is capable of calculating suitable paths for the formation and retain the formation shape robustly or vary shapes where necessary, which is promising to assist with the navigation in environments with cluttered obstacles. The developed three sets of algorithms are validated and tested in computer-based simulations and practical maritime environments extracted from real harbour areas in the UK

    Decentralized Multi-Robot Formation Control Using Reinforcement Learning

    Full text link
    This paper presents a decentralized leader-follower multi-robot formation control based on a reinforcement learning (RL) algorithm applied to a swarm of small educational Sphero robots. Since the basic Q-learning method is known to require large memory resources for Q-tables, this work implements the Double Deep Q-Network (DDQN) algorithm, which has achieved excellent results in many robotic problems. To enhance the system behavior, we trained two different DDQN models, one for reaching the formation and the other for maintaining it. The models use a discrete set of robot motions (actions) to adapt the continuous nonlinear system to the discrete nature of RL. The presented approach has been tested in simulation and real experiments which show that the multi-robot system can achieve and maintain a stable formation without the need for complex mathematical models and nonlinear control laws.Comment: 7 pages, 10 figures. To be published in 2023 International Conference on Information, Communication and Automation Technologies (ICAT

    Adaptive and extendable control of unmanned surface vehicle formations using distributed deep reinforcement learning

    Get PDF
    Future ocean exploration will be dominated by a large-scale deployment of marine robots such as unmanned surface vehicles (USVs). Without the involvement of human operators, USVs exploit oceans, especially the complex marine environments, in an unprecedented way with an increased mission efficiency. However, current autonomy level of USVs is still limited, and the majority of vessels are being remotely controlled. To address such an issue, artificial intelligence (AI) such as reinforcement learning can effectively equip USVs with high-level intelligence and consequently achieve full autonomous operation. Also, by adopting the concept of multi-agent intelligence, future trend of USV operations is to use them as a formation fleet. Current researches in USV formation control are largely based upon classical control theories such as PID, backstepping and model predictive control methods with the impact by using advanced AI technologies unclear. This paper, therefore, paves the way in this area by proposing a distributed deep reinforcement learning algorithm for USV formations. More importantly, using the proposed algorithm USV formations can learn two critical abilities, i.e. adaptability and extendibility that enable formations to arbitrarily increase the number of USVs or change formation shapes. The effectiveness of algorithms has been verified and validated through a number of computer-based simulations

    Adaptive and learning-based formation control of swarm robots

    Get PDF
    Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations are faced with a few open challenges including robust autonomy, and adaptive coordination based on the environment and operating conditions, particularly in swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation could be performed by swarm robots with limited communication and perception (e.g., Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between human and swarm robots (e.g., BristleBot) for artistic creation. In particular, we combine bio-inspired (i.e., flocking, foraging) techniques with learning-based control strategies (using artificial neural networks) for adaptive control of multi- robots. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We proceed by presenting a novel flocking control for UAV swarm using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP), and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy, and each UAV performs actions based on the local information it collects. In addition, to avoid collision among UAVs and guarantee flocking and navigation, a reward function is added with the global flocking maintenance, mutual reward, and a collision penalty. We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy using actor-critic networks and a global state space matrix. In the context of swarm robotics in arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walk to control the communication between a team of robots with swarming behavior for musical creation

    Deep Reinforcement Learning for Swarm Systems

    Full text link
    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents facilitating the development of more complex collective strategies.Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20

    Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning

    Full text link
    Recent advances in combining deep neural network architectures with reinforcement learning techniques have shown promising potential results in solving complex control problems with high dimensional state and action spaces. Inspired by these successes, in this paper, we build two kinds of reinforcement learning algorithms: deep policy-gradient and value-function based agents which can predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic light control agents receive a snapshot of the current state of a graphical traffic simulator and produce control signals. The policy-gradient based agent maps its observation directly to the control signal, however the value-function based agent first estimates values for all legal control signals. The agent then selects the optimal control action with the highest value. Our methods show promising results in a traffic network simulated in the SUMO traffic simulator, without suffering from instability issues during the training process
    • …
    corecore