
    Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

    Swarm systems constitute a challenging problem for reinforcement learning (RL), as the algorithm needs to learn decentralized control policies that can cope with the limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and establishing a communication link. We evaluate our findings in a simulated 2D-physics environment and compare the implications of different communication protocols.
    Comment: 13 pages, 4 figures, version 2, accepted at ANTS 2018
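
    As a rough illustration of the histogram idea, the sketch below builds a fixed-size, swarm-size-invariant observation from the bearings of neighbors within a sensing radius. The function name, bin count, and radius are illustrative assumptions, not the paper's actual interface:

```python
import numpy as np

def bearing_histogram(positions, agent_idx, radius=2.0, n_bins=8):
    """Histogram over bearings to neighbors within `radius` (illustrative sketch).

    Each agent's observation is a fixed-size vector regardless of swarm size,
    which is what makes histogram-based protocols attractive for decentralized RL.
    """
    me = positions[agent_idx]
    counts = np.zeros(n_bins)
    for j, other in enumerate(positions):
        if j == agent_idx:
            continue
        offset = other - me
        if np.linalg.norm(offset) <= radius:
            angle = np.arctan2(offset[1], offset[0])            # bearing in [-pi, pi]
            counts[int((angle + np.pi) / (2 * np.pi) * n_bins) % n_bins] += 1
    return counts / max(counts.sum(), 1.0)  # normalize: scale-invariant to swarm size

# Example: 5 agents; agent 0 sees nearby neighbors concentrated in a few bins.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [1.5, 0.5], [3.0, 0.0], [-1.0, -0.5]])
print(bearing_histogram(pos, 0))
```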

    Deep Reinforcement Learning for Swarm Systems

    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup, we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.
    Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20)
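
    The empirical mean embedding itself is simple to state: treat the neighbor states as samples and average a feature map over them, mu = (1/N) * sum_j phi(x_j). The sketch below uses radial basis function features on a fixed grid of centers; the grid, bandwidth, and names are illustrative assumptions (the paper also learns phi end-to-end with a neural network):

```python
import numpy as np

def rbf_mean_embedding(neighbor_states, centers, sigma=1.0):
    """Empirical mean embedding mu = (1/N) * sum_j phi(x_j) with RBF features.

    `neighbor_states`: (N, d) array of observed neighbor states.
    `centers`: (K, d) array of RBF centers.
    The output has fixed size K, independent of N, and is permutation
    invariant -- the two swarm properties the representation exploits.
    """
    diffs = neighbor_states[:, None, :] - centers[None, :, :]   # (N, K, d)
    sq_dist = np.sum(diffs ** 2, axis=-1)                       # (N, K)
    phi = np.exp(-sq_dist / (2 * sigma ** 2))                   # (N, K)
    return phi.mean(axis=0)                                     # (K,)

# Same embedding for any agent ordering and (approximately) any swarm size:
states = np.random.randn(12, 2)                 # 12 neighbors, 2-D states
centers = np.stack(np.meshgrid(np.linspace(-2, 2, 3),
                               np.linspace(-2, 2, 3)), -1).reshape(-1, 2)
print(rbf_mean_embedding(states, centers).shape)  # (9,)
```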

    Vision-Based Lane-Changing Behavior Detection Using Deep Residual Neural Network

    Accurate lane localization and lane-change detection are crucial in advanced driver assistance systems and autonomous driving systems for safer and more efficient trajectory planning. Conventional localization devices such as the Global Positioning System provide only road-level resolution for car navigation, which is insufficient for lane-level decision making. The state-of-the-art technique for lane localization is to use Light Detection and Ranging (LiDAR) sensors to correct the global localization error and achieve centimeter-level accuracy, but the real-time implementation and popularization of LiDAR are still limited by its computational burden and current cost. As a cost-effective alternative, vision-based lane-change detection has been highly regarded for affordable autonomous vehicles to support lane-level localization. A deep learning-based computer vision system is developed to detect lane-change behavior using the images captured by a front-view camera mounted on the vehicle and data from the inertial measurement unit for highway driving. Testing results on real-world driving data have shown that the proposed method is robust, works in real time, and achieves around 87% lane-change detection accuracy. Compared to the average human reaction to visual stimuli, the proposed computer vision system works 9 times faster, which makes it capable of helping make life-saving decisions in time.
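
    A minimal sketch of one plausible architecture for this kind of system: a residual-network image backbone with late fusion of IMU features. The class count, IMU dimensionality, and fusion scheme here are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn
from torchvision import models

class LaneChangeNet(nn.Module):
    """Sketch of a residual-network lane-change classifier (assumed 3 classes:
    keep lane / change left / change right)."""
    def __init__(self, n_imu_features=6, n_classes=3):
        super().__init__()
        self.backbone = models.resnet18(weights=None)   # front-view camera frames
        feat_dim = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()                # keep 512-d image features
        # Late fusion: concatenate image features with IMU readings (e.g. yaw rate).
        self.head = nn.Sequential(
            nn.Linear(feat_dim + n_imu_features, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, image, imu):
        feats = self.backbone(image)                    # (B, 512)
        return self.head(torch.cat([feats, imu], dim=1))

model = LaneChangeNet()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 6))
print(logits.shape)  # torch.Size([2, 3])
```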

    The State-of-the-art of Coordinated Ramp Control with Mixed Traffic Conditions

    Ramp metering, a traditional traffic control strategy for conventional vehicles, has been widely deployed around the world since the 1960s. The last decade, on the other hand, has witnessed significant advances in connected and automated vehicle (CAV) technology and its great potential for improving safety, mobility, and environmental sustainability. Therefore, a large amount of research has been conducted on cooperative ramp merging for CAVs only. However, it is expected that the phase of mixed traffic, namely the coexistence of both human-driven vehicles and CAVs, will last for a long time. Since there is little research on system-wide ramp control under mixed traffic conditions, this paper aims to close the gap by proposing an innovative system architecture and reviewing the state-of-the-art studies on the key components of the proposed system. These components include traffic state estimation, ramp metering, driving behavior modeling, and coordination of CAVs. Together, the reviewed literature plots an extensive landscape for the proposed system-wide coordinated ramp control under mixed traffic conditions.
    Comment: 8 pages, 1 figure, IEEE Intelligent Transportation Systems Conference - ITSC 201
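
    For readers unfamiliar with the ramp metering component being reviewed, the classical local feedback law ALINEA is a useful reference point (it is not claimed to be this paper's method): the metering rate is adjusted in proportion to the gap between target and measured downstream occupancy. A minimal sketch with illustrative numbers:

```python
def alinea(r_prev, occ_meas, occ_target=18.0, K_R=70.0, r_min=200.0, r_max=1800.0):
    """One step of the classical ALINEA local ramp-metering law.

    r(k) = r(k-1) + K_R * (o_target - o_meas), where occupancy is measured
    downstream of the merge (in percent) and the metering rate r is in veh/h,
    clipped to feasible bounds. All numbers here are illustrative placeholders.
    """
    r = r_prev + K_R * (occ_target - occ_meas)
    return min(max(r, r_min), r_max)

# Occupancy above target -> stricter metering (lower rate), and vice versa:
rate = 900.0
for occ in [20.0, 22.0, 19.0, 17.0]:
    rate = alinea(rate, occ)
    print(round(rate, 1))
```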

    A Study of AI Population Dynamics with Million-agent Reinforcement Learning

    We conduct an empirical study on discovering the ordered collective dynamics of a population of intelligent agents, driven by million-agent reinforcement learning. Our intention is to put intelligent agents into a simulated natural context and verify whether principles developed in the real world can also be used to understand an artificially created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed using only findings, or their logical equivalents, that have been discovered in nature. We endow the agents with intelligence based on deep reinforcement learning (DRL). In order to scale the population size up to millions of agents, a large-scale DRL training platform with a redesigned experience buffer is proposed. Our results show that the population dynamics of AI agents, driven only by each agent's individual self-interest, reveal an ordered pattern similar to the Lotka-Volterra model studied in population biology. We further discover emergent collective adaptations by studying how the agents' grouping behaviors change with the environmental resources. Both findings can be explained by the theory of self-organization in nature.
    Comment: Full version of the paper presented at AAMAS 2018 (International Conference on Autonomous Agents and Multiagent Systems)
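
    For reference, the Lotka-Volterra model that the observed dynamics are compared against is a pair of coupled ordinary differential equations; a minimal Euler-integration sketch (with arbitrary parameters, not the paper's simulation) is given below:

```python
import numpy as np

def lotka_volterra(x0, y0, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5,
                   dt=0.001, steps=20000):
    """Euler integration of the Lotka-Volterra predator-prey equations:
        dx/dt = alpha*x - beta*x*y     (prey)
        dy/dt = delta*x*y - gamma*y    (predators)
    The abstract reports that million-agent RL populations trace out cycles
    qualitatively similar to these.
    """
    x, y = x0, y0
    traj = []
    for _ in range(steps):
        dx = alpha * x - beta * x * y
        dy = delta * x * y - gamma * y
        x, y = x + dt * dx, y + dt * dy
        traj.append((x, y))
    return np.array(traj)

traj = lotka_volterra(x0=10.0, y0=5.0)
print(traj.min(axis=0), traj.max(axis=0))  # populations oscillate, neither dies out
```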

    Machine Learning for Unmanned Aerial System (UAS) Networking

    Fueled by the advancement of 5G New Radio (5G NR), rapid development has occurred in many fields. Compared with conventional approaches, beamforming and network slicing enable 5G NR to achieve a tenfold improvement in latency, connection density, and experienced throughput over 4G Long Term Evolution (4G LTE). These advantages pave the way for the evolution of Cyber-Physical Systems (CPS) on a large scale. Reduced consumption, advances in control engineering, and the simplification of the Unmanned Aircraft System (UAS) make large-scale UAS networking deployment feasible, and such networks can carry out multiple complex missions simultaneously. However, the limitations of conventional approaches still make it challenging to trade off massive management against efficient networking on a large scale.

    With 5G NR and machine learning, the contributions of this dissertation can be summarized as follows. I proposed a novel Optimized Ad-hoc On-demand Distance Vector (OAODV) routing protocol to improve the throughput of intra-UAS networking; the protocol reduces system overhead and is efficient. To improve security, I proposed a blockchain scheme to mitigate malicious base stations in cellular-connected UAS networking, along with a proof-of-traffic (PoT) mechanism to improve the efficiency of blockchain for UAS networking on a large scale. Inspired by the biological cell paradigm, I proposed cell-wall routing protocols for heterogeneous UAS networking. With 5G NR, interconnections between UAS networks can strengthen the throughput and elasticity of UAS networking. With machine learning, the routing schedules for intra- and inter-UAS networking can enhance throughput on a large scale; in particular, inter-UAS networking can achieve globally max-min throughput via edge coloring, and I leveraged upper and lower bounds to accelerate the edge-coloring optimization, as sketched below.

    This dissertation paves the way for UAS networking that integrates CPS and machine learning. UAS networking can achieve outstanding performance in a decentralized architecture. Concurrently, the dissertation gives insights into UAS networking on a large scale, which are fundamental to integrating UAS into the National Airspace System (NAS) and critical to aviation in both the manned and unmanned fields. The proposed approaches extend the state of the art of UAS networking in a decentralized architecture, and all of them can contribute to the establishment of UAS networking with CPS.
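
    To make the edge-coloring connection concrete: in a proper edge coloring, no two links sharing a node receive the same color, so colors can be read as interference-free transmission slots for inter-UAS links. The greedy sketch below is purely illustrative, not the dissertation's algorithm:

```python
from collections import defaultdict

def greedy_edge_coloring(edges):
    """Greedy proper edge coloring: no two edges sharing a vertex get the
    same color. This sketch may use more colors than the optimum; the max
    vertex degree Delta lower-bounds the chromatic index and Vizing's
    theorem guarantees Delta + 1 colors always suffice -- the kind of
    upper/lower bounds that can bracket and accelerate an exact optimization.
    """
    used = defaultdict(set)        # vertex -> colors already used at that vertex
    coloring = {}
    for u, v in edges:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1                 # smallest color free at both endpoints
        coloring[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return coloring

links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # a small UAS link graph
print(greedy_edge_coloring(links))
```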

    Multi-Agent Reinforcement Learning for Dynamic Ocean Monitoring by a Swarm of Buoys

    Autonomous marine environmental monitoring traditionally encompasses an area coverage problem, which can only be carried out effectively by a multi-robot system. In this paper, we focus on robotic swarms that are typically operated and controlled by means of simple swarming behaviors obtained from a subtle, yet ad hoc, combination of bio-inspired strategies. We propose a novel and structured approach to area coverage using multi-agent reinforcement learning (MARL) which effectively deals with the non-stationarity of environmental features. Specifically, we propose two dynamic area coverage approaches: (1) swarm-based MARL and (2) coverage-range-based MARL. The former is trained using the multi-agent deep deterministic policy gradient (MADDPG) approach, whereas for the latter we introduce a modified version of MADDPG with a reward function that intrinsically leads to collective behavior. Both methods are tested and validated on regions of different geometric shape but equal surface area (square vs. rectangle), yielding acceptable area coverage and benefiting from structured learning in non-stationary environments. Both approaches are advantageous compared to a naïve swarming method; however, coverage-range-based MARL outperforms swarm-based MARL, with stronger convergence in the learning criteria and a wider spread of agents for area coverage.
    Comment: Accepted for publication at IEEE/MTS OCEANS 202
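
    The structural idea of MADDPG underlying both approaches is centralized training with decentralized execution: each agent's actor sees only its own observation, while a critic used during training sees everyone's observations and actions. A minimal sketch with illustrative dimensions (not the paper's networks or hyperparameters):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized actor: each buoy maps its own observation to an action."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """MADDPG-style critic: sees all agents' observations and actions during
    training, while execution stays decentralized."""
    def __init__(self, n_agents, obs_dim, act_dim):
        super().__init__()
        joint = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=1))

n, obs_dim, act_dim = 4, 8, 2                      # 4 buoys (illustrative)
actors = [Actor(obs_dim, act_dim) for _ in range(n)]
critic = CentralizedCritic(n, obs_dim, act_dim)
obs = torch.randn(1, n, obs_dim)
acts = torch.stack([actors[i](obs[:, i]) for i in range(n)], dim=1)
print(critic(obs.flatten(1), acts.flatten(1)).shape)  # torch.Size([1, 1])
```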

    COOPERATIVE LEARNING FOR THE CONSENSUS OF MULTI-AGENT SYSTEMS

    Due to the substantial attention given to multi-agent systems in recent years, the consensus algorithm has gained immense popularity for building fault-tolerant systems in systems and control theory. Generally, a consensus algorithm drives the swarm of agents to work as a coherent group that can reach agreement regarding a certain quantity of interest, which depends on the states of all the agents themselves. The most common consensus algorithm is average consensus, whose final consensus value equals the average of the initial values. If we want the agents to find the best area of a particular resource, however, average consensus will fail; the algorithm is thus restricted by its inability to solve such optimization problems. In this dissertation, we want the agents to become more intelligent so that they can handle different optimization problems. Based on this idea, we first design a new consensus algorithm that modifies the general bat algorithm. Since the bat algorithm is a swarm intelligence method proven suitable for solving optimization problems, this modification is fairly straightforward: the optimization problem suggests the convergence direction. In order to accelerate convergence, we also incorporate a term related to a flux function, which serves as an energy/mass exchange rate in compartmental modeling or a heat transfer rate in thermodynamics. This term is inspired by the speed-up and speed-down strategy of biological swarms. We prove the stability of the proposed consensus algorithm in detail for both linear and nonlinear flux functions, using the matrix paracontraction tool and the Lyapunov-based method, respectively. Another direction we pursue is to use deep reinforcement learning to train the agents to reach the consensus state. By letting the agents learn their input commands this way, they can become more intelligent without human intervention, and we can entirely bypass the complex mathematical modeling involved in designing a protocol for the general consensus problem. The deep deterministic policy gradient (DDPG) algorithm is used to plan the agents' commands in the continuous domain, and mobile robot systems are considered for verifying the effectiveness of the algorithm.
    Adviser: Qing Hu
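
    For contrast with the dissertation's extensions, the baseline average consensus update it generalizes can be written in a few lines; a minimal sketch on a fixed undirected graph:

```python
import numpy as np

def average_consensus(x, adjacency, eps=0.1, iters=100):
    """Standard discrete-time average consensus:
        x_i <- x_i + eps * sum_j a_ij * (x_j - x_i)
    On a connected undirected graph with eps < 1/(max degree), every state
    converges to the average of the initial values -- the behavior that
    fails for the resource-seeking optimization tasks described above.
    """
    x = np.asarray(x, dtype=float)
    L = np.diag(adjacency.sum(axis=1)) - adjacency       # graph Laplacian
    for _ in range(iters):
        x = x - eps * L @ x
    return x

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])                             # 4-agent ring
print(average_consensus([1.0, 3.0, 5.0, 7.0], A))        # -> all approx 4.0
```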