340 research outputs found

Adaptive and learning-based formation control of swarm robots

Author: Salimi Mahsoo
Publication date: 14/10/2021

Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations face several open challenges, including robust autonomy and adaptive coordination based on the environment and operating conditions, particularly in swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation can be achieved by swarm robots with limited communication and perception (e.g., the Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between humans and swarm robots (e.g., BristleBot) for artistic creation. In particular, we combine bio-inspired techniques (i.e., flocking, foraging) with learning-based control strategies (using artificial neural networks) for adaptive control of multi-robot systems. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We proceed by presenting a novel flocking control method for UAV swarms using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP), and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy, and each UAV performs actions based on the local information it collects. In addition, to avoid collisions among UAVs and guarantee flocking and navigation, the reward function combines a global flocking maintenance term, a mutual reward, and a collision penalty.
We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy using actor-critic networks and a global state space matrix. In the context of swarm robotics in the arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walk to control the communication between a team of robots with swarming behavior for musical creation.

Simon Fraser University Institutional Repository

Autonomous Unmanned Aerial Vehicle Navigation using Reinforcement Learning: A Systematic Review

Authors: AlMahamid Fadi; Grolinger Katarina
Publication venue: Scholarship@Western
Publication date: 24/08/2022

There is an increasing demand for using Unmanned Aerial Vehicles (UAVs), also known as drones, in applications such as package delivery, traffic monitoring, search and rescue operations, and military combat engagements. In all of these applications, the UAV navigates the environment autonomously, without human interaction, to perform specific tasks and avoid obstacles. Autonomous UAV navigation is commonly accomplished using Reinforcement Learning (RL), where agents act as experts in a domain to navigate the environment while avoiding obstacles. Understanding the navigation environment and algorithmic limitations plays an essential role in choosing the appropriate RL algorithm to solve the navigation problem effectively. Consequently, this study first identifies the main UAV navigation tasks and discusses navigation frameworks and simulation software. Next, RL algorithms are classified and discussed based on the environment, algorithm characteristics, abilities, and applications in different UAV navigation problems, which will help practitioners and researchers select the appropriate RL algorithms for their UAV navigation use cases. Moreover, the identified gaps and opportunities will drive UAV navigation research.

Scholarship@Western

Self-Organised Swarm Flocking with Deep Reinforcement Learning

Authors: Arvin Farshad; Bezcioglu Mehmet; Lennox Barry
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 17/03/2021

The University of Manchester - Institutional Repository

Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control

Authors: Jin Yue; Qiu Yunbo; Wang Jian; Zhan Yuzhu; Zhang Xudong
Publication date: 17/09/2022

Flocking control is a significant problem in multi-agent systems such as multi-agent unmanned aerial vehicles and multi-agent autonomous underwater vehicles, as it enhances the cooperativity and safety of agents. In contrast to traditional methods, multi-agent reinforcement learning (MARL) solves the problem of flocking control more flexibly. However, methods based on MARL suffer from sample inefficiency, since they require a huge number of experiences to be collected from interactions between agents and the environment. We propose a novel method, Pretraining with Demonstrations for MARL (PwD-MARL), which can utilize non-expert demonstrations collected in advance with traditional methods to pretrain agents. During pretraining, agents learn policies from demonstrations by MARL and behavior cloning simultaneously, and are prevented from overfitting to the demonstrations. By pretraining with non-expert demonstrations, PwD-MARL gives online MARL a warm start and thereby improves its sample efficiency. Experiments show that PwD-MARL improves sample efficiency and policy performance in the problem of flocking control, even with bad or few demonstrations. Comment: Accepted by IEEE Vehicular Technology Conference (VTC) 2022-Fal

arXiv.org e-Print Archive

Twin Delayed Deep Deterministic Policy Gradient-Based Target Tracking for Unmanned Aerial Vehicle with Achievement Rewarding and Multistage Training

Authors: Al-fadhali Najib; Alfandi Omar; Mosali Najmaddin Abo; Omar Rosli; Shamsudin Syariful Syafiq
Publication venue: ZU Scholars
Publication date: 22/02/2022

Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem. It requires handling a high level of nonlinearity and dynamics. Model-free control effectively handles the uncertain nature of the problem, and reinforcement learning (RL)-based approaches are good candidates for solving it. In this article, the Twin Delayed Deep Deterministic Policy Gradient algorithm (TD3), a recent and composite RL architecture, was explored as a tracking agent for the UAV-based target tracking problem. Several improvements on the original TD3 were also made. First, a proportional-differential controller was used to boost the exploration of the TD3 in training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward functions. This was accomplished by incorporating two exponential functions to limit the effect of velocity and acceleration and so prevent deformation in the policy function approximation. In addition, the concept of multistage training based on the dynamic variables was proposed as an alternative to one-stage combinatory training. Third, the reward function was enhanced with a piecewise decomposition to enable more stable learning behaviour of the policy and to move from the linear reward to the achievement formula. The training was conducted based on fixed target tracking followed by moving target tracking. The flight testing was conducted based on three types of target trajectories: fixed, square, and blinking. The multistage training achieved the best performance with both exponential and achievement rewarding for the fixed trained agent with the fixed and square moving target and for the combined agent with both exponential and achievement rewarding for a fixed trained agent in the case of a blinking target.
With respect to the traditional proportional-differential controller, the maximum error reduction rate is 86%. The developed achievement rewarding and the multistage training open the door to various applications of RL in target tracking.

ZU Scholars (Zayed University)

Sim-to-real transfer for fixed-wing uncrewed aerial vehicle: Pitch Control by High-Fidelity Modelling and Domain Randomization

Authors: Araujo-Estrada Sergio; Wada Daichi; Windsor Shane P
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 09/09/2022

Explore Bristol Research

COOPERATIVE LEARNING FOR THE CONSENSUS OF MULTI-AGENT SYSTEMS

Author: Liu Qishuai
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2019

As multi-agent systems have attracted considerable attention in recent years, the consensus algorithm has gained immense popularity for building fault-tolerant systems in system and control theory. Generally, the consensus algorithm drives a swarm of agents to work as a coherent group that can reach an agreement regarding a certain quantity of interest, which depends on the states of all the agents themselves. The most common consensus algorithm is the average consensus, whose final consensus value is equal to the average of the initial values. If we want the agents to find the best area for particular resources, the average consensus will fail. Thus the algorithm is restricted by its inability to solve some optimization problems. In this dissertation, we want the agents to become more intelligent so that they can handle different optimization problems. Based on this idea, we first design a new consensus algorithm which modifies the general bat algorithm. Since the bat algorithm is a swarm intelligence method and has been shown to be suitable for solving optimization problems, this modification is fairly straightforward. The optimization problem suggests the convergence direction. Also, in order to accelerate convergence, we incorporate a term related to the flux function, which serves as an energy/mass exchange rate in compartmental modeling or a heat transfer rate in thermodynamics. This term is inspired by the speed-up and speed-down strategy of biological swarms. We prove the stability of the proposed consensus algorithm for both linear and nonlinear flux functions in detail, using the matrix paracontraction tool and the Lyapunov-based method, respectively. Another direction we pursue is to use deep reinforcement learning to train the agents to reach the consensus state. By letting the agents learn the input command with this method, they can become more intelligent without human intervention.
With this method, we entirely bypass the complex mathematical model when designing the protocol for the general consensus problem. The deep deterministic policy gradient algorithm is used to plan the command of the agent in the continuous domain. Mobile robot systems are considered for verifying the effectiveness of the algorithm. Adviser: Qing Hu

DigitalCommons@University of Nebraska
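
The abstract above notes that in average consensus the final agreed value equals the average of the initial values. A minimal sketch of that property follows. It is not taken from the dissertation: the function name `average_consensus`, the four-agent ring topology, the step size, and the initial values are all assumptions chosen for illustration. It uses the standard discrete-time update in which each agent moves toward its neighbors' states.

```python
def average_consensus(values, neighbors, step=0.2, iterations=200):
    """Run discrete-time average consensus on an undirected graph.

    values:     initial state of each agent (hypothetical example data)
    neighbors:  dict mapping agent index -> list of neighbor indices
    step:       update gain; assumed smaller than 1 / (max node degree)
    """
    x = list(values)
    for _ in range(iterations):
        # Every agent moves toward its neighbors, using only local differences.
        x = [
            xi + step * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)
        ]
    return x


# Hypothetical 4-agent ring: each agent talks to its two adjacent agents.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
initial = [1.0, 5.0, 3.0, 7.0]
final = average_consensus(initial, ring)
print(final)
```

Because the graph is undirected, each update preserves the sum of the states, so the only value all agents can agree on is the initial average (4.0 for the example data). This also illustrates the limitation the abstract raises: the agreed value is fixed by the initial conditions, not by any objective the agents might want to optimize.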

Optimal active particle navigation meets machine learning

Authors: Liebchen Benno; Löwen Hartmut; Nasiri Mahdi
Publication date: 09/03/2023

The question of how "smart" active agents, like insects, microorganisms, or future colloidal robots, need to steer to optimally reach or discover a target, such as an odor source, food, or a cancer cell in a complex environment, has recently attracted great interest. Here, we provide an overview of recent developments regarding such optimal navigation problems, from the micro- to the macroscale, and give a perspective by discussing some of the challenges ahead. Besides exemplifying an elementary approach to optimal navigation problems, the article focuses on works utilizing machine learning-based methods. Such learning-based approaches can uncover highly efficient navigation strategies even for problems that involve, e.g., chaotic, high-dimensional, or unknown environments and are hardly solvable with conventional analytical or simulation methods. Comment: 7 pages, 3 figure

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)