1,353 research outputs found
Enhancing Exploration and Safety in Deep Reinforcement Learning
A Deep Reinforcement Learning (DRL) agent tries to learn a policy maximizing a long-term objective by trial and error in large state spaces. However, this learning paradigm requires a non-trivial amount of interactions in the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria to consider while designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damage to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms to foster efficient exploration and safer behaviors in simulation and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely both on standard benchmarks, such as SafetyGym, and robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial to assess the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitate the development of further optimizations for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology to evaluate behaviors over desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free complementary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance both in single and multi-agent applications.
For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, proposing an architecture that favors cooperation without affecting exploration.
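The population-based plus gradient-based combination described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (toy fitness function, finite-difference stand-in for the policy gradient), not the thesis's actual algorithm: an evolutionary population supplies diverse exploration, while the best individual also receives a gradient-style refinement.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(theta):
    # Toy stand-in for an episode return: higher when theta is close to an
    # optimum (unknown to the agent) at [1.0, -0.5].
    target = np.array([1.0, -0.5])
    return -np.sum((theta - target) ** 2)

def hybrid_step(population, lr=0.1, noise=0.2):
    """One iteration mixing evolutionary selection with a gradient-style update.

    The population provides exploration; the elite is additionally refined by a
    finite-difference gradient estimate (a crude stand-in for a policy-gradient
    update).
    """
    fitness = np.array([evaluate(p) for p in population])
    elite = population[np.argmax(fitness)]
    grad = np.zeros_like(elite)
    for i in range(elite.size):
        e = np.zeros_like(elite)
        e[i] = 1e-3
        grad[i] = (evaluate(elite + e) - evaluate(elite - e)) / 2e-3
    refined = elite + lr * grad
    # Next generation: refined elite plus mutated copies for exploration.
    children = refined + noise * rng.standard_normal((len(population) - 1, elite.size))
    return np.vstack([refined, children])

pop = rng.standard_normal((8, 2))
for _ in range(200):
    pop = hybrid_step(pop)
best = pop[np.argmax([evaluate(p) for p in pop])]
print(best)  # approaches [1.0, -0.5]
```

The gradient step exploits local structure while the mutated children keep the search from collapsing onto a single mode, which is the division of labor the abstract describes.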
High-Dimensional Motion Planning and Learning Under Uncertain Conditions
Many existing path planning methods do not adequately account for uncertainty. Without uncertainty, these techniques work well, but in real-world environments they struggle due to inaccurate sensor models, arbitrarily moving obstacles, and uncertain action consequences. For example, picking up and storing children's toys is a simple task for humans. Yet, for a household robot the task can be daunting. The room must be modeled with sensors, which may or may not detect all the strewn toys. The robot must be able to detect and avoid the child who may be moving the very toys that the robot is tasked with cleaning. Finally, if the robot missteps and places a foot on a toy, it must be able to compensate for the unexpected consequences of its actions. This example demonstrates that even simple human tasks are fraught with uncertainties that must be accounted for in robotic path planning algorithms. This work presents the first steps towards migrating sampling-based path planning methods to real-world environments by addressing three different types of uncertainty: (1) model uncertainty, (2) spatio-temporal obstacle uncertainty (moving obstacles), and (3) action consequence uncertainty. Uncertainty is encoded directly into path planning through a data structure in order to successfully and efficiently identify safe robot paths in sensed environments with noise. This encoding produces paths with clearance comparable to other planning methods known for high clearance, but at an order of magnitude less computational cost. It also shows that formal control theory methods combined with path planning provide a technique that achieves a 95% collision-free navigation rate with 300 moving obstacles. Finally, it demonstrates that reinforcement learning can be combined with planning data structures to autonomously learn motion controls of a seven degree-of-freedom robot at a low computational cost despite the number of dimensions.
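One common way to encode sensing uncertainty directly into a sampling-based planner is a chance-constrained collision check. The sketch below assumes isotropic Gaussian position noise and circular obstacles; it is an illustrative pattern, not the specific data structure this work proposes.

```python
import numpy as np

def collision_prob_safe(point, obstacles, sigma, p_max=0.05):
    """Chance-constrained point check: reject samples whose probability of
    lying inside a circular obstacle exceeds p_max, assuming isotropic
    Gaussian position noise with standard deviation sigma.
    """
    # Conservative approximation: inflate each obstacle radius so that a
    # sample outside the inflated radius collides with probability < p_max.
    # For a 2-D isotropic Gaussian, P(||error|| > d) = exp(-d^2 / (2 sigma^2)).
    margin = sigma * np.sqrt(-2.0 * np.log(p_max))
    for center, radius in obstacles:
        if np.linalg.norm(point - np.asarray(center)) <= radius + margin:
            return False
    return True

obstacles = [((5.0, 5.0), 1.0)]
print(collision_prob_safe(np.array([5.0, 7.5]), obstacles, sigma=0.3))  # True
print(collision_prob_safe(np.array([5.0, 6.2]), obstacles, sigma=0.3))  # False
```

A planner using such a check naturally produces higher-clearance paths as sensor noise grows, since the effective obstacle footprint expands with sigma.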
Learning Resilient Behaviors for Navigation Under Uncertainty
Deep reinforcement learning has great potential to acquire complex, adaptive
behaviors for autonomous agents automatically. However, the underlying neural
network policies have not been widely deployed in real-world applications,
especially in safety-critical tasks (e.g., autonomous driving). One of
the reasons is that the learned policy cannot perform the flexible and resilient
behaviors of traditional methods to adapt to diverse environments. In this
paper, we consider the problem that a mobile robot learns adaptive and
resilient behaviors for navigating in unseen uncertain environments while
avoiding collisions. We present a novel approach for uncertainty-aware
navigation by introducing an uncertainty-aware predictor to model the
environmental uncertainty, and we propose a novel uncertainty-aware navigation
network to learn resilient behaviors in previously unknown environments. To
train the proposed uncertainty-aware network more stably and efficiently, we
present the temperature decay training paradigm, which balances exploration and
exploitation during the training process. Our experimental evaluation
demonstrates that our approach can learn resilient behaviors in diverse
environments and generate adaptive trajectories according to environmental
uncertainties.
Comment: accepted to ICRA 202
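The temperature decay idea is the standard Boltzmann-exploration pattern: sample actions from a softmax over value estimates and anneal the temperature from high (near-uniform exploration) to low (near-greedy exploitation). A minimal sketch with an assumed exponential schedule, not necessarily the paper's exact one:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax_action(q_values, temperature):
    """Boltzmann exploration: high temperature -> near-uniform (explore),
    low temperature -> near-greedy (exploit)."""
    z = np.asarray(q_values) / temperature
    z -= z.max()                          # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(p), p=p), p

# Exponentially decay the temperature over training steps (illustrative
# schedule and constants).
t0, t_min, decay = 5.0, 0.05, 0.995
q = [1.0, 2.0, 1.5]
for step in (0, 500, 2000):
    temperature = max(t_min, t0 * decay**step)
    _, probs = softmax_action(q, temperature)
    print(round(temperature, 3), np.round(probs, 3))
```

Early in training all actions stay plausible, which stabilizes learning; late in training the policy commits to the highest-value action.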
Local Navigation Among Movable Obstacles with Deep Reinforcement Learning
Autonomous robots would benefit greatly from the ability to manipulate
their environment to solve path planning tasks, known as the Navigation Among
Movable Obstacles (NAMO) problem. In this paper, we present a deep reinforcement
learning approach for solving NAMO locally, near narrow passages. We train
parallel agents in physics simulation using an Advantage Actor-Critic based
algorithm with a multi-modal neural network. We present an online policy that
is able to push obstacles in a non-axial-aligned fashion, react to unexpected
obstacle dynamics in real-time, and solve the local NAMO problem. Experimental
validation in simulation shows that the presented approach generalises to
unseen NAMO problems in unknown environments. We further demonstrate the
implementation of the policy on a real quadrupedal robot, showing that the
policy can deal with real-world sensor noise and uncertainties in unseen NAMO
tasks.
Comment: 7 pages, 7 figures, 4 table
Multi-Fidelity Reinforcement Learning with Gaussian Processes
We study the problem of Reinforcement Learning (RL) using as few real-world
samples as possible. A naive application of RL can be inefficient in large and
continuous state spaces. We present two versions of Multi-Fidelity
Reinforcement Learning (MFRL), model-based and model-free, that leverage
Gaussian Processes (GPs) to learn the optimal policy in a real-world
environment. In the MFRL framework, an agent uses multiple simulators of the
real environment to perform actions. With increasing fidelity in a simulator
chain, the number of samples used in successively higher simulators can be
reduced. By incorporating GPs in the MFRL framework, we empirically observe
substantial reductions in the number of samples for both the model-based and
the model-free versions. We examine the performance of our
algorithms through simulations and through real-world experiments for
navigation with a ground robot.
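The role of the GP in a multi-fidelity loop can be illustrated with plain GP regression: the posterior variance tells the agent where its current model is uncertain, and therefore where querying a higher-fidelity simulator (or the real robot) is worthwhile. A minimal numpy sketch with an RBF kernel, illustrative rather than the authors' implementation:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel on 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Standard GP regression posterior (mean and pointwise variance)."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    Kss = rbf(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)                      # samples from a cheap simulator
xq = np.array([1.5, 6.0])
mean, var = gp_posterior(x, y, xq)
# Near the data (x=1.5) the GP is confident; far away (x=6.0) the variance is
# high, signaling that more (or higher-fidelity) samples are needed there.
print(np.round(mean, 2), np.round(var, 3))
```

In the MFRL setting this uncertainty estimate is what lets the cheap simulators absorb most of the sample cost, escalating to expensive fidelities only where the GP is unsure.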
Surgical Subtask Automation for Intraluminal Procedures using Deep Reinforcement Learning
Intraluminal procedures have opened up a new sub-field of minimally invasive surgery that uses flexible instruments to navigate through complex luminal structures of the body, resulting in reduced invasiveness and improved patient benefits. One of the major challenges in this field is the accurate and precise control of the instrument inside the human body. Robotics has emerged as a promising solution to this problem. However, to achieve successful robotic intraluminal interventions, the control of the instrument needs to be automated to a large extent. The thesis first examines the state-of-the-art in intraluminal surgical robotics and identifies the key challenges in this field, which include the need for safe and effective tool manipulation, and the ability to adapt to unexpected changes in the luminal environment. To address these challenges, the thesis proposes several levels of autonomy that enable the robotic system to perform individual subtasks autonomously, while still allowing the surgeon to retain overall control of the procedure. The approach facilitates the development of specialized algorithms such as Deep Reinforcement Learning (DRL) for subtasks like navigation and tissue manipulation to produce robust surgical gestures. Additionally, the thesis proposes a safety framework that provides formal guarantees to prevent risky actions. The presented approaches are evaluated through a series of experiments using simulation and robotic platforms. The experiments demonstrate that subtask automation can improve the accuracy and efficiency of tool positioning and tissue manipulation, while also reducing the cognitive load on the surgeon. The results of this research have the potential to improve the reliability and safety of intraluminal surgical interventions, ultimately leading to better outcomes for patients and surgeons.
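The safety-framework idea, a verifier with formal guarantees sitting between the learned policy and the robot, follows the common "safety shield" pattern: check each proposed action against hard constraints and fall back to a known safe action on violation. A deliberately simplified sketch (box constraints and a trivial one-step motion model, both assumptions, not the thesis's framework):

```python
def safe_action(proposed, state, limits, fallback):
    """Minimal safety-shield pattern: a learned policy proposes an action;
    a verifier checks the predicted next state against hard constraints and
    substitutes a known safe fallback when the action would violate them."""
    lo, hi = limits
    # One-step prediction: next position = current position + action delta.
    predicted = [s + a for s, a in zip(state, proposed)]
    if all(lo <= p <= hi for p in predicted):
        return proposed            # verified safe: pass through unchanged
    return fallback                # would leave the safe set: override

# Tool tip at (0.9, 0.0) in a workspace bounded to [-1, 1] per axis.
print(safe_action((0.05, 0.1), (0.9, 0.0), (-1.0, 1.0), (0.0, 0.0)))  # passes
print(safe_action((0.2, 0.0), (0.9, 0.0), (-1.0, 1.0), (0.0, 0.0)))   # overridden
```

Because the shield reasons only about the constraint set and the motion model, its guarantee holds regardless of what the DRL policy learned, which is what makes the pattern attractive for surgical subtasks.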
Local Path Planning via Collision Learning
This thesis proposes a reinforcement learning based collision avoidance method. The problem can be defined as the ability of a robot to reach its goal point without colliding with other robots and obstacles. There are two kinds of collision avoidance problem: single-robot and multi-robot collision avoidance. The single-robot collision avoidance problem contains multiple dynamic obstacles and one agent robot. The objective of the agent robot is to reach its goal point while avoiding obstacles with random dynamics. The multi-robot collision avoidance problem contains multiple agent robots. It is also possible to include unknown dynamic obstacles in the problem. The agents should reach their own goal points without colliding with each other. If the environment contains unknown obstacles, the agents should avoid them as well.
To solve the problems, Collision Avoidance by Learning Collision (CALC) is proposed. CALC adopts the concept of reinforcement learning. The method is divided into two environments, training and planning. The training environment consists of one agent, one obstacle, and a training range. In the training environment, the agent learns how to collide with the obstacle and generates a colliding policy. In other words, when the agent collides with the obstacle, it receives a positive reward. On the other hand, when the agent escapes the training range without collision, it receives a negative reward. The planning environment contains multiple obstacles or robots and a single goal point. With the trained policy, the agent can solve the collision avoidance problem in the planning environment regardless of its dimension. Since the method learns collision, the generated policy must be inverted in the planning environment to avoid obstacles or robots. However, the policy is applied directly to the goal point so that the agent can `collide' with the goal. By combining both policies, the agent can avoid the obstacles or robots and reach the goal point simultaneously. In the training algorithm, the robot is assumed to be holonomic. Even though the trained policy is generated from a holonomic robot, the method can be applied to both holonomic and non-holonomic robots via a holonomic-to-non-holonomic conversion method.
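The inversion-and-composition step can be pictured as vector-field arithmetic. In the sketch below, a hand-written unit-vector "colliding policy" stands in for the learned network (an assumption for illustration): its output is negated for nearby obstacles and applied as-is to the goal, then the sum is renormalized.

```python
import numpy as np

def colliding_policy(rel):
    """Stand-in for the learned policy: steer straight at the object
    (a unit vector toward it). The real CALC policy is a trained network."""
    n = np.linalg.norm(rel)
    return rel / n if n > 0 else np.zeros_like(rel)

def calc_velocity(robot, goal, obstacles, avoid_radius=2.0):
    """Compose policies as described: apply the colliding policy directly to
    the goal, and its inverse to nearby obstacles (illustrative radius)."""
    v = colliding_policy(goal - robot)          # 'collide' with the goal
    for obs in obstacles:
        rel = obs - robot
        if np.linalg.norm(rel) < avoid_radius:
            v -= colliding_policy(rel)          # inverted: steer away
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

robot = np.array([0.0, 0.0])
goal = np.array([10.0, 0.0])
obstacles = [np.array([1.5, 0.3])]
print(np.round(calc_velocity(robot, goal, obstacles), 2))
```

With the obstacle slightly above the straight-line path, the composed velocity deflects below it while still making progress toward the goal; with no obstacles in range, the command reduces to pure goal attraction.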
CALC is applied to three problems: single holonomic robot, single non-holonomic robot, and multiple non-holonomic robot collision avoidance. The proposed method is validated both in robot simulation and in real-world experiments. For simulation, the Robot Operating System (ROS) based simulator Gazebo and the simple game library PyGame are used. The method is tested with both holonomic and non-holonomic robots in the simulation experiments. For the real-world planning experiment, the non-holonomic e-puck mobile robot is used. The learned policy from the simulation can be directly applied to the real-world robot without any calibration or retraining. The results show that the proposed method outperforms existing methods such as Reciprocal Velocity Obstacle (RVO), PrEference Appraisal Reinforcement Learning (PEARL), and Optimal Reciprocal Collision Avoidance (ORCA). In addition, it is shown that the proposed method is more efficient in terms of learning than existing learning-based methods.
1. Introduction
1.1 Motivations
1.2 Contributions
1.3 Organizations
2. Related Work
2.1 Reinforcement Learning
2.2 Classical Navigation Methods
2.3 Learning-Based Navigation Methods
3. Learning Collision
3.1 Introduction
3.2 Learning Collision
3.2.1 Markov Decision Process Setup
3.2.2 Training Algorithm
3.2.3 Experimental Results
4. Single Robot Collision Avoidance
4.1 Introduction
4.2 Holonomic Robot Obstacle Avoidance
4.2.1 Approach
4.2.2 Experimental Results
4.3 Non-Holonomic Robot Obstacle Avoidance
4.3.1 Approach
4.3.2 Experimental Results
5. Multi-Robot Collision Avoidance
5.1 Introduction
5.2 Approach
5.3 Experimental Results
5.3.1 Simulated Experiment
5.3.2 Real-World Experiment
5.3.3 Holonomic to Non-Holonomic Conversion Experiment
6. Conclusion
Bibliography
Abstract (in Korean)
Acknowledgements
LBGP: Learning Based Goal Planning for Autonomous Following in Front
This paper investigates a hybrid solution which combines deep reinforcement
learning (RL) and classical trajectory planning for the following in front
application. Here, an autonomous robot aims to stay ahead of a person as the
person freely walks around. Following in front is a challenging problem as the
user's intended trajectory is unknown and needs to be estimated, explicitly or
implicitly, by the robot. In addition, the robot needs to find a feasible way
to safely navigate ahead of the human's trajectory. Our deep RL module implicitly
estimates human trajectory and produces short-term navigational goals to guide
the robot. These goals are used by a trajectory planner to smoothly navigate
the robot to the short-term goals, and eventually in front of the user. We
employ curriculum learning in the deep RL module to efficiently achieve a high
return. Our system outperforms the state-of-the-art in following ahead and is
more reliable compared to end-to-end alternatives in both the simulation and
real world experiments. In contrast to a pure deep RL approach, we demonstrate
zero-shot transfer of the trained policy from simulation to the real world.
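The division of labor, an RL module emitting short-term goals that a classical planner tracks, can be sketched with simple stand-ins for both modules (linear extrapolation for the learned goal predictor and a bounded-step planner, both illustrative assumptions):

```python
import numpy as np

def rl_goal(robot, person, person_vel, lead=1.5):
    """Stand-in for the deep RL module: place a short-term goal ahead of the
    person's (estimated) motion direction. The real module is learned; this
    linear extrapolation is only illustrative."""
    n = np.linalg.norm(person_vel)
    heading = person_vel / n if n > 0 else np.array([1.0, 0.0])
    return person + lead * heading

def planner_step(robot, goal, step=0.5):
    """Stand-in for the classical trajectory planner: move a bounded step
    toward the current short-term goal."""
    d = goal - robot
    n = np.linalg.norm(d)
    return goal if n <= step else robot + step * d / n

robot = np.array([0.0, 1.0])
person = np.array([0.0, 0.0])
vel = np.array([1.0, 0.0])
for _ in range(10):
    person = person + 0.2 * vel          # person walks forward
    robot = planner_step(robot, rl_goal(robot, person, vel))
print(np.round(robot, 2), np.round(person, 2))
```

The loop converges to the robot holding a lead ahead of the walking person; in the paper the goal predictor is learned with curriculum RL, so it also handles turns and intent changes that a fixed extrapolation cannot.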
Adaptive and learning-based formation control of swarm robots
Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations face a few open challenges, including robust autonomy and adaptive coordination based on the environment and operating conditions, particularly in swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation could be performed by swarm robots with limited communication and perception (e.g., the Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between humans and swarm robots (e.g., the BristleBot) for artistic creation. In particular, we combine bio-inspired (i.e., flocking, foraging) techniques with learning-based control strategies (using artificial neural networks) for adaptive control of multi-robot systems. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We proceed by presenting a novel flocking control for UAV swarms using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP) and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy, and each UAV performs actions based on the local information it collects. In addition, to avoid collisions among UAVs and guarantee flocking and navigation, the reward function combines a global flocking-maintenance term, a mutual reward, and a collision penalty.
We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy using actor-critic networks and a global state space matrix. In the context of swarm robotics in the arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walk to control the communication between a team of robots with swarming behavior for musical creation.
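The composite reward described above (flocking maintenance, a mutual progress term, and a collision penalty) can be written down directly; the weights and distance thresholds below are illustrative guesses, not the thesis's tuned values.

```python
import numpy as np

def flocking_reward(positions, goal, d_ref=2.0, d_col=0.5):
    """Sketch of a composite flocking reward: a cohesion term that keeps
    inter-UAV distances near d_ref (flocking maintenance), a shared
    progress-to-goal term (mutual reward), and a penalty for pairs closer
    than d_col (collision penalty). Weights are illustrative."""
    n = len(positions)
    cohesion, collisions = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(positions[i] - positions[j])
            cohesion -= (d - d_ref) ** 2      # flocking maintenance
            if d < d_col:
                collisions += 1               # count colliding pairs
    centroid = positions.mean(axis=0)
    progress = -np.linalg.norm(centroid - goal)  # mutual (shared) reward
    return cohesion + progress - 10.0 * collisions

# Three UAVs in a near-equilateral triangle with side ~2, centroid near goal.
pos = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.7]])
goal = np.array([1.0, 0.6])
print(round(flocking_reward(pos, goal), 3))
```

Because every UAV shares the same scalar reward, the shaping is compatible with the centralized-training, decentralized-execution DDPG setup the thesis adopts.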
- โฆ