1,353 research outputs found

    Enhancing Exploration and Safety in Deep Reinforcement Learning

    Get PDF
    A Deep Reinforcement Learning (DRL) agent tries to learn a policy maximizing a long-term objective by trial and error in large state spaces. However, this learning paradigm requires a non-trivial amount of interaction with the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria that must be considered while designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damage to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms that foster efficient exploration and safer behaviors in simulated and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely both on standard benchmarks, such as SafetyGym, and on robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial for assessing the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitate the development of further optimizations for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology for evaluating behaviors against desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free, complementary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance in both single- and multi-agent applications. For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, and we propose an architecture that favors cooperation without affecting exploration
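
    The combination of population-based and gradient-based optimization mentioned in this abstract can be pictured with a small sketch. This is a minimal, illustrative example on a surrogate objective, not the thesis's actual algorithm: episode_return stands in for a real policy rollout and TARGET is a hypothetical optimum.

```python
# Minimal sketch: evolutionary (population-based) exploration combined with a
# gradient-based member, assuming a quadratic surrogate "episode_return"
# replaces a real DRL rollout.
import numpy as np

rng = np.random.default_rng(0)
TARGET = rng.normal(size=8)            # hypothetical optimum of the surrogate

def episode_return(theta):
    """Stand-in for the expected return of a policy with parameters theta."""
    return -np.sum((theta - TARGET) ** 2)

population = [rng.normal(size=8) for _ in range(16)]
for generation in range(50):
    scores = np.array([episode_return(p) for p in population])
    elite = [population[i] for i in np.argsort(scores)[-4:]]      # keep the best 4
    # Gradient-based member: refine the current best with an exact ascent step
    # (a real agent would use a policy-gradient estimate instead).
    grad_member = elite[-1] - 0.1 * 2.0 * (elite[-1] - TARGET)
    # Evolutionary members: Gaussian mutations of random elites diversify exploration.
    children = [elite[rng.integers(4)] + 0.05 * rng.normal(size=8) for _ in range(15)]
    population = children + [grad_member]

best = max(population, key=episode_return)
print("best surrogate return:", float(episode_return(best)))
```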

    High-Dimensional Motion Planning and Learning Under Uncertain Conditions

    Get PDF
    Many existing path planning methods do not adequately account for uncertainty. Without uncertainty, these existing techniques work well, but in real-world environments they struggle due to inaccurate sensor models, arbitrarily moving obstacles, and uncertain action consequences. For example, picking up and storing children's toys is a simple task for humans. Yet, for a household robot the task can be daunting. The room must be modeled with sensors, which may or may not detect all the strewn toys. The robot must be able to detect and avoid the child, who may be moving the very toys that the robot is tasked with cleaning. Finally, if the robot missteps and places a foot on a toy, it must be able to compensate for the unexpected consequences of its actions. This example demonstrates that even simple human tasks are fraught with uncertainties that must be accounted for in robotic path planning algorithms. This work presents the first steps towards migrating sampling-based path planning methods to real-world environments by addressing three different types of uncertainty: (1) model uncertainty, (2) spatio-temporal obstacle uncertainty (moving obstacles), and (3) action consequence uncertainty. Uncertainty is encoded directly into path planning through a data structure in order to successfully and efficiently identify safe robot paths in sensed environments with noise. This encoding produces paths with clearance comparable to other planning methods known for high clearance, but at an order of magnitude less computational cost. It also shows that formal control theory methods combined with path planning provide a technique that achieves a 95% collision-free navigation rate with 300 moving obstacles. Finally, it demonstrates that reinforcement learning can be combined with planning data structures to autonomously learn the motion controls of a seven-degree-of-freedom robot at a low computational cost despite the number of dimensions
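
    One way to picture "encoding uncertainty directly into the planning data structure" is a roadmap whose edge costs include a penalty proportional to the sensing uncertainty near each node, so a standard shortest-path search prefers low-uncertainty routes. The sketch below is illustrative only: the node positions, uncertainty values, and the LAMBDA weight are assumptions, not the thesis's data structure.

```python
# Minimal sketch of an uncertainty-weighted roadmap searched with Dijkstra.
import heapq
import math

nodes = {                      # node -> (x, y, uncertainty in [0, 1])
    "start": (0, 0, 0.0), "a": (1, 1, 0.8), "b": (1, -1, 0.1),
    "c": (2, 0, 0.2), "goal": (3, 0, 0.0),
}
edges = [("start", "a"), ("start", "b"), ("a", "c"), ("b", "c"), ("c", "goal")]
LAMBDA = 3.0                   # weight of the uncertainty penalty (assumed)

def cost(u, v):
    (x1, y1, s1), (x2, y2, s2) = nodes[u], nodes[v]
    return math.hypot(x2 - x1, y2 - y1) + LAMBDA * 0.5 * (s1 + s2)

graph = {n: [] for n in nodes}
for u, v in edges:
    graph[u].append(v)
    graph[v].append(u)

def dijkstra(source, target):
    dist, prev = {source: 0.0}, {}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue
        for v in graph[u]:
            nd = d + cost(u, v)
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, n = [target], target
    while n != source:
        n = prev[n]
        path.append(n)
    return list(reversed(path)), dist[target]

print(dijkstra("start", "goal"))    # routes through the low-uncertainty node "b"
```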

    Learning Resilient Behaviors for Navigation Under Uncertainty

    Full text link
    Deep reinforcement learning has great potential to automatically acquire complex, adaptive behaviors for autonomous agents. However, the underlying neural network policies have not been widely deployed in real-world applications, especially in safety-critical tasks (e.g., autonomous driving). One of the reasons is that the learned policy cannot perform flexible and resilient behaviors, as traditional methods do, to adapt to diverse environments. In this paper, we consider the problem of a mobile robot learning adaptive and resilient behaviors for navigating in unseen, uncertain environments while avoiding collisions. We present a novel approach for uncertainty-aware navigation by introducing an uncertainty-aware predictor to model the environmental uncertainty, and we propose a novel uncertainty-aware navigation network to learn resilient behaviors in previously unknown environments. To train the proposed uncertainty-aware network more stably and efficiently, we present the temperature decay training paradigm, which balances exploration and exploitation during the training process. Our experimental evaluation demonstrates that our approach can learn resilient behaviors in diverse environments and generate adaptive trajectories according to environmental uncertainties. Comment: accepted to ICRA 202
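
    The "temperature decay" idea of shifting from exploration to exploitation over training can be illustrated with a softmax (Boltzmann) action-selection rule whose temperature shrinks each step. The sketch below uses a toy multi-armed bandit; the schedule constants and the bandit itself are assumptions for illustration, not the paper's network or schedule.

```python
# Minimal sketch of temperature-decayed softmax exploration on a 4-armed bandit.
import numpy as np

rng = np.random.default_rng(1)

def softmax(q, temperature):
    z = q / max(temperature, 1e-6)
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

T0, T_MIN, DECAY = 1.0, 0.05, 0.995   # hypothetical schedule
true_means = np.array([0.1, 0.5, 0.3, 0.8])
q_values = np.zeros(4)
counts = np.zeros(4)
temperature = T0

for step in range(2000):
    probs = softmax(q_values, temperature)
    a = rng.choice(4, p=probs)                       # stochastic early, greedy late
    reward = true_means[a] + 0.1 * rng.normal()
    counts[a] += 1
    q_values[a] += (reward - q_values[a]) / counts[a]  # incremental mean update
    temperature = max(T_MIN, temperature * DECAY)       # temperature decay

print("estimated action values:", q_values.round(2))
```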

    Local Navigation Among Movable Obstacles with Deep Reinforcement Learning

    Full text link
    Autonomous robots would benefit greatly from gaining the ability to manipulate their environment to solve path planning tasks, known as the Navigation Among Movable Obstacles (NAMO) problem. In this paper, we present a deep reinforcement learning approach for solving NAMO locally, near narrow passages. We train parallel agents in physics simulation using an Advantage Actor-Critic based algorithm with a multi-modal neural network. We present an online policy that is able to push obstacles in a non-axially-aligned fashion, react to unexpected obstacle dynamics in real time, and solve the local NAMO problem. Experimental validation in simulation shows that the presented approach generalises to unseen NAMO problems in unknown environments. We further demonstrate the implementation of the policy on a real quadrupedal robot, showing that the policy can deal with real-world sensor noise and uncertainties in unseen NAMO tasks. Comment: 7 pages, 7 figures, 4 tables
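
    A multi-modal actor-critic of the kind referred to above can be sketched as two encoders (one for a local occupancy map, one for a robot/goal state vector) feeding shared policy and value heads, trained with an advantage actor-critic style loss. The architecture, input shapes, and loss weights below are assumptions for illustration, not the authors' network.

```python
# Minimal sketch of a multi-modal actor-critic with an A2C-style loss on fake data.
import torch
import torch.nn as nn

class MultiModalActorCritic(nn.Module):
    def __init__(self, n_actions=5):
        super().__init__()
        self.map_encoder = nn.Sequential(             # 1x32x32 local occupancy map
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),                              # 16 * 8 * 8 = 1024 features
        )
        self.state_encoder = nn.Sequential(nn.Linear(6, 32), nn.ReLU())
        self.policy = nn.Linear(1024 + 32, n_actions)  # actor head
        self.value = nn.Linear(1024 + 32, 1)           # critic head

    def forward(self, occ_map, state):
        h = torch.cat([self.map_encoder(occ_map), self.state_encoder(state)], dim=-1)
        return self.policy(h), self.value(h).squeeze(-1)

model = MultiModalActorCritic()
occ_map = torch.rand(4, 1, 32, 32)           # fake batch of local maps
state = torch.rand(4, 6)                     # fake robot/goal state vectors
actions = torch.randint(0, 5, (4,))
returns = torch.rand(4)                      # fake discounted returns

logits, values = model(occ_map, state)
log_probs = torch.log_softmax(logits, dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
advantage = (returns - values).detach()
policy_loss = -(log_probs * advantage).mean()
value_loss = (returns - values).pow(2).mean()
loss = policy_loss + 0.5 * value_loss
loss.backward()
```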

    Multi-Fidelity Reinforcement Learning with Gaussian Processes

    Full text link
    We study the problem of Reinforcement Learning (RL) using as few real-world samples as possible. A naive application of RL can be inefficient in large and continuous state spaces. We present two versions of Multi-Fidelity Reinforcement Learning (MFRL), model-based and model-free, that leverage Gaussian Processes (GPs) to learn the optimal policy in a real-world environment. In the MFRL framework, an agent uses multiple simulators of the real environment to perform actions. With increasing fidelity in a simulator chain, the number of samples used in successively higher simulators can be reduced. By incorporating GPs in the MFRL framework, we empirically observe up to a 40% reduction in the number of samples for model-based RL and a 60% reduction for the model-free version. We examine the performance of our algorithms through simulations and through real-world experiments for navigation with a ground robot
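
    The multi-fidelity idea can be pictured as a cheap simulator providing a rough estimate of a quantity of interest while a Gaussian Process, fit on a handful of expensive "real-world" samples, learns the residual between the two, so most queries never touch the real environment. The 1-D landscapes and function names below are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: GP-corrected low-fidelity predictions from a few real samples.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def low_fidelity(x):            # cheap, biased simulator
    return np.sin(3 * x)

def high_fidelity(x):           # expensive "real world"
    return np.sin(3 * x) + 0.4 * x

# A few expensive real-world samples.
x_real = np.array([[0.1], [0.8], [1.5], [2.2]])
residual = high_fidelity(x_real).ravel() - low_fidelity(x_real).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
gp.fit(x_real, residual)

# Multi-fidelity prediction = simulator output + GP-predicted residual.
x_query = np.linspace(0, 2.5, 6).reshape(-1, 1)
pred = low_fidelity(x_query).ravel() + gp.predict(x_query)
print(np.round(pred - high_fidelity(x_query).ravel(), 3))   # small correction error
```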

    Surgical Subtask Automation for Intraluminal Procedures using Deep Reinforcement Learning

    Get PDF
    Intraluminal procedures have opened up a new sub-field of minimally invasive surgery in which flexible instruments navigate through complex luminal structures of the body, resulting in reduced invasiveness and improved patient benefits. One of the major challenges in this field is the accurate and precise control of the instrument inside the human body. Robotics has emerged as a promising solution to this problem. However, to achieve successful robotic intraluminal interventions, the control of the instrument needs to be automated to a large extent. The thesis first examines the state of the art in intraluminal surgical robotics and identifies the key challenges in this field, which include the need for safe and effective tool manipulation and the ability to adapt to unexpected changes in the luminal environment. To address these challenges, the thesis proposes several levels of autonomy that enable the robotic system to perform individual subtasks autonomously, while still allowing the surgeon to retain overall control of the procedure. The approach facilitates the development of specialized algorithms, such as Deep Reinforcement Learning (DRL), for subtasks like navigation and tissue manipulation to produce robust surgical gestures. Additionally, the thesis proposes a safety framework that provides formal guarantees to prevent risky actions. The presented approaches are evaluated through a series of experiments using simulation and robotic platforms. The experiments demonstrate that subtask automation can improve the accuracy and efficiency of tool positioning and tissue manipulation, while also reducing the cognitive load on the surgeon. The results of this research have the potential to improve the reliability and safety of intraluminal surgical interventions, ultimately leading to better outcomes for patients and surgeons
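
    A safety framework that prevents risky actions is often realized as a "shield" between the learned policy and the actuator: every proposed action is checked against a formal constraint and projected back into the safe set before execution. The sketch below is a generic stand-in for that idea, not the thesis's framework; the velocity bound, the policy, and all constants are assumptions.

```python
# Minimal sketch of a safety shield that projects unsafe actions into a safe set.
import numpy as np

MAX_TIP_SPEED = 5.0          # hypothetical instrument-tip speed limit

def policy(observation):
    """Stand-in for a DRL policy: returns a desired tip velocity vector."""
    return np.array([8.0, -2.0, 1.0])   # deliberately violates the bound

def shield(action):
    """Project the action back into the safe set (norm-bounded velocity)."""
    speed = np.linalg.norm(action)
    return action if speed <= MAX_TIP_SPEED else action * (MAX_TIP_SPEED / speed)

obs = np.zeros(3)
safe_action = shield(policy(obs))
print(safe_action, float(np.linalg.norm(safe_action)))   # scaled to the limit
```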

    ์ถฉ๋Œ ํ•™์Šต์„ ํ†ตํ•œ ์ง€์—ญ ๊ฒฝ๋กœ ๊ณ„ํš ๋ฐฉ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2019. 2. ์ด๋ฒ”ํฌ.๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ฐ•ํ™” ํ•™์Šต ๊ธฐ๋ฐ˜์˜ ์ถฉ๋Œ ํšŒํ”ผ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ถฉ๋Œ ํšŒํ”ผ๋ž€ ๋กœ๋ด‡์ด ๋‹ค๋ฅธ ๋กœ๋ด‡ ๋˜๋Š” ์žฅ์• ๋ฌผ๊ณผ ์ถฉ๋Œ ์—†์ด ๋ชฉํ‘œ ์ง€์ ์— ๋„๋‹ฌํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•œ๋‹ค. ์ด ๋ฌธ์ œ๋Š” ๋‹จ์ผ ๋กœ๋ด‡ ์ถฉ๋Œ ํšŒํ”ผ์™€ ๋‹ค๊ฐœ์ฒด ๋กœ๋ด‡ ์ถฉ๋Œ ํšŒํ”ผ, ์ด๋ ‡๊ฒŒ ๋‘ ๊ฐ€์ง€๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋‹ค. ๋‹จ์ผ ๋กœ๋ด‡ ์ถฉ๋Œ ํšŒํ”ผ ๋ฌธ์ œ๋Š” ํ•˜๋‚˜์˜ ์ค‘์‹ฌ ๋กœ๋ด‡๊ณผ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์›€์ง์ด๋Š” ์žฅ์• ๋ฌผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. ์ค‘์‹ฌ ๋กœ๋ด‡์€ ๋žœ๋คํ•˜๊ฒŒ ์›€์ง์ด๋Š” ์žฅ์• ๋ฌผ์„ ํ”ผํ•ด ๋ชฉํ‘œ ์ง€์ ์— ๋„๋‹ฌํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•œ๋‹ค. ๋‹ค๊ฐœ์ฒด ๋กœ๋ด‡ ์ถฉ๋Œ ํšŒํ”ผ ๋ฌธ์ œ๋Š” ์—ฌ๋Ÿฌ ๋Œ€์˜ ์ค‘์‹ฌ ๋กœ๋ด‡์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. ์ด ๋ฌธ์ œ์—๋„ ์—ญ์‹œ ์žฅ์• ๋ฌผ์„ ํฌํ•จ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. ์ค‘์‹ฌ ๋กœ๋ด‡๋“ค์€ ์„œ๋กœ ์ถฉ๋Œ์„ ํšŒํ”ผํ•˜๋ฉด์„œ ๊ฐ์ž์˜ ๋ชฉํ‘œ ์ง€์ ์— ๋„๋‹ฌํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•œ๋‹ค. ๋งŒ์•ฝ ํ™˜๊ฒฝ์— ์˜ˆ์ƒ์น˜ ๋ชปํ•œ ์žฅ์• ๋ฌผ์ด ๋“ฑ์žฅํ•˜๋”๋ผ๋„, ๋กœ๋ด‡๋“ค์€ ๊ทธ๊ฒƒ๋“ค์„ ํ”ผํ•ด์•ผ ํ•œ๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ถฉ๋Œ ํšŒํ”ผ๋ฅผ ์œ„ํ•œ ์ถฉ๋Œ ํ•™์Šต ๋ฐฉ๋ฒ• (CALC) ์„ ์ œ์•ˆํ•œ๋‹ค. CALC๋Š” ๊ฐ•ํ™” ํ•™์Šต ๊ฐœ๋…์„ ์ด์šฉํ•ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ํ•™์Šต ๊ทธ๋ฆฌ๊ณ  ๊ณ„ํš ์ด๋ ‡๊ฒŒ ๋‘ ๊ฐ€์ง€ ํ™˜๊ฒฝ์œผ๋กœ ๊ตฌ์„ฑ ๋œ๋‹ค. ํ•™์Šต ํ™˜๊ฒฝ์€ ํ•˜๋‚˜์˜ ์ค‘์‹ฌ ๋กœ๋ด‡๊ณผ ํ•˜๋‚˜์˜ ์žฅ์• ๋ฌผ ๊ทธ๋ฆฌ๊ณ  ํ•™์Šต ์˜์—ญ์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. ํ•™์Šต ํ™˜๊ฒฝ์—์„œ ์ค‘์‹ฌ ๋กœ๋ด‡์€ ์žฅ์• ๋ฌผ๊ณผ ์ถฉ๋Œํ•˜๋Š” ๋ฒ•์„ ํ•™์Šตํ•˜๊ณ  ๊ทธ์— ๋Œ€ํ•œ ์ •์ฑ…์„ ๋„์ถœํ•ด ๋‚ธ๋‹ค. ์ฆ‰, ์ค‘์‹ฌ ๋กœ๋ด‡์ด ์žฅ์• ๋ฌผ๊ณผ ์ถฉ๋Œํ•˜๊ฒŒ ๋˜๋ฉด ๊ทธ๊ฒƒ์€ ์–‘์˜ ๋ณด์ƒ์„ ๋ฐ›๋Š”๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋งŒ์•ฝ ์ค‘์‹ฌ ๋กœ๋ด‡์ด ์žฅ์• ๋ฌผ๊ณผ ์ถฉ๋Œ ํ•˜์ง€ ์•Š๊ณ  ํ•™์Šต ์˜์—ญ์„ ๋น ์ ธ๋‚˜๊ฐ€๋ฉด, ๊ทธ๊ฒƒ์€ ์Œ์˜ ๋ณด์ƒ์„ ๋ฐ›๋Š”๋‹ค. ๊ณ„ํš ํ™˜๊ฒฝ์€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์žฅ์• ๋ฌผ ๋˜๋Š” ๋กœ๋ด‡๋“ค๊ณผ ํ•˜๋‚˜์˜ ๋ชฉํ‘œ ์ง€์ ์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. ํ•™์Šต ํ™˜๊ฒฝ์—์„œ ํ•™์Šตํ•œ ์ •์ฑ…์„ ํ†ตํ•ด ์ค‘์‹ฌ ๋กœ๋ด‡์€ ์—ฌ๋Ÿฌ ๋Œ€์˜ ์žฅ์• ๋ฌผ ๋˜๋Š” ๋กœ๋ด‡๋“ค๊ณผ์˜ ์ถฉ๋Œ์„ ํ”ผํ•  ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ๋ฐฉ๋ฒ•์€ ์ถฉ๋Œ์„ ํ•™์Šต ํ–ˆ๊ธฐ ๋•Œ๋ฌธ์—, ์ถฉ๋Œ์„ ํšŒํ”ผํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋„์ถœ๋œ ์ •์ฑ…์„ ๋’ค์ง‘์–ด์•ผ ํ•œ๋‹ค. ํ•˜์ง€๋งŒ, ๋ชฉํ‘œ ์ง€์ ๊ณผ๋Š” ์ผ์ข…์˜ `์ถฉ๋Œ'์„ ํ•ด์•ผํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ๋ชฉํ‘œ ์ง€์ ์— ๋Œ€ํ•ด์„œ๋Š” ๋„์ถœ๋œ ์ •์ฑ…์„ ๊ทธ๋Œ€๋กœ ์ ์šฉํ•ด์•ผ ํ•œ๋‹ค. ์ด ๋‘ ๊ฐ€์ง€ ์ข…๋ฅ˜์˜ ์ •์ฑ…๋“ค์„ ์œตํ•ฉํ•˜๊ฒŒ ๋˜๋ฉด, ์ค‘์‹ฌ ๋กœ๋ด‡์€ ์žฅ์• ๋ฌผ ๋˜๋Š” ๋กœ๋ด‡๋“ค๊ณผ์˜ ์ถฉ๋Œ์„ ํšŒํ”ผํ•˜๋ฉด์„œ ๋™์‹œ์— ๋ชฉํ‘œ ์ง€์ ์— ๋„๋‹ฌํ•  ์ˆ˜ ์žˆ๋‹ค. ํ•™์Šต ํ™˜๊ฒฝ์—์„œ ๋กœ๋ด‡์€ ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡์„ ๊ฐ€์ •ํ•œ๋‹ค. ํ•™์Šต๋œ ์ •์ฑ…์ด ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋”๋ผ๋„, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡๊ณผ ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡ ๋ชจ๋‘์— ์ ์šฉ์ด ๊ฐ€๋Šฅํ•˜๋‹ค. CALC๋Š” ๋‹ค์Œ์˜ ์„ธ ๊ฐ€์ง€ ๋ฌธ์ œ์— ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. 1) ํ™€๋กœ๋…ธ๋ฏน ๋‹จ์ผ ๋กœ๋ด‡์˜ ์ถฉ๋Œ ํšŒํ”ผ. 2) ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋‹จ์ผ ๋กœ๋ด‡์˜ ์ถฉ๋Œ ํšŒํ”ผ. 3) ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋‹ค๊ฐœ์ฒด ๋กœ๋ด‡์˜ ์ถฉ๋Œ ํšŒํ”ผ. ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ์‹œ๋ฎฌ๋ ˆ์ด์…˜๊ณผ ์‹ค์ œ ๋กœ๋ด‡ ํ™˜๊ฒฝ์—์„œ ์‹คํ—˜ ๋˜์—ˆ๋‹ค. ์‹œ๋ฎฌ๋ ˆ์ด์…˜์€ ๋กœ๋ด‡ ์šด์˜์ฒด์ œ (ROS) ๊ธฐ๋ฐ˜์˜ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์ธ ๊ฐ€์ œ๋ณด์™€ ๊ฒŒ์ž„ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ ํ•œ ์ข…๋ฅ˜์ธ PyGame์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค. ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ๋Š” ํ™€๋กœ๋…ธ๋ฏน๊ณผ ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡์„ ๋ชจ๋‘ ์‚ฌ์šฉํ•˜์—ฌ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜์˜€๋‹ค. ์‹ค์ œ ๋กœ๋ด‡ ํ™˜๊ฒฝ ์‹คํ—˜์—์„œ๋Š” ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡์˜ ํ•œ ์ข…๋ฅ˜์ธ e-puck ๋กœ๋ด‡์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค. 
๋˜ํ•œ, ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ ํ•™์Šต๋œ ์ •์ฑ…์€ ์‹ค์ œ ๋กœ๋ด‡ ํ™˜๊ฒฝ ์‹คํ—˜์—์„œ ์žฌํ•™์Šต ๋˜๋Š” ๋ณ„๋„์˜ ์ˆ˜์ •๊ณผ์ • ์—†์ด ๋ฐ”๋กœ ์ ์šฉ์ด ๊ฐ€๋Šฅํ•˜์˜€๋‹ค. ์ด๋Ÿฌํ•œ ์‹คํ—˜๋“ค์˜ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ Reciprocal Velocity Obstacle (RVO) ๋˜๋Š” Optimal Reciprocal Collision Avoidance (ORCA)์™€ ๊ฐ™์€ ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๋“ค๊ณผ ๋น„๊ตํ•˜์˜€์„ ๋•Œ ํ–ฅ์ƒ๋œ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค. ๊ฒŒ๋‹ค๊ฐ€, ํ•™์Šต์˜ ํšจ์œจ์„ฑ ๋˜ํ•œ ๊ธฐ์กด์˜ ํ•™์Šต ๊ธฐ๋ฐ˜์˜ ๋ฐฉ๋ฒ•๋“ค์— ๋น„ํ•ด ๋†’์€ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์˜€๋‹ค.This thesis proposes a reinforcement learning based collision avoidance method. The problem can be defined as an ability of a robot to reach its goal point without colliding with other robots and obstacles. There are two kinds of collision avoidance problem, single robot and multi-robot collision avoidance. Single robot collision avoidance problem contains multiple dynamic obstacles and one agent robot. The objective of the agent robot is to reach its goal point and avoid obstacles with random dynamics. Multi-robot collision avoidance problem contains multiple agent robots. It is also possible to include unknown dynamic obstacles to the problem. The agents should reach their own goal points without colliding with each other. If the environment contains unknown obstacles, the agents should avoid them also. To solve the problems, Collision Avoidance by Learning Collision (CALC) is proposed. CALC adopts the concept of reinforcement learning. The method is divided into two environments, training and planning. The training environment consists of one agent, one obstacle, and a training range. In the training environment, the agent learns how to collide with the obstacle and generates a colliding policy. In other words, when the agent collides with the obstacle, it receives positive reward. On the other hand, when the agent escapes the training range without collision, it receives negative reward. The planning environment contains multiple obstacles or robots and a single goal point. With the trained policy, the agent can solve the collision avoidance problem in the planning environment regardless of its dimension. Since the method learned collision, the generated policy should be inverted in the planning environment to avoid obstacles or robots. However, the policy should be applied directly for the goal point so that the agent can `collide' with the goal. With the combination of both policies, the agent can avoid the obstacles or robots and reach to the goal point simultaneously. In the training algorithm, the robot is assumed to be a holonomic robot. Even though the trained policy is generated from the holonomic robot, the method can be applied to both holonomic and non-holonomic robots by holonomic to non-holonomic converting method. CALC is applied to three problems, single holonomic robot, single non-holonomic robot, and multiple non-holonomic robot collision avoidance. The proposed method is validated both in the robot simulation and real-world experiment. For simulation, Robot Operating System (ROS) based simulator called Gazebo and simple game library PyGame are used. The method is tested with both holonomic and non-holonomic robots in the simulation experiment. For real-world planning experiment, non-holonomic mobile robot named e-puck is used. The learned policy from the simulation can be directly applied to the real-world robot without any calibration or retraining. 
The result shows that the proposed method outperforms the existing methods such as Reciprocal Velocity Obstacle (RVO), PrEference Appraisal Reinforcement Learning (PEARL), and Optimal Reciprocal Collision Avoidance (ORCA). In addition, it is shown that the proposed method is more efficient in terms of learning than existing learning-based method.1. Introduction 1 1.1 Motivations 1 1.2 Contributions 6 1.3 Organizations 7 2 Related Work 8 2.1 Reinforcement Learning 8 2.2 Classical Navigation Methods 11 2.3 Learning-Based Navigation Methods 13 3. Learning Collision 17 3.1 Introduction 17 3.2 Learning Collision 18 3.2.1 Markov Decision Process Setup 18 3.2.2 Training Algorithm 19 3.2.3 Experimental Results 22 4. Single Robot Collision Avoidance 25 4.1 Introduction 25 4.2 Holonomic Robot Obstacle Avoidance 26 4.2.1 Approach 26 4.2.2 Experimental Results 29 4.3 Non-Holonomic Robot Obstacle Avoidance 31 4.3.1 Approach 31 4.3.2 Experimental Results 33 5. Multi-Robot Collision Avoidance 36 5.1 Introduction 36 5.2 Approach 37 5.3 Experimental Results 40 5.3.1 Simulated Experiment 40 5.3.2 Real-World Experiment 44 5.3.3 Holonomic to Non-Holonomic Conversion Experiment 49 6. Conclusion 52 Bibliography 55 ์ดˆ๋ก 62 ๊ฐ์‚ฌ์˜ ๊ธ€ 64Maste
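
    The invert-for-obstacles, apply-for-goal combination described above can be sketched as follows. The "learned colliding policy" is replaced here by a hand-written attraction rule purely for illustration; the proximity weighting and all constants are assumptions, not CALC's trained policy.

```python
# Minimal sketch of combining an inverted colliding policy (for obstacles) with
# the same policy applied directly to the goal.
import numpy as np

def colliding_policy(robot_pos, entity_pos):
    """Stand-in for the learned policy: a unit vector steering into the entity."""
    d = entity_pos - robot_pos
    return d / (np.linalg.norm(d) + 1e-9)

def calc_command(robot_pos, goal_pos, obstacle_positions):
    # Goal: apply the colliding policy directly (the agent should 'collide' with it).
    cmd = colliding_policy(robot_pos, goal_pos)
    # Obstacles: invert the colliding policy, weighted by proximity.
    for obs in obstacle_positions:
        dist = np.linalg.norm(obs - robot_pos)
        cmd -= colliding_policy(robot_pos, obs) / max(dist, 0.2)
    return cmd / (np.linalg.norm(cmd) + 1e-9)

robot = np.array([0.0, 0.0])
goal = np.array([5.0, 0.0])
obstacles = [np.array([2.0, 0.3]), np.array([3.5, -0.4])]
print(calc_command(robot, goal, obstacles))    # unit velocity command
```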

    LBGP: Learning Based Goal Planning for Autonomous Following in Front

    Full text link
    This paper investigates a hybrid solution which combines deep reinforcement learning (RL) and classical trajectory planning for the following-in-front application. Here, an autonomous robot aims to stay ahead of a person as the person walks around freely. Following in front is a challenging problem, as the user's intended trajectory is unknown and needs to be estimated, explicitly or implicitly, by the robot. In addition, the robot needs to find a feasible way to safely navigate ahead of the human's trajectory. Our deep RL module implicitly estimates the human trajectory and produces short-term navigational goals to guide the robot. These goals are used by a trajectory planner to smoothly navigate the robot to the short-term goals, and eventually in front of the user. We employ curriculum learning in the deep RL module to efficiently achieve a high return. Our system outperforms the state of the art in following ahead and is more reliable than end-to-end alternatives in both simulation and real-world experiments. In contrast to a pure deep RL approach, we demonstrate zero-shot transfer of the trained policy from simulation to the real world
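
    The hybrid structure of a learned goal-proposer feeding a classical tracker can be sketched as below. The rl_short_term_goal function is a hand-written placeholder for the trained policy, and the planner is a plain proportional controller; lead distance, gains, and time step are assumptions, not the paper's components.

```python
# Minimal sketch: learned short-term goals tracked by a classical controller.
import numpy as np

def rl_short_term_goal(person_pos, person_vel, lead_dist=1.5):
    """Placeholder for the deep RL module: a goal ahead of the person's heading."""
    heading = person_vel / (np.linalg.norm(person_vel) + 1e-9)
    return person_pos + lead_dist * heading

def track_goal(robot_pos, goal, gain=0.8, max_speed=1.0):
    """Placeholder trajectory planner: proportional control toward the goal."""
    cmd = gain * (goal - robot_pos)
    speed = np.linalg.norm(cmd)
    return cmd if speed <= max_speed else cmd * (max_speed / speed)

robot = np.array([0.0, 0.0])
person = np.array([1.0, 0.0])
person_vel = np.array([0.4, 0.1])
for _ in range(20):                      # simple closed-loop rollout
    goal = rl_short_term_goal(person, person_vel)
    robot = robot + 0.1 * track_goal(robot, goal)
    person = person + 0.1 * person_vel
print("robot:", robot.round(2), "person:", person.round(2))
```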

    Adaptive and learning-based formation control of swarm robots

    Get PDF
    Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations face several open challenges, including robust autonomy and adaptive coordination based on the environment and operating conditions, particularly in swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation control could be performed by swarm robots with limited communication and perception (e.g., the Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between humans and swarm robots (e.g., BristleBot) for artistic creation. In particular, we combine bio-inspired techniques (e.g., flocking, foraging) with learning-based control strategies (using artificial neural networks) for the adaptive control of multi-robot systems. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We proceed by presenting a novel flocking control for a UAV swarm using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP) and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy and each UAV performs actions based on the local information it collects. In addition, to avoid collisions among UAVs and guarantee flocking and navigation, a reward function is added that combines global flocking maintenance, a mutual reward, and a collision penalty. We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy using actor-critic networks and a global state space matrix. In the context of swarm robotics in the arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walks to control the communication between a team of robots with swarming behavior for musical creation
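
    A reward of the kind described above, combining flocking maintenance, a mutual (shared) term, and a collision penalty, can be sketched as follows. The spacing constants, penalty magnitude, and exact shaping are illustrative assumptions, not the thesis's reward.

```python
# Minimal sketch of a leader-follower flocking reward with a mutual term and
# a collision penalty.
import numpy as np

DESIRED_SPACING = 2.0      # hypothetical target distance to the leader
COLLISION_RADIUS = 0.5     # hypothetical minimum separation

def flocking_reward(positions, leader_idx=0):
    n = len(positions)
    rewards = np.zeros(n)
    for i in range(n):
        # Flocking maintenance: followers keep a desired spacing to the leader.
        if i != leader_idx:
            d_leader = np.linalg.norm(positions[i] - positions[leader_idx])
            rewards[i] -= abs(d_leader - DESIRED_SPACING)
        # Collision penalty with every other UAV.
        for j in range(n):
            if j != i and np.linalg.norm(positions[i] - positions[j]) < COLLISION_RADIUS:
                rewards[i] -= 10.0
    # Mutual reward: every UAV additionally shares the team-average term.
    return rewards + rewards.mean()

positions = np.array([[0.0, 0.0], [2.0, 0.1], [1.9, -0.2], [0.3, 0.2]])
print(np.round(flocking_reward(positions), 2))
```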