415 research outputs found

    Learning visual docking for non-holonomic autonomous vehicles

    Get PDF
    This paper presents a new method of learning visual docking skills for non-holonomic vehicles by direct interaction with the environment. The method is based on a reinforcement learning algorithm that speeds up Q-learning by applying memory-based sweeping and enforcing the "adjoining property", a filtering mechanism that only allows transitions between states separated by no more than a fixed distance. The method overcomes some limitations of reinforcement learning techniques when they are employed in applications with continuous non-linear systems, such as car-like vehicles. In particular, a good approximation to the optimal behaviour is obtained with a small look-up table. The algorithm is tested within an image-based visual servoing framework on a docking task. The training time was less than 1 hour on the real vehicle. Experiments show the satisfactory performance of the algorithm.
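    The abstract gives only a high-level description of the learning scheme. As a rough Python sketch of how tabular Q-learning, memory-based (prioritized) sweeping, and an adjoining-distance filter could fit together, the snippet below is illustrative only: the state representation, gains, sweep budget, and distance test are assumptions, not the paper's formulation.

```python
import heapq
from collections import defaultdict

# Illustrative tabular Q-learning with memory-based (prioritized) sweeping and an
# "adjoining" filter that discards transitions between states farther apart than a
# fixed distance. States are assumed to be small tuples of numbers; all constants
# and names are placeholders, not values from the paper.

ALPHA, GAMMA = 0.5, 0.95
ADJOIN_DIST = 1.0                     # assumed maximum distance between adjoining states

Q = defaultdict(float)                # small look-up table: Q[(state, action)]
model = {}                            # remembered transitions: (s, a) -> (r, s_next)
predecessors = defaultdict(set)       # reverse transition index for sweeping
queue = []                            # priority queue of states to sweep (negated priority)

def adjoining(s, s_next):
    """Accept only transitions between states closer than ADJOIN_DIST."""
    return sum((a - b) ** 2 for a, b in zip(s, s_next)) ** 0.5 <= ADJOIN_DIST

def best_q(s, actions):
    return max(Q[(s, a)] for a in actions)

def update(s, a, r, s_next, actions, n_sweeps=10):
    if not adjoining(s, s_next):
        return                                        # filtered out by the adjoining property
    model[(s, a)] = (r, s_next)
    predecessors[s_next].add((s, a))
    td = r + GAMMA * best_q(s_next, actions) - Q[(s, a)]
    Q[(s, a)] += ALPHA * td
    heapq.heappush(queue, (-abs(td), s))
    # memory-based sweeping: propagate the change backwards through stored transitions
    for _ in range(n_sweeps):
        if not queue:
            break
        _, state = heapq.heappop(queue)
        for (ps, pa) in predecessors[state]:
            pr, _ = model[(ps, pa)]
            ptd = pr + GAMMA * best_q(state, actions) - Q[(ps, pa)]
            Q[(ps, pa)] += ALPHA * ptd
            if abs(ptd) > 1e-3:
                heapq.heappush(queue, (-abs(ptd), ps))
```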

    ์ถฉ๋Œ ํ•™์Šต์„ ํ†ตํ•œ ์ง€์—ญ ๊ฒฝ๋กœ ๊ณ„ํš ๋ฐฉ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2019. 2. ์ด๋ฒ”ํฌ.๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ฐ•ํ™” ํ•™์Šต ๊ธฐ๋ฐ˜์˜ ์ถฉ๋Œ ํšŒํ”ผ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ถฉ๋Œ ํšŒํ”ผ๋ž€ ๋กœ๋ด‡์ด ๋‹ค๋ฅธ ๋กœ๋ด‡ ๋˜๋Š” ์žฅ์• ๋ฌผ๊ณผ ์ถฉ๋Œ ์—†์ด ๋ชฉํ‘œ ์ง€์ ์— ๋„๋‹ฌํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•œ๋‹ค. ์ด ๋ฌธ์ œ๋Š” ๋‹จ์ผ ๋กœ๋ด‡ ์ถฉ๋Œ ํšŒํ”ผ์™€ ๋‹ค๊ฐœ์ฒด ๋กœ๋ด‡ ์ถฉ๋Œ ํšŒํ”ผ, ์ด๋ ‡๊ฒŒ ๋‘ ๊ฐ€์ง€๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋‹ค. ๋‹จ์ผ ๋กœ๋ด‡ ์ถฉ๋Œ ํšŒํ”ผ ๋ฌธ์ œ๋Š” ํ•˜๋‚˜์˜ ์ค‘์‹ฌ ๋กœ๋ด‡๊ณผ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์›€์ง์ด๋Š” ์žฅ์• ๋ฌผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. ์ค‘์‹ฌ ๋กœ๋ด‡์€ ๋žœ๋คํ•˜๊ฒŒ ์›€์ง์ด๋Š” ์žฅ์• ๋ฌผ์„ ํ”ผํ•ด ๋ชฉํ‘œ ์ง€์ ์— ๋„๋‹ฌํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•œ๋‹ค. ๋‹ค๊ฐœ์ฒด ๋กœ๋ด‡ ์ถฉ๋Œ ํšŒํ”ผ ๋ฌธ์ œ๋Š” ์—ฌ๋Ÿฌ ๋Œ€์˜ ์ค‘์‹ฌ ๋กœ๋ด‡์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. ์ด ๋ฌธ์ œ์—๋„ ์—ญ์‹œ ์žฅ์• ๋ฌผ์„ ํฌํ•จ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. ์ค‘์‹ฌ ๋กœ๋ด‡๋“ค์€ ์„œ๋กœ ์ถฉ๋Œ์„ ํšŒํ”ผํ•˜๋ฉด์„œ ๊ฐ์ž์˜ ๋ชฉํ‘œ ์ง€์ ์— ๋„๋‹ฌํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•œ๋‹ค. ๋งŒ์•ฝ ํ™˜๊ฒฝ์— ์˜ˆ์ƒ์น˜ ๋ชปํ•œ ์žฅ์• ๋ฌผ์ด ๋“ฑ์žฅํ•˜๋”๋ผ๋„, ๋กœ๋ด‡๋“ค์€ ๊ทธ๊ฒƒ๋“ค์„ ํ”ผํ•ด์•ผ ํ•œ๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ถฉ๋Œ ํšŒํ”ผ๋ฅผ ์œ„ํ•œ ์ถฉ๋Œ ํ•™์Šต ๋ฐฉ๋ฒ• (CALC) ์„ ์ œ์•ˆํ•œ๋‹ค. CALC๋Š” ๊ฐ•ํ™” ํ•™์Šต ๊ฐœ๋…์„ ์ด์šฉํ•ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ํ•™์Šต ๊ทธ๋ฆฌ๊ณ  ๊ณ„ํš ์ด๋ ‡๊ฒŒ ๋‘ ๊ฐ€์ง€ ํ™˜๊ฒฝ์œผ๋กœ ๊ตฌ์„ฑ ๋œ๋‹ค. ํ•™์Šต ํ™˜๊ฒฝ์€ ํ•˜๋‚˜์˜ ์ค‘์‹ฌ ๋กœ๋ด‡๊ณผ ํ•˜๋‚˜์˜ ์žฅ์• ๋ฌผ ๊ทธ๋ฆฌ๊ณ  ํ•™์Šต ์˜์—ญ์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. ํ•™์Šต ํ™˜๊ฒฝ์—์„œ ์ค‘์‹ฌ ๋กœ๋ด‡์€ ์žฅ์• ๋ฌผ๊ณผ ์ถฉ๋Œํ•˜๋Š” ๋ฒ•์„ ํ•™์Šตํ•˜๊ณ  ๊ทธ์— ๋Œ€ํ•œ ์ •์ฑ…์„ ๋„์ถœํ•ด ๋‚ธ๋‹ค. ์ฆ‰, ์ค‘์‹ฌ ๋กœ๋ด‡์ด ์žฅ์• ๋ฌผ๊ณผ ์ถฉ๋Œํ•˜๊ฒŒ ๋˜๋ฉด ๊ทธ๊ฒƒ์€ ์–‘์˜ ๋ณด์ƒ์„ ๋ฐ›๋Š”๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋งŒ์•ฝ ์ค‘์‹ฌ ๋กœ๋ด‡์ด ์žฅ์• ๋ฌผ๊ณผ ์ถฉ๋Œ ํ•˜์ง€ ์•Š๊ณ  ํ•™์Šต ์˜์—ญ์„ ๋น ์ ธ๋‚˜๊ฐ€๋ฉด, ๊ทธ๊ฒƒ์€ ์Œ์˜ ๋ณด์ƒ์„ ๋ฐ›๋Š”๋‹ค. ๊ณ„ํš ํ™˜๊ฒฝ์€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์žฅ์• ๋ฌผ ๋˜๋Š” ๋กœ๋ด‡๋“ค๊ณผ ํ•˜๋‚˜์˜ ๋ชฉํ‘œ ์ง€์ ์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. ํ•™์Šต ํ™˜๊ฒฝ์—์„œ ํ•™์Šตํ•œ ์ •์ฑ…์„ ํ†ตํ•ด ์ค‘์‹ฌ ๋กœ๋ด‡์€ ์—ฌ๋Ÿฌ ๋Œ€์˜ ์žฅ์• ๋ฌผ ๋˜๋Š” ๋กœ๋ด‡๋“ค๊ณผ์˜ ์ถฉ๋Œ์„ ํ”ผํ•  ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ๋ฐฉ๋ฒ•์€ ์ถฉ๋Œ์„ ํ•™์Šต ํ–ˆ๊ธฐ ๋•Œ๋ฌธ์—, ์ถฉ๋Œ์„ ํšŒํ”ผํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋„์ถœ๋œ ์ •์ฑ…์„ ๋’ค์ง‘์–ด์•ผ ํ•œ๋‹ค. ํ•˜์ง€๋งŒ, ๋ชฉํ‘œ ์ง€์ ๊ณผ๋Š” ์ผ์ข…์˜ `์ถฉ๋Œ'์„ ํ•ด์•ผํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ๋ชฉํ‘œ ์ง€์ ์— ๋Œ€ํ•ด์„œ๋Š” ๋„์ถœ๋œ ์ •์ฑ…์„ ๊ทธ๋Œ€๋กœ ์ ์šฉํ•ด์•ผ ํ•œ๋‹ค. ์ด ๋‘ ๊ฐ€์ง€ ์ข…๋ฅ˜์˜ ์ •์ฑ…๋“ค์„ ์œตํ•ฉํ•˜๊ฒŒ ๋˜๋ฉด, ์ค‘์‹ฌ ๋กœ๋ด‡์€ ์žฅ์• ๋ฌผ ๋˜๋Š” ๋กœ๋ด‡๋“ค๊ณผ์˜ ์ถฉ๋Œ์„ ํšŒํ”ผํ•˜๋ฉด์„œ ๋™์‹œ์— ๋ชฉํ‘œ ์ง€์ ์— ๋„๋‹ฌํ•  ์ˆ˜ ์žˆ๋‹ค. ํ•™์Šต ํ™˜๊ฒฝ์—์„œ ๋กœ๋ด‡์€ ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡์„ ๊ฐ€์ •ํ•œ๋‹ค. ํ•™์Šต๋œ ์ •์ฑ…์ด ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋”๋ผ๋„, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡๊ณผ ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡ ๋ชจ๋‘์— ์ ์šฉ์ด ๊ฐ€๋Šฅํ•˜๋‹ค. CALC๋Š” ๋‹ค์Œ์˜ ์„ธ ๊ฐ€์ง€ ๋ฌธ์ œ์— ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. 1) ํ™€๋กœ๋…ธ๋ฏน ๋‹จ์ผ ๋กœ๋ด‡์˜ ์ถฉ๋Œ ํšŒํ”ผ. 2) ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋‹จ์ผ ๋กœ๋ด‡์˜ ์ถฉ๋Œ ํšŒํ”ผ. 3) ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋‹ค๊ฐœ์ฒด ๋กœ๋ด‡์˜ ์ถฉ๋Œ ํšŒํ”ผ. ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ์‹œ๋ฎฌ๋ ˆ์ด์…˜๊ณผ ์‹ค์ œ ๋กœ๋ด‡ ํ™˜๊ฒฝ์—์„œ ์‹คํ—˜ ๋˜์—ˆ๋‹ค. ์‹œ๋ฎฌ๋ ˆ์ด์…˜์€ ๋กœ๋ด‡ ์šด์˜์ฒด์ œ (ROS) ๊ธฐ๋ฐ˜์˜ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์ธ ๊ฐ€์ œ๋ณด์™€ ๊ฒŒ์ž„ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ ํ•œ ์ข…๋ฅ˜์ธ PyGame์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค. ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ๋Š” ํ™€๋กœ๋…ธ๋ฏน๊ณผ ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡์„ ๋ชจ๋‘ ์‚ฌ์šฉํ•˜์—ฌ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜์˜€๋‹ค. ์‹ค์ œ ๋กœ๋ด‡ ํ™˜๊ฒฝ ์‹คํ—˜์—์„œ๋Š” ๋น„ํ™€๋กœ๋…ธ๋ฏน ๋กœ๋ด‡์˜ ํ•œ ์ข…๋ฅ˜์ธ e-puck ๋กœ๋ด‡์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค. 
๋˜ํ•œ, ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ ํ•™์Šต๋œ ์ •์ฑ…์€ ์‹ค์ œ ๋กœ๋ด‡ ํ™˜๊ฒฝ ์‹คํ—˜์—์„œ ์žฌํ•™์Šต ๋˜๋Š” ๋ณ„๋„์˜ ์ˆ˜์ •๊ณผ์ • ์—†์ด ๋ฐ”๋กœ ์ ์šฉ์ด ๊ฐ€๋Šฅํ•˜์˜€๋‹ค. ์ด๋Ÿฌํ•œ ์‹คํ—˜๋“ค์˜ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ Reciprocal Velocity Obstacle (RVO) ๋˜๋Š” Optimal Reciprocal Collision Avoidance (ORCA)์™€ ๊ฐ™์€ ๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•๋“ค๊ณผ ๋น„๊ตํ•˜์˜€์„ ๋•Œ ํ–ฅ์ƒ๋œ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค. ๊ฒŒ๋‹ค๊ฐ€, ํ•™์Šต์˜ ํšจ์œจ์„ฑ ๋˜ํ•œ ๊ธฐ์กด์˜ ํ•™์Šต ๊ธฐ๋ฐ˜์˜ ๋ฐฉ๋ฒ•๋“ค์— ๋น„ํ•ด ๋†’์€ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์˜€๋‹ค.This thesis proposes a reinforcement learning based collision avoidance method. The problem can be defined as an ability of a robot to reach its goal point without colliding with other robots and obstacles. There are two kinds of collision avoidance problem, single robot and multi-robot collision avoidance. Single robot collision avoidance problem contains multiple dynamic obstacles and one agent robot. The objective of the agent robot is to reach its goal point and avoid obstacles with random dynamics. Multi-robot collision avoidance problem contains multiple agent robots. It is also possible to include unknown dynamic obstacles to the problem. The agents should reach their own goal points without colliding with each other. If the environment contains unknown obstacles, the agents should avoid them also. To solve the problems, Collision Avoidance by Learning Collision (CALC) is proposed. CALC adopts the concept of reinforcement learning. The method is divided into two environments, training and planning. The training environment consists of one agent, one obstacle, and a training range. In the training environment, the agent learns how to collide with the obstacle and generates a colliding policy. In other words, when the agent collides with the obstacle, it receives positive reward. On the other hand, when the agent escapes the training range without collision, it receives negative reward. The planning environment contains multiple obstacles or robots and a single goal point. With the trained policy, the agent can solve the collision avoidance problem in the planning environment regardless of its dimension. Since the method learned collision, the generated policy should be inverted in the planning environment to avoid obstacles or robots. However, the policy should be applied directly for the goal point so that the agent can `collide' with the goal. With the combination of both policies, the agent can avoid the obstacles or robots and reach to the goal point simultaneously. In the training algorithm, the robot is assumed to be a holonomic robot. Even though the trained policy is generated from the holonomic robot, the method can be applied to both holonomic and non-holonomic robots by holonomic to non-holonomic converting method. CALC is applied to three problems, single holonomic robot, single non-holonomic robot, and multiple non-holonomic robot collision avoidance. The proposed method is validated both in the robot simulation and real-world experiment. For simulation, Robot Operating System (ROS) based simulator called Gazebo and simple game library PyGame are used. The method is tested with both holonomic and non-holonomic robots in the simulation experiment. For real-world planning experiment, non-holonomic mobile robot named e-puck is used. The learned policy from the simulation can be directly applied to the real-world robot without any calibration or retraining. 
    The result shows that the proposed method outperforms existing methods such as Reciprocal Velocity Obstacle (RVO), PrEference Appraisal Reinforcement Learning (PEARL), and Optimal Reciprocal Collision Avoidance (ORCA). In addition, the proposed method is shown to be more efficient in terms of learning than existing learning-based methods.
    Contents:
    1. Introduction
       1.1 Motivations
       1.2 Contributions
       1.3 Organizations
    2. Related Work
       2.1 Reinforcement Learning
       2.2 Classical Navigation Methods
       2.3 Learning-Based Navigation Methods
    3. Learning Collision
       3.1 Introduction
       3.2 Learning Collision
           3.2.1 Markov Decision Process Setup
           3.2.2 Training Algorithm
           3.2.3 Experimental Results
    4. Single Robot Collision Avoidance
       4.1 Introduction
       4.2 Holonomic Robot Obstacle Avoidance
           4.2.1 Approach
           4.2.2 Experimental Results
       4.3 Non-Holonomic Robot Obstacle Avoidance
           4.3.1 Approach
           4.3.2 Experimental Results
    5. Multi-Robot Collision Avoidance
       5.1 Introduction
       5.2 Approach
       5.3 Experimental Results
           5.3.1 Simulated Experiment
           5.3.2 Real-World Experiment
           5.3.3 Holonomic to Non-Holonomic Conversion Experiment
    6. Conclusion
    Bibliography
    Abstract (in Korean)
    Acknowledgements
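    The planning step of CALC, as described above, reuses a single learned "colliding" policy: inverted for obstacles and applied directly to the goal. The following Python sketch illustrates that combination under strong simplifications; the placeholder q_collide function, the discrete action set, and the score summation are assumptions for illustration, not the thesis's actual trained policy.

```python
import numpy as np

# Sketch of combining an inverted "colliding" policy (for obstacles) with the same
# policy applied directly (for the goal). q_collide stands in for a learned
# action-value function over the relative position of a single entity.

_DIRS = np.array([[1, 0], [-1, 0], [0, 1], [0, -1],
                  [1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
ACTIONS = _DIRS / np.linalg.norm(_DIRS, axis=1, keepdims=True)   # unit step directions

def q_collide(rel_pos):
    """Placeholder for the trained colliding policy: prefers moving toward the entity."""
    direction = rel_pos / (np.linalg.norm(rel_pos) + 1e-6)
    return ACTIONS @ direction            # one score per discrete action

def plan_action(robot_pos, goal_pos, obstacle_positions):
    scores = np.zeros(len(ACTIONS))
    # goal: apply the colliding policy directly ("collide" with the goal)
    scores += q_collide(goal_pos - robot_pos)
    # obstacles: invert the colliding policy so that it repels the robot
    for obs in obstacle_positions:
        scores -= q_collide(obs - robot_pos)
    return ACTIONS[int(np.argmax(scores))]

# Example: attracted to a goal on the right, repelled by an obstacle above.
step = plan_action(np.zeros(2), np.array([5.0, 0.0]), [np.array([0.0, 1.0])])
```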

    Modeling and Control Strategies for a Two-Wheel Balancing Mobile Robot

    Get PDF
    The problem of balancing and autonomously navigating a two-wheel mobile robot is an increasingly active area of research, due to its potential applications in last-mile delivery, pedestrian transportation, warehouse automation, parts supply, agriculture, surveillance, and monitoring. This thesis investigates the design and control of a two-wheel balancing mobile robot using three different control strategies: Proportional Integral Derivative (PID) control, Sliding Mode Control, and Deep Q-Learning. The mobile robot is described by a dynamic and kinematic model, and its motion is simulated in a custom MATLAB/Simulink environment. The first part of the thesis focuses on developing the dynamic and kinematic model of the mobile robot. The robot dynamics are derived using the classical Euler-Lagrange method, in which motion is described through the potential and kinetic energies of the bodies. Non-holonomic constraints are included in the model, via the method of Lagrange multipliers, to achieve the desired motion, such as preventing the robot from drifting. Navigation is developed using artificial potential field path planning to generate a map of velocity vectors that serve as set points for linear velocity and yaw rate. The second part of the thesis focuses on developing and evaluating three control strategies for the mobile robot: PID controllers, Hierarchical Sliding Mode Control, and Deep Q-Learning. The performance of the different control strategies is evaluated and compared on various metrics, such as stability, robustness to mass variations and disturbances, and tracking accuracy. The implementation and evaluation of these strategies are modeled and tested in a MATLAB/Simulink virtual environment.
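    As a small illustration of the navigation layer described above, the sketch below computes an artificial-potential-field velocity vector and converts it into linear-velocity and yaw-rate set points; the gains, influence radius, and saturation limits are assumed values, not those used in the thesis.

```python
import numpy as np

# Minimal artificial potential field (APF) sketch producing the linear-velocity and
# yaw-rate set points mentioned in the abstract. All constants are illustrative.

K_ATT, K_REP, RHO0 = 1.0, 0.5, 2.0     # attraction/repulsion gains, influence radius [m]

def apf_velocity(pos, goal, obstacles):
    """Desired planar velocity vector from attractive and repulsive potentials."""
    v = K_ATT * (goal - pos)                           # attractive term toward the goal
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if 1e-6 < d < RHO0:                            # repulsion only inside influence radius
            v += K_REP * (1.0 / d - 1.0 / RHO0) / d**2 * (pos - obs) / d
    return v

def to_setpoints(v, heading, v_max=0.5, wz_max=1.0):
    """Convert the velocity vector into linear-velocity and yaw-rate set points."""
    v_ref = np.clip(np.linalg.norm(v), 0.0, v_max)
    heading_err = np.arctan2(v[1], v[0]) - heading
    heading_err = np.arctan2(np.sin(heading_err), np.cos(heading_err))   # wrap to [-pi, pi]
    wz_ref = np.clip(2.0 * heading_err, -wz_max, wz_max)
    return v_ref, wz_ref

# Example: robot at the origin heading along +x, goal at (3, 2), obstacle at (1, 1).
v = apf_velocity(np.zeros(2), np.array([3.0, 2.0]), [np.array([1.0, 1.0])])
v_ref, wz_ref = to_setpoints(v, heading=0.0)
```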

    Deep Reinforcement Learning for Autonomous Navigation of Mobile Robots in Indoor Environments

    Get PDF
    Conventional autonomous navigation frameworks for mobile robots are highly modularized, with subsystems such as localization, perception, mapping, planning, and control. Although these provide easy interpretation, they depend heavily on a known map of the robot's surroundings to navigate a cluttered environment. Local planners such as the Dynamic Window Approach (DWA) require a map containing all surrounding obstacles in order to calculate an optimal collision-free trajectory to the goal. Planning and tracking a collision-free path without knowing the obstacle locations is a challenging task. Since the advent of deep learning techniques, deep reinforcement learning has proven to be a powerful learning framework for robotic tasks. It has demonstrated wide success in complex games such as Go and StarCraft, which have high-dimensional state and action spaces. However, it has rarely been used in real-world applications because of the Sim-2-Real challenges in transferring a trained RL policy to the real world. In this work, we propose a novel framework for autonomously navigating a mobile robot in a cluttered space, without known locations of the obstacles in its surroundings, using deep reinforcement learning techniques. The proposed method is a modular and scalable approach due to a strategic design of the training environment. It uses constrained space and randomization techniques to learn an effective reinforcement learning policy in less simulation training time. The state vector consists of the target location in the mobile robot coordinate frame and, for the obstacle avoidance task, an additional 36-dimensional lidar vector. We demonstrate the optimal discrete action policy on a TurtleBot in the real world. We also address some key challenges in robot pose estimation for autonomous driving tasks.
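    To make the described observation and action design concrete, the sketch below assembles a state vector from a downsampled laser scan and the goal expressed in the robot frame, and maps a discrete action index to velocity commands; the downsampling scheme, normalization, and action values are assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np

# Illustrative state construction (36-beam lidar slice plus goal in the robot frame)
# and a small discrete action set of the kind the abstract describes.

def build_state(scan_ranges, robot_pose, goal_xy, max_range=3.5):
    """Concatenate a 36-dimensional lidar vector with the goal in the robot frame."""
    scan = np.clip(np.nan_to_num(scan_ranges, nan=max_range), 0.0, max_range)
    idx = np.linspace(0, len(scan) - 1, 36).astype(int)      # downsample to 36 beams
    lidar = scan[idx] / max_range                             # normalize to [0, 1]

    x, y, yaw = robot_pose
    dx, dy = goal_xy[0] - x, goal_xy[1] - y
    dist = np.hypot(dx, dy)
    bearing = np.arctan2(dy, dx) - yaw                        # goal bearing in robot frame
    bearing = np.arctan2(np.sin(bearing), np.cos(bearing))
    return np.concatenate([lidar, [dist, bearing]])           # 38-dimensional state

# Discrete action set: fixed forward speed with a handful of yaw rates (assumed values).
ACTIONS = [(0.15, w) for w in (-1.5, -0.75, 0.0, 0.75, 1.5)]  # (v [m/s], omega [rad/s])

def act(q_values):
    """Greedy discrete action; q_values would come from the trained policy network."""
    v, omega = ACTIONS[int(np.argmax(q_values))]
    return v, omega
```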