
    Deep Reinforcement Learning using Genetic Algorithm for Parameter Optimization

    Reinforcement learning (RL) enables agents to make decisions based on a reward function. However, the choice of values for the learning algorithm's parameters can significantly affect the overall learning process. In this paper, we use a genetic algorithm (GA) to find values for the parameters of Deep Deterministic Policy Gradient (DDPG) combined with Hindsight Experience Replay (HER), in order to speed up the learning agent. We applied this method to the fetch-reach, slide, push, pick-and-place, and door-opening robotic manipulation tasks. Our experimental evaluation shows that our method reaches better performance faster than the original algorithm.
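
    A minimal sketch of the idea above, assuming a simple elitist GA over a handful of DDPG/HER hyperparameters; the search space, the GA operators, and the placeholder fitness function are illustrative assumptions, since the paper's exact parameter set and evaluation budget are not reproduced here.

        # Minimal GA sketch for tuning DDPG+HER hyperparameters.
        # `evaluate` is a stand-in for training the agent for a fixed budget
        # and returning its success rate; replace it with a real training run.
        import random

        SEARCH_SPACE = {
            "actor_lr":  (1e-5, 1e-2),
            "critic_lr": (1e-5, 1e-2),
            "tau":       (1e-3, 1e-1),     # soft target-update rate
            "gamma":     (0.90, 0.999),    # discount factor
        }

        def random_individual():
            return {k: random.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}

        def crossover(a, b):
            return {k: random.choice((a[k], b[k])) for k in SEARCH_SPACE}

        def mutate(ind, rate=0.2):
            out = dict(ind)
            for k, (lo, hi) in SEARCH_SPACE.items():
                if random.random() < rate:
                    out[k] = random.uniform(lo, hi)
            return out

        def evaluate(ind):
            # Placeholder fitness: stands in for "success rate of DDPG+HER
            # trained with these parameters on a fetch-reach-style task".
            return -abs(ind["gamma"] - 0.98) - abs(ind["tau"] - 0.05)

        def run_ga(pop_size=10, generations=5, elite=2):
            pop = [random_individual() for _ in range(pop_size)]
            for _ in range(generations):
                ranked = sorted(pop, key=evaluate, reverse=True)
                parents = ranked[:elite]
                children = [mutate(crossover(*random.sample(parents, 2)))
                            for _ in range(pop_size - elite)]
                pop = parents + children
            return max(pop, key=evaluate)

        if __name__ == "__main__":
            print(run_ga())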

    Deep Reinforcement Learning for Autonomous Driving

    Since the resurgence of deep neural networks, reinforcement learning has steadily improved and now outperforms humans in many traditional games. However, this success does not carry over easily to autonomous driving: real-world state spaces are extremely complex, and action spaces are continuous and require fine control. Moreover, autonomous vehicles must maintain functional safety in these complex environments. To address these challenges, we first adopt the Deep Deterministic Policy Gradient (DDPG) algorithm, which can handle complex state and action spaces in the continuous domain. We then choose The Open Racing Car Simulator (TORCS) as our environment to avoid physical damage. We select a set of appropriate sensor readings from TORCS and design our own reward function. To adapt the DDPG algorithm to TORCS, we design network architectures for both the actor and the critic within the DDPG paradigm. To demonstrate the effectiveness of our model, we evaluate it on different modes in TORCS and present both quantitative and qualitative results.
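
    As a rough illustration of the actor/critic design for a continuous-control driving task, here is a minimal PyTorch sketch; the 29-dimensional sensor state, the (steering, acceleration, brake) action layout, and the layer sizes are assumptions, not the networks described in the paper.

        # Minimal PyTorch sketch of a DDPG-style actor/critic pair for a
        # TORCS-like setting: a low-dimensional sensor vector as state and a
        # 3-dimensional action (steering, acceleration, brake).
        import torch
        import torch.nn as nn

        STATE_DIM, ACTION_DIM = 29, 3   # illustrative dimensions (assumed)

        class Actor(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(STATE_DIM, 300), nn.ReLU(),
                    nn.Linear(300, 300), nn.ReLU(),
                )
                self.steer = nn.Linear(300, 1)   # output in [-1, 1]
                self.accel = nn.Linear(300, 1)   # output in [0, 1]
                self.brake = nn.Linear(300, 1)   # output in [0, 1]

            def forward(self, state):
                h = self.net(state)
                return torch.cat([torch.tanh(self.steer(h)),
                                  torch.sigmoid(self.accel(h)),
                                  torch.sigmoid(self.brake(h))], dim=-1)

        class Critic(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(STATE_DIM + ACTION_DIM, 300), nn.ReLU(),
                    nn.Linear(300, 300), nn.ReLU(),
                    nn.Linear(300, 1),   # Q(s, a)
                )

            def forward(self, state, action):
                return self.net(torch.cat([state, action], dim=-1))

        if __name__ == "__main__":
            s = torch.randn(4, STATE_DIM)
            a = Actor()(s)
            print(a.shape, Critic()(s, a).shape)   # (4, 3), (4, 1)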

    Obstacle Avoidance and Navigation Utilizing Reinforcement Learning with Reward Shaping

    In this paper, we investigate the obstacle avoidance and navigation problem in robotic control. To solve this problem, we propose revised Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) algorithms with an improved reward-shaping technique. We compare the original DDPG and PPO with the revised versions of both in simulations with a real mobile robot and demonstrate that the proposed algorithms achieve better results.
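
    A hedged sketch of what a shaped navigation reward of this kind can look like: a progress term toward the goal, a clearance penalty near obstacles, and terminal bonuses or penalties. The weights and thresholds are illustrative assumptions, not the paper's exact shaping function.

        # Illustrative reward-shaping sketch for goal-directed navigation with
        # obstacle avoidance. All constants are assumptions for illustration.
        import numpy as np

        def shaped_reward(pos, prev_pos, goal, min_obstacle_dist,
                          collided, reached,
                          w_progress=1.0, w_clearance=0.1, step_cost=0.01):
            if collided:
                return -10.0                         # terminal penalty
            if reached:
                return 10.0                          # terminal bonus
            # Progress term: reduction in distance to the goal since the last step.
            progress = np.linalg.norm(prev_pos - goal) - np.linalg.norm(pos - goal)
            # Clearance term: small penalty for getting close to obstacles.
            clearance_penalty = max(0.0, 0.5 - min_obstacle_dist)
            return w_progress * progress - w_clearance * clearance_penalty - step_cost

        if __name__ == "__main__":
            prev_pos, pos, goal = np.array([0., 0.]), np.array([0.5, 0.]), np.array([2., 0.])
            print(shaped_reward(pos, prev_pos, goal, min_obstacle_dist=0.8,
                                collided=False, reached=False))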

    Safe, Efficient, and Comfortable Velocity Control based on Reinforcement Learning for Autonomous Driving

    A model for velocity control during car following is proposed based on deep reinforcement learning (RL). To fulfill the multiple objectives of car following, a reward function reflecting driving safety, efficiency, and comfort was constructed. With this reward function, the RL agent learns to control vehicle speed in a fashion that maximizes cumulative reward, through trial and error in the simulation environment. A total of 1,341 car-following events extracted from the Next Generation Simulation (NGSIM) dataset were used to train the model. Car-following behavior produced by the model was compared with that observed in the empirical NGSIM data to demonstrate the model's ability to follow a lead vehicle safely, efficiently, and comfortably. The results show that the model is capable of safe, efficient, and comfortable velocity control: 1) it produces a smaller percentage (8%) of dangerous minimum time-to-collision values (< 5 s) than human drivers in the NGSIM data (35%); 2) it can maintain efficient and safe headways in the range of 1 s to 2 s; and 3) it can follow the lead vehicle comfortably, with smooth acceleration. These results indicate that reinforcement learning methods could contribute to the development of autonomous driving systems.
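
    The reward structure described above can be illustrated with a small sketch that combines a safety term (time to collision), an efficiency term (headway near a target), and a comfort term (jerk); the weights, thresholds, and functional forms are assumptions for illustration, not the paper's exact reward.

        # Sketch of a car-following reward trading off safety, efficiency, and comfort.
        def car_following_reward(ttc, headway, jerk,
                                 ttc_threshold=5.0, target_headway=1.5,
                                 w_safety=1.0, w_eff=0.5, w_comfort=0.1):
            # Safety: penalize time-to-collision values below the threshold (seconds).
            safety = -1.0 if 0.0 < ttc < ttc_threshold else 0.0
            # Efficiency: reward headways close to the target (seconds).
            efficiency = -abs(headway - target_headway)
            # Comfort: penalize large jerk (rate of change of acceleration).
            comfort = -abs(jerk)
            return w_safety * safety + w_eff * efficiency + w_comfort * comfort

        if __name__ == "__main__":
            print(car_following_reward(ttc=8.0, headway=1.4, jerk=0.2))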

    Run, skeleton, run: skeletal model in a physics-based simulation

    In this paper, we present our approach to the physics-based reinforcement learning challenge "Learning to Run", whose objective is to train a physiologically based human model to navigate a complex obstacle course as quickly as possible. The environment is computationally expensive, has a high-dimensional continuous action space, and is stochastic. We benchmark state-of-the-art policy-gradient methods and test several improvements, such as layer normalization, parameter noise, and action and state reflection, to stabilize training and improve its sample efficiency. We found that the Deep Deterministic Policy Gradient method is the most efficient for this environment and that the improvements we introduced help stabilize training. The learned models are able to generalize to new physical scenarios, e.g., different obstacle courses.
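
    One of the listed improvements, state and action reflection, can be sketched as a data-augmentation step that mirrors each transition for a laterally symmetric body; the index lists below are placeholders, since the challenge's actual observation and action layout is not reproduced here.

        # Sketch of "state and action reflecting": every transition is mirrored
        # left/right to obtain a second valid transition for the replay buffer.
        import numpy as np

        SWAP_OBS = [(2, 3), (6, 7)]    # e.g. left/right joint-angle pairs (assumed)
        NEGATE_OBS = [0]               # e.g. lateral velocity component (assumed)
        SWAP_ACT = [(0, 1)]            # e.g. left/right muscle excitations (assumed)

        def reflect(vec, swap_pairs, negate_idx=()):
            out = np.array(vec, dtype=float, copy=True)
            for i, j in swap_pairs:
                out[i], out[j] = out[j], out[i]
            for i in negate_idx:
                out[i] = -out[i]
            return out

        def reflected_transition(obs, act, rew, next_obs):
            return (reflect(obs, SWAP_OBS, NEGATE_OBS),
                    reflect(act, SWAP_ACT),
                    rew,
                    reflect(next_obs, SWAP_OBS, NEGATE_OBS))

        if __name__ == "__main__":
            obs = np.arange(8.0); act = np.array([0.2, 0.8]); next_obs = obs + 1
            print(reflected_transition(obs, act, 1.0, next_obs))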

    Actor-critic versus direct policy search: a comparison based on sample complexity

    Sample efficiency is a critical property when optimizing policy parameters for the controller of a robot. In this paper, we evaluate two state-of-the-art policy optimization algorithms. One is a recent deep reinforcement learning method based on an actor-critic algorithm, Deep Deterministic Policy Gradient (DDPG), which has been shown to perform well on various control benchmarks. The other is a direct policy search method, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a black-box optimization method that is widely used for robot learning. The algorithms are evaluated on a continuous version of the mountain car benchmark problem so as to compare their sample complexity. From a preliminary analysis, we expect DDPG to be more sample-efficient than CMA-ES, which is confirmed by our experimental results.
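
    To make the notion of sample complexity concrete, the sketch below counts the environment steps consumed by a crude direct policy search on a toy task; the environment, the one-parameter policy, and the success criterion are stand-ins, not the mountain-car setup or CMA-ES/DDPG themselves.

        # Sketch of a sample-complexity comparison harness: wrap the environment
        # so every transition is counted, then report how many steps an optimizer
        # consumed while searching for a good policy.
        import numpy as np

        class StepCounter:
            """Counts every environment transition used during learning."""
            def __init__(self, env):
                self.env, self.steps = env, 0
            def reset(self):
                return self.env.reset()
            def step(self, action):
                self.steps += 1
                return self.env.step(action)

        class ToyEnv:
            """1-D placeholder task: drive the state toward zero."""
            def reset(self):
                self.x = np.random.uniform(-1, 1)
                return self.x
            def step(self, action):
                self.x += float(action)
                reward = -abs(self.x)
                return self.x, reward, abs(self.x) < 0.05

        def rollout(env, policy_gain, horizon=20):
            x, total = env.reset(), 0.0
            for _ in range(horizon):
                x, r, done = env.step(-policy_gain * x)
                total += r
                if done:
                    break
            return total

        if __name__ == "__main__":
            env = StepCounter(ToyEnv())
            # Crude direct policy search over a single gain parameter.
            best = max(np.linspace(0.1, 1.0, 10), key=lambda g: rollout(env, g))
            print("best gain:", best, "environment steps used:", env.steps)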

    Personalized Cancer Chemotherapy Schedule: a numerical comparison of performance and robustness in model-based and model-free scheduling methodologies

    Reinforcement learning algorithms are gaining popularity in fields where optimal scheduling is important, and oncology is no exception. The complex and uncertain dynamics of cancer limit the performance of traditional model-based scheduling strategies such as Optimal Control. Motivated by the recent success of model-free Deep Reinforcement Learning (DRL) in challenging control tasks and in the design of medical treatments, we use Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) to design a personalized cancer chemotherapy schedule. We show that both succeed at the task and outperform the Optimal Control solution in the presence of uncertainty. Furthermore, we show that DDPG can eradicate cancer more efficiently than DQN, presumably due to its continuous action space. Finally, we provide some insight into the number of samples required for training.
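
    The continuous- versus discrete-action distinction that the comparison rests on can be illustrated with a small sketch: a DQN-style agent must pick a dose from a fixed grid, while a DDPG-style actor can output any dose within the allowed range. The dose bounds and grid are assumptions, not the treatment model used in the paper.

        # Sketch contrasting discrete (DQN-style) and continuous (DDPG-style)
        # dose selection for a chemotherapy-scheduling agent.
        import numpy as np

        DOSE_MAX = 1.0
        DOSE_GRID = np.linspace(0.0, DOSE_MAX, 5)   # discrete choices for DQN

        def dqn_style_action(q_values):
            """Pick the grid dose with the highest estimated Q-value."""
            return DOSE_GRID[int(np.argmax(q_values))]

        def ddpg_style_action(actor_output):
            """Map an unbounded actor output to a continuous dose in [0, DOSE_MAX]."""
            return DOSE_MAX * (np.tanh(actor_output) + 1.0) / 2.0

        if __name__ == "__main__":
            print(dqn_style_action(np.array([0.1, 0.7, 0.3, 0.2, 0.0])))  # grid dose
            print(ddpg_style_action(0.42))                                # any dose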

    Application of Deep Reinforcement Learning Techniques with a CNN-DDPG Agent to Autonomous Driving

    The aim of this work is to develop an autonomous driving model based on the DDPG-LSTM algorithm, which belongs to the Deep Reinforcement Learning (DRL) family. The algorithm extends the Deep Deterministic Policy Gradient (DDPG) with a Long Short-Term Memory (LSTM) network that provides longer memory. This model will contribute to the development of autonomous navigation algorithms and, in future work, will be adapted for deployment on the autonomous electric vehicle being developed by the Robesafe group for the Techs4AgeCar project.
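
    A minimal PyTorch sketch of a recurrent actor of the kind the DDPG-LSTM description suggests; the observation and action dimensions and the hidden size are assumptions, not the architecture used in the thesis.

        # Minimal LSTM actor sketch: the actor carries a hidden state over a
        # sequence of observations and outputs actions squashed to [-1, 1].
        import torch
        import torch.nn as nn

        class LSTMActor(nn.Module):
            def __init__(self, obs_dim=24, action_dim=2, hidden=128):
                super().__init__()
                self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, action_dim)

            def forward(self, obs_seq, hidden_state=None):
                # obs_seq: (batch, time, obs_dim)
                out, hidden_state = self.lstm(obs_seq, hidden_state)
                return torch.tanh(self.head(out)), hidden_state

        if __name__ == "__main__":
            actor = LSTMActor()
            actions, h = actor(torch.randn(4, 10, 24))
            print(actions.shape)   # (4, 10, 2)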

    Human-Like Autonomous Car-Following Model with Deep Reinforcement Learning

    This study proposes a framework for human-like autonomous car-following planning based on deep reinforcement learning (deep RL). Historical driving data are fed into a simulation environment where an RL agent learns from trial-and-error interactions guided by a reward function that signals how much the agent deviates from the empirical data. Through these interactions, an optimal policy (a car-following model) is obtained that maps, in a human-like way, the speed, the relative speed between the lead and following vehicles, and the inter-vehicle spacing to the acceleration of the following vehicle. The model can be continuously updated as more data are fed in. Two thousand car-following periods extracted from the 2015 Shanghai Naturalistic Driving Study were used to train the model and to compare its performance with that of traditional and recent data-driven car-following models. As the results show, a deep deterministic policy gradient car-following model that uses the disparity between simulated and observed speed as the reward function and considers a reaction delay of 1 s, denoted DDPGvRT, can reproduce human-like car-following behavior more accurately than traditional and recent data-driven car-following models. Specifically, the DDPGvRT model has a spacing validation error of 18% and a speed validation error of 5%, both lower than those of other models, including the intelligent driver model, models based on locally weighted regression, and conventional neural-network-based models. Moreover, DDPGvRT generalizes well to various driving situations and can adapt to different drivers through continuous learning. This study demonstrates that reinforcement learning methodology can offer insight into driver behavior and can contribute to the development of human-like autonomous driving algorithms and traffic-flow models.
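
    Two ingredients named above, a speed-deviation reward and a fixed reaction delay, can be sketched as follows; the penalty form, the step size implied by the delay, and the buffering scheme are illustrative assumptions rather than the paper's implementation.

        # Sketch: penalize the gap between simulated and observed following-vehicle
        # speed, and apply each chosen action with a fixed reaction delay.
        from collections import deque

        REACTION_DELAY_STEPS = 10      # e.g. 1 s at a 0.1 s simulation step (assumed)

        def speed_deviation_reward(v_sim, v_obs):
            """Higher reward when simulated speed matches the empirical speed."""
            return -abs(v_sim - v_obs)

        class DelayedAction:
            """Buffers actions so the one applied now was chosen `delay` steps ago."""
            def __init__(self, delay=REACTION_DELAY_STEPS, neutral=0.0):
                self.buffer = deque([neutral] * delay, maxlen=delay + 1)
            def apply(self, new_action):
                self.buffer.append(new_action)
                return self.buffer.popleft()

        if __name__ == "__main__":
            delayed = DelayedAction(delay=3)
            for t, a in enumerate([0.5, 0.6, 0.7, 0.8]):
                print(t, delayed.apply(a))   # first three applied actions are neutral
            print(speed_deviation_reward(v_sim=12.1, v_obs=12.6))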

    Curriculum goal masking for continuous deep reinforcement learning

    Deep reinforcement learning has recently gained a focus on problems where policy or value functions are independent of goals. Evidence exists that the sampling of goals has a strong effect on learning performance, but general mechanisms for optimizing the goal-sampling process are lacking. In this work, we present a simple and general goal-masking method that also allows us to estimate a goal's difficulty level and thus realize a curriculum learning approach for deep RL. Our results indicate that focusing on goals of medium difficulty is appropriate for deep deterministic policy gradient (DDPG) methods, while an "aim for the stars and reach the moon" strategy, in which hard goals are sampled much more often than simple ones, leads to the best learning performance when DDPG is combined with hindsight experience replay (HER). We demonstrate that the approach significantly outperforms standard goal sampling on different robotic object manipulation problems.
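
    A hedged sketch of difficulty-aware goal sampling in this spirit: track a per-goal success rate, convert it to a difficulty estimate, and weight sampling toward either medium-difficulty or hard goals. The per-goal bookkeeping and both weighting schemes are illustrative assumptions, not the paper's masking method.

        # Sketch of curriculum-style goal sampling driven by estimated difficulty.
        import random

        class GoalSampler:
            def __init__(self, goals, prefer="medium"):
                self.stats = {g: [0, 0] for g in goals}   # goal -> [successes, trials]
                self.prefer = prefer

            def record(self, goal, success):
                s = self.stats[goal]
                s[0] += int(success)
                s[1] += 1

            def _difficulty(self, goal):
                successes, trials = self.stats[goal]
                rate = successes / trials if trials else 0.5
                return 1.0 - rate                          # 0 = easy, 1 = hard

            def sample(self):
                goals = list(self.stats)
                if self.prefer == "medium":                # curriculum: medium difficulty
                    weights = [1.0 - abs(self._difficulty(g) - 0.5) for g in goals]
                else:                                      # "aim for the stars": hard goals
                    weights = [self._difficulty(g) + 1e-3 for g in goals]
                return random.choices(goals, weights=weights, k=1)[0]

        if __name__ == "__main__":
            sampler = GoalSampler(goals=["near", "mid", "far"], prefer="medium")
            sampler.record("near", True); sampler.record("far", False)
            print(sampler.sample())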