Deep Reinforcement Learning using Genetic Algorithm for Parameter Optimization
Reinforcement learning (RL) enables agents to make decisions based on a reward
function. However, during learning, the choice of values for the learning
algorithm's parameters can significantly impact the overall learning process.
In this paper, we use a genetic algorithm (GA) to find the values of the
parameters used in Deep Deterministic Policy Gradient (DDPG) combined with
Hindsight Experience Replay (HER), in order to speed up learning. We apply
this method to the fetch-reach, slide, push, pick-and-place, and door-opening
robotic manipulation tasks. Our experimental evaluation shows that our method
achieves better performance, faster, than the original algorithm.
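The search the abstract describes can be sketched as a plain genetic algorithm over a hyperparameter vector. The parameter names, ranges, and fitness function below are illustrative stand-ins, not the paper's actual configuration:

```python
import random

# Hypothetical DDPG/HER hyperparameter search space; the names and
# ranges are illustrative, not the paper's actual parameter set.
SPACE = {
    "actor_lr":  (1e-5, 1e-2),
    "critic_lr": (1e-5, 1e-2),
    "tau":       (1e-3, 1e-1),
    "gamma":     (0.90, 0.999),
}

def random_individual():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}

def crossover(a, b):
    # Uniform crossover: each gene comes from either parent.
    return {k: random.choice((a[k], b[k])) for k in SPACE}

def mutate(ind, rate=0.2):
    # Resample each gene with probability `rate`.
    return {k: random.uniform(lo, hi) if random.random() < rate else ind[k]
            for k, (lo, hi) in SPACE.items()}

def evolve(fitness, pop_size=10, generations=5):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]            # truncation selection
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=fitness)

# Stand-in fitness: in the paper this would be the agent's success rate
# after training DDPG+HER with the candidate hyperparameters.
def toy_fitness(ind):
    return -abs(ind["gamma"] - 0.98) - abs(ind["tau"] - 0.01)

best = evolve(toy_fitness)
```

In practice each fitness evaluation is a (short) training run, so the GA's population size and generation count are the dominant cost.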
Deep Reinforcement Learning for Autonomous Driving
Reinforcement learning has steadily improved since the resurgence of deep
neural networks and now outperforms humans in many traditional games. However,
this success is not easily transferred to autonomous driving: real-world state
spaces are extremely complex, action spaces are continuous, and fine control
is required. Moreover, autonomous vehicles must maintain functional safety in
complex environments. To address these challenges, we first adopt the deep
deterministic policy gradient (DDPG) algorithm, which can handle complex state
and action spaces in the continuous domain. We then choose The Open Racing Car
Simulator (TORCS) as our environment to avoid physical damage. Meanwhile, we
select a set of appropriate sensor readings from TORCS and design our own
reward function. To fit the DDPG algorithm to TORCS, we design network
architectures for both the actor and the critic within the DDPG paradigm. To
demonstrate the effectiveness of our model, we evaluate it on different modes
in TORCS and present both quantitative and qualitative results.
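DDPG, which this abstract and several others build on, stabilizes actor-critic training with slowly tracking target networks. A minimal sketch of the soft (Polyak) target update, with parameters reduced to plain lists of floats for illustration:

```python
def soft_update(target, source, tau=0.001):
    """Polyak-average source parameters into the target network.
    Parameters are plain lists of floats here; in a real DDPG
    implementation they would be the weight tensors of the target
    and online actor/critic networks."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]

target = [0.0, 0.0]
source = [1.0, 2.0]
for _ in range(1000):
    target = soft_update(target, source, tau=0.01)
# target creeps toward source instead of jumping, which keeps the
# bootstrapped critic targets from chasing a moving estimate
```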
Obstacle Avoidance and Navigation Utilizing Reinforcement Learning with Reward Shaping
In this paper, we investigate the obstacle-avoidance and navigation problem
in robotic control. To solve it, we propose revised Deep Deterministic Policy
Gradient (DDPG) and Proximal Policy Optimization (PPO) algorithms with an
improved reward-shaping technique. We compare the performance of the original
DDPG and PPO with the revised versions of both in simulations with a real
mobile robot and demonstrate that the proposed algorithms achieve better
results.
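Reward shaping of the kind described here is commonly implemented as potential-based shaping, which provably leaves the optimal policy unchanged. A sketch assuming a distance-to-goal potential (the abstract does not specify the authors' actual shaping function):

```python
import math

def potential(state, goal):
    # Negative Euclidean distance: states closer to the goal have
    # higher potential (this choice of potential is an assumption).
    return -math.dist(state, goal)

def shaped_reward(base_reward, state, next_state, goal, gamma=0.99):
    # F(s, s') = gamma * phi(s') - phi(s); adding F to the reward
    # preserves the optimal policy (potential-based shaping).
    return base_reward + gamma * potential(next_state, goal) - potential(state, goal)

goal = (5.0, 5.0)
toward = shaped_reward(0.0, (0.0, 0.0), (1.0, 1.0), goal)  # moving closer
away = shaped_reward(0.0, (1.0, 1.0), (0.0, 0.0), goal)    # moving away
```

The shaping term gives the agent a dense learning signal on every step, which is what makes sparse goal-reaching tasks like obstacle-course navigation tractable.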
Safe, Efficient, and Comfortable Velocity Control based on Reinforcement Learning for Autonomous Driving
A model for velocity control during car following, based on deep
reinforcement learning (RL), is proposed. To fulfil the multiple objectives of
car following, a reward function reflecting driving safety, efficiency, and
comfort was constructed. With this reward function, the RL agent learns to
control vehicle speed so as to maximize cumulative rewards, through trial and
error in the simulation environment. A total of 1,341 car-following events
extracted from the Next Generation Simulation (NGSIM) dataset were used to
train the model. Car-following behavior produced by the model was compared
with that observed in the empirical NGSIM data to demonstrate the model's
ability to follow a lead vehicle safely, efficiently, and comfortably. Results
show that the model achieves safe, efficient, and comfortable velocity
control: 1) it produces a smaller percentage (8%) of dangerous minimum
time-to-collision values (< 5 s) than human drivers in the NGSIM data (35%);
2) it maintains efficient and safe headways in the range of 1 s to 2 s; and
3) it follows the lead vehicle comfortably, with smooth acceleration. These
results indicate that reinforcement learning methods could contribute to the
development of autonomous driving systems.
Comment: under first-round revision for Transportation Research Part
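A reward combining the three objectives the abstract reports (time to collision for safety, headway for efficiency, jerk for comfort) might look like the following. The weights and penalty shapes are illustrative; only the 5 s TTC bound and the 1-2 s headway band are taken from the abstract:

```python
def ttc(gap, closing_speed):
    """Time to collision in seconds; infinite if the gap is opening."""
    return gap / closing_speed if closing_speed > 0 else float("inf")

def velocity_reward(gap, closing_speed, headway, jerk,
                    w_safe=1.0, w_eff=1.0, w_comf=0.1):
    # Safety: penalize dangerous time-to-collision values (< 5 s,
    # the bound reported in the abstract).
    safety = -1.0 if ttc(gap, closing_speed) < 5.0 else 0.0
    # Efficiency: reward headways in the 1-2 s band the model keeps.
    efficiency = 1.0 if 1.0 <= headway <= 2.0 else -abs(headway - 1.5)
    # Comfort: penalize jerk (rate of change of acceleration).
    comfort = -jerk ** 2
    return w_safe * safety + w_eff * efficiency + w_comf * comfort
```

Because the agent maximizes the cumulative sum of this signal, the relative weights implicitly trade safety margin against travel efficiency and ride comfort.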
Run, skeleton, run: skeletal model in a physics-based simulation
In this paper, we present our approach to the physics-based reinforcement
learning challenge "Learning to Run", whose objective is to train a
physiologically based human model to navigate a complex obstacle course as
quickly as possible. The environment is computationally expensive, has a
high-dimensional continuous action space, and is stochastic. We benchmark
state-of-the-art policy-gradient methods and test several improvements, such
as layer normalization, parameter noise, and action and state reflection, to
stabilize training and improve its sample efficiency. We found that the Deep
Deterministic Policy Gradient method is the most efficient for this
environment and that the improvements we introduced help to stabilize
training. The learned models are able to generalize to new physical
scenarios, e.g. different obstacle courses.
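The action and state reflection improvement exploits the runner's left-right symmetry: each transition can be mirrored to double the training data. A sketch with hypothetical index pairs (the real layout depends on the environment's observation and action specifications):

```python
def mirror(vec, swap_pairs):
    """Swap left/right indices of a flat vector."""
    out = list(vec)
    for i, j in swap_pairs:
        out[i], out[j] = out[j], out[i]
    return out

def reflect_sample(state, action, state_pairs, action_pairs):
    """Mirror a (state, action) pair across the body's sagittal plane,
    doubling the data from each rollout. The index pairs passed in are
    hypothetical; a real implementation would enumerate the left/right
    joint and muscle indices of the skeletal model."""
    return mirror(state, state_pairs), mirror(action, action_pairs)
```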
Actor-critic versus direct policy search: a comparison based on sample complexity
Sample efficiency is a critical property when optimizing policy parameters
for the controller of a robot. In this paper, we evaluate two state-of-the-art
policy optimization algorithms. One is a recent deep reinforcement learning
method based on an actor-critic algorithm, Deep Deterministic Policy Gradient
(DDPG), that has been shown to perform well on various control benchmarks. The
other one is a direct policy search method, Covariance Matrix Adaptation
Evolution Strategy (CMA-ES), a black-box optimization method that is widely
used for robot learning. The algorithms are evaluated on a continuous version
of the mountain car benchmark problem, so as to compare their sample
complexity. From a preliminary analysis, we expect DDPG to be more
sample-efficient than CMA-ES, which is confirmed by our experimental results.
Comment: Proceedings of JFPDA (Journees Francaises Planification Decision
Apprentissage).
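For contrast with the critic-based DDPG, a stripped-down evolution strategy in the spirit of CMA-ES (isotropic noise and a fixed step-size decay instead of full covariance adaptation) looks like this; note that every candidate evaluation costs a full rollout, which is what drives up sample complexity:

```python
import random

def es_minimize(f, x0, sigma=0.5, pop=16, iters=60, seed=0):
    """(mu, lambda) evolution strategy with isotropic Gaussian noise.
    CMA-ES additionally adapts a full covariance matrix and its step
    size; the fixed 0.95 decay here is a simplification."""
    rng = random.Random(seed)
    mean, mu = list(x0), pop // 4
    for _ in range(iters):
        samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                   for _ in range(pop)]
        samples.sort(key=f)            # each evaluation = one rollout
        elite = samples[:mu]
        mean = [sum(x[i] for x in elite) / mu for i in range(len(mean))]
        sigma *= 0.95
    return mean

# 16 x 60 = 960 episode-level evaluations for a 2-D toy problem:
# black-box search pays for its generality in samples, whereas DDPG
# reuses every transition through its replay buffer and critic.
best = es_minimize(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, [0.0, 0.0])
```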
Personalized Cancer Chemotherapy Schedule: a numerical comparison of performance and robustness in model-based and model-free scheduling methodologies
Reinforcement learning algorithms are gaining popularity in fields in which
optimal scheduling is important, and oncology is not an exception. The complex
and uncertain dynamics of cancer limit the performance of traditional
model-based scheduling strategies like Optimal Control. Motivated by the recent
success of model-free Deep Reinforcement Learning (DRL) in challenging control
tasks and in the design of medical treatments, we use Deep Q-Network (DQN) and
Deep Deterministic Policy Gradient (DDPG) to design a personalized cancer
chemotherapy schedule. We show that both of them succeed in the task and
outperform the Optimal Control solution in the presence of uncertainty.
Furthermore, we show that DDPG can eradicate the cancer more efficiently than
DQN, presumably due to its continuous action space. Finally, we provide some
insight into the number of samples required for training.
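The practical difference between the two agents is the action space: DQN must discretize the dose into a fixed set of levels, while DDPG's actor outputs a continuous dose directly. A sketch with hypothetical dose levels and scaling:

```python
# DQN requires a discrete action set, so the dose must be binned into
# fixed levels (these levels are hypothetical).
DQN_DOSES = [0.0, 0.25, 0.5, 0.75, 1.0]

def dqn_action(q_values):
    # Greedy DQN action: the dose whose Q-value estimate is largest.
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    return DQN_DOSES[best_index]

def ddpg_action(actor_output, max_dose=1.0):
    # DDPG's actor emits a continuous value, assumed here to lie in
    # [-1, 1] (e.g. from a tanh output head), rescaled to a dose.
    return (actor_output + 1.0) / 2.0 * max_dose
```

A finer dose resolution for DQN means more actions and slower learning; DDPG avoids that trade-off entirely, which is consistent with the efficiency gap the abstract reports.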
Application of Deep Reinforcement Learning Techniques via a CNN-DDPG Agent to Autonomous Driving
The aim of this work is to develop an autonomous driving model with the
DDPG-LSTM algorithm, which belongs to the Deep Reinforcement Learning (DRL)
family. The algorithm is based on the Deep Deterministic Policy Gradient
(DDPG), extended with a neural network with greater memory, an LSTM (Long
Short-Term Memory).
This model will contribute to the development of autonomous navigation
algorithms and, in future work, will be adapted for deployment on the
autonomous electric vehicle being developed by the Robesafe group for the
Techs4AgeCar project. (Bachelor's thesis, Grado en Ingeniería en Electrónica
y Automática Industria)
Human-Like Autonomous Car-Following Model with Deep Reinforcement Learning
This study proposes a framework for human-like autonomous car-following
planning based on deep reinforcement learning (deep RL). Historical driving
data are fed into a simulation environment where an RL agent learns from trial
and error interactions based on a reward function that signals how much the
agent deviates from the empirical data. Through these interactions, an
optimal policy, i.e. a car-following model that maps in a human-like way from
speed, relative speed between the lead and following vehicles, and
inter-vehicle spacing to the following vehicle's acceleration, is finally
obtained. The model can be
continuously updated when more data are fed in. Two thousand car-following
periods extracted from the 2015 Shanghai Naturalistic Driving Study were used
to train the model and compare its performance with that of traditional and
recent data-driven car-following models. As the results of this study show, a deep
deterministic policy gradient car-following model that uses disparity between
simulated and observed speed as the reward function and considers a reaction
delay of 1s, denoted as DDPGvRT, can reproduce human-like car-following
behavior with higher accuracy than traditional and recent data-driven
car-following models. Specifically, the DDPGvRT model has a spacing
validation error of 18% and a speed validation error of 5%, both lower than
those of
other models, including the intelligent driver model, models based on locally
weighted regression, and conventional neural network-based models. Moreover,
the DDPGvRT demonstrates good capability of generalization to various driving
situations and can adapt to different drivers by continuously learning. This
study demonstrates that reinforcement learning methodology can offer insight
into driver behavior and can contribute to the development of human-like
autonomous driving algorithms and traffic-flow models.
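The DDPGvRT design (a reward based on the disparity between simulated and observed speed, plus a 1 s reaction delay) can be sketched with a simple delay line; the buffer handling and 10 Hz time step are assumptions, not the paper's implementation:

```python
from collections import deque

class DelayedSpeedReward:
    """Reward is the negative absolute gap between simulated and
    observed speed; chosen accelerations only take effect after a
    fixed reaction delay (e.g. 1 s = 10 steps at an assumed 10 Hz)."""

    def __init__(self, delay_steps=10):
        # FIFO delay line, pre-filled with zero acceleration.
        self.pending = deque([0.0] * delay_steps, maxlen=delay_steps)

    def step(self, action_accel, sim_speed, obs_speed, dt=0.1):
        delayed = self.pending[0]   # acceleration chosen `delay` ago
        self.pending.append(action_accel)
        new_speed = sim_speed + delayed * dt
        reward = -abs(new_speed - obs_speed)
        return new_speed, reward
```

Maximizing this reward pushes the simulated speed trajectory toward the empirical one, which is how the agent acquires human-like behavior without an explicit driver model.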
Curriculum goal masking for continuous deep reinforcement learning
Deep reinforcement learning has recently gained attention for problems where
policy or value functions are conditioned on goals. Evidence exists that the
sampling of goals has a strong effect on the learning performance, but there is
a lack of general mechanisms that focus on optimizing the goal sampling
process. In this work, we present a simple and general goal masking method that
also allows us to estimate a goal's difficulty level and thus realize a
curriculum learning approach for deep RL. Our results indicate that focusing on
goals with a medium difficulty level is appropriate for deep deterministic
policy gradient (DDPG) methods, while an "aim for the stars and reach the
moon" strategy, where hard goals are sampled much more often than simple
goals, leads to the best learning performance when DDPG is combined with
hindsight experience replay (HER). We demonstrate that the approach
significantly outperforms standard goal sampling for different robotic object
manipulation problems.
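The two sampling strategies the abstract compares can be sketched by weighting candidate goals by an estimated difficulty (one minus the recent success rate). The weighting functions below are illustrative, not the authors' exact formulation:

```python
import random
from collections import Counter

def sample_goal(goals, success_rate, strategy, rng):
    """Pick a training goal, weighting by estimated difficulty.

    'medium' favors mid-difficulty goals (reported best for plain
    DDPG); 'stars' favors the hardest goals (reported best for
    DDPG + HER). Both weightings are illustrative assumptions."""
    difficulty = {g: 1.0 - success_rate[g] for g in goals}
    if strategy == "medium":
        # Peak weight at difficulty 0.5, falling off toward 0 and 1.
        weights = [max(1.0 - 2.0 * abs(difficulty[g] - 0.5), 1e-6)
                   for g in goals]
    else:  # "stars": hard goals sampled much more often than easy ones
        weights = [max(difficulty[g] ** 2, 1e-6) for g in goals]
    return rng.choices(goals, weights=weights)[0]

rng = random.Random(0)
goals = ["easy", "medium", "hard"]
rate = {"easy": 0.95, "medium": 0.5, "hard": 0.05}
medium_counts = Counter(sample_goal(goals, rate, "medium", rng)
                        for _ in range(300))
stars_counts = Counter(sample_goal(goals, rate, "stars", rng)
                       for _ in range(300))
```

Tracking per-goal success rates online turns this into the curriculum described in the abstract: the distribution shifts as previously hard goals become easy.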