Benchmarking Deep Reinforcement Learning for Continuous Control
Recently, researchers have made significant progress combining the advances
in deep learning for learning feature representations with reinforcement
learning. Some notable examples include training agents to play Atari games
based on raw pixel data and to acquire advanced manipulation skills using raw
sensory inputs. However, it has been difficult to quantify progress in the
domain of continuous control due to the lack of a commonly adopted benchmark.
In this work, we present a benchmark suite of continuous control tasks,
including classic tasks like cart-pole swing-up, tasks with very high state and
action dimensionality such as 3D humanoid locomotion, tasks with partial
observations, and tasks with hierarchical structure. We report novel findings
based on the systematic evaluation of a range of implemented reinforcement
learning algorithms. Both the benchmark and reference implementations are
released at https://github.com/rllab/rllab in order to facilitate experimental
reproducibility and to encourage adoption by other researchers.
Comment: 14 pages, ICML 201
Evolutionary Deep Reinforcement Learning Using Elite Buffer: A Novel Approach Towards DRL Combined with EA in Continuous Control Tasks
Despite the numerous applications and success of deep reinforcement learning
in many control tasks, it still suffers from many crucial problems and
limitations, including temporal credit assignment with sparse reward, absence
of effective exploration, and a brittle convergence that is extremely sensitive
to the hyperparameters of the problem. The problems of deep reinforcement
learning in continuous control, along with the success of evolutionary
algorithms in addressing some of these problems, have given rise to the idea of
evolutionary reinforcement learning, which has attracted much controversy.
Despite successful results in a few studies in this field, a proper and fitting
solution to these problems and their limitations is yet to be presented. The
present study aims to examine further the efficiency of combining the two fields
of deep reinforcement learning and evolutionary computation, and to take a step
towards improving existing methods and addressing the open challenges. The
"Evolutionary Deep Reinforcement Learning Using Elite Buffer" algorithm
introduces a novel mechanism inspired by the interactive learning capability and
hypothetical outcomes in the human brain. In this method, the use of
the elite buffer (which is inspired by learning based on experience
generalization in the human mind), together with crossover and
mutation operators and interactive learning across successive generations,
improves efficiency and convergence and represents an advance in
continuous control. According to the experimental results, the proposed
method surpasses other well-known methods in environments with high complexity
and dimensionality and is better at resolving the problems and limitations
mentioned above.
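The ingredients named in the abstract above (an elite buffer, crossover, and mutation across successive generations) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say what the elite buffer stores or how it is used, so keeping the highest-scoring parameter vectors is an assumption, and the names `crossover`, `mutate`, `update_elite_buffer`, and `evolve` are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each parameter is copied from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def mutate(params, sigma, rng):
    """Gaussian mutation: add zero-mean noise to every parameter."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def update_elite_buffer(buffer, candidate, fitness, capacity):
    """Insert (fitness, params) and keep only the `capacity` best entries."""
    buffer.append((fitness, candidate))
    buffer.sort(key=lambda item: item[0], reverse=True)
    del buffer[capacity:]
    return buffer


def evolve(fitness_fn, dim, generations=30, pop_size=20, capacity=5,
           sigma=0.1, seed=0):
    """Assumed loop: score the population, refresh the elite buffer,
    then breed the next generation from elite parents only."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    elite = []
    for _ in range(generations):
        for individual in population:
            update_elite_buffer(elite, individual, fitness_fn(individual), capacity)
        parents = [params for _, params in elite]
        population = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), sigma, rng)
            for _ in range(pop_size)
        ]
    return elite[0]  # (best fitness, best parameters)
```

Because the buffer only ever discards its lowest-scoring entries, the best fitness it holds cannot decrease from one generation to the next. In a deep reinforcement learning setting, `fitness_fn` would be an episode return and the parameter vector a flattened policy network; both are left abstract here.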
Docking control of an autonomous underwater vehicle using reinforcement learning
To achieve persistent systems in the future, autonomous underwater vehicles (AUVs) will need to autonomously dock onto a charging station. Here, reinforcement learning strategies were applied for the first time to control the docking of an AUV onto a fixed platform in a simulation environment. Two reinforcement learning schemes were investigated: one with continuous state and action spaces, deep deterministic policy gradient (DDPG), and one with continuous state but discrete action spaces, deep Q network (DQN). For DQN, the discrete actions were selected as step changes in the control input signals. The performance of the reinforcement learning strategies was compared with classical and optimal control techniques. The control actions selected by DDPG suffer from chattering effects due to a hyperbolic tangent layer in the actor. Conversely, DQN presents the best compromise between short docking time and low control effort, whilst meeting the docking requirements. Whereas the reinforcement learning algorithms present a very high computational cost at training time, they are five orders of magnitude faster than optimal control at deployment time, thus enabling an on-line implementation. Therefore, reinforcement learning achieves a performance similar to optimal control at a much lower computational cost at deployment, whilst also presenting a more general framework.
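The two action parameterisations contrasted in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the step sizes in `STEP_CHANGES`, the control limits, and both function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical increments: decrease, hold, or increase the control input.
STEP_CHANGES = (-0.1, 0.0, 0.1)


def apply_discrete_action(control, action_index, low=-1.0, high=1.0):
    """DQN-style action: the chosen index selects a step change that is
    added to the current control input, then clipped to the actuator range."""
    return min(high, max(low, control + STEP_CHANGES[action_index]))


def squash_continuous_action(pre_activation, max_control=1.0):
    """DDPG-style actor output: a tanh layer bounds the command to
    [-max_control, max_control]."""
    return max_control * math.tanh(pre_activation)
```

With step-change actions the command can move by at most one increment per decision, whereas a tanh output saturates near its bounds for large pre-activations. The abstract attributes the chattering of DDPG to the tanh layer; the sketch only shows the saturation behaviour and does not reproduce that result.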
Data-efficient learning of feedback policies from image pixels using deep dynamical models
Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy (torques) from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight, and is an important step toward fully autonomous end-to-end learning from pixels to torques.
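The role of long-term prediction in model predictive control, as described in the abstract above, can be sketched with a toy planner. Everything here is an assumption made for illustration: the latent state is a single number, `dynamics` stands in for the learned predictive model, and exhaustive search over a few candidate actions replaces whatever optimiser the paper uses. The name `plan_first_action` is hypothetical.

```python
from itertools import product


def plan_first_action(latent, goal, dynamics, candidates, horizon):
    """Roll every candidate action sequence forward through the model,
    score it by squared distance to the goal, and return the first action
    of the best sequence (receding-horizon control)."""
    best_cost, best_action = float("inf"), None
    for sequence in product(candidates, repeat=horizon):
        z, cost = latent, 0.0
        for action in sequence:
            z = dynamics(z, action)      # predicted next latent state
            cost += (z - goal) ** 2      # accumulated tracking cost
        if cost < best_cost:
            best_cost, best_action = cost, sequence[0]
    return best_action
```

Only the first action is applied; the plan is recomputed at the next step from the newly encoded observation. The quality of the chosen action depends on how accurate the multi-step rollout is, which is why the abstract stresses long-term predictions.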
Specialized Deep Residual Policy Safe Reinforcement Learning-Based Controller for Complex and Continuous State-Action Spaces
Traditional controllers have limitations as they rely on prior knowledge
about the physics of the problem, require modeling of dynamics, and struggle to
adapt to abnormal situations. Deep reinforcement learning has the potential to
address these problems by learning optimal control policies through exploration
in an environment. For safety-critical environments, it is impractical to
explore randomly, and replacing conventional controllers with black-box models
is also undesirable. Moreover, exploration is expensive in continuous state and
action spaces unless the search space is constrained. To address these challenges, we
propose a specialized deep residual policy safe reinforcement learning with a
cycle of learning approach adapted for complex and continuous state-action
spaces. Residual policy learning allows learning a hybrid control architecture
where the reinforcement learning agent acts in synchronous collaboration with
the conventional controller. The cycle of learning initiates the policy through
the expert trajectory and guides the exploration around it. Further, the
specialization through the input-output hidden Markov model helps to optimize
the policy within the region of interest (such as abnormality), where the
reinforcement learning agent is required and is activated. The proposed
solution is validated on the Tennessee Eastman process control
- …
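The hybrid architecture described in the abstract above, in which a learned agent acts alongside a conventional controller and is activated only in a region of interest, can be sketched as follows. This is a minimal illustration, not the paper's controller: the proportional base controller, the gain, the residual limit, the actuator bounds, and the boolean `active` flag (standing in for the hidden Markov model's decision) are all assumptions.

```python
def proportional_controller(setpoint, measurement, gain=2.0):
    """Stand-in conventional controller: a simple proportional law."""
    return gain * (setpoint - measurement)


def residual_action(base_action, residual, active,
                    residual_limit=0.5, low=-10.0, high=10.0):
    """Add a bounded learned correction to the base controller's output.

    The correction is applied only when `active` is True, and the final
    command is clipped to the actuator range [low, high]."""
    if active:
        bounded = max(-residual_limit, min(residual_limit, residual))
        base_action += bounded
    return max(low, min(high, base_action))
```

Bounding the residual keeps the final command close to the conventional controller's output, which is one simple way to constrain exploration; how the paper constrains it is not stated in the excerpt.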