Benchmarking Deep Reinforcement Learning for Continuous Control
Recently, researchers have made significant progress combining the advances
in deep learning for learning feature representations with reinforcement
learning. Some notable examples include training agents to play Atari games
based on raw pixel data and to acquire advanced manipulation skills using raw
sensory inputs. However, it has been difficult to quantify progress in the
domain of continuous control due to the lack of a commonly adopted benchmark.
In this work, we present a benchmark suite of continuous control tasks,
including classic tasks like cart-pole swing-up, tasks with very high state and
action dimensionality such as 3D humanoid locomotion, tasks with partial
observations, and tasks with hierarchical structure. We report novel findings
based on the systematic evaluation of a range of implemented reinforcement
learning algorithms. Both the benchmark and reference implementations are
released at https://github.com/rllab/rllab in order to facilitate experimental
reproducibility and to encourage adoption by other researchers.Comment: 14 pages, ICML 201
Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning
Temporal difference (TD) methods constitute a class of methods for learning
predictions in multi-step prediction problems, parameterized by a recency
factor lambda. Currently the most important application of these methods is to
temporal credit assignment in reinforcement learning. Well known reinforcement
learning algorithms, such as AHC or Q-learning, may be viewed as instances of
TD learning. This paper examines the issues of the efficient and general
implementation of TD(lambda) for arbitrary lambda, for use with reinforcement
learning algorithms optimizing the discounted sum of rewards. The traditional
approach, based on eligibility traces, is argued to suffer from both
inefficiency and lack of generality. The TTD (Truncated Temporal Differences)
procedure is proposed as an alternative: it only approximates TD(lambda),
but it requires very little computation per action and can be used with
arbitrary function representation methods. The idea from which it is derived
is simple and not new, but it appears to have been largely unexplored so far.
Encouraging experimental results are presented, suggesting that using lambda
> 0 with the TTD procedure yields a significant learning speedup at
essentially the same cost as standard TD(0) learning.
Comment: See http://www.jair.org/ for any accompanying file
Chance-Constrained Control with Lexicographic Deep Reinforcement Learning
This paper proposes a lexicographic Deep Reinforcement Learning (DeepRL) approach to chance-constrained Markov Decision Processes, in which the controller seeks to ensure that the probability of satisfying each constraint stays above a given threshold. Standard DeepRL approaches require (i) the constraints to be included as additional weighted terms in the cost function, in a multi-objective fashion, and (ii) the introduced weights to be tuned during the training phase of the Deep Neural Network (DNN) according to the probability thresholds. The proposed approach instead trains one constraint-free DNN and one DNN per constraint separately, and then, at each time step, selects which DNN to use depending on the observed system state. The presented solution requires no hyper-parameter tuning beyond the standard DNN ones, even if the probability thresholds change. A lexicographic version of the well-known DeepRL algorithm DQN is also proposed and validated via simulations.
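The per-step network selection lends itself to a short sketch. The rule below is one plausible reading of the abstract, not the paper's exact mechanism: each constraint network is assumed to output, per action, an estimate of the probability that its constraint will be satisfied, and constraints are screened in lexicographic order before the constraint-free network picks among the surviving actions:

```python
import numpy as np

def lexicographic_action(obs, q_obj, constraint_nets, p_thresholds):
    """Hypothetical action selection for lexicographic, chance-constrained
    DeepRL. Assumptions (mine, not the paper's): q_obj(obs) returns the
    per-action values of the constraint-free network, and each network in
    constraint_nets returns per-action estimates of the probability that
    its constraint is satisfied."""
    q_vals = np.asarray(q_obj(obs))
    feasible = np.ones(len(q_vals), dtype=bool)
    for net, p_min in zip(constraint_nets, p_thresholds):
        probs = np.asarray(net(obs))
        ok = feasible & (probs >= p_min)
        if not ok.any():
            # No remaining action clears this threshold: fall back to the
            # action with the highest estimated satisfaction probability.
            return int(np.argmax(np.where(feasible, probs, -np.inf)))
        feasible = ok  # keep only actions that also satisfy this constraint
    # Among actions clearing every threshold, let the constraint-free
    # network choose, i.e. optimize the original cost.
    return int(np.argmax(np.where(feasible, q_vals, -np.inf)))
```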
Pseudorehearsal in actor-critic agents with neural network function approximation
Catastrophic forgetting has a significant negative impact on reinforcement
learning. The purpose of this study is to investigate how pseudorehearsal can
change the performance of an actor-critic agent with neural-network function
approximation. We tested the agent in a pole-balancing task and compared
different pseudorehearsal approaches. We found that pseudorehearsal can assist
learning and decrease forgetting.
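Pseudorehearsal in the sense of Robins (1995) is straightforward to sketch: draw random inputs, label them with the network's own current outputs, and mix these pseudo-items into later training batches so that new updates cannot silently overwrite previously learned behaviour. The sampling distribution and batch size below are illustrative assumptions, not the study's settings:

```python
import numpy as np

def make_pseudo_items(predict, n_items, obs_low, obs_high, rng):
    """Draw random observations and label them with the network's current
    predictions; rehearsing these pseudo-items alongside real data is the
    core of pseudorehearsal. Uniform sampling over the observation box is
    an illustrative assumption."""
    obs_low = np.asarray(obs_low, dtype=float)
    obs_high = np.asarray(obs_high, dtype=float)
    pseudo_x = rng.uniform(obs_low, obs_high, size=(n_items, obs_low.size))
    pseudo_y = np.stack([np.atleast_1d(predict(x)) for x in pseudo_x])
    return pseudo_x, pseudo_y

# Usage sketch for a critic update (hypothetical names): mix pseudo-items
# into each batch so new targets only reshape the function where the new
# data actually says something, e.g.
#   px, py = make_pseudo_items(critic.predict, 32, low, high, rng)
#   critic.fit(np.vstack([batch_x, px]), np.concatenate([batch_y, py]))
```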
The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming
In adaptive dynamic programming, neurocontrol and reinforcement learning, the
objective is for an agent to learn to choose actions so as to minimise a total
cost function. In this paper we show that when discretized time is used to
model the motion of the agent, it can be very important to do "clipping" on the
motion of the agent in the final time step of the trajectory. By clipping we
mean that the final time step of the trajectory is to be truncated such that
the agent stops exactly at the first terminal state reached, and no distance
further. We demonstrate that when clipping is omitted, learning performance
can fail to reach the optimum, and that when clipping is done properly,
learning performance can improve significantly.
The clipping problem we describe affects algorithms which use explicit
derivatives of the model functions of the environment to calculate a learning
gradient. These include Backpropagation Through Time for Control, and methods
based on Dual Heuristic Dynamic Programming. However the clipping problem does
not significantly affect methods based on Heuristic Dynamic Programming,
Temporal Differences or Policy Gradient Learning algorithms. Similarly, the
clipping problem does not affect fixed-length finite-horizon problems.
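The clipping operation itself fits in a few lines. The sketch below uses a forward-Euler step and linear interpolation to the terminal boundary; the function names and the signed-distance interface are assumptions made for illustration, not the paper's construction:

```python
def step_with_clipping(x, u, f, dt, terminal_dist):
    """Sketch of 'clipping' the final time step. f(x, u) is the state
    derivative used for a forward-Euler step, and terminal_dist(x) is a
    signed distance that crosses zero exactly on the terminal boundary
    (negative before the terminal set, non-negative once reached). Both
    interfaces are illustrative assumptions.
    Returns (next_state, elapsed_time, done)."""
    dx = dt * f(x, u)
    x_full = x + dx
    d0, d1 = terminal_dist(x), terminal_dist(x_full)
    if d0 < 0.0 <= d1:
        # The full step would overshoot the terminal set. Linearly
        # interpolate the crossing fraction tau in (0, 1] and stop the
        # trajectory exactly on the boundary, so the final-step cost is
        # accrued over tau * dt rather than the whole dt.
        tau = d0 / (d0 - d1)
        return x + tau * dx, tau * dt, True
    return x_full, dt, d1 >= 0.0
```

Since the crossing fraction tau depends on the state and action, methods that differentiate through the model see this dependence in the final step, which is consistent with the abstract's point that derivative-based methods are the ones affected.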