33rd International Conference on Machine Learning, ICML 2016
This is the version of record. It originally appeared on arXiv at http://arxiv.org/abs/1603.00748.

Abstract

Model-free reinforcement learning has been successfully
applied to a range of challenging problems,
and has recently been extended to handle
large neural network policies and value functions.
However, the sample complexity of model-free
algorithms, particularly when using high-dimensional
function approximators, tends to
limit their applicability to physical systems. In
this paper, we explore algorithms and representations
to reduce the sample complexity of
deep reinforcement learning for continuous control
tasks. We propose two complementary techniques
for improving the efficiency of such algorithms.
First, we derive a continuous variant of
the Q-learning algorithm, which we call normalized
advantage functions (NAF), as an alternative
to the more commonly used policy gradient and
actor-critic methods. The NAF representation allows
us to apply Q-learning with experience replay to
continuous tasks, and substantially improves performance
on a set of simulated robotic control
tasks. To further improve the efficiency of our
approach, we explore the use of learned models
for accelerating model-free reinforcement learning.
We show that iteratively refitted local linear
models are especially effective for this, and
demonstrate substantially faster learning on domains
where such models are applicable.
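
As a worked sketch of the NAF construction summarized above (notation assumed for illustration: state x, action u, network parameters θ), the Q-function is decomposed into a value term and a quadratic advantage term:

\[
Q(x, u \mid \theta^Q) = V(x \mid \theta^V) + A(x, u \mid \theta^A),
\]
\[
A(x, u \mid \theta^A) = -\tfrac{1}{2}\,\big(u - \mu(x \mid \theta^\mu)\big)^\top P(x \mid \theta^P)\,\big(u - \mu(x \mid \theta^\mu)\big),
\]

where P(x | θ^P) = L(x | θ^P) L(x | θ^P)^⊤ is a positive-definite matrix built from a lower-triangular network output L. Because the advantage term is nonpositive and vanishes at u = μ(x), the greedy action \(\arg\max_u Q(x, u) = \mu(x \mid \theta^\mu)\) is available in closed form, which is what makes Q-learning with experience replay tractable for continuous action spaces.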
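For the model-based acceleration, a minimal sketch of the iteratively refitted local linear models, assuming time-varying linear-Gaussian dynamics re-estimated by regression on recent rollouts (the exact fitting procedure is detailed in the paper):

\[
p(x_{t+1} \mid x_t, u_t) = \mathcal{N}\!\big(F_t [x_t; u_t] + f_t,\; N_t\big),
\]

with the parameters (F_t, f_t, N_t) refit after each batch of on-policy samples. Such models are cheap to fit from few samples and can generate short synthetic rollouts to augment the experience available to the model-free learner.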