PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Deisenroth, MP; Rasmussen, CE

Authors: MP Deisenroth, CE Rasmussen
Publication date: 1 January 2011
Publisher: IMLS

Abstract

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.

Copyright 2011 by the author(s)/owner(s).

Available Versions

Spiral - Imperial College Digital Repository

oai:spiral.imperial.ac.uk:1004...

Last time updated on 21/10/2013