Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces
In this paper, we set forth a new vision of reinforcement learning developed
by us over the past few years, one that yields mathematically rigorous
solutions to important questions that have long remained unresolved:
(i) how to design reliable, convergent, and robust reinforcement learning
algorithms; (ii) how to guarantee that reinforcement learning satisfies
pre-specified "safety" guarantees and remains in a stable region of the
parameter space; (iii) how to design "off-policy" temporal difference learning
algorithms in a reliable and stable manner; and finally (iv) how to integrate
the study of reinforcement learning into the rich theory of stochastic
optimization. We provide detailed answers to all these questions
using the powerful framework of proximal operators.
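As background (this definition is standard in convex analysis, not specific to the paper), the proximal operator that gives the framework its name is, for a convex function f and step size λ > 0,

\[
\operatorname{prox}_{\lambda f}(x) \;=\; \arg\min_{u}\Big( f(u) + \tfrac{1}{2\lambda}\,\lVert u - x \rVert_2^2 \Big),
\]

i.e., a step that trades off decreasing f against staying close to the current point x.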
The key idea that emerges is the use of primal-dual spaces connected through
a Legendre transform. This enables temporal difference updates to
occur in dual spaces, yielding a variety of important technical advantages. The
Legendre transform elegantly generalizes past algorithms for solving
reinforcement learning problems, such as natural gradient methods, which we
show relate closely to the previously unconnected framework of mirror descent
methods. Equally importantly, proximal operator theory enables the systematic
development of operator splitting methods that show how to safely and reliably
decompose complex products of gradients that occur in recent variants of
gradient-based temporal difference learning. This key technical innovation
makes it possible to finally design "true" stochastic gradient methods for
reinforcement learning. Finally, Legendre transforms enable a variety of other
benefits, including modeling sparsity and domain geometry. Our work builds
extensively on recent work on the convergence of saddle-point algorithms, and
on the theory of monotone operators.
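To make the primal-dual pattern described above concrete, here is a minimal sketch of one mirror descent step using the negative-entropy mirror map on the probability simplex (which recovers exponentiated-gradient-style multiplicative updates). The function name and simplex setting are illustrative assumptions, not the paper's specific algorithm.

```python
import numpy as np

def mirror_descent_step(w, grad, eta):
    """One mirror descent step with the negative-entropy mirror map.

    Illustrative sketch: maps the primal point w to the dual space via
    the Legendre transform of negative entropy (log, up to an additive
    constant absorbed by the final normalization), takes a gradient step
    there, and maps back via exp, re-normalizing onto the simplex.
    """
    theta = np.log(w)            # primal -> dual space
    theta = theta - eta * grad   # gradient step taken in the dual space
    w_new = np.exp(theta)        # dual -> primal (Legendre conjugate map)
    return w_new / w_new.sum()   # KL projection onto the simplex = normalization
```

With the Euclidean mirror map this same pattern reduces to ordinary gradient descent, which is one way to see how the Legendre transform generalizes past methods.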
Exponentiated Gradient Methods for Reinforcement Learning
This paper introduces and evaluates a natural extension of linear exponentiated gradient methods that makes them applicable to reinforcement learning problems. Just as these methods speed up supervised learning, we find that they can also increase the efficiency of reinforcement learning. Comparisons are made with conventional reinforcement learning methods on two test problems using CMAC function approximators and replacing traces. On a small prediction task, exponentiated gradient methods showed no improvement, but on a larger control task (Mountain Car) they improved the learning speed by approximately 25%. A more detailed analysis suggests that the difference may be due to the distribution of irrelevant features.

1 INTRODUCTION

Exponentiated gradient (EG) methods were first proposed by Littlestone (1988) in the form of the Winnow algorithm for training linear threshold classifiers. Kivinen and Warmuth (1994) proposed the first EG methods for on-line linear regression. The analogou..
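For concreteness, below is a minimal sketch of a single EG update for on-line linear regression with squared loss, in the spirit of Kivinen and Warmuth (1994); the function name and the normalize-to-one convention are illustrative assumptions, and this is the supervised building block rather than the paper's reinforcement learning extension.

```python
import numpy as np

def eg_update(w, x, y, eta):
    """One exponentiated-gradient update for on-line linear regression.

    Illustrative sketch: weights stay positive and are renormalized to
    sum to one after each multiplicative update, unlike the additive
    update of ordinary gradient descent.
    """
    y_hat = np.dot(w, x)                           # linear prediction
    r = w * np.exp(-2.0 * eta * (y_hat - y) * x)   # multiplicative step on the squared-loss gradient
    return r / r.sum()                             # normalization keeps w on the simplex
```

The multiplicative form is what drives the reported sensitivity to irrelevant features: weights on uninformative inputs can decay toward zero exponentially fast.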