A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD

Davy, Manuel; Loth, Manuel; Preux, Philippe

Authors: Manuel Davy, Manuel Loth, Philippe Preux
Publication date: 25 April 2007
Publisher: HAL CCSD

Abstract

This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, and residual-gradient TD. It is asserted that they all consist in minimizing a gradient function and differ by the form of this function and their means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms (iLSTD, Full-gradient TD, and EGD TD) form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and optimistic policy iteration.
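
As a rough illustration of the unified view described in the abstract, the sketch below contrasts two of the classical algorithms it names, restricted to lambda = 0. It is not taken from the paper: the 3-state chain, the feature matrix PHI, the step size, and all function names are hypothetical, and the "gradient" is assumed here to mean the summed TD(0) update b - A theta. Under those assumptions, TD(0) reduces this quantity by small sample-by-sample steps, while LSTD(0) sets it to zero by solving a linear system.

```python
import numpy as np

GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical toy problem: deterministic cycle 0 -> 1 -> 2 -> 0,
# with reward 1 received when leaving state 2.
PHI = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])  # one feature row per state
SAMPLES = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 0)]  # (state, reward, next state)


def summed_update(theta, samples=SAMPLES):
    """Sum of the TD(0) updates over all samples, i.e. b - A @ theta."""
    g = np.zeros_like(theta)
    for s, r, s_next in samples:
        td_error = r + GAMMA * PHI[s_next] @ theta - PHI[s] @ theta
        g += td_error * PHI[s]
    return g


def td0(samples=SAMPLES, alpha=0.05, sweeps=1000):
    """TD(0): one small step per sample, repeated over the sample set."""
    theta = np.zeros(PHI.shape[1])
    for _ in range(sweeps):
        for s, r, s_next in samples:
            td_error = r + GAMMA * PHI[s_next] @ theta - PHI[s] @ theta
            theta += alpha * td_error * PHI[s]
    return theta


def lstd0(samples=SAMPLES):
    """LSTD(0): accumulate A and b, then solve A @ theta = b directly."""
    k = PHI.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in samples:
        A += np.outer(PHI[s], PHI[s] - GAMMA * PHI[s_next])
        b += r * PHI[s]
    return np.linalg.solve(A, b)


theta_td = td0()
theta_lstd = lstd0()
```

Evaluating summed_update at theta_lstd gives zero up to rounding, whereas at theta_td it is small but nonzero because of the constant step size. This is only meant to show the two algorithms acting on the same quantity by different means.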

Available Versions

HAL - Lille 3

oai:HAL:inria-00116936v2

Last time updated on 11/11/2016

INRIA a CCSD electronic archive server

oai:HAL:inria-00116936v2

Last time updated on 10/11/2016