A Model based Search Method for Prediction in Model-free Markov Decision Process

Authors: Shalabh Bhatnagar, Ajin George Joseph
Publication venue: IEEE, 345 E 47TH ST, NEW YORK, NY 10017 USA

In this paper, we provide a new algorithm for the problem of prediction in the model-free Markov decision process (MDP) setting, i.e., estimating the value function of a given policy using a linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithm is a multi-timescale variant of the very popular cross-entropy (CE) method, which is a model-based search method for finding the global optimum of a real-valued function. This is the first time a model-based search method has been used for the prediction problem. A proof of convergence using the ODE method is provided. The theoretical results are supplemented with experimental comparisons. The algorithm achieves good performance fairly consistently on many benchmark problems.
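
For readers unfamiliar with the CE method mentioned in the abstract, the following is a minimal sketch of the basic, single-timescale version applied to maximizing a real-valued function. It is not the multi-timescale prediction algorithm of the paper. The function name `cross_entropy_maximize`, the independent-Gaussian sampling distribution, all parameter values, and the test function `objective` are assumptions chosen for illustration only.

```python
import random
import statistics


def cross_entropy_maximize(f, dim, n_samples=100, elite_frac=0.1,
                           n_iters=50, init_std=5.0, seed=0):
    """Basic cross-entropy search using one independent Gaussian per coordinate.

    Illustrative sketch only; not the algorithm proposed in the paper.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Draw candidate points from the current sampling distribution.
        samples = [[rng.gauss(mean[j], std[j]) for j in range(dim)]
                   for _ in range(n_samples)]
        # Keep the highest-scoring fraction (the "elite" set).
        samples.sort(key=f, reverse=True)
        elite = samples[:n_elite]
        # Refit the Gaussian parameters to the elite samples.
        for j in range(dim):
            column = [x[j] for x in elite]
            mean[j] = statistics.fmean(column)
            std[j] = statistics.pstdev(column) + 1e-8  # avoid a zero std
    return mean


def objective(x):
    # Hypothetical test function with its global maximum at (1, -2).
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


best = cross_entropy_maximize(objective, dim=2)
```

Each iteration samples from the current distribution, keeps the best-scoring samples, and refits the distribution to them, so the search concentrates around high-value regions without using gradients of `f`.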

Open Access Repository of IISc Research Publications