    A Model based Search Method for Prediction in Model-free Markov Decision Process

    In this paper, we provide a new algorithm for the problem of prediction in the model-free MDP setting, i.e., estimating the value function of a given policy using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithm is a multi-timescale variant of the very popular cross-entropy (CE) method, which is a model-based search method for finding the global optimum of a real-valued function. This is the first time a model-based search method has been used for the prediction problem. A proof of convergence using the ODE method is provided. The theoretical results are supplemented with experimental comparisons. The algorithm achieves good performance fairly consistently on many benchmark problems.
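
    For orientation, the following is a minimal sketch of the basic (batch, single-timescale) cross-entropy method applied to fitting linear value-function weights. It is not the paper's multi-timescale algorithm. The feature matrix `Phi`, the target values `v_target`, the function name `cross_entropy_maximize`, and all hyperparameters are hypothetical choices made for illustration. The objective here assumes the target values are known, which is not the case in the model-free setting. The sampling distribution keeps a d x d covariance matrix, which is one way a quadratic memory cost in the number of features can arise.

```python
import numpy as np

def cross_entropy_maximize(objective, dim, n_samples=200, elite_frac=0.1,
                           n_iters=50, smoothing=0.7, seed=0):
    """Basic CE search: maximize `objective` over R^dim with a Gaussian model."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    cov = 4.0 * np.eye(dim)  # d x d matrix: memory grows quadratically in dim
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Draw candidate solutions from the current Gaussian model.
        samples = rng.multivariate_normal(mean, cov, size=n_samples)
        scores = np.array([objective(x) for x in samples])
        # Keep the best-scoring candidates (the "elite" set).
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the model to the elite set, with smoothing for stability.
        new_mean = elite.mean(axis=0)
        new_cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(dim)
        mean = smoothing * new_mean + (1.0 - smoothing) * mean
        cov = smoothing * new_cov + (1.0 - smoothing) * cov
    return mean

# Hypothetical toy problem: 4 states, 2 features per state.
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
w_true = np.array([0.5, -1.0])
v_target = Phi @ w_true  # assumed known here, purely for illustration

def neg_mse(w):
    # Negative mean squared error between Phi @ w and the target values.
    return -np.mean((Phi @ w - v_target) ** 2)

w_hat = cross_entropy_maximize(neg_mse, dim=2)
print("estimated weights:", w_hat)
```

    In this toy problem the objective is a concave quadratic, so plain least squares would solve it directly. The sketch only illustrates the sample, select, and refit loop that CE-style methods share.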