    Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes

    Reinforcement learning methods for discrete and semi-Markov decision problems, such as Real-Time Dynamic Programming, can be generalized for Controlled Diffusion Processes. The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton-Jacobi-Bellman (HJB) type. Numerical analysis provides multi-grid methods for this kind of equation. In the case of Learning Control, however, the systems of equations on the various grid levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention needs to be directed toward the type of time and space discretization during the observation. An algorithm for multi-grid observation is proposed. The multi-grid algorithm is demonstrated on a simple queuing problem.
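
    For orientation, a generic form of the boundary value problem mentioned in the abstract can be sketched as follows. The abstract does not state its equation, so everything here is an assumption made for illustration: a controlled diffusion dx = b(x,u) dt + sigma(x,u) dW on a domain Omega, a running cost c, a discount rate rho, boundary data g, and a value function V. The paper's actual formulation may differ.

    ```latex
    % Assumed notation (not from the source): b = drift, \sigma = diffusion matrix,
    % c = running cost, \rho = discount rate, g = boundary cost, V = value function.
    \begin{aligned}
    \rho\, V(x) &= \min_{u \in U} \Big[\, c(x,u) + b(x,u) \cdot \nabla V(x)
        + \tfrac{1}{2} \operatorname{tr}\!\big( \sigma(x,u)\, \sigma(x,u)^{\top}\, \nabla^{2} V(x) \big) \Big],
        && x \in \Omega, \\
    V(x) &= g(x), && x \in \partial\Omega .
    \end{aligned}
    ```

    In this sketch the equation is second-order through the Hessian term, and it is fully nonlinear when the diffusion coefficient depends on the control, because the minimization then acts on the second-derivative term as well.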