A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon

Abstract

Many reinforcement learning algorithms, like Q-Learning or R-Learning, correspond to adaptive methods for solving Markovian decision problems in infinite-horizon when no model is available. In this article we consider the particular framework of nonstationary finite-horizon Markov Decision Processes. After establishing a relationship between the finite-horizon total-reward criterion and the average-reward criterion in finite-horizon, we define QH-Learning and RH-Learning for finite-horizon MDPs. Then we introduce the Ordinary Differential Equation (ODE) method to conduct a learning rate analysis of QH-Learning and RH-Learning. RH-Learning appears to be a version of QH-Learning with matrix-valued stepsizes, the corresponding gain matrix being very close to the optimal matrix that results from the ODE analysis. Experimental results confirm this performance hierarchy.

1 Introduction

The search for optimal policies in Markov Decision Processes has been deeply studied according to…
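The abstract names QH-Learning without giving its update rule, so the following is a minimal, hypothetical sketch of a horizon-indexed Q-Learning update of the kind the paper studies: one Q-table per number of remaining stages, with the table for zero steps-to-go fixed at zero, and a scalar stepsize. The function name `qh_update` and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def qh_update(Q, h, s, a, r, s_next, alpha):
    """One sketched QH-Learning step.

    Q: array of shape (H + 1, n_states, n_actions); Q[0] stays 0.
    h: steps-to-go when action a was taken in state s, 1 <= h <= H.
    Update: Q_h(s,a) += alpha * (r + max_a' Q_{h-1}(s',a') - Q_h(s,a)).
    """
    # One-step lookahead bootstraps from the (h - 1)-steps-to-go table.
    target = r + Q[h - 1, s_next].max()
    Q[h, s, a] += alpha * (target - Q[h, s, a])

# Tiny usage example on a made-up 2-state, 2-action problem with
# random rewards and transitions standing in for the true dynamics.
H, n_states, n_actions = 3, 2, 2
Q = np.zeros((H + 1, n_states, n_actions))  # Q[0] is never updated
rng = np.random.default_rng(0)
for episode in range(100):
    s = 0
    for h in range(H, 0, -1):            # h counts steps-to-go
        a = int(rng.integers(n_actions)) # exploratory behaviour policy
        r = float(rng.normal())          # stand-in reward
        s_next = int(rng.integers(n_states))  # stand-in transition
        qh_update(Q, h, s, a, r, s_next, alpha=0.1)
        s = s_next
```

With a scalar stepsize this is ordinary Q-Learning applied stage by stage; the RH-Learning variant described in the abstract would, on this reading, replace the scalar alpha by a matrix-valued gain close to the optimal one derived from the ODE analysis.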
