Policy Evaluation in Batch Reinforcement Learning

Abstract

Policy evaluation is a central problem in batch reinforcement learning. It refers to the assessment of a given decision policy using logged data. Practical methods for policy evaluation typically involve some form of function approximation so as to relieve issues caused by the enormous scales of real-world systems and the length of planning horizons. The thesis is primarily concerned with statistical analysis of policy evaluation and show how function approximation improves the efficacy of methods. In the first part of the thesis, we consider off-policy evaluation with linear function approximation. We show the equivalence of a regression-based fitted Q-iteration method, marginalized importance sampling methods and a model-based method that estimates a conditional mean embedding of the transition operator. Moreover, our theory reveals that the hardness of off-policy evaluation is determined by the mismatch between data and target distributions, which is reflected by a projected chi-square-divergence in error bounds. We prove that the estimators are minimax optimal in terms of sample size, planning horizon and the mismatch term. In the second part of the thesis, we study on-policy evaluation with kernel methods. In particular, we are interested in a regularized form of the kernel least-squares temporal-difference (LSTD) estimate. We use empirical process theory techniques to derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator, as well as the instance-dependent variance of the Bellman residual error. In addition, we prove minimax lower bounds over sub-classes of MRPs, which shows that our rate is optimal in terms of the sample size and the effective horizon. Our analysis sheds light on how we could tune the complexity of function class to favorably strike a balance between the curse of dimension and the curse of horizon in policy evaluation

    Similar works

    Full text

    thumbnail-image

    Available Versions