Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

Abstract

In this paper we study a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yield a linear regret with respect to the number of learning episodes $N$. We then propose an algorithm with continuous-time controls based on regularized least-squares estimation, and establish a sublinear regret bound of order $\tilde{O}(\sqrt{N})$. The analysis consists of two parts: a parameter estimation error bound, which relies on properties of sub-exponential random variables and double stochastic integrals; and a perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity properties.
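For context, the two ingredients named in the abstract have standard forms in the continuous-time LQ setting. The following is a minimal LaTeX sketch of a generic ridge-regularized least-squares estimator and of the finite-horizon continuous-time Riccati equation, written under standard LQ assumptions rather than the paper's exact formulation; the symbols $\theta$, $Z^{(n)}$, $\Delta X^{(n)}$, $\lambda$, and the cost matrices $Q$, $R$, $G$ are illustrative placeholders, not notation taken from the paper.

    % Ridge-regularized least-squares estimate of the unknown dynamics
    % parameter \theta = (A, B), built from observations collected over
    % N episodes (illustrative form; the paper's estimator may differ):
    \hat{\theta}_N = \operatorname*{arg\,min}_{\theta}
      \sum_{n=1}^{N} \big\| \Delta X^{(n)} - \theta\, Z^{(n)} \big\|^2
      + \lambda \,\| \theta \|_F^2 .

    % Finite-horizon continuous-time Riccati equation; its solution P(t)
    % yields the optimal linear feedback u_t = -R^{-1} B^\top P(t) x_t:
    -\dot{P}(t) = A^\top P(t) + P(t) A
      - P(t)\, B R^{-1} B^\top P(t) + Q,
    \qquad P(T) = G .

The perturbation analysis mentioned in the abstract concerns how the solution $P(t)$ of such an equation responds when the estimated parameters $(\hat{A}, \hat{B})$ deviate from the true $(A, B)$.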
