Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

Abstract

In this paper we study a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yield a linear regret with respect to the number of learning episodes $N$. We then propose an algorithm with continuous-time controls based on regularized least-squares estimation, and establish a sublinear regret bound of order $\tilde{O}(\sqrt{N})$. The analysis consists of two parts: a parameter estimation error bound, which relies on properties of sub-exponential random variables and double stochastic integrals; and a perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity properties.
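For context, the two ingredients named in the abstract have standard forms in the continuous-time LQ setting. The following is a minimal LaTeX sketch of a generic ridge-regularized least-squares estimator and of the finite-horizon continuous-time Riccati equation, written under standard LQ assumptions rather than the paper's exact formulation; the symbols $\theta$, $Z^{(n)}$, $\Delta X^{(n)}$, $\lambda$, and the cost matrices $Q$, $R$, $G$ are illustrative placeholders, not notation taken from the paper.

    % Ridge-regularized least-squares estimate of the unknown dynamics
    % parameter \theta = (A, B), built from observations collected over
    % N episodes (illustrative form; the paper's estimator may differ):
    \hat{\theta}_N = \operatorname*{arg\,min}_{\theta}
      \sum_{n=1}^{N} \big\| \Delta X^{(n)} - \theta\, Z^{(n)} \big\|^2
      + \lambda \,\| \theta \|_F^2 .

    % Finite-horizon continuous-time Riccati equation; its solution P(t)
    % yields the optimal linear feedback u_t = -R^{-1} B^\top P(t) x_t:
    -\dot{P}(t) = A^\top P(t) + P(t) A
      - P(t)\, B R^{-1} B^\top P(t) + Q,
    \qquad P(T) = G .

The perturbation analysis mentioned in the abstract concerns how the solution $P(t)$ of such an equation responds when the estimated parameters $(\hat{A}, \hat{B})$ deviate from the true $(A, B)$.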
