In this paper we study a continuous-time linear-quadratic reinforcement
learning problem in an episodic setting. We first show that na\"ive
discretization and piecewise approximation with discrete-time RL algorithms
yield a regret that grows linearly in the number of learning episodes $N$. We
then propose an algorithm with continuous-time controls based on regularized
least-squares estimation, and establish a sublinear regret bound of order
$\widetilde{O}(\sqrt{N})$. The analysis consists of two parts: a parameter
estimation error bound, which relies on properties of sub-exponential random
variables and double stochastic integrals; and a perturbation analysis, which
establishes the robustness of the associated continuous-time Riccati equation
by exploiting its regularity property.
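
For orientation only, a generic episodic continuous-time LQ model with unknown drift coefficients and a ridge-regularized least-squares estimator can be sketched as follows; this is a minimal illustration under standard assumptions, and the symbols $A$, $B$, $Q$, $R$, $G$, $\lambda$, and the specific form of the dynamics are not taken from the paper. The state evolves as
\[
\mathrm{d}X_t = (A X_t + B u_t)\,\mathrm{d}t + \mathrm{d}W_t,
\qquad
J(u) = \mathbb{E}\Big[\int_0^T \big(X_t^\top Q X_t + u_t^\top R u_t\big)\,\mathrm{d}t + X_T^\top G X_T\Big].
\]
Writing $\theta = (A, B)$ and $Z_t = (X_t^\top, u_t^\top)^\top$, a regularized least-squares estimate after $n$ episodes takes the form
\[
\hat{\theta}_n
= \Big(\sum_{i=1}^{n} \int_0^T \mathrm{d}X_t^{(i)}\, (Z_t^{(i)})^\top\Big)
  \Big(\sum_{i=1}^{n} \int_0^T Z_t^{(i)} (Z_t^{(i)})^\top\,\mathrm{d}t + \lambda I\Big)^{-1},
\]
and the continuous-time Riccati equation associated with an LQ model of this form reads
\[
\dot{P}_t + A^\top P_t + P_t A - P_t B R^{-1} B^\top P_t + Q = 0,
\qquad P_T = G.
\]
The perturbation analysis mentioned above concerns how solutions of such a Riccati equation respond to perturbations of the model coefficients.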