57 research outputs found
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
In this paper, we propose new problem-independent lower bounds on the sample
complexity and regret in episodic MDPs, with a particular focus on the
non-stationary case in which the transition kernel is allowed to change in each
stage of the episode. Our main contribution is a novel lower bound of
on the sample complexity of an
-PAC algorithm for best policy identification in a
non-stationary MDP. This lower bound relies on a construction of "hard MDPs"
which is different from the ones previously used in the literature. Using this
same class of MDPs, we also provide a rigorous proof of the
regret bound for non-stationary MDPs. Finally, we
discuss connections to PAC-MDP lower bounds
- …