2 research outputs found

    Regret Bounds for Reinforcement Learning via Markov Chain Concentration

    We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter …
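    For context, a standard way to measure regret in this undiscounted, non-episodic setting compares the rewards collected to the optimal average reward $\rho^*$; the display below is a sketch of that standard definition under assumed notation, not a quotation from the paper:

        % Regret after T steps against the optimal average reward \rho^*
        % (standard non-episodic definition; notation assumed, not quoted).
        \mathrm{Regret}(T) = T\rho^* - \sum_{t=1}^{T} r_t,
        \qquad
        \mathbb{E}\big[\mathrm{Regret}(T)\big] = \tilde{O}\!\left(\sqrt{t_{\rm mix}\, S A T}\right)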

    Linear dependence of stationary distributions in ergodic Markov decision processes

    In ergodic MDPs we consider stationary distributions of policies that coincide in all but $n$ states, in which one of two possible actions is chosen. We give conditions and formulas for linear dependence of the stationary distributions of $n + 2$ such policies, and show some results about combinations and mixtures of policies.
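    To make the objects concrete, here is a minimal Python sketch (the random MDP and all names are illustrative assumptions, not the paper's construction): it builds a small ergodic MDP with two actions, enumerates the policies that differ only in $n$ designated states, computes each policy's stationary distribution by solving $\mu P_\pi = \mu$ with $\sum_s \mu_s = 1$, and reports the rank of the stacked distributions. The paper gives conditions under which $n + 2$ of these distributions are linearly dependent (i.e. rank deficient); a random instance need not satisfy those conditions.

        # Illustrative sketch only; not the paper's construction.
        import numpy as np

        rng = np.random.default_rng(0)
        S, n = 5, 2  # number of states; policies differ only in the first n states

        # Random two-action MDP with strictly positive transition rows, so every
        # induced chain is irreducible and aperiodic (hence uniquely ergodic).
        P = rng.random((2, S, S)) + 0.1
        P /= P.sum(axis=2, keepdims=True)

        def chain(policy):
            # Transition matrix of the Markov chain induced by a deterministic policy.
            return np.array([P[policy[s], s] for s in range(S)])

        def stationary(P_pi):
            # Solve mu P_pi = mu, sum(mu) = 1 as one consistent linear system.
            A = np.vstack([P_pi.T - np.eye(S), np.ones(S)])
            b = np.zeros(S + 1)
            b[-1] = 1.0
            mu, *_ = np.linalg.lstsq(A, b, rcond=None)
            return mu

        # All 2**n policies that play action 0 outside the first n states and
        # choose freely inside them; stack their stationary distributions as rows.
        mus = []
        for bits in range(2 ** n):
            policy = np.zeros(S, dtype=int)
            for i in range(n):
                policy[i] = (bits >> i) & 1
            mus.append(stationary(chain(policy)))

        M = np.array(mus)
        print(f"rank of {2 ** n} stationary distributions: {np.linalg.matrix_rank(M)}")

    Because each induced chain is ergodic, the linear system above has a unique solution, so the least-squares solve recovers the exact stationary distribution.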