4,712 research outputs found
Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
We derive sublinear regret bounds for undiscounted reinforcement learning in
continuous state space. The proposed algorithm combines state aggregation with
the use of upper confidence bounds for implementing optimism in the face of
uncertainty. Beside the existence of an optimal policy which satisfies the
Poisson equation, the only assumptions made are Holder continuity of rewards
and transition probabilities
- …