231 research outputs found
Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
We derive sublinear regret bounds for undiscounted reinforcement learning in
continuous state space. The proposed algorithm combines state aggregation with
the use of upper confidence bounds for implementing optimism in the face of
uncertainty. Beside the existence of an optimal policy which satisfies the
Poisson equation, the only assumptions made are Holder continuity of rewards
and transition probabilities
Prediction with Expert Advice under Discounted Loss
We study prediction with expert advice in the setting where the losses are
accumulated with some discounting---the impact of old losses may gradually
vanish. We generalize the Aggregating Algorithm and the Aggregating Algorithm
for Regression to this case, propose a suitable new variant of exponential
weights algorithm, and prove respective loss bounds.Comment: 26 pages; expanded (2 remarks -> theorems), some misprints correcte
- …