Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

Author: Ortner Ronald
Ryabko Daniil
Publication date: 01/01/2012

We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
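As a rough illustration of the two ingredients named in the abstract, the following is a minimal sketch, not the authors' algorithm. It assumes a one-dimensional state space [0, 1] split into equal intervals, rewards in [0, 1], and a Hoeffding-style confidence bonus. The names `aggregate`, `OptimisticRewardTable`, and `delta` are hypothetical and chosen only for this example.

```python
import math


def aggregate(state: float, n_intervals: int) -> int:
    """Map a state in [0, 1] to the index of the interval containing it."""
    if not 0.0 <= state <= 1.0:
        raise ValueError("state must lie in [0, 1]")
    # The right endpoint 1.0 is assigned to the last interval.
    return min(int(state * n_intervals), n_intervals - 1)


class OptimisticRewardTable:
    """Empirical mean rewards per (interval, action) plus a confidence bonus.

    Assumption for this sketch: rewards lie in [0, 1] and the bonus is a
    Hoeffding-style term; the paper's actual confidence intervals may differ.
    """

    def __init__(self, n_intervals: int, n_actions: int, delta: float = 0.05):
        self.n_intervals = n_intervals
        self.delta = delta
        self.counts = [[0] * n_actions for _ in range(n_intervals)]
        self.sums = [[0.0] * n_actions for _ in range(n_intervals)]

    def update(self, state: float, action: int, reward: float) -> None:
        i = aggregate(state, self.n_intervals)
        self.counts[i][action] += 1
        self.sums[i][action] += reward

    def optimistic_reward(self, state: float, action: int) -> float:
        i = aggregate(state, self.n_intervals)
        n = self.counts[i][action]
        if n == 0:
            return 1.0  # unvisited pairs get the largest possible reward
        mean = self.sums[i][action] / n
        bonus = math.sqrt(math.log(1.0 / self.delta) / (2 * n))
        return min(1.0, mean + bonus)
```

Nearby states share statistics through their common interval, which is where a continuity assumption such as Hölder continuity would be needed to control the aggregation error.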

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Rennes 1

Learning to Control in Metric Space with Optimal Regret

Author: Ni Chengzhuo
Wang Mengdi
Yang Lin F.
Publication date: 04/05/2019

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose that the transition dynamics and reward function are unknown, but the state and action space is endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after $K$ episodes is $O(HL(KH)^{\frac{d-1}{d}})$, where $L$ is a smoothness parameter and $d$ is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to work for more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieves the optimal regret.
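To get a feel for how the stated bound scales, the sketch below evaluates the expression HL(KH)^((d-1)/d) with the constant hidden by the O-notation set to 1. The abstract does not define H; it is assumed here to be the episode horizon. The function name `regret_bound` is hypothetical.

```python
def regret_bound(K: int, H: int, L: float, d: float) -> float:
    """Evaluate H * L * (K * H) ** ((d - 1) / d), ignoring hidden constants.

    Assumptions for this sketch: K is the number of episodes, H the episode
    horizon, L the smoothness parameter, and d >= 1 the doubling dimension.
    """
    if d < 1:
        raise ValueError("d is assumed to be at least 1 in this sketch")
    return H * L * (K * H) ** ((d - 1) / d)


def per_episode_regret(K: int, H: int, L: float, d: float) -> float:
    """Average regret per episode, which scales as K ** (-1 / d)."""
    return regret_bound(K, H, L, d) / K
```

Because the exponent (d-1)/d is below 1 for any finite d, the average regret per episode shrinks as K grows, but more slowly when d is large.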

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Prediction with Expert Advice under Discounted Loss

Author: A. Chernov
B. Schölkopf
D. Haussler
D.A. Harville
E.F. Beckenbach
E.S. Gardner
J.F. Muth
M. Herbster
N. Cesa-Bianchi
R. Sutton
V. Vovk
Y. Kalnishkan
Publication date: 01/01/2010

We study prediction with expert advice in the setting where the losses are accumulated with some discounting, so that the impact of old losses may gradually vanish. We generalize the Aggregating Algorithm and the Aggregating Algorithm for Regression to this case, propose a suitable new variant of the exponential weights algorithm, and prove the respective loss bounds. Comment: 26 pages; expanded (2 remarks -> theorems), some misprints corrected
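The idea of discounting accumulated losses can be sketched with a generic exponential weights scheme. This is not necessarily the variant proposed in the paper. It assumes a constant discount factor `alpha` in (0, 1] and a learning rate `eta`; both names and the function `discounted_exponential_weights` are hypothetical.

```python
import math


def discounted_exponential_weights(expert_losses, eta=1.0, alpha=0.9):
    """Return the weight vector used at each round.

    expert_losses: list of rounds, each a list with one loss per expert.
    The discounted cumulative loss follows c <- alpha * c + loss, so older
    losses fade geometrically. Weights are proportional to exp(-eta * c).
    """
    n_experts = len(expert_losses[0])
    cumulative = [0.0] * n_experts
    history = []
    for losses in expert_losses:
        # Subtract the minimum before exponentiating for numerical stability.
        shift = min(cumulative)
        raw = [math.exp(-eta * (c - shift)) for c in cumulative]
        total = sum(raw)
        history.append([w / total for w in raw])
        # Discount the past losses, then add the losses of this round.
        cumulative = [alpha * c + loss for c, loss in zip(cumulative, losses)]
    return history
```

With `alpha = 1.0` this reduces to the usual undiscounted exponential weights; smaller values let the weights recover faster when a previously poor expert starts performing well.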

arXiv.org e-Print Archive

Crossref

University of Brighton Research Portal

University of Bedfordshire Repository