Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
We derive sublinear regret bounds for undiscounted reinforcement learning in
continuous state space. The proposed algorithm combines state aggregation with
upper confidence bounds to implement optimism in the face of uncertainty.
Besides the existence of an optimal policy that satisfies the Poisson
equation, the only assumptions made are Hölder continuity of the rewards and
transition probabilities.
Online Learning Models for Content Popularity Prediction In Wireless Edge Caching
Caching popular contents in advance is an important technique for meeting the
low-latency requirements and reducing the backhaul costs of future wireless
communications. For a network with base stations distributed as a Poisson
point process (PPP), optimal content placement caching probabilities can be
derived when the popularity profile is known; in practice, however, it is
unknown. In this paper, online prediction (OP) and online learning (OL)
methods are presented, based on the popularity prediction model (PPM) and the
Grassmannian prediction model (GPM), to predict the content profile of future
time slots under time-varying popularities. In OP, finding the coefficients is
modeled as a constrained non-negative least squares (NNLS) problem, which is
solved with a modified NNLS algorithm. In addition, these two models are
compared with the log-request prediction model (RPM), the information
prediction model (IPM), and the average success probability (ASP) based model.
Next, in the OL methods for the time-varying case, the cumulative mean squared
error (MSE) is minimized and the MSE regret is analyzed for each of the
models. Moreover, for the quasi-time-varying case, where the popularity
changes block-wise, the KWIK (knows what it knows) learning method is modified
for these models to improve the prediction MSE and ASP performance. Simulation
results show that for OP, PPM and GPM provide the best ASP among these models,
implying that minimum mean squared error based models do not necessarily
result in the optimal ASP. The OL based models yield approximately similar ASP
and MSE, while for the quasi-time-varying case the KWIK methods provide better
performance, as verified on the MovieLens dataset.
Comment: 9 figures, 29 pages
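The NNLS fitting step lends itself to a short sketch. The following is a minimal illustration on synthetic data, using SciPy's standard nnls solver as a stand-in for the paper's modified NNLS algorithm and omitting the PPM/GPM model structure entirely; the array shapes and window length are assumptions.

```python
import numpy as np
from scipy.optimize import nnls

# Minimal sketch: predict the next popularity profile as a non-negative
# combination of the profiles observed in the previous time slots.
rng = np.random.default_rng(0)
n_contents, window = 50, 5

# Columns are the popularity profiles from the last `window` slots.
past = rng.random((n_contents, window))
# Current slot's profile, generated here as a noisy non-negative mixture.
current = past @ np.array([0.5, 0.2, 0.1, 0.1, 0.1]) + 0.01 * rng.random(n_contents)

# Fit non-negative coefficients w minimizing ||past @ w - current||_2.
w, residual = nnls(past, current)

# One-step-ahead prediction: apply the learned mixture to the most recent
# window (drop the oldest column, append the current profile).
recent = np.column_stack([past[:, 1:], current])
predicted_next = recent @ w
print("coefficients:", np.round(w, 3), "residual:", round(residual, 4))
```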
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach to reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution over multivariate Gaussian piecewise-linear models,
which can be updated in closed form. The tree structure itself is constructed
using the cover tree method, which remains efficient in high-dimensional
spaces. We combine the model with Thompson sampling and approximate dynamic
programming to obtain effective exploration policies in unknown environments.
The flexibility and computational simplicity of the model render it suitable
for many reinforcement learning problems in continuous state spaces. We
demonstrate this in an experimental comparison with least squares policy
iteration.
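The Thompson-sampling component can be illustrated independently of the tree model. Below is a minimal sketch that substitutes a single Bayesian linear-Gaussian reward model per action for the paper's cover-tree context-tree posterior; the prior, noise variance, and class interface are assumptions, not the paper's construction.

```python
import numpy as np

class LinearThompson:
    """Thompson sampling with a Bayesian linear-Gaussian model per action.

    A stand-in sketch: the paper's posterior is a generalised context tree
    over piecewise-linear models, not a single global linear model.
    """

    def __init__(self, dim, n_actions, noise_var=0.1):
        self.noise_var = noise_var
        # Per-action Gaussian posterior (precision matrix and weighted sums),
        # starting from a standard normal prior on the weights.
        self.precision = [np.eye(dim) for _ in range(n_actions)]
        self.xty = [np.zeros(dim) for _ in range(n_actions)]

    def act(self, state):
        sampled_rewards = []
        for P, b in zip(self.precision, self.xty):
            mean = np.linalg.solve(P, b)                      # posterior mean
            cov = np.linalg.inv(P)                            # posterior covariance
            theta = np.random.multivariate_normal(mean, cov)  # posterior draw
            sampled_rewards.append(theta @ state)
        # Act greedily with respect to the sampled model; exploration comes
        # from the randomness of the posterior draw.
        return int(np.argmax(sampled_rewards))

    def update(self, state, action, reward):
        self.precision[action] += np.outer(state, state) / self.noise_var
        self.xty[action] += reward * state / self.noise_var
```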
Bounded Optimal Exploration in MDP
Within the framework of probably approximately correct Markov decision
processes (PAC-MDP), much theoretical work has focused on methods for
attaining near-optimality after a relatively long period of learning and
exploration. Practical concerns, however, require satisfactory behavior within
a short period of time. In this paper, we relax the PAC-MDP conditions to
reconcile theoretically driven exploration methods with practical needs. We
propose simple algorithms for discrete and continuous state spaces, and
illustrate the benefits of the proposed relaxation via theoretical analyses
and numerical examples. Our algorithms also maintain anytime error bounds and
average loss bounds. Our approach accommodates both Bayesian and non-Bayesian
methods.
Comment: In Proceedings of the 30th AAAI Conference on Artificial Intelligence
(AAAI), 2016
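For context, a canonical member of the PAC-MDP family that such relaxations are measured against is R-MAX. Here is a minimal R-MAX-style sketch for a discrete MDP, not the paper's relaxed algorithms: state-action pairs visited fewer than m times are treated as maximally rewarding and absorbing, which drives exploration, while known pairs use empirical estimates. The threshold m, r_max, and the discounted criterion are assumptions.

```python
import numpy as np

def optimistic_model(counts, reward_sums, trans_counts, m=5, r_max=1.0):
    """Build an optimistic empirical MDP from visit statistics."""
    n_s, n_a = counts.shape
    R = np.full((n_s, n_a), r_max)        # unknown pairs get the max reward
    T = np.zeros((n_s, n_a, n_s))
    T[:, :, :] = np.eye(n_s)[:, None, :]  # unknown pairs self-loop (absorbing)
    known = counts >= m                   # "known" once visited m times
    R[known] = reward_sums[known] / counts[known]
    for s, a in zip(*np.nonzero(known)):
        T[s, a] = trans_counts[s, a] / counts[s, a]
    return R, T

def greedy_policy(R, T, gamma=0.95, iters=200):
    """Value iteration on the optimistic model; returns a greedy policy."""
    n_s, _ = R.shape
    V = np.zeros(n_s)
    for _ in range(iters):
        Q = R + gamma * T @ V  # T @ V contracts over next states
        V = Q.max(axis=1)
    return Q.argmax(axis=1)
```

An agent would periodically rebuild the optimistic model from its counts and follow greedy_policy; once every reachable pair becomes known, the greedy policy is near-optimal on the empirical model.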