Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
We derive sublinear regret bounds for undiscounted reinforcement learning in
continuous state space. The proposed algorithm combines state aggregation with
upper confidence bounds to implement optimism in the face of uncertainty.
Besides the existence of an optimal policy that satisfies the Poisson
equation, the only assumptions made are Hölder continuity of the rewards and
transition probabilities.
Online Learning Models for Content Popularity Prediction In Wireless Edge Caching
Caching popular contents in advance is an important technique for meeting the
low-latency requirements and reducing the backhaul costs of future wireless
communications. For a network with base stations distributed as a Poisson
point process (PPP), optimal content placement caching probabilities can be
derived when the popularity profile is known; in practice, however, it is
unknown. In this paper, online prediction (OP) and online learning (OL)
methods are presented, based on the popularity prediction model (PPM) and the
Grassmannian prediction model (GPM), to predict the content profile of future
time slots under time-varying popularities. In OP, finding the coefficients is
modeled as a constrained non-negative least squares (NNLS) problem, which is
solved with a modified NNLS algorithm. In addition, these two models are
compared with the log-request prediction model (RPM), the information
prediction model (IPM), and the average success probability (ASP) based model.
Next, in the OL methods for the time-varying case, the cumulative mean squared
error (MSE) is minimized and the MSE regret is analyzed for each of the
models. Moreover, for the quasi-time-varying case, where the popularity
changes block-wise, the KWIK (knows what it knows) learning method is modified
for these models to improve the prediction MSE and ASP performance. Simulation
results show that for OP, PPM and GPM provide the best ASP among these models,
implying that minimum mean squared error based models do not necessarily
result in the optimal ASP. The OL based models yield approximately similar ASP
and MSE, while for the quasi-time-varying case the KWIK methods provide better
performance, as verified on the MovieLens dataset.
Comment: 9 figures, 29 pages
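The NNLS fitting step lends itself to a short sketch. The following is a minimal illustration on synthetic data, using SciPy's standard nnls solver as a stand-in for the paper's modified NNLS algorithm and omitting the PPM/GPM model structure entirely; the array shapes and window length are assumptions.

```python
import numpy as np
from scipy.optimize import nnls

# Minimal sketch: predict the next popularity profile as a non-negative
# combination of the profiles observed in the previous time slots.
rng = np.random.default_rng(0)
n_contents, window = 50, 5

# Columns are the popularity profiles from the last `window` slots.
past = rng.random((n_contents, window))
# Current slot's profile, generated here as a noisy non-negative mixture.
current = past @ np.array([0.5, 0.2, 0.1, 0.1, 0.1]) + 0.01 * rng.random(n_contents)

# Fit non-negative coefficients w minimizing ||past @ w - current||_2.
w, residual = nnls(past, current)

# One-step-ahead prediction: apply the learned mixture to the most recent
# window (drop the oldest column, append the current profile).
recent = np.column_stack([past[:, 1:], current])
predicted_next = recent @ w
print("coefficients:", np.round(w, 3), "residual:", round(residual, 4))
```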
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach to reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution over multivariate Gaussian piecewise-linear models,
which can be updated in closed form. The tree structure itself is constructed
using the cover tree method, which remains efficient in high-dimensional
spaces. We combine the model with Thompson sampling and approximate dynamic
programming to obtain effective exploration policies in unknown environments.
The flexibility and computational simplicity of the model render it suitable
for many reinforcement learning problems in continuous state spaces. We
demonstrate this in an experimental comparison with least squares policy
iteration.
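The Thompson-sampling component can be illustrated independently of the tree model. Below is a minimal sketch that substitutes a single Bayesian linear-Gaussian reward model per action for the paper's cover-tree context-tree posterior; the prior, noise variance, and class interface are assumptions, not the paper's construction.

```python
import numpy as np

class LinearThompson:
    """Thompson sampling with a Bayesian linear-Gaussian model per action.

    A stand-in sketch: the paper's posterior is a generalised context tree
    over piecewise-linear models, not a single global linear model.
    """

    def __init__(self, dim, n_actions, noise_var=0.1):
        self.noise_var = noise_var
        # Per-action Gaussian posterior (precision matrix and weighted sums),
        # starting from a standard normal prior on the weights.
        self.precision = [np.eye(dim) for _ in range(n_actions)]
        self.xty = [np.zeros(dim) for _ in range(n_actions)]

    def act(self, state):
        sampled_rewards = []
        for P, b in zip(self.precision, self.xty):
            mean = np.linalg.solve(P, b)                      # posterior mean
            cov = np.linalg.inv(P)                            # posterior covariance
            theta = np.random.multivariate_normal(mean, cov)  # posterior draw
            sampled_rewards.append(theta @ state)
        # Act greedily with respect to the sampled model; exploration comes
        # from the randomness of the posterior draw.
        return int(np.argmax(sampled_rewards))

    def update(self, state, action, reward):
        self.precision[action] += np.outer(state, state) / self.noise_var
        self.xty[action] += reward * state / self.noise_var
```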
Bounded Optimal Exploration in MDP
Within the framework of probably approximately correct Markov decision
processes (PAC-MDP), much theoretical work has focused on methods for
attaining near-optimality after a relatively long period of learning and
exploration. Practical concerns, however, require satisfactory behavior within
a short period of time. In this paper, we relax the PAC-MDP conditions to
reconcile theoretically driven exploration methods with practical needs. We
propose simple algorithms for discrete and continuous state spaces, and
illustrate the benefits of the proposed relaxation via theoretical analyses
and numerical examples. Our algorithms also maintain anytime error bounds and
average loss bounds. Our approach accommodates both Bayesian and non-Bayesian
methods.
Comment: In Proceedings of the 30th AAAI Conference on Artificial Intelligence
(AAAI), 2016
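For context, a canonical member of the PAC-MDP family that such relaxations are measured against is R-MAX. Here is a minimal R-MAX-style sketch for a discrete MDP, not the paper's relaxed algorithms: state-action pairs visited fewer than m times are treated as maximally rewarding and absorbing, which drives exploration, while known pairs use empirical estimates. The threshold m, r_max, and the discounted criterion are assumptions.

```python
import numpy as np

def optimistic_model(counts, reward_sums, trans_counts, m=5, r_max=1.0):
    """Build an optimistic empirical MDP from visit statistics."""
    n_s, n_a = counts.shape
    R = np.full((n_s, n_a), r_max)        # unknown pairs get the max reward
    T = np.zeros((n_s, n_a, n_s))
    T[:, :, :] = np.eye(n_s)[:, None, :]  # unknown pairs self-loop (absorbing)
    known = counts >= m                   # "known" once visited m times
    R[known] = reward_sums[known] / counts[known]
    for s, a in zip(*np.nonzero(known)):
        T[s, a] = trans_counts[s, a] / counts[s, a]
    return R, T

def greedy_policy(R, T, gamma=0.95, iters=200):
    """Value iteration on the optimistic model; returns a greedy policy."""
    n_s, _ = R.shape
    V = np.zeros(n_s)
    for _ in range(iters):
        Q = R + gamma * T @ V  # T @ V contracts over next states
        V = Q.max(axis=1)
    return Q.argmax(axis=1)
```

An agent would periodically rebuild the optimistic model from its counts and follow greedy_policy; once every reachable pair becomes known, the greedy policy is near-optimal on the empirical model.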