
    Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

    We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy that satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
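
    The aggregation-plus-optimism idea can be illustrated with a minimal Python sketch, assuming a one-dimensional state space on [0, 1] discretized into bins and a generic UCB-style bonus; the bin count, bonus form, and class name are illustrative choices, not details from the paper.

    import numpy as np

    # Illustrative toy (not the paper's algorithm): aggregate a continuous
    # 1-D state in [0, 1] into bins and keep optimistic reward estimates per
    # (bin, action) pair, acting greedily on an upper confidence bound.
    class UCBAggregationAgent:
        def __init__(self, n_bins=20, n_actions=4, confidence=2.0):
            self.n_bins = n_bins
            self.confidence = confidence
            self.counts = np.zeros((n_bins, n_actions))       # visit counts
            self.reward_sums = np.zeros((n_bins, n_actions))  # cumulative rewards

        def _bin(self, state):
            # Map a state in [0, 1] to its aggregated bin index.
            return min(int(state * self.n_bins), self.n_bins - 1)

        def act(self, state, t):
            b = self._bin(state)
            means = self.reward_sums[b] / np.maximum(self.counts[b], 1)
            # Optimism in the face of uncertainty: a bonus that shrinks as a
            # (bin, action) pair is visited more often.
            bonus = self.confidence * np.sqrt(np.log(t + 1) / np.maximum(self.counts[b], 1))
            return int(np.argmax(means + bonus))

        def update(self, state, action, reward):
            b = self._bin(state)
            self.counts[b, action] += 1
            self.reward_sums[b, action] += reward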

    Online Learning Models for Content Popularity Prediction In Wireless Edge Caching

    Caching popular contents in advance is an important technique for meeting the low-latency requirement and reducing backhaul costs in future wireless communications. Considering a network with base stations distributed as a Poisson point process (PPP), optimal content placement caching probabilities are derived for a known popularity profile, which is, however, unknown in practice. In this paper, online prediction (OP) and online learning (OL) methods based on a popularity prediction model (PPM) and a Grassmannian prediction model (GPM) are presented to predict the content popularity profile in future time slots under time-varying popularities. In OP, the problem of finding the coefficients is modeled as a constrained non-negative least squares (NNLS) problem, which is solved with a modified NNLS algorithm. In addition, these two models are compared with a log-request prediction model (RPM), an information prediction model (IPM), and an average success probability (ASP) based model. Next, in the OL methods for the time-varying case, the cumulative mean squared error (MSE) is minimized and the MSE regret is analyzed for each of the models. Moreover, for the quasi-time-varying case where the popularity changes block-wise, the KWIK (knows what it knows) learning method is modified for these models to improve the prediction MSE and ASP performance. Simulation results show that for OP, PPM and GPM provide the best ASP among these models, indicating that minimum-MSE-based models do not necessarily result in optimal ASP. The OL-based models yield approximately similar ASP and MSE, while for the quasi-time-varying case the KWIK methods provide better performance, which is verified with the MovieLens dataset. Comment: 9 figures, 29 pages.
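
    The constrained NNLS step can be sketched with SciPy's standard nnls solver standing in for the paper's modified algorithm; the history matrix, the simplex normalization, and the one-slot window shift below are assumptions made for the demo, not details from the paper.

    import numpy as np
    from scipy.optimize import nnls

    # Illustrative sketch: fit non-negative weights that explain the current
    # popularity profile as a combination of past profiles, then reuse the
    # weights to predict the next time slot.
    rng = np.random.default_rng(0)
    n_contents, n_past = 50, 5

    history = rng.random((n_contents, n_past + 1))  # popularity over n_past + 1 slots
    A = history[:, :n_past]   # columns: past popularity profiles (regressors)
    b = history[:, n_past]    # most recent profile (target)

    weights, residual = nnls(A, b)         # non-negative least squares fit
    weights /= max(weights.sum(), 1e-12)   # normalize to a convex combination (assumption)

    # Predict the next slot from the most recent n_past profiles.
    prediction = history[:, 1:n_past + 1] @ weights
    print("weights:", np.round(weights, 3), "residual:", round(residual, 4))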

    Cover Tree Bayesian Reinforcement Learning

    This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high-dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with least squares policy iteration.
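
    The Thompson-sampling component can be conveyed in a deliberately simplified setting: a contextual bandit with one Bayesian linear-Gaussian reward model per action, sampled from its posterior before every decision. The paper's actual model is a cover-tree-based piecewise-linear Gaussian context tree combined with approximate dynamic programming; the sketch below only shows the posterior-sampling idea, and all names and priors are assumptions.

    import numpy as np

    # Simplified stand-in for the paper's method: Thompson sampling with a
    # Bayesian linear-Gaussian reward model per action (contextual bandit).
    class LinearThompsonSampler:
        def __init__(self, n_actions, dim, noise_var=0.1, prior_var=1.0):
            self.noise_var = noise_var
            # Per-action Gaussian posterior over linear reward weights,
            # stored as a precision matrix and an X^T y accumulator.
            self.precisions = [np.eye(dim) / prior_var for _ in range(n_actions)]
            self.xty = [np.zeros(dim) for _ in range(n_actions)]

        def act(self, x):
            sampled = []
            for P, xty in zip(self.precisions, self.xty):
                cov = np.linalg.inv(P)
                mean = cov @ xty / self.noise_var
                w = np.random.multivariate_normal(mean, cov)  # posterior sample
                sampled.append(w @ x)
            return int(np.argmax(sampled))  # act greedily under the sampled model

        def update(self, x, action, reward):
            self.precisions[action] += np.outer(x, x) / self.noise_var
            self.xty[action] += reward * x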

    Bounded Optimal Exploration in MDP

    Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However, practical concerns require the attainment of satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods and practical needs. We propose simple algorithms for discrete and continuous state spaces, and illustrate the benefits of our proposed relaxation via theoretical analyses and numerical examples. Our algorithms also maintain anytime error bounds and average loss bounds. Our approach accommodates both Bayesian and non-Bayesian methods. Comment: In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), 2016.
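
    For background, the classical R-max-style known/unknown rule that PAC-MDP analyses build on can be sketched as below; this is standard background rather than the paper's relaxed algorithm, and the visit threshold m, r_max, and function name are illustrative.

    import numpy as np

    # Background sketch of the R-max-style rule underlying many PAC-MDP methods
    # (not the paper's relaxed algorithm): state-action pairs visited fewer than
    # m times are treated optimistically, which drives exploration toward them.
    def optimistic_q_values(counts, emp_r, emp_p, m, r_max, gamma=0.95, iters=200):
        """counts: (S, A) visit counts; emp_r: (S, A) mean rewards;
        emp_p: (S, A, S) empirical transition probabilities."""
        n_states, n_actions = counts.shape
        q = np.zeros((n_states, n_actions))
        for _ in range(iters):
            v = q.max(axis=1)
            for s in range(n_states):
                for a in range(n_actions):
                    if counts[s, a] < m:
                        # "Unknown" pair: assume the best achievable return.
                        q[s, a] = r_max / (1.0 - gamma)
                    else:
                        q[s, a] = emp_r[s, a] + gamma * emp_p[s, a] @ v
        return q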