9,871 research outputs found

    Cover Tree Bayesian Reinforcement Learning

    This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high-dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with least squares policy iteration.
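
    A minimal sketch of the posterior-sampling loop described in the abstract may help fix ideas. Here the paper's cover-tree/context-tree model is replaced by a single conjugate Bayesian linear-Gaussian dynamics model, and the planner and environment interface are placeholders; none of these details are taken from the paper.

```python
# Illustrative sketch only: Thompson sampling with a conjugate Bayesian
# linear-Gaussian dynamics model standing in for the paper's piecewise-linear
# context tree. Model form, noise level, planner, and environment interface
# are assumptions, not details from the paper.
import numpy as np

class BayesLinearDynamics:
    """Posterior over A in s' = A s + eps, eps ~ N(0, noise_var * I).
    For brevity the sketch learns action-independent dynamics."""
    def __init__(self, dim, prior_var=10.0, noise_var=0.1):
        self.noise_var = noise_var
        self.P = np.eye(dim) / prior_var    # shared posterior precision of each row of A
        self.B = np.zeros((dim, dim))       # accumulated cross-moments (column j -> output j)

    def update(self, s, s_next):
        # Closed-form conjugate update of Bayesian linear regression.
        self.P += np.outer(s, s) / self.noise_var
        self.B += np.outer(s, s_next) / self.noise_var

    def sample(self):
        # Draw one plausible dynamics matrix A from the posterior.
        cov = np.linalg.inv(self.P)
        mean = cov @ self.B                 # column j = posterior mean weights for output j
        cols = [np.random.multivariate_normal(mean[:, j], cov) for j in range(mean.shape[1])]
        return np.stack(cols)               # row j maps the state to next-state coordinate j

def thompson_sampling_episode(model, plan, env, s0, horizon):
    """Core loop: sample a model, act greedily w.r.t. it, update the posterior."""
    s = s0
    for _ in range(horizon):
        A = model.sample()                  # exploration via posterior sampling
        a = plan(A, s)                      # `plan` stands in for approximate dynamic programming
        s_next, _reward = env.step(a)       # assumed environment interface
        model.update(s, s_next)
        s = s_next
```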

    Reinforcement Learning via AIXI Approximation

    This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains. Comment: 8 LaTeX pages, 1 figure.
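
    The Context Tree Weighting component the abstract builds on is standard enough to sketch. The following is a plain binary CTW predictor based on the Krichevsky-Trofimov estimator; the paper's agent-specific extension and the Monte Carlo Tree Search planner are not reproduced, and the fixed-length bit-context interface is an illustrative assumption.

```python
# Plain binary Context Tree Weighting, sketched for illustration (not the
# paper's extended, agent-specific variant).
import copy
import math

class CTWNode:
    """One node of a binary context tree: a KT estimator mixed with the option of splitting."""
    def __init__(self):
        self.counts = [0, 0]   # zeros / ones observed in this node's context
        self.log_pe = 0.0      # log KT (estimator) block probability
        self.log_pw = 0.0      # log weighted block probability
        self.children = {}     # context bit -> child node

def _log_kt(node, bit):
    # Sequential Krichevsky-Trofimov probability of the next symbol being `bit`.
    return math.log((node.counts[bit] + 0.5) / (node.counts[0] + node.counts[1] + 1.0))

def _log_add(a, b):
    m, n = max(a, b), min(a, b)
    return m + math.log1p(math.exp(n - m))

def _child_pw(node, b):
    return node.children[b].log_pw if b in node.children else 0.0

def ctw_update(root, context, bit):
    """Incorporate one bit observed after `context` (fixed-length sequence, most recent bit last)."""
    path = [root]
    for b in reversed(context):                      # walk down the tree along the context
        path.append(path[-1].children.setdefault(b, CTWNode()))
    for depth in range(len(path) - 1, -1, -1):       # update from the deepest node back to the root
        node = path[depth]
        node.log_pe += _log_kt(node, bit)
        node.counts[bit] += 1
        if depth == len(path) - 1:
            node.log_pw = node.log_pe                # deepest node: no further splitting
        else:
            split = _child_pw(node, 0) + _child_pw(node, 1)
            node.log_pw = math.log(0.5) + _log_add(node.log_pe, split)

def ctw_predict(root, context, bit):
    """P(next = bit | history), as a ratio of weighted block probabilities."""
    trial = copy.deepcopy(root)
    ctw_update(trial, context, bit)
    return math.exp(trial.log_pw - root.log_pw)
```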

    Simple trees in complex forests: Growing Take The Best by Approximate Bayesian Computation

    How can heuristic strategies emerge from smaller building blocks? We propose Approximate Bayesian Computation as a computational solution to this problem. As a first proof of concept, we demonstrate how a heuristic decision strategy such as Take The Best (TTB) can be learned from smaller, probabilistically updated building blocks. Based on a self-reinforcing sampling scheme, different building blocks are combined and, over time, tree-like non-compensatory heuristics emerge. This new algorithm, coined Approximately Bayesian Computed Take The Best (ABC-TTB), is able to recover a data set that was generated by TTB, leads to sensible inferences about cue importance and cue directions, can outperform traditional TTB, and allows performance and computational effort to be traded off explicitly.
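
    A bare ABC rejection sampler, the ingredient the abstract starts from, looks roughly as follows; the prior, simulator, distance function, and tolerance here are toy assumptions, and this is not the ABC-TTB algorithm itself.

```python
# Minimal ABC rejection sampling, for illustration only (not ABC-TTB).
import numpy as np

def abc_rejection(observed, prior_sampler, simulate, distance, eps, n_draws=10000):
    """Keep parameter draws whose simulated data fall within `eps` of the observations."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler()              # 1. propose from the prior
        fake = simulate(theta)               # 2. simulate a data set under theta
        if distance(fake, observed) <= eps:  # 3. accept if it resembles the data
            accepted.append(theta)
    return accepted                          # samples from the approximate posterior

# Toy usage (assumed example): infer the bias of a coin from 100 flips.
rng = np.random.default_rng(0)
obs = rng.binomial(1, 0.7, size=100)
posterior_draws = abc_rejection(
    observed=obs,
    prior_sampler=lambda: rng.uniform(0, 1),
    simulate=lambda p: rng.binomial(1, p, size=100),
    distance=lambda a, b: abs(a.mean() - b.mean()),
    eps=0.02,
)
```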

    Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

    Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002-2013 of machine learning methods that were used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges. Comment: Accepted for publication in IEEE Communications Surveys and Tutorials.

    Feature Reinforcement Learning: Part I: Unstructured MDPs

    General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite-state Markov decision processes (MDPs). Up to now, extracting the right state representation out of bare observations, that is, reducing the general agent setup to the MDP framework, has been an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II. The role of POMDPs is also considered there. Comment: 24 LaTeX pages, 5 diagrams.
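
    As a rough illustration of what a criterion for scoring candidate history-to-state maps can look like, the sketch below computes a generic MDL-flavoured score (transition and reward log-loss plus a parameter penalty) for transitions already mapped through a candidate map. It is a stand-in assumption for this kind of criterion, not the paper's actual cost formula.

```python
# Generic MDL-style score for a candidate state abstraction (illustrative
# assumption; not the paper's criterion).
import math
from collections import Counter

def two_part_code_length(transitions):
    """transitions: list of (s, a, r, s_next) after applying a candidate map Phi.
    Returns transition/reward log-loss plus a BIC-style parameter penalty."""
    n = len(transitions)
    sa_counts = Counter((s, a) for s, a, r, s2 in transitions)
    sas_counts = Counter((s, a, s2) for s, a, r, s2 in transitions)
    sar_counts = Counter((s, a, r) for s, a, r, s2 in transitions)
    loglik = 0.0
    for s, a, r, s2 in transitions:
        loglik += math.log(sas_counts[(s, a, s2)] / sa_counts[(s, a)])  # dynamics fit
        loglik += math.log(sar_counts[(s, a, r)] / sa_counts[(s, a)])   # reward fit
    n_params = len(sas_counts) + len(sar_counts)   # crude count of estimated probabilities
    return -loglik + 0.5 * n_params * math.log(max(n, 2))

# A candidate map is preferred when it yields a smaller code length, i.e. states
# that make transitions and rewards predictable without using too many of them.
```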

    Optimal Reinforcement Learning for Gaussian Systems

    The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first-order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finite-dimensional projection gives an impression of how this result may be helpful. Comment: final pre-conference version of this NIPS 2011 paper. Once again, please note some nontrivial changes to the exposition and interpretation of the results, in particular in Equation (9) and Eqs. 11-14. The algorithm and results have remained the same, but their theoretical interpretation has changed.
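
    The premise that all beliefs are Gaussian processes rests on closed-form GP conditioning, which can be sketched independently of the paper's infinite-dimensional result; the RBF kernel, noise level, and toy data below are illustrative assumptions.

```python
# Standard GP regression posterior, sketched for illustration (not the paper's
# PDE-based result).
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise_var=1e-2):
    """Closed-form GP regression: posterior mean and covariance at the test points."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test)
    Kss = rbf_kernel(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, cov

# Usage: condition a belief about an unknown dynamics component on a few observations.
X = np.array([[0.0], [0.5], [1.0]])
y = np.sin(2 * X[:, 0])
mu, Sigma = gp_posterior(X, y, np.linspace(0, 1, 5)[:, None])
```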