Optimal Reinforcement Learning for Gaussian Systems
The exploration-exploitation trade-off is among the central challenges of
reinforcement learning. The optimal Bayesian solution is intractable in
general. This paper studies to what extent analytic statements about optimal
learning are possible if all beliefs are Gaussian processes. For nonlinear,
time-varying systems in continuous time and space, subject to a relatively weak
restriction on the dynamics, a first-order approximation to learning of both
loss and dynamics is described by an infinite-dimensional partial differential
equation. An approximate finite-dimensional projection illustrates how this
result may be put to use.

Comment: final pre-conference version of this NIPS 2011 paper. Once again,
please note some nontrivial changes to the exposition and interpretation of the
results, in particular in Equation (9) and Eqs. 11-14. The algorithm and
results have remained the same, but their theoretical interpretation has
changed.
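The Gaussian-process beliefs at the heart of this approach admit closed-form posterior updates. The following is a minimal sketch of that building block only (not the paper's PDE machinery), fitting a GP to noisy observations of a toy one-dimensional dynamics function; the RBF kernel, lengthscale, and noise level are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential covariance between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xstar, noise=1e-2):
    """Closed-form GP posterior mean and marginal variance at test points."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    Kss = rbf_kernel(Xstar, Xstar)
    alpha = np.linalg.solve(K, y)            # K^{-1} y
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)

# Toy dynamics observations: x' = sin(x), observed with noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
Xstar = np.linspace(-3, 3, 50)[:, None]
mean, var = gp_posterior(X, y, Xstar)
```

The posterior variance is what an exploration policy would weigh against expected loss when deciding where to act.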
Monte Carlo Bayesian Reinforcement Learning
Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in
a model and represents uncertainty in model parameters by maintaining a
probability distribution over them. This paper presents Monte Carlo BRL
(MC-BRL), a simple and general approach to BRL. MC-BRL samples a priori a
finite set of hypotheses for the model parameter values and forms a discrete
partially observable Markov decision process (POMDP) whose state space is a
cross product of the state space for the reinforcement learning task and the
sampled model parameter space. Unlike earlier approaches, the POMDP does not
require conjugate distributions for belief representation and can be solved
relatively easily with point-based approximation algorithms. MC-BRL naturally
handles both fully and partially observable worlds. Theoretical and
experimental results show that the discrete POMDP approximates the underlying
BRL task well with guaranteed performance.Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012
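The core MC-BRL idea (sample a finite set of model-parameter hypotheses a priori, then track a discrete belief over them) can be sketched as follows. This is an illustrative toy with a single Bernoulli transition parameter under an assumed Beta(1,1) prior, not the paper's full POMDP construction:

```python
import numpy as np

def sample_hypotheses(rng, k):
    """Sample K transition-success probabilities a priori (Beta(1,1) assumed)."""
    return rng.beta(1.0, 1.0, size=k)

def belief_update(belief, thetas, success):
    """Bayes update of the discrete belief over the sampled hypotheses
    after observing one Bernoulli transition outcome."""
    lik = thetas if success else (1.0 - thetas)
    post = belief * lik
    return post / post.sum()

rng = np.random.default_rng(1)
thetas = sample_hypotheses(rng, 5)   # the sampled model-parameter space
belief = np.full(5, 1.0 / 5)         # uniform prior over the hypotheses
# In MC-BRL the POMDP state is (task state, hypothesis index); only the
# hypothesis index is hidden in this stripped-down example.
for outcome in [True, True, False]:
    belief = belief_update(belief, thetas, outcome)
```

Because the hypothesis set is finite, the belief is a plain probability vector, which is what makes point-based POMDP solvers applicable.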
On the role of synaptic stochasticity in training low-precision neural networks
Stochasticity and limited precision of synaptic weights in neural network
models are key aspects of both biological and hardware modeling of learning
processes. Here we show that a neural network model with stochastic binary
weights naturally gives prominence to exponentially rare dense regions of
solutions with a number of desirable properties such as robustness and good
generalization performance, while typical solutions are isolated and hard to
find. Binary solutions of the standard perceptron problem are obtained from a
simple gradient descent procedure on a set of real values parametrizing a
probability distribution over the binary synapses. Both analytical and
numerical results are presented. An algorithmic extension aimed at training
discrete deep neural networks is also investigated.

Comment: 7 pages + 14 pages of supplementary material.
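The training idea described above (gradient descent on real values parametrizing a distribution over binary synapses) can be sketched as a mean-field perceptron: each weight has a real field h, the mean binary weight is tanh(h), and a hinge-style loss is descended through that mean. This is an assumed simplification for illustration, not the paper's exact procedure:

```python
import numpy as np

def train_stochastic_binary_perceptron(X, y, steps=500, lr=0.1, seed=0):
    """Gradient descent on real fields h parametrizing independent
    distributions over binary weights w in {-1, +1}: P(w = +1) depends on h
    so that the mean weight is tanh(h). Mean-field sketch, assumed details."""
    rng = np.random.default_rng(seed)
    h = 0.01 * rng.standard_normal(X.shape[1])
    for _ in range(steps):
        m = np.tanh(h)                          # mean binary weight
        margins = y * (X @ m)
        viol = margins < 1.0                    # hinge-style violations
        grad = -(y[viol, None] * X[viol]).sum(0)
        h -= lr * grad * (1.0 - m**2)           # chain rule through tanh
    return np.sign(np.tanh(h))                  # binarize the final solution

rng = np.random.default_rng(2)
w_true = rng.choice([-1.0, 1.0], size=20)       # binary teacher
X = rng.standard_normal((200, 20))
y = np.sign(X @ w_true)
w = train_stochastic_binary_perceptron(X, y)
acc = (np.sign(X @ w) == y).mean()
```

Note that the optimization runs entirely in the real fields h; the binary solution is only read off at the end, which is the sense in which the continuous parametrization "finds" binary solutions.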
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least squares policy iteration.
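The Thompson-sampling component used here for exploration is simple to state in isolation: draw one model from the posterior and act greedily on the draw. A minimal sketch on a Bernoulli bandit with Beta posteriors (an assumed stand-in for the paper's tree-based dynamics model) looks like this:

```python
import numpy as np

def thompson_step(rng, successes, failures):
    """One Thompson-sampling step: draw a parameter for each arm from its
    Beta posterior and act greedily on the sampled parameters."""
    draws = rng.beta(successes + 1.0, failures + 1.0)
    return int(np.argmax(draws))

rng = np.random.default_rng(3)
true_p = np.array([0.2, 0.5, 0.8])       # unknown arm success rates
S = np.zeros(3)                          # observed successes per arm
F = np.zeros(3)                          # observed failures per arm
for _ in range(2000):
    a = thompson_step(rng, S, F)
    if rng.random() < true_p[a]:
        S[a] += 1
    else:
        F[a] += 1
```

Randomness in the posterior draw supplies the exploration; as the posterior concentrates, play converges to the best arm without an explicit exploration schedule.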