
    Cover Tree Bayesian Reinforcement Learning

    This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with least squares policy iteration.
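    The control loop the abstract describes (Thompson-sample a model from a Bayesian posterior, plan in it with approximate dynamic programming, act, update the posterior in closed form) can be sketched as follows. This is not the paper's cover-tree/context-tree model: to stay short, the posterior below is a single Bayesian linear-Gaussian regression over a 1-D state, planning is value iteration on a coarse grid, and all names and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearGaussianPosterior:
    """Conjugate Gaussian posterior over w in  s' = w . [s, u] + noise."""
    def __init__(self, dim=2, noise_var=0.05, prior_prec=0.1):
        self.Lambda = np.eye(dim) * prior_prec   # posterior precision
        self.eta = np.zeros(dim)                 # precision-weighted mean
        self.noise_var = noise_var

    def update(self, s, u, s_next):
        x = np.array([s, u])
        self.Lambda += np.outer(x, x) / self.noise_var   # closed-form update
        self.eta += x * s_next / self.noise_var

    def sample(self, rng):
        cov = np.linalg.inv(self.Lambda)
        return rng.multivariate_normal(cov @ self.eta, cov)  # Thompson sample of w

def plan(w, grid, actions, gamma=0.95, sweeps=30):
    """Approximate DP: value iteration on a state grid under sampled dynamics w."""
    V = np.zeros(len(grid))
    for _ in range(sweeps):
        Q = np.empty((len(grid), len(actions)))
        for i, s in enumerate(grid):
            for j, u in enumerate(actions):
                s_next = w[0] * s + w[1] * u
                r = -s**2 - 0.1 * u**2                       # illustrative reward
                k = np.clip(np.searchsorted(grid, s_next), 0, len(grid) - 1)
                Q[i, j] = r + gamma * V[k]
        V = Q.max(axis=1)
    return Q

posterior = LinearGaussianPosterior()
grid = np.linspace(-2.0, 2.0, 41)
actions = np.array([-1.0, 0.0, 1.0])
s, true_w = 1.5, np.array([0.9, 0.3])            # unknown environment

for t in range(100):
    w = posterior.sample(rng)                    # 1. sample a model from the posterior
    Q = plan(w, grid, actions)                   # 2. plan in the sampled model
    i = np.clip(np.searchsorted(grid, s), 0, len(grid) - 1)
    u = actions[Q[i].argmax()]                   # 3. act greedily w.r.t. the sample
    s_next = true_w @ np.array([s, u]) + rng.normal(0.0, 0.05)
    posterior.update(s, u, s_next)               # 4. closed-form posterior update
    s = s_next
```

    In the paper, the single linear-Gaussian model above is replaced by the cover-tree-indexed mixture of piecewise-linear models, but the sample-plan-act-update structure is the same.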

    Towards parallelizable sampling-based Nonlinear Model Predictive Control

    This paper proposes a new sampling-based nonlinear model predictive control (MPC) algorithm, with a complexity bound that is quadratic in the prediction horizon N and linear in the number of samples. The idea of the proposed algorithm is to use the sequence of predicted inputs from the previous time step as a warm start, and to iteratively update this sequence by changing its elements one by one, starting from the last predicted input and ending with the first. This strategy, which resembles the dynamic programming principle, allows for parallelization up to a certain level and yields a suboptimal nonlinear MPC algorithm with guaranteed recursive feasibility, stability and an improved cost function at every iteration, making it suitable for real-time implementation. The complexity of the algorithm per time step in the prediction horizon depends only on the horizon, the number of samples and parallel threads, and is independent of the measured system state. Comparisons with the fmincon nonlinear optimization solver on benchmark examples indicate that, as the simulation time progresses, the proposed algorithm converges rapidly to the "optimal" solution, even when using a small number of samples. Comment: 9 pages, 9 pictures, submitted to IFAC World Congress 201
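    A minimal sketch of the warm-started, last-to-first input update the abstract describes, assuming a toy scalar nonlinear plant and a quadratic stage cost; the plant, cost, sampling distribution and all constants are placeholders, not the paper's benchmarks.

```python
import numpy as np

rng = np.random.default_rng(1)

def dynamics(x, u):
    # toy scalar nonlinear plant (an assumption, for illustration only)
    return x + 0.1 * (np.sin(x) + u)

def predicted_cost(x0, u_seq):
    # quadratic stage cost plus terminal penalty over the prediction horizon
    x, cost = x0, 0.0
    for u in u_seq:
        cost += x**2 + 0.1 * u**2
        x = dynamics(x, u)
    return cost + 10.0 * x**2

def sampling_mpc_step(x0, u_prev, n_samples=20, u_bound=1.0):
    """Warm-start with the shifted previous input sequence, then update its
    elements one by one, from the last predicted input to the first, keeping
    a sampled candidate only if it lowers the predicted cost."""
    u_seq = np.append(u_prev[1:], u_prev[-1])    # shifted warm start
    best = predicted_cost(x0, u_seq)
    for k in reversed(range(len(u_seq))):        # last input first
        for c in rng.uniform(-u_bound, u_bound, n_samples):
            trial = u_seq.copy()
            trial[k] = c
            cost = predicted_cost(x0, trial)
            if cost < best:                      # monotone cost improvement
                best, u_seq = cost, trial
    return u_seq

# closed-loop simulation: apply the first input, shift, repeat
N, x = 10, 2.0
u_seq = np.zeros(N)
for t in range(30):
    u_seq = sampling_mpc_step(x, u_seq)
    x = dynamics(x, u_seq[0])
```

    Each candidate requires one horizon-length simulation and every stage is revisited once per MPC step, which matches the quadratic-in-N, linear-in-samples complexity mentioned in the abstract; the candidate evaluations within a stage are independent and are the natural place to parallelize.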

    Rollout Sampling Approximate Policy Iteration

    Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme which addresses the core sampling problem of evaluating a policy through simulation by treating it as a multi-armed bandit problem. The resulting algorithm offers performance comparable to that of the previous algorithm, achieved, however, with significantly less computational effort. An order-of-magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car. Comment: 18 pages, 2 figures, to appear in Machine Learning 72(3). Presented at EWRL08, to be presented at ECML 200
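    One way to read the bandit view of rollout allocation is the UCB-style sketch below: simulation budget is spent where the best action is still uncertain rather than uniformly, and the winning action per sampled state becomes a training label for the policy classifier. The allocation rule, the stand-in rollout simulator and all constants are assumptions for illustration, not necessarily the exact variant used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, budget = 5, 3, 600

def rollout_return(s, a):
    # stand-in for simulating the current policy from state s after action a
    return np.sin(s + a) + rng.normal(0.0, 0.5)

# one initial rollout per (state, action) pair
counts = np.ones((n_states, n_actions))
means = np.array([[rollout_return(s, a) for a in range(n_actions)]
                  for s in range(n_states)])

for t in range(budget):
    s = rng.integers(n_states)                        # a sampled rollout state
    total = counts[s].sum()
    ucb = means[s] + np.sqrt(2.0 * np.log(total) / counts[s])
    a = ucb.argmax()                                  # most promising/uncertain arm
    r = rollout_return(s, a)
    counts[s, a] += 1.0
    means[s, a] += (r - means[s, a]) / counts[s, a]   # running mean of returns

# winning action per state: training labels for the policy classifier
best_actions = means.argmax(axis=1)
```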

    Approximate Modified Policy Iteration

    Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains as special cases the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error propagation analyses that unify those for approximate policy and value iteration. For the last, classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter allows one to control the balance between the estimation error of the classifier and the overall value function approximation
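    To make MPI's main parameter concrete, here is a minimal exact (tabular) version on a small random MDP: m = 1 recovers value iteration and letting m grow recovers policy iteration, while the AMPI variants in the paper replace the exact greedy step and the m Bellman backups with fitted regressors or a classifier. The random MDP and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions, gamma, m = 6, 3, 0.9, 5    # m is MPI's main parameter

# random tabular MDP (illustrative only)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = dist over s'
R = rng.uniform(size=(n_states, n_actions))                       # expected rewards

V = np.zeros(n_states)
for it in range(100):
    # greedy policy improvement from the current value estimate
    Q = R + gamma * P @ V
    pi = Q.argmax(axis=1)
    # partial policy evaluation: m applications of the Bellman operator for pi
    for _ in range(m):
        Q_pi = R + gamma * P @ V
        V = Q_pi[np.arange(n_states), pi]
```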