Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high-dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least squares policy iteration.
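To make the closed-form inference concrete, here is a minimal Python sketch of a conjugate Bayesian linear-Gaussian dynamics model with Thompson-style posterior sampling. It is an illustration under simplifying assumptions (a single global linear model with known noise variance, rather than the paper's cover-tree-indexed piecewise-linear prior), and the class and parameter names are hypothetical, not the authors' code.

```python
import numpy as np

class BayesianLinearModel:
    """Conjugate Bayesian linear-Gaussian dynamics model: closed-form
    posterior updates plus posterior sampling for Thompson-style exploration.
    Simplification: one global linear model with known isotropic noise,
    not the paper's cover-tree-indexed piecewise-linear prior."""

    def __init__(self, dim_in, dim_out, prior_scale=1.0, noise_var=0.1):
        self.noise_var = noise_var
        # Gaussian prior N(0, prior_scale * I) on each column of the weights.
        self.precision = np.eye(dim_in) / prior_scale  # posterior precision
        self.mean = np.zeros((dim_in, dim_out))        # posterior mean

    def update(self, x, y):
        """Closed-form conjugate update after observing a transition x -> y."""
        x = x.reshape(-1, 1)
        old_precision = self.precision.copy()
        self.precision = old_precision + (x @ x.T) / self.noise_var
        rhs = old_precision @ self.mean + (x @ y.reshape(1, -1)) / self.noise_var
        self.mean = np.linalg.solve(self.precision, rhs)

    def sample_weights(self, rng):
        """Draw one plausible dynamics matrix from the posterior."""
        cov = np.linalg.inv(self.precision)
        cols = [rng.multivariate_normal(self.mean[:, j], cov)
                for j in range(self.mean.shape[1])]
        return np.stack(cols, axis=1)

# Schematic Thompson-sampling loop: sample a model, plan greedily under it,
# then fold the observed transition back into the posterior.
rng = np.random.default_rng(0)
model = BayesianLinearModel(dim_in=3, dim_out=2)
W = model.sample_weights(rng)  # one sampled dynamics model
model.update(np.array([0.1, -0.2, 1.0]), np.array([0.0, 0.3]))
```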
Towards parallelizable sampling-based Nonlinear Model Predictive Control
This paper proposes a new sampling-based nonlinear model predictive control
(MPC) algorithm, with a bound on complexity quadratic in the prediction horizon
N and linear in the number of samples. The idea of the proposed algorithm is to
use the sequence of predicted inputs from the previous time step as a warm
start, and to iteratively update this sequence by changing its elements one by
one, starting from the last predicted input and ending with the first predicted
input. This strategy, which resembles the dynamic programming principle, allows
for parallelization up to a certain level and yields a suboptimal nonlinear MPC
algorithm with guaranteed recursive feasibility, stability, and a cost function
value that improves at every iteration, which makes it suitable for real-time implementation.
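The element-wise backward update is easy to sketch. The hypothetical Python sketch below warm-starts from the previous input sequence and, moving from the last predicted input to the first, samples random replacements for one element at a time, accepting only improvements, so the predicted cost never increases; the trials at each position are independent, which is where the parallelization comes in. The interfaces f(x, u) and cost(xs, us) are assumptions for illustration, not the paper's code.

```python
import numpy as np

def rollout_cost(f, cost, x0, u_seq):
    """Simulate the input sequence from x0 and return the predicted cost."""
    xs, x = [x0], x0
    for u in u_seq:
        x = f(x, u)
        xs.append(x)
    return cost(xs, u_seq)

def sampling_mpc_step(f, cost, x0, u_warm, n_samples, u_lo, u_hi, rng):
    """One backward sweep: try random replacements for each element of the
    warm-started input sequence, last element first, keeping improvements."""
    u_seq, best = list(u_warm), rollout_cost(f, cost, x0, u_warm)
    for k in reversed(range(len(u_seq))):     # last predicted input first
        for _ in range(n_samples):            # these trials can run in parallel
            cand = list(u_seq)
            cand[k] = rng.uniform(u_lo, u_hi)
            c = rollout_cost(f, cost, x0, cand)
            if c < best:                      # accept only improvements, so the
                u_seq, best = cand, c         # predicted cost never increases
    return u_seq, best

# Toy closed-loop usage: scalar system x+ = x + 0.1*u with a quadratic cost.
rng = np.random.default_rng(0)
f = lambda x, u: x + 0.1 * u
cost = lambda xs, us: sum(x * x for x in xs) + 0.01 * sum(u * u for u in us)
u, _ = sampling_mpc_step(f, cost, 1.0, [0.0] * 10, 20, -1.0, 1.0, rng)
u_next_warm = u[1:] + [u[-1]]  # shift the sequence to warm-start the next step
```

With naive re-simulation, each candidate evaluation costs O(N) and there are N positions, which is consistent with the quadratic-in-N, linear-in-samples bound stated above.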
The complexity of the algorithm at each time step in the prediction horizon
depends only on the horizon, the number of samples and parallel threads, and it
is independent of the measured system state. Comparisons with the fmincon
nonlinear optimization solver on benchmark examples indicate that as the
simulation time progresses, the proposed algorithm converges rapidly to the
"optimal" solution, even when using a small number of samples.
Comment: 9 pages, 9 pictures, submitted to IFAC World Congress 201
Rollout Sampling Approximate Policy Iteration
Several researchers have recently investigated the connection between
reinforcement learning and classification. We are motivated by proposals of
approximate policy iteration schemes without value functions which focus on
policy representation using classifiers and address policy learning as a
supervised learning problem. This paper proposes variants of an improved policy
iteration scheme which addresses the core sampling problem of evaluating a
policy through simulation by treating it as a multi-armed bandit problem. The
resulting algorithm offers performance comparable to the previous algorithm,
achieved, however, with significantly less computational effort. An order of magnitude
improvement is demonstrated experimentally in two standard reinforcement
learning domains: inverted pendulum and mountain-car.
Comment: 18 pages, 2 figures, to appear in Machine Learning 72(3). Presented at EWRL08, to be presented at ECML 200
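As a rough illustration of the bandit view of the sampling problem, the sketch below treats each candidate state as an arm and spends the simulation budget on states whose best action is still ambiguous, then labels every state with its empirically best action as training data for a policy classifier. The gap-plus-exploration-bonus index and the rollout_value hook are illustrative stand-ins, not the paper's exact selection criteria.

```python
import math

def rollout_sampling(states, actions, rollout_value, budget, c=1.0):
    """Spend a rollout budget over candidate states, bandit-style: favour
    states whose top two actions are still hard to separate. Assumes at
    least two actions; rollout_value(s, a) returns one noisy Q-estimate."""
    counts = {s: 0 for s in states}
    q_sum = {s: {a: 0.0 for a in actions} for s in states}
    q_cnt = {s: {a: 0 for a in actions} for s in states}

    def index(s, t):
        if counts[s] < len(actions):          # sample every action once first
            return float('inf')
        means = sorted((q_sum[s][a] / q_cnt[s][a] for a in actions), reverse=True)
        gap = means[0] - means[1]             # empirical best-vs-second gap
        return -gap + c * math.sqrt(math.log(t) / counts[s])

    for t in range(1, budget + 1):
        s = max(states, key=lambda s: index(s, t))
        a = min(actions, key=lambda a: q_cnt[s][a])   # round-robin over actions
        q_sum[s][a] += rollout_value(s, a)
        q_cnt[s][a] += 1
        counts[s] += 1

    # Label each state with its empirically best action; the labelled states
    # form the training set for the policy classifier.
    return {s: max(actions, key=lambda a: q_sum[s][a] / max(q_cnt[s][a], 1))
            for s in states}
```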
Approximate Modified Policy Iteration
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that
contains the two celebrated policy and value iteration methods. Despite its
generality, MPI has not been thoroughly studied, especially its approximation
form which is used when the state and/or action spaces are large or infinite.
In this paper, we propose three implementations of approximate MPI (AMPI) that
are extensions of well-known approximate DP algorithms: fitted-value iteration,
fitted-Q iteration, and classification-based policy iteration. We provide error
propagation analyses that unify those for approximate policy and value
iteration. For the last, classification-based implementation, we develop a
finite-sample analysis showing that MPI's main parameter allows one to control
the balance between the estimation error of the classifier and the overall
value function approximation error.
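For reference, here is a tabular Python sketch of plain (exact) MPI, the scheme whose approximate variants the paper studies: each iteration takes the greedy policy and then applies its Bellman operator m times, so m = 1 recovers value iteration and letting m grow recovers policy iteration. The array layout and function signature are assumptions for illustration.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma, m, n_iters):
    """Tabular MPI: alternate a greedy improvement step with m applications
    of the Bellman operator of the greedy policy. m = 1 gives value
    iteration; m -> infinity gives policy iteration.
    Assumed layout: P[a, s, s'] transition probabilities, R[s, a] rewards."""
    A, S, _ = P.shape
    v = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(n_iters):
        # Greedy (policy improvement) step.
        q = R + gamma * np.einsum('ast,t->sa', P, v)
        pi = np.argmax(q, axis=1)
        # Partial evaluation: m sweeps of T_pi instead of solving exactly.
        r_pi = R[np.arange(S), pi]
        P_pi = P[pi, np.arange(S)]
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return v, pi
```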