Rollout Sampling Approximate Policy Iteration
Several researchers have recently investigated the connection between
reinforcement learning and classification. We are motivated by proposals of
approximate policy iteration schemes without value functions, which represent
policies with classifiers and treat policy learning as a supervised learning
problem. This paper proposes variants of an improved policy iteration scheme
that casts the core sampling problem, evaluating a policy through simulation,
as a multi-armed bandit. The resulting algorithm achieves performance
comparable to the previous algorithm with significantly less computational
effort. An order-of-magnitude improvement is demonstrated experimentally in two
standard reinforcement learning domains: inverted pendulum and mountain-car.
Comment: 18 pages, 2 figures, to appear in Machine Learning 72(3). Presented
at EWRL08, to be presented at ECML 200
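A minimal Python sketch of the bandit view of rollout allocation, under stated assumptions. Each state in a fixed rollout set is treated as an arm, and every pull runs one rollout per action. A state counts as resolved once the empirical gap between its two best actions exceeds a Hoeffding-style width; its best action then becomes a training label for the classifier. The UCB-style selection index, the function name `rollout_sampling`, and the parameters `budget`, `delta` and `value_range` are hypothetical illustrations, not the paper's exact rule. The `rollout` callback is assumed to return a sampled return in [0, value_range].

```python
import math
from typing import Callable, Dict, Hashable, List

State = Hashable
Action = int


def rollout_sampling(
    states: List[State],
    actions: List[Action],
    rollout: Callable[[State, Action], float],
    budget: int,
    delta: float = 0.05,
    value_range: float = 1.0,
) -> Dict[State, Action]:
    """Hypothetical bandit-style allocation of rollouts over states.

    Assumes rollout(s, a) returns a sampled return in [0, value_range].
    Returns the states whose best action was resolved within the budget,
    mapped to that action (labelled training examples for a classifier).
    """
    if len(actions) < 2:
        raise ValueError("need at least two actions to compare")
    sums = {s: [0.0] * len(actions) for s in states}
    pulls_of = {s: 0 for s in states}  # rollouts run per action in state s
    resolved: Dict[State, Action] = {}

    def means(s: State) -> List[float]:
        return [x / pulls_of[s] for x in sums[s]]

    def gap(s: State) -> float:
        top = sorted(means(s), reverse=True)
        return top[0] - top[1]

    for t in range(budget):
        open_states = [s for s in states if s not in resolved]
        if not open_states:
            break  # every state already has a label
        fresh = [s for s in open_states if pulls_of[s] == 0]
        if fresh:
            s = fresh[0]  # sample every open state once before using the index
        else:
            # Assumed UCB1-style index: optimistic estimate of the action gap.
            s = max(open_states,
                    key=lambda u: gap(u) + math.sqrt(2 * math.log(t) / pulls_of[u]))
        for i, a in enumerate(actions):
            sums[s][i] += rollout(s, a)
        pulls_of[s] += 1
        # Hoeffding width for a difference of two means (no union bound).
        width = value_range * math.sqrt(2 * math.log(2 / delta) / pulls_of[s])
        if gap(s) > width:
            m = means(s)
            resolved[s] = actions[m.index(max(m))]
    return resolved


if __name__ == "__main__":
    # Toy deterministic rollouts: action s % 2 is always best in state s.
    toy = lambda s, a: 1.0 if a == s % 2 else 0.0
    print(rollout_sampling([0, 1, 2, 3], [0, 1], toy, budget=100))
```

In this sketch, resolved states stop consuming rollouts, so the remaining budget concentrates on states whose best action is still uncertain; that is the kind of saving over uniform allocation the abstract points to.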
Towards parallelizable sampling-based Nonlinear Model Predictive Control
This paper proposes a new sampling-based nonlinear model predictive control
(MPC) algorithm whose complexity is bounded by a quantity quadratic in the
prediction horizon N and linear in the number of samples. The idea is to
use the sequence of predicted inputs from the previous time step as a warm
start, and to iteratively update this sequence by changing its elements one by
one, starting from the last predicted input and ending with the first predicted
input. This strategy, which resembles the dynamic programming principle, allows
partial parallelization and yields a suboptimal nonlinear MPC algorithm with
guaranteed recursive feasibility, stability, and a cost that improves at every
iteration, making it suitable for real-time implementation. The complexity of
the algorithm per time step in the prediction horizon depends only on the
horizon, the number of samples and the number of parallel threads; it is
independent of the measured system state. Comparisons with the fmincon
nonlinear optimization solver on benchmark examples indicate that as the
simulation time progresses, the proposed algorithm converges rapidly to the
"optimal" solution, even when using a small number of samples.
Comment: 9 pages, 9 figures, submitted to IFAC World Congress 201
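A minimal Python sketch of the backward sweep, assuming a scalar discrete-time model x_{k+1} = f(x_k, u_k), a horizon cost made of stage and terminal terms, box input bounds, and candidates drawn uniformly from those bounds. The names `sequence_cost`, `shift_warm_start` and `backward_sweep`, the uniform sampler, and the shifted warm start are hypothetical choices. The terminal ingredients behind the paper's feasibility and stability guarantees are not modeled. Accepting a candidate only when it lowers the cost is what keeps the cost non-increasing here. Re-simulating the full horizon for every candidate at every position gives the quadratic-in-N, linear-in-samples operation count.

```python
import random
from typing import Callable, List, Tuple

Dynamics = Callable[[float, float], float]
StageCost = Callable[[float, float], float]
TerminalCost = Callable[[float], float]


def sequence_cost(x0: float, u: List[float], f: Dynamics,
                  stage: StageCost, terminal: TerminalCost) -> float:
    """Simulate the model over the horizon and sum the costs: O(N)."""
    x, total = x0, 0.0
    for uk in u:
        total += stage(x, uk)
        x = f(x, uk)
    return total + terminal(x)


def shift_warm_start(u_prev: List[float]) -> List[float]:
    """Assumed warm start: drop the applied input, repeat the last one."""
    return u_prev[1:] + u_prev[-1:]


def backward_sweep(x0: float, u_warm: List[float], f: Dynamics,
                   stage: StageCost, terminal: TerminalCost,
                   u_min: float, u_max: float, num_samples: int,
                   rng: random.Random) -> Tuple[List[float], float]:
    """Update the inputs one at a time, from the last to the first.

    Work: N positions x num_samples candidates x O(N) simulation,
    i.e. O(N^2 * num_samples). The returned cost never exceeds the
    warm-start cost because only improving candidates are accepted.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    u = list(u_warm)
    best = sequence_cost(x0, u, f, stage, terminal)
    for k in reversed(range(len(u))):
        candidates = [rng.uniform(u_min, u_max) for _ in range(num_samples)]
        # Every candidate is scored against the same sequence, so these
        # evaluations are independent and could be run in parallel.
        scored = []
        for c in candidates:
            trial = u[:k] + [c] + u[k + 1:]
            scored.append((sequence_cost(x0, trial, f, stage, terminal), c))
        cost, c = min(scored)
        if cost < best:  # accept only an improvement at position k
            best, u[k] = cost, c
    return u, best
```

At each closed-loop step one would apply `u[0]` to the plant, then pass the returned sequence through `shift_warm_start` to seed the next sweep.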