Rollout Sampling Approximate Policy Iteration

Authors: A. Antos, A. Fern, Christos Dimitrakakis, E. Even-Dar, H. O. Wang, M. G. Lagoudakis, Michail G. Lagoudakis, P. Auer, R. A. Howard, R. Sutton
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme that treats the core sampling problem in evaluating a policy through simulation as a multi-armed bandit machine. The resulting algorithm offers performance comparable to that of the previous algorithm, but with significantly less computational effort. An order of magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car.

Comment: 18 pages, 2 figures, to appear in Machine Learning 72(3). Presented at EWRL08, to be presented at ECML 200
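The bandit view of rollout sampling described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a generic successive-elimination rule with Hoeffding confidence intervals, returns bounded in an interval of width `value_range`, and a caller-supplied `rollout(action)` function that simulates one return from a fixed state. All names (`hoeffding_radius`, `pick_dominant_action`, `budget`, `delta`) are hypothetical.

```python
import math
import random


def hoeffding_radius(n, value_range, delta):
    """Half-width of a Hoeffding confidence interval after n samples."""
    if n == 0:
        return math.inf
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def pick_dominant_action(rollout, actions, budget, value_range=1.0, delta=0.05):
    """Spend at most `budget` rollouts at one state, treating actions as arms.

    Assumed rule (not taken from the paper): sample every surviving action
    once per round and drop any action whose upper confidence bound falls
    below the best lower confidence bound.
    Returns (best_action, rollouts_used, confident).
    """
    counts = {a: 0 for a in actions}
    sums = {a: 0.0 for a in actions}
    active = list(actions)
    used = 0

    def mean(a):
        return sums[a] / counts[a] if counts[a] else 0.0

    while used < budget and len(active) > 1:
        for a in active:
            if used == budget:
                break
            sums[a] += rollout(a)
            counts[a] += 1
            used += 1
        best_lower = max(
            mean(a) - hoeffding_radius(counts[a], value_range, delta)
            for a in active
        )
        active = [
            a for a in active
            if mean(a) + hoeffding_radius(counts[a], value_range, delta) >= best_lower
        ]

    best = max(active, key=mean)
    return best, used, len(active) == 1


if __name__ == "__main__":
    # Hypothetical noisy returns in [0, 1] for two actions at one state.
    rng = random.Random(0)
    true_value = {"left": 0.3, "right": 0.7}
    noisy = lambda a: true_value[a] + rng.uniform(-0.2, 0.2)
    print(pick_dominant_action(noisy, ["left", "right"], budget=500))
```

Stopping as soon as one action survives is what saves rollouts: states where one action clearly dominates consume few samples, while the budget caps the effort spent on states where the actions are hard to separate.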

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

International Migration, Integration and Social Cohesion online publications

Institutional Repository of the Technical University of Crete

Towards parallelizable sampling-based Nonlinear Model Predictive Control

Authors: Bobiti R. V., Lazar M.
Publication date: 12/01/2017

This paper proposes a new sampling-based nonlinear model predictive control (MPC) algorithm, with a bound on complexity that is quadratic in the prediction horizon N and linear in the number of samples. The idea of the proposed algorithm is to use the sequence of predicted inputs from the previous time step as a warm start, and to iteratively update this sequence by changing its elements one by one, starting from the last predicted input and ending with the first predicted input. This strategy, which resembles the dynamic programming principle, allows for parallelization up to a certain level and yields a suboptimal nonlinear MPC algorithm with guaranteed recursive feasibility, stability and improved cost function at every iteration, which is suitable for real-time implementation. The complexity of the algorithm per time step in the prediction horizon depends only on the horizon, the number of samples and parallel threads, and it is independent of the measured system state. Comparisons with the fmincon nonlinear optimization solver on benchmark examples indicate that as the simulation time progresses, the proposed algorithm converges rapidly to the "optimal" solution, even when using a small number of samples.

Comment: 9 pages, 9 pictures, submitted to IFAC World Congress 201
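The warm-started, element-by-element update described in the abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: it uses a scalar state, a fixed list of candidate inputs in place of the paper's samples, and omits the constraints and terminal conditions behind the feasibility and stability guarantees. The names `rollout_cost`, `shift_warm_start` and `improve_sequence` are hypothetical.

```python
def rollout_cost(x0, inputs, step, stage_cost, terminal_cost):
    """Simulate the model over the horizon and accumulate the cost."""
    x = x0
    total = 0.0
    for u in inputs:
        total += stage_cost(x, u)
        x = step(x, u)
    return total + terminal_cost(x)


def shift_warm_start(previous_inputs):
    """Assumed warm start: drop the applied input and repeat the last one."""
    return list(previous_inputs[1:]) + [previous_inputs[-1]]


def improve_sequence(x0, warm_start, candidates, step, stage_cost, terminal_cost):
    """Update the inputs one at a time, from the last element to the first.

    A candidate replaces the current element only if it lowers the total
    cost, so the result is never worse than the warm start.
    """
    best = list(warm_start)
    best_cost = rollout_cost(x0, best, step, stage_cost, terminal_cost)
    for i in reversed(range(len(best))):
        for u in candidates:
            trial = best[:i] + [u] + best[i + 1:]
            cost = rollout_cost(x0, trial, step, stage_cost, terminal_cost)
            if cost < best_cost:
                best, best_cost = trial, cost
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical scalar nonlinear model and quadratic costs.
    step = lambda x, u: x + 0.1 * (u - x ** 3)
    stage_cost = lambda x, u: x * x + 0.1 * u * u
    terminal_cost = lambda x: 10.0 * x * x
    candidates = [-1.0 + 0.25 * k for k in range(9)]
    warm = shift_warm_start([0.0] * 5)
    print(improve_sequence(1.0, warm, candidates, step, stage_cost, terminal_cost))
```

With N inputs and S candidates the loop runs N·S trial simulations of length N, which matches the quadratic-in-horizon, linear-in-samples bound stated in the abstract. The candidate evaluations for a fixed position `i` are independent of each other, which is one place where parallel threads could be used.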

arXiv.org e-Print Archive

Pure OAI Repository