
    Optimistic planning for continuous-action deterministic systems

    We consider the optimal control of systems with deterministic dynamics, continuous, possibly large-scale state spaces, and continuous, low-dimensional action spaces. We describe an online planning algorithm called SOOP, which, like other algorithms in its class, has no direct dependence on the state space structure. Unlike previous algorithms, SOOP explores the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. To this end, it borrows the principle of the simultaneous optimistic optimization method and develops a nontrivial adaptation of this principle to the planning problem. Experiments on four problems show that SOOP reliably ranks among the best algorithms, fully dominating competing methods when the problem requires both long horizons and fine discretization.
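
    For context on the optimistic-planning family that SOOP belongs to, the sketch below illustrates the standard optimistic upper bound on the return of a partially evaluated open-loop sequence of continuous actions: simulate the deterministic dynamics along the sequence and add the tail gamma^d / (1 - gamma), which bounds every possible continuation when rewards lie in [0, 1]. This is a generic illustration, not the authors' SOOP implementation; the dynamics, reward, and discount factor are hypothetical placeholders.

```python
GAMMA = 0.95  # discount factor (placeholder value)

def step(state, action):
    """Toy deterministic dynamics (hypothetical 1-D double integrator) with reward in [0, 1]."""
    pos, vel = state
    vel = vel + 0.1 * max(-1.0, min(1.0, action))   # continuous action clipped to [-1, 1]
    pos = pos + 0.1 * vel
    reward = 1.0 - min(1.0, pos ** 2)               # reward kept in [0, 1]
    return (pos, vel), reward

def optimistic_value(state, actions):
    """Discounted return of a finite action sequence plus the tail gamma^d / (1 - gamma),
    an upper bound on every continuation when per-step rewards lie in [0, 1]."""
    total, discount = 0.0, 1.0
    for a in actions:
        state, r = step(state, a)
        total += discount * r
        discount *= GAMMA
    return total + discount / (1.0 - GAMMA)

if __name__ == "__main__":
    s0 = (1.0, 0.0)
    # Compare two candidate open-loop sequences of continuous actions.
    for seq in ([0.0, 0.0, 0.0], [-1.0, -1.0, -0.5]):
        print(seq, round(optimistic_value(s0, seq), 3))
```

    An optimistic planner in this family repeatedly simulates and refines the sequences with the largest upper bounds; per the abstract, SOOP does so by adapting simultaneous optimistic optimization to the space of continuous-action sequences.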

    Practical Open-Loop Optimistic Planning

    We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies (i.e., sequences of actions), and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
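
    The tighter upper-confidence bounds referred to above are Kullback-Leibler (KL) confidence bounds on empirical mean rewards. The sketch below shows the generic KL-UCB index for rewards in [0, 1], computed by bisection, next to a Hoeffding-style bound for comparison; it illustrates the bound itself, not the paper's planner, and the exploration constant c is an arbitrary choice.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_ucb(mean, n, c, tol=1e-9):
    """Largest q in [mean, 1] with n * kl(mean, q) <= c, found by bisection."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n * bernoulli_kl(mean, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo

def hoeffding_ucb(mean, n, c):
    """Chernoff-Hoeffding upper confidence bound, for comparison."""
    return min(1.0, mean + math.sqrt(c / (2.0 * n)))

if __name__ == "__main__":
    mean, c = 0.8, 2.0 * math.log(100)   # arbitrary exploration constant
    for n in (5, 20, 100):
        print(n, round(kl_ucb(mean, n, c), 3), round(hoeffding_ucb(mean, n, c), 3))
```

    For means close to 0 or 1 the KL index stays inside [0, 1] while the Hoeffding bound saturates, which is one sense in which the KL bound is tighter.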

    Global Continuous Optimization with Error Bound and Fast Convergence

    This paper considers global optimization with a black-box, unknown objective function that can be non-convex and non-differentiable. Such difficult optimization problems arise in many real-world applications, such as parameter tuning in machine learning, engineering design, and planning with a complex physics simulator. This paper proposes a new global optimization algorithm, called Locally Oriented Global Optimization (LOGO), which aims for both fast convergence in practice and a finite-time error bound in theory. The advantages and usage of the new algorithm are illustrated via theoretical analysis and an experiment conducted with 11 benchmark test functions. Further, we modify the LOGO algorithm to specifically solve a planning problem via policy search with continuous state/action spaces and a long time horizon while maintaining its finite-time error bound. We apply the proposed planning method to accident management of a nuclear power plant. The results of the application study demonstrate the practical utility of our method.
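
    Methods in this area, including the SOO algorithm to which LOGO is related, maintain a hierarchical partition of the search space and expand the most promising cells. The sketch below is a minimal SOO-style loop on [0, 1] (ternary splits; in each sweep, expand per depth the best leaf whose value beats every shallower expansion). It is not LOGO itself: LOGO's locally oriented selection rule is not reproduced here, and the test function is a hypothetical stand-in.

```python
import math

def soo_maximize(f, budget, max_depth=20):
    """Minimal SOO-style optimizer on [0, 1]: ternary partition tree; each sweep
    expands, per depth, the best leaf whose value beats every shallower expansion."""
    root = (0.0, 1.0)
    leaves = {0: [root]}                 # depth -> list of unexpanded cells
    values = {root: f(0.5)}              # cell -> objective value at its center
    evals, best = 1, (values[root], 0.5)
    while evals < budget:
        vmax, expanded = -math.inf, False
        for depth in sorted(leaves):     # sweep from shallow to deep
            if depth >= max_depth or not leaves[depth]:
                continue
            cell = max(leaves[depth], key=lambda c: values[c])
            if values[cell] <= vmax:
                continue                 # a shallower cell already looks at least as good
            vmax, expanded = values[cell], True
            leaves[depth].remove(cell)   # expand the cell into three children
            lo, hi = cell
            third = (hi - lo) / 3.0
            for k in range(3):
                child = (lo + k * third, lo + (k + 1) * third)
                center = 0.5 * (child[0] + child[1])
                values[child] = f(center)
                best = max(best, (values[child], center))
                leaves.setdefault(depth + 1, []).append(child)
                evals += 1
                if evals >= budget:
                    return best
        if not expanded:                 # every remaining leaf is at max_depth
            break
    return best

if __name__ == "__main__":
    # Hypothetical multimodal test function on [0, 1].
    f = lambda x: math.sin(13 * x) * math.sin(27 * x) / 2.0 + 0.5
    print(soo_maximize(f, budget=300))
```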

    A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption

    We study the problem of optimizing a function under a budgeted number of evaluations. We assume only that the function is locally smooth around one of its global optima. The difficulty of optimization is measured in terms of 1) the amount of noise b in the function evaluations and 2) the local smoothness, d, of the function. A smaller d results in a smaller optimization error. We introduce a new, simple, and parameter-free approach. First, for all values of b and d, this approach recovers at least the state-of-the-art regret guarantees. Second, our approach obtains these results while being agnostic to the values of both b and d. This leads to the first algorithm that naturally adapts to an unknown range of noise b, and it yields significant improvements in the moderate- and low-noise regimes. Third, our approach also obtains a remarkable improvement over the state-of-the-art SOO algorithm when the noise is very low, which includes the case of optimization under deterministic feedback (b = 0). There, under our minimal local smoothness assumption, this improvement is of exponential magnitude and holds for a class of functions that covers the vast majority of functions that practitioners optimize (d = 0). We show that our algorithmic improvement is borne out in experiments, with empirically faster convergence on common benchmarks.
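
    For reference, the block below states the kind of local smoothness assumption and near-optimality dimension that this line of work on partition-based optimizers is typically built on, which is roughly what the quantities b and d above refer to; the paper's exact constants and conventions may differ.

```latex
% Local smoothness with respect to a hierarchical partition {P_{h,i}} of the domain
% (standard form in this line of work; the paper's exact conventions may differ).
% There exist \nu > 0 and \rho \in (0,1) such that, at every depth h, the cell
% P_{h, i_h^\star} containing a global optimizer x^\star satisfies
\forall x \in P_{h, i_h^\star}: \quad f(x) \ge f(x^\star) - \nu \rho^{h}.
% The near-optimality dimension d is then the smallest d \ge 0 such that, for some C > 0,
% the number of near-optimal cells at each depth h is bounded as
\bigl|\{\, i : \sup_{x \in P_{h,i}} f(x) \ge f(x^\star) - \nu \rho^{h} \,\}\bigr| \le C \rho^{-d h}
\quad \text{for all } h \ge 0.
% Evaluations return f(x) corrupted by noise of range b; b = 0 is the deterministic
% feedback case highlighted in the abstract.
```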