Optimistic planning for continuous-action deterministic systems.
Abstract: We consider the optimal control of systems with deterministic dynamics, continuous, possibly large-scale state spaces, and continuous, low-dimensional action spaces. We describe an online planning algorithm called SOOP, which like other algorithms in its class has no direct dependence on the state space structure. Unlike previous algorithms, SOOP explores the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. To this end, it borrows the principle of the simultaneous optimistic optimization method, and develops a nontrivial adaptation of this principle to the planning problem. Experiments on four problems show SOOP reliably ranks among the best algorithms, fully dominating competing methods when the problem requires both long horizons and fine discretization.
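The optimistic-planning principle that SOOP adapts can be conveyed with a minimal discrete-action deterministic planner: always expand the leaf whose upper bound on the discounted return is largest. This is a generic sketch of optimistic planning, not SOOP itself (SOOP handles continuous actions); the `step` interface, the reward range [0, 1], and the constants are assumptions.

```python
import heapq
import itertools

def optimistic_plan(step, actions, state, gamma=0.9, budget=50):
    """Generic optimistic planning for deterministic dynamics (sketch).

    step(state, action) -> (next_state, reward), with rewards in [0, 1].
    Repeatedly expands the leaf whose upper bound (b-value) on the
    discounted return is largest, then returns the first action of the
    best sequence found.
    """
    tie = itertools.count()                   # heap tie-breaker
    bonus = lambda d: gamma**d / (1 - gamma)  # optimistic tail: all future rewards = 1
    frontier = [(-bonus(0), next(tie), 0, 0.0, state, None)]
    best_return, best_action = float("-inf"), None
    for _ in range(budget):
        _, _, depth, cum, s, first = heapq.heappop(frontier)
        for a in actions:
            s2, r = step(s, a)
            cum2 = cum + gamma**depth * r     # discounted return of the prefix
            fa = a if first is None else first
            if cum2 > best_return:
                best_return, best_action = cum2, fa
            heapq.heappush(
                frontier,
                (-(cum2 + bonus(depth + 1)), next(tie), depth + 1, cum2, s2, fa))
    return best_action
```

Expanding the highest-bound leaf first is what makes the search "optimistic": every unexplored future is assumed to yield the maximal reward at each step, so promising branches are refined before doubtful ones.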
Practical Open-Loop Optimistic Planning
We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies - i.e. sequences of actions - and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
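The gain from tighter confidence bounds can be illustrated by comparing a Hoeffding-style bound with a Bernoulli Kullback-Leibler bound of the kind KL-OLOP relies on. This is a standalone sketch; the function names and the bisection depth are my own choices, not the paper's.

```python
import math

def hoeffding_ucb(mean, count, confidence):
    """Hoeffding upper confidence bound for a [0, 1]-valued mean."""
    return mean + math.sqrt(confidence / (2 * count))

def kl_ucb(mean, count, confidence, eps=1e-9):
    """Bernoulli KL upper confidence bound.

    Largest q in [mean, 1] with count * kl(mean, q) <= confidence,
    found by bisection. Always stays inside [0, 1], unlike Hoeffding.
    """
    def kl(p, q):  # Bernoulli KL divergence, clamped away from {0, 1}
        p = min(max(p, eps), 1 - eps)
        q = min(max(q, eps), 1 - eps)
        return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

    lo, hi = mean, 1.0
    for _ in range(50):  # bisection on the monotone map q -> kl(mean, q)
        mid = (lo + hi) / 2
        if count * kl(mean, mid) <= confidence:
            lo = mid
        else:
            hi = mid
    return lo
```

The KL bound is never looser than the Hoeffding one and, for means near 0 or 1 or for small sample counts, it is substantially tighter, which is the kind of effect that makes the modified algorithm less conservative in practice.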
Global Continuous Optimization with Error Bound and Fast Convergence
This paper considers global optimization with a black-box unknown objective function that can be non-convex and non-differentiable. Such difficult optimization problems arise in many real-world applications, such as parameter tuning in machine learning, engineering design problems, and planning with a complex physics simulator. This paper proposes a new global optimization algorithm, called Locally Oriented Global Optimization (LOGO), that aims at both fast convergence in practice and a finite-time error bound in theory. The advantage and usage of the new algorithm are illustrated via theoretical analysis and an experiment conducted with 11 benchmark test functions. Further, we modify the LOGO algorithm to specifically solve a planning problem via policy search with a continuous state/action space and a long time horizon while maintaining its finite-time error bound. We apply the proposed planning method to accident management of a nuclear power plant. The result of the application study demonstrates the practical utility of our method.
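LOGO builds on hierarchical-partitioning optimizers in the SOO family. The flavor of that family can be conveyed with a minimal one-dimensional maximizer that, in each sweep, expands a depth's best cell only if its value beats every best cell at shallower depths. This is a rough SOO-style sketch, not LOGO itself (LOGO additionally biases the search toward locally promising regions); all names and constants are assumptions.

```python
def soo_maximize(f, lo, hi, budget=200, max_depth=25):
    """Minimal 1-D SOO-style maximizer (sketch).

    Leaves are grouped by depth. Each sweep goes from shallow to deep
    and expands a depth's best cell only if its center value beats the
    cells expanded at every shallower depth; cells split into thirds.
    """
    leaves = {0: [(f((lo + hi) / 2), lo, hi)]}  # depth -> [(value, lo, hi)]
    evals = 1
    while evals < budget:
        vmax, expanded = float("-inf"), False
        for d in sorted(leaves):
            if not leaves[d] or d >= max_depth:
                continue
            cell = max(leaves[d])               # best cell at this depth
            if cell[0] > vmax:
                vmax = cell[0]
                leaves[d].remove(cell)
                expanded = True
                _, a, b = cell
                w = (b - a) / 3
                for k in range(3):              # split into three children
                    c_lo, c_hi = a + k * w, a + (k + 1) * w
                    leaves.setdefault(d + 1, []).append(
                        (f((c_lo + c_hi) / 2), c_lo, c_hi))
                    evals += 1
        if not expanded:                        # everything at max depth
            break
    return max(v for cells in leaves.values() for v, _, _ in cells)
```

Because cells at every depth compete simultaneously, the method needs no knowledge of the objective's smoothness, which is the property LOGO inherits and then tunes toward faster practical convergence.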
A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption
We study the problem of optimizing a function under a budgeted number of evaluations. We only assume that the function is locally smooth around one of its global optima. The difficulty of optimization is measured in terms of 1) the amount of noise b in the function evaluations and 2) the local smoothness, d, of the function. A smaller d results in a smaller optimization error. We propose a new, simple, and parameter-free approach. First, for all values of b and d, this approach recovers at least the state-of-the-art regret guarantees. Second, our approach additionally obtains these results while being agnostic to the values of both b and d. This leads to the first algorithm that naturally adapts to an unknown range of noise b and leads to significant improvements in the moderate- and low-noise regimes. Third, our approach also obtains a remarkable improvement over the state-of-the-art SOO algorithm when the noise is very low, which includes the case of optimization under deterministic feedback (b = 0). There, under our minimal local smoothness assumption, this improvement is of exponential magnitude and holds for a class of functions that covers the vast majority of functions that practitioners optimize (d = 0). We show that our algorithmic improvement is borne out in experiments, as we empirically demonstrate faster convergence on common benchmarks.
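One way an optimizer can stay agnostic to the smoothness is to spread its evaluation budget over tree depths on a fixed, parameter-free schedule, giving deeper levels fewer expansions instead of committing to a smoothness-dependent depth cutoff. The sketch below shows a harmonic-style allocation in that spirit; it is illustrative only, not the paper's exact schedule, and the function name and normalization are my own assumptions.

```python
import math

def harmonic_depth_budget(n):
    """Split an evaluation budget n across tree depths in proportion to 1/h.

    Depth h receives roughly h_max / h cell expansions, so shallow levels
    are explored broadly while deep levels are still reached, without any
    smoothness parameter (illustrative allocation, not the paper's schedule).
    """
    h_max = max(1, int(n / max(1.0, math.log(n))))
    return {h: h_max // h for h in range(1, h_max + 1) if h_max // h > 0}
```

The resulting schedule sums to at most the budget while covering a wide range of depths, which is the kind of trade-off a parameter-free method must make when neither the noise level nor the smoothness is known in advance.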