No-Regret Bayesian Optimization with Unknown Hyperparameters
Bayesian optimization (BO) based on Gaussian process models is a powerful
paradigm to optimize black-box functions that are expensive to evaluate. While
several BO algorithms provably converge to the global optimum of the unknown
function, they assume that the hyperparameters of the kernel are known in
advance. This is not the case in practice and misspecification often causes
these algorithms to converge to poor local optima. In this paper, we present
the first BO algorithm that is provably no-regret and converges to the optimum
without knowledge of the hyperparameters. During optimization we slowly adapt
the hyperparameters of stationary kernels and thereby expand the associated
function class over time, so that the BO algorithm considers more complex
function candidates. Based on the theoretical insights, we propose several
practical algorithms that achieve the empirical sample efficiency of BO with
online hyperparameter estimation, but retain theoretical convergence
guarantees. We evaluate our method on several benchmark problems.
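The idea of slowly adapting a stationary kernel so that the function class grows over time can be illustrated with a minimal sketch. It assumes a squared-exponential kernel whose lengthscale is divided by a slowly growing factor; the schedule `1 + log(1 + t)` and the names `rbf_kernel` and `adapted_lengthscale` are hypothetical choices made for illustration and are not taken from the abstract.

```python
import math


def rbf_kernel(x, y, lengthscale):
    """Squared-exponential (stationary) kernel between two scalar inputs."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def adapted_lengthscale(initial_lengthscale, t):
    """Shrink the lengthscale slowly with the iteration count t.

    The schedule 1 + log(1 + t) is an illustrative assumption; any slowly
    growing function could stand in for it.
    """
    return initial_lengthscale / (1.0 + math.log1p(t))


if __name__ == "__main__":
    for t in (0, 10, 100, 1000):
        ell = adapted_lengthscale(1.0, t)
        # Correlation between two points at distance 1 drops as ell shrinks,
        # so the model can represent faster-varying functions.
        print(t, round(ell, 3), round(rbf_kernel(0.0, 1.0, ell), 4))
```

A shorter lengthscale lowers the correlation between distant inputs, which is one way of letting the model consider more complex function candidates as optimization proceeds.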
On the Design of LQR Kernels for Efficient Controller Learning
Finding optimal feedback controllers for nonlinear dynamic systems from data
is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful
framework for direct controller tuning from experimental trials. For selecting
the next query point and finding the global optimum, BO relies on a
probabilistic description of the latent objective function, typically a
Gaussian process (GP). As is shown herein, GPs with a common kernel choice can,
however, lead to poor learning outcomes on standard quadratic control problems.
For a first-order system, we construct two kernels that specifically leverage
the structure of the well-known Linear Quadratic Regulator (LQR), yet retain
the flexibility of Bayesian nonparametric learning. Simulations of uncertain
linear and nonlinear systems demonstrate that the LQR kernels yield superior
learning performance.
Comment: 8 pages, 5 figures, to appear in 56th IEEE Conference on Decision and
Control (CDC 2017)
Gait learning for soft microrobots controlled by light fields
Soft microrobots based on photoresponsive materials and controlled by light
fields can generate a variety of different gaits. This inherent flexibility can
be exploited to maximize their locomotion performance in a given environment
and used to adapt them to changing conditions. However, because of the lack of
accurate locomotion models, and given the intrinsic variability among
microrobots, analytical control design is not possible. Common data-driven
approaches, on the other hand, require running prohibitive numbers of
experiments and lead to very sample-specific results. Here we propose a
probabilistic learning approach for light-controlled soft microrobots based on
Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach
results in a learning scheme that is data-efficient, enabling gait optimization
with a limited experimental budget, and robust against differences among
microrobot samples. These features are obtained by designing the learning
scheme through the comparison of different GP priors and BO settings on a
semi-synthetic data set. The developed learning scheme is validated in
microrobot experiments, resulting in a 115% improvement in a microrobot's
locomotion performance with an experimental budget of only 20 tests. These
encouraging results lead the way toward self-adaptive microrobotic systems
based on light-controlled soft microrobots and probabilistic learning control.
Comment: 8 pages, 7 figures, to appear in the proceedings of the IEEE/RSJ
International Conference on Intelligent Robots and Systems 201
Practical Bayesian Optimization of Machine Learning Algorithms
Machine learning algorithms frequently require careful tuning of model
hyperparameters, regularization terms, and optimization parameters.
Unfortunately, this tuning is often a "black art" that requires expert
experience, unwritten rules of thumb, or sometimes brute-force search. Much
more appealing is the idea of developing automatic approaches which can
optimize the performance of a given learning algorithm to the task at hand. In
this work, we consider the automatic tuning problem within the framework of
Bayesian optimization, in which a learning algorithm's generalization
performance is modeled as a sample from a Gaussian process (GP). The tractable
posterior distribution induced by the GP leads to efficient use of the
information gathered by previous experiments, enabling optimal choices about
what parameters to try next. Here we show how the effects of the Gaussian
process prior and the associated inference procedure can have a large impact on
the success or failure of Bayesian optimization. We show that thoughtful
choices can lead to results that exceed expert-level performance in tuning
machine learning algorithms. We also describe new algorithms that take into
account the variable cost (duration) of learning experiments and that can
leverage the presence of multiple cores for parallel experimentation. We show
that these proposed algorithms improve on previous automatic procedures and can
reach or surpass human expert-level optimization on a diverse set of
contemporary algorithms including latent Dirichlet allocation, structured SVMs
and convolutional neural networks.
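To make the step of choosing what parameters to try next concrete, here is a minimal sketch of expected improvement, one commonly used acquisition function. The abstract does not name the acquisition function, so this choice is an assumption. The sketch is written for minimization (for example, of validation error), and `mu`, `sigma`, and `best` are hypothetical names for the GP posterior mean, the posterior standard deviation, and the best value observed so far.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement (minimization) over the incumbent value `best`.

    mu, sigma: GP posterior mean and standard deviation at a candidate point.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


if __name__ == "__main__":
    # A candidate predicted to equal the incumbent still has positive EI
    # because of its uncertainty.
    print(expected_improvement(mu=0.30, sigma=0.05, best=0.30))
    # A confident but worse candidate has EI close to zero.
    print(expected_improvement(mu=0.40, sigma=0.01, best=0.30))
```

The next experiment would be run at the candidate with the largest value of this score, which trades off a low predicted mean against high predictive uncertainty.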
Adversarially Robust Optimization with Gaussian Processes
In this paper, we consider the problem of Gaussian process (GP) optimization
with an added robustness requirement: The returned point may be perturbed by an
adversary, and we require the function value to remain as high as possible even
after this perturbation. This problem is motivated by settings in which the
underlying functions during optimization and implementation stages are
different, or when one is interested in finding an entire region of good inputs
rather than only a single point. We show that standard GP optimization
algorithms do not exhibit the desired robustness properties, and provide a
novel confidence-bound based algorithm StableOpt for this purpose. We
rigorously establish the required number of samples for StableOpt to find a
near-optimal point, and we complement this guarantee with an
algorithm-independent lower bound. We experimentally demonstrate several
potential applications of interest using real-world data sets, and we show that
StableOpt consistently succeeds in finding a stable maximizer where several
baseline methods fail.
Comment: Corrected typo
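The robustness requirement can be illustrated on a known toy function. This is a sketch of the max-min objective only, not of StableOpt itself, which works with confidence bounds on an unknown function. The function `toy_objective`, the candidate grid, and the perturbation set are all hypothetical.

```python
def robust_argmax(f, candidates, perturbations):
    """Return the candidate whose worst-case value under perturbation is highest."""
    def worst_case(x):
        return min(f(x + d) for d in perturbations)
    return max(candidates, key=worst_case)


def toy_objective(x):
    """A narrow tall peak at x = 1 and a broad, slightly lower peak at x = 3."""
    narrow = 1.0 - 10.0 * abs(x - 1.0)
    broad = 0.8 - 0.2 * abs(x - 3.0)
    return max(narrow, broad)


if __name__ == "__main__":
    candidates = [i * 0.5 for i in range(9)]   # 0.0, 0.5, ..., 4.0
    perturbations = [-0.5, 0.0, 0.5]           # adversary may shift x by up to 0.5
    print("nominal maximizer:", max(candidates, key=toy_objective))
    print("robust maximizer:", robust_argmax(toy_objective, candidates, perturbations))
```

On this toy problem the nominal maximizer sits on the narrow peak, while the robust maximizer sits on the broad one, because a small shift away from the narrow peak costs far more function value.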
Portfolio Allocation for Bayesian Optimization
Bayesian optimization with Gaussian processes has become an increasingly
popular tool in the machine learning community. It is efficient and can be used
when very little is known about the objective function, making it popular in
expensive black-box optimization scenarios. It uses Bayesian methods to sample
the objective efficiently using an acquisition function which incorporates the
model's estimate of the objective and the uncertainty at any given point.
However, there are several different parameterized acquisition functions in the
literature, and it is often unclear which one to use. Instead of using a single
acquisition function, we adopt a portfolio of acquisition functions governed by
an online multi-armed bandit strategy. We propose several portfolio strategies,
the best of which we call GP-Hedge, and show that this method outperforms the
best individual acquisition function. We also provide a theoretical bound on
the algorithm's performance.
Comment: This revision contains an updated performance bound and other
minor text changes
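A portfolio governed by a bandit strategy can be sketched with the standard Hedge rule: each acquisition function keeps a cumulative gain, and one is sampled with probability proportional to the exponentiated gain. The abstract does not say how the gains are computed, so they are plain inputs here, and the names `hedge_probabilities`, `choose_acquisition`, and `eta` are hypothetical.

```python
import math
import random


def hedge_probabilities(gains, eta):
    """Softmax of eta * gains, computed stably by subtracting the maximum."""
    top = max(gains)
    weights = [math.exp(eta * (g - top)) for g in gains]
    total = sum(weights)
    return [w / total for w in weights]


def choose_acquisition(gains, eta, rng):
    """Sample the index of the acquisition function to follow this round."""
    probs = hedge_probabilities(gains, eta)
    return rng.choices(range(len(gains)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical cumulative gains for three acquisition functions.
    gains = [0.2, 1.5, 0.9]
    print(hedge_probabilities(gains, eta=1.0))
    print(choose_acquisition(gains, eta=1.0, rng=rng))
```

After each round, the caller would add a reward to every entry of `gains`, so acquisition functions that keep proposing good points are selected more often.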
Time-Varying Gaussian Process Bandit Optimization
We consider the sequential Bayesian optimization problem with bandit
feedback, adopting a formulation that allows for the reward function to vary
with time. We model the reward function using a Gaussian process whose
evolution obeys a simple Markov model. We introduce two natural extensions of
the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The
first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB,
instead forgets about old data in a smooth fashion. Our main contribution
consists of novel regret bounds for these algorithms, providing an explicit
characterization of the trade-off between the time horizon and the rate at
which the function varies. We illustrate the performance of the algorithms on
both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB
to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover,
both algorithms significantly outperform classical GP-UCB, since it treats
stale and fresh data equally.
Comment: To appear in AISTATS 201
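The contrast between sharp resetting and smooth forgetting can be sketched as follows. The abstract does not give the exact forgetting rule, so the geometric down-weighting by time gap used here is an assumption, and `active_rounds_after_reset`, `forgetting_weights`, `block_length`, and `epsilon` are hypothetical names.

```python
def active_rounds_after_reset(t, block_length):
    """Past rounds still used at round t when the model restarts every block_length rounds."""
    return list(range(t - t % block_length, t))


def forgetting_weights(num_rounds, epsilon):
    """Matrix of factors that shrink the covariance between rounds i and j.

    Assumed form: (1 - epsilon) ** (|i - j| / 2), so observations that are far
    apart in time are treated as less informative about each other.
    """
    return [
        [(1.0 - epsilon) ** (abs(i - j) / 2.0) for j in range(num_rounds)]
        for i in range(num_rounds)
    ]


if __name__ == "__main__":
    # Resetting: at round 7 with blocks of 3, only round 6 is still in use.
    print(active_rounds_after_reset(7, 3))
    # Smooth forgetting: weights decay gradually with the time gap.
    for row in forgetting_weights(4, epsilon=0.19):
        print([round(w, 3) for w in row])
```

Multiplying a spatial kernel matrix elementwise by these weights would down-weight stale observations gradually, whereas resetting discards them all at once.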