Theoretical Analysis of Bayesian Optimisation with Unknown Gaussian Process Hyper-Parameters
Bayesian optimisation has gained great popularity as a tool for optimising
the parameters of machine learning algorithms and models. Somewhat ironically,
setting up the hyper-parameters of Bayesian optimisation methods is notoriously
hard. While reasonable practical solutions have been advanced, they can often
fail to find the best optima. Surprisingly, there is little theoretical
analysis of this crucial problem in the literature. To address this, we derive
a cumulative regret bound for Bayesian optimisation with Gaussian processes and
unknown kernel hyper-parameters in the stochastic setting. The bound, which
applies to the expected improvement acquisition function and sub-Gaussian
observation noise, provides us with guidelines on how to design hyper-parameter
estimation methods. A simple simulation demonstrates the importance of
following these guidelines.
Comment: 16 pages, 1 figure
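For context on the practical recipe the abstract contrasts itself with, the sketch below (illustrative only; the data, kernel choices, and names are assumptions, not the paper's method) fits GP kernel hyper-parameters by maximising the log marginal likelihood with scikit-learn.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D data; in practice these would be the evaluations gathered so far.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(6.0 * X).ravel() + 0.1 * rng.normal(size=20)

# The length-scale and noise level below are initial guesses; fit() re-estimates
# them by maximising the log marginal likelihood, the usual practical approach.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5).fit(X, y)
print(gp.kernel_)  # kernel with the fitted hyper-parameters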
Narrowing the Gap: Random Forests In Theory and In Practice
Despite widespread interest and practical use, the theoretical properties of
random forests are still not well understood. In this paper we contribute to
this understanding in two ways. We present a new theoretically tractable
variant of random regression forests and prove that our algorithm is
consistent. We also provide an empirical evaluation, comparing our algorithm
and other theoretically tractable random forest models to the random forest
algorithm used in practice. Our experiments provide insight into the relative
importance of different simplifications that theoreticians have made to obtain
tractable models for analysis.
Comment: Under review by the International Conference on Machine Learning (ICML) 201
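To make the kind of simplification the abstract alludes to concrete, here is a minimal toy sketch (my own illustration, not the paper's variant): a regression forest whose split dimensions and thresholds are drawn at random, independently of the response values, and without the bootstrapping used in practice. That independence from the labels is what typically makes such variants tractable to analyse.

import numpy as np

def build_tree(X, y, depth, rng):
    # Splits are drawn at random, ignoring y: a common theoretical simplification.
    if depth == 0 or len(y) < 2:
        return ("leaf", float(y.mean()) if len(y) else 0.0)
    d = int(rng.integers(X.shape[1]))
    lo, hi = X[:, d].min(), X[:, d].max()
    if lo == hi:
        return ("leaf", float(y.mean()))
    t = rng.uniform(lo, hi)
    left = X[:, d] <= t
    if left.all() or (~left).all():
        return ("leaf", float(y.mean()))
    return ("node", d, t,
            build_tree(X[left], y[left], depth - 1, rng),
            build_tree(X[~left], y[~left], depth - 1, rng))

def predict_tree(tree, x):
    while tree[0] == "node":
        _, d, t, left, right = tree
        tree = left if x[d] <= t else right
    return tree[1]

def forest_predict(X_train, y_train, X_test, n_trees=25, depth=6, seed=0):
    # Average the predictions of many randomised trees.
    rng = np.random.default_rng(seed)
    trees = [build_tree(X_train, y_train, depth, rng) for _ in range(n_trees)]
    return np.array([np.mean([predict_tree(t, x) for t in trees]) for x in X_test])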
Linear and Parallel Learning of Markov Random Fields
We introduce a new embarrassingly parallel parameter learning algorithm for
Markov random fields with untied parameters which is efficient for a large
class of practical models. Our algorithm parallelizes naturally over cliques
and, for graphs of bounded degree, its complexity is linear in the number of
cliques. Unlike its competitors, our algorithm is fully parallel and for
log-linear models it is also data efficient, requiring only the local
sufficient statistics of the data to estimate parameters.
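As a toy illustration of the "local sufficient statistics" point (the binary pairwise model and all names here are my assumptions, not the paper's estimator), each clique's empirical statistic depends only on its own columns of the data, so the statistics for different cliques can be computed independently and in parallel.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def clique_statistic(data, clique):
    # For a binary log-linear model, a clique's sufficient statistic is the
    # empirical mean of the product of its variables; only those columns are read.
    cols = data[:, list(clique)]
    return clique, float(cols.prod(axis=1).mean())

def all_clique_statistics(data, cliques, workers=4):
    # Cliques are independent of one another, so the work parallelises trivially.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda c: clique_statistic(data, c), cliques))

# Example: three pairwise cliques over five binary variables.
rng = np.random.default_rng(0)
samples = rng.integers(0, 2, size=(1000, 5))
print(all_clique_statistics(samples, [(0, 1), (1, 2), (3, 4)]))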
A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning
We present a tutorial on Bayesian optimization, a method of finding the
maximum of expensive cost functions. Bayesian optimization employs the Bayesian
technique of setting a prior over the objective function and combining it with
evidence to get a posterior function. This permits a utility-based selection of
the next observation to make on the objective function, which must take into
account both exploration (sampling from areas of high uncertainty) and
exploitation (sampling areas likely to offer improvement over the current best
observation). We also present two detailed extensions of Bayesian optimization,
with experiments---active user modelling with preferences, and hierarchical
reinforcement learning---and a discussion of the pros and cons of Bayesian
optimization based on our experiences.
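The loop the tutorial describes can be sketched as follows; this is a minimal, assumed implementation using scikit-learn, expected improvement, and a fixed candidate grid, not the tutorial's own code.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bayes_opt(objective, candidates, n_init=3, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    X = candidates[rng.choice(len(candidates), size=n_init, replace=False)]
    y = np.array([objective(x) for x in X])
    for _ in range(n_iter):
        # Posterior over the objective given the evaluations seen so far.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
        mu, sigma = gp.predict(candidates, return_std=True)
        # Expected improvement balances exploitation (high mu) and exploration (high sigma).
        z = (mu - y.max()) / np.maximum(sigma, 1e-12)
        ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = candidates[int(np.argmax(ei))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X[int(np.argmax(y))], float(y.max())

# Example: maximise a 1-D toy objective over a grid of candidate points.
grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
best_x, best_y = bayes_opt(lambda x: float(-(x[0] - 0.3) ** 2), grid)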