43 research outputs found
Student-t Processes as Alternatives to Gaussian Processes
We investigate the Student-t process as an alternative to the Gaussian
process as a nonparametric prior over functions. We derive closed form
expressions for the marginal likelihood and predictive distribution of a
Student-t process, by integrating away an inverse Wishart process prior over
the covariance kernel of a Gaussian process model. We show surprising
equivalences between different hierarchical Gaussian process models leading to
Student-t processes, and derive a new sampling scheme for the inverse Wishart
process, which helps elucidate these equivalences. Overall, we show that a
Student-t process can retain the attractive properties of a Gaussian process --
a nonparametric representation, analytic marginal and predictive distributions,
and easy model selection through covariance kernels -- but has enhanced
flexibility, and predictive covariances that, unlike a Gaussian process,
explicitly depend on the values of training observations. We verify empirically
that a Student-t process is especially useful in situations where there are
changes in covariance structure, or in applications like Bayesian optimization,
where accurate predictive covariances are critical for good performance. These
advantages come at no additional computational cost over Gaussian processes.
Comment: 13 pages, 6 figures, 1 table. To appear in "The Seventeenth
International Conference on Artificial Intelligence and Statistics" (AISTATS),
2014.
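The claim that predictive covariances depend on the training observations can be illustrated with a minimal sketch. It assumes a zero-mean prior, an RBF kernel, and the commonly cited conditional form in which the Gaussian-process posterior covariance is rescaled by (nu + beta - 2) / (nu + n - 2), where beta is the squared Mahalanobis norm of the training targets. The names `rbf_kernel` and `tp_predict` and the default `nu` are hypothetical and not taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel on 1-D inputs (an illustrative choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def tp_predict(x_train, y_train, x_test, nu=5.0, jitter=1e-8):
    """Zero-mean Student-t process prediction (sketch, assumes nu > 2).

    Returns the predictive mean, the Student-t predictive covariance,
    and the Gaussian-process posterior covariance for comparison.
    """
    n = len(x_train)
    K11 = rbf_kernel(x_train, x_train) + jitter * np.eye(n)
    K21 = rbf_kernel(x_test, x_train)
    K22 = rbf_kernel(x_test, x_test)

    L = np.linalg.cholesky(K11)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K11^{-1} y
    mean = K21 @ alpha                                         # same as the GP mean

    V = np.linalg.solve(L, K21.T)
    gp_cov = K22 - V.T @ V                                     # GP posterior covariance

    beta = y_train @ alpha                                     # y^T K11^{-1} y
    scale = (nu + beta - 2.0) / (nu + n - 2.0)                 # depends on y_train
    return mean, scale * gp_cov, gp_cov

x_train = np.array([0.0, 1.0, 2.0])
x_test = np.array([0.5, 1.5])
y_small = np.array([0.1, -0.1, 0.05])

mean_a, tp_cov_a, gp_cov_a = tp_predict(x_train, y_small, x_test)
mean_b, tp_cov_b, gp_cov_b = tp_predict(x_train, 10.0 * y_small, x_test)
```

In this sketch, scaling the training targets leaves the Gaussian-process covariance unchanged but inflates the Student-t covariance, and the only extra work beyond the Gaussian-process computation is the scalar `beta`.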
Hypervolume-based Multi-objective Bayesian Optimization with Student-t Processes
Student-t processes have recently been proposed as an appealing alternative
nonparametric function prior. They feature enhanced flexibility and
predictive variance. In this work the use of Student-t processes is explored
for multi-objective Bayesian optimization. In particular, an analytical
expression for the hypervolume-based probability of improvement is developed
for independent Student-t process priors of the objectives. Its effectiveness
is shown on a multi-objective optimization problem which is known to be
difficult with traditional Gaussian processes.
Comment: 5 pages, 3 figures
On the average uncertainty for systems with nonlinear coupling
The increased uncertainty and complexity of nonlinear systems have motivated
investigators to consider generalized approaches to defining an entropy
function. New insights are achieved by defining the average uncertainty in the
probability domain as a transformation of entropy functions. The Shannon
entropy when transformed to the probability domain is the weighted geometric
mean of the probabilities. For the exponential and Gaussian distributions, we
show that the weighted geometric mean of the distribution is equal to the
density of the distribution at the location plus the scale, i.e. at the width
of the distribution. The average uncertainty is generalized via the weighted
generalized mean, in which the moment is a function of the nonlinear source.
Both the Renyi and Tsallis entropies transform to this definition of the
generalized average uncertainty in the probability domain. For the generalized
Pareto and Student's t-distributions, which are the maximum entropy
distributions for these generalized entropies, the appropriate weighted
generalized mean also equals the density of the distribution at the location
plus scale. A coupled entropy function is proposed, which is equal to the
normalized Tsallis entropy divided by one plus the coupling.
Comment: 24 pages, including 4 figures and 1 table
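The statement about the exponential and Gaussian distributions can be checked numerically. The sketch below assumes that the "weighted geometric mean of the probabilities" corresponds to exp(-H), with H the differential Shannon entropy, and compares it with the density evaluated at location plus scale. The parameter values, grid sizes, and function names are arbitrary illustrative choices.

```python
import numpy as np

def geometric_mean_density(pdf, grid):
    # exp(-H), with H = -integral of p ln p, via the trapezoidal rule.
    # The grid must be uniform and the density strictly positive on it.
    p = pdf(grid)
    integrand = -p * np.log(p)
    dx = grid[1] - grid[0]
    entropy = np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dx
    return np.exp(-entropy)

# Gaussian with location mu and scale sigma.
mu, sigma = 1.0, 2.0

def gauss_pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

gauss_grid = np.linspace(mu - 12.0 * sigma, mu + 12.0 * sigma, 200001)
gm_gauss = geometric_mean_density(gauss_pdf, gauss_grid)
density_gauss = gauss_pdf(mu + sigma)

# Exponential with location 0 and scale s.
s = 1.5

def expon_pdf(x):
    return np.exp(-x / s) / s

expon_grid = np.linspace(0.0, 60.0 * s, 400001)
gm_expon = geometric_mean_density(expon_pdf, expon_grid)
density_expon = expon_pdf(0.0 + s)

print(gm_gauss, density_gauss)
print(gm_expon, density_expon)
```

Analytically, both Gaussian quantities equal 1 / (sigma * sqrt(2 * pi * e)) and both exponential quantities equal 1 / (s * e), so each printed pair should agree up to quadrature error.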
Practical Bayesian optimization in the presence of outliers
Inference in the presence of outliers is an important field of research as
outliers are ubiquitous and may arise across a variety of problems and domains.
Bayesian optimization is a method that relies heavily on probabilistic inference.
This allows outstanding sample efficiency because the probabilistic machinery
provides a memory of the whole optimization process. However, that virtue
becomes a disadvantage when the memory is populated with outliers, inducing
bias in the estimation. In this paper, we present an empirical evaluation of
Bayesian optimization methods in the presence of outliers. The empirical
evidence shows that Bayesian optimization with robust regression often produces
suboptimal results. We then propose a new algorithm which combines robust
regression (a Gaussian process with Student-t likelihood) with outlier
diagnostics to classify data points as outliers or inliers. By using a
scheduler for the classification of outliers, our method is more efficient and
converges better than standard robust regression. Furthermore, we
show that even in controlled situations with no expected outliers, our method
is able to produce better results.
Comment: 10 pages (2 of references), 6 figures, 1 algorithm
BRUNO: A Deep Recurrent Model for Exchangeable Data
We present a novel model architecture which leverages deep learning tools to
perform exact Bayesian inference on sets of high dimensional, complex
observations. Our model is provably exchangeable, meaning that the joint
distribution over observations is invariant under permutation: this property
lies at the heart of Bayesian inference. The model does not require variational
approximations to train, and new samples can be generated conditional on
previous samples, with cost linear in the size of the conditioning set. The
advantages of our architecture are demonstrated on learning tasks that require
generalisation from short observed sequences while modelling sequence
variability, such as conditional image generation, few-shot learning, and
anomaly detection.
Comment: NIPS 201