Laplace Approximation for Divisive Gaussian Processes for Nonstationary Regression
The standard Gaussian Process regression (GP) is usually formulated under stationary hypotheses: the noise power is considered constant throughout the input space, and the covariance of the prior distribution is typically modeled as depending only on the difference between input samples. These assumptions can be too restrictive and unrealistic for many real-world problems. Although nonstationarity can be achieved using specific covariance functions, these require prior knowledge of the kind of nonstationarity, which is not available for most applications. In this paper we propose using the Laplace approximation for inference in a divisive GP model that performs nonstationary regression, including heteroscedastic-noise cases. The log-concavity of the likelihood ensures a unimodal posterior, so the Laplace approximation converges to a unique maximum. The characteristics of the likelihood also yield posterior approximations that are accurate compared with Expectation Propagation (EP) and with the asymptotically exact posterior provided by a Markov chain Monte Carlo implementation with Elliptical Slice Sampling (ESS), at a reduced computational load with respect to both EP and ESS.
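The divisive likelihood of the paper is not reproduced here; the sketch below only shows the generic Laplace machinery it builds on, a Newton iteration for the posterior mode of a GP with a factorizing, log-concave likelihood (the standard GPML-style formulation). The callables `grad_log_lik` and `hess_diag_log_lik` are assumed placeholders for the first derivative and the diagonal of the second derivative of the log-likelihood with respect to the latent values.

```python
import numpy as np

def laplace_gp_posterior(K, grad_log_lik, hess_diag_log_lik, n_iter=50, tol=1e-8):
    """Newton iteration for the mode of p(f | y) with prior f ~ N(0, K) and a
    factorizing, log-concave likelihood; returns the mode and the covariance
    of the Gaussian (Laplace) approximation around it."""
    n = K.shape[0]
    f = np.zeros(n)
    for _ in range(n_iter):
        W = -hess_diag_log_lik(f)                 # nonnegative when log-concave
        sW = np.sqrt(W)
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        b = W * f + grad_log_lik(f)
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f_new = K @ a
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    # Laplace covariance: (K^-1 + W)^-1 = K - K sW (I + sW K sW)^-1 sW K
    W = -hess_diag_log_lik(f)
    sW = np.sqrt(W)
    L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
    V = np.linalg.solve(L, sW[:, None] * K)
    return f, K - V.T @ V
```

Because the likelihood is log-concave, W stays nonnegative and the Newton iteration has a single fixed point, which is the property the abstract appeals to.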
Slice sampling covariance hyperparameters of latent Gaussian models
The Gaussian process (GP) is a popular way to specify dependencies between
random variables in a probabilistic model. In the Bayesian framework the
covariance structure can be specified using unknown hyperparameters.
Integrating over these hyperparameters considers different possible
explanations for the data when making predictions. This integration is often
performed using Markov chain Monte Carlo (MCMC) sampling. However, with
non-Gaussian observations standard hyperparameter sampling approaches require
careful tuning and may converge slowly. In this paper we present a slice
sampling approach that requires little tuning while mixing well in both strong-
and weak-data regimes.
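The paper's specific reparameterisations are not reproduced here; for reference, the basic univariate stepping-out slice sampler (Neal, 2003) that such schemes build on might be sketched as follows, with `log_post` an assumed placeholder for the log of the unnormalised conditional posterior of a single (log-)hyperparameter.

```python
import numpy as np

def slice_sample_1d(log_post, x0, w=1.0, max_steps=50, rng=None):
    """One slice-sampling update (Neal, 2003) with stepping-out: draw a
    vertical level under log_post(x0), grow an interval until both ends
    fall below it, then shrink the interval until a point on the slice
    is accepted. Needs essentially no tuning beyond the width w."""
    rng = np.random.default_rng() if rng is None else rng
    log_y = log_post(x0) + np.log(rng.uniform())
    left = x0 - w * rng.uniform()
    right = left + w
    for _ in range(max_steps):                 # step out to the left
        if log_post(left) < log_y:
            break
        left -= w
    for _ in range(max_steps):                 # step out to the right
        if log_post(right) < log_y:
            break
        right += w
    while True:                                # shrink onto the slice
        x1 = rng.uniform(left, right)
        if log_post(x1) >= log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1
```

In a GP model, one such update per hyperparameter (for example a log-lengthscale, with `log_post` combining the relevant conditional likelihood and the prior) can be interleaved with updates of the latent function values.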
Advances in Bayesian inference and stable optimization for large-scale machine learning problems
A core task in machine learning, and the topic of this thesis, is developing faster and more accurate methods of posterior inference in probabilistic models. The thesis has two components. The first explores using deterministic methods to improve the efficiency of Markov Chain Monte Carlo (MCMC) algorithms. We propose new MCMC algorithms that can use deterministic methods as a “prior” to bias MCMC proposals toward areas of high posterior density, leading to highly efficient sampling. In Chapter 2 we develop such methods for continuous distributions, and in Chapter 3 for binary distributions. The resulting methods consistently outperform existing state-of-the-art sampling techniques, sometimes by several orders of magnitude. Chapter 4 uses ideas similar to those in Chapters 2 and 3, but in the context of modeling the performance of left-handed players in one-on-one interactive sports.
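The thesis' own proposal constructions are not shown here; a minimal sketch of the general idea, using a fixed Gaussian approximation of the posterior (for instance from a Laplace or variational fit) as an independence Metropolis-Hastings proposal, could look like the following. `log_post`, `mu`, and `Sigma` are assumed placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal

def independence_mh(log_post, mu, Sigma, n_samples=1000, rng=None):
    """Independence Metropolis-Hastings whose proposal is a fixed Gaussian
    N(mu, Sigma), e.g. a deterministic approximation of the posterior; the
    acceptance probability depends only on the importance ratios p(x)/q(x)."""
    rng = np.random.default_rng() if rng is None else rng
    q = multivariate_normal(mean=mu, cov=Sigma)
    x = np.asarray(mu, dtype=float)
    log_w = log_post(x) - q.logpdf(x)
    samples = []
    for _ in range(n_samples):
        x_prop = rng.multivariate_normal(mu, Sigma)
        log_w_prop = log_post(x_prop) - q.logpdf(x_prop)
        if np.log(rng.uniform()) < log_w_prop - log_w:
            x, log_w = x_prop, log_w_prop
        samples.append(x)
    return np.array(samples)
```

The better the deterministic approximation matches the posterior, the closer the importance ratios are to constant and the higher the acceptance rate, which is the mechanism this part of the thesis exploits.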
The second part of this thesis explores the use of stable stochastic gradient descent (SGD) methods for computing a maximum a posteriori (MAP) estimate in large-scale machine learning problems. In Chapter 5 we propose two such methods for softmax regression. The first is an implementation of Implicit SGD (ISGD), a stable but difficult-to-implement SGD method, and the second is a new SGD method specifically designed for optimizing a double-sum formulation of the softmax. Both methods comprehensively outperform the previous state of the art on seven real-world datasets. Inspired by the success of ISGD on the softmax, we investigate its application to neural networks in Chapter 6. In this chapter we present a novel layer-wise approximation of ISGD that has efficiently computable updates. Experiments show that the resulting method is more robust to high learning rates and generally outperforms standard backpropagation on a variety of tasks.
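Neither the double-sum softmax formulation nor the layer-wise network approximation is reproduced here; the sketch below only illustrates the implicit-SGD principle on ordinary least squares, where the implicit update happens to have a closed form. The function name and data shapes are illustrative.

```python
import numpy as np

def isgd_least_squares(X, y, lr=0.5, n_epochs=10, rng=None):
    """Implicit SGD for linear least squares. Each step solves the implicit
    equation  theta' = theta - lr * (x_i @ theta' - y_i) * x_i,  whose closed
    form damps the step by 1 + lr * ||x_i||^2; this damping is what makes
    implicit SGD stable under large learning rates."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            x_i = X[i]
            resid = x_i @ theta - y[i]
            theta = theta - lr * resid / (1.0 + lr * (x_i @ x_i)) * x_i
    return theta
```

For the softmax and for neural-network layers the implicit equation is no longer available in closed form, which is exactly the difficulty the thesis' approximations address.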
Pseudo-Marginal Bayesian Inference for Gaussian Processes
The main challenges that arise when adopting Gaussian Process priors in
probabilistic modeling are how to carry out exact Bayesian inference and how to
account for uncertainty on model parameters when making model-based predictions
on out-of-sample data. Using probit regression as an illustrative working
example, this paper presents a general and effective methodology based on the
pseudo-marginal approach to Markov chain Monte Carlo that efficiently addresses
both of these issues. The results presented in this paper show improvements
over existing sampling methods to simulate from the posterior distribution over
the parameters defining the covariance function of the Gaussian Process prior.
This is particularly important as it offers a powerful tool to carry out full
Bayesian inference of Gaussian Process based hierarchical statistical models in
general. The results also demonstrate that Monte Carlo based integration of all
model parameters is actually feasible in this class of models, providing a
superior quantification of uncertainty in predictions. Extensive comparisons
with respect to state-of-the-art probabilistic classifiers confirm this
assertion.
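The full probit-regression treatment of the paper is not reproduced here; a minimal sketch of the pseudo-marginal mechanism itself, a random-walk Metropolis-Hastings step whose acceptance ratio uses an unbiased but noisy estimate of the marginal likelihood (obtained, for example, by importance sampling over the latent GP values), could look as follows. `log_lik_estimate` and `log_prior` are assumed placeholders.

```python
import numpy as np

def pseudo_marginal_mh(log_lik_estimate, log_prior, theta0, step=0.1,
                       n_samples=2000, rng=None):
    """Pseudo-marginal Metropolis-Hastings over covariance hyperparameters:
    the intractable marginal likelihood is replaced by an unbiased estimate,
    and the estimate attached to the current state is recycled until the next
    acceptance, which keeps the exact posterior invariant."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    log_target = log_lik_estimate(theta, rng) + log_prior(theta)
    chain = []
    for _ in range(n_samples):
        prop = theta + step * rng.standard_normal(theta.shape)
        log_target_prop = log_lik_estimate(prop, rng) + log_prior(prop)
        if np.log(rng.uniform()) < log_target_prop - log_target:
            theta, log_target = prop, log_target_prop
        chain.append(theta.copy())
    return np.array(chain)
```

The key property is that, as long as the likelihood estimate is unbiased, the chain targets the exact posterior over the hyperparameters regardless of the estimator's variance; the variance only affects how well the chain mixes.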
Integrals over Gaussians under Linear Domain Constraints
Integrals of linearly constrained multivariate Gaussian densities are a
frequent problem in machine learning and statistics, arising in tasks like
generalized linear models and Bayesian optimization. Yet they are notoriously
hard to compute, and to further complicate matters, the numerical values of
such integrals may be very small. We present an efficient black-box algorithm
that exploits geometry to estimate integrals over a small, truncated Gaussian
volume and to simulate from it. Our algorithm uses the
Holmes-Diaconis-Ross (HDR) method combined with an analytic version of
elliptical slice sampling (ESS). Adapted to the linear setting, ESS allows for
rejection-free sampling, because intersections of ellipses and domain
boundaries have closed-form solutions. The key idea of HDR is to decompose the
integral into easier-to-compute conditional probabilities by using a sequence
of nested domains. Remarkably, it allows for direct computation of the
logarithm of the integral value and thus enables the computation of extremely
small probability masses. We demonstrate the effectiveness of our tailored
combination of HDR and ESS on high-dimensional integrals and on entropy search
for Bayesian optimization
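The HDR nesting is not reproduced here; the sketch below only illustrates why the elliptical slice sampling step becomes rejection-free under linear constraints: along the ESS ellipse x·cos(t) + nu·sin(t), a half-space constraint a·z + b >= 0 reduces to A·cos(t) + B·sin(t) + b >= 0, whose boundary angles are available in closed form. The helper below is hypothetical, not the authors' implementation.

```python
import numpy as np

def boundary_angles(x, nu, a, b):
    """Angles t at which the ESS ellipse x*cos(t) + nu*sin(t) crosses the
    boundary of the half-space {z : a @ z + b >= 0}. Writing the constraint
    along the ellipse as A*cos(t) + B*sin(t) + b = r*cos(t - phi) + b, the
    crossings are t = phi +/- arccos(-b / r); an empty array means the whole
    ellipse lies on one side of the boundary."""
    A, B = a @ x, a @ nu
    r = np.hypot(A, B)
    if r < abs(b):
        return np.array([])
    phi = np.arctan2(B, A)
    delta = np.arccos(np.clip(-b / r, -1.0, 1.0))
    return np.mod(np.array([phi - delta, phi + delta]), 2.0 * np.pi)
```

Intersecting the feasible arcs obtained from every constraint yields the angular bracket from which the next point on the ellipse is drawn, so no proposal ever leaves the domain and none has to be rejected.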
Gaussian Processes with Monotonicity constraints for Big Data
In this thesis, we combine recent advances in monotonicity constraints for Gaussian processes with Big Data inference for Gaussian processes. The new variational-inference-based method is developed and evaluated on several simulated and real-world data sets, comparing its predictive performance to Expectation Propagation and Markov chain Monte Carlo methods. The results indicate that the new method performs well and can be used when data sets grow too large for the computationally demanding methods.
Perspectives of Imaging of Single Protein Molecules with the Present Design of the European XFEL. - Part I - X-ray Source, Beamline Optics and Instrument Simulations
The Single Particles, Clusters and Biomolecules (SPB) instrument at the
European XFEL is located behind the SASE1 undulator, and aims to support
imaging and structure determination of biological specimens between about 0.1
micrometer and 1 micrometer in size. The instrument is designed to work at photon
energies from 3 keV up to 16 keV. This wide operating range poses challenges
for the focusing optics. In particular, a long propagation distance
of about 900 m between x-ray source and sample leads to a large lateral photon
beam size at the optics. The beam divergence is the most important parameter
for the optical system, and is largest for the lowest photon energies and for
the shortest pulse duration (corresponding to the lowest charge). Due to the
large divergence of nominal X-ray pulses with duration shorter than 10 fs, one
suffers diffraction from the mirror aperture, leading to a 100-fold decrease in
fluence at photon energies around 4 keV, which are ideal for imaging of single
biomolecules. The nominal SASE1 output power is about 50 GW. This is very far
from the level required for single biomolecule imaging, even assuming perfect
beamline and focusing efficiency. Here we demonstrate that the parameters of
the accelerator complex and of the SASE1 undulator offer an opportunity to
optimize the SPB beamline for single biomolecule imaging with minimal
additional costs and time. Start-to-end simulations from the electron injector
at the beginning of the accelerator complex up to the generation of diffraction
data indicate that one can achieve about
0.5 photons per Shannon pixel at near-atomic resolution with 1e13 photons in a
4 fs pulse at 4 keV photon energy and in a 100 nm focus, corresponding to a
fluence of 1e23 ph/cm^2. This result is exemplified using the RNA Pol II
molecule as a case study.
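As a sanity check on the quoted figures (assuming the 100 nm value refers to the side of a square focal spot), the stated fluence follows directly:

\Phi = \frac{N_{\mathrm{ph}}}{A_{\mathrm{focus}}} = \frac{10^{13}\ \mathrm{photons}}{(100\ \mathrm{nm})^{2}} = \frac{10^{13}}{10^{-10}\ \mathrm{cm}^{2}} = 10^{23}\ \mathrm{ph/cm^{2}}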