3 research outputs found
Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes
Deep Gaussian Process (DGP) models offer a powerful nonparametric approach
for Bayesian inference, but exact inference is typically intractable,
motivating the use of various approximations. However, existing approaches,
such as mean-field Gaussian assumptions, limit the expressiveness and efficacy
of DGP models, while stochastic approximation can be computationally expensive.
To tackle these challenges, we introduce Neural Operator Variational Inference
(NOVI) for Deep Gaussian Processes. NOVI uses a neural generator to obtain a
sampler and minimizes the Regularized Stein Discrepancy in L2 space between the
generated distribution and true posterior. We solve the minimax problem using
Monte Carlo estimation and subsampling stochastic optimization techniques. We
demonstrate that the bias introduced by our method can be controlled by
multiplying the Fisher divergence with a constant, which leads to robust error
control and ensures the stability and precision of the algorithm. Our
experiments on datasets ranging from hundreds to tens of thousands demonstrate
the effectiveness and the faster convergence rate of the proposed method. We
achieve a classification accuracy of 93.56% on the CIFAR-10 dataset,
outperforming SOTA Gaussian process methods. Furthermore, our method guarantees
theoretically controlled prediction error for DGP models and demonstrates
remarkable performance on various datasets. We are optimistic that NOVI has the
potential to enhance the performance of deep Bayesian nonparametric models and
could have significant implications for a variety of practical applications.
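The core quantity driving NOVI, a Stein discrepancy between generated samples and the target posterior, can be illustrated in miniature. The sketch below computes the standard kernelized Stein discrepancy in one dimension with an RBF kernel, not NOVI's regularized variant in L2 space; the target, bandwidth, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ksd_rbf(x, score, h=1.0):
    """V-statistic estimate of the squared kernelized Stein discrepancy
    between 1-D samples x and a target with score function `score`,
    using an RBF kernel with bandwidth h."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]            # pairwise differences x_i - x_j
    k = np.exp(-d**2 / (2 * h**2))         # RBF kernel matrix
    s = score(x)                           # target score at each sample
    # Stein kernel u_p(x_i, x_j) for the RBF kernel in one dimension
    u = (s[:, None] * s[None, :] * k       # s(x_i) s(x_j) k
         + s[:, None] * (d / h**2) * k     # s(x_i) * dk/dx_j
         + s[None, :] * (-d / h**2) * k    # s(x_j) * dk/dx_i
         + (1 / h**2 - d**2 / h**4) * k)   # d^2 k / (dx_i dx_j)
    return u.mean()

rng = np.random.default_rng(0)
score = lambda x: -x                                # score of a standard normal target
good = ksd_rbf(rng.normal(0.0, 1.0, 500), score)    # samples from the target
bad = ksd_rbf(rng.normal(2.0, 1.0, 500), score)     # samples from a shifted distribution
```

Samples drawn from the target yield a discrepancy near zero, while the shifted samples give a much larger value; an objective built from such a quantity pushes a neural generator's samples toward the posterior.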
Adversarial α-divergence Minimization for Bayesian Approximate Inference
Neural networks are popular state-of-the-art models for many different
tasks. They are often trained via back-propagation to find a value of the
weights that correctly predicts the observed data. Although back-propagation
has shown good performance in many applications, it cannot easily output an
estimate of the uncertainty in the predictions made. Estimating the uncertainty
in the predictions is a critical aspect with important applications, and one
method to obtain this information is following a Bayesian approach to estimate
a posterior distribution on the model parameters. This posterior distribution
summarizes which parameter values are compatible with the data, but is usually
intractable and has to be approximated. Several mechanisms have been considered
for solving this problem. We propose here a general method for approximate
Bayesian inference that is based on minimizing α-divergences and that
allows for flexible approximate distributions. The method is evaluated in the
context of Bayesian neural networks on extensive experiments. The results show
that, in regression problems, it often gives better performance in terms of the
test log-likelihood and sometimes in terms of the squared error. In
classification problems, however, it gives competitive results.
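The α-divergence minimized here interpolates between the two Kullback-Leibler divergences that underlie variational inference and expectation propagation. As a quick numerical illustration (a hedged sketch, not code from the paper), the Amari α-divergence between two Gaussians can be evaluated on a grid and compared against its KL limits; the particular distributions and names below are assumptions for demonstration only.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
norm = lambda m, s: np.exp(-(x - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def alpha_div(p, q, alpha):
    """Amari alpha-divergence D_alpha(p || q) by numerical integration.
    D_alpha -> KL(q || p) as alpha -> 0 and -> KL(p || q) as alpha -> 1."""
    integrand = p**alpha * q**(1 - alpha)
    return (1.0 - integrand.sum() * dx) / (alpha * (1.0 - alpha))

def kl(p, q):
    """KL(p || q) by numerical integration on the same grid."""
    return (p * np.log(p / q)).sum() * dx

p, q = norm(0.0, 1.0), norm(1.0, 1.5)   # illustrative target and approximation

d_vi = alpha_div(p, q, 1e-4)            # alpha near 0: VI-style exclusive KL(q || p)
d_ep = alpha_div(p, q, 1 - 1e-4)        # alpha near 1: EP-style inclusive KL(p || q)
```

At the endpoints the α-divergence agrees with the corresponding KL divergence to within the integration error, which is the interpolation property both α-divergence papers in this listing exploit.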
Alpha-divergence minimization for deep Gaussian processes
This paper proposes the minimization of α-divergences for approximate inference in the
context of deep Gaussian processes (DGPs). The proposed method can be considered
as a generalization of variational inference (VI) and expectation propagation (EP), two
previously used methods for approximate inference in DGPs. Both VI and EP are based
on the minimization of the Kullback-Leibler divergence. The proposed method is based on
a scalable version of power expectation propagation, a method that introduces an extra
parameter α that specifies the targeted α-divergence to be optimized. In particular, such
a method can recover the VI solution when α → 0 and the EP solution when α → 1.
An exhaustive experimental evaluation shows that the minimization of α-divergences via
the proposed method is feasible in DGPs and that choosing intermediate values of the α
parameter between 0 and 1 can give better results in some problems. This means that
one can improve the results of VI and EP when training DGPs. Importantly, the proposed
method allows for stochastic optimization techniques, making it able to address datasets
with several millions of instances.

The authors gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at Universidad
Autónoma de Madrid. The authors also acknowledge financial support from Spanish Plan Nacional I+D+i, Ministerio de
Ciencia e Innovación, grant PID2019-106827GB-I00 / AEI / 10.13039/50110001103
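The claim that intermediate values of α can help reflects the familiar trade-off between the mode-seeking VI limit (α → 0) and the mass-covering EP limit (α → 1). A brute-force toy illustration of the two regimes follows; the 1-D bimodal target, the Gaussian candidate family, and the grid search are all assumptions for demonstration, not the scalable power-EP method the paper proposes.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
norm = lambda m, s: np.exp(-(x - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
p = 0.5 * norm(-2.0, 0.5) + 0.5 * norm(2.0, 0.5)   # bimodal target density

def alpha_div(q, alpha):
    """Amari alpha-divergence D_alpha(p || q) on the grid."""
    return (1.0 - (p**alpha * q**(1 - alpha)).sum() * dx) / (alpha * (1.0 - alpha))

def best_gaussian(alpha):
    """Brute-force search for the Gaussian q = N(m, s^2) minimizing D_alpha(p || q)."""
    best, best_ms = np.inf, None
    for m in np.linspace(-3.0, 3.0, 25):
        for s in np.linspace(0.3, 3.0, 28):
            d = alpha_div(norm(m, s), alpha)
            if d < best:
                best, best_ms = d, (m, s)
    return best_ms

m_lo, s_lo = best_gaussian(0.05)   # VI-like: locks onto a single mode
m_hi, s_hi = best_gaussian(0.95)   # EP-like: spreads mass over both modes
```

The small-α solution sits on one mode with a narrow width, while the large-α solution is broad and centered, which is why tuning α between 0 and 1 can outperform either extreme on a given problem.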