3 research outputs found

    Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes

    Deep Gaussian Process (DGP) models offer a powerful nonparametric approach to Bayesian inference, but exact inference is typically intractable, motivating the use of various approximations. Existing approaches, such as mean-field Gaussian assumptions, limit the expressiveness and efficacy of DGP models, while stochastic approximation can be computationally expensive. To tackle these challenges, we introduce Neural Operator Variational Inference (NOVI) for deep Gaussian processes. NOVI uses a neural generator to obtain a sampler and minimizes the Regularized Stein Discrepancy in L2 space between the generated distribution and the true posterior. We solve the resulting minimax problem using Monte Carlo estimation and subsampling stochastic optimization techniques. We demonstrate that the bias introduced by our method can be controlled by multiplying the Fisher divergence by a constant, which yields robust error control and ensures the stability and precision of the algorithm. Experiments on datasets ranging from hundreds to tens of thousands of instances demonstrate the effectiveness and faster convergence of the proposed method. We achieve a classification accuracy of 93.56% on the CIFAR10 dataset, outperforming state-of-the-art Gaussian process methods. Furthermore, our method guarantees theoretically controlled prediction error for DGP models and performs well across a variety of datasets. We are optimistic that NOVI can enhance the performance of deep Bayesian nonparametric models and could have significant implications for a range of practical applications.
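The Stein-discrepancy objective described in this abstract can be made concrete with a toy computation. The following is a minimal illustrative sketch (not the paper's NOVI algorithm): a V-statistic Monte Carlo estimate of the kernelized Stein discrepancy in one dimension with an RBF kernel, where the target is known only through its score function d/dx log p(x). The function name `ksd_vstat` and the bandwidth `h` are hypothetical choices for illustration.

```python
import math

def ksd_vstat(xs, score, h=1.0):
    """V-statistic estimate of the kernelized Stein discrepancy between
    samples xs and a 1-D target known only through score(x) = d/dx log p(x).
    Uses an RBF kernel k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    n = len(xs)
    total = 0.0
    for x in xs:
        sx = score(x)
        for y in xs:
            sy = score(y)
            d = x - y
            k = math.exp(-d * d / (2.0 * h * h))
            dk_dy = (d / (h * h)) * k                    # dk/dy
            dk_dx = -dk_dy                               # dk/dx
            d2k = (1.0 / (h * h) - d * d / h ** 4) * k   # d^2 k / dx dy
            # Stein kernel u_p(x, y); its expectation under p is zero
            total += sx * sy * k + sx * dk_dy + sy * dk_dx + d2k
    return total / (n * n)
```

For samples drawn from the target (e.g. a standard normal, with score(x) = -x) the estimate is close to zero, while shifted samples give a much larger value; a neural sampler can then be trained adversarially against an estimate of this kind.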

    Adversarial α-divergence Minimization for Bayesian Approximate Inference

    Neural networks are popular state-of-the-art models for many different tasks. They are often trained via back-propagation to find a value of the weights that correctly predicts the observed data. Although back-propagation has shown good performance in many applications, it cannot easily output an estimate of the uncertainty in its predictions. Estimating this uncertainty is a critical aspect with important applications, and one way to obtain it is to follow a Bayesian approach and estimate a posterior distribution over the model parameters. This posterior distribution summarizes which parameter values are compatible with the data, but it is usually intractable and has to be approximated. Several mechanisms have been considered for solving this problem. We propose here a general method for approximate Bayesian inference that is based on minimizing α-divergences and that allows for flexible approximate distributions. The method is evaluated in the context of Bayesian neural networks on extensive experiments. The results show that, in regression problems, it often gives better performance in terms of the test log-likelihood, and sometimes also in terms of the squared error. In classification problems it gives competitive results.

    Comment: 47 pages, 10 figures (41 pages for the main article, 6 for the supplementary material).
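As a hedged illustration of the α-divergence objective discussed here (a toy sketch, not the paper's adversarial method), Amari's α-divergence between two one-dimensional Gaussians can be estimated by Monte Carlo using samples from the approximating distribution q. The function names `log_gauss` and `alpha_div_mc` are hypothetical.

```python
import math
import random

def log_gauss(x, mu, sigma):
    """Log-density of a 1-D Gaussian N(mu, sigma^2)."""
    return (-0.5 * math.log(2.0 * math.pi * sigma * sigma)
            - (x - mu) ** 2 / (2.0 * sigma * sigma))

def alpha_div_mc(alpha, mu_p, sig_p, mu_q, sig_q, n=50_000, seed=0):
    """Monte Carlo estimate of Amari's alpha-divergence
    D_alpha(p || q) = (1 - E_q[(p/q)^alpha]) / (alpha * (1 - alpha)),
    for alpha in (0, 1), sampling from q."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.gauss(mu_q, sig_q)
        acc += math.exp(alpha * (log_gauss(x, mu_p, sig_p)
                                 - log_gauss(x, mu_q, sig_q)))
    return (1.0 - acc / n) / (alpha * (1.0 - alpha))
```

The estimate is exactly zero when p equals q and grows as the two distributions separate; in the paper an objective of this kind is minimized over flexible approximate distributions rather than a fixed Gaussian family.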

    Alpha-divergence minimization for deep Gaussian processes

    This paper proposes the minimization of α-divergences for approximate inference in the context of deep Gaussian processes (DGPs). The proposed method can be considered a generalization of variational inference (VI) and expectation propagation (EP), two methods previously used for approximate inference in DGPs, both of which are based on minimizing a Kullback-Leibler divergence. The proposed method is based on a scalable version of power expectation propagation, which introduces an extra parameter α specifying the targeted α-divergence to be optimized. In particular, the method recovers the VI solution when α → 0 and the EP solution when α → 1. An exhaustive experimental evaluation shows that minimizing α-divergences via the proposed method is feasible in DGPs and that intermediate values of α between 0 and 1 can give better results on some problems. This means one can improve on the results of VI and EP when training DGPs. Importantly, the proposed method supports stochastic optimization techniques, making it able to address datasets with several millions of instances.

    The authors gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at Universidad Autónoma de Madrid. The authors also acknowledge financial support from the Spanish Plan Nacional I+D+i, Ministerio de Ciencia e Innovación, grant PID2019-106827GB-I00 / AEI / 10.13039/50110001103.
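The limits α → 0 and α → 1 mentioned in the abstract can be checked numerically in a simple setting. Below is a small illustrative sketch (not taken from the paper) using a closed-form expression for Amari's α-divergence between two one-dimensional Gaussians: as α approaches 0 it tends to KL(q || p), the divergence minimized by VI, and as α approaches 1 it tends to KL(p || q), the divergence targeted by EP. The function names `amari_alpha_div` and `kl_gauss` are hypothetical.

```python
import math

def amari_alpha_div(alpha, mu_p, sig_p, mu_q, sig_q):
    """Closed-form Amari alpha-divergence D_alpha(p || q) between 1-D
    Gaussians p = N(mu_p, sig_p^2) and q = N(mu_q, sig_q^2):
    D_alpha = (1 - I) / (alpha * (1 - alpha)), where
    I = integral of p^alpha * q^(1 - alpha)."""
    var = alpha * sig_q ** 2 + (1.0 - alpha) * sig_p ** 2
    I = (sig_p ** (1.0 - alpha) * sig_q ** alpha / math.sqrt(var)
         * math.exp(-alpha * (1.0 - alpha) * (mu_p - mu_q) ** 2
                    / (2.0 * var)))
    return (1.0 - I) / (alpha * (1.0 - alpha))

def kl_gauss(mu_p, sig_p, mu_q, sig_q):
    """KL(N(mu_p, sig_p^2) || N(mu_q, sig_q^2)) in closed form."""
    return (math.log(sig_q / sig_p)
            + (sig_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sig_q ** 2) - 0.5)
```

For example, amari_alpha_div(1e-4, 0, 1, 1, 2) is close to kl_gauss(1, 2, 0, 1), i.e. KL(q || p), while amari_alpha_div(1 - 1e-4, 0, 1, 1, 2) is close to kl_gauss(0, 1, 1, 2), i.e. KL(p || q); intermediate α interpolates between the two objectives.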