161 research outputs found
Large-scale Heteroscedastic Regression via Gaussian Process
Heteroscedastic regression, which accounts for noise that varies across
observations, has many applications in fields such as machine learning and
statistics. Here
we focus on the heteroscedastic Gaussian process (HGP) regression which
integrates the latent function and the noise function together in a unified
non-parametric Bayesian framework. Despite its remarkable performance, HGP
suffers from cubic time complexity, which severely limits its application
to big data. To improve the scalability, we first develop a variational sparse
inference algorithm, named VSHGP, to handle large-scale datasets. Furthermore,
two variants are developed to improve the scalability and capability of VSHGP.
The first is stochastic VSHGP (SVSHGP) which derives a factorized evidence
lower bound, thus enhancing efficient stochastic variational inference. The
second is distributed VSHGP (DVSHGP) which (i) follows the Bayesian committee
machine formalism to distribute computations over multiple local VSHGP experts
with many inducing points; and (ii) adopts hybrid parameters for experts to
guard against over-fitting and capture local variety. The superiority of DVSHGP
and SVSHGP as compared to existing scalable heteroscedastic/homoscedastic GPs
is then extensively verified on various datasets. Comment: 14 pages, 15 figures
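The abstract above mentions the Bayesian committee machine (BCM) formalism for combining local experts. The following is a minimal sketch of the basic BCM aggregation rule at a single test point, assuming a zero-mean prior and Gaussian expert predictions. The function name `bcm_combine` and its arguments are illustrative, and this is not necessarily the exact aggregation used by DVSHGP.

```python
def bcm_combine(means, variances, prior_variance):
    """Combine M Gaussian expert predictions at one test point (basic BCM).

    means, variances: per-expert predictive means and variances.
    prior_variance: prior variance at the test point (zero-mean prior assumed).
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one mean and one variance per expert")
    m = len(means)
    # Expert precisions add up; the (1 - M) term removes the prior,
    # which would otherwise be counted M - 1 times too many.
    precision = sum(1.0 / v for v in variances) + (1 - m) / prior_variance
    if precision <= 0.0:
        raise ValueError("combined precision must be positive")
    variance = 1.0 / precision
    # The combined mean is the precision-weighted sum of expert means.
    mean = variance * sum(mu / v for mu, v in zip(means, variances))
    return mean, variance


# Two hypothetical experts that agree on the mean and are each more
# confident than the prior.
mean, variance = bcm_combine([1.0, 1.0], [0.5, 0.5], prior_variance=1.0)
```

With a single expert the rule returns that expert's prediction unchanged, and an expert whose prediction equals the prior contributes nothing to the combination.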
Forecasting of commercial sales with large scale Gaussian Processes
This paper argues that applications of Gaussian processes in the fast-moving
consumer goods industry have not been discussed enough. Yet this technique can
be valuable: for example, it can provide automatic feature relevance
determination, and the posterior mean can unlock insights into the data.
Significant challenges are the large size and high dimensionality of commercial
data at a point of sale. The study reviews approaches to Gaussian process
modeling for large data sets, evaluates their performance on commercial sales
data, and shows the value of this type of model as a decision-making tool for
management. Comment: 10 pages, 5 figures
Understanding and Comparing Scalable Gaussian Process Regression for Big Data
As a non-parametric Bayesian model which produces informative predictive
distribution, Gaussian process (GP) has been widely used in various fields,
like regression, classification and optimization. The cubic complexity of
standard GP however leads to poor scalability, which poses challenges in the
era of big data. Hence, various scalable GPs have been developed in the
literature in order to improve the scalability while retaining desirable
prediction accuracy. This paper investigates the methodological
characteristics and performance of representative global and local scalable GPs
including sparse approximations and local aggregations from four main
perspectives: scalability, capability, controllability and robustness. The
numerical experiments on two toy examples and five real-world datasets with up
to 250K points offer the following findings. In terms of scalability, most of
the scalable GPs have a time complexity that is linear in the training size. In
terms of capability, the sparse approximations capture the long-term spatial
correlations, whereas the local aggregations capture the local patterns but
suffer from over-fitting in some scenarios. In terms of controllability, we could improve
the performance of sparse approximations by simply increasing the inducing
size. But this is not the case for local aggregations. In terms of robustness,
local aggregations are robust to various initializations of hyperparameters due
to the local attention mechanism. Finally, we highlight that the proper hybrid
of global and local scalable GPs may be a promising way to improve both the
model capability and scalability for big data. Comment: 25 pages, 15 figures, preprint submitted to KB
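To make the scalability claim in the abstract above concrete, here is a rough arithmetic sketch comparing the leading-order training cost of an exact GP, O(n^3), with that of a typical inducing-point sparse approximation, O(n m^2). The inducing size `m = 1_000` is an assumed value chosen for illustration, and constant factors are ignored.

```python
def exact_gp_cost(n):
    """Leading-order operation count for exact GP training: O(n^3)."""
    return n ** 3


def sparse_gp_cost(n, m):
    """Leading-order count for an inducing-point approximation: O(n * m^2)."""
    return n * m ** 2


n = 250_000  # largest training size mentioned in the abstract
m = 1_000    # assumed number of inducing points (illustrative only)

# The ratio simplifies to (n / m) ** 2 because the sparse cost is linear in n.
speedup = exact_gp_cost(n) / sparse_gp_cost(n, m)
print(f"exact: {exact_gp_cost(n):.3e}, sparse: {sparse_gp_cost(n, m):.3e}, "
      f"ratio: {speedup:.0f}")
```

Under these assumptions, doubling n doubles the sparse cost but multiplies the exact cost by eight, which is why the linear-in-n methods remain feasible at this scale.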
Streaming sparse Gaussian process approximations
Sparse pseudo-point approximations for Gaussian process (GP) models provide a
suite of methods that support deployment of GPs in the large data regime and
enable analytic intractabilities to be sidestepped. However, the field lacks a
principled method to handle streaming data in which both the posterior
distribution over function values and the hyperparameter estimates are updated
in an online fashion. The small number of existing approaches either use
suboptimal hand-crafted heuristics for hyperparameter learning, or suffer from
catastrophic forgetting or slow updating when new data arrive. This paper
develops a new principled framework for deploying Gaussian process
probabilistic models in the streaming setting, providing methods for learning
hyperparameters and optimising pseudo-input locations. The proposed framework
is assessed using synthetic and real-world datasets.
Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes
Deep Gaussian Process (DGP) models offer a powerful nonparametric approach
for Bayesian inference, but exact inference is typically intractable,
motivating the use of various approximations. However, existing approaches,
such as mean-field Gaussian assumptions, limit the expressiveness and efficacy
of DGP models, while stochastic approximation can be computationally expensive.
To tackle these challenges, we introduce Neural Operator Variational Inference
(NOVI) for Deep Gaussian Processes. NOVI uses a neural generator to obtain a
sampler and minimizes the Regularized Stein Discrepancy in L2 space between the
generated distribution and true posterior. We solve the minimax problem using
Monte Carlo estimation and subsampling stochastic optimization techniques. We
demonstrate that the bias introduced by our method can be controlled by
multiplying the Fisher divergence with a constant, which leads to robust error
control and ensures the stability and precision of the algorithm. Our
experiments on datasets ranging from hundreds to tens of thousands demonstrate
the effectiveness and the faster convergence rate of the proposed method. We
achieve a classification accuracy of 93.56 on the CIFAR10 dataset,
outperforming SOTA Gaussian process methods. Furthermore, our method guarantees
theoretically controlled prediction error for DGP models and demonstrates
remarkable performance on various datasets. We are optimistic that NOVI has the
potential to enhance the performance of deep Bayesian nonparametric models and
could have significant implications for various practical applications.