11 research outputs found
Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition
We propose a method (TT-GP) for approximate inference in Gaussian Process
(GP) models. We build on previous scalable GP research, including stochastic
variational inference based on inducing inputs, kernel interpolation, and
structure exploiting algebra. The key idea of our method is to use Tensor Train
decomposition for variational parameters, which allows us to train GPs with
billions of inducing inputs and achieve state-of-the-art results on several
benchmarks. Further, our approach allows for training kernels based on deep
neural networks without any modifications to the underlying GP model. A neural
network learns a multidimensional embedding for the data, which is used by the
GP to make the final prediction. We train GP and neural network parameters
end-to-end without pretraining, through maximization of the GP marginal likelihood.
We show the efficiency of the proposed approach on several regression and
classification benchmark datasets, including MNIST, CIFAR-10, and Airline.
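To make the Tensor Train idea concrete, here is a minimal, hedged sketch (illustrative shapes, ranks, and names, not the authors' TT-GP code): a large multi-way array of variational parameters is stored as a chain of small cores, and any single entry is recovered as a product of core slices.

```python
# Illustrative sketch only; shapes, rank, and names are assumptions, not TT-GP itself.
import numpy as np

rng = np.random.default_rng(0)
dims, rank = (4, 5, 6), 3                      # mode sizes n_k and a uniform TT-rank r

# TT cores G_k with shapes (r_{k-1}, n_k, r_k); boundary ranks are 1.
cores = [
    rng.standard_normal((1, dims[0], rank)),
    rng.standard_normal((rank, dims[1], rank)),
    rng.standard_normal((rank, dims[2], 1)),
]

def tt_entry(cores, index):
    """Evaluate T[i1, i2, i3] as a product of the selected core slices."""
    out = np.eye(1)
    for core, i in zip(cores, index):
        out = out @ core[:, i, :]              # (r_{k-1}, r_k) slice
    return out.item()

# Storage is sum_k n_k * r^2 numbers instead of prod_k n_k, which is what makes
# parameters for very large structured sets of inducing inputs tractable.
print(tt_entry(cores, (1, 2, 3)))
```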
Algorithmic Linearly Constrained Gaussian Processes
We algorithmically construct multi-output Gaussian process priors which
satisfy linear differential equations. Our approach attempts to parametrize all
solutions of the equations using Gröbner bases. If successful, the push-forward of a
Gaussian process along the parametrization is the desired prior. We consider
several examples from physics, geomathematics and control, among them the full
inhomogeneous system of Maxwell's equations. By bringing together stochastic
learning and computer algebra in a novel way, we combine noisy observations
with precise algebraic computations.
Comment: NIPS 201
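As a hedged illustration of the push-forward construction (a toy one-dimensional operator, not the paper's Gröbner-basis machinery): if every admissible function can be written as u = L v for a linear operator L and v is given a GP prior with kernel k, then u is a GP with kernel L_x L_{x'} k(x, x'). For L = d/dx and an RBF kernel this is available in closed form.

```python
# Toy push-forward sketch; the operator and kernel are illustrative assumptions.
import numpy as np

def rbf(x, xp, ell=1.0):
    return np.exp(-(x - xp) ** 2 / (2 * ell ** 2))

def pushforward_derivative_kernel(x, xp, ell=1.0):
    """Kernel of u = dv/dx when v ~ GP(0, rbf): d^2 k / (dx dx')."""
    d = x - xp
    return (1.0 / ell ** 2 - d ** 2 / ell ** 4) * rbf(x, xp, ell)

x = np.linspace(0.0, 1.0, 5)
K_u = pushforward_derivative_kernel(x[:, None], x[None, :])
print(np.all(np.linalg.eigvalsh(K_u) >= -1e-8))   # induced kernel is still PSD
```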
Scalable Multi-Task Gaussian Process Tensor Regression for Normative Modeling of Structured Variation in Neuroimaging Data
Most brain disorders are very heterogeneous in terms of their underlying
biology and developing analysis methods to model such heterogeneity is a major
challenge. A promising approach is to use probabilistic regression methods to
estimate normative models of brain function using (f)MRI data and then use these to
map variation across individuals in clinical populations (e.g., via anomaly
detection). To fully capture individual differences, it is crucial to
statistically model the patterns of correlation across different brain regions
and individuals. However, this is very challenging for neuroimaging data
because of its high dimensionality and highly structured patterns of correlation
across multiple axes. Here, we propose a general and flexible multi-task
learning framework to address this problem. Our model uses a tensor-variate
Gaussian process in a Bayesian mixed-effects model and makes use of Kronecker
algebra and a low-rank approximation to scale efficiently to multi-way
neuroimaging data at the whole brain level. On a publicly available clinical
fMRI dataset, we show that our computationally affordable approach
substantially improves detection sensitivity over both a mass-univariate
normative model and a classifier that, unlike our approach, has full access
to the clinical labels.
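A hedged sketch of the Kronecker algebra that makes such structured covariances affordable (illustrative sizes, not the paper's mixed-effects model): a matrix-vector product with K = A ⊗ B is computed without ever forming the full matrix, using the identity (A ⊗ B) vec(X) = vec(B X Aᵀ).

```python
# Kronecker MVM trick; A, B, and the sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 40
A = rng.standard_normal((m, m)); A = A @ A.T      # covariance across one axis
B = rng.standard_normal((n, n)); B = B @ B.T      # covariance across the other axis
x = rng.standard_normal(m * n)

# (A kron B) vec(X) = vec(B X A^T), with column-stacking vec.
X = x.reshape(n, m, order="F")
fast = (B @ X @ A.T).reshape(-1, order="F")       # O(nm(n + m)) work
slow = np.kron(A, B) @ x                          # O((nm)^2) work and memory
print(np.allclose(fast, slow))
```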
Adaptive Tensor Learning with Tensor Networks
Tensor decomposition techniques have shown great successes in machine
learning and data science by extending classical algorithms based on matrix
factorization to multi-modal and multi-way data. However, there exist many
tensor decomposition models (CP, Tucker, Tensor Train, etc.), and the rank of
such a decomposition is typically a collection of integers rather than a unique
number, making model and hyper-parameter selection a tedious and costly task.
At the same time, tensor network methods are powerful tools developed in the
physics community which have recently shown their potential for machine
learning applications and offer a unifying view of the various tensor
decomposition models. In this paper, we leverage the tensor network formalism
to develop a generic and efficient adaptive algorithm for tensor learning. Our
method is based on a simple greedy approach optimizing a differentiable loss
function starting from a rank one tensor and successively identifying the most
promising tensor network edges for small rank increments. Our algorithm can
adaptively identify tensor network structures with a small number of parameters
that effectively optimize the objective function from data. The framework we
introduce is very broad and encompasses many common tensor optimization
problems. Experiments on tensor decomposition and tensor completion tasks with
both synthetic and real world data demonstrate the effectiveness of the
proposed algorithm.
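A toy, hedged version of the greedy principle (a two-factor matrix model with a single rank to grow, not the paper's tensor-network edge selection): fit by gradient descent starting from rank one, and keep incrementing the rank only while doing so clearly reduces the loss.

```python
# Greedy rank adaptation on a toy matrix problem; all details are illustrative.
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((20, 4)) @ rng.standard_normal((4, 30))   # true rank 4

def fit(rank, steps=2000, lr=1e-3):
    U = 0.1 * rng.standard_normal((20, rank))
    V = 0.1 * rng.standard_normal((rank, 30))
    for _ in range(steps):
        R = U @ V - T                            # reconstruction residual
        U, V = U - lr * R @ V.T, V - lr * U.T @ R
    return np.mean((U @ V - T) ** 2)

loss, rank = fit(1), 1
while rank < 8:
    new_loss = fit(rank + 1)
    if new_loss > 0.9 * loss:                    # rank increase no longer pays off
        break
    loss, rank = new_loss, rank + 1
print("selected rank:", rank, "loss:", round(float(loss), 4))
```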
SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes
State-of-the-art methods for scalable Gaussian processes use iterative
algorithms, requiring fast matrix-vector multiplies (MVMs) with the covariance
kernel. The Structured Kernel Interpolation (SKI) framework accelerates these
MVMs by performing efficient MVMs on a grid and interpolating back to the
original space. In this work, we develop a connection between SKI and the
permutohedral lattice used for high-dimensional fast bilateral filtering. Using
a sparse simplicial grid instead of a dense rectangular one, we can perform GP
inference exponentially faster in the dimension than SKI. Our approach,
Simplex-GP, enables scaling SKI to high dimensions, while maintaining strong
predictive performance. We additionally provide a CUDA implementation of
Simplex-GP, which enables significant GPU acceleration of MVM-based inference.
Comment: International Conference on Machine Learning (ICML), 202
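For context, a hedged one-dimensional sketch of the SKI approximation the paper builds on (a rectangular grid with linear interpolation, not the permutohedral lattice itself): K ≈ W K_grid Wᵀ with interpolation weights W, so an MVM only ever touches the small grid kernel.

```python
# 1-D SKI-style MVM; grid size, lengthscale, and data are illustrative assumptions.
import numpy as np

def rbf(a, b, ell=0.3):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))              # training inputs
grid = np.linspace(0, 1, 50)                     # inducing grid
h = grid[1] - grid[0]

# Each input interpolates between its two neighbouring grid points.
W = np.zeros((x.size, grid.size))
left = np.clip(((x - grid[0]) // h).astype(int), 0, grid.size - 2)
frac = (x - grid[left]) / h
W[np.arange(x.size), left] = 1 - frac
W[np.arange(x.size), left + 1] = frac

v = rng.standard_normal(x.size)
ski_mvm = W @ (rbf(grid, grid) @ (W.T @ v))      # K ~= W K_grid W^T
exact_mvm = rbf(x, x) @ v
print(np.max(np.abs(ski_mvm - exact_mvm)))       # small interpolation error
```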
Hierarchical Inducing Point Gaussian Process for Inter-domain Observations
We examine the general problem of inter-domain Gaussian Processes (GPs):
problems where the GP realization and the noisy observations of that
realization lie on different domains. When the mapping between those domains is
linear, such as integration or differentiation, inference is still closed form.
However, many of the scaling and approximation techniques that our community
has developed do not apply to this setting. In this work, we introduce the
hierarchical inducing point GP (HIP-GP), a scalable inter-domain GP inference
method that enables us to improve the approximation accuracy by increasing the
number of inducing points to the millions. HIP-GP, which relies on inducing
points with grid structure and a stationary kernel assumption, is suitable for
low-dimensional problems. In developing HIP-GP, we introduce (1) a fast
whitening strategy, and (2) a novel preconditioner for conjugate gradients,
which can be helpful in general GP settings. Our code is available at
https://github.com/cunningham-lab/hipgp.
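A generic, hedged sketch of the conjugate-gradient pattern such methods rely on (with a plain diagonal preconditioner as a stand-in; HIP-GP's whitening and its actual preconditioner are not reproduced here):

```python
# Preconditioned CG for (K + diag(noise)) alpha = y; kernel, noise, and sizes are
# illustrative assumptions, not HIP-GP's inter-domain setting.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n = 500
x = np.sort(rng.uniform(0, 1, n))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2 ** 2))
noise = 0.05 + x                                        # heteroscedastic noise levels
A = K + np.diag(noise)
y = rng.standard_normal(n)

A_op = LinearOperator((n, n), matvec=lambda v: A @ v)   # CG only needs MVMs
d = np.diag(A)
M = LinearOperator((n, n), matvec=lambda v: v / d)      # Jacobi preconditioner

alpha, info = cg(A_op, y, M=M, maxiter=1000)
print(info, np.linalg.norm(A @ alpha - y))              # info == 0 means converged
```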
Alternating linear scheme in a Bayesian framework for low-rank tensor approximation
Multiway data often naturally occurs in a tensorial format which can be
approximately represented by a low-rank tensor decomposition. This is useful
because complexity can be significantly reduced and the treatment of
large-scale data sets can be facilitated. In this paper, we find a low-rank
representation for a given tensor by solving a Bayesian inference problem. This
is achieved by dividing the overall inference problem into sub-problems where
we sequentially infer the posterior distribution of one tensor decomposition
component at a time. This leads to a probabilistic interpretation of the
well-known iterative algorithm alternating linear scheme (ALS). In this way,
measurement noise can be taken into account, application-specific prior
knowledge can be incorporated, and the uncertainty of the low-rank tensor
estimate can be quantified. To compute the low-rank tensor estimate from the
posterior distributions of the tensor decomposition components, we present an
algorithm that performs the unscented transform in tensor train format.
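As a hedged two-way illustration of this probabilistic reading (a rank-R matrix instead of a tensor train, so only two alternating sub-problems): with Gaussian noise and a Gaussian prior on one factor, its conditional posterior is a Bayesian linear regression, and the posterior mean is a ridge-regularized ALS update.

```python
# Bayesian reading of ALS on a toy matrix factorization; all details are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, n, R = 40, 30, 3
Y = rng.standard_normal((m, R)) @ rng.standard_normal((R, n))
Y += 0.1 * rng.standard_normal((m, n))            # measurement noise

sigma2, tau2 = 0.1 ** 2, 1.0                      # noise variance, prior variance
U, V = rng.standard_normal((m, R)), rng.standard_normal((n, R))

for _ in range(20):                               # alternate the two sub-problems
    P = V.T @ V / sigma2 + np.eye(R) / tau2       # posterior precision for rows of U
    U = np.linalg.solve(P, V.T @ Y.T / sigma2).T  # posterior means (ridge updates)
    P = U.T @ U / sigma2 + np.eye(R) / tau2       # posterior precision for rows of V
    V = np.linalg.solve(P, U.T @ Y / sigma2).T

# Fit error and the per-row posterior variances of the last updated factor.
print(np.mean((U @ V.T - Y) ** 2), np.diag(np.linalg.inv(P)))
```

The abstract's algorithm applies the same reasoning core by core to a tensor train and then uses the unscented transform to turn the component posteriors into a low-rank estimate.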
Lower and Upper Bounds on the VC-Dimension of Tensor Network Models
Tensor network methods have been a key ingredient of advances in condensed
matter physics and have recently sparked interest in the machine learning
community for their ability to compactly represent very high-dimensional
objects. Tensor network methods can, for example, be used to efficiently learn
linear models in exponentially large feature spaces [Stoudenmire and Schwab,
2016]. In this work, we derive upper and lower bounds on the VC dimension and
pseudo-dimension of a large class of tensor network models for classification,
regression and completion. Our upper bounds hold for linear models
parameterized by arbitrary tensor network structures, and we derive lower
bounds for common tensor decomposition models (CP, Tensor Train, Tensor Ring
and Tucker) showing the tightness of our general upper bound. These results are
used to derive a generalization bound which can be applied to classification
with low rank matrices as well as linear classifiers based on any of the
commonly used tensor decomposition models. As a corollary of our results, we
obtain a bound on the VC dimension of the matrix product state classifier
introduced in [Stoudenmire and Schwab, 2016] as a function of the so-called
bond dimension (i.e., tensor train rank), which answers an open problem listed
by Cirac, Garre-Rubio and Pérez-García in [Cirac et al., 2019].
Functional Variational Bayesian Neural Networks
Variational Bayesian neural networks (BNNs) perform variational inference
over weights, but it is difficult to specify meaningful priors and approximate
posteriors in a high-dimensional weight space. We introduce functional
variational Bayesian neural networks (fBNNs), which maximize an Evidence Lower
BOund (ELBO) defined directly on stochastic processes, i.e. distributions over
functions. We prove that the KL divergence between stochastic processes equals
the supremum of marginal KL divergences over all finite sets of inputs. Based
on this, we introduce a practical training objective which approximates the
functional ELBO using finite measurement sets and the spectral Stein gradient
estimator. With fBNNs, we can specify priors entailing rich structures,
including Gaussian processes and implicit stochastic processes. Empirically, we
find fBNNs extrapolate well using various structured priors, provide reliable
uncertainty estimates, and scale to large datasets.
Comment: ICLR 201
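Written out, the identity the training objective rests on (exactly as stated above, with q and p the variational and prior stochastic processes) is

\[
  \mathrm{KL}(q \,\|\, p) \;=\; \sup_{n,\; x_{1:n}} \mathrm{KL}\big(q(f(x_1),\dots,f(x_n)) \,\|\, p(f(x_1),\dots,f(x_n))\big),
\]

and the functional ELBO replaces the supremum with finite measurement sets drawn during training.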
Kernel methods through the roof: handling billions of points efficiently
Kernel methods provide an elegant and principled approach to nonparametric
learning, but so far they could hardly be used in large-scale problems, since
naïve implementations scale poorly with data size. Recent advances have shown
the benefits of a number of algorithmic ideas, for example combining
optimization, numerical linear algebra and random projections. Here, we push
these efforts further to develop and test a solver that takes full advantage of
GPU hardware. Towards this end, we designed a preconditioned gradient solver
for kernel methods exploiting both GPU acceleration and parallelization with
multiple GPUs, implementing out-of-core variants of common linear algebra
operations to guarantee optimal hardware utilization. Further, we optimize the
numerical precision of different operations and maximize the efficiency of
matrix-vector multiplications. As a result, we can experimentally show dramatic
speedups on datasets with billions of points, while still guaranteeing
state-of-the-art performance. Additionally, we make our software available as an
easy-to-use library.
Comment: 33 pages, 7 figures, NeurIPS 202
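A hedged sketch of one ingredient mentioned above, a block-wise (out-of-core style) kernel matrix-vector product that never materializes the full n x n matrix in memory; the paper's multi-GPU solver and its library are not reproduced here.

```python
# Out-of-core-style kernel MVM: compute K v in row blocks; sizes are illustrative.
import numpy as np

def rbf_block(Xa, Xb, ell=1.0):
    d2 = np.sum(Xa**2, 1)[:, None] - 2 * Xa @ Xb.T + np.sum(Xb**2, 1)[None, :]
    return np.exp(-d2 / (2 * ell**2))

def kernel_mvm(X, v, block=1024, ell=1.0):
    """Compute K v one row block at a time, keeping memory at O(block * n)."""
    out = np.empty_like(v)
    for start in range(0, X.shape[0], block):
        stop = min(start + block, X.shape[0])
        out[start:stop] = rbf_block(X[start:stop], X, ell) @ v
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 8)).astype(np.float32)   # single precision on purpose
v = rng.standard_normal(5000).astype(np.float32)
print(kernel_mvm(X, v, block=512)[:3])
```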