A non-parametric conditional factor regression model for high-dimensional input and response
In this paper, we propose a non-parametric conditional factor regression
(NCFR) model for domains with high-dimensional input and response. NCFR enhances
linear regression in two ways: (a) it introduces low-dimensional latent factors,
yielding dimensionality reduction, and (b) it places an Indian Buffet Process
prior on the latent factors, allowing an unbounded number of sparse latent
dimensions. Experimental results comparing NCFR to several alternatives give
evidence of remarkable prediction performance.
Comment: 9 pages, 3 figures, NIPS submission
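The Indian Buffet Process is the ingredient that gives NCFR an open-ended number of sparse latent dimensions. Below is a minimal, illustrative numpy sketch of the standard constructive "restaurant" scheme for sampling the binary factor-assignment matrix from an IBP prior; it is not the authors' code, and the function name and parameters are ours.

```python
import numpy as np

def sample_ibp(n_rows, alpha, seed=None):
    """Draw a binary matrix Z from an Indian Buffet Process IBP(alpha).

    Rows are data points, columns are latent features; the number of
    columns is unbounded and grows with the data, which is what allows
    an open-ended set of sparse latent dimensions.
    """
    rng = np.random.default_rng(seed)
    n_init = rng.poisson(alpha)                # first customer's dishes
    Z = [np.ones(n_init, dtype=int)]
    counts = np.ones(n_init)                   # rows using each feature
    for i in range(2, n_rows + 1):
        # Reuse existing feature k with probability counts[k] / i.
        row = (rng.random(len(counts)) < counts / i).astype(int)
        counts += row
        # Introduce Poisson(alpha / i) brand-new features.
        n_new = rng.poisson(alpha / i)
        row = np.concatenate([row, np.ones(n_new, dtype=int)])
        counts = np.concatenate([counts, np.ones(n_new)])
        Z = [np.pad(z, (0, n_new)) for z in Z]  # zero-pad older rows
        Z.append(row)
    return np.vstack(Z)

Z = sample_ibp(n_rows=10, alpha=2.0, seed=0)
print(Z.shape, Z.sum(axis=0))  # matrix size and per-feature usage
```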
Gaussian processes with built-in dimensionality reduction: Applications in high-dimensional uncertainty propagation
The prohibitive cost of performing Uncertainty Quantification (UQ) tasks with
a very large number of input parameters can be addressed, if the response
exhibits some special structure that can be discovered and exploited. Several
physical responses exhibit a special structure known as an active subspace
(AS), a linear manifold of the stochastic space characterized by maximal
response variation. The idea is first to identify this low-dimensional
manifold, project the high-dimensional input onto it, and then link
the projection to the output. In this work, we develop a probabilistic version
of AS which is gradient-free and robust to observational noise. Our approach
relies on a novel Gaussian process regression with built-in dimensionality
reduction with the AS represented as an orthogonal projection matrix that
serves as yet another covariance function hyper-parameter to be estimated from
the data. To train the model, we design a two-step maximum likelihood
optimization procedure that ensures the orthogonality of the projection matrix
by exploiting recent results on the Stiefel manifold. The additional benefit of
our probabilistic formulation is that it allows us to select the dimensionality
of the AS via the Bayesian information criterion. We validate our approach by
showing that it can discover the right AS in synthetic examples without
gradient information using both noiseless and noisy observations. We
demonstrate that our method is able to discover the same AS as the classical
approach in a challenging one-hundred-dimensional problem involving an elliptic
stochastic partial differential equation with random conductivity. Finally, we
use our approach to study the effect of geometric and material uncertainties in
the propagation of solitary waves in a one-dimensional granular system.
Comment: 37 pages, 20 figures
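To make the construction concrete, here is a minimal numpy sketch, under our own naming, of the kernel structure the abstract describes: a standard RBF kernel evaluated on inputs projected through an orthogonal matrix W, which then acts as an additional covariance hyper-parameter. The paper's two-step Stiefel-manifold likelihood optimization is not reproduced; W is fixed here for illustration.

```python
import numpy as np

def as_rbf_kernel(X1, X2, W, lengthscale=1.0, variance=1.0):
    """RBF kernel on inputs projected into a candidate active subspace.

    W is a D x d matrix with orthonormal columns (a point on the Stiefel
    manifold) and is treated as one more kernel hyper-parameter.
    """
    Z1, Z2 = X1 @ W, X2 @ W                                 # D -> d dims
    sq = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(-1)   # pairwise dists
    return variance * np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(0)
D, d, n = 100, 1, 40
W, _ = np.linalg.qr(rng.standard_normal((D, d)))   # orthonormal columns
X = rng.standard_normal((n, D))
y = np.sin(X @ W).ravel() + 0.05 * rng.standard_normal(n)

# GP posterior mean at a few test points, conditioned on (X, y).
K = as_rbf_kernel(X, X, W) + 0.05**2 * np.eye(n)
X_test = rng.standard_normal((5, D))
mu = as_rbf_kernel(X_test, X, W) @ np.linalg.solve(K, y)
```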
Noise Contrastive Priors for Functional Uncertainty
Obtaining reliable uncertainty estimates of neural network predictions is a
long-standing challenge. Bayesian neural networks have been proposed as a
solution, but it remains open how to specify their prior. In particular, the
common practice of an independent normal prior in weight space imposes
relatively weak constraints on the function posterior, allowing it to
generalize in unforeseen ways on inputs outside of the training distribution.
We propose noise contrastive priors (NCPs) to obtain reliable uncertainty
estimates. The key idea is to train the model to output high uncertainty for
data points outside of the training distribution. NCPs do so using an input
prior, which adds noise to the inputs of the current mini-batch, and an output
prior, which is a wide distribution given these inputs. NCPs are compatible
with any model that can output uncertainty estimates, are easy to scale, and
yield reliable uncertainty estimates throughout training. Empirically, we show
that NCPs prevent overfitting outside of the training distribution and result
in uncertainty estimates that are useful for active learning. We demonstrate
the scalability of our method on the flight delays data set, where we
significantly improve upon previously published results.
Comment: 12 pages, 6 figures
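As a hedged illustration of the mechanism (not the paper's exact objective, whose output prior and KL direction differ in detail), the sketch below trains a small heteroscedastic regressor on the data likelihood plus an NCP-style term: noise is added to the mini-batch inputs, and the predictive distribution at those perturbed points is pulled toward a wide output prior. All names and hyper-parameters are illustrative.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

# Tiny heteroscedastic regressor: predicts a mean and a log-std per input.
net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 2))

def ncp_loss(x, y, input_noise=0.3, prior_std=1.0, beta=1.0):
    mu, log_std = net(x).chunk(2, dim=-1)
    nll = -Normal(mu, log_std.exp()).log_prob(y).mean()   # fit observed data

    # Input prior: noised-up mini-batch inputs stand in for OOD points.
    x_ood = x + input_noise * torch.randn_like(x)
    mu_o, log_std_o = net(x_ood).chunk(2, dim=-1)
    # Output prior: a wide distribution at those inputs; penalizing the KL
    # keeps predictive uncertainty high away from the training data.
    kl = kl_divergence(Normal(mu_o, log_std_o.exp()),
                       Normal(torch.zeros_like(mu_o), prior_std)).mean()
    return nll + beta * kl
```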
When Gaussian Process Meets Big Data: A Review of Scalable GPs
The vast quantity of information brought by big data, together with evolving
computer hardware, has driven many success stories in the machine learning
community. At the same time, it poses challenges for Gaussian process (GP)
regression, a well-known non-parametric and interpretable Bayesian model that
suffers from cubic complexity in the data size. To improve scalability while
retaining desirable prediction quality, a variety of scalable GPs have been
presented, but they have not yet been comprehensively reviewed and analyzed so
as to be well understood by both academia and industry. Given the explosion of
data size, such a review is timely and important. To this end, this paper
reviews state-of-the-art scalable GPs in two main categories: global
approximations, which distill the entire data set, and local approximations,
which divide the data for subspace learning. For global approximations, we
mainly focus on sparse approximations, comprising prior approximations, which
modify the prior but perform exact inference; posterior approximations, which
retain the exact prior but perform approximate inference; and structured
sparse approximations, which exploit specific structures in the kernel matrix.
For local approximations, we highlight mixture/product-of-experts models that
average over multiple local experts to boost predictions. To present a
complete review, recent advances for improving the scalability and capability
of scalable GPs are also covered. Finally, extensions and open issues
regarding the implementation of scalable GPs in various scenarios are
discussed to inspire novel ideas for future research.
Comment: 20 pages, 6 figures
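For a concrete taste of the "prior approximation" family discussed above, here is a minimal numpy sketch of the subset-of-regressors (Nystrom-style) predictive mean, where m inducing inputs reduce the exact O(n^3) solve to O(n m^2) algebra. The function names and toy data are ours; real implementations also optimize the inducing locations and hyper-parameters.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ls**2)

def sor_mean(X, y, X_test, Z, noise=0.1):
    """Subset-of-regressors GP mean with inducing inputs Z (m x D).

    Uses the Nystrom factorization K ~ Kxz Kzz^-1 Kzx, so only an
    m x m system is solved instead of an n x n one.
    """
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))     # jitter for stability
    Kxz, Ksz = rbf(X, Z), rbf(X_test, Z)
    A = noise**2 * Kzz + Kxz.T @ Kxz            # m x m instead of n x n
    return Ksz @ np.linalg.solve(A, Kxz.T @ y)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(500)
Z = np.linspace(-3, 3, 20)[:, None]             # 20 inducing inputs
mu = sor_mean(X, y, np.linspace(-3, 3, 50)[:, None], Z)
```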
Probabilistic Programming with Gaussian Process Memoization
Gaussian Processes (GPs) are widely used tools in statistics, machine
learning, robotics, computer vision, and scientific computation. However,
despite their popularity, they can be difficult to apply; all but the simplest
classification or regression applications require specification and inference
over complex covariance functions that do not admit simple analytical
posteriors. This paper shows how to embed Gaussian processes in any
higher-order probabilistic programming language, using an idiom based on
memoization, and demonstrates its utility by implementing and extending classic
and state-of-the-art GP applications. The interface to Gaussian processes,
called gpmem, takes an arbitrary real-valued computational process as input and
returns a statistical emulator that automatically improves as the original
process is invoked and its input-output behavior is recorded. The flexibility
of gpmem is illustrated via three applications: (i) robust GP regression with
hierarchical hyper-parameter learning, (ii) discovering symbolic expressions
from time-series data by fully Bayesian structure learning over kernels
generated by a stochastic grammar, and (iii) a bandit formulation of Bayesian
optimization with automatic inference and action selection. All applications
share a single 50-line Python library and require fewer than 20 lines of
probabilistic code each.
Comment: 36 pages, 9 figures
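The paper's gpmem lives inside a probabilistic programming language, with inference over hyper-parameters; as a hedged plain-Python analogue of just the memoization idiom, the sketch below wraps a real-valued function, records every invocation, and exposes a fixed-kernel GP emulator that improves as calls accumulate. Class and method names are ours.

```python
import numpy as np

class GPMemo:
    """Toy analogue of the gpmem idiom: wrap a 1-D real-valued process,
    memoize its input-output behavior, and emulate it with a GP.
    (No hyper-parameter inference; a fixed RBF kernel for illustration.)"""

    def __init__(self, f, noise=1e-3):
        self.f, self.noise = f, noise
        self.xs, self.ys = [], []

    def __call__(self, x):                    # invoke the real process
        y = self.f(x)
        self.xs.append(x)                     # record the observation
        self.ys.append(y)
        return y

    def emulate(self, x_query):               # GP posterior mean at x_query
        X, y = np.array(self.xs), np.array(self.ys)
        k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
        K = k(X, X) + self.noise * np.eye(len(y))
        return k(np.asarray(x_query, float), X) @ np.linalg.solve(K, y)

g = GPMemo(lambda x: np.sin(3 * x))
for x in [-2.0, -0.5, 0.1, 1.4]:
    g(x)                                      # each call enriches the emulator
print(g.emulate([0.0, 1.0]))
```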
Variational Inference for Uncertainty on the Inputs of Gaussian Process Models
The Gaussian process latent variable model (GP-LVM) provides a flexible
approach for non-linear dimensionality reduction that has been widely applied.
However, the current approach for training GP-LVMs is based on maximum
likelihood, where the latent projection variables are maximized over rather
than integrated out. In this paper we present a Bayesian method for training
GP-LVMs by introducing a non-standard variational inference framework that
allows us to approximately integrate out the latent variables and subsequently
train a GP-LVM by maximizing an analytic lower bound on the exact marginal
likelihood. We apply this method for learning a GP-LVM from iid observations
and for learning non-linear dynamical systems where the observations are
temporally correlated. We show that a benefit of the variational Bayesian
procedure is its robustness to overfitting and its ability to automatically
select the dimensionality of the nonlinear latent space. The resulting
framework is generic, flexible and easy to extend for other purposes, such as
Gaussian process regression with uncertain inputs and semi-supervised Gaussian
processes. We demonstrate our method on synthetic data and standard machine
learning benchmarks, as well as challenging real world datasets, including high
resolution video data.
Comment: 51 pages (of which 10 are Appendix), 19 figures
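The automatic dimensionality selection can be reproduced with off-the-shelf tooling. The snippet below assumes GPy's BayesianGPLVM and ARD RBF kernel (check the API of your installed version); the data are synthetic and the latent dimensionality is deliberately over-specified so the variational training can switch off the surplus dimensions.

```python
import numpy as np
import GPy  # assumes GPy's BayesianGPLVM API; verify against your version

# 3-D observations generated from a 1-D nonlinear curve.
rng = np.random.default_rng(0)
t = rng.uniform(-2, 2, (100, 1))
Y = np.hstack([np.sin(t), np.cos(t), t**2]) + 0.05 * rng.standard_normal((100, 3))

# Over-specified 5-D latent space with an ARD kernel: the Bayesian GP-LVM
# should drive the inverse lengthscales of unneeded dimensions toward zero.
kern = GPy.kern.RBF(input_dim=5, ARD=True)
m = GPy.models.BayesianGPLVM(Y, input_dim=5, num_inducing=20, kernel=kern)
m.optimize(messages=False)

print(1.0 / m.kern.lengthscale)  # small values = pruned latent dimensions
```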
Kernel Implicit Variational Inference
Recent progress in variational inference has paid much attention to the
flexibility of variational posteriors. One promising direction is to use
implicit distributions, i.e., distributions without tractable densities as the
variational posterior. However, existing methods on implicit posteriors still
face challenges of noisy estimation and computational infeasibility when
applied to models with high-dimensional latent variables. In this paper, we
present a new approach named Kernel Implicit Variational Inference that
addresses these challenges. To the best of our knowledge, this is the first
time implicit variational inference has been successfully applied to Bayesian
neural networks, showing promising results on both regression and
classification tasks.
Comment: Published as a conference paper at ICLR 2018
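For readers new to the setting, the sketch below shows only the general idiom of an implicit variational posterior (a sampler network whose density cannot be evaluated), which is what makes the usual ELBO entropy term intractable; KIVI's kernel-based remedy for that is not reproduced here, and all names are ours.

```python
import torch
import torch.nn as nn

class ImplicitPosterior(nn.Module):
    """Implicit q(z): push Gaussian noise through a network. Sampling is
    trivial, but log q(z) is unavailable, so the standard ELBO cannot be
    computed directly; this is the problem KIVI addresses."""

    def __init__(self, noise_dim=8, z_dim=2):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(),
                                 nn.Linear(64, z_dim))

    def sample(self, n):
        return self.net(torch.randn(n, self.noise_dim))

q = ImplicitPosterior()
z = q.sample(32)   # 32 posterior draws; their density is intractable
```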
The identification of complex spatiotemporal patterns using Coupled Map Lattice models
Many complex and interesting spatiotemporal patterns have been observed in a wide range of scientific areas. In this paper, two kinds of spatiotemporal patterns, including spot replication and Turing systems, are investigated, and new identification methods are proposed to obtain Coupled Map Lattice (CML) models for this class of systems. Initially, a new correlation analysis method is introduced to determine an appropriate temporal and spatial data sampling step procedure for the identification of spatiotemporal systems. A new combined Orthogonal Forward Regression and Bayesian Learning algorithm with Laplace priors is introduced to identify sparse and robust CML models for complex spatiotemporal patterns. The final identified CML models are validated using correlation-based model validation tests for spatiotemporal systems. Numerical results illustrate the identification procedure and demonstrate the validity of the identified models.
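For orientation, the model class being identified looks like the following minimal numpy sketch: a one-dimensional lattice where every site follows a local map, diffusively coupled to its neighbours. The logistic map and parameter values are illustrative choices, not the models identified in the paper.

```python
import numpy as np

def cml_step(x, eps=0.3, a=3.9):
    """One update of a 1-D coupled map lattice: each site applies the
    logistic map f(x) = a*x*(1-x) and mixes with its two neighbours."""
    f = a * x * (1.0 - x)
    return (1 - eps) * f + 0.5 * eps * (np.roll(f, 1) + np.roll(f, -1))

x = np.random.default_rng(0).random(128)   # random initial lattice state
frames = []
for _ in range(200):
    x = cml_step(x)
    frames.append(x.copy())
pattern = np.array(frames)                 # (time, space) spatiotemporal field
```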
Doubly Decomposing Nonparametric Tensor Regression
A nonparametric extension of tensor regression is proposed. Nonlinearity in a
high-dimensional tensor space is broken into simple local functions by
incorporating low-rank tensor decomposition. Compared to naive nonparametric
approaches, our formulation considerably improves the convergence rate of
estimation while maintaining consistency with the same function class under
specific conditions. To estimate local functions, we develop a Bayesian
estimator with a Gaussian process prior. Experimental results support its
theoretical properties and demonstrate high performance in predicting a summary
statistic of a real complex network.
Comment: 21 pages
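The decomposition doing the work in that formulation is a low-rank CP factorization. As a hedged building-block sketch (the paper's pairing with Gaussian process local functions is not reproduced), here is a standard alternating-least-squares CP decomposition of a 3-way tensor in numpy:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization of a tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(U, V):
    """Column-wise Kronecker product, shape (|U|*|V|, rank)."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def cp_als(T, rank, n_iter=50, seed=0):
    """Rank-R CP decomposition by alternating least squares:
    T[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r]."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in T.shape)
    for _ in range(n_iter):
        A = np.linalg.lstsq(khatri_rao(B, C), unfold(T, 0).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), unfold(T, 1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), unfold(T, 2).T, rcond=None)[0].T
    return A, B, C

# Recover an exactly rank-3 tensor to sanity-check the updates.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (8, 9, 10))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, rank=3)
print(np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)))  # near zero
```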
On the Consistency of Graph-based Bayesian Learning and the Scalability of Sampling Algorithms
A popular approach to semi-supervised learning proceeds by endowing the input
data with a graph structure in order to extract geometric information and
incorporate it into a Bayesian framework. We introduce new theory that gives
appropriate scalings of graph parameters that provably lead to a well-defined
limiting posterior as the size of the unlabeled data set grows. Furthermore, we
show that these consistency results have profound algorithmic implications.
When consistency holds, carefully designed graph-based Markov chain Monte Carlo
algorithms are proved to have a uniform spectral gap, independent of the number
of unlabeled inputs. Several numerical experiments corroborate both the
statistical consistency and the algorithmic scalability established by the
theory.
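The objects the theory scales are easy to write down. As an illustrative sketch (constants and exponents are placeholders, precisely the quantities the paper's scalings constrain), the code below builds a k-nearest-neighbour graph on the inputs, forms its Laplacian, and draws from the Gaussian prior N(0, (L + tau^2 I)^(-alpha)) commonly used in graph-based Bayesian semi-supervised learning:

```python
import numpy as np
from scipy.spatial import cKDTree

def graph_prior_sample(X, k=10, tau=0.1, alpha=2.0, seed=0):
    """Sample a latent label function u ~ N(0, (L + tau^2 I)^-alpha),
    where L is the unnormalized Laplacian of a symmetrized kNN graph."""
    rng = np.random.default_rng(seed)
    n = len(X)
    _, idx = cKDTree(X).query(X, k=k + 1)  # first neighbour is the point itself
    W = np.zeros((n, n))
    for i, nbrs in enumerate(idx):
        W[i, nbrs[1:]] = 1.0
    W = np.maximum(W, W.T)                 # symmetrize
    L = np.diag(W.sum(axis=1)) - W
    # Spectral sample: u = U diag(lam^(-alpha/2)) xi with xi ~ N(0, I).
    lam, U = np.linalg.eigh(L + tau**2 * np.eye(n))
    return U @ (lam ** (-alpha / 2) * rng.standard_normal(n))

X = np.random.default_rng(2).random((200, 2))
u = graph_prior_sample(X)   # a smooth function over the graph's geometry
```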