Bayesian Robust Tensor Factorization for Incomplete Multiway Data
We propose a generative model for robust tensor factorization in the presence
of both missing data and outliers. The objective is to explicitly infer the
underlying low-CP-rank tensor capturing the global information and a sparse
tensor capturing the local information (also regarded as outliers), thus
providing a robust predictive distribution over missing entries. The
low-CP-rank tensor is modeled by multilinear interactions between multiple
latent factors on which the column sparsity is enforced by a hierarchical
prior, while the sparse tensor is modeled by a hierarchical view of the
Student-t distribution that associates an individual hyperparameter with each element
independently. For model learning, we develop an efficient closed-form
variational inference algorithm under a fully Bayesian treatment, which
effectively prevents overfitting and scales linearly with the data size. In
contrast to existing related work, our method performs model selection
automatically and implicitly, with no parameters to tune. More specifically, it
can discover the ground-truth CP rank and automatically adapt the
sparsity-inducing priors to various types of outliers. In addition, the tradeoff between
the low-rank approximation and the sparse representation can be optimized in
the sense of maximizing the model evidence. Extensive experiments and
comparisons with many state-of-the-art algorithms on both synthetic and
real-world datasets demonstrate the superiority of our method from several perspectives.
Comment: in IEEE Transactions on Neural Networks and Learning Systems, 201
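To make the decomposition concrete, the following minimal sketch (Python/NumPy, with illustrative names and sizes) builds the generative structure the abstract describes: a low-CP-rank tensor from latent factor matrices, a sparse heavy-tailed outlier tensor, and a partially observed sum of the two. The variational inference the paper develops to invert this model is not shown.

```python
# Sketch of the generative structure: low-CP-rank tensor + sparse
# outliers, partially observed. All names and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 20, 20, 20, 3                  # tensor dimensions, true CP rank

# Latent factor matrices; the paper places a column-sparsity-inducing
# hierarchical prior on these, which is what prunes excess columns and
# recovers the CP rank automatically.
A = [rng.normal(size=(d, R)) for d in (I, J, K)]
low_rank = np.einsum('ir,jr,kr->ijk', *A)   # CP construction

# Sparse outlier tensor: heavy-tailed draws on a few entries, mimicking
# the per-element hierarchical Student-t view in the abstract.
outlier_mask = rng.random((I, J, K)) < 0.05
S = np.where(outlier_mask, 5.0 * rng.standard_t(df=2, size=(I, J, K)), 0.0)

X = low_rank + S + 0.1 * rng.normal(size=(I, J, K))   # complete data
observed = rng.random((I, J, K)) < 0.8                # 20% entries missing
X_obs = np.where(observed, X, np.nan)                 # what the model sees
```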
Projection predictive model selection for Gaussian processes
We propose a new method for simplification of Gaussian process (GP) models by
projecting the information contained in the full encompassing model and
selecting a reduced number of variables based on their predictive relevance.
Our results on synthetic and real-world datasets show that the proposed method
improves the assessment of variable relevance compared to automatic relevance
determination (ARD) via the length-scale parameters. We expect the method to be
useful for improving the explainability of models, reducing future measurement
costs, and reducing the computation time for making new predictions.
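As a rough illustration of the projection idea, one can fit the full encompassing GP once and then score each reduced input set by how closely a submodel refit to the full model's predictions reproduces them. The sketch below (scikit-learn, illustrative names and data) approximates the projection by refitting to the full model's predictive mean, which simplifies the paper's actual procedure.

```python
# Rough sketch of projection predictive input ranking for a GP.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=100)  # inputs 2, 3 irrelevant

# Full "encompassing" model with ARD (one length-scale per input)
full = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(4)),
                                alpha=1e-2, normalize_y=True).fit(X, y)
mu_full = full.predict(X)

# Score each single-input submodel by its projection discrepancy
# (smaller = the input carries more of the full model's information)
for j in range(4):
    sub = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2,
                                   normalize_y=True).fit(X[:, [j]], mu_full)
    disc = np.mean((sub.predict(X[:, [j]]) - mu_full) ** 2)
    print(f"input {j}: projection discrepancy {disc:.4f}")
```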
Automatic Differentiation Variational Inference
Probabilistic modeling is iterative. A scientist posits a simple model, fits
it to her data, refines it according to her analysis, and repeats. However,
fitting complex models to large data is a bottleneck in this process. Deriving
algorithms for new models can be both mathematically and computationally
challenging, which makes it difficult to efficiently cycle through the steps.
To this end, we develop automatic differentiation variational inference (ADVI).
Using our method, the scientist only provides a probabilistic model and a
dataset, nothing else. ADVI automatically derives an efficient variational
inference algorithm, freeing the scientist to refine and explore many models.
ADVI supports a broad class of models; no conjugacy assumptions are required. We
study ADVI across ten different models and apply it to a dataset with millions
of observations. ADVI is integrated into Stan, a probabilistic programming
system; it is available for immediate use.
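To show the mechanics ADVI automates, here is a hand-rolled and heavily simplified sketch on an assumed toy model: transform the constrained parameter to an unconstrained space (with the log-Jacobian correction), posit a mean-field Gaussian variational family, and follow stochastic reparameterization gradients of the ELBO. The model, priors, and step-size rule are illustrative choices, not the paper's exact algorithm.

```python
# Toy ADVI by hand on: y_i ~ Normal(mu, sigma),
#   mu ~ Normal(0, 10),  sigma ~ Exponential(1)   (illustrative model)
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.5, size=500)
n = y.size

def grad_log_joint(theta):
    """Gradient of log p(y, mu, zeta) where zeta = log(sigma);
    the +1.0 in g_zeta is the derivative of the log-Jacobian term."""
    mu, zeta = theta
    sigma = np.exp(zeta)
    g_mu = np.sum(y - mu) / sigma**2 - mu / 100.0
    g_zeta = -n + np.sum((y - mu)**2) / sigma**2 - sigma + 1.0
    return np.array([g_mu, g_zeta])

m, omega = np.zeros(2), np.full(2, -1.0)   # q = N(m, diag(exp(2*omega)))
hist, lr = np.zeros(4), 0.1                # AdaGrad-style step sizes
for _ in range(5000):
    eps = rng.standard_normal(2)
    theta = m + np.exp(omega) * eps        # reparameterization trick
    g = grad_log_joint(theta)
    grad = np.concatenate([g, g * eps * np.exp(omega) + 1.0])  # +1: entropy
    hist += grad**2
    step = lr * grad / (np.sqrt(hist) + 1e-8)
    m, omega = m + step[:2], omega + step[2:]

print("mu    ~", m[0])             # should approach the sample mean (~2.0)
print("sigma ~", np.exp(m[1]))     # should approach ~1.5
```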
Automatic Variational Inference in Stan
Variational inference is a scalable technique for approximate Bayesian
inference. Deriving variational inference algorithms requires tedious
model-specific calculations; this makes it difficult to automate. We propose an
automatic variational inference algorithm, automatic differentiation
variational inference (ADVI). The user only provides a Bayesian model and a
dataset; nothing else. We make no conjugacy assumptions and support a broad
class of models. The algorithm automatically determines an appropriate
variational family and optimizes the variational objective. We implement ADVI
in Stan (code available now), a probabilistic programming framework. We compare
ADVI to MCMC sampling across hierarchical generalized linear models,
nonconjugate matrix factorization, and a mixture model. We train the mixture
model on a quarter million images. With ADVI we can use variational inference
on any model we write in Stan.
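In practice one would call ADVI through a Stan interface rather than implement it by hand. A sketch using CmdStanPy's variational method on a toy Bernoulli model follows; the model, data, and the accessor used at the end are assumptions of this sketch, and it requires a working CmdStan installation.

```python
# Running Stan's ADVI from Python via CmdStanPy (toy Bernoulli model).
from pathlib import Path
from cmdstanpy import CmdStanModel

stan_code = """
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
"""
Path("bernoulli.stan").write_text(stan_code)

model = CmdStanModel(stan_file="bernoulli.stan")
fit = model.variational(                              # ADVI instead of MCMC
    data={"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]},
    algorithm="meanfield",                            # or "fullrank"
)
print(fit.variational_params_dict["theta"])           # variational estimate
```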
Deep Gaussian Processes
In this paper we introduce deep Gaussian process (GP) models. Deep GPs are a
deep belief network based on Gaussian process mappings. The data is modeled as
the output of a multivariate GP. The inputs to that Gaussian process are then
governed by another GP. A single layer model is equivalent to a standard GP or
the GP latent variable model (GP-LVM). We perform inference in the model by
approximate variational marginalization. This results in a strict lower bound
on the marginal likelihood of the model which we use for model selection
(number of layers and nodes per layer). Deep belief networks are typically
applied to relatively large data sets using stochastic gradient descent for
optimization. Our fully Bayesian treatment allows for the application of deep
models even when data is scarce. Model selection by our variational bound shows
that a five-layer hierarchy is justified even when modelling a digit data set
containing only 150 examples.
Comment: 9 pages, 8 figures. Appearing in Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS) 201
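The layered construction is easy to make concrete by sampling from a two-layer deep GP prior: draw a latent function from one GP at the observed inputs, then use its values as the inputs of a second GP. A minimal NumPy sketch (prior sampling only, with illustrative kernel choices; the paper's variational inference is not reproduced):

```python
# Prior sampling from a two-layer deep GP: the output of one GP
# becomes the input of the next.
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 200)

# Layer 1: h ~ GP(0, k) evaluated at the observed inputs x
K1 = rbf(x, x) + 1e-8 * np.eye(x.size)
h = rng.multivariate_normal(np.zeros(x.size), K1)

# Layer 2: f ~ GP(0, k) evaluated at the *latent* inputs h
K2 = rbf(h, h, lengthscale=0.5) + 1e-8 * np.eye(h.size)
f = rng.multivariate_normal(np.zeros(h.size), K2)
# f exhibits non-stationary structure a single GP layer cannot produce
```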