9,641 research outputs found
A Mutually-Dependent Hadamard Kernel for Modelling Latent Variable Couplings
We introduce a novel kernel that models input-dependent couplings across
multiple latent processes. The pairwise joint kernel measures covariance along
inputs and across different latent signals in a mutually-dependent fashion. A
latent correlation Gaussian process (LCGP) model combines these non-stationary
latent components into multiple outputs by an input-dependent mixing matrix.
Probit classification and support for multiple observation sets are derived by
Variational Bayesian inference. Results on several datasets indicate that the
LCGP model can recover the correlations between latent signals while
simultaneously achieving state-of-the-art performance. We highlight the latent
covariances with an EEG classification dataset where latent brain processes and
their couplings simultaneously emerge from the model.Comment: 17 pages, 6 figures; accepted to ACML 201
Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis
Factor analysis aims to determine latent factors, or traits, which summarize
a given data set. Inter-battery factor analysis extends this notion to multiple
views of the data. In this paper we show how a nonlinear, nonparametric version
of these models can be recovered through the Gaussian process latent variable
model. This gives us a flexible formalism for multi-view learning where the
latent variables can be used both for exploratory purposes and for learning
representations that enable efficient inference for ambiguous estimation tasks.
Learning is performed in a Bayesian manner through the formulation of a
variational compression scheme which gives a rigorous lower bound on the log
likelihood. Our Bayesian framework provides strong regularization during
training, allowing the structure of the latent space to be determined
efficiently and automatically. We demonstrate this by producing the first (to
our knowledge) published results of learning from dozens of views, even when
data is scarce. We further show experimental results on several different types
of multi-view data sets and for different kinds of tasks, including exploratory
data analysis, generation, ambiguity modelling through latent priors and
classification.Comment: 49 pages including appendi
A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms
The benefits of automating design cycles for Bayesian inference-based
algorithms are becoming increasingly recognized by the machine learning
community. As a result, interest in probabilistic programming frameworks has
much increased over the past few years. This paper explores a specific
probabilistic programming paradigm, namely message passing in Forney-style
factor graphs (FFGs), in the context of automated design of efficient Bayesian
signal processing algorithms. To this end, we developed "ForneyLab"
(https://github.com/biaslab/ForneyLab.jl) as a Julia toolbox for message
passing-based inference in FFGs. We show by example how ForneyLab enables
automatic derivation of Bayesian signal processing algorithms, including
algorithms for parameter estimation and model comparison. Crucially, due to the
modular makeup of the FFG framework, both the model specification and inference
methods are readily extensible in ForneyLab. In order to test this framework,
we compared variational message passing as implemented by ForneyLab with
automatic differentiation variational inference (ADVI) and Monte Carlo methods
as implemented by state-of-the-art tools "Edward" and "Stan". In terms of
performance, extensibility and stability issues, ForneyLab appears to enjoy an
edge relative to its competitors for automated inference in state-space models.Comment: Accepted for publication in the International Journal of Approximate
Reasonin
Bayesian Semi-supervised Learning with Graph Gaussian Processes
We propose a data-efficient Gaussian process-based Bayesian approach to the
semi-supervised learning problem on graphs. The proposed model shows extremely
competitive performance when compared to the state-of-the-art graph neural
networks on semi-supervised learning benchmark experiments, and outperforms the
neural networks in active learning experiments where labels are scarce.
Furthermore, the model does not require a validation data set for early
stopping to control over-fitting. Our model can be viewed as an instance of
empirical distribution regression weighted locally by network connectivity. We
further motivate the intuitive construction of the model with a Bayesian linear
model interpretation where the node features are filtered by an operator
related to the graph Laplacian. The method can be easily implemented by
adapting off-the-shelf scalable variational inference algorithms for Gaussian
processes.Comment: To appear in NIPS 2018 Fixed an error in Figure 2. The previous arxiv
version contains two identical sub-figure
Understanding and Comparing Scalable Gaussian Process Regression for Big Data
As a non-parametric Bayesian model which produces informative predictive
distribution, Gaussian process (GP) has been widely used in various fields,
like regression, classification and optimization. The cubic complexity of
standard GP however leads to poor scalability, which poses challenges in the
era of big data. Hence, various scalable GPs have been developed in the
literature in order to improve the scalability while retaining desirable
prediction accuracy. This paper devotes to investigating the methodological
characteristics and performance of representative global and local scalable GPs
including sparse approximations and local aggregations from four main
perspectives: scalability, capability, controllability and robustness. The
numerical experiments on two toy examples and five real-world datasets with up
to 250K points offer the following findings. In terms of scalability, most of
the scalable GPs own a time complexity that is linear to the training size. In
terms of capability, the sparse approximations capture the long-term spatial
correlations, the local aggregations capture the local patterns but suffer from
over-fitting in some scenarios. In terms of controllability, we could improve
the performance of sparse approximations by simply increasing the inducing
size. But this is not the case for local aggregations. In terms of robustness,
local aggregations are robust to various initializations of hyperparameters due
to the local attention mechanism. Finally, we highlight that the proper hybrid
of global and local scalable GPs may be a promising way to improve both the
model capability and scalability for big data.Comment: 25 pages, 15 figures, preprint submitted to KB
- …