Learning Deep Mixtures of Gaussian Process Experts Using Sum-Product Networks
While Gaussian processes (GPs) are the method of choice for regression tasks,
they also come with practical difficulties, as inference cost scales cubically in
time and quadratically in memory. In this paper, we introduce a natural and
expressive way to tackle these problems, by incorporating GPs in sum-product
networks (SPNs), a recently proposed tractable probabilistic model allowing
exact and efficient inference. In particular, by using GPs as leaves of an SPN
we obtain a novel flexible prior over functions, which implicitly represents an
exponentially large mixture of local GPs. Exact and efficient posterior
inference in this model can be done in a natural interplay of the inference
mechanisms in GPs and SPNs. Each GP is, similarly to a mixture-of-experts
approach, responsible only for a subset of data points, which
effectively reduces inference cost in a divide-and-conquer fashion. We show
that integrating GPs into the SPN framework leads to a promising probabilistic
regression model which (1) is computationally and memory efficient, (2) allows
efficient and exact posterior inference, (3) is flexible enough to mix
different kernel functions, and (4) naturally accounts for non-stationarities
in time series. In a variety of experiments, we show that the SPN-GP model can
learn input-dependent parameters and hyperparameters and is on par with or
outperforms traditional GPs as well as state-of-the-art approximations on
real-world data.
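The divide-and-conquer idea in this abstract can be illustrated with a minimal sketch. It is not the SPN-GP model itself: it assumes a fixed hard partition of a 1-D input range with one independent exact GP per cell, and the names `rbf_kernel`, `gp_posterior_mean`, `local_gp_predict` and `edges` are hypothetical. Each expert factorises only its own kernel matrix, so the cubic cost applies to the cell size rather than to the full dataset.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-2):
    """Exact GP posterior mean; the Cholesky factorisation is the cubic step."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return rbf_kernel(x_test, x_train) @ alpha

def local_gp_predict(x_train, y_train, x_test, edges):
    """Assumed hard partition at `edges`; one independent GP expert per cell."""
    train_cell = np.digitize(x_train, edges)
    test_cell = np.digitize(x_test, edges)
    pred = np.zeros_like(x_test, dtype=float)  # cells without data keep the prior mean 0
    for cell in np.unique(test_cell):
        tr = train_cell == cell
        te = test_cell == cell
        if tr.any():
            pred[te] = gp_posterior_mean(x_train[tr], y_train[tr], x_test[te])
    return pred

# Toy usage: four experts, each seeing roughly a quarter of the data.
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x)
x_test = np.linspace(0.5, 9.5, 50)
pred = local_gp_predict(x, y, x_test, edges=np.array([2.5, 5.0, 7.5]))
```

A single fixed partition like this produces discontinuities at cell boundaries; the abstract's model instead represents a mixture over many partitions.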
Conditional Sum-Product Networks: Imposing Structure on Deep Probabilistic Architectures
Probabilistic graphical models are a central tool in AI; however, they are
generally not as expressive as deep neural models, and inference is notoriously
hard and slow. In contrast, deep probabilistic models such as sum-product
networks (SPNs) capture joint distributions in a tractable fashion, but still
lack the expressive power of intractable models based on deep neural networks.
Therefore, we introduce conditional SPNs (CSPNs), conditional density
estimators for multivariate and potentially hybrid domains which allow
harnessing the expressive power of neural networks while still maintaining
tractability guarantees. One way to implement CSPNs is to use an existing SPN
structure and condition its parameters on the input, e.g., via a deep neural
network. This approach, however, might misrepresent the conditional
independence structure present in data. Consequently, we also develop a
structure-learning approach that derives both the structure and parameters of
CSPNs from data. Our experimental evidence demonstrates that CSPNs are
competitive with other probabilistic models and yield superior performance on
multilabel image classification compared to mean field and mixture density
networks. Furthermore, they can successfully be employed as building blocks for
structured probabilistic models, such as autoregressive image models.
Comment: 13 pages, 6 figures
Generative Image Modeling Using Spatial LSTMs
Modeling the distribution of natural images is challenging, partly because of
strong statistical dependencies which can extend over hundreds of pixels.
Recurrent neural networks have been successful in capturing long-range
dependencies in a number of problems but only recently have found their way
into generative image models. We here introduce a recurrent image model based
on multi-dimensional long short-term memory units which are particularly suited
for image modeling due to their spatial structure. Our model scales to images
of arbitrary size and its likelihood is computationally tractable. We find that
it outperforms the state of the art in quantitative comparisons on several
image datasets and produces promising results when used for texture synthesis
and inpainting.
Automatic Bayesian Density Analysis
Making sense of a dataset in an automatic and unsupervised fashion is a
challenging problem in statistics and AI. Classical approaches for exploratory
data analysis are usually not flexible enough to deal with the uncertainty
inherent to real-world data: they are often restricted to fixed latent
interaction models and homogeneous likelihoods; they are sensitive to missing,
corrupt and anomalous data; moreover, their expressiveness generally comes at
the price of intractable inference. As a result, supervision from statisticians
is usually needed to find the right model for the data. However, since domain
experts are not necessarily also experts in statistics, we propose Automatic
Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible
at large. Specifically, ABDA allows for automatic and efficient missing value
estimation, statistical data type and likelihood discovery, anomaly detection
and dependency structure mining, on top of providing accurate density
estimation. Extensive empirical evidence shows that ABDA is a suitable tool for
automatic exploratory analysis of mixed continuous and discrete tabular data.
Comment: In proceedings of the Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19)
Estimating Local Function Complexity via Mixture of Gaussian Processes
Real world data often exhibit inhomogeneity, e.g., the noise level, the
sampling distribution or the complexity of the target function may change over
the input space. In this paper, we try to isolate local function complexity in
a practical, robust way. This is achieved by first estimating the locally
optimal kernel bandwidth as a functional relationship. Specifically, we propose
Spatially Adaptive Bandwidth Estimation in Regression (SABER), which employs
a mixture of experts with multinomial kernel logistic regression as
the gate and Gaussian process regression models as the experts. Using the locally
optimal kernel bandwidths, we derive an estimate of the local function
complexity by drawing parallels to the theory of locally linear smoothing. We
demonstrate the usefulness of local function complexity for model
interpretation and active learning in quantum chemistry experiments and fluid
dynamics simulations.
Comment: 19 pages, 16 figures
Understanding and Comparing Scalable Gaussian Process Regression for Big Data
As a non-parametric Bayesian model that produces an informative predictive
distribution, the Gaussian process (GP) has been widely used in various tasks,
such as regression, classification and optimization. The cubic complexity of
the standard GP, however, leads to poor scalability, which poses challenges in the
era of big data. Hence, various scalable GPs have been developed in the
literature in order to improve the scalability while retaining desirable
prediction accuracy. This paper investigates the methodological
characteristics and performance of representative global and local scalable GPs
including sparse approximations and local aggregations from four main
perspectives: scalability, capability, controllability and robustness. The
numerical experiments on two toy examples and five real-world datasets with up
to 250K points offer the following findings. In terms of scalability, most of
the scalable GPs have a time complexity that is linear in the training size. In
terms of capability, the sparse approximations capture the long-term spatial
correlations, whereas the local aggregations capture local patterns but suffer from
over-fitting in some scenarios. In terms of controllability, the performance of
sparse approximations can be improved by simply increasing the inducing
size, but this is not the case for local aggregations. In terms of robustness,
local aggregations are robust to various initializations of hyperparameters due
to the local attention mechanism. Finally, we highlight that a proper hybrid
of global and local scalable GPs may be a promising way to improve both the
model capability and scalability for big data.
Comment: 25 pages, 15 figures, preprint submitted to KB
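The controllability claim for sparse approximations can be illustrated with a small sketch. It assumes the subset-of-regressors predictive mean with evenly spaced inducing inputs on a 1-D toy problem; this is one common sparse approximation and not necessarily among the methods the paper benchmarks. The names `sor_mean`, `num_inducing` and `jitter` are hypothetical. The linear solve involves only an m-by-m matrix, so the cost grows linearly with the training size for a fixed inducing size m.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def sor_mean(x_train, y_train, x_test, num_inducing, noise=1e-2, jitter=1e-8):
    """Subset-of-regressors predictive mean with evenly spaced inducing inputs."""
    z = np.linspace(x_train.min(), x_train.max(), num_inducing)
    K_zx = rbf_kernel(z, x_train)              # shape (m, n)
    K_zz = rbf_kernel(z, z)                    # shape (m, m)
    # Only an m x m system is solved; building it costs O(n m^2).
    A = K_zx @ K_zx.T + noise * K_zz + jitter * np.eye(num_inducing)
    weights = np.linalg.solve(A, K_zx @ y_train)
    return rbf_kernel(x_test, z) @ weights

# Toy usage: compare the maximum error for a small and a larger inducing size.
x = np.linspace(0.0, 10.0, 500)
y = np.sin(x)
x_test = np.linspace(0.0, 10.0, 101)
errors = {
    m: np.max(np.abs(sor_mean(x, y, x_test, m) - np.sin(x_test)))
    for m in (4, 16)
}
```

On this toy problem the larger inducing size should give the smaller error, which mirrors the abstract's observation about sparse approximations.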