213 research outputs found
Multiscale Dictionary Learning for Estimating Conditional Distributions
Nonparametric estimation of the conditional distribution of a response given
high-dimensional features is a challenging problem. It is important to allow
not only the mean but also the variance and shape of the response density to
change flexibly with features, which are massive-dimensional. We propose a
multiscale dictionary learning model, which expresses the conditional response
density as a convex combination of dictionary densities, with the densities
used and their weights dependent on the path through a tree decomposition of
the feature space. A fast graph partitioning algorithm is applied to obtain the
tree decomposition, with Bayesian methods then used to adaptively prune and
average over different sub-trees in a soft probabilistic manner. The algorithm
scales efficiently to approximately one million features. State-of-the-art
predictive performance is demonstrated on toy examples and on two neuroscience
applications involving up to a million features.
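The core representation described above, a conditional density expressed as a convex combination of shared dictionary densities whose weights depend on where the features fall in a tree partition, can be illustrated with a toy sketch. This is not the authors' actual model: the Gaussian dictionary, the depth-one split, and all numeric values are illustrative assumptions.

```python
import math

# Shared dictionary of Gaussian densities: (mean, std) pairs.
dictionary = [(-2.0, 1.0), (0.0, 0.5), (3.0, 1.5)]

def gaussian_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def leaf_weights(x):
    """Toy depth-1 tree partition of a scalar feature: each leaf of the
    partition carries its own convex weights over the shared dictionary."""
    if x < 0.0:
        return [0.7, 0.2, 0.1]
    return [0.1, 0.3, 0.6]

def conditional_density(y, x):
    """p(y | x) as a convex combination of dictionary densities, with the
    weights determined by the leaf that x falls into."""
    w = leaf_weights(x)
    return sum(wi * gaussian_pdf(y, mu, s) for wi, (mu, s) in zip(w, dictionary))
```

Because the weights are convex and each dictionary element is a proper density, the mixture is itself a proper density for every feature value, while its mean, variance, and shape can all change across leaves.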
Bayesian learning of joint distributions of objects
There is increasing interest in broad application areas in defining flexible
joint models for data having a variety of measurement scales, while also
allowing data of complex types, such as functions, images and documents. We
consider a general framework for nonparametric Bayes joint modeling through
mixture models that incorporate dependence across data types through a joint
mixing measure. The mixing measure is assigned a novel infinite tensor
factorization (ITF) prior that allows flexible dependence in cluster allocation
across data types. The ITF prior is formulated as a tensor product of
stick-breaking processes. Focusing on a convenient special case corresponding
to a Parafac factorization, we provide basic theory justifying the flexibility
of the proposed prior and resulting asymptotic properties. Focusing on ITF
mixtures of product kernels, we develop a new Gibbs sampling algorithm for
routine implementation relying on slice sampling. The methods are compared with
alternative joint mixture models based on Dirichlet processes and related
approaches through simulations and real data applications.
Comment: Appearing in Proceedings of the 16th International Conference on
Artificial Intelligence and Statistics (AISTATS) 2013, Scottsdale, AZ, USA
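The building block that the ITF prior composes across data types is the stick-breaking construction. A minimal truncated sketch is below; the truncation level and the Beta(1, alpha) breaks (the Dirichlet-process special case) are illustrative assumptions, not the paper's tensor-product formulation.

```python
import random

def stick_breaking(alpha, truncation, seed=0):
    """Return `truncation` mixture weights from a truncated stick-breaking
    construction: repeatedly break off a Beta(1, alpha) fraction of the
    remaining stick."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)   # fraction of the stick to break off
        weights.append(remaining * v)
        remaining *= (1.0 - v)
    weights.append(remaining)             # last weight absorbs the remainder
    return weights
```

The weights telescope to exactly one by construction, which is what lets a prior built from products of such sticks still define valid cluster-allocation probabilities.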
Bayesian factorizations of big sparse tensors
It has become routine to collect data that are structured as multiway arrays
(tensors). There is an enormous literature on low rank and sparse matrix
factorizations, but limited consideration of extensions to the tensor case in
statistics. The most common low rank tensor factorization relies on parallel
factor analysis (PARAFAC), which expresses a tensor as a sum of rank-one
tensors. When observations are only available for a tiny subset of the
cells of a big tensor, the low rank assumption is not sufficient and PARAFAC
has poor performance. We induce an additional layer of dimension reduction by
allowing the effective rank to vary across dimensions of the table. For
concreteness, we focus on a contingency table application. Taking a Bayesian
approach, we place priors on terms in the factorization and develop an
efficient Gibbs sampler for posterior computation. Theory is provided showing
posterior concentration rates in high-dimensional settings, and the methods are
shown to have excellent performance in simulations and several real data
applications.
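The PARAFAC representation referenced above writes each entry of a three-way tensor as a sum over rank components of products of factor-matrix entries. A small numeric sketch (toy shapes and random factors, not the paper's Bayesian prior):

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3            # tensor dimensions and PARAFAC rank
A = rng.random((I, R))             # mode-1 factor matrix
B = rng.random((J, R))             # mode-2 factor matrix
C = rng.random((K, R))             # mode-3 factor matrix

# PARAFAC reconstruction: T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r],
# i.e. a sum of R rank-one tensors.
T = np.einsum('ir,jr,kr->ijk', A, B, C)
```

The paper's extra layer of dimension reduction amounts to letting the effective number of components differ across modes, rather than fixing a single R for the whole table.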
A survey on Bayesian nonparametric learning
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. Bayesian (machine) learning has long played a significant role in machine learning due to its particular ability to embrace uncertainty, encode prior knowledge, and endow interpretability. On the back of Bayesian learning's great success, Bayesian nonparametric learning (BNL) has emerged as a force for further advances in this field due to its greater modelling flexibility and representation power. Instead of playing with the fixed-dimensional probabilistic distributions of Bayesian learning, BNL creates a new "game" with infinite-dimensional stochastic processes. BNL has long been recognised as a research subject in statistics, and, to date, several state-of-the-art pilot studies have demonstrated that BNL has a great deal of potential to solve real-world machine-learning tasks. However, despite these promising results, BNL has not created a huge wave in the machine-learning community. Esotericism may account for this. The books and surveys on BNL written by statisticians are overcomplicated and filled with tedious theories and proofs. Each is certainly meaningful but may scare away new researchers, especially those with computer science backgrounds. Hence, the aim of this article is to provide a plain-spoken, yet comprehensive, theoretical survey of BNL in terms that researchers in the machine-learning community can understand. It is hoped this survey will serve as a starting point for understanding and exploiting the benefits of BNL in our current scholarly endeavours. To achieve this goal, we have collated the extant studies in this field and aligned them with the steps of a standard BNL procedure: from selecting the appropriate stochastic processes, through their manipulation, to executing the model inference algorithms. At each step, past efforts have been thoroughly summarised and discussed.
In addition, we have reviewed the common methods for implementing BNL in various machine-learning tasks, along with its diverse applications in the real world, as examples to motivate future studies.
Application of Stochastic Processes in Nonparametric Bayes
This thesis presents theoretical studies of some stochastic processes and their applications in Bayesian nonparametric methods. The stochastic processes discussed in the thesis are mainly those with independent increments - the Levy processes. We develop new representations for the Levy measures of two representative examples of the Levy processes, the beta and gamma processes. These representations are manifested in terms of an infinite sum of well-behaved (proper) beta and gamma distributions, with truncation and posterior analyses provided. The decompositions provide new insights into the beta and gamma processes (and their generalizations), and we demonstrate how the proposed representation unifies some properties of the two, as these are of increasing importance in machine learning.
Next, a new Levy process is proposed for an uncountable collection of covariate-dependent feature-learning measures; the process is called the kernel beta process. Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample ("customer") and latent covariates learned for each feature ("dish"). The dependencies among the data are represented with the covariate-parameterized kernel function. The beta process is recovered as a limiting case of the kernel beta process. An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks.
Last is a non-Levy-process example: the multiplicative gamma process applied in the low-rank representation of tensors. The multiplicative gamma process is applied along the super-diagonal of tensors in the rank decomposition, and its shrinkage property nonparametrically learns the rank from the multiway data. This model is conjugate for the continuous multiway data case. For the non-conjugate binary multiway data, a Polya-Gamma auxiliary variable is sampled to elicit closed-form Gibbs sampling updates. This rank decomposition of tensors driven by the multiplicative gamma process yields state-of-the-art performance on various synthetic and benchmark real-world datasets, with desirable model scalability.
Dissertation
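The multiplicative-gamma shrinkage idea mentioned above can be sketched briefly: each component's prior precision is a cumulative product of gamma draws, so later components are increasingly shrunk toward zero and the effective rank is learned rather than fixed. Hyperparameter values below are illustrative assumptions, not the thesis's settings.

```python
import random

def mgp_scales(num_components, a1=2.0, a2=3.0, seed=0):
    """Prior scales 1 / tau_h from a multiplicative gamma process:
    tau_h = prod_{l<=h} delta_l with delta_1 ~ Gamma(a1, 1) and
    delta_l ~ Gamma(a2, 1) for l > 1. With a2 > 1 the precisions tau_h
    grow stochastically, shrinking later components."""
    rng = random.Random(seed)
    deltas = [rng.gammavariate(a1, 1.0)] + [
        rng.gammavariate(a2, 1.0) for _ in range(num_components - 1)
    ]
    scales, tau = [], 1.0
    for d in deltas:
        tau *= d                  # cumulative product = precision
        scales.append(1.0 / tau)  # prior scale of component h
    return scales
```

Placing these scales along the super-diagonal of the tensor decomposition lets the data decide how many components carry non-negligible weight.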
Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data
We present a Bayesian non-negative tensor factorization model for
count-valued tensor data, and develop scalable inference algorithms (both batch
and online) for dealing with massive tensors. Our generative model can handle
overdispersed counts as well as infer the rank of the decomposition. Moreover,
leveraging a reparameterization of the Poisson distribution as a multinomial
facilitates conjugacy in the model and enables simple and efficient Gibbs
sampling and variational Bayes (VB) inference updates, with a computational
cost that only depends on the number of nonzeros in the tensor. The model also
provides nice interpretability for the factors: in our model, each factor
corresponds to a "topic". We develop a set of online inference algorithms that
allow further scaling up the model to massive tensors, for which batch
inference methods may be infeasible. We apply our framework on diverse
real-world applications, such as \emph{multiway} topic modeling on a scientific
publications database, analyzing a political science data set, and analyzing a
massive household transactions data set.
Comment: ECML PKDD 201
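The Poisson-multinomial reparameterization the abstract leverages can be sketched in a few lines: a count with rate equal to a sum of per-component rates can, conditional on its total, be split into per-component latent counts that are multinomial, which is what yields the conjugate updates touching only nonzero cells. The rates below are toy values, not fitted factors.

```python
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([0.5, 1.5, 3.0])   # per-component Poisson rates
y = rng.poisson(rates.sum())        # observed total count for one tensor cell

# Conditional on the total y, the per-component latent counts are
# multinomial with probabilities proportional to the rates; cells with
# y = 0 contribute no latent counts and hence no computation.
latent = rng.multinomial(y, rates / rates.sum())
```

Equivalently, drawing independent Poisson counts with the individual rates and summing them gives the same joint law, which is why the augmentation is exact rather than approximate.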