213 research outputs found

    Multiscale Dictionary Learning for Estimating Conditional Distributions

    Nonparametric estimation of the conditional distribution of a response given high-dimensional features is a challenging problem. It is important to allow not only the mean but also the variance and shape of the response density to change flexibly with features, which are massive-dimensional. We propose a multiscale dictionary learning model, which expresses the conditional response density as a convex combination of dictionary densities, with the densities used and their weights dependent on the path through a tree decomposition of the feature space. A fast graph partitioning algorithm is applied to obtain the tree decomposition, with Bayesian methods then used to adaptively prune and average over different sub-trees in a soft probabilistic manner. The algorithm scales efficiently to approximately one million features. State-of-the-art predictive performance is demonstrated for toy examples and two neuroscience applications including up to a million features.
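
    A minimal sketch of the core idea, under illustrative assumptions rather than the paper's actual construction: the conditional density f(y | x) is a convex combination of dictionary densities attached to the nodes on the path that x follows through a tree partition of feature space. The toy two-level tree, the Gaussian dictionary densities, and the fixed path weights below are all placeholders (the paper builds the tree with a fast graph-partitioning algorithm and averages over sub-trees in a Bayesian way).

    # Toy illustration: f(y | x) = sum over nodes on the path of weight * dictionary density.
    # Tree structure, Gaussian dictionary densities, and path weights are illustrative.
    import numpy as np
    from scipy.stats import norm

    node_params = {                      # (mean, sd) of the dictionary density at each node
        "root": (0.0, 3.0),
        "L": (-2.0, 1.0), "R": (2.0, 1.0),
        "LL": (-3.0, 0.5), "LR": (-1.0, 0.5), "RL": (1.0, 0.5), "RR": (3.0, 0.5),
    }

    def path_through_tree(x):
        """Toy partition of feature space: split on the signs of the first two coordinates."""
        first = "L" if x[0] < 0 else "R"
        return ["root", first, first + ("L" if x[1] < 0 else "R")]

    def conditional_density(y, x, weights=(0.2, 0.3, 0.5)):
        """Convex combination of the dictionary densities along the path of x."""
        return sum(w * norm.pdf(y, *node_params[n])
                   for w, n in zip(weights, path_through_tree(x)))

    x = np.array([-0.7, 1.4])
    print([round(conditional_density(y, x), 4) for y in np.linspace(-5, 5, 5)])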

    Bayesian learning of joint distributions of objects

    There is increasing interest in broad application areas in defining flexible joint models for data having a variety of measurement scales, while also allowing data of complex types, such as functions, images and documents. We consider a general framework for nonparametric Bayes joint modeling through mixture models that incorporate dependence across data types through a joint mixing measure. The mixing measure is assigned a novel infinite tensor factorization (ITF) prior that allows flexible dependence in cluster allocation across data types. The ITF prior is formulated as a tensor product of stick-breaking processes. Focusing on a convenient special case corresponding to a Parafac factorization, we provide basic theory justifying the flexibility of the proposed prior and resulting asymptotic properties. Focusing on ITF mixtures of product kernels, we develop a new Gibbs sampling algorithm for routine implementation relying on slice sampling. The methods are compared with alternative joint mixture models based on Dirichlet processes and related approaches through simulations and real data applications. Comment: Appearing in Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS) 2013, Scottsdale, AZ, US
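
    The following is a minimal sketch, under illustrative assumptions, of the two ingredients named in the abstract: truncated stick-breaking weights for each data type, and a Parafac-style joint cluster-allocation tensor formed as a convex combination of rank-one outer products. The truncation level, concentration parameters, and number of rank-one terms are placeholders, not the paper's settings.

    # Truncated stick-breaking draws and a Parafac-style joint cluster-allocation tensor.
    import numpy as np

    rng = np.random.default_rng(0)

    def stick_breaking(alpha, K):
        """Truncated stick-breaking: v_k ~ Beta(1, alpha), pi_k = v_k * prod_{l<k} (1 - v_l)."""
        v = rng.beta(1.0, alpha, size=K)
        pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
        return pi / pi.sum()             # renormalise the truncated weights

    K, R = 10, 3                         # clusters per data type, number of rank-one terms
    lam = stick_breaking(1.0, R)         # weights over the rank-one components
    modes = [np.stack([stick_breaking(1.0, K) for _ in range(R)]) for _ in range(2)]

    # Joint allocation probabilities over (cluster for data type 1, cluster for data type 2).
    joint = sum(lam[r] * np.outer(modes[0][r], modes[1][r]) for r in range(R))
    print(joint.shape, round(joint.sum(), 6))   # (10, 10), sums to 1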

    Bayesian factorizations of big sparse tensors

    It has become routine to collect data that are structured as multiway arrays (tensors). There is an enormous literature on low rank and sparse matrix factorizations, but limited consideration of extensions to the tensor case in statistics. The most common low rank tensor factorization relies on parallel factor analysis (PARAFAC), which expresses a rank-k tensor as a sum of rank-one tensors. When observations are only available for a tiny subset of the cells of a big tensor, the low rank assumption is not sufficient and PARAFAC has poor performance. We induce an additional layer of dimension reduction by allowing the effective rank to vary across dimensions of the table. For concreteness, we focus on a contingency table application. Taking a Bayesian approach, we place priors on terms in the factorization and develop an efficient Gibbs sampler for posterior computation. Theory is provided showing posterior concentration rates in high-dimensional settings, and the methods are shown to have excellent performance in simulations and several real data applications.
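
    A minimal sketch of the PARAFAC expansion for a small contingency table, with illustrative dimensions, rank, and Dirichlet draws standing in for the paper's priors: each cell probability is a sum over latent classes of a class weight times per-dimension category probabilities, i.e. a sum of rank-one tensors.

    # p[c1, c2, c3] = sum_h lam[h] * prod_j factors[j][h, c_j]   (rank-k sum of rank-one tensors)
    import numpy as np

    rng = np.random.default_rng(1)
    dims, rank = (4, 3, 5), 2                              # categories per dimension, rank k

    lam = rng.dirichlet(np.ones(rank))                     # weights over the rank-one terms
    factors = [rng.dirichlet(np.ones(d), size=rank) for d in dims]

    prob = np.zeros(dims)
    for h in range(rank):
        prob += lam[h] * np.einsum("i,j,k->ijk", *(f[h] for f in factors))
    print(prob.shape, round(prob.sum(), 6))                # (4, 3, 5), sums to 1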

    A survey on Bayesian nonparametric learning

    Bayesian (machine) learning has been playing a significant role in machine learning for a long time due to its particular ability to embrace uncertainty, encode prior knowledge, and endow interpretability. On the back of Bayesian learning's great success, Bayesian nonparametric learning (BNL) has emerged as a force for further advances in this field due to its greater modelling flexibility and representation power. Instead of playing with the fixed-dimensional probabilistic distributions of Bayesian learning, BNL creates a new “game” with infinite-dimensional stochastic processes. BNL has long been recognised as a research subject in statistics, and, to date, several state-of-the-art pilot studies have demonstrated that BNL has a great deal of potential to solve real-world machine-learning tasks. However, despite these promising results, BNL has not created a huge wave in the machine-learning community. Esotericism may account for this. The books and surveys on BNL written by statisticians are overcomplicated and filled with tedious theories and proofs. Each is certainly meaningful but may scare away new researchers, especially those with computer science backgrounds. Hence, the aim of this article is to provide a plain-spoken, yet comprehensive, theoretical survey of BNL in terms that researchers in the machine-learning community can understand. It is hoped this survey will serve as a starting point for understanding and exploiting the benefits of BNL in our current scholarly endeavours. To achieve this goal, we have collated the extant studies in this field and aligned them with the steps of a standard BNL procedure: from selecting the appropriate stochastic processes, through manipulation, to executing the model inference algorithms. At each step, past efforts have been thoroughly summarised and discussed. In addition, we have reviewed the common methods for implementing BNL in various machine-learning tasks along with its diverse applications in the real world as examples to motivate future studies.

    Application of Stochastic Processes in Nonparametric Bayes

    This thesis presents theoretical studies of some stochastic processes and their applications in Bayesian nonparametric methods. The stochastic processes discussed in the thesis are mainly the ones with independent increments, the Levy processes. We develop new representations for the Levy measures of two representative examples of Levy processes, the beta and gamma processes. These representations are manifested in terms of an infinite sum of well-behaved (proper) beta and gamma distributions, with the truncation and posterior analyses provided. The decompositions provide new insights into the beta and gamma processes (and their generalizations), and we demonstrate how the proposed representation unifies some properties of the two, as these are of increasing importance in machine learning.

    Next, a new Levy process is proposed for an uncountable collection of covariate-dependent feature-learning measures; the process is called the kernel beta process. Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample ("customer") and latent covariates learned for each feature ("dish"). The dependencies among the data are represented with the covariate-parameterized kernel function. The beta process is recovered as a limiting case of the kernel beta process. An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks.

    Last is a non-Levy-process example: the multiplicative gamma process applied to the low-rank representation of tensors. The multiplicative gamma process is applied along the super-diagonal of tensors in the rank decomposition, and its shrinkage property nonparametrically learns the rank from the multiway data. This model is conjugate for continuous multiway data; for non-conjugate binary multiway data, Polya-Gamma auxiliary variables are sampled to obtain closed-form Gibbs sampling updates. This rank decomposition of tensors driven by the multiplicative gamma process yields state-of-the-art performance on various synthetic and benchmark real-world datasets, with desirable model scalability.
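
    As a small illustration of the last paragraph, here is a sketch of multiplicative-gamma-process-style shrinkage on the super-diagonal weights of a rank decomposition. The truncation level and shape parameters a1, a2 are illustrative, and drawing the weights from a zero-mean normal with precision tau_h is just one simple way to show the shrinkage effect, not the thesis's exact model.

    # Multiplicative gamma process sketch: precisions tau_h are cumulative products of
    # gamma draws, so higher-index super-diagonal weights are shrunk toward zero and the
    # effective rank can be learned from the data. Hyperparameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(2)
    H, a1, a2 = 10, 2.0, 3.0                       # truncation level, gamma shape parameters

    delta = np.concatenate([rng.gamma(a1, 1.0, size=1), rng.gamma(a2, 1.0, size=H - 1)])
    tau = np.cumprod(delta)                        # multiplicatively increasing precisions
    lam = rng.normal(0.0, 1.0 / np.sqrt(tau))      # super-diagonal weights

    print(np.round(tau, 2))
    print(np.round(np.abs(lam), 3))                # later components are pulled toward zero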

    Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data

    We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conjugacy in the model and enables simple and efficient Gibbs sampling and variational Bayes (VB) inference updates, with a computational cost that only depends on the number of nonzeros in the tensor. The model also provides nice interpretability for the factors; in our model, each factor corresponds to a "topic". We develop a set of online inference algorithms that allow further scaling up the model to massive tensors, for which batch inference methods may be infeasible. We apply our framework on diverse real-world applications, such as multiway topic modeling on a scientific publications database, analyzing a political science data set, and analyzing a massive household transactions data set. Comment: ECML PKDD 201
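
    A minimal sketch of the Poisson-to-multinomial augmentation the abstract leans on, with illustrative rates: an observed count whose rate is a sum of per-component rates is split across components by a multinomial draw, and only the nonzero cells of the tensor need this step, which is why the cost scales with the number of nonzeros.

    # If y ~ Poisson(sum_k rate_k), the latent split of y across the K components is
    # Multinomial(y, rate / sum(rate)). Rates and the observed count are illustrative.
    import numpy as np

    rng = np.random.default_rng(3)

    def allocate_count(y, rates):
        """Sample latent per-component counts for one observed cell count y."""
        return rng.multinomial(y, rates / rates.sum())

    rates = np.array([0.5, 2.0, 0.1])              # per-component rates for one tensor cell
    y = rng.poisson(rates.sum())                   # observed count for that cell
    print(y, allocate_count(y, rates))             # latent counts sum back to y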