87 research outputs found

    Inconsistency of Pitman-Yor process mixtures for the number of components

    In many applications, a finite mixture is a natural model, but it can be difficult to choose an appropriate number of components. To circumvent this choice, investigators are increasingly turning to Dirichlet process mixtures (DPMs), and Pitman-Yor process mixtures (PYMs), more generally. While these models may be well-suited for Bayesian density estimation, many investigators are using them for inferences about the number of components, by considering the posterior on the number of components represented in the observed data. We show that this posterior is not consistent: that is, on data from a finite mixture, it does not concentrate at the true number of components. This result applies to a large class of nonparametric mixtures, including DPMs and PYMs, over a wide variety of families of component distributions, including essentially all discrete families, as well as continuous exponential families satisfying mild regularity conditions (such as multivariate Gaussians).
    Comment: This is a general treatment of the problem discussed in our related article, "A simple example of Dirichlet process mixture inconsistency for the number of components", Miller and Harrison (2013), arXiv:1301.270
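
    As a rough illustration of the quantity at issue, the number of components represented in the observed data, the sketch below simulates the Chinese restaurant process prior underlying a DPM and counts the occupied tables as the sample size grows. The concentration value and sample sizes are illustrative assumptions, and this only shows the prior's tendency to keep opening new components; it is not the posterior result proved in the paper.

```python
import numpy as np

def crp_num_clusters(n, alpha, rng):
    """Draw one Chinese restaurant process seating of n customers and
    return the number of occupied tables (components represented)."""
    counts = []                          # customers per table
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):         # a new table is opened
            counts.append(1)
        else:
            counts[table] += 1
    return len(counts)

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    draws = [crp_num_clusters(n, alpha=1.0, rng=rng) for _ in range(50)]
    print(n, np.mean(draws))             # grows roughly like alpha * log(n)
```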

    Advances in Bayesian asymptotics and Bayesian nonparametrics

    Bayesian statistics is a powerful approach to learning about real-world phenomena, its strength lying in its ability to quantify uncertainty explicitly by treating unknown quantities of interest as random variables. In this thesis, we consider questions regarding three quite different aspects of Bayesian learning.

    Firstly, we consider approximate Bayesian computation (ABC), a computational method suitable for computing approximate posterior distributions for highly complex models, where the likelihood function is intractable but can be simulated from. Previous authors have proved consistency and provided rates of convergence in the case where all summary statistics converge at the same rate as each other. We generalize to the case where summary statistics may converge at different rates, and provide an explicit representation of the shape of the ABC posterior distribution in our general setting. We also show, under our general setting, that local linear post-processing can lead to significantly faster contraction rates of the pseudo-posterior.

    We then focus on the application of Bayesian statistics to natural language processing. The class of context-free grammars, which is standard in the modelling of natural language, has been shown to be too restrictive to fully describe all features of natural language. We propose a Bayesian nonparametric model for the class of 2-multiple context-free grammars, which generalise context-free grammars. Our model is inspired by previously proposed Bayesian models for context-tree grammars and is based on the hierarchical Dirichlet process. We develop a sequential Monte Carlo algorithm to perform inference under this model and carry out simulation studies to assess our method.

    Finally, we consider some consistency issues related to Bayesian nonparametric (BNP) mixture models. It has been shown that these models are inconsistent for the number of clusters. In the case of Dirichlet process (DP) mixture models, this problem can be mitigated when a prior is put on the model's concentration hyperparameter α, as is common practice. We prove that Pitman-Yor process (PYP) mixture models (which generalise DP mixture models) remain inconsistent for the number of clusters when a prior is put on α, in the special case where the true number of components in the data-generating mechanism is equal to 1 and the discount parameter σ is a fixed constant. When considering the space of partitions induced by BNP mixture models, point estimators such as the maximum a posteriori (MAP) partition are commonly used to summarise the posterior clustering structure, which on its own can be complex and difficult to interpret. We prove consistency of the MAP partition for DP mixture models when the concentration parameter α goes deterministically to zero and the true partition consists of only one cluster.
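
    The rejection form of ABC discussed in the first part of this abstract can be sketched in a few lines. The toy Gaussian model, prior, summary statistic, and tolerance below are illustrative assumptions for the sketch, not the estimator analysed in the thesis.

```python
import numpy as np

def rejection_abc(y_obs, prior_sample, simulate, summary, eps, n_draws, rng):
    """Basic rejection ABC: keep prior draws whose simulated summary
    statistics fall within eps of the observed summary."""
    s_obs = summary(y_obs)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        y_sim = simulate(theta, len(y_obs), rng)
        if np.linalg.norm(summary(y_sim) - s_obs) <= eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer the mean of a Gaussian with known unit variance.
rng = np.random.default_rng(1)
y_obs = rng.normal(2.0, 1.0, size=200)
post = rejection_abc(
    y_obs,
    prior_sample=lambda r: r.normal(0.0, 5.0),            # N(0, 5^2) prior
    simulate=lambda th, n, r: r.normal(th, 1.0, size=n),
    summary=lambda y: np.array([y.mean()]),               # single summary statistic
    eps=0.05, n_draws=20000, rng=rng)
print(len(post), post.mean())    # approximate posterior draws for the mean
```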

    Bayesian Cluster Analysis


    Quasi-Bernoulli Stick-breaking: Infinite Mixture with Cluster Consistency

    In mixture modeling and clustering applications, the number of components is often not known. The stick-breaking model is an appealing construction that assumes infinitely many components, while shrinking most of the redundant weights to near zero. However, it has been discovered that such shrinkage is unsatisfactory: even when the component distribution is correctly specified, small and spurious weights will appear and give an inconsistent estimate of the cluster number. In this article, we propose a simple solution that gains stronger control over the redundant weights: when breaking each stick into two pieces, we adjust the length of the second piece by multiplying it by a quasi-Bernoulli random variable, supported at one and at a positive constant close to zero. This substantially increases the chance of shrinking {\em all} the redundant weights to almost zero, leading to a consistent estimator of the cluster number; at the same time, it avoids the singularity due to assigning an exactly zero weight, and maintains support on the infinite-dimensional space. As a stick-breaking model, its posterior computation can be carried out efficiently via the classic blocked Gibbs sampler, allowing a straightforward extension to non-Gaussian components. Compared to existing methods, our model demonstrates much better performance in the simulations and data application, showing a substantial reduction in the number of clusters.
    Comment: 21 pages, 7 figures
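
    A minimal truncated sketch of the construction described above, in which the remaining (second) piece of each stick is multiplied by a quasi-Bernoulli variable. The Beta(1, alpha) stick proportions, the mixing probability p_one, and the constant eps are illustrative placeholders rather than the paper's exact prior specification.

```python
import numpy as np

def quasi_bernoulli_sticks(K, alpha, eps, p_one, rng):
    """Truncated sketch of quasi-Bernoulli stick-breaking: after each break,
    the remaining (second) piece is multiplied by a quasi-Bernoulli variable
    equal to 1 with probability p_one and to a small constant eps otherwise."""
    weights = np.zeros(K)
    remaining = 1.0
    for k in range(K):
        v = rng.beta(1.0, alpha)                 # standard stick-breaking proportion
        weights[k] = remaining * v               # first piece becomes weight k
        z = 1.0 if rng.random() < p_one else eps
        remaining = remaining * (1.0 - v) * z    # shrink the second piece
    return weights

rng = np.random.default_rng(2)
w = quasi_bernoulli_sticks(K=25, alpha=1.0, eps=1e-4, p_one=0.5, rng=rng)
print(np.round(w, 4))   # most mass on the first few sticks; the rest near zero
```

    Once a quasi-Bernoulli factor takes the small value eps, all subsequent weights are scaled down by it, which is the mechanism by which redundant weights are pushed towards zero in this sketch.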

    A review on Bayesian model-based clustering

    Clustering is an important task in many areas of knowledge: medicine and epidemiology, genomics, environmental science, economics, visual sciences, among others. Methodologies to perform inference on the number of clusters have often been proved to be inconsistent, and introducing a dependence structure among the clusters implies additional difficulties in the estimation process. In a Bayesian setting, clustering is performed by treating the unknown partition as a random object and defining a prior distribution on it. This prior distribution may be induced by models on the observations, or defined directly on the partition. Several recent results, however, have shown the difficulties in consistently estimating the number of clusters, and, therefore, the partition. Even the problem of summarising the posterior distribution on the partition remains open, given the large dimension of the partition space. This work reviews the Bayesian approaches to clustering available in the literature, presenting the advantages and disadvantages of each of them in order to suggest future lines of research.
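
    One common device for the summarisation problem mentioned above is the posterior similarity (co-clustering) matrix, which records how often each pair of observations is assigned to the same cluster across posterior samples. The sketch below uses hypothetical label vectors purely for illustration; it is not a method proposed in this review.

```python
import numpy as np

def posterior_similarity(partitions):
    """Given an (S, n) array of cluster-label vectors from S posterior
    samples, return the n x n matrix of pairwise co-clustering
    frequencies, a low-dimensional summary of the posterior over partitions."""
    partitions = np.asarray(partitions)
    S, n = partitions.shape
    psm = np.zeros((n, n))
    for labels in partitions:
        psm += (labels[:, None] == labels[None, :])
    return psm / S

# Tiny illustration with three hypothetical posterior samples of a
# partition of five observations.
samples = [[0, 0, 1, 1, 2],
           [0, 0, 1, 1, 1],
           [0, 0, 0, 1, 1]]
print(posterior_similarity(samples))
```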