Inconsistency of Pitman-Yor process mixtures for the number of components
In many applications, a finite mixture is a natural model, but it can be
difficult to choose an appropriate number of components. To circumvent this
choice, investigators are increasingly turning to Dirichlet process mixtures
(DPMs), and Pitman-Yor process mixtures (PYMs), more generally. While these
models may be well-suited for Bayesian density estimation, many investigators
are using them for inferences about the number of components, by considering
the posterior on the number of components represented in the observed data. We
show that this posterior is not consistent --- that is, on data from a finite
mixture, it does not concentrate at the true number of components. This result
applies to a large class of nonparametric mixtures, including DPMs and PYMs,
over a wide variety of families of component distributions, including
essentially all discrete families, as well as continuous exponential families
satisfying mild regularity conditions (such as multivariate Gaussians).

Comment: This is a general treatment of the problem discussed in our related article, "A simple example of Dirichlet process mixture inconsistency for the number of components", Miller and Harrison (2013), arXiv:1301.270
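One common intuition for this inconsistency comes from the Chinese restaurant process underlying a DPM prior: the number of occupied tables keeps growing (roughly like alpha * log n), so small extra clusters keep appearing instead of settling at a fixed count. Below is a minimal simulation sketch of our own, not code from the paper:

```python
import random

def crp_table_counts(n, alpha, seed=0):
    """Simulate the Chinese restaurant process (the partition prior
    underlying a DPM) and return the number of occupied tables."""
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for i in range(n):
        # Open a new table with prob alpha / (alpha + i); otherwise
        # join an existing table with probability proportional to size.
        if rng.random() < alpha / (alpha + i):
            tables.append(1)
        else:
            r = rng.uniform(0, i)
            acc = 0.0
            for k, size in enumerate(tables):
                acc += size
                if r < acc:
                    tables[k] += 1
                    break
    return len(tables)

# The prior keeps spawning new tables as n grows, which is one intuition
# for why the posterior number of represented clusters need not
# concentrate at a true finite value.
for n in (100, 1000, 10000):
    print(n, crp_table_counts(n, alpha=1.0))
```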
Advances in Bayesian asymptotics and Bayesian nonparametrics
Bayesian statistics is a powerful approach to learning real-world phenomena, its strength lying in its ability to quantify uncertainty explicitly by treating unknown quantities of interest as random variables. In this thesis, we consider questions regarding three quite different aspects of Bayesian learning.
Firstly, we consider approximate Bayesian computation (ABC), a computational method suited to approximating posterior distributions for highly complex models whose likelihood function is intractable but can be simulated from. Previous authors have proved consistency and provided rates of convergence in the case where all summary statistics converge at the same rate. We generalise to the case where summary statistics may converge at different rates, and provide an explicit representation of the shape of the ABC posterior distribution in this general setting. We also show, under the same setting, that local linear post-processing can lead to significantly faster contraction rates of the pseudo-posterior.
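As a concrete illustration of the ABC setup described above, here is a minimal rejection-ABC sketch; the toy model, the summary statistic, and the tolerance eps are our own illustrative choices, not the thesis's:

```python
import numpy as np

def abc_rejection(obs_summary, simulate, summarize, prior_sample,
                  n_draws=20_000, eps=0.1):
    """Plain ABC rejection: keep prior draws whose simulated summary
    statistics land within eps of the observed summaries."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        s = summarize(simulate(theta))
        if np.linalg.norm(s - obs_summary) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer a Gaussian mean using the sample mean as summary.
rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=200)
post = abc_rejection(
    obs_summary=np.array([data.mean()]),
    simulate=lambda th: rng.normal(th, 1.0, size=200),
    summarize=lambda x: np.array([x.mean()]),
    prior_sample=lambda: rng.normal(0.0, 5.0),
)
print(post.mean(), post.std())  # approximate posterior mean and spread
```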
We then focus on the application of Bayesian statistics to natural language processing. Context-free grammars, which are standard in the modelling of natural language, have been shown to be too restrictive to fully describe all features of natural language. We propose a Bayesian nonparametric model for the class of 2-multiple context-free grammars, which generalise context-free grammars. Our model is inspired by previously proposed Bayesian models for context-free grammars and is based on the hierarchical Dirichlet process. We develop a sequential Monte Carlo algorithm to perform inference under this model and carry out simulation studies to assess our method.
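The thesis's sampler targets grammar derivations; purely as a generic illustration of the propagate/weight/resample pattern shared by all sequential Monte Carlo methods, here is a minimal bootstrap particle filter on a toy linear-Gaussian state-space model (all model choices are ours):

```python
import numpy as np

def bootstrap_particle_filter(ys, n_particles=500, seed=0):
    """Generic bootstrap SMC: propagate particles through the
    transition, weight by the observation likelihood, then resample."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)                 # initial particles
    for y in ys:
        x = 0.9 * x + rng.normal(0.0, 0.5, n_particles)   # propagate
        logw = -0.5 * (y - x) ** 2                        # Gaussian likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = rng.choice(x, size=n_particles, p=w)          # resample
    return x

ys = [0.3, 0.5, 0.1, -0.2]
print(bootstrap_particle_filter(ys).mean())  # filtered mean estimate
```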
Finally, we consider some consistency issues related to Bayesian nonparametric (BNP) mixture models. It has been shown that these models are inconsistent for the number of clusters. In the case of Dirichlet process (DP) mixture models, this problem can be mitigated by placing a prior on the model's concentration hyperparameter α, as is common practice. We prove that Pitman-Yor process (PYP) mixture models (which generalise DP mixture models) remain inconsistent for the number of clusters when a prior is put on α, in the special case where the true number of components in the data-generating mechanism is equal to 1 and the discount parameter σ is a fixed constant. Turning to the space of partitions induced by BNP mixture models, point estimators such as the maximum a posteriori (MAP) are commonly used to summarise the posterior clustering structure, which on its own can be complex and difficult to interpret. We prove consistency of the MAP partition for DP mixture models when the concentration parameter α goes deterministically to zero and the true partition consists of a single cluster.
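A small sketch of why sending α to zero favours a single cluster: under the Chinese restaurant process prior implied by a DP mixture, the prior probability that all n observations share one cluster is Γ(n)Γ(α+1)/Γ(α+n), which tends to 1 as α → 0. The code below (ours, for illustration) evaluates this probability:

```python
from math import lgamma, exp

def crp_one_cluster_logprob(n, alpha):
    """Log prior probability, under CRP(alpha), that all n observations
    fall in a single cluster: prod_{i=1}^{n-1} i / (alpha + i)."""
    return lgamma(n) + lgamma(alpha + 1) - lgamma(alpha + n)

# As alpha -> 0 the one-cluster partition dominates the prior, which is
# the regime in which the MAP partition is shown to be consistent.
for alpha in (1.0, 0.1, 0.01):
    print(alpha, exp(crp_one_cluster_logprob(100, alpha)))
```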
Quasi-Bernoulli Stick-breaking: Infinite Mixture with Cluster Consistency
In mixture modeling and clustering applications, the number of components is often unknown. The stick-breaking model is an appealing construction that assumes infinitely many components while shrinking most of the redundant weights to near zero. However, this shrinkage has been found to be unsatisfactory: even when the component distribution is correctly specified, small spurious weights appear and yield an inconsistent estimate of the cluster number. In this article, we propose a simple solution that gains stronger control over the redundant weights: when breaking each stick into two pieces, we adjust the length of the second piece by multiplying it by a quasi-Bernoulli random variable, supported at one and at a positive constant close to zero. This substantially increases the chance of shrinking all the redundant weights to almost zero, leading to a consistent estimator of the cluster number; at the same time, it avoids the singularity caused by assigning an exactly zero weight, and maintains support in the infinite-dimensional space. As a stick-breaking model, its posterior computation can be carried out efficiently via the classic blocked Gibbs sampler, allowing straightforward extension to non-Gaussian components. Compared to existing methods, our model demonstrates superior performance in simulations and a data application, showing a substantial reduction in the number of clusters.
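The construction lends itself to direct simulation. Below is a minimal sketch of quasi-Bernoulli stick-breaking as we read it from the abstract; the beta break distribution, the probability p_one of drawing one, and the small constant eps are illustrative choices of ours, and we normalise the weights at the end for convenience:

```python
import numpy as np

def qb_stick_weights(n_sticks, a=1.0, p_one=0.5, eps=1e-4, seed=0):
    """Quasi-Bernoulli stick-breaking sketch: each break splits the
    remaining stick, and the piece carried forward is multiplied by a
    quasi-Bernoulli factor (1 with prob p_one, else the small eps)."""
    rng = np.random.default_rng(seed)
    remaining = 1.0
    w = np.empty(n_sticks)
    for k in range(n_sticks):
        v = rng.beta(1.0, a)                      # break fraction
        w[k] = remaining * v                      # first piece: weight k
        q = 1.0 if rng.random() < p_one else eps  # quasi-Bernoulli draw
        remaining *= (1.0 - v) * q                # second piece, shrunk
    return w / w.sum()                            # normalised for illustration

# Once a draw hits eps, the remaining stick collapses, so *all* later
# weights become negligible at once, matching the abstract's intuition.
weights = qb_stick_weights(100)
print(np.sum(weights > 1e-3))  # only a handful of non-negligible weights
```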
A review on Bayesian model-based clustering
Clustering is an important task in many areas of knowledge: medicine and
epidemiology, genomics, environmental science, economics, visual sciences,
among others. Methodologies for performing inference on the number of clusters have often been shown to be inconsistent, and introducing a dependence structure among the clusters implies additional difficulties in the estimation process. In a Bayesian setting, clustering is performed by treating the unknown partition as a random object and defining a prior distribution on it. This prior distribution may be induced by models on the observations, or defined directly on the partition. Several recent results, however, have shown the difficulty of consistently estimating the number of clusters and, therefore, the partition. Summarising the posterior distribution on the partition itself also remains an open problem, given the large dimension of the partition space. This work reviews the Bayesian approaches to clustering available in the literature, presenting the advantages and disadvantages of each in order to suggest future lines of research.