Gibbs sampling methods for Pitman-Yor mixture models
We introduce a new sampling strategy for the two-parameter Poisson-Dirichlet process mixture model, also known as the Pitman-Yor process mixture model (PYM). Since the Dirichlet process arises as a special case, our sampler is also applicable to the well-known Dirichlet process mixture model (DPM). Inference in DPM and PYM is usually performed via Markov chain Monte Carlo (MCMC) methods, specifically the Gibbs sampler. These sampling methods are usually divided into two classes: marginal and conditional algorithms, each with its own merits and limitations. The aim of this paper is to propose a new sampler that combines the main advantages of each class. The key idea of the proposed sampler is to replace the standard posterior updating of the mixing measure based on the stick-breaking representation with the posterior characterization of Pitman (1996), which represents the posterior law under a Pitman-Yor process as the sum of a jump part and a continuous part. We sample the continuous part in two ways, leading to two variants of the proposed sampler. We also propose a threshold to improve mixing in the first variant of our algorithm. The two variants of our sampler are compared with a marginal method, namely the celebrated Algorithm 8 of Neal (2000), and two conditional algorithms based on the stick-breaking representation, namely the efficient slice sampler of Kalli et al. (2011) and the truncated blocked Gibbs sampler of Ishwaran and James (2001). We also investigate the effects of removing the proposed threshold in the first variant of our algorithm and of introducing the threshold in the efficient slice sampler of Kalli et al. (2011). Results on real and simulated data sets illustrate that our algorithms outperform the other conditional samplers in terms of mixing properties.
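The marginal class of samplers mentioned above works with the predictive (Chinese restaurant) representation of the Pitman-Yor process. As a minimal illustration of those well-known predictive probabilities, the sketch below simulates a random partition from the two-parameter Chinese restaurant process; the function name and the parameterization (`theta` for the concentration, `sigma` for the discount) are our own choices, not the paper's notation or its actual algorithm.

```python
import random

def crp_pitman_yor(n, theta, sigma, rng=None):
    """Sample a partition of n items from the two-parameter
    Chinese restaurant process (requires theta > -sigma, 0 <= sigma < 1)."""
    rng = rng or random.Random(0)
    counts = []   # counts[j] = current size of cluster j
    labels = []   # cluster label of each item
    for i in range(n):          # i items already seated
        k = len(counts)         # number of occupied clusters
        # P(join existing cluster j) proportional to counts[j] - sigma
        # P(open a new cluster)      proportional to theta + sigma * k
        weights = [c - sigma for c in counts] + [theta + sigma * k]
        r = rng.random() * sum(weights)
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if j == k:              # new cluster opened
            counts.append(1)
        else:
            counts[j] += 1
        labels.append(j)
    return labels, counts

labels, counts = crp_pitman_yor(100, theta=1.0, sigma=0.5)
```

Setting `sigma=0` recovers the ordinary Chinese restaurant process underlying the DPM, which is why marginal samplers for the DPM extend naturally to the PYM.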
Sufficientness postulates for Gibbs-type priors and hierarchical generalizations
A fundamental problem in Bayesian nonparametrics consists of selecting a prior distribution by assuming that the corresponding predictive probabilities obey certain properties. An early discussion of such a problem, although in a parametric framework, dates back to the seminal work by the English philosopher W. E. Johnson, who introduced a noteworthy characterization for the predictive probabilities of the symmetric Dirichlet prior distribution. This is typically referred to as Johnson’s “sufficientness” postulate. In this paper we review some nonparametric generalizations of Johnson’s postulate for a class of nonparametric priors known as species sampling models. In particular we revisit and discuss the “sufficientness” postulate for the two-parameter Poisson-Dirichlet prior within the more general framework of Gibbs-type priors and their hierarchical generalizations. Stefano Favaro is supported by the European Research Council through StG N-BNP 306406. Marco Battiston’s research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) ERC grant agreement number 617071.
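For reference, the predictive probabilities of the two-parameter Poisson-Dirichlet prior, which the postulate characterizes, take the following well-known form. Writing $\theta$ for the concentration, $\sigma$ for the discount, and supposing $n$ observations fall into $k$ distinct species $X^*_1,\dots,X^*_k$ with multiplicities $n_1,\dots,n_k$:

```latex
\Pr\bigl(X_{n+1} = \text{new} \mid X_1,\dots,X_n\bigr) = \frac{\theta + \sigma k}{\theta + n},
\qquad
\Pr\bigl(X_{n+1} = X^*_j \mid X_1,\dots,X_n\bigr) = \frac{n_j - \sigma}{\theta + n}.
```

The "sufficientness" flavor of these rules is that the probability of observing a new species depends on the sample only through $(n, k)$, while the probability of re-observing species $j$ depends only on $(n, n_j)$, not on the other multiplicities.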
Fair Clustering via Hierarchical Fair-Dirichlet Process
The advent of ML-driven decision-making and policy formation has led to an
increasing focus on algorithmic fairness. As clustering is one of the most
commonly used unsupervised machine learning approaches, there has naturally
been a proliferation of literature on {\em fair clustering}. A popular notion
of fairness in clustering mandates the clusters to be {\em balanced}, i.e.,
each level of a protected attribute must be approximately equally represented
in each cluster. Building upon the original framework, this literature has
rapidly expanded in various aspects. In this article, we offer a novel
model-based formulation of fair clustering, complementing the existing
literature, which is almost exclusively based on optimizing appropriate
objective functions.
Incorporating Prior Knowledge of Latent Group Structure in Panel Data Models
The assumption of group heterogeneity has become popular in panel data
models. We develop a constrained Bayesian grouped estimator that exploits
researchers' prior beliefs on groups in a form of pairwise constraints,
indicating whether a pair of units is likely to belong to the same group or
different groups. We propose a prior to incorporate the pairwise constraints
with varying degrees of confidence. The whole framework is built on the
nonparametric Bayesian method, which implicitly specifies a distribution over
the group partitions, and so the posterior analysis takes the uncertainty of
the latent group structure into account. Monte Carlo experiments reveal that
adding prior knowledge yields more accurate coefficient estimates and scores
predictive gains over alternative estimators. We apply our method to two
empirical applications. In a first application to forecasting U.S. CPI
inflation, we illustrate that prior knowledge of groups improves density
forecasts when the data is not entirely informative. A second application
revisits the relationship between a country's income and its democratic
transition; we identify heterogeneous income effects on democracy with five
distinct groups over ninety countries.
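The abstract does not spell out the paper's prior construction. Purely as an illustrative sketch of the general idea of tilting a prior over group partitions by pairwise constraints with varying degrees of confidence, one might exponentially tilt a base log-prior; the function name, the dictionary encoding, and the tilting form below are all our assumptions, not the authors' method.

```python
def log_pairwise_prior(partition, constraints, base_logp=0.0):
    """Generic sketch: tilt a base log-prior over partitions by pairwise
    constraints. `constraints` maps an ordered pair (i, j) of unit indices
    to a weight lam: lam > 0 favors placing i and j in the same group,
    lam < 0 favors different groups, and |lam| encodes the researcher's
    confidence in that belief."""
    logp = base_logp
    for (i, j), lam in constraints.items():
        same = partition[i] == partition[j]
        logp += lam if same else -lam
    return logp

# Toy usage: three units, one confident must-link between units 0 and 1.
constraints = {(0, 1): 2.0}
p_same = log_pairwise_prior([0, 0, 1], constraints)  # units 0, 1 together
p_diff = log_pairwise_prior([0, 1, 1], constraints)  # units 0, 1 apart
assert p_same > p_diff
```

In a posterior over partitions, such a tilt shifts mass toward partitions consistent with the researcher's beliefs without ruling out the alternatives, matching the abstract's point that constraints are held with varying degrees of confidence rather than imposed as hard restrictions.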