
    Gibbs sampling methods for Pitman-Yor mixture models

    We introduce a new sampling strategy for the two-parameter Poisson-Dirichlet process mixture model, also known as the Pitman-Yor process mixture model (PYM). Since the Dirichlet process mixture model (DPM) arises as a special case of the PYM, our sampler is applicable to the well-known DPM as well. Inference in the DPM and the PYM is usually performed via Markov chain Monte Carlo (MCMC) methods, specifically the Gibbs sampler. These sampling methods are usually divided into two classes: marginal and conditional algorithms. Each class has its merits and limitations. The aim of this paper is to propose a new sampler that combines the main advantages of each class. The key idea of the proposed sampler is to replace the standard posterior updating of the mixing measure, based on the stick-breaking representation, with the posterior characterization of Pitman (1996), which represents the posterior law under a Pitman-Yor process as the sum of a jump part and a continuous part. We sample the continuous part in two ways, leading to two variants of the proposed sampler. We also propose a threshold to improve mixing in the first variant of our algorithm. The two variants of our sampler are compared with a marginal method, namely the celebrated Algorithm 8 of Neal (2000), and with two conditional algorithms based on the stick-breaking representation: the efficient slice sampler of Kalli et al. (2011) and the truncated blocked Gibbs sampler of Ishwaran and James (2001). We also investigate the effects of removing the proposed threshold in the first variant of our algorithm and of introducing the threshold in the efficient slice sampler of Kalli et al. (2011). Results on real and simulated data sets illustrate that our algorithms outperform the other conditional methods in terms of mixing properties.
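    For reference, the Pitman (1996) characterization the sampler builds on can be sketched as follows (standard notation, not the paper's; stated for a Pitman-Yor process with discount d, concentration θ, and base measure H). Given observations X_1, ..., X_n featuring k distinct values X*_1, ..., X*_k with multiplicities n_1, ..., n_k, the posterior law splits into a jump part at the observed atoms and a continuous part:

    ```latex
    P \mid X_1,\dots,X_n \;\overset{d}{=}\; \sum_{j=1}^{k} p_j \, \delta_{X_j^*} + r_k \, \tilde{P},
    \qquad
    (p_1,\dots,p_k,\,r_k) \sim \mathrm{Dirichlet}(n_1-d,\,\dots,\,n_k-d,\;\theta+kd),
    ```

    with the process \tilde{P} \sim \mathrm{PY}(d,\,\theta+kd,\,H) independent of the weights; the "continuous part" the abstract refers to is the rescaled process r_k \tilde{P}.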

    Sufficientness postulates for Gibbs-type priors and hierarchical generalizations

    A fundamental problem in Bayesian nonparametrics consists of selecting a prior distribution by assuming that the corresponding predictive probabilities obey certain properties. An early discussion of this problem, although in a parametric framework, dates back to the seminal work of the English philosopher W. E. Johnson, who introduced a noteworthy characterization of the predictive probabilities of the symmetric Dirichlet prior distribution, typically referred to as Johnson's "sufficientness" postulate. In this paper we review some nonparametric generalizations of Johnson's postulate for a class of nonparametric priors known as species sampling models. In particular, we revisit and discuss the "sufficientness" postulate for the two-parameter Poisson-Dirichlet prior within the more general framework of Gibbs-type priors and their hierarchical generalizations. Stefano Favaro is supported by the European Research Council through StG N-BNP 306406. Marco Battiston's research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013), ERC grant agreement number 617071.
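    To make the postulate concrete, recall the predictive rule of the two-parameter Poisson-Dirichlet (Pitman-Yor) prior with discount d and concentration θ (standard notation, not taken from the paper): after n observations featuring k distinct species X*_1, ..., X*_k with frequencies n_1, ..., n_k,

    ```latex
    \Pr(X_{n+1} = \text{new} \mid X_1,\dots,X_n) = \frac{\theta + kd}{\theta + n},
    \qquad
    \Pr(X_{n+1} = X_j^* \mid X_1,\dots,X_n) = \frac{n_j - d}{\theta + n}.
    ```

    The probability of observing species j again depends on the sample only through n and n_j; "sufficientness" postulates of this kind characterize the prior from the structure of its predictive probabilities alone.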

    Fair Clustering via Hierarchical Fair-Dirichlet Process

    The advent of ML-driven decision-making and policy formation has led to an increasing focus on algorithmic fairness. As clustering is one of the most commonly used unsupervised machine learning approaches, there has naturally been a proliferation of literature on "fair clustering". A popular notion of fairness in clustering mandates the clusters to be "balanced", i.e., each level of a protected attribute must be approximately equally represented in each cluster. Building upon the original framework, this literature has rapidly expanded in various aspects. In this article, we offer a novel model-based formulation of fair clustering, complementing the existing literature which is almost exclusively based on optimizing appropriate objective functions.
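    To fix ideas, here is a minimal sketch of the balance criterion described above (our illustration of the standard definition from the fair-clustering literature, not code from the paper): a cluster's balance is the smallest ratio between the counts of any two levels of the protected attribute, so 1.0 means perfect balance and 0.0 means some level is absent.

    ```python
    from collections import Counter

    def cluster_balance(cluster_attrs, all_levels):
        """Balance of one cluster: min ratio between the counts of any two
        protected-attribute levels (1.0 = perfectly balanced, 0.0 = a level
        is entirely missing from the cluster)."""
        counts = Counter(cluster_attrs)
        sizes = [counts.get(level, 0) for level in all_levels]
        return min(sizes) / max(sizes) if min(sizes) > 0 else 0.0

    def clustering_balance(labels, attrs):
        """Overall balance of a clustering: the worst balance over clusters."""
        levels = set(attrs)
        clusters = {}
        for lab, a in zip(labels, attrs):
            clusters.setdefault(lab, []).append(a)
        return min(cluster_balance(m, levels) for m in clusters.values())

    # Two clusters over a binary protected attribute: cluster 1 holds one "a"
    # and three "b", so the overall balance is 1/3.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    attrs  = ["a", "a", "b", "b", "a", "b", "b", "b"]
    print(clustering_balance(labels, attrs))  # 0.333...
    ```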

    Incorporating Prior Knowledge of Latent Group Structure in Panel Data Models

    The assumption of group heterogeneity has become popular in panel data models. We develop a constrained Bayesian grouped estimator that exploits researchers' prior beliefs about groups in the form of pairwise constraints, indicating whether a pair of units is likely to belong to the same group or to different groups. We propose a prior that incorporates the pairwise constraints with varying degrees of confidence. The whole framework is built on nonparametric Bayesian methods, which implicitly specify a distribution over group partitions, so the posterior analysis takes the uncertainty of the latent group structure into account. Monte Carlo experiments reveal that adding prior knowledge yields more accurate coefficient estimates and predictive gains over alternative estimators. We apply our method to two empirical applications. In the first, forecasting U.S. CPI inflation, we illustrate that prior knowledge of groups improves density forecasts when the data are not entirely informative. The second revisits the relationship between a country's income and its democratic transition; we identify heterogeneous income effects on democracy, with five distinct groups across ninety countries.
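    A minimal sketch of the general idea (our illustration, not the authors' implementation; the penalty form and its strength lam are assumptions): start from a nonparametric partition prior such as the Chinese restaurant process and tilt it with soft must-link/cannot-link beliefs, each weighted by the researcher's confidence. The tilted prior is unnormalized, which is enough for use inside an MCMC sampler.

    ```python
    import math

    def crp_log_prior(partition, alpha=1.0):
        """Log Chinese-restaurant-process probability of a partition,
        encoded as one cluster label per unit."""
        counts, logp = {}, 0.0
        for i, lab in enumerate(partition):
            if lab in counts:
                logp += math.log(counts[lab] / (i + alpha))
                counts[lab] += 1
            else:
                logp += math.log(alpha / (i + alpha))
                counts[lab] = 1
        return logp

    def constrained_log_prior(partition, constraints, alpha=1.0, lam=2.0):
        """Tilt the CRP prior with soft pairwise constraints (unnormalized).
        constraints maps a pair (i, j) to (kind, confidence), with kind in
        {"must", "cannot"} and confidence in (0, 1]."""
        logp = crp_log_prior(partition, alpha)
        for (i, j), (kind, conf) in constraints.items():
            same = partition[i] == partition[j]
            agrees = same if kind == "must" else not same
            if not agrees:
                logp -= lam * conf  # penalty scales with stated confidence
        return logp

    # We believe units 0 and 1 share a group (confidence 0.9) and that
    # units 0 and 3 belong to different groups (confidence 0.5).
    constraints = {(0, 1): ("must", 0.9), (0, 3): ("cannot", 0.5)}
    print(constrained_log_prior([0, 0, 1, 1], constraints))  # respects both
    print(constrained_log_prior([0, 1, 1, 0], constraints))  # violates both
    ```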