34 research outputs found

    Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes

    In this paper we propose a Bayesian nonparametric model for clustering partial ranking data. We start by developing a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a completely random measure. We characterise the posterior distribution given data, and derive a simple and effective Gibbs sampler for posterior simulation. We then develop a Dirichlet process mixture extension of our model and apply it to investigate the clustering of preferences for college degree programmes amongst Irish secondary school graduates. The existence of clusters of applicants who have similar preferences for degree programmes is established, and we determine that subject matter and geographical location of the third-level institution characterise these clusters. Comment: Published at http://dx.doi.org/10.1214/14-AOAS717 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Bayesian Plackett–Luce Mixture Models for Partially Ranked Data

    The elicitation of an ordinal judgment on multiple alternatives is often required in psychological and behavioral experiments to investigate the preference/choice orientation of a specific population. The Plackett–Luce model is one of the most popular and frequently applied parametric distributions to analyze rankings of a finite set of items. The present work introduces a Bayesian finite mixture of Plackett–Luce models to account for unobserved sample heterogeneity of partially ranked data. We describe an efficient way to incorporate the latent group structure in the data augmentation approach and the derivation of existing maximum likelihood procedures as special instances of the proposed Bayesian method. Inference can be conducted with the combination of the Expectation-Maximization algorithm for maximum a posteriori estimation and the Gibbs sampling iterative procedure. We additionally investigate several Bayesian criteria for selecting the optimal mixture configuration and describe diagnostic tools for assessing the fitness of ranking distributions conditionally and unconditionally on the number of ranked items. The utility of the novel Bayesian parametric Plackett–Luce mixture for characterizing sample heterogeneity is illustrated with several applications to simulated and real preference ranked data. We compare our method with the frequentist approach and a Bayesian nonparametric mixture model, both assuming the Plackett–Luce model as a mixture component. Our analysis on real datasets reveals the importance of an accurate diagnostic check for an appropriate in-depth understanding of the heterogeneous nature of the partial ranking data

    Bayesian modelling and analysis of ranked data

    PhD Thesis. Ranked data are central to many applications in science and social science and arise when rankers (individuals) use some criterion to order a set of entities. Such rankings are therefore equivalent to permutations of the elements of a set. The majority of models for ranked data rely on a strong assumption of homogeneity, such as all rankers sharing the same view on preferences of the entities. The aim of this thesis is to develop a richer class of models which can reveal any plausible subgroup structure within the data both for rankers and entities. We begin by looking at the Plackett–Luce model, an extension of the Bradley–Terry model for paired comparisons. First this model is extended to cater for when rankers do not report a full ranking of all entities. For example, they might only report their top five ranked entities after seeing some or all entities. Another issue is that most work in this area assumes that all rankers are equally informed about the entities they are ranking. Often this assumption will be questionable and so we develop a model which allows rankers to have differing reliability. This model, the Weighted Plackett–Luce model, allows for such heterogeneity through a novel two-component mixture model defined by augmenting the Plackett–Luce model. The idea that rankers may be heterogeneous in their beliefs about entities is not new. However, there might be groups of rankers with each group sharing the same view about entities. Generally the number of such groups will not be known and so we investigate the possibility of such group structure by using a Dirichlet process mixture of Weighted Plackett–Luce models. It can also be useful to assess whether some entities are exchangeable, that is, whether there is also entity clustering within each ranker group, an issue that has received little attention in the literature. We extend the model further to explore both ranker and entity clustering by adapting the Nested Dirichlet process. The resulting model is a Weighted Adapted Nested Dirichlet (WAND) process mixture of Plackett–Luce models. Posterior inference is conducted via a simple and efficient Gibbs sampling scheme. The richness of information in the posterior distribution allows for inference about many aspects of the clustering structure both between ranker groups and between entity groups (within ranker groups), in contrast to many other (Bayesian) analyses. The methodology is illustrated using several simulation studies and real data examples. Finally, we relax the assumption of a known ranking process underpinning these models by looking at the recently developed Extended Plackett–Luce model. This model allows inference for the order in which a homogeneous set of rankers assign entities to ranks. Analysis of this model is challenging but we have found that using Metropolis coupled Markov chain Monte Carlo (MC3) methods can provide adequate mixing over the high-dimensional space of all possible permutations when the number of entities is not small

    Rank-based Bayesian clustering via covariate-informed Mallows mixtures

    Data in the form of rankings, ratings, pair comparisons or clicks are frequently collected in diverse fields, from marketing to politics, to understand assessors' individual preferences. Combining such preference data with features associated with the assessors can lead to a better understanding of the assessors' behaviors and choices. The Mallows model is a popular model for rankings, as it flexibly adapts to different types of preference data, and the previously proposed Bayesian Mallows Model (BMM) offers a computationally efficient framework for Bayesian inference that can also capture heterogeneity among users via a finite mixture. We develop a Bayesian Mallows-based finite mixture model that performs clustering while also accounting for assessor-related features, called the Bayesian Mallows model with covariates (BMMx). BMMx is based on a similarity function that a priori favours the aggregation of assessors into a cluster when their covariates are similar, using the Product Partition models (PPMx) proposal. We present two approaches to measure the covariate similarity: one based on a novel deterministic function measuring the covariates' goodness-of-fit to the cluster, and one based on an augmented model as in PPMx. We investigate the performance of BMMx in both simulation experiments and real-data examples, showing the method's potential for advancing the understanding of assessor preferences and behaviors in different applications

    Gamma Processes, Stick-Breaking, and Variational Inference

    While most Bayesian nonparametric models in machine learning have focused on the Dirichlet process, the beta process, or their variants, the gamma process has recently emerged as a useful nonparametric prior in its own right. Current inference schemes for models involving the gamma process are restricted to MCMC-based methods, which limits their scalability. In this paper, we present a variational inference framework for models involving gamma process priors. Our approach is based on a novel stick-breaking constructive definition of the gamma process. We prove correctness of this stick-breaking process by using the characterization of the gamma process as a completely random measure (CRM), and we explicitly derive the rate measure of our construction using Poisson process machinery. We also derive error bounds on the truncation of the infinite process required for variational inference, similar to the truncation analyses for other nonparametric models based on the Dirichlet and beta processes. Our representation is then used to derive a variational inference algorithm for a particular Bayesian nonparametric latent structure formulation known as the infinite Gamma-Poisson model, where the latent variables are drawn from a gamma process prior with Poisson likelihoods. Finally, we present results for our algorithms on nonnegative matrix factorization tasks on document corpora, and show that we compare favorably to both sampling-based techniques and variational approaches based on beta-Bernoulli priors

    Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge?

    This paper is motivated by a Eurobarometer survey on science knowledge. As part of the survey, respondents were asked to rank sources of science information in order of importance. The official statistical analysis of these data, however, failed to use the complete ranking information. We instead propose a method which treats ranked data as a set of paired comparisons, which places the problem in the standard framework of generalized linear models and also allows respondent covariates to be incorporated. An extension is proposed to allow for heterogeneity in the ranked responses. The resulting model uses a nonparametric formulation of the random effects structure, fitted using the EM algorithm. Each mass point is multivalued, with a parameter for each item. The resultant model is equivalent to a covariate latent class model, where the latent class profiles are provided by the mass point components and the covariates act on the class profiles. This provides an alternative interpretation of the fitted model. The approach is also suitable for paired comparison data

    Comparison mining from text

    BNP-Seq: Bayesian Nonparametric Differential Expression Analysis of Sequencing Count Data

    We perform differential expression analysis of high-throughput sequencing count data under a Bayesian nonparametric framework, removing sophisticated ad-hoc pre-processing steps commonly required in existing algorithms. We propose to use the gamma (beta) negative binomial process, which takes into account different sequencing depths using sample-specific negative binomial probability (dispersion) parameters, to detect differentially expressed genes by comparing the posterior distributions of gene-specific negative binomial dispersion (probability) parameters. These model parameters are inferred by borrowing statistical strength across both the genes and samples. Extensive experiments on both simulated and real-world RNA sequencing count data show that the proposed differential expression analysis algorithms clearly outperform previously proposed ones in terms of the areas under both the receiver operating characteristic and precision-recall curves. Comment: To appear in Journal of the American Statistical Association