
34 research outputs found

Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes

Author: Caron François
Murphy Thomas Brendan
Teh Yee Whye
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2014

In this paper we propose a Bayesian nonparametric model for clustering partial ranking data. We start by developing a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a completely random measure. We characterise the posterior distribution given data, and derive a simple and effective Gibbs sampler for posterior simulation. We then develop a Dirichlet process mixture extension of our model and apply it to investigate the clustering of preferences for college degree programmes amongst Irish secondary school graduates. The existence of clusters of applicants who have similar preferences for degree programmes is established and we determine that subject matter and geographical location of the third level institution characterise these clusters.

Comment: Published at http://dx.doi.org/10.1214/14-AOAS717 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

HAL Descartes

Oxford University Research Archive

Bayesian Plackett--Luce Mixture Models for Partially Ranked Data

Author: Mollica C.
Tardella Luca
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/10/2016

The elicitation of an ordinal judgment on multiple alternatives is often required in psychological and behavioral experiments to investigate preference/choice orientation of a specific population. The Plackett–Luce model is one of the most popular and frequently applied parametric distributions to analyze rankings of a finite set of items. The present work introduces a Bayesian finite mixture of Plackett–Luce models to account for unobserved sample heterogeneity of partially ranked data. We describe an efficient way to incorporate the latent group structure in the data augmentation approach and the derivation of existing maximum likelihood procedures as special instances of the proposed Bayesian method. Inference can be conducted with the combination of the Expectation-Maximization algorithm for maximum a posteriori estimation and the Gibbs sampling iterative procedure. We additionally investigate several Bayesian criteria for selecting the optimal mixture configuration and describe diagnostic tools for assessing the fitness of ranking distributions conditionally and unconditionally on the number of ranked items. The utility of the novel Bayesian parametric Plackett–Luce mixture for characterizing sample heterogeneity is illustrated with several applications to simulated and real preference ranked data. We compare our method with the frequentist approach and a Bayesian nonparametric mixture model both assuming the Plackett–Luce model as a mixture component. Our analysis on real datasets reveals the importance of an accurate diagnostic check for an appropriate in-depth understanding of the heterogeneous nature of the partial ranking data
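As a rough illustration of the Plackett–Luce model mentioned in this abstract, the sketch below computes the log-probability of a partial (top-k) ranking for a single mixture component. It is not taken from the paper: the function name `plackett_luce_log_prob`, the item labels, and the support weights are all hypothetical, and the mixture and Bayesian inference steps are not shown.

```python
import math

def plackett_luce_log_prob(ranking, support):
    """Log-probability of a top-k partial ranking under a Plackett-Luce model.

    ranking: items listed best first (may cover only the top k items).
    support: dict mapping every item to a positive weight (hypothetical values).
    """
    remaining = sum(support.values())  # total weight of items not yet ranked
    log_p = 0.0
    for item in ranking:
        # Each stage picks the next item with probability proportional to its weight.
        log_p += math.log(support[item]) - math.log(remaining)
        remaining -= support[item]
    return log_p

# Hypothetical support parameters for three items.
support = {"A": 3.0, "B": 2.0, "C": 1.0}
top2_prob = math.exp(plackett_luce_log_prob(["A", "B"], support))
print(top2_prob)  # (3/6) * (2/3)
```

Items left unranked contribute only through the normalising sums, which is how this form of the model accommodates partially ranked data.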

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Bayesian modelling and analysis of ranked data

Author: Johnson Stephen Richard
Publication venue: Newcastle University
Publication date: 01/01/2019

PhD Thesis

Ranked data are central to many applications in science and social science and arise when rankers (individuals) use some criterion to order a set of entities. Such rankings are therefore equivalent to permutations of the elements of a set. The majority of models for ranked data rely on a strong assumption of homogeneity, such as all rankers sharing the same view on preferences of the entities. The aim of this thesis is to develop a richer class of models which can reveal any plausible subgroup structure within the data both for rankers and entities. We begin by looking at the Plackett–Luce model, an extension of the Bradley–Terry model for paired comparisons. First this model is extended to cater for when rankers do not report a full ranking of all entities. For example, they might only report their top five ranked entities after seeing some or all entities. Another issue is that most work in this area assumes that all rankers are equally informed about the entities they are ranking. Often this assumption will be questionable and so we develop a model which allows rankers to have differing reliability. This model, the Weighted Plackett–Luce model, allows for such heterogeneity through a novel two-component mixture model defined by augmenting the Plackett–Luce model. The idea that rankers may be heterogeneous in their beliefs about entities is not new. However, there might be groups of rankers with each group sharing the same view about entities. Generally the number of such groups will not be known and so we investigate the possibility of such group structure by using a Dirichlet process mixture of Weighted Plackett–Luce models. It can also be useful to assess whether some entities are exchangeable, that is, whether there is also entity clustering within each ranker group, an issue that has received little attention in the literature. We extend the model further to explore both ranker and entity clustering by adapting the Nested Dirichlet process.
The resulting model is a Weighted Adapted Nested Dirichlet (WAND) process mixture of Plackett–Luce models. Posterior inference is conducted via a simple and efficient Gibbs sampling scheme. The richness of information in the posterior distribution allows for inference about many aspects of the clustering structure both between ranker groups and between entity groups (within ranker groups), in contrast to many other (Bayesian) analyses. The methodology is illustrated using several simulation studies and real data examples. Finally, we relax the assumption of a known ranking process underpinning these models by looking at the recently developed Extended Plackett–Luce model. This model allows inference for the order in which a homogeneous set of rankers assign entities to ranks. Analysis of this model is challenging but we have found that using Metropolis coupled Markov chain Monte Carlo (MC³) methods can provide adequate mixing over the high-dimensional space of all possible permutations when the number of entities is not small

Newcastle University eTheses

Rank-based Bayesian clustering via covariate-informed Mallows mixtures

Author: Eliseussen Emilie
Frigessi Arnoldo
Vitelli Valeria
Publication date: 16/02/2024

Data in the form of rankings, ratings, pair comparisons or clicks are frequently collected in diverse fields, from marketing to politics, to understand assessors' individual preferences. Combining such preference data with features associated with the assessors can lead to a better understanding of the assessors' behaviors and choices. The Mallows model is a popular model for rankings, as it flexibly adapts to different types of preference data, and the previously proposed Bayesian Mallows Model (BMM) offers a computationally efficient framework for Bayesian inference that can also capture the users' heterogeneity via a finite mixture. We develop a Bayesian Mallows-based finite mixture model that performs clustering while also accounting for assessor-related features, called the Bayesian Mallows model with covariates (BMMx). BMMx is based on a similarity function that a priori favours the aggregation of assessors into a cluster when their covariates are similar, following the Product Partition models (PPMx) proposal. We present two approaches to measure the covariate similarity: one based on a novel deterministic function measuring the covariates' goodness-of-fit to the cluster, and one based on an augmented model as in PPMx. We investigate the performance of BMMx in both simulation experiments and real-data examples, showing the method's potential for advancing the understanding of assessor preferences and behaviors in different applications
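For orientation, a Mallows model assigns each ranking a probability that decays with its distance from a consensus ranking. The sketch below is a minimal illustration under stated assumptions, not the BMMx method itself: it assumes the Kendall distance, a single scale parameter called `theta` here, and a brute-force normalising constant that is only practical for a handful of items. All function names and values are hypothetical.

```python
import itertools
import math

def kendall_distance(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    pos = {item: i for i, item in enumerate(r2)}
    seq = [pos[item] for item in r1]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )

def mallows_prob(ranking, consensus, theta):
    """P(ranking) proportional to exp(-theta * distance to the consensus)."""
    def weight(r):
        return math.exp(-theta * kendall_distance(r, consensus))
    # Brute-force normalisation over all permutations (small item sets only).
    z = sum(weight(p) for p in itertools.permutations(consensus))
    return weight(ranking) / z

consensus = ("A", "B", "C")  # hypothetical consensus ranking
print(mallows_prob(("A", "B", "C"), consensus, theta=0.7))
print(mallows_prob(("C", "B", "A"), consensus, theta=0.7))
```

Rankings closer to the consensus receive higher probability, and a larger `theta` concentrates the distribution more tightly around it.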

arXiv.org e-Print Archive

Gamma Processes, Stick-Breaking, and Variational Inference

Author: Kulis Brian
Roychowdhury Anirban
Publication date: 04/10/2014

While most Bayesian nonparametric models in machine learning have focused on the Dirichlet process, the beta process, or their variants, the gamma process has recently emerged as a useful nonparametric prior in its own right. Current inference schemes for models involving the gamma process are restricted to MCMC-based methods, which limits their scalability. In this paper, we present a variational inference framework for models involving gamma process priors. Our approach is based on a novel stick-breaking constructive definition of the gamma process. We prove correctness of this stick-breaking process by using the characterization of the gamma process as a completely random measure (CRM), and we explicitly derive the rate measure of our construction using Poisson process machinery. We also derive error bounds on the truncation of the infinite process required for variational inference, similar to the truncation analyses for other nonparametric models based on the Dirichlet and beta processes. Our representation is then used to derive a variational inference algorithm for a particular Bayesian nonparametric latent structure formulation known as the infinite Gamma-Poisson model, where the latent variables are drawn from a gamma process prior with Poisson likelihoods. Finally, we present results for our algorithms on nonnegative matrix factorization tasks on document corpora, and show that we compare favorably to both sampling-based techniques and variational approaches based on beta-Bernoulli priors

arXiv.org e-Print Archive

CiteSeerX

Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge?

Author: Dittrich Regina
Francis Brian
Hatzinger Reinhold
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

This paper is motivated by a Eurobarometer survey on science knowledge. As part of the survey, respondents were asked to rank sources of science information in order of importance. The official statistical analysis of these data, however, failed to use the complete ranking information. We instead propose a method which treats ranked data as a set of paired comparisons, which places the problem in the standard framework of generalized linear models and also allows respondent covariates to be incorporated. An extension is proposed to allow for heterogeneity in the ranked responses. The resulting model uses a nonparametric formulation of the random effects structure, fitted using the EM algorithm. Each mass point is multivalued, with a parameter for each item. The resultant model is equivalent to a covariate latent class model, where the latent class profiles are provided by the mass point components and the covariates act on the class profiles. This provides an alternative interpretation of the fitted model. The approach is also suitable for paired comparison data
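The idea of treating a ranking as a set of paired comparisons can be sketched in a few lines. This is only an illustration of the data transformation, with a hypothetical function name and item labels; the generalized linear model and the EM fitting described in the abstract are not shown.

```python
from itertools import combinations

def ranking_to_paired_comparisons(ranking):
    """Expand a ranking (best first) into (preferred, less_preferred) pairs."""
    # combinations() keeps the input order, so the earlier (better-ranked)
    # item always appears first in each pair.
    return list(combinations(ranking, 2))

# Hypothetical ranking of three items, best first.
pairs = ranking_to_paired_comparisons(["A", "B", "C"])
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

A full ranking of n items yields n(n-1)/2 pairs, each of which can then serve as one binary response in a paired-comparison model.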

arXiv.org e-Print Archive

Crossref

National Centre for Research Methods: NCRM EPrints Repository

Lancaster E-Prints

Comparison mining from text

Author: Tkachenko Maksim
Publication venue: Singapore Management University
Publication date: 01/12/2018

Institutional Knowledge at Singapore Management University

BNP-Seq: Bayesian Nonparametric Differential Expression Analysis of Sequencing Count Data

Author: Dadaneh Siamak Zamani
Qian Xiaoning
Zhou Mingyuan
Publication date: 02/05/2017

We perform differential expression analysis of high-throughput sequencing count data under a Bayesian nonparametric framework, removing sophisticated ad-hoc pre-processing steps commonly required in existing algorithms. We propose to use the gamma (beta) negative binomial process, which takes into account different sequencing depths using sample-specific negative binomial probability (dispersion) parameters, to detect differentially expressed genes by comparing the posterior distributions of gene-specific negative binomial dispersion (probability) parameters. These model parameters are inferred by borrowing statistical strength across both the genes and samples. Extensive experiments on both simulated and real-world RNA sequencing count data show that the proposed differential expression analysis algorithms clearly outperform previously proposed ones in terms of the areas under both the receiver operating characteristic and precision-recall curves.

Comment: To appear in Journal of the American Statistical Association

arXiv.org e-Print Archive

FigShare