130 research outputs found

    Learning Mixtures of Plackett-Luce Models

    Get PDF
    Abstract In this paper we address the identifiability and efficient learning problems of finite mixtures of Plackett-Luce models for rank data. We prove that for any k ≥ 2, the mixture of k PlackettLuce models for no more than 2k − 1 alternatives is non-identifiable and this bound is tight for k = 2. For generic identifiability, we prove that the mixture of k Plackett-Luce models over m alternatives is generically identifiable if k ≤ m−2 2 !. We also propose an efficient generalized method of moments (GMM) algorithm to learn the mixture of two Plackett-Luce models and show that the algorithm is consistent. Our experiments show that our GMM algorithm is significantly faster than the EMM algorithm b

    Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks

    Full text link
    We present a novel Bayesian nonparametric regression model for covariates X and continuous, real response variable Y. The model is parametrized in terms of marginal distributions for Y and X and a regression function which tunes the stochastic ordering of the conditional distributions F(y|x). By adopting an approximate composite likelihood approach, we show that the resulting posterior inference can be decoupled for the separate components of the model. This procedure can scale to very large datasets and allows for the use of standard, existing, software from Bayesian nonparametric density estimation and Plackett-Luce ranking estimation to be applied. As an illustration, we show an application of our approach to a US Census dataset, with over 1,300,000 data points and more than 100 covariates

    Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes

    Full text link
    In this paper we propose a Bayesian nonparametric model for clustering partial ranking data. We start by developing a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a completely random measure. We characterise the posterior distribution given data, and derive a simple and effective Gibbs sampler for posterior simulation. We then develop a Dirichlet process mixture extension of our model and apply it to investigate the clustering of preferences for college degree programmes amongst Irish secondary school graduates. The existence of clusters of applicants who have similar preferences for degree programmes is established and we determine that subject matter and geographical location of the third level institution characterise these clusters.Comment: Published in at http://dx.doi.org/10.1214/14-AOAS717 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Epitope profiling via mixture modeling of ranked data

    Full text link
    We propose the use of probability models for ranked data as a useful alternative to a quantitative data analysis to investigate the outcome of bioassay experiments, when the preliminary choice of an appropriate normalization method for the raw numerical responses is difficult or subject to criticism. We review standard distance-based and multistage ranking models and in this last context we propose an original generalization of the Plackett-Luce model to account for the order of the ranking elicitation process. The usefulness of the novel model is illustrated with its maximum likelihood estimation for a real data set. Specifically, we address the heterogeneous nature of experimental units via model-based clustering and detail the necessary steps for a successful likelihood maximization through a hybrid version of the Expectation-Maximization algorithm. The performance of the mixture model using the new distribution as mixture components is compared with those relative to alternative mixture models for random rankings. A discussion on the interpretation of the identified clusters and a comparison with more standard quantitative approaches are finally provided.Comment: (revised to properly include references

    Modelling rankings in R: the PlackettLuce package

    Get PDF
    This paper presents the R package PlackettLuce, which implements a generalization of the Plackett-Luce model for rankings data. The generalization accommodates both ties (of arbitrary order) and partial rankings (complete rankings of subsets of items). By default, the implementation adds a set of pseudo-comparisons with a hypothetical item, ensuring that the underlying network of wins and losses between items is always strongly connected. In this way, the worth of each item always has a finite maximum likelihood estimate, with finite standard error. The use of pseudo-comparisons also has a regularization effect, shrinking the estimated parameters towards equal item worth. In addition to standard methods for model summary, PlackettLuce provides a method to compute quasi standard errors for the item parameters. This provides the basis for comparison intervals that do not change with the choice of identifiability constraint placed on the item parameters. Finally, the package provides a method for model-based partitioning using covariates whose values vary between rankings, enabling the identification of subgroups of judges or settings that have different item worths. The features of the package are demonstrated through application to classic and novel data sets.Comment: In v2: review of software implementing alternative models to Plackett-Luce; comparison of algorithms provided by the PlackettLuce package; further examples of rankings where the underlying win-loss network is not strongly connected. In addition, general editing to improve organisation and clarity. In v3: corrected headings Table 4, minor edit

    Efficient Bayesian Inference for Generalized Bradley-Terry Models

    Full text link
    The Bradley-Terry model is a popular approach to describe probabilities of the possible outcomes when elements of a set are repeatedly compared with one another in pairs. It has found many applications including animal behaviour, chess ranking and multiclass classification. Numerous extensions of the basic model have also been proposed in the literature including models with ties, multiple comparisons, group comparisons and random graphs. From a computational point of view, Hunter (2004) has proposed efficient iterative MM (minorization-maximization) algorithms to perform maximum likelihood estimation for these generalized Bradley-Terry models whereas Bayesian inference is typically performed using MCMC (Markov chain Monte Carlo) algorithms based on tailored Metropolis-Hastings (M-H) proposals. We show here that these MM\ algorithms can be reinterpreted as special instances of Expectation-Maximization (EM) algorithms associated to suitable sets of latent variables and propose some original extensions. These latent variables allow us to derive simple Gibbs samplers for Bayesian inference. We demonstrate experimentally the efficiency of these algorithms on a variety of applications
    • …
    corecore