Learning Mixtures of Plackett-Luce Models
Abstract: In this paper we address the identifiability and efficient learning problems for finite mixtures of Plackett-Luce models for rank data. We prove that for any k ≥ 2, the mixture of k Plackett-Luce models for no more than 2k − 1 alternatives is non-identifiable, and that this bound is tight for k = 2. For generic identifiability, we prove that the mixture of k Plackett-Luce models over m alternatives is generically identifiable if k ≤ ⌊(m−2)/2⌋!. We also propose an efficient generalized method of moments (GMM) algorithm to learn the mixture of two Plackett-Luce models and show that the algorithm is consistent. Our experiments show that our GMM algorithm is significantly faster than the EMM algorithm.
Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
We present a novel Bayesian nonparametric regression model for covariates X
and continuous, real response variable Y. The model is parametrized in terms of
marginal distributions for Y and X and a regression function which tunes the
stochastic ordering of the conditional distributions F(y|x). By adopting an
approximate composite likelihood approach, we show that the resulting posterior
inference can be decoupled for the separate components of the model. This
procedure can scale to very large datasets and allows standard, existing
software for Bayesian nonparametric density estimation and Plackett-Luce
ranking estimation to be applied. As an illustration, we show an
application of our approach to a US Census dataset with over 1,300,000 data
points and more than 100 covariates.
Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes
In this paper we propose a Bayesian nonparametric model for clustering
partial ranking data. We start by developing a Bayesian nonparametric extension
of the popular Plackett-Luce choice model that can handle an infinite number of
choice items. Our framework is based on the theory of random atomic measures,
with the prior specified by a completely random measure. We characterise the
posterior distribution given data, and derive a simple and effective Gibbs
sampler for posterior simulation. We then develop a Dirichlet process mixture
extension of our model and apply it to investigate the clustering of
preferences for college degree programmes amongst Irish secondary school
graduates. The existence of clusters of applicants who have similar preferences
for degree programmes is established and we determine that subject matter and
geographical location of the third level institution characterise these
clusters.

Published in the Annals of Applied Statistics (http://dx.doi.org/10.1214/14-AOAS717, http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics.
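The Gibbs sampler described above exploits a latent-variable augmentation of the Plackett-Luce likelihood. The following finite-dimensional sketch, with independent Gamma priors in place of the paper's completely-random-measure prior, shows the conjugate structure that makes such samplers simple:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(rankings, w, a=1.0, b=1.0):
    """One sweep of a latent-variable Gibbs sampler for a *finite*
    Plackett-Luce model with independent Gamma(a, b) priors on the item
    worths. A simplified sketch of the augmentation idea only; the
    paper's prior is a completely random measure."""
    w = np.asarray(w, dtype=float)
    m = len(w)
    wins = np.zeros(m)              # times each item was chosen at a stage
    rate = np.full(m, b)            # accumulated posterior Gamma rates
    for ranking in rankings:
        remaining = list(ranking)
        for item in ranking[:-1]:   # the last stage is deterministic
            # latent stage variable: Exponential, rate = total worth left
            z = rng.exponential(1.0 / w[remaining].sum())
            rate[remaining] += z
            wins[item] += 1
            remaining.remove(item)
    # conditional conjugacy: each worth is independently Gamma given latents
    return rng.gamma(a + wins, 1.0 / rate)
```

Iterating the sweep gives a Markov chain over the worths; items that are consistently ranked first accumulate the largest posterior worths.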
Epitope profiling via mixture modeling of ranked data
We propose the use of probability models for ranked data as a useful
alternative to a quantitative data analysis to investigate the outcome of
bioassay experiments, when the preliminary choice of an appropriate
normalization method for the raw numerical responses is difficult or subject to
criticism. We review standard distance-based and multistage ranking models and
in this last context we propose an original generalization of the Plackett-Luce
model to account for the order of the ranking elicitation process. The
usefulness of the novel model is illustrated with its maximum likelihood
estimation for a real data set. Specifically, we address the heterogeneous
nature of experimental units via model-based clustering and detail the
necessary steps for a successful likelihood maximization through a hybrid
version of the Expectation-Maximization algorithm. The performance of the
mixture model using the new distribution as mixture components is compared
with that of alternative mixture models for random rankings. A discussion
on the interpretation of the identified clusters and a comparison with more
standard quantitative approaches are finally provided.
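The E-step of such an EM scheme weights each observed ranking by its posterior probability of belonging to each mixture component. A small illustrative sketch for a plain Plackett-Luce mixture (the paper's components additionally account for the order of the ranking elicitation process):

```python
import numpy as np

def pl_prob(ranking, worths):
    """Probability of a full ranking (best first) under Plackett-Luce."""
    worths = np.asarray(worths, dtype=float)
    remaining, prob = list(ranking), 1.0
    for item in ranking:
        prob *= worths[item] / worths[remaining].sum()
        remaining.remove(item)
    return prob

def e_step(rankings, weights, components):
    """Responsibilities gamma[n, g]: posterior probability that ranking n
    was generated by component g, given current mixture weights and
    component worths. Illustrative finite-mixture E-step only."""
    joint = np.array([[pi * pl_prob(r, w)
                       for pi, w in zip(weights, components)]
                      for r in rankings])
    return joint / joint.sum(axis=1, keepdims=True)
```

The M-step would then re-estimate each component's worths from its responsibility-weighted rankings, which is where the paper's hybrid maximization comes in.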
Modelling rankings in R: the PlackettLuce package
This paper presents the R package PlackettLuce, which implements a
generalization of the Plackett-Luce model for rankings data. The generalization
accommodates both ties (of arbitrary order) and partial rankings (complete
rankings of subsets of items). By default, the implementation adds a set of
pseudo-comparisons with a hypothetical item, ensuring that the underlying
network of wins and losses between items is always strongly connected. In this
way, the worth of each item always has a finite maximum likelihood estimate,
with finite standard error. The use of pseudo-comparisons also has a
regularization effect, shrinking the estimated parameters towards equal item
worth. In addition to standard methods for model summary, PlackettLuce provides
a method to compute quasi standard errors for the item parameters. This
provides the basis for comparison intervals that do not change with the choice
of identifiability constraint placed on the item parameters. Finally, the
package provides a method for model-based partitioning using covariates whose
values vary between rankings, enabling the identification of subgroups of
judges or settings that have different item worths. The features of the package
are demonstrated through application to classic and novel data sets.
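The package itself is written in R, but the pseudo-comparison device can be sketched language-agnostically. In the pairwise (Bradley-Terry) special case below, an item that never loses has no finite maximum likelihood estimate; adding half a win and half a loss against a hypothetical item reconnects the win-loss network and keeps every estimate finite. This is a simplified Python sketch, not the package's algorithm, which also handles ties and partial rankings:

```python
import numpy as np

def bt_mm(wins, iters=500):
    """Fit Bradley-Terry worths by Hunter's (2004) MM iteration from a
    matrix of pairwise win counts: wins[i, j] = times item i beat item j."""
    n = wins.shape[0]
    w = np.ones(n)
    for _ in range(iters):
        n_ij = wins + wins.T                 # comparisons per pair
        denom = n_ij / np.add.outer(w, w)
        np.fill_diagonal(denom, 0.0)
        w = wins.sum(axis=1) / denom.sum(axis=1)
        w /= w.sum()                         # fix the arbitrary scale
    return w

# item 0 beats item 1 three times and never loses: its MLE diverges.
wins = np.array([[0.0, 3.0], [0.0, 0.0]])
# add a hypothetical "ghost" item with half a win and half a loss against
# every real item, mimicking the pseudo-comparison regularization
padded = np.zeros((3, 3))
padded[:2, :2] = wins
padded[2, :2] = 0.5
padded[:2, 2] = 0.5
w = bt_mm(padded)   # every worth is now finite
```

As in the package, the pseudo-comparisons also shrink the estimates towards equal worth, since every item now has some wins and some losses.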
Efficient Bayesian Inference for Generalized Bradley-Terry Models
The Bradley-Terry model is a popular approach to describe probabilities of
the possible outcomes when elements of a set are repeatedly compared with one
another in pairs. It has found many applications including animal behaviour,
chess ranking and multiclass classification. Numerous extensions of the basic
model have also been proposed in the literature including models with ties,
multiple comparisons, group comparisons and random graphs. From a computational
point of view, Hunter (2004) has proposed efficient iterative MM
(minorization-maximization) algorithms to perform maximum likelihood estimation
for these generalized Bradley-Terry models whereas Bayesian inference is
typically performed using MCMC (Markov chain Monte Carlo) algorithms based on
tailored Metropolis-Hastings (M-H) proposals. We show here that these MM
algorithms can be reinterpreted as special instances of
Expectation-Maximization (EM) algorithms associated with suitable sets of latent
variables and propose some original extensions. These latent variables allow us
to derive simple Gibbs samplers for Bayesian inference. We demonstrate
experimentally the efficiency of these algorithms on a variety of applications
- …
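For the plain Bradley-Terry case, the latent variables that turn the MM iteration into an EM algorithm also yield a conditionally conjugate Gibbs sampler: given the worths, one latent Gamma variable per compared pair; given the latents, each worth is Gamma. A hedged sketch assuming independent Gamma(a, b) priors (an illustration of the augmentation strategy, not the paper's full scheme):

```python
import numpy as np

rng = np.random.default_rng(1)

def bt_gibbs_sweep(wins, w, a=1.0, b=1.0):
    """One sweep of a latent-variable Gibbs sampler for Bradley-Terry
    with independent Gamma(a, b) priors on the worths.
    wins[i, j] = number of times item i beat item j."""
    n_ij = wins + wins.T                    # comparisons per unordered pair
    i, j = np.nonzero(np.triu(n_ij, 1))
    z = np.zeros_like(n_ij)
    # latent pair totals: Z_ij ~ Gamma(n_ij, w_i + w_j)
    z[i, j] = rng.gamma(n_ij[i, j], 1.0 / (w[i] + w[j]))
    z += z.T
    shape = a + wins.sum(axis=1)            # a + total wins of each item
    rate = b + z.sum(axis=1)
    return rng.gamma(shape, 1.0 / rate)     # conjugate conditional update
```

Iterating the sweep gives posterior draws of the worths; items with more wins against strong opposition concentrate on larger values, mirroring the structure the MM/EM correspondence exposes.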