Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes
In this paper we propose a Bayesian nonparametric model for clustering
partial ranking data. We start by developing a Bayesian nonparametric extension
of the popular Plackett-Luce choice model that can handle an infinite number of
choice items. Our framework is based on the theory of random atomic measures,
with the prior specified by a completely random measure. We characterise the
posterior distribution given data, and derive a simple and effective Gibbs
sampler for posterior simulation. We then develop a Dirichlet process mixture
extension of our model and apply it to investigate the clustering of
preferences for college degree programmes amongst Irish secondary school
graduates. The existence of clusters of applicants who have similar preferences
for degree programmes is established and we determine that subject matter and
geographical location of the third level institution characterise these
clusters.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS717 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Modelling rankings in R: the PlackettLuce package
This paper presents the R package PlackettLuce, which implements a
generalization of the Plackett-Luce model for rankings data. The generalization
accommodates both ties (of arbitrary order) and partial rankings (complete
rankings of subsets of items). By default, the implementation adds a set of
pseudo-comparisons with a hypothetical item, ensuring that the underlying
network of wins and losses between items is always strongly connected. In this
way, the worth of each item always has a finite maximum likelihood estimate,
with finite standard error. The use of pseudo-comparisons also has a
regularization effect, shrinking the estimated parameters towards equal item
worth. In addition to standard methods for model summary, PlackettLuce provides
a method to compute quasi standard errors for the item parameters. This
provides the basis for comparison intervals that do not change with the choice
of identifiability constraint placed on the item parameters. Finally, the
package provides a method for model-based partitioning using covariates whose
values vary between rankings, enabling the identification of subgroups of
judges or settings that have different item worths. The features of the package
are demonstrated through application to classic and novel data sets.
Comment: In v2: review of software implementing alternative models to
Plackett-Luce; comparison of algorithms provided by the PlackettLuce package;
further examples of rankings where the underlying win-loss network is not
strongly connected. In addition, general editing to improve organisation and
clarity. In v3: corrected headings Table 4, minor edit
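As an illustration of the model the two abstracts above build on, the following is a minimal sketch of the standard Plackett-Luce log-probability for a ranking of a subset of items (a "partial ranking" in the sense used by the PlackettLuce abstract). It is not taken from either paper or from the R package: the function name `plackett_luce_log_prob` and the `worth` dictionary are hypothetical, and ties and pseudo-comparisons are deliberately left out.

```python
import math


def plackett_luce_log_prob(ranking, worth):
    """Log-probability of an ordered ranking under the Plackett-Luce model.

    ranking: item labels, most preferred first (may cover only a subset of items).
    worth:   dict mapping each ranked item to a positive worth parameter.

    At each stage the next item is chosen from those not yet ranked with
    probability proportional to its worth, so
        P(ranking) = prod_j worth[r_j] / sum_{k >= j} worth[r_k].
    """
    weights = [worth[item] for item in ranking]
    if any(w <= 0 for w in weights):
        raise ValueError("all worth parameters must be positive")

    log_p = 0.0
    tail = 0.0  # running sum of worths from the last-ranked item upwards
    for w in reversed(weights):
        tail += w
        log_p += math.log(w) - math.log(tail)
    return log_p


# Hypothetical worths for three items.
worth = {"a": 2.0, "b": 1.0, "c": 1.0}
full = math.exp(plackett_luce_log_prob(["a", "b", "c"], worth))
subset = math.exp(plackett_luce_log_prob(["b", "a"], worth))
```

Here `full` is (2/4) x (1/2) x 1 and `subset` is (1/3) x 1; only the items that appear in the ranking enter the denominators, which is what allows rankings of different subsets to be combined in one likelihood.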
Epitope profiling via mixture modeling of ranked data
We propose the use of probability models for ranked data as a useful
alternative to a quantitative data analysis to investigate the outcome of
bioassay experiments, when the preliminary choice of an appropriate
normalization method for the raw numerical responses is difficult or subject to
criticism. We review standard distance-based and multistage ranking models and
in this last context we propose an original generalization of the Plackett-Luce
model to account for the order of the ranking elicitation process. The
usefulness of the novel model is illustrated with its maximum likelihood
estimation for a real data set. Specifically, we address the heterogeneous
nature of experimental units via model-based clustering and detail the
necessary steps for a successful likelihood maximization through a hybrid
version of the Expectation-Maximization algorithm. The performance of the
mixture model using the new distribution as mixture components is compared with
those relative to alternative mixture models for random rankings. A discussion
on the interpretation of the identified clusters and a comparison with more
standard quantitative approaches are finally provided.
Comment: (revised to properly include references
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects
$\mathbf{S} = \{A^{(1)}:B^{(1)}, A^{(2)}:B^{(2)}, \ldots, A^{(N)}:B^{(N)}\}$,
measures how well other pairs A:B fit in with the set $\mathbf{S}$. Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in $\mathbf{S}$? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if a small set of protein pairs is
provided.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org