Modelling rankings in R: the PlackettLuce package
This paper presents the R package PlackettLuce, which implements a
generalization of the Plackett-Luce model for rankings data. The generalization
accommodates both ties (of arbitrary order) and partial rankings (complete
rankings of subsets of items). By default, the implementation adds a set of
pseudo-comparisons with a hypothetical item, ensuring that the underlying
network of wins and losses between items is always strongly connected. In this
way, the worth of each item always has a finite maximum likelihood estimate,
with finite standard error. The use of pseudo-comparisons also has a
regularization effect, shrinking the estimated parameters towards equal item
worth. In addition to standard methods for model summary, PlackettLuce provides
a method to compute quasi standard errors for the item parameters. This
provides the basis for comparison intervals that do not change with the choice
of identifiability constraint placed on the item parameters. Finally, the
package provides a method for model-based partitioning using covariates whose
values vary between rankings, enabling the identification of subgroups of
judges or settings that have different item worths. The features of the package
are demonstrated through application to classic and novel data sets.
Comment: In v2: review of software implementing alternative models to
Plackett-Luce; comparison of algorithms provided by the PlackettLuce package;
further examples of rankings where the underlying win-loss network is not
strongly connected. In addition, general editing to improve organisation and
clarity. In v3: corrected headings Table 4, minor edit
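As a rough illustration of the model this abstract builds on, the sketch below computes the probability of a ranking under the standard Plackett-Luce model without ties: items are chosen one at a time, each with probability proportional to its worth among the items not yet ranked. It is a minimal Python sketch, not the PlackettLuce package's API; the function name `plackett_luce_prob` and the worth values are assumptions made for illustration, and the generalization to ties described above is not covered.

```python
from itertools import permutations


def plackett_luce_prob(ranking, worth):
    """Probability of `ranking` (best item first) under the standard
    Plackett-Luce model without ties.

    `ranking` may cover only a subset of the items in `worth`
    (a partial ranking); only the ranked items enter the denominators.
    """
    remaining = sum(worth[item] for item in ranking)
    prob = 1.0
    for item in ranking:
        # Chance that `item` is picked next among the items still unranked.
        prob *= worth[item] / remaining
        remaining -= worth[item]
    return prob


# Hypothetical worth parameters for three items.
worth = {"a": 3.0, "b": 2.0, "c": 1.0}

p_abc = plackett_luce_prob(["a", "b", "c"], worth)  # (3/6) * (2/3) * (1/1)

# The probabilities of all complete rankings sum to one.
total = sum(plackett_luce_prob(list(p), worth) for p in permutations(worth))
```

Because only the ranked items enter each denominator, the same function also scores a ranking of a subset of items, for example `plackett_luce_prob(["b", "c"], worth)`.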
Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge?
This paper is motivated by a Eurobarometer survey on science knowledge. As part of the survey, respondents were asked to rank sources of science information in order of importance. The official statistical analysis of these data, however, failed to use the complete ranking information. We instead propose a method that treats ranked data as a set of paired comparisons, which places the problem in the standard framework of generalized linear models and also allows respondent covariates to be incorporated. An extension is proposed to allow for heterogeneity in the ranked responses. The resulting model uses a nonparametric formulation of the random effects structure, fitted using the EM algorithm. Each mass point is multivalued, with a parameter for each item. The resultant model is equivalent to a covariate latent class model, where the latent class profiles are provided by the mass point components and the covariates act on the class profiles. This provides an alternative interpretation of the fitted model. The approach is also suitable for paired comparison data.
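The idea of treating a ranking as a set of paired comparisons can be sketched as follows: every item is taken to be preferred to each item ranked below it, so a ranking of n items expands into n(n-1)/2 ordered pairs. This is a minimal Python sketch of that expansion only, not the authors' fitting procedure; the function name `ranking_to_pairs` and the item labels are assumptions made for illustration.

```python
from itertools import combinations


def ranking_to_pairs(ranking):
    """Expand a ranking (most important first) into (preferred, other) pairs.

    combinations() preserves input order, so the first element of each
    pair is always the item ranked higher.
    """
    return list(combinations(ranking, 2))


# Hypothetical ranking of three information sources by one respondent.
pairs = ranking_to_pairs(["tv", "press", "web"])
```

Each resulting pair could then be treated as one binary response in a paired-comparison model.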
Epitope profiling via mixture modeling of ranked data
We propose the use of probability models for ranked data as a useful
alternative to a quantitative data analysis to investigate the outcome of
bioassay experiments, when the preliminary choice of an appropriate
normalization method for the raw numerical responses is difficult or subject to
criticism. We review standard distance-based and multistage ranking models and
in this last context we propose an original generalization of the Plackett-Luce
model to account for the order of the ranking elicitation process. The
usefulness of the novel model is illustrated with its maximum likelihood
estimation for a real data set. Specifically, we address the heterogeneous
nature of experimental units via model-based clustering and detail the
necessary steps for a successful likelihood maximization through a hybrid
version of the Expectation-Maximization algorithm. The performance of the
mixture model using the new distribution as mixture components is compared with
those relative to alternative mixture models for random rankings. A discussion
on the interpretation of the identified clusters and a comparison with more
standard quantitative approaches are finally provided.
Comment: (revised to properly include references
Controlling Fairness and Bias in Dynamic Learning-to-Rank
Rankings are the primary interface through which many online platforms match
users to items (e.g. news, products, music, video). In these two-sided markets,
not only the users draw utility from the rankings, but the rankings also
determine the utility (e.g. exposure, revenue) for the item providers (e.g.
publishers, sellers, artists, studios). It has already been noted that
myopically optimizing utility to the users, as done by virtually all
learning-to-rank algorithms, can be unfair to the item providers. We,
therefore, present a learning-to-rank approach for explicitly enforcing
merit-based fairness guarantees to groups of items (e.g. articles by the same
publisher, tracks by the same artist). In particular, we propose a learning
algorithm that ensures notions of amortized group fairness, while
simultaneously learning the ranking function from implicit feedback data. The
algorithm takes the form of a controller that integrates unbiased estimators
for both fairness and utility, dynamically adapting both as more data becomes
available. In addition to its rigorous theoretical foundation and convergence
guarantees, we find empirically that the algorithm is highly practical and
robust.
Comment: First two authors contributed equally. In Proceedings of the 43rd
International ACM SIGIR Conference on Research and Development in Information
Retrieval 202
Copula Processes
We define a copula process which describes the dependencies between
arbitrarily many random variables independently of their marginal
distributions. As an example, we develop a stochastic volatility model,
Gaussian Copula Process Volatility (GCPV), to predict the latent standard
deviations of a sequence of random variables. To make predictions we use
Bayesian inference, with the Laplace approximation, and with Markov chain Monte
Carlo as an alternative. We find both methods comparable. We also find our
model can outperform GARCH on simulated and financial data. And unlike GARCH,
GCPV can easily handle missing data, incorporate covariates other than time,
and model a rich class of covariance structures.
Comment: 11 pages, 1 table, 1 figure. Submitted for publication. Since last
version: minor edits and reformattin