Models for Paired Comparison Data: A Review with Emphasis on Dependent Data
Thurstonian and Bradley-Terry models are the most commonly applied models in
the analysis of paired comparison data. Since their introduction, numerous
developments have been proposed in different areas. This paper provides an
updated overview of these extensions, including how to account for object- and
subject-specific covariates and how to deal with ordinal paired comparison
data. Special emphasis is given to models for dependent comparisons. Although
these models are more realistic, their use is complicated by numerical
difficulties. We therefore concentrate on implementation issues. In particular,
a pairwise likelihood approach is explored for models for dependent paired
comparison data, and a simulation study is carried out to compare the
performance of maximum pairwise likelihood with other limited information
estimation methods. The methodology is illustrated throughout using a real data
set about university paired comparisons performed by students.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the
Institute of Mathematical Statistics (http://www.imstat.org);
DOI: http://dx.doi.org/10.1214/12-STS396.
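The Bradley-Terry model at the heart of this review assigns each object i a positive worth λ_i, with P(i beats j) = λ_i / (λ_i + λ_j); under independence, the log-likelihood is a sum over comparisons. A minimal sketch in Python (the pairwise-likelihood machinery for dependent comparisons discussed in the paper is beyond this sketch; data and names are illustrative):

```python
import math

def bt_prob(worth, i, j):
    """P(object i is preferred to object j) under Bradley-Terry."""
    return worth[i] / (worth[i] + worth[j])

def bt_log_likelihood(worth, comparisons):
    """Log-likelihood of independent paired comparisons.

    comparisons: list of (winner, loser) index pairs.
    """
    return sum(math.log(bt_prob(worth, w, l)) for w, l in comparisons)

# Hypothetical example: three objects with worths 2, 1, 1.
worth = [2.0, 1.0, 1.0]
comparisons = [(0, 1), (0, 2), (1, 2)]
ll = bt_log_likelihood(worth, comparisons)
```

Models for dependent data replace this independence likelihood with one over correlated latent judgments, which is where the numerical difficulties discussed above arise.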
Efficient Bayesian Inference for Generalized Bradley-Terry Models
The Bradley-Terry model is a popular approach to describe probabilities of
the possible outcomes when elements of a set are repeatedly compared with one
another in pairs. It has found many applications including animal behaviour,
chess ranking and multiclass classification. Numerous extensions of the basic
model have also been proposed in the literature including models with ties,
multiple comparisons, group comparisons and random graphs. From a computational
point of view, Hunter (2004) has proposed efficient iterative MM
(minorization-maximization) algorithms to perform maximum likelihood estimation
for these generalized Bradley-Terry models whereas Bayesian inference is
typically performed using MCMC (Markov chain Monte Carlo) algorithms based on
tailored Metropolis-Hastings (M-H) proposals. We show here that these MM
algorithms can be reinterpreted as special instances of
Expectation-Maximization (EM) algorithms associated with suitable sets of latent
variables and propose some original extensions. These latent variables allow us
to derive simple Gibbs samplers for Bayesian inference. We demonstrate
experimentally the efficiency of these algorithms on a variety of applications.
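For the basic Bradley-Terry model, Hunter's (2004) MM update has a closed form: each worth is updated to the object's total wins divided by the sum, over its comparisons, of the reciprocal pair totals. A sketch of one such step (data and variable names are illustrative):

```python
def mm_update(worth, wins, n_pairs):
    """One MM (minorization-maximization) step for Bradley-Terry worths.

    wins[i]       -- total number of comparisons won by object i
    n_pairs[i][j] -- number of comparisons between objects i and j
    """
    new = []
    for i in range(len(worth)):
        denom = sum(n_pairs[i][j] / (worth[i] + worth[j])
                    for j in range(len(worth)) if j != i)
        new.append(wins[i] / denom)
    total = sum(new)
    return [w / total for w in new]  # normalize for identifiability

# Made-up data: objects 0 and 1 compared 3 times; object 0 won twice.
wins = [2, 1]
n_pairs = [[0, 3], [3, 0]]
worth = [0.5, 0.5]
for _ in range(100):
    worth = mm_update(worth, wins, n_pairs)
```

The EM reinterpretation described above recovers exactly this update when the latent variables are integrated out, which is what makes the corresponding Gibbs samplers simple to derive.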
Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge?
This paper is motivated by a Eurobarometer survey on science knowledge. As part of the survey, respondents were asked to rank sources of science information in order of importance. The official statistical analysis of these data, however, failed to use the complete ranking information. We instead propose a method that treats ranked data as a set of paired comparisons, which places the problem in the standard framework of generalized linear models and also allows respondent covariates to be incorporated. An extension is proposed to allow for heterogeneity in the ranked responses. The resulting model uses a nonparametric formulation of the random-effects structure, fitted using the EM algorithm. Each mass point is multivalued, with a parameter for each item. The resulting model is equivalent to a covariate latent class model, in which the latent class profiles are provided by the mass-point components and the covariates act on the class profiles. This provides an alternative interpretation of the fitted model. The approach is also suitable for paired comparison data.
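The rank-breaking step described above is mechanical: a ranking of k items implies k(k-1)/2 pairwise preferences, each earlier item being preferred to every later one. A minimal sketch (item names are invented for illustration):

```python
from itertools import combinations

def ranking_to_pairs(ranking):
    """Expand a ranking (best first) into the implied (winner, loser) pairs."""
    return [(a, b) for a, b in combinations(ranking, 2)]

# Hypothetical respondent ranking of science-information sources.
pairs = ranking_to_pairs(["TV", "press", "internet"])
```

Each resulting pair becomes one binary observation in the generalized linear model, which is how respondent covariates enter the analysis.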
Convex Optimization for Binary Classifier Aggregation in Multiclass Problems
Multiclass problems are often decomposed into multiple binary problems that
are solved by individual binary classifiers whose results are integrated into a
final answer. Various methods, including all-pairs (APs), one-versus-all (OVA),
and error-correcting output codes (ECOC), have been studied for decomposing
multiclass problems into binary problems. However, little work has been done on
optimally aggregating the binary classifiers' outputs into a final answer to the
multiclass problem. In this paper we present a convex optimization method for
an optimal aggregation of binary classifiers to estimate class membership
probabilities in multiclass problems. We model the class membership probability
as a softmax function that takes as input a conic combination of the
discrepancies induced by the individual binary classifiers. With this model, we formulate
the regularized maximum likelihood estimation as a convex optimization problem,
which is solved by the primal-dual interior point method. Connections of our
method to large margin classifiers are presented, showing that the large margin
formulation can be considered as a limiting case of our convex formulation.
Numerical experiments on synthetic and real-world data sets demonstrate that
our method outperforms existing aggregation methods as well as direct methods,
in terms of the classification accuracy and the quality of class membership
probability estimates.
Comment: Appeared in Proceedings of the 2014 SIAM International Conference on
Data Mining (SDM 2014).
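The class-membership model described above can be sketched directly: each binary classifier contributes a per-class discrepancy, these are combined with nonnegative weights (a conic combination), and a softmax maps the result to probabilities. The names, shapes, and numbers below are illustrative; the paper's actual discrepancy construction depends on the decomposition (APs/OVA/ECOC):

```python
import math

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def class_probs(discrepancies, weights):
    """discrepancies[m][k]: discrepancy of binary classifier m for class k.
    weights[m] >= 0: conic-combination coefficients.
    Returns the softmax of the negated weighted discrepancy per class."""
    n_classes = len(discrepancies[0])
    scores = [-sum(weights[m] * discrepancies[m][k]
                   for m in range(len(weights)))
              for k in range(n_classes)]
    return softmax(scores)

# Toy example: two binary classifiers, three classes (made-up numbers).
d = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.4]]
w = [1.0, 0.5]
p = class_probs(d, w)
```

The convex optimization in the paper learns the weights by regularized maximum likelihood; here they are fixed by hand purely to show the forward computation.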
Modelling rankings in R: the PlackettLuce package
This paper presents the R package PlackettLuce, which implements a
generalization of the Plackett-Luce model for rankings data. The generalization
accommodates both ties (of arbitrary order) and partial rankings (complete
rankings of subsets of items). By default, the implementation adds a set of
pseudo-comparisons with a hypothetical item, ensuring that the underlying
network of wins and losses between items is always strongly connected. In this
way, the worth of each item always has a finite maximum likelihood estimate,
with finite standard error. The use of pseudo-comparisons also has a
regularization effect, shrinking the estimated parameters towards equal item
worth. In addition to standard methods for model summary, PlackettLuce provides
a method to compute quasi standard errors for the item parameters. This
provides the basis for comparison intervals that do not change with the choice
of identifiability constraint placed on the item parameters. Finally, the
package provides a method for model-based partitioning using covariates whose
values vary between rankings, enabling the identification of subgroups of
judges or settings that have different item worths. The features of the package
are demonstrated through application to classic and novel data sets.
Comment: In v2: review of software implementing alternative models to
Plackett-Luce; comparison of algorithms provided by the PlackettLuce package;
further examples of rankings where the underlying win-loss network is not
strongly connected. In addition, general editing to improve organisation and
clarity. In v3: corrected headings in Table 4, minor edits.
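Under the Plackett-Luce model that the package generalizes, a complete ranking is a sequence of choices: at each stage the next item is selected with probability proportional to its worth among the items still remaining. A sketch of the ranking probability (ties and partial rankings, which PlackettLuce also handles, are omitted; the worths are made up):

```python
def pl_ranking_prob(worth, ranking):
    """Probability of observing `ranking` (best first) under Plackett-Luce.

    worth: dict mapping item -> positive worth parameter.
    """
    remaining = list(ranking)
    prob = 1.0
    for item in ranking:
        # Choose `item` with probability proportional to its worth
        # among the items not yet ranked.
        prob *= worth[item] / sum(worth[r] for r in remaining)
        remaining.remove(item)
    return prob

worth = {"A": 2.0, "B": 1.0, "C": 1.0}
p = pl_ranking_prob(worth, ["A", "B", "C"])
```

When the win-loss network is not strongly connected, some worths have no finite maximum likelihood estimate, which is what the package's pseudo-comparisons with a hypothetical item are designed to prevent.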
Modelling dependency in multivariate paired comparisons: a log-linear approach
A log-linear representation of the Bradley-Terry model is presented for multivariate paired comparison data, where judges are asked to compare pairs of objects on more than one attribute. By converting such data to multiple binomial responses, dependencies between the decisions of the judges as well as possible association structures between the attributes can be incorporated in the model, providing an advantage over parallel univariate analyses of individual attributes. The approach outlined gives parameters which can be interpreted as (conditional) log-odds and log-odds ratios. As the model is a generalised linear model, parameter estimation can use standard software and the GLM framework can be used to test hypotheses on these parameters.
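The GLM view used above rests on a standard construction: a comparison of objects i and j becomes a binary response whose design row is +1 in column i and -1 in column j, so the logit of P(i beats j) equals log λ_i - log λ_j. A sketch of the design construction (fitting would then use any logistic-regression routine; the example data are invented):

```python
def bt_design(comparisons, n_objects):
    """Build (X, y) for the logistic-regression form of Bradley-Terry.

    comparisons: list of (i, j, i_won) triples.
    Row r of X is +1 in column i and -1 in column j; y[r] is 1 if i won.
    """
    X, y = [], []
    for i, j, i_won in comparisons:
        row = [0] * n_objects
        row[i], row[j] = 1, -1
        X.append(row)
        y.append(1 if i_won else 0)
    return X, y

X, y = bt_design([(0, 1, True), (0, 2, False), (1, 2, True)], 3)
```

The multivariate log-linear model above extends this single-attribute construction with interaction terms capturing dependence between attributes and between a judge's decisions.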
Bradley-Terry models in R: the BradleyTerry2 package
This is a short overview of the R add-on package BradleyTerry2, which facilitates the specification and fitting of Bradley-Terry logit, probit or cauchit models to pair-comparison data. Included are the standard 'unstructured' Bradley-Terry model, structured versions in which the parameters are related through a linear predictor to explanatory variables, and the possibility of an order or 'home advantage' effect or other 'contest-specific' effects. Model fitting is either by maximum likelihood, by penalized quasi-likelihood (for models which involve a random effect), or by bias-reduced maximum likelihood in which the first-order asymptotic bias of parameter estimates is eliminated. Also provided are a simple and efficient approach to handling missing covariate data, and suitably defined residuals for diagnostic checking of the linear predictor.
Accounting for Individual Differences in Bradley-Terry Models by Means of Recursive Partitioning
The preference scaling of a group of subjects may not be homogeneous, but different
groups of subjects with certain characteristics may show different preference scalings,
each of which can be derived from paired comparisons by means of the Bradley-Terry model.
Usually, either different models are fit in predefined subsets of the
sample, or the effects of subject covariates are explicitly specified in a parametric
model. In both cases, categorical covariates can be employed directly to distinguish
between the different groups, while numeric covariates are typically discretized
prior to modeling.
Here, a semi-parametric approach for recursive partitioning of Bradley-Terry models is
introduced as a means for identifying groups of subjects with homogeneous preference scalings
in a data-driven way. In this approach, the covariates that distinguish (through
main effects or interactions) between groups of subjects with different preference
orderings are detected automatically from the set of candidate covariates. One main
advantage of this approach is that sensible partitions in numeric covariates are
also detected automatically.
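The core of such a partitioning step can be sketched for a single numeric covariate: try each candidate cutpoint, fit a Bradley-Terry model in each resulting subgroup, and keep the cutpoint that maximizes the combined log-likelihood. This is a simplified likelihood-based caricature of the approach (the actual method uses parameter-instability tests), with entirely made-up data:

```python
import math

def fit_bt(comparisons, n_objects, iters=50):
    """Fit Bradley-Terry worths by MM-style iterations (a rough sketch)."""
    worth = [1.0] * n_objects
    for _ in range(iters):
        new = []
        for i in range(n_objects):
            wins = sum(1 for w, l in comparisons if w == i)
            denom = sum(1.0 / (worth[w] + worth[l])
                        for w, l in comparisons if i in (w, l))
            new.append(wins / denom if denom else worth[i])
        s = sum(new)
        worth = [v / s for v in new]
    return worth

def log_lik(worth, comparisons):
    return sum(math.log(worth[w] / (worth[w] + worth[l]))
               for w, l in comparisons)

def best_split(records, n_objects):
    """records: (covariate, winner, loser) triples. Return the cutpoint whose
    two subgroup Bradley-Terry fits maximize the combined log-likelihood."""
    values = sorted({c for c, _, _ in records})
    best, best_ll = None, -math.inf
    for cut in values[1:]:  # split: covariate < cut vs. >= cut
        left = [(w, l) for c, w, l in records if c < cut]
        right = [(w, l) for c, w, l in records if c >= cut]
        if not left or not right:
            continue
        ll = (log_lik(fit_bt(left, n_objects), left)
              + log_lik(fit_bt(right, n_objects), right))
        if ll > best_ll:
            best, best_ll = cut, ll
    return best

# Made-up data: subjects under 30 prefer object 0, older subjects object 1.
records = ([(a, 0, 1) for a in (20, 22, 25, 28)]
           + [(a, 1, 0) for a in (35, 40, 45, 50)])
cut = best_split(records, 2)
```

Applied recursively, and with a stopping rule based on significance tests rather than raw likelihood gain, this yields the tree of subgroup-specific preference scalings described above.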