We consider the predictive problem of supervised ranking, where the task is
to rank sets of candidate items returned in response to queries. Although there
exist statistical procedures that come with guarantees of consistency in this
setting, these procedures require that individuals provide a complete ranking
of all items, which is rarely feasible in practice. Instead, individuals
routinely provide partial preference information, such as pairwise comparisons
of items, and more practical approaches to ranking have aimed at modeling this
partial preference data directly. As we show, however, such an approach raises
serious theoretical challenges. Indeed, we demonstrate that many commonly used
surrogate losses for pairwise comparison data do not yield consistency;
surprisingly, we show inconsistency even in low-noise settings. With these
negative results as motivation, we present a new approach to supervised ranking
based on aggregation of partial preferences, and we develop U-statistic-based
empirical risk minimization procedures. We present an asymptotic analysis of
these new procedures, showing that they yield consistency results that parallel
those available for classification. We complement our theoretical results with
an experiment studying the new procedures in a large-scale web-ranking task.

Comment: Published at http://dx.doi.org/10.1214/13-AOS1142 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org