Listwise learning-to-rank methods form a powerful class of ranking algorithms
that are widely adopted in applications such as information retrieval. These
algorithms learn to rank a set of items by optimizing a loss that is a function
of the entire set -- as a surrogate for a typically non-differentiable ranking
metric. Despite their empirical success, existing listwise methods are based on
heuristics and remain theoretically ill-understood. In particular, none of the
empirically successful loss functions is provably related to the ranking
metrics it is meant to optimize. In this
work, we propose a cross entropy-based learning-to-rank loss function that is
theoretically sound, is a convex bound on NDCG -- a popular ranking metric --
and is consistent with NDCG under learning scenarios common in information
retrieval. Furthermore, empirical evaluation of an implementation of the
proposed method with gradient boosting machines on benchmark learning-to-rank
datasets demonstrates the superiority of our proposed formulation over existing
algorithms in terms of both quality and robustness.
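To make the listwise setup concrete, the sketch below shows a generic softmax cross-entropy listwise loss in the spirit described above: a target distribution is derived from graded relevance labels using the exponential gain that appears in NDCG, and the cross entropy against the model's score distribution is minimized. This is an illustrative baseline formulation, not necessarily the exact loss proposed in this work; the gain definition and normalization are assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the list of item scores.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def listwise_cross_entropy(scores, relevances):
    # Target distribution from graded relevance labels, using the
    # NDCG-style gain 2^rel - 1, normalized to sum to one.
    # (Assumed formulation for illustration only.)
    gains = np.power(2.0, relevances) - 1.0
    target = gains / gains.sum()
    # Model's distribution over items, induced by its scores.
    pred = softmax(scores)
    # Cross entropy between target and predicted distributions;
    # small epsilon guards against log(0).
    return -np.sum(target * np.log(pred + 1e-12))

# Example: a list of three items with graded relevance labels 3, 1, 0.
scores = np.array([2.0, 0.5, -1.0])
rels = np.array([3.0, 1.0, 0.0])
loss = listwise_cross_entropy(scores, rels)
```

Because the loss is a function of the entire list, reordering the scores so that less relevant items are ranked higher increases it, which is the sense in which such losses act as surrogates for a ranking metric.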