The distillation of ranking models has become an important topic in both
academia and industry. In recent years, several advanced methods have been
proposed to tackle this problem, often leveraging ranking information from
teacher rankers that is absent in traditional classification settings. To date,
there is no well-established consensus on how to evaluate this class of models.
Moreover, inconsistent benchmarking on a wide range of tasks and datasets makes
it difficult to assess or invigorate advances in this field. This paper first
examines representative prior work on ranking distillation and raises three
questions to be answered around methodology and reproducibility. To that end,
we propose a systematic and unified benchmark, Ranking Distillation Suite
(RD-Suite), a suite of tasks built on four large real-world datasets,
encompassing two major modalities (textual and numeric) and two applications
(standard distillation and distillation transfer). RD-Suite consists of
benchmark results that challenge some of the common wisdom in the field, and
the release of datasets with teacher scores and evaluation scripts for future
research. RD-Suite paves the way towards a better understanding of ranking
distillation, facilitates more research in this direction, and presents new
challenges.

Comment: 15 pages, 2 figures. arXiv admin note: text overlap with
arXiv:2011.04006 by other author