A retrieval model should not only interpolate the training data but also
extrapolate well to queries that differ from the training data.
While neural retrieval models have demonstrated impressive performance on
ad-hoc search benchmarks, we still know little about how they perform in terms
of interpolation and extrapolation. In this paper, we demonstrate the
importance of separately evaluating the two capabilities of neural retrieval
models. Firstly, we examine existing ad-hoc search benchmarks from the two
perspectives. We investigate the distribution of training and test data and
find a considerable overlap in query entities, query intent, and relevance
labels. This finding implies that evaluation on these test sets is biased
toward interpolation and cannot accurately reflect a model's extrapolation capacity.
Secondly, we propose a novel evaluation protocol that separately measures the
interpolation and extrapolation performance on existing benchmark datasets. It
resamples the training and test data based on query similarity and uses the
resampled data for training and evaluation.
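To make the protocol concrete, here is a minimal Python sketch of similarity-based resampling; it is an illustration under stated assumptions, not the paper's released implementation. It assumes queries are embedded with a sentence-transformers encoder and that a test query counts as extrapolation when its maximum cosine similarity to any training query falls below a threshold; the encoder name and threshold value are hypothetical choices.

```python
# Minimal sketch of similarity-based resampling (not the authors' exact code).
from sentence_transformers import SentenceTransformer

def split_by_similarity(train_queries, test_queries, threshold=0.5):
    """Partition test queries into interpolation/extrapolation subsets based
    on their maximum cosine similarity to any training query."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder
    train_emb = model.encode(train_queries, normalize_embeddings=True)
    test_emb = model.encode(test_queries, normalize_embeddings=True)
    # With normalized embeddings, cosine similarity is a dot product.
    max_sim = (test_emb @ train_emb.T).max(axis=1)
    interpolation = [q for q, s in zip(test_queries, max_sim) if s >= threshold]
    extrapolation = [q for q, s in zip(test_queries, max_sim) if s < threshold]
    return interpolation, extrapolation

if __name__ == "__main__":
    train = ["who won the world cup 2018", "capital of france"]
    test = ["world cup 2018 winner", "average rainfall in peru"]
    interp, extrap = split_by_similarity(train, test)
    print("interpolation:", interp)
    print("extrapolation:", extrap)
```

The same thresholding idea can be applied in reverse to resample the training set, so that a model is trained only on queries dissimilar to the held-out test queries.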
Finally, we leverage the proposed protocol to comprehensively revisit a number
of widely adopted neural retrieval models. Results show that models perform differently
when moving from interpolation to extrapolation. For example,
representation-based retrieval models perform almost as well as
interaction-based retrieval models in terms of interpolation but not
extrapolation. Therefore, it is necessary to evaluate interpolation and
extrapolation performance separately, and the proposed resampling method
serves as a simple yet effective evaluation tool for future IR studies.