ABNIRML: Analyzing the Behavior of Neural IR Models
Numerous studies have demonstrated the effectiveness of pretrained
contextualized language models such as BERT and T5 for ad-hoc search. However,
it is not well-understood why these methods are so effective, what makes some
variants more effective than others, and what pitfalls they may have. We
present a new comprehensive framework for Analyzing the Behavior of Neural IR
ModeLs (ABNIRML), which includes new types of diagnostic tests that allow us to
probe several characteristics---such as sensitivity to word order---that are
not addressed by previous techniques. To demonstrate the value of the
framework, we conduct an extensive empirical study that yields insights into
the factors that contribute to neural models' gains and identifies potential
unintended biases that the models exhibit. We find evidence that recent neural
ranking models have fundamentally different characteristics from prior ranking
models. For instance, these models can be highly influenced by altered document
word order, sentence order, and inflectional endings. They can also exhibit
unexpected behaviors when additional content is added to documents, or when
documents are expressed with different levels of fluency or formality. We find
that these differences can depend on the architecture and not just the
underlying language model.
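A word-order diagnostic of the kind the abstract describes can be sketched with toy scorers. The two scorers below are illustrative stand-ins (assumptions, not the paper's neural rankers or the ABNIRML framework): an order-insensitive bag-of-words scorer and an order-sensitive bigram-overlap scorer. The probe measures how much each score drops when a document's words are shuffled.

```python
import random

# Sketch of a word-order diagnostic probe in the spirit of ABNIRML.
# Both scorers are toy stand-ins, not the paper's models.

def bow_score(query: str, doc: str) -> float:
    """Order-insensitive: count distinct query terms present in the document."""
    doc_terms = set(doc.split())
    return float(sum(1 for t in set(query.split()) if t in doc_terms))

def bigram_score(query: str, doc: str) -> float:
    """Order-sensitive: count query bigrams that also appear in the document."""
    def bigrams(text: str) -> set:
        toks = text.split()
        return set(zip(toks, toks[1:]))
    return float(len(bigrams(query) & bigrams(doc)))

def word_order_probe(scorer, query: str, doc: str, trials: int = 20) -> float:
    """Mean score drop after shuffling the document's word order."""
    rng = random.Random(0)  # fixed seed so the probe is repeatable
    base = scorer(query, doc)
    total = 0.0
    for _ in range(trials):
        toks = doc.split()
        rng.shuffle(toks)
        total += base - scorer(query, " ".join(toks))
    return total / trials

query = "neural ranking models"
doc = "recent neural ranking models differ from prior ranking models"
print(word_order_probe(bow_score, query, doc))     # 0.0: unaffected by word order
print(word_order_probe(bigram_score, query, doc))  # non-negative drop: order-sensitive
```

A classical bag-of-words ranker registers no change under this probe, while any scorer that consumes token sequences does, which is the kind of behavioral contrast such diagnostic tests are designed to surface.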
Pretrained Transformers for Text Ranking: BERT and Beyond
The goal of text ranking is to generate an ordered list of texts retrieved
from a corpus in response to a query. Although the most common formulation of
text ranking is search, instances of the task can also be found in many natural
language processing applications. This survey provides an overview of text
ranking with neural network architectures known as transformers, of which BERT
is the best-known example. The combination of transformers and self-supervised
pretraining has been responsible for a paradigm shift in natural language
processing (NLP), information retrieval (IR), and beyond. In this survey, we
provide a synthesis of existing work as a single point of entry for
practitioners who wish to gain a better understanding of how to apply
transformers to text ranking problems and researchers who wish to pursue work
in this area. We cover a wide range of modern techniques, grouped into two
high-level categories: transformer models that perform reranking in multi-stage
architectures and dense retrieval techniques that perform ranking directly.
There are two themes that pervade our survey: techniques for handling long
documents, beyond typical sentence-by-sentence processing in NLP, and
techniques for addressing the tradeoff between effectiveness (i.e., result
quality) and efficiency (e.g., query latency, model and index size). Although
transformer architectures and pretraining techniques are recent innovations,
many aspects of how they are applied to text ranking are relatively well
understood and represent mature techniques. However, there remain many open
research questions, and thus in addition to laying out the foundations of
pretrained transformers for text ranking, this survey also attempts to
prognosticate where the field is heading.
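The survey's two high-level categories can be illustrated with a toy two-stage pipeline. The hashed bag-of-words "embeddings" and the joint term-overlap scorer below are assumptions standing in for a learned bi-encoder and a cross-encoder reranker; this is a minimal sketch of the pipeline shape, not a real retrieval system.

```python
import zlib

# Toy sketch of the survey's two categories: a dense first-stage retriever
# (documents scored independently against the query) and a second-stage
# reranker (a joint query-document score applied only to the candidates).
# Hashed bag-of-words vectors and term overlap are illustrative stand-ins.

DIM = 64

def embed(text: str) -> list:
    """Toy 'dense' embedding: hashed bag-of-words vector (deterministic crc32)."""
    v = [0.0] * DIM
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % DIM] += 1.0
    return v

def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))

def dense_retrieve(query: str, corpus: list, k: int = 3) -> list:
    """First stage: rank the whole corpus by embedding dot product, keep top k."""
    q = embed(query)
    return sorted(corpus, key=lambda d: dot(q, embed(d)), reverse=True)[:k]

def rerank(query: str, candidates: list) -> list:
    """Second stage: rescore only the candidates with a joint query-doc score."""
    q_terms = set(query.lower().split())
    def joint_score(doc: str) -> float:
        return len(q_terms & set(doc.lower().split())) / max(len(q_terms), 1)
    return sorted(candidates, key=joint_score, reverse=True)

corpus = [
    "transformers for text ranking",
    "dense retrieval with bi-encoders",
    "a survey of information retrieval",
    "cooking pasta at home",
]
query = "text ranking with transformers"
top = rerank(query, dense_retrieve(query, corpus, k=3))
print(top[0])
```

The shape of the pipeline mirrors the tradeoff the survey emphasizes: the first stage is cheap because document vectors can be precomputed and indexed, while the joint second stage is more expensive per pair and is therefore only applied to a small candidate set.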