Pretrained Transformers for Text Ranking: BERT and Beyond
The goal of text ranking is to generate an ordered list of texts retrieved
from a corpus in response to a query. Although the most common formulation of
text ranking is search, instances of the task can also be found in many natural
language processing applications. This survey provides an overview of text
ranking with neural network architectures known as transformers, of which BERT
is the best-known example. The combination of transformers and self-supervised
pretraining has been responsible for a paradigm shift in natural language
processing (NLP), information retrieval (IR), and beyond. In this survey, we
provide a synthesis of existing work as a single point of entry for
practitioners who wish to gain a better understanding of how to apply
transformers to text ranking problems and researchers who wish to pursue work
in this area. We cover a wide range of modern techniques, grouped into two
high-level categories: transformer models that perform reranking in multi-stage
architectures and dense retrieval techniques that perform ranking directly.
There are two themes that pervade our survey: techniques for handling long
documents, beyond typical sentence-by-sentence processing in NLP, and
techniques for addressing the tradeoff between effectiveness (i.e., result
quality) and efficiency (e.g., query latency, model and index size). Although
transformer architectures and pretraining techniques are recent innovations,
many aspects of how they are applied to text ranking are relatively well
understood and represent mature techniques. However, there remain many open
research questions, and thus in addition to laying out the foundations of
pretrained transformers for text ranking, this survey also attempts to
prognosticate where the field is heading.
Answering Consumer Health Questions on the Web
Question answering is an important subtask in the field of information retrieval. Question answering has typically relied on trustworthy sources of information such as Wikipedia. In this work, we look at answering health questions using the web. The web offers the means to answer general medical questions on a variety of topics but comes with the downside of being rife with misinformation and contradictory information. We develop our techniques using the TREC Health Misinformation tracks, which use consumer health questions as topics and web crawls as their document collection.
In this work, we implement a document filtering technique based on topic-sensitive PageRank that uses a web graph of the hosts in Common Crawl. We develop a new passage extraction technique that performs query-based contextualized sentence selection. We test this technique on a multi-span extractive question answering dataset. We also develop an answer aggregation technique that can combine language features and manual features to predict answers to these consumer health questions. We test all of these approaches on the TREC Health Misinformation Track. We show that in the majority of cases these techniques improve performance.
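The host-level filtering idea can be illustrated with a minimal sketch of topic-sensitive PageRank, in which the random surfer teleports only to a set of seed hosts rather than to every host. The host names, the seed set, the damping factor, and the iteration count below are all assumptions made for illustration; the abstract does not say how the authors chose seeds or turned scores into a filter.

```python
def topic_sensitive_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration in which teleportation lands only on seed hosts.

    graph maps each host to the list of hosts it links to.
    """
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    seeds = [s for s in seeds if s in nodes]
    if not seeds:
        raise ValueError("at least one seed host must appear in the graph")

    # Teleport distribution: uniform over the seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = graph.get(n, [])
            if targets:
                # Split this host's rank evenly across its outgoing links.
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling host: send its rank back through the teleport set.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
        rank = new_rank
    return rank


# Hypothetical host graph; none of these hosts come from the paper.
host_graph = {
    "trusted.example": ["clinic.example"],
    "clinic.example": ["trusted.example"],
    "spam.example": ["spam2.example", "trusted.example"],
    "spam2.example": ["spam.example"],
}
scores = topic_sensitive_pagerank(host_graph, seeds=["trusted.example"])
for host, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{host}: {score:.4f}")
```

In this toy graph, hosts that the seed cannot reach by following links end up with a score of zero, so a filter could drop documents from hosts scoring below some threshold. The threshold itself would be a further assumption.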
EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval
Large neural models (such as Transformers) achieve state-of-the-art
performance for information retrieval (IR). In this paper, we aim to improve
distillation methods that pave the way for the resource-efficient deployment of
such models in practice. Inspired by our theoretical analysis of the
teacher-student generalization gap for IR models, we propose a novel
distillation approach that leverages the relative geometry among queries and
documents learned by the large teacher model. Unlike existing teacher
score-based distillation methods, our proposed approach employs embedding
matching tasks to provide a stronger signal to align the representations of the
teacher and student models. In addition, it utilizes query generation to
explore the data manifold to reduce the discrepancies between the student and
the teacher where training data is sparse. Furthermore, our analysis also
motivates novel asymmetric architectures for student models that realize
better embedding alignment without increasing online inference cost. On
standard benchmarks like MSMARCO, we show that our approach successfully
distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to
1/10th size asymmetric students that can retain 95-97% of the teacher
performance.
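The contrast between score-based and embedding-based distillation can be made concrete with a small sketch. It is not the paper's objective: it assumes that student and teacher embeddings share the same dimension, that scores are dot products, and that both losses are plain squared errors. The function names and the toy vectors are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_distillation_loss(student_pairs, teacher_pairs):
    """Mean squared difference between student and teacher query-document scores."""
    total = 0.0
    for (sq, sd), (tq, td) in zip(student_pairs, teacher_pairs):
        total += (dot(sq, sd) - dot(tq, td)) ** 2
    return total / len(teacher_pairs)


def embedding_matching_loss(student_embs, teacher_embs):
    """Mean squared Euclidean distance between paired embeddings."""
    total = 0.0
    for s, t in zip(student_embs, teacher_embs):
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(teacher_embs)


# Toy teacher embeddings for one (query, document) pair.
teacher_q, teacher_d = [1.0, 0.0], [1.0, 1.0]
# A student whose embeddings are the teacher's rotated by 90 degrees:
# the dot product is preserved, but the vectors themselves differ.
student_q, student_d = [0.0, 1.0], [-1.0, 1.0]

score_loss = score_distillation_loss(
    [(student_q, student_d)], [(teacher_q, teacher_d)]
)
emb_loss = embedding_matching_loss(
    [student_q, student_d], [teacher_q, teacher_d]
)
print("score loss:", score_loss)
print("embedding loss:", emb_loss)
```

The rotated student reproduces the teacher's score exactly, so the score-based loss sees nothing to correct, while the embedding-matching loss still penalizes the misaligned representations. This is one way to read the abstract's claim that embedding matching gives a stronger alignment signal.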