
    Pretrained Transformers for Text Ranking: BERT and Beyond

    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
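
    The survey's two high-level categories can be pictured with a short sketch. The snippet below is illustrative only and is not taken from the survey: the embed and rerank_score functions are toy stand-ins (a hashed bag-of-words encoder and plain token overlap) for a learned bi-encoder and a transformer cross-encoder, and in a real multi-stage system the first stage might instead be BM25.

```python
import zlib
import numpy as np

def embed(text, dim=256):
    """Toy stand-in for a learned bi-encoder: hashed bag-of-words embedding."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def rerank_score(query, doc):
    """Toy stand-in for a cross-encoder reranker: plain token overlap."""
    q_toks, d_toks = set(query.lower().split()), set(doc.lower().split())
    return len(q_toks & d_toks) / max(len(q_toks), 1)

corpus = [
    "transformers for text ranking",
    "dense retrieval with dual encoders",
    "classical bm25 baselines for web search",
]
query = "neural text ranking"

# Dense retrieval: score every document directly by embedding similarity.
doc_vecs = np.stack([embed(d) for d in corpus])
dense_scores = doc_vecs @ embed(query)
dense_order = list(np.argsort(-dense_scores))

# Multi-stage reranking: keep the top-k first-stage candidates, then rescore
# them with a (nominally more expensive, more accurate) second-stage model.
top_k = dense_order[:2]
reranked = sorted(top_k, key=lambda i: rerank_score(query, corpus[i]), reverse=True)
print("dense order:", dense_order, "| reranked top-k:", reranked)
```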

    Answering Consumer Health Questions on the Web

    Question answering is an important subtask in the field of information retrieval. Question answering has typically drawn on reliable sources of information such as Wikipedia. In this work, we look at answering health questions using the web. The web offers the means to answer general medical questions on a variety of topics but comes with the downside of being rife with misinformation and contradictory information. We develop our techniques using the TREC Health Misinformation tracks, which use consumer health questions as topics and web crawls as their document collections. In this work, we implement a document filtering technique based on topic-sensitive PageRank that uses a web graph of the hosts in Common Crawl. We develop a new passage extraction technique that performs query-based contextualized sentence selection. We test this technique on a multi-span extractive question answering dataset. We also develop an answer aggregation technique that combines language features and manual features to predict answers to these consumer health questions. We test all of these approaches on the TREC Health Misinformation Track and show that, in the majority of cases, these techniques provide an uplift in performance.
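
    The document filtering step mentioned above builds on topic-sensitive PageRank over a host graph. The sketch below shows only the generic technique (personalized PageRank via power iteration with a teleport vector concentrated on on-topic hosts); the host names, link structure, and damping factor are made up for illustration and are not the authors' actual Common Crawl graph or implementation.

```python
import numpy as np

# Toy host graph: adjacency[i, j] == 1 means hosts[i] links to hosts[j].
hosts = ["trusted-health.org", "blog.example.com", "misinfo.example.net", "news.example.com"]
adjacency = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

# Column-stochastic transition matrix: each host spreads its rank uniformly
# over its outgoing links.
out_degree = adjacency.sum(axis=1, keepdims=True)
transition = (adjacency / np.where(out_degree == 0, 1, out_degree)).T

# Topic-sensitive teleport vector: random-jump mass only on hosts judged
# on-topic / trustworthy for the health domain.
teleport = np.array([1.0, 0.0, 0.0, 1.0])
teleport /= teleport.sum()

def topic_sensitive_pagerank(transition, teleport, damping=0.85, iters=100):
    """Power iteration for personalized (topic-sensitive) PageRank."""
    rank = np.full(len(teleport), 1.0 / len(teleport))
    for _ in range(iters):
        rank = damping * transition @ rank + (1 - damping) * teleport
    return rank

scores = topic_sensitive_pagerank(transition, teleport)
for host, score in sorted(zip(hosts, scores), key=lambda x: -x[1]):
    print(f"{host}: {score:.3f}")
```

    Hosts with low scores under the topic-biased teleport vector can then be filtered out of the candidate document set before passage extraction.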

    EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

    Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). In this paper, we aim to improve distillation methods that pave the way for the resource-efficient deployment of such models in practice. Inspired by our theoretical analysis of the teacher-student generalization gap for IR models, we propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model. Unlike existing teacher score-based distillation methods, our proposed approach employs embedding matching tasks to provide a stronger signal to align the representations of the teacher and student models. In addition, it utilizes query generation to explore the data manifold and reduce the discrepancies between the student and the teacher where training data is sparse. Furthermore, our analysis also motivates novel asymmetric architectures for student models that realize better embedding alignment without increasing online inference cost. On standard benchmarks like MSMARCO, we show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th-size asymmetric students that can retain 95-97% of the teacher performance.
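
    The core idea of combining score-based distillation with embedding matching can be sketched as follows. This is a toy illustration under assumptions that are not from the paper: the loss is a simple sum of a score-matching term and mean-squared embedding-matching terms weighted by lam, and it assumes the student embedding dimension equals the teacher's (in practice a projection layer would typically bridge differing dimensions).

```python
import numpy as np

def embedding_matching_distillation_loss(teacher_q, teacher_d, student_q, student_d, lam=1.0):
    """Toy distillation objective: score matching plus embedding matching.

    teacher_q, student_q: (batch, dim) query embeddings
    teacher_d, student_d: (batch, dim) document embeddings
    """
    # Score-based KD: match teacher and student query-document dot-product scores.
    teacher_scores = np.sum(teacher_q * teacher_d, axis=1)
    student_scores = np.sum(student_q * student_d, axis=1)
    score_loss = np.mean((teacher_scores - student_scores) ** 2)

    # Embedding matching: pull student representations toward the teacher's,
    # so the relative geometry of queries and documents is preserved.
    embed_loss = (np.mean((teacher_q - student_q) ** 2)
                  + np.mean((teacher_d - student_d) ** 2))

    return score_loss + lam * embed_loss

# Synthetic example: the student starts as a noisy copy of the teacher.
rng = np.random.default_rng(0)
tq, td = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
sq, sd = tq + 0.1 * rng.normal(size=(4, 8)), td + 0.1 * rng.normal(size=(4, 8))
print(embedding_matching_distillation_loss(tq, td, sq, sd))
```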