Large Language Models for Information Retrieval: A Survey
As a primary means of information acquisition, information retrieval (IR)
systems, such as search engines, have integrated themselves into our daily
lives. These systems also serve as components of dialogue, question-answering,
and recommender systems. The trajectory of IR has evolved dynamically from its
origins in term-based methods to its integration with advanced neural models.
While neural models excel at capturing complex contextual signals and
semantic nuances, thereby reshaping the IR landscape, they still face
challenges such as data scarcity, interpretability, and the generation of
contextually plausible yet potentially inaccurate responses. This evolution
requires a combination of both traditional methods (such as term-based sparse
retrieval methods with rapid response) and modern neural architectures (such as
language models with powerful language understanding capacity). Meanwhile, the
emergence of large language models (LLMs), typified by ChatGPT and GPT-4, has
revolutionized natural language processing due to their remarkable language
understanding, generation, generalization, and reasoning abilities.
Consequently, recent research has sought to leverage LLMs to improve IR
systems. Given the rapid evolution of this research trajectory, it is necessary
to consolidate existing methodologies and provide nuanced insights through a
comprehensive overview. In this survey, we delve into the confluence of LLMs
and IR systems, including crucial aspects such as query rewriters, retrievers,
rerankers, and readers. Additionally, we explore promising directions within
this expanding field.
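To make the surveyed pipeline concrete, here is a minimal sketch of an LLM-augmented retrieve-then-read flow covering the four roles named above (query rewriter, retriever, reranker, reader). All component interfaces (rewrite_query, retriever.retrieve, reranker.rerank, llm.generate) are hypothetical placeholders, not the API of any particular system.

```python
# Minimal sketch of an LLM-augmented IR pipeline (hypothetical interfaces).

def rewrite_query(llm, query: str) -> str:
    # Query rewriter: the LLM reformulates the raw query, e.g. expanding
    # or disambiguating it before retrieval.
    return llm.generate(f"Rewrite this search query to be clearer: {query}")

def answer(llm, retriever, reranker, query: str, k: int = 100) -> str:
    q = rewrite_query(llm, query)                 # query rewriter
    candidates = retriever.retrieve(q, top_k=k)   # first-stage retriever
    ranked = reranker.rerank(q, candidates)       # reranker refines ordering
    context = "\n".join(doc.text for doc in ranked[:5])
    # Reader: the LLM generates an answer grounded in the top-ranked documents.
    return llm.generate(f"Context:\n{context}\n\nQuestion: {q}\nAnswer:")
```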
Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models
Neural text ranking models have witnessed significant advancement and are
increasingly being deployed in practice. Unfortunately, they also inherit
adversarial vulnerabilities of general neural models, which have been detected
but remain underexplored in prior studies. Moreover, these inherent adversarial
vulnerabilities might be leveraged by black-hat SEO to defeat better-protected
search engines. In this study, we propose an imitation adversarial attack on
black-box neural passage ranking models. We first show that the target passage
ranking model can be made transparent and imitated by enumerating critical
queries/candidates and then training a ranking imitation model. Leveraging the
ranking imitation model, we can elaborately manipulate the ranking results and
transfer the manipulation attack to the target ranking model. For this purpose,
we propose an innovative gradient-based attack method, empowered by the
pairwise objective function, to generate adversarial triggers that cause
premeditated disorder with very few tokens. To camouflage the triggers, we add
a next-sentence-prediction loss and a language-model fluency constraint to the
objective function. Experimental results on passage
ranking demonstrate the effectiveness of the ranking imitation attack model and
adversarial triggers against various SOTA neural ranking models. Furthermore,
various mitigation analyses and human evaluation show the effectiveness of
camouflages when facing potential mitigation approaches. To motivate other
scholars to further investigate this novel and important problem, we make the
experimental data and code publicly available.
Comment: 15 pages, 4 figures, accepted by ACM CCS 2022, Best Paper Nomination
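As a rough illustration of the attack objective described above (an assumed form, not the authors' exact implementation), the sketch below combines a pairwise hinge term, which pushes the trigger-bearing passage above a higher-ranked anchor under the ranking imitation model, with a language-model fluency term; the next-sentence-prediction camouflage loss would be a further additive term, omitted here for brevity.

```python
import torch

def attack_loss(score_adv: torch.Tensor,
                score_anchor: torch.Tensor,
                lm_log_probs: torch.Tensor,
                margin: float = 1.0,
                fluency_weight: float = 0.1) -> torch.Tensor:
    # Pairwise objective: the imitation model's score for the adversarial
    # passage should exceed the anchor passage's score by a margin.
    pairwise = torch.relu(margin - (score_adv - score_anchor)).mean()
    # Fluency constraint: maximizing the LM log-likelihood of the trigger
    # tokens keeps the trigger reading like natural text (camouflage).
    fluency = -lm_log_probs.mean()
    return pairwise + fluency_weight * fluency
```

Gradients of such a loss with respect to the trigger token embeddings would then guide which tokens to substitute, in the style of gradient-based (HotFlip-like) attacks.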
Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval
Neural networks with deep architectures have demonstrated significant
performance improvements in computer vision, speech recognition, and natural
language processing. The challenges in information retrieval (IR), however, are
different from these other application areas. A common form of IR involves
ranking of documents--or short passages--in response to keyword-based queries.
Effective IR systems must deal with the query-document vocabulary mismatch
by modeling relationships between different query and document terms and how
they indicate relevance. Models should also consider lexical matches when the
query contains rare terms--such as a person's name or a product model
number--not seen during training, and avoid retrieving semantically related
but irrelevant results. In many real-life IR tasks, the retrieval involves
extremely large collections--such as the document index of a commercial Web
search engine--containing billions of documents. Efficient IR methods should
take advantage of specialized IR data structures, such as inverted index, to
efficiently retrieve from large collections. Given an information need, the IR
system also mediates how much exposure an information artifact receives by
deciding whether it should be displayed, and where it should be positioned,
among other results. Exposure-aware IR systems may optimize for additional
objectives, besides relevance, such as parity of exposure for retrieved items
and content publishers. In this thesis, we present novel neural architectures
and methods motivated by the specific needs and challenges of IR tasks.
Comment: PhD thesis, University College London (2020)
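Since the thesis appeals to inverted indexes as the workhorse data structure for retrieving from billion-document collections, here is a minimal, illustrative inverted index in Python (real engines add postings compression, skip pointers, and term-weighted scoring):

```python
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document ids containing it (postings).
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def retrieve(index: dict[str, set[str]], query: str) -> set[str]:
    # Union of postings lists: candidates are found without scanning the
    # whole collection.
    candidates: set[str] = set()
    for term in query.lower().split():
        candidates |= index.get(term, set())
    return candidates

docs = {"d1": "neural ranking models", "d2": "inverted index structures"}
index = build_index(docs)
print(retrieve(index, "neural index"))  # {'d1', 'd2'}
```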
Pretrained Transformers for Text Ranking: BERT and Beyond
The goal of text ranking is to generate an ordered list of texts retrieved
from a corpus in response to a query. Although the most common formulation of
text ranking is search, instances of the task can also be found in many natural
language processing applications. This survey provides an overview of text
ranking with neural network architectures known as transformers, of which BERT
is the best-known example. The combination of transformers and self-supervised
pretraining has been responsible for a paradigm shift in natural language
processing (NLP), information retrieval (IR), and beyond. In this survey, we
provide a synthesis of existing work as a single point of entry for
practitioners who wish to gain a better understanding of how to apply
transformers to text ranking problems and researchers who wish to pursue work
in this area. We cover a wide range of modern techniques, grouped into two
high-level categories: transformer models that perform reranking in multi-stage
architectures and dense retrieval techniques that perform ranking directly.
There are two themes that pervade our survey: techniques for handling long
documents, beyond typical sentence-by-sentence processing in NLP, and
techniques for addressing the tradeoff between effectiveness (i.e., result
quality) and efficiency (e.g., query latency, model and index size). Although
transformer architectures and pretraining techniques are recent innovations,
many aspects of how they are applied to text ranking are relatively well
understood and represent mature techniques. However, there remain many open
research questions, and thus in addition to laying out the foundations of
pretrained transformers for text ranking, this survey also attempts to
prognosticate where the field is heading.
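A schematic contrast of the survey's two high-level categories, with illustrative names only: dense retrieval scores precomputed document vectors against an independently encoded query, while a cross-encoder reranker jointly reads each query-document pair and is therefore applied only to a small candidate set.

```python
import numpy as np

# (1) Dense retrieval: queries and documents are encoded independently;
# ranking reduces to a fast similarity search over precomputed doc vectors.
def dense_retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int) -> np.ndarray:
    scores = doc_vecs @ query_vec        # dot-product similarity
    return np.argsort(-scores)[:k]       # indices of the top-k documents

# (2) Multi-stage reranking: a cross-encoder (e.g. BERT) reads each
# (query, document) pair jointly; too slow for the whole corpus, so it only
# reranks candidates from a cheaper first stage. cross_encoder_score is an
# assumed callable returning a relevance score for one pair.
def rerank(cross_encoder_score, query: str, candidates: list[str]) -> list[str]:
    return sorted(candidates,
                  key=lambda doc: cross_encoder_score(query, doc),
                  reverse=True)
```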
Learning to Rank in the Age of Muppets
The emergence of BERT in 2018 brought a huge boost to retrieval effectiveness in many tasks across various domains and steered the recent IR research landscape toward transformer-related technologies.
While researchers are fascinated by the power of BERT and related transformer models, the substantial computational costs incurred by transformers have become an unavoidable problem.
Meanwhile, in the shadow of BERT, some "out-of-date" but fairly effective techniques have been forgotten.
For example, learning to rank was one of the most popular technologies a decade ago.
In this work, we aim to answer two research questions: RQ1 asks whether using learning to rank as a filtering stage in a multi-stage reranking pipeline can improve the efficiency of transformer-based reranking without sacrificing effectiveness; RQ2 asks whether using transformer-based features in the traditional learning-to-rank framework can increase effectiveness.
To answer RQ1, we implement a multi-stage reranking pipeline which places learning to rank as a filter in the middle stage.
This configuration allows the pipeline to send only the most promising candidates, as identified by the cheap learning-to-rank module, to the expensive neural rerankers; hence, the overall latency of transformer-based reranking can be reduced without degrading effectiveness.
By applying the pipeline to the MS MARCO passage and document ranking tasks, we achieve up to an 18-fold increase in efficiency while maintaining the same level of effectiveness.
Moreover, our method is orthogonal to techniques that accelerate inference of the neural models themselves.
Hence, it can be combined with such acceleration methods to further reduce computational cost and latency.
For RQ2, since transformers generate relevance scores for different query-document pairs independently, it is possible to use transformer-based scores as learning to rank features, so that learning to rank can take advantage of transformers to increase retrieval effectiveness.
Applied to the MS MARCO passage and document ranking tasks, we gain up to a 52% increase in effectiveness by adding the BERT-based feature, compared to "traditional" learning to rank.
We also obtain slightly higher effectiveness by combining transformer-based features with traditional features in learning to rank, compared to the standard retrieve-and-rerank design with transformers.
This work explores the potential roles of learning to rank in the age of muppets.
In a broader sense, it illustrates that we should stand on the shoulders of giants, building on what has been learned and discovered before, to explore the next unknowns.
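A minimal sketch of the RQ1 configuration described above, with placeholder component interfaces (bm25.retrieve, ltr_model.score, bert_reranker.rerank): the learning-to-rank filter shrinks the candidate set before the expensive transformer stage, which is where the reported latency savings come from. For RQ2, the same learning-to-rank model would instead consume the transformer score as one additional feature alongside traditional ones.

```python
def pipeline(query, bm25, ltr_model, bert_reranker,
             first_k: int = 1000, filter_k: int = 50):
    # Stage 1: cheap, recall-oriented first-stage retrieval.
    candidates = bm25.retrieve(query, top_k=first_k)
    # Stage 2: a cheap learning-to-rank model filters the pool down to the
    # most promising candidates.
    filtered = sorted(candidates,
                      key=lambda doc: ltr_model.score(query, doc),
                      reverse=True)[:filter_k]
    # Stage 3: the expensive transformer reranker now scores filter_k
    # documents instead of first_k.
    return bert_reranker.rerank(query, filtered)
```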
Understanding Differential Search Index for Text Retrieval
The Differentiable Search Index (DSI) is a novel information retrieval (IR)
framework that utilizes a differentiable function to generate a sorted list of
document identifiers in response to a given query. However, due to the
black-box nature of the end-to-end neural architecture, it remains to be
understood to what extent DSI possesses the basic indexing and retrieval
abilities. To mitigate this gap, in this study, we define and examine three
important abilities that a functioning IR framework should possess, namely,
exclusivity, completeness, and relevance ordering. Our analytical
experimentation shows that while DSI demonstrates proficiency in memorizing the
unidirectional mapping from pseudo queries to document identifiers, it falls
short in distinguishing relevant documents from random ones, thereby negatively
impacting its retrieval effectiveness. To address this issue, we propose a
multi-task distillation approach to enhance the retrieval quality without
altering the structure of the model and successfully endow it with improved
indexing abilities. Through experiments conducted on various datasets, we
demonstrate that our proposed method outperforms previous DSI baselines.
Comment: Accepted to Findings of ACL 2023
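For orientation, DSI-style retrieval can be sketched as follows (illustrative, not this paper's code), assuming a Hugging Face-style seq2seq model fine-tuned to emit document identifier strings:

```python
def dsi_retrieve(model, tokenizer, query: str, k: int = 10) -> list[str]:
    # One model plays the role of both index and retriever: it maps the
    # query text directly to document identifier strings.
    inputs = tokenizer(query, return_tensors="pt")
    # Beam search over identifier strings; real DSI setups constrain decoding
    # (e.g. with a prefix trie) so that only valid docids can be produced.
    outputs = model.generate(**inputs, num_beams=k, num_return_sequences=k)
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```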
Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models
A retrieval model should not only interpolate the training data but also
extrapolate well to the queries that are different from the training data.
While neural retrieval models have demonstrated impressive performance on
ad-hoc search benchmarks, we still know little about how they perform in terms
of interpolation and extrapolation. In this paper, we demonstrate the
importance of separately evaluating the two capabilities of neural retrieval
models. Firstly, we examine existing ad-hoc search benchmarks from the two
perspectives. We investigate the distribution of training and test data and
find a considerable overlap in query entities, query intent, and relevance
labels. This finding implies that the evaluation on these test sets is biased
toward interpolation and cannot accurately reflect the extrapolation capacity.
Secondly, we propose a novel evaluation protocol to separately evaluate the
interpolation and extrapolation performance on existing benchmark datasets. It
resamples the training and test data based on query similarity and utilizes the
resampled dataset for training and evaluation. Finally, we leverage the
proposed evaluation protocol to comprehensively revisit a number of
widely-adopted neural retrieval models. Results show that models perform
differently
when moving from interpolation to extrapolation. For example,
representation-based retrieval models perform almost as well as
interaction-based retrieval models in terms of interpolation but not
extrapolation. Therefore, it is necessary to evaluate interpolation and
extrapolation performance separately, and the proposed resampling method serves
as a simple yet effective evaluation tool for future IR studies.
Comment: CIKM 2022 Full Paper
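A minimal sketch of the kind of similarity-based resampling such a protocol relies on (the details here are assumptions, not the paper's exact procedure): test queries whose nearest training query is highly similar probe interpolation, while dissimilar ones probe extrapolation.

```python
import numpy as np

def split_by_similarity(train_embs: np.ndarray, test_embs: np.ndarray,
                        threshold: float = 0.8):
    # Cosine similarity of each test query to its nearest training query.
    train_n = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    test_n = test_embs / np.linalg.norm(test_embs, axis=1, keepdims=True)
    nearest = (test_n @ train_n.T).max(axis=1)
    # High nearest-neighbor similarity -> interpolation; low -> extrapolation.
    interpolation = np.where(nearest >= threshold)[0]
    extrapolation = np.where(nearest < threshold)[0]
    return interpolation, extrapolation
```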