Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Generative models for open domain question answering have proven to be
competitive, without resorting to external knowledge. While promising, this
approach requires models with billions of parameters, which are
expensive to train and query. In this paper, we investigate how much these
models can benefit from retrieving text passages, potentially containing
evidence. We obtain state-of-the-art results on the Natural Questions and
TriviaQA open benchmarks. Interestingly, we observe that the performance of
this method significantly improves when increasing the number of retrieved
passages. This is evidence that generative models are good at aggregating and
combining evidence from multiple passages.
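A minimal sketch of this retrieve-then-generate setup, in the style of Fusion-in-Decoder readers: each retrieved passage is paired with the question, encoded independently, and the generative decoder aggregates evidence across all encodings. The retrieve and generate callables here are hypothetical stand-ins for a real retriever (e.g. BM25 or DPR) and a sequence-to-sequence reader such as T5.

    from typing import Callable, List

    def answer(question: str,
               retrieve: Callable[[str, int], List[str]],
               generate: Callable[[List[str]], str],
               n_passages: int = 100) -> str:
        """Retrieve passages and let a generative reader aggregate them."""
        passages = retrieve(question, n_passages)
        # Each passage is encoded jointly with the question; the decoder
        # then attends over the concatenation of all passage encodings.
        inputs = [f"question: {question} context: {p}" for p in passages]
        return generate(inputs)

Increasing n_passages widens the evidence pool, which is precisely the regime where the abstract reports performance gains.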
GripRank: Bridging the Gap between Retrieval and Generation via the Generative Knowledge Improved Passage Ranking
Retrieval-enhanced text generation, which aims to leverage passages retrieved
from a large passage corpus for delivering a proper answer given the input
query, has shown remarkable progress on knowledge-intensive language tasks such
as open-domain question answering and knowledge-enhanced dialogue generation.
However, the retrieved passages are not ideal for guiding answer generation
because of the discrepancy between retrieval and generation, i.e., the
candidate passages are all treated equally during the retrieval procedure
without considering their potential to generate the proper answers. This
discrepancy makes a passage retriever deliver a sub-optimal collection of
candidate passages to generate answers. In this paper, we propose the
GeneRative Knowledge Improved Passage Ranking (GripRank) approach, addressing
the above challenge by distilling knowledge from a generative passage estimator
(GPE) to a passage ranker, where the GPE is a generative language model used to
measure how likely the candidate passages can generate the proper answer. We
realize the distillation procedure by teaching the passage ranker to rank the
passages in the order given by the GPE. Furthermore, we improve the
distillation quality by devising a curriculum knowledge distillation
mechanism, which allows the knowledge provided by the GPE to be progressively
distilled to the ranker
through an easy-to-hard curriculum, enabling the passage ranker to correctly
recognize the provenance of the answer from many plausible candidates. We
conduct extensive experiments on four datasets across three knowledge-intensive
language tasks. Experimental results show advantages over the state-of-the-art
methods for both passage ranking and answer generation on the KILT benchmark.
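A hedged sketch of the distillation step: the ranker's score distribution over candidate passages is pulled toward the GPE's, and an easy-to-hard curriculum starts from the few highest-GPE-scored candidates and widens over training. The temperature and schedule below are illustrative assumptions, not the paper's exact settings.

    import torch
    import torch.nn.functional as F

    def distill_loss(ranker_scores: torch.Tensor,   # (batch, n_passages)
                     gpe_scores: torch.Tensor,      # (batch, n_passages)
                     step: int, total_steps: int,
                     temperature: float = 1.0) -> torch.Tensor:
        # Easy-to-hard curriculum: keep only the k top-GPE candidates
        # early on and widen the list as training progresses.
        n = ranker_scores.size(-1)
        k = min(n, max(2, int(n * (step + 1) / total_steps)))
        idx = gpe_scores.topk(k, dim=-1).indices
        r = ranker_scores.gather(-1, idx)
        g = gpe_scores.gather(-1, idx)
        # Listwise distillation: match the ranker's distribution over the
        # retained passages to the GPE's distribution over the same set.
        return F.kl_div(F.log_softmax(r / temperature, dim=-1),
                        F.softmax(g / temperature, dim=-1),
                        reduction="batchmean")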
Large-Scale Knowledge Synthesis and Complex Information Retrieval from Biomedical Documents
Recent advances in the healthcare industry have led to an abundance of
unstructured data, making it challenging to perform tasks such as efficient and
accurate information retrieval at scale. Our work offers an all-in-one scalable
solution for extracting and exploring complex information from large-scale
research documents, which would otherwise be tedious. First, we briefly explain
our knowledge synthesis process to extract helpful information from
unstructured text data of research documents. Then, on top of the knowledge
extracted from the documents, we perform complex information retrieval using
three major components: Paragraph Retrieval, Triplet Retrieval from Knowledge
Graphs, and Complex Question Answering (QA). These components combine lexical
and semantic-based methods to retrieve paragraphs and triplets and perform
faceted refinement for filtering these search results. The complexity of
biomedical queries and documents necessitates using a QA system capable of
handling queries more complex than factoid queries, which we evaluate
qualitatively on the COVID-19 Open Research Dataset (CORD-19) to demonstrate
the effectiveness and value-add of our solution.
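The retrieval side of such a pipeline can be pictured with a small sketch: lexical and semantic scores are blended, and faceted refinement filters the candidates. The scoring callables, facet fields, and weighting are assumptions for illustration, not the system's actual implementation.

    from typing import Callable, Dict, List

    def hybrid_search(query: str,
                      docs: List[Dict],
                      lexical_score: Callable[[str, str], float],
                      semantic_score: Callable[[str, str], float],
                      facets: Dict[str, str],
                      alpha: float = 0.5,
                      k: int = 10) -> List[Dict]:
        # Faceted refinement: keep only documents matching every facet.
        hits = [d for d in docs
                if all(d.get(f) == v for f, v in facets.items())]
        # Blend lexical (e.g. BM25-style) and semantic (embedding) scores.
        hits.sort(key=lambda d: alpha * lexical_score(query, d["text"])
                  + (1 - alpha) * semantic_score(query, d["text"]),
                  reverse=True)
        return hits[:k]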
Large Language Models for Information Retrieval: A Survey
As a primary means of information acquisition, information retrieval (IR)
systems, such as search engines, have integrated themselves into our daily
lives. These systems also serve as components of dialogue, question-answering,
and recommender systems. The trajectory of IR has evolved dynamically from its
origins in term-based methods to its integration with advanced neural models.
While the neural models excel at capturing complex contextual signals and
semantic nuances, thereby reshaping the IR landscape, they still face
challenges such as data scarcity, interpretability, and the generation of
contextually plausible yet potentially inaccurate responses. This evolution
requires a combination of both traditional methods (such as term-based sparse
retrieval methods with rapid response) and modern neural architectures (such as
language models with powerful language understanding capacity). Meanwhile, the
emergence of large language models (LLMs), typified by ChatGPT and GPT-4, has
revolutionized natural language processing due to their remarkable language
understanding, generation, generalization, and reasoning abilities.
Consequently, recent research has sought to leverage LLMs to improve IR
systems. Given the rapid evolution of this research trajectory, it is necessary
to consolidate existing methodologies and provide nuanced insights through a
comprehensive overview. In this survey, we delve into the confluence of LLMs
and IR systems, including crucial aspects such as query rewriters, retrievers,
rerankers, and readers. Additionally, we explore promising directions within
this expanding field.
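The four touchpoints the survey enumerates compose into a simple pipeline, sketched below with hypothetical callables standing in for concrete models or indexes:

    from typing import Callable, List

    def llm_ir_pipeline(query: str,
                        rewrite: Callable[[str], str],
                        retrieve: Callable[[str], List[str]],
                        rerank: Callable[[str, List[str]], List[str]],
                        read: Callable[[str, List[str]], str]) -> str:
        q = rewrite(query)               # LLM clarifies or expands the query
        candidates = retrieve(q)         # first-stage retrieval over a corpus
        ranked = rerank(q, candidates)   # LLM/cross-encoder reorders candidates
        return read(q, ranked)           # reader generates the final response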
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
Open-domain Question Answering models which directly leverage question-answer
(QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show
promise in terms of speed and memory compared to conventional models which
retrieve and read from text corpora. QA-pair retrievers also offer
interpretable answers, a high degree of control, and are trivial to update at
test time with new knowledge. However, these models lack the accuracy of
retrieve-and-read systems, as substantially less knowledge is covered by the
available QA-pairs relative to text corpora like Wikipedia. To facilitate
improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very
large resource of 65M automatically-generated QA-pairs. We introduce a new
QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and
caches test questions, enabling RePAQ to match the accuracy of recent
retrieve-and-read models, whilst being significantly faster. Using PAQ, we
train CBQA models which outperform comparable baselines by 5%, but trail RePAQ
by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be
configured for size (under 500MB) or speed (over 1K questions per second)
whilst retaining high accuracy. Lastly, we demonstrate RePAQ's strength at
selective QA, abstaining from answering when it is likely to be incorrect. This
enables RePAQ to "back off" to a more expensive state-of-the-art model,
leading to a combined system which is both more accurate and 2x faster than the
state-of-the-art model alone.
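The selective-QA back-off admits a compact sketch: answer from the QA-pair index when retrieval confidence clears a threshold, otherwise defer to the slower retrieve-and-read model. The threshold value and both callables are illustrative assumptions.

    from typing import Callable, Tuple

    def selective_qa(question: str,
                     qa_pair_retriever: Callable[[str], Tuple[str, float]],
                     expensive_model: Callable[[str], str],
                     threshold: float = 0.8) -> str:
        answer, confidence = qa_pair_retriever(question)
        if confidence >= threshold:
            return answer                 # fast path: cached PAQ answer
        return expensive_model(question)  # back-off: retrieve-and-read model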
NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned
We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage contestants to explore the trade-off between storing retrieval corpora or the parameters of learned models. In this report, we describe the motivation and organization of the competition, review the best submissions, and analyze system predictions to inform a discussion of evaluation for open-domain QA.
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
The rise of large language models (LLMs) has had a transformative impact on
search, ushering in a new era of search engines that are capable of generating
search results in natural language text, imbued with citations for supporting
sources. Building generative information-seeking models demands openly
accessible datasets, which are currently lacking. In this paper, we
introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative
Retrieval for Information-seeking Dataset) for building end-to-end generative
information-seeking models that are capable of retrieving candidate quotes and
generating attributed explanations. Unlike recent efforts that focus on human
evaluation of black-box proprietary search engines, we built our dataset atop
the English subset of MIRACL, a publicly available information retrieval
dataset. HAGRID is constructed based on human and LLM collaboration. We first
automatically collect attributed explanations that follow an in-context
citation style using an LLM, i.e. GPT-3.5. Next, we ask human annotators to
evaluate the LLM explanations based on two criteria: informativeness and
attributability. HAGRID serves as a catalyst for the development of
information-seeking models with better attribution capabilities.
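A record in such a dataset might look like the sketch below, reflecting the construction described: an LLM-drafted explanation with in-context citations plus the two human judgments. The field names are hypothetical, not HAGRID's actual schema.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class HagridExample:           # hypothetical schema for illustration
        query: str                 # information-seeking question
        quotes: List[str]          # candidate quotes retrieved from MIRACL
        explanation: str           # LLM answer with in-context citations
        informative: bool          # human judgment: does it answer the query?
        attributable: bool         # human judgment: do citations support it?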
Information Retrieval: Recent Advances and Beyond
In this paper, we provide a detailed overview of the models used for
information retrieval in the first and second stages of the typical processing
chain. We discuss the current state-of-the-art models, including term-based
methods, semantic retrieval, and neural approaches. Additionally, we delve
into the key topics related to the learning process of these models. In this
way, the survey offers a comprehensive understanding of the field and is of
interest to researchers and practitioners entering or working in the
information retrieval domain.
Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models
Questions in open-domain question answering are often ambiguous, allowing
multiple interpretations. One approach to handling them is to identify all
possible interpretations of the ambiguous question (AQ) and to generate a
long-form answer addressing them all, as suggested by Stelmakh et al. (2022).
While it provides a comprehensive response without bothering the user for
clarification, considering multiple dimensions of ambiguity and gathering
corresponding knowledge remains a challenge. To cope with the challenge, we
propose a novel framework, Tree of Clarifications (ToC): It recursively
constructs a tree of disambiguations for the AQ -- via few-shot prompting
leveraging external knowledge -- and uses it to generate a long-form answer.
ToC outperforms existing baselines on ASQA in a few-shot setup across the
metrics, while surpassing fully-supervised baselines trained on the whole
training set in terms of Disambig-F1 and Disambig-ROUGE. Code is available at
https://github.com/gankim/tree-of-clarifications.
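The recursive structure of ToC can be sketched as follows; disambiguate stands in for the few-shot-prompted LLM call that proposes interpretations of a question, and the depth limit is an illustrative assumption (long-form answer generation over the tree is omitted).

    from typing import Callable, Dict, List

    def build_toc(question: str,
                  disambiguate: Callable[[str], List[str]],
                  depth: int = 2) -> Dict:
        """Recursively expand an ambiguous question into a tree of
        disambiguated sub-questions."""
        node = {"question": question, "children": []}
        if depth > 0:
            for sub in disambiguate(question):   # few-shot prompted LLM call
                node["children"].append(
                    build_toc(sub, disambiguate, depth - 1))
        return node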