TaskWeb: Selecting Better Source Tasks for Multi-task NLP
Recent work in NLP has shown promising results in training models on large
numbers of tasks to achieve better generalization. However, it is not
well-understood how tasks are related, and how helpful training tasks can be
chosen for a new task. In this work, we investigate whether knowing task
relationships via pairwise task transfer improves choosing one or more source
tasks that help to learn a new target task. We provide TaskWeb, a large-scale
benchmark of pairwise task transfers for 22 NLP tasks using three different
model types, sizes, and adaptation methods, spanning about 25,000 experiments.
Then, we design a new method, TaskShop, based on our analysis of TaskWeb.
TaskShop uses TaskWeb to estimate the benefit of using a source task for
learning a new target task, and to choose a subset of helpful training tasks
for multi-task training. Our method improves overall rankings and top-k
precision of source tasks by 10% and 38%, respectively. We also use TaskShop to
build much smaller multi-task training sets that improve zero-shot performance
across 11 different target tasks by at least 4.3%.
Comment: 21 pages, 14 figures
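The selection step described in this abstract, choosing a few helpful source tasks and scoring the choice by top-k precision, can be illustrated with a minimal sketch. It assumes an estimated benefit score per source task is already available. The task names, the scores, and both helper functions are hypothetical and are not taken from TaskShop itself.

```python
from typing import Dict, List, Set


def top_k_sources(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k source tasks with the highest estimated benefit."""
    ranked = sorted(scores, key=lambda task: scores[task], reverse=True)
    return ranked[:k]


def precision_at_k(selected: List[str], helpful: Set[str]) -> float:
    """Fraction of the selected source tasks that are actually helpful."""
    if not selected:
        return 0.0
    return sum(task in helpful for task in selected) / len(selected)


# Hypothetical estimated benefits of four source tasks for one target task.
estimated = {"task_a": 0.8, "task_b": -0.1, "task_c": 0.4, "task_d": 0.2}
chosen = top_k_sources(estimated, k=2)

# Hypothetical ground truth: sources that really help the target task.
truly_helpful = {"task_a", "task_d"}
score = precision_at_k(chosen, truly_helpful)
```

Here `chosen` holds the two highest-scoring sources, and `score` reports how many of them appear in the ground-truth set.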
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Despite their remarkable capabilities, large language models (LLMs) often
produce responses containing factual inaccuracies due to their sole reliance on
the parametric knowledge they encapsulate. Retrieval-Augmented Generation
(RAG), an ad hoc approach that augments LMs with retrieval of relevant
knowledge, reduces such issues. However, indiscriminately retrieving and
incorporating a fixed number of retrieved passages, regardless of whether
retrieval is necessary or the passages are relevant, diminishes LM versatility or
can lead to unhelpful response generation. We introduce a new framework called
Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's
quality and factuality through retrieval and self-reflection. Our framework
trains a single arbitrary LM that adaptively retrieves passages on-demand, and
generates and reflects on retrieved passages and its own generations using
special tokens, called reflection tokens. Generating reflection tokens makes
the LM controllable during the inference phase, enabling it to tailor its
behavior to diverse task requirements. Experiments show that Self-RAG (7B and
13B parameters) significantly outperforms state-of-the-art LLMs and
retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG
outperforms ChatGPT and retrieval-augmented Llama2-chat on open-domain QA,
reasoning and fact verification tasks, and it shows significant gains in
improving factuality and citation accuracy for long-form generations relative
to these models.
Comment: 30 pages, 2 figures, 12 tables
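The control flow this abstract describes (retrieve on demand, generate, then reflect on the passages and the generations) can be sketched roughly as below. This is only an illustration under stated assumptions: in Self-RAG these decisions are made by a single LM through reflection tokens, whereas here they are stand-in callables. Every function name and signature is hypothetical.

```python
from typing import Callable, List, Optional


def self_reflective_generate(
    prompt: str,
    wants_retrieval: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str, str], float],
) -> str:
    """Retrieve only on demand, then keep the best-critiqued candidate."""
    # Stand-in for a "should I retrieve?" reflection decision.
    if not wants_retrieval(prompt):
        return generate(prompt, None)

    passages = retrieve(prompt)
    if not passages:
        return generate(prompt, None)

    # One candidate per passage, each scored by a stand-in critique step.
    candidates = [(p, generate(prompt, p)) for p in passages]
    best = max(candidates, key=lambda pc: critique(prompt, pc[0], pc[1]))
    return best[1]


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def toy_generate(prompt: str, passage: Optional[str]) -> str:
    return f"answer from {passage}" if passage else "answer from memory"


def toy_critique(prompt: str, passage: str, output: str) -> float:
    return float(len(passage))  # pretend longer passages are better supported


with_retrieval = self_reflective_generate(
    "q", lambda p: True, lambda p: ["short", "longer passage"],
    toy_generate, toy_critique,
)
without_retrieval = self_reflective_generate(
    "q", lambda p: False, lambda p: ["unused"], toy_generate, toy_critique,
)
```

Changing how `critique` weighs its inputs is the part of this sketch that loosely corresponds to tailoring behavior at inference time.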
When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories
Despite their impressive performance on diverse tasks, large language models
(LMs) still struggle with tasks requiring rich world knowledge, implying the
limitations of relying solely on their parameters to encode a wealth of world
knowledge. This paper aims to understand LMs' strengths and limitations in
memorizing factual knowledge, by conducting large-scale knowledge probing
experiments of 10 models and 4 augmentation methods on PopQA, our new
open-domain QA dataset with 14k questions. We find that LMs struggle with less
popular factual knowledge, and that scaling fails to appreciably improve
memorization of factual knowledge in the long tail. We then show that
retrieval-augmented LMs largely outperform LMs that are orders of magnitude
larger, while unassisted LMs remain competitive on questions about high-popularity
entities. Based on those findings, we devise a simple, yet effective, method
for powerful and efficient retrieval-augmented LMs, which retrieves
non-parametric memories only when necessary. Experimental results show that
this significantly improves models' performance while reducing the inference
costs.
Comment: ACL 2023; Code and data available at
https://github.com/AlexTMallen/adaptive-retrieva
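The idea of retrieving only when necessary can be illustrated with a short sketch. It assumes the decision is made by comparing an entity popularity score against a threshold, which is one plausible reading of the findings above rather than a detail stated in this abstract. The threshold value, the popularity scores, and the `retrieve` and `generate` callables are all hypothetical.

```python
from typing import Callable, List


def adaptive_answer(
    question: str,
    popularity: float,
    threshold: float,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Use retrieval only for questions about less popular entities."""
    if popularity < threshold:
        # Long-tail entity: assume parametric memory is unreliable, so retrieve.
        passages = retrieve(question)
    else:
        # Popular entity: skip retrieval and save the inference cost.
        passages = []
    return generate(question, passages)


# Toy stand-ins that record how often retrieval is called (hypothetical).
retrieval_calls: List[str] = []


def toy_retrieve(question: str) -> List[str]:
    retrieval_calls.append(question)
    return ["retrieved passage"]


def toy_generate(question: str, passages: List[str]) -> str:
    return "with retrieval" if passages else "parametric only"


rare = adaptive_answer("rare entity?", 12.0, 100.0, toy_retrieve, toy_generate)
common = adaptive_answer("famous entity?", 5000.0, 100.0, toy_retrieve, toy_generate)
```

In this toy run only the first question triggers a retrieval call, which is where the reduction in inference cost would come from.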