7 research outputs found
Query Rewriting for Retrieval-Augmented Large Language Models
Large Language Models (LLMs) play powerful, black-box readers in the
retrieve-then-read pipeline, making remarkable progress in knowledge-intensive
tasks. This work introduces a new framework, Rewrite-Retrieve-Read instead of
the previous retrieve-then-read for the retrieval-augmented LLMs from the
perspective of the query rewriting. Unlike prior studies focusing on adapting
either the retriever or the reader, our approach pays attention to the
adaptation of the search query itself, for there is inevitably a gap between
the input text and the needed knowledge in retrieval. We first prompt an LLM to
generate the query, then use a web search engine to retrieve contexts.
Furthermore, to better align the query to the frozen modules, we propose a
trainable scheme for our pipeline. A small language model is adopted as a
trainable rewriter to cater to the black-box LLM reader. The rewriter is
trained using the feedback of the LLM reader by reinforcement learning.
Evaluation is conducted on downstream tasks, open-domain QA and multiple-choice
QA. Experiments results show consistent performance improvement, indicating
that our framework is proven effective and scalable, and brings a new framework
for retrieval-augmented LLM.Comment: EMNLP202
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Large language models (LLMs) have dramatically enhanced the field of language
intelligence, as demonstrably evidenced by their formidable empirical
performance across a spectrum of complex reasoning tasks. Additionally,
theoretical proofs have illuminated their emergent reasoning capabilities,
providing a compelling showcase of their advanced cognitive abilities in
linguistic contexts. Critical to their remarkable efficacy in handling complex
reasoning tasks, LLMs leverage the intriguing chain-of-thought (CoT) reasoning
techniques, obliging them to formulate intermediate steps en route to deriving
an answer. The CoT reasoning approach has not only exhibited proficiency in
amplifying reasoning performance but also in enhancing interpretability,
controllability, and flexibility. In light of these merits, recent research
endeavors have extended CoT reasoning methodologies to nurture the development
of autonomous language agents, which adeptly adhere to language instructions
and execute actions within varied environments. This survey paper orchestrates
a thorough discourse, penetrating vital research dimensions, encompassing: (i)
the foundational mechanics of CoT techniques, with a focus on elucidating the
circumstances and justification behind its efficacy; (ii) the paradigm shift in
CoT; and (iii) the burgeoning of language agents fortified by CoT approaches.
Prospective research avenues envelop explorations into generalization,
efficiency, customization, scaling, and safety. This paper caters to a wide
audience, including beginners seeking comprehensive knowledge of CoT reasoning
and language agents, as well as experienced researchers interested in
foundational mechanics and engaging in cutting-edge discussions on these
topics. A repository for the related papers is available at
https://github.com/Zoeyyao27/CoT-Igniting-Agent
PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive Summarization
Based on the remarkable achievements of pre-trained language models in
abstractive summarization, the copying mechanism has proved helpful by
improving the factuality, stability, and overall performance. This work
proposes PROM, a new PhRase-level cOpying Mechanism that enhances attention on
n-grams, which can be applied to zero-shot summarization with pre-training.
PROM adds an indicator layer to explicitly pick up tokens in n-gram that can be
copied from the source, and calculates an auxiliary loss for the copying
prediction. Empirical studies show that PROM makes significant improvements in
fine-tuning on benchmarks. In zero-shot setting, PROM is utilized in the
self-supervised pre-training on raw corpora and provides new general baselines
on a wide range of summarization datasets. Further analysis shows that PROM
performs more reasonable copying and contributes to faithfulness