Query Rewriting for Retrieval-Augmented Large Language Models
Large Language Models (LLMs) serve as powerful, black-box readers in the retrieve-then-read pipeline, making remarkable progress in knowledge-intensive tasks. This work introduces a new framework, Rewrite-Retrieve-Read, which replaces the previous retrieve-then-read pipeline for retrieval-augmented LLMs from the perspective of query rewriting. Unlike prior studies that adapt either the retriever or the reader, our approach focuses on adapting the search query itself, since there is inevitably a gap between the input text and the knowledge needed for retrieval. We first prompt an LLM to generate the query, then use a web search engine to retrieve contexts. Furthermore, to better align the query with the frozen modules, we propose a trainable scheme for our pipeline: a small language model is adopted as a trainable rewriter to cater to the black-box LLM reader, and the rewriter is trained with reinforcement learning using feedback from the LLM reader. Evaluation is conducted on downstream tasks, open-domain QA and multiple-choice QA. Experimental results show consistent performance improvements, indicating that our framework is effective and scalable, and offers a new paradigm for retrieval-augmented LLMs.
Comment: EMNLP202
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
Large language models are powerful text processors and reasoners, but are
still subject to limitations including outdated knowledge and hallucinations,
which necessitates connecting them to the world. Retrieval-augmented large language models have attracted extensive attention for grounding model generation on external knowledge. However, retrievers struggle to capture relevance,
especially for queries with complex information needs. Recent work has proposed
to improve relevance modeling by having large language models actively involved
in retrieval, i.e., to improve retrieval with generation. In this paper, we
show that strong performance can be achieved by a method we call Iter-RetGen,
which synergizes retrieval and generation in an iterative manner. A model
output shows what might be needed to finish a task, and thus provides an
informative context for retrieving more relevant knowledge which in turn helps
generate a better output in the next iteration. Compared with recent work which
interleaves retrieval with generation when producing an output, Iter-RetGen
processes all retrieved knowledge as a whole and largely preserves the
flexibility in generation without structural constraints. We evaluate
Iter-RetGen on multi-hop question answering, fact verification, and commonsense
reasoning, and show that it can flexibly leverage parametric knowledge and
non-parametric knowledge, and is superior to or competitive with
state-of-the-art retrieval-augmented baselines while incurring lower retrieval and generation overhead. We can further improve performance via generation-augmented retrieval adaptation.
Comment: Preprint
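The synergy the abstract describes is essentially a loop in which each draft answer informs the next retrieval step. A minimal sketch follows, assuming hypothetical llm and retrieve helpers rather than the paper's actual interfaces.

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    return "draft answer"

def retrieve(query: str, k: int = 5) -> list[str]:
    """Hypothetical retriever over an external corpus."""
    return [f"doc {i} for '{query[:40]}'" for i in range(k)]

def iter_retgen(question: str, iterations: int = 3) -> str:
    answer = ""
    for _ in range(iterations):
        # Retrieval is conditioned on the question plus the previous generation,
        # so the draft surfaces knowledge the question alone would not retrieve.
        docs = retrieve(question + " " + answer)
        # Generation sees all retrieved knowledge at once, with no structural
        # constraints on how the answer is produced.
        answer = llm("Passages:\n" + "\n".join(docs)
                     + f"\nQuestion: {question}\nAnswer:")
    return answer

print(iter_retgen("Which is taller, the highest peak in Japan or in Taiwan?"))
```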
Adapting LLM Agents Through Communication
Recent advancements in large language models (LLMs) have shown potential for
human-like agents. To help these agents adapt to new tasks without extensive
human supervision, we propose the Learning through Communication (LTC)
paradigm, a novel training approach enabling LLM agents to improve continuously
through interactions with their environments and other agents. Through
iterative exploration and PPO training, LTC empowers the agent to assimilate
short-term experiences into long-term memory. To optimize agent interactions
for task-specific learning, we introduce three structured communication
patterns: Monologue, Dialogue, and Analogue, tailored for common tasks such as
decision-making, knowledge-intensive reasoning, and numerical reasoning. We
evaluated LTC on three datasets: ALFWorld (decision-making), HotpotQA
(knowledge-intensive reasoning), and GSM8k (numerical reasoning). On ALFWorld,
it exceeds the instruction tuning baseline by 12% in success rate. On HotpotQA,
LTC surpasses the instruction-tuned LLaMA-7B agent by 5.1% in EM score, and it
outperforms the 9x larger instruction-tuned PaLM-62B agent by 0.6%. On GSM8k,
LTC outperforms the CoT-Tuning baseline by 3.6% in accuracy. The results
showcase the versatility and efficiency of the LTC approach across diverse
domains. We will open-source our code to promote further development in the community.
Comment: Preprint
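As a rough illustration of the explore-then-update cycle described above, the toy loop below collects interaction trajectories into a long-term buffer and periodically refreshes the policy. The agent, environment, communication patterns, and PPO update are all abstracted into placeholder functions and do not reflect the paper's implementation.

```python
import random

def agent_act(observation: str) -> str:
    """Placeholder policy; in LTC this would be the LLM agent itself."""
    return random.choice(["look", "take key", "open door"])

def env_step(action: str) -> tuple[str, float, bool]:
    """Placeholder environment, e.g. an ALFWorld-style decision-making task."""
    done = action == "open door"
    return f"observation after {action}", (1.0 if done else 0.0), done

def collect_trajectory(max_steps: int = 8) -> list[tuple[str, str, float]]:
    # Short-term experience: one episode of interaction with the environment
    # (or with other agents, following a Monologue/Dialogue/Analogue pattern).
    obs, trajectory = "start", []
    for _ in range(max_steps):
        action = agent_act(obs)
        obs, reward, done = env_step(action)
        trajectory.append((obs, action, reward))
        if done:
            break
    return trajectory

def ppo_update(buffer: list) -> None:
    """Placeholder for the PPO update that adapts the agent from stored experience."""
    print(f"updating policy on {len(buffer)} stored steps")

# Iterative exploration: episodes are assimilated into long-term memory,
# and the policy is periodically updated from that memory.
long_term_memory: list = []
for episode in range(3):
    long_term_memory.extend(collect_trajectory())
    ppo_update(long_term_memory)
```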
Joint Generator-Ranker Learning for Natural Language Generation
Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among them. However, existing methods usually train the
generator and the ranker individually, neglecting the mutual feedback that
could further enhance the generation quality. To tackle this limitation, we
propose JGR, a novel joint training algorithm that integrates the generator and
the ranker in a single framework. JGR optimizes the generator with a hybrid
objective that combines data likelihood and ranker reward, and trains the
ranker with a contrastive loss that compares the generator outputs. By
iteratively updating the generator and the ranker, JGR can effectively
harmonize their learning and enhance their quality jointly. We evaluate JGR on
various text generation tasks and demonstrate that it surpasses existing
methods on four public datasets across three common generation scenarios. Our
code and models are publicly available at
https://github.com/microsoft/ProphetNet/tree/master/JGR
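To make the two training signals concrete, the toy snippet below computes a hybrid generator loss (data likelihood plus a ranker-reward term) and a pairwise contrastive ranker loss over a handful of made-up candidate scores. The exact formulations in JGR may differ; treat this purely as an illustration of the loss shapes.

```python
# Toy candidate set: generator log-probability, ranker score, and a task metric.
# All numbers are made up purely to illustrate the shapes of the two losses.
candidates = [
    {"gen_logprob": -2.1, "ranker_score": 0.8, "quality": 0.9},
    {"gen_logprob": -1.5, "ranker_score": 0.2, "quality": 0.4},
    {"gen_logprob": -3.0, "ranker_score": 0.5, "quality": 0.7},
]

def generator_loss(cands, reference_logprob=-1.8, reward_weight=0.5):
    # Hybrid objective: data likelihood on the reference plus a reward term that
    # pushes probability mass toward candidates the ranker prefers.
    nll = -reference_logprob
    reward = sum(c["ranker_score"] * c["gen_logprob"] for c in cands) / len(cands)
    return nll - reward_weight * reward

def ranker_contrastive_loss(cands, margin=0.2):
    # Contrastive signal over generator outputs: candidates with higher task
    # quality should be scored above worse ones by at least `margin`.
    loss, pairs = 0.0, 0
    for better in cands:
        for worse in cands:
            if better["quality"] > worse["quality"]:
                gap = better["ranker_score"] - worse["ranker_score"]
                loss += max(0.0, margin - gap)
                pairs += 1
    return loss / max(pairs, 1)

print(generator_loss(candidates), ranker_contrastive_loss(candidates))
```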
Neural Semantic Parsing in Low-Resource Settings with Back-Translation and Meta-Learning
Neural semantic parsing has achieved impressive results in recent years, yet
its success relies on the availability of large amounts of supervised data. Our
goal is to learn a neural semantic parser when only prior knowledge about a
limited number of simple rules is available, without access to either annotated
programs or execution results. Our approach is initialized by rules, and
improved in a back-translation paradigm using generated question-program pairs
from the semantic parser and the question generator. A phrase table with
frequent mapping patterns is automatically derived, also updated as training
progresses, to measure the quality of generated instances. We train the model with model-agnostic meta-learning to ensure accuracy and stability on examples covered by rules, while also acquiring the versatility to generalize well on examples not covered by rules. Results on three benchmark datasets with
different domains and programs show that our approach incrementally improves
the accuracy. On WikiSQL, our best model is comparable to the SOTA system
learned from denotations.
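One round of the back-translation loop, with the phrase table acting as a cheap quality filter on generated pairs, might look roughly like the sketch below. Every function name and the scoring rule are illustrative assumptions rather than the paper's implementation, and the meta-learning step is omitted.

```python
def parse(question: str) -> str:
    """Hypothetical semantic parser, initialized from a few simple rules."""
    return f"SELECT answer WHERE text = '{question}'"

def generate_question(program: str) -> str:
    """Hypothetical question generator (the reverse, back-translation direction)."""
    return f"what does {program} return?"

# Toy phrase table of frequent question-phrase -> program-fragment mappings,
# updated as training progresses and used to score generated pairs.
phrase_table = {"what": "SELECT", "where": "WHERE"}

def pair_quality(question: str, program: str) -> float:
    hits = sum(1 for q_phrase, p_fragment in phrase_table.items()
               if q_phrase in question.lower() and p_fragment in program)
    return hits / max(len(phrase_table), 1)

def back_translation_round(unlabeled_questions, unlabeled_programs, threshold=0.5):
    pseudo_pairs = []
    # Parser direction: label unlabeled questions with predicted programs.
    pseudo_pairs += [(q, parse(q)) for q in unlabeled_questions]
    # Generator direction: label unlabeled programs with generated questions.
    pseudo_pairs += [(generate_question(p), p) for p in unlabeled_programs]
    # Keep only the pairs the phrase table judges plausible.
    return [(q, p) for q, p in pseudo_pairs if pair_quality(q, p) >= threshold]

print(back_translation_round(["where is berlin"],
                             ["SELECT capital WHERE country = 'Germany'"]))
```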
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Recent developments in large language models (LLMs) have been impressive.
However, these models sometimes show inconsistencies and problematic behavior,
such as hallucinating facts, generating flawed code, or creating offensive and
toxic content. Unlike these models, humans typically utilize external tools to
cross-check and refine their initial content, like using a search engine for
fact-checking, or a code interpreter for debugging. Inspired by this
observation, we introduce a framework called CRITIC that allows LLMs, which are
essentially "black boxes" to validate and progressively amend their own outputs
in a manner similar to human interaction with tools. More specifically,
starting with an initial output, CRITIC interacts with appropriate tools to
evaluate certain aspects of the text, and then revises the output based on the
feedback obtained during this validation process. Comprehensive evaluations
involving free-form question answering, mathematical program synthesis, and
toxicity reduction demonstrate that CRITIC consistently enhances the
performance of LLMs. Meanwhile, our research highlights the crucial importance
of external feedback in promoting the ongoing self-improvement of LLMs.
Comment: ICLR 202
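The verify-then-correct cycle described above can be sketched as a short loop. Here llm and tool_feedback are hypothetical placeholders for the black-box model and whichever external tool (search engine, code interpreter, toxicity classifier) fits the task.

```python
def llm(prompt: str) -> str:
    """Hypothetical black-box LLM call."""
    return "draft answer"

def tool_feedback(output: str, question: str) -> str:
    """Hypothetical tool call, e.g. a search engine for fact-checking or a
    code interpreter for running generated code."""
    return "no issue found"

def critic_loop(question: str, max_rounds: int = 3) -> str:
    output = llm(question)
    for _ in range(max_rounds):
        # Verify: ask an external tool to critique specific aspects of the output.
        critique = tool_feedback(output, question)
        if "no issue" in critique:
            break
        # Correct: revise the output conditioned on the tool's feedback.
        output = llm(f"Question: {question}\nDraft: {output}\n"
                     f"Critique: {critique}\nRevised answer:")
    return output

print(critic_loop("When was the Eiffel Tower completed?"))
```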
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Large language models have made significant progress in various language
tasks, yet they still struggle with complex mathematics. In this paper, we
propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve
challenging mathematical problems by seamlessly integrating natural language
reasoning with the utilization of external tools (e.g., computation libraries
and symbolic solvers), thereby amalgamating the analytical prowess of language
and the computational efficiency of tools. To train ToRA, we curate interactive
tool-use trajectories on mathematical datasets, apply imitation learning on the
annotations, and propose output space shaping to further refine models'
reasoning behavior. As a result, ToRA models significantly outperform
open-source models on 10 mathematical reasoning datasets across all scales with
13%-19% absolute improvements on average. Notably, ToRA-7B reaches 44.6% on the
competition-level dataset MATH, surpassing the best open-source model
WizardMath-70B by 22% absolute. ToRA-Code-34B is also the first open-source
model that achieves an accuracy exceeding 50% on MATH, significantly outperforming GPT-4's CoT result and remaining competitive with GPT-4 when it solves problems with programs. Additionally, we conduct a comprehensive analysis of the
benefits and remaining challenges of tool interaction for mathematical
reasoning, providing valuable insights for future research.
Comment: ICLR 2024; First two authors equal contribution
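The interleaving of natural-language reasoning with program execution can be sketched as follows. Both llm and run_program are hypothetical stand-ins (a real system would execute generated code in an isolated interpreter), and the snippet is not ToRA's actual implementation.

```python
import io
import contextlib

def llm(prompt: str) -> str:
    """Hypothetical LLM call that emits a rationale followed by a program block."""
    return ("Reasoning: sum the first 100 integers.\n"
            "<program>print(sum(range(1, 101)))</program>")

def run_program(snippet: str) -> str:
    """Tiny sandbox stand-in: execute the snippet and capture its stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(snippet)  # a real system would use an isolated interpreter
    return buffer.getvalue().strip()

def tool_integrated_step(problem: str) -> str:
    response = llm(problem)
    # Separate the natural-language rationale from the proposed program.
    code = response.split("<program>")[1].split("</program>")[0]
    result = run_program(code)
    # Feed the execution result back so the model can continue or finalize.
    return llm(f"{problem}\n{response}\nExecution output: {result}\nFinal answer:")

print(tool_integrated_step("Compute 1 + 2 + ... + 100."))
```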