20 research outputs found
Diversity of Thought Improves Reasoning Abilities of LLMs
Large language models (LLMs) are documented to struggle in settings that
require complex reasoning. Nevertheless, instructing the model to break down
the problem into smaller reasoning steps, or ensembling various generations
through modifying decoding steps boosts performance. However, these methods
assume that the input prompt is fixed and expect the decoding strategies to
introduce the diversity needed for ensembling. In this work, we discuss how one
can create and leverage variations of the input prompt as a means of diversity
of thought. We propose a method that automatically improves prompt diversity by
soliciting feedback from the LLM to ideate approaches that are apt for the
problem. We then ensemble the diverse prompts in our method DIV-SE (DIVerse
reasoning path Self-Ensemble) across multiple inference calls, or use diverse
approaches within a single inference call; we call the latter IDIV-SE (In-call
DIVerse reasoning path Self-Ensemble). Apart from our approaches outperforming
prior work, DIV-SE (in particular) advances state-of-the-art performance on the
challenging planning and graph coloring benchmarks. Our results improve the
Pareto frontier of the accuracy-cost trade-off.
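The ensembling idea in this abstract can be illustrated with a minimal sketch. It assumes that answers from the diverse prompts are combined by majority vote, which the abstract does not state, and it uses a hypothetical `ask` callable standing in for one LLM inference call. The names `self_ensemble`, `div_se`, and the prompt template are illustrative only and are not taken from the paper.

```python
from collections import Counter


def self_ensemble(answers):
    """Return the most frequent answer; ties go to the earliest answer seen."""
    if not answers:
        raise ValueError("need at least one answer to ensemble")
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:
        if counts[answer] == best:
            return answer


def div_se(question, approaches, ask):
    """Query once per approach, then aggregate the answers.

    `ask` is a hypothetical callable (prompt -> answer string) standing in
    for a single LLM call; the prompt template is an assumption.
    """
    prompts = [f"Approach: {a}\nQuestion: {question}" for a in approaches]
    return self_ensemble([ask(p) for p in prompts])
```

With a stub such as `ask = lambda p: "4"`, calling `div_se("What is 2 + 2?", ["algebra", "counting"], ask)` returns `"4"`. The in-call variant described above would instead place all approaches in one prompt and aggregate within a single response.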
Privately Aligning Language Models with Reinforcement Learning
Positioned between pre-training and user deployment, aligning large language
models (LLMs) through reinforcement learning (RL) has emerged as a prevailing
strategy for training instruction-following models such as ChatGPT. In this
work, we initiate the study of privacy-preserving alignment of LLMs through
Differential Privacy (DP) in conjunction with RL. Following the influential
work of Ziegler et al. (2020), we study two dominant paradigms: (i) alignment
via RL without a human in the loop (e.g., positive review generation) and (ii)
alignment via RL from human feedback (RLHF) (e.g., summarization in a
human-preferred way). We give a new DP framework to achieve alignment via RL,
and prove its correctness. Our experimental results validate the effectiveness
of our approach, offering competitive utility while ensuring strong privacy
protections.
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
We investigate the internal behavior of Transformer-based Large Language
Models (LLMs) when they generate factually incorrect text. We propose modeling
factual queries as Constraint Satisfaction Problems and use this framework to
investigate how the model interacts internally with factual constraints.
Specifically, we discover a strong positive relation between the model's
attention to constraint tokens and the factual accuracy of its responses. In
our curated suite of 11 datasets with over 40,000 prompts, we study the task of
predicting factual errors with the Llama-2 family across all scales (7B, 13B,
70B). We propose SAT Probe, a method probing self-attention patterns, that can
predict constraint satisfaction and factual errors, and allows early error
identification. The approach and findings demonstrate how using the mechanistic
understanding of factuality in LLMs can enhance reliability.
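The reported relation between attention to constraint tokens and factual accuracy can be illustrated with a toy sketch. This is not SAT Probe itself: it assumes a single attention row (the final token's attention weights over the prompt positions) and replaces the learned probe with a fixed threshold. The function names and the `threshold` value are hypothetical.

```python
def constraint_attention(attn_row, constraint_positions):
    """Sum the attention weights that fall on the constraint token positions."""
    return sum(attn_row[i] for i in constraint_positions)


def flag_likely_error(attn_row, constraint_positions, threshold=0.1):
    """Flag a response as a likely factual error when attention is low.

    The threshold is an illustrative assumption; a real probe would be
    fitted on labelled examples rather than set by hand.
    """
    return constraint_attention(attn_row, constraint_positions) < threshold
```

For example, if the final token spreads its attention as `[0.9, 0.05, 0.05]` and the only constraint token sits at position 1, the sketch flags the response, since 0.05 is below the assumed threshold of 0.1.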