Diversity of Thought Improves Reasoning Abilities of LLMs

Authors: Chandrasekaran Varun, Naik Ranjita, Nushi Besmira, Palangi Hamid, Yuksekgonul Mert
Publication date: 23/02/2024

Large language models (LLMs) are documented to struggle in settings that require complex reasoning. Nevertheless, instructing the model to break down the problem into smaller reasoning steps, or ensembling various generations through modifying decoding steps, boosts performance. However, these methods assume that the input prompt is fixed and expect the decoding strategies to introduce the diversity needed for ensembling. In this work, we discuss how one can create and leverage variations of the input prompt as a means of diversity of thought. We propose a method that automatically improves prompt diversity by soliciting feedback from the LLM to ideate approaches that are apt for the problem. We then ensemble the diverse prompts in our method DIV-SE (DIVerse reasoning path Self-Ensemble) across multiple inference calls, or use diverse approaches within a single inference call; we call the latter IDIV-SE (In-call DIVerse reasoning path Self-Ensemble). Apart from our approaches outperforming prior work, DIV-SE (in particular) advances state-of-the-art performance on the challenging planning and graph coloring benchmarks. Our results improve the Pareto frontier of the accuracy-cost trade-off.
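
The multi-call ensembling idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt template, the `ask_model` callable, the `fake_model` stand-in, and the use of a plain majority vote to combine answers are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Sequence


def diverse_self_ensemble(
    question: str,
    approaches: Sequence[str],
    ask_model: Callable[[str], str],
) -> str:
    """Query the model once per approach and return the most common answer."""
    if not approaches:
        raise ValueError("at least one approach is required")
    answers = []
    for approach in approaches:
        # Hypothetical prompt template: one reasoning approach per inference call.
        prompt = (
            f"Solve the problem using this approach: {approach}\n"
            f"Problem: {question}\nAnswer:"
        )
        answers.append(ask_model(prompt).strip())
    # Assumed aggregation rule: simple majority vote over the final answers.
    return Counter(answers).most_common(1)[0][0]


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call so the sketch runs offline (not a real API)."""
    if "algebra" in prompt or "working backwards" in prompt:
        return "12"
    return "10"


approaches = ["algebra", "working backwards", "guess and check"]
result = diverse_self_ensemble("What is 3 * 4?", approaches, fake_model)
print(result)
```

In this toy run two of the three approaches agree, so the vote returns their answer. The in-call variant (IDIV-SE) would instead place several approaches in a single prompt; that is not shown here.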

arXiv.org e-Print Archive

Privately Aligning Language Models with Reinforcement Learning

Authors: Backurs Arturs, Chandrasekaran Varun, Inan Huseyin A., Kulkarni Janardhan, Sim Robert, Wu Fan
Publication date: 25/10/2023

Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction-following models such as ChatGPT. In this work, we initiate the study of privacy-preserving alignment of LLMs through Differential Privacy (DP) in conjunction with RL. Following the influential work of Ziegler et al. (2020), we study two dominant paradigms: (i) alignment via RL without a human in the loop (e.g., positive review generation) and (ii) alignment via RL from human feedback (RLHF) (e.g., summarization in a human-preferred way). We give a new DP framework to achieve alignment via RL, and prove its correctness. Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections.

arXiv.org e-Print Archive

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Authors: Chandrasekaran Varun, Gunasekar Suriya, Jones Erik, Kamar Ece, Naik Ranjita, Nushi Besmira, Palangi Hamid, Yuksekgonul Mert
Publication date: 26/09/2023

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text. We propose modeling factual queries as Constraint Satisfaction Problems and use this framework to investigate how the model interacts internally with factual constraints. Specifically, we discover a strong positive relation between the model's attention to constraint tokens and the factual accuracy of its responses. In our curated suite of 11 datasets with over 40,000 prompts, we study the task of predicting factual errors with the Llama-2 family across all scales (7B, 13B, 70B). We propose SAT Probe, a method probing self-attention patterns, that can predict constraint satisfaction and factual errors, and allows early error identification. The approach and findings demonstrate how using the mechanistic understanding of factuality in LLMs can enhance reliability.
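
The reported relation between attention to constraint tokens and factual accuracy can be made concrete with a toy sketch. This is not SAT Probe itself, which the abstract describes only as a method probing self-attention patterns. The attention layout, the averaging over heads, the fixed threshold of 0.3, and both function names are hypothetical simplifications.

```python
from typing import Sequence


def constraint_attention_mass(
    attention: Sequence[Sequence[float]],
    constraint_positions: Sequence[int],
) -> float:
    """Mean over heads of the attention weight placed on constraint tokens.

    attention[h][j] is assumed to be the weight that head h puts on input
    position j when the model produces its final token.
    """
    if not attention:
        raise ValueError("need at least one attention head")
    per_head = [sum(head[j] for j in constraint_positions) for head in attention]
    return sum(per_head) / len(per_head)


def predict_factual_error(mass: float, threshold: float = 0.3) -> bool:
    """Flag a likely error when attention to the constraint is low.

    The threshold is a made-up value; a real probe would be fit on labeled data.
    """
    return mass < threshold


# Toy example: 2 heads, 4 input tokens, constraint tokens at positions 1 and 2.
attention = [
    [0.1, 0.4, 0.3, 0.2],
    [0.5, 0.2, 0.1, 0.2],
]
mass = constraint_attention_mass(attention, [1, 2])
flagged = predict_factual_error(mass)
print(round(mass, 2), flagged)
```

Here the heads place about half of their attention on the constraint tokens, so the toy rule does not flag an error. A learned probe over attention features from many layers and heads would replace the hand-set threshold in practice.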

arXiv.org e-Print Archive