Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models
Answering multi-hop reasoning questions requires retrieving and synthesizing
information from diverse sources. Large Language Models (LLMs) struggle to
perform such reasoning consistently. Here we propose an approach to pinpoint
and rectify multi-hop reasoning failures through targeted memory injections on
LLM attention heads. First, we analyze the per-layer activations of GPT-2
models in response to single and multi-hop prompts. We then propose a mechanism
that allows users to inject pertinent prompt-specific information, which we
refer to as "memories," at critical LLM locations during inference. By thus
enabling the LLM to incorporate additional relevant information during
inference, we enhance the quality of multi-hop prompt completions. We show
empirically that a simple, efficient, and targeted memory injection into a key
attention layer can often increase the probability of the desired next token in
multi-hop tasks by up to 424%.
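
To illustrate the idea, here is a minimal Python sketch (not the authors' released code) assuming a Hugging Face GPT-2 model: a forward hook adds a scaled token-embedding "memory" to one attention block's output at the final position before the next-token prediction is read off. The layer index, scaling factor, memory token, and prompt are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: inject a "memory" vector into one GPT-2 attention block's
# output during inference via a forward hook. LAYER, SCALE, and the memory
# word are illustrative choices.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 9    # hypothetical "key" attention layer
SCALE = 4.0  # hypothetical injection magnitude

# Encode the memory as its token-embedding vector.
memory_id = tok.encode(" France")[0]
memory_vec = model.transformer.wte.weight[memory_id].detach()

def inject(module, inputs, output):
    # GPT2Attention returns a tuple; output[0] is the attention output of
    # shape (batch, seq_len, hidden). Add the memory at the last position.
    attn_out = output[0].clone()
    attn_out[:, -1, :] += SCALE * memory_vec
    return (attn_out,) + output[1:]

handle = model.transformer.h[LAYER].attn.register_forward_hook(inject)

prompt = "The capital of the country where the Eiffel Tower is located is"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]
handle.remove()

print(tok.decode([int(logits.argmax())]))
```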
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Transformer-based Large Language Models (LLMs) are the state-of-the-art for
natural language tasks. Recent work has attempted to decode the internal
mechanisms by which LLMs arrive at their final predictions for text completion
tasks by reverse engineering the role of linear layers. Yet little is
known about the specific role of attention heads in producing the final token
prediction. We propose Attention Lens, a tool that enables researchers to
translate the outputs of attention heads into vocabulary tokens via learned
attention-head-specific transformations called lenses. Preliminary findings
from our trained lenses indicate that attention heads play highly specialized
roles in language models. The code for Attention Lens is available at
github.com/msakarvadia/AttentionLens.
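
A minimal sketch of the lens idea (an assumed design, not the released AttentionLens code), again for a Hugging Face GPT-2 model: a per-head linear map from an attention head's output to vocabulary logits, trained so its prediction at the last position matches the model's own next-token distribution. The layer and head indices, prompts, and KL-divergence objective are illustrative assumptions.

```python
# Hedged sketch: train a "lens" that maps one attention head's output to
# vocabulary logits. LAYER, HEAD, prompts, and the objective are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
cfg = model.config
LAYER, HEAD = 5, 3                      # hypothetical head to interpret
head_dim = cfg.n_embd // cfg.n_head

lens = nn.Linear(head_dim, cfg.vocab_size, bias=False)
opt = torch.optim.Adam(lens.parameters(), lr=1e-4)

captured = {}
def grab(module, inputs):
    # The input to c_proj is the concatenation of all head outputs:
    # (batch, seq, n_embd) -> (batch, seq, n_head, head_dim); keep one head.
    x = inputs[0]
    captured["head"] = x.reshape(x.shape[0], x.shape[1], cfg.n_head, head_dim)[:, :, HEAD]

model.transformer.h[LAYER].attn.c_proj.register_forward_pre_hook(grab)

prompts = ["The Eiffel Tower is located in the city of",
           "The author of Romeo and Juliet is"]

for step in range(50):
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        with torch.no_grad():
            target = model(ids).logits[:, -1].softmax(-1)  # model's prediction
        pred = lens(captured["head"][:, -1])               # lens's prediction
        loss = F.kl_div(pred.log_softmax(-1), target, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()

# After training, the lens's top tokens suggest what this head promotes.
ids = tok(prompts[0], return_tensors="pt").input_ids
with torch.no_grad():
    model(ids)
top = lens(captured["head"][0, -1]).topk(5).indices
print([tok.decode([int(t)]) for t in top])
```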