Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models
Answering multi-hop reasoning questions requires retrieving and synthesizing
information from diverse sources. Large Language Models (LLMs) struggle to
perform such reasoning consistently. Here we propose an approach to pinpoint
and rectify multi-hop reasoning failures through targeted memory injections on
LLM attention heads. First, we analyze the per-layer activations of GPT-2
models in response to single and multi-hop prompts. We then propose a mechanism
that allows users to inject pertinent prompt-specific information, which we
refer to as "memories," at critical LLM locations during inference. By thus
enabling the LLM to incorporate additional relevant information during
inference, we enhance the quality of multi-hop prompt completions. We show
empirically that a simple, efficient, and targeted memory injection into a key
attention layer can often increase the probability of the desired next token in
multi-hop tasks by up to 424%.
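
To illustrate the idea, here is a minimal Python sketch (not the authors' released code) assuming a Hugging Face GPT-2 model: a forward hook adds a scaled token-embedding "memory" to one attention block's output at the final position before the next-token prediction is read off. The layer index, scaling factor, memory token, and prompt are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: inject a "memory" vector into one GPT-2 attention block's
# output during inference via a forward hook. LAYER, SCALE, and the memory
# word are illustrative choices.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 9    # hypothetical "key" attention layer
SCALE = 4.0  # hypothetical injection magnitude

# Encode the memory as its token-embedding vector.
memory_id = tok.encode(" France")[0]
memory_vec = model.transformer.wte.weight[memory_id].detach()

def inject(module, inputs, output):
    # GPT2Attention returns a tuple; output[0] is the attention output of
    # shape (batch, seq_len, hidden). Add the memory at the last position.
    attn_out = output[0].clone()
    attn_out[:, -1, :] += SCALE * memory_vec
    return (attn_out,) + output[1:]

handle = model.transformer.h[LAYER].attn.register_forward_hook(inject)

prompt = "The capital of the country where the Eiffel Tower is located is"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]
handle.remove()

print(tok.decode([int(logits.argmax())]))
```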
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Transformer-based Large Language Models (LLMs) are the state-of-the-art for
natural language tasks. Recent work has attempted to decode the internal
mechanisms by which LLMs arrive at their final predictions for text completion
tasks by reverse engineering the role of linear layers. Yet little is
known about the specific role of attention heads in producing the final token
prediction. We propose Attention Lens, a tool that enables researchers to
translate the outputs of attention heads into vocabulary tokens via learned
attention-head-specific transformations called lenses. Preliminary findings
from our trained lenses indicate that attention heads play highly specialized
roles in language models. The code for Attention Lens is available at
github.com/msakarvadia/AttentionLens.
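
A minimal sketch of the lens idea (an assumed design, not the released AttentionLens code), again for a Hugging Face GPT-2 model: a per-head linear map from an attention head's output to vocabulary logits, trained so its prediction at the last position matches the model's own next-token distribution. The layer and head indices, prompts, and KL-divergence objective are illustrative assumptions.

```python
# Hedged sketch: train a "lens" that maps one attention head's output to
# vocabulary logits. LAYER, HEAD, prompts, and the objective are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
cfg = model.config
LAYER, HEAD = 5, 3                      # hypothetical head to interpret
head_dim = cfg.n_embd // cfg.n_head

lens = nn.Linear(head_dim, cfg.vocab_size, bias=False)
opt = torch.optim.Adam(lens.parameters(), lr=1e-4)

captured = {}
def grab(module, inputs):
    # The input to c_proj is the concatenation of all head outputs:
    # (batch, seq, n_embd) -> (batch, seq, n_head, head_dim); keep one head.
    x = inputs[0]
    captured["head"] = x.reshape(x.shape[0], x.shape[1], cfg.n_head, head_dim)[:, :, HEAD]

model.transformer.h[LAYER].attn.c_proj.register_forward_pre_hook(grab)

prompts = ["The Eiffel Tower is located in the city of",
           "The author of Romeo and Juliet is"]

for step in range(50):
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        with torch.no_grad():
            target = model(ids).logits[:, -1].softmax(-1)  # model's prediction
        pred = lens(captured["head"][:, -1])               # lens's prediction
        loss = F.kl_div(pred.log_softmax(-1), target, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()

# After training, the lens's top tokens suggest what this head promotes.
ids = tok(prompts[0], return_tensors="pt").input_ids
with torch.no_grad():
    model(ids)
top = lens(captured["head"][0, -1]).topk(5).indices
print([tok.decode([int(t)]) for t in top])
```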