Do Localization Methods Actually Localize Memorized Data in LLMs?

Chang, Ting-Yun; Jia, Robin; Thomason, Jesse

Publication date: 15 November 2023
Abstract

Large language models (LLMs) can memorize many pretrained sequences verbatim. This paper studies whether we can locate a small set of neurons in LLMs responsible for memorizing a given sequence. While the concept of localization is often mentioned in prior work, methods for localization have never been systematically and directly evaluated; we address this with two benchmarking approaches. In our INJ Benchmark, we actively inject a piece of new information into a small subset of LLM weights and measure whether localization methods can identify these "ground truth" weights. In the DEL Benchmark, we study localization of pretrained data that LLMs have already memorized; while this setting lacks ground truth, we can still evaluate localization by measuring whether dropping out located neurons erases a memorized sequence from the model. We evaluate five localization methods on our two benchmarks, and both benchmarks yield similar rankings. All methods exhibit promising localization ability, especially pruning-based methods, though the neurons they identify are not necessarily specific to a single memorized sequence.
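
To make the INJ Benchmark idea concrete, here is a minimal sketch of how a localization method could be scored once ground-truth neurons are known: rank neurons by the method's attribution score and check how many ground-truth neurons land in the top k. The function name recall_at_k, the toy scores, and the choice of recall as the metric are assumptions for illustration only; this is not the authors' code.

```python
def recall_at_k(scores, ground_truth, k):
    """Fraction of ground-truth neurons found among the k highest-scoring neurons.

    scores: list of per-neuron attribution scores (hypothetical values).
    ground_truth: set of neuron indices that received the injected information.
    """
    if not ground_truth:
        raise ValueError("ground_truth must be non-empty")
    # Rank neuron indices by attribution score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & set(ground_truth)) / len(ground_truth)


# Toy example: 8 hypothetical neurons, with information "injected" into neurons 1 and 5.
scores = [0.1, 0.9, 0.2, 0.05, 0.3, 0.7, 0.0, 0.4]
injected = {1, 5}
print(recall_at_k(scores, injected, k=2))  # 1.0: both injected neurons rank in the top 2
```

The DEL Benchmark has no such ground truth, so a score of this kind does not apply there; the abstract instead describes checking whether dropping out the located neurons erases the memorized sequence.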

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2311.09060

Last updated on 10/02/2024