Gradient Sketches for Training Data Attribution and Studying the Loss Landscape
Random projections or sketches of gradients and Hessian-vector products play
an essential role in applications where one needs to store many such vectors
while retaining accurate information about their relative geometry. Two
important scenarios are training data attribution (tracing a model's behavior
to the training data), where one needs to store a gradient for each training
example, and the study of the spectrum of the Hessian (to analyze the training
dynamics), where one needs to store multiple Hessian-vector products. While
sketches that use dense matrices are easy to implement, they are memory-bound
and cannot be scaled to modern neural networks. Motivated by work on the
intrinsic dimension of neural networks, we propose and study a design space for
scalable sketching algorithms. We demonstrate the efficacy of our approach in
three applications: training data attribution, the analysis of the Hessian
spectrum, and the computation of the intrinsic dimension when fine-tuning
pre-trained language models.
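
For intuition, here is a minimal sketch of the dense baseline the abstract alludes to: a Gaussian random projection that approximately preserves inner products between per-example gradients. The dimensions, the toy gradients, and all names are illustrative assumptions, not the paper's proposed construction.

```python
import numpy as np

# Toy stand-ins for per-example gradients: n vectors in a d-dimensional
# parameter space (real models have d in the millions or billions).
rng = np.random.default_rng(0)
d, k, n = 10_000, 256, 8
grads = rng.normal(size=(n, d))

# Dense Gaussian sketch: with i.i.d. N(0, 1/k) entries in S,
# E[<Sx, Sy>] = <x, y>, so relative geometry is preserved in expectation.
# This is the memory-bound baseline: S alone takes d * k floats to store.
S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))
sketched = grads @ S.T  # keep only the k-dimensional sketches

# Compare pairwise inner products before and after sketching.
exact = grads @ grads.T
approx = sketched @ sketched.T
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative Frobenius error of Gram matrix: {rel_err:.3f}")
```

The memory bottleneck is the projection matrix itself: for d of one billion and k = 4096, storing S in 32-bit floats already takes roughly 16 TB, which is why the abstract argues for scalable sketching designs rather than dense matrices.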