Low-rank approximations of the weight and feature space can enhance the
performance of deep learning models, whether by improving generalization or
reducing inference latency. However, there is no clear
consensus yet on \emph{how}, \emph{when} and \emph{why} these approximations
are helpful for large language models (LLMs). In this work, we empirically
study the efficacy of weight and feature space decomposition in
transformer-based LLMs. We demonstrate that surgical decomposition not only
provides critical insights into the trade-off between compression and language
modelling performance, but also sometimes enhances commonsense reasoning
performance of LLMs. Our empirical analysis identifies specific network
segments that intrinsically exhibit a low-rank structure. Furthermore, we
extend our investigation to the implications of low-rank approximations on
model bias. Overall, our findings offer a novel perspective on optimizing LLMs,
presenting the low-rank approximation not only as a tool for performance
enhancements, but also as a means to potentially rectify biases within these
models. Our code is available at
\href{https://github.com/nyunAI/SFSD-LLM}{GitHub}.
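% For context only: a minimal sketch of the standard low-rank approximation that
% underlies such weight/feature-space decompositions (truncated SVD, written in
% generic symbols rather than the paper's own notation and method):
\[
W \;\approx\; W_r \;=\; U_r \Sigma_r V_r^{\top},
\qquad W \in \mathbb{R}^{m \times n},\;
U_r \in \mathbb{R}^{m \times r},\;
\Sigma_r \in \mathbb{R}^{r \times r},\;
V_r \in \mathbb{R}^{n \times r},
\]
% where $r \ll \min(m, n)$. By the Eckart--Young theorem, $W_r$ minimizes
% $\lVert W - W_r \rVert_F$ over all rank-$r$ matrices, and storing the factors
% $U_r \Sigma_r$ and $V_r$ requires $r(m + n)$ parameters instead of $mn$.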