Overparameterized models have proven to be powerful tools for solving various
machine learning tasks. However, overparameterization often incurs substantial
computational and memory costs, making such models expensive to train. In this
work, we present a novel approach for
compressing overparameterized models, developed through studying their learning
dynamics. We observe that for many deep models, updates to the weight matrices
occur within a low-dimensional invariant subspace. For deep linear models, we
demonstrate that their principal components are fitted incrementally within a
small subspace, and use these insights to propose a compression algorithm for
deep linear networks that involves decreasing the width of their intermediate
layers. We empirically evaluate the effectiveness of our compression technique
on matrix recovery problems. Remarkably, by using an initialization that
exploits the structure of the problem, we observe that our compressed network
converges faster than the original network, consistently yielding smaller
recovery errors. We substantiate this observation by developing a theory
focused on deep matrix factorization. Finally, we empirically demonstrate how
our compressed model has the potential to improve the utility of deep nonlinear
models. Overall, our algorithm improves training efficiency by more than 2x,
without compromising generalization.

Comment: International Conference on Artificial Intelligence and Statistics
(AISTATS 2024)
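
To make the width-reduction idea concrete, below is a minimal sketch of a deep linear (bias-free, no nonlinearities) network whose intermediate layers are narrowed from the ambient dimension to a small width r. The particular dimensions, depth, and the choice r = 5 are illustrative assumptions, not the paper's prescription, and the structure-exploiting initialization discussed above is omitted.

```python
import torch.nn as nn

def deep_linear_network(in_dim, out_dim, hidden_width, depth):
    """Deep linear network: a product of `depth` weight matrices
    (no nonlinearities), with all intermediate layers of size `hidden_width`."""
    dims = [in_dim] + [hidden_width] * (depth - 1) + [out_dim]
    layers = [nn.Linear(dims[i], dims[i + 1], bias=False) for i in range(depth)]
    return nn.Sequential(*layers)

def num_params(model):
    return sum(p.numel() for p in model.parameters())

# Overparameterized network: intermediate layers as wide as the data dimension.
full = deep_linear_network(in_dim=100, out_dim=100, hidden_width=100, depth=4)

# Compressed network: intermediate width shrunk to r (here r = 5 is an assumed
# estimate of the target rank, not a value taken from the paper).
compressed = deep_linear_network(in_dim=100, out_dim=100, hidden_width=5, depth=4)

print(num_params(full), num_params(compressed))  # 40000 vs. 1050
```

As the parameter counts suggest, narrowing the intermediate layers shrinks the model by more than an order of magnitude in this toy setting; the abstract's claim is that, with a suitable initialization, such a compressed network can match or exceed the original on matrix recovery tasks.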