Efficient NTK using Dimensionality Reduction
Recently, the neural tangent kernel (NTK) has been used to explain the dynamics
of parameter learning in neural networks in the large-width limit. Quantitative
NTK analyses, however, prescribe network widths that are often impractical and
incur high time and energy costs in both training and deployment. Using a
matrix factorization technique, we show how to obtain guarantees similar to
those of a prior analysis while reducing training and inference resource costs.
Our result becomes even more important when the data dimension of the input
points is of the same order as the number of input points. More generally, our
work suggests how to analyze large-width networks in which dense linear layers
are replaced by a low-complexity factorization, thus reducing the heavy
dependence on large width.
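As a rough illustration of the kind of low-complexity factorization alluded to above (the paper's specific construction and guarantees are not reproduced here), the sketch below replaces a dense weight matrix with a rank-r product obtained from a truncated SVD; the dimensions and rank are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 512, 512, 32            # layer dimensions and target rank (placeholders)

W = rng.standard_normal((m, n)) / np.sqrt(n)   # dense weights of a wide linear layer

# Truncated SVD keeps the top-r singular directions: W ~= A @ B, A: m x r, B: r x n.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]              # absorb singular values into the left factor
B = Vt[:r, :]

x = rng.standard_normal(n)
y_dense = W @ x                   # O(m*n) multiply-adds
y_factored = A @ (B @ x)          # O((m+n)*r) multiply-adds
print("relative error:", np.linalg.norm(y_dense - y_factored) / np.linalg.norm(y_dense))
```

The factored form cuts a matvec from O(mn) to O((m+n)r) operations, which is where the training and inference savings come from when r is much smaller than the width.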
Data pruning and neural scaling laws: fundamental limitations of score-based algorithms
Data pruning algorithms are commonly used to reduce the memory and
computational cost of the optimization process. Recent empirical results reveal
that random data pruning remains a strong baseline and outperforms most
existing data pruning methods in the high-compression regime, i.e., where only
a small fraction of the data is kept. This regime has recently attracted a lot
of interest because of the role of data pruning in improving the so-called
neural scaling laws: in [Sorscher et al.], the authors showed that high-quality
data pruning algorithms are needed to beat the sample power law.
In this work, we focus on score-based data pruning algorithms and show
theoretically and empirically why such algorithms fail in the high-compression
regime. We demonstrate "No Free Lunch" theorems for data pruning and present
calibration protocols that use randomization to enhance the performance of
existing pruning algorithms in this regime.
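A minimal, hypothetical sketch of what a randomized calibration of a score-based pruner could look like: part of the keep-budget is filled by top score, the rest uniformly at random. The function name, the `random_frac` mixing knob, and the synthetic scores are illustrative assumptions, not the paper's actual protocol.

```python
import numpy as np

def calibrated_prune(scores, keep_frac, random_frac=0.5, seed=0):
    """Keep `keep_frac` of the data: part by top score, part uniformly at random.

    `random_frac` is a hypothetical mixing knob: at 0.0 this reduces to pure
    score-based pruning, at 1.0 to pure random pruning.
    """
    rng = np.random.default_rng(seed)
    n = len(scores)
    n_keep = max(1, int(keep_frac * n))
    n_rand = int(random_frac * n_keep)

    order = np.argsort(scores)[::-1]                      # highest score first
    top = order[: n_keep - n_rand]                        # score-based part
    rest = order[n_keep - n_rand:]
    rand = rng.choice(rest, size=n_rand, replace=False)   # randomized part
    return np.concatenate([top, rand])

scores = np.random.default_rng(1).random(10_000)
kept = calibrated_prune(scores, keep_frac=0.05)           # high-compression regime
print(len(kept))
```

The design intuition is that pure score-based selection concentrates on a narrow slice of the data distribution, which is exactly what hurts at high compression; mixing in random examples restores coverage.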
One-shot Network Pruning at Initialization with Discriminative Image Patches
One-shot Network Pruning at Initialization (OPaI) is an effective method for
decreasing network pruning costs. Recently, there has been a growing belief
that data is unnecessary in OPaI. However, our ablation experiments on two
representative OPaI methods, SNIP and GraSP, lead to the opposite conclusion.
Specifically, we find that informative data is crucial to enhancing pruning
performance. In this paper, we propose two novel methods, Discriminative
One-shot Network Pruning (DOP) and Super Stitching, which prune the network
using high-level, visually discriminative image patches. Our contributions are
as follows. (1) Extensive experiments reveal that OPaI is data-dependent.
(2) Super Stitching performs significantly better than the original OPaI
methods on the ImageNet benchmark, especially for highly compressed models.
Comment: BMVC 202
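For concreteness, here is a minimal sketch of SNIP-style one-shot pruning at initialization, scoring each connection by the saliency |g * w| computed from a single batch of data; that batch is exactly where data-dependence enters, and it is what the paper replaces with discriminative image patches. The toy MLP and random inputs are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))  # toy stand-in

x = torch.randn(32, 64)            # placeholder batch; informative patches would go here
y = torch.randint(0, 10, (32,))

loss = F.cross_entropy(net(x), y)
grads = torch.autograd.grad(loss, list(net.parameters()))

# SNIP saliency per connection: |dL/dw * w|; one-shot prune the lowest-saliency weights.
saliency = torch.cat([(g * p).abs().flatten()
                      for g, p in zip(grads, net.parameters())])
k = int(0.10 * saliency.numel())   # keep 10% of the weights (high compression)
threshold = torch.topk(saliency, k).values.min()

masks = [((g * p).abs() >= threshold).float()
         for g, p in zip(grads, net.parameters())]
kept = int(sum(m.sum().item() for m in masks))
print(f"kept {kept} of {saliency.numel()} weights")
```

Swapping the random batch for carefully chosen patches changes the gradients, hence the saliencies, hence the resulting mask, which is why the choice of data matters for one-shot pruning at initialization.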