16 research outputs found
Structured Multi-Hashing for Model Compression
Despite the success of deep neural networks (DNNs), state-of-the-art models
are too large to deploy on low-resource devices or common server configurations
in which multiple models are held in memory. Model compression methods address
this limitation by reducing the memory footprint, latency, or energy
consumption of a model with minimal impact on accuracy. We focus on the task of
reducing the number of learnable variables in the model. In this work we
combine ideas from weight hashing and dimensionality reduction, resulting in a
simple and powerful structured multi-hashing method based on matrix products
that allows direct control over the size of any deep network and is trained
end-to-end. We demonstrate the strength of our approach by compressing models
from the ResNet, EfficientNet, and MobileNet architecture families. Our method
allows us to drastically decrease the number of variables while maintaining
high accuracy. For instance, by applying our approach to EfficientNet-B4 (16M
parameters) we reduce it to the size of B0 (5M parameters), while gaining
over 3% in accuracy over the B0 baseline. On the commonly used CIFAR-10 benchmark we
reduce the ResNet32 model by 75% with no loss in quality, and achieve a 10x
compression while still maintaining above 90% accuracy.
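To make the hashing idea concrete, below is a minimal PyTorch sketch of hash-based weight sharing, the building block the abstract refers to: a large virtual weight matrix is generated from a much smaller trainable pool through a fixed pseudo-random index map, so the parameter count is controlled directly by the pool size. The class name HashedLinear, the pool size, and the single-hash construction are illustrative assumptions; the paper's structured multi-hashing additionally imposes a matrix-product structure that is not reproduced here.

import torch
import torch.nn as nn

class HashedLinear(nn.Module):
    def __init__(self, in_features, out_features, pool_size, seed=0):
        super().__init__()
        # Small trainable parameter pool that backs the virtual weight matrix.
        self.pool = nn.Parameter(0.01 * torch.randn(pool_size))
        # Fixed (non-trainable) hash: map every virtual weight to a pool slot.
        g = torch.Generator().manual_seed(seed)
        idx = torch.randint(0, pool_size, (out_features, in_features), generator=g)
        self.register_buffer("idx", idx)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w = self.pool[self.idx]  # materialize the virtual (out, in) weight matrix
        return x @ w.t() + self.bias

# A 512x512 layer (262,144 virtual weights) backed by only 8,192 parameters.
layer = HashedLinear(512, 512, pool_size=8192)
y = layer(torch.randn(4, 512))

Because the index map is fixed and only the pool is trainable, the whole construction is differentiable and trains end-to-end like an ordinary layer.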
ELRT: Efficient Low-Rank Training for Compact Convolutional Neural Networks
Low-rank compression, a popular model compression technique that produces
compact convolutional neural networks (CNNs) with low-rank weight structure,
has been well studied in the literature. By contrast, low-rank training, an
alternative way to train low-rank CNNs from scratch, has so far received little
attention. Unlike low-rank compression, low-rank training does not need pre-trained
full-rank models, and the entire training phase is always performed on the
low-rank structure, bringing attractive benefits for practical applications.
However, the existing low-rank training solutions still face several
challenges, such as a considerable accuracy drop and/or the need to update
full-size models during training. In this paper, we perform a systematic
investigation of low-rank CNN training. By identifying the proper low-rank
format and performance-improving strategy, we propose ELRT, an efficient
low-rank training solution for high-accuracy, high-compactness, low-rank CNN
models. Our extensive evaluation results for training various CNNs on different
datasets demonstrate the effectiveness of ELRT.
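As a concrete reference point for what training "on the low-rank structure" means, here is a generic PyTorch sketch of a rank-r convolution trained from scratch: the full-rank kernel is never materialized, only its two factors are stored and updated. The two-factor scheme, the class name LowRankConv2d, and the rank choice are illustrative assumptions, not ELRT's specific low-rank format or performance-improving strategies.

import torch
import torch.nn as nn

class LowRankConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k, rank, stride=1, padding=0):
        super().__init__()
        # Factor V: a k x k convolution projecting into the rank-r subspace.
        self.v = nn.Conv2d(in_ch, rank, k, stride=stride, padding=padding, bias=False)
        # Factor U: a 1x1 convolution expanding back to out_ch channels.
        self.u = nn.Conv2d(rank, out_ch, 1, bias=True)

    def forward(self, x):
        return self.u(self.v(x))

# Rank-16 3x3 conv: 64*16*9 + 16*128 = 11,264 weights vs. 64*128*9 = 73,728 full-rank.
conv = LowRankConv2d(64, 128, k=3, rank=16, padding=1)
y = conv(torch.randn(2, 64, 32, 32))

Since both factors are ordinary layers, the model stays compact throughout training and no full-size weights are ever updated, which is the practical benefit the abstract highlights.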