Pruning Convolutional Neural Networks with Self-Supervision
Convolutional neural networks trained without supervision come close to matching the performance of supervised pre-training, but sometimes at the cost of an even larger number of parameters. Extracting subnetworks from these large
unsupervised convnets with preserved performance is of particular interest to
make them less computationally intensive. Typical pruning methods operate
during training on a task while trying to maintain the performance of the
pruned network on the same task. However, in self-supervised feature learning,
the training objective is agnostic to how well the learned representation transfers to downstream tasks. Thus, preserving performance on this objective does not
ensure that the pruned subnetwork remains effective for solving downstream
tasks. In this work, we investigate the use of standard pruning methods,
developed primarily for supervised learning, for networks trained without
labels (i.e. on self-supervised tasks). We show that pruned masks obtained with
or without labels reach comparable performance when re-trained on labels,
suggesting that pruning operates similarly for self-supervised and supervised
learning. Interestingly, we also find that pruning preserves the transfer
performance of self-supervised subnetwork representations.
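A minimal sketch of the label-agnostic part of this pipeline, assuming standard global magnitude pruning in PyTorch: the mask is computed purely from the weights of a (here hypothetical) self-supervised backbone, and labels only enter during the subsequent re-training that keeps the mask fixed.

```python
# Minimal sketch (not the paper's code): compute a global magnitude-pruning
# mask from a convnet pretrained without labels, then keep it applied while
# the network is re-trained with labels. Model and sparsity are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models

def magnitude_masks(model: nn.Module, sparsity: float = 0.9):
    """Binary masks keeping the globally largest-magnitude conv/linear weights."""
    layers = [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    scores = torch.cat([m.weight.detach().abs().flatten() for m in layers])
    k = max(1, int(sparsity * scores.numel()))
    threshold = scores.kthvalue(k).values            # global magnitude threshold
    return [(m.weight.detach().abs() > threshold).float() for m in layers]

def apply_masks(model: nn.Module, masks):
    """Zero pruned weights in place; call after every optimizer step."""
    layers = [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    for layer, mask in zip(layers, masks):
        layer.weight.data.mul_(mask)

# The mask depends only on the self-supervised weights; labels are needed
# only for the re-training phase that follows.
backbone = models.resnet18()        # stand-in for a self-supervised convnet
masks = magnitude_masks(backbone, sparsity=0.9)
apply_masks(backbone, masks)
```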
COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of Convolutional Neural Networks
Pruning refers to the elimination of trivial weights from neural networks.
The sub-networks produced by pruning an overparameterized model are often called lottery tickets. This research aims to generate winning lottery
tickets from a set of lottery tickets that can achieve similar accuracy to the
original unpruned network. We introduce a novel winning ticket called Cyclic
Overlapping Lottery Ticket (COLT) by data splitting and cyclic retraining of
the pruned network from scratch. We apply a cyclic pruning algorithm that keeps
only the overlapping weights of different pruned models trained on different
data segments. Our results demonstrate that COLT can achieve accuracies similar to those of the unpruned model while maintaining high sparsity. We show
that the accuracy of COLT is on par with the winning tickets of Lottery Ticket
Hypothesis (LTH) and, at times, is better. Moreover, COLTs can be generated
using fewer iterations than tickets generated by the popular Iterative
Magnitude Pruning (IMP) method. In addition, we observe that COLTs generated on large datasets can be transferred to small ones without compromising performance, demonstrating their generalization capability. We conduct all our experiments on the CIFAR-10, CIFAR-100, and TinyImageNet datasets and report superior performance to state-of-the-art methods.
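A minimal sketch of the overlap step as we read it from the abstract, assuming hypothetical `train_on_split` and `magnitude_mask` helpers for the per-split training and per-copy pruning; COLT's actual cyclic schedule and hyperparameters are not reproduced here.

```python
# Minimal sketch of the cyclic-overlap idea: train copies of the network on
# different data splits, prune each copy by magnitude, and keep only the
# weights that survive in every copy. train_on_split and magnitude_mask are
# hypothetical stand-ins for the training and per-copy pruning routines.
from typing import Callable, Dict, List
import torch

Mask = Dict[str, torch.Tensor]   # parameter name -> binary mask

def overlap_mask(masks: List[Mask]) -> Mask:
    """A weight is kept only if every split-specific ticket keeps it."""
    keep = {name: m.clone() for name, m in masks[0].items()}
    for mask in masks[1:]:
        for name, m in mask.items():
            keep[name] *= m
    return keep

def colt_round(init_state: dict, splits: list, prune_ratio: float,
               train_on_split: Callable, magnitude_mask: Callable) -> Mask:
    """One round: train per split from the same initialization, prune, intersect."""
    per_split_masks = []
    for split in splits:
        trained_state = train_on_split(init_state, split)       # train one copy
        per_split_masks.append(magnitude_mask(trained_state, prune_ratio))
    return overlap_mask(per_split_masks)    # the overlapping (COLT) mask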
Winning Lottery Tickets in Deep Generative Models
The lottery ticket hypothesis suggests that sparse sub-networks of a given neural network, if initialized properly, can be trained to reach performance comparable to, or even better than, that of the original network. Prior works in lottery
tickets have primarily focused on the supervised learning setup, with several
papers proposing effective ways of finding "winning tickets" in classification
problems. In this paper, we confirm the existence of winning tickets in deep
generative models such as GANs and VAEs. We show that the popular iterative
magnitude pruning approach (with late rewinding) can be used with generative
losses to find the winning tickets. This approach effectively yields tickets
with sparsity up to 99% for AutoEncoders, 93% for VAEs and 89% for GANs on
CIFAR and Celeb-A datasets. We also demonstrate the transferability of winning
tickets across different generative models (GANs and VAEs) sharing the same
architecture, suggesting that winning tickets have inductive biases that could
help train a wide range of deep generative models. Furthermore, we show the
practical benefits of lottery tickets in generative models by detecting tickets
at very early stages in training called "early-bird tickets". Through
early-bird tickets, we can achieve up to 88% reduction in floating-point
operations (FLOPs) and 54% reduction in training time, making it possible to
train large-scale generative models under tight resource constraints. These results outperform existing early pruning methods like SNIP (Lee, Ajanthan, and Torr 2019) and GraSP (Wang, Zhang, and Grosse 2020). Our findings shed light on the existence of proper network initializations that could improve the convergence and stability of generative models.
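A minimal sketch of iterative magnitude pruning with late rewinding as the abstract describes it, assuming hypothetical `train_one_epoch` (optimizing a generative loss such as a GAN or VAE objective) and `magnitude_mask` helpers; the round count, rewind epoch, and per-round pruning ratio are placeholders, not the paper's settings.

```python
# Minimal sketch of IMP with late rewinding for a generative model. The
# training and pruning helpers, the rewind epoch, and the schedule are
# placeholders, not the paper's actual settings.
import copy

def imp_late_rewind(model, train_one_epoch, magnitude_mask,
                    rounds=5, epochs_per_round=20, rewind_epoch=5,
                    per_round_ratio=0.2):
    mask = None            # no weights pruned in the first round
    rewind_state = None
    for r in range(rounds):
        for epoch in range(epochs_per_round):
            train_one_epoch(model, mask)                  # GAN/VAE loss, mask applied
            if r == 0 and epoch == rewind_epoch:
                rewind_state = copy.deepcopy(model.state_dict())  # late rewind point
        mask = magnitude_mask(model, per_round_ratio, mask)   # prune more weights
        model.load_state_dict(rewind_state)               # rewind surviving weights
    return mask, rewind_state                             # the winning ticket
```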
A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations
Modern deep neural networks, particularly recent large language models, come
with massive model sizes that require significant computational and storage
resources. To enable the deployment of modern models on resource-constrained
environments and accelerate inference time, researchers have increasingly
explored pruning techniques as a popular research direction in neural network
compression. However, there is a dearth of up-to-date comprehensive review
papers on pruning. To address this issue, in this survey, we provide a
comprehensive review of existing research works on deep neural network pruning
in a taxonomy of 1) universal/specific speedup, 2) when to prune, 3) how to
prune, and 4) fusion of pruning and other compression techniques. We then
provide a thorough comparative analysis of seven pairs of contrast settings for
pruning (e.g., unstructured/structured) and explore emerging topics, including
post-training pruning, different levels of supervision for pruning, and broader
applications (e.g., adversarial robustness) to shed light on the commonalities
and differences of existing methods and lay the foundation for further method
development. To facilitate future research, we build a curated collection of
datasets, networks, and evaluations on different applications. Finally, we
provide some valuable recommendations on selecting pruning methods and outline promising research directions. We build a repository at
https://github.com/hrcheng1066/awesome-pruning
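One of the contrast settings the survey compares, unstructured versus structured pruning, is easy to make concrete; the sketch below uses an illustrative convolution layer and a 50% ratio and is not tied to any particular method covered by the survey.

```python
# Minimal sketch of the unstructured/structured contrast on a single layer:
# unstructured pruning zeroes individual weights (irregular sparsity), while
# structured pruning removes whole output channels (a genuinely smaller layer).
# The layer sizes and the 50% ratio are illustrative only.
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3)
w = conv.weight.detach()                                   # shape [128, 64, 3, 3]

# Unstructured: mask the smallest-magnitude individual weights.
k = int(0.5 * w.numel())
threshold = w.abs().flatten().kthvalue(k).values
unstructured_mask = (w.abs() > threshold).float()          # same shape as w

# Structured: score each output channel by its L1 norm and keep the top half.
channel_scores = w.abs().sum(dim=(1, 2, 3))                # one score per channel
keep = channel_scores.topk(64).indices                     # keep 64 of 128 channels
pruned_weight = w[keep]                                    # shape [64, 64, 3, 3]
```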
Playing Lottery Tickets in Style Transfer Models
Style transfer has achieved great success and attracted a wide range of
attention from both academic and industrial communities due to its flexible
application scenarios. However, the dependence on a large VGG-based autoencoder gives existing style transfer models high parameter complexity, which limits their application on resource-constrained devices.
Compared with many other tasks, the compression of style transfer models has
been less explored. Recently, the lottery ticket hypothesis (LTH) has shown
great potential in finding extremely sparse matching subnetworks which can
achieve on par or even better performance than the original full networks when
trained in isolation. In this work, we for the first time perform an empirical
study to verify whether such trainable matching subnetworks also exist in style
transfer models. Specifically, we take two of the most popular style transfer models, i.e., AdaIN and SANet, as the main testbeds, which represent global and local transformation-based style transfer methods, respectively. We carry out
extensive experiments and comprehensive analysis, and draw the following
conclusions. (1) Compared with fixing the VGG encoder, style transfer models
can benefit more from training the whole network together. (2) Using iterative
magnitude pruning, we find the matching subnetworks at 89.2% sparsity in AdaIN
and 73.7% sparsity in SANet, which demonstrates that style transfer models can
play lottery tickets too. (3) The feature transformation module should also be
pruned to obtain a much sparser model without affecting the existence and
quality of the matching subnetworks. (4) Besides AdaIN and SANet, other models
such as LST, MANet, AdaAttN and MCCNet can also play lottery tickets, which
shows that LTH can be generalized to various style transfer models.
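A minimal sketch of what conclusion (3) amounts to in practice: when building a global magnitude mask, the feature-transformation module's conv/linear weights are included in the prunable set rather than masking only the encoder/decoder. The submodule naming is hypothetical; AdaIN- and SANet-style models define their own module names.

```python
# Minimal sketch: include the feature-transformation module in the set of
# prunable tensors when computing a global magnitude mask. The "transform"
# submodule name is a hypothetical stand-in for whatever the style transfer
# model (AdaIN- or SANet-style) actually calls it.
import torch
import torch.nn as nn

def prunable_weights(model: nn.Module, include_transform: bool = True):
    """Collect conv/linear weights, optionally skipping the transform module."""
    weights = []
    for name, module in model.named_modules():
        if not include_transform and name.startswith("transform"):
            continue
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            weights.append((name, module.weight))
    return weights

def global_magnitude_mask(weights, sparsity: float):
    """One global threshold across every prunable tensor."""
    scores = torch.cat([w.detach().abs().flatten() for _, w in weights])
    k = max(1, int(sparsity * scores.numel()))
    threshold = scores.kthvalue(k).values
    return {name: (w.detach().abs() > threshold).float() for name, w in weights}
```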