PDP: Parameter-free Differentiable Pruning is All You Need
DNN pruning is a popular way to reduce the size of a model, improve the
inference latency, and minimize the power consumption on DNN accelerators.
However, existing approaches can be too complex, too expensive, or too
ineffective to apply across diverse vision/language tasks and DNN
architectures, or to honor structured pruning constraints. In this paper, we propose an efficient yet
effective train-time pruning scheme, Parameter-free Differentiable Pruning
(PDP), which offers state-of-the-art qualities in model size, accuracy, and
training cost. PDP uses a dynamic function of weights during training to
generate soft pruning masks for the weights in a parameter-free manner for a
given pruning target. While remaining differentiable, PDP is simple and
efficient enough to deliver state-of-the-art
random/structured/channel pruning results on various vision and natural
language tasks. For example, for MobileNet-v1, PDP can achieve 68.2% top-1
ImageNet1k accuracy at 86.6% sparsity, 1.7% higher than the best
state-of-the-art result. Also, PDP yields over 83.1% accuracy on
Multi-Genre Natural Language Inference with 90% sparsity for BERT, while the
next best from the existing techniques shows 81.5% accuracy. In addition, PDP
can be applied to structured pruning, such as N:M pruning and channel pruning.
For 1:4 structured pruning of ResNet18, PDP improved the top-1 ImageNet1k
accuracy by over 3.6% relative to the state-of-the-art. For channel pruning of
ResNet50, PDP's top-1 ImageNet1k accuracy is 0.6% below the state-of-the-art.
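
The abstract does not spell out PDP's masking function; as a rough,
hypothetical sketch of the general idea of parameter-free soft masks derived
from the weights themselves, the following PyTorch snippet builds a sigmoid
mask around a sparsity-target quantile of the squared weights. The temperature
`tau` and the quantile-based threshold are illustrative assumptions, not the
paper's formulation.

```python
import torch

def soft_pruning_mask(weight: torch.Tensor, sparsity: float, tau: float = 0.01):
    """Differentiable soft mask computed from the weights themselves.

    Parameter-free in the sense that no extra mask variables are trained:
    the threshold is just the `sparsity`-quantile of the squared weights.
    """
    w2 = weight.detach().pow(2)
    threshold = torch.quantile(w2.flatten(), sparsity)  # target-driven cutoff
    # Sigmoid soft mask: ~0 for small weights, ~1 for large ones, and
    # differentiable in the weights; tau controls how sharp the transition is.
    return torch.sigmoid((weight.pow(2) - threshold) / tau)

# Usage: multiply weights by the soft mask in the forward pass during training.
w = torch.randn(256, 128, requires_grad=True)
mask = soft_pruning_mask(w, sparsity=0.9)
w_pruned = w * mask  # gradients flow through both w and the mask
```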
Emerging Paradigms of Neural Network Pruning
Over-parameterization of neural networks benefits optimization and
generalization yet incurs cost in practice. Pruning is adopted as a
post-processing solution to this problem, aiming to remove unnecessary
parameters from a neural network with little compromise in performance. It has
long been believed that the resulting sparse neural network cannot be trained
from scratch to comparable accuracy. However, several recent works (e.g.,
[Frankle and Carbin, 2019a]) challenge this belief by discovering random sparse
networks that can be trained to match the performance of their dense
counterparts. This new pruning paradigm has since inspired a number of methods
for pruning at initialization. In spite of this encouraging progress, how to
reconcile these new pruning paradigms with traditional pruning has not yet been
explored. This survey seeks to bridge the gap by proposing a general pruning
framework in which the emerging pruning paradigms can be accommodated alongside
the traditional one. Within this framework, we systematically reflect on the
major differences and new insights brought by these new paradigms, discussing
representative works at length. Finally, we summarize the open questions as
worthy future directions.
SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
Neural network pruning is a key technique towards engineering large yet
scalable, interpretable, and generalizable models. Prior work on the subject
has developed largely along two orthogonal directions: (1) differentiable
pruning for efficiently and accurately scoring the importance of parameters,
and (2) combinatorial optimization for efficiently searching over the space of
sparse models. We unite the two approaches, both theoretically and empirically,
to produce a coherent framework for structured neural network pruning in which
differentiable pruning guides combinatorial optimization algorithms to select
the most important sparse set of parameters. Theoretically, we show how many
existing differentiable pruning techniques can be understood as nonconvex
regularization for group sparse optimization, and prove that for a wide class
of nonconvex regularizers, the global optimum is unique, group-sparse, and
provably yields an approximate solution to a sparse convex optimization
problem. The resulting algorithm that we propose, SequentialAttention++,
advances the state of the art in large-scale neural network block-wise pruning
tasks on the ImageNet and Criteo datasets.
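
The abstract unites differentiable scoring with combinatorial selection; the
sketch below is a loose, hypothetical illustration of that interplay, not the
SequentialAttention++ algorithm itself: trainable block logits give a
softmax-normalized (attention-style) importance score for each row-block of a
weight matrix, and a combinatorial top-k step then selects the blocks to keep.
The name `block_logits` and the plain top-k rule are assumptions.

```python
import torch

def select_blocks(weight: torch.Tensor, block_logits: torch.Tensor, k: int):
    """Differentiable block scores guiding a combinatorial top-k selection."""
    num_blocks = block_logits.numel()
    blocks = weight.view(num_blocks, -1)
    # Differentiable importance: softmax scores rescale each block in training.
    scores = torch.softmax(block_logits, dim=0)
    soft_weight = (blocks * scores.unsqueeze(1)).view_as(weight)
    # Combinatorial step: keep only the k highest-scoring blocks (hard mask).
    keep = torch.topk(scores, k).indices
    hard_mask = torch.zeros(num_blocks)
    hard_mask[keep] = 1.0
    hard_weight = (blocks * hard_mask.unsqueeze(1)).view_as(weight)
    return soft_weight, hard_weight

w = torch.randn(64, 128)                 # treat each of the 64 rows as a block
logits = torch.zeros(64, requires_grad=True)
soft_w, hard_w = select_blocks(w, logits, k=16)
```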
Information-Theoretic GAN Compression with Variational Energy-based Model
We propose an information-theoretic knowledge distillation approach for the
compression of generative adversarial networks, which aims to maximize the
mutual information between teacher and student networks via a variational
optimization based on an energy-based model. Because the direct computation of
the mutual information in continuous domains is intractable, our approach
alternatively optimizes the student network by maximizing the variational lower
bound of the mutual information. To achieve a tight lower bound, we introduce
an energy-based model relying on a deep neural network to represent a flexible
variational distribution that handles high-dimensional images and effectively
captures spatial dependencies between pixels. Since the proposed method is
a generic optimization algorithm, it can be conveniently incorporated into
arbitrary generative adversarial networks and even dense prediction networks,
e.g., image enhancement models. We demonstrate that the proposed algorithm
consistently achieves outstanding performance in model compression of
generative adversarial networks when combined with several existing models.
Comment: Accepted at NeurIPS 202
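
The abstract leaves the energy-based variational distribution unspecified; as
a simplified sketch of the underlying variational bound, I(t; s) >= H(t) +
E[log q(t|s)], the code below uses a Gaussian q with a learned variance (in
the spirit of variational information distillation) rather than the paper's
energy-based model. The layer sizes and the Gaussian choice are assumptions.

```python
import torch
import torch.nn as nn

class GaussianVariationalMI(nn.Module):
    """Variational lower bound on I(teacher; student) with a Gaussian q(t|s)."""

    def __init__(self, s_dim: int, t_dim: int):
        super().__init__()
        self.mean = nn.Linear(s_dim, t_dim)               # predicts E[t | s]
        self.log_var = nn.Parameter(torch.zeros(t_dim))   # learned variance

    def forward(self, s_feat: torch.Tensor, t_feat: torch.Tensor):
        # Negative log-likelihood of teacher features under q(t | s);
        # minimizing it maximizes E[log q(t | s)] and hence the MI bound.
        mu = self.mean(s_feat)
        nll = 0.5 * ((t_feat - mu).pow(2) / self.log_var.exp() + self.log_var)
        return nll.sum(dim=1).mean()

# Usage: add this term to the student generator's training objective.
crit = GaussianVariationalMI(s_dim=64, t_dim=256)
loss = crit(torch.randn(8, 64), torch.randn(8, 256))  # (student, teacher) feats
loss.backward()
```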
OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators
Compressing a predefined deep neural network (DNN) into a compact sub-network
with competitive performance is crucial in the efficient machine learning
realm. This topic spans various techniques, from structured pruning to neural
architecture search, encompassing both pruning and erasing operators
perspectives. Despite these advancements, existing methods suffer from complex,
multi-stage processes that demand substantial engineering and domain knowledge,
limiting their broader application. We introduce the third-generation
Only-Train-Once framework (OTOv3), the first to automatically train and
compress a general DNN through pruning and erasing operations, creating a
compact and competitive sub-network without the need for fine-tuning. OTOv3
simplifies and automates the training and compression process, minimizing the
engineering effort required from users. It offers key technological
advancements: (i)
automatic search space construction for general DNNs based on dependency graph
analysis; (ii) Dual Half-Space Projected Gradient (DHSPG) and its enhanced
version with hierarchical search (H2SPG) to reliably solve (hierarchical)
structured sparsity problems and ensure sub-network validity; and (iii)
automated sub-network construction using solutions from DHSPG/H2SPG and
dependency graphs. Our empirical results demonstrate the efficacy of OTOv3
across various benchmarks in structured pruning and neural architecture search.
OTOv3 produces sub-networks that match or exceed the state of the art. The
source code will be available at https://github.com/tianyic/only_train_once.
Comment: 39 pages. Due to the page dimension limitation, the full appendix is
attached at https://tinyurl.com/otov3appendix; zoom in for finer details. arXiv
admin note: text overlap with arXiv:2305.1803
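
The abstract names dependency-graph analysis and DHSPG/H2SPG without detail;
the snippet below is only a hypothetical illustration of the dependency-graph
idea, namely that channels tied together across layers must be pruned jointly
so the sub-network stays shape-consistent. The two-layer setup, the group-norm
importance score, and `keep_ratio` are all assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical dependency group: pruning an output channel of conv1 forces
# pruning the matching BatchNorm channel and conv2 input channel.
conv1 = nn.Conv2d(3, 32, 3, padding=1)
bn1 = nn.BatchNorm2d(32)
conv2 = nn.Conv2d(32, 64, 3, padding=1)

def prune_channel_group(keep_ratio: float = 0.5):
    # Group importance: norm of every tensor slice tied to a conv1 channel.
    imp = conv1.weight.detach().flatten(1).norm(dim=1) \
        + conv2.weight.detach().permute(1, 0, 2, 3).flatten(1).norm(dim=1)
    k = int(keep_ratio * imp.numel())
    keep = torch.topk(imp, k).indices.sort().values
    # Automated sub-network construction: slice all dependent tensors together.
    conv1.weight.data = conv1.weight.data[keep]
    conv1.bias.data = conv1.bias.data[keep]
    bn1.weight.data, bn1.bias.data = bn1.weight.data[keep], bn1.bias.data[keep]
    bn1.running_mean = bn1.running_mean[keep]
    bn1.running_var = bn1.running_var[keep]
    conv2.weight.data = conv2.weight.data[:, keep]

prune_channel_group(0.5)
out = conv2(bn1(conv1(torch.randn(1, 3, 8, 8))))  # still shape-consistent
```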
INTERPRETING AND PRUNING COMPUTER VISION-BASED NEURAL NETWORKS
Computer vision is a complex subject entailing tasks such as object detection and recognition, image segmentation, super-resolution, image restoration, generated artwork, and many others. The application of these tasks is becoming more fundamental to our everyday lives, so beyond the complexity of these systems, their accuracy has become critical. In this context, the ability to decentralise the computation of the neural networks behind cutting-edge computer vision systems has become essential. However, this is not always possible: models are getting larger, which makes them harder, or potentially impossible, to run on consumer hardware. This thesis develops a pruning methodology called “Weight Action Pruning” to reduce the complexity of computer vision neural networks; the method combines sparsity pruning and structured pruning. Sparsity pruning highlights the importance of specific neurons and weights, and structured pruning is then used to remove any redundancies. This process is repeated multiple times and results in a significant decrease in the computing power required to deploy a neural network, reducing inference times and memory requirements. Weight Action Pruning is first applied to deblocking neural networks used in video coding, where it allowed for large computational reductions without significant impacts on accuracy. To further test its validity across multiple datasets and network architectures, Weight Action Pruning was then evaluated on the generative adversarial U-Net used in a seminal paper in the field. This work showed that the ability to prune a neural network depends not only on the network's architecture but also on the dataset used to train the model. Weight Action Pruning was next applied to the image recognition networks VGG-16 and ResNet-50, allowing it to be evaluated directly against other state-of-the-art pruning methods. It was found that models pruned to a given size achieved higher accuracies than models of the same size trained from scratch. Finally, the impact of pruning a neural network is investigated by analysing weight distributions, saliency maps, and other visualisations. It must be noted that Weight Action Pruning comes at a cost in training time, due to the re-training required. Additionally, pruning may cause networks to become less robust, as they are optimised by removing the learnt “edge cases”.
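
The abstract describes Weight Action Pruning only at a high level; the sketch
below is a hypothetical rendering of its iterate-sparsify-then-remove loop:
magnitude-based sparsity pruning, followed by structured removal of filters
that have become mostly zero, repeated over several rounds. The thresholds,
round count, and `dead_frac` cutoff are assumptions, not the thesis's settings.

```python
import torch
import torch.nn as nn

def pruning_round(conv: nn.Conv2d, sparsity: float = 0.3,
                  dead_frac: float = 0.95):
    """One round: sparsity pruning, then structured removal of dead filters."""
    w = conv.weight.data
    # Step 1 (sparsity pruning): zero a fraction of the remaining weights.
    nonzero = w.abs()[w != 0]
    thresh = torch.quantile(nonzero, sparsity)
    w.mul_((w.abs() > thresh).float())
    # Step 2 (structured pruning): drop filters that are now mostly zero.
    zero_frac = (w.flatten(1) == 0).float().mean(dim=1)
    keep = (zero_frac < dead_frac).nonzero(as_tuple=True)[0]
    conv.weight.data = w[keep]
    if conv.bias is not None:
        conv.bias.data = conv.bias.data[keep]

conv = nn.Conv2d(16, 64, 3)
for _ in range(3):      # the process is repeated multiple times
    pruning_round(conv)
# A full pipeline would also slice the next layer's input channels and
# retrain between rounds, which is where the extra training cost comes from.
```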