A Main/Subsidiary Network Framework for Simplifying Binary Neural Network
To reduce memory footprint and run-time latency, techniques such as neural
network pruning and binarization have been explored separately. However, it is
unclear how to combine the best of the two worlds to get extremely small and
efficient models. In this paper, we, for the first time, define the
filter-level pruning problem for binary neural networks, which cannot be solved
by simply migrating existing structural pruning methods for full-precision
models. A novel learning-based approach is proposed to prune filters in our
main/subsidiary network framework, where the main network is responsible for
learning representative features to optimize the prediction performance, and
the subsidiary component works as a filter selector on the main network. To
avoid gradient mismatch when training the subsidiary component, we propose a
layer-wise and bottom-up scheme. We also provide the theoretical and
experimental comparison between our learning-based and greedy rule-based
methods. Finally, we empirically demonstrate the effectiveness of our approach
applied on several binary models, including binarized NIN, VGG-11, and
ResNet-18, on various image classification datasets. (Comment: 9 pages and 9 figures)
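The main/subsidiary idea above can be illustrated with a minimal sketch: the main network holds binarized filters, and the subsidiary component acts as a per-filter gate that zeroes out pruned filters. The function names, the sign-with-scaling binarization, and the hard 0.5 gate threshold are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def binarize(w):
    # Main network: sign binarization with a per-filter scaling factor
    # alpha = mean(|w|) over each filter (a common BNN convention).
    alpha = np.abs(w).reshape(w.shape[0], -1).mean(axis=1)
    return np.sign(w) * alpha[:, None, None, None]

def apply_filter_mask(w_bin, gates, threshold=0.5):
    # Subsidiary component: one gate per output filter; filters whose
    # gate falls below the threshold are pruned (zeroed out).
    mask = (gates >= threshold).astype(w_bin.dtype)
    return w_bin * mask[:, None, None, None], mask

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))  # 8 conv filters of shape 3x3x3
gates = rng.uniform(size=8)            # in the paper these are learned
w_pruned, mask = apply_filter_mask(binarize(w), gates)
```

In the actual method the gates are trained layer-wise and bottom-up to avoid gradient mismatch; here they are random placeholders purely to show the masking mechanics.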
Automatic Pruning for Quantized Neural Networks
Neural network quantization and pruning are two techniques commonly used to
reduce the computational complexity and memory footprint of these models for
deployment. However, most existing pruning strategies operate on full-precision
models and cannot be directly applied to the discrete parameter distributions
that arise after quantization. In contrast, we study a combination of these two techniques to
achieve further network compression. In particular, we propose an effective
pruning strategy for selecting redundant low-precision filters. Furthermore, we
leverage Bayesian optimization to efficiently determine the pruning ratio for
each layer. We conduct extensive experiments on CIFAR-10 and ImageNet with
various architectures and precisions. In particular, for ResNet-18 on ImageNet,
we prune 26.12% of the model size with Binarized Neural Network quantization,
achieving a top-1 classification accuracy of 47.32% in a model of 2.47 MB and
59.30% with a 2-bit DoReFa-Net in 4.36 MB.
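The per-layer pruning-ratio search described above can be sketched as follows. This is a toy stand-in: random search replaces the paper's Bayesian optimization, and the proxy objective, size model, and layer parameter counts are all hypothetical, chosen only to show the shape of the search loop (score candidate ratio vectors against accuracy and a size budget, keep the best).

```python
import numpy as np

def model_size_mb(ratios, layer_params, bits=1):
    # Size of the remaining weights after pruning each layer by its ratio,
    # assuming `bits`-bit quantized weights (1 bit for a binarized network).
    kept = np.asarray(layer_params) * (1.0 - np.asarray(ratios))
    return kept.sum() * bits / 8 / 1e6

def proxy_objective(ratios, layer_params, size_budget_mb):
    # Hypothetical surrogate for validation accuracy: penalize aggressive
    # pruning and any violation of the size budget.
    acc_proxy = 1.0 - 0.5 * float(np.mean(np.asarray(ratios) ** 2))
    overshoot = max(0.0, model_size_mb(ratios, layer_params) - size_budget_mb)
    return acc_proxy - 10.0 * overshoot

def search_ratios(layer_params, size_budget_mb, n_trials=200, seed=0):
    # Random search as a lightweight stand-in for Bayesian optimization
    # over the vector of per-layer pruning ratios.
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_trials):
        ratios = rng.uniform(0.0, 0.9, size=len(layer_params))
        score = proxy_objective(ratios, layer_params, size_budget_mb)
        if score > best_score:
            best, best_score = ratios, score
    return best

layers = [64 * 3 * 9, 128 * 64 * 9, 256 * 128 * 9]  # toy per-layer weight counts
ratios = search_ratios(layers, size_budget_mb=0.03)
```

A real implementation would replace `proxy_objective` with an actual fine-tune-and-evaluate step and the random sampler with a Gaussian-process or similar surrogate that proposes the next ratio vector from past evaluations.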