2,223 research outputs found
Scalable Compression of Deep Neural Networks
Deep neural networks generally involve some layers with mil- lions of
parameters, making them difficult to be deployed and updated on devices with
limited resources such as mobile phones and other smart embedded systems. In
this paper, we propose a scalable representation of the network parameters, so
that different applications can select the most suitable bit rate of the
network based on their own storage constraints. Moreover, when a device needs
to upgrade to a high-rate network, the existing low-rate network can be reused,
and only some incremental data are needed to be downloaded. We first
hierarchically quantize the weights of a pre-trained deep neural network to
enforce weight sharing. Next, we adaptively select the bits assigned to each
layer given the total bit budget. After that, we retrain the network to
fine-tune the quantized centroids. Experimental results show that our method
can achieve scalable compression with graceful degradation in the performance.Comment: 5 pages, 4 figures, ACM Multimedia 201
Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T)
Deep neural networks (DNN) have shown remarkable success in a variety of
machine learning applications. The capacity of these models (i.e., number of
parameters), endows them with expressive power and allows them to reach the
desired performance. In recent years, there is an increasing interest in
deploying DNNs to resource-constrained devices (i.e., mobile devices) with
limited energy, memory, and computational budget. To address this problem, we
propose Entropy-Constrained Trained Ternarization (EC2T), a general framework
to create sparse and ternary neural networks which are efficient in terms of
storage (e.g., at most two binary-masks and two full-precision values are
required to save a weight matrix) and computation (e.g., MAC operations are
reduced to a few accumulations plus two multiplications). This approach
consists of two steps. First, a super-network is created by scaling the
dimensions of a pre-trained model (i.e., its width and depth). Subsequently,
this super-network is simultaneously pruned (using an entropy constraint) and
quantized (that is, ternary values are assigned layer-wise) in a training
process, resulting in a sparse and ternary network representation. We validate
the proposed approach in CIFAR-10, CIFAR-100, and ImageNet datasets, showing
its effectiveness in image classification tasks.Comment: Proceedings of the CVPR'20 Joint Workshop on Efficient Deep Learning
in Computer Vision. Code is available at
https://github.com/d-becking/efficientCNN
- …