Few Shot Network Compression via Cross Distillation
Model compression has been widely adopted to obtain lightweight deep neural networks. Most prevalent methods, however, require fine-tuning with sufficient training data to maintain accuracy, and such data can be hard to obtain due to privacy and security concerns. As a compromise between privacy and performance, in this paper we investigate few-shot network compression: given only a few samples per class, how can we effectively compress the network with negligible performance drop? The core challenge of few-shot network compression lies in the high estimation errors of the compressed network with respect to the original network during inference, since the compressed network easily overfits the few training instances. These estimation errors can propagate and accumulate layer by layer and ultimately deteriorate the network output. To address the problem, we propose cross distillation, a novel layer-wise knowledge distillation approach. By interweaving hidden layers of the teacher and student networks, the layer-wise accumulation of estimation errors can be effectively reduced. The proposed method offers a general framework compatible with prevalent network compression techniques such as pruning. Extensive experiments on benchmark datasets demonstrate that cross distillation can significantly improve the student network's accuracy when only a few training instances are available.
Comment: AAAI 202
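To make the layer-wise idea concrete, below is a minimal sketch of cross-style distillation in PyTorch. It is not the authors' code: teacher_layers and student_layers are hypothetical lists of matching blocks (shapes assumed compatible, e.g. under unstructured pruning), and the loss simply regresses each student layer onto the teacher's activation, once driven by the teacher's hidden input (so upstream student error does not corrupt the target) and once by the student's own input.

import torch
import torch.nn.functional as F

def cross_distillation_loss(teacher_layers, student_layers, x):
    # Run teacher and student layer by layer; t_h is the teacher's running
    # activation, s_h the student's. Hypothetical helper, shapes assumed equal.
    t_h, s_h = x, x
    loss = x.new_zeros(())
    for t_layer, s_layer in zip(teacher_layers, student_layers):
        t_next = t_layer(t_h)      # teacher's "clean" target activation
        s_cross = s_layer(t_h)     # student layer fed the teacher's hidden input
        s_next = s_layer(s_h)      # ordinary student forward pass
        # Per-layer estimation error against the (detached) teacher target;
        # the cross term keeps errors from accumulating across layers.
        loss = loss + F.mse_loss(s_cross, t_next.detach())
        loss = loss + F.mse_loss(s_next, t_next.detach())
        t_h, s_h = t_next, s_next
    return loss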
RTN: Reparameterized Ternary Network
To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which offer tremendous speed-ups and memory savings through quantized activations and weights. We first identify three overlooked issues in extremely low-bit networks: the squashed range of quantized values, gradient vanishing during backpropagation, and the unexploited hardware acceleration of ternary networks. By reparameterizing the quantized activation and weight vectors with a full-precision scale and offset applied to a fixed ternary vector, we decouple the range and magnitude from the direction and thereby alleviate all three issues. The learnable scale and offset automatically adjust the range and sparsity of the quantized values without gradient vanishing. A novel encoding and computation pattern is designed to support efficient computing for our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN achieves a much better trade-off between bitwidth and accuracy, with up to 26.76% relative accuracy improvement over state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGA), where it brings 46.46x and 89.17x savings in power and area, respectively, compared with full-precision convolution.
Comment: To appear at AAAI-2
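The reparameterization can be pictured with a short sketch, assuming PyTorch; the module below is hypothetical and not the paper's implementation. A fixed-threshold ternarization produces the direction in {-1, 0, +1}, while a learnable full-precision scale and offset recover the range and magnitude; a straight-through estimator lets gradients reach the latent weights.

import torch
import torch.nn as nn

class TernaryReparam(nn.Module):
    # Hypothetical sketch of reparameterized ternary weights: the direction is
    # a ternary vector, range and magnitude come from a learnable scale/offset.
    def __init__(self, shape, threshold=0.05):
        super().__init__()
        self.latent = nn.Parameter(torch.randn(shape))  # latent full-precision weights
        self.scale = nn.Parameter(torch.ones(1))        # learnable range
        self.offset = nn.Parameter(torch.zeros(1))      # learnable offset
        self.threshold = threshold                      # arbitrary sparsity threshold

    def forward(self):
        # Ternarize the latent weights to {-1, 0, +1}.
        t = torch.sign(self.latent) * (self.latent.abs() > self.threshold).float()
        # Straight-through estimator: forward uses t, backward flows to latent.
        t = self.latent + (t - self.latent).detach()
        # Decouple range and magnitude (scale, offset) from direction (t).
        return self.scale * t + self.offset

Because the scale and offset stay full precision, they can stretch or shift the quantized range during training without the gradient vanishing at fixed clipping boundaries.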
M-NAS: Meta Neural Architecture Search
Neural Architecture Search (NAS) has recently outperformed hand-crafted networks in various areas. However, most prevalent NAS methods focus on a single pre-defined task. For a previously unseen task, the architecture is either searched from scratch, which is inefficient, or transferred from one obtained for some other task, which may be sub-optimal. In this paper, we investigate a previously unexplored problem: does a universal NAS method exist that can effectively generate task-aware architectures? Toward this goal, we propose Meta Neural Architecture Search (M-NAS). To obtain task-specific architectures, M-NAS adopts a task-aware architecture controller for child model generation. Since the optimal weights for different tasks and architectures vary widely, we resort to meta-learning and learn meta-weights that efficiently adapt to a new task on the corresponding architecture with only a few gradient descent steps. Experimental results demonstrate the superiority of M-NAS against a number of competitive baselines on both toy regression and few-shot classification problems.
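The few-step adaptation can be illustrated with a generic MAML-style inner loop, a sketch under assumed PyTorch conventions rather than M-NAS itself; model_fn and loss_fn are hypothetical stand-ins for the child model generated by the controller and its task loss.

import torch

def adapt_to_task(meta_weights, support_x, support_y, model_fn, loss_fn,
                  inner_lr=0.01, steps=5):
    # Start from the shared meta-weights and take a few gradient steps on the
    # new task's support set; create_graph=True keeps the path differentiable
    # so an outer (meta) update could back-propagate through the adaptation.
    weights = [w.clone() for w in meta_weights]
    for _ in range(steps):
        loss = loss_fn(model_fn(support_x, weights), support_y)
        grads = torch.autograd.grad(loss, weights, create_graph=True)
        weights = [w - inner_lr * g for w, g in zip(weights, grads)]
    return weights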