Towards Effective Low-bitwidth Convolutional Neural Networks
This paper tackles the problem of training a deep convolutional neural
network with both low-precision weights and low-bitwidth activations.
Optimizing a low-precision network is very challenging since the training
process can easily get trapped in a poor local minimum, which results in
substantial accuracy loss. To mitigate this problem, we propose three
simple-yet-effective approaches to improve the network training. First, we
propose to use a two-stage optimization strategy to progressively find good
local minima. Specifically, we propose to first optimize a net with quantized
weights and then quantized activations. This is in contrast to the traditional
methods which optimize them simultaneously. Second, in a similar spirit to
the first method, we propose another progressive optimization approach which
gradually decreases the bit-width from high precision to low precision
during the course of training. Third, we adopt a novel learning scheme to
jointly train a full-precision model alongside the low-precision one. By doing
so, the full-precision model provides hints to guide the low-precision model
training. Extensive experiments on various datasets (i.e., CIFAR-100 and
ImageNet) show the effectiveness of the proposed methods. Notably, using
our methods to train a 4-bit precision network leads to no performance decrease
in comparison with its full-precision counterpart with standard network
architectures (i.e., AlexNet and ResNet-50).
Comment: 11 page
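The second approach in the abstract above, lowering the bit-width step by step during training, can be illustrated with a minimal sketch. The abstract does not specify the quantizer or the schedule, so everything here is an assumption: `quantize_uniform` is a generic uniform quantizer on [0, 1], and `bitwidth_schedule` and its `epochs_per_stage` parameter are hypothetical names chosen for illustration, not the authors' method.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Round values to the nearest of 2**bits evenly spaced levels in [0, 1].

    Assumption: a plain uniform quantizer, used only for illustration.
    """
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def bitwidth_schedule(start_bits, end_bits, epochs_per_stage):
    """Yield (epoch, bits) pairs, lowering precision one bit per stage.

    Assumption: a fixed number of epochs per stage; a real schedule may differ.
    """
    epoch = 0
    for bits in range(start_bits, end_bits - 1, -1):
        for _ in range(epochs_per_stage):
            yield epoch, bits
            epoch += 1

# Example: go from 8-bit down to 4-bit, two epochs at each precision.
weights = np.linspace(0.0, 1.0, 1000)
for epoch, bits in bitwidth_schedule(8, 4, 2):
    q = quantize_uniform(weights, bits)  # a training step would use q here
```

In an actual training loop, each stage would start from the weights learned at the previous, higher precision, which is what makes the optimization progressive.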
Domain-adaptive deep network compression
Deep Neural Networks trained on large datasets can be easily transferred to
new domains with far fewer labeled examples by a process called fine-tuning.
This has the advantage that representations learned in the large source domain
can be exploited on smaller target domains. However, networks designed to be
optimal for the source task are often prohibitively large for the target task.
In this work we address the compression of networks after domain transfer.
We focus on compression algorithms based on low-rank matrix decomposition.
Existing methods base compression solely on learned network weights and ignore
the statistics of network activations. We show that domain transfer leads to
large shifts in network activations and that it is desirable to take this into
account when compressing. We demonstrate that considering activation statistics
when compressing weights leads to a rank-constrained regression problem with a
closed-form solution. Because our method takes into account the target domain,
it can remove the redundancy in the weights more effectively. Experiments show
that our Domain Adaptive Low Rank (DALR) method significantly outperforms
existing low-rank compression techniques. With our approach, the fc6 layer of
VGG19 can be compressed over 4x more than with truncated SVD alone,
with little or no loss in accuracy. When applied to domain-transferred
networks it allows for compression down to only 5-20% of the original number of
parameters with only a minor drop in performance.
Comment: Accepted at ICCV 201
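The contrast between weight-only and activation-aware low-rank compression described above can be sketched in NumPy. This is an illustrative sketch under assumptions, not the authors' implementation: it solves the rank-constrained problem of minimizing ||WX - AX||_F over matrices A of rank at most k by projecting W onto the top-k left singular vectors of WX, which follows from the Eckart-Young theorem. The function names, layer shapes, and synthetic activations are all hypothetical.

```python
import numpy as np

def truncated_svd_factors(W, k):
    """Weight-only baseline: best rank-k approximation of W itself."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]

def activation_aware_factors(W, X, k):
    """Rank-k factors minimizing ||W X - A B X||_F for given activations X.

    The optimum projects W onto the top-k left singular vectors of W X.
    """
    U, _, _ = np.linalg.svd(W @ X, full_matrices=False)
    A = U[:, :k]   # (out_features, k)
    B = A.T @ W    # (k, in_features)
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))  # hypothetical layer weights
# Synthetic activations that lie mostly in an 8-dimensional subspace,
# standing in for target-domain statistics.
basis = rng.standard_normal((128, 8))
X = basis @ rng.standard_normal((8, 500)) + 0.01 * rng.standard_normal((128, 500))

k = 8
A_svd, B_svd = truncated_svd_factors(W, k)
A_act, B_act = activation_aware_factors(W, X, k)

# Output reconstruction error on the activations, for each factorization.
err_svd = np.linalg.norm(W @ X - A_svd @ B_svd @ X)
err_act = np.linalg.norm(W @ X - A_act @ B_act @ X)
print(f"truncated SVD error: {err_svd:.3f}, activation-aware error: {err_act:.3f}")
```

Both factorizations replace one 64 x 128 layer with a 64 x k and a k x 128 layer; they differ only in whether the activations inform which directions are kept.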