Fixed-point Factorized Networks
In recent years, Deep Neural Networks (DNN) based methods have achieved
remarkable performance in a wide range of tasks and have been among the most
powerful and widely used techniques in computer vision. However, DNN-based
methods are both computationally intensive and resource-consuming, which hinders
their application on embedded systems such as smartphones. To
alleviate this problem, we introduce novel Fixed-point Factorized Networks
(FFN) for pretrained models that reduce both the computational complexity and
the storage requirements of networks. The resulting networks have weights of
only -1, 0, and 1, which eliminates most of the resource-consuming
multiply-accumulate operations (MACs). Extensive experiments on the large-scale
ImageNet classification task show that the proposed FFN requires only
one-thousandth of the multiply operations while maintaining comparable accuracy.
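
The key point above is that once every weight lies in {-1, 0, +1}, multiply-accumulates collapse into additions and subtractions. The abstract does not spell out the factorization itself, so the sketch below only illustrates a generic symmetric-threshold ternarization of a weight matrix and a ternary dense layer in NumPy; the threshold heuristic and the delta_scale parameter are illustrative assumptions, not the FFN algorithm.

```python
import numpy as np

def ternarize(w, delta_scale=0.7):
    """Map a float weight tensor to {-1, 0, +1} plus a per-tensor scale.

    Generic symmetric-threshold ternarization (an assumption for
    illustration, not the FFN factorization); delta_scale controls
    how many weights are zeroed out.
    """
    delta = delta_scale * np.mean(np.abs(w))      # zero-threshold
    t = np.zeros_like(w, dtype=np.int8)
    t[w > delta] = 1
    t[w < -delta] = -1
    mask = t != 0
    # scale the surviving +/-1 entries to approximate the original magnitudes
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return t, alpha

def ternary_dense(x, t, alpha, bias=None):
    """Dense layer with ternary weights: the matmul reduces to adds/subtracts."""
    acc = x @ t.astype(x.dtype)                   # entries are only -1, 0, +1
    y = alpha * acc
    return y + bias if bias is not None else y

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)).astype(np.float32)
t, alpha = ternarize(w)
x = rng.normal(size=(4, 256)).astype(np.float32)
print(ternary_dense(x, t, alpha).shape)           # (4, 128)
```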
Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
State-of-the-art convolutional neural networks are enormously costly in both
compute and memory, demanding massively parallel GPUs for execution. Such
networks strain the computational capabilities and energy available to embedded
and mobile processing platforms, restricting their use in many important
applications. In this paper, we push the boundaries of hardware-efficient CNN
design by proposing BCNN with Separable Filters (BCNNw/SF), which applies
Singular Value Decomposition (SVD) to BCNN kernels to further reduce
computational and storage complexity. To enable its implementation, we provide
a closed form of the gradient over SVD to calculate the exact gradient with
respect to every binarized weight in backward propagation. We verify BCNNw/SF
on the MNIST, CIFAR-10, and SVHN datasets, and implement an accelerator for
CIFAR-10 on FPGA hardware. Our BCNNw/SF accelerator achieves 17% memory
savings and a 31.3% reduction in execution time compared to BCNN, with only a
minor loss of accuracy.
Incremental multi-domain learning with network latent tensor factorization
The prominence of deep learning, large amount of annotated data and
increasingly powerful hardware made it possible to reach remarkable performance
for supervised classification tasks, in many cases saturating the training
sets. However, the resulting models are specialized to a single, very specific
task and domain. Adapting the learned classification to new domains is a hard
problem for at least three reasons: (1) the new domains and tasks might
be drastically different; (2) there might be a very limited amount of annotated
data in the new domain; and (3) full training of a new model for each new task
is prohibitive in terms of computation and memory, due to the sheer number of
parameters of deep CNNs. In this paper, we present a method to learn
new domains and tasks incrementally, building on prior knowledge from already
learned tasks and without catastrophic forgetting. We do so by jointly
parametrizing weights across layers using a low-rank Tucker structure. The core
is task-agnostic, while a set of task-specific factors is learnt on each new
domain. We show that leveraging tensor structure enables better performance
than simply using matrix operations. Joint tensor modelling also naturally
leverages correlations across different layers. Compared with previous methods
which have focused on adapting each layer separately, our approach results in
more compact representations for each new task/domain. We apply the proposed
method to the 10 datasets of the Visual Decathlon Challenge and show that our
method offers on average about a 7.5x reduction in the number of parameters and
competitive performance in terms of both classification accuracy and Decathlon
score.
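
To make the shared-core / task-specific-factor idea concrete, the NumPy sketch below reconstructs one weight tensor from a Tucker core and per-mode factor matrices via mode products. The shapes, the names (mode_product, task_factors), and the per-layer granularity are assumptions for illustration; the paper jointly parametrizes whole stacks of layers rather than a single tensor.

```python
import numpy as np

def mode_product(core, factors):
    """Reconstruct a 3-way weight tensor from a Tucker core and factor matrices.

    core: (r1, r2, r3) shared, task-agnostic core.
    factors: per-mode matrices [(d1, r1), (d2, r2), (d3, r3)], learned
    separately for each task/domain in this illustrative setup.
    """
    w = np.einsum('abc,ia->ibc', core, factors[0])   # mode-1 product
    w = np.einsum('ibc,jb->ijc', w, factors[1])      # mode-2 product
    w = np.einsum('ijc,kc->ijk', w, factors[2])      # mode-3 product
    return w

rng = np.random.default_rng(2)
core = rng.normal(size=(8, 8, 4))                    # shared across tasks
task_factors = [rng.normal(size=(64, 8)),            # task-specific factors
                rng.normal(size=(64, 8)),
                rng.normal(size=(9, 4))]
w_task = mode_product(core, task_factors)
print(w_task.shape)                                  # (64, 64, 9)
```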