Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
When approaching a novel visual recognition problem in a specialized image
domain, a common strategy is to start with a pre-trained deep neural network
and fine-tune it to the specialized domain. If the target domain covers a
smaller visual space than the source domain used for pre-training (e.g.
ImageNet), the fine-tuned network is likely to be over-parameterized. However,
applying network pruning as a post-processing step to reduce the memory
requirements has drawbacks: fine-tuning and pruning are performed
independently; pruning parameters are set once and cannot adapt over time; and
the highly parameterized nature of state-of-the-art pruning methods makes it
prohibitive to manually search the pruning parameter space for deep networks,
leading to coarse approximations. We propose a principled method for jointly
fine-tuning and compressing a pre-trained convolutional network that overcomes
these limitations. Experiments on two specialized image domains (remote sensing
images and describable textures) demonstrate the validity of the proposed
approach.
Comment: BMVC 2017 oral
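The "pruning parameters" mentioned above can be illustrated with a simple magnitude-based scheme. The sketch below is an assumption for illustration only and is not the paper's pruning method. Each layer gets its own hypothetical parameter `alpha`, weights whose magnitude falls below `alpha` times the layer's standard deviation are zeroed, and the resulting compression ratio is reported. In fine-pruning, Bayesian optimization would propose these per-layer values and revise them between fine-tuning rounds. That outer loop is not shown.

```python
import statistics

def prune_layer(weights, alpha):
    """Assumed magnitude pruning: zero every weight with |w| < alpha * std(weights).

    Returns the pruned weights and a 0/1 keep-mask.
    """
    threshold = alpha * statistics.pstdev(weights)
    mask = [1 if abs(w) >= threshold else 0 for w in weights]
    return [w * m for w, m in zip(weights, mask)], mask

def prune_network(layers, alphas):
    """Apply one hypothetical pruning parameter alpha per layer.

    Returns the pruned layers and the compression ratio
    (total weights / kept weights). Because there is one alpha per layer,
    the search space grows with network depth.
    """
    if len(layers) != len(alphas):
        raise ValueError("need exactly one alpha per layer")
    pruned, kept, total = [], 0, 0
    for weights, alpha in zip(layers, alphas):
        p, mask = prune_layer(weights, alpha)
        pruned.append(p)
        kept += sum(mask)
        total += len(mask)
    ratio = total / kept if kept else float("inf")
    return pruned, ratio

# Toy two-layer "network" (flattened weights); alpha = 0 keeps every weight.
layers = [[0.1, -0.2, 0.05, 1.0], [2.0, -2.0, 0.01, 0.0]]
pruned, ratio = prune_network(layers, alphas=[0.0, 1.0])
print(pruned[1], ratio)
```

Even with only two layers there are two continuous parameters to tune. For a deep network with many layers, a manual search over such a space quickly becomes impractical.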
Similarity Learning for High-Dimensional Sparse Data
A good measure of similarity between data points is crucial to many tasks in
machine learning. Similarity and metric learning methods learn such measures
automatically from data, but they do not scale well with respect to the
dimensionality of the data. In this paper, we propose a method that can
efficiently learn a similarity measure from high-dimensional sparse data. The core idea
is to parameterize the similarity measure as a convex combination of rank-one
matrices with specific sparsity structures. The parameters are then optimized
with an approximate Frank-Wolfe procedure to maximally satisfy relative
similarity constraints on the training data. Our algorithm greedily
incorporates one pair of features at a time into the similarity measure,
providing an efficient way to control the number of active features and thus
reduce overfitting. It enjoys very appealing convergence guarantees and its
time and memory complexity depend on the sparsity of the data instead of the
dimension of the feature space. Our experiments on real-world high-dimensional
datasets demonstrate its potential for classification, dimensionality reduction
and data exploration.
Comment: 14 pages. Proceedings of the 18th International Conference on
Artificial Intelligence and Statistics (AISTATS 2015). Matlab code:
https://github.com/bellet/HDS
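Below is a simplified Frank-Wolfe sketch of this idea. It rests on three assumptions:

- Each rank-one atom has the form lam*(e_i + e_j)(e_i + e_j)^T or lam*(e_i - e_j)(e_i - e_j)^T, so it has four nonzero entries and involves a single feature pair.
- The objective is a plain hinge loss on triplet constraints.
- An exhaustive scan over all feature pairs replaces the paper's approximate oracle, so the sketch only suits small `d`.

All function names are hypothetical. The authors' reference implementation is the Matlab code linked above.

```python
from itertools import combinations

def similarity(M, x, y):
    """Bilinear similarity S_M(x, y) = x^T M y.

    M is a sparse dict {(i, j): value}; x and y are sparse dicts {feature: value}.
    """
    return sum(v * x.get(i, 0.0) * y.get(j, 0.0) for (i, j), v in M.items())

def hinge_gradient(M, triplets):
    """Subgradient of the mean hinge loss max(0, 1 - S(x, y) + S(x, z)).

    Each triplet (x, y, z) asks x to be more similar to y than to z.
    The gradient of S(x, z) - S(x, y) with respect to M is x (z - y)^T.
    """
    G = {}
    for x, y, z in triplets:
        if 1.0 - similarity(M, x, y) + similarity(M, x, z) > 0:
            for i, xi in x.items():
                for j in set(y) | set(z):
                    g = xi * (z.get(j, 0.0) - y.get(j, 0.0)) / len(triplets)
                    G[(i, j)] = G.get((i, j), 0.0) + g
    return G

def best_atom(G, d, lam):
    """Exact linear oracle over atoms lam * (e_i + s e_j)(e_i + s e_j)^T, s = +/-1.

    <G, atom> = lam * (G_ii + G_jj + s * (G_ij + G_ji)). This scans all
    O(d^2) pairs; the paper instead uses an approximate procedure.
    """
    best, best_val = None, 0.0
    for i, j in combinations(range(d), 2):
        for s in (1.0, -1.0):
            val = lam * (G.get((i, i), 0.0) + G.get((j, j), 0.0)
                         + s * (G.get((i, j), 0.0) + G.get((j, i), 0.0)))
            if best is None or val < best_val:
                best, best_val = (i, j, s), val
    return best

def frank_wolfe(triplets, d, lam=1.0, iters=20):
    """Learn a sparse M as a convex combination of atoms, adding one feature pair per step."""
    M = {}
    for k in range(iters):
        G = hinge_gradient(M, triplets)
        if not G:  # no triplet violates the margin: stop early
            break
        i, j, s = best_atom(G, d, lam)
        gamma = 2.0 / (k + 2.0)  # standard step size; gamma = 1 at k = 0
        M = {key: (1.0 - gamma) * v for key, v in M.items()}
        atom = {(i, i): lam, (j, j): lam, (i, j): s * lam, (j, i): s * lam}
        for key, v in atom.items():
            M[key] = M.get(key, 0.0) + gamma * v
    return M

# Toy data: x shares feature 0 with y, while z only has feature 1.
x, y, z = {0: 1.0}, {0: 1.0}, {1: 1.0}
M = frank_wolfe([(x, y, z)], d=4)
print(similarity(M, x, y) - similarity(M, x, z))
```

Because every step mixes in at most one new feature pair, the number of nonzero entries in `M` after k steps is bounded by 4k. This bound is what keeps the active feature set small.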
Accelerating Training of Deep Neural Networks via Sparse Edge Processing
We propose a reconfigurable hardware architecture for deep neural networks
(DNNs) capable of online training and inference, which uses algorithmically
pre-determined, structured sparsity to significantly lower memory and
computational requirements. This novel architecture introduces the notion of
edge-processing to provide flexibility and combines junction pipelining and
operational parallelization to speed up training. The overall effect is to
reduce network complexity by factors of up to 30x and training time by up to 35x
relative to GPUs, while maintaining high fidelity of inference results. This
has the potential to enable extensive parameter searches and development of the
largely unexplored theoretical foundation of DNNs. The architecture
automatically adapts itself to different network sizes given available hardware
resources. As proof of concept, we show results obtained for different bit
widths.
Comment: Presented at the 26th International Conference on Artificial Neural
Networks (ICANN) 2017 in Alghero, Italy
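The following minimal software sketch makes "pre-determined, structured sparsity" and "edge processing" concrete. The cyclic connection pattern, layer sizes, and function names are assumptions chosen for illustration. They are not the paper's connection pattern or its hardware pipeline. The point is that the sparse pattern is fixed before training, and both the forward pass and the online weight update touch only the stored edges.

```python
def structured_edges(n_in, n_out, fan_out):
    """Assumed pre-determined sparse connectivity between two layers.

    Input j feeds outputs (j*fan_out + k) mod n_out for k < fan_out. If
    fan_out <= n_out and n_in*fan_out is a multiple of n_out, every output
    receives the same fan-in and no edge is repeated.
    """
    return [(j, (j * fan_out + k) % n_out) for j in range(n_in) for k in range(fan_out)]

def forward(x, edges, weights, n_out):
    """Edge processing: one multiply-accumulate per stored edge, none for absent ones."""
    y = [0.0] * n_out
    for (j, i), w in zip(edges, weights):
        y[i] += w * x[j]
    return y

def sgd_step(x, target, edges, weights, n_out, lr=0.1):
    """One online training step for L = 0.5 * ||y - target||^2; only edge weights change."""
    y = forward(x, edges, weights, n_out)
    delta = [yi - ti for yi, ti in zip(y, target)]  # dL/dy
    return [w - lr * delta[i] * x[j] for (j, i), w in zip(edges, weights)]

n_in, n_out, fan_out = 12, 6, 2
edges = structured_edges(n_in, n_out, fan_out)
print(len(edges), n_in * n_out / len(edges))  # 24 edges, 3.0x fewer than dense
```

With `fan_out` edges per input instead of `n_out`, the edge count shrinks by a factor of `n_out / fan_out`, and so does the multiply-accumulate work in both inference and training.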
EIE: Efficient Inference Engine on Compressed Deep Neural Network
State-of-the-art deep neural networks (DNNs) have hundreds of millions of
connections and are both computationally and memory intensive, making them
difficult to deploy on embedded systems with limited hardware resources and
power budgets. While custom hardware helps the computation, fetching weights
from DRAM is two orders of magnitude more expensive than ALU operations, and
dominates the required power.
The previously proposed 'Deep Compression' makes it possible to fit large DNNs
(AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by
pruning the redundant connections and having multiple connections share the
same weight. We propose an energy efficient inference engine (EIE) that
performs inference on this compressed network model and accelerates the
resulting sparse matrix-vector multiplication with weight sharing. Going from
DRAM to SRAM gives EIE a 120x energy saving; exploiting sparsity saves 10x;
weight sharing gives 8x; skipping zero activations from ReLU saves another 3x.
Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster than CPU and GPU
implementations, respectively, of the same DNN without compression. EIE has a
processing power of 102 GOPS working directly on a compressed network,
corresponding to 3 TOPS on an uncompressed network, and processes FC layers of
AlexNet at 1.88×10^4 frames/s with a power dissipation of only 600 mW. It is
24,000x and 3,400x more energy efficient than a CPU and GPU, respectively.
Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy
efficiency and area efficiency, respectively.
Comment: External Links: TheNextPlatform: http://goo.gl/f7qX0L ; O'Reilly:
https://goo.gl/Id1HNT ; Hacker News: https://goo.gl/KM72SV ; Embedded-vision:
http://goo.gl/joQNg8 ; Talk at NVIDIA GTC'16: http://goo.gl/6wJYvn ; Talk at
Embedded Vision Summit: https://goo.gl/7abFNe ; Talk at Stanford University:
https://goo.gl/6lwuer. Published as a conference paper in ISCA 201
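The compressed sparse matrix-vector product described above can be sketched in a few lines of Python. This is a software illustration under stated assumptions, not EIE's datapath. Nonzero weights are stored column by column (CSC-style) as indices into a small shared codebook, and any column whose input activation is zero is skipped entirely. EIE's actual index encoding and its distribution of work across processing elements are not modeled. The codebook and matrix values are made up, and the function names are hypothetical.

```python
def compress(W, codebook):
    """Store the nonzeros of dense W column by column (CSC-style).

    Each nonzero weight is replaced by the index of its nearest codebook
    entry (weight sharing), so only a small index is kept per weight.
    """
    n_rows, n_cols = len(W), len(W[0])
    idx, rows, col_ptr = [], [], [0]
    for c in range(n_cols):
        for r in range(n_rows):
            w = W[r][c]
            if w != 0.0:  # pruned (zero) weights are simply not stored
                idx.append(min(range(len(codebook)), key=lambda k: abs(codebook[k] - w)))
                rows.append(r)
        col_ptr.append(len(idx))
    return idx, rows, col_ptr

def spmv(idx, rows, col_ptr, codebook, a, n_rows):
    """Compute b = W a from the compressed form, skipping zero activations."""
    b = [0.0] * n_rows
    for c, a_c in enumerate(a):
        if a_c == 0.0:  # e.g. a ReLU output of zero: the whole column is skipped
            continue
        for p in range(col_ptr[c], col_ptr[c + 1]):
            b[rows[p]] += codebook[idx[p]] * a_c  # decode shared weight, then MAC
    return b

codebook = [-0.5, 0.25, 1.0]
W = [[0.0, 1.0, 0.0], [-0.5, 0.0, 0.25], [0.0, 0.0, 1.0]]
a = [2.0, 0.0, 3.0]
print(spmv(*compress(W, codebook), codebook, a, n_rows=3))
```

Here the second column is never read because its activation is zero. Only four of the nine weights are stored, each as a codebook index rather than a full-precision value.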
- …