A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. Therefore, a natural thought is to perform model compression and
acceleration in deep networks without significantly decreasing the model
performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first, and the other
techniques are introduced afterward. For each category, we also provide insightful
analysis about the performance, related applications, advantages, and
drawbacks. Then we go through some very recent successful methods, for example,
dynamic capacity networks and stochastic depth networks. After that, we survey
the evaluation metrics, the main datasets used for evaluating model
performance, and recent benchmark efforts. Finally, we conclude this paper and
discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine, updated version
including more recent work
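As a minimal illustration of the first of these categories, the sketch below applies magnitude-based parameter pruning followed by uniform quantization to a weight matrix. It is a generic NumPy toy example, not a method from the survey, and the sparsity and bit-width values are arbitrary assumptions.

    import numpy as np

    def magnitude_prune(weights, sparsity=0.9):
        # Parameter pruning: zero out the smallest-magnitude weights.
        threshold = np.quantile(np.abs(weights), sparsity)
        return np.where(np.abs(weights) < threshold, 0.0, weights)

    def uniform_quantize(weights, n_bits=8):
        # Quantization: snap weights to 2**n_bits evenly spaced levels.
        w_min, w_max = weights.min(), weights.max()
        scale = (w_max - w_min) / (2 ** n_bits - 1)
        return np.round((weights - w_min) / scale) * scale + w_min

    w = np.random.randn(256, 256).astype(np.float32)
    w_compressed = uniform_quantize(magnitude_prune(w), n_bits=8)

The actual memory savings come from subsequently storing the pruned, quantized weights in sparse or low-bit formats.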
Improving Confidence Estimates for Unfamiliar Examples
Intuitively, unfamiliarity should lead to lack of confidence. In reality,
current algorithms often make highly confident yet wrong predictions when faced
with relevant but unfamiliar examples. A classifier we trained to recognize
gender is 12 times more likely to be wrong with a 99% confident prediction if
presented with a subject from a different age group than those seen during
training. In this paper, we compare and evaluate several methods to improve
confidence estimates for unfamiliar and familiar samples. We propose a testing
methodology of splitting unfamiliar and familiar samples by attribute (age,
breed, subcategory) or sampling (similar datasets collected by different people
at different times). We evaluate methods including confidence calibration,
ensembles, distillation, and a Bayesian model and use several metrics to
analyze label, likelihood, and calibration error. While all methods reduce
over-confident errors, the ensemble of calibrated models performs best overall,
and T-scaling performs best among the approaches with the fastest inference. Our
code is available at https://github.com/lizhitwo/ConfidenceEstimates .
Comment: Published in CVPR 2020 (oral). ERRATA: (1) a previous version (v3)
included erroneous results for T-scaling, where novel samples were
mistakenly included in the validation set for calibration. Please disregard
those results. (2) Previous versions (v4, v5) incorrectly stated that Adam
was used. In fact, we used SGD.
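For reference, here is a minimal sketch of T-scaling (temperature scaling) as it is commonly implemented: a single temperature T is fit on held-out validation logits by minimizing the negative log-likelihood, then reused at test time. This is a generic PyTorch illustration, not the authors' released code, and per the erratum the calibration set should contain no novel samples.

    import torch
    import torch.nn.functional as F

    def fit_temperature(val_logits, val_labels):
        # Optimize log T so T stays positive; minimize NLL of softmax(logits / T).
        log_t = torch.zeros(1, requires_grad=True)
        opt = torch.optim.LBFGS([log_t], max_iter=50)

        def closure():
            opt.zero_grad()
            loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
            loss.backward()
            return loss

        opt.step(closure)
        return log_t.exp().item()

    # At test time: probs = F.softmax(test_logits / T, dim=1)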
Handwritten Bangla Character Recognition Using the State-of-the-Art Deep Convolutional Neural Networks
In spite of advances in object recognition technology, Handwritten Bangla
Character Recognition (HBCR) remains largely unsolved due to the presence of
many ambiguous handwritten characters and excessively cursive Bangla
handwriting. Even the best existing recognizers do not achieve satisfactory
performance for practical applications related to Bangla character recognition
and have much lower performance than those developed for English alpha-numeric
characters. To improve the performance of HBCR, we herein present the
application of the state-of-the-art Deep Convolutional Neural Networks (DCNN)
including VGG Network, All Convolution Network (All-Conv Net), Network in
Network (NiN), Residual Network, FractalNet, and DenseNet for HBCR. The deep
learning approaches have the advantage of extracting and using feature
information, improving the recognition of 2D shapes with a high degree of
invariance to translation, scaling and other distortions. We systematically
evaluated the performance of the DCNN models on the publicly available Bangla
handwritten character dataset CMATERdb and achieved superior recognition
accuracy with the DCNN models. This improvement would help in building an
automatic HBCR system for practical applications.
Comment: 12 pages, 22 figures, 5 tables. arXiv admin note: text overlap with
arXiv:1705.0268
Gender Effect on Face Recognition for a Large Longitudinal Database
Aging or gender variation can dramatically affect face recognition
performance. While most face recognition studies focus on variations in pose,
illumination, and expression, it is important to consider the influence of
gender and how to design an effective matching framework.
In this paper, we address these problems on a very large longitudinal database
MORPH-II which contains 55,134 face images of 13,617 individuals. First, we
consider four comprehensive experiments with different combinations of gender
distribution and subset size: 1) equal gender distribution; 2) a large, highly
unbalanced gender distribution; 3) different gender combinations, such as male
only, female only, or mixed gender; and 4) the effect of subset size in terms
of the number of individuals. Second, we consider
eight nearest-neighbor distance metrics as well as a Support Vector Machine
(SVM) classifier, and test the effect of the different classifiers. Last, we
consider different fusion techniques for an effective matching framework to
improve recognition performance.
Comment: This paper has been accepted by the IEEE International Workshop on
Information Forensics and Security (2018 WIFS).
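To illustrate the kind of classifier comparison described above, here is a hypothetical scikit-learn sketch that scores 1-nearest-neighbor classification under several distance metrics against an SVM. The feature matrix X and labels y are random stand-ins, and the abstract does not specify which eight metrics the authors used; the list below is illustrative.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    X = np.random.randn(200, 64)          # stand-in face features
    y = np.random.randint(0, 2, 200)      # stand-in labels

    # Brute-force search supports arbitrary scipy distance metrics.
    for m in ["euclidean", "manhattan", "chebyshev", "cosine",
              "canberra", "braycurtis", "correlation", "minkowski"]:
        knn = KNeighborsClassifier(n_neighbors=1, metric=m, algorithm="brute")
        print(m, cross_val_score(knn, X, y, cv=5).mean())

    svm = SVC(kernel="rbf")
    print("svm", cross_val_score(svm, X, y, cv=5).mean())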
Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent
SGD is the widely adopted method to train CNNs. Conceptually, it approximates
the population with a randomly sampled batch, then trains batches evenly by
conducting a gradient update on every batch in an epoch. In this paper, we
demonstrate that Sampling Bias, Intrinsic Image Difference, and Fixed Cycle
Pseudo Random Sampling differentiate batches during training, which in turn
affects the learning speed on each batch. Because of this, the uniform
treatment of batches in SGD creates improper load balancing. To address this
issue, we present Inconsistent Stochastic Gradient Descent (ISGD), which
dynamically varies the training effort according to the learning status of each
batch. Specifically, ISGD leverages techniques from Statistical Process Control
to identify an undertrained batch.
Once a batch is undertrained, ISGD solves a new subproblem, a chasing logic
plus a conservative constraint, to accelerate training on that batch while
avoiding drastic parameter changes. Extensive experiments on a variety of
datasets demonstrate that ISGD converges faster than SGD. In training AlexNet,
ISGD is 21.05% faster than SGD at reaching 56% top-1 accuracy under exactly the
same experimental setup. We also extend ISGD to work on multi-GPU and
heterogeneous distributed systems based on data parallelism, making the batch
size the key to scalability. We then study the relation of the ISGD batch size
to the learning rate, parallelism, synchronization cost, system saturation, and
scalability, and conclude that the optimal ISGD batch size is machine
dependent. Various experiments on a multi-GPU system validate this claim. In
particular, ISGD trains AlexNet to 56.3% top-1 and 80.1% top-5 accuracy in 11.5
hours with 4 NVIDIA TITAN X GPUs at a batch size of 1536.
Comment: The patent of ISGD belongs to NEC Lab
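The core detection idea, identifying an undertrained batch with Statistical Process Control, can be sketched as a simple control-limit test on each batch's running loss. This is an illustrative reconstruction with arbitrary window and k values; the paper's actual control charts and the chasing subproblem are not reproduced here.

    import numpy as np

    def undertrained(batch_loss, history, k=3.0, window=100, min_hist=10):
        # Statistical Process Control: flag a batch whose loss exceeds the
        # upper control limit mean + k * std over recent batch losses.
        recent = history[-window:]
        if len(recent) < min_hist:
            return False
        mu, sigma = np.mean(recent), np.std(recent)
        return batch_loss > mu + k * sigma

    # Inside the training loop (pseudo-usage):
    # if undertrained(loss.item(), loss_history):
    #     take extra gradient steps on this batch (the "chasing logic")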
SkipNet: Learning Dynamic Routing in Convolutional Networks
While deeper convolutional networks are needed to achieve maximum accuracy in
visual perception tasks, for many inputs shallower networks are sufficient. We
exploit this observation by learning to skip convolutional layers on a
per-input basis. We introduce SkipNet, a modified residual network that uses a
gating network to selectively skip convolutional blocks based on the
activations of the previous layer. We formulate the dynamic skipping problem in
the context of sequential decision making and propose a hybrid learning
algorithm that combines supervised learning and reinforcement learning to
address the challenges of non-differentiable skipping decisions. We show
SkipNet reduces computation by 30-90% while preserving the accuracy of the
original model on four benchmark datasets and outperforms the state-of-the-art
dynamic networks and static compression methods. We also qualitatively evaluate
the gating policy to reveal a relationship between image scale and saliency and
the number of layers skipped.
Comment: ECCV 2018 camera-ready version. Code is available at
https://github.com/ucbdrive/skipne
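A minimal PyTorch sketch of the gating idea: a lightweight gate looks at the incoming activations and decides whether to execute the residual block's convolutions. The hard threshold below is only valid at inference; the paper trains such non-differentiable decisions with a hybrid of supervised and reinforcement learning, which this sketch omits.

    import torch
    import torch.nn as nn

    class GatedResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            # Tiny gate: pooled activations -> probability of running the body.
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(channels, 1), nn.Sigmoid(),
            )

        def forward(self, x):
            g = (self.gate(x) > 0.5).float().view(-1, 1, 1, 1)
            return x + g * self.body(x)   # when g == 0 the block is skipped

In a real implementation the convolutions would only be launched when the gate fires; here they are computed unconditionally for clarity.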
Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices
Recurrent neural networks (RNNs) achieve cutting-edge performance on a
variety of problems. However, due to their high computational and memory
demands, deploying RNNs on resource-constrained mobile devices is a challenging
task. To guarantee minimal accuracy loss at a high compression rate, and driven
by mobile resource requirements, we introduce DirNet, a novel model compression
approach based on an optimized fast dictionary learning algorithm, which 1)
dynamically mines the dictionary atoms of the projection dictionary matrix
within each layer to adjust the compression rate, and 2) adaptively changes the
sparsity of the sparse codes across the hierarchical layers. Experimental
results on a language model and an ASR model trained on a 1000-hour speech
dataset demonstrate that our method significantly outperforms prior approaches.
Evaluated on off-the-shelf mobile devices, we are able to reduce the original
model size by a factor of eight with real-time model inference and negligible
accuracy loss.
Comment: Accepted by IJCAI-ECAI 2018
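The underlying factorization can be illustrated with off-the-shelf sparse dictionary learning: a trained projection matrix W is approximated by sparse codes times a small dictionary, trading reconstruction error for size. The snippet uses scikit-learn and a random stand-in matrix; DirNet's optimized algorithm, dynamic atom mining, and cross-layer adaptive sparsity are not reproduced here.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    W = np.random.randn(512, 256).astype(np.float32)  # stand-in weight matrix

    # W ~= C @ D, with sparse codes C (512 x 64) and dictionary D (64 x 256);
    # fewer atoms means a higher compression rate.
    dl = DictionaryLearning(n_components=64, transform_algorithm="lasso_lars",
                            transform_alpha=0.1, max_iter=10)
    C = dl.fit_transform(W)
    D = dl.components_
    err = np.linalg.norm(W - C @ D) / np.linalg.norm(W)
    print("relative reconstruction error:", err)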
Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models
Person re-identification (ReID) aims to identify the same person across videos
captured by different cameras. Since networks that extract global features with
ordinary architectures have difficulty extracting local features due to their
weak attention mechanisms, researchers have proposed many elaborately designed
ReID networks; while these greatly improve accuracy, model size and feature
extraction latency also soar. We argue that a relatively compact ordinary
network extracting globally pooled features is capable of extracting
discriminative local features and can achieve state-of-the-art precision,
provided the model's parameters are properly learned. In order to reduce the
difficulty of learning
hard identity labels, we propose a novel knowledge distillation method:
Factorized Distillation, which factorizes both feature maps and retrieval
features of the holistic ReID network to mimic the representations of multiple partial
ReID models, thus transferring the knowledge from partial ReID models to the
holistic network. Experiments show that a model trained with the proposed
method can outperform the state of the art with relatively few network
parameters.
Comment: 10 pages, 5 figures
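As a hypothetical sketch of the distillation direction described here (not the paper's exact formulation), the holistic network's retrieval feature can be split into chunks, each regressed onto the feature of one partial-ReID teacher, alongside the usual identity loss:

    import torch
    import torch.nn.functional as F

    def factorized_distillation_loss(holistic_feat, partial_feats,
                                     logits, labels, alpha=0.5):
        # Split the holistic feature into one chunk per partial teacher and
        # make each chunk mimic that teacher's (frozen) representation.
        # Assumes the holistic dim splits evenly into the teachers' dims.
        chunks = torch.chunk(holistic_feat, len(partial_feats), dim=1)
        mimic = sum(F.mse_loss(c, p.detach())
                    for c, p in zip(chunks, partial_feats))
        return F.cross_entropy(logits, labels) + alpha * mimic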
Accelerating Neutron Scattering Data Collection and Experiments Using AI Deep Super-Resolution Learning
We present a novel methodology for augmenting scattering data measured by
small-angle neutron scattering via a deep convolutional neural network (CNN) of
the kind widely used in artificial intelligence (AI). Data collection time is
reduced by increasing the binning size of the detector pixels at the sacrifice
of resolution. High-resolution scattering data are then reconstructed using an
AI deep super-resolution learning method. This technique
can not only improve the productivity of neutron scattering instruments by
speeding up the experimental workflow but also enable capturing kinetic changes
and transient phenomena of materials that are currently inaccessible to
existing neutron scattering techniques.
Comment: 16 pages, 5 figures
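A minimal SRCNN-style sketch of the reconstruction step: a coarsely binned detector image is bicubically upsampled and then refined by a few convolutional layers. The layer sizes and scale factor are illustrative assumptions, not the architecture from the paper.

    import torch
    import torch.nn as nn

    class ScatteringSR(nn.Module):
        def __init__(self, scale=4):
            super().__init__()
            self.up = nn.Upsample(scale_factor=scale, mode="bicubic",
                                  align_corners=False)
            self.refine = nn.Sequential(
                nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(inplace=True),
                nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, 5, padding=2),  # high-res intensity map
            )

        def forward(self, x):        # x: (N, 1, H, W) coarse detector image
            return self.refine(self.up(x))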
Representation-Oblivious Error Correction by Natural Redundancy
Storage systems have a strong need for substantially improving their error
correction capabilities, especially for long-term storage where the
accumulating errors can exceed the decoding threshold of error-correcting codes
(ECCs). In this work, a new scheme is presented that uses deep learning to
perform soft decoding for noisy files based on their natural redundancy. The
soft decoding result is then combined with ECCs for substantially better error
correction performance. The scheme is representation-oblivious: it requires no
prior knowledge of how data are represented (e.g., mapped from symbols to bits,
compressed, or combined with metadata) in different types of files, which makes
the solution more convenient to use in storage systems. Experimental
results confirm that the scheme can substantially improve the ability to
recover data for different types of files even when the bit error rates in the
files have significantly exceeded the decoding threshold of the ECC.
Comment: 7 pages, 5 figures, submitted to IEEE International Conference on
Communications-201
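The combination step can be sketched as follows: the deep model's per-bit posteriors, derived from the file's natural redundancy, are converted to log-likelihood ratios and added to the channel LLRs before soft-input ECC decoding. The sign convention and combining rule are generic assumptions, not taken from the paper.

    import numpy as np

    def posteriors_to_llrs(p_one, eps=1e-6):
        # Convert P(bit = 1) estimates into LLRs; positive values favor bit 0.
        p = np.clip(p_one, eps, 1.0 - eps)
        return np.log((1.0 - p) / p)

    # Combine with channel observations before running the ECC decoder:
    # llr_total = llr_channel + posteriors_to_llrs(nn_bit_posteriors)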