A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. Therefore, a natural thought is to perform model compression and
acceleration in deep networks without significantly decreasing the model
performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first, and the other
techniques are introduced afterward. For each category, we also provide insightful
analysis about the performance, related applications, advantages, and
drawbacks. Then we go through some very recent successful methods, for example,
dynamic capacity networks and stochastic depth networks. After that, we survey
the evaluation metrics, the main datasets used for evaluating model
performance, and recent benchmark efforts. Finally, we conclude this paper and
discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine, updated version
including more recent work
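As a minimal illustration of the first of these categories, the sketch below applies magnitude-based parameter pruning followed by uniform quantization to a weight matrix. It is a generic NumPy toy example, not a method from the survey, and the sparsity and bit-width values are arbitrary assumptions.

    import numpy as np

    def magnitude_prune(weights, sparsity=0.9):
        # Parameter pruning: zero out the smallest-magnitude weights.
        threshold = np.quantile(np.abs(weights), sparsity)
        return np.where(np.abs(weights) < threshold, 0.0, weights)

    def uniform_quantize(weights, n_bits=8):
        # Quantization: snap weights to 2**n_bits evenly spaced levels.
        w_min, w_max = weights.min(), weights.max()
        scale = (w_max - w_min) / (2 ** n_bits - 1)
        return np.round((weights - w_min) / scale) * scale + w_min

    w = np.random.randn(256, 256).astype(np.float32)
    w_compressed = uniform_quantize(magnitude_prune(w), n_bits=8)

The actual memory savings come from subsequently storing the pruned, quantized weights in sparse or low-bit formats.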
Improving Confidence Estimates for Unfamiliar Examples
Intuitively, unfamiliarity should lead to lack of confidence. In reality,
current algorithms often make highly confident yet wrong predictions when faced
with relevant but unfamiliar examples. A classifier we trained to recognize
gender is 12 times more likely to be wrong with a 99% confident prediction if
presented with a subject from a different age group than those seen during
training. In this paper, we compare and evaluate several methods to improve
confidence estimates for unfamiliar and familiar samples. We propose a testing
methodology of splitting unfamiliar and familiar samples by attribute (age,
breed, subcategory) or sampling (similar datasets collected by different people
at different times). We evaluate methods including confidence calibration,
ensembles, distillation, and a Bayesian model and use several metrics to
analyze label, likelihood, and calibration error. While all methods reduce
over-confident errors, the ensemble of calibrated models performs best overall,
and T-scaling performs best among the approaches with the fastest inference. Our
code is available at https://github.com/lizhitwo/ConfidenceEstimates .
Comment: Published in CVPR 2020 (oral). ERRATA: (1) a previous version (v3)
included erroneous results for T-scaling, where novel samples were
mistakenly included in the validation set for calibration. Please disregard
those results. (2) Previous versions (v4, v5) incorrectly stated that Adam
was used. In fact, we used SGD.
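For reference, here is a minimal sketch of T-scaling (temperature scaling) as it is commonly implemented: a single temperature T is fit on held-out validation logits by minimizing the negative log-likelihood, then reused at test time. This is a generic PyTorch illustration, not the authors' released code, and per the erratum the calibration set should contain no novel samples.

    import torch
    import torch.nn.functional as F

    def fit_temperature(val_logits, val_labels):
        # Optimize log T so T stays positive; minimize NLL of softmax(logits / T).
        log_t = torch.zeros(1, requires_grad=True)
        opt = torch.optim.LBFGS([log_t], max_iter=50)

        def closure():
            opt.zero_grad()
            loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
            loss.backward()
            return loss

        opt.step(closure)
        return log_t.exp().item()

    # At test time: probs = F.softmax(test_logits / T, dim=1)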
Handwritten Bangla Character Recognition Using the State-of-the-Art Deep Convolutional Neural Networks
In spite of advances in object recognition technology, Handwritten Bangla
Character Recognition (HBCR) remains largely unsolved due to the presence of
many ambiguous handwritten characters and excessively cursive Bangla
handwriting. Even the best existing recognizers do not achieve satisfactory
performance for practical applications related to Bangla character recognition
and have much lower performance than those developed for English alpha-numeric
characters. To improve the performance of HBCR, we herein present the
application of the state-of-the-art Deep Convolutional Neural Networks (DCNN)
including VGG Network, All Convolution Network (All-Conv Net), Network in
Network (NiN), Residual Network, FractalNet, and DenseNet for HBCR. The deep
learning approaches have the advantage of extracting and using feature
information, improving the recognition of 2D shapes with a high degree of
invariance to translation, scaling and other distortions. We systematically
evaluated the performance of the DCNN models on the publicly available Bangla
handwritten character dataset CMATERdb and achieved superior recognition
accuracy with the DCNN models. This improvement would help in building an
automatic HBCR system for practical applications.
Comment: 12 pages, 22 figures, 5 tables. arXiv admin note: text overlap with
arXiv:1705.0268
Gender Effect on Face Recognition for a Large Longitudinal Database
Aging or gender variation can dramatically affect face recognition
performance. While most face recognition studies focus on variations in pose,
illumination, and expression, it is important to consider the influence of
gender and how to design an effective matching framework.
In this paper, we address these problems on a very large longitudinal database
MORPH-II which contains 55,134 face images of 13,617 individuals. First, we
consider four comprehensive experiments with different combinations of gender
distribution and subset size: 1) equal gender distribution; 2) a large, highly
unbalanced gender distribution; 3) different gender combinations, such as male
only, female only, or mixed gender; and 4) the effect of subset size in terms
of the number of individuals. Second, we consider
eight nearest-neighbor distance metrics as well as a Support Vector Machine
(SVM) classifier, and test the effect of the different classifiers. Last, we
consider different fusion techniques for an effective matching framework to
improve recognition performance.
Comment: This paper has been accepted by the IEEE International Workshop on
Information Forensics and Security (2018 WIFS).
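To illustrate the kind of classifier comparison described above, here is a hypothetical scikit-learn sketch that scores 1-nearest-neighbor classification under several distance metrics against an SVM. The feature matrix X and labels y are random stand-ins, and the abstract does not specify which eight metrics the authors used; the list below is illustrative.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    X = np.random.randn(200, 64)          # stand-in face features
    y = np.random.randint(0, 2, 200)      # stand-in labels

    # Brute-force search supports arbitrary scipy distance metrics.
    for m in ["euclidean", "manhattan", "chebyshev", "cosine",
              "canberra", "braycurtis", "correlation", "minkowski"]:
        knn = KNeighborsClassifier(n_neighbors=1, metric=m, algorithm="brute")
        print(m, cross_val_score(knn, X, y, cv=5).mean())

    svm = SVC(kernel="rbf")
    print("svm", cross_val_score(svm, X, y, cv=5).mean())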
Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent
SGD is the widely adopted method to train CNNs. Conceptually, it approximates
the population with a randomly sampled batch, then trains batches evenly by
conducting a gradient update on every batch in an epoch. In this paper, we
demonstrate that Sampling Bias, Intrinsic Image Difference, and Fixed Cycle
Pseudo Random Sampling differentiate batches during training, which in turn
affects the learning speed on each batch. Because of this, the uniform
treatment of batches in SGD creates improper load balancing. To address this
issue, we present Inconsistent Stochastic Gradient Descent (ISGD), which
dynamically varies the training effort according to the learning status of each
batch. Specifically, ISGD leverages techniques from Statistical Process Control
to identify an undertrained batch.
Once a batch is undertrained, ISGD solves a new subproblem, a chasing logic
plus a conservative constraint, to accelerate training on that batch while
avoiding drastic parameter changes. Extensive experiments on a variety of
datasets demonstrate that ISGD converges faster than SGD. In training AlexNet,
ISGD is 21.05% faster than SGD at reaching 56% top-1 accuracy under exactly the
same experimental setup. We also extend ISGD to work on multi-GPU and
heterogeneous distributed systems based on data parallelism, making the batch
size the key to scalability. We then study the relation of the ISGD batch size
to the learning rate, parallelism, synchronization cost, system saturation, and
scalability, and conclude that the optimal ISGD batch size is machine
dependent. Various experiments on a multi-GPU system validate this claim. In
particular, ISGD trains AlexNet to 56.3% top-1 and 80.1% top-5 accuracy in 11.5
hours with 4 NVIDIA TITAN X GPUs at a batch size of 1536.
Comment: The patent of ISGD belongs to NEC Lab
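The core detection idea, identifying an undertrained batch with Statistical Process Control, can be sketched as a simple control-limit test on each batch's running loss. This is an illustrative reconstruction with arbitrary window and k values; the paper's actual control charts and the chasing subproblem are not reproduced here.

    import numpy as np

    def undertrained(batch_loss, history, k=3.0, window=100, min_hist=10):
        # Statistical Process Control: flag a batch whose loss exceeds the
        # upper control limit mean + k * std over recent batch losses.
        recent = history[-window:]
        if len(recent) < min_hist:
            return False
        mu, sigma = np.mean(recent), np.std(recent)
        return batch_loss > mu + k * sigma

    # Inside the training loop (pseudo-usage):
    # if undertrained(loss.item(), loss_history):
    #     take extra gradient steps on this batch (the "chasing logic")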
SkipNet: Learning Dynamic Routing in Convolutional Networks
While deeper convolutional networks are needed to achieve maximum accuracy in
visual perception tasks, for many inputs shallower networks are sufficient. We
exploit this observation by learning to skip convolutional layers on a
per-input basis. We introduce SkipNet, a modified residual network that uses a
gating network to selectively skip convolutional blocks based on the
activations of the previous layer. We formulate the dynamic skipping problem in
the context of sequential decision making and propose a hybrid learning
algorithm that combines supervised learning and reinforcement learning to
address the challenges of non-differentiable skipping decisions. We show
SkipNet reduces computation by 30-90% while preserving the accuracy of the
original model on four benchmark datasets and outperforms the state-of-the-art
dynamic networks and static compression methods. We also qualitatively evaluate
the gating policy to reveal a relationship between image scale and saliency and
the number of layers skipped.
Comment: ECCV 2018 camera-ready version. Code is available at
https://github.com/ucbdrive/skipne
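A minimal PyTorch sketch of the gating idea: a lightweight gate looks at the incoming activations and decides whether to execute the residual block's convolutions. The hard threshold below is only valid at inference; the paper trains such non-differentiable decisions with a hybrid of supervised and reinforcement learning, which this sketch omits.

    import torch
    import torch.nn as nn

    class GatedResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            # Tiny gate: pooled activations -> probability of running the body.
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(channels, 1), nn.Sigmoid(),
            )

        def forward(self, x):
            g = (self.gate(x) > 0.5).float().view(-1, 1, 1, 1)
            return x + g * self.body(x)   # when g == 0 the block is skipped

In a real implementation the convolutions would only be launched when the gate fires; here they are computed unconditionally for clarity.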
Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices
Recurrent neural networks (RNNs) achieve cutting-edge performance on a
variety of problems. However, due to their high computational and memory
demands, deploying RNNs on resource-constrained mobile devices is a challenging
task. To guarantee minimal accuracy loss at a high compression rate, and driven
by mobile resource requirements, we introduce DirNet, a novel model compression
approach based on an optimized fast dictionary learning algorithm, which 1)
dynamically mines the dictionary atoms of the projection dictionary matrix
within each layer to adjust the compression rate, and 2) adaptively changes the
sparsity of the sparse codes across the hierarchical layers. Experimental
results on a language model and an ASR model trained on a 1000-hour speech
dataset demonstrate that our method significantly outperforms prior approaches.
Evaluated on off-the-shelf mobile devices, we are able to reduce the original
model size by a factor of eight with real-time model inference and negligible
accuracy loss.
Comment: Accepted by IJCAI-ECAI 2018
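The underlying factorization can be illustrated with off-the-shelf sparse dictionary learning: a trained projection matrix W is approximated by sparse codes times a small dictionary, trading reconstruction error for size. The snippet uses scikit-learn and a random stand-in matrix; DirNet's optimized algorithm, dynamic atom mining, and cross-layer adaptive sparsity are not reproduced here.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    W = np.random.randn(512, 256).astype(np.float32)  # stand-in weight matrix

    # W ~= C @ D, with sparse codes C (512 x 64) and dictionary D (64 x 256);
    # fewer atoms means a higher compression rate.
    dl = DictionaryLearning(n_components=64, transform_algorithm="lasso_lars",
                            transform_alpha=0.1, max_iter=10)
    C = dl.fit_transform(W)
    D = dl.components_
    err = np.linalg.norm(W - C @ D) / np.linalg.norm(W)
    print("relative reconstruction error:", err)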
Factorized Distillation: Training Holistic Person Re-identification Model by Distilling an Ensemble of Partial ReID Models
Person re-identification (ReID) aims to identify the same person across videos
captured by different cameras. Since networks that extract global features with
ordinary architectures have difficulty extracting local features due to their
weak attention mechanisms, researchers have proposed many elaborately designed
ReID networks; while these greatly improve accuracy, model size and feature
extraction latency also soar. We argue that a relatively compact ordinary
network extracting globally pooled features is capable of extracting
discriminative local features and can achieve state-of-the-art precision,
provided the model's parameters are properly learned. In order to reduce the
difficulty of learning
hard identity labels, we propose a novel knowledge distillation method:
Factorized Distillation, which factorizes both feature maps and retrieval
features of the holistic ReID network to mimic the representations of multiple partial
ReID models, thus transferring the knowledge from partial ReID models to the
holistic network. Experiments show that a model trained with the proposed
method can outperform the state of the art with relatively few network
parameters.
Comment: 10 pages, 5 figures
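As a hypothetical sketch of the distillation direction described here (not the paper's exact formulation), the holistic network's retrieval feature can be split into chunks, each regressed onto the feature of one partial-ReID teacher, alongside the usual identity loss:

    import torch
    import torch.nn.functional as F

    def factorized_distillation_loss(holistic_feat, partial_feats,
                                     logits, labels, alpha=0.5):
        # Split the holistic feature into one chunk per partial teacher and
        # make each chunk mimic that teacher's (frozen) representation.
        # Assumes the holistic dim splits evenly into the teachers' dims.
        chunks = torch.chunk(holistic_feat, len(partial_feats), dim=1)
        mimic = sum(F.mse_loss(c, p.detach())
                    for c, p in zip(chunks, partial_feats))
        return F.cross_entropy(logits, labels) + alpha * mimic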
Accelerating Neutron Scattering Data Collection and Experiments Using AI Deep Super-Resolution Learning
We present a novel methodology for augmenting scattering data measured by
small-angle neutron scattering via a deep convolutional neural network (CNN) of
the kind widely used in artificial intelligence (AI). Data collection time is
reduced by increasing the binning size of the detector pixels at the sacrifice
of resolution. High-resolution scattering data are then reconstructed using an
AI deep super-resolution learning method. This technique
can not only improve the productivity of neutron scattering instruments by
speeding up the experimental workflow but also enable capturing kinetic changes
and transient phenomena of materials that are currently inaccessible to
existing neutron scattering techniques.
Comment: 16 pages, 5 figures
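A minimal SRCNN-style sketch of the reconstruction step: a coarsely binned detector image is bicubically upsampled and then refined by a few convolutional layers. The layer sizes and scale factor are illustrative assumptions, not the architecture from the paper.

    import torch
    import torch.nn as nn

    class ScatteringSR(nn.Module):
        def __init__(self, scale=4):
            super().__init__()
            self.up = nn.Upsample(scale_factor=scale, mode="bicubic",
                                  align_corners=False)
            self.refine = nn.Sequential(
                nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(inplace=True),
                nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, 5, padding=2),  # high-res intensity map
            )

        def forward(self, x):        # x: (N, 1, H, W) coarse detector image
            return self.refine(self.up(x))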
Representation-Oblivious Error Correction by Natural Redundancy
Storage systems have a strong need for substantially improving their error
correction capabilities, especially for long-term storage where the
accumulating errors can exceed the decoding threshold of error-correcting codes
(ECCs). In this work, a new scheme is presented that uses deep learning to
perform soft decoding for noisy files based on their natural redundancy. The
soft decoding result is then combined with ECCs for substantially better error
correction performance. The scheme is representation-oblivious: it requires no
prior knowledge of how data are represented (e.g., mapped from symbols to bits,
compressed, or combined with metadata) in different types of files, which makes
the solution more convenient to use in storage systems. Experimental
results confirm that the scheme can substantially improve the ability to
recover data for different types of files even when the bit error rates in the
files have significantly exceeded the decoding threshold of the ECC.
Comment: 7 pages, 5 figures, submitted to IEEE International Conference on
Communications-201
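The combination step can be sketched as follows: the deep model's per-bit posteriors, derived from the file's natural redundancy, are converted to log-likelihood ratios and added to the channel LLRs before soft-input ECC decoding. The sign convention and combining rule are generic assumptions, not taken from the paper.

    import numpy as np

    def posteriors_to_llrs(p_one, eps=1e-6):
        # Convert P(bit = 1) estimates into LLRs; positive values favor bit 0.
        p = np.clip(p_one, eps, 1.0 - eps)
        return np.log((1.0 - p) / p)

    # Combine with channel observations before running the ECC decoder:
    # llr_total = llr_channel + posteriors_to_llrs(nn_bit_posteriors)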