26 research outputs found

    Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

    Full text link
    This paper tackles the problem of training a deep convolutional neural network with both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging because the quantizer is non-differentiable, which can result in substantial accuracy loss. To address this, we propose three practical approaches to improve network training: (i) progressive quantization, (ii) stochastic precision, and (iii) joint knowledge distillation. First, for progressive quantization, we propose two schemes to progressively find good local minima. Specifically, we first optimize a network with quantized weights and only subsequently quantize the activations, in contrast to traditional methods that optimize both simultaneously. We further propose a second progressive scheme that gradually decreases the bit-width from high precision to low precision during training. Second, to alleviate the training burden of these multi-round training stages, we propose a one-stage stochastic precision strategy that randomly samples and quantizes sub-networks while keeping the other parts in full precision. Finally, we adopt a novel learning scheme that jointly trains a full-precision model alongside the low-precision one; the full-precision model provides hints that guide the low-precision training and significantly improve the performance of the low-precision network. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods. Comment: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Extended version of arXiv:1711.00205 (CVPR 2018).
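    The abstract above describes an algorithmic recipe; the following is a minimal PyTorch-style sketch, not the authors' implementation, of (a) a uniform straight-through quantizer, (b) a progressive bit-width schedule that quantizes weights before activations, and (c) a joint loss in which a full-precision teacher provides hints. The function names, schedule boundaries, and loss weights are illustrative assumptions.

        # Minimal PyTorch-style sketch (illustrative, not the authors' code).
        import torch
        import torch.nn.functional as F

        def quantize_ste(x, bits, lo=-1.0, hi=1.0):
            # Uniform quantization to 2**bits levels on [lo, hi]; the backward pass
            # is the identity on the clipped input (straight-through estimator).
            xc = torch.clamp(x, lo, hi)
            scale = (2 ** bits - 1) / (hi - lo)
            xq = torch.round((xc - lo) * scale) / scale + lo
            return xc + (xq - xc).detach()

        def bit_schedule(epoch):
            # Progressive schedule (assumed boundaries): full precision, then 4-bit
            # weights with full-precision activations, then 4-bit weights and activations.
            if epoch < 10:
                return 32, 32
            if epoch < 20:
                return 4, 32
            return 4, 4

        def joint_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
            # Cross-entropy on the labels plus a softened KL term that lets the
            # full-precision teacher guide the low-precision student.
            ce = F.cross_entropy(student_logits, targets)
            kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                          F.softmax(teacher_logits / T, dim=1),
                          reduction="batchmean") * (T * T)
            return alpha * ce + (1.0 - alpha) * kd

    In this sketch, the stochastic-precision idea would correspond to randomly choosing, at each iteration, which sub-networks use the low-precision setting returned by bit_schedule; that sampling step is omitted for brevity.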

    Forward and Backward Information Retention for Accurate Binary Neural Networks

    Full text link
    Weight and activation binarization is an effective approach to deep neural network compression and can accelerate inference by leveraging bitwise operations. Although many binarization methods have improved model accuracy by minimizing the quantization error in forward propagation, a noticeable performance gap remains between the binarized model and the full-precision one. Our empirical study indicates that quantization causes information loss in both forward and backward propagation, which is the bottleneck in training accurate binary neural networks. To address these issues, we propose an Information Retention Network (IR-Net) that retains the information contained in the forward activations and backward gradients. IR-Net relies on two technical contributions: (1) Libra Parameter Binarization (Libra-PB), which simultaneously minimizes the quantization error and the information loss of parameters by balancing and standardizing weights in forward propagation; and (2) the Error Decay Estimator (EDE), which minimizes the information loss of gradients by gradually approximating the sign function in backward propagation, jointly considering updating ability and gradient accuracy. We are the first to investigate both the forward and backward processes of binary networks from a unified information perspective, which provides new insight into the mechanism of network binarization. Comprehensive experiments with various network structures on the CIFAR-10 and ImageNet datasets show that the proposed IR-Net consistently outperforms state-of-the-art quantization methods.
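    As a rough illustration of the two components named above, the sketch below (an assumption-laden approximation, not the released IR-Net code) standardizes and balances weights before taking their sign (Libra-PB) and replaces the gradient of sign() with the derivative of k*tanh(t*x), which sharpens as t grows during training (EDE). The scaling factor and the schedule for t are simplified.

        # Hedged sketch of Libra-PB + EDE; t is a Python float that grows over training.
        import torch

        class BinarizeEDE(torch.autograd.Function):
            @staticmethod
            def forward(ctx, x, t):
                ctx.save_for_backward(x)
                ctx.t = t
                return torch.sign(x)

            @staticmethod
            def backward(ctx, grad_out):
                # Gradient of k * tanh(t * x), a soft, gradually sharpening sign().
                (x,) = ctx.saved_tensors
                t = ctx.t
                k = max(1.0 / t, 1.0)
                grad_in = grad_out * k * t * (1.0 - torch.tanh(t * x) ** 2)
                return grad_in, None

        def libra_pb(w, t):
            # Balance (zero-mean) and standardize the weights, binarize them with EDE,
            # and restore magnitude with a per-tensor scaling factor.
            w_std = (w - w.mean()) / (w.std() + 1e-8)
            scale = w_std.abs().mean()
            return BinarizeEDE.apply(w_std, t) * scale

    A binarized layer would call libra_pb on its weight tensor in forward(), with t increased over the course of training (e.g., from roughly 0.1 to 10) so the gradient approximation starts smooth and ends close to the true sign function.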

    ํšจ์œจ์ ์ธ ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง์„ ์œ„ํ•œ ์–‘์žํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋ฐ ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    Doctoral dissertation, Seoul National University, College of Engineering, Department of Computer Science and Engineering, February 2020. Advisor: Sungjoo Yoo.
    Deep neural networks (DNNs) are becoming increasingly popular and are widely adopted for various applications. The energy efficiency of neural networks is critically important for both edge devices and servers, so it is imperative to optimize neural networks in terms of both speed and energy consumption while maintaining accuracy. Quantization is one of the most effective optimization techniques. By reducing the bit-width of activations and weights, both speed and energy can be improved, because more computations can be executed with the same amount of memory access and computational resources (e.g., silicon chip area and battery). Computations at 4-bit and lower precision are expected to contribute to the energy efficiency and real-time behavior of future deep learning applications. One major drawback of quantization is the drop in accuracy that results from the reduced degrees of freedom of the data representation. Several recent studies have demonstrated that DNN inference can be performed accurately using 8-bit precision. However, many studies also show that networks quantized to 4-bit or lower precision suffer significant quality degradation. In particular, state-of-the-art networks cannot be quantized easily because of their optimized structure. In this dissertation, several methods are proposed that use different approaches to minimize the accuracy loss of quantized DNNs. Weighted-entropy-based quantization is designed to fully utilize the limited number of quantization levels by maximizing the weighted information of the quantized data; this work shows the potential of multi-bit quantization for both activations and weights. Value-aware quantization, or outlier-aware quantization, is designed to support sub-4-bit quantization while allowing a small fraction (1-3%) of large values to remain in high precision. This helps the quantized data maintain statistics such as the mean and variance of the full-precision data, minimizing the accuracy drop after quantization. A dedicated hardware accelerator, called OLAccel, is also proposed to maximize the performance of networks quantized by outlier-aware quantization; the hardware exploits the benefit of reduced precision, i.e., 4-bit, with minimal accuracy drop thanks to the proposed quantization algorithm. Precision highway is a structural concept that forms an end-to-end high-precision information flow while performing ultra-low-precision computations. It minimizes the accumulated quantization error, which helps improve accuracy even at extremely low precision. BLast, a training methodology, and differentiable and unified quantization (DuQ), a novel quantization algorithm, are designed to support sub-4-bit quantization of optimized mobile networks, i.e., MobileNet-v3. These methods allow MobileNet-v3 to be quantized to 4 bits for both activations and weights with negligible accuracy loss.
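    A tiny sketch, under stated assumptions, of the outlier-aware idea from the abstract: keep the top few percent of values by magnitude in full precision and uniformly quantize the rest to low precision. The function name, the 2% default ratio, and the symmetric quantizer are illustrative and are not the dissertation's implementation (which also covers training and the OLAccel dataflow).

        # Illustrative outlier-aware quantization of a tensor (not the dissertation's code).
        import torch

        def outlier_aware_quantize(x, bits=4, outlier_ratio=0.02):
            # Select the top `outlier_ratio` fraction of values by magnitude as outliers.
            n = x.numel()
            k = max(1, int(n * outlier_ratio))
            threshold = x.abs().flatten().kthvalue(n - k + 1).values
            outlier_mask = x.abs() >= threshold

            # Uniformly quantize the non-outliers over their own (smaller) range.
            body = x[~outlier_mask]
            max_abs = body.abs().max().clamp(min=1e-8)
            levels = 2 ** (bits - 1) - 1
            q = torch.round(body / max_abs * levels) / levels * max_abs

            out = x.clone()
            out[~outlier_mask] = q      # low-precision "body"
            return out, outlier_mask    # outliers stay in full precision

    Because the few large-magnitude outliers are kept exactly, the quantization range for the remaining values shrinks, which is what preserves the mean and variance mentioned in the abstract.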
์ด์ œ ์ •ํ™•๋„๋ฅผ ์œ ์ง€ํ•˜๋ฉด์„œ ์†๋„๋ฅผ ๋น ๋ฅด๊ฒŒ ํ•˜๊ณ  ์—๋„ˆ์ง€ ์†Œ๋ชจ๋ฅผ ์ค„์ด๋Š” ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ์˜ ์ตœ์ ํ™”๋Š” ํ•„์ˆ˜์  ์š”์†Œ๋กœ ์ž๋ฆฌ์žก์•˜๋‹ค. ์–‘์žํ™”๋Š” ๊ฐ€์žฅ ํšจ๊ณผ์ ์ธ ์ตœ์ ํ™” ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ๋‰ด๋Ÿฐ์˜ ํ™œ์„ฑ๋„ (activation) ๋ฐ ํ•™์Šต ๊ฐ€์ค‘์น˜ (weight)๋ฅผ ์ €์žฅํ•˜๋Š”๋ฐ ํ•„์š”ํ•œ ๋น„ํŠธ ์ˆ˜๋ฅผ ์ค„์ž„์œผ๋กœ์จ ๋™์ผํ•œ ์–‘์˜ ๋ฐ์ดํ„ฐ ์ ‘๊ทผ๊ณผ ์—ฐ์‚ฐ ๋น„์šฉ (์นฉ ๋ฉด์  ๋ฐ ์—๋„ˆ์ง€ ์†Œ๋ชจ ๋“ฑ)์œผ๋กœ ๋” ๋งŽ์€ ์—ฐ์‚ฐ์ด ๊ฐ€๋Šฅํ•ด์ง€๋ฉฐ ์ด๋กœ์ธํ•ด ์†๋„์™€ ์—๋„ˆ์ง€ ์†Œ๋ชจ๋ฅผ ๋™์‹œ์— ์ตœ์ ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค. ์ถ”ํ›„ ๋”ฅ ๋Ÿฌ๋‹์„ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ํ•„์š”ํ•  ๊ฒƒ์œผ๋กœ ์˜ˆ์ธก๋˜๋Š” ์—๋„ˆ์ง€ ํšจ์œจ ๋ฐ ์—ฐ์‚ฐ ์†๋„๋ฅผ ๋งŒ์กฑ์‹œํ‚ค๊ธฐ ์œ„ํ•ด์„œ 4 ๋น„ํŠธ ํ˜น์€ ๋” ์ ์€ ์ •๋ฐ€๋„ ๊ธฐ๋ฐ˜์˜ ์–‘์žํ™” ์—ฐ์‚ฐ์ด ์ง€๋Œ€ํ•œ ๊ณตํ—Œ์„ ํ•  ๊ฒƒ์œผ๋กœ ๊ธฐ๋Œ€๋œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์–‘์žํ™”์˜ ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๋‹จ์  ์ค‘ ํ•˜๋‚˜๋Š” ๋ฐ์ดํ„ฐ์˜ ํ‘œํ˜„ํ˜•์„ ์ œํ•œํ•˜์—ฌ ์ž์œ ๋„๊ฐ€ ๋–จ์–ด์ง€๊ฒŒ ๋จ์œผ๋กœ์„œ ๋ฐœ์ƒํ•˜๋Š” ์ •ํ™•๋„์˜ ์†์‹ค์ด๋‹ค. ์ด๋Ÿฌํ•œ ๋‹จ์ ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ๋‹ค์–‘ํ•œ ์—ฐ๊ตฌ๋“ค์ด ์ง„ํ–‰์ค‘์ด๋‹ค. ์ตœ๊ทผ ์ผ๋ถ€ ์—ฐ๊ตฌ๋“ค์€ 8 ๋น„ํŠธ์˜ ์ •๋ฐ€๋„์—์„œ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ๋ฅผ ํ™œ์šฉํ•ด ๊ฒฐ๊ณผ๋ฅผ ์ถ”๋ก  (inference)ํ•˜๋Š”๋ฐ ์ •ํ™•๋„ ์†์‹ค์ด ๊ฑฐ์˜ ์—†์Œ์„ ๋ณด๊ณ ํ•˜๊ณ  ์žˆ๋‹ค. ๋ฐ˜๋ฉด ๊ทธ ์™ธ์˜ ๋‹ค์–‘ํ•œ ์—ฐ๊ตฌ๋“ค์„ ํ†ตํ•ด 4 ๋น„ํŠธ ํ˜น์€ ๋” ๋‚ฎ์€ ์ •๋ฐ€๋„์—์„œ ์–‘์žํ™”๋ฅผ ์ ์šฉํ–ˆ์„ ๋•Œ ๋งŽ์€ ๋„คํŠธ์›Œํฌ๋“ค์˜ ์ •ํ™•๋„๊ฐ€ ํฌ๊ฒŒ ์†์ƒ๋˜๋Š” ํ˜„์ƒ๋„ ํ•จ๊ป˜ ๋ณด๊ณ ๋˜๊ณ  ์žˆ๋‹ค. ํŠนํžˆ ์ตœ๊ทผ ์ œ์•ˆ๋œ ๋„คํŠธ์›Œํฌ๋“ค์˜ ๊ฒฝ์šฐ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์œ„ํ•ด ๋„์ž…ํ•œ ์ตœ์ ํ™”๋œ ๊ตฌ์กฐ๊ฐ€ ์–‘์žํ™” ํ•˜๊ธฐ ์–ด๋ ค์šด ํŠน์„ฑ์„ ๊ฐ€์ ธ ์ด๋Ÿฌํ•œ ํ˜„์ƒ์ด ์‹ฌํ™”๋œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์–‘์žํ™”๋œ DNN์˜ ์ •ํ™•๋„ ์†์‹ค์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ์œ„ํ•œ ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•๋“ค์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๊ฐ€์ค‘ ์—”ํŠธ๋กœํ”ผ ๊ธฐ๋ฐ˜ ์–‘์žํ™” (Weighted-entropy-based quantization)์€ ์ œํ•œ๋œ ๊ฐœ์ˆ˜์˜ ์–‘์žํ™” ๋ ˆ๋ฒจ์„ ์ตœ๋Œ€ํ•œ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์–‘์žํ™”๋œ ๋ฐ์ดํ„ฐ์˜ ์ •๋ณด๋Ÿ‰์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ์–‘์žํ™”๋ฅผ ์ง„ํ–‰ํ•˜๋„๋ก ์„ค๊ณ„๋˜์—ˆ๋‹ค. ์ด ์—ฐ๊ตฌ๋ฅผ ํ†ตํ•ด ์•„์ฃผ ๊นŠ์€ ๋„คํŠธ์›Œํฌ์—์„œ๋„ ๋‰ด๋Ÿฐ์˜ ํ™œ์„ฑ๋„์™€ ํ•™์Šต ๊ฐ€์ค‘์น˜ ๋ชจ๋‘์˜ ์–‘์žํ™”๊ฐ€ ์ ์šฉ ๊ฐ€๋Šฅํ•จ์„ ๋ณด์˜€๋‹ค. ๊ฐ’-์˜์‹ ์–‘์žํ™” (value-aware quantization), ํ˜น์€ ์˜ˆ์™ธ-์˜์‹ ์–‘์žํ™” (outlier-aware quantization)๋Š” ๋นˆ๋„๋Š” ๋‚ฎ์ง€๋งŒ ํฐ ๊ฐ’์„ ๊ฐ€์ง€๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ํฐ ์ •๋ฐ€๋„๋กœ ์ €์žฅํ•˜๋Š” ๋Œ€์‹  ๋‚˜๋จธ์ง€ ๋ฐ์ดํ„ฐ์— 4 ๋น„ํŠธ ์ดํ•˜์˜ ์–‘์žํ™”๋ฅผ ์ ์šฉํ•˜๋„๋ก ์„ค๊ณ„๋œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค. ์ด๋Š” ์›๋ณธ ๋ฐ์ดํ„ฐ์˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ ๊ฐ™์€ ํŠน์„ฑ์ด ์–‘์žํ™”๋œ ํ›„์—๋„ ์œ ์ง€ํ•˜๋„๋ก ๋„์™€์ฃผ์–ด ์–‘์žํ™”๋œ ๋„คํŠธ์›Œํฌ์˜ ์ •ํ™•๋„๋ฅผ ์œ ์ง€ํ•˜๋Š”๋ฐ ๊ธฐ์—ฌํ•œ๋‹ค. ์ด์— ๋”ํ•˜์—ฌ OLAccel์ด๋ผ ๋ช…๋ช…๋œ ํŠนํ™” ๊ฐ€์†๊ธฐ๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ด ๊ฐ€์†๊ธฐ๋Š” ๊ฐ’-์˜์‹ ์–‘์žํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ†ตํ•ด ์–‘์žํ™”๋œ ๋„คํŠธ์›Œํฌ๋ฅผ ๊ฐ€์†ํ•จ์œผ๋กœ์จ ์ •ํ™•๋„ ๊ฐ์†Œ๋Š” ์ตœ์†Œํ™” ํ•˜๋ฉด์„œ ๋‚ฎ์€ ์ •๋ฐ€๋„์˜ ์„ฑ๋Šฅ ์ด๋“์„ ์ตœ๋Œ€ํ™”ํ•œ๋‹ค. ๊ณ ์ •๋ฐ€๋„-ํ†ต๋กœ ๊ตฌ์กฐ (precision-highway)๋Š” ๋„คํŠธ์›Œํฌ์˜ ๊ตฌ์กฐ๋ฅผ ๊ฐœ์„ ํ•˜์—ฌ ์ดˆ์ €์ •๋ฐ€๋„ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๋ฉด์„œ๋„ ๊ณ ์ •๋ฐ€๋„ ์ •๋ณด ํ†ต๋กœ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค. ์ด๋Š” ์–‘์žํ™”๋กœ ์ธํ•˜์—ฌ ์—๋Ÿฌ๊ฐ€ ๋ˆ„์ ๋˜๋Š” ํ˜„์ƒ์„ ์™„ํ™”ํ•˜์—ฌ ๋งค์šฐ ๋‚ฎ์€ ์ •๋ฐ€๋„์—์„œ ์ •ํ™•๋„๋ฅผ ๊ฐœ์„ ํ•˜๋Š”๋ฐ ๊ธฐ์—ฌํ•œ๋‹ค. ํ•™์Šต ๊ธฐ๋ฒ•์ธ BLast์™€ ๋ฏธ๋ถ„ ๊ฐ€๋Šฅํ•˜๊ณ  ํ†ตํ•ฉ๋œ ์–‘์žํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜ (DuQ)๋Š” MobileNet-v3๊ณผ ๊ฐ™์€ ์ตœ์ ํ™”๋œ ๋ชจ๋ฐ”์ผํ–ฅ ๋„คํŠธ์›Œํฌ๋ฅผ ์ตœ์ ํ™”ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์ œ์•ˆ๋˜์—ˆ๋‹ค. 
์ด ๋ฐฉ๋ฒ•๋“ค์„ ํ†ตํ•ด ๋ฏธ๋ฏธํ•œ ์ •ํ™•๋„ ์†์‹ค๋งŒ์œผ๋กœ MobileNet-v3์˜ ํ™œ์„ฑ๋„ ๋ฐ ํ•™์Šต ๊ฐ€์ค‘์น˜ ๋ชจ๋‘๋ฅผ 4 ๋น„ํŠธ ์ •๋ฐ€๋„๋กœ ์–‘์žํ™”ํ•˜๋Š”๋ฐ ์„ฑ๊ณตํ•˜์˜€๋‹ค.Chapter 1. Introduction 1 Chapter 2. Background and RelatedWork 4 Chapter 3. Weighted-entropy-based Quantization 15 3.1 Introduction 15 3.2 Motivation 17 3.3 Quantization based on Weighted Entropy 20 3.3.1 Weight Quantization 20 3.3.2 Activation Quantization 24 3.3.3 IntegratingWeight/Activation Quantization into the Training Algorithm 27 3.4 Experiment 28 3.4.1 Image Classification: AlexNet, GoogLeNet and ResNet-50/101 28 3.4.2 Object Detection: R-FCN with ResNet-50 35 3.4.3 Language Modeling: An LSTM 37 3.5 Conclusion 38 Chapter 4. Value-aware Quantization for Training and Inference of Neural Networks 40 4.1 Introduction 40 4.2 Motivation 41 4.3 Proposed Method 43 4.3.1 Quantized Back-Propagation 44 4.3.2 Back-Propagation of Full-Precision Loss 46 4.3.3 Potential of Further Reduction in Computation Cost 47 4.3.4 Local Sorting in Data Parallel Training 48 4.3.5 ReLU and Value-aware Quantization (RV-Quant) 49 4.3.6 Activation Annealing 50 4.3.7 Quantized Inference 50 4.4 Experiments 51 4.4.1 Training Results 52 4.4.2 Inference Results 59 4.4.3 LSTM Language Model 61 4.5 Conclusions 62 Chapter 5. Energy-efficient Neural Network Accelerator Based on Outlier-aware Low-precision Computation 63 5.1 Introduction 63 5.2 Proposed Architecture 65 5.2.1 Overall Structure 65 5.2.2 Dataflow 68 5.2.3 PE Cluster 72 5.2.4 Normal PE Group 72 5.2.5 Outlier PE Group and Cluster Output Tri-buffer 75 5.3 Evaluation Methodology 78 5.4 Experimental Results 80 5.5 Conclusion 90 Chapter 6. Precision Highway for Ultra Low-Precision Quantization 92 6.1 Introduction 92 6.2 Proposed Method 93 6.2.1 Precision Highway on Residual Network 94 6.2.2 Precision Highway on Recurrent Neural Network 96 6.2.3 Practical Issues with Precision Highway 98 6.3 Training 99 6.3.1 LinearWeight Quantization based on Laplace Distribution Model 99 6.3.2 Fine-tuning for Weight/Activation Quantization 100 6.4 Experiments 101 6.4.1 Experimental Setup 101 6.4.2 Analysis of Accumulated Quantization Error 101 6.4.3 Loss Surface Analysis of Quantized Model Training 103 6.4.4 Evaluating the Accuracy of Quantized Model 103 6.4.5 Hardware Cost Evaluation of Quantized Model 108 6.5 Conclusion 109 Chapter 7. Towards Sub-4-bit Quantization of Optimized Mobile Netowrks 114 7.1 Introduction 114 7.2 BLast Training 117 7.2.1 Notation 118 7.2.2 Observation 118 7.2.3 Activation Instability Metric 120 7.2.4 BLast Training 122 7.3 Differentiable and Unified Quantization 124 7.3.1 Rounding and Truncation Errors 124 7.3.2 Limitations of State-of-the-Art Methods 124 7.3.3 Proposed Method: DuQ 126 7.3.4 Handling Negative Values 128 7.4 Experiments 131 7.4.1 Accuracy on ImageNet Dataset 131 7.4.2 Discussion on Fused-BatchNorm 133 7.4.3 Ablation Study 134 7.5 Conclusion 137 Chapter 8 Conclusion 138 Bibliography 141 ๊ตญ๋ฌธ์ดˆ๋ก 154 Acknowledgements 157Docto

    Distribution-sensitive Information Retention for Accurate Binary Neural Network

    Full text link
    Model binarization is an effective method for compressing neural networks and accelerating their inference. However, a significant performance gap still exists between the 1-bit model and the 32-bit one. Our empirical study shows that binarization causes a great loss of information in both forward and backward propagation. We present a novel Distribution-sensitive Information Retention Network (DIR-Net) that retains this information by improving internal propagation and introducing external representations. DIR-Net relies on three technical contributions: (1) Information Maximized Binarization (IMB), which simultaneously minimizes the information loss and the binarization error of weights/activations through weight balancing and standardization; (2) the Distribution-sensitive Two-stage Estimator (DTE), which retains gradient information through a distribution-sensitive soft approximation that jointly considers updating capability and gradient accuracy; and (3) Representation-align Binarization-aware Distillation (RBD), which retains representation information by distilling representations between the full-precision and binarized networks. DIR-Net investigates both the forward and backward processes of BNNs from a unified information perspective, providing new insight into the mechanism of network binarization. The three techniques in DIR-Net are versatile and effective and can be applied to various structures to improve BNNs. Comprehensive experiments on image classification and object detection tasks show that DIR-Net consistently outperforms state-of-the-art binarization approaches under mainstream and compact architectures such as ResNet, VGG, EfficientNet, DARTS, and MobileNet. Additionally, we deploy DIR-Net on real-world resource-limited devices, achieving an 11.1x storage saving and a 5.4x speedup.
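    Of the three components, the distillation term (RBD) is the easiest to convey in a few lines. The sketch below is a hedged approximation, not the paper's exact objective: it aligns normalized intermediate representations of the binarized network with those of its full-precision counterpart.

        # Hedged sketch of a representation-alignment distillation loss.
        import torch
        import torch.nn.functional as F

        def representation_align_loss(binary_feats, fp_feats):
            # binary_feats / fp_feats: lists of feature maps taken at matching layers.
            total = 0.0
            for fb, ff in zip(binary_feats, fp_feats):
                fb = F.normalize(fb.flatten(start_dim=1), dim=1)
                ff = F.normalize(ff.flatten(start_dim=1), dim=1)
                total = total + F.mse_loss(fb, ff.detach())  # teacher is not updated
            return total / len(binary_feats)

    This term would be added to the usual task loss so the binarized student is pulled toward the full-precision network's internal representations as well as its outputs.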

    Ternary Neural Networks

    Get PDF

    Efficient Deep Neural Networks

    Get PDF
    The success of deep neural networks (DNNs) is attributable to three factors: increased compute capacity, more complex models, and more data. These factors, however, are not always present, especially for edge applications such as autonomous driving, augmented reality, and the internet of things. Training DNNs requires a large amount of data, which is difficult to obtain. Edge devices such as mobile phones have limited compute capacity and therefore require specialized and efficient DNNs. However, due to the enormous design space and prohibitive training costs, designing efficient DNNs for different target devices is challenging. So the question is: with limited data, compute capacity, and model complexity, can we still successfully apply deep neural networks? This dissertation focuses on these problems and improves the efficiency of deep neural networks at four levels. Model efficiency: we designed neural networks for various computer vision tasks and achieved more than 10x faster speed with lower energy consumption. Data efficiency: we developed an advanced tool that enables 6.2x faster annotation of LiDAR point clouds, and we leveraged domain adaptation to utilize simulated data, bypassing the need for real data. Hardware efficiency: we co-designed neural networks and hardware accelerators and achieved 11.6x faster inference. Design efficiency: because finding optimal neural networks is time-consuming, our automated neural architecture search algorithms discovered models with state-of-the-art accuracy and efficiency using 421x lower computational cost than previous search methods.