Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Author: Adam Hartwig
Chen Bo
Howard Andrew
Jacob Benoit
Kalenichenko Dmitry
Kligys Skirmantas
Tang Matthew
Zhu Menglong
Publication date: 15/12/2017

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.

Comment: 14 pages, 12 figures

arXiv.org e-Print Archive

Crossref

Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency

Author: Chen Guan-Yu
Chen Hung-Jen
Cheng Chia-Ming
Chiang Cheng-Ming
Kao Kloze
Kuo Hsien-Kai
Lin Wei-Shiang
Lin Yu-Chieh
Shen BY
Tan Koan-Sin
Tsai Yi-Min
Tseng Shou-Yao Roy
Tseng Yu
Wang Wei-Ting
Xu Yu-Syuan
Yu Chia-Lin
Publication date: 27/04/2020

Recently, image enhancement and restoration tasks such as super-resolution and image deblurring have become important applications on mobile devices. However, most state-of-the-art networks have extremely high computational complexity, which makes them difficult to deploy on mobile devices with acceptable latency. Moreover, latency varies widely across mobile devices because of the differences and limitations of their deep learning accelerators. In this paper, we conduct a search of portable network architectures for a better quality-latency trade-off across mobile devices. We further present the effectiveness of widely used network optimizations for the image deblurring task. This paper provides comprehensive experiments and comparisons that analyze both latency and image quality in depth. Through this work, we demonstrate the successful deployment of an image deblurring application on mobile devices with the acceleration of deep learning accelerators. To the best of our knowledge, this is the first paper that addresses all the deployment issues of the image deblurring task across mobile devices. This paper provides practical deployment guidelines, which were adopted by the championship-winning team in the NTIRE 2020 Image Deblurring Challenge on Smartphone Track.

Comment: CVPR 2020 Workshop on New Trends in Image Restoration and Enhancement (NTIRE)

arXiv.org e-Print Archive

Crossref

MLPerf Inference Benchmark

Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.

Comment: ISCA 202

arXiv.org e-Print Archive

Crossref