
    ZeroQ: A Novel Zero Shot Quantization Framework

    Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance, especially when quantizing to ultra-low precision. Here, we propose ZeroQ, a novel zero-shot quantization framework to address this. ZeroQ enables mixed-precision quantization without any access to the training or validation data. This is achieved by optimizing for a Distilled Dataset, which is engineered to match the statistics of batch normalization across different layers of the network. ZeroQ supports both uniform and mixed-precision quantization. For the latter, we introduce a novel Pareto frontier based method to automatically determine the mixed-precision bit setting for all layers, with no manual search involved. We extensively test our proposed method on a diverse set of models, including ResNet18/50/152, MobileNetV2, ShuffleNet, SqueezeNext, and InceptionV3 on ImageNet, as well as RetinaNet-ResNet50 on the Microsoft COCO dataset. In particular, we show that ZeroQ can achieve 1.71% higher accuracy on MobileNetV2 than the recently proposed DFQ method. Importantly, ZeroQ has a very low computational overhead and can finish the entire quantization process in less than 30 s (0.5% of one epoch of ResNet50 training time on ImageNet). We have open-sourced the ZeroQ framework (https://github.com/amirgholami/ZeroQ).
    Comment: CVPR 2020
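
    The distillation step lends itself to a short PyTorch sketch. The following is a minimal illustration of the idea, not the authors' released code; the function name and hyperparameters (`batch_size`, `iters`, `lr`, input size) are assumptions. Random inputs are optimized so that the activation statistics they induce at each BatchNorm layer match that layer's stored running mean and variance.

```python
import torch
import torch.nn as nn

def distill_data(model, batch_size=32, iters=500, lr=0.1):
    """Sketch of ZeroQ-style data distillation (assumed names and
    hyperparameters, not the authors' implementation): optimize random
    inputs so the statistics they induce at each BatchNorm layer match
    that layer's stored running mean and variance."""
    model.eval()
    bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]

    x = torch.randn(batch_size, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)

    # Forward hooks record the per-channel mean/variance of each BN input.
    stats = []
    def hook(module, inputs, output):
        inp = inputs[0]
        stats.append((module, inp.mean(dim=(0, 2, 3)), inp.var(dim=(0, 2, 3))))
    handles = [m.register_forward_hook(hook) for m in bn_layers]

    for _ in range(iters):
        stats.clear()
        opt.zero_grad()
        model(x)
        # Distance between the induced statistics and the BN running stats.
        loss = sum(torch.norm(mu - m.running_mean) + torch.norm(var - m.running_var)
                   for m, mu, var in stats)
        loss.backward()
        opt.step()

    for h in handles:
        h.remove()
    return x.detach()
```

    The distilled batch returned here can then stand in for real calibration data when choosing quantization ranges.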

    Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

    Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, key application characteristics such as frames per second (FPS) are ignored in favor of simply counting GOPs, and results on accuracy, which is critical to application success, are often not even reported. In this work, we adopt an algorithm-hardware co-design approach to develop a ConvNet accelerator called Synetgy and a novel ConvNet model called DiracDeltaNet. Both the accelerator and the ConvNet are tailored to FPGA requirements. DiracDeltaNet, as the name suggests, is a ConvNet with only 1×1 convolutions; spatial convolutions are replaced by more efficient shift operations. DiracDeltaNet achieves competitive accuracy on ImageNet (88.7% top-5), but with 42× fewer parameters and 48× fewer OPs than VGG16. We further quantize DiracDeltaNet's weights and activations to 4 bits, with less than 1% accuracy loss. This quantization exploits the nature of FPGA hardware well. In short, DiracDeltaNet's small model size, low computational OP count, low precision, and simplified operators allow us to co-design a highly customized computing unit for an FPGA. We implement the computing units for DiracDeltaNet on an Ultra96 SoC system through high-level synthesis. Our accelerator's final top-5 accuracy of 88.1% on ImageNet is higher than all previously reported embedded FPGA accelerators. In addition, the accelerator reaches an inference speed of 66.3 FPS on the ImageNet classification task, surpassing prior works with similar accuracy by at least 11.6×.
    Comment: Updated with the latest results
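
    The shift-plus-pointwise structure that replaces spatial convolutions can be illustrated with a small PyTorch sketch. This is an assumed, simplified rendering of the idea rather than the DiracDeltaNet code: channel groups are shifted one pixel in fixed directions, then a 1×1 convolution mixes them.

```python
import torch
import torch.nn as nn

class ShiftBlock(nn.Module):
    """Simplified sketch of a shift + 1x1-convolution block (assumed
    structure, not the DiracDeltaNet implementation)."""
    DIRS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0),
            (1, 1), (1, -1), (-1, 1), (-1, -1)]

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        groups = torch.chunk(x, len(self.DIRS), dim=1)
        # torch.roll wraps at the borders; a real shift layer would
        # zero-fill, but the data-movement pattern is the same.
        shifted = [torch.roll(g, shifts=d, dims=(2, 3))
                   for g, d in zip(groups, self.DIRS)]
        return self.pw(torch.cat(shifted, dim=1))
```

    For example, `ShiftBlock(32, 64)(torch.randn(1, 32, 56, 56))` yields a `(1, 64, 56, 56)` output. The appeal for FPGAs is that shifts are pure data movement, so all arithmetic is concentrated in the 1×1 convolutions.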

    Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

    DNN accelerators are often developed and evaluated in isolation, without considering the cross-stack, system-level effects of real-world environments. This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance and energy efficiency. To address this challenge, we present Gemmini, an open-source (https://github.com/ucb-bar/gemmini), full-stack DNN accelerator generator. Gemmini generates a wide design space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects. Gemmini-generated accelerators have also been fabricated, delivering up to three orders of magnitude speedups over high-performance CPUs on various DNN benchmarks.
    Comment: To appear at the 58th IEEE/ACM Design Automation Conference (DAC), December 2021, San Francisco, CA, US
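
    The architectural template at Gemmini's core is a systolic array. The toy sketch below (plain Python, purely illustrative, and not Gemmini's generator, which is written in Chisel) models the output-stationary dataflow such an array implements: the processing element at (i, j) holds C[i, j] and accumulates one product per cycle as operands stream through in anti-diagonal wavefronts.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy cycle-by-cycle model of an output-stationary systolic array
    (illustrative only): PE (i, j) accumulates C[i, j] = sum_s A[i, s] * B[s, j],
    receiving operand index s = t - i - j at global cycle t."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for t in range(n + m + k - 2):          # global cycle counter
        for i in range(n):
            for j in range(m):
                s = t - i - j               # operand reaching PE (i, j) now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

# Sanity check: the wavefront schedule reproduces an ordinary matmul.
A, B = np.random.rand(4, 3), np.random.rand(3, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```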