2 research outputs found

QONNX: Representing Arbitrary-Precision Quantized Neural Networks

Author: Blott Michaela
Borras Hendrik
Duarte Javier
Hauck Scott
Hawks Ben
Hsu Shih-Chieh
Loncar Vladimir
Mitrevski Jovan
Muhizi Jules
Pappalardo Alessandro
Summers Sioni
Trahms Matthew
Tran Nhan
Umuroglu Yaman
Publication date: 15/06/2022

We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.

CERN Document Server

Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

Author: Blott Michaela
Borras Hendrik
Di Guglielmo Giuseppe
Duarte Javier
Ghielmetti Nicolò
Hauck Scott
Hawks Ben
Hsu Shih-Chieh
Kastner Ryan
Liang Jason
Meza Andres
Muhizi Jules
Nguyen Tai
Roy Rushil
Tran Nhan
Umuroglu Yaman
Weng Olivia
Yokuda Aidan
Publication date: 23/06/2022

We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classification benchmark tasks. The resulting hardware implementations are quantized, configurable, spatial dataflow architectures tailored for speed and efficiency and introduce new generic optimizations and common workflows developed as a part of this work. The full workflow is presented from quantization-aware training to FPGA implementation. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms. The resulting submissions achieve latencies as low as 20 μs and energy consumption as low as 30 μJ per inference. We demonstrate how emerging ML benchmarks on heterogeneous hardware platforms can catalyze collaboration and the development of new techniques and more accessible tools.

CERN Document Server