6 research outputs found

FPGA-accelerated machine learning inference as a service for particle physics computing

Author: Duarte Javier
Harris Philip
Hauck Scott
Holzman Burt
Hsu Shih-Chieh
Jindariani Sergo
Khan Suffian
Kreis Benjamin
Lee Brian
Liu Mia
Lončar Vladimir
Ngadiuba Jennifer
Pedro Kevin
Perez Brandon
Pierini Maurizio
Rankin Dylan
Trahms Matthew
Tran Nhan
Tsaris Aristeidis
Versteeg Colin
Way Ted W.
Werran Dustin
Wu Zhenbin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/04/2019

New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600 to 700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
Comment: 16 pages, 14 figures, 2 tables
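As a quick consistency check on the latency figures quoted in this abstract, the short sketch below multiplies each service latency by its stated speedup to recover the CPU latency it implies. The dictionary keys and the function name are illustrative and do not come from the paper; the speedups are quoted as approximate, so the results are approximate as well.

```python
# Figures quoted in the abstract: (service latency in ms, approximate speedup over CPU).
quoted = {
    "cloud": (60, 30),
    "edge_or_on_premises": (10, 175),
}


def implied_cpu_latency_ms(service_latency_ms, speedup):
    """Return the CPU latency implied by a service latency and a speedup factor."""
    return service_latency_ms * speedup


# Both deployment modes should point at roughly the same CPU baseline.
implied = {name: implied_cpu_latency_ms(*pair) for name, pair in quoted.items()}

for name, cpu_ms in implied.items():
    print(f"{name}: implied CPU latency of about {cpu_ms / 1000:.2f} s")
```

Both cases imply a CPU inference time a little under two seconds, so the two quoted speedups are consistent with a single CPU baseline.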

arXiv.org e-Print Archive

CERN Document Server

Generalized Machine Learning Quantization Implementation for High Level Synthesis Targeting FPGAs

Author: Trahms Matthew Karl
Publication date: 01/01/2022

Thesis (Master's)--University of Washington, 2022. The Large Hadron Collider produces a large amount of data while operating, approximately one petabyte of data per second. The collider is currently undergoing an upgrade to collide more particles and produce even more data. In order to handle this large quantity of data, high-throughput, low-latency algorithms are required to filter interesting collision results out of the rest of the data collected by the sensors attached to the collider. Machine learning algorithms can be used for this filtering task with comparable accuracy to the traditional filtering algorithms and provide a wide range of accelerator options. FINN and hls4ml are frameworks to deploy machine learning models on Field Programmable Gate Arrays for high-throughput, low-latency acceleration options. FINN utilizes Brevitas, a quantization-aware training library. Using Brevitas, I trained a particle tracking network and demonstrated equivalent accuracy at lower bit precision than post-training quantization. As a cross-organizational project, the hls4ml and FINN teams collaborated to develop the QONNX standard for quantized machine learning model representation. In order to integrate QONNX into hls4ml, I implemented new transformations to support the unique structures of QONNX.

DSpace at The University of Washington

CERN Document Server

QONNX: Representing Arbitrary-Precision Quantized Neural Networks

Author: Blott Michaela
Borras Hendrik
Duarte Javier
Hauck Scott
Hawks Ben
Hsu Shih-Chieh
Loncar Vladimir
Mitrevski Jovan
Muhizi Jules
Pappalardo Alessandro
Summers Sioni
Trahms Matthew
Tran Nhan
Umuroglu Yaman
Publication date: 15/06/2022

We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.
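The quantize, clip, dequantize sequence named in this abstract can be illustrated with a minimal pure-Python sketch of uniform quantization. This is a generic textbook formulation, not the operator definition from the paper: the function name, the parameter names (`scale`, `zero_point`, `bit_width`, `signed`), and the rounding behaviour are assumptions made for illustration.

```python
def quantize_clip_dequantize(x, scale, zero_point, bit_width, signed=True):
    """Uniformly quantize x to bit_width bits, then map it back to a real value.

    Illustrative sketch only; parameter names are assumptions, not the QONNX API.
    """
    # Integer range representable with the requested bit width.
    if signed:
        q_min, q_max = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    else:
        q_min, q_max = 0, 2 ** bit_width - 1

    # Quantize: scale, shift, and round to an integer.
    # Python's round() rounds exact halves to the nearest even integer.
    q = round(x / scale + zero_point)

    # Clip: restrict the integer to the narrow range, which is what lets a
    # wider integer container stand in for a low-precision value.
    q = max(q_min, min(q_max, q))

    # Dequantize: map the clipped integer back onto the real axis.
    return (q - zero_point) * scale


# 4-bit signed quantization with a step of 0.25 covers [-2.0, 1.75].
print(quantize_clip_dequantize(0.3, 0.25, 0, 4))    # in range, snaps to the grid
print(quantize_clip_dequantize(10.0, 0.25, 0, 4))   # saturates at the upper bound
print(quantize_clip_dequantize(-10.0, 0.25, 0, 4))  # saturates at the lower bound
```

The clipping step is the part that carries the bit width: without it, the rounded integer could take any value its container allows.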

CERN Document Server

Graph Neural Networks for Charged Particle Tracking on FPGAs.

Author: Atkinson Markus
DeZoort Gage
Duarte Javier
Elabd Abdelrahman
Elmer Peter
Hauck Scott
Hsu Shih-Chieh
Hu Jin-Xuan
Huang Shi-Yu
Lai Bo-Cheng
Neubauer Mark
Ojalvo Isobel
Razavimaleki Vesal
Thais Savannah
Trahms Matthew
Publication venue: eScholarship, University of California
Publication date: 01/01/2022

The determination of charged particle trajectories in collisions at the CERN Large Hadron Collider (LHC) is an important but challenging problem, especially in the high interaction density conditions expected during the future high-luminosity phase of the LHC (HL-LHC). Graph neural networks (GNNs) are a type of geometric deep learning algorithm that has successfully been applied to this task by embedding tracker data as a graph (nodes represent hits, while edges represent possible track segments) and classifying the edges as true or fake track segments. However, their study in hardware- or software-based trigger applications has been limited due to their large computational cost. In this paper, we introduce an automated translation workflow, integrated into a broader tool called hls4ml, for converting GNNs into firmware for field-programmable gate arrays (FPGAs). We use this translation tool to implement GNNs for charged particle tracking, trained using the TrackML challenge dataset, on FPGAs with designs targeting different graph sizes, task complexities, and latency/throughput requirements. This work could enable the inclusion of charged particle tracking GNNs at the trigger level for HL-LHC experiments.
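The graph embedding described in this abstract, with hits as nodes and candidate track segments as edges, can be sketched on a toy event as follows. Everything here is hypothetical: the `Hit` fields, the rule that connects every pair of hits on adjacent layers, and the five-hit event are simplifications chosen for illustration and are not taken from the paper or from the TrackML dataset format.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hit:
    hit_id: int
    layer: int        # detector layer index (hypothetical field)
    particle_id: int  # truth label, available only in simulation


def build_edges(hits):
    """Connect every pair of hits on adjacent layers as a candidate segment."""
    edges = []
    for a, b in product(hits, repeat=2):
        if b.layer == a.layer + 1:
            edges.append((a.hit_id, b.hit_id))
    return edges


def label_edges(hits, edges):
    """Label an edge 1 (true segment) if both hits share a particle, else 0 (fake)."""
    by_id = {h.hit_id: h for h in hits}
    return [int(by_id[s].particle_id == by_id[t].particle_id) for s, t in edges]


# Toy event: particle 7 crosses layers 0, 1, 2; particle 9 crosses layers 0, 1.
hits = [Hit(0, 0, 7), Hit(1, 0, 9), Hit(2, 1, 7), Hit(3, 1, 9), Hit(4, 2, 7)]
edges = build_edges(hits)
labels = label_edges(hits, edges)

for edge, label in zip(edges, labels):
    print(edge, "true" if label else "fake")
```

An edge classifier is trained to reproduce these truth labels from hit features alone; the number of candidate edges grows quickly with hit density, which is one reason graph size matters for a hardware implementation.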

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California

FPGAs-as-a-Service Toolkit (FaaST)

Author: Duarte Javier
Flechas Maria Acosta
Harris Philip
Hauck Scott
Ho Ta-Wei
Holzman Burt
Hsu Shih-Chieh
Klijnsma Thomas
Krupa Jeffrey
Lin Kelvin
Liu Mia
Lou Yu
Pedro Kevin
Rankin Dylan
Trahms Matthew
Tran Nhan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/04/2022

Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows is developed to establish the performance capabilities of FPGAs as a service. Multiple devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit.

DSpace@MIT