6 research outputs found

    Fast convolutional neural networks on FPGAs with hls4ml

    Get PDF
    We introduce an automated tool for deploying ultra-low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of 5 μs using convolutional architectures, targeting microsecond-latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation. Comment: 18 pages, 18 figures, 4 tables
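    The two compression methods discussed above can be sketched in plain Python. This is a minimal illustration, not the hls4ml API: the function names, the 40% sparsity target, and the ap_fixed<8,3>-style format are assumptions chosen for the example.

```python
# Sketch of magnitude-based pruning and fixed-point quantization,
# the two compression methods named in the abstract.
# Helper names and the 8-bit fixed-point format are illustrative assumptions.

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * sparsity)
    threshold = flat[k - 1] if k > 0 else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_fixed(w, total_bits=8, int_bits=3):
    """Round to a signed fixed-point grid, e.g. an ap_fixed<8,3>-like type on an FPGA."""
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = max(lo, min(hi, round(w * scale)))
    return q / scale

weights = [0.81, -0.02, 0.30, 0.005, -0.55]
pruned = prune_by_magnitude(weights, sparsity=0.4)
quantized = [quantize_fixed(w) for w in pruned]
print(pruned)      # smallest 40% of weights zeroed
print(quantized)   # surviving weights snapped to a 1/32 grid
```

    Pruned weights cost no multipliers on the FPGA, and narrower fixed-point types shrink each remaining multiplier, which is why the two techniques compound in the reported 97-99% resource reductions.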

    Tiny Neural Networks for Environmental Predictions: an integrated approach with Miosix

    No full text
    Collecting vast amounts of data and performing complex calculations to feed modern Numerical Weather Prediction (NWP) algorithms requires centralizing intelligence in some of the most powerful, energy- and resource-hungry supercomputers in the world. This is due to the chaotic, complex nature of the atmosphere, whose interpretation requires virtually unlimited computing and storage resources. With Machine Learning (ML) techniques, a statistical approach can be designed to perform weather forecasting. Moreover, the recently growing interest in Edge Computing and Tiny Intelligent architectures is prompting a shift towards the deployment of ML algorithms on Tiny Embedded Systems (ES). This paper describes how Deep but Tiny Neural Networks (DTNN) can be designed to be parsimonious and automatically converted into an STM32 microcontroller-optimized C library through the X-CUBE-AI toolchain; we propose integrating the obtained library with Miosix, a Real-Time Operating System (RTOS) tailored for resource-constrained and tiny processors, which is an enabling factor for system scalability and multitasking. With our experiments we demonstrate that it is possible to deploy a DTNN, with a FLASH and RAM occupation of 45.5 KByte and 480 Byte respectively, for atmospheric pressure forecasting in an affordable, cost-effective system. We deployed the system in a real context, obtaining the same prediction quality as the same DNN model deployed on the cloud, but with the advantage of processing all the data needed for the prediction close to the environmental sensors, avoiding raw-data traffic to the cloud.
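    A quick way to judge whether a tiny network fits a FLASH/RAM budget like the 45.5 KByte / 480 Byte figures above is to count parameters and activation buffers. The layer widths and the accounting below are hypothetical assumptions for illustration, not the paper's model or X-CUBE-AI's actual report.

```python
# Back-of-the-envelope footprint check for a stack of dense layers,
# assuming float32 weights stored in FLASH and that RAM must hold the
# input and output activation buffers of one layer at a time.
# The layer widths are hypothetical, not the model from the paper.

def footprint(layer_widths, bytes_per_value=4):
    """Return (flash_bytes, ram_bytes) for a stack of dense layers."""
    flash = 0
    for n_in, n_out in zip(layer_widths, layer_widths[1:]):
        flash += (n_in * n_out + n_out) * bytes_per_value  # weights + biases
    # RAM peak: the largest pair of adjacent activation buffers coexisting
    ram = max(a + b for a, b in zip(layer_widths, layer_widths[1:])) * bytes_per_value
    return flash, ram

flash, ram = footprint([24, 16, 8, 1])
print(f"FLASH: {flash} B, RAM: {ram} B")
```

    This kind of estimate makes clear why the paper's model fits comfortably on an STM32-class microcontroller: a few small dense layers cost kilobytes of FLASH and only hundreds of bytes of activation RAM.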

    Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

    Get PDF
    In this paper, we investigate how field-programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant to autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to ten, corresponding to the use case where the autonomous vehicle receives inputs from multiple cameras simultaneously. We show, through aggressive filter reduction, heterogeneous quantization-aware training, and an optimized implementation of convolutional layers, that the power consumption and resource utilization can be significantly reduced while maintaining accuracy on the Cityscapes dataset. ISSN: 2632-215
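    The "aggressive filter reduction" mentioned above is commonly done by ranking convolutional filters by magnitude and dropping the weakest. The sketch below is an illustrative assumption of that idea using an L1-norm criterion on toy filter values, not the paper's actual procedure.

```python
# Sketch of filter (channel) reduction: rank convolutional filters by
# L1 norm and keep only the strongest fraction. Filter values are toy data,
# and the L1 criterion is an assumed ranking, not the paper's exact method.

def reduce_filters(filters, keep_fraction):
    """Keep the `keep_fraction` of filters with the largest L1 norm."""
    ranked = sorted(filters, key=lambda f: sum(abs(w) for w in f), reverse=True)
    n_keep = max(1, round(len(filters) * keep_fraction))
    kept = ranked[:n_keep]
    # preserve the original ordering of the surviving filters
    return [f for f in filters if f in kept]

filters = [
    [0.9, -0.8, 0.7],   # L1 = 2.4  -> kept
    [0.1, 0.0, -0.1],   # L1 = 0.2  -> dropped
    [-0.5, 0.4, 0.6],   # L1 = 1.5  -> kept
    [0.05, 0.02, 0.0],  # L1 = 0.07 -> dropped
]
print(reduce_filters(filters, keep_fraction=0.5))
```

    Unlike weight-level pruning, removing whole filters shrinks every downstream layer's input as well, which is why it is effective for cutting both latency and on-chip resource use.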

    Autoencoders on field-programmable gate arrays for real-time, unsupervised new physics detection at 40 MHz at the Large Hadron Collider

    No full text
    In this paper, we show how to adapt and deploy anomaly-detection algorithms based on deep autoencoders for the unsupervised detection of new-physics signatures in the extremely challenging environment of a real-time event selection system at the Large Hadron Collider (LHC). We demonstrate that new-physics signatures can be enhanced by three orders of magnitude while staying within the strict latency and resource constraints of a typical LHC event filtering system. This would allow for collecting datasets potentially enriched with high-purity contributions from new-physics processes. Through per-layer, highly parallel implementations of network layers, support for autoencoder-specific losses on FPGAs, and latent-space-based inference, we demonstrate that anomaly detection can be performed in as little as 80 ns using less than 3% of the logic resources in the Xilinx Virtex VU9P FPGA, opening the way to real-life applications of this idea during the next data-taking campaign of the LHC.
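    The core selection criterion behind autoencoder-based anomaly detection is simple: events the model reconstructs poorly are flagged. The sketch below illustrates that criterion with a toy stand-in for the trained model; the saturating "reconstruction" and the threshold value are assumptions for demonstration only.

```python
# Sketch of the anomaly-detection criterion used with autoencoders:
# events whose reconstruction error exceeds a threshold are flagged.
# `toy_reconstruct` is a hypothetical stand-in for a trained autoencoder.

def mse(x, x_hat):
    """Mean squared reconstruction error between an event and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def flag_anomalies(events, reconstruct, threshold):
    """Return indices of events with reconstruction MSE above `threshold`."""
    return [i for i, e in enumerate(events) if mse(e, reconstruct(e)) > threshold]

def toy_reconstruct(event):
    # A model trained only on "standard" events reproduces them well
    # but fails on out-of-distribution inputs; here it saturates at 1.0.
    return [min(v, 1.0) for v in event]

events = [
    [0.2, 0.5, 0.9],   # typical event, reconstructed exactly
    [0.1, 3.0, 0.4],   # anomalous spike, poorly reconstructed
]
print(flag_anomalies(events, toy_reconstruct, threshold=0.1))  # → [1]
```

    The abstract's latent-space-based inference is a refinement of this idea: scoring events from the encoder's compressed representation avoids computing the full decoder on the FPGA, which helps reach the reported 80 ns.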

    Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

    No full text
    We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classification benchmark tasks. The resulting hardware implementations are quantized, configurable, spatial dataflow architectures tailored for speed and efficiency, and introduce new generic optimizations and common workflows developed as part of this work. The full workflow is presented, from quantization-aware training to FPGA implementation. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms. The resulting submissions achieve latencies as low as 20 μs and energy consumption as low as 30 μJ per inference. We demonstrate how emerging ML benchmarks on heterogeneous hardware platforms can catalyze collaboration and the development of new techniques and more accessible tools.
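    The two headline figures above jointly imply an average power draw during inference, which is a useful sanity check when comparing such submissions. The calculation below uses only the numbers quoted in the abstract.

```python
# Sanity check relating the reported latency and energy figures:
# average power during inference = energy per inference / latency.

def avg_power_watts(energy_joules, latency_seconds):
    return energy_joules / latency_seconds

# Using the abstract's figures: 30 μJ per inference at 20 μs latency.
power = avg_power_watts(30e-6, 20e-6)
print(f"{power:.1f} W")  # 1.5 W average during inference
```

    Note that the 20 μs and 30 μJ figures are each quoted as best-case minima, possibly from different submissions, so 1.5 W is an order-of-magnitude estimate rather than a measured operating point.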