Search CORE

78 research outputs found

LF-checker: Machine Learning Acceleration of Bounded Model Checking for Concurrency Verification (Competition Contribution)

Author: Aljaafari Fatimah
Cordeiro Lucas C.
Manino Edoardo
Petoumenos Pavlos
Wu Tong
Publication venue
Publication date: 22/01/2023
Field of study

We describe and evaluate LF-checker, a metaverifier tool based on machine learning. It extracts multiple features of the program under test and predicts the optimal configuration (flags) of a bounded model checker with a decision tree. Our current work is specialised in concurrency verification and employs ESBMC as a back-end verification engine. In the paper, we demonstrate that LF-checker achieves better results than the default configuration of the underlying verification engine

arXiv.org e-Print Archive

CGPA: Coarse-Grained Pruning of Activations for Energy-Efficient RNN Inference

Author: Arnau Montañés José María
González Colás Antonio María
Riera Villanueva Marc
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Recurrent neural networks (RNNs) perform element-wise multiplications across the activations of gates. We show that a significant percentage of activations are saturated and propose coarse-grained pruning of activations (CGPA) to avoid the computation of entire neurons, based on the activation values of the gates. We show that CGPA can be easily implemented on top of a TPU-like architecture with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

FPGA-accelerated machine learning inference as a service for particle physics computing

Author: Duarte Javier
Harris Philip
Hauck Scott
Holzman Burt
Hsu Shih-Chieh
Jindariani Sergo
Khan Suffian
Kreis Benjamin
Lee Brian
Liu Mia
Lončar Vladimir
Ngadiuba Jennifer
Pedro Kevin
Perez Brandon
Pierini Maurizio
Rankin Dylan
Trahms Matthew
Tran Nhan
Tsaris Aristeidis
Versteeg Colin
Way Ted W.
Werran Dustin
Wu Zhenbin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/04/2019
Field of study

New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600--700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.Comment: 16 pages, 14 figures, 2 table

arXiv.org e-Print Archive

CERN Document Server

Recommended from our members

Implementation of the OPU Instruction Set Architecture on the Microsemi Polarfire 300 Field-Programmable Gate Array

Author: Delhez Louis Jean Eric
Publication venue: eScholarship, University of California
Publication date: 01/01/2020
Field of study

Deep learning is a fast-growing field with numerous promising applications that, unfortunately, demands large computing power for both training and inference tasks. To meet this demand, numerous hardware accelerators have thus been designed. Currently, however, these platforms are being developed independently from each other, and, as a result, there is a lack of compatibility between them. Notably, there is a need for standardization of the interface between hardware accelerators and software. UCLA's OPU is an ISA that aims at solving this issue. Contrary to general-purpose ISAs, OPU is designed to adequately express the computations involved in deep learning models, which allows for simple compilation and efficient cores. Prior to this work, only two fully-featured cores implementing the OPU ISA had been designed, both targeted at Xilinx SRAM-based FPGAs. However, flash-based FPGAs can offer several advantages thanks to their different technology. They are more secure, more reliable, and can yield a lower power consumption. All three of these characteristics being potentially highly valuable for deep learning accelerators, especially those embedded in edge devices, a new OPU core is here developed and mapped to a flash-based FPGA. More specifically, the potential of the MPF300 FPGA as a platform for the OPU ISA is evaluated. This represents the first OPU core implemented on an FPGA that is not manufactured by Xilinx. In addition, this design is also the first OPU core capable of operating on floating-point numbers, which simplifies the compilation of models. As such, this work contributes to the diversification of the catalog of available OPU cores, which increases the relevance of this ISA.While prior work affirms that, on Xilinx FPGAs, 8-bit floating-point arithmetic is more area-efficient than 8-bit integer arithmetic, the opposite is found in this work for Microsemi FPGAs. As a consequence, it is established that the optimum manner to perform large floating-point dot products on the MPF300 is to convert the operands to wider integers, on the device, then complete the computations using integer arithmetic. In contrast to Xilinx FPGAs, 5-bit mantissas are here preferred over 4-bit mantissas. Additionally, due to the lower ratio of the number of LUTs to DSPs of the MPF300, the relative resource utilization is found to be significantly higher here compared to the existing implementations. This new OPU core is found to be in average 1.7 times more energy-efficient than the existing similarly-sized implementation of the OPU ISA. Furthermore, the new core is in average 2 times faster than the Nvidia Jetson Nano platform, while consuming the same amount of power. These results further prove the relevance of the OPU ISA. In addition, this demonstrates that flash-based FPGAs, too, are a viable option for deep learning acceleration. The scarcity of these FPGAs in the relevant literature is thus not justified. Nevertheless, analysis of the core shows that the layout of modern FPGAs is in general suboptimal for the task of machine learning acceleration. In particular, the placement of the hard resources of the device tends to cause congestion on the device that reduces performance. This suggests the need for the development of specialized FPGAs for this task

eScholarship - University of California

A novel edge computing approach to astronomical image data processing based on sCMOS camera using SoC.

Author: Brona Grzegorz
Czortek Natalia
Jamroży Mikołaj
Juszczyk Bartłomiej
Karpińska Katarzyna
Pochapskyi Dmytro
Zienkiewicz Paweł
Łukasiewicz Jerzy
Publication venue: Electronics and Telecommunications Committee
Publication date: 20/06/2024
Field of study

The ever-growing deluge of astronomical data challenges traditional server-based processing, hindering real-time analysis and scientific discovery. This paper proposes a novel approach: edge computing directly on an sCMOS camera using a System-on-Chip (SoC) architecture currently developed at Creotech Instruments. We present a custom-designed camera equipped with an FPGA-based SoC, enabling on-board pre-processing and feature extraction of astronomical images. This significantly reduces data transmission, minimizes latency, and empowers real-time decision-making for critical observations. We showcase the camera\u27s capabilities through real-world scenarios, demonstrating its usability in astronomy.

International Journal of Electronics and Telecommunications (Warsaw University of Technology)