21 research outputs found
ML-based Real-Time Control at the Edge: An Approach Using hls4ml
This study focuses on implementing a real-time control system for a particle
accelerator facility that performs high energy physics experiments. A critical
operating parameter in this facility is beam loss, which is the fraction of
particles deviating from the accelerated proton beam into a cascade of
secondary particles. Accelerators employ a large number of sensors to monitor
beam loss. The data from these sensors is monitored by human operators who
predict the relative contribution of different sub-systems to the beam loss.
Using this information, they engage control interventions. In this paper, we
present a controller to track this phenomenon in real-time using edge-Machine
Learning (ML) and support control with low latency and high accuracy. We
implemented this system on an Intel Arria 10 SoC. Optimizations at the
algorithm, high-level synthesis, and interface levels to improve latency and
resource usage are presented. Our design implements a neural network, which can
predict the main source of beam loss (between two possible causes) at speeds up
to 575 frames per second (fps) (average latency of 1.74 ms). The practical
deployed system is required to operate at 320 fps, with a 3 ms latency
requirement, which our design meets.
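
The throughput and latency figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that frames are processed one at a time, so that average latency is the reciprocal of the frame rate; that assumption is ours, although it agrees with the 575 fps and 1.74 ms pair in the abstract. The variable names are illustrative only.

```python
# Figures taken from the abstract.
achieved_fps = 575          # peak rate reported for the design
required_fps = 320          # rate the deployed system must sustain
required_latency_ms = 3.0   # latency budget of the deployed system

# Assumption: one frame in flight at a time, so latency = 1 / throughput.
achieved_latency_ms = 1000.0 / achieved_fps
required_period_ms = 1000.0 / required_fps

meets_throughput = achieved_fps >= required_fps
meets_latency = achieved_latency_ms <= required_latency_ms

print(f"latency per frame: {achieved_latency_ms:.2f} ms")
print(f"frame period at the required rate: {required_period_ms:.3f} ms")
print(f"meets throughput: {meets_throughput}, meets latency: {meets_latency}")
```

Under this assumption the design leaves roughly 1.26 ms of headroom against the 3 ms budget.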
Applications and Techniques for Fast Machine Learning in Science
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science - the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.
RPack: Routability-Driven packing for cluster-based FPGAs
Routing tools consume a significant portion of the total design time. Considering routability at earlier steps of the CAD flow would yield both better quality and a faster design process. In this paper we present a routability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating routability metrics into a cost function. The objective is to minimize this routability cost function. Our cost function is consistently able to indicate improved routability. Our method yields up to 50% improvement over existing clustering methods in terms of the number of routing tracks required. The average improvement obtained is 16.5%. A reduction in the number of tracks yields reduced routing area.
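
The idea of packing LUTs under a routability cost can be illustrated with a small greedy sketch. This is not the RPack algorithm or its cost function, neither of which is given in the abstract. As a stand-in, the cost used here is the number of nets that leave a cluster, and the names `external_nets` and `pack` are hypothetical.

```python
from typing import Dict, List, Set


def external_nets(cluster: Set[str], nets: Dict[str, Set[str]]) -> int:
    """Stand-in routability cost: nets that touch the cluster and also leave it."""
    return sum(1 for pins in nets.values()
               if pins & cluster and not pins <= cluster)


def pack(luts: List[str], nets: Dict[str, Set[str]], capacity: int) -> List[Set[str]]:
    """Greedily fill clusters, always adding the LUT that keeps the cost lowest."""
    unpacked = list(luts)
    clusters: List[Set[str]] = []
    while unpacked:
        cluster = {unpacked.pop(0)}  # seed with the next unpacked LUT
        while unpacked and len(cluster) < capacity:
            best = min(unpacked,
                       key=lambda lut: external_nets(cluster | {lut}, nets))
            cluster.add(best)
            unpacked.remove(best)
        clusters.append(cluster)
    return clusters


# Toy netlist (assumed for illustration): net n1 joins a and c, n2 joins b and d.
nets = {"n1": {"a", "c"}, "n2": {"b", "d"}}
clusters = pack(["a", "b", "c", "d"], nets, capacity=2)
print(clusters)
```

On this toy netlist the greedy choice groups the LUTs that share a net, so no net has to cross a cluster boundary.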
Managing Reconfigurable Resources in Heterogeneous Cores Using Portable Pre-Synthesized Templates
An ILP Formulation for the Task Graph Scheduling Problem Tailored to Bi-Dimensional Reconfigurable Architectures
This work proposes an exact ILP formulation for the task scheduling problem on a 2D dynamically and partially reconfigurable architecture. Our approach takes into account the physical constraints of the target device that are relevant for reconfiguration. Specifically, we consider the limited number of reconfigurators, which are used to reconfigure the device. This work also proposes a reconfiguration-aware heuristic scheduler, which exploits configuration prefetching, module reuse, and antifragmentation techniques. We experimented with a system employing two reconfigurators. This work also extends the ILP formulation for a HW/SW codesign scenario. A heuristic scheduler for this extension has been developed too. These systems can be easily implemented using standard FPGAs. Our approach is able to improve the schedule quality by 8.76% on average (22.22% in the best case). Furthermore, our heuristic scheduler obtains the optimal schedule length in 60% of the considered cases. Our extended analysis demonstrated that HW/SW codesign can indeed lead to significantly better results. Our experiments show that by using our proposed HW/SW codesign method, the schedule length of applications can be reduced by a factor of 2 in the best case.
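
Two of the techniques named in this abstract, configuration prefetching and module reuse, can be illustrated with a simplified list scheduler. This is a sketch under strong assumptions and not the authors' heuristic: regions are treated as interchangeable slots (no 2D placement or antifragmentation), reconfiguration time is a single constant, and tasks arrive in topological order. The `Task` class and `schedule` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    name: str
    module: str                 # hardware module the task needs loaded
    duration: int
    deps: Tuple[str, ...] = ()


def schedule(tasks: List[Task], n_regions: int, n_reconfigurators: int,
             reconf_time: int) -> Dict[str, int]:
    """Return the finish time of each task (tasks given in topological order)."""
    region_free = [0] * n_regions
    region_module: List[Optional[str]] = [None] * n_regions
    reconf_free = [0] * n_reconfigurators
    finish: Dict[str, int] = {}
    for t in tasks:
        ready = max((finish[d] for d in t.deps), default=0)
        best = None  # (end, start, region, reconfigurator or None, reconf_end)
        for r in range(n_regions):
            if region_module[r] == t.module:
                # Module reuse: the region already holds the module.
                start = max(ready, region_free[r])
                cand = (start + t.duration, start, r, None, 0)
            else:
                # Prefetching: reconfigure as soon as the region and a
                # reconfigurator are free, even before dependencies finish.
                k = min(range(n_reconfigurators), key=reconf_free.__getitem__)
                rc_end = max(region_free[r], reconf_free[k]) + reconf_time
                start = max(ready, rc_end)
                cand = (start + t.duration, start, r, k, rc_end)
            if best is None or cand[:2] < best[:2]:
                best = cand
        end, start, r, k, rc_end = best
        if k is not None:
            reconf_free[k] = rc_end
        region_module[r] = t.module
        region_free[r] = end
        finish[t.name] = end
    return finish


# Toy task chain (assumed for illustration): A -> B -> C, with A and C sharing a module.
tasks = [Task("A", "m1", 4), Task("B", "m2", 3, ("A",)), Task("C", "m1", 2, ("B",))]
finish = schedule(tasks, n_regions=2, n_reconfigurators=1, reconf_time=2)
print(finish)
```

In this toy run the reconfiguration for B is hidden behind the execution of A, and C reuses the region that still holds module m1, so only the first reconfiguration contributes to the schedule length.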