Search CORE

22 research outputs found

Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

Author: Cosco Francesco
Dendaluce Jahnke Martin
Gomez-Garay Vicente
Novickis Rihards
Pérez Rastelli Joshué
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

Directory of Open Access Journals

On Entropy and Bit Patterns of Ring Oscillator Jitter

Author: Saarinen Markku-Juhani O.
Publication venue
Publication date: 12/10/2021
Field of study

Thermal jitter (phase noise) from a free-running ring oscillator is a common, easily implementable physical randomness source in True Random Number Generators (TRNGs). We show how to evaluate entropy, autocorrelation, and bit pattern distributions of ring oscillator noise sources, even with low jitter levels or some bias. Entropy justification is required in NIST 800-90B and AIS-31 testing and for applications such as the RISC-V entropy source extension. Our numerical evaluation algorithms outperform Monte Carlo simulations in speed and accuracy. We also propose a new lower bound estimation formula for the entropy of ring oscillator sources which applies more generally than previous ones.Comment: 6 page

arXiv.org e-Print Archive

Towards Machine Learning-Based FPGA Backend Flow: Challenges and Opportunities

Author: Farooq Umer
Taj Imran
Publication venue: MDPI
Publication date: 01/02/2023
Field of study

Field-Programmable Gate Array (FPGA) is at the core of System on Chip (SoC) design across various Industry 5.0 digital systems—healthcare devices, farming equipment, autonomous vehicles and aerospace gear to name a few. Given that pre-silicon verification using Computer Aided Design (CAD) accounts for about 70% of the time and money spent on the design of modern digital systems, this paper summarizes the machine learning (ML)-oriented efforts in different FPGA CAD design steps. With the recent breakthrough of machine learning, FPGA CAD tasks—high-level synthesis (HLS), logic synthesis, placement and routing—are seeing a renewed interest in their respective decision-making steps. We focus on machine learning-based CAD tasks to suggest some pertinent research areas requiring more focus in CAD design. The development of open-source benchmarks optimized for an end-to-end machine learning experience, intra-FPGA optimization, domain-specific accelerators, lack of explainability and federated learning are the issues reviewed to identify important research spots requiring significant focus. The potential of the new cloud-based architectures to understand the application of the right ML algorithms in FPGA CAD decision-making steps is discussed, together with visualizing the scenario of incorporating more intelligence in the cloud platform, with the help of relatively newer technologies such as CAD as Adaptive OpenPlatform Service (CAOS). Altogether, this research explores several research opportunities linked with modern FPGA CAD flow design, which will serve as a single point of reference for modern FPGA CAD flow design

Sunderland University Institutional Repository

FPGA-based architectures for acoustic beamforming with microphone arrays : trends, challenges and research opportunities

Author: Braeken An
da Silva Gomes Bruno
Touhafi Abdellah
Publication venue: 'MDPI AG'
Publication date: 01/01/2018
Field of study

Over the past decades, many systems composed of arrays of microphones have been developed to satisfy the quality demanded by acoustic applications. Such microphone arrays are sound acquisition systems composed of multiple microphones used to sample the sound field with spatial diversity. The relatively recent adoption of Field-Programmable Gate Arrays (FPGAs) to manage the audio data samples and to perform the signal processing operations such as filtering or beamforming has lead to customizable architectures able to satisfy the most demanding computational, power or performance acoustic applications. The presented work provides an overview of the current FPGA-based architectures and how FPGAs are exploited for different acoustic applications. Current trends on the use of this technology, pending challenges and open research opportunities on the use of FPGAs for acoustic applications using microphone arrays are presented and discussed

Directory of Open Access Journals

Rethinking FPGA Architectures for Deep Neural Network applications

Author: Rasoulinezhad Seyedramin
Publication venue: Faculty of Engineering, School of Electrical and Information Engineering
Publication date: 01/01/2023
Field of study

The prominence of machine learning-powered solutions instituted an unprecedented trend of integration into virtually all applications with a broad range of deployment constraints from tiny embedded systems to large-scale warehouse computing machines. While recent research confirms the edges of using contemporary FPGAs to deploy or accelerate machine learning applications, especially where the latency and energy consumption are strictly limited, their pre-machine learning optimised architectures remain a barrier to the overall efficiency and performance. Realizing this shortcoming, this thesis demonstrates an architectural study aiming at solutions that enable hidden potentials in the FPGA technology, primarily for machine learning algorithms. Particularly, it shows how slight alterations to the state-of-the-art architectures could significantly enhance the FPGAs toward becoming more machine learning-friendly while maintaining the near-promised performance for the rest of the applications. Eventually, it presents a novel systematic approach to deriving new block architectures guided by designing limitations and machine learning algorithm characteristics through benchmarking. First, through three modifications to Xilinx DSP48E2 blocks, an enhanced digital signal processing (DSP) block for important computations in embedded deep neural network (DNN) accelerators is described. Then, two tiers of modifications to FPGA logic cell architecture are explained that deliver a variety of performance and utilisation benefits with only minor area overheads. Eventually, with the goal of exploring this new design space in a methodical manner, a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations is first proposed. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then suggested together with a family of new embedded blocks, called MLBlocks

FPT: a Fixed-Point Accelerator for Torus Fully Homomorphic Encryption

Author: D'Anvers Jan-Pieter
Turan Furkan
Van Beirendonck Michiel
Verbauwhede Ingrid
Publication venue
Publication date: 18/10/2023
Field of study

Fully Homomorphic Encryption is a technique that allows computation on encrypted data. It has the potential to change privacy considerations in the cloud, but computational and memory overheads are preventing its adoption. TFHE is a promising Torus-based FHE scheme that relies on bootstrapping, the noise-removal tool invoked after each encrypted logical/arithmetical operation. We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT is the first hardware accelerator to exploit the inherent noise present in FHE calculations. Instead of double or single-precision floating-point arithmetic, it implements TFHE bootstrapping entirely with approximate fixed-point arithmetic. Using an in-depth analysis of noise propagation in bootstrapping FFT computations, FPT is able to use noise-trimmed fixed-point representations that are up to 50% smaller than prior implementations. FPT is built as a streaming processor inspired by traditional streaming DSPs: it instantiates directly cascaded high-throughput computational stages, with minimal control logic and routing networks. We explore throughput-balanced compositions of streaming kernels with a user-configurable streaming width in order to construct a full bootstrapping pipeline. Our approach allows 100% utilization of arithmetic units and requires only a small bootstrapping key cache, enabling an entirely compute-bound bootstrapping throughput of 1 BS / 35us. This is in stark contrast to the classical CPU approach to FHE bootstrapping acceleration, which is typically constrained by memory and bandwidth. FPT is implemented and evaluated as a bootstrapping FPGA kernel for an Alveo U280 datacenter accelerator card. FPT achieves two to three orders of magnitude higher bootstrapping throughput than existing CPU-based implementations, and 2.5x higher throughput compared to recent ASIC emulation experiments.Comment: ACM CCS 202

arXiv.org e-Print Archive

VR-ZYCAP: A versatile resourse-level ICAP controller for ZYNQ SOC

Author: Ahmad Andwaleed
Malik Arsalan Ali
Muslim Fahad Bin
Reviriego Vasallo Pedro
Sultana Bushra
Ullah Anees
Ullah Nasim
Zahir Ali
Publication venue: 'MDPI AG'
Publication date: 02/04/2021
Field of study

This article belongs to the Special Issue Architecture and CAD for Field-Programmable Gate Arrays (FPGAs)Hybrid architectures integrating a processor with an SRAM-based FPGA fabric—for example, Xilinx ZynQ SoC—are increasingly being used as a single-chip solution in several market segments to replace multi-chip designs. These devices not only provide advantages in terms of logic density, cost and integration, but also provide run-time in-field reconfiguration capabilities. However, the current reconfiguration capabilities provided by vendor tools are limited to the module level. Therefore, incremental run-time configuration memory changes require a lengthy compilation time for off-line bitstream generation along with storage and reconfiguration time overheads with traditional vendor methodologies. In this paper, an internal configuration access port (ICAP) controller that provides a versatile fine-grain resource-level incremental reconfiguration of the programmable logic (PL) resources in ZynQ SoC is presented. The proposed controller implemented in PL, called VR-ZyCAP, can reconfigure look-up tables (LUTs) and Flip-Flops (FF). The run-time reconfiguration of FF is achieved through a reset after reconfiguration (RAR)-featured partial bitstream to avoid the unintended state corruption of other memory elements. Along with versatility, our proposed controller improves the reconfiguration time by 30 times for FFs compared to state-of-the-art works while achieving a nearly 400-fold increase in speed for LUTs when compared to vendor-supported software approaches. In addition, it achieves competitive resource utilization when compared to existing approaches.This research was funded by Spanish Ministry of Science and Innovation under the ACHILLES project, grant number PID2019-104207RB-I00 and by Taif University Researchers Supporting fund, grant number (TURSP-2020/144), Taif University, Taif, Saudi Arabia

Universidad Carlos III de Madrid e-Archivo