28 research outputs found

    Mutual Impact between Clock Gating and High Level Synthesis in Reconfigurable Hardware Accelerators

    Get PDF
    With the diffusion of cyber-physical systems and internet of things, adaptivity and low power consumption became of primary importance in digital systems design. Reconfigurable heterogeneous platforms seem to be one of the most suitable choices to cope with such challenging context. However, their development and power optimization are not trivial, especially considering hardware acceleration components. On the one hand high level synthesis could simplify the design of such kind of systems, but on the other hand it can limit the positive effects of the adopted power saving techniques. In this work, the mutual impact of different high level synthesis tools and the application of the well known clock gating strategy in the development of reconfigurable accelerators is studied. The aim is to optimize a clock gating application according to the chosen high level synthesis engine and target technology (Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA)). Different levels of application of clock gating are evaluated, including a novel multi level solution. Besides assessing the benefits and drawbacks of the clock gating application at different levels, hints for future design automation of low power reconfigurable accelerators through high level synthesis are also derived

    FPGA-Based PUF Designs: A Comprehensive Review and Comparative Analysis

    Get PDF
    Field-programmable gate arrays (FPGAs) have firmly established themselves as dynamic platforms for the implementation of physical unclonable functions (PUFs). Their intrinsic reconfigurability and profound implications for enhancing hardware security make them an invaluable asset in this realm. This groundbreaking study not only dives deep into the universe of FPGA-based PUF designs but also offers a comprehensive overview coupled with a discerning comparative analysis. PUFs are the bedrock of device authentication and key generation and the fortification of secure cryptographic protocols. Unleashing the potential of FPGA technology expands the horizons of PUF integration across diverse hardware systems. We set out to understand the fundamental ideas behind PUF and how crucially important it is to current security paradigms. Different FPGA-based PUF solutions, including static, dynamic, and hybrid systems, are closely examined. Each design paradigm is painstakingly examined to reveal its special qualities, functional nuances, and weaknesses. We closely assess a variety of performance metrics, including those related to distinctiveness, reliability, and resilience against hostile threats. We compare various FPGA-based PUF systems against one another to expose their unique advantages and disadvantages. This study provides system designers and security professionals with the crucial information they need to choose the best PUF design for their particular applications. Our paper provides a comprehensive view of the functionality, security capabilities, and prospective applications of FPGA-based PUF systems. The depth of knowledge gained from this research advances the field of hardware security, enabling security practitioners, researchers, and designers to make wise decisions when deciding on and implementing FPGA-based PUF solutions.publishedVersio

    A Comprehensive Framework for Fair and Efficient Benchmarking of Hardware Implementations of Lightweight Cryptography

    Get PDF
    In this paper, we propose a comprehensive framework for fair and efficient benchmarking of hardware implementations of lightweight cryptography (LWC). Our framework is centered around the hardware API (Application Programming Interface) for the implementations of lightweight authenticated ciphers, hash functions, and cores combining both functionalities. The major parts of our API include the minimum compliance criteria, interface, and communication protocol supported by the LWC core. The proposed API is intended to meet the requirements of all candidates submitted to the NIST Lightweight Cryptography standardization process, as well as all CAESAR candidates and current authenticated cipher and hash function standards. In order to speed-up the development of hardware implementations compliant with this API, we are making available the LWC Development Package and the corresponding Implementer’s Guide. Equipped with these resources, hardware designers can focus on implementing only a core functionality of a given algorithm. The development package facilitates the communication with external modules, full verification of the LWC core using simulation, and generation of optimized results. The proposed API for lightweight cryptography is a superset of the CAESAR Hardware API, endorsed by the organizers of the CAESAR competition, which was successfully used in the development of over 50 implementations of Round 2 and Round 3 CAESAR candidates. The primary extensions include support for optional hash functionality and the development of cores resistant against side-channel attacks. Similarly, the LWC Development Package is a superset of the part of the CAESAR Development Package responsible for support of Use Case 1 (lightweight) CAESAR candidates. The primary extensions include support for hash functionality, increasing the flexibility of the code shared among all candidates, as well as extended support for the detection of errors preventing the correct operation of cores during experimental testing. Overall, our framework supports (a) fair ranking of candidates in the NIST LWC standardization process from the point of view of their efficiency in hardware before and after the implementation of countermeasures against side-channel attacks, (b) ability to perform benchmarking within the limited time devoted to Round2 and any subsequent rounds of the NIST LWC standardization process, (c) compatibility among implementations of the same algorithm by different designers and (d) fast deployment of the best algorithms in real-life applications

    Comparative Study of Keccak SHA-3 Implementations

    Get PDF
    This paper conducts an extensive comparative study of state-of-the-art solutions for im- plementing the SHA-3 hash function. SHA-3, a pivotal component in modern cryptography, has spawned numerous implementations across diverse platforms and technologies. This research aims to provide valuable insights into selecting and optimizing Keccak SHA-3 implementations. Our study encompasses an in-depth analysis of hardware, software, and software–hardware (hybrid) solutions. We assess the strengths, weaknesses, and performance metrics of each approach. Critical factors, including computational efficiency, scalability, and flexibility, are evaluated across differ- ent use cases. We investigate how each implementation performs in terms of speed and resource utilization. This research aims to improve the knowledge of cryptographic systems, aiding in the informed design and deployment of efficient cryptographic solutions. By providing a comprehensive overview of SHA-3 implementations, this study offers a clear understanding of the available options and equips professionals and researchers with the necessary insights to make informed decisions in their cryptographic endeavors

    An Scalable matrix computing unit architecture for FPGA and SCUMO user design interface

    Get PDF
    High dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit designed on the basis of circulant matrices. It optimizes data flow for the computation of any sequence of matrix operations removing the need for data movement for intermediate results, together with the individual matrix operations' performance in direct or transposed form (the transpose matrix operation only requires a data addressing modification). The allowed matrix operations are: matrix-by-matrix addition, subtraction, dot product and multiplication, matrix-by-vector multiplication, and matrix by scalar multiplication. The proposed architecture is fully scalable with the maximum matrix dimension limited by the available resources. In addition, a design environment is also developed, permitting assistance, through a friendly interface, from the customization of the hardware computing unit to the generation of the final synthesizable IP core. For N x N matrices, the architecture requires N ALU-RAM blocks and performs O(N*N), requiring N*N +7 and N +7 clock cycles for matrix-matrix and matrix-vector operations, respectively. For the tested Virtex7 FPGA device, the computation for 500 x 500 matrices allows a maximum clock frequency of 346 MHz, achieving an overall performance of 173 GOPS. This architecture shows higher performance than other state-of-the-art matrix computing units

    A New Methodology to Manage FPGA Distributed Memory Content via Bitstream for Xilinx ZYNQ Devices

    Get PDF
    This paper proposes a methodology to access data and manage the content of distributed memories in FPGA designs through the configuration bitstream. Thanks to the methods proposed, it is possible to read and write the data content of registers without using the in/out ports of registers in a straightforward fashion. Hence, it offers the possibility of performing several operations, such as, to load, copy or compare the information stored in registers without the necessity of physical interconnections. This work includes two flows that simplify the designing process when using the proposed approach: while the first enables the protection or unprotection of writing on different partial regions through the bitstream, the second permits homogeneous instances of a design implemented in different reconfigurable regions to be obtained without losing efficiency. The approach is based and has been physically validated on the ZYNQ from Xilinx, and when using partially reconfigurable designs, it does not affect the hardware overhead nor the maximum operating frequency of the design.This work has been supported, within the fund for research groups of the Basque university system IT1440-22, by the Department of Education and, within PILAR ZE-2020/00022 and COMMUTE ZE-2021/00931 projects, by the Hazitek program, both of the Basque Government; the latter also by the Ministerio de Ciencia Innovación of Spain through the Centro para el Desarrollo Tecnológico Industrial (CDTI) within the projects IDI-20201264 and IDI-20220543, and through the Fondo Europeo de Desarrollo Regional 2014–2020 (FEDER funds)

    A dynamic reconfigurable architecture for hybrid spiking and convolutional FPGA-based neural network designs

    Get PDF
    This work presents a dynamically reconfigurable architecture for Neural Network (NN) accelerators implemented in Field-Programmable Gate Array (FPGA) that can be applied in a variety of application scenarios. Although the concept of Dynamic Partial Reconfiguration (DPR) is increasingly used in NN accelerators, the throughput is usually lower than pure static designs. This work presents a dynamically reconfigurable energy-efficient accelerator architecture that does not sacrifice throughput performance. The proposed accelerator comprises reconfigurable processing engines and dynamically utilizes the device resources according to model parameters. Using the proposed architecture with DPR, different NN types and architectures can be realized on the same FPGA. Moreover, the proposed architecture maximizes throughput performance with design optimizations while considering the available resources on the hardware platform. We evaluate our design with different NN architectures for two different tasks. The first task is the image classification of two distinct datasets, and this requires switching between Convolutional Neural Network (CNN) architectures having different layer structures. The second task requires switching between NN architectures, namely a CNN architecture with high accuracy and throughput and a hybrid architecture that combines convolutional layers and an optimized Spiking Neural Network (SNN) architecture. We demonstrate throughput results from quickly reprogramming only a tiny part of the FPGA hardware using DPR. Experimental results show that the implemented designs achieve a 7× faster frame rate than current FPGA accelerators while being extremely flexible and using comparable resources

    Revisiting the high-performance reconfigurable computing for future datacenters

    Get PDF
    Modern datacenters are reinforcing the computational power and energy efficiency by assimilating field programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requisite amplifies the importance of communication architecture and virtualization method with the required features in order to meet the high-end objective. Consequently, in the last decade, academia and industry proposed several virtualization techniques and hardware architectures for addressing resource management, scheduling, adoptability, segregation, scalability, performance-overhead, availability, programmability, time-to-market, security, and mainly, multitenancy. This paper provides an extensive survey covering three important aspects-discussion on non-standard terms used in existing literature, network-on-chip evaluation choices as a mean to explore the communication architecture, and virtualization methods under latest classification. The purpose is to emphasize the importance of choosing appropriate communication architecture, virtualization technique and standard language to evolve the multi-tenant FPGAs in datacenters. None of the previous surveys encapsulated these aspects in one writing. Open problems are indicated for scientific community as well

    Towards Machine Learning-Based FPGA Backend Flow: Challenges and Opportunities

    Get PDF
    Field-Programmable Gate Array (FPGA) is at the core of System on Chip (SoC) design across various Industry 5.0 digital systems—healthcare devices, farming equipment, autonomous vehicles and aerospace gear to name a few. Given that pre-silicon verification using Computer Aided Design (CAD) accounts for about 70% of the time and money spent on the design of modern digital systems, this paper summarizes the machine learning (ML)-oriented efforts in different FPGA CAD design steps. With the recent breakthrough of machine learning, FPGA CAD tasks—high-level synthesis (HLS), logic synthesis, placement and routing—are seeing a renewed interest in their respective decision-making steps. We focus on machine learning-based CAD tasks to suggest some pertinent research areas requiring more focus in CAD design. The development of open-source benchmarks optimized for an end-to-end machine learning experience, intra-FPGA optimization, domain-specific accelerators, lack of explainability and federated learning are the issues reviewed to identify important research spots requiring significant focus. The potential of the new cloud-based architectures to understand the application of the right ML algorithms in FPGA CAD decision-making steps is discussed, together with visualizing the scenario of incorporating more intelligence in the cloud platform, with the help of relatively newer technologies such as CAD as Adaptive OpenPlatform Service (CAOS). Altogether, this research explores several research opportunities linked with modern FPGA CAD flow design, which will serve as a single point of reference for modern FPGA CAD flow design

    Recent Advances in Embedded Computing, Intelligence and Applications

    Get PDF
    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems
    corecore