83 research outputs found

    Exploring the energy consumption of lightweight blockciphers in FPGA

    Get PDF

    Mutual Impact between Clock Gating and High Level Synthesis in Reconfigurable Hardware Accelerators

    Get PDF
    With the diffusion of cyber-physical systems and internet of things, adaptivity and low power consumption became of primary importance in digital systems design. Reconfigurable heterogeneous platforms seem to be one of the most suitable choices to cope with such challenging context. However, their development and power optimization are not trivial, especially considering hardware acceleration components. On the one hand high level synthesis could simplify the design of such kind of systems, but on the other hand it can limit the positive effects of the adopted power saving techniques. In this work, the mutual impact of different high level synthesis tools and the application of the well known clock gating strategy in the development of reconfigurable accelerators is studied. The aim is to optimize a clock gating application according to the chosen high level synthesis engine and target technology (Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA)). Different levels of application of clock gating are evaluated, including a novel multi level solution. Besides assessing the benefits and drawbacks of the clock gating application at different levels, hints for future design automation of low power reconfigurable accelerators through high level synthesis are also derived

    Efficient FPGA Cost-Performance Space Exploration Using Type-driven Program Transformations

    Get PDF
    Many numerical simulation applications from the scientific, financial and machine-learning domains require large amounts of compute capacity. They can often be implemented with a streaming data-flow architecture. Field Programmable Gate Arrays (FPGA) are particularly power-efficient hardware architectures suitable for streaming data-flow applications. Although numerous programming languages and frameworks target FPGAs, expert knowledge is still required to optimise the throughput of such applications for each target FPGA device. The process of selecting which optimising transformations to apply, and where to apply them is dubbed Design Space Exploration (DSE). We contribute an elegant and efficient compiler based DSE strategy for FPGAs by merging information sourced from the compiled application's semantic structure, an accurate cost-performance model and a description of hardware resource limits for particular FPGAs. Our work leverages developments in functional programming and dependent type theory to bring performance portability to the realm of High-Level Synthesis (HLS) tools targeting FPGAs. We showcase our approach by presenting achievable speedups for three example applications. Results indicate considerable improvements in throughput of up to 58Ă— in one example. These results are obtained by traversing a minute fraction of the total Design Space

    A Comprehensive Framework for Fair and Efficient Benchmarking of Hardware Implementations of Lightweight Cryptography

    Get PDF
    In this paper, we propose a comprehensive framework for fair and efficient benchmarking of hardware implementations of lightweight cryptography (LWC). Our framework is centered around the hardware API (Application Programming Interface) for the implementations of lightweight authenticated ciphers, hash functions, and cores combining both functionalities. The major parts of our API include the minimum compliance criteria, interface, and communication protocol supported by the LWC core. The proposed API is intended to meet the requirements of all candidates submitted to the NIST Lightweight Cryptography standardization process, as well as all CAESAR candidates and current authenticated cipher and hash function standards. In order to speed-up the development of hardware implementations compliant with this API, we are making available the LWC Development Package and the corresponding Implementer’s Guide. Equipped with these resources, hardware designers can focus on implementing only a core functionality of a given algorithm. The development package facilitates the communication with external modules, full verification of the LWC core using simulation, and generation of optimized results. The proposed API for lightweight cryptography is a superset of the CAESAR Hardware API, endorsed by the organizers of the CAESAR competition, which was successfully used in the development of over 50 implementations of Round 2 and Round 3 CAESAR candidates. The primary extensions include support for optional hash functionality and the development of cores resistant against side-channel attacks. Similarly, the LWC Development Package is a superset of the part of the CAESAR Development Package responsible for support of Use Case 1 (lightweight) CAESAR candidates. The primary extensions include support for hash functionality, increasing the flexibility of the code shared among all candidates, as well as extended support for the detection of errors preventing the correct operation of cores during experimental testing. Overall, our framework supports (a) fair ranking of candidates in the NIST LWC standardization process from the point of view of their efficiency in hardware before and after the implementation of countermeasures against side-channel attacks, (b) ability to perform benchmarking within the limited time devoted to Round2 and any subsequent rounds of the NIST LWC standardization process, (c) compatibility among implementations of the same algorithm by different designers and (d) fast deployment of the best algorithms in real-life applications

    Review of Rounding Based Approximate Multiplier (ROBA) For Digital Signal Processing

    Get PDF
    The fundamental idea of adjusting put together estimated multiplier depends with respect to adjusting of numbers. This multiplier can be connected for both marked and unsigned numbers. In this paper contemplated an Rounding Based Approximate Multiplier that is fast yet vitality effective. The methodology is to round the operands to the closest example of two. Along these lines the computational concentrated piece of the augmentation is excluded improving rate and vitality utilization at the cost of a little mistake. This methodology is appropriate to both marked and unsigned augmentations. The productivity of the ROBA multiplier is assessed by contrasting its execution and those of some rough and precise multipliers utilizing distinctive plan parameters

    An Scalable matrix computing unit architecture for FPGA and SCUMO user design interface

    Get PDF
    High dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit designed on the basis of circulant matrices. It optimizes data flow for the computation of any sequence of matrix operations removing the need for data movement for intermediate results, together with the individual matrix operations' performance in direct or transposed form (the transpose matrix operation only requires a data addressing modification). The allowed matrix operations are: matrix-by-matrix addition, subtraction, dot product and multiplication, matrix-by-vector multiplication, and matrix by scalar multiplication. The proposed architecture is fully scalable with the maximum matrix dimension limited by the available resources. In addition, a design environment is also developed, permitting assistance, through a friendly interface, from the customization of the hardware computing unit to the generation of the final synthesizable IP core. For N x N matrices, the architecture requires N ALU-RAM blocks and performs O(N*N), requiring N*N +7 and N +7 clock cycles for matrix-matrix and matrix-vector operations, respectively. For the tested Virtex7 FPGA device, the computation for 500 x 500 matrices allows a maximum clock frequency of 346 MHz, achieving an overall performance of 173 GOPS. This architecture shows higher performance than other state-of-the-art matrix computing units

    FPGA-Based PUF Designs: A Comprehensive Review and Comparative Analysis

    Get PDF
    Field-programmable gate arrays (FPGAs) have firmly established themselves as dynamic platforms for the implementation of physical unclonable functions (PUFs). Their intrinsic reconfigurability and profound implications for enhancing hardware security make them an invaluable asset in this realm. This groundbreaking study not only dives deep into the universe of FPGA-based PUF designs but also offers a comprehensive overview coupled with a discerning comparative analysis. PUFs are the bedrock of device authentication and key generation and the fortification of secure cryptographic protocols. Unleashing the potential of FPGA technology expands the horizons of PUF integration across diverse hardware systems. We set out to understand the fundamental ideas behind PUF and how crucially important it is to current security paradigms. Different FPGA-based PUF solutions, including static, dynamic, and hybrid systems, are closely examined. Each design paradigm is painstakingly examined to reveal its special qualities, functional nuances, and weaknesses. We closely assess a variety of performance metrics, including those related to distinctiveness, reliability, and resilience against hostile threats. We compare various FPGA-based PUF systems against one another to expose their unique advantages and disadvantages. This study provides system designers and security professionals with the crucial information they need to choose the best PUF design for their particular applications. Our paper provides a comprehensive view of the functionality, security capabilities, and prospective applications of FPGA-based PUF systems. The depth of knowledge gained from this research advances the field of hardware security, enabling security practitioners, researchers, and designers to make wise decisions when deciding on and implementing FPGA-based PUF solutions.publishedVersio

    Annual Report, 2015-2016

    Get PDF

    On cost-effective reuse of components in the design of complex reconfigurable systems

    Get PDF
    Design strategies that benefit from the reuse of system components can reduce costs while maintaining or increasing dependability—we use the term dependability to tie together reliability and availability. D3H2 (aDaptive Dependable Design for systems with Homogeneous and Heterogeneous redundancies) is a methodology that supports the design of complex systems with a focus on reconfiguration and component reuse. D3H2 systematizes the identification of heterogeneous redundancies and optimizes the design of fault detection and reconfiguration mechanisms, by enabling the analysis of design alternatives with respect to dependability and cost. In this paper, we extend D3H2 for application to repairable systems. The method is extended with analysis capabilities allowing dependability assessment of complex reconfigurable systems. Analysed scenarios include time-dependencies between failure events and the corresponding reconfiguration actions. We demonstrate how D3H2 can support decisions about fault detection and reconfiguration that seek to improve dependability while reducing costs via application to a realistic railway case study

    Transition Probability Test for an RO-Based Generator and the Relevance between the Randomness and the Number of ROs

    Get PDF
    A ring oscillator is a well-known circuit used for generating random numbers, and interested readers can find many research results concerning the evaluation of the randomness with a packaged test suit. However, the authors think there is room for evaluating the unpredictability of a sequence from another viewpoint. In this paper, the authors focus on Wold's RO-based generator and propose a statistical test to numerically evaluate the randomness of the RO-based generator. The test adopts the state transition probabilities in a Markov process and is designed to check the uniformity of the probabilities based on hypothesis testing. As a result, it is found that the RO-based generator yields a biased output from the viewpoint of the transition probability if the number of ROs is small. More precisely, the transitions 01 -> 01 and 11 -> 11 happen frequently when the number l of ROs is less than or equal to 10. In this sense, l > 10 is recommended for use in any application, though a packaged test suit is passed. Thus, the authors believe that the proposed test contributes to evaluating the unpredictability of a sequence when used together with available statistical test suits, such as NIST SP800-22
    • …
    corecore