256 research outputs found

    Evaluating Built-in ECC of FPGA on-chip Memories for the Mitigation of Undervolting Faults

    Get PDF
    Voltage underscaling below the nominal level is an effective solution for improving energy efficiency in digital circuits, e.g., Field Programmable Gate Arrays (FPGAs). However, further undervolting below a safe voltage level and without accompanying frequency scaling leads to timing related faults, potentially undermining the energy savings. Through experimental voltage underscaling studies on commercial FPGAs, we observed that the rate of these faults exponentially increases for on-chip memories, or Block RAMs (BRAMs). To mitigate these faults, we evaluated the efficiency of the built-in Error-Correction Code (ECC) and observed that more than 90% of the faults are correctable and further 7% are detectable (but not correctable). This efficiency is the result of the single-bit type of these faults, which are then effectively covered by the Single-Error Correction and Double-Error Detection (SECDED) design of the built-in ECC. Finally, motivated by the above experimental observations, we evaluated an FPGA-based Neural Network (NN) accelerator under low-voltage operations, while built-in ECC is leveraged to mitigate undervolting faults and thus, prevent NN significant accuracy loss. In consequence, we achieve 40% of the BRAM power saving through undervolting below the minimum safe voltage level, with a negligible NN accuracy loss, thanks to the substantial fault coverage by the built-in ECC.Comment: 6 pages, 2 figure

    FPGA Energy Efficiency by Leveraging Thermal Margin

    Full text link
    Cutting edge FPGAs are not energy efficient as conventionally presumed to be, and therefore, aggressive power-saving techniques have become imperative. The clock rate of an FPGA-mapped design is set based on worst-case conditions to ensure reliable operation under all circumstances. This usually leaves a considerable timing margin that can be exploited to reduce power consumption by scaling voltage without lowering clock frequency. There are hurdles for such opportunistic voltage scaling in FPGAs because (a) critical paths change with designs, making timing evaluation difficult as voltage changes, (b) each FPGA resource has particular power-delay trade-off with voltage, (c) data corruption of configuration cells and memory blocks further hampers voltage scaling. In this paper, we propose a systematical approach to leverage the available thermal headroom of FPGA-mapped designs for power and energy improvement. By comprehensively analyzing the timing and power consumption of FPGA building blocks under varying temperatures and voltages, we propose a thermal-aware voltage scaling flow that effectively utilizes the thermal margin to reduce power consumption without degrading performance. We show the proposed flow can be employed for energy optimization as well, whereby power consumption and delay are compromised to accomplish the tasks with minimum energy. Lastly, we propose a simulation framework to be able to examine the efficiency of the proposed method for other applications that are inherently tolerant to a certain amount of error, granting further power saving opportunity. Experimental results over a set of industrial benchmarks indicate up to 36% power reduction with the same performance, and 66% total energy saving when energy is the optimization target.Comment: Accepted in IEEE International Conference on Computer Design (ICCD) 201

    Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins

    Get PDF
    Modern large-scale computing systems (data centers, supercomputers, cloud and edge setups and high-end cyber-physical systems) employ heterogeneous architectures that consist of multicore CPUs, general-purpose many-core GPUs, and programmable FPGAs. The effective utilization of these architectures poses several challenges, among which a primary one is power consumption. Voltage reduction is one of the most efficient methods to reduce power consumption of a chip. With the galloping adoption of hardware accelerators (i.e., GPUs and FPGAs) in large datacenters and other large-scale computing infrastructures, a comprehensive evaluation of the safe voltage reduction levels for each different chip can be employed for efficient reduction of the total power. We present a survey of recent studies in voltage margins reduction at the system level for modern CPUs, GPUs and FPGAs. The pessimistic voltage guardbands inserted by the silicon vendors can be exploited in all devices for significant power savings. On average, voltage reduction can reach 12% in multicore CPUs, 20% in manycore GPUs and 39% in FPGAs.Comment: Accepted for publication in IEEE Transactions on Device and Materials Reliabilit

    Dynamic Partial Reconfiguration for Dependable Systems

    Get PDF
    Moore’s law has served as goal and motivation for consumer electronics manufacturers in the last decades. The results in terms of processing power increase in the consumer electronics devices have been mainly achieved due to cost reduction and technology shrinking. However, reducing physical geometries mainly affects the electronic devices’ dependability, making them more sensitive to soft-errors like Single Event Transient (SET) of Single Event Upset (SEU) and hard (permanent) faults, e.g. due to aging effects. Accordingly, safety critical systems often rely on the adoption of old technology nodes, even if they introduce longer design time w.r.t. consumer electronics. In fact, functional safety requirements are increasingly pushing industry in developing innovative methodologies to design high-dependable systems with the required diagnostic coverage. On the other hand commercial off-the-shelf (COTS) devices adoption began to be considered for safety-related systems due to real-time requirements, the need for the implementation of computationally hungry algorithms and lower design costs. In this field FPGA market share is constantly increased, thanks to their flexibility and low non-recurrent engineering costs, making them suitable for a set of safety critical applications with low production volumes. The works presented in this thesis tries to face new dependability issues in modern reconfigurable systems, exploiting their special features to take proper counteractions with low impacton performances, namely Dynamic Partial Reconfiguration

    A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization

    Full text link
    A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field-of-view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second. To meet this challenge, we have developed a general purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips. These chips are programmed using open-source signal processing libraries we have developed to be flexible, scalable, and chip-independent. This work reduces the time and cost of implementing a wide range of signal processing systems, with correlators foremost among them,and facilitates upgrading to new generations of processing technology. We present several correlator deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes parameter application deployed on the Precision Array for Probing the Epoch of Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31 pages. v2: corrected typo, v3: corrected Fig. 1

    A Survey of FPGA Optimization Methods for Data Center Energy Efficiency

    Get PDF
    This article provides a survey of academic literature about field programmable gate array (FPGA) and their utilization for energy efficiency acceleration in data centers. The goal is to critically present the existing FPGA energy optimization techniques and discuss how they can be applied to such systems. To do so, the article explores current energy trends and their projection to the future with particular attention to the requirements set out by the European Code of Conduct for Data Center Energy Efficiency. The article then proposes a complete analysis of over ten years of research in energy optimization techniques, classifying them by purpose, method of application, and impacts on the sources of consumption. Finally, we conclude with the challenges and possible innovations we expect for this sector.Comment: Accepted for publication in IEEE Transactions on Sustainable Computin

    Model Building and Security Analysis of PUF-Based Authentication

    Get PDF
    In the context of hardware systems, authentication refers to the process of confirming the identity and authenticity of chip, board and system components such as RFID tags, smart cards and remote sensors. The ability of physical unclonable functions (PUF) to provide bitstrings unique to each component can be leveraged as an authentication mechanism to detect tamper, impersonation and substitution of such components. However, authentication requires a strong PUF, i.e., one capable of producing a large, unique set of bits per device, and, unlike secret key generation for encryption, has additional challenges that relate to machine learning attacks, protocol attacks and constraints on device resources. We describe the requirements for PUF-based authentication, and present a PUF primitive and protocol designed for authentication in resource constrained devices. Our experimental results are derived from a 28 nm Xilinx FPGA. In the authentication scenario, strong PUFs are required since the adversary could collect a subset of challenges and response pairsto build a model and predict the responses for unseen challenges. Therefore, strong PUFs need to provide exponentially large challenge space and be resilient to model building attacks. We investigate the security properties of a Hardware-embedded Delay PUF called HELP which leverages within-die variations in path delays within a hardware-implemented macro (functional unit) as the entropy source. Several features of the HELP processing engine significantly improve its resistance to model-building attacks. We also investigate a novel technique that significantly improves the statistically quality of the generated bitstring for HELP. Stability across environmental variations such as temperature and voltage, is critically important for Physically Unclonable Functions (PUFs). Nearly all existing PUF systems to date need a mechanism to deal with “bit flips” when exact regeneration of the bitstring is required, e.g., for cryptographic applications. Error correction (ECC) and error avoidance schemes have been proposed but both of these require helper data to be stored for the regeneration process. Unfortunately, helper data adds time and area overhead to the PUF system and provides opportunities for adversaries to reverse engineer the secret bitstring. We propose a non-volatile memory-based (NVM) PUF that is able to avoid bit flips without requiring any type of helper data. We describe the technique in the context of emerging nano-devices, in particular, resistive random access memory (Memristor) cells, but the methodology is applicable to any type of NVM including Flash

    Strategies towards high performance (high-resolution/linearity) time-to-digital converters on field-programmable gate arrays

    Get PDF
    Time-correlated single-photon counting (TCSPC) technology has become popular in scientific research and industrial applications, such as high-energy physics, bio-sensing, non-invasion health monitoring, and 3D imaging. Because of the increasing demand for high-precision time measurements, time-to-digital converters (TDCs) have attracted attention since the 1970s. As a fully digital solution, TDCs are portable and have great potential for multichannel applications compared to bulky and expensive time-to-amplitude converters (TACs). A TDC can be implemented in ASIC and FPGA devices. Due to the low cost, flexibility, and short development cycle, FPGA-TDCs have become promising. Starting with a literature review, three original FPGA-TDCs with outstanding performance are introduced. The first design is the first efficient wave union (WU) based TDC implemented in Xilinx UltraScale (20 nm) FPGAs with a bubble-free sub-TDL structure. Combining with other existing methods, the resolution is further enhanced to 1.23 ps. The second TDC has been designed for LiDAR applications, especially in driver-less vehicles. Using the proposed new calibration method, the resolution is adjustable (50, 80, and 100 ps), and the linearity is exceptionally high (INL pk-pk and INL pk-pk are lower than 0.05 LSB). Meanwhile, a software tool has been open-sourced with a graphic user interface (GUI) to predict TDCs’ performance. In the third TDC, an onboard automatic calibration (AC) function has been realized by exploiting Xilinx ZYNQ SoC architectures. The test results show the robustness of the proposed method. Without the manual calibration, the AC function enables FPGA-TDCs to be applied in commercial products where mass production is required.Time-correlated single-photon counting (TCSPC) technology has become popular in scientific research and industrial applications, such as high-energy physics, bio-sensing, non-invasion health monitoring, and 3D imaging. Because of the increasing demand for high-precision time measurements, time-to-digital converters (TDCs) have attracted attention since the 1970s. As a fully digital solution, TDCs are portable and have great potential for multichannel applications compared to bulky and expensive time-to-amplitude converters (TACs). A TDC can be implemented in ASIC and FPGA devices. Due to the low cost, flexibility, and short development cycle, FPGA-TDCs have become promising. Starting with a literature review, three original FPGA-TDCs with outstanding performance are introduced. The first design is the first efficient wave union (WU) based TDC implemented in Xilinx UltraScale (20 nm) FPGAs with a bubble-free sub-TDL structure. Combining with other existing methods, the resolution is further enhanced to 1.23 ps. The second TDC has been designed for LiDAR applications, especially in driver-less vehicles. Using the proposed new calibration method, the resolution is adjustable (50, 80, and 100 ps), and the linearity is exceptionally high (INL pk-pk and INL pk-pk are lower than 0.05 LSB). Meanwhile, a software tool has been open-sourced with a graphic user interface (GUI) to predict TDCs’ performance. In the third TDC, an onboard automatic calibration (AC) function has been realized by exploiting Xilinx ZYNQ SoC architectures. The test results show the robustness of the proposed method. Without the manual calibration, the AC function enables FPGA-TDCs to be applied in commercial products where mass production is required

    Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications

    Get PDF
    Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing is used to serve both non-critical and critical tasks. As example, in automotive, cameras are becoming key sensors in increasing car safety, driving assistance and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection. The complexity of these algorithms brings a challenge in real-time image processing systems, requiring high computing capacity, usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand of computational capabilities. These devices can assist embedded processors by significantly speeding-up computationally intensive software algorithms. Moreover, critical applications introduce strict requirements not only from the real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena, and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase the system reliability and robustness by leveraging Dynamic Partial Reconfiguration features. The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performances, (ii) enhancing algorithm robustness, and (iii) increasing overall system's dependability. In order to ensure real-time performances, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. Functionalities offered by the target technology, and algorithm's characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor algorithm's operations to available hardware resources. On the other hand, the key idea for increasing image processing algorithms' robustness is to introduce self-adaptivity features at algorithm level, in order to maintain constant, or improve, the quality of results for a wide range of input conditions, that are not always fully predictable at design-time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images, and then tuning the algorithm parameters based on such estimations. Dynamic reconfiguration features of modern reconfigurable FPGA have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators. Tools and methodologies have been also developed in order to increase the overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults
    • …
    corecore