392 research outputs found

    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    Get PDF
    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

    A toolset for the analysis and optimization of motion estimation algorithms and processors

    Get PDF

    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    Get PDF
    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc

    Evaluating Built-in ECC of FPGA on-chip Memories for the Mitigation of Undervolting Faults

    Get PDF
    Voltage underscaling below the nominal level is an effective solution for improving energy efficiency in digital circuits, e.g., Field Programmable Gate Arrays (FPGAs). However, further undervolting below a safe voltage level and without accompanying frequency scaling leads to timing related faults, potentially undermining the energy savings. Through experimental voltage underscaling studies on commercial FPGAs, we observed that the rate of these faults exponentially increases for on-chip memories, or Block RAMs (BRAMs). To mitigate these faults, we evaluated the efficiency of the built-in Error-Correction Code (ECC) and observed that more than 90% of the faults are correctable and further 7% are detectable (but not correctable). This efficiency is the result of the single-bit type of these faults, which are then effectively covered by the Single-Error Correction and Double-Error Detection (SECDED) design of the built-in ECC. Finally, motivated by the above experimental observations, we evaluated an FPGA-based Neural Network (NN) accelerator under low-voltage operations, while built-in ECC is leveraged to mitigate undervolting faults and thus, prevent NN significant accuracy loss. In consequence, we achieve 40% of the BRAM power saving through undervolting below the minimum safe voltage level, with a negligible NN accuracy loss, thanks to the substantial fault coverage by the built-in ECC.Comment: 6 pages, 2 figure

    Extending ASSERT for HW/SW Co-design

    Get PDF
    Embedded systems are commonly designed by specifying and developing hardware and software systems separately. On the contrary, the hardware/software (HW/SW) co-development exploits the trade-offs between hardware and software in a system through their concurrent design. HW/SW Codevelopment techniques take advantage of the flexibility of system design to create architectures that can meet stringent performance requirements with a shorter design cycle. This paper presents the work done within the scope of ESA HWSWCO (Hardware-Software Co-design) study. The main objective of this study has been to address the HW/SW co-design phase to integrate this engineering task as part of the ASSERT process (refer to [1]) and compatible with the existing ASSERT approach, process and tool, Advances in the automation of the design of HW and SW and the adoption of the Model Driven Architecture (MDA) [9] paradigm make possible the definition of a proper integration substrate and enables the continuous interaction of the HW and SW design paths

    A Basic, Four Logic Cluster, Disjoint Switch Connected FPGA Architecture

    Get PDF
    This paper seeks to describe the process of developing a new FPGA architecture from nothing, both in terms of knowledge about FPGAs and in initial design material. Specifically, this project set out to design an FPGA architecture which can implement a simple state machine type design with 10 inputs, 10 outputs and 10 states. The open source Verilog-to-Routing FPGA CAD flow tool was used in order to synthesize, place, and route HDL files onto the architecture. This project was completed in terms of the spirit of the original goals of implementing an FPGA from scratch. Although, the project resulted in an architecture which slightly underperformed in its ability to route 100% of 10 input, 10 output, 10 state designs due to the general place and route algorithm used and the lack of non-contrived 10 input 10 output 10 state FSM designs
    corecore