10,918 research outputs found

    Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

    Get PDF
    The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming active VFU operating at the peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion.The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P. The work of I. Ratkovic was supported by a FPU research grant from the Spanish MECD.Peer ReviewedPostprint (author's final draft

    An FPGA Architecture and CAD Flow Supporting Dynamically Controlled Power Gating

    Get PDF
    © 2015 IEEE.Leakage power is an important component of the total power consumption in field-programmable gate arrays (FPGAs) built using 90-nm and smaller technology nodes. Power gating was shown to be effective at reducing the leakage power. Previous techniques focus on turning OFF unused FPGA resources at configuration time; the benefit of this approach depends on resource utilization. In this paper, we present an FPGA architecture that enables dynamically controlled power gating, in which FPGA resources can be selectively powered down at run-time. This could lead to significant overall energy savings for applications having modules with long idle times. We also present a CAD flow that can be used to map applications to the proposed architecture. We study the area and power tradeoffs by varying the different FPGA architecture parameters and power gating granularity. The proposed CAD flow is used to map a set of benchmark circuits that have multiple power-gated modules to the proposed architecture. Power savings of up to 83% are achievable for these circuits. Finally, we study a control system of a robot that is used in endoscopy. Using the proposed architecture combined with clock gating results in up to 19% energy savings in this application

    High quality testing of grid style power gating

    No full text
    This paper shows that existing delay-based testing techniques for power gating exhibit fault coverage loss due to unconsidered delays introduced by the structure of the virtual voltage power-distribution-network (VPDN). To restore this loss, which could reach up to 70.3% on stuck-open faults, we propose a design-for-testability (DFT) logic that considers the impact of VPDN on fault coverage in order to constitute the proper interface between the VPDN and the DFT. The proposed logic can be easily implemented on-top of existing DFT solutions and its overhead is optimized by an algorithm that offers trade-off flexibility between test-application-time and hardware overhead. Through physical layout SPICE simulations, we show complete fault coverage recovery on stuck-open faults and 43.2% test-application-time improvement compared to a previously proposed DFT technique. To the best of our knowledge, this paper presents the first analysis of the VPDN impact on test qualit

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Synthesis of application specific processor architectures for ultra-low energy consumption

    No full text
    In this paper we suggest that further energy savings can be achieved by a new approach to synthesis of embedded processor cores, where the architecture is tailored to the algorithms that the core executes. In the context of embedded processor synthesis, both single-core and many-core, the types of algorithms and demands on the execution efficiency are usually known at the chip design time. This knowledge can be utilised at the design stage to synthesise architectures optimised for energy consumption. Firstly, we present an overview of both traditional energy saving techniques and new developments in architectural approaches to energy-efficient processing. Secondly, we propose a picoMIPS architecture that serves as an architectural template for energy-efficient synthesis. As a case study, we show how the picoMIPS architecture can be tailored to an energy efficient execution of the DCT algorithm

    RTL2RTL Formal Equivalence: Boosting the Design Confidence

    Full text link
    Increasing design complexity driven by feature and performance requirements and the Time to Market (TTM) constraints force a faster design and validation closure. This in turn enforces novel ways of identifying and debugging behavioral inconsistencies early in the design cycle. Addition of incremental features and timing fixes may alter the legacy design behavior and would inadvertently result in undesirable bugs. The most common method of verifying the correctness of the changed design is to run a dynamic regression test suite before and after the intended changes and compare the results, a method which is not exhaustive. Modern Formal Verification (FV) techniques involving new methods of proving Sequential Hardware Equivalence enabled a new set of solutions for the given problem, with complete coverage guarantee. Formal Equivalence can be applied for proving functional integrity after design changes resulting from a wide variety of reasons, ranging from simple pipeline optimizations to complex logic redistributions. We present here our experience of successfully applying the RTL to RTL (RTL2RTL) Formal Verification across a wide spectrum of problems on a Graphics design. The RTL2RTL FV enabled checking the design sanity in a very short time, thus enabling faster and safer design churn. The techniques presented in this paper are applicable to any complex hardware design.Comment: In Proceedings FSFMA 2014, arXiv:1407.195

    KAPow: A System Identification Approach to Online Per-Module Power Estimation in FPGA Designs

    Get PDF
    In a modern FPGA system-on-chip design, it is often insufficient to simply assess the total power consumption of the entire circuit by design-time estimation or runtime power rail measurement. Instead, to make better runtime decisions, it is desirable to understand the power consumed by each individual module in the system. In this work, we combine boardlevel power measurements with register-level activity counting to build an online model that produces a breakdown of power consumption within the design. Online model refinement avoids the need for a time-consuming characterisation stage and also allows the model to track long-term changes to operating conditions. Our flow is named KAPow, a (loose) acronym for ‘K’ounting Activity for Power estimation, which we show to be accurate, with per-module power estimates as close to ±5mW of true measurements, and to have low overheads. We also demonstrate an application example in which a permodule power breakdown can be used to determine an efficient mapping of tasks to modules and reduce system-wide power consumption by over 8%
    corecore