15 research outputs found

    Implementation Of Less Area Inexact Speculative Adder Using Carry Look Ahead Adder

    Get PDF
    The ISA improves snake performance by dividing the critical path into two or more shorter paths, reducing the strength of pseudo-defects and managing faults through an improved speculative path and versatile bi-directional error compensation technology. Pipelines are the process of shortening the critical path at the expense of the area. The overall structure of the runners improves performance and allows precise control of precision. This paper leads to the next snake-based Contemporary Estimation Theory (ISA) CLA plan, which consists of micropipelins to include two logic gates along their main path, thus enhancing replay activity. In addition, various stages of the ISA architecture have been proposed and the power clock has minimized this scheme. In mod we change Adder, we can replace Brent Kung Adder instead of CLA

    Efficient register renaming and recovery for high-performance processors

    Full text link
    © © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.”Modern superscalar processors implement register renaming using either random access memory (RAM) or content-addressable memories (CAM) tables. The design of these structures should address both access time and misprediction recovery penalty. Although direct-mapped RAMs provide faster access times, CAMs are more appropriate to avoid recovery penalties. The presence of associative ports in CAMs, however, prevents them from scaling with the number of physical registers and pipeline width, negatively impacting performance, area, and energy consumption at the rename stage. In this paper, we present a new hybrid RAM CAM register renaming scheme, which combines the best of both approaches. In a steady state, a RAM provides fast and energy-efficient access to register mappings. On misspeculation, a low-complexity CAM enables immediate recovery. Experimental results show that in a four-way state-ofthe- art superscalar processor, the new approach provides almost the same performance as an ideal CAM-based renaming scheme, while dissipating only between 17% and 26% of the original energy and, in some cases, consuming less energy than purely RAM-based renaming schemes. Overall, the silicon area required to implement the hybrid RAM CAM scheme does not exceed the area required by conventional renaming mechanisms.This work was supported in part by the Spanish MINECO under Grant TIN2012-38341-C04-01.Petit Martí, SV.; Ubal Tena, R.; Sahuquillo Borrás, J.; López Rodríguez, PJ. (2014). Efficient register renaming and recovery for high-performance processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 22(7):1506-1514. https://doi.org/10.1109/TVLSI.2013.2270001S1506151422

    Energy-Efficient Digital Design Through Inexact and Approximate Arithmetic Circuits

    Get PDF
    Inexact and approximate circuit design is a promising approach to improve performance and energy efficiency in technology-scaled and low-power digital systems. Such strategy is suitable for error tolerant applications involving perceptive or statistical outputs. This paper reviews two established techniques applicable to arithmetic units: circuit pruning and carry speculation. A critical comparative study is carried out considering several error metrics

    Energy-Efficient Inexact Speculative Adder with High Performance and Accuracy Control

    Get PDF
    Inexact and approximate circuit design is a promising approach to improve performance and energy efficiency in technology-scaled and low-power digital systems. Such strategy is suitable for error-tolerant applications involving perceptive or statistical outputs. This paper presents a novel architecture of an Inexact Speculative Adder with optimized hardware efficiency and advanced compensation technique with either error correction or error reduction. This general topology of speculative adders improves performance and enables precise accuracy control. A brief design methodology and comparative study of this speculative adder are also presented herein, demonstrating power savings up to 26 % and energy-delay-area reductions up to 60% at equivalent accuracy compared to the state-of-the-art

    Near/Sub-Threshold Circuits and Approximate Computing: The Perfect Combination for Ultra-Low-Power Systems

    Get PDF
    While sub/near-threshold design offers the minimal power and energy consumption, such approach strongly deteriorates circuit performances and robustness against PVT (process/voltage/temperature) variations, leading to gigantic speed penalties and large silicon areas. Inexact and approximate circuit design can address these issues by trading calculation accuracy for better silicon area, circuit speed and even better power consumption. This paper reviews and proposes improvements for two approximate computing techniques applicable to arithmetic circuits: gate-level pruning and carry speculation. A critical study is then carried out considering several error metrics, and for the first time, those techniques are combined to produce approximate adders showing even higher gains at similar error levels. It is then shown that those techniques can be applied to a sub-threshold library to mitigate the large variability

    Utilizing timing error detection and recovery to dynamically improve superscalar processor performance

    Get PDF
    To provide reliable execution, traditional design methodologies perform timing error avoidance. Worst case parameters are assumed when determining a processor\u27s operating frequency, allowing the maximum propagation delay through the system to be met. However, in practice the worst cases are rare, leading to a large amount of exploitable performance improvement if timing errors can be detected and recovered from. To this end, we propose a novel low cost scheme which allows a superscalar processor to dynamically tune its frequency past the worst case limit. When timing errors occur, they are detected and recovered from locally. Additionally, the number of errors that occur are monitored by one of several sampling methods. When the error rate becomes too high, leading to decreased performance, the frequency is scaled back. Experimental results show an average performance gain of 45% across all benchmark applications. The cost of implementing the error detection and recovery is kept modest by reusing the existing pipeline logic to detect the timing errors

    Optimization for timing-speculated circuits by redundancy addition and removal

    Full text link

    Superscalar Processor Performance Enhancement through Reliable Dynamic Clock Frequency Tuning

    Get PDF
    Synchronous circuits are typically clocked considering worst case timing paths so that timing errors are avoided under all circumstances. In the case of a pipelined processor, this has special implications since the operating frequency of the entire pipeline is limited by the slowest stage. Our goal, in this paper, is to achieve higher performance in superscalar processors by dynamically varying the operating frequency during run time past worst case limits. The key objective is to see the effect of overclocking on superscalar processors for various benchmark applications, and analyze the associated overhead, in terms of extra hardware and error recovery penalty, when the clock frequency is adjusted dynamically. We tolerate timing errors occurring at speeds higher than what the circuit is designed to operate at by implementing an efficient error detection and recovery mechanism. We also study the limitations imposed by minimum path constraints on our technique. Experimental results show that an average performance gain up to 57% across all benchmark applications is achievable

    High Performance Reliable Variable Latency Carry Select Addition

    Get PDF
    This thesis describes the design and the optimization of a low overhead, high performance variable latency carry select adder. Previous researchers believed that the traditional adder has reached the theoretical speed bound. However, a considerable portion of hardware resources of the traditional adder is only used in the worst case. Based on this observation, variable latency adders have been proposed to improve on the theoretical limit, but such adders incur significant area overhead. By combining previous variable latency adders with carry select addition, this work describes a novel variable latency carry select adder. Applying carry select addition in the variable latency adder design significantly reduces the area overhead and increases its performance. This variable latency adder is faster and smaller than previous variable latency adders. Furthermore, this variable latency adder can be optimized to be faster and smaller than the fastest adder generated by the Synopsys DesignWare building block IP
    corecore