36 research outputs found

    Impact of parameter variations on circuits and microarchitecture

    Get PDF
    Parameter variations, which are increasing along with advances in process technologies, affect both timing and power. Variability must be considered at both the circuit and microarchitectural design levels to keep pace with performance scaling and to keep power consumption within reasonable limits. This article presents an overview of the main sources of variability and surveys variation-tolerant circuit and microarchitectural approaches.Peer ReviewedPostprint (published version

    Empowering a helper cluster through data-width aware instruction selection policies

    Get PDF
    Narrow values that can be represented by less number of bits than the full machine width occur very frequently in programs. On the other hand, clustering mechanisms enable cost- and performance-effective scaling of processor back-end features. Those attributes can be combined synergistically to design special clusters operating on narrow values (a.k.a. helper cluster), potentially providing performance benefits. We complement a 32-bit monolithic processor with a low-complexity 8-bit helper cluster. Then, in our main focus, we propose various ideas to select suitable instructions to execute in the data-width based clusters. We add data-width information as another instruction steering decision metric and introduce new data-width based selection algorithms which also consider dependency, inter-cluster communication and load imbalance. Utilizing those techniques, the performance of a wide range of workloads are substantially increased; helper cluster achieves an average speedup of 11% for a wide range of 412 apps. When focusing on integer applications, the speedup can be as high as 22% on averagePeer ReviewedPostprint (published version

    Low Latency Prefix Accumulation Driven Compound MAC Unit for Efficient FIR Filter Implementation

    Get PDF
    135–138This article presents hierarchical single compound adder-based MAC with assertion based error correction for speculation variations in the prefix addition for FIR filter design. The VLSI implementation of approximation in prefix adder results show a significant delay and complexity reductions, all this at the cost of latency measures when speculation fails during carry propagation, which is the main reason preventing the use of speculation in parallel-prefix adders in DSP applications. The speculative adder which is based on Han Carlson parallel prefix adder structure accomplishes better reduction in latency. Introducing a structured and efficient shift-add technique and explore latency reduction by incorporating approximation in addition. The improvements made in terms of reduction in latency and merits in performance by the proposed MAC unit are showed through the synthesis done by FPGA hardware. Results show that proposed method outpaces both formerly projected MAC designs using multiplication methods for attaining high speed

    RECOGNITION AND ABOLITION OF ERROR BYUSING APPROXIMATE ADDER

    Get PDF
    Approximate computing is emerging as a new paradigm to improve digital circuit performance by relaxing the requirement of performing exact calculations. Approximate adders rely on the idea that for uniformly distributed inputs, long carry-propagation chains are rarely activated. Unfortunately, however, the above assumption on input signal statistics is not always verified; in this paper we focus on the case when the inputs have a Gaussian distribution. We show that for Gaussian inputs the error probability of previously proposed approximate adders approaches 25% for low sigma values, which is much larger than the uniform case. On the basis of this analysis, we propose an approximate adder with a correction circuit that drastically reduces the error rate for Gaussian distributed operands. In order to investigate the performance of our approach in a real application, simulated results for a simple audio processing system are reported. Implementation results in 65nm technology are also presented

    Design of variability compensation architectures of digital circuits with adaptive body bias

    Get PDF
    The most critical concern in circuit is to achieve high level of performance with very tight power constraint. As the high performance circuits moved beyond 45nm technology one of the major issues is the parameter variation i.e. deviation in process, temperature and voltage (PVT) values from nominal specifications. A key process parameter subject to variation is the transistor threshold voltage (Vth) which impacts two important parameters: frequency and leakage power. Although the degradation can be compensated by the worstcase scenario based over-design approach, it induces remarkable power and performance overhead which is undesirable in tightly constrained designs. Dynamic voltage scaling (DVS) is a more power efficient approach, however its coarse granularity implies difficulty in handling fine grained variations. These factors have contributed to the growing interest in power aware robust circuit design. We propose a variability compensation architecture with adaptive body bias, for low power applications using 28nm FDSOI technology. The basic approach is based on a dynamic prediction and prevention of possible circuit timing errors. In our proposal we are using a Canary logic technique that enables the typical-case design. The body bias generation is based on a DLL type method which uses an external reference generator and voltage controlled delay line (VCDL) to generate the forward body bias (FBB) control signals. The adaptive technique is used for dynamic detection and correction of path failures in digital designs due to PVT variations. Instead of tuning the supply voltage, the key idea of the design approach is to tune the body bias voltage bymonitoring the error rate during operation. The FBB increases operating speed with an overhead in leakage power

    Low Latency Prefix Accumulation Driven Compound MAC Unit for Efficient FIR Filter Implementation

    Get PDF
    This article presents hierarchical single compound adder-based MAC with assertion based error correction for speculation variations in the prefix addition for FIR filter design. The VLSI implementation of approximation in prefix adder results show a significant delay and complexity reductions, all this at the cost of latency measures when speculation fails during carry propagation, which is the main reason preventing the use of speculation in parallel-prefix adders in DSP applications. The speculative adder which is based on Han Carlson parallel prefix adder structure accomplishes better reduction in latency. Introducing a structured and efficient shift-add technique and explore latency reduction by incorporating approximation in addition. The improvements made in terms of reduction in latency and merits in performance by the proposed MAC unit are showed through the synthesis done by FPGA hardware. Results show that proposed method outpaces both formerly projected MAC designs using multiplication methods for attaining high speed

    Design of Unsigned Approximate Hybrid Dividers based on Restoring Array and Logarithmic Dividers

    Get PDF
    Approximate computer arithmetic has been extensively studied due to its advantages to further reduce power consumption and increase performance at reduced accuracy. Although a number of approximate adders and multipliers have been studied, only a few approximate dividers have been proposed. A logarithmic divider (LD) has low complexity and accuracy, while an exact array divider (EXD) has a high complexity. Therefore, in this paper, an approximate hybrid divider (AXHD) is proposed. It takes advantage of both LD and EXD to achieve a tradeoff between hardware performance and accuracy. Exact restoring divider cells are used to generate the most significant bits (MSBs) of the quotient for attaining a high accuracy while the other quotient digits are generated by using a LD as an approximate scheme to improve figures of merit such as power consumption, area and delay. To further save hardware resources, a so-called eliminated approximate hybrid divider (E-AXHD) based on AXHD is also proposed. In this improved design, a reduced width divider is used to replace the EXD in AXHD. Specifically, for a 16-by-8 design, n=(n + 1) array division is used to replace the n=8 array division (n < 8). The proposed AXHD and E-AXHD are evaluated and analyzed using error and hardware metrics. The proposed designs are also compared with EXD, LD and previous approximate dividers. The results show that the proposed designs outperform previous approximate dividers by considering both energy and error. The proposed hybrid dividers are of particular interest for error tolerant applications such as image processing and machine learning
    corecore