1,746 research outputs found

    High Performance and Optimal Configuration of Accurate Heterogeneous Block-Based Approximate Adder

    Full text link
    Approximate computing is an emerging paradigm to improve power and performance efficiency for error-resilient application. Recent approximate adders have significantly extended the design space of accuracy-power configurable approximate adders, and find optimal designs by exploring the design space. In this paper, a new energy-efficient heterogeneous block-based approximate adder (HBBA) is proposed; which is a generic/configurable model that can be transformed to a particular adder by defining some configurations. An HBBA, in general, is composed of heterogeneous sub-adders, where each sub-adder can have a different configuration. A set of configurations of all the sub-adders in an HBBA defines its configuration. The block-based adders are approximated through inexact logic configuration and truncated carry chains. HBBA increases design space providing additional design points that fall on the Pareto-front and offer better power-accuracy trade-off compared to other configurations. Furthermore, to avoid Mont-Carlo simulations, we propose an analytical modelling technique to evaluate the probability of error and Probability Mass Function (PMF) of error value. Moreover, the estimation method estimates delay, area and power of heterogeneous block-based approximate adders. Thus, based on the analytical model and estimation method, the optimal configuration under a given error constraint can be selected from the whole design space of the proposed adder model by exhaustive search. The simulation results show that our HBBA provides improved accuracy in terms of error metrics compared to some state-of-the-art approximate adders. HBBA with 32 bits length serves about 15% reduction in area and up to 17% reduction in energy compared to state-of-the-art approximate adders.Comment: Submitted to the IEEE-TCAD journal, 16 pages, 16 figure

    Latency Optimized Asynchronous Early Output Ripple Carry Adder based on Delay-Insensitive Dual-Rail Data Encoding

    Full text link
    Asynchronous circuits employing delay-insensitive codes for data representation i.e. encoding and following a 4-phase return-to-zero protocol for handshaking are generally robust. Depending upon whether a single delay-insensitive code or multiple delay-insensitive code(s) are used for data encoding, the encoding scheme is called homogeneous or heterogeneous delay-insensitive data encoding. This article proposes a new latency optimized early output asynchronous ripple carry adder (RCA) that utilizes single-bit asynchronous full adders (SAFAs) and dual-bit asynchronous full adders (DAFAs) which incorporate redundant logic and are based on the delay-insensitive dual-rail code i.e. homogeneous data encoding, and follow a 4-phase return-to-zero handshaking. Amongst various RCA, carry lookahead adder (CLA), and carry select adder (CSLA) designs, which are based on homogeneous or heterogeneous delay-insensitive data encodings which correspond to the weak-indication or the early output timing model, the proposed early output asynchronous RCA that incorporates SAFAs and DAFAs with redundant logic is found to result in reduced latency for a dual-operand addition operation. In particular, for a 32-bit asynchronous RCA, utilizing 15 stages of DAFAs and 2 stages of SAFAs leads to reduced latency. The theoretical worst-case latencies of the different asynchronous adders were calculated by taking into account the typical gate delays of a 32/28nm CMOS digital cell library, and a comparison is made with their practical worst-case latencies estimated. The theoretical and practical worst-case latencies show a close correlation....Comment: arXiv admin note: text overlap with arXiv:1704.0761

    Asynchronous Early Output Dual-Bit Full Adders Based on Homogeneous and Heterogeneous Delay-Insensitive Data Encoding

    Get PDF
    This paper presents the designs of asynchronous early output dual-bit full adders without and with redundant logic (implicit) corresponding to homogeneous and heterogeneous delay-insensitive data encoding. For homogeneous delay-insensitive data encoding only dual-rail i.e. 1-of-2 code is used, and for heterogeneous delay-insensitive data encoding 1-of-2 and 1-of-4 codes are used. The 4-phase return-to-zero protocol is used for handshaking. To demonstrate the merits of the proposed dual-bit full adder designs, 32-bit ripple carry adders (RCAs) are constructed comprising dual-bit full adders. The proposed dual-bit full adders based 32-bit RCAs incorporating redundant logic feature reduced latency and area compared to their non-redundant counterparts with no accompanying power penalty. In comparison with the weakly indicating 32-bit RCA constructed using homogeneously encoded dual-bit full adders containing redundant logic, the early output 32-bit RCA comprising the proposed homogeneously encoded dual-bit full adders with redundant logic reports corresponding reductions in latency and area by 22.2% and 15.1% with no associated power penalty. On the other hand, the early output 32-bit RCA constructed using the proposed heterogeneously encoded dual-bit full adder which incorporates redundant logic reports respective decreases in latency and area than the weakly indicating 32-bit RCA that consists of heterogeneously encoded dual-bit full adders with redundant logic by 21.5% and 21.3% with nil power overhead. The simulation results obtained are based on a 32/28nm CMOS process technology

    Indicating Asynchronous Array Multipliers

    Full text link
    Multiplication is an important arithmetic operation that is frequently encountered in microprocessing and digital signal processing applications, and multiplication is physically realized using a multiplier. This paper discusses the physical implementation of many indicating asynchronous array multipliers, which are inherently elastic and modular and are robust to timing, process and parametric variations. We consider the physical realization of many indicating asynchronous array multipliers using a 32/28nm CMOS technology. The weak-indication array multipliers comprise strong-indication or weak-indication full adders, and strong-indication 2-input AND functions to realize the partial products. The multipliers were synthesized in a semi-custom ASIC design style using standard library cells including a custom-designed 2-input C-element. 4x4 and 8x8 multiplication operations were considered for the physical implementations. The 4-phase return-to-zero (RTZ) and the 4-phase return-to-one (RTO) handshake protocols were utilized for data communication, and the delay-insensitive dual-rail code was used for data encoding. Among several weak-indication array multipliers, a weak-indication array multiplier utilizing a biased weak-indication full adder and the strong-indication 2-input AND function is found to have reduced cycle time and power-cycle time product with respect to RTZ and RTO handshaking for 4x4 and 8x8 multiplications. Further, the 4-phase RTO handshaking is found to be preferable to the 4-phase RTZ handshaking for achieving enhanced optimizations of the design metrics.Comment: arXiv admin note: text overlap with arXiv:1903.0943

    Efficiency analysis methodology of FPGAs based on lost frequencies, area and cycles

    Get PDF
    We propose a methodology to study and to quantify efficiency and the impact of overheads on runtime performance. Most work on High-Performance Computing (HPC) for FPGAs only studies runtime performance or cost, while we are interested in how far we are from peak performance and, more importantly, why. The efficiency of runtime performance is defined with respect to the ideal computational runtime in absence of inefficiencies. The analysis of the difference between actual and ideal runtime reveals the overheads and bottlenecks. A formal approach is proposed to decompose the efficiency into three components: frequency, area and cycles. After quantification of the efficiencies, a detailed analysis has to reveal the reasons for the lost frequencies, lost area and lost cycles. We propose a taxonomy of possible causes and practical methods to identify and quantify the overheads. The proposed methodology is applied on a number of use cases to illustrate the methodology. We show the interaction between the three components of efficiency and show how bottlenecks are revealed
    • …
    corecore