16,435 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Latency Optimized Asynchronous Early Output Ripple Carry Adder based on Delay-Insensitive Dual-Rail Data Encoding

    Full text link
    Asynchronous circuits employing delay-insensitive codes for data representation i.e. encoding and following a 4-phase return-to-zero protocol for handshaking are generally robust. Depending upon whether a single delay-insensitive code or multiple delay-insensitive code(s) are used for data encoding, the encoding scheme is called homogeneous or heterogeneous delay-insensitive data encoding. This article proposes a new latency optimized early output asynchronous ripple carry adder (RCA) that utilizes single-bit asynchronous full adders (SAFAs) and dual-bit asynchronous full adders (DAFAs) which incorporate redundant logic and are based on the delay-insensitive dual-rail code i.e. homogeneous data encoding, and follow a 4-phase return-to-zero handshaking. Amongst various RCA, carry lookahead adder (CLA), and carry select adder (CSLA) designs, which are based on homogeneous or heterogeneous delay-insensitive data encodings which correspond to the weak-indication or the early output timing model, the proposed early output asynchronous RCA that incorporates SAFAs and DAFAs with redundant logic is found to result in reduced latency for a dual-operand addition operation. In particular, for a 32-bit asynchronous RCA, utilizing 15 stages of DAFAs and 2 stages of SAFAs leads to reduced latency. The theoretical worst-case latencies of the different asynchronous adders were calculated by taking into account the typical gate delays of a 32/28nm CMOS digital cell library, and a comparison is made with their practical worst-case latencies estimated. The theoretical and practical worst-case latencies show a close correlation....Comment: arXiv admin note: text overlap with arXiv:1704.0761

    A Library-Based Synthesis Methodology for Reversible Logic

    Full text link
    In this paper, a library-based synthesis methodology for reversible circuits is proposed where a reversible specification is considered as a permutation comprising a set of cycles. To this end, a pre-synthesis optimization step is introduced to construct a reversible specification from an irreversible function. In addition, a cycle-based representation model is presented to be used as an intermediate format in the proposed synthesis methodology. The selected intermediate format serves as a focal point for all potential representation models. In order to synthesize a given function, a library containing seven building blocks is used where each building block is a cycle of length less than 6. To synthesize large cycles, we also propose a decomposition algorithm which produces all possible minimal and inequivalent factorizations for a given cycle of length greater than 5. All decompositions contain the maximum number of disjoint cycles. The generated decompositions are used in conjunction with a novel cycle assignment algorithm which is proposed based on the graph matching problem to select the best possible cycle pairs. Then, each pair is synthesized by using the available components of the library. The decomposition algorithm together with the cycle assignment method are considered as a binding method which selects a building block from the library for each cycle. Finally, a post-synthesis optimization step is introduced to optimize the synthesis results in terms of different costs.Comment: 24 pages, 8 figures, Microelectronics Journal, Elsevie
    • 

    corecore