16,435 research outputs found
Low Power Processor Architectures and Contemporary Techniques for Power Optimization â A Review
The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
Latency Optimized Asynchronous Early Output Ripple Carry Adder based on Delay-Insensitive Dual-Rail Data Encoding
Asynchronous circuits employing delay-insensitive codes for data
representation i.e. encoding and following a 4-phase return-to-zero protocol
for handshaking are generally robust. Depending upon whether a single
delay-insensitive code or multiple delay-insensitive code(s) are used for data
encoding, the encoding scheme is called homogeneous or heterogeneous
delay-insensitive data encoding. This article proposes a new latency optimized
early output asynchronous ripple carry adder (RCA) that utilizes single-bit
asynchronous full adders (SAFAs) and dual-bit asynchronous full adders (DAFAs)
which incorporate redundant logic and are based on the delay-insensitive
dual-rail code i.e. homogeneous data encoding, and follow a 4-phase
return-to-zero handshaking. Amongst various RCA, carry lookahead adder (CLA),
and carry select adder (CSLA) designs, which are based on homogeneous or
heterogeneous delay-insensitive data encodings which correspond to the
weak-indication or the early output timing model, the proposed early output
asynchronous RCA that incorporates SAFAs and DAFAs with redundant logic is
found to result in reduced latency for a dual-operand addition operation. In
particular, for a 32-bit asynchronous RCA, utilizing 15 stages of DAFAs and 2
stages of SAFAs leads to reduced latency. The theoretical worst-case latencies
of the different asynchronous adders were calculated by taking into account the
typical gate delays of a 32/28nm CMOS digital cell library, and a comparison is
made with their practical worst-case latencies estimated. The theoretical and
practical worst-case latencies show a close correlation....Comment: arXiv admin note: text overlap with arXiv:1704.0761
A Library-Based Synthesis Methodology for Reversible Logic
In this paper, a library-based synthesis methodology for reversible circuits
is proposed where a reversible specification is considered as a permutation
comprising a set of cycles. To this end, a pre-synthesis optimization step is
introduced to construct a reversible specification from an irreversible
function. In addition, a cycle-based representation model is presented to be
used as an intermediate format in the proposed synthesis methodology. The
selected intermediate format serves as a focal point for all potential
representation models. In order to synthesize a given function, a library
containing seven building blocks is used where each building block is a cycle
of length less than 6. To synthesize large cycles, we also propose a
decomposition algorithm which produces all possible minimal and inequivalent
factorizations for a given cycle of length greater than 5. All decompositions
contain the maximum number of disjoint cycles. The generated decompositions are
used in conjunction with a novel cycle assignment algorithm which is proposed
based on the graph matching problem to select the best possible cycle pairs.
Then, each pair is synthesized by using the available components of the
library. The decomposition algorithm together with the cycle assignment method
are considered as a binding method which selects a building block from the
library for each cycle. Finally, a post-synthesis optimization step is
introduced to optimize the synthesis results in terms of different costs.Comment: 24 pages, 8 figures, Microelectronics Journal, Elsevie
- âŠ