1,581 research outputs found
A gate-level strategy to design carry select adders
ABSTRACT This paper addresses the gate-level design of Carry Select Adders aiming at minimizing its delay through a proper selection of the Full Adder groups sizes. It starts from a rigorous timing analysis of the Carry Select Adder, from which a preliminary procedure is formulated to build an incomplete nearly-optimum adder. Then, the required number of bits is reached by adding remaining bits into proper blocks minimizing the delay increase. The design strategy proposed also accounts for the dependence of multiplexer (MUX) delay on its fan-out, in contrast to the usual and unrealistic assumption of a constant MUX delay. The strategy proposed is applied in several design cases, whose results shows that the delay achieved is usually minimum, and only in a few cases delay it is lower than 2% of the optimum
Asynchronous Early Output Dual-Bit Full Adders Based on Homogeneous and Heterogeneous Delay-Insensitive Data Encoding
This paper presents the designs of asynchronous early output dual-bit full
adders without and with redundant logic (implicit) corresponding to homogeneous
and heterogeneous delay-insensitive data encoding. For homogeneous
delay-insensitive data encoding only dual-rail i.e. 1-of-2 code is used, and
for heterogeneous delay-insensitive data encoding 1-of-2 and 1-of-4 codes are
used. The 4-phase return-to-zero protocol is used for handshaking. To
demonstrate the merits of the proposed dual-bit full adder designs, 32-bit
ripple carry adders (RCAs) are constructed comprising dual-bit full adders. The
proposed dual-bit full adders based 32-bit RCAs incorporating redundant logic
feature reduced latency and area compared to their non-redundant counterparts
with no accompanying power penalty. In comparison with the weakly indicating
32-bit RCA constructed using homogeneously encoded dual-bit full adders
containing redundant logic, the early output 32-bit RCA comprising the proposed
homogeneously encoded dual-bit full adders with redundant logic reports
corresponding reductions in latency and area by 22.2% and 15.1% with no
associated power penalty. On the other hand, the early output 32-bit RCA
constructed using the proposed heterogeneously encoded dual-bit full adder
which incorporates redundant logic reports respective decreases in latency and
area than the weakly indicating 32-bit RCA that consists of heterogeneously
encoded dual-bit full adders with redundant logic by 21.5% and 21.3% with nil
power overhead. The simulation results obtained are based on a 32/28nm CMOS
process technology
Recommended from our members
MILO : a microarchitecture and logic optimizer
In this report we discuss strengths and weaknesses of logic synthesis systems and describe a system for microarchitectural and logic optimization. Our system uses a set of algorithms for synthesizing SSI/MSI macros from parameterized microarchitecture components. In addition, it uses rules for optimizing both at the microarchitecture and logic level. The system increases designer productivity and requires less design knowledge and experience from circuit engineers
Penelope: The NBTI-aware processor
Transistors consist of lower number of atoms with every technology generation. Such atoms may be displaced due to the stress caused by high temperature, frequency and current, leading to failures. NBTI (negative bias temperature instability) is one of the most important sources of failure affecting transistors. NBTI degrades PMOS transistors whenever the voltage at the gate is negative (logic inputPeer ReviewedPostprint (published version
Energy-efficient acceleration of MPEG-4 compression tools
We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09
μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art
Pipelining Saturated Accumulation
Aggressive pipelining and spatial parallelism allow integrated circuits (e.g., custom VLSI, ASICs, and FPGAs) to achieve high throughput on many Digital Signal Processing applications. However, cyclic data dependencies in the computation can limit parallelism and reduce the efficiency and speed of an implementation. Saturated accumulation is an important example where such a cycle limits the throughput of signal processing applications. We show how to reformulate saturated addition as an associative operation so that we can use a parallel-prefix calculation to perform saturated accumulation at any data rate supported by the device. This allows us, for example, to design a 16-bit saturated accumulator which can operate at 280 MHz on a Xilinx Spartan-3(XC3S-5000-4) FPGA, the maximum frequency supported by the component's DCM
Recommended from our members
An approach to component generation and technology adaptation
Component generation is the task of mapping the abstract functional specification of register-transfer (RT) components, such as decoders and multiplexers, adders and comparators, and multipliers and arithmetic logic units, into configurations of connected physical layout cells. Cells are drawn from a given ASIC (application-specific integrated circuit) library.In this dissertation, I describe a symbolic pattern-matching approach to component generation and, relative to this, an approach to automating technology adaptation. I define the component decomposition algorithm and technology compilation algorithm that formalize these two approaches and describe implementations of each, in the DTAS component generation system and the LOLA technology adaptation system, respectively. I present empirical results to validate the utility of my approach to component generation, and I present a demonstration to validate my approach to technology adaptation.My approach to component generation has two significant benefits. First, it enables the use of complex functional library cells, such as adders and CLAs, in the generation of designs for functional units. Second, it effectively searches the design space for designs that make desirable tradeoffs between design constraints, such as area and delay. My approach to technology adaptation is significant because it bootstraps the DTAS component generation system into new ASIC cell libraries, as well as cell libraries as they undergo change. In this way, the technology compilation algorithm automates the task of maintaining technology independence.To validate my approach to component generation, I present the results of four sets of experiments using the DTAS component generation system. The first set examines the effectiveness of search control in DTAS; the second examines the capability to find desirable design alternatives; the third compares designs generated by DTAS with those of MISII; and the fourth shows how the use of complex library cells improves design quality. To validate my approach to automating technology adaptation, I demonstrate the application of the LOLA technology adaptation system to a cell library as it undergoes four phases of evolution
- …