1,496 research outputs found
Recommended from our members
Tradeoffs in parallel prefix adder structures
textThis report presents the results of research on comparing the structures and qualities of fast parallel prefix adders. The binary adder serves as a fundamental component of many digital arithmetic operations. Many modern microprocessors and ASICs that require high speed arithmetic logic often implement parallel prefix adders. Modern parallel prefix adder structures are based on previous works including those of Kogge-Stone, Brent-Kung, Ladner-Fischer, Knowles, et al. and designs presented in each work have their own merits and tradeoffs that are suitable for certain applications. Previous works have described standard and systematic ways to design and construct functional parallel prefix adder structures. Although the parallel prefix adder has been studied for decades, this work explores the possibility that non-standard and more optimal structures may exist by developing and utilizing a brute force search algorithm based on the prefix operator rules and properties to find all possible parallel prefix adder structures. The parallel prefix adder search algorithm design, search results and study of tradeoffs are discussed in this work.Electrical and Computer Engineerin
Low Latency Prefix Accumulation Driven Compound MAC Unit for Efficient FIR Filter Implementation
135–138This article presents hierarchical single compound adder-based MAC with assertion based error correction for speculation variations in the prefix addition for FIR filter design. The VLSI implementation of approximation in prefix adder results show a significant delay and complexity reductions, all this at the cost of latency measures when speculation fails during carry propagation, which is the main reason preventing the use of speculation in parallel-prefix adders in DSP applications. The speculative adder which is based on Han Carlson parallel prefix adder structure accomplishes better reduction in latency. Introducing a structured and efficient shift-add technique and explore latency reduction by incorporating approximation in addition. The improvements made in terms of reduction in latency and merits in performance by the proposed MAC unit are showed through the synthesis done by FPGA hardware. Results show that proposed method outpaces both formerly projected MAC designs using multiplication methods for attaining high speed
DESIGN OF LOW POWER AND HIGH SPEED CARRY SELECT ADDER USING BRENT KUNG ADDER
In this paper, Carry Select Adder (CSA) architectures are proposed using parallel prefix adders. Instead of using dual Ripple Carry Adders (RCA), parallel prefix adder i.e., Brent Kung (BK) adder is used to design Regular Linear CSA. Adders are the basic building blocks in digital integrated circuit based designs. Ripple Carry Adder (RCA) gives the most compact design but takes longer computation time. The time critical applications use Carry Look-ahead scheme (CLA) to derive fast results but they lead to increase in area. Carry SelectAdder is a compromise between RCA and CLA in term of area and delay. Delayof RCA is large therefore we have replaced it with parallel prefix adder which gives fast results. In this paper, structures of 16-Bit Regular Linear Brent Kung CSA, Modified Linear BK CSA, Regular Square Root (SQRT) BK CSA andModified SQRT BK CSA are designed. Power and delayof all these adder architectures are calculated at different input voltages. The results depict that Modified SQRT BK CSA is better than all the other adder architectures in terms of power but with small speed penalty. The designs have been synthesized at45nm technology using Tanner EDA tool
Low Latency Prefix Accumulation Driven Compound MAC Unit for Efficient FIR Filter Implementation
This article presents hierarchical single compound adder-based MAC with assertion based error correction for speculation variations in the prefix addition for FIR filter design. The VLSI implementation of approximation in prefix adder results show a significant delay and complexity reductions, all this at the cost of latency measures when speculation fails during carry propagation, which is the main reason preventing the use of speculation in parallel-prefix adders in DSP applications. The speculative adder which is based on Han Carlson parallel prefix adder structure accomplishes better reduction in latency. Introducing a structured and efficient shift-add technique and explore latency reduction by incorporating approximation in addition. The improvements made in terms of reduction in latency and merits in performance by the proposed MAC unit are showed through the synthesis done by FPGA hardware. Results show that proposed method outpaces both formerly projected MAC designs using multiplication methods for attaining high speed
Pipelining Saturated Accumulation
Aggressive pipelining and spatial parallelism allow integrated circuits (e.g., custom VLSI, ASICs, and FPGAs) to achieve high throughput on many Digital Signal Processing applications. However, cyclic data dependencies in the computation can limit parallelism and reduce the efficiency and speed of an implementation. Saturated accumulation is an important example where such a cycle limits the throughput of signal processing applications. We show how to reformulate saturated addition as an associative operation so that we can use a parallel-prefix calculation to perform saturated accumulation at any data rate supported by the device. This allows us, for example, to design a 16-bit saturated accumulator which can operate at 280 MHz on a Xilinx Spartan-3(XC3S-5000-4) FPGA, the maximum frequency supported by the component's DCM
Optimistic Parallelization of Floating-Point Accumulation
Floating-point arithmetic is notoriously non-associative due to the limited precision representation which demands intermediate values be rounded to fit in the available precision. The resulting cyclic dependency in floating-point accumulation inhibits parallelization of the computation, including efficient use of pipelining. In practice, however, we observe that floating-point operations are "mostly" associative. This observation can be exploited to parallelize floating-point accumulation using a form of optimistic concurrency. In this scheme, we first compute an optimistic associative approximation to the sum and then relax the computation by iteratively propagating errors until the correct sum is obtained. We map this computation to a network of 16 statically-scheduled, pipelined, double-precision floating-point adders on the Virtex-4 LX160 (-12) device where each floating-point adder runs at 296 MHz and has a pipeline depth of 10. On this 16 PE design, we demonstrate an average speedup of 6Ă— with randomly generated data and 3-7Ă— with summations extracted from Conjugate Gradient benchmarks
Comparison of parallel prefix adder (PPA)
The parallel prefix adder is one of the fastest types of adder that had been created and developed. Two common types of parallel prefix adder are the Brent
Kung and Kogge Stone adders. This project will compare and study the performances of these two adders in terms of propagation delay and design area. The
comparison and study for both adders will be conducted for 8, 16 and 32 bits size. By using the Quartus II design software, the designs for both Brent Kung and Kogge Stone adders will be developed. From the designs of both adders, a vector waveform will be produced as a result of the designs’ simulations. The simulation result will also show the propagation delay for the adders. Hence, this project is significant in showing which of the two adders being tested perform better in terms of propagation delay and design area based on different sizes of bits
Pipelined Two-Operand Modular Adders
Pipelined two-operand modular adder (TOMA) is one of basic components used in digital signal processing (DSP) systems that use the residue number system (RNS). Such modular adders are used in binary/residue and residue/binary converters, residue multipliers and scalers as well as within residue processing channels. The design of pipelined TOMAs is usually obtained by inserting an appriopriate number of latch layers inside a nonpipelined TOMA structure. Hence their area is also determined by the number of latches and the delay by the number of latch layers. In this paper we propose a new pipelined TOMA that is based on a new TOMA, that has the smaller area and smaller delay than other known structures. Comparisons are made using data from the very large scale of integration (VLSI) standard cell library
- …