Search CORE

201 research outputs found

Fast Prefix Adders for Non-uniform Input Arrival Times

Author: Held Stephan
Spirkl Sophie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

This is a post-peer-review, pre-copyedit version of an article published in Algorithmica. The final authenticated version is available online at: https://doi.org/10.1007/s00453-015-0067-xWe consider the problem of constructing fast and small parallel prefix adders for non-uniform input arrival times. In modern computer chips, adders with up to hundreds of inputs occur frequently, and they are often embedded into more complex circuits, e.g. multipliers, leading to instance-specific non-uniform input arrival times. Most previous results are based on representing binary carry-propagate adders as parallel prefix graphs, in which pairs of generate and propagate signals are combined using complex gates called prefix gates. Examples of commonly-used adders are constructed based on the Kogge–Stone or Ladner–Fischer prefix graphs. Adders constructed in this model usually minimize the delay in terms of these prefix gates. However, the delay in terms of logic gates can be worse by a factor of two. In contrast, we aim to minimize the delay of the underlying logic circuit directly. We prove a lower bound on the delay of a carry bit computation achievable by any prefix carry bit circuit and develop an algorithm that computes a prefix carry bit circuit with optimum delay up to a small additive constant. Our algorithm improves the running time of a previous dynamic program for constructing a prefix carry bit from O(n3) to O(nlog2n) while simultaneously improving the delay and size guarantee, where n is the number of bits in the summands. Furthermore, we use this algorithm as a subroutine to compute a full adder in near-linear time, reducing the delay approximation factor of 2 from previous approaches to 1.441 for our algorithm

University of Waterloo's Institutional Repository

Recommended from our members

Total delay optimization for column reduction multipliers considering non-uniform arrival times to the final adder

Author: Waters Ronald S.
Publication venue
Publication date: 26/06/2014
Field of study

textColumn Reduction Multiplier techniques provide the fastest multiplier designs and involve three steps. First, a partial product array of terms is formed by logically ANDing each bit of the multiplier with each bit of the multiplicand. Second, adders or counters are used to reduce the number of terms in each bit column to a final two. This activity is commonly described as column reduction and occurs in multiple stages. Finally, some form of carry propagate adder (CPA) is applied to the final two terms in order to sum them to produce the final product of the multiplication. Since forming the partial products, in the first step, is simply forming an array of the logical AND's of two bits, there is little opportunity for delay improvement for the first step. There has been much work done in optimizing the reduction stages for column multipliers in the second reduction step. All of the reduction approaches of the second step result in non-uniform arrival times to the input of the final carry propagate adder in the final step. The designs for carry propagate adders have been done assuming that the input bits all have the same arrival time. It is not evident that the non-uniform arrival times from the columns impacts the performance of the multiplier. A thorough analysis of the several column reduction methods and the impact of carry propagate adder designs, along with the column reduction design step, to provide the fastest possible final results, for an array of multiplier widths has not been undertaken. This dissertation investigates the design impact of three carry propagate adders, with different performance attributes, on the final delay results for four column reduction multipliers and suggests general ways to optimize the total delay for the multipliers.Electrical and Computer Engineerin

Texas ScholarWorks

A Low-Area, Energy-Efficient 64-Bit Reconfigurable Carry Select Modified Tree-Based Adder for Media Signal Processing

Author: Allwin Priscilla Sharon
Publication venue: CORE Scholar
Publication date: 01/01/2019
Field of study

Multimedia systems play an essential part in our daily lives and have drastically improved the quality of life over time. Multimedia devices like cellphones, radios, televisions, and computers require low-area and low-power reconfigurable adders to process greedy computation algorithms for the real-time audio/video signal and image processing such as discrete cosine transform, inverse discrete cosine transform, and fast Fourier transform, etc. In this thesis, a novel 64-bit reconfigurable adder is proposed and implemented to reduce the area and power consumption. This adder can be run-time reconfigured to different reconfigurable word lengths, i.e., one 64- bit, two 32-bits, four 16-bits or eight 8-bits addition, depending on the partition signal command. A Carry Select Modified Tree (CSMT) based adder is used in the reconfigurable adder to reduce the area by 22 % and the power consumption by 47 % when compared to the conventional design. The proposed adder, implemented in 180 nm CMOS technology at 1.8-volt supply, has a worst-case Delay of 20.67 nanoseconds with an overall area of 36,417 μm² and power consumption of 447.93 μW

CORE

Binary Adder Circuits of Asymptotically Minimum Depth, Linear Size, and Fan-Out Two

Author: Held Stephan
Spirkl Sophie
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

© Stephan Held and Sophie Spirkl | ACM 2018. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Algorithms, https://doi.org/10.1145/3147215.We consider the problem of constructing fast and small binary adder circuits. Among widely used adders, the Kogge-Stone adder is often considered the fastest, because it computes the carry bits for two n-bit numbers (where n is a power of two) with a depth of 2 log2n logic gates, size 4 nlog2n, and all fan-outs bounded by two. Fan-outs of more than two are disadvantageous in practice, because they lead to the insertion of repeaters for repowering the signal and additional depth in the physical implementation. However, the depth bound of the Kogge-Stone adder is off by a factor of two from the lower bound of log2n. Two separate constructions by Brent and Krapchenko achieve this lower bound asymptotically. Brent’s construction gives neither a bound on the fan-out nor the size, while Krapchenko’s adder has linear size, but can have up to linear fan-out. With a fan-out bound of two, neither construction achieves a depth of less than 2 log2n. In a further approach, Brent and Kung proposed an adder with linear size and fan-out two but twice the depth of the Kogge-Stone adder. These results are 33–43 years old and no substantial theoretical improvement for has been made since then. In this article, we integrate the individual advantages of all previous adder circuits into a new family of full adders, the first to improve on the depth bound of 2 log2n while maintaining a fan-out bound of two. Our adders achieve an asymptotically optimum logic gate depth of log2n + o(log 2n) and linear size O(n)

University of Waterloo's Institutional Repository

VLSI Circuits for Approximate Computing

Author: Esposito Darjn
Publication venue
Publication date: 08/04/2017
Field of study

Approximate Computing has recently emerged as a promising solution to enhance circuits performance by relaxing the requisite on exact calculations. Multimedia and Machine Learning constitute a typical example of error resilient, albeit compute-intensive, applications. In this dissertation, the design and optimization of approximate fundamental VLSI digital blocks is investigated. In chapter one the theoretical motivations of Approximate Computing, from the VLSI perspective, are discussed. In chapter two my research activity about approximate adders is reported. In this chapter approximate adders for both traditional non-error tolerant applications and error resilient applications are discussed. In chapter three precision-scalable units are investigated. Real-time precision scalability allows adapting the precision level of the unit with the precision requirements of the applications. In this context my research activities regarding approximate Multiply-and-Accumulate and memory units are described. In chapter four a precision-scalable approximate convolver for computer vision applications is discussed. This is composed of both the approximate Multiply-and-Accumulate and memory units, presented in the chapter three

Università degli Studi di Napoli Federico Il Open Archive

Design and development of mobile channel simulators using digital signal processing techniques

Author: Khawar Siddique Khokhar
Publication venue
Publication date: 01/01/2006
Field of study

A mobile channel simulator can be constructed either in the time domain using a tapped delay line filter or in the frequency domain using the time variant transfer function of the channel. Transfer function modelling has many advantages over impulse response modelling. Although the transfer function channel model has been envisaged by several researchers as an alternative to the commonly employed tapped delay line model, so far it has not been implemented. In this work, channel simulators for single carrier and multicarrier OFDM system based on time variant transfer function of the channel have been designed and implemented using DSP techniques in SIMULINK. For a single carrier system, the simulator was based on Bello's transfer function channel model. Bello speculated that about 10Βτ(_MAX) frequency domain branches might result in a very good approximation of the channel (where в is the signal bandwidth and τ(_MAX) is the maximum excess delay of the multi-path channel). The simulation results showed that 10Bτ(_MAX) branches gave close agreement with the tapped delay line model(where Be is the coherence bandwidth). This number is π times higher than the previously speculated 10Bτ(_MAX).For multicarrier OFDM system, the simulator was based on the physical (PHY) layer standard for IEEE 802.16-2004 Wireless Metropolitan Area Network (WirelessMAN) and employed measured channel transfer functions at the 2.5 GHz and 3.5 GHz bands in the simulations. The channel was implemented in the frequency domain by carrying out point wise multiplication of the spectrum of OFDM time The simulator was employed to study BER performance of rate 1/2 and rate 3/4 coded systems with QPSK and 16-QAM constellations under a variety of measured channel transfer functions. The performance over the frequency selective channel mainly depended upon the frequency domain fading and the channel coding rate

Durham e-Theses

OpenGrey Repository