201 research outputs found

    Fast Prefix Adders for Non-uniform Input Arrival Times

    This is a post-peer-review, pre-copyedit version of an article published in Algorithmica. The final authenticated version is available online at: https://doi.org/10.1007/s00453-015-0067-x. We consider the problem of constructing fast and small parallel prefix adders for non-uniform input arrival times. In modern computer chips, adders with up to hundreds of inputs occur frequently, and they are often embedded into more complex circuits, e.g. multipliers, leading to instance-specific non-uniform input arrival times. Most previous results are based on representing binary carry-propagate adders as parallel prefix graphs, in which pairs of generate and propagate signals are combined using complex gates called prefix gates. Examples of commonly used adders are constructed based on the Kogge–Stone or Ladner–Fischer prefix graphs. Adders constructed in this model usually minimize the delay in terms of these prefix gates. However, the delay in terms of logic gates can be worse by a factor of two. In contrast, we aim to minimize the delay of the underlying logic circuit directly. We prove a lower bound on the delay of a carry bit computation achievable by any prefix carry bit circuit and develop an algorithm that computes a prefix carry bit circuit with optimum delay up to a small additive constant. Our algorithm improves the running time of a previous dynamic program for constructing a prefix carry bit from O(n^3) to O(n log^2 n) while simultaneously improving the delay and size guarantee, where n is the number of bits in the summands. Furthermore, we use this algorithm as a subroutine to compute a full adder in near-linear time, reducing the delay approximation factor of 2 from previous approaches to 1.441 for our algorithm.
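
    The prefix-graph model mentioned in this abstract can be illustrated with a minimal Python sketch of a Kogge–Stone style carry computation. This is not the algorithm of the paper: it assumes uniform input arrival times, and the names `prefix_op` and `kogge_stone_add` are hypothetical. Each call to `prefix_op` stands for one prefix gate, which costs an AND followed by an OR, so the logic-gate depth is up to twice the number of prefix levels.

```python
def prefix_op(left, right):
    """One prefix gate: combine two (generate, propagate) pairs.

    `left` covers the more significant bit range, `right` the adjacent
    less significant range. Hypothetical helper for illustration only.
    """
    g_hi, p_hi = left
    g_lo, p_lo = right
    # A carry leaves the combined range if the upper part generates one,
    # or propagates one generated by the lower part.
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def kogge_stone_add(a, b, n):
    """Add two n-bit integers; return (sum, number of prefix levels).

    Assumes all inputs arrive at the same time (the uniform case).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate bits
    pairs = list(zip(g, p))
    dist, levels = 1, 0
    while dist < n:
        # Every position combines with the position `dist` below it.
        pairs = [
            prefix_op(pairs[i], pairs[i - dist]) if i >= dist else pairs[i]
            for i in range(n)
        ]
        dist *= 2
        levels += 1
    # pairs[i][0] is now the carry out of bit position i.
    carries = [0] + [gp[0] for gp in pairs]
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    total |= carries[n] << n
    return total, levels
```

    For n = 8 the loop runs for distances 1, 2 and 4, i.e. three prefix levels, which matches the log2 n prefix-gate depth of the Kogge–Stone graph.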

    A Low-Area, Energy-Efficient 64-Bit Reconfigurable Carry Select Modified Tree-Based Adder for Media Signal Processing

    Multimedia systems play an essential part in our daily lives and have drastically improved the quality of life over time. Multimedia devices like cellphones, radios, televisions, and computers require low-area and low-power reconfigurable adders to process greedy computation algorithms for real-time audio/video signal and image processing, such as the discrete cosine transform, inverse discrete cosine transform, and fast Fourier transform. In this thesis, a novel 64-bit reconfigurable adder is proposed and implemented to reduce the area and power consumption. This adder can be reconfigured at run time to different word lengths, i.e., one 64-bit, two 32-bit, four 16-bit, or eight 8-bit additions, depending on the partition signal command. A Carry Select Modified Tree (CSMT) based adder is used in the reconfigurable adder to reduce the area by 22% and the power consumption by 47% when compared to the conventional design. The proposed adder, implemented in 180 nm CMOS technology at a 1.8-volt supply, has a worst-case delay of 20.67 nanoseconds with an overall area of 36,417 μm² and power consumption of 447.93 μW.
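
    The run-time partitioning described in this abstract can be modelled behaviourally in a few lines of Python. This is a sketch only: it does not model the CSMT structure, area, or power, and the function name `partitioned_add` and its `lane_bits` parameter are hypothetical stand-ins for the partition signal.

```python
def partitioned_add(a, b, lane_bits, total_bits=64):
    """Behavioural model of a reconfigurable adder.

    The `total_bits`-wide operands are split into independent lanes of
    `lane_bits` each; carries are not allowed to cross lane boundaries.
    `lane_bits` is a hypothetical stand-in for the partition signal.
    """
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("lane_bits must be 8, 16, 32 or 64")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, total_bits, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        # Drop the carry out of the lane so it cannot reach the next one.
        result |= (lane_sum & lane_mask) << shift
    return result
```

    With 8-bit lanes, `partitioned_add(0xFF, 0x01, 8)` wraps to 0 inside the lowest lane, whereas with 16-bit lanes the same operands give 0x100 because the carry stays within the wider lane.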

    Binary Adder Circuits of Asymptotically Minimum Depth, Linear Size, and Fan-Out Two

    © Stephan Held and Sophie Spirkl | ACM 2018. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Algorithms, https://doi.org/10.1145/3147215. We consider the problem of constructing fast and small binary adder circuits. Among widely used adders, the Kogge-Stone adder is often considered the fastest, because it computes the carry bits for two n-bit numbers (where n is a power of two) with a depth of 2 log_2 n logic gates, size 4n log_2 n, and all fan-outs bounded by two. Fan-outs of more than two are disadvantageous in practice, because they lead to the insertion of repeaters for repowering the signal and additional depth in the physical implementation. However, the depth bound of the Kogge-Stone adder is off by a factor of two from the lower bound of log_2 n. Two separate constructions by Brent and Krapchenko achieve this lower bound asymptotically. Brent’s construction gives neither a bound on the fan-out nor the size, while Krapchenko’s adder has linear size, but can have up to linear fan-out. With a fan-out bound of two, neither construction achieves a depth of less than 2 log_2 n. In a further approach, Brent and Kung proposed an adder with linear size and fan-out two but twice the depth of the Kogge-Stone adder. These results are 33–43 years old and no substantial theoretical improvement has been made since then. In this article, we integrate the individual advantages of all previous adder circuits into a new family of full adders, the first to improve on the depth bound of 2 log_2 n while maintaining a fan-out bound of two. Our adders achieve an asymptotically optimum logic gate depth of log_2 n + o(log_2 n) and linear size O(n).

    VLSI Circuits for Approximate Computing

    Approximate Computing has recently emerged as a promising solution to enhance circuit performance by relaxing the requirement for exact calculations. Multimedia and Machine Learning are typical examples of error-resilient, albeit compute-intensive, applications. In this dissertation, the design and optimization of approximate fundamental VLSI digital blocks is investigated. In chapter one, the theoretical motivations of Approximate Computing, from the VLSI perspective, are discussed. In chapter two, my research activity on approximate adders is reported. In this chapter, approximate adders for both traditional non-error-tolerant applications and error-resilient applications are discussed. In chapter three, precision-scalable units are investigated. Real-time precision scalability allows adapting the precision level of the unit to the precision requirements of the applications. In this context, my research activities regarding approximate Multiply-and-Accumulate and memory units are described. In chapter four, a precision-scalable approximate convolver for computer vision applications is discussed. It is composed of both the approximate Multiply-and-Accumulate and memory units presented in chapter three.

    Design and development of mobile channel simulators using digital signal processing techniques

    A mobile channel simulator can be constructed either in the time domain using a tapped delay line filter or in the frequency domain using the time-variant transfer function of the channel. Transfer function modelling has many advantages over impulse response modelling. Although the transfer function channel model has been envisaged by several researchers as an alternative to the commonly employed tapped delay line model, so far it has not been implemented. In this work, channel simulators for single carrier and multicarrier OFDM systems based on the time-variant transfer function of the channel have been designed and implemented using DSP techniques in SIMULINK. For a single carrier system, the simulator was based on Bello's transfer function channel model. Bello speculated that about 10Bτ_MAX frequency domain branches might result in a very good approximation of the channel (where B is the signal bandwidth and τ_MAX is the maximum excess delay of the multipath channel). The simulation results showed that 10Bτ_MAX branches gave close agreement with the tapped delay line model (where B_c is the coherence bandwidth). This number is π times higher than the previously speculated 10Bτ_MAX. For the multicarrier OFDM system, the simulator was based on the physical (PHY) layer standard for IEEE 802.16-2004 Wireless Metropolitan Area Network (WirelessMAN) and employed measured channel transfer functions at the 2.5 GHz and 3.5 GHz bands in the simulations. The channel was implemented in the frequency domain by carrying out pointwise multiplication of the spectrum of OFDM time The simulator was employed to study BER performance of rate 1/2 and rate 3/4 coded systems with QPSK and 16-QAM constellations under a variety of measured channel transfer functions. The performance over the frequency selective channel mainly depended upon the frequency domain fading and the channel coding rate.
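
    The equivalence between the two simulator styles named in this abstract can be sketched in Python for the simplest case. The sketch assumes a channel that is static over one block and a circular (cyclic-prefix style) convolution, which is a simplification of the time-variant model in the thesis; it is not the SIMULINK implementation, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (fine for short blocks)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def tapped_delay_line(x, taps):
    """Time-domain model: circular convolution of the block with the taps."""
    n = len(x)
    return [sum(h * x[(t - d) % n] for d, h in enumerate(taps)) for t in range(n)]


def frequency_domain_channel(x, taps):
    """Frequency-domain model: pointwise product of spectrum and transfer function.

    Assumes len(taps) <= len(x) and a channel that does not change
    within the block (an assumption made for this sketch).
    """
    n = len(x)
    transfer = dft(list(taps) + [0.0] * (n - len(taps)))
    spectrum = dft(x)
    return idft([h_k * x_k for h_k, x_k in zip(transfer, spectrum)])
```

    Under these assumptions both functions return the same block up to floating-point rounding, which is the property a frequency-domain simulator relies on.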