6 research outputs found

    Binary Adder Circuits of Asymptotically Minimum Depth, Linear Size, and Fan-Out Two

    Get PDF
    © Stephan Held and Sophie Spirkl | ACM 2018. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Algorithms, https://doi.org/10.1145/3147215.We consider the problem of constructing fast and small binary adder circuits. Among widely used adders, the Kogge-Stone adder is often considered the fastest, because it computes the carry bits for two n-bit numbers (where n is a power of two) with a depth of 2 log2n logic gates, size 4 nlog2n, and all fan-outs bounded by two. Fan-outs of more than two are disadvantageous in practice, because they lead to the insertion of repeaters for repowering the signal and additional depth in the physical implementation. However, the depth bound of the Kogge-Stone adder is off by a factor of two from the lower bound of log2n. Two separate constructions by Brent and Krapchenko achieve this lower bound asymptotically. Brent’s construction gives neither a bound on the fan-out nor the size, while Krapchenko’s adder has linear size, but can have up to linear fan-out. With a fan-out bound of two, neither construction achieves a depth of less than 2 log2n. In a further approach, Brent and Kung proposed an adder with linear size and fan-out two but twice the depth of the Kogge-Stone adder. These results are 33–43 years old and no substantial theoretical improvement for has been made since then. In this article, we integrate the individual advantages of all previous adder circuits into a new family of full adders, the first to improve on the depth bound of 2 log2n while maintaining a fan-out bound of two. Our adders achieve an asymptotically optimum logic gate depth of log2n + o(log 2n) and linear size O(n)

    Fast Repeater Tree Construction

    Get PDF
    Repeaters are used during physical design of chips to improve the electrical and timing properties of interconnections. They are added along Steiner trees that connect root gates to sinks, creating repeater trees. Their construction became a crucial part of chip design. We present a new algorithm to solve the repeater tree construction problem. We first present an extensive version of the Repeater Tree Problem. Our problem formulation encapsulates most of the constraints that have been studied so far. We also consider several aspects for the first time, for example, slew dependent required arrival times at repeater tree sinks. The employed technology, the properties of available repeaters and metal wires, the shape of the chip, the temperature, the voltages, and many other factors highly influence the results of repeater tree construction. To take all this into account, we extensively preprocess the environment to extract parameters for our algorithms. We first present an algorithm for Steiner tree creation and prove that our algorithm is able to create timing-efficient as well as cost-efficient trees. Our algorithm is based on a delay model that accurately describes the timing that one can achieve after repeater insertion upfront. Next, we deal with the problem of adding repeaters to a given Steiner tree. The predominantly used algorithms to solve this problem use dynamic programming. However, they have several drawbacks. Firstly, potential repeater positions along the Steiner tree have to be chosen upfront. Secondly, the algorithms strictly follow the given Steiner tree and miss optimization opportunities. Finally, dynamic programming causes high running times. We present our new buffer insertion algorithm, Fast Buffering, that overcomes these limitations. It is able to produce results with similar quality to a dynamic programming approach but a much better running time. In addition, we also present improvements to the dynamic programming approach that allows us to push the quality at the expense of a high running time. We have implemented our algorithms as part of the BonnTools physical design optimization suite developed at the Research Institute for Discrete Mathematics in cooperation with IBM. Our implementation deals with all tedious details of a grown real-world chip optimization environment. We have created extensive experimental results on challenging real-world test cases provided by our cooperation partner. Our algorithm can solve about 5.7 million instances per hour
    corecore