196 research outputs found

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Process-induced skew reduction in nominal zero-skew clock trees

    Full text link
    Abstract — This work develops an analytic framework for clock tree analysis considering process variations that is shown to correspond well with Monte Carlo results. The analysis frame-work is used in a new algorithm that constructs deterministic nominal zero-skew clock trees that have reduced sensitivity to process variation. The new algorithm uses a sampling approach to perform route embedding during a bottom-up merging phase, but does not select the best embedding until the top-down phase. This results in clock trees that exhibit a mean skew reduction of 32.4 % on average and a standard deviation reduction of 40.7 % as verified by Monte Carlo. The average increase in total clock tree capacitance is less than 0.02%. I

    Throughput-driven floorplanning with wire pipelining

    Get PDF
    The size of future high-performance SoC is such that the time-of-flight of wires connecting distant pins in the layout can be much higher than the clock period. In order to keep the frequency as high as possible, the wires may be pipelined. However, the insertion of flip-flops may alter the throughput of the system due to the presence of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design where long interconnects are pipelined by inserting the throughput in the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires

    High performance IC clock networks with grid and tree topologies

    Get PDF
    In this dissertation, an essential step in the integrated circuit (IC) physical design flow—the clock network design—is investigated. Clock network design entailsa series of computationally intensive, large-scale design and optimization tasks for the generation and distribution of the clock signal through different topologies. The lack or inefficacy of the automation for implementing high performance clock networks, especially for low-power, high speed and variation-aware implementations, is the main driver for this research. The synthesis and optimization methods for the two most commonly used clock topologies in IC design—the grid topology and the tree topology—are primarily investigated.The clock mesh network, which uses the grid topology, has very low skew variation at the cost of high power dissipation. Two novel clock mesh network designmethodologies are proposed in this dissertation in order to reduce the power dissipation. These are the first methods known in literature that combine clock meshsynthesis with incremental register placement and clock gating for power saving purposes. The application of the proposed automation methods on the emerging resonant rotary clocking technology, which also has the grid topology, is investigated in this dissertation as well.The clock tree topology has the advantage of lower power dissipation compared to other traditional clock topologies (e.g. clock mesh, clock spine, clock tree with cross links) at the cost of increased performance degradation due to on-chip variations. A novel clock tree buffer polarity assignment flow is proposed in this dissertation in order to reduce these effects of on-chip variations on the clock tree topology. The proposed polarity assignment flow is the first work that introduces post-silicon, dynamic reconfigurability for polarity assignment, enabling clock gating for low power operation of the variation-tolerant clock tree networks.Ph.D., Electrical Engineering -- Drexel University, 201

    Clock routing for high performance microprocessor designs.

    Get PDF
    Tian, Haitong.Chinese abstract is on unnumbered page.Thesis (M.Phil.)--Chinese University of Hong Kong, 2011.Includes bibliographical references (p. 65-74).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.iiiChapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivations --- p.1Chapter 1.2 --- Our Contributions --- p.2Chapter 1.3 --- Organization of the Thesis --- p.3Chapter 2 --- Background Study --- p.4Chapter 2.1 --- Traditional Clock Routing Problem --- p.4Chapter 2.2 --- Tree-Based Clock Routing Algorithms --- p.5Chapter 2.2.1 --- Clock Routing Using H-tree --- p.5Chapter 2.2.2 --- Method of Means and Medians(MMM) --- p.6Chapter 2.2.3 --- Geometric Matching Algorithm (GMA) --- p.8Chapter 2.2.4 --- Exact Zero-Skew Algorithm --- p.9Chapter 2.2.5 --- Deferred Merge Embedding (DME) --- p.10Chapter 2.2.6 --- Boundary Merging and Embedding (BME) Algorithm --- p.14Chapter 2.2.7 --- Planar Clock Routing Algorithm --- p.17Chapter 2.2.8 --- Useful-skew Tree Algorithm --- p.18Chapter 2.3 --- Non-Tree Clock Distribution Networks --- p.19Chapter 2.3.1 --- Grid (Mesh) Structure --- p.20Chapter 2.3.2 --- Spine Structure --- p.20Chapter 2.3.3 --- Hybrid Structure --- p.21Chapter 2.4 --- Post-grid Clock Routing Problem --- p.22Chapter 2.5 --- Limitations of the Previous Work --- p.24Chapter 3 --- Post-Grid Clock Routing Problem --- p.26Chapter 3.1 --- Introduction --- p.26Chapter 3.2 --- Problem Definition --- p.27Chapter 3.3 --- Our Approach --- p.30Chapter 3.3.1 --- Delay-driven Path Expansion Algorithm --- p.31Chapter 3.3.2 --- Pre-processing to Connect Critical ports --- p.34Chapter 3.3.3 --- Post-processing to Reduce Capacitance --- p.36Chapter 3.4 --- Experimental Results --- p.39Chapter 3.4.1 --- Experiment Setup --- p.39Chapter 3.4.2 --- Validations of the Delay and Slew Estimation --- p.39Chapter 3.4.3 --- Comparisons with the Tree Grow (TG) Approach --- p.41Chapter 3.4.4 --- Lowest Achievable Delays --- p.42Chapter 3.4.5 --- Simulation Results --- p.42Chapter 4 --- Non-tree Based Post-Grid Clock Routing Problem --- p.44Chapter 4.1 --- Introduction --- p.44Chapter 4.2 --- Handling Ports with Large Load Capacitances --- p.46Chapter 4.2.1 --- Problem Ports Identification --- p.47Chapter 4.2.2 --- Non-Tree Construction --- p.47Chapter 4.2.3 --- Wire Link Selection --- p.48Chapter 4.3 --- Path Expansion in Non-tree Algorithm --- p.51Chapter 4.4 --- Limitations of the Non-tree Algorithm --- p.51Chapter 4.5 --- Experimental Results --- p.51Chapter 4.5.1 --- Experiment Setup --- p.51Chapter 4.5.2 --- Validations of the Delay and Slew Estimation --- p.52Chapter 4.5.3 --- Lowest Achievable Delays --- p.53Chapter 4.5.4 --- Results on New Benchmarks --- p.53Chapter 4.5.5 --- Simulation Results --- p.55Chapter 5 --- Efficient Partitioning-based Extension --- p.57Chapter 5.1 --- Introduction --- p.57Chapter 5.2 --- Partition-based Extension --- p.58Chapter 5.3 --- Experimental Results --- p.61Chapter 5.3.1 --- Experiment Setup --- p.61Chapter 5.3.2 --- Running Time Improvement with Partitioning Technique --- p.61Chapter 6 --- Conclusion --- p.63Bibliography --- p.6

    Evaluating the reliability of NAND multiplexing with PRISM

    Get PDF
    Probabilistic-model checking is a formal verification technique for analyzing the reliability and performance of systems exhibiting stochastic behavior. In this paper, we demonstrate the applicability of this approach and, in particular, the probabilistic-model-checking tool PRISM to the evaluation of reliability and redundancy of defect-tolerant systems in the field of computer-aided design. We illustrate the technique with an example due to von Neumann, namely NAND multiplexing. We show how, having constructed a model of a defect-tolerant system incorporating probabilistic assumptions about its defects, it is straightforward to compute a range of reliability measures and investigate how they are affected by slight variations in the behavior of the system. This allows a designer to evaluate, for example, the tradeoff between redundancy and reliability in the design. We also highlight errors in analytically computed reliability bounds, recently published for the same case study

    Evaluating the reliability of NAND multiplexing with PRISM

    Get PDF
    Probabilistic-model checking is a formal verification technique for analyzing the reliability and performance of systems exhibiting stochastic behavior. In this paper, we demonstrate the applicability of this approach and, in particular, the probabilistic-model-checking tool PRISM to the evaluation of reliability and redundancy of defect-tolerant systems in the field of computer-aided design. We illustrate the technique with an example due to von Neumann, namely NAND multiplexing. We show how, having constructed a model of a defect-tolerant system incorporating probabilistic assumptions about its defects, it is straightforward to compute a range of reliability measures and investigate how they are affected by slight variations in the behavior of the system. This allows a designer to evaluate, for example, the tradeoff between redundancy and reliability in the design. We also highlight errors in analytically computed reliability bounds, recently published for the same case study

    Associative skew clock routing for difficult instances

    Get PDF
    This thesis studies the associative skew clock routing problem, which seeks a clock routing tree such that zero skew is preserved only within identified groups of sinks. Although the number of constraints is reduced, the problem becomes more difficult to solve due to the enlarged solution space. Perhaps, the only previous study used a very primitive delay model which could not handle difficult instances when sink groups are intermingled. We reuse existing techniques to solve this problem including difficult instances based on an improved delay model. Experimental results show that our algorithm can reduce the total clock routing wirelength by 9%Â15% compared to greedy-DME, which is one of the best zero skew routing algorithms

    Synthesis of Clock Trees with Useful Skew based on Sparse-Graph Algorithms

    Get PDF
    Computer-aided design (CAD) for very large scale integration (VLSI) involve

    Facility Location and Clock Tree Synthesis

    Get PDF
    The construction of clock trees and repeater trees are major challenges in chip design. Such trees distribute an electrical clock signal from a source to a set of sinks on a chip. On recent designs there can be millions of repeater trees with only a few up to some hundred sinks and several clock trees with up to some hundred thousand of sinks. In repeater trees the signal has to arrive at each sink not later than an individual required arrival time, while in clock trees it has to arrive at each sink within an individual required arrival time window. In this thesis, we present new theory and algorithms for the construction of clock trees and repeater trees and an essential sub-problem, the Sink Clustering Problem. We also describe our clock tree construction tool BonnClock, which has been used by IBM Microelectronics for the design of hundreds of most complex chips. First, we introduce the Sink Clustering Problem, the main sub-problem of clock tree design. Given a metric space (V,c), a finite set D of terminals with positions p(v) ∈ V and demands d(v) ∈ R ≥ 0 for all v ∈ D, a facility opening cost f ∈ R>0 and a load limit u ∈ R>0 , the task is to find a partition D=D1 ∪ ... ∪ Dk of D and, for all 1 ≤ i ≤ k, a Steiner tree Si for {p(v)| v ∈ Di }. Each cluster (Di ,Si ), 1 ≤ i ≤ k, has to keep the load limit, that means ∑e ∈ E(Si) c(e) +∑s ∈ Di d(s) ≤ u. The goal is to minimize the weighted sum of the length of all Steiner trees plus the number of clusters, i.e. minimize ∑i=1,...,k (∑e ∈ E(Si ) c(e)) +kf. We present the first constant-factor approximation algorithm for the Sink Clustering Problem. It is based on decomposing a minimum spanning tree on the sinks and has an approximation guarantee of 1+2α, where α is the Steiner ratio of the underlying metric. Moreover, we introduce two variants of the algorithm that rely on decomposing an approximate minimum Steiner tree and an approximate minimum traveling salesman tour. These algorithms have approximation guarantees of 3β and 3γ, respectively, where β and γ are the approximation guarantees of the Steiner tree and TSP approximation algorithms, respectively. We also propose two post-optimization algorithms that can further improve an existing clustering. We analyze the structure of the Sink Clustering Problem and exhibit its connections to matroid theory. In particular, we use the property of matroids that for any two bases B1 , B2 there is a bijection p : B1 → B2 so that (B1 \ {b}) ∪ {p(b)} is again a basis for each b ∈ B1. We replace each Steiner tree of an optimum solution by a minimum spanning tree and connect all trees to a new artificial vertex s and get a tree S. In a modified metric the total length of S is a good lower bound for the cost of an optimum solution. Due to the matroid property we can compare a minimum spanning tree T on D ∪ {s} with S; the length of any edge of T is bounded by the length of an edge of S. We introduce the concept of K-dominated functions that helps us to increase the `cost' of certain edges of T while still having the property that the total length of all edges of T ending in a vertex of K ⊆ D is bounded by the total length of all edges of S ending in a vertex of K. Applying this procedure to the sets of a laminar family on D yields an improved lower bound. The bound can be further improved by combining it with a lower bound for the length of a minimum Steiner tree on D. For this bound we prove the following lemma: For any family of trees T = {T1 ,..., Tk } with V(Ti ) ⊂ D, 1 ≤ i ≤ k, with the property that for any subset T' ⊆ T the trees in T' cover at least | T' |+1 vertices, there exists an edge ei ∈ E(Ti ) for i=1,..., k such that these edges E={ei | 1 ≤ i ≤ k} form a forest, i.e. the set does not contain an edge twice and it does not contain a circuit. Our experimental results on real-world instances from clock tree design show that the cost of the solutions computed by our algorithms is in average only 10% over the best lower bound. Moreover, we compare our algorithm to another clustering algorithm used in industry. The results show that the total cost of our solutions is 10% less than the cost of the solutions computed by the competitive tool. Clock trees have to satisfy several timing constraints. More precisely, the signal has to reach each sink within an individual required arrival time window. Sinks can only be clustered together if their required arrival time windows have a point of time in common. Typically, all required arrival time windows are the same. In this case we have the Sink Clustering Problem defined above. However, there are clock trees where the sinks have different required arrival time windows. This motivates a generalization of the Sink Clustering Problem where each sink additionally has an individual time window. As further constraint the time windows of the sinks of a cluster must have at least one point of time in common. We study the Sink Clustering Problem with Time Windows and present a polynomial O(log s)-approximation algorithm for this problem, where s is the size of a minimum clique partition in the interval graph induced by the time windows. Our algorithm is based on a divide and conquer approach and uses the approximation algorithms for the Sink Clustering Problem on sub-sets of the instance. We show that the approximation guarantee of the algorithm is tight. For the practical construction of clock trees we present our algorithm BonnClock. BonnClock builds a clock tree combining a bottom-up clustering and a top-down partitioning strategy. In the bottom-up phase BonnClock is using the Sink Clustering Algorithm in order to determine the drivers of unconnected sinks or inverters. The `global' topology of the tree is determined by the top-down partitioning considering big blockages and timing restrictions. BonnClock uses a dynamic program in order to determine the sizes of the inverters that are inserted. All components of the algorithm are discussed in detail. As part of this thesis, we have also implemented this algorithm. BonnClock has become the standard tool to construct clock trees within IBM. We show experimental results with comparisons to another industrial clock tree construction tool and to lower bounds for the power consumption. It turns out that - mainly due to the Sink Clustering Algorithm - our power consumption is much smaller than with the other tool and only one third over the lower bound. Finally, we consider the repeater tree construction problem. In contrast to clock trees, each sink has a latest required arrival time instead of a time window. We describe a simple algorithm to build such trees where we insert the sinks one by one into an existing tree. Depending on the optimization goal we show a variant of the algorithm computing trees of almost optimal length or trees with guaranteed best possible performance. Moreover, we analyze the topology of trees with best or almost best performance more closely. Such trees are equivalent to minimax and almost minimax trees: Let a1 , ... , an ∈ N ≥ 0 be a set of numbers. The weight of a tree with n leaves is the maximum over all leaves i of the depth of leaf i plus ai. For a non-negative integral constant c the goal is to build a binary tree with weight at most the optimum weight plus c. This problem can be solved optimally by a greedy algorithm. However, we are interested in the online version of this problem where we have to insert the leaf i with weight ai into the tree without knowing n and the following weights aj, j> i. We give necessary and sufficient conditions for an online algorithm to compute trees of weight at most the optimum weight plus c. Moreover, we show how these conditions can be verified efficiently. We obtain an online algorithm that computes an optimum tree in O(nlog n) time. Finally, we study a further mathematical model of repeater trees that considers that additional delay caused by a bifurcation of a tree can be distributed partially to the two branches. For c∈ R>0 and a set L ⊆ {(l1 ,l2 ) ∈ R2 ≥ 0 | l1 +l2 = c} of two-element sets of non-negative real numbers we consider rooted binary trees with the property that the two edges emanating from every non-leaf are assigned lengths l1 and l2 with { l1 ,l2 } ? L. We study the asymptotic growth of the maximum number of leaves of bounded depths in such trees and the existence of such trees with leaves at individually specified maximum depths. Our results yield better lower bounds for repeater trees
    corecore