Abstract: Bounding the load capacitance at gate outputs is a standard element in today's electrical correctness methodologies for high-speed digital VLSI design. Bounds on load caps improve coupling noise immunity, reduce degradation of signal transition edges, and reduce delay uncertainty due to coupling noise [6]. For clock and test distribution, an additional design requirement is bounding the buffer skew, i.e., the difference between the maximum and the minimum number of buffers over all source-to-sink paths in the routing tree, since buffer skew is one of the main factors affecting sink delay skew [10]. In this paper we consider algorithms for buffering a given tree with the minimum number of buffers under given load cap and buffer skew constraints. We show that the greedy algorithm proposed by Tellez and Sarrafzadeh [10] is suboptimal for non-zero buffer skew bounds and give examples showing that no bottom-up greedy algorithm can achieve optimality. The main contribution of the paper is an optimal dynamic programming algorithm for the problem. Experiments on test cases extracted from recent industrial designs show that the dynamic programming algorithm has practical running time and inserts up to 5-10% fewer buffers compared to the algorithm in [10].
I Introduction
For high-speed digital VLSI design, bounding the load capacitance at gate outputs is a standard element in today's electrical correctness methodologies. Bounds on load caps improve coupling noise immunity, reduce degradation of signal transition edges, and reduce delay uncertainty due to coupling noise [6].¹ According to [9], commercial EDA methodologies and tools for signal integrity rely heavily on upper-bounding the capacitive loads on driver and buffer outputs to prevent very long slew times on signal transitions. Essentially, the load capacitance bounds serve as proxies for bounds on input rise/fall times at buffers and sinks (Tellez and Sarrafzadeh [10] formally prove this equivalence using a simple linear model). We assume that such capacitive load bounds are inherent to any buffered routing tree design task. It is natural to propose a minimum-buffer formulation, so as to minimize changes made to the routing tree in meeting the load bounds.

This work was partially supported by Cadence Design Systems, Inc., the MARCO Gigascale Silicon Research Center and NSF Grant CCR-9988331.

¹ Such bounds also improve reliability with respect to hot-carrier oxide breakdown (hot electrons) [4, 5] and AC self-heating in interconnects [8], and facilitate technology migration since designs are more balanced.
Buffering to control slew times is also critical to early timing analysis. With lookup-table based modeling of gate delays and output transition times, very long input slews tend to be propagated inaccurately, resulting in extremely slow transitions. Static timing analyses that are based on the associated delay calculations will be utterly compromised, and useless for driving performance optimizations. Thus, early timing analysis must start with a buffering solution that bounds the capacitive loads of all buffers and of the source driver. Again, a minimum-buffer objective is appropriate.
Last, we observe that buffering of some large routing trees (e.g., for clock and test distribution) is further constrained with respect to the buffer skew, i.e., the difference between the maximum and the minimum number of buffers over all source-to-sink paths in the routing tree [10]. This is because buffer skew reflects the actual buffered clock tree skew after routing. To accurately estimate tradeoffs between alternative clock tree topologies in the early stages of clock distribution design, the key problem is to bound the number of buffers needed by a given tree to satisfy given constraints on both slew rate (input rise/fall times) and buffer skew. Good bounds (or, good constructions that minimize the number of buffers while controlling the buffer skew) will enable accurate estimation and tradeoff of such system resources as power and area.
From the above context and assumptions, we obtain the following problem formulation:
Bounded Skew Buffering Problem (BSBP): Given a clock net N, per-unit length wire capacitance, sink and buffer input capacitances, capacitive load bounds for buffers and for the tree source, and an upper bound on buffer skew, find a buffering of N that satisfies all bounds while using the minimum number of buffers.
The BSBP was first formulated by Tellez and Sarrafzadeh [10], who suggested a greedy algorithm with runtime $O(n + k)$, where n is the number of sinks in the net N and k is the number of inserted buffers. In this paper, we make the following contributions:
We give examples showing the sub-optimality of the Tellez-Sarrafzadeh algorithm for BSBP with non-zero skew bounds, and further prove that no bottom-up greedy algorithm can achieve optimality (Section III).
We give a non-trivial dynamic programming algorithm which guarantees optimum solutions for BSBP in $O(n(\Delta + 1)^3 N_B^2)$ time, where n, $\Delta$, and $N_B$ are the number of sinks, the given skew bound, and an upper bound on the optimum number of inserted buffers, respectively (Section IV).

We present experimental results on test cases extracted from recent industrial designs, showing that the dynamic programming algorithm has practical running time and inserts significantly fewer buffers compared to the algorithm in [10] (Section V).
II Notations and Problem Formulation
We start with a few definitions and notations. Let N be a net consisting of a source r and a set of sinks S.
A routing tree for the net N is a binary² tree $T = (r, V, E)$ rooted at r such that each sink of S is a leaf in T. A buffered routing tree for the net N is a tree $T' = (r, V, E, B)$ such that $T = (r, V, E)$ is a routing tree for N and B is a set of buffers located on the edges³ of T.
For any $b \in B \cup \{r\}$, the subtree driven by b, $D_b$ (also referred to as the stage of b [10]), is the maximal subtree of T which is rooted at b and has no internal buffers; a buffered routing tree $T' = (r, V, E, B)$ has $|B| + 1$ stages, including a source stage driven by the source.
Throughout the paper we will use the following notations:

$n = |S|$ = number of sinks

² In this paper we restrict ourselves to binary routing trees. Every routing tree can be made binary by duplicating nodes and inserting zero-length edges.

³ We assume that buffers have a single input and a single output and thus are inserted only on the edges of T.
$C_w$ = capacitance of a wire of unit length, which is as-

Tellez and Sarrafzadeh [10] also note that the buffer skew is a significant factor affecting sink delay skew. Other sources of sink delay skew, such as propagation delays, have been well studied; heuristics and approximation algorithms for constructing unbuffered trees with zero or bounded skew can be found, e.g., in [3, 12]. To guarantee bounded sink delay skew after buffering we need to ensure that the difference in the number of buffers of the longest and shortest path from the root r to the sinks is at most a given buffer skew bound $\Delta$, i.e.,

$$\delta(T) = l(T) - s(T) \le \Delta \qquad (3)$$

A buffering satisfying both the load constraint (2) and the buffer skew constraint (3) will be called feasible. In this paper we consider the problem of finding a feasible buffering with minimum number of buffers, formally defined as follows:
Bounded Skew Buffering Problem (BSBP)

Given: (1) net N with source r and set of sinks S, (2) binary routing tree $T = (r, V, E)$ for N, (3)

For every $v \in V$ the branch of v, denoted $br(v)$, is $T_v \cup \{(v, \mathrm{parent}(v))\}$, where $\mathrm{parent}(r) = r$. For each buffering X of a branch $br(v)$, we denote by $nb(X)$, $l(X)$, $s(X)$, $\mathrm{cap}(X)$, and $\delta(X)$ the total number of buffers, the number of buffers on the longest path, the number of buffers on the shortest path, the residual capacitance (i.e., the capacitance of the stage driven by $\mathrm{parent}(v)$), and the buffer skew in the branch $br(v)$, respectively. Also, if X is a buffering of a subtree containing vertex v, we denote by $X_v$ the buffering X restricted to the branch $br(v)$.
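The per-branch quantities defined above can be computed in one bottom-up pass. The following is a minimal sketch under simplifying assumptions that are not part of the paper: each edge carries a single lumped wire capacitance, and each edge holds at most one buffer, placed at its topmost position (right below the parent). The names `Node`, `branch_stats`, and `buffer_skew` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Vertex v of a binary routing tree together with its edge (v, parent(v))."""
    wire_cap: float = 0.0           # lumped capacitance of the edge to the parent (assumption)
    sink_cap: float = 0.0           # input capacitance, used only if v is a leaf (sink)
    buffered: bool = False          # assumption: at most one buffer, at the top of the edge
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def branch_stats(v: Node, c_b: float) -> Tuple[int, int, int, float]:
    """Return (nb, l, s, cap) for the branch br(v); c_b is the buffer input capacitance."""
    kids = [c for c in (v.left, v.right) if c is not None]
    if not kids:
        nb, longest, shortest, cap = 0, 0, 0, v.sink_cap
    else:
        stats = [branch_stats(c, c_b) for c in kids]
        nb = sum(t[0] for t in stats)        # total number of buffers
        longest = max(t[1] for t in stats)   # buffers on the longest path
        shortest = min(t[2] for t in stats)  # buffers on the shortest path
        cap = sum(t[3] for t in stats)       # residual capacitances add up at v
    cap += v.wire_cap
    if v.buffered:
        # A buffer at the top of the edge shields everything below it:
        # the parent's stage only sees the buffer input capacitance.
        nb, longest, shortest, cap = nb + 1, longest + 1, shortest + 1, c_b
    return nb, longest, shortest, cap


def buffer_skew(v: Node, c_b: float) -> int:
    """Buffer skew of br(v): buffers on the longest path minus buffers on the shortest path."""
    _, longest, shortest, _ = branch_stats(v, c_b)
    return longest - shortest
```

For example, a node with one buffered and one unbuffered sink branch has buffer skew 1, and its residual capacitance is the buffer input capacitance plus the wire and sink capacitance of the unbuffered branch.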
III Why Greedy Does Not Work
The BSBP has been previously studied by Tellez and Sarrafzadeh [10]. In [10], a greedy algorithm is first presented for minimum buffering without buffer skew constraints and then the algorithm is modified to handle such constraints. Below we describe the two algorithms for the case of binary trees; the description in [10] is given for arbitrary trees.
When there are no constraints on buffer skew, the algorithm in [10] starts with an empty buffering $X = \emptyset$ and then performs the following two steps for each node u, in bottom-up order:
1. packNode(u): while $\mathrm{cap}(X_v) + \mathrm{cap}(X_w) > C_U$, where v and w are the two children of u, add a buffer at the topmost position of the child branch with the largest residual capacitance (the greedy choice).
2. Perform packNode(u) excluding the child branches with maximum longest path, i.e., if $l(X_w) < l(X_v)$, then add a buffer at the topmost position in $br(w)$. Exit if $\mathrm{cap}(X_u) \le C_U$.
3. Insert buffers at the topmost position of all child branches with shortest path equal to $l(u) - \Delta$, in order to maintain buffer skew at most $\Delta$ when we insert buffers on the longest paths in the next step. Exit if the load constraint is satisfied.
4. Perform packNode(u) considering only child branches with maximum longest path, i.e., longest path equal to $s(u) + \Delta + 1$.
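The greedy choice in packNode can be illustrated in isolation. The sketch below is not the implementation from [10]; it assumes that the residual capacitances of the child branches are already known, that a buffer placed at the top of a branch replaces that branch's residual capacitance by the buffer input capacitance, and that each branch receives at most one such buffer per call. The function name `pack_node` and the parameter names `c_b` (buffer input capacitance) and `c_u` (load upper bound) are hypothetical.

```python
from typing import List, Sequence, Tuple


def pack_node(child_caps: Sequence[float], c_b: float, c_u: float) -> Tuple[List[float], List[int]]:
    """Greedy packNode step at one node u (illustrative sketch).

    child_caps holds the residual capacitances of the child branches of u.
    Returns the updated residual capacitances and the indices of the child
    branches that received a buffer at their topmost position, in order.
    """
    caps = list(child_caps)
    buffered: List[int] = []
    while sum(caps) > c_u:
        # Only branches not yet buffered in this call are candidates.
        candidates = [i for i in range(len(caps)) if i not in buffered]
        if not candidates:
            raise ValueError("load bound cannot be met by buffering child branches")
        # Greedy choice: buffer the branch with the largest residual capacitance.
        i = max(candidates, key=lambda j: caps[j])
        caps[i] = c_b  # the parent stage now sees only the buffer input
        buffered.append(i)
    return caps, buffered
```

With child capacitances 8 and 5, buffer input capacitance 3, and load bound 10, a single buffer on the first branch suffices; with 8 and 9, both branches are buffered, the larger one first.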
The modified greedy algorithm finds the optimum solution of any given tree when the skew bound is zero. However, contrary to the claim made in [10], the modified greedy algorithm may give suboptimal solutions for $\Delta \ge 1$. There are several reasons for its sub-optimality. One reason is that child branches with maximum longest path are considered for buffering after considering the other branches, regardless of their residual capacitance. This may cause the algorithm to return a suboptimal solution, e.g., when the skew bound is so large that the buffer skew constraint never becomes a constraint (in this case the optimum is found by always choosing the branch with the largest residual capacitance in packNode). Figure 1(b) shows the optimal solution, which has one less buffer. This instance points to a more basic reason for the sub-optimality of the modified greedy algorithm: the optimum buffering of a given tree may be suboptimal when restricted to subtrees.
A natural question prompted by the example in Figure 1 is whether or not there exists a bottom-up algorithm that computes a fixed number of solutions for each branch and still guarantees global optimality. Below, we give two series of examples showing that the answer to this question is negative. The minimum number of buffers for each of the two branches into a is $2^{d-2}$, since buffers are only required by the "u" leaves. If we start with minimum-number bufferings for both branches into a, we will have to insert a buffer right below a on one of them in order to meet the load constraints. This in turn triggers the insertion of a very large number of buffers upstream due to the skew constraint. The optimum overall solution is to insert buffers right above $2^{d-2}$ of the "v" leaves. This leads to buffering one of the branches into a with at least path have larger residual capacitance, and, depending on the upstream tree topology, each of them may be the only way to complete the optimal solution.
IV Dynamic Programming Algorithm
In this section we give a dynamic programming algorithm for the bounded skew buffering problem. The dynamic programming technique has been applied in the past to timing-driven buffer insertion (see, e.g., [1, 7, 11]), but its application to BSBP presents specific challenges.

Proof. The proof is by induction on the depth of u.
The claim is trivially true if u is a sink, i.e., a leaf of T.

V Experimental Results

Both our dynamic programming algorithm and the greedy algorithm of [10] have been implemented in C. Table I gives the results obtained by running the two algorithms on 5 testcases from [2]. In all experiments the initial tree was computed using the Greedy-DME algorithm [3]. The unit wire capacitance was $C_w = 0.177$ fF/μm, buffer input capacitance was $C_b = 37.5$ fF, and sink input capacitance varied between 2.04 fF and 200 fF.
The first observation is that, although slower than the greedy algorithm of [10], the dynamic programming algorithm has very practical runtime (all testcases finish in less than one second on a SUN Ultra 60 running SunOS 5.7). As expected, both algorithms find the optimum solution when a buffer skew bound of 0 is imposed. For non-zero skew bounds the dynamic programming algorithm is almost always strictly better than the greedy algorithm (the number of saved buffers is often in the 5-10% range).
Table I also shows that a significant reduction in the number of inserted buffers can be achieved with a small increase in buffer skew, e.g., when going from zero buffer skew to a buffer skew of 1. For comparison, we have also included in the table a lower bound on the number of buffers, which is the minimum number of buffers needed to meet the load cap constraints while disregarding buffer skew constraints. In all but one case, the lower bound is matched by the optimum buffering with $\Delta = 4$, and often it is matched with a buffer skew as small as 2.
VI Conclusions and Future Research
In this paper we have addressed the problem of finding the minimum-buffered routing of a given tree under buffer load and skew constraints. We have shown that a greedy algorithm previously proposed for this problem in [10] may fail to find the optimum solution, and we have proposed an exact dynamic programming algorithm. Experimental results on test cases extracted from recent industrial designs show that the dynamic programming algorithm has practical running time and inserts significantly fewer buffers compared to the greedy algorithm of [10].
Our future research will address (i) multi-constraint formulations, in which, e.g., input capacitance and fanout must be upper-bounded simultaneously, (ii) minimum inverter insertion in a given tree subject to sink polarity constraints, in addition to inverter load and skew constraints, and (iii) simultaneous tree construction and buffering under given buffer load and skew constraints.
