The "chicken-egg" dilemma
Introduction
The major objective of VLSI interconnect synthesis has shifted from area minimization to performance optimization. Design convergence, in particular timing closure, has become one of the most critical concerns of any design methodology. Logic synthesis must be re-invoked, with no guarantee of improvement, if physical design cannot meet the timing assumptions made by logic synthesis. Interconnect timing optimization has therefore become increasingly critical in achieving timing closure.
Existing VLSI interconnect construction algorithms (Table 1) have various well-understood limitations in their fields of use. Traditional minimum-spanning tree (MST) (e.g., Prim) or Steiner minimum tree (SMT) (e.g., ER-Steiner [5]) interconnect topologies are no longer timing optimum in deep submicron (DSM) technologies due to nonnegligible interconnect resistance. Shortest path trees (SPT) and arborescences (e.g., A-Tree [7]), or shallow-light trees [3] (e.g., AHHK [2], BPRIM and BRBC [6]) are insensitive to electrical parameters (e.g., driver strength, sink load capacitance, required-arrival times) and thus give the same results over all technologies, pin loads, driver strengths, etc. Elmore-delay-based timing optimization heuristics (e.g., C-Tree [1], ERT, SERT, SERT-C [4], PER-Steiner [5] and Alphabetic tree [15]) do not guarantee timing performance: e.g., C-Tree solutions are limited to 5 empirical AHHK-over-SMT tree topologies, and PER-Steiner constructs shortest paths to pre-identified critical sinks before further improvements. Dynamic programming approaches (e.g., BA-Tree [14], MBA-Tree, RMP [8], P-Tree [13] and S-Tree [11]) can achieve optimum area or timing performance, and can extend to address such functionality as simultaneous buffering, use of buffer stations, and routing obstacle avoidance. However, dynamic programming approaches are only weakly scalable, e.g., to nets with up to 10 [8] or 15 [11] sinks even with pruning techniques that sacrifice optimality.
Table 1: Interconnect routing tree construction algorithms categorized by topologies and objectives/functionalities.
This work was partially supported by Cadence Design Systems, Inc., the MARCO Gigascale Silicon Research Center and NSF Grant CCR-9988331.
† Incentia Design Systems, Inc., Santa Clara, CA 95054.
In this paper, we propose a new iterative improvement approach for buffered interconnect timing optimization. We present our problem formulation in Section 2. Section 3 analyzes iterative interconnect timing optimization approaches, which we separate into Hanan grafting and non-Hanan sliding. A greedy iterative interconnect timing optimization algorithm, Q-Tree, is developed in Section 4. We present our experimental results in Section 5 and conclude in Section 6.
Notations and Problem Formulation
We adopt the following notations in this paper. A DSM VLSI interconnect can be represented as an RC tree with segment resistances and capacitances, a resistive driver, and capacitive loads. The presence of buffers B separates an interconnect into stages. Elmore delay [9] from the source s to a sink k is given by:
where path delay in a stage driven by b ∈ {s} ∪ B is given by:
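To make the notation concrete, the following is a minimal sketch of Elmore delay computation for a single unbuffered stage. It assumes the standard distributed-RC form, in which each wire segment contributes its resistance times half its own capacitance plus all downstream capacitance, and the driver resistance sees the total capacitance. The names `Node`, `subtree_cap`, `elmore_delays` and `source_rat`, the per-unit-length parameters `r` and `c`, and the use of "minimum over sinks of required-arrival time minus delay" as the source required-arrival time are assumptions of this sketch, not notation taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    wire_len: float = 0.0   # length of the wire from the parent to this node
    load_cap: float = 0.0   # sink load capacitance (0 for the source and Steiner nodes)
    children: list = field(default_factory=list)


def subtree_cap(node, c):
    """Load at `node` plus all wire and load capacitance downstream of it.

    Recomputed on every call for clarity; a production version would cache it.
    """
    return node.load_cap + sum(
        c * ch.wire_len + subtree_cap(ch, c) for ch in node.children
    )


def elmore_delays(root, r_drv, r, c):
    """Return {sink name: Elmore delay}; leaves of the tree are treated as sinks."""
    delays = {}

    def walk(node, delay_here):
        if not node.children:
            delays[node.name] = delay_here
        for ch in node.children:
            wire_r = r * ch.wire_len
            wire_c = c * ch.wire_len
            # Segment resistance sees half its own capacitance plus everything below it.
            walk(ch, delay_here + wire_r * (wire_c / 2 + subtree_cap(ch, c)))

    # The driver resistance sees the total capacitance of the stage.
    walk(root, r_drv * subtree_cap(root, c))
    return delays


def source_rat(delays, sink_rat):
    """Assumed definition: the tightest (required-arrival time - delay) over all sinks."""
    return min(sink_rat[k] - d for k, d in delays.items())
```

For example, a source driving a Steiner node `u` one unit away, with sinks `a` (one unit from `u`, load 1) and `b` (two units from `u`, load 2), can be built as `Node("s", children=[Node("u", 1, 0, [Node("a", 1, 1), Node("b", 2, 2)])])` and passed to `elmore_delays` together with a driver resistance and unit wire parameters.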
Analysis
In this section, we analyze two iterative interconnect timing optimization primitives called Hanan grafting and non-Hanan sliding. Our analyses reveal the limited contribution of non-Hanan sliding, and form the foundation of the Q-Tree algorithm in the next section.
Sliding
A timing optimum interconnect tree may not be an RSMT on the Hanan grid (footnote 2), since Elmore delay to a sink can be decreased by non-Hanan sliding [10, 15].
Definition 1 Non-Hanan sliding performs interconnect tree transformation by relocating a Steiner node towards its parent node, i.e., to a non-Hanan location with introduction of coincident interconnect segments.
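As a small illustration of what distinguishes a Hanan location from a non-Hanan one, the sketch below enumerates the Hanan grid of a terminal set (the intersections of the horizontal and vertical lines through every terminal) and tests whether a point lies on one of those intersections. The function names `hanan_grid` and `is_hanan` and the representation of terminals as `(x, y)` tuples are assumptions made for this example.

```python
def hanan_grid(terminals):
    """All intersections of horizontal and vertical lines through the terminals."""
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    return [(x, y) for x in xs for y in ys]


def is_hanan(point, terminals):
    """True if `point` is a Hanan grid intersection for this terminal set."""
    xs = {x for x, _ in terminals}
    ys = {y for _, y in terminals}
    return point[0] in xs and point[1] in ys
```

With terminals `[(0, 0), (2, 1), (1, 3)]`, the point `(2, 3)` is a Hanan location, whereas a Steiner node slid to `(1.5, 1)` is not.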
For a Steiner node u slid upstream by distance l to location u′ (Figure 1), the change of Elmore delay to sink k per unit extra wirelength (= sliding distance) is given by:
2 The Hanan grid is formed by passing horizontal and vertical lines through every terminal i ∈ {s} ∪ K.
Note that Elmore delay is a concave function of sliding distance l (Figure 2). The following observations hold.
Observation 3 Minimum Elmore delay to the sink k is achieved by sliding the Steiner node u to an extreme location (i.e., its current location or the location of its parent node), which is a Hanan location if the original routing is Hanan.
Observation 4 Maximum required-arrival time at source of a Hanan routing is achieved by sliding the Steiner node u to a Hanan location, unless the extra wirelength makes a different (off-path) sink critical.
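The concavity behind these observations can be checked numerically with a simplified model. The sketch below is an assumption-laden illustration, not the paper's expression: a Steiner node u sits at distance `L` from its parent, the path from the source to the parent has total resistance `r_up` (driver included), and the two branches below u are lumped into capacitances `cap_k` (toward the sink of interest) and `cap_j` (off-path). Sliding u upstream by `l` shortens the shared edge to `L - l` and adds one coincident segment of length `l` to each branch. All names and the two-branch lumped model are hypothetical.

```python
def delay_after_sliding(l, L, r, c, r_up, cap_k, cap_j):
    """Elmore delay from the source to the old location of u on the branch toward k.

    l      sliding distance, 0 <= l <= L
    r, c   wire resistance and capacitance per unit length
    """
    # Upstream resistance sees the shared edge, both coincident segments and both branches.
    upstream = r_up * (c * (L + l) + cap_k + cap_j)
    # Shortened shared edge: half its own capacitance plus everything below it.
    shared = r * (L - l) * (c * (L - l) / 2 + 2 * c * l + cap_k + cap_j)
    # New private segment toward k: no longer loaded by the off-path branch.
    private = r * l * (c * l / 2 + cap_k)
    return upstream + shared + private


def best_extreme(L, r, c, r_up, cap_k, cap_j):
    """Compare only the two extreme locations, as a concave delay function permits."""
    stay = delay_after_sliding(0.0, L, r, c, r_up, cap_k, cap_j)
    at_parent = delay_after_sliding(L, L, r, c, r_up, cap_k, cap_j)
    return (0.0, stay) if stay <= at_parent else (L, at_parent)
```

Expanding the expression gives a coefficient of -r·c on l², so under this model the delay is concave in l and its minimum over [0, L] lies at an endpoint. With `L=10, r=1, c=1, r_up=1, cap_k=1, cap_j=50`, the delay falls from 621 at l = 0 to 131 at l = 10, i.e., sliding all the way to the parent isolates the heavy off-path branch.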

Grafting
Complementary to non-Hanan sliding, which embeds the same tree topology beyond the Hanan grid, Hanan grafting embeds a different tree topology on the Hanan grid. Hanan grafting extends the basic edge replacement operation proposed in [5] by allowing possible buffer insertion.
Definition 2 Hanan grafting performs tree transformation by possibly buffered edge replacement on the Hanan grid (footnote 4).
Consider a grafting which replaces a tree edge (u, v), u = parent(v), by a non-tree edge (u′, v′). The change of Elmore delay d_E^gr(k) to sink k can be calculated for each of the following cases.
4 Depending on different scenarios, buffers can be inserted at the head of the edge, at the head of edge segments, at buffer stations, optimally, etc.
We observe that (i) the optimum sliding is Hanan when starting with a Hanan routing, unless the extra wirelength makes a different sink critical; and (ii) Hanan sliding is a special case of Hanan grafting, i.e., case (a) without buffer insertion. The following observation holds.
Observation 5 Without introduction of extra wirelength which makes a different sink critical, a Hanan grafting achieves the largest source required-arrival time
Observation 5 forms the basis of our proposed greedy iterative timing optimization algorithm, Q-Tree.
Greedy Iterative Optimization
We propose a greedy iterative timing optimization algorithm, Q-Tree, based on the above efficiency analysis. Q-Tree chooses the most efficient interconnect tree transformation, i.e., the one with the largest source required-arrival time increase per unit extra wirelength (Algorithm 1). Hanan grafting is preferred over non-Hanan sliding whenever possible due to Observation 5.
Observing that in many cases shorter wires imply smaller delay, we classify grafting into three categories depending on whether the wirelength is (i) decreased, (ii) unchanged or (iii) increased. 
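The selection rule described above can be sketched as a generic greedy loop. This is a hedged illustration rather than a reproduction of Algorithm 1: the `Move` record, the `enumerate_moves` callback that would generate candidate graftings (and slidings) for a tree, and in particular the tie-breaking policy of taking improving moves that add no wirelength (categories (i) and (ii)) before ranking wirelength-increasing moves by gain per unit extra wirelength are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class Move:
    rat_gain: float   # increase in source required-arrival time
    extra_wl: float   # change in wirelength (negative means shorter wires)
    result: Any       # the transformed tree (hypothetical representation)


def pick_move(moves: Iterable[Move]) -> Optional[Move]:
    """Choose the most efficient improving move, or None if nothing improves timing."""
    improving = [m for m in moves if m.rat_gain > 0]
    if not improving:
        return None
    # Categories (i) and (ii): wirelength decreased or unchanged, so the gain is free.
    free = [m for m in improving if m.extra_wl <= 0]
    if free:
        return max(free, key=lambda m: (m.rat_gain, -m.extra_wl))
    # Category (iii): rank by required-arrival time gain per unit extra wirelength.
    return max(improving, key=lambda m: m.rat_gain / m.extra_wl)


def greedy_optimize(tree: Any,
                    enumerate_moves: Callable[[Any], Iterable[Move]],
                    max_iters: int = 1000) -> Any:
    """Apply the best move repeatedly until no candidate improves timing."""
    for _ in range(max_iters):
        move = pick_move(list(enumerate_moves(tree)))
        if move is None:
            break
        tree = move.result
    return tree
```

Keeping move generation behind a callback is a design choice of the sketch: bounds on wirelength or buffer number can then be imposed by filtering the candidates before they reach `pick_move`.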
Experiments
In the following experiments we collect data over 5, 10, 15, 20, 50 and 100 terminals. For each terminal number, 100 sets of terminal locations are randomly generated in a 10 000 µm × 10 000 µm square, over which interconnects are constructed based on 180nm, 130nm, 100nm and 70nm ITRS parameters and the Berkeley Predictive Technology Model (BPTM) beta version [12] (Table 2). Results in terms of average source required-arrival time, wirelength, buffer number and runtime are presented with identical driver, buffer and sink sizes (R_s = R_b, C_b = C_t) and identical sink required-arrival times. The runtimes are measured on a 1.4GHz Intel Xeon i686 with 1GB memory, excluding the time for SMT construction by the ER-Steiner heuristic.
5 The optimum sliding distance l can be found by calculating the intersections of these quadratic or linear functions, which takes O(n^2) runtime. We adopt a binary search approach (Algorithm 3), which takes O(n log l) runtime with negligible suboptimality as observed in our experiments.
We first compare three (unbuffered) interconnect topology optimizations: Q-Tree starting with ER-Steiner, C-Tree and PER-Steiner (Table 3). We observe that significant interconnect timing performance improvement over Steiner minimum trees can be achieved by interconnect topology optimization at the expense of moderate wirelength increase. This improvement grows with technology advancement and instance size increase. Further, Q-Tree topology optimization starting with ER-Steiner topologies achieves better timing performance with longer wires than C-Tree and PER-Steiner topology optimizations. We observe indistinguishable timing performance improvement by Q-Tree topology optimization with or without non-Hanan sliding. Since introduction of buffer insertion would further decrease the contribution of sliding, i.e., as an alternative means of critical sink isolation, we have: Non-Hanan sliding contributes generally negligible timing performance improvement.
Table 3: Average source required-arrival time q_s, wirelength l_T (normalized to ER-Steiner [5]), and runtime of unbuffered (a) Q-Tree starting with ER-Steiner, (b) C-Tree and (c) PER-Steiner over 100 randomly generated terminal sets with identical sink capacitances and required-arrival times under 180nm, 130nm, 100nm and 70nm technology, respectively.
We apply Q-Tree to ER-Steiner, BA-Tree and P-Tree interconnects (footnote 7). For each interconnect, Q-Tree is first applied without any bound on wirelength or buffer number to achieve the best possible timing performance. Bounds on wirelength and buffer number are then applied to achieve a "dominant" Q-Tree solution (i.e., with shorter wires, fewer buffers and better timing performance).
7 C-Tree solutions on randomly generated instances are not available due to instability of C-Tree code. P-Tree results on 50-sink instances are not available due to weak scalability.
We observe that (Table 4):
1. Q-Tree starting with ER-Steiner on average achieves better timing performance than BA-Tree with longer wires and more buffers, and worse timing performance than P-Tree with shorter wires and fewer buffers.
2. Q-Tree starting with ER-Steiner can achieve dominant solutions over BA-Tree.
3. Q-Tree starting with BA-Tree can achieve better timing performance and, in particular, dominant solutions over BA-Tree.
4. Q-Tree starting with P-Tree can achieve better timing performance and, in particular, dominant solutions over P-Tree.
Conclusion
We propose iterative interconnect timing optimization as a solution to the "chicken-egg" dilemma between VLSI interconnect timing optimization and delay calculation. We separate iterative optimization approaches into Hanan grafting and non-Hanan sliding. Our analyses reveal the limited contribution of non-Hanan sliding to timing performance improvement. We also propose a greedy iterative interconnect timing optimization algorithm, Q-Tree. Our experimental results show that Q-Tree starting with ER-Steiner (resp. BA-Tree or P-Tree) can achieve better timing performance and, in particular, shorter wires and fewer buffers compared to BA-Tree (resp. BA-Tree or P-Tree). In general, Q-Tree can be applied to any interconnect tree for further timing performance improvement, with practical instance sizes and easily extended functionality, e.g., with buffer station and routing obstacle consideration. Q-Tree provides a simple and powerful VLSI interconnect timing optimization approach.
…, (e) Q-Tree starting with BA-Tree with l_T and |B| bounded by 1.0 × BA-Tree results (BA|Q ), (f) P-Tree (P), (g) Q-Tree starting with P-Tree (P|Q), (h) Q-Tree starting with P-Tree with l_T and |B| bounded by 1.1 × P-Tree results (P|Q ), over 100 randomly generated sets of terminals with identical sink capacitances and required-arrival times under 180nm, 130nm, 100nm and 70nm technology, respectively.
