Introduction
Interconnect optimization for delay minimization has drawn much attention recently. Previous work falls into two categories. One is topology optimization, such as the construction of bounded-radius bounded-cost trees [2], A-trees [6], and low-delay trees [1]. The other is wiresizing optimization, which was first introduced in [6, 7] to minimize a weighted average interconnect delay, then extended with a sensitivity-based heuristic in [11] to minimize the maximum interconnect delay. Moreover, both delay and power dissipation were optimized in [4] by simultaneous driver and interconnect sizing, and the circuit-level critical path delay (rather than the Elmore delay used in other work) was reduced in [10] by simultaneous gate and interconnect sizing.
All these methods assume that there is a unique source in each interconnect tree and minimize the delay between the source and a set of critical sinks. Thus, they are only applicable to single-source interconnect trees (SSITs). However, there exist many interconnect trees with multiple potential sources, each driving the interconnect tree at a different time. None of the existing methods considers such multi-source interconnect trees (MSITs), except a very recent work by Cong and Madden [5], where an MSIT topology optimization method based on the construction of min-cost min-diameter A-trees was developed.
In this paper, we study the optimal wiresizing problem for MSITs under the distributed Elmore delay model, and the optimal wiresizing problem using a variable grid rather than the fixed grid used in previous work. The remainder of this paper is organized as follows. In Section 2, we present the formulation of the MSIT wiresizing problem. In Sections 3 and 4, we study the properties of the optimal wiresizing solutions for MSIT designs under a fixed grid and a variable grid, respectively. These properties lead to the efficient algorithms given in Section 5. Section 6 shows experimental results. Section 7 concludes the paper with a discussion of future work. Proofs and more detailed experimental results can be found in [3]. The reader is strongly recommended to be familiar with the results in [7, 4].
$$\cdots + \sum_{E \in MSIT} G(E) \cdot \frac{1}{w_E} + \sum_{E \in MSIT} H(E) \cdot w_E + K_5$$
penalty weight $\lambda_{ij}$ to indicate its priority. When there is only one source in an interconnect tree, the MSWS problem becomes the single-source wiresizing (SSWS) problem studied in [7, 4]. Also, a slightly more general wiresizing problem, the multi-source wiresizing problem with a variable grid (MSWS/G), will be formulated in Section 4.
Weighted Delay Formulation
As given in [3], the Elmore delay $t_{ij}$ between source $N_i$ and sink $N_j$ is:
Although this weighted delay formulation for multiple sources and sinks is very similar to that for a single source and multiple sinks in [4], the coefficient functions F, G, and H have very different properties. These lead to much higher complexity and very different properties for the MSWS problem compared with the SSWS problem, and will be discussed in Section 3.
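To make the weighted delay objective concrete, the following minimal sketch computes Elmore delays in an RC tree driven from an arbitrary pin and combines them with penalty weights. It is an illustration under stated assumptions, not the formulation of [3]: each wire of length l and width w is assumed to have resistance r0 * l / w and capacitance c0 * l * w (fringing capacitance is ignored), every pin contributes a lumped load, and the names `elmore_delays`, `weighted_delay`, `r_drv`, `r0`, and `c0` are hypothetical.

```python
from collections import defaultdict

def elmore_delays(edges, load, source, r_drv, r0=1.0, c0=1.0):
    """Elmore delay from `source` to every node of an RC tree.

    edges: list of (u, v, length, width); load: node -> pin capacitance.
    Assumed wire model: R = r0 * length / width, C = c0 * length * width.
    """
    adj = defaultdict(list)
    for u, v, length, width in edges:
        res = r0 * length / width
        cap = c0 * length * width
        adj[u].append((v, res, cap))
        adj[v].append((u, res, cap))

    # Breadth-first traversal: re-root the tree at the driving source.
    parent = {source: None}
    order = [source]
    for u in order:  # the list grows while we iterate over it
        for v, res, cap in adj[u]:
            if v not in parent:
                parent[v] = (u, res, cap)
                order.append(v)

    # Capacitance seen downstream of each node, accumulated leaves first.
    down = {n: load.get(n, 0.0) for n in order}
    for v in reversed(order[1:]):
        u, res, cap = parent[v]
        down[u] += down[v] + cap

    # The driver charges everything; each wire charges half of its own
    # capacitance plus everything beyond it.
    delay = {source: r_drv * down[source]}
    for v in order[1:]:
        u, res, cap = parent[v]
        delay[v] = delay[u] + res * (cap / 2.0 + down[v])
    return delay

def weighted_delay(edges, load, r_drv, weights):
    """Sum of weights[(i, j)] * t_ij over the listed source-sink pairs."""
    cache = {}
    total = 0.0
    for (i, j), lam in weights.items():
        if i not in cache:
            cache[i] = elmore_delays(edges, load, i, r_drv[i])
        total += lam * cache[i][j]
    return total

# Two pins that can each drive the net, joined by one wire.
net = [("a", "b", 2.0, 1.0)]
pins = {"a": 1.0, "b": 1.0}
objective = weighted_delay(net, pins, {"a": 1.0, "b": 1.0},
                           {("a", "b"): 1.0, ("b", "a"): 0.5})
```

Because the tree is re-rooted for every source, the same edge can carry signal in opposite directions for different sources, which is the reason the coefficients of a given edge differ between the single-source and multi-source cases.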
3 Properties of Optimal MSWS Solutions
Review of SSWS Properties
When there is only one source in the routing tree, each edge has a unique signal flow direction. We can define the "ancestors" and "descendants" with respect to the signal flow in the tree. The following properties of optimal SSWS solutions were given in [7] .
A. Separability
Given the wire width assignment of a path P originating from the source in an SSIT, the optimal wire width assignment for each subtree branching off from P can be carried out independently.
B. Monotone Property

The presence of multiple sources greatly complicates the wiresizing problem. For example, with multiple sources, even a monotone wiresizing is not well defined. Nevertheless, our research has revealed a number of interesting properties of the optimal wiresizing solutions for an MSIT, some of which generalize the results on the SSWS problem, while others are unique to the MSWS problem.
Decomposition of an MSIT
In order to reduce the complexity of the MSWS problem, we decompose an MSIT into a source subtree (SST) and a set of loading subtrees (LSTs) (see Figure 1). The SST is the subtree spanned by all source nodes in the MSIT. After we remove the SST from the MSIT, the remaining segments form a set of subtrees, each of which is called an LST. When every pin of an MSIT can be a source at different times, the SST is the entire MSIT and there is no LST.
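The decomposition can be sketched in a few lines. The code below is an illustrative sketch rather than the authors' implementation: it obtains the SST by repeatedly pruning leaves that are not sources, and it assumes that every branch leaving an SST node forms its own LST. The function name `decompose_msit` and the edge-list representation are hypothetical.

```python
from collections import defaultdict

def decompose_msit(edges, sources):
    """Split the edges of a tree into the SST and a list of LSTs.

    edges: list of (u, v) pairs forming a tree; sources: non-empty
    iterable of source nodes. Returns (sst_edges, lsts).
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    sources = set(sources)

    # Prune non-source leaves until none is left; the surviving nodes
    # form the minimal subtree spanning all sources.
    alive = {n: set(nbrs) for n, nbrs in adj.items()}
    stack = [n for n in alive if len(alive[n]) == 1 and n not in sources]
    while stack:
        leaf = stack.pop()
        if len(alive.get(leaf, ())) != 1:
            continue  # already removed, or no longer a leaf
        (nbr,) = alive.pop(leaf)
        alive[nbr].discard(leaf)
        if len(alive[nbr]) == 1 and nbr not in sources:
            stack.append(nbr)

    sst_edges = [(u, v) for u, v in edges if u in alive and v in alive]

    # Each branch hanging off an SST node is collected as one LST,
    # with edges oriented away from the SST.
    lsts = []
    for root in alive:
        for child in adj[root]:
            if child in alive:
                continue
            lst, todo, seen = [(root, child)], [child], {root, child}
            while todo:
                u = todo.pop()
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        lst.append((u, v))
                        todo.append(v)
            lsts.append(lst)
    return sst_edges, lsts

# Pins 1 and 4 are sources; pins 5 and 7 are sinks only.
tree = [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6), (6, 7)]
sst, lsts = decompose_msit(tree, sources=[1, 4])
```

In this example the SST is the path 1-2-3-4 and two LSTs branch off it, one at node 2 and one at node 3. When every pin is a source, nothing is pruned and the list of LSTs is empty.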
Parallel to the ancestor-descendant relation in an SSIT, the left-right relation is introduced in an MSIT. We choose an arbitrary source as the leftmost node Lsrc. The direction of the signal (current) flowing out from Lsrc is the right direction along each edge E. Under these definitions, the signal in any LST always flows rightward, but the signal may flow either leftward or rightward in an edge in the SST.
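The left-right relation amounts to orienting every edge away from Lsrc and then comparing that orientation with the direction in which a given source drives the edge. The following is a small sketch under that reading; the helper names `orient_from` and `flow_directions` are hypothetical.

```python
from collections import defaultdict, deque

def orient_from(edges, root):
    """Return the tree edges as (near, far) pairs oriented away from root."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    oriented, seen, queue = [], {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                oriented.append((u, v))
                queue.append(v)
    return oriented

def flow_directions(edges, lsrc, source):
    """Map each (left, right) edge to 'right' or 'left' when `source` drives."""
    rightward = set(orient_from(edges, lsrc))
    result = {}
    for u, v in orient_from(edges, source):
        if (u, v) in rightward:
            result[(u, v)] = "right"   # signal travels left end -> right end
        else:
            result[(v, u)] = "left"    # signal travels right end -> left end
    return result

# Path 1-2-3-4 is the SST (sources 1 and 4); edge 2-5 belongs to an LST.
tree = [(1, 2), (2, 3), (3, 4), (2, 5)]
when_4_drives = flow_directions(tree, lsrc=1, source=4)
```

With node 1 chosen as Lsrc and node 4 driving, the three SST edges carry signal leftward while the LST edge (2, 5) still carries it rightward, matching the statement above.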
Note that the SST defined in this paper is different from that defined in [7], where SST is used to denote a single-stem tree.
Figure 1: An MSIT can be decomposed into a source subtree (SST) and a set of subtrees (three LSTs here) branching off from the SST.

Of course, the local monotone property holds for segments in LSTs, where $F_l(S)$ is always greater than $F_r(S)$ (in fact, $F_r(S) = 0$) and the edge widths always decrease rightward, just as given by the LST monotone property.
Properties of
D. Dominance Property
Theorem 4. With respect to the definitions of the local refinement operation and the dominance relation in Section 3.1, the dominance property holds for the MSWS problem.
Although the dominance property was proven based on the ancestor-descendant relation in [7] for the SSWS problem, we show that it is a general property that depends neither on the ancestor-descendant relation nor on the left-right relation.
Theorem 4 enables efficient computation of lower and upper bounds of the optimal wiresizing solution by the GWSA algorithm in [7], which uses the local refinement operation iteratively to tighten the lower bound or the upper bound for one edge at a time. A much more powerful operation, the bundled refinement operation, which may tighten the lower bound or the upper bound for a number of edges in a single operation, will be introduced in the next section.
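The iterative use of local refinement can be sketched as follows. This is an illustration of the general scheme, not the GWSA algorithm of [7]: the cost function is a toy stand-in chosen by us, a local refinement is taken to mean re-optimizing one edge width with all others held fixed, and the results are valid bounds only when the cost function has the dominance property. The names `local_refinement`, `refine_bound`, and `toy_cost` are hypothetical.

```python
def local_refinement(cost, widths, choices, e):
    """Best width for edge e with all other widths held fixed."""
    trial = list(widths)

    def cost_with(w):
        trial[e] = w
        return cost(trial)

    return min(choices, key=cost_with)

def refine_bound(cost, n_edges, choices, lower=True):
    """Sweep over the edges until no local refinement changes a width.

    lower=True starts from the minimum width and only ever widens an
    edge; lower=False starts from the maximum width and only narrows.
    """
    start = min(choices) if lower else max(choices)
    widths = [start] * n_edges
    changed = True
    while changed:
        changed = False
        for e in range(n_edges):
            best = local_refinement(cost, widths, choices, e)
            new = max(widths[e], best) if lower else min(widths[e], best)
            if new != widths[e]:
                widths[e] = new
                changed = True
    return widths

def toy_cost(w):
    # Assumed two-edge objective with 1/w, w, and width-ratio terms,
    # loosely imitating the shape of a weighted Elmore delay.
    return 4 / w[0] + w[0] + 10 / w[1] + w[1] + w[1] / w[0]

choices = [1, 2, 3, 4]
lower_bound = refine_bound(toy_cost, 2, choices, lower=True)
upper_bound = refine_bound(toy_cost, 2, choices, lower=False)
```

For this toy objective the two sweeps meet at widths [3, 3], which is also the minimum found by exhaustive search over the 16 assignments, so no further enumeration would be needed.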
Properties of Optimal MSWS/G Solutions
Up to now, both the MSWS problem defined in this paper and the SSWS problem defined in all previous work [7, 4, 11, 10] have been investigated and solved only on a fixed grid. The grid controls how often the wire widths are allowed to change. However, it is difficult to choose a proper grid structure. For the best accuracy, a very fine, uniform grid is usually chosen, which results in very high memory usage and computation time due to the large number of edges. We now investigate methods to obtain the optimal wiresizing results using a non-uniform and coarser grid.
A novel contribution of our work is to introduce a variable-grid formulation for the MSWS problem. The grid may be finer in some regions but coarser in others. Moreover, we begin with a coarser grid and then proceed to a finer one. Theorem 5, to be presented in Section 4.2, justifies this strategy and leads to much more efficient algorithms with the same accuracy as previous work.
All properties in this section hold for both the MSWS problem and the SSWS problem, but we shall concentrate on the MSWS problem because the SSWS problem can be treated as a special case.
Grid Refinement and Bundled Edges
Given an MSIT, let $G_0$ be the grid with each segment in the MSIT as an edge, and $G_F$ the uniform grid with the finest grid unit $\delta$ everywhere ($\delta$ is determined by the design rules
Corollary of Theorem 3. Each segment in an MSIT has at most $r$ bundled edges, where $r$ is the number of possible wire width choices.
We shall compute the optimal width for each bundled edge directly, instead of treating it as a sequence of edges of length $\delta$ under the grid $G_F$.
Bundled Refinement Operations
Let $W$ be a wiresizing solution which dominates the optimal solution $W^*$, and let $E$ be an edge under the current grid $G$ and in segment $S$. Without loss of generality, we assume $F_l(S) \ge F_r(S)$ and treat $E$ as two edges: $E_l$, the left end of $E$ with length $\delta$, and the remaining part of $E$ (recall that $\delta$ is the grid unit in the finest grid $G_F$).
Optimal Wiresizing Algorithm
Given an MSIT, we first compute a lower bound and an upper bound of the optimal wiresizing solution. If the lower bound and the upper bound meet, which is very likely in practice, we get the optimal wiresizing solution immediately. Otherwise, a bounded enumeration technique combined with a dynamic programming technique is carried out between the lower and upper bounds where they do not meet.
OWBR Algorithm
Compared with the GWSA algorithm [7, 4]

Starting with the coarsest grid $G_0$, we perform BRU and BRL iteratively through an MSIT as follows. We first assign the minimum width to all edges (in this case, an edge is a segment in the MSIT), then traverse the MSIT and perform the BRL operation on every edge. This process is repeated until no improvement is achieved on any edge in the last round of traversal. The result is a lower bound of the optimal wiresizing solution. Similarly, we assign the maximum width to all edges and perform BRU operations to obtain an upper bound of the optimal wiresizing solution. This is the first pass of OWBR.
After each pass, we check the lower and upper bounds. If there is a gap between the lower bound and the upper bound for an edge E (called an unconvergent edge) and its length is still larger than $\delta$, we divide E into two edges of almost equal lengths (they may differ by $\delta$ in order to maintain a valid grid), each of which inherits the old lower and upper bounds. After the refinement of all unconvergent edges, another pass of lower/upper bound refinements is carried out on all unconvergent edges in the refined grid.
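The grid refinement step between passes can be sketched as below. This is a minimal illustration, assuming each edge is stored as a tuple (length in units of the finest grid unit, lower bound, upper bound); the function name `refine_grid` is hypothetical, and the bound-tightening passes that would run between refinements are omitted.

```python
def refine_grid(edges):
    """Split every unconvergent edge longer than one grid unit in two.

    edges: list of (length, lower, upper) with length measured in units
    of the finest grid unit. The two halves differ in length by at most
    one unit and inherit the bounds of the edge they came from.
    """
    refined = []
    for length, lower, upper in edges:
        if lower < upper and length > 1:
            left = (length + 1) // 2
            refined.append((left, lower, upper))
            refined.append((length - left, lower, upper))
        else:
            refined.append((length, lower, upper))
    return refined

# One unconvergent edge of 7 units, one convergent edge, and one
# unconvergent edge that is already at the finest grid unit.
grid = [(7, 1, 3), (4, 2, 2), (1, 1, 2)]
finer = refine_grid(grid)

# Worst case: an edge whose bounds never meet is halved until it
# reaches the finest grid unit.
edges, passes = [(8, 1, 2)], 0
while any(length > 1 for length, _, _ in edges):
    edges = refine_grid(edges)
    passes += 1
```

Only the first edge of `grid` is split, into pieces of 4 and 3 units. In the worst-case loop an edge of 8 units reaches unit length after 3 refinements, so the number of passes grows with the logarithm of the segment length rather than with the length itself.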
This process is repeated until either the lower and upper bounds are identical for all edges under the current grid, or each unconvergent edge is of length $\delta$. Here, $r$ is the number of wire-width choices, $m$ is the number of segments in the MSIT, $n$ is the total wire length of the MSIT, and $n_0$ is the wire length of the longest segment in the MSIT (both in terms of $\delta$).
Note that $m \le 2(k-1)$, where $k$ is the number of pins.
Since $k$ is a constant bounded by the size of the largest multi-source net (normally no larger than 32), $m$ is also a constant. In addition, $\ln n_0 \ll n$, so OWBR is much more efficient than GWSA. This has been further confirmed by experimental results.
Bounded Enumeration
Because the separability and the monotone property hold in LSTs, the optimal single-source wiresizing algorithm, named OWSA [7], can be used for LSTs. The OWSA algorithm is a dynamic programming method to compute an optimal wiresizing solution between the lower and upper bounds.

Footnote: There may exist more than one tight upper (or lower) bound of an optimal solution $W^*$.
Without separability in the SST, the wire width assignments for unconvergent edges in the SST need to be enumerated subject to the local monotone property. Our experiments show that OWBR gives convergent bounds on all edges in an MSIT in most cases. For those cases which have unconvergent edges, the percentage of unconvergent edges is very small. Moreover, the gap between the lower and upper bounds on each unconvergent edge is also very small (usually one in our experiments). Therefore, the enumeration procedure on unconvergent edges in the SST is very fast in practice.
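A bounded enumeration over the remaining gaps can be sketched as follows. This is an illustrative sketch, not the authors' procedure: the objective `toy_cost` is an assumed stand-in for the weighted delay, and the optional `feasible` predicate is a hypothetical hook where a check such as the local monotone property could be plugged in.

```python
from itertools import product

def bounded_enumeration(cost, lower, upper, choices, feasible=lambda w: True):
    """Search all width assignments with lower[e] <= w[e] <= upper[e].

    Edges whose bounds already meet contribute a single candidate, so
    the search space is the product of the gaps on unconvergent edges.
    Returns (best_widths, best_cost).
    """
    ranges = [[w for w in choices if lo <= w <= hi]
              for lo, hi in zip(lower, upper)]
    best, best_cost = None, float("inf")
    for candidate in product(*ranges):
        if not feasible(candidate):
            continue
        value = cost(candidate)
        if value < best_cost:
            best, best_cost = list(candidate), value
    return best, best_cost

def toy_cost(w):
    # Assumed two-edge objective, not the delay model of this paper.
    return 4 / w[0] + w[0] + 10 / w[1] + w[1] + w[1] / w[0]

# Edge 0 is unconvergent with a gap of one; edge 1 has converged.
best, value = bounded_enumeration(toy_cost, lower=[2, 3], upper=[3, 3],
                                  choices=[1, 2, 3, 4])
```

With a gap of one on a single edge, only two candidates are evaluated, which illustrates why the enumeration is cheap when the bounds are nearly convergent.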
Experimental Results
We have implemented the OWBR algorithm and tested it on a large number of MSITs for both the MCM and the IC technologies. We shall present both a comparison of different wiresizing solutions and a comparison between the OWBR algorithm and the GWSA algorithm. The delays reported in this section are computed using HSPICE. The use of HSPICE simulation results, instead of calculated Elmore delay values, verifies not only the quality of our MSWS solutions but also the validity of our interconnect modeling and the correctness of our MSWS problem formulation.
B. Results on Industrial Nets
We also tested our algorithms on several multi-source nets provided by Intel. These nets were extracted from the top-level floorplan of a high-performance microprocessor. Most pins of these nets can serve as both inputs and outputs, and all pairs between sources and sinks (excluding feedthrough pins) are considered to be timing critical. We use the 1-Steiner tree algorithm [9] to route these nets. As shown in Table 3, the opt-msws solutions consistently outperform the min-width solutions, reducing the maximum delay and the average delay by as much as 36% and 17%, respectively. It is interesting to observe that although the (weighted) average delay is the objective of our algorithms, all experimental results show that this formulation reduces the maximum delay as well. Also, the delay reduction is more significant for nets with larger spans.
Speed-up Using Variable Grid Computation
Ten %pin nets with pins randomly distributed in a 1000 x 1000 grid and routed by the 1-Steiner algorithm are wiresized. The CPU times used by the OWBR algorithm and the GWSA algorithm to obtain the lower bound and the upper bound of the optimal solution are reported in Table 4. We observed speed-ups of two to three orders of magnitude.
Conclusions and Future Work
The results in this paper have shown convincingly that proper sizing of the wire segments in multi-source nets can lead to a significant reduction in the interconnect delay. We have also developed an efficient wiresizing algorithm using a (coarse) variable grid, which achieves the same accuracy in the wiresizing solutions as the fixed-grid optimal wiresizing algorithms on the finest grid, but uses much less memory and computation time. To the best of our knowledge, it is the first work which presents an in-depth study of both the optimal wiresizing problem for multi-source interconnect trees and the optimal wiresizing problem using a variable grid.

In order to further reduce the interconnect delay in multi-source nets, we plan to study the simultaneous driver and wire sizing problem for multi-source nets. Also, we would like to develop efficient multi-source wiresizing algorithms for multiple-objective optimization to minimize delay, area, and power dissipation, and to explore the tradeoffs among these objectives.

Table 3: Multi-source wiresizing results on several nets in an Intel microprocessor layout
Table 4: Performance comparison between OWBR and GWSA
