Abstract - With the exponential reduction of feature size, inter-wire coupling capacitance has become the dominant part of load capacitance. Coupling introduces two problems: delay deterioration and crosstalk. This paper presents a timing-driven global routing algorithm that accounts for coupling effects and crosstalk avoidance. Our work differs from existing approaches in that we design a global routing "framework" that performs well in routability and timing, and also facilitates crosstalk avoidance in detailed routing. Experimental results on industrial circuits show that the algorithm leads to substantial delay reduction and effective crosstalk elimination.
I. Introduction
As CMOS technology enters the very deep sub-micron (VDSM) era, shrinking geometries confront circuit designers with a growing number of issues that were previously treated as second-order. One of the biggest concerns is interconnect crosstalk [1], which is usually caused by capacitive coupling between a "victim" net and one or more "aggressor" nets. For 0.18 micron designs, the coupling capacitance can exceed 70 percent of the total interconnect capacitance [2]. Ignoring such coupling effects may lead to significant deviations between actual and nominal timing responses, power consumption and functional behavior.
Previous works [3] [4] [5] [6] have made various contributions to performance-driven global routing based on conventional interconnect delay metrics [7, 8]. These algorithms are straightforward and efficient on specific test cases. However, with recent advances in VDSM technology, the switching cross-coupling experienced by critical wires becomes a problem that these traditional approaches do not anticipate. Therefore, it is increasingly important to consider and control coupling effects to guarantee a truly reliable, high-performance design.
Most early works on crosstalk avoidance focus on detailed routing [9] [10] [11], where the estimation of crosstalk is accurate but the flexibility to avoid it is restricted. A few recent works have developed coupling-aware techniques for the global routing phase. In [12], global routing with crosstalk constraints has been studied. An extended global routing problem was addressed in [13] to consider simultaneous shield insertion and net ordering with RLC crosstalk constraints. Typical methods also include post-global-route measures [14] to eliminate crosstalk. However, none of these works explicitly combines crosstalk elimination with timing optimization, which remains one of the most important tasks for global routing.
Accurate crosstalk calculation is only possible once detailed routing begins. However, it is often difficult to find a crosstalk-feasible solution in detailed routing if global routing is blind to crosstalk. The goal of this work is to handle crosstalk from a global view, making a crosstalk-free solution easier to reach in detailed routing while satisfying the timing constraints and routability at the same time. Our approach works as follows. A timing-relax method is applied in the earlier phase to eliminate congestion and optimize delay simultaneously. In the later phase, a crosstalk control method is devised to find a crosstalk-feasible solution. Using a conservative crosstalk model, our approach reduces crosstalk by releasing the regions of high coupling "risk" through topological optimization. Our work differs from existing approaches in that we design a global routing "framework" that performs well in routability and timing, and also facilitates crosstalk avoidance in detailed routing. The algorithm achieves promising results on industrial circuits.
The remainder of this paper is organized as follows. In Section II, we formulate the global routing problem for symbolic analysis. In Section III, the timing analysis strategy is discussed. The global routing algorithm is given in detail in Section IV. Section V shows the experimental results and Section VI is an overall conclusion.
II. Problem Formulation
We first define the notation. For SC layout design, the chip is divided into a rectangular array of M_row × M_col cells called global routing cells (GRCs). The global routing graph (GRG) is the dual graph, composed of gridlines and crossings. Let G = (V, E) be the global routing graph, shown by the solid lines in Fig 1. Node v_i represents the center point of GRC i. The edge that links v_i and v_j is a GRG edge e, with length l denoting the distance between v_i and v_j. A non-negative number c_e, called the edge capacity, is assigned to e, indicating the number of available tracks between the corresponding GRCs.
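As a minimal illustration of the GRG described above, the following Python sketch builds the grid graph and attaches a length and a capacity to every edge. The class name `GlobalRoutingGraph`, the uniform capacity, and the unit pitch between GRC centers are assumptions made for the example, not part of the original formulation.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalRoutingGraph:
    """Grid graph whose nodes are GRC centers (hypothetical helper class)."""
    rows: int
    cols: int
    capacity: int           # assumed uniform number of available tracks per edge
    pitch: float = 1.0      # assumed distance between neighboring GRC centers
    edges: dict = field(default_factory=dict)

    def __post_init__(self):
        # Link every node to its neighbor in the next column and in the next row.
        for r in range(self.rows):
            for c in range(self.cols):
                if c + 1 < self.cols:
                    self._add_edge((r, c), (r, c + 1))
                if r + 1 < self.rows:
                    self._add_edge((r, c), (r + 1, c))

    def _add_edge(self, u, v):
        # Each undirected edge is stored once under an order-independent key.
        self.edges[frozenset((u, v))] = {
            "length": self.pitch,
            "capacity": self.capacity,
            "demand": 0,
        }

    def edge(self, u, v):
        return self.edges[frozenset((u, v))]


grg = GlobalRoutingGraph(rows=3, cols=4, capacity=8)
print(len(grg.edges))  # 3*(4-1) horizontal + (3-1)*4 vertical = 17 edges
```

The `demand` field is where a router would accumulate the number of nets using the edge, to be compared against `capacity`.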
In general, the crosstalk between two parallel wires is proportional to their coupling length and inversely proportional to their separation. For a given net, the crosstalk effects from all neighboring wires may not occur at the same time, but characterizing all cases would require exhaustive timing analysis. Considering the worst case, the total crosstalk on a net j (Ct_j) is defined as the sum of the effects from all other nets. That is,

$$Ct_j = \sum_{i=1,\; i \neq j}^{N_n} Ct_{ij} \qquad (1)$$

where Ct_ij is the value of crosstalk on net j from net i and N_n is the total number of nets within a design.
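The worst-case summation can be sketched in a few lines of Python. The matrix layout `ct[i][j]` (crosstalk on net j from net i) follows the definition above; the function name and the numerical values are hypothetical.

```python
def total_crosstalk(ct, victim):
    """Worst-case total crosstalk on `victim`: sum of ct[i][victim] over all other nets i."""
    return sum(ct[i][victim] for i in range(len(ct)) if i != victim)


# Hypothetical pairwise values; ct[i][j] is the crosstalk on net j from net i.
ct = [
    [0.0, 0.2, 0.1],
    [0.3, 0.0, 0.4],
    [0.0, 0.5, 0.0],
]
print(total_crosstalk(ct, 1))  # contributions of nets 0 and 2 on net 1
```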
Without loss of generality, we tackle the timing optimization problem by minimizing the longest path delay from any input port to any output port of a circuit. Thus, the timing-driven global routing problem can be formulated as follows.

Minimize
$$\max_{1 \le i \le m} \mathrm{delay}(P_i)$$
where m is the number of paths and delay(P_i) denotes the delay of path P_i, composed of gates and net wires. Two sets of constraints must be satisfied. Let f_j be the total demand of the nets using edge e_j; f_j must be no greater than the edge capacity c_j. In addition, the total crosstalk on net j (Ct_j) must not exceed the corresponding threshold x_j.
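The objective and the two constraint sets can be illustrated with a short Python sketch. It assumes the circuit is given as a timing DAG whose arcs carry gate or wire delays; the dictionary layouts and function names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_path_delay(arcs):
    """arcs: {(u, v): delay} of a timing DAG; returns the maximum delay over all paths."""
    preds = {}
    for (u, v) in arcs:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    arrival = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        arrival[node] = max(
            (arrival[p] + arcs[(p, node)] for p in preds[node]), default=0.0
        )
    return max(arrival.values())


def is_feasible(demand, capacity, crosstalk, bound):
    """Check f_j <= c_j on every edge and Ct_j <= x_j on every net."""
    edges_ok = all(demand[e] <= capacity[e] for e in demand)
    nets_ok = all(crosstalk[n] <= bound[n] for n in crosstalk)
    return edges_ok and nets_ok


arcs = {("in", "g1"): 1.0, ("g1", "out"): 2.0, ("in", "out"): 2.5}
print(longest_path_delay(arcs))  # the path in -> g1 -> out dominates
```

A router would evaluate `longest_path_delay` as the cost to minimize and reject any candidate solution for which `is_feasible` returns False.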
III. Timing Analysis
For wire-load estimation, we apply a wire-load model suited to timing and noise analysis at an early design stage [15]. It is derived by simulating metal wires in different layers to obtain first-cut parasitic values, followed by curve fitting. The largest error in the estimated parasitics of a metal wire is within 5% of the values obtained from field solvers. Given the GRG edge capacity and the number of used tracks, an estimated wire spacing and other geometrical information form the input of the wire-load model. All capacitances around a given conductor, including the coupling capacitance, can then be extracted.
Since the conventional interconnect delay metrics [7, 8] do not take signal input rise/fall time into account or propagate signal transition time during delay evaluation, we apply more advanced delay estimation methods. The built-in delay evaluation engine uses a congruence transformation technique [16] to reduce the order of a large RC netlist while guaranteeing stability and passivity. To balance accuracy against speed, an adaptive method controls the order of the netlist reduction. In benchmark testing, the accuracy is within 1% of SPICE simulation. Fig 2 shows the delay estimation method of the algorithm using the above models, which takes coupling effects into consideration.
We adopt a table-lookup model for gate delay estimation. In our work, the lookup tables are obtained from industrial circuit libraries.
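A table-lookup gate delay model is commonly evaluated by interpolating between table entries. The sketch below is a generic illustration in Python and is not taken from the libraries used in this work: it assumes a two-dimensional table indexed by input slew and output load, bilinear interpolation, and clamping at the table boundary. All names and values are hypothetical.

```python
from bisect import bisect_right


def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a 2-D delay table; queries are clamped to the table range."""

    def bracket(axis, x):
        # Return the two neighboring axis indices and the interpolation fraction.
        x = min(max(x, axis[0]), axis[-1])
        hi = min(max(bisect_right(axis, x), 1), len(axis) - 1)
        lo = hi - 1
        t = (x - axis[lo]) / (axis[hi] - axis[lo])
        return lo, hi, t

    i0, i1, ts = bracket(slews, slew)
    j0, j1, tl = bracket(loads, load)
    row0 = table[i0][j0] * (1 - tl) + table[i0][j1] * tl
    row1 = table[i1][j0] * (1 - tl) + table[i1][j1] * tl
    return row0 * (1 - ts) + row1 * ts


# Hypothetical 2x2 table: rows follow input slew, columns follow output load.
slews = [0.0, 1.0]
loads = [0.0, 2.0]
table = [[1.0, 3.0],
         [2.0, 4.0]]
print(lookup_delay(slews, loads, table, slew=0.5, load=1.0))  # midpoint of the table
```

In a coupling-aware flow, the `load` argument would include the coupling capacitance extracted by the wire-load model, which is how coupling enters the gate delay.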
IV. Global Routing Algorithm
Our approach consists of two main phases. In the earlier phase, we apply a timing-relax method: a heuristic algorithm first constructs the initial delay-optimal solution, and an optimization algorithm then uses coupling effects as a heuristic to optimize circuit delay and congestion. In the later phase, we apply the crosstalk control algorithm to minimize the total crosstalk.
A. The Initial Steiner Tree Algorithm
The initial solution is generated with two main concerns: (1) timing performance is given first priority, and (2) a time-consuming initial solution is not desirable. With these considerations, we apply the ITDT algorithm [17] in the first iteration of routing tree construction for its simplicity.
Considering the longest path delay of the circuit, constructing a minimum-delay tree for each net separately may not necessarily yield the best timing for the circuit. We therefore develop a short-term iteration algorithm to approach the optimal solution. The pseudo-code is shown in Fig 3.

Initial_solution_optimization_algorithm( )
1) For (each net) do
     Let the sink pin t_i yielding maximum L(s, t) be the target critical node t.
     ITDT_algorithm(t); /* find best timing tree with respect to delay(s, t) */
2) Check each path and find the longest delay path P_t.
3) dp = pathdelay(P_t

When minimizing the total capacitance of the Steiner tree below a node v to reduce delay(s, t_a) in step 4, we either increase the wire spacing to reduce the inter-wire coupling capacitance or construct a minimum wire length sub-tree to reduce the ground capacitance. The proof of the effectiveness of the optimization algorithm in step 4 can be found in [18].
B. Timing Optimization Considering Coupling Effects
Based on the initial solution, we optimize the network topology of the circuit to relieve the most congested areas. To maintain the timing performance of the initial solution, the most critical paths should keep their original topology. Timing optimization is applied simultaneously with congestion optimization. Its basic idea is coupling effects transference, which directs the coupling effects towards areas that are not critical for circuit delay. An extended congestion weight denotes the combination of coupling weight and congestion weight.
Definition 1. The extended congestion (EC) of a segment is the combination of its coupling and congestion evaluation.
The EC weight of segment i can be defined by the following formulae:
where w_cong,i is contributed solely by the congestion of the segment and w_coup,i by coupling effects. α_1 was experimentally set to 0.8, and ε is a small constant.
As the coupling capacitance is inversely proportional to the spacing between neighboring wires, the coupling weight can be defined as:
where the coupling capacitance C_ci depends on the spacing and wire length of the segment, and C_mi is the maximum coupling capacitance, calculated under the minimum spacing condition.
Since the goal is to optimize the circuit delay, adjusting the routes on the critical path is the natural way to obtain a smaller circuit delay. The EC weight of segment i on the longest delay path is given by:
Since $\mu\alpha_2 > \alpha_2$, the extended congestion weight of a segment on the critical path is magnified. Thus, when a net has alternative routes available in rerouting, it is more likely to choose a GRG segment on a non-critical path because that segment has comparatively lower "congestion". In this way, routing demand is transferred from the critical path to non-critical paths, which may increase their path delay. However, the coupling effects transference is directed so as to reduce the delay of the circuit as a whole, and consequently leaves room for further delay optimization.
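The effect of magnifying the weight on the critical path can be illustrated with a small Python sketch. The exact EC formulae are not reproduced here: the linear form below, the value of α_2, the value of µ, and the value of ε are all assumptions chosen only to satisfy the stated properties (α_1 = 0.8, a small ε, and a magnified coupling coefficient on the critical path).

```python
ALPHA1 = 0.8   # value reported in the text
ALPHA2 = 0.2   # assumption: not stated in the text
MU = 2.0       # assumption: any MU > 1 magnifies the coupling term
EPS = 1e-3     # assumption: stands in for "a small constant"


def ec_weight(w_cong, w_coup, on_critical_path=False):
    """Assumed linear combination of congestion and coupling weights (illustrative only)."""
    a2 = MU * ALPHA2 if on_critical_path else ALPHA2
    return ALPHA1 * w_cong + a2 * w_coup + EPS


def cheapest_route(routes):
    """Return the name of the candidate route with the smallest summed EC weight."""
    return min(routes, key=lambda name: sum(ec_weight(*seg) for seg in routes[name]))


# Each segment is (w_cong, w_coup, on_critical_path); values are hypothetical.
routes = {
    "through_critical": [(0.5, 0.6, True), (0.5, 0.6, True)],
    "detour": [(0.5, 0.6, False), (0.5, 0.6, False), (0.1, 0.1, False)],
}
print(cheapest_route(routes))
```

With these numbers the longer detour is preferred, because the two segments on the critical path are penalized by the magnified coupling term.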
C. Crosstalk Control
The purpose of the crosstalk control algorithm is to find a crosstalk-feasible solution based on some estimation method. In practice, designers may want to control the crosstalk among logically sensitive nets, for which a switching event on one net causes the other to malfunction. Their relation is characterized by the use of a sensitivity graph [19] . If the sensitivity graph is not available, we assume all the nets are sensitive to adjacent nets.
The crosstalk constraints are defined through a "crosstalk risk bound" for each net. Our approach reduces crosstalk by releasing the regions of high crosstalk "risk". For example, suppose two wires are routed in parallel over a number of GRG edges. One wire may violate its crosstalk constraint if the two remain parallel over the whole distance. We then reroute part of the "risky" wire, changing the topology to reduce crosstalk. The violations can be checked during detailed routing and used as guiding information to tune our estimates.
Crosstalk Model and Basic Assumptions
As described in Section II, crosstalk is a function that depends in part on the parallel length and the spacing between neighboring wires. Calculating the noise waveform determines the amount of crosstalk accurately, but it is too time-consuming for crosstalk estimation in global routing. Some previous works adopt the formula in [20], where crosstalk is given by:
where β is an experimentally estimated constant. To account for more direct factors, such as the impact of coupling capacitance and driver strength, we adopt a noise metric to estimate crosstalk effects. Given a victim net j, the crosstalk on net j from net i can be formulated as follows:
where R_driver_i and R_driver_j are the driver resistances of net i and net j, respectively, and R_wire_i and R_wire_j are the wire resistances. C_coupling_ij is the coupling capacitance between net i and net j, and C_total_j includes all interconnect ground capacitance, coupling capacitance and load capacitance of net j. The definition of C_total_i is similar.
Our metric can be used as an upper bound for the real peak noise. To verify this, we carried out a simple noise analysis and compared our metric with two other widely used noise estimation metrics (denoted as Metric A and Metric B). They are given as follows:
The results are shown in Fig 4. The curve at the bottom (magenta, triangle mark) is the peak noise from SPICE simulation; it is therefore a straight line with slope equal to 1. The green curve with "o" mark is the coupling ratio, where net j is the victim and net i the aggressor. The blue curve with "*" mark is the noise estimation using Metric A. The red curve with "+" mark is the noise estimation of Metric B, which considers driver resistance. The black curve with "x" mark is the noise estimation of our metric, which considers both driver resistance and wire resistance. All three metrics can be used as upper bounds of the real peak noise, and our metric clearly gives a tighter bound while remaining conservative.
Without knowing the detailed routing information, some general assumptions are made first:
Assumption 1. The coupling length (overlapping length between adjacent wires) of wire segments in GRG edge i is the length of the edge l_i.
Assumption 2. Each wire segment has crosstalk with at most two adjacent wires (wires above and below, or wires left and right) within a GRG edge.
Assumption 3. The spacing between adjacent wire segments is the average spacing within a GRG edge.
Assumption 4. The crosstalk between wire segments in different GRG edges can be ignored.
The purpose of Assumption 1 is to make a conservative estimate of the amount of crosstalk, as illustrated in Fig 5. Assumptions 2 and 3 treat all wire segments within a GRG edge as having equal status. Assumption 4 states that the coupling capacitance between wire segments in different GRG edges can be ignored because, in practice, the length of a GRG edge is usually several times the height of a cell.
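The four assumptions translate into a simple per-edge estimate, sketched below in Python. The sketch uses the generic proportionality stated in Section II (crosstalk proportional to coupling length and inversely proportional to spacing) rather than the noise metric adopted above. The constant `beta`, the `edge_width` parameter, and the uniform spreading of wires used to obtain the average spacing are assumptions made for the example.

```python
def edge_crosstalk(edge_length, edge_width, wires_on_edge, beta=1.0):
    """Conservative crosstalk estimate for one wire segment on a GRG edge (illustrative)."""
    if wires_on_edge < 2:
        return 0.0                             # no neighbor in this edge; other edges ignored (Assumption 4)
    spacing = edge_width / wires_on_edge       # Assumption 3, assuming wires spread uniformly
    neighbors = min(2, wires_on_edge - 1)      # Assumption 2: at most two adjacent wires
    # Assumption 1: the coupling length is the full edge length.
    return neighbors * beta * edge_length / spacing


print(edge_crosstalk(edge_length=10.0, edge_width=8.0, wires_on_edge=4))
print(edge_crosstalk(edge_length=10.0, edge_width=8.0, wires_on_edge=2))
```

Summing this quantity over the GRG edges a net passes through gives a conservative per-net value to compare against its risk bound.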
Heuristics
The crosstalk control algorithm is based on rip-up and iteration. According to formula (1), the total crosstalk on a net j can be given by:

$$Ct_j = \sum_{i=1,\; i \neq j}^{N_n} \sum_{k=1}^{S_n} Ct_{ijk}$$

where S_n is the total number of GRG edges that net j is routed through and Ct_ijk is the value of crosstalk on segment k of net j from net i. As we assume that crosstalk Ct_j below a risk bound x_j will not affect the proper functioning of the circuit, the "overflow" crosstalk on net j and the total "overflow" crosstalk of solution S can be defined by the following formulae:
Given a global routing solution S, the total overflow crosstalk and the crosstalk on each net Ct_j(S) can be calculated. If crosstalk violations are found, the nets having violations are ripped up. Since a rerouted net introduces extra crosstalk not only on itself but also on the other nets with which it shares GRG edges, the cost is calculated from the crosstalk increases of these two parts. The total overflow crosstalk introduced by rerouting of net j is formulated as follows:
(10)
where T_j and T'_j are the routing trees for net j before and after rerouting, respectively.
During crosstalk elimination, we also need to consider timing performance. One benefit of our timing optimization algorithm is that the critical path has relatively low crosstalk due to coupling effects transference. In order to protect the critical path from delay deterioration, the overflow crosstalk of a segment s_i on the critical path introduced by rerouting of net j is given by the following expression:
Each newly constructed routing tree is recorded with the corresponding net. When a new solution is constructed, we compare it with all the historical solutions for minimal crosstalk cost. To expand the space of the greedy search, the order of the nets having crosstalk violations is randomized in rerouting to search for the global optimum. The heuristic algorithm is described in Fig 6.
An example of the procedure is given in Fig 7. The original solution is shown in (a), where the regions having crosstalk risk are marked in red. After calculation of ∆Ct, net1 is chosen and rerouted. Consequently, a wire segment is released from crosstalk risk on net2 and on net5, as shown in (b). Parts (c) and (d) illustrate the subsequent steps.
Since the coupling length is conservatively estimated, the resulting total crosstalk value may appear relatively high. Nevertheless, the assumptions herein are general enough to reflect the extent of "crosstalk risk" for each net as guiding information for detailed routing. Thus, the crosstalk risk bound can be flexibly defined.
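The rip-up and reroute heuristic can be sketched as a greedy loop in Python. This is a simplified illustration, not the algorithm of Fig 6: crosstalk is replaced by a toy count of shared edges, the overflow is assumed to be the positive part of Ct_j minus x_j, alternative routing trees are supplied by the caller, and the special treatment of critical-path segments is omitted. All names are hypothetical.

```python
import random


def crosstalk(routes, net):
    """Toy stand-in for the noise metric: count edges of `net` shared with each other net."""
    return sum(
        1
        for edge in routes[net]
        for other, tree in routes.items()
        if other != net and edge in tree
    )


def total_overflow(routes, bound):
    """Assumed overflow definition: sum over nets of max(0, Ct_j - x_j)."""
    return sum(max(0, crosstalk(routes, n) - bound[n]) for n in routes)


def crosstalk_control(routes, candidates, bound, rng, max_iters=10):
    """Greedy rip-up and reroute of violating nets in randomized order."""
    routes = dict(routes)
    for _ in range(max_iters):
        violators = [n for n in routes if crosstalk(routes, n) > bound[n]]
        if not violators:
            break
        rng.shuffle(violators)  # randomized order widens the greedy search
        for net in violators:
            best, best_cost = routes[net], total_overflow(routes, bound)
            for alt in candidates.get(net, []):  # rip up and try each alternative tree
                cost = total_overflow({**routes, net: alt}, bound)
                if cost < best_cost:
                    best, best_cost = alt, cost
            routes[net] = best
    return routes


routes = {
    "n1": frozenset({"e1", "e2"}),
    "n2": frozenset({"e1", "e2"}),
    "n3": frozenset({"e3"}),
}
candidates = {"n1": [frozenset({"e1", "e4"})], "n2": [frozenset({"e2", "e5"})]}
bound = {"n1": 1, "n2": 1, "n3": 0}
result = crosstalk_control(routes, candidates, bound, random.Random(0))
print(total_overflow(routes, bound), total_overflow(result, bound))
```

In this example nets n1 and n2 run in parallel over two edges; rerouting either one over a single alternative edge brings both below their bounds.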
D. The Global Routing Algorithm Description
The core optimization algorithm is a 3-step iteration (steps 3, 4 and 5 in Fig 8) based on the rip-up and reroute method. Evaluation functions are used with respect to a single rerouted tree and to overall routability. In our optimization process, the evaluation function varies with the routing phase, which allows us to steer the optimization deliberately.
Evaluation functions are defined in Table I and Table II:

TABLE I
Single Routing Tree Extended Congestion Evaluation
Weighted total overflow of GRG
*α is a constant, normally set to 1.0.

1. Construct initial timing tree for all the nets.
2. Evaluate global congestion of GRG.
3. Determine the longest delay path P_t.
4.

By the simultaneous random optimization of step 4, we can approach a local optimum quickly but roughly. Sequential optimization then helps to find a better route through accurate rerouting and weight refreshing. The crosstalk control algorithm is applied if crosstalk violations are found.
V. Experimental Results
We have implemented the timing-driven global routing algorithm in C on a Sun Enterprise 450 and tested it with three industrial circuits extracted from a microprocessor design. They use 0.13µm technology and are provided with the corresponding look-up tables containing the gate timing arc information. Note that the routing resource constraints are all satisfied in the following experimental results unless otherwise specified.
Two sets of experimental data are provided. The first set (given in Section A) compares the timing optimization capability of two methods, which can be applied before crosstalk elimination. The resulting delay values are all actual delays with consideration of worst-case coupling capacitance. The second set, given in Section B, demonstrates the effect of our crosstalk control algorithm. Table III compares the experimental results of two different global routing methods. In Method 1, we first apply our initial routing tree construction algorithm. Then, optimization is applied based on the initial solution, but no coupling-directed optimization is carried out. In Method 2, we use coupling as a heuristic to guide timing optimization, after which the crosstalk control algorithm is applied.
A. Comparison on Circuit Delay
Each row of Table III gives the circuit name and the size of its netlist. Delay performance is compared in columns 3 to 5 and run time in columns 6 and 7. Table III shows that, by applying our coupling-directed delay optimization, Method 2 improves the timing performance of Method 1 by up to 12% with only a very slight increase in run time.
B. Experimental Results on Crosstalk Control
In order to measure the effectiveness of the crosstalk control algorithm, we first run our global router for timing optimization only, as discussed for Method 2 in Section A. The experimental results are compared with those obtained after applying the crosstalk control algorithm. The comparison is given in Table IV. It is clear from Table IV that, after applying our crosstalk control method, the total overflow crosstalk of the circuits has been suppressed with only a very slight deterioration in timing performance. Meanwhile, the number of nets having violations has been reduced.
VI. Conclusions and Future Work
