In this paper, we propose a novel framework for fast multilevel routing considering crosstalk and performance optimization. To handle the crosstalk minimization problem, we incorporate an intermediate stage of layer/track assignment into the multilevel routing framework. For performance-driven routing, we propose a novel minimum-radius minimum-cost spanning-tree (MRMCST) heuristic for global routing. Compared with the state-of-the-art multilevel routing, the experimental results show that our approach achieved a 6.7X runtime speedup, reduced the respective maximum and average crosstalk (coupling length) by about 30% and 24%, reduced the respective maximum and average delay by about 15% and 5%, and resulted in fewer failed nets.
Introduction
With decreasing feature sizes, higher clock rates, and increasing interconnect densities, crosstalk has become a major concern of comparable importance to area and timing in IC design. Crosstalk profoundly affects circuit performance in very deep submicron (VDSM) technology; it is introduced by coupling between two neighboring wires.
For example, two adjacent wires form a coupling capacitor. A voltage or current change on one wire can thus interfere with the signal on the other wire. Crosstalk is an unwanted variation that makes the behavior of a manufactured circuit deviate from the expected response. The deleterious influences of crosstalk can be classified into two categories. One is malfunctioning, which makes the logic values of circuit nodes differ from what we desire; the other is timing change, which is caused by switching behavior. Therefore, in addition to routability and timing performance, crosstalk minimization should also be considered in VDSM router design.
Traditionally, the complex routing problem is often solved by using the two-stage approach of global routing followed by detailed routing. Global routing first partitions the routing area into tiles and decides tile-to-tile paths for all nets, while detailed routing assigns actual tracks and vias for nets. Many routing algorithms adopt a flat framework of finding paths for all nets. Those algorithms can be classified into sequential and concurrent approaches. Early sequential routing algorithms include maze-searching approaches [18] and line-searching approaches [14], which route net by net. Most concurrent algorithms apply network-flow [1] or linear-assignment formulations [22] to route a set of nets at one time.
The major problem of the flat framework lies in its scalability for handling larger designs. As technology advances, technology nodes are getting smaller and circuit sizes are getting larger. To cope with the increasing complexity, researchers proposed to use hierarchical approaches to handle the problem. Marek-Sadowska [22] proposed a hierarchical global router based on linear assignment. Chang, Zhu, and Wong [5] applied linear assignment to develop a hierarchical, concurrent global and detailed router for FPGAs. For ever larger designs, however, the hierarchical approach becomes insufficient. Therefore, it is desired to employ more levels of routing for very large-scale IC designs.
The multilevel framework has attracted much attention in the literature recently. It employs a two-stage technique: coarsening followed by uncoarsening. The coarsening stage iteratively groups a set of circuit components (e.g., circuit nodes, cells, modules, routing tiles, etc.) based on a predefined cost metric until the number of components being considered is smaller than a threshold. Then, the uncoarsening stage iteratively ungroups a set of previously clustered circuit components and refines the solution.
routing is used for routability-driven global routing. After the coarsening stage, we perform a crosstalk-driven layer/track assignment for crosstalk optimization. At the uncoarsening stage, we perform detailed routing. Further, the unroutable nets are handled by point-to-path maze routing and rip-up and re-route to refine the routing solution level by level.
Compared with [20], the experimental results show that our approach achieved a 6.7X runtime speedup, reduced the respective maximum and average crosstalk (coupling length) by about 30% and 24%, reduced the respective maximum and average delay by about 15% and 5%, and resulted in fewer failed nets. The results show the promise of our approach.
The rest of this paper is organized as follows. Section 2 presents the routing model and the multilevel routing framework. Section 3 presents our novel framework for run-time and crosstalk optimization. Experimental results are shown in Section 4. Finally, we give concluding remarks in Section 5.
Preliminaries

Routing Model
Our global routing algorithm is based on a graph search technique guided by the congestion information associated with routing regions and topologies. The router assigns higher costs to route nets through congested areas (or those of higher delay and/or crosstalk costs) to balance the net distribution among routing regions.
Before we can apply the graph search technique to multilevel routing, we first need to model the routing resource as a graph such that the graph topology can represent the chip structure. Figure 2 illustrates the graph modeling. For the modeling, we first partition a chip into an array of rectangular subregions. These subregions are called global cells (GCs). A node in the graph represents a GC in the chip, and an edge denotes the boundary between two adjacent GCs. Each edge is assigned a capacity according to the physical area or the number of tracks of a GC. The graph is used to represent the routing area and is called a multilevel routing graph, denoted by G_k, where k is the level ID. A global router finds GC-to-GC paths for all nets on a routing graph to guide the detailed routing. The goal of global routing is to route as many nets as possible while meeting the capacity constraint of each edge and any other constraint, if specified.
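As an illustration of this model, the following minimal sketch builds the routing graph for a rectangular array of GCs and commits a GC-to-GC path only if every boundary it crosses still has spare capacity. The function names, the uniform capacity, and the dictionary representation are assumptions made for the example; they are not taken from our implementation.

```python
def build_routing_graph(rows, cols, capacity):
    """Nodes are GCs (row, col); edges are boundaries between adjacent GCs.

    A single uniform capacity per boundary is an assumption of this sketch.
    """
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    edges = {}
    for r, c in nodes:
        for nr, nc in ((r, c + 1), (r + 1, c)):  # right and lower neighbors
            if nr < rows and nc < cols:
                edges[((r, c), (nr, nc))] = capacity
    return nodes, edges


def route(path, edges, usage):
    """Commit a GC-to-GC path if no crossed boundary exceeds its capacity."""
    hops = [tuple(sorted(pair)) for pair in zip(path, path[1:])]
    if any(usage.get(h, 0) >= edges[h] for h in hops):
        return False  # at least one boundary is already full
    for h in hops:
        usage[h] = usage.get(h, 0) + 1
    return True


nodes, edges = build_routing_graph(2, 2, capacity=1)
usage = {}
first = route([(0, 0), (0, 1), (1, 1)], edges, usage)   # accepted
second = route([(0, 0), (0, 1)], edges, usage)          # rejected: boundary full
detour = route([(0, 0), (1, 0), (1, 1)], edges, usage)  # accepted
```

A congestion-driven router would replace the hard accept/reject test with an edge cost that grows as usage approaches capacity; the hard test is used here only to keep the sketch short.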
As the process technology advances, multiple routing layers are possible. The number of layers in a modern chip can be more than six [12].
Wires in each layer run either horizontally or vertically. We refer to a layer as a horizontal (H) or a vertical (V) routing layer.
Multilevel Routing Model
As illustrated in Figure 1, G_0 corresponds to the routing graph of level 0 of the multilevel coarsening stage. At each level, our global router first finds routing paths for the local nets (or local 2-pin connections), i.e., those nets (connections) that sit entirely inside a GC. After the global routing is performed, we merge 2 x 2 GCs of G_0 into a larger GC and at the same time perform resource estimation for use at the next level (i.e., level 1 here). Coarsening continues until the number of GCs at a level, say the k-th level, is below a threshold. After the coarsening is finished, a crosstalk-driven layer/track assignment is performed to assign long and straight segments to underlying routing resources. The uncoarsening stage tries to refine the routing solution of the unassigned segments of level k. During uncoarsening, the unroutable nets are routed by point-to-path maze routing and rip-up and re-route to refine the routing solution level by level. Then we proceed to the next level (level k - 1) of uncoarsening by expanding each GC_k to four finer GC_{k-1}. The process continues until we reach level 0, when the final routing solution is obtained. Straight segments tend to be assigned to specified layers/tracks, leading to more efficient detailed routing at the uncoarsening stage, since often only short segments need to be handled during detailed routing.
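The coarsening schedule described above can be sketched as follows. The sketch only tracks the grid dimensions at each level and the level at which a net becomes local; resource estimation and routing are omitted. The names `coarsen_levels`, `parent_gc`, and `local_net_level` are hypothetical, and rounding odd dimensions up is an assumption of the example.

```python
def coarsen_levels(rows, cols, threshold):
    """Grid size per level; 2 x 2 GCs are merged until #GCs < threshold."""
    levels = [(rows, cols)]
    while levels[-1][0] * levels[-1][1] >= threshold and max(levels[-1]) > 1:
        r, c = levels[-1]
        levels.append(((r + 1) // 2, (c + 1) // 2))  # odd sizes round up
    return levels


def parent_gc(gc):
    """GC at the next coarser level that contains the given GC."""
    r, c = gc
    return (r // 2, c // 2)


def local_net_level(pin_gcs):
    """Lowest level at which all pins of a net sit inside one GC."""
    level, cells = 0, set(pin_gcs)
    while len(cells) > 1:
        cells = {parent_gc(g) for g in cells}
        level += 1
    return level


levels = coarsen_levels(8, 8, threshold=5)
near = local_net_level([(0, 0), (0, 1)])  # pins in neighboring level-0 GCs
far = local_net_level([(0, 0), (3, 3)])   # pins several GCs apart
```

Under these assumptions an 8 x 8 array is coarsened twice before the GC count drops below the threshold of 5, and a net is routed at the first level where it becomes local.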
Multilevel Routing Framework
Performance-driven Routing Tree Construction
In VDSM IC designs, interconnection delay dominates the performance of a circuit. Therefore, improving the wire delay also improves the overall chip performance. Many techniques have been developed to facilitate high-performance IC designs. For example, the algorithms for constructing performance-driven routing trees have received much attention [10]. The minimum spanning tree (MST) topology leads to the minimum total wire length, and thus congestion is often easier to control than with other topologies. However, its topology may result in longer critical paths and degrade circuit performance. In contrast, a shortest path tree (SPT) may result in the best performance, but its total wire length (and congestion) may be significantly larger than that constructed by the MST algorithm. In [10], researchers used the idea of incrementally modifying an MST to construct a performance-driven routing tree for a smooth trade-off between the tree radius (maximum signal delay) and the tree cost (total interconnection length). On one hand, minimizing wire length minimizes the driver's output resistance and the total wire capacitance. On the other hand, minimizing the path length from the source to a sink also minimizes loading capacitance. Thus, both wire length and path length minimization are comparably important for RC delay minimization.
Different from the work presented in [10], our algorithm tries to find a timing-driven routing tree, named a minimum-radius minimum-cost spanning tree (MRMCST), with the minimum radius among all MSTs. Since the MRMCST problem is NP-hard [23], we resort to a heuristic to obtain efficient solutions.
Given a vertex v in a graph G, its eccentricity, denoted by ecc(v), is the distance from v to the farthest vertex in G. The essential edges are those contained in every MST, and the optional edges are those contained in some MSTs but not all MSTs. The pseudo-center of a tree, denoted by pc, is a point on an edge or a vertex of the diameter P of the tree such that the distances from pc to the two extremes of P are the same. By diameter, we mean the longest path between any two vertices in a tree.
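To make these definitions concrete, the sketch below computes the eccentricity of a vertex, the diameter path, and the pseudo-center of a small weighted tree. The adjacency-dictionary representation, the function names, and the example tree are assumptions made for illustration only.

```python
def distances_from(tree, source):
    """Return (dist, parent) for every vertex of the tree, seen from source."""
    dist, parent = {source: 0}, {source: None}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in tree[u].items():
            if v not in dist:
                dist[v], parent[v] = dist[u] + w, u
                stack.append(v)
    return dist, parent


def eccentricity(tree, v):
    """Distance from v to the farthest vertex."""
    dist, _ = distances_from(tree, v)
    return max(dist.values())


def diameter_path(tree):
    """Longest path in the tree, found with two farthest-vertex sweeps."""
    d0, _ = distances_from(tree, next(iter(tree)))
    a = max(d0, key=d0.get)
    da, parent = distances_from(tree, a)
    b = max(da, key=da.get)
    path = [b]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path, da[b]


def pseudo_center(tree):
    """Point (u, v, offset from u) halfway along the diameter path."""
    path, length = diameter_path(tree)
    half, walked = length / 2, 0
    for u, v in zip(path, path[1:]):
        w = tree[u][v]
        if walked + w >= half:
            return u, v, half - walked
        walked += w
    return path[0], path[0], 0  # single-vertex tree


# Example tree: a path a - b - c - d with edge lengths 1, 2, 3.
tree = {"a": {"b": 1}, "b": {"a": 1, "c": 2},
        "c": {"b": 2, "d": 3}, "d": {"c": 3}}
pc = pseudo_center(tree)
```

In this example the diameter is the whole path with length 6, and the pseudo-center lies at offset 2 along the edge (b, c), which coincides with vertex c: it is at distance 3 from both extremes a and d.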
Since an MRMCST is an MST with the minimum radius among all possible MSTs, this leads us to find the union graph of all MSTs (called the MST Union Graph, MSTUG for short) and the intersection graph of all MSTs (the MST Intersection Graph, MSTIG for short). We construct an MSTUG and an MSTIG by modifying the edge-coloring process introduced by Tarjan [26], in which edges are colored either blue (essential edges) or red (discarded edges). Neither blue nor red, however, can be applied to the optional edges. By modifying the edge-coloring process, we introduce green edges to represent optional edges. The MSTUG contains all the blue and green edges, while the MSTIG contains blue edges only.
Initially there are n single components. As edges are colored green or blue, two components are merged together to produce one component. Once one and only one component remains, the algorithm terminates after coloring all the uncolored edges red. The algorithm is summarized in Figure 3. After constructing an MSTUG and an MSTIG, we may obtain several blue trees and optional edges unless the MST is unique, and then an MRMCST can be constructed by selecting optional edges to connect the blue trees. We introduce a Prim-based heuristic, named locally optimal connection strategy (LOCS), for the MRMCST construction.
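The blue/green/red classification can be illustrated with the following sketch. It is not the modified edge-coloring procedure of Figure 3; instead it uses a simpler, Kruskal-style pass over groups of equal-weight edges, which yields the same three classes on small examples. An edge is red if its endpoints are already connected by strictly lighter edges, blue if it is the only way to join its two components within its weight group, and green otherwise. All names are assumptions made for the example.

```python
from itertools import groupby


def connected(a, b, pairs):
    """True if a and b are linked using the undirected pairs."""
    adj = {}
    for x, y in pairs:
        adj.setdefault(x, []).append(y)
        adj.setdefault(y, []).append(x)
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        if x == b:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def classify_mst_edges(n, edges):
    """edges: list of distinct (u, v, w). Returns {edge: 'blue'|'green'|'red'}."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    color = {}
    by_weight = sorted(edges, key=lambda e: e[2])
    for _, group in groupby(by_weight, key=lambda e: e[2]):
        cand = []  # (edge, component of u, component of v)
        for e in group:
            ru, rv = find(e[0]), find(e[1])
            if ru == rv:
                color[e] = "red"  # a lighter path already exists
            else:
                cand.append((e, ru, rv))
        for i, (e, ru, rv) in enumerate(cand):
            others = [(a, b) for j, (_, a, b) in enumerate(cand) if j != i]
            color[e] = "green" if connected(ru, rv, others) else "blue"
        for _, ru, rv in cand:  # merge components after coloring the group
            parent[find(ru)] = find(rv)
    return color


edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (2, 3, 2), (0, 3, 3)]
color = classify_mst_edges(4, edges)
mstug = [e for e in edges if color[e] in ("blue", "green")]
mstig = [e for e in edges if color[e] == "blue"]
```

In the example, the three unit-weight edges of the triangle are optional (green), the edge (2, 3) is essential (blue), and the edge (0, 3) is discarded (red). The per-edge connectivity test makes the sketch quadratic within a weight group, which is acceptable for illustration but not for large netlists.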
The Prim-based method considers only one criterion. If there is more than one minimum-cost optional edge incident to the blue tree with source s, we break the tie by choosing the edge e = (p, q), where p is in the blue tree with source s and q is in a neighboring blue tree T.
Crosstalk-Driven Layer/Track Assignment
As fabrication technology shrinks into the VDSM era, as pointed out in [25], on-chip minimum feature sizes continue to decrease, and devices and interconnection wires are placed in closer proximity in order to reduce interconnection delay and routing area. The increasing aspect ratio of wires and the decreasing interconnect spacing have made the coupling capacitance larger than the self capacitance. In fact, coupling capacitance is reported to account for as much as 70%-80% of the total wiring capacitance, even in 0.25 µm technology.
Crosstalk is mostly caused by coupling capacitance between interconnection wires. In general, the crosstalk between two wires is proportional to their coupling capacitance, which is determined by the relative positions of the wires. The coupling capacitance between a pair of parallel wires is proportional to their coupling length, and is inversely proportional to their separating distance. The coupling capacitance between a pair of orthogonal wires is negligible in comparison with the coupling capacitance between a pair of parallel wires for current technology. Consequently, it is reasonable to assume that there is crosstalk only between adjacent parallel wires. Recently, there has been much research on the coupling problem in both global and detailed routing. Zhou and Wong [27] minimized crosstalk at the global routing stage. Chaudhary et al. [6] proposed wire spacing after detailed routing to reduce crosstalk. This technique can be applied as a post-processing step and used for improving an existing layout, but it is not suitable for routing.
However, both global routing and detailed routing are not the best
The CLA problem can be formulated as the max-cut, k-coloring (MC) problem [24]. However, the MC problem is NP-complete [24]. Thus, we resort to a simple yet efficient heuristic by constructing a maximum spanning tree from the given HCG. Since a tree can be k-colored in linear time if we have k layers, we first partition the vertices incident on edges with larger costs (coupling lengths) and allocate the corresponding segments to different layers.
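A minimal sketch of this heuristic is given below, assuming the HCG is supplied as a list of weighted edges over segment indices. It builds a maximum spanning forest with a Kruskal-style pass and then colors each tree by breadth-first traversal, so that the two endpoints of every tree edge land on different layers. The function names and the rule of giving a child the next layer after its parent are assumptions of the example.

```python
from collections import defaultdict, deque


def layer_assignment(num_segments, hcg_edges, k):
    """hcg_edges: list of (u, v, coupling_length). Returns a layer per segment."""
    if k < 2:
        raise ValueError("at least two layers are needed to separate segments")
    parent = list(range(num_segments))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Maximum spanning forest: take the heaviest edges first.
    tree = defaultdict(list)
    for u, v, _ in sorted(hcg_edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree[u].append(v)
            tree[v].append(u)

    # Color each tree so that adjacent tree vertices get different layers.
    layer = [None] * num_segments
    for root in range(num_segments):
        if layer[root] is not None:
            continue
        layer[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in tree[u]:
                if layer[v] is None:
                    layer[v] = (layer[u] + 1) % k
                    queue.append(v)
    return layer


def same_layer_coupling(hcg_edges, layer):
    """Total coupling length of HCG edges whose endpoints share a layer."""
    return sum(w for u, v, w in hcg_edges if layer[u] == layer[v])


hcg_edges = [(0, 1, 9), (1, 2, 7), (0, 2, 2), (2, 3, 5)]
layer = layer_assignment(4, hcg_edges, k=2)
remaining = same_layer_coupling(hcg_edges, layer)
```

With two layers, the three heaviest edges are all cut and only the lightest edge (0, 2), with coupling length 2, remains inside a layer. Non-tree edges are not inspected during coloring, so the result is a heuristic rather than an optimal cut.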
Let T be the set of tracks inside a panel. Each track t ∈ T can be represented by its set of constituent contiguous intervals. Denoting these intervals by x_i, we have t = ∪ x_i. Each x_i is either a blocked interval, where no segment from S can be assigned; an occupied interval, where a segment from S has been assigned; or a free interval, where no segment from the set S has yet been assigned.
To address these problems, Kay and Rutenbar [17] suggested an integer linear programming (ILP)-based track/layer assignment method to do crosstalk optimization. However, the ILP-based approach is very time-consuming and thus not suitable for large and complex designs. Batterywala et al. [3] proposed a fast track assignment heuristic considering routability, but crosstalk was not addressed in the work. In this paper, we propose a fast layer/track assignment heuristic for crosstalk optimization. After the coarsening stage, we may obtain some long horizontal and vertical segments. To simplify the layer/track assignment problem, we only assign segments which span more than one complete global cell in a row or a column. (We handle short segments during detailed routing.) The layer/track assigner works on a full row or panel.
We first build the horizontal constraint graph HCG(V, E) for all segments in the panel. Each vertex v ∈ V corresponds to a segment in the panel. Two vertices v_i and v_j are connected by an edge e ∈ E iff these segments belong to two different nets and their spans overlap. The edge cost of e = (v_i, v_j) ∈ E represents the coupling length if v_i and v_j are assigned to adjacent tracks. We define the crosstalk-driven layer assignment problem as follows:
The Crosstalk-driven Layer Assignment (CLA) Problem:
Given a set of layers L, a set of segments S, and a cost function r : S × L → N which represents the coupling cost of assigning a segment to a layer, find an assignment that minimizes the sum of the coupling costs of each layer.
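The construction of the horizontal constraint graph can be sketched as follows. Each segment is assumed to be given as a (net, left, right) triple within one panel, and the coupling length of two segments is taken to be the length of the overlap of their spans. Treating spans that merely touch at an endpoint as non-overlapping is an assumption of this example.

```python
from itertools import combinations


def build_hcg(segments):
    """segments: list of (net, left, right). Returns {(i, j): coupling_length}."""
    edges = {}
    for (i, (ni, li, ri)), (j, (nj, lj, rj)) in combinations(
        enumerate(segments), 2
    ):
        overlap = min(ri, rj) - max(li, lj)
        if ni != nj and overlap > 0:  # different nets with overlapping spans
            edges[(i, j)] = overlap
    return edges


segments = [("n1", 0, 10), ("n2", 4, 12), ("n1", 8, 15), ("n3", 11, 20)]
hcg = build_hcg(segments)
```

In this example, segments 0 and 2 overlap but belong to the same net, so no edge is created between them, while segments 0 and 3 belong to different nets but do not overlap.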
A segment seg ∈ S is said to be assignable to t ∈ T, t = ∪ x_i, iff x_i ∩ seg ≠ ∅ implies that x_i is either a free interval or an interval occupied by a segment of the same net. Thus, the crosstalk-driven track assignment (CTA) problem can be defined as:
Given a set of tracks T, a set of segments S, and a cost function θ : S × T → N which represents the coupling cost of assigning a segment to a track, find an assignment that minimizes the sum of the coupling costs of the assignment.

After layer assignment, most of the edges with larger costs in an HCG are eliminated, and the HCG is decomposed into k subgraphs. Figure 6 shows an example of the track assignment problem for a subHCG, where S = {a, b, c, d, e, f}, T = {1, 2, 3, 4}, and obstacles on tracks are shaded in grey (e.g., the two obstacles on tracks 3 and 4). We use a bipartite assignment graph to indicate the assignability of segments to tracks. For example, as shown in Figure 6(b), edges between vertex a and vertices 1, 2, and 3 are introduced since segment a can be assigned to track 1, 2, or 3, but not track 4. For easier implementation, we merge the subHCG and the bipartite assignment graph into a combination graph, as shown in Figure 6(c).
The CTA problem can be formulated as the Hamiltonian path problem, which has been proven to be NP-complete [11]. We resort to a heuristic for the CTA problem. Our track assignment algorithm starts by finding the maximal sets of conflicting segments. This is equivalent to finding the largest clique in the subgraph subHCG_i. Since the graph is an interval graph, the largest clique can be found efficiently. The algorithm assigns one maximal subset of conflicting segments at a time, starting from the largest clique. We choose the longest segment in the clique as the source s and assign it to the uppermost available track. Then, we choose the min-cost edge (s, i) (and thus the minimal coupling) and assign the segment associated with i to the first available track. If all tracks are occupied, we refer to the net associated with i as a failed net, which will be reconsidered at the uncoarsening stage. We repeat the procedure by finding the min-cost edge (i, j) for further processing, where j is an unvisited vertex. After the track assignment, the actual track position of a segment is known. Thus, we can perform point-to-segment maze routing to complete the routing.
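The min-cost-edge traversal at the core of this heuristic can be sketched as follows. The sketch is a simplification: it skips the clique-finding step and simply starts from the longest unvisited segment, and it treats a track as available when no segment of another net already placed on it overlaps the new one. Obstacles are ignored. All names and the example data are assumptions.

```python
def overlaps(a, b):
    """Segments are (net, left, right); touching endpoints do not overlap."""
    return min(a[2], b[2]) > max(a[1], b[1])


def greedy_track_assignment(segments, hcg, num_tracks):
    """segments: {id: (net, left, right)}; hcg: {(i, j): coupling_length}."""
    cost = {}
    for (i, j), w in hcg.items():
        cost.setdefault(i, {})[j] = w
        cost.setdefault(j, {})[i] = w
    tracks = [[] for _ in range(num_tracks)]  # track 0 is the uppermost
    placed, failed = {}, []
    unvisited = set(segments)

    def place(sid):
        seg = segments[sid]
        for t, content in enumerate(tracks):
            if all(o[0] == seg[0] or not overlaps(seg, o) for o in content):
                content.append(seg)
                placed[sid] = t
                return
        failed.append(sid)  # to be reconsidered at the uncoarsening stage

    current = None
    while unvisited:
        nbrs = {j: w for j, w in cost.get(current, {}).items() if j in unvisited}
        if nbrs:
            current = min(nbrs, key=nbrs.get)  # min-cost edge (current, j)
        else:  # start, or restart, from the longest remaining segment
            current = max(unvisited, key=lambda s: segments[s][2] - segments[s][1])
        unvisited.remove(current)
        place(current)
    return placed, failed


segs = {0: ("n1", 0, 10), 1: ("n2", 4, 12), 2: ("n3", 2, 9)}
sub_hcg = {(0, 1): 6, (0, 2): 7, (1, 2): 5}
placed, failed = greedy_track_assignment(segs, sub_hcg, num_tracks=2)
```

With only two tracks and three mutually overlapping segments, the source (segment 0) takes the uppermost track, segment 1 follows along the cheapest edge, and segment 2 cannot be placed, so its net is recorded as failed.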
Experimental Results
We have implemented our crosstalk-driven multilevel system in the C++ language on a 1 GHz SUN [...] calculate the delay for nets for those benchmarks. Therefore, we shall focus our comparative studies on the six benchmark circuits listed in Table 1.)
The design rules for wire/via widths and wire/via separation for detailed routing are the same as those used in [20]. Table 1 describes the set of benchmark circuits. In the table, "Size" gives the layout dimensions, "#Layers" denotes the number of routing layers used, and "#Nets" represents the number of two-pin connections after net decomposition. Since the results reported in [20] are better than those in [8] and [7], we shall compare our multilevel router with that in [20].
Experimental results on run-time, routing completion rate, delay, and crosstalk are listed in Tables 2 and 3, where "D_max" represents the critical path delay, "D_avg" represents the average net delay, "C_max" represents the maximum coupling length of a net, and "C_avg" represents the average coupling length. To perform experiments on timing-driven routing, we used the same resistance and capacitance parameters as those used in [20] and set the constraint ratio h used in [20] to 5.5 for comparison. (For this case, both routers have comparable routability, and thus it is easier to compare the delay and crosstalk results.) A via is modeled as the π-model circuit, with its resistance and capacitance being twice those of a wire segment. All the parameters were the same as those [...] capacitance is considered, we can obtain even better timing reduction due to the significant crosstalk reduction.
Conclusion
In this paper, we have proposed a novel framework for fast multilevel routing considering crosstalk and timing optimization. The experimental results have shown that our algorithm is very efficient and effective. Our future work lies in multilevel routing considering nanometer electrical effects.
