Abstract
Introduction
As the feature size shrinks into the ultra-deep submicron (UDSM) regime, crosstalk is becoming a new design challenge of comparable importance to area and timing. This paper focuses mainly on the crosstalk issue, specifically on the impacts of physical design and process variation on crosstalk.
In the UDSM era, interconnect delay starts to dominate circuit delay. Gate delay declines as technology progresses, but the intrinsic delay of interconnect remains the same. In addition, the growing coupling capacitance magnifies the effective loading of interconnect and induces noise that interferes with signal propagation. Moreover, interconnect may become a bottleneck to the continuation of Moore's law, which has so far accurately forecast the progress of the semiconductor industry [13]. This trend forces designers to pursue interconnect optimization. Recent literature focuses intensively on crosstalk, which mainly depends on the coupling capacitance between wires. Typical techniques to reduce the coupling capacitance include wire permutation [5], perturbation [12], and shielding [11].
On the other hand, when technology shrinks below 0.35 μm, designers have to consider the impact of subwavelength lithography, where the feature size becomes smaller than the wavelength of the light shining through the mask [8]. Subwavelength lithography can cause variations in the dimensions of components. This type of process variation may create considerable unexpected circuit behavior [11] and, in the worst case, could offset the optimization done by designers. Hence, a reliable design that is insensitive to process variation is desirable.
* The work of Iris Hui-Ru Jiang and Jing-Yang Jou was partially supported by the National Science Council of Taiwan under Grant No. NSC89-2215.E009-058.
† The work of Song-Ra Pan and Yao-Wen Chang was partially supported by the National Science Council of Taiwan under Grant No. NSC89-2215B009-055.
Considering crosstalk and process variation together, this paper first raises the issue of crosstalk sensitivity, which reflects the influence of process variation on the crosstalk in a circuit. Crosstalk sensitivity is measured by the first derivative of crosstalk with respect to wire width, so it can be considered during the wire sizing stage in post-routing optimization. In Section 3, we derive the formula for crosstalk sensitivity. Based on our formula, as technology scales down, the lower bound of crosstalk sensitivity increases quadratically, while the lower bound of crosstalk increases linearly. This fact shows that process variation affects crosstalk more than physical design does. Currently, most research attention is directed to crosstalk minimization; however, our work reveals that a reliable design with crosstalk insensitivity is also desirable in the UDSM technology.
As technology advances apace, designers are simultaneously challenged by multiple objectives, including crosstalk, crosstalk sensitivity, area, and delay. Our modeling of these objectives shows that they can be handled simultaneously by gate and wire sizing in post-routing optimization. According to our modeling, these design metrics are all in posynomial form [6], and thus the multi-objective optimization problem can be solved optimally by the Lagrangian relaxation method. Moreover, our method can easily be extended to other objectives, for example, energy in high-performance circuitry. The experimental results show that our method is effective and efficient. For instance, a circuit of 2856 gates and 5272 wires is optimized using only 46 minute runtime and 2.8 MB of memory. We note that all solutions rapidly converge to the global optimum after only two iterations by relaxing Lagrange multipliers to the critical paths of the benchmark circuits. The relaxation scheme provides a key insight into the rapid convergence in Lagrangian relaxation. To the best of the authors' knowledge, this kind of efficiency has never been reported in related previous work.
Circuit Interpretation
A digital circuit can be divided into combinational and sequential parts. After applying peripheral retiming [9], we can perform optimization on the combinational part to achieve our design objectives. Hence, this paper focuses on combinational circuits and interprets them in a way similar to that adopted in [2]. Given a combinational circuit with $s$ primary inputs, $t$ primary outputs, and $n$ gates and/or wires as shown in Figure 1(a), we construct the corresponding circuit graph as depicted in Figure 1(b). Each primary input or primary output has a corresponding input driver or output load. A component is a circuit element that may be a gate, a wire, or an input driver. A circuit graph $H = (V, E)$ is a directed acyclic graph. The set of nodes $V = G \cup W \cup R \cup \{s\} \cup \{t\}$ contains the set $G$ of gates, the set $W$ of wires, the set $R$ of input drivers, as well as two artificial nodes, the source $s$ and the sink $t$ (not to be confused with the counts $s$ and $t$ above). On the other hand, the set $E$ of edges represents the connections between nodes. An edge $(i, j)$, an ordered pair, links node $i$ to node $j$ if data flow from node $i$ to node $j$. Additional edges are added to connect the source to the input drivers and to link the primary outputs to the sink. Nodes are indexed by topological sort [3]. We set the indices of the source and the sink to $0$ and $m = n + s + 1$, respectively. In addition, $\mathrm{input}(i) = \{j \mid (j, i) \in E\}$ and $\mathrm{output}(i) = \{j \mid (i, j) \in E\}$.
Figure 2 shows the analytical models for gates and wires used throughout this paper. We choose the model [11] to approximate wire behavior. For a gate $i$ of size $x_i$, the resistance is $r_i = \hat{r}_i / x_i$ and the capacitance is $c_i = \hat{c}_i x_i$, where $\hat{r}_i$ and $\hat{c}_i$ are its unit-size resistance and capacitance. For a wire $j$ of size $x_j$, the resistance is $r_j = \hat{r}_j / x_j$ and the capacitance is $c_j = \hat{c}_j x_j + f_j + 2Cc_j$, where $\hat{r}_j$ is its unit-size resistance, and $\hat{c}_j$, $f_j$, and $2Cc_j$ are its unit-size, fringing, and worst-case coupling capacitance, respectively. We detail the calculation of the coupling capacitance in Section 3.1.
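The circuit-graph construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `build_circuit_graph`, the node names, and the toy circuit (two input drivers, one gate, one wire) are all hypothetical, and Kahn's algorithm is used here as one common way to obtain a topological order.

```python
from collections import defaultdict, deque

def build_circuit_graph(edges):
    """Return (order, inputs, outputs) for a DAG given as (i, j) edge pairs."""
    inputs, outputs = defaultdict(set), defaultdict(set)
    nodes = set()
    for i, j in edges:
        outputs[i].add(j)   # output(i) = {j | (i, j) in E}
        inputs[j].add(i)    # input(j)  = {i | (i, j) in E}
        nodes.update((i, j))
    # Kahn's algorithm: repeatedly emit nodes with no unprocessed predecessors.
    indegree = {v: len(inputs[v]) for v in nodes}
    ready = deque(sorted(v for v in nodes if indegree[v] == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in sorted(outputs[v]):
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(nodes):
        raise ValueError("circuit graph must be acyclic")
    return order, inputs, outputs

# Hypothetical toy circuit: source -> two drivers -> one gate -> one wire -> sink.
edges = [("src", "drv1"), ("src", "drv2"), ("drv1", "gate"),
         ("drv2", "gate"), ("gate", "wire"), ("wire", "sink")]
order, inputs, outputs = build_circuit_graph(edges)
index = {name: k for k, name in enumerate(order)}  # topological index of each node
```

With two input drivers and two gate/wire components, the source receives index 0 and the sink receives index 5, which agrees with $m = n + s + 1$ for $n = 2$ and $s = 2$.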
Later, by incorporating the coupling capacitance into the wire capacitance, we can directly consider the coupling effect on delay, and even on power. In addition, an input driver $i$, $1 \le i \le s$, is treated as a gate whose $x_i$ is always 1 and whose $\hat{r}_i$ equals $R^D_i$. With the circuit model, a combinational circuit is transformed into a network of resistors and capacitors, as illustrated in Figure 3. In the transformed circuit, for $1 \le i \le n + s$, upstream($i$) is the set of all the nodes except $i$ on $i$'s upstream paths; similarly, downstream($i$) is the set of $i$ and all the nodes on its downstream paths. The Elmore delay [4] is used as the delay of a component; the delay $D_i$ of node $i$ is $r_i C_i$, where $C_i$ is the downstream capacitance. In the circuit graph $H$ of a circuit, each node $i$ is associated with all the above parameters. Thus we can optimize a circuit by manipulating the corresponding circuit graph.
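As a rough illustration of the delay model, the following Python sketch evaluates the Elmore delay of each component on a small chain. It is a simplification under stated assumptions: every capacitance is lumped at its node, the downstream capacitance of a node is taken as its own capacitance plus that of everything it drives, and all names and numbers (`wire_rc`, `downstream_caps`, the parameter values) are hypothetical.

```python
def wire_rc(x, r_hat, c_hat, fringe, coupling):
    """Resistance and capacitance of a wire of width x (formulas from the text)."""
    return r_hat / x, c_hat * x + fringe + 2.0 * coupling

def downstream_caps(children, cap, root):
    """C_i = cap[i] + sum of C_j over the nodes j driven by i (assumed lumped model)."""
    C = {}
    def visit(i):
        C[i] = cap[i] + sum(visit(j) for j in children.get(i, ()))
        return C[i]
    visit(root)
    return C

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
r1, c1 = wire_rc(x=1.0, r_hat=2.0, c_hat=1.0, fringe=0.5, coupling=0.25)  # r = 2, c = 2
r2, c2 = wire_rc(x=2.0, r_hat=2.0, c_hat=1.0, fringe=0.5, coupling=0.25)  # r = 1, c = 3
res = {"drv": 4.0, "w1": r1, "w2": r2}
cap = {"drv": 0.0, "w1": c1, "w2": c2, "load": 1.0}
children = {"drv": ["w1"], "w1": ["w2"], "w2": ["load"]}

C = downstream_caps(children, cap, "drv")
delay = {i: res[i] * C[i] for i in res}   # Elmore delay D_i = r_i * C_i
path_delay = sum(delay.values())          # delay along the single driver-to-load path
```

Widening the second wire from 1 to 2 halves its resistance but raises its capacitance, which in turn increases the delay of every upstream component; this coupling between sizes is what the sizing problem has to balance.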
Crosstalk and Crosstalk Sensitivity
In this section, we discuss the formulas for crosstalk and crosstalk sensitivity. In this paper, the coupling capacitance is used as the measure of crosstalk. Figure 4 gives an instance where a coupling capacitance exists between two parallel wire segments $i$ and $j$, which may belong to different routing trees. The coupling capacitance $c_{ij}$ between two neighboring wires $i$ of size $x_i$ and $j$ of size $x_j$ is directly proportional to the overlap length $l_{ij}$ and inversely proportional to the center-to-center distance $d_{ij}$.
Crosstalk-Coupling Capacitance
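As can be seen in Equation (1), wire sizing affects crosstalk, thus causing variation in delay and disturbing signal integrity. Equation (1) is the product of two terms:

$$c_{ij} = \frac{\hat{f}_{ij}\, l_{ij}}{d_{ij}} \left(1 - \frac{x_i + x_j}{2 d_{ij}}\right)^{-1}. \qquad (1)$$

The first term, $\hat{f}_{ij} l_{ij} / d_{ij}$, is a constant extracted from the technology files and the circuit design, while the second term, $\left(1 - (x_i + x_j)/2d_{ij}\right)^{-1}$, can be varied by wire sizing. Letting $x = (x_i + x_j)/2d_{ij}$, the second term becomes $(1 - x)^{-1}$ with $0 < x < 1$.
It can be seen that $c_{ij}$ is a positive quantity, and its lower bound $\underline{c}_{ij}$ is $\hat{f}_{ij} l_{ij} / d_{ij}$.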
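The two terms of the coupling capacitance can be evaluated numerically with a small Python sketch. The function names and the geometry values are hypothetical, and the series form is an assumption on our part: since $0 < x < 1$, the factor $(1 - x)^{-1}$ equals the geometric series $1 + x + x^2 + \cdots$, whose terms are all positive, which is one way to see why the expression fits a posynomial framework.

```python
def coupling_cap(f_hat, l, d, xi, xj):
    """c_ij = (f_hat * l / d) * (1 - (xi + xj) / (2 d))**-1, the product of the two terms."""
    x = (xi + xj) / (2.0 * d)
    if not 0.0 < x < 1.0:
        raise ValueError("need 0 < (xi + xj) / (2 d) < 1")
    return (f_hat * l / d) / (1.0 - x)

def coupling_cap_series(f_hat, l, d, xi, xj, terms):
    """Same quantity via the truncated geometric series 1 + x + x**2 + ..."""
    x = (xi + xj) / (2.0 * d)
    return (f_hat * l / d) * sum(x ** k for k in range(terms))

# Hypothetical geometry in arbitrary units.
f_hat, l, d = 0.1, 100.0, 2.0
lower_bound = f_hat * l / d                     # value approached as the widths shrink
narrow = coupling_cap(f_hat, l, d, 0.4, 0.4)    # x = 0.2
wide = coupling_cap(f_hat, l, d, 1.0, 1.0)      # x = 0.5
```

For this geometry, the capacitance stays above the lower bound for any admissible widths and grows as the wires are widened at a fixed center-to-center distance.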
The neighborhood $N(i)$ of wire $i$ is defined as the set of its adjacent wires; the dominating index $I(i)$ of $N(i)$ of wire $i$ is defined as the set of adjacent wires with indices greater than $i$. For instance, if wires 7 and 4 are adjacent to wire 5, then $N(5) = \{7, 4\}$ and $I(5) = \{7\}$.
In Equation (3), we consider the worst-case coupling capacitance $2Cc_i$. If the switching behavior of wires is available, the worst-case coupling capacitance $2Cc_i$ in the wire capacitance $c_i$ can be replaced by the effective coupling capacitance [7]. On the other hand, by incorporating the coupling capacitance into the wire capacitance, we can directly consider the coupling effect on delay, and even on power. Note that Equation (3) is posynomial (positive polynomial) [6], an important property that guarantees the optimality of our algorithm.
Crosstalk Sensitivity
As indicated in Figure 4, the influences of $x_i$ and $x_j$ on $c_{ij}$ are in the same direction: the larger $x_i$ and $x_j$, the larger $c_{ij}$. Consequently, the crosstalk sensitivity $\varsigma_{ij}$ of $c_{ij}$ is defined as the superposition of the first derivatives of $c_{ij}$ with respect to $x_i$ and $x_j$.
Equation (4) reveals that the crosstalk sensitivity $\varsigma_{ij}$ is also a positive quantity; moreover, its lower bound $\underline{\varsigma}_{ij}$ is $\hat{f}_{ij} l_{ij} / d_{ij}^{2}$. The crosstalk sensitivity lower bound $\underline{\varsigma}_{ij}$ is proportional to the inverse square of the wire spacing. Thus process variation affects crosstalk in a quadratic fashion. Crosstalk sensitivity should be an increasingly important design metric in the UDSM regime.
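The stated lower bound can be checked by differentiating the coupling capacitance directly. The derivation below is a sketch that assumes the product form of $c_{ij}$ described in the preceding subsection and writes the crosstalk sensitivity as $\varsigma_{ij}$; it is consistent with the bound $\hat{f}_{ij} l_{ij} / d_{ij}^{2}$ quoted above, but it is reconstructed from the surrounding text rather than copied from the original equation.

```latex
\begin{align*}
c_{ij} &= \frac{\hat{f}_{ij}\, l_{ij}}{d_{ij}}
          \left(1 - \frac{x_i + x_j}{2 d_{ij}}\right)^{-1}, \\
\frac{\partial c_{ij}}{\partial x_i} = \frac{\partial c_{ij}}{\partial x_j}
       &= \frac{\hat{f}_{ij}\, l_{ij}}{2 d_{ij}^{2}}
          \left(1 - \frac{x_i + x_j}{2 d_{ij}}\right)^{-2}, \\
\varsigma_{ij} = \frac{\partial c_{ij}}{\partial x_i} + \frac{\partial c_{ij}}{\partial x_j}
       &= \frac{\hat{f}_{ij}\, l_{ij}}{d_{ij}^{2}}
          \left(1 - \frac{x_i + x_j}{2 d_{ij}}\right)^{-2}
          \;\ge\; \frac{\hat{f}_{ij}\, l_{ij}}{d_{ij}^{2}}.
\end{align*}
```

The inequality holds because $0 < (x_i + x_j)/2d_{ij} < 1$, so the bracketed factor lies in $(0, 1)$ and its inverse square is at least 1.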
The Gate and Wire Sizing Problem
A generic optimization problem in the gate and wire sizing stage is described as follows. Problem M tries to minimize the critical delay Dmax under timing, crosstalk, crosstalk sensitivity, area, as well as sizing constraints. The following will detail the objective and these constraints.
The Objective and Constraints
To avoid traversing all paths (whose number may grow exponentially in the graph size), each node $i$, $1 \le i \le m$, is associated with its arrival time $a_i$. The objective is to minimize the critical delay, which is equivalent to minimizing the arrival time $a_m$ of the sink. The timing relationships between components are subject to the timing constraints. The crosstalk for each pair of adjacent wires $i$ and $j$ is bounded by $X^B_{ij}$; by Equation (1), this bound yields Inequality (5).
As can be seen, the crosstalk bound $X^B_{ij}$ must be larger than the crosstalk lower bound $\underline{c}_{ij}$. If technology is scaled down, $\underline{c}_{ij}$ will increase in a linear fashion.
On the other hand, for each pair of adjacent wires $i$ and $j$, the crosstalk sensitivity is constrained by $S^B_{ij}$; by Equation (4), this bound yields Inequality (6).
The crosstalk sensitivity bound $S^B_{ij}$ needs to be greater than the crosstalk sensitivity lower bound $\underline{\varsigma}_{ij}$. As technology advances, $\underline{\varsigma}_{ij}$ increases in a quadratic fashion. In other words, the crosstalk sensitivity issue may become more significant than the crosstalk one in the UDSM era. Inequalities (5) and (6) together restrict the sizes of adjacent wires. We summarize how the aforementioned design characteristics change as technology scales down $U$ times. Table 1 reveals the beneficial effects of scaling: the speed of gates increases in a linear fashion, while the area declines in a quadratic fashion. In contrast, the table also indicates the harmful impacts of scaling: the delay of wires does not decline; moreover, the crosstalk grows in a linear fashion, and the crosstalk sensitivity increases in a quadratic fashion. As shown in Table 1, crosstalk sensitivity is a newcomer after crosstalk and may play an even more important role in future technologies.
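To make the crosstalk and crosstalk sensitivity bounds concrete, the Python sketch below converts them into a limit on the combined width of two adjacent wires. It assumes the product form of the coupling capacitance and its derivative-based sensitivity, solved for $x_i + x_j$; the function name `max_width_sum` and the numbers are hypothetical, and the exact form of the paper's inequalities may differ.

```python
import math

def max_width_sum(f_hat, l, d, xtalk_bound, sens_bound):
    """Largest x_i + x_j meeting both the crosstalk and the sensitivity bound."""
    c_low = f_hat * l / d          # crosstalk lower bound
    s_low = f_hat * l / d ** 2     # crosstalk sensitivity lower bound
    if xtalk_bound <= c_low or sens_bound <= s_low:
        raise ValueError("each bound must exceed its lower bound")
    # c_low / (1 - u) <= X^B  with  u = (x_i + x_j) / (2 d)
    by_xtalk = 2.0 * d * (1.0 - c_low / xtalk_bound)
    # s_low / (1 - u)**2 <= S^B
    by_sens = 2.0 * d * (1.0 - math.sqrt(s_low / sens_bound))
    return min(by_xtalk, by_sens)

# Hypothetical geometry and bounds in arbitrary units.
limit = max_width_sum(f_hat=0.1, l=100.0, d=2.0, xtalk_bound=10.0, sens_bound=40.0)
```

In this example the crosstalk bound is the binding one; tightening `sens_bound` toward its lower bound would make the sensitivity constraint dominate instead, and a bound at or below its lower bound is infeasible for any wire widths.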
Problem Formulation
We substitute the objective and constraints derived in the preceding subsection for those in Problem M as follows. 
Lagrangian Relaxation
As given in Problem P, the objective and constraints are in posynomial (positive polynomial) forms [6] . This property guarantees Problem P can optimally be solved by the Lagrangian relaxation method. We relax the constraints into the objective function by introducing one Lagrange multiplier to each constraint.
Let $x = (x_{s+1}, \ldots, x_{n+s})$ and $a = (a_1, \ldots, a_m)$.
where Di = riCi and Ci is i's downstream capacitance.
For any vector of Lagrange multipliers satisfying the optimality conditions in Theorem 1, the corresponding Lagrangian relaxation subproblem LRS of Problem P is formulated as follows.
Proof:
$C'_i$ is the portion of the downstream capacitance $C_i$ that is independent of the size $x_i$. Equation (7) can be rewritten in terms of $C'_i$. The minimum of $L$ occurs when $\partial L / \partial x_i \, (x) = 0$; thus the theorem follows. □
It can be shown that there exists a vector of Lagrange multipliers such that the optimal solution of LRS is also the optimal solution of the original problem P. The problem to find such a vector is the Lagrangian dual problem:
LDP: Maximize $\min_{x} L(x)$ subject to the Lagrange multipliers satisfying the optimality conditions.
By Theorems 1 and 2, we propose the OS algorithm shown in Figure 5 to solve Problem LDP optimally. At the beginning, A1 sets two scalar parameters to arbitrary positive numbers and initializes the Lagrange multipliers to an arbitrary positive vector satisfying the optimality conditions. In A2, the quantity needed by the subproblem is then calculated from the current multipliers. A3 solves the Lagrangian relaxation subproblem LRS. The Lagrange multipliers are then adjusted by the subgradient method in A4. A5 projects the new multipliers onto the nearest point satisfying the optimality conditions; our projection scheme relaxes the Lagrange multipliers to the critical paths. The experimental results show that this projection strategy leads to very fast convergence. A6 updates the iteration counter. We repeat the above process until the solution converges within the error bound (see A7).
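The interplay between solving the relaxed subproblem and adjusting the multipliers by the subgradient method can be illustrated on a deliberately tiny problem. The Python sketch below is not the OS algorithm: it sizes a single component, has one constraint and one multiplier, and omits the projection onto the optimality conditions. The function name, the constant step size, and the numbers are assumptions made for illustration only.

```python
import math

def size_by_lagrangian_relaxation(rc, t_bound, lam=0.25, step=0.5,
                                  tol=1e-9, max_iter=1000):
    """Minimize width x subject to the delay constraint rc / x <= t_bound."""
    x = math.sqrt(lam * rc)
    for k in range(1, max_iter + 1):
        # Subproblem: minimize x + lam * (rc / x - t_bound) over x > 0.
        # Setting the derivative 1 - lam * rc / x**2 to zero gives a closed form.
        x = math.sqrt(lam * rc)
        violation = rc / x - t_bound           # subgradient of the dual at lam
        new_lam = max(1e-12, lam + step * violation)  # keep the multiplier positive
        if abs(new_lam - lam) < tol:
            return x, new_lam, k
        lam = new_lam
    return x, lam, max_iter

# Hypothetical instance: delay rc / x with rc = 4 must not exceed 2.
x_opt, lam_opt, iterations = size_by_lagrangian_relaxation(rc=4.0, t_bound=2.0)
```

For this instance the constrained optimum is the smallest width that meets the delay bound, $x = rc / t_{bound}$, and the multiplier settles at $rc / t_{bound}^2$. The multiplier rises while the delay constraint is violated and falls once it is over-satisfied, which is the behavior the subgradient step in A4 relies on.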
Subroutine: LRS (Lagrangian Relaxation Subroutine)

Extension to Other Objectives
Our formulation can also be extended to other objectives, such as energy in high-performance circuitry. In this section, we demonstrate how an energy constraint can be incorporated into our formulation; extensions to other objectives can be handled similarly. For a given technology file and circuit, energy is the product of power consumption and delay, which is generally a constant. The resulting energy inequality is also in posynomial form; thus, without loss of optimality, it can be incorporated into the optimization problem solved in the preceding section. The energy constraint is assigned its own Lagrange multiplier, and the quantity $opt_i$ in Theorem 2 is modified accordingly.
Experimental Results
We implemented our algorithm and tested it on the MCNC93 benchmark circuits on a SUN UltraSPARC II 300 workstation. The technology parameters used in our experiments are as follows. The supply voltage is 2.5 V. The resistance and capacitance of a unit-width inverter are 4.73 kΩ and 8.8 fF, respectively, and the resistance, capacitance, and fringing capacitance of a unit-width wire are 5.3 Ω, 2.06 fF, and 102.6 fF, respectively. The lower and upper bounds on gate size are 0.36 μm and 5 μm, while those on wire size are 0.36 μm and 1.8 μm. Table 2 lists the names (Ckt Name) of the circuits, numbers of gates (#G) and wires (#W) in the circuits, total numbers of components (All), crosstalk (Xtalk), crosstalk sensitivity (Xtalk Sens.), area (Area), delay (Delay), numbers of iterations (ite), runtimes (time, measured in minute:second), and storage requirements (mem). The improvement Imprv (X) is calculated as Initial/Final.
The experimental results show that our method is effective and efficient. Table 2 reveals that while crosstalk is improved by a factor of 3.6 on average, crosstalk sensitivity is improved by a factor of 9.41 on average. The respective improvements in area and delay are factors of 3.14 and 11.97. Further, our method converges very fast and its storage requirement is quite small. For instance, a circuit of 2856 gates and 5272 wires is optimized using 46 minute runtime and 2.8 MB of memory. We note that all solutions rapidly converge to the global optimum after only two iterations by relaxing Lagrange multipliers to the critical paths of the circuits. To the best of the authors' knowledge, this kind of efficiency has never been reported in related previous work.
Conclusion
This paper has raised a new issue, crosstalk sensitivity, which is an important new design metric in the UDSM technology. We have optimally solved a multi-objective optimization problem by Lagrangian relaxation. The experimental results show that our method is very efficient and effective. Our projection scheme, relaxing Lagrange multipliers to the critical paths, provides a crucial insight into effectively adjusting Lagrange multipliers, which is a key ingredient of Lagrangian relaxation.
