Abstract: Noise, as well as area, delay, and power, is one of the most important concerns in the design of deep submicrometer integrated circuits. Existing algorithms do not account for the simultaneous switching of signals when minimizing noise. In this paper, we model not only the physical coupling capacitance but also simultaneous switching behavior for noise optimization. Based on Lagrangian relaxation, we present an algorithm that optimally solves the simultaneous noise, area, delay, and power optimization problem by sizing circuit components. Our algorithm, with linear memory requirement and linear runtime, is very effective and efficient. For example, for a circuit of 6144 wires and 3512 gates, our algorithm solves the simultaneous optimization problem using only 2.1 MB of memory and 19.4 min of runtime, achieving a precision within 1% error, on a Sun SPARC Ultra-I workstation.
I. INTRODUCTION

With decreasing feature sizes, higher clock rates, and increasing interconnect densities, noise is becoming a concern of importance comparable to power, area, and timing in integrated circuits [22], [23]. While power, area, and timing have been extensively discussed in the recent literature (e.g., [3]-[7], [10], [18], and [20]), relatively little work has been done on noise.
Noise profoundly affects the performance of a circuit, especially in the deep submicrometer regime. Noise is an unwanted variation that makes the behavior of a manufactured circuit deviate from the expected response [19]. The deleterious influences of noise can be classified into two categories: malfunction, which makes the logic values of nodes differ from the desired values, and timing change, which is caused by switching behavior.
In general, crosstalk is a type of noise introduced by unwanted coupling between a node and a neighboring wire or between two neighboring wires. For example, two adjacent wires form a coupling capacitor and a mutual inductor, so a voltage or current change on one wire can interfere with the signal on the other. The inductive effects [15], [17] must be considered as circuit frequencies increase above 500 MHz.
So far, the typical strategies to minimize the on-chip inductance are shielding wires and/or shielding layers. The inductive effects are beyond the scope of this paper.
In this paper, we focus on the capacitive effects of crosstalk. We refer to the capacitance created by the physical geometry as the physical coupling capacitance. The physical coupling capacitance is directly proportional to the overlap length of adjacent wires and inversely proportional to the distance between them. Existing literature handles only the physical coupling capacitance. Various heuristics and techniques have been proposed to minimize the overlap length or to maximize the distance between wires, including track permutation [12], [13] and wire spacing [21], [24], [26].
In fact, coupling capacitance is determined not only by physical geometry but also by switching conditions [16]. The influence of switching conditions can be explained by the Miller and anti-Miller effects [2]. Assume that the physical coupling capacitance between two neighboring wires is $C_c$. The Miller effect occurs when the adjacent wires switch in opposite directions; in this case, the equivalent coupling capacitance is $2C_c$. In contrast, the anti-Miller effect occurs when the adjacent wires switch in the same direction; in this case, the equivalent coupling capacitance is zero. In other words, coupling is not always undesirable. Under the anti-Miller effect, the wires are charged or discharged by the currents from all drivers, so their transitions can be shortened and their logic values become stable earlier. If two wires have a very large physical coupling capacitance but the same switching behavior, the inter-wire crosstalk can be very small. Hence, considering only the Miller effect is often too pessimistic. However, the anti-Miller effect is hard to account for because of its uncertainty. Although some previous work has mentioned this problem, no work has solved it so far.
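As a rough illustration of the two limiting cases above, the following Python sketch scales a physical coupling capacitance by a switching factor. The encoding of transitions as +1 (rising), -1 (falling), and 0 (quiet), and the factor of 1 when the neighbor is quiet, are assumptions made for illustration; the function name `effective_coupling` is hypothetical.

```python
def effective_coupling(c_phys: float, dir_i: int, dir_j: int) -> float:
    """Miller-factor estimate of the coupling capacitance seen by wire i.

    dir_i, dir_j: +1 for a rising transition, -1 for falling, 0 if quiet
    (an encoding assumed for this sketch).
    """
    if dir_i not in (-1, 0, 1) or dir_j not in (-1, 0, 1):
        raise ValueError("direction must be -1, 0, or +1")
    # Opposite directions -> factor 2 (Miller effect);
    # same direction      -> factor 0 (anti-Miller effect);
    # one wire quiet      -> factor 1 (plain physical coupling).
    return c_phys * (1 - dir_i * dir_j)


if __name__ == "__main__":
    for dirs in [(1, -1), (1, 1), (1, 0)]:
        print(dirs, effective_coupling(10.0, *dirs))
```

The single expression `1 - dir_i * dir_j` covers all three cases, which hints at why a correlation-style similarity measure between waveforms is a natural way to summarize switching behavior over time.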
In this paper, we model not only the physical coupling capacitance but also simultaneous switching behavior for crosstalk optimization. We first consider a more accurate model of the crosstalk between wires $i$ and $j$:
For this model, we propose a two-stage strategy to minimize the crosstalk in a circuit. In the first stage, we use geometric wire ordering to place wires with similar switching behavior in close proximity; this Switching Dissimilarity problem is equivalent to the minimum-weight Hamiltonian path problem in a complete graph, which is NP-hard. Therefore, we resort to heuristics for the Switching Dissimilarity problem. In the second stage, we minimize the inter-wire physical coupling capacitance by sizing wires. We formulate the constraints for physical coupling capacitance in a posynomial (positive polynomial) form [14], which can be solved optimally by Lagrangian relaxation.
0278-0070/00$10.00 © 2000 IEEE
The second stage not only deals with the crosstalk problem but also optimizes area, power, and delay by sizing gates and wires. Gate and wire sizing has been extensively studied in the literature for optimizing area, power, and/or delay (e.g., [3]-[7]). In previous work, Lagrangian relaxation has been shown to be an effective approach for simultaneous performance optimization [4]-[6], which encourages us to adopt it for our problem. In this paper, based on Lagrangian relaxation, we present an algorithm that optimally solves the simultaneous crosstalk, area, power, and delay optimization problem by sizing circuit components. Our algorithm, with linear memory requirement and linear runtime, is very effective and efficient. For example, for a circuit of 6144 wires and 3512 gates, our algorithm solves the simultaneous optimization problem using only 2.1 MB of memory and 19.4 min of runtime, achieving a precision within 1% error, on a Sun SPARC Ultra-I workstation.
The remainder of this paper is organized as follows. Section II presents a circuit model and the problem description. Section III details the crosstalk modeling, including coupling capacitance and simultaneous switching. In Section IV, based on Lagrangian relaxation, we propose an algorithm to minimize the total area under noise, power, and delay constraints. Section V shows the experimental results, and Section VI gives concluding remarks.
II. CIRCUIT MODELING AND PROBLEM DESCRIPTION
In this section, we introduce the representation of a circuit and some notation used throughout the paper, present circuit and delay models, and formulate a performance optimization problem.
A. Circuit Representation
A digital circuit can be partitioned into two parts: combinational and sequential. Its performance can be improved by optimizing the combinational part. For example, to increase the working frequency, we must minimize the clock period, which can be achieved by minimizing the delay of the critical path in each combinational subcircuit between two latch elements. Hence, we focus on combinational circuits. Our interpretation of a circuit is similar to that used in [5].
Consider a combinational circuit with primary inputs, primary outputs, and gates/wires whose sizes can be changed according to our objectives. Each primary input has a corresponding input resistor as its input driver. Similarly, each primary output has a corresponding output capacitor as its output load. Fig. 1 depicts a combinational circuit with three input drivers and one output load.
A component is a circuit element, which can be a gate, a wire, or an input driver; an input driver is treated as a gate. A node is located at the output of a component; it either connects two components or links a primary output to an output load. Because every node corresponds to a distinct component, a circuit has as many nodes as components. To manipulate the circuit conveniently, we construct a circuit graph; Fig. 2 illustrates the circuit graph of the circuit given in Fig. 1.

A circuit graph $G = (V, E)$ is a directed acyclic graph. Its node set $V$ consists of the nodes corresponding to the components plus two artificial nodes: the source $s$, connected to every input driver, and the sink $t$, linked to all output loads. Thus, $V$ contains the set of gates, the set of wires, the set of input drivers, the source $s$, and the sink $t$. The nodes are indexed such that if node $i$ is an input of node $j$, then $i < j$. Since the graph is acyclic, such an indexing can be obtained by topological sorting [9] in time linear in the graph size. Hence, the source has index zero and the sink has the largest index; every other index refers to a gate, a wire, or an input driver.

The edge set $E$ expresses the connections between nodes. An edge $(i, j)$, an ordered pair, connects node $i$ to node $j$ if data flow from node $i$ to node $j$. Additional edges connect the source to the input drivers and the primary outputs to the sink. The relationship between parents and children is defined by these edges: node $j$ is a parent (input) of node $i$ if and only if $(j, i) \in E$, in which case node $i$ is a child (output) of node $j$.

Fig. 1. A combinational circuit with three input drivers, seven wires, three gates, and one output load, in which the gate and wire sizes can be varied for optimization.

(Figure caption, partially lost: the delay lumped in a resistor r is computed as r times its downstream capacitance C; for example, the delay of node 2 is its resistance times the total capacitance of the capacitors in the shaded area.)
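The topological indexing described above can be sketched with Kahn's algorithm, which runs in $O(|V| + |E|)$ time. This is a minimal Python sketch, not the authors' implementation; the node names `s`, `t`, `d1`, `w1`, `g1`, and so on form a hypothetical example graph. The source receives index 0 because it is the only node without incoming edges, and the sink receives the largest index because every other node has a path to it.

```python
from collections import deque


def topological_index(nodes, edges):
    """Return {node: index} such that an edge u -> v implies index[u] < index[v].

    Kahn's algorithm: repeatedly number a node whose fan-ins are all numbered.
    """
    indeg = {v: 0 for v in nodes}
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(v for v in nodes if indeg[v] == 0)
    index = {}
    while ready:
        u = ready.popleft()
        index[u] = len(index)          # next free index
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(index) != len(nodes):
        raise ValueError("graph has a cycle")
    return index


# Hypothetical circuit graph: source -> two drivers -> wires -> gate -> wire -> sink.
nodes = ["s", "d1", "d2", "w1", "w2", "g1", "w3", "t"]
edges = [("s", "d1"), ("s", "d2"), ("d1", "w1"), ("d2", "w2"),
         ("w1", "g1"), ("w2", "g1"), ("g1", "w3"), ("w3", "t")]
idx = topological_index(nodes, edges)
print(idx)
```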
B. Circuit and Delay Models
To analyze a circuit, we model its elements by analyzable electrical components such as resistors and capacitors. Fig. 3 illustrates the gate and wire models used in this paper. For a gate $i$ of size $x_i$, the resistance is $r_i = \hat{r}_i / x_i$ and the capacitance is $c_i = \hat{c}_i x_i$, where $\hat{r}_i$ and $\hat{c}_i$ are the resistance and capacitance of a gate of unit size, respectively. The resistance of an input driver is equal to its input resistor. As in [5], the intrinsic gate delay is ignored in this model for simplicity. To account for it, we could attach a self-loading capacitance at the output node of each gate; this capacitance can be approximated in terms of the number of inputs of the gate. Note that the derivations of Theorems 4-7 remain the same if the intrinsic gate delay is considered, and the corresponding properties still hold. We adopt the wire model of [19]. For a wire $i$ of size $x_i$, the resistance is $r_i = \hat{r}_i / x_i$, and the capacitance consists of the area term $\hat{c}_i x_i$, the fringing capacitance $f_i$, and the coupling capacitance of wire $i$, where $\hat{r}_i$ and $\hat{c}_i$ are the resistance and capacitance of a wire of unit size, respectively. Section III-B details the coupling capacitance; the coupling term in the wire capacitance represents the coupling capacitance of wire $i$ in the worst case. By incorporating the coupling capacitance into the wire capacitance, this wire model accounts for the impact of crosstalk on delay and power.
With the gate and wire models, a combinational circuit can be transformed into a network of resistors and capacitors. Fig. 4 illustrates the resulting model for the circuit shown in Fig. 1. In the transformed circuit, the upstream set of each node $i$ is the proper set (all elements distinct) containing all the nodes other than $i$ on the paths from node $i$ to all reachable drivers; similarly, the downstream set of node $i$ is the proper set containing all the nodes on the paths from node $i$ to all reachable loads. Fig. 4 gives an example of these sets. We adopt the Elmore delay model [11] to compute the delays of gates and wires. The delay of node $i$ is $D_i = r_i C_i$, where $C_i$ is the downstream capacitance of node $i$, including self-loading. Note that the upstream resistance of node $i$ used here is replaced by a weighted upstream resistance in Section IV.
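To make the Elmore computation concrete, here is a small Python sketch for a lumped RC tree in which each node $i$ has a resistor $r_i$ toward its parent and a grounded capacitor $c_i$. This lumped tree, rather than the paper's wire model, and the node names and values are simplifying assumptions. The delay lumped in each resistor is $r_i C_i$, where $C_i$ is the total capacitance downstream of it, and the Elmore delay to a sink is the sum of these terms along the root-to-sink path.

```python
def downstream_caps(children, cap, root):
    """Return C_i, the total capacitance in the subtree rooted at each node."""
    total = {}
    stack = [(root, False)]
    while stack:                       # iterative post-order traversal
        node, visited = stack.pop()
        kids = children.get(node, [])
        if visited:                    # all children already summed
            total[node] = cap[node] + sum(total[k] for k in kids)
        else:
            stack.append((node, True))
            stack.extend((k, False) for k in kids)
    return total


def elmore_delays(children, res, cap, root):
    """Delay lumped in each resistor: D_i = r_i * C_i."""
    C = downstream_caps(children, cap, root)
    return {i: res[i] * C[i] for i in C}


def path_delay(parent, delays, sink):
    """Elmore delay from the root to `sink`: sum of D_i along the path."""
    total, node = 0.0, sink
    while node is not None:
        total += delays[node]
        node = parent[node]
    return total


# Hypothetical net: driver d feeds wire w1, which branches to w2 and w3.
children = {"d": ["w1"], "w1": ["w2", "w3"]}
parent = {"d": None, "w1": "d", "w2": "w1", "w3": "w1"}
res = {"d": 100.0, "w1": 10.0, "w2": 5.0, "w3": 5.0}
cap = {"d": 0.0, "w1": 2.0, "w2": 1.0, "w3": 3.0}
D = elmore_delays(children, res, cap, "d")
print(D, path_delay(parent, D, "w3"))
```

Here the driver sees all 6.0 units of capacitance, so it contributes 600.0 to every sink's delay, while the branch resistors only see their own subtrees.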
In the circuit graph, each node is tagged with attributes including its size, node type (gate, wire, input driver, or source/sink), unit-width resistance, unit-width capacitance, fringing capacitance (zero for nodes that are not wires), and information about coupling capacitance, detailed in Section III. Thus, we optimize a circuit by manipulating its circuit graph rather than the transformed RC network.
C. Problem Description
In practice, area is often the greatest concern in circuit design. This paper aims to minimize area subject to noise, timing, and power constraints. Let $A$, $X$, $D$, and $P$ denote the total area, the total crosstalk, the delay on the critical path, and the total power of the circuit, respectively, and let $X_0$, $D_0$, and $P_0$ denote the upper bounds on the total crosstalk, the critical-path delay, and the total power, respectively. A generic formulation of this problem is as follows:
Minimize $A$ subject to $X \le X_0$, $D \le D_0$, and $P \le P_0$.
In Section IV, we will give more detailed problem definitions and present our algorithms for the problem.
III. CROSSTALK MODELING
In the preceding section, we introduced preliminaries for representing and interpreting a circuit. In this section, we focus on the crosstalk problem briefly described in Section I. We compute the physical coupling capacitance between two wires $i$ and $j$ using the model mentioned in Section I.
We deal in turn with the two crucial factors that affect crosstalk: switching behavior and physical coupling capacitance.
A. Switching Behavior
For two adjacent wires with coupling capacitance between them, when one wire switches, current may flow through the coupling capacitance to the other wire, interfering with its signal. In the worst case, the two wires switch simultaneously in opposite directions, and the transitions on these wires take longer than expected. This phenomenon, called the Miller effect [2], resembles the effect of a large load. In contrast, the anti-Miller effect benefits the transitions: when two neighboring wires toggle in the same direction, they help each other, and the transition time is reduced. This phenomenon resembles the effect of a small load.
To take advantage of switching conditions for crosstalk minimization, we analyze the switching behavior of signals. In real applications, switching information can be obtained during the logic simulation stage or from patterns in previous designs. When analyzing switching behavior, we first assume that each gate or wire has the minimum size or another size extracted from profiles. The similarity of switching behavior between two wires $i$ and $j$ is then defined as
$$S_{ij} = \frac{1}{T} \int_0^T v_i(t)\, v_j(t)\, dt,$$
where $T$ is the simulation duration and $v_i(t)$ is the normalized waveform of wire $i$ at time $t$: $v_i(t) = 1$ if node $i$ is high, and $v_i(t) = -1$ if node $i$ is low. For any two wires $i$ and $j$, $-1 \le S_{ij} \le 1$. The closer $S_{ij}$ is to $-1$, the less similar their behavior; the closer it is to 1, the more similar their behavior. Wires with the most similar switching behavior are assigned to closer tracks to minimize the effective loading. We can show that the problem of minimizing the effective loading is equivalent to a graph-theoretic one. We build a complete graph $G'$ with one node for each of the $n$ wires, and every edge $(i, j)$ is associated with a weight equal to the effective loading between wires $i$ and $j$. An ordering is a sequence $\pi = (\pi_1, \ldots, \pi_n)$ of all the nodes, and the total effective loading between neighboring wires is the sum of the weights of the edges $(\pi_k, \pi_{k+1})$, $1 \le k < n$. Hence, the Switching Dissimilarity problem is defined as follows:
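The following Python sketch discretizes the similarity measure, assuming that $S_{ij}$ is the time average of the product of the two ±1 normalized waveforms sampled at uniform time steps. It also builds the complete-graph edge weights; taking the effective loading between two wires to be $1 - S_{ij}$ is our assumption, chosen because it gives 2 for always-opposite and 0 for always-identical switching, matching the Miller and anti-Miller factors from Section I. The waveforms are hypothetical.

```python
def similarity(v_i, v_j):
    """Time-averaged product of two sampled +/-1 waveforms (discrete S_ij)."""
    if len(v_i) != len(v_j) or not v_i:
        raise ValueError("waveforms must be non-empty and of equal length")
    return sum(a * b for a, b in zip(v_i, v_j)) / len(v_i)


def loading_matrix(waves):
    """Edge weights of the complete wire graph, ASSUMED to be 1 - S_ij."""
    n = len(waves)
    return [[0.0 if i == j else 1.0 - similarity(waves[i], waves[j])
             for j in range(n)] for i in range(n)]


waves = [
    [1, 1, -1, -1],   # wire 0
    [1, 1, -1, -1],   # wire 1: identical to wire 0
    [-1, -1, 1, 1],   # wire 2: always opposite to wire 0
    [1, -1, 1, -1],   # wire 3: uncorrelated with wire 0
]
W = loading_matrix(waves)
print(W[0])
```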
We have the following theorem for the complexity of the problem.
Theorem 1: The Switching Dissimilarity problem is NP-hard.
Proof: The Hamiltonian path problem, which is NP-complete, can be reduced to the Switching Dissimilarity problem. The reduction is similar to that from the Hamiltonian cycle problem to the traveling-salesman problem in [9]; we briefly describe it here. Given a general graph $G = (V, E)$, deciding whether $G$ has a Hamiltonian path is NP-complete. We construct a complete graph $G'$ by adding all nonedges of $G$, and we assign the weight of each edge $(u, v)$ of $G'$ as 0 if $(u, v) \in E$ and 1 if $(u, v) \notin E$. Then $G$ has a Hamiltonian path if and only if the minimum total effective loading of an ordering in $G'$ is 0. Therefore, the problem is NP-hard.

Since the problem is NP-hard, we resort to heuristics. Ideally, we would like an approximation algorithm with a performance guarantee. However, we have a negative result, described in the following theorem.
Theorem 2: If $P \ne NP$ and $\rho \ge 1$ is a constant, there is no polynomial-time approximation algorithm with ratio bound $\rho$ for the Switching Dissimilarity problem. The theorem can be proved by contradiction; the details are similar to the proof that no polynomial-time approximation algorithm with a constant ratio bound exists for the general traveling-salesman problem [9].
By the two theorems above, the problem is NP-hard, and no polynomial-time approximation algorithm with a constant ratio bound exists unless $P = NP$. We therefore propose an efficient heuristic based on a minimum spanning tree, the WOSD algorithm shown in Fig. 6. A preorder tree walk recursively visits each node in a tree, listing a node when it is first encountered and before any of its children are visited; it takes time linear in the number of nodes. Hence, the running time of the WOSD algorithm is dominated by the construction of a minimum spanning tree for the complete graph. Fig. 7 illustrates the operation of the WOSD algorithm on the example shown in Fig. 5.
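A minimal Python sketch of a minimum-spanning-tree plus preorder-walk heuristic in the spirit of WOSD follows. Fig. 6 is not reproduced here, so details such as rooting the tree at wire 0 and the child visiting order are assumptions. Prim's algorithm on the dense weight matrix takes $O(n^2)$ time, and the preorder walk is linear. A brute-force search over all orderings is included only to check tiny cases; the 4-wire weight matrix is hypothetical.

```python
from itertools import permutations


def prim_mst_children(w):
    """Prim's algorithm on a dense symmetric matrix; child lists rooted at 0."""
    n = len(w)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        # O(n) scan for the cheapest fringe node -> O(n^2) overall.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v], parent[v] = w[u][v], u
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    return children


def preorder(children, root=0):
    """List each node when first visited, before any of its children."""
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order


def total_loading(w, order):
    """Sum of edge weights between consecutive wires in the ordering."""
    return sum(w[a][b] for a, b in zip(order, order[1:]))


def wosd_sketch(w):
    return preorder(prim_mst_children(w))


def brute_force_best(w):
    """Exact minimum over all orderings; exponential, for tiny checks only."""
    return min(total_loading(w, p) for p in permutations(range(len(w))))


W = [[0.0, 0.0, 2.0, 1.0],
     [0.0, 0.0, 2.0, 1.0],
     [2.0, 2.0, 0.0, 1.0],
     [1.0, 1.0, 1.0, 0.0]]
order = wosd_sketch(W)
print(order, total_loading(W, order), brute_force_best(W))
```

On this small example the heuristic happens to match the exact optimum; by Theorem 2, no such guarantee holds in general.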
By solving the Switching Dissimilarity problem, we obtain an ordering of all wires that minimizes (or, with the heuristic, reduces) the effective loading, and hence the adjacency relationships between wires. The neighborhood of wire $i$ is the set of wires adjacent to it; the dominating index set of wire $i$ is the set of its adjacent wires whose indices are greater than $i$. For instance, if the four wires in Fig. 5 are routed in the same channel, the geometric ordering is equivalent to a track assignment, and the neighborhood and dominating index set of each wire follow directly from the chosen track assignment.
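The two definitions above can be sketched in Python, assuming all wires lie in one channel with one wire per track, so that adjacency simply means consecutive positions in the ordering. Presumably the dominating index set is used so that each adjacent pair is counted once when coupling terms are summed; the track assignment below is hypothetical.

```python
def neighborhoods(order):
    """Map each wire to the set of wires on adjacent tracks in a single channel."""
    nbr = {w: set() for w in order}
    for a, b in zip(order, order[1:]):
        nbr[a].add(b)
        nbr[b].add(a)
    return nbr


def dominating_index(nbr):
    """Neighbors with larger indices, so each adjacent pair appears exactly once."""
    return {i: {j for j in js if j > i} for i, js in nbr.items()}


order = [0, 1, 3, 2]          # hypothetical track assignment, one wire per track
N = neighborhoods(order)
dom = dominating_index(N)
print(N, dom)
```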
B. Physical Coupling Capacitance
A multiterminal net is decomposed into wire segments, and each segment between two junctions is treated as a wire. Fig. 8 depicts a case in which two wires $i$ and $j$, belonging to different nets, have coupling capacitance.
According to Fig. 8, the physical coupling capacitance between two neighboring wires $i$ and $j$ can be calculated by (1), which involves the sizes of wires $i$ and $j$, the unit-length fringing capacitance between wires $i$ and $j$, the overlap length of wires $i$ and $j$, and the distance from the center line of wire $i$ to that of wire $j$. In (1), the first term is a constant that can be computed from technology files, and the second term is the one we are concerned with. With an appropriate substitution, the second term of (1) can be written as (2), whose coefficient is a constant. Note that (2) is in a posynomial (positive polynomial) form [14]; as will become clear, this property is important for guaranteeing the optimality of the algorithm presented in Section IV.
Recall from Section II-B that the capacitance of a wire includes its coupling capacitance. The coupling capacitance of wire $i$ can be computed from (2) by summing over the wires adjacent to wire $i$, and the capacitance of wire $i$ then follows accordingly.
IV. OPTIMAL AREA MINIMIZATION UNDER CROSSTALK, DELAY, AND POWER CONSTRAINTS
In this section, we formulate the problem and present an algorithm for simultaneous area, crosstalk, delay, and power optimization. Since area is typically the most important concern in VLSI design, we formulate the performance optimization problem as minimizing the total area of a circuit subject to crosstalk, delay, and power constraints.
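We summarize the Lagrangian relaxation method here [1]. Consider the following generic optimization problem, formulated in terms of a vector $x$ of decision variables:
Minimize $f(x)$ subject to $g_i(x) \le 0$, $i = 1, \ldots, m$, and $x \in \mathcal{X}$.
The decision variables lie in a given constraint set $\mathcal{X}$. The Lagrangian relaxation method relaxes the constraints $g_i(x) \le 0$ into the objective function by introducing Lagrange multipliers $\lambda = (\lambda_1, \ldots, \lambda_m) \ge 0$, resulting in the Lagrangian subproblem
$L(\lambda) = \min_{x \in \mathcal{X}} \; f(x) + \sum_{i=1}^{m} \lambda_i g_i(x).$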
Since the constraints are relaxed, the Lagrangian subproblem is easier to solve. By the Lagrangian Bounding Principle [1], for any nonnegative multipliers, the optimal value of the Lagrangian subproblem is a lower bound on the optimal objective value of the original problem. The Lagrangian relaxation method can solve a problem optimally when the objective and all of the constraints are in a posynomial form [14].
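The bounding principle follows in one line. As a sketch, writing the relaxed constraints as $g_i(x) \le 0$ (notation assumed here) and letting $x^*$ be an optimal solution of the original problem, for any multipliers $\lambda \ge 0$:

```latex
L(\lambda) = \min_{x \in \mathcal{X}} \Big[ f(x) + \sum_{i} \lambda_i g_i(x) \Big]
  \le f(x^*) + \sum_{i} \lambda_i g_i(x^*)
  \le f(x^*).
```

The first inequality holds because $x^*$ is one candidate in $\mathcal{X}$; the second holds because $\lambda_i \ge 0$ and $g_i(x^*) \le 0$ for a feasible $x^*$.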
Section IV-A formulates the primal problem as a mathematical program. Section IV-B relaxes the primal problem to a Lagrangian relaxation problem and simplifies it. Section IV-C shows how to solve the corresponding Lagrangian relaxation subproblem. Section IV-D presents the Lagrangian dual problem and solves it by the subgradient optimization technique.
A. Problem Formulation
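For each component $i$, the corresponding area is proportional to its size $x_i$: given the unit-size area $a_i$, the area of component $i$ is $a_i x_i$, and the total area of the circuit is $\sum_i a_i x_i$. The areas of input drivers and output loads are ignored because they are fixed. Let $X_0$, $P_0$, and $D_0$ be the crosstalk, power, and delay bounds of the circuit, respectively. The constraints require that the total crosstalk not exceed $X_0$; that the total power, which depends on the supply voltage, the working frequency, and the switching activity of each component, not exceed $P_0$; and that the delay along every path $p$ in the path set not exceed $D_0$.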
Note that, although not presented here, the crosstalk constraint above can easily be extended to a distributed crosstalk bound on each net or to a bound on the sum of the squares of the individual crosstalk values, and all corresponding theorems and properties still hold for the extended formulation. The optimization problem addressed here can therefore be formulated as follows.
By the delay constraint of the problem, the delay of each source-to-sink path cannot exceed the delay bound. The crosstalk and power constraints mean that the total crosstalk (coupling capacitance) over all nets and the total power consumption of all gates and wires cannot exceed the crosstalk and power bounds, respectively. From Section III-B, the crosstalk between two adjacent wires is their inter-wire physical coupling capacitance, whose first term is a constant. Hence, the crosstalk constraint can be simplified by subtracting these constant terms from both sides, which yields a modified crosstalk constraint on the size-dependent terms only. Assuming that the supply voltage and frequency are fixed, the power constraint can likewise be simplified by dividing both sides by the constant factor involving them, which yields a modified power constraint.
Since the interconnect density of a circuit can be very high in deep submicrometer technology, the circuit graph can be very dense. Hence, the number of paths can be far greater than the circuit size and can even grow exponentially with it, so traversing all paths to check the constraints is prohibitively expensive. To overcome this problem, we associate with each node a variable representing its arrival time, a technique also used in [5]. The delay constraint can then be distributed onto the edges of the circuit graph, so that it is expressed in terms of arrival times on each edge rather than on each path. Consequently, the problem can be modified as follows.
Minimize the total area subject to the modified crosstalk constraint, the modified power constraint, and the edge-based delay constraints. The objective function and constraints of this problem are all in posynomial form. Through a variable transformation, a convex programming problem is obtained, and any local optimum of a convex programming problem is a global optimum [14]. Hence, every local optimum of the problem is its global optimum.
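The variable transformation mentioned above is the standard one for posynomials. As a sketch, with $x_i = e^{y_i}$, a posynomial with positive coefficients $c_k$ and real exponents $\alpha_{ik}$ becomes a sum of exponentials of affine functions of $y$:

```latex
f(x) = \sum_{k} c_k \prod_{i} x_i^{\alpha_{ik}}
\quad\xrightarrow{\;x_i = e^{y_i}\;}\quad
\tilde f(y) = \sum_{k} c_k \exp\!\Big( \sum_{i} \alpha_{ik} y_i \Big),
\qquad c_k > 0 .
```

For example, a gate term of the form $\hat{r}/x + \hat{c}\,x$ becomes $\hat{r} e^{-y} + \hat{c} e^{y}$. Each exponential of an affine function is convex, and a positive sum of convex functions is convex, so a posynomial objective with posynomial upper-bound constraints becomes a convex program in $y$.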
Note that the above formulations do not consider the switching conditions described in Section III-A. To incorporate switching behavior, we can simply multiply each inter-wire physical coupling capacitance by the corresponding effective loading in the formulations.
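To illustrate why the arrival-time formulation used in this subsection avoids path enumeration, the Python sketch below computes arrival times in one topological pass, $a_v = \max_{u \in \mathrm{fanin}(v)} a_u + D_v$, checks edge constraints of the assumed form $a_u + D_v \le a_v$ together with $a_{\text{sink}} \le D_0$, and compares the sink arrival time with an exhaustive path enumeration. The exact constraint form, the graph, and the delays are assumptions for illustration.

```python
def arrival_times(order, fanin, delay):
    """One pass in topological order: a_v = max over fan-ins of a_u, plus D_v."""
    a = {}
    for v in order:
        a[v] = max((a[u] for u in fanin.get(v, [])), default=0.0) + delay.get(v, 0.0)
    return a


def edge_constraints_hold(fanin, delay, a, sink, bound):
    """Check a_u + D_v <= a_v on every edge (u, v) and a_sink <= bound."""
    edges_ok = all(a[u] + delay.get(v, 0.0) <= a[v]
                   for v, us in fanin.items() for u in us)
    return edges_ok and a[sink] <= bound


def longest_path_bruteforce(succ, delay, node, sink):
    """Enumerates every path from node to sink; exponential in general."""
    here = delay.get(node, 0.0)
    if node == sink:
        return here
    return here + max(longest_path_bruteforce(succ, delay, v, sink) for v in succ[node])


order = ["s", "d1", "d2", "g1", "g2", "t"]          # a topological order
succ = {"s": ["d1", "d2"], "d1": ["g1", "g2"], "d2": ["g1"],
        "g1": ["g2", "t"], "g2": ["t"]}
fanin = {}
for u, vs in succ.items():
    for v in vs:
        fanin.setdefault(v, []).append(u)
delay = {"d1": 2.0, "d2": 3.0, "g1": 4.0, "g2": 1.0}  # source and sink add no delay

a = arrival_times(order, fanin, delay)
print(a["t"], longest_path_bruteforce(succ, delay, "s", "t"),
      edge_constraints_hold(fanin, delay, a, "t", bound=8.0))
```

The single pass touches each edge once, whereas the brute-force check revisits shared subpaths, which is what makes per-path constraints impractical on dense circuit graphs.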
B. Lagrangian Relaxation
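To solve the problem, we apply Lagrangian relaxation by introducing one Lagrange multiplier for each constraint: one for the power constraint, one for the crosstalk constraint, and one for each delay constraint. The multiplier of the delay constraint on an edge can be viewed as a timing weight on that edge. The Lagrangian function is formed by adding each relaxed constraint, weighted by its multiplier, to the total area. The corresponding Lagrangian relaxation subproblem is to minimize the Lagrangian function for fixed multipliers, subject to the constraints that are not relaxed.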
To solve the Lagrangian relaxation subproblem, we derive the optimality conditions using the Kuhn-Tucker conditions [25].
Theorem 4: The optimality conditions on the Lagrange multipliers are given by (3).
Proof: By the Kuhn-Tucker conditions [25], if the optimal solution of the Lagrangian relaxation subproblem is the optimal solution of the primal problem, then it must satisfy the Kuhn-Tucker conditions. Following the work of [5] on the optimality conditions for Lagrange multipliers, we rearrange these conditions; checking them then yields the theorem.
Theorem 4 reveals that, for every node except the source, the sum of the multipliers on its incoming edges equals the sum of the multipliers on its outgoing edges. This theorem is analogous to Kirchhoff's Current Law [8]:
The algebraic sum of the currents flowing into a node equals that of the currents leaving from the node for all times. 
C. Lagrangian Relaxation Subproblem
In the preceding subsection, we obtained the Lagrangian relaxation subproblem. In this subsection, we derive the optimal sizing solution and present a greedy, optimal algorithm to solve this subproblem. We first define, for each node, the portion of its downstream capacitance that is independent of its own size; this definition takes one form for wires and another for the remaining nodes.
We then extract the terms that depend on the size of each node and define a weighted upstream resistance. By (5), the optimal size of a gate is determined mainly by its upstream resistance and downstream capacitance, whereas the optimal size of a wire depends not only on its upstream resistance and downstream capacitance but also on its neighborhood.
In summary, we have the following theorem. In this theorem, condition 1) is the optimality condition; 2)-6) are the complementary slackness conditions; 7)-11) are the constraints; 12)-14) restrict the multipliers to be nonnegative; and 15) gives the optimal sizing.
We propose a greedy algorithm, LRS, shown in Fig. 9, to optimally solve the Lagrangian relaxation subproblem. As mentioned earlier, every local optimum of the Lagrangian relaxation problem is its global optimum. This property guarantees that a greedy algorithm can find the optimal solution.
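Fig. 9 is not reproduced here, so the following Python sketch only illustrates the kind of greedy, one-size-at-a-time update that such a subproblem admits. It uses a hypothetical two-gate chain (driver, gate 1, gate 2, load) with a single timing multiplier `lam`, unit areas, and no coupling. For each size, the dependent part of the Lagrangian has the form $A x + B / x$, so the best size with the others fixed is $\sqrt{B/A}$ clipped to the size bounds; $A$ collects the area and the weighted upstream resistance term, and $B$ the weighted downstream capacitance term. All parameter names and values are assumptions.

```python
import math


def lagrangian(x, p):
    """Area plus lam-weighted Elmore delay of a driver -> gate 1 -> gate 2 -> load chain."""
    x1, x2 = x
    area = p["a1"] * x1 + p["a2"] * x2
    delay = (p["RD"] * p["c1"] * x1            # driver resistance charges gate-1 input
             + (p["r1"] / x1) * p["c2"] * x2   # gate 1 charges gate-2 input
             + (p["r2"] / x2) * p["CL"])       # gate 2 charges the output load
    return area + p["lam"] * delay


def best_size(A, B, lo, hi):
    """Minimizer of A*x + B/x (A > 0, B >= 0) on [lo, hi]."""
    return min(max(math.sqrt(B / A), lo), hi)


def greedy_resize(p, x1=1.0, x2=1.0, lo=0.1, hi=10.0, tol=1e-12, max_iter=1000):
    for _ in range(max_iter):
        # Gate 1: A = area + weighted upstream (driver) resistance term; B = downstream load.
        n1 = best_size(p["a1"] + p["lam"] * p["RD"] * p["c1"],
                       p["lam"] * p["r1"] * p["c2"] * x2, lo, hi)
        # Gate 2: its weighted upstream resistance now uses the new gate-1 size.
        n2 = best_size(p["a2"] + p["lam"] * p["r1"] * p["c2"] / n1,
                       p["lam"] * p["r2"] * p["CL"], lo, hi)
        converged = abs(n1 - x1) < tol and abs(n2 - x2) < tol
        x1, x2 = n1, n2
        if converged:
            break
    return x1, x2


params = dict(a1=1.0, a2=1.0, lam=1.0, RD=1.0, c1=1.0, c2=1.0, r1=1.0, r2=1.0, CL=4.0)
x1, x2 = greedy_resize(params)
print(round(x1, 4), round(x2, 4), round(lagrangian((x1, x2), params), 4))
```

Because each update exactly minimizes the Lagrangian along one coordinate and the problem is convex after the logarithmic change of variables, the sweep settles at a point where both partial derivatives vanish (when no bound is active).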
D. Lagrangian Dual Problem
It can be shown that there exists a vector of Lagrange multipliers such that the optimal solution of the Lagrangian relaxation subproblem is also the optimal solution of the original problem. The problem of finding such a vector is the Lagrangian dual problem, described as follows.
Maximize the optimal value of the Lagrangian relaxation subproblem over all nonnegative multiplier vectors that satisfy the optimality conditions of Theorem 4.
We present Algorithm OGWS, listed in Fig. 10, to solve the dual problem. In step A1, an arbitrary multiplier vector satisfying the optimality conditions is chosen as the initial one, and the power and crosstalk multipliers are set to positive numbers. In A2, the quantities that depend on the multipliers are calculated with respect to the multipliers chosen in A1. A3 calls the LRS subroutine. In A4, the OGWS algorithm iteratively adjusts the multipliers by the subgradient optimization method. It is well known that if the step sizes $\rho_k$ satisfy $\lim_{k \to \infty} \rho_k = 0$ and $\sum_{k=1}^{\infty} \rho_k = \infty$ (e.g., $\rho_k = 1/k$), the subgradient optimization method converges to the global optimum. In A5, the updated Lagrange multipliers are projected onto the nearest point satisfying the optimality conditions. A6 updates the iteration counter, and A7 checks whether the stopping criterion holds.
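The subgradient step can be illustrated on a toy dual problem. The Python sketch below is not the OGWS algorithm of Fig. 10; it minimizes a hypothetical total area $\sum_i a_i x_i$ subject to a single posynomial delay bound $\sum_i c_i / x_i \le D_0$. For a fixed multiplier, the relaxed subproblem separates into closed-form sizes $x_i = \sqrt{\lambda c_i / a_i}$ (clipped to the size bounds); the multiplier then moves along the constraint violation with step size $1/k$, which satisfies the step-size conditions stated above, and is projected back onto $\lambda \ge 0$.

```python
import math


def solve_relaxed(lam, a, c, lo, hi):
    """Minimize sum(a_i x_i) + lam * sum(c_i / x_i) over lo <= x_i <= hi (separable)."""
    return [min(max(math.sqrt(lam * ci / ai), lo), hi) for ai, ci in zip(a, c)]


def subgradient_dual(a, c, bound, lam=0.5, lo=0.1, hi=10.0, iters=2000):
    for k in range(1, iters + 1):
        x = solve_relaxed(lam, a, c, lo, hi)
        violation = sum(ci / xi for ci, xi in zip(c, x)) - bound  # subgradient of the dual
        lam = max(0.0, lam + violation / k)                        # step 1/k, project to lam >= 0
    return lam, solve_relaxed(lam, a, c, lo, hi)


lam, x = subgradient_dual(a=[1.0, 1.0], c=[4.0, 9.0], bound=5.0)
print(round(lam, 6), [round(v, 6) for v in x])
```

For these values the analytic optimum is $\lambda = 1$ and $x = (2, 3)$, since $\sqrt{\lambda} = \sum_i \sqrt{a_i c_i} / D_0 = 5/5$; at that point the delay bound is met with equality.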
V. EXPERIMENTAL RESULTS
We implemented our algorithm in the C language on a Sun SPARC Ultra-I workstation and tested it on the ISCAS85 benchmark circuits, whose sizes (total numbers of gates and wires) ranged from 640 to 9656. The supply voltage was set to 2.5 V and the working frequency to 400 MHz. Table I lists the unit-sized parameters: the unit-sized capacitance of a gate was 8.8 fF/µm, and the unit-sized resistance and capacitance of a wire were 5.3 (see Table I for units) and 2.06 fF/µm, respectively. The lower and upper size bounds were 0.36 µm and 5 µm for a gate and 0.36 µm and 1.8 µm for a wire. Initially, the sizes of gates and wires were set to 0.36 µm and 1.8 µm, respectively. Table II shows the experimental results, where #G denotes the number of gates, #W the number of wires, tot the total number of gates and wires, Init the initial values before sizing, Fin the final values after sizing, ite the number of iterations, time the runtime, mem the memory requirement, and Impr(%) the average improvement in percent. The improvement for each term is calculated as (Init - Fin)/Init × 100%. The results show that our algorithm is effective and efficient: on average, it improved the area, noise, power, and delay by 79.98%, 80.00%, 16.02%, and 1.77%, respectively, after wire and gate sizing. For the largest circuit, c7552, with 3512 gates and 6144 wires, our algorithm needed only 19.4 min of runtime and 2.1 MB of storage to achieve a precision within 1% error.
Note that the results show that sizing does not improve delay much. Enlarging a component increases the loading of the components on its upstream path as well as the physical coupling capacitance, while it increases the driving capability for the components on its downstream path. Consequently, upsizing increases the delay of the upstream part but decreases that of the downstream part. Similarly, downsizing reduces the delay of the upstream part but increases that of the downstream part. As a result, the delay over the whole circuit is not significantly improved.
In Fig. 11, the storage requirement (vertical axis) is plotted as a function of the total number of gates and wires in a circuit (horizontal axis). Similarly, Fig. 12 depicts the relationship between the runtime and the circuit size. Figs. 11 and 12 show that the runtime and storage requirements of our algorithm are approximately linear in the total number of gates and wires. As Fig. 12 reveals, some points deviate from the linear trend, probably because these circuits are irregular and differ in structure.
VI. CONCLUDING REMARKS
Noise immunity is of significant importance for deep submicrometer digital circuits; together with area, delay, and power, it has become an important design metric. Switching conditions and coupling capacitance are the two dominant considerations for crosstalk optimization; nevertheless, switching conditions are often neglected in previous work. We have modeled the crosstalk optimization problem by considering both switching conditions and physical coupling capacitance. We have proposed a two-stage method for crosstalk minimization: the first stage performs geometric wire ordering, exploiting the switching conditions to reduce the effective loading; the second stage optimizes not only the physical coupling capacitance but also area, power, and delay. Based on the Lagrangian relaxation method, our simultaneous gate and wire sizing algorithm efficiently optimizes all of these objectives. The experimental results show that our algorithm is very effective for performance optimization, especially for noise, area, and power minimization.
