Realization of high-performance domino logic depends strongly on energy-efficient and noise-tolerant interconnect design in ultra deep sub-micron processes. We characterize a cycle-averaged power model for interconnects that accounts for switching statistics and dynamic behavior. For the sake of signal integrity, we also characterize cross-coupling effects, which reflect the logical correlation between adjacent wires. Based on the new models for interconnect power and capacitive crosstalk, we optimize the coupling power consumed by interconnects under crosstalk constraints. Experimental results show that the optimized designs significantly reduce power consumption.
Introduction
The era of system-on-a-chip designs integrating millions of transistors drives technology scaling into the ultra deep sub-micron (UDSM) range, 0.25 µm or below. As feature size decreases, interconnect pitch also shrinks while packaging becomes denser. Meanwhile, the wire aspect ratio (namely, metal height/width) increases from 1.8 in 0.18 µm processes to 2.7 in 0.07 µm processes to keep resistance under control for high performance [11]. These trends raise two competing issues: power efficiency and noise immunity on interconnect.
First, higher packaging density and larger chip area lead to relatively longer interconnects, which dominate both speed and power dissipation. Hence, it is crucial to keep the effective wire capacitance small for high performance and low power dissipation. Coupling capacitance between adjacent wires becomes the dominant component of interconnect capacitance. The ratio between cross-coupling capacitance and total wire capacitance is as much as 0.8 in 0.18 µm technology with minimum spacing [2]. Based on these observations, we conclude that most of the power consumed by interconnects depends on the coupling capacitance; we refer to this component as coupling power. Our approach to coupling power optimization is motivated by the fact that coupling power can be minimized with appropriate net ordering.
Another important issue is signal integrity, which can be degraded by design-related noise sources such as coupling effects, di/dt noise, and IR drop. In UDSM, the noise margin and slew rate of a wire depend strongly on Miller effects between neighboring wires, referred to as crosstalk. There are two types of crosstalk, caused by cross-coupled capacitors or inductors. Inductive crosstalk becomes significant at high clock frequencies [6] and is not considered in this work. Capacitive crosstalk becomes far more hazardous now that dynamic logic is prevalent in high-performance circuit design. The reason is that dynamic logic is more vulnerable to crosstalk noise, because its evaluation nodes cannot recover from erroneous transitions.
In this paper, we first characterize the average power consumed by interconnects for domino logic. Our cycle-averaged power model for interconnects accounts for physical net topology, switching activities, and dynamic characteristics of the circuit implementation. Second, the maximum crosstalk model takes into account logical correlation in addition to coupling capacitance and net spacing. The reason is that satisfiability and observability of the crosstalk condition determine the effective crosstalk between two signals. Third, based on the power and crosstalk models, we optimize an initial routing solution to minimize coupling power subject to constraints on maximum crosstalk and area. The power optimization is performed via track assignment using simulated annealing.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED '00, Rapallo, Italy. Copyright 2000 ACM 1-58113-190-9/00/0007…$5.00.
The average power consumed by a wire with a clock frequency $f = 1/T_{clk}$ can be evaluated by [8]
where $n$ is the number of clock cycles observed and $I_c(t)$ represents the current drawn due to transitions during a clock period. $Q_{av}$ is the cycle-averaged charge provided by the power supply to all the interconnect capacitance, which is given by
where $S$ is the set of nodes connected to $V_{dd}$, $p_j$ denotes the switching probability, and $C^{total}_j$ denotes the lumped total capacitance on node $j$. The effective capacitance $\hat{C}$ is characterized by both physical capacitance and switching activities.
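The relation between switching probabilities, lumped capacitance, and average power can be illustrated numerically. The sketch below assumes the common textbook form $Q_{av} = V_{dd} \sum_{j \in S} p_j C^{total}_j$ and $P_{av} = f \, V_{dd} \, Q_{av}$, which may differ in detail from the equations of the paper. The function names and sample values are hypothetical.

```python
def cycle_averaged_charge(nodes, vdd):
    """Return Q_av = Vdd * sum(p_j * C_j) for (p_j, C_j) pairs.

    Assumption: each node j draws C_j * Vdd of charge from the supply
    with probability p_j in a clock cycle.
    """
    return vdd * sum(p * c for p, c in nodes)


def average_power(nodes, vdd, f_clk):
    """Return the cycle-averaged power P = f * Vdd * Q_av in watts."""
    return f_clk * vdd * cycle_averaged_charge(nodes, vdd)


# Hypothetical wire nodes: (switching probability, lumped capacitance in F).
nodes = [(0.5, 100e-15), (0.2, 50e-15)]
power = average_power(nodes, vdd=1.8, f_clk=500e6)
```

With these sample values the effective capacitance is 60 fF, so the estimate is about 97 µW.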
Effective Capacitance
The effective capacitance accounts for cycle-averaged charge stored in physical capacitance which is provided by the power supply during the evaluation stage.
One can model the physical capacitance of a wire with width w, length l, overlapping segment length y, and edge-to-edge distance d as shown in Figure 1 
where $C_a$ is the unit area capacitance (F/cm$^2$) to the substrate, $C_f$ is the unit fringe capacitance (F/cm), and $C_x$ is the unit coupling capacitance (F/cm) at minimum spacing $d_{min}$. Strictly speaking, $C_f$ and $C_x$ are not constant because of complex fringing effects [3], but treating them as constants is adequate for our purpose. The area and fringe capacitances are nearly independent of the distance $d$; together they are referred to as the self capacitance $C_s$.
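A small sketch can make the roles of the unit capacitances concrete. It assumes a parallel-plate area term, a fringe term on both wire edges, and a coupling term that scales inversely with the spacing $d$ relative to $d_{min}$. These are illustrative assumptions rather than the exact expression used in the paper, and the function name is hypothetical.

```python
def wire_capacitance(w, l, y, d, c_a, c_f, c_x, d_min):
    """Return (self_cap, coupling_cap) for one wire next to one neighbor.

    w, l, y, d, d_min are lengths in cm; c_a is in F/cm^2; c_f, c_x in F/cm.
    """
    if d < d_min:
        raise ValueError("spacing d must be at least d_min")
    # Assumption: area term plus fringe on both edges, independent of d.
    self_cap = c_a * w * l + 2.0 * c_f * l
    # Assumption: coupling over the overlapping length y scales as d_min / d.
    coupling_cap = c_x * y * (d_min / d)
    return self_cap, coupling_cap


# Doubling the spacing halves the coupling term and leaves the self term alone.
near = wire_capacitance(w=1.0, l=2.0, y=1.0, d=1.0, c_a=3.0, c_f=0.5, c_x=4.0, d_min=1.0)
far = wire_capacitance(w=1.0, l=2.0, y=1.0, d=2.0, c_a=3.0, c_f=0.5, c_x=4.0, d_min=1.0)
```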
The distributed RC model for interconnects, called the π-model, is illustrated in Figure 1(b). Corresponding to self and coupling capacitance, effective capacitance can be viewed as consisting of two components to reflect the switching activities as in Figure 1 
The effective self capacitance $\hat{C}_s$ is characterized by the switching activity of domino logic. Domino logic is free from spurious transitions. At most one low-to-high transition can occur at the output of a domino logic cell during the evaluation stage. If the input combination to the cell results in a current path from the evaluation node to the ground rail, then the discharge of the load capacitance causes a low-to-high transition at the output of the domino cell. Otherwise, charge is kept in the load capacitance, so the output logic value of the domino cell remains low. Therefore, only two types of waveforms are observable at the buffer output of a domino cell during the evaluation stage: stable low or a low-to-high transition. The monotonicity of domino logic determines the effective self capacitance of a wire $i$, which is given by
where p(i=1) is the probability that the signal i is high, which is referred to as the on-probability of the signal i. 
where $p(i=q \wedge j=r)$ denotes the correlated probability, with $q, r \in \{0, 1\}$. The temporal correlation factor $\omega$ captures the temporal behavior of type HH transitions. Signal and temporal correlations are discussed in the subsequent sections.
Signal Correlation
To evaluate the on-probability $p(i=1)$ and the correlated probability $p(i=q \wedge j=r)$, we need to compute the signal correlation coefficients and the transition correlation coefficients. The signal correlation coefficient [4] between a signal $i$ and a signal $j$ is $SC^{ij}_{qr} \triangleq \frac{p(i=q \wedge j=r)}{p(i=q)\, p(j=r)}$, $q, r \in \{0, 1\}$, where $p(i=q)$ and $p(j=r)$ are the signal probabilities that $i$ has the value $q$ and $j$ has the value $r$, respectively. The transition correlation coefficient [10] between a signal $i$ which changes from $q$ to $u$, and a signal $j$ which changes from $r$ to $v$ is defined as
where $p(i_{q \to u})$ is the transitional probability that the signal value of $i$ changes from $q$ to $u$, which can be represented by the signal probability $p(i=q)$ and the conditional probability $p(i(t + \Delta t)=u \mid i(t)=q)$:
Due to the monotonicity of domino logic, the transition correlation coefficient $TC^{ij}_{qr,uv}$ becomes $TC^{ij}_{00,uv}$, which can be simply represented by the signal correlation coefficient $SC^{ij}_{uv}$.
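The definition of the signal correlation coefficient can be checked on a toy example by exhaustive enumeration of the primary inputs. The sketch below assumes statistically independent primary inputs and uses brute force instead of OBDDs, so it is only practical for a handful of inputs. The two example functions and all names are hypothetical.

```python
from itertools import product


def joint_distribution(f_i, f_j, input_probs):
    """Return {(q, r): p(i=q and j=r)} for signals i = f_i(x), j = f_j(x).

    input_probs[k] is the probability that primary input k is 1;
    inputs are assumed independent.
    """
    joint = {(q, r): 0.0 for q in (0, 1) for r in (0, 1)}
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = 1.0
        for bit, p_one in zip(bits, input_probs):
            weight *= p_one if bit else 1.0 - p_one
        joint[(int(f_i(bits)), int(f_j(bits)))] += weight
    return joint


def signal_correlation(joint, q, r):
    """Return p(i=q and j=r) / (p(i=q) * p(j=r))."""
    p_i = sum(p for (a, _), p in joint.items() if a == q)
    p_j = sum(p for (_, b), p in joint.items() if b == r)
    if p_i == 0.0 or p_j == 0.0:
        raise ValueError("coefficient undefined for a zero-probability value")
    return joint[(q, r)] / (p_i * p_j)


# Two signals sharing the input x1: i = x0 AND x1, j = x1 OR x2.
joint = joint_distribution(lambda x: x[0] & x[1], lambda x: x[1] | x[2], [0.5, 0.5, 0.5])
sc_11 = signal_correlation(joint, 1, 1)
sc_10 = signal_correlation(joint, 1, 0)
```

A coefficient above 1 indicates that the two values occur together more often than independence would predict, and a coefficient of 0 means the combination never occurs.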
The signal correlation coefficient can be efficiently evaluated by using OBDDs. If we assume that higher-order correlations between two signals and a third signal are negligible, then the signal correlation coefficient for three signals can be represented as a product of the signal correlation coefficients of the signal pairs:
$SC^{ijk}_{qrs} = SC^{ij}_{qr} \, SC^{jk}_{rs} \, SC^{ki}_{sq}$ (10)
Then the signal probability that a signal $i$ has a value $\nu \in \{0, 1\}$
where $\Pi$ is the set of paths from the OBDD node representing the signal $i$ to the terminal OBDD node denoting $\nu \in \{0, 1\}$, $x_k$ is a primary input variable constituting $i$, and $\nu_k$ is the value assigned to $x_k$ such that the signal $i$ is evaluated along a path $\pi \in \Pi$. The signal probability can be computed in $O(N)$ time, where $N$ is the number of OBDD nodes [1]. The correlated probability in Equation (6) represents the correlation of two signals:
$p(i=q \wedge j=r) = p(i=q) \, p(j=r) \, SC^{ij}_{qr}$ (12)
Because the signal probability can be computed in $O(N)$ time, evaluation of Equation (12) can also be performed in $O(N)$ time.
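The linear-time claim can be illustrated with a memoized traversal of a decision diagram, in which each node is visited once. This is a minimal sketch under two assumptions: primary inputs are independent, and the diagram is given as nested tuples `(var, low, high)` with integer terminals `0` and `1`. This representation is hypothetical and is not the data structure of any particular OBDD package.

```python
def signal_probability(root, input_probs):
    """Return p(f = 1) for the decision diagram rooted at `root`.

    A node is either the terminal 0 or 1, or a tuple (var, low, high)
    where `low`/`high` are the children for var = 0 / var = 1.
    """
    memo = {}

    def visit(node):
        if isinstance(node, int):           # terminal node
            return float(node)
        key = id(node)                      # shared nodes are computed once
        if key not in memo:
            var, low, high = node
            p_one = input_probs[var]
            memo[key] = (1.0 - p_one) * visit(low) + p_one * visit(high)
        return memo[key]

    return visit(root)


# f = (x0 AND x1) OR x2 with variable order x0 < x1 < x2.
n2 = (2, 0, 1)          # tests x2
n1 = (1, n2, 1)         # tests x1
root = (0, n2, n1)      # tests x0; node n2 is shared
p_uniform = signal_probability(root, [0.5, 0.5, 0.5])
p_skewed = signal_probability(root, [0.2, 0.5, 0.8])
```

The probability of the value 0 follows as one minus the returned value.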
Temporal Correlation
Simultaneous type HH transitions on neighboring wires yield an insignificant amount of charge distribution in coupling capacitance.
However, in most cases, the transitions are misaligned due to unbalanced propagation delays through multiple signal paths to the logic component. The misalignment leads to multiple interactions between adjacent wires. The power consumed by coupling capacitance increases corresponding to the multiple charge distributions. 
Now we are ready to derive the temporal correlation factor $\omega$. In principle, the signal transition on a wire can be temporally correlated with the signal transitions on adjacent wires. In other words, the transition on tr1 for signal 1 is not independent of the transition on tr2 for signal 2 in Figure 4 
Equation (14) accounts for the temporal correlation between two signals. However, accurate evaluation of temporal correlation is known to be prohibitively expensive computationally. To approximate the problem, we assume that the transitions are uniformly distributed and independent of each other. Then, we can compute the temporal correlation factor as follows. Let $\lambda_f$ denote the probability $p(|\tau| \ge \tau_f)$, $\lambda_0$ denote the probability $p(|\tau| \le \tau_0)$, and $\lambda_i$ denote the probability $p(\tau_0 < |\tau| < \tau_f)$, as shown in Figure 4. Based on the effective capacitance, comprising the effective self capacitance $\hat{C}_s$ and the effective coupling capacitance $\hat{C}_x$, we eventually compute the cycle-averaged power for interconnects in Equation (1).
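Under the uniform and independent assumption, the probability that the skew between two transitions falls into a given range has a closed form. The sketch below assumes that both transition times are uniform on a common window of length `window`, so that $P(|t_1 - t_2| \le d) = 1 - (1 - d/\text{window})^2$. The window, the two thresholds, and the function names are hypothetical, and the mapping from these probabilities to the temporal correlation factor is not attempted here.

```python
def skew_cdf(d, window):
    """P(|t1 - t2| <= d) for t1, t2 independent and uniform on [0, window]."""
    d = min(max(d, 0.0), window)
    return 1.0 - (1.0 - d / window) ** 2


def skew_probabilities(tau_0, tau_f, window):
    """Return (p_small, p_middle, p_large) for the three skew ranges.

    p_small:  |skew| <= tau_0
    p_middle: tau_0 < |skew| < tau_f
    p_large:  |skew| >= tau_f
    """
    if not 0.0 <= tau_0 <= tau_f <= window:
        raise ValueError("expected 0 <= tau_0 <= tau_f <= window")
    p_small = skew_cdf(tau_0, window)
    p_large = 1.0 - skew_cdf(tau_f, window)
    p_middle = 1.0 - p_small - p_large
    return p_small, p_middle, p_large


p_small, p_middle, p_large = skew_probabilities(tau_0=0.1, tau_f=0.5, window=1.0)
```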
Crosstalk Model for Domino Logic
Crosstalk effects are determined by physical capacitance and logical characteristics of the wires under consideration. The physical capacitance is represented by unit-length cross-coupling capacitance Cx and spacing between wires.
Due to logical correlation between signals, some signals are free from crosstalk caused by a specific set of other signals. The concepts of satisfiability and observability are introduced to capture such logical behavior.
The uni-directional transition of domino logic implies that a signal that is stable low is a potential victim signal. A potential aggressor signal is a signal that experiences a transition causing cross-coupling with a victim signal. Satisfiability represents the existence of at least one primary input vector that satisfies both the victim signal condition and the aggressor signal condition. Satisfiability of a crosstalk between a victim net $v$ and an aggressor net $a$ is defined to be (17) where $p$ is an instance of a primary input vector which evaluates $v$ to logic value 0 and $a$ to logic value 1.
For a crosstalk to affect the functional or temporal behavior of a circuit, the crosstalk should be observable at primary outputs. The observability of a crosstalk on a victim signal v is given by
where $q$ is a primary output signal, and $q|_{v=\nu}$ is the cofactor of $q$ with respect to the variable $v$ with the value $\nu$. Then, the maximum crosstalk that a victim signal $v$ can experience is given by
where N denotes the whole set of nets in a circuit.
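Notice that we can define logical crosstalk immunity based on satisfiability and observability according to Equation (19). If the product of satisfiability and observability is zero with $v$ as the victim and $a$ as the aggressor, and $S_{av} O_{av} = 0$ also holds with the roles exchanged, then the wire pair $v$ and $a$ are mutually free from crosstalk and belong to the same crosstalk immunity set (CIS) [9].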
Intuitively, the wire pair v and a can be placed as close as possible for compact design, since there will be no crosstalk between them.
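The two logical conditions can be sketched by brute-force enumeration on a toy circuit. The code assumes one plausible reading of the definitions: satisfiability is 1 when some primary input vector sets the victim to 0 and the aggressor to 1, and observability is 1 when some primary output differs between its two cofactors with respect to the victim (a Boolean difference). Outputs are modeled as functions `q(x, v)` that take the victim value as an explicit argument. All names and the example circuit are hypothetical.

```python
from itertools import product


def satisfiable(f_v, f_a, n_inputs):
    """Return 1 if some input vector gives victim = 0 and aggressor = 1."""
    return int(any(f_v(x) == 0 and f_a(x) == 1
                   for x in product((0, 1), repeat=n_inputs)))


def observable(outputs, n_inputs):
    """Return 1 if some output q(x, v) changes when v is flipped."""
    return int(any(q(x, 0) != q(x, 1)
                   for q in outputs
                   for x in product((0, 1), repeat=n_inputs)))


def victim(x):
    return x[0] & x[1]


def aggressor_or(x):
    return x[0] | x[1]          # can be 1 while the victim is 0


def aggressor_and3(x):
    return x[0] & x[1] & x[2]   # being 1 forces the victim to 1


def output_q(x, v):
    return v | x[2]             # depends on the victim when x2 = 0


s_or = satisfiable(victim, aggressor_or, 3)
s_and3 = satisfiable(victim, aggressor_and3, 3)
o_v = observable([output_q], 3)
```

In this example the pair (`victim`, `aggressor_and3`) has zero satisfiability, so under the stated reading that aggressor cannot disturb the victim regardless of wire spacing.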
Cross-Coupling Power Optimization

Problem Formulation
We assume that the design has already been placed and routed using gridded channel routing. In addition, the primary input statistics are known a priori. These assumptions are realistic for soft or firm intellectual property (IP) macroblock specification.
The effective capacitance consists of both self and coupling capacitance. Coupling power is the dominant component which is heavily dependent upon net spacing. Hence, we isolate the effective coupling capacitance and optimize the net spacing in a routing solution in order to minimize the coupling power. We formulate the power optimization problem as follows:
Minimize the coupling power subject to $X_v \le X_{spec}$ and Routing area $\le Area_{spec}$, where $\pi$ is a member of the set of signal paths from the primary inputs to the primary outputs. The constraints specify that maximum crosstalk and routing area should both be kept within pre-defined bounds.
Coupling Power Optimization
Gridded channel routing is specifically considered. The objective of track assignment is to minimize the effective coupling capacitance (Equation (6)) in order to reduce coupling power, while satisfying all the constraints. Because it is hard to consider the objective function and constraints simultaneously during the routing process, a conventional channel routing approach, the left-edge algorithm [5], is used to generate the initial routing solution. The segments in the initial routing solution are then perturbed by moving, swapping, and permuting so as to minimize the coupling power. Note that the perturbation cannot be performed freely because of the vertical constraints on the relative vertical positions of the segments. The track assignment problem is known to be NP-hard, so we use a simulated annealing approach to obtain a near-optimal solution [7].
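For reference, the basic idea of left-edge track assignment can be sketched in a few lines. This is a first-fit formulation that ignores vertical constraints, so it is a simplification of what a channel router has to handle. The segment names and coordinates are hypothetical.

```python
def left_edge(segments):
    """Assign horizontal segments to tracks; return {name: track_index}.

    segments: iterable of (name, left, right) with left <= right.
    Segments are processed by increasing left edge and placed on the
    first track whose last segment ends strictly before `left`.
    """
    track_right = []    # rightmost occupied column of each track
    assignment = {}
    for name, left, right in sorted(segments, key=lambda s: s[1]):
        for track, last_right in enumerate(track_right):
            if left > last_right:
                track_right[track] = right
                assignment[name] = track
                break
        else:
            # No existing track has room, so open a new one.
            track_right.append(right)
            assignment[name] = len(track_right) - 1
    return assignment


segments = [("a", 0, 3), ("b", 1, 5), ("c", 4, 7), ("d", 6, 9)]
tracks = left_edge(segments)
```

Here four segments fit into two tracks, which equals the maximum number of segments crossing any single column.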
The temperature function is defined as follows: Without loss of generality, we assume that unit-length cross-coupling capacitance Cx has unit value for minimum spacing. Then the temperature is computed as the sum of the effective capacitive couplings of each segment of adjacent nets.
$Temp = \sum_{i,j} \sum_{s} \hat{C}_x(i_s, j_s)$ (20)
where net $i$ is adjacent to net $j$, $i_s$ is a segment of the net $i$, and $j_s$ is a segment of the net $j$. Each segment corresponds to a unit RC tile in Figure 1(b). The space between two adjacent net segments and the switching activities determine the effective capacitive coupling of each segment as in Equation (6). Given switching probabilities and slack time, the temperature is computed to be 13.4 for the solution in Figure 5(a). Figure 5(b) shows the routing solution based on track permutation, net swapping, and net moving [13]. The average coupling power is computed to be 10.7, a reduction of about 20% relative to the initial routing.
It is worth noting that track perturbation does not incur any area overhead, because we carry out permutation, moving, and swapping within the initial routing area. During the annealing process, the maximum crosstalk constraints are enforced by evaluating Equation (19). The concept of the crosstalk immunity set is applied to account for logical relationships between signals.
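The track perturbation loop can be sketched as a small simulated annealing routine. This is an illustrative simplification, not the authors' implementation: it only swaps whole tracks, models the coupling between two nets on neighboring tracks as a per-pair weight times their horizontal overlap, and omits the crosstalk check. The function `coupling_cost` plays the role of the cost that the text calls the temperature, while `temp` in the code is the usual annealing control parameter. All names, weights, and schedule values are hypothetical.

```python
import math
import random


def overlap(a, b):
    """Horizontal overlap length of two spans (left, right)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def coupling_cost(order, tracks, spans, weight):
    """Sum of weighted overlaps between nets on neighboring tracks."""
    cost = 0.0
    for upper, lower in zip(order, order[1:]):
        for i in tracks[upper]:
            for j in tracks[lower]:
                cost += weight.get(frozenset((i, j)), 1.0) * overlap(spans[i], spans[j])
    return cost


def feasible(order, tracks, vertical):
    """Check that every (upper_net, lower_net) pair keeps its relative order."""
    pos = {net: k for k, t in enumerate(order) for net in tracks[t]}
    return all(pos[u] < pos[l] for u, l in vertical)


def anneal(tracks, spans, weight, vertical, steps=2000, t0=5.0, alpha=0.995, seed=0):
    """Search track orders by random swaps; return (best_order, best_cost)."""
    rng = random.Random(seed)
    order = list(range(len(tracks)))
    cost = coupling_cost(order, tracks, spans, weight)
    best, best_cost = order[:], cost
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(range(len(order)), 2)
        cand = order[:]
        cand[a], cand[b] = cand[b], cand[a]
        if feasible(cand, tracks, vertical):
            c = coupling_cost(cand, tracks, spans, weight)
            # Always accept improvements; accept uphill moves with Boltzmann probability.
            if c <= cost or rng.random() < math.exp((cost - c) / temp):
                order, cost = cand, c
                if cost < best_cost:
                    best, best_cost = order[:], cost
        temp *= alpha
    return best, best_cost


tracks = [[0], [1], [2], [3]]                 # one net per track initially
spans = {n: (0, 10) for n in range(4)}
weight = {frozenset(p): w for p, w in
          [((0, 1), 0.9), ((1, 2), 0.8), ((2, 3), 0.9),
           ((0, 2), 0.1), ((1, 3), 0.1), ((0, 3), 0.2)]}
vertical = [(0, 3)]                           # net 0 must stay above net 3
initial_cost = coupling_cost([0, 1, 2, 3], tracks, spans, weight)
best_order, best_cost = anneal(tracks, spans, weight, vertical)
```

Because only the order of existing tracks changes, the number of tracks and hence the channel height stay fixed, which mirrors the observation that perturbation adds no area overhead.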
Experimental Results
The domino logic synthesis flow is shown in Figure 6; it is implemented on an UltraSPARC workstation. Technology-independent optimization is performed using SIS [12], and then all internal inverters are eliminated for a domino logic realization. Crosstalk immunity set information is extracted so that the maximum crosstalk computation accurately represents the logical behavior of adjacent wires [9]. The identified CIS information is used to compute the maximum cross-coupling effect so that noise constraints are satisfied during optimization. The circuit is then mapped with a parameterized cell library [14], which specifies the maximum number of allowable serial and parallel transistors in a cell. The mapped circuit is placed and routed using the left-edge algorithm, which provides the initial routing solution for track assignment. In each cooling step of the simulated annealing, the temperature in Equation (20) is evaluated based on switching probabilities and updated timing information. For each routing solution, interconnect parameters for the distributed RC model are extracted for a 0.18 µm technology process. Finally, we measure the power consumed by the circuit using HSPICE for given signal probabilities on the primary input signals.
Figure 7 contains two curves of power consumption in the benchmark circuit C880 for given input patterns. Curve (a) in Figure 7 represents the running average for the circuit implementation with the left-edge algorithm, while curve (b) represents the average power for the optimized routing. Notice that in Figure 7 the running average takes only about 60 simulation cycles to converge to its eventual level, and about 10 to 40 cycles to reach the 1% error line. Table 5 contains synthesis results and measured power in mW when the signal probabilities assigned to the primary inputs were 0.2, 0.5 and 0.8, respectively. The results under the column "P(LE)" refer to the power consumed by the circuit implementation using the left-edge algorithm.
It should be noted that the power measured by HSPICE includes the power consumed by gate capacitance and interconnect capacitance, the power attributed to short-circuit current, and leakage power. The power consumed by the optimized implementation is reported under the column "P(OPT)". The optimized routing solutions save 14.2%, 12.6%, and 16.1% power compared to the initial routing for the three signal probabilities, as shown in the column "% Sav", which denotes the percentage of power saved. The impact of optimization on power saving is expected to become more significant in more deeply scaled UDSM processes, since power dissipation will depend more heavily on coupling capacitance there than in the 0.18 µm technology that we used. Table 2 contains the results of maximum crosstalk computed by Equation (19) for each physical realization with unit value for the coupling capacitance $C_x$. The results show that maximum crosstalk effects are reduced by 10.7%, 8.8%, and 12.1% for the three signal probabilities, respectively.
Conclusion
Signal integrity and energy efficiency are becoming important in ultra deep sub-micron technology. Furthermore, as a premier candidate for high-performance system design, reliable domino logic implementation requires a well-structured interconnect network. The reason is that domino logic is more vulnerable to noise in comparison to static CMOS logic.
We presented a formulation for the average power dissipated by interconnect. The power model accounts for signal and temporal correlations when computing effective capacitance. Signal correlations are evaluated efficiently by using OBDDs. Timing analysis provides the temporal correlation factor, which accounts for charge redistribution on coupling capacitance due to misaligned transitions on adjacent nets. Maximum crosstalk effects between neighboring wires are then modeled to reflect logical aspects as well as electrical factors. Based on the coupling power model and the maximum crosstalk model, coupling power is optimized using track perturbation, while the constraints on area and noise are kept within the specification. Experimental results show that over 12% of total power is saved and the maximum crosstalk is reduced by 8.8% with an on-probability of 0.5 for all primary input signals.
