Abstract: A novel concept of bidirectional transformation between on-chip coupling noise waveform and delay-change curve (DCC) using closed-form equations is described in this paper. These equations are targeted for use in: 1) the efficient generation of DCCs and 2) accurate experimental determination of subnanosecond coupling noise. In particular, we explore the concept of using analytical models to efficiently generate DCCs that can then be used to characterize the impact of noise on any victim/aggressor configuration. The concept is model independent, although we investigate several common noise modeling choices and perform a sensitivity analysis to optimize the generation of DCCs. By extending existing noise models, arbitrary configurations can be considered including multiple aggressors in the timing-analysis framework. Simulation using the analytical approach closely matches time-consuming SPICE simulations, making noise-aware timing analysis using DCCs both efficient and accurate. A test chip using a 0.25-µm CMOS process was designed and its measurement results also show good agreement with SPICE simulations.
I. INTRODUCTION
TIGHTER metal pitches, larger aspect ratios, and increasing operating frequencies place greater importance on interconnect coupling effects. Excess coupling noise may cause false switching on the victim net, but even a small amount of noise can change the victim delay significantly, resulting in timing degradation. Thus, accurate estimations of the coupling noise impact are crucial for noise-sensitive circuit blocks, critical timing paths, and long interconnects connecting intellectual property (IP) blocks.
Characterization of the interconnect delay and its variation due to coupling noise is required to maintain an adequate timing margin and achieve the targeted performance. Currently, noise must be considered at nearly every stage of high-speed circuit design in order to reduce the number of design iterations. For example, an incremental approach has been suggested to rewire noise critical nets after an initial routing solution is found [1] . Another technique generates a list of aggressors for each noise or timing critical net; these aggressors are sequentially rerouted until constraints are met [2] . Although inductive effects have become a major concern [3] , [4] in high-performance applications, the effect of capacitive coupling on timing is still an open issue and remains dominant since nets exhibiting strong inductive behavior are relatively few and addressed using shielding and other techniques. The impact of coupling noise on the victim delay depends strongly on the relative input timing of the aggressor and victim. It is too pessimistic to use only the nominal and maximum delay values. The delay variation becomes maximum in two extreme cases when the parallel lines switch nearly at the same time in either the same or opposite directions. The delay change decreases when there is a timing difference between the aggressor and victim inputs. Therefore, the relative signal arrival time (RSAT) of the aggressor and victim is a key parameter to realizing a compact layout with sufficient timing margin [5] , [6] .
The victim delay change due to coupling noise can be captured using in situ measurement techniques [7] . The resulting delay-change curve (DCC) is useful for verifying and calibrating simulation models; it also serves well in timing analysis. In [7] , it was pointed out that the DCC correlates with the shape of the coupling noise (such as the noise base width), but translation back to the originated complete waveform is nontrivial. Once the DCC is related to the original coupling noise waveform or some of its defining parameters, characterization of interconnect noise effects becomes simpler. While there has been a significant amount of work in modeling coupled noise waveforms [8] - [13] , there has been no work in modeling the shape of the DCC. By developing models for DCCs, we can specify the links between them and coupled noise waveforms in an analytical manner, allowing us to move easily between the delay and noise design spaces.
In this paper, the relationship between capacitive coupling noise waveforms and DCCs is modeled for the first time. Based on this model, an efficient methodology is proposed for generating the DCC either through a small number of SPICE simulations or through fully analytical means. The main contributions of this paper are: 1) a demonstration that the DCC and the coupled noise waveform are fundamentally related; 2) a presentation of a method of translating between these noise forms; and 3) a proposal of the practical application of the new methodology and verification of its usefulness. Simulation results are also validated by test chip measurements using a 0.25-µm process. This paper is organized as follows. Section II presents the background of this work: crosstalk noise issues and noise approaches in timing analysis, how the DCC is obtained, and how it can be used in timing analysis. Section III describes the fundamental relationship between the DCC and the victim noise and aggressor waveforms, enabling bidirectional transformation between them. Section IV describes two major applications of DCC to coupled noise translation and highlights modeling details and sensitivities. Then, in Section V, the analytical results, simulations, and test chip measurements are compared. Conclusions are stated in Section VI.
II. BACKGROUND

A. Crosstalk and Dynamic Delay
Noise caused by interconnect effects can be separated into one of two forms: 1) crosstalk, which we define as a voltage glitch on a quiet victim line and 2) dynamic delay , which refers to the uncertainty in delay of a stage (gate + wire) due to the switching activity of nearby gates. For static CMOS designs, the functional implications of crosstalk are not as significant as the potential timing errors caused by dynamic delay. Due to the restoring nature of CMOS logic, a noise glitch would need to exceed the switching threshold of the fan-out gate in order to cause functional failure. In contrast, delay changes occur for any coupled interconnect and the resulting change from dynamic delay can easily exceed 20%-30% for relatively short wires ( 0.5 mm), depending on driver and interconnect configurations. Fig. 1 shows the increase in delay uncertainty for a 3-mm global wire through a number of technology generations. A large inverter with a fan-out of one serves as both victim and aggressor. We see that worst case dynamic delay approaches the 80% plateau in this example, corresponding to the portion of capacitance due to coupling. This degree of delay uncertainty is intolerable for designs with tight timing budgets.
B. Modeling Approaches for Dynamic Delay
There are two primary modeling approaches to dynamic delay. The first is based on the Miller effect, which replaces a capacitance between two nodes by equivalent capacitances to ground at each node [14]. In an on-chip context, the coupling capacitance $C_c$ between two adjacent wires is replaced by a ground capacitance for each net. The resulting ground capacitance has traditionally been set to either 0 or $2C_c$, which have long been considered the lower and upper bounds, respectively. Recent work has shown that the actual bounds on the effective coupling capacitance are $-C_c$ and $3C_c$ [11]. The coefficients of $C_c$ are often called switch factors.
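As a small illustration of switch-factor analysis, the following sketch folds a coupling capacitance into an effective ground capacitance for a given switch factor. The function name and the per-millimeter capacitance values are illustrative assumptions and are not taken from the models cited above.

```python
def effective_ground_cap(c_ground, c_couple, switch_factor):
    """Miller-style decoupling: replace the coupling capacitance by a
    ground capacitance scaled by the switch factor."""
    return c_ground + switch_factor * c_couple


# Assumed, illustrative per-mm values in fF (not from the paper).
C_GROUND, C_COUPLE = 80.0, 70.0

for k in (0, 1, 2, 3):
    # k = 0 and k = 2 are the traditional bounds; k = 1 is a quiet neighbor.
    print(k, effective_ground_cap(C_GROUND, C_COUPLE, k))
```

Because the load seen by the victim driver scales with the chosen switch factor, any delay computed from it inherits the pessimism of that choice.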
The second modeling approach to dynamic delay recognizes the fundamental relationship between crosstalk and dynamic delay. In [15] , the authors note that neighboring wires can be viewed as an added load for the victim gate and, as such, we should be able to directly calculate the additional charge required to switch these new loads. By examining the voltage glitch experienced on the victim line in the crosstalk scenario, we can find an upper bound on the amount of charge needed to counteract the influence of the aggressors. In short, dynamic delay can be characterized by superimposing the voltage glitch of the victim experiencing crosstalk onto the switching waveform of the victim when aggressors are quiet. While not exact due to device nonlinearities, this approach has been shown to yield good results for a variety of driver and interconnect dimensions.
Switch factor based analysis is simple and handy but tends to be pessimistic since switching of the aggressor and victim does not necessarily occur at the worst timing alignment. In this paper, we take the second approach to accurately characterize the noise impact on delay by considering the relative timing of the victim and aggressors.
C. Relative Timing Window Analysis
In [6] , the authors describe a novel method for dealing with dynamic delay in timing analysis. The idea is based on the observation that, while worst case dynamic delay occurs when the aggressor and victim change nearly simultaneously, the delay is a strong function of exactly when these switching phenomena take place. When the aggressor and victim switch at time points very far from one another, there is no dynamic delay impact; the nominal delay is obtained. In addition, with a slight offset of switching events, the delay change is less than the worst case but still greater than zero. Thus, the authors in [6] introduced the concept of the relative window method (RWM) in which the range between earliest and latest RSATs is propagated. The window edges passed to subsequent stages are determined by consulting a DCC of the driving gate which is a function of relative signal arrival time (RSAT), where RSAT is defined as aggressor signal arrival time (ASAT)-victim signal arrival time (VSAT). When VSAT and ASAT are deterministic, we have a deterministic value of RSAT and we can use the DCC to find the delay for this value of RSAT. However, static timing tools deal with ranges, not deterministic VSATs and ASATs. Hence, we obtain a range of possible RSATs, which is called the relative window, and we use the worst case delay in that relative window for static timing analysis (STA). Traditional switch factor timing analysis models assume that the worst case applies whenever the noise exists. This approximation overconstrains the design and cuts into the available timing budget; this effect will become more severe with shrinking clock periods and rising noise effects.
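The relative-window lookup described above can be sketched as follows. This is a minimal illustration, assuming the DCC is stored as sorted (RSAT, delay change) points with linear interpolation between them and zero delay change outside the tabulated range; the function names and the sample DCC values are hypothetical.

```python
def rsat_window(aggressor_window, victim_window):
    """Range of RSAT = ASAT - VSAT given (earliest, latest) arrival windows."""
    a_early, a_late = aggressor_window
    v_early, v_late = victim_window
    return (a_early - v_late, a_late - v_early)


def dcc_lookup(dcc, rsat):
    """Piecewise-linear DCC lookup; assumes zero delay change outside the table."""
    if rsat <= dcc[0][0] or rsat >= dcc[-1][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(dcc, dcc[1:]):
        if x0 <= rsat <= x1:
            return y0 + (y1 - y0) * (rsat - x0) / (x1 - x0)
    return 0.0


def worst_case_delay_change(dcc, window):
    """Largest delay change over the relative window. For a piecewise-linear
    DCC the maximum lies at a window edge or at a tabulated point inside it."""
    lo, hi = window
    candidates = [dcc_lookup(dcc, lo), dcc_lookup(dcc, hi)]
    candidates += [y for x, y in dcc if lo <= x <= hi]
    return max(candidates)


# Hypothetical DCC: (RSAT in ps, delay change in ps).
SAMPLE_DCC = [(-400, 0.0), (-200, 10.0), (0, 50.0), (100, 0.0)]

window = rsat_window((1800, 1900), (2000, 2100))   # arrival windows in ps
print(window, worst_case_delay_change(SAMPLE_DCC, window))
```

In this example the relative window never reaches RSAT = 0, so the delay change used for STA is smaller than the DCC peak that a switch-factor analysis would effectively assume.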
The concept described above is demonstrated in Fig. 2, which shows a DCC. In the graph, RSAT is varied, where it is again defined as the aggressor arrival time minus the victim arrival time at the gate inputs. Near RSAT = 0, the maximum dynamic delay is observed. At either end of the RSAT axis, the dynamic delay is zero since the switching events are, in effect, independent at these points. The most interesting part of the curve is the intermediate region, where the delay is changed from its nominal value, yet it is impossible for existing modeling approaches to determine exactly how much it has changed.
The approach of [6] builds a DCC from circuit-level simulations using a typical line length. It is unclear how results from these simulations are applied to actual on-chip scenarios where wirelengths vary. In addition, the sheer number of simulation environments required by different interconnect configurations, drive strengths, line lengths, etc., makes timing analysis based on simulation-generated DCCs impractical. However, a major advantage of this approach is that by using the DCC to focus on signal arrival times, [6] reduces the conservatism shown in many noise-based timing-analysis engines. Worst case noise is not always assumed. In addition, DCC captures the possibility of dynamic delay when switching windows do not overlap while switch-factor-based approaches cannot.
To illustrate this latter phenomenon, examine Fig. 3. A simple inverter-based circuit has different input arrival times to the aggressor and victim gates. Here, the aggressor arrival time is 1.9 ns (at ) and the victim arrival time is 2.05 ns, resulting in an RSAT of -0.15 ns. Although the waveforms propagating along the nets have only a slight overlap (the last 5% of the aggressor transition in this example), the switching delay of the victim is substantially different (a 23% rise in this case). Nevertheless, STA tools based on comparing switching windows would expect zero dynamic delay for this case. The noise waveform on the victim arising from the earlier aggressor transition has caused the initial voltage of the victim switching event to be less than 0 V. As a result, additional charge has to be supplied by the victim driver, effectively increasing the delay. In this manner, aggressor transitions that occur well before victim switching events can contribute to delay changes. Also, the slew rate of the victim will not be changed under these conditions since the delay increase is only due to the initial voltage conditions and not concurrent switching activity. The rise time in this case is within 1% of the scenario where the aggressor is quiet.
D. DCC Generation
Recent work has presented ways to measure and model the presence of dynamic delay in advanced processes [7] , [16] . Fig. 4 shows a DCC measurement structure [7] and Fig. 5 shows how a DCC is constructed based on measurements and/or simulations. Switching input pulses are applied to both the aggressor and victim inputs. Measurement of the voltage waveforms at the pads compensates for first-order process, voltage, and temperature variations of the output path delay. RSAT is swept by changing the aggressor signal arrival (or victim arrival timing) to find each data point in Fig. 2 .
In [7] , the authors suggested a relationship between the coupled noise waveform (the crosstalk voltage glitch) and the DCC. For example, the crosstalk noise pulse width is fundamentally related to the width of the DCC, which can be well understood by comparing Figs. 2 and 5. The width of the nonzero delay change in a DCC gives an approximation of the noise base width since the delay change occurs only when the noise and the victim edge cross. This point is important since it gives designers and STA tools an idea of how sensitive coupled nets are to noise effects across time. We investigate these points further in the next section in order to characterize the coupling noise using DCCs.
III. RELATIONSHIP BETWEEN DCC AND VICTIM WAVEFORM
From observing Figs. 2 and 5, we can regard a DCC measurement as a sampling of the coupling noise using the victim signal edge. The relationship between a DCC and both the coupled noise waveform and the without-noise victim waveform can be characterized.
A. Victim Waveform Model
The far-end coupling noise waveform on the victim line and far-end without-noise victim waveform are individually modeled using the following equations. The parameters are defined in 
The coupling noise waveform is modeled as a linear rise to the peak voltage over an attack time, followed by an exponential decay with a single time constant. A timing variable is defined as the difference between the aggressor and without-noise victim waveforms at the far end of the interconnect; it represents the timing difference at the inputs, that is, RSAT. The without-noise victim waveform is modeled as a single-time-constant exponential that settles to the supply voltage. Any timing alignment of the aggressor and victim inputs can be realized by changing this timing variable.
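The two waveform shapes described above can be written down directly. The sketch below is one possible rendering, assuming a noise pulse that starts at t = 0 and a victim transition that starts at a separate offset; all parameter names (v_peak, t_attack, tau_noise, tau_victim, t_offset) and the sign convention of the offset are assumptions made for illustration.

```python
import math


def coupling_noise(t, v_peak, t_attack, tau_noise):
    """Linear rise to v_peak over t_attack, then exponential decay.
    v_peak may be negative (e.g., falling aggressor next to a rising victim)."""
    if t <= 0.0:
        return 0.0
    if t <= t_attack:
        return v_peak * t / t_attack
    return v_peak * math.exp(-(t - t_attack) / tau_noise)


def victim_without_noise(t, t_offset, vdd, tau_victim):
    """Single-time-constant rising victim edge that starts at t_offset."""
    s = t - t_offset
    if s <= 0.0:
        return 0.0
    return vdd * (1.0 - math.exp(-s / tau_victim))


# Example with assumed values: times in ns, voltages in V.
print(coupling_noise(0.05, v_peak=-0.5, t_attack=0.05, tau_noise=0.1))
print(victim_without_noise(0.3, t_offset=0.1, vdd=2.5, tau_victim=0.2))
```

Sweeping t_offset while keeping the noise pulse fixed reproduces any relative alignment of the two waveforms.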
B. Derivation of the DCC Equation Using Victim Waveform Model
Based on the observation in [15], we assume that the victim switching waveform with coupling noise can be expressed as the sum of the coupling noise and the without-noise victim waveform. Setting this sum equal to the reference voltage and solving for time gives the victim delay with noise. The delay change is then the difference between this delay and the without-noise victim delay, both defined at the same reference voltage, with the timing difference between aggressor and victim as the variable. In this calculation, exponential functions are expanded to a first-order polynomial. The derived DCC equation, (3), is a piecewise expression over three ranges of the timing difference.
Assuming the maximum delay change occurs at the noise peak [15], the absolute value of the maximum delay change is given by (4). Note that it is possible to use up to a fourth-order expansion to solve the equation analytically with greater accuracy. To keep the transform both bidirectional and concise, we focus on the first-order expansion here. This approximation is good and works as an upper bound for most practical cases. The error introduced by neglecting higher order terms is analyzed in Appendix I. The exactness of this maximum increases when the expansion is a good approximation. Another option for locating the maximum is to take the minimum of the equations around the peak of the DCC.
The DCC exhibits an exponential ramp-up and a linear decay, which is similar in shape to the coupling noise waveform but reflected about the vertical axis. With a larger without-noise victim time constant, the second expression of (3) becomes steeper, leading to a greater delay change. Since the peak noise strongly depends on the victim drive strength (quantified by the same time constant), the delay variation increases super-linearly with the victim time constant, as it appears in both terms of (4). The decay time constant of the DCC is equal to the coupling noise time constant.
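The superposition idea behind the DCC derivation can also be checked numerically. The sketch below is not the closed-form result of (3); it adds an assumed linear-rise, exponential-decay noise pulse to a single-exponential victim edge and finds the final crossing of the reference voltage by a grid scan followed by bisection. Parameter names, the offset convention (time at which the noise pulse starts, measured from the start of the victim edge), and the numerical values are assumptions.

```python
import math


def noise(t, v_peak, t_attack, tau_n):
    """Assumed noise shape: linear rise to v_peak, then exponential decay."""
    if t <= 0.0:
        return 0.0
    if t <= t_attack:
        return v_peak * t / t_attack
    return v_peak * math.exp(-(t - t_attack) / tau_n)


def victim(t, vdd, tau_v):
    """Assumed without-noise victim edge starting at t = 0."""
    return 0.0 if t <= 0.0 else vdd * (1.0 - math.exp(-t / tau_v))


def delay_with_noise(offset, vdd, tau_v, v_peak, t_attack, tau_n,
                     v_ref, t_end, steps=5000):
    """Last time at which victim + noise rises through v_ref."""
    def total(t):
        return victim(t, vdd, tau_v) + noise(t - offset, v_peak, t_attack, tau_n)

    dt = t_end / steps
    last_below = 0
    for i in range(steps + 1):          # coarse scan for the final crossing
        if total(i * dt) < v_ref:
            last_below = i
    lo, hi = last_below * dt, (last_below + 1) * dt
    for _ in range(60):                 # bisection refinement
        mid = 0.5 * (lo + hi)
        if total(mid) < v_ref:
            lo = mid
        else:
            hi = mid
    return hi


def delay_change_curve(offsets, vdd, tau_v, v_peak, t_attack, tau_n,
                       v_ref, t_end):
    """Delay change relative to the without-noise delay for each offset."""
    nominal = tau_v * math.log(vdd / (vdd - v_ref))
    return [delay_with_noise(o, vdd, tau_v, v_peak, t_attack, tau_n,
                             v_ref, t_end) - nominal for o in offsets]
```

A curve generated this way is useful as a reference when judging how much accuracy a first-order expansion gives up near the peak.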
IV. APPLICATIONS OF THE BIDIRECTIONAL TRANSFORM
The proposed equations (1)-(4) link the victim waveforms and the DCC bidirectionally. The ability to transform from DCC to victim waveform and from victim waveform to DCC has several important ramifications. In this section, we describe two major applications:
1) accurate measurement of on-chip coupling noise, and 2) DCC estimation from victim waveform.
A. Accurate Measurement of On-Chip Coupling Noise
Since the DCC can be readily measured and translated into victim noise waveforms using (1)-(4), the DCC measurement works as an indirect measurement technique for very sharp subnanosecond width coupling noise which is difficult to measure. The DCC measurement structure, such as in Fig. 4 , makes it possible to mimic actual circuit loading conditions without the parasitic effects of probing. This indirect measurement also provides the victim signal transition time or slew rate at the receiver, which is important in high-speed circuit design. Table I summarizes the set of data points to be measured and how to extract the waveform parameters. To determine the coupling noise and the without-noise victim waveforms from the DCC, follow steps (a)-(d).
B. DCC Estimation From Victim Waveform
Another important application of translating between DCCs and victim waveforms is the efficient generation of DCCs. Noise-aware timing analysis calculates the earliest and latest arrival times based on traditional STA but using DCCs [6] . Since there are an unlimited number of driver sizes, gate types, interconnect topologies, fan-out conditions, etc., DCC calculation must be extremely fast. For quick generation of the DCC, the translation from (1) and (2) to (3) can be used when delay variation data is required during full-chip routing or timing analysis. If the parameters of the coupling noise and the without-noise victim waveform are available, they enable DCC calculation for the entire RSAT range. The computation time required to construct the DCC is very low since the translation is done using closed-form equations.
The overall flow of analytical DCC generation is shown in Fig. 7 . We begin with the extracted parasitics of the design, including coupling capacitances. The remainder of the process focuses on translating these extracted RC parasitics to a DCC. First, the victim noise model is key since the noise waveform shape determines the nature of the DCC. As mentioned earlier, the noise waveform can be seen as a mirror image of the DCC; a slowly decaying noise spike translates to a slow ramp-up in the DCC toward the worst case delay. Likewise, a sharp ramp-up in the noise waveform leads to a rapid decay when the RSAT becomes slightly positive. This relationship is shown graphically in Fig. 8 . Therefore, we need a coupled noise model that accurately portrays the entire waveform shape rather than only capturing the peak noise value.
To derive the required victim waveforms, we propose the use of either a single SPICE run (for fitting purposes) or purely analytical models.
1) Single SPICE Run Approach:
The first option is to run SPICE once to determine the without-noise victim ramp and victim noise waveforms. To derive these waveforms independently, RSAT should be set large enough so that the noise on the victim and the without-noise victim ramp do not overlap. Using the simulated waveforms, the four required parameters in (1) and (2) (the noise peak voltage, the attack time, the noise decay time constant, and the victim time constant) are found using a simple fitting procedure. Once the parameters are determined, the DCC for the full range of RSAT can be calculated using (3).
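One simple fitting procedure of this kind is sketched below. It assumes the simulated waveforms are available as lists of (time, voltage) samples, fits the noise decay at the 50% point after the peak, and fits the victim time constant at the 50% crossing of the supply; the function names, the 1% onset threshold, and the choice of fitting points are assumptions rather than the procedure used by the authors.

```python
import math


def fit_noise(samples, onset_fraction=0.01):
    """Return (v_peak, t_attack, tau_noise) from sampled noise (t, v) pairs."""
    t_peak, v_peak = max(samples, key=lambda s: abs(s[1]))
    # Onset: first sample whose magnitude exceeds a small fraction of the peak.
    t_start = next(t for t, v in samples
                   if abs(v) > onset_fraction * abs(v_peak))
    # Decay: first sample after the peak at or below half of the peak value.
    t_half = next(t for t, v in samples
                  if t > t_peak and abs(v) <= 0.5 * abs(v_peak))
    tau_noise = (t_half - t_peak) / math.log(2.0)
    return v_peak, t_peak - t_start, tau_noise


def fit_victim(samples, vdd, t_onset=0.0):
    """Return the victim time constant from the 50% crossing of a rising edge.
    t_onset is the (assumed known) time at which the edge starts."""
    t_half = next(t for t, v in samples if v >= 0.5 * vdd)
    return (t_half - t_onset) / math.log(2.0)
```

With sampled data the estimates are only as fine as the time step, so a small step (or interpolation between samples) is advisable near the fitting points.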
If the proposed equations of Section III-B are not used, at least 20 transient simulations are required to generate even the rough shape of the DCC since a wide range of relative arrival times between the aggressor and victim must be considered. Therefore, the single SPICE simulation in conjunction with the proposed equations can achieve more than an order of magnitude reduction in DCC calculation time. This process significantly reduces the number of SPICE simulations and the computation time needed to generate DCCs while maintaining good accuracy. However, even this single SPICE run is too costly for on-the-fly generation of DCCs for each cell instance in a large design. To further enhance the efficiency of generating DCCs, we extend the above process by eliminating the single SPICE run used for parameter fitting and replacing it with accurate closed-form noise models to extract the relevant model parameters.
2) Analytical Approach: The analytical model used in this section is derived from the transfer function of the distributed RC line by approximating the solution into a two-pole expression in the time domain, which is similar to the approach in [13] . The equivalent circuit is presented in Fig. 9 . The model used here is only valid for two fully coupled lines but provides insight into how we can compute the DCC based on a given noise model. Handling the general case will be discussed in Section IV-C.
The equivalent nonlinear CMOS gate of the quiet driver can be modeled by its effective linear resistance. However, for a switching driver the impedance changes during switching and a single linear resistance model is not accurate. In this model, we use a ramped input voltage source to represent the driving gate. The slope of this ramp can be obtained by using any of the established approaches [17] . Although these approaches are iterative, they are preferable over other approaches that attempt to model nonlinear gates with a single resistance. In this paper, we concentrate on modeling noise and delay assuming that the slope of the waveform at the beginning of the interconnect (just at the output of the driver) can be obtained accurately.
The delay waveform expression, (5), and the noise waveform on the victim line, (6), are each piecewise in time. Details and parameters of (5) and (6) are listed in Appendix II. Eq. (6) serves as the noise model depicted in step 2 of Fig. 7. Note that models considering arbitrary coupled lines can easily be used instead of the present model within the DCC generation framework. The peak noise voltage can be solved for in closed form, giving (7).
In (7), the aggressor ramp rate is found using the established approaches mentioned above, and the other parameters are again given in Appendix II. In translating from crosstalk noise to dynamic delay, it is important to know the time at which the worst case noise occurs; this is given by (8). With the above equations, we have the exact form of the noise waveform (the exactness depends on the model accuracy) and have readily extracted two key parameters from it: the peak noise voltage and the time at which the peak occurs. The next step is to mathematically transform the noise waveform into a DCC, which is difficult to do based on a complex noise expression such as (6). Instead, we approximate the noise waveform by a simpler two-piece model with a linear ramp up to the peak and an exponential decay after the peak. Furthermore, the without-noise victim waveform (which we need to solve for the new delay value) is approximated by a single rise/fall time constant.
This victim time constant can be found by fitting the resulting without-noise waveform of (5) to a single-exponential ramp at some specified voltage level (e.g., 50% or 63% of the voltage swing).
These expressions can be used to generate one possible dynamic delay model, as shown in Fig. 7. We calculate the peak noise voltage and the peak time from (7) and (8) above and then fit the two remaining time constants in (3), one for the noise decay and one for the victim, by comparing to the more accurate two-pole models of (5) and (6). We take this approach because the two-pole models are considerably more complex to translate to DCCs; instead we focus on transforming the results accurately to the simpler one-pole models of (1) and (2). After these four parameters are found, a DCC can be generated directly using the extracted RC parasitics with no simulations.
In the next section, a more general noise model is used in the same manner to handle multiple aggressors, emphasizing the model independence of the DCC generation concept. Continued technology advances require the handling of increasing numbers of coupled nets with heightened accuracy. The proposed technique avoids two potential problems under these circumstances: 1) the need to store enormous databases of possible coupling combinations and 2) inaccuracies resulting from table look-up and interpolation.
C. General Models Incorporating Multiple Aggressors
In this section, we extend the above approach to practical cases with multiple aggressors. Most nets in modern designs are capacitively coupled to at least several other nets; this fact complicates timing analysis as each of the aggressors will have separate signal arrival times and will act on the victim in a distinct manner (see Fig. 10). Sasaki extended his relative window analysis method to include the effects of multiple aggressors in [18] and confirmed the technique experimentally in [19]. This approach uses the absolute arrival time of the victim as a reference point so that temporal isolation of aggressors is accounted for (i.e., aggressors that cannot act upon the victim simultaneously due to exclusive timing windows should not be considered together). The translation from crosstalk noise waveform to DCC described in the previous section is still valid in a multiple aggressor scenario so long as an approach similar to [18] is used to avoid simply summing the effect of the completely independent aggressors.
Fig. 10. Victim line typically has more than one aggressor. Here two aggressors with partial coupling complicate the timing analysis environment.
Our multiaggressor approach based on [18] can be summarized as follows.
1) Consider one aggressor at a time while assuming all of the other aggressors are quiet. Find the noise due to this aggressor using a general noise model (one is described below). Find the DCC for this victim-aggressor pair in the same way as a single aggressor case.
2) Repeat step 1 for each aggressor.
3) Plot the worst case delay change as a function of absolute victim signal arrival time. This requires computing the range of relative signal arrival times for all of the aggressors at each value of victim signal arrival time.
The maximum delay impact due to each aggressor in its RSAT range is found from their DCCs and added together to calculate the maximum possible delay change at this particular value of victim signal arrival time. Effectively, the worst case switching behavior is determined for each aggressor individually at each time point for which the victim can switch. In this way, a curve is built from worst case delay change data points across the victim switching window. In step 3, instead of summing the worst case delays due to each aggressor, the multiple aggressor relative window method considers the temporal isolation of aggressors by using the absolute arrival time of the victim as a reference point. At all possible values of victim signal arrival time in the victim switching window, we compute the range of relative signal arrival times for all aggressors and find the maximum victim delay change due to each aggressor within its RSAT range based on DCCs from steps 1 and 2. This point is important in reducing pessimism because the worst case delays due to each aggressor do not typically occur at the same time (or would do so very rarely).
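Step 3 of the flow above can be sketched as follows, assuming each aggressor's DCC is available as sorted (RSAT, delay change) points with linear interpolation and zero delay change outside the table. The function names and the triangular DCC are hypothetical; the timing windows loosely follow a two-aggressor arrangement with the victim window between them.

```python
def dcc_lookup(dcc, rsat):
    """Piecewise-linear DCC lookup; assumes zero delay change outside the table."""
    if rsat <= dcc[0][0] or rsat >= dcc[-1][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(dcc, dcc[1:]):
        if x0 <= rsat <= x1:
            return y0 + (y1 - y0) * (rsat - x0) / (x1 - x0)
    return 0.0


def max_in_window(dcc, lo, hi):
    """Maximum delay change for RSAT anywhere in [lo, hi]."""
    candidates = [dcc_lookup(dcc, lo), dcc_lookup(dcc, hi)]
    candidates += [y for x, y in dcc if lo <= x <= hi]
    return max(candidates)


def composite_dcc(victim_window, aggressors, step):
    """Worst case delay change versus absolute victim arrival time.
    aggressors is a list of ((earliest, latest), dcc) pairs."""
    v_early, v_late = victim_window
    points = []
    for i in range(int(round((v_late - v_early) / step)) + 1):
        vsat = v_early + i * step
        total = 0.0
        for (a_early, a_late), dcc in aggressors:
            # RSAT range of this aggressor at this victim arrival time.
            total += max_in_window(dcc, a_early - vsat, a_late - vsat)
        points.append((vsat, total))
    return points


# Hypothetical DCC in ps, shared by both aggressors for simplicity.
TRI_DCC = [(-1000, 0.0), (0, 50.0), (100, 0.0)]
AGGRESSORS = [((4000, 5500), TRI_DCC), ((7000, 8500), TRI_DCC)]

for vsat, change in composite_dcc((5000, 7500), AGGRESSORS, 500):
    print(vsat, change)
```

Because each aggressor contributes only the maximum reachable within its own RSAT range at a given victim arrival time, the two peaks never add in this example, which is the pessimism reduction the method is intended to provide.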
The noise model used in this flow must be capable of handling arbitrary configurations, particularly various aggressor placements, drive directions, etc. The noise model introduced in the previous section is valid only for two fully coupled lines. We now describe a general noise model and extensions we make to it in order to calculate crosstalk noise waveforms for each aggressor acting on a single victim.
The victim delay waveform when aggressors are quiet can be calculated in the traditional fashion, with coupling capacitances to aggressors viewed as capacitances to ground. Likewise, when analyzing each aggressor individually, the other aggressors are considered quiet and their coupling capacitance to the victim is treated as ground capacitance (switch factor of 1). We use the 2π-model from [8] with modifications for the estimation of crosstalk noise. This model considers the location of coupling and can be used effectively for generic RC trees. Furthermore, the 2π-network better approximates a distributed line than a single segment, making the model more appropriate for global wires than most previous approaches. The model also provides simple closed form expressions for noise peak and peak timing. However, [8] models the aggressor as a saturated linear ramp. In reality, it is more closely approximated by an exponential waveform. We extend the model to include this, yielding the following expression:
When dealing with multiple aggressors, to solve for the victim noise waveform due to a single active aggressor, we lump all the coupling capacitance due to the other (quiet) aggressors at the center of the coupling. After lumping these quiet coupling capacitances to ground, the resulting network is reduced to the equivalent 2π-network.
In (9), three time constants appear: the upstream resistance (referenced to the coupling point) multiplied by the coupling capacitance, the Elmore delay of the victim net, and the time constant of the aggressor rise time (originally the rise time itself in [8]). Another enhancement over [8] is the consideration of slew rate degradation along the aggressor line.¹ The aggressor ramp rate at the beginning of the line can be much different than the coupled ramp rate to the victim due to the line RC delay. We directly include this effect by dividing the aggressor line into a 2π-network and calculating the new time constant at the coupling point. The time constant at the coupling point can be obtained by using any delay metric. The Elmore delay metric is simple but its pessimistic results will directly translate to optimistic noise results. Therefore, we use the more accurate delay metric described in [20] to calculate slew rate degradation along the aggressor line.
In Fig. 11, we show noise waveform results for [8] as well as two forms of the extended model described above for the interconnect configuration of Fig. 10. The exponential aggressor waveform is a major improvement in that the exponential rise makes noise more prominent and shifts the peak timing earlier compared to linear models. By considering slew rate degradation, the model becomes more accurate for cases where the aggressor does not couple directly at the beginning of the line, as in Fig. 10.
The noise waveforms resulting from this approach can handle arbitrary configurations and are used in steps 1 and 2 of our multiple aggressor flow. It must be emphasized here that the multiple aggressor flow generates DCCs for each aggressor separately. This requires computing noise due to one aggressor at a time and translating to its DCC, which is identical to the approach from Section IV-B. By referencing all aggressor switching activity to the victim signal arrival time, the true impact of multiple aggressors can be determined. Fig. 12 shows the resulting composite DCC for a configuration with two independent partially coupled aggressors. The general topology is shown in the figure inset. The timing window intervals are [4, 5.5] ns for aggressor 1, [7, 8.5] ns for aggressor 2, and [5, 7.5] ns for the victim line. The x-axis in this figure represents the range of possible switching times for the victim, and the worst case switching behavior of the aggressors is exercised at each absolute time point. In the middle of the curve (around 6-6.4 ns), both aggressors have only a small impact on the delay of the victim. Since the individual DCCs show larger relative error at the tail of the noise waveform, the cumulative effect of two aggressors enhances this effect, leading to larger errors than other cases ( 30 overestimate). For the time points at which the delay change is larger, the multiaggressor RWM is still accurate. In particular, since we are most concerned with worst case delay for the victim, in this case we would accurately conclude the worst delay change is 54.4 ps from the analytical models while SPICE expects a value of 53.5 ps (both occurring at a victim arrival time of 5.5 ns). Linear superposition of noise due to the two aggressors yields an expected worst case delay change of ps, an overestimate of 81% compared to less than 2% error for the analytical RWM.
The line parasitics in this case are: 1) victim line resistance of 36 Ω/mm; 2) victim capacitance to ground of 81.6 fF/mm; 3) coupling capacitance of 68.8 fF/mm; 4) aggressor line resistance of 44 Ω/mm; and 5) aggressor ground capacitance of 77.6 fF/mm.
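The composite curve described above can be illustrated with a short sketch. It is not the flow used to produce Fig. 12: the triangular per-aggressor DCC, the RSAT sign convention (aggressor arrival minus victim arrival), and the choice to sum the per-aggressor worst case delay changes are all assumptions made for illustration, and every function name is hypothetical. Only the three timing windows are taken from the text.

```python
from typing import Callable, List, Tuple

Window = Tuple[float, float]  # (earliest, latest) arrival time in ns


def triangular_dcc(rsat: float) -> float:
    """Hypothetical per-aggressor DCC: 50 ps peak at RSAT = 0, zero beyond +/-0.5 ns."""
    return max(0.0, 50.0 * (1.0 - abs(rsat) / 0.5))


def worst_case_delay_change(dcc: Callable[[float], float], window: Window,
                            t_victim: float, steps: int = 200) -> float:
    """Largest delay change one aggressor can cause for a victim arriving at t_victim."""
    lo, hi = window
    worst = 0.0
    for k in range(steps + 1):
        t_aggr = lo + (hi - lo) * k / steps
        # Assumed convention: RSAT = aggressor arrival - victim arrival.
        worst = max(worst, dcc(t_aggr - t_victim))
    return worst


def composite_dcc(dccs: List[Callable[[float], float]], windows: List[Window],
                  victim_window: Window, points: int = 26) -> List[Tuple[float, float]]:
    """Sweep the victim arrival time and add up each aggressor's worst case.

    Summing the per-aggressor delay changes is an assumption of this sketch.
    """
    v_lo, v_hi = victim_window
    curve = []
    for k in range(points):
        t_v = v_lo + (v_hi - v_lo) * k / (points - 1)
        total = sum(worst_case_delay_change(d, w, t_v) for d, w in zip(dccs, windows))
        curve.append((t_v, total))
    return curve


# Timing windows quoted in the text for aggressor 1, aggressor 2, and the victim.
curve = composite_dcc([triangular_dcc, triangular_dcc],
                      [(4.0, 5.5), (7.0, 8.5)], (5.0, 7.5))
```

With these placeholder curves the composite is largest near the ends of the victim window, where one aggressor can align with the victim, and falls to zero in the middle, where neither aggressor window comes close to the victim arrival time.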
D. Modeling Considerations
In this section, we discuss the model independence of our DCC generation methodology and describe the sensitivity of our approach to the supporting model accuracy.
A major step in DCC generation is the simplification made to the noise waveform that allows closed-form translation to a DCC. We use a model with a linear ramp to the noise peak, followed by an exponential decay to zero. A comparison of this shape with the correct noise waveform taken from SPICE is shown in Fig. 13 . As can be seen, the approximated waveform underestimates noise as it increases toward the peak because it uses a sharp peak, rather than a rounded one as seen in practice. Furthermore, the exponential tail can be fit at any particular point along the waveform and this fitting point is set at 50% of the peak value nominally. This gives a decent fit throughout the curve but results in underestimation near the peak where dynamic delay is largest. An alternative is to use an exponential rise and decay, which is also shown in Fig. 13 . The overall results are improved: there is some underestimation near the beginning of the noise pulse that will not strongly impact the delay calculation since the resulting dynamic delay is small here. Otherwise, the model fits better than the simpler linear-exponential piecewise model of [16] .
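A minimal sketch of the linear-ramp plus exponential-decay approximation follows, assuming the reference noise waveform is available as sampled points (for example, from a SPICE run). The function names, the explicit `t_start` argument for the start of the ramp, and the synthetic test waveform are assumptions for illustration; the sketch only shows how the decay constant follows from fitting the tail at a chosen fraction of the peak (50% nominally).

```python
import math
from typing import List, Tuple


def fit_linear_exp(times: List[float], noise: List[float], t_start: float,
                   fit_fraction: float = 0.5) -> Tuple[float, float, float, float]:
    """Return (t_start, t_peak, v_peak, tau) for a ramp + exponential-decay model."""
    i_pk = max(range(len(noise)), key=lambda i: noise[i])
    t_pk, v_pk = times[i_pk], noise[i_pk]
    target = fit_fraction * v_pk
    # Locate the point on the falling edge where the waveform crosses the target.
    for i in range(i_pk, len(noise) - 1):
        if noise[i] >= target > noise[i + 1]:
            frac = (noise[i] - target) / (noise[i] - noise[i + 1])
            t_fit = times[i] + frac * (times[i + 1] - times[i])
            break
    else:
        raise ValueError("waveform never decays to the fitting level")
    # Force v_pk * exp(-(t_fit - t_pk) / tau) to equal the target at t_fit.
    tau = (t_fit - t_pk) / math.log(1.0 / fit_fraction)
    return t_start, t_pk, v_pk, tau


def linear_exp(t: float, t_start: float, t_pk: float, v_pk: float, tau: float) -> float:
    """Evaluate the approximated noise waveform at time t."""
    if t <= t_start:
        return 0.0
    if t <= t_pk:
        return v_pk * (t - t_start) / (t_pk - t_start)
    return v_pk * math.exp(-(t - t_pk) / tau)


# Synthetic stand-in for a simulated noise pulse (time in ps, arbitrary amplitude).
ts = [float(t) for t in range(401)]
ref = [math.exp(-t / 80.0) - math.exp(-t / 20.0) for t in ts]
params = fit_linear_exp(ts, ref, t_start=0.0)
```

Lowering `fit_fraction` moves the fitting point further down the tail, which trades accuracy near the peak for accuracy in the tail.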
In addition, it follows from the form of (9) that noise waveforms can be modeled accurately using a dual-exponential expression. However, with a dual-exponential noise expression, the victim delay with noise cannot be solved for in closed form. In order to achieve the superior fit to realistic noise waveforms shown in Fig. 13, we can employ Newton-Raphson iterations starting from the initial estimate of victim delay with noise obtained from the linear ramp + exponential decay noise assumption. With this approach, the resultant DCC has a smoother shape and the underestimation near the peak seen with the linear ramp assumption is eliminated. DCCs generated based on both noise approximations are shown in Fig. 14. To emphasize efficiency, results in Section V are based on the simpler approximation in (5) and (6). The expansion of the double exponential waveform about may eliminate the iterative process, although we have not investigated this. In this way, even more complex models can be seen to be compatible with analytically generated DCCs.
A sensitivity analysis was undertaken to determine which parameters are most critical in the DCC generation methodology. Results for are shown in Fig. 15, where DCCs are generated based on the actual SPICE-extracted and values with errors of 10% (see footnote 2). In contrast to Fig. 15, deviations of 10% in result in less than 3% error in peak noise and a half-maximum width change of 8%. Overall, the study indicates that and are the most important parameters to accurately model in our approach while and do not strongly impact the DCC shape. This implies that timing models are as important as noise models in determining dynamic delay. In particular, strongly affects both the maximum delay change as well as the RSAT at which this peak occurs (i.e., it shifts the DCC in both the x- and y-directions).
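The Newton-Raphson refinement mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the victim is modeled as a single-pole rising exponential, the noise as a dual exponential that opposes the victim transition, and the initial guess is the without-noise 50% delay rather than the linear-ramp estimate. All names and numerical values are hypothetical.

```python
import math

VDD = 1.0  # normalized supply voltage (assumption)


def victim(t: float, tau_v: float) -> float:
    """Assumed without-noise victim waveform: single-pole rising exponential."""
    return VDD * (1.0 - math.exp(-t / tau_v))


def noise(t: float, amp: float, tau_slow: float, tau_fast: float) -> float:
    """Assumed dual-exponential noise pulse starting at t = 0."""
    if t <= 0.0:
        return 0.0
    return amp * (math.exp(-t / tau_slow) - math.exp(-t / tau_fast))


def d_noise(t: float, amp: float, tau_slow: float, tau_fast: float) -> float:
    """Time derivative of the noise pulse."""
    if t <= 0.0:
        return 0.0
    return amp * (-math.exp(-t / tau_slow) / tau_slow + math.exp(-t / tau_fast) / tau_fast)


def delay_with_noise(t_init: float, t_noise: float, tau_v: float, amp: float,
                     tau_slow: float, tau_fast: float,
                     tol: float = 1e-12, max_iter: int = 50) -> float:
    """Solve victim(t) - noise(t - t_noise) = VDD / 2 by Newton-Raphson."""
    t = t_init
    for _ in range(max_iter):
        f = victim(t, tau_v) - noise(t - t_noise, amp, tau_slow, tau_fast) - 0.5 * VDD
        if abs(f) < tol:
            return t
        df = (VDD * math.exp(-t / tau_v) / tau_v
              - d_noise(t - t_noise, amp, tau_slow, tau_fast))
        t -= f / df
    raise RuntimeError("Newton-Raphson did not converge")


# Example values in ps; the noise pulse begins 30 ps after the victim starts switching.
tau_v = 100.0
t_nominal = tau_v * math.log(2.0)  # without-noise 50% delay
t_noisy = delay_with_noise(t_nominal, 30.0, tau_v, 0.3, 80.0, 20.0)
delay_change = t_noisy - t_nominal
```

Repeating the solve over a sweep of `t_noise` values would trace out one DCC. A better initial estimate, such as the closed-form linear-ramp result described in the text, should mainly reduce the iteration count.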
We have focused above on finding accurate noise models, but emphasize that either delay models or underlying cell timing characteristics must provide good estimates of in order to generate highly accurate DCCs. In our approach, we fit based on the delay model of (1).
V. EXPERIMENTAL RESULTS
This section evaluates the accuracy of the above methodologies for noise-aware timing analysis through circuit-level simulation and test chip measurements. Specifically, we focus on
Footnote 2: The peak noise occurs at a fairly large positive RSAT value here; this is due to a fast aggressor and a relatively slow victim transition. Arrival times are defined at the beginning of transitions, and maximum noise occurs when a victim is mid-transition and a fast aggressor then couples to it.
A. Test Chip Description
A test chip was fabricated in a 0.25-μm, five-metal CMOS process to demonstrate the relationships between noise and dynamic delay. Based on [7], the test chip implemented the DCC measurement circuit shown in Fig. 4. The micrograph of the chip is shown in Fig. 16. The dimensions of the chip are 6.6 × 1.7 mm. The length, width, space, and metal layer of the interconnect were varied in the test structures implemented.
B. Victim Waveform Extraction From DCC
In this section, the coupling noise and without-noise waveforms of the victim are extracted from a measured DCC and compared to simulation results. Moving from a measured DCC to noise waveforms is useful when the noise waveform itself is very sharp, because the high bandwidth requirements make direct in situ measurement very difficult. For instance, the measurement technique of [7] exhibited clipping of the noise peak for half-maximum pulse widths below 0.35 ns in a 0.35-μm technology, while the technique of [21] did not measure any noise waveforms of under 1-ns duration. The parameters used in each case (wire resistance, wire capacitance to ground, and wire-to-wire coupling capacitance) are summarized in Table II. The waveforms in Fig. 17(a) and (b) correspond to the DCCs of cases A and B in Fig. 19, and Fig. 18 corresponds to cases C and D. In cases A and B (line and dashed line) of Fig. 19, the curves are constructed using a series of SPICE simulations that change the relative input timing between aggressor and victim, which is the same method as in the actual measurement. Using only these simulated DCCs, the solid lines in Fig. 17(a) and (b) are extracted using the procedures in Table I. The dotted lines are SPICE simulated waveforms stored during DCC construction. The noise pulse width measured at the half maximum matches within 5% for both cases, but the peak height is overestimated by 8% and 25% for Fig. 17(a) and (b), respectively. These errors are caused mainly by the linear ramp assumption of the noise waveform. Different approximations of the coupling noise waveform shape can be used to obtain better results at the expense of greater model complexity and additional measurement points.
In case D (symbols) of Fig. 19, the DCC is found using test chip measurements. Following the procedure shown in Table I, the without-noise victim waveform and the coupling noise waveform (solid lines) are extracted from the measured DCC in Fig. 18. The simulated and measured waveforms match well, with an accuracy of 11.1% for the noise width at the half maximum and 6.7% for the noise peak height (the measured values are smaller than the simulated values). Accuracy will increase when the measured resistance value for interconnect and measured transistor parameters are used instead of two-dimensional parasitic extraction and the nominal device specification. In particular, fluctuations in sheet resistance in deep submicrometer processes are known to be especially prevalent and hard to suppress due to boundary effects, cladding layers, etc. [22]. Results from the foundry test element group for this fabrication run show a 2 increase in sheet resistance compared to the nominal specifications. This information is included in the simulations of Fig. 19 for case C, which is seen to match the overall shape of the measured DCC (case D) well.
The results of this section confirm that a slower without-noise victim slope (demonstrated by changes of the victim driver size) leads to substantially larger delay changes. Inserting buffers for long interconnect and appropriately sizing global drivers for noise considerations as well as delay constraints is required to maintain signal integrity.
The approaches investigated include full SPICE generation, a single SPICE run followed by curve fitting as described in Section IV-B1, and the analytical approach of Section IV-B2. The horizontal lines show the predicted DCC peaks for the latter two approaches using (4). The wire structure is similar to that of Fig. 19. The symbols mark the full SPICE generation case, which uses 45 transient simulations. The dotted line represents the DCC calculated using the single SPICE run approach and the line represents the fully analytical generation approach. Results show that the analytical method is accurate throughout the range of the curve. In fact, the analytical approach is comparable or superior to the method using one SPICE run for fitting equation parameters. This validates the two-pole models used in Section IV-B2.
TABLE IV: PARASITICS USED IN TABLE III
Fig. 21 shows an out-of-phase DCC generated for a 2-mm fully coupled net routed on an intermediate metal layer using the improved model of Section IV-C. Different line dimensions are used for victim and aggressor nets to explore a more general case. Parasitics of the coupled lines used are as follows: 1) victim resistance is 36 Ω/mm; 2) victim capacitance to ground is 71.8 fF/mm; 3) aggressor resistance is 27 Ω/mm; 4) aggressor capacitance to ground is 81 fF/mm; and 5) coupling capacitance between the lines is 91.1 fF/mm. Very good fit is seen, with 5.5% error in half-maximum width and 4.4% for peak noise. Some underestimation of delay (about 10%) occurs around the peak, resulting from the linear-ramp noise approximation described in Fig. 13.
The iterative approach described in Section IV-D can reduce this error at the expense of computational efficiency.
C. DCC Estimation From Victim Waveforms
Tables III and IV present accuracy, parasitic parameters, and runtime results for the methods of generating DCCs in Fig. 21. The spacing and pitch symbols in the tables denote the minimum spacing and minimum pitch of the wires, respectively. Wire width is kept at minimum for all cases in Table III. We examined a wide range of interconnect and driver configurations in the same 0.18-μm technology and found that the analytical approach of Section IV-B2 gives smaller error than [16] for nearly all cases. For a range of interconnect pitches (using 1-, 2-, and 3-mm M6 lines, and 0.5- and 1-mm M2 lines), we found the average error of the analytical approach to be 8% for peak delay change and 17% for DCC half-maximum width. The error is primarily due to the simple one-pole models that we are using to drive the DCC translation. We are effectively forcing accurate data into a simplified model. These results are based on the two-pole noise model of Section IV-B2. Cases with the largest error tend to be those in which the noise (and, hence, the delay change) is very small. Large noise cases such as a 3-mm M6 line or a 1-mm M2 line using minimum pitch are modeled very accurately. In addition, the new method is much faster since simulation is completely avoided. For Table V, we ran 70 combinations of line lengths and wire pitches representative of an actual design and found that the analytical approach took less than 1 s. Since we are still using some fitting functions to calculate the parameters to pass to (3) and (4) described in Section IV-B2, the runtime is not completely negligible.
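The two accuracy figures quoted throughout this section (error in peak delay change and error in half-maximum width) can be computed from sampled curves as sketched below. The function names and the triangular sample curve are hypothetical; the only values taken from the text are the 54.4-ps analytical and 53.5-ps SPICE worst case delay changes of Section IV-C.

```python
from typing import List


def half_max_width(xs: List[float], ys: List[float]) -> float:
    """Width of a single-peaked sampled curve at half of its maximum (linear interpolation)."""
    peak = max(ys)
    half = 0.5 * peak
    i_pk = ys.index(peak)
    left, right = xs[0], xs[-1]
    # Walk left from the peak to the rising half-maximum crossing.
    for i in range(i_pk, 0, -1):
        if ys[i - 1] < half <= ys[i]:
            frac = (half - ys[i - 1]) / (ys[i] - ys[i - 1])
            left = xs[i - 1] + frac * (xs[i] - xs[i - 1])
            break
    # Walk right from the peak to the falling half-maximum crossing.
    for i in range(i_pk, len(ys) - 1):
        if ys[i + 1] < half <= ys[i]:
            frac = (ys[i] - half) / (ys[i] - ys[i + 1])
            right = xs[i] + frac * (xs[i + 1] - xs[i])
            break
    return right - left


def relative_error(estimate: float, reference: float) -> float:
    """Unsigned relative error of an estimate against a reference value."""
    return abs(estimate - reference) / abs(reference)


# Worst case delay change from Section IV-C: analytical 54.4 ps versus SPICE 53.5 ps.
peak_error = relative_error(54.4, 53.5)

# Hypothetical triangular DCC used only to exercise the width calculation.
width = half_max_width([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 1.0, 0.0])
```

The peak error evaluates to roughly 1.7%, consistent with the "less than 2%" figure stated in Section IV-C.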
VI. CONCLUSION
In this work, we proposed the concept of bidirectional translation between DCCs and coupled noise waveforms using closed-form expressions. The procedure enables: 1) indirect in situ measurement of crosstalk noise and 2) efficient generation of DCCs. For efficient DCC generation, we describe a fully analytical approach to generating DCCs that describe the amount of dynamic delay experienced on a net as a function of victim and aggressor arrival times. These DCCs can then be used in an STA framework to avoid the inaccuracies inherent in a switch factor-based approach. Since the number of potential aggressor/victim configurations is limitless, the efficiency of analytical models is necessary to make a relative window approach to STA feasible. Our results indicate that analytically generated DCCs typically match SPICE simulated curves within 5%-10% (some cases with small levels of noise have larger errors in the half-maximum width), validating the approach. We also documented several modeling issues that arise in generating DCCs analytically. The most important conclusions from this analysis are that exponential ramps are more accurate than linear ramps in estimating noise and that timing models are of equal importance to noise models in building DCCs. An improved general crosstalk model was described based on [8] and shown to be useful in cases with multiple aggressors and short to intermediate line lengths. Simulation and test chip measurement results match well and confirm the effectiveness of the new noise characterization methodology.
APPENDIX I ERROR ANALYSIS OF (3)
The th order expansion of the equation using the bottom expressions in both (1) and (2) is expressed as (10) where , . For , the equation becomes (11). The relative error of the first order to the second-order expansion is presented in Fig. 22. Here, the relative error is calculated as the ratio between the second-order term and the constant term. Ignoring the second-order term introduces about 10% error in the worst case with , . For typical noise of and , (3) always overestimates delay, thus giving an upper bound of the DCC peak. The second order term is small because 1) never exceeds 0.7 for , 2) the coefficient ( ) becomes nearly zero when , and 3) for realistic noise peaks of , is approximately 0.1.
Fig. 22. Error of (3) to the second order expansion.
APPENDIX II PARAMETERS DEFINITION FOR (5)
To solve for the output waveform at the end of coupled distributed RC lines, we first derive the transfer function from line input to line output in the s domain. We then transform back into the time domain to solve for the waveforms as a function of time, truncating at the first two dominant poles.
The equivalent circuit is presented in Fig. 9. In (5)-(8) of Section IV-B2, the parameters have the following definitions: (12) (13) (14) (15) and the RC parameters are defined as (16) is the equivalent on-resistance of the victim driver in the linear region of operation (found from I-V curves as the reciprocal of the slope near the origin), is the line length, and , , and are resistance, line-to-ground, and coupling capacitances per unit length (assumed to be the same for both lines in this model). and are load capacitances representing receiver gates for the aggressor and victim, respectively.
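The on-resistance extraction described above (reciprocal of the I-V slope near the origin) can be sketched as a least-squares fit through the origin. The function name and the sample data points are hypothetical and do not correspond to any device in the paper.

```python
from typing import List


def on_resistance(v_ds: List[float], i_d: List[float]) -> float:
    """Estimate driver on-resistance from I-V samples taken near the origin.

    Fits i = slope * v through the origin by least squares and returns 1 / slope.
    """
    num = sum(v * i for v, i in zip(v_ds, i_d))
    den = sum(v * v for v in v_ds)
    slope = num / den  # conductance in siemens
    return 1.0 / slope


# Hypothetical samples in the linear region: volts and amperes.
volts = [0.01, 0.02, 0.03, 0.04]
amps = [v / 500.0 for v in volts]
r_on = on_resistance(volts, amps)  # close to 500 ohms for this synthetic data
```

Only samples from the region where the curve is still linear should be passed in; points closer to saturation would flatten the fitted slope and inflate the resistance estimate.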
