A technique based on the sensitivity of the output to the input waveform is presented for accurately propagating delay information through a gate for the purpose of static timing analysis (STA).
1 Introduction
STA tools require accurate delay models for gates and interconnects. The function of a gate delay model is to take a (noisy) input waveform at the far end of the interconnect (e.g., in_u in Figure 1) and produce the waveform at the gate output (out_u). This process is known as gate delay propagation. Most STA tools model the noisy input waveform at in_u with an equivalent linear waveform, v_eff, characterized by a single reference point (the input arrival time) and a constant slope (the effective input slew). However, different waveforms with identical arrival times and slews can result in very different propagation delays through INV_x4 when applied at in_u. As crosstalk noise becomes more significant in current technologies, conveying the timing information of a signal transition with only a reference point and a constant slope increasingly degrades the accuracy of STA tools. To ensure accuracy, the actual shape of the input waveform must be considered.
2 Conventional Gate Delay Propagation Techniques
2.1 Point-Based
One technique, denoted P1, sets the input slew of v_eff to the time between the 0.1Vdd and 0.9Vdd crossing points of the noiseless waveform, as though the waveform had not been affected by noise. Another technique, denoted P2, uses the time from the earliest 0.1Vdd crossing point to the latest 0.9Vdd crossing point of the noisy waveform as the effective slew of v_eff. Both techniques place the 0.5Vdd point of v_eff at the latest 0.5Vdd crossing point of the noisy waveform. P1 and P2 may be too optimistic in some cases because of the way they calculate the slew of v_eff, and too pessimistic in other cases because of the way they calculate the arrival time.
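The two point-based rules can be sketched as follows. This is a minimal illustration, assuming a rising transition sampled at common time points `t`, a supply voltage `vdd`, and linear interpolation between samples; the helper names `crossings`, `p1`, and `p2` are hypothetical. Each function returns the reference point (the 0.5Vdd time of v_eff) and the 10%-90% slew, from which v_eff has slope 0.8·Vdd/slew.

```python
from typing import List, Tuple


def crossings(t: List[float], v: List[float], level: float) -> List[float]:
    """All times at which the sampled waveform reaches `level` (linear interpolation)."""
    times = []
    for i in range(len(t) - 1):
        d0, d1 = v[i] - level, v[i + 1] - level
        if d0 == 0.0:
            times.append(t[i])  # a sample lies exactly on the level
        elif d0 * d1 < 0.0:
            # the waveform passes through `level` between samples i and i+1
            times.append(t[i] + (t[i + 1] - t[i]) * d0 / (d0 - d1))
    if v[-1] == level:
        times.append(t[-1])
    return times


def p1(t: List[float], v_noiseless: List[float], v_noisy: List[float],
       vdd: float) -> Tuple[float, float]:
    """P1: slew taken from the noiseless waveform, reference at the latest noisy 0.5Vdd crossing."""
    slew = crossings(t, v_noiseless, 0.9 * vdd)[0] - crossings(t, v_noiseless, 0.1 * vdd)[0]
    t50 = crossings(t, v_noisy, 0.5 * vdd)[-1]
    return t50, slew


def p2(t: List[float], v_noisy: List[float], vdd: float) -> Tuple[float, float]:
    """P2: slew from the earliest 0.1Vdd to the latest 0.9Vdd crossing of the noisy waveform."""
    slew = crossings(t, v_noisy, 0.9 * vdd)[-1] - crossings(t, v_noisy, 0.1 * vdd)[0]
    t50 = crossings(t, v_noisy, 0.5 * vdd)[-1]
    return t50, slew
```

A noisy waveform that dips back below 0.5Vdd moves the shared reference point later for both rules, while a waveform that rings around 0.9Vdd lengthens only the P2 slew.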
2.2 Least Squared Error-Based
A technique denoted LSF3 finds the v_eff that minimizes the sum of the squared differences between v_eff and the noisy input waveform at the sampling points. Because it is purely a mathematical curve-matching procedure that ignores the behavior of the logic gate, LSF3 can be either pessimistic or optimistic.
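LSF3 reduces to an ordinary least-squares line fit. A minimal sketch, assuming the caller has already chosen the sampling points of the transition (the sampling window is not specified here); `lsf3` is a hypothetical name.

```python
from typing import List, Tuple


def lsf3(t: List[float], v_noisy: List[float]) -> Tuple[float, float]:
    """Fit v_eff(t) = a*t + b minimizing sum_k (v_eff(t_k) - v_noisy(t_k))**2."""
    n = len(t)
    t_mean = sum(t) / n
    v_mean = sum(v_noisy) / n
    # closed-form solution of the normal equations for a straight line
    s_tt = sum((tk - t_mean) ** 2 for tk in t)
    s_tv = sum((tk - t_mean) * (vk - v_mean) for tk, vk in zip(t, v_noisy))
    a = s_tv / s_tt
    b = v_mean - a * t_mean
    return a, b
```

Because every sample carries the same weight, a noise bump far from the switching threshold pulls the line as strongly as one near it, which is the gate-agnostic behavior noted above.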
2.3 Energy-Based
Inspired by the Elmore delay idea [2], another technique, denoted E4, passes v_eff through the latest 0.5Vdd crossing point of the noisy voltage waveform. The slope is then chosen such that the area enclosed by that line and the straight lines v1(t) = 0.5Vdd and v2(t) = Vdd equals the area enclosed by the noisy input and the lines v1 and v2. In general, the more times the noisy waveform crosses the 0.5Vdd level, the more likely this approach is to produce pessimistic estimates.
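The description above does not pin down the vertical boundary of the enclosed region, so the sketch below adopts one reading, stated as an assumption: for a rising transition, the region is bounded by the waveform, v2(t) = Vdd, and a vertical edge at the shared anchor point (the latest 0.5Vdd crossing), so its area is the integral of Vdd minus the waveform (clipped to [0.5Vdd, Vdd]) from the anchor onward. For a line of slope s through the anchor, that area is the triangle (0.5Vdd)^2/(2s) = Vdd^2/(8s), which is solved for s. The function names are hypothetical, and the integral uses the trapezoidal rule.

```python
from typing import List, Tuple


def last_crossing(t: List[float], v: List[float], level: float) -> float:
    """Latest time at which the sampled waveform reaches `level` (linear interpolation)."""
    for i in range(len(t) - 1, 0, -1):
        d0, d1 = v[i - 1] - level, v[i] - level
        if d1 == 0.0:
            return t[i]
        if d0 * d1 < 0.0:
            return t[i - 1] + (t[i] - t[i - 1]) * d0 / (d0 - d1)
    if v[0] == level:
        return t[0]
    raise ValueError("waveform never reaches the level")


def e4(t: List[float], v_noisy: List[float], vdd: float) -> Tuple[float, float]:
    """Return (t50, slope) of v_eff for a rising transition under the assumed area definition."""
    t50 = last_crossing(t, v_noisy, 0.5 * vdd)
    lo, hi = 0.5 * vdd, vdd
    # anchor point followed by the later samples, clipped to the band [v1, v2]
    pts = [(t50, lo)] + [(tk, min(max(vk, lo), hi))
                         for tk, vk in zip(t, v_noisy) if tk > t50]
    area = 0.0
    for (ta, va), (tb, vb) in zip(pts, pts[1:]):
        area += 0.5 * ((hi - va) + (hi - vb)) * (tb - ta)  # trapezoid of (Vdd - v)
    if area <= 0.0:
        raise ValueError("zero enclosed area: waveform jumps straight to Vdd")
    # a line through the anchor encloses a triangle of area vdd**2 / (8 * slope)
    return t50, vdd ** 2 / (8.0 * area)
```

For a clean ramp the sketch recovers the ramp's own slope; a late re-crossing of 0.5Vdd moves the anchor later, which is one way the pessimism mentioned above can arise.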
2.4 Weighted Least Squared Error-Based
Recently, a technique that we denote WLS5 was proposed in [1]; it multiplies each squared term in the LSF3 minimization by a weight factor. This factor is defined as the derivative of the output with respect to the input signal for the noiseless input.
WLS5-Step 1: Finding the derivative of the output with respect to the noiseless input. For each logic cell, the derivative ρ(t) of the output with respect to the noiseless input waveform, v_noiseless, is calculated as:
$$\rho(t) = \begin{cases} \dfrac{dv_{out}(t)}{dv_{noiseless}(t)}, & t_1 \le t \le t_2 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
where t_1 and t_2 bound the noiseless critical region and are set to the time instances at which the noiseless input crosses the 0.1Vdd and 0.9Vdd levels, respectively.
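WLS5-Step 2: Finding v_eff. WLS5 finds v_eff(t) = a·t + b, with the coefficients a and b chosen such that Equation 2 is minimized (in this paper, P denotes the number of sampling points, and ρ(t_k) is the derivative from Step 1 evaluated at sampling time t_k):
$$\sum_{k=1}^{P} \rho(t_k)\,\big(v_{noisy}(t_k) - (a\,t_k + b)\big)^2 \qquad (2)$$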
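A minimal sketch of WLS5, assuming all waveforms share the same time samples, that the noiseless output `v_out` was obtained beforehand by simulating the cell with `v_noiseless` (not shown), and that dv_out/dv_noiseless is estimated with central finite differences. Taking the magnitude of the derivative, so that weights stay nonnegative for inverting cells, is our assumption. For a monotone noiseless input, the critical region t_1 ≤ t ≤ t_2 is the set of samples where 0.1Vdd ≤ v_noiseless ≤ 0.9Vdd; outside it the weight is zero, which is the filtering behavior of the noiseless critical region. The function names are hypothetical.

```python
from typing import List, Tuple


def wls5_weights(v_noiseless: List[float], v_out: List[float], vdd: float) -> List[float]:
    """rho_k ~ |dv_out/dv_noiseless| inside the critical region, 0 elsewhere."""
    rho = [0.0] * len(v_noiseless)
    for k in range(1, len(v_noiseless) - 1):
        if not (0.1 * vdd <= v_noiseless[k] <= 0.9 * vdd):
            continue  # outside [t1, t2]: noise at this sample is ignored
        d_in = v_noiseless[k + 1] - v_noiseless[k - 1]
        d_out = v_out[k + 1] - v_out[k - 1]
        if d_in != 0.0:
            rho[k] = abs(d_out / d_in)  # central difference of output w.r.t. input
    return rho


def wls5_fit(t: List[float], v_noisy: List[float], rho: List[float]) -> Tuple[float, float]:
    """Minimize sum_k rho_k * (v_noisy(t_k) - (a*t_k + b))**2 in closed form."""
    sw = sum(rho)
    t_mean = sum(w * tk for w, tk in zip(rho, t)) / sw
    v_mean = sum(w * vk for w, vk in zip(rho, v_noisy)) / sw
    s_tt = sum(w * (tk - t_mean) ** 2 for w, tk in zip(rho, t))
    s_tv = sum(w * (tk - t_mean) * (vk - v_mean) for w, tk, vk in zip(rho, t, v_noisy))
    a = s_tv / s_tt
    return a, v_mean - a * t_mean
```

For example, a large distortion placed after the noiseless input has passed 0.9Vdd receives zero weight and leaves the fitted line unchanged.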
The noiseless critical region acts as a filter in WLS5: any noise distortion that occurs outside this region is ignored. Our experiments confirm that restricting noise consideration to this range makes WLS5 inaccurate. Moreover, the more aggressors there are, the more likely WLS5 is to underestimate the arrival time and/or slew at the gate output by a large amount.
Another weakness of WLS5 is that it is meaningful only as long as the noiseless input and output waveforms overlap; otherwise, the derivative of the output with respect to the input is undefined. Therefore, WLS5 cannot be applied to gates with large intrinsic delay, such as multi-stage gates, or to gates with large fanout loads, where the input and output transitions may not overlap.
3 SGDP: Sensitivity-Based Gate Delay Propagation
This section describes the main steps of the proposed SGDP.
3.1 Calculation Steps
SGDP-Step 1: Finding the derivative of the output with respect to the noiseless input. This step is the same as in WLS5. To address the weakness of WLS5 for gates whose input and output voltage transitions do not overlap, SGDP adds pre- and post-processing steps as follows:
SGDP-Additional step (for non-overlapping input and output waveforms only): SGDP shifts the output back in time by the amount needed to make the 0.5Vdd crossings of the input and output voltage waveforms coincide. It then performs SGDP-Steps 1, 2, and 3. Finally, it shifts the equivalent input line forward in time by the same amount.
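The additional alignment step can be sketched as a wrapper around the remaining SGDP steps. In this sketch the inner steps (SGDP-Steps 1 to 3) are passed in as a hypothetical callable `steps` that returns the coefficients (a, b) of v_eff(t) = a·t + b; only the shift-in and shift-out logic follows the text. Waveforms are assumed to share common time samples with linear interpolation, and the shift amount, called `delta` here, is taken as the distance between the latest 0.5Vdd crossings of the output and the input.

```python
import bisect
from typing import Callable, List, Tuple

Steps = Callable[[List[float], List[float], List[float], List[float]], Tuple[float, float]]


def last_crossing(t: List[float], v: List[float], level: float) -> float:
    """Latest time at which the sampled waveform reaches `level` (linear interpolation)."""
    for i in range(len(t) - 1, 0, -1):
        d0, d1 = v[i - 1] - level, v[i] - level
        if d1 == 0.0:
            return t[i]
        if d0 * d1 < 0.0:
            return t[i - 1] + (t[i] - t[i - 1]) * d0 / (d0 - d1)
    if v[0] == level:
        return t[0]
    raise ValueError("waveform never reaches the level")


def interp(x: float, xs: List[float], ys: List[float]) -> float:
    """Piecewise-linear interpolation of (xs, ys) at x, held constant beyond the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


def sgdp_aligned(t: List[float], v_noiseless: List[float], v_out: List[float],
                 v_noisy: List[float], vdd: float, steps: Steps) -> Tuple[float, float, float]:
    """Align output to input at 0.5Vdd, run the inner steps, then shift v_eff forward."""
    delta = last_crossing(t, v_out, 0.5 * vdd) - last_crossing(t, v_noiseless, 0.5 * vdd)
    # shift the output back in time by delta: sample v_out at t + delta
    v_out_shifted = [interp(tk + delta, t, v_out) for tk in t]
    a, b = steps(t, v_noiseless, v_out_shifted, v_noisy)
    # shift the equivalent input line forward by delta: a*(t - delta) + b
    return a, b - a * delta, delta
```

With this alignment the inner steps always see overlapping transitions, so the derivative of the output with respect to the input is defined even for gates with a large intrinsic delay.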
SGDP-Step 2: Estimation
) 3 ( } )) ( ) ( ( ) ( ) ( 2 1 ) ) ( ) ( ( ) ( { 2 1 0 b t a t v t v t b t a t v t k k noisy in k noisy in k eff P k k k noisy in k eff
4 Experimental Results
Table 1 shows the gate delay errors of all the techniques discussed in this paper, including SGDP, relative to Hspice. The gate delay was calculated as the difference between the 0.5Vdd crossing points of the input and output waveforms.
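The delay metric can be written as a small helper. A sketch, assuming sampled waveforms with linear interpolation and, as our own choice not stated in the text, using the latest 0.5Vdd crossing when a noisy waveform crosses more than once; the function names are hypothetical.

```python
from typing import List


def last_crossing(t: List[float], v: List[float], level: float) -> float:
    """Latest time at which the sampled waveform reaches `level` (linear interpolation)."""
    for i in range(len(t) - 1, 0, -1):
        d0, d1 = v[i - 1] - level, v[i] - level
        if d1 == 0.0:
            return t[i]
        if d0 * d1 < 0.0:
            return t[i - 1] + (t[i] - t[i - 1]) * d0 / (d0 - d1)
    if v[0] == level:
        return t[0]
    raise ValueError("waveform never reaches the level")


def gate_delay(t: List[float], v_in: List[float], v_out: List[float], vdd: float) -> float:
    """Delay = output 0.5Vdd crossing time minus input 0.5Vdd crossing time."""
    return last_crossing(t, v_out, 0.5 * vdd) - last_crossing(t, v_in, 0.5 * vdd)
```

The error reported for a technique is then the delay obtained from the output driven by its v_eff minus the delay obtained from the Hspice output driven by the actual noisy input.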
4.1 Accuracy Comparison
Configuration I is the one depicted in Figure 1, with a total coupling capacitance of 100fF. We used standard inverter cells from an industrial TSMC 0.13µm cell library in our experiments. The aggressor and victim lines were each 1000µm long, and their inputs, in_x and in_y, were given a slew of 150ps. Configuration II includes two aggressors, x1 and x2, each with 100fF of total coupling capacitance. These aggressors and the victim line, y, were each 500µm long and were modeled similarly to the interconnects in Figure 1. For each configuration, 200 noise-injection timing cases spanning a 1ns range were analyzed. Results are reported in Table 1. SGDP is more accurate than all of the existing techniques. For instance, for configuration II, SGDP reduces the average (maximum) delay error by 2.6ps (4.8ps) compared to WLS5, the most accurate of the conventional techniques; that is, SGDP reduces the average (maximum) delay error by 15% (10%) relative to WLS5.
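As a consistency check on these figures, and assuming the percentages are taken relative to WLS5's own errors for configuration II (Table 1 itself is not reproduced in this section), the stated reductions imply WLS5 errors of roughly:

$$\varepsilon^{WLS5}_{avg} \approx \frac{2.6\,\text{ps}}{0.15} \approx 17\,\text{ps}, \qquad \varepsilon^{WLS5}_{max} \approx \frac{4.8\,\text{ps}}{0.10} = 48\,\text{ps}$$

so the corresponding SGDP errors would be about 17 − 2.6 ≈ 15ps (average) and 48 − 4.8 ≈ 43ps (maximum). Because the percentages are likely rounded, these values are approximate.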
4.2 Run-Time Comparison
Although the worst-case computational complexity of all the techniques, including SGDP, can be shown to be linear in P, we observed different run times in practice. On average, P1, P2, LSF3, and E4 take about 40µs, and WLS5 takes about 60µs, to propagate delay through a typical logic gate on a Sun Blade 1000 machine. In contrast, SGDP (with P = 35) takes about 65µs. The SGDP run time can be reduced by using a smaller P; however, a smaller P tends to lower timing-analysis accuracy.
5 Conclusions
This paper presented a technique to efficiently propagate gate delay information for noisy waveforms for the purpose of STA. Without any additional library characterization, it uses the sensitivity of the output to the noisy input waveform to model the impact of the waveform's shape. Experiments demonstrate that this technique is more accurate than the existing techniques evaluated (P1, P2, LSF3, E4, and WLS5).
