Abstract
Introduction
Delay test has been investigated and used for many years to detect small manufacturing defects that do not cause functional failures but reduce the speed of circuits [1] . This has become a critical concern due to increased clock frequencies and reduced timing margins. At-speed delay testing can significantly increase delay fault coverage [2] , and it is a requirement for IC technologies below 180 nm [3] . Further, various delay fault models combined with at-speed scan test approaches [2] [4] have been used to detect the smallest delay faults in a circuit.
As technology scales into the deep submicron (DSM) regime, designs become more and more sensitive to various noise sources, such as leakage noise, crosstalk and power supply noise [5] . Excessive noise can significantly affect the performance of DSM designs and cause problems such as signal integrity [6] or additional delay [7] .
Power supply noise refers to the noise on the supply and ground network, which reduces device voltage levels and increases signal delay [5] [7] . As operating frequency and gate density increase, simultaneous switching activity per unit area increases, which increases power density [8] . Meanwhile, DSM CMOS technologies require the use of reduced supply voltages [9] . Industry data shows that until recently, the power density of high-end microprocessors was increasing by approximately 80% per technology generation, with the voltage scaling by 0.8 [8] . This has led to higher current density, and consequently increased power supply noise. Furthermore, as technology continues to scale and supply voltage level decreases, the gate delay is increasingly sensitive to supply voltage variations [8] [10] [11] . Experiments on ISCAS85 benchmark circuits using 180 nm technology show 10% critical path delay variations due to gate switching activity [11] . In 130 nm CMOS technology, if supply and ground voltages are allowed to vary by 10%, one can expect a 30% delay variation for typical gates [10] . For 90 nm 0.9 V technology, a 1% change in power supply voltage causes approximately a 4% change in delay for most static CMOS gates [8] . These trends lead to a larger power noise impact on delay.
A number of methods have been proposed for power supply noise analysis [6] [8] [10] [12]. To accurately predict noise, a comprehensive RLC network is needed to model package components, on-chip power interconnect and circuit elements [13]. However, for DSM designs, simulation of such a complicated network is infeasible, due to the long simulation time [10]. Different methods have been proposed to address this issue. Jiang et al. [7] proposed a vector-less approach using genetic algorithms. Liou et al. [14] further proposed a noise analysis method based on a statistical timing analysis framework. This work focused on worst-case voltage drop on the supply network, identifying power grid design issues.
There is also prior work on delay testing considering power supply noise. Tirumurti et al. [8] proposed a fault modeling method that added supply noise to a generalized fault model [15] . Pant et al. [10] also proposed a vectorless analysis approach for computing the maximum path delay under voltage fluctuations. Krstic et al. [16] , using a vector-based approach, focused on generating the maximum power supply noise on one path at a time. However, the resulting maximum noise estimation was too pessimistic compared with the mission-mode noise level. The motivation of this research is quite different from previous work. In industry, random fill of don't care bits is usually applied to test patterns to increase fortuitous detection of non-target defects. Such fill is performed automatically by common test compression approaches.
Paper 17.3
INTERNATIONAL TEST CONFERENCE 2
Unfortunately, this random fill can produce excessive supply noise and result in overkill [3]. Even worse, test compaction itself can generate excessive switching activity [17]. To reduce this overkill, we need a vector-based approach to analyze supply-noise-induced delay variation for each delay test pattern applied to a design. This approach must be fast enough for use in test pattern compaction.
The research in this paper is an improvement on our prior work [18] [19]. In the prior work, we first used a path delay fault ATPG tool [20] to generate delay tests, and then applied our vector-dependent, layout-aware approach for power supply noise and delay analysis to each test pattern. A novel model consisting of both off-chip and on-chip elements was proposed for fast vector-based power noise analysis, which avoided complicated and costly power network analysis. A delay model considering both spatial and temporal voltage variation [18] was applied to calculate path delay under supply noise. Model verification was performed on ISCAS89 benchmarks using Cadence Spectre circuit simulation.
One drawback of our prior work was that the model could only be applied to array-bond chips. A region, on which the power model was based, was simply defined as the area centered on each power pad. For wire-bond designs, where the power pads sit only on the chip periphery, this definition does not work. Therefore, in this work, we propose an improved model along with a region partitioning algorithm, so that our analysis approach can be applied to wire-bond chips as well as array-bond chips. The experiments for this work have been performed on a 130 nm wire-bond Philips design, and the model results have been compared with silicon measurements on an Agilent 93000 test system.
This paper is organized as follows. Section 2 provides background on power supply noise and our approach for supply noise analysis. Section 3 introduces the delay model implemented in our tool and includes discussions about propagation delay under voltage variation. Section 4 introduces the circuit under test we used for our delay measurements and simulations. Section 5 compares our simulation results with silicon measurements. Section 6 gives conclusions and directions for future work.
Power Noise Model
As discussed in Section 1, there has been much prior work [6] 
Power Supply Noise
As discussed in Section 1, power supply noise is the noise on the supply and ground network, which reduces the actual device voltage levels and increases signal arrival time at the primary outputs and next state lines [5] [7] .
Power supply noise consists of both DC and sinusoidal content. The DC noise comes from IR drop due to wire resistance and the average static current demand of the chip [5] . In the case of DC noise, a DC network is built and solved to obtain the average IR-drop at each location [3] . The sinusoidal noise is our major concern. It comes from the RLC response of both chip and package due to switching current demands that peak at the beginning of the clock cycle [5] . Simulation-based techniques are usually used for dynamic supply noise analysis.
In traditional dynamic analysis for power supply noise, only the on-chip IR drop has been addressed, so most analysis tools model the on-chip power grid as an RC network. However, as we move into deep submicron design with higher frequency and circuit density, the LdI/dt noise becomes a significant concern. A comprehensive package and on-chip power grid model was introduced in [13].
Power Noise Model
Much previous work [21] [22] [23] has been published on transient power grid analysis. However, RLC or RC network analysis is much too expensive for use in analyzing supply noise during delay test. For scan-based delay testing, we propose a solution for noise analysis that avoids heavy computation.
The basic idea is as follows. At the beginning of the launch clock cycle, when most switching activity occurs, the power pads are unable to provide current immediately to satisfy the switching current demand. This is because off-chip inductance prevents the supply current from rising appreciably before most transitions are propagated. Therefore, most charge demanded by the switching devices comes from close-by, on-chip sources, such as parasitic capacitance and decoupling capacitance. In this way, the noise problem is localized. The switching charge is ultimately provided by off-chip sources, but it has little impact on propagation delay if most transitions complete before the off-chip current rises appreciably.
Effective Region Concept
In this work, the circuit is extracted as a large RC network, neglecting on-chip inductance as it is small compared to the package inductance. Assume a current impulse occurs somewhere in the network. Capacitors around this impulse will begin to discharge in order from nearby to far away, and result in localized voltage drop. However, if a capacitor is far enough away, it is possible that it will not discharge within the clock cycle. Such capacitors should be considered irrelevant to the noise analysis. Consequently, an effective region for a switching device is defined as the area whose RC time constant is less than or equal to the clock cycle time. Put another way, a capacitor only provides current to devices whose effective regions cover that capacitor. The RC time constant T of a region follows from the integration over the region area of the supply network resistance times the circuit capacitance [24].
Computing Effective Regions
An algorithm has been developed to define effective regions for all devices on the chip. This algorithm only needs to be applied once for each circuit. Its flowchart is shown in Fig. 1 .
Fig. 1. Flowchart of the algorithm to find effective regions for all devices.
We first divide the whole chip area into m × n small squares, each containing a limited number of capacitors and switching devices. These grids will then be assigned to effective regions, which will determine the effective region for all devices and capacitances within the grid. The grid size is chosen such that the effective region can be accurately determined. In practice, the grid size can be quite large as long as its RC time constant is small compared with the clock cycle time. To determine the region associated with each grid, we start with the grid itself, and then increase the radius by one grid width each time to expand the region until the RC time constant equals or exceeds the clock cycle time. Some grids are only partially covered, but they are still considered part of the region as long as over 50% of the grid area is covered. We repeat this analysis for all grids, so that each has an effective region.
The complexity of the effective region algorithm is O(grid_count^2). As discussed above, grids must have RC time constants small compared to the clock cycle time to achieve good accuracy. In our experiments, we have found that we can achieve this accuracy by setting grid_count to the square root of cell count, so that the complexity of the algorithm is O(cell_count).
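The region-growing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the per-grid resistance and capacitance arrays `r` and `c` are hypothetical inputs, the RC time constant is approximated by a discrete sum of r·c over the grids in the region, and the "over 50% of the grid area is covered" rule is approximated by testing whether a grid's center lies within the current radius.

```python
import math

def effective_region(r, c, row, col, t_clk):
    """Return the list of (i, j) grids forming the effective region of
    grid (row, col).

    r[i][j], c[i][j]: hypothetical per-grid supply resistance (ohm) and
    capacitance (F). t_clk: clock cycle time (s).
    Assumption: the region RC time constant is the sum of r*c over the
    region, a discrete stand-in for the area integral in the text.
    """
    m, n = len(r), len(r[0])
    radius = 0
    while True:
        # Assumption: a grid counts as covered when its center lies within
        # the radius (a proxy for the >50% area-coverage rule).
        region = [
            (i, j)
            for i in range(m)
            for j in range(n)
            if math.hypot(i - row, j - col) <= radius
        ]
        tau = sum(r[i][j] * c[i][j] for i, j in region)
        # Stop when the time constant reaches the clock cycle time, or
        # when the region already spans the whole chip.
        if tau >= t_clk or len(region) == m * n:
            return region
        radius += 1  # expand by one grid width


def all_effective_regions(r, c, t_clk):
    """Compute the effective region of every grid (done once per circuit)."""
    return {
        (i, j): effective_region(r, c, i, j, t_clk)
        for i in range(len(r))
        for j in range(len(r[0]))
    }
```

When the time constant of the whole chip is below the clock cycle time, every grid ends up with the full chip as its region, so the design effectively has a single effective region.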
Noise Model
We make several approximations before proposing our noise model. First, the voltage level (and power supply noise) is uniform within each grid. Therefore, the voltage level for all cells in the grid is identical. This approximation is reasonable, since the spatial voltage variation within a small area is small, due to embedded capacitance.
Second, in response to a switching impulse, all capacitors in the effective region are assumed to be equally effective, despite their varying distance to the switching device. Therefore, the total switching charge in the grid is evenly provided by all capacitors in the effective region. For each grid in the region, the percentage of total charge it needs to provide for the center grid depends on the ratio of its capacitance to that of the whole region. Further, parasitic capacitance is approximated as constant, since experiments show that the pattern-to-pattern variation of parasitic capacitance is small. This approximation makes the effective regions independent of test patterns.
A third approximation is that there is no current coming from off-chip sources. As we discussed in Section 2.2, the power supply cannot respond immediately to the impulsive switching current demand, due to high package inductance and the long idling time during the scan cycle prior to the launch cycle. Most switching activity occurs approximately in the first half of the clock cycle. For example, for the chip in [4], the average path length is 3 ns, while the longest path is 7 ns. Therefore, the off-chip current is considered insignificant compared to on-chip current demand when most transitions are propagated, and it is neglected in this work. We will consider a more accurate model for off-chip current in future work.
Our simplified noise model within a grid is illustrated in Fig. 2. As we have discussed, each grid contains two kinds of components: capacitors and switching devices. A grid provides current by discharging its capacitance for any switching devices whose effective region covers this grid. In the meantime, it absorbs current from all capacitors in its effective region. In Fig. 2, C_d is the distributed decoupling capacitance in a grid, and C_p is the total parasitic capacitance of devices and interconnect connected to the power supply network in the current clock cycle. All switching cells that draw current from the supply within this region during the clock cycle are modeled as time-varying current sources, which will be discussed in Section 2.3.
The maximum voltage drop for a particular grid during a clock cycle is:
where Q_i is the total switching charge of grid i, whose effective region covers the current grid, and α_i is the ratio of the decoupling capacitance of the current grid to the whole effective region of grid i.
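To make the definitions of the switching charge and the capacitance ratio concrete, the following sketch computes a voltage drop for one grid. It is an illustration under an explicit assumption, not the paper's stated formula: the charge the current grid supplies to grid i is taken as the ratio for grid i times the charge of grid i, and the drop is assumed to be the total supplied charge divided by the grid's own capacitance (decoupling plus parasitic). The function and parameter names are hypothetical.

```python
def max_voltage_drop(contributions, c_decap, c_parasitic):
    """Estimate the maximum voltage drop (V) of one grid in a clock cycle.

    contributions: list of (q_i, alpha_i) pairs, one per grid i whose
        effective region covers the current grid; q_i is the switching
        charge of grid i (C) and alpha_i the capacitance ratio from the text.
    c_decap, c_parasitic: decoupling and parasitic capacitance of the
        current grid (F).
    Assumption: drop = (sum of alpha_i * q_i) / (c_decap + c_parasitic).
    """
    total_capacitance = c_decap + c_parasitic
    if total_capacitance <= 0:
        raise ValueError("grid capacitance must be positive")
    supplied_charge = sum(q * alpha for q, alpha in contributions)
    return supplied_charge / total_capacitance


# Example with made-up numbers: two grids draw charge from this grid.
drop = max_voltage_drop([(2e-12, 0.5), (1e-12, 0.25)], 10e-12, 2.5e-12)
```

With these illustrative values the grid supplies 1.25 pC from 12.5 pF of capacitance, giving a drop of about 0.1 V.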
Circuit Switching Model
Q_i needs to be calculated for each switching device in order to compute ΔV_max. Switching current drawn from the supply network in CMOS circuits consists mainly of two parts, the short circuit current and the charging/discharging current on the output capacitive load. Due to slew rate design constraints, the dynamic charging/discharging current is usually more significant than short circuit current. Each will be explained below.
Dynamic Charging Current
It is easy to estimate the dynamic charging/discharging current for CMOS devices. Tirumurti et al. [8] used simulation to create a table of peak power and ground currents for different values of cell output load and input slope. This approach incorporates both short-circuit and charging current. We adopt a similar approach to calculate the charge due to dynamic charging current. As shown in Fig. 3, the switching current waveform is approximated as triangular in library cell characterization. The triangular approximation was also used by Chen [6]. A table is built by simulation for each cell, such that one can determine its peak current and output transition time for different values of output load and input slope. Once we get the peak current and transition time from the table, the total charge demanded by a transition can be calculated as:
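$$Q = \tfrac{1}{2}\, I_{peak}\,\left(t_{end} - t_{begin}\right)$$
where I_peak, t_end and t_begin are computed from simulation, and the factor 1/2 follows from the area of the triangular current waveform.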
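As a small worked example of the triangular approximation, the charge of a transition is the area under a triangle of height equal to the peak current and base equal to the transition interval. The sketch below assumes the peak current and the begin and end times have already been read from the characterization table; the function names and the numeric values are illustrative only.

```python
def triangular_charge(i_peak, t_begin, t_end):
    """Charge (C) under a triangular current pulse.

    i_peak: peak current (A); t_begin, t_end: start and end of the
    current pulse (s). Area of a triangle = 1/2 * base * height.
    """
    if t_end < t_begin:
        raise ValueError("t_end must not precede t_begin")
    return 0.5 * i_peak * (t_end - t_begin)


def grid_switching_charge(transitions):
    """Sum the charge of all transitions in one grid.

    transitions: iterable of (i_peak, t_begin, t_end) tuples, assumed to
    come from the per-cell look-up tables.
    """
    return sum(triangular_charge(i, tb, te) for i, tb, te in transitions)


# Illustrative values: a 1 mA peak over 200 ps carries 0.1 pC.
q = triangular_charge(1e-3, 0.0, 200e-12)
```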
Short Circuit Current
Short circuit current refers to the current flowing from the power supply to ground during transitions in a static CMOS gate [25] . As with the dynamic charging current, the short circuit current is dependent on the input rise/fall time, the load capacitance and gate design; and its waveform is modeled as triangular. Theoretically speaking, short circuit current becomes dominant if the load capacitance is small. A table method can be used to compute short circuit current. However, in this work, we use analytical functions to calculate short circuit charge for all cells.
The short circuit current model used here is based on previous work by Sylvester et al. [26] [27]. By making various assumptions and approximations, the peak current is substituted with a certain fraction of the saturation current, and the time of short circuit current is approximated as:
where T_short is the flow time, R_d and R_w are the device and wiring resistance, and C_j, C_in and C_w are the junction, input and wiring capacitance, respectively. Assuming a triangular waveform for the short circuit current, we can calculate the short circuit charge using both T_short and peak current.
Power Supply Noise Analysis Procedure
Fig. 4 is the flow chart of the entire noise analysis procedure for one test pattern. We first load the circuit netlist and layout to locate devices and extract parasitic capacitance. Each grid is then associated with an effective region as discussed in Section 2. 

Delay Model
Power-noise-aware timing analysis consists of two consecutive steps: computing the on-chip voltage levels and computing the propagation delay on target paths under noise impact. The first step was discussed in section 2. In this section, we focus on propagation delay computation with noise.
Several delay definitions must first be given. Cell delay is measured as the time interval between the input crossing approximately V_DD1/2 and the output crossing approximately V_DD2/2, where V_DD1 and V_DD2 are the input and output voltage ranges of the cell. For both input and output, the exact measurement point is the 40% point for rising transitions and the 60% point for falling transitions. The transition time is specified in the 10% to 90% interval of full swing. Some prior work suggests the 30% to 70% interval is more accurate [28].
Several models have been proposed for delay functions. Bai et al. proposed the following delay equation [29] :
where the coefficients can be obtained from simulation. The coefficients here strongly depend on the input transition time and output capacitance. Bai et al. also suggested linear functions of supply voltage with appropriate coefficients if the voltage drop is not too large [29] .
Another widely used delay modeling approach is to model both delay and transition time as a function of input slope, output capacitive load and device voltage level. This model either neglects the possible difference between voltage levels of drivers and receivers, or uses some equalization method [30] .
In our work, we first model both delay and transition time as a function of input slope and output capacitive load. A look-up
The supply voltage varies during a clock cycle due to supply noise. The voltage level during the logic transition can be regarded as constant, if the time constant of the noise waveform is much larger than the transition time [30]. However, it is difficult to know the actual voltage level on a device during its transition, unless we know the real noise waveform and the real time of the transition. In this work we assume that the voltage level drops linearly with time, and the worst case voltage drop occurs when all switching activity finishes. This assumption is based on an approximation that most paths are of similar length, so that switching density is uniform until the time that most switching activity finishes. At that point, there are only a few long paths still propagating, as in [4]. We further take the voltage at the nominal switching time of the device, since we do not know the actual switching time. Spatial voltage variation is not taken into account in this work, so we simply assume the device input voltage level is the same as the device voltage. That is, we do not consider the delay effects of different driver and receiver supply voltages.
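The linear voltage-drop assumption can be illustrated with a short sketch. It assumes the supply starts at its nominal value at the launch edge and falls linearly to the worst-case level at the time switching activity finishes; holding the voltage at the worst-case level after that time is an additional assumption made here for illustration. All names and numbers are hypothetical.

```python
def supply_voltage_at(t_switch, v_nominal, dv_max, t_finish):
    """Supply voltage (V) seen by a device at its nominal switching time.

    t_switch: nominal switching time of the device after the launch edge (s).
    v_nominal: nominal supply voltage (V).
    dv_max: worst-case voltage drop of the device's grid (V).
    t_finish: time at which switching activity finishes (s).
    Assumption: linear drop from v_nominal at t=0 to v_nominal - dv_max at
    t_finish, held constant afterwards.
    """
    if t_finish <= 0:
        raise ValueError("t_finish must be positive")
    fraction = min(max(t_switch / t_finish, 0.0), 1.0)
    return v_nominal - dv_max * fraction


# Illustrative values: 1.2 V nominal, 0.12 V worst-case drop at 7 ns.
v_mid = supply_voltage_at(3.5e-9, 1.2, 0.12, 7e-9)   # halfway through the drop
v_late = supply_voltage_at(10e-9, 1.2, 0.12, 7e-9)   # after activity finishes
```

Under this sketch, a gate switching halfway to the finish time sees half of the worst-case drop, so gates late on a long path are penalized more than gates that switch early.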
Circuit under Test
The circuit under test (CUT) is a DSP-like core of a test chip in a 160-pin quad flat pack (QFP). The core is fabricated in a 130 nm technology with a standard cell library. No dynamic logic is used and the circuit contains more than 1 million transistors. The nominal supply voltage is 1.2 V. The same device was used in a study of fine delay fault detection by Kruseman et al. [4] .
To show the requirements for the detection of small delay-faults, the "delay map" method was introduced [4] . In this method, the delays of all transitions in a pattern set are determined. Fig. 5 shows an example of these measurements for two ICs. The data stems from a single launch-on-capture gate-delay pattern and shows the transition time along the glitch free paths. The delay is measured with a step size of 25 ps on an Agilent 93000 test system.
The data lies on a straight line at an angle close to 45°. Global process variations typically affect all paths and exhibit themselves as a deviation in this angle. The tightness of the distribution (i.e., the deviation from the straight line) reflects measurement inaccuracies and local process variations. Typically, the standard deviation of this distribution for two measurements on the same IC is 15 ps and this represents the measurement inaccuracy. This variation is in good agreement with the expected error of 2 times the 10 ps edge placement resolution of the test system. Measurements between two different ICs for the complete pattern set typically have a distribution with σ=70 ps. The distribution for a single path, as shown in Fig. 5, is typically narrower (in this case σ=28 ps).
Instead of comparing measurement vs. measurement, one can also compare measurement vs. digital simulation (see Fig. 6). The experimental and simulation data show a strong correlation but do not have a distribution as tight as in Fig. 5. By itself this is not surprising, since it is well known that digital simulations will deviate from actual measurements. These deviations are caused by several reasons, such as deviations between actual and modeled process parameters, incorrectly modeled voltage drops and the general modeling limitations of digital simulations. Hence, it is difficult to get perfect agreement, certainly if the only information is a single fail frequency for an IC. In this case, however, we do have delay information for a large set of paths, for which all measurements are performed in the same clock cycle. Hence we do not expect an absolute match but we hope to see a relative match. This should translate into a straight line with a narrow distribution and an angle that will deviate from 45°, since the assumed and actual process parameters may be different. In Fig. 6, 'path1' and 'path2' have a large deviation. According to the measurements, the difference in delay between the paths is less than 0.3 ns (measured on different chip samples) while according to simulation they have a systematic difference of more than 0.6 ns. If this were not the case, we would not have the straight line in Fig. 6. This mismatch justifies a more thorough investigation to quantify the effects influencing the delay.
The pattern shown in Fig. 6 was selected because the difference between 'path1' and 'path2' is one of the largest we encountered in the complete set of 1312 delay test patterns. Furthermore, these paths are both in the same corner of the chip. This makes them the most interesting subjects for a power supply noise analysis, since they are affected by global variations, but potentially also by local variations in power supply noise. One explanation for the mismatch between the two paths could be that the local V DD is different for each path due to differences in local switching activity. Even if a mismatch between analog and digital simulations is the more likely explanation, it is important to exclude other sources, such as local supply voltage variation. Therefore power supply noise analysis was used to investigate these two paths and determine the impact of noise on their delay. Three other paths with mismatches were also considered. We did not have the resources to consider additional paths.
Experimental Results
A power supply noise analysis tool has been developed in C++ and run on a 2.3 GHz Pentium 4 system. The experiments were performed on the industrial design introduced in Section 4. This design has only one effective region since the RC time constant of the whole chip area is less than the clock cycle time. The launch-on-capture path delay patterns generated by the ATPG tool leave all the unassigned pattern bits as "don't care". The paths are all statically sensitized such that all side inputs of the path are restricted to be static non-controlling. This ensures transitions propagate on the target path in our experiments. The filling strategy we adopt in this work is to randomly set these "don't care" bits to 1 with a specified probability.
We vary this filling rate and generate sets of patterns. Each pattern set targets one path. For each test set, we expect to have different noise impact on path delay due to different switching activity produced by different don't care bit assignments. Fig. 7 plots the silicon delay measurements vs. 1-filling rate for a test set targeting the longest path among the five paths considered in the experiments. The discretization in the delays in this and other figures is due to the step size of 25 ps employed in the measurements. The 1-filling rate is defined as the probability of setting a "don't care" bit to 1 during pattern filling. In the experiments, each pattern is filled ten times at each filling rate, from 0% to 100% in 10% steps, for a total of 92 filled patterns per unfilled pattern. The result shows that the delay variation between high noise and low noise is as much as 0.85 ns, which is 15% of the path delay, assuming the smallest delay value measured is the nominal delay. The delay variation is not as large on the shorter paths, but is still significant. Fig. 7 also shows that for the target path, a higher 1-filling rate generally produces a longer delay. In Fig. 7 , the maximum average delay appears at 90% 1-filling rate. Test sets generated for other paths also show the same trend. This phenomenon is due to the particular characteristics of the circuit function and design. Test patterns with a higher 1-filling rate are more likely to generate more switching activity. Note that we use launch-on-capture patterns, so 100% 1-filling can still result in any activity level between 0 and 100%. In contrast, 100% 1-filling of launch-on-shift patterns would produce an activity close to 0%. One circuit characteristic that causes this skewed behavior for 1-filling is a heavy usage of AND/NAND gates in the first stages of the paths. If either or both inputs change state, a transition occurs. 
This is in contrast to 0-filling, in which both inputs need to change state to create a transition.
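The filling strategy can be sketched as follows. This is a minimal illustration, assuming a pattern is represented as a string over '0', '1' and 'X' (don't care); the representation and function name are hypothetical and not taken from the ATPG tool.

```python
import random

def fill_dont_cares(pattern, one_rate, rng):
    """Replace each 'X' with '1' with probability one_rate, else '0'.

    pattern: string of '0', '1' and 'X' (assumed representation).
    one_rate: the 1-filling rate, between 0.0 and 1.0.
    rng: a random.Random instance, passed in for reproducibility.
    """
    return "".join(
        ("1" if rng.random() < one_rate else "0") if bit == "X" else bit
        for bit in pattern
    )


rng = random.Random(0)
# Sweep the 1-filling rate from 0% to 100% in 10% steps.
filled = {
    rate: fill_dont_cares("1X0XX1X", rate / 100.0, rng)
    for rate in range(0, 101, 10)
}
```

At rates of 0% and 100% the fill is deterministic, so repeated fills at those two rates give identical patterns. One reading consistent with the stated total of 92 filled patterns is ten distinct fills at each of the nine intermediate rates plus one each at 0% and 100%, though the text does not say so explicitly.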
A simple metric to investigate the activity is to count the number of signal transitions. Fig. 8
Although Fig. 7 and Fig. 8 confirm the dependence between activity and delay, they do not quantify it. For this we use our timing analysis tool, which takes into account the power supply noise. Fig. 9 shows the delay measured by the tester versus the delay from our timing analysis tool for the same test set as in Fig. 7 and Fig. 8. The correlation is 0.83 with an intercept that is non-zero. A zero intercept was not expected since, as explained in Section 4, there can be a variety of errors between simulation and measurement. In this research we are not interested in an absolute agreement, only a relative one. The offset is especially clear in Fig. 10, which shows the measurements for each pattern, simulated nominal delay, and simulated noise-induced delay increase. Both the simulated nominal delay value and the simulated delay increase vs. the 1-filling rate are lower than the measurements. This is due to shortcomings in the delay model characterization. The correlation, however, shows that extra delay measured on the tester can be well explained by the impact of power supply noise.
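A correlation and intercept of the kind reported for measured versus simulated delay can be computed as sketched below. This assumes the reported correlation is the Pearson coefficient and that the intercept comes from an ordinary least-squares line, neither of which the text states explicitly; the data values are made up for illustration.

```python
import math

def correlation_and_fit(simulated, measured):
    """Return (pearson_r, slope, intercept) for measured vs. simulated delay.

    simulated, measured: equal-length sequences of delays (same units).
    The fit is measured = slope * simulated + intercept (least squares).
    """
    n = len(simulated)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(simulated) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in simulated)
    syy = sum((y - mean_y) ** 2 for y in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(simulated, measured))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Made-up delays (ns): a perfect linear relation with a non-zero offset.
r, slope, intercept = correlation_and_fit([1.0, 2.0, 3.0, 4.0],
                                          [2.5, 4.5, 6.5, 8.5])
```

A high correlation together with a non-zero intercept corresponds to the relative, rather than absolute, agreement discussed above.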
Experimental results for test sets generated for other paths also show a correlation similar to that of the longest path. One exception is a relatively short path. For this path, we only observe an increase in delay for 1-filling rates of 70% and more. The delay continues to increase for higher fill rates. One explanation is that the propagation on the short path ends before many transitions on average length paths occur. As a consequence, the voltage drop impact on delay for this path is smaller than for the one shown in Fig. 9. Not surprisingly, this path did not show good correlation between delay and noise. Nevertheless, the similar behavior for most of the paths indicates that global effects for this circuit are in general more important than local effects. Hence, power supply noise effects cannot explain the differences between path1 and path2 observed in Fig. 6. But what is also clear from the present experiments is that power supply noise does have a significant impact on path delay.
Conclusion and Future Work
We have developed a methodology to analyze pattern-dependent power supply noise level and the noise impact on delay during delay test. We have proposed low cost models for fast analysis. Unlike our previous research, this new methodology can be applied to both array-bond and wire-bond chips. Experiments using simulation and tester measurements have been performed on an industrial design. We found that the 1-filling rate caused a significant variation in delay, and that this delay variation could be explained by the power supply noise variation induced by the 1-filling rate. We found that a 10% worst-case voltage drop can cause a 15% delay increase. The impact on longer
