Abstract-In modern technologies, process variations can be quite substantial, often causing design timing failures. It is essential that those errors be correctly and quickly diagnosed. Unfortunately, the resolution of the existing delay-fault diagnostic methodologies is still unsatisfactory. In this paper, the feasibility of using the circuit timing information to guide the delay-fault diagnosis is investigated. A novel and efficient diagnostic approach based on the delay window propagation (DWP) is proposed to achieve significantly better diagnostic results than those of an existing delay-fault diagnostic commercial tool. Besides locating the source of the timing errors, for each identified candidate the proposed method determines the most probable delay defect size. The experimental results indicate that the new method diagnoses timing faults with very good resolution.
I. INTRODUCTION

Due to process parameter variations, a circuit may fail to operate at the desired clock speed. A critical task for a failure analyzer is to locate quickly and accurately the cause of timing failures. The quality of a failure analyzer is measured by the resolution, which is defined as a ratio of the number of true faults to the total number of reported candidates. Unfortunately, the existing delay-fault diagnostic methodologies suffer from very poor resolution. Industrial data suggest that on average it may take about 240 h to locate an open via defect by screening under the microscope the failure candidates reported by the diagnostic tool. If the diagnostic tool reports too many candidates, the time-to-market requirement is hard to satisfy.
Delay-fault diagnosis has been studied in the past. Most of the existing works, however, explicitly or implicitly assume that the delay-fault-model-based simulation results reflect exactly the delay defect behavior in real silicon [4]-[7]. This assumption is generally not true. It is well known that different paths in a circuit may have different delays and slacks. A real delay defect can manifest itself only on those sensitized paths whose slacks are smaller than the delay size of the defect. Other delay defects cannot be observed. The matching mechanism used in some of the existing techniques may produce wrong or inconclusive results due to mismatches between the delay-fault simulation results and the real defect behavior. The algorithms based on path tracing alleviate this problem [6]. They back-trace the sensitized paths from each failing primary output (PO) or scan cell. They work on the assumption that the failure has been caused by a single fault, anticipating that the real faulty site should be located in the intersection of the fanin cones of the observed failing POs or scan cells. In practice, however, this method still produces too many candidates.
A more recent work advocates using statistical timing information to guide the delay defect diagnosis [8], [9], which produces good diagnostic results. In this method, it is assumed that the probability density functions (pdfs) of each internal cell/interconnect are known. In reality, however, the accurate pdf information may not be easily available.
In [3], the authors describe an approach based on static timing information targeting the multiple delay-fault diagnosis. For each fault candidate, they try to use a robustly tested path and observe a fault-free situation to determine the upper and lower bounds for a suspect delay fault. The experimental results [3] show a much improved diagnostic resolution compared to that of nontiming-based approaches. However, the resolution is still unsatisfactory for time-to-market requirements.
This paper proposes a delay defect diagnosis method based on the timing information extracted from the layout. A simulation-based approach is applied to diagnose the timing failure responses. For a given design, the circuit timing information is obtained from the standard delay format (SDF) file that contains the interconnect delay and gate pin-to-pin rising/falling delay. The proposed novel and efficient technique based on the delay window propagation (DWP) diagnoses the failure responses caused by the delay defects. Experimental results show that the method not only reports quite accurately the fault candidates (high resolution) but also suggests delay defect sizes of the reported candidates.
The rest of the paper is organized as follows. In Section II, the preliminary concepts are introduced and the delay-fault diagnosis algorithm used in the existing commercial tool is explained briefly. In Section III, the proposed diagnostic algorithm is described. In Section IV, practical issues and the feasibility of the method are discussed. In Section V, the authors extend the approach to multiple delay faults. In Section VI, the resolution and the performance of the algorithm for the International Symposium on Circuits and Systems (ISCAS) circuits are reported. Section VII concludes the paper.
II. PRELIMINARY
It is assumed that the faults present in the circuit are modeled as transition faults [1] (slow to rise, slow to fall, slow). Slow transition faults are at those sites that experience long delays each time a transition occurs. Due to its simplicity, this model has been widely used in the majority of delay-fault testing and diagnosis works [11].
Since the authors will be referring to a commercial delay-fault diagnostic tool, for the sake of completeness the algorithm is summarized here. The authors call it Alg_no_timing. Alg_no_timing is also based on the single-transition fault model.
It can be said that the fault simulation result covers the failure response when the faulty effect can propagate to all the observed failure outputs. It is possible that the fault simulation may cause more POs to fail.
A. Alg_no_timing
1) Initialize the fault candidate list using the path-tracing technique. The initial fault candidates satisfy the following requirements, which reduce the search space and improve the diagnostic efficiency:
a) the fault must reside in the input cone of a failing PO of the given pattern;
b) if a failing pattern affects more than one PO, the candidate fault must reside in the intersection of all the input cones of the failing POs (based on the assumption of single delay fault per pattern).
2) Simulate each transition fault on the initial candidate list to see if the simulation results cover the observed failure responses. If they do, store the fault as a candidate and assign to it a weight equal to the number of patterns it explains in the current list. The fault simulation results are said to cover the observed failure responses if each observed failing pattern is also a simulated failing pattern for the same POs. An example of a cover relationship is illustrated in Fig. 1.
3) After explaining the entire failing pattern list, or when all faults in the initial list have been examined, the algorithm terminates and reports the possible candidate sites. Candidate faults are sorted by their weights. The fault with the greatest weight is reported first.
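The cover relationship and the weighting in step 2 can be sketched as follows. This is only an illustration under stated assumptions, not the commercial tool's implementation: responses are assumed to be given as a mapping from a pattern name to the set of failing POs, the cover check is applied per pattern, and the names `covers_pattern` and `rank_candidates` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed data layout: pattern name -> set of failing POs.
Response = Dict[str, Set[str]]


def covers_pattern(simulated: Response, observed: Response, pattern: str) -> bool:
    """A pattern is explained if every observed failing PO also fails in simulation.

    The simulation may fail additional POs; that is still a cover.
    """
    return observed[pattern] <= simulated.get(pattern, set())


def rank_candidates(sim_by_fault: Dict[str, Response],
                    observed: Response) -> List[Tuple[str, int]]:
    """Weight each fault by the number of failing patterns it explains."""
    weights = {
        fault: sum(covers_pattern(sim, observed, p) for p in observed)
        for fault, sim in sim_by_fault.items()
    }
    # Assumption: faults that explain no pattern are dropped.
    kept = [(f, w) for f, w in weights.items() if w > 0]
    # Greatest weight is reported first.
    return sorted(kept, key=lambda fw: -fw[1])


observed = {"p1": {"PO1"}, "p2": {"PO1", "PO2"}}
sim_by_fault = {
    "f1": {"p1": {"PO1", "PO3"}, "p2": {"PO1", "PO2"}},
    "f2": {"p1": {"PO1"}, "p2": {"PO1"}},
    "f3": {},
}
ranking = rank_candidates(sim_by_fault, observed)
```

In this made-up example, `f1` explains both failing patterns even though its simulation fails the extra output `PO3`, `f2` explains only `p1`, and `f3` is discarded.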
Because of the cover relationship between the observed and the simulated results, and based on the single-fault assumption, it can be concluded that if indeed a real single-delay fault has caused the timing failure, then that fault must be on the reported candidate list. The candidates reported by Alg_no_timing form a super set of real faulty sites. The diagnostic results on real designs may produce a huge number of candidates. The purpose of this paper is to improve the diagnostic resolution and also to prune unlikely candidates. When multiple delay faults exist in a circuit, a test pattern may activate several faults, thus inducing multiple-fault behaviors.
Each failing pattern p in the given diagnostic test set T is classified into one of the following two types.
Type-1 failing pattern: p can activate only one fault and observe its effect. Other faults cannot be activated nor their faulty effects be observed.
Type-2 failing pattern: p can activate multiple faults and observe their faulty effects through different POs.
III. OUR APPROACH
This paper investigated the feasibility and efficiency of using the delay and timing information for delay-fault diagnosis.
The circuit delay and timing information can be obtained from the SDF files (refer to IEEE DASC P1497 SDF Standard [13] for more details about the file format). The SDF file contains the internal gate timing/delay and the interconnect delay. For each pin-to-pin and interconnect, SDF provides the rising/falling transition delay if the transition occurs.
A. Timing-Based Simulator
A framework that utilizes the delay information to emulate the real chip and tester timing behavior during at-speed testing has been developed. This framework allows evaluation of the proposed diagnostic algorithm. The implementation details are not included here. Instead, a simple example is given to show how the real circuit timing/delay for given test sets can be emulated. In this example, a pair of delay values was used to represent the pin-to-pin rising and falling delays. All the examples discussed in this paper are based on this simplification. However, in the framework, these values are replaced by pairs of delay ranges that span from the fastest rising (falling) delay to the slowest rising (falling) delay.
Consider the example in Fig. 2 . The numbers shown beside each gate's inputs are the corresponding rising/falling delays from the input to the output of that gate. For example, if there is a transition on the output of the gate g 2 caused by the signal A rising transition, the delay from A to g 2 output E is 5. Here, the authors ignore all the interconnect delays to simplify the explanation. For different patterns (structural or functional tests), the circuit delays and long(est) paths may vary.
For the pattern P 1 : {ABCD} : {r 1 r 1}(r = rising), there are two rising transitions on E and F . Because the logic value "1" is controlling for the OR gate g 1 , the circuit delay will be decided by the earliest transition, which occurs on E. So even though there is a longer delay on F , its effect cannot be propagated further. The longest path delay in the good circuit for this pattern is 5 + 7 = 12. The path that contributes to the circuit delay is from A to Z.
For the pattern P 2 : {ABCD} : { f 1 f f}( f = falling), there are two falling transitions on E and F . Because 0 is a controlling value for the AND gates g 2 and g 3 , the circuit delay will be decided by the latest transition, which occurs on F . Note that F 's delay is defined by C and not by D, whose delay is longer. The longest path delay in a good circuit mode for this pattern is 6.2 + 7.7 = 13.9. The path that determines the circuit delay is from C to Z.
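The way a gate's output arrival time is chosen in the two patterns above can be sketched as below. This is a simplified illustration, assuming noninverting AND/OR gates whose switching inputs all move in the same direction; the function name `output_arrival` is hypothetical, and the values marked as assumed are invented because Fig. 2 is not reproduced here.

```python
from typing import List, Tuple

# (arrival time at the gate input, pin-to-pin delay from that input to the output)
InputEvent = Tuple[float, float]


def output_arrival(gate: str, transition: str, inputs: List[InputEvent]) -> float:
    """Arrival time at the output of an AND/OR gate.

    gate: "AND" or "OR"; transition: "R" (rising) or "F" (falling).
    """
    candidates = [arrival + delay for arrival, delay in inputs]
    # A transition to the controlling value (0 for AND, 1 for OR) is decided by
    # the earliest input; a transition to the noncontrolling value by the latest.
    to_controlling = (gate == "AND" and transition == "F") or (
        gate == "OR" and transition == "R"
    )
    return min(candidates) if to_controlling else max(candidates)


# P1: rising transitions on E and F at the OR gate g1.
# E arrives at 5 with a delay of 7 to Z (from the text); the F values are assumed.
z_p1 = output_arrival("OR", "R", [(5.0, 7.0), (6.0, 7.5)])

# P2: falling transitions on E and F at g1.
# F arrives at 6.2 with a delay of 7.7 to Z (from the text); the E values are assumed.
z_p2 = output_arrival("OR", "F", [(5.5, 7.2), (6.2, 7.7)])
```

With these inputs the sketch reproduces the path delays quoted in the text, 12 for P1 and 13.9 for P2.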
B. DWP
This section describes the novel simulation technique, the DWP. The purpose of this technique is to determine, for a given fault candidate, its capability of explaining the failing and passing pattern responses.
In this paper, the authors define a delay window (DW) of a fault candidate. It is a delay range [a, b] whose lower bound is the smallest possible delay-fault size and the upper bound is the biggest possible delay-fault size.
A conventional interval arithmetic is defined on the DWs.
The timing model will now be illustrated with an example. Fig. 3(a) shows the information available from the timing simulation. It was assumed that there is a rising transition (R) at line A, the signal arrival time at A is d A , and the pin-to-pin rising delay from A to Z is d A−Z . DW A is the DW (or delay range) of the target fault at A. It corresponds to the delay that could be introduced by a target delay fault at A. This information can be transformed into the form presented in Fig. 3(b), where t AZ stands for the arrival time at Z coming from A. This arrival time is computed without considering the delay-fault effect. t AZ is equal to the arrival time at A plus the pin-to-pin rising delay from A to Z. The authors put t AZ beside the signal propagation path (from A to Z). The delay-fault-affected signal arrival time coming from A to Z is denoted by D AZ , which considers the delay fault DW. For the case shown in the figure, B is a constant 1 and the DW of the target fault at Z is still [a, b]. Next, it will be shown how to update the DW of a fault.
For a given pattern P and a fault candidate f under evaluation, the authors begin the timing-fault simulation from the faulty location f by assigning an initial DW [0, T ]. The lower bound of this window is 0, which means that the minimum delay for f is 0. The initial upper bound for f is the operating clock period T . For each candidate, those windows will be propagated (and possibly shrunk) along the sensitized paths.
The examples in Fig. 4 show how the DWs are updated as they propagate through an AND gate. For other gate types (GTs) and combinations of rising and falling signals, the rules are similar. If the f 's faulty effect propagates to a signal line, the corresponding DW can also propagate to this line. The symbol R (F ) stands for the rising(falling) transition on the signal line.
The example in Fig. 4(a) shows how the lower bound of a DW is updated. Suppose the faulty effect propagates to A, the DW on A is [a, b], and the current simulating pattern produces rising transitions on both A and B. To propagate the delay effect from A to Z, the arrival time on A and B must satisfy the condition t 1 + a > t 2 , where t 1 (t 2 ) is the arrival time propagated from input A (B) calculated using the model described in Fig. 3. The lower bound of the DW of the fault at Z must therefore be max[a, (t 2 − t 1 )]. A numerical example shows how to update the lower bound of the DW. Suppose the arrival time propagating from A without considering the delay-fault effect is 5, the arrival time propagating from B is 8, and the delay fault can propagate to A with delay range [1, 10]. To further propagate this delay effect to Z, the minimum delay of the candidate fault must be larger than max[1, (8 − 5)] = 3. The DW lower bound is changed to 3 as illustrated in Fig. 5. Suppose that in Fig. 5 [a, b] is changed to [5, 10] (from [1, 10]). The DW lower bound at Z is max[5, (8 − 5)] = 5 (not 3). It is the same as the DW's lower bound at A. The DW lower bound at Z is now 5 because the fault must have a minimum delay value of 5 to propagate its faulty effect to the input A of gate g before the DW at Z is updated further. Although for the gate g the candidate's delay effect with minimum size (8 − 5) = 3 can propagate from A to Z, the authors need to consider all the constraints along the sensitizable paths.
For different GTs, the formulas for computing the DW lower bound are listed below. The authors list here only four GTs. An (N)XOR gate can be decomposed into AND/OR gates, and the analysis is the same as for those GTs. Table I lists the formulas for updating the lower bound of the candidate DW. It was assumed that each gate has up to two inputs. This analysis can be simply extended to gates with more than two inputs. The first column shows the GT. In the second column, "On_in" stands for the on-input the faulty effect can propagate to.
t 1 is the arrival time without considering the delay effects at the gate's output when the output transition comes from the on-input. In the third column, "Off_in" stands for the input other than the on-input. t 2 is the arrival time at the gate's output when the output transition comes from the off-input. The fourth column shows the input transition type (I_Tran), and R(F ) stands for both inputs having rising (falling) transitions. If the off-input has no transition (has a constant value v), DW will propagate to the gate's output without changing if v is a noncontrolling value for this gate, or the propagation will terminate if v is a controlling value. If inputs have different transition types, the propagation terminates because the gate's output has no definite transition. However, in reality, due to the timing of switching inputs, a glitch or hazard could happen. At present, the proposed approach here cannot handle this situation. This problem is addressed in the subsequent section where the algorithm's limitations are discussed. In the last column, the updated DW's lower bound is shown. "Stop" stands for the termination of the DW's propagation. For example, if both inputs of an AND gate have falling transitions, no delay effect was observed at the gate's output.
If the DW's lower bound a is bigger than the upper bound b, simulation is aborted because there is no valid delay size for the given fault candidate. Such a candidate is not valid. During the propagation process, the authors always check if DW's lower bound does not exceed its upper bound.
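The lower-bound update for an AND gate with rising transitions on both inputs, together with the validity check, can be sketched as follows. This is a minimal illustration of the rule implied by the numerical example (new lower bound = max of the old lower bound and t 2 − t 1 ); the function name `propagate_and_rising` is hypothetical, and other gate types and transition combinations are not covered.

```python
from typing import Optional, Tuple

DW = Tuple[float, float]  # delay window [lower bound, upper bound]


def propagate_and_rising(dw: DW, t1: float, t2: float) -> Optional[DW]:
    """Propagate a DW from the on-input of an AND gate to its output.

    t1: fault-free arrival time at the output via the on-input.
    t2: arrival time at the output via the off-input.
    Both inputs are assumed to have rising transitions.
    Returns None when no valid delay size remains (lower bound > upper bound).
    """
    lower, upper = dw
    # The fault must delay the on-input path past the off-input path to be seen.
    new_lower = max(lower, t2 - t1)
    if new_lower > upper:
        return None  # simulation for this candidate is aborted
    return (new_lower, upper)


first = propagate_and_rising((1, 10), t1=5, t2=8)
second = propagate_and_rising((5, 10), t1=5, t2=8)
invalid = propagate_and_rising((1, 2), t1=5, t2=8)
```

The first two calls correspond to the [1, 10] and [5, 10] cases discussed above; the third uses an assumed window [1, 2] that is too small to satisfy the constraint.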
In Fig. 4 (b), DWs propagate to each fanout branch from the stem. 
Next, DW Z is determined. Two cases need to be considered separately. The DWP is performed by applying all test patterns for the given fault candidate. If the DW can propagate to any primary output under a given pattern P , a tuple with four components { f, P, PO i , DW i } is used to record this information. It records that the candidate f 's DW (DW i ) can propagate to the primary output PO i under the pattern P .
When a DW propagates to a PO at which a failure response is observed, the lower bound of DW i is updated. It should be bigger than T − D PO , where T is the operating clock's period and D PO is the lower bound of the delay value observed at PO.
So far, the authors have discussed only how the lower bound of a DW is updated. The upper bound of DW is updated using the passing point information.
Suppose that for a given pattern P we cannot observe a delay failure on the output PO i at the capture time. If, however, we recorded a tuple at PO i under P , the upper bound of DW i must be smaller than T − D PO i , where T is the operating clock period and D PO i is the delay value at PO i calculated from the fault-free circuit simulation.
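The two updates at an observation point can be sketched together. This is a simplified illustration: it assumes a single path-delay value per PO, treats the DW as a closed interval (the text uses strict inequalities), and uses the hypothetical name `tighten_at_po`.

```python
from typing import Optional, Tuple

DW = Tuple[float, float]


def tighten_at_po(dw: DW, clock_period: float, path_delay: float,
                  failed: bool) -> Optional[DW]:
    """Tighten a DW that has reached a PO using the pass/fail observation.

    failed=True : the fault must be at least T - D_PO, so the lower bound rises.
    failed=False: the fault must be at most T - D_PO, so the upper bound drops.
    Returns None if the window becomes empty.
    """
    lower, upper = dw
    slack = clock_period - path_delay
    if failed:
        lower = max(lower, slack)
    else:
        upper = min(upper, slack)
    return (lower, upper) if lower <= upper else None


# Assumed numbers: clock period 15, path delay 12 at the PO, initial DW [0, T].
failing = tighten_at_po((0, 15), clock_period=15, path_delay=12, failed=True)
passing = tighten_at_po((0, 15), clock_period=15, path_delay=12, failed=False)
conflict = tighten_at_po((0, 2), clock_period=15, path_delay=12, failed=True)
```

A failing observation with slack 3 leaves only delay sizes of at least 3, a passing one leaves only sizes up to 3, and a window that cannot reach the slack is rejected.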
It will be explained in Section III-D how to extract useful information from those tuples and further prune the unlikely candidates.
C. The Diagnostic Algorithm
A novel algorithm to diagnose the delay defects using the timing information and delay simulation is proposed. The algorithm is based on the assumption that each failing pattern attributes its response to a single fault location. However, the algorithm is capable of identifying multiple fault locations as long as each failing pattern is affected by one fault only. We make another assumption that if two candidates have the same explanation capability for a set of failing and passing patterns, and we have the delay size information for each candidate, the smaller delay size candidate has a higher probability of being the real defect. The experimental results here and the results in [8] support this assumption.
The algorithm, referred to as Alg_timing, proceeds as follows.
1) Back-tracing diagnosis: The set of initial fault candidates F initial is determined by using the algorithm Alg_no_timing.
2) DWP and pruning: For each candidate in F initial , the authors apply the DWP technique, simulate the patterns, and record the tuples. They prune the unlikely candidates by checking the rules (which will be described in detail in Section III-D). The candidate set after pruning is denoted as F after_DWP .
3) Refinement: If a group of candidates in F after_DWP has the same explanation capability for a set of failing and passing patterns, the tuple information obtained from the observation points is used to deduce the most likely delay size of each candidate in this group. Then the candidates are ranked by their delay sizes. Candidates with smaller sizes are reported earlier.
If multiple delay faults exist in the circuit, it is not always true that the real cause of failure is included in the initial candidate list. This may happen if two delay faults affect two POs on the same pattern and their sensitizable paths are disjoint. It is possible that both faults' simulation results do not cover the observed failure and both faults are deleted from the candidate list. Section V discusses this multiple fault situation in more detail and proposes a solution.
D. Pruning Rules
Here, the rules for pruning the impossible candidates are introduced. The rules are derived based on the DW relations at the observation points.
Rule 1) For two tuples { f, P, PO 1 , DW 1 } and { f, P, PO 2 , DW 2 }, if at-speed failure responses are observed on two primary outputs PO 1 and PO 2 , but DW 1 ∩ DW 2 = ∅, then f is not a candidate. In this case, the DWs have a conflict, as shown in Fig. 6. This rule eliminates those cases when there is no consistent delay value for f that would propagate its effect to both failing POs.
Rule 2) Calculate slacks along the paths propagating the failure f to PO 1 and PO 2 . If the timing failure is only observed on PO 1 but not on PO 2 , and the slack s 1 to PO 1 is bigger than s 2 , the slack to PO 2 , then f is not the candidate. Fig. 7 illustrates this situation.
Rule 3) For a tuple { f, P, PO 1 , DW 1 }, if a failure on PO 1 is observed, but the summation of the upper bound of DW 1 and the path delay value calculated on PO 1 is less than the clock period T , f is not the candidate. Fig. 8 shows this situation where t1 is the fault-free delay value calculated as described in Section III-A.
Rule 4) If failure on PO 1 is not observed, but the summation of the lower bound of DW 1 and the delay value calculated on PO 1 is greater than the clock period T , f is not the candidate for P (Fig. 9).
For each candidate and every failing/passing pattern, after applying the above checking rules, many unlikely candidates are pruned. In the implementation here, instead of deleting a candidate that fails the rules, a penalty is assigned for each failed rule. After simulating all the fault candidates, they are ranked based on their penalty scores. The better-matched candidates have a higher rank. A simple penalty function is used: a ten-point credit is given for each matched PO and a five-point penalty for each failed checking rule. The score of each candidate is the sum of its credits and penalties. Different penalty functions may affect the order of reported candidates.
The experiments suggest that this simple penalty function works well and assigns higher ranks to the real injected faults.
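Rules 1, 3, and 4 and the penalty function can be sketched for the tuples of one candidate under one pattern. This is an illustration under assumptions rather than the authors' implementation: the `Observation` record and all function names are hypothetical, a "matched PO" is taken to mean a recorded tuple at an observed failing PO, Rule 1 is counted once per conflicting pair, and Rule 2 is omitted because it needs per-path slack data.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass
class Observation:
    po: str
    dw: Tuple[float, float]  # DW recorded at this PO
    path_delay: float        # fault-free delay calculated at this PO
    failed: bool             # True if an at-speed failure was observed


def rule1_violated(x: Observation, y: Observation) -> bool:
    """Two failing POs whose DWs do not intersect."""
    if not (x.failed and y.failed):
        return False
    return max(x.dw[0], y.dw[0]) > min(x.dw[1], y.dw[1])


def rule3_violated(o: Observation, period: float) -> bool:
    """Failure observed, but even the largest delay cannot exceed the period."""
    return o.failed and o.dw[1] + o.path_delay < period


def rule4_violated(o: Observation, period: float) -> bool:
    """No failure observed, but even the smallest delay exceeds the period."""
    return (not o.failed) and o.dw[0] + o.path_delay > period


def score(observations: List[Observation], period: float) -> int:
    credits = 10 * sum(1 for o in observations if o.failed)
    failures = sum(rule1_violated(x, y) for x, y in combinations(observations, 2))
    failures += sum(
        rule3_violated(o, period) or rule4_violated(o, period) for o in observations
    )
    return credits - 5 * failures


# Made-up tuples for one candidate and one pattern, clock period 15.
obs = [
    Observation("PO1", (3, 10), 12, True),
    Observation("PO2", (11, 15), 4, True),
    Observation("PO3", (5, 15), 12, False),
]
total = score(obs, period=15)
```

Here the two failing POs earn 20 points of credit, while the empty intersection of their DWs (Rule 1) and the passing PO3 (Rule 4) cost 5 points each.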
E. Refinement
In the last step, after rule checking and applying the penalty function, many unlikely candidates are removed. In the refinement step, all the failing patterns that can be explained by the fault candidates are considered. Ranking of most higher-ranked candidates will remain unchanged. For each of those candidate faults with the same score obtained in the previous step, the tuples are combined for different observation points and different patterns, and then the DW distribution of each candidate is determined. For each fault, DWs are collected and the delay distribution graph wave(t) is constructed. wave(t) is built such that each delay t is assigned a value equal to the number of DWs that cover t. For each fault, the delay value/range that has the highest weight is the most probable delay size for this fault. The waveforms of the candidates are measured by their integrals from 0 to t, which yield the total area covered by wave(t): Area(t) = ∫₀ᵗ wave(x) dx.
The percentage function is defined as percentage(t) = Area(t)/Area(T ). For each candidate, three points (t lb , t mid , and t ub ) are determined by setting the percentage(t) to be 0.3, 0.5, and 0.7 and solving the percentage function. An example explaining this step is in Fig. 10. Fig. 10(a) is the DW information for a given fault. Fig. 10(b) is the delay distribution graph wave(t). Fig. 10(c) shows the waveform for Area(t) and the values of t ub , t mid , and t lb .
A fault candidate that has a clustered distribution graph is more meaningful than a candidate that has a sparse distribution graph. The clustered distribution demonstrates that the candidate's delay size is within a small range as long as it can be activated. This can be quantified by the density function density( f ) = [Area(t ub ) − Area(t lb )]/(t ub − t lb ). A higher density indicates that the candidate's delay value is more densely clustered in a smaller delay range. In the example in Fig. 11 , when all waveforms cover the same area, the candidate that has the delay distribution (a) is better than the candidate that has the delay distribution (b).
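The refinement quantities can be computed directly from a candidate's collected DWs. The sketch below is one possible realization, assuming closed windows inside [0, T] and solving percentage(t) = q by bisection; the function names and the example windows are made up for illustration.

```python
from typing import List, Tuple

DW = Tuple[float, float]


def wave(dws: List[DW], t: float) -> int:
    """Number of DWs that cover the delay value t."""
    return sum(1 for a, b in dws if a <= t <= b)


def area(dws: List[DW], t: float) -> float:
    """Integral of wave from 0 to t: each DW contributes its overlap with [0, t]."""
    return sum(max(0.0, min(b, t) - max(a, 0.0)) for a, b in dws)


def solve_percentage(dws: List[DW], period: float, q: float,
                     tol: float = 1e-9) -> float:
    """Find t with Area(t) / Area(T) = q by bisection (Area is nondecreasing)."""
    target = q * area(dws, period)
    lo, hi = 0.0, period
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area(dws, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def density(dws: List[DW], period: float) -> float:
    t_lb = solve_percentage(dws, period, 0.3)
    t_ub = solve_percentage(dws, period, 0.7)
    if t_ub == t_lb:  # degenerate case, e.g. no area at all
        return 0.0
    return (area(dws, t_ub) - area(dws, t_lb)) / (t_ub - t_lb)


# Made-up windows for one candidate, clock period T = 10.
dws = [(2.0, 6.0), (4.0, 8.0)]
t_mid = solve_percentage(dws, 10.0, 0.5)
d = density(dws, 10.0)
```

For these two overlapping windows the total area is 8, the midpoint lies at t = 5 where both windows overlap, and the density between t lb and t ub is 2.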
After obtaining all the distribution graphs for all the faults, it may happen that two faults have about the same density. Based on the assumption stated in Section III-C, the fault that has a smaller t mid is a better candidate than the fault that has a larger t mid . In Fig. 11, if the three candidates have the same capability to explain all the failing and passing pattern responses (and their waveforms cover the same area), we can use the density function and t mid to rank them. In this example, the ranks of three candidates are (c) > (a) > (b).
The complete diagnosis flow is shown in Fig. 12 .
IV. PRACTICAL ISSUES
A. Storing the Timing Information
Since the timing information may require a huge space, storing it efficiently is an issue. It is common to find in a circuit the networks of gates that logically correspond to a large AND/OR gate. Such structures are referred to as supergates (SGs) and can be efficiently detected [10] . The authors propose using SGs to reduce the storage requirements. The delay information can be stored for SGs instead of individual gates.
The example in Fig. 13 will be used to explain how to store the delay information using SGs. Suppose the timing information (rising/falling, interconnect) for each gate and interconnect in Fig. 13(a) is obtained. Because the extracted SG is fanout-free, the delay value of any path from X i to B (either rising or falling transition on that path) can be lumped to the boundary of the SG. All that needs to be stored are the lumped delays and the inversion status on the boundary of the SG. When the delay effect is propagated, the SGs speed up the simulation. The authors do not need to perform traditional gate-by-gate evaluation. Instead, they perform SG-by-SG simulation. If the fault candidate is inside an SG, a traditional gate-by-gate simulation is used for all the gates in this SG. Then the simulation proceeds using SGs.
The experimental data on ISCAS circuits show that the SG circuit transformation yields a factor of 5-7 storage space reduction.
B. Propagating DWs
To achieve higher performance and memory efficiency, the parallel pattern single fault propagation (PPSFP) technique is used to simulate 32 patterns at a time. The event-driven simulation technique is applied to improve memory efficiency. Nontiming and timing simulations are performed. For the nontiming simulation, the traditional transition fault simulation and PPSFP technique are used to achieve good performance. For the timing-based simulation that is pattern dependent, a single pattern single fault propagation (SPSFP) mechanism is used. The authors have found experimentally that a bottleneck of the proposed technique is in this SPSFP step. However, the number of failing patterns available for diagnosis (normally fewer than 100) is rather small compared to the total test volume. Also, the number of simulated fault candidates obtained from the Alg_no_timing is normally less than 100. For those reasons, the simulation effort is still reasonable for diagnostic purposes.
C. Algorithm Complexity and Limitations
The proposed algorithm is a simulation-based approach. The input of the algorithm is the initial fault candidate list reported from Alg_no_timing and the failing pattern list. A simulation for each failing pattern and each initial fault candidate was performed. The time complexity is O(m · n · N ), where m is the number of total failing patterns, n is the number of initial fault candidates reported by Alg_no_timing, and N is the circuit size. In practice, m and n are less than 100. Also, because the event-driven technique is embedded into the diagnosis technique, which normally simulates only a small portion of a circuit, the proposed algorithm is still efficient in terms of run time.
However, a few limitations of the method have to be pointed out. The first limitation is caused by the assumed fault model. The authors have assumed a transition fault model for diagnosis. However, the circuit timing failure may be caused by the accumulated delay defects instead of a gross delay. The method assumes a single gross delay site as the cause of failure. This assumption may cause the authors not to find all the accumulated defect sites. Another limitation is that the authors do not consider glitches. They perform the DWP based on the final stable signal values. In reality, due to different arrival times at the gate inputs, race conditions may occur that in turn may cause a static hazard. Fig. 14 shows a simple example illustrating this situation. The algorithm does not detect or propagate glitches that may cause errors. In practice, if a glitch is small, it will be suppressed after a few propagation stages due to the noise margin of the downstream logic. The authors may want to avoid patterns causing the opposite transitions at the same gate. However, in the worst case, due to process variations or design tool limitations, glitches may potentially arise.
V. MULTIPLE DELAY-FAULT DIAGNOSIS
In practice, delay defects may occur at multiple locations. Fig. 15 shows examples of two delay faults at different locations on propagation paths. In each case, these two faults are activated and their faulty effects are propagated and observed on one or more POs. In Fig. 15(a), two faults' delay-effect propagation paths merge at a gate and the faulty behavior is observed on the same PO(s). In this example, the fault that introduces a bigger delay at the merging gate's output is actually observed. The other fault's delay effect is masked, unless it can propagate through other sensitizable paths. In Fig. 15(b), the fault ( f 2 ) is located on another fault ( f 1 )'s delay-effect propagation path. In this case, the effects of f 1 and f 2 accumulate. In Fig. 15(c), two faults have disjoint delay-effect propagation paths. Their faulty effects can be observed through different POs.
When multiple faults' delay-effect propagation paths have the relationships as those shown in Fig. 15(a) or (b) , each of the faults can be identified. For the case shown in Fig. 15(a) , the location on the path that has a smaller slack (either from f 1 to PO or from f 2 to PO) is reported as a higher-rank candidate based on the previous assumption that a smaller delay size candidate has a higher rank. For the relationship shown in Fig. 15(b) , the diagnosis algorithm reports all the locations along the faulty sensitizable path if no more sensitizable path information exists.
In reality, multiple-delay-fault sensitizable paths can form a complex combination of the three simple cases shown in Fig. 15 . Difficulties arise when multiple faults' delay-effect propagation paths contain the relationship shown in Fig. 15(c) . In such a case, different faults may be observed from different groups of POs for the same pattern.
In practice, diagnostic algorithms apply the assumption of using as few faulty locations to explain as many failing responses as possible. In the experiments here, the authors found that even though multiple delay faults exist in a circuit, many failing patterns can activate and observe faulty effects of only one of them. Such failing patterns fall into the type-1 failing pattern category.
The diagnostic algorithm proposed in Section III-C can be altered to diagnose failure responses caused by multiple delay-fault sites.
For multiple-fault diagnosis, the initial candidate fault must reside in at least one of the fanin cones of the failing POs of the given pattern. By making this conservative assumption, the possible candidates for the multiple fault behavior will not be missed. However, this increases the number of initial fault candidates.
Some changes in the diagnostic steps are necessary. First, the diagnosis must be performed based on the single-fault-per-pattern assumption, and the candidates that could explain some failing patterns must be found. Even though multiple faults exist in the circuit, if the failure responses observed from some patterns can be explained by the fault behavior of one of those multiple faults, this fault is included in the candidate list. Those explained patterns are type-1 patterns according to the pattern classification in Section II. By using type-1 patterns, many fault locations can be identified correctly and easily. If most of the failing patterns can be explained by a single fault, the incremental approach can be used to diagnose the unexplained failing patterns. The highest-ranked candidate is injected into the circuit with the delay size equal to its DW lower bound obtained from other failing patterns. Then the diagnosis is performed again on the timing-modified circuit using the same algorithm. When this is done, some of the faults that are only activated by type-2 failing patterns can be successfully identified. Those injected faults that can be detected by the type-2 patterns should contain the relation described in Fig. 15(c) in their delay-effect sensitization paths. Otherwise, there is no difficulty in identifying those fault locations. However, if every failing pattern is affected by multiple faults (type-2 failing pattern), the proposed algorithm may encounter difficulties. To overcome this limitation, a failing-PO-partition technique proposed in [12] may be used. This technique is based on the observation that a fault normally tends to be observed from tightly clustered POs, even for different patterns. Using a back-tracing approach, a single fault whose simulation results can explain the most failure responses is found.
The basic idea of this technique is to partition the failing POs into groups such that each group can be explained by a single fault candidate. After the partitioning, most of the type-2 failing patterns are transformed into several type-1 failing patterns so that the proposed algorithm can be applied further. This partitioning technique is especially useful for big industrial designs, most of which are fully scanned and very flat, with a large number of observation points.
The experimental results have shown that most of the failing patterns are of type-1. For the small number of test cases whose failures exhibit only type-2 patterns, the partition technique usually finds the correct candidates. However, the authors observed that, because each group contains fewer failing POs, the number of initial fault candidates obtained from Alg_timing increases. This requires more simulation effort from Alg_timing.
After finding multiple locations by using type-1 patterns and the failing-PO-partition technique, the authors rank the candidate sites by their capability of explaining failure responses. The more failing POs that can be explained by a fault candidate while causing fewer passing POs to fail, the higher the score this fault gets. Moreover, if a candidate (or a group of candidates) can perfectly explain the failure responses of failing patterns, those candidates are ranked higher than those candidates that can only partially explain some failing patterns. The authors group those higher-ranked candidates together.
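The ranking criterion described above can be sketched as follows. The paper does not give a scoring formula, so the score used here (explained failing POs minus passing POs that the candidate would wrongly cause to fail, with perfect explanations placed first) is an assumption made for illustration only. For simplicity the sketch covers a single pattern, and all names are hypothetical.

```python
def score(simulated_failing, observed_failing):
    """Score one candidate for one pattern.

    simulated_failing: set of POs that fail when the candidate is simulated.
    observed_failing:  set of POs that failed on the tester.
    Returns (perfect_match, explained - mispredicted); tuples compare
    left to right, so a perfect explanation always outranks a partial one.
    """
    explained = len(simulated_failing & observed_failing)
    mispredicted = len(simulated_failing - observed_failing)
    perfect = simulated_failing == observed_failing
    return (perfect, explained - mispredicted)


def rank(candidates, observed_failing):
    """Return candidate names ordered from best to worst score."""
    return sorted(
        candidates,
        key=lambda name: score(candidates[name], observed_failing),
        reverse=True,
    )


# Hypothetical simulation results for three candidate sites.
candidates = {
    "g1": {"PO1", "PO2"},  # explains both failing POs exactly
    "g2": {"PO1"},         # explains one failing PO
    "g3": {"PO1", "PO3"},  # explains one, but also fails a passing PO
}
ordering = rank(candidates, observed_failing={"PO1", "PO2"})
```

With these hypothetical inputs the perfectly matching candidate is placed first, and the candidate that causes a passing PO to fail is placed last.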
The algorithm Alg_multiple proceeds as follows.
1) Determine the initial fault candidates.
2) Perform Alg_timing and rank the candidates. Inject the highest-ranked candidate into the circuit with a delay size equal to its DW lower bound. Remove the failing patterns explained by the injected fault.
3) Repeat Step 2) for the remaining failing patterns until all patterns are explained or no candidate can be found.
4) Simulate each identified faulty location. Rank the locations based on their capability of explaining the observed responses.
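The incremental loop of Steps 2) and 3) can be sketched as below. This is a minimal sketch, not the authors' C implementation: `diagnose_single` is a hypothetical stand-in for Alg_timing, its return format is assumed, and the final simulation and ranking of Step 4) are omitted.

```python
def alg_multiple(failing_patterns, diagnose_single):
    """Incremental multiple-fault diagnosis loop (Steps 2 and 3 only).

    diagnose_single(patterns, injected) is a hypothetical stand-in for
    Alg_timing. It is assumed to return a list of
    (site, dw_lower_bound, explained_patterns) tuples, best candidate first,
    for a circuit in which the faults in `injected` are already present.
    """
    remaining = set(failing_patterns)
    injected = {}  # site -> injected delay size (its DW lower bound)
    while remaining:
        ranked = diagnose_single(remaining, injected)
        # Keep only new candidates that explain at least one remaining pattern.
        ranked = [c for c in ranked if c[0] not in injected and c[2] & remaining]
        if not ranked:
            break  # no candidate can be found
        site, lower_bound, explained = ranked[0]
        injected[site] = lower_bound  # inject the highest-ranked candidate
        remaining -= explained        # drop the patterns it explains
    return injected, remaining
```

The function returns the injected sites together with any failing patterns that are still unexplained, which is the point at which the failing-PO-partition technique would be tried.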
VI. EXPERIMENTAL RESULTS
A. Single Delay Fault
The diagnostic capabilities of the algorithm were evaluated using the delay-fault simulation framework. Since obtaining accurate timing information has not been the main purpose of this work, the static timing/delay information was used as a substitute for the accurate timing information obtained from either SPICE simulation or SDF files.
In the experiments, the authors used ISCAS'85 and full-scan versions of ISCAS'89 benchmark circuits bigger than 1 K gates. Delay faults of random size (slow-to-rise, slow-to-fall, or slow) were injected at random locations in the circuits, delay-fault simulation was performed, and the failure responses were collected. To better emulate the timing behavior of a real faulty chip, each gate/interconnect delay was given a variation range during delay-fault simulation. In the experiments, this range was set to 5%. The failure responses were captured when the path delay values on the observation points were bigger than the system clock period. The authors generated a 100%-test-coverage transition-fault test-pattern set (>96% transition-fault coverage) using a commercial Automatic Test Pattern Generation (ATPG) tool. The authors fed those failure responses into an existing commercial delay-fault diagnostic tool. The candidates reported by the commercial tool served as the initial fault candidates for the algorithm. When running the diagnostic algorithm, the authors assumed that the delay variations in the circuit were unknown and used the circuit's nominal timing information to perform DWP. The experimental results show that the algorithm can significantly improve the diagnosis resolution. Also, after applying the refinement step, the real fault sites have much higher ranks than the other candidates. The algorithm is implemented in C. The authors ran all the experiments on a PC running Linux with a 2-GHz CPU and 1 GB of memory. Table II lists the pertinent data for the circuits used in the experiments. The first and second columns list the circuit's name and size. The third column shows the size of the transition fault test sets.
TABLE II. CIRCUIT INFORMATION
TABLE III. EXPERIMENTAL RESULTS FOR ISCAS CIRCUITS
The diagnostic algorithm ranks the identified faults so that the most probable faults (based on the matching and refinement steps) are reported first. It is important that the rank of the first fault that matches the injected fault on the ordered list is as low as possible. The position of the first true fault is often referred to as the first hit rank (FHR). A low FHR can save time on chip screening for a failure analyzer.
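The FHR defined above can be computed from a ranked candidate list as in the following sketch. The function name, the 1-based convention, and the site names are assumptions made for illustration.

```python
def first_hit_rank(ranked_candidates, true_faults):
    """Return the 1-based position of the first true fault, or None if absent."""
    for position, site in enumerate(ranked_candidates, start=1):
        if site in true_faults:
            return position
    return None


# Hypothetical ranked list in which the injected fault "g7" is listed second.
fhr = first_hit_rank(["g3", "g7", "g1"], {"g7"})
```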
The results in Table III have been obtained by averaging 100 randomly injected test cases for each benchmark circuit. Since the FHR is not reported by the commercial tool, the authors list only the FHR of the algorithm. Table III shows the diagnostic results. Column 2 lists the diagnostic result obtained from the commercial tool. It is called the "initial number of faults" because the candidates reported by the tool are the initial faults for the algorithm. Column 3 lists the number of candidate sites reported by the algorithm. The reported sites are representative faults of equivalence classes after transition fault collapsing.
Compared with the commercial tool, the algorithm significantly increases the diagnostic resolution and produces very low FHRs. The resolution improvement can be seen in Table III. On average, the commercial tool produces 39 candidates per test case, whereas the proposed algorithm produces around 8-9 candidates per test case. The injected single delay faults can be correctly identified and on average are among the top four candidates on the list. For the assumed 5% delay variations, the authors have experimentally observed that in some cases the algorithm may delete an injected fault. This happens because of the failing DW's checking rule. An example in Fig. 16 shows this situation. Suppose that the clock period is 10 and the injected fault f1 has a delay size of 1.2. Due to delay variations (speed-up or slow-down), a delay failure is observed at PO2 but not at PO1. However, in the diagnostic step, when the nominal (variation-free) timing values are used, a failure can be observed only at PO1, not at PO2. In such a case, the algorithm prunes fault f1 from the candidate list. To overcome this difficulty while applying the checking rule, the authors introduce for each DW an error tolerance margin of the same level as the circuit timing variation. For instance, when checking whether two DWs of a fault overlap (rule 1), the DWs are extended by the error tolerance margin. If the extended windows overlap, the candidate fault remains on the list. A potential problem with this solution is that it increases the number of possible candidate sites. When the error tolerance margin is smaller than the real variation on the chip, the diagnostic result may not be meaningful.
Table IV lists the memory usage information for the proposed algorithm. The usage is defined as the total memory usage of the program minus the memory used by Alg_no_timing.
This usage information can give us some insight about the memory overhead of using the timing information.
The performance of the algorithm is proportional to the circuit size. The runtime depends on the number of initial fault candidates and on the number of test patterns used for diagnosis. The average runtime of Alg_no_timing is less than 2 s. Alg_no_timing requires 10% of the Alg_timing memory usage.
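The error-tolerance margin used with checking rule 1 earlier in this subsection can be sketched as follows. The paper states only that the DWs are extended by a margin of the same level as the timing variation. Representing a DW as a (low, high) pair and widening each bound proportionally are assumptions of this sketch, and the numbers are hypothetical.

```python
def windows_overlap(dw_a, dw_b, margin=0.0):
    """Check whether two delay windows overlap after widening them.

    Each window is a (low, high) pair of delay sizes. Assumption: a relative
    margin (e.g. 0.05 for 5%) lowers each lower bound and raises each upper
    bound proportionally before the overlap test.
    """
    lo_a, hi_a = dw_a[0] * (1 - margin), dw_a[1] * (1 + margin)
    lo_b, hi_b = dw_b[0] * (1 - margin), dw_b[1] * (1 + margin)
    return max(lo_a, lo_b) <= min(hi_a, hi_b)


# Two hypothetical DWs of the same candidate obtained from different patterns.
strict = windows_overlap((1.0, 2.0), (2.05, 3.0))
tolerant = windows_overlap((1.0, 2.0), (2.05, 3.0), margin=0.05)
```

With these hypothetical windows the candidate would be pruned by the strict check but kept once the 5% margin is applied, which also illustrates why the margin tends to increase the number of candidate sites.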
B. Multiple Delay Faults
In the second round of experiments, the authors randomly injected two or three delay faults into the circuits and collected the failure responses. For each circuit and a given number of faults, the authors performed 100 random fault injections to get different "faulty" circuits. Then they obtained the average diagnostic results for those test cases. Table V shows the experimental results for multiple delay faults. The first column shows the circuit name. The second column shows the number of fault candidates reported before using the timing information. In the table, "2-f " refers to the results for two delay faults injected and "3-f " to three delay faults. The third column shows the number of reported candidates when timing information is being used. The last column shows the diagnosability (DA) for different fault densities. DA is defined as a ratio of the correctly identified fault count over the total number of injected faults.
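The DA metric defined above is a simple ratio, as the following sketch shows. The function name and the site names are hypothetical.

```python
def diagnosability(reported_sites, injected_sites):
    """Return the fraction of injected faults that appear among the reported sites."""
    injected = set(injected_sites)
    return len(injected & set(reported_sites)) / len(injected)


# Hypothetical two-fault case in which only one injected site is reported.
da = diagnosability(["g1", "g4", "g9"], ["g4", "g5"])
```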
The results suggest that for the majority of the cases the approach can identify the injected faulty locations with good resolution. By using timing information, the diagnostic resolution can be significantly improved. It should be noted that the results reported here are the possible candidate locations. If the injected fault can only propagate through one sensitizable path for a pattern, all the locations along the sensitizable path from the injected fault location to the failing PO are reported for that pattern. If multiple sensitizable paths are available to observe the delay effect of the injected faults, the reported candidates are narrowed down to the common locations of those sensitizable paths.
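The narrowing to common locations can be sketched as a set intersection. This is an illustration only: paths are represented as lists of hypothetical location names, and the function name is an assumption.

```python
def common_locations(sensitized_paths):
    """Intersect the locations of all sensitizable paths that observe the failure."""
    paths = [set(path) for path in sensitized_paths]
    return set.intersection(*paths) if paths else set()


# One path: every location on it stays a candidate.
single = common_locations([["g1", "g2", "g5"]])
# Two paths to different failing POs: only the shared locations remain.
shared = common_locations([["g1", "g2", "g5"], ["g1", "g2", "g6"]])
```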
From the experiments, it can be observed that in the majority of cases the faults injected at random locations can be detected by the single-fault-based algorithm. The reason is that for those randomly selected locations most of the failing patterns are type-1 patterns.
To exercise the algorithm on more challenging examples, the authors performed another experiment. They purposefully removed all type-1 patterns from the failing pattern list. Then they performed a diagnosis using the remaining failing patterns, which are all type-2 patterns. The diagnosis results strongly depend on the locations of the injected faults. If the injected faults are located along the same sensitized path as shown in Fig. 15(b), and if this sensitizable path is the only path propagating the combined delay effect to a primary output, both faults can be correctly identified. However, since the delay effect on this sensitizable path has accumulated from several faults, a bigger DW lower bound is determined for each candidate. This is because, under the single-fault assumption, the delay size for each fault location has to be bigger than the slack of the sensitizable path to cause the PO to observe the failure. In reality, both faults' delay sizes might be smaller than the slack while their combined delay is bigger than the slack. When multiple sensitizable paths exist for f1, they are sometimes helpful and sometimes harmful for the algorithm. They are usually helpful if multiple POs can observe the delay effect of f1. In such a case, the fanin cone intersection of the different failing POs can be used to narrow down the candidates. A case in which multiple sensitizable paths are detrimental for the algorithm is shown in the following example. Suppose that the slack of the path from f1 to PO1 is smaller than the slack of the path from f1 to PO2, and the failure responses are observed only from PO2 but not from PO1 due to the accumulated delay effect of f1 and f2. When f1 is considered as a possible candidate, its DW will fail checking rule 2, as shown in Fig. 17.
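The accumulation case of Fig. 15(b) discussed above can be made concrete with a small numeric sketch. It assumes a purely additive delay model in which a path fails when the total extra delay exceeds its slack. The slack and delay sizes are hypothetical values chosen for illustration.

```python
def path_fails(slack, *delay_sizes):
    """Return True if the summed extra delay on the path exceeds its slack."""
    return sum(delay_sizes) > slack


slack = 2.0                 # hypothetical slack of the only sensitizable path
size_f1, size_f2 = 1.2, 1.1  # hypothetical delay sizes of the two faults

alone_f1 = path_fails(slack, size_f1)
alone_f2 = path_fails(slack, size_f2)
together = path_fails(slack, size_f1, size_f2)
```

Neither fault alone exceeds the slack in this setting, but their sum does. A single-fault diagnosis would therefore assign each candidate a DW lower bound equal to the slack (2.0 here), which is bigger than either actual delay size.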
If the injected faults are located on sensitized paths as shown in Fig. 15(c) , the partition method helps. The experimental results show a low probability that all the failing patterns are type-2. However, it is possible that a single fault exists whose simulation results explain failure responses caused by injected faults. This fault is reported as the best candidate based on the single fault assumption.
In the experiments, the authors purposefully inject multiple faults whose delay-effect propagation paths form relationships depicted in Fig. 15(c) .
For each benchmark circuit, the authors create 20 test cases, diagnose them, and determine the average results. They select the failure patterns such that each circuit is affected simultaneously by multiple faults. Each injected fault can be observed by at least one PO. The results are shown in Table VI . The first column shows the circuit name. The second column shows the number of test cases that produced the wrong fault candidates based on the single fault assumption. If the single-fault-based algorithm cannot find a candidate, the authors use the failing-PO-partition technique and perform the single-fault-based algorithm for each of the partitioned groups. The number of test cases that can be solved by the partition technique is listed in the third column. If using the partition technique does not find the correct injected faults, the number of failure results is reported in the fourth column. In the table, "2-f " refers to the results for two delay faults injected and "3-f " to three delay faults.
The experimental results suggest that when all the failing patterns used for diagnosis are affected by multiple faults, the results are not good. The single fault assumption causes most of the wrong diagnosis results. The partition technique can help solve some of those hard-to-diagnose cases. Because type-1 patterns usually exist in practice, the experiments indicate that the incremental approach works well in most cases.
VII. CONCLUSION AND FUTURE WORK
This paper investigated the feasibility of using timing/delay information to guide the delay defect diagnosis. Experimental results suggest that the timing information is essential for diagnosing delay defects.
In addition to investigating the efficiency of the approach to diagnose failures caused by a single delay-fault location, the authors also used the simulation-based approach to diagnose timing failures caused by multiple delay faults. The results show that the method significantly improves the diagnostic resolution in comparison to algorithms that do not utilize the timing information.
