fault model allows an ATPG to determine, from the failing behavior, if
INTRODUCTION
to affect timing of deep submicron (DSM) designs [1, 2]. These DSM delay effects are often continuous in nature [3, 4], and the traditional assumptions of discrete timing and delay models in analysis and simulation become less applicable. Instead, these factors are better captured and simulated using statistical models and methods [5].
Historically, the diagnosis problem was defined over the logic domain and no timing information was involved. In today's industry, the single stuck-at fault model remains one of the most affordable and effective models for defect diagnosis. Stuck-at based diagnosis algorithms are often classified into two types: an effect-cause approach and a cause-effect approach.
In this paper, we propose a methodology for diagnosing delay defects based upon statistical timing models and delay simulation. It proceeds in three phases:
1. Effect-cause phase: In this phase, a set of suspect faults is identified based purely on logic conditions.
2. Cause-effect phase: We apply a novel diagnosis algorithm operating on the probabilistic space, instead of the logic space, to obtain a much smaller set of candidate faults.
3. Fine-tuning phase: We select a few target faults and produce additional patterns for them in order to further narrow down to more exact fault location(s).
In phases 2 and 3, a statistical timing analyzer serves as a predictor for the delay configuration of a given failing chip instance. Because of this, how to match the failing behavior to the probabilistic information contained in the fault dictionary becomes an interesting question. Since the delay defect size is a random variable, the criteria to determine the maximal fault resolution for a given pattern set become probabilistic as well. As a result, we can no longer rely merely on the logic conditions to decide if a test can differentiate two faults.
To measure the accuracy of matching the probabilistic fault information in the fault dictionary to the failing behavior, we introduce a new concept called the diagnosis error function. In the second phase, we use a Euclidean-distance-based diagnosis error function to decide the fault suspects. To obtain a good diagnostic pattern set in the second phase, we use a path delay fault ATPG that does not consider timing (for a set of longest paths). Then, in the third phase, we use a Genetic Algorithm based timed ATPG to derive additional fine-tuning patterns.
By separating phase 2 from phase 1, we avoid the construction of the fault dictionary for the faults that can be excluded as a cause of the failing behavior using only logic criteria. Hence, the effectiveness of our phase 2 diagnosis algorithm can be better observed. By separating phase 3 from phase 2, we avoid the application of a more complex (timed) ATPG to a large number of faults. Therefore, in our 3-phase methodology, we apply a more complicated algorithm to solve a problem aspect only when it cannot be solved by an easier approach.
While deciding which suspect faults should be kept after phase 1 is deterministic (because this decision is based purely on logic criteria), deciding which suspect faults should be used in phase 3 can only be probabilistic. Because of this, we re-define the concept of diagnosis resolution and discuss heuristics to separate faults in phases 2 and 3.
PROBABILISTIC FAULT DICTIONARY
In logic diagnosis, the circuit model used in the simulation is assumed to logically match the chip instance. In delay diagnosis, this is not true because of the inclusion of statistical delay information. Each chip represents only a single instance of all possible delay configurations intended to be modeled statistically by the CAD tools.
Suppose the single stuck-at fault model is used in logic diagnosis.
Let $\{f_1, \ldots, f_n\}$ be the $n$ faults that belong to $n$ different fault equivalence classes. Suppose a pattern set is available to achieve the maximal fault resolution, i.e., for any pair of faults $f_i$, $f_j$, there exists a pattern in the set to differentiate these two faults (detect one but not the other). Then, in theory, given the failing behavior resulting from a single stuck-at defect, the diagnosis algorithm can conclude exactly which fault is the cause. On the other hand, if the pattern set does not achieve the maximal fault resolution, then depending on the resolution, an algorithm can conclude a subset of the faults as the potential causes. Exactly which one is unknown. Based upon these observations, we can say that the diagnosis resolution in the logic domain is the same as the fault resolution if defects are the same as faults. Take the single transition fault model as an example. If no delay information is involved, then the above statement still holds. However, if delay information is involved, then the diagnosis resolution is not the same as the fault resolution in the logic domain. Figure 1 illustrates the reasons.
In the figure, output arrival times are characterized as probability distributions. When a clock is given, from each probability distribution we can calculate the critical probability that represents the chance of an output delay exceeding the given clock [5]. In the figure, the critical probability at output $o_1$ is illustrated as the shaded area. In the first case, for a fault $d$, suppose two patterns $v_1$, $v_2$ are available. In the logic domain, both patterns detect $d$ and can differentiate between $d$ and $d'$. However, depending on the timing length of the sensitized path ($p_1$ or $p_2$), the critical probability (shaded area) resulting from each pattern can be different. If a pattern detects a fault through a short path (like $v_2$), then it is possible that with a small delay defect size, the pattern does not detect the defect at all. Consequently, $v_2$ can differentiate two faults in the logic domain but cannot do so when the delays are considered (it may detect none).
In the second case, a pattern $v$ detects both faults $d_1$, $d_2$ logically through sensitized paths $p_1$, $p_2$, respectively. Suppose the two paths merge at a 2-input cell and the arrival time random variables at the two inputs are denoted as $a_1$, $a_2$. The output arrival time of the cell is the random variable $\max(a_1, a_2)$, whose pdf follows from the joint distribution of $a_1$ and $a_2$. Suppose $Prob(a_1 > a_2) = 1$. Then, it is possible that $p_1$ always dominates the output delay (or vice versa, depending on the transition type). Hence, pattern $v$ can differentiate the two faults. As can be seen, even though pattern $v$ does not differentiate the two faults logically, timing-wise it may.
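The two cases can be illustrated with a small Monte-Carlo sketch. It is not the statistical timing framework of [5]: the Gaussian path delays, the clock period, the defect size, and the helper names `sample_path` and `critical_probability` are all assumptions chosen only to make the effect visible.

```python
import random

def critical_probability(arrivals, clk):
    """Fraction of sampled arrival times that exceed the clock period."""
    return sum(t > clk for t in arrivals) / len(arrivals)

def sample_path(rng, mean, sigma, defect, n):
    """Sample n arrival times for one path. The Gaussian delay is an
    assumption; the defect adds a fixed extra delay to every sample."""
    return [rng.gauss(mean, sigma) + defect for _ in range(n)]

rng = random.Random(0)
N, CLK, DEFECT = 20000, 10.0, 1.5  # hypothetical values, arbitrary time units

# Case 1: the same defect observed through a long path (pattern v1)
# and through a short path (pattern v2).
long_path = sample_path(rng, 9.0, 0.5, DEFECT, N)
short_path = sample_path(rng, 6.0, 0.5, DEFECT, N)
crit_long = critical_probability(long_path, CLK)
crit_short = critical_probability(short_path, CLK)

# Case 2: two arrival times merging at a 2-input cell.
a1 = sample_path(rng, 9.0, 0.5, 0.0, N)
a2 = sample_path(rng, 6.0, 0.5, 0.0, N)
output = [max(x, y) for x, y in zip(a1, a2)]
dominance = sum(x > y for x, y in zip(a1, a2)) / N  # estimate of Prob(a1 > a2)

print(f"critical probability via long path:  {crit_long:.3f}")
print(f"critical probability via short path: {crit_short:.3f}")
print(f"estimated Prob(a1 > a2):             {dominance:.3f}")
```

Under these assumed numbers the long path exposes the defect in most sampled instances while the short path almost never does, and `a1` dominates the cell output in nearly every sample.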
Due to the above two reasons, in general, whether or not a test pattern can differentiate two given faults should be characterized as a probability value that depends on the given clock period clk. Therefore, in delay defect diagnosis, given a pattern $v$, our first task is to compute the probability that $v$ detects a particular fault. This information is used to build the probabilistic fault dictionary, and our algorithm uses the dictionary to decide which fault is the most probable cause of failure. The probabilistic nature of the fault dictionary raises an interesting question. Consider the example in Figure 2. Suppose the failing behavior of a chip instance is characterized as a 0-1 matrix (1 means that an error is observed). Suppose we have a way to calculate (in the simulation), for each candidate suspect fault, a probability matrix $P$ where $p_{ij}$ represents the chance that a failure is observed at primary output $i$ during the application of test vector $j$. Then, in the example, the underlying question to ask is: which probability matrix is a better match to the failing behavior?
If we focus on matching the "1" entries in the 0-1 matrix, we would say that fault #1 is a better match. However, if we focus on matching the "0" entries, fault #2 would be a better match. In general, depending on what we mean by a "better match", the diagnosis answer can be different. Hence, in order to develop an accurate diagnosis algorithm, our first task is to define carefully how to match the information in the probabilistic fault dictionary to the failing behavior. We call such functions the diagnosis error functions.
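The ambiguity can be made concrete with a small sketch. The matrices below are hypothetical (they are not the values of Figure 2), and the two scoring functions `match_ones` and `match_zeros` are illustrative assumptions, not error functions proposed in this paper.

```python
# Observed failing behavior: rows are outputs, columns are patterns.
B = [[1, 0, 1],
     [0, 0, 1]]

# Hypothetical probability matrices for two suspect faults.
P1 = [[0.9, 0.6, 0.8],
      [0.5, 0.7, 0.9]]  # predicts the observed errors well, the passes poorly
P2 = [[0.4, 0.1, 0.5],
      [0.1, 0.0, 0.4]]  # predicts the observed passes well, the errors poorly

def match_ones(B, P):
    """Average predicted failure probability over the entries where B is 1."""
    vals = [P[i][j] for i in range(len(B)) for j in range(len(B[0])) if B[i][j] == 1]
    return sum(vals) / len(vals)

def match_zeros(B, P):
    """Average predicted pass probability over the entries where B is 0."""
    vals = [1.0 - P[i][j] for i in range(len(B)) for j in range(len(B[0])) if B[i][j] == 0]
    return sum(vals) / len(vals)

for name, P in (("fault #1", P1), ("fault #2", P2)):
    print(name, round(match_ones(B, P), 3), round(match_zeros(B, P), 3))
```

With these numbers fault #1 scores higher on the "1" entries and fault #2 scores higher on the "0" entries, so the two views disagree on the answer.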
The concept of a probabilistic fault dictionary implies that an optimal test set considering only the logical conditions may not be optimal for delay defect diagnosis. A possible solution to obtain a good set of diagnostic delay patterns could be to use a timed ATPG [14, 15]. In our work, we do not consider a deterministic timed ATPG due to its high complexity. Instead, we use a path delay fault ATPG (based on logic sensitization conditions) as an approximation in phase 2, and we use a Genetic Algorithm based timed ATPG in phase 3.
AN ERROR-FUNCTION-DRIVEN DIAGNOSIS ALGORITHM
Given a failing $n$-output chip instance $C_i$ and a set $TP$ of $m$ patterns, suppose that the failing behavior is characterized in an $n \times m$ matrix $B = [b_{ij}]$, where $b_{ij} = 1$ if an error is observed at output $i$ while applying pattern $j$. Otherwise, $b_{ij} = 0$.
To diagnose the defect, we utilize a single-defect assumption. We assume that the defect location and defect size are independent random variables. In our current work, we do not consider that these random variables are correlated. Moreover, we assume that only one defect can occur for a given failing chip instance. This delay defect can happen on any one of the signals in the circuit.
Phase 1: Effect-Cause Analysis
In phase 1, for each failing pattern we perform backward analysis from each failing output. We collect faults that fall on the logically sensitized paths (to the failing output) given by the patterns. At the end of this phase, we obtain a suspect fault set $F = \{f_1, \ldots, f_l\}$. Each fault $f_k$ (for $1 \le k \le l$) falls on at least one sensitized path to one failing output during the application of at least one pattern.
Phase 2: Cause-Effect Diagnosis
Given $F$, for each fault $f_k$, we construct an $n \times m$ probability matrix $\mathcal{E}_k$. Each $e^k_{ij}$ represents the probability that an error can be observed at output $i$ while applying pattern $j$ for a given clock period if fault $f_k$ is present. Suppose we can calculate these probability matrices for all faults in $F$ and obtain $\mathcal{E}_1, \ldots, \mathcal{E}_l$ [7]. Then, the underlying question to ask is: which $\mathcal{E}_k$ (for $1 \le k \le l$) is a better match to the failing behavior matrix $B$? To measure the accuracy of this "match," we introduce the concept of a diagnosis error function $Err$. In essence, $Err(B, \mathcal{E}_k)$ measures the diagnosis error if fault $f_k$ is selected as the answer of diagnosis.
An Error Function Based on Euclidean "Distance"
To simplify the problem, we first assume that n = 1. 
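In this case $B$ and $\mathcal{E}_k$ reduce to vectors of length $m$, and the Euclidean-distance-based error can be written as
$$Err(B, \mathcal{E}_k) = \sum_{j=1}^{m} (b_j - e^k_j)^2 \qquad (1)$$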
For multiple-output circuits ($n > 1$), Figure 3 demonstrates a simple view of the meaning of an error in the diagnosis. Under the equivalence checking model, an error in the diagnosis for a given pattern is defined as at least one output producing a difference. In the figure, the delay configuration of the failing chip instance $C_i$ is unknown and is modeled in the simulation with statistical timing in the circuit model $C$. What we know is the failing behavior matrix $B$.
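[Figure 3: equivalence-checking view of the failing chip instance and the circuit model with statistical timing information.]
We then measure the diagnosis error as the Euclidean distance between the probability vector $P_k = [p_1^k, \ldots, p_m^k]$ and the ideal solution $0$, simply
$$Err_k = \sum_{j=1}^{m} (p_j^k)^2 \qquad (2)$$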
Equation (2) follows the same spirit as equation (1): both use the Euclidean distance to measure the diagnosis error. Hence, we can use equation (2) to pick a fault whose error is the minimum.
Steps. The clock period clk is used to observe the failing behavior matrix $B$.
1. For each fault $f_k$ in $F$, calculate the probability matrix $\mathcal{E}_k$. The tools and methodologies used in this calculation will be summarized in Section 4. From $\mathcal{E}_k$ and $B$, we use the method described in [7] to calculate the probability vector $P_k = [p_1^k, \ldots, p_m^k]$, where each $p_j^k$ is the probability that $e_j$ is 1 (a mismatch between the observed and simulated results at one output or more) if $f_k$ is present, as shown in Figure 3.
2. Calculate $Err_k = \sum_{j=1}^{m} (p_j^k)^2$ as described above to measure the diagnosis error.
After we finish the calculation for all faults in
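The steps above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments: the method of [7] for deriving $P_k$ is not reproduced here, so the sketch assumes that errors at different outputs are statistically independent, and the names `mismatch_probabilities`, `diagnosis_error`, and `diagnose` are hypothetical.

```python
from math import prod

def mismatch_probabilities(B, E):
    """For each pattern j, the probability that one output or more disagrees
    with the observed column of B if the fault is present.
    ASSUMPTION: outputs are treated as independent; [7] may differ."""
    n, m = len(B), len(B[0])
    P = []
    for j in range(m):
        # Probability that every output agrees with the observed value.
        agree = prod(E[i][j] if B[i][j] == 1 else 1.0 - E[i][j] for i in range(n))
        P.append(1.0 - agree)
    return P

def diagnosis_error(B, E):
    """Equation (2): sum of squared mismatch probabilities."""
    return sum(p * p for p in mismatch_probabilities(B, E))

def diagnose(B, dictionary, K=1):
    """Rank suspects by increasing diagnosis error and return the first K."""
    ranked = sorted(dictionary, key=lambda name: diagnosis_error(B, dictionary[name]))
    return ranked[:K]

# Hypothetical 2-output, 2-pattern example.
B = [[1, 0],
     [0, 0]]
dictionary = {
    "f1": [[0.9, 0.1], [0.1, 0.0]],
    "f2": [[0.2, 0.6], [0.5, 0.1]],
}
print(diagnose(B, dictionary, K=1))
```

Here `K` plays the role of the user-defined diagnosis resolution: a larger `K` returns more suspects, ordered from the smallest error upward.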

TOOLS AND METHODOLOGIES FOR THE EXPERIMENTS
The key tools to realize the proposed diagnosis algorithm include a statistical timing analysis tool and a dynamic timing simulator. Moreover, to measure the effectiveness of our diagnosis method, we need to perform statistical defect injection and fault simulation.
Statistical Timing Analysis
In the statistical timing analysis framework, the delays of cells/interconnects are modeled as correlated random variables with known probability density functions (pdf's). These pdf's can be obtained using a Monte-Carlo-based SPICE simulator. Given cell/interconnect delay functions and a cell-based netlist, the statistical framework can derive the pdf's of signal arrival times for both internal signals and primary outputs using a Monte-Carlo-based simulation technique.
In our experiments, we use a cell-based statistical timing analysis framework [5]. It requires pre-characterization of cells, i.e., building libraries of pin-pin cell delays and output transition times (as random variables). We use a Monte-Carlo-based SPICE (ELDO) [17] to extract the statistical delays of cells for a 0.25 µm, 2.5 V CMOS technology. The input transition time and output loading of the cells are used as indices for building/accessing these libraries. Each interconnect delay is also modeled as a random variable and is pre-characterized once the RCs are extracted.
Dynamic Timing Simulation
With a given set of test patterns, the statistical timing analysis framework can be used to perform statistical dynamic timing simulations to obtain the pdf's of internal signals and primary outputs for the given set of test patterns. These pdf's are obtained by simulating a large number of circuit instances with different cell/interconnect delay assignments.
Defect Injection and Simulation
To measure the accuracy of our diagnosis method in the cause-effect phase (phase 2), we apply it with single as well as multiple defect models. For both models, we adopt an exponential delay size distribution function.
Single Defect Model. This model can be used to represent small delay faults resulting from manufacturing defects, resistive opens and shorts, and bridging faults. We use an exponential distribution for the defect size, $\lambda e^{-\lambda x}$, where $x$ is the defect size and $\lambda$ is a constant. We use $\lambda = 0.04$ in our experiments. Other defect distributions could be used as well, and using other distributions in general should not invalidate the trends observed in our work [7].
Multiple Defect Model. In this model, several single defects are simultaneously injected into the design. It can represent delay faults from a defect localized to a certain area of the chip.
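Defect injection under these models can be sketched as follows. Only the exponential density with $\lambda = 0.04$ comes from the text; the function name `inject_defects`, the signal names, and the uniform choice of defect sites are assumptions (in particular, the sketch does not model the spatial locality mentioned for multiple defects).

```python
import random

LAMBDA = 0.04  # rate of the exponential defect-size density used in the experiments

def inject_defects(signals, count=1, rng=None):
    """Pick `count` distinct defect sites uniformly at random (an assumption)
    and draw an exponentially distributed extra delay for each of them."""
    rng = rng or random.Random()
    sites = rng.sample(signals, count)
    return {site: rng.expovariate(LAMBDA) for site in sites}

rng = random.Random(1)
signals = ["n1", "n2", "n3", "n4", "n5"]           # hypothetical signal names
single = inject_defects(signals, count=1, rng=rng)  # single defect model
triple = inject_defects(signals, count=3, rng=rng)  # multiple defect model
print(single, triple)
```

With this rate the mean defect size is $1/\lambda = 25$, in whatever delay unit the timing library uses.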
INITIAL RESULTS
For each circuit model $C$, we produce $N$ circuit instances with different delay configurations. On each instance, we inject a delay defect of which both location and size are drawn randomly according to the defect model (single or multiple). These instances model the faulty chips. We then apply our diagnosis method to each instance. The accuracy of diagnosis for single defects is measured in two ways: 1) If the user-defined diagnosis resolution number $K$ is 1 (refer to Algorithm 3.1 above), then the accuracy is a binary value, success or failure, depending on whether the answer matches the injected defect or not. 2) If the user-defined $K > 1$, then if the injected defect is contained in the potential defect set answered by the algorithm, it is counted as a success; otherwise, it fails. Then, we calculate the success rate as the accuracy measurement by averaging over the results from all $N$ instances. Clearly, the larger the $K$ value is, the higher the success rate will be. For multiple faults, in the current implementation, we evaluate the algorithm assuming that the defect can be diagnosed if at least one of the faults contained in the multiple fault can be diagnosed.
For single and multiple defect models, for each injected fault, we find a set of statistically "long" paths through the fault site and generate path delay tests for them without considering timing. These long paths are derived using the false-path aware static statistical timing analysis tool [16]. Then, robust or non-robust patterns for testing these paths are produced.
Table 1 shows results on the accuracy of diagnosis. As expected, the rates of success increase for larger $K$. Values $S_1$, $S_2$ and $S_3$ represent the average number of suspect faults per injected fault for single, double and triple faults, respectively, after the phase 1 effect-cause analysis. The number of applied diagnostic patterns is in the range of a few tens of patterns, depending on the fault model and the circuit. The results in the table can be interpreted in the following way: for example, for s15850, for the case of single faults, 38% of the failing chip instances can be diagnosed successfully with a diagnosis resolution $K = 2$, 57% can be diagnosed successfully with a diagnosis resolution $K = 6$, etc.
These numbers for $K$ should be compared to the number of potential suspect faults, 224, given at the end of phase 1. In other words, at the start of phase 2, where we apply our main diagnosis algorithm, we have already excluded the faults that cannot cause the faulty behavior by considering the logic sensitization conditions of the given pattern set. Hence, the effectiveness of our algorithm can be clearly seen by the small $K$ relative to the large suspect set in each case.
Comparing the results for double and triple faults with the results for single faults, it can be seen that our single defect-based diagnosis algorithm performs very well for the case of multiple defects as well. In the cases of double and triple faults, a success is declared if at least one of the faults is diagnosed correctly. Note that this does not imply that the diagnosis problems become easier, because the effect resulting from multiple faulty delay random variables is statistical and remains hard to predict.
Questions Left After Phase 2
Two fundamental questions remain at the end of phase 2. First, although the experimental results indicate that a small diagnosis resolution $K$ can usually give us good results, how do we know that our selection of $K$ is good enough? Second, suppose we want to further improve the diagnosis resolution by adding patterns: how can we produce the additional good diagnosis patterns? Moreover, what faults should be targeted for producing these additional patterns? As can be seen, the answer to the second question partly depends on the answer to the first question. In the following section, we present our methodology in phase 3, which addresses these questions.
PHASE 3: FINE-TUNING
In phase 3, we use a Genetic Algorithm (GA) based ATPG to produce additional fine-tuning patterns. The ATPG process is guided using a fitness function based on the timing information. Since including timing can significantly increase the cost of test generation, we are interested in finding a small set of target faults. The additional tests for these faults should produce the greatest impact on the diagnosis accuracy.
Selection of Target Suspects
After phase 2, we can rank the set of suspects for a given fault according to the value of the diagnosis error. Then, using the ranking and a user-specified $K$ value, we can pick a set of most likely faults as the target faults in phase 3 for generating additional patterns. However, what if a user does not know what $K$ value to pick?
It is obvious that the selection of $K$ affects the diagnosis results. A larger $K$ provides a higher confidence that the defect is contained in the final set of suspects, while a smaller $K$ provides a better diagnosis resolution. For the purpose of selecting the target faults in phase 3, a larger $K$ implies more ATPG effort. Given the ranking of the suspects at the end of phase 2, the underlying question becomes how to choose $K$ so that we strike a good balance among all these concerns. From the experimental results shown above, in general we expect that a small $K$ value should be good enough.
To help us answer this question, we focus again on the diagnosis error values for the suspects after phase 2. Figures 4 and 5 show plots of the diagnosis error values for every suspect in 4 faulty chips for s5378 and s15850. These chip instances are randomly chosen from the set of defects being diagnosed in the experiment in Section 5. The suspects on the x-axis are the ones remaining after eliminating all the impossible faults based on phase 1.
In these plots, we can observe a clear trend showing that after a certain fault index there is a rapid increase of the diagnosis error values. This trend suggests that there is a small set of faults for which the failing patterns result in a much better match between the observed and the statistically simulated behavior (as defined by the diagnosis error).
Therefore, we can select K based upon when the rapid increase happens. Our experimental results in Section 5 support this easy approach to be a good heuristic for selecting the K value.
For example, next to each curve in Figure 4 , we show the K value that results in successful diagnosis, i.e. the injected defect is one of the first K faults (as validated by our experiments). Hence, for the curve denoted as "fault #9" this curve was obtained based upon the injection of fault #9 in the experiments. And, with a selection of K = 3, the fault # 9 is contained in the first three suspects being diagnosed in phase 2. We note that for fault #9, the diagnosis error values increase rapidly after the first 5 suspects. Similar trends can be observed for other faults and the other design as well. The shape of these diagnosis error curves reveals the obvious heuristic of selecting the first K suspects based upon the rapid increase of the error values. Based upon this heuristic, we select a small set of K suspect faults for test generation of the additional fine-tuning patterns.
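One way to automate this heuristic is to look for the largest relative jump between consecutive ranked error values. The text does not specify how the "rapid increase" is detected, so the ratio-based rule and the name `select_k` below are assumptions, and the error values are made up for illustration.

```python
def select_k(errors, max_k=None):
    """Return K such that the first K ranked suspects sit just before the
    largest relative jump in diagnosis error. The ratio-based jump
    detector is an illustrative assumption."""
    if not errors:
        raise ValueError("need one error value or more")
    ranked = sorted(errors)
    limit = len(ranked) - 1
    if max_k is not None:
        limit = min(limit, max_k)
    best_k, best_jump = 1, 0.0
    for k in range(1, limit + 1):
        lo, hi = ranked[k - 1], ranked[k]
        if lo > 0:
            jump = hi / lo
        else:
            jump = float("inf") if hi > 0 else 1.0
        if jump > best_jump:
            best_k, best_jump = k, jump
    return best_k

# Hypothetical diagnosis error values for eight suspects after phase 2.
errors = [0.02, 0.03, 0.05, 0.04, 0.06, 1.5, 2.0, 2.4]
print(select_k(errors))
```

The optional `max_k` caps the number of target suspects, which bounds the ATPG effort spent in phase 3.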
[Figures 4 and 5: diagnosis error value versus suspect index for faulty chip instances of s5378 and s15850.]
Pattern Generation for Diagnosis
Generating "good" diagnostic patterns for delay faults is a complex task. Since the ATPG has the burden of ensuring that small defects are not missed due to poor quality patterns, in general, timing information needs to be involved.
A given path can be sensitized with many different patterns resulting in different path delays. "Good" patterns are defined as those that sensitize the fault through long paths and produce a long delay on it. It is the task of the pattern generator to generate patterns resulting in longer path delays. Due to complexity reasons, most conventional path delay fault pattern generators do not take timing information into account and generate tests based purely on logic path sensitization conditions. Thus, the patterns might not always exercise the worst-case timing scenarios and result in the longest path delays. One possible solution for generating high quality patterns could be a timed ATPG technique in which the timing information is used in addition to the logic information to generate deterministic patterns [14]. However, due to its complexity, timed ATPG is not practical. In addition, due to the statistical nature of the timing information, patterns derived using nominal or worst-case delays might not be best for all circuit instances. On the other hand, considering statistical timing would further hurt the efficiency of this method.
From previously published work [19], Genetic Algorithm based ATPG seems to be able to strike a good balance between the quality of produced test patterns and complexity. Therefore, we use this approach to generate fine-tuning patterns for enhancing the diagnosis resolution in phase 3.
Genetic Algorithm based ATPG
For each target fault, we select a small set of long paths based on statistical timing analysis. Next, for each path, we assign the mandatory logic assignments to sensitize it. After assigning the mandatory values to sensitize a given path, usually there are still many unspecified values at the primary inputs. Even though the path sensitization does not depend on the assignment of these values, the path delay does. Therefore, we use an iterative GA process to specify these PI values such that they result in longer path delays.
Genetic algorithms [18] are search algorithms based on the mechanics of natural selection and natural genetics. In our GA based delay fault ATPG, the solution space is represented by the set of all possible patterns satisfying the mandatory values for sensitizing the target path delay fault. Each pattern has an associated fitness value which is, in our case, given by the path delay under the current set of patterns. We use statistical dynamic timing simulations to evaluate the path delays [16]. In the initial generation of patterns, the unspecified PI values are randomly assigned. Next, the GA searches for the pattern(s) with the optimal solution using three processes: selection, crossover and mutation [19]. The objective of the GA is to evolve a population of patterns having high fitness values. This iterative process continues until the number of generations reaches a pre-defined value.
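The GA loop described above can be sketched as follows. It is a simplified illustration, not the ATPG of [19]: the fitness function `toy_delay` is a hypothetical stand-in for the path delay returned by statistical dynamic timing simulation, and the truncation selection, uniform crossover, and mutation rate are assumptions.

```python
import random

def ga_fill(template, fitness, pop_size=12, generations=5,
            mutation_rate=0.05, seed=0):
    """Fill the unspecified primary inputs (None entries) of `template` so
    that `fitness` is as large as the search can find. Mandatory values
    needed to sensitize the path are never changed."""
    rng = random.Random(seed)
    free = [i for i, bit in enumerate(template) if bit is None]

    def random_pattern():
        pattern = list(template)
        for i in free:
            pattern[i] = rng.randint(0, 1)
        return pattern

    population = [random_pattern() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population (assumes pop_size >= 4).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = list(a)
            for i in free:
                if rng.random() < 0.5:            # uniform crossover
                    child[i] = b[i]
                if rng.random() < mutation_rate:  # mutation: flip the bit
                    child[i] ^= 1
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

def toy_delay(pattern):
    """Hypothetical fitness: pretend each 1 on an input lengthens the path."""
    return sum(pattern)

# Mandatory values at positions 0 and 2; the other inputs are unspecified.
template = [1, None, 0, None, None, None]
best = ga_fill(template, toy_delay)
print(best, toy_delay(best))
```

The defaults of 12 patterns per population and 5 generations are the settings reported in the experimental section; in a real flow `fitness` would call the timing simulator for each candidate pattern.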
EXPERIMENTAL RESULTS
In this section, we present the final experimental results for all phases of our diagnosis framework. For each design and each injected defect, we first apply phase 1 and phase 2 of our diagnosis methodology. The results after phase 2 are given in Section 5. Next, for each defect, we select the $K$ highest ranking suspects to use as target faults for generating the fine-tuning patterns. The value of $K$ is determined based on when the rapid change of the diagnosis error is observed for the given defect, as described in Section 6.1.
For each target fault $f_i \in F$, we select 10 longest paths and use them in the GA based ATPG to generate patterns. The paths are selected using the statistical timing analysis tool [16]. For each path, we set the size of the GA population to 12 and the number of generations to 5. At the end of the GA process, for each target fault $f_i$, we pick 4 patterns resulting in a longest path delay. The path delay is obtained using statistical dynamic timing simulations [16]. Next, we use our phase 2 diagnosis methodology again with these additional patterns. The experiment follows the same spirit as the one described in Section 5. To evaluate the contribution of generating high quality patterns as opposed to just generating additional patterns without timing information, for each target fault, we also derive 40 additional test patterns such that the unspecified values at the primary inputs after sensitizing the paths are assigned randomly (rather than generated using the GA based ATPG).
Table 2 shows the results for diagnosis resolution in the case of a single defect. The average number of target faults for which additional test patterns are generated is given under the circuit name. The average is taken over all injected defects for the given design.
Table 2: Diagnosis results (%) for single defects, one group of rows per circuit. Columns: K, Phase 2, Phase 3 (random patterns), Phase 3 (GA patterns).
1   45  52
2   48  51  64
3   60  64  77
5   80  82  90

1   7   12  25
2   29  37  62
4   52  57  75
9   75  78  87

1   29  36  58
4   58  67  83
6   86  87  91
9   86  88
The percentages in column 3 are the results after phase 2, i.e., they are repeated from Table 1. The results under the column marked "Phase 3" are obtained with the two sets of additional patterns: random and GA generated. Even though both sets result in improved diagnosis resolution, the GA generated patterns have a clear advantage. Due to the relatively low cost of GA based ATPG as compared to deterministic timed ATPG, the results suggest that the extra cost is worth the effort.
CONCLUSIONS AND FUTURE WORK
In this work, we study the problem of delay defect diagnosis based upon statistical timing model. We propose a 3-phase diagnosis methodology that gradually improves the diagnosis resolution. While most of the previous delay diagnosis approaches stop after faults can be distinguished using logic conditions in phase 1, our main contribution is a novel delay diagnosis methodology based on statistical timing information used to further distinguish faults in phase 2 and phase 3. To do this, we propose an error-function-driven diagnosis algorithm based upon the single defect assumption, and demonstrate its effectiveness under different defect assumptions. We note that this algorithm was the best after experimenting with several other different approaches.
In phase 3, we propose a novel methodology for deriving fine-tuning diagnostic patterns to further enhance the diagnostic resolution. Our experimental results indicate that using our diagnosis framework with extra effort of generating additional good patterns results in improved diagnosis resolution.
Future research includes many possible directions, including 1) development of better diagnosis error functions and new diagnosis algorithms accordingly, 2) development of methods to reduce the expense of computing and storing the probabilistic fault dictionary, and 3) improvement of the dynamic statistical timing simulator for more accurate delay fault simulation.
