A new algorithm is presented for identifying stuck faults in combinational circuits that cannot be detected by a given input sequence. Apart from pre- and post-processing steps, the algorithm only monitors certain signal conditions during logic simulation. These signal conditions are specified by an analysis of dominators and signal reconvergences in the circuit graph. After simulation, a post-processing step identifies faults that cannot be detected by the sequence. For combinational ISCAS benchmarks, the runtime overhead of the algorithm is found to be around 30-40% over that of a logic simulator. Experimental data show a substantial reduction of error in the statistical estimates of a stuck-fault coverage estimator when they are corrected for faults that this algorithm finds to be guaranteed undetected by the given sequence. An effective application of this technique is demonstrated for scan-based test point selection in an industrial scenario where circuit size and vector length prohibit the use of fault simulation.
Introduction
Upper bounding of fault coverage involves the identification of stuck faults that are guaranteed to remain undetected by a vector sequence. Upper bounding has several applications in test, the most prominent being fault coverage estimation of functional vectors, as required for custom chips such as microprocessors. A strong motivation for this work comes from this critical need of the industry. For multi-million gate VLSI circuits, logic simulation is considered feasible but fault simulation is often impractical. Statistical fault simulators [5], though viable in terms of complexity, can be too inaccurate in situations where vectors are written specifically to detect faults in certain modules. Because of the approximations used, a statistical fault simulator may estimate a non-zero detection probability for many undetectable faults, hence treating them as detected. The upper bounding technique of this paper can help revise such statistical coverage estimates.
At least two upper bounding algorithms have been proposed in the literature [1, 3]. Critical path tracing (CPT) [1] underestimates fault coverage when a stem fault is sensitized by simultaneous propagation of fault effects on all fanout branches, while none of the branch faults is individually detected (see example in Figure 1(a)). The method of Akers et al. [3] analyzes necessary conditions for fault detection. Both algorithms require fault analysis after simulation of each vector. We borrow ideas from both algorithms. While we monitor pre-specified conditions during simulation, unlike the two algorithms, the coverage is analyzed only after the simulation is completed. Thus, the per-vector computation overhead is reduced. Also, in contrast to CPT, we avoid tracing through gates, with multiple dominant input logic values, that are points of reconvergence for paths with different parity. However, at points of reconvergence with the same parity, the algorithm is exhaustive and repeats the analysis for each possible input choice that can justify the output.
* This work was supported in part by a grant from Intel Corp.
Some basic definitions are presented in Section 2. An illustrative example runs through Sections 3 and 5. The details of the algorithm are presented in Section 4. Section 6 presents experimental results for ISCAS combinational benchmarks. We also provide a real-life application. For three industry designs, we show the use of the upper bounding algorithm to select scanout points to meet higher fault coverage requirements.
Definitions
A circuit is modeled as a directed graph where nodes of the graph represent gates, and edges correspond to signals. We will use the following definitions.
Checkpoints - The checkpoints of a circuit consist of primary inputs and fanout signals [4].
Each gate is classified into one or more of three categories:
Fanout gates - Gates that drive fanout stems.
Reconvergent gates - Gates that are points of reconvergence.
Non-reconvergent gates - Gates that are not points of reconvergence.
Note that the reconvergent and non-reconvergent categories are mutually exclusive, but each can contain fanout gates.
Dominator - In a classical sense, a node A of a directed graph is a dominator of node B if every path from B to any output passes through A [2]. The set of nodes that dominates a given node is called its dominator set. Our definition of dominators differs from this classical graph theory definition. First, in the classical sense every node is a dominator of itself, but we exclude such cases. In addition, we require that a dominator gate must have at least one input that is not reachable from the dominated gate. This eliminates dominators that exist due to reconvergent fanouts only. Finally, a single-input gate is not treated as a dominator.
Trivial dominator set - A given node is said to have a trivial dominator set if its immediate fanout is the only dominator.
Non-trivial dominator set - For a given node, a non-trivial dominator set necessarily includes at least one dominator node that is not an immediate fanout point of the given node.
Output sensitizing condition - Given a gate with a specified output value and its dominator set, this condition requires that all off-path inputs of the dominators hold the appropriate non-controlling values.
Input sensitizing condition - Given a gate with a specified value on a specific input pin and its dominator set, this condition requires non-controlling values on all other inputs to the given gate and on the off-path inputs to the dominators.
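The restricted dominator definition can be illustrated with a minimal sketch. The circuit `CONE`, its signal names, and all function names below are hypothetical and chosen only for illustration. The simple set-intersection traversal is a stand-in for the dominator algorithms of [2] and is not claimed to match their complexity.

```python
# Hypothetical single-output cone: gate name -> list of input signals.
# Names that never appear as keys (a, b, c, d, e) are primary inputs.
CONE = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g1", "d"],
    "g4": ["g2", "g3"],  # both branches of stem g1 reconverge here
    "g5": ["g4", "e"],   # cone output
}
OUTPUT = "g5"


def fanouts(cone):
    """Map every signal to the gates it drives."""
    out = {}
    for gate, ins in cone.items():
        for sig in ins:
            out.setdefault(sig, []).append(gate)
    return out


def classical_dominators(cone, output):
    """Nodes (excluding the node itself) lying on every path to the output."""
    fo = fanouts(cone)
    memo = {}

    def dom(n):
        if n == output:
            return frozenset()
        if n not in memo:
            sets = [frozenset({f}) | dom(f) for f in fo[n]]
            memo[n] = frozenset.intersection(*sets)
        return memo[n]

    return {n: dom(n) for n in fo}


def reachable_from(cone, start):
    """Signals reachable from `start` by following fanout edges (start included)."""
    fo = fanouts(cone)
    seen, stack = {start}, [start]
    while stack:
        for f in fo.get(stack.pop(), []):
            if f not in seen:
                seen.add(f)
                stack.append(f)
    return seen


def restricted_dominators(cone, output):
    """Keep only multi-input dominators with an input not reachable from the node."""
    result = {}
    for n, doms in classical_dominators(cone, output).items():
        reach = reachable_from(cone, n)
        result[n] = {
            d for d in doms
            if len(cone[d]) > 1 and any(i not in reach for i in cone[d])
        }
    return result
```

In this toy cone, `g4` is a classical dominator of `g1`, but both of its inputs are reachable from `g1`, so it exists due to reconvergent fanout only and is dropped; `g5` survives because its input `e` is independent of `g1`.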
An Example
We consider a small circuit (c17) and a sequence of two vectors to illustrate how the algorithm determines an upper bound on fault coverage. There are three distinct steps to upper bounding fault coverage: (1) a preprocessing step consisting of structural analysis, (2) monitoring of signal conditions during simulation and (3) post-processing of results and identification of undetectable faults. The first two stages are outlined in this section. The final step for this example is presented in Section 5 after the algorithm is detailed in Section 4.
Structural Analysis
Structural graph analysis for finding dominators [2, 6] and reconvergence points [7, 8, 9] has been used in test generation systems. We use a similar analysis to monitor signal conditions at the end of each cycle in a vector sequence. These conditions are derived for single output circuits. For multi-output circuits, the conditions are obtained separately from each cone of logic that drives a primary output. These signal conditions are necessary for fault detection. If some of these conditions never occur throughout a vector sequence then certain faults are guaranteed to remain undetected by that sequence.
For each gate, all input value combinations are stored in a table. These input combinations are analyzed once simulation of the entire sequence is complete. In addition to recording gate input combinations, each logic cone is analyzed separately, and the dominators of each non-fanout gate with respect to the cone output are obtained. Such dominators found for non-fanout gates are either trivial or non-trivial. For the circuit of Figure 1, gate G0 has a trivial dominator set {G4}, while G6 has a non-trivial dominator set {G0, G4}. The other dominated gates are G3 and G12, with dominator sets {G5} (trivial) and {G3, G5} (non-trivial), respectively. Since this dominator analysis is performed for each cone, additional dominators are found for gates G1 and G2. For the cone of G4, G2 has a trivial dominator set {G4}, while G1 has a non-trivial dominator set {G2, G4}. In the cone of G5, G2 has a trivial dominator set {G5}.
In a given cone, for gates with trivial dominator sets, input sensitizing conditions are monitored explicitly, unless an input is a checkpoint in that cone. For non-trivial dominators, only output sensitizing conditions are monitored. Output sensitizing conditions for trivial dominators are ignored because they are obtained automatically by monitoring input states for the dominator gate. For the circuit of Figure 1, all inputs of both gates G0 and G3 are checkpoints and are skipped. For G6, the output sensitizing condition is the simultaneous occurrence of {G6=v, G10=1, G2=1} for v ∈ {0, 1}. For G12, the sensitizing condition consists of {G12=v, G1=1, G2=1} for v ∈ {0, 1}. In the cone of G4, the trivial dominator set for G2 yields two conditions for its second input: {G1=v, G9=1, G0=1} for v ∈ {0, 1}. These conditions are considered because G1 is not a checkpoint in this cone. The non-trivial dominator condition for G9 yields {G9=v, G1=1, G0=1}. For the cone of G5, gates G2 and G9 have dominator sets {G5} and {G2, G5}. Both inputs of G2 are checkpoints for the cone and are ignored. The output sensitizing conditions for G9 and its dominator set are {G9=v, G1=1, G3=1}, for v ∈ {0, 1} (Table 1).
A dominator set {G1, G2, G4} exists for gate G11 in the cone of G4. However, no dominator exists for this gate in the cone of G5. This is because G5, not having an input independent of G11, does not qualify as a dominator of G11 according to our definition. If no conditions are obtained for a gate from a cone, then all conditions for that gate obtained from other cones are also dropped from consideration. A fault that appears in multiple cones is considered undetectable if the relevant conditions in all cones remain unsatisfied during simulation. Gates G10 and G1 have conditions from one cone only. However, these are not dropped because they are fanout points in the other cone, and additional conditions due to reconvergence in the other cone are generated, as explained below.
Fanout gates are analyzed specific to each output cone that contains them. These fanout gates in a cone are origins of reconvergent paths. Some fanout gates may have no reconvergent fanout in any cone and are not considered in this step, e.g., G2. Different reconvergent paths may have different (odd and even) inversion parity and fault propagation requires that paths with different parity be not simultaneously sensitized. However, along paths that have the same parity, simultaneous fault effect propagation may occur.
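The parity classification of reconvergent paths can be sketched as follows. The circuit `CIRCUIT`, the set `INVERTING`, and the function names are assumptions made for illustration and do not correspond to the circuit of Figure 1.

```python
# Gate types that invert a propagating fault effect.
INVERTING = {"NAND", "NOR", "NOT"}

# Hypothetical circuit: gate -> (type, inputs). Stem "s" reconverges at gate "r".
CIRCUIT = {
    "n1": ("NAND", ["s", "a"]),
    "n2": ("NAND", ["s", "b"]),
    "n3": ("NOT", ["n2"]),
    "r": ("NAND", ["n1", "n3"]),
}


def path_parities(circuit, stem, target):
    """Inversion parity (0 = even, 1 = odd) of every path from stem to target."""
    parities = []

    def walk(signal, parity):
        if signal == target:
            parities.append(parity)
            return
        for gate, (kind, ins) in circuit.items():
            if signal in ins:
                # Each inverting gate on the path flips the parity.
                walk(gate, parity ^ int(kind in INVERTING))

    walk(stem, 0)
    return parities


def has_mixed_parity(circuit, stem, target):
    """True when reconvergent paths from stem to target differ in parity."""
    return len(set(path_parities(circuit, stem, target))) > 1
```

Here the path through `n1` has even parity and the path through `n2` and `n3` has odd parity, so `s` would be treated as a stem with different-parity reconvergence at `r`.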
For the cone of G4, G10 has reconvergent paths with different parity. We denote the two sensitizing conditions for paths originating at G10 by sp0(G10)[1] (sp1(G10)[1]) and sp0(G10)[2] (sp1(G10)[2]) for logic value 0 (1). These are shown in Table 2. The notation "sp" is used for single paths, while "mp" will denote multiple paths. For sp0(G10)[1], {G9 = 0 G11 = 0} represents the disabling condition for propagation paths with different parity. For the cone of G5, there are reconvergent paths with the same parity from G1. Considering that it is possible to activate any subset of these paths, including all of them simultaneously, the conditions derived for G1 are as shown in Table 3.
Table 2: Different parity sensitizing conditions for G10.
For a given gate G, if none of the conditions dom0(G), sp0(G) or mp0(G) (dom1(G), sp1(G) or mp1(G)) is ever satisfied during simulation, the sa1 (sa0) fault at the output of G will remain undetected. In addition, all faults whose propagation requires gate G to have a value 0 (1) also remain undetected.
Monitoring in Logic Simulator
Figure 1 shows a two-vector simulation sequence of the example circuit, along with the specific vectors where these conditions are first satisfied. Also shown in the figure are the gate input combinations that are observed after simulation of each vector. Once simulation of a subsequence is complete, the set of input value combinations for each gate and the conditions of Section 3.1 that are satisfied are analyzed, and faults that are guaranteed not to be detected are identified. The algorithm for the analysis of these input value combinations and the satisfied dominator conditions is presented in Section 4, following which we revisit this example in Section 5.
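A minimal sketch of the monitoring step is given below, assuming a NAND-only netlist listed in topological order. The netlist `NETLIST`, the condition names in `CONDITIONS`, and the vectors in `VECTORS` are hypothetical and are not taken from Figure 1. Each condition is modeled as a set of signal values that must occur simultaneously in some vector.

```python
# Hypothetical NAND-only netlist in topological order: gate -> inputs.
NETLIST = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
}

# Hypothetical monitored conditions: name -> signal values required simultaneously.
CONDITIONS = {
    "dom1(g1)": {"g1": 1, "g2": 1},
    "dom0(g1)": {"g1": 0, "g2": 1},
}

# Hypothetical two-vector sequence on primary inputs a, b, c.
VECTORS = [
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 0},
]


def simulate_and_monitor(netlist, conditions, vectors):
    """Logic-simulate each vector, recording gate input combinations and
    which monitored conditions were satisfied at least once."""
    seen = {gate: set() for gate in netlist}
    satisfied = {name: False for name in conditions}
    for vec in vectors:
        values = dict(vec)
        for gate, ins in netlist.items():
            combo = tuple(values[i] for i in ins)
            seen[gate].add(combo)
            values[gate] = 0 if all(combo) else 1  # NAND evaluation
        for name, required in conditions.items():
            if all(values[sig] == val for sig, val in required.items()):
                satisfied[name] = True
    return seen, satisfied


seen, satisfied = simulate_and_monitor(NETLIST, CONDITIONS, VECTORS)
```

The per-vector work is limited to table updates and condition checks; no fault analysis is performed until the whole sequence has been simulated.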

Algorithm for Post-processing
For a node n, we assume that signal conditions dom0(n), dom1(n), sp0(n), sp1(n), mp0(n) and mp1(n) are monitored during simulation. Note that these conditions may not exist for all signals, in which case they are trivially assigned a true (logic 1) value. When any condition is not satisfied during simulation it is assigned a false (logic 0) value.
For a given node n, we evaluate two predicates, oneProp(n) and zeroProp(n), that denote possibilities for logic values 1 and 0, respectively, to propagate from n to some primary output. For an output that attains a logic value 1 (0) sometime during simulation, oneProp(n) (zeroProp(n)) is assigned a value 1. Otherwise, this value is set to 0. Using backward traversal from the outputs, these predicates are evaluated at gate inputs. We next describe how this backward traversal is performed from circuit outputs to primary inputs.
The immediate input value combinations of each gate are also stored. We denote by ρ(n = 1) (ρ(n = 0)) the condition that node (signal) n attains the value 1 (0) during simulation. For a gate with input signals $s_1, s_2, \ldots, s_N$, the predicate $\rho(s_1 = v_1, s_2 = v_2, \ldots, s_N = v_N)$ denotes the condition that the signals attain $s_i = v_i$, $1 \le i \le N$, simultaneously. The predicate ρ will be referred to as the reachability predicate. Since this algorithm analyzes each cone of the circuit separately, all fanout stems in a cone are necessarily reconvergent. We show how oneProp($s_i$) and zeroProp($s_i$) are evaluated given the values of these predicates at the output $s_o$ of the gate. Without loss of generality, we study an AND gate. The formulas are similar for other gate types.
Propagation at Non-Fanout Inputs
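As an illustration of one backward step at an AND gate, the following minimal sketch assumes plain single-path sensitization: a value on input $s_i$ can propagate only if some observed input combination holds all other inputs at the non-controlling value 1, and the corresponding output value can itself propagate. This reading, the function name, and the argument names are assumptions; the monitored dom, sp and mp conditions are not modeled here.

```python
def and_gate_input_props(n_inputs, seen_combos, one_prop_out, zero_prop_out):
    """Return a list of (zeroProp, oneProp) pairs, one per AND-gate input.

    seen_combos   -- set of input tuples observed during simulation
    one_prop_out  -- oneProp at the gate output
    zero_prop_out -- zeroProp at the gate output
    """
    props = []
    for i in range(n_inputs):
        # Observed combinations in which every other input is non-controlling (1).
        sensitized = [
            combo for combo in seen_combos
            if all(v == 1 for j, v in enumerate(combo) if j != i)
        ]
        # For an AND gate, input value 1 appears as output 1 and 0 as output 0.
        one = bool(one_prop_out) and any(c[i] == 1 for c in sensitized)
        zero = bool(zero_prop_out) and any(c[i] == 0 for c in sensitized)
        props.append((zero, one))
    return props
```

For example, if only the combinations 00 and 01 were observed at a two-input AND gate, the sketch reports that only a 0 on the first input can propagate, and nothing can propagate from the second input. For inverting gates such as NAND, the roles of the output predicates would be swapped.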
Propagation at Stem Inputs
The treatment of fanout stems depends on the inversion parity of the reconvergent paths. If reconvergent paths have different parity, a simultaneous fault propagation will cancel out the fault effects at the point of reconvergence. For paths that have the same parity, simultaneous fault effect propagation along multiple paths is possible and needs to be modeled.
Same Parity Reconvergence
As in Section 4.1, for each branch $b_i$, predicates localOneProp($b_i$) and localZeroProp($b_i$) are evaluated. At stem s,
where stem s has N branches. First, assume that all fanout branches reconverge at node r with the same parity as the stem s. 
Different Parity Reconvergence
For each branch $b_i$, predicates localOneProp($b_i$) and localZeroProp($b_i$) are evaluated as in Section 4.1. At stem s,
where stem s has N branches.
Deducing Undetectability
A stuck-at-0 fault, f , at signal s is considered undetected if either the fault was not excited or the logic value 1 cannot be propagated from s to a primary output. Therefore,
Similarly, for a stuck-at-1 fault f ,
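The two conditions stated in words above can be written directly in terms of the reachability and propagation predicates. The following is a minimal sketch; the function name and argument names are assumptions. For a stuck-at-0 fault at s it returns NOT ρ(s = 1) OR NOT oneProp(s), and for a stuck-at-1 fault NOT ρ(s = 0) OR NOT zeroProp(s).

```python
def is_undetected(stuck_value, seen_one, seen_zero, one_prop, zero_prop):
    """Decide whether a stuck-at fault at signal s is guaranteed undetected.

    stuck_value -- 0 for stuck-at-0, 1 for stuck-at-1
    seen_one    -- rho(s = 1): s attained 1 during simulation
    seen_zero   -- rho(s = 0): s attained 0 during simulation
    one_prop    -- oneProp(s)
    zero_prop   -- zeroProp(s)
    """
    if stuck_value == 0:
        # sa0 needs s = 1 for excitation and propagation of the value 1.
        return (not seen_one) or (not one_prop)
    # sa1 needs s = 0 for excitation and propagation of the value 0.
    return (not seen_zero) or (not zero_prop)
```

A fault for which this function returns False is not thereby proven detected; it simply cannot be excluded by the upper bound.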
Example: Post-processing
Returning to the example of Figure 1, we consider how the algorithm of Section 4 computes an upper bound on fault coverage. We assume that faults have been collapsed and only representative faults in equivalence classes are being considered. After simulation of the first vector, knowing the possible values at the two outputs, the values of oneProp (zeroProp) at G4 and G5 are evaluated to 1 (0). Tracing back from G4 (G5), both zeroProp and oneProp at gate inputs are evaluated to 0 because the only input state seen at these gates is 00. This backward tracing continues to gates G0, G2 and G3, with identical results. However, at the fanout stem driven by G1, the predicate mp1(G1) evaluates to 1. Using the formulas of Section 4.2.1, oneProp(G1) (zeroProp(G1)) evaluates to 1 (0). Tracing backwards from G1, zeroProp(G11) (oneProp(G11)) evaluates to 1 (0). Using these values of zeroProp and oneProp, and the formulas of Section 4.3, only the faults shown in Figure 1(a) are found to be detected. For the second vector, the values of the propagation predicates are shown in Table 4. Two cones are considered separately. Note that G2 has no fanout in either of the cones.
Table 4: Propagation predicate values for 2nd vector. Columns: Signal; zeroProp and oneProp for the cone of G4; zeroProp and oneProp for the cone of G5.
Considering vector 2 and the cone of G4 (columns 2 and 3), the zeroProp predicate at G4 evaluates to 0 because this output never goes to 0 in the first two vectors. Tracing back from G4, only the zeroProp predicate at G0 evaluates to 1 because the two input states at G4 are 00 and 01. For a similar reason, both predicates for G2 are also 0. Tracing back from G2, all predicates for the logic driving G2 are also 0, including signals G1, G11, G9, and G10-G1.
Tracing back from G0, oneProp at G6 and signal G10-G0 evaluates to 1. In the cone of G5, only the zeroProp predicate at G3 evaluates to 1. Note that oneProp also evaluates to 1 for signal G1-G3 and stem G1. The sa0 fault on stem G1 is correctly identified as detected by these two vectors by this analysis. However, as explained in the previous paragraph, this fault was also correctly identified as detected after the first vector. Predicate oneProp evaluates to 1 for G12 also. The sa0 at G12 is identified as detected by this analysis. Notice that this fault is equivalent to sa1 on G3, and is not shown in Figure 1(b). The three faults detected by the second vector are shown in Figure 1(b).
Results and Application
A set of 100 random vectors was simulated for each of the ISCAS combinational benchmarks. The results from fault simulation were compared to upper bounds obtained from algorithms outlined in this paper. For each circuit, monitored data were collected during simulation of the 100 vectors, and the post-processing algorithm of Section 4 was used only once after the simulation was complete. Results are shown in Table 5 . Exact coverages (%) from fault simulation are shown in column 2. A cumulative detection probability for 100 vectors was obtained for each fault using an algorithm similar to Stafan [5] . A fault is assumed to be detected if this probability is greater than 50%. Further details of this estimation method are beyond the scope of this paper and are not presented here. The estimated values are shown in the third column (marked "vanilla estimate"). The fourth column (marked "upper bound") shows the upper bounds obtained using the method outlined in this paper. Faults incorrectly identified as detected by the Stafan-like [5] analysis (column 3), but guaranteed to remain undetected by algorithms of this paper, were removed leading to the improved estimates of column 5, which are much closer to column 2.
Columns 6-9 show improvements in the errors of classifying individual faults as detected or undetected by the statistical method [5]. When the estimated detection probability of a fault is higher than 0.5, it is classified as detected; otherwise it is classified as undetected. The percentage of faults incorrectly classified as detected (compared to exact fault simulation) is referred to as overshoot error. Similarly, the percentage incorrectly classified as undetected is referred to as undershoot error. These errors for the original Stafan-like method (vanilla of column 3) and for the statistical method with upper bound improvement (column 5) are shown in columns 6 through 9. Upper bounding significantly reduces both types of errors. Overshoot errors decrease because dominator analysis finds faults that are guaranteed not to be detected. The decrease in undershoot errors is due to cases like the G1 sa0 fault in Figure 1(a), where the stem fault is detected while all branch faults remain undetected. Both Stafan [5] and critical path tracing [1] tend to underestimate fault coverage when fault effect propagation occurs simultaneously along multiple fanout branches, without any fault on the fanout branches being detected.
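The overshoot and undershoot definitions can be made concrete with a minimal sketch. The fault names and probabilities in `EST`, `EXACT` and `GUARANTEED_UNDETECTED` are hypothetical, percentages are taken relative to the total number of faults (an assumption), and only the removal of guaranteed-undetected faults is modeled, which affects overshoot error alone.

```python
def classification_errors(est_prob, exact_detected,
                          guaranteed_undetected=frozenset(), threshold=0.5):
    """Return (overshoot %, undershoot %) of a statistical fault classifier.

    est_prob              -- fault -> estimated cumulative detection probability
    exact_detected        -- faults detected according to exact fault simulation
    guaranteed_undetected -- faults the upper bound proves undetected
    """
    predicted = {f for f, p in est_prob.items() if p > threshold}
    predicted -= set(guaranteed_undetected)  # upper-bound correction
    total = len(est_prob)
    overshoot = len(predicted - set(exact_detected))
    undershoot = len(set(exact_detected) - predicted)
    return 100.0 * overshoot / total, 100.0 * undershoot / total


# Hypothetical data for four faults.
EST = {"f1": 0.9, "f2": 0.7, "f3": 0.2, "f4": 0.6}
EXACT = {"f1", "f3"}
GUARANTEED_UNDETECTED = {"f2"}

vanilla = classification_errors(EST, EXACT)
corrected = classification_errors(EST, EXACT, GUARANTEED_UNDETECTED)
```

In this toy data set the correction removes `f2` from the set of faults predicted as detected, halving the overshoot error while leaving the undershoot error unchanged.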
Algorithm Complexity: The computing cost of the upper-bounding algorithm was measured as the overhead over logic simulation. Execution times of plain logic simulation (without monitoring) on an Intel P4 CPU are shown in column 10 of Table 5. With signal conditions monitored, the runtimes are those of column 11, corresponding to the percentage overheads of column 12. These numbers exclude the time of the pre-processing and post-processing steps, which are fixed overheads, independent of the length of the input sequence. For an n-node (gate) circuit, the main contributor to pre-processing is the dominator analysis with n log(n) complexity [2]. Post-processing is linear and its time was negligible.
An Industry Application: The upper bounding technique was used for selection of scanout test points in large industrial designs. Table 6 shows data on three circuits. Column 2 gives the circuit size and column 3 gives the number of functional vector sequences. These sequences were of varying sizes, the larger ones having a million or more vectors. The maximum possible number of scanout points, permitted by critical timing consideration, is shown in column 4. Fault simulation with fault dropping determined the coverage in column 5. The CPU time in column 6 is for a multiprocessor. The goal was to maintain a coverage as close as possible to that of column 5 with fewest scanout points. This would require fault simulation without fault dropping, which was considered too expensive. Columns 7 and 8 show the number of scanout points based on designer's knowledge and heuristics and the corresponding coverages by the fault simulator, which took about the same CPU time as in column 6.
As an alternative, upper bounding fault coverage estimates were obtained for all test point (scanout) candidates, and an optimal set of the same size as in column 7 was chosen such that estimated coverage was maximized. The runtimes of the estimator are shown in column 11. Fault simulation, again requiring about the same CPU time as in column 6, produced the coverages of column 10, which are indeed higher than the designer's choice coverages of column 8. The upper bounding coverage estimates were very close to those in column 10 and the estimator was one to two orders of magnitude faster than the fault simulator, which could only be run with fault dropping.
Conclusion
The upper bounding algorithm finds a strict upper bound for stuck fault coverage and reduces the error of a fault coverage estimator. Although the specific details of the statistical estimator are skipped, the reduction of overshoot errors is independent of the choice of the estimator used. The number of undershoot errors is also reduced by finding reconvergent paths with the same parity. Except for the pre-processing part, all algorithms are linear in circuit size. The runtime overhead added to logic simulation is about 30-40%. Application of this algorithm in a custom design environment has proven successful. Very large vector sets demand linear complexity fault analysis and the accuracy of the undetected fault data is necessary for improving the tests for increased coverage.
