Introduction
Fault simulation [3] consists of simulating a circuit's behavior in the presence of faults. By comparing the faulty response of the circuit with the fault-free response under the same test set T, we can determine the faults detected by T. Fault simulation has many applications, such as test set evaluation, fault-oriented test generation, fault dictionary construction, and analysis of circuit operation in the presence of faults. There are many algorithms for fault simulation [3]: serial, parallel, deductive, concurrent, parallel-pattern single-fault propagation [14], and critical path tracing.

Unlike single stuck-line (SSL) fault simulation, where the number of faults is proportional to the number of nets in the circuit, design error simulation deals with a complex set of error models, with the total number of errors proportional to $\max\{2^p,\ p \times g^2\}$, where g is the number of gates and p is the maximum fanin of the gates in the circuit.
John P. Hayes
Advanced Computer Architecture Laboratory, Dept. of Electrical Engineering & Computer Science
University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48109-2122

In this paper, we develop an efficient error/fault simulator, ESIM, that covers various new design error models as well as the older manufacturing fault models. ESIM is based on novel simulation algorithms that use a combination of parallel-pattern evaluation, multiple error activation, single fault propagation, and critical path tracing.
We discuss the design errors and logical faults that can be handled by ESIM in Section 2. In Section 3, we review fault simulation methods for combinational circuits, with emphasis on parallel-pattern single fault propagation and critical path tracing, and discuss the application of critical path tracing to error simulation. Section 4 discusses the implementation details of ESIM, presents the results of experiments performed using it, and presents preliminary results using a sequential version of ESIM. Finally, we draw some conclusions and suggest directions for further research.

Fault and design error models

[...] not necessarily complete, but they are believed to be common in the lifetime of a digital system. In this paper, we consider several fault and error models, which are described below.

Single Stuck-Line (SSL) Faults: The most widely used logical fault model is the SSL model [3]. Under this model, any single signal line can become permanently fixed (stuck) at a logical 1 or 0 value. The model is simple and technology-independent. It represents some physical faults directly; more importantly, however, tests derived for SSL faults detect many actual design errors/faults. Since the number of SSL faults is proportional to the number of lines in the circuit, it is feasible to consider all possible SSL faults in medium-scale designs.

Input Pattern (IP) Faults: Blanton and Hayes [8] presented a more general logical fault model called the input pattern (IP) fault model. Under this model, an IP fault in a module M changes the response of M to the input pattern V from $F_1$ to $F_2$. This IP fault is represented by $V:(F_1 \Rightarrow F_2)$. A functional fault in a module M changes the function implemented by M. It can be represented by a set of IP faults. The number of IP faults in a circuit C is proportional to $g \times 2^p$, where g is the number of gates in C and p is the maximum fanin of the gates in C.

Gate Substitution Errors (GSEs): According to experiments reported in [1], the most frequent error made in manual design is gate substitution, accounting for around 67% of all errors. Gate substitution refers to mistakenly replacing a gate G with another gate G' that has the same number of inputs. We represent this error by G/G'. For gates with multiple inputs, a multiple-input GSE (MIGSE) [...]

An extra-gate design error (EGE) is defined as inserting a gate G' that has its m inputs taken from the n inputs of a gate G and feeding the output of G' to G. As a consequence, the number of inputs of gate G becomes $n - m + 1$. We represent an EGE by EG(G',G). It is easy to see that EG(AND, AND), EG(AND, NAND), EG(OR, OR), EG(OR, NOR), EG(XOR, XOR), and EG(XOR, XNOR) are undetectable or redundant.

A missing-gate design error (MGE) is defined as removing a gate G' that has m inputs and feeds an n-input gate G, and then changing the inputs of G' into inputs of G; see Figure 1. As a consequence, the number of inputs of G becomes $N = n + m - 1$. We represent the MGE by MG(G',G). [...]

[...] $n - 1$ inputs connected to an arbitrary subset of the original n inputs. We represent an EIE of a gate G by EI(e,G), where e is the extra input. We represent an MIE of a gate G by MI(m,G), where m is the source of the missing input. The number of ICEs in a circuit is very large, approximately $O(k^2)$, where k is the number of distinct signals in the circuit.
A wrong input error (WIE) is defined as a connection of a gate input to a wrong signal source. We represent a WIE on a gate G by WI(u,w,G), where u is the wrong input of the gate and w is the correct input. If a vector v detects WI(u,w,G), then it must set u and w to opposite values and propagate the signal at u to a primary output. The number of WIEs is larger than that of the ICEs; it is approximately $O(k^2)$, where k is the number of distinct signals in the circuit. WIEs appear to be the second most common design error, accounting for around 17% of the errors reported in [1].
An important question concerning MIEs is the source of the missing input. It must not depend on the erroneous gate's output; otherwise, the circuit can become sequential. Errors that make a combinational circuit sequential can be detected by a levelization procedure [3]. Similarly, the source of the wrong input of a WIE must not depend on the gate output.

Design Error Examples: Examples of the design errors discussed in this section are shown in Figure 2. The design errors of types SIGSE, MIGSE, EGE, MGE, and EIE are grouped in GP1. The remaining design errors, of types WIE and MIE, are combined in GP2. The number of errors/faults in GP1 for a typical circuit is proportional to the number of nets in the circuit. On the other hand, the number of errors in GP2 is proportional to the square of the number of lines in the circuit.
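The levelization check mentioned above can be sketched as follows. This is a minimal illustration, assuming a hypothetical netlist format in which `gates` maps each gate output to the list of signals driving it; the function name `levelize` and the example circuits are assumptions made for this sketch and are not taken from ESIM. A gate receives a level only after all of its inputs have one, so any gate left unresolved indicates a feedback loop (or an undriven input).

```python
from collections import deque

def levelize(gates, primary_inputs):
    """Assign a level to every signal of a combinational netlist.

    gates: dict mapping a gate output name to the list of its input names.
    Returns {signal: level}, or None if some gate can never be resolved,
    which happens when the netlist contains a feedback loop.
    """
    level = {pi: 0 for pi in primary_inputs}
    fanout = {}   # signal -> gates it drives
    pending = {}  # gate output -> number of inputs not yet resolved
    for out, ins in gates.items():
        pending[out] = len(ins)
        for s in ins:
            fanout.setdefault(s, []).append(out)

    queue = deque(primary_inputs)
    while queue:
        s = queue.popleft()
        for out in fanout.get(s, []):
            pending[out] -= 1
            level[out] = max(level.get(out, 0), level[s] + 1)
            if pending[out] == 0:      # all inputs resolved: level is final
                queue.append(out)

    if any(n != 0 for n in pending.values()):
        return None                    # feedback loop detected
    return level

# Hypothetical correct circuit: x = G(a, b), y = G(x, c).
good = levelize({"x": ["a", "b"], "y": ["x", "c"]}, ["a", "b", "c"])

# Hypothetical erroneous circuit: an input of x is taken from y, which
# itself depends on x, so the circuit is no longer combinational.
bad = levelize({"x": ["a", "y"], "y": ["x", "c"]}, ["a", "b", "c"])
```

In this sketch, a candidate MIE or WIE source that makes `levelize` return `None` would be excluded from the error list.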
The error and fault simulator ESIM
Many general approaches to fault simulation have been proposed, such as serial, parallel [19], deductive [7], and concurrent [21]. Serial fault simulation is the slowest method of all, but it uses the least memory. It is based on simulating the fault-free circuit and the circuit in the presence of one fault, and then comparing the responses of the faulty and fault-free circuits. If the responses differ, then the fault is detected. The process is repeated for all the faults of interest; hence, the execution time is proportional to the number of faults. Parallel fault simulation simulates a number W of faults simultaneously. Hence, it is faster than serial simulation, but it needs more memory to deal with W faults at a time. Deductive and concurrent fault simulation techniques are based on detecting all possible faults in the circuit by a given test in one forward pass through the circuit. These methods are fast, but they have the disadvantage of unpredictable memory requirements [3]. The widespread use of design-for-testability techniques that transform a sequential circuit into a combinational one for testing purposes has increased the importance of specialized methods for combinational circuits.
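To make the serial method concrete, here is a minimal sketch under stated assumptions: the three-gate netlist, the tuple-based netlist format, and the names `simulate` and `serial_fault_sim` are hypothetical and chosen only for illustration. The circuit is simulated once per fault and test, and each faulty response is compared with the fault-free one.

```python
# Hypothetical netlist in topological order: (output, gate type, inputs).
NETLIST = [
    ("e", "NAND", ("a", "b")),
    ("f", "AND", ("c", "d")),
    ("g", "NAND", ("e", "f")),
]
PRIMARY_INPUTS = ("a", "b", "c", "d")
PRIMARY_OUTPUTS = ("g",)

def eval_gate(kind, values):
    if kind == "AND":
        return int(all(values))
    if kind == "NAND":
        return int(not all(values))
    if kind == "OR":
        return int(any(values))
    if kind == "NOR":
        return int(not any(values))
    raise ValueError(kind)

def simulate(test, fault=None):
    """Simulate one test; `fault` is an optional (line, stuck_value) pair."""
    values = {}

    def assign(line, v):
        # A stuck line keeps its stuck value regardless of the computed one.
        values[line] = fault[1] if fault and fault[0] == line else v

    for line, v in zip(PRIMARY_INPUTS, test):
        assign(line, v)
    for out, kind, ins in NETLIST:
        assign(out, eval_gate(kind, [values[i] for i in ins]))
    return tuple(values[o] for o in PRIMARY_OUTPUTS)

def serial_fault_sim(tests, faults):
    """Return the set of faults detected by at least one test."""
    detected = set()
    for fault in faults:               # one faulty-circuit run per fault
        for t in tests:
            if simulate(t, fault) != simulate(t):
                detected.add(fault)
                break                  # fault detected, drop it
    return detected

ALL_LINES = PRIMARY_INPUTS + tuple(out for out, _, _ in NETLIST)
ALL_SSL_FAULTS = [(line, v) for line in ALL_LINES for v in (0, 1)]
detected = serial_fault_sim([(1, 1, 1, 1)], ALL_SSL_FAULTS)
```

The nested loop makes the cost of the serial approach visible: the run time grows with the number of faults times the number of tests, while only one set of signal values is held in memory at a time.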
Critical path tracing is a fault simulation method for combinational circuits that is based on simulating the fault-free circuit. It computes signal values and then traces paths from the primary outputs towards the primary inputs to determine the detected faults without explicitly computing the faulty [...] Here, a line is critical under a test if the test detects the stuck-at fault that complements the line's fault-free value. For a gate whose inputs have a controlling value, the critical inputs are determined as follows:
• If only one input x has a controlling value and the output is critical, then x is critical.
• If all the inputs are non-controlling and the output is critical, then all inputs are critical.
• Otherwise, none of the inputs is critical.
For the other gates, {NOT, BUF, XOR, XNOR}, the inputs are critical if the output is critical. To illustrate, consider the fanout-free circuit in Figure 3. The signal values in response to the test 1111 are shown in the figure. The critical path tracing method starts by marking the output g as critical, then it examines the gate G3. Since e is the only controlling input and the output is critical, e is critical. The method then examines gate G1 followed by G2. For G1, the inputs are non-controlling and the output is critical; hence, the inputs a and b are critical. For G2, the output is not critical; hence, the inputs c and d are also not critical. After the critical path tracing analysis, the faults that are activated by the signal values and fall on a critical path are detected by the test 1111. Hence, the SSL faults a/0, b/0, e/1, and g/0 are detected by 1111.
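The tracing rules above can be sketched in a few lines. Figure 3 is not reproduced here, so the gate types below (G1 = NAND, G2 = AND, G3 = NAND) are an assumption chosen to be consistent with the signal values and detected faults described in the text; the names `GATES`, `critical_lines`, and `detected_ssl_faults` are likewise hypothetical. The sketch applies only to fanout-free circuits, since it performs no stem analysis.

```python
# Controlling input value of each gate type that has one.
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}

# Assumed fanout-free circuit, listed in topological order:
# output -> (gate type, inputs).
GATES = {
    "e": ("NAND", ("a", "b")),   # G1
    "f": ("AND", ("c", "d")),    # G2
    "g": ("NAND", ("e", "f")),   # G3
}

def evaluate(kind, vals):
    if kind in ("AND", "NAND"):
        out = int(all(vals))
    elif kind in ("OR", "NOR"):
        out = int(any(vals))
    elif kind in ("XOR", "XNOR"):
        out = sum(vals) % 2
    else:                        # NOT, BUF
        out = vals[0]
    return 1 - out if kind in ("NAND", "NOR", "XNOR", "NOT") else out

def simulate(inputs):
    """Fault-free simulation; relies on GATES being in topological order."""
    values = dict(inputs)
    for out, (kind, ins) in GATES.items():
        values[out] = evaluate(kind, [values[i] for i in ins])
    return values

def critical_lines(values, output):
    """Trace critical lines backwards from a primary output."""
    critical = set()
    stack = [output]
    while stack:
        line = stack.pop()
        critical.add(line)
        if line not in GATES:
            continue             # primary input reached
        kind, ins = GATES[line]
        if kind in CONTROLLING:
            ctrl = [i for i in ins if values[i] == CONTROLLING[kind]]
            if len(ctrl) == 1:
                stack.append(ctrl[0])   # the single controlling input
            elif not ctrl:
                stack.extend(ins)       # all inputs are non-controlling
            # two or more controlling inputs: no input is critical
        else:
            stack.extend(ins)           # NOT, BUF, XOR, XNOR
    return critical

def detected_ssl_faults(values, output):
    """A critical line with value v detects the fault line/(1 - v)."""
    return {(l, 1 - values[l]) for l in critical_lines(values, output)}

values = simulate({"a": 1, "b": 1, "c": 1, "d": 1})
faults = detected_ssl_faults(values, "g")
```

Under these assumed gate types, the sketch marks g, e, a, and b as critical for the test 1111, which agrees with the faults a/0, b/0, e/1, and g/0 listed in the text.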
For circuits with fanout, we have to consider fanout stems, where a stem is a line that has multiple fanout branches. The criticality of a stem cannot be determined from that of its fanout branches, because the propagation of fault effects along multiple paths can block the propagation of the effects to the primary outputs. The problem of determining the criticality of the stems is called stem analysis. The simplest solution to stem analysis is to explicitly simulate the stems to determine whether they are critical. An efficient technique for stem analysis has not been found yet. Because of the stem analysis problem, critical path tracing is not used alone in fault simulation. In [16], it determines the criticality of non-stem lines as described above, while the criticality of stems is determined by parallel-pattern single fault propagation (PPSFP). This approach combines two concepts: single-fault propagation and parallel-pattern evaluation.
• Single-fault propagation is a specialized serial fault simulation method for combinational circuits. Each SSL fault is injected into the circuit and the circuit is simulated; the response is then compared to the fault-free response. If they differ, the fault is detected. To speed up this process, the faulty circuit is simulated starting at the fault site and continuing to the primary outputs. The gates at earlier levels than the fault site need not be evaluated because their values are not changed.
• Parallel-pattern evaluation simulates a set of W vectors at a time. After checking for the last fault, another set of W vectors is selected, and the process is repeated until either the vectors are exhausted or all the faults are detected.

Unlike the methods discussed above, which target SSL faults only, ESIM is designed to efficiently simulate several different types of error and fault models, including all those discussed in Section 2. ESIM determines the detection of an error/fault in a target circuit using information about the criticality of the lines as well as the activation conditions for the faults/errors. A fault/error in a gate G is detected by a test t iff t activates the fault/error and the output of G is critical under t. Hence, if the output of a gate G is critical under a test t, then all the errors/faults at G that are activated by t are detected by it. The activation conditions for the faults/errors are summarized in Table 1.

ESIM combines the following four techniques: parallel-pattern evaluation, where packets of 32 tests are simulated concurrently; multiple error and fault activation; single fault propagation at stems; and critical path tracing at the non-stem lines. We build explicit fault and error lists for SSLs, IPs, GSEs, GCEs, and EIEs. However, since the number of MIEs and WIEs is quadratic in the number of nets in the circuit, we use implicit partial error lists for these errors.
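The detection rule above (activated and output critical) combines naturally with 32-bit parallel-pattern evaluation. The following is a minimal sketch, not ESIM's code: the names `pack`, `GATE_FUNCS`, and `gse_detection_mask` are hypothetical, and the activation condition used for a gate substitution error G/G' (the two gate functions produce different outputs for the applied inputs) is an assumption, since Table 1 is not reproduced here. How the criticality word is obtained (critical path tracing plus stem analysis) is outside the sketch.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

def pack(bits):
    """Pack one signal's values for up to 32 tests into a word (test i -> bit i)."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word

# Bitwise gate functions: each call evaluates all 32 tests at once.
GATE_FUNCS = {
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: ~(a & b) & MASK,
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: ~(a | b) & MASK,
    "XOR": lambda a, b: a ^ b,
    "XNOR": lambda a, b: ~(a ^ b) & MASK,
}

def gse_detection_mask(gate, substitute, in_a, in_b, out_critical):
    """Bit i is 1 iff test i detects the substitution gate/substitute.

    out_critical has bit i set when the gate output is critical under
    test i; it must be 0 in bit positions that hold no test.
    """
    # Assumed activation condition: the two gate types disagree.
    activated = GATE_FUNCS[gate](in_a, in_b) ^ GATE_FUNCS[substitute](in_a, in_b)
    return activated & out_critical

# Four tests applied to a two-input gate: (a, b) = 00, 01, 10, 11.
a = pack([0, 0, 1, 1])
b = pack([0, 1, 0, 1])
crit = pack([1, 1, 0, 1])      # hypothetical criticality of the gate output
and_or = gse_detection_mask("AND", "OR", a, b, crit)
and_nand = gse_detection_mask("AND", "NAND", a, b, crit)
```

Here AND/OR is activated by tests 01 and 10, but the output is critical only for the first of them, so a single bit survives in `and_or`. AND/NAND is activated by every test, so its mask equals the criticality word.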
ESIM is written in C++ in approximately 9000 lines of code. Its simulation algorithms for GP1 errors (GSEs, GCEs, and EIEs) and GP2 errors (MIEs and WIEs) are shown in Figures 4 and 5, respectively. The simulation algorithms for SSL and IP faults are similar to that of GP1.
Experimental results
The major application of ESIM is to evaluate the coverage of design errors and logical faults by various test sets, such as those produced by typical automatic test pattern generation tools, including the following: [...] RTESTS and ETESTS were developed by us to produce random and exhaustive tests, respectively. We now describe several experiments that illustrate the capabilities of ESIM. The circuits used in the experiments [...]

Experiment 1 (Exhaustive simulation): The first experiment was conducted to investigate exhaustive simulation using tests generated by ETESTS. This experiment gives us the percentage of redundant design errors and logical faults in the simulated circuits. The results of the experiment are shown in Table 3, from which we see that the redundancy of some types of design errors can be as large as 11.6%, and that of IP faults can be as large as 33.5%. This experiment is performed only for those benchmarks where simulation with exhaustive tests is feasible, that is, circuits with approximately 16 or fewer inputs.

Experiment 2 (Random simulation): The second experiment evaluates the random simulation approach. Random test sets of sizes 1 through 20 were generated by RTESTS for the 74283 carry-lookahead adder circuit, and the coverage of design errors was determined using ESIM. The process was repeated 50 times, and the average coverage obtained is shown in Table 4. The table shows that a small number of vectors provides good (but not full) coverage of design errors. The main problem with random simulation of this type is that it cannot guarantee high coverage with a relatively small number of vectors.

Table 4. The coverage of SSL faults and design errors in the 74283 adder using random test sets.

[...] Tables 4 and 5. As discussed earlier, most of the simulation time is spent in the simulation of GP2, especially as the circuits become larger. The effectiveness of the complete test sets for SSL faults in detecting IP faults is shown in Table 7. The results show that complete test sets for SSL faults do a [...]

Preliminary results using a sequential version of ESIM on a subset of non-scan sequential benchmarks from the ISCAS-89 suite [9] are shown in Table 8. The test sequence S used in the simulation was generated in [5] to detect the design error models in GP1. In addition to computing the coverage of design error models in GP1 and GP2, ESIM returns the coverage of extra latch errors (ELEs) and missing latch errors (MLEs). The coverage of design errors is high for all circuits except s420, whose internal nets have low controllability and observability.
Discussion
ESIM is based on a novel combination of parallel-pattern evaluation, multiple fault/error activation, single fault propagation, and critical path tracing. It can handle several types of design errors and logical faults, and can readily be extended to cover additional error/fault models. The experiments reported here show that ESIM is relatively fast. They also confirm a number of interesting observations made before [2], [6], such as: (i) most design errors and logical faults can be covered by small test sets, (ii) the percentage of redundant design errors and IP faults is large in some circuits, and (iii) complete test sets for SSL faults are reasonably good tests for simulation-based design verification.
Several aspects of ESIM's error modeling and simulation capabilities remain to be investigated, especially in the case of sequential circuits. Its overall performance could be improved by introducing error collapsing. Finding ways to collapse the number of missing or wrong inputs that need to be considered would be especially useful. As Table 6 shows, most of ESIM's simulation time is spent on MIEs and WIEs. The relation between design errors and IP faults also seems worth exploring, since tests for IP faults appear to cover many design error types, including hard-to-model errors, as well as unknown manufacturing fault types [8].

[9] F. Brglez, D. Bryan, and K. Kozminski, "Combinational [...]
