Abstract-FALCON (FAst fauLt COverage estimatioN) is a scalable method for fault grading which uses local fault simulations to estimate the fault coverage of a large system. The generality of this method makes it applicable for any modular design. Our analysis shows that the run time of our algorithm is related to the number of gates and the number of IOs in a module, while fault simulation run time is related to the total number of gates in the system. We have measured fault coverage for OR1200 and IVM processors and compared the results with fault simulation performed by a commercial tool. We have also compared our results with fault sampling. Our results show that for large designs FALCON is an order of magnitude faster compared with fault simulation. It also has a smaller error rate compared with fault sampling when the size of design under test grows.
I. INTRODUCTION
Fault grading has been studied for half a century [28]. Despite this strong background, the complexity of fault grading still makes it time-consuming for today's large designs. Design-for-test methods, such as full scan, which is widely used in chip testing, reduce the complexity of fault grading for structural test; however, these features do not help with the problem of grading application-level and functional tests. There are several papers which compare structural and functional test methods [19] [16] [17] [30], and address issues for structural testing like test quality, test vector re-usability, and test application time.
However, growing design sizes and complexity and decreasing technology features necessitate application-level tests. Small delay defects, capacitive coupling between circuit lines, and power droops are some design related issues that might not be addressed effectively by structural tests, since tests for these need to be applied in normal operating modes. In addition, since functional tests operate the circuit in its normal modes, the frequency goals of the design are completely satisfied and there are no potential overkill issues which can be seen in structural tests [23] .
Therefore, to be able to test designs using functional test vectors, there is still a need for measuring the coverage of application-level tests. For example, in microprocessors, these tests are instruction-level tests which are applied to the entire processor chip while it is operating in its normal mode inside the system. Industry designs have used such techniques as their test techniques [23] [7]. This paper addresses the problem of high run times when fault grading a design executing functional patterns.
Proposed methods attacking the fault grading problem can be categorized into three major groups: fault simulation, fault emulation, and coverage estimation (statistical methods). Approaches to fault simulation include basic gate-level algorithms [1] and several hybrid methods. These methods either combine basic methods (like PROOFS [20]), or use different levels of abstraction to speed up the simulation process [13] [27] [18]. Although these methods are faster than the traditional gate-level methods, they are still not scalable. To overcome this problem, high-level (for example, RTL) fault models [29] have been proposed. These methods scale well with the size of the design, but they lack a precise correlation with the fault models used to evaluate the coverage of manufacturing test sequences. In general, in a large design with a large set of test vectors, system fault simulation seems to be nearly impossible [9].
Fault emulation was proposed in the 90s [31] . The main drawback of fault emulation is that the design under test should be fully synthesizable, which makes emulation not applicable in early design stages. Also, fitting large designs into the emulation hardware might not be feasible.
Another solution for fault grading, first proposed in the 80s, is to estimate the coverage using some data from good simulation and/or the circuit structure [4] [8] [14]. These methods are based on fault sampling [2], test vector sampling [10], and gate-level statistical analysis using data from design simulation (called STAFAN) [12] [15]. The initial work was mainly on the Single Stuck-At (SSA) fault model. Later, fault models were expanded to path delay faults and also sequential circuits [11] [24] [6]. Some high-level testability measurements were also introduced [25] and an extension of STAFAN to RTL components was proposed [26]. Since most of the statistical methods use good simulation data, their run time is comparable to the run time of good simulation, which makes them scalable. Despite their scalability, these methods (like STAFAN) introduce some parameters (for re-convergent fanouts, for example) which should be determined empirically. The results have been shown only for small circuits. There is also a commercial tool [5] based on statistical fault analysis which uses testability measurements and marks the faults with 0 probability as undetectable. It then fault simulates the design with the rest of the faults. The error in this tool is claimed to be not more than 10%, but it still depends on fault simulation of the whole system.

In this paper we present FALCON, a coverage estimation method for SSA faults in modular designs, which overcomes some of the above problems. Our method is similar to [15] in that both are based on functional block observability calculation. However, our approach is different, since their approach is an extension to STAFAN and has the same drawbacks as the STAFAN method. Our method eliminates the need for defining empirical parameters. However, it needs more simulations (local simulations) to determine a more realistic measure for fault propagation. These local fault simulations make our method applicable for industrial designs with an acceptable error range. FALCON is not dependent on a specific design architecture. For example, it has the same range of error for symmetric and non-symmetric designs.
FALCON uses data from stand-alone fault simulation of modules to estimate the detection probability of each fault in the whole system. Therefore, it reduces the complexity of fault grading from a function of G to a function of g, where g is the number of gates inside a module and G is the number of gates in the whole system. This method is applicable to both combinational and sequential designs.
As another feature, users are also able to apply this method on early stage designs when all the modules may not be available at the gate level of abstraction. This can help test engineers start the test process shortly after the design process starts and since it is based on functional test vectors, it can re-use test vectors from the design verification process. Since this methodology uses the manufacturing fault models, the resulting coverage is completely correlated with gate-level fault coverage.
We have applied FALCON on two processors, OR1200 [22] and IVM [21] . OR1200 has around 40,000 gates and 2,000 sequential elements, while IVM has around 5,000,000 gates and more than 100,000 sequential elements. We have injected around 75,000 faults in OR1200 and 320,000 faults in IVM. Our experiments show that FALCON works much faster than fault simulation and it estimates the coverage more accurately than the fault sampling method [2] with a confidence of 0.998 [3] . Our contributions to this work include the following.
• To our knowledge, the idea of divide-and-conquer for coverage estimation of modular designs has not been proposed before. Existing methods (such as STAFAN) use gate-level granularity rather than module granularity.
• We have developed a fully automated environment for our coverage estimation method using a commercial fault simulator, a commercial logic simulator, and Perl scripts to feed these tools the proper testbenches.
• We have measured the results on a small testcase (around 40,000 gates) and a large testcase (around 5,000,000 gates). Almost all of the previous methods have been applied to relatively trivial designs (around a few thousand gates).

FALCON results are compared to both fault simulation and fault sampling methods. FALCON is always faster than fault simulation in our experiments. However, for OR1200, the fault sampling method works faster than FALCON but with higher error. For IVM, which is a much bigger design, FALCON works faster than fault sampling with less error.

The structure for the remainder of this paper is as follows. In Section II, we discuss our approach. Section II-A and Section II-B describe the methodology. Then, in the following sub-sections (II-C, II-D, and II-E) we describe the details for each step of our estimation method. In Section III, we show our experimental results. Finally, in Section IV, a brief run time analysis is discussed.
II. COVERAGE ESTIMATION METHODOLOGY
A. Overview
We have developed a coverage estimation method for large modular designs, where a module can be combinational or sequential. A module boundary can be the boundary of an HDL (Hardware Description Language) module, as in Verilog, in a hierarchical design. However, if the design is flattened by the synthesis tool, partitioning the gate-level design algorithmically is not difficult. Any boundary which includes a reasonable number of gates in a module can be used in this method.
In this paper, the SSA fault model has been used. The main idea of our approach is to accurately estimate the coverage from the fault grading results on a standalone module, when that module is embedded in a larger system. This is accomplished by estimating how each module can propagate an error from one of its inputs to one of its outputs. We also calculate the probability of the presence of fault effects on the outputs of each module-under-test (MUT). The latter factor gives us an idea of how many fault effects will be activated on the boundaries of a MUT, while the former factor helps us find how many of these activated faults can be propagated through other modules in the whole system. Combining these two, we can estimate how many fault effects can reach the system's primary outputs.
The following are a few terms used frequently in this paper.
• MUT (module-under-test): A module in the system which is the target for fault grading. We perform fault grading module by module.
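• Detection probability: The estimated probability that a fault effect is present on a given line (a MUT output, a module interconnection, or a system primary output) when the test vector set is applied.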
INTERNATIONAL TEST CONFERENCE
• Stand-alone fault simulation: The process of fault simulating a module, separated from the system, with its corresponding local test vector set.
• Local fault dictionary: The resulting fault dictionary when performing stand-alone fault simulation on a module.
B. Algorithm Steps
This section describes our methodology, step by step, using an example. The details of each step will be discussed in the following sections. We start with a modular design and an input test sequence whose fault coverage needs to be determined.
• Step 1: Given a test vector set, we simulate the entire system to generate local test vectors for each module. This step is done by a commercial logic simulator.
Fig. 1. System block-diagram
A system with four modules and a set of test vectors is shown in Figure 1, while Figure 2 shows the system after applying this step.
• Step 2: Now that we have local test vectors, we perform stand-alone fault simulation for each MUT (M1 in our example). Note that faults are not dropped during this process, because we want to measure the probability of each fault detection. Therefore, the more times a fault is detected on an MUT output, the higher the probability it can be detected on a system primary output. In this step, the results are stored in local fault dictionaries. This step is done by a commercial fault simulator (shown in Figure 3).
• Step 3: Using the local fault dictionaries from step 2, detection probability tables are generated for each MUT and propagation tables are generated for all modules in the system (Figure 4). Note that for modules which are not in the MUT set, we still need to generate propagation tables. We will discuss this in more detail in Section II-D. This step is done by a Perl script.
• Step 4: We generate a statistical model using module interconnections in our design, propagation tables for each module in the design, and detection probability tables for each MUT (Figure 5). By simulating this statistical model with a commercial simulator, we are able to estimate the fault coverage of each MUT in the entire system. We will describe the probability calculation formula in Section II-E.
C. Detection Probability Tables
As discussed above, this table indicates the detection probability for each fault on each output of a MUT. It is implemented as a 3-dimensional array; the first dimension represents the fault number, the second dimension represents the MUT output number, and the third dimension is either 0 or 1. Zero represents a 0/1 value and one represents a 1/0 value (a line with a $v/\bar{v}$ value shows that a fault effect has reached that line and inverted the value of that line from $v$ to $\bar{v}$).
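As an illustration of this data structure, the following minimal Python sketch builds such a table from detection records. It assumes that each entry is the number of detections divided by the number of test vectors, which is consistent with the detection threshold used in Section II-F. The record format `(fault, output, value)` and the name `build_detection_table` are hypothetical and do not correspond to the output format of any particular fault simulator.

```python
from collections import defaultdict


def build_detection_table(detections, num_vectors):
    """Build a sparse detection probability table.

    detections: iterable of (fault_id, output_index, value) records, one per
    observed detection in a local fault dictionary (hypothetical format).
    value is 0 for a 0/1 fault effect and 1 for a 1/0 fault effect.
    """
    counts = defaultdict(int)
    for fault_id, output_index, value in detections:
        counts[(fault_id, output_index, value)] += 1
    # Assumption: probability = detection count / number of test vectors.
    return {key: n / num_vectors for key, n in counts.items()}


# Fault 10 seen on output 5: 4 times as 0/1 and 11 times as 1/0, 100 vectors.
records = [(10, 5, 0)] * 4 + [(10, 5, 1)] * 11
table = build_detection_table(records, num_vectors=100)
```

A dictionary keyed by the three indices is used instead of a dense 3-dimensional array because most faults are never observed on most outputs; missing keys stand for a probability of zero.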
The value of each element in this array is the detection probability of the corresponding fault effect, calculated as

$$DP[f][o][v] = \frac{\text{number of times fault } f \text{ is detected on output } o \text{ with value } v}{\text{number of test vectors}} \qquad (1)$$

As an example, suppose fault #10 is detected on the 5th output of a MUT in the stand-alone fault simulation process, 4 times with value 0/1 and 11 times with value 1/0. If our test vector set contains 100 test vectors, then the detection probability values are $DP[10][5][0] = 4/100 = 0.04$ and $DP[10][5][1] = 11/100 = 0.11$.

D. Propagation Tables

This table calculates the ability of a module to propagate a fault effect from each of its inputs to each of its outputs. As discussed in Section II-A, this table is generated using the local fault dictionaries of each module. However, if we do not have the description of this module at the gate level, we can generate this table by simulating this module stand-alone (with its local test vectors) and injecting the module's input stuck-at-0 (stuck-at-1) faults by putting a constant 0 (1) instead of the value of that input. The number of simulations will be $2 \times i$, where $i$ is the number of module inputs. Since this is done on a high-level module, the simulation cost is not that high. In another case, if we have the gate-level details for a module but we do not want to perform fault grading for this module, we can inject only the faults for this module's primary inputs and perform stand-alone fault simulation.
Similar to detection probability tables, propagation tables are also implemented as a 3-dimensional array. The first dimension is the input number, the second dimension is the output number, and the third takes a value between 0 and 3. Value 0 for this dimension corresponds to the propagation of a 0/1 value on an input to a 0/1 value on an output. Table I shows the interpretation of the other values for this dimension.
The value of each element of this array is the propagation factor from an input to an output. If value $v/\bar{v}$ on input $i$ can be propagated to output $o$, it means that fault $i$-sa-$\bar{v}$ is detected on output $o$. Therefore, we calculate propagation factors from fault simulation as

$$PF[i][o][k] = \frac{\text{number of times fault } i\text{-sa-}\bar{v} \text{ is detected on output } o \text{ with the fault-effect value selected by } k}{\text{number of times fault } i\text{-sa-}\bar{v} \text{ is activated}}$$

where $k$ is the third index of the array. Note that we do not divide the numerator by the number of test vectors. This is because whenever we use this factor in our calculations, the fault effect has already been propagated to the input of this module. Therefore, we only need to use a definition similar to conditional probability (i.e., the probability of a fault effect propagation given that the fault is activated).

Also, we do not call this factor a propagation probability. This is because the number of detections can be larger than the number of fault activations, in which case this factor becomes greater than 1. This can happen in modules with sequential feedback paths.
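The ratio described above can be sketched in a few lines of Python. The function name and the handling of a never-activated input fault are assumptions made for illustration; the counts would come from the local fault dictionary of the module.

```python
def propagation_factor(num_detections, num_activations):
    """Detections on an output divided by activations of the input fault.

    The result is deliberately not clamped to 1: in a module with sequential
    feedback, one activation can lead to several detections.
    """
    if num_activations == 0:
        # Assumption: an input fault that is never activated propagates nothing.
        return 0.0
    return num_detections / num_activations


combinational_like = propagation_factor(30, 40)   # fewer detections than activations
with_feedback = propagation_factor(12, 10)        # more detections than activations
```

The second call illustrates why the quantity is called a factor rather than a probability: its value is greater than 1.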
E. Detection Probability Function
Suppose we have generated a propagation table for each module in the system and we have generated the detection probability table for our MUT. Now, using these tables, we want to calculate the detection probability of each fault on system primary outputs. For this purpose, we need to define a function that accepts the detection probability values at the inputs of a module and calculates the detection probability values at the output of that module, using the propagation factors of that module. Suppose a fault effect is propagated through more than one input of a module. This case is common, since designs always contain fanouts. In our example in Figure 6, suppose a fault effect has reached inputs $i_1$ and $i_2$ with 0/1 probability values equal to $\alpha_1$ and $\alpha_2$, and 1/0 probability values equal to $\beta_1$ and $\beta_2$, respectively. In this case, it is easier to calculate the probability of absorption of a fault effect from ALL inputs through an output and then negate this absorption probability to reach the propagation probability from either of the inputs to that output. This idea is a realization of the following probability formula (suppose $A$ and $B$ are independent events),

$$P(A \cup B) = 1 - P(\bar{A} \cap \bar{B}) = 1 - (1 - P(A))(1 - P(B)) \qquad (2)$$
Note that we are adding some error by assuming that the two events are independent, because in reality, two inputs of a module can affect each other during fault effect propagation (i.e., the fault effect can be masked). Since we are only dealing with system re-convergent fanouts and intra-module re-convergent fanouts are taken care of by stand-alone fault simulations, we expect only a small amount of error due to this assumption in our estimation method. This is validated by our experimental results discussed in Section III. Another source of error in FALCON can happen when the design has inter-module feedback paths. The experimental results show a small amount of error in this case as well as masking errors.
Using Formula 2, the detection probability of the fault effect on $i_1$ and $i_2$ reaching $o_1$ with value 0/1 can be calculated as,
The other values for detection probability of the outputs can be calculated in a similar way. A general formula for $o_1$ with value 0/1, when a fault effect reaches $N$ inputs is,
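One plausible reading of this function is sketched below in Python. It is an illustration under stated assumptions, not the exact formula of the paper: each input contributes through both its 0/1 and its 1/0 probability, each contribution is the arriving probability multiplied by the matching propagation factor, contributions are treated as independent, and each product is clamped to [0, 1] because propagation factors can exceed 1. The string keys `"0/1"` and `"1/0"` replace the numeric third index of the propagation table, whose encoding is given in Table I.

```python
def clamp01(x):
    """Keep a value inside the valid probability range."""
    return max(0.0, min(1.0, x))


def output_detection_probability(inputs, factors, out_value):
    """Detection probability of out_value ('0/1' or '1/0') on one output.

    inputs:  list of (alpha, beta) pairs, the 0/1 and 1/0 detection
             probabilities arriving at each module input.
    factors: list of dicts, one per input, mapping (in_value, out_value)
             to the propagation factor toward this output (hypothetical layout).
    """
    absorbed = 1.0  # probability that no input delivers the fault effect
    for (alpha, beta), pf in zip(inputs, factors):
        for prob, in_value in ((alpha, "0/1"), (beta, "1/0")):
            p = clamp01(prob * pf.get((in_value, out_value), 0.0))
            absorbed *= 1.0 - p
    return 1.0 - absorbed


# Two inputs i1 and i2 feeding output o1.
arriving = [(0.2, 0.1), (0.3, 0.0)]
tables = [{("0/1", "0/1"): 0.5}, {("0/1", "0/1"): 1.0}]
p_o1 = output_detection_probability(arriving, tables, "0/1")
```

Here input $i_1$ contributes $0.2 \times 0.5 = 0.1$ and input $i_2$ contributes $0.3 \times 1.0 = 0.3$, so the absorption probability is $0.9 \times 0.7 = 0.63$ and the result is $0.37$.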
F. Fault Detection Metric
Now that we can calculate the detection probabilities of a fault on each line in the system, we need a way to decide which values (or ranges of values) should be counted as detected and which should be counted as not detected. In other words, we need a metric for our fault coverage.
Using our statistical system and the statistical simulation environment (Section II-G), we calculate the detection probability for MUT faults from the outputs of each MUT through the primary outputs of the system. Since detection probability is defined as in Equation 1, we define our detection threshold as

$$DP_{th} = \frac{1}{T}$$

where $T$ is the number of test vectors.
This threshold means that the fault is detected one time when applying our test vector set to our design. Therefore, if a detection probability value at a system primary output is greater than or equal to this value, it should be counted as a detected fault.
Using our detection probability function and our defined detection threshold, we can estimate the fault coverage of the system for each MUT. Due to our detection probability definition in Equation 1, the output of our statistical system shows the detection of the faults in the system as if they are not dropped.
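The thresholding step can be sketched as follows. The data layout (a dictionary from fault identifier to the list of detection probabilities seen on the primary outputs) and the function name are assumptions made for this example.

```python
def estimate_coverage(po_probabilities, num_vectors):
    """Classify faults as detected and return (detected set, coverage).

    po_probabilities: dict fault_id -> list of detection probabilities, one
    per (primary output, fault-effect value) pair (hypothetical layout).
    A fault counts as detected when any entry reaches 1 / num_vectors.
    """
    if not po_probabilities:
        return set(), 0.0
    threshold = 1.0 / num_vectors
    detected = {
        fault
        for fault, probs in po_probabilities.items()
        if any(p >= threshold for p in probs)
    }
    return detected, len(detected) / len(po_probabilities)


probabilities = {1: [0.04, 0.0], 2: [0.005, 0.002], 3: [0.01]}
detected, coverage = estimate_coverage(probabilities, num_vectors=100)
```

With 100 test vectors the threshold is 0.01, so faults 1 and 3 are counted as detected and fault 2 is not. In practice a small tolerance on the comparison may be useful, since the estimated probabilities are floating-point products.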
G. Statistical System and Simulation
After we build propagation tables and detection probability tables, it is time to calculate the detection probability for each line in our design using the detection probability function discussed in Section II-E. Note that if the top module of the system (the module we are building our statistical system from) has some glue logic, we wrap it inside a dummy module and generate propagation tables for this dummy module as well. We have done this in one of our testcases. We generate our statistical system (in Verilog) following the steps below.
• Replace every module in the system with its propagation table.
• Add a detection probability table to the MUT.
• Connect these high-level models as they were connected in the original design.
• Change the signal type to a type which accepts the detection probability for both 0/1 and 1/0 values (e.g., a two element array of type real).

Given the above statistical system (along with a library containing detection probability functions), and our commercial simulator, the detection probabilities of interconnections and system primary outputs can be calculated. For coverage calculation, detection probabilities on primary outputs are compared with our defined detection threshold.
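The statistical system in this work is written in Verilog and evaluated by a commercial simulator. Purely as an illustration of the data flow, the Python sketch below propagates the detection probabilities of one fault through a small chain of modules. It assumes that the modules can be visited in topological order (no inter-module feedback), that each signal carries a `(p01, p10)` pair, and that inputs are combined under the independence assumption of Section II-E; the dictionary layout and all names are hypothetical.

```python
def combine(inputs, factors, out_value):
    """1 minus the probability that every input absorbs the fault effect."""
    absorbed = 1.0
    for (alpha, beta), pf in zip(inputs, factors):
        for prob, in_value in ((alpha, "0/1"), (beta, "1/0")):
            p = max(0.0, min(1.0, prob * pf.get((in_value, out_value), 0.0)))
            absorbed *= 1.0 - p
    return 1.0 - absorbed


def simulate(modules, signals):
    """Propagate (p01, p10) pairs through modules listed in topological order.

    modules: list of dicts with 'inputs' (signal names) and 'outputs'
             (signal name -> list of per-input propagation-factor dicts).
    signals: dict signal name -> (p01, p10), pre-loaded with the MUT output
             detection probabilities of one fault.
    """
    signals = dict(signals)  # do not modify the caller's dictionary
    for module in modules:
        # Signals never reached by the fault effect carry probability zero.
        arriving = [signals.get(name, (0.0, 0.0)) for name in module["inputs"]]
        for out_name, factors in module["outputs"].items():
            signals[out_name] = (
                combine(arriving, factors, "0/1"),
                combine(arriving, factors, "1/0"),
            )
    return signals


# MUT output n1 feeds a buffer-like module, then an inverter-like module.
buffer_like = {"inputs": ["n1"],
               "outputs": {"n2": [{("0/1", "0/1"): 0.5, ("1/0", "1/0"): 0.5}]}}
inverter_like = {"inputs": ["n2"],
                 "outputs": {"po": [{("0/1", "1/0"): 1.0, ("1/0", "0/1"): 1.0}]}}
result = simulate([buffer_like, inverter_like], {"n1": (0.04, 0.11)})
```

In this example the pair on `n2` is half of the pair on `n1`, and the inverter-like module swaps the 0/1 and 1/0 entries on the primary output `po`.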
III. EXPERIMENTAL RESULTS
We have developed scripts for generating local test vectors and testbenches for stand-alone fault simulation to be able to apply our method on designs. Figure 7 shows the flow of our estimation methodology. As can be seen in this figure, the local test vector sets (TV 1 , ...) are obtained using a commercial simulator. Then using a commercial fault simulator, local fault dictionaries are obtained based on the local test vectors (Local Dict. 1, ...). These local fault dictionaries are converted to detection probability and propagation probability tables. These tables, along with the interconnections of the system are used in a simulation environment to estimate the fault detection probabilities in the whole system.
We have applied FALCON on two CPU designs, OR1200 which is a RISC processor and IVM which is an implementation of the DEC Alpha processor. These CPUs are Verilog designs which were synthesized with the TSMC 180nm technology library. Table II shows some characteristics for each test case. We ran our experiments on an Intel Xeon X5670 2.93GHz processor, with 72GB of memory and 12 cores (with hyper-threading). In both cases, fault grading is started after the design has been reset (using the reset signal of the design).
As discussed in previous sections, FALCON estimates the presence of each fault on each output of a design. This can be considered as a statistical fault dictionary. To show the accuracy of our estimation method, we have performed sequential fault simulation on a sub-set of faults for OR1200 without fault dropping and averaged the appearance of each fault on each primary output. On the other hand, we have applied our method on the same sub-set of faults and measured the detection probability of each fault on each primary output. An example is shown in Figure 8 (fault simulation) and Figure 9 (fault estimation) for the faults in the ALU module in OR1200. As can be seen, these two measurements are very close to each other, which means that FALCON is able to prepare statistical data about fault detection rather than outputting only a coverage number. This data can be used for purposes like fault diagnosis. Other estimation methods, like fault sampling, do not output any data other than the fault coverage. However, in this paper, we have compared our results with the results of fault simulation and fault sampling with fault dropping.
We performed traditional fault simulation and fault sampling (using the same commercial fault simulator that we use in our method for standalone fault simulations) on the whole system, measured their run time and coverage, and we have compared the run time and fault coverage of our estimation method with these results. As discussed above, fault simulation and fault sampling processes are done with fault dropping.
In the fault sampling method, a sample of faults from the fault list is selected and fault simulated. Based on the fault coverage from these sampled faults, the fault coverage for the whole system is calculated using a formula. This method gives a coverage range together with a confidence level. For example, suppose the method gives us a range equal to [28.4, 30.9] with a confidence of 0.998. This means that with a probability of 0.998, the real fault coverage (for the whole system) is between 28.4% and 30.9%. Our experimental results show that the fault sampling coverage range does not match the real coverage in several cases. In cases where the calculated coverage range contains the real fault coverage, we have put 0% error in our tables and diagrams. For the cases where the real coverage is not in the calculated range, we have calculated the error as the difference between the real fault coverage and the coverage in the middle of the range.
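The error rule above, together with one way of obtaining a coverage range, can be sketched in Python. The range computation uses a plain normal approximation to the binomial distribution; this is an assumption made for illustration and is not necessarily the formula of [2] or [3]. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sampling_range(sample_coverage, sample_size, confidence=0.998):
    """Approximate range for the true coverage (as a fraction in [0, 1]).

    Assumption: normal approximation to the binomial, ignoring any
    finite-population correction.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half = z * sqrt(sample_coverage * (1.0 - sample_coverage) / sample_size)
    return max(0.0, sample_coverage - half), min(1.0, sample_coverage + half)


def sampling_error(real_coverage, low, high):
    """0 when the real coverage lies in [low, high]; otherwise the distance
    between the real coverage and the middle of the range."""
    if low <= real_coverage <= high:
        return 0.0
    return abs(real_coverage - (low + high) / 2.0)


low, high = sampling_range(0.30, sample_size=1000)
inside = sampling_error(29.0, 28.4, 30.9)    # real coverage within the range
outside = sampling_error(32.0, 28.4, 30.9)   # real coverage outside the range
```

For the range [28.4, 30.9], a real coverage of 29.0% gives zero error, while a real coverage of 32.0% gives an error measured from the midpoint 29.65%.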
In the following sections, we will discuss our experiments on two case studies using some tables and diagrams.
A. OR1200 Case Study
For the OR1200 case study, we applied random test vector sets of different sizes to the CPU. Fault coverages are shown in Table III. The first column shows the number of test vectors (which are random), while columns 2, 3, and 4 show the coverage results for fault simulation, fault sampling, and FALCON, respectively. As discussed above, in column 3, a range of fault coverage is shown. Column 5 indicates the error between our estimation method and the fault simulation method. We have measured our error as the number of mis-calculated faults over the total number of faults. That is why the difference between fault coverages shows a smaller number than the error shown in the fifth column of Table III. The sixth column of this table shows the error between the fault sampling method and the traditional fault simulation method (columns 2 and 3). As discussed above, the error is defined as 0% if the real coverage is in the range of the calculated coverage. The next two columns in this table show the number of mis-detected faults (rather than the percentage) in FALCON and fault sampling methods, respectively. The last column shows the difference in the number of mis-detected faults between the fault sampling method and our estimation method. As can be seen, this number is relatively high in the first three cases.

Table IV shows the run time results for fault simulation, fault sampling, and our coverage estimation method for the runs whose coverages are shown in Table III. The first column of this table shows the number of random test vectors. The second, third, and fourth columns show the run times of fault grading for fault simulation, fault sampling, and our coverage estimation method, respectively. In column 5, the speed-up in run time between our coverage estimation and fault simulation is shown. This speed-up factor is calculated by dividing the time spent in the fault simulation method by the time spent in all the steps of FALCON (i.e., column 2 divided by column 4).
All run times are shown in seconds. We measured the run time speedup between FALCON and fault simulation. As can be seen in this table, fault sampling works faster than our method, but the error of sampling method is more than our estimation in most cases as shown in Table III . Also, when the design size grows, the fault sampling method calculates less accurate results compared to our estimation method. However, the sampling method run time is still comparable with our estimation method. This can be seen in the IVM test case in the next section (Tables V and VI) .
We have summarized our results in Figure 10 . This figure shows the run time for fault simulation, fault sampling and coverage estimation on a logarithmic scale (shown with bars). Also, sampling error and coverage estimation error are shown in this figure with lines. These errors are shown by the number of miscalculated faults. As it can be seen in this figure, the fault sampling method has the fastest run time when the number of test vectors is increased. It can be seen that the run time for our estimation method also grows more slowly than the traditional fault simulation. For the IVM test case, FALCON works faster than the fault sampling method. This is while we only use a small subset of faults to simulate. We believe that FALCON will run even faster than fault sampling with smaller error rates if we inject more faults in our design.
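The error metric used for FALCON in this case study (mis-calculated faults over the total number of faults) can differ from the plain difference in coverage, because wrongly detected and wrongly undetected faults partly cancel in the coverage number. The following Python sketch illustrates this with made-up fault identifiers; the function name is hypothetical.

```python
def miscalculated_error(real_detected, estimated_detected, total_faults):
    """Return (number of mis-calculated faults, that number / total faults).

    A fault is mis-calculated when its detected/undetected status differs
    between fault simulation and the estimate.
    """
    wrong = real_detected ^ estimated_detected  # symmetric difference
    return len(wrong), len(wrong) / total_faults


total = 10
real = {1, 2, 3, 4}        # detected by fault simulation (illustrative data)
estimated = {3, 4, 5}      # reported as detected by the estimate
wrong, error = miscalculated_error(real, estimated, total)
coverage_gap = abs(len(real) - len(estimated)) / total
```

Here three faults (1, 2, and 5) are mis-calculated, giving an error of 30%, while the two coverage numbers (40% and 30%) differ by only 10%.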
B. IVM Case Study
In the IVM test case, we applied random test vectors which are valid instructions. As can be seen in Table V , we have run fault simulation, fault sampling, and coverage estimation methods on this test case for 50, 200, 500, 1000, 2000, and 5000 clock cycles. Since IVM is a superscalar processor, the number of random instructions in the memory model is more than the number of clock cycles for which the design is fault simulated.
In this case we have also chosen a subset of faults for this processor and we have not simulated all the faults. Coverage results can be found in Table V and run time results can be found in Table VI.

As can be seen in Table VI, in this test case, fault sampling takes longer than our coverage estimation method and, as shown in Table V, the error between fault sampling and fault simulation is higher than the error between our coverage estimation method and fault simulation.
Similar to the OR1200 case, we show run times, fault coverages, and coverage errors for fault simulation, fault sampling, and coverage estimation in the IVM processor. Figure 11 shows the run times for the three methods and errors in coverage for fault sampling and coverage estimation. The run times are shown in logarithmic scale and the error is indicated by the number of faults.
As can be seen in Figure 11 for both the OR1200 and IVM cases, the run time in our method, due to its scalability, grows at a slower rate than fault simulation. Also, it can be seen that our method runs faster than fault sampling with the growth of the design size, with smaller error rates.
As an advantage of FALCON, we can determine which faults are detected on which outputs. This is useful when the user needs more data than a simple coverage number (e.g., in the case of fault diagnosis).
As we can see in the above two test cases, the run time of our estimation method grows faster than that of fault sampling; however, it is still faster than fault sampling for large designs. The main reason for this growth rate in coverage estimation is that we do not drop the faults during our process, which can have its own applications. In cases where we do not need the results without fault dropping, we can divide our test vector set into sub-sets of test vectors and apply our method step-by-step for each sub-set of test vectors. In each step, we can drop the detected faults. This way, we reduce the time of our stand-alone fault simulations. We can also change our boundaries to have smaller partitions. This way the local fault simulations will take less time.
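The sub-set strategy with fault dropping can be sketched as a simple loop. In this Python illustration, `estimate_detected` is a hypothetical callable standing in for one complete pass of the estimation method on a sub-set of test vectors; the toy detector at the end exists only to make the example executable.

```python
def estimate_with_dropping(faults, vectors, subset_size, estimate_detected):
    """Apply the estimator to consecutive vector sub-sets, dropping faults
    as soon as they are reported detected.

    estimate_detected(remaining_faults, vector_subset) -> set of faults
    judged detected by that sub-set (hypothetical interface).
    """
    remaining = set(faults)
    detected = set()
    for start in range(0, len(vectors), subset_size):
        if not remaining:
            break  # every fault is already dropped
        subset = vectors[start:start + subset_size]
        newly = estimate_detected(remaining, subset) & remaining
        detected |= newly
        remaining -= newly  # later passes handle fewer faults
    return detected


def toy_detector(faults, subset):
    """Toy rule: fault f is detected by any vector divisible by f."""
    return {f for f in faults if any(v % f == 0 for v in subset)}


found = estimate_with_dropping({2, 3, 7}, [1, 2, 3, 4, 5, 6], 2, toy_detector)
```

Each pass receives only the faults that are still undetected, which is where the reduction in stand-alone fault simulation time would come from.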
From our experimental results, we can say that fault sampling is a great method for estimating fault coverage for small to medium designs. It is still a good way to roughly estimate fault coverage for larger designs. However, this method does not provide data other than fault coverage. On the other hand, FALCON works a lot faster than fault simulation. Although it works slower than fault sampling for small to medium designs, it becomes faster than fault sampling for larger designs. In addition, FALCON provides more information about fault detection which can be useful during the test process.
IV. RUN-TIME ANALYSIS
In this section, we develop a simple run time complexity analysis for our estimation method and compare it with a run time analysis of fault simulation.
We can use the following symbols in our analysis.
• $M$: number of modules in the system
• $T$: number of test vectors
• $f_m$: number of faults in an MUT $m$
• $G$: number of gates in the system
• $g_m$: number of gates in MUT $m$
• $i_{max} \times o_{max}$: maximum module input/output product

Using the above definitions, the complexity of each step of our algorithm can be expressed as follows, assuming we are estimating the fault coverage of module $m$ in our system.

$f'_m$ is the number of detected faults in stand-alone fault simulation. In the worst case, $f'_m = f_m$. All of the above should be done for coverage estimation of module $m$. Therefore, the runtime of coverage estimation can be written as
If we want to inject all of the faults in our MUT, the number of faults is linearly related to the number of gates. Therefore, we can replace $f$ by $g$ in formula 3. We can also remove the $I \times T + M$ part since it is negligible compared to the other parts. The statistical simulation part ($g_m \times M \times (i_m \times o_m)$) is not negligible if all faults propagate through all inputs of every module. Since each fault usually affects a limited part of the design, it will propagate through only a few of the module paths. Therefore, using $i_{max} \times o_{max}$ in our formula is unrealistic, since this term can easily be replaced by a small constant. On the other hand, $M$ is also a relatively small number and the whole product $M \times i_{max} \times o_{max}$ can be replaced by a constant. As a result, the estimation of run time can be written as:
which can be written as:
Equation 5 shows that the run time of FALCON mostly depends on the time for good simulation and the time for local fault simulation for module m.
On the other hand, the complexity of fault simulation for a module with g m gates can be written as: As we can see, our estimation method can work around 100 times faster than fault simulation. For example, if coverage estimation takes a few minutes, we can expect hours for fault simulation or if FALCON takes an hour, we can expect days for fault simulation.
According to the above analysis, if $g_m$ is close to $G$ (which means that module $m$ is a large module in the design), our estimation method will be as time-consuming as fault simulation. If we have such modules in the design, we need to break them down into smaller modules and apply the algorithm on these smaller modules. Fortunately, with today's hierarchical designs, every module has its own sub-modules. Therefore, we can use the sub-modules of large modules under test as our new modules under test and apply our technique hierarchically to the design.
V. CONCLUSIONS
We have developed a hierarchical and modular technique for estimating fault coverage (FALCON). Currently, this method is evaluated for single stuck-at faults. Our experimental results show that for large designs, we can reach orders of magnitude improvements in time with a very small amount of error. Our estimation method works best when each module in the design is a few times smaller than the whole design. For large modules, we can simply break them into smaller modules and apply our method hierarchically. FALCON works on both combinational and sequential modules. This method can be used even before completing the design, when we do not have every module at the gate level of abstraction. Future work will include analysis for error bounds and other fault models.
