test pattern set or (in extreme cases) the design could be modified to achieve adequate quality.
Although the IC industry widely accepts stuck-at fault detection as a key test-quality figure of merit, it is nevertheless necessary to detect other defect types seen in real manufacturing environments. 1, 2 For example, designers and test engineers often use stuck-at fault coverage in the Williams-Brown model 3 to estimate the defect level of a part:

$$DL = 1 - \mathrm{Yield}^{(1 - \mathrm{fault\_coverage})} \qquad (1)$$

However, the coefficient of correlation between stuck-at fault coverage and defect coverage declines rapidly after about 80% fault coverage. 4 This is bad news for those who wish to use stuck-at fault coverage to predict defective part levels. In precisely the interval (near 100%) where it is most important for fault coverage to accurately predict defect coverage, the predictor is at its worst. Therefore, an alternative and more accurate method for defective-part-level prediction is needed.
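As a quick illustration of Equation 1, the following sketch evaluates the Williams-Brown estimate at several fault coverages. The function name and the 90% yield are assumptions chosen only for illustration; they do not come from the devices discussed in this article.

```python
def williams_brown_defect_level(yield_fraction: float, fault_coverage: float) -> float:
    """Williams-Brown estimate: DL = 1 - Y ** (1 - T), with Y and T as fractions."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    if not 0.0 <= fault_coverage <= 1.0:
        raise ValueError("fault_coverage must be in [0, 1]")
    return 1.0 - yield_fraction ** (1.0 - fault_coverage)


if __name__ == "__main__":
    assumed_yield = 0.90  # hypothetical yield, for illustration only
    for coverage in (0.80, 0.95, 0.99, 1.00):
        dl = williams_brown_defect_level(assumed_yield, coverage)
        print(f"fault coverage {coverage:.2f} -> predicted DL {dl * 1e6:.0f} ppm")
```

At 0% coverage the estimate reduces to 1 − Yield, and at 100% coverage it predicts zero defective parts, which is exactly the region where the text argues the prediction is least trustworthy.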
Researchers have tried to characterize the exact nature of defects to create more accurate models for test generation. Unfortunately, using complex defect models during automatic test-pattern generation (ATPG) is costly in both CPU time and memory. Multiple fault models with multiple testing methods have also been proposed and studied. 5 However, with millions of transistors on a single device, it is not practical to generate deterministic test sets for all possible defect models. The massive scale of resources required to deterministically generate robust tests motivated our probabilistic approach to detecting defects.
Valuable information is discarded during the fault simulation phase of traditional ATPG. Specifically, each site's fault detection profile is lost in modern fault simulators because they use fault dropping for time and space efficiency. However, there is a strong correlation between the number of times a fault site is "observed" and the ability of the corresponding test set to screen out arbitrary defects at that site. The focus of the research program that produced this article is twofold: to develop accurate defective-part-level predictors and to use such predictors to produce superior test-pattern-generation techniques.
Before actual silicon becomes available, we can use surrogate (nontargeted defect) simulation as an alternative metric for comparing test set effectiveness. Bridges are appropriate surrogates because they model shorts in CMOS logic more precisely than do stuck-at faults. This makes bridges attractive, because shorts are a common defect. Measuring the ability of a test set to detect bridging surrogates is therefore helpful in determining that test set's quality. Bridging-surrogate coverage is an especially useful metric when comparing two test sets with similar stuck-at fault coverage. This is the case when comparing a DO-RE-ME test set 6 (currently our best approach) with a traditional commercial ATPG test set. The DO-RE-ME method is named for its primary components: deterministic observation, random excitation, and MPG-D estimation of defective part level.
(The MPG-D model is named for its creators: M. Ray Mercer, Jaehong Park, Michael R. Grimaila, and Jennifer Dworak.)
We applied a traditional test-pattern set (consistent with current best commercial practice) and our enhanced test-pattern set to an IC with 75,000 logic gates. The results of the experiment support our conjecture that increasing each site's observability significantly reduces the overall defective part level for the device.
Excitation and observation
Because of the mismatch between the faults used for test generation and the defects that ultimately occur in ICs, some metric other than fault coverage is needed for predicting the defective part level and evaluating the effectiveness of test sets. Ideally, this metric's usefulness wouldn't depend on defects of any particular type but would apply equally to all defect types. One such metric is the number of times circuit sites are observed.
No matter what type of defect occurs, the defect site that is excited to produce an incorrect value must be observed at an output for the defect to be detected. If we know that some type of defect occurs at a given site, we can observe that site deterministically. In contrast, a defect's excitation requirements depend intrinsically on the nature of the defect in question, and thus the excitation of nontargeted defects is inherently probabilistic. To understand the excitation and observation requirements of different types of defects, consider the requirements for generating tests to detect stuck-at faults and bridging surrogates.
Generating tests using the stuck-at fault model requires deterministic excitation and deterministic observation of the circuit's stuck-at faults. The simple circuit in Figure 1 illustrates the steps required to test for A stuck-at one. In this case, A is set to 0 to excite the fault, and S is set to 1 to allow the fault effect, D, to propagate through NAND gate N1. Since the signal S is set to 1, the output of inverter I1 is 0. This 0 ensures that the output of gate N2 is 1, which allows the D on the output of N1 to propagate to the output of N3 as D. In this case, it doesn't matter what value is assigned to node B, or what value results at the output of I2, because that value is blocked from propagating through N2 by the 0 on the output of I1. Given the conditions that A equals 0 and S equals 1, we find that the probability of both the excitation and the observation of fault A stuck-at one is equal to 1.0.
If we apply the same test vector to a circuit containing a nontargeted OR bridge between node A and node B, we find that the value of node B will determine whether the OR bridge is detected: if B equals 0, the OR bridge is not detected; if B equals 1, then the OR bridge is detected. Given the conditions that A equals 0 and S equals 1, we find that the probability of the OR bridge's excitation in this example is 0.5. Thus, we see that the excitation of a bridging surrogate is probabilistic when using a stuck-at fault model for test generation.
If we apply the same test vector to a circuit containing a nontargeted OR bridge between node A and node BN, we find that the value of node BN will determine whether the OR bridge is detected: if BN equals 0, the OR bridge is not detected; if BN equals 1, then the OR bridge is detected. Given the conditions that A equals 0 and S equals 1, we find that the probability of the OR bridge's excitation in this example is 0.5. Also note that if one chip is defective because it contains an OR bridge between nodes A and B, and another chip is defective because it contains an OR bridge between nodes A and BN, then both tests must be applied to reject both of the defective chips.
Thus, in all three cases the value at the node set to the incorrect logic value by the presence of the defect had to be propagated to a primary output for detection to occur. This was done deterministically for node A by setting the value of S equal to 1. However, the excitation requirements depended on the type of defect we were trying to detect. The stuck-at fault required only that A be set to 0 to excite it and to cause an incorrect logic value at the location of the defect. In contrast, excitation of the bridges depended on the value of B, and the two bridges required that B be set to opposite values for excitation to occur. Since the particular defects in circuits under test are unknown, we cannot deterministically excite them, but we can deterministically observe the sites where those defects may be present. Every time we observe a site, we have a certain probability of exciting any remaining defects that occur there, so to minimize our chances of missing defects, we should observe difficult-to-observe sites as often as possible.
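The three cases above can be checked with a small gate-level sketch. The netlist here is an assumption inferred from the walk-through: N1 = NAND(A, S), I1 inverts S, I2 inverts B to give BN, N2 = NAND(I1, BN), and N3 = NAND(N1, N2) drives the output. The defect labels and function names are hypothetical.

```python
def nand(x: int, y: int) -> int:
    return 1 - (x & y)


def circuit_output(a: int, b: int, s: int, defect: str = "") -> int:
    """Evaluate the assumed Figure 1 netlist, optionally with one injected defect."""
    if defect == "A_stuck_at_1":
        a = 1
    elif defect == "OR_bridge_A_B":
        a = b = a | b              # wired OR: both nets take the OR of their values
    bn = 1 - b                     # inverter I2
    if defect == "OR_bridge_A_BN":
        a = bn = a | bn
    sn = 1 - s                     # inverter I1
    n1 = nand(a, s)
    n2 = nand(sn, bn)
    return nand(n1, n2)            # gate N3 drives the primary output


def detects(a: int, b: int, s: int, defect: str) -> bool:
    """A vector detects a defect when the faulty output differs from the good one."""
    return circuit_output(a, b, s) != circuit_output(a, b, s, defect)


if __name__ == "__main__":
    for defect in ("A_stuck_at_1", "OR_bridge_A_B", "OR_bridge_A_BN"):
        hits = [b for b in (0, 1) if detects(0, b, 1, defect)]
        print(f"{defect}: detected with A=0, S=1 when B in {hits}")
```

With A = 0 and S = 1 fixed, the stuck-at fault is detected for either value of B, while each bridge is detected for only one value of B, and the two bridges need opposite values.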
Defective-part-level model
As we have shown, information on the number of times sites are observed (and the probability of exciting undetected defects when we make those observations) can serve to develop a defective-part-level model that doesn't rely on measures of fault coverage. Our best such model to date is the MPG-D model. 4 MPG-D assumes that, initially, defects are evenly distributed across all circuit sites and assigns to each circuit site i its initial contribution to the overall defect level (DL) according to Equation 2. In that equation, Y represents the manufacturing yield, and 1 − Y represents the initial overall defect level. Therefore, the overall defect level is merely the sum of the defect-level contributions from individual sites. (Of course, if the defect distribution as a function of location is known, the initial defect-level contribution assigned to each site can be modified accordingly.) When test patterns are applied, some of the circuit sites are observed, some of the defects are excited, and some of the defects are detected. Thus, after n patterns have been applied, the overall defect level is the sum of the site contributions that remain (Equation 3), and each site's contribution is changed by the application of each pattern n according to Equations 4 through 6.
Here, Equation 4 describes the probability of exciting an as yet undetected defect, given that site i is observed. We know that this probability decreases exponentially with a time constant τ as the site is observed multiple times. 7 Because the site was observed by the current pattern, we can use this probability to calculate the change in the site's defect-level contribution. This change is defined as ∆site i and is described in Equation 5. In this equation, A represents the fraction of a site's defect-level contribution that will be removed, given that at least one undetected defect at site i is excited and site i is observed. For example, if we set A equal to 0.3, we're assuming that if we observe site i and manage to excite (and thus detect) at least one undetected defect there, we will simultaneously detect 30% of the defects that were still undetected at that site. In other words, a single test may detect multiple defects.
When we multiply A by the probability of exciting an undetected defect, given that the site is observed, we get a value for the average fractional reduction in defect-level contribution at site i, given that site i is observed and has been observed by a given number of previous patterns. This value is then multiplied by the old value of the defect-level contribution to obtain the actual change in defect level for site i resulting from its observation. Note that ∆site i is equal to 0 for sites that have not been observed by the current pattern n.
The total change in the defect-level contributions of all the observed sites (owing to the fact that they were observed) is equal to Total∆site and is simply the sum of all the ∆site i values (see Equation 6).
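One way to write the relations described for Equations 2 through 6 is sketched here. The notation is an assumption rather than the authors' own: N is the number of circuit sites, DL_i(n) is the contribution of site i after pattern n, and o_i(n) is the number of earlier patterns that observed site i. Whether the observation count includes the current pattern is also an assumption.

```latex
\begin{align}
DL_i(0) &= \frac{1 - Y}{N} \tag{2} \\
DL(n) &= \sum_{i=1}^{N} DL_i(n) \tag{3} \\
P_i(n) &= e^{-o_i(n)/\tau} \tag{4} \\
\Delta site_i(n) &=
  \begin{cases}
    A \, P_i(n) \, DL_i(n-1) & \text{if pattern } n \text{ observes site } i \\
    0 & \text{otherwise}
  \end{cases} \tag{5} \\
Total\Delta site(n) &= \sum_{i=1}^{N} \Delta site_i(n) \tag{6}
\end{align}
```

Under this reading, a site's first observation removes the fraction A of its contribution, and each later observation removes a smaller fraction as the excitation probability decays.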
However, a single defect may affect more than one site. For example, if we assume that all of the defects are AND/OR bridging defects, then each defect will involve two sites. The defect-level contribution of one of these bridging defects should really be divided equally between the affected sites. When a defect is detected at either of the two sites, equal portions of the reduction in defect-level contribution should be removed from each site.
The constant C determines the portion of the defect-level removal at an observed site that must also be removed from somewhere else. If C is greater than 0, some of each site's defect-level contribution also involves other sites. In the case of wired AND/OR bridging defects, the reduction in defect-level contribution occurring at a site is only one half the overall reduction that should occur because of defect detection at this site, and thus C should equal 1. (C may be greater than 1 if a significant portion of the defects involve more than two sites.) The additional reduction must be removed in some way. The extra removal is contained within ∆share i (see Equation 8). Ideally, we would remove it from the sites that were bridged to the current site and whose bridges were detected. Since we don't know the exact sites, we apportion the extra reduction among all the sites, so every site experiences some corresponding decrease in its defect-level contribution. We make the amount of additional removal from the other sites proportional to the defect-level contribution that remains at that site after we take into account the changes resulting from ∆site, as shown in Equation 7. In other words, if a site contributes 10% of the overall defect level, 10% of the additional removal is taken from that site.
Ultimately, the final defect-level contribution at a site after application of a test pattern n is equal to its previous defect-level contribution, minus the defect-level removal resulting from observation on the current pattern (∆site i), and minus the additional removal resulting from the sharing of defects among circuit sites (∆share i) (see Equation 9).
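The per-pattern update described in this section can be sketched as follows. This is a minimal reading of the prose, not the authors' implementation. It assumes that the excitation probability is exp(−k/τ), where k counts earlier observations of the site, that the extra removal equals C times Total∆site, and that this extra removal is shared in proportion to what remains at each site after ∆site is subtracted. The clamp that keeps contributions non-negative is an added safeguard, and all names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def mpgd_defect_levels(
    yield_fraction: float,
    num_sites: int,
    patterns: Sequence[Iterable[int]],  # each pattern: indices of the sites it observes
    tau: float,
    a_frac: float,
    c_share: float,
) -> List[float]:
    """Return the predicted overall defect level after each pattern is applied."""
    dl = [(1.0 - yield_fraction) / num_sites] * num_sites  # even initial split
    obs_count = [0] * num_sites
    history: List[float] = []
    for observed in patterns:
        delta_site = [0.0] * num_sites
        for i in set(observed):
            p_excite = math.exp(-obs_count[i] / tau)   # assumed form of the decay
            delta_site[i] = a_frac * p_excite * dl[i]
            obs_count[i] += 1
        total_delta_site = sum(delta_site)
        remaining = [dl[i] - delta_site[i] for i in range(num_sites)]
        total_remaining = sum(remaining)
        # Extra removal for defects shared with other sites (clamp is a safeguard).
        extra = min(c_share * total_delta_site, total_remaining)
        for i in range(num_sites):
            share_i = extra * remaining[i] / total_remaining if total_remaining > 0.0 else 0.0
            dl[i] = remaining[i] - share_i
        history.append(sum(dl))
    return history


if __name__ == "__main__":
    # Hypothetical parameters: 4 sites, 90% yield, tau = 2, A = 0.3, C = 1.
    levels = mpgd_defect_levels(0.9, 4, [{0, 1, 2, 3}, set(), {0}], 2.0, 0.3, 1.0)
    print([round(x, 5) for x in levels])
```

A pattern that observes no sites leaves the prediction unchanged, which matches the statement that ∆site i is 0 for unobserved sites.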
Test generation
Our defective-part-level model and our analysis of the importance of repeated observation of difficult-to-observe sites led us to develop and evaluate the DO-RE-ME test generation method. We generate tests using traditional ATPG algorithms targeting traditional stuck-at faults. There are only two nonstandard requirements for the ATPG algorithm:
- The test-pattern-generation decision process must be random so that if the same fault is used twice as a target, the resulting two tests are extremely unlikely to be identical (that is, they are random samples from the set of all possible test patterns for the fault).
- The number of detections of each fault over the entire test pattern set must be computed during fault simulation.
In fact, if two identical patterns are produced, one is removed in a postprocessing step after ATPG and prior to optimization. In contrast to traditional fault selection methods (such as testing for each undetected fault at least once), we select faults to target in such a way as to maximize the number of times each site is observed. More specifically, since the sites least often observed make the dominant contribution to the defective part level, our objective is to maximize the number of observations of difficult-to-observe sites. This means that many stuck-at faults will never be used as targets, because they are often detected by tests that target other faults. In contrast, some stuck-at faults will be targeted many times because they are located at rarely observed sites. Note that since faults (not sites) are targeted during test generation, both the stuck-at-one and the stuck-at-zero faults associated with each site will be targeted with equal priority. Test generation is terminated after all faults have been detected at least n_threshold times.

Our test generation process has evolved continually since its development. Because existing ATPG tools were employed, we used Perl programs to implement the targeting and statistics collection mechanisms the DO-RE-ME method requires. Originally, only one fault was targeted per pass until all faults were detected at least n_threshold times. To take advantage of the dynamic test-pattern-compression capability of modern ATPG tools, we modified the method so that all faults detected fewer than n_threshold times were targeted in a single pass. The resulting test patterns were more robust in terms of fault detection capability, and the test generation times were substantially reduced. The key differences from standard ATPG application are that all faults detected fewer than n_threshold times are targeted at the beginning of each pass, and fault detection statistics record how many times each fault has been detected so far.
This is orders of magnitude less information than is collected for a fault dictionary. Initially, all faults have been detected exactly zero times. In the first pass, all faults are targeted by the ATPG tool. The resulting test set is fault simulated, and any faults detected at least n_threshold times are removed from the target list for the next pass. The process is repeated until all faults have been detected at least n_threshold times. In some cases, no tests are produced because an upper bound on ATPG time is exceeded, and the corresponding faults are temporarily removed to avoid excessive ATPG times.
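The multi-pass targeting loop can be sketched with a toy stand-in for the ATPG tool and fault simulator. This is not the Perl flow described above. Here `fault_tests` is a hypothetical table mapping each fault to the pattern IDs that detect it, random choice from that table plays the role of randomized ATPG, and set intersection plays the role of fault simulation. Drawing only from unused patterns folds duplicate removal into the choice and guarantees termination.

```python
import random
from typing import Dict, List, Set, Tuple


def do_re_me_generate(
    fault_tests: Dict[str, Set[int]],
    n_threshold: int,
    seed: int = 0,
) -> Tuple[List[int], Dict[str, int]]:
    """Target every fault detected fewer than n_threshold times, pass after pass."""
    rng = random.Random(seed)
    test_set: List[int] = []
    used: Set[int] = set()
    counts = {fault: 0 for fault in fault_tests}
    while True:
        # Faults with no unused detecting pattern are dropped, like ATPG aborts.
        targets = [
            f for f in sorted(fault_tests)
            if counts[f] < n_threshold and fault_tests[f] - used
        ]
        if not targets:
            break
        # Randomized "ATPG": one random new test per target; the set removes duplicates.
        new_patterns = {rng.choice(sorted(fault_tests[f] - used)) for f in targets}
        for pattern in sorted(new_patterns):
            test_set.append(pattern)
            used.add(pattern)
        # "Fault simulation": recount detections of every fault over the whole set.
        counts = {f: len(tests & used) for f, tests in fault_tests.items()}
    return test_set, counts


if __name__ == "__main__":
    toy = {"f1": {0, 1, 2, 3}, "f2": {2, 3}, "f3": {5}}  # hypothetical fault table
    tests, detections = do_re_me_generate(toy, n_threshold=2)
    print(tests, detections)
```

In the toy table, f3 has only one detecting pattern, so it can never reach a threshold of 2. The loop stops targeting it, which loosely mirrors the temporary removal of faults whose ATPG time limit is exceeded.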
Optimized test generation
Ideally, all tests generated using the DO-RE-ME method would be applied to the device under test. However, tester memory constraints and/or test time limitations often make it necessary to select an optimized subset of patterns that produce the lowest defective part level according to our defective-part-level model.
Assuming that a superset of patterns was generated as discussed in the previous section, we now apply methods to select the optimized test-pattern subset. First, we create a fault dictionary for all of the patterns in the superset. We calculate defective part level assuming all patterns in the superset are applied to the device under test. This provides a baseline minimum possible defective part level for the given test superset. We select an initial subset of patterns randomly. For successive steps, we calculate the predicted defective part level of the subset and compare it to the minimum calculated from the superset. The least effective n patterns are removed, where n is determined by the magnitude of the difference between the minimum and current predicted defective part level. Then, the defective-part-level reductions predicted by our model for every test pattern in the test pattern superset (which did not already exist in the optimized subset) are compared, and the most effective n patterns are added to form the optimized subset. Removal of the weakest n test patterns from the subset and addition of the n strongest patterns from the superset are repeated until no significant difference in predicted defective part level is achieved. Note that this process resembles simulated annealing, wherein the number of patterns removed or added to the subset is a function of the difference between the predicted defective part levels of the subset and the superset. We used the final test-pattern subset as the optimized test-pattern set to apply during manufacturing test.
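The swap-based selection can be sketched as follows, under several assumptions. The predictor `toy_dl` is a simple order-independent stand-in (each observation of a site halves its remaining contribution), not the MPG-D model. The rule that maps the gap to n, through a hypothetical `scale` factor, is one possible choice. Stopping at the first pass that fails to improve is a simplification of "no significant difference."

```python
import math
import random
from typing import Callable, List, Sequence, Set, Tuple

Pattern = Set[int]  # indices of the sites a pattern observes


def toy_dl(patterns: Sequence[Pattern], num_sites: int, y: float = 0.9, r: float = 0.5) -> float:
    """Stand-in predictor: each observation scales a site's contribution by r."""
    obs = [0] * num_sites
    for p in patterns:
        for i in p:
            obs[i] += 1
    return sum((1.0 - y) / num_sites * r ** k for k in obs)


def optimize_subset(
    superset: Sequence[Pattern],
    size: int,
    predict: Callable[[Sequence[Pattern]], float],
    scale: float = 10.0,
    max_iters: int = 100,
    tol: float = 1e-12,
    seed: int = 0,
) -> Tuple[List[int], float, float]:
    if not 0 < size < len(superset):
        raise ValueError("size must be between 1 and len(superset) - 1")
    ids = list(range(len(superset)))

    def dl_of(chosen: Set[int]) -> float:
        return predict([superset[j] for j in sorted(chosen)])

    dl_min = dl_of(set(ids))                         # baseline: whole superset applied
    subset = set(random.Random(seed).sample(ids, size))
    best, best_dl = set(subset), dl_of(subset)
    for _ in range(max_iters):
        cur = dl_of(subset)
        gap = cur - dl_min
        if gap <= tol:
            break
        n = max(1, min(size, len(ids) - size, math.ceil(scale * gap / cur)))
        # Remove the n patterns whose removal raises the predicted level least.
        loss = {j: dl_of(subset - {j}) - cur for j in subset}
        subset -= set(sorted(subset, key=lambda j: loss[j])[:n])
        # Add the n outside patterns with the largest individual predicted reduction.
        base = dl_of(subset)
        gain = {j: base - dl_of(subset | {j}) for j in ids if j not in subset}
        subset |= set(sorted(gain, key=lambda j: gain[j], reverse=True)[:n])
        new_dl = dl_of(subset)
        if new_dl < best_dl - tol:
            best, best_dl = set(subset), new_dl
        else:
            break                                    # no significant improvement
    return sorted(best), best_dl, dl_min
```

Because candidates are ranked individually rather than jointly, a swap can fail to improve the subset. The sketch therefore keeps the best subset seen so far and returns it together with the superset baseline.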
Commercial IC surrogate simulation
Normally, several weeks elapse between generation of the test pattern set and availability of the first silicon to be tested. During this period, we can further evaluate the test pattern set's quality and use surrogate simulation to estimate defective part levels.
We performed bridging-surrogate simulation on a commercial design to compare a traditional ATPG test set with a test set produced by DO-RE-ME. This commercial design was a 1-million-transistor submicron mixed-signal device using full scan in the digital logic with approximately 204,000 design-level stuck-at faults. The design's size and the tendency of commercial simulation tools to exclusively manipulate stuck-at faults introduce impediments to determining bridging-surrogate coverage.
The dominant problem is the sheer number of bridges to model. The total can grow quadratically if all potential bridges are modeled. This number is needlessly large, since, on the basis of the design layout, a majority of bridges are unlikely to occur. Thus, likely bridges are extracted from the layout so that a smaller and more realistic set of bridges can be analyzed. 8 When the critical area between two nets indicates a significant possibility of a short, two bridging surrogates (wired AND/OR) are created. The bridge list was also postprocessed to remove electrically equivalent bridges.
Researchers in a previous experiment reported that most bridges, as determined by Spice simulations, do not act exclusively as AND/OR bridges but rather as net-dominating bridges. 9 However, diagnosis studies on real devices with actual bridging defects created with focused ion beam equipment indicate that wired AND/OR behavior is also a relatively common occurrence. 10 In our study, surrogates were modeled as wired AND/OR bridges. The wired AND bridge models a short where a zero on either net dominates a logic one on the other net. Conversely, the wired OR bridge acts as a short where a one on either net dominates. Both surrogates represent shorts that are assumed to be nonresistive.
Extracting the bridges for this design produced approximately 250,000 cases in which both ends were external to library cells. The bridged nets had to be outside of cells, since labels at the layout level had to be matched to net names, and no labels were available for sites within library cells. For us to deem a bridge to be testable, at least one net of the pair had to have a detectable stuck-at fault (stuck-at zero for an AND bridge, stuck-at one for an OR bridge). This resulted in 217,534 AND bridges and 213,517 OR bridges, for an initial surrogate list total of 431,051 bridging surrogates.
To determine surrogate coverage data, we created a fault dictionary through fault simulation. Results from a "good circuit" simulation also served to check bridge excitation conditions. For this we chose the Mentor Graphics ATPG tool, FastScan. The fault dictionary tells which stuck-at faults each vector detects. Using this information enables us to run vectors one at a time, dynamically changing the simulation between vectors, to check the values of the nets opposite the detected faults of each bridge. Upon detection, surrogates are deleted from the list.
The ATPG set contained 853 vectors and the DO-RE-ME set 4,000. Test engineers originally chose this size because of tester memory limitations. At vector 853 the ATPG set had detected 1,890 more stuck-at faults than the DO-RE-ME set. This results in a 0.9% total fault coverage difference. Even so, at the 853rd vector the DO-RE-ME patterns had detected 877 more surrogates than the ATPG test patterns. This shows that even at a lower stuck-at fault coverage, the DO-RE-ME test set detects more surrogate defects. At the end of the 4,000 vectors, the two sets have comparable stuck-at fault coverage, but the additional 3,147 DO-RE-ME vectors detected an additional 2,730 surrogates. Thus, the total difference in surrogate detections between the ATPG and DO-RE-ME test sets was 3,607, clearly demonstrating that high stuck-at fault coverage is not sufficient to guarantee high nontargeted defect coverage.
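The figures quoted above can be cross-checked with a few lines of arithmetic. The variable names are illustrative, and the fault total is the approximate 204,000 design-level stuck-at faults reported for this design.

```python
atpg_vectors = 853
do_re_me_vectors = 4_000
design_stuck_at_faults = 204_000  # approximate figure reported for this design

# Vectors applied by DO-RE-ME beyond the length of the ATPG set.
extra_vectors = do_re_me_vectors - atpg_vectors

# Stuck-at coverage gap at vector 853, as a percentage of all faults.
fault_gap_percent = 100.0 * 1_890 / design_stuck_at_faults

# Surrogate lead at vector 853 plus surrogates found by the later vectors.
total_surrogate_lead = 877 + 2_730

print(extra_vectors, round(fault_gap_percent, 1), total_surrogate_lead)
```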
Commercial IC production testing
The bridging-surrogate simulation results strongly suggest the utility of DO-RE-ME vectors for reducing defective part levels in commercial ICs. We now examine results from applying the DO-RE-ME method to a different commercial IC. The data was collected on two different lots of production wafers. This IC consists of more than 75,000 two-input NAND equivalent logic gates, and it was tested using two different test pattern sets. We designated the first test pattern set as commercial because it is exactly what the standard manufacturing test flow uses. Because the chip was designed with 100% scannable flip-flops, we performed ATPG using the Mentor Graphics FastScan program. The commercial test-pattern set consisted of 3,000 test patterns, and each was applied using one scan chain load/unload. The stuck-at fault coverage for this test pattern set was just over 97%.
We designated the second test pattern set as research because it was produced by the FastScan program using the DO-RE-ME method. Its length exactly equaled that of the commercial set, and its stuck-at fault coverage was 96.7%. However, the research set differed from standard commercial practice in that the DO-RE-ME method maximized the number of observations at difficult-to-observe sites.
Of all die in the first lot of production wafers, tested in October 1998, 6,986 passed all parametric tests. These were then tested using the two test pattern sets described above: 220 were declared defective using the commercial test-pattern set, and 229 were declared defective using the research test-pattern set. We arbitrarily assume that 12 defective die were never detected by either test pattern set. The defective part levels for the two test pattern sets appear in Figure 3. These defect-level calculations consider only scan stuck-at tests and do not take into account any other manufacturing tests applied, such as functional patterns and IDDQ. Actual manufacturing defect levels are considerably lower than reported here.

The second production wafer lot was tested in February 1999. Of all die in the lot, 20,591 passed all parametric tests and were tested using the two test pattern sets described previously: 245 were declared defective using the commercial test-pattern set, and 246 were declared defective using the research test-pattern set. Once again, we arbitrarily assume that 12 defective die were never detected by either test pattern set. Defective part levels for the two test pattern sets appear in Figure 4.
Comparing actual and predicted defective part levels
MPG-D is a significant improvement over Williams-Brown in every case. Williams-Brown is pessimistic at the beginning of each run and optimistic at the end. In addition, since fault coverage is slightly higher for the set of commercial vectors than for the set of research vectors, Williams-Brown incorrectly predicts that the commercial vectors will detect more defects. In reality, the research vectors always performed better.
MPG-D was able to make very accurate predictions when the defects changed (October 1998 compared to February 1999) and when the test patterns changed (research compared to commercial) in all but one instance: the application of the commercial vectors in October 1998. In this case, the commercial vectors detected fewer defects than were predicted by either MPG-D or Williams-Brown, but the MPG-D prediction was closer. The reason for this error isn't known. Finally, the two defective-part-level models estimated the research vectors' effectiveness better than the commercial vectors' effectiveness because the shape of the predicted curves for the research vectors better matched what actually happened. Thus, the research vectors and their associated test generation method lend themselves to accurate defective-part-level prediction.
The DO-RE-ME improvement involves using traditional ATPG and fault simulation tools in a new way. By maximizing the deterministic observation of defect sites in the network (as determined from traditional stuck-at fault simulation) and relying on probabilistic defect excitation, we achieved significant improvements in test pattern efficiency.
The accuracy of MPG-D defective-part-level 
