A 25k gate Test Chip was designed and manufactured to evaluate different test methods for scan-designed circuits. The design of the chip, the experiment, and preliminary experimental results were presented at ITC'95. This paper presents results for different clock speeds and clocking modes (at-speed and delay), and uses this data to characterize the behavior of the defective parts. It was found that timing-related defects are common, and the escape rate for different test techniques on these parts is discussed.
INTRODUCTION
A Test Chip has been designed and manufactured to evaluate multiple test techniques for combinational or full-scan circuits. The objective of the experiment was to get a real-world comparison of many different test methods, as it is difficult to evaluate the effectiveness of tests without experimental data.
The design of the experiment and architecture of the Test Chip were described in [1], and preliminary experimental results were presented in [2]. This paper presents results for different clock speeds and clocking modes (at-speed and delay), and uses this data together with the data from the on-chip failure counters to characterize the behavior of the defective parts. Almost 44% of the defective parts were found to be timing or pattern dependent, and most of the test escapes were on parts with these defects. The Stability Checking results show that the Stability Checkers worked as intended, and accurate propagation delay measurements on a few CUTs show that a super-exhaustive test can exercise longer delays through a circuit than other test sets.
Several experiments have been reported in the literature that address the quality impact of the different test sets. The Test Chip architecture was designed in conjunction with Hughes Aircraft Corporation, where most of the detailed design was done. The Test Chips were fabricated at LSI Logic Corporation in four wafer lots, and the wafers were tested on a 100 MHz tester (Schlumberger ITS9000FX) by Digital Testing Services, Inc.
The Test Chip is a 25k gate CMOS gate array, manufactured using LSI Logic's LFT150K FasTest array series, and has 96 I/O pins. The chip area is divided equally between test support (DFT) circuitry and circuits-under-test (CUTs). The CUTs on 5,491 dice have been tested. The chip contains the CUTs together with support circuitry for applying test vectors and observing responses. The overall data flow in the Test Chip is shown in the block diagram in Fig. 1. There is a common data source (a parallel load LFSR which implements the primitive polynomial $f(x) = x^{24} + x^{7} + x^{2} + x + 1$), and a response analysis circuit for each CUT type. There is also a multiple input signature analyzer for the MUL CUT. Table 1 lists the five types of CUTs that were designed, including two multipliers and three control blocks. The sizes of the CUTs are given in terms of LSI gate equivalents. For example, a 2-input NAND has a gate count of 1, and a 4-input NAND has a gate count of 3.
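As an illustration of the data source, the following is a minimal software sketch of a 24-bit LFSR for the polynomial x^24 + x^7 + x^2 + x + 1. The Fibonacci (external XOR) form, the shift direction, the bit ordering, and the names `lfsr_step` and `lfsr_vectors` are assumptions made for the example; the text does not specify the register structure used on the chip.

```python
# Sketch of a 24-bit LFSR for f(x) = x^24 + x^7 + x^2 + x + 1.
# Assumption: Fibonacci (external-XOR) form shifting right; the chip's
# actual register structure and bit ordering are not given in the text.

WIDTH = 24
MASK = (1 << WIDTH) - 1


def lfsr_step(state: int) -> int:
    """Advance the register by one clock and return the new state."""
    # Recurrence a[n+24] = a[n+7] ^ a[n+2] ^ a[n+1] ^ a[n],
    # with bit i of `state` holding a[n+i].
    feedback = ((state >> 7) ^ (state >> 2) ^ (state >> 1) ^ state) & 1
    return ((state >> 1) | (feedback << (WIDTH - 1))) & MASK


def lfsr_vectors(seed: int, count: int):
    """Yield `count` successive 24-bit patterns, starting from `seed`."""
    state = seed & MASK
    if state == 0:
        raise ValueError("the all-zero state never changes; use a nonzero seed")
    for _ in range(count):
        yield state
        state = lfsr_step(state)
```

Because the polynomial is primitive, any nonzero seed cycles through all 2^24 - 1 nonzero states before repeating, which is what makes such a register usable as a pseudo-random pattern source.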
CUTs
MUL is a 12x12 partial product multiplier with only the 12 most significant outputs observable. SQR consists of a 6x6 multiplier in which the 6 most significant outputs are fed into another 6x6 multiplier, such that the second multiplier acts as a squarer. The three control logic blocks implement the same function synthesized with different constraints. STD is implemented using the standard LFT150K library [19], ELM uses only elementary gates, and ROB is a robust path-delay-fault testable implementation. CUT details can be found in [1,18].
Response Analysis Circuits
The response analysis is done on chip to avoid storing prohibitively long test responses on the ATE. The corresponding outputs of four copies of each CUT are compared to determine if any errors have occurred, and counters are used to record the first-failing vector and the number of failing vectors for the test set.
The design of the comparison circuit is shown in Fig. 2.
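The bookkeeping done by the comparison circuit and counters can be sketched in software as follows. This is only a behavioral model under stated assumptions: the function name `analyze_responses` and the representation of each response as a tuple of four output words are hypothetical, and the sketch treats any disagreement among the four copies as a failing vector.

```python
from typing import Iterable, Optional, Sequence, Tuple


def analyze_responses(
    responses: Iterable[Sequence[int]],
) -> Tuple[Optional[int], int]:
    """Return (first_failing_vector, number_of_failing_vectors).

    Each item holds the output words of the four CUT copies for one vector.
    A vector counts as failing when the copies disagree.
    """
    first_fail = None
    fail_count = 0
    for index, copies in enumerate(responses):
        if any(word != copies[0] for word in copies[1:]):
            fail_count += 1
            if first_fail is None:
                first_fail = index  # remember only the first mismatch
    return first_fail, fail_count
```

For example, `analyze_responses([(5, 5, 5, 5), (3, 3, 7, 3)])` reports vector 1 as the first failure with one failing vector in total. In this sketch, a failure that appears identically on all four copies would go unnoticed.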
TESTS APPLIED

Test Plan
The two-stage testing plan shown in Fig. 3 was adopted for the experiment [1]. Gross parametric and test support circuitry tests (Stage 1 tests) were first applied to each die, and those dice that failed Stage 1 tests were not considered as part of this experiment. This makes the experiment more focused since the gross failures have been removed, and we are interested in the ability of tests to detect the difficult or "elusive" failures.
Test Conditions
Each test set was applied to the Test Chip under different test conditions to evaluate the different testing techniques used in practice. The test condition consists of the Data Source, the Clocking Mode, and the Test Timing, which are discussed in turn.
Data Sources
In scan designs, the scan-path ordering places restrictions on consecutive vectors, so not all 2-pattern tests can be applied to the CUT. One solution to this problem is to use an "enhanced" scan chain that can store two values, but this can be expensive. Other approaches include using a "skewed-load" test, where the second pattern is shifted one bit from the first, or functionally propagating the second pattern through the combinational logic [20, 21].
Vectors can be applied to the CUTs in two ways, in order to investigate the effect of skewed-load: parallel load and simulated scan. In parallel load, vectors are applied either directly from the ATE, or generated internally by a primitive LFSR. In simulated scan, before each vector is applied, a shifted copy of the vector is applied, to simulate the correlation between consecutive vectors when loading the vectors through a scan chain. Since there is no logic feeding the CUTs, functional propagation was not tested.
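The simulated scan data source can be illustrated with a short sketch that places a one-bit-shifted copy in front of each vector. The shift direction (toward the most significant bit, with the scan-in bit entering at the least significant bit), the value of the bit that cannot be recovered (`fill`), and the function names are assumptions made for the example; the text does not give these details.

```python
from typing import Iterable, Iterator


def scan_shift(pattern: int, width: int, scan_in: int) -> int:
    """One scan clock: shift toward the MSB, new bit enters at the LSB."""
    return ((pattern << 1) | (scan_in & 1)) & ((1 << width) - 1)


def predecessor(vector: int, width: int, fill: int = 0) -> int:
    """Pattern that turns into `vector` after one more scan shift.

    The bit shifted out of the chain is unknown, so `fill` supplies it.
    """
    return (vector >> 1) | ((fill & 1) << (width - 1))


def simulated_scan(vectors: Iterable[int], width: int, fill: int = 0) -> Iterator[int]:
    """Yield each vector preceded by its shifted copy (skewed-load pair)."""
    for vector in vectors:
        yield predecessor(vector, width, fill)
        yield vector
```

With a 4-bit chain, `list(simulated_scan([0b1011], 4))` gives `[0b0101, 0b1011]`: the second pattern is the first shifted by one position, which is the correlation a real scan chain would impose.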
Clocking Modes
Delay tests assume that the transients settle before applying the second pattern, which can be difficult to implement in practice. One proposed alternative to delay testing is to apply functional or stuck-at vectors "at-speed", even though this is not necessarily a good delay test [22] .
Delay testing uses slow and fast clocks, whereas at-speed testing uses only fast clocks. Methods that combine slow and fast clocks have been proposed, for example [23]. The effect of applying fast clocks instead of slow clocks is addressed in [24].
The main reason for investigating both the at-speed and delay clocking modes is that there has been interest in the tradeoffs between the two methods for detecting timing failures. Three clocking modes were investigated on the Test Chip:
DI -- DIrect clocking mode (at-speed);
PU -- Pulse-width generated clocking mode (delay);
IN -- INternally generated clocking mode (delay).
The DI clocking mode represents "at-speed" or normal clocking of a circuit, while the other two clocking modes represent "delay" clocking, where the first vector is allowed time to settle before the second vector is applied. The first two clocking modes are derived from an external clock, while the third is generated internally by a delay line. Simplified timing diagrams for the different clocking modes are shown in Fig. 4. More details can be found in [1,18]. The Input Clock is used to apply data to the CUTs and the Output Clock is used to sample the outputs of the CUTs. For a 2-pattern test <V1,V2> the time between the application of <V2> and sampling the output is always the cycle time Tc. The time between the application of <V1> and <V2> is Tc for at-speed testing and is large enough to allow the CUT to settle for delay testing.
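The timing relationship for a 2-pattern test can be written down as a small sketch. The function name `two_pattern_schedule`, the parameter `settle_ns`, and the use of nanoseconds are assumptions for illustration; only the relationships between the event times come from the description above.

```python
from typing import Tuple


def two_pattern_schedule(mode: str, tc_ns: float, settle_ns: float) -> Tuple[float, float, float]:
    """Return (t_apply_V1, t_apply_V2, t_sample) in ns for one <V1,V2> test."""
    if mode == "DI":            # at-speed: V1 and V2 are one cycle apart
        gap = tc_ns
    elif mode in ("PU", "IN"):  # delay: V1 is given time to settle first
        gap = settle_ns
    else:
        raise ValueError(f"unknown clocking mode: {mode}")
    t_v1 = 0.0
    t_v2 = t_v1 + gap
    t_sample = t_v2 + tc_ns     # V2-to-sample time is always Tc
    return t_v1, t_v2, t_sample
```

For a hypothetical 10 ns cycle and 50 ns settling time, DI mode samples at 20 ns and the delay modes at 60 ns; in both cases the response to V2 is captured exactly one cycle time after V2 is applied.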
Fig. 4: Simplified timing diagrams for the clocking modes (labels: Apply Patterns, Sample Output, Input)
Test Timing
The test sets were applied at three different clock speeds in order to explore the effect of running tests at different speeds:
r -- rated speed of each CUT;
Table 2 summarizes the test sets applied to the CUTs, as well as the data sources, clocking modes, and test timing. The shaded boxes in Table 2 indicate that the test was applied under the corresponding conditions. The parallel load PU clocking mode data was reported in [2].
SAMPLING TEST RESULTS
Estimating Defective CUTs
The first step in the data analysis is establishing which CUTs are defective. Using this information, escape rates for each test set can be determined. Determining the number of defective die is more complicated than in [2], as the DI and IN clocking modes are also considered.
Note that since each CUT type (4 copies of the CUT) is tested independently of the others, we refer to defective CUT types, or CUTs for short, and not defective die. For sampling (boolean) testing, if a CUT fails any of the tests, it will be considered to be defective. This is only an estimate of the true yield, as the testing is imperfect and there can be errors in testing. This is not expected to be a major factor, since some of the tests are very thorough, and the repeatability of the testing was checked by repeating the exhaustive test at the end of the test program.
Fig. 5: Failing CUTs for Different Test Clocking Modes
The Venn diagram in Fig. 5 shows the relationship between the failing CUTs for the different clocking modes and speeds. There were 128 CUTs that failed at least one test at either rated or slow timing, and one more CUT that failed at fast timing. The 128 CUTs were on 126 different die, but the analysis is done on a CUT by CUT basis.
These 128 CUTs will be used as the basis for comparing test sets in the remainder of this paper. Figure 5 shows that there are 121 CUTs that failed at least one test set in each of the clocking modes and speeds, whereas the other 8 defective CUTs escaped some of the clocking modes and speeds. The CUT naming convention used is consistent with [2], except that each CUT starts at 1 (ELM17 and 18 were not in [2]). More detailed data for each test set is presented in Table 3 in the next section.
The fact that only one more CUT failed at the fast timing shows that the clock rate used was conservative. This is expected, as the rated clock speed was based on worst-case design parameters, but due to the statistical nature of component delays, most circuits will operate faster than the worst-case timing. This is important for this experiment, since it means that any timing failures that are found are true delay defects, and not just a result of aggressive timing.
It is also important to note that the defect density (i.e. defects per unit area) is low and fairly constant across all CUT types, as shown in Fig. 6 . There are approximately 2 defects per million gates for the Stage 2 CUT tests. This indicates that the process is mature and we are investigating random "spot defects" rather than more gross process problems. 
Defect Classification
Repeating the test sets for different conditions allows some classification of defective CUTs. Furthermore, the failure counters give much more information than simply pass/fail for each test. First we will define a TIC defect.
Defn.: Combinational Defect. The behavior of the defect only depends on the input pattern applied, and not the previous patterns.
Defn.: Timing-Independent Defect. The behavior of the defect does not depend on the clock speed (less than or equal to rated speed).
Defn.: Timing-Independent Combinational Defect (TIC). If a defect has both the above properties, then it is a timing-independent combinational (TIC) defect. For example, a defect that behaves like a stuck-at fault is a TIC defect.
Figure 7 shows that 72 CUTs out of 128 are classified as having TIC defects for the tests applied. For a defect to be classified as a TIC defect in this experiment, both the first failing vector and number of failing vectors must match for each test set, both Data Source modes, the three clocking modes, and slow and rated timing. Note that the actual number of non-TIC defects could be greater than 56, since classified TIC defects are not necessarily TIC defects; they just behave as TIC defects within the resolution of the experiments performed.
For the 72 CUTs that have been classified as TIC defects, the values in the failure counters were compared to a diagnostic dictionary for stuck-at faults, and the analysis shows that 41 CUTs behave like stuck-at faults (pin faults) within the resolution of the experiment. This was determined by comparing the values of the first fail vector and number of failing vectors with a fault dictionary for all the test sets applied to the CUTs. Once again, the actual number could be less than 41.
The 56 CUTs that do not have TIC defects were further classified into "timing" problems and "pattern dependent" problems. This is a very rough classification, as we are only considering two different timings. The results are that 42 CUTs behaved differently for slow and rated timing, and 54 CUTs behaved differently when the pattern preceding each vector was changed.
Detailed data for each test set are presented in Table 3. One observation is that there are substantial differences in escape rates even for similar tests, such as the 100% single stuck-at tests. Applying stuck-at vectors using the simulated scan data source was sometimes better and sometimes worse than the parallel load data source. Running tests at a lower speed clearly increases the escapes. There are differences between the clocking modes at the same test timing. Generally, the PU clocking mode has fewer escapes than the DI clocking mode.
In general, defects that cause timing changes by introducing extra capacitive coupling or ground bounce might be exercised better with a test applied with at-speed timing, whereas defects that depend on charging or discharging a capacitor might be exercised better with a test that allows complete discharging before starting to charge the capacitor (Spice simulations of a simple circuit confirm the possibility of this effect). Table 4 lists the actual escaping CUTs for several test sets, divided into SSF-TIC defects, other TIC defects, and non-TIC defects. This shows the relationship between the escapes for the different test conditions. Table 4 shows that most of the test escapes are for CUTs with non-TIC defects. The 99% coverage stuck-at test also missed one SSF-TIC defect. There is no simple covering relation between the defects detected under the different test conditions. Even for the exhaustive test, different CUTs escape for the PU, DI and IN clocking modes at the same timing.
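The TIC criterion used in this experiment and the grouping of escapes by defect class can be sketched as follows. The data layout (a dictionary keyed by test set and condition), the use of `(None, 0)` for a passing test, and the names `is_tic` and `escapes_by_class` are assumptions for illustration; the CUT names in the usage example are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

# (first failing vector, number of failing vectors); a pass is (None, 0).
Counters = Tuple[Optional[int], int]


def is_tic(results: Dict[Tuple[str, Hashable], Counters]) -> bool:
    """True if, for every test set, the counters agree across all conditions.

    `results` maps (test_set, condition) to the failure counters of one CUT.
    """
    seen: Dict[str, Counters] = {}
    for (test_set, _condition), counters in results.items():
        # setdefault stores the first counters seen for this test set
        if seen.setdefault(test_set, counters) != counters:
            return False
    return True


def escapes_by_class(
    defective: Set[str], detected: Set[str], defect_class: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group the defective CUTs that a test missed by their defect class."""
    grouped: Dict[str, List[str]] = {}
    for cut in sorted(defective - detected):
        grouped.setdefault(defect_class[cut], []).append(cut)
    return grouped
```

As a usage example, a CUT whose counters read `(3, 10)` under every condition of a test set is classified as TIC, while one that passes at slow timing but fails at rated timing is not. An escape is simply a CUT in the defective set that a given test condition did not detect.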
STABILITY CHECKING TEST RESULTS
All CUT outputs have stability checkers. There are 216 stability checkers per Test Chip, using the conservative 5 gate NAND design described in [16] . For each test, counters in the response analysis circuitry record the total number of sampling and stability checking errors, as well as the first-fail vectors, as described in Sec. 2.
As Stability Checking can be used with all test sets, and all test modes except DI (at-speed) where there is no time for a checking period, tests were not generated specifically for Stability Checking. The results for slow and rated timing are presented below.
Slow Timing
The results for slow timing are that 34 CUTs have stability checking failures. These are all from the set of 128 CUTs that had sampling failures. Furthermore, the 34 CUTs were classified as having non-TIC defects in Fig. 7 based on the sampling tests. Note that these are fairly large delay faults, as the clock rate in slow timing is 2/3 of the rated speed.
Several conclusions can be drawn from this data. First, the stability checkers are operational, otherwise the counts would be zero. Second, there are no "false alarms" or stability checking failures that do not correspond to a [...] 1.1, 2.1, 2.4, 2.11), 2 for ROB4 (2.3, 8.1), and 4 for STD4 (2.1, 9.1, 9.2, 9.3). Furthermore, the above 4 test sets for MUL4 also had no sampling failures at rated speed.
Rated Timing
At rated timing, there were 188 Stability Checking failures. This is significantly more than expected, and almost all the "extra" failures were due to the ROB circuit.
In order to understand this behavior, the clock rate on the ATE was increased slowly to find the first sampling and Stability Checking error, and it was found that the Stability Checkers started checking the output waveform too early. This is a consequence of the Stability Checker design being separate from the sampling flip-flop, and the timing to the two circuits was not controlled accurately enough. Unfortunately this makes direct comparison between the different techniques more difficult at the rated timing, as outputs that change just before the setup time of the sampling flip-flop will have Stability Checking errors. This shows that it is very desirable to incorporate the stability checker into the flip-flop design to minimize skew problems.
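The interaction between the checking period and the sampling instant can be illustrated with a small behavioral model. This is a sketch only: the waveform representation (an initial value plus a list of toggle times), the function names, and all times in the example are hypothetical, and setup time is ignored.

```python
from typing import Sequence, Tuple


def value_at(initial: int, transitions: Sequence[float], t: float) -> int:
    """Logic value at time t for an output that toggles at each listed time."""
    toggles = sum(1 for edge in transitions if edge <= t)
    return initial ^ (toggles & 1)


def check_output(
    initial: int,
    transitions: Sequence[float],
    expected: int,
    check_start: float,
    sample_time: float,
) -> Tuple[bool, bool]:
    """Return (sampling_error, stability_error) for one output and one vector."""
    sampling_error = value_at(initial, transitions, sample_time) != expected
    # Any edge inside the checking period counts as a stability error.
    stability_error = any(check_start < edge <= sample_time for edge in transitions)
    return sampling_error, stability_error
```

With an output that settles at 9.0 ns and is sampled at 10.0 ns, a checking period starting at 9.5 ns reports no error, whereas one starting at 8.5 ns reports a stability error even though the sampled value is correct. That is the situation described above for rated timing.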
SIGNATURE ANALYSIS TEST RESULTS
Hardware was included in the Test Chip to permit a thorough analysis of the aliasing behavior of both serial and parallel signature analyzers. Intermediate signatures were also taken, which can be processed in various ways, such as investigating multiple signature schemes. Due to area limitations, only one of the CUTs, the MUL, has a signature analyzer.
No instance of aliasing was observed for any of the 168 tests applied to each MUL CUT type. This is probably not very surprising, as the sample size for the signature analysis experiment turned out to be very small (only 40 MUL CUTs failed any of the sampling tests at normal voltage).
Another purpose of the signature analysis experiment was to see what fraction of failures behave exactly as stuck-at faults. Of the 40 defective MUL CUTs, 17 were classified as having TIC defects and 7 matched single stuck-at faults based on the failure counters.
A dictionary of single stuck-at fault (SSF) signatures was computed, against which the observed faulty signatures were matched. Table 5 shows that for 5 CUTs every single failing signature matched a single stuck-at signature, and 2 CUTs matched most of the failing signatures. This is strong evidence that there are some defects (5 of 40 in this case) that behave just like stuck-at faults.
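The dictionary matching step can be sketched as a lookup that scores each modelled fault by the fraction of observed failing signatures it reproduces. The data layout, the function names `match_fraction` and `best_match`, and the fault and test names in the example are assumptions for illustration, not the procedure used in the experiment.

```python
from typing import Dict, Optional, Tuple

Signatures = Dict[str, int]  # test name -> signature value


def match_fraction(observed: Signatures, modelled: Signatures) -> float:
    """Fraction of observed failing signatures reproduced by one fault."""
    if not observed:
        raise ValueError("no failing signatures to match")
    hits = sum(1 for test, sig in observed.items() if modelled.get(test) == sig)
    return hits / len(observed)


def best_match(
    observed: Signatures, dictionary: Dict[str, Signatures]
) -> Tuple[Optional[str], float]:
    """Return the best-matching fault and its score, or (None, 0.0)."""
    best_fault, best_score = None, 0.0
    for fault, modelled in dictionary.items():
        score = match_fraction(observed, modelled)
        if score > best_score:
            best_fault, best_score = fault, score
    return best_fault, best_score
```

A score of 1.0 corresponds to a CUT for which every failing signature matches one single stuck-at fault; a score below 1.0 corresponds to a CUT that matches only some of them.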
VALIDITY OF EXPERIMENTAL
When doing an experiment of this nature there is always the question of repeatability of results. It has been reported in [25] that almost half of the failed parts passed retest. In this experiment, special effort was made to ensure repeatability, as many of the primary causes of problems mentioned in [26] have been minimized or checked (e.g. program software errors, improper initialization, race conditions, uncalibrated hardware, etc.).
As verification, the exhaustive test was repeated at the end of the test sequence (after the Very-Low-Voltage and IDDQ tests). 
PROPAGATION DELAY MEASUREMENTS
The results presented in this section were not part of the main experiment, and were done to investigate issues related to the modeling of delay, as limitations of conventional delay fault models have been suggested.
Accurate propagation delay measurements were taken for the SQR CUT, to investigate the effect of inaccurate modeling of gate delay in practice. This CUT has only 12 inputs, making the super-exhaustive test possible. The CUT consists of two cascaded multipliers, and there are 7~1 0 '~ structural paths in this circuit, so test pattern generation for all paths was not possible, as is often the case in practice. As this test was very time-consuming, it has only been done for 4 SQR CUTs. Figure 8 shows the clocking mode (PU clocking mode) and clock waveform used for this test. Patterns were applied to the CUT on the rising edge of the clock, and the CUT outputs were sampled on the falling edge of the clock. The advantage of this type of clocking is that the only timing-critical pin on the ATE is the clock, so that skew between tester pins does not have to be taken into account. The duty cycle of the clock was decreased in 25 ps steps until the CUT started failing.
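The measurement procedure amounts to a linear search over the clock high time. The sketch below assumes a hypothetical callback `passes(width_ps)` that applies the test with the given rising-to-falling edge spacing and reports whether the CUT passed; the function name and the starting width are also assumptions.

```python
from typing import Callable, Optional


def measure_delay_ps(
    passes: Callable[[int], bool],
    start_ps: int,
    step_ps: int = 25,
    min_ps: int = 0,
) -> Optional[int]:
    """Shrink the clock high time until the test first fails.

    Returns the last passing width in ps, or None if the start width fails.
    The delay exercised by the test lies within one step below that value.
    """
    width = start_ps
    last_pass = None
    while width >= min_ps and passes(width):
        last_pass = width
        width -= step_ps
    return last_pass
```

For instance, with a stand-in `passes = lambda w: w >= 7310` and a starting width of 8000 ps, the search returns 7325 ps, the last 25 ps step at which the test still passed.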
The tests applied are shown in Table 7 .
The critical path test only tested the 100 longest paths for rising and falling transitions. The paths were chosen using a simple gate delay model (unit delay). For the critical path, gate delay, and robust tests, the test generators left unused inputs at X. The tests were repeated with 0s assigned to the X's, as well as 0 or 1 randomly assigned to the X's.
All vectors were sampled, as well as every second vector as is normally done for delay testing. Figure 9 shows the experimental results for the SQR CUT. The propagation delays on four die were measured, and the maximum and minimum values relative to the "super-exhaustive" test were plotted. The longest delay is not exercised by any test except the "super-exhaustive" test. For this circuit, the results for the single-stuck, transition, and critical path tests were very similar. For example, the stuck-at test needs to be applied about 10% faster than the desired speed of the CUT in order to detect timing failures.
Three exhaustive tests were also applied. The first was generated with a primitive polynomial, the second is a Gray code with single bit transitions between vectors, and the third test maximizes the number of transitions between vectors (either n or n-1, for an n-bit vector). The Gray code performs very poorly, and the circuit must be clocked at least 22% faster than the worst-case delay to detect the delay fault. The node activity for the three exhaustive tests was computed, and as expected, the activity for the Gray code was significantly lower than for the other two tests (12% compared to 24% for the pseudo-random, and 33% for the maximal-transition test). The propagation delay measurements in this section show that the longest delays in the circuit are not exercised by using test sets generated using a simple delay model. Even passing an exhaustive test does not guarantee that the circuit functions at the designed speed.
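The difference in input activity between the Gray code and a high-transition ordering can be seen in a short sketch. The reflected binary Gray code below is standard; the high-transition ordering is one possible construction (complementing every second Gray code word), chosen for illustration and not necessarily the sequence used in the experiment. It requires an even number of inputs and flips n-1 bits at every step.

```python
from typing import List


def gray_code(n: int) -> List[int]:
    """Reflected binary Gray code: consecutive words differ in one bit."""
    return [i ^ (i >> 1) for i in range(1 << n)]


def high_transition_sequence(n: int) -> List[int]:
    """Exhaustive ordering in which consecutive words differ in n-1 bits.

    Assumption: n is even, so that complementing every second Gray code
    word still visits each n-bit pattern exactly once.
    """
    if n % 2:
        raise ValueError("this construction needs an even number of bits")
    ones = (1 << n) - 1
    return [g ^ ones if i & 1 else g for i, g in enumerate(gray_code(n))]


def input_transitions(sequence: List[int]) -> List[int]:
    """Number of input bits that change between consecutive vectors."""
    return [bin(a ^ b).count("1") for a, b in zip(sequence, sequence[1:])]
```

For the 12-input SQR CUT, `input_transitions(gray_code(12))` is 1 at every step, while the high-transition ordering changes 11 inputs at every step. This gives some intuition for why the Gray code produces much lower node activity.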
The critical path test was no better than the stuck-at tests. This shows the danger in testing only a small fraction of the paths in the circuit, and using an inaccurate model to choose the paths.
(Approximately 100 measurements were remade to check the repeatability of the ATE, and all measurements for the SQR CUT were within 25 ps, so there were no ATE consistency problems.)
CONCLUSION
This paper presents new results from the Test Chip experiment, covering different clocking modes and speeds. The emphasis of the paper has been on timing failures.
The main result is that even for a mature process, timing or pattern dependent defects make up a significant fraction (44%) of the defect population. There are at least 42 CUTs with defects that are timing dependent out of 128 CUTs, and at least 54 CUTs for which the behavior depends on the previous pattern applied. These could be delay, stuck-open, or even feedback bridging faults.
Of the 72 TIC defects at most 41 were accurately modeled by stuck-at faults (the signature analysis shows that some defects behave very much like stuck-at faults). Due to the detection of non-targeted faults, however, the escape rate for stuck-at tests for different clocking modes was between 6 and 15 CUTs for slow timing, and between 3 and 15 CUTs for rated timing.
In terms of individual test sets, it is difficult to compare those with few escapes with enough statistical significance. However, there was a definite drop in defect coverage when the tests were run at slow speed, and generally, "delay" application of vectors was better than "at-speed" application of vectors. No test set had fewer than 7 escapes with "at-speed" application of vectors at slow speed.
The Stability Checker designs functioned as expected at the lower clock rates, and cases of test invalidation by hazards were found.
The results of the propagation delay measurements show that none of the tests applied sensitized the longest delay through the circuit, as indicated by their performance versus the super-exhaustive test.
