An experiment has been designed to evaluate multiple testing techniques for combinational circuits. To perform the experiment, a 25k gate CMOS Test Chip has been designed, manufactured (5491 devices), and evaluated with over 300 tests. The chip contains five types of CUTs derived from functions in production ASICs.
Introduction
Many test techniques have been proposed to achieve the quality levels now required for digital integrated circuits. In order to make optimum test choices, the various testing approaches can be compared with respect to several criteria. These include the impact on performance and area of the circuit, required design time, test vector size, test time, automated test equipment (ATE) requirements, and finally, thoroughness of the test in detecting faulty circuits. Of these factors, test thoroughness, usually specified as escape rate or defect level, is the most difficult to quantify. The reason is that, while the other factors can be accurately calculated, escape rate predictions must be verified empirically.
This experiment is a collaboration among several organizations with the common interest to gather more information on test techniques and their associated escape rates. The partners include an ASIC manufacturer (LSI Logic), an ASIC user (Hughes Aircraft Co.), a test service (Digital Testing Services), and the Center for Reliable Computing (CRC) at Stanford University.
The experiment is designed to achieve the following objectives:
1 
Test Chip Architecture
Fundamental to our approach has been the design and manufacture of a dedicated test IC containing a number of combinational circuits-under-test (CUTs), which are representative of real world designs, and on which a wide range of different test methodologies may be evaluated. There are provisions to perform exhaustive tests on each CUT, as an absolute reference.
A number of practical considerations also enter into the design, to minimize both device and testing costs. Considerable support circuitry is provided to enable all evaluations to be performed by wafer probing, eliminating packaging. Support circuitry also minimizes the burden on Automated Test Equipment (ATE) by providing clocking and response strobing on-chip (even with low speed ATE and fixturing) and by performing all response evaluations (except IDDQ) on-chip. Finally, a mechanism is provided by which experiments can be performed without revealing, or needing to know, the absolute yield attained at the source foundry. The protection of proprietary yield information simplifies the acquisition of devices for experiments and, along with the fact that the design is a gate array, makes multiple foundry sourcing a possibility.
A common data source is used for all CUTs. Tests can be applied either by the built-in self-test (BIST) circuitry (for pseudo-random and exhaustive tests) or by an external tester (ATE). In either case, at-speed, shifted vector pairs, and timed two-pattern tests can be applied to the CUTs. Clocks may be applied directly by ATE or derived from slower ATE clocks. The derived clocks have periods which are set either from the applied clock pulse width, or from an internal delay line which tracks process and operating conditions. The internal delay line may be monitored as a ring oscillator to measure intrinsic chip
The response analysis circuitry enables the use of long pseudo-random tests and the exhaustive tests (since the responses would otherwise overwhelm the ATE storage capability). The response analysis circuits are used for tests whose source is ATE as well. This relieves output pin requirements as well as ATE response storage.
The response analysis consists of a real time comparison of the outputs of the four identical copies of each CUT. Thus, failure of any CUT copy constitutes a detected fault within that CUT set. Each response analyzer records 1) the address of the first detected failure in a pattern set and 2) the total number of failure events occurring during a complete pattern set.
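The comparison scheme described above can be illustrated in software. The following is a minimal behavioral sketch, not the on-chip logic: the function name `analyze_responses` and the representation of each copy's outputs as one integer word per pattern are assumptions made for illustration only.

```python
from typing import Iterable, Optional, Sequence, Tuple


def analyze_responses(
    responses: Iterable[Sequence[int]],
) -> Tuple[Optional[int], int]:
    """Compare the outputs of identical CUT copies, pattern by pattern.

    Each element of `responses` holds the output words of all copies for
    one applied pattern. Returns (address of first failure, failure count).
    """
    first_failure = None
    failure_count = 0
    for address, copies in enumerate(responses):
        # Any disagreement among the copies counts as one failure event.
        if len(set(copies)) > 1:
            failure_count += 1
            if first_failure is None:
                first_failure = address
    return first_failure, failure_count


# Four copies, five patterns; the third copy disagrees on patterns 1 and 3.
example = [
    (0x3A, 0x3A, 0x3A, 0x3A),
    (0x15, 0x15, 0x14, 0x15),
    (0x00, 0x00, 0x00, 0x00),
    (0x2F, 0x2F, 0x0F, 0x2F),
    (0x01, 0x01, 0x01, 0x01),
]
print(analyze_responses(example))  # (1, 2)
```

Note that a comparison of this kind only flags patterns on which the copies disagree; it carries no expected-value reference of its own.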
Testing Strategy
The two stage testing strategy shown in Fig. 3-1 resulted from the following considerations. A premise of this experiment is to make as much data publicly available as possible, without disclosing sensitive process information. Furthermore, the main interest is in failures that are difficult to detect. (Gross failures are generally easy to test; the ability of tests to detect the difficult failures is more important for achieving high quality levels.)
Stage 1:
Speed.
The first test is a gross DC parametric test, consisting of gross IDD (<500pA), input thresholds, VOL/VOH, IOS, input leakage, and tristate leakage. The CrossCheck test circuitry and the test support circuitry on the die are then tested. Only the data from the dice that pass all these tests were available for this experiment. The disadvantage is that the gross wafer yield is not known, but there is no constraint on reporting any data after this point.
The second stage is the testing of the actual CUTs on die that pass the Stage 1 tests. The sample size, N (in this case 5491), of die that will be tested thoroughly is the number of devices which pass Stage 1. Note that this strategy requires the test support circuitry to be well tested, to minimize erroneous responses in the subsequent CUT test experiments. Stuck-at tests of support circuitry achieve 98.8% coverage, based on fault grading by Zycad XPLUS. The support circuitry is also subjected to an IDDQ test with 250 strobe points. (There has been no evidence of support circuitry failure on any actual testing done so far.) Also, since this is a standard gate array, it is impossible to modify the base array to strictly isolate the test support circuits and the CUTs during IDDQ tests and CrossCheck testing. In order to minimize ambiguity, the CUTs are held in fixed states while the support circuitry is screened with IDDQ and CrossCheck tests.
Test Chip Implementation
in the block diagram, Fig. 4 Crosscheck Observation Circuitry These are described in the following sections.
Circuits Under Test
The circuits-under-test (CUTs) are described in this section. The CUTs are representative of data-path and control logic, as well as various design styles. The requirements for the CUTs were:
Combinational logic.
24 or fewer inputs. The number of inputs was limited in order to permit exhaustive testing.
Few outputs, to reduce the response analysis circuitry, as well as increase the test difficulty by reducing fault detectabilities. The CUTs have either 6 or 12 outputs.
There are five CUT types: two multipliers and three control logic blocks, shown in Table 4-1. The three control logic blocks perform the same function, but were designed differently. The first implementation was synthesized using all available gates in the LFT150K library [8], the second was restricted to elementary gates (NOT, AND, OR, NAND, NOR), and the third is robust path-delay-fault testable. The function implemented is part of a control circuit from a DMA controller. The original circuit had 34 inputs and approximately 3,700 literals. Ten of the inputs were tied to 0 (to meet the 24 input CUT constraint) before synthesizing the three implementations.
One of the purposes of this part of the experiment was to investigate the effectiveness of multiple test techniques on three different circuit implementations of an identical function. Fig. 4-2 summarizes the steps taken to produce the final netlists.
The first implementation is STD, the "standard" implementation. It uses the unrestricted LFT150K library. The third implementation is ROB, which was synthesized to be robust path-delay-fault testable. The original netlist was collapsed to 2 levels and simplified using espresso [1], since the synthesis tool used for achieving robust delay fault testability required a flattened netlist. A 3-level robust path-delay-fault testable circuit was then generated using the procedure described in [7]. The Sequential Interactive Synthesis Program (SIS, [18]) was then used with constrained algebraic optimizations that preserve robustness to get a multi-level circuit. The technology mapping was done using Synopsys, using both simple gates and complex gates with no internal reconvergence.
The final ROB circuit was generated first, optimizing delay, then STD and ELM were synthesized with the same target delays.
The sizes of the three implementations are given in Table 4 
12x12 Multiplier (MUL)
This is a 12x12 partial product multiplier composed of 6x6 multiplier building blocks, as shown in Fig. 4-3. Only the twelve most significant bits of the output are implemented. Using the most significant bits reduces the response evaluation circuitry, as well as decreasing the observability of some faults in the multiplier. This CUT has a nominal post-layout delay of 33.8 ns.
gate count of 1, and a 4-input NAND has a gate count of 3 [8]. The number of paths without single-path-propagating
6x6 Multiplier Followed by Squarer (SQR)
those used on the 12x12 multiplier. The multipliers are cascaded, and the second multiplier acts as a squarer since both inputs are fed by the output of the first multiplier. Some redundancies were eliminated by hand in the final circuit, and only the 6 most significant bits of the output are implemented. This CUT has a nominal post-layout delay of 35.4 ns.
predicted by LSI Logic MDE toolset.
Paper 28.2
The main reason for including this circuit is to have at least one CUT with few enough inputs to permit a 2^(2N) exhaustive test to be applied. The 2^(2N) exhaustive test provides a thorough reference for delay faults.
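As a short worked count, assuming the exhaustive two-pattern test is read as every ordered pair of input vectors (an interpretation, since the text does not spell it out), a CUT with N inputs has 2^N possible first vectors and 2^N possible second vectors:

```latex
\[
  2^{N} \times 2^{N} = 2^{2N},
  \qquad
  N = 12 \;\Rightarrow\; 2^{24} = 16{,}777{,}216 \text{ vector pairs.}
\]
```

For a 24-input CUT the same count would be 2^48, which illustrates why only the 12-input circuit is a practical candidate for this test.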
Response Evaluation Circuits
The response evaluation circuits, EVAL1 through shown in Fig. 4-4
Operating Modes
Data is applied to the CUTs using an on-chip parallel output linear feedback shift register (LFSR). There are three ways to apply data to the CUTs:
Parallel Load
In this mode, data at the parallel data inputs is clocked into the source register from ATE at the rising edge of the main clock. Both single and two-pattern delay tests are applied in this mode. For two-pattern tests, CUT outputs are observed on even input vectors, and masked on odd input vectors.
Simulated Scan
ATE test vectors can be applied in a "simulated scan" sequence in two clock cycles. At the first clock, the input vector rotated by one bit position is applied to the CUTs. At the second clock, the original input vector is applied. This mode is mainly for convenience in applying patterns generated for scan; an identical sequence is possible in Direct mode by modifying the patterns to supply the two-pattern tests from the ATE.
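The two-clock sequence can be sketched as follows. This is an illustrative model only: the rotation direction is not stated in the text, so a one-bit left rotation is assumed here, and the names `rotate_left` and `simulated_scan_pair` are hypothetical.

```python
from typing import Tuple

WIDTH = 24  # number of CUT data inputs


def rotate_left(vector: int, width: int = WIDTH) -> int:
    """Rotate a width-bit vector left by one bit position."""
    mask = (1 << width) - 1
    vector &= mask
    return ((vector << 1) | (vector >> (width - 1))) & mask


def simulated_scan_pair(vector: int, width: int = WIDTH) -> Tuple[int, int]:
    """Return the (first clock, second clock) vectors for one ATE vector.

    First clock: the vector rotated by one bit (direction assumed).
    Second clock: the original vector.
    """
    mask = (1 << width) - 1
    return rotate_left(vector, width), vector & mask


first, second = simulated_scan_pair(0x800001)
print(f"{first:06X} {second:06X}")  # 000003 800001
```

The pair differs by a single shift, which is the relationship a scan chain imposes between the last shift state and the launched vector.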
Pseudo-random
In this mode, the data source register is configured as an LFSR (linear feedback shift register) to provide an exhaustive test in 2^24 clocks. The LFSR implements the primitive polynomial
f ( x ) = x W + X 6 + X + 1
Because an LFSR never reaches the all-zeros state from a non-zero seed, the all-zeros pattern must be supplied in the direct load mode. The LFSR must be initialized with a non-zero seed, by operating in direct load mode (mode 0) with the seed applied at the parallel data inputs, DIN(23:0).
The pseudo-random mode also supplies a 2^(2N) exhaustive test to the fifth CUT, the 12-input multiplier-squarer, in 2^24 clocks.
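The way a maximal-length LFSR plus one directly loaded all-zeros pattern covers every input combination can be shown on a small scale. The sketch below is an assumption-laden illustration: it uses a 4-bit Galois-style LFSR with the well-known primitive polynomial x^4 + x + 1 rather than the chip's 24-bit polynomial, and the feedback structure of the on-chip register is not specified in the text.

```python
from typing import Iterator, List


def lfsr_states(seed: int, taps: int, width: int) -> Iterator[int]:
    """Yield the states of a Galois LFSR until it returns to the seed."""
    if not 0 < seed < (1 << width):
        raise ValueError("seed must be non-zero and fit in the register")
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps  # apply the feedback polynomial
        if state == seed:
            return


def exhaustive_patterns(seed: int, taps: int, width: int) -> List[int]:
    """All-zeros pattern (loaded directly) followed by the LFSR sequence."""
    return [0] + list(lfsr_states(seed, taps, width))


# 4-bit demonstration: taps 0b1001 corresponds to x^4 + x + 1 (assumed example).
patterns = exhaustive_patterns(seed=1, taps=0b1001, width=4)
print(len(patterns), sorted(patterns) == list(range(16)))  # 16 True
```

The LFSR alone visits 2^n - 1 states, so the directly loaded all-zeros vector is what completes the 2^n exhaustive set.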
CUT Isolation
Control inputs are provided to disable the data to each CUT type. When the corresponding control signal is asserted, the CUT inputs are forced to logic zero.
Using the data clocked into the register, the outputs of the second, third and fourth CUT copies are compared to the outputs of the first copy. The result appears at the CPASSF output.
The stability checkers observe each output for changes after the sample clock. If any output changes after the sample clock, the PPASSF output of the EVALn is asserted. The occurrence of a stability failure in the absence of a corresponding Boolean failure may indicate a fault being masked by a hazard. The clock generator modes are described in section 4.6, "Timing".
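The distinction between Boolean failures and stability-only failures can be made concrete with a small behavioral model. This is a sketch under stated assumptions: the length of the observation window after the sample clock is not given in the text, and the function name `classify_cycle` and its string labels are hypothetical.

```python
from typing import Sequence


def classify_cycle(
    boolean_failure: bool,
    sampled: int,
    later: Sequence[int],
) -> str:
    """Classify one test cycle.

    boolean_failure: True if the CUT copies disagreed at the sample clock.
    sampled: output word captured at the sample clock.
    later: output words observed after the sample clock (window assumed).
    """
    # A stability failure is any change of the outputs after sampling.
    stability_failure = any(value != sampled for value in later)
    if boolean_failure and stability_failure:
        return "boolean+stability"
    if boolean_failure:
        return "boolean"
    if stability_failure:
        return "stability-only"
    return "pass"


# Outputs matched at the sample clock but glitched afterwards.
print(classify_cycle(False, 0x2A, [0x2A, 0x2B, 0x2A]))  # stability-only
```

A "stability-only" result corresponds to the case singled out above, where a late transition may point to a fault that the sampled comparison alone would miss.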
Failure Counters
The counters record the first clock cycle in which a failure occurred and the total number of failures occurring during the test.
5
Total Stability-only Failures 16
The counters for each CUT type are concatenated in a scan string, as shown in Fig. 4-5.
There are a total of 100 flip-flops in the chain, ordered LSB to MSB. To initialize the counters, the test pattern shifts zeros into the entire chain. The failure counters have a masking control signal (not shown) which causes the counters to ignore failures. This is used when failures are to be recorded only on certain cycles, as in delay testing or simulated scan.
In normal use, the tester reads and records the values of the counters at the end of the test.
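Reading the counters back amounts to splitting the scanned-out bit string into fields, each stored least significant bit first. The sketch below shows one way a tester program might do this. The field names and widths in `LAYOUT` are hypothetical placeholders, not the chip's actual 100-bit layout, and `decode_scan_chain` is an illustrative helper.

```python
from typing import Dict, List, Sequence, Tuple

Field = Tuple[str, int]  # (counter name, width in bits)


def to_lsb_first(value: int, width: int) -> List[int]:
    """Serialize an unsigned value as a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def decode_scan_chain(bits: Sequence[int], fields: Sequence[Field]) -> Dict[str, int]:
    """Split a scanned-out bit string into counters, each stored LSB first."""
    expected = sum(width for _, width in fields)
    if len(bits) != expected:
        raise ValueError(f"expected {expected} bits, got {len(bits)}")
    values: Dict[str, int] = {}
    position = 0
    for name, width in fields:
        chunk = bits[position:position + width]
        values[name] = sum(bit << i for i, bit in enumerate(chunk))
        position += width
    return values


# Hypothetical layout for illustration; the real field widths are defined by the chip.
LAYOUT: List[Field] = [("first_failure_address", 24), ("total_failures", 16)]

chain = to_lsb_first(1234, 24) + to_lsb_first(7, 16)
print(decode_scan_chain(chain, LAYOUT))
# {'first_failure_address': 1234, 'total_failures': 7}
```

Checking the chain length before decoding guards against a mis-counted shift, which would otherwise silently misalign every field.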
Signature Register
Due to hardware constraints (available gate area), signature analysis is applied to only one CUT, the 12 x 12 multiplier (MUL). The outputs of the MUL are fed to a configurable signature register, shown in Fig. 4-6.
The register is segmented to support signature compression in four configurations (parallel mode):
In serial mode, one of the 48 MUL outputs is selected as the serial input, and only the first (leftmost) portion of the signature register is used. The four signature sizes are also available in this mode.
For test and initialization, the register operates in a simple scan mode. The scan output is brought to the chip I/O, and can be monitored to aid diagnosis of CUT failures.
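Parallel signature compression of the kind described in this section is commonly implemented as a multiple-input signature register (MISR). The following is a generic sketch, not the chip's register: the 4-bit width, the feedback taps (corresponding to x^4 + x + 1), and the function names are all assumptions chosen to keep the example small.

```python
from typing import Iterable


def misr_step(state: int, data: int, taps: int, width: int) -> int:
    """Advance a MISR by one clock, folding in one parallel data word."""
    mask = (1 << width) - 1
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & mask
    if msb:
        state ^= taps  # feedback from the bit shifted out
    return state ^ (data & mask)


def signature(words: Iterable[int], taps: int, width: int, seed: int = 0) -> int:
    """Compress a stream of output words into a single signature."""
    state = seed
    for word in words:
        state = misr_step(state, word, taps, width)
    return state


good = [0x1, 0x2, 0x3]
bad = [0x1, 0x6, 0x3]  # one corrupted word
print(hex(signature(good, taps=0b0011, width=4)))  # 0x3
print(hex(signature(bad, taps=0b0011, width=4)))   # 0xb
```

A shorter signature uses less area but raises the chance that a faulty response stream aliases to the fault-free signature, which is one reason to offer several signature sizes.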
Timing
The data sources and failure counters are clocked by the rising edge of the Source Clock (see Fig. 4-7 and Fig. 4-8). The output sample registers are clocked by the Sample Clock, generated by one of three methods: direct clocking based on tester cycle period, externally generated clocking, and internally generated clocking.

Direct Clocking from ATE
This is the simplest clocking mode, corresponding to single clock synchronous designs. A new pattern is applied on each rising edge of the master clock, and the outputs are sampled at the subsequent rising edge. The master clock from the ATE is used as both the Source Clock and the Sample Clock. Stability error detection is disabled in this clocking mode.
Pulse Width Generated Clocking
In the pulse width generated clocking mode, the time from CUT input pattern application to CUT output sampling is precisely controlled, independent of the tester clock period. This allows the tester and fixture to operate at a low data rate, provided it can deliver an accurate clock pulse width. As seen in Fig. 4-9, the output clock is generated from the falling transition of the master clock.
Internally Generated Clocking
In the internally generated clocking mode, the time from CUT input pattern application to CUT output sampling is determined by delay elements in the clock generators in each EVALn partition. This places minimal requirements on the tester and fixture, since the delay test is independent of pulse width as well as clock period. Also, the test is equally stringent for all dice, since process variations are taken into account automatically. Fig. 4-10 shows the timing relationships in this mode.
Crosscheck
The test chip includes the CrossCheck testability solution, which allows unobstructed observation of every node in
To test the chip, vectors are applied to achieve high toggle coverage in the logic. At each vector, the CrossCheck Test Electronics activates the probe lines in sequence, accumulating a signature of the data received on the sense lines. The signature is shifted out and compared with an expected signature derived from simulation.
Implementation
Netlists
The full chip netlist is available in hierarchical LSI Logic NDL and EDIF formats. The CUT netlists are also available separately in NDL and EDIF formats. The netlists may be obtained by contacting CRC.
Hardware Details
The Test Chip was implemented in LSI Logic LFT150K technology [8]:
1.0 micron drawn gate length (0.7 micron effective channel length)
Two level metallization
Embedded CrossCheck observation network
Macrocells functionally equivalent to LSI Logic LCA100K series
Summary
The experiment and chip design have met the requirements of the project:
Large number of test techniques: The chip supports externally applied patterns as well as internally generated exhaustive patterns, each of which can be applied in multiple clock modes. For 
