This paper describes the overall test methodology used in implementing the S/390* microprocessor and the associated L2 cache array in shared multiprocessor designs, the design-for-test implementations, and the test software used in creating the test patterns and in measuring test effectiveness. Microprocessor advances in architectural complexity, circuit density, cycle time, and technology-related issues, coupled with IBM's high requirements for quality, reliability, and diagnosability, have made it necessary to develop testing methods and attain quality levels that far exceed what others have approached.
Introduction
The advent of deep-submicron technology has given rise to integrated circuits containing hundreds of thousands of logic gates, embedded memories approaching the megabit range, I/O counts in the thousands, and operating frequencies in the hundreds of MHz. Along with the benefits of such characteristics and the design flexibility necessary to achieve them come severe design and test challenges. In particular, traditional methods of testing semiconductor devices are quickly becoming obsolete. The use of functional patterns derived for design verification as manufacturing test patterns is becoming increasingly unacceptable. Some of the most severe problems associated with this approach are high test development times, defect coverages that are low or hard to measure, and poor diagnosability. As far back as fifteen to twenty years ago, test techniques were developed within IBM and in industry which based analysis on the design structure rather than on functionality [1]. Within IBM, these techniques have been evolving from the 308x testing in the early 1980s to the 3090* testing in the later '80s, to high-density CMOS parts in the early '90s [2-13]. These techniques have led to the development of automatic test-pattern generation (ATPG) algorithms and tools [14-19]. Although ATPG-based approaches to digital testing have met with some success, they also are becoming increasingly ineffective as chip sizes increase. Indeed, time requirements for ATPG algorithms grow nonlinearly in relation to the size of the circuit under test [20].
However, the largest problem with both the functional and ATPG-based test techniques is their reliance on the use of automatic test equipment to apply the test patterns to the device's external inputs and measure responses on the device's external outputs. This approach does not provide a means to adequately detect all of the device's internal defects. Direct access to the internal structures of a device is necessary. This requirement has led to the development of design-for-test (DFT) and built-in self-test (BIST) techniques and methods [21-27].
DFT techniques consist of design rules and constraints aimed at increasing the testability of a design through increased internal controllability and observability. The most popular form of DFT is scan design, which involves pins, an on-chip phase-locked loop (PLL) was used to multiply the incoming tester frequency to bring it up to the operating frequency of the chip. Additional self-generated clock (SGC) [22, 29, 30] circuitry is then used to generate the various system clock sequences needed to properly exercise all portions of the chip.
The BIST techniques can be divided into two major categories: logic BIST (LBIST) to test at-speed the logic in the devices, and array BIST (ABIST) to provide at-speed testing of the embedded arrays (i.e., RAMs). The basic idea in LBIST is to add a pseudorandom-pattern generator (PRPG) to the inputs and a multiple-input signature register (MISR) to the outputs of the device's internal scan chains. A BIST controller generates all necessary waveforms for repeatedly loading pseudorandom patterns into the scan chains, initiating a functional cycle (capture cycle), and logging the captured responses out into the MISR. The MISR compresses the accumulated responses into a code known as a signature [31]. Any corruption in the final signature at the end of the test indicates a defect in the chip. This LBIST architecture is known as a STUMPS [6] architecture (self-test using MISR and parallel shift register sequence generator), and the scan chains connecting the PRPG and MISR are defined as the STUMPS channels.
Although pseudorandom patterns achieve high test coverage for most scan-based designs, some areas within the design may be inherently resistant to testing with such patterns. Supplemental patterns designated as weighted random patterns (WRP) [15, 16] are therefore used during manufacturing test. WRP avoids the large test data volume that would be needed to drive conventional stored-pattern logic tests. External tester hardware is used to force individual bits in scan-based random test patterns to be statistically weighted toward a logic one or zero. Compared with LBIST alone, this method greatly reduces the number of random patterns needed for obtaining high test coverage, thereby greatly reducing test time.
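The benefit of weighting can be illustrated with a small software sketch. It is only an illustration: in the methodology described here the weighting is done by external tester hardware, and the function names, the 16-input AND example, and the weight of 0.9 are assumptions chosen for the arithmetic. The sketch draws one weighted pattern and estimates how many random patterns are needed, on average, before all inputs of a wide AND gate are 1 at the same time.

```python
import random


def weighted_pattern(weights, rng):
    """Draw one scan pattern; weights[i] is the probability that bit i is 1."""
    return [1 if rng.random() < w else 0 for w in weights]


def expected_patterns(weight, n_inputs):
    """Mean number of random patterns until n independent inputs are all 1.

    Each input is 1 with probability `weight`, so one pattern succeeds with
    probability weight ** n_inputs (a geometric distribution).
    """
    return 1.0 / (weight ** n_inputs)


rng = random.Random(0)
print(weighted_pattern([0.9] * 16, rng))   # mostly ones
print(expected_patterns(0.5, 16))          # unweighted pseudorandom bits
print(expected_patterns(0.9, 16))          # bits weighted toward one
```

With unweighted bits the 16-input condition is met about once in 65,536 patterns; with the bits weighted to 0.9 it is met roughly once in every five or six patterns, which is the kind of reduction in pattern count the text refers to.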
For the ABIST test, a controller based on a programmable-state machine is used to algorithmically generate a variety of memory test sequences. As with LBIST, test patterns can be applied to the embedded array at cycle speeds. Because of the regular structure of arrays, an ABIST controller can be shared among several arrays. This not only reduces the overhead per array, but allows for decreased test times, since the arrays can be tested in parallel.
To ensure good-quality chips, several IBM standard manufacturing tests are applied. In addition to those already mentioned, the types of tests include a boundary-scan I/O test, parametrics, IDDQ, excessive voltage and temperature stressing, and dynamic burn-in [13]. Stressing the devices with voltage stressing and burn-in conditions helps guarantee the very high quality and reliability necessary for the mission-critical applications for S/390 servers.
The DFT implementation requires chip area dedicated to test functions; however, the bulk of the logic is also used for system initialization, recovery, and system failure analysis. The amount of logic dedicated to manufacturing test on these high-density CMOS parts utilizes less than 1% of the overall chip real estate.
Design for test-LBIST
LBIST is used for manufacturing test at all package levels and for system self-test. The main LBIST components are a PRPG and a MISR. These two components are connected to chip scan chains to form the overall LBIST structure.
The basic LBIST logic test sequence used to apply test patterns is as follows:
1. The PRPG and MISR are initialized to a predetermined state known as a seed. Then, the circuitry loops on Steps 2 and 3 for n patterns.
2. Scan clocks are applied to the PRPG, MISR, and system latches so that a pseudorandom pattern is generated by the PRPG and loaded into the system latches. Simultaneously, the result of the previously applied test pattern is compressed from the system latches into the MISR.
3. System clocks are applied to the system latches to test the logic paths between the latches. Test patterns are both launched and captured by the latches in the scan chains.
4. After n repetitions of Steps 2 and 3, the signature in the MISR is compared against an expected predetermined signature that was calculated during the test-pattern generation and simulation process.
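The test sequence above can be sketched in software as follows. This is a minimal model under stated assumptions, not the chip's implementation: the 8-bit register width, the tap positions, the seeds, and the `good_logic` and `faulty_logic` functions are all hypothetical, each pattern is loaded with a single register step rather than a full scan load, and the unload of one response is modeled before the next load instead of simultaneously with it.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1
TAPS = (7, 5, 4, 3)  # hypothetical feedback taps, chosen only for this sketch


def shift_with_feedback(state):
    """Advance a linear-feedback shift register by one step."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    return ((state << 1) | feedback) & MASK


def run_lbist(logic, n_patterns, prpg_seed=1, misr_seed=0):
    """Return the MISR signature after n_patterns load/capture loops."""
    prpg, misr = prpg_seed, misr_seed                  # Step 1: seed PRPG and MISR
    for _ in range(n_patterns):
        prpg = shift_with_feedback(prpg)               # Step 2: load a new pattern
        response = logic(prpg) & MASK                  # Step 3: system clocks capture
        misr = shift_with_feedback(misr) ^ response    # unload compresses into MISR
    return misr                                        # Step 4: compare to expected


def good_logic(pattern):
    """Stand-in for the fault-free logic between the latches."""
    return (pattern * 3) & MASK


def faulty_logic(pattern):
    """Same logic with output bit 0 stuck at 1."""
    return good_logic(pattern) | 1


expected = run_lbist(good_logic, 4)        # from simulation of the good design
observed = run_lbist(faulty_logic, 4)      # from the device under test
print("pass" if observed == expected else "fail")
```

The expected signature comes from simulating the fault-free model, mirroring the way the predetermined signature is calculated during test-pattern generation and simulation.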
Although the LBIST sequence is straightforward, there are multiple means to apply the sequence to perform different categories of logic tests. If the test is required to verify only that the logic structure between the latches is correct and has no stuck-at faults, the LBIST test can be applied with static, nontransitional patterns. The time between launch of data from one system latch and data capture in another system latch is irrelevant, so data are scanned into the latches in a nonskewed state such that the master and slave latches contain the same data. When system clocks are applied, there is no transition of data on the launching latches.
If the LBIST test is to determine not only that the logic between system latches is correct, but also that the propagation delay from one system latch to another occurs within a predetermined delay, a transition test is applied. In this test, data are scanned into the latches in a skewed state such that the master and slave latches potentially have different values so that the launch clock will create a transition.
[Figure: LBIST PRPG architecture.]
LBIST is used on the tester during manufacturing test and during system self-test. During manufacturing test, the tester can apply the necessary signals to scan the shift-register chains, cycle the PRPG and MISR, and apply the system clocks at the proper time. In the system, there are no available resources external to the chip to control the LBIST circuitry. These controls are generated on-chip by the self-test control macro (STCM). The STCM executes the LBIST test sequence in a stand-alone manner. In fact, an entire self-test sequence of the entire system can be initiated at a customer office via a modem/service processor controller.
LBIST design implementation
Several unique features were required in the logic implementation to support the various aspects of the LBIST methodology. The PRPG shown in Figure 2 is a 61-bit linear-feedback shift register (LFSR) with a feedback configuration utilizing taps 0, 14, 15, and 60.
To minimize data dependencies, the outputs of the LFSR are passed through an XOR spreading network before being applied to the logic. This spreading network is used to minimize latch adjacency dependencies between subsequent stages of the LFSR. Each stage of the LFSR has an associated two-input XOR which is fed from that stage and bit 0 of the LFSR. The output of the LFSR is applied to the appropriate STUMPS channel scan input.
The MISR is also 61 bits long and has a feedback configuration similar to that of the PRPG. Unlike the PRPG, the MISR has a two-input XOR between each of the latch stages, which allows for 61 bits of data from the STUMPS channel scan outputs to be clocked into the MISR on each LBIST scan cycle in the process of generating the signature.
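A minimal sketch of the 61-bit PRPG, spreading network, and MISR described above is given here. Only the register length and the tap positions (0, 14, 15, 60) come from the text. The shift direction, the placement of the feedback XOR, the decision to pass stage 0 through the spreading network unchanged, and reusing the PRPG taps for the MISR (the text says only that its feedback is "similar") are assumptions made for illustration.

```python
WIDTH = 61
MASK = (1 << WIDTH) - 1
TAPS = (0, 14, 15, 60)  # tap positions from the text; their wiring is assumed


def lfsr_step(state):
    """One shift of the 61-bit LFSR: XOR the tapped stages into stage 0."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    return ((state << 1) | feedback) & MASK


def spread(state):
    """XOR spreading network: stage i is XORed with stage 0 (stage 0 passes through)."""
    if state & 1:
        return state ^ (MASK & ~1)
    return state


def channel_inputs(state):
    """The 61 bits applied to the STUMPS channel scan inputs."""
    word = spread(state)
    return [(word >> i) & 1 for i in range(WIDTH)]


def misr_step(state, channel_outputs):
    """One MISR shift: feedback shift plus a parallel XOR of 61 channel outputs."""
    return lfsr_step(state) ^ (channel_outputs & MASK)


state = 1
for _ in range(3):
    state = lfsr_step(state)
    print(format(spread(state), "061b"))
```

Because stage 60 takes part in the feedback, each step of this model is reversible, so a nonzero seed never collapses to the all-zero state, which would otherwise lock the generator.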
The primary purpose of the STCM in Figure 3 is to control the on-chip LBIST test operation; however, it also functions as the main interface and controller for all other test functions, with the exception of ABIST execution, which has its own independent test engine. The functions of the STCM are as follows:
1. LBIST scan-clock generation and sequence controls.
2. Scan-chain configuration controls.
3. ABIST initialization.
4. External clock controls.
The mode of operation, or test mode, is determined by four bits of the general-purpose test register (GPTR). The GPTR is a register that is initialized prior to each test with static control information. The GPTR consists of scan-only latches, and its state remains constant throughout the application of a given set of tests. Details of the GPTR are discussed later. 
GPTR The GPTR is a register that provides static
Scan-chain configuration controls The scan chains can be configured in several ways depending on the specified test mode:
LSSD (level-sensitive-scan-design) [3] mode All of the chip latches are configured in one long scan chain. This mode is used whenever all latches of the chip must be initialized to a specific state. It is the primary chip
GPTR mode This mode is a subset of the LSSD mode in which only the GPTR latches are contained in the scan chain. This mode is used to modify the state of the GPTR without changing the state of the chip system latches.
WRP mode This design implementation of WRP uses fifteen scan chains. In this mode, the scan latches are configured into the fifteen independent chains. This mode is used for applying WRP and deterministic test patterns during manufacturing test.
LBIST mode In this mode, the scan chains are configured as 61 STUMPS channels connected to the PRPG and MISR.
ABIST initialization
The ABIST engines are local to the different arrays on the chip, so there is no central ABIST state machine. ABIST engines are initialized by a scan-load operation. The function of the STCM is to specify the appropriate ABIST mode on the basis of the state of the test-mode bits in the GPTR and start the clocks to the ABIST engines. The STCM starts the ABIST tests by generating an SGC-go to the SGC circuitry when a start-test signal is received. Each of the ABIST engines runs to completion and issues an ABIST-done. The STCM combines these done signals into one ABIST-done for the chip and propagates that signal to a chip output.
External clock controls
In all cases where internally generated clocks are used, external clocks can also be applied to generate the same test sequences. LBIST, WRP, or deterministic stored patterns can be applied with internal clocks or external tester-generated clocks without modifying the stimulate/measure latch data. This feature allows for the debugging of timing or data problems at slow speeds and is used quite often to verify the integrity of the patterns before attempting to apply the patterns at fast internal chip cycle times.
LBIST circuit design implementation
The LBIST macro is designed to communicate with the rest of the chip in an asynchronous manner, since there are no critical timing signals to or from the LBIST macro. This clocking approach is possible because of the asynchronous nature of the design; it allowed the macro to be synthesized and physically designed with no custom circuit layout techniques.
Design for test-ABIST
Memory array built-in self-test (ABIST) has traditionally consisted of a finite-state machine logic engine designed to apply a prescribed fixed set of memory test patterns to the memory array(s) under test. These tests typically include data blankets (all 0s and all 1s), word-line stripes, bit-line stripes, and checkerboards. Although these are performed in multiple addressing modes, including unique addressing (read before write), this pattern set is the limit of most finite-state-machine ABIST engines and is mostly unchangeable. A simplified logic model has registers of latches set up as counters in nested loops to perform the series of array addressing and reading/writing, and a two-latch data register that fans out to all of the array even data bits and odd data bits, respectively. Data patterns are limited by the combinations of this two-bit data register. The data out of the array are compared against this data-in register, and the pass/fail results are latched. Execution of finite-state-machine ABIST involves initializing the chip for ABIST, usually through the scan chain, and applying a sufficient number of system clocks, either externally or through the SGC, for the finite-state machine to reach its final state. The ABIST pass/fail results (and repairable array addresses for arrays with redundancy) are scanned out through the scan chain.
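The fixed pattern set of a finite-state-machine engine can be sketched from the two-latch data register described above. The sketch assumes one word per word line and adjacent data bits on adjacent bit lines, which need not match the physical layout of a real array; the function and pattern names are illustrative only.

```python
def data_word(even_bit, odd_bit, width):
    """Fan the two-latch data register out to the even and odd data bits."""
    return [even_bit if i % 2 == 0 else odd_bit for i in range(width)]


def array_pattern(name, n_words, width):
    """Return the data written to each word for one of the classic patterns."""
    # Each entry maps a word address to the (even_bit, odd_bit) register value.
    register_for = {
        "blanket0": lambda w: (0, 0),
        "blanket1": lambda w: (1, 1),
        "word_line_stripe": lambda w: (w % 2, w % 2),
        "bit_line_stripe": lambda w: (0, 1),
        "checkerboard": lambda w: (w % 2, 1 - w % 2),
    }
    pick = register_for[name]
    return [data_word(*pick(w), width) for w in range(n_words)]


for row in array_pattern("checkerboard", 4, 8):
    print("".join(str(bit) for bit in row))
```

Every word is one of only four values (00.., 11.., 0101.., 1010..), which is the limitation on data patterns that the two-bit register imposes.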
For the S/390 microprocessor and L2 cache designs there are several different custom embedded memory array designs, often having different test requirements.
The tight cycle-time and access-time requirements on these arrays caused their designs to be quite aggressive. Dynamic and self-resetting circuit techniques were used extensively. These aggressive arrays dictated a need not just for high-speed ABIST engines, but for a testing scheme that was flexible enough to help diagnose potential problems, stress array performance, and provide production-level testing ability.
A programmable ABIST design was implemented for these high-performance arrays [32, 33], with microprocessor-like function. The ABIST program to be applied is scanned into a custom microcode array, and each instruction is decoded, executed, and applied to the array by the ABIST microprocessor. The programmable ABIST design comprises eight basic components, as shown in Figure 5:
1. Microcode array: Scannable register array that contains the ABIST program to be executed (typically eight instructions).
2. Pointer control macro: Register that controls the address of the ABIST instruction being executed.
3. Address control macro: Register that controls the address to be applied to the array.
4. Data control macro: Register that controls the data to be applied to the array.
5. Read/write register: Register that controls the read/write mode to be applied to the array.
6. Result compression macro: Registers that either log pass/fail results and failing address data or store a passing/failing signature on the basis of the array data outputs.
7. Test control interface: Logic which communicates to the STCM and controls the test modes of the array.
8. Access timer macro: Digitally programmable timer which measures the access time of the array.
The core blocks of the programmable ABIST design are the microcode array, pointer control macro, and test control interface. For nearly all of the different memory array designs, each ABIST engine uses the same basic design for these core blocks. The address, data, and read/write macros need to vary only in size according to the address/data widths and read/write configurations of the arrays under test. The result compression macro generally consists of either a MISR (multiple-input signature register) on the arrays without redundancy or a data comparator and failed address registers on the arrays with redundant repairable addresses. Redundancy is a feature sometimes used on larger arrays to improve yield by replacing failing word lines with redundant (spare) word lines.
The microcode array is custom-designed as a dense, scannable, read-only register array, with fast access. It generally contains eight ABIST microcode instructions. Its scannability gives the ABIST great flexibility in that it can so easily be reprogrammed. Since it consists mainly of scan-only latches, the microcode array is implemented in a very area-efficient manner.
The programmable ABIST instruction set is small but quite powerful. A microcode instruction is basically broken down into five command fields. As shown in Table 1, a background of blanket 0s can be written into the memory array using just one instruction; several other examples are included in Table 1. Only a few instructions are needed to perform some very powerful operations.
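To make the idea of a one-instruction background write concrete, here is a toy interpreter. It is a sketch under loud assumptions: the actual five command fields and their encodings are not reproduced here, so the four fields of the hypothetical `Instr` class are guesses loosely modeled on the component list (pointer, address, data, and read/write control), and the compare-and-log behavior stands in for a result compression macro.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical microcode instruction; not the real field layout."""
    write: bool     # read/write field: True writes, False reads and compares
    data: int       # data field: word to write, or value expected on a read
    addr_inc: bool  # address field: increment the array address after the access
    loop: bool      # pointer field: repeat this instruction until the address wraps


def run(program, memory):
    """Execute the program on `memory`; return (address, got, expected) mismatches."""
    fails = []
    addr = 0
    pointer = 0
    while pointer < len(program):
        ins = program[pointer]
        if ins.loop and not ins.addr_inc:
            raise ValueError("a looping instruction must advance the address")
        if ins.write:
            memory[addr] = ins.data
        elif memory[addr] != ins.data:
            fails.append((addr, memory[addr], ins.data))
        wrapped = False
        if ins.addr_inc:
            addr = (addr + 1) % len(memory)
            wrapped = addr == 0
        if not ins.loop or wrapped:
            pointer += 1
    return fails


memory = [1] * 8
write_zeros = Instr(write=True, data=0, addr_inc=True, loop=True)
read_zeros = Instr(write=False, data=0, addr_inc=True, loop=True)
print(run([write_zeros, read_zeros], memory))  # empty list: no mismatches
```

A single looping instruction sweeps the whole address space, so a blanket background needs one instruction and a write-then-verify pass needs two.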
While it may sound like an expensive proposition to include the full array data input width in the data control macro, it actually can be done quite cheaply with the programmable ABIST design. The data, address, and read/write input fields to the S/390 arrays are usually logical system cycle boundaries. This means that a system series latch, or at least a listening latch for LBIST, must exist at the data, address, and read/write ports of the arrays. Either system data or ABIST data are multiplexed into these latches depending on the test mode of the array. The programmable ABIST design is able to use these latches to form its data, address, and read/write control macros. The cost of these macros becomes just combinational logic for these macros, with a few control lines from the ABIST microcode array to tell the macro what to do and when. With the ability to economically provide full-data-width testing to the memory arrays, not only can a multitude of new test patterns be thrown at the arrays (such as marching and walking patterns, which are typically not done by finite-state-machine ABIST engines), but there are system benefits as well. For example, the logic in the ABIST engine can now be used to initialize the data in the arrays to good parity and ECC upon power-on reset or even during on-the-fly reset recovery after error hits in the system.
The use of MISRs for signature compression on the outputs of memory arrays without redundancy in conjunction with a programmable ABIST has many advantages. A full data comparison of the array outputs is difficult to do at speed without complex circuitry and wiring. These arrays are usually placed in some of the most congested locations of the chip, where silicon area and wiring channels are at a premium. A MISR lends itself very well to very wide data words while keeping wiring channel usage and circuit complexity to a minimum. A properly implemented MISR can also be operated at very high rates of speed because of its relative logic simplicity. The MISR function on the nonredundant arrays is actually integrated into the data output register of these arrays. Since the data output register is already a necessary component of the array function, the integration of the MISR logic becomes even more economical compared to a full data comparator. Another limitation of a data-comparator type of compression is that the ABIST engine always has the burden of calculating the expected array data output for the comparator. This puts limitations on the complexity of the test patterns that can be applied to the array, especially with a finite-state-machine ABIST engine. With the MISR approach, the expected result is never calculated by the ABIST engine. All of the responses are merely compressed into the MISR final signature. The ABIST engine can generate patterns of any complexity while writing and reading them back in any order whatsoever. A programmable ABIST engine is well suited to generate patterns at these levels of complexity. The net results of a MISR approach with a programmable ABIST are more thorough pattern coverage and flexibility while maintaining high performance and minimum design overhead.
In addition to maximizing pattern coverage with the programmable ABIST, there is also much emphasis on performance characterization of the memory arrays. Each of the ABIST engines is designed to cycle well below the specified system cycle-time limit of the chip. In fact, the ABIST is able to cycle below the expected cycle-time limits of many of the individual arrays. This is no small feat, since ABIST operating frequencies in some cases go beyond 500 MHz. The benefit of this is not just the guarantee of chip system cycle performance from the arrays; cycle-time results can also be examined to point of failure on a per-array basis. Point-of-failure results enable the quantification of cycle-time guardbanding for the arrays as well as possible qualification of the failing circuitry for future improvements. Because the access time of a memory array may occupy as little as 50% of a full system cycle, access-time measurements can be even more important than cycle-time measurements. Each ABIST engine is equipped with a digitally programmable access timer macro [34] which is able to measure the access time of each array to a resolution of nearly 100 ps. The desired access-time setting is scanned into the timer, and on each array clock the timer supplies an access-time strobe, delayed by the scanned setting, to the data-compression macro. An access-time strobe is applied to the data-compression macro for every cycle of the entire ABIST program, yielding a worst-case access-time measurement across every pattern and every address in the array. When the pass/fail settings of the timer for a particular array have been determined, the timer is then configured in a recirculating-loop mode which allows the timer to oscillate at a rate corresponding to the access time of the array under test. This frequency is divided to produce a lower rate and multiplexed off the chip for an easily measured, very accurate access time of the array, based on this oscillator frequency.
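The conversion from the divided-down oscillator frequency back to an access time is simple arithmetic, sketched below. The divide ratio, the example frequency, and in particular the assumption that one oscillation period of the recirculating loop equals the access time (the hypothetical `periods_per_access` parameter) are illustrative; the real relationship depends on how the loop is constructed.

```python
def access_time_ns(measured_hz, divide_ratio, periods_per_access=1.0):
    """Convert the off-chip frequency to an access time in nanoseconds.

    measured_hz:        frequency observed at the chip output
    divide_ratio:       on-chip divider between the timer loop and the output
    periods_per_access: assumed number of loop periods per access time
    """
    loop_hz = measured_hz * divide_ratio
    return periods_per_access * 1e9 / loop_hz


# Example with assumed numbers: 3.90625 MHz observed after a divide-by-128.
print(access_time_ns(3.90625e6, 128))  # loop runs at 500 MHz
```

Measuring a low-frequency output over many periods is what makes the result easy to read accurately on a tester.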
The access timer macro also has a static mode which completely removes the access-time measurement from the ABIST test and allows for static functional debug.
The design and implementation of programmable ABIST engines for the S/390 microprocessor and L2 cache chips achieved high function at minimal cost. This approach provided high-speed testing capability, the flexibility needed for diagnosing difficult problems, and the ability to stress and measure array and redundancy performance, along with production-level testing capability. All this was accomplished in a design-efficient manner, minimizing real estate and functional timing impact.

Chip testing debug, analysis, and diagnosis
Although rigorous checking and verification of the design [35] is performed, the complexity of microprocessors often leads to some technology-related problems that are found during test of the hardware. The flexibility built into the BIST designs was invaluable in chip bring-up, both at the tester and in debugging the system.
One unique use of the BIST hardware was in isolating a hardware problem caused by coupled noise. A problem was first suspected when initial test runs showed that LBIST passed only in a very narrow (~100-mV) voltage range. At voltages above and below the narrow band, the LBIST signatures were intermittent and nonrepeating, and varied with voltage. First an attempt was made to find a single failing pattern that caused the fail, using a binary search with LBIST patterns. Since the MISR signature was known to be correct after X patterns, it must also have been good for all patterns less than X. With this knowledge, one simply changed the bits programmed in the pattern counter in a binary search fashion and ran LBIST in the known good-voltage range. The signature was not checked but was saved and used as the golden signature for additional analysis. (The benefit of this approach is that a new signature can be obtained in less than a minute, and no additional pattern generation is required.) Then the voltage was varied to see whether the passing voltage window remained the same.
It was found that certain patterns caused the good-voltage window to narrow from the previous pattern. These were identified as noisy patterns, and the state of the system latches was extracted from the LBIST sequence before and after these patterns were applied. The extracted patterns were then applied in a deterministic fashion and used to narrow down the source of the coupled noise.
Another unique use of LBIST was determining power-supply noise problems. LBIST can be programmed to apply a skewed or nonskewed load/unload sequence with or without system clocks. This feature was used to measure the power-supply noise at different levels of switching activity. Since LBIST runs in a continuous loop, it was straightforward to trace the Vdd supply and determine the delta-I noise and power-supply droop with different levels of switching activity based on the scan and system clock sequences applied. Worst case is a skewed load/unload sequence with system clocks applied. Best case is a nonskewed load/unload with no system clocks applied.
Within a complex microprocessor, delay measurements are very complicated to determine. Again, LBIST was used to isolate the worst-case delay paths between scan chains, using a technique similar to that for the coupled-noise analysis. A binary search was performed to find the failing pattern, but using cycle time as the search variable rather than the voltage. The LBIST patterns were narrowed down to those that failed at the slowest cycle time; these patterns were then extracted and used for deterministic analysis of timing-critical paths. In the above cases, LBIST was able to be used to diagnose problems at the tester because of the flexibility designed into the LBIST circuitry. The analysis was performed without requiring any pattern generation beyond the original LBIST patterns, using only simple edits to the initialization state of the LBIST scan setup.
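The binary search over the pattern counter used in both the coupled-noise and the delay analyses can be sketched as follows. The `is_good` callback is hypothetical: it stands for one tester run with the pattern counter set to a given value, followed by whatever check applies (for example, whether the passing voltage window is unchanged, or whether the signature repeats at a given cycle time). The sketch assumes the property stated in the text, that a run which is good after X patterns is also good for every smaller count.

```python
def first_bad_count(is_good, n_patterns):
    """Return the smallest pattern count for which is_good(count) is False.

    Assumes is_good(0) is True, is_good(n_patterns) is False, and that the
    result changes from True to False exactly once as the count increases.
    """
    low, high = 0, n_patterns          # invariant: low is good, high is bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(mid):
            low = mid
        else:
            high = mid
    return high


runs = []


def simulated_run(count):
    """Stand-in for a tester run; pattern 37 is the first noisy one here."""
    runs.append(count)
    return count < 37


print(first_bad_count(simulated_run, 1000), "found in", len(runs), "runs")
```

For a 1000-pattern test the search needs about ten runs, which is practical when each new signature takes less than a minute to obtain.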
Test software strategy
The wide variety of test techniques discussed in the preceding sections and the complexity of the processor design require powerful and flexible test analysis, generation, and diagnostic support. TestBench*, the IBM-developed test-generation tool, was selected to fill this role because of its state-of-the-art capabilities. It supports various design styles, encompassing several different clocking and scan approaches. Along with supporting efficient and varied test-generation and simulation techniques, TestBench understands the interaction of these test types. This support is provided via multiple test modes, where a test mode is the set of conditions required for test. In addition, there is a close working relationship between the TestBench development team and both the processor development and internal technology groups. An outcome of these partnerships is correct-by-construction test data, a critical component of any test methodology.
branches of their own clocks.
Generate test data
Generate test data supports the several different types of tests mentioned earlier: LBIST, ABIST, stored-pattern stuck-fault tests, I/O wrap, parametrics, IDDQ, and burn-in. The most familiar of these is stored-pattern tests for the logic, where the process consists in automatically generating input stimuli and performing fault simulation to produce the output responses and grade the fault coverage of the tests, keeping track of the faults that have been tested and the ones that remain. As another example, the test-generation process for LBIST consists in reading in the clock sequence for the LBIST tests and the initial seeds for PRPG and MISR, and performing fault simulation to grade the tests (as in stored-pattern simulation) and produce the expected MISR signature, which corresponds to the output responses for stored-pattern test. Another test type previously mentioned is burn-in. TestBench has no specific support for burn-in, but by using the TestBench multiple-test-mode support, the logic environment for burn-in test can be described so that tests can be generated for burn-in.
Most TestBench applications work with the circuit in a single test mode at a time, but there are a few instances in which there is some interaction among various test modes. LBIST is a prime example. The STUMPS configuration is built upon a scan design, but the shift registers are fed by a PRPG and the shift-register outputs feed an on-product MISR. This scan mode is clearly different from the standard LSSD scan mode, in which the shift registers are connected to chip pins. A full-scan (LSSD) mode is used to initialize the design, including the PRPG and MISR. Then the design is switched to the LBIST mode, in which the PRPG and MISR are not scannable, but perform their intended test functions. In manufacturing test, the design is switched back to the LSSD scan mode at the end of the test to observe the signature. Thus, LBIST application involves a combination of test modes.
Also bringing together different test modes is the cross-mode mark-off aspect of multiple test modes. Faults marked off (detected) in one test mode may be automatically marked off in other test modes to avoid wasting time in redundantly testing the same faults. This cross-mark-off ability is one of the techniques which guarantee an efficient, compact test-vector set.
The clock circuitry and control designs presented a challenge to TestBench, since the clock block contains nonscannable latches. These latches control the scan operation, and TestBench does not support sequential logic in the clock-generation and scan controls. This tool limitation was circumvented by removing these latches from the model and replacing them with equivalent combinational logic and "pseudo primary inputs" that can be sequenced in such a way as to mimic the real operation. Of course, this necessitated additional steps. All of the test patterns had to be converted from the TestBench sequences, which use the pseudo primary inputs, to a form that could be applied to the hardware without the pseudo primary inputs, and with the appropriate clocking on the real inputs to produce the desired effect. This conversion was straightforward for LBIST and WRP data because these TestBench processes accept user-specified clocking sequences. Some other test-generation processes, such as I/O wrap, do not support clocking constraints, and for those processes the conversion was complicated by the need to look for and eliminate any tests that could not be applied on the real hardware. Fortunately, as it turned out, there were few automatically generated tests that could not be easily converted.
Analyze untestable faults
Because of the mission-critical nature of the S/390 servers, the chips had to be tested to the highest product quality. The test coverage goal was greater than 99.9%. To understand the fault coverage, test generation with user analysis was performed to determine the nature of each untested fault. The TestBench fault analysis identified testable and untestable faults and split the untestable faults into various categories: redundant faults, untestable due to test inhibits, multi-time-frame untestable, and test generated but simulation failed.
Redundant logic (corresponding to the "redundant" faults) was analyzed, and once the causes of the redundancies were understood, the redundancy was often removed. Custom logic has a higher percentage of redundant faults than synthesized logic.
Test inhibits hold a fixed value on a pin throughout the test generation in a particular mode. Faults untestable due to these constraints cannot be tested in the respective test mode, but must be tested in some other mode if they are to be tested at all. Such faults would often cause a system failure if they were to occur. Each fault of this type was analyzed to ensure that it was tested in some other test mode, usually employing some other type of test. TestBench provides a "global" test coverage which reflects the union of all of the tests for which fault simulation is performed, across all of the various test modes.
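The "global" coverage and the per-fault check described above amount to set operations over the faults detected in each mode. The following is a sketch under assumed data structures; the function names and dictionary layout are hypothetical and do not reflect the TestBench interface.

```python
def global_coverage(detected_by_mode, all_faults):
    """Fraction of all faults detected by the union of tests across modes."""
    detected = set().union(*detected_by_mode.values())
    return len(detected & all_faults) / len(all_faults)


def uncovered_inhibited(inhibited_by_mode, detected_by_mode):
    """Faults blocked by test inhibits in a mode and detected in no other mode."""
    uncovered = set()
    for mode, blocked in inhibited_by_mode.items():
        detected_elsewhere = set()
        for other, detected in detected_by_mode.items():
            if other != mode:
                detected_elsewhere |= detected
        uncovered |= blocked - detected_elsewhere
    return uncovered
```

Any fault returned by `uncovered_inhibited` corresponds to a case that would need manual analysis or an additional test in some other mode.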
The term multi-time-frame untestable refers to faults which require a series of clock pulses to be detected. TestBench fault analysis is based on its own automatically generated single-clock sequences. This means that faults which require a series of clock pulses to detect will remain undetected by the automatic analysis program. Usually these faults were tested when user-defined clock sequences were applied.
The last category of untestable fault, representing a disagreement between the test-pattern generator and the fault simulator, is usually symptomatic of a software problem. Either the generated test was incorrect, or the fault simulation was in error. The robust simulation capabilities in TestBench were valuable in the analysis of these faults. Test patterns could be rerun using a different simulator. When the two simulators agreed, this pointed to a problem in the test-pattern generator. Another useful feature of TestBench is the capability to simulate a subunit as if the fault existed and display the waveforms showing either the good or faulty behavior of the design. The analysis also identified faults which, while not tested, were testable. Typically one might not need to consider these faults, but since the designs supported a limited set of clock sequences, these faults had to be examined. This analysis prompted model changes to support an additional clock sequence. Also, some groups of faults were understood to be untestable with the models used, but either did not exist or were actually being tested in the hardware.
Custom design techniques which stretch or bend accepted DFT guidelines and tool capabilities are often necessary in today's competitive environment. A close partnership between the product designer and the tool provider is mandatory for survival, as more demands are being placed on the tools, both for additional function and for added flexibility to run existing functions with fewer constraints on the design. This has come about primarily from the natural forces of VLSI: larger circuits demanding higher tool performance and integration of more functions on a single chip, requiring techniques such as self-generated clocking and self-test.
Fault modeling
TestBench operates on a design's fault model, which is based on a gate-level representation of the circuit. In this section, we refer to this gate-level circuit model as a fault-model rule. For standard-cell (ASIC) designs, a TestBench-compatible fault-model rule set was developed and stored in a library. A predefined set of fault-model rules cannot be created for custom designs. Custom designs are modeled by an application in TestBench called Modgen, which automatically generates fault-model rules directly from a transistor-level schematic.
Modgen takes as input a netlist of a transistor circuit and uses a path-tracing algorithm to produce a structurally equivalent gate-level circuit. The gate-level circuit is composed of logic blocks (called logic primitives) that TestBench understands, such as AND, OR, LATCH, and TSD (tristate device). An example of this is shown in Figure 7.
Modgen creates the fault-model rules for combinational logic, but not for sequential logic such as latches, clock blocks, and arrays. Fault-model rules for sequential logic were created by hand. The fault-model rule-generation process is shown in Figure 8.
[Figure: Flow for fault-model generation.]
Modgen's output is in the form of EDIF (Electronic Design Interchange Format), which contains TestBench logic primitives. After the EDIF model is built, it is imported into TestBench, and a TestBench model is created. The EDIF is also used by E2V (EDIF to Verilog) to create a fault-model cell view stored in the designer's library. The fault-model cell view is compared to the schematic using Verity [36] to ensure that the fault-model rule correctly predicts the circuit behavior.
Building a fault-model rule from a hierarchical schematic consists of running Modgen with all of the schematic's instantiated cells treated as black boxes. If the fault-model rule for a given black-boxed cell has already been created, no further processing is required for the cell. If a fault-model rule has not been created for a given cell, or if the schematic for the cell has been updated since the fault-model rule was created, a new fault-model rule is created. This hierarchical traversal of the design continues until updated fault-model rules have been created for all unique cells in the design.
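The hierarchical traversal can be sketched as a depth-first walk that visits each unique cell once and flags it when its rule is missing or older than its schematic. This is only an illustration of the bookkeeping; the timestamp comparison, the dictionary layout, and the name `cells_needing_rules` are assumptions, not part of Modgen.

```python
def cells_needing_rules(top, children, schematic_time, rule_time):
    """List the cells whose fault-model rule must be (re)built, children first.

    top:            name of the top-level cell
    children:       mapping of cell -> list of instantiated cell names
    schematic_time: mapping of cell -> last schematic update (any ordered value)
    rule_time:      mapping of cell -> time its rule was built (absent if none)
    """
    to_build = []
    seen = set()

    def visit(cell):
        if cell in seen:          # each unique cell is processed once
            return
        seen.add(cell)
        for child in children.get(cell, ()):
            visit(child)          # children are treated as black boxes by parents
        if cell not in rule_time or schematic_time[cell] > rule_time[cell]:
            to_build.append(cell)

    visit(top)
    return to_build
```

Returning cells in child-before-parent order reflects the black-box treatment described above: a parent's rule can be built once rules exist for everything it instantiates.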
Modeling sequential logic
Creating fault-model rules for sequential logic is done manually, since Modgen does not handle sequential logic. The schematic must be studied and understood, and then a fault-model rule and corresponding EDIF file can be built using TestBench logic primitives.
One example of modeling sequential logic is the clock block that is used throughout the chip to provide local C2/C1 clocks to latches. Its fault-model rule is shown in Figure 9(a).
TestBench treats a circuit using this clock block as an unconstrained sequential design, because the latches that are used in the clock generation confuse the TestBench analysis programs, which look for simple means of controlling all of the system latch clocks and the scan data path. To allow the highly efficient TestBench stored-pattern test generation to be used, the approach taken was to simplify the model while ensuring that the tests generated with this simplified model would work on the hardware. The first change to the model is to remove the nonscannable latches, as shown in Figure 9(b). There is still a problem with this model, because TestBench requires the existence of a primary input stability state defined by setting all test inhibits (constant-value inputs) and all clock primary inputs to their inactive states. Even for an edge-triggered design, the clock primary inputs must have a defined stability state. It is a requirement that the stability state must set all clock inputs to all latches to known values. In this case, there was no way to define the clock stability state so that just setting the clock primary inputs (there are no test inhibits in this picture) would force the derived clock signal to a known value. This problem was solved by adding to the model an extra pin called GLOBAL-C1. Figure 9(c) shows the new model with the extra "pseudo" primary input.
Adding the pseudo pin to the model has several ramifications. Since the pseudo pin does not exist in the hardware, TestBench cannot be allowed to control it in a random manner; instead, a user-specified clock sequence must be used so that the model will behave like the hardware. Figure 10 shows the user-specified clock sequence that is used. Note that signals C2 and C1 behave identically in Figures 9(a) and 9(b) using the user-specified clock sequence, so the model will behave like the hardware if the clock sequence shown in Figure 10 is used during test-pattern simulation.
Another ramification of using a pseudo pin is that the test patterns must be changed prior to applying them at the tester. In addition, using pseudo pins increases the risk of modeling the behavior of the circuit incorrectly, so extensive model verification must be done.
[Figure: Test coverage vs. CPU time.]
Model verification
In order to confirm that the TestBench model correctly predicts the behavior of the chip, the model must be compared to the circuit for a variety of patterns. Verification was done at different levels of hierarchy in the design: leaf-cell-, macro-, and chip-level verification.
Leaf-cell verification
A leaf cell is defined as any cell that contains transistors. For each leaf cell in the design containing combinational logic, the fault-model rule was compared to the schematic using Verity. Verity performs exhaustive verification; that is, the circuit is verified for all possible input stimuli. Verity cannot be used to verify sequential logic, so verification was also done at the macro level and chip level, where another verification method was used.
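For a small combinational cell, exhaustive verification can be pictured as comparing two descriptions of the cell over every input combination. The sketch below shows only that idea and says nothing about how Verity is implemented; the AND-OR-INVERT cell and all function names are hypothetical.

```python
from itertools import product


def exhaustive_mismatches(model, reference, n_inputs):
    """Return every input vector on which the two descriptions disagree."""
    return [
        bits
        for bits in product((0, 1), repeat=n_inputs)
        if model(*bits) != reference(*bits)
    ]


def aoi_reference(a, b, c):
    """Intended function of a hypothetical AND-OR-INVERT cell."""
    return 1 - ((a & b) | c)


def aoi_model(a, b, c):
    """Gate-level model built from AND, OR, and NOT primitives."""
    and_out = a & b
    or_out = and_out | c
    return 1 - or_out


def aoi_model_buggy(a, b, c):
    """A deliberately wrong model that ignores input c."""
    return 1 - (a & b)
```

The number of input vectors grows as 2^n, which is one reason this style of check is applied at the leaf-cell level rather than to whole macros.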
Macro-level verification
A macro is a functional logic group that can contain combinational and sequential logic. In order to verify the fault-model rule used for a macro, TestBench was run on the macro to create a set of patterns. These patterns were then simulated on the VHDL model of the macro, and the output of TestBench was compared to the output of the VHDL. This process proved to be quite valuable. Since it is time-consuming to create a set of patterns to use for verification, using TestBench to create the patterns saved considerable time. Also, TestBench created patterns that a designer might have overlooked.
Chip-level verification
For the chip-level verification, LBIST and ABIST were run using TestBench and also using the chip simulation model [37], and the signatures were compared. This level of verification disclosed problems existing between macros that were not caught by the macro-level verification. It also found problems in the behavior of TestBench, and VHDL problems in the simulation model. This extensive model verification eased debugging of the test patterns at the tester, where no model or tool problems were found.
Test-pattern generation and coverage
Many test techniques have been discussed. They all come together in the final test-data generation for the product. The goal in test generation is to maximize test coverage as quickly and as efficiently as possible. With limited tester buffer sizes, test-data volume is critical. Also, the total CPU time to generate the test patterns had to be kept to a reasonable length, since pattern regeneration occurred often because of code bugs, model updates, last-minute logic changes, and efforts to optimize the pattern set.
The approach used was to target static stuck-faults first and then resimulate the pattern set with dynamic fault simulation turned on. This was done because dynamic fault simulation for the CP chip required more than two CPU weeks on an RS/6000* Model 590 with two gigabytes of real memory (the number of flat model blocks is 1.3 million, and the combined number of static and dynamic faults is 9.2 million), and static test-pattern generation required several iterations. To speed dynamic test-coverage growth when targeting the static stuck-faults, the dynamic-type clock sequences were used first. Then the remaining static-only-type clock sequences were used to complete the test generation. During dynamic resimulation of the static patterns, this approach enabled quicker dynamic test-coverage gain, which ultimately improved the test coverage. Note that even though targeting was done in a static manner, these patterns were executed on the product in a dynamic mode. Figure 11 shows an overview of the test patterns that were created, the time required to create them, and the number of each type that were created.
The first test generated is the shift-register test, which detects about 45% of the static stuck-faults (static stuck-faults total about 4.5 million). This runs quickly, in about ten CPU minutes, and generates only ten patterns.
Next, 256 000 LBIST patterns were generated, but only the first 64 000 were fault-simulated. Fault simulation of 64 000 patterns required 50 hours of CPU time, but 256 000 would have required about 200 hours of CPU time. The test-coverage number reflects only the first 64 000 patterns; however, in order to get the benefit of the patterns, the full 256 000 patterns were applied at the tester. Since LBIST is inexpensive from a tester buffer and tester run-time perspective, the number of patterns was extended to detect unmodeled defects as well. Unmodeled defects are defects which can occur in hardware that are not modeled in the fault-model rule for a circuit. In addition, it is beneficial to have a large number of LBIST patterns, since LBIST is the only logic test that is run on the higher levels of system testing.
Once the bulk of static faults were detected, the other tests were created to test the I/Os, the PLL, and the arrays (using ABIST).
After the static tests were created, the patterns were resimulated with dynamic fault simulation turned on. Then the remaining untested dynamic faults were targeted with WRP and deterministic patterns, and the new patterns were appended to the total pattern set. Greater than 90% dynamic test coverage was achieved.
Conclusions
The S/390 custom microprocessor design created many test challenges. The test methodology required several enhancements to the test-generation process and specialized fault-model development because of the complexity and unique clocking structure of the design. BIST required unique design development to support the complexity and high-performance aspects of the design, while SGC was instrumental in verifying performance and allowed the chips to be tested at cycle time during the manufacturing test process.
[...] addressed: Dynamic test must be enhanced to keep up with increasing device performance; dynamic faults must be targeted and accurately measured, while on-chip techniques must be developed to provide more detailed analysis of device performance. BIST must be optimized to detect dynamic faults. Test-generation techniques must be developed to reduce test-generation time and test-application time in an effort to drive down test costs. Larger and more complex designs will exceed the capability of today's test-generation tools and will increase test times so that test becomes a greater portion of the overall product cost.
Test is and will always be key to future microprocessor designs.
[...] Thivierge, Darren Childress, Owen Farnsworth, Deborah Hamm, and Ed Leahy for their efforts on pattern generation and hardware bring-up in manufacturing; and Micrus Corporation personnel Franco Motika, Donato Forlenza, Orazio Forlenza, Ray Kurtulick, Joe Sheridan, Wendy Chong, and Adrian Anderson for hardware bring-up and characterization at the tester.
