CLX Introduction
The Tandem NonStop CLX is a low-cost multiprocessor computer for online transaction processing. A CLX system can have from 2 to 6 single-board processors. The processor design is based on 4 custom CMOS chips, a 32K x 60 cache/control store static RAM array, and a 4 MB main memory array [1]. An optional expansion memory board of 2, 4, or 8 MB may be added to each processor.
Apart from the traditional Tandem design requirements of fault tolerance, data integrity and reliability, the CLX is required to be inexpensive and user serviceable. Furthermore, its use in an office environment requires such operations as diagnostics to be speedy and require no user intervention. A BIST function to test the CPU and memory met these requirements.
CLX Built In Self Test
The BIST on the NonStop CLX processor is based on scan, in which the majority of state elements within the processor are made of shiftable latches and registers. The BIST repetitively shifts pseudo-random patterns into the registers and latches, single steps the machine, and then shifts the results out, incorporating the results into a signature [2]. This is repeated until 2^16 test vectors have been shifted in. The signature is then compared against that of a known good board, giving a pass/fail result. The system responds deterministically to the pseudo-random stimulus, producing the same signature for good boards but a different signature for boards containing a detectable failure.
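As a rough illustration of the scan/step/collect loop described above, the following Python sketch models one scan string, a stand-in for the processor logic, and two 16-bit LFSRs. The toy_logic function, the 24-bit chain length, the seed, and the Galois-style feedback arrangement are assumptions made for illustration; only the overall loop structure comes from the text. The feedback taps follow the characteristic polynomial quoted later in the paper.

```python
MASK = 0xFFFF
TAPS = 0x1281  # x^12 + x^9 + x^7 + 1; the x^16 term is implied by the 16-bit width


def lfsr_step(state, data_in=0):
    """Advance a 16-bit LFSR one clock (assumed Galois form); data_in joins the feedback."""
    feedback = ((state >> 15) & 1) ^ (data_in & 1)
    state = (state << 1) & MASK
    return state ^ TAPS if feedback else state


def toy_logic(bits, stuck_at_zero=None):
    """Hypothetical logic under test: each output mixes a bit with two neighbours."""
    n = len(bits)
    out = [bits[i] ^ (bits[(i + 1) % n] & bits[(i + 2) % n]) for i in range(n)]
    if stuck_at_zero is not None:
        out[stuck_at_zero] = 0  # model a stuck-at-0 fault on one output
    return out


def run_bist(iterations, chain_len=24, seed=0x0001, stuck_at_zero=None):
    """Return the 16-bit signature after the given number of scan/step iterations."""
    ppg, sr = seed, 0
    for _ in range(iterations):
        chain = []
        for _ in range(chain_len):        # shift a pseudo-random vector in
            chain.append((ppg >> 15) & 1)
            ppg = lfsr_step(ppg)
        chain = toy_logic(chain, stuck_at_zero)   # single step the "machine"
        for bit in chain:                 # shift results out into the signature
            sr = lfsr_step(sr, bit)
    return sr
```

Calling run_bist twice with the same arguments returns the same signature. Passing stuck_at_zero=3 models a fault; this will normally change the signature, although a 16-bit signature cannot rule out aliasing.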
The test is completely self contained; it does not require control or data from other boards. The processor board and its external memory board are covered by BIST. The total test time is less than one minute. The test is run at power-on and on request by a maintenance process running in a separate CPU.
The BIST covers failures throughout the processor and memory boards, as well as within the core of the custom chips. The control logic for large RAM arrays is well tested, but the arrays themselves (cache static RAMs and main memory dynamic RAMs) are tested to a lesser extent.
The test does not cover the Tandem I/O Bus interface, nor does it cover the Tandem Inter-Processor Bus interface. Testing these areas would involve different processors, and BIST is only intended to verify the one processor. Running in a single processor, BIST has no way to control and observe these external interfaces without affecting the shared bus.
The BIST test of memory changes the memory contents. Because of this, the BIST can be done in two ways: cold BIST or warm BIST. The two techniques differ in the way they affect memory. In cold BIST the contents of the main memory are written to and tested. This type of test is suitable for use before a cold start of the processor, when the contents of main memory are not important. In a warm BIST the contents of main memory are not disturbed. Of course, this version of the test is not as thorough. A warm BIST is suitable for testing the processor after a power fail when battery backup preserved the contents of main memory, allowing a resumption of existing jobs. Unless specified, for the rest of this paper BIST refers to cold BIST.
While the scan-based design necessary for BIST is expensive, it allows for many other attractive features. The NonStop CLX uses the scan feature within development engineering, manufacturing, and in the field. For test and debug, scan gives excellent observability and controllability of registers and latches [3]. A stand-alone tester to screen custom chips uses a pseudo-random scan/step/collect signature method. Another tester does board-level repair using scan techniques. The board-level tester does pseudo-random pattern testing, and when a failure is found, it masks off scan strings until the problem is isolated. Finally, scan is used in the field to determine the state of failing boards. This information allows diagnostic software to determine the cause of failure and prescribe an appropriate fix [4].
Scan Implementation
Most of the state elements on the CPU board are scannable. That is, they operate in one of three modes: normal, frozen, or scanning. When frozen, registers or latches hold their state. When scanning, the registers and latches are configured into a large shift register, allowing their entire contents to be shifted in and out.
MDP Software Control of BIST
The decision to include a BIST feature on CLX was made after the decision to include scan in the processor design. To achieve the CLX design requirement of low cost, control of the BIST function had to be added with minimum hardware overhead. The requirement was met by maximizing the use of MDP software to control the BIST function.
The main function of the MDP, the Maintenance and Diagnostic Processor, is to manage microcode loading, fault analysis, and scan functions. The MDP, based on the 6803 8-bit microprocessor, is central to the control of the BIST function. It executes the proper sequence of steps for initialization and directs the scanning and stepping of the processor. During the BIST function, the MDP software is completely deterministic: no real-time interrupts or data-dependent steps are allowed.
The MDP can control the scan strings to shift one bit at a time into or out of scannable registers. To transfer a bit between the MDP and one of the 13 scan strings, the MDP writes to a certain address, causing the corresponding scan string to shift in the high-order bit of the microprocessor data bus. A read from the same address returns the current scan-data-out of the string on the low-order data bus bit. This allows direct observability and controllability of all scannable registers. This mode of scanning could theoretically be used for BIST, i.e. one bit could be read from the PPG and then shifted into a scan string, but it would be slow: at microprocessor speeds, several microseconds per bit would be needed.
The BIST on the CLX uses a faster way to load the scan strings with pseudo-random data (fig. 3). When in BIST mode, the decoders that assert the individual scan control signals operate differently. Normally, the SCAN<nn> control line is asserted for one system clock when the appropriate MDP address is written to. In BIST mode, a write to this address causes the SCAN<nn> signal to assert continuously, and all other SCAN signals to deassert. The input to the scan strings, SCAN_IN, is driven by the PPG output. With this method, the scan strings can be loaded at a rate of one bit per system clock.
Figure 3. Block Diagram of BIST Structure
To load the scan strings with pseudo-random data, the MDP does a write to the address which asserts SCAN<00>. The software then enters a wait loop until enough bits have been shifted in. The MDP then writes to the address which asserts SCAN<01>, causing SCAN<00> to deassert. This is repeated for the other scan strings until all are loaded. The MDP software need not shift in exactly the right number of bits for the scan string. All that is required is that the minimum number of bits for each scan string be shifted in. A string is at most 255 bits (a constraint imposed by our board-level tester), so MDP software shifts a fixed number of bits into each string.
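The point that the shift count need only meet a minimum can be seen in a small Python model of a scan string as a fixed-length shift register. This is a sketch under stated assumptions: load_string is a hypothetical helper, and the shift count of 256 used in the checks is illustrative rather than a figure from the paper.

```python
from collections import deque


def load_string(length, bit_source, shifts):
    """Shift `shifts` bits into a scan string of `length` latches.

    Bits enter at the scan-in end; once the string is full, each new bit
    pushes the oldest one off the far end, so over-shifting is harmless.
    """
    if shifts < length:
        raise ValueError("too few shifts: some latches would stay unloaded")
    chain = deque([None] * length, maxlen=length)  # None marks an unloaded latch
    for _ in range(shifts):
        chain.appendleft(next(bit_source))
    return list(chain)
```

With a 3-bit string and 5 shifts, the string ends up holding the last 3 bits supplied, and exactly 5 bits are consumed from the source. The same fixed count therefore works for every string no longer than that count.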
The BIST uses strings of pseudo-random bits generated by a 16 bit linear feedback shift register (LFSR) [7] called the PPG (pseudo-random pattern generator). The BIST signature is collected by another 16 bit LFSR called the SR (signature register). The characteristic polynomial of the two LFSRs is x^16 + x^12 + x^9 + x^7 + 1. Figure 2 shows the PPG and SR designs.
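A minimal Python sketch of the two LFSR roles follows. It assumes a Galois (internal XOR) arrangement with the serial input folded into the feedback; the actual structure is the one in Figure 2, which is not reproduced here, so the bit ordering and the seed value are illustrative assumptions.

```python
MASK = 0xFFFF
POLY_TAPS = (1 << 12) | (1 << 9) | (1 << 7) | 1  # x^12 + x^9 + x^7 + 1 (x^16 implied)


def lfsr_step(state, data_in=0):
    """One clock of a 16-bit LFSR; data_in is XORed into the feedback bit."""
    feedback = ((state >> 15) & 1) ^ (data_in & 1)
    state = (state << 1) & MASK
    if feedback:
        state ^= POLY_TAPS
    return state


def ppg_bits(seed, count):
    """PPG role: an autonomous LFSR emitting its high-order bit each clock."""
    state = seed & MASK
    for _ in range(count):
        yield (state >> 15) & 1
        state = lfsr_step(state)


def signature(bits, init=0):
    """SR role: compress a serial bit stream into a 16-bit signature."""
    state = init & MASK
    for bit in bits:
        state = lfsr_step(state, bit)
    return state
```

Because the update is linear and invertible, flipping any single bit of the stream fed to signature always changes the result in this model; multi-bit errors can alias, with a probability of roughly 2^-16 for random error patterns.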
The PPG and SR must be writeable for initialization and the SR must be readable to determine the results of the test. Both registers reside in a custom chip, occupying less than 5% of the chip's logic area. To minimize I/O connections to the chip, the PPG and SR are connected to a scan string. When not doing BIST the registers may be examined and set using scan. During the BIST, the two registers are removed from the scan string and operate as LFSRs.
Pattern Generator/Signature Register
Among the numerous scan methods [5], a single-clock multiplexed-data scheme was chosen. That is, a free-running clock is distributed to all registers, and the normal, frozen, or scan modes are determined by control lines (fig. 1).
This scan method allows state elements from different technologies and clocking regimes to coexist in one scan string and use the same control signals. The custom CMOS chips have shifting latches which use two-phase non-overlapping clocks [6], while the scannable MSI FAST parts and PALs use a single, edge-triggered clock. The same free-running system clock is distributed to all state elements (the two-phase clocks for the custom chips are generated internally).
The BIST loads a test vector into all scannable registers/latches before single stepping the machine. The length of the test vector generated by the PPG is determined by the total number of bits shifted into the scan strings, not simply the number of bits in the scan string (since MDP software shifts in a number greater than 255). Since the pseudo-random pattern is generated by an LFSR, the test vector length determines the number of unique test vectors generated.
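The dependence of the number of distinct vectors on the vector length can be illustrated with a short Python sketch. It assumes the PPG advances only while bits are being shifted into the scan strings, and it uses a Galois-form LFSR with the paper's polynomial; both are assumptions. If the PPG state sequence has period P and each iteration consumes T bits, the PPG returns to its starting state after P / gcd(P, T) iterations, which bounds the number of distinct vectors.

```python
from math import gcd


def lfsr_period(seed, taps=0x1281):
    """Count clocks until an autonomous 16-bit Galois LFSR returns to a nonzero seed."""
    state = seed
    steps = 0
    while True:
        feedback = (state >> 15) & 1
        state = (state << 1) & 0xFFFF
        if feedback:
            state ^= taps
        steps += 1
        if state == seed:
            return steps


def distinct_start_states(period, bits_per_iteration):
    """Iterations before the PPG state at the start of an iteration repeats."""
    return period // gcd(period, bits_per_iteration)
```

For a maximal-length 16-bit LFSR (period 65535), consuming exactly 255 bits per iteration would revisit the same PPG state after only 257 iterations, whereas 256 bits per iteration would cycle through all 65535 states. Whether the CLX polynomial is maximal-length is not stated in the text; lfsr_period can be used to check the assumed arrangement.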
Initialization for BIST
For the BIST to return a stable signature, all data incorporated into the signature must be deterministic. This requires that all state bits affecting the signature be initialized. Scannable registers/latches are initialized through scan, non-scannable state elements through different processes.
Small RAM arrays and register files (non-scannable) are initialized by repeating the BIST, and only using the signature from the second pass. That is, the 2^16 iterations of scanning in test vectors and single stepping the machine are performed twice. The PPG and SR are reinitialized between the two passes. Since the RAMs are small, the first BIST pass will write all those locations that the second BIST pass will read.
A different method is required for large RAM arrays, as there is a distinct probability that BIST will read a location that it never writes to. For these RAMs, initialization is also done in a two pass approach. During the first pass, a control signal is asserted which forces all read operations to writes. Then, during the second pass, the control signal is deasserted, and the same pseudo-random test vectors are applied. Thus, all the locations that the BIST second pass reads contain initialized, pseudo-random data.
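The two-pass scheme for large RAMs can be sketched in Python as follows. This is a simplified model, not the CLX hardware: the RAM is a list, the pseudo-random operation stream is produced by Python's random module standing in for the PPG, and the names bist_pass, two_pass, and random_ops are hypothetical.

```python
import random


def bist_pass(ram, ops, force_writes):
    """Apply (is_write, addr, data) operations; return the values read back."""
    observed = []
    for is_write, addr, data in ops:
        if is_write or force_writes:
            ram[addr] = data          # pass 1 turns every read into a write
        else:
            observed.append(ram[addr])
    return observed


def two_pass(power_on_seed, ops, size=64):
    """Run both passes on a RAM whose power-on contents depend on power_on_seed."""
    junk = random.Random(power_on_seed)
    ram = [junk.randrange(256) for _ in range(size)]  # arbitrary power-on state
    bist_pass(ram, ops, force_writes=True)            # pass 1: initialize
    return bist_pass(ram, ops, force_writes=False)    # pass 2: the real test


def random_ops(seed, count, size=64):
    """Stand-in for the PPG-driven operation stream (an assumption)."""
    rng = random.Random(seed)
    return [(rng.random() < 0.5, rng.randrange(size), rng.randrange(256))
            for _ in range(count)]
```

Every location read in the second pass was written in the first pass by the same operation, so the values observed do not depend on the power-on contents: two_pass(1, ops) and two_pass(2, ops) return identical lists for any ops.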
Further initialization issues are covered in the section on initializing a memory system.
Memory System Operation
The CLX memory system includes an instruction/data cache and page table cache in static RAM and between 4 and 12 megabytes of main memory dynamic RAM [1]. 4 MB of memory are on the CPU board and the rest are on the optional memory expansion board. The main memory interface to the CPU supports transfers of single words at random addresses, and higher speed transfers of words from contiguous addresses. The memory system also provides support for memory to memory block moves and data alignment.
Integration With BIST
The main memory system is functionally tested by BIST. BIST gives good coverage of the control logic, buffers, and interconnect but only does a partial test of actual DRAM cells in the main memory. The memory system is tested functionally; that is, BIST applies pseudo-random operations to the memory system, along with pseudo-random address and data, and incorporates the results of those operations into the BIST signature. The memory system is not scanned and no attempt is made to systematically test every location in memory.
Even without exhaustively testing all DRAMs, the BIST test of main memory is quite adequate for our needs. First, it tests at least a few memory locations in each dynamic RAM along with all the memory support circuitry external to the DRAMs. Total failure of a dynamic RAM, such as a missing component or a floating pin, is detected. Failures undetected by BIST would be confined to a single DRAM and would be easily detected by a microcode diagnostic.
The BIST must test the memory system differently from the rest of the processor. The processor is tested by scanning in pseudo-random data, single stepping the machine, and scanning out the data. The memory system cannot be tested in this fashion because of the DRAMs. The dynamic RAMs' periodic refresh would be interrupted if pseudo-random data were to be scanned into the control logic. Also, DRAM operations take longer than one CPU system clock, e.g. a memory read takes 4 cycles to complete. The normal pseudo-random test expects all operations to complete in a single cycle. Thus, the CPU is operated for one system clock and then results are scanned out and incorporated into the signature.
For test purposes, the memory system operates separately from the CPU. While the CPU is frozen, scanned and stepped by BIST, the memory system continues to run normally, doing periodic refreshes of the dynamic RAMs. The BIST test of the memory system occurs when the CPU interacts with the memory system. During BIST the CPU pseudo-randomly does reads and writes of pseudo-random addresses with pseudo-random data. Results of these memory operations are loaded by the CPU and then incorporated into the BIST signature. In effect, the BIST does a functional test of the memory system.
This type of testing places several requirements on the memory system. Special memory system features to support BIST are listed below:
• The memory system must be able to accept single stepped operations from the CPU.
• The memory system interface to the CPU must ignore pseudo-random bits scanning past the interface.
• The memory system must respond deterministically to any command from the CPU presented in any sequence.
• The memory system must give a deterministic response to the CPU in the presence of events such as refresh.
• There must be a mechanism for initializing all state elements observable by BIST.
• The memory refresh must occur deterministically relative to the start of BIST.
During the BIST, the CPU is scanned and single-stepped while the memory system continues executing every cycle, maintaining the dynamic RAM refresh operation. When the CPU is single-stepped, an operation may be requested of the memory system. Possible operations are writes to memory, initiation of a read from memory, transfer of data from a read FIFO to the CPU, or read or write of status registers.
The memory system only acts on requests from the CPU when it is executing normally or when it is being single stepped (as it is during BIST). When the CPU is frozen or being scanned (see fig. 1) the EXEC signal is deasserted and the memory system ignores the bits being scanned past its interface with the processor.
When the BIST single steps the CPU, the memory system takes the pseudo-random bits at its interface with the CPU and executes the command. The command is executed in successive clocks. That is, the memory system is not single stepped like the CPU.
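The interaction between EXEC and a multi-cycle memory operation can be sketched with a small Python state machine. This is an illustrative model only: the MemoryPort class, the string-valued interface bits, and the handling of a single operation type are assumptions; the 4-cycle read time is the figure quoted in the text.

```python
class MemoryPort:
    """Toy model of the memory side of the CPU/memory interface."""

    READ_CYCLES = 4  # from the text: a memory read takes 4 cycles

    def __init__(self):
        self.remaining = 0        # clocks left in the operation in progress
        self.completed_reads = 0

    def clock(self, exec_asserted, interface_bits):
        """Advance one system clock with the given EXEC level and interface bits."""
        if self.remaining > 0:
            # A started operation runs on successive clocks, ignoring EXEC.
            self.remaining -= 1
            if self.remaining == 0:
                self.completed_reads += 1
        elif exec_asserted and interface_bits == "read":
            # Only a single-stepped (EXEC asserted) cycle can start an operation.
            self.remaining = self.READ_CYCLES
        # Otherwise the bits are scan traffic passing the interface: ignore them.
```

Clocking the port with EXEC deasserted never starts an operation, whatever bits are present. One clock with EXEC asserted and a read command, followed by four clocks of scan traffic, completes exactly one read.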
In the case of a read or write to a memory system status register the operation is completed in one cycle and the results are loaded by the CPU and are incorporated into the scan string.
In the case of a multi-word read, the operation is completed in 7 cycles (for a 4 word read) and the read data is buffered in a memory system FIFO. The data waits in the FIFO until a single stepped pseudo-random FIFO-to-CPU transfer operation occurs. Once transferred to the CPU, the data is incorporated into the scan string and hence influences the BIST signature. The read data cannot bypass the FIFO and go directly to the CPU because data is being read from the DRAM in a stream of back-to-back clocks. The CPU is unable to accept data on back-to-back clocks while it is being scanned for the next BIST iteration.
In the case of memory reads and writes, the R1 signal MSTRT always precedes the R2 encoded signal to do a read or write. When a line of microcode is single-stepped, the pipeline advances one stage for each single step. There may be many cycles between R0 and R1 or between R1 and R2 where the processor is frozen. When a memory operation is single-stepped in R2 (i.e. EXEC is asserted by the falling edge of φ1A) the operation is said to be committed. The committed operation will complete in the next 4 system clocks, even if EXEC is freezing the processor.
During BIST, pseudo-random data is scanned into all pipeline stages of the processor (but not into the memory system pipeline stages). For example, in order for the memory system to do a read operation, the memory system expects to see an MSTRT in the R1 stage and in the following single-step see the appropriately encoded read command in R2. For this sequence to happen, pseudo-random data has to, by chance, load the pipeline correctly on successive BIST iterations. If the BIST does not load the pipeline correctly, the memory operation will not complete. Instead of accessing the memory, there is some other deterministic response (such as not doing anything).
Deterministic Pseudo-Random Memory Operations
The memory system's response to any request during BIST must be deterministic. This is also true of invalid requests such as would occur if the previously described pipeline stages were set to an invalid state. Not only should the response of the memory system be deterministic, the control logic should remain in a state where it can service further memory requests and service the DRAM refresh. That is, it is not acceptable for a normally illegal sequence of requests to cause the memory control state machines to halt or go to undefined states. While this would most likely be deterministic, it would prevent the memory system from participating in any further functional tests.
The memory system must respond to any sequence of operations without producing nondeterministic results. For example, a series of read FIFO-to-CPU transfers could empty the FIFO. The data value transferred from the empty FIFO must be some deterministic constant, and not be dependent on a random power-on state, for example.
Initialization of the Memory System
Unlike most of the processor, the memory system is not initialized by scan, as the memory system control logic must preserve the dynamic RAM refresh function. The refresh interval is initialized by synchronizing its count to the start of BIST. During a refresh, the memory system asserts a control signal to the CPU indicating that it is busy and cannot respond to any requests. Failure to synchronize BIST with refresh would result in nondeterministic results being sent back to the CPU and cause an unstable BIST signature.
The refresh counter and other control logic are synchronized by the MDP software asserting a signal at the start of the BIST. This ensures that refresh interactions will always be consistent across every run of the BIST. Note that the refresh interval may change during the course of BIST due to the CPU pseudo-randomly writing to the memory system's refresh interval register.
Test Coverage
The exact test coverage of the BIST on CLX is unknown, as we were unable to run a complete fault simulation. The large number of test vectors, combined with a board of over 350 chips (including commercial and custom VLSI and large static and dynamic RAM arrays), is an obstacle to simulation.
The BIST does not cover the I/O channel interface or interprocessor bus interface. Both of these connect to external sources which cannot be controlled or observed by the BIST. Of the remaining logic, very high fan-in circuits, such as the comparators for the cross-coupled chips, are known not to be covered. The logic known not to be covered by BIST is 5-10% of all the logic on the board.
Even though the DRAMs are not exhaustively tested, the BIST coverage of the memory system is quite good for control logic, board wiring, and address/data paths.
In the first 160 processor boards built, there were no cases of failures (outside of the areas known not to be covered by BIST) that were undetectable by BIST. In some cases BIST indicated a board to be bad even though the board passed all other tests and seemed to function normally. In these cases it is assumed that BIST detected a failure in a path not sensitized by normal operation.
After the BIST is completed, MDP software determines if the test passed or failed. The generated signature is compared against known good signatures kept in EEPROM. There is more than one valid signature possible. The signature is dependent on the type of BIST, either warm or cold, and the size of the external memory board installed. A simple solution would be to require that the correct warm and cold BIST signatures be updated any time a processor or memory board is installed. This is undesirable for a system intended to be user serviceable, as it would add a manual step to the installation process.
Keeping a list of known good signatures only on the processor or memory board presents a version control problem. A new size or version of memory board unknown to the processor would cause a signature miscompare and erroneously indicate a failing board.
The solution is a list of signatures in non-volatile memory kept on both the processor and memory boards. On the processor board, a list is kept of all good signatures possible when that processor board is paired with any memory board. On the memory board, a list is kept of all good signatures possible when that memory board is paired with any processor board.
After completing the BIST, the maintenance processor scans the list on both the processor board and the memory board. If the signature appears on neither list, the board fails the BIST. This approach allows engineering changes or new versions of either board to change the correct BIST signature without updating the signature list previously stored on the other board.
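The pass/fail decision described above reduces to a membership test over two lists. The Python sketch below is a minimal illustration; the function name and all signature values are hypothetical and do not come from the CLX.

```python
def bist_passes(signature, processor_board_list, memory_board_list):
    """Pass if the signature is known to either board's non-volatile list."""
    return signature in processor_board_list or signature in memory_board_list


# Hypothetical example: a newer memory board whose signature the older
# processor board has never heard of still passes, because the memory
# board carries the signature in its own list.
processor_list = {0x1A2B, 0x3C4D}
new_memory_list = {0x5E6F}
```

Here bist_passes(0x5E6F, processor_list, new_memory_list) is true even though the processor board's list predates the memory board, while a signature found on neither list reports a failure.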
Conclusion
The BIST design presented here does a good test of the board without high hardware overhead. Keeping control of the test in maintenance processor software minimized hardware overhead. The test of the processor board covers much of the dynamic RAM memory system as well as the scannable logic.
