I. INTRODUCTION
Many RAM test algorithms based on different fault models have been proposed during the past ten years [1]-[9]. Most of these algorithms were developed from a purely mathematical point of view and offer little insight into their practical importance.
The main objective of this paper is to show the feasibility of developing fault models and test algorithms based on actual device defects. The defects are modeled as local disturbances in the layout of a static random access memory (SRAM) array and translated to defects in the corresponding transistor diagram. The electrical behavior of each defect is analyzed and classified, resulting in a fault model at SRAM cell level. The defect modeling at layout level and the extraction to SRAM cell level are done using the inductive fault analysis (IFA) technique proposed at Carnegie Mellon University [10], [11]. Only a single defect per memory cell is assumed.
Two categories of defects can be distinguished at layout level: 1) global defects, such as too thick gate oxide or too thin polysilicon, caused by process etching errors or oven temperature variations;
2) spot defects, such as dust particles on the chip or the masks, scratches, and gate oxide pinholes.
In this paper only spot defects will be considered. These defects may result in complete or near breaks and shorts in the circuit. This paper deals only with complete shorts and opens; near breaks and shorts generally cause only parametric faults [12]. Another objective of our work was to develop a test algorithm well suited to SRAM self-test applications [13]. Therefore, much effort has been spent to keep the test algorithm regular, symmetric, and linear, with a simple address order. Minimization of the test length was considered a second priority; for other applications the test length might be optimized while maintaining the same fault coverage. RAM's are commonly organized as N words of m bits each. Most papers presenting test algorithms consider only the case m = 1 (a bit oriented SRAM). We present a general optimal solution to the test problems of both bit oriented and word oriented SRAM's.
II. THE SRAM FAULT MODEL
For the development of a fault model, an SRAM is divided into three blocks: 1) the memory array; 2) the address decoder; 3) the R/W logic.
Because these blocks differ in structure, they are analyzed separately. Defects in the address decoder and the R/W logic are mapped onto functionally equivalent faults in the memory array. This method, proposed by Nair, Thatte, and Abraham [4], has the advantage that all faults can be considered to be in the memory array.
In the next sections the following definitions are used.
Definition 1: A memory cell is said to be stuck-at if the logic value of the cell cannot be changed by any action on the cell or by influences from other cells.
Definition 2: A memory cell is said to be stuck-open if it is not possible to access the cell by any action on the cell.
Definition 3: A memory cell with a transition fault fails to undergo at least one of the transitions 0 → 1 or 1 → 0.
Definition 4: A memory cell, say cell i, is said to be state coupled to another memory cell, say cell j, if cell i is fixed at a certain value x (x ∈ {0, 1}) only if cell j is in one defined state y (y ∈ {0, 1}). State coupling is a nonsymmetric relation.
Definition 5: A memory cell i is said to have a multiple access fault to another memory cell j if a WRITE action with value x (x ∈ {0, 1}) to cell i also forces a WRITE action to cell j with value x. Multiple access is a nonsymmetric relation.
Definition 6: There is said to be a data retention fault in a cell if the cell fails to retain its logical value after some units of time.
0278-0070/90/0600-0567$01.00 © 1990 IEEE
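To make Definitions 1 and 3 to 5 concrete, the following Python sketch models a bit-oriented memory into which such faults can be injected. The class `FaultyBitMemory` and its parameters are hypothetical illustrations, not part of the paper. Stuck-open (Definition 2) and data retention (Definition 6) faults are not modeled, and state coupling is interpreted as cell i being held at x for as long as cell j is in state y.

```python
class FaultyBitMemory:
    """Bit-oriented memory with optional injected cell faults (illustrative only)."""

    def __init__(self, n, stuck_at=None, no_rise=(), no_fall=(),
                 state_coupled=None, multi_access=None):
        self.cells = [0] * n
        self.stuck_at = stuck_at or {}            # Def. 1: cell -> fixed value
        self.no_rise = set(no_rise)               # Def. 3: cells failing 0 -> 1
        self.no_fall = set(no_fall)               # Def. 3: cells failing 1 -> 0
        self.state_coupled = state_coupled or {}  # Def. 4: i -> (j, y, x)
        self.multi_access = multi_access or {}    # Def. 5: i -> j
        self._enforce()

    def _store(self, i, v):
        old = self.cells[i]
        if (old, v) == (0, 1) and i in self.no_rise:
            return                                # transition 0 -> 1 fails
        if (old, v) == (1, 0) and i in self.no_fall:
            return                                # transition 1 -> 0 fails
        self.cells[i] = v

    def _enforce(self):
        for i, v in self.stuck_at.items():        # stuck-at cells never change
            self.cells[i] = v
        for i, (j, y, x) in self.state_coupled.items():
            if self.cells[j] == y:                # cell i held at x while cell j == y
                self.cells[i] = x

    def write(self, i, v):
        self._store(i, v)
        if i in self.multi_access:                # the write is also forced onto cell j
            self._store(self.multi_access[i], v)
        self._enforce()

    def read(self, i):
        return self.cells[i]


mem = FaultyBitMemory(4, multi_access={0: 2})
mem.write(2, 1)
mem.write(0, 0)              # the faulty write also puts 0 into cell 2
assert mem.read(2) == 0      # cell 2 lost the 1 written earlier
```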
The Memory Array
The layout of a Philips 8k8 SRAM was used as a vehicle for the fault model study. This memory is made in a double poly CMOS process with passive pullup resistors. In order to model faults in the memory array, a physical defect model has been adopted. We used the spot defect model based on extra and missing layer material, as proposed by the IFA method [10], [11]. The model resulted in five different types of spot defects at layout level: broken wires, shorts between wires, missing contacts, extra contacts, and newly created transistors.
Defect analysis is done in two steps. The first step is the translation of spot defects in the layout to defects in the transistor circuit diagram. Fig. 1 shows two examples of spot defects in the layout extracted to circuit level. A detailed analysis of the Philips 8k8 SRAM resulted in a set of 60 spot defects in the layout that were analyzed in more detail. Many different layout defects resulted in the same defect at transistor diagram level. The second step is to classify the defects at transistor level based on equivalent faulty memory cell behavior. The result is a fault model at SRAM cell level, consisting of six fault classes:
1) a memory cell is stuck-at 0 or stuck-at 1; 2) a memory cell is stuck-open; 3) a memory cell suffers from a transition fault; 4) a memory cell is state coupled to another cell; 5) there is a multiple access fault from one memory cell to a memory cell at another address; 6) a memory cell suffers from a data retention fault in one of its states. The retention time depends on the leakage current to the substrate and the capacitance of the floating node. For the Philips 8k8 design, the retention time can be up to 100 ms.
The complete extraction of the fault model from the circuit defects is given in [14]. A probability factor for each fault class is required in order to assess its importance. The probability of a fault caused by a spot defect depends on the critical area: the chip area where a spot defect can damage the active part of the layout. It depends on the dimensions of the spot defect and the topology of the layout. For example, if two parallel metal wires are separated by 4 µm, a complete short between them can only be caused by a spot defect with a diameter of more than 4 µm. The probability of occurrence of the short is proportional to the length of the two wires; this length is called the critical path length. The accumulated critical path lengths for each fault class and the minimal spot defect dimensions have been calculated (c.f. Fig. 2). Stuck-at faults have the highest probability of occurrence, but the other fault classes are not negligible: stuck-at faults account for only about 50% of all faults. This is a clear indication that the stuck-at fault model is insufficient for SRAM's. The critical path lengths presented in Fig. 2 were based on the layout of the Philips 8k8 memory cells. Another layout will certainly change the critical path lengths of the various fault classes. However, since the structure of SRAM memory cells is similar across processes, we believe that no new fault classes will be introduced if another layout or process is used.
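The text does not state a critical-area formula, so the sketch below uses the common first-order geometric model for a short between two parallel wires; this model is our assumption and not necessarily the one behind Fig. 2. A circular defect of diameter x can bridge wires with spacing s only if x > s, and its centre may then fall anywhere in a band of width x − s along the gap, so the critical area is A(x) = L(x − s) for wires of length L. The linear dependence on L is why the short probability scales with the critical path length. The function name `short_critical_area` is hypothetical.

```python
def short_critical_area(length_um, spacing_um, diameter_um):
    """Critical area (um^2) in which the centre of a circular spot defect of the
    given diameter must land to short two parallel wires of the given length
    and spacing. First-order model: end effects at the wire ends are ignored."""
    if diameter_um <= spacing_um:
        return 0.0                    # the defect cannot bridge the gap
    # The centre must lie in a band of width (diameter - spacing) along the gap.
    return length_um * (diameter_um - spacing_um)


# Two wires 4 um apart (the example in the text), 1000 um long.
for d in (3.0, 5.0, 8.0):
    print(d, short_critical_area(1000.0, 4.0, d))
```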
Address Decoder
A general fault model for the address decoder is assumed, as presented by Nair, Thatte, and Abraham [4]: 1) more than one cell is accessed by one address; 2) an address accesses no cells.
Faults in the decoder can be viewed as memory array faults. Item 1 is equivalent to a multiple access fault from one cell to one or more cells at other addresses. Item 2 is equivalent to a stuck-open cell.
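The mapping of decoder faults onto array faults can be sketched as follows, assuming the decoder is described by a hypothetical table that maps each address to the set of cells it selects. The function `classify_decoder` is illustrative only and covers just the two decoder faults listed above.

```python
def classify_decoder(decode):
    """Map each address of a (possibly faulty) decoder to the equivalent memory
    array fault. `decode` maps an address to the set of cells it selects."""
    classes = {}
    for address, cells in decode.items():
        if not cells:
            classes[address] = "stuck-open"        # fault 2: no cell is accessed
        elif len(cells) > 1:
            classes[address] = "multiple access"   # fault 1: extra cells are accessed
        else:
            classes[address] = "normal"            # exactly one cell (assumed the right one)
    return classes


decode = {0: {0}, 1: set(), 2: {2, 3}, 3: {3}}
print(classify_decoder(decode))
```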
R/W Logic
The R/W logic passes the data information from the I/O pins to the memory array and vice versa. A word oriented SRAM with m bits per word is considered. Faults in the buses, sense amplifiers, and write buffers were considered, resulting in the following fault classes: 1) one or more of the m bits is stuck-at; 2) one or more of the m bits is stuck-open; 3) a pair of bits is state coupled.
All faults in the R/W logic can be regarded as faults in the memory array [4]. Item 1 is equivalent to a set of stuck-at cells. Item 2 is equivalent to a number of stuck-open cells. Item 3 is equivalent to a state coupling fault between two cells at the same address.
III. DESIGN DEPENDENT CONDITIONS
Translation of all faults to the memory array is not sufficient. Detection of all faults is only assured if the R/W logic is transparent for all faults in the memory array. This means the R/W logic must pass faults in the memory array to the output pins.
If the sense amplifier is combinational, or if the input of the sense amplifier is only one bit line, the sense amplifier will pass a properly defined logical value to the output pin. In this case, a stuck-open fault will be detected as if it were a stuck-at fault.
However, some sense amplifier designs include a data latch in the read path. This latch is used to broaden the read window of the RAM during normal operation, but it makes the R/W logic sequential. It has serious consequences for the coverage of cell stuck-open faults, since the last data read from the memory is stored in the latch. When a stuck-open cell is read, the contents of the data latch are passed to the output pin. If this happens to be the data expected from the memory, the stuck-open fault is not detected.
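The masking effect of such a latch can be illustrated with a small Python model. `LatchedReadPath` is a hypothetical name, and the power-up value 0 of the latch is an assumption; the model only captures the behavior described above, in which a read of a stuck-open cell returns the last value that passed through the latch.

```python
class LatchedReadPath:
    """Read path whose sense amplifier latch holds the last value read (illustrative)."""

    def __init__(self, cells, stuck_open=()):
        self.cells = cells
        self.stuck_open = set(stuck_open)   # cells that cannot be accessed (Def. 2)
        self.latch = 0                      # assumed power-up value of the latch

    def read(self, address):
        if address not in self.stuck_open:  # an accessible cell refreshes the latch
            self.latch = self.cells[address]
        return self.latch                   # a stuck-open cell returns stale latch data


path = LatchedReadPath([1, 1, 0], stuck_open={1})
path.read(0)                 # latch now holds 1
assert path.read(1) == 1     # cell 1 should hold 1: the stuck-open fault is masked
path.read(2)                 # latch now holds 0
assert path.read(1) == 0     # expected 1, so the fault is now detected
```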
IV. THE SRAM TEST ALGORITHM

Definition 7: A march element is a finite sequence of read and/or write operations applied to every cell in the memory array, either in increasing or decreasing address order.
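A march element can be represented as an address order plus a list of operations. The sketch below, with the hypothetical helper `apply_march_element`, is one possible encoding; it supports only increasing ("up") and decreasing ("down") order, as in Definition 7.

```python
def apply_march_element(memory, element):
    """Apply one march element to every cell of `memory` (a list of bits).

    `element` is (order, ops): order is "up" or "down"; ops is a sequence of
    ("r", v) = read and expect v, or ("w", v) = write v. All operations are
    applied to one cell before moving to the next address. Returns the list
    of addresses where a read did not return the expected value."""
    order, ops = element
    n = len(memory)
    addresses = range(n) if order == "up" else range(n - 1, -1, -1)
    mismatches = []
    for address in addresses:
        for op, value in ops:
            if op == "w":
                memory[address] = value
            elif memory[address] != value:
                mismatches.append(address)
    return mismatches


memory = [1, 0, 1, 0]
apply_march_element(memory, ("down", [("w", 0)]))
assert apply_march_element(memory, ("up", [("r", 0), ("w", 1)])) == []
assert memory == [1, 1, 1, 1]
```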
In the following sections, test algorithms are presented for both bit oriented and word oriented SRAM's with combinational or sequential R/W logic.
Bit Oriented SRAM with Combinational R/W Logic
A test algorithm of length 9N is presented, where N is the number of addresses. A data retention test is added in order to detect data retention faults. The complete algorithm is given in Fig. 3. An Rd(0) instruction means reading from the addressed cell and expecting a logical 0. A Wr(0) instruction means writing a logical 0 to the addressed cell. The address is indicated in the first column of the figure.
The proposed wait time in the data retention test depends on the node capacitance and the leakage current in a memory cell. In the Philips 8k8 a wait time of 100 ms was estimated. Other cell designs or processes may result in other wait times.
It can be proven that the 9N test algorithm detects all faults of the fault model. Details of the proof can be found in [17].
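The following sketch runs a 9N march test against a simulated memory with a multiple access fault. The element list `MARCH_9N` is an assumption, not a copy of Fig. 3: a common five-element march with 1 + 2 + 2 + 2 + 2 = 9 operations per cell, without the data retention part. The helper `run_march` and the injected fault are likewise illustrative.

```python
# Assumed 9N march sequence: 1 + 2 + 2 + 2 + 2 = 9 operations per cell.
# Illustrative only; Fig. 3 gives the authors' exact sequence, and its
# data retention part is omitted here.
MARCH_9N = [
    ("up",   [("w", 0)]),
    ("up",   [("r", 0), ("w", 1)]),
    ("up",   [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
]


def run_march(elements, n, read, write):
    """Run march elements on n addresses through read/write callbacks and
    return the set of addresses at which a read mismatch was observed."""
    failing = set()
    for order, ops in elements:
        addresses = range(n) if order == "up" else range(n - 1, -1, -1)
        for address in addresses:
            for op, value in ops:
                if op == "w":
                    write(address, value)
                elif read(address) != value:
                    failing.add(address)
    return failing


# Memory of 4 cells with a multiple access fault: a write to cell 1 also writes cell 3.
cells = [0] * 4


def faulty_write(address, value):
    cells[address] = value
    if address == 1:
        cells[3] = value


assert sum(len(ops) for _, ops in MARCH_9N) == 9
assert run_march(MARCH_9N, 4, lambda a: cells[a], faulty_write) == {3}
```

The mismatch reported at address 3 exposes the multiple access fault from cell 1 to cell 3; a fault-free write function would yield an empty set.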
Extension for Sequential R/W Logic
If the R/W logic includes a data latch, detection of stuck-open faults is not guaranteed by the 9N test algorithm, as explained in Section III. We have solved this problem by adding one extra read action to each march element such that the expected read values alternate between high and low. This ensures that the expected read value always differs from the previously read value, so each stuck-open fault will be detected. The resulting 13N test algorithm, including the data retention test, is given in Fig. 4. We do not guarantee that this test algorithm is the shortest possible one with 100% fault coverage over the fault model, but it is regular and symmetric, and thus suitable for BIST applications.
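The effect of the extra reads can be checked by simulation. In the sketch below, both march sequences are assumptions (Figs. 3 and 4 give the authors' exact algorithms, and the data retention parts are omitted): `MARCH_13N` is obtained from an assumed 9N sequence by appending a read of the value just written to every element that contains a read. `run_with_latch` is a hypothetical model in which the output pin always shows the latch, a stuck-open cell is never written, and reading it leaves the latch unchanged.

```python
MARCH_9N = [
    ("up",   [("w", 0)]),
    ("up",   [("r", 0), ("w", 1)]),
    ("up",   [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
]
# One extra read of the value just written, so that the expected read
# values alternate 0, 1, 0, 1, ... (1 + 3 + 3 + 3 + 3 = 13 operations per cell).
MARCH_13N = [
    ("up",   [("w", 0)]),
    ("up",   [("r", 0), ("w", 1), ("r", 1)]),
    ("up",   [("r", 1), ("w", 0), ("r", 0)]),
    ("down", [("r", 0), ("w", 1), ("r", 1)]),
    ("down", [("r", 1), ("w", 0), ("r", 0)]),
]


def run_with_latch(elements, n, stuck_open):
    """Simulate a march test on n cells read through a data latch; cell
    `stuck_open` cannot be accessed. Returns the addresses that fail."""
    cells = [0] * n
    latch = 0                                     # assumed power-up value of the latch
    failing = set()
    for order, ops in elements:
        addresses = range(n) if order == "up" else range(n - 1, -1, -1)
        for address in addresses:
            for op, value in ops:
                if op == "w":
                    if address != stuck_open:     # writes never reach a stuck-open cell
                        cells[address] = value
                    continue
                if address != stuck_open:         # accessible cell refreshes the latch
                    latch = cells[address]
                if latch != value:                # output pin always shows the latch
                    failing.add(address)
    return failing


assert run_with_latch(MARCH_9N, 4, stuck_open=2) == set()    # fault masked by the latch
assert run_with_latch(MARCH_13N, 4, stuck_open=2) == {2}     # fault detected
```

Under the 9N sequence, every read of the stuck-open cell expects the value that was read at the neighboring address just before, so the stale latch contents always match. The trailing read in each 13N element flips the expected value, which exposes the fault.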
Extension for a Word Oriented SRAM
A read or write action for a word oriented SRAM involves reading or writing an entire word of data, called a data background. The instructions in the test algorithms (c.f. Figs. 3 and 4) …
Word oriented SRAM's introduce the problem of state coupling faults between two cells at one address (or two bits in the R/W logic). To detect these faults, all states (00, 01, 10, and 11) of any two cells at the same address must be checked. This is only possible if several data backgrounds are used in the test algorithm. If m bits per word are used, a minimum of K data backgrounds is needed, where
$$K = \lceil \log_2 m \rceil + 1.$$
The proof of this statement can be found in [14]. If m is a power of two, the formula simplifies to
$$K = \log_2 m + 1.$$
As an example, consider a byte oriented SRAM (m = 8): according to the formula, K = 3 + 1 = 4. Detection of all coupling faults is guaranteed, since the inverted data backgrounds are also generated in the test algorithm.
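One way to construct ⌈log₂ m⌉ + 1 backgrounds, sketched below in Python, is to use the all-zero word plus, for each bit k of the binary index of a bit position, a word whose position b holds bit k of b. Any two positions then differ in at least one background (giving 01 and 10 together with its inverse) and agree in the all-zero one (giving 00 and 11). This construction and the function names are our illustration, not necessarily the set of backgrounds used by the authors.

```python
from itertools import combinations


def data_backgrounds(m):
    """Return ceil(log2 m) + 1 data backgrounds for m-bit words: the all-zero
    word, then for each bit k a word whose position b holds bit k of b."""
    rows = (m - 1).bit_length()           # equals ceil(log2 m) for m >= 1
    backgrounds = ["0" * m]
    for k in range(rows):
        backgrounds.append("".join(str((b >> k) & 1) for b in range(m)))
    return backgrounds


def covers_all_pair_states(backgrounds):
    """True if, with every background and its inverse, each pair of bit
    positions in the word takes all four states 00, 01, 10 and 11."""
    m = len(backgrounds[0])
    for i, j in combinations(range(m), 2):
        states = set()
        for word in backgrounds:
            a, b = int(word[i]), int(word[j])
            states |= {(a, b), (1 - a, 1 - b)}   # background and its inverse
        if len(states) < 4:
            return False
    return True


print(data_backgrounds(8))   # ['00000000', '01010101', '00110011', '00001111']
```

Dropping the all-zero background leaves positions 0 and 7 (indices 000 and 111) different in every remaining background, so the states 00 and 11 never occur for that pair.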
The number of data backgrounds can be minimized if the topology of a particular design is known.
V. PRACTICAL VALIDATION
The theoretical results obtained in the previous sections have been validated by testing and analyzing a large number of Philips 8k8 SRAM devices. Defective memories were evaluated and analyzed using microscopes and SEM techniques. In this way, the approach of deriving a fault model via a layout study was validated. Most analyzed devices indeed suffered from defects covered by our fault model; only 10% of the faulty devices behaved in an unexplained way. A detailed description of the practical validation analysis is given in [14].
The Objectives
The objectives of this validation phase were twofold. 1) Validation of the fault model. Do the defects, as encountered in our SRAM's, indeed behave as described in the fault model?
2) Validation of the test algorithms. How do our test algorithms compare with those proposed in the literature?
The Test Data Analysis
A total of 1192 devices of the Philips 8k8 SRAM were taken from 9 wafers of 3 different batches. These devices were tested with a set of 23 test algorithms on a Teradyne J389 memory tester. This set consists of the algorithms derived in Section IV and a number of algorithms proposed in the literature. The choice of the latter algorithms was guided by a previously performed extensive algorithm comparison study [15]. Finally, various implementations of the algorithms were used, based on several addressing orders and data backgrounds.
Executing a test algorithm on one device results in a number of detected faulty cells, called the bit error number. It is dangerous, however, to rely completely on the bit error number as a measure of test algorithm effectiveness. One defect in a decoder may result in a large bit error number, whereas one defect in the memory array may result in a small one. A second problem with the bit error number is the interpretation of the defects. The test algorithms used overlap in their ability to detect the faults of our fault model; correlations will, therefore, exist between the bit error numbers of the various test algorithms.
The bit error number can vary strongly from algorithm to algorithm and from device to device. Since we have translated defects in the decoders and the R/W logic into equivalent faults in the memory array, devices with completely different bit error numbers may nevertheless suffer from the same fault class.
To find these correlations, the "behavior" of the test algorithms over one device and the "behavior" of the test set over the total number of devices have been computed. The behavior of the test algorithms over one device is described using the notion of a signature; the behavior of the test set over all devices is described using the notion of a cluster. The signatures were also used to measure the effectiveness of the test algorithms.
For each device, each test algorithm is assigned a signature bit with value 0, 1, or 2. This bit is a function of the bit error number of the corresponding test algorithm and of two device statistics: m, the mean value of the bit error numbers for one device, and σ, the standard deviation of the bit error numbers for one device.
This process resulted in a 23-bit signature per device, which may be considered a fault-class representative.
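The exact mapping from bit error number to signature bit is not reproduced here. Purely for illustration, the sketch below assumes thresholds at m − σ and m + σ, using the population standard deviation; these thresholds and the function name `signature` are hypothetical and may differ from the authors' rule.

```python
from statistics import mean, pstdev


def signature(bit_error_numbers):
    """Turn one device's bit error numbers (one per test algorithm) into a
    tuple of signature digits 0, 1 or 2.

    HYPOTHETICAL thresholds: below m - sigma -> 0, above m + sigma -> 2,
    otherwise 1, with m and sigma the mean and population standard
    deviation over this device's bit error numbers."""
    m = mean(bit_error_numbers)
    sigma = pstdev(bit_error_numbers)
    digits = []
    for b in bit_error_numbers:
        if b < m - sigma:
            digits.append(0)
        elif b > m + sigma:
            digits.append(2)
        else:
            digits.append(1)
    return tuple(digits)


print(signature([0, 10, 10, 20]))
```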
Some of the most frequently occurring signatures differed in only one or two signature bits. These differences were caused by random defects and multiple faults on a device. To obtain the main fault classes, the signatures were clustered using a statistical clustering algorithm similar to the ISODATA algorithm described by Tou and Gonzalez [16]. We expected that each cluster could be characterized by a single fault class.
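The idea of merging signatures that differ in only a few positions can be sketched with a greedy Hamming-distance grouping. This is a deliberately simpler technique than the ISODATA-style clustering the authors used; the function names and the distance limit of 2 are assumptions chosen to match the "one or two signature bits" observation.

```python
def hamming(a, b):
    """Number of positions in which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def group_signatures(signatures, max_distance=2):
    """Greedy grouping: each signature joins the first cluster whose seed is
    within max_distance positions; otherwise it seeds a new cluster.
    The result depends on input order (unlike ISODATA-style methods)."""
    clusters = []                      # list of (seed, members)
    for sig in signatures:
        for seed, members in clusters:
            if hamming(seed, sig) <= max_distance:
                members.append(sig)
                break
        else:
            clusters.append((sig, [sig]))
    return clusters


sigs = [(1, 1, 1, 1), (1, 1, 1, 2), (0, 0, 2, 2), (1, 2, 1, 2)]
for seed, members in group_signatures(sigs):
    print(seed, len(members))
```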
Validation of the Fault Model
Our analysis resulted in 15 clusters (c.f. Fig. 5(a)). Several devices from each cluster were analyzed in more detail using a light microscope and an SEM. Devices from the same cluster indeed all suffered from the same fault class. The result of this analysis is shown in Fig. 5(b). Notice that most of the theoretically derived fault classes are indeed important and were encountered in practice. Only transition faults were hardly ever found, while multiple access faults occurred regularly because of faults in the decoder. Examples of spot defects in devices from several clusters are shown in [17].
About 10% of the analyzed defects showed behavior that is not yet understood. These faults remain a subject of further research and of improvement of the IFA technique and the fault model.
Validation of the Test Algorithm
The signatures used for the fault model validation were also used for the test algorithm validation. Each algorithm was assigned a score number: the mean value of its signature bits taken over all devices. The maximum score a test algorithm could reach was therefore 2. The higher the score of a test algorithm, the higher its coverage of real faults compared to the other test algorithms. The results of the test algorithm comparison study are shown in Fig. 6. The test algorithms developed in Section IV are called IFA-9N and IFA-13N.
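The score defined above is a column mean over the device signatures. The sketch below assumes signatures are stored as equal-length tuples with one digit per test algorithm; `algorithm_scores` is a hypothetical helper.

```python
def algorithm_scores(signatures):
    """Mean signature digit of each test algorithm over all devices (max. 2)."""
    n_devices = len(signatures)
    # zip(*signatures) yields one column, i.e. one test algorithm, at a time.
    return [sum(column) / n_devices for column in zip(*signatures)]


print(algorithm_scores([(2, 1, 0), (2, 2, 1)]))   # [2.0, 1.5, 0.5]
```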
VI. CONCLUSIONS
Defect analysis using the IFA technique results in a physical failure-based fault model. The proposed test algorithms show excellent features in both test time and fault coverage. The fault model and test algorithms have been validated in practice. The test algorithms cover 100% of the faults under the fault model and are independent of the row, column, and cell arrangement in the memory array. The choice of test algorithm is slightly design dependent: a test algorithm of length 9N is given for SRAM's with combinational R/W logic, and one of length 13N for SRAM's with sequential R/W logic. Both test algorithms are suitable for bit and word oriented SRAM's. We have taken advantage of the regularity and symmetry of the test algorithms by implementing them in a self-test machine [13].
It is important to notice that 60% of the defects behave as stuck-at faults (memory stuck-at faults and total chip failures). The yield of the sample was low, so many total chip failures occurred. Hence, a sample with a higher yield (fewer total chip failures) will show an even lower percentage of stuck-at faults. In the theoretical approach, single defects were assumed and 50% stuck-at faults were estimated.
Ninety percent of the analyzed defects could be explained using the IFA technique and are covered by the fault model. Almost all fault classes of the fault model indeed occurred regularly in actual devices. The 10% of defects that are not yet understood form the input of current research activities aimed at improved defect modeling and fault modeling techniques.
[15] P. Veenstra, F. Beenker, and J. Koomen, "Testing of random access memories, theory and practice," Inst. Elect. Eng. Proc., vol. 135, pt. G, no. 1, pp. 24-28, Feb. 1988.
