register (MAR) and the memory data register (MDR) form the bottleneck in speeding up the test algorithms. In this paper, a new RAM organization with a parallel testing technique is proposed, and two $O(\sqrt{n})$ algorithms are designed to detect pattern-sensitive faults. In the test mode, the decoder selects multiple bit lines, allowing the same data to be written simultaneously into many storage locations on the same word line. In the read mode, a multibit comparator concurrently detects the error, if any. The additional hardware is minimal and has been designed to fit within an intercell pitch width of $3\lambda$ to $6\lambda$, where $\lambda$ is the technological feature width.
The technique of parallel testing was first proposed by You and Hayes [8], who reconfigured the memory cells into an n-bit shift register and used a built-in test generator to test many bit lines concurrently. This test procedure has O ( 6 ) complexity. It detects limited types of pattern-sensitive faults, in which a write operation becomes faulty in the presence of a few specific patterns in the adjacent cells. It does not detect faults caused by transitions in the neighborhood. Moreover, it relies on comparing only two cells in memory arrays having two partitions; thus, if two cells are identically faulty, the test fails to detect the fault. In contrast to their technique, this paper uses a multibit comparator that concurrently detects the simultaneous presence of 0's and 1's. You and Hayes reconfigured a memory subarray of s bits into an s-bit cyclic shift register, in which the data recirculated whenever a read operation was performed. The reconfiguration was accomplished by introducing pass transistors on the bit lines, which degraded both the sensitivity of the sense amplifiers, by $V_T$ (the threshold voltage of the MOS devices), and the access time of the RAM in the normal mode of operation.
Kinoshita and Saluja [9] analyzed the overhead involved in a microcoded, ROM-based built-in test generator for testing pattern-sensitive faults with test algorithms of linear time complexity. In a subsequent paper, Le and Saluja [10] used the concept of parallel access by modifying the bit lines, and improved the complexity of their algorithms. A potential problem in their technique is the reliability of the parallel read operation, particularly in multimegabit DRAMs, where hundreds of cells are accessed simultaneously for reading and writing.
Sridhar [11] proposed an alternative scheme that uses a parallel signature analyzer (PSA) to access many bit lines simultaneously. The PSA operates in three distinct modes: in the scan mode, it is loaded with a specific pattern from outside the chip by sequentially scanning in the bits; in the write mode, it writes its stored value to many bit lines in parallel; finally, in the signature mode, it reads the contents of the memory cells written earlier and generates an error-quotient bit if there is any error. Even though signature analyzers are frequently used for testing, the PSA-based scheme has intrinsic problems in memory testing. First, the scan mode sometimes reduces the parallelism in testing: Han and Malek [12] demonstrated that, by using a PSA, the popular marching test algorithms [13] could be sped up by at most 1.2 times. Second, the PSA introduces aliasing errors. An error in the output of the $j$th-stage flip-flop at time $t_i$, followed by an error in the output of the $(j+h)$th stage at time $t_{i+h}$, has no effect on the signature if there is no feedback tap in the PSA between the $j$th and $(j+h)$th stage flip-flops. Finally, the overhead of a PSA can be fairly high: a b-bit PSA requires approximately $15b$ transistors. By contrast, the proposed technique uses only $2b + 2\log b + 12$ extra transistors, which can easily be laid out even within a $3\lambda$ intercell pitch width in high-density DRAMs.
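The aliasing condition can be checked with a small simulation. The Python sketch below models a b-stage PSA as a Fibonacci-style multiple-input shift register (MISR) in which the tapped stages feed back into stage 0; the register width, the tap positions (5, 7), and the random input stream are illustrative assumptions, not the design of [11]. Because the register is linear over GF(2), an error injected at stage $j$ at time $t_i$ travels one stage per clock and is cancelled by a second error injected at stage $j+h$ at time $t_{i+h}$, unless a tapped stage lies among stages $j, \ldots, j+h-1$.

```python
import random
from functools import reduce
from operator import xor

def misr_clock(state, inputs, taps):
    """One clock of a hypothetical Fibonacci-style MISR.

    Stage 0 receives its parallel input XOR the feedback from the tapped stages;
    every other stage k receives stage k-1 XOR its own parallel input.
    """
    feedback = reduce(xor, (state[t] for t in taps), 0)
    return [inputs[0] ^ feedback] + [state[k - 1] ^ inputs[k] for k in range(1, len(state))]

def signature(stream, taps, width):
    """Final register contents after clocking in every input vector of `stream`."""
    state = [0] * width
    for inputs in stream:
        state = misr_clock(state, inputs, taps)
    return state

def with_errors(stream, errors):
    """Copy of `stream` with the listed (time, stage) input bits flipped."""
    faulty = [list(v) for v in stream]
    for t, stage in errors:
        faulty[t][stage] ^= 1
    return faulty

rng = random.Random(1)
WIDTH, TAPS = 8, (5, 7)              # assumed width and feedback taps (tap 7 keeps the map invertible)
good = [[rng.randint(0, 1) for _ in range(WIDTH)] for _ in range(10)]
ref = signature(good, TAPS, WIDTH)
# Errors at stage 1 (time 2) and stage 3 (time 4): no tap in stages 1..2, so they alias.
print(signature(with_errors(good, [(2, 1), (4, 3)]), TAPS, WIDTH) == ref)   # True
# Errors at stage 4 (time 2) and stage 6 (time 4): tap 5 lies in between, so they are caught.
print(signature(with_errors(good, [(2, 4), (4, 6)]), TAPS, WIDTH) == ref)   # False
```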
II. RAM ORGANIZATION AND OPERATION
Generally, an $n \times 1$-bit RAM, denoted by $M_n$, consists of p identical two-dimensional submatrices, each of size $b \times w$ ($= r$, say), such that $n = pbw$. A submatrix, $M_r$, consists of b bit lines and w word lines. In two-layer VLSI technology, either the bit line or the word line is a diffusion/polysilicon wire, whose signal propagation delay grows quadratically with its length, as opposed to the linear delay of a metal wire; it is therefore mandatory to partition $M_n$ into p submatrices so that the access time does not deteriorate for large values of n. Also, long bit lines require larger sense amplifiers to maintain their sensitivity. Partitioning has an intrinsic advantage: since all the submatrices can be tested concurrently, a testing speedup of p is easily achieved by incorporating built-in testing circuits or by modifying the addressing scheme in the test mode. However, p partitions are achieved at the expense of multiplying the number of sense amplifiers by a factor of p and of physically redistributing the decoder logic, which introduces additional routing complexity and delay. For practical large DRAMs of 64K bits or more, p is usually chosen between 2 and 16. The ratio $e = b/w$ is called the eccentricity of $M_r$ and is chosen to minimize the access time. Usually, $e \ge 1$ if the word line is made of metal, and $e \le 1$ if the bit line is made of metal.
Let $B = \{0, 1, \ldots, b-1\}$ denote the set of bit lines in $M_r$ and $W = \{0, 1, \ldots, w-1\}$ denote the set of word lines. The ordered pair $(i, j)$ in the Cartesian product $B \times W$ denotes the address of the storage cell at the crosspoint of the $i$th bit line and the $j$th word line. Such a cell is denoted by $C_{ij}$, and its state by $s_{ij}$. An operation $\phi$ on cell $C_{ij}$ is denoted by $\phi(C_{ij})$, and the resulting new state by $s'_{ij}$. Valid operations are writing a 1 or a 0 into a cell and reading the content of the cell; these operations are denoted by $W_1$, $W_0$, and $R$, respectively. In addition, the operation of writing a transition (i.e., the complement of the present state) will be denoted by $W_{\bar{s}_{ij}}$, and the operation of writing $y \in \{0, 1\}$ will be denoted by $W_y$.
Depending on the previous state of the cell, $s_{ij} \in \{0, 1\}$, the effect of applying $\phi \in \{W_1, W_0, R\}$ to $C_{ij}$ can be further classified by the ordered pair in the Cartesian product $\{0, 1\} \times \{W_1, W_0, R\}$:
1) Ordered pair $(0, W_1)$ is to write 1 erasing a previous 0. It is called transition write 1, and is denoted by ↑.
2) Ordered pair $(0, W_0)$ is to write 0 over a previous 0, and is a nontransition write.
3) Ordered pair $(1, W_1)$ is to write 1 over a previous 1, and is a nontransition write.
4) Ordered pair $(1, W_0)$ is to write 0 over a previous 1. It is called transition write 0, and is denoted by ↓.
5) Ordered pairs $(0, R)$ and $(1, R)$ denote the operations of reading a cell whose previous contents are 0 and 1, respectively.
The operations $W_0$, $W_1$, and $R$ can be represented by the arcs in the state diagram shown in Fig. 1. Since a DRAM cell stores a 0 or a 1, the state diagram consists of two states, $S_0$ and $S_1$, corresponding to the stored values 0 and 1, respectively. In a fault-free DRAM, a read operation R outputs the content of the cell, but the cell stays in its original state, as shown by the self-loops marked R in Fig. 1. If the cell is in state $S_0$ and a 0 is written by the operation $W_0$, it stays in the same state. Similarly, if the cell is in state $S_1$ and a 1 is written by the operation $W_1$, it does not change state. Hence, these two write operations are called nontransition writes. In contrast, an operation $W_1$ changes the state if the cell is in $S_0$ before the write, and an operation $W_0$ changes the state if the cell is in $S_1$ before the write. These write operations are called transition writes. In this paper, it will be assumed that only the transition writes and read operations may be faulty. The nontransition writes generally do not result in faults and will not be tested by the proposed test algorithms. In memories where nontransition writes can be faulty, they can easily be tested by following every transition write in the test procedure with an additional nontransition write. In memories with nondestructive read, the operations $(0, R)$ and $(1, R)$ are such that $s'_{ij} = s_{ij}$. But in memories with destructive read, $s'_{ij}$ may differ from $s_{ij}$.
The read operations in this paper will be assumed to be destructive, as in a single-cell DRAM, and they will be tested.
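The five cases above, together with the fault-free behavior of Fig. 1, amount to a lookup on the ordered pair $(s_{ij}, \phi)$. The Python sketch below is illustrative only: the `Op` enum, the string labels, and the assumption that a read restores the cell's value are ours, chosen to mirror the fault-free state diagram.

```python
from enum import Enum

class Op(Enum):
    W0 = "write 0"
    W1 = "write 1"
    R = "read"

def classify(prev_state: int, op: Op) -> str:
    """Classify the ordered pair (s_ij, op) into the five cases listed above."""
    if prev_state not in (0, 1):
        raise ValueError("a cell state must be 0 or 1")
    if op is Op.R:
        return "read"
    written = 1 if op is Op.W1 else 0
    if written == prev_state:
        return "nontransition write"
    # (0, W1) is transition write 1 (up); (1, W0) is transition write 0 (down).
    return "transition write 1 (up)" if written == 1 else "transition write 0 (down)"

def next_state(prev_state: int, op: Op) -> int:
    """Fault-free next state s'_ij: a read (with restore) leaves the state unchanged."""
    if op is Op.R:
        return prev_state
    return 1 if op is Op.W1 else 0

for s in (0, 1):
    for op in Op:
        print(s, op.name, "->", next_state(s, op), classify(s, op))
```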
III. DESIGN FOR PARALLEL TESTING OF RAM'S
The organization of the testable RAM with augmented hardware is shown in Fig. 2. The memory is organized as a $b \times w$ matrix, where b is the number of bit lines and w is the number of word lines. The normal 1-out-of-b bit-line decoder is modified to select multiple bit lines during testing. In general, the b bit lines can be divided into g groups such that bit line i belongs to group j, where $j = i \bmod g$. In the test mode, all the bit lines in a group j are selected in parallel to execute a read or a write operation. Thus, a write operation in the test mode writes the content of the data-in buffer into all cells at the crosspoints of the selected word line and the selected bit lines in group j. In the read mode, the contents of the cells at the crosspoints of the selected word line and the selected bit-line group (say j) are read in parallel. Thus, if all the multiple-accessed cells contain 0 (1), a 0 (1) is read out. If the contents of the multiple-accessed cells are not all identical, then whether a 0 or a 1 is read out into the data-out buffer cannot be predetermined. It is not entirely correct to assume that the resulting operation will be strictly a wired-OR or a wired-AND; the outcome depends on the numbers of 0's and 1's in the multiple-accessed cells. For example, if all the cells except one contain 1 and only one cell contains 0, then most likely a 1 will be read out into the data-out buffer. However, if the reverse situation exists, i.e., one 1 and the rest 0's, then a 0 is likely to be read. Thus, the resulting effect is sometimes an OR operation and sometimes an AND operation. Because of this anomaly, it is not possible to determine, by monitoring the data-out buffer during a read operation, whether all the multiple-accessed cells have identical contents. This limitation has been circumvented by incorporating a parallel comparator and an error detector in the RAM.
The parallel comparator determines whether the contents of the multiple-accessed cells are either all 0 or all 1. If, due to some fault, a write operation on a cell fails or the content of some cells changes, the inputs to the comparator are not all identical. The comparator detects this anomalous input and triggers the error latch to indicate the occurrence of a fault.
It should be stressed that only the bit-line decoder is modified to allow multiple access of cells on the selected word line. The word-line decoder is not modified, and word lines are accessed only one at a time. Accessing multiple cells through multiple word lines is also possible, but this requires each sense amplifier to drive many cells at a time. For a moderately large RAM, this introduces a very high access-time delay. The delay can be decreased to a certain extent by increasing the physical size of the sense-amplifier driver. But as the driver size is increased, it consumes more power and, because of its large gate capacitance, the sense-amplifier slew rate decreases.
A. Modified Decoder Circuit
A typical CMOS implementation of a decoder circuit is shown in Fig. 3. Transistors $Q_1, \ldots, Q_7$ constitute a normal decoder circuit. Transistors $Q_8$ and $Q_9$ have been added so that in the test mode the decoder output can be selected by applying SELECT = 0, independent of the input address. In the normal mode of operation, SELECT = 1 and the decoder output is selected by the address inputs $a_0, \ldots, a_{k-1}$. The modified decoder has been simulated using SPICE, and the performance degradation due to the additional elements was found to be an additional 0.1-ns delay (1 percent of the normal decoding delay).
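Logically, the added transistors $Q_8$ and $Q_9$ OR a forced-select term into the ordinary address match. The Python sketch below models only that logic, not the CMOS circuit of Fig. 3. How SELECT is distributed to the bit lines of one group is not specified in this section; the example assumes, hypothetically, one SELECT line per group and an applied address that lies inside the selected group.

```python
def decoder_output(address: int, line: int, select: int) -> bool:
    """Behavioral model of one modified decoder output.

    SELECT = 1 (normal mode): asserted only when the address inputs
    a_0 ... a_{k-1} encode this output's index.
    SELECT = 0 (test mode): asserted independent of the address.
    """
    return select == 0 or address == line

def decode(address: int, select_lines):
    """Indices of asserted outputs; select_lines[i] is the SELECT input of output i."""
    return [i for i, sel in enumerate(select_lines) if decoder_output(address, i, sel)]

b, g = 8, 2
print(decode(5, [1] * b))                                       # normal mode: [5]
# Hypothetical per-group SELECT: drive SELECT low only on the lines of group 1,
# and apply an address inside that group so no line outside it is selected.
print(decode(1, [0 if i % g == 1 else 1 for i in range(b)]))    # [1, 3, 5, 7]
```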
B. Parallel Comparator and Error Detector
The parallel comparator and error detector monitors the outputs of the sense amplifiers connected to the bit lines that are selected in parallel. The circuit detects the occurrence of either all 0's or all 1's. If any selected bit line differs from the other bit lines in the same group, it triggers the error latch, indicating the occurrence of a fault. Fig. 4 shows the parallel comparator and error detector, which compares the contents of cells on the even (odd) bit lines together. Transistors $Q_0, \ldots, Q_{b-1}$ are connected to the sense amplifiers and are selected by $L_1$ and $L_2$. In the normal mode of operation, $L_1 = L_2 = 1$ and the p-type pass transistors isolate the comparator from the sense amplifiers. In the test mode, if $L_1 = 1$ and $L_2 = 0$, all the p-type pass transistors connected to the odd bit lines conduct and provide the input to the parallel comparator, while the even bit lines remain disconnected. If $L_1 = 0$ and $L_2 = 1$, all the p-type transistors connected to the even bit lines provide the input to the parallel comparator. If the selected bit lines are not all identical, then both $S_1$ and $S_2$ remain in cutoff and the output of the detector is 1. The output of the detector is connected to an error latch through the pass transistor $S_4$, which isolates the error latch during phase $\phi_1$. During the precharge phase, the transistor $S_0$ would be shorted directly through the error amplifier if $S_4$ did not isolate the coincidence detector from the error amplifier. During phase $\phi_2$, the output of the coincidence detector is connected to the error amplifier through $S_4$. The error amplifier consists of transistors $V_0, V_1, \ldots$. The error latch output is ERROR = 1 when the selected bit lines are not identical. If the bit lines are all 1's, then $S_1$ conducts, $S_2$ remains cut off, and the detector output is 0; this holds the error latch output at ERROR = 0. If the bit lines are all 0's, then $S_2$ conducts and $S_1$ remains cut off, again setting the error latch output to ERROR = 0.
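The selection and coincidence logic of Fig. 4 can be summarized behaviorally. The Python sketch below ignores precharge and the $\phi_1$/$\phi_2$ timing and simply returns the value ERROR would take after the transfer through $S_4$; treating $L_1 = L_2 = 0$ as invalid is our assumption, since the text does not describe that combination.

```python
def comparator_inputs(bit_lines, L1, L2):
    """Sense-amplifier outputs passed to the comparator by the p-type pass transistors.

    A p-type pass transistor conducts when its gate is 0. Following the text,
    L1 = L2 = 1 isolates the comparator (normal mode), L1 = 1 and L2 = 0 connect
    the odd bit lines, and L1 = 0 and L2 = 1 connect the even bit lines.
    """
    if (L1, L2) == (1, 1):
        return []
    if (L1, L2) == (1, 0):
        return bit_lines[1::2]
    if (L1, L2) == (0, 1):
        return bit_lines[0::2]
    raise ValueError("L1 = L2 = 0 is not a mode described in the text")

def detector_output(values) -> int:
    """Coincidence detector: S1 conducts on all 1's, S2 on all 0's (output 0);
    with mixed inputs both stay cut off and the output is 1."""
    s1_conducts = all(v == 1 for v in values)
    s2_conducts = all(v == 0 for v in values)
    return 0 if (s1_conducts or s2_conducts) else 1

def error_bit(bit_lines, L1, L2) -> int:
    """Value of ERROR; the latch is clamped to 0 when the comparator is isolated."""
    inputs = comparator_inputs(bit_lines, L1, L2)
    if not inputs:
        return 0
    return detector_output(inputs)

lines = [0, 1, 0, 1, 0, 1, 0, 1]       # odd bit lines hold 1, even bit lines hold 0
print(error_bit(lines, 1, 0), error_bit(lines, 0, 1))   # 0 0
lines[3] = 0                           # one odd cell failed
print(error_bit(lines, 1, 0))          # 1
```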
During the write phase and in the normal mode of operation, the error latch is clamped to zero by one of the error-amplifier transistors. At the start of the read phase, when the sense-amplifier outputs may not yet be identical because some sense amplifiers change sluggishly, the error detector is inhibited by a discharge transistor.
IV. ALGORITHMS FOR TESTING PATTERN-SENSITIVE FAULTS
The test algorithms discussed in this paper have $O(\sqrt{n/p})$ complexity, and they can detect the restricted pattern-sensitive faults (PSF's) [14]. Hayes [15] has shown that testing for unrestricted PSF's has $(3n^2 + 2n)2^n$ complexity, which is impractical in a large DRAM. In a restricted PSF, an operation $\phi(C_{ij})$ is faulty due to the presence of a specific data pattern in a set of physically adjacent cells (called its neighbors), or due to a specific operation on one or more cells in the neighborhood. A static PSF (SPSF) is said to occur if an operation $\phi(C_{ij})$ on the base cell fails in the presence of a specific pattern in its neighborhood. A dynamic PSF (DPSF) is said to occur if the state of the cell $C_{ij}$ changes because of an operation $\phi$ on one cell in its predefined neighborhood in the presence of a specific pattern. The above definitions apply to all memory operations, namely, reads, transition writes, and nontransition writes. Unless specifically stated otherwise in the following discussion, we will assume that only the transition writes and reads are faulty.
Let the cells in the neighborhood be assigned the k distinct nonnegative numbers in the set $\{0, 1, \ldots, k-1\}$ such that the number i is assigned to the base cell. The content of the neighborhood can be denoted by a state vector $(s_0 s_1 \cdots s_{k-1})$, where $s_j$ is the state of the cell that has been assigned the number $j \in \{0, 1, \ldots, k-1\}$. Clearly, the state-vector space of a neighborhood of size k describes a k-dimensional symmetric Boolean hypercube in which the nodes represent all the different patterns that can be stored in the neighborhood. Two nodes that differ in exactly one bit are joined by a pair of opposite directed edges, each representing the transition write (↑ or ↓) on the cell in which they differ. A valid test is a tour on the directed Boolean cube that traverses all edges. A tour that traverses each edge once and only once is a minimum-length test set; such a tour is called an Eulerian tour on the graph. An example of an Eulerian tour in Fig. 6 is (0, 2, 3, 1, 0, 4, 6, 7, 5, 4, 5, 7, 6, 4, 0, 1, 3, 2, 0), recalling that each edge in Fig. 6 represents two directed edges in opposite directions. Similarly, an Eulerian tour on the Boolean 5-cube gives a minimum-length test set for the SPSF's and DPSF's in the Type-1 neighborhood of Fig. 5. Algorithms for deriving an Eulerian tour over a symmetric n-cube are well known [3], [4], [18].
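Any standard Eulerian-circuit algorithm yields such a tour. The Python sketch below uses Hierholzer's algorithm, which is our choice and not necessarily the method of [3], [4], [18]; a node is the neighborhood pattern read as a binary number with the state of cell j as bit j, and each step flips one bit, i.e., performs one transition write. For the 5-cube it produces the 160 transition writes used by Algorithm 1.

```python
def symmetric_cube_euler_tour(k: int, start: int = 0):
    """Eulerian circuit over the symmetric k-cube (Hierholzer's algorithm).

    Nodes 0 .. 2**k - 1 are neighborhood patterns. Every node has k outgoing
    edges (flip one bit) and k incoming edges, so an Eulerian circuit exists
    and uses all k * 2**k directed edges exactly once.
    """
    unused = {v: list(range(k)) for v in range(2 ** k)}   # bits not yet flipped from v
    stack, tour = [start], []
    while stack:
        v = stack[-1]
        if unused[v]:
            bit = unused[v].pop()
            stack.append(v ^ (1 << bit))                  # follow an unused directed edge
        else:
            tour.append(stack.pop())                      # dead end: emit v
    return tour[::-1]

tour = symmetric_cube_euler_tour(5)
print(len(tour) - 1)        # 160 transition writes for the Type-1 neighborhood
print(tour[:6])             # first few patterns of one valid tour
```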
A. Algorithm for Testing PSF over Type-1 Neighborhood
In order to obtain a test with an optimal number of transition writes, each memory cell $C_{ij}$ in the RAM is assigned a nonnegative integer $k = (2j + i) \bmod 5$, so that $k \in \{0, 1, 2, 3, 4\}$. This assignment function ensures that each base cell $C_{ij}$ that is assigned the value k is surrounded by four cells ($N_1$) whose assigned values are distinct and different from k. Fig. 7 shows the cell assignment values, from which it can be seen that every cell assigned the value 1 has cells numbered 2, 3, 4, and 0 in its Type-1 neighborhood. By assigning the cell numbers in this manner, the average number of transition writes per cell can be minimized to 32 (the 160 transition writes of the tour shared among the five cell numbers) when the sequence of writes is derived from an Eulerian tour. This assignment also ensures that the cell values form a periodic pattern that repeats after five successive word lines. A number of other neighborhoods consisting of five cells can be described by the same cell-numbering scheme [19]. These neighborhoods, which are identified later, can be shown to tessellate the memory plane like the Type-1 neighborhood. By superimposing these different neighborhoods, the actual effective physical size of the neighborhood spans 25 cells. Table II shows the successive bit patterns obtained from an Eulerian cycle over a 5-cube; they constitute the optimal transition writes necessary to sensitize all the SPSF's and DPSF's. The Hamming distance between any two successive patterns in Table II is always 1. Thus, to obtain a successive pattern, only a single transition write is required on a cell in the neighborhood. The transition write involved in obtaining the $m$th test pattern from the $(m-1)$th test pattern is defined to be operation #m. For example, operation #1 in Table II writes a transition 1 (↑) on the cell numbered 3 in the neighborhood, and operation #6 writes a transition 0 (↓) on the cell numbered 4. Let the state of the neighborhood $S_{m-1}$ change to $S_m$ (where $0 \le S_m \le 31$) due to operation #m in Table II. Then, in general, the operation is ↑ on the cell numbered $p \in \{0, 1, 2, 3, 4\}$ in the neighborhood if $S_m = S_{m-1} + 2^p$, or ↓ on the cell numbered p if $S_m = S_{m-1} - 2^p$.
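The two properties claimed for the numbering, and the rule for reading operation #m off a pair of successive patterns, can be checked mechanically. This Python sketch assumes that the Type-1 neighbors of $C_{ij}$ are the four cells adjacent to it along its bit line and its word line, which is what the four-distinct-neighbors property requires; bit p of a state is the content of the cell numbered p.

```python
def cell_number(i: int, j: int) -> int:
    """Type-1 number of cell C_ij (bit line i, word line j): (2j + i) mod 5."""
    return (2 * j + i) % 5

def type1_neighbors(i: int, j: int):
    """Four cells adjacent to C_ij along its word line and its bit line
    (assumed here to be the Type-1 neighbors of Fig. 5)."""
    return [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]

def decode_operation(prev_state: int, state: int):
    """Operation #m taking S_{m-1} to S_m: returns (cell number p, direction)."""
    diff = prev_state ^ state
    if diff == 0 or diff & (diff - 1):
        raise ValueError("successive patterns must differ in exactly one bit")
    p = diff.bit_length() - 1
    return p, ("up" if state == prev_state + (1 << p) else "down")

# Every cell sees the four other numbers among its neighbors, and the
# numbering repeats every five word lines.
for i in range(10):
    for j in range(10):
        others = {cell_number(x, y) for x, y in type1_neighbors(i, j)}
        assert others == set(range(5)) - {cell_number(i, j)}
        assert cell_number(i, j + 5) == cell_number(i, j)

print(decode_operation(0b00000, 0b01000))   # (3, 'up')
print(decode_operation(0b11100, 0b01100))   # (4, 'down')
```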
The above set of operations can be applied to the memory locations as described in Algorithm 1. All the memory cells numbered p on a word line are written in parallel using the proposed modified decoder. The word lines are written sequentially, so that in w writes (lines e–g of Algorithm 1) operation #m is applied to all the cells numbered p, and all the neighborhoods in the memory then contain pattern m of Table II. In one read operation, all bits on a word line are read and compared simultaneously using the parallel comparators.
Thus, w operations are needed to read the whole memory in line i of Algorithm 1. Note that, in executing lines e–k for a particular value of m, a transition write on p sensitizes one SPSF for the pattern in line m for all cells numbered p, and it also sensitizes one DPSF for all cells numbered other than p. For example, for operation #55 of Table II, in which a transition write (↑) is made on the cells numbered 0, lines e–k of Algorithm 1 sensitize, for all cells numbered 0, the SPSF in which a ↑ cannot be made in the presence of the binary pattern 1100 in cells 4, 3, 2, and 1, respectively. Simultaneously, they sensitize four DPSF's, in the neighborhoods in which cells 4, 3, 2, and 1 are the base cells. This is illustrated in Fig. 8. By executing lines e–k 160 times, once for each pattern in Table II, all the SPSF's and DPSF's in the Type-1 neighborhood are sensitized. Since there are altogether 160 transition writes (plus one write operation for memory initialization), the memory is read 160 times, and each operation is performed on w word lines, the overall complexity of the algorithm is $321w$. In an n-bit RAM organized into p identical square submatrices, $w = \sqrt{n/p}$, so Algorithm 1 requires $321\sqrt{n/p}$ operations to test all the SPSF's and DPSF's. Algorithm 1 tests only the pattern-sensitive faults due to transition writes. Usually, in DRAMs using three or more transistors per memory cell, a read operation does not destroy the content of the cell; such a read fault that depends on the pattern of stored data in the Type-1 neighborhood will be detected by the parallel comparator. Pattern-sensitive faults due to destructive-read operations in a single-cell, high-density memory cannot be tested by Algorithm 1 and will be discussed later.
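To make the operation count concrete, the Python sketch below walks a small memory through the flow just described: initialize to 0, then, for each of the 160 steps of an Eulerian tour over the 5-cube, perform one parallel write per word line on the cells numbered p and one read per word line. It is a behavioral model under stated assumptions, not Algorithm 1 itself: the read step compares every cell with its expected value in place of the comparator hardware, and `spsf_at_3_3` is a hypothetical injected fault.

```python
import math

def euler_tour(k, start=0):
    """Eulerian circuit over the symmetric k-cube (Hierholzer's algorithm)."""
    unused = {v: list(range(k)) for v in range(2 ** k)}
    stack, tour = [start], []
    while stack:
        v = stack[-1]
        if unused[v]:
            stack.append(v ^ (1 << unused[v].pop()))
        else:
            tour.append(stack.pop())
    return tour[::-1]

def run_type1_test(b, w, blocked_write=None):
    """Behavioral sketch of the Type-1 test flow; returns (operations, fault_detected).

    mem[j][i] is cell C_ij. blocked_write(mem, i, j, value) is a hypothetical
    fault-injection hook that returns True when a write to C_ij fails.
    """
    number = [[(2 * j + i) % 5 for i in range(b)] for j in range(w)]
    mem = [[0] * b for _ in range(w)]
    ops, detected = w, False                  # initialization: one write per word line
    tour = euler_tour(5)
    for prev, state in zip(tour, tour[1:]):
        p = (prev ^ state).bit_length() - 1   # cell number written by this operation
        value = (state >> p) & 1              # 1 for an up write, 0 for a down write
        for j in range(w):                    # one parallel write per word line
            for i in range(b):
                if number[j][i] == p and not (blocked_write and blocked_write(mem, i, j, value)):
                    mem[j][i] = value
            ops += 1
        for j in range(w):                    # one parallel read and compare per word line
            for i in range(b):
                if mem[j][i] != (state >> number[j][i]) & 1:
                    detected = True
            ops += 1
    return ops, detected

def spsf_at_3_3(mem, i, j, value):
    """Hypothetical SPSF: an up write on C_33 fails when its four neighbors all hold 1."""
    if (i, j) != (3, 3) or value != 1 or mem[3][3] != 0:
        return False
    return all(mem[y][x] == 1 for x, y in [(2, 3), (4, 3), (3, 2), (3, 4)])

def type1_operations(n, p):
    """321 * sqrt(n / p) for an n-bit RAM with p square submatrices."""
    w = math.isqrt(n // p)
    if p * w * w != n:
        raise ValueError("submatrices must be square")
    return 321 * w

print(run_type1_test(10, 10))                # (3210, False): 321w operations
print(run_type1_test(10, 10, spsf_at_3_3))   # (3210, True)
print(type1_operations(2 ** 20, 16))         # 82176 for an illustrative 1-Mbit, p = 16 RAM
```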
1) Computation of Effective Size of the Neighborhood:
The cell-number assignment allows one to describe a number of memory-plane tessellations, as shown in Fig. 9. The neighborhoods in these tessellations have the corresponding tile shapes. It can be seen that, by linear translations of these tile geometries, the memory plane can be covered without overlapping (i.e., tessellated). The pattern-sensitive fault test described in Algorithm 1 holds over each of these neighborhoods. The effective physical size of the neighborhood can be estimated by superimposing these tile geometries. But it is necessary to invoke the constraint that all cells numbered i ($0 \le i \le 4$) are mutually noninteractive and consistent, in the sense that a transition write on a cell numbered i neither affects the content of another cell numbered i in the same neighborhood, nor masks the coupling effect between any other coupling cell i and a coupled cell $j \ne i$, both in the same neighborhood.
B. Algorithm for Testing Symmetric PSF in Type-2 Neighborhood
In order to detect all the static pattern-sensitive faults over the Moore neighborhood, every cell in the memory should make both ↑ and ↓ transitions in the presence of all $2^8 = 256$ patterns in its eight neighbors. Thus, overall, 512 transition writes are necessary for each cell in the memory to detect the SPSF's. In order to detect all the DPSF's, each base cell should be checked by a read operation whenever a transition write is made on a cell in its neighborhood. Since each neighbor can make two types of transition writes for all the possible binary patterns in the other eight cells, and there are altogether eight neighbors of a base cell, there are $2 \times 2^8 \times 8 = 4096$ transition writes in the neighborhood of a cell to test all the DPSF's. This requires a large amount of test time.
In this section, a new approach is adopted in which the cells in the 9-neighborhood are categorized into four logical groups, viz., the base cell, the bit-line neighbors ($N_b$), the word-line neighbors ($N_w$), and the diagonal neighbors ($N_d$), as described in Definition 1. A pattern-sensitive fault models the adjacency effect between a base cell and its physically neighboring cells. It is typically due to leakage between a memory cell and its adjoining cells in the presence of a particular data pattern in the neighboring cells [20]–[22]. It has been found that the leakage is maximum when the symmetrically located cells contain the same bit patterns [21]. With the above classification, many unnecessary binary combinations are avoided. For example, consider a read operation made to verify a transition write ↑ on the base cell while its bit-line neighbors $C_{i,j+1}$ and $C_{i,j-1}$ contain 0.
Clearly, the bit line i will first be precharged to some high potential. Now, if any of the access transistors of the bit-line neighbors is weak, then, with 0's in the bit-line neighbors, the precharge level of the bit line will be degraded, and the sense amplifier on the $i$th bit line will fail to detect a 1 in the base cell. Similarly, if there is a weak transistor in cell $C_{i+1,j}$ that does not allow the base cell to make a ↑ transition because $C_{i+1,j}$ is in state 0, then it is enough to test the fault when the symmetrically located cell $C_{i-1,j}$ is also in state 0, since the leakage effect will then be predominant. Thus, if a fault does not occur when both the symmetrically located cells are in state 0, it is unlikely to occur when only one of them is.
Utilizing this notation, the SSPSF's and SDPSF's can be represented by Table III. The four-tuples are as described in Definitions 5 and 6. Since there are 16 SSPSF's and 48 SDPSF's associated with each cell in the memory, at least $4 \times 2^4 = 64$ transition writes in the neighborhood will be necessary to sensitize all the SSPSF's and SDPSF's of a cell in a DRAM. A straightforward extension to an n-cell memory will require $64n$ transition writes. However, this section shows that by combining the transition-write sequences in overlapping neighborhoods, the total number of transition writes can be reduced to $16n$.
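The counts quoted in this subsection follow from simple products. Assuming, as the four-group classification implies, that an SSPSF is fixed by the transition direction of the base cell and the contents of $N_b$, $N_w$, and $N_d$, and an SDPSF by the transiting group, its direction, and the contents of the remaining three groups:

$$
\begin{aligned}
\text{Moore neighborhood:}\quad & 2 \times 2^{8} = 512 \ \text{(SPSF)}, \qquad 2 \times 2^{8} \times 8 = 4096 \ \text{(DPSF)} \\
\text{SSPSF:}\quad & 2 \times 2^{3} = 16 \quad (\text{direction} \times \text{contents of } N_b, N_w, N_d) \\
\text{SDPSF:}\quad & 3 \times 2 \times 2^{3} = 48 \quad (\text{transiting group} \times \text{direction} \times \text{other three groups}) \\
\text{per cell:}\quad & 16 + 48 = 64 = 4 \times 2^{4}, \qquad \text{whole memory: } 64n / 4 = 16n
\end{aligned}
$$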
To achieve the minimal number of transition writes per memory cell, each cell $C_{ij}$ in the memory is first assigned a nonnegative number $k \in \{0, 1, 2, 3\}$ such that $k = 2(i \bmod 2) + (j \bmod 2)$. Thus, the memory cells are divided into four types, 0, 1, 2, and 3, as shown in Fig. 10. It will be shown later that a transition write on a cell, followed by a suitable sequence of read operations on the adjoining cells, simultaneously sensitizes four pattern-sensitive faults; thereby the number of transition writes on each cell can be reduced by a factor of 4.
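The XOR structure of this numbering is what lets one write serve four neighborhoods at once. The Python sketch below checks it, taking the bit-line neighbors of $C_{ij}$ as $C_{i,j\pm1}$ and the word-line neighbors as $C_{i\pm1,j}$, as in the examples above; every group contains cells of a single number, and the four groups carry four different numbers. Function names are hypothetical.

```python
def type2_number(i: int, j: int) -> int:
    """Type-2 number of cell C_ij: k = 2(i mod 2) + (j mod 2)."""
    return 2 * (i % 2) + (j % 2)

def neighbor_groups(i: int, j: int):
    """The four logical groups of the 9-neighborhood of C_ij (bit line i, word line j)."""
    return {
        "base": [(i, j)],
        "N_b": [(i, j - 1), (i, j + 1)],          # same bit line, adjacent word lines
        "N_w": [(i - 1, j), (i + 1, j)],          # same word line, adjacent bit lines
        "N_d": [(i - 1, j - 1), (i - 1, j + 1), (i + 1, j - 1), (i + 1, j + 1)],
    }

def group_numbers(i: int, j: int):
    """Set of Type-2 numbers found in each group; each set should be a singleton."""
    return {name: {type2_number(x, y) for x, y in cells}
            for name, cells in neighbor_groups(i, j).items()}

print(group_numbers(2, 2))   # {'base': {0}, 'N_b': {1}, 'N_w': {2}, 'N_d': {3}}
print(group_numbers(3, 2))   # {'base': {2}, 'N_b': {3}, 'N_w': {0}, 'N_d': {1}}
```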
To obtain a test procedure that needs only 16 transition writes per cell, a graph-theoretic approach similar to that of the earlier algorithm can be used. The 4-tuple $(s_{ij}, s(N_b), s(N_w), s(N_d))$ describes the state of the Type-2 neighborhood and can be read as a binary number $k \in \{0, 1, \ldots, 15\}$, with $s_{ij}$ as the most significant bit. A transition write changes the state from k to $k + 2^p$ if the $p$th bit changes from 0 to 1, and from k to $k - 2^p$ if the $p$th bit changes from 1 to 0. All these transitions from one state to another represent edges in the state-space graph, and the resulting graph describes a symmetric four-dimensional cube, as shown in Fig. 11. Clearly, node 12 in the figure represents the state in which the base cell and its bit-line neighbors contain 1, and the word-line and diagonal neighbors of the base cell contain 0. By making a ↑ operation on the word-line neighbors $N_w$, the content of the Type-2 neighborhood changes and is represented by node 14 in Fig. 11. A directed edge from node 12 to node 14 represents this transition-write operation. In general, each directed edge corresponds to a transition write on all the cells having the same number $k \in \{0, 1, 2, 3\}$ in a neighborhood, and since there are always two antiparallel directed edges, these two edges are represented as a single undirected edge in Fig. 11. The set of thick edges corresponds to changing the state of the base cell and pertains to sensitizing the SSPSF's; the other edges pertain to sensitizing the SDPSF's. Similar to Algorithm 1, an algorithm with optimal transition writes can be obtained by deriving the write sequences from an Eulerian tour over the symmetric 4-cube.
The resulting algorithm has $97w$ complexity, involving $32w$ transition writes, $64w$ reads, and $w$ write operations for memory initialization. Even though 64 transition writes over the symmetric 4-cube are needed to sensitize all the SSPSF's and SDPSF's, only $32w$ transition writes are needed over the entire memory, because the cells with a given number lie on only every other word line, so each of the 64 operations needs only $w/2$ parallel writes. Even though the test algorithm derived from an Eulerian tour has an optimal test length, for many applications, such as embedded environments, it is desirable to trade off test length against built-in self-test (BIST) hardware. It is shown in [23] and [24] that the test generator circuit can be simplified considerably if the sequence of transition writes is derived by decomposing an Eulerian tour over the symmetric 4-cube into the following eight disjoint Hamiltonian cyclic tours described over a subgraph of the symmetric 4-cube:
H1: (0, 2, 6, 14, 15, 13, 9, 1, 0)
H2: (0, 1, 9, 13, 15, 14, 6, 2, 0)
H3: (0, 4, 6, 7, 15, 11, 9, 8, 0)
H4: (0, 8, 9, 11, 15, 7, 6, 4, 0)
H5: (12, 4, 5, 7, 3, 11, 10, 8, 12)
H6: (12, 8, 10, 11, 3, 7, 5, 4, 12)
H7: (12, 13, 5, 1, 3, 2, 10, 14, 12)
H8: (12, 14, 10, 2, 3, 1, 5, 13, 12)
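The claimed decomposition can be verified by listing the directed edges of H1–H8. The Python sketch below (names hypothetical) checks that every step flips exactly one bit and that the eight cycles together use each of the 64 directed edges of the symmetric 4-cube exactly once; it also reproduces the $97w$ count, assuming w is even so that each cell number occupies exactly $w/2$ word lines.

```python
HAMILTONIAN_CYCLES = [
    (0, 2, 6, 14, 15, 13, 9, 1, 0),      # H1
    (0, 1, 9, 13, 15, 14, 6, 2, 0),      # H2
    (0, 4, 6, 7, 15, 11, 9, 8, 0),       # H3
    (0, 8, 9, 11, 15, 7, 6, 4, 0),       # H4
    (12, 4, 5, 7, 3, 11, 10, 8, 12),     # H5
    (12, 8, 10, 11, 3, 7, 5, 4, 12),     # H6
    (12, 13, 5, 1, 3, 2, 10, 14, 12),    # H7
    (12, 14, 10, 2, 3, 1, 5, 13, 12),    # H8
]

def covers_symmetric_cube(cycles, k=4):
    """True if the cycles use every directed edge of the symmetric k-cube exactly once
    and every step is a single transition write (Hamming distance 1)."""
    steps = [step for c in cycles for step in zip(c, c[1:])]
    if any(bin(a ^ b).count("1") != 1 for a, b in steps):
        return False
    all_edges = {(v, v ^ (1 << bit)) for v in range(2 ** k) for bit in range(k)}
    return len(steps) == len(all_edges) and set(steps) == all_edges

def type2_operations(w):
    """64 operations x (w/2 word lines per cell number) + 64 full reads + initialization."""
    if w % 2:
        raise ValueError("w is assumed even")
    return 64 * (w // 2) + 64 * w + w

print(covers_symmetric_cube(HAMILTONIAN_CYCLES))   # True
print(type2_operations(256) == 97 * 256)           # True
```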
These tours are shown in Fig. 12, where each set of dark edges constitutes two Hamiltonian cycles, one clockwise and the other anticlockwise. Thus, Fig. 12(a) represents the Hamiltonian cycles H1 and H2, Fig. 12(b) the cycles H3 and H4, Fig. 12(c) the cycles H5 and H6, and, finally, Fig. 12(d) the cycles H7 and H8. The length of each Hamiltonian cycle is eight, and altogether eight Hamiltonian cyclic tours are made. Thus, all 64 directed edges of the symmetric 4-cube are traversed, and thereby all 16 SSPSF's and 48 SDPSF's are sensitized. First, the memory is initialized to zero and the first four Hamiltonian cycles are performed as indicated above. Then the memory is reinitialized to the pattern of node 12, and the remaining Hamiltonian tours are performed. Thus, the overall tour can be represented as (H1, H2, H3, H4, (0, 12), H5, H6, H7, H8), and it consists of 65 transition-write operations. Initially, all the cells in the Type-2 neighborhood contain 0, and the transition-write operations successively change the state of the neighborhood. Thus, after the first operation the state of the neighborhood changes to 2, after the second operation it changes to 6, and so on. In general, operation m changes the state $S_{m-1}$ to $S_m$, where $S_m \in \{0, 1, \ldots, 15\}$ denotes the state of the neighborhood after the $m$th operation is applied. After each operation, the whole memory is read to find out whether any SSPSF or SDPSF has occurred. The above test procedure is described in Algorithm 2, which uses these write sequences to test the SSPSF's and SDPSF's for every cell in the memory over its 9-neighborhood.
In Algorithm 2, after all cells numbered $k \in \{0, 1, 2, 3\}$ make an upward (↑) or a downward (↓) write, all the cells are read; if any SSPSF occurs, it is detected by the parallel comparator and the error latch. Also, because of the neighborhood relationship in Fig. 10, every transition write on cells numbered k also sensitizes the SDPSF's for the other three cell types whose number is not k. For example, in the third operation of H1, the state of the Type-2 neighborhood changes from 6 to 14 by writing ↑ into the cells numbered 0, while the contents of the cells numbered 1, 2, and 3 remain 1, 1, and 0, respectively. The succeeding read operation in line 4 of Algorithm 2 detects an SSPSF in all neighborhoods whose base cell is numbered 0, and SDPSF's, caused by the transition in the cells numbered 0, in all other neighborhoods, whose base cell is not numbered 0. The effect of the third operation on the different neighborhoods is shown in Fig. 13. Thus, Algorithm 2, which makes all 65 transition writes over the entire memory, sensitizes both the SSPSF's and the SDPSF's for every cell in the memory.
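Because the numbering $k = 2(i \bmod 2) + (j \bmod 2)$ makes the role of a cell in a neighborhood the XOR of its number with the base-cell number, one operation moves the four kinds of neighborhoods through different states. The Python sketch below replays the third operation of H1; the function names and the base-as-MSB bit order are our assumptions, chosen to be consistent with nodes 6, 12, and 14 above.

```python
ROLES = ("base", "N_b", "N_w", "N_d")

def role_of(written_number: int, base_number: int) -> str:
    """Role of the cells numbered `written_number` in a neighborhood whose base
    cell is numbered `base_number` (the numbering makes this an XOR)."""
    return ROLES[written_number ^ base_number]

def neighborhood_state(values_by_number, base_number: int) -> int:
    """4-bit state (s_base, s(N_b), s(N_w), s(N_d)), base as MSB, given the value
    currently held by all cells of each number 0..3."""
    state = 0
    for number, value in enumerate(values_by_number):
        state |= value << (3 - (number ^ base_number))
    return state

# Third operation of H1: an up write on the cells numbered 0,
# while cells numbered 1, 2, 3 hold 1, 1, 0.
before, after = (0, 1, 1, 0), (1, 1, 1, 0)
for base in range(4):
    print(base, role_of(0, base),
          neighborhood_state(before, base), "->", neighborhood_state(after, base))
```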
V. TEST PROCEDURES FOR OTHER FAULTS
Nontransition Write Operations: Algorithms 1 and 2 detect the pattern-sensitive faults in the memory due to transition writes. Since a nontransition write does not change the state of a cell, it is unlikely to cause a failure. In memories where nontransition writes are also faulty, the algorithms in the preceding section can easily be augmented by following each transition write with a nontransition write. After each nontransition write, the entire neighborhood is read to detect the occurrence of any pattern-sensitive faults.
Destructive Read Operations: In a switched-capacitor, single-transistor DRAM that employs a destructive read operation, a failure may occur during the precharge, sensing, or restoration phase of a read operation. Since in the proposed parallel testing scheme multiple cells are read and compared simultaneously, any failure in a cell that occurs during precharge and sensing by the sense amplifiers will automatically be detected by the parallel comparator. A fault that occurs during the restoration phase, when the original data are written back into the cell, will also be detected by Algorithms 1 and 2, provided that the faulty read operation is a pattern-independent fault (PIF). It can be shown that in these algorithms every memory cell is read two or more times in succession without any intervening write operation on it; therefore, a fault occurring in the restoration phase will be detected by Algorithms 1 and 2. However, to test the read faults in the restoration phase that depend only on a specific stored pattern in the neighborhood, every read operation in these algorithms should be followed by an extra read operation. Read operations, which may cause static and dynamic pattern-sensitive faults similar to those of transition writes, can also be tested by reading all the cells in the neighborhood after each read operation on a cell. The complexities of the tests for these pattern-sensitive faults over the Type-1 and Type-2 neighborhoods, for the different types of memory operations, are shown in Table IV. It should be emphasized that for a particular processing technology, certain operations are more likely to cause pattern-sensitive faults than others (e.g., a transition write may cause more pattern-sensitive faults than a read operation), and hence different test sizes are indicated here.
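Why two successive reads suffice for a pattern-independent restoration fault can be seen with a toy cell model. In the sketch below, sensing is assumed fault-free and a hypothetical restoration fault writes a wrong value back; the first read still returns the correct data, so only the second read, with no intervening write, exposes the fault. The class and parameter names are illustrative, not from the paper.

```python
class DRAMCell:
    """Toy one-transistor DRAM cell with a destructive read.

    Sensing is assumed fault-free; an optional, hypothetical restoration
    fault writes `bad_restore` back instead of the sensed value.
    """

    def __init__(self, value, bad_restore=None):
        self.value = value
        self.bad_restore = bad_restore

    def read(self):
        sensed = self.value  # precharge and sensing
        # The read drains the capacitor, so the restoration phase must
        # write the data back; a faulty restore writes the wrong value.
        self.value = sensed if self.bad_restore is None else self.bad_restore
        return sensed


def double_read_detects(cell, expected):
    """Read twice with no intervening write; a bad restore shows up on read 2."""
    return [cell.read(), cell.read()] != [expected, expected]


print(double_read_detects(DRAMCell(1), 1))                 # False
print(double_read_detects(DRAMCell(1, bad_restore=0), 1))  # True
```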
Decoder Faults: A fault-free decoder makes a bijective mapping of the input addresses onto the memory cells. A faulty decoder may cause three types of fault syndromes. First, an input address may not map onto any memory cell; the resulting failure is known as a no-access fault. Second, an input address may select multiple cells for reading or writing; the resulting failure is commonly known as a multiple-cells access fault. Third, a memory cell may be accessed by multiple addresses; the resulting failure is called a multiple-address access fault. The different decoder mappings are shown in Fig. 14. A no-access fault usually results in a stuck-at fault and will be detected by the pattern-sensitive algorithms discussed here, since a cell with a stuck-at fault manifests static pattern-sensitive faults for all the possible patterns in its neighborhood. Multiple-cells access and multiple-address access faults, however, cannot be detected by Algorithms 1 and 2. In the test mode, the bit-line decoders are grouped into g classes, and a multiple-access fault in the bit-line decoder will be masked if it results in accessing bit lines within the same class. Also, even though the word-line decoder is not modified, such faults in the word-line decoder will not be detected by Algorithms 1 and 2, because in these algorithms the cells having an identical number are first written on all word lines before the individual word-line write operations are verified. To test these faults, a marching-type test procedure as shown in Algorithm 3 is needed. Steps 1-4 of the algorithm detect the multiple-access faults in a bit-line decoder by using 2 + 4& operations. Steps 5-8 involve 6& operations and are used to detect multiple-access faults in a word-line decoder. In the fault model, multiple access is assumed to be possibly asymmetric: selecting a line r may also access a line s, while selecting the line s does not necessarily also access the line r.
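The kind of march that exposes asymmetric multiple-access faults can be sketched with the well-known MATS+ sequence {w0; ascending (r0, w1); descending (r1, w0)} applied through a decoder model. This is a generic march, not the paper's Algorithm 3, whose steps are not reproduced here. The decoder model and the wired-OR read are assumptions: writes go to every accessed line, and a read returns the OR of the accessed lines.

```python
def march_detects(decoder, n):
    """Run MATS+ (w0; up r0,w1; down r1,w0) through a decoder model.

    decoder(a) -> list of line indices actually accessed for address a.
    Assumptions (not from the paper): writes go to every accessed line,
    and a read returns the OR of the accessed lines (wired-OR bit line).
    Returns True if any read mismatches its expected value.
    """
    lines = [0] * n

    def write(a, v):
        for i in decoder(a):
            lines[i] = v

    def read(a):
        return int(any(lines[i] for i in decoder(a)))

    for a in range(n):
        write(a, 0)
    for a in range(n):               # ascending: read 0, write 1
        if read(a) != 0:
            return True
        write(a, 1)
    for a in reversed(range(n)):     # descending: read 1, write 0
        if read(a) != 1:
            return True
        write(a, 0)
    return False


def multiple_access(selected, extra):
    """Asymmetric fault: selecting `selected` also accesses `extra`, not vice versa."""
    return lambda a: [a, extra] if a == selected else [a]


print(march_detects(lambda a: [a], 8))        # False: fault-free decoder
print(march_detects(multiple_access(2, 5), 8))  # True
print(march_detects(multiple_access(5, 2), 8))  # True
```

Running the second element in ascending and the third in descending order is what catches the asymmetric case in both directions: whichever of the two lines is visited later finds a value it should not hold.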
Faults in Parallel Comparator:
It may be noted that the testable hardware in Fig. 4 is only partially tested by Algorithm 2. When all the even or all the odd lines contain 0, the n-type transistors P0, ..., Pm−1 are tested for stuck-at 1 faults. When all the even or all the odd lines contain 1, the p-type transistors T0, ..., Tm−1 are tested for stuck-at 0 faults. The remaining stuck-at faults in the testable hardware can be tested by Algorithm 4. Lines 1 and 2 initialize the memory, and lines 3 and 4, which verify this initialization, redundantly detect the stuck-at 1 faults in the n-type transistors and the stuck-at 0 faults in the p-type transistors. Line 5 verifies the stuck-at 0 faults in the p-type transistors, and line 6 verifies the stuck-at 1 faults in the n-type transistors. The overall complexity of Algorithm 4 is 4& + 4.
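At the behavioral level, the comparator exercised by these tests only decides whether the compared lines agree. The sketch below models that logic function (it deliberately abstracts away the transistor-level circuit of Fig. 4 and its stuck-at faults) together with an extra comparison against the expected data through the data-out buffer; the function names are hypothetical.

```python
def comparator_error(bits):
    """Behavioral model of the parallel comparator: flag mixed 0's and 1's."""
    ones = sum(bits)
    return 0 < ones < len(bits)


def data_out_mismatch(bits, expected):
    """Extra check: compare one line (via the data-out buffer) with the expected data."""
    return bits[0] != expected


# Expected data on every compared line: 1.
print(comparator_error([1, 1, 0, 1]))      # True: a single faulty cell
print(comparator_error([0, 0, 0, 0]))      # False: a reversal of data goes unnoticed
print(data_out_mismatch([0, 0, 0, 0], 1))  # True: caught by the data-out comparison
```

Because the comparator sees only agreement, a uniform reversal of all compared cells passes it; a single comparison with the expected value closes that gap at negligible cost.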
It may be noted that the parallel comparator does not distinguish between all-0 and all-1 data, so the error detector may fail to indicate a reversal of data. Such a fault can be detected by comparing the value in the data-out buffer with the expected data.
VI. CONCLUSIONS
This paper has discussed an efficient technique to speed up RAM test algorithms. Specifically, it has proposed two test algorithms for pattern-sensitive faults in a RAM. Algorithm 1, requiring 32 1 -operations, detects all the static and dynamic pattern-sensitive faults in the memory over a localized neighborhood of five cells, called here the Type-1 neighborhood. By identifying other neighborhood geometries similar to pentomino tiles [25], [26], the algorithm has been found to also cover a restricted type of pattern-sensitive fault over 25 cells. The basic algorithms test the pattern-sensitive faults that occur due to transition writes; they have been extended to pattern-sensitive faults due to nontransition writes and read operations, in DRAMs with both destructive and nondestructive reads. Algorithm 1 derives its transition-write sequence from an Eulerian tour over the symmetric 5-cube, and it employs an optimal number of transition writes. In contrast, Algorithm 2, which tests symmetric pattern-sensitive faults over the Type-2 neighborhood of nine cells, derives its transition-write sequence from eight disjoint Hamiltonian cycles over subgraphs of the symmetric 4-cube. For an n-bit RAM organized into p submatrices, the resulting algorithm has a complexity of 98-operations, using an extra -transition writes. For a 4 Mbit memory organized into 16 square subarrays with a 50 ns access time, this needs an extra 25 µs compared with a similar test procedure in which the transition writes are derived from an Eulerian walk, as in Algorithm 1.
However, this simplifies the test generator circuit considerably, and the scheme is useful for built-in self-test applications.
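For comparison with the Hamiltonian-cycle construction, an Eulerian tour over a symmetric d-cube can be generated with Hierholzer's algorithm: every node has in-degree equal to out-degree (both d), so a closed walk using each directed edge exactly once exists, which is what makes the number of transition writes optimal. The sketch below is a generic Hierholzer implementation, not the paper's test generator; it labels each write by the flipped bit position rather than by the cell numbering.

```python
def eulerian_writes(dim):
    """Return a closed tour of transition writes over the symmetric dim-cube.

    Nodes are dim-bit neighborhood states; each directed edge s -> s ^ (1 << b)
    is one transition write on bit b. In-degree = out-degree = dim at every
    node, so Hierholzer's algorithm finds an Eulerian circuit.
    Returns the flipped bit positions, starting and ending at state 0.
    """
    unused = {s: list(range(dim)) for s in range(1 << dim)}
    stack = [(0, None)]  # (state, bit flipped to reach it)
    writes = []
    while stack:
        state, via = stack[-1]
        if unused[state]:
            b = unused[state].pop()
            stack.append((state ^ (1 << b), b))
        else:
            stack.pop()
            if via is not None:
                writes.append(via)  # edges are collected in reverse order
    writes.reverse()
    return writes


print(len(eulerian_writes(5)))  # 160: one write per directed edge of the 5-cube
```

The Eulerian tour is shorter by construction, but its edge order depends on the search; fixed Hamiltonian cycles trade a few extra writes for a sequence that is simpler to generate in hardware.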
Even though only the testing of pattern-sensitive faults has been discussed in this paper, the proposed design-for-testability technique can readily be used to speed up the conventional algorithms that test stuck-at and 2-coupling faults. Classical tests such as the column bar and the checkerboard detect stuck-at faults in memory arrays using 4n operations. Marching tests detect coupling faults between two arbitrary cells in the memory, and several versions of them can be found in the literature. Nair, Thatte, and Abraham [27] originally proposed a linear marching test algorithm with a complexity of 30n operations. Suk and Reddy [13] improved the complexity to 14n operations by slightly modifying the fault model, and Marinescu [28] improved the complexity to n operations for a more restricted coupling fault model. All these algorithms can be sped up by a factor of O(m), but because of the parallel operations some of the coupling faults will be masked. The parallel algorithms for stuck-at and pattern-sensitive faults, however, do not mask any fault.
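The masking effect can be illustrated with a single ascending sweep of parallel ↑ writes on one word line, in which the cells i with i mod g = j form class j and are written together. The coupling fault is hypothetical and chosen so that the outcome is unambiguous: an ↑ transition in the aggressor forces the victim to 1. When the victim shares the aggressor's class, it is being written to 1 in the same operation, so the fault is masked; otherwise the next read sees it. This is a fragment for illustration, not a complete parallel march.

```python
def parallel_sweep_detects(n_cells, g, aggressor, victim):
    """One ascending sweep of parallel up-writes with a hypothetical coupling
    fault: an up transition on `aggressor` forces `victim` to 1.

    Cells i with i % g == j form class j and are written simultaneously.
    Returns True if a read after some class write sees an unexpected value.
    """
    mem = [0] * n_cells
    expected = [0] * n_cells
    for j in range(g):
        members = [i for i in range(n_cells) if i % g == j]
        rising = aggressor in members and mem[aggressor] == 0
        for i in members:            # parallel write of 1 to class j
            mem[i] = 1
            expected[i] = 1
        if rising:
            mem[victim] = 1          # effect of the coupling fault
        if mem != expected:          # read back and compare
            return True
    return False


print(parallel_sweep_detects(8, 2, aggressor=0, victim=2))  # False: masked
print(parallel_sweep_detects(8, 2, aggressor=0, victim=3))  # True: detected
```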
The proposed implementation scheme for parallel testing uses minimal extra hardware. The parallel comparator consists of 2b + 12 transistors, and the extra hardware in the modified decoder is 2 log2 b transistors. Thus, the overall extra hardware is only 2b + 2 log2 b + 11 transistors, and the overall chip area expansion is only 0.4 percent for a 256 Kbit DRAM. The proposed technique needs only one transistor to fit within the pitch width, and it fits easily even in the vertically integrated, single-cell DRAM design with a trench-type capacitor having an intercell pitch width of 3X [29].
