We present a design method, called the STD architecture, for large memories whose test time does not increase with memory size. A large memory is constructed from several small memory blocks. The memory address decoder is divided into two or more levels and designed such that, during the test mode, all small memory blocks are accessed together. With
Memory is a very important part of a computer system. A significant amount of work has been done in recent years to obtain fast and very large memory systems. As a result, the density of semiconductor memory chips has increased dramatically [1]. With this increasing complexity, it has been recognized that efficient testing of such memories is a difficult problem. A multi-megabit random access memory requires an excessively long time just to test all cells for stuck-at faults. To overcome this problem, researchers have sought to develop innovative test generation algorithms and on-chip built-in self-test methods.
Several innovative test algorithms for random access memories have been reported [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17]. These algorithms fall into two classes. The first class is based on the memory fault model given by Nair, Thatte and Abraham [3]; the representative papers are [3] [4] [5] [6] [7] [8] [9] [10] [11]. The second class is based on the pattern-sensitive neighborhood cell fault model of Hayes [12] [13]; the representative papers are [12] [13] [14] [15] [16] [17]. The best known algorithms in both classes are polynomial in time.
Because the complexity of memories is quadrupling every 2-3 years, even a linear increase in test time becomes undesirable for large memories [1].
To overcome the problem of large test time, built-in self-test (BIST) methods have been developed.
One set of papers uses extra hardware for on-chip test generation and response evaluation (using a parallel signature analyzer) [18] [19] [20] [21] [22]. The second approach [23] [24] [25] [26] [27] [28] [29] [30] uses extra hardware to partition the whole memory into small blocks and test them in parallel (using external test generation). Jarwala and Pradhan [25] showed that partitioning methods can achieve a significant saving in test time for large memories. However, Jarwala and Pradhan [25] have pointed out that if a 1M-bit memory is parti- [27] [28] . This partitioning method does not need large hardware overhead; it requires only a modified decoder for the most significant address lines to ensure testability. As will be demonstrated, the proposed decoder modification does not cause large hardware overhead. The basic difference between the proposed scheme and that of Jarwala and Pradhan [25] is that the latter partitions the memory with extra hardware (forming an H-tree) after the design, and access to the partitioned blocks is obtained through that extra hardware. In the proposed STD architecture, partitioning is achieved through the memory address decoder.
Below, we use four examples to illustrate the proposed scheme. These examples cover both 1-bit and m-bit word size memories, and allow the memory size to increase by a factor of 2^x. To express the memory size, we use the notation n × m, where n represents the number of words and m represents the word size. For example, 16K × 1 represents 16K bits and 16K × 8 represents 16K bytes.
Example 1: 8K × 1 Memory
This memory requires a 13-to-8K decoder, which can be implemented at two levels using 10-to-1K decoders and a 3-to-8 decoder. Thus, the memory can be built from eight blocks of 1K bits (each associated with a 10-to-1K decoder) and a 3-to-8 decoder. The design is made testable by modifying the 3-to-8 decoder, which is driven by the most significant address lines A10-A12. The modification consists of adding one extra control signal to the decoder. To illustrate the concept, the design of a 2-to-4 decoder is given in Figure 1. In addition, a parity circuit is added at the outputs of the 1K-bit blocks. The testable design is given in Figure 2. The control signal added to the 3-to-8 decoder makes it possible to select all of the decoder output lines when the control signal C = 1 (this is done during the test mode). When C = 0, the decoder is in its normal mode and selects only one of its outputs.
To test this memory, the control signal C is kept at '1'. Thus, the same data read/write operations can be applied to all eight memory blocks using address lines A0-A9. In this mode, all eight blocks are tested in parallel. It should be noted that in case of a fault in any block, the output of the parity circuit would be '1' and hence the fault is detected. Using the algorithm given in [5], all eight blocks can be tested by 9K read/write operations.
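The parallel test can be illustrated with a small simulation. This sketch assumes a simple write-0/read, write-1/read sequence rather than the algorithm of [5], and models the parity circuit as the XOR of the eight block outputs; the class `BlockedMemory`, its `stuck` dictionary, and the function `parallel_stuck_at_test` are hypothetical names introduced here.

```python
from functools import reduce
from operator import xor


class BlockedMemory:
    """Hypothetical model: n_blocks blocks of block_size one-bit cells."""

    def __init__(self, n_blocks=8, block_size=1024):
        self.block_size = block_size
        self.cells = [[0] * block_size for _ in range(n_blocks)]
        # (block, offset) -> forced value; models a cell stuck-at fault.
        self.stuck = {}

    def write_all(self, offset, bit):
        # Test mode (C = 1): all blocks are selected, so one write reaches all.
        for block in self.cells:
            block[offset] = bit

    def read_parity(self, offset):
        # Test mode read: XOR of the bits driven by all blocks at this offset.
        bits = [self.stuck.get((b, offset), block[offset])
                for b, block in enumerate(self.cells)]
        return reduce(xor, bits)


def parallel_stuck_at_test(mem):
    """Return the sorted offsets at which the parity output is ever 1."""
    failing = set()
    for bit in (0, 1):
        for offset in range(mem.block_size):
            mem.write_all(offset, bit)
        for offset in range(mem.block_size):
            if mem.read_parity(offset) == 1:
                failing.add(offset)
    return sorted(failing)


mem = BlockedMemory()
mem.stuck[(3, 17)] = 0          # block 3, offset 17 stuck at 0
failing_offsets = parallel_stuck_at_test(mem)
```

With an even number of blocks holding identical data, the fault-free parity is 0 for both data values, so a single faulty block flips it to 1. One limitation of plain XOR parity, visible in this model, is that an even number of blocks failing at the same offset in the same way would cancel out.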
After the memory blocks have been tested, the control signal C is switched to '0', returning the 3-to-8 decoder to normal mode. In this mode, eight input combinations are needed to test the 3-to-8 decoder. It should be noted that if 1K × 1 blocks are tested by the algorithm given in [5] The memory size can be extended in this architecture with minimal effort. For example, suppose the design given in Example 1 must be extended to a 32K-bit memory. This can be done easily by using four blocks of 8K-bit memory, each equivalent to Example 1, and an additional 2-to-4 decoder. This 2-to-4 decoder is driven by the most significant address lines and hence is modified by a control signal. The control signal used in the 3-to-8 decoder in Fig. 3 can also be used in the 2-to-4 decoder, as shown in Figure 3. During the test mode, the four 8K × 1 blocks are selected using the control signal (C = 1), and all eight 1K × 1 blocks are selected within each 8K × 1 block.
Therefore, by setting C = 1, all 32 blocks of 1K × 1 memory are selected. These 32 blocks are tested in parallel by 9K read/write operations using address lines A0-A9.
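The two-level block selection of the 32K × 1 design can be sketched as below. The split of the 15-bit address into A0-A9 (cell within a block), A10-A12 (3-to-8 decoder), and A13-A14 (2-to-4 decoder) is an assumption that follows from the sizes stated above; the function names and the block numbering are hypothetical.

```python
def decoder_lines(value, n_inputs, c):
    """Behavioral decoder with test control: all lines high when c == 1."""
    return [1 if (c == 1 or i == value) else 0 for i in range(1 << n_inputs)]


def selected_blocks(address, c):
    """Return (offset, list of selected 1K x 1 block indices) for 32K x 1.

    A block is selected only when both its 2-to-4 line (which 8K group)
    and its 3-to-8 line (which 1K block inside the group) are high.
    """
    offset = address & 0x3FF        # A0-A9: cell inside a 1K block
    mid = (address >> 10) & 0x7     # A10-A12: input of the 3-to-8 decoders
    top = (address >> 13) & 0x3     # A13-A14: input of the 2-to-4 decoder
    top_lines = decoder_lines(top, 2, c)
    mid_lines = decoder_lines(mid, 3, c)
    blocks = [group * 8 + block
              for group in range(4)
              for block in range(8)
              if top_lines[group] and mid_lines[block]]
    return offset, blocks


address = (2 << 13) | (5 << 10) | 3
normal_offset, normal_blocks = selected_blocks(address, c=0)
test_offset, test_blocks = selected_blocks(address, c=1)
```

In normal mode the address reaches exactly one block; with the shared control signal at 1, the same A0-A9 offset reaches all 32 blocks, so the number of read/write operations depends only on the 1K block size.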
After the memory blocks have been tested, the control signal is switched to '0'. Under this condition, the 2-to-4 decoder and four 3-to-8 decoders are tested by 32 Figure 5 ). It should be noted that, equivalently, a 256K × 1 memory can be designed with four additional 3-to-8 decoders, and can be tested by (72K + 48) read/write operations.
The above examples show that memories of various sizes and word lengths can be designed such that the test time remains constant in all cases. From Example 1, with a total capacity of 8K bits, to Example 2, with a total capacity of 256K bits, the test time is constant (approximately 9K vectors).
FAULT DIAGNOSIS AND RECONFIGURATION
With slight modifications to the designs given in the preceding section, the memory can be designed for fault diagnosis. The basic idea is to use a register instead of a parity circuit to obtain better observability. Consider the design of the 8K × 1 memory given in Example 1. The modified design is shown in Figure 6. The design given in Figure 6 is basically the same as given in Figure 2, Figure 7 . It should be noted that in Figure 7, we measure the overhead in the number of gates. Although this measure is not very accurate, we feel that it is the best representation, because the percentage overhead is negligible. It should be noted that the routing overhead for the control signal is very small in the proposed STD architecture, as explained in Section 4. The area overhead due to the additional transistors associated with the control signal inside the decoder is also extremely small if the decoder is designed with complex gates, as shown in Figure 1. Partitioning a decoder and implementing it at two or more levels can in fact reduce the transistor count. This can be seen by considering a small example, a 4-to-16 decoder. A one-level implementation of a 4-to-16 decoder using 4-input gates requires 128 transistors (64 nMOS and 64 pMOS transistors). The same decoder can be implemented at two levels using five 2-to-4 decoders. In this case, the total number of transistors is 5 × 16 = 80. Partitioning the decoder in this manner also decreases the signal propagation delay [37], due to smaller capacitances. Therefore, such partitioning is desirable to improve performance.
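The transistor counts quoted for the 4-to-16 decoder can be reproduced with a short calculation. This sketch assumes, as the figures above imply, that each decoder output is one static CMOS gate with one nMOS and one pMOS transistor per input, and that input inverters and any extra control or enable transistors are not counted; the function names are hypothetical.

```python
def one_level_transistors(n_inputs):
    """Transistors in a one-level n-to-2^n decoder.

    Assumes 2**n_inputs gates, each an n-input static CMOS gate with
    n nMOS and n pMOS transistors (2 * n_inputs per gate).
    """
    gates = 1 << n_inputs
    return gates * 2 * n_inputs


def two_level_4_to_16_transistors():
    """Transistors in a 4-to-16 decoder built from five 2-to-4 decoders."""
    per_small_decoder = one_level_transistors(2)   # 4 gates x 4 transistors
    return 5 * per_small_decoder


one_level = one_level_transistors(4)         # 16 gates x 8 transistors
two_level = two_level_4_to_16_transistors()
saving = one_level - two_level
```

Under these counting assumptions the two-level form uses 48 fewer transistors than the one-level form, in line with the 128 and 80 figures in the text.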
