We describe a built-in test pattern generation method for scan circuits. The method is based on partitioning and storage of test sets. Under this method, a precomputed test set is partitioned into several sets containing values of different primary inputs or state variables. The on-chip test set is obtained by implementing the Cartesian product of the various sets. The sets are reduced as much as possible before they are stored on-chip in order to reduce the storage requirements and the test application time.
Introduction
Storage-based methods for built-in test pattern generation store certain data on-chip. For example, the data may consist of a test set that was generated off-chip (or parts of such a test set). Storage-based methods use the stored information as part of an on-chip test pattern generator (TPG) in order to achieve complete fault coverage. The simplest form of a storage-based TPG consists of a memory that stores the complete test set, and a counter that can go through the memory addresses where the test set is stored. In this way, a precomputed test set can be applied to the circuit in full. To reduce the size of the on-chip memory required, one of the following two general approaches can be used.
Encoding techniques [1] allow the complete test set to be stored in a compressed way.
Alternatively, on-chip computations [2] , [3] allow new test vectors to be obtained from existing ones. Under the method of [2] , the next test vector to be applied to the circuit is obtained from the test vector currently applied by complementing single bits. The implementation of [2] stores in an on-chip memory the description of the operations to be applied to each test vector in order to obtain the next one. This method was applied to combinational circuits. The method of [3] was designed for synchronous sequential circuits. It is based on storage of short input sequences that are expanded on-chip into test sequences.
Recently [4] , we proposed a new approach to storage-based test-pattern generation for synchronous sequential circuits. This method uses a simpler test application scheme than [3] , reduces the overall amount of data that needs to be loaded to the chip (thus reducing the load time), and in several cases also reduces the memory requirements compared to [3] . The basic idea behind the method of [4] is demonstrated by the following example. Although the method of [4] was proposed for synchronous sequential circuits, we demonstrate its application to a combinational circuit, since this is closer to the application to scan circuits that we consider here.
Consider a test set T = {00000, 00111, 01000, 01110, 10110, 10111} precomputed off-chip for a 5-input combinational circuit. To store this test set on-chip, we need a memory of 6 · 5 = 30 bits. Let us partition the test set into two subsets T 1 and T 2 such that T 1 contains the values of the first two inputs, and T 2 contains the values of the last three inputs. Thus, the pattern 00000 contributes 00 to T 1 and 000 to T 2 , the pattern 00111 contributes 00 to T 1 and 111 to T 2 , and so on. We obtain T 1 = {00, 01, 10} and T 2 = {000, 110, 111}. To store T 1 we need 3 · 2 = 6 bits and to store T 2 we need 3 · 3 = 9 bits, for a total of 15 bits. The memory requirements are thus reduced to half by partitioning the test set. To apply the original test set to the circuit, we need to apply certain pairs t 1 t 2 simultaneously, where t 1 ∈ T 1 and t 2 ∈ T 2 . For example, we need to apply t 1 = 00 and t 2 = 000 to obtain the first vector 00000 in T , while the pair t 1 = 00 and t 2 = 110 yields 00110, which is not in T . However, storing the pairs that need to be applied to the circuit may be a space-consuming solution. Instead, we observe that T is contained in the Cartesian product T 1 ×T 2 of T 1 and T 2 .
The Cartesian product T 1 ×T 2 consists of every pair t 1 t 2 such that t 1 ∈ T 1 and t 2 ∈ T 2 . In the example, we obtain T 1 ×T 2 = {00000, 00110, 00111, 01000, 01110, 01111, 10000, 10110, 10111}, where the vectors 00000, 00111, 01000, 01110, 10110 and 10111 are in T . To implement the Cartesian product we only need two counters that go through all the elements of T 1 and T 2 . If the size of T is N , the sizes of T 1 and T 2 are at most N , and the size of the Cartesian product is at most N^2 . Two effects help keep the number of tests in the Cartesian product significantly lower than N^2 .
(1) The number of patterns in T 1 and T 2 is typically smaller than N . This is because some vectors in T contribute the same vector to T 1 or T 2 . For example, both 00000 and 00111 contribute 00 to T 1 in the example above. Thus, instead of six vectors we obtained three vectors in each one of T 1 and T 2 above.
(2) We apply a procedure where we omit as many patterns as possible from T 1 and T 2 . This serves to reduce the memory requirements, but also reduces the number of tests applied to the circuit. Since the Cartesian product contains more tests than T , it is typically possible to omit patterns from T 1 and T 2 and still obtain complete fault coverage when applying T 1 ×T 2 to the circuit. In the example above, if we omit the pattern 10 from T 1 , we obtain T 1 = {00, 01}, T 2 = {000, 110, 111} and T 1 ×T 2 = {00000, 00110, 00111, 01000, 01110, 01111}. If this test set detects all the circuit faults, it can replace the original test set. The memory requirements are reduced by two bits, and the number of tests applied to the circuit is reduced by three tests.
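The example above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `partition`, `cartesian_by_counters` and `bits` are ours, and the two-counter loop is an assumed software analogue of the on-chip counters, not a description of the actual hardware.

```python
def partition(tests, split):
    """Split every vector at bit position `split`; return the two value sets, sorted."""
    left = sorted({t[:split] for t in tests})
    right = sorted({t[split:] for t in tests})
    return left, right


def cartesian_by_counters(t1, t2):
    """Enumerate T1 x T2 the way two cascaded counters would (assumed scheme)."""
    out = []
    i = j = 0
    while t2 and i < len(t1):
        out.append(t1[i] + t2[j])
        j += 1
        if j == len(t2):  # the T2 counter wraps, so the T1 counter advances
            j = 0
            i += 1
    return out


def bits(vectors):
    """Number of memory bits needed to store a list of vectors."""
    return sum(len(v) for v in vectors)


T = ["00000", "00111", "01000", "01110", "10110", "10111"]
T1, T2 = partition(T, 2)
product = cartesian_by_counters(T1, T2)

print(T1, T2)                       # ['00', '01', '10'] ['000', '110', '111']
print(bits(T), bits(T1) + bits(T2)) # 30 15
print(set(T) <= set(product))       # True: T is contained in T1 x T2

# Omitting the pattern 10 from T1, as in the text
T1_reduced = ["00", "01"]
reduced = cartesian_by_counters(T1_reduced, T2)
print(len(reduced), bits(T1_reduced) + bits(T2))  # 6 13
```

Whether the reduced product can replace T depends on fault simulation of the actual circuit, which this sketch does not model.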
It is important to note that in most storage-based methods, including [2] , [3] , [4] and the method proposed here, encoding [1] can be used to further reduce the memory sizes required. It is also important to note that similar to [3] and [4] , the number of test vectors applied to the circuit under the method proposed here is larger than the number of vectors in the precomputed test set T . Consequently, improved defect coverages are likely to be obtained. We present experimental results to support this point by showing that the extra tests applied to the circuit are effective in detecting the circuit faults multiple times [5] , [6] .
Although the example above was given for a combinational circuit, we consider full-scan sequential circuits in this work. We assume that a test τ i for a full-scan circuit has the following components. (1) A scan-in vector SI i . The vector SI i is applied through the scan chain at the beginning of the test. (2) A primary input sequence T i consisting of one or more primary input vectors. The sequence T i is applied to the primary inputs after SI i is scanned in. While T i is applied, the flip-flops are driven from the next-state variables of the circuit (without using the scan chain). At the end of a test, the final state is scanned out. We use the notation τ i = (SI i ,T i ) for a scan-based test. For example, for a circuit with four state variables and two primary inputs, a possible test is (0000, (01, 10)), where 0000 is the initial state scanned in, and (01, 10) is the sequence applied to the primary inputs following the scan operation. The importance of applying input sequences T i of length larger than one is that it contributes to at-speed testing of the circuit [7] , [8] .
A test set for a scan circuit consists of tests τ 1 ,τ 2 , . . . ,τ N where τ i = (SI i ,T i ). In contrast, for a non-scan synchronous sequential circuit, a single test sequence T i is considered in [4] , and T i is significantly longer than the sequences T i included in scan-based tests. The difference in test set structure results in significant differences in the partitioning and test application schemes for the two types of circuits. For example, in [4] , T i is partitioned into equal length subsequences over which the Cartesian product is defined, whereas here, the subsequences T i are short enough to be kept intact.
Earlier works on built-in test generation for scan circuits using tests of the form (SI i ,T i ) were based on random patterns [9] - [11] . The methods of [9] and [10] do not achieve complete fault coverage, and none of the three methods guarantees that complete fault coverage will be achieved. Although the method of [11] achieves complete fault coverage for all the benchmark circuits considered in [11] , this is achieved at the cost of a more complex test application scheme. The method proposed here guarantees that the same fault coverage achieved by an off-chip test set will be achieved by the test set generated on-chip.
The paper is organized as follows. In Section 2 we describe the proposed procedure for partitioning a test set T = {τ 1 ,τ 2 , . . . ,τ N } of a scan circuit. We also describe the procedure for reducing the number of elements in the sets obtained after partitioning. This procedure reduces the storage requirements and the number of tests applied to the circuit while ensuring that the Cartesian product would detect all the faults detectable by T . In Section 3 we consider the test application process and the hardware required for implementing it in the case of scan circuits. In Section 4 we present experimental results. Section 5 concludes the paper.
Test set partitioning
Let T = {τ 1 ,τ 2 , . . . ,τ N } be a test set for a scan circuit, with τ i = (SI i ,T i ).
In this section, we first consider the partitioning of T . We then describe a procedure for reducing the sizes of the resulting sets. For illustration, we consider the test set for ISCAS-89 benchmark circuit s 27 shown in Table 1.
The first level of partitioning we apply to the test set T results in a set Ψ that contains all the scan-in vectors SI i , and a set Σ that contains all the primary input sequences T i . For s 27, we obtain Ψ = {000, 011, 110} and Σ = {(0000), (1101), (1010), (0100, 0111, 1001)}. This requires storage of 33 bits (9 bits for Ψ and 24 bits for Σ). The Cartesian product Ψ×Σ defines the test set that will be applied to the circuit. We obtain Ψ×Σ = {(000, (0000)), (000, (1101)), (000, (1010)), (000, (0100, 0111, 1001)), (011, (0000)), . . . , (110, (0100, 0111, 1001))} containing 12 tests.
The partitioning into Ψ and Σ as defined above is motivated by the fact that the number of state variables of a circuit is typically much larger than the number of primary inputs. In addition, the primary input sequences are typically short. Consequently, it is advantageous to deal with the state variables separately. When reducing the sizes of the sets Ψ and Σ, we will give a higher priority to reducing the size of Ψ since the vectors in Ψ tend to be larger than the input sequences in Σ when the number of state variables is large.
When the number of state variables is large, it is also advantageous to further partition the set Ψ. We achieve this by partitioning the state variables into two subsets Y 1 and Y 2 , such that Ψ 1 contains the values assigned by the scan-in vectors to the state variables in Y 1 , and Ψ 2 contains the values assigned to the state variables in Y 2 . The test set to be applied to the circuit is obtained by computing the Cartesian product Ψ 1 ×Ψ 2 ×Σ. For s 27, Y 1 contains the first state variable and Y 2 contains the last two state variables. We obtain Ψ 1 = {0, 1}, Ψ 2 = {00, 10, 11} and Σ = {(0000), (1101), (1010), (0100, 0111, 1001)}. Storage of these sets requires 32 bits. The Cartesian product includes 2 · 3 · 4 = 24 tests.
Once the sets Ψ and Σ (or Ψ 1 , Ψ 2 and Σ) are defined, we attempt to reduce their sizes as much as possible without reducing the fault coverage achieved by Ψ×Σ (or Ψ 1 ×Ψ 2 ×Σ). We consider the case where Ψ 1 , Ψ 2 and Σ are used. The case where Ψ is not partitioned into Ψ 1 and Ψ 2 can be accommodated by setting Ψ 1 = Ψ and Ψ 2 = φ. Procedure 1 below describes how the sets are reduced. Procedure 1 accepts the sets Ψ 1 , Ψ 2 and Σ, and the set of faults F detected by the original test set T . The procedure attempts to omit elements of Ψ 1 , Ψ 2 and Σ one at a time in an order that will be explained below. After every element is omitted, the reduced test set obtained by the Cartesian product Ψ 1 ×Ψ 2 ×Σ is fault simulated. If, during the simulation process, it turns out that a fault f ∈ F is not detected by Ψ 1 ×Ψ 2 ×Σ, fault simulation stops and the omitted element is restored. The omission is accepted only if all the faults in F are detected. It is then made final, and the omitted element is never restored.
Procedure 1 first considers the patterns in Ψ 1 . To determine the order by which the patterns will be considered, we associate with every pattern ψ 1i ∈ Ψ 1 the number of times it appears in the original test set T , i.e., the number of tests in T that contain ψ 1i . We denote this number by n (ψ 1i ). For example, for s 27, ψ 11 = 0 appears three times in T (in three different tests), and ψ 12 = 1 appears once in T (in a single test). Thus, n (ψ 11 ) = 3 and n (ψ 12 ) = 1. We attempt to omit patterns that appear small numbers of times before we attempt to omit patterns that appear large numbers of times in T . The motivation for this is as follows. If ψ 1i appears a large number of times in T , it is likely to contribute to a large number of tests in Ψ 1 ×Ψ 2 ×Σ that detect new faults. If we omit it, there are likely to be many other patterns that it will not be possible to omit without reducing the fault coverage. Therefore, we prefer to keep ψ 1i in Ψ 1 . In contrast, if ψ 1i appears a small number of times in T , it is likely that it will be possible to omit it, and that omitting it will have a small impact on the ability to omit other patterns. In the example of s 27, we try to omit ψ 12 = 1 first, and then we try to omit ψ 11 = 0. In this example, none of them can be omitted.
We repeat the same process for Ψ 2 . In the case of s 27, we have ψ 21 = 00 with n (ψ 21 ) = 1, ψ 22 = 10 with n (ψ 22 ) = 1, and ψ 23 = 11 with n (ψ 23 ) = 2. We attempt to omit ψ 21 , then ψ 22 and finally ψ 23 . We find that ψ 21 = 00 can be omitted.
For the input sequences in Σ, the order is based on two parameters. The first parameter, n (T i ), is the number of times T i appears in T . This is similar to the parameter n (ψ ij ) used for the patterns in Ψ 1 and Ψ 2 . The second parameter is the length of T i , denoted by L (T i ). We prefer to omit long sequences first, since their storage requirements are higher. We use the sequence length as the primary criterion, and the number of appearances in T to break ties. For s 27, we have T 1 = (0000), T 2 = (1101), T 3 = (1010) and T 4 = (0100, 0111, 1001) with n (T i ) = 1 for i = 1,2,3,4, L (T i ) = 1 for i = 1,2,3 and L (T 4 ) = 3. We consider the sequences in the order <T 4 ,T 1 ,T 2 ,T 3 >. We find that T 2 can be omitted.
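The ordering rules of the last three paragraphs can be written as sort keys. The sketch below uses the values of n and L stated in the text for s 27; the function names are ours, and the direction of the tie-break for Σ (fewer appearances first) is an assumption made by analogy with the rule for Ψ 1 and Ψ 2 .

```python
# Appearance counts n(.) as stated in the text for s27
n_psi1 = {"0": 3, "1": 1}
n_psi2 = {"00": 1, "10": 1, "11": 2}

# Input sequences T1..T4; each appears once in T
sigma = [("0000",), ("1101",), ("1010",), ("0100", "0111", "1001")]
n_sigma = {seq: 1 for seq in sigma}


def order_patterns(n):
    """Patterns with fewer appearances in T are tried first.
    sorted() is stable, so ties keep their original order."""
    return sorted(n, key=lambda p: n[p])


def order_sequences(seqs, n):
    """Longer sequences are tried first; appearance count breaks ties
    (fewer appearances first is an assumption)."""
    return sorted(seqs, key=lambda s: (-len(s), n[s]))


print(order_patterns(n_psi1))   # ['1', '0']
print(order_patterns(n_psi2))   # ['00', '10', '11']
print(order_sequences(sigma, n_sigma))
```

The last line prints the sequences in the order T 4 , T 1 , T 2 , T 3 , matching the order given in the text.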
The final result we obtain for s 27 is Ψ 1 = {0, 1}, Ψ 2 = {10, 11} and Σ = {(0000), (1010), (0100, 0111, 1001)}. Storage of these sets requires 26 bits. The number of tests applied to the circuit is 2 . 2 . 3 = 12.
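The storage and test counts quoted for s 27 at the three stages (33, 32 and 26 bits; 12, 24 and 12 tests) can be checked with a short script. The helper names are ours; the sets are copied from the text.

```python
from math import prod


def storage_bits(state_sets, sigma):
    """Bits for all stored state-variable vectors plus all primary input vectors."""
    state = sum(len(v) for s in state_sets for v in s)
    inputs = sum(len(vec) for seq in sigma for vec in seq)
    return state + inputs


def num_tests(state_sets, sigma):
    """Size of the Cartesian product of the state sets and the sequence set."""
    return prod(len(s) for s in state_sets) * len(sigma)


sigma_full = [("0000",), ("1101",), ("1010",), ("0100", "0111", "1001")]
sigma_final = [("0000",), ("1010",), ("0100", "0111", "1001")]

stages = {
    "psi x sigma": ([["000", "011", "110"]], sigma_full),
    "psi1 x psi2 x sigma": ([["0", "1"], ["00", "10", "11"]], sigma_full),
    "after Procedure 1": ([["0", "1"], ["10", "11"]], sigma_final),
}

for name, (state_sets, sigma) in stages.items():
    print(name, storage_bits(state_sets, sigma), num_tests(state_sets, sigma))
# psi x sigma 33 12
# psi1 x psi2 x sigma 32 24
# after Procedure 1 26 12
```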
Procedure 1 that reduces the sets Ψ 1 , Ψ 2 and Σ is given next.
Procedure 1: Reducing Ψ 1 , Ψ 2 and Σ
(1) Let F be the set of faults detected by T . Mark all the patterns in Ψ 1 unselected .
(2) Select the unselected pattern ψ 1i ∈ Ψ 1 with the minimum value of n (ψ 1i ).
Note that the Cartesian product Ψ 1 ×Ψ 2 ×Σ does not have to be maintained explicitly. Instead, the tests can be generated as needed. This prevents excessive storage requirements during fault simulation. Fault simulation time is reduced by stopping the simulation of Ψ 1 ×Ψ 2 ×Σ as soon as an undetected fault which was detected by T is identified.
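The greedy omission described in this section can be sketched as follows. This is an illustration under stated assumptions and not the exact Procedure 1: the fault simulator is replaced by a caller-supplied predicate `detects(test, fault)`, which is hypothetical, the input lists are assumed to be already sorted in the order in which omission should be attempted, and the unpartitioned case is modeled by passing a single empty string as Ψ 2 . The toy detection rule and fault list at the end are made up for the demonstration and do not describe any benchmark circuit.

```python
from itertools import product


def covers(psi1, psi2, sigma, faults, detects):
    """True if every fault is detected by some test of psi1 x psi2 x sigma.
    Tests are generated lazily, and the check stops at the first undetected fault."""
    for f in faults:
        tests = ((a + b, seq) for a, b, seq in product(psi1, psi2, sigma))
        if not any(detects(t, f) for t in tests):
            return False
    return True


def reduce_sets(psi1, psi2, sigma, faults, detects):
    """Try to omit elements of psi1, then psi2, then sigma, one at a time.
    An omission is kept only if all faults remain detected."""
    sets = [list(psi1), list(psi2), list(sigma)]
    for k in range(3):
        for elem in list(sets[k]):
            trial = [x for x in sets[k] if x != elem]
            candidate = sets[:k] + [trial] + sets[k + 1:]
            if covers(*candidate, faults, detects):
                sets[k] = trial  # accepted omissions are never undone
    return sets


# Toy detection rule (hypothetical): fault (s, v) is detected by a test whose
# scan-in vector starts with s and whose input sequence contains v.
def toy_detects(test, fault):
    scan_in, seq = test
    return scan_in.startswith(fault[0]) and fault[1] in seq


toy_faults = [("00", "11"), ("1", "01")]
result = reduce_sets(["0", "1"], ["0", "1"],
                     [("00", "11"), ("01",), ("10",)],
                     toy_faults, toy_detects)
print(result)  # [['0', '1'], ['0'], [('00', '11'), ('01',)]]
```

A real implementation would replace `toy_detects` with a fault simulator for the circuit under test, and would typically drop a fault from further simulation once a detecting test is found, as `any` does here.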
To further reduce the fault simulation time in Procedure 1, we observe that for every fault f ∈ F , it is possible to express the test τ ∈ T that detects it as a combination ψ according to the new combination. In this way, we minimize the number of times a fault f has to be simulated after removing an element from one of the sets.
Hardware implementation
In this section, we compare the hardware required to implement the proposed partitioning-based method with the hardware required when the complete test set is stored.
One way to store the complete test set is by using the following memories (other options exist, but they have similar overheads). The memory referred to as SI stores the scan-in vectors. The i th memory entry, SI 
Experimental results
We applied the proposed built-in test generation method to ISCAS-89 and ITC-99 benchmark circuits. In two separate experiments, we used two precomputed test sets for every circuit. The first test set is derived from a combinational test set (a test set designed for the combinational logic of the circuit). For ISCAS-89 benchmark circuits, the combinational test set is the compacted test set generated by the procedure from [12] . For ITC-99 benchmark circuits, a combinational test set is obtained by applying 100,000 combinational test patterns and including in the test set only the patterns that detect new faults. A combinational test c i is transformed into a scan-based test τ i as follows. The values of the state variables obtained under c i are assigned to the scan-in vector SI i , and the primary input vector obtained under c i is included in a test sequence T i of length one. Starting from the test set obtained in this way, we apply the static compaction procedure of [13] . After static compaction, the number of tests is lower and the lengths of the sequences T i are higher (the lower number of tests reduces the number of scan operations and the test application time; the total length of all the primary input sequences is kept the same or it is reduced by compaction). Since the test sets obtained in this way tend to contain very short input sequences, we also consider the test sets generated by the simulation-based test generation procedure of [14] . These test sets contain longer input sequences, and in most cases also a larger number of tests.
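The transformation of a combinational test c i into a scan-based test τ i = (SI i ,T i ) amounts to splitting the vector. In the sketch below, the function name and the bit ordering (primary input bits first, then state-variable bits) are assumptions made for illustration; the text does not specify how c i is laid out.

```python
def comb_to_scan(c, n_pi):
    """Turn a combinational test into a scan-based test (SI, T).
    Assumes c lists the n_pi primary input bits first, then the state variables."""
    primary, state = c[:n_pi], c[n_pi:]
    return (state, (primary,))  # T has length one


# Two primary inputs and four state variables
print(comb_to_scan("010000", n_pi=2))  # ('0000', ('01',))
```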
Information about the circuits we consider is shown in Table 2 . Information about the test sets is shown later in Tables 3 and 4 . In Table 2 , after the circuit name we show the number of primary inputs and the number of state variables. We then show the number of faults and the number of faults detected by the test sets we consider. For ISCAS-89 benchmark circuits, all the detectable faults are detected by the test sets we use.
In Table 3 , we show the following information for the test sets obtained by static compaction [13] . Under column original , we show the number of tests in T , the length of all the input sequences T i in T , and the number of bits required to store T . Under column partitioned , we show the results obtained after partitioning T but before applying Procedure 1 to reduce the resulting set sizes. In all the cases, the test set is partitioned into three sets, Ψ 1 , Ψ 2 and Σ. Under subcolumn st 1 we show the number of vectors in Ψ 1 , under subcolumn st 2 we show the number of vectors in Ψ 2 , under subcolumn seq we show the number of input sequences in Σ, and under subcolumn len we show the total length of all the sequences in Σ. Under column final we show the results obtained after applying Procedure 1.
In addition to the sizes of Ψ 1 , Ψ 2 and Σ shown under subcolumns st 1, st 2, seq and len , we show under subcolumn tst the number of tests applied to the circuit. This is the size of the Cartesian product Ψ 1 ×Ψ 2 ×Σ. Under subcolumn stor we show the storage requirements of the partitioned and reduced sets Ψ 1 , Ψ 2 and Σ in bits. Under column ratio we show the storage requirements of the proposed method divided by the storage requirements for the original test set. In the last row of Table 3 we show the total storage requirements and the average ratio. In the last column of Table 3 we show the normalized run time of Procedure 1. The run time is normalized by dividing it by the time it takes to fault simulate the original test set. It is important to note that the original test set is small and fault simulation time for this test set is very short. The run time of Procedure 1 can be further reduced by incorporating techniques to speed up the identification of elements that cannot be removed from the partitioned test set.
Storage requirements are computed as follows. For the original test set T , let the number of tests in T be N and let the length of T i be L i . The total length of all the sequences T i in T is L = L 1 + L 2 + . . . + L N , and the number of bits required to store T is NN SV +LN PI ,
where N SV is the number of state variables and N PI is the number of primary inputs. The first component of the sum corresponds to storage of scan-in vectors, and the second component corresponds to storage of the input sequences. We ignore the memories required to store beginnings and lengths of sequences since they depend on the implementation, and they can only be reduced by the proposed method. For a partitioned test set with sets Ψ 1 , Ψ 2 and Σ, let the size of Ψ 1 be N 1 , let the size of Ψ 2 be N 2 , and let the size of Σ be N 3 . The number of bits required to store these sets is N 1 N SV 1 +N 2 N SV 2 +LN PI , where N SV 1 is the number of state variables whose vectors are stored in Ψ 1 , N SV 2 is the number of state variables whose vectors are stored in Ψ 2 , and L is the total length of all the input sequences in Σ.
Table 3 (columns original and partitioned)
            original             partitioned
circuit   tst   len   stor    st1   st2   seq   len
s208       23    27    481     10    14    23    27
s298       20    24    352     19    17    10    14
s344       11    15    300     10     8     5     8
s382       23    25    558     23    23     8    10
s386       42    70    742      8     8    36    64
s400       20    24    492     19    20    10    13
s420       40    43   1457     23    32    24    27
s510       23    54   1164      8     8    20    51
s526       44    50   1074     36    38    11    17
s641       15    22   1055     11    14    15    22
s820       42    94   1902      4     8
The following points can be seen from Table 3 . We first consider the results after partitioning T but before applying Procedure 1. In the worst case, if the number of tests in T is N , the numbers of vectors in Ψ 1 and Ψ 2 and the number of sequences in Σ are also N . In most cases, the numbers are smaller than N ; however, they are not significantly smaller. This implies that partitioning alone does not reduce the storage requirements significantly. For example, for s 208, the test set obtained by static compaction contains 23 tests. After partitioning, the numbers of vectors in Ψ 1 and Ψ 2 are 10 and 14, respectively, but the number of input sequences in Σ is 23.
In this case, all the input sequences in T are different. Procedure 1 reduces the numbers of vectors in Ψ 1 and Ψ 2 , and the number of input sequences in Σ substantially. As a result, it reduces the storage requirements, in most cases, to less than half of the original value. For s 208, the numbers of vectors in Ψ 1 and Ψ 2 and the number of sequences in Σ are all 7. The storage requirements are reduced from 481 bits to 155 bits, or 0.32 of the original value.
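As a check on the storage figures, the bit count for an original test set, N tests times N SV plus total sequence length L times N PI , can be evaluated for the first two circuits. The values of N and L are taken from the text; the numbers of state variables and primary inputs (8 and 11 for s 208, 14 and 3 for s 298) are the commonly cited ISCAS-89 figures and are an assumption here, since they do not appear in the text above. The function name is ours.

```python
def original_storage_bits(n_tests, total_len, n_sv, n_pi):
    """Scan-in vectors (n_tests * n_sv) plus primary input vectors (total_len * n_pi)."""
    return n_tests * n_sv + total_len * n_pi


s208 = original_storage_bits(23, 27, n_sv=8, n_pi=11)
s298 = original_storage_bits(20, 24, n_sv=14, n_pi=3)
print(s208, s298)            # 481 352

# Ratio quoted for s208 after Procedure 1 (155 bits)
print(round(155 / s208, 2))  # 0.32
```

Both results agree with the stor values reported for these circuits, which supports the assumed circuit parameters.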
Results using the test sets from [14] are shown in Table 4 . The original test sets in this case are larger than the test sets of Table 3 , which were obtained by static compaction. Consequently, their storage requirements are higher.
Showing that the tests applied by the proposed method achieve large numbers of detections of stuck-at faults will support the argument that the proposed method, by applying a number of tests which is larger than necessary, improves the defect coverage.
Results regarding the numbers of detections achieved by the proposed method are given in Table 5 . The original test set in this case is the one obtained by static compaction [13] . Table 5 is organized as follows. After the circuit name, we show the minimum and the maximum number of times a stuck-at fault is detected by the tests generated by the proposed method. We then show the average number of times a stuck-at fault is detected. For comparison, we show the same information for the original test set. The following points can be seen from Table 5 . The minimum number of times a fault is detected by the Cartesian product is in most cases one. This is a result of the fact that the subsets Ψ 1 , Ψ 2 and Σ are minimized as much as possible. Specifically, the omission of any additional entry from any one of these subsets will leave a fault undetected. The faults that prevent the omission of another entry are the ones detected only once. A similar situation can be seen for the original test set, which is a compacted test set.
The maximum and average numbers of detections are significantly higher for the Cartesian product than for the original test set. This supports our argument that the Cartesian product includes tests of high quality that detect large numbers of faults.
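The minimum, maximum and average detection counts discussed here can be computed as sketched below. The detection rule, the fault list and the two test sets are made up for the demonstration (they are not benchmark data), and `detects` stands in for a fault simulator.

```python
from itertools import product


def detection_counts(tests, faults, detects):
    """Return (min, max, average) number of tests that detect each fault."""
    counts = [sum(1 for t in tests if detects(t, f)) for f in faults]
    return min(counts), max(counts), sum(counts) / len(counts)


# Hypothetical rule: fault (s, v) is detected when the scan-in vector
# starts with s and the input sequence contains v.
def toy_detects(test, fault):
    scan_in, seq = test
    return scan_in.startswith(fault[0]) and fault[1] in seq


toy_faults = [("00", "11"), ("1", "01")]
original = [("00", ("00", "11")), ("11", ("01",))]

psi1, psi2 = ["0", "1"], ["0", "1"]
sigma = [("00", "11"), ("01",)]
cartesian = [(a + b, seq) for a, b, seq in product(psi1, psi2, sigma)]

print(detection_counts(original, toy_faults, toy_detects))   # (1, 1, 1.0)
print(detection_counts(cartesian, toy_faults, toy_detects))  # (1, 2, 1.5)
```

In this toy case the minimum stays at one while the maximum and average grow for the Cartesian product, which is the same qualitative pattern described for Table 5.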
Concluding remarks
We described a partitioning- and storage-based built-in test pattern generation method for full-scan circuits. Under the proposed method, a precomputed test set is partitioned into several sets. One or more of the sets contain values of state variables. The remaining set contains input sequences. The on-chip test set is obtained by implementing the Cartesian product of the various sets. The sets are reduced as much as possible by omitting vectors or input sequences in order to reduce the storage requirements and the test application time.
In the application to full-scan circuits, we used the fact that in most cases, the number of primary inputs is smaller than the number of state variables, and the primary input sequences are relatively short. Consequently, we partitioned the scan-in vectors, but we did not partition the input sequences. If necessary, it is possible to partition the input sequences by partitioning the set of primary inputs (if it is large), or by partitioning the input sequences into subsequences of limited lengths (if the input sequences are long).
To further reduce the storage requirements, it is possible to use the proposed method in conjunction with a random pattern generator, and apply it only to faults that remain undetected after random pattern generation. This would reduce the sizes of the sets that need to be stored on-chip.
