Introduction
Logic BIST is gaining acceptance in the VLSI industry because it eliminates the need for expensive test equipment and provides at-speed and in-system testing capabilities [1, 5]. Accordingly, the circuit is partitioned into a number of circuits under test (CUTs). Each CUT has an associated test pattern generator (TPG) and an output response analyzer (a multiple-input signature register, MISR). The efficiency of a BIST technique strongly depends on the ability of the TPG to achieve very high fault coverage with an acceptable test length. Different techniques have addressed this problem, e.g., pseudo-exhaustive (verification) testing [2, 20], pseudo-random testing [4, 5, 10, 23, 24, 25], and deterministic test set embedding [14, 18, 26]. Recently, the width compression method [8, 9, 12] has been introduced to reduce the test length of pseudo-exhaustive testing. Unlike pseudo-exhaustive testing [2, 20], two inputs are considered compatible if they can be connected to the same output of the TPG without any loss of fault coverage. The target fault set may include different types of faults; for simplicity, we consider here only stuck-at faults.
Figure 1 illustrates a test-per-clock scheme of the counter-based exhaustive BIST. Accordingly, we assume a TPG implemented using an N-bit binary counter or a complete LFSR (called a de Bruijn counter [5]). This scheme also includes mapping logic connected via multiplexers (MUXs) to the inputs of an M-input combinational CUT, where M>>N. In this case, the width compression method is used to minimize the size of the counter by finding as many compatibility relations as possible between the inputs of the CUT. Four compatibility relations have been introduced in the literature: direct, inverse [9], decoder [8] and combinational compatibility [12]. As a result, the mapping logic in [9] includes only direct connections and inverters, while the mapping logic in [8] and [12] also includes binary decoders and 2-input gates.
The advantage of counter-based pseudo-exhaustive testing is its low area overhead. Despite some recent improvements [12], counter-based BIST still requires relatively long test lengths (up to 2^23 and 2^26 for the ISCAS'85 and ISCAS'89 benchmark circuits, see column 6 in Table 1). In this paper, we address this problem and propose a new compatibility relation to further reduce the test length of the counter-based BIST. More formally, the target fault set is divided into K groups such that a binary counter can generate test patterns for each group.
The rest of this paper is organized as follows. In Section 2, a new compatibility relation allowing further reduction of the test length of the counter-based BIST is presented. In Section 3, the synthesis procedure is described. In Section 4, the experimental results are given, and Section 5 concludes the paper.
New compatibility relation
In this section, we first summarize the compatibility relations previously introduced in the literature. Next, we present a new compatibility relation for reducing the test length of the counter-based exhaustive testing.
Two inputs of a combinational circuit are said to be directly compatible if they can be shorted together without introducing any redundant stuck-at fault in the circuit. Similarly, two inputs of a circuit are said to be inversely compatible if they can be shorted together via an inverter without introducing any redundant stuck-at fault in the circuit [9].
A set of inputs of a circuit forms a compatibility class if all inputs are directly or inversely compatible with one another, i.e., shorting them together by wire or via inverters to one output of the TPG does not introduce any redundant faults in the CUT. As the same (or opposite) logic value is applied to all inputs in a compatibility class during testing, the size of the binary counter used for the counter-based exhaustive testing is equal to the number of compatibility classes, i.e., the test length is 2^N where N is the number of compatibility classes.
Two or more inputs of a circuit are said to be decoder-compatible (D-compatible) if all detectable stuck-at faults in the circuit can be tested by a test set in which no test vector requires more than one of these inputs to be at value one [8]. This compatibility relation can be used to further reduce the size of the binary counter, since the values of D-compatible inputs x_1, x_2, ..., x_k can be generated by a counter connected to these inputs via a decoder.
In [12] , a compatibility relation called combinational compatibility (Ck-compatibility) has been introduced to further reduce the test length of the counter-based exhaustive testing. We use this definition because it covers all previous definitions presented in the literature.
Definition of Ck-compatibility relation [12]: Inputs x_1, x_2, ..., x_k of a circuit are said to be Ck-compatible with input x_(k+1) if a combinational circuit z=f(x_1, x_2, ..., x_k) exists whose output z can be connected to input x_(k+1) without introducing any redundant fault in the circuit.
For example, the direct and inverse compatibility relations introduced in [9] can be viewed as C1-compatibility relations. Also, the D-compatibility relation introduced in [8] is a special case of Ck-compatibility where the combinational circuit is a decoder. The disadvantage of the Ck-compatibility relation is its extremely high time complexity when k>2 because, in the worst case, all combinations of (k+1) compatibility classes have to be checked against all possible k-input combinational functions z=f(x_1, x_2, ..., x_k). For example, the number of these functions when k=2 and 3 is 10 and 218, respectively. The following compatibility relation can be used to further reduce the test length of the counter-based pseudo-exhaustive testing while keeping the time complexity of width compression at a reasonable level.
New compatibility relation: Let {c_1, c_2, ..., c_N} be a set of compatibility classes for a given circuit. Let β_i define a partition of the compatibility classes into S blocks {b_i1, b_i2, ..., b_iS} such that each block consists of one or more compatibility classes. Let λ(β_i) be the set of redundant faults introduced by partition β_i. Then partitions {β_1, β_2, ..., β_k} define a composite Pk-compatibility relation iff the set ∆λ_k = λ(β_1) ∩ λ(β_2) ∩ ... ∩ λ(β_k) is empty, i.e., ∆λ_k = ∅.
In this way, the Pk-compatibility relation allows a compatibility class to belong to different blocks b_i1, b_i2, ..., b_iS for each partition β_i. As a result, two compatibility classes, c_i and c_j, can be independent, directly compatible or inversely compatible in different partitions.
The test length for the counter-based exhaustive testing using the Pk-compatibility relation is equal to 2^S where S is the number of blocks for each partition β_i.
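The emptiness condition in this definition can be illustrated with a minimal Python sketch. It assumes that the redundant fault set of each partition has already been computed (in practice by ATPG); the function name `is_pk_compatible` and the fault labels are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Set


def is_pk_compatible(redundant_sets: Iterable[Set[str]]) -> bool:
    """Return True if no fault is redundant under every partition.

    redundant_sets holds one set per partition: the faults that become
    redundant when the compatibility classes are merged as that
    partition prescribes (assumed to be precomputed).
    """
    sets = list(redundant_sets)
    if not sets:
        return False  # without any partition no fault is tested at all
    # A fault escapes only if it is redundant in ALL partitions.
    return len(set.intersection(*sets)) == 0


# Hypothetical redundant fault sets for three partitions.
lam_b1 = {"f3", "f7", "f9"}
lam_b2 = {"f1", "f7"}
lam_b3 = {"f2", "f9"}

two_ok = is_pk_compatible([lam_b1, lam_b2])            # f7 is shared
three_ok = is_pk_compatible([lam_b1, lam_b2, lam_b3])  # nothing is shared
```

In this toy case two partitions are not sufficient because fault f7 is redundant in both, whereas adding the third partition empties the intersection.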
Synthesis procedure
In this section, we present a synthesis procedure for a test-per-scan scheme using the Pk-compatibility relation. First, we present the target BIST scheme. Next, we describe the procedure for deriving C1-compatibility classes. Finally, we describe the procedure for reducing the test length of exhaustive testing using the Pk-compatibility relation.
Figure 2 presents the test-per-scan scheme of the counter-based BIST. Accordingly, we assume a full-scan approach, i.e., in test mode the CUT is transformed into a combinational circuit and all primary inputs and internal registers are included in a scan chain. Also, the scan cells are divided into two or more scan chains, each having at most L cells, such that, for ∀i∈{1,..,L}, the cells in position i of all scan chains are directly compatible. This precondition can be achieved using the pseudo-exhaustive technique [2, 20], i.e., initially, two inputs are
The TPG consists of two ROMs, a complex counter and a multiplexer. The first section of the complex counter is a modulo-L counter that, together with ROM1, determines the compatibility class for each scan cell. More formally, ROM1 stores the compatibility class information for all scan cells. Since all cells in position i of all scan chains belong to one compatibility class, say c_j, ROM1 at address (L-i) contains a unique n-bit integer corresponding to compatibility class c_j, where n=log2 N. Similarly, the third section of the complex counter is a modulo-K counter that, together with ROM2, determines the block for each pair of compatibility class and partition, i.e., the bit of the S-bit counter that is shifted during the current load/unload cycle. For example, if compatibility class c_i in partition β_j belongs to block b_jm, then ROM2 at address N(j-1)+(i-1) contains a unique s-bit integer corresponding to block b_jm, where i∈{1,..,N}, j∈{1,..,K}, m∈{1,..,S} and s=log2 S.
In fact, ROM2 has an (s+1)-bit word, and the most significant bit determines the relation between the compatibility classes within one block. For example, if compatibility classes c_x and c_y in partition β_j are directly or inversely compatible, then the most significant bits at addresses N(j-1)+x-1 and N(j-1)+y-1 have equal or different values, respectively.
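The ROM2 addressing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `build_rom2`, the encoding of block b_jm as the integer m-1, the use of the most significant bit as an inversion flag, and the rounding of s up to an integer are choices made here and are not taken from the original design.

```python
import math


def build_rom2(partitions, n_classes, n_blocks):
    """Lay out ROM2 words at address N(j-1)+(i-1).

    partitions[j-1][i-1] = (m, inverted): class c_i belongs to block b_jm
    of partition beta_j, and 'inverted' gives its polarity in that block.
    Assumed word format: MSB = polarity, low s bits = m-1.
    """
    s = max(1, math.ceil(math.log2(n_blocks)))  # assumption: round up
    rom = [0] * (len(partitions) * n_classes)
    for j, part in enumerate(partitions, start=1):
        for i, (m, inverted) in enumerate(part, start=1):
            addr = n_classes * (j - 1) + (i - 1)
            rom[addr] = (int(inverted) << s) | (m - 1)
    return rom, s


# Hypothetical example: N=3 classes, S=2 blocks, K=2 partitions.
parts = [
    [(1, False), (1, True), (2, False)],   # beta_1: c1, c2 share b_11 (inverse)
    [(1, False), (2, False), (2, False)],  # beta_2: c2, c3 share b_22 (direct)
]
rom2, s_bits = build_rom2(parts, n_classes=3, n_blocks=2)
```

In partition β_1 the words for c_1 and c_2 differ in their most significant bit, which marks the two classes as inversely merged; in β_2 the words for c_2 and c_3 have equal most significant bits, which marks a direct merge.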
Test-per-scan scheme
An important aspect of BIST is to achieve a self-testing property. In this case, BIST is able to detect not only all detectable faults in the CUT but also any fault in the TPG. For the proposed scheme, this goal can be achieved by connecting the only output of the TPG to the MISR, as shown in Figure 2.
Next, let us consider how to satisfy the precondition of the proposed BIST scheme that all scan cells in position i, for ∀i∈{1,..,L}, are directly compatible. Clearly, if the CUT has only one scan chain, then this precondition is always satisfied. Therefore, by choosing a proper value for parameter L, a trade-off between area overhead and test application time can be achieved. For example, by choosing a high value for parameter L, we increase the degree of freedom for scan chain ordering and reordering but also increase the size of ROM1 and the test application time. To increase the degree of freedom for DFT synthesis, the tree structure of the scan chains shown in Figure 2 can be used. In this way, if the value of parameter L is properly selected, then the precondition of the proposed test-per-scan scheme can be satisfied by a slight reordering of the scan cells.
Deriving compatibility classes
The procedure for deriving compatibility classes is based on a greedy strategy. More formally, the first input is assigned to the first compatibility class. Next, if the current input is compatible with all inputs in an existing compatibility class, then this input is added to that compatibility class; otherwise, this input is assigned to a new compatibility class.
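The greedy strategy can be sketched in Python as shown below. The sketch assumes a caller-supplied predicate `compatible(x, y)`; in the actual flow this check involves the dependency matrix, filters and ATPG on the whole class, so the pairwise predicate and the toy incompatibility list used here are simplifying assumptions.

```python
def derive_classes(inputs, compatible):
    """Greedily assign each input to the first class it fits into."""
    classes = []
    for x in inputs:
        for cls in classes:
            # x joins a class only if it is compatible with every member
            if all(compatible(x, y) for y in cls):
                cls.append(x)
                break
        else:
            # no existing class accepts x, so open a new one
            classes.append([x])
    return classes


# Hypothetical incompatible pairs, standing in for the real checks.
INCOMPATIBLE = {frozenset(("a", "b")), frozenset(("c", "d"))}


def toy_compatible(x, y):
    return frozenset((x, y)) not in INCOMPATIBLE


classes = derive_classes(["a", "b", "c", "d"], toy_compatible)
```

Because the strategy is greedy, the result depends on the order in which the inputs are visited, and it is not guaranteed to yield the minimum number of classes.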
Deriving the compatibility classes requires checking the compatibility of a set of inputs. The synthesis procedure uses techniques similar to those presented in [8, 9, 12]. These techniques are based on the dependency matrix [20], incompatibility filters [9] and fault set reduction [12]. After quick identification of a large number of compatible inputs using the dependency matrix, the synthesis procedure tries to reduce the number of compatibility classes using test generation. This technique checks whether shorting two inputs together introduces any redundant fault in the CUT. The search space in this phase can be considerably reduced using incompatibility filters. More formally, the synthesis procedure calculates the incompatibility filters using both structural information and the necessary input assignments of each fault in the target set. Accordingly, two inputs of a circuit are incompatible if they are inputs of one logic gate [9]. Also, two inputs of a circuit are inversely (directly) incompatible if a necessary input assignment to detect a fault requires both inputs to be at the same (different) values [12].
This phase is sped up by a dynamic update of the test set during width compression. For example, when checking the compatibility of two inputs, x_a and x_b, the synthesis procedure identifies the test patterns that have incompatible values for these inputs. Next, these test patterns are temporarily removed from the test set and all faults detected only by these patterns become potentially redundant (or untested). These faults form the fault list to be checked by ATPG. After shorting together inputs x_a and x_b (via a wire or an inverter), if all faults in the fault list are proven to be detectable, then these inputs are compatible and the test set is updated; otherwise, these inputs are incompatible and the previous test set is restored.
Fault set partition
In general, two approaches to width compression exist: one based on test generation [9, 12] and one based on computed minimal test sets [8]. The divide-and-conquer strategy for width compression results in fault set and test set partition, respectively.
From a practical point of view, two different tasks for fault set partition are possible: (1) to minimize K, the number of partitions, when the size S of the counter is fixed, and (2) to minimize the test length when K=2. Both tasks assume a similar approach, so we present here the synthesis procedure for the first case.
Procedure 1: The input data are the CUT, the compatibility classes, parameter S, and the target fault set ∆λ(β_0), which includes all detectable single stuck-at faults. The output data are partitions β_1, β_2, ..., β_K of the compatibility classes into S blocks such that the set ∆λ(β_K), calculated as the intersection of the redundant faults introduced by partitions β_1, β_2, ..., β_K, is empty.
K=0; while ∆λ(β_K)≠∅ do the following:
1) K=K+1; derive partition β_K using a greedy strategy to minimize ∆λ(β_K). Initially, the number of blocks of partition β_K is equal to N, the number of compatibility classes. Next, each time two blocks are merged, the number of blocks in partition β_K decreases by 1. This step continues until the number of blocks in partition β_K becomes equal to S.
2) Optimize partition β_K by checking whether each compatibility class can be moved to another block so that ∆λ(β_K) is minimized.
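The overall loop of Procedure 1 can be sketched in Python under strong simplifying assumptions. The real procedure determines redundant faults by ATPG; here a toy oracle `redundant` is assumed instead, in which each fault lists pairs of compatibility classes that must stay in different blocks. Inverse merging and the optimization of step 2 are omitted, and all names and data are hypothetical.

```python
from itertools import combinations


def redundant(blocks, faults, needs):
    """Toy oracle: a fault is redundant if a required pair shares a block."""
    block_of = {c: k for k, blk in enumerate(blocks) for c in blk}
    return {f for f in faults
            if any(block_of[a] == block_of[b] for a, b in needs[f])}


def greedy_partition(classes, s_blocks, faults, needs):
    """Step 1: merge blocks pairwise until only s_blocks remain."""
    blocks = [[c] for c in classes]
    while len(blocks) > s_blocks:
        best = None
        for x, y in combinations(range(len(blocks)), 2):
            trial = [b for k, b in enumerate(blocks) if k not in (x, y)]
            trial.append(blocks[x] + blocks[y])
            cost = len(redundant(trial, faults, needs))
            if best is None or cost < best[0]:
                best = (cost, trial)
        blocks = best[1]
    return blocks


def procedure1(classes, s_blocks, faults, needs, max_k=16):
    """Add partitions until no fault is redundant in all of them."""
    remaining, partitions = set(faults), []
    while remaining:
        if len(partitions) == max_k:
            raise RuntimeError("no solution within max_k partitions")
        part = greedy_partition(classes, s_blocks, remaining, needs)
        left = redundant(part, remaining, needs)
        if left == remaining:
            raise RuntimeError("new partition covers no further fault")
        partitions.append(part)
        remaining = left  # running intersection of redundant fault sets
    return partitions


# Hypothetical data: 4 classes, 5 faults, counter size S=2.
needs = {"f1": [("c1", "c2")], "f2": [("c3", "c4")], "f3": [("c1", "c3")],
         "f4": [("c2", "c4")], "f5": [("c1", "c4")]}
parts = procedure1(["c1", "c2", "c3", "c4"], 2, needs.keys(), needs)
```

Restricting each new partition to the faults that are still redundant keeps the running intersection explicit, so the loop terminates as soon as that intersection is empty.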
Procedure 1 is based on a dynamic calculation and minimization of ∆λ(β_k), the set of redundant faults introduced by merging blocks of partition β_k. This analysis is sped up by filtering out the pairs of blocks that, when merged together, introduce too many redundant faults. Also, Procedure 1 utilizes the following assumption:
Assumption 1: Let ∆λ(β) be estimated by calculating a square matrix λ such that the entry (x,y) is equal to the number of redundant faults introduced by merging blocks b_x and b_y directly (if x<y) or inversely (if x>y). The number of redundant faults introduced by a merging operation can be estimated as the sum of the corresponding entries in matrix λ if each new block is formed by merging at most two blocks.
Example 1: Figure 3 shows matrix λ of partition β, where entry (4,3) indicates that if blocks b_3 and b_4 are inversely merged, then the number of introduced redundant faults is 17. Now let us suppose that we would like to find two pairs of blocks that, when merged together, introduce a minimum number of redundant faults. According to Assumption 1, this goal can be achieved if blocks b_1 and b_6 are inversely merged and blocks b_2 and b_7 are directly merged, i.e., the selected pairs are (6,1) and (2,7). In this case, the number of blocks is 6 and the number of redundant faults introduced by this merge is estimated to be 4. Now let us suppose that we would like to find three pairs of blocks. According to Assumption 1, to minimize the number of introduced redundant faults, the selected pairs are (2,7), (3,6) and (1,5). In this case, the number of blocks is 5 and the number of redundant faults introduced by this merge is estimated to be 9.
In fact, Assumption 1 is very useful when the number of blocks is larger than 30. In this case, just 2-3 iterations are necessary to reduce the number of blocks by 50 percent.
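The pair selection of Example 1 can be sketched as a small exhaustive search in Python. The matrix of Figure 3 is not reproduced here, so the 4x4 matrix below is purely hypothetical, and the function `best_merges` is an illustrative brute-force formulation rather than the implementation used in the synthesis procedure.

```python
from itertools import combinations


def best_merges(lam, n_pairs):
    """Pick n_pairs disjoint block pairs with the minimum estimated cost.

    lam[x][y] is the number of redundant faults introduced by merging
    blocks x+1 and y+1 directly (x < y) or inversely (x > y). Following
    Assumption 1, the cost of several merges is the sum of the entries.
    """
    n = len(lam)
    cands = [(lam[x][y], x, y) for x in range(n) for y in range(n) if x != y]
    best = None
    for combo in combinations(cands, n_pairs):
        used = [b for _, x, y in combo for b in (x, y)]
        if len(set(used)) != 2 * n_pairs:
            continue  # a block may take part in at most one merge
        cost = sum(c for c, _, _ in combo)
        if best is None or cost < best[0]:
            best = (cost, [(x + 1, y + 1) for _, x, y in combo])
    return best


# Hypothetical matrix for four blocks (diagonal entries are unused).
lam = [
    [0, 5, 3, 8],
    [2, 0, 7, 1],
    [6, 4, 0, 9],
    [3, 6, 1, 0],
]
cost, pairs = best_merges(lam, 2)
```

For this matrix the search selects the inverse merges (2,1) and (4,3) with an estimated cost of 3. The exhaustive enumeration grows quickly with the number of blocks, so a practical tool would need a heuristic or a matching algorithm instead.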
Experimental results
The presented synthesis procedure was implemented using SPIRIT [11] and run on a 1 GHz Pentium-III PC. The experimental results for the ISCAS'85 [6] benchmark circuits and a full-scan version of the ISCAS'89 [7] benchmark circuits are presented in Tables 1 and 2. Table 1 presents a comparison of the P2- and C2-compatibility relations. Columns 2-6 give the number of inputs before and after width compression using the dependency matrix [2, 20], the C1-compatibility relation [9] and the C2-compatibility relation [12]. Columns 7-9 show the experimental results of the proposed P2-compatibility relation. These results include the number of compatibility classes obtained by the first step of the synthesis procedure, the numbers of blocks S1 and S2 of partitions β_1 and β_2, respectively, as well as the test length of the counter-based exhaustive testing calculated by the formula 2^S1 + 2^S2. Column 10 shows the test length reduction of the P2-compatibility relation, calculated in percent by the formula (1 - L_P2/L_C2), where L_C2 and L_P2 are the test lengths of the C2- and P2-compatibility relations. These results demonstrate the effectiveness of the proposed P2-compatibility relation with respect to the C2-compatibility relation. In all critical cases, i.e., the cases where the test length of the C1-compatibility relation is larger than 2^20, the test length reduction was between 24.8 and 93.4 percent. In fact, the Pk- and Ck-compatibility relations do not contradict each other and can be applied together to further reduce the test length of the counter-based exhaustive testing.
Table 2 presents experimental results for the Pk-compatibility relation when the number of blocks for each partition was fixed to 12 and 15, i.e., S=12 and S=15.
Columns 2-9 show the maximum length of the scan chains, the number of compatibility classes, blocks and partitions obtained after the first and second steps of the synthesis procedure, the test length of counter-based exhaustive testing, the ROM sizes, and the CPU time in hours for the second step of the synthesis procedure only. Columns 10-13 give the ROM sizes of the best published results for the reseeding technique, chosen here as an alternative technique. The size of the ROM for the proposed BIST technique was calculated by the formula nL + (s+1)KN, where n=log2 N and s=log2 S. The proposed BIST technique achieved higher compression of test data than reseeding. In fact, some of these results for the reseeding technique were also achieved using pseudo-random test generation (PRTG) [14, 15, 19] as well as the width compression method [15]. For example, if the structure of scan chains proposed in [15] is used, then ROM1 in Figure 2 becomes redundant. Also, these results show that the size of the ROMs for the proposed BIST technique depends only slightly on the size of the circuits. For typical cores (10K-100K gates) [17], we may expect L≤512, N≤64, K≤16 and S≤16, i.e., the test length and the size of the ROMs will not exceed 1M and 8K, respectively.
Now, let us compare this technique with other BIST techniques that achieve complete fault coverage. The most promising scan-based techniques able to achieve this goal within a reasonable test application time are based on reseeding [3, 14, 15, 19] and bit-flipping [17, 26]. In this analysis, we exclude the test point insertion technique [22] because this technique does not guarantee complete fault coverage [16].
The advantages of the proposed BIST technique are: 1) low area overhead (estimated ROM size) that depends slightly on the size of the CUT; 2) higher test length reduction than the previously published counter-based pseudo-exhaustive techniques; 3) ability to achieve a trade-off between area overhead and test application time when complete fault coverage in both the CUT and TPG is ensured.
The disadvantage of the proposed BIST technique is the time complexity of the synthesis procedure, which involves many ATPG runs for the whole or a reduced fault set. The characteristics of the proposed BIST technique can be improved using PRTG and/or dynamic scan [13, 17, 21]. For example, some recent results demonstrated the ability of weighted random techniques [24, 25] to achieve very high fault coverage. Therefore, the target fault set can be considerably reduced using PRTG. As a result, the synthesis procedure could be sped up significantly.
Conclusions
We presented a new technique for reducing the test length of the counter-based BIST. The experimental results for the ISCAS'85 and ISCAS'89 benchmark circuits demonstrated the effectiveness of the proposed BIST technique. When K=2 (fault set partition into two groups), a much shorter test length was achieved than with the previously published counter-based BIST techniques in all critical cases. A further reduction of the test length was achieved by increasing parameter K. As a result, the proposed BIST technique achieved higher test data volume compression than all previously published deterministic BIST techniques. Also, the size of the test data depends only slightly on the size of the CUT. These results were achieved using a new test-per-scan BIST architecture where test data are represented by Pk-compatibility relations.
