SUMMARY We developed a test data compression scheme for scan-based BIST, aiming to compress test stimuli and responses by more than 100 times. As the scan-BIST architecture, we adopt BIST-Aided Scan Test (BAST) and combine four techniques: the invert-and-shift operation, run-length compression, scan address partitioning, and LFSR pre-shifting. Our scheme achieved a 100x compression rate in environments where Xs do not occur, without reducing the fault coverage of the original ATPG vectors. Furthermore, we enhanced the masking logic to reduce the data for X-masking so that test data is still compressed to 1/100 in a practical environment where Xs occur. We applied our scheme to five real VLSI chips, and the technique compressed the test data by 100x for scan-based BIST.
Introduction
With advancements in semiconductor technology, the cost of testing VLSI chips has been increasing [1]. Testing them requires a large amount of test data and a long testing time due to their increased scale, resulting in high costs. Therefore, the test data, including test stimuli and test responses, must be reduced.
One promising testing technique is the built-in self-test (BIST), in which a linear feedback shift register (LFSR) is used as a pseudo-random test generator and a multiple-input signature register (MISR) is used as a test response compactor [2]. BIST has been considered a practical solution to the difficulties with VLSI testing [3], [4]. One drawback of BIST is the difficulty of achieving the high fault coverage of the ATPG technique. Consequently, many hybrid BIST schemes (we use this term in its generalized meaning [5], that is, combinations of BIST and external testing) have been proposed to provide better test patterns to circuits under test (CUTs) while still reducing test cost [5]-[9]. As the number of transistors integrated into chips increases, more attention has been focused on SoC testing.
As a result, many commercial hybrid BIST schemes have been developed. The key technique for hybrid BIST is to significantly reduce the stimuli and to handle unknown (X) values. Circuits with hybrid BIST architectures use bit-flipping circuits and/or broadcasters that propagate test stimuli into multiple scan chains. The techniques mentioned above have moderate compression rates (up to 25x) or achieve much higher rates in exchange for some limitation on the applicable test vectors, due to the arrangement of scan chains. For example, EDT achieves compression rates of 5x-25x [10]-[12], and BAST achieves up to 22x [17]. While UltraScan enables more than 100x compression, a factor of 10 out of that 100x is due to its faster ATE clock cycle [15], [16].
For tolerating Xs, on the other hand, coding-theory-based static masking logics such as X-Compact and Convolutional Compactor have been proposed, as well as filtering logics such as X-filter and Channel Masking that are controlled by an ATE [17]-[21].
We developed a test data compression scheme for scan-based BIST, aiming to compress test stimuli and responses by more than 100 times. The scheme combines four techniques: the invert-and-shift operation, run-length compression, scan address partitioning, and LFSR pre-shifting. Our scheme achieved a 100x compression rate in environments where Xs do not occur, without reducing the fault coverage of the original ATPG vectors. Furthermore, we enhanced the masking logic to reduce the amount of information for X-masking so that test data was still compressed to less than 1/100 in practical environments where Xs occur. We applied our scheme to five real VLSI chips.
This paper is organized as follows. In Sect. 2, we present the proposed compression scheme, followed in Sect. 3 by the results of using it in the real circuits. In Sect. 4, we describe the test response compression considering X-masking and show some experimental results. Section 5 concludes the paper.
Proposed Test Data Compression Scheme

Basic Scan-BIST Architecture
As an example of a basic scan-BIST architecture, the structure of BAST [17] is shown in Fig. 1. The deterministic test vectors generated by an ATPG are not always fully specified. The specified bits in the vectors are called care-bits. [...] operation also has a non-flipped care-bit, and that there are no other non-flipped care-bits. In this case, the second and third invert operations can be integrated into one group-invert operation, and the first and the fifth invert operations can each be translated into a group-invert operation, while the fourth one needs two operations in order to toggle the non-flipped care-bit. In the example, the test data is reduced from 100 bits to 60 bits.
LFSR Pre-Shifting
While a simple way to select an appropriate random pattern is to reseed the LFSR, this usually requires seed information of the same length as the LFSR, say, 31 bits. We instead introduce block-based LFSR pre-shifting. In this scheme, we determine the maximum number of shifts, r, and the maximum number of care-bits in one block, C. An ATPG vector is divided into groups, each of which has at most C care-bits. For each group, we determine the optimum number of pre-shifts, r'. The number of shifts is specified by an instruction code.

Table 3 Test data reduction using various techniques. Compression rates for the original BAST, invert-and-shift (INV), run-length compression (RL), address partitioning (AP), LFSR pre-shifting (PS), and combinations of them, relative to the ATPG test data, were calculated for 256 scan chains. X-masking is not considered.
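The block-based pre-shift selection described in this section can be illustrated with a minimal sketch. It is not the decoder used in the experiments: the feedback taps (the PRBS31 polynomial x^31 + x^28 + 1), the absence of a phase shifter, the single serial output, and the names `lfsr_step`, `block_pattern`, and `best_preshift` are all assumptions made for illustration. The sketch searches the pre-shift counts 0..r and keeps the count r' that leaves the fewest care-bits mismatched, since mismatched care-bits are the ones that would still have to be fixed by invert operations.

```python
from typing import Dict, List, Tuple

LFSR_WIDTH = 31
MASK = (1 << LFSR_WIDTH) - 1


def lfsr_step(state: int) -> int:
    """Advance a 31-bit Fibonacci LFSR by one shift.

    Assumed taps: x^31 + x^28 + 1 (PRBS31); the source does not state them.
    """
    feedback = ((state >> 30) ^ (state >> 27)) & 1
    return ((state << 1) | feedback) & MASK


def block_pattern(state: int, length: int) -> List[int]:
    """Return `length` pseudo-random bits, one per shift, starting at `state`."""
    bits = []
    for _ in range(length):
        bits.append(state & 1)
        state = lfsr_step(state)
    return bits


def best_preshift(state: int, care_bits: Dict[int, int],
                  length: int, max_shifts: int) -> Tuple[int, int]:
    """Pick the pre-shift count r' in [0, max_shifts] with fewest mismatches.

    care_bits maps a bit position inside the block to its required value.
    Returns (r', number of care-bits still mismatched after r' pre-shifts).
    """
    best_r, best_miss = 0, len(care_bits) + 1
    current = state
    for r in range(max_shifts + 1):
        bits = block_pattern(current, length)
        miss = sum(1 for pos, val in care_bits.items() if bits[pos] != val)
        if miss < best_miss:
            best_r, best_miss = r, miss
            if miss == 0:
                break  # a perfect match cannot be improved
        current = lfsr_step(current)
    return best_r, best_miss
```

In this sketch the cost of a block is one pre-shift count plus one correction per remaining mismatch, which is why limiting each block to at most C care-bits makes a mismatch-free pre-shift more likely to exist within r shifts.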
Test Stimuli Compression
First, we show the results of reducing the test data without X-masking. X-masking will be considered in the following section. Table 3 shows the results of using the original BAST, the invert-and-shift operation (INV), run-length compression (RL), address partitioning (AP), LFSR pre-shifting (PS), and combinations of these. In all circuits, the number of scan chains was set to 256, and a 31-bit LFSR with a phase shifter was applied as the PRPG. The compression rate, R, is derived as the ratio of the compressed test data volume to the test data volume of the original ATPG. Columns (a), (b), and (c) of Table 3 show the results for the original BAST, the invert-and-shift operation, and run-length compression. For these three schemes, each instruction code is 2 + log2(256) = 10 bits wide.
Thus, for the invert-and-shift operation, the instruction code mentioned in Sect. 2.2 is used. The compression rate for the original BAST ranged from 5.16% (19x) to 2.54% (39x). The addition of the invert-and-shift operation further decreased the test data by up to 81.4%. Furthermore, applying run-length compression drastically reduced the test data. For all circuits, the test data was compressed by more than 50x, and a compression rate of over 100x was achieved for Circuits 2, 4, and 5.

Figure 7 shows the details of the test data from Circuit 1 using (a) the original BAST, (b) the invert-and-shift operation, and (c) run-length-compressed invert-and-shift operations. The X-axis shows the ATPG vector number, and the Y-axis shows the test data corresponding to each vector. Most of the flipped bits (and thus also the care bits) appeared in the first 500 vectors. While the original BAST requires a constant number of shift operations for every ATPG vector, the invert-and-shift operation can reduce the number of shift operations by combining the shift and invert operations. The number of shift operations is further reduced significantly by applying run-length compression.

Column (d) of Table 3 shows the results of test data compression using address partitioning (AP) combined with the invert-and-shift operation and run-length compression. The instruction code set shown in Table 1 was applied using a 4-bit address specifier. By applying address partitioning, the bit width of one instruction code was reduced to 60%, that is, from 10 bits to 6 bits. While additional instruction codes were required for the cases where multiple care bits are in one high-address group, such cases were not frequent. As a result, applying address partitioning reduced the test data by up to 70% of the results without partitioning.
With the addition of the invert-and-shift operation and run-length compression, Circuits 2 and 5 had compressions of over 200x compared to the original ATPG, and Circuits 1, 2, 4, and 5 had compressions of over 100x.
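The arithmetic behind the instruction-code widths and the percentage-to-factor figures quoted in this section can be checked with a short sketch. The helper names `instruction_bits` and `rate_to_factor` are hypothetical, and reading the constant 2 in the 10-bit code as a 2-bit operation field is an assumption. Only the totals (10 bits, 6 bits, and the quoted rates) come from the text.

```python
def instruction_bits(opcode_bits: int, address_bits: int) -> int:
    """Width of one instruction code: operation field plus address field."""
    return opcode_bits + address_bits


def rate_to_factor(rate_percent: float) -> float:
    """Convert a compression rate in percent of ATPG data to an 'Nx' factor."""
    return 100.0 / rate_percent


# 256 scan chains need an 8-bit address; (256 - 1).bit_length() == 8.
full_width = instruction_bits(2, (256 - 1).bit_length())   # 10 bits
# With address partitioning, a 4-bit address specifier is used instead.
partitioned_width = instruction_bits(2, 4)                  # 6 bits
width_ratio = partitioned_width / full_width                # 0.6, i.e. 60%

# Rates quoted for the original BAST: 5.16% and 2.54% of the ATPG data.
bast_factors = [int(rate_to_factor(p)) for p in (5.16, 2.54)]  # [19, 39]
```

The 60% width ratio is a lower bound on the data ratio per instruction. The extra instruction codes needed when several care bits fall in one high-address group push the actual ratio above that bound.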
Column (e) of Table 3 shows the reduction of the test data using our scheme, that is, the combination of all four techniques including LFSR pre-shifting.
The maximum number of LFSR pre-shifts was set to 65535, and the maximum number of care-bits in a block, C, was set to 16. All circuits had compression rates greater than 100x.
The details of the test data for Circuits 3 and 5 are shown in Fig. 8. The data is plotted by the compression rate relative to D_ATPG. Similarly to the results shown in Fig. 7, the invert-and-shift operation reduced the test data by replacing some of the shift operations with invert-and-shift operations, and run-length coding reduced the number of shift operations. By using these two schemes, the test data was reduced to 44% of that of the original BAST in Circuit 3, and to 18% in Circuit 5. The test data was further reduced to 67% in Circuit 3 and 63% in Circuit 5 using address partitioning because the instruction code was shortened.

Fig. 9 Effects of run-length compression and address partitioning.

Throughout the ATPG responses, more than 70% of the scan chains output no Xs. The ratio of the scan chains producing Xs decreased as the number of scan chains increased.
For Circuit 5 with 1024 scan chains, only 1.37% of the scan chains output Xs.
Two-Phase X-Masking
Although the compression scheme mentioned in Sect. 3 guaranteed exactly the same fault coverage as the original ATPG and achieved a compression rate of more than 100x, X-masking was not considered. Figure 11 shows the test data reduction with and without using X-masking for Circuits 3, 4, and 5 when all four techniques were combined.
By applying scan address partitioning, the instruction code for X-masking is significantly increased because the mask [...]

Fig. 11 Test data reduction using X-masking for Circuits 3, 4, and 5 when the four techniques (INV, RL, AP, PS) were applied (1024 scan chains): (a) without X-masking, (b) with X-masking.

Fig. 12 Example of enhanced X-masking logic applying strict and broad X-masking.
For each clock cycle, the expected number of Xs is nx, and these Xs must occur in m scan chains. The curly-bracketed part represents the probability that at least one scan chain out of m outputs Xs, where the X-bit rate for these scan chains is (nx/m). Under the condition that masking occurs, the expected ratio of masked non-X bits to total non-X bits is calculated as (m-nx)/(n-nx).
Then, the product of these represents the expected ratio of masked non-X bits. Thus, the multiplicand of Fb in the numerator is the ratio of unmasked response bits to total non-X response bits. Table 6 shows the calculated compression rate and fault coverage. The point of changing from strict X-masking mode to broad X-masking mode is determined by the fault coverage of strict X-masking mode. There are trade-offs between the compression rate and fault coverage, which can be controlled using the transition point.
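The factors described above can be put together in a short numerical sketch. The full expression is not reproduced in the text here, so this is a reconstruction under stated assumptions rather than the original formula: the m scan chains are assumed to output Xs independently with rate nx/m, which gives a masking probability of 1 - (1 - nx/m)^m. The symbol n is taken to be the total number of scan chains, and the function names are hypothetical. The weighting by Fb is left out because its definition does not appear in this section.

```python
def masking_probability(n_x: float, m: int) -> float:
    """Probability that at least one of the m X-producing chains outputs an X.

    Assumes each of the m chains outputs an X independently with rate n_x / m.
    """
    return 1.0 - (1.0 - n_x / m) ** m


def masked_non_x_ratio(n: int, m: int, n_x: float) -> float:
    """Expected ratio of masked non-X bits to all non-X bits in one cycle.

    Product of the masking probability and the conditional ratio
    (m - n_x) / (n - n_x) stated in the text.
    """
    return masking_probability(n_x, m) * (m - n_x) / (n - n_x)


def unmasked_non_x_ratio(n: int, m: int, n_x: float) -> float:
    """Ratio of unmasked response bits to total non-X response bits."""
    return 1.0 - masked_non_x_ratio(n, m, n_x)


# Illustrative values only: 1024 chains, 14 of which can produce Xs
# (about 1.37% of 1024), and a hypothetical n_x of one X per cycle.
example = unmasked_non_x_ratio(n=1024, m=14, n_x=1.0)
```

With these illustrative numbers the sketch leaves more than 99% of the non-X response bits observable, which is in line with the small fault coverage loss reported for circuits in which few scan chains produce Xs.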
Circuit 3 has a much higher care-bit rate, a higher X-bit rate, and a larger number of scan chains with Xs than Circuits 4 and 5. Thus, our compression scheme is more effective for Circuits 4 and 5 than for Circuit 3. Nonetheless, it has a 0.99% (101x) compression rate with 99% fault coverage when the mode changes at 90% coverage and a 1.2% (83x) compression rate with 99.8% fault coverage when the mode changes at 99.9% coverage.

Table 6 Test data compression rates and fault coverage relative to ATPG. Test data is compressed by more than 100x when the decoder changes to broad X-masking mode at 90% for all circuits (1024 scan chains).
As mentioned above, Circuit 4 has more Xs than care-bits, and thus it has the lowest compression rate of the three circuits under the strict X-masking mode. The combination of broad X-masking mode with strict X-masking mode, however, effectively reduced the test data while keeping the same level of fault coverage as that of Circuit 3. Circuit 4 has a compression rate of over 200x with 99% fault coverage and a 181x compression rate with 99.5% fault coverage.
Circuit 5 had a very small number of scan chains that produced Xs. Therefore, the degradation of fault coverage caused by broad X-masking was small. Even when we set the decoder to broad X-masking mode from the beginning, the fault coverage was higher than 99.2%, while the test data was compressed to 0.32% (312x). A fault coverage higher than 99.99% was achieved at a compression rate of 0.41% when the decoder changed to broad X-masking mode at 99% coverage.
Conclusions
We developed a scheme for scan-based BIST that can compress test stimuli and responses. The scheme combines the invert-and-shift operation, run-length compression, scan address partitioning, and LFSR pre-shifting. It achieved a 100x compression rate in environments in which Xs do not occur, without reducing the fault coverage of the original ATPG. Experiments using five real VLSI chips showed that combining the four techniques enabled compression rates of 144x to 370x compared to an ATPG. Furthermore, we enhanced the masking logic to reduce the data for X-masking. Experiments considering Xs showed that test data was still compressed by 101x to 312x while achieving more than 99% of the fault coverage of the ATPG.
