Abstract: The proposed scheme, called the IOC-LP (input reduction and one block compression for low power test), compresses the test data of scan based SoCs to improve the compression ratio in the ATPG process. It does so by using the modified input reduction and novel techniques, a new scan flip-flop reordering for low power test, the newly proposed one block compression, and a novel reordering algorithm. Unlike previous approaches using the cyclic scan register architecture, the proposed scheme is able to compress original test data and to decompress the compressed test data without the cyclic scan register architecture. Therefore, the proposed method leads to a better compression ratio with lower hardware overhead and lower power consumption than previous works. Experimental results on ISCAS '89 and ITC '99 benchmark circuits validated the proposed method.
Introduction
As the complexity of very large scale integration (VLSI) circuits increases, it has become more important to test VLSI circuits completely. Today's large and complex VLSI circuits in SoC environments need an especially enormous amount of test data. When SoCs are tested, such test data are transferred to the circuit under test (CUT) from automatic test equipment (ATE). Since the channel width and the size of memory for the ATE are limited, the traditional ATE must be modified or more expensive ATE must be developed in order to test an SoC with enormous test data. In addition, if the original test data are reduced for the size of the ATE memory by eliminating useful test patterns, then the accuracy of testing will be diminished.
The test power can be as much as twice the power consumed in normal mode [1]. Excessive power consumption during testing can cause several problems. First, it increases the peak current and makes electromigration more likely, which affects the reliability of the system. In addition, excessive heat dissipation during testing can damage the CUT directly.
Therefore, we focused on the test data compression/ decompression scheme for a low power test in order to overcome these particular problems. In fact, the low power scan testing and the reduced test data volume are mutually conflicting goals. To alleviate this conflict, various approaches for resolving test problems have been researched over the last few years. These include the low power built-in self test (BIST) [2] [3] [4] [5] and static compaction of scan vectors for low power testing [6] . In a BIST scheme, the BIST can be applied directly to circuits only if circuits are BIST-ready.
In [6] , while a 2-3 times reduction in power consumption for ISCAS benchmark circuits is achieved, it does not lead to any appreciable reduction in test data volume. Moreover, this technique considers only scan-in power during scan testing.
Recently, a number of test data compression techniques have been proposed for reducing test data volume [7] [8] [9] [10] [11]. However, these techniques did not consider power consumption during testing. In addition, although all of them provide high compression ratios for most benchmark circuits, they still lead to high area overheads for the decompression architectures owing to the use of the differential test sequence set. In [12], test data compression for low power scan test is presented. However, this still has problems caused by the cyclic shift register (CSR) architecture [13] and the static compaction of test vectors. High area overhead means longer test application time, larger delay, higher chip cost and much higher test power consumption when testing a CUT. In SoC environments especially, a small area overhead for the decompression architecture matters even more, since a decompression architecture is necessary for each core. The methods using FDR, Golomb and VIHC do not strictly need the CSR architecture; instead, they can map the Xs to 0 in order to increase the compression ratio. However, the compression ratios of these techniques without the CSR architecture are significantly lower than those with it. For example, the difference in the compression ratio for the s38584 benchmark circuit using the VIHC code is about 20%.
Therefore, we focus on a test data compression/decompression scheme for low power testing without the CSR architecture in order to achieve high compression ratio and low area overhead for the decompression architecture. In this paper, a new hybrid test data compression/decompression scheme is proposed. Unlike previous approaches, the proposed approach uses the modified input reduction and a new scan cell reordering to compensate for the loss of the compression ratio which is caused by our approach when it lacks the CSR architecture and the differential test vector set. In addition, a new one block compression code with lower hardware overhead and faster application time and a novel test data reordering algorithm for low power test are proposed.
This paper has been organised as follows. The next section explains the power consumption model to estimate power dissipation during scan testing and introduces several definitions to easily present the proposed scheme. Section 3 presents a new test data compression/decompression scheme for a scan-based design using the IOC-LP (input reduction and one block compression for low power testing) code. Experimental results to demonstrate the efficiency of the proposed scheme are presented in Section 4. The conclusions are given in Section 5.
Preliminary details

Power consumption model
The source of power dissipation in CMOS devices is summarised by the following expression [14] 
where P denotes the total power, $V_{DD}$ is the supply voltage, and f is the frequency of operation. The expression $f \cdot N$ is the average number of times per second that the nodes switch. The first term in (1) corresponds to the power involved in charging and discharging circuit nodes. The second term in (1) represents the power dissipation owing to current flowing directly from the supply to ground during the period that the pull-up and pull-down networks of the CMOS gates are both conducting when the output switches. The third term is related to the static power dissipation owing to the leakage current $I_{leak}$. These three components are often referred to as dynamic power, short circuit power and leakage power, respectively. It has been shown [15] that during normal operation of well-designed CMOS circuits the dynamic power dissipation caused by the switching activity accounts for over 90% of total power dissipation. Thus, power optimisation techniques employed at different levels of abstraction target minimal switching activity. The model for power dissipation for a gate i in a logic circuit is simplified as
The capacitive load $C_i$ that the gate is driving can be extracted from the circuit. The switching activity $N_i$ depends on the input to the system, which during test mode consists of test vectors. Therefore, for low power scan based tests, reducing switching activity during scan shifting is one of the most significant factors. In order to find the test data compression scheme that best minimises the power consumed by switching activity during scan shifting, a means of comparing the power consumption of two vectors is needed. The most accurate results would be obtained by using a circuit simulator to find the actual number of circuit elements that switch when a vector is scanned in and out. However, using a simulator is expensive in terms of execution time, so a simple heuristic is required for comparing the power dissipated by two vectors.
Therefore, we use the extended transition metric (ETM) to estimate the power consumption owing to scan vectors. Previous transition metrics estimate only the scan-in power for given vectors, using the number of transitions in them together with their relative positions. However, since the scan-out power must also be considered during scan based testing, we propose the ETM.
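The idea of weighting transitions by position, for both scan-in and scan-out, can be illustrated with a short sketch. This is not the ETM expression of this paper: it assumes a position-weighted transition count in which the first bit of a vector enters (or leaves) the chain first, and the function names `weighted_transitions` and `estimate_scan_power` are hypothetical.

```python
def weighted_transitions(bits: str, scan_in: bool = True) -> int:
    """Position-weighted transition count for one fully specified vector.

    Assumption (not taken from the paper): bits[0] is shifted first, so a
    transition between bits[j] and bits[j + 1] is weighted by the number of
    shift cycles during which it travels through the scan chain.
    """
    k = len(bits)
    total = 0
    for j in range(k - 1):
        if bits[j] != bits[j + 1]:
            # Scan-in: early transitions travel further. Scan-out: the reverse.
            total += (k - 1 - j) if scan_in else (j + 1)
    return total


def estimate_scan_power(vector: str, response: str) -> int:
    """Sum of the scan-in estimate for the vector and the scan-out estimate
    for its response (a hypothetical stand-in for an extended metric)."""
    return weighted_transitions(vector, True) + weighted_transitions(response, False)
```

Under these assumptions, a vector whose only transition sits near the front of the chain costs more to scan in than one whose transition sits near the end, which is the property a heuristic of this kind relies on when two vectors are compared.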
Consider a scan chain of length k, a scan vector $P = \{p_1, p_2, \ldots, p_i\}$ and a scan response $R = \{r_1, r_2, \ldots, r_i\}$. The ETM to estimate the power consumption during scan based test is as follows
Definitions
To simplify the presentation of the proposed test data compression scheme for low power test, the following definitions are used. When a test set $T_D$ whose input size is N and length is L is given, let v(i, k) be a value of input i ($0 \le i \le N-1$) at sequence k ($0 \le k \le L-1$).
should not conflict with the value of any other compatible or inverse compatible inputs.
Definition 3. Scan input/output vector (SV): For a given test input set $T_D$, the scan input/output vector of input i, $SV_i$, is defined as the union of the test input sequence $SV_{ik(\mathrm{input})}$ and the output response for each test input sequence k, $SV_{ik(\mathrm{output})}$.
Definition 4. Scan distance (SD): For a given test input set $T_D$, the scan distance SD is defined as the distance between the scan input/output vectors $SV_i$ and $SV_j$. Therefore, the scan distance SD is calculated using the following equation.
Definition 5. Compression block: The compression block (CB) is defined as the 4-bit block which occurs most frequently in the test data.
Definition 6. Uncompression block: The test data which exclude the compression block are divided into 2-bit blocks. The uncompression blocks are defined as these 2-bit blocks. The uncompression block is denoted by UB.
IOC-LP scheme
For testing scan-based SoCs by an ATE, test vectors obtained from the ATPG are applied to the SoC. Therefore, unlike testing circuits in BIST environments, it is more significant to compress test vectors in the ATPG process. To improve the compression ratio in the ATPG process, the proposed scheme, called the IOC-LP (input reduction and one block compression for low power testing), uses the modified input reduction, the scan flip-flop reordering for low power test, the one block compression, and a novel mapping and reordering algorithm. DFT (design for testability) steps such as scan insertion and the ATPG process should be merged into the gate-level design process within the whole design flow in order to achieve maximal results for the proposed scheme. The overall algorithm of the IOC-LP scheme for scan based cores is shown in Fig. 1.
The modified input reduction
In order to reduce test sets for a BIST, input reduction was proposed in [16]. This paper uses a new input reduction algorithm, a modified version of the input reduction in [16], in order to achieve a high compression ratio. Input reduction identifies circuit inputs that can be combined with other test inputs without loss of fault coverage. The method is based on the concepts of compatibility and inverse compatibility (Definitions 1 and 2, respectively). Unlike the input reduction approach in [16], we first obtain test sets from deterministic ATPG or from a combination of random and deterministic ATPG, and we therefore developed a new input reduction algorithm for this case. The proposed algorithm is shown in Fig. 2. Note that don't care identification is required for a test set obtained from the combination of random and deterministic ATPG in order to reduce test inputs efficiently. In the first step of the don't care identification process, we perform fault simulation for a given test set $T_D$ to mark the essential faults of each test vector. Next, we drop the faults detectable by the current test vector from the entire fault list, and the combination of input values that excites and propagates these faults is identified by backtrace and forward implication, as in deterministic ATPG. Input values that are not fixed are reassigned to Xs. In the modified input reduction algorithm, the input check set C is first initialised so that every $C_i$ ($0 \le i \le N-1$) has the value UNIQUE, which means that input i is neither compatible nor inverse compatible with another input. Over the entire test sequence k ($0 \le k \le L-1$) of a given test set $T_D$, the compatible function checks the compatibility between the target input value v(i, k) and the comparison input value v(j, k) according to Definitions 1 and 2.
In this function, as stated in Definitions 1 and 2, if v(i, k) or v(j, k) is unspecified, the conflict check function, which is included in the compatible function, determines whether it conflicts with the value of any other compatible or inverse compatible input.
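A compatibility check with a conflict check on unspecified values can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of Fig. 2: each input is represented as a string over '0', '1' and 'X' with one character per test sequence, inputs are merged greedily in index order, and the names `merge` and `reduce_inputs` are hypothetical.

```python
FLIP = {"0": "1", "1": "0", "X": "X"}


def merge(group_col: str, col: str, invert: bool = False):
    """Merge `col` into `group_col` if they are (inverse) compatible.

    Returns the merged column, or None on a conflict. Keeping the merged
    column is what enforces the conflict check: an 'X' that has already been
    fixed by one member of the group cannot later take the opposite value.
    """
    merged = []
    for a, b in zip(group_col, col):
        if invert:
            b = FLIP[b]
        if a == "X":
            merged.append(b)
        elif b == "X" or a == b:
            merged.append(a)
        else:
            return None
    return "".join(merged)


def reduce_inputs(columns):
    """Greedy grouping; returns [merged_column, [(input_index, inverted), ...]]."""
    groups = []
    for i, col in enumerate(columns):
        for group in groups:
            for invert in (False, True):
                merged = merge(group[0], col, invert)
                if merged is not None:
                    group[0] = merged
                    group[1].append((i, invert))
                    break
            else:
                continue  # neither polarity fits this group, try the next one
            break  # input i has been merged
        else:
            groups.append([col, [(i, False)]])  # input stays UNIQUE
    return groups
```

For example, `reduce_inputs(["01X1", "0XX1", "10X0", "0011"])` places inputs 0, 1 and 2 (the last one inverted) in one group and leaves input 3 on its own, so four inputs are driven from two.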
The modified scan chain architecture is necessary to apply the reduced test set $T_{IR}$, which is generated by the modified input reduction algorithm, to the CUT. Since the reconfigured scan architecture should not influence scan-out mode in internal test and external test, it is used only in scan-in mode of internal test. Figure 3 shows an example of the scan chain architecture that applies the modified input reduction scheme to the c17 benchmark circuit. Using muxes, the reconfigured scan chains, shown in Fig. 3b, are used only in scan-in mode of the internal test sequence, while the general scan chains, shown in Fig. 3a, are used in the remaining modes of internal test. As shown in Fig. 3, since this architecture only adds muxes, fanout points, and inverters to the original scan chains to modify their connection, it is reconfigurable with little effort.
By using this modified input reduction, test data volume is reduced in proportion to the number of reduced inputs, and the switching activity in the scan chains during scan shifting is also reduced. Therefore, the power consumption during scan based testing is diminished, especially while test sequences are shifted into the scan chains.
Scan flip-flop reordering algorithm
The key idea of the proposed scan flip-flop reordering algorithm is to find the order which has the minimum scan distance between scan flip-flops. Thus, the switching activity during all the test sequences is significantly reduced using the proposed scan flip-flop reordering algorithm. Figure 4 presents the scan flip-flop reordering algorithm.
To reduce test power consumption using the proposed scan FF reordering algorithm, the scan input/output vector $SV_i$ ($0 \le i \le N-2$) defined in Section 2 is set first. Assuming that the test data $T_{IR}$ reduced by input reduction has M inputs, for the given original test data $T_D$ the $SV_{k(\mathrm{input})}$ of the (N − M) compatible or inverse compatible inputs is set to 'X' (don't care) over the entire series of test sequences, since it does not affect scan shifting in test mode. Therefore, the $SV_i$ of a compatible or inverse compatible input is obtained by combining the all-'X' $SV_{k(\mathrm{input})}$ with $SV_{k(\mathrm{output})}$. Next, the scan distance $SD_{ij}$ is calculated using the scan input/output vectors. Based on this scan distance, the order of scan flip-flops is determined and the 'X' values of $SV_i$ are justified to minimise the power consumption during scan shifting. Note that the result of the XOR operation for an 'X' value is taken as 0 when calculating the scan distance SD.
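The distance computation and the ordering step can be sketched as below. The scan distance is assumed here to be a bitwise XOR count in which any position holding an 'X' contributes 0, as the note above states; the greedy nearest-neighbour ordering is an assumption made for illustration and is not necessarily the procedure of Fig. 4. The names `scan_distance` and `greedy_order` are hypothetical.

```python
def scan_distance(sv_i: str, sv_j: str) -> int:
    """Count positions where both values are specified and differ.

    An 'X' on either side contributes 0 to the XOR sum.
    """
    return sum(
        1 for a, b in zip(sv_i, sv_j) if a != "X" and b != "X" and a != b
    )


def greedy_order(svs):
    """Order scan flip-flops so that neighbours have a small scan distance.

    Starts from flip-flop 0 and repeatedly appends the unused flip-flop that
    is closest to the last one placed (ties broken by lowest index).
    """
    order = [0]
    unused = set(range(1, len(svs)))
    while unused:
        last = order[-1]
        nxt = min(unused, key=lambda j: (scan_distance(svs[last], svs[j]), j))
        order.append(nxt)
        unused.remove(nxt)
    return order
```

With the four vectors `["0011", "1100", "0X11", "1000"]`, flip-flop 2 is placed directly after flip-flop 0 because their distance is 0 once the 'X' is ignored.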
The proposed compression code
In this paper, a new compression code, called the IOC-LP code, is proposed in order to efficiently compress test data for low power tests. The key idea is that the compression ratio is enhanced by increasing the occurrence frequency of one block. It can be achieved by appropriately filling specific values to numerous unspecified values ('X' values) in test sets. Compressing one block leads to lower hardware overhead for the decoder. In contrast, in the methods like the Huffman code, the hardware overhead for the decompression architecture is increased enormously as the number of the compressed blocks is increased.
The idea of the IOC-LP code is to give the 4-bit block CB, which occurs most frequently, a one-bit code, and to give the rest of the test set 3-bit codes, each consisting of the original 2-bit block and a prefix bit that identifies it as an uncompressed 2-bit block. In general, it is easy to increase the occurrence frequency of one 4-bit block by filling the unspecified bits of test cubes with appropriate values, since deterministic test patterns contain many 'X' values. This is illustrated in Fig. 5. A segment of the test patterns with unspecified values for s953, one of the ISCAS '89 benchmark circuits, is shown in Fig. 5a. We assume that the 4-bit block '1111' occurs most frequently in the entire test set for the s953 benchmark circuit. Accordingly, each 'X' in Fig. 5a is replaced with '1' in order to increase the occurrence frequency of the '1111' block. The result is shown in Fig. 5b. As this example shows, increasing the occurrence frequency of a specific 4-bit block is easy to implement and is an efficient way to improve the compression ratio for test data.
Furthermore, all test data except the compression block are divided into 2-bit uncompression blocks in order to make the most frequent 4-bit block occur even more frequently. Dividing the test data that excludes the CB into smaller blocks gives a much greater chance of increasing the occurrence frequency of the CB in the original test data. For this reason, the length of the UB is set to 2 bits. The goal of the IOC-LP code is to improve the compression ratio by increasing the occurrence frequency of the most frequent 4-bit block in $T_{IR}$.
The algorithm used to implement the proposed scheme has two procedures: (1) reordering the test set and (2) compressing the test set using the IOC-LP code. The details of each step are as follows.
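The filling step can be sketched as a left-to-right pass over a test cube. The sketch assumes that a 4-bit window is turned into the CB whenever none of its specified bits disagrees with the CB, that the pass otherwise advances by one 2-bit block, and that any leftover 'X' simply repeats the previous bit; the last rule is a hypothetical choice made here to avoid extra transitions, and `fill_cube` is a hypothetical name.

```python
def fill_cube(cube: str, cb: str = "1111") -> str:
    """Fill 'X' bits so that the compression block `cb` occurs often.

    The cube is walked with the same 4-bit / 2-bit blocking that the code
    uses: a window compatible with `cb` becomes `cb`, otherwise two bits are
    copied and the walk continues.
    """
    out = []
    pos = 0
    while pos < len(cube):
        window = cube[pos:pos + 4]
        if len(window) == 4 and all(b in ("X", c) for b, c in zip(window, cb)):
            out.extend(cb)
            pos += 4
        else:
            for b in cube[pos:pos + 2]:
                if b == "X":
                    # Assumption: repeat the previous bit for leftover X's.
                    b = out[-1] if out else cb[0]
                out.append(b)
            pos += 2
    return "".join(out)
```

Applied to the first 45-bit cube of Fig. 5a with CB '1111', this pass yields eighteen 1s, then '00', then twenty-five 1s, which agrees with the corresponding row of Fig. 5b.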
Reordering the test set T IRSR :
The sequence of the reduced test set $T_{IRSR}$, which has been mapped to specific values, is reordered so that the compression block occurs more frequently. Note that $T_{IRSR}$ denotes the test data after the scan FF reordering. In this procedure, the first test sequence $T_{IRSR1}$ is taken as the initial test sequence and is divided into 4-bit blocks and 2-bit blocks according to whether each 4-bit block is the CB or not. If a remaining block $rb_1$ ($rb_1 \le 3$ bits) is left over in $T_{IRSR1}$, and another sequence contains a filling block $fb_2$ that completes $rb_1$ into the compression block, the sequence containing $fb_2$ becomes the second test sequence $T_{IRSR2}$. Otherwise, the original second test sequence $T_{IRSR2}$ is used. By repeating these steps over the entire sequence k ($1 \le k \le L-1$), the occurrence frequency of the compression block in $T_{IRSR}$ increases greatly.
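The reordering step can be sketched as follows, assuming fully specified sequences given as bit strings and a first-match search for the filling block. The helper names `parse` and `reorder` are hypothetical, and details such as tie-breaking are assumptions rather than part of the described procedure.

```python
def parse(stream: str, cb: str):
    """Split a bit stream into CBs (4 bits) and UBs (2 bits).

    Returns (number of CBs, remaining block of fewer than 4 bits).
    """
    pos = count = 0
    while len(stream) - pos >= 4:
        if stream[pos:pos + 4] == cb:
            count += 1
            pos += 4
        else:
            pos += 2
    return count, stream[pos:]


def reorder(seqs, cb: str = "1111"):
    """Pick as the next sequence one whose leading bits (the filling block)
    complete the current remaining block into the CB; otherwise keep the
    original order."""
    remaining = list(seqs)
    ordered = [remaining.pop(0)]
    _, rb = parse(ordered[0], cb)
    while remaining:
        pick = 0  # default: next sequence in the original order
        if rb and cb.startswith(rb):
            fb = cb[len(rb):]  # bits needed to complete the CB
            for idx, seq in enumerate(remaining):
                if seq.startswith(fb):
                    pick = idx
                    break
        nxt = remaining.pop(pick)
        ordered.append(nxt)
        _, rb = parse(rb + nxt, cb)  # carry the remaining block forward
    return ordered
```

For the three sequences `["111111", "001111", "110011"]` with CB '1111', the original order contains two compression blocks when parsed as one stream, whereas the reordered stream `["111111", "110011", "001111"]` contains three.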
Compressing the test set using the IOC-LP code:
The reduced and reordered test set $T_{IRSR}$ is compressed by the IOC-LP code. In this procedure, the compression block {CB}, which occurs most frequently, receives a one-bit code, and the uncompression blocks {$UB_1$, $UB_2$, $UB_3$, $UB_4$} receive 3-bit codes, each consisting of the original 2-bit block and a prefix bit that identifies it as an uncompressed 2-bit block. Let $N = \{n_{CB}, n_{UB_1}, n_{UB_2}, n_{UB_3}, n_{UB_4}\}$ be the numbers of occurrences of each block in the test set $T_{IRSR}$. The total size of the compressed test data is $n_{CB} + \sum_{i=1}^{4} 3 \cdot n_{UB_i}$ bits.
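A minimal encoder for a code of this shape is sketched below. The actual bit values of the codewords are not given in the text, so the sketch assumes that the CB is coded as '0' and that every uncompression block is coded as the prefix '1' followed by its two original bits; `ioc_lp_encode` is a hypothetical name and the input is assumed to be a fully specified bit string of even length.

```python
def ioc_lp_encode(bits: str, cb: str = "1111") -> str:
    """Encode a bit string with a one-block code.

    Assumed codewords: CB -> '0', any 2-bit UB -> '1' + the two bits.
    """
    if len(bits) % 2:
        raise ValueError("expected an even number of bits")
    out = []
    pos = 0
    while pos < len(bits):
        if bits[pos:pos + 4] == cb:
            out.append("0")  # one-bit code for the compression block
            pos += 4
        else:
            out.append("1" + bits[pos:pos + 2])  # prefix + original 2 bits
            pos += 2
    return "".join(out)
```

For instance, the 14-bit string `11110011111110` with CB '1111' contains two compression blocks and three uncompression blocks, so its code has 2 + 3 · 3 = 11 bits, in line with the size expression above.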
The decompression architecture
Once the IOC-LP code has been chosen, an FSM decoder for the IOC-LP code is synthesised. The decoder has two inputs: the ATE clock and the input that receives the compressed test data from the ATE channel. The outputs of the FSM decoder consist of a data output, which transfers the original test sets to the scan chains in the CUT, and three control outputs: Parallel, Serial, and Wait. These signals control the buffering and the loading of data into the controller when the data has been decoded and the ATE has been synchronised. An example of the state diagram for the FSM decoder of the proposed code is shown in Fig. 6a. The controller that transfers the decoded test data into the scan chains in the CUT and controls the signals between the ATE and the FSM decoder is illustrated in Fig. 6b. Note that the clock signal Clk in Fig. 6b is the same as the test clock. The controller for the proposed compression method consists of the serialiser to shift the decoded data into scan chains, which is synchronous with the test clock, the scan clock, and the part to match the clock for the FSM decoder to the test clock. When the sync output of the controller is '1', the FSM does not decode; instead, the controller transfers data into the scan chains, since the FSM clock is stable. If the sync signal is '0', the FSM decoder decodes the compressed test data while the controller performs the shifting process.
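The decoding behaviour can be modelled in software as a two-state machine that consumes one code bit per step. This is only a behavioural sketch: it assumes the codewords '0' for the CB and '1' plus two bits for a UB, it does not model the Parallel, Serial and Wait signals or the clock synchronisation, and the state names and `ioc_lp_decode` are hypothetical rather than taken from Fig. 6a.

```python
def ioc_lp_decode(code: str, cb: str = "1111") -> str:
    """Bit-serial decoder for the assumed codewords.

    IDLE: a '0' emits the whole CB, a '1' announces a 2-bit UB.
    UB:   the next two bits are passed through unchanged.
    """
    out = []
    state = "IDLE"
    buf = ""
    for bit in code:
        if state == "IDLE":
            if bit == "0":
                out.append(cb)
            else:
                state, buf = "UB", ""
        else:
            buf += bit
            if len(buf) == 2:
                out.append(buf)
                state = "IDLE"
    if state != "IDLE":
        raise ValueError("code ends in the middle of a codeword")
    return "".join(out)
```

Decoding `01000111110` with CB '1111' returns the 14-bit string `11110011111110`. The small number of states in such a model is consistent with the low decoder overhead that compressing a single block is meant to give.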
Experimental results
To demonstrate the efficiency of the proposed method, the proposed compression/decompression scheme is used to compress test sets for ISCAS '89 and ITC '99 benchmark circuits. The proposed IOC-LP scheme is implemented in C. The test set for each circuit is generated by MinTest [17] so that the results can be compared with previous works: SC [6], FDR [7], Golomb [8], VIHC [10], and SHC (selective Huffman coding) [11]. Because these methods show different compression ratios according to the group size, the group size for which [10] reports the best results is used. The test set obtained by applying the modified input reduction scheme, $T_{IR}$, is used in the proposed method, and the differential test sequence set of the original test vector set $T_D$ is used for the methods using FDR [7], Golomb [8] and VIHC [10].

Fig. 5 Part of deterministic test patterns for the full scan version of s953 benchmark circuit
a Test pattern filled with unspecified values
1111XXXXXXX1X1XX1100X1XXXXXXXXXXXXXXXXXXXXXXX
11XXXXXXXXXXXXXX001110XXXXXXXXXXXXXXXXXXXXXXX
b Test pattern filled with appropriate specific values
111111111111111111001111111111111111111111111
111111111111111100111011111111111111111111111
The results of the compression are presented in Table 1. As shown in Table 1, test data for ISCAS '89 and ITC '99 benchmark circuits are more highly compressed by the proposed scheme. The compression ratio of the IOC-LP scheme is better than that of other approaches for most circuits because redundant test inputs are removed by the modified input reduction before compression. However, the results for s13207 do not show the best compression ratio owing to the scan flip-flop reordering for low power test. As mentioned before, since test data compression and low power testing are conflicting goals, a tradeoff between the two must be accepted. Table 2 presents the reduction ratio of the average power consumption during the scan-in and scan-out operations. These results show that the proposed test data compression scheme can also lead to significant savings in power consumption. The average power is estimated using the ETM described in Section 2. Let $P_{TD}$ be the average power with the compacted test sets obtained using MinTest [17]. Similarly, let $P_{IOC\text{-}LP}$ be the average power consumption when the IOC-LP coding is used, with the scan flip-flops reordered and the 'X' values in $T_{IRSR}$ mapped for low power test. Note that the differential test sequence set of $T_D$ is assumed to contain the same test vectors as $T_D$, because the differential test sequence set is decompressed to the original test vector set $T_D$ in test mode. Therefore, $T_D$ is used for the comparison of the power reduction in this paper. Table 2 shows the reduction of the average power consumption for MinTest sets with $T_D$ when the IOC-LP technique is used. The percentage reduction in power is computed as follows

$$\text{power reduction ratio} = \frac{P_{TD} - P_{IOC\text{-}LP}}{P_{TD}} \times 100 \qquad (6)$$

Table 2 shows that the average power is significantly lower if the IOC-LP is used for test data compression and decompression.
In addition, since the CSR architecture is used in the method using FDR [7] , Golomb [8] and VIHC [10] , more power consumption during testing is actually required. Therefore, experimental results demonstrate that the significant reduction in power consumption during scan testing, as well as the substantial reduction in test data volume, was accomplished by using the IOC-LP scheme. Next, the area overheads of the decompression architectures for the different compression methods are compared in Table 3 . The ISCAS '89 and ITC '99 benchmark circuits are synthesised with a single scan chain and the DFTadvisor of the Mentor Graphics [18] inserts a single scan chain in the benchmark circuit from the class library. The area overhead of the decompression architecture is computed as follows
The areas for decompressors and benchmark circuits are computed by the Synopsys Design compiler with the class library. Note that the decompression for each compression scheme is configured by the parameter, the group size, shown in Table 1 . The second column in Table 3 contains the area of benchmark circuits without the decoder. As shown in Table 3 , the decompression architecture of the IOC-LP has the lowest area overhead. Furthermore, for Golomb, FDR, and VIHC, the area overheads in Table 3 exclude the area overhead for the CSR architecture. Inevitably, as mentioned before, the CSR architecture requires a high area overhead in order to reduce test data efficiently using Golomb, FDR and VIHC. For example, there are 1464 test inputs, including primary inputs and scan inputs, in s38584 of ISCAS '89 benchmark circuits. Assuming in the worst case that primary inputs and scan inputs are constructed to one scan chain, 1464 flip-flops and an XOR gate would be necessary for the CSR architecture. Although it assumes the worst case, the fact that high area overhead for the CSR architecture is required cannot be ignored. Of course, to resolve this problem, the method using the UDL or the boundary scan around another core was proposed in [13] . However, the boundary scan around another core cannot be used in all cases and the control logic to reconfigure the boundary scan to the CSR architecture is also required. Although this alleviates the problem, the area overhead to use the UDL is still high, especially in an SoC. Therefore, the proposed compression/ decompression scheme for low power testing is an effective solution for the test data compression/decompression scheme for SoCs.
Finally, it is necessary to compare the total test application time (TAT) reduction that can be achieved using the different compression techniques. Table 4 compares the lower bounds on $f_{sys}/f_T$ required to obtain the maximum TAT reduction. Here $f_{sys}$ is the system clock frequency and $f_T$ is the tester clock frequency, so $f_{sys}/f_T$ defines how fast the system clock is relative to the ATE clock. The same concept as in [11] is used in these experiments in order to compare the TAT. For the IOC-LP code, the block size b determines the lower bound on $f_{sys}/f_T$. When the decoder receives a complete codeword, it needs to output the corresponding block of b bits into the serialiser in the controller of the IOC-LP decompression architecture. Therefore, the lower bound is given by $f_{sys}/f_T \ge b/L_{min}$, where $L_{min}$ is the size of the smallest codeword. Since the maximum TAT reduction is harder to achieve when the lower bound on $f_{sys}/f_T$ is higher, a lower value of this bound is advantageous.
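As a worked instance of this bound, assume that b is taken as the 4-bit compression block and that the smallest codeword is the one-bit code of the CB, so that $L_{min} = 1$. These values follow from the definition of the code but are an assumption about how the bound is instantiated; the entries of Table 4 are not reproduced here.

$$\frac{f_{sys}}{f_T} \ge \frac{b}{L_{min}} = \frac{4}{1} = 4$$

Under this assumption the system clock would need to run at least four times faster than the ATE clock for the decoder to emit a full compression block in the time the ATE delivers its one-bit codeword.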
Conclusion
In this paper, we have proposed a new test data compression method for low power testing using input reduction. Unlike previous studies using differential test sequences, our approach achieves a simple decompression architecture with low hardware overhead. By using the modified input reduction scheme, we obtain effectively compressed test data without any loss of test data information, together with a reduction of the power consumption during scan testing. In addition, reordering scan flip-flops is proposed to reduce the power consumption during the scan-out operation as well as the scan-in operation. The reduced test data $T_{IRSR}$ is then compressed again by the proposed compression code. The decompression architecture for the proposed code is simple and small since it has a simple decoding process. Therefore, the area overhead for our approach is much lower than that of previous approaches. Even though many steps are involved in preparing the test set for compression, they are almost unavoidable if both test data compression and low power consumption are to be achieved. In addition, the proposed technique is easily integrated into the DFT process of the overall design flow, provided that DFT techniques such as scan flip-flops can be used. Therefore, as shown in the experimental results, the proposed approach is an attractive and effective solution for test data compression/decompression for low power testing.
Although the proposed IOC-LP technique may lead to additional routing overhead owing to the modified input reduction scheme and the scan reordering, this routing overhead can be made negligible because it can be optimised during the place and route process. Therefore, this method can be applied to all practical circuits without considering routing overheads. In addition, we will develop a more advanced version of the IOC-LP that takes the routing overhead into account in the future.
Acknowledgment
This work was supported by 'System IC 2010' project of Korea Ministry of Science and Technology and Ministry of Commerce, Industry and Energy.
