Abstract-We present a test resource partitioning (TRP) technique that simultaneously reduces test data volume, test application time, and scan power. The proposed approach is based on the use of alternating run-length codes for test data compression. We present a formal analysis of the amount of data compression obtained using alternating run-length codes. We show that a careful mapping of the don't-cares in precomputed test sets to 1's and 0's leads to significant savings in peak and average power, without requiring either a slower scan clock or blocking logic in the scan cells. We present a rigorous analysis to show that the proposed TRP technique reduces testing time compared to a conventional scan-based scheme. We also improve upon prior work on run-length coding by showing that test sets that minimize switching activity during scan shifting can be more efficiently compressed using alternating run-length codes. Experimental results for the larger ISCAS89 benchmarks and an IBM production circuit show that reductions in test data volume and test application time, together with low-power scan testing, can indeed be achieved in all cases.
I. INTRODUCTION
Intellectual property (IP) cores are now commonly used in large system-on-a-chip (SOC) designs. Although IP cores help reduce design cycle time, they pose several difficult test challenges. The precomputed test patterns provided by the core vendors must be applied to each core without exceeding the power constraints of the SOC. The system integrator is confronted with the problems of test data volume, test application time, and power consumption during test. New techniques based on test resource partitioning (TRP) that reduce test data volume, testing time, and power during testing are, therefore, necessary to facilitate plug-and-play SOC test automation.
It is well known that power consumption in test mode is considerably higher than during normal mode [1] . Therefore, special care must be taken to ensure that the power rating of the SOC is not exceeded during test application. Test data volume and test application time are two additional problems faced in SOC test integration. As SOCs grow in size and complexity, the volume of test data and the test application time are also increasing rapidly. Current techniques do not provide a unified solution to the three problems of test data volume, test application time, and test power. These techniques typically provide point solutions that target at most two out of the above three objectives. For example, a number of techniques target only test data volume and test application time.
Structural methods for reducing test data volume and testing time typically require design modifications. For example, the Illinois scan architecture (ILS) presented in [2] and [3] offers an alternative to conventional scan design. In the ILS architecture, a single scan-in pin is used to simultaneously feed the multiple scan chains of cores during broadcast mode. However, a drawback of the above method is that it can lead to higher power dissipation in test mode; the ILS architecture does not address the problem of reducing power consumption during scan testing. In a different approach, which can be described as an algorithmic strategy, the precomputed test set TD provided by the core vendor is compressed (encoded) to a much smaller test set TE and stored in automatic test equipment (ATE) memory. An on-chip decoder is used for pattern decompression to generate TD from TE during pattern application [4]-[8]. These techniques are typically based on statistical codes, run-length codes, and their variants, e.g., Golomb and frequency-directed run-length (FDR) codes. These codes can also form the basis for TRP [4]. A particularly attractive feature of TRP based on compression methods is that it does not require any redesign of IP cores. Test data volume reduction techniques based on on-chip pattern decompression are also presented in [9]-[13]. The compression scheme presented in [11] utilizes a linear mapping network to drive a large number of internal scan chains through a small number of external pins. The RESPIN method proposed in [9] uses the scan chains of one embedded core to decode test patterns for another core or interconnection. The technique based on geometric shapes can be used for compressing test vectors if an embedded processor is available for pattern decompression [10].
The method presented in [12] achieves compression by filling don't-care bits in the test vectors such that these bits are not stored on ATE and are not transferred to the chip if decompression is done on-chip. Another method based on tester-based stimulus and response compression (OPMISR) has been shown to be very effective for testing large scan-based circuits with limited input-outputs (I/Os) [13] .
Another way to reduce test data volume and testing time is to use built-in self-test (BIST) [14] . However, BIST can only be applied to SOCs if the IP cores in them are BIST-ready. BIST also imposes an area penalty and it often requires additional design changes. Since most currently available IP cores are not BIST-ready, the incorporation of BIST in them requires considerable redesign.
A number of techniques to control power consumption in test mode have also been presented in the literature. These can be broadly classified as a) structural, b) algorithmic, and c) tester based.
1) Structural methods: These methods, which do not address test data volume or testing time, are based on the following design techniques.
• Gated scan chains: These refer to schemes that use gating techniques to clock portions of the scan chain during scan operation [15] - [17] . In [15] , shift registers are used to gate portions of the scan chain during shifting while counters are used for gating in [16] . A decoder/multiplexer-based architecture for gating scan chains has also been proposed in [17] .
• Modified test pattern generator (TPG): Test generation circuits can be tailored to yield low-power vectors without significantly affecting the fault coverage and testing time [18] , [19] . The method presented in [18] is based on gated clock scheme for the TPG. The TPG is divided into two groups of flip-flops and each group is activated by a clock running at half the speed of the normal clock. Another TPG based on cellular automata is presented in [19] .
• Modified scan latch and vector inhibition: Scan power can also be reduced by modifying the scan cell and adding gating logic to mask the scan path activity during shifting [20]. This approach coupled with random pattern suppression provides significant power savings during BIST [21]. The vector inhibiting technique presented in [21] provides a hardware solution to the power minimization problem and is shown to significantly decrease power consumption during BIST sessions. The method decreases the switching activity in the internal nodes of the circuit under test during scan-in and scan-out by holding the output of the scan cell to a fixed value.
0278-0070/03$17.00 © 2003 IEEE
• Scan chain organization: The switching activity in the scan chain can be reduced by shortening and reorganizing the scan chains. The scan array solution presented in [22] reduces power dissipation by using two-dimensional scan arrays, which reduce switching activity and allow the use of a slower scan clock.
2) Algorithmic methods: These include automatic test pattern generation (ATPG) under power constraints, techniques based on test data compression, and test scheduling algorithms.
• ATPG techniques: ATPG techniques for generating vectors that lead to low power testing are described in [23] and [26] . However, while these techniques provide reduction in power consumption, they do not lead to any appreciable decrease in test data volume.
• Test data compression: Test generation for low-power scan testing usually leads to an increase in the number of test vectors [23] . On the other hand, static compaction of scan vectors causes significant increase in power consumption during testing [26] . While compacted vectors are useless if they exceed power constraints, uncompacted vectors cannot be used as they require excessive tester memory.
Power minimization based on test data compression was first presented in [34].
• Test scheduling: Test scheduling techniques for system integration attempt to reduce testing time by applying scan/BIST vectors to several cores simultaneously [27]-[32]. Test scheduling is typically carried out under power constraints since multiple cores are tested in parallel.
3) Tester frequency: Reduction in power dissipation can be achieved by running the tester at a slower frequency. Although this method offers the simplest way to reduce power consumption, it leads to unacceptable testing times and is, therefore, impractical.
We note that structural methods for reducing test power in SOCs require modification to the embedded cores, e.g., via scan latch reordering (SLR) [24], scan chain redesign, and scan cell redesign. This is usually not feasible for IP cores. ATPG techniques are also infeasible for IP cores since they require gate-level structural models [25]. Moreover, ATPG techniques that address test power do not directly consider test data volume and testing time issues. We, therefore, focus on test data compression for reducing test power, test data volume, and testing time simultaneously.
It was shown in [34] that test data volume and test power can be reduced simultaneously using Golomb coding. The key idea is to map the don't-cares in the test vectors to zero. This results in long runs of zeros that can be efficiently compressed using the Golomb code [6]. The resulting fully specified test set also reduces switching activity during scan shifting. Hence, a significant reduction in scan power is accompanied by test data compression. However, the switching activity can be reduced further if a more efficient procedure is used to map the don't-cares to binary values. In particular, further savings in power can be achieved if we map the don't-cares to derive test sets that minimize switching activity. Unfortunately, these test sets are not amenable to compression by conventional run-length codes, which encode only runs of 0's.
A recent test data compression approach offers an elegant solution to the problem of simultaneous reduction of test data volume and scan power [35] . The key idea in this work is to use a minimum transition count (MTC) mapping of don't-cares in the test set and a variant of Golomb coding that can effectively handle runs of both 0's and 1's.
In this paper, we present a new class of codes, called alternating run-length codes, for test data compression. These codes are particularly effective for compressing test sets that lead to minimum switching activity. They also reduce testing time due to the reduction in the amount of test data that needs to be transported from the tester to the SOC. We, therefore, demonstrate that we can reduce test data volume, test application time, and test power simultaneously by first using MTC mapping and then using alternating run-length codes for compressing the resulting runs of 0's and 1's. Compared to [35], we achieve the same reduction in scan power but significantly greater test data compression.
The organization of the paper is as follows. In Section II, we first review FDR coding. We then present the alternating run-length code, describe the data compression procedure and the decompression architecture, describe the power estimation model, and highlight the key differences from [34] . In Section III, we present a rigorous testing time analysis for TRP based on alternating run-length codes. In Section IV, we present experimental results for the large ISCAS89 benchmark circuits as well as for a real-life microprocessor circuit from IBM. Finally, in order to explain the experimental results, we present in Section V a formal analysis of the alternating run-length code for a memoryless binary data source and for deterministic sequences.
II. ALTERNATING RUN-LENGTH CODE
In this section, we present the alternating run-length code, and describe the compression procedure and decompression architecture. We also present the power estimation model used to estimate the power dissipation during scan testing.
We first review FDR coding and its application to test data compression [33] . The FDR code is a data compression code that maps variable-length runs of 0's to variable-length codewords. The encoding procedure is illustrated in Fig. 1 . As an example, consider a run of six 0's (0000001) in the input stream. This run belongs to group A3 and it is mapped to the codeword 110000. The reader is referred to [33] for a detailed discussion and motivation for the FDR code.
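The FDR mapping reviewed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the group structure (group A_k covering run-lengths 2^k - 2 through 2^(k+1) - 3, with a k-bit prefix of k - 1 ones followed by a zero and a k-bit tail) is the standard FDR definition as we understand it from [33], since Fig. 1 is not reproduced here. The treatment of a trailing run with no terminating 1 is a simplification made for this sketch.

```python
def fdr_codeword(run_length: int) -> str:
    """FDR codeword for a run of `run_length` 0's terminated by a 1."""
    if run_length < 0:
        raise ValueError("run length must be non-negative")
    # Find the group A_k, which covers run lengths 2**k - 2 .. 2**(k+1) - 3.
    k = 1
    while run_length > 2 ** (k + 1) - 3:
        k += 1
    prefix = "1" * (k - 1) + "0"          # k-bit group prefix
    offset = run_length - (2 ** k - 2)    # position of the run inside its group
    tail = format(offset, f"0{k}b")       # k-bit tail
    return prefix + tail


def fdr_encode(bits: str) -> str:
    """Encode a 0/1 string as consecutive FDR codewords."""
    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        elif b == "1":
            out.append(fdr_codeword(run))  # the 1 terminates the current run
            run = 0
        else:
            raise ValueError("input must contain only 0 and 1")
    if run:
        # Simplification for this sketch: a trailing run without a
        # terminating 1 is encoded as if the terminator were present.
        out.append(fdr_codeword(run))
    return "".join(out)
```

For the run of six 0's used in the text, `fdr_codeword(6)` yields the group-A3 codeword `110000`.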
It was shown in [33] that the FDR code is very efficient for compressing data that has few 1's and long runs of 0's. However, for data streams that are composed of both runs of 0's and runs of 1's, the FDR code is rather inefficient. In fact, in our initial experiments, the encoded test sets obtained for such data were larger than the uncompressed test sets. This provides the motivation to develop a code that can efficiently compress both runs of 0's and runs of 1's.
The alternating run-length code is also a variable-to-variable-length code and consists of two parts: a group prefix and a tail. The prefix identifies the group in which the run-length lies and the tail identifies the member within the group. An additional parameter associated with this code is the alternating binary variable a. The encoding produced by the alternating run-length code for a given run-length depends on the value of a. If a = 0, the run-length is treated as a run of 0's. On the other hand, if a = 1, the run-length is treated as a run of 1's. Note that the values of a for the different runs are not added to the encoded data stream. Fig. 3 shows the encoded data obtained using the two codes for a data stream composed of interleaved runs of 0's and 1's. We observe that the size of the FDR-encoded data set (22 bits) is larger than the size of the input data set (18 bits); hence, the FDR code provides no compression for this case. On the other hand, the size of the alternating run-length-encoded data set (14 bits) is smaller than the size of the input data set. Therefore, we are able to achieve compression with the new code. We also note that a = 0 is used for compressing a run of 0's, a = 1 is used for compressing the following run of 1's, and a = 0 is then used for compressing the next run of 0's. Hence, a is inverted after each run is encoded and it keeps alternating between 0 and 1 thereafter.
In this paper, we assume a default initial value of a = 0, i.e., we assume that the input data stream starts with a run of 0's.
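The role of the alternating variable a can be illustrated with a short Python sketch. It deliberately leaves out the codeword table of Fig. 2, which is not reproduced in this text, and only shows why the values of a need not be stored: with a fixed initial value a = 0 and a toggle after every run, the sequence of run-lengths alone determines the bit stream. The function names are hypothetical, and rejecting streams that start with a 1 is an assumption of this sketch.

```python
from itertools import groupby


def split_runs(bits: str) -> list:
    """Lengths of the alternating runs in a stream that starts with 0's."""
    if bits and bits[0] != "0":
        # Assumption of this sketch: the stream starts with a run of 0's,
        # matching the default initial value a = 0.
        raise ValueError("stream must start with a run of 0's")
    return [len(list(group)) for _, group in groupby(bits)]


def expand_runs(lengths: list) -> str:
    """Rebuild the stream from run lengths by toggling a after each run."""
    a = 0                       # default initial value of a
    out = []
    for n in lengths:
        out.append(str(a) * n)  # a = 0 -> run of 0's, a = 1 -> run of 1's
        a ^= 1                  # a is inverted after each run
    return "".join(out)
```

For example, `split_runs("011111000001")` gives the run-lengths 1, 5, 5, 1, and `expand_runs` recovers the original stream from them without any stored value of a.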
A. Decompression Architecture
An on-chip decoder decompresses the encoded test set TE and produces TD. Even though TD contains more patterns than test sets obtained after static compaction of ATPG vectors, the testing time is reduced since pattern decompression can be carried out on-chip at higher clock frequencies. Let $r_{max}$ be the longest run of 0's in TD and let $k = \lceil \log_2 r_{max} \rceil$. As discussed in [33], the FDR decoder can be efficiently implemented by a k-bit counter, a $\log_2 k$-bit counter, and a finite-state machine (FSM), and it is independent of the precomputed test set and the circuit under test. The decoder for the alternating run-length code can be implemented by making a small modification to the FDR decoder. The block diagram of the alternating run-length decoder is shown in Fig. 4. An additional toggle flip-flop and an exclusive-OR gate are required to switch between a = 0 and a = 1. For any circuit whose test set is compressed using the alternating run-length code, the given logic is the only additional hardware required other than the two small counters.
The signal bit_in is the input to the FSM and an enable (en) signal is used to input encoded data when the decoder is ready. The FSM output counter_in is used to shift in the prefix or the tail into the k-bit counter and the signals shift, dec1, and rs1 are used to shift the data in, to decrement, and to indicate the reset state of the counter, respectively.
The second counter of $\log_2 k$ bits is used to count the length of the prefix and the tail so as to identify the group. The signals inc and dec2 are used to increment and decrement the counter, respectively, and rs2 indicates that the counter has finished counting. Finally, the signal out is the decoder output and v indicates when the output is valid. The operation of the decoder is as follows.
• The FSM feeds the k-bit counter with the prefix. The end of the prefix is identified by the separator zero. The en, shift, and inc signals are high till the 0 is received.
• The FSM outputs 0's, decrements the k-bit counter, and makes the signal dec 1 high. It continues to output 0's until rs 1 goes high. The signal v is used to indicate a valid output.
• The tail part is shifted in until the $\log_2 k$-bit counter resets to zero. The dec2 signal then goes high, the counter is decremented, and the signal rs2 indicates when it is in the zero state.
• The FSM outputs 0's corresponding to the tail followed by a zero at the end of tail decoding.
The state diagram for the FSM used for pattern decompression is shown in Fig. 5. We note that the state diagram consists of only nine states. We synthesized the FSM using Synopsys Design Compiler. The synthesized circuit contains only four flip-flops and 38 gates. Therefore, the additional hardware needed for the decoder is very small, and existing counters on the SOC can be reused for decompression. The decoder area is, therefore, comparable to the area of the decoder in [35].
Since the decoder needs to communicate with the tester, and both the codewords and the decompressed data can be of variable length, proper synchronization must be ensured through careful design. In particular, the decoder must communicate with the tester to signal the end of a block of variable-length decompressed data. These and other related decompression issues are discussed in detail in [33] . 
B. Power Estimation
We now examine the impact of test set encoding on power consumption during scan testing. We then show how power consumption can be minimized by appropriately assigning binary values to the don't-care bits in T D and then applying the alternating run-length code for test data compression.
We use the weighted transitions metric (WTM) introduced in [26] to estimate the power consumption due to scan vectors. The WTM models the fact that the scan power for a given vector depends not only on the number of transitions in it but also on their relative positions. For example, consider a scan vector v1v2v3v4v5 = 01000, where v1 is first loaded into the scan chain. The 0-to-1 transition between v1 and v2 causes more switching activity in the scan chain than the 1-to-0 transition between v2 and v3, because it enters the chain earlier and is shifted through more scan cells. We use the same model to estimate the power consumption during the scan-in and scan-out operations.
The weighted transitions count metric is also strongly correlated to the switching activity in the internal nodes of the core under test during the scan in operation. It was shown experimentally in [26] that scan vectors with higher weighted transition metric dissipate more power in the core under test. In addition, experimental results for an industrial ASIC [17] confirm that the transition power in the scan chains is the dominant contributor to test power.
Consider a scan chain of length l and a scan vector tj = t ...
If the peak power exceeds a threshold value, it can cause structural damage to the silicon or to the package. Likewise, elevated average power can also cause structural damage to the silicon, bonding wires, or the package. It also adds to the thermal load that must be transported away from the device under test.
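The weighted transitions metric can be made concrete with a short Python sketch. It assumes the form of the metric commonly attributed to [26]: for a vector of length l whose first bit is scanned in first, a transition between positions i and i + 1 (1-indexed) contributes a weight of l - i. It further assumes that average power is estimated as the mean WTM over the test set and peak power as the maximum WTM. Both are assumptions of this sketch, since the defining equations are not legible in this text, and the function names are hypothetical.

```python
def wtm(vector: str) -> int:
    """Weighted transitions metric of one fully specified scan vector.

    vector[0] is the bit that is scanned in first.
    """
    l = len(vector)
    total = 0
    for i in range(1, l):                  # i is the 1-indexed bit position
        if vector[i - 1] != vector[i]:     # transition between bits i and i+1
            total += l - i                 # earlier transitions weigh more
    return total


def scan_power_estimates(test_set: list) -> tuple:
    """Return (average, peak) WTM over a list of scan vectors (assumed model)."""
    values = [wtm(v) for v in test_set]
    return sum(values) / len(values), max(values)
```

For the vector 01000 discussed above, the transition between v1 and v2 contributes 4 and the transition between v2 and v3 contributes 3, so `wtm("01000")` is 7.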
It was shown in [34] that Golomb coding can be used to simultaneously reduce the volume of test data, Pavg, and Ppeak. The don't-care bits were mapped to 0's and the resulting test data was compressed using the Golomb code. While this approach provides significant reductions in power consumption, and at the same time, decreases the test data volume considerably, it does not minimize either Pavg or Ppeak.
Here, we improve upon [34] by using the alternating run-length code. Table I shows a partially specified scan vector ti = 01XXX10XXX01 with scan chain length l = 12, where X denotes a don't-care bit. If the don't-cares are mapped to appropriate binary values to minimize the weighted transition metric, then a sequence dXXXXd', with $d \in \{0, 1\}$, must be mapped to dddddd'. Similarly, a sequence dXXXX must be mapped to ddddd. This ensures that the few unavoidable transitions occur "late" during scan-in. This mapping of don't-cares is identical to the MTC mapping used in [35]. Table I shows the impact of don't-care mapping on WTMi for a given test vector ti. The compression obtained using the FDR code and the alternating run-length code is also shown. Note that the Golomb code was used for compression in [34]. Since then, the FDR code has been shown to be more efficient than the Golomb code; hence, we consider only the FDR code here. The WTM value is clearly higher if the don't-cares are always mapped to zero. However, FDR coding is much more effective in reducing test data volume if this strategy is used. On the other hand, while FDR coding is ineffective for the fully specified test vector that minimizes WTM, the alternating run-length code provides the same compression as achieved with the FDR code with all don't-cares mapped to 0's. The above example demonstrates that a careful mapping of the don't-cares to 0's and 1's, followed by alternating run-length coding of the resulting test data, not only provides reduction in test data volume, but also minimizes the scan power dissipation.
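The don't-care mapping described above (dXXXXd' becomes dddddd', and dXXXX becomes ddddd) amounts to copying the most recent specified bit into every following X. A minimal Python sketch is given below. The function name is hypothetical, and the handling of don't-cares that precede the first specified bit (they copy that first bit) and of all-X cubes (mapped to 0's) is an assumption of this sketch, since the text does not cover those cases.

```python
def mtc_fill(cube: str) -> str:
    """Fill don't-cares (X) so that each X repeats the preceding specified bit."""
    # Assumption: leading X's take the value of the first specified bit,
    # and a cube with no specified bits is filled with 0's.
    first = next((c for c in cube if c in "01"), "0")
    last = first
    out = []
    for c in cube:
        if c in "01":
            last = c            # remember the most recent specified bit
        elif c not in "Xx":
            raise ValueError(f"unexpected symbol {c!r} in test cube")
        out.append(last)        # X positions repeat the remembered bit
    return "".join(out)
```

Applied to the vector of Table I, `mtc_fill("01XXX10XXX01")` returns `011111000001`, which keeps only the three transitions already forced by the specified bits.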
III. TESTING-TIME ANALYSIS
We now analyze the testing time when a single scan chain is fed by the alternating run-length decoder. Test data compression decreases testing time and allows the use of a low-cost ATE running at a lower frequency to test the core without imposing any penalties on the total testing time. Let the ATE frequency and the on-chip scan frequency be $f_{ATE}$ and $f_{scan}$, respectively, where $f_{ATE} < f_{scan}$. Since the ATE and the scan chain operate at two different frequencies, the decoder also consists of two parts, one operating at $f_{ATE}$ and the other operating at $f_{scan}$, such that $f_{ATE} = f_{scan}/\alpha$, $\alpha > 1$. The parameter $\alpha$ should ideally be a power of two since it is easier to synchronize the ATE clock with the scan clock for such values of $\alpha$ [37]. If the scan chain has multiple segments operating at different clock frequencies, each segment has a dedicated decoder for test data decompression. Fig. 6 outlines the decoder partitioned into two frequency domains. The proposed TRP scheme, therefore, decouples the internal scan chain(s) from the ATE via the use of a decoder interface. This decoupling implies that the scan clock frequency is no longer constrained by the ATE clock frequency limitation. Thus, $f_{scan}$ can now be made larger than $f_{ATE}$.
For the alternating run-length code, let $t(k, i)$ be the total time required to decompress a codeword that is the ith member of the kth group, and let $t_{shift}(k, i)$ and $t_{decode}(k, i)$ be the time required to transfer the data from the ATE to the chip and to decode the codeword, respectively. An upper bound on $t(k, i)$ can be obtained by assuming that decoding begins after the complete codeword is transferred from the ATE. This implies that
$$t(k, i) \le t_{shift}(k, i) + t_{decode}(k, i).$$
For the alternating run-length code, the prefix length and the tail length of the codeword belonging to the kth group are each equal to k bits; see Fig. 2. Since data is transferred from the ATE to the chip at the tester frequency, the time required to transfer any codeword of the kth group is given by
$$t_{shift}(k, i) = \frac{2k}{f_{ATE}}.$$
For any codeword, the prefix is identical to the binary representation of the run-length corresponding to the group's first element. As shown in Fig. 2, the number of 0's in the prefix of a codeword belonging to the kth group is equal to $2^k - 2$. The decoder has to output $(2^k - 2)$ 0's before the tail decoding starts. The time $t_{prefix}(k)$ required to decompress the prefix of any codeword from the kth group is, therefore,
$$t_{prefix}(k) = \frac{2^k - 2}{f_{scan}}.$$
The total decoding time $t_{decode}(k, i)$ is given by
$$t_{decode}(k, i) = t_{prefix}(k) + t_{tail}(k, i) = \frac{2^k - 2 + i}{f_{scan}}.$$
The total time needed to decompress the codeword is given by
$$t(k, i) \le t_{shift}(k, i) + t_{decode}(k, i) = \frac{1}{f_{ATE}} \left( 2k + \frac{2^k - 2 + i}{\alpha} \right)$$
where $f_{scan} = \alpha f_{ATE}$. Let $q(k, 1), q(k, 2), q(k, 3), \ldots, q(k, 2^k)$ be the absolute frequencies of the members of the kth group. Therefore, the decompression time $\tau(k)$ for the runs belonging to the kth group is given by
$$\tau(k) = \sum_{i=1}^{2^k} q(k, i)\, t(k, i).$$
Let us assume that $k_{max}$ is the largest group. The test application time $TAT^{A}_{SSC}$ (A in the superscript denotes the alternating run-length code) for the entire test set with a single scan chain (SSC) is given by
$$TAT^{A}_{SSC} = \sum_{k=1}^{k_{max}} \tau(k) \le \frac{|T_E|}{f_{ATE}} + \frac{1}{\alpha f_{ATE}} \sum_{k=1}^{k_{max}} \sum_{i=1}^{2^k} q(k, i) \left( 2^k - 2 + i \right)$$
where $|T_E|$ is the size of the encoded test set, which equals $\sum_{k} \sum_{i} 2k\, q(k, i)$ because every codeword of the kth group has $2k$ bits.
Next, to derive a lower bound on the testing time, suppose the tail bits are shifted in while the prefix is being decompressed. Since the tail bits are now shifted from the ATE while the prefix bits are decoded, the time required to shift any codeword of the kth group is given by
$$t_{shift}(k, i) = \frac{k}{f_{ATE}}.$$
Therefore, a lower bound on decoding time is given by
$$t(k, i) \ge t_{shift}(k, i) + t_{decode}(k, i) = \frac{1}{f_{ATE}} \left( k + \frac{2^k - 2 + i}{\alpha} \right).$$
We next compare the testing time using the proposed TRP scheme with that for an ATPG-compacted test set with p patterns and an external tester operating at frequency $f^{*}_{ATE}$. Let the length of the scan chain be n bits. The size of the ATPG-compacted test set is pn bits, and the corresponding testing time is $pn / f^{*}_{ATE}$.
To conclude the analysis, we note that the above bounds allow us to evaluate the testing time without a detailed analysis of the asynchronous handshaking protocol between the tester and the decoder. The exact testing time, which lies between the two bounds, can be determined through a bit-by-bit analysis of the encoded test data. The formulation, based on upper and lower bounds, allows us to demonstrate the effectiveness of the proposed TRP scheme without resorting to such detailed analysis.
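The bounds of this section can be evaluated numerically with a short Python sketch. It assumes the per-codeword expressions used in the analysis: a codeword of the kth group has 2k bits, decoding the ith member takes (2^k - 2 + i) scan-clock cycles, the upper bound shifts all 2k bits at the ATE frequency before decoding, and the lower bound overlaps the tail shift with prefix decoding so that only k bits are shifted serially. Here `alpha` stands for the ratio of the scan frequency to the ATE frequency. The function names and the dictionary layout for the run statistics q(k, i) are hypothetical.

```python
def codeword_time_bounds(k: int, i: int, f_ate: float, alpha: float) -> tuple:
    """(lower, upper) bound in seconds for decompressing member i of group k."""
    f_scan = alpha * f_ate
    t_decode = (2 ** k - 2 + i) / f_scan   # prefix plus tail decoding time
    lower = k / f_ate + t_decode           # tail shifted in during prefix decoding
    upper = 2 * k / f_ate + t_decode       # decoding starts after the full shift
    return lower, upper


def total_time_bounds(q: dict, f_ate: float, alpha: float) -> tuple:
    """Sum the per-codeword bounds over run statistics q[(k, i)] = count."""
    lower_total = upper_total = 0.0
    for (k, i), count in q.items():
        lower, upper = codeword_time_bounds(k, i, f_ate, alpha)
        lower_total += count * lower
        upper_total += count * upper
    return lower_total, upper_total


def conventional_scan_time(p: int, n: int, f_star_ate: float) -> float:
    """Time to apply p patterns of n bits each from a tester running at f_star_ate."""
    return p * n / f_star_ate
```

The gap between the two bounds is k / f_ATE per codeword, so it grows with the number of codewords and with the group index but is independent of `alpha`.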
IV. EXPERIMENTAL RESULTS
In this section, we evaluate the effectiveness of alternating run-length coding for reducing test data volume, testing time, and power consumption during scan testing. We carried out experiments for the ISCAS89 benchmark circuits and a production circuit from IBM. The experiments were conducted on a Sun Ultra 10 workstation with a 333-MHz processor and 256 MB of memory. We only considered the large ISCAS89 full-scan circuits with a single scan chain each. Table II presents the experimental results for the ISCAS89 benchmarks for test sets obtained from the Mintest ATPG program [36]. We compare the compression obtained using the FDR code and the alternating run-length code. In order to compare with [34], we first assigned all don't-cares to 0's and compressed TD using the FDR code. We then carefully mapped the don't-cares to minimize WTM and compressed the resulting TD using the alternating run-length code. Table II shows the size of TD, the size of the smallest encoded test set obtained after static compaction using Mintest, the size of the compressed test set obtained with all don't-cares mapped to zero (|TE1|), and the size of the compressed test set obtained with an optimal mapping of don't-cares to minimize WTM (|TE2|). We also compare our compression results with the results published in [35]. Note that we do not include results for s35932 here since the test cubes that are available to us were found to provide incomplete fault coverage.
As is evident from Table II, the alternating run-length code yields better compression than the FDR code for four out of the six benchmark circuits. This is particularly remarkable since, as we show later in this section, high compression with the alternating run-length code is also accompanied by a significant reduction in testing time and test power. The results also show that ATPG compaction is not always necessary for saving memory and reducing testing time. In all cases, the size of the encoded test set is less than the smallest ATPG-compacted test sets known for these circuits. This comparison is essential in order to show that storing TE in ATE memory is more efficient than simply applying static compaction to test cubes and storing the resulting compacted test sets.
The compression results are also significantly better (over 16% on average) than in [35]. This is not entirely unexpected, since the alternating run-length code, which uses an underlying FDR code, is better tailored than the Golomb code to exploit the properties of test sets. Note that the compression is considerably higher in [35] if SLR is allowed; however, we assume in this work that SLR is not possible for the IP cores in an SOC. Table III presents test application times for the proposed method using the alternating run-length code and for traditional scan-based testing with $f^{*}_{ATE} = f_{ATE}$. We note that in all the cases the upper bound on test application time using the proposed scheme is lower than that for scan-based external testing. The actual test application time for the proposed TRP scheme lies between the lower and upper bounds.
For example, the test application time for s13207 with $\alpha = 8$ and $f^{*}_{ATE} = f_{ATE} = 20$ MHz lies between 1.848 and 2.664 ms, which is lower than the time of 8.155 ms required for conventional scan testing using ATPG-compacted patterns derived using Mintest. Furthermore, let us assume that the desired testing time for s13207 is the average of the lower and upper bounds, i.e., 2.256 ms. In this case, an external tester operating at $f^{*}_{ATE} = 72.29$ MHz will be required, as opposed to a 20-MHz tester with the proposed TRP scheme.
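The tester-frequency figure in this example can be checked with a few lines of Python arithmetic. The sketch assumes that conventional scan testing time equals the number of test bits divided by the tester frequency, so that the 8.155 ms quoted at 20 MHz fixes the number of bits to be applied. The function and variable names are hypothetical; the numerical inputs are the ones quoted above for s13207.

```python
def required_tester_frequency(t_conventional_s: float,
                              f_ate_hz: float,
                              t_target_s: float) -> float:
    """Tester frequency needed to apply the same test bits in t_target_s."""
    test_bits = t_conventional_s * f_ate_hz   # bits applied by conventional scan
    return test_bits / t_target_s


lower_ms, upper_ms = 1.848, 2.664
target_ms = (lower_ms + upper_ms) / 2         # midpoint of the two bounds
f_star_hz = required_tester_frequency(8.155e-3, 20e6, target_ms * 1e-3)
print(f"target time: {target_ms:.3f} ms, required tester: {f_star_hz / 1e6:.2f} MHz")
```

The midpoint of the two bounds is 2.256 ms, and the resulting frequency is roughly 72.3 MHz, in agreement with the quoted 72.29 MHz up to the rounding of the 8.155 ms figure.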
We next present results on the peak and average power consumption during the scan-in operation. These results show that using test data compression with a careful mapping of don't-cares to 0's and 1's can also lead to significant savings in power consumption. As discussed in Section II, we estimate power using the WTM. Let $P^{C}_{peak}$ ($P^{C}_{avg}$) be the peak (average) power with compacted test sets obtained using Mintest, and let $P^{F}_{peak}$ ($P^{F}_{avg}$) be the peak (average) power when the FDR code is applied to TD after mapping the don't-cares to 0's. Similarly, let $P^{A}_{peak}$ ($P^{A}_{avg}$) be the peak (average) power when the alternating run-length code is applied to TD after mapping the don't-cares to minimize the WTM. Table IV shows that the peak power and average power are significantly less if the alternating run-length code is used for test data compression and the decompressed patterns are applied during testing. On average, the peak (average) power is 37.72% (84.32%) less in this case than for the Mintest test sets. (Note that the power values for the Mintest test sets are shown in Table V.) Thus, our results demonstrate that substantial reductions in test data volume and testing time are also accompanied by a significant reduction in power consumption during scan testing. Note that the power savings obtained in this paper are identical to those in [35] since the don't-cares are optimally mapped to 0's and 1's in both papers. Additional power reduction was obtained in [35] using SLR, a structural modification that we do not consider in this paper due to the underlying assumption of IP cores.
We next present results on the peak and average power consumption during the scan-out operation. The scan-out power depends, to a large extent, on the responses of the core under test to the test patterns. Since our approach is aimed at minimizing the scan-in power, we conducted a set of experiments to evaluate the impact of scan-in power minimization on the scan-out power. Table V shows that the peak power and average power are significantly less in five out of six cases if alternating run-length coding is used for test data compression. The peak power during scan-out operation was only slightly higher for the s9234 benchmark circuit. On average, the peak (average) power is 20.25% (33.17%) less than for the Mintest test sets. Thus, our results demonstrate that the substantial reduction in test data volume is also accompanied by significant reduction in power consumption during scan testing. The reduction in scan-out power is an important added advantage since we do not directly target scan-out power in our compression scheme.
Table VI presents the experimental results when the FDR code and the alternating run-length code are applied to scan vectors for a production circuit from IBM. This circuit contains 1.2 million gates, 32 200 latches and a total of 30 scan chains. We were provided with four sets of scan vectors for this circuit. Each vector set consists of 32 patterns (a total of 1 031 072 bits of test data per vector set). We find that the compression obtained using the alternating run-length code is comparable to that obtained using the FDR code. The slight (1-2%) decrease in compression is offset by the significant savings in test power as is shown next.
Table VII presents the average and peak power values for the IBM circuit when the FDR code and the new alternating run-length code are applied to the scan vectors. We find that the alternating run-length code is extremely efficient for reducing test power. Compared to the FDR code, as much as 43.36% greater reduction is obtained in peak power and as much as 42.38% greater reduction is obtained in average power.
Finally, we present results on test application time for the IBM circuit when the TRP scheme based on the new alternating run-length codes is applied to the scan vectors. We assume that the circuit consists of a single scan chain. The minimum (maximum) testing time for the four vector sets using TRP and a tester with $f_{ATE} = 50$ MHz is 12.757 ms (15.205 ms). On the other hand, the test application time based on traditional scan-based testing using the same external tester is 61.863 ms.
V. ANALYSIS OF ALTERNATING RUN-LENGTH CODE
In this section, we provide analytical insights into the reasons underlying the high compression obtained using alternating run-length codes. We first analyze the effectiveness of the alternating run-length code for a memoryless data source. We then present an analysis for deterministic sequences. Let the probability of producing 0's be $p$ and the probability of producing 1's be $p_1 = 1 - p$. Next, we develop an analysis technique to determine the worst case and best case compression that can be achieved using alternating run-length codes for some generic parameters of precomputed test sets. Suppose $T_D$ contains $r_1$ runs of 0's, $r_2$ runs of 1's, and a total of $n$ bits. We first determine $C^{A}_{\max}$, the number of bits in the encoded test set $T_E$ in the worst case, i.e., when the compression is the least effective. In doing so, we also determine the distribution of the runs of 0's and runs of 1's that gives rise to this worst case compression.
Suppose $T_D$ contains $k_i$ runs of 0's and $j_i$ runs of 1's of length $i$, with maximum run-length $l_{\max}$. Let the size of the encoded test set $T_E$ be $A$ bits, and let $\delta = A - (n - r_1 - r_2)$ measure the amount of compression. The contribution of each run to $\delta$ is obtained by comparing the length of the run with the size of the corresponding codeword. For example, the codeword corresponding to a run of length zero contains two bits (one more than the original run), the codeword for run-length one is of the same size as the original run-length, and so on. The difference between these two quantities contributes to $\delta$, and it appears as the coefficient of the appropriate $k_i$ and $j_i$ terms in the equation for $\delta$. We next use the following simple integer linear programming (ILP) model to determine the maximum value of $\delta$. This yields the worst case compression ($C^{A}_{\max}$) using alternating run-length codes.
Maximize:
$\delta = 2k_0 + 2j_0 + k_1 + j_1 + 2k_2 + 2j_2 + k_3 + j_3 - k_5 - j_5 - k_7 - j_7 - 2k_8 - 2j_8 - 3k_9 - 3j_9 \pm \cdots$ (up to $l_{\max}$)
subject to:
This ILP model can be easily solved, e.g., using a solver such as lpsolve [38], to obtain the worst case values for the $k_i$'s and $j_i$'s. Note that even though $l_{\max}$ appears in the above ILP model, we do not make any explicit use of it. Our goal here is to determine a worst case distribution of the runs of 0's and runs of 1's. Generally, short run-lengths yield the worst case compression; however, $l_{\max}$ must exceed a minimum value to satisfy the constraints listed above. We can use lpsolve to determine the minimum $l_{\max}$ by incrementally increasing $l_{\max}$ until the optimization problem becomes feasible. Table VIII lists the size $C^{A}_{\max}$ of the encoded data set for worst case compression for various values of $n$, $r_1$, and $r_2$. The last two columns show the distribution of runs of 0's and 1's for which the worst case compression is achieved (a=b indicates a runs of length b). Note that this distribution is not unique since a number of run-length distributions can yield the worst case compression.
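The coefficients in the objective function follow a regular pattern: each one equals the codeword size for a run of length i minus i. The sketch below reproduces them under the assumption that codeword sizes follow an FDR-style grouping (2 bits for run-lengths 0 to 1, 4 bits for 2 to 5, 6 bits for 6 to 13, and so on). This grouping is inferred from the listed coefficients rather than stated in this section, and the function names are illustrative.

```python
def codeword_bits(run_length: int) -> int:
    """Codeword size in bits for a run of the given length.

    Assumption: group k covers run-lengths 2**k - 2 .. 2**(k + 1) - 3
    and every codeword in group k has 2k bits.
    """
    group = (run_length + 2).bit_length() - 1
    return 2 * group


def delta_coefficient(run_length: int) -> int:
    """Coefficient of k_i and j_i in the objective for i = run_length."""
    return codeword_bits(run_length) - run_length


# Coefficients for run-lengths 0 through 9, matching the terms listed
# in the objective (run-lengths 4 and 6 have coefficient zero and so
# do not appear there).
coefficients = [delta_coefficient(i) for i in range(10)]
```

Short runs have positive coefficients and long runs have negative ones, which is why distributions dominated by short runs give the worst case.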
Next, we analyze the best case compression achieved using alternating run-length codes for any given $n$, $r_1$, and $r_2$. Since the compression is better for longer run-lengths, we also need to constrain the maximum run-length in this case. As before, we formulate this problem using ILP, and the resulting model can be solved using lpsolve to obtain a best case distribution of runs and $C^{A}_{\min}$, the number of bits in the encoded test set in the best case. Table IX lists the run-length distributions corresponding to the best case compression using alternating run-length codes. The corresponding percentage compression values are also listed. In Fig. 8, we plot the best case and worst case size of the encoded data as the number of runs $r_1$ and $r_2$ are varied (for $n = 1000$). The upper surface represents the worst case compression and the lower surface represents the best case compression that can be obtained using alternating run-length codes for different values of $r_1$ and $r_2$. The actual size of the encoded data will lie between these two bounds and depends on the distribution of the runs of 0's and runs of 1's. We note that for small values of $r_1$ and $r_2$, the bounds are very close to each other. For example, for $r_1 = 20$ and $r_2 = 15$, the difference between $C^{A}_{\max}$ and $C^{A}_{\min}$ is only eight bits; hence, the alternating run-length code is robust, i.e., its efficiency is relatively insensitive to variations in the distributions of the runs.
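The two bounds can also be explored without an ILP solver. The sketch below replaces lpsolve with a small dynamic program, so it is an illustration rather than the procedure used to produce Tables VIII and IX. It rests on three assumptions that are not spelled out in this section: codeword sizes follow the FDR-style grouping implied by the objective coefficients, the run-lengths must sum to n - r1 - r2 (as implied by the definition of the compression measure), and runs of 0's and runs of 1's are interchangeable because their coefficients are identical. All names are hypothetical.

```python
def codeword_bits(run_length: int) -> int:
    # Assumed FDR-style grouping: group k spans run-lengths
    # 2**k - 2 .. 2**(k + 1) - 3 and uses 2k bits per codeword.
    return 2 * ((run_length + 2).bit_length() - 1)


def encoded_size_bounds(n: int, r1: int, r2: int, l_max: int):
    """Return (best, worst) encoded size in bits, or None if infeasible.

    best is the smallest and worst the largest total codeword size over
    all ways to pick r1 + r2 run-lengths in 0..l_max that sum to
    n - r1 - r2 (an assumed constraint).
    """
    runs = r1 + r2
    total = n - r1 - r2
    if total < 0:
        return None
    # best[s] / worst[s]: min / max bits for the runs chosen so far
    # whose lengths sum to s.
    best = {0: 0}
    worst = {0: 0}
    for _ in range(runs):
        next_best, next_worst = {}, {}
        for s, low_bits in best.items():
            high_bits = worst[s]
            for length in range(min(l_max, total - s) + 1):
                t = s + length
                cost = codeword_bits(length)
                if t not in next_best or low_bits + cost < next_best[t]:
                    next_best[t] = low_bits + cost
                if t not in next_worst or high_bits + cost > next_worst[t]:
                    next_worst[t] = high_bits + cost
        best, worst = next_best, next_worst
    if total not in best:
        return None  # l_max is too small to reach the required total
    return best[total], worst[total]
```

A call such as `encoded_size_bounds(10, 1, 1, 8)` returns both bounds for a toy instance, while a too-small `l_max` returns `None`, mirroring the infeasibility that is resolved by incrementally increasing the maximum run-length.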
VI. CONCLUSION
We have shown that TRP based on test data compression can be used to reduce SOC test data volume, testing time, and test power simultaneously. The proposed TRP method is based on the use of a new code, which we call the alternating run-length code. We have shown that a careful mapping of the don't-cares in precomputed test sets to 1's and 0's leads to significant savings in peak and average power, without requiring either a slower scan clock or blocking logic in the scan cells. In addition, the on-chip decompression of test patterns decouples the internal scan chain(s) from the ATE, thereby allowing a higher scan clock frequency. We have presented a rigorous testing-time analysis for compression/decompression based on alternating run-length codes. Experimental results for the ISCAS89 benchmark circuits and for an IBM production circuit show that a slower ATE can often be used with no adverse impact on testing time. Therefore, the proposed approach not only decreases test data volume and the amount of data that must be transferred from the ATE, but it also reduces test power and testing time, and facilitates the use of less expensive ATEs.
