This article presents several scan-cell reordering techniques that reduce signal transitions in test mode while preserving the don't-care bits in the test patterns for later optimization. Combined with a pattern-filling technique, the proposed scan-cell reordering techniques exploit both high response correlations and high pattern correlations to minimize scan-out and scan-in transitions simultaneously. These scan-shift transitions can be reduced further by selectively inverting the connections between scan cells. In addition, the proposed scan-cell reordering techniques can control the trade-off between routing overhead and power consumption. A series of experiments demonstrates the effectiveness of each proposed technique individually.
INTRODUCTION
By enhancing a circuit's controllability and observability, scan design has become a widely used DFT technique for achieving high fault coverage in complex circuits [Bushnell et al. 2000]. However, with scan design, the Circuit-Under-Test (CUT) consumes much more power in its test mode than in its functional mode [Zorian 1993], for the following reasons. First, when the scan design is used to shift in test patterns and shift out test responses, a large number of signal transitions may occur along the scan paths, which induce even more signal transitions in the CUT and hence consume more power. Second, clock-gating logic, a popular design technique that reduces power consumption by selectively updating only part of the flip-flops, is forced off during the scan-shift cycles. Therefore, all the flip-flops are updated simultaneously in test mode, which also leads to higher power consumption.

Authors' address: Y.-Z. Wu and M. C.-T. Chao (corresponding author), Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan; email: mango@faculty.nctu.edu.tw. © 2010 ACM 1084-4309/2010/11-ART10 $10.00 DOI: 10.1145/1870109.1870119. http://doi.acm.org/10.1145
This excessive power consumption during scan-based testing may cause physical damage or reliability degradation in the CUT, which in turn decreases yield and product lifetime [Girard 2002]. As the number of scan cells keeps growing in modern designs, this increasing power consumption has become one of the biggest barriers to effective scan-based testing.
A common practice for lowering power consumption during scan-based testing is to reduce the number of signal transitions on the scan cells. These transitions can be classified into three types: (1) capture transitions, caused by a value difference on the same scan cell between the scan-in pattern and the corresponding captured response; (2) scan-out transitions, caused by value differences between adjacent scan cells in the scan-out response; and (3) scan-in transitions, caused by value differences between adjacent scan cells in the scan-in pattern. The first type is associated with capture power, and the last two types are associated with scan-shift power.
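As a small illustration of these three definitions, the sketch below counts, for one fully specified pattern and its captured response, the capture transitions and the raw value differences between adjacent cells that give rise to scan-out and scan-in transitions. Vectors are assumed to be strings of '0'/'1' listed in scan-chain order, and the helper names are hypothetical. The counts are unweighted; how many shift transitions each adjacent difference actually causes is what the WTC of Section 3 accounts for.

```python
def capture_transitions(pattern: str, response: str) -> int:
    """Type (1): cells whose scan-in value differs from their captured response."""
    return sum(p != r for p, r in zip(pattern, response))


def adjacent_differences(vector: str) -> int:
    """Value differences between adjacent cells of one vector.

    Applied to a response it counts the sources of scan-out transitions
    (type 2); applied to a pattern, the sources of scan-in transitions (type 3).
    """
    return sum(a != b for a, b in zip(vector, vector[1:]))


pattern, response = "010011", "011001"
print(capture_transitions(pattern, response))  # 2
print(adjacent_differences(response))          # 3 (scan-out sources)
print(adjacent_differences(pattern))           # 3 (scan-in sources)
```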
To reduce capture transitions, specialized ATPG techniques [Chandra et al. 2008; Sankaralingam et al. 2002; Remersaro et al. 2006; Wen et al. 2005] have been proposed to generate test-pattern vectors with a minimal Hamming distance to their corresponding test-response vectors. Because these ATPGs fully specify the don't-care bits in their test cubes to minimize capture transitions, they preclude further test compaction or compression and hence may result in a larger test set.
Other methods exploit the don't-care bits to minimize the scan-in transitions of a given test set [Mrugalski et al. 2007; Li et al. 2008; Lin et al. 2006; Sankaralingam et al. 2000; Sinanoglu et al. 2003]. Sankaralingam et al. [2000] proposed a don't-care-filling technique, named MT-fill, which guarantees that the scan-in transitions generated by its filled patterns are minimum for the given test set and scan-cell ordering. The methods in Sankaralingam et al. [2000], Mrugalski et al. [2007], and Lin et al. [2006] reduce the test power as well as the test data volume based on built-in decompression hardware. Sinanoglu et al. [2003] added XOR gates or inverters along the scan paths to minimize scan-in transitions. Li et al. [2008] proposed a don't-care-filling technique that simultaneously reduces scan-in, scan-out, and capture transitions.
Another technique for reducing scan-shift power partitions the scan cells into multiple groups and activates only one group at a time during the scan-shift cycles [Bonhomme et al. 2001; Huang et al. 2001; Rosinger et al. 2004; Sankaralingam et al. 2001; Saxena et al. 2001; Whetsel 2000]. This confines the concurrent transitions to a small portion of the CUT. The partition methods require special control architectures in the scan design, such as gated clocks [Bonhomme et al. 2001], a central control unit for each group's clock signal [Rosinger et al. 2004; Whetsel 2000], or specialized scan cells along with a multiphase generator [Huang et al. 2001]. Sankaralingam et al. [2001] further minimize the capture power by capturing responses only for certain selected groups of scan cells; this requires a customized ATPG and discards a significant portion of the responses.
The methods in Bonhomme et al. [2002], Dabholkar et al. [1998], and Sinanoglu et al. [1998] change the order of the scan cells along the scan paths to minimize both scan-in and scan-out transitions for given test patterns and responses. This scan-cell-reordering technique saves scan-shift power but sacrifices the opportunity to optimize the wire length of the scan paths during the APR stage [Makar 1998; Hirech et al. 1998]. The methods in Bonhomme et al. [2003, 2004] further consider the routing overhead during the reordering process so that the imposed routing overhead can be limited. However, one serious disadvantage of the scan-chain-reordering techniques [Bonhomme et al. 2002, 2003, 2004; Dabholkar et al. 1998] is that the exact test patterns and responses must be obtained in advance. As a result, no don't-care bits remain for further reducing scan-in transitions or test data volume with techniques such as Sankaralingam et al. [2000], Mrugalski et al. [2007], Lin et al. [2006], Sinanoglu et al. [2003], and Li et al. [2008]. Sinanoglu et al. [1998] can reorder the scan cells based on a test set with don't-care bits. However, Sinanoglu et al. [1998] relies on controllability measures to approximate the response correlation between scan cells, which may not reflect the actual correlation. Also, Sinanoglu et al. [1998] did not consider the impact of any don't-care-filling technique.
In this article, we develop a scan-cell-reordering scheme that minimizes the scan-out transitions while preserving the don't-care bits in the test cubes for a later optimization of scan-in transitions using MT-fill [Sankaralingam et al. 2000]. To achieve this goal, we first need to predict the correlation between response values before the don't-care bits are specified. This response correlation indicates the likely scan-out transitions between scan cells and can guide the reordering process (Section 4). Second, we consider the impact of scan-cell reordering on the result of MT-fill and optimize the scan-in and scan-out transitions simultaneously (Section 6). Next, we selectively invert some connections between scan cells so that a low response correlation (or pattern correlation) between two scan cells is turned into a high correlation, which in turn reduces the probability that scan-shift transitions occur along the scan paths (Section 7). Last, we consider the routing overhead of the scan paths during the scan-cell reordering process, so that the trade-off between scan-shift power and routing overhead can be properly controlled (Section 8). In addition, we propose a pattern-reordering scheme to minimize the signal transitions resulting from the value difference between the first bit of a scan-in pattern and the last bit of its previous scan-out response after the scan-cell reordering scheme is applied (Section 5). All the proposed methods are validated on large ISCAS and ITC benchmark circuits.
MOTIVATION
During scan-based testing, the total power consumption of the CUT is highly correlated with the total number of signal transitions on the scan cells [Sankaralingam et al. 2000]. In this article, we use the number of signal transitions occurring on the scan cells to represent the power of the whole CUT. The proposed scan-cell-reordering scheme focuses on reducing the total scan-shift power, that is, the total number of scan-shift transitions. Capture power is not considered in the proposed scheme, since the number of capture transitions generated by a test pattern depends only on how the test pattern is filled: changing the scan-cell ordering does not change the Hamming distance between the test-pattern vector and its corresponding test-response vector.
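The claim that reordering cannot change the capture transitions can be checked with a short sketch: applying the same cell permutation to a pattern and its response leaves their Hamming distance unchanged, while the adjacent-cell differences that drive scan-shift transitions generally do change. The vectors and the ordering below are made up for illustration.

```python
import random


def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def adjacent_differences(v):
    """Value differences between neighboring positions of a vector."""
    return sum(x != y for x, y in zip(v, v[1:]))


def along_chain(values, order):
    """View per-cell values in chain order; order[k] is the cell at position k."""
    return [values[c] for c in order]


pattern = [0, 1, 0, 0, 1, 1]   # indexed by cell
response = [0, 1, 1, 0, 0, 1]
order = [2, 0, 5, 1, 4, 3]     # a hypothetical reordering

p, r = along_chain(pattern, order), along_chain(response, order)
print(hamming(pattern, response), hamming(p, r))               # 2 2
print(adjacent_differences(pattern), adjacent_differences(p))  # 3 2

# The Hamming distance is invariant under every permutation of the cells.
rng = random.Random(1)
for _ in range(100):
    perm = list(range(len(pattern)))
    rng.shuffle(perm)
    assert hamming(along_chain(pattern, perm), along_chain(response, perm)) == 2
```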
From the discussion in Section 1, the scan-in transitions can be minimized by properly filling the don't-care bits of a test set once the scan-cell order of the scan paths is given [Sankaralingam et al. 2000]. This reduction becomes more significant as the percentage of don't-care bits increases. Therefore, our scan-cell reordering scheme first attempts to minimize the scan-out transition count without specifying the don't-care bits, leaving them for a later minimization of scan-in transitions, such as MT-fill [Sankaralingam et al. 2000]. However, before the don't-care bits are specified, some response values are unknown, so no explicit information for estimating the number of scan-out transitions is available during the scan-cell reordering process.
We first use a simple experiment (reported in Table I) to show that certain pairs of scan cells tend to have the same response value in most random fillings of the don't-care bits. Thus, even without knowing the exact test responses, the reordering scheme can still avoid likely scan-out transitions by connecting such correlated pairs of scan cells next to each other. We define this tendency between two scan cells as the response correlation, which is the probability that the two scan cells have the same response value under a random fill of the don't-care bits. Note that a similar concept of response correlation has been used in previous works [Bonhomme et al. 2002; Chen et al. 2003; Dabholkar et al. 1998; Sinanoglu et al. 1998], but the methods and assumptions for obtaining it differ from work to work.
In the experiment, we use a commercial tool [Synopsys 2010] to generate stuck-at-fault patterns with don't-care bits. By randomly filling the don't-care bits and simulating the corresponding responses 1 million times, we collect statistics on the response correlation between every pair of scan cells. For the largest ISCAS benchmark circuit, s38584, Table I lists the ranges of response correlation (Columns 1 and 4), the number of scan-cell pairs whose sampled response correlation falls in each range (Columns 2 and 5), and the corresponding percentage of all scan-cell pairs (Columns 3 and 6). The don't-care bit percentage of this test set is 78.01%. As the results show, while the majority of the scan-cell pairs have a response correlation around 0.5, 21595 scan-cell pairs (2%) still have a response correlation higher than 0.75. These 21595 scan-cell pairs could form a fair-sized solution space when reordering the 1452 scan cells in s38584. This result also indicates that, even with 78.01% don't-care bits, the response correlations are not purely random.
The same trend can be observed on other ISCAS and ITC benchmark circuits as well. Table II shows the result of a similar experiment on the largest ITC benchmark circuit, where the don't-care bit percentage of its test set is 89.98% and 1.58% of scan-cell pairs have a response correlation higher than 0.75.
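The response-correlation statistic behind Tables I and II can be sketched as a Monte Carlo loop: fill every X at random, simulate, and count how often each pair of cells captures the same value. In this minimal sketch, `toy_capture` is a hypothetical four-cell combinational function standing in for the commercial ATPG and simulator used in the experiment, the cubes are made up, and the trial count is far smaller than the 1 million runs reported above.

```python
import itertools
import random


def toy_capture(bits):
    """Hypothetical 4-cell combinational logic standing in for the CUT."""
    a, b, c, d = bits
    return (a & b, a | b, b ^ c, c & d)


def sample_response_correlation(cubes, capture, trials=10_000, seed=0):
    """Estimate P(response[i] == response[j]) under uniformly random X-filling."""
    rng = random.Random(seed)
    n = len(cubes[0])
    pairs = list(itertools.combinations(range(n), 2))
    same = dict.fromkeys(pairs, 0)
    samples = 0
    for _ in range(trials):
        for cube in cubes:
            filled = tuple(rng.randint(0, 1) if v == "X" else int(v) for v in cube)
            resp = capture(filled)
            for i, j in pairs:
                same[(i, j)] += resp[i] == resp[j]
            samples += 1
    return {pair: count / samples for pair, count in same.items()}


corr = sample_response_correlation(["11XX", "1X0X", "X00X"], toy_capture)
high = [pair for pair, c in corr.items() if c > 0.75]   # candidate neighbors
```

For these made-up cubes, only cells 0 and 2 exceed the 0.75 threshold (their exact correlation is 5/6), mirroring how a small fraction of pairs stands out in Table I.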
PROBLEM FORMULATION
Our problem definition of the scan-cell reordering for reducing scan-shift power is given as follows.
Input:
-A circuit under test with scan cells inserted, and
-ATPG test patterns with don't-care bits (X's).
Output:
-An ordering of scan cells, and
-Test patterns with all don't-care bits specified by MT-fill based on the derived cell ordering.
Objective:
-Generate the minimum number of scan-shift transitions for the given test patterns.
In this article, the proposed scan-cell-reordering scheme considers only designs with a single scan chain. However, the concept of the proposed reordering scheme could be extended to multiple-scan-chain architectures as well. Given a test pattern and the scan-cell order of the scan chain, we can use the Weighted Transition Count (WTC) [Sankaralingam et al. 2000] to calculate the number of scan-in and scan-out transitions generated during the scan-shift cycles. The WTC considers not only the value difference between the patterns or responses of two adjacent scan cells, but also the number of transitions that this value difference generates during the scan-shift cycles. Eqs. (1) and (2) define WTC_in(i) and WTC_out(i), which calculate the scan-in transitions and scan-out transitions generated by the ith pattern, respectively.
In Eqs. (1) and (2), s denotes the total number of scan cells; PD(j) (RD(j)) denotes the value difference between the scan-in patterns (scan-out responses) of the jth cell and the (j+1)th cell; W_PD(j) denotes the number of scan-in transitions generated by the pattern-value difference PD(j) when the corresponding pattern values are shifted in from the scan-chain input to the (j+1)th cell; and W_RD(j) denotes the number of scan-out transitions generated by the response-value difference RD(j) when the responses are shifted out from the jth cell to the scan-chain output.
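The bodies of Eqs. (1) and (2) are missing from this copy of the text. From the definitions above, they presumably have the following form. The summation range (j running over the s − 1 adjacent cell pairs, with cell 1 at the scan-chain input) and the treatment of PD(j) and RD(j) as 0/1 indicators taken from the ith pattern and its response are assumptions.

```latex
\begin{align}
  \mathrm{WTC}_{in}(i)  &= \sum_{j=1}^{s-1} PD(j)\cdot W_{PD}(j) \tag{1}\\
  \mathrm{WTC}_{out}(i) &= \sum_{j=1}^{s-1} RD(j)\cdot W_{RD}(j) \tag{2}
\end{align}
```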
In the WTC calculation, W_PD(j) = j, implying that a pattern-value difference generates more scan-in transitions the closer it occurs to the scan-chain output. Conversely, W_RD(j) = s − 1 − j, implying that a response-value difference generates more scan-out transitions the closer it occurs to the scan-chain input. Figure 1 shows an example of the WTC computation on a 6-cell scan chain, assuming that value differences occur between cells (C1, C2), (C2, C3), and (C5, C6) for both the test pattern and its response.
Eq. (3) calculates the total number of transitions, WTC_total, generated by a given test set with m test patterns.
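A minimal Python sketch of the weighted transition count follows. It assumes the indexing above (position j = 1 is the cell next to the scan-chain input, j = s the cell next to the output) and the weights W_PD(j) = j and W_RD(j) = s − 1 − j stated in the text. For Eq. (3), whose body is also missing here, it assumes WTC_total = Σ_{i=1}^{m} (WTC_in(i) + WTC_out(i)), that is, in-between transitions (Section 5) are not included. The function names are hypothetical.

```python
from typing import Sequence


def wtc_in(pattern: Sequence[int]) -> int:
    """Eq. (1): scan-in WTC of one fully specified pattern in chain order.

    pattern[0] is cell 1 (scan-chain input side), pattern[-1] is cell s.
    """
    s = len(pattern)
    # PD(j) = 1 when cells j and j+1 differ; weight W_PD(j) = j.
    return sum(j for j in range(1, s) if pattern[j - 1] != pattern[j])


def wtc_out(response: Sequence[int]) -> int:
    """Eq. (2): scan-out WTC of one captured response in chain order."""
    s = len(response)
    # RD(j) = 1 when cells j and j+1 differ; weight W_RD(j) = s - 1 - j.
    return sum(s - 1 - j for j in range(1, s) if response[j - 1] != response[j])


def wtc_total(patterns, responses) -> int:
    """Eq. (3) as assumed here: scan-in plus scan-out WTC summed over m patterns."""
    return sum(wtc_in(p) + wtc_out(r) for p, r in zip(patterns, responses))


v = [0, 1, 0, 0, 0, 0]        # differences at j = 1 and j = 2
print(wtc_in(v), wtc_out(v))  # 3 7
```

The same two differences cost 1 + 2 = 3 weighted scan-in transitions but 4 + 3 = 7 weighted scan-out transitions, because they sit near the scan-chain input.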
Scan-Cell Reordering for Minimizing Scan-Shift Power
SCAN-CELL REORDERING CONSIDERING ONLY RESPONSE CORRELATION

Detailed Steps of Reordering Scheme
We introduce a scan-cell reordering scheme, named RORC (ReOrdering considering Response Correlation), which first reduces the scan-out transitions by maximizing the response correlations between adjacent cells while preserving all don't-care bits in the test patterns. The scan-in transitions are then minimized further by specifying the don't-care bits with MT-fill. Figure 2 shows the flow of RORC, which consists of five main steps, each described in the following subsections.
Obtain Response Correlations.
A simulation-based method is applied to sample the response correlations between each pair of scan cells. However, the filling of don't-care bits in RORC is not purely random, since MT-fill will be applied later in RORC. Therefore, in this step, we randomly generate a scan-cell ordering multiple times, specify the don't-care bits using MT-fill based on each generated ordering, and then collect the response correlations by simulating the filled patterns. The number of randomly generated cell orderings used in the simulation determines the accuracy of the sampled response correlations. We use the following empirical equation to determine this number. We have
where G_Counts and P_Counts denote the circuit gate count and the number of given test patterns, respectively.
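Step 1 can be sketched as follows. Because the empirical equation for the number of random orderings is not reproduced here, the sketch takes that number as a plain parameter `num_orderings`. `toy_capture` is a hypothetical stand-in for the CUT simulator, and the handling of trailing X's in `mt_fill` (no specified bit toward the output) is an assumption.

```python
import itertools
import random


def mt_fill(cube):
    """MT-fill along the chain: each X copies the first specified bit found
    toward the scan-chain output (end of the list). Trailing X's copy the
    nearest specified bit toward the input; an all-X cube becomes all 0s
    (both fallbacks are assumptions)."""
    out = list(cube)
    nxt = None
    for k in range(len(out) - 1, -1, -1):
        if out[k] == "X":
            out[k] = nxt
        else:
            nxt = out[k]
    prev = "0"
    for k in range(len(out)):
        if out[k] is None:
            out[k] = prev
        else:
            prev = out[k]
    return [int(v) for v in out]


def toy_capture(bits):
    """Hypothetical 4-cell combinational logic standing in for the CUT."""
    a, b, c, d = bits
    return (a & b, a | b, b ^ c, c & d)


def sample_correlations(cubes, capture, num_orderings, seed=0):
    """Estimate response correlations under MT-fill with random cell orderings."""
    rng = random.Random(seed)
    n = len(cubes[0])
    pairs = list(itertools.combinations(range(n), 2))
    same = dict.fromkeys(pairs, 0)
    samples = 0
    for _ in range(num_orderings):
        order = list(range(n))
        rng.shuffle(order)                      # order[k] = cell at chain position k
        for cube in cubes:
            chain_bits = mt_fill([cube[c] for c in order])
            filled = [0] * n
            for pos, cell in enumerate(order):  # map chain positions back to cells
                filled[cell] = chain_bits[pos]
            resp = capture(filled)
            for i, j in pairs:
                same[(i, j)] += resp[i] == resp[j]
            samples += 1
    return {pair: count / samples for pair, count in same.items()}


corr = sample_correlations(["11XX", "1X0X", "X00X"], toy_capture, num_orderings=200)
```

Unlike the purely random filling of Section 2, the X values here depend on the sampled ordering, which is the point of this step.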
Construct the Correlation Graph.
After obtaining the response correlations, we construct an undirected graph, named the response-correlation graph, in which each vertex represents a scan cell and the weight of each edge represents the response correlation between the two scan cells it connects. Because any pair of scan cells could be placed next to each other, the response-correlation graph is a complete graph. Figure 3 shows an example of a response-correlation graph with four scan cells.
Find a Maximal Hamiltonian Cycle.
A higher response correlation between two scan cells implies a lower probability that a response-value difference occurs between them. Based on this observation, the maximum Hamiltonian cycle on the response-correlation graph implies a scan-cell ordering on which the expected number of value differences between adjacent cells is minimum. Finding the maximum Hamiltonian cycle is an instance of the Traveling Salesman Problem (TSP), which is NP-hard. We therefore use a greedy TSP algorithm that orders one vertex at a time to form the cycle: the next vertex is the unordered vertex with the maximum edge weight to the most recently ordered vertex. In addition, we use each of the N largest edges as an initial search point and report the best result of these N trials, where N denotes the total number of scan cells. The time complexity of this algorithm is Θ(N^3).
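A compact sketch of this greedy search is shown below, assuming the response-correlation graph is given as a symmetric N×N weight matrix `w` (the complete graph of the previous step). Each of the N heaviest edges seeds one trial whose cycle is grown by always taking the heaviest edge from the last ordered vertex; each trial costs O(N^2), which gives the Θ(N^3) total quoted above. Tie-breaking (lowest vertex index wins) and the example weights are choices of this sketch.

```python
def cycle_weight(cycle, w):
    """Sum of edge weights around a closed cycle."""
    return sum(w[cycle[k]][cycle[(k + 1) % len(cycle)]] for k in range(len(cycle)))


def greedy_cycle_from(u, v, w):
    """Grow a cycle from edge (u, v), always taking the heaviest edge
    from the most recently ordered vertex."""
    cycle = [u, v]
    remaining = set(range(len(w))) - {u, v}
    while remaining:
        last = cycle[-1]
        nxt = max((x for x in range(len(w)) if x in remaining),
                  key=lambda x: w[last][x])
        cycle.append(nxt)
        remaining.remove(nxt)
    return cycle


def max_hamiltonian_cycle(w):
    """Best greedy cycle over the N heaviest edges used as starting points."""
    n = len(w)
    edges = sorted(((w[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    best = None
    for _, i, j in edges[:n]:
        cycle = greedy_cycle_from(i, j, w)
        if best is None or cycle_weight(cycle, w) > cycle_weight(best, w):
            best = cycle
    return best


w = [[0.0, 0.9, 0.8, 0.1],
     [0.9, 0.0, 0.2, 0.7],
     [0.8, 0.2, 0.0, 0.6],
     [0.1, 0.7, 0.6, 0.0]]
best = max_hamiltonian_cycle(w)   # some rotation or reversal of 0-1-3-2
```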
Determine Cell Ordering with Minimal WTC.
In the previous step, we obtained a maximal Hamiltonian cycle on the response-correlation graph so that the number of potential response-value differences between adjacent cells is minimized. However, to minimize WTC_out, we need to consider not only the number of response-value differences but also their positions in the cell ordering (as discussed in Section 3). In step 4, we break the maximal Hamiltonian cycle into a Hamiltonian path, which forms the final scan-cell ordering. Where the cycle is broken affects the positions of the response-value differences and, in turn, WTC_out. We therefore estimate the WTC_out generated by each possible breaking of the Hamiltonian cycle and use the breaking with the minimum estimated WTC_out as the final cell ordering.
The estimated WTC_out is obtained by replacing RD(j) in Eq. (2) with 1 minus the response correlation between cells j and j + 1. For example, the maximal Hamiltonian cycle in Figure 3 is C1-C2-C4-C3-C1. Figure 4 shows the estimated WTC_out for all eight possible breakings of this cycle (four edges that can be removed, times two choices of which end of the resulting path is connected to the scan-chain input). The final cell ordering of the scan chain is C2-C1-C3-C4.
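Step 4 can be sketched in the same style: every cut of the cycle is tried in both directions (2N candidate paths, hence the eight cases for four cells), and the path with the smallest estimated WTC_out is kept. The sketch assumes path position 1 is the scan-chain input and reuses the weight W_RD(j) = s − 1 − j from Section 3; `rc` is a hypothetical correlation matrix, not the one in Figure 3.

```python
def estimated_wtc_out(path, rc):
    """Eq. (2) with RD(j) replaced by 1 - rc between positions j and j+1."""
    s = len(path)
    return sum((1 - rc[path[j - 1]][path[j]]) * (s - 1 - j) for j in range(1, s))


def break_candidates(cycle):
    """All Hamiltonian paths obtained by removing one cycle edge, in both directions."""
    paths = []
    for cut in range(len(cycle)):
        path = cycle[cut:] + cycle[:cut]   # drops edge (cycle[cut-1], cycle[cut])
        paths.extend([path, path[::-1]])
    return paths


def best_break(cycle, rc):
    """Cell ordering (scan input first) with the minimum estimated WTC_out."""
    return min(break_candidates(cycle), key=lambda p: estimated_wtc_out(p, rc))


rc = [[0.0, 0.9, 0.8, 0.1],
      [0.9, 0.0, 0.2, 0.7],
      [0.8, 0.2, 0.0, 0.6],
      [0.1, 0.7, 0.6, 0.0]]
print(best_break([0, 1, 3, 2], rc))   # [1, 0, 2, 3]
```

Because the weights shrink toward the scan-chain output, the sketch favors placing the most strongly correlated pairs near the scan-chain input.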
Apply MT-Fill to Specify Don't-Care Bits.
After the scan-cell ordering is decided in the previous step, we apply MT-fill to fill the don't-care bits of the test patterns so that the scan-in transitions for this scan-cell ordering are minimized. The rule of MT-fill is that a don't-care bit is filled with the value of the first specified bit encountered when traversing from the don't-care bit toward the scan-chain output. See Sankaralingam et al. [2000] for more details on MT-fill.
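The MT-fill rule can be sketched in a few lines. The cube is assumed to be a string over {'0', '1', 'X'} in scan-chain order, with the last character next to the scan-chain output. Don't-care bits that have no specified bit toward the output take the nearest specified bit toward the input, and an all-X cube is filled with 0s; both fallbacks are assumptions of this sketch rather than rules quoted from the text.

```python
def mt_fill(cube: str) -> str:
    """Fill each X with the first specified bit found toward the scan-chain output."""
    filled = list(cube)
    nearest_out = None                # first specified bit seen toward the output
    for k in range(len(filled) - 1, -1, -1):
        if filled[k] == "X":
            filled[k] = nearest_out   # stays None for trailing X's
        else:
            nearest_out = filled[k]
    nearest_in = "0"                  # fallback for an all-X cube (assumption)
    for k, bit in enumerate(filled):
        if bit is None:
            filled[k] = nearest_in    # trailing X's copy the bit toward the input
        else:
            nearest_in = bit
    return "".join(filled)


print(mt_fill("0XX1X0XX"))  # 01110000
```

Each run of X's is absorbed into a neighboring specified bit, so the filled vector changes value only where the specified bits themselves change.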
Experimental Results
We conduct experiments on ten ISCAS and ITC benchmark circuits. Table III first shows the statistics of the benchmark circuits and their ATPG patterns generated by Synopsys [2010] .
The following experiment compares RORC with another scan-cell reordering scheme presented in Bonhomme et al. [2002] , which requires fully-specified test patterns before the reordering. Since RORC applies MT-fill to minimize the scan-in transitions, we apply MT-fill for Bonhomme et al. [2002] as well. In the following experiment of Bonhomme et al. [2002] , we first randomly generate an initial scan-cell ordering and specify the don't-care bits using MT-fill according to that initial ordering. Then the reordering scheme in Bonhomme et al. [2002] is applied to obtain the final scan-cell ordering based on the filled patterns. We repeat the aforesaid steps 100 times and report the best results for Bonhomme et al. [2002] . Also, we use the same TSP algorithm in both RORC and Bonhomme et al. [2002] to make a fair comparison.
In Table IV, Columns 3, 4, and 5 list the numbers of scan-in transitions, scan-out transitions, and total scan-shift transitions, respectively. Column 6 lists the peak number of scan-shift transitions in a single scan-shift cycle. Column 7 lists the runtime in seconds. The results show that RORC outperforms Bonhomme et al. [2002], with an average reduction of 43.68% in scan-in transitions and 49.50% in scan-out transitions. The reduction in scan-in transitions demonstrates the advantage of preserving don't-care bits for later minimization. The reduction in scan-out transitions demonstrates the effectiveness of using sampled response correlations to guide the reordering process. The reduction in peak transitions is a by-product of the reduction in total scan-shift transitions. Note that the result reported for Bonhomme et al. [2002] is the best of 100 trials with random initial cell orderings. This implies that, even with MT-fill, specifying all don't-care bits before reordering significantly reduces the opportunity to minimize scan-shift transitions later on and, in turn, leads to a local optimum. It also implies that the cell ordering obtained by RORC is hard to achieve by repeatedly randomizing the initial cell ordering of Bonhomme et al. [2002].
Note that the runtime of Bonhomme et al. [2002] listed in Table IV is the runtime of only one trial, whereas the reported result of Bonhomme et al. [2002] is the best of 100 trials. Therefore, the actual runtime of Bonhomme et al. [2002] is longer than that of RORC. Note also that the comparison of total scan-shift transitions in Table IV also represents the comparison of the average scan-shift transitions per cycle, which is computed by dividing the total number of shift transitions by the total number of shift cycles. Table V reports RORC's runtime distribution and memory usage for each benchmark circuit. Columns 2 to 4 list the runtime spent on response-correlation sampling (Column 2), the TSP algorithm (Column 3), and other computation (Column 4). Column 5 lists the total runtime. Column 6 lists the ratio of the runtime spent on correlation sampling to the total runtime. On average, 90% of the total runtime is spent on sampling the response correlations, which is the efficiency bottleneck of the proposed scan-cell reordering scheme. Finally, Column 7 lists the memory usage of RORC. The largest memory usage among the benchmark circuits is 21.2M.
In Table IV , the total number of scan-shift transitions is actually slightly larger than the sum of scan-in transitions and scan-out transitions. This is because we omitted the in-between transitions in Table IV , which are generated by the value difference between the first bit of a scan-in pattern and the last bit of its previous scan-out response. The percentage of in-between transitions is low compared to scan-in and scan-out transitions. It can be further reduced by a pattern-reordering scheme proposed in the next section. 
PATTERN REORDERING FOR MINIMIZING IN-BETWEEN TRANSITIONS
Detailed Steps of Pattern Reordering
We first divide test patterns into four types, A, B, C, and D, according to the first scan-in bit of a pattern and the last scan-out bit of its response. The other bits in a pattern and its response cannot affect the number of in-between transitions. 
If x < y, then we apply the following ordering.
Both of the preceding pattern orderings attempt to alternate type-B and type-C patterns as often as possible, so that the last bit of more responses matches the first bit of the next pattern. Both orderings also place all type-A patterns or all type-D patterns consecutively. There is no in-between transition within such a consecutive sequence of type-A patterns or type-D patterns. Note also that placing the consecutive type-A or type-D patterns in the middle of the sequence results in the same total number of in-between transitions as the two listed pattern orderings.

Table VII compares the results with and without the proposed pattern reordering. Columns 2, 3, and 4 list the number of in-between transitions, the number of total transitions, and the ratio of in-between transitions to total transitions, respectively, without the proposed pattern reordering. Columns 5, 6, and 7 list the corresponding results with the proposed pattern reordering. As the results show, the average ratio of in-between transitions is reduced from 0.915% to 0.241% by the proposed pattern reordering. The pattern reordering is also fast (less than 1 second for all benchmark circuits).
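The type definitions and the two orderings referred to above are not reproduced in this copy, so the following sketch does not implement the proposed ordering rule. It only shows the quantity being minimized: for a given pattern order, it counts the pairs where the last scan-out bit of one response differs from the first scan-in bit of the next pattern. It then brute-forces the optimum for a tiny hypothetical test set to illustrate how chaining patterns with matching boundary bits removes these transitions.

```python
from itertools import permutations


def in_between_transitions(order, first_in, last_out):
    """Mismatches between the last scan-out bit of each response and the
    first scan-in bit of the pattern applied right after it."""
    return sum(last_out[a] != first_in[b] for a, b in zip(order, order[1:]))


# Hypothetical boundary bits for four patterns, one per (first_in, last_out) combination.
first_in = [0, 1, 0, 1]
last_out = [1, 0, 0, 1]

print(in_between_transitions([0, 1, 2, 3], first_in, last_out))  # 1
best = min(permutations(range(4)),
           key=lambda o: in_between_transitions(o, first_in, last_out))
```

Exhaustive search is only feasible for a handful of patterns; the appeal of a type-based rule is that it needs only the counts of each type.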
Since in-between transitions account for a much lower percentage than scan-in or scan-out transitions (0.241% on average), we do not list them separately in later experiments, so that the discussion of our scan-cell reordering schemes can focus on scan-in and scan-out transitions. In-between transitions are still counted in the total number of scan-shift transitions.
SCAN-CELL REORDERING CONSIDERING BOTH RESPONSE AND PATTERN CORRELATIONS
As the results in Table IV show, RORC generates fewer total scan-shift transitions than Bonhomme et al. [2002] in all circuits except s35932. This exception may be attributed to the low don't-care bit percentage of s35932 (37.36%). From our internal experiments, we found that the cell ordering affects the results of MT-fill more significantly when the don't-care bit percentage is lower. However, RORC reduces only scan-out transitions, by maximizing the response correlations between adjacent cells; it ignores the impact of the cell ordering on the number of scan-in transitions resulting from the MT-filled patterns.
In this section, we introduce another scan-cell reordering scheme, named ROBPR (ReOrdering considering Both Pattern and Response correlation), which can simultaneously optimize the pattern correlations and response correlations during the reordering process. 
6.1 Detailed Steps of Reordering Scheme
Figure 5 shows the flow of ROBPR, which consists of four main steps. Steps 1-3 are described in the following subsections. Step 4 is the same as step 5 of RORC and is hence omitted in this section.
Obtain Pattern and Response Correlations.
In order to measure the impact of a scan-cell ordering on the number of scan-in transitions, we first define the pattern correlation between cell i and cell j as the probability that the pattern values of these two cells are the same when the output of cell i is connected to the input of cell j. Note that this pattern correlation depends on the order of the cells. For a test pattern k, Table VIII considers each combination of pattern values of cell i and cell j and lists the corresponding pattern correlation after MT-fill (denoted PC_k(i, j)).
In cases 1, 2, 4, and 5, the values of both cell i and cell j are specified bits, so their pattern correlation can be determined immediately for test pattern k. In cases 7, 8, and 9, a don't-care bit on cell i is placed before cell j, so MT-fill fills it with the same value as cell j. In cases 3 and 6, a specified bit is placed before a don't-care bit. Hence, the value of this don't-care bit cannot be derived immediately; it is determined by the first specified bit encountered when traversing toward the scan-chain output. We use S_0/(S_0 + S_1) (S_1/(S_0 + S_1)) to represent the probability that this first encountered specified bit is a 0 (1), where S_0 and S_1 denote the total numbers of specified 0s and 1s in the test pattern, respectively. After PC_k(i, j) is calculated for each pattern k, the pattern correlation between cell i and cell j for the entire test set is obtained by averaging PC_k(i, j) over all patterns.
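The case analysis of Table VIII can be written as a small function. This is a sketch: cubes are assumed to be strings over {'0', '1', 'X'} indexed by cell (not by chain position), cell i is assumed to drive cell j, and the function names are hypothetical.

```python
def pc_k(cube: str, i: int, j: int) -> float:
    """PC_k(i, j): probability that cells i and j hold equal values after MT-fill
    when the output of cell i drives the input of cell j."""
    vi, vj = cube[i], cube[j]
    if vi == "X":
        # Cases 7-9: MT-fill copies into cell i whatever cell j ends up holding.
        return 1.0
    if vj != "X":
        # Cases 1, 2, 4, 5: both values are specified.
        return 1.0 if vi == vj else 0.0
    # Cases 3 and 6: cell j takes the first specified bit further toward the
    # output, approximated by the share of specified 0s (1s) in the cube.
    s0, s1 = cube.count("0"), cube.count("1")
    return (s0 if vi == "0" else s1) / (s0 + s1)


def pattern_correlation(cubes, i, j):
    """Average PC_k(i, j) over all test cubes k."""
    return sum(pc_k(c, i, j) for c in cubes) / len(cubes)


print(pattern_correlation(["0X1X10", "0X00X1"], 0, 1))  # 0.625
```

The division is safe because cases 3 and 6 are reached only when cell i holds a specified bit, so S_0 + S_1 is at least 1.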
As for the response correlations, we estimate them with the same simulation-based method described in Section 4.1.1.
Construct the Directed Correlation Graph.
The correlation graph constructed in ROBPR is a revised version of the correlation graph in Section 4.1.2. First, this correlation graph is directed. Second, each edge in this correlation graph has two weights (W_p, W_r), where W_p and W_r represent the pattern correlation and the response correlation, respectively. Figure 6 shows an example of such a directed correlation graph constructed from the pattern and response correlations between three scan cells.
Find the Hamiltonian Path with Minimal WTC.
Unlike RORC, which first finds a Hamiltonian cycle and then breaks it to obtain a Hamiltonian path with minimal estimated WTC_out, ROBPR uses an integrated algorithm to directly obtain a Hamiltonian path with minimal estimated WTC_total on the correlation graph. Figure 7 shows the proposed greedy algorithm, which also adds one vertex at a time to form the Hamiltonian path.
When adding the nth unordered vertex V_non to the Hamiltonian path, the algorithm uses a cost function Cost(V_last, V_non, n) to measure the impact of the newly added edge (V_last, V_non) on WTC_total, which is defined in Eq. (3). In the definition of Cost(V_i, V_j, n) in Figure 7, the pattern (response) correlation W_p(V_i, V_j) (W_r(V_i, V_j)) enters through the probability that a pattern-value (response-value) difference occurs between V_i and V_j, that is, one minus that correlation. The factor n in the cost function corresponds to W_PD(n) in the WTC of Eq. (1), and the factor N − 1 − n corresponds to W_RD(n) in Eq. (2).
This cost function guides the algorithm to place more emphasis on the response correlation at the beginning of the ordering process and to gradually shift its emphasis to the pattern correlation in the later stages, which exactly reflects the WTC definition in Eqs. (1) and (2).
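Because Figure 7 itself is not reproduced here, the following is only a sketch of a greedy Hamiltonian-path search driven by a cost of this form. It assumes Cost(V_last, V_non, n) = (1 − W_p)·n + (1 − W_r)·(N − 1 − n) for the edge between path positions n and n + 1, and takes `wp` and `wr` as directed N×N matrices (wp[a][b] is the pattern correlation when cell a drives cell b). Trying every start vertex, and all names and matrix values, are assumptions of this sketch.

```python
def edge_cost(wp, wr, a, b, n, N):
    """Assumed cost of edge a -> b between path positions n and n+1:
    pattern-difference probability weighted by W_PD(n) = n plus
    response-difference probability weighted by W_RD(n) = N - 1 - n."""
    return (1 - wp[a][b]) * n + (1 - wr[a][b]) * (N - 1 - n)


def greedy_path(wp, wr, start):
    """Append one unordered vertex at a time, always the one with the lowest cost."""
    N = len(wp)
    path, total = [start], 0.0
    remaining = set(range(N)) - {start}
    n = 1
    while remaining:
        last = path[-1]
        nxt = min((v for v in range(N) if v in remaining),
                  key=lambda v: edge_cost(wp, wr, last, v, n, N))
        total += edge_cost(wp, wr, last, nxt, n, N)
        path.append(nxt)
        remaining.remove(nxt)
        n += 1
    return path, total


def robpr_order(wp, wr):
    """Try every start vertex (an assumption) and keep the cheapest path."""
    return min((greedy_path(wp, wr, s) for s in range(len(wp))),
               key=lambda t: t[1])[0]


wp = [[0.0, 0.9, 0.1], [0.2, 0.0, 0.8], [0.5, 0.3, 0.0]]   # directed, made up
wr = [[0.0, 0.4, 0.6], [0.4, 0.0, 0.9], [0.6, 0.9, 0.0]]
print(robpr_order(wp, wr))  # [0, 1, 2]
```

Early edges (small n) are charged mostly for response differences and late edges mostly for pattern differences, matching the shift of emphasis described above.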
Experimental Results
We conduct experiments for ROBPR on the same benchmark circuits and test patterns as in Section 4.2. Table IX compares the results of ROBPR with those of RORC, which considers only the response correlation during reordering. On average, ROBPR generates 34.87% fewer scan-in transitions but only 6.33% more scan-out transitions than RORC. This significant reduction in scan-in transitions demonstrates the advantage of taking the pattern correlations into account during the ordering process. It also shows the effectiveness of the pattern-correlation estimation listed in Table VIII. The average reduction in total scan-shift transitions achieved by ROBPR is 12.38%. The 7.52% reduction in the number of peak transitions is likewise a by-product of the reduction in total scan-shift transitions. The overall result again demonstrates the benefit of considering pattern correlations and response correlations simultaneously during reordering. In addition, the reported runtime of ROBPR is almost the same as that of RORC, even though ROBPR needs to collect additional information for the pattern-correlation calculation. This is because the proposed algorithm in step 3 (Figure 7) directly finds the Hamiltonian path with minimal WTC_total, saving the step of breaking a Hamiltonian cycle to obtain the final ordering (step 4 in RORC).
Table X further compares ROBPR with another scan-cell reordering scheme [Sinanoglu et al. 1998], which can also reorder the scan cells based on a test set with nonfilled don't-care bits. As the results show, RORC generates 42.69% fewer scan-in transitions and 68.15% fewer scan-out transitions than Sinanoglu et al. [1998]. The total number of scan-shift transitions and the number of peak transitions generated by RORC are 64.18% and 31.24% lower, respectively, than those generated by Sinanoglu et al. [1998]. This is because the scheme in Sinanoglu et al. [1998] may not be able to accurately predict the response correlations. Also, because Sinanoglu et al. [1998] does not consider the impact of the input-pattern filling technique, it may fail to minimize the scan-in transitions as effectively as RORC does.
SCAN-CELL REORDERING USING SCAN-DATA INVERSION
To reduce potential signal transitions, both RORC and ROBPR place scan cells with a high response (or pattern) correlation next to each other. A high correlation between two scan cells represents a high probability that their response (or pattern) values are the same. Conversely, a low correlation between two scan cells means that their response (or pattern) values are most likely inverse to each other. In such a low-correlation case, if we invert the value of one cell before it propagates to the scan-in port of the other cell, the low correlation becomes a high correlation and helps minimize scan-shift transitions. In this section, we introduce a scan-cell reordering scheme named SIRO (Scan-data-Inversion ReOrdering). SIRO selectively applies the inverse connection between two scan cells and hence can take advantage of both high and low correlations between responses and patterns. Some previous works, such as Sinanoglu et al. [1998, 2003], have also used this inverse connection between adjacent cells to minimize the number of scan-shift transitions. Figure 8 shows the overall flow of SIRO, which consists of the following four steps.
Detailed Steps of Reordering Scheme
Obtain Inverse Pattern and Response Correlations.
In SIRO, two types of connections can be made when connecting a scan cell i to its next scan cell j. One is the direct connection, which connects the output Q of i to the scan-in port SI of j. The other is the inverse connection, which connects the inverted output Q̄ of i to the scan-in port SI of j. The discussion of RORC and ROBPR already covered how to estimate the response and pattern correlations for the direct connection. The focus here is to estimate the response correlations and pattern correlations for the inverse connection.
The response correlation for an inverse connection can be estimated simply as 1 minus the response correlation calculated for a direct connection. Estimating the pattern correlations for an inverse connection is more complicated, however, because MT-fill can adjust its filling of don't-care bits according to whether the connection is inverse or direct. We first define the inverse pattern correlation between cell i and cell j for pattern k as $IPC_k(i, j)$, which is the probability that the pattern values on these two cells are the same when cell i is inversely connected to cell j. Table XI shows the inverse pattern correlation for different combinations of pattern values between cell i and cell j after MT-fill. The derivation of Table XI is similar to that of Table VIII. The only difference is that, for an inversely connected cell pair, a transition is generated when the specified values of both cells are the same. $S_0$ and $S_1$ are defined as in Table VIII.
7.1.2 Construct the Directed Correlation Graph.
The correlation graph constructed in SIRO is a revised version of the correlation graph in ROBPR. The difference is that each edge in this correlation graph has two sets of weights: a noninverse set ($W_p$, $W_r$) and an inverse set ($IW_p$, $IW_r$). $W_p$ and $W_r$ represent the direct pattern correlation and response correlation as calculated in ROBPR. $IW_p$ and $IW_r$ represent the inverse pattern correlation and response correlation as calculated in the previous step. Figure 9 shows an example of constructing such a directed correlation graph from the inverse and noninverse correlation sets between three scan cells.
Fig. 9. Construction of the directed graph based on inverse and noninverse correlation sets.
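Below is a minimal sketch of the two weight sets each directed edge carries. It assumes the inverse pattern correlation $IW_p$ has already been computed, for example by averaging $IPC_k(i, j)$ over the patterns using Table XI. Table XI is not reproduced here, so $IW_p$ is passed in as an input. Only the case where both bits are specified is coded for $IPC_k$. It follows the rule above: an inversely connected pair produces a transition when its two specified values are equal. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Edge = Tuple[int, int]


@dataclass(frozen=True)
class EdgeWeights:
    w_p: float   # direct (noninverse) pattern correlation
    w_r: float   # direct (noninverse) response correlation
    iw_p: float  # inverse pattern correlation (supplied by the caller)
    iw_r: float  # inverse response correlation


def ipc_both_specified(a: str, b: str) -> float:
    """IPC_k(i, j) when both bits of pattern k are specified ('0' or '1').

    Through an inverse connection, differing values arrive as equal values
    (no transition), while equal values produce a transition.
    """
    return 1.0 if a != b else 0.0


def build_directed_graph(w_p: Dict[Edge, float], w_r: Dict[Edge, float],
                         iw_p: Dict[Edge, float]) -> Dict[Edge, EdgeWeights]:
    """Attach a noninverse set (W_p, W_r) and an inverse set (IW_p, IW_r)
    to every directed edge; IW_r is estimated as 1 - W_r."""
    return {
        edge: EdgeWeights(w_p=w_p[edge], w_r=w_r[edge],
                          iw_p=iw_p[edge], iw_r=1.0 - w_r[edge])
        for edge in w_r
    }


graph = build_directed_graph(
    w_p={(0, 1): 0.7, (1, 0): 0.7},
    w_r={(0, 1): 0.2, (1, 0): 0.2},
    iw_p={(0, 1): 0.4, (1, 0): 0.4},
)
print(graph[(0, 1)])
```

Keeping both sets on one edge lets the later selection step compare a direct and an inverse connection for the same cell pair without recomputing anything.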
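The cell with the highest cost-function value is selected. It is then connected to the previously ordered cell either directly or inversely, depending on whether that highest value comes from the direct cost function $Cost(V_i, V_j, n)$ or from the inverse cost function $Cost_{inv}(V_i, V_j, n)$.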
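The following Python sketch shows one way the greedy selection could choose between a direct and an inverse connection. It assumes both the direct cost and the inverse cost $Cost_{inv}$ take the form of a correlation score $n \cdot p + (N-1-n) \cdot r$. The direct score uses $W_p, W_r$ and the inverse score uses $IW_p, IW_r$, and the highest score wins. The exact expressions in Figure 7 and for $Cost_{inv}$ may differ. All names and weights are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]
# (W_p, W_r, IW_p, IW_r) per directed edge, all read as correlations.
Weights = Dict[Edge, Tuple[float, float, float, float]]


def score(n: int, num_cells: int, p: float, r: float) -> float:
    """Assumed correlation score of an edge placed as the nth edge:
    pattern term weighted by n, response term by N - 1 - n (higher is better)."""
    return n * p + (num_cells - 1 - n) * r


def siro_greedy(w: Weights, num_cells: int, start: int = 0) -> Tuple[List[int], List[bool]]:
    """Return the cell order and, for each link, whether it is an inverse connection."""
    order: List[int] = [start]
    inverted: List[bool] = []
    remaining = set(range(num_cells)) - {start}
    n = 1
    while remaining:
        last = order[-1]
        best: Optional[Tuple[float, int, bool]] = None  # (score, cell, is_inverse)
        for v in sorted(remaining):
            w_p, w_r, iw_p, iw_r = w[(last, v)]
            for is_inv, s in ((False, score(n, num_cells, w_p, w_r)),
                              (True, score(n, num_cells, iw_p, iw_r))):
                if best is None or s > best[0]:
                    best = (s, v, is_inv)
        assert best is not None
        _, v, is_inv = best
        order.append(v)
        inverted.append(is_inv)
        remaining.remove(v)
        n += 1
    return order, inverted


w: Weights = {
    (0, 1): (0.5, 0.1, 0.6, 0.9),  # low response correlation: inverse set scores higher
    (0, 2): (0.5, 0.6, 0.4, 0.4),
    (1, 2): (0.9, 0.5, 0.1, 0.5),
    (2, 1): (0.3, 0.5, 0.3, 0.5),
}
print(siro_greedy(w, 3))  # ([0, 1, 2], [True, False])
```

In the example, edge (0, 1) has a low response correlation (0.1), so its inverse set scores higher and that link becomes an inverse connection. Edge (1, 2) keeps the direct connection because its direct pattern correlation dominates at $n = 2$.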
Determine the Scan-In Patterns Based on Derived Cell Ordering.
RORC and ROBPR directly use the traditional MT-fill to determine patterns from the derived cell ordering. SIRO, in contrast, needs a different MT-fill rule to handle inverse connections. First, the value of a specified bit is inverted if an odd number of inverse connections is encountered before that bit, and it remains unchanged if an even number is encountered. Next, the don't-care bits between the modified specified bits are filled using the traditional MT-fill. Figure 10 shows an example of the revised MT-fill handling inverse connections. In Figure 10, the scan chain contains six scan cells, and three inversions occur on the cell pairs (C1, C2), (C4, C5), and (C5, C6). During the scan-in operation, C2, C3, C4, and C6 pass through an odd number of inversions (1 or 3), so their specified values are inverted before MT-fill. MT-fill is then applied to fill all the don't-care bits according to the modified specified bits. Figure 11 shows the inverse connections of the scan cells for the example in Figure 10. It differs from the traditional architecture in that Q̄, instead of Q, is connected to SI wherever an inversion occurs.
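The revised filling rule can be sketched in a few lines of Python. The sketch assumes cell 0 (C1) is nearest the scan-in port and uses `'X'` for don't-care bits. MT-fill is implemented as adjacent fill: each don't-care copies the nearest specified bit before it, and leading don't-cares copy the first specified bit. This is one common way to realize minimum-transition filling. The test cube `1X0XX1` is made up, since the text does not give the values in Figure 10. The inversion positions match the example, and the function names are hypothetical.

```python
from typing import Set


def mt_fill(bits: str) -> str:
    """Adjacent (minimum-transition) fill: each 'X' copies the closest specified
    bit before it; leading X's copy the first specified bit; an all-X string
    becomes all zeros."""
    prev = next((b for b in bits if b != 'X'), '0')
    out = []
    for b in bits:
        if b == 'X':
            out.append(prev)
        else:
            prev = b
            out.append(b)
    return ''.join(out)


def revised_mt_fill(bits: str, inverted_links: Set[int]) -> str:
    """bits[c] is the cube value for cell c (cell 0 is nearest the scan-in port);
    link k is the connection from cell k to cell k + 1."""
    modified, parity = [], 0
    for c, b in enumerate(bits):
        if c > 0 and (c - 1) in inverted_links:
            parity ^= 1  # one more inversion on the way to cell c
        if b != 'X' and parity:
            b = '1' if b == '0' else '0'  # odd number of inversions: invert the bit
        modified.append(b)
    return mt_fill(''.join(modified))


# Six-cell example: inversions on (C1,C2), (C4,C5), (C5,C6) -> links 0, 3, 4.
print(revised_mt_fill("1X0XX1", {0, 3, 4}))  # 111110
```

The parity computed for each cell reproduces the text: cells 1, 2, 3, and 5 (C2, C3, C4, C6) see an odd number of inversions.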
Experimental Results
We conduct experiments for SIRO on the same benchmark circuits and test patterns as those used in Section 4.2. Table XII compares the results of SIRO with those of ROBPR, which considers only the noninverse pattern and response correlations during reordering. As Table XII shows, SIRO on average generates 1.23% fewer total scan-shift transitions than ROBPR, with almost the same runtime. However, even though SIRO generates no more scan-shift transitions than ROBPR for any circuit, this 1.23% average reduction is smaller than we expected before the experiment.
Further analysis showed that each circuit actually uses only a small number of inverse connections (listed in the last column of Table XII). For s35932 and b17, SIRO uses no inverse connections at all. This low usage means that the low correlations between scan cells in those benchmark circuits are not low enough. As a result, the corresponding inverse correlations cannot produce a high score for the cost function $Cost_{inv}(V_i, V_j, n)$ used in the greedy algorithm of step 3. The distribution of response correlations reported in Table I further supports this argument: for s38584, 2.05% of the response correlations are larger than 0.75, but only 1.09% are smaller than 0.25. The trend is even more pronounced for b17 (Table II), where 1.58% of the response correlations are larger than 0.75 but only 0.004% are smaller than 0.25. Table XIII lists the probability distributions of response correlations for each benchmark circuit. Overall, SIRO further reduces the scan-shift transitions for 7 of the 9 benchmark circuits.
From the preceding experiments, we conclude that inverse connections can indeed help reduce scan-shift transitions, since even a small number of them achieves a 1.23% average reduction in total scan-shift transitions. However, the size of this reduction is determined by the ratio of low response or pattern correlations to high ones, which is highly circuit dependent. The reduction could be more significant for circuits where this ratio is higher.
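As a rough illustration of the circuit-dependent ratio mentioned above, the sketch below summarizes a list of correlations in three numbers: the fraction above 0.75, the fraction below 0.25, and the low-to-high ratio. The thresholds follow Tables I and II. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def correlation_tails(corr: Sequence[float], low: float = 0.25,
                      high: float = 0.75) -> Tuple[float, float, float]:
    """Return (fraction above `high`, fraction below `low`, low/high ratio)."""
    if not corr:
        raise ValueError("need at least one correlation value")
    frac_high = sum(c > high for c in corr) / len(corr)
    frac_low = sum(c < low for c in corr) / len(corr)
    if frac_high == 0:
        # No high correlations: the ratio is unbounded unless there are no low ones either.
        ratio = float("inf") if frac_low > 0 else 0.0
    else:
        ratio = frac_low / frac_high
    return frac_high, frac_low, ratio


print(correlation_tails([0.1, 0.5, 0.8, 0.9]))  # (0.5, 0.25, 0.5)
```

With the percentages reported above, the low-to-high ratio is roughly 1.09/2.05 ≈ 0.53 for s38584 and 0.004/1.58 ≈ 0.0025 for b17. The near-zero ratio for b17 is consistent with SIRO using no inverse connections in that circuit.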
SCAN CELL REORDERING CONSIDERING BOTH POWER AND ROUTING FACTORS
All the aforesaid reordering schemes, namely RORC, ROBPR, and SIRO, focus on reducing the power consumption during scan-based testing. However, they may produce long scan-path wire lengths, because the connection of scan cells is determined by the cells' response or pattern correlations, not by their physical distance. In this section, we propose a scan-cell reordering scheme named PRORO (Power and Routing-Overhead ReOrdering), which combines ROBPR with routing considerations. The same idea can be applied to SIRO as well.
Detailed Steps of Reordering Considering Both Power and Routing Overhead
In PRORO, we reorder the scan cells after placement is done. Based on the placement result, we use the Manhattan distance between two scan cells to approximate the wire length between them. When selecting the next ordered scan cell, we incorporate this approximated wire length into the cost function, which limits the routing overhead. In our implementation, placement is done by a commercial back-end tool, and the position of each scan cell is obtained by parsing its DEF file. PRORO consists of almost the same five steps as ROBPR, with modifications to steps 2 and 3. This subsection therefore details only steps 2 and 3; the remaining steps follow ROBPR.
Construct a Directed Multiple-Weight Graph Based on Response/Pattern Correlations and Routing Overhead.
As mentioned, the Manhattan distance between two cells represents their routing overhead. To make the routing overhead comparable with the scan-shift-power term of the cost function, we normalize two cells' routing overhead (their Manhattan distance) to a value between 0 and 1, defined as the routing weight between the two cells. The longest distance between any two cells is assigned a routing weight of 1, and the shortest distance a routing weight of 0.
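The normalization can be written as $W_l(i, j) = \frac{d(i, j) - d_{\min}}{d_{\max} - d_{\min}}$, where $d$ is the Manhattan distance. The text fixes only the endpoints (0 and 1), so the linear interpolation between them is an assumption. A minimal Python sketch follows. The coordinates stand in for placement data that would come from the DEF file, and the names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Tuple

Point = Tuple[float, float]
Edge = Tuple[str, str]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def routing_weights(pos: Dict[str, Point]) -> Dict[Edge, float]:
    """Routing weight W_l for every ordered pair of cells: the shortest
    Manhattan distance maps to 0, the longest to 1, linearly in between."""
    if len(pos) < 2:
        return {}
    dist = {(u, v): manhattan(pos[u], pos[v]) for u, v in permutations(pos, 2)}
    d_min, d_max = min(dist.values()), max(dist.values())
    span = d_max - d_min
    # If all distances are equal, every pair gets weight 0.
    return {e: ((d - d_min) / span if span > 0 else 0.0) for e, d in dist.items()}


# Hypothetical placement coordinates for three scan cells.
pos = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (1.0, 1.0)}
print(routing_weights(pos))
```

Manhattan distance is symmetric, so both directions of an edge receive the same routing weight even though the graph itself is directed.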
The directed graph constructed in this section is a revised version of the directed graph introduced in ROBPR (step 2 in Section 6.1.2). Each edge in the graph carries three weights ($W_p$, $W_r$, $W_l$), which represent the pattern correlation, the response correlation, and the routing weight between the two cells, respectively. Figure 12 shows an example of constructing such a directed graph from the correlations and routing weights between three scan cells.
Find the Hamiltonian Path with the Minimum WTC.
We use a greedy TSP algorithm similar to the one in Figure 7, except that its cost function, $C_T(V_i, V_j, n)$, is modified to control the trade-off between scan-shift power and routing overhead. The parameter β in $C_T(V_i, V_j, n)$ is called the optimization factor, and it controls this trade-off. β ranges from 0 to 1. As β increases, the TSP algorithm focuses more on reducing routing overhead; as β decreases, it focuses more on reducing scan-shift transitions. Figure 13 shows the details of this TSP algorithm.
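The exact form of $C_T$ is not reproduced in this section, so the Python sketch below uses a hypothetical stand-in. It is a convex combination $(1-\beta)\,\hat{T} + \beta\,W_l$, where $\hat{T}$ is the expected weighted transition count of the new edge divided by $N-1$, so both terms lie in $[0, 1]$. The sketch only illustrates how β shifts the greedy choice between power and routing; it is not the $C_T$ of this article. Names and weights are made up.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def combined_cost(w_p: float, w_r: float, w_l: float,
                  n: int, num_cells: int, beta: float) -> float:
    """Hypothetical stand-in for C_T: a beta-weighted mix of normalized
    expected weighted transitions and the routing weight W_l (lower is better)."""
    # n + (N - 1 - n) = N - 1, so dividing by N - 1 keeps the power term in [0, 1].
    power = (n * (1.0 - w_p) + (num_cells - 1 - n) * (1.0 - w_r)) / (num_cells - 1)
    return (1.0 - beta) * power + beta * w_l


def proro_greedy(corr: Dict[Edge, Tuple[float, float]], w_l: Dict[Edge, float],
                 num_cells: int, beta: float, start: int = 0) -> List[int]:
    """Greedy path construction that appends the cell with the lowest combined cost."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    order = [start]
    remaining = set(range(num_cells)) - {start}
    n = 1
    while remaining:
        last = order[-1]

        def key(v: int) -> Tuple[float, int]:
            w_p, w_r = corr[(last, v)]
            # Ties are broken by the smaller cell index.
            return combined_cost(w_p, w_r, w_l[(last, v)], n, num_cells, beta), v

        nxt = min(remaining, key=key)
        order.append(nxt)
        remaining.remove(nxt)
        n += 1
    return order


corr = {(0, 1): (1.0, 1.0), (0, 2): (0.0, 0.0), (1, 2): (0.5, 0.5), (2, 1): (0.5, 0.5)}
w_l = {(0, 1): 1.0, (0, 2): 0.0, (1, 2): 0.5, (2, 1): 0.5}
print(proro_greedy(corr, w_l, 3, beta=0.0), proro_greedy(corr, w_l, 3, beta=1.0))
```

With β = 0, the path follows the high-correlation edge (0, 1). With β = 1, it follows the shortest edge (0, 2).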
Experimental Results
The following experiments compare PRORO, run with different optimization factors, against ROBPR and against the scan-cell reordering scheme supported by a commercial back-end tool [Cadence 2006]. ROBPR focuses only on minimizing scan-shift transitions, whereas Cadence's [2006] scan-cell reordering focuses only on minimizing the routing overhead of scan paths after placement is done. We first use ROBPR to obtain a scan-cell ordering and apply the APR tool in Cadence [2006] to obtain its placement. Both PRORO and Cadence's [2006] scan-cell reordering are then performed on this ROBPR placement. Cadence's [2006] scan-cell reordering is performed with the command "scanreorder" in Cadence [2006]. A TSMC 0.18 μm CMOS technology with five metal layers is used in the experiments.
Table XIV first lists the total number of scan-shift transitions generated by the different scan-cell reordering schemes. For ease of comparison, Table XIV normalizes each scheme's total by dividing it by that of ROBPR, which is expected to generate the fewest scan-shift transitions in this experiment. Table XV lists the estimated scan-path wire length (in μm) generated by the different schemes, measured as the sum of the Manhattan distances between every pair of adjacent scan cells. Like Table XIV, Table XV normalizes each scheme's scan-path wire length by dividing it by that of Cadence's [2006] reordering scheme, which is expected to generate the shortest scan paths in this experiment. Table XVI further lists the total wire length after detailed routing (including the routing for both the scan paths and the CUT) for each reordering scheme.
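The wire-length estimate and the normalization used in Tables XIV and XV amount to two operations: a sum of Manhattan distances along the chain, and a division by a baseline scheme's value. A minimal Python sketch follows, with hypothetical names and coordinates. It ignores the connections to the scan-in and scan-out pins, which the text does not mention.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def scan_path_length(order: List[str], pos: Dict[str, Point]) -> float:
    """Estimated scan-path wire length: sum of Manhattan distances between
    adjacent cells in the chain (scan-in/scan-out pin connections ignored)."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in zip(order, order[1:]))


def normalize(results: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Divide each scheme's figure by the baseline scheme's figure."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


pos = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (1.0, 1.0)}
lengths = {
    "short": scan_path_length(["A", "C", "B"], pos),  # 2 + 5 = 7
    "long": scan_path_length(["A", "B", "C"], pos),   # 7 + 5 = 12
}
print(normalize(lengths, "short"))
```

The same `normalize` helper applies to the scan-shift transition counts in Table XIV, with ROBPR as the baseline instead.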
Table XIV shows that minimizing only the scan-path wire length, as Cadence's [2006] reordering scheme does, generates 2.4 times as many scan-shift transitions as ROBPR, which minimizes only scan-shift transitions. Conversely, Table XV shows that ROBPR requires 3.3 times the scan-path wire length of Cadence's [2006] reordering scheme. In practice, however, far more wire length is spent on routing the CUT than on routing the scan paths. Thus, after detailed routing, the total wire length of ROBPR is only 1.26 times that of Cadence's [2006] reordering scheme (Table XVI).
The results in Tables XIV, XV, and XVI also show that PRORO can control the trade-off between scan-shift transitions and scan-path wire length through its optimization factor. With a larger optimization factor, PRORO further reduces the scan-path wire length but generates more scan-shift transitions. With an optimization factor of 0.5, PRORO generates 12% more scan-shift transitions than ROBPR but requires only 7% more total wire length after detailed routing. This routing overhead is acceptable as long as the design is not heavily routing-congested.
Another reason to sacrifice scan-path wire length for scan-shift power is that, in advanced process technologies, hold-time violations on scan paths occur more often than setup-time violations. Designers even intentionally lengthen some scan paths to meet hold-time constraints instead of applying scan-cell reordering to shorten them. The motivation for reducing scan-path wire length may therefore not be as strong as in older process technologies.
CONCLUSIONS
In this article, we first presented a scan-cell reordering technique that reduces scan-shift transitions based on response correlations while preserving the don't-care bits in the test patterns for a later minimization of scan-in transitions using MT-fill (Section 4). Second, we considered both response correlations and pattern correlations during cell reordering to further reduce the scan-in transitions generated by MT-fill (Section 6). Next, we used the inverse connection between scan cells to turn a low correlation into a high one and developed a corresponding scan-cell reordering scheme that considers these inverse correlations (Section 7). Finally, we incorporated the routing overhead of scan paths into the cost function of our scan-cell reordering, so that a user-specified factor controls the trade-off between scan-path routing overhead and the number of scan-shift transitions. In addition, we proposed a postprocess pattern-reordering scheme to minimize the in-between transitions (Section 5). A series of experiments compared the proposed schemes with a previous reordering scheme [Bonhomme et al. 2002] and a commercial tool's reordering scheme [Cadence 2006]. The experimental results demonstrated the effectiveness and efficiency of each proposed scan-cell reordering scheme.
