Abstract
Introduction
Scan design is a widely used DFT technique that can guarantee high fault coverage for a complex design by enhancing its controllability and observability [1] . When the scan design is used to shift test data, however, a large number of signal transitions may occur along the scan paths, which induces even more signal transitions in the circuit under test (CUT). Therefore, with the scan design, the CUT consumes much more power in its test mode than in its functional mode [2] . This excessive power consumption during scan-based testing may result in physical damage or reliability degradation to the CUT, and in turn decrease the yield and product lifetime [3] . As the number of scan cells keeps growing in modern designs, this increasing power consumption has become one of the biggest barriers to effective scan-based testing.
A common practice to lower the power consumption during scan-based testing is to reduce the number of signal transitions on the scan cells, which can be classified into the following three types: (1) the capture transition, generated by the difference between a scan cell's scan-in pattern value and its captured response value, (2) the scan-out transition, generated by the difference between the scan-out responses of two adjacent scan cells, and (3) the scan-in transition, generated by the difference between the scan-in pattern values of two adjacent scan cells. The first type is associated with the capture power, and the last two types are associated with the scan-shift power.
In order to reduce the capture transitions, complex ATPGs [4] [5] [6] have been proposed to generate test-pattern vectors that have a minimal Hamming distance to their corresponding test-response vectors. Because the don't-care bits in their test cubes are fully specified for minimizing the capture transitions, these ATPGs preclude further test compaction or compression, and hence may result in a larger test set.
Other methods utilize the don't-care bits to minimize the scan-in transitions for a given test set [7] [8] [9] [10] . [7] proposed a don't-care-filling technique, named MT-fill, which guarantees that the scan-in transitions generated by its filled patterns are minimized for the given test set. The methods in [7] [8] [9] reduced the test power as well as the test data volume based on built-in decompression hardware. [10] added XOR gates or inverters along the scan paths to minimize the scan-in transitions. However, none of [7] [8] [9] [10] considered the scan-out transitions simultaneously.
Another approach to reducing the scan-shift power is to partition the scan cells into multiple groups and activate only one group at a time during the scan-shift cycles [11] [12] [13] [14] [15] [16] . This limits the concurrent transitions to a small portion of the CUT. The partition methods require special control architectures for the scan designs, such as gated clocks [11] , a central control unit for each group's clock signal [12] [13] , or specialized scan cells along with a multiphase generator [15] . [16] further minimizes the capture power by capturing responses only for certain selected groups of scan cells. It requires a customized ATPG and discards a significant portion of the responses.
Methods in [17] [18] [19] [20] change the order of scan cells along the scan paths to minimize both scan-in and scan-out transitions based on given test patterns and responses. This scan-cell-reordering technique saves scan-shift power, but sacrifices the opportunity of optimizing the wire length of scan paths during the APR stage [21] [22] . One important reason for making this tradeoff is that, for advanced process technologies, the violation of hold-time constraints on scan paths occurs more often than the violation of setup-time constraints. Hence, the need to minimize the wire length of scan paths is not as urgent as the need to minimize test power. However, the existing scan-chain-reordering techniques [17] [18] [19] [20] need to obtain the exact test patterns and responses in advance. As a result, no don't-care bits can be utilized for a further reduction of scan-in transitions or test data volume, such as in [7] [8] [9] [10] .
In this paper, we attempt to develop a scan-cell-reordering scheme which can minimize the scan-out transitions while preserving the don't-care bits in the test cubes for a later optimization of scan-in transitions using MT-fill [7] . To achieve this goal, we first need to predict the correlation between the response values before specifying the don't-care bits. This response correlation is an index of the possible scan-out transitions between scan cells and can be used as guidance for the reordering process (Section 4). Next, we consider the impact of scan-cell reordering on the result of MT-fill and simultaneously optimize the scan-in and scan-out transitions (Section 5). Last, a comparison between our power-driven scan-cell reordering and a routing-driven scan-cell reordering is provided based on experiments (Section 6). The experimental results demonstrate the effectiveness and the superiority of the proposed reordering scheme over a previous scan-cell reordering scheme [17] .
Motivation
During the scan-based testing, the total power consumption of the CUT is highly correlated with the total number of signal transitions on the scan cells [7] . In this paper, we use the number of signal transitions on scan cells as the power model of the whole CUT. The proposed scan-cell-reordering scheme focuses on reducing the total scan-shift power, i.e., reducing the total scan-shift transitions. The capture power is not considered in the proposed scheme.
From the discussion in Sec. 1, the scan-in transitions can be minimized by wisely filling the don't-care bits of a test set once the scan-cell order in the scan paths is given [7] . This reduction can be more significant as the percentage of don't-care bits increases. Therefore, our scan-cell reordering scheme attempts to first minimize the scan-out transition count without specifying the don't-care bits, leaving the don't-care bits for a later minimization of scan-in transitions, such as MT-fill [7] . However, before the don't-care bits are specified, the values of some responses may not be obtainable, implying that no explicit information about scan-out transitions can be used during the scan-cell reordering process.
We use a simple experiment (reported in Table 1 ) to show that certain pairs of scan cells tend to have the same response value in most cases of the random don't-care filling. Thus the reordering scheme can avoid the possible scan-out transitions by connecting those correlated pairs of scan cells next to each other. We first define this tendency between two scan cells as the response correlation, which is the probability that the two scan cells have the same response value by a random fill of don't-care bits.
In the experiment, we use a commercial tool [23] to generate stuck-at-fault patterns with don't-care bits. We then collect the statistics of the response correlation between any two scan cells by randomly filling the don't-care bits and simulating the corresponding responses one million times. Table 1 lists the range of response correlations (Columns 1 and 4), the number of scan-cell pairs whose sampled response correlation falls in the range (Columns 2 and 5), and the corresponding percentage of the total scan-cell pairs (Columns 3 and 6), for the largest ISCAS benchmark circuit s38584. As the results show, while the majority of the scan-cell pairs have a response correlation around 0.5, 21595 scan-cell pairs (2%) still have a response correlation higher than 0.75. Those 21595 scan-cell pairs could form a fair-sized solution space when reordering the 1452 scan cells in s38584. This experimental result indicates that, even without specifying the don't-care bits, the response correlations are not purely random. The same trend can be observed on other ISCAS and ITC benchmark circuits as well.
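The sampling procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `"X"` marker for don't-care bits, and the three-cell `toy_simulate` circuit are hypothetical and stand in for the commercial ATPG and simulation flow used in the experiment.

```python
import random
from itertools import combinations

def random_fill(cube, rng):
    """Replace every don't-care bit ('X') with a random 0 or 1."""
    return [rng.randint(0, 1) if bit == "X" else bit for bit in cube]

def sample_response_correlation(cubes, simulate, num_cells, trials, seed=0):
    """Estimate, for every pair of scan cells, the probability that the two
    cells capture the same response value under a random don't-care fill."""
    rng = random.Random(seed)
    same = {pair: 0 for pair in combinations(range(num_cells), 2)}
    total = 0
    for _ in range(trials):
        for cube in cubes:
            response = simulate(random_fill(cube, rng))
            for a, b in same:
                same[(a, b)] += response[a] == response[b]
            total += 1
    return {pair: count / total for pair, count in same.items()}

# Hypothetical 3-cell circuit: cells 0 and 1 capture the same AND output,
# cell 2 captures an XOR output.
def toy_simulate(p):
    return [p[0] & p[1], p[0] & p[1], p[1] ^ p[2]]

corr = sample_response_correlation(
    [[1, "X", "X"], ["X", 0, "X"]], toy_simulate, num_cells=3, trials=1000
)
```

In this toy setup cells 0 and 1 always agree, so their sampled correlation is 1, while the correlation involving cell 2 depends on the random fill.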
Problem Formulation
The problem of the scan-cell reordering for scan-shift power reduction is first defined as follows:
Input:
• A circuit under test with scan cells inserted, and
• ATPG test patterns with don't care bits (X's).
Output:
• An ordering of scan cells, and
• Test patterns with all don't-care bits specified by MT-Fill based on the derived cell ordering.
Objective:
• Generate the minimum number of scan-shift transitions for the given test patterns.
In this paper, the proposed scan-cell-reordering scheme only discusses the case of a single scan chain in a design. However, the concept of the proposed reordering scheme could be extended to multiple-scan-chain architectures as well.
Given a test pattern and the scan-cell order for the scan chain, we can use the weighted transition count (WTC) [7] to calculate the number of scan-in and scan-out transitions. The WTC considers not only the value difference between the patterns or responses of two adjacent scan cells, but also the number of transitions that this value difference generates during the scan-shift cycles. Equations 1 and 2 define WTC_in(i) and WTC_out(i), which calculate the scan-in transitions and scan-out transitions generated by the ith pattern, respectively.

$$WTC_{in}(i) = \sum_{j=1}^{s-1} PD(j) \times WPD(j) \qquad (1)$$

$$WTC_{out}(i) = \sum_{j=1}^{s-1} RD(j) \times WRD(j) \qquad (2)$$

In Equations 1 and 2, s denotes the total number of scan cells; PD(j) (RD(j)) denotes the value difference between the scan-in pattern (scan-out response) of the jth cell and the (j + 1)th cell; WPD(j) denotes the number of scan-in transitions generated by the pattern-value difference PD(j) when shifting in the corresponding pattern values from the scan input to the (j + 1)th cell; WRD(j) denotes the number of scan-out transitions generated by the response-value difference RD(j) when shifting out the responses from the jth cell to the scan-chain output.
In the WTC calculation, WPD(j) = j, implying that a pattern-value difference generates more scan-in transitions if this value difference occurs closer to the scan-chain output. On the contrary, WRD(j) = s − 1 − j, implying that a response-value difference generates more scan-out transitions if this value difference occurs closer to the scan-chain input. Figure 1 shows an example of the WTC computation on a 6-cell scan chain, assuming that three value differences occur between cells (C1, C2), (C2, C3), and (C5, C6) for both the test pattern and its response.
Equation 3 calculates the total number of transitions, WTC_total, generated by a given test set with m test patterns.

$$WTC_{total} = \sum_{i=1}^{m} \left( WTC_{in}(i) + WTC_{out}(i) \right) \qquad (3)$$
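As a rough illustration of the WTC computation, the sketch below uses the weights stated in the text, WPD(j) = j and WRD(j) = s − 1 − j, with cells numbered 1 to s from the scan input. The function names and the list encoding are assumptions made for this example. The sample bit vector places value differences at (C1, C2), (C2, C3), and (C5, C6), as in the 6-cell example, but the resulting counts follow only from the stated weights and are not taken from Figure 1.

```python
def wtc_in(pattern):
    """Weighted scan-in transitions of one fully specified pattern.
    pattern[j - 1] is the value of cell j; cell 1 is nearest the scan input."""
    s = len(pattern)
    return sum(j * (pattern[j - 1] != pattern[j]) for j in range(1, s))

def wtc_out(response):
    """Weighted scan-out transitions of one response, with WRD(j) = s - 1 - j."""
    s = len(response)
    return sum((s - 1 - j) * (response[j - 1] != response[j]) for j in range(1, s))

def wtc_total(patterns, responses):
    """Total weighted scan-shift transitions over a test set."""
    return sum(wtc_in(p) + wtc_out(r) for p, r in zip(patterns, responses))

bits = [0, 1, 0, 0, 0, 1]   # differences at (C1,C2), (C2,C3), (C5,C6)
print(wtc_in(bits), wtc_out(bits), wtc_total([bits], [bits]))   # 8 7 15
```

The example shows the asymmetry of the two weights: the difference at (C5, C6) dominates the scan-in count, while the differences near the scan input dominate the scan-out count.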
4. Scan-cell Reordering Considering Only Response Correlation
Detailed Steps of Reordering Scheme
We introduce a scan-cell reordering scheme, named RORC (ReOrdering considering Response Correlation), which first reduces the scan-out transitions by maximizing the response correlations between adjacent cells while preserving all don't-care bits in the test patterns. Then, the scan-in transitions are further minimized by specifying the don't-care bits with MT-fill. Figure 2 shows the flow of RORC, which consists of five main steps. The details of each step are described in the following subsections.
Obtain Response Correlations
A simulation-based method is applied to sample the response correlations between each pair of scan cells. However, the filling of don't-care bits in RORC is not purely random, since the MT-fill technique will be applied later in RORC. Therefore, in this step, we randomly generate the scan-cell ordering multiple times, specify the don't-care bits using MT-fill based on each generated scan-cell ordering, and then collect the response correlations by simulating the filled patterns. The number of randomly generated cell orderings used in simulation determines the accuracy of the sampled response correlations. We use the following empirical equation to determine this number of randomly generated cell orderings.
where G Counts and P Counts denote the circuit gate count and the number of given test patterns, respectively.
Construct the Correlation Graph
After obtaining the response correlations, we construct an undirected graph, named the response-correlation graph, in which a vertex represents a scan cell and the weight of each edge represents the response correlation between the two vertices it connects. Because any pair of scan cells could be placed next to each other, the response-correlation graph is a complete graph. Figure 3 shows an example of constructing a response-correlation graph with four scan cells.
Find a Maximal Hamiltonian Cycle
A higher response correlation between two scan cells implies a lower probability that a response-value difference occurs between the two cells. Based on this concept, the maximum Hamiltonian cycle on the response-correlation graph implies a scan-cell ordering on which the number of value differences generated between adjacent cells is statistically minimum. Finding the maximum Hamiltonian cycle is known as the traveling salesman problem (TSP), which is NP-complete. We use a greedy TSP algorithm, which orders one vertex at a time to form the cycle. The selection criterion for the next ordered vertex is to pick the vertex whose edge to the previously ordered vertex has the maximum weight. In addition, we select the N largest edges as the initial search points and report the best result out of these N trials, where N denotes the total number of scan cells. The time complexity of this algorithm is O(N^3).

Figure 2. The flow of RORC.
Step 1: Obtain the response correlations
Step 2: Construct the response-correlation graph based on the sampled response correlations
Step 3: Find a maximal Hamiltonian cycle on the response-correlation graph
Step 4: Determine the cell ordering with minimum WTC by breaking the Hamiltonian cycle
Step 5: Apply the MT-Fill to specify the don't-care bits of test patterns based on the derived cell ordering
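The greedy search described above might look like the following sketch. It is not the authors' implementation: the function names, the symmetric weight matrix `w`, and the tie-breaking by vertex index are assumptions made for illustration, and the four-cell weights are invented rather than taken from Figure 3.

```python
def greedy_cycle(w, start_edge):
    """Grow an ordering from start_edge by repeatedly appending the unvisited
    vertex with the largest weight to the last ordered vertex."""
    order = list(start_edge)
    remaining = set(range(len(w))) - set(order)
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda v: (w[last][v], -v))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def cycle_weight(w, order):
    """Sum of edge weights along the closed cycle defined by order."""
    n = len(order)
    return sum(w[order[k]][order[(k + 1) % n]] for k in range(n))

def max_hamiltonian_cycle(w):
    """Try the N largest edges as seeds and keep the heaviest cycle found."""
    n = len(w)
    edges = sorted(
        ((w[a][b], a, b) for a in range(n) for b in range(a + 1, n)),
        reverse=True,
    )
    best = None
    for _, a, b in edges[:n]:
        order = greedy_cycle(w, (a, b))
        if best is None or cycle_weight(w, order) > cycle_weight(w, best):
            best = order
    return best

# Hypothetical response correlations for four scan cells (indices 0..3).
w = [[0.0, 0.9, 0.6, 0.5],
     [0.9, 0.0, 0.4, 0.8],
     [0.6, 0.4, 0.0, 0.7],
     [0.5, 0.8, 0.7, 0.0]]
best_cycle = max_hamiltonian_cycle(w)
```

Each of the N trials performs N extension steps that each scan up to N candidates, which is consistent with the cubic complexity stated in the text.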
Determine Cell Ordering with Minimal WTC
In the previous step, we obtained a maximal Hamiltonian cycle on the response-correlation graph so that the number of potential response-value differences between adjacent cells can be minimized. However, to minimize WTC_out, we need to consider not only the number of response-value differences but also the positions of those value differences in the cell ordering (as discussed in Section 3). In Step 4, we break the given maximal Hamiltonian cycle into a Hamiltonian path, which forms the final scan-cell ordering. The breaking of the Hamiltonian cycle affects the positions of the response-value differences and, in turn, affects WTC_out. Here, we estimate the WTC_out generated by each possible breaking of the given Hamiltonian cycle and use the breaking with the minimum WTC_out to form the final cell ordering. The estimated WTC_out is obtained by replacing RD(j) in Equation 2 with 1 minus the response correlation between cells j and j + 1. For example, the maximal Hamiltonian cycle in Figure 3 is C1-C2-C4-C3-C1. Figure 4 shows the estimated WTC_out for all eight cases of the possible cycle breaking. The final cell ordering of the scan chain is C2-C1-C3-C4.
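A minimal sketch of this cycle-breaking step is given below, assuming the weight WRD(j) = s − 1 − j from Section 3 and a path whose first element is the cell nearest the scan input. The function names and the correlation values are hypothetical, so the selected ordering is not the one reported for Figure 4.

```python
def estimated_wtc_out(path, corr):
    """Equation 2 with RD(j) replaced by 1 - (response correlation of the
    cells at positions j and j + 1); path[0] is nearest the scan input."""
    s = len(path)
    return sum(
        (s - 1 - j) * (1 - corr[path[j - 1]][path[j]]) for j in range(1, s)
    )

def break_cycle(cycle, corr):
    """Enumerate every way to cut one cycle edge, in both directions, and
    return the path with the smallest estimated scan-out WTC."""
    s = len(cycle)
    candidates = []
    for k in range(s):
        path = cycle[k:] + cycle[:k]   # removes edge (cycle[k-1], cycle[k])
        candidates.append(path)
        candidates.append(path[::-1])  # same cut, opposite shift direction
    return min(candidates, key=lambda p: estimated_wtc_out(p, corr))

# Hypothetical response correlations for four scan cells (indices 0..3).
corr = [[0.0, 0.9, 0.6, 0.5],
        [0.9, 0.0, 0.4, 0.8],
        [0.6, 0.4, 0.0, 0.7],
        [0.5, 0.8, 0.7, 0.0]]
ordering = break_cycle([0, 1, 3, 2], corr)
```

For a cycle of s cells this enumerates 2s candidate paths, which matches the eight cases mentioned for the four-cell example.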
Apply MT-Fill to Specify Don't-care Bits
After the scan-cell ordering is decided in the previous step, we apply the MT-fill technique to fill the don't-care bits of the test patterns so that the scan-in transitions based on the scan-cell ordering can be minimized. The rule of MT-fill is that a don't-care bit is filled with the value of the first encountered specified bit when traversing from the don't-care bit toward the scan-chain output.
Refer to [7] for more details of MT-fill.
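The filling rule stated above can be sketched as follows. The list encoding (index 0 nearest the scan input, last index nearest the scan output, `"X"` for a don't-care bit) is an assumption of this example. The text does not say how to treat don't-care bits that have no specified bit toward the scan-chain output, so the fallback used here (copy the nearest specified bit toward the input, or 0 for an all-X cube) is also an assumption; see [7] for the actual definition.

```python
def mt_fill(cube):
    """Fill each 'X' with the first specified bit found toward the scan output.
    cube[0] is nearest the scan input, cube[-1] is nearest the scan output."""
    filled = list(cube)
    nearest = None
    # Walk from the scan output toward the scan input, carrying the most
    # recently seen specified bit.
    for idx in range(len(filled) - 1, -1, -1):
        if filled[idx] == "X":
            filled[idx] = nearest
        else:
            nearest = filled[idx]
    # Assumed fallback: trailing X's with no specified bit toward the output
    # copy the nearest specified bit toward the input (0 if none exists).
    last = 0
    for idx, bit in enumerate(filled):
        if bit is None:
            filled[idx] = last
        else:
            last = bit
    return filled

print(mt_fill(["X", 1, "X", "X", 0, "X"]))   # [1, 1, 0, 0, 0, 0]
```

In the filled pattern every run of don't-care bits takes a single value, so the only remaining pattern-value differences are those forced by adjacent specified bits.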
Experimental Results
We conduct experiments on ten ISCAS and ITC benchmark circuits. The following experiment compares RORC with another scan-cell reordering scheme presented in [17] , which requires fully specified test patterns before the reordering. Since RORC applies MT-fill to minimize the scan-in transitions, we apply MT-fill for [17] as well. In the following experiment of [17] , we first randomly generate an initial scan-cell ordering and specify the don't-care bits using MT-fill according to that initial ordering. Then the reordering scheme in [17] is applied to obtain the final scan-cell ordering based on the filled patterns. We repeat the above steps 100 times and report the best results for [17] . Also, we use the same TSP algorithm in both RORC and [17] to make a fair comparison.
In Table 3 , Columns 3, 4, and 5 list the numbers of scan-in transitions, scan-out transitions, total scan-shift transitions, respectively. Column 6 lists the peak number of scan-shift transitions at a single scan-shift cycle. Column 7 lists the runtime in seconds. The results show that RORC can outperform [17] with an average 44.29% and 45.80% reduction to the number of scan-in transitions and scan-out transitions, respectively. The reduction to scan-in transitions first demonstrates the advantages of preserving don't-care bits for later minimization. Also, the reduction to scan-out transitions demonstrates the effectiveness of using sampled response correlations to guide the reordering process. The reduction to peak transitions is a byproduct of the reduction to total scan-shift transitions. Note that the result reported for [17] is selected from 100 trials of random initial cell ordering. It implies that, even with MT-fill, specifying all don't-care bits before reordering will significantly decrease the opportunity in minimizing scan-shift transitions later on and, in turn, lead to a local optimum.
RORC generates a lower number of total scan-shift transitions than [17] in all circuits but s35932. This exception may be attributed to its low don't-care-bit percentage of 37.36%. From our internal experiments, we found that a cell ordering affects the results of MT-fill more significantly when the don't-care-bit percentage is lower. This finding further motivates us to develop a cell reordering scheme which can also consider the impact of a scan-cell ordering on the scan-in transitions generated by the MT-fill patterns.
Scan-cell Reordering Considering Both Response and Pattern Correlations

Detailed Steps of Reordering Scheme
RORC reduces scan-out transitions by maximizing the response correlations between adjacent cells. It ignores the impact of the cell ordering on the number of scan-in transitions resulting from the MT-fill patterns. In this section, we introduce another scan-cell reordering scheme, named ROBPR (ReOrdering considering Both Pattern and Response correlation), which can simultaneously optimize the pattern correlations and response correlations during the reordering process. Figure 5 shows the flow of ROBPR, which consists of four main steps. The details of Steps 1-3 are described in the following subsections. Step 4 is the same as Step 5 in RORC and hence is omitted in this section.

Figure 5. The flow of ROBPR.
Step 1: Collect pattern and response correlations
Step 2: Construct a directed multiple-weight graph based on the collected pattern and response correlations
Step 3: Find the Hamiltonian path with the minimum WTC
Step 4: Apply the MT-Fill to specify the don't-care bits based on the derived cell ordering
Obtain Pattern and Response Correlations
In order to measure the impact of a scan-cell ordering on the number of scan-in transitions, we first define the pattern correlation between cell i and cell j as the probability that the pattern values on these two cells are the same when the output of cell i is connected to the input of cell j. Note that this pattern correlation depends on the order of the cells. For a test pattern k, Table 4 considers each combination of pattern values between cell i and cell j, and lists its corresponding pattern correlation after MT-fill (denoted as PC_k(i, j)). In cases 1, 2, 4, and 5, both values of cell i and cell j are specified bits and hence their pattern correlations can be determined immediately for test pattern k. In cases 7, 8, and 9, a don't-care bit is placed prior to a specified bit and hence the don't-care bit will be filled with the same value as the specified bit. In cases 3 and 6, a specified bit is placed prior to a don't-care bit. Hence, the value of this don't-care bit cannot be derived immediately and has to be determined by its first encountered specified bit when traversing toward the scan-chain output. We use S0/(S0 + S1) (S1/(S0 + S1)) to represent the probability that its first encountered specified bit is a 0 (1), where S0 and S1 denote the total numbers of specified 0s and 1s in the test pattern, respectively. After calculating PC_k(i, j) for each pattern k, the pattern correlation between cell i and cell j for the entire test set can be obtained by averaging PC_k(i, j) over all patterns k.
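The case analysis above can be sketched as a small function. Table 4 is not reproduced here, so the mapping below is one reading of the text: specified pairs give 1 or 0, a don't-care bit in cell i gives 1 (including the pair of two don't-care bits, which this sketch assumes are filled with the same value), and a specified bit followed by a don't-care bit gives S0/(S0 + S1) or S1/(S0 + S1). The function names and the `"X"` encoding are hypothetical.

```python
def pc_k(vi, vj, s0, s1):
    """Pattern correlation of one test cube when cell i feeds cell j.
    vi, vj are 0, 1, or 'X'; s0 and s1 count the specified 0s and 1s."""
    if vi == "X":
        # Cases 7-9: the don't-care bit in cell i copies the value seen
        # toward the scan output (assumed to hold for X followed by X too).
        return 1.0
    if vj == "X":
        # Cases 3 and 6: cell j takes its first specified bit toward the
        # output, estimated from the cube's overall 0/1 ratio.
        return (s0 if vi == 0 else s1) / (s0 + s1)
    # Cases 1, 2, 4, 5: both bits are specified.
    return 1.0 if vi == vj else 0.0

def pattern_correlation(cubes, i, j):
    """Average PC_k(i, j) over all test cubes."""
    total = 0.0
    for cube in cubes:
        total += pc_k(cube[i], cube[j], cube.count(0), cube.count(1))
    return total / len(cubes)

cubes = [[0, "X", 1, 0], [1, 1, "X", "X"]]
forward = pattern_correlation(cubes, 0, 1)    # cell 0 feeds cell 1
backward = pattern_correlation(cubes, 1, 0)   # cell 1 feeds cell 0
```

The two directions give different values for the same pair of cells, which is why the pattern correlation has to be stored per directed edge.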
As for the response correlations, we use the same simulation-based method described in Sec. 4.1.1 to estimate them.
Construct the Directed Correlation Graph
The correlation graph constructed in ROBPR is a revised version of the correlation graph in Sec. 4.1.2. First, this correlation graph is directed. Second, an edge in this correlation graph has two weights (Wp, Wr), where Wp and Wr represent the pattern correlation and response correlation, respectively. Figure 6 shows an example of constructing such a directed correlation graph given the pattern and response correlations between three scan cells.
Find the Hamiltonian Path with Minimal WTC
Unlike RORC, which finds a Hamiltonian cycle first and then breaks the cycle to obtain a Hamiltonian path with minimal estimated WTC_out, ROBPR uses an integrated algorithm to directly obtain the Hamiltonian path with minimal estimated WTC_total on the correlation graph. Figure 7 shows the proposed greedy algorithm, which also orders one new vertex at a time to form such a Hamiltonian path.
[Figure 7. The greedy algorithm of ROBPR. Recoverable fragment: Min_l ← a list of N edges having the minimum (Wp + Wr × (N − 1)); for each directed edge e(Vi, Vj) of Min_l ...]

When adding the nth non-ordered vertex Vnon to the Hamiltonian path, this algorithm uses a cost function Cost(Vlast, Vnon, n) to measure the impact of the newly added edge (Vlast, Vnon) on WTC_total, which is defined in Equation 3. In the definition of Cost(Vi, Vj, n) in Figure 7, Wp(Vi, Vj) (Wr(Vi, Vj)) actually represents the probability that a pattern-value (response-value) difference occurs between Vi and Vj. The n in the cost function represents the WPD(n) described in Equation 1. The N − 1 − n in the cost function represents the WRD(n) described in Equation 2.
This cost function will guide the algorithm to emphasize more on the response correlation in the beginning of the ordering process and then gradually move its emphasis to the pattern correlation in the later stage of the reordering process, which exactly reflects the WTC definition in Equations 1 and 2.
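Since the full pseudocode of Figure 7 is not reproduced here, the following is only a sketch of a greedy path construction consistent with the description above. It assumes that `dp[i][j]` and `dr[i][j]` hold the probabilities of a pattern-value and a response-value difference when cell i feeds cell j, that Cost(Vi, Vj, n) = dp × n + dr × (N − 1 − n), and that the N seed edges are ranked by dp + dr × (N − 1). All names and numbers are hypothetical.

```python
def cost(dp, dr, vi, vj, n, N):
    """Assumed cost of making (vi, vj) the nth edge of the path."""
    return dp[vi][vj] * n + dr[vi][vj] * (N - 1 - n)

def greedy_path(dp, dr, seed):
    """Extend the seed edge one vertex at a time with the cheapest next edge."""
    N = len(dp)
    path = list(seed)
    remaining = set(range(N)) - set(path)
    while remaining:
        n = len(path)          # index of the edge about to be added
        last = path[-1]
        nxt = min(remaining, key=lambda v: (cost(dp, dr, last, v, n, N), v))
        path.append(nxt)
        remaining.remove(nxt)
    return path

def path_cost(dp, dr, path):
    """Estimated total WTC of a complete path under the assumed cost."""
    N = len(path)
    return sum(cost(dp, dr, path[n - 1], path[n], n, N) for n in range(1, N))

def robpr_order(dp, dr):
    """Seed with the N cheapest directed edges and keep the best path."""
    N = len(dp)
    edges = [(i, j) for i in range(N) for j in range(N) if i != j]
    edges.sort(key=lambda e: dp[e[0]][e[1]] + dr[e[0]][e[1]] * (N - 1))
    paths = [greedy_path(dp, dr, e) for e in edges[:N]]
    return min(paths, key=lambda p: path_cost(dp, dr, p))

# Hypothetical difference probabilities for three scan cells.
dp = [[0.0, 0.0, 0.5], [0.5, 0.0, 0.0], [0.5, 0.5, 0.0]]
dr = [[0.0, 0.1, 0.5], [0.1, 0.0, 0.5], [0.5, 0.5, 0.0]]
order = robpr_order(dp, dr)
```

Early edges are weighted mostly by the response term and later edges mostly by the pattern term, which is the shift in emphasis the text describes.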
Experimental Results
We conduct experiments for ROBPR on the same benchmark circuits and test patterns as in Sec. 4.2. Table 5 compares the results of ROBPR with the results of RORC, which considers only the response correlation during the reordering. The experimental results show that, on average, ROBPR generates 32.97% fewer scan-in transitions but only 3.82% more scan-out transitions compared to RORC. This significant reduction in scan-in transitions demonstrates the advantage of taking the pattern correlations into consideration during the ordering process in ROBPR. It also shows the effectiveness of the pattern-correlation estimation listed in Table 4 .
The average reduction in the total scan-shift transitions achieved by ROBPR is 12.52%. The 8.52% reduction in the number of peak transitions is again a byproduct of the reduction in total scan-shift transitions. The overall result again demonstrates the benefit of considering pattern correlations and response correlations simultaneously during the reordering. In addition, the reported runtime of ROBPR is almost the same as that of RORC, even though ROBPR needs to collect additional information for the pattern-correlation calculation. This is because the proposed algorithm in ROBPR (Figure 7) can directly find the Hamiltonian path with minimal WTC_total, saving the step of breaking a Hamiltonian cycle to obtain the final ordering, such as Step 4 in RORC.

Table 5. Comparisons of generated scan-shift transitions between RORC and ROBPR
Comparison Between Power-Driven Reordering and Routing-Driven Reordering
Although the average and peak testing power can be reduced, a major concern of the proposed scan-cell reordering scheme is its potential overhead in the total wire length. The scan-cell-reordering technique can be applied not only for testing-power reduction but also for wire-length minimization. Most current back-end tools support the option of scan-cell reordering for wire-length minimization after placement. In this section, we compare our power-driven scan-cell reordering, ROBPR, with a routing-driven scan-cell reordering provided by a commercial tool [24] .
The following experiment uses a TSMC 0.18µm CMOS technology with 5 metal layers. For the experimental results reported for ROBPR, we first obtain the scan-cell ordering by ROBPR and apply the APR tool in [24] to get its placement. For the experimental results reported for [24], we start from the same ROBPR placement and apply the command "scanreorder" in [24] to get a routing-driven scan-cell reordering. In Table 6 , Columns 3 and 4 list the total number and the peak number of scan-shift transitions, respectively, based on the scan-cell ordering of each scheme. Columns 5 and 6 list the total wire length and the wire length of all scan paths estimated by [24] based on the corresponding placement (Manhattan distance). As the average results show, ROBPR generates 58.04% fewer scan-shift transitions and 25.88% fewer peak transitions, compared to [24] . Also, ROBPR leads to a 13.35% higher estimated total wire length and a 220.68% higher estimated wire length of scan paths, compared to [24] . The reduction of the total wire length by [24] comes mainly from the reduction of the scan-chain wire length. However, for advanced process technologies, the violation of hold-time constraints occurs much more often than the violation of setup-time constraints on scan paths. Designers even intentionally increase the wire length of some scan paths to meet the hold-time constraint instead of applying a scan-cell reordering to reduce the wire length. Therefore, the motivation for reducing wire length on scan paths may not be as strong as it was in older process technologies.

Table 7 lists the final total wire length, the final wire length of scan paths, the number of vias in use, and the allocated area after the detail routing [24] is performed. As the results show, the average reductions in the total wire length, the scan-chain wire length, and the number of vias by [24] are 8.24%, 221.24%, and 1.46%, respectively.
While the reduction percentage of the scan-chain wire length matches the estimated result after placement, the reduction percentage of the total wire length is significantly smaller than its estimated result. This implies that the benefit of a routing-driven scan-cell reordering may be diluted after other back-end optimization steps are performed. However, the reduction in scan-shift transitions achieved by ROBPR remains the same as long as the scan-cell ordering is kept, which is another advantage of using a power-driven scan-cell reordering.

Table 7. Comparisons of scan-shift transitions and estimated wire length after detail routing.
Conclusions
This paper first presents a scan-cell reordering scheme which connects the scan cells with a high response correlation to reduce scan-out transitions. This reordering scheme preserves the don't-care bits during the ordering process so that a post pattern-filling technique can be applied to minimize the scan-in transitions. This paper further takes the pattern correlations into consideration and reduces the scan-shift transitions even more. A set of experiments is conducted to demonstrate the effectiveness of each technique proposed in this paper. A comparison to [17] also confirms the superiority of the proposed scheme, with an average 45.7% reduction in the scan-shift transitions.
