Abstract: At-speed testing is crucial to catch small delay defects that occur during the manufacture of high performance digital chips. Launch-Off-Capture (LOC) and Launch-Off-Shift (LOS) are two prevalently used schemes for this purpose. The LOS scheme achieves higher fault coverage and consumes less test time than the LOC scheme, but dissipates higher power during the capture phase of the at-speed test. Excessive IR-drop on the power grid during the capture phase causes false delay failures, leading to a significant and unwarranted yield reduction. As reported in the literature, intelligent filling of don't care bits (X-filling) in test cubes yields significant power reduction. Given that the tests output by automatic test pattern generation (ATPG) tools for big circuits have a large number of don't care bits, the X-filling technique is very effective for them. Assuming that the design for testability (DFT) scheme preserves the state of the combinational logic between capture phases of successive patterns, this paper maps the problem of optimal X-filling for peak power minimization during the LOS scheme to a variant of the interval coloring problem and proposes a dynamic programming (DP) algorithm for it, along with a theoretical proof of its optimality. To the best of our knowledge, this is the first ever reported X-filling algorithm that is optimal. The proposed algorithm, when evaluated on the ITC'99 benchmarks, produced peak power savings of up to 34% over the best known low power X-filling algorithm for LOS testing. Interestingly, it is observed that the power savings increase with the size of the circuit.
I. INTRODUCTION
The exponential scaling of metal-oxide-semiconductor (MOS) transistor feature size with successive technology generations has led to an exponential increase in on-chip power densities, causing thermal hot-spots. The power dissipation during the test mode of a chip is often several times higher than the power dissipation during the normal functioning of the chip [1] , [2] . Excessive average power leads to thermal stress and excessive peak power causes unacceptable dynamic IR-drop leading to false delay failures, which are important issues motivating low power test.
Excessive IR-drop specific to test mode can lead to delay failures that are not observed during the normal functioning of the chip [3] , [4] , thereby leading to discarding good chips as faulty, ultimately reducing their final yield. This paper focuses on reduction in peak test power through minimization of peak switching activity. The test cubes for large circuits are typically dominated by don't care (X) bits as shown in column 4 of Table I , making X-filling an effective technique for minimizing peak test power. Motivated by this, this paper proposes an analytical solution for solving the problem of optimal X-filling for minimizing peak test power. The main contributions of this paper are as follows:
• Given a test cube ordering, mapping the problem of optimal X-filling for minimizing peak test power to a variant of the interval coloring problem, which we refer to as the bottleneck coloring problem; and,
• Proposing a polynomial time algorithm for the bottleneck coloring problem.

The formal definition of the problem of X-filling for test peak power minimization is given in Section IV. Our mapping of the optimal X-filling problem to the bottleneck coloring problem is explained in Section V. Following this, the proposed algorithm for optimal X-filling, along with its proof of correctness, is presented in Section VI. The experimental setup used to implement the proposed algorithm, the results obtained by applying it, and a comparison with the best known techniques proposed in the literature are presented in Section VII. Section VIII concludes the paper.
II. RELATED WORK
Several techniques have been proposed in the past for minimizing peak test power. These techniques can be broadly categorized into circuit level [1], [5]-[7], gate level [9]-[12] and system level [13], [15]-[17], [20] techniques. Circuit level techniques include supply gating [6], scan flip-flop redesign [1], [5] and supply voltage scaling [7], [8]. Gate level techniques include clock gating [10], [15], scan cell output gating [12], and low power scan chain synthesis [1], [5], [8]-[10]. System level techniques include low power test pattern generation [16], power aware test scheduling [17], test pattern ordering [13], [14], [20] and X-filling [19], [21], [22]. All of these X-filling techniques for the Launch-Off-Shift (LOS) scheme [19], [21], [22] are heuristics without a performance guarantee. This paper proposes a theoretical framework to arrive at an optimal X-filling for minimizing peak test power. The following sections motivate this theoretical framework and present the underlying design for testability (DFT) scheme necessary to apply the proposed technique, its proof of optimality, and the results obtained by applying it to benchmarks. To the best of our knowledge, this is the first ever reported X-filling algorithm that is optimal.
III. MOTIVATION
In scan based test, the input test pattern is serially shifted in, while serially shifting out the response for previous test pattern. An important assumption that is made about the role of the DFT architecture is that it preserves the state in which the combinational logic settles down after launching the current test pattern and before capturing the response, until the launching of next test pattern. The scan architecture can be made to satisfy this property with minimal area and physical design overhead [18] . If the combinational state preservation property can be ensured, the combinational part of the circuit behaves in such a way as if the test patterns are applied one after the other. Once this property is satisfied, since the combinational logic is undisturbed during scan-shift and capture phases, as far as application of test patterns is concerned, the sequential circuit behaves like a combinational circuit. Thus, test pattern ordering technique that was proposed earlier for reducing test power in combinational circuits [13] , [20] becomes equally effective for sequential circuits. Having understood this, the next step is to compute a test pattern ordering that achieves the same.
Once the test pattern ordering is computed, the next step is to minimize the peak toggles at the inputs (primary inputs and the scan cell outputs) by filling the X-bits in the test patterns (cubes) with binary values. The expectation is that reducing the input toggles leads to lower power dissipation inside the circuit, as shown previously in [20]. The most recent and effective X-filling algorithm for peak power minimization during the LOS scheme is X-Stat [22]. The X-Stat algorithm follows a two phase approach. In the first phase, it uses the adjacent X-fill technique to convert don't care (X-bit) stretches 0XX...X1 and 1XX...X0 into the smaller X-bit stretches 0X1 and 1X0 respectively, as shown in the Phase 1 column of Fig. 1. In the second phase, it replaces X-bits by either 0 or 1 in order to minimize peak toggles, as shown in the Phase 2 column of Fig. 1. This figure shows that the global minimum of peak toggles is 2 (shown under the Optimum-Fill column), while the minimum peak toggles achieved by the X-Stat technique is 3, making it sub-optimal. Because of the greedy approach used in Phase 1 of the X-Stat technique, it does not achieve the globally optimal fill for peak toggle reduction. Motivated by this, we choose a dynamic programming paradigm, which takes the global picture into consideration and optimally fills the X-bits with binary values to achieve the best reduction in peak toggles.
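The notion of peak input toggles used above can be made concrete with a small sketch. The function name `peak_toggles` and the three-input test cubes below are hypothetical and are not the contents of Fig. 1; they only illustrate how two different fillings of the same cubes can differ in their peak.

```python
def peak_toggles(vectors):
    """Return the largest number of input bits that change between any
    two consecutive, fully specified test vectors."""
    peak = 0
    for prev, curr in zip(vectors, vectors[1:]):
        toggles = sum(a != b for a, b in zip(prev, curr))
        peak = max(peak, toggles)
    return peak


# Hypothetical cubes: every input is 0 in the first vector, X in the
# second and 1 in the third, so each input must toggle exactly once.
same_cycle_fill = ["000", "000", "111"]  # all toggles land in one cycle
spread_fill = ["000", "100", "111"]      # toggles spread over both cycles

print(peak_toggles(same_cycle_fill))  # 3
print(peak_toggles(spread_fill))      # 2
```

Three toggles have to be distributed over two launch-capture cycles here, so no filling of these particular cubes can have a peak below 2.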
This problem can be formulated as a variant of the interval coloring problem. Since our objective is to minimize the peak toggles, we call this variant the Bottleneck Coloring Problem (BCP). Next, we define the BCP and explain how peak power minimization is an instance of it.
V. BOTTLENECK COLORING PROBLEM (BCP)

A. Problem Explanation in Terms of Hotel Room Booking
Suppose a hotel receives several guest requests for accommodation. Each request specifies a time period by its start date and end date, and asks the hotel to provide accommodation for exactly one day that falls within that period. The aim of the hotel is to assign a day to every guest request such that the maximum number of guests staying in the hotel on any given day is minimized. This is a variant of the interval coloring problem [23].
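For very small instances, the hotel formulation can be checked by exhaustive search. The sketch below is not the algorithm proposed in this paper; it is a brute-force reference whose function name and example requests are hypothetical, and its running time is exponential in the number of requests.

```python
from collections import Counter
from itertools import product


def brute_force_bottleneck(requests):
    """Try every way of giving each request one day inside its period and
    return (smallest busiest-day load, one assignment achieving it)."""
    choices = [range(start, end + 1) for start, end in requests]
    best_peak, best_days = None, None
    for days in product(*choices):
        peak = max(Counter(days).values())
        if best_peak is None or peak < best_peak:
            best_peak, best_days = peak, days
    return best_peak, best_days


# Hypothetical requests given as (start_date, end_date), both inclusive.
requests = [(1, 2), (1, 2), (1, 2), (2, 3)]
peak, days = brute_force_bottleneck(requests)
print(peak)  # 2
```

Three of the requests must be served within days 1 and 2, so at least one of those days hosts two guests, which is why the result cannot drop below 2.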
B. Mathematical Definition of Problem
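Given a set of $k$ intervals $\{(s_i, e_i)\}$ such that $s_i$ and $e_i$ are integers corresponding to the starting and ending times of interval $i$ respectively, for all $1 \le i \le k$, assign to every interval $i$ a color $c_j$ with $s_i \le j \le e_i$ such that the maximum number of intervals that are assigned the same color is minimized.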
• For each row $i$ of matrix $A$: if the $i$th row contains a sub-sequence 0XX...X0, then replace every don't care in this sub-sequence by zero, since there exists an optimal solution in which all of these don't cares are replaced by zeros irrespective of how the other don't cares are replaced. If the $i$th row contains a sub-sequence 1XX...X1, then replace every don't care in this sub-sequence by one, since there exists an optimal solution in which all of these don't cares are replaced by ones irrespective of how the other don't cares are replaced.
• Let $S = \emptyset$.
• For each remaining sub-sequence 0XX...X1 or 1XX...X0, add the corresponding interval to the sequence of intervals $S$. There is only one toggle, between the $j$th and $(j+1)$th test vectors, in this sub-sequence. The color assigned to this newly added interval in the solution of the BCP captures the location of this toggle in this sub-sequence.

Comment 2: Note that there exists an optimal solution to the Peak Toggle Minimization Problem such that [...] th position corresponds to a hotel room allocation for the $i$th customer on the $j$th day. The BCP ensures that the number of allocations on any given day is minimized, which in the current context translates to minimizing the peak number of toggles in any given test cycle (launch-capture duration).
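The preprocessing and interval extraction described above can be sketched as follows. Several details are assumptions made for illustration rather than statements from the paper: rows of the matrix are given as strings over `0`, `1` and `X` (one row per circuit input, one column per test vector), columns are indexed from 0, an interval `(s, e)` means that the single toggle may be placed between columns `j` and `j + 1` for any `s <= j <= e`, and leading or trailing don't cares of a row are left untouched. The function name `extract_intervals` is hypothetical.

```python
def extract_intervals(rows):
    """Fill 0X..X0 / 1X..X1 stretches and collect one interval per
    0X..X1 / 1X..X0 stretch. Returns (partially filled rows, intervals),
    where each interval is a tuple (s, e, row_index)."""
    filled, intervals = [], []
    for r, row in enumerate(rows):
        bits = list(row)
        care = [c for c, b in enumerate(bits) if b != "X"]
        for p, q in zip(care, care[1:]):
            if bits[p] == bits[q]:
                # Equal end values: copying them avoids any toggle.
                for c in range(p + 1, q):
                    bits[c] = bits[p]
            else:
                # Different end values: exactly one toggle is needed,
                # somewhere between column p and column q.
                intervals.append((p, q - 1, r))
        filled.append("".join(bits))
    return filled, intervals


rows = ["0XX0", "0X1X", "1XX0"]
filled, intervals = extract_intervals(rows)
print(filled)     # ['0000', '0X1X', '1XX0']
print(intervals)  # [(0, 1, 1), (0, 2, 2)]
```

Under this convention a pair of adjacent specified bits with different values, such as `01`, yields an interval with a single admissible position, which models a toggle that no filling can move.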
D. Constructing Optimal solution for Peak Toggle Minimization Problem from Optimal solution for Bottleneck Coloring Problem
• Suppose color $c_j$ is assigned to interval $(s_i, e_i)$ in the given optimal solution for the Bottleneck Coloring Problem.
• In the row of matrix $A$ corresponding to interval $(s_i, e_i)$, set all bits from column $s_i$ to column $j$ to the bit value at column $s_i$, and set all bits from column $j+1$ to column $e_i+1$ to the bit value at column $e_i+1$.
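The two steps above amount to a simple row update, sketched below. The function name `apply_color` is hypothetical and columns are indexed from 0, which is an assumption of this sketch rather than the paper's convention.

```python
def apply_color(row, s, e, j):
    """Fill the stretch of `row` spanning columns s..e+1 so that its only
    toggle lies between columns j and j+1, where s <= j <= e."""
    if not s <= j <= e:
        raise ValueError("color must lie inside the interval")
    bits = list(row)
    left, right = bits[s], bits[e + 1]
    for c in range(s, j + 1):
        bits[c] = left       # columns s..j copy the value at column s
    for c in range(j + 1, e + 2):
        bits[c] = right      # columns j+1..e+1 copy the value at column e+1
    return "".join(bits)


print(apply_color("0XX1", 0, 2, 1))  # 0011
print(apply_color("1XX0", 0, 2, 0))  # 1000
```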
VI. ALGORITHM
A. Dynamic Programming Approach to compute Lower-Bound (LB) for Bottleneck Coloring Problem
Algorithm 1 gives the lower bound on the maximum number of intervals that are assigned the same color. This algorithm can be implemented such that its running time is $O(k^2)$, where $k$ is the number of intervals.
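One way to obtain such a lower bound is a counting argument over time windows: if a window of $b - a + 1$ consecutive colors fully contains $N$ intervals, some color in that window must be used at least $\lceil N / (b - a + 1) \rceil$ times. The sketch below computes the maximum of this quantity over all windows by direct enumeration. It is an assumption that this matches the bound of Algorithm 1, whose listing is not reproduced here, and the enumeration is not the dynamic programming formulation and does not achieve its $O(k^2)$ running time. The function name `window_lower_bound` is hypothetical.

```python
def window_lower_bound(intervals):
    """Max over windows [a, b] of ceil(#intervals inside [a, b] / window length).
    `intervals` is a list of (start, end) pairs with integer, inclusive ends."""
    if not intervals:
        return 0
    starts = sorted({s for s, _ in intervals})
    ends = sorted({e for _, e in intervals})
    best = 0
    # Only windows that begin at some start and finish at some end matter:
    # shrinking any other window keeps the same intervals inside it.
    for a in starts:
        for b in ends:
            if b < a:
                continue
            inside = sum(1 for s, e in intervals if a <= s and e <= b)
            best = max(best, -(-inside // (b - a + 1)))  # ceiling division
    return best


print(window_lower_bound([(1, 2), (1, 2), (1, 2), (2, 3)]))  # 2
```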
B. Greedy Approach to Bottleneck Coloring Problem
Algorithm 2 assigns colors to intervals such that each interval $(s_i, e_i)$ is assigned a color $c_j$, where $s_i \le j \le e_i$, and the maximum number of intervals that are assigned the same color is at most the lower bound computed by Algorithm 1. We call this algorithm the Optimal X-filling Algorithm (DP-Fill). The running time of this algorithm is $O(k \log k)$, where $k$ is the number of intervals.

Algorithm 2: DP-Fill
  Lower bound LB = max{...}, as computed by Algorithm 1.
  Create a min-heap H. Nodes of this heap are ordered by the ending times of intervals, i.e., the ending time of the interval stored in a node is less than or equal to the ending times of the intervals stored in that node's children.
  for each time step $i$, in increasing order, do
    Insert into heap H all intervals whose starting time is equal to $i$.
    /* The starting time of any interval in H is at most $i$. */
    Remove the top $l$ elements from the heap and assign them color $c_i$, where $l$ = min(current heap size, LB).
    /* The top elements are picked because we want to assign colors first to the intervals that end soonest. We prove in Section VI-C that the ending times of all these removed intervals are at least $i$. */
  end
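The greedy step can be sketched with Python's `heapq` module as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the lower bound is computed with a window-counting enumeration (an assumed stand-in for Algorithm 1), the loop visits every integer time step rather than only event times, so it does not reach the stated $O(k \log k)$ bound, and the names `window_lower_bound` and `greedy_bottleneck_coloring` are hypothetical.

```python
import heapq


def window_lower_bound(intervals):
    """Assumed lower bound: max over windows of ceil(intervals inside / length)."""
    best = 0
    for a in {s for s, _ in intervals}:
        for b in {e for _, e in intervals}:
            if b >= a:
                inside = sum(1 for s, e in intervals if a <= s and e <= b)
                best = max(best, -(-inside // (b - a + 1)))
    return best


def greedy_bottleneck_coloring(intervals):
    """Return (lb, colors); colors[i] is the time step chosen for intervals[i]."""
    if not intervals:
        return 0, []
    lb = window_lower_bound(intervals)
    colors = [None] * len(intervals)
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    heap, nxt = [], 0
    first = intervals[order[0]][0]
    last = max(e for _, e in intervals)
    for t in range(first, last + 1):
        # Insert every interval that starts at time step t.
        while nxt < len(order) and intervals[order[nxt]][0] == t:
            i = order[nxt]
            heapq.heappush(heap, (intervals[i][1], i))  # keyed by ending time
            nxt += 1
        # Color at most lb intervals, earliest ending time first.
        for _ in range(min(len(heap), lb)):
            _, i = heapq.heappop(heap)
            colors[i] = t
    return lb, colors


lb, colors = greedy_bottleneck_coloring([(1, 2), (1, 2), (1, 2), (2, 3)])
print(lb, colors)  # 2 [1, 1, 2, 2]
```

Keying the heap on ending times means that the intervals closest to their deadline are always served first, which is the property the proof of correctness relies on.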
C. Proof of correctness
In the following paragraph we prove that, at the end of the $i$th iteration of Algorithm 2, the ending times of all intervals contained in the min-heap are greater than $i$. This means that each interval $(s_i, e_i)$ is assigned a color $c_j$ such that $s_i \le j \le e_i$.

Suppose, for the sake of contradiction, that at the end of some iteration $i$ the min-heap contains an interval whose ending time is not greater than $i$. Let $i$ be the minimum such value; by this minimality, the ending time of that interval is equal to $i$. Let $j < i$ be such that the number of intervals that are assigned a color in the $j$th iteration is less than the lower bound, and let $j$ be the maximum such value. If there is no such $j$, then let $j = 0$. Let $k$, with $j < k < i$, be an iteration index such that in the $k$th iteration the algorithm assigned a color to an interval whose ending time is more than $i$, and let $k$ be the maximum such value. If there is no such $k$, then let $k = j$. The ending times and starting times of all intervals that are assigned a color from the $(k+1)$th iteration to the $i$th iteration are less than or equal to $i$ and greater than $k$, respectively. Note that the number of intervals that are assigned colors from the $(k+1)$th iteration to the $i$th iteration is equal to $LB \cdot (i-k)$, and the min-heap contains an interval whose ending time is equal to $i$ and whose starting time is greater than $k$. This implies that the number of intervals whose starting time is greater than $k$ and whose ending time is less than or equal to $i$ is more than $LB \cdot (i-k)$, which is a contradiction.
D. Test Vector Ordering Algorithm
For a given vector ordering, Algorithm 2 gives the optimum value of peak input toggles. Note that if the don't care stretches in the rows of matrix $A$ (which is defined in Section V-C) are long, then the optimum value of peak input toggles is small. To obtain such long don't care stretches in the rows of matrix $A$, we propose the following test vector ordering algorithm, which we call interleaved test vector ordering (I-Ordering). Experimentally, we observed that the number of times the while loop in Algorithm 3 gets executed is $O(\log n)$, where $n$ is the number of test vectors. This experimental observation is shown in Figs. 2(a) and 2(b). Fig. 2(c) analyzes the don't care stretch statistics in the test cubes of the b19 circuit for different test vector orderings. One can observe that I-Ordering increases the sizes of the don't care stretches, which are then exploited by the proposed X-filling Algorithm 2 to achieve the best possible peak toggle savings. The next section explains the experimental setup used to implement the described algorithms and compares the peak toggle savings obtained using the proposed algorithm with those of a commercial tool as well as techniques proposed in the prior literature.
VII. EXPERIMENTAL SETUP AND RESULTS
We have considered the ITC'99 benchmark suite to validate our algorithms. Synthesis, test generation and place-and-route (PAR) phases of the different benchmark circuits are performed using [...] tools respectively, using a 45nm standard library. After the PAR phase, the interconnect capacitances are extracted to compute actual power values. Tables II and III show the comparison of peak input toggles for various X-filling methods with respect to the test vector orderings given by the TetraMax™ tool and the X-Stat method [22] respectively. Each row in these tables corresponds to a benchmark circuit. The shaded cell in each row corresponds to the best X-filling method among all X-filling methods for the given ordering. We can observe that the proposed DP-fill method consistently performed at least as well as all the other X-filling methods, under both test vector orderings. This is because, under a given ordering, DP-fill is an optimal algorithm for minimizing peak input toggles. Next we evaluate the impact of the proposed test vector ordering technique (I-ordering) under the different X-filling schemes explained previously, including DP-fill. Table IV shows the results. It can be seen that the DP-fill method consistently performed better than all the other X-filling methods under the proposed I-ordering scheme. Additionally, it can be observed from Tables II, III and IV that the combination of I-ordering + DP-fill is most effective in reducing peak toggles, especially for the larger circuits. Next, we compare I-ordering + DP-fill with other existing techniques in the literature. Table V shows the peak input toggles comparison between the proposed technique and the best known existing techniques. Column 1 shows the minimum peak input toggles obtained among all aforementioned X-filling methods, under the test vector ordering given by the TetraMax™ tool. Columns 2, 3 and 4 show the minimum peak input toggles obtained using the techniques proposed in [20], [21] and [22] respectively. Columns 6-9 of this table show the percentage improvement of the proposed I-ordering + DP-fill technique over these best known low power techniques for the LOS scheme.

Algorithm 3: I-Ordering
  ...
  Select all the vectors in T which are not in S and add them to S; there can be at most k such vectors.
  Let temp_optimal_value be the optimal bottleneck value computed on sequence S using Algorithm 2.
  if temp_optimal_value < current_optimal_value then
    current_optimal_value = temp_optimal_value;
  ...

TABLE II: Peak input toggles for various X-filling methods, TetraMax™ ordering
Circuit   X-filling methods
...          …     …    17    16    14    14
b04         41    50    47    45    39    39
b05         20    23    19    20    17    17
b06          4     4     5     4     4     4
b07         31    30    34    27    23    23
b08         20    20    20    18    14    12
b09         18    20    22    18    18    18
b10         12    19    17    15    10    10
b11         22    27    29    21    20    20
b12         63    76    62    89    59    58
b13         31    34    38    30    30    29
b14        181   180   194   159   157   156
b15        305   334   344   298   292   282
b17        916   923   943   880   871   841
b18       2134  2167  2251  2114  2066  2009
b19       3926  4099  4201  3955  3819  3753
b20        309   314   315   305   302   299
b21        317   307   315   305   276   260
b22        489   494   507   471   472     …

TABLE III: Peak input toggles for various X-filling methods, X-Stat ordering
Circuit   X-filling methods
...         45    52    47    43    25    24
b05         21    24    21    23    15    14
b06          5     4     5     5     5     4
b07         27    33    38    25    15    14
b08         16    20    18    15     8     7
b09         20    19    17    16    14    14
b10         14    20    16    14    10     7
b11         18    26    22    20    10     9
b12         60    76    99    68    31    31
b13         37    32    28    23    17    17
b14        181   164   208   152    79    79
b15        308   277   314   198   144   144
b17        912   774   953   680   421   421
b18       2130  1752  2200  1569  1011  1008
b19       3926  3457  4340  3168  1877  1877
b20        314   291   352   297   152   152
b21        288   290   346   237   130   130
b22        483   419   475   440   237   234

TABLE IV: Peak input toggles for various X-filling methods, I-ordering
Circuit   X-filling methods
...          …     …    38    23    15    11
b08         16    18    16    14     8     6
b09         14    18    16    16    11    11
b10         10    18    14    13     9     7
b11         15    25    22    18    10     9
b12         59    72    99    65    30    15
b13         28    31    28    23    15    10
b14        168   158   208   148    77    40
b15        296   267   314   193   141    33
b17        882   770   953   676   419    85
b18       2030  1741  2200  1550   980   232
b19       3862  3436  4340  3167  1871   364
b20        301   285   352   284   143    65
b21        280   286   333   237   129    67
b22        451   409   475   425   210    91
It is evident that the proposed technique outperforms all the existing techniques for most of the benchmark circuits, and that the percentage improvement increases with circuit size. Tables II, III and IV compare different X-fillings under a given ordering, so in all of these cases DP-fill gives the optimal solution, i.e., the lowest peak input toggles, for all the benchmarks. On the other hand, the orderings employed by the techniques proposed in [20], [21] or [22] need not be the same as I-ordering. Thus, unlike the earlier comparisons made in Tables II, III and IV, we cannot give a performance guarantee for I-ordering + DP-fill over the techniques proposed in [20], [21] or [22]. However, it is interesting to note from Table V that the proposed I-ordering + DP-fill technique still outperforms these techniques for most of the benchmarks. This is because I-ordering and DP-fill are both designed for reducing peak toggles when test sets are dominated by don't cares, and in practice the test sets of most of these circuits are dominated by don't cares, as shown earlier in Table I. Table VI compares the actual peak power dissipation during test between the proposed technique and the existing techniques. It can be observed that, similar to the peak toggle savings, the proposed technique performs better than all the existing techniques in peak power savings for most of the benchmarks, and the percentage improvement increases with circuit size. This can be attributed to the well known fact that there is a good correlation between input toggles and circuit toggles, as explained in [20]. Additionally, we can observe that the magnitude of improvement in Tables V and VI is not the same.
The difference is due to the fact that the relation between input toggles and circuit toggles is not perfectly linear and while computing actual power dissipation of the circuit, we need to take interconnect capacitances into account. However, our proposed technique outperforms all the existing techniques considerably, in both peak input toggles as well as actual peak circuit power. 
VIII. CONCLUSIONS
We address the problem of excessive peak capture power that leads to false delay failures. Since the test cubes are dominated by X-bits and there is a good correlation between input toggles and circuit toggles, X-filling is very effective for reducing peak capture power. We map the problem of optimal X-filling to a variant of the interval coloring problem and propose an algorithm, DP-fill, that minimizes the peak input toggles of the circuit for a given test vector ordering. This algorithm leads to significant reductions in the peak capture power dissipated inside the circuit. To the best of our knowledge, this is the first ever reported X-filling algorithm that is optimal.
