INTRODUCTION
With the continuous advancement of complementary metal oxide semiconductor (CMOS) technology, more and more transistors are packed on a single die, resulting in chips with tens or hundreds of millions of transistors in a single package. When testing such large designs, peak and average power dissipation become important metrics because of their impact on cost and reliability. As the average power of a VLSI system increases, so does the heat it dissipates; special packaging and cooling may then be needed, which increases system cost. Moreover, peak power demands instantaneous access to the power/ground rails within a very short period of time, which results in IR-drop or ground bounce effects that can produce logic errors and/or reliability problems such as electromigration (Saxena et al., 2003). It is customary during the design phase to analyze peak and average power and to design the system so that it meets these constraints in functional operation. However, during testing, peak and average power can differ drastically from their functional values and can severely affect system performance and reliability.
In (Li et al., 2001), it was shown that the power consumption of a circuit in test mode is considerably higher (100-200% higher) than in the normal mode of operation. It has been observed that test efficiency correlates with toggle rate; thus, switching activity during testing is significantly higher than in the normal mode of operation (Girard, 2002). Another cause of increased power consumption during testing is that consecutive functional input vectors are highly correlated, which is not the case for consecutive test vectors (Girard, 2002).
Test vector ordering has been considered by researchers as an effective method to reduce both average and peak power consumption during test (Girard et al., 1997; Badereddine et al., 2006; Flores et al., 1999; Chattopadhyay & Choudhary, 2003; Hashempour & Lombardi, 2008; Sokolov et al., 2005). Test vector ordering methods can be divided into two major groups: those that use minimum Hamming distances (Girard et al., 1997), and those that perform ordering based on circuit switching activity (Chattopadhyay & Choudhary, 2003; Hashempour & Lombardi, 2008; Sokolov et al., 2005; Paramasivam & Gunavathi, 2007). Previous works have empirically shown that a minimal Hamming distance between test vectors translates into lower switching activity, and thus lower power consumption than random ordering (Girard et al., 1997; Badereddine et al., 2006; Flores et al., 1999). Moreover, approaches that utilize switching activity have been shown to provide better results than the Hamming distance approach, but they require the internal details of the circuit-under-test (CUT) to be available (Hashempour & Lombardi, 2008).
One issue that must be addressed when performing test vector ordering is how to handle unspecified (don't care) bits. Automatic test pattern generators (ATPGs) normally generate partially specified test vectors, i.e., vectors with 0, 1, and x, where 'x' denotes a don't care bit. Many studies have used don't care bits to reduce power by assigning them values that reduce the number of transitions in the circuit-under-test (Badereddine et al., 2006; Flores et al., 1999; Chattopadhyay & Choudhary, 2003; Paramasivam & Gunavathi, 2007; Li et al., 2005; Maiti & Chattopadhyay, 2008). Thus, for any vector ordering problem, it is important to target both test ordering and x-filling to realize truly low power test. Authors have exploited don't care bits for different objectives, including minimizing the Hamming distance among test vectors to reduce the number of transitions (Flores et al., 1999; Chattopadhyay & Choudhary, 2003), reducing shift power in scan designs (Badereddine et al., 2006; Li et al., 2005; Eggersglüß, 2014; Trinadh et al., 2014), reducing peak temperature (Dutta et al., 2013), improving reliability (Feng et al., 2014), and reducing leakage power (Paramasivam & Gunavathi, 2007; Maiti & Chattopadhyay, 2008).
The power consumption of a CMOS circuit is highly dependent on the amount of switching activity. However, switching activity alone may not be a good indicator of peak current. In recent work (Huang et al., 2006; Huang et al., 2009; Lee & Kim, 2011; Gu et al., 2010; Wang et al., 2013; Borowczak & Vemuri, 2014), researchers have shown the importance of peak current minimization and its strong correlation with the direction of switching activity. In (Huang et al., 2006; Huang et al., 2009), the authors showed that the direction of the transitions plays a major role in determining true peak power, since peak power is related to peak current. They used the maximum number of transitions in a given direction, commonly referred to as peak current, to represent peak power. Using this definition of peak current, they showed that even when the switching activity in the circuit is the same, different combinations of switching directions result in different peak currents.
While many techniques have addressed the problem of reducing peak power between two consecutive test vectors (vector pairs), none have addressed the problem of reducing peak current by considering the direction of these transitions, which has been shown to correlate with actual peak power (Huang et al., 2006; Huang et al., 2009; Lee & Kim, 2011; Gu et al., 2010; Wang et al., 2013; Borowczak & Vemuri, 2014). In this work, we propose a framework for test vector ordering and x-filling to minimize peak current in combinational circuit test. The methodology aims to reduce both peak and total power by using minimum peak current ordering and FM-algorithm-based x-refilling (Fiduccia & Mattheyses, 1982). Experimental results comparing the proposed approach with conventional ordering approaches demonstrate the effectiveness of the proposed framework.
PEAK POWER MINIMIZATION
To understand the various power issues in low power testing, it is important to have a proper understanding of the terms commonly used in power-related topics. The following power terminology is used throughout the paper:
Energy: total switching activity generated during test application.
Average power: ratio between energy and test time.
Instantaneous power: power dissipated at a given instant of time.
Peak power: highest value of instantaneous power for a test set.
Peak current: maximum current value for a test set.
The major source of power consumption in CMOS technology is dynamic power, which is attributed to the charging and discharging of load capacitances (Girard, 2002; Chandrakasan et al., 1992). The energy consumed at node x when it undergoes a transition can be expressed as (Saxena et al., 2003):

$$E_x = \tfrac{1}{2} C_0 V_{dd}^2 \qquad (1)$$

where $C_0$ is the output capacitance of node x and $V_{dd}$ is the power supply voltage. Over a period of time, this equation can be approximated using the number of transitions on node x:

$$E_x \approx \tfrac{1}{2} C_0 V_{dd}^2 \cdot SA_x \qquad (2)$$

where $SA_x$ is the number of transitions node x experiences over that period. $SA_x$ is the only variable term in equation (2), and it will be used to estimate the power consumed at node x. Therefore, the total power consumed by a test vector pair $(v_a, v_b)$ can be expressed as:

$$SA(v_a, v_b) = \sum_{x=1}^{N} \mathrm{Node}(x, v_a) \oplus \mathrm{Node}(x, v_b) \qquad (3)$$

where N is the number of nodes in the CUT, $\mathrm{Node}(x, v_a)$ is the value of node x when vector $v_a$ is applied, and $\oplus$ denotes exclusive OR. To find the total power consumed by the complete test set, the power of all consecutive test vector pairs is summed:

$$SA_{total} = \sum_{i=1}^{k-1} SA(v_i, v_{i+1}) \qquad (4)$$

where k is the number of test vectors in the test set. It is common practice to use weighted switching activity (WSA) to calculate total power, because node fan-outs increase the capacitance of the various nodes, which in turn affects power. When such effects are to be taken into account, equation (4) must be modified by multiplying the $SA_x$ term by a factor that represents the node's fan-out (i.e., fan-out + 1). Previous work (Huang et al., 2009) has shown that the impact of fan-out capacitance on peak current is minimal (3% for a 3x increase in capacitance); since this work investigates peak current, we decided not to use the WSA model. Instead, we use the simplified switching activity ($E_x$) model described above for the remainder of this discussion.
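As a minimal sketch of equations (3) and (4), the Python below assumes that the node values of the CUT have already been obtained by logic simulation (the simulator is not shown) and are passed in as one list of 0/1 values per applied vector. The function names and the optional `weights` argument, which models the fan-out + 1 WSA variant, are illustrative and not part of the original implementation.

```python
from typing import List, Optional, Sequence

NodeValues = Sequence[int]  # 0/1 value of every CUT node for one applied vector


def pair_switching_activity(a: NodeValues, b: NodeValues,
                            weights: Optional[Sequence[int]] = None) -> int:
    """Equation (3): number of nodes whose value differs between a and b.

    With `weights` (e.g. fan-out + 1 per node), each toggle is scaled,
    giving the weighted switching activity (WSA) variant instead.
    """
    if len(a) != len(b):
        raise ValueError("node value vectors must have the same length")
    if weights is None:
        weights = [1] * len(a)
    return sum(w * (x ^ y) for x, y, w in zip(a, b, weights))


def total_switching_activity(ordered: List[NodeValues]) -> int:
    """Equation (4): sum of equation (3) over consecutive vector pairs."""
    return sum(pair_switching_activity(ordered[i], ordered[i + 1])
               for i in range(len(ordered) - 1))


v1, v2, v3 = [0, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1]
print(pair_switching_activity(v1, v2))         # 2
print(total_switching_activity([v1, v2, v3]))  # 4
```

Because only the XOR of node values enters equation (3), the constant factor of equation (2) can be dropped when comparing orderings.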
Peak power is defined as the number of transitions the CUT undergoes for the worst-case vector pair in a test set. Therefore, peak power can be calculated from equation (3) as follows:

$$SA_{peak} = \max_{1 \le i \le k-1} SA(v_i, v_{i+1}) \qquad (5)$$

Many previous works targeted the minimization of $SA_{peak}$, which corresponds to the number of transitions for the worst-case vector pair. In (Huang et al., 2006; Huang et al., 2009), the authors showed that the amount of current drawn from the supply voltage ($V_{cc}$) and the amount sunk to ground differ depending on the transition type. For example, they showed that a 0-to-1 transition on the output of a D flip-flop draws approximately twice (2x) as much current from the power supply as is sunk to ground (in the worst case). Conversely, a 1-to-0 transition results in current sunk to ground in excess of 2.5x that drawn from $V_{cc}$. Even though the experiments conducted by (Huang et al., 2006) consider only transitions on flip-flop outputs, the same concept can be generalized to any logic gate. This means that the direction of switching activity plays a major role in the amount of current consumed by the CUT; thus, minimizing the number of transitions alone does not guarantee true minimum peak power.
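Equation (5) can be sketched in the same style. The snippet below uses hypothetical helper names and again assumes node values come from a prior logic simulation; it also illustrates that $SA_{peak}$ depends on the order in which the same vectors are applied, which is what makes ordering a lever for peak power.

```python
from typing import List, Sequence


def pair_switching_activity(a: Sequence[int], b: Sequence[int]) -> int:
    """Equation (3): number of nodes that toggle between vectors a and b."""
    return sum(x ^ y for x, y in zip(a, b))


def peak_switching_activity(ordered: List[Sequence[int]]) -> int:
    """Equation (5): worst equation (3) value over consecutive vector pairs."""
    return max((pair_switching_activity(ordered[i], ordered[i + 1])
                for i in range(len(ordered) - 1)), default=0)


vectors = [[0, 0, 0], [1, 1, 1], [1, 1, 0]]
print(peak_switching_activity(vectors))                               # 3
print(peak_switching_activity([vectors[0], vectors[2], vectors[1]]))  # 2
```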
In this work, the maximum number of transitions in the same direction is used to estimate peak current. Therefore, to find peak current, the numbers of up (rising) and down (falling) transitions must be calculated for every vector pair. For a test vector pair $(v_a, v_b)$, they are given by equations (6) and (7), respectively:

$$RT(v_a, v_b) = \sum_{x=1}^{N} RT_x, \qquad RT_x = \overline{\mathrm{Node}(x, v_a)} \cdot \mathrm{Node}(x, v_b) \qquad (6)$$

$$FT(v_a, v_b) = \sum_{x=1}^{N} FT_x, \qquad FT_x = \mathrm{Node}(x, v_a) \cdot \overline{\mathrm{Node}(x, v_b)} \qquad (7)$$

where $RT_x$ and $FT_x$ denote up and down transitions on node x, respectively, and the overbar denotes logical complement. The peak current of the vector pair $(v_a, v_b)$ is then:

$$I_{peak}(v_a, v_b) = \max\left(RT(v_a, v_b),\ FT(v_a, v_b)\right) \qquad (8)$$

and the peak current of a test set is the largest value of equation (8) over all consecutive vector pairs.

To minimize peak power, previous work has generally targeted the $SA_{peak}$ term defined in equation (5); however, as described above, this does not necessarily result in minimum peak current. Consider, for example, the circuit shown in Figure 1.

Fig. 1. Example of CUT.

Table 1 gives different test set orderings for this circuit. The first two columns give the original ordering as generated by ATPG. Columns HD, SA, and Dir give possible orderings based on Hamming distance, switching activity, and peak current, respectively. For every ordered test set, each entry is the number of the vector in the original, unordered test set. The last three rows of Table 1 summarize the power statistics of the various orderings. The results show that peak current is reduced when the peak current concept is used, compared to the other approaches, including SA-based ordering. For example, the SA and Dir orderings have the same number of peak transitions (3), but the Dir ordering has a lower peak current (2 transitions in the same direction, compared to 3 for the SA ordering). This example demonstrates that even when peak transitions are equal, there is room to reduce peak current further.

Almost all ATPG tools available today generate test sets with don't care values, and these bits are commonly specified so as to reduce power.
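Equations (6)-(8) translate directly into code. The sketch below keeps the assumption that node values come from a prior simulation and uses hypothetical function names. The two example pairs are constructed for illustration and are not taken from Table 1: both toggle three nodes, yet their same-direction peaks differ, mirroring the SA-versus-Dir situation described above.

```python
from typing import List, Sequence, Tuple


def rise_fall(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    """Equations (6) and (7): count 0->1 and 1->0 transitions from a to b."""
    rises = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    falls = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    return rises, falls


def pair_peak_current(a: Sequence[int], b: Sequence[int]) -> int:
    """Equation (8): transitions in the dominant direction for one pair."""
    return max(rise_fall(a, b))


def peak_current_of_order(ordered: List[Sequence[int]]) -> int:
    """Peak current of an ordered test set: worst pair under equation (8)."""
    return max((pair_peak_current(ordered[i], ordered[i + 1])
                for i in range(len(ordered) - 1)), default=0)


# Two pairs with the same switching activity (3) but different peak current.
p = ([0, 0, 0, 1], [1, 1, 1, 1])  # 3 rising, 0 falling
q = ([0, 0, 1, 1], [1, 1, 0, 1])  # 2 rising, 1 falling
print(sum(rise_fall(*p)), pair_peak_current(*p))  # 3 3
print(sum(rise_fall(*q)), pair_peak_current(*q))  # 3 2
```

Since every toggle is either rising or falling, the two counts always sum to the switching activity of equation (3).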
Many approaches integrate an x-filling strategy with their reordering algorithm to find the best solution for low power test. In this work, we follow a similar approach, utilizing don't care bits to reduce peak current by performing x-filling as a post-processing step. Our x-refilling algorithm is based on the Fiduccia-Mattheyses (FM) algorithm (Fiduccia & Mattheyses, 1982) and is described in Section 3. The FM algorithm, which is commonly used in the computer-aided design of digital systems due to its simplicity and efficiency, is a partitioning algorithm based on the well-known Kernighan-Lin (KL) algorithm (Kernighan & Lin, 1970). The KL algorithm belongs to the class of iterative improvement algorithms: it starts with an initial partition and then moves nodes between partitions to improve it, repeating the process until no further improvement is possible. Fiduccia and Mattheyses developed an efficient variant of the KL algorithm in which only a single vertex is moved across the cut in each move, using a more efficient data structure (bucket lists indexed by vertex gain).
The FM algorithm has been used to reduce leakage (static) power during testing in nanometer technologies (Maiti & Chattopadhyay, 2008; Kao et al., 2010), in addition to many other applications. In this work, we employ the FM algorithm to reduce peak current by assigning suitable values of 0 or 1 to "don't care" bits. Each x in a test vector is considered a node. All nodes associated with don't care bits are placed in two partitions, a 0-partition and a 1-partition, according to their initial random fill value. Given this initial partition, we use the FM algorithm to reassign values to don't care bits by moving nodes across the partition boundary so as to reduce peak current.
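A minimal sketch of this initial partitioning is shown below, assuming test vectors are strings over {0, 1, x} and that each don't care bit is identified by a (vector index, bit index) pair. The helpers `random_fill` and `move` are hypothetical, and the fixed seed only makes the random fill reproducible.

```python
import random
from typing import Dict, List, Set, Tuple

Position = Tuple[int, int]  # (vector index, bit index) of one don't care bit


def random_fill(test_set: List[str], seed: int = 0
                ) -> Tuple[List[List[int]], Dict[int, Set[Position]]]:
    """Randomly fill every 'x' and build the 0-partition and 1-partition."""
    rng = random.Random(seed)
    filled: List[List[int]] = []
    partitions: Dict[int, Set[Position]] = {0: set(), 1: set()}
    for i, vec in enumerate(test_set):
        bits = []
        for j, ch in enumerate(vec):
            if ch in "xX":
                value = rng.randint(0, 1)
                partitions[value].add((i, j))  # node joins its fill partition
            else:
                value = int(ch)                # specified bit, never moved
            bits.append(value)
        filled.append(bits)
    return filled, partitions


def move(filled: List[List[int]], partitions: Dict[int, Set[Position]],
         pos: Position) -> None:
    """Move one don't care bit to the other partition (complement its value)."""
    i, j = pos
    old = filled[i][j]
    partitions[old].remove(pos)
    filled[i][j] = 1 - old
    partitions[1 - old].add(pos)
```

Moving a node across the boundary and complementing its bit are the same operation, which is why the partition view maps cleanly onto x-refilling.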
PEAK CURRENT MINIMIZATION FRAMEWORK
In this section, we describe the proposed framework for minimizing peak current in combinational circuit test. The framework consists of two phases: test vector ordering and x-refilling. Figure 2 gives the complete flow of the proposed framework. The methodology starts by generating a test set for the CUT using a combinational circuit ATPG. Next, all don't care bits in the unordered test set are randomly filled, and the test set is ordered using the DirPeak algorithm, which minimizes peak current. After that, the initially random don't care assignments are refilled using our proposed DirFM algorithm, an FM-based x-refilling algorithm that minimizes peak current. Although Figure 2 shows the complete framework in which both DirPeak and DirFM are used, the two algorithms are independent of each other, and either can be integrated into other test flows.
The problem of test vector reordering can be formulated on a complete graph, where every node represents a test vector and is connected to every other node by an undirected edge. The weight on an edge represents the cost of applying the two test vectors one after the other (i.e., as a vector pair). For example, if node 1 is connected to node 2, the edge represents the cost of applying the test vector pair $(v_1, v_2)$ or $(v_2, v_1)$. In our implementation, the edge weight is a function of the switching activity and the peak current of the vector pair. Using this graph representation, finding a test vector ordering that reduces peak current can be formulated as finding a minimum-cost Hamiltonian path (Girard et al., 1997; Badereddine et al., 2006; Flores et al., 1999; Chattopadhyay & Choudhary, 2003; Hashempour & Lombardi, 2008; Sokolov et al., 2005). This problem is closely related to the well-known travelling salesman problem and is NP-hard. Therefore, it is common practice to use heuristics to solve it (Hashempour & Lombardi, 2008; Sokolov et al., 2005). In our approach, we use a greedy algorithm (DirPeak) to solve the test vector ordering problem.
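As an illustrative sketch of this graph formulation, the Python below stores one weight per unordered vector pair. The weight is assumed here to be the tuple (switching activity, same-direction peak); the exact combining function is not specified in this section. Reversing a pair swaps its rising and falling counts, so both components are unchanged, which is why an undirected edge suffices.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Cost = Tuple[int, int]  # assumed weight: (switching activity, same-direction peak)


def edge_cost(a: Sequence[int], b: Sequence[int]) -> Cost:
    """Weight of applying a and b consecutively, in either order."""
    rises = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    falls = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    return rises + falls, max(rises, falls)


def build_graph(node_values: List[Sequence[int]]) -> Dict[Tuple[int, int], Cost]:
    """Complete undirected graph: one weighted edge per unordered vector pair."""
    return {(i, j): edge_cost(node_values[i], node_values[j])
            for i, j in combinations(range(len(node_values)), 2)}


graph = build_graph([[0, 0], [0, 1], [1, 1], [1, 0]])
print(len(graph))     # 6 edges for 4 vectors
print(graph[(0, 2)])  # (2, 2)
```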
Algorithm DirPeak is shown in Figure 3. First, the algorithm computes the number of transitions and the peak current for every test vector pair according to equations (3) and (8), respectively. Second, it adds all vectors to an unordered list, randomly picks a root (first_min) as the starting point of the Hamiltonian path, and removes it from the unordered list. Third, it picks from the unordered list the vector (second_min) with minimum edge cost with respect to first_min, i.e., the vector whose pairing with first_min has the minimum number of transitions, which governs peak power in equation (5), and the minimum number of transitions in the same direction (equation (8)). Last, second_min is deleted from the unordered list and becomes the new first_min, and the process is repeated until the unordered list is empty. In step 6a, the algorithm finds the vector with minimum switching activity, which takes $O(n)$ time per iteration of the while loop; since the loop is repeated n-1 times, the loop has $O(n^2)$ complexity.
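The greedy steps above can be sketched as follows. This is a hypothetical Python rendering of DirPeak, not the authors' Java implementation: the edge cost is assumed to compare transitions first and break ties by same-direction peak, and ties between equal-cost candidates are broken by vector index for reproducibility.

```python
import random
from typing import List, Optional, Sequence, Tuple


def edge_cost(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    """Assumed edge cost: (transitions, same-direction peak), compared lexicographically."""
    rises = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    falls = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    return rises + falls, max(rises, falls)


def dir_peak_order(node_values: List[Sequence[int]],
                   root: Optional[int] = None, seed: int = 0) -> List[int]:
    """Greedy nearest-neighbour Hamiltonian path over the vector graph."""
    n = len(node_values)
    if n == 0:
        return []
    # Step 1: cost of every vector pair (O(n^2) pairs).
    cost = [[edge_cost(node_values[i], node_values[j]) for j in range(n)]
            for i in range(n)]
    # Step 2: a random root becomes first_min and leaves the unordered list.
    unordered = set(range(n))
    first_min = root if root is not None else random.Random(seed).randrange(n)
    unordered.remove(first_min)
    order = [first_min]
    # Steps 3-4: repeatedly append the cheapest neighbour of the last vector.
    while unordered:
        second_min = min(unordered, key=lambda j: (cost[first_min][j], j))
        unordered.remove(second_min)
        order.append(second_min)
        first_min = second_min
    return order


vectors = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0], [1, 1, 0, 0]]
print(dir_peak_order(vectors, root=0))  # [0, 2, 3, 1]
```

The `min` over the unordered set is the O(n) search per iteration, giving the O(n^2) loop discussed above.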
Step 6 has the worst-case complexity, so the overall complexity of algorithm DirPeak is $O(n^2)$, where n is the number of test vectors in the test set. As mentioned earlier, many algorithms utilize don't care bits in test sets to reduce power. In our framework, we initially fill these don't care bits randomly and then reorder the test set; we then analyze these random assignments and modify them as needed to reduce peak current. The proposed x-refilling algorithm based on the FM algorithm, called DirFM, is given in Figure 4.
DirFM Algorithm:
For each node with "X" value do {
  1. peak_sa_1 := Compute peak switching activity;
  2. peak_sw_s_dir_1 := Compute peak switching with same direction;
  3. total_sa_1 := Compute total switching activity;
  4. Tentatively move node from its partition to another partition;
  5. peak_sa_2 := Compute peak switching activity;
  6. peak_sw_s_dir_2 := Compute peak switching with same direction;
  7. total_sa_2 := Compute total switching activity;
  8. cost := calculate_cost();
  9. if (cost < 1) {
       a. Make tentative movement permanent;
       b. peak_sa_1 := peak_sa_2;
       c. peak_sw_s_dir_1 := peak_sw_s_dir_2;
       d. total_sa_1 := total_sa_2;
       e. Lock this node;
     }
  10. else {
       a. Undo movement of this node;
       b. Lock this node;
     }
}
Fig. 4. DirFM algorithm.
The algorithm starts by calculating the total power, peak power, and peak current over all vector pairs and by creating two partitions: one containing all don't care bits that were filled with 0, and the other containing those that were filled with 1. Each FM step selects a member of one partition and tentatively moves it to the other partition (i.e., complements the bit value). The cost function is then calculated for the new assignment relative to the original one. If the cost is less than 1, the new assignment is better in terms of power, and the bit is permanently moved to the new partition; otherwise, the move does not reduce power and is undone. In either case, the bit is locked into its partition so that it cannot move again. The cost function used in the DirFM algorithm is given by equation (9).

To clarify the DirFM refilling heuristic, Figure 5 shows a graphical representation of it. Figure 5 assumes four don't care bits that were initially filled at random. Bits filled with 1 are placed in partition '1', whereas those filled with 0 are placed in partition '0'. Assuming the algorithm chooses bit 2 for refilling, the heuristic tentatively moves it from partition '0' to partition '1'. The cost function of the new assignment is found to be less than 1, so the move is made permanent. Node 2 is then locked, and a new node is selected, repeating the process. Because each don't care bit is visited and locked exactly once, the number of cost evaluations performed by DirFM is linear in the number of don't care bits m in the test set; hence, it is an O(m) algorithm.
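To make Fig. 4 concrete, the sketch below runs one DirFM pass on a toy circuit. Two loud assumptions apply: the three-gate circuit (`GATES`) is hypothetical and invented for illustration, and `calculate_cost` is a hypothetical stand-in for equation (9), whose exact form is not reproduced here. It simply returns the ratio of the summed metrics after and before the move, so a value below 1 means the move helped. Each don't care bit is visited once and then locked, as in the pseudocode.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# HYPOTHETICAL toy CUT: inputs are nodes 0-2; each gate appends one node.
GATES: List[Tuple[str, Tuple[int, int]]] = [
    ("AND", (0, 1)),  # node 3
    ("OR", (1, 2)),   # node 4
    ("XOR", (3, 4)),  # node 5
]
OPS: Dict[str, Callable[[int, int], int]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}
Metrics = Tuple[int, int, int]  # (peak SA, peak same-direction, total SA)


def simulate(vector: Sequence[int]) -> List[int]:
    """Logic-simulate one input vector; return the values of all nodes."""
    nodes = list(vector)
    for op, (i, j) in GATES:
        nodes.append(OPS[op](nodes[i], nodes[j]))
    return nodes


def metrics(test_set: List[List[int]]) -> Metrics:
    """Peak SA (eq. 5), peak same-direction transitions (eq. 8), total SA (eq. 4)."""
    values = [simulate(v) for v in test_set]
    peak_sa = peak_dir = total = 0
    for a, b in zip(values, values[1:]):
        rises = sum(1 for x, y in zip(a, b) if x < y)
        falls = sum(1 for x, y in zip(a, b) if x > y)
        peak_sa = max(peak_sa, rises + falls)
        peak_dir = max(peak_dir, rises, falls)
        total += rises + falls
    return peak_sa, peak_dir, total


def calculate_cost(old: Metrics, new: Metrics) -> float:
    """HYPOTHETICAL stand-in for equation (9): ratio of summed metrics."""
    old_sum, new_sum = sum(old), sum(new)
    if old_sum == 0:
        return 1.0 if new_sum == 0 else float("inf")
    return new_sum / old_sum


def dir_fm(test_set: List[List[int]],
           x_positions: List[Tuple[int, int]]) -> List[List[int]]:
    """One DirFM pass; refills the given don't care bits in place."""
    current = metrics(test_set)
    for i, j in x_positions:      # each bit is visited once, then locked
        test_set[i][j] ^= 1       # tentative move to the other partition
        candidate = metrics(test_set)
        if calculate_cost(current, candidate) < 1:
            current = candidate   # make the move permanent
        else:
            test_set[i][j] ^= 1   # undo the move
    return test_set


ordered = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]  # bit (1, 2) was an x, filled with 1
print(metrics(ordered))                       # (5, 5, 10)
dir_fm(ordered, [(1, 2)])
print(ordered[1], metrics(ordered))           # [1, 1, 0] (4, 4, 8)
```

In this toy example, flipping the don't care bit from 1 to 0 lowers all three metrics, so the move is kept; starting from a 0 fill, the reverse move would raise them and be undone.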
EXPERIMENTAL RESULTS
To validate the proposed algorithms, we compare the proposed ordering algorithm with commonly used algorithms based on Hamming distance and on switching activity.
Three different ordering algorithms were implemented:
MinHD: An algorithm that uses the minimum Hamming distance between vector pairs to find the best order.
MinSA: An algorithm that uses the peak switching activity (i.e., peak power) between vector pairs to find the best order.
DirPeak: The proposed algorithm, which uses both peak power and peak current to find the best order.
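For comparison, the MinHD baseline can be sketched as a greedy nearest-neighbour ordering on the filled primary-input vectors. That MinHD uses the same greedy search as DirPeak is an assumption made here for illustration only; the key difference the sketch shows is that Hamming distance is measured on input bits alone, without simulating internal nodes.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of primary-input bits that differ between two filled vectors."""
    return sum(x != y for x, y in zip(a, b))


def min_hd_order(vectors: List[Sequence[int]], root: int = 0) -> List[int]:
    """Greedy nearest-neighbour ordering by Hamming distance (assumed MinHD)."""
    if not vectors:
        return []
    remaining = set(range(len(vectors))) - {root}
    order = [root]
    while remaining:
        last = vectors[order[-1]]
        # Ties are broken by vector index so the result is reproducible.
        nxt = min(remaining, key=lambda j: (hamming(last, vectors[j]), j))
        remaining.remove(nxt)
        order.append(nxt)
    return order


print(min_hd_order([[0, 0, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0]]))  # [0, 2, 3, 1]
```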
All algorithms were implemented in Java and run on a PC with a 1.8 GHz Intel DuoCore processor and 2 GB RAM. Each algorithm uses equations (4) and (5) to calculate the total and peak switching activity of the test set. All algorithms were first run without the DirFM x-refilling step to analyze the effectiveness of the ordering algorithm DirPeak. DirFM refilling was then enabled for all three algorithms, and the impact of the proposed x-refilling strategy is discussed below. All algorithms were run on the ISCAS85 benchmark circuits, with test sets generated by the ATALANTA ATPG (Lee & Ha, 1993). The characteristics of the ISCAS85 benchmark circuits are highlighted in Table 2. To analyze the performance of the proposed algorithm, algorithms MinHD, MinSA, and DirPeak were run on the unordered test set generated by ATALANTA, using a random root and random filling of don't care bits. The results are shown in Table 3. In both Table 3 and Table 4, the last row gives the average values. The results are given in three columns for each algorithm, where columns "Dir", "peak", and "tot" represent peak current, peak power, and total transitions, respectively. Peak current here refers to the maximum number of transitions in a given direction. Moreover, only percent reductions in these values (the % columns) are given for algorithms MinSA and DirPeak, to ease the analysis of the results. The percent reduction is calculated using the values generated by algorithm MinHD as the base value.
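The percent-reduction columns follow a one-line formula, 100 x (base - value) / base, with the MinHD value as the base. The helper below is hypothetical; a negative result would indicate an increase relative to the MinHD baseline.

```python
def percent_reduction(base: float, value: float) -> float:
    """Percent reduction of `value` relative to the MinHD baseline `base`."""
    if base == 0:
        raise ValueError("baseline value must be non-zero")
    return 100.0 * (base - value) / base
```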
The first observation from Table 3 is that, on average, algorithm DirPeak outperforms algorithm MinHD in reducing peak current and peak power, by 22% and 20%, respectively. Moreover, the proposed algorithm reduced total power by 25% on average compared to MinHD, because the Hamming distance approach does not take internal circuit transitions into account.
The second observation is that algorithm DirPeak was, in most instances, able to further reduce the peak current and peak power found by algorithm MinSA. Table 3 shows an additional reduction of approximately 2-3% in peak current and peak power when comparing DirPeak with MinSA. This demonstrates that including switching direction in the cost function of the reordering algorithm yields some additional reduction in peak current compared with considering switching activity alone, as is the case in MinSA.

We also analyzed the impact of using the proposed x-refilling approach as a post-processing step to the three algorithms defined above. Table 4 shows the results of applying DirFM to the ordered test sets generated by algorithms MinHD, MinSA, and DirPeak. Note that DirFM only modifies don't care values that were initially filled randomly; it does not change the order of the test set generated by these algorithms. All columns in Table 4 are given as percent reductions compared to the base case of algorithm MinHD without the refilling heuristic (i.e., without DirFM).
Table 4 shows that using the proposed DirFM algorithm to refill don't care bits improves almost all power values for all algorithms. DirFM further reduced the peak current of algorithms MinHD, MinSA, and DirPeak by 12%, 10%, and 11%, respectively, compared to their versions without refilling. A similar observation can be made for peak and total power (reductions of approximately 10-17% compared to the counterparts without DirFM). Since all algorithms benefit from DirFM, this shows the importance of x-filling in providing further reductions in peak current and power. It also demonstrates the value of using the concept of peak current in x-filling strategies for low power test.
CONCLUSION
In this work, we proposed a framework to minimize peak current in combinational circuit test. The approach was shown to be effective in reducing peak current compared to other algorithms that do not consider switching direction in their cost functions. Experimental results on the ISCAS85 benchmark circuits showed reductions in peak current, peak power, and total power of 33%, 32%, and 43%, respectively, compared to Hamming distance-based ordering with random filling. Although the approach proposed in this work targets combinational circuits, the concept can be extended to scan-based designs.
ACKNOWLEDGEMENT
This work was supported by Kuwait University under Research Grant no. EO 02/08.
