Abstract: Power consumption and testability are two of the major considerations in modern VLSI design. The full-scan method has been widely used to improve the testability of sequential circuits. Owing to its lower overheads, partial-scan design has gradually become popular. The authors propose a partial-scan selection strategy that is based on structural analysis and considers the area and power overheads simultaneously. A sample-and-search algorithm is used to find the solution that minimises a user-specified cost function expressed in terms of power and area overheads. The experimental results show that the sample-and-search algorithm effectively finds the best solution of the specified cost function for almost all circuits and that, on average, the saving in overheads for each specific cost function is significant.
Introduction
Testability is one of the major concerns in VLSI design. However, automatic test pattern generation (ATPG) for sequential circuits is still considered a difficult problem, because the flip-flops lack direct controllability and direct observability.
To enhance the testability of sequential circuits, the full-scan method has been popular. In full scan, all the flip-flops are chained together into a shift register during the test mode. As the values of the flip-flops can be assigned and observed directly in test mode, only a combinational test generator is required to generate test vectors, and ATPG becomes much simpler. However, the area and delay overheads imposed by the full-scan approach can be significant, owing to the extra multiplexers in the scan flip-flops and the extra routing area for the scan chains. To reduce the overheads, the partial-scan approach has been proposed as an alternative. In partial-scan design, only a subset of the flip-flops is selected to be replaced by scan flip-flops.
Many approaches have been proposed to select the right set of scan flip-flops [1-12]. These approaches aim at low area overhead without considering the timing or power overheads. Only the approach proposed in [13] considers performance degradation. The authors of [13] proposed a method based on the structural analysis of sequential circuits. In this method, heuristics are used to select a minimal set of flip-flops that eliminates the cycles in a condensed version of the circuit graph, where vertices represent flip-flops and arcs represent combinational paths, such that the performance impact after the scan logic is added is as small as possible.
Power consumption has become one of the major concerns in modern VLSI design. In a CMOS circuit, power dissipation is directly related to the extent of switching activity of the nodes and the capacitance of the nodes in the circuit. The dynamic power consumption is typically expressed, up to a constant factor that depends on the supply voltage and the clock frequency, as
$$P \propto \sum_{f \in N} C_f \, D(f)$$
where $f$ is a node in the circuit $N$, $C_f$ is the capacitance of node $f$ and $D(f)$ is the switching activity of node $f$. The power consumption of the flip-flops in normal operation is likewise directly related to their switching activity and total capacitance. Scan flip-flops consume more power than nonscan flip-flops because the extra multiplexer adds capacitance. In partial-scan flip-flop selection, if we select flip-flops that have lower switching activities, the circuit will consume less extra power in the functional mode, because the power consumption is proportional to the switching activity of the flip-flops. Furthermore, a flip-flop with low activity rarely changes value and is therefore hard to control, so making it scannable also benefits testability. We can thus use the switching activity of flip-flops as a heuristic measure to achieve both better testability and lower power consumption. Note that our objective is to minimise the power overhead incurred by the added test logic in the functional mode rather than in the test mode.
As discussed so far, we can use the switching activity of flip-flops as a new measure for the selection of scan flip-flops in partial-scan design. In addition, the structural-analysis-based method has proven effective at selecting scan flip-flops that give both high fault coverage and low area overhead. Our new method is thus based on the structural-analysis method and incorporates the switching activities of flip-flops into the heuristic algorithm. A sample-and-search algorithm is also added to find, for each circuit, the best solution for a specified cost function expressed in terms of area and power overheads.
Testability, area, delay and power consumption are the four major concerns in VLSI design. However, it is impossible to obtain a design that is optimal in all four aspects, because these factors usually trade off against one another. A more realistic goal is to minimise some factors while satisfying constraints on the others. In this paper, we minimise a user-specified cost function, expressed as the weighted costs of area and power consumption, while satisfying the requirement of high testability. Performance is another very important factor in VLSI design and deserves more attention. However, as discussed in [13], the performance degradation caused by the test logic cannot be analysed easily and can vary significantly when different timing-driven logic synthesis tools are used in the experiments. We therefore limit the scope of this paper and do not discuss the performance factor, although our formulation of the problem and the proposed algorithm could be extended to cover it.
2 Calculating activity of flip-flops
Let us define signal probability and transition probability [14] first.
Signal probability
The signal probability $P_s(x)$ at a node $x$ is defined as the average fraction of clock cycles in which the steady-state value of $x$ is a logic high.
Transition probability
The transition probability $P_t(x)$ at a node $x$ is defined as the average fraction of clock cycles in which the value of $x$ at the end of the cycle is different from its initial value.
In the following discussion, we will use switching activity, activity or transition probability, interchangeably.
A synchronous sequential circuit or a finite-state machine (FSM) can be seen as a set of flip-flops acting together with a combinational circuit. Many approaches have been proposed to calculate the transition probabilities of flip-flops [15-22]. The approach proposed in [21] is a statistical method in which the circuit is simulated repeatedly under randomly generated input vectors while the outputs of the flip-flops are monitored. This is, essentially, a Monte Carlo simulation approach. In this paper, we use this approach to obtain the signal probabilities and transition probabilities of flip-flops because the method can handle large circuits with reasonably fast turnaround time.
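The Monte Carlo idea can be illustrated with a minimal Python sketch. It is not the method of [21] itself, which also decides statistically when to stop; here the number of cycles is simply fixed. The function names, the toy two-flip-flop circuit and the uniform random inputs are assumptions made for illustration only.

```python
import random

def estimate_activity(next_state, n_flops, n_inputs, cycles=20000, seed=0):
    """Estimate per-flip-flop signal and transition probabilities by random simulation."""
    rng = random.Random(seed)
    state = [0] * n_flops
    high = [0] * n_flops     # cycles in which the flip-flop output is 1
    toggles = [0] * n_flops  # cycles in which the output differs from the previous cycle
    for _ in range(cycles):
        inputs = [rng.randint(0, 1) for _ in range(n_inputs)]
        new_state = next_state(state, inputs)
        for i in range(n_flops):
            high[i] += new_state[i]
            toggles[i] += new_state[i] != state[i]
        state = new_state
    p_signal = [h / cycles for h in high]
    p_transition = [t / cycles for t in toggles]
    return p_signal, p_transition

def toy_next_state(state, inputs):
    # Hypothetical circuit: flip-flop 0 samples input a,
    # flip-flop 1 toggles only when both inputs are 1.
    a, b = inputs
    return [a, state[1] ^ (a & b)]

p_s, p_t = estimate_activity(toy_next_state, n_flops=2, n_inputs=2)
print("signal probabilities:", p_s)
print("transition probabilities:", p_t)
```

In this toy circuit the second flip-flop changes value in roughly a quarter of the cycles, so it has the lower activity of the two.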
3 Our new approach
The activity of flip-flops, which is a kind of controllability measure, cannot be the only criterion in choosing scan flip-flops to achieve high testability. In addition, the structural-analysis-based method has shown its effectiveness, with high fault coverage and low area overhead. Our new method is thus based on the structural-analysis method, with the switching activities of flip-flops incorporated into the heuristic algorithm. A sample-and-search algorithm is also added to find, for each circuit, the best solution for a cost function in terms of area and power overheads.
Cycle breaking algorithm
The typical partial-scan selection algorithm based on structural analysis selects the minimum number of vertices needed to break all cycles and all paths of length more than $d_{max}$ in the graph [4, 8]. The algorithm proposed by Lee and Reddy [8] has been shown to be very efficient in solving this problem. The algorithm is divided into two parts: one breaks the cycles and the other breaks the paths. The cycle-breaking problem reduces to the minimum feedback vertex set problem, which is NP-hard. A contraction-based algorithm developed in [23] was adopted to solve this problem. In the feedback vertex set algorithm, five graph reduction operations are used to reduce the graph. If the reduction process cannot be completed, a heuristic is used to select additional vertices from the reduced graph to break the cycles. The process of reduction and heuristic selection is repeated until the graph is empty. The five reduction operations are IN0(v), OUT0(v), IN1(v), OUT1(v) and LOOP(v). The outputs of the algorithm are sets $S_1$ and $S_2$. The union of $S_1$ and $S_2$ is the minimal vertex set found to break all cycles in a directed graph.
By modifying the acyclic graph G, the algorithm described here can also be used to break all paths whose lengths are longer than $d_{max}$ [8].
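The reduce-then-select loop for cycle breaking can be sketched in Python as follows. The semantics given to the reduction operations follow the usual contraction formulation: delete a vertex with no predecessors or no successors, bypass a vertex with a single predecessor or a single successor, and select a vertex with a self-loop. These semantics are an assumption here, as are the function name `feedback_vertex_set`, the degree-sum fallback heuristic, and the reading of the first set as the self-loop vertices and the second as the heuristic picks.

```python
def feedback_vertex_set(edges):
    """Contraction-style cycle breaking on a directed graph; returns (s1, s2)."""
    succ, pred = {}, {}
    for u, v in edges:
        for x in (u, v):
            succ.setdefault(x, set())
            pred.setdefault(x, set())
        succ[u].add(v)
        pred[v].add(u)

    def remove(v):
        # Delete v together with all of its incident edges.
        for u in pred.pop(v):
            if u != v:
                succ[u].discard(v)
        for w in succ.pop(v):
            if w != v:
                pred[w].discard(v)

    def bypass(v):
        # Merge v into its neighbours: every predecessor now feeds every successor.
        ins, outs = pred[v] - {v}, succ[v] - {v}
        remove(v)
        for u in ins:
            for w in outs:
                succ[u].add(w)
                pred[w].add(u)

    s1, s2 = set(), set()
    while succ:
        changed = True
        while changed:                      # apply reductions until none fires
            changed = False
            for v in list(succ):
                if v not in succ:
                    continue
                if v in succ[v]:            # LOOP(v): self-loop, v must be selected
                    s1.add(v)
                    remove(v)
                elif not pred[v] or not succ[v]:   # IN0(v) / OUT0(v)
                    remove(v)
                elif len(pred[v]) == 1 or len(succ[v]) == 1:   # IN1(v) / OUT1(v)
                    bypass(v)
                else:
                    continue
                changed = True
        if succ:
            # Fallback heuristic (assumed): largest in-degree plus out-degree.
            v = max(sorted(succ), key=lambda x: len(pred[x]) + len(succ[x]))
            s2.add(v)
            remove(v)
    return s1, s2

ring = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
s1, s2 = feedback_vertex_set(ring)
print(s1 | s2)
```

For the small ring above, the reductions alone collapse the cycle into a self-loop, so a single vertex is selected and the heuristic is never invoked.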
Cost function
In this Section, we first define a flexible cost function that accounts for both the area and the power overhead of the partial-scan selection. We then propose a structural-analysis-based method to minimise the cost function in the partial-scan selection. The general form of the cost function is defined as follows:
$$\text{cost} = w_a \times \frac{N_{scan}}{N_{total}} + w_p \times \frac{EP_{scan}}{P_{total}}$$
where $N_{scan}$ is the number of selected scan flip-flops, $N_{total}$ is the total number of flip-flops, $EP_{scan}$ is the extra power consumed by the scan flip-flops and $P_{total}$ is the power consumed by all flip-flops. The factor $\alpha$, which scales a flip-flop's activity to the extra power it consumes once it is replaced by a scan flip-flop, can be obtained from the standard cell library. In the following experiments, we assume $\alpha$ is equal to 0.2. We can thus calculate $EP_{scan}$ by summing the products of the scan flip-flops' activities and the factor $\alpha$. $P_{total}$ can be calculated, correspondingly, by summing the activities of all flip-flops. The parameters $w_a$ and $w_p$ are the area weight and power weight of the cost function and can be specified by users. Depending on the situation, users can specify different values of $w_a$ and $w_p$, so that our algorithm finds the partial-scan set that minimises the corresponding cost function. For example, if $w_p$ is set to 0, the algorithm aims at selecting the minimum number of flip-flops for scan, which corresponds to the solution with minimum area overhead. If $w_a$ is set to 0, the algorithm aims at selecting scan flip-flops that minimise the power overhead. In other situations, the algorithm can be used to trade off area against power.
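As a small worked example, the cost of a candidate scan set can be computed as below. This is a sketch under the reading that the area term is the fraction of flip-flops that are scanned and the power term is the ratio of extra scan power to total flip-flop power, both estimated from activities. The function name `scan_cost`, the dictionary layout and the three activity values are hypothetical.

```python
def scan_cost(activities, scan_set, w_area, w_power, alpha=0.2):
    """Weighted area/power cost of scanning the flip-flops in scan_set.

    activities maps each flip-flop name to its transition probability.
    """
    n_total = len(activities)
    extra_power = sum(alpha * activities[ff] for ff in scan_set)  # scan flip-flops only
    total_power = sum(activities.values())                        # all flip-flops
    area_term = len(scan_set) / n_total
    power_term = extra_power / total_power
    return w_area * area_term + w_power * power_term

# Hypothetical activities for a three-flip-flop circuit.
activities = {"ff0": 0.5, "ff1": 0.1, "ff2": 0.4}

cost_low = scan_cost(activities, {"ff1"}, w_area=1, w_power=10)
cost_high = scan_cost(activities, {"ff0"}, w_area=1, w_power=10)
print(cost_low, cost_high)
```

Both candidate sets scan one flip-flop, so the area term is identical; the set containing the low-activity flip-flop has the lower cost because its power term is smaller.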
We have modified the Lee-Reddy algorithm and introduced a sample-and-search algorithm to find the optimal solution for each specific cost function.
Modifications of Lee-Reddy algorithm
The Lee-Reddy algorithm can be divided into two parts: one breaks cycles and the other breaks paths. The same algorithm can be used in both parts, as shown in Section 3.1.
The cycle-breaking algorithm consists of two major steps: graph reduction and heuristic selection. In the graph reduction step, we modify two operations: (1) IN1(v) and (2) OUT1(v), the symmetric operation to IN1(v). The purpose of these modifications is to select the flip-flops that have lower activities, so that the extra power incurred by the scan flip-flops is lower.
We also modify the heuristic selection part. The original Lee-Reddy algorithm uses the sum (or product) of the in-degree and out-degree of each node as the heuristic measure to minimise the number of selected flip-flops. The larger the heuristic number, the higher the priority of selecting the corresponding flip-flop for scan. However, the new cost function is a weighted combination of the number of selected flip-flops and the extra power consumed by the scan flip-flops. A new selection heuristic, similar to the heuristic used in [13], is thus proposed. It combines the original degree-based measure with the activity of each flip-flop through a parameter $w$.
An activity threshold of 0.01 is used in our experiments. The activity is the transition probability of the flip-flop, and any activity smaller than the threshold is raised to the threshold. When $w$ is set to 0, the heuristic reduces to the original heuristic, which aims at minimising the area overhead. If $w$ is set to a relatively large number, the heuristic aims at selecting flip-flops with lower activities.
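As a rough illustration only, one form that has the properties described above adds to the degree sum a term that grows as the activity falls. This form is an assumption and is not necessarily the formula used by the authors; the function name `selection_score` and its parameters are hypothetical.

```python
def selection_score(in_degree, out_degree, activity, w, threshold=0.01):
    """Hypothetical selection priority: structural term plus w-weighted inverse activity."""
    clamped = max(activity, threshold)  # activities below the threshold are raised to it
    return (in_degree + out_degree) + w / clamped

# With w = 0 only the structural term remains.
structural_only = selection_score(2, 3, activity=0.5, w=0)

# With w > 0 a low-activity flip-flop outranks a busy one of equal degree.
quiet = selection_score(1, 1, activity=0.05, w=1)
busy = selection_score(1, 1, activity=0.5, w=1)
print(structural_only, quiet, busy)
```

Clamping the activity keeps the second term bounded, so a flip-flop that almost never switches cannot dominate the ranking without limit.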
Given a dependency graph G and a parameter $w$, the overall algorithm breaks cycles, breaks paths and calculates the cost.
As the cost function specified by the users can be any weighted combination of the power overhead and the area overhead, and circuits can have dramatically different structures, the parameter $w$ must be adjusted for each circuit to find the scan flip-flop set with the optimal cost. Therefore, a search algorithm is needed to find the best $w$, i.e. the one that minimises the cost function for a given circuit.
Sample-and-search algorithm
The search algorithm is used to find the best parameter $w$, which biases the heuristic selection part of the cycle-breaking algorithm, in order to find the set of scan flip-flops with minimum cost. Typically, the number of scan flip-flops selected tends to grow with the magnitude of the parameter $w$, whereas the power overhead incurred tends to fall as $w$ grows. Given a cost function with specific weights $w_a$ and $w_p$, the cost can be plotted against $w$ as shown in Fig. 1.
Fig. 1 Plot of cost against $w$ for s38417
Circuit s38417, cost = $1 \times N_{scan}/N_{total} + 10 \times EP_{scan}/P_{total}$, threshold = 0.01
In the first step of the search algorithm, we sample points at exponentially spaced intervals along the cost curve, i.e. $w = 0, 2^0, 2^1, \ldots, 2^n$, perform the partial-scan selection algorithm using each of these values of $w$ in the heuristic selection part, and then calculate the cost of the result for every sample point. The three values of $w$ with the lowest costs are picked as the start points of the second step, the detailed search. The reason for sampling $w$ in this manner is that, when $w$ is large, a nearby value $w'$ (i.e. $w$ plus a small number) makes no visible difference to the flip-flop selection. Sampling exponentially therefore avoids wasted effort in the first step. The second step performs a more detailed search from each of the three start points $w_{si}$ and obtains a result $w_{ri}$, respectively. The costs of the three results are then compared, and the $w_{ri}$ with the lowest cost is selected as the final result. As can be seen in our experimental results, three start points selected in the first step are enough for our algorithm to find the best $w$ in almost all cases.
We observe that the cost curve near each $w_{si}$ is much smoother. We can thus apply a binary search to locate the best solution around $w_{si}$. The two-step search is intended to identify the best result quickly, by sampling coarsely in the first step and searching in more detail in the second. Given a $w_{si}$, the algorithm first calculates the cost at $\frac{3}{2}w_{si}$. If the cost at $\frac{3}{2}w_{si}$ is smaller than the cost at $w_{si}$, the search space is set to the interval $(w_{si}, 2w_{si})$; otherwise it is set to $(\frac{1}{2}w_{si}, \frac{3}{2}w_{si})$. We define the left value of the search interval as $w_L$, the right value as $w_R$ and the middle value as $w_M$. The detailed search then proceeds recursively on this interval.
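The two-step search can be sketched in Python as follows. The exponential sampling, the choice of three start points and the choice of the initial interval follow the description above. The recursive halving rule inside `refine`, the stopping tolerance `tol` and all function and parameter names are assumptions made for illustration. The callable `cost_of` stands for one full run of the partial-scan selection with a given weight, and the sketch assumes that the cost is roughly unimodal inside each search interval.

```python
def sample_and_search(cost_of, n=10, keep=3, tol=0.5):
    """Two-step search for the weight w that minimises cost_of(w)."""
    cache = {}

    def cost(w):
        # Each evaluation stands for one full partial-scan selection run.
        if w not in cache:
            cache[w] = cost_of(w)
        return cache[w]

    # Step 1: exponentially spaced samples w = 0, 2^0, 2^1, ..., 2^n.
    samples = [0] + [2 ** k for k in range(n + 1)]
    starts = sorted(samples, key=cost)[:keep]

    def refine(w_l, w_r):
        # Assumed refinement rule: keep the half whose midpoint is cheaper.
        w_m = (w_l + w_r) / 2
        if w_r - w_l <= tol:
            return min((w_l, w_m, w_r), key=cost)
        if cost((w_l + w_m) / 2) < cost((w_m + w_r) / 2):
            return refine(w_l, w_m)
        return refine(w_m, w_r)

    # Step 2: detailed search around each start point.
    results = []
    for w_s in starts:
        if cost(1.5 * w_s) < cost(w_s):
            results.append(refine(w_s, 2 * w_s))
        else:
            results.append(refine(0.5 * w_s, 1.5 * w_s))
    best = min(results, key=cost)
    return best, cost(best)

# Toy cost curve with its minimum at w = 21, standing in for a real circuit.
best_w, best_cost = sample_and_search(lambda w: (w - 21) ** 2)
print(best_w, best_cost)
```

Caching the evaluated weights matters in practice, because every cost evaluation requires a complete cycle-breaking and path-breaking run.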
Overall algorithm
Combining the modified Lee-Reddy algorithm and the sample-and-search algorithm, the overall algorithm of our partial scan selection can be presented as follows:
(1) Use the statistical technique proposed in [21] to compute the activity of every flip-flop in the circuit.
(2) Construct the dependency graph from the circuit structure.
(3) Apply the search algorithm and the modified Lee-Reddy algorithm to find the parameter $w$, used in the heuristic selection part of the cycle-breaking algorithm, that gives the minimum cost, and report the corresponding scan flip-flops selected.
Experimental results
For the approach that breaks all cycles and all paths longer than $d_{max}$, deciding $d_{max}$ for different circuits is a critical and difficult problem. A practical method is to use ATPG in the partial-scan selection process to determine a $d_{max}$ that yields reasonably high fault coverage. In our experiments, several possible values of $d_{max}$ were tried in the Lee-Reddy algorithm for every ISCAS89 benchmark circuit, and a value of $d_{max}$ that results in a fault efficiency higher than 99% was selected. Fault efficiency is defined as the ratio of the number of detected faults to the number of irredundant faults in the circuit. Fault coverage is defined as the ratio of the number of detected faults to the total number of faults (including the redundant faults). We use a sequential ATPG to generate the test patterns and calculate the fault efficiency for the benchmark circuits. Table 1 shows the results obtained in this experiment. The flip-flop number column shows the number of scan flip-flops over the total number of flip-flops. All other ISCAS89 benchmark circuits, which are not listed in Table 1, have high fault efficiency even when only cycle breaking is applied. As the Lee-Reddy algorithm obtains results with high fault efficiency when it breaks paths longer than the $d_{max}$ listed in Table 1 for all ISCAS89 benchmark circuits, we assume that the fault efficiency will also be high in our modified algorithm when we use the same $d_{max}$ to break long paths. We did evaluate the results for many benchmark circuits, and their fault efficiency is almost 100%.
We conducted our experiments on a SPARC-20 workstation. Table 2 lists the comparisons of overheads between the Lee-Reddy algorithm and our algorithm. The user-specified parameters $w_a$ and $w_p$ are set to 1 and 10 in these experiments, and the activity threshold is set to 0.01.
The fourth column in Table 2 shows the values of $w$ that our search algorithm found for the heuristic selection step of our cycle-breaking algorithm. In Table 2, our search algorithm found the best $w$ for every circuit except s38417, which is marked by a * sign. Even though that result is not the global minimum, it is still a local minimum. Compared with the Lee-Reddy algorithm, our algorithm incurs less extra cost in all cases. On average, our algorithm achieves a 25.58% cost reduction compared with the original Lee-Reddy algorithm. For s838, the cost reduction is as high as 81.64%.
Table 3 shows the case where we set $w_a = 1$ and $w_p = 0$ in the cost function. This setup is aimed entirely at area overhead reduction. The results are comparable to those of the Lee-Reddy algorithm. Considering the activities of flip-flops does not offer a significant additional reduction if area minimisation is the only goal of the partial-scan selection; the static structural information, such as the number of fanins and fanouts, already works well in the structural-analysis-based approach.
Table 4 shows the case where $w_a = 0$ and $w_p = 10$. This setup is aimed entirely at minimising power overhead. The results show that our algorithm achieves a 38.70% power reduction compared with the original Lee-Reddy algorithm.
From the experimental results, we can see that the proposed algorithm takes only about 200 seconds on average to find the appropriate $w$ for each case. However, the run time increases with the circuit size, because the underlying cycle-breaking algorithm takes longer in those cases. According to Table 2, it takes about one CPU hour to find the $w$ for s38417. The run time is still acceptable because we only need to run the program once for each case.
Since the sample-and-search algorithm can find the best heuristic selection weight ($w$) for different cost functions with user-specified parameters $w_a$ and $w_p$, we ran our algorithm with several different values of $w_a$ and $w_p$ for some circuits. The results for circuit s15850 are shown in Table 5. We can see that the smaller the $w_a/w_p$ ratio, the larger the optimal $w$ and the lower the extra power. In exchange, the number of scan flip-flops selected is larger. This confirms our intuition that, when $w_p$ is set high, the goal is to minimise the power overhead.
Concluding remarks
In this paper, we proposed an approach that can exploit the trade-off between area and power overheads in the partial-scan selection problem. Thanks to the sample-and-search algorithm, our approach can efficiently find a very good solution for this objective. Currently, our approach considers only the area and power overheads; a method that considers area, power and timing simultaneously will be studied in the future.
