Abstract
I. Introduction
With the emergence of mobile computing and communication devices, the design of low-energy VLSI systems has become a major concern in circuit synthesis. A significant component of the energy consumed in CMOS circuits is caused by the switching activity (SA) at the circuit nodes during operation. The energy dissipated at a circuit node is proportional to the total number of 0 → 1 and 1 → 0 transitions that the logic signal undergoes at that node, multiplied by the node capacitance (which depends on its fanout and its transistor implementation). Because of increased SA, energy consumption in an IC may be significantly higher during testing than during normal (system) mode, which can cause excessive heating and degrade circuit reliability. Average-power optimization helps extend battery life in mobile applications.
Maximum power, whether sustained or instantaneous, may cause excessive heating or undesirable logic swing. Conventional BIST schemes with random patterns may need an excessive amount of energy because of the test length and the randomness of consecutive test vectors. Further, a significant amount of energy may be wasted during the scan operations alone.
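As a rough illustration of the proportionality stated above, the following minimal sketch counts the 0 → 1 and 1 → 0 transitions at each node and weights them by the node capacitance. The node names, waveforms, and capacitance values are hypothetical, and the result is in relative units only.

```python
def switching_energy(waveforms, capacitance):
    """Capacitance-weighted transition count over all nodes (relative units)."""
    total = 0.0
    for node, bits in waveforms.items():
        # A transition occurs whenever two consecutive logic values differ.
        transitions = sum(a != b for a, b in zip(bits, bits[1:]))
        total += transitions * capacitance[node]
    return total

# Hypothetical two-node example.
waveforms = {"n1": [0, 1, 1, 0, 1], "n2": [1, 1, 1, 0, 0]}
capacitance = {"n1": 2.0, "n2": 1.5}
energy = switching_energy(waveforms, capacitance)
print(energy)  # n1: 3 transitions * 2.0, n2: 1 transition * 1.5
```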
II. Related work
With the emergence of deep submicron technology, minimizing power/energy consumption during testing has become an important goal of BIST design. Existing techniques include toggle suppression and blocking of useless patterns at the inputs to the CUT [1], design of low-power TPGs [2] and low-transition random pattern generators [13], distributed BIST [17] and test scheduling [3], optimal vector selection, new scan path architectures [11], use of Golomb coding for low-power scan testing with compressed test data [14], and BIST for data paths [4]. For deterministic testing, energy reduction can be achieved by reordering scan chains and test vectors [6], or by test compaction [5].
III. New BIST design
We assume a test-per-scan BIST scheme as in the STUMPS architecture (Fig. 1). A modulo-m bit counter keeps track of the number of scan shifts, where m is the length of the longest scan path. Since the number of useful patterns is known to be a very small fraction of all generated patterns, a significant amount of energy is still wasted in the LFSR while it cycles through the useless patterns, even though they are blocked at the inputs to the CUT [1]. Further, test-vector reordering in a pseudorandom testing environment is a challenging task. In this paper, we propose a new BIST design that prevents the LFSR from cycling through the states generating useless patterns, and also reorders the useful test patterns in a desired sequence to minimize total energy demand. it generates the new sequence S. (vi) Synthesize a mapping logic (ML) with minimum cost, to augment the LFSR; the state transitions of the LFSR are modified under certain conditions to serve two purposes: (a) to prevent it from cycling through the states generating useless patterns, and (b) to reorder S_r to S; for all other conditions, the LFSR runs in accordance with its original state transition function. Fig. 2 shows a simple MUX-based design that can be used for this purpose. A similar idea of skipping LFSR states was used earlier for embedding a set of deterministic tests [10].
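The MUX-based state skipping can be sketched in software as follows. This is a minimal behavioral model under stated assumptions, not the circuit of Fig. 2: the 4-bit LFSR with feedback polynomial x^4 + x^3 + 1, the scan length of 3, the seed, and the contents of `ml_table` are all hypothetical, and the names `lfsr_step`, `next_state`, and `run` are introduced only for illustration.

```python
def lfsr_step(state):
    """One shift of a hypothetical 4-bit Fibonacci LFSR (x^4 + x^3 + 1)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF

def next_state(state, m, ml_table):
    """MUX behavior: follow the mapping logic only when C = 1 and M = 1.

    ml_table maps an end-state to the start-state it must jump to; a state
    listed in ml_table is treated as having C = 1, every other state C = 0.
    """
    if m and state in ml_table:
        return ml_table[state]
    return lfsr_step(state)

def run(seed, scan_len, ml_table, n_patterns):
    """Return the LFSR states visited while generating n_patterns patterns."""
    state, visited = seed, []
    for _ in range(n_patterns):
        for shift in range(scan_len):
            visited.append(state)
            m = shift == scan_len - 1  # M = 1 only at the end-state of a pattern
            state = next_state(state, m, ml_table)
    return visited

# Unmodified LFSR: five 3-bit patterns cover all 15 nonzero states.
plain = run(1, 3, {}, 5)
# Hypothetical skip table: after the second pattern jump to the fifth,
# then to the fourth, so the third pattern is never generated.
skipping = run(1, 3, {6: 14, 8: 11}, 4)
print(plain)
print(skipping)
```

In this toy run the states 13, 10, and 5 (the third pattern of the unmodified sequence) are never visited, which mirrors purposes (a) and (b) of the mapping logic.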
The following example of a TPG (see Fig. 3; ignoring the dotted arcs) illustrates the idea of the state-skipping technique. The LSB of the LFSR is shifted serially into the scan path, generating a test sequence S (Table 1). Some component of the SA is intrinsic (invariant over a full test session), and the rest is variable. Hence, SA can be represented as a directed complete graph called the activity graph (Fig. 4b), where each node represents a test vector, and the directed edge e_ij represents application of the ordered test pair (t_i, t_j). The weight w(e_ij) on the edge e_ij denotes the variable component of SA corresponding to the ordered pair of tests (t_i, t_j). The intrinsic component, being independent of test ordering, is represented as a node weight and may be ignored as far as the optimal ordering is concerned. An example of an activity graph is shown in Fig. 4b. The edge weights are represented as an asymmetric cost matrix (Fig. 4a), as the variable component of SA strongly depends on the ordering of test pairs. Thus, for the test sequence S (t1 → t2 → t3 → t4 → t5), the variable component of switching activity is 8 + 10 + 10 + 9 = 37. Now, if t3 is found to be a useless test pattern, it can be deleted along with all its incident edges. An optimal ordering of the remaining test vectors that minimizes the energy consumption is a min-cost Hamiltonian path: S (t1 → t2 → t5 → t4), with path cost 8 + 7 + 8 = 23 (Fig. 4c). Thus, in the new sequence S, for the ordered pair (t1 → t2), no action is required, as t2 is generated by the LFSR as the natural successor of t1. So, for s9 (end-state of t1), we set the Y-outputs of the mapping logic (see Fig. 2) to don't cares (d), and the control line C to 0. However, we need an additional transition from s14 (end-state of t2) to s8 (start-state of t5), and similarly from s11 (end-state of t5) to s6 (start-state of t4). For these combinations, the Y-outputs are determined by the corresponding start-states, and C is set to 1.
For all other remaining combinations, all outputs are don't cares. These transitions generate the useful test patterns in the desired sequence, and prevent the LFSR from cycling through the states that generate useless patterns (in this example, test t3). Further, the output M of the modulo-m bit counter assumes the value 1 only when the scan path (whose length is m) is filled, i.e., at the end-states of Fig. 3. In general, the mapping logic can be described as follows: given a seed, let S denote the original test sequence generated by the LFSR, and S = {t'_1, t'_2, …, t'_i, t'_{i+1}, …} denote the optimally ordered reduced test sequence consisting of useful vectors only. Let y_i denote the output of the i-th flip-flop of the LFSR, and Y_i denote the output of the mapping logic feeding the i-th flip-flop through a MUX (see Fig. 2). The ML is a combinational circuit with k inputs {y_0, y_1, …, y_{k-1}} and k+1 outputs {Y_0, Y_1, …, Y_{k-1}, C}, where k is the length of the LFSR, and C is a control output. For every test t'_i in S, there is a corresponding row in the truth table (Table 2). Thus, the next state of the LFSR follows the transition diagram of the original LFSR when either C = 0 or M = 0, and is determined by the outputs of the mapping logic if and only if CM = 1. Since the additional transitions emanate only from the end-states of test patterns, their occurrence is signaled by M = 1 together with C = 1. To prevent SA from occurring in the ML on every scan shift cycle, an enable signal E controlled by M is used. Thus, the y-inputs become visible to the ML if and only if M = 1. The test session terminates when the end-state of the last useful pattern in S is reached. Determining the optimal reordering of test patterns is equivalent to solving a traveling salesman problem (TSP), which, being NP-hard, requires heuristic techniques for a quick solution [9].
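The costs quoted for the five-vector example can be checked with a short script. The sketch below uses the edge weights of the Fig. 4a cost matrix as they appear in the source (row: first test of the ordered pair, column: second test) and finds the min-cost Hamiltonian path by exhaustive search. Exhaustive search is used here only because the example is tiny; it is not the heuristic TSP solver referred to in the text, and the function names are hypothetical.

```python
from itertools import permutations

TESTS = ["t1", "t2", "t3", "t4", "t5"]
# Variable SA component w(e_ij) for the ordered pair (t_i, t_j), from Fig. 4a.
COST = [
    [0, 8, 9, 11, 9],
    [11, 0, 10, 12, 7],
    [8, 6, 0, 10, 6],
    [10, 9, 6, 0, 9],
    [11, 8, 7, 8, 0],
]

def path_cost(order):
    """Sum of edge weights along an ordered sequence of tests."""
    idx = [TESTS.index(t) for t in order]
    return sum(COST[a][b] for a, b in zip(idx, idx[1:]))

def best_order(useful):
    """Min-cost Hamiltonian path over the useful tests (brute force)."""
    return min(permutations(useful), key=path_cost)

original_cost = path_cost(["t1", "t2", "t3", "t4", "t5"])
best = best_order(["t1", "t2", "t4", "t5"])  # t3 deleted as useless
print(original_cost)          # 37
print(best, path_cost(best))  # ('t1', 't2', 't5', 't4') 23
```

For realistic test sets the number of orderings grows factorially, which is why a heuristic solver is needed in practice.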
IV. Experimental results
We assume a fully isolated scan-path architecture [12], where the scan register (SCAN) is completely separated from the CUT by a buffer register (BUFFER). This eliminates SA rippling through the CUT while a test pattern is being shifted in. When all the bits of a test pattern are shifted in, the content of SCAN is copied to BUFFER, and then applied to the CUT. During the system mode, the CUT outputs are captured in BUFFER, and then copied to SCAN. The response vector is shifted out while the next test pattern is shifted into SCAN. A test cycle corresponds to applying a particular test vector to the CUT and capturing its response. A major component of the scan-shift SA, although dependent on consecutive vector pairs, remains invariant over a complete test session. Hence, this portion may also be treated as intrinsic as far as TSP optimization is concerned.
Fig. 4a. Cost matrix of the activity graph (row: first test of the ordered pair; column: second test).

      t1   t2   t3   t4   t5
t1     0    8    9   11    9
t2    11    0   10   12    7
t3     8    6    0   10    6
t4    10    9    6    0    9
t5    11    8    7    8    0

Experiments were carried out on several ISCAS-89 scan benchmark circuits, and results are reported in Tables 3 through 6. A 25-bit LFSR is used to generate 20,000 pseudorandom test vectors, and the useless vectors are identified and eliminated by running the HOPE fault simulator [7]. Since the modified LFSR generates only the useful patterns, a significant amount of test application time is saved, as shown in the last column of Table 3. Next, for each pair of useful test vectors, SA in the CUT and the scan path is computed. We assume a single linear scan chain. Table 5 shows the (intrinsic) SA due to scan shift and capturing the responses in the BUFFER. To determine an optimum reordering of useful test vectors, we consider only the variable component of SA occurring in the CUT and the scan path for every ordered pair of useful vectors. We then run a TSP solver [8, 9] to find a nearly optimal ordering. Next, the mapping logic (ML) for the LFSR is synthesized using ESPRESSO [15] and SIS [16]. Table 4 indicates a significant amount of total energy savings in the LFSR. For small-sized circuits, the relative overhead of ML is high compared to the cost of the CUT, as we have used a 25-bit LFSR. However, for large circuits, e.g., s35932, the overhead of ML is low.
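To make the synthesis step more concrete, the sketch below writes the mapping-logic truth table of the small example in Section III (s9 with C = 0; s14 → s8 and s11 → s6 with C = 1) as a PLA description of the kind ESPRESSO accepts. Several points are assumptions made for illustration: a 4-bit LFSR, state s_n encoded as the binary value n with the most significant bit first, the output order Y-bits followed by C, and the helper names. The `.type fr` declaration lists ON-set and OFF-set rows explicitly, so every input combination not listed is left as a don't care, and `-` in an output column leaves that output unspecified for the row.

```python
def to_bits(value, width):
    """Binary string of fixed width, most significant bit first."""
    return format(value, f"0{width}b")

def mapping_logic_pla(k, jumps, natural_end_states):
    """Build a PLA text for the mapping logic.

    jumps: {end_state: start_state} rows with C = 1.
    natural_end_states: end-states whose successor is the natural one (C = 0).
    """
    rows = []
    for end, start in sorted(jumps.items()):
        rows.append(f"{to_bits(end, k)} {to_bits(start, k)}1")
    for end in sorted(natural_end_states):
        rows.append(f"{to_bits(end, k)} {'-' * k}0")
    header = [f".i {k}", f".o {k + 1}", ".type fr", f".p {len(rows)}"]
    return "\n".join(header + rows + [".e"])

# Example transitions from Section III, under the assumed state encoding.
pla = mapping_logic_pla(4, {14: 8, 11: 6}, [9])
print(pla)
```

The resulting text could be saved to a file and passed to a two-level minimizer; the cost of the minimized logic depends on the actual state encoding, which this sketch does not attempt to reproduce.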
V. Conclusion and future problems
A new BIST design is described for saving energy in both the LFSR and the CUT in a random testing environment. A significant component of the SA is observed to be intrinsic in nature and, for a given test set, cannot be reduced by vector reordering. To reduce this component, either a different set of useful test vectors must be selected from the random sequence, or the scan path architecture must be radically redesigned. Another intrinsic source of power consumption is the clocking circuitry, which is not considered in this work. Ensuring reusability of the mapping logic and BIST hardware for different cores on a chip is another open area of study.
