This paper presents a new technique for power minimization during test application in sequential circuits using multiple scan chains. The technique is based on a new design for test (DFT) architecture and a novel test application strategy which reduces spurious transitions in the circuit under test. To facilitate the reduction of spurious transitions, the proposed DFT architecture is based on classifying scan latches into compatible, incompatible and independent scan latches.
Introduction
Minimization of power dissipation in very large scale integrated (VLSI) circuits is important to improve the reliability and reduce packaging costs [1]. This indicates that future successful portable applications will depend not only on low-power design methods but also on new design for testability (DFT) techniques targeting low-power VLSI circuits. Numerous techniques for investigating power minimization during the normal (functional) mode [2] have been proposed.
Also it is important to examine the power dissipation during the testing mode [3, 4] mainly for the following two reasons. Firstly it was outlined in [1] that power dissipated during test application is substantially higher than power dissipated during functional operation which can decrease the reliability of the circuit under test due to higher temperature and current density.
Secondly, the excessive power/ground noise caused by the high rate of current flowing in power and ground lines can erroneously change the logic state of circuit lines, causing some good dies to fail the test [5] and leading to yield loss. While minimizing power dissipation in full scan sequential circuits is the focus of this paper, in order to provide a meaningful understanding of the novel proposed approach a comprehensive review of sources of higher power dissipation during test application and low power testing techniques is given in sections 1.1 and 1.2. Motivations and objectives of the proposed work are presented in section 1.3.
Sources of higher power dissipation during test application
Depending on the level of abstraction and circuit type, high power dissipation during test application is due to the following problems:
i. Systems which comprise modern memory systems and multichip modules (MCMs) employ power-conscious architectural decisions where blocks are not simultaneously activated under functional operation [6] . Hence, inactive blocks do not contribute to power dissipation during the functional operation. However, when the system is in the test mode of operation, concurrent execution of tests in many blocks will result in substantially higher power dissipation when compared to functional operation.
ii. Low power combinational circuits are synthesized by algorithms [2] which seek to optimize the signal or transition probability of circuit nodes using only the spatial dependencies inside the circuit assuming the transition probabilities of primary inputs to be given. However, the complex spatiotemporal correlations which occur at the primary inputs must be considered [2] . This is of further importance during test application since correlation between consecutive test vectors generated by an automatic test pattern generator (ATPG) is very low, because a test vector is generated for a given target fault without any consideration of the previous test vector in the test sequence. The low correlation between consecutive test vectors during test application leads to substantially higher power dissipation when compared to functional operation.
iii. Low power sequential circuits are synthesized by state assignment algorithms which use state transition probabilities [2] . The state transition probabilities are computed assuming input probability distribution and state transition graph which is valid during functional operation. These two assumptions are not valid during the test mode of operation when scan DFT technique is employed. While shifting out test responses, the scan latches are assigned uncorrelated values that destroy the correlation between successive states.
Furthermore, in the case of data path circuits with large number of states that are synthesized for low power using the correlations between data transfers [2] , in the test mode scan registers are assigned uncorrelated values which are never reached during functional operation leading to substantially higher power dissipation.
Previous work on low power testing
This section gives a comprehensive review of recently proposed solutions for solving problems (i) -(iii) of section 1.1.
Problem (i):
To overcome the problem of high power dissipation during test application at the system level, numerous power-constrained test scheduling algorithms have been proposed under built-in self-test (BIST) environment [1, 6-11]. The approach in [1] schedules the tests under power constraints by grouping and ordering based on floorplan information. A further exploration in the solution space of the scheduling problem is provided in [6] where a resource graph formulation for the test problem is given and tests are scheduled concurrently without exceeding their power ratings during test application. To overcome the identification of all the cliques in a graph and the covering table minimization problem applied in [6], which are well known NP-hard problems, the solution proposed in [7] uses the left edge algorithm and tree growing technique as a heuristic for the block test scheduling problem. Several solutions for scheduling tests under power and area constraints [8-11] have recently been proposed. However, all the previous approaches assume a BIST environment, which incurs high test area overhead and test application time in exchange for lower power dissipation during testing.
Problem (ii):
A new ATPG tool [5] was proposed to overcome the low correlation between consecutive test vectors during test application in combinational circuits. Despite achieving the objectives of safe and inexpensive testing of low power circuits, the approach in [5] increased the test application time. A different approach for minimizing power dissipation during test application in combinational circuits (problem ii) is based on test vector ordering [12-15]. Test vector ordering is done in a post-ATPG phase with no overhead in test application time since test vectors are reordered such that correlation between consecutive test vectors matches the assumed transition probabilities of primary inputs used for switching activity computation during low power logic synthesis. However, the computational time in [12] is very high due to the complexity of the test vector ordering problem, which is reduced to finding a minimum cost Hamiltonian path in a complete, undirected, and weighted graph. The high computational time is overcome by the techniques proposed in [13-15] where test vector ordering assumes high correlation between switching activity in the circuit under test and the Hamming distance [13, 14] or transition density [15] at circuit primary inputs. For combinational circuits employing BIST several techniques for minimizing power dissipation have been proposed recently [16-23]. In [16] the use of dual speed linear feedback shift register (LFSR) lowers the transition density at the circuit inputs leading to minimized power dissipation. Optimal weight sets for input signal distribution are determined in order to minimize average power [17], while the peak power is reduced by finding the best initial conditions in the cellular automata (CA) cells used for pattern generation [18].
It has been proved in [19] that all the primitive polynomial LFSR of the same size, produce the same power dissipation in the circuit under test, thus advising to use the LFSR with smaller number of XOR gates since it yields lowest power dissipation by itself. A mixed solution based on reseeding LFSRs and test vector inhibiting to filter few non-detecting subsequences of a pseudorandom test sequence has been proposed in [20] . An enhancement of test vector inhibiting technique has been proposed in [21] where all the non-detecting subsequences are filtered. A different approach for filtering non-detecting vectors inspired by the precomputation architecture is presented in [22] . An improvement in area overhead associated with filtering non-detecting vectors without penalty in fault coverage or test length has been achieved using non-linear hybrid cellular automata [23] . Regardless of the type of test pattern generator, BIST architectures significantly differ from one another in terms of power dissipation as outlined in [24] . Thus, circuit partitioning for low power BIST and test session planning have an important influence on power dissipation as shown in [25] . Regularity of multiplier modules and linear sized test set required to achieve high fault coverage lead to efficient low power BIST implementations for data paths [26] . Although the techniques proposed for minimizing power dissipation during test application in combinational circuits achieve good results, different approaches are required for sequential circuits where both DFT methodology and test application strategy have a strong impact on power dissipation.
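The idea behind Hamming-distance-based test vector ordering mentioned above can be illustrated with a minimal sketch. It uses a greedy nearest-neighbour heuristic, which is an assumption made here for illustration and not necessarily the procedure used in [13, 14]; the function names are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def total_distance(seq: List[str]) -> int:
    """Sum of Hamming distances between consecutive vectors."""
    return sum(hamming(a, b) for a, b in zip(seq, seq[1:]))


def greedy_order(vectors: List[str]) -> List[str]:
    """Reorder vectors so that each one is followed by its nearest
    remaining neighbour (assumed heuristic, not an optimal ordering)."""
    if not vectors:
        return []
    remaining = list(vectors)
    ordered = [remaining.pop(0)]  # keep the first vector as the start
    while remaining:
        last = ordered[-1]
        idx = min(range(len(remaining)),
                  key=lambda i: hamming(last, remaining[i]))
        ordered.append(remaining.pop(idx))
    return ordered


test_set = ["0000", "1111", "0001", "1110"]
reordered = greedy_order(test_set)
print(total_distance(test_set), total_distance(reordered))
```

The heuristic runs in quadratic time in the number of vectors, which hints at why distance-based ordering is cheaper than searching for a minimum cost Hamiltonian path.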
Problem (iii):
To minimize power dissipation in non scan sequential circuits during test application a test pattern generation methodology for low power dissipation has been proposed in [27]. The methodology is based on three independent steps comprising redundant test pattern generation, power dissipation measurement and optimal test sequence selection. The methodology, which is based on genetic algorithms, achieves considerable savings in power dissipation; however, it cannot be applied to scan sequential circuits where shifting power dissipation is the major contributor to total power dissipation. To minimize shifting power dissipation in scan sequential circuits, test vector inhibiting techniques proposed for combinational circuits are extended to scan sequential circuits [28]. In [29] the test vector inhibiting technique is extended where the modules and modes with the highest power dissipation are identified, and gating logic to reduce power dissipation has been introduced. Despite substantial savings in power dissipation, vector detection and gating logic introduce not only significant area overhead but also considerable performance degradation for modified scan cell design. In [30] a new scan BIST structure has been proposed based on the experimental observation that a very high fault coverage can be obtained by a small number of clusters of test vectors. Although not targeted specifically for low power dissipation during test application, the approach in [30] yields high fault coverage with correlated scan patterns which will also lead to lower power dissipation.
A similar approach is employed in the low transition random test pattern generator (LT-RTPG) proposed in [31], where neighbouring bits of the test vectors are assigned identical values in most test vectors. A simple and fast procedure to compact scan vectors as much as possible without exceeding power dissipation has been proposed in [32]. All the previous scan-based BIST techniques [28-32] introduce test area overhead and/or further performance degradation when compared to scan DFT methodology. A different technique [12] based on test vector and scan latch ordering minimizes power dissipation in full scan sequential circuits without any overhead in test area or performance degradation. Further benefit of the post-ATPG technique proposed in [12] is that minimization of power dissipation during test application is achieved without any decrease in fault coverage and/or increase in test application time. However, the technique is test set dependent and cannot significantly reduce power dissipation despite a large computational time required to explore the large design space. Furthermore, for circuits with large number of scan latches the technique proposed in [12] is infeasible since computational time required to compute the cost function of each solution in the large design space is unacceptably large. A further enhancement of the technique proposed in [12] can be achieved by defining novel test application strategies since the value of primary inputs is irrelevant while shifting out test responses. Hence, an improvement to scan latch and test vector ordering based on primary input freezing has been proposed in [33]. The approach does not introduce area overhead or further performance degradation; however, it requires high computational times for large circuits. A different approach to achieve power savings is the use of extra primary input test vectors and hence supplementary volume of test data [34, 35].
The technique proposed in [34] exploits the redundant information that occurs during scan shifting, test application and response capture to minimize switching activity in the circuit under test. Despite achieving considerable power savings the technique requires long test application time and large volume of test data. The volume of test data is reduced in [35] where a D-algorithm like pattern generator [36] is developed to generate a single control pattern to mask the circuit activity while shifting out response. The input control technique proposed in [35] can further be combined with previously proposed scan latch and test vector ordering [12] to achieve, however, modest savings in power dissipation. Moreover, both approaches based on extra test vectors [34, 35] require high computational time and hence are infeasible for large sequential circuits.
Motivation and objectives
The aim of this paper is to reduce power dissipation in scan sequential circuits (problem iii).
Despite their benefits in lowering power dissipation during test application, the previously described techniques [12, 28-35] are inefficient due to one or more of the following problems:
a. test area overhead associated with detection logic [28, 29] required to find non-essential vectors (i.e. vectors which do not contribute to an increase in fault coverage).
b. performance degradation associated with modified scan cell design [29].
c. large test application time required to achieve significant power savings [29-32, 34].
d. clock tree power dissipation is tackled by clock gating only for non-essential test vectors [29].
e. high number of extra test vectors [34] emerges as a problem to testers which need to change to support the large volume of test data [37].
f. computational time may be prohibitively large hindering the exploration for large sequential circuits [12, 33-35].
The previous techniques [12, 28-35] proposed separate solutions for solving one of the problems (a)-(f) at the expense of the other problems. For example, while test vector inhibiting techniques [28, 29] achieve good savings in power dissipation, considerable area overhead for detection logic is introduced (problem a) or further performance degradation is incurred (problem b). On the other hand, techniques based on adjacent patterns [30-32] require considerable test application time (problem c). Furthermore, clock tree power dissipation (problem d), which can be up to one third of total power dissipation [38], is tackled only in [29] where the clock is gated only for non-essential test vectors. This implies that for essential vectors there are no savings in clock tree power dissipation. The technique proposed in [34] necessitates an increase in the volume of test data that depends on m and p, where m is the number of scan latches and p is the number of primary inputs. While volume of test data (problem e) was not a concern in the past for small to medium sized circuits, it is recently emerging as a problem for testers which need to change to support the large volume of test data [37]. The technique proposed in [35] overcomes the problem with large volume of test data by computing a single extra vector. However, it yields modest savings in power dissipation due to its inability to fully mask the activity in the combinational part of the circuit. Furthermore, to achieve good fault coverage both techniques based on extra vectors [34, 35] require longer test sequences and hence both higher test application time (problem c) and computational time (problem f). Finally, techniques which operate in a post-ATPG phase [12, 33] using compact test sets for high fault coverage require huge computational time (problem f) since they are strongly test set dependent and require probabilistic optimization.
The aim of this paper is to introduce a new technique for power minimization during test application in full scan sequential circuits based on a novel DFT architecture which eliminates all the above mentioned problems (a) -(f). The proposed DFT architecture is based on partitioning scan latches into multiple scan chains which reduces the clock tree power dissipation and does not have performance penalty. A new test application strategy for the proposed DFT architecture which applies a single extra test vector while shifting out test responses for each scan chain is presented. The multiple scan chain-based approach for power minimization which is test set independent, is applicable to both non-compact and compact test sets leading to low test application time. This paper shows that with low test area and test data overhead high savings in power dissipation during test application in large full scan sequential circuits are achieved in low computational time.
Background and Definitions
In the following, a brief review of the standard test terminology and power dissipation concepts which will be used throughout the paper is presented.
The controlling value for a gate is a single input value that uniquely determines the output to a known value independent of the other inputs to the gate. For example, the controlling value for OR gate is 1, and for AND gate is 0. If the value of an input is the complement of the controlling value, then the input has a noncontrolling value. A path is a set of connected gates and wires. A path is defined by a single input wire and a single output wire per gate. A signal is an on-input if it is on the target path. A signal is an off-input (side input) if it is an input to a gate which is on a target path but is not an on-input. Power dissipation in digital CMOS circuits is divided into static and dynamic power. The static power is considered negligible when compared to the dynamic power in digital CMOS circuits [39]. If the gate is part of a synchronous digital circuit controlled by a global clock, it follows that the dynamic power dissipation $P_d$ is calculated using:

$$P_d = \frac{1}{2} \cdot C_{load} \cdot \frac{V_{DD}^2}{T_{cyc}} \cdot N_G$$

where $C_{load}$ is the load capacitance, $V_{DD}$ is the supply voltage, $T_{cyc}$ is the global clock cycle, and $N_G$ is the total number of gate output transitions ($0 \rightarrow 1$ or $1 \rightarrow 0$). Since supply voltage $V_{DD}$ and global clock cycle $T_{cyc}$ are design constraints, they are not under designer control. Thus, the node transition count

$$NTC = \sum_{\text{all gates } G} N_G \cdot C_{load}$$

is reported as quantitative measure for power dissipation throughout the paper. It has been assumed that load capacitance for each combinational gate is equal to the number of fan-outs.
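The transition-count measure described above can be sketched for a toy combinational circuit under the zero delay model. The netlist format, the gate names and the choice to count a primary output as one fan-out are assumptions made for this illustration only.

```python
from typing import Dict, List, Tuple

Gate = Tuple[str, List[str]]  # (gate type, input net names)

OPS = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}


def evaluate(gates: Dict[str, Gate], inputs: Dict[str, int]) -> Dict[str, int]:
    """Zero-delay evaluation; gates must be listed in topological order."""
    values = dict(inputs)
    for name, (kind, ins) in gates.items():
        values[name] = OPS[kind]([values[i] for i in ins])
    return values


def fanout_counts(gates: Dict[str, Gate], outputs: List[str]) -> Dict[str, int]:
    """Load of each gate taken as its number of fan-outs
    (assumption: a primary output counts as one fan-out)."""
    counts = {name: 0 for name in gates}
    for _, ins in gates.values():
        for net in ins:
            if net in counts:
                counts[net] += 1
    for out in outputs:
        counts[out] += 1
    return counts


def node_transition_count(gates, outputs, sequence) -> int:
    """Sum over consecutive input vectors of load-weighted output toggles."""
    load = fanout_counts(gates, outputs)
    prev = evaluate(gates, sequence[0])
    ntc = 0
    for vec in sequence[1:]:
        cur = evaluate(gates, vec)
        ntc += sum(load[g] for g in gates if cur[g] != prev[g])
        prev = cur
    return ntc


circuit = {
    "g1": ("AND", ["a", "b"]),
    "g2": ("OR", ["g1", "c"]),
    "g3": ("NOT", ["g1"]),
}
vectors = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 1, "c": 1},
]
print(node_transition_count(circuit, ["g2", "g3"], vectors))
```

Because the supply voltage and clock cycle are fixed, comparing two test application schemes on this count is enough to compare their dynamic power under the stated model.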
The node transition count in scan latches N SL is considered as in [12] , where it was shown that 
Power Minimization in Full Scan Sequential Circuits Based on Multiple Scan Chains
In this section a new technique for power minimization in full scan sequential circuits based on multiple scan chains is introduced. Section 3.1 overviews the proposed design for testability (DFT) architecture for power minimization. Section 3.2 defines compatible, incompatible and independent scan latches and their importance for partitioning scan latches into multiple scan chains, as described in section 4, is explained through examples. Interestingly, although a previous approach [34] used the term "independent", they actually classified primary inputs as independent and not scan latches as it is the case in section 3.2. Therefore, there is no similarity between the previous approach [34] and the proposed classification beyond accidental sameness of terminology. Finally section 3.3 gives an important theoretical result showing the advantage of the proposed DFT architecture from the clock tree power dissipation standpoint.
Proposed Design for Testability Architecture Using Multiple Scan Chains
The proposed DFT architecture using multiple scan chains $SC_0, \ldots, SC_{k-1}$ is illustrated in Figure 1. The scan input ScanIn is routed to all scan chains while the scan output ScanOut is selected While shifting out test responses through scan chain $SC_i$, only the bit position i of the scan control register is set to 1 while the other positions are set to 0. This is easily implemented by shifting the value of 1 through the scan control register using the extra scan clock SCLK. Before starting the first scan cycle, the initial vector 10...00 is set up in the scan control register using the scan input ScanIn. Thereafter, for each scan cycle, the 10...00 value is propagated circularly through the scan control register as shown in Figure 1. It should be noted that when the circuit under test is in the test mode all the faults in the extra logic are observable through the ScanOut line using the test data which is shifted through the k scan chains and control data shifted through the scan control register. Therefore the extra test hardware does not cause any decrease in fault coverage.
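The one-hot behaviour of the scan control register can be mimicked in a few lines. This is a behavioural sketch only: the function names are hypothetical, and one rotation step is assumed to correspond to one SCLK pulse issued after a chain has been shifted.

```python
from typing import List


def initial_control(k: int) -> List[int]:
    """Initial one-hot vector: bit 0 set, all other bits cleared."""
    return [1] + [0] * (k - 1)


def rotate(reg: List[int]) -> List[int]:
    """Circular shift by one position (one assumed SCLK pulse)."""
    return [reg[-1]] + reg[:-1]


def active_chain(reg: List[int]) -> int:
    """Index of the single scan chain enabled by the register."""
    assert sum(reg) == 1, "register must stay one-hot"
    return reg.index(1)


def chain_schedule(k: int, steps: int) -> List[int]:
    """Sequence of active chain indices over the given number of steps."""
    reg = initial_control(k)
    order = []
    for _ in range(steps):
        order.append(active_chain(reg))
        reg = rotate(reg)
    return order


print(chain_schedule(3, 5))
```

Since the register always holds exactly one 1, at most one scan chain is active at any time, which is the property the architecture relies on.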
Compatible, Incompatible and Independent Scan Latches
In order to partition scan latches into multiple scan chains, they need to be classified into three broad classes: compatible, incompatible and independent scan latches. It should be noted that scan latch classification is not done explicitly by enumeration or exhaustive search, but it is done implicitly by the partitioning algorithm as explained later in Figure 7 of section 4.1. The proposed classification is also important for computing extra test vectors associated with each scan chain that eliminate spurious transitions which are defined as follows. Note that the sole purpose of extra test vectors is to reduce the spurious transitions during test application; they have no effect on fault coverage, which is determined by the original test set. The application of extra test vectors defines a novel test application strategy for power minimization which is detailed in section 4.2. Further, since a single extra test vector is used for each scan chain regardless of the values loaded in scan latches, the volume of extra test data depends only on the number of scan chains and not on the number of scan latches and/or the size of the original test set.
Definition 3 Two scan latches $S_i$ and $S_j$ are incompatible if at least one primary input $x_k$ that is assigned value $i_k$ to eliminate the spurious transitions which originate from $S_i$ will propagate the transitions which originate from $S_j$.
Two incompatible scan latches cannot be assigned to the same scan chain since there is no extra test vector that can eliminate spurious transitions which originate from both of them. The previous example has shown that following a careful examination of fanout branches of self-incompatible scan latches, most of the spurious transitions originating in self-incompatible scan latches can be eliminated using a single value for the extra test vector.
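A simplified way to picture incompatibility is to record, for each scan latch, the primary input values needed to block its transitions and to check whether two such requirements conflict. This is only a stand-in for the definition above, assuming that blocking one latch and propagating another shows up as opposite values requested on the same primary input; the data layout and names are hypothetical.

```python
from typing import Dict, Optional

Assignment = Dict[str, int]  # primary input name -> required value


def merge(a: Assignment, b: Assignment) -> Optional[Assignment]:
    """Combine two blocking requirements into one extra test vector,
    or return None if some primary input needs opposite values."""
    merged = dict(a)
    for pi, val in b.items():
        if merged.get(pi, val) != val:
            return None
        merged[pi] = val
    return merged


def incompatible(a: Assignment, b: Assignment) -> bool:
    """Two latches conflict when no single assignment serves both."""
    return merge(a, b) is None


# Hypothetical blocking requirements for three scan latches.
s1 = {"x0": 0, "x1": 1}
s2 = {"x1": 1, "x2": 0}
s3 = {"x0": 1}

print(merge(s1, s2), incompatible(s1, s3))
```

In this toy model, latches whose requirements merge cleanly could share a scan chain and its single extra test vector, while conflicting latches would have to be placed in different chains.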
Finally, independent scan latches are introduced.
Definition 5 A scan latch $S_i$ is independent if none of the gates on the paths which originate from $S_i$ has a side input which can be justified by primary inputs.
The independent scan latches are grouped in the extra scan chain (ESC) for which no extra test vector can be computed and hence the spurious transitions cannot be eliminated. The following example illustrates independent scan latches.
Power dissipated by the buffered clock tree
Previous research has established that power dissipated in the clock tree is typically one third of the total power dissipation [38] and hence it is necessary to minimize power dissipated in the clock tree not only during functional operation but also during test application. Unlike previous approaches which do not consider power dissipated by the buffered clock tree [12, 28, 30-35]
Following Cauchy-Schwarz inequality [40] where
the power reduction is upper bounded by Red
The previous theorem shows that power reduction of up to 
Multiple Scan Chains Generation and New Test Application Strategy
In this section, partitioning of scan latches in multiple scan chains based on their classification, as described in 3.2, is given. Then, a new test application strategy for power minimization during test application, based on the DFT architecture described in section 3.1, is introduced. will be used by the novel test application strategy described in section 4.2.
Partitioning Scan Latches into Multiple Scan Chains
In the following each part of the MSC-PARTITIONING algorithm is explained in detail.
a. In the first part of the MSC-PARTITIONING of Figure 7 the initial circuit C is transformed into a reduced circuit C' as described in CIRCUIT-REDUCTION algorithm of Figure 8 .
The algorithm also identifies the freezing signals which are the signals that depend on primary inputs and should be set to the controlling value as side inputs to the gates which eliminate transitions that originate from scan latches as described in the following parts.
Two lists of eliminated gates and modified gates contain the gates which ought to be eliminated and modified respectively in the reduced circuit C'. Initially eliminated gates is set to all the scan latches whereas the modified gates is void (lines 1-2 
New Test Application Strategy Using Multiple Scan Chains and Extra Test Vectors
Having partitioned the scan latches into multiple scan chains with an extra test vector for each scan chain (section 4.1), this section introduces a new test application strategy for power minimization during test application in full scan sequential circuits. 
ALGORITHM: MSC-TEST APPLICATION
INPUT: Test Set S = {V_0, ..., V_{n-1}}, Circuit C, Scan Chains
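The overall flow of applying a test set with multiple scan chains and one extra test vector per chain, as described in sections 3.1 and 4.2, might be pictured as follows. This is a hedged sketch of the scheduling idea only, not a reproduction of the MSC-TEST APPLICATION algorithm; the step names, data structures and the treatment of a chain without an extra vector are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Step = Tuple[str, object]


def msc_schedule(
    test_set: List[Tuple[str, Dict[str, int]]],
    chains: Dict[int, List[str]],
    extra_vectors: Dict[int, Optional[str]],
) -> List[Step]:
    """List the steps for applying each test vector chain by chain.

    test_set: (primary input vector, scan latch values) pairs.
    chains: chain index -> ordered latch names.
    extra_vectors: chain index -> primary input vector held while that
    chain shifts (a missing entry models a chain with no extra vector).
    """
    steps: List[Step] = []
    for pi_vector, scan_bits in test_set:
        for chain_id, latches in chains.items():
            # Hold the chain's extra vector on the primary inputs.
            steps.append(("apply_extra_vector", extra_vectors.get(chain_id)))
            for latch in latches:  # one shift clock per latch in the chain
                steps.append(("shift", (chain_id, scan_bits[latch])))
        # Apply the real primary input part and capture the response.
        steps.append(("apply_and_capture", pi_vector))
    return steps


tests = [("01", {"s0": 1, "s1": 0, "s2": 1})]
chains = {0: ["s0", "s1"], 1: ["s2"]}
extras = {0: "10", 1: "01"}
for step in msc_schedule(tests, chains, extras):
    print(step)
```

In this sketch the number of shift clocks per test vector equals the total number of scan latches, the same as for a single scan chain, and the extra test data grows only with the number of chains.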
Experimental Results
This section demonstrates through a set of benchmark examples that multiple scan chains combined with extra test vectors, as outlined in section 3, yield savings in power dissipation during test application. The algorithms described in section 4 have been implemented on a 500 MHz Pentium III PC with 128 MB RAM running Linux and using GNU CC version 2.91. The average value of node transition count (NTC) reported throughout this section is calculated using the formulas from section 2 under the assumption of the zero delay model. The use of zero delay model is motivated by very rapid computation of NTC and by the observation that power dissipation under zero delay model has a high correlation with power dissipation under general delay model [41]. However, the proposed technique applies equally to other general delay models such as unit and variable delay models [42]. Furthermore, due to elimination of spurious transitions (Definition 1) the propagation of hazards and glitches is eliminated, leading to even greater reductions in power dissipation in the case of unit and variable delay models. Besides, the aim of this paper is not to give exact values of power dissipation during test application, but to define a new design for testability architecture and a new test application strategy for power minimization that applies equally to every delay model. Table 1 shows the experimental results for all circuits from ISCAS89 benchmark set [46] using three different ATPG test tools [43-45].
Table 1: Experimental results using multiple scan chains for power minimization (test sets: ATALANTA [43], ATOM [44], MINTEST [45]).
The first and second columns give the circuit name and the number of test vectors (TV) respectively generated using the ATALANTA test tool [43]. Third column shows the initial average value of NTC (trad. NTC), which is the total value of NTC using the traditional single scan chain design [36] has significantly smaller average value of NTC for all the benchmark circuits when compared to initial value of NTC computed using the test application strategy from [36] which employs a single scan chain. Furthermore, the computational time is very low (on the order of 1s) for small circuits.
Moreover, for large circuits which are not handled by previous approaches [12, 34, 35], as in the case of s38584, it takes on the order of 3600s to achieve substantial reduction in average value of NTC.
To give an indication of the reductions in power dissipation, Table 2 A further advantage of the proposed technique is that due to test set independence the final average value of NTC is predictable within a given range of values regardless of test vectors applied to the circuit. This is justified by the fact that the proposed low overhead area multiple scan chain architecture introduced in section 1 is not overly sensitive to the values of test vectors since only a single chain is active at a time and the spurious transitions within the combinational circuit are eliminated by the extra primary input vector regardless of the value loaded in non active scan chains. This is shown in Figure 5 where the graphs for average value of NTC for the 7 largest ISCAS89 benchmark circuits under three different size test sets are given. For all three test sets MINTEST [45], ATALANTA [43] and ATOM [44] the average values of NTC are approximately equal. This implies that the proposed technique can further be applied to more DFT methodologies such as scan-based BIST [36] where, regardless of the value of the pseudorandom test set, the savings in power dissipation are guaranteed and final values of NTC are predictable.
It should be noted that experimental results reported in this section using the simplified power model from section 2 do not consider power dissipated by the clock tree which is typically one third of the total power dissipation [38] . However, power dissipated by the clock tree can be substantially reduced using low power buffered clock tree design [38] which successfully handles both scan clock gating and scan clock trees required by the proposed design for testability architecture using multiple scan chains as shown in Theorem 1 of section 3.3.
Conclusions
This paper has presented a new technique for power minimization during test application in sequential circuits using multiple scan chains. The technique is based on a new design for test (DFT) architecture and a novel test application strategy which reduces spurious transitions (Definition 1 of section 3.2) in the circuit under test. When compared to the traditional approach which consists of a single scan chain [36], the proposed technique employs a novel DFT architecture based on multiple scan chains leading to substantial reduction in power dissipation. The proposed technique, which is test set independent, overcomes large test application time required to achieve significant power savings [29-32, 34] since substantial power reductions are achieved for both compact and non compact test sets as shown in section 5. The newly introduced DFT architecture (Figure 1 from section 3.1) does not introduce any performance degradation when compared to previous approaches employing modified scan cell design [29]. Unlike previous approaches which do not consider [12, 28, 30-35] or reduce clock tree power dissipation only for non-essential test vectors [29], the proposed technique reduces clock tree power for all the test vectors of a very small test set where each test vector is essential as described in section 3.3.
While previous approaches [28, 29] the proposed technique is computationally inexpensive unlike previous approaches [12, 33-35] whose computational time is prohibitively large hindering the exploration for large sequential circuits. Finally, the synthesizable extra hardware required by the new DFT architecture introduced in section 3.1, the efficient algorithms given in section 4.1, and the novel test application strategy described in section 4.2 make the technique proposed in this paper easily embeddable in the existing VLSI design flow using state of the art third party electronic design automation tools.
