This paper proposes a power profile manipulation approach which merges two distinct research directions in low power testing: minimization of test power dissipation and test application time reduction under power constraints. It is shown how complementary techniques can be combined through this approach to significantly increase test concurrency under power constraints. This is achieved in two steps: in the first step, power dissipation is treated as a design objective and is minimized; the result is then exploited in the second step, where power becomes a design constraint under which the test application time is reduced. A distinctive feature of the proposed power profile manipulation approach is that it can be included in, and consequently improve, any existing power constrained test scheduling algorithm. Extensive experimental results using benchmark circuits, considering test-per-clock as well as test-per-scan schemes, show that integrating the proposed power profile manipulation approach into an existing power constrained test scheduling algorithm achieves savings of up to 41% in test application time.
Background information
This section gives the definitions of the basic terms which will be used in the rest of the paper.
Test scheduling
Test scheduling algorithms aim to reduce test application time by increasing the parallelism of the testing activities in the system. When power issues are ignored, the maximum test concurrency is limited by resource sharing conflicts. The blocks of a system which can be tested simultaneously without generating any resource conflicts are said to be test compatible. The tests which are executed at the same time form a test session. The tests corresponding to test compatible blocks are said to be resource compatible tests. The test compatibility relations among the blocks of a system are represented using the test compatibility graph (TCG). Each block and its corresponding test are associated with a node in the TCG. An arc between two tests in the TCG signifies that the two corresponding tests are resource compatible.
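The compatibility check described above can be illustrated with a minimal sketch. The test names and the edge set are hypothetical, and the TCG is stored as a plain set of unordered pairs purely for illustration: a group of tests may share a session only if every pair in the group is connected in the TCG.

```python
from itertools import combinations

# Hypothetical TCG: an edge means the two tests are resource compatible.
tcg_edges = {("t1", "t2"), ("t1", "t3"), ("t2", "t3"), ("t3", "t4")}

def compatible(a, b, edges=tcg_edges):
    """True if tests a and b are joined by an edge in the TCG."""
    return (a, b) in edges or (b, a) in edges

def is_valid_session(tests, edges=tcg_edges):
    """A group of tests can form a session only if it is a clique in the TCG."""
    return all(compatible(a, b, edges) for a, b in combinations(tests, 2))

print(is_valid_session(["t1", "t2", "t3"]))  # True: all pairs are compatible
print(is_valid_session(["t1", "t3", "t4"]))  # False: t1 and t4 conflict
```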
Test power modeling
In order to consider power during test scheduling, the power dissipated by the block under test needs to be modeled. The power profiles capture the power dissipation of a block over time when applying a sequence of test vectors to the inputs and/or pseudo-inputs of the block. The power profiles give cycle-accurate descriptions of power dissipation which makes them too complex to be considered in the test scheduling process. Therefore simple and reliable approximate power models are needed.
The following section analyzes a commonly used power approximation model and justifies the need for a new power approximation model for power constrained test scheduling.
The global peak power approximation model
The power approximation model currently used by most of the existing power constrained test scheduling (PCTS) algorithms [2, 3, 10-12, 14, 17] is the global peak power approximation model (GP-PAM). As shown in Figure 1, the GP-PAM flattens the power profile of a block to the worst case power dissipation value, i.e. its peak value. According to this model, the power profile of a block is described by the pair $(P_{hi}, L)$, where $P_{hi}$ is the global peak value of the power profile and $L$ is the sequence length. Although this simple approximation model guarantees that power dissipation is not underestimated at any time instance, it introduces a high approximation error, indicated by the false power in Figure 1.
The false power component introduced by the power approximation model leads to suboptimal test concurrency, and hence longer test application time. Section 3 will show how test concurrency can be increased by reducing the false power component of the power profile. The false power is minimized by changing the test power profiles and describing them using a suitable power approximation model.
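As a small illustration of the false power introduced by the GP-PAM, the sketch below flattens a per-cycle power profile to its peak and measures the area between the flat envelope and the real profile. The profile values are hypothetical, and treating false power as this area difference is an assumption made for the example.

```python
def gp_pam(profile):
    """Return the GP-PAM pair (P_hi, L) for a per-cycle power profile."""
    return max(profile), len(profile)

def false_power(profile):
    """Area between the flat GP-PAM envelope and the real profile (assumed measure)."""
    p_hi, length = gp_pam(profile)
    return p_hi * length - sum(profile)

profile = [2, 3, 2, 4, 9, 3]   # hypothetical per-cycle power values
print(gp_pam(profile))         # (9, 6)
print(false_power(profile))    # 31
```

A single high-power cycle is enough to make the envelope reserve far more power than the block draws, which is the effect the text attributes to the model.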
The new power profile manipulation approach
The previous section has shown that, despite its simplicity and reliability, the global peak power approximation model leads to large approximation errors and consequently to low test concurrency.
This can be avoided if the shape of the power profile can be manipulated such that it allows a more accurate approximation. This section proposes a power profile manipulation technique for increasing test concurrency under power constraints. This approach consists of the following components:
test vector reordering (Section 3.1) - initially, power profiles for the block tests are lowered by increasing the correlation between successive vectors in the test sequence; test vector reordering is used for peak power reduction, as well as for power profile reshaping;
test sequence expansion (Section 3.2) - additional test vectors are added to a test sequence in order to further reduce the peak of its power profile. Only the test sequences which do not influence the test session length are extended, in order to preserve the total test application time;
new power approximation model (Section 3.3) - the proposed test vector reordering produces a power profile consisting of an initial low power part followed by a high power part; hence a simple and reliable approximation model exploiting these two parts will provide more accurate descriptions of the power profile than the GP-PAM;
test sequence rotation (Section 3.4) - finally, the low power profiles are rotated and piled up together such that the high power parts do not overlap, in order to obtain improved usage of the power constraint.
As shown in the following sections, the proposed methodology performs low complexity operations on simple data representations. Thus, even for large amounts of test data corresponding to real-life circuits, the required computational effort is achievable on typical workstations in reasonable execution times, as demonstrated by the experimental results in Section 5.
Test vector reordering
In this section, power dissipation is seen as a design objective and consequently it is minimized.
Dynamic power represents one of the main components of power dissipation in CMOS circuits. Its source is the capacitance current flowing to charge/discharge the capacitive loads during logic changes [16]. The dynamic power dissipation is dependent on the switching activity, i.e. the average number of gate transitions per clock period [18]. The number of gate transitions depends on the switching activities at the inputs of the gate as well as on the spatio-temporal correlations among the gate's inputs. Thus, the order in which the patterns are applied to the primary and pseudo-inputs influences the power dissipation in the circuit. Reordered test sequences can be applied to the circuit with automatic testing equipment (ATE) when using external testing, or they can be generated on-chip using embedded deterministic tests when built-in test is adopted. It should be noted that in the case of external testing, the resource conflicts (used to generate the test compatibility graph of a system, see Section 2.1) are caused by sharing the limited number of channels between the ATE and the chip under test, while for embedded deterministic tests the resource conflicts are caused by sharing the test sources, sinks and access mechanisms [24].
The test vector reordering algorithm described below aims to achieve the following two objectives: minimize the peak power dissipation values and produce a power profile suitable for simple, reliable and accurate test power modeling. Test sequences with lower power allow higher test concurrency under a given power constraint. The use of accurate descriptions of power profiles can also increase the test concurrency under power constraints as it eliminates the "false power" component, which, from the test scheduling perspective, is equivalent to having test sequences with lower power profiles.
The input to the test vector reordering algorithm is a transition graph (ITG), described below. In a test-per-clock testing scheme, the test vectors are applied to the primary inputs one vector at each clock cycle. Each edge $(V_i, V_j)$ in ITG is weighted with the power $P$ consumed in the circuit during the transition of the primary inputs from $V_i$ to $V_j$.
Ψ have to be simulated using a power estimation tool in order to compute the ITG edge weights.
In a test-per-scan testing scheme, a test vector is first scanned in during $m$ clock cycles, where $m$ is the scan chain length, then it is applied to the block during clock cycle $m+1$, and the circuit response is scanned out during the next $m$ clock cycles, simultaneously with the scanning-in of the next test vector.
In this case, each edge $(V_i, V_j)$ in ITG is weighted with the power consumed by the simultaneous scan-out of $V_i$ and scan-in of $V_j$. It was shown in [19] that the weighted transition count (WTC) is very well correlated with the real power dissipation. The WTC values corresponding to $V_i$ scan-in and scan-out respectively are given by:
where $V_i(j)$ represents the $j$th bit from vector $V_i$. Finally, ITG edge weights for test-per-scan are computed using:
It should be noted that although the switching activity during the capture cycle is not considered in the above equation, the edge weight formulation can easily be extended to account for it.
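To make the weighted transition count concrete, the following sketch uses a commonly cited form of the WTC. It is an assumption rather than the exact expression used in this paper: a transition between adjacent bits $j$ and $j+1$ of a vector (1-indexed, in scan order) is weighted by $m-j$ during scan-in and by $j$ during scan-out. The edge weight is taken here as the sum of the two counts, and the vector $V_i$ itself stands in for the response being scanned out, which is a simplification. All function names are hypothetical.

```python
def wtc_scan_in(bits):
    """Assumed scan-in WTC: a transition at 1-indexed position j is weighted m - j."""
    m = len(bits)
    return sum((bits[j] ^ bits[j + 1]) * (m - (j + 1)) for j in range(m - 1))

def wtc_scan_out(bits):
    """Assumed scan-out WTC: a transition at 1-indexed position j is weighted j."""
    m = len(bits)
    return sum((bits[j] ^ bits[j + 1]) * (j + 1) for j in range(m - 1))

def scan_edge_weight(v_i, v_j):
    """Assumed ITG edge weight: scan-out of v_i overlapped with scan-in of v_j."""
    return wtc_scan_out(v_i) + wtc_scan_in(v_j)

v_i = [0, 0, 1, 1, 1]   # one transition, at position j = 2
v_j = [0, 1, 0, 1, 0]   # transitions at every position
print(wtc_scan_in(v_i), wtc_scan_out(v_i))  # 3 2
print(scan_edge_weight(v_i, v_j))           # 12
```

Because only bitwise comparisons are involved, computing all edge weights this way needs no circuit simulation, which is consistent with the low computational effort reported for the WTC model.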
Having computed the ITG edge weights, reordering the test sequence for low power reduces to the problem of finding a low cost Hamiltonian tour in ITG. As ITG is a complete directed graph, finding a low cost Hamiltonian cycle in it represents an instance of the asymmetric traveling salesman problem, which is known to be NP-hard. Therefore, a greedy depth-first search heuristic was implemented to determine a good solution to this problem. The algorithm starts from a randomly selected vector in the sequence and at each iteration selects the neighboring node which generates the lowest power dissipation, i.e. the outgoing edge with the smallest weight. Due to the greedy nature of the method adopted for traversing the ITG, the power profile corresponding to the resulting path will exhibit an initial long low power part followed by a short high power part towards the end of the sequence. This is because the edges with lower weights are added to the path in early iterations, leaving the edges with higher weights to the end of the profile. This particular shape of the power profile has the following advantages:
it has lower peak power than a random path in the graph (such as the initial unordered test sequence);
in conjunction with the new power approximation model introduced later, it brings a significant approximation accuracy improvement over its GP-PAM representation, as shown later in Table 1 in Section 5.
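The greedy traversal described in this section can be sketched as a nearest-neighbour walk over the ITG. This is an illustrative approximation of the heuristic, not the authors' implementation: the weight matrix is hypothetical, the start vector is passed in explicitly rather than chosen at random, and ties are broken by vector index.

```python
def greedy_reorder(weights, start=0):
    """Nearest-neighbour walk; weights[i][j] is the cost of applying j right after i."""
    order = [start]
    unvisited = set(range(len(weights))) - {start}
    while unvisited:
        current = order[-1]
        # Follow the cheapest outgoing edge to a vector not yet in the path.
        nxt = min(unvisited, key=lambda j: (weights[current][j], j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def path_profile(weights, order):
    """Power values along the reordered sequence, one per vector transition."""
    return [weights[a][b] for a, b in zip(order, order[1:])]

# Hypothetical asymmetric ITG weights for four test vectors.
w = [[0, 2, 9, 5],
     [3, 0, 4, 8],
     [7, 6, 0, 8],
     [4, 9, 3, 0]]
order = greedy_reorder(w, start=0)
print(order)                   # [0, 1, 2, 3]
print(path_profile(w, order))  # [2, 4, 8]
```

In this small example the cheap edges are consumed first and the most expensive transition is left for the end of the path, which mirrors the low-then-high profile shape discussed above.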
Test sequence expansion

New power approximation model
Section 3.1 has shown how, by considering power as a design objective, test vector reordering can generate a test sequence with a regular power profile which has an initial long low power part followed by a short high power part towards the end of the sequence. These regularly shaped power profiles can be accurately described using simple approximation models, as shown in the example from Figure 2. By modeling the low and high power parts of the profile using their local power peaks and lengths, $(P_{lo}, L_{lo})$ and $(P_{hi}, L_{hi})$, the value, position and size of each part of the profile are available as inputs for power constrained test scheduling algorithms. The improvement in approximation accuracy compared to the GP-PAM, which is represented by the dashed rectangle in Figure 2, is denoted $\Delta_{approx\,improv}$.
The new power approximation model will be further referred to as the two local peak power approximation model (2LP-PAM) and will be represented by the 4-tuple $(P_{lo}, L_{lo}, P_{hi}, L_{hi})$. While $P_{hi}$ is fixed to the global peak value, $P_{lo}$ can be derived based on the $L_{lo}$ and $L_{hi}$ ratio. Thus several 4-tuple descriptions are possible for the same power profile; however, the optimum is the one with the highest $\Delta_{approx\,improv}$. This optimum 4-tuple approximation can be computed in time linear in the length of the test sequence.
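The linear-time search for the best two-part description can be sketched as a single pass over the reordered profile. The sketch assumes that the improvement over the GP-PAM is measured as the area removed from the flat rectangle, $(P_{hi} - P_{lo}) \cdot L_{lo}$; this measure, the tuple layout and the function name are assumptions made for illustration.

```python
def two_lp_pam(profile):
    """Return (P_lo, L_lo, P_hi, L_hi, saved) for the split with the largest saving.

    The saving is assumed to be (P_hi - P_lo) * L_lo, i.e. the area removed
    from the flat GP-PAM rectangle by lowering the envelope over the low part.
    """
    length = len(profile)
    p_hi = max(profile)
    best = (p_hi, 0, p_hi, length, 0)   # no split: identical to the GP-PAM
    p_lo = 0
    for l_lo in range(1, length):
        p_lo = max(p_lo, profile[l_lo - 1])   # running peak of the low part
        saved = (p_hi - p_lo) * l_lo
        if saved > best[4]:
            best = (p_lo, l_lo, p_hi, length - l_lo, saved)
    return best

profile = [1, 2, 2, 3, 3, 3, 9, 8]   # hypothetical reordered power profile
print(two_lp_pam(profile))           # (3, 6, 9, 2, 36)
```

Keeping a running maximum of the low part is what makes the pass linear: each candidate split point is evaluated in constant time.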
Test sequence rotation
With the test power profiles lowered and reshaped by test vector reordering and modeled using the 2LP-PAM, this section explains how more compatible tests can be combined into a test session through the use of test sequence rotation. Since the 2LP-PAM offers information on the position and size of both low and high power parts, the power profiles can be rotated such that when added to a
by using the 2LP-PAM, the maximum test session power dissipation becomes
This example has shown how, by controlling the rotation of the test sequences before adding them to a test session, the high power parts of their power profiles are spread uniformly over the entire test session length, rather than being piled up on top of each other as in the case of the GP-PAM approach. Therefore, the joint power profile of the test session when using the 2LP-PAM becomes flatter and can fit more tests under the same power constraint.
Test sequence rotation does not influence the peak or average power of a test sequence; rather, it helps the test scheduling algorithm to perform better allocation of the power under the power constraint, and consequently produce shorter test schedules.
It should be noted that cyclic power profiles are needed for test sequence rotation. A cyclic power profile has to contain the transition between the last test vector in the test sequence and the first one.
Test sequence rotation consists only of assigning a value to an offset parameter which specifies the position in the initial test sequence at which the rotated one starts. Therefore, the computational effort involved is negligible.
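A minimal sketch of rotation by offset is given below. It assumes cyclic per-cycle power profiles stored as lists, so each profile already includes the transition from the last vector back to the first, and it assumes every test starts at the beginning of the session. The profile values and function names are hypothetical.

```python
def rotate(profile, offset):
    """Cyclically rotate a profile so that it starts at position `offset`."""
    offset %= len(profile)
    return profile[offset:] + profile[:offset]

def session_peak(profiles, offsets):
    """Peak of the summed session profile after rotating each test by its offset."""
    total = [0] * max(len(p) for p in profiles)
    for profile, offset in zip(profiles, offsets):
        for cycle, power in enumerate(rotate(profile, offset)):
            total[cycle] += power
    return max(total)

a = [1, 1, 1, 5]   # hypothetical cyclic profiles: long low part, short high part
b = [1, 1, 1, 5]
print(session_peak([a, b], [0, 0]))  # 10: the high power parts coincide
print(session_peak([a, b], [0, 2]))  # 6: rotating b moves its peak away from a's
```

Only the offset changes between the two calls; the vectors and their individual peak and average power are untouched, in line with the remark above.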
Power constrained test scheduling using the proposed power profile manipulation
So far, the new power profile manipulation approach was introduced through the following components: test vector reordering, test sequence expansion, the two local peak power approximation model, and test sequence rotation. This section shows how power profile manipulation can be integrated into existing power constrained test scheduling algorithms.
The non-partitioning test scheduling algorithm for unequal test lengths proposed in [3] will be extended for use in conjunction with power profile manipulation. We maintain the assumption made in [3] that a new test session cannot start before all tests in the current test session are completed.
The maximum value of the resulting power profile is $P_{max} = P_{hi5} + P_{lo3} + P_{lo4} = 9$, which is less than the power constraint. This means that by using the 2LP-PAM on reordered (and some extended) test sequences, all tests in the clique can be scheduled in the same 100 clock cycle long test session under the given power constraint, while by using the GP-PAM on the original test sequences, two test sessions, totaling 160 clock cycles, are required to cover all tests in the clique.
The PCLs are determined for each clique C from the TCG and the given power constraint using the algorithm shown in Figure 7. For each subset of C, the algorithm computes the optimum arrangement of its tests using the test sequence rotation described earlier in Section 3.4. The Offset variable guides the rotation of the test sequences to be inserted into the current test session. For the longest test(s) in the test subset, the reordered, non-extended test sequence is used in order to preserve the length of the test session, while for shorter tests, the reordered and extended test sequences are used, as they exhibit lower power profiles and do not increase the test session length. The maximal power compatible subsets are then added to the set of PCLs.
Finally, finding the optimum test schedule under the given power constraint is reduced to the problem of finding a minimum cost cover for the PCLs set (line 9 in Figure 4), where the cost associated with each PCL is the length of the longest test in the PCL, i.e. the test session length. In the proposed implementation, the minimum cost covering problem was formulated as an integer linear programming (ILP) problem and solved using lp_solve [20].
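For very small instances, the minimum cost cover can be illustrated with an exhaustive search. This is a stand-in for the ILP formulation mentioned above, not a description of it; the test names, lengths and PCLs are hypothetical, and the search is exponential in the number of PCLs.

```python
from itertools import combinations

def min_cost_cover(pcls, lengths):
    """Brute-force minimum cost cover; a PCL costs the length of its longest test."""
    tests = set(lengths)

    def cost(pcl):
        return max(lengths[t] for t in pcl)

    best_cost, best_choice = None, None
    for k in range(1, len(pcls) + 1):
        for choice in combinations(pcls, k):
            if set().union(*choice) != tests:
                continue   # some test is left unscheduled
            total = sum(cost(p) for p in choice)
            if best_cost is None or total < best_cost:
                best_cost, best_choice = total, list(choice)
    return best_cost, best_choice

lengths = {"t1": 100, "t2": 80, "t3": 60, "t4": 40}   # hypothetical test lengths
pcls = [frozenset({"t1", "t2"}), frozenset({"t3", "t4"}),
        frozenset({"t1", "t3"}), frozenset({"t2", "t4"})]
total, sessions = min_cost_cover(pcls, lengths)
print(total)   # 160: sessions {t1, t2} (100 cycles) and {t3, t4} (60 cycles)
```

Grouping the two long tests together is cheaper than spreading them over two sessions, because each session is charged for its longest test.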
This section has shown how the proposed power profile manipulation can be integrated into power constrained test scheduling algorithms. Although the integration was detailed for the algorithm presented in [3], the proposed approach can be included in any other existing power constrained test scheduling algorithm [2, 10-12, 14, 17] to improve its performance. This is possible because the proposed power profile manipulation approach is complementary to, and hence independent of, the test scheduling policy.
Experimental results
This section describes the experiments performed to assess the efficiency of the power profile manipulation technique. The algorithms were implemented in C++ and run on an AMD Athlon 1.2 GHz workstation with 384 MB RAM running Linux. Due to the simplicity of the WTC model, determining the ITG edge weights does not require a high computational effort. For example, the ITG weights for a test set of 7000 vectors, each 1000 bits wide, for a test-per-scan scheme were computed in less than 120 seconds. Reordering the vectors of the same sequence took 71 seconds, while expanding the sequences by 5 test vectors and computing the two local peak approximation each took less than one second.
The first set of experiments compares the proposed 2LP-PAM with the GP-PAM in terms of approximation accuracy improvement, which shows how much "false power" is saved by using the 2LP-PAM. Test-per-clock and, where suitable, test-per-scan test sequences for the largest ISCAS85 and ISCAS89 benchmark circuits were reordered using the algorithm described in Section 3.1. The power profiles of the reordered sequences were approximated using the 2LP-PAM and the GP-PAM. Columns 2 and 3 in Table 1 show the improvement in approximation accuracy of the proposed 2LP-PAM over the traditional GP-PAM. Next, the efficiency of the test sequence expansion method is evaluated. The reordered test sequences were extended with 5 test vectors, which led to the peak power reductions shown in column 4 in Table 1.
The next set of experiments evaluates the performance improvement which can be achieved by integrating the proposed power profile manipulation approach into an existing power constrained test scheduling algorithm. We have integrated our method into the power constrained test scheduling algorithm presented in [3] . The modified and the original algorithms were applied on hypothetical systems. Each system was represented as a set of embedded blocks and a randomly generated test compatibility graph which shows the dependencies between the embedded blocks. The embedded blocks were selected randomly from the ISCAS85 and ISCAS89 benchmarks. Systems with 8 to 16 blocks were considered in our experiments. The original PCTS algorithm from [3] was applied on unordered test sequences and used the GP-PAM to model power. The resulting test application times for a wide range of power constraints are reported in column 3 of Tables 2 and 3 respectively.
Integrating the proposed power profile manipulation approach, that is test vector reordering, test sequence expansion, the 2LP-PAM and test sequence rotation, into the PCTS algorithm from [3] reduced the previous test application times to the values reported in column 4 of Tables 2 and 3 respectively. The percent reduction of the test application time through the use of the proposed power profile manipulation approach in conjunction with the original algorithm is reported in column 5.
The experimental results show that by using the proposed power profile manipulation in conjunction with an existing power constrained test scheduling algorithm, test application time can be reduced by up to 41%.
Concluding remarks
Currently, two main research directions can be identified in low power testing.
Figure 2: Approximation of regular power profiles
