An integrated technique for test scheduling and scan-chain division under power constraints is proposed in this paper. We demonstrate that optimal test time can be achieved for systems tested by an arbitrary number of tests per core using scan-chain division, and we define an algorithm for it. The design of wrappers to allow different lengths of scan-chains per core is also outlined. We investigate the practical limitations of such wrapper design and make a worst case analysis that motivates our integrated test scheduling and scan-chain division algorithm. The efficiency and usefulness of our approach have been demonstrated with an industrial design.
Introduction
The increasing complexity of System-on-Chip (SOC) has created many test problems, and long test application time is one of them. Minimization of test time has become an important issue and several techniques have been developed for this purpose, including test scheduling [1], [2], [3], [4], and test vector set reduction [5].
The basic idea of test scheduling is to schedule tests in parallel so that many test activities can be performed concurrently. However, there are usually many conflicts in a system under test, such as sharing of common resources, which inhibit parallel testing. Therefore the test scheduling issue must be taken into account during the design of the system under test, in order to maximize the possibility for parallel test. Further, test power constraints must be considered carefully, otherwise the system under test may be damaged due to overheating.
We have recently proposed an integrated framework for the testing of SOC [6], which provides a design environment to treat test scheduling under test conflicts and test power constraints as well as test set selection, test resource placement and test access mechanism design in a systematic way. In this paper, the issue of test scheduling will be treated in depth, especially the problem of scan-chain division (test parallelization). We will present a technique for test parallelization under test power constraints and demonstrate how it can be used to find the optimal test time for the system under test. Our technique is based on a greedy algorithm, which runs fast and can therefore be used during the design space exploration process.
The usefulness of the algorithm is demonstrated with an industrial design.
The rest of the paper is organized as follows. Related work is described in Section 2, and preliminaries are given in Section 3. The details of our approach are presented in Section 4. The paper is concluded with experimental results in Section 5 and conclusions in Section 6.
Related Work

Test Scheduling
Scheduling the tests in a system means that the start time and end time for each test are determined in such a way that all constraints are satisfied and the test time is minimized. Chakrabarty showed that the test scheduling problem where each test has a fixed test time is equivalent to open-shop scheduling [1], [2]. The test conflicts in such systems are few because each testable unit has its dedicated test resources.
For general systems, Chou et al. [3] and Muresan et al. [4] have proposed techniques to minimize test time under power limitations and conflicts. In the approach by Chou et al. [3] a resource graph is used to model the system, where an arc between a test and a resource indicates that the resource is required for the test (Figure 1). From the resource graph, a test compatibility graph (TCG) is generated (Figure 2), where each test is a node and an arc between two nodes indicates that the tests can be scheduled concurrently. For instance, t1 and t2 can be scheduled at the same time. Each test is attached with its test time and its power consumption, and the maximal allowed power consumption is 10. The tests t1, t2, t3 are compatible; however, due to the power limit they cannot be scheduled at the same time.
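The check described above can be sketched in a few lines: a group of tests may run concurrently only if every pair is connected in the TCG and the summed power stays within the limit. This is a minimal sketch, not the method of [3]; the function name, the edge set and the power values are hypothetical placeholders rather than the values in Figures 1 and 2, and only the power limit of 10 is taken from the text.

```python
from itertools import combinations

def can_run_concurrently(tests, compatible, power, p_max):
    """Return True if all tests are pairwise compatible and fit the power limit."""
    pairwise_ok = all(frozenset(pair) in compatible
                      for pair in combinations(tests, 2))
    return pairwise_ok and sum(power[t] for t in tests) <= p_max

# Hypothetical TCG edges and power values (assumptions, not the paper's figures).
compatible = {frozenset(p) for p in [("t1", "t2"), ("t1", "t3"), ("t2", "t3")]}
power = {"t1": 4, "t2": 4, "t3": 4}
P_MAX = 10  # maximal allowed power consumption stated in the text

print(can_run_concurrently(["t1", "t2"], compatible, power, P_MAX))        # True
print(can_run_concurrently(["t1", "t2", "t3"], compatible, power, P_MAX))  # False
```

With these assumed values, t1 and t2 fit together (8 units), while adding t3 would require 12 units and is rejected even though all three tests are compatible.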
Test Parallelization
By test parallelization we mean that the test vectors in a given test are rearranged in such a way that several tests can be executed in parallel. For a scan-based design, each test vector is shifted in (scanned in), and after applying a capture cycle, the test response is shifted out (scanned out). Even if a new test vector is shifted in at the same time as the test response from the previous test vector is shifted out, the shift-in and shift-out process contributes to a major part of the test time due to the length of the scan-chain (number of flip-flops). By dividing a scan-chain into several chains of shorter length, the test time is reduced.
Another advantage of test parallelization, besides test time minimization, is that the time a resource is required for a particular test is reduced, which reduces the impact of test conflicts. For instance, if test t4, which requires r1 and r4 in the example given in Figure 1, is parallelized by a factor 2, the time during which r1 and r4 are used by t4 is reduced to 1.
Aerts et al. [5] have investigated the problem of dividing scan-chains for test time minimization where the constraints are defined by available pins (bandwidth). We focus on the limitations defined by maximal power consumption and test resource conflicts. However, for the integrated test scheduling and scan-chain division algorithm, bandwidth limitations are considered.
Preliminaries
System Modeling
An example of a system under test is given in Figure 3, where each core is placed in a wrapper in order to achieve efficient test isolation and to ease test access. Each core consists of at least one block with added DFT. The system is tested by applying several sets of tests, where each set is created at some test generator (source) and the test response is analysed at some test response evaluator (sink).
In our approach, a system under test, such as the one shown in Figure 3, is represented by a notation, design with test, $DT = (C, R_{source}, R_{sink}, p_{max}, T, source, sink, constraint, bandwidth)$ (see footnote 2). If the system in Figure 3 is tested by one test per core ($j = 1$) and r1 is TG1/TRE1, r2 is a shared test bus, r3 is TG2/TRE2 and r4 is the TAP, the test resource graph given in Figure 1 is valid for the system.
Test Power Consumption
Generally speaking, there are more switching activities during the testing mode of a system than when it is operated under the normal mode. The power consumption of a CMOS circuit is given by a static part and a dynamic part. The dynamic part dominates and can be characterized by:
$$p = C \times V^2 \times f \times \alpha \qquad (1)$$
where the capacitance $C$, the voltage $V$, and the clock frequency $f$ are fixed for a given design [7]. The switch activity $\alpha$, on the other hand, depends on the input to the system, which during test mode consists of test vectors, and therefore the power dissipation varies depending on the test vectors.
An example illustrating the test power dissipation variation over time $\tau$ for two tests $t_i$ and $t_j$ is given in Figure 4. Let $p_i(\tau)$ and $p_j(\tau)$ be the instantaneous power dissipation of two compatible tests $t_i$ and $t_j$, respectively, and $P(t_i)$ and $P(t_j)$ be the corresponding maximal power dissipation.
Footnote 2: This is a simplification of the model we used in [6].
If $p_i(\tau) + p_j(\tau) < p_{max}$, the two tests can be scheduled at the same time. However, the instantaneous power of each test vector is hard to obtain. To simplify the analysis, a fixed value $p_{test}(t_i)$ is usually assigned for all test vectors in a test $t_i$ such that when the test is performed the power dissipation is no more than $p_{test}(t_i)$ at any moment. The value $p_{test}(t_i)$ can be assigned as the average power dissipation over all test vectors in $t_i$ or as the maximum power dissipation over all test vectors in $t_i$. The former approach could be too optimistic, leading to an undesirable test schedule which exceeds the test power constraints. The latter could be too pessimistic; however, it guarantees that the power dissipation will satisfy the constraints. In a test environment the difference between the average and the maximal power dissipation for each test is often small, since the objective is to maximize the circuit activity so that it can be tested in the shortest possible time [3]. Therefore, the power dissipation $p_{test}(t_i)$ for a test $t_i$ is usually assigned to the maximal test power dissipation $P(t_i)$ when test $t_i$ alone is applied to the device. This simplification was introduced by Chou et al. [3] and has been used by Zorian [2] and by Muresan et al. [4]. We will use this assumption also in our approach.
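The difference between the two ways of assigning a fixed test power can be illustrated with a small sketch. The function name, the per-vector power values and the limit of 9.5 are hypothetical and chosen only to make the contrast visible.

```python
def assign_test_power(vector_power, mode="max"):
    """Collapse per-vector power values into one fixed power value for a test."""
    if mode == "max":
        return max(vector_power)
    if mode == "avg":
        return sum(vector_power) / len(vector_power)
    raise ValueError("mode must be 'max' or 'avg'")

# Hypothetical per-vector power values for two compatible tests (assumptions).
power_ti = [3.0, 4.0, 3.5]
power_tj = [5.0, 6.0, 5.5]
p_max = 9.5

for mode in ("avg", "max"):
    total = assign_test_power(power_ti, mode) + assign_test_power(power_tj, mode)
    # Prints whether the two tests would be allowed to run at the same time.
    print(mode, total, total <= p_max)
```

With these numbers the average-based values (3.5 + 5.5 = 9.0) admit concurrent execution although the peaks (4.0 + 6.0 = 10.0) could exceed the limit, which is the optimistic behaviour described above; the maximum-based values reject the pair.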
For the parallelization of a particular test a model is also required. Aerts et al. have defined formulas for scan-based designs that determine the change of test time when a scan-chain is subdivided into several chains of shorter length [5]. The test time for a test $t_i$ at a core with $f_i$ scanned flip-flops, $n_i$ scan-chains, and $m_i$ test vectors is given by Equation 2. The formula assumes that a new test vector is scanned in at the same time as the test response is shifted out. This scheme is applicable for all test vectors except when the test response from the last test vector is shifted out, and therefore the term +1 is added in Equation 2.
In our approach, we use a formula which follows the idea introduced by Aerts et al. (Equation 3), where $n_{ij}$ is the degree of parallelization of a test $t_{ij}$.
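To make the effect of scan-chain division concrete, here is a minimal sketch of a scan test time model. The closed form used, (1 + ceil(f/n)) * m + ceil(f/n), is an assumption: it is one commonly cited estimate for scan testing with overlapped shift-in and shift-out, and it may differ in detail from Equation 2. The function name and the example numbers are illustrative only.

```python
from math import ceil

def scan_test_time(flip_flops, chains, vectors):
    """Assumed scan test time in clock cycles for balanced scan-chains.

    Each vector costs one shift of the longest chain plus one capture cycle;
    the response of the last vector needs one extra shift-out.
    """
    shift = ceil(flip_flops / chains)  # length of the longest scan-chain
    return (1 + shift) * vectors + shift

# Hypothetical core: 1000 scanned flip-flops, 100 test vectors.
for n in (1, 2, 4):
    print(n, scan_test_time(flip_flops=1000, chains=n, vectors=100))
# 1 101100
# 2 50600
# 4 25350
```

Under this assumed model, doubling the number of scan-chains roughly halves the test time, which is the behaviour the simplified division by the degree of parallelization captures.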
Finally, we need an estimation of the relation between test power and test time when parallelizing a test. When a test is parallelized and the test time is reduced, three options are possible for the change of test power, namely: (1) not affected, (2) decreased or (3) increased.
If the test power is not affected (option 1) or if it is decreased (option 2) while the test time is reduced, it is desirable to parallelize the test as much as possible.
The worst case occurs when the test power increases after a test parallelization, since it means that the maximal power limit must be considered in order not to damage the system. In this paper we investigate the worst case.
Gerstendorfer
The simplifications we have defined in this section are used in order to discuss the impact on test time and test power. Especially note that the assumption in Equation 4 is a worst case assumption. For instance, if the test time for a test is reduced by a factor 2, the test power increases by a factor 2.
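A short sketch of this worst-case relation, assuming test time is divided by the degree of parallelization while test power is multiplied by it (the factor-2 example in the text). The function name and the numbers are hypothetical.

```python
def parallelize(t_test, p_test, n):
    """Worst-case model: time is divided by n, power is multiplied by n."""
    return t_test / n, p_test * n

# Hypothetical test with test time 4 and test power 2, parallelized by 2.
t_new, p_new = parallelize(t_test=4, p_test=2, n=2)
print(t_new, p_new)            # 2.0 4
print(t_new * p_new == 4 * 2)  # True: the time x power product is unchanged
```

Under this assumption the time × power product of a test is preserved, so parallelization reshapes a test's rectangle without changing its area.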
Test Wrapper Design
Test conflicts can be minimized by placing the core in a wrapper such as the Testshell proposed by Marinissen et al. [9]. A standard under development is the IEEE P1500 Standard for Embedded Core Test, consisting of a Core Test Language and a Core Test Wrapper [10] (Figure 5). The P1500 wrapper is similar to the Testshell. A major difference between Testshell and P1500 is that the latter only allows a single-bit bypass while the Testshell allows a TAM-wide bypass.
Proposed Approach
Optimal Test Time
In this section we first discuss the possibility of achieving optimal test time with the help of test parallelization under power constraints. We assume a given system to be modelled as described in Section 3.1, where each test has a test time and a test power consumption attached to it. This can be illustrated using a rectangle for each test (as shown in Figure 6(a)), where the horizontal side corresponds to its test time while the vertical side corresponds to its test power consumption.
A test schedule can be illustrated by placing all tests in a diagram as in Figure 6(b). At any moment the test power consumption must be below the maximal allowed power limit $p_{max}$. The rectangle where the vertical side is given by $p_{max}$ and the horizontal side is defined by the total test application time $t_{total}$ characterizes the test feature of a given system under test.
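If the area of the rectangle defined by $p_{max} \times t_{total}$ is equal to the summation of $t_{test}(t_{ij}) \times p_{test}(t_{ij})$ for all tests, as given by the following equation, we have the optimal solution:
$$p_{max} \times t_{total} = \sum_{i} \sum_{j} t_{test}(t_{ij}) \times p_{test}(t_{ij}) \qquad (7)$$
For a scan-based design the scan-chains can be divided into several shorter chains, which reduces the test application time. If every test $t_{ij}$ is allowed to be parallelized by a factor $n_{ij}$, the total test time when all tests are scheduled in sequence is:
$$t_{sequence} = \sum_{i} \sum_{j} \frac{t_{test}(t_{ij})}{n_{ij}} \qquad (8)$$
The lower bound of the degree of parallelization is $n_{ij} = 1$. For a scan-based core, it means a single scan-chain. The upper bound of the degree of parallelization is defined by the maximal test power consumption:
$$n_{ij} = \frac{p_{max}}{p_{test}(t_{ij})} \qquad (9)$$
In combination with Equation 8, the following is obtained:
$$t_{sequence} = \sum_{i} \sum_{j} \frac{t_{test}(t_{ij}) \times p_{test}(t_{ij})}{p_{max}} = t_{total} \qquad (10)$$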
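The argument can be checked numerically with a small sketch. It assumes the worst-case model in which a test parallelized by a factor n takes 1/n of its time, and it uses the upper bound p_max / p_test as the degree of parallelization. The test names, times and power values are hypothetical; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def optimal_parallelization(tests, p_max):
    """tests maps a test name to (t_test, p_test); returns (degrees, total_time)."""
    # Upper bound on the degree of parallelization: p_max / p_test.
    degrees = {name: Fraction(p_max, p) for name, (t, p) in tests.items()}
    # Sequential execution of the parallelized tests.
    total_time = sum(Fraction(t) / degrees[name] for name, (t, p) in tests.items())
    return degrees, total_time

# Hypothetical system with three tests and a power limit of 10.
tests = {"t11": (4, 2), "t21": (6, 5), "t31": (3, 4)}
degrees, total_time = optimal_parallelization(tests, p_max=10)

area = sum(t * p for t, p in tests.values())   # sum of time x power rectangles
print(total_time, Fraction(area, 10))          # 5 5
```

The sequential test time of the fully parallelized tests equals the total rectangle area divided by the power limit, which is the optimal test time in this idealized setting.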
The above equation indicates the possibility to obtain optimal test time by parallelization, in theory. However, in the analysis, it is assumed that we have only one test set per block or that all test sets for a core are considered as a single test. In such case, the above analysis is valid. However, a testable unit is often tested by two test sets, one produced by an external test generator and one produced by BIST.
A problem arises when two tests at a testable unit require different degrees of parallelization. For instance, a scan-chain is to be divided into $n_{ij}$ chains at one moment and into $n_{ik}$ chains at another moment, where $j \neq k$. However, if the core is placed in a wrapper such as P1500 it is possible to allow different lengths of the scan-chains. As an example, in Figure 7, the bold wiring marks how to set up the wrapper in order to connect the two scan-chains into a single scan-chain.
For a given core $c_i$ tested by the tests $t_{i1}$ and $t_{i2}$, we have two test sets, each with its degree of parallelization calculated as $n_{i1}$ and $n_{i2}$. It means that the number of scan-chains at $c_i$ should be $n_{i1}$ when test $t_{i1}$ is applied and $n_{i2}$ when $t_{i2}$ is applied. For instance, if $n_{i1} = 10$ and $n_{i2} = 15$, the number of scan-chains is given by $2 \times 5 \times 3 = 30$, which is the least common multiple (lcm). This means that we also generalize our solution to make it applicable to an arbitrary number of tests per testable unit (core).
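The number of physical scan-chains for a core tested by several tests can be computed as the least common multiple of the individual degrees of parallelization. A minimal sketch, with a hypothetical function name:

```python
from functools import reduce
from math import gcd

def scan_chain_count(degrees):
    """Least common multiple of the per-test degrees of parallelization."""
    return reduce(lambda a, b: a * b // gcd(a, b), degrees, 1)

chains = scan_chain_count([10, 15])
print(chains)                        # 30
print(chains // 10, chains // 15)    # 3 2
```

With 30 physical chains, the wrapper can concatenate them in groups of 3 to obtain 10 chains for the first test, or in groups of 2 to obtain 15 chains for the second.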
Optimal Test Algorithm
The optimal test scheduling algorithm is illustrated in Figure 8. The time $T$ determines when a test is to start and it is initially set to zero. In each iteration over the set of cores and the set of tests at a core, the degree of parallelization $n_{ij}$ is computed for the test $t_{ij}$; its new test time is calculated; and the starting time for the test is set to $T$. Finally $T$ is increased by $t_{test}(t_{ij})/n_{ij}$. When the parallelization is calculated for all tests at a core, the final degree of parallelization can be computed.
The algorithm consists of a loop over the set of cores and, at each core, a loop over the set of its tests. This corresponds to a loop over all tests, resulting in a complexity of $O(|T|)$, where $|T|$ is the number of tests.
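The loop structure described above can be sketched as follows. This is an illustration under stated assumptions rather than a transcription of Figure 8: the degree of parallelization is taken as p_max / p_test, it is assumed to be an integer so that the least common multiple is well defined, and all names and values are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def optimal_test_schedule(cores, p_max):
    """cores maps a core name to a list of (test, t_test, p_test) tuples.

    Returns (start_times, chains_per_core, total_time).
    """
    T = Fraction(0)                 # current time, initially zero
    start, chains = {}, {}
    for core, tests in cores.items():
        degrees = []
        for name, t_test, p_test in tests:
            n = Fraction(p_max, p_test)   # ideal degree of parallelization
            if n.denominator != 1:
                raise ValueError("sketch assumes p_test divides p_max")
            start[name] = T               # the test starts at time T
            T += Fraction(t_test) / n     # advance T by the reduced test time
            degrees.append(int(n))
        # Final number of scan-chains at the core: lcm of its tests' degrees.
        chains[core] = reduce(lambda a, b: a * b // gcd(a, b), degrees, 1)
    return start, chains, T

# Hypothetical system: core c1 has two tests, core c2 has one.
cores = {"c1": [("t11", 4, 2), ("t12", 6, 5)], "c2": [("t21", 3, 1)]}
start, chains, total = optimal_test_schedule(cores, p_max=10)
print(start["t12"], chains, total)   # 4/5 {'c1': 10, 'c2': 10} 41/10
```

Each test is visited exactly once, which matches the linear complexity in the number of tests stated above.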
Practical limitations
The optimal degree of parallelization for a test $t_{ij}$ has been defined as $p_{max}/p_{test}(t_{ij})$ (Equation 9). However, such division does not usually give an integer result. For instance, assume a system with a maximal test power consumption of $p_{max} = 10$ and a test power for a test $t_{ij}$ at a scan-based core of $p_{test}(t_{ij}) = 4$. In this case $n_{ij} = 2.5$. However, the number of scan-chains in a core cannot be 2.5. In practice, $n_{ij}$ should be rounded down, in this case to 2 (rounding up to 3 leads to a test power of 12, which is bigger than $p_{max}$). The practical degree of parallelization for a test $t_{ij}$ is given by:
$$\lfloor n_{ij} \rfloor = \left\lfloor \frac{p_{max}}{p_{test}(t_{ij})} \right\rfloor \qquad (11)$$
For each test $t_{ij}$, the difference between the optimal and the practical degree of parallelization is given by:
$$p_{max} = p_{test}(t_{ij}) \times \lfloor n_{ij} \rfloor + \Delta_{ij} \qquad (12)$$
and the difference $\Delta_{ij}$ for each test $t_{ij}$ is given by:
$$\Delta_{ij} = p_{test}(t_{ij}) \times n_{ij} - p_{test}(t_{ij}) \times \lfloor n_{ij} \rfloor = p_{test}(t_{ij}) \times (n_{ij} - \lfloor n_{ij} \rfloor) \qquad (13)$$
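The rounding step and the resulting unused power can be reproduced with the numbers from the example above (a power limit of 10 and a test power of 4). The function name is hypothetical.

```python
from math import floor

def practical_degree(p_max, p_test):
    """Return (optimal degree, practical degree, unused power)."""
    n_opt = p_max / p_test               # optimal, possibly fractional
    n_prac = floor(n_opt)                # practical degree, rounded down
    delta = p_max - p_test * n_prac      # power left unused after rounding
    return n_opt, n_prac, delta

n_opt, n_prac, delta = practical_degree(p_max=10, p_test=4)
print(n_opt, n_prac, delta)              # 2.5 2 2
print(delta == 4 * (n_opt - n_prac))     # True
```

The last line confirms that the unused power equals the test power times the fractional part of the optimal degree.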
$\Delta_{ij}$ reaches its maximum when $n_{ij} - \lfloor n_{ij} \rfloor$ is approximately 1, which occurs when the fractional part of $n_{ij}$ is 0.99..., leading to $\Delta_{ij} \approx p_{test}(t_{ij})$. The worst case test time occurs when $\Delta_{ij} = p_{test}(t_{ij})$ for all tests $t_{ij}$ and $n_{ij} = 1$, resulting in a test time given by Equation 8 which is equal to $t_{sequence}$ computed using Equation 7 since $n_{ij} = 1$.
We now show the difference between the worst case test time for the system and its optimal test time. The worst case occurs when $\Delta_{ij} = p_{test}(t_{ij})$ and the fractional part of $n_{ij}$ is 0.99...
An Integrated Test Scheduling and Test Parallelization Algorithm
In this section, we outline the test scheduling and test parallelization part of the algorithm and leave the functions for constraint checking and nexttime out. The tests are initially sorted based on either power ($p$), time ($t$) or power×time ($p \times t$) and placed in $P$ (Figure 9). Iterations are performed until $P$ is empty (all tests are scheduled). For all tests in $P$ at a certain time $T$, the maximal possible parallelization is determined as the minimum among: The constraints are checked and if all are satisfied, the test is scheduled in $S$ at time $T$ and removed from $P$.
The computational complexity of the algorithm comes from sorting and two loops. The sorting can be performed using a sorting algorithm at $O(|T| \times \log |T|)$. The worst case for the loops occurs when only one test is scheduled in each iteration, resulting in a complexity given by:
$$\sum_{i=0}^{|T|-1} (|T| - i) = \frac{|T|^2}{2} + \frac{|T|}{2}$$
where $|T|$ is the number of tests in the system. The total worst case execution time is $|T| \times \log |T| + |T|^2/2 + |T|/2$, which is of $O(|T|^2)$. For comparison, the shortest-task-first approach by Chakrabarty has a worst case complexity of $O(|T|^3)$ [1].
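The overall flow can be sketched as a greedy loop. This is a simplified illustration, not the algorithm of Figure 9. Only the power constraint is checked; the degree of parallelization is limited by the free power and by a per-test maximum `n_max`; tests are sorted in descending order; and the omitted nexttime function is replaced by a jump to the earliest finishing time. All of these choices, and all names and values, are assumptions.

```python
from math import floor

def greedy_schedule(tests, p_max, key="power"):
    """tests maps a name to {"time", "power", "n_max"}.

    Returns a list of (name, start, degree, end) tuples.
    """
    if any(t["power"] > p_max for t in tests.values()):
        raise ValueError("a test exceeds the power limit on its own")
    sort_key = {
        "power": lambda n: tests[n]["power"],
        "time": lambda n: tests[n]["time"],
        "power_time": lambda n: tests[n]["power"] * tests[n]["time"],
    }[key]
    pending = sorted(tests, key=sort_key, reverse=True)   # the sorted list P
    schedule, running, T = [], [], 0.0                    # S, active tests, time
    while pending:
        free = p_max - sum(p for _, p in running)         # power available at T
        for name in list(pending):
            t = tests[name]
            # Degree limited by the free power and the test's own maximum.
            n = min(floor(free / t["power"]), t["n_max"])
            if n >= 1:
                end = T + t["time"] / n
                schedule.append((name, T, n, end))
                running.append((end, t["power"] * n))
                free -= t["power"] * n
                pending.remove(name)
        # Simplified nexttime: advance to the earliest end of a running test.
        T = min(end for end, _ in running)
        running = [(e, p) for e, p in running if e > T]
    return schedule

# Hypothetical tests; only A may be split into two scan-chains.
tests = {
    "A": {"time": 4, "power": 4, "n_max": 2},
    "B": {"time": 6, "power": 2, "n_max": 1},
    "C": {"time": 2, "power": 5, "n_max": 1},
}
for entry in greedy_schedule(tests, p_max=10, key="time"):
    print(entry)
# ('B', 0.0, 1, 6.0)
# ('A', 0.0, 2, 2.0)
# ('C', 2.0, 1, 4.0)
```

In the worst case only one test is placed per pass over the pending list, which gives the quadratic loop cost discussed above.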
Experimental Results
We have performed experiments on a design example and an industrial design. For the design example (Figure 3), with the resource graph in Figure 1 and the TCG in Figure 2, all tests are allowed to be parallelized by a factor 2 except for test t31, which is fixed. The test schedule when not allowing test parallelization results in a test time of 6 time units (Figure 6(b)), and when only test parallelization is used the test time is also 6 time units (Figure 10(a)). However, when combining test scheduling and test parallelization the test time is reduced to 4 time units (Figure 10(b)).
The industrial design has characteristics given in Table 1 and the power limitation is 1200 mW and only one test may use the test bus or the functional pins (fp) at a time. Furthermore block-level tests may not be scheduled concurrently with top-level tests. The minimal and maximal degree of parallelization is also given for each test.
A designer's solution requires a test time of 1592, where the tests are scheduled in the following sequence: A, B, C, E, F, I, J, K, L, M, N, O, P, Q. Using the test scheduling approach we proposed in [6] results in the test schedule N, {A || B, I, E, F, C, J, M}, P, O, Q, L, K, where A is scheduled concurrently with B, I, E, F, C, J, M. The test time is 1077, which is a 32% improvement over the designer's solution. The test schedule achieved using the approach proposed in this paper results in a test time of 383 (Table 2).
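The relative improvements follow directly from the reported test times; the helper below is only a convenience for the arithmetic, and its name is hypothetical.

```python
def improvement(baseline, new):
    """Relative reduction of test time in percent."""
    return 100.0 * (baseline - new) / baseline

print(round(improvement(1592, 1077), 1))  # 32.3
print(round(improvement(1592, 383), 1))   # 75.9
```

Relative to the designer's solution of 1592, the schedule of 1077 is about 32% shorter and the schedule of 383 is about 76% shorter.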
Conclusions
In this paper, we have proposed an integrated technique for test scheduling and scan-chain division under power constraints for the testing of SOCs. We have investigated scan-chain division under test power constraints and shown that the optimal solution for test application time can be found in the ideal case, and we have defined an algorithm for finding such solutions. We have also outlined the wrapper design allowing the core to be tested by several test sets at a variable length of the scan-chain. For such wrapper design, we have made a worst case analysis, which motivates that scan-chain division must be integrated into the test scheduling process. We have performed experiments on an industrial design to show the efficiency of the proposed technique.
