We present a new algorithm to co-optimize test scheduling and wrapper design under power constraints for core-based SoCs (Systems on Chip). Core testing solutions are generated as a set of wrapper designs, each represented by a rectangle whose width equals the test time and whose height equals the number of TAM (Test Access Mechanism) wires used. The test-scheduling problem with power constraints is formulated as a distributed rectangle bin-packing problem, which allows wrapper pins to be assigned to non-consecutive SoC pins. The generalized problem for multiple TAMs is solved by global optimization using an evolution strategy and the sequence-pair representation. Experiments on the ITC'02 benchmarks are very encouraging.
Introduction
Embedded cores have become common in large SoC designs. They are usually deeply embedded in the system chip, and direct access to them is often impossible. Cores must therefore be tested at the system level after manufacturing, and special test access mechanisms (TAMs) are needed. Selecting and scheduling test solutions for embedded SoC IP cores is a very complex problem. To facilitate reuse of the test vectors provided by the core vendor, an embedded core must be isolated from the surrounding logic, and test access must be provided from the I/O pins of the SoC. A test wrapper forms the interface between the TAM and a core, while the TAM transports test data between the SoC pins and the wrapper. The general problem of SoC test integration includes the design of TAM architectures, optimization of the core wrappers, test scheduling, and wrapper pin assignment. The goal is to minimize testing time under given TAM architecture and power constraints.
A number of recent papers cover various aspects of SoC test scheduling. Most earlier papers solve wrapper design and test scheduling as separate problems. Recently, [2] and [5] presented an integer linear programming (ILP) formulation for the co-optimization of wrapper design and test scheduling for SoCs. In [5], the concurrent test-scheduling problem was modeled as a pseudo-3D bin-packing problem and solved with a best-fit heuristic. The test-scheduling problem with power constraints was considered in [3] and [8]. An ILP formulation for TAM design under place-and-route and power constraints was presented in [9], but with only one wrapper design per core. The multiple-TAM problem (optimal assignment of TAM wires to different partitions) was discussed in [5] and solved by enumerating possible solutions. The problem description and conditions for using non-consecutive TAM pins were presented in [1] and later in [14], but none of the previous approaches introduced an algorithm that allows wrapper pins to be assigned to non-consecutive SoC pins.
In this paper, we address the SoC test-scheduling and wrapper-design co-optimization problem with wrapper pin assignment to non-consecutive SoC pins under power constraints. We also consider partitioning the TAM width (wires) into a given number of partitions and assigning cores to multiple TAMs.
Given the test-set parameters for the SoC cores, the number of SoC pins (or the number of TAM wires), and the maximum power dissipation allowed during test, our method determines an optimal test schedule, an assignment of TAM wires among cores, and an optimal wrapper design for each core, such that the overall system testing time is minimized and the test power dissipation stays below the maximum dissipation limit. We also generalize this problem to multiple TAMs: given a number of TAMs, we globally optimize the TAM widths and the assignment of cores to TAMs such that the total test time is minimized.
We represent a wrapper design as a rectangle whose width is the test time and whose height is the number of TAM wires. With this representation, the test-scheduling problem is similar to the fixed-height floorplanning problem, but with added complexity. In floorplanning, module areas are fixed and aspect ratios vary continuously within given ranges, whereas in test scheduling, areas and aspect ratios change in a non-continuous (discrete) fashion. The power dissipated during the testing of a core is associated with each rectangle. Our approach uses an Evolutionary Algorithm (EA) and the sequence-pair (SP) representation [10].
A heuristic approach using the sequence-pair representation for the test-scheduling problem was considered in [7], and a Simulated Annealing (SA) algorithm using the sequence-pair representation has recently been proposed by Zou et al. [14]. Power constraints were not considered, and despite using a heuristic to generate an initial solution for the SA solver, the algorithm in [14] is relatively slow. As shown in the results section, our results are much better than those in [7], and our test times are comparable with those in [14], while our CPU time is much shorter.
The remainder of this paper is organized as follows. Section 2 describes our overall approach. Wrapper design optimization is presented in Section 3. Section 4 discusses the test-scheduling algorithm and the non-consecutive pin assignment problem. The sequence-pair representation is presented in Section 5 and our evolutionary algorithm in Section 6. Power constraints and the multiple-TAM approach are presented in Section 7. Results and conclusions are given in Sections 8 and 9, respectively.
Problem Formulation
Let the SoC design consist of N cores, where each core $C_i$, $1 \le i \le N$, has $n_i$ data inputs, $m_i$ data outputs, $b_i$ bidirectional data I/Os, $s^{in}_i$ scan inputs, and $s^{out}_i$ scan outputs. Let K be the total width of the TAMs and B the number of TAM partitions. Also assume that each core $C_i$ must be tested with $P_i$ patterns and that the maximum peak power during testing of each core is given. The power dissipated during core testing depends on the core's switching activity and the number of flip-flops (FFs) in the core. Power estimation is a challenging problem in its own right and is outside the scope of this paper.
The overall problem that we solve is as follows. Given a set of N cores, their specific test parameters, the number of SoC pins, the maximum allowable peak power dissipation Q, and power dissipation data for each core, we design the test schedule, together with the TAM architecture and the wrapper designs for all wrapper-based cores, such that the SoC test time is minimized and the peak power during testing never exceeds Q. The solution proceeds in three steps. First, we generate all possible optimized wrapper designs for each core under the specified TAM width. Next, we solve the test-scheduling problem under the maximum TAM width and power constraints using the sets of predesigned, optimized wrapper solutions. Finally, we assign wrapper pins to non-consecutive SoC pins according to the optimized test schedule.
We generalize the same problem to multiple TAMs with a given number of TAM wires and a given number of partitions. We solve for the test schedule with minimum test time together with the optimized distribution of TAM wires among the given number of partitions.
Core Wrapper Design
A test wrapper is a layer of DFT logic that connects a TAM to a core for testing purposes [4]. Since large cores typically have hundreds of core terminals, and the total number of available TAM wires is limited by the number of SoC pins [2], in practice wrappers often need to perform test width adaptation when the TAM width is not equal to the number of core terminals. Only core-internal testing is considered in this paper. To calculate the test time T of a wrapper for different assigned TAM widths, we use the well-known expression [4] given below.
$$T = \bigl(1 + \max(s_i, s_o)\bigr) \cdot P + \min(s_i, s_o)$$
where $s_i$ is the length of the longest wrapper scan-in chain, $s_o$ is the length of the longest wrapper scan-out chain, and P is the number of test patterns.
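As a quick check of the expression above, here is a minimal Python sketch that evaluates it for one wrapper design. The function name `wrapper_test_time` is illustrative and not taken from the paper.

```python
def wrapper_test_time(s_in: int, s_out: int, patterns: int) -> int:
    """Test time in clock cycles for one wrapper design.

    s_in:     length of the longest wrapper scan-in chain
    s_out:    length of the longest wrapper scan-out chain
    patterns: number of test patterns P
    """
    # Shift cycles: first scan-in (s_in), then P - 1 overlapped shifts of
    # max(s_in, s_out), then the last scan-out (s_out); plus P capture cycles.
    # This total simplifies to (1 + max) * P + min.
    return (1 + max(s_in, s_out)) * patterns + min(s_in, s_out)


print(wrapper_test_time(12, 11, 100))  # 1311
```

Evaluating this for every candidate TAM width of a core yields the "staircase" of test times discussed below.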
For cores with internal scan chains, we use an algorithm based on the Best Fit Decreasing (BFD) heuristic [2]. The functional inputs, outputs, and bidirectional I/Os are assigned to wrapper scan chains using the same algorithm. For cores without internal scan chains, we use an unbalanced design [14] that allows different numbers of SoC pins to be assigned to scan-in and scan-out.
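A simplified Python sketch of BFD-style wrapper chain construction follows. It assumes internal scan chains are indivisible, and that functional input and output cells are added one at a time to the currently shortest chain, with each bidirectional I/O counted on both sides. It follows the general idea of [2] but is not claimed to reproduce that algorithm exactly; the function names are hypothetical.

```python
def bfd_partition(scan_chains, width):
    """Assign indivisible internal scan chains to `width` wrapper chains (BFD)."""
    bins = [0] * width
    for length in sorted(scan_chains, reverse=True):  # decreasing order
        current_max = max(bins)
        # Best fit: the wrapper chain whose new length comes closest to the
        # current maximum without exceeding it.
        fits = [b for b in range(width) if bins[b] + length <= current_max]
        if fits:
            best = max(fits, key=lambda b: bins[b])
        else:
            # Nothing fits under the current maximum: use the shortest chain.
            best = min(range(width), key=lambda b: bins[b])
        bins[best] += length
    return bins


def wrapper_chain_lengths(scan_chains, n_in, n_out, n_bidir, width):
    """Return (s_in, s_out): longest wrapper scan-in and scan-out chain lengths."""
    base = bfd_partition(scan_chains, width)
    scan_in, scan_out = list(base), list(base)
    # Each functional input (or bidirectional) cell goes to the shortest scan-in chain.
    for _ in range(n_in + n_bidir):
        scan_in[scan_in.index(min(scan_in))] += 1
    # Each functional output (or bidirectional) cell goes to the shortest scan-out chain.
    for _ in range(n_out + n_bidir):
        scan_out[scan_out.index(min(scan_out))] += 1
    return max(scan_in), max(scan_out)


print(bfd_partition([8, 6, 4, 2], 2))                    # [10, 10]
print(wrapper_chain_lengths([8, 6, 4, 2], 3, 2, 0, 2))   # (12, 11)
```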
As a result of the above algorithms, each core $C_j$ is represented by a set of rectangles, one for each wrapper design. As shown in [1], for a given core the testing time varies with the TAM width as a "staircase" function. The designs that achieve a given test time with the smallest TAM width are known as Pareto-optimal designs and were formally defined in [1]. Only rectangles corresponding to Pareto-optimal TAM widths need to be considered.
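Filtering a core's staircase down to its Pareto-optimal rectangles can be sketched as follows, assuming the test time has already been computed for every TAM width 1..K by the wrapper design algorithm. The function name and input layout are hypothetical.

```python
def pareto_rectangles(times):
    """Keep only Pareto-optimal (tam_width, test_time) rectangles.

    times[w - 1] is the test time obtained with w TAM wires.
    A width is kept only if it gives a strictly shorter test time than
    every smaller width.
    """
    rectangles = []
    best = float("inf")
    for width, t in enumerate(times, start=1):
        if t < best:
            rectangles.append((width, t))
            best = t
    return rectangles


print(pareto_rectangles([100, 60, 60, 45, 45, 45, 40]))
# [(1, 100), (2, 60), (4, 45), (7, 40)]
```

Widths 3, 5, and 6 are dropped because they use more wires without reducing the test time.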
Test Scheduling Problem
Given a SoC with K TAM wires and a set of N cores, where each core $C_i$, $1 \le i \le N$, is represented by a set of $L_i$ wrapper configurations, find the assignment of core wrapper pins to SoC pins and determine the test starting time of each core such that the overall test time is minimized. The wrapper configurations of each core $C_i$ are represented by a set of rectangles, and for each core we choose one rectangle such that, when the chosen rectangles are packed in a bin, the width of the bin is minimized and its height is no larger than K.
The classical floorplanning problem of placing a set of modules (rectangles) on a plane such that the overall area is minimized and no rectangles overlap can be modified to represent the SoC test-scheduling problem [5], [8]. The floorplan height represents the fixed number of available TAM wires, K, and the width represents the SoC test time, which has to be minimized.
A given floorplan is feasible if no rectangles overlap and the aspect ratio of the floorplan is within specified limits; no rectangle can be partitioned. A SoC test schedule, however, is feasible if no two cores are assigned to the same SoC pin at the same time instant and, for each core, all of its wrapper pins are assigned to SoC pins for the entire time needed to test that core. The height of the bin represents the TAM pins, and the order of the pins is given. The classical floorplanning feasibility requirement therefore restricts wrapper pin assignment to consecutive SoC pins: representing the wrapper design of a core as a single rectangle limits its pin assignment to consecutive pins. If we allow the rectangle to be partitioned into smaller rectangles of the same width (the width represents the core testing time), we relax the consecutive-pin limitation.

An example of a test-scheduling solution, represented as 2D bin packing, is shown in Fig. 1. Notice that if Core 7 could be assigned to non-consecutive SoC pins, the overall testing time could be reduced. This would require partitioning the rectangle representing the wrapper design of Core 7 into two rectangles of the same width (the same testing time) with heights of 1/3 and 2/3 of the original height, as shown in Fig. 2. Predicting the optimal partition of a rectangle is close to impossible; therefore, we established conditions under which a non-feasible floorplan still represents a feasible test schedule (shown in Fig. 3). To generate a feasible test schedule when a core may use non-consecutive SoC pins, the following constraint must be satisfied.

Constraint 1. The total TAM width, calculated as the sum of the TAM wires used by all cores tested at time t, does not exceed K for $0 \le t \le T_{max}$, where $T_{max}$ is the time needed to complete testing of the given SoC.

Because the total TAM width can only increase when a new core starts testing, Constraint 1 is satisfied if the total TAM width does not exceed K each time a new core is added to the set of cores being tested.
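Constraint 1 can be checked directly on a candidate schedule. The Python sketch below is a minimal illustration, assuming each core is given as a hypothetical `(start, test_time, tam_width)` tuple. It evaluates the total TAM width only at core start times, since the total can only grow when a core starts.

```python
def tam_width_histogram(schedule):
    """Total TAM width in use at each core start time.

    schedule: list of (start, test_time, tam_width) tuples, one per core.
    Returns a sorted list of (time, total_width) pairs.
    """
    starts = sorted({start for start, _, _ in schedule})
    histogram = []
    for t in starts:
        # A core occupies its wires on the half-open interval [start, start + test_time).
        total = sum(w for s, d, w in schedule if s <= t < s + d)
        histogram.append((t, total))
    return histogram


def satisfies_constraint_1(schedule, k):
    """True if the total TAM width in use never exceeds k."""
    return all(total <= k for _, total in tam_width_histogram(schedule))


schedule = [(0, 10, 3), (0, 5, 2), (5, 8, 2), (10, 4, 3)]
print(tam_width_histogram(schedule))  # [(0, 5), (5, 5), (10, 5)]
print(satisfies_constraint_1(schedule, 5), satisfies_constraint_1(schedule, 4))  # True False
```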
Therefore, we generate test-schedule floorplans and calculate the total TAM width at $T_0$ (the test starting time) and at each time instant ($T_1$, $T_2$, ...) at which a new core is added to the testing set. The TAM-width histogram, representing the total TAM width versus test time for the example from Fig. 2 and Fig. 3, is shown in Fig. 4.

Two weighted, acyclic digraphs can be constructed in which the modules at the lattice points form the vertex set. In the horizontal graph $G_h$, an edge is placed from module i to module j iff i "is to the left of" j. Similarly, in the vertical graph $G_v$, an edge is placed from module i to module j iff i "is below" j. Source and sink vertices are added to each graph. Fig. 6 shows this construction.
A weight is associated with each edge in the $G_h$ ($G_v$) graph that indicates the width (height) of the respective module. Running a longest-path algorithm on $G_h$ and $G_v$ gives the overall packing width and height, respectively. Paths in the vertical constraint graph represent the time instances at which the total number of TAM wires has to be calculated to generate the TAM-width histogram; the total TAM width is given by the length of the path.

In our approach, we use a very fast $O(n \log n)$ algorithm [11] to translate the sequence-pair representation into its corresponding block placement. Module positions on the floorplan are generated directly from the sequence pair, without the need for constraint graphs. The algorithm is based on computing the longest common subsequence of a pair of weighted sequences. A lower-complexity $O(n \log\log n)$ version of this algorithm, also proposed in [11], gave worse experimental results, perhaps due to its more complex implementation.
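For illustration, here is a straightforward $O(n^2)$ Python sketch of sequence-pair evaluation. It is a simpler substitute for, not an implementation of, the $O(n \log n)$ longest-common-subsequence algorithm of [11]. It uses the usual sequence-pair convention: if module a precedes b in both sequences, a is left of b; if a follows b in the first sequence but precedes it in the second, a is below b. In the test-scheduling interpretation, x is a core's test start time and the rectangle width is its test time. The function name is hypothetical.

```python
def sp_to_placement(pos_seq, neg_seq, widths, heights):
    """Place modules from a sequence pair (Gamma+, Gamma-) via longest paths.

    Returns (x, y, total_width, total_height); x and y map each module to
    the coordinates of its lower-left corner.
    """
    pos_idx = {m: i for i, m in enumerate(pos_seq)}
    x, y = {}, {}
    # Visiting modules in Gamma- order guarantees that every module that is
    # left of or below module b has already been placed.
    for n, b in enumerate(neg_seq):
        x[b] = y[b] = 0
        for a in neg_seq[:n]:
            if pos_idx[a] < pos_idx[b]:
                # a before b in both sequences: a is left of b.
                x[b] = max(x[b], x[a] + widths[a])
            else:
                # a after b in Gamma+ but before b in Gamma-: a is below b.
                y[b] = max(y[b], y[a] + heights[a])
    total_width = max(x[m] + widths[m] for m in neg_seq)
    total_height = max(y[m] + heights[m] for m in neg_seq)
    return x, y, total_width, total_height


widths = {"a": 4, "b": 3, "c": 2}   # test times
heights = {"a": 2, "b": 3, "c": 1}  # TAM wires
print(sp_to_placement(["a", "b", "c"], ["b", "a", "c"], widths, heights))
# ({'b': 0, 'a': 0, 'c': 4}, {'b': 0, 'a': 3, 'c': 0}, 6, 5)
```

Here a is stacked on b during the first time units, and c starts at time 4, when a finishes. The total height only reflects consecutive-pin packing; under non-consecutive assignment, the TAM-width histogram is what is compared with K.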
Evolutionary Algorithm
Unlike simulated annealing (SA), which is frequently used in VLSI layout synthesis, an evolutionary algorithm (EA) processes a population of potential solutions in parallel rather than a single solution. Our algorithm is derived from an evolution strategy (ES). Parents are subjected to stochastic "reproduction" operators that produce offspring. Two basic types of reproduction operators are used in EAs: recombination uses component parts from two parents to create an offspring, and mutation alters parts of a single parent to create an offspring. We use only mutation operators for reproduction, because they always produce a feasible offspring from a feasible parent. Recombination was not used because it has a high probability of producing invalid structures.
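The specific mutation operators are not listed here. The Python sketch below assumes three common sequence-pair moves: swap two cores in Γ+, swap the same two cores in both sequences, and re-draw the wrapper design of one core. Every move yields a pair of permutations, so the offspring is always a valid sequence pair, which is the property the text relies on. All names are hypothetical.

```python
import random


def mutate(pos, neg, choice, n_options, rng):
    """Return a mutated copy of one individual (sequence pair + wrapper choices).

    pos, neg:  the two sequences (lists of core names), Gamma+ and Gamma-
    choice:    dict core -> index of the selected Pareto-optimal wrapper design
    n_options: dict core -> number of wrapper designs available for that core
    Assumes at least two cores.
    """
    pos, neg, choice = list(pos), list(neg), dict(choice)
    i, j = rng.sample(range(len(pos)), 2)
    move = rng.randrange(3)
    if move == 0:
        # Swap two cores in Gamma+ only.
        pos[i], pos[j] = pos[j], pos[i]
    elif move == 1:
        # Swap the same two cores in both sequences.
        a, b = pos[i], pos[j]
        pos[i], pos[j] = b, a
        ia, ib = neg.index(a), neg.index(b)
        neg[ia], neg[ib] = b, a
    else:
        # Re-draw (possibly the same) wrapper design for one core.
        core = pos[i]
        choice[core] = rng.randrange(n_options[core])
    return pos, neg, choice


rng = random.Random(0)
cores = ["c1", "c2", "c3", "c4"]
child = mutate(cores, cores[::-1], {c: 0 for c in cores}, {c: 3 for c in cores}, rng)
```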
Our EA begins with an initial population of k = 20 randomly generated "parents", where each parent is encoded as a sequence pair.
This population size provides a reasonable tradeoff between computation time and exploration of the solution space. Each parent undergoes reproduction to produce one offspring.
In the simplest approach, parents and offspring compete equally for survival; they are ranked according to fitness, and the fittest become parents in the next generation. In our algorithm, we use a tournament selection process to decide which solutions survive to the next generation. Instead of choosing the best k solutions based on the value of the cost function (fitness), we evaluate each solution against a randomly chosen set of TS solutions, where TS is the tournament set size, and record the number of wins for each solution. The k best-scoring solutions are chosen for the next generation. Tournament selection promotes diversity among the solutions and thus decreases search time. If properly designed, each generation will contain a subset of
Based on experiments, we introduced heuristic steps into the EA framework, as they produced better results. When the maximum height of the TAM-width histogram (an example is shown in Fig. 4) is 30% over the fixed TAM width, we choose the module with the largest TAM width and randomly choose another wrapper solution for that module. If the maximum height of the histogram is 30% below the fixed TAM width, we choose the module with the longest test time and randomly choose another wrapper design for that core.
Since we solve the test-scheduling problem for distributed SoC pins, the actual pin assignment is performed after the test starting times of all cores have been determined. It can easily be shown that, for a feasible SoC test schedule, a feasible solution to the pin assignment problem always exists. We represent each chosen wrapper design as a set of rectangles, each with a height of one TAM wire and a width equal to the testing time of the given core. The problem of assigning wrapper pins to TAM pins can then be modeled as a modified channel routing problem that can be solved exactly with the Left Edge Algorithm: since the pin assignment problem has no vertical constraints, the algorithm uses exactly as many SoC pins as the maximum number of wires in use at any time, which Constraint 1 bounds by K.
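The pin assignment can be sketched with the classic left-edge algorithm, assuming a schedule given as a hypothetical dict `core -> (start, test_time, tam_width)`. Each core is split into `tam_width` unit-height intervals, and SoC pins play the role of routing tracks. The function name and error handling are illustrative.

```python
def left_edge_assign(schedule, k):
    """Assign wrapper pins to SoC pins 0..k-1 with the left-edge algorithm.

    schedule: dict core -> (start, test_time, tam_width)
    Returns dict core -> list of SoC pin indices (possibly non-consecutive).
    """
    # One unit-height interval [start, start + test_time) per wrapper pin.
    intervals = []
    for core, (start, duration, width) in schedule.items():
        intervals.extend((start, start + duration, core) for _ in range(width))
    intervals.sort()  # by left edge
    pins = {core: [] for core in schedule}
    pin = 0
    while intervals:
        if pin >= k:
            raise ValueError("schedule needs more than k SoC pins (violates Constraint 1)")
        last_end = float("-inf")
        remaining = []
        # Fill the current pin greedily with non-overlapping intervals.
        for left, right, core in intervals:
            if left >= last_end:
                pins[core].append(pin)
                last_end = right
            else:
                remaining.append((left, right, core))
        intervals = remaining
        pin += 1
    return pins


schedule = {"c1": (0, 10, 3), "c2": (0, 5, 2), "c3": (5, 8, 2), "c4": (10, 4, 3)}
print(left_edge_assign(schedule, 5))
# {'c1': [2, 3, 4], 'c2': [0, 1], 'c3': [0, 1], 'c4': [2, 3, 4]}
```

Two wires of the same core never share a pin, because identical intervals overlap; c3 reuses the pins that c2 releases at time 5.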
Power Constraints and Multiple TAMs
At any time during SoC testing, the power dissipation must not exceed a certain limit Q (the SoC power budget); otherwise, the chip may be damaged. $P_i$ represents the peak power dissipation of core i. In general, the peak power dissipated during test may differ between wrapper designs: wider test data leads to shorter test time, but the per-cycle switching activity of the core under test can increase at the same time, thereby increasing the peak power. We ran our experiments for two cases: (1) the maximum peak power is constant for a given core, and (2) the maximum peak power is a linear function of the width of the test data.
To evaluate the power constraint, we calculate the sum of the maximum peak powers of all cores tested at a given time instant. We assume that the maximum peak power of a core under test is the same over the entire time the core is tested. Therefore, we only need to calculate the sum of the maximum peak powers whenever a new core begins testing. Similarly to the TAM-width histogram, we generate a power histogram; an example for a test schedule is shown in Fig. 7.

To account for power constraints, we add a penalty coefficient to the objective function. If the power dissipation at any time exceeds the power budget Q, the penalty coefficient $P_{pen}$ equals the difference between the highest instantaneous power $P_{max}$ and the power budget Q; otherwise, $P_{pen} = 0$. The cost function is

$$cost = (P_{pen} + 1) \cdot T_{max}$$

where $T_{max}$ is the SoC test time.

To run experiments on test scheduling under power constraints using the ITC'02 benchmarks, we had to make a number of assumptions. Only one benchmark (h953) in the ITC'02 benchmark set includes power dissipation numbers. As specified in the ITC'02 benchmark format [13], power is expressed as a non-negative integer in the form of power dissipation per test. The unit of power is not specified, but it should be the same for all modules throughout one SoC benchmark. We therefore made the following assumptions. For case (1), we assume that the power numbers given in the benchmark represent the maximum peak powers dissipated during core testing; for case (2), we assume that the given power numbers represent the maximum peak power when the test is performed using only one TAM wire.
The sequence-pair representation has been modified for the multi-TAM problem. We represent each TAM as a separate sequence pair and have designed new operators for changing the number of TAM wires assigned to each partition and for moving cores between partitions.
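The operators themselves are not specified here. The Python sketch below is one possible reading, assuming each partition holds its wire count and its own sequence pair. One operator moves a single TAM wire between partitions, keeping at least one wire per partition and the total K fixed. The other moves one core into another partition's sequence pair at random positions. All names are hypothetical.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class TamPartition:
    width: int  # number of TAM wires in this partition
    pos: list   # Gamma+ sequence of core names
    neg: list   # Gamma- sequence of core names


def move_wire(parts, rng):
    """Move one TAM wire between partitions (total width K is preserved)."""
    new = copy.deepcopy(parts)
    donors = [i for i, p in enumerate(new) if p.width > 1]
    if len(new) < 2 or not donors:
        return new
    src = rng.choice(donors)
    dst = rng.choice([i for i in range(len(new)) if i != src])
    new[src].width -= 1
    new[dst].width += 1
    return new


def move_core(parts, rng):
    """Move one core from its partition into another partition's sequence pair."""
    new = copy.deepcopy(parts)
    sources = [i for i, p in enumerate(new) if p.pos]
    if len(new) < 2 or not sources:
        return new
    src = rng.choice(sources)
    dst = rng.choice([i for i in range(len(new)) if i != src])
    core = rng.choice(new[src].pos)
    new[src].pos.remove(core)
    new[src].neg.remove(core)
    # Insert at independent random positions in both sequences of the target.
    new[dst].pos.insert(rng.randint(0, len(new[dst].pos)), core)
    new[dst].neg.insert(rng.randint(0, len(new[dst].neg)), core)
    return new


rng = random.Random(3)
parts = [TamPartition(3, ["a", "b"], ["b", "a"]), TamPartition(2, ["c"], ["c"])]
parts = move_core(move_wire(parts, rng), rng)
```

The multi-TAM test time would then be the maximum of the per-partition schedule lengths, each evaluated as in the single-TAM case.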
The overall problem that we solve can be formulated as follows. Multi-TAM Non-Consecutive Pin SoC Test Scheduling Problem under Power Constraints:
Given is a SoC with K TAM wires, a number of TAM 
Experimental Results
All tests were conducted on a Sun UltraSPARC 5. We used the ITC'02 SoC benchmarks [13] and averaged all results over 100 runs. In the second experiment, we assumed that the peak power dissipated during a test is a linear function of the number of TAM wires used; the results for this case are given in column (P = f(TAM)). We assumed that the power numbers given for benchmark h953 correspond to a single TAM wire. For example, for a core tested with 4 TAM wires, the power value given in the benchmark was multiplied by 4. In Table 3, we compare our multi-TAM results to those from [5]. For most of the cases our results are better, even though those in [5] were obtained by enumerating all possible cases.
Conclusions
We proposed an EA-based approach that uses the sequence-pair representation to simultaneously perform wrapper design, test scheduling, and assignment of wrapper pins to SoC pins, minimizing the test application time of a given SoC without exceeding the SoC power budget. We incorporated a distributed-rectangle approach that allows non-consecutive SoC pins to be assigned to a given core for testing.
