In this paper, we propose a new data structure called dual sequences to represent SOC test schedules. Dual sequences are used together with a simulated annealing based procedure to optimize the SOC test application time and tester resources. The problems we consider are generation of optimal test schedules for SOCs and minimizing tester memory and test channels. Results of experiments conducted on ITC'02 benchmark SOCs show the effectiveness of the proposed method.
Introduction
In the last few years, SOC (system on a chip) design has become the trend in integrated circuit design. Designing a SOC usually involves using pre-defined IP blocks, potentially from different sources, and then adding user defined logic to create a design for a specific application. By taking advantage of the re-usability of the IP blocks, SOC design eliminates the need to design an entire chip from scratch and accelerates time-to-market. As SOC design moves toward mainstream use, the problem of effectively testing the IP blocks (called cores) within the SOC needs to be addressed. Generally speaking, SOC test requires considering the following issues: test access mechanism (TAM) design, core wrapper design, test scheduling, tester memory and tester channels.
The TAM is the hardware infrastructure that transports the test data between the SOC pins and the core wrappers. The core wrapper is a logic block consisting primarily of scan chains placed around the core to isolate the core from its surrounding logic and serve as an interface between the TAM and the core. A number of approaches have been proposed for the core wrapper design [1] [2] [3] [4] [5] [6] [11].
SOC test scheduling is the procedure of deciding the test start time of every core so as to obtain a minimum test application time for the SOC under certain constraints, such as TAM width (i.e., the number of SOC pins) and power dissipation during test. Since test scheduling depends on the SOC TAM design and the core wrapper design, SOC test requires co-optimization of the TAM, the core wrapper design and the test schedule. Recently, a number of works have proposed solutions to this problem. In [3], Marinissen et al. presented several methods to design TAMs. In [7, 8], Larsson and Peng considered co-optimization of SOC test time and the number of SOC pins under the assumption that a wrapper for each core is given. In [9], Chakrabarty developed an integer linear programming model for minimizing test application time by co-optimization of bandwidth distribution and test bus assignment. Huang et al. [10] formulated the co-optimization problem as a two-dimensional bin packing or rectangle packing problem and solved it using a best-fit heuristic algorithm. In [16], a SOC test schedule representation called k-tuples was introduced and test scheduling was realized using a greedy algorithm. A similar test schedule representation known as sequence pair was used together with simulated annealing to solve the co-optimization problem [6]. Other works, such as [11] [12] [13] [14] [15], have investigated the same problem using specialized heuristic procedures.
In [6] it was shown that the test application times obtained for benchmark SOCs using a simulated annealing algorithm were most often shorter than those of all earlier proposed heuristic solutions, and also shorter than those of an ILP based procedure when the run time of the ILP procedure was limited (to several hours). The SOC schedules were represented in [6] by what are called sequence pairs [19].
In this paper, we introduce a simple and effective data structure called Dual Sequences (DS) to represent SOC test schedules and use this to obtain optimal SOC test schedules using simulated annealing. Experimental results show that test schedules obtained using DS with simulated annealing are as good as or better than those obtained using sequence pair [6] while the run time of the simulated annealing procedure is greatly reduced. Another problem we consider is minimization of tester memory and tester channels, again using DS to represent test schedules together with simulated annealing.
The paper is organized as follows. In Section 2, we briefly review the features of SOC core test time. SOC test scheduling is introduced in Section 3. In Section 4, the new test schedule representation by dual sequences is presented. In Section 5, the simulated annealing algorithm used to obtain optimal SOC test schedules is presented. The problem of minimizing tester memory and tester channels is discussed in Section 6. Experimental results are given in Section 7. Section 8 concludes the paper.
The Features of SOC Core Test Time
The core wrapper is the interface between the core and the SOC TAM. It provides several operation modes, such as normal function, interconnect test and bypass test. The test time for a core is given by the following formula [1]:

$$T = \{1 + \max(S_i, S_o)\} \cdot P + \min(S_i, S_o) \tag{1}$$

where $P$ is the number of test patterns, and $S_i$ ($S_o$) denotes the length of the longest wrapper scan input (output) chain for the core. The core test time $T$ is decided by the length of the longest wrapper scan chain, so one goal of the core wrapper design is to shorten the longest wrapper scan chain. For this purpose, balanced wrapper design was proposed [1, 11], which partitions the wrapper scan elements among the wrapper chains to make the lengths of the wrapper chains as equal as possible. In our method, the balanced wrapper design proposed in [11] is used.

Fig. 1 The testing time for core 6 of p93791
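Equation (1) can be evaluated directly once the longest scan-in and scan-out chain lengths are known. The following is a minimal sketch; the function name and the sample numbers are illustrative assumptions and are not taken from the benchmark data.

```python
def core_test_time(num_patterns: int, longest_scan_in: int, longest_scan_out: int) -> int:
    """Core test time per Eq. (1): T = (1 + max(Si, So)) * P + min(Si, So)."""
    longest = max(longest_scan_in, longest_scan_out)
    shortest = min(longest_scan_in, longest_scan_out)
    return (1 + longest) * num_patterns + shortest


# Hypothetical core: 10 patterns, longest scan-in chain 5, longest scan-out chain 3.
example_time = core_test_time(10, 5, 3)
```

Shortening the longer of the two chains reduces the dominant term, which is why balanced wrapper chains are preferred.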
Next we consider the relationship between the core test time and the core wrapper width. Figure 1 shows the test time for core 6 of the ITC'02 benchmark SOC p93791 for different wrapper widths. It can be seen that the test time for the core is a staircase function, which means that there are only some wrapper width values where the core test time changes. These points are called pareto-optimal points [11] . This feature of core test time enables us to restrict consideration of candidate core wrapper widths to the pareto-optimal points as the permissible wrapper widths. If the core wrapper is represented by a rectangle with the width representing the wrapper width and the height representing the core test time, there is a set of candidate rectangles for every core corresponding to the pareto-optimal core wrapper widths. In co-optimizing wrapper design and SOC test time, one of these rectangles is chosen for each core.
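The pareto-optimal widths of a staircase test-time function can be picked out with a single pass. The sketch below assumes the test time is available as a list indexed by wrapper width (starting at width 1); the function name and the sample staircase are hypothetical.

```python
def pareto_optimal_widths(test_time_by_width):
    """Return (width, time) pairs where the test time strictly drops.

    test_time_by_width[k] is assumed to hold the test time for wrapper
    width k + 1. A width is kept only if it gives a shorter test time
    than every narrower width.
    """
    points = []
    best_time = None
    for width, time in enumerate(test_time_by_width, start=1):
        if best_time is None or time < best_time:
            points.append((width, time))
            best_time = time
    return points


# Hypothetical staircase for widths 1..7.
staircase = [100, 60, 60, 40, 40, 40, 25]
candidates = pareto_optimal_widths(staircase)
```

Each returned pair corresponds to one candidate rectangle for the core.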
Problem Formulation
The problem of SOC test scheduling we are considering is stated below.
Given is a SOC with $N$ pins and $N_c$ cores. Each core $C_i$ ($1 \le i \le N_c$) has a set of $N_i$ permissible wrapper configurations. Each wrapper configuration is represented by a pair $(W_{ij}, T(W_{ij}))$, where $W_{ij}$ stands for the width of the j-th wrapper configuration for core $C_i$ and $T(W_{ij})$ stands for the test time of core $C_i$ with wrapper width $W_{ij}$. The objective is to pick one wrapper design for each core, determine the mapping from the SOC pins to the core wrapper pins, and set the test start time for each core such that the SOC test application time is minimized. This problem can be transformed into the well-known two-dimensional bin packing problem, in which the SOC is represented by a bin with width $N$ and the set of $N_i$ wrapper configurations for every core is represented by a set of $N_i$ rectangles with width $W_{ij}$ and height $T(W_{ij})$ [10]. The objective is to choose a rectangle for every core $C_i$ and pack all the rectangles in the bin such that the height of the bin is minimum.
Representation of SOC Test Schedules by Dual Sequences
In Figure 2, we illustrate a test schedule for a SOC with six cores. The vertical axis is time and the horizontal axis represents SOC pins. In this schedule, testing of Core 1, Core 2 and Core 3 starts simultaneously at time $t = 0$. Testing of Core 3 is completed at time $t = t_3$, at which time testing of Core 4 is initiated. Testing of Core 4 is completed at $t = t_4$, at which time testing of Cores 5 and 6 is initiated. The two parts of Core 5, denoted 5_1 and 5_2, indicate that Core 5 is tested through two non-consecutive subsets of TAM pins. Testing of Core 2 is completed at $t = t_2$ and testing of Core 6 is completed at $t = t_6$. At this time, testing of the SOC is completed. As seen from Figure 2, every test schedule corresponds to a rectangle placement in the bin representing the SOC. So the problem of SOC test scheduling can be transformed to the rectangle placement problem.
In this section, a new representation called Dual Sequences (DS) is introduced to express the rectangle placement. Earlier, sequence pair was used to represent SOC test schedules in [6, 16] . As shown in the section on experimental results, using DS representation of SOC test schedules reduces the run time and improves the quality of the solutions obtained by using simulated annealing.
The DS for a placement of a set of n rectangles (cores) is a pair of sequences (R, W), in which R is a sequence of the names of the n rectangles and W is a sequence of the widths of the n rectangles listed in R. For example, (< R3 R1 R2 R4 >, < 4 2 1 5 >) is a DS from which we can see that the placement is composed of four rectangles with widths 4,2,1 and 5, respectively. Next we discuss how to represent a rectangle placement by a DS and how to obtain the rectangle placement corresponding to a DS.
DS Extraction from Rectangle Placement
Given a placement of rectangles, the corresponding DS can be obtained by visiting every rectangle in the placement from bottom to top and from left to right. During the visit, the rectangles we encounter are recorded in R in the order in which they are visited, and the widths of the visited rectangles are recorded in W. If a rectangle is split into several sub-rectangles in the placement, the sub-rectangles are merged into one rectangle for representation in (R, W): its position in R is decided by the first sub-rectangle and its width in W is the sum of the widths of the sub-rectangles. A rectangle placement corresponding to a SOC test schedule may split the rectangle corresponding to a core, since the wrapper pins of the core may be connected to non-consecutive SOC pins. For example, consider the rectangle placement in Figure 2, which has six rectangles R1, R2, R3, R4, R5, R6 corresponding to the six cores, with widths, say, $\Phi_1$, $\Phi_2$, $\Phi_3$, $\Phi_4$, $\Phi_5$ and $\Phi_6$, respectively. The rectangle R5 is divided into two sub-rectangles R5_1 and R5_2. As explained next, by visiting each rectangle within the placement and merging the sub-rectangles, we obtain the Dual Sequences (< R1 R2 R3 R4 R5 R6 >, < $\Phi_1$ $\Phi_2$ $\Phi_3$ $\Phi_4$ $\Phi_5$ $\Phi_6$ >).
Since testing of R1, R2 and R3 are all scheduled at time zero, they are visited before R4 which is scheduled for testing at t 3 . Within the set of rectangles R1, R2, and R3, R1 is visited first since it is left of R2, followed by R2 and then R3. Next R4 is visited. After visiting R4, R5 and R6 are visited but R5 is visited before R6.
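The visiting order described above can be sketched as a sort on (start time, leftmost pin) after merging sub-rectangles. The tuple layout, the function name and the coordinates in the example are assumptions made for illustration; they only mimic the shape of the schedule in Figure 2 and are not read from it.

```python
def extract_ds(placement):
    """Extract dual sequences (R, W) from a rectangle placement.

    placement is assumed to be a list of sub-rectangles given as
    (name, start_time, left_pin, width). Sub-rectangles sharing a name
    are merged: the position in R comes from the first sub-rectangle
    (lowest, then leftmost) and the width in W is the sum of the parts.
    """
    first_position = {}
    total_width = {}
    for name, start_time, left_pin, width in placement:
        key = (start_time, left_pin)
        if name not in first_position or key < first_position[name]:
            first_position[name] = key
        total_width[name] = total_width.get(name, 0) + width
    # Bottom to top (start time), then left to right (pin index).
    R = sorted(first_position, key=lambda name: first_position[name])
    W = [total_width[name] for name in R]
    return R, W


# Hypothetical coordinates loosely following Figure 2; R5 is split in two.
placement = [
    ("R3", 0, 5, 3), ("R1", 0, 0, 2), ("R2", 0, 2, 3),
    ("R4", 4, 5, 3),
    ("R5", 7, 0, 1), ("R5", 7, 5, 1), ("R6", 7, 6, 2),
]
R, W = extract_ds(placement)
```

Here R5 appears once in R, at the position of its leftmost part, with the combined width of both parts.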
Fig. 2 A SOC test schedule

Mapping from DS to Placement of Rectangles
To obtain a placement from a DS, a greedy algorithm based on two-dimensional bin packing is used. The basic idea of this algorithm is that, given the packing sequence R and the width sequence W of the packed rectangles, we pick rectangles from R one at a time in the order of their appearance in R and pack each selected rectangle at a position that is as low as possible (i.e., we schedule the start of the test of the core corresponding to the rectangle as early as possible). It is important to point out the distinction between the dual sequences used here and the sequence pair used in [6, 16] to represent a bin packing. Given a sequence pair, the corresponding bin packing is uniquely defined and is obtained by a longest path procedure run over two graphs derived from the sequence pair [6]. The bin packing corresponding to a given DS is not unique. The sequence R determines the order in which the rectangles are considered and W restricts the choice of the width (i.e., the wrapper) of the rectangle corresponding to the core being packed. Any procedure that packs the rectangles in the order given by R can be used.
In the proposed method to obtain a rectangle placement from a given DS, a data structure called a layer is used, which corresponds to a position where a yet unplaced rectangle can be placed. A layer has two attributes: starting time and width. The starting time is the height of the layer in the bin and the width indicates the space available at this height. For example, in Fig. 3(a), we have four layers, layer 0 to layer 3, which are indicated by the thick dark lines. The start time and the width of each layer can be seen from Fig. 3(a). For example, the start time of layer 1 is H1 and its width is (W2-W1).
Before we describe the procedure to obtain the rectangle placement from a DS, we show how the layers change when a new rectangle is added to an existing partial placement.
Given the partial placement in Fig. 3(a), suppose a new rectangle, say R4 with width (W2+W4-W3), is placed on layer 2. As shown in Fig. 3(b), R4 occupies layer 1, layer 2 and part of layer 0. The widths of layer 1 and layer 2 are changed to 0 and the width of layer 0 is changed to (W5-W4). A new layer 4 with width (W2+W4-W3) is added. Layer 4 is split into two sub-layers, as can be seen in Fig. 3.
Next we introduce the greedy algorithm for obtaining a rectangle placement from a DS. At the start, there is only one layer, whose width is equal to the total TAM width (the number of SOC pins). We pick rectangles from sequence R from left to right, with the width of each rectangle given by the corresponding entry in sequence W, and place each rectangle on a layer L that satisfies the following requirements.
1. The sum of the width of L and the widths of the layers with non-zero widths whose start time is less than or equal to the start time of layer L is greater than or equal to the width of the rectangle being placed.
2. The start time of layer L is the lowest among all layers that satisfy requirement 1.
3. If there is a power constraint, the core placed at that layer will not violate this constraint.
Following the placement of the rectangle, a new layer with width equal to the width of the just placed rectangle is added to the placement and the widths of the other layers are updated. This procedure is repeated until all the rectangles in sequence R are packed. It should be pointed out that the procedure proposed above is suboptimal. One reason for this sub-optimality is that reducing the width of a layer may preclude the use of some packing space. For example, the shaded area in Figure 3(b) is not available for future packing after rectangle R4 is placed.
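The greedy placement can be sketched with a simplified model in which each SOC pin records the time at which it becomes free; a layer then corresponds to a group of pins with the same free time. This per-pin model, the function name, the `test_time` lookup table and the tie-breaking rule (pins that became free latest are consumed first) are assumptions of this sketch, and the power constraint of requirement 3 is not modelled.

```python
def place_from_ds(R, W, test_time, tam_width):
    """Greedy placement of a DS (R, W) into a bin of tam_width pins.

    test_time[name][width] is assumed to give the rectangle height.
    Returns ({name: (start_time, pins)}, total_height).
    """
    free_at = [0] * tam_width  # time at which each pin becomes free
    schedule = {}
    for name, width in zip(R, W):
        if width > tam_width:
            raise ValueError(f"{name} is wider than the TAM")
        # Requirements 1 and 2: the earliest time at which at least
        # `width` pins are free is the width-th smallest free time.
        start = sorted(free_at)[width - 1]
        candidates = [p for p in range(tam_width) if free_at[p] <= start]
        # Assumed tie-break: use the pins that became free latest first,
        # so that less idle space below the rectangle is wasted.
        candidates.sort(key=lambda p: -free_at[p])
        pins = sorted(candidates[:width])
        for p in pins:
            free_at[p] = start + test_time[name][width]
        schedule[name] = (start, pins)
    return schedule, max(free_at)


# Hypothetical three-core example on a 4-pin TAM.
times = {"A": {2: 3}, "B": {2: 5}, "C": {3: 2}}
schedule, height = place_from_ds(["A", "B", "C"], [2, 2, 3], times, 4)
```

In this example core C cannot start before time 5, and the pin it takes from under core A sits idle between times 3 and 5, which mirrors the unusable shaded area discussed above.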
Compared to the test schedule representation using sequence pairs in [6, 16], the DS representation of rectangle placements has a smaller search space. Since the search space is smaller, it leads to a much lower run time for test schedule optimization using simulated annealing. As the experiments on ITC'02 benchmarks reported later show, the optimality of the obtained SOC test schedule is indeed not affected by using dual sequences instead of sequence pairs.
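A rough count illustrates the difference in search-space size. It assumes that a sequence pair consists of two permutations of the $n$ rectangles, that a DS fixes a single permutation R, and that in both cases each core $i$ may take any of its $N_i$ permissible widths; the expressions below follow from these assumptions only.

```latex
\[
|\mathcal{S}_{\mathrm{DS}}| = n! \prod_{i=1}^{n} N_i ,
\qquad
|\mathcal{S}_{\mathrm{SP}}| = (n!)^2 \prod_{i=1}^{n} N_i ,
\qquad
\frac{|\mathcal{S}_{\mathrm{SP}}|}{|\mathcal{S}_{\mathrm{DS}}|} = n! .
\]
```

Under these assumptions, a SOC with $n = 10$ cores would have a DS search space smaller by a factor of $10! = 3628800$.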
Simulated Annealing
Simulated annealing (SA) is a global stochastic optimization algorithm that was first introduced by Kirkpatrick et al. [17]. The algorithm begins with an initial solution, and then a neighboring solution is created by perturbing the current solution. If the cost of the neighboring solution is less than that of the current solution, the neighboring solution is accepted; otherwise it is accepted or rejected with some probability. The probability of accepting an inferior solution is a function of a parameter called the temperature. The probability function used is:

$$P = e^{-\Delta C / T}$$

where $\Delta C$ is the change in the cost between the neighboring solution and the current solution and $T$ is the current temperature. At the beginning of the algorithm, the temperature $T$ is large and an inferior solution has a high probability of being accepted. As the optimization progresses, the temperature decreases and there is a lower probability of accepting an inferior solution. The objective of the simulated annealing procedure we implemented for finding an optimal SOC test schedule is to find a solution $S_{opt}$ that minimizes the cost function $C(S_{opt})$. We use the SA algorithm described above to implement SOC test scheduling based on dual sequences by specifying the parameters of the SA algorithm as follows.
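The acceptance rule can be sketched as below, assuming the standard exponential (Metropolis-style) form in which an inferior solution with cost increase ΔC is accepted with probability exp(-ΔC/T). The function names are hypothetical.

```python
import math
import random


def acceptance_probability(delta_cost: float, temperature: float) -> float:
    """Probability of accepting a neighbor whose cost differs by delta_cost."""
    if delta_cost <= 0:
        return 1.0  # an equal or better solution is always accepted
    return math.exp(-delta_cost / temperature)


def accept(delta_cost: float, temperature: float, rng: random.Random) -> bool:
    """Draw the accept/reject decision using the supplied random generator."""
    return rng.random() < acceptance_probability(delta_cost, temperature)
```

For the same cost increase, the probability is close to 1 at a high temperature and close to 0 at a low one, which is what lets the search escape local minima early and settle later.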
Cost function C:
The objective of test scheduling is to reduce the test application time of the SOC. Therefore, the height of the bin where the rectangles are placed is defined as the cost function.
Neighboring solution $S_n$:
The neighboring solution is defined by two types of moves over the dual sequences, given below.
M1: Exchange the positions of two randomly chosen rectangles in the first sequence R (note that W is also changed to reflect the exchange in R).
M2: Change the width (and hence the height) of a rectangle to another allowed width of the rectangle in the second sequence W (allowed widths are the pareto-optimal values as discussed earlier).
During the process of optimization, the probabilities of moves M1 and M2 are set to 0.5 each. The initial solution $S_{init}$ can be set randomly. In order to accelerate the convergence of SA, the test schedule obtained by the heuristic procedure in [13] is used as the initial solution in the experiments reported later.
Initial temperature:
The initial temperature $T_{init}$ is set to 4000. At the end of each outer loop, temperature $T$ is reset to $T_{new}$ = 4000 + 1000 * Counter.
Other parameters: These parameters include the final temperature $T_{final}$, the number of iterations Niter at every temperature, the stopping criteria and the temperature reduction multiplier K. In our implementation, these parameters are set as follows.
(1) $T_{final}$ = 10;
(2) The number of iterations Niter at each temperature is set to 400*$N_c$, where $N_c$ is the number of rectangles.
(3) The stopping criteria can be decided by the user. In our experiment, if Counter is larger than 10, the procedure is stopped.
(4) The temperature reduction multiplier K is set to 0.98 when T < 10000; otherwise K = 0.93.
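The moves M1 and M2 and the cooling rule can be sketched as follows. The function names, the `allowed` table of permissible widths per rectangle and the use of Python's `random.Random` are assumptions of this sketch; the numeric settings are the ones listed above.

```python
import random


def move_m1(R, W, rng):
    """M1: swap two randomly chosen rectangles in R, and their widths in W."""
    i, j = rng.sample(range(len(R)), 2)
    R2, W2 = list(R), list(W)
    R2[i], R2[j] = R2[j], R2[i]
    W2[i], W2[j] = W2[j], W2[i]
    return R2, W2


def move_m2(R, W, allowed, rng):
    """M2: give one rectangle a different allowed (pareto-optimal) width."""
    changeable = [i for i, name in enumerate(R) if len(allowed[name]) > 1]
    if not changeable:
        return list(R), list(W)
    i = rng.choice(changeable)
    W2 = list(W)
    W2[i] = rng.choice([w for w in allowed[R[i]] if w != W[i]])
    return list(R), W2


def neighbor(R, W, allowed, rng):
    """Pick M1 or M2 with probability 0.5 each."""
    if rng.random() < 0.5:
        return move_m1(R, W, rng)
    return move_m2(R, W, allowed, rng)


def next_temperature(t):
    """Cooling rule: K = 0.98 when T < 10000, otherwise K = 0.93."""
    return t * (0.98 if t < 10000 else 0.93)


def cooling_steps(t_start, t_final=10.0):
    """Number of temperature reductions needed to go from t_start to t_final."""
    steps, t = 0, t_start
    while t > t_final:
        t = next_temperature(t)
        steps += 1
    return steps
```

With these settings, one pass from the initial temperature of 4000 down to the final temperature of 10 takes a few hundred temperature reductions, each followed by 400*N_c candidate moves.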
Reducing ATE Resources
Automatic Test Equipment (ATE) used in SOC test provides the ability to perform multi-site testing, which allows several copies of a SOC to be tested concurrently. When the number of ATE channels is given, to test a maximum number of SOCs at the same time requires minimization of the TAM width of the SOC while not violating the ATE memory depth constraint (decreasing the TAM width of the SOC will increase the test application time and hence the ATE memory depth requirement). In this section we discuss how the proposed method using DS representation of the test schedules can be used to minimize SOC TAM width as well as the ATE buffer memory depth for a given SOC TAM width.
When a SOC is tested by an ATE, the test channel memory depth required for the SOC is decided by the test data volume. The depth of the test channel memory can be approximated by the SOC test application time (the number of clock cycles) [20]. Therefore, the problem of multi-site SOC test under ATE memory depth constraints can be considered as a problem of reducing the SOC TAM width while the total test application time is fixed. This allows testing of a maximum number of SOCs using a given number of test channels and their buffer memory depth. The SOC multi-site test problem can be solved using the two-dimensional bin packing procedure with the width of the bin representing the memory depth constraint and the height of the bin representing the TAM width. We should point out that in the bin packing problem for multi-site testing, a rectangle cannot be divided into several sub-rectangles, which is different from the bin packing problem we discussed before. Dividing rectangles was permitted in the earlier problem since it is not necessary to connect the wrapper pins of a core to adjacent SOC pins. However, in multi-site testing, breaking a rectangle represents interruption of the test of a core, which may not be permitted. A simple way to accommodate the requirement that core tests cannot be interrupted is to require that the new rectangle to be packed occupy contiguous layers only, thus avoiding division of rectangles.
Figure 4 Rectangle packing to optimize ATE resources
Another issue that needs to be considered is illustrated by the rectangle packing shown in Figure 4(a). Figure 4(a) shows the case where ATE test channels 1 and 2 are used to test core 2 and ATE test channel 3 is used to test core 1. Tests for core 1 occupy M1 bits of the memory buffer for tester channel 3, and tests for core 2 occupy M2 bits of the memory buffer for channels 1 and 2. Tests for core 3 use all three test channels and hence can only be started after completing the test of core 2. It should be pointed out that the tests are loaded into the buffer and shifted out to the inputs of the device under test. If the tester architecture is such that all test channel buffers are shifted at the same time and each channel has a dedicated memory buffer, then the buffer bits of channel 3 are don't cares from M1 to M2. However, if the tester architecture is such that each test channel is individually controlled, then test channel 3 can be idled after testing core 1 until the test of core 3 is initiated. In this case the size of the buffer for test channel 3 need only be (M3-M2+M1). The packing shown in Figure 4(b) for the same cores as in Figure 4(a) illustrates the situation where the memory buffer contents for test channel 3 are such that don't cares occur only at the end. In this case, after the testing of core 1 is complete, test channel 3 can be idled. In general, if the packing is such that all the test channel buffers have don't cares only at the end, the ATE memory management is simpler [20, 21]. Finally, in some ATE architectures the entire buffer memory can be configured as a single pool of memory that can be dynamically assigned to test channels [21, 22]. For such architectures the total memory requirement for a SOC test is important. For finding SOC test schedules that minimize the number of ATE test channels given the maximum depth of test channel buffers, we used two different procedures to obtain rectangle packings from dual sequences.
The first one is a modified version of the procedure in [20] to obtain rectangle packings such that all the don't cares in the memory buffers are at the end. The second procedure is the one described in the last section with the additional constraint that rectangles are not divided during packing.
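The two buffer-depth cases discussed for test channel 3 in Figure 4(a) can be written as a small calculation. The sketch assumes each channel's usage is given as a list of half-open (start, end) intervals in buffer bits; the function name and the values chosen for M1, M2 and M3 are hypothetical.

```python
def channel_buffer_depth(busy_intervals, individually_controlled):
    """Buffer depth needed by one ATE channel.

    busy_intervals: list of (start, end) positions during which the
    channel carries test data. If channels are individually controlled,
    idle gaps need no storage; if all channels shift together, the gaps
    are stored as don't cares and the depth runs to the last used bit.
    """
    if not busy_intervals:
        return 0
    if individually_controlled:
        return sum(end - start for start, end in busy_intervals)
    return max(end for _, end in busy_intervals)


# Channel 3 in Figure 4(a): core 1 uses [0, M1), core 3 uses [M2, M3).
M1, M2, M3 = 30, 50, 80  # hypothetical values
channel3 = [(0, M1), (M2, M3)]
depth_individual = channel_buffer_depth(channel3, True)
depth_lockstep = channel_buffer_depth(channel3, False)
```

With individually controlled channels the result equals M3-M2+M1, while with lockstep shifting the don't cares between M1 and M2 also occupy buffer space.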
Experimental Results
The proposed simulated annealing based algorithm is implemented in C++ and executed on a PC with a Pentium IV 1.4 GHz processor and 512 MB of memory. The implemented procedure was applied to the ITC'02 benchmark SOCs [18] under the assumption of no power constraint.
The results of applying the proposed method to SOC test scheduling, together with the results reported for earlier methods, are given in Table 1. The proposed simulated annealing based procedure was run for ten iterations and the best schedule obtained is reported. The method used is indicated in column 2, where DS indicates the proposed method and the other methods are indicated by the number of the corresponding reference. The remaining columns give the SOC test application time for the number of SOC pins shown as the heading of the column. The entry for the method(s) achieving the best test application time is shown in bold. The method in [6] also used a simulated annealing algorithm, with test schedules represented by sequence pairs, and had achieved better schedules than a heuristic method that also used sequence pairs [16]. For this method we also report the best schedule obtained from ten iterations of the procedure. It can be seen that for all the benchmark SOCs, the proposed method achieves better or equal SOC test application time compared to [6]. It can also be observed that the proposed method achieves the same or better test application time than all other methods, except in the cases of p93791 with 80 SOC pins and a586710 with 32 SOC pins.
The run times for the proposed simulated annealing based procedure and the earlier procedure using simulated annealing together with sequence pairs [6] are given in Table 2. The run times reported for both procedures are for a total of ten iterations of the procedures. From Table 2 it can be seen that using dual sequences instead of sequence pairs to represent rectangle placements improves the run time of simulated annealing based procedures. A 2X to 3X improvement in run time is obtained for most designs.
In Tables 3-6 we report the results on ATE tester channels and buffer memory for the four circuits for which data from the earlier work [20] is available. In the first column we show the maximum memory allowed per test channel. In the next three columns we show the number of tester channels required to deliver the tests using the two procedures described in the last section and the method of [20], respectively. Procedure $DS_1$ is the proposed simulated annealing based procedure when the don't care bits in the buffer memory of the test channels are all at the end, and DS is the procedure where the don't care bits are allowed to be anywhere in the buffer memory. In the next four columns we give the total ATE memory required to store the test input data. For method DS we report two entries. Under $DS_g$ we report the total memory including the don't care portion and under DS we report the total memory ignoring the don't care portion. For the other two procedures the don't care portions are not included in the totals reported.
From Tables 3-6 it can be seen that the simulated annealing based procedures require the same or smaller number of test channels compared to the heuristic procedure of [20] for all the SOCs considered. It can also be seen that the total memory required is also smaller for the simulated annealing based procedures.
As Table 4 shows for SOC p22810 with the test channel buffer size limited to 640K, using methods $DS_1$, DS and that of [20], the number of test channels needed to apply tests to all SOCs under test is 12, 11 and 13, respectively. However, each tested SOC needs the same number of separate test channels to obtain test responses. Thus in this case, the number of SOCs that can be simultaneously tested using a tester with 128 test channels will be 8 using the procedure of [20], 9 using procedure $DS_1$ and 10 if procedure DS is used. Thus by using procedure DS the number of SOCs tested per unit of time increases by 25% over the number tested if the procedure from [20] is used.
Conclusions
A new data structure called dual sequences to represent rectangle packings was introduced. Procedures that use dual sequences together with simulated annealing to obtain optimal SOC test schedules and to reduce ATE test resources were presented. Experimental results on ITC'02 SOC benchmarks showed that the proposed procedures yield better results than procedures proposed earlier.
