Core test wrappers and test access mechanisms (TAMs) 
Introduction
The general problem of system-on-chip (SOC) test integration includes the design and optimization of wrapper/TAM architectures and test scheduling. Test wrappers form the interface between cores and TAMs, and TAMs transport test data between SOC pins and test wrappers [15]. Test scheduling determines the order in which tests are applied. We focus here on wrapper/TAM co-design to minimize testing time under TAM width constraints. Wrapper/TAM design is challenging because (i) wrapper and TAM optimization must be carried out in conjunction [8], (ii) TAMs must be designed to minimize testing time under the constraint of limited chip I/Os available for testing, and (iii) wrapper/TAM co-optimization techniques must be scalable for industrial SOCs containing not only a large number of cores with hundreds of I/O terminals and scan chains, but also a large number of TAMs.
Most prior research has either studied wrapper design and TAM optimization as independent problems [1, 4, 5, 12], or not addressed the issue of sizing the TAMs to minimize SOC testing time [14]. Alternative approaches that combine TAM design with test scheduling [9, 13] do not address the problem of wrapper design and its relationship to TAM optimization. New techniques for wrapper/TAM co-optimization are therefore needed to minimize testing time under TAM width constraints. Such techniques should be scalable for SOCs that employ a large number of TAMs.
The first integrated method for wrapper/TAM co-optimization was proposed in [8]. TAM optimization was carried out by enumerating over the different partitions of TAM width as well as over the number of TAMs on the SOC. Integer linear programming (ILP) was used to calculate the optimal core assignment and resulting testing time for each partition. A drawback of this approach is that the wrapper/TAM designs considered in [8] are limited to a small number of TAMs in order to maintain feasible compute times. However, if the total number of TAM wires on the SOC is large, the testing time can often be reduced by increasing the number of TAMs. This is because of two reasons. Firstly, when there are multiple TAMs of different widths, a larger number of cores can be assigned to TAMs whose widths match the cores' own test data requirements; thus the number of unnecessary (idle) TAM wires assigned to cores is reduced. Secondly, multiple TAMs provide greater test parallelism, thereby decreasing total testing time. The methods in [8] are therefore inadequate for large industrial SOCs.
This research was supported in part by the National Science Foundation under grant number CCR-9875324 and by an IBM Graduate Fellowship.
In [8], four problems structured in order of increasing complexity were formulated, such that they serve as stepping stones to the problem of wrapper/TAM co-optimization for SOCs. We first review these four problems.
1. Design a wrapper for a given core, such that the core testing time is minimized, and the TAM width required for the core is minimized.
2. Determine (i) an assignment of cores to TAMs of given widths, and (ii) a wrapper design for each core such that SOC testing time is minimized. (Item (ii) corresponds to the first problem.)
The third and fourth problems concern determining a partition of the total TAM width and the number of TAMs for the SOC, such that testing time is minimized.
[...] To solve the first problem, we use an algorithm based on the Best Fit Decreasing (BFD) heuristic for the Bin Packing problem [6]. Our Design wrapper algorithm (proposed earlier in [8]) has two priorities: (i) minimizing core testing time, and (ii) minimizing the TAM width required for the test wrapper. These priorities are achieved by balancing the lengths of the wrapper scan chains designed, and identifying the number of wrapper scan chains that actually need to be created to minimize testing time. Priority (ii) is addressed by the algorithm since it has a built-in reluctance to create a new wrapper scan chain, while assigning core-internal scan chains to the existing wrapper scan chains.
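The balancing idea behind Design wrapper can be illustrated with a short Python sketch. It is a simplified reading of the description above, not the algorithm of [8]: it handles only core-internal scan chains (wrapper input and output cells are ignored), and the function name `design_wrapper_sketch`, its parameters, and the exact best-fit rule are assumptions made for illustration.

```python
from typing import List


def design_wrapper_sketch(scan_chain_lengths: List[int], max_chains: int) -> List[int]:
    """Return the lengths of the wrapper scan chains built from internal chains.

    Hypothetical simplification: internal scan chains only, no wrapper cells.
    max_chains plays the role of the TAM width available to the core.
    """
    if max_chains < 1:
        raise ValueError("at least one wrapper scan chain is required")
    wrapper: List[int] = []  # current length of each wrapper scan chain
    # Best Fit Decreasing: handle the longest internal chains first.
    for length in sorted(scan_chain_lengths, reverse=True):
        longest = max(wrapper, default=0)
        # Existing wrapper chains that can absorb this chain without
        # becoming longer than the current longest wrapper chain.
        fits = [i for i, cur in enumerate(wrapper) if cur + length <= longest]
        if fits:
            best = max(fits, key=lambda i: wrapper[i])  # tightest fit
            wrapper[best] += length
        elif len(wrapper) < max_chains:
            wrapper.append(length)  # open a new chain only when nothing fits
        else:
            shortest = min(range(len(wrapper)), key=lambda i: wrapper[i])
            wrapper[shortest] += length
    return wrapper


print(design_wrapper_sketch([10, 5, 5], max_chains=3))  # prints [10, 10]
```

In this example three wrapper scan chains are allowed, but only two are created because the two short chains fit together without lengthening the longest chain. This is the kind of reluctance to open a new wrapper scan chain that priority (ii) refers to.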
The second problem is that of assigning cores to TAMs of given widths. An ILP model was developed to solve this problem exactly in [8]. The CPU time for this ILP model was reasonably short for a single execution and optimal solutions for the core assignment problem were easily obtained. However, [...] and halts. This plays a significant role in reducing computation when Core assign is executed a large number of times, as will be shown in Section 3.
Figure 1. New algorithm for core assignment.

We illustrate the Core assign algorithm using an example SOC containing five cores and three TAMs. The testing times for the five cores when assigned to the TAMs of widths 8, 16, and 32 are shown in Figure 2(a).

(a) Testing time (cycles)
Core   TAM 1 (32 bits)   TAM 2 (16 bits)   TAM 3 (8 bits)
1      50                100               200
2      75                95                200
3      90                100               150
4      60                75                80
5      120               120               125

(b)
Core   TAM   Testing time (cycles)
1      2     100
2      3     200
3      2     100
4      1     60
5      1     120

Figure 2. Core testing times for (a) the SOC used to illustrate Core assign, and (b) the final assignment.

Initially, the testing time on all TAMs is 0 cycles; therefore TAM 1 of width 32, being the widest, is considered first. Core 5 has the highest testing time on TAM 1, therefore Core 5 is assigned to TAM 1. Next, there is a choice between Cores 1 and 3 to be assigned to TAM 2 of width 16. We choose to assign Core 1 to TAM 2 here because the testing time for Core 1 on TAM 3 is higher than the testing time for Core 3 on TAM 3 (Line 14 of Core assign). Next, Core 2 is assigned to TAM 3. TAM 2 is now the minimally loaded TAM; therefore, Core 3 is assigned to TAM 2. Finally, Core 4 is assigned to TAM 1.
[...] is the number of cores in the SOC. Core assign executes two orders of magnitude faster than the ILP model in [8]; hence, a significantly larger number of Core assign iterations can be executed in the time taken to execute the ILP model.
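The walk-through above can be reproduced with a small Python sketch. Since the pseudocode of Figure 1 is not available here, the rules are reconstructed from the worked example: pick the least-loaded TAM (ties go to the wider TAM), then pick the unassigned core with the largest testing time on that TAM. The tie-break between cores (testing time on the next least-loaded TAM), the optional `best_known` bound used for early halting, and all names in the code are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def core_assign_sketch(times: List[List[int]], widths: List[int],
                       best_known: Optional[int] = None
                       ) -> Optional[Tuple[List[int], int]]:
    """Greedy core-to-TAM assignment (illustrative reconstruction).

    times[i][j] is the testing time of core i on TAM j; widths[j] is the
    width of TAM j. Returns (TAM index per core, SOC testing time), or None
    when a TAM's summed testing time exceeds best_known (early halt).
    """
    load = [0] * len(widths)               # summed testing time per TAM
    assignment = [-1] * len(times)
    unassigned = list(range(len(times)))
    while unassigned:
        # Least-loaded TAM first; ties go to the wider TAM.
        order = sorted(range(len(widths)), key=lambda j: (load[j], -widths[j]))
        r = order[0]
        k = order[1] if len(order) > 1 else r  # tie-break TAM (assumption)
        # Core with the largest time on TAM r; ties broken by time on TAM k.
        core = max(unassigned, key=lambda i: (times[i][r], times[i][k]))
        assignment[core] = r
        load[r] += times[core][r]
        unassigned.remove(core)
        if best_known is not None and load[r] > best_known:
            return None                    # cannot beat the best known result
    return assignment, max(load)


# Data of Figure 2(a); columns are TAM 1 (32 bits), TAM 2 (16), TAM 3 (8).
FIG2_TIMES = [[50, 100, 200], [75, 95, 200], [90, 100, 150],
              [60, 75, 80], [120, 120, 125]]
FIG2_WIDTHS = [32, 16, 8]

print(core_assign_sketch(FIG2_TIMES, FIG2_WIDTHS))  # ([1, 2, 1, 0, 0], 200)
```

With zero-based TAM indices, the result corresponds to the assignment of Figure 2(b): Cores 1 and 3 on TAM 2, Core 2 on TAM 3, and Cores 4 and 5 on TAM 1, for an SOC testing time of 200 cycles. Passing a `best_known` value smaller than 200 makes the sketch return None as soon as one TAM exceeds that bound.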
TAM width partitioning
In this section, we describe how the Core assign heuristic is used to develop an algorithm to quickly reach an intermediate solution to the third and fourth problems. One way to ensure that only unique partitions are evaluated is to discard, prior to evaluation, each new partition that appears to be a cyclical isomorphism of a previously handled partition. However, the memory requirements for this method and the number of partition comparisons required grow exponentially with the number of TAMs, which severely limits scalability when the number of TAMs is large. Furthermore, as the number of TAMs increases, the time required to enumerate unique partitions and evaluate them using ILP increases significantly. This method is therefore inadequate for industrial SOCs having multiple TAMs.
In this subsection, we use Core assign to develop a fast method to evaluate width partitions; this effectively addresses the problems inherent to the ILP model and "enumeration-comparison" method described above. The new heuristic employs extensive solution-space pruning, and is thus applicable to wrapper/TAM design for industrial SOCs having a large number of TAMs. In our experiments with industrial SOCs, we were able to evaluate width partitions and testing times for wrapper/TAM architectures having up to ten TAMs within a few minutes. Test access architectures having more than ten TAMs could also be evaluated, but were found to be less useful for testing time minimization because testing time increases significantly as the relative width of each TAM decreases beyond a threshold.
The new algorithm Partition evaluate for the third and fourth problems is presented in Figure 3. This algorithm employs three levels of solution-space pruning. Firstly, the number of partitions enumerated [...] implies that approximately 1% of the number of unique partitions are evaluated to completion by Partition evaluate. We choose p21241 to illustrate the efficiency of our heuristics, because the exhaustive method [8] was found to be inadequate for wrapper/TAM co-design for p21241; the method did not complete even for [...]. From Table 1, it can be seen that Partition evaluate evaluates on average only 2% of the unique partitions. Thus there is a significant reduction in the execution time using this heuristic compared to the exhaustive method.
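One way to picture the enumeration of unique width partitions is the Python sketch below. It generates each split of the total TAM width exactly once by keeping the widths in non-decreasing order, so no comparison against previously handled partitions is needed. This is only an illustration of the idea, not the Partition evaluate algorithm of Figure 3; the names `unique_partitions` and `search_partitions`, and the `evaluate` callback (which is expected to return None when it abandons a partition early), are assumptions.

```python
from typing import Callable, Iterator, Optional, Tuple

Partition = Tuple[int, ...]


def unique_partitions(total_width: int, num_tams: int,
                      min_part: int = 1) -> Iterator[Partition]:
    """Yield every split of total_width into num_tams positive widths once,
    with the widths listed in non-decreasing order."""
    if num_tams == 1:
        if total_width >= min_part:
            yield (total_width,)
        return
    # The smallest remaining width can be at most total_width // num_tams.
    for first in range(min_part, total_width // num_tams + 1):
        for rest in unique_partitions(total_width - first, num_tams - 1, first):
            yield (first,) + rest


def search_partitions(total_width: int, num_tams: int,
                      evaluate: Callable[[Partition, Optional[int]], Optional[int]]
                      ) -> Tuple[Optional[Partition], Optional[int]]:
    """Keep the partition with the lowest testing time reported by evaluate.

    evaluate(partition, best_so_far) is a hypothetical hook; it may return
    None to signal that the partition was abandoned (pruned).
    """
    best_part, best_time = None, None
    for part in unique_partitions(total_width, num_tams):
        t = evaluate(part, best_time)
        if t is not None and (best_time is None or t < best_time):
            best_part, best_time = part, t
    return best_part, best_time


print(list(unique_partitions(8, 3)))
# [(1, 1, 6), (1, 2, 5), (1, 3, 4), (2, 2, 4), (2, 3, 3)]
```

For a total width of 64 split into three TAMs, this enumeration produces 341 partitions, the same count of unique partitions that is quoted later in the results. An `evaluate` callback that stops as soon as one TAM exceeds the best testing time found so far would leave most of these partitions only partially evaluated.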
Final optimization step
Partition evaluate provides a fast approximation of the optimal values of TAM width partition and testing time. We further improve on this result by performing a final optimization step using the ILP model for core assignment [8]. Since this final step is performed only once, and since the execution time for a single iteration of the ILP model for [...] from [8] for reasons of completeness, and to comment on its complexity.
To model [...], respectively. This ILP model uses the best width partition obtained from Partition evaluate to optimize the core assignment and obtain a near-optimal wrapper/TAM architecture in the final step of our co-optimization methodology.
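For orientation, a core-assignment problem of this kind is commonly written as a min-max assignment ILP. The formulation below is a generic sketch and not necessarily the exact model of [8]. The notation is assumed here: N is the number of cores, B the number of TAMs, w_j the width of TAM j, T_i(w_j) the testing time of core i on a TAM of width w_j, x_{ij} a binary variable that equals 1 when core i is assigned to TAM j, and T the SOC testing time.

```latex
\begin{align*}
\text{minimize}\quad & T \\
\text{subject to}\quad
  & \sum_{i=1}^{N} T_i(w_j)\, x_{ij} \le T, && j = 1,\dots,B, \\
  & \sum_{j=1}^{B} x_{ij} = 1,              && i = 1,\dots,N, \\
  & x_{ij} \in \{0,1\},                     && i = 1,\dots,N,\; j = 1,\dots,B.
\end{align*}
```

The first set of constraints bounds the summed testing time on every TAM by T, and the second set assigns every core to exactly one TAM. Written this way, the model has N times B binary variables, which suggests why solving it once for a fixed width partition is affordable while solving it for every partition is not.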
Experimental results
In this section, we present experimental results on our wrapper/TAM co-optimization methodology for four example SOCs. The first, d695, is an academic benchmark SOC from Duke University. The other three SOCs p93791, p21241, and p31108 are from Philips. The number (e.g., 93791) in each SOC name is a measure of its test complexity. We calculate the SOC test complexity number using the formula presented in [8] .
The experimental results presented in this paper were obtained at Duke University using a Sun Ultra 10 with a 333 MHz processor and 256 MB memory. The results in [8] were obtained at Philips Research Laboratories using a Sun Ultra 80 with a 450 MHz processor and 4096 MB memory. For the problems in this paper, we found that the Sun Ultra 80 leads to five times faster execution compared to the Sun Ultra 10. Therefore, the CPU times reported in [8] have been multiplied by a factor of five to facilitate a comparison with the CPU times reported here. Note that we achieve an order of magnitude improvement in CPU time over [8] even without the 5× adjustment factor. Note also that all SOC testing times in this section are expressed in clock cycles.
Results for SOC d695
In this subsection, we present experimental results for SOC d695. SOC d695 consists of two ISCAS'85 and eight ISCAS'89 benchmark circuits [8] .
[...] The core assignment vector follows the notation introduced in [5] and further used in [8]. Each position in the vector refers to the core number and the entry in each position refers to the TAM to which the corresponding core is assigned. The percentage change in testing time using the new method is calculated using the formula [...]. The testing times and CPU times in [8] have already been presented in Table 2. The best results in [8] were obtained for [...]. The testing times obtained using the new co-optimization technique are better than or equal to the best testing times in [8] for larger values of [...], the new testing times are on average only 3% larger than those reported in [8]. We improve upon the testing times compared to [8] [...]. There is an improvement of two orders of magnitude in the CPU times in all cases. The new technique is therefore scalable for industrial SOCs having multiple TAMs, as illustrated in the following subsections.
Results for SOC p21241
SOC p21241 contains 28 cores. Of these, 6 are memory cores and 22 are scan-testable logic cores. [...] Table 4, the testing times obtained using the new co-optimization technique are on average 25% lower than those obtained using the Exhaustive method in [8]. This is because using Partition evaluate, we were able to partition [...], the Partition evaluate heuristic returns a partition of 1+1+4+10. This yields a testing time of 468011 cycles after the final optimization step (Table 7). However, a lower testing time of 462210 cycles is actually achievable using only 2 TAMs (Table 6). This is because the testing time for four TAMs returned by Partition evaluate is lower than that for two TAMs before the final optimization step. Similarly, for a total TAM width of 64, Partition evaluate obtains a partition of 13+10+10+10+21 (five TAMs), which gives a lower (heuristic) testing time than does the partition obtained for a total TAM width of 56: 5+4+10+10+10+17 (six TAMs). However, after the final optimization step, the partition for a width of 56 is able to achieve the same testing time as that for a width of 64. This anomalous behavior of our algorithm is due to the fact that Partition evaluate uses heuristics to quickly approximate the final result. Therefore, the partition that actually provides the lowest testing time after final (exact) optimization might not be returned by Partition evaluate.
Results for SOC p31108
SOC p31108 contains 19 cores. Of these, 15 are memory cores and 4 are scan-testable logic cores. [...], the exhaustive method of [8] did not provide a solution even after two days of CPU time. [...], the testing times obtained using the new co-optimization technique are on average 15% higher than those obtained using the Exhaustive method. For [...], we reach the optimum testing time of 544579 cycles. The testing time of this SOC does not decrease below 544579 cycles as the total TAM width is increased beyond 40 and the number of TAMs is increased beyond 3. This is because the testing time for Core 18 in p31108 reaches a minimum value of 544579 cycles when the width of the TAM to which it is assigned reaches 10 bits. Note that in Tables 11, 12 and 13, for [...], Core 18 is always assigned to a TAM whose width is always 10 bits or more and which does not have any other cores assigned to it; thus our method achieves the theoretical lower bound on testing time for this SOC. For [...], TAM 1 is not used since the algorithm is able to assign the cores to the remaining TAMs, while achieving the lower bound of 544579 cycles. Results are shown for six TAMs, however, since the Partition evaluate heuristic obtains a lower testing time for six TAMs than for five TAMs before the final optimization step. The values of [...] in Table 13 are for [...]. The new CPU times are on average between 1 and 2 orders of magnitude less than the CPU times of the exhaustive method. This is because the individual [...] models for p31108 took particularly long to solve. This significantly affected the CPU time of the exhaustive method for the third and fourth problems.
Results for SOC p93791
SOC p93791 contains 32 cores [8]. Of these 32 cores, 18 are memory cores and 14 are scan-testable logic cores. A summary of the 32 cores is presented in Table 14. Tables 15, 16, [...] Here too, we did not achieve a solution with the exhaustive method after two days of execution for [...]. [...] algorithm are between two and three orders of magnitude smaller than the CPU times of the exhaustive method. This is because Partition evaluate is able to effectively prune the solution space by halting evaluation of unnecessary partitions, for which the testing time of a TAM exceeds the previous minimum value for the SOC. For example, of the 341 unique partitions for [...], only 23 were evaluated to completion. Furthermore, the new heuristic algorithm Core assign makes it possible to evaluate wrapper/TAM architectures for industrial SOCs having multiple TAMs, which was not feasible using the methods in [8].
Conclusion
We have presented a new efficient technique for co-optimization of the wrapper/TAM architecture for industrial SOCs. The general wrapper/TAM co-optimization problem has been formulated as a progression of four problems. The first problem, relating to wrapper design, is solved using an efficient algorithm presented earlier. For the second problem, relating to core assignment among TAMs of fixed widths, we have presented an efficient procedure called Core assign that executes significantly faster than an [...]. The third and fourth problems relate to determining a partition of TAM width and an effective number of TAMs for the SOC, such that testing time is minimized. These two problems have been solved using a new heuristic procedure called Partition evaluate that quickly reaches the neighborhood of the optimal solution. Partition evaluate uses extensive solution-space pruning to identify an effective TAM partition for the SOC. Finally, the existing ILP model for core assignment is used to optimize the core assignment and testing time for the width partition produced by Partition evaluate. Experimental results for several industrial SOCs demonstrate that wrapper/TAM co-optimization can be effectively carried out in over an order of magnitude less time than exact methods based on ILP and exhaustive enumeration presented earlier.
A drawback of the heuristic methods presented in this paper is that they exhibit anomalous behavior at times. The width partition and number of TAMs returned by Partition evaluate do not always provide the lowest testing time after the final (exact) optimization step is performed.

