The increasing complexity of system-on-chip (SOC) 
Introduction
Recent advances in CMOS technology have led to a significant increase in the complexity of system-on-chip (SOC) integrated circuits. Today's SOCs consist of embedded cores in multiple clock domains that can often be tested at different scan clock frequencies. To test some of the cores at higher clock frequencies, the test data must be transported at a higher data rate on selected tester channels. To address this problem, automatic test equipment (ATE) vendors have recently announced a new class of testers that can simultaneously drive a limited number of channels at different data rates [4]. Examples of such ATEs include the Agilent 93000 series tester, based on port scalability and the test processor-per-pin architecture [1], and the Tiger system from Teradyne [2], in which the data rate can be increased for selected pin groups to match SOC test requirements. However, the number of tester channels with high data rates is constrained in practice by ATE resource limitations, the power rating of the SOC, and scan frequency limits for the embedded cores. Optimization techniques are therefore needed to ensure that the high data-rate tester channels are used efficiently during SOC testing. In this way, high-frequency ATE channels, typically used for at-speed functional testing, can also be used to reduce the time needed for scan testing.

This research was supported in part by the National Science Foundation under grants CCR-9875324 and CCR-0204077.
Modular testing of embedded cores in an SOC can simplify the complex problems of test access and application [13]. For modular testing, an embedded core is isolated from surrounding logic using a test wrapper, and a test access mechanism (TAM) is designed to deliver test data from the I/O pins of the SOC. This facilitates the reuse of precomputed tests for individual cores and partitions the SOC for test.
The problem of designing a TAM architecture and determining a test schedule to minimize the SOC testing time has been shown in the literature to be NP-hard [7]. Therefore, a number of efficient heuristic techniques have been developed for TAM optimization [5, 6, 8, 9]. However, all these methods assume that, at any instant in time, the ATE provides test stimuli to the SOC at a single data rate. As a result, existing optimization techniques cannot readily exploit the availability of multiple simultaneous data transfer speeds from the ATE to the SOC. In this work, we focus on the problem of designing an optimized TAM architecture that can benefit from the availability of port scalability in ATEs. As in [5, 8], we base our TAM design on a Test Bus model.
We extend the heuristic approach based on rectangle packing that was presented in [8]. The use of rectangles to model core tests was described in [6, 8]. The testing times for a core in the SOC can be represented using a set of rectangles. A set $R_i$ of rectangles for Core $i$ ($1 \le i \le N$, where $N$ is the number of cores in the SOC) is determined such that the height and width of each rectangle correspond to a TAM width and the corresponding test application time for the core, respectively. The TAM optimization problem can now be formulated in terms of rectangle packing as follows: select one rectangle from each set $R_i$, $1 \le i \le N$, and pack the selected rectangles into a bin of fixed height, such that no two rectangles overlap and the width to which the bin is filled is minimized. The problem formulation and heuristic solution in [8] are based on a single-speed TAM architecture; therefore, they are not directly applicable to the problem of optimizing the dual-speed TAM architectures studied in this paper.
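The single-bin packing formulation above can be made concrete with a small sketch. It is only an illustration: the `Placed` record, the field names, and the assumption that each core occupies a contiguous block of TAM wires are hypothetical choices made here, not details taken from [8].

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Placed:
    """One selected rectangle placed in the bin (hypothetical record)."""
    core: str
    wire: int      # lowest TAM wire used (vertical position in the bin)
    height: int    # TAM width assigned to the core
    start: float   # time at which the core test starts
    width: float   # testing time of the core at this TAM width


def overlaps(a: Placed, b: Placed) -> bool:
    """True if two rectangles share both TAM wires and time."""
    share_wires = a.wire < b.wire + b.height and b.wire < a.wire + a.height
    share_time = a.start < b.start + b.width and b.start < a.start + a.width
    return share_wires and share_time


def is_valid(packing, bin_height: int) -> bool:
    """Check that every rectangle fits in the bin and no two overlap."""
    fits = all(p.wire >= 0 and p.wire + p.height <= bin_height for p in packing)
    return fits and not any(overlaps(a, b) for a, b in combinations(packing, 2))


def filled_width(packing) -> float:
    """Width to which the bin is filled, i.e. the SOC testing time."""
    return max((p.start + p.width for p in packing), default=0.0)
```

With this representation, the optimization objective is to choose one rectangle per core and a placement for it so that `is_valid` holds and `filled_width` is as small as possible.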
The availability of dual-speed ATEs was recently exploited in [12], where a technique was presented to match ATE channels with high data rates to core scan chain frequencies using virtual TAMs. Virtual TAMs operate at scan-chain frequencies; however, they interface with the higher-frequency ATE channels using bandwidth matching. Moreover, since the virtual TAM width is not limited by the ATE pin count, a larger number of TAM wires can be used on the SOC, thereby leading to lower testing times. A drawback of virtual TAMs, however, is the need for additional TAM wires on the SOC, as well as frequency division hardware for bandwidth matching. In this paper, we reduce the hardware overhead compared to [12] by using a smaller number of on-chip TAM wires. We also use ATE channels with high data rates to directly drive SOC TAM wires, thereby obviating the need for frequency division hardware.

The rest of this paper is organized as follows. In Section 2, we define the dual-speed TAM optimization problem and formulate it in terms of rectangle packing. In Section 3, we present an efficient algorithm to optimize a dual-speed TAM architecture and to derive a test schedule that minimizes the testing time. In Section 4, we present experimental results for three ITC'02 benchmark SOCs. Finally, we present conclusions and directions for future work in Section 5.
Dual-speed TAM optimization
In this section, we define the dual-speed TAM optimization problem and formulate it in terms of rectangle packing.

Problem: Given the test data parameters for the embedded cores, the total SOC-level TAM width $W$, a total of $V$ available high-speed ATE channels ($V \le W$), and the ratio $n$ of the high-speed data transfer rate to the low-speed data transfer rate, determine the wrapper design, TAM width and test data rate for each core, and the SOC test schedule such that (i) the total number of TAM wires utilized at any moment does not exceed $W$, (ii) the number of TAM wires driven at the high data rate does not exceed $V$, and (iii) the SOC testing time is minimized. The test set parameters for each core include the number of primary inputs, primary outputs, bidirectional I/Os, test patterns, scan chains, and scan chain lengths. The cores are assumed to be hard cores, i.e., the number and length of scan chains are fixed.

For a given TAM width and wrapper design for a core, we assume that its testing time at the high data rate is a factor of $n$ smaller than its testing time at the low data rate.
A core vendor can mandate an upper limit on the scan test frequency for a core, and if this upper limit is lower than the higher data rate, the core can only be tested at the lower data rate. Otherwise, a core can be tested using either the high-speed data channels or the low-speed data channels. We assume that a core is not connected to the ATE by both high-speed and low-speed data channels during scan testing, i.e., it is not possible to assign both high-speed and low-speed TAM wires to a core. While a higher data rate for a given TAM width always leads to reduced testing time for a core, a higher data rate for a subset of TAM wires can reduce the TAM width that is available for a core. The reduced TAM width for the core can lead to an increase in its testing time, despite the faster test data rate. As a result, an optimization procedure as described in this paper is needed to select appropriate values of these parameters.

Recall that the height of a rectangle for a core represents the TAM width assigned to that core, and the width of the rectangle denotes the testing time of the core for the corresponding value of the TAM width. The wrapper design algorithm from [7] is used to design a wrapper and determine the testing time for a core for several possible TAM widths. These pre-calculated testing times are subsequently used in the TAM optimization procedure. Based on the heuristic approach of [8], we formulate the dual-speed TAM optimization problem as follows: given a collection of two sets of rectangles for each core, one representing the testing times for the high data rate and the other representing the testing times for the low data rate, and two bins of fixed heights $W - V$ and $V$, respectively, denoting the two data transfer rates, select a rectangle for each core and pack it in the appropriate bin, such that no two rectangles overlap and the maximum of the widths of the two bins is minimized.

For each core, a number of TAM widths of interest are considered; the core has one set of rectangles for the low data rate and one set of rectangles for the high data rate. The single-bin rectangle packing problem of [8] is a special case of this two-bin problem. Since the single-bin problem was shown to be NP-hard in [8], we conclude that the dual-speed TAM optimization problem is also NP-hard.
In [7], the staircase nature of the variation of testing time with TAM width is exploited to reduce the TAM width assigned to each core to the minimal value required to achieve a specific testing time. The TAM width values at which the testing time decreases are called the Pareto-optimal points of the core, and only rectangles corresponding to the Pareto-optimal TAM width values are considered. In the TAM optimization problem addressed here, each core has a set of Pareto-optimal points for low-speed test application and the same number of Pareto-optimal points for high-speed test application.

Lower bound on testing time: In order to evaluate our heuristic approach, we determine a lower bound on the testing time for a dual-speed TAM architecture. In our lower bound and in the test scheduling approach, we do not allow the scan-out operation for the last test pattern of a core to overlap with the scan-in operation for the first pattern of the next core on the same TAM wire. While such overlap is feasible for a fixed-width TestRail architecture as in [5], it is difficult to implement for a flexible-width Test Bus architecture.
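Returning to the Pareto-optimal points introduced earlier in this section, the following sketch shows one way to extract them from a table of pre-computed testing times and to derive the low-rate and high-rate rectangle sets for a core. The function names and the sample numbers are hypothetical, and the high-rate times simply apply the assumption stated earlier that testing time scales down by the rate ratio (called `n` here).

```python
def pareto_points(times):
    """Return the TAM widths at which the testing time strictly decreases.

    `times` maps a TAM width to the core's testing time at that width.
    """
    best = float("inf")
    points = []
    for width in sorted(times):
        if times[width] < best:
            points.append(width)
            best = times[width]
    return points


def rectangle_sets(times, n):
    """Build (height, width) rectangles for the low and high data rates.

    Both sets use the same Pareto-optimal TAM widths, so they have the
    same number of rectangles; the high-rate testing time is assumed to
    be the low-rate testing time divided by the rate ratio n.
    """
    low = [(w, times[w]) for w in pareto_points(times)]
    high = [(w, t / n) for w, t in low]
    return low, high
```

For a staircase such as `{1: 100, 2: 60, 3: 60, 4: 35, 5: 35, 6: 35}`, only widths 1, 2, and 4 are kept, because widths 3, 5, and 6 use more wires without reducing the testing time.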
For a single-speed TAM architecture, the area of a bin, with the width representing the total testing time $T$ and the height representing the total TAM width $W$, is $T \cdot W$. A lower bound on the testing time of a core on a TAM of a given width can be expressed in terms of the number of test bits to be scanned into the core, the number of test bits to be scanned out of the core, and the number of test patterns to be applied to the core. Now, we know that the total area of the bin cannot be less than the sum of the minimum-area rectangles of all the cores in the SOC. Thus, for any bin, $T \cdot W \ge \sum_{i=1}^{N} A_i^{\min}$, where $A_i^{\min}$ is the area of the minimum-area rectangle of Core $i$, which implies that $T \ge \frac{1}{W} \sum_{i=1}^{N} A_i^{\min}$.

The ILP model used to compute the lower bounds can be solved easily; it takes less than a second of CPU time for the benchmark SOCs. Compared to the lower bound based on the notion of a "bottleneck core" derived in [3], this lower bound is more accurate for smaller TAM widths. However, for larger values of $W$, [3] provides a tighter bound in many cases. Hence, we take the maximum of the lower bound obtained from the ILP model and the lower bound from [3]. Table 1 in Section 4 shows the lower bounds for various TAM widths.
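The area argument for a single bin is easy to illustrate in code. The sketch below covers only the single-speed case and is not the ILP model referred to in the text. The `bottleneck_bound` function is a simplified stand-in for the bound of [3], assumed here to be the largest, over all cores, of the smallest testing time a core can achieve within the bin. Function names and sample rectangles are hypothetical.

```python
def area_lower_bound(rect_sets, bin_height):
    """Sum of minimum rectangle areas divided by the bin height.

    `rect_sets` holds one list of (tam_width, testing_time) pairs per core.
    Raises ValueError if some core has no rectangle that fits in the bin.
    """
    total_area = sum(
        min(w * t for w, t in rects if w <= bin_height) for rects in rect_sets
    )
    return total_area / bin_height


def bottleneck_bound(rect_sets, bin_height):
    """Simplified bottleneck bound: no schedule can finish before the core
    whose best achievable testing time is the largest."""
    return max(
        min(t for w, t in rects if w <= bin_height) for rects in rect_sets
    )


def combined_lower_bound(rect_sets, bin_height):
    """Take the larger (tighter) of the two bounds."""
    return max(
        area_lower_bound(rect_sets, bin_height),
        bottleneck_bound(rect_sets, bin_height),
    )
```

In this toy setting the area bound shrinks as the bin height grows, while the bottleneck bound does not, which mirrors the observation that the bottleneck-based bound tends to be tighter for larger TAM widths.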
Optimization procedure
In this section, we explain the heuristic procedure used to solve the dual-speed TAM optimization problem, which was modeled as a two-bin rectangle packing problem in Section 2. We extend the procedure from [8] to solve the rectangle packing problem concurrently over two bins. In the procedure of [8], tests are scheduled depending on certain preferred TAM widths. When a core completes its test, the TAM wires being used by it are freed and become available for assignment to other cores. The goal is to assign a preferred TAM width to each core, as long as there are enough TAM lines available. In the following paragraphs, we highlight the details of our algorithm that distinguish it from the procedure of [8].

Data structure. The TAM width and the testing time of each core are stored in a data structure, which contains information about the start time, end time, preferred TAM widths, and the frequency of the assigned TAM wires. The data structure is presented in Figure 1. This data structure is updated as the SOC test schedule is developed.

Preferred TAM widths. We first compute a collection of Pareto-optimal rectangles for both bins. Each core has a "preferred low-frequency TAM width" and a "preferred high-frequency TAM width". The preferred TAM widths are computed as a small percentage of the maximum allowable TAM widths. In the procedure from [8], an input parameter $p$ determines the preferred TAM width for each core: it is the TAM width at which the testing time of the core comes within $p$ percent of its testing time at the maximum allowable TAM width. This input parameter is varied from 1 to 10, and the value that results in the best solution is chosen. To account for the requirements of bottleneck cores, an input difference parameter is also chosen: if the testing time can be improved by adding a few TAM wires (bounded by this parameter) to the core, then the preferred TAM width is increased. In our procedure, we use the same value of $p$ to determine the preferred TAM widths for both bins.

Assigning preferred TAM widths to cores. When both high-speed and low-speed TAM wires are available, the core that has the highest testing time and whose preferred TAM width does not exceed the available TAM width is found for each of the high-speed and low-speed bins. Of the two assignments, the one that yields the smaller end time is chosen. If only one type of TAM wire (low-speed or high-speed) is available, the core with the largest testing time whose preferred TAM width does not exceed the available TAM width is chosen. On assigning a core to one of the two bins, the data structure for the core is updated. Figure 2 illustrates some of the variables used in our procedure. Two time pointers are jointly maintained for the two bins. The first keeps track of the earliest time at which TAM lines are available in either of the two bins.

If the available TAM widths are less than the preferred TAM widths of all the cores, there might be idle spaces in the schedule, since the next assignment can be made only when more TAM lines have been freed. These idle spaces may appear in both the high-speed and the low-speed bin. They are minimized by assigning the freed TAM lines to an unscheduled core that will finish its test before more TAM lines are freed and will use up most of the idle time. However, the idle spaces in the two bins have to be minimized independently, since a core cannot be assigned both high-speed and low-speed TAM wires.

Redistribution of lines to fill idle time. If no core can fill the idle time before more TAM lines are freed, then the idle TAM lines can be redistributed among the cores that began their testing at the start of the idle time. The idle TAM lines in the low-speed bin are redistributed only to the cores in that bin; similarly, idle lines in the high-speed bin can be redistributed only among cores in the high-speed bin.
In summary, the proposed heuristic procedure extends the procedure of [8] by allowing more decisions to be made due to the availability of two bins. The time complexity of the procedure is found via experiments to be similar to that of the procedure in [8]. For all the experiments, the CPU time was less than a minute.
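The core assignment rule of this section can be sketched as a small event-driven scheduler. This is a deliberately reduced illustration, not the authors' procedure: it gives each core a single preferred TAM width for both bins, assumes every core may run at the high rate, and omits idle-time filling and the redistribution of idle lines. The names `W`, `V`, and `n` stand for the total TAM width, the number of high-speed wires, and the rate ratio; `schedule` and the sample cores are hypothetical.

```python
import heapq


def schedule(cores, W, V, n):
    """Greedy two-bin scheduling sketch.

    `cores` maps a core name to (preferred_width, low_rate_testing_time).
    Returns (makespan, plan), where plan maps each core to
    (bin, start_time, end_time, tam_width).
    """
    free = {"low": W - V, "high": V}   # idle TAM wires in each bin
    speed = {"low": 1, "high": n}      # testing time is divided by this
    running = []                       # min-heap of (end_time, bin, width)
    pending = dict(cores)
    plan = {}
    now = 0.0
    while pending:
        candidates = []
        for b in ("low", "high"):
            # Longest pending core whose preferred width fits in this bin.
            fitting = [(t, name) for name, (w, t) in pending.items() if w <= free[b]]
            if fitting:
                t, name = max(fitting)
                candidates.append((now + t / speed[b], b, name))
        if candidates:
            # Of the candidate assignments, keep the one that ends earliest.
            end, b, name = min(candidates)
            w, _ = pending.pop(name)
            free[b] -= w
            heapq.heappush(running, (end, b, w))
            plan[name] = (b, now, end, w)
        elif running:
            # Nothing fits: advance to the next completion and free its wires.
            now, b, w = heapq.heappop(running)
            free[b] += w
        else:
            raise ValueError("a preferred TAM width exceeds both bin heights")
    makespan = max(end for _, _, end, _ in plan.values())
    return makespan, plan
```

For four hypothetical cores `{"A": (2, 100.0), "B": (2, 80.0), "C": (2, 60.0), "D": (2, 40.0)}` with `W = 4` and `n = 2`, this sketch finishes at time 100 when `V = 2` and at time 140 when `V = 0`, which illustrates how moving some wires to the high rate can shorten the schedule.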
Experimental results
In this section, we present experimental results on test scheduling and dual-speed TAM optimization for the three largest SOCs (in terms of the number of cores) from the ITC'02 SOC Test Benchmarks [11]. We first study the testing time reduction obtained with high-speed bins of different sizes. The testing time is calculated in μs; the low-speed TAM lines are assumed to be driven at 20 MHz, and the high-speed TAM lines are assumed to be driven at $n$ times that frequency. In Table 1, we present the testing times and lower bounds for various values of the TAM width $W$ and a range of values for $V$, the number of TAM wires driven by the high-speed ATE channels. We also assume in this set of experiments that the high-speed data rate is twice the low-speed data rate, i.e., $n = 2$. The percentage change in testing time is reported relative to the base case in which no high-speed channels are used; in the other cases, a given percentage of the TAM width is made up of high-speed channels. We vary $W$ from 16 to 64 in steps of 8, and we also consider five different values of $V$.
As expected, the reduction in the testing time is in many cases proportional to the number of high-speed TAM wires. In addition, the testing time is often close to the lower bound derived in Section 2. Note, however, that since an increase in the number of high-speed TAM wires leads to a reduction in the number of TAM wires available per core, the decrease in testing time obtained with the dual-speed TAM architecture is not always proportional to the fraction of TAM lines that transport test data at the higher rate. A dual-speed TAM optimization procedure as presented in this paper helps the system integrator determine values of $W$ and $V$ for which the testing time reduction is especially noteworthy. For example, for p93791, the testing time for Core 5 at the low data rate with an available TAM width of 23 bits is 11398.9 μs. At the high data rate ($n = 2$), but with an available TAM width of only 10 bits (due to a smaller bin), the testing time increases to 14026.9 μs.

We make the unexpected observation that, for smaller values of $W$ and $V$, the testing time is sometimes higher for the dual-speed TAM architecture. The smaller sizes of the bins constrain the heuristic procedure to select small TAM widths for the cores; this leads to a higher testing time for the SOC. The reduction in the testing time due to the higher data rate is not sufficient to outweigh the increase in testing time due to smaller TAM widths for the cores.
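The Core 5 figures for p93791 quoted in this section can be checked with a line of arithmetic. The calculation below assumes, as stated in Section 2, that the testing time at the high rate is the low-rate testing time divided by the rate ratio, and it reads the 14026.9 μs figure as a high-rate time with a ratio of 2; the symbols $T_{\mathrm{low}}$ and $T_{\mathrm{high}}$ are introduced here only for this illustration.

```latex
\[
T_{\mathrm{low}}(10) = 2 \times T_{\mathrm{high}}(10) = 2 \times 14026.9 = 28053.8\ \mu\mathrm{s},
\qquad
\frac{T_{\mathrm{low}}(10)}{T_{\mathrm{low}}(23)} = \frac{28053.8}{11398.9} \approx 2.46 > 2 .
\]
```

Under these assumptions, narrowing the TAM from 23 to 10 bits slows the core down by a factor of about 2.46 at a fixed data rate, which is more than the factor of 2 gained from the higher rate, so the net testing time goes up.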
The results for SOC p34392 in Table 1 are especially interesting. This SOC is known to have a bottleneck core, due to which the testing time levels out at 27228.95 μs using a single-speed TAM architecture [8]. The dual-speed architecture allows us to overcome this "lower bound". In Figures 3 and 4, we vary the frequency ratio $n$ from 2 to 5 for p22810 and p34392, keeping $V$ fixed at 50% of $W$. It can be seen that the testing times for the higher speed ratios are close to each other for some TAM widths. This is because, while the testing time for the high-speed bin tends to decrease with an increase in the speed ratio, the testing time for the low-speed bin tends to remain the same. In such situations, the low-speed bin dominates the overall testing time.
An advantage of the dual-speed TAM architecture is that, compared to a single-speed TAM architecture, a desired testing time for the SOC can be achieved with a smaller number of TAM wires. Let $W_s$ ($W_d$) be the SOC-level TAM width for the single-speed (dual-speed) TAM architecture that is required to achieve a desired testing time. The ratio $W_d / W_s$ is sometimes larger than one, which implies that no benefit is obtained with the dual-speed architecture for these cases. Similar results are obtained for the other SOCs.
Finally, in Table 2, we compare the testing time obtained with the dual-speed architecture to that obtained with the virtual TAM architecture of [12]. We find that for a given number of ATE channels, the dual-speed TAM architecture tends to outperform the virtual-TAM architecture for p22810 and p34392. The improvement in the virtual-TAM architecture is more pronounced for larger values of $W$. Note that for p34392, the testing time is determined by the bottleneck core for the virtual-TAM architecture, and it levels off at 27228.95 μs. However, the use of a higher data rate for a subset of TAM wires allows us to reduce the testing time further, without resorting to additional on-chip interconnect for the virtual TAM.
Conclusion
We have shown how SOC testing time can be reduced through a dual-speed TAM architecture that is optimized using rectangle packing. We have also presented a lower bound on the testing time that can be achieved by using a dual-speed architecture. The testing time for the optimized TAM architecture is often close to the predicted lower bound, and the testing time is reduced significantly compared to a single-speed TAM architecture. We are currently extending this work to TAM architectures with more than two data rates. We are also studying the impact of the higher data rate on SOC test power, and developing a power-constrained multi-speed TAM optimization technique.

Table 2. Comparison between virtual TAMs and the proposed dual-speed TAM architecture.
