Recent advances in tester technology have led to automatic test equipment (ATE) that can operate at up to several hundred MHz. However, system-on-chip (SOC) scan chains typically run at lower frequencies (10-50 MHz). The use of high-speed ATE channels to drive slower scan chains leads to an underutilization of resources, thereby resulting in an increase in testing time. We present a new technique to reduce the testing time and test cost by matching high-speed ATE channels to slower scan chains using the concept of virtual test access mechanisms (TAMs). We also present a new TAM optimization framework based on Lagrange multipliers. Experimental results are presented for three industrial circuits from the ITC'02 SOC test benchmarks.
INTRODUCTION
The widespread use of embedded cores in system-on-chip (SOC) design has led to higher chip densities and shorter design cycle times. However, the growing demand for automatic test equipment (ATE) resources during manufacturing test of SOCs has led to a sharp increase in test cost [16]. Test cost for large SOCs can be viewed as consisting of the following components. 1. Explicit test cost (the cost of investing in a new ATE, also known as capital expenditure): complex cores often require expensive ATE resources such as high-frequency channels, high
As a result of rising costs, test is increasingly being viewed as a major bottleneck in SOC design and manufacturing; it is therefore important to reduce both explicit and implicit test cost.
The reduction of explicit test cost requires that an existing, amortized-cost ATE be used instead of investing in a new, expensive ATE. Methods proposed to constrain SOC test requirements to match current ATE capabilities include test data compression [10], response compaction [15], and reduced pin-count test [17]. All of these methods seek to ensure that the SOC test can be handled by the existing ATE. However, current growth trends in SOC functionality and test requirements suggest that future investment in newer, more expensive ATE is inevitable [6].
On the other hand, reduction of implicit test cost requires that once a new, expensive ATE has been purchased, its resources be utilized as efficiently as possible. This mandates that SOC testing times be minimized, so that several SOCs can use the ATE in a short time, and that the high-frequency data channels and pin-count resources of the ATE be properly utilized by each SOC. Methods to increase the efficiency of ATE use include test scheduling, test access mechanism (TAM) optimization, and multi-site test. Test scheduling seeks to obtain an effective ordering of tests applied to the SOC to minimize testing time [2, 9]. TAM optimization is performed to improve test access to embedded cores in a modular test environment [4, 5, 7]. Finally, multi-site test seeks to test several copies of the SOC simultaneously on the ATE, thus reducing testing time across an entire production batch [16]. While these methods increase the efficiency of ATE use, they assume that the ATE always operates at core scan chain frequencies. Scan chains are typically run at frequencies lower than 50 MHz to reduce power consumption and avoid high-frequency scan design. However, recent advancements in tester design have led to ATE that can operate at up to several hundred MHz. The use of such high-frequency ATE channels at low scan chain frequencies severely under-utilizes ATE capability, resulting in an increase in testing time and time-to-market, thereby directly impacting implicit test cost.
In this paper, we present a new technique to reduce implicit test cost by matching ATE channel frequencies to core scan chain frequencies using virtual TAMs. A virtual TAM is an on-chip test data transport mechanism that does not directly correspond to a particular ATE channel. Virtual TAMs operate at scan chain frequencies; however, they interface with the higher-frequency ATE channels using bandwidth matching. Moreover, since the virtual TAM width is not limited by the ATE pin-count, a larger number of TAM wires can be used on the SOC. This significantly increases the utilization of ATE capabilities and provides the SOC with a larger amount of test data in a shorter testing time. We also propose a new method for virtual TAM optimization to improve test data transport from ATE channels to core I/Os. The new method, based on Lagrange multipliers [12], exploits the fact that core testing time is a monotonically non-increasing function of TAM width to effectively partition the set of virtual TAM wires among the cores.
The rest of the paper is organized as follows. In Section 2, we introduce the concept of virtual TAMs. In Section 3, we discuss the use of Lagrange multipliers for TAM width partitioning. In Section 4, we present the new TAM optimization flow using a combination of Lagrange multipliers for TAM width partitioning and a heuristic method for core assignment to TAMs. In Section 5, we present experimental results for benchmark SOCs demonstrating the applicability of our methods. We conclude the paper in Section 6.
VIRTUAL TAMS
Recent advancements in ATE technology have led to a substantial increase in ATE channel frequencies. However, the frequency at which an embedded core can be tested is limited by its scan chain frequency, typically under 50 MHz. Core scan chain frequencies are kept low to meet SOC power constraints and to avoid the design costs of high-frequency scan. The TAMs designed to transport test data to core scan chains, e.g., in [4, 5, 7], are therefore constrained to operate at frequencies far lower than ATE channel capabilities.
This reduces the utilization of ATE resources and increases testing time, thereby increasing the implicit test cost.
The mismatch between ATE capabilities and TAM operating frequencies can be reduced using virtual TAMs based on bandwidth matching [10]. The system TAMs are of two kinds: (i) low-frequency TAMs driven by low-frequency ATE pins, and (ii) high-frequency TAMs driven by high-frequency ATE pins. We apply bandwidth matching at the interface between the high-frequency TAMs, which are driven by high-frequency ATE channels, and the low-frequency virtual TAMs that drive core scan chains; see Figure 1. Virtual TAMs are based on the following relationship between the TAM width and the operating frequency of test data transport mechanisms:
$$W_{ATE} \times f_{ATE} = W_{TAM} \times f_{TAM} \qquad (1)$$
where $W_{ATE}$ and $W_{TAM}$ are the total ATE channel width and the total SOC TAM width, respectively, and $f_{ATE}$ and $f_{TAM}$ are the ATE channel and virtual TAM frequencies, respectively. If bandwidth matching is not used, $W_{TAM}$ equals $W_{ATE}$, and all the low-frequency and high-frequency ATE pins operate at the lower frequency $f_{TAM}$.
In order to minimize the testing time by using the high-frequency ATE pins, yet not violate the scan frequency constraint of the cores, we increase the available TAM width and decrease the frequency of the high-speed TAMs by the same factor n, such that Equation (1) is satisfied. This is illustrated as follows; again, see Figure 1. Given an SOC TAM of $W_{ATE}$ pins (driven by the ATE), of which U pins are driven at the higher frequency $f_{ATE}$ and $(W_{ATE} - U)$ pins are driven at the lower scan frequency $f_{TAM}$, such that $f_{ATE} = n \times f_{TAM}$ using frequency division and bandwidth matching, the following relationship holds:
$$U \times f_{ATE} + (W_{ATE} - U) \times f_{TAM} = \big[nU + (W_{ATE} - U)\big] \times f_{TAM} \qquad (2)$$
Therefore, the total number of pins available to the SOC for core testing, defined as the virtual TAM width W, is given by
$$W = nU + (W_{ATE} - U) = W_{ATE} + (n - 1)\,U \qquad (3)$$
Thus, every ATE pin operating at the higher frequency gives rise to n - 1 additional virtual TAM pins. Virtual TAMs decrease testing time significantly since a larger amount of test data is available to the cores in each scan clock cycle. Moreover, since the serial-in/parallel-out interfaces used for bandwidth matching are placed next to the cores, only the original $W_{ATE}$ TAM wires are routed through the system. Thus, a large number of TAM wires can be obtained with low routing and hardware cost.
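The pin arithmetic above can be checked with a few lines of Python. This is a hedged behavioral sketch, not the paper's hardware: the helper names `virtual_tam_width`, `bandwidth_matches`, and `sipo_demux` are hypothetical, n is assumed to be an integer frequency ratio, and the serial-in/parallel-out interface is modeled simply as a round-robin demultiplexer that hands bit k of every n-bit burst to virtual lane k.

```python
import math


def virtual_tam_width(w_ate: int, u: int, n: int) -> int:
    """Hypothetical helper: virtual TAM width W = W_ATE + (n - 1) * U.

    w_ate: ATE pins at the SOC boundary; u: pins running at f_ATE = n * f_TAM.
    """
    if not 0 <= u <= w_ate:
        raise ValueError("U must lie between 0 and W_ATE")
    if n < 1:
        raise ValueError("n must be at least 1")
    return w_ate + (n - 1) * u


def bandwidth_matches(w_ate: int, u: int, n: int, f_tam_mhz: float) -> bool:
    """Hypothetical check: ATE-side bandwidth equals virtual-TAM-side bandwidth."""
    f_ate_mhz = n * f_tam_mhz
    ate_side = u * f_ate_mhz + (w_ate - u) * f_tam_mhz
    tam_side = virtual_tam_width(w_ate, u, n) * f_tam_mhz
    return math.isclose(ate_side, tam_side)


def sipo_demux(bits: list, n: int) -> list:
    """Hypothetical SIPO model: one high-speed stream becomes n scan-speed lanes.

    Lane k receives every n-th bit starting at offset k, i.e. one bit per scan clock.
    """
    if len(bits) % n:
        raise ValueError("stream length must be a whole number of n-bit bursts")
    return [bits[k::n] for k in range(n)]


print(virtual_tam_width(16, 8, 4))            # 16 pins, 8 of them at 4x speed
print(bandwidth_matches(16, 8, 4, 50.0))
print(sipo_demux([1, 0, 1, 1, 0, 0, 1, 0], 4))
```

With $W_{ATE} = 16$, U = 8, and n = 4, the sketch yields 16 + 3 × 8 = 40 virtual TAM wires, while both sides of the interface carry 2000 Mbit/s at an assumed 50 MHz scan clock.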
LAGRANGE MULTIPLIERS
In this section, we introduce the proposed Lagrange framework for minimizing implicit SOC test cost. Implicit test cost is reflected in the SOC testing time, since testing time directly impacts the ATE time spent per SOC and contributes to test cost in real ($) terms. The SOC testing time is minimized by designing a virtual TAM architecture and optimizing the virtual TAM widths supplied to cores. Here, we first describe a simple TAM optimization problem, and then formulate the general case.
Let B denote the number of TAMs and N the number of cores in the system. Consider an SOC with two TAMs (B = 2) and two cores (N = 2). Let $w_1$ and $w_2$ be the widths of the two TAMs. We assume here that the core assignment to TAMs is determined a priori. (This constraint is relaxed in Section 4, where a method for integrated core assignment and TAM optimization is presented.) Core 1 is tested on TAM 1 and Core 2 is tested on TAM 2. Let the testing time of Core 1 on TAM 1 be denoted by $T_1(w_1)$, and the testing time of Core 2 on TAM 2 be denoted by $T_2(w_2)$. Note that $T_1(w_1)$ and $T_2(w_2)$ are both monotonically non-increasing functions, as shown in [9]. We now solve the following optimization problem: determine the values of $w_1$ and $w_2$ such that (i) $w_1 + w_2 = W$, and (ii) $\max\{T_1(w_1), T_2(w_2)\}$ is minimized, where W denotes the total virtual TAM width available.
We rephrase this problem as the minimization of a Lagrange cost function [12]. Let the Lagrange cost function $J(w_1, w_2)$ be defined as
$$J(w_1, w_2) = \max\{T_1(w_1), T_2(w_2)\} + \lambda\,(w_1 + w_2) \qquad (4)$$
where λ is referred to as the Lagrange multiplier.
The theory of Lagrange multipliers shows that for every W, there exists a Lagrange multiplier λ such that the minimization of $\max\{T_1(w_1), T_2(w_2)\}$ subject to $w_1 + w_2 = W$ is equivalent to the minimization of the right-hand side of Equation (4) [12]. Thus, instead of minimizing $\max\{T_1(w_1), T_2(w_2)\}$ directly, we minimize the cost function of Equation (4). Our goal is to devise an algorithm that determines the values of $w_1$ and $w_2$ such that $J(w_1, w_2)$ is minimized for a given λ.
Next, we investigate the relationship between λ and W. We consider two corner cases to bound the value of W. Case 1. Let us minimize the expression for $J(w_1, w_2)$ in Equation (4) while setting λ to 0. If λ = 0, then $J(w_1, w_2)\big|_{\lambda=0} = \max\{T_1(w_1), T_2(w_2)\}$.
Hence, the penalty term $\lambda(w_1 + w_2)$ vanishes. Now, since both $T_1(w_1)$ and $T_2(w_2)$ are monotonically non-increasing, $J(w_1, w_2)\big|_{\lambda=0}$ is minimized when both $w_1 \to \infty$ and $w_2 \to \infty$. Therefore, if λ is set to 0, J is minimized by selecting a large value of W. Case 2. Next, let us minimize the expression for $J(w_1, w_2)$ while setting λ to a large value, i.e., $\lambda \to \infty$. In this case, from Equation (4), $J(w_1, w_2) \approx \lambda(w_1 + w_2)$. The penalty term thus outweighs the min-max term in Equation (4). Hence, to minimize J when λ is large, a small value of W must be chosen, i.e., $W \to 0$.
From the above two cases, we note that by varying the value of the Lagrange multiplier λ, it is possible to minimize J (and, equivalently, the SOC testing time cost function) for different values of W.
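We next formalize the problem for the general case consisting of $B \ge 2$ TAMs and $N \ge 2$ cores. Recall that the core assignment to TAMs is predetermined. Let the constant $x_{ij} = 1$ ($1 \le i \le N$, $1 \le j \le B$) denote that core i is assigned to TAM j; otherwise, $x_{ij} = 0$. Generalizing Equation (4), we obtain
$$J(w_1, \ldots, w_B) = \max_{1 \le j \le B} \Big\{ \sum_{i=1}^{N} T_i(w_j)\, x_{ij} \Big\} + \lambda \sum_{j=1}^{B} w_j \qquad (5)$$
We minimize J using an iterative descent procedure. In the first iteration, keeping the values of all $w_j$ ($j \ne 1$) constant, we optimize the cost function to determine the optimal value of $w_1$ for this constrained problem instance. Let $w_1^*$ denote the optimal value of $w_1$. We set $w_1^{(1)} = w_1^*$.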
In the second iteration, keeping the values of all $w_j$ ($j \ne 2$) constant, we optimize the cost function to determine the optimal value of $w_2$. In this manner, locally optimal values for $w_1, \ldots, w_B$ are determined. The procedure then repeats to find the next value for $w_1$. The procedure cycles through each value of j, ending when the decrement in the cost function J goes below a given threshold.
An important property of the procedure is that the cost at the end of the nth iteration is always less than or equal to the cost at the end of the (n - 1)th iteration, i.e., $J^{(n)} \le J^{(n-1)}$. We exploit this property to show that the procedure is guaranteed to converge. Since testing times and TAM widths are non-negative, $J^{(n)}$ is bounded from below by 0 for λ ≥ 0, and by the property above it is a monotonically non-increasing function of n. Since a monotonically non-increasing sequence that is bounded from below is guaranteed to converge, the iterative procedure is also guaranteed to converge.
Illustrative Example
We demonstrate the efficiency of the proposed method using a simple illustrative example. Let N = 2 and B = 2 as before. Let Core 1 be tested on TAM 1 and Core 2 on TAM 2. Further, let $T_1(w_1) = 10e^{-w_1}$ and $T_2(w_2) = 10e^{-2w_2}$. Note that both $T_1(w_1)$ and $T_2(w_2)$ are monotonically non-increasing functions. Let λ = 1. We wish to minimize $J(w_1, w_2)$, where
$$J(w_1, w_2) = \max\{10e^{-w_1}, 10e^{-2w_2}\} + (w_1 + w_2) \qquad (6)$$
Let the allowed values of $w_1$ and $w_2$ be constrained such that $1 \le w_1, w_2 \le 10$. A brute-force solution would require the evaluation of J for all 100 possible combinations of $w_1$ and $w_2$. Such a brute-force search in this example gives $w_1^{opt} = 2$, $w_2^{opt} = 1$, and $J^{opt}(2, 1) = 4.3534$. Next, we solve the problem using the proposed procedure. We initialize the TAM width vector to $w_1^{(0)} = w_2^{(0)} = 10$. Since λ = 1, $J^{(0)} = 20.0005$.
In the first iteration, we minimize $J(w_1, w_2)$ by varying only $w_1$, while keeping $w_2 = 10$. The constrained cost function can be expressed as
$$J(w_1, 10) = \max\{10e^{-w_1}, 10e^{-20}\} + w_1 + 10 \qquad (7)$$
Using the bisection search method [3], we find that the value $w_1 = 2$ minimizes the cost function in Equation (7). Thus, $w_1^{(1)} = 2$ and $w_2^{(1)} = 10$. After Iteration 1, $J^{(1)} = 13.3534$. In Iteration 2, we set $w_1$ to $w_1^{(1)} = 2$ and minimize the cost function while varying $w_2$. The new constrained cost function can thus be written as
$$J(2, w_2) = \max\{1.35, 10e^{-2w_2}\} + 2 + w_2 \qquad (8)$$
Here, bisection search [3] yields $w_2 = 1$, and the minimal value of the cost function $J^{(2)}$ equals 4.3534. Next, in Iteration 3, we fix $w_2$ to 1 and vary $w_1$. The solution obtained at the end of Iteration 3 remains unchanged. Thus, we have achieved the optimal values of $w_1$ and $w_2$, namely $w_1 = 2$ and $w_2 = 1$. Recall that this solution is the same as the one we obtained earlier using brute-force search. However, we are able to find the optimal solution in only three iterations using the iterative descent procedure, as compared to 100 evaluations using brute-force search. Moreover, from the theory of Lagrange multipliers, the complexity of the proposed approach is linear in B, whereas that of brute-force search is exponential in B.
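The three iterations of the example can be reproduced with a short Python sketch. It hard-codes the example's $T_1$, $T_2$, λ = 1, and width range 1 to 10. The one-dimensional "bisection search" is our assumption of how [3] is applied: a binary search on the sign of the forward difference f(x + 1) - f(x), which is valid only when the constrained cost is convex in the free width. That holds for the constrained costs of this example, but not in general for staircase-shaped testing-time curves. The names `bisect_min`, `iterative_descent`, and `brute_force` are hypothetical.

```python
import math
from itertools import product

LAM = 1.0             # Lagrange multiplier lambda of the example
W_MIN, W_MAX = 1, 10  # allowed TAM widths


def T1(w: int) -> float:
    return 10 * math.exp(-w)        # testing time of Core 1 on TAM 1


def T2(w: int) -> float:
    return 10 * math.exp(-2 * w)    # testing time of Core 2 on TAM 2


def J(w, lam: float = LAM) -> float:
    """Lagrange cost of Equation (6): max testing time plus lam * total width."""
    w1, w2 = w
    return max(T1(w1), T2(w2)) + lam * (w1 + w2)


def bisect_min(f, lo: int, hi: int) -> int:
    """Hypothetical 1-D search: integer minimizer of f on [lo, hi], assuming f is
    convex there (non-decreasing forward differences). Returns the first x
    with f(x + 1) - f(x) >= 0."""
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid + 1) - f(mid) >= 0:
            hi = mid
        else:
            lo = mid + 1
    return lo


def iterative_descent(w0=(W_MAX, W_MAX), threshold=1e-9, max_iter=100):
    """Optimize one width per iteration (w1, w2, w1, ...) with the other fixed;
    stop when the decrement in J falls below threshold."""
    w = list(w0)
    history = [J(w)]
    for n in range(1, max_iter + 1):
        j = (n - 1) % len(w)        # width optimized in this iteration

        def f(x):
            trial = list(w)
            trial[j] = x
            return J(trial)

        w[j] = bisect_min(f, W_MIN, W_MAX)
        history.append(J(w))
        if history[-2] - history[-1] < threshold:
            break
    return tuple(w), history


def brute_force():
    """Evaluate J on all 100 width pairs and return the best one."""
    return min(product(range(W_MIN, W_MAX + 1), repeat=2), key=J)


w_opt, history = iterative_descent()
print(w_opt, [round(c, 4) for c in history])  # (2, 1) [20.0005, 13.3534, 4.3534, 4.3534]
print(brute_force())                          # (2, 1)
```

The cost history matches $J^{(0)}$ through $J^{(3)}$ in the text; the third iteration produces no decrement, which is what terminates the loop.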
In our experiments, we have found that in order to find partitions for TAM widths varying from 8 to 160, the λ values need to vary from 10,000 to 1. For example, for the SOC benchmark circuit p22810, a λ value of 10,000 yields a TAM partition for a TAM width of 8. Since W varies inversely and monotonically with λ, we use a bisection search over all possible values of λ to arrive at a solution for a given TAM width.
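The inverse relation between λ and W, and the bisection over λ used to reach a target TAM width, can be illustrated on the same two-core example. This sketch is hedged: it finds the exact Lagrange minimizer for each λ by exhaustive search (feasible only because the example grid is 10 × 10), bisects λ geometrically between assumed bounds of 1e-4 and 1e4, and returns the partition for the smallest λ whose total width does not exceed the target. Because W(λ) is a step function, some target widths are produced by no λ at all; the sketch then returns a partition with total width at most the target. The names `lagrange_partition` and `partition_for_width` are hypothetical.

```python
import math
from itertools import product

W_MIN, W_MAX = 1, 10


def T1(w: int) -> float:
    return 10 * math.exp(-w)


def T2(w: int) -> float:
    return 10 * math.exp(-2 * w)


def lagrange_partition(lam: float) -> tuple:
    """Hypothetical helper: exact minimizer of max{T1(w1), T2(w2)} + lam*(w1 + w2)
    over the example grid, found by exhaustive search."""
    return min(product(range(W_MIN, W_MAX + 1), repeat=2),
               key=lambda w: max(T1(w[0]), T2(w[1])) + lam * (w[0] + w[1]))


def partition_for_width(target: int, lam_lo: float = 1e-4, lam_hi: float = 1e4,
                        steps: int = 40):
    """Hypothetical helper: bisect lambda (geometric midpoint) for the smallest
    lambda whose minimizer uses a total width of at most `target`."""
    if not sum(lagrange_partition(lam_lo)) > target >= sum(lagrange_partition(lam_hi)):
        raise ValueError("target width is not bracketed by [lam_lo, lam_hi]")
    for _ in range(steps):
        mid = math.sqrt(lam_lo * lam_hi)
        if sum(lagrange_partition(mid)) > target:
            lam_lo = mid    # partition too wide: the width penalty must grow
        else:
            lam_hi = mid    # partition fits: try a smaller penalty
    return lagrange_partition(lam_hi), lam_hi


# The total width chosen by the Lagrange minimizer shrinks as lambda grows.
for lam in (0.001, 0.01, 0.1, 1.0, 10.0):
    print(lam, lagrange_partition(lam))

print(partition_for_width(3))
```

The monotone sweep is what makes the bisection valid: for exact minimizers, a larger λ can never select a larger total width.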
TAM OPTIMIZATION AND CORE ASSIGNMENT
In the previous section, we used Lagrange optimization to determine an optimal partition of TAM widths among cores when the core assignment to TAMs is known. In this section, we solve the more general problem of optimizing core assignments and TAM widths jointly. This problem is equivalent to the general TAM optimization problem $\mathcal{P}_{NPAW}$ formulated in [7]. Here, we first restate the problem formulation from [7], and then present a method based on the Lagrange optimization procedure of Section 3 to solve $\mathcal{P}_{NPAW}$.
Problem $\mathcal{P}_{NPAW}$: Given an SOC having N cores and a total TAM width W, determine the number of TAMs, a partition of W among the TAMs, an assignment of cores to TAMs, and a wrapper design for each core, such that the total testing time is minimized.
Problem $\mathcal{P}_{NPAW}$ was shown to be NP-hard in [7].
We use the method of alternating projections [12] to iterate between the Lagrange optimization procedure and a heuristic algorithm for core assignment [8], whose cost function is again the SOC testing time. First, the Lagrange optimization procedure is used to obtain a TAM width partition that minimizes the testing time for the SOC (based on an initial ad hoc core assignment). This width partition is then input to the core assignment algorithm [8], and cores are re-assigned to TAMs. After this step, the new assignment is fed as input to the Lagrange optimization procedure and the process is repeated. The Lagrange optimization procedure and the core assignment algorithm are run alternately until the SOC testing time converges to a fixed value. Figure 2 illustrates the alternating procedure for core assignment and Lagrange width partition optimization. The wrapper design algorithm from [7] is used to optimize core wrappers for the SOC.
From the wrapper design procedure, we obtain the testing time $T_i(k)$ of each core i for TAM width k ($1 \le k \le w_{max}$), where $w_{max}$ is the upper limit on the TAM width supplied to the wrapper design algorithm. The core testing times are then input to the core assignment algorithm [8], and cores are assigned to TAMs based on an initial ad hoc TAM width partition in which the width of each TAM is set to $w_{max}$. After the core assignment is performed, the Lagrange optimization procedure determines the new expression for the cost function J, and a TAM partition that minimizes this cost function is obtained. The new TAM width partition is input to the core assignment algorithm, and the process repeats until the testing time converges. Convergence is achieved when the decrement in the testing time is less than a threshold value ε. In our experiments, we set ε to 3 clock cycles.
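The alternating structure of Figure 2 can be mimicked with a toy model. Everything here is a labeled stand-in: the testing-time tables $T_i(k) = p_i \lceil c_i / k \rceil$ are hypothetical (the paper obtains them from the wrapper design algorithm of [7]), the largest-first greedy `assign_cores` is a hypothetical substitute for the core assignment heuristic of [8], and each width is updated by exhaustive one-dimensional search because staircase testing-time curves need not be unimodal. Unlike the paper's flow, which also holds W constant, this sketch fixes only λ, so it monitors J rather than the raw testing time; a re-assignment is kept only if it lowers J, which makes every round a descent step and guarantees termination.

```python
import math


def make_times(patterns: int, cells: int, w_max: int) -> list:
    """Hypothetical testing-time table T_i(k) for k = 1..w_max (index 0 unused)."""
    return [math.inf] + [patterns * math.ceil(cells / k) for k in range(1, w_max + 1)]


def soc_time(T, widths, assign) -> float:
    """Testing time max_j sum_i T_i(w_j) x_ij, where assign[i] is the TAM of core i."""
    load = [0] * len(widths)
    for i, j in enumerate(assign):
        load[j] += T[i][widths[j]]
    return max(load)


def cost(T, widths, assign, lam) -> float:
    """Lagrange cost J: testing time plus lam * total TAM width."""
    return soc_time(T, widths, assign) + lam * sum(widths)


def assign_cores(T, widths) -> list:
    """Hypothetical stand-in for the heuristic of [8]: largest core first, each
    placed on the TAM whose finishing time grows least."""
    load = [0] * len(widths)
    assign = [0] * len(T)
    order = sorted(range(len(T)), key=lambda i: -max(T[i][w] for w in widths))
    for i in order:
        j = min(range(len(widths)), key=lambda b: load[b] + T[i][widths[b]])
        assign[i] = j
        load[j] += T[i][widths[j]]
    return assign


def partition_widths(T, widths, assign, lam, w_max) -> list:
    """Coordinate descent on J with the assignment fixed (exhaustive 1-D search)."""
    widths = list(widths)
    while True:
        before = cost(T, widths, assign, lam)
        for j in range(len(widths)):
            widths[j] = min(range(1, w_max + 1),
                            key=lambda k: cost(T, widths[:j] + [k] + widths[j + 1:],
                                               assign, lam))
        if cost(T, widths, assign, lam) >= before:
            return widths


def alternate(T, B, lam, w_max, eps=3):
    """Alternate width partitioning and core re-assignment until J improves by < eps."""
    widths = [w_max] * B              # ad hoc initial partition: every TAM at w_max
    assign = assign_cores(T, widths)
    history = [cost(T, widths, assign, lam)]
    while True:
        widths = partition_widths(T, widths, assign, lam, w_max)
        candidate = assign_cores(T, widths)
        if cost(T, widths, candidate, lam) < cost(T, widths, assign, lam):
            assign = candidate        # keep a re-assignment only if it lowers J
        history.append(cost(T, widths, assign, lam))
        if history[-2] - history[-1] < eps:
            return widths, assign, history


cores = [(100, 500), (200, 300), (50, 1200), (80, 800)]  # hypothetical (patterns, cells)
W_MAX = 16
T = [make_times(p, c, W_MAX) for p, c in cores]
widths, assign, history = alternate(T, B=2, lam=100.0, w_max=W_MAX)
print(widths, assign, soc_time(T, widths, assign), history)
```

Because both half-steps can only keep or lower J, the sequence of costs is non-increasing and bounded below, which mirrors the convergence argument the text borrows from Section 3.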
Recall from Equation (4) that the cost function for the Lagrange optimization problem is, in the general case, $J = \max_j \{\sum_i T_i(w_j)\, x_{ij}\} + \lambda \sum_j w_j$. The cost function (SOC testing time) for the core assignment algorithm of [8] used in the proposed method is given as $\mathcal{T} = \max_j \{\sum_i T_i(w_j)\, x_{ij}\}$. It is therefore interesting to note that the two cost functions differ only by the term $\lambda \sum_j w_j = \lambda W$, which is constant because the values of λ and W remain constant during an execution of the procedure illustrated in Figure 2. Hence, the testing time converges at a quicker rate than if the Lagrange procedure were run with no alternating core re-assignment step. The procedure in Figure 2 is once again an iterative descent procedure; each Lagrange iteration and each core assignment iteration guarantees that the testing time does not increase. The proof of convergence for this procedure is therefore similar to that given in Section 3 for the Lagrange procedure.
Table 1: Efficiency of the Lagrange procedure for B = 6 and 8.
In the absence of an analytical expression for the number of iterations required to arrive at a solution, we demonstrate the efficiency of the proposed procedure empirically. In Table 1 Since both Partition-evaluate and the Lagrange procedure use the same algorithm for core assignment [8], the overall improvement in TAM optimization using the Lagrange procedure is due solely to the new TAM partitioning algorithm. Hence, the performance of the Lagrange procedure does not deteriorate with increasing W, which is not the case for Partition-evaluate [8]. This is especially critical when virtual TAMs are designed, since the total virtual TAM width for a high-performance ATE can be very large. For large TAM widths, the computation time of [8] is on the order of minutes, whereas the proposed approach requires only a few seconds.
EXPERIMENTAL RESULTS
In this section, we present experimental results on core assignment and TAM optimization using virtual TAMs. We demonstrate that the SOC testing time, and therefore the implicit test cost, can be significantly reduced using virtual TAMs. Experimental results are presented for three benchmark SOCs from the ITC'02 SOC Test Benchmarks suite [14].
In Table 2, we present results on the testing times obtained for different values of TAM width using virtual TAMs. The testing time is measured in terms of the number of scan clock cycles. The total number of high-frequency and low-frequency ATE pins used for test is denoted by $W_{ATE}$; therefore, the real TAM width at the SOC boundary is $W_{ATE}$. Of the $W_{ATE}$ pins, U are high-frequency pins and $(W_{ATE} - U)$ are low-frequency pins. The U high-frequency pins are assumed to be capable of operating at four times the frequency of the $(W_{ATE} - U)$ low-frequency pins, which operate at the lower scan chain frequency (i.e., n = 4). Therefore, from Equation (3), the number of virtual TAM pins available to cores is given by $W = W_{ATE} + 3U$. The value of $W_{ATE}$ is varied from 16 to 64 for each benchmark SOC. For each SOC, we perform two sets of experiments, setting (i) U = v, and (ii) U = *.
Testing time results are obtained for both these cases. We denote by $T_{lagr}$ the testing time obtained using Lagrange optimization when no virtual TAMs are used. This follows the TAM design methods proposed in [4, 7, 8, 9], where the entire TAM width of $W_{ATE}$ was assumed to operate at the lower scan chain frequency, and only $W_{ATE}$ TAM wires are partitioned among the cores. We denote by $T_{virt}$ the testing time obtained using Lagrange optimization and virtual TAMs. The lower bounds $LB_T$ on testing time for the W virtual TAM wires are also presented; these bounds are derived from the formulas presented in [1, 4]. The percentage decrease ΔT in SOC testing time obtained using virtual TAMs is presented for each value of $W_{ATE}$ for the three benchmark SOCs.
The value of ΔT is calculated as
$$\Delta T = \frac{T_{lagr} - T_{virt}}{T_{lagr}} \times 100$$
For p22810, we obtain a decrease of as much as 47.7% in testing time. In SOC p34392, one of the cores (Core 18) is a bottleneck core, as a result of which the testing time reaches the lower bound value of 544579 clock cycles for all TAM widths larger than 32. This property of Core 18 in SOC p34392 was reported in [9]. Using virtual TAMs, it is possible to achieve the lower bound of 544579 cycles with $W_{ATE} = 16$.
The testing time results for p93791 show an improvement of as much as 58.6% over the testing times obtained without virtual TAMs, even when only 8 of the 16 pins run at the higher frequency. This represents a significant reduction in implicit test cost. Lower testing times and ATE pin-count requirements for each SOC facilitate greater utilization of the ATE and provide larger returns on the ATE investment.
In Table 3, we compare our results with four recent TAM optimization approaches [4, 7, 8, 9]. In [7], the authors optimized a test bus architecture using a combination of integer linear programming (ILP) and exhaustive enumeration. The work in [7] was later improved in [8] to include a heuristic method for core assignment.
This heuristic core assignment approach forms a part of the TAM optimization method presented in this paper. In [9], the authors presented a method to integrate TAM design and test scheduling using rectangle packing. Finally, in [4], the authors presented a heuristic algorithm, TR-Architect, for TestRail optimization. In Column 3 of Table 3, we also list the lower bound values on testing time for the benchmark SOCs calculated in [4]. Note that the testing times presented for the proposed Lagrange optimization approach in the last column of Table 3 do not assume virtual TAMs; this ensures a fair comparison with the approaches in [4, 7, 8, 9]. The results obtained for the proposed approach compare most closely to those of the Partition-evaluate algorithm [8], since the two methods use the same heuristic for core assignment. The CPU time taken by the method in [8] is at most a few hundred seconds, while the proposed Lagrange procedure usually takes about half as long. This is because, as shown in Section 3, the Lagrange procedure partitions TAM widths more efficiently than the partitioning approach used in Partition-evaluate. The rectangle packing [9] and TR-Architect [4] algorithms appear to be the most efficient in terms of execution time, taking at most 10 seconds to complete. The ILP/enumeration algorithm [7] requires prohibitively large execution times (from several minutes to hours), depending on the SOC complexity.
CONCLUSION
We have presented a new technique to reduce testing time and test cost for core-based SOCs by increasing test resource utilization. The proposed approach, which is based on the concept of virtual TAMs, allows high-speed ATE channels to drive slower scan chains at their maximum rated frequencies. We have shown that even though virtual TAMs operate at scan-chain speeds, they can be interfaced to high-speed ATE channels using bandwidth matching. In this way, the number of on-chip TAM wires is not limited by the number of available pins on the SOC; this allows better utilization of high-speed ATE channels and reduces testing time. We have also presented a new TAM optimization framework based on Lagrange multipliers. Experimental results for three industrial SOCs from the ITC'02 SOC test benchmarks demonstrate the effectiveness of the proposed approach.
