Abstract: Wafer-level testing (wafer sort) is used in the semiconductor industry to reduce packaging and test cost. However, a large number of wafer probe contacts leads to higher yield loss. Therefore, it is desirable that the number of chip pins contacted by tester channels during wafer sort be kept small to reduce the yield loss resulting from improper contacts. Since test time and the number of contacted chip pins are major practical constraints for wafer sort, not all scan-based digital tests can be applied to the die-under-test. We propose an optimization framework that addresses test access mechanism (TAM) optimization and test-length selection for wafer-level testing of core-based digital SoCs. The objective here is to design a TAM architecture and determine test-lengths for the embedded cores such that the overall SoC defect screening probability at wafer sort is maximized. Defect probabilities for the embedded cores, obtained using statistical yield modeling, are incorporated in the optimization framework. Simulation results are presented for five of the ITC'02 SoC Test benchmarks.
I. INTRODUCTION
The consumer electronics market is characterized by low product cost and decreasing profit margins. Rapid advances in process technology and design tools have led to system-on-chip (SoC) integrated circuits that reduce the design cycle time. Short design cycle times are achieved by integrating a number of pre-designed and pre-verified embedded cores into an SoC. While the testing of such core-based SoCs continues to be a major concern in the semiconductor industry [1] , [2] , a number of efficient solutions have recently been proposed for test access mechanism (TAM) optimization and test scheduling [3] - [9] . The design of efficient TAM architectures and SoC test schedules are important problems that need to be addressed during system integration.
Test and packaging costs are major contributors to the overall product cost for an SoC [10]. Wafer-level testing, also referred to as wafer sort, is used to screen defective dies prior to packaging, thereby reducing packaging cost and test time for the packaged integrated circuits (ICs) [11]-[13]. However, a large number of wafer probe contacts leads to higher yield loss. Therefore, the number of chip pins contacted by tester channels during wafer sort is deliberately kept small to reduce the yield loss that results from improper contacts [14]. Since test time and the number of contacted chip pins are major practical constraints for wafer sort, not all scan-based digital tests can be applied to the die-under-test.
Reduced pin-count testing (RPCT) has been advocated as a design-for-test technique, especially for use at wafer sort, to reduce the number of IC pins that need to be contacted by the tester [14]-[17]. RPCT reduces the cost of test by enabling the reuse of old testers with limited channel availability. It also reduces the number of probe points required during wafer test; this translates to lower test cost, as well as fewer yield-loss problems arising from poor contact with the wafer probe.
Test cost for SoCs can also be reduced by using estimated defect probabilities for the embedded cores to guide test scheduling [18], [19]. These defect probabilities determine the order in which the embedded cores in the SoC are tested, as well as the subsets of cores that are tested concurrently. The defect probabilities for the cores were assumed in [18] to be either known a priori or obtained by binning the failure information for each individual core over the product cycle [19]. In practice, however, short product cycles make defect estimation based on failure binning difficult. Moreover, defect probabilities for a given technology node are not necessarily the same for the next (smaller) technology node. Recently, a wafer-level defect screening technique for core-based SoCs was presented in [20]. This approach combines statistical yield modeling, to determine the defect probability for each core in the SoC, with integer linear programming for optimization.
In this paper, we present an optimization framework that addresses TAM optimization and test-length selection for wafer-level testing of core-based digital SoCs. The objective here is to design a TAM architecture for wafer sort that utilizes a pre-designed underlying TAM architecture for package test, and determine test-lengths for the embedded cores such that the overall SoC defect screening probability at wafer sort is maximized. Defect probabilities for the embedded cores, obtained using statistical yield modeling, are incorporated in the optimization framework. The proposed method reduces packaging cost and the subsequent test time for the IC lot, while efficiently utilizing available tester bandwidth at wafer sort. While an optimal test access architecture and test schedule can also be developed for wafer sort, we assume that these test planning problems are best tackled for package test, simply because the package test time is higher.
We present two techniques for test-length selection and TAM optimization. The first technique is based on a nonlinear integer programming model, which is subsequently linearized and solved using standard integer linear programming (ILP) tools. While this approach provides a thorough understanding of the optimization problem, it does not appear to scale to large SoCs. We therefore describe a second method that enumerates all valid TAM partitions and then uses the ILP model presented in [20] to derive test-lengths that maximize defect screening at wafer sort. This enumerative procedure searches a large solution space efficiently, and it requires significantly less computation time than the first method. Simulation results on TAM optimization and test-length selection are presented for five of the ITC'02 SoC Test benchmarks [21].
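The remainder of the paper is organized as follows. In Section II, we briefly describe how the defect screening probabilities of the cores and the SoC are determined using the approach presented in [20]. Section III formulates the test-length selection and TAM optimization problem for RPCT, presents an ILP model for it, and reports experimental results. Section IV presents an enumeration-based alternative for larger SoCs, and Section V concludes the paper.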
II. DEFECT SCREENING PROBABILITY
In this section we briefly describe how the defect screening probability for the SoC can be determined using the method presented in [20]. The defect probabilities for the embedded cores are obtained using the yield modeling technique presented in [20]. We define the following statistical events for Core i:
$A_i$: the event that the core has a fault; the probability associated with this event is determined from the statistical yield model described in [20].
$B_i$: the event that the tests applied to Core i do not produce an incorrect response.
$\bar{A}_i$ and $\bar{B}_i$ represent the events complementary to $A_i$ and $B_i$, respectively. Two important conditional probabilities associated with these events are the yield loss and the test escape, denoted by $P(\bar{B}_i \mid \bar{A}_i)$ and $P(B_i \mid A_i)$, respectively. Using the law of total probability, we can derive the probability that the test applied to Core i produces an incorrect response, i.e., flags the core as defective:
$$P(\bar{B}_i) = P(\bar{B}_i \mid A_i)\,P(A_i) + P(\bar{B}_i \mid \bar{A}_i)\,P(\bar{A}_i) \tag{1}$$
Due to SoC test time and TAM width constraints during wafer-level testing, only a subset of the pattern set can be applied to any Core i. That is, if the complete test suite for the SoC contains $p_i$ scan patterns for Core i, only $p_i^* \le p_i$ patterns can actually be applied to it during wafer sort. Let $fc_i(p_i^*)$ be the fault coverage for Core i with $p_i^*$ test patterns. Let us now assume that the yield loss is $\gamma_i$, the test escape is $\beta_i$, and the probability that Core i has a defect is $\theta_i$. Using these variables, we can rewrite Equation (1) as:
$$P(\bar{B}_i) = \theta_i\,(1 - \beta_i) + (1 - \theta_i)\,\gamma_i \tag{2}$$
Similarly, we can rewrite $P(B_i)$ as follows:
$$P(B_i) = \theta_i\,\beta_i + (1 - \theta_i)(1 - \gamma_i) \tag{3}$$
The defect screening probability $P_S$ for an SoC with N embedded cores is given by
$$P_S = 1 - \prod_{i=1}^{N} P(B_i)$$
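The per-core probabilities $P(\bar{B}_i) = \theta_i(1 - \beta_i) + (1 - \theta_i)\gamma_i$ and $P(B_i) = \theta_i\beta_i + (1 - \theta_i)(1 - \gamma_i)$, and an SoC-level screening probability built from them, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function names are hypothetical, and the $(\theta_i, \beta_i, \gamma_i)$ values in the usage lines are made up. The SoC-level product $1 - \prod_i P(B_i)$ also assumes that the cores pass or fail their tests independently.

```python
import math


def core_fail_probability(theta: float, beta: float, gamma: float) -> float:
    """P(not B_i): the test applied to core i produces an incorrect response."""
    # A defective core (prob. theta) is caught unless it escapes (prob. beta);
    # a good core (prob. 1 - theta) is wrongly failed with yield-loss prob. gamma.
    return theta * (1.0 - beta) + (1.0 - theta) * gamma


def core_pass_probability(theta: float, beta: float, gamma: float) -> float:
    """P(B_i): the test applied to core i does not produce an incorrect response."""
    return theta * beta + (1.0 - theta) * (1.0 - gamma)


def soc_screening_probability(cores) -> float:
    """1 - prod_i P(B_i); assumes independent cores (hypothetical helper)."""
    return 1.0 - math.prod(core_pass_probability(*c) for c in cores)


# Hypothetical (theta_i, beta_i, gamma_i) triples for two cores.
cores = [(0.05, 0.1, 0.0), (0.02, 0.2, 0.0)]
print(round(soc_screening_probability(cores), 5))  # 0.06028
```

With zero yield loss, each core's pass probability reduces to $1 - \theta_i(1 - \beta_i)$, so lowering the test escape $\beta_i$ (applying more patterns) directly raises the screening probability.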
III. REDUCED PIN-COUNT TEST-LENGTH AND TAM OPTIMIZATION PROBLEM
In practice, the TAM bitwidth used at wafer sort is considerably smaller than the maximum TAM bitwidth available for package test. This is because of the likelihood of yield loss at wafer sort due to improper touchdowns and probe-pin contacts by the wafer probe [11]-[13]. RPCT methods are therefore desirable for wafer sort. In [20], it is assumed that the TAM architecture is fixed and optimized for package test, and the same architecture is used for wafer-level test optimization. Therefore, [20] makes the unrealistic assumption that the TAM bitwidth at wafer sort is the same as that for package test. We next formulate the test-length selection and TAM optimization problem for RPCT.
A. Test data serialization
Suppose Core i is accessed from the SoC pins for package test using a TAM of width $w_i$ bits. Let us assume that for RPCT-based wafer sort, the TAM width for Core i is constrained to be $w_i^*$ bits, where $w_i^* < w_i$. In order to access Core i using only $w_i^*$ bits for wafer sort, the pre-designed TAM architecture for package test must be modified. Fig. 1(a) shows a wrapped core that is connected to a 4-bit-wide TAM ($w_i = 4$). For the same wrapped core, Fig. 1(b) outlines a modified test access design that allows RPCT-based wafer-level test with $w_i^* = 2$. For wafer sort in this example, the lines TAMout[0] and TAMout[2] are not used. To provide efficient test access for wafer sort, serial-to-parallel conversion of the test data stream is necessary at the wrapper inputs of the core. A similar parallel-to-serial conversion is necessary at the wrapper outputs of the core. Boundary input cells BIC[0], ..., BIC[3] and boundary output cells BOC[0], ..., BOC[3], which can operate in both a parallel-load and a serial-shift mode, are added at the I/Os of the wrapped core. Multiplexers are added on the input side of the core to enable the use of a smaller number of TAM lines for wafer sort. A global select signal PT/WS chooses either the package test mode (PT/WS = 0) or the wafer sort mode (PT/WS = 1). On the output side, multiplexers are not needed; the test response can be serially shifted out to the TAM while the next pattern is serially shifted into the boundary input cells. Note that the above design is fully compliant with the IEEE 1500 standard [22] because no modifications are made to the standard wrapper cells.
We next explain how the test time for Core i is affected by the serialization process. Let $T_i(j)$ be the total testing time (in clock cycles) for Core i if it is placed on TAM partition j of the SoC, and let $w_i(j)$ be the width of TAM partition j in the pre-designed TAM architecture. At the wafer level, if only $w_i^*(j)$ bits are available for TAM partition j, we assume, as in [23] for hierarchical SoC testing, that the $w_i(j)$ lines are distributed equally into $w_i^*(j)$ parts. Thus the wafer-level testing time for Core i on TAM partition j equals
$$T_i^*(j) = \frac{w_i(j)}{w_i^*(j)} \cdot T_i(j)$$
clock cycles. In the example of Fig. 1(b), the test time for Core i after serialization is $T_i^*(j) = T_i(j) \cdot (4/2) = 2\,T_i(j)$. Note that other TAM serialization methods can also be used for wafer sort. While TAM serialization could be integrated into the overall optimization problem, it is not considered here for the sake of simplicity.
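The serialization rule scales the package-test time by the width ratio $w_i(j)/w_i^*(j)$. The sketch below is a hypothetical helper that applies the rule as stated; the function name and the sample numbers are illustrative only.

```python
def serialized_test_time(package_time: float, w_package: int, w_wafer: int) -> float:
    """Wafer-sort test time T*_i(j) = (w_i(j) / w*_i(j)) * T_i(j)."""
    if not 1 <= w_wafer <= w_package:
        raise ValueError("wafer-sort width must satisfy 1 <= w* <= w")
    # The w package-test lines are assumed to be split equally over the
    # w* wafer-sort lines, so each wafer-sort line shifts w/w* times as much data.
    return package_time * w_package / w_wafer


# Fig. 1(b): a 4-bit TAM folded onto 2 lines doubles the test time.
print(serialized_test_time(1000, 4, 2))  # 2000.0
```

When $w_i(j)$ is not a multiple of $w_i^*(j)$, an exactly equal split is not possible. The most heavily loaded wafer-sort line then carries $\lceil w_i(j)/w_i^*(j) \rceil$ of the original lines, so the plain ratio may underestimate the serialized time.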
B. Test-length and TAM optimization problem: $P_{TLTWS}$
Let the upper limit on the test time for an SoC at wafer sort be Tmax (clock cycles). This upper limit on the scan test time at wafer sort is expected to be a fraction of the scan test time TSoC (clock cycles) for package test, as determined by the TAM architecture and test schedule. We assume a fixed-width TAM architecture as in [5] , where the top-level TAM is divided into several TAM partitions. This architecture implies that the total test time on each TAM partition must not exceed Tmax.
If the internal details of the embedded cores are available to the system integrator, fault simulation can be used to determine the fault coverage for various values of $p_i^*$, i.e., the number of patterns applied to the cores during wafer sort. Otherwise, we model the relationship between fault coverage and the number of patterns with a saturating (logarithmic) function. It is well known in the testing literature that the fault coverage for stuck-at faults increases rapidly at first as the pattern count increases, but flattens out as more patterns are applied to the circuit under test [24], [25]. In our work, without loss of generality, we use the normalized function
$$fc_i(p_i^*) = \frac{\log_{10}(p_i^* + 1)}{\log_{10} p_i}$$
to represent this relationship. A similar relationship was used in [25]. We have verified that this empirical relationship matches the "fault coverage curve" for the ISCAS benchmark circuits. Let $\epsilon_i(p_i^*)$ be the defect-escape probability for Core i when $p_i^*$ patterns are applied to it. This probability can be obtained using Equation (3) as a function of the test escape $\beta_i$ and the probability $\theta_i$ that the core is faulty. The value of $\theta_i$ for each core in the SoC is obtained using the procedure described in [20]. Let us now consider an SoC with a top-level TAM width of W bits, and suppose it has B TAM partitions of widths $w_1, w_2, \ldots, w_B$, respectively. For a given wafer-level (maximum) TAM width $W^*$, we need to determine TAM sub-partitions of widths $w_1^*, w_2^*, \ldots, w_B^*$ such that $w_i^* \le w_i$, $1 \le i \le B$, and $\sum_{i=1}^{B} w_i^* = W^*$.
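A minimal Python sketch of the normalized fault-coverage model follows. The function name is hypothetical. As written, the formula evaluates to slightly more than 1 at $p_i^* = p_i$, because $\log_{10}(p_i + 1) > \log_{10} p_i$. The sketch therefore clamps the result to 1; that clamp is our assumption, not part of the original model.

```python
import math


def fault_coverage(p_star: int, p_total: int) -> float:
    """Normalized model fc_i(p*) = log10(p* + 1) / log10(p_i), clamped to 1."""
    if p_total < 2:
        raise ValueError("the model needs p_i >= 2 so that log10(p_i) > 0")
    if not 0 <= p_star <= p_total:
        raise ValueError("p* must lie in [0, p_i]")
    return min(1.0, math.log10(p_star + 1) / math.log10(p_total))


# Coverage rises steeply at first and then flattens (p_i = 99 patterns).
for p in (0, 9, 49, 99):
    print(p, round(fault_coverage(p, 99), 3))
```

In this model, 9 of the 99 patterns (about 9%) already yield a normalized coverage of roughly 0.5. This steep initial rise is what makes truncated test sets attractive at wafer sort.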
The optimization problem $P_{TLTWS}$ can now be formally stated as follows. Problem $P_{TLTWS}$: Given a pre-designed TAM architecture for a core-based SoC, the defect probabilities for each core in the SoC, the maximum available test bandwidth $W^*$ at wafer sort, and the upper limit $T_{max}$ on the test time for the SoC at wafer sort, determine (i) the total number of test patterns to be applied to each core and (ii) the reduction in TAM width for each partition such that: (a) the overall testing time on each TAM partition does not exceed the upper bound $T_{max}$, and (b) the defect screening probability $P_S$ for the SoC is maximized.
The objective function for the optimization problem is as follows:
$$\text{Maximize } Y = \prod_{i=1}^{N} \left(1 - \epsilon_i^*\right)$$
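where N is the number of cores in the SoC. We next introduce the binary indicator variable $\delta_{ij}$, $1 \le i \le N$, $1 \le j \le p_i$, which ensures that exactly one test-length is selected for each core. It is defined as follows:
$$\delta_{ij} = \begin{cases} 1 & \text{if } p_i^* = j \\ 0 & \text{otherwise} \end{cases}$$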
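where $\sum_{j=1}^{p_i} \delta_{ij} = 1$. The defect-escape probability $\epsilon_i^*$ for Core i is given by $\epsilon_i^* = \sum_{j=1}^{p_i} \delta_{ij}\,\epsilon_i(j)$. We next reformulate the objective function to make it more amenable to further analysis. Let $F = \ln(Y)$. We therefore get:
$$F = \sum_{i=1}^{N} \ln\left(1 - \sum_{j=1}^{p_i} \delta_{ij}\,\epsilon_i(j)\right)$$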
We next use the Taylor series expansion $\ln(1 - x) = -(x + x^2/2 + x^3/3 + \cdots)$ and ignore the second- and higher-order terms [26]. This approximation is justified if the defect-escape probability for each core is much smaller than one. While this is usually the case, the defect-escape probability is occasionally large; in such cases, the optimality claim is valid only in a limited sense. The simplified objective function is given by:
$$\text{Maximize } F \approx -\sum_{i=1}^{N} \sum_{j=1}^{p_i} \delta_{ij}\,\epsilon_i(j)$$
In other words, the objective function can be stated as
$$\text{Minimize } \sum_{i=1}^{N} \sum_{j=1}^{p_i} \delta_{ij}\,\epsilon_i(j)$$
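The effect of dropping the higher-order terms can be checked numerically. This sketch assumes that the objective Y is the product of $(1 - \epsilon_i^*)$ over the N cores, which is the form implied by applying $\ln(1 - x)$ core by core. The escape values are hypothetical. Because $\ln(1 - x) \le -x$ for $0 \le x < 1$, the linearized objective never falls below the exact one, and the gap grows with the escape probabilities.

```python
import math


def exact_objective(escapes):
    """F = ln(Y) = sum_i ln(1 - eps_i*), with Y = prod_i (1 - eps_i*)."""
    # math.log1p(-e) computes ln(1 - e) accurately for small e.
    return sum(math.log1p(-e) for e in escapes)


def linearized_objective(escapes):
    """First-order Taylor approximation: F ~= -sum_i eps_i*."""
    return -sum(escapes)


small = [0.01, 0.02, 0.005]  # hypothetical per-core escape probabilities
large = [0.3, 0.4]
for eps in (small, large):
    print(f"exact={exact_objective(eps):.4f}  linear={linearized_objective(eps):.4f}")
```

For the small values, the two objectives differ by about $3 \times 10^{-4}$. For the large values, they differ by about 0.17, which illustrates why the optimality claim weakens when escape probabilities are not small.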
The constraint on the overall test time at wafer sort is given by $T_{max}$, which is a fraction of the overall test time $T_{SoC}$ of the SoC. Due to serialization, the testing time for Core i on TAM partition j is given by $(w_i(j)/w_i^*(j))\,T_i(j)$ [23]. Therefore, the test time of Core i when it is tested with a reduced bitwidth of $w_i^*$ is given by Equation (5), where $T_i(j)$ here denotes the package-test time of Core i when j patterns are applied:
$$T_i^* = \frac{w_i}{w_i^*} \sum_{j=1}^{p_i} \delta_{ij}\,T_i(j) \tag{5}$$
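Let us now define a second binary indicator variable $\lambda_{ik}$ to ensure that every core in the SoC is tested using a single TAM width. This variable is defined as follows:
$$\lambda_{ik} = \begin{cases} 1 & \text{if } w_i^* = k \\ 0 & \text{otherwise} \end{cases}, \qquad 1 \le k \le w_i$$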
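It follows from the above definition that $\sum_{k=1}^{w_i} \lambda_{ik} = 1$, and Equation (5) can now be represented as
$$T_i^* = \sum_{j=1}^{p_i} \sum_{k=1}^{w_i} \delta_{ij}\,T_i(j)\,\lambda_{ik}\,\frac{w_i}{k}$$
The nonlinear term $\delta_{ij} \cdot \lambda_{ik}$ in this constraint can be replaced with a new binary variable $u_{ijk}$ by introducing two additional constraints:
$$\delta_{ij} + \lambda_{ik} - u_{ijk} \le 1 \tag{6}$$
$$\delta_{ij} + \lambda_{ik} \ge 2\,u_{ijk} \tag{7}$$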
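The product $\delta_{ij}\lambda_{ik}$ of two binary variables can be replaced by a binary $u_{ijk}$ using one standard pair of constraints: $\delta_{ij} + \lambda_{ik} - u_{ijk} \le 1$ and $\delta_{ij} + \lambda_{ik} \ge 2u_{ijk}$. The exhaustive check below is a sketch, not part of the original model, and the function name is hypothetical. It confirms that for every binary pair these constraints leave exactly one feasible $u$, namely the product.

```python
from itertools import product


def linearization_holds(delta: int, lam: int, u: int) -> bool:
    """Two linear constraints that force u = delta * lam for binary variables."""
    forces_one = delta + lam - u <= 1   # both indicators set -> u must be 1
    forces_zero = delta + lam >= 2 * u  # either indicator clear -> u must be 0
    return forces_one and forces_zero


for delta, lam in product((0, 1), repeat=2):
    feasible = [u for u in (0, 1) if linearization_holds(delta, lam, u)]
    print(delta, lam, feasible)  # feasible is always [delta * lam]
    assert feasible == [delta * lam]
```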
A constraint to ensure that every core in a TAM partition is tested with the same wafer-sort TAM width $w_j^*$ is also necessary. This constraint is given by Equation (8), where $A_j$ denotes the set of cores that are assigned to TAM partition j. The complete ILP model is shown in Fig. 2.
C. Experimental Results: $P_{TLTWS}$
We now present the experimental results for two SoCs from the ITC'02 SoC test benchmark suite [21]. We use the public-domain ILP solver lpsolve for our experiments [27]. Since the objectives of our experiment are to select the number of test patterns in a time- and bitwidth-constrained wafer sort environment while maximizing the defect screening probability, we present the following results:
• For given values of $W^*$ and $T_{max}$ relative to $T_{SoC}$, the percentage of test patterns that must be applied to each individual core to maximize the defect screening probability for the SoC.
• The values of the TAM partition widths $w_1^*, \ldots, w_B^*$ to be used at wafer sort.
• The relative defect-screening probability $P_S^r$ for each core in an SoC, where $P_S^r = P_S / P_S^{100}$ and $P_S^{100}$ is the defect-screening probability when 100% of the patterns are applied to each core.
• The relative defect-screening probability for each SoC obtained using the ILP model.
We first present results on the number of patterns determined for the cores. The results for the d695 benchmark SoC are presented in Fig. 3 for three values of $T_{max}$: $T_{SoC}$, $0.75T_{SoC}$, and $0.5T_{SoC}$. The fraction of test patterns applied per core that maximizes the defect screening probability differs in each case. Results are reported only for $W^* = 16$ and $W = 32$; similar plots are obtained for other values of $W^*$ and W. Fig. 4 shows the defect-screening probabilities for the cores in the d695 benchmark for the same test case.
We summarize the results for two benchmark SoCs in Table I for three different values of $W^*$ and $W = 32$. The relative defect screening probabilities $P_S$ and the TAM partition widths to be used at wafer sort, obtained using $P_{TLTWS}$, are listed for both benchmark SoCs. For d695 with $W^* = 16$ and $W = 32$, the ILP-based technique takes up to 3 hours of CPU time on a 2.4 GHz AMD Opteron processor with 4 GB of memory. The results show that a significant portion of the faulty dies can be screened at wafer sort using the proposed technique.
IV. ENUMERATION-BASED TAM WIDTH AND TEST-LENGTH SELECTION: $P_{e-TLTWS}$
The ILP-based approach in Section III is efficient for small SoCs. However, because the size of the ILP model grows with the number of cores, it may not scale well to SoCs with a large number of cores. An alternative technique that can handle larger SoC designs is therefore needed. We next propose an efficient approach based on a combination of TAM partition-width enumeration and ILP.
Our enumeration approach is based on the principle of a car odometer. Each digit of the odometer corresponds to a TAM partition width at wafer sort, and each digit can take values between 1 and the upper limit fixed by the TAM architecture designed for package test. At each step, we increase the least-significant digit if possible; otherwise, we roll that digit over to one and increase the next least-significant digit. The enumeration approach for determining the optimal TAM partition widths and test-lengths is implemented using the following sequence of steps:
(i) Given the number of TAM partitions B and an upper limit $W^*$ on the maximum TAM width, we first enumerate all possible TAM partition combinations. This enumeration follows the principle of a B-digit odometer, where each digit corresponds to the width of one TAM partition. Unlike a conventional odometer, each digit resets to one rather than zero (the maximum value that the i-th digit can take before a reset is $w_i$). At every increment of the odometer, we check whether $\sum_{i=1}^{B} w_i^* = W^*$. All TAM partitions that meet this condition are stored as valid partitions. We illustrate this enumeration procedure with a small example. Consider an SoC whose TAM architecture is fixed and designed for 6 bits, partitioned into three TAM partitions of widths 2, 3, and 1, respectively. The possible TAM enumerations for these partitions are {111, 121, 131, 211, 221, 231}. If $W^* = 4$, the valid TAM partitions are {121, 211}.
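Step (i) can be sketched as a small Python generator. This is an illustrative version that assumes each upper limit $w_i \ge 1$; the names `odometer` and `valid_partitions` are hypothetical. The last position plays the role of the least-significant digit, which reproduces the order {111, 121, 131, 211, 221, 231} of the example.

```python
def odometer(limits):
    """Yield every tuple (w*_1, ..., w*_B) with 1 <= w*_i <= limits[i], odometer order."""
    digits = [1] * len(limits)
    while True:
        yield tuple(digits)
        pos = len(digits) - 1  # least-significant digit
        # Roll saturated digits back to one, moving toward the most-significant end.
        while pos >= 0 and digits[pos] == limits[pos]:
            digits[pos] = 1
            pos -= 1
        if pos < 0:
            return  # every digit was at its limit: enumeration complete
        digits[pos] += 1


def valid_partitions(limits, w_star):
    """Keep only the width tuples whose widths sum to W*."""
    return [d for d in odometer(limits) if sum(d) == w_star]


print(list(odometer((2, 3, 1))))
print(valid_partitions((2, 3, 1), 4))  # [(1, 2, 1), (2, 1, 1)]
```

The odometer visits $\prod_i w_i$ tuples, so the enumeration grows quickly with the number and width of the partitions. `itertools.product(*(range(1, w + 1) for w in limits))` yields the same tuples in the same order.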
(ii) For each valid TAM partition found in Step (i), we apply the test-length selection procedure $P_{TLS}$ from [20], and we calculate the defect screening probability for the SoC from the results obtained using $P_{TLS}$.
The objective function for $P_{TLS}$ is the same as Equation (4). Let $T_i(j)$ be the test time for Core i when j patterns are applied to it. For a given Core i on a TAM partition of width $w_B$, we use the Design_wrapper technique from [5] to determine the lengths $s_i$ ($s_o$) of the longest scan-in (scan-out) chains of the core on that TAM partition. The value of $T_i(j)$ is then given by $T_i(j) = (1 + \max\{s_i, s_o\}) \cdot j + \min\{s_i, s_o\}$ [5]. The test time $T_i^*$ for Core i is therefore given by $T_i^* = \sum_{j=1}^{p_i} \delta_{ij}\,T_i(j)$. Let $A_j$ denote the set of cores that are assigned to TAM partition j. We must ensure that, for every TAM partition j,
$$\sum_{i \in A_j} T_i^* \le T_{max}$$
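The test-time formula and the per-partition budget check can be written directly. In this sketch, the scan-chain lengths are taken as given; in the method above they come from the Design_wrapper technique of [5]. The helper names and numbers are hypothetical.

```python
def core_test_time(s_in: int, s_out: int, patterns: int) -> int:
    """T_i(j) = (1 + max(s_i, s_o)) * j + min(s_i, s_o) clock cycles."""
    # Each pattern costs one shift of the longer chain plus one capture cycle;
    # the first scan-in and last scan-out are not fully overlapped, which
    # adds min(s_i, s_o) cycles.
    return (1 + max(s_in, s_out)) * patterns + min(s_in, s_out)


def partition_fits(core_times, t_max: int) -> bool:
    """Budget check for one TAM partition: sum of T*_i over cores in A_j <= T_max."""
    return sum(core_times) <= t_max


t = core_test_time(s_in=10, s_out=8, patterns=5)
print(t)                             # 63
print(partition_fits([t, 40], 103))  # True
print(partition_fits([t, 40], 102))  # False
```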
The complete ILP model is shown in Fig. 5.
(iii) If the defect screening probability obtained for the new partition is greater than that of the current optimal partition, we store it as the new defect screening probability and store this partition as the current optimal partition.
(iv) We repeat this procedure until all possible TAM partitions have been enumerated.
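Steps (i)-(iv) can be illustrated with a small end-to-end sketch that makes three substitutions to stay self-contained. First, it replaces the ILP-based $P_{TLS}$ of [20] with an exhaustive search over pattern counts. Second, it scores each candidate by the linearized objective, i.e., the smallest summed defect-escape probability. Third, it uses `itertools.product` in place of the odometer, which visits the same width tuples in the same order. All names, the time model, and the numbers are hypothetical, and exhaustive search is only practical for toy instances.

```python
from itertools import product


def best_lengths(cores, width, t_max, escape, pattern_time):
    """Exhaustive stand-in for P_TLS on one TAM partition of the given width.

    Picks one pattern count j in 1..p_i per core so that the partition's total
    test time is at most t_max and the summed escape probability is smallest.
    Returns (summed_escape, {core: j}), or None if nothing fits the budget.
    """
    best = None
    for choice in product(*(range(1, len(escape[c]) + 1) for c in cores)):
        total_time = sum(pattern_time(c, width, j) for c, j in zip(cores, choice))
        if total_time > t_max:
            continue
        total_escape = sum(escape[c][j - 1] for c, j in zip(cores, choice))
        if best is None or total_escape < best[0]:
            best = (total_escape, dict(zip(cores, choice)))
    return best


def enumerate_and_select(partitions, limits, w_star, t_max, escape, pattern_time):
    """Steps (i)-(iv): evaluate every valid width tuple and keep the best so far."""
    best = None
    for widths in product(*(range(1, w + 1) for w in limits)):
        if sum(widths) != w_star:
            continue  # not a valid TAM partition for this W*
        total_escape, choices = 0.0, {}
        for cores, width in zip(partitions, widths):
            result = best_lengths(cores, width, t_max, escape, pattern_time)
            if result is None:
                break  # this partition cannot meet T_max at this width
            total_escape += result[0]
            choices.update(result[1])
        else:
            # Lower summed escape = higher linearized screening objective.
            if best is None or total_escape < best[1]:
                best = (widths, total_escape, choices)
    return best


# Hypothetical toy instance: eps_i(j) for j = 1, 2 and a simple serialized time model.
escape = {"a": [0.3, 0.1], "b": [0.2, 0.05]}


def pattern_time(core, width, j):
    return 10 * j / width  # 10 cycles per pattern on a 1-bit TAM (hypothetical)


best = enumerate_and_select([["a"], ["b"]], (2, 2), 3, 12, escape, pattern_time)
print(best)  # best is ((2, 1), ~0.3, {'a': 2, 'b': 1})
```

Once the widths are fixed, each partition's time budget and escape sum are independent of the other partitions. The test-length choice can therefore be made partition by partition, and the summed escape over all partitions gives the linearized objective for the whole SoC.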
Fig. 5. ILP model for $P_{TLS}$ (constants: $\epsilon_i(j)$, $T_i(j)$; variables: $\delta_{ij}$).
Experimental results obtained using the $P_{e-TLTWS}$ procedure are summarized in Table II, in the same format as Table I. The table lists the defect screening probabilities $P_S$ for four benchmark SoCs [21], as well as the recommended TAM partition widths for wafer sort. The number of patterns determined using $P_{e-TLTWS}$ for the p34392 SoC is shown in Fig. 6 for three values of $T_{max}$: $T_{SoC}$, $0.75T_{SoC}$, and $0.5T_{SoC}$. Results are reported only for $W^* = 16$ and $W = 32$; similar plots are obtained for other values of $W^*$ and W. Fig. 7 shows the relative defect-screening probabilities for the cores in the p34392 benchmark for the same test case. The computation time for the largest benchmark SoC, p93791, was only 4 minutes, which indicates that this approach is suitable for large designs.
V. CONCLUSIONS
We have formulated a test-length and TAM-width selection problem for wafer-level testing of core-based digital SoCs. To the best of our knowledge, this is the first attempt to incorporate TAM-width selection in the wafer-level SoC test flow. We have also incorporated core defect probabilities into the modeling and optimization framework. Experimental results for the ITC'02 SoC test benchmarks show that the proposed approach can contribute to effective defect screening at wafer sort.
