Abstract-Wafer-level testing (wafer sort) is used in the semiconductor industry to reduce packaging and test cost. However, a large number of wafer-probe contacts lead to higher yield loss. Therefore, it is desirable that the number of chip pins contacted by tester channels during wafer sort be kept small to reduce the yield loss resulting from improper contacts. Since test time and the number of contacted chip pins are major practical constraints for wafer sort, not all scan-based digital tests can be applied to the die under test. We propose an optimization framework based on mathematical programming (integer linear programming, nonlinear programming, and geometric programming) and fast heuristic methods. This framework addresses test-access mechanism (TAM) optimization and test-length selection for wafer-level testing of core-based digital system-on-chips (SoCs). The objective here is to design a TAM architecture and determine test lengths for the embedded cores such that the overall SoC defect-screening probability at wafer sort is maximized. Defect probabilities for the embedded cores, obtained using statistical yield modeling, are incorporated in the optimization framework. Simulation results are presented for five of the ITC'02 SoC Test benchmarks.
I. INTRODUCTION
THE CONSUMER electronics market is characterized by low product cost and decreasing profit margins. Market realities, as well as rapid advances in process technology and design tools, have led to system-on-chip (SoC) integrated circuits (ICs) that reduce the design cycle time and design cost. Short design cycle times are achieved by integrating a number of predesigned and preverified embedded cores into an SoC. While the testing of such core-based SoCs continues to be a major concern in the semiconductor industry [1], [2], a number of efficient solutions have recently been proposed for test-access mechanism (TAM) optimization and test scheduling [3]-[10]. The design of efficient TAM architectures and SoC test schedules are important problems that need to be addressed during system integration.
Test and packaging costs are major contributors to the overall product cost for an SoC [11]. Wafer-level testing, also referred to as wafer sort, is used to screen defective dies prior to packaging, thereby reducing packaging cost and test time for the packaged ICs [12]-[14]. However, a large number of wafer-probe contacts leads to higher yield loss. Therefore, the number of chip pins contacted by tester channels during wafer sort is deliberately kept small to reduce the yield loss that results from improper contacts [15]. Since test time and the number of contacted chip pins are major practical constraints for wafer sort, not all scan-based digital tests can be applied to the die under test. Reduced pin-count testing (RPCT) has been advocated as a design-for-test technique, particularly for use at wafer sort, to reduce the number of IC pins that need to be contacted by the tester [15]-[19]. RPCT reduces the cost of test by enabling the reuse of old testers with limited channel availability. It also reduces the number of probe points required during wafer test; this translates to lower test cost as well as fewer yield-loss issues arising from contact problems with the wafer probe.
The test cost for SoCs can also be reduced by using estimated defect probabilities for the embedded cores to guide test scheduling [20], [21]. These defect probabilities can be used to determine the order in which the embedded cores in the SoC are tested, as well as to identify the subsets of cores that are tested concurrently. The defect probabilities for the cores were assumed in [20] to be either known a priori or obtained by binning the failure information for each individual core over the product cycle [21]. In practice, however, short product cycles make defect estimation based on failure binning difficult. Moreover, defect probabilities for a given technology node are not necessarily the same for the next (smaller) technology node. Recently, a wafer-level defect-screening technique for core-based SoCs was presented in [22] and [23]. This approach is based on a combination of statistical yield modeling, to determine the defect probability for each core in the SoC, and integer linear programming (ILP) for optimization. It does not consider the problem of RPCT for wafer sort.
In this paper, we present an optimization framework that addresses TAM optimization and test-length selection for RPCT-based wafer-level testing of core-based digital SoCs. The objective here is to design a TAM architecture for wafer sort that utilizes a predesigned underlying TAM architecture for package test and also to determine test lengths for the embedded cores such that the overall SoC defect-screening probability at wafer sort is maximized. Defect probabilities for the embedded cores, obtained using statistical yield modeling, are incorporated in the optimization framework. The proposed method reduces packaging cost and the subsequent test time for the IC lot while efficiently utilizing available tester bandwidth at wafer sort. While an optimal test access architecture and test schedule can also be developed for wafer sort, we assume that these test planning problems are best tackled for package test, simply because the package test time is higher.
We present three techniques for test-length selection and TAM optimization. The first technique is based on the formulation of a nonlinear integer programming model, which can be subsequently linearized and solved using standard ILP tools. While this approach leads to a thorough understanding of the optimization problem, it does not appear to be scalable for large SoCs. We therefore describe a second method that enumerates all possible valid TAM partitions and then uses the ILP model presented in [22] to derive test lengths that maximize defect screening at wafer sort. This enumerative procedure allows an efficient search of a large solution space, thereby resulting in significantly lower computation time than that needed for the first method. The third method relies on a heuristic based on geometric programming (GP). Simulation results on TAM optimization and test-length selection are presented for five of the ITC'02 SoC Test benchmarks [24].
The remainder of this paper is organized as follows. In Section II, we describe how the defect-screening probabilities of the cores and the SoC are determined using the approach presented in [22]. Section III formulates the problem of TAM optimization and test-length selection. An integer programming model is presented to solve this problem. Simulation results for two ITC'02 SoC Test benchmarks are presented in Section III. Section IV presents the second optimization method based on the enumeration of TAM partitions. Simulation results for five of the ITC'02 SoC Test benchmarks are presented. Section V presents the GP-based heuristic and compares the defect-screening probabilities obtained using the different techniques. In Section VI, a nonlinear programming (NLP) solver is used to determine the error introduced by the linearization technique used in Section III. Finally, Section VII concludes this paper.
II. DEFECT-SCREENING PROBABILITY
In this section, we briefly describe how the defect-screening probability for the SoC can be determined using the method presented in [22] . The defect probabilities for the embedded cores are obtained using the yield modeling technique presented in [22] . Let us now define the following statistical events for core i.
1) $A_i$: the event that the core has a fault. The probability associated with this event is determined from the statistical yield model described in [22].
2) $B_i$: the event that the tests applied to core i do not produce an incorrect response.
$\bar{A}_i$ and $\bar{B}_i$ represent the events that are complementary to events $A_i$ and $B_i$, respectively. Two important conditional probabilities associated with the aforementioned events are yield loss (a fault-free core fails the test) and test escape (a faulty core passes the test), denoted by $P(\bar{B}_i \mid \bar{A}_i)$ and $P(B_i \mid A_i)$, respectively. By using a basic identity of probability theory, we can derive the probability that the test applied to core i detects a defect
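The identity in question is presumably the law of total probability. Assuming so, and using only the events defined above, the expression would take the following form (a reconstruction offered for the reader's convenience):

```latex
% Assumed reconstruction via the law of total probability.
P(\bar{B}_i) = P(\bar{B}_i \mid A_i)\,P(A_i) + P(\bar{B}_i \mid \bar{A}_i)\,P(\bar{A}_i)
```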
Due to SoC test time and TAM width constraints during wafer-level testing, only a subset of the pattern set can be applied to any core i, i.e., if the complete test suite for the SoC contains $p_i$ scan patterns for core i, only $p_i^* \le p_i$ patterns can actually be applied to it during wafer sort. Let $fc_i(p_i^*)$ be the fault coverage for core i with $p_i^*$ test patterns. Let us now assume that the yield loss is $\gamma_i$, the test escape is $\beta_i$, and the probability that core i has a defect is $\theta_i$. By using these variables, we can rewrite (1) as
Similarly, we can rewrite P(B i ) as follows:
The defect-screening probability P S for an SoC with N embedded cores is given by
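As a rough numerical illustration of the quantities defined in this section, the following Python sketch evaluates the per-core fail and pass probabilities directly from the yield loss, test escape, and defect probability. The function names are hypothetical. The SoC-level formula used here (one minus the product of the per-core pass probabilities, which assumes statistically independent cores) is an assumption about the form of the expression, not a quotation of it.

```python
from math import prod


def core_fail_probability(theta, beta, gamma):
    """P(test flags the core): a faulty core is caught, or a good core is rejected.

    theta: probability that the core is defective, P(A_i)
    beta:  test escape, P(B_i | A_i)
    gamma: yield loss, P(not B_i | not A_i)
    """
    return (1.0 - beta) * theta + gamma * (1.0 - theta)


def core_pass_probability(theta, beta, gamma):
    """P(test passes the core): complement of core_fail_probability."""
    return beta * theta + (1.0 - gamma) * (1.0 - theta)


def soc_defect_screening_probability(cores):
    """ASSUMPTION: cores are independent and the die is screened out
    as soon as at least one core fails its test."""
    return 1.0 - prod(core_pass_probability(*core) for core in cores)


# Two hypothetical cores given as (theta, beta, gamma) triples.
example_cores = [(0.1, 0.2, 0.0), (0.2, 0.5, 0.0)]
print(soc_defect_screening_probability(example_cores))
```

Lowering the test escape of a core (by applying more patterns to it) raises its fail probability and therefore the SoC-level screening probability, which is the trade-off the later sections optimize.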
III. REDUCED PIN-COUNT TEST-LENGTH AND TAM OPTIMIZATION PROBLEM
In practice, the TAM bitwidth used at wafer sort is considerably less than the maximum available TAM bitwidth for package test. This is because of the likelihood of yield loss at wafer sort due to improper touchdowns/probe-pin contacts by the wafer probe [12]-[14]. RPCT methods are therefore desirable for wafer sort. Reconfigurable TAM architectures such as those in [25] and [26] can be useful for wafer sort; however, these techniques impose considerable area and performance overhead due to the need for wrapper multiplexing. In [22], it is assumed that the TAM architecture is fixed and optimized for package test; the same architecture is used for wafer-level test optimization. The authors of [22] therefore make the unrealistic assumption that the TAM bitwidth at wafer sort is the same as that for package test. We next formulate the test-length selection and TAM optimization problems for RPCT.
A. Test Data Serialization
Suppose that core i is accessed from the SoC pins for package test using a TAM of width $w_i$ (bits). Let us assume that for RPCT-based wafer sort, the TAM width for core i is constrained to be $w_i^*$ bits, where $w_i^* < w_i$. In order to access core i using only $w_i^*$ bits for wafer sort, the predesigned TAM architecture for package test needs to be appropriately modified. Fig. 1(a) shows a wrapped core that is connected to a 4-bit-wide TAM ($w_i = 4$). For the same wrapped core, Fig. 1(b) shows a modified test access design that allows an RPCT-based wafer-level test with $w_i^* = 2$. For wafer sort in this example, the lines TAM_out[0] and TAM_out[2] are not used. In order to ensure an efficient test access architecture for wafer sort, serial-to-parallel conversion of the test data stream is necessary at the wrapper inputs of the core. A similar parallel-to-serial conversion is necessary at the wrapper outputs of the core. Boundary input cells BIC[0], ..., BIC[3] and boundary output cells BOC[0], ..., BOC[3], which can operate in both a parallel load and a serial shift mode, are added at the I/Os of the wrapped core. Multiplexers are added on the input side of the core to enable the use of a smaller number of TAM lines for wafer sort. A global select signal PT/WS is used to choose either the package test mode (PT/WS = 0) or the wafer-sort mode (PT/WS = 1). For the output side, the multiplexers are not needed; the test response can be serially shifted out to the TAM, while the next pattern is serially shifted into the boundary input cells. Fig. 1(c) shows the test data path without the multiplexers during the wafer-sort test mode. Note that the aforementioned design is fully compliant with the IEEE 1500 standard [27] because no modifications are made to the standard wrapper cells.
We next explain how the test time for core i is affected by the serialization process. Let $T_i(j)$ be the total testing time (in clock cycles) for core i if it is placed on TAM partition j of the SoC. Let $w_i(j)$ be the width of TAM partition j in the predesigned TAM architecture. At the wafer level, if only $w_i^*(j)$ bits are available for TAM partition j, we assume, as in [28] for hierarchical SoC testing, that the $w_i(j)$ lines are distributed equally into $w_i^*(j)$ parts. Thus, the wafer-level testing time for core i on TAM partition j equals $(w_i(j)/w_i^*(j)) \cdot T_i(j)$ clock cycles. In the example of Fig. 1(b), the test time for core i due to serialization is $T_i^*(j) = T_i(j) \cdot (4/2)$. Note that other TAM serialization methods can also be used for wafer sort. While TAM serialization can be integrated in an overall optimization problem, it is not considered here for the sake of simplicity.
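The serialization penalty described above amounts to a simple scaling of the package-test time. The following Python sketch, with a hypothetical function name and argument names, computes it with exact rational arithmetic so that non-integer width ratios are not silently rounded.

```python
from fractions import Fraction


def serialized_test_time(t_package, w_package, w_wafer):
    """Wafer-sort test time (clock cycles) for a core on one TAM partition.

    t_package: test time T_i(j) at the package-test TAM width
    w_package: package-test TAM partition width w_i(j)
    w_wafer:   reduced wafer-sort TAM width w_i*(j)
    """
    if not 1 <= w_wafer <= w_package:
        raise ValueError("wafer-sort width must lie between 1 and the package width")
    # Each package-test TAM line is fed from w_package / w_wafer serial shifts.
    return Fraction(w_package, w_wafer) * t_package


# Example of Fig. 1(b): a 4-bit TAM serialized onto 2 bits doubles the test time.
print(serialized_test_time(1000, 4, 2))
```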
B. Test-Length and TAM Optimization Problem: P TLTWS
Let the upper limit on the test time for an SoC at wafer sort be T max (clock cycles). This upper limit on the scan test time at wafer sort is expected to be a fraction of the scan test time T SoC (clock cycles) for package test, as determined by the TAM architecture and test schedule. We assume a fixed-width TAM architecture as in [5] , where the top-level TAM is divided into several TAM partitions. This architecture implies that the total test time on each TAM partition must not exceed T max .
If the internal details of the embedded cores are available to the system integrator, fault simulation can be used to determine the fault coverage for various values of $p_i^*$, i.e., the number of patterns applied to the cores during wafer sort. Otherwise, we model the relationship between fault coverage and the number of patterns with a logarithmic function. It is well known in the testing literature that the fault coverage for stuck-at faults increases rapidly initially as the pattern count increases, but it flattens out when more patterns are applied to the circuit under test [29], [30]. In our work, without loss of generality, we use the normalized function $fc_i(p_i^*) = \log_{10}(p_i^* + 1) / \log_{10} p_i$ to represent this relationship. A similar relationship was used in [30]. We have verified that this empirical relationship matches the "fault coverage curve" for the ISCAS benchmark circuits.
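The empirical fault-coverage model above is easy to evaluate numerically. The Python sketch below uses a hypothetical function name and shows the diminishing returns that the model is meant to capture: the first few patterns contribute far more coverage than the last few.

```python
import math


def fault_coverage(p_applied, p_total):
    """Normalized coverage model fc(p*) = log10(p* + 1) / log10(p).

    As written in the text, the value is 0 for p* = 0 and marginally
    exceeds 1 when p* equals p, because of the +1 inside the logarithm.
    """
    if p_total < 2:
        raise ValueError("p_total must be at least 2 so that log10(p_total) > 0")
    if not 0 <= p_applied <= p_total:
        raise ValueError("p_applied must lie between 0 and p_total")
    return math.log10(p_applied + 1) / math.log10(p_total)


# Coverage gained by the first ten patterns versus the last ten of 100.
early_gain = fault_coverage(10, 100) - fault_coverage(0, 100)
late_gain = fault_coverage(100, 100) - fault_coverage(90, 100)
print(early_gain, late_gain)
```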
Let $\epsilon_i(p_i^*)$ be the defect-escape probability for core i when $p_i^*$ patterns are applied to it. This probability can be obtained using (3) as a function of the test escape $\beta_i$ and the probability $\theta_i$ that the core is faulty. The value of $\theta_i$ for each core in the SoC is obtained using the procedure described in [22]. Let us now consider an SoC with a top-level TAM width of W bits and suppose that it has B TAM partitions of widths $w_1, w_2, \ldots, w_B$.
Problem $P_{TLTWS}$: Given a predesigned TAM architecture for a core-based SoC, the defect probabilities for each core in the SoC, the maximum available test bandwidth at wafer sort $W^*$, and the upper limit on the test time for the SoC at wafer sort $T_{max}$, determine the following: 1) the total number of test patterns to be applied to each core and 2) the (reduced) TAM width for each partition, such that the overall testing time on each TAM partition does not exceed the upper bound $T_{max}$ and the defect-screening probability $P_S$ for the SoC is maximized.
The objective function for the optimization problem is as follows:
where the number of cores in the SoC is N . We next introduce the indicator binary variable
We next reformulate the objective function to make it more amenable for further analysis. Let F = ln(Y ). We therefore get
We next use the Taylor series expansion
and ignore the second-order and higher order terms [31]. This approximation is justified if the defect-escape probability for core i is much smaller than one. While this is usually the case, the defect-escape probability is occasionally large; in such cases, the optimality claim is valid only in a limited sense. The simplified objective function is given by
In other words, the objective function can be stated as
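The series being truncated in this derivation is presumably the standard expansion ln(1 - x) = -(x + x^2/2 + x^3/3 + ...), valid for |x| < 1. The Python sketch below, with hypothetical function names, compares the exact logarithm with its first-order term to show how the truncation error grows with the defect-escape probability.

```python
import math


def log_term_exact(escape_probability):
    """Exact value of ln(1 - x), computed stably for small x."""
    return math.log1p(-escape_probability)


def log_term_first_order(escape_probability):
    """First-order Taylor approximation: ln(1 - x) is replaced by -x."""
    return -escape_probability


def relative_truncation_error(escape_probability):
    """Relative error introduced by dropping second-order and higher terms."""
    exact = log_term_exact(escape_probability)
    return abs(log_term_first_order(escape_probability) - exact) / abs(exact)


for x in (0.01, 0.1, 0.5):
    print(x, relative_truncation_error(x))
```

For an escape probability of 0.01 the relative error is about half a percent, whereas at 0.5 it exceeds a quarter, which is why the optimality claim weakens when a core's defect-escape probability is large.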
The constraint on the overall test time at wafer sort is given by $T_{max}$, which is a fraction of the overall test time of the SoC ($T_{SoC}$). Due to serialization, the testing time for core i on TAM partition j is given by $(w_i(j)/w_i^*(j)) \cdot T_i(j)$ [28]. Therefore, the test time of core i when it is tested with a reduced bitwidth of $w_i^*$ is given by
Let us now define a second binary indicator variable λ ik to ensure that every core in the SoC is tested using a single TAM width; this variable can be defined as follows:
It can be inferred from the aforementioned definition that $\sum_{k=1}^{w_i} \lambda_{ik} = 1$, and (5) can now be represented as T *
The nonlinear term in the constraint δ ij · λ ik can be replaced with a new binary variable u ijk by introducing two additional constraints
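One standard way to linearize a product of two binary variables uses the pair of inequalities checked in the sketch below. This particular pair is an assumption, chosen because it needs exactly two constraints; the function names are hypothetical. The brute-force check confirms that, for binary inputs, the only feasible value of u is the product of the two indicators.

```python
from itertools import product


def satisfies_linearization(delta, lam, u):
    """ASSUMED constraint pair replacing u = delta * lam for binary variables:

    delta + lam <= u + 1   forces u = 1 when both indicators are 1
    delta + lam >= 2 * u   forces u = 0 when either indicator is 0
    """
    return delta + lam <= u + 1 and delta + lam >= 2 * u


def feasible_u_values(delta, lam):
    """All binary values of u allowed by the two linear constraints."""
    return [u for u in (0, 1) if satisfies_linearization(delta, lam, u)]


for delta, lam in product((0, 1), repeat=2):
    print(delta, lam, feasible_u_values(delta, lam))
```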
A constraint to ensure that every core in a TAM partition is tested with the same TAM width W * x is also necessary and can be represented as in (8) . The variable A j denotes the set of cores that are assigned to TAM partition j. The constraint must be satisfied for every core in A j .
The complete ILP model is shown in Fig. 2 . The number of variables and constraints in the ILP model determines the complexity of the problem. The number of variables in our ILP model is
, and the number of constraints is 2 · N + 2 
C. Experimental Results: P TLTWS

1) Given values of $W^*$ and $T_{max}$ relative to $T_{SoC}$, the percentage of test patterns that must be applied for each individual core to maximize the defect-screening probability for the SoC.
2) The values of TAM partition widths w *
3) The relative defect-screening probability $P_S^r$ for each core in an SoC, where $P_S^r = P_S / P_S^{100}$ and $P_S^{100}$ is the defect-screening probability if all 100% of the patterns are applied per core.
4) The relative defect-screening probability for the SoC obtained using the ILP model.
We first present results on the number of patterns determined for the cores. The fault coverage data for the d695 benchmark circuit are obtained using a commercial automatic test-pattern generation tool. The results for the d695 benchmark SoC are shown in Fig. 3 for three values of $T_{max}$: $T_{SoC}$, $0.75\,T_{SoC}$, and $0.5\,T_{SoC}$. The fraction of test patterns applied per core is found to be different in each case to maximize the defect-screening probability. Results are reported only for $W^* = 16$ and $W = 32$; similar plots are obtained for different values of $W^*$ and $W$. Fig. 4 shows the defect-screening probabilities for the cores in the d695 benchmark for the aforementioned test case.
We summarize the results for two benchmark SoCs in Table I for three different values of W * and W = 32. The relative defect-screening probabilities P S and TAM partition widths to be used at wafer sort, obtained using P TLTWS , are enumerated for both benchmark SoCs. The ILP-based technique takes up to 3 h of CPU time on a 2.4-GHz AMD Opteron processor with 4 GB of memory for d695, when W * = 16 and W = 32. The results show that a significant portion of the faulty dies can be screened at wafer sort using the proposed technique.
IV. ENUMERATION-BASED TAM WIDTH AND TEST-LENGTH SELECTION: P e−TLTWS
The ILP-based approach in Section III is efficient only for small SoCs. Because of the large size of the ILP model, it may not scale well for SoCs with a large number of cores. It is therefore necessary to develop an alternative technique that can handle larger SoC designs. We next propose an efficient approach based on a combination of TAM partition-width enumeration and ILP.
Our enumeration approach is based on the principle of a car odometer. Each digit of the odometer corresponds to a TAM partition width at wafer sort and can take values between 1 and the upper limit fixed by the TAM architecture designed for package test. We first increase the least significant digit if possible; otherwise, we roll that digit over to one and increase the next least significant digit. The enumeration approach for determining the optimal TAM partition widths and test lengths can be implemented using the following sequence of procedures. 1) Given the number of TAM partitions B and an upper limit on the maximum TAM width $W^*$, we first enumerate all possible TAM partition combinations. This enumeration follows the principle of a B-digit odometer, where each digit corresponds to the width of one TAM partition. The odometer resets to one, as opposed to zero in the case of a conventional odometer (the maximum value that the ith digit can take before a reset is $w_i$). At every increment in the odometer, we check whether
Every TAM partition that meets the aforementioned condition is recorded as a valid partition.
We illustrate the aforementioned enumeration procedure with a small example. Let us consider an SoC whose TAM architecture is fixed and designed for 6 bits and partitioned into three TAM partitions of widths 2, 3, and 1, respectively, for package test. All possible TAM enumerations for the aforementioned partitions are {⟨1, 1, 1⟩, ⟨1, 2, 1⟩, ⟨1, 3, 1⟩, ⟨2, 1, 1⟩, ⟨2, 2, 1⟩, ⟨2, 3, 1⟩}. These partitions are explicitly enumerated in the proposed method. If we consider $W^*$ to be 4, then the valid TAM partitions are {⟨1, 2, 1⟩, ⟨2, 1, 1⟩}.
2) For each valid TAM partition calculated in Step 1), we apply the test-length selection procedure $P_{TLS}$ from [22]. We calculate the defect-screening probability for the SoC from the results obtained using $P_{TLS}$. The objective function for $P_{TLS}$ is the same as (4). Let $T_i(j)$ be the test time for core i when j patterns are applied to it. For a given core i on a TAM partition of width $w_B$, we use the design wrapper technique from [5] to determine the longest scan-in (scan-out) chain of length $s_i$ ($s_o$) of the core on that TAM partition. The value of $T_i(j)$ can be determined using the formula $T_i(j) = (1 + \max\{s_i, s_o\}) \cdot j + \min\{s_i, s_o\}$ [5]. The test time $T_i^*$ for core i is therefore given by
Let A j denote the set of cores that are assigned to TAM partition j. We must ensure that
The complete ILP model is shown in Fig. 5.
3) If the defect-screening probability of the new partition is greater than that of the previous partition, we store it as the new defect-screening probability and store this partition as the current optimal partition.
4) We repeat this procedure until all possible TAM partitions are enumerated.
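The enumeration steps above can be sketched in Python as follows. All function names are hypothetical. The validity check is assumed to require that the partition widths sum exactly to the wafer-sort bandwidth, which is the reading that reproduces the worked example. The `score` argument is a stand-in for the ILP-based test-length selection, which is not implemented here.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def odometer_partitions(max_widths: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """Yield every width tuple, counting like an odometer whose digits reset to 1."""
    digits = [1] * len(max_widths)
    while True:
        yield tuple(digits)
        pos = len(digits) - 1
        # Roll over every trailing digit that has reached its maximum.
        while pos >= 0 and digits[pos] == max_widths[pos]:
            digits[pos] = 1
            pos -= 1
        if pos < 0:
            return  # the most significant digit rolled over: enumeration is done
        digits[pos] += 1


def valid_partitions(max_widths: Sequence[int], w_star: int) -> List[Tuple[int, ...]]:
    """ASSUMPTION: a partition is valid when its widths sum exactly to W*."""
    return [p for p in odometer_partitions(max_widths) if sum(p) == w_star]


def scan_test_time(s_in: int, s_out: int, patterns: int) -> int:
    """Scan test time (1 + max(s_i, s_o)) * j + min(s_i, s_o) for j patterns."""
    return (1 + max(s_in, s_out)) * patterns + min(s_in, s_out)


def best_partition(max_widths: Sequence[int], w_star: int,
                   score: Callable[[Tuple[int, ...]], float]):
    """Keep the valid partition with the highest score (Steps 2 to 4)."""
    best, best_score = None, float("-inf")
    for partition in valid_partitions(max_widths, w_star):
        value = score(partition)  # placeholder for the ILP-derived probability
        if value > best_score:
            best, best_score = partition, value
    return best, best_score


print(list(odometer_partitions([2, 3, 1])))
print(valid_partitions([2, 3, 1], 4))
```

Because the number of odometer states is the product of the package-test partition widths, the search stays small whenever the number of TAM partitions is modest, even though each state requires one call to the test-length selection procedure.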
The experimental results obtained using the $P_{e-TLTWS}$ procedure are summarized in Table II. The results are represented in a similar fashion as in Table I. The values of the defect-screening probabilities $P_S$ for five benchmark circuits [24], as well as the recommended TAM partition widths for wafer sort, are shown in the table. The number of patterns determined using $P_{e-TLTWS}$ for the p34392 SoC is shown in Fig. 6. The results are shown for three values of $T_{max}$: $T_{SoC}$, $0.75\,T_{SoC}$, and $0.5\,T_{SoC}$. Results are reported only for $W^* = 16$ and $W = 32$; similar plots are obtained for a range of values of $W^*$ and $W$. Fig. 7 shows the relative defect-screening probabilities for the cores in the p34392 benchmark for the aforementioned test case. The heuristic method results in lower defect-screening probability for most cases compared with the ILP-based method; for higher values of $W^*$, the difference in defect-screening probability between the two methods decreases. The computation time for the largest benchmark SoC p93791 was only 4 min; hence, this approach is suitable for large designs.
V. TAM WIDTH AND TEST-LENGTH SELECTION BASED ON GP
GP problems are convex optimization problems that are similar to linear programming problems [32] . A GP is a mathematical problem of the form
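In the standard form used in the GP literature (given here as an assumed reconstruction, with the index limits m and p introduced only for illustration), the problem reads:

```latex
% Standard-form geometric program (assumed reconstruction).
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 1, \quad i = 1, \ldots, m \\
                       & g_i(x) = 1, \quad i = 1, \ldots, p
\end{aligned}
```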
where f i 's are posynomial functions, g i 's are monomials, and x i 's are the optimization variables; it is implicitly assumed that the optimization variables are positive, i.e., x i > 0 [32] . Mixed-integer GPs (MIGPs) are a class of problems that are hard to solve [32] . The problem P TLTWS can be modeled as an MIGP problem. We employ a heuristic method to solve the MIGP problem P gp−TLTWS for test-length and TAM width selection. By using heuristic methods, approximate solutions can be found in a reasonable amount of time; the optimality of the solution, however, cannot be guaranteed. Before we describe the GP-based heuristic method, we need to modify the objective function to make it amenable for further analysis. The objective of P TLTWS is to maximize the defect-screening probability,
. This is equivalent to the following minimization-based objective function:
The constraints for the optimization problem described in Section III can be easily modified for use in the MIGP problem. The complete MIGP problem for P TLTWS is shown in Fig. 8. To obtain an approximate solution, we relax the MIGP problem to a general GP problem that can be solved using commercial tools [33]; the result obtained in this way is a lower bound on the optimal value of the objective function for the MIGP. The values of the variables obtained after relaxation are then rounded to the nearest integer. The heuristic then iteratively reassigns the values of the variables such that the constraints are satisfied while maximizing the defect-screening probability for the SoC. The heuristic used to solve P gp−TLTWS consists of the following steps.
1) In the first step of this procedure, we relax the MIGP for P TLTWS to a GP problem. The relaxation essentially means that the binary indicator variables used in the optimization problem can take noninteger values. 2) We then use commercial tools [33] 
. This is repeated until the time constraints on all TAM partitions are satisfied. The experimental results obtained using the GP-based heuristic procedure are summarized in Table III . The results are represented in a similar fashion as in Table II . The relative defect-screening probability obtained using the GP-based heuristic is greater than that obtained using the enumerative heuristic technique and less than that obtained using the ILP method. The computation time ranges from 6 min for the a586710 SoC to 51 min for the p93791 SoC.
VI. APPROXIMATION ERROR IN P r S
A Taylor's series expansion of δ i (j) i (j), without the higher order terms, was used in Section III to obtain a linear objective function for P TLTWS . If the defect-escape probability for core i is much smaller than unity, this assumption can be justified. To study the effect of this approximation, we evaluated the approximation error for the benchmark circuits. We used a commercial NLP solver [34] to incorporate higher order terms in our objective function. The NLP solver [34] uses the generalized reduced gradient method to solve large-scale nonlinear problems [35] .
The nonlinear objective function that we use in our experiments is shown as
The relative magnitudes of the quadratic and cubic terms are negligible compared to the leading order term when the defect-escape probability of the core is negligible. We determine the approximation error as a measure to quantify the effect of these higher order terms on P r S . The approximation error is given by
We present experimental results on the approximation error in P r S when ILP and heuristic methods are used to solve P TLTWS versus when NLP-and GP-based methods are used. We use a commercial solver [33] for the GP-based heuristic method. The relative defect-screening probability was determined for a nonlinear objective function using a commercial solver [34] , where the quadratic and cubic terms are considered in addition to the leading order term; this procedure is described in Section II.
Let $P^{r}_{S\text{-}ILP}$, $P^{r}_{S\text{-}e\text{-}TLTWS}$, $P^{r}_{S\text{-}NLP}$, and $P^{r}_{S\text{-}GP}$ denote the relative defect-screening probability of the SoC obtained using a linear objective function, the enumerative heuristic, a nonlinear objective function, and the GP-based heuristic method, respectively. We determine the approximation error as a measure to quantify the effect of these higher order terms on $P_S^r$.
As is evident from the aforementioned equations, the results obtained using the NLP solver are used as a baseline case. This is because the results obtained using the GP-based heuristic are only bounds (upper bounds on the relative defect-screening probability), and the results obtained using ILP and the enumerative heuristic method are not optimal. The "p" benchmarks do not consider solutions obtained using ILP because of the lack of a suitable solver to solve problems of this size. The approximation errors for the benchmark circuits are presented in Tables IV and V. The time needed by the NLP solver [34] to solve $P_{TLTWS}$ with the nonlinear objective function ranges from 6 min for the d695 SoC to 4 h for the "p" SoCs from Philips. This clearly indicates that the nonlinear version of $P_{TLTWS}$ is not scalable for large SoCs. The time to solve the GP-based heuristic ranges from 2 min for the d695 SoC to 45 min for the "p" SoCs. The GP-based heuristic can therefore be used to quickly determine bounds on $P_S^r$.
VII. CONCLUSION
We have formulated a test-length and a TAM width selection problem for wafer-level testing of core-based digital SoCs. To the best of our knowledge, this is the first attempt to incorporate TAM-width selection in an RPCT-based wafer-level SoC test flow. We have also incorporated core defect probabilities into the modeling and optimization framework. The optimization problem has been tackled using ILP, NLP, GP, and heuristic methods. The experimental results for the ITC'02 SoC test benchmarks show that the proposed approach can contribute to effective defect screening at wafer sort.
