Most previous studies on developing an optimal burn-in policy for semiconductor products deal only with the burn-in process itself, and little attention has been paid to using information on the quality levels of chips before they are subjected to burn-in. This paper develops a dual burn-in policy in which the number of chips (d) that fail the wafer probe (WP) test and lie in the neighborhood of a reference chip is used as an indicator of the quality level of that reference chip. The dual burn-in policy first classifies the chips that pass the WP test into two groups using a boundary value of d, and each group is then subjected to burn-in for its own duration. For a certain type of 256M DRAM product, the performance of the proposed dual burn-in policy is compared with that of the single burn-in policy, in which all chips are subjected to burn-in of the same duration regardless of d. The results show that, for the cases considered, the proposed dual burn-in policy is more cost-effective than the single burn-in policy, implying that the additional information from the WP test is beneficial for establishing an efficient burn-in policy in semiconductor manufacturing.
Introduction
A semiconductor manufacturing process consists of two major stages: fabricating chips on a wafer and packaging the individual chips cut from each wafer. At the end of the fabrication process, a wafer probe (WP) test is conducted to sort chips into electrically good and defective ones. A chip is defective if it contains so-called "killer defects", such as opens and bridges, which cause electrical function failures during the WP test. Defective chips are usually discarded before the packaging process. One exception is a defect-tolerant memory product: for such a product, it is first decided whether a chip with killer defects is repairable, and, if so, the chip is laser-repaired using the redundant cells embedded in it and then sent to the packaging process. After packaging, every chip (in most cases) is subjected to a burn-in test, in which elevated temperature and voltage are applied to weed out chips that may cause infant mortality failures due to "latent defects". Although a latent defect is basically the same in nature as a killer defect, it passes the WP test undetected because it is small and/or lies in an undetectable location. A chip with latent defects may deteriorate with time and fail under stress (e.g., burn-in) or under normal use conditions. If such a chip is not screened out during burn-in because of an insufficient burn-in duration or an ineffective stress condition, it may fail in the field, resulting in low early-life reliability.

Corresponding author. tel.: +82-42-869-3116; fax: +82-42-869-3110.
While burn-in improves the reliability of products shipped to customers, it is costly, reduces productivity, and can delay product shipment. Therefore, it is imperative for a semiconductor manufacturer to devise a burn-in policy that is efficient in terms of both cost and product reliability.
In this paper, we focus on the burn-in duration as a major factor for developing such an efficient burn-in policy.
If the burn-in duration is extended, the direct burn-in cost and the burn-in failure cost increase, while the field failure cost during the warranty period decreases. Numerous works [1-6] determine the burn-in duration using these trade-offs. However, little attention has been paid to using the information obtained at the WP test on the quality levels of chips before burn-in.
In this paper, the number of chips (d) that fail the WP test and lie in the neighborhood of a reference chip that passes the WP test is used as an indicator of the quality level of that reference chip. This is based on the well-known observation that defects on a wafer are clustered rather than randomly distributed [7, 8]; therefore, a reference chip with a larger d is more susceptible to failure during and after burn-in than one with a smaller d, and vice versa [9]. This paper develops a dual burn-in policy in which the chips that pass the WP test are first classified into two groups using a boundary value of d, and each group is then subjected to burn-in for its own duration. To optimize the parameters of the proposed dual burn-in policy (i.e., the boundary value of d and the burn-in duration for each group), a cost model is developed that considers the direct burn-in cost, burn-in failure cost, field failure cost, and handling cost per chip. To evaluate the expected total cost per chip, the distribution of d is modeled as a beta-binomial distribution, and the product reliability during and after burn-in is described using a proportional hazard model [10] with d as an explanatory variable. The proposed dual burn-in policy is then compared, in terms of the expected total cost, with the single burn-in policy, in which d is not considered and all packaged chips are subjected to burn-in of the same duration.
The rest of this paper is organized as follows. Sec. 2 reviews related works and examines their relationship with the present study. Sec. 3 introduces a method for empirically estimating the distribution of d. In Sec. 4, a proportional hazard model is developed for predicting product reliability, and a method for estimating its unknown parameters is described. In Sec. 5, the single and dual burn-in cost models are developed using the findings in Secs. 3 and 4, and the two policies are optimized and compared in terms of the expected total cost per chip for a certain type of defect-tolerant 256M DRAM product. Finally, Sec. 6 concludes the paper and discusses future research areas.
Review of Related Works
"Burn-in yield" is defined as the percentage of chips that survive burn-in; it is also called "reliability yield" because it is closely related to the latent defects that may affect the early lifetime of the product [11]. Similarly, "WP yield" denotes the percentage of chips that pass the WP test, including those that are repaired when applicable. To develop an efficient burn-in policy, it is important not only to analyze the burn-in results but also to identify and use the relationship between the burn-in and WP test results [8, 11]. However, the existing works on yield modeling are mostly concerned with predicting WP yield [8, 12], and correlating WP yield with burn-in yield (i.e., yield-reliability modeling) has not been sufficiently investigated [11, 12]. Early studies on yield-reliability modeling used only the most basic information explicitly available from the WP test, namely the WP yield [13, 14]. Notable exceptions include Refs. 9, 15, and 16, which defined the neighborhood chips of a reference chip that passes the WP test in various ways and used the WP test results on these neighborhood chips to infer the burn-in yield of each reference chip. This approach is based on the well-known observation that defects generated in the fabrication process are not randomly distributed but clustered on a wafer.
To use their model to determine the burn-in duration, two assumptions must be satisfied: the time limit within which every latent defect in each chip is precipitated to failure must be specified in advance, and the average number of latent defects per chip must be small. In practice, however, it is not easy to specify such a time limit explicitly. Moreover, while the average number of latent defects per chip can be expected to be small for a mature product, this expectation may not hold for a product in its early stage of production. Finally, as shown in Sec. 4, the actual burn-in yield of the product considered in this paper shows a piecewise linear pattern rather than a smooth one. In the present study, a yield-reliability model based on actual data is developed to overcome the above-mentioned problems with the existing studies.
3. Distribution of the Number of Defective Chips in the Neighborhood
Definition of neighborhood
The quality level (i.e., the degree of contamination by latent defects) of a reference chip that passes the WP test can be inferred indirectly from the WP test results on its neighborhood chips, because defects tend to be clustered rather than randomly distributed on a wafer. For instance, when a reference chip has eight neighborhood chips as shown in Fig. 1, a reference chip with d = 0 (i.e., all eight neighborhood chips pass the WP test) is expected to have a different quality level, and hence a different burn-in yield, from a reference chip with d = 8. In the present study, the neighborhood of a reference chip is assumed to be as shown in Fig. 1, and therefore d takes values from zero to eight.
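The counting rule above, together with the edge convention used in this study (missing positions around the wafer edge are treated as defective), can be sketched in Python as follows. The grid encoding (True for a chip that passes the WP test, False for a chip that fails it, None for a position with no chip) and the function names are hypothetical, and the neighborhood is assumed to be the eight chips surrounding the reference chip, which may differ in detail from the layout in Fig. 1.

```python
from typing import List, Optional

# Hypothetical wafer-map encoding: True = chip passes the WP test,
# False = chip fails the WP test, None = no chip at this grid position.
WaferMap = List[List[Optional[bool]]]

# Assumed neighborhood: the eight chips surrounding the reference chip.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]


def defective_neighbors(wafer: WaferMap, row: int, col: int) -> int:
    """Return d for the reference chip at (row, col).

    Failed neighbors count as defective, and so do imaginary (missing)
    neighbors, whether they fall outside the grid or on a None position.
    """
    d = 0
    for d_row, d_col in NEIGHBOR_OFFSETS:
        r, c = row + d_row, col + d_col
        inside = 0 <= r < len(wafer) and 0 <= c < len(wafer[r])
        status = wafer[r][c] if inside else None
        if status is not True:  # failed chip or missing (imaginary) chip
            d += 1
    return d


def d_values(wafer: WaferMap) -> List[int]:
    """Collect d for every chip that passes the WP test, row by row."""
    return [defective_neighbors(wafer, r, c)
            for r, row in enumerate(wafer)
            for c, status in enumerate(row)
            if status is True]


# A tiny hypothetical 4 x 4 wafer map with empty corners.
wafer = [
    [None, True, True, None],
    [True, True, False, True],
    [True, False, True, True],
    [None, True, True, None],
]
print(d_values(wafer))  # -> [5, 5, 5, 3, 5, 5, 3, 5, 5, 5]
```

On such a small map most chips sit on the edge, so their d values are inflated by imaginary neighbors, which is exactly the effect the edge convention is meant to capture.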
For a reference chip near the edge of a wafer, the number of neighborhood chips may be less than eight, as shown in Fig. 2. In such cases, it is usually assumed that all eight neighborhood chips exist and that all of the imaginary (i.e., missing) chips are defective. That is, d for an edge reference chip is the number of real defective neighborhood chips plus the number of imaginary chips. This convention accounts for the phenomenon that an edge chip generally has a lower yield than an inner chip.
Let p be the parameter of a binomial distribution, and assume that p is a random variable that follows a beta distribution with probability density function f(p). Compounding the binomial distribution with respect to f(p) then yields a beta-binomial distribution [23]. If d is assumed to follow a beta-binomial distribution, its probability mass function u(i) can be expressed as follows.
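$$u(i) = \Pr\{d = i\} = \int_0^1 \binom{N}{i} p^{i} (1-p)^{N-i} f(p)\, dp = \binom{N}{i} \frac{B(a+i,\, b+N-i)}{B(a,\, b)}, \qquad i = 0, 1, \ldots, N \tag{1}$$
where N is the number of neighborhood chips for a reference chip (N = 8), p is the probability that a neighborhood chip fails the WP test, $f(p) = p^{a-1}(1-p)^{b-1}/B(a, b)$ for $0 < p < 1$, $B(\cdot, \cdot)$ is the complete beta function, and a and b are the two shape parameters of the beta distribution (a, b > 0).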
The shape of the distribution of d changes with the maturity of the product: for a mature product, the mode of the distribution appears at a small value of d, whereas for an immature product, the mode tends to appear at a large value of d.
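To make Eq. (1) and the effect of product maturity concrete, the sketch below evaluates u(i) using log-gamma functions, which keeps the beta functions numerically stable, and finds the mode. The parameter pairs labeled `mature` and `immature` are hypothetical values chosen only to show the two shapes. They are not the estimates reported in Tables 1 and 2.

```python
import math
from typing import List


def log_beta(x: float, y: float) -> float:
    """ln B(x, y) computed from log-gamma values for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def beta_binomial_pmf(n: int, a: float, b: float) -> List[float]:
    """u(0), ..., u(n) from Eq. (1): C(n, i) B(a + i, b + n - i) / B(a, b)."""
    return [math.comb(n, i) * math.exp(log_beta(a + i, b + n - i) - log_beta(a, b))
            for i in range(n + 1)]


def mode(pmf: List[float]) -> int:
    """Value of d with the largest probability."""
    return max(range(len(pmf)), key=lambda i: pmf[i])


N = 8                  # neighborhood size used in this study
mature = (0.5, 4.0)    # hypothetical (a, b): mass concentrated at small d
immature = (4.0, 1.0)  # hypothetical (a, b): mass shifted toward large d

for label, (a, b) in [("mature", mature), ("immature", immature)]:
    u = beta_binomial_pmf(N, a, b)
    mean = sum(i * p for i, p in enumerate(u))
    # The mean should match the closed form N a / (a + b).
    print(label, "mode:", mode(u), "mean:", round(mean, 3),
          "closed form:", round(N * a / (a + b), 3))
```

For these hypothetical pairs, the mode is at d = 0 for `mature` and at d = 8 for `immature`. The ratio u(i+1)/u(i) = [(N - i)/(i + 1)][(a + i)/(b + N - i - 1)] explains why: it stays below one for the first pair and above one for the second.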
Fitting beta-binomial distribution to data
To fit a beta-binomial distribution to the actual d data for a certain type of 256M DRAM product, the following is defined first. 
Then, the unknown parameters a and b in Eq. (1) can be estimated using the NLIN procedure in SAS with the "weight" option [24]; the results are shown in Tables 1 and 2. The adequacy of the beta-binomial distribution can be assessed from the high $R^2$ and adjusted $R^2$ values in Table 1 and from the fitted-value plot in Fig. 4(a). In addition, the standardized residual plots in Fig
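The paper fits Eq. (1) with SAS PROC NLIN using weights. As a rough, dependency-free stand-in, the sketch below minimizes the same kind of objective, a weighted sum of squared differences between the observed relative frequencies of d and u(i). It uses a coarse grid over (ln a, ln b) followed by a simple compass search. This derivative-free search replaces the NLIN algorithm. The weights are an assumed input (for example, the number of reference chips observed at each value of d), and all function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def log_beta(x: float, y: float) -> float:
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def pmf(i: int, n: int, a: float, b: float) -> float:
    """u(i) from Eq. (1)."""
    return math.comb(n, i) * math.exp(log_beta(a + i, b + n - i) - log_beta(a, b))


def weighted_sse(log_a: float, log_b: float,
                 freqs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of squared differences between observed and fitted u(i)."""
    n = len(freqs) - 1
    a, b = math.exp(log_a), math.exp(log_b)
    return sum(w * (f - pmf(i, n, a, b)) ** 2
               for i, (f, w) in enumerate(zip(freqs, weights)))


def fit_beta_binomial(freqs: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Estimate (a, b) by weighted least squares.

    The search runs on ln a and ln b, so a, b > 0 holds automatically.
    """
    if weights is None:
        weights = [1.0] * len(freqs)
    # Stage 1: coarse grid over ln a, ln b in [-3, 3] (a, b roughly 0.05 to 20).
    grid = [k / 10 for k in range(-30, 31)]
    sse, la, lb = min((weighted_sse(x, y, freqs, weights), x, y)
                      for x in grid for y in grid)
    # Stage 2: compass search in eight directions, halving the step on failure.
    step = 0.1
    directions = [(1, 0), (-1, 0), (0, 1), (0, -1),
                  (1, 1), (1, -1), (-1, 1), (-1, -1)]
    while step > 1e-7:
        for da, db in directions:
            cand = weighted_sse(la + da * step, lb + db * step, freqs, weights)
            if cand < sse:
                sse, la, lb = cand, la + da * step, lb + db * step
                break
        else:  # no direction improved the fit at this step size
            step /= 2
    return math.exp(la), math.exp(lb)
```

As a quick check, feeding in the exact u(i) of a known (a, b) should return values close to that pair. With real data, the observed frequencies of d = 0, ..., 8 among chips that pass the WP test would take their place.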
Proportional Hazard Model for Reliability
To formulate a cost model for developing a burn-in policy, the reliability of the product must be described as a function of time during and after burn-in. The proportional hazard model proposed by Cox [10] has been used to describe the relationship between the lifetime of a product and covariates or explanatory variables [25, 26]. The present study employs the proportional hazard model with d as an explanatory variable to describe the reliability of chips that pass the WP test and are subjected to burn-in of a certain duration.
Proportional hazard model
The proportional hazard model with d as an explanatory variable can be expressed as follows.
where t represents the lifetime of a chip and equals zero at the start of burn-in, 2). That is, the proportional hazard model adopted in the present study is parametric in nature. Eq. (3) is a re-expression of Eq. (2) in terms of the reliability function.
where $R(t; d)$ represents the probability that a chip that passes the WP test and has d defective chips in its neighborhood survives t hours of burn-in. Taking the logarithm of both sides of Eq. When actual data are given, (5) can be estimated.
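For a proportional hazard model with baseline cumulative hazard $H_0(t)$ and coefficient $\beta$ (our notation), the reliability is $R(t; d) = \exp[-H_0(t)e^{\beta d}]$. Hence $\ln[-\ln R(t; d)] = \ln H_0(t) + \beta d$ is linear in d. The sketch below assumes that $y'$ denotes this log-log transform of the empirical survival fraction in each (t, d) cell, and that each weight is the reciprocal of a delta-method approximation of its variance, which is one plausible reading of the weights defined in the appendix. The function name and the example counts are hypothetical.

```python
import math
from typing import Tuple


def transformed_survival(survivors: int, tested: int) -> Tuple[float, float]:
    """Return (y', w) for one (t, d) cell of burn-in data.

    y' = ln(-ln R_hat) with R_hat = survivors / tested, and
    w = 1 / Var(y') from a first-order delta-method approximation:
    Var(R_hat) ~ R(1 - R)/n and dy'/dR = 1 / (R ln R), so
    Var(y') ~ (1 - R) / (n R (ln R)^2).
    """
    if not 0 < survivors < tested:
        raise ValueError("each cell needs at least one failure and one survivor")
    r_hat = survivors / tested
    y_prime = math.log(-math.log(r_hat))
    var_y = (1.0 - r_hat) / (tested * r_hat * math.log(r_hat) ** 2)
    return y_prime, 1.0 / var_y


# Hypothetical cell: 10,000 chips with a given (t, d), 37 of which fail burn-in.
y, w = transformed_survival(10_000 - 37, 10_000)
```

Cells with no failures, or with no survivors, have no finite $y'$, so the sketch rejects them rather than guessing a value.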
In Fig. 6, $y'$ is plotted against d for the 256M DRAM product subjected to two burn-in durations (t = 3 and 6 hours).
The cases where (t, d) = (3, 7), (3, 8), (6, 7), and (6, 8) In other words, there is little change in $y'$ as d increases from 0 to the change point, but after the change point, $y'$ increases (except for the case (t, d) = (3, 6)).
To incorporate the piecewise relationship between $y'$ and d into the model, Eqs. (2) and (3) are respectively modified as
where z = 0 if d < 2 and z = 1 otherwise. Subsequently, Eq. (5) is modified as follows. Table 3 shows the regression analysis results for the actual 256M DRAM data in Fig. 6 under the model in Eq. (8).
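Once $y'$ and its weights are available, a regression like the one in Table 3 is a weighted least-squares problem. The sketch below solves the weighted normal equations in plain Python. The design row (intercept, ln t, and a hinge term z(d - 2) built from the indicator z with z = 0 for d < 2) is an illustrative assumption, not necessarily the form of Eq. (8), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def features(t: float, d: int, change_point: int = 2) -> List[float]:
    """Hypothetical design row: intercept, ln t, and a hinge term in d.

    z = 0 if d < change_point and 1 otherwise; z * (d - change_point)
    stays flat before the change point and rises linearly after it.
    """
    z = 0 if d < change_point else 1
    return [1.0, math.log(t), z * (d - change_point)]


def weighted_least_squares(X: Sequence[Sequence[float]], y: Sequence[float],
                           w: Sequence[float]) -> List[float]:
    """Solve (X^T W X) beta = X^T W y by Gauss-Jordan elimination
    with partial pivoting (adequate for a handful of coefficients)."""
    p = len(X[0])
    A = [[sum(wk * xk[i] * xk[j] for xk, wk in zip(X, w)) for j in range(p)]
         for i in range(p)]
    rhs = [sum(wk * xk[i] * yk for xk, yk, wk in zip(X, y, w)) for i in range(p)]
    M = [row + [v] for row, v in zip(A, rhs)]  # augmented matrix
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]
```

In use, each (t, d) cell contributes one row `features(t, d)`, its $y'$ value, and its weight. Dropping a column from `features` corresponds to refitting without that term.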
The point (t, d) = (3, 6) in Fig. 6 deviates from the behavior of the other points; therefore, its residual is examined together with some diagnostic measures. First, the corresponding standardized residual is -2.03, which is only marginally large enough for the point to be declared an outlier [28]. The corresponding Cook's distance, leverage, and DFFITS values [28, 29] are 0.0435, 0.0405, and -0.516, respectively, indicating that the point does not have an undue influence on the regression coefficients or the predicted values. Therefore, the point is retained in the subsequent analysis.
Since the P-value of the i term in Table 3 is large, the regression analysis is repeated without the i term, and the results are shown in Tables 4 and 5.
It is noticed from Table 5 
Eq. (12) can be rewritten as Eq. (13) 
In addition, there is usually a requirement on the reliability level of the product in the field. Such a requirement is often expressed as the average failure rate (AFR) during the warranty period $t_f$, which is defined as
where $R(\cdot)$ is the field reliability function [30]. Since only the chips that survive the burn-in test are
The product reliability at the field time $t_f$ can be obtained using Eq. (15). In this paper, the field time is translated into time on the burn-in time scale using the acceleration factor (AF conditions, respectively [31]. In addition, it is assumed in the present study that the warranty period $t_f$ is one year (i.e., approximately 9,000 hours of field time and 9,000/AF hours on the burn-in time scale) and lies within the infant mortality period.
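The sketch below computes the field AFR of chips that survived a burn-in of $t_b$ hours. It uses the common definition $AFR(t_f) = -\ln R_f(t_f)/t_f$, with $R_f$ the conditional reliability after burn-in, and converts $t_f$ field hours to $t_f/AF$ burn-in hours. Whether this matches Eqs. (14) and (15) exactly is an assumption, and the Weibull reliability function is purely hypothetical. The result is expressed in FITs (failures per $10^9$ device-hours).

```python
import math
from typing import Callable

Reliability = Callable[[float], float]  # survival function on the burn-in time scale


def field_afr_fit(reliability: Reliability, t_burn_in: float,
                  warranty_field_hours: float, af: float) -> float:
    """AFR in FITs over the warranty period for chips that survived burn-in.

    Field hours are converted to burn-in hours by dividing by the acceleration
    factor af. AFR = -ln R_f(t_f) / t_f, where R_f is the conditional survival
    probability of a chip that already survived t_burn_in hours of burn-in.
    """
    t_end = t_burn_in + warranty_field_hours / af
    r_field = reliability(t_end) / reliability(t_burn_in)
    return -math.log(r_field) / warranty_field_hours * 1e9


def weibull(eta: float, shape: float) -> Reliability:
    """Hypothetical Weibull survival function; shape < 1 gives a decreasing hazard."""
    return lambda t: math.exp(-((t / eta) ** shape))


infant = weibull(eta=1e7, shape=0.5)  # hypothetical infant-mortality behavior
for tb in (0, 8, 17):
    print(tb, round(field_afr_fit(infant, tb, 9000, af=100), 1))
```

With a shape parameter below one (a decreasing hazard, as in the infant mortality period), a longer burn-in lowers the computed AFR. With shape one (a constant hazard), burn-in has no effect on it.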
Based on the above, the optimization problem for determining the burn-in duration $t_b$ for the single burn-in policy can be formulated as follows. Since the burn-in duration is usually set in one-hour increments in practice, the optimal burn-in duration is also determined in one-hour units using a grid search over the search region [0, 96] hours.
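The one-hour grid search over [0, 96] can be sketched as follows. The cost function is a hypothetical stand-in for Eq. (16): $c_1$ per burn-in hour, $c_2$ per burn-in failure, and $c_3$ per field failure within the warranty period. The AFR requirement (1,000 FITs by default) is the only constraint. The reliability function is user-supplied, and all names are illustrative.

```python
import math
from typing import Callable, Optional, Tuple

Reliability = Callable[[float], float]  # survival function on the burn-in time scale


def afr_fit(R: Reliability, tb: float, tf: float, af: float) -> float:
    """Field AFR (FITs) over tf field hours after tb hours of burn-in."""
    return -math.log(R(tb + tf / af) / R(tb)) / tf * 1e9


def expected_cost(R: Reliability, tb: float, tf: float, af: float,
                  c1: float, c2: float, c3: float) -> float:
    """Hypothetical per-chip cost: c1 per burn-in hour, c2 per burn-in
    failure, c3 per field failure within the warranty period."""
    r_b = R(tb)
    r_end = R(tb + tf / af)
    return c1 * tb + c2 * (1 - r_b) + c3 * (r_b - r_end)


def optimal_single_burn_in(R: Reliability, tf: float, af: float,
                           c1: float, c2: float, c3: float,
                           afr_max: float = 1000.0,
                           t_max: int = 96) -> Optional[Tuple[int, float]]:
    """Grid search over tb = 0, 1, ..., t_max hours; returns (tb, cost),
    or None if no duration meets the AFR requirement."""
    best = None
    for tb in range(t_max + 1):
        if afr_fit(R, tb, tf, af) > afr_max:
            continue  # violates the field reliability requirement
        cost = expected_cost(R, tb, tf, af, c1, c2, c3)
        if best is None or cost < best[1]:
            best = (tb, cost)
    return best
```

Because the search space has only 97 points, exhaustive enumeration is simpler and safer than a continuous optimizer, and it naturally respects the one-hour granularity used in practice.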
Dual Burn-in Policy
In the proposed dual burn-in policy, the chips that pass the WP test are first classified into two groups using the boundary value $d_c$ of d. Then, the chips whose d is less than or equal to $d_c$ are subjected to burn-in of duration $t_{bL}$, while the chips whose d is greater than $d_c$ are subjected to burn-in of duration $t_{bH}$, where $t_{bL} < t_{bH}$. The justification for this policy is as follows. A chip with a larger value of d is more likely to fail early in the field; therefore, its burn-in duration is extended to increase the chance of screening it out and thereby to meet the field reliability requirement.
On the other hand, when d is small, the burn-in duration is reduced to enhance productivity and reduce cost.
Under the proposed dual burn-in policy, the expected total cost per chip consists of the expected direct burn-in, burn-in failure, and field failure costs per chip for each group, plus the handling cost per chip ($c_4$).
Then, $AFR(t_f)$ can be calculated using Eq. (14).
The optimization problem for the dual burn-in policy can be formulated in the same way as for the single burn-in policy (see Eq. (16)). To compare the two policies, define
$$\delta = \left(\text{optimal expected total cost per chip for the single burn-in policy}\right) - \left(\text{optimal expected total cost per chip, except } c_4\text{, for the dual burn-in policy}\right).$$
That is, if $c_4$ is smaller than $\delta$, the proposed dual burn-in policy is preferred to the single burn-in policy, and vice versa. Table 6 shows the computational results. The optimal dual burn-in policy classifies the chips that pass the WP test into two groups using $d_c = 3$ as the boundary value and applies burn-in durations of 15 and 29 hours to the first and second groups, respectively. For the single burn-in policy, the optimal burn-in duration turns out to be 17 hours, and $\delta = 1.238$. Since $c_4$ is usually smaller than $c_1$ (= 1), and hence smaller than $\delta$, the dual burn-in policy is more cost-effective than the single burn-in policy for the product considered.
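The comparison can be sketched by enumerating the boundary value $d_c$ and all pairs $t_{bL} < t_{bH}$, mixing the per-d results with the distribution u(i) of d, and then computing $\delta$ as the optimal single-policy cost minus the optimal dual-policy cost without the handling cost $c_4$. The cost terms are the same hypothetical stand-in used for the single policy ($c_1$ per burn-in hour, $c_2$ per burn-in failure, $c_3$ per field failure), and the AFR is assumed to be pooled over all shipped chips. R(t, d) and u are user-supplied, and all names are illustrative.

```python
import math
from typing import Callable, Sequence, Tuple

# R(t, d): survival on the burn-in time scale for a chip with d defective neighbors.
CondReliability = Callable[[float, int], float]


def policy_metrics(R: CondReliability, u: Sequence[float], durations: Sequence[int],
                   tf: float, af: float, c1: float, c2: float,
                   c3: float) -> Tuple[float, float]:
    """Expected cost per chip (handling cost excluded) and pooled field AFR in FITs.

    durations[i] is the burn-in time applied to chips with d = i; u[i] = Pr{d = i}.
    """
    x = tf / af  # warranty period on the burn-in time scale
    cost = shipped = survive_warranty = 0.0
    for i, (ui, tb) in enumerate(zip(u, durations)):
        r_b, r_end = R(tb, i), R(tb + x, i)
        cost += ui * (c1 * tb + c2 * (1 - r_b) + c3 * (r_b - r_end))
        shipped += ui * r_b
        survive_warranty += ui * r_end
    afr = -math.log(survive_warranty / shipped) / tf * 1e9
    return cost, afr


def compare_policies(R, u, tf, af, c1, c2, c3, afr_max=1000.0, t_max=96):
    """Return (single, dual, delta) with single = (cost, tb),
    dual = (cost, d_c, t_bL, t_bH), and delta = single cost - dual cost."""
    n = len(u) - 1
    single = None
    for tb in range(t_max + 1):
        cost, afr = policy_metrics(R, u, [tb] * (n + 1), tf, af, c1, c2, c3)
        if afr <= afr_max and (single is None or cost < single[0]):
            single = (cost, tb)
    dual = None
    for d_c in range(n):  # group L: d <= d_c, group H: d > d_c
        for t_low in range(t_max + 1):
            for t_high in range(t_low + 1, t_max + 1):
                durations = [t_low if i <= d_c else t_high for i in range(n + 1)]
                cost, afr = policy_metrics(R, u, durations, tf, af, c1, c2, c3)
                if afr <= afr_max and (dual is None or cost < dual[0]):
                    dual = (cost, d_c, t_low, t_high)
    delta = None if single is None or dual is None else single[0] - dual[0]
    return single, dual, delta
```

Under this sketch, the dual policy would be preferred whenever the handling cost $c_4$ is below the returned `delta`. The triple loop is brute force, but with nine values of d and at most 97 durations it remains small.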
Since the estimates of $c_2$, $c_3$, and AF may be uncertain, a sensitivity analysis is conducted with respect to these parameters. It is first assumed that the true values of $c_2$, $c_3$, and AF lie within ±30% of the estimated values. Then, for each parameter, three true values are selected: 70%, 100%, and 130% of the estimate.
This results in 27 combinations of true parameter values, and for each combination, the true expected total cost per chip is calculated for each policy determined using the estimated parameter values. Table 7 shows the sensitivity analysis results. First, the $\delta$ values range from 1.153 to 1.321, implying that the advantage of the proposed dual burn-in policy is maintained under the assumed uncertainties in the parameter estimates. On the other hand, the actual AFR values exceed the required value (1,000 FITs) for both policies when the true AF is smaller than the estimated one. This implies that overestimating AF should be avoided as much as possible.
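The 27 parameter combinations in the sensitivity analysis are the Cartesian product of three scaling factors applied to the estimates of $c_2$, $c_3$, and AF. The minimal sketch below builds them and summarizes the resulting $\delta$ values. The two cost-evaluation callbacks are hypothetical placeholders for the true expected total cost of the fixed single and dual policies.

```python
from itertools import product
from typing import Callable, List, Tuple

FACTORS = (0.7, 1.0, 1.3)  # 70%, 100%, and 130% of each estimate

Params = Tuple[float, float, float]  # (c2, c3, AF)


def sensitivity_grid(c2_hat: float, c3_hat: float, af_hat: float) -> List[Params]:
    """All 3 x 3 x 3 = 27 combinations of 'true' (c2, c3, AF)."""
    return [(c2_hat * f2, c3_hat * f3, af_hat * fa)
            for f2, f3, fa in product(FACTORS, repeat=3)]


def delta_range(true_cost_single: Callable[..., float],
                true_cost_dual: Callable[..., float],
                combos: List[Params]) -> Tuple[float, float]:
    """Minimum and maximum of delta over the combinations.

    Both callbacks (hypothetical) return the true expected total cost per chip
    of a policy fixed using the estimated parameters; the dual cost excludes c4.
    """
    deltas = [true_cost_single(*p) - true_cost_dual(*p) for p in combos]
    return min(deltas), max(deltas)
```

Keeping the policies fixed while varying only the "true" parameters mirrors the question the analysis asks: how much a policy chosen from imperfect estimates loses when those estimates are off.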
Conclusion
Burn-in plays an important role in screening out products that are likely to fail early in the field, and thereby in meeting the reliability requirement. However, it is costly and may degrade productivity; therefore, developing an efficient burn-in policy is imperative for manufacturers.
In this paper, a dual burn-in policy is developed, and its effectiveness relative to the single burn-in policy is illustrated using actual 256M DRAM data from a semiconductor manufacturing process. The proposed policy first classifies the chips that pass the WP test into two groups according to the number of defective neighborhood chips and applies different burn-in durations to the two groups. Then, a cost-based optimization problem with a reliability requirement is formulated and solved for the decision variables of the proposed policy. To the best of the authors' knowledge, the present study is the first attempt to use WP test results in developing a burn-in policy.
The present study may be extended in several directions. In the proposed dual burn-in policy, the number of groups for the chips that pass the WP test is restricted to two; policies with more than two groups need to be investigated and compared with the present one. In addition, for defect-tolerant memory products, the number of repairs made to a chip at the WP test is another important indicator of its quality level. Developing a burn-in policy based on this indicator and comparing its performance with the present one may be a fruitful area of future research.
Acknowledgment
The authors thank the editor and a referee for their encouragement and suggestions, which improved this article.
Finally, $w_{ij}$ is defined as the reciprocal of $\mathrm{Var}(y'_{ij})$.
B. Derivation of ( ) 
T t t T t T t t T t d i
b i b f b b T t d i d i d i T t t T t d i T t d i = > = = = × = < + > = × > = å () ( ) ( ) 8
T t t T t S i u i T t S i T t S i
= é ù > + > ê ú = > - ê ú > ë û å (C.2) ( ) ( ) { } ( ) { } ( ) { } 8 0 Pr , Pr 1 Pr , b f b b i T t t S i u i T t S i T t S i = é ù > + ê ú = > - ê ú > ë û å (C.3)( ) ( ) {
T t t S i u i T t S i T t S i
= é ù > + ê ú = > - ê ú > ë û å (C.4)( ) ( ) {
T t t d i T t T t
= > + = > = > å () ( ) ( ) ( ) ( ) ( ) ( ) 8
