Abstract-Large process variations in recent technology nodes present a major challenge for the timing analysis of digital integrated circuits. The optimization decisions of a statistical delay test generation method must therefore rely on the probability of detecting a target delay fault with the currently chosen test vector pairs. However, the huge number of probability evaluations in practical applications creates a large computational overhead.
I. INTRODUCTION
Process variations and high defect densities in recent technology nodes have emerged as new challenges for the timing analysis and the delay test of digital integrated circuits [1] - [3] . The uncertainty in the delays of all circuit components severely degrades the quality and reliability of all delay tests, leading to many test escapes [4] . More stringent path sensitization conditions can reduce the risk of test escapes, but these conditions are not satisfiable for a large number of paths [5] .
The detection of a target delay fault by a single test vector pair occurs with a certain delay fault detection probability, which must be increased by the application of additional test vector pairs to minimize the risk of test escapes [6] . This probability depends on many delay test parameters [7] - [13] , which must be optimized to find a suitable compromise between the final delay test quality and the test cost. Apart from the test set itself, these parameters include the test clock cycle time [9] and the masking of the combinational network outputs for fasterthan-at-speed testing [10] . To guide the delay test parameter optimization, the delay fault detection probabilities for all affected target delay faults must be estimated after every delay test parameter modification. To avoid a huge computational overhead, the knowledge of previous probability estimation results must be exploited to incrementally compute the delay fault detection probability after every parameter modification.
The large runtime of the delay fault detection probability estimation has prompted some research in a related but simpler problem [11] , which is obtained by neglecting all structural and spatial correlations. However, the resulting approximation error is acceptable only for tiny delay variations in the mature and classical manufacturing technology space [14] .
On the other hand, Monte Carlo simulations naturally consider structural and spatial correlations even in very complex circuit models. To avoid simulating the entire test set after every removal or insertion of a test vector pair, [12] proposed to store the random delay values and test results for all iterations and test vector pairs. However, a Monte-Carlo simulation of the entire circuit is inefficient because only sufficiently long paths, which are also sensitized by the new test vector pair, can have a significant impact on the fault detection probability.
In [9] , the authors presented a sensitization analysis in combination with a block based approach to compute the test vector pair delay distribution. While the block based approach could easily be extended for incremental computations, it also requires all gate and interconnect delays to have a normal distribution. However, this requirement is not satisfied for low power applications [15] and localized delay faults, which are often assumed to have an exponentially distributed fault size.
To alleviate these limitations, the authors of [13] introduced a path based approach which only requires normally distributed path delays. However, the efficiency of this approach deteriorates with the test set size, because a multivariate normal integral of increasingly high dimension must be approximated after every delay test parameter modification.
To address this problem, the paper at hand presents a new path based algorithm, which incrementally approximates the fault detection probability after every delay test parameter update. The algorithm minimizes the runtime for the probability computations by efficiently adapting an approximation of the test subset delay distribution, which is defined by the joint delay distribution of all test vector pairs in the test subset. The accuracy of the delay fault detection probability approximation only depends on the new values of the delay test parameters and is not affected by the delay test parameter update type or the order, in which these updates are applied.
The incremental analysis is based on a new extension of the Clark approximation [16] , which introduces an efficient method to compute the covariance Cov(U, V ) between the two maxima U = max(X 1 , X 2 ) and V = max(X 3 , X 4 ) of jointly normally distributed random variables X 1 , . . . , X 4 .
The remainder of this paper is organized as follows. The proposed incremental computation of the delay fault detection probability is presented in section II. Section III describes the extension of the Clark maximum approximation method. The experimental results for several NXP benchmark circuits are shown in section IV and conclusions are drawn in section V.
II. INCREMENTAL COMPUTATION OF THE DELAY FAULT DETECTION PROBABILITY
It is assumed that the whole circuit is subject to delay variations, which may cause path delay faults by itself or in combination with a single gate delay fault. The subset of all test vector pairs in the test set, which are applicable for the detection of the gate delay fault, is referred to as test subset. Furthermore, the time between the application of a test vector pair to the combinational network inputs and the stabilization of all network outputs is called test vector pair delay. The following subsections focus on the insertion and the removal of a test vector pair from the test subset. The update of other delay test parameters is addressed in subsection II-E.
Following an update of the test subset, the proposed algorithm incrementally approximates the probability of capturing an incorrect logic value into at least one scan flip-flop. The incremental computation is based on the efficient approximation and adaptation of the test subset delay distribution, which is the joint delay distribution of all test vector pairs in the test subset. Only relatively few distribution parameter entries are affected by a delay test parameter update, which yields substantial savings in runtime. Moreover, the small dimension of the distribution allows the efficient approximation of the timing failure probability, which is a tight upper bound for the delay fault detection probability. A timing failure occurs if at least one combinational network output has not stabilized to its final logic value within the test clock cycle time.
All major steps of the algorithm are shown in Fig. 1 . After the insertion of a new test vector pair, a sensitization analysis of this test vector pair is performed, which yields all complete transition paths along which transitions propagate to the combinational network outputs (A). The delay of the new test vector pair is then described by a normal distribution, which approximates the maximum of all transition path delays (B). This normal distribution subsequently extends the multivariate normal approximation of the test subset delay distribution by one dimension (C). Finally, the timing failure probability approximation is obtained by numerical integration over the resulting multivariate normal distribution (D).
After the removal of a test vector pair, the corresponding parameter entries of the multivariate normal distribution are deleted (C). A numerical integration over the reduced distribution yields the timing failure probability approximation (D).
The following subsections present a detailed description of all major steps of the algorithm.
A. Sensitization Analysis of New Test Vector Pair
The sensitization analysis is based on a single pass eventdriven timing simulation of the new test vector pair using only nominal delay values [9] , [13] . A set of complete transition paths is subsequently identified by tracing all transitions at the outputs of the combinational networks back to the inputs of the networks. The delay of a transition path is the sum of the delays of all circuit components along the path, where each component delay is defined as the sum of the nominal delay value, the inter-die and the intra-die part of the delay variation. Any complete transition path, which terminates at an unmasked combinational network output, is called long transition path if its probability of exceeding the test clock cycle time is above a user defined threshold. In the following, X 1 , X 2 , . . . , X n will denote the delays of the resulting n long transition paths.
The sum of all component delays along a complete transition path converges rapidly to a normal distribution for most practical models of correlation, especially if the distributions of the gate delays are close to a normal distribution [17] . This property is a major advantage of the path based approach and also justifies the assumption, that the joint distribution of all long transition path delays can accurately be described by a multivariate normal distribution.
B. Approximation of New Test Vector Pair Delay Distribution
The delay of the new test vector pair, which extends the current test subset from size k − 1 to size k, is defined as
and approximated using the extended Clark approximation method, which is presented in section III. The additional flexibility of the extension allows the formation of balanced tree like dataflow graphs (Fig. 2b) , in which only those maximum operations, which lie on the path to the final result, must be recomputed if modifications to the long transition path delays are made.
On the other hand, the Clark approximation always forms a single chain of maximum operations (Fig. 2a) , so that on average half of the maximum operation results depend on the value of a single variable. After the removal of the ith test vector pair, the dimension of µ and Σ reduces from k to k − 1 by deleting all entries, which correspond to the ith test vector pair delay Y i .
The time for the extension of the distribution increases linearly with the test subset size, assuming that all test vector pairs sensitize the same number of long transition paths.
D. Approximation of Timing Failure Probability
Finally, this step approximates the probability of a timing failure, which occurs if at least one test vector pair delay
The timing failure probability is computed as
and approximated by a multivariate normal integral over the test subset delay distribution approximation
where φ(·) denotes the probability density function of the multivariate normal distribution.
The probability Ψ is efficiently approximated using recent numerical integration algorithms [18] , [19] . To ensure the positive definiteness of Σ for linearly dependent test vector pair delays and approximation errors, all diagonal entries of Σ are multiplied with a very small constant greater than one.
E. Update of other Delay Test Parameters
The
III. EXTENDED CLARK APPROXIMATION METHOD
Let U = max(X 1 , X 2 ) and V = max(X 3 , X 4 ) denote two maxima of jointly normal random variables. The extended Clark approximation uses the formulas for the exact moments of U and V , presented by Clark [16] , but extends the flexibility of the approximation by providing a method for the accurate computation of the covariance Cov(U, V ).
The classical Clark approximation computes Cov(U, X 3 ) and Cov(U, X 4 ) using the formula Cov(max( Y 3 ) . However, the new random vector (X 3 , X 4 , U ) does not have a trivariate normal distribution. As a consequence, this formula does not compute the accurate value of Cov(max(X 3 , X 4 ), U ).
This extension computes the accurate covariance from
where the first raw moments E[U ] and E[V ] are available from the Clark approximation [16] . The cross-moment
To simplify the solution for the above expression, let the random vector W be a linear transformation of X, defined as
The transformed vector W has a multivariate normal distribution with mean vector
where µ i denotes the mean of X i and σ i,j denotes the covariance Cov(X i , X j ) with i, j ∈ {1, . . . , 4}. Only the upper triangular part of the covariance matrix A = {a i,j } is presented here due to spacial limitations. The lower triangular part is defined by symmetry A = A T . Suppose that the component W i has mean θ i , standard deviation a i = √ a i,i and correlation ρ i,j = a i,j /(a i a j ) with another component W j . By introducing the following notations
and using the property
the formulas for the truncated multivariate normal distribution, presented in [20] and [21] , can be simplified to
where φ k denotes the probability density function and Φ k denotes the cumulative distribution function of the k-variate standard normal distribution. Hence, the computation of the covariance Cov(max(X 1 , X 2 ), max(X 3 , X 4 )) involves several univariate normal integrals but only one bivariate normal integral, which can be evaluated using a fast double precision approximation method, recently presented by Genz [19] . For the special case ρ 2,4 = 0 the computation simplifies to
Finally, if either U or V has a normal distribution, the proposed extension simplifies to the special case considered by the Clark approximation formula for Cov(max(Y 1 , Y 2 ), Y 3 ).
IV. EXPERIMENTAL RESULTS
Several NXP benchmark circuit were speed-optimized using a commercial synthesis tool and mapped to the NanGate 45nm Open Cell Library [22] . The gate model for this approach conforms to the Verilog HDL [23] standard, but instead of real numbers, every delay value X of a gate has a normal distribution with variance var(X) = (c v E[X]) 2 and mean E[X] equal to the nominal delay value from the standard delay format description of the synthesized circuit. Based on predictions for future process technology nodes [14] , a variation coefficient of c v = 0.25 was selected. Interconnect delays and spatial correlations would require the analysis of full chip layouts, which had not been designed to avoid an unnecessary complex experimental setup. The test clock cycle time T clk was determined, such that delay variations would not cause timing failures in 95% of the fault free chips. For every circuit, a set of 20000 randomly chosen single gate delay faults was created. Next, a test subset for each delay fault was generated by a suitable example ATPG algorithm. At first, the 1000 longest paths through the fault site were found by a commercial static timing analysis tool. Thereby, the number of paths terminating at the same combinational network output was limited to 100. The resulting set of paths was sensitized with a commercial ATPG tool. Finally, the delay fault size was chosen, such that the delay of the longest complete transition path through the fault site was equal T clk .
To evaluate the incremental probability computation, the example ATPG algorithm extended an initially empty test subset by inserting single test vector pairs in decreasing order of the greatest complete transition path delay. The removal of a randomly chosen test vector pair only requires a rerun of the numerical integration, which is also presented in Table I . Only the results of those update operations are shown, which resulted in a test subset of size 1, 5, 10 or 20 test vector pairs. Alternative delay test parameter updates are evaluated by the results of the corresponding individual steps of the algorithm.
The reference fault detection probability P was computed by crude full-circuit Monte Carlo simulations of 10 4 iterations, which results in a sufficiently small estimated upper bound of 0.0098 for the half length of the 95% confidence interval. A fault is said to be detected if during one iteration a combinational network output has an unexpected logic value after the test clock cycle time T clk . The runtime of a Monte Carlo simulation is dominated by the large number of random delay values, which are required for each iteration. The Verilog HDL standard distinguishes between different pinto-pin, asymmetric rising/falling and conditional path delays. Therefore up to eight delay values are required for a twoinput gate. The random number generation utilizes highperformance implementations of the Box-Muller transform and the Mersenne Twister pseudo-random number generator from the Intel Math Kernel Library [24] . All programs were implemented in C, C++ and FORTRAN and executed on Intel Core i7-2600K processor workstations with 20GB RAM.
Column (1) gives the name of the NXP benchmark circuit, and column (2) shows the number of test vector pairs |Θ| in the test subsets Θ after the insertion or removal of a test vector pair. All following columns (3)- (14) present average values over all randomly chosen delay faults in each circuit.
The average runtime of the crude Monte Carlo simulation is shown in column (3). The following columns (4)- (11) and (12)- (14) present the results for the insertion and the removal of a test vector pair, respectively.
NXP circuit |Θ|
[ms]
[ms] Columns (4)- (6) present the results of the sensitization analysis. A complete transition path with delay X was considered long, if E[X] + 3 var(X) ≥ T clk . The average number of long transition paths |Π|, identified for all test vector pairs in the test subset Θ, is shown in column (4). The approximation error δ of the sensitization analysis is estimated by
and the average absolute value of δ is presented in column (5). In general, the sensitization of a long transition path is restricted to a subset of all possible delay realizations, due to hazards and variations in the arrival times of the transitions at the off-path inputs. Hence, the nominal delay value based sensitization analysis might identify a set of long transition paths, which are not representative for the arrival time of the last transition at the combinational network outputs. However, the resulting error δ becomes very small as the number of long transition paths increases. The average runtime of the sensitization analysis is shown in column (6).
The following steps of the algorithm were omitted, unless at least one long transition path was found. Columns (7) and (8) show the results from the extended Clark approximation for the computation of the test vector pair delay distribution and the extension of the multivariate normal distribution. The extended Clark approximation error is estimated by
The average absolute error, shown in column (7), demonstrates the superior accuracy of the extended Clark approximation for the proposed test subset delay distribution extension. However, experiments with more general statistical timing analysis problems have shown its high susceptibility to skewed intermediate results, which resulted in high approximation errors.
The average runtime for the delay distribution extension, shown in column (8) , is very small and increases almost linear with the size of the test subset.
The test subsets for p330k contains some test vector pairs, which sensitize thousands of long transition paths, with many paths being sensitized by multiple test vector pairs of the same test subset. To minimize the runtime of this step, it would be possible to ignore all recurrences of a long transition path delay without affecting the timing failure probability, unless this path delay is also deleted by a subsequent removal of a test vector pair or the masking of a transition path output.
The timing failure probability Ψ was approximated using the FORTRAN routines BVND, TVTL and MVNDST, which were developed by Genz [18] [19] . The average runtime, shown in column (9) and (12), increases rapidly for small and slowly for larger dimensions |Θ| ≥ 10. The parameters of MVNDST were chosen, such that the estimated relative error was below 5 · 10 −3 while using at most 10 5 points. These routines were also used for the presentation of the approximation errors δ and ε of the individual steps.
Columns (10)- (11) and (13)- (14) present the average runtime and the accuracy of the complete algorithm after the insertion or the removal of a test vector pair, respectively.
The approximation error of the final result is estimated by = Ψ − P , where Ψ denotes the approximation of the timing failure probability and P denotes the fault detection probability. The average absolute incremental analysis error, shown in columns (10) and (13), is small and mainly determined by the accuracy of the sensitization analysis.
Columns (11) and (14) highlight the very large speedup of the proposed algorithm. Compared to the previous approach [13] , the new algorithm is between 1.2 and 26.5 times faster. The speedup is defined by the ratio of the average runtime of the Monte Carlo simulation (column (3)) and the average runtime of the incremental computation over all delay faults.
Similary, the speedup for other delay test parameter updates can be computed from the sum of the average runtimes of the individual steps, which are required for the particular update. For example, the runtime after the reduction of the test clock cycle time or the modification of the combinational network output masks equals the sum of columns (8) and (9) .
V. CONCLUSION Delay variations in recent technology nodes reduce the quality and reliability of all delay tests. To find a suitable tradeoff between the statistical delay test quality and the test cost, statistical test generation methods must optimize all delay test parameters based on the probability of detecting a delay fault. To efficiently evaluate the huge number of probability estimations, an efficient incremental algorithm has been presented, which is suitable for the inner loop of automatic test pattern generation methods. The approach was compared with results of extensive Monte Carlo simulations and has consistently shown a very large speedup with only a small loss of accuracy.
VI. ACKNOWLEDGEMENT

