I. INTRODUCTION
As VLSI technology scales into the nanometer domain, VLSI design faces new challenges from increasingly significant parametric variations and prevalent defects. These result from manufacturing process limitations, e.g., the resolution limits of lithography processes in manufacturing subwavelength layout features (which lead to lateral dimension variations for layout features), and ultimately from the uncertainty principle of quantum physics (which leads to variations of vertical dimensions of layout features, dopant concentration, temperature, stress, and so on). Aggressive VLSI design further aggravates the problem, e.g., with increasing device density, clock frequency, and on-chip temperature (which contribute to the increase of parametric variations and defect densities), and with tighter design constraints (which increase the vulnerability of VLSI designs to such parametric variations and defects).
One of the consequences of such unavoidable significant variabilities and prevalent defects is the increase of performance variability that has become the bottleneck for VLSI performance scaling. As a result, new statistical timing verification and delay test techniques are much needed.
To consider on-chip parametric variations, traditional static timing analysis (STA) has been extended to statistical STA (SSTA), which further serves as the timing engine in the existing delay test pattern generation techniques. The existing path delay test pattern generation methods first identify the longest paths by STA or SSTA, and then apply a traditional automatic test pattern generation (ATPG) method that enables the critical paths and generates test vectors [7], [8], [14], [36], [37]. Relying on input-oblivious STA/SSTA as the timing engine for path delay test has several problems. First, STA/SSTA is not guaranteed to exclude all false paths. False path identification is an NP-hard (nondeterministic polynomial-time hard) problem, for which STA/SSTA largely relies on user input. Second, STA/SSTA provides pessimistic upper and lower bounds on path delays, while accurate path delays are needed for path delay fault coverage in path delay test. In this paper, we propose to rely on an input-aware timing analyzer as the timing engine for path delay test.
We observe that VLSI power estimation and statistical timing analysis study the same stochastic signal switching activities, and propose to leverage some of the existing techniques in VLSI power estimation for statistical timing analysis. While power estimation studies the average signal switching activities, statistical timing analysis studies the extremes of the stochastic process. Three categories of methods exist with different efficiency and accuracy tradeoffs in power estimation: 1) static; 2) statistical; and 3) vector-based dynamic approaches such as simulation and testing [23]. Similarly, three categories of methods exist in statistical timing analysis: 1) SSTA; 2) statistical timing analysis based on probabilistic techniques, such as those applied in power estimation; and 3) Monte Carlo [simulation program with integrated circuit emphasis (SPICE)] simulation and testing, which provide the most accurate and trustworthy results. We leverage signal-probability-based signal switching activity analysis techniques from power estimation, and develop an input-aware statistical timing analysis method, signal-probability-based statistical timing analysis (SPSTA).¹

¹ A similar work can be found in [17], which extends signal-probability-based signal switching activity analysis to calculate signal switching probabilities for each timing window interval for crosstalk-aware power estimation. The differences between this paper and [17] include: 1) the goals are different: this paper aims at statistical timing analysis, while [17] aims at power estimation; 2) this paper calculates linear signal transition temporal occurrence probability functions (topf), while [17] calculates signal switching probabilities at discrete time steps; and 3) this paper does not include glitches, while [17] takes glitches into account.
We apply SPSTA to VLSI delay test pattern generation, and achieve the first input-aware statistical timing analysis-based delay test pattern generation method. We observe that in the presence of performance variation, the worst-case performance is more likely to occur when multiple signals propagate simultaneously, and propose to generate delay test patterns that enable signal propagation networks for reduced delay test size and improved delay fault coverage. Our proposed SPSTA-based path delay test pattern generation method includes: 1) SPSTA timing analysis; 2) identification of timing-critical signal propagation paths/networks; and 3) application of ATPG techniques such as backtracing and logic implication that remove false signal propagation paths/networks and further improve SPSTA timing analysis accuracy.
We propose top timing-critical path coverage as a simple delay fault coverage metric. In our experiments based on the ISCAS'89 benchmark circuits, we observe that an extremely large number of Monte Carlo simulation runs are required to achieve a golden top timing-critical path ranking list, which implies that the common practice of functional at-speed test is unlikely to achieve satisfactory delay fault coverage. Compared with Monte Carlo simulation results, statistical static timing analysis simplified (SSTA-S) (or depth-based statistical analysis that has been implemented in Synopsys PrimeTime Advanced on-chip variation (OCV) technology [32] and provides the basis for a number of existing delay ATPG algorithms [7], [8], [14], [36], [37]), and SPSTA achieve an average of 5.8%, 8.4%, and 3.6% inaccuracy in maximum signal arrival time estimate. Our experimental results based on ISCAS'89 benchmark circuits further show that the state-of-the-art delay test pattern generation method SSTA-TQM-BnB achieves an average of 47.32%, 45.14%, and 57.98%, SPSTA-based VLSI delay test pattern generation (SPSTA-DTPG) achieves an average of 57.41%, 61.43%, and 68.05%, while SPSTA-DTPG with test pattern compaction (SPSTA-DTPG-C) achieves an average of 83.09%, 87.48%, and 90.30% coverage of the top 50, 100, and 200 timing-critical paths, respectively.
The rest of this paper is organized as follows. We present the existing delay test ATPG and statistical timing analysis techniques in Section II, and present our proposed input-aware statistical timing analysis-based delay test ATPG methods in Section V. We present our experimental results in Section VI, before we conclude in Section VII.
II. BACKGROUND

A. Existing Delay Test Generation Methods
The increasingly tight performance constraints and significant performance variations have made delay test increasingly important and difficult. Early delay test or at-speed structural test methods are based on the transition fault model, which assumes that any path through a faulty gate leads to a delay fault [6], [15]. Goel et al. [12] propose to select transition faults at high fan-out nets since such faults affect more signal propagation paths. Yilmaz et al. [35] propose to assign each transition fault an occurrence probability, and propagate such probabilities based on signal probabilities.
Later techniques are based on the gate delay fault model or the as-late-as-possible transition fault model, which assumes that the longest path through a faulty gate leads to a delay fault [13], [20], [29]. The state-of-the-art delay test vector generation techniques are based on the path delay fault model, which considers delay faults due to accumulation of small delay variations in the presence of parametric variations. These techniques: 1) perform SSTA; 2) backtrace from the endpoints in a branch-and-bound algorithm and find the top N timing-critical paths based on a test quality metric (TQM); and 3) apply a traditional ATPG algorithm and generate the delay test vectors for the top N timing-critical paths [7], [8], [14], [36], [37]. The TQM is defined as
for a group of paths $P$ whose timing slack is $S_P$. The timing slack of all the paths $P$ going through a path segment $Z_{1,n} = (a_1, a_2, \ldots, a_n)$ is given by
$$S_P = \mathrm{RAT}(a_n) - \mathrm{AT}(a_1) - D(Z_{1,n}) \tag{2}$$
where $\mathrm{AT}(a_1)$ is the arrival time at $a_1$, $\mathrm{RAT}(a_n)$ is the required arrival time at $a_n$, and $D(Z_{1,n})$ is the delay of path segment $Z_{1,n}$. Because the actual arrival time $\mathrm{AT}(a_1)$ is the maximum signal arrival time along any path at node $a_1$, and the required arrival time $\mathrm{RAT}(a_n)$ is the minimum required arrival time for any path at node $a_n$, the path segment timing slack is the minimum of the timing slacks of all paths through the path segment, and equals the timing slack of the critical path through the path segment. Any path $p_i$ in path group $P$ does not have a smaller timing slack $s_i$ or a larger TQM
This upper bound provides the basis for a branch-and-bound algorithm.
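The slack bound can be illustrated with a minimal Python sketch. It covers only the slack side of the bound (the TQM expression is not reproduced here), and the function names `segment_slack` and `should_prune` as well as the pruning rule are illustrative assumptions rather than the algorithm of [7], [8], [14], [36], [37].

```python
def segment_slack(at_first, rat_last, segment_delay):
    """Slack shared by all paths through a segment: RAT(a_n) - AT(a_1) - D(Z_1n)."""
    return rat_last - at_first - segment_delay


def should_prune(seg_slack, kept_slacks, n):
    """Hypothetical pruning rule for a top-N search.

    No path through the segment can have a slack smaller than seg_slack, so
    once N paths are kept and none of them has a slack larger than this
    bound, extending the segment cannot produce a more critical path.
    """
    return len(kept_slacks) >= n and seg_slack >= max(kept_slacks)
```

For instance, with AT = 2.0, RAT = 10.0, and a segment delay of 5.0, the segment slack is 3.0; if the two kept paths already have slacks 1.0 and 2.0 and N = 2, the segment would be pruned.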
Test compaction improves stuck-at fault coverage by merging test vectors with don't cares. However, in path delay test, it is common that many top critical paths share long common segments and differ only in a few gates. While their delays are largely correlated, such structurally-correlated critical paths cannot be merged by the traditional test compaction methods. As a result, to avoid generating multiple test patterns for structurally-correlated timing-critical paths, the existing path delay test generation methods: 1) resort to different path delay fault coverage metrics other than delay space, such as fabrication-process parameter space [31] ; 2) select top critical paths for the highest joint path metric, i.e., the maximum path delay fault coverage when a group of critical paths are tested [37] ; or 3) select top critical paths for the highest detection probability, i.e., the highest conditional probability that when one path is tested, all the correlated paths are tested [28] . As these methods skip some top critical paths, their effect on path delay fault coverage depends on the correlation between the skipped and the tested critical paths.
B. Statistical Timing Analysis
The quality of test pattern generation methods depends on the accuracy of the employed statistical timing analysis techniques. However, without taking input patterns into consideration, SSTA is designed to provide pessimistic bounds for guardbanding-based design rather than actual timing performance and ranking of timing-critical paths for delay test vector generation. We review the status of statistical timing analysis techniques as follows.
SSTA was developed based on STA because as technology scales, the increasingly significant on-chip parametric variations result in increasing performance variation and affect timing yield. Early SSTA methods are in two categories. Block-based SSTA [2], [33] computes a signal arrival time distribution function for each node, e.g., in a breadth-first traversal of the netlist. Path-based SSTA [25], [26] computes a signal arrival time distribution function for each node in each (near-critical) path, based on the observation that signal arrival time distributions differ as the signal comes from different paths, e.g., due to path-sharing and resultant signal correlations.
Considering signal correlations is critical for SSTA accuracy [4], [19], [21]. Signal correlations may be introduced by some topological features of the netlist, e.g., shared path segments or reconvergent fan-outs [2]. Further, spatial correlations exist between on-chip variational parameters, and these correlations weaken as the distance between components increases [24], [34]. Such correlations can be captured by deriving statistical circuit properties such as signal arrival times as closed-form expressions of design, process, and system runtime variational parameters, e.g., by affine arithmetic [21], probabilistic interval analysis [27], or sampling analysis and regression [16], [18]. Such closed-form expressions further support a unified approach for full-chip statistical timing and leakage analysis [5].
One of the most significant sources of VLSI performance variation is the combinational logic inputs. Different input vectors may lead to totally different timing-critical paths, besides different critical path delays. However, STA and SSTA do not consider input vectors, which not only leads to timing analysis inaccuracy, but also brings difficulty in delay test pattern generation. Before introducing SPSTA, let us review the existing SSTA and some of the statistical power estimation techniques.
III. PRELIMINARIES
A. Statistical STA
As technology scaling leads to increasingly significant parametric (including performance) variations, SSTA was developed on the basis of STA. Typically, SSTA models a signal arrival time or signal propagation delay as a linear function of statistical variables [18], [21], [27], [33] that can be physical variational parameters or nonphysical statistical variables such as those from principal component analysis (PCA) [1], [30]. There are two basic operations in SSTA: taking summation (SUM) and finding minimum/maximum (MIN/MAX) (Fig. 1).
1) SUM:
Signal propagation delays are accumulated along a path by taking summations. For example, the output signal arrival time $t_0$ of an interconnect is given by the summation of the input signal arrival time $t_1$ and the delay $d$ of the interconnect, e.g., assuming uncorrelated $t_1$ and $d$, as follows:
$$t_0 = t_1 + d. \tag{4}$$
For Gaussian $t_1$ and $d$, $t_1 + d$ is also Gaussian, with mean and standard deviation given as follows:
$$\mu_{t_0} = \mu_{t_1} + \mu_d, \qquad \sigma_{t_0} = \sqrt{\sigma_{t_1}^2 + \sigma_d^2}. \tag{5}$$
2) MIN/MAX: For a multiple-input logic gate, the minimum/maximum signal arrival time at the gate output is given by the sum of the gate delay and the minimum/maximum of the signal arrival times at the gate inputs. For example, the maximum of two uncorrelated signal arrival times $t_1$ and $t_2$ is given as follows:
$$\Pr\big(\mathrm{MAX}(t_1, t_2) \le t\big) = \Pr(t_1 \le t)\,\Pr(t_2 \le t). \tag{6}$$
For Gaussian $t_1$ and $t_2$, $\mathrm{MAX}(t_1, t_2)$ is not Gaussian, but its mean and standard deviation can be approximated by closed-form formulas [33]. Finding the minimum is similar, e.g., $\mathrm{MIN}(t_1, t_2) = -\mathrm{MAX}(-t_1, -t_2)$.
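The two SSTA operations can be sketched in Python for independent Gaussian operands. The SUM part follows directly from the text. The MAX part uses Clark's moment-matching formulas, a common closed-form choice; whether [33] uses exactly this form is an assumption, and all function names are illustrative.

```python
import math


def _pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_sum(mu1, sigma1, mu2, sigma2):
    """Mean and standard deviation of t1 + d for independent Gaussians."""
    return mu1 + mu2, math.hypot(sigma1, sigma2)


def gaussian_max(mu1, sigma1, mu2, sigma2):
    """Mean and standard deviation of MAX(t1, t2), independent Gaussians.

    Uses Clark's first and second moments; the result is only the matched
    mean and deviation, since the true MAX distribution is not Gaussian.
    """
    theta = math.hypot(sigma1, sigma2)
    if theta == 0.0:
        return max(mu1, mu2), 0.0
    a = (mu1 - mu2) / theta
    m1 = mu1 * _cdf(a) + mu2 * _cdf(-a) + theta * _pdf(a)
    m2 = ((mu1 ** 2 + sigma1 ** 2) * _cdf(a)
          + (mu2 ** 2 + sigma2 ** 2) * _cdf(-a)
          + (mu1 + mu2) * theta * _pdf(a))
    return m1, math.sqrt(max(m2 - m1 * m1, 0.0))


def gaussian_min(mu1, sigma1, mu2, sigma2):
    """MIN(t1, t2) = -MAX(-t1, -t2)."""
    mean, sigma = gaussian_max(-mu1, sigma1, -mu2, sigma2)
    return -mean, sigma
```

For two standard normal arrival times, `gaussian_max` yields a mean of $1/\sqrt{\pi}$, which illustrates how the MAX operation shifts the mean above that of either input.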
B. Signal Toggling Analysis in Statistical Power Estimation
In power estimation, statistical techniques achieve significant efficiency improvement over simulation and testing. One such technique is based on signal probability and signal toggling rate computation as follows [23].

Fig. 2. Signal probability and signal toggling rate computation for an AND gate with two independent signals at the gate inputs.
1) Signal Probability:
The definition for the signal probability is as follows.
Definition 1: The signal probability P(y) of a net y is the occurrence probability for net y to be of logic one.
Signal probability can be computed for the output of a Boolean function and can be propagated through a combinational logic circuit. For a two-input AND gate $y = x_1 \cdot x_2$, assuming $x_1$ and $x_2$ are uncorrelated, applying basic probability theory gives $P(y) = P(x_1)P(x_2)$ (Fig. 2). In the presence of correlation, conditional probabilities apply. In general, the signal probability $P(y)$ for the output $y$ of a Boolean function $y = f(x_1, \ldots, x_n)$ is given by
$$P(y) = P(x_1)\,P(f_{x_1}) + P(\bar{x}_1)\,P(f_{\bar{x}_1}) \tag{7}$$
where
$$f_{x_1} = f(1, x_2, \ldots, x_n), \qquad f_{\bar{x}_1} = f(0, x_2, \ldots, x_n)$$
are cofactors of $f$ with respect to $x_1$. Using a binary decision diagram (BDD) representation of the Boolean function, signal probability computation takes time linear in the size of the BDD. Propagating signal probabilities in a combinational logic circuit takes linear time in a single traversal.
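The cofactor-based computation can be sketched as a recursive Shannon expansion. This sketch assumes independent inputs and expands the function directly rather than through a BDD, so it takes exponential rather than linear time; the function name `signal_probability` is illustrative.

```python
def signal_probability(f, input_probs, assigned=()):
    """P(f = 1) for independent inputs, by recursive Shannon expansion.

    f           -- Boolean function taking one 0/1 argument per input
    input_probs -- probability of logic one for each input
    assigned    -- values already fixed for the leading inputs
    """
    k = len(assigned)
    if k == len(input_probs):
        return 1.0 if f(*assigned) else 0.0
    p = input_probs[k]
    # P(y) = P(x) P(f with x = 1) + P(not x) P(f with x = 0)
    return (p * signal_probability(f, input_probs, assigned + (1,))
            + (1.0 - p) * signal_probability(f, input_probs, assigned + (0,)))
```

For a two-input AND gate with both input probabilities equal to 0.9, the sketch returns 0.81, the product of the input probabilities.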
2) Signal Toggling Rate:
The definition for the signal toggling rate is as follows.
Definition 2: The signal toggling rate ρ(y) of a net y is the expected number of signal togglings per unit time (e.g., clock cycle) at net y.
In statistical power estimation, a signal toggling rate $\rho(y)$ is given by signal probabilities of Boolean difference functions as follows (Fig. 2) [22]:
$$\rho(y) = \sum_i P\!\left(\frac{\partial y}{\partial x_i}\right)\rho(x_i). \tag{8}$$
A Boolean difference function that enables a signal propagation path from an input $x_i$ to the output $y$ is given by
$$\frac{\partial y}{\partial x_i} = y|_{x_i = 1} \oplus y|_{x_i = 0} \tag{9}$$
where $\oplus$ denotes the exclusive-OR operation. Given signal probabilities in (7) in a netlist, Boolean difference probabilities in (9) and signal toggling rates in (8) can be computed efficiently in a single traversal of the netlist.
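A minimal sketch of the toggling rate computation follows, assuming independent inputs and evaluating the Boolean difference by enumerating the other inputs (exponential in gate fan-in, which is small in practice). Function names are illustrative.

```python
from itertools import product


def boolean_difference_probability(f, input_probs, i):
    """P(dy/dx_i = 1): probability that the other inputs sensitize x_i."""
    n = len(input_probs)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for bits in product((0, 1), repeat=len(others)):
        x = [0] * n
        weight = 1.0
        for j, b in zip(others, bits):
            x[j] = b
            weight *= input_probs[j] if b else 1.0 - input_probs[j]
        x[i] = 0
        y0 = f(*x)
        x[i] = 1
        y1 = f(*x)
        if bool(y0) != bool(y1):  # y|x_i=1 XOR y|x_i=0
            total += weight
    return total


def toggling_rate(f, input_probs, input_rates):
    """rho(y) as the sum of Boolean difference probabilities times input rates."""
    return sum(boolean_difference_probability(f, input_probs, i) * rate
               for i, rate in enumerate(input_rates))
```

For a two-input AND gate, the Boolean difference with respect to one input is the other input, so with input probabilities 0.9 and toggling rates 0.2 and 0.4 the output rate is 0.9 × 0.2 + 0.9 × 0.4 = 0.54.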
IV. SIGNAL PROBABILITY-BASED STATISTICAL TIMING ANALYSIS
SPSTA is based on an extension of the stochastic signal switching activity analysis in VLSI power estimation. VLSI power estimation computes average signal toggling rates per clock cycle, e.g., based on signal (logic one occurrence) probabilities. SPSTA extends such average signal toggling rate computation to compute the time domain distribution of signal togglings in a clock cycle, i.e., signal transition topf. A signal transition topf includes information for not only signal arrival time pdf as in SSTA, but also a signal toggling rate as in power estimation (a close-to-zero signal toggling rate indicates a false path). SPSTA computes this new variable (signal transition topf) in two operations, SUM and WEIGHTED SUM, while SSTA computes signal arrival time pdfs in SUM and MIN/MAX operations. A more formal description is given as follows.

Fig. 3. Results of the MAX and the WEIGHTED SUM operations for an AND gate with two inputs both of 0.9 signal probability, and of signal arrival times $t_1$ and $t_2$ in symmetric distributions with the same mean but different deviations.
A. Variable: Signal Transition Temporal Occurrence Probability
Definition 3: The signal transition topf $\varphi(y) = F(t)$ of a net $y$ is the time domain occurrence probability function for net $y$ to take a signal transition at time $t$, where $0 \le t \le T_c$ and $T_c$ is the clock cycle time.
A signal transition topf is the time domain distribution of signal toggling rate, i.e., signal toggling rates at different time spots in a clock cycle for a net (which is not limited to be Gaussian and can be in the form of any distribution), while the traditional signal toggling rate in power estimation is the average signal toggling rate per clock cycle.
A signal transition topf in SPSTA differs from a signal arrival time pdf in SSTA as follows. In SSTA, the time domain integral of a signal arrival time pdf equals one, assuming that there is always a signal arrival at a net in a clock cycle. SSTA does not calculate signal toggling rates.
In SPSTA, a signal transition topf gives a more complete picture of signal switching activity, including both the signal arrival time pdf and the signal toggling rate. The integral of a signal transition topf gives the average signal toggling rate per clock cycle at the net, which may not be one (Fig. 3):
$$\int_0^{T_c} \varphi(y)\,dt = \rho(y). \tag{10}$$
Normalizing the signal transition topf (such that its integral equals one) gives the signal arrival time pdf $\phi(y)$:
$$\phi(y) = \frac{\varphi(y)}{\rho(y)}. \tag{11}$$
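The relation between a topf, the toggling rate, and the arrival time pdf can be sketched on a uniformly sampled topf. The sampling step `dt`, the rectangle-rule integration, and the function names are assumptions made for illustration only.

```python
def toggling_rate_from_topf(topf, dt):
    """Integrate a uniformly sampled topf over the clock cycle."""
    return sum(topf) * dt


def arrival_time_pdf(topf, dt):
    """Normalize a sampled topf so that it integrates to one."""
    rate = toggling_rate_from_topf(topf, dt)
    if rate == 0.0:
        # A net that never toggles has no arrival time distribution.
        raise ValueError("zero toggling rate: arrival time pdf is undefined")
    return [value / rate for value in topf]
```

With samples `[0.0, 0.2, 0.4, 0.2, 0.0]` and `dt = 0.5`, the toggling rate is 0.4 transitions per clock cycle, and the normalized samples integrate to one.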
SPSTA provides integrated statistical timing analysis and power estimation results. This is much needed because timing and power are interdependent, so accurate timing and power analysis must be integrated.

TABLE I. Four-valued logic truth table for two-input AND and OR logic gates. The row and column indices give the gate input logic values. The table entries give the gate output logic values. † indicates the maximum arrival time of two signal transitions in the same direction, and ‡ indicates the minimum arrival time of two signal transitions in the same direction.
B. Operation: WEIGHTED SUM
Signal transition topfs are computed in the same way as signal toggling rates, i.e., by weighted summation based on signal probabilities:
$$\varphi(y) = \sum_i P\!\left(\frac{\partial y}{\partial x_i}\right)\varphi(x_i). \tag{12}$$
Note that the traditional signal toggling rates $\rho(y)$ and $\rho(x_i)$ are replaced by their time domain distributions $\varphi(y)$ and $\varphi(x_i)$, respectively. The result of a WEIGHTED SUM operation can be significantly different from that of a MAX operation. For symmetric input signal arrival time distributions, the result of a WEIGHTED SUM operation is still symmetric, while the result of a MAX operation is nonsymmetric (Fig. 3). This is because the MAX operation is applicable only for the signal arrival time of a noncontrolled value at the output of a gate, e.g., for the arrival time of a logic one at the output of an AND gate. A MIN operation is needed for the signal arrival time of a controlled value at the output of a gate, e.g., for the arrival time of a logic zero at the output of an AND gate. Combining the results of the MIN and the MAX operations gives a symmetric signal arrival time distribution at the output of the gate, as is given by a WEIGHTED SUM operation. Furthermore, the MIN and the MAX operations give accurate output signal arrival times only when multiple inputs are switching at the same time; in other cases, some of the inputs may not be switching and they should not be taken as inputs of a MIN or MAX operation. These observations suggest that we need to separate logic zero, logic one, and rising and falling signal transitions and extend SPSTA to be based on a four-valued logic for improved accuracy, which is presented as follows.
C. Extension to Four-Valued Logic
We propose a four-valued logic wherein a signal may be of logic zero (0), logic one (1), rising signal transition (r), or falling signal transition (f) (Table I). SPSTA needs to be based on such a four-valued logic because rising signal transition arrival time distributions need to be separated from falling signal transition arrival time distributions, which are propagated by different MIN/MAX computations for a multiple-input gate, as in STA. Different MIN/MAX computations for the rising/falling signal transitions spread the rising and the falling signal transitions in different directions, and lead to increased signal transition time distribution deviation. One must separate the rising and the falling signal transitions to consider this spreading effect in statistical timing analysis. Separating the rising and falling signal transitions also filters out the glitches, which result from simultaneous rising and falling signal transitions at the inputs of a gate. Glitches are not counted in traditional SSTA, wherein MIN/MAX operations take rising/falling input signal arrival times separately. However, glitches are included in traditional circuit switching activity analysis as in power estimation, wherein signal toggling rates are computed by WEIGHTED SUM operations without distinguishing rising and falling signal transitions. To filter out glitches in statistical timing analysis, rising and falling signal transitions need to be separated, as well as logic one and zero signals. This gives a four-valued logic.⁶

In our four-valued logic, we compute four signal probabilities $P_0(y)$, $P_1(y)$, $P_r(y)$, and $P_f(y)$, which are the occurrence probabilities for a net $y$ to be logic zero (0), logic one (1), rising signal transition (r), and falling signal transition (f), respectively.⁷

⁶ A five-valued logic is presented for path delay test generation, wherein a signal can be of stable zero (0), stable one (1), rising (R), falling (F), and uncertainty (X) [6]. Our four-valued logic differs from this five-valued logic in that we exclude glitches and uncertainties.

⁷ $P_r(y)$ and $P_f(y)$ are rising and falling signal toggling rates, respectively. Such signal toggling rates are always less than one because we exclude glitches in the four-valued logic.

We take the following terminologies and notations as in logic simulation: 1) cd = controlled value; 2) nc = noncontrolling value; 3) ncd = noncontrolled value; 4) inv = inversion of the gate. The noncontrolled value at the output occurs only if all the inputs are of noncontrolling value. The output is of noncontrolled value or switching only if all the inputs are either noncontrolling or switching consistently with the inversion of the gate. The output is of controlled value in all the other cases, e.g., if any of the inputs is of controlling value, or any two inputs are switching in opposite directions. Take an AND gate as an example, wherein 0 is the controlling and the controlled value, and 1 is the noncontrolling and the noncontrolled value; there is no inversion, i.e., inv(r) = r and inv(f) = f.

By extending (7), we have the occurrence probabilities of controlled value, noncontrolled value, and rising and falling signal transition at a gate output as follows: (13) where $\Pr(\wedge_i(\cdot))$ is the concurrent occurrence probability for the inputs $i$. In the absence of correlation, concurrent occurrence probability is given by the product of individual occurrence probabilities, and (13) becomes
For an AND gate, the four-valued logic signal probabilities are given as follows:
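The verbal rules above can be sketched for an AND gate with independent inputs: the output is 1 only if all inputs are 1, rising only if all inputs are 1 or rising with at least one rising, falling only if all inputs are 1 or falling with at least one falling, and 0 otherwise. The dictionary layout and the function name `and_gate_probabilities` are illustrative assumptions, and this sketch is derived from the stated rules rather than copied from the paper's expressions.

```python
from math import prod


def and_gate_probabilities(inputs):
    """Four-valued output probabilities of an AND gate, independent inputs.

    inputs -- list of dicts with keys '0', '1', 'r', 'f' that sum to one
    """
    p1 = prod(x['1'] for x in inputs)
    # All inputs are 1 or rising, minus the case where all are stable 1.
    pr = prod(x['1'] + x['r'] for x in inputs) - p1
    # All inputs are 1 or falling, minus the case where all are stable 1.
    pf = prod(x['1'] + x['f'] for x in inputs) - p1
    # Everything else (any input at 0, or opposite transitions) gives 0.
    return {'0': 1.0 - p1 - pr - pf, '1': p1, 'r': pr, 'f': pf}
```

With two inputs that take each of the four values with probability 0.25, the output is 1 with probability 0.0625, rising or falling with probability 0.1875 each, and 0 with probability 0.5625, which matches counting the 16 input combinations directly.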
Once signal probabilities are obtained, signal transition topfs are computed by WEIGHTED SUM and MAX operations. A WEIGHTED SUM operation combines input patterns based on their signal probabilities, while a MAX operation is needed when multiple inputs are switching. We elaborate as follows.
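First, we normalize the rising and the falling topfs $\varphi_r(y)$ and $\varphi_f(y)$ as follows:
$$\phi_r(y) = \frac{\varphi_r(y)}{P_r(y)}, \qquad \phi_f(y) = \frac{\varphi_f(y)}{P_f(y)}. \tag{16}$$
Normalized topfs $\phi_r(y)$ and $\phi_f(y)$ have unit time integral, while
$$\int_0^{T_c} \varphi_r(y)\,dt = P_r(y), \qquad \int_0^{T_c} \varphi_f(y)\,dt = P_f(y). \tag{17}$$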
Next, by enumerating all subsets of inputs as rising or falling inputs, the topfs for rising and falling output signal transitions are given, e.g., for an AND gate, as follows:
where R ⊆ {x i } is a subset of inputs that are rising and F ⊆ {x i } is a subset of inputs that are falling. For a two-input AND gate, the rising signal transition topf is given by
The runtime is $O(2^k)$ to compute signal transition temporal occurrence probabilities for a $k$-input gate, and is linear in the circuit size, i.e., the computation can be done in a single netlist traversal.
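For a two-input AND gate, the subset enumeration reduces to three cases: only the first input rises, only the second input rises, or both rise and the later arrival sets the output. The sketch below combines these cases by WEIGHTED SUM and MAX under several assumptions that are not taken from the paper: zero gate delay, independent inputs, and arrival time distributions discretized as probability masses on a shared time grid. All names are illustrative.

```python
from itertools import accumulate


def max_pmf(pmf1, pmf2):
    """Probability masses of MAX of two independent discrete arrival times."""
    cdf1, cdf2 = accumulate(pmf1), accumulate(pmf2)
    out, prev = [], 0.0
    for c1, c2 in zip(cdf1, cdf2):
        joint = c1 * c2          # P(both arrivals at or before this bin)
        out.append(joint - prev)
        prev = joint
    return out


def and2_rising_topf(p1_a, pr_a, pmf_a, p1_b, pr_b, pmf_b):
    """Rising topf (mass per time bin) at the output of a two-input AND gate.

    p1_*  -- probability that the input is a stable logic one
    pr_*  -- probability that the input is rising
    pmf_* -- normalized rising arrival time masses of the input
    """
    both = max_pmf(pmf_a, pmf_b)
    return [pr_a * p1_b * a        # a rises while b is stable one
            + p1_a * pr_b * b      # b rises while a is stable one
            + pr_a * pr_b * m      # both rise: later arrival wins
            for a, b, m in zip(pmf_a, pmf_b, both)]
```

The masses of the returned topf sum to the rising probability of the output, so the result carries both the arrival time distribution and the rising toggling rate.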
D. Handling Correlations
Signal correlations in a VLSI system come from two categories of sources. First, signal arrival times and signal propagation delays are functions of variational physical parameters, which may be correlated spatially and temporally [18] , [21] , [27] , [33] . PCA may be applied to extract a group of nonphysical independent statistical variables [1] , [30] . Alternatively, we need to compute statistical moments (means, standard deviations, skewness, and higher order moments) and (first-order and higher order) correlations of statistical variables
A higher order covariance, e.g., an nth order covariance of n + 1 variables is defined as follows:
Given the higher order covariance, signal correlation or signal probability of a Boolean function is computed based on simple Boolean and polynomial derivations. For example
Second, reconvergent fan-outs in a netlist create signal correlations [2] . One method is to compute Boolean functions for each node in the netlist, e.g., based on BDDs or symbolic simulation, then compute signal probabilities considering the correlations between the statistical variables in the first category. Alternatively, computing conditional probabilities takes care of reconvergent fan-out-induced signal correlations [2] . We implemented this method in SPSTA by validating each signal propagation path/network based on some of the existing ATPG techniques, such as backtracing [6] , and excluding false signal propagation paths/networks.
Take the circuit in Fig. 4 as an example. Assume a and b's fan-in cones do not overlap, and signals a and b's underlying statistical variables are independent. We observe that b is a reconvergent fan-out, and signal c is correlated with signals a and b. SPSTA handles such reconvergent fan-out-induced correlation as follows. When combining signals b and c's logic assignments or their topfs that are linear functions of signal arrival time distributions, SPSTA backtraces and finds their source logic values or signal arrival time distributions, then verifies their consistency (e.g., if they come from the same logic value or signal arrival time distribution at the fan-out b). If they are not consistent, we exclude such a combination. If they are consistent, we further avoid double counting of the common component in signal probability calculation. We elaborate as follows.
First, we calculate the signal probability P 1 (d). For signal d to be logic 1 that is the noncontrolled logic value for the AND gate, signals c and b both need to be logic 1. Subsequently, for signal c to be logic 1, signals a and b both need to be logic 1 (by backtracing). c = 1 and b = 1 are consistent because they both require the fan-out b be 1. Next, we make sure that P 1 (b) is counted only once. This can be achieved by including a denominator P 1 (b) in the calculation. As a result
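The double-counting correction can be checked numerically by exhaustive enumeration. The sketch assumes, consistent with the description of Fig. 4, that c = a AND b and d = c AND b with independent a and b; the function names are illustrative.

```python
from itertools import product


def p1_d_exact(p_a, p_b):
    """P1(d) by enumerating a and b, with c = a AND b and d = c AND b."""
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        c = a & b
        d = c & b
        if d:
            total += (p_a if a else 1.0 - p_a) * (p_b if b else 1.0 - p_b)
    return total


def p1_d_naive(p_a, p_b):
    """Treats c and b as independent, so P1(b) is counted twice."""
    return (p_a * p_b) * p_b


def p1_d_corrected(p_a, p_b):
    """Divides by P1(b) so that the shared fan-out is counted once."""
    return (p_a * p_b) * p_b / p_b
```

With P1(a) = P1(b) = 0.5, the exact and corrected values are both 0.25, while the naive product gives 0.125.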
Now, let us calculate the topf functions for c and d rising.
A rising c signal may come from a rising a or a rising b, or when both a and b are rising. Assuming zero gate delay, we have
For signal d to be rising, either c is rising, b = 1, and a is rising (a rising c signal may also come from a rising b signal, which is inconsistent with b = 1, so we exclude this combination here); or b is rising, and c = 1, which is not possible; or b and c both are rising, while a = 1 or a is rising. Excluding the b rising and c = 1 combination, we have
Substituting (24) and avoiding double counting
We observe that assuming zero gate delays, φ r (d) = φ r (c).
Taking out the pdfs (e.g., by time-domain integration), we have the d rising signal probability
Similarly, we have
and
More details on signal propagation validation by leveraging some of the existing ATPG techniques can be found in Section V-B.
E. SPSTA Summary
VLSI performance may vary significantly with different inputs. Different logic inputs may lead to different timing critical paths and large delay variations. SPSTA captures such input-induced performance variation by considering logic input signal probabilities and toggling rates, e.g., based on input pattern statistics.
SPSTA gives timing performance distributions for all possible input combinations (while in practice, pruning may apply). In the traditional worst case design methodology, one finds the worst case timing performance for a specific input combination, which gives not only improved accuracy in timing performance analysis, but also a delay test pattern.
V. SPSTA-BASED DELAY TEST PATTERN GENERATION
We propose VLSI delay test pattern generation based on an input-aware statistical timing analyzer, e.g., SPSTA. Our method is to: 1) perform SPSTA; 2) identify timing-critical signal propagation paths/networks; and 3) generate test patterns that enable the timing-critical signal propagation paths/networks.
A. Path/Network Generation
SPSTA considers input signal probabilities, and produces: 1) signal arrival time distributions and 2) signal toggling probabilities. After SPSTA, we find the most timing-critical signal propagation paths/networks by backtracing from the output signals of: 1) a nonzero signal toggling probability and 2) one of the maximum signal arrival times in terms of μ+3σ . In SPSTA, we link each signal arrival time distribution at a gate output to its source signal arrival time distributions at the gate inputs. After SPSTA, we backtrace by following these links.
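The link-following backtrace can be sketched with a small data structure. The `Arrival` record, its field names, and the selection of a single most critical output are illustrative assumptions; the actual SPSTA bookkeeping may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Arrival:
    """One signal arrival time distribution, linked to its sources."""
    net: str
    mean: float
    sigma: float
    toggle_prob: float
    sources: list = field(default_factory=list)  # Arrival objects at gate inputs


def criticality(arrival):
    """Rank by mu + 3 sigma, as in the text."""
    return arrival.mean + 3.0 * arrival.sigma


def most_critical_output(outputs):
    """Pick the output with nonzero toggling probability and largest mu + 3 sigma."""
    candidates = [a for a in outputs if a.toggle_prob > 0.0]
    return max(candidates, key=criticality) if candidates else None


def backtrace(arrival):
    """Collect the nets of the path, tree, or network feeding an arrival."""
    seen, stack, nets = set(), [arrival], []
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        nets.append(node.net)
        stack.extend(node.sources)
    return nets
```

An output whose distribution is linked to several input distributions (for example, after a MAX operation) yields a tree or network of nets rather than a single path.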
An interesting observation is that, for a MAX operation in SPSTA, a distribution at the gate output is linked to multiple distributions at the gate inputs. As a result, when we backtrace from a timing-critical logic output, we may find a timing-critical signal propagation tree, or even a timing-critical signal propagation network, instead of a single timing-critical signal propagation path (Fig. 5). This matches physical reality: in the presence of performance variations, the worst case signal arrival at the output of a gate is more likely to occur when multiple gate inputs switch. This is because, in SPSTA or SSTA, a MAX operation gives a skewed distribution, which has a larger mean than any distribution resulting from a single input switching. For example, for a two-input AND gate, the latest rising signal arrival at the gate output is more likely to occur when both inputs are rising and the input arrival times vary. In other words, the worst case performance is more likely to be observed when the applied test vector enables multiple timing-critical signal propagation paths or a timing-critical signal propagation network. A timing-critical signal propagation network covers multiple timing-critical signal propagation paths. We therefore propose to find delay test vectors that enable multiple signals to propagate in a timing-critical signal propagation network instead of a timing-critical signal propagation path. This reduces the number of test vectors and increases path delay fault coverage.
We have several observations. First, not the entire fan-in cone of a timing-critical logic output is a timing-critical network. For example, for the same two-input AND gate (e.g., G8 in Fig. 5), the latest falling signal arrival at the gate output is given by a single input falling with the other inputs having a noncontrolling value, which is logic one. Multiple falling inputs lead to an earlier falling signal arrival time at the gate output. As a result, a timing-critical signal propagation network includes only a single falling signal at the inputs of an AND/NAND gate and only a single rising signal at the inputs of an OR/NOR gate (Fig. 5).
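The claim that a MAX operation yields a larger mean than any single switching input can be checked numerically. The sketch below uses Clark's moment formulas for the maximum of two jointly Gaussian arrival times, a common choice in SSTA; whether the paper's own MAX computation uses exactly these formulas is an assumption, and the function names are hypothetical.

```python
import math

def normal_pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_moments(mu1, sigma1, mu2, sigma2, rho=0.0):
    """Mean and standard deviation of max(X1, X2) for jointly Gaussian
    arrival times X1, X2 with correlation rho (Clark's formulas, assumed)."""
    a = math.sqrt(sigma1**2 + sigma2**2 - 2.0 * rho * sigma1 * sigma2)
    if a == 0.0:
        # X1 - X2 is constant: the maximum is the input with the larger mean.
        return (mu1, sigma1) if mu1 >= mu2 else (mu2, sigma2)
    alpha = (mu1 - mu2) / a
    mean = (mu1 * normal_cdf(alpha) + mu2 * normal_cdf(-alpha)
            + a * normal_pdf(alpha))
    second = ((mu1**2 + sigma1**2) * normal_cdf(alpha)
              + (mu2**2 + sigma2**2) * normal_cdf(-alpha)
              + (mu1 + mu2) * a * normal_pdf(alpha))
    return mean, math.sqrt(max(second - mean**2, 0.0))

# Two independent rising inputs of an AND gate, each with mean 2 and 3*sigma = 1.
mean_max, sigma_max = max_moments(2.0, 1.0 / 3.0, 2.0, 1.0 / 3.0)
```

For two identical independent inputs the mean of the maximum exceeds the single-input mean by σ/√π, which is why a test that switches both inputs is more likely to expose the latest output arrival.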
Second, we generate only robust delay tests. A robust path delay test is guaranteed to produce an incorrect value at the destination if the delay of the path under test exceeds a specified time interval (or clock period), irrespective of the delay distribution in the circuit [6]. To ensure a robust delay test, a delay test pattern traditionally includes a single input signal switching. We achieve a robust delay test for each signal propagation network as follows. For any gate in a generated timing-critical network that has multiple switching inputs, all switching inputs must switch from a controlling logic value to a noncontrolling logic value, for example, multiple falling signal transitions at the inputs of an OR gate, or multiple rising signal transitions at the inputs of an AND gate, such that the gate output arrival time is given by the maximum of the input signal arrival times. If a path delay fault exists, such that one of these inputs remains at a controlling logic value, the gate output remains at a controlled logic value, producing an incorrect logic output, regardless of the side input logic values. Furthermore, we make sure that all the side inputs of a timing-critical network take a stable noncontrolling logic value. We never generate both a rising signal and a falling signal at the inputs of a gate in a timing-critical network. As a result, we guarantee that no glitch exists in a generated timing-critical signal propagation network, so that we generate only robust path delay tests.
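The per-gate conditions above can be written as a small predicate. This is a minimal sketch, assuming gate inputs are encoded as '0', '1', 'r' (rising), and 'f' (falling); the function name and the encoding are illustrative and not taken from the paper.

```python
# Controlling input value for each gate type.
CONTROLLING = {"AND": "0", "NAND": "0", "OR": "1", "NOR": "1"}

def gate_inputs_robust(gate_type, inputs):
    """Check the per-gate robustness conditions for a gate in a
    timing-critical network. inputs is a list over {'0', '1', 'r', 'f'}."""
    c = CONTROLLING[gate_type]
    # Transition direction that goes from controlling to noncontrolling.
    c_to_nc = "r" if c == "0" else "f"
    switching = [v for v in inputs if v in ("r", "f")]
    stable = [v for v in inputs if v in ("0", "1")]
    if not switching:
        return False  # assumption: a gate in the network must carry a transition
    if any(v == c for v in stable):
        return False  # side inputs must hold a stable noncontrolling value
    if len(set(switching)) > 1:
        return False  # never mix rising and falling inputs (possible glitch)
    if len(switching) > 1 and switching[0] != c_to_nc:
        return False  # multiple switching inputs must go controlling -> noncontrolling
    return True
```

For example, `gate_inputs_robust("AND", ["r", "r", "1"])` holds, while two falling inputs at an AND gate, or a rising and a falling input at the same gate, are rejected.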
Third, this is separate from the multiple input switching effect, which leads to up to 20% gate delay reduction for multiple inputs switching in the same direction [3] . A closer analysis reveals that such significant gate delay reduction occurs only when multiple controlling signals arrive at the inputs of a gate. For example, if multiple logic zero signals arrive at the inputs of a NAND gate, they turn ON multiple pMOS transistors charging the load capacitor. This leads to a much smaller gate delay compared with when a single logic zero signal arrives, which turns ON a single pMOS transistor charging the load capacitor. However, if multiple noncontrolling signals arrive at the inputs of a gate, they do not lead to much gate delay reduction. For example, if multiple logic one signals arrive at the inputs of a NAND gate, and turn ON the single nMOS transistor chain discharging the load capacitor, the gate delay is not much different than when a single logic one signal arrives at an input of the NAND gate, and turns ON the single nMOS transistor chain discharging the load capacitor with the other inputs having logic one. Because a MAX operation is performed only for multiple noncontrolling signal arrivals, there is no significant gate delay reduction due to the multiple input switching effect.
B. False Path/Network Removal
For scalability and accuracy, we check consistency and eliminate false signal propagation paths/networks.
A false signal propagation path/network is a path/network that exists in the topological network but for which no input vector pair exists that propagates a signal transition through it. A reconvergent fan-out may lead to a false path/network. For example, if the two paths from the fan-out to the reconvergent fan-ins are of different inversions, the rising signal and the falling signal at the inputs of the gate cannot propagate any further. As a result, any network including two such reconvergent fan-out paths of different inversions is a false signal propagation network. Alternatively, a reconvergent fan-out may correlate some signals, such that the side inputs of a timing-critical path/network cannot take their noncontrolling logic values at the same time. As a result, the timing-critical path/network is false. For example, the path G6-G8-G16-G9-G11-G10 in Fig. 5 is a false path, because the side input of G8 and the side input of G10 are connected to the same fan-out net G14, and cannot take their noncontrolling logic values (logic one for the side input of AND gate G8 and logic zero for the side input of NOR gate G10) at the same time.
In SPSTA, we combine gate input signal arrival time distributions for a gate output signal arrival time distribution. It is possible that certain input signal arrival time distributions may not take place jointly. For example, in the presence of a reconvergent fan-out, we may have a false signal propagation path/network, and a false gate output signal arrival time distribution, that results from two signal transitions of different directions or arrival times at the fan-out net. We eliminate such a false signal arrival time distribution by performing backtracing and checking consistency in SPSTA. For each combination of gate input signal arrival time distributions, we backtrace to their source signal arrival time distributions. If we visit a fan-out net more than once in backtracing, the logic values and the signal arrival times at the fan-out net must be consistent. Otherwise, we are backtracing from a false signal arrival time distribution.
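The consistency check at a revisited fan-out net can be sketched as a merge of per-branch requirements. This is an illustrative sketch only: the dictionary representation and the function name are assumptions, and the value attached to a net may be a logic value alone or a (value, arrival time) pair.

```python
def merge_requirements(*requirement_sets):
    """Merge the values that backtracing demands on each net.
    Each argument maps a net name to the value required on that net.
    Returns the merged mapping, or None if some net is visited twice
    with inconsistent values (a false distribution)."""
    merged = {}
    for requirements in requirement_sets:
        for net, value in requirements.items():
            if net in merged and merged[net] != value:
                return None  # inconsistent revisit of a fan-out net
            merged[net] = value
    return merged

# The false path from Fig. 5: the side input of AND gate G8 needs G14 = 1,
# while the side input of NOR gate G10 needs G14 = 0.
conflict = merge_requirements({"G14": "1"}, {"G14": "0"})
```

Here `conflict` is `None`, so the path would be discarded as false.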
To check consistency, we backtrace from all gate inputs, including not only the switching inputs but also the other stable inputs. We leverage the existing ATPG techniques, for example, backward implication. If a gate output has a noncontrolled logic value, then all the gate inputs need to have the noncontrolling logic value. For example, if an AND gate output is of logic one, then all the gate inputs need to have logic one. Otherwise, if a gate output has a controlled logic value, then at least one of the gate inputs has a controlling value. For example, if an AND gate output is of logic zero, then at least one of the gate inputs is of logic zero. There are multiple possibilities. In this case, we stop backward implication. We further perform forward implication. For example, if a logic assignment at a fan-out net is the controlling value of the gate at a fan-out branch, forward implication determines the gate output logic value.
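The backward and forward implication rules described above can be sketched for simple gates as follows. The gate tables and function names are assumptions made for illustration; unassigned inputs are represented by `None`.

```python
CONTROLLING = {"AND": "0", "NAND": "0", "OR": "1", "NOR": "1"}
INVERTING = {"AND": False, "NAND": True, "OR": False, "NOR": True}

def invert(value):
    return "1" if value == "0" else "0"

def controlled_output(gate_type):
    """Output produced when at least one input is at the controlling value."""
    c = CONTROLLING[gate_type]
    return invert(c) if INVERTING[gate_type] else c

def backward_imply(gate_type, output, n_inputs):
    """Return the implied input values, or None when several input
    assignments are possible and backward implication stops."""
    c = CONTROLLING[gate_type]
    if output == controlled_output(gate_type):
        return [c] if n_inputs == 1 else None
    return [invert(c)] * n_inputs  # noncontrolled output: all inputs noncontrolling

def forward_imply(gate_type, inputs):
    """Return the implied output value, or None if it is not yet determined."""
    if CONTROLLING[gate_type] in inputs:
        return controlled_output(gate_type)  # one controlling input decides the output
    if None in inputs:
        return None
    return invert(controlled_output(gate_type))
```

An AND gate with output one implies that both inputs are one, while an output of zero leaves the inputs undetermined; a single controlling input is enough for forward implication to fix the output.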
C. Test Pattern Generation and Compaction
Algorithm 1 SPSTA-Based Statistical Delay Test Generation
Algorithm 2 Modified PODEM
At the last step, we apply a modified ATPG algorithm, such as path-oriented decision making (PODEM) [11], to find a pair of test vectors for each of the identified timing-critical signal propagation paths/networks. The PODEM algorithm enumerates the logic input combinations to find a test vector that detects a single stuck-at fault, i.e., the test vector needs to drive the fault site to logic one for a single stuck-at-0 fault, or logic zero for a single stuck-at-1 fault, and propagate this signal to a primary output [6]. To detect a path delay fault, we need to drive the timing-critical path starting point to take a signal transition, and all the timing-critical path side inputs to their respective noncontrolling logic values, such that the signal transition propagates through the timing-critical path. Since backtracing has already produced a timing-critical path/network, the modified PODEM algorithm only needs to justify signal propagation in the given timing-critical path/network. No off-path/network signal takes a transition, including the timing-critical path/network side inputs. As a result, the modified PODEM algorithm assigns only stable logic 0 and logic 1 to the unassigned primary inputs and applies forward implication until all timing-critical path/network side inputs are assigned their respective noncontrolling logic values.
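The justification step can be illustrated on a toy netlist. The sketch below is not the paper's modified PODEM: it replaces PODEM's guided search with plain exhaustive enumeration of stable values on the unassigned primary inputs, followed by forward simulation of both vectors of the pair. The netlist format and all names are hypothetical.

```python
from itertools import product

def evaluate(gates, pi_values):
    """Forward-simulate a netlist. gates maps an output net to
    (gate type, input nets) and must be listed in topological order."""
    values = dict(pi_values)
    for out, (gate_type, ins) in gates.items():
        bits = [values[i] for i in ins]
        if gate_type in ("AND", "NAND"):
            v = int(all(bits))
        elif gate_type in ("OR", "NOR"):
            v = int(any(bits))
        elif gate_type == "NOT":
            v = bits[0]
        else:
            raise ValueError(f"unsupported gate type: {gate_type}")
        if gate_type in ("NAND", "NOR", "NOT"):
            v = 1 - v
        values[out] = v
    return values

def justify(gates, launch, capture, free_inputs, required):
    """Find stable values for free_inputs such that every net in `required`
    holds its required value under both the launch and the capture vector.
    Returns the assignment, or None if the path/network is false."""
    for bits in product((0, 1), repeat=len(free_inputs)):
        stable = dict(zip(free_inputs, bits))
        v1 = evaluate(gates, {**launch, **stable})
        v2 = evaluate(gates, {**capture, **stable})
        if all(v1[n] == val and v2[n] == val for n, val in required.items()):
            return stable
    return None

# Input a rises through an AND gate whose side input n1 = OR(b, c) must stay at 1.
gates = {"n1": ("OR", ["b", "c"]), "out": ("AND", ["a", "n1"])}
assignment = justify(gates, {"a": 0}, {"a": 1}, ["b", "c"], {"n1": 1})
```

Exhaustive enumeration is exponential in the number of free inputs, so it only serves to show what is being justified; a PODEM-style search would prune this space using implication.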
We further compact the achieved path delay test patterns. Traditional static test pattern compaction is achieved by combining compatible test patterns. Given two test patterns, we apply D-intersection (∩) at each bit position [6].
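Static compaction by combining compatible patterns can be sketched as a bitwise merge with don't-care positions. The alphabet (any symbols plus 'X' for an unassigned bit), the greedy merge order, and the function names are assumptions for illustration; the exact D-intersection table of [6] is not reproduced here.

```python
def merge_patterns(p, q):
    """Merge two test patterns position by position. 'X' is a don't-care;
    any other symbols (e.g. '0', '1', 'r', 'f') must match exactly.
    Returns the merged pattern, or None if the patterns are incompatible."""
    if len(p) != len(q):
        raise ValueError("patterns must have the same length")
    merged = []
    for a, b in zip(p, q):
        if a == "X":
            merged.append(b)
        elif b == "X" or a == b:
            merged.append(a)
        else:
            return None  # conflicting assignments at this position
    return "".join(merged)

def compact(patterns):
    """Greedy static compaction: fold each pattern into the first
    compatible pattern already kept, otherwise keep it as a new one."""
    kept = []
    for p in patterns:
        for i, q in enumerate(kept):
            m = merge_patterns(q, p)
            if m is not None:
                kept[i] = m
                break
        else:
            kept.append(p)
    return kept

compacted = compact(["1X0X", "10XX", "0XX1", "XX01"])
```

In this example four patterns reduce to two. The greedy order affects the result, so a different ordering heuristic may compact further.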
We further expand each signal propagation network to its maximum. For example, if a timing-critical network includes a fan-out net, we perform forward propagation and send the switching signal to the other branches. We remove the redundant signal propagation networks that are either covered by another network or identical to another network. We generate test patterns only for the prime (or maximum) timing-critical signal propagation networks.
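Removing covered or duplicate networks amounts to keeping only the maximal sets under inclusion. A minimal sketch, assuming each network is represented as a set of (driver, load) edges; this representation and the function name are illustrative.

```python
def prime_networks(networks):
    """Keep only networks that are neither identical to an earlier one
    nor strictly contained in another one."""
    unique = []
    for network in map(frozenset, networks):
        if network not in unique:
            unique.append(network)  # drop identical networks
    # Drop networks strictly covered by a larger network.
    return [n for n in unique if not any(n < m for m in unique)]

candidates = [
    {("G1", "G3")},
    {("G1", "G3"), ("G2", "G3")},
    {("G1", "G3"), ("G2", "G3")},
    {("G4", "G5")},
]
primes = prime_networks(candidates)
```

Only the two-edge network and the unrelated single-edge network remain as prime networks.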
Algorithms 1-3 summarize our techniques.
Algorithm 3 Test Pattern Compaction
VI. EXPERIMENTS
A. Experimental Settings
We base our experiments on the ISCAS'89 benchmark circuits. We assign signal probabilities to the primary inputs and the flip-flop outputs in two schemes.
1) We assign logic one, logic zero, rising signal transition, and falling signal transition with equal occurrence probabilities, that is, 25% each, to the primary inputs and the flip-flop outputs of the benchmark circuits.
2) We assign 15% logic one, 75% logic zero, 2% rising signal transition, and 8% falling signal transition to the primary inputs and the flip-flop outputs.
In both schemes, we have process variations in the form of gate delay variations as well as signal arrival time variations at the primary inputs and the flip-flop outputs. Gate delays follow a normal distribution with μ = 1 and 3σ = 1. Net delays are zero. The signal arrival times at the primary inputs and flip-flop outputs follow a normal distribution with μ = 0 and 3σ = 1. The gate delays and the signal arrival time variations at the primary inputs and flip-flop outputs are all independent of each other. While our current implementation is based on these simple models, the SPSTA and SPSTA-DTPG methods are not limited to them, and can be extended to more accurate and realistic parametric variation and statistical delay models such as in [5].
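The two input schemes and the delay model above can be sampled as follows. This is a sketch of the stated settings only; the names `SCHEME_1`, `SCHEME_2`, `sample_input`, and `sample_gate_delay` are hypothetical, and arrival times are sampled only for transitions as an assumption.

```python
import random

# Occurrence probabilities of logic one, logic zero, rising, and falling.
SCHEME_1 = {"1": 0.25, "0": 0.25, "r": 0.25, "f": 0.25}
SCHEME_2 = {"1": 0.15, "0": 0.75, "r": 0.02, "f": 0.08}

def sample_input(scheme, rng):
    """Draw a (value, arrival time) pair for a primary input or flip-flop output."""
    value = rng.choices(list(scheme), weights=list(scheme.values()), k=1)[0]
    # Arrival times are normal with mu = 0 and 3*sigma = 1; stable values have none.
    arrival = rng.gauss(0.0, 1.0 / 3.0) if value in ("r", "f") else None
    return value, arrival

def sample_gate_delay(rng):
    """Gate delay: normal with mu = 1 and 3*sigma = 1, independent per gate."""
    return rng.gauss(1.0, 1.0 / 3.0)

rng = random.Random(0)  # fixed seed for reproducibility
samples = [sample_input(SCHEME_2, rng) for _ in range(10000)]
```

Under the second scheme only about 10% of the sampled inputs carry a transition, which illustrates how rarely random vectors exercise a given signal propagation path.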
B. SPSTA and SPSTA-DTPG Evaluation Methodology
We have implemented four timing analyzers.
1) We implemented an STA that finds minimum/maximum signal arrival times at the primary inputs and the flip-flop outputs (e.g., from the 3σ points of the signal arrival time distributions), and computes four signal arrival times at each timing node: minimum rising, minimum falling, maximum rising, and maximum falling signal arrival times (the minimum/maximum input signal arrival time plus the minimum/maximum gate delay gives the minimum/maximum gate output signal arrival time, where the minimum/maximum gate delay is from a 3σ point). We take these minima/maxima as 3σ points and find the means and the standard deviations for the signal arrival time distributions.
2) We have extended the STA implementation to SSTA, by computing the means and the standard deviations of the minimum rising, minimum falling, maximum rising, and maximum falling signal arrival times [the minimum/maximum input signal arrival time plus the gate delay gives the minimum/maximum gate output signal arrival time, wherein the means and the standard deviations are computed based on (5)]. We take the −3σ point of the minimum and the +3σ point of the maximum signal arrival times and compute the means and the standard deviations of the signal arrival times.
3) We have implemented the four-valued logic based SPSTA, as described in Section IV-C. We compute signal transition temporal occurrence probabilities ϕ_r and ϕ_f, as well as signal probabilities P_0, P_1, P_r, and P_f. We report maximum mean signal arrival times ϕ_r (ϕ_f) of occurrence probability P_r > 0.001 (P_f > 0.001) to exclude false paths.
4) For Monte Carlo simulation, we have implemented a four-valued logic simulator as in Section IV-C. We assign the four logic values (logic one 1, logic zero 0, rising r, and falling f signal transitions) and signal arrival times for the rising and the falling signal transitions to the primary inputs and the flip-flop outputs, and propagate them through the netlists. Glitches are not counted in a timing analyzer, e.g., a rising and a falling signal transition at the inputs of an AND gate give logic zero at the output. The MIN or the MAX computation is taken for signal arrival times based on the logic of the gate and the signal transition directions.
We have implemented three delay test pattern generators.
1) Monte Carlo logic simulation, which finds the m most timing-critical paths/networks of a circuit.
2) The existing SSTA-TQM-BnB method, which performs SSTA and finds the most timing-critical signal propagation paths based on a TQM in a branch-and-bound algorithm [7], [8], [14], [36], [37].
3) Our SPSTA-DTPG method, which performs SPSTA, backtraces from the logic output of the maximum μ+3σ signal arrival time, and finds the most timing-critical signal propagation networks.
We propose the top N timing-critical path coverage as a delay fault coverage metric. The top N timing-critical path coverage is the percentage of the golden reference top N timing-critical paths (e.g., given by a sufficient number of Monte Carlo simulation runs) that are selected by a delay test critical path selection method, such as SPSTA-DTPG or SSTA-TQM-BnB. We give a formal definition as follows.
Definition 4: For a delay test critical path selection method A, and a golden reference delay test critical path selection method G (e.g., Monte Carlo simulation of a sufficient number of runs), the top N critical path coverage C_P is given by
$$C_P = \frac{|P_A \cap P_G|}{|P_G|}$$
where P_A is the set of top N critical paths selected by method A and P_G is the set of top N critical paths selected by the golden reference method G. P_A ∩ P_G is the intersection of sets P_A and P_G. |.| is the cardinality of a set.
Algorithm 4 Achieve Golden Top Timing-Critical Paths
We obtain a golden reference top N timing-critical path ranking list as follows. We run Monte Carlo simulation in batches. Each batch of Monte Carlo simulation runs generates a list of ranked timing-critical paths. After each batch of Monte Carlo simulation runs, we apply merge sort to all the generated timing-critical paths, and obtain a list of top N timing-critical paths. We compute the Kendall tau rank correlation coefficient for the last two timing-critical path ranking lists. The Kendall tau rank correlation coefficient is given as follows [35]:
$$\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\tfrac{1}{2}\, n (n - 1)}$$
where n is the number of critical paths. Given two ranked lists of timing-critical paths L_x and L_y, let the ranks of timing-critical paths p_i and p_j in L_x and L_y be x_i, y_i, x_j, and y_j, respectively. If x_i > x_j and y_i > y_j, or x_i < x_j and y_i < y_j, paths i and j are a concordant pair; otherwise, they are a discordant pair, except that if x_i = x_j or y_i = y_j, they are neither concordant nor discordant. Therefore, the Kendall tau rank correlation coefficient can be computed as follows:
$$\tau = \frac{2}{n (n - 1)} \sum_{i < j} \operatorname{sgn}(x_i - x_j)\, \operatorname{sgn}(y_i - y_j)$$
If the Kendall tau rank correlation coefficient is close to one and stops increasing as we run one more batch of Monte Carlo simulation, we stop and output the top N critical path ranking list as the golden result (Algorithm 4).
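The coverage metric of Definition 4 and the Kendall tau coefficient built from concordant and discordant pairs can be computed as in the sketch below. It assumes that both ranking lists rank the same n paths, which the text does not state explicitly; the function names are hypothetical.

```python
def top_n_coverage(selected, golden):
    """Fraction of the golden top N critical paths also selected by a method."""
    golden_set = set(golden)
    return len(set(selected) & golden_set) / len(golden_set)

def kendall_tau(rank_x, rank_y):
    """Kendall tau for two rankings given as dicts path -> rank.
    Assumes both dicts rank the same paths; tied pairs count as neither
    concordant nor discordant."""
    paths = list(rank_x)
    n = len(paths)
    if n < 2:
        raise ValueError("need at least two paths")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = rank_x[paths[i]] - rank_x[paths[j]]
            dy = rank_y[paths[i]] - rank_y[paths[j]]
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

coverage = top_n_coverage(["p1", "p2", "p3"], ["p2", "p3", "p4", "p5"])
tau = kendall_tau({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 3, "c": 2})
```

In the example, two of the four golden paths are selected, and swapping the ranks of two neighboring paths out of three leaves two concordant pairs and one discordant pair.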
C. Experimental Results and Observations
We first run Monte Carlo simulation and achieve a golden timing-critical path ranking list for each benchmark circuit.
Observation 1: An average of 94.2 million Monte Carlo simulation runs are needed for these ISCAS'89 benchmark circuits to achieve a golden reference top N timing-critical path ranking list by Algorithm 4. This implies that the common practice of functional at-speed test cannot be expected to achieve good delay fault coverage.
We then compare Monte Carlo simulation, SSTA, SSTA-S, and SPSTA in terms of maximum output signal arrival time (Table II). SSTA-S is a simplified SSTA algorithm, which takes no MIN/MAX operations and assumes zero covariance between signal propagation delays. For example, given Gaussian distributions for the delay of each gate in a path, the path delay follows a Gaussian distribution whose mean is the sum of the gate delay means and whose variance is the sum of the gate delay variances. SSTA-S has also been implemented in Synopsys PrimeTime Advanced OCV Technology under the name of depth-based statistical analysis [32]. A number of existing delay ATPG algorithms, such as SSTA-TQM-BnB, are based on SSTA-S resultant signal arrival times [7], [8], [14], [36], [37]. We have the following observation from Table II.
Observation 2: SSTA, SSTA-S, and SPSTA achieve an average of 5.525%, 8.675%, and 3.875% inaccuracy, respectively, in maximum output signal arrival time estimates compared with the golden reference Monte Carlo simulation results.
While SSTA gives pessimistic maximum signal arrival time estimates, SSTA-S and SPSTA give optimistic estimates. The 16.7% difference between SSTA and SSTA-S results is due to the MAX operations in SSTA. The 10.4% difference between SPSTA and SSTA results arises because, in our implementation, we discard a signal arrival time distribution if its occurrence probability is smaller than an empirical threshold, which is needed to exclude false paths and to achieve a tradeoff between runtime and accuracy.
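The SSTA-S path delay computation described above, which sums gate delay means and variances without any MIN/MAX operation, can be sketched as follows. The function name is hypothetical, and the nine-gate path is an arbitrary example using the gate delay model of Section VI-A.

```python
import math

def ssta_s_path_delay(gate_delays):
    """Path delay under SSTA-S. gate_delays is a list of (mean, sigma) pairs
    for independent Gaussian gate delays. Returns (mean, sigma) of the sum."""
    mean = sum(m for m, _ in gate_delays)
    sigma = math.sqrt(sum(s * s for _, s in gate_delays))
    return mean, sigma

# A path of nine gates, each with mu = 1 and 3*sigma = 1.
path_mean, path_sigma = ssta_s_path_delay([(1.0, 1.0 / 3.0)] * 9)
worst_case = path_mean + 3.0 * path_sigma
```

The path has mean 9 and σ = 1, so its μ+3σ delay is 12. Because no MAX over converging paths is taken, this estimate ignores the upward shift of the mean that a MAX operation introduces, which is consistent with SSTA-S being optimistic.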
We next compare SSTA-TQM-BnB, SSTA-DTPG-C (statistical static timing analysis-based delay test pattern generation with compaction, i.e., SSTA-TQM-BnB with test pattern compaction), SPSTA-DTPG, and SPSTA-DTPG-C (SPSTA-DTPG with test pattern compaction) in terms of top critical path coverage (Table III).
Observation 3: SSTA-TQM-BnB achieves an average of 47.32%, 45.14%, and 57.98%, SSTA-DTPG-C achieves an average of 59.05%, 61.95%, and 70.11%, SPSTA-DTPG
The 10.1%, 16.3%, and 10.1% path delay fault coverage improvements for the top 50, 100, and 200 critical paths, respectively, achieved by SPSTA-DTPG compared with SSTA-TQM-BnB are due to: 1) finding delay vectors that enable multiple signals to propagate in a timing-critical signal propagation network instead of a timing-critical signal propagation path and 2) removing false timing-critical signal propagation paths/networks. For example, an average of 6% of the top 200 timing-critical networks given by SPSTA-DTPG are nontree networks.
The 25.68%, 26.05%, and 22.25% path delay fault coverage improvements for the top 50, 100, and 200 critical paths, respectively, achieved by SPSTA-DTPG-C compared with SPSTA-DTPG are due to test pattern compaction. For the top 200 timing-critical paths, the average test pattern compaction ratio is 22.83%, e.g., 10 000 test patterns are compacted into 2283 test patterns. Table IV gives the runtime comparison.
Observation 4: Monte Carlo simulation, SSTA-TQM-BnB, and SPSTA-DTPG require an average runtime of 50.55 h, 7.75 min, and 12.45 min, respectively. With test pattern compaction, SSTA-DTPG-C and SPSTA-DTPG-C require an average runtime of 8.24 and 12.95 min, respectively. Compared with SSTA-TQM-BnB, the SPSTA-DTPG runtime increases due to its larger computation load.
We also observe orders of magnitude of runtime saving achieved by SPSTA-DTPG compared with Monte Carlo simulation for the same path delay fault coverage. The proposed SPSTA-DTPG scales well as the instance size increases, partly because of an implemented pruning mechanism, which removes all signal arrival time distributions of a negligible occurrence probability.
VII. CONCLUSION
Our contributions in this paper are as follows. 1) We observe that VLSI statistical timing analysis and power estimation study the same stochastic signal switching activity in a circuit. By leveraging the existing signal probability-based VLSI power estimation techniques, we have developed an SPSTA technique. By leveraging the existing ATPG techniques, such as backtracing and logic implication, we remove false paths and consider reconvergent fan-out-induced signal correlation in SPSTA. 2) We propose a VLSI delay test pattern generation method based on an input-aware statistical timing analyzer, such as SPSTA. We observe that in the presence of performance variation, the worst case performance is more likely to occur when multiple signals propagate simultaneously, and propose to generate delay test patterns that enable signal propagation networks for reduced delay test size or improved delay fault coverage. 3) We propose top timing-critical path coverage as a simple delay fault coverage metric. Our experimental results based on ISCAS'89 benchmark circuits show that: 1) millions of Monte Carlo simulation runs are needed to achieve a stable top timing-critical path ranking; 2) compared with Monte Carlo simulation results, SSTA, SSTA-S (or depth-based statistical analysis), and SPSTA achieve an average of 5.525%, 8.675%, and 3.875% inaccuracy in maximum signal arrival time estimate;
