A Monte Carlo-based approach is proposed that can identify, in a non-enumerative and scalable manner, the distributions that describe the delay of every path in a combinational circuit. Furthermore, a scalable approach to select critical paths from a potentially exponential number of path candidates is presented. Paths and their delay distributions are stored in Zero-Suppressed Binary Decision Diagrams. Experimental results on some of the largest ISCAS-89 and ITC-99 benchmarks show that the proposed method is highly scalable and effective.
INTRODUCTION
As CMOS technology is scaled down into the deep nanometer region, control of the physical parameters, such as the feature size of transistors, their doping levels, and oxide thicknesses, has become extremely difficult. Also, the close proximity of devices to each other has given rise to significant interference from elements surrounding a device due to inductive and capacitive coupling and due to environmental factors such as power supply and temperature fluctuations. These and other factors have made it increasingly difficult to predict timing behavior. This results in significant loss of yield from manufactured circuits.
Due to process variations, the delay of a path varies among the different manufactured chips. Therefore, the path delay is no longer a discrete quantity but a statistical quantity (i.e., a distribution). A path delay distribution can be obtained using Statistical Static Timing Analysis (SSTA) [Blaauw et al. 2008] or Monte Carlo. Under the SSTA-based technique, the delay of each gate/interconnect is given as a probability density function (pdf) represented using a mean (μ) and a standard deviation (σ). The delay of a path is derived by performing a statistical summation operation over all gates/interconnects along the path.
Monte Carlo, on the other hand, generates several instances of a given circuit. In each instance, a discrete value is assigned to each gate and interconnect of the circuit [Stein 1986]. The delay of a path in each instance is a discrete value that is the sum of all the discrete values corresponding to the gates and interconnects along the path. The delay distribution of a path is obtained by collecting its delays over all generated instances.
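As an illustration of this per-path Monte Carlo procedure, the following minimal sketch samples circuit instances and histograms the delay of one explicitly given path. The function name, the nominal delays, and the Gaussian sampling are assumptions made only for this example; Monte Carlo itself does not require any particular distribution.

```python
import random
from collections import Counter

def path_delay_distribution(path, nominal, sigma, n_instances, seed=0):
    """Histogram the delay of one path over n_instances sampled circuit instances."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_instances):
        # One instance: draw a discrete delay for every gate/interconnect.
        # Gaussian sampling is an assumption of this sketch, not of the method.
        instance = {g: max(0, round(rng.gauss(mu, sigma))) for g, mu in nominal.items()}
        # The path delay in this instance is the sum of the delays along the path.
        counts[sum(instance[g] for g in path)] += 1
    return counts

nominal = {"D": 9, "F": 4, "G": 12}  # hypothetical nominal delays
dist = path_delay_distribution(["D", "F", "G"], nominal, sigma=1.0, n_instances=1000)
```

The sketch handles a single path at a time, so repeating it for every path of a circuit requires enumerating the paths.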
Identifying the delay distribution of each path in this way is path-enumerative in nature, and it is nonscalable given the potentially exponential number of paths in the circuit. Analyzing the delay distributions of paths is key to determining the operating frequency (clock period) and the consequent timing yield. This requires a framework to store and manipulate path delay distributions without path enumeration. Less pessimistic timing analysis methods, as in Sivaraman and Strojwas [2000], rely on Monte Carlo, and, for each instance, they operate in a path-enumerative manner. Thus, they are not scalable. The proposed method may accelerate such efforts. After all path delay distributions are calculated using our method, one can identify the highest delay and the associated path frequencies. This will indicate the probability of the highest circuit delay. Similar information can be extracted for the second highest delay and so on. Thus, the circuit's critical timing behavior can be obtained without path enumeration.
The second contribution in this article is aimed at selecting a set of critical paths under any clock period in a scalable manner without path enumeration. These are the paths that exhibit a high probability of violating the desired clock period due to defects arising from manufacturing process variations or signal integrity issues. The Path Delay Fault (PDF) model is considered to be very effective in detecting failures caused by undesirable variations in manufacturing processes; see Wang et al. [2009] and Padmanaban and Tragoudas [2005] among others. It is prohibitive to test all paths, and only a subset of paths that exceed a given test margin τ is tested. However, typically, the number of paths that exceed the test margin may be prohibitive (see Wang et al. [2009] and Padmanaban and Tragoudas [2005] among others), and this requires an approach that does not enumerate paths.
In order to account for variations in manufacturing processes and other environmental sources, statistical timing models are used to model gate delays and circuit delay. With the introduction of statistical timing models, the criticality/critical probability of a path x, denoted $\phi_\tau(x)$, is defined as the probability with which x exceeds the given τ.
The shrinking time to market has tightly constrained the test budget for detection of delay defects and allows for only a small number of paths to be tested. The challenge, therefore, is to select the set of paths that provide the maximum delay defect coverage under a given test budget.
The authors in Li et al. [1989], Murakami et al. [2000], Shao et al. [2003], Sharma and Patel [2002], Qiu and Walker [2003], and Lu et al. [2005] propose selecting a longest testable path through each gate. These approaches are path-enumerative and lack an appropriate modeling for process variations.
Path correlation is classified into two categories: topological correlation and spatial correlation. Topological correlation refers to shared gates and interconnects, and spatial correlation stems from high process correlation in closely placed path components. According to the spatial correlation model, gates placed very close together on the layout are highly correlated, and, as the distance between any two gates increases, their correlation decreases. The authors in Agarwal et al. [2003] present a method to capture such spatial correlation between gates by dividing the circuit layout into grids.
The authors in Christou et al. [2010] present a non-enumerative technique for measuring path correlations. A numerical value representing the topological correlations among a given set of paths is computed. However, it is unclear how this approach can take into account path criticality under process variations and the spatial correlation of processes to yield a high-quality path set for delay test.
Analytical path selection methods in Zolotov et al. [2010], Chung et al. [2012], and He et al. [2013] assume a normal distribution of gate delays and a linear relation of gate delays to the process parameter variations. Also, they are imprecise in their statistical maximum calculation. Such inaccuracies or simplifications may lead to inaccuracies in computing path criticalities and selecting high-quality path sets for effective delay testing.
Monte Carlo, on the other hand, makes no simplifying assumptions on the distribution of process parameters or the models for gate/circuit delays. The accuracy of the Monte Carlo method is independent of the set of process variations and depends directly on the sample size. This is favorable because the accuracy can be increased with an increase in sample size. It is regarded as the gold standard for validating approximation-based tools such as SSTA, and recent work in Singhee et al. [2008] and Veetil et al. [2011] presents enhanced sampling methods that deliver reduced sample sizes to achieve a desired accuracy.
The authors in Wang et al. [2004] proposed a Monte Carlo approach for critical path selection while considering topological and spatial path correlations. Wang et al. [2004] select paths based on their conditional criticality. Conditional criticality of a path is the probability that a path is critical in the currently uncovered process space, which also translates to a given path's ability to detect defects that weren't already detected by the previously selected paths. This approach requires identifying enumeratively an initial set of candidate critical paths. The initial set of paths may be prohibitive on modern circuits with many close-to-critical paths. Each path is tested on every circuit instance to determine the set of instances detected by the path, which are then removed from consideration for subsequent paths.
A Monte Carlo-based approach is proposed herein, one that is capable of identifying in a non-enumerative and scalable manner the distributions that describe the delay of every path in a combinational circuit. Furthermore, a scalable approach to select critical paths from a potentially exponential number of path candidates is presented. Paths and their delay distributions are stored in Zero-Suppressed Binary Decision Diagrams (ZBDDs). This article gives a comprehensive treatment of the procedures used for non-enumerative generation of path delay distributions and non-enumerative correlation-aware critical path selection. It expands on preliminary versions in Somashekar et al. [2012] and Somashekar et al. [2015] and introduces new procedures for implicit manipulation of paths and their related distributions.
The rest of this article is organized as follows. Section 2 presents preliminary information related to ZBDDs and the method of representing path distributions in ZBDDs. Section 3 presents the proposed approach for non-enumerative generation of path delay distributions for all paths in the given circuit. Section 4 presents the proposed correlation-aware critical path selection method to return paths that cover the various process instances. Experimental results in Section 5 show the effectiveness of the proposed approach in comparison to existing approaches, and Section 6 concludes.
PRELIMINARIES ON ZBDDS
A ZBDD is a canonical representation of Boolean expressions that facilitates implicit manipulation of a collection of sets. If the elements of each set are gates, then each set represents a path. Operations on ZBDDs correspond to the basic set manipulation operations, and ZBDDs provide built-in operators to carry out such operations. These operators have been shown to be very fast [Minato 1993]. Here, we list the operators used in this article:
The complexity of the procedures presented in this article is expressed as a function of the number of calls to these operators.
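For intuition only, the set-family semantics of such operators can be mimicked with explicit Python sets, where a path is a frozenset of node names and a ZBDD stands for a set of such frozensets. The sketch below covers two operators that appear later in this article, SubSet() and the unate product; the function names and the sample paths are hypothetical, and a real ZBDD package performs these operations on the shared graph without listing the combinations.

```python
def subset(family, var):
    """Combinations of `family` that contain `var` (the SubSet() behaviour used in this article)."""
    return {s for s in family if var in s}

def unate_product(f, g):
    """Every pairwise union of a combination from f with a combination from g."""
    return {a | b for a in f for b in g}

# Hypothetical family of three paths; union, intersection and difference
# are simply the native set operators |, & and -.
paths = {frozenset("ADG"), frozenset("BDG"), frozenset("CEFG")}
through_d = subset(paths, "D")
tagged = unate_product(through_d, {frozenset({"b3"})})
```

Here `tagged` attaches the hypothetical interval variable `b3` to every path through D, which is the kind of tagging used when delay intervals are added to paths.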
Figure 1(b) shows a ZBDD containing all physical paths in the circuit shown in Figure 1(a). Each node in the ZBDD corresponds to a gate in the circuit. Nodes are placed in ZBDD levels, and all nodes at the same level correspond to the same gate or input pin in the circuit. In Figure 1(b), the root node is A, and it corresponds to input pin A. In the example of Figure 1(b), there are three input pins and three gates in the circuit, and the ZBDD has only six nodes that are placed in the six levels. However, in general, each ZBDD level may have multiple nodes that correspond to an input pin or gate. This will be shown in subsequent examples.
The solid edge out of each node indicates that the node belongs on a path. A dotted edge indicates that the node is not on a path. A ZBDD path is any directed path from the root to terminal 1. In Figure 1(b), each ZBDD path corresponds to a physical path in the circuit. Given a circuit, the corresponding ZBDD is constructed using the operations given in Padmanaban and Tragoudas [2005].
Figure 1(c) shows a ZBDD representing the logical paths of the circuit in Figure 1(a). In this ZBDD, there are two nodes per physical circuit node. A subscript is used with node names to denote a transition at the respective nodes. The symbol r is used to denote a rising transition, whereas the symbol f is used to denote a falling transition. Note that the ZBDD node corresponding to a rising transition at a gate and the ZBDD node corresponding to a falling transition at a gate are at different levels. This representation is useful to model different rise and fall delays.
The delay distribution of a path represents the different delay values exhibited by the path and the number of times the path exhibited each delay value. An effective approach to record the delay distribution of a path is to observe the frequency $f_k$ with which a path exhibits delays within each interval $b_k$. Figure 2 illustrates an assumed normal distribution. However, the delay distribution of a path is not necessarily normal, and the proposed approach does not apply any approximation to normalize the distribution. The following describes a method to store path frequencies for different delay intervals in ZBDDs.
We expand on the work in Minato [1995] that presents an approach to represent polynomials in a ZBDD. A similar approach is employed to record the frequency of a path in a delay interval. To represent path delay distributions in a ZBDD, additional nodes are added to the path ZBDD, and paths are extended to include these nodes.
Consider the following expression
The integer coefficients in Equation (1) can be represented as a sum of 2's exponents using binary encoding:
where $b_i \in \{0, 1\}$ and $\{b_m b_{m-1} b_{m-2} \ldots b_0\}$ is the $(m+1)$-bit binary representation of the integer $C$. Integer 6 is represented in binary as 110. It is expressed using a sum of 2's exponents as $2^2 + 2^1$. The expression in Equation (1) can be rewritten as
The ZBDD for the preceding expression is shown in Figure 3(a), where nodes $2^1$ and $2^2$ are added in order to associate frequency to paths. The frequency of any path is derived by adding the labels of all 2's exponent nodes with solid edges on the path.
In general, nodes for the exponents of 2 are added to the ZBDD and are attached to the desired paths. These nodes are shared by all paths.
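The binary encoding of a frequency can be checked with a short sketch. The two helper names below are hypothetical; the first returns the exponents of the 2's exponent nodes a path with a given frequency would pass through, and the second recovers the frequency from them.

```python
def exponents_of_two(c):
    """Exponents i such that 2^i appears in the binary encoding of the integer c."""
    return [i for i in range(c.bit_length()) if (c >> i) & 1]

def frequency(exponents):
    """Frequency obtained by adding the labels 2^i of the given exponent nodes."""
    return sum(1 << i for i in exponents)

nodes_for_six = exponents_of_two(6)  # 6 = 110 in binary
```

For the integer 6 used above, `nodes_for_six` contains the exponents 1 and 2, matching the nodes $2^1$ and $2^2$.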
In the proposed approach, we first identify a set of delay intervals along the range of circuit delay. For $k$ delay intervals, we have $k$ nodes in the ZBDD represented by $b_i$, $i \in \{0, 1, 2, \ldots, k-1\}$.
Nodes corresponding to delay intervals are added to the ZBDD, and paths are tagged to these nodes to complete the representation of delay distributions. 
PROPOSED APPROACH FOR NON-ENUMERATIVE GENERATION OF PATH DELAY DISTRIBUTIONS
This section presents an approach to identify delay distributions for an exponential number of paths in a circuit. This is achieved over several iterations. During each iteration, the node/gate delays are instantiated using a Monte Carlo instance. At each iteration, all path delay distributions are implicitly updated. Static Timing Analysis (STA) identifies the arrival (upper and lower) time bounds at the output of each node/gate and, subsequently, a time interval on the circuit delay that represents the expected range of path delays. The time interval is represented with up to $P$ discrete points that define the precision of the approach. $P$ can be some power of 10. Assume a circuit operating at 1 GHz frequency. The delay of the longest path in this circuit is at most 1 ns. If a precision of $P = 10^3$ is used, then two paths will be kept in distinct sets if their delay difference is at least 1 ps. In this scenario, there can be a maximum of 1000 discrete points to capture all path delays. Each discrete point essentially represents an interval that abstracts the delay to a given precision. The higher the precision, the greater the accuracy with which we can capture path delay distributions. The choice of precision is a tradeoff between accuracy and computation time.
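The mapping from a delay to its discrete point can be sketched as follows. This is one possible binning, assumed here for illustration: the function name is hypothetical, times are kept as integers in femtoseconds to avoid floating-point rounding, and each point covers an equal share of the STA range.

```python
def discrete_point(delay_fs, lo_fs, hi_fs, precision):
    """Index of the discrete point (interval) that holds delay_fs; all times in femtoseconds."""
    if not lo_fs <= delay_fs <= hi_fs:
        raise ValueError("delay outside the STA bounds")
    # Equal-width intervals over [lo_fs, hi_fs]; integer floor division picks the interval.
    index = (delay_fs - lo_fs) * precision // (hi_fs - lo_fs)
    return min(index, precision - 1)  # the upper bound falls in the last interval

# 1 ns range with P = 1000 gives intervals that are 1 ps (1000 fs) wide.
a = discrete_point(250_400, 0, 1_000_000, 1000)
b = discrete_point(251_400, 0, 1_000_000, 1000)
```

With these values, two delays that differ by 1 ps land in different points, while delays closer than that may share one.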
A partial path is defined as a path from a primary input to an internal gate. We use ZBDDs to store partial paths. Partial paths are kept in different sets based on their delays. Discrete values are used to identify each set of partial paths.
Consider gate G shown in Figure 5. The ZBDD associated with the discrete point $t$ at the output of G is represented as $t_G$. $t_G$ will contain all partial paths up to G whose delay is $t$.
Each gate in any given Monte Carlo instance has a fixed delay value. Let $I(G)$ represent the delay of gate G in a given instance. In Figure 5, E is one of the inputs to G. Let $v_E$ be the ZBDD containing all those partial paths through E whose delay, in the given instance, ties to $v$. The following operations take place each time a partial path is extended through a gate G. Using Equation (2), the delay of the partial path, when extended to the output of G, is calculated. Then, using Equation (3), the appropriate ZBDD is updated to include the partial path extended through G:
GenPathDelayDists() in Procedure 1 lists the procedure to identify the path delay distributions in a given set of Monte Carlo instances. The input to Procedure 1 is a ZBDD that implicitly holds all circuit paths, the desired precision P, and the set of Monte Carlo instances MC-Insts.
The following describes Procedure 1 with an example. The following exposition uses the circuit in Figure 4 (a), which shows the bounds on the arrival time at the output of each gate. The number adjacent to each node represents the corresponding gate delay in a given Monte Carlo instance.
Line 1 in Procedure 1 performs STA to determine the arrival time delay bounds at the output of each node. The function in Line 2 identifies the discrete points within the delay range at each node based on the given precision. A is an associative array that stores the discrete points associated with each gate's output.
In Lines 3−17, the ZBDD is traversed in a breadth-first manner (i.e., level by level) from the root node to the terminal node 1 while processing nodes at each ZBDD level. In the ZBDD of Figure 4(b), A is the root node, and the solid edge from A leads to node D. For the given instance, the delay associated with node D is 9, and the parents of D are nodes A and B.
Nodes A and B are primary inputs and, in this scenario, both have delay 0. Therefore, all partial paths that extend to the output of node D tie to the same discrete point (i.e., 9). The only non-empty ZBDD at the output of D is expressed as follows:
At the next level, node E has parents C and D. The delay associated with E in the given instance is 13 (see also Figure 4 (b)). Node C has a solid (then) edge to E and, therefore, it participates on the paths through E. The node D has a dotted edge (else) to E and, therefore, it does not participate in the paths through E. We obtain the parents of node D with solid edge to D, which are A and B.
Nodes A, B, and C are primary inputs, and, in this scenario, all three nodes have delay 0. Therefore, all partial paths that extend to the output of node E tie to the same discrete delay, which in this case is 13. The only non-empty ZBDD at the output of E is expressed as
Node F has one parent node (E), and the delay at node F in the given instance is 4 (see also Figure 4(b) ).
The delay of all partial paths through F is 17, and is obtained as follows:
In the preceding expression, $D(13_E)$ returns the corresponding discrete point 13.
$13_E$ is the only non-empty ZBDD at the input of F, and, therefore, the only non-empty ZBDD at the output of F is expressed as follows:
Node G has solid edge parents D and F. The delay associated with G in the given instance is 12. The delay of all partial paths through input D ties to 21 at the output of G, which is obtained as follows:
The delay of all partial paths through input F ties to 29 at the output of G, which is obtained as follows:
The non-empty ZBDDs at the output of G are expressed as follows:
Let $\{b_0, b_1, \ldots, b_{16}\}$ be the set of ZBDD variables that correspond to the discrete points that our method uses to denote discrete delays of paths from the circuit inputs to circuit outputs. In this example, we need 17 discrete points to represent all integers in the range [20, 36] (see also the output of gate G in Figure 4(a)). The final expression for the path delay distributions in the given instance is as follows:
Line 23 in Procedure 1 calls UpdateDelayDists() to update $\xi$ with the path delay information obtained from the current Monte Carlo instance. The complexity of Procedure 1, per Monte Carlo instance, is $O(E \cdot k + k \cdot m)$ in terms of built-in ZBDD operators, where $E$ is the number of edges in the ZBDD, $k$ is the number of discrete points to represent path delays, and $m$ is the number of frequency nodes. From this analysis, one can observe that the proposed approach is capable of identifying the delay distributions of all paths in the circuit without enumerating the paths.
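The delay propagation traced in the example above can be reproduced with explicit sets of partial paths standing in for the ZBDDs. The sketch below is a simplification under stated assumptions: the function name is hypothetical, the fan-in lists are assumed (in particular, E is given the single input C), and the delay at a gate output is taken as the delay at its input plus the gate delay, which agrees with the values 9, 13, 17, 21, and 29 of the example. Unlike the ZBDD-based procedure, it lists every partial path explicitly.

```python
from collections import defaultdict

def propagate(fanin, delay, order):
    """Per node, map each discrete delay to the set of partial paths ending at that node."""
    at = {}
    for node in order:  # nodes visited in topological order
        if not fanin.get(node):
            at[node] = {0: {(node,)}}  # primary input: one partial path with delay 0
            continue
        sets = defaultdict(set)
        for src in fanin[node]:
            for v, partial in at[src].items():
                t = v + delay[node]  # delay after extending through `node`
                sets[t] |= {p + (node,) for p in partial}
        at[node] = dict(sets)
    return at

fanin = {"D": ["A", "B"], "E": ["C"], "F": ["E"], "G": ["D", "F"]}  # assumed fan-in
delay = {"D": 9, "E": 13, "F": 4, "G": 12}  # gate delays of the example instance
at = propagate(fanin, delay, ["A", "B", "C", "D", "E", "F", "G"])
```

In this sketch, `at["G"]` holds two non-empty sets, one for delay 21 (the paths through D) and one for delay 29 (the path through F).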
UpdateDelayDists() in Procedure 2 lists the sequence of operations carried out in order to update the path delay distributions after each Monte Carlo iteration.
The following illustrates UpdateDelayDists() (Procedure 2) with an example. Consider two paths {A-D-F, B-D-F}: The ZBDD in Figure 3(b) represents the delay distributions recorded for these paths up to the last Monte Carlo iteration and corresponds to $\xi$ in the input to Procedure 2.
PROCEDURE 2: UpdateDelayDists
Input: ξ, ξ_inst
Output: ξ
 1  foreach b ∈ {b_0, ..., b_n} do
 2      ξ_b ← SubSet(ξ, b)
 3      ξ_b^inst ← SubSet(ξ_inst, b)
 4      if ξ_b^inst = ∅ then
 5          continue
 6      ξ_b^inst ← Update(ξ_b^inst, 2^0)
 7      if ξ_b ≠ ∅ then
 8          ξ ← ξ − ξ_b
 9          ξ_I ← ξ_b ∩ ξ_b^inst
10          ξ_b ← (ξ_b ∪ ξ_b^inst) − ξ_I
11          i ← 1
12          while ξ_I ≠ ∅ do
13              ξ_temp ← SubSet(ξ_I, 2^(i−1))
14              ξ_temp ← Update(ξ_temp, 2^i)
15              ξ_I ← ξ_b ∩ ξ_temp
16              ξ_b ← (ξ_b ∪ ξ_temp) − ξ_I
17              i ← i + 1
18      else
19          ξ ← ξ ∪ ξ_b^inst
20      ξ ← ξ ∪ ξ_b
21  return ξ
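The frequency update in Procedure 2 behaves like a binary increment with carry propagation. The following sketch mimics it with explicit Python sets, assuming a hypothetical encoding in which each combination is a triple (path, interval, exponent). It handles all intervals in one pass instead of looping over them, and it enumerates combinations, which the ZBDD operators avoid.

```python
def update_delay_dists(xi, xi_inst):
    """Add one occurrence of every (path, interval) in xi_inst to the counts held in xi."""
    carry = {(path, b, 0) for (path, b) in xi_inst}  # tag each new occurrence with 2^0
    xi = set(xi)
    while carry:
        overlap = xi & carry         # combinations whose 2^i node is already present
        xi = (xi | carry) - overlap  # set the missing nodes, clear the overlapping ones
        carry = {(path, b, i + 1) for (path, b, i) in overlap}  # carry into 2^(i+1)
    return xi

def count(xi, path, b):
    """Frequency of `path` in interval `b`, decoded from its 2's exponent nodes."""
    return sum(1 << i for (p, bb, i) in xi if p == path and bb == b)

xi = {("A-D-F", "b4", 0), ("A-D-F", "b4", 1)}  # hypothetical: A-D-F seen 3 times in b4
xi = update_delay_dists(xi, {("A-D-F", "b4"), ("B-D-F", "b5")})
```

After the call, the count of A-D-F in `b4` has gone from 3 to 4 through two carries, and B-D-F has a first occurrence in `b5`.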
The expression corresponding to the input ZBDD ξ is
For each interval $b$, the procedure identifies the paths from $\xi_{inst}$ and non-enumeratively updates the frequencies of the paths for the same interval in $\xi$. For the given instance, no path exhibits delay in intervals $b_0$ to $b_3$ and, therefore, the process continues to $b_4$. In Line 2, the function SubSet($\xi$, $b_4$) identifies the subset of paths in $\xi$ that contain the variable $b_4$. Line 3 identifies the corresponding subset of $\xi_{inst}$, and Line 6 updates it by implicitly augmenting each path with a variable corresponding to $2^0$. This results in the following expression:
Lines 8−17 perform a series of operations to increment the frequency count of paths, which is encoded using 2's exponents. In Lines 8−10, the paths that previously had a count of 1 ($2^0$) are identified. The following shows the corresponding expressions:
The loop in Lines 12−17 increments the count of each path that previously exhibited delay in $b_4$ by 1. During the first iteration of the loop, the variable corresponding to $2^0$ is replaced with the variable corresponding to $2^1$, effectively updating the path's frequency count from 1 to 2. The following shows the expressions corresponding to Lines 13−16:
The following shows the expressions corresponding to the Lines 13−16 during the second iteration through the loop:
Line 20 adds the updated interval $b_4$ back to the original ZBDD $\xi$:
The process proceeds to interval $b_5$, and the following shows the expressions corresponding to Lines 2, 3, and 6, respectively:
The final updated ξ is expressed as follows: 
SELECTING CRITICAL PATHS FOR ATPG UNDER PATH CORRELATIONS
This section describes an approach to identify critical paths by implicitly considering path correlations. It is developed on the augmented data structure generated in Section 3. This augmented ZBDD is denoted by ξ .
The approach consists of three parts that are implemented without path enumeration. First, we consider all delay intervals that exceed test clock τ . They are merged into a single interval, and the frequencies of the respective paths are updated. Then the highest frequency path is selected. Subsequent paths are selected by considering path correlations.
For a given test clock, the potential critical paths are those that exhibit delays greater than the test clock. The criticality or the critical probability of each path is given by the frequency with which the path exhibits a delay exceeding the test clock. The frequency here is analogous to the number of defective instances that can be detected by testing this path. The critical probability for each path is implicitly held in the augmented ZBDD ξ .
PROCEDURE 3: MergeIntervals
First, procedure MergeIntervals() is presented. It is a method to combine paths in a given set of delay intervals and update their relative frequencies. This is a preprocessing step used for critical path selection. The input to the procedure MergeIntervals() is the ZBDD $\xi$, which implicitly holds the paths' delay distributions (Figure 3(b)), and the subset of intervals over which to combine paths and their frequencies. The output of the procedure is ZBDD $\xi_U$, which is the union of all paths and their summed-up frequencies over the given range of intervals. MergeIntervals() is outlined in Procedure 3 and is illustrated here with an example.
Consider the following expression corresponding to ZBDD in Figure 3 (b):
The function SubSet() in Line 1 co-factors $\xi$ with respect to $b_3$ and returns the subset of combinations/paths containing the variable/node $b_3$. It further performs an UnateProduct with $b_U$ to yield
The preceding operations are carried out without path enumeration, and their time complexity is polynomial in the number of nodes in the ZBDD. Similarly, Line 3 yields
The operations in Lines 4 and 5 produce the following expressions:
Since $\xi_I$ is not an empty set, the operation UpdateCount() on Line 7 increments the frequency count by replacing the nodes $2^i$ with $2^{i+1}$. The outputs of Lines 7, 8, and 9 are as follows:
Since $\xi_I$ is now an empty set, the while loop is exited and the procedure in Lines 3−9 is repeated for other intervals in the list. In this case, there are no more intervals and the combined entity $\xi_U$ is obtained as shown.
The complexity of Procedure 3 is O(k·m) in terms of built-in ZBDD operators, where k is the cardinality of the set of intervals given as input and m is the number of frequency nodes. Notice that the procedure is capable of updating the frequency of potentially an exponential number of paths across multiple intervals without path enumeration.
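The effect of merging intervals can be sketched as a binary addition of the per-interval counts. The code below is an illustration under assumptions, not a transcription of Procedure 3: combinations are hypothetical triples (path, interval, exponent), the merged result drops the interval because everything is tagged to the single merged interval, and the sets are explicit rather than implicit.

```python
def merge_intervals(xi, intervals):
    """Sum each path's counts over `intervals`; returns a set of (path, exponent) pairs."""
    merged = set()
    for b in intervals:
        # Counts of every path in interval b, relabelled to the merged interval.
        carry = {(path, i) for (path, bb, i) in xi if bb == b}
        while carry:
            overlap = merged & carry                 # nodes 2^i present in both operands
            merged = (merged | carry) - overlap      # bitwise sum without carries
            carry = {(path, i + 1) for (path, i) in overlap}  # replace 2^i with 2^(i+1)
    return merged

# Hypothetical input: path P has frequency 3 in b3 (2^0 + 2^1) and 1 in b4 (2^0).
xi = {("P", "b3", 0), ("P", "b3", 1), ("P", "b4", 0), ("Q", "b4", 1)}
merged = merge_intervals(xi, ["b3", "b4"])
```

Here P ends up with the single node $2^2$ (frequency 4), and Q keeps its node $2^1$.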
Using MergeIntervals(), the critical path selection operates only on one delay interval. In particular, all paths that exhibit delays greater than the test clock will be tagged to the node $b_U$, and the combined frequency of each path in all intervals above the test clock will also be associated with the node $b_U$.
PROCEDURE 4: ExtractCriticalPath
The remainder of this section describes procedure ExtractCriticalPath() that selects the path with the highest frequency in $b_U$. Such a path can potentially detect defects in most instances. Procedure 4 lists the process for extracting the highest frequency (most critical) path.
The input to procedure ExtractCriticalPath() is a ZBDD of paths with their associated frequencies in a single interval. The output is a set of one or more paths with the highest critical probability (frequency).
Line 1 initially obtains paths through the highest frequency node $2^m$. This is a non-enumerative operation on the ZBDD that quickly returns a subset of paths whose frequency is at least $2^m$. In Lines 2−5, the SubSet() operation is performed iteratively for all frequency nodes from $2^{m-1}$ down to $2^0$. During each iteration, Line 4 performs an intersection of paths through the current node with the previous set of paths. If this operation yields a non-empty set, then Line 5 updates the path set $\xi_\pi$ to retain only the common set, which corresponds to the highest frequency paths up to the current node.
Let $m$ be the number of frequency nodes, which is usually a very small number (much less than 20 for most circuits). The complexity of Procedure 4 is $O(m)$ in terms of built-in ZBDD operators. Thus, the most critical path among a potentially exponential number of paths is identified without path enumeration.
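The node-by-node narrowing of Procedure 4 can be sketched on an explicit table of frequencies. The function name and the table are hypothetical; only the frequency of B-D-F ($2^1 + 2^5 + 2^6$) is taken from the example that follows, and the other values are made up so that the intermediate sets agree with that example. The sketch starts from all paths and also visits node $2^m$ in the loop, which gives the same result as starting from the paths through $2^m$ whenever that node is in use.

```python
def extract_critical_paths(freq, m):
    """Paths with the highest frequency, found by scanning nodes 2^m down to 2^0."""
    candidates = set(freq)
    for i in range(m, -1, -1):
        # Candidates that pass through frequency node 2^i.
        through_node = {p for p in candidates if (freq[p] >> i) & 1}
        if through_node:  # keep the intersection only when it is non-empty
            candidates = through_node
    return candidates

freq = {"A-B-C": 96, "B-D-F": 98, "G-D-F": 24, "G-E-H": 9}  # mostly assumed values
best = extract_critical_paths(freq, 6)
```

The loop body runs once per frequency node, independent of how many paths are stored.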
This procedure is illustrated with an example here. Consider a set of paths {A-B-C, B-D-F, G-D-F, G-E-H} in the interval above the test clock. Consider the following expression, which shows a certain frequency count associated with each path
The approach starts from node $2^6$, which yields paths {A-B-C, B-D-F}. This set of paths on intersection with the paths through node $2^5$ would again yield {A-B-C, B-D-F}. For nodes $2^4$ and $2^3$, the intersection would yield ∅ (NULL), and the path set is unchanged. For node $2^1$, the path set will be updated to {B-D-F}. Again, intersection with paths for $2^0$ returns ∅. Thus, the path with the highest critical probability is {B-D-F}, and its frequency is $2^1 + 2^5 + 2^6 = 98$.
Subsequent paths are selected by considering path correlations. They should cover segments or process space not in the previously selected paths. All nodes in $\xi_U$ are assigned a weight initialized to 1. During the path selection process, the weight on nodes that physically participate or have strong spatial correlation to other nodes along previously selected paths is reduced by 1. After the weights are recomputed on each node, the problem of new critical path selection reduces to the problem of selecting the longest path in a Directed Acyclic Graph (DAG).
In order to select paths with high critical probability, the longest path search is limited to paths above a user-determined frequency threshold $f_T$. The procedure to quickly determine the set of paths with critical probability above $f_T$ is listed in Procedure 5.
The inputs to Procedure 5 are the augmented ZBDD $\xi_U$ and the integer quantity $f_T$. The output is the set of all paths in $\xi_U$ whose frequency is greater than $f_T$. The following illustrates the procedure with an example. Let $m$ correspond to the highest frequency node present in $\xi_U$. In our example, this is node $2^6$. Let $f_T = 10$ and $m = 6$. The 6-bit binary representation of 10 is 001010, which is represented as $2^3 + 2^1$. The function GetMSB(10), in Line 1, identifies the highest 2's exponent for number 10, which is 3. Every path that contains a frequency node $2^i$ with $i \geq 3$ has a frequency of at least $2^3$. In Lines 2−5, the procedure gathers all such paths and stores them in $\xi_{f_T}$.
In Lines 6−11, the procedure prunes out unsought paths with frequencies 8 and 9, which contain the frequency node $2^3$.
PROCEDURE 5: ExtractFreqRange
The complexity of procedure ExtractFreqRange() is O(m) in terms of built-in ZBDD operators. Its complexity is independent of the number of paths.
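The two stages described for Procedure 5 (gather, then prune) can be sketched on an explicit frequency table. This is a stand-in under assumptions: the function name and the table are hypothetical, and the pruning stage uses a direct integer comparison where the procedure works on the frequency nodes of the ZBDD.

```python
def extract_freq_range(freq, f_t):
    """Paths whose frequency is greater than f_t."""
    msb = f_t.bit_length() - 1  # GetMSB: 3 for f_t = 10
    # Stage 1: paths holding some frequency node 2^i with i >= msb,
    # that is, paths with a frequency of at least 2^msb.
    gathered = {p for p, f in freq.items() if f >> max(msb, 0)}
    # Stage 2: prune gathered paths that still do not exceed f_t.
    return {p for p in gathered if freq[p] > f_t}

freq = {"p3": 3, "p8": 8, "p9": 9, "p10": 10, "p11": 11, "p98": 98}  # hypothetical
above = extract_freq_range(freq, 10)
```

With $f_T = 10$, stage 1 keeps every path of frequency 8 or more, and stage 2 removes those that do not exceed 10.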
The approach just presented follows from the understanding that paths that are highly correlated share most of the same segments or contain segments that have strong spatial correlation and, therefore, exhibit similar delay behavior. In other words, highly correlated paths have overlapping process space and tend to detect most of the same defective instances.
One may also insist that every new path should have at least T segments not in already selected paths, and f T can be slowly relaxed until the desired path is selected. Identifying the optimal value of T can be challenging because it directly impacts the correlation among the selected paths. Typically, T is set to a fraction of the average number of segments, denoted by α, along the critical paths in ξ U . Quantity α is
where V is the setof all nodes in ξ U and λ v is the number of paths through v. Quantity λ v , for each v, can be obtained in one forward and one backward traversal of ξ U . The proposed approach for correlation-aware path selection is listed in Procedure 6. The inputs to procedure SelectPaths() are the augmented ZBDD ξ (Section 3), the number of paths to select (K), the set (D τ ) of delay intervals above a given τ, and the initial lower bound on the frequency range f T . The output of SelectPaths() is a path set containing K paths.
Line 1, in SelectPaths(), calls MergeIntervals() (see Procedure 3). MergeIntervals(ξ, D τ ) returns a combined ZBDD of paths ξ U and their associated frequencies over the given set of intervals D τ . In Line 2, AvgPathLength(ξ U ) returns the average number of gates/nodes along paths in ξ U , which is calculated using Equation (4). Line 3 calculates T . w is a vector which contains the weight associated with each node in the ZBDD ξ U . InitNodeWeights(), in Line 4, initializes all elements in w to 1. In Line 5, ExtractCriticalPath() (see Procedure 4) returns the path with the highest frequency. In Line 6, the path p, from Line 5, is added to . UpdateNodeWeights(), in Line 7, reduces the weight corresponding to nodes that physically participate or have strong spatial correlation to nodes along the path p. Eliminate(p) in Line 8 removes the path from ξ U .
Lines 10−20, in SelectPaths(), select the subsequent K − 1 paths. Line 11 calls ExtractFreqRange(ξ U, f T) (see Procedure 5), which returns a ZBDD of paths whose frequencies are greater than f T. In Line 12, LongestPath() is a linear time procedure that extracts the longest path in the ZBDD based on the weights in w. In Line 13, Length(p) determines the length l of p as the sum of the weights on the nodes along p. If l is greater than T (Line 14), then p is added to the selected path set; else, f T is relaxed and the procedure is repeated until a total of K paths are selected.
PROCEDURE 6: SelectPaths
Input: ξ, K, D τ, f T
Output: selected path set
 1  ξ U ← MergeIntervals(ξ, D τ)
 2  α ← AvgPathLength(ξ U)
 3  T ← 0.5 × α
 4  w ← InitNodeWeights()
 5  p ← ExtractCriticalPath(ξ U)
 6  selected path set ← {p}
 7  w ← UpdateNodeWeights(p)
 8  ξ U ← Eliminate(p)
 9  K ← K − 1
10  while K ≠ 0 do
11      ξ f T ← ExtractFreqRange(ξ U, f T)
12      p ← LongestPath(ξ f T, w)
13      l ← Length(p)
14      if l > T then
15          selected path set ← selected path set ∪ {p}
16          w ← UpdateNodeWeights(p)
17          ξ U ← Eliminate(p)
18          K ← K − 1
19      else
20          f T ← Relax()
    return selected path set
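The control flow of Procedure 6 can be illustrated with a small sketch that works on an explicit list of candidate paths rather than on a ZBDD, so it is enumerative and not the proposed implementation. Several choices here are assumptions made only for illustration: covered segments have their weight set to 0 (the text says only that weights are reduced, and spatial correlation is ignored), Relax() subtracts a fixed step, and the loop stops once f T reaches 0 so that it always terminates.

```python
def select_paths(freq, k, f_t, relax_step=100, t_fraction=0.5):
    """Greedy correlation-aware selection over explicit candidate paths.

    freq maps each candidate path (a tuple of segment names) to its
    critical frequency. All names are illustrative, not from the paper.
    """
    remaining = dict(freq)
    if not remaining or k <= 0:
        return []

    # Lines 2-4: average length, threshold T, unit weights.
    alpha = sum(len(p) for p in remaining) / len(remaining)
    threshold = t_fraction * alpha
    weight = {seg: 1.0 for p in remaining for seg in p}
    selected = []

    def take(path):
        selected.append(path)
        for seg in path:
            weight[seg] = 0.0  # assumption: covered segments stop counting
        del remaining[path]

    # Lines 5-8: start with the most frequently critical path.
    take(max(remaining, key=remaining.get))

    # Lines 10-20: add paths that bring enough uncovered segments.
    while len(selected) < k and remaining:
        eligible = [p for p in remaining if remaining[p] > f_t]
        best = max(eligible, key=lambda p: sum(weight[s] for s in p),
                   default=None)
        if best is not None and sum(weight[s] for s in best) > threshold:
            take(best)
        elif f_t <= 0:
            break  # added guard: nothing left to relax
        else:
            f_t = max(0, f_t - relax_step)
    return selected

freq = {
    ("a", "b", "c", "d"): 900,
    ("a", "b", "c", "e"): 850,
    ("f", "g", "h", "i"): 400,
    ("a", "g", "h", "d"): 300,
}
two = select_paths(freq, 2, 500)
three = select_paths(freq, 3, 500)
```

In this toy input the second most frequent path shares three of its four segments with the first, so it is skipped, and f T is relaxed until the disjoint path becomes eligible. Asking for three paths still returns two, because no remaining candidate has more than T uncovered segments.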
Previous efforts toward correlation-aware path selection (Wang et al. [2004] and He et al. [2013]) have relied on a small set of candidate paths because their approach is path enumerative. These approaches are forced to select a small subset of candidate critical paths using heuristic pruning methods because they lack the infrastructure to maintain the delays and critical probabilities for an exponential number of paths.
In fact, the approaches in Wang et al. [2004] and He et al. [2013] use Monte Carlo sampling and select only a restricted number of top delay paths from each sample. They combine the few paths selected from each sample and form their candidate set. These approaches do not explore all potential candidates, and, as a result, the quality of the set of selected paths may suffer.
The following describes the metric used to demonstrate the effectiveness of the proposed path selection approach. The quality of a selected path set, denoted by Q, is evaluated based on the number of failing circuit instances that the path set detects among 10,000 random circuit instances. A circuit instance is failing if the circuit delay exceeds the given τ. A selected path p detects a failing instance if its delay exceeds τ in that instance. Let N denote the set of all failing instances and N(p) denote the set of failing instances detected by path p. The quality of a path set is calculated as follows:
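The definitions above can be turned into a short computation. The sketch below assumes that the quality factor is the fraction of failing instances detected by at least one selected path, that is, the size of the union of the sets N(p) divided by the size of N. That reading, the function name, and the data layout are assumptions made for illustration.

```python
def quality(selected_paths, path_delays, circuit_delays, tau):
    """Fraction of failing instances detected by at least one selected path.

    path_delays[p][i] is the delay of path p in instance i, and
    circuit_delays[i] is the circuit delay in instance i.
    """
    # N: instances whose circuit delay exceeds tau.
    failing = {i for i, d in enumerate(circuit_delays) if d > tau}
    if not failing:
        return 0.0  # convention chosen here: nothing to detect

    # Union of N(p) over the selected paths.
    detected = set()
    for p in selected_paths:
        detected |= {i for i in failing if path_delays[p][i] > tau}
    return len(detected) / len(failing)

circuit_delays = [10, 12, 9, 13]
path_delays = {
    "p1": [10, 8, 9, 8],
    "p2": [7, 12, 9, 9],
    "p3": [7, 7, 7, 13],
}
q_all = quality(["p1", "p2", "p3"], path_delays, circuit_delays, tau=9.5)
q_one = quality(["p1"], path_delays, circuit_delays, tau=9.5)
```

Taking a union rather than a sum means an instance detected by several correlated paths is counted once, which is why a set of highly correlated paths scores poorly under this metric.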
EXPERIMENTAL RESULTS
The experimental results highlight the scalability of the proposed approach in terms of generating the path delay distributions for all paths in the circuit and, subsequently, selecting a set of critical paths for testing from among a potentially exponential number of candidate paths. Experimental evaluation was performed on circuits from the ISCAS'85, ISCAS'89, and ITC'99 benchmarks. Experiments were conducted on a Linux machine with an Intel Xeon processor and 24 GB of memory. The presented procedures were implemented in the C++ programming language. The nominal delay for each gate was obtained from a 45 nm technology library. To account for spatial correlations, the circuit was partitioned into several grids, and gates were randomly assigned to each grid. We generated 10,000 Monte Carlo instances by assuming 5% variation in oxide thickness T ox and 10% variation in gate length L. Gates assigned to the same grid received identical process shifts. Table I lists the time for generating the delay distributions of all paths with 1 ps delay precision. Figure 7 illustrates the tradeoff between the time complexity of the approach and the granularity of the path delay distributions. As shown in Figure 7, the complexity of the proposed approach increases almost linearly with increasing precision (decreasing granularity) of the desired path delay distributions. Observe that the time penalty for the proposed approach is very reasonable considering the number of paths in the corresponding circuits (see Column 3 of Table I).
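The grid-based instance generation described above can be sketched as follows. The text does not give the delay model, so this sketch makes several assumptions: the 5% and 10% figures are treated as standard deviations of zero-mean Gaussian relative shifts, the two shifts add linearly, and a gate delay scales as the nominal delay times one plus the combined shift. The function and parameter names are hypothetical.

```python
import random

def sample_instance(nominal, grid_of, rng, sigma_tox=0.05, sigma_len=0.10):
    """Draw one Monte Carlo instance of gate delays.

    nominal maps gate -> nominal delay, grid_of maps gate -> grid id.
    Every gate in the same grid receives the same process shift.
    """
    shift = {}
    for grid in sorted(set(grid_of.values())):
        # Assumed model: independent relative shifts for T_ox and L.
        shift[grid] = rng.gauss(0.0, sigma_tox) + rng.gauss(0.0, sigma_len)
    return {gate: delay * (1.0 + shift[grid_of[gate]])
            for gate, delay in nominal.items()}

nominal = {"g1": 10.0, "g2": 20.0, "g3": 10.0}
grid_of = {"g1": 0, "g2": 0, "g3": 1}
rng = random.Random(1)  # fixed seed so the run is repeatable
instances = [sample_instance(nominal, grid_of, rng) for _ in range(1000)]
```

Because g1 and g2 share a grid, their delays keep the 1:2 nominal ratio in every instance, while g3 varies independently of them. The delay of a path in an instance is then the sum of the sampled delays of the gates along it.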
The test clock (threshold τ) was set to 90% of the worst-case circuit delay. The worst-case circuit delay was calculated using static timing analysis. The quantity f T, which determines the lower bound on frequency, was relaxed by 100 units if a suitable path was not found in the current iteration.
The proposed approach for path selection (see Procedure 6) was compared against a Monte Carlo-based explicit path selection approach using 10,000 instances, as in Wang et al. [2004]. We call this method MC10000, and it operates as follows. First, a candidate set U is generated by collecting all paths that exhibit delays above the given τ. Paths in U are ordered based on their probability of exceeding τ. A path p from U is selected, starting with the path that has the highest probability, and is tested over all instances. Paths are then ranked based on the number of samples detected by each path. Circuit samples detected by path p are deleted before considering the next path. Note that if critical path selection is the only objective and the test clock τ is predetermined, then it is not necessary to generate the delay distribution of paths over all intervals. The authors in Padmanaban and Tragoudas [2005] provide an efficient non-enumerative approach to quickly identify all paths that exceed a given τ in a given Monte Carlo sample. The approach in Padmanaban and Tragoudas [2005] is used along with UpdateDelayDists() (see Procedure 2) to generate the critical frequencies (critical probabilities) of the paths. In this way, we consider a single interval above τ, and, in this context, the procedure MergeIntervals() (see Procedure 3) is not required. The critical path selection algorithm has been implemented using this approach.
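One possible reading of the MC10000 steps above is a greedy pass over explicitly enumerated paths. The sketch below is an interpretation, not the implementation of Wang et al. [2004]: it estimates the probability of exceeding τ by the fraction of instances in which a path exceeds τ, and it skips a path that detects no remaining sample. Both choices, and all names, are assumptions.

```python
def mc_baseline_select(path_delays, tau, k):
    """Enumerative baseline: pick up to k paths in order of criticality.

    path_delays[p] is a list holding the delay of path p in each instance.
    """
    # Instances in which each path exceeds tau.
    detects = {p: {i for i, d in enumerate(delays) if d > tau}
               for p, delays in path_delays.items()}

    # Candidate set U, ordered by estimated probability of exceeding tau.
    candidates = [p for p in detects if detects[p]]
    candidates.sort(key=lambda p: len(detects[p]), reverse=True)

    undetected = set().union(*detects.values())
    selected = []
    for p in candidates:
        if len(selected) == k:
            break
        newly = detects[p] & undetected
        if newly:  # assumption: a path that adds nothing is skipped
            selected.append(p)
            undetected -= newly  # delete the samples this path detects
    return selected

path_delays = {
    "p1": [10, 11, 8, 8],
    "p2": [10, 11, 8, 7],
    "p3": [7, 7, 12, 8],
    "p4": [1, 1, 1, 1],
}
chosen = mc_baseline_select(path_delays, tau=9, k=2)
```

Every candidate path is stored and tested explicitly, which is the cost that grows prohibitive as the number of critical paths increases.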
Our experimental analysis for the proposed critical path selection approach (Procedure 6) is presented in the following three tables. Table II presents the quality of a set of 10 paths selected under different values of quantity T. Table III and Table IV compare path selection using the proposed approach in Procedure 6 against path selection using the path-enumerative Monte Carlo-based approach MC10000. In Table II, Column 1 lists the circuit name. Column 2 lists the number of grid partitions per circuit used to account for spatial correlations. Column 3 lists the total number of paths in the circuit. Column 4 lists the number of paths that exhibited delays greater than τ (i.e., at least 90% of the worst-case circuit delay). Columns 5−7 list the quality factor, given by Equation (5), under different T (described in Section 4). We choose three different values for T and list the quality factor of the selected path set under each. One can observe that the approach performs best when T = 0.5 × α. A lower value of T, as in Column 5, requires only a few uncovered segments along every new path. The paths selected in Column 5 were highly correlated, with fewer additional instances detected by each new path. A higher value of T, as in Column 7, demands a larger number of uncovered segments along every new path. In most cases, the approach was unable to find paths that satisfied this criterion, and, therefore, the final tally of paths, after several reductions of f T, was still under 10. In fact, the approach returned 10 paths only for C7552.
In Table III, Column 1 lists the circuit name. Column 2 lists the number of paths that exhibited delays greater than τ (i.e., at least 90% of the worst-case circuit delay). Columns 3 and 4 list the quality factor, with T = 0.5 × α, and the time for the proposed approach to select a set of 10 testable paths from the set of all critical paths in the circuit. Note that the time reported here is the cumulative time taken for generating the critical frequencies of all potentially critical paths considering 10,000 samples and for the test generation for each selected path. The testability of a path is determined after the path is selected using the methods of Padmanaban and Tragoudas [2005] and Kim et al. [2000]. The path selection is carried out over several iterations. During each iteration, one path is selected using the proposed approach, and untestable paths are discarded. This procedure of determining the testability of a path applies to MC10000 as well. The quality factor and time for MC10000 are listed in Columns 5 and 6, respectively. For the ISCAS circuits, the quality of paths selected by the proposed approach is the same as that of MC10000, and the proposed approach achieves an average speedup of 100× when compared to MC10000. One can observe that as the number of candidate paths increases, as is the case for the ITC'99 benchmarks, the approach in MC10000 becomes prohibitive.
To provide a more thorough comparative analysis of the quality of the path set selected by the proposed approach, we reduce the set of candidate paths by considering only the testable critical paths as in Padmanaban and Tragoudas [2005]. Results from this analysis are listed in Table IV.
In Table IV, Column 1 lists the circuit name. Column 2 lists the number of testable critical paths. Columns 3 and 4 list the quality factor, with T = 0.5 × α, and the time for the proposed approach to select a set of 10 paths. Note that the time reported here is the cumulative time taken for generating the critical frequencies of all potentially critical paths considering 10,000 samples and for the subsequent path selection process. Columns 5 and 6 list the quality factor and time for MC10000. When compared to the enumerative Monte Carlo approach for path selection (Columns 5 and 6), the proposed approach achieves the same quality factor and is on average 100× faster than MC10000.
CONCLUSION
In this article, a non-enumerative approach to identify the delay distribution of every path in the circuit was presented. Furthermore, a novel scalable approach toward path selection for effective delay testing was presented. The proposed approach explores an exponential number of candidate paths, but its time complexity is linear in the number of selected paths. The experimental analysis demonstrated that the quality of the path set selected by the proposed approach is similar to that of path-enumerative methods.
