We study the diagnosis of segments on speedpaths that fail their timing constraint at the post-silicon stage due to manufacturing variations. We propose a formal procedure that is applied after the failing speedpaths are isolated and that incorporates post-silicon path-delay measurements for more accurate analysis. Our goal is to identify segments of the failing speedpaths whose post-silicon delay is larger than the delay estimated at the pre-silicon stage. We refer to such segments as "failing segments" and rank them according to their degree of failure. Diagnosis of failing segments alleviates the lack of observability inside a path. Moreover, root-cause analysis and post-silicon tuning or repair can be done more effectively by focusing on the failing segments. We propose an Integer Linear Programming (ILP) formulation to break down a path into a set of non-failing segments, leaving the remaining edges as likely-failing ones. Our algorithm yields a very high "diagnosis resolution" in identifying failing segments and in ranking them.
INTRODUCTION
Post-silicon validation involves operating prototype chips to verify their correct behavior. In the past, incorrect behavior was mainly due to logical errors impacting the functional behavior of the chip. However, electrical failures have now become a major hurdle. Electrical failures can be due to a combination of deep submicron effects such as power droop and crosstalk. In addition, manufacturing variations can be classified as a cause of electrical failures.
* This research is supported by National Science Foundation under awards CCF-0811082 and CCF-0811467.
Figure 1: Overview of Our Approach. Previous works: isolate the failing speedpaths; post-silicon measurement on representative paths. Our contribution: cover the failing speedpaths with non-overlapping segments such that the "failing" segments are diagnosed as accurately as possible.
Among different types of electrical failures, timing failure is perhaps the most important over many application sectors. This type of failure occurs when a chip does not operate at a desired frequency due to a combination of various electrical problems such as power droop, crosstalk noise, or a "slower" manufacturing imprint.
Recent research has focused on isolating failures. In [8], a technique is proposed to dynamically isolate failures (including timing failures) at the micro-architecture level, in which the isolation is at the block level. In [1], a statistical learning approach is proposed to predict the failing speedpaths by measuring the delays of a small set of representative paths at the post-silicon stage. To help identify such representative paths, a technique is proposed in [3] which relies on defining a set of basic features (e.g., the number or types of logic gates) to rank the target speedpaths. In [5], a manually guided technique is used to isolate failing speedpaths via a combination of techniques including clock shrinkage and the use of a debug tester. In [7], a branch-and-bound technique for isolation of speedpaths is proposed which is based on parameterized static timing analysis.
The focus of the above research has been on isolating the failing speedpaths. However, the major challenge after identifying a failing speedpath is to understand the cause of failure in order to effectively predict the remaining ones, or apply repair or tuning if such infrastructure is in place. This task is particularly challenging due to lack of observability, and inability to analyze different segments of a speedpath.
In this paper, we propose a formal procedure to identify segments of failing speedpaths. Our objective is to form segments with post-silicon delays larger than their pre-silicon ones, and then rank them according to the degree of their delay deviations. We refer to such segments as "failing segments". Identifying failing segments helps towards addressing the above challenges to drive a more focused analysis (e.g., the identified segments can help find layout patterns that are timing-sensitive).
As shown in Fig. 1 , we assume failing speedpaths are first isolated, and timing measurements are made on representative paths at the post-silicon stage (for example using [5] or [9] ) to incorporate actual delays to enhance the prediction.
More specifically, our contributions are listed below:
1. We propose a novel ILP formulation which breaks down a failing speedpath into provably non-failing segments, leaving the remaining edges on the path as potentially-failing candidates. We show a very high "diagnosis resolution" in identifying the failing edges.
2. We provide a ranking of the failing edges based on the degree of delay deviations from the pre-silicon models, and show this ranking to be highly accurate.
In this work we assume the cause of a timing failure is manufacturing variations; i.e., dependency on the specific instance of parameter variations in a manufactured chip. We also inject random noise representing unknown silicon behavior in our analysis. While dynamic factors such as crosstalk or power-grid noise are also important causes of failure, they are not considered here. We believe manufacturing variations are still an important cause of some of the failures, and we consider them as a first step.
PRELIMINARIES
Given a timing-graph, let us assume the nodes represent logic gates and directed edges represent their interconnections. We assume each edge has a delay which captures the delay of its corresponding interconnect and the logic gate from where it initiates. We also assume primary input and output nodes have zero delay. We define a segment to be a sequence of connected edges on this graph.
In the presence of manufacturing variations, we assume X is a vector of the varying parameters such as channel lengths or threshold voltages of devices. We describe the delay of segment s using the following linear expression:
$$d_s = \mu_s + a_s^T X \qquad (1)$$
where $\mu_s$ is the average delay and $a_s$ is the sensitivity vector with respect to parameter variations X. In the special case when a segment is a single edge, we get a similar linear expression. Also, the delay of a path in the timing-graph can be written as a summation of the delay expressions of its edges. This linear modeling is done using a standard procedure as discussed in [2] and is shown to be very accurate. Please note that in this work we do not model the circuit delay, which requires statistical maximum operations on arrival times under variability. We only need the above linear modeling of segments or paths, which has been shown to be much less prone to error than circuit delay.
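The additive structure of this linear model can be illustrated with a short sketch. The function names and the numeric values below are hypothetical and only serve to show that the mean and sensitivity vector of a path (or segment) are the sums of those of its edges.

```python
import numpy as np

def path_delay_model(edge_means, edge_sens):
    """Sum per-edge linear models d_e = mu_e + a_e^T X into one path model.

    edge_means: list of mean edge delays (illustrative values).
    edge_sens:  list of per-edge sensitivity vectors, one entry per parameter.
    Returns (mu_path, a_path).
    """
    mu = float(np.sum(edge_means))
    a = np.sum(np.asarray(edge_sens, dtype=float), axis=0)
    return mu, a

def evaluate_delay(mu, a, x):
    """Evaluate the linear delay model at one instance x of the parameters."""
    return mu + float(np.dot(a, x))

# Three edges, two varying parameters (made-up numbers).
edge_means = [10.0, 12.0, 8.0]
edge_sens = [[1.0, 0.5], [0.2, 0.0], [0.0, 0.3]]
mu_p, a_p = path_delay_model(edge_means, edge_sens)
```

Evaluating `evaluate_delay(mu_p, a_p, x)` for a given `x` gives the same result as summing the three edge delays evaluated at that `x`.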
At the post-silicon stage, the instance of parameter variations (i.e., X) is unknown. For a fabricated chip, the degree of variation might be such that the delays of some of the paths exceed their timing requirements. We assume such paths are isolated and their delays are measured as discussed in [1], [3], [5], [7], [9]. Moreover, we assume the delays of additional speedpaths are also measured. In this paper, we refer to these measured paths as representative paths.
Let us assume n representative paths P = {p1, ..., pn} are measured at the post-silicon stage. The actual delays of these paths are denoted by the vector $d_a$, while their variation-aware delays are given by
$$d_P = \mu_P + A_P^T X \qquad (2)$$
where $\mu_P$ is the vector of expected delays of the paths in P, and $A_P$ is the matrix representing the sensitivities of each path with respect to the varying parameters. Using these actual measurements $d_a$, we identify segments on the failing speedpaths. For a segment s we define
$$\bar{d}_s = d_s \,|\, (d_P = d_a) \qquad (3)$$
In other words, $\bar{d}_s$ is a random variable expressing the delay of s given the measurements on the n representative paths.
If we assume the varying parameters in X have a Normal distribution, then $\bar{d}_s \sim N(\bar{\mu}_s, \bar{\sigma}_s)$ also follows a Normal distribution [6]. The mean and standard deviation can be computed using the following equation as in [6]:
$$\bar{\mu}_s = \mu_s + a_s^T A_P \Sigma_P^{-1} (d_a - \mu_P), \qquad \bar{\sigma}_s^2 = \sigma_s^2 - a_s^T A_P \Sigma_P^{-1} A_P^T a_s \qquad (4)$$
where $\Sigma_P = A_P^T A_P$ and $\sigma_s$ is the standard deviation of $d_s$ at the pre-silicon stage. All the parameters in the above equation are known, so the mean and variance of $\bar{d}_s$ can be calculated as soon as the representative path delay measurements are made.
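As a minimal sketch of this conditioning step, the function below applies the standard formula for conditioning one jointly Gaussian variable on others. It assumes that X has zero mean and identity covariance, that the sensitivity matrix is stored with one column per representative path (so that the path covariance is A_P^T A_P), and that this covariance is invertible. The function and argument names are hypothetical.

```python
import numpy as np

def conditional_segment_delay(mu_s, a_s, mu_P, A_P, d_a):
    """Mean and standard deviation of a segment delay given measured path delays.

    Assumptions (not taken from the paper): X ~ N(0, I); A_P has shape
    (num_params, num_paths); the path covariance matrix is invertible.
    """
    a_s = np.asarray(a_s, dtype=float)
    A_P = np.asarray(A_P, dtype=float)
    mu_P = np.asarray(mu_P, dtype=float)
    d_a = np.asarray(d_a, dtype=float)

    sigma_P = A_P.T @ A_P              # covariance of the path delays
    cov_sP = A_P.T @ a_s               # covariance between segment and each path
    w = np.linalg.solve(sigma_P, cov_sP)

    mu_bar = mu_s + float(w @ (d_a - mu_P))
    var_bar = float(a_s @ a_s - cov_sP @ w)
    sigma_bar = float(np.sqrt(max(var_bar, 0.0)))  # guard against round-off
    return mu_bar, sigma_bar
```

When the segment coincides with a measured path, the conditional standard deviation collapses to zero and the conditional mean equals the pre-silicon mean shifted by the measured deviation of that path, which is a quick sanity check on the formula.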
PROBLEM DEFINITION
Let us denote an instance of parameter variations for a fabricated chip by X = K (K is a constant vector). The actual delay of a segment s is now given by the expression $d_s|_{(X=K)}$, where the segment delay $d_s$ is evaluated at X = K. Now, we define failing and non-failing segments s on a path, which will be used in segment identification.
Definition: A segment s is "failing" if $d_s|_{(X=K)} > \mu_s$.
In other words, the actual delay of s is larger than the expected delay given by the pre-silicon model. A segment that is not failing is referred to as "non-failing".
The above definition of a failing segment is intuitive. As we discuss later, our objective is to accurately identify the failing segments on the failing speedpaths without the knowledge of instance of parameter variations and only relying on post-silicon path-delay measurements.
The following lemma helps us identify some of these non-failing segments of a given path without requiring the knowledge of the instance of parameter variations.
Lemma 1. A sufficient condition for a non-failing segment s is $\bar{\mu}_s + 3\bar{\sigma}_s \le \mu_s$.
Proof. Let us denote the left-hand side of the inequality by $\bar{d}_s^{(wc)}$. It is the worst-case delay of segment s, given the path delay measurements at the post-silicon stage. The right-hand side is the expected delay of s based on the pre-silicon model in Eq. (1). If $\bar{d}_s^{(wc)} \le \mu_s$ holds, then s is a non-failing segment based on our given definition. Similarly, we can conclude s is failing if $\bar{\mu}_s - 3\bar{\sigma}_s > \mu_s$.
The above lemma provides a sufficient condition to identify some of the non-failing segments. However, it is not a necessary condition for a segment to be non-failing; i.e., a segment s can be non-failing when $\bar{d}_s^{(wc)} > \mu_s$ holds. In this case, the conditional variance of $d_s$, $\bar{\sigma}_s^2$, is too large and the actual measurements cannot provide enough information to interpret $\bar{d}_s$. Note that as n, the number of post-silicon measurements, increases, $\bar{\sigma}_s^2$ decreases, which allows a higher number of segments to be identified as failing/non-failing.
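The three-sigma test of the lemma can be sketched as a small classifier. The function name and the label strings are hypothetical; the inputs are the pre-silicon mean of a segment and its conditional mean and standard deviation given the measurements.

```python
def classify_segment(mu_s, mu_bar, sigma_bar):
    """Classify a segment with the three-sigma conditions of Lemma 1.

    mu_s:      pre-silicon expected delay of the segment.
    mu_bar:    conditional mean given the path measurements.
    sigma_bar: conditional standard deviation given the path measurements.
    """
    if mu_bar + 3.0 * sigma_bar <= mu_s:
        return "definitely non-failing"
    if mu_bar - 3.0 * sigma_bar > mu_s:
        return "definitely failing"
    # The measurements are not informative enough to decide either way.
    return "undetermined"
```

Both the "definitely failing" and the "undetermined" outcomes violate the sufficient condition of the lemma, so both would be treated as potentially failing.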
In the absence of knowledge of instance of parameter variations, we categorize a segment into one of the two cases:
• Segment s is "definitely non-failing" ⇔ $\bar{\mu}_s + 3\bar{\sigma}_s \le \mu_s$
• Segment s is "potentially failing" ⇔ $\bar{\mu}_s + 3\bar{\sigma}_s > \mu_s$
For an instance of parameter variations, we identify a set of failing speedpaths which we denote by $P_f$. Let us assume these paths are covered by a set S of non-overlapping segments. We divide S into two subsets consisting of definitely non-failing and potentially failing segments and denote them by $S_{dnf}$ and $S_{pf}$, respectively. For the corresponding instance of variations, we denote the segments that are actually failing by $S_f$. Fig. 2 shows the relationship between these sets, where the dashed area corresponds to $S_{df}$, the definitely-failing segments. We further denote by L the number of edges covered by $P_f$. The numbers of edges covered by $S_{dnf}$, $S_{pf}$, and $S_f$ are denoted by $L_{dnf}$, $L_{pf}$, and $L_f$, respectively.
For a given set of non-overlapping segments covering the failing paths $P_f$, we define diagnostic resolution, DR, as
$$DR = \frac{L_f}{L_{pf}} \qquad (5)$$
A high DR indicates that the set of potentially failing edges in $S_{pf}$ is close to the actually failing ones in $S_f$. It implies that the formation of segments $S_{pf}$ for the failing speedpaths is a good one. Note that this definition is in terms of the number of edges, not the number of segments.
Problem Definition: Given post-silicon delay measurements of n representative paths denoted by P, and m failing paths $P_f$ which have been isolated, our objective is to divide $P_f$ into a set of non-overlapping segments and form $S_{pf}$ such that the diagnostic resolution given in Eq. (5) is maximized.
Note that our focus is on the segment identification problem, assuming the failing speedpaths are provided. We assume isolation of failing speedpaths is done using existing works such as [1], [3], [5], [7], while post-silicon path delay measurement is done using existing techniques such as [9].
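The edge-based nature of the metric can be made concrete with a short sketch. It takes DR as the ratio of the number of actually failing edges to the number of potentially failing edges, which matches the later description of the numerator and denominator of DR. The function name and the edge labels are hypothetical.

```python
def diagnostic_resolution(potentially_failing_edges, actually_failing_edges):
    """Ratio of actually failing edges to potentially failing edges.

    Both arguments are collections of edge identifiers. The ratio is in
    (0, 1] whenever the actually failing edges are contained in the
    potentially failing ones.
    """
    pf = set(potentially_failing_edges)
    f = set(actually_failing_edges)
    if not pf:
        raise ValueError("no potentially failing edges: DR is undefined")
    return len(f) / len(pf)

# Four edges flagged as potentially failing, three of them actually fail.
dr = diagnostic_resolution({"e3", "e5", "e6", "e8"}, {"e3", "e5", "e8"})
```

In this made-up case the resolution is 0.75; merging `e6` into a definitely non-failing segment would raise it to 1.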
SEGMENT IDENTIFICATION: ILP
In this Section, we first introduce a mathematical formulation of the segment identification problem, and then discuss how it is linearized into an ILP formulation.
Mathematical Formulation
Here we explain our formulation for the simple case when we solve the segment identification problem on one failing speedpath, given post-silicon delay measurements on n representative paths. We can then iteratively apply this formulation on each of the m failing speedpaths in P f .
The objective of our segment identification problem is to maximize DR, given in Eq. (5). Since $L_f$, the numerator of DR, is unknown for an instance of parameter variations, we instead minimize the denominator $L_{pf}$, which is the number of edges in the potentially-failing segments. Moreover, since $L_{pf} + L_{dnf} = L$ holds, minimizing $L_{pf}$ is equivalent to maximizing $L_{dnf}$, which is the number of edges covered by the definitely non-failing segments.
Therefore, in our approach to form S, we focus on dividing $P_f$ into a set $S_{dnf}$ of definitely non-failing segments. This leaves the remaining edges as potentially-failing segments (of length 1). For example, Fig. 3(a) shows a valid solution for a path which is composed of two definitely non-failing segments, while the remaining edges $e_3$ and $e_8$ are potentially-failing segments of length 1.
To achieve a correct mathematical formulation of the problem, we need to describe a definitely non-failing segment as a continuous connection of edges, and ensure that distinct definitely non-failing segments are apart by at least one edge.
To mathematically describe the above problem, we start by describing our notation with an example.
Notations: Consider one speedpath with L edges $e_1, \dots, e_L$, where all edges are labeled in increasing order from the path input to its output. For the definitely non-failing segment set $S_{dnf}$, we denote its j-th segment by $s_j^{dnf}$, and also assume that these segments are indexed in increasing order from the path input towards its output.
Since we require that the definitely non-failing segments be apart from each other by at least one edge, an upper bound on the number of such segments is $N_m = \lceil L/2 \rceil$. The remaining edges between these segments belong to $S_{pf}$. We further define binary variables $x_{ij}$ for $i = 1, 2, \dots, L$ and $j = 1, 2, \dots, N_m$. If edge $e_i$ belongs to the j-th definitely non-failing segment $s_j^{dnf}$, we set $x_{ij}$ to 1. Otherwise, $x_{ij}$ is equal to 0, and $e_i$ belongs to the set $S_{pf}$ of potentially-failing segments.
Since the number of definitely non-failing segments is unknown and we only have its upper bound, we define binary variables $\Delta_j$ to denote whether $s_j^{dnf}$ exists for $j = 1, 2, \dots, N_m$. If $\Delta_j$ is set to 1, it indicates that $s_j^{dnf}$ has been formed. Otherwise, there is no need to form the j-th definitely non-failing segment.
We further define integer variables $l_j$ and $u_j$ for $j = 1, \dots, N_m$. If $s_j^{dnf}$ is not formed ($\Delta_j = 0$), we force $l_j = u_j = L$. Otherwise, $s_j^{dnf}$ has been formed, and $u_j$ is the index of the last edge covered by $s_j^{dnf}$, while $l_j$ is the index of the edge immediately before the first edge covered by $s_j^{dnf}$. Fig. 3(b) illustrates the definitions of $l_j$ and $u_j$.
Example: Consider Fig. 3(a). Here, the speedpath includes L = 8 edges. The maximum number of definitely non-failing segments $N_m$ is equal to L/2 = 4. For the particular solution shown in the figure, we have two definitely non-failing segments $s_1^{dnf}$ and $s_2^{dnf}$. The remaining edges, i.e., $e_3$ and $e_8$, belong to $S_{pf}$. Therefore, we have $\Delta_1 = \Delta_2 = 1$ and $\Delta_3 = \Delta_4 = 0$. For $s_2^{dnf}$, we have $l_2 = 3$ and $u_2 = 7$. We also have $u_3 = l_3 = u_4 = l_4 = L = 8$.
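The notation can be checked against the example with a small helper that derives the existence, lower-index, and upper-index variables from a list of segments. The function name and the tuple encoding of segments are hypothetical; the upper bound on the number of segments is taken as L/2 rounded up.

```python
def segment_variables(L, segments):
    """Derive (delta, lower, upper) from definitely non-failing segments.

    segments: list of (first_edge, last_edge) pairs, 1-based and inclusive,
              ordered from the path input to the path output.
    Unused slots get delta = 0 and lower = upper = L, as in the text.
    """
    n_max = (L + 1) // 2  # L/2 rounded up
    if len(segments) > n_max:
        raise ValueError("too many segments for this path length")
    # Consecutive segments must be separated by at least one edge.
    for (_, last), (first, _) in zip(segments, segments[1:]):
        if last + 1 > first - 1:
            raise ValueError("segments must be at least one edge apart")

    delta, lower, upper = [], [], []
    for j in range(n_max):
        if j < len(segments):
            first, last = segments[j]
            delta.append(1)
            lower.append(first - 1)  # edge immediately before the segment
            upper.append(last)       # last edge covered by the segment
        else:
            delta.append(0)
            lower.append(L)
            upper.append(L)
    return delta, lower, upper

# The solution of the example: segments e1..e2 and e4..e7 on an 8-edge path.
delta, lower, upper = segment_variables(8, [(1, 2), (4, 7)])
```

For this input the helper reproduces the values stated in the example, including the lower and upper indices of the second segment and the value L for the two unused slots.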
The mathematical formulation is given by:
Δj ∈ {0, 1}, ∀j = 1, . . . , Nm
Considering the objective, we aim to maximize DR, which alternatively turns into maximizing L dnf . The L dnf is equal to the total number of edges covered by the set S dnf , and can be expressed using Eq. (6); based on our definition, the edges that are covered by S dnf will have xij = 1 and otherwise 0.
Eq. (7) indicates that each edge on the path should be covered by at most one definitely non-failing segment s dnf . For edge ei, if all xijs are 0 for j = 1, 2, . . . , Nm, then ei belongs to the set S pf of potentially-failing edges.
Eq. (8) ensures that the j-th segment formed by our formulation is actually s dnf j . This is consistent with the definition of definitely non-failing segment explained in Section 3.
Eq. (9) 
Eq. (12) uses $\Delta_{j+1}$ to ensure that distinct definitely non-failing segments are at least one edge apart. Recall that if the (j + 1)-th segment $s_{j+1}^{dnf}$ is formed, then we have $\Delta_{j+1} = 1$. When $\Delta_{j+1} = 1$, we get $u_j + 1 \le l_{j+1}$, indicating that the formed segments $s_j^{dnf}$ and $s_{j+1}^{dnf}$ are apart by at least one potentially-failing edge.
In our formulation, we define $N_m$ variables $\Delta_j$, where $N_m$ is the maximum number of $S_{dnf}$ segments we can possibly form. However, when we cannot form $N_m$ segments, the $\Delta_j$ variables with higher indexes j should be set to 0 to satisfy Eq. (12). Therefore, we have $u_j \le l_{j+1}$. Combining this case with Eq. (9), we conclude that when $\Delta_{j+1} = 0$, we have $u_k = l_k = L$ for $k \ge j + 1$. We use this special relationship to enforce the definition of the $\Delta_{j+1}$ variable, which we discuss next.
The binary variable Δj+1 is defined using Eq. (13). When Δj+1 = 0, we have uj ≤ lj+1 = L. The inequality in Eq. (13) turns into
which is always true. When Δj+1 = 1, we have
which is again a valid inequality. Therefore, we enforce our definition of Δj+1 using the alternative conditions of uj ≤ lj+1 = L (when Δj+1 = 0) and uj + 1 ≤ lj+1 (when Δj+1 = 1).
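For intuition about what the formulation computes on one path, the same covering objective can be solved on small examples with a dynamic program. This is a swapped-in technique for illustration, not the ILP of this section. It assumes a hypothetical oracle `is_dnf(first, last)` that reports whether the segment from edge `first` to edge `last` is definitely non-failing; in practice that test would come from the three-sigma condition.

```python
def max_non_failing_cover(L, is_dnf):
    """Maximize the number of edges covered by definitely non-failing
    segments that are at least one edge apart (dynamic-programming sketch).

    L:      number of edges on the path, labeled 1..L.
    is_dnf: hypothetical oracle, is_dnf(first, last) -> bool.
    Returns (number_of_covered_edges, list_of_segments).
    """
    best = [0] * (L + 1)       # best[i]: optimum restricted to edges 1..i
    choice = [None] * (L + 1)  # first edge of the segment ending at i, if any
    for i in range(1, L + 1):
        best[i] = best[i - 1]  # leave edge i as potentially failing
        for first in range(1, i + 1):
            if not is_dnf(first, i):
                continue
            # Edge first-1 must stay uncovered, so reuse the optimum up to first-2.
            prev = best[first - 2] if first >= 2 else 0
            cand = prev + (i - first + 1)
            if cand > best[i]:
                best[i] = cand
                choice[i] = first
    # Recover the segments by walking back from the path output.
    segments, i = [], L
    while i > 0:
        if choice[i] is None:
            i -= 1
        else:
            segments.append((choice[i], i))
            i = choice[i] - 2
    segments.reverse()
    return best[L], segments

# Toy oracle: per-edge "slack" values (made up); a segment passes if they sum to <= 0.
slack = [-1, -1, 10, -1, -1, -1, -1, 10]
covered, segs = max_non_failing_cover(8, lambda a, b: sum(slack[a - 1:b]) <= 0)
```

With this toy oracle, edges 3 and 8 can never be part of a passing segment, so the result has the same shape as the solution in the notation example. The additive slack rule is only a stand-in, since the three-sigma term of a real segment is not the sum of the terms of its edges.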
Linearization
In the formulation presented in Section 4.1, Eqs. (8), (10), (11) are in non-linear form. Here, we describe how to linearize these constraints without approximation.
Eqs. (10) and (11) are nonlinear due to the max operation. In general, we can define an auxiliary variable $y_{max}$ to replace max expressions of the form $\max_{i=1,\dots,a}(y_i)$. Then, we enforce $y_{max}$ to satisfy $y_1 \le y_{max}, \dots, y_a \le y_{max}$. Now, we are left to explain how Eq. (8) 
where Eq. (16) can be transformed into
The $\mu_{e_i}$ and $\bar{\mu}_{e_i}$ in the above equation are defined using Eqs. (1) and (4), when the segment is just a single edge. It also makes use of the linearity of expectation for conditional random variables in writing $\mu_{s_j}$ and $\bar{\mu}_{s_j}$ in terms of $\mu_{e_i}$ and $\bar{\mu}_{e_i}$, respectively. Note that we can precompute $\mu_{e_i}$ and $\bar{\mu}_{e_i}$ for all edges using Eqs. (1) 
Here, we introduce binary auxiliary variables min(xij, x kj ) to replace xij x kj since both xij and x kj are binary variables. In addition, we need to introduce additional linear constraints for these auxiliary variables expressing the min operation, which is skipped due to lack of space. We are left to transform the right-hand-side of Eq. (17) into a linear form. Using Eq. (4), we express 9σ as ji as jk (t ik + t ki ),
Here, we have two non-linear terms $a_{s_j k}^2$ and $a_{s_j i} a_{s_j k}$. Due to the lack of space, we only discuss how to linearize the first term. We can follow the same procedure to linearize the other one. For $a_{s_j k}^2$, we have
$$a_{s_j k}^2 = \Big(\sum_{i=1}^{L} a_{e_i k}\, x_{ij}\Big)^2 = \sum_{i=1}^{L} \sum_{l=1}^{L} a_{e_i k}\, a_{e_l k}\, x_{ij} x_{lj} \qquad (22)$$
Note that we have already introduced auxiliary variables to linearize $x_{ij} x_{lj}$, as previously discussed. Thus, we do not need additional variables or constraints to linearize $a_{s_j k}^2$.
Solving the ILP for a path is extremely fast (a fraction of a second in our simulations). For more paths, we iteratively apply the ILP and show that this still gives high accuracy in our simulations. Therefore, the use of ILP is justified for practical considerations.
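The product of two binary variables that appears in this linearization is commonly replaced by an auxiliary binary variable z constrained by z ≤ x, z ≤ y, and z ≥ x + y − 1. These are the textbook constraints and are given here as an assumption, since the paper omits its own. The sketch below enumerates all binary inputs and confirms that the constraints leave exactly one feasible value for z, equal to both the product and the minimum.

```python
from itertools import product

def feasible_z(x, y):
    """Binary values of z allowed by z <= x, z <= y, z >= x + y - 1."""
    return [z for z in (0, 1) if z <= x and z <= y and z >= x + y - 1]

def linearization_table():
    """Map each binary pair (x, y) to the list of feasible z values."""
    return {(x, y): feasible_z(x, y) for x, y in product((0, 1), repeat=2)}

table = linearization_table()
for (x, y), zs in table.items():
    # Exactly one feasible z, and it equals x*y = min(x, y).
    assert zs == [x * y] == [min(x, y)]
```

Because the feasible value is unique for every input pair, substituting z for the product does not introduce any approximation.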
SIMULATION RESULTS
Our framework was implemented using C++, and we used CPLEX 9.0 to solve the generated ILP formulation. In our experimental flow, we start by synthesizing ISCAS'89 benchmarks using 90nm TSMC library and Synopsys Design Compiler for minimum area under a stringent timing constraint to ensure having many critical paths. We assume parameter variations in effective channel length L eff and zero-bias threshold voltage V th to be Gaussian distributed with standard deviations of 5% and 10% of their mean, respectively. To capture spatial correlation between the varying parameters, we use the multi-level hierarchical model of [2] , which defines rectangular regions on the chip. The gates/interconnects in the same region or in physically close-by regions will share all or some of their parameter variations and be correlated to each other using this hierarchical model.
For smaller benchmarks (S1423 to S9234), we use a 3-level hierarchical model, resulting in This assumption is consistent with [6] . Consequently, for each region, we had two distinct random variables of L eff and V th .
To obtain a "golden model" capturing post-silicon behavior in our simulations, we assume the instance of parameter variation is known. We then experiment by analyzing many such instances generated using Monte Carlo (MC) simulation reflecting various post-silicon cases. Furthermore, for each variation instance, we assume an additional Gaussian-distributed error $\epsilon_i$ is introduced for the delay of each gate i, representing a mismatch between pre- and post-silicon delay models. We randomly generate this error for each gate and assume it to be at most 6% of the nominal gate delay in our simulations.
Our framework requires as input, post-silicon delay measurements on representative paths (which can include failing ones). However, the problem of representative path selection is outside the scope of this paper and is studied separately, such as in [3] . While our framework can take as input any method for selecting and measuring representative paths, the important factor in our simulations is to ensure that the measured paths are indeed representative so they can more effectively reflect the silicon impact. We briefly summarize our procedure to get measured representative paths:
1. We first identify a large set of critical paths Pc, using pre-silicon timing models for nominal case of variability. These paths are selected if their nominal delays fall within a 20% window of the timing constraint.
2. We use the procedure of [10] to identify a subset of these paths denoted by P, as representative paths (P ⊆ Pc). These paths are highly correlated with the target critical paths in presence of variability. We indeed verified that our representative paths could predict the delays of critical paths with an average error of less than 3% over a large set of parameter variation instances.
3. We assume the measured delays of these representative paths are obtained using the explained "golden model". So we assume an instance of parameter variations X and an instance of random noise for each gate, and incorporate these in Eq. 2 to find the delay of each path for each simulation.
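One Monte Carlo sample of such a golden model could be generated along the following lines. This is a sketch under several assumptions that are not stated in the paper: the parameters are drawn as independent standard Normal variables, the noise is applied per edge rather than per gate, its standard deviation is one third of the 6% cap, and values beyond the cap are clipped. All names are hypothetical.

```python
import numpy as np

def golden_sample(mu_edges, A_edges, paths, rng, noise_cap=0.06):
    """Draw one variation instance and return the resulting delays.

    mu_edges: nominal edge delays, shape (E,).
    A_edges:  edge sensitivities, shape (E, num_params).
    paths:    list of paths, each a list of 0-based edge indices.
    rng:      numpy random Generator.
    """
    mu_edges = np.asarray(mu_edges, dtype=float)
    A_edges = np.asarray(A_edges, dtype=float)

    x = rng.standard_normal(A_edges.shape[1])  # instance of the parameters
    # Relative model-mismatch noise, clipped to +/- noise_cap (assumption).
    rel_noise = rng.normal(0.0, noise_cap / 3.0, size=mu_edges.shape)
    rel_noise = np.clip(rel_noise, -noise_cap, noise_cap)

    edge_delays = mu_edges + A_edges @ x + rel_noise * mu_edges
    path_delays = np.array([edge_delays[list(p)].sum() for p in paths])
    return x, edge_delays, path_delays
```

The returned path delays play the role of the measured delays of the representative paths, while the returned edge delays are what makes it possible to check afterwards which edges actually fail.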
Comparisons on Diagnostic Resolution
We generate MC samples for parameter variations, which follow Gaussian distributions. Due to the very large dimension of the parameter variations, we generate several MC sample sets and all of them include 2,500 samples. We chose the sample set, which yields the largest number of failing samples, for performance validation. We declare an instance to be a "failing sample" if at least one path in PC fails the timing. Recall, we assume failing paths are identified and isolated using existing techniques, such as [7] . Here, we use our golden model to localize failing paths for a variation instance, while the representative paths are determined once and remain the same over all variation instances. The number of failing samples NFS is given in column 3 of Table 1 . Note this number is high compared to the total number of samples for some benchmarks. It is because 2,500 samples are selected to cover a high number of failing samples, representing various failure possibilities. In addition, this number also differs among benchmarks due to their topologies.
For a failing sample, we then measure representative path delays (P) according to our golden model. Our framework then generates and solves the ILP formulation using these measurements to identify segments on the failing paths.
Here, we consider two sets of failing speedpaths P f,1 and P f,2 . For an instance of parameter variations, P f,1 includes all failing speedpaths in P f with timing violation less than 3% of the timing constraint. It represents small violation case. Similarly, P f,2 includes all failing speedpaths with timing violation between 3% and 6% of the target circuit delay. It represents large violation case. Note that the paths in P f,1 and P f,2 are different for each variation case and are defined based on the post-silicon delays.
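The split into the small-violation and large-violation sets can be sketched as follows. The handling of the exact 3% and 6% boundaries is an assumption, and the function name and path labels are hypothetical.

```python
def partition_by_violation(path_delays, t_constraint, small=0.03, large=0.06):
    """Split failing paths into small- and large-violation sets.

    path_delays:  dict mapping a path name to its post-silicon delay.
    t_constraint: the timing constraint.
    Returns (small_violation_paths, large_violation_paths).
    """
    small_set, large_set = [], []
    for name, delay in path_delays.items():
        violation = (delay - t_constraint) / t_constraint
        if 0.0 < violation < small:
            small_set.append(name)
        elif small <= violation <= large:
            large_set.append(name)
        # Paths that meet timing, or violate by more than `large`, are in neither set.
    return small_set, large_set

pf1, pf2 = partition_by_violation(
    {"p1": 99.0, "p2": 101.0, "p3": 104.0, "p4": 110.0}, t_constraint=100.0
)
```

Since the violation is computed from post-silicon delays, the same path can fall into different sets for different variation instances.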
In Table 1, Column 2 provides the number of paths (|P|) for direct measurement. Columns 3-10 show the performance of our framework for the small violation case, where the segment identification is performed over the set $P_{f,1}$. Column 4 gives the average number of failing paths (NFP) in our MC simulation. Column 5 gives the average diagnostic resolution (DR) after the ILP formulation. As can be seen, the average DR over all circuits is very high (76.79%). Note that in our simulations the actual failing set $S_f$ was indeed contained in the potentially failing set $S_{pf}$.
Columns 6-7 give the ratios of the average number of potentially failing edges ($L_{pf}$) with respect to the total number of edges covered by the failing speedpaths (L). We compute this percentage before the ILP (where each edge is a segment of length 1 of either definitely non-failing or potentially failing type). We also compute this percentage after the ILP (where some edges are merged into definitely non-failing segments). As can be seen, after the ILP, we can reduce this percentage by 8% on average in small benchmarks and 13% in large benchmarks. We also report the average length of the failing speedpaths (LFP) in Column 8. Column 9 gives the maximum number of edges which can be merged (ΔNe) in our MC simulation. Compared with Column 8, we sometimes merge a large proportion of edges into non-failing segments.
Column 10 gives the average diagnostic resolution of a path (DRp) over MC samples. The DRp is defined with respect to each path; the ILP is solved for each path and the diagnostic resolution measured for each path separately, and then averaged over the number of paths and the MC samples. As can be seen, DRp is very high, indicating that the set of potentially-failing edges per path is very close to the actually-failing ones, if paths are individually considered.
Columns 11-14 give simulation results for the large violation case (the set of failing speedpaths is $P_{f,2}$). In this case, we observed that the values of DR and DRp are higher than those in the small violation case. This could be due to the fact that in the large violation case, there are likely more failing edges on the paths ($L_f$ is larger).
To further show the effectiveness of our ILP formulation, we performed another experiment considering all failing speedpaths (for all the post-silicon delay violation cases). For each path, we recorded the number of edges merged into definitely non-failing segments after applying our ILP formulation. We also recorded the number of actual failing edges (obtained using golden model) over all the violation cases. Fig. 4 shows the scatter plots of these two quantities for S1488. As shown, for the cases of small delay violations, we can merge more edges into non-failing segments. For large violations, since more edges are prone to failure, there is less benefit in solving the ILP.
Note that solving ILP for each path is extremely fast (fraction of a second), which makes our algorithm practical.
Comparisons using Other Metrics
Consider the small violation case (i.e., the set of failing speedpaths is $P_{f,1}$). We define two ranked lists of 1) potentially failing segments, and 2) actually failing ones. This allows us to evaluate a match ratio as we discuss later. The two lists are constructed as follows. After applying our framework, we obtain a set of potentially failing segments $S_{pf}$, and for each one compute the expected segment delay given the path measurements (i.e., $\bar{\mu}_s$) using Eq. (4). We rank these segments in non-increasing order in terms of the segment delay deviation ($\bar{\mu}_s - \mu_s$), and derive a rank list L1. We also rank all the segments in $S_f$ in terms of their actual post-silicon delay deviations from $\mu_s$ in non-increasing order to get the list L2. Note the length of all segments in L1 and L2 is 1.
Using a control parameter η ∈ (0, 1], we define a window of size m to compare the segments of the two ranked lists, where m = max(η|L2|, min(10, |L2|)). This ensures that, for a fair comparison, the window size is at least 10 segments, or the entire list (if the list size is less than 10 segments). We then consider the first m segments of the two ranked lists L1 and L2 and define the metric "match ratio of the rank list" as MMRL(η) = n/m, where n is the number of segments in the window of L2 which were contained in the window of L1. A high value of MMRL indicates a high accuracy of our framework.
Table 2 gives MMRL for η = 20%, 40%, and 100%. Column 2 gives the total number of gates in the entire circuit. Columns 3-5 show that for nearly all cases, MMRL is larger than 85%, indicating a close match. It indicates that our framework can determine the top η (in %) failing segments very accurately, even when η is small.
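The match ratio can be sketched as a short function over two ranked lists. Rounding η|L2| up to the next integer is an assumption, since the text does not say how a fractional window size is handled. The function and variable names are hypothetical.

```python
import math

def match_ratio(ranked_predicted, ranked_actual, eta):
    """Match ratio of the rank list for a window controlled by eta in (0, 1].

    ranked_predicted: potentially failing segments, most deviating first (L1).
    ranked_actual:    actually failing segments, most deviating first (L2).
    """
    if not ranked_actual:
        raise ValueError("the list of actually failing segments is empty")
    size = len(ranked_actual)
    m = max(math.ceil(eta * size), min(10, size))  # window size
    window_predicted = set(ranked_predicted[:m])
    n = sum(1 for seg in ranked_actual[:m] if seg in window_predicted)
    return n / m

# 20 actually failing segments; the prediction misplaces segment 0 out of the top 10.
actual = list(range(20))
predicted = [15] + list(range(1, 15)) + [0] + list(range(16, 20))
ratio_small_window = match_ratio(predicted, actual, eta=0.2)
ratio_full_list = match_ratio(predicted, actual, eta=1.0)
```

With η = 0.2 the window is widened to 10 segments by the lower bound, and one of the ten actually top-ranked segments is missing from the predicted window. With η = 1 the window covers both lists entirely, so the order no longer matters.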
In addition, Columns 6-7 provide the sharing information of the failing speedpaths. For the failing speedpaths, first we consider the failing edges and compute the average number of failing speedpaths going through each failing edge (over all the paths and MC samples). We denote this by $n_{fe}$. Next, we consider all the failing and non-failing edges of the failing speedpaths. We then compute the average number of failing speedpaths going through each edge. We denote this by $n_e$. As shown in Table 2, $n_{fe}$ is higher than $n_e$, indicating that the edges which are covered by a higher number of failing paths are more likely to fail. We plan to incorporate this observation to improve our framework in the future.
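For a single variation instance, the two sharing averages can be computed as in the sketch below. Averaging over MC samples is left out, and the function name and edge labels are hypothetical.

```python
def average_sharing(failing_paths, failing_edges):
    """Average number of failing paths through failing edges and through all edges.

    failing_paths: list of paths, each a list of edge identifiers.
    failing_edges: identifiers of the edges that actually fail.
    Returns (n_fe, n_e).
    """
    counts = {}
    for path in failing_paths:
        for edge in set(path):  # count each path at most once per edge
            counts[edge] = counts.get(edge, 0) + 1
    if not counts:
        raise ValueError("no edges on the failing paths")

    n_e = sum(counts.values()) / len(counts)
    shared = [counts[e] for e in failing_edges if e in counts]
    if not shared:
        raise ValueError("no failing edge lies on a failing path")
    n_fe = sum(shared) / len(shared)
    return n_fe, n_e

paths = [["a", "b", "c"], ["b", "c", "d"], ["c", "e"]]
n_fe, n_e = average_sharing(paths, ["b", "c"])
```

In this made-up instance the failing edges `b` and `c` are shared by more paths than the average edge, which is the pattern reported for the benchmarks.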
CONCLUSIONS
We presented a framework for fast post-silicon segment diagnosis on failing speedpaths, based on our proposed ILP formulation. We showed through simulation that we can achieve a very high diagnostic resolution of failing segments in the speedpaths. A failing segment is one whose silicon delay may deviate (substantially) from its pre-silicon delay. We also showed a high accuracy in ranking the failing segments based on their degree of failure.
