Process variation has become prominent in advanced CMOS technology, making the timing of fabricated circuits more uncertain. In this paper, we propose a Layout-Aware Path Selection (LAPS) technique to accurately estimate the circuit timing variation from a small set of paths. Three features of paths are considered during path selection: single sensitization, sampling sufficiency, and sampling uniformity. Experiments conducted on benchmark circuits, with process variation simulated by VARIUS, show that by selecting only hundreds of paths, the fitting errors of the timing distribution are kept below 5.3% when both spatially correlated and spatially uncorrelated process variations exist.
Introduction
As the feature size in advanced VLSI technology continues to shrink, process variation produces more uncertainty in circuit timing behavior. Accurate timing characterization plays an important role in a variety of applications [1]-[6], such as statistical timing analysis, post-silicon tuning, post-silicon reliability analysis, and IC identification.
In general, process variation can be divided into two categories: correlated systematic variation and uncorrelated random variation. Correlated systematic variation tends to affect closely placed gates or routed wires in a similar manner, so nearby gates and wires are more likely to have similar process variations than those placed far apart [7]; this is called spatial correlation. Uncorrelated random variation, in contrast, has no spatial correlation. Transistor timing is heavily affected by both kinds of variation [8]. Figure 1-(1) shows the distribution of gate timing variations under spatially correlated process variations, generated by the VARIUS tool [9], where a deeper color represents a higher timing variation. It can be seen that the timing variations of nearby gates are similar. Wires also have this kind of correlation [10], [11]. Therefore, timing characterization methods [12]-[21] usually first divide a circuit layout into grids to reduce the sample complexity, as shown in Fig. 1-(2) and Fig. 1-(3). When the grid is small enough, gates in a grid are considered to have the same timing variation. Then, by sampling representative gates in grids or paths across grids, the timing variation of each grid can be profiled. The timing information can be sampled by (1) invasive measurement or (2) non-invasive testing.
For invasive measurement, on-chip monitors such as ring oscillators [12], NMOS/PMOS transistor chains [13], and slew-rate monitors [14] are inserted into grids to measure their timing. The advantage is that the timing information of grids can be measured directly. However, the more grids the layout is divided into, the higher the hardware overhead. Recent works [15]-[17] tried to reduce the hardware overhead, but at some cost in the accuracy of timing characterization.
Non-invasive testing leverages the paths in circuits to obtain the timing information. As paths normally cross several grids, the path delay information must be post-processed by compressed sensing or least-squares techniques to obtain the timing of each grid [18]-[21]. The accuracy of timing characterization therefore depends heavily on the sampled paths. Path selection strategies have been well studied in delay fault testing, but those works mainly target critical paths with long delays. Because the distribution of paths over the circuit layout is not considered, such paths hardly reflect the timing distribution of the whole circuit.
In this paper, we propose a path selection method specifically for non-invasive timing characterization. This paper is an extension of our prior work in [21] . The proposed method has the following contributions:
Copyright © 2017 The Institute of Electronics, Information and Communication Engineers
• Sufficiency and uniformity of the sampled data for fitting timing variations are analyzed;
• A layout-aware path selection method named LAPS is proposed to sample paths for estimating the circuit timing variation.
The rest of the paper is organized as follows. Section 2 reviews the problem formulation of estimating the timing variation based on the sampled path delays. Section 3 analyzes the sufficiency and the uniformity of the sampled data, while Sect. 4 proposes the LAPS method. The experimental results are given in Sect. 5, followed by the conclusion section.
Problem Formulation
In this section, we will briefly review the problem formulation of estimating timing variation with sampled path delays [18] , [21] . Taking the timing variation into consideration, the delay of a gate or a wire is represented by:
where K represents a gate or a wire, D(K) represents the actual delay of K, D nom (K) represents the nominal delay of K, and v(K) represents the timing variation of K. If a path is a single-sensitized path, then its delay can be approximated by the accumulated delay of the gates and wires along the path [18]-[22]. Note that, unlike path selection in delay testing, we place no requirement on path length, so both long and short paths can be used in our work.
Since gates or wires placed near each other in the layout have similar timing variations, the timing variations within the same layout grid are considered to be similar, especially when the grid is small. This approximation introduces some fitting errors due to the uncorrelated random process variations; the experiment section shows that these fitting errors are still acceptable.
Because a long wire can cross multiple grids, we divide it into several segments so that every segment belongs to only one grid. The total delay of the wire is then the sum of the delays of its segments. Different metal layers of a chip may have different timing variation distributions, so each layer is divided into its own appropriate number of grids.
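The displayed delay model is not legible in this copy. A form consistent with the definitions above can be written as follows, under the assumption that v(K) is a relative deviation, as the Δ legend in the experimental section suggests. The path-level sum restates the accumulated-delay approximation for a single-sensitized path P; the symbol D(P) is introduced here only for illustration.

```latex
% Assumed form: v(K) is the relative deviation from the nominal delay.
D(K) = D_{nom}(K)\,\bigl(1 + v(K)\bigr)

% Accumulated-delay approximation for a single-sensitized path P.
D(P) \approx \sum_{K \in P} D(K) = \sum_{K \in P} D_{nom}(K)\,\bigl(1 + v(K)\bigr)
```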
For example, in Fig. 2, P 0 is a single-sensitized path when e = 1, f = 0, and g = 0. The gates H and I are both in the grid G 1 , so they are approximately considered to have the same timing variation v(G 1 ). The wire b belongs to two grids, so it is divided into b 1 and b 2 ; a and b 1 are both in the same grid. So the actual delay of P 0 is:
where D nom (H) denotes the nominal delay of the gate H. Therefore, if the number of single-sensitized paths is N S S P , then the timing characterization problem can be formulated as the following equation set:
where N Grid is the total number of grids, and K is a gate or wire that lies on the path P p and belongs to the grid G g . In Eq. (3), as shown in [18]-[24], D can be collected using Automatic Test Equipment (ATE) or Built-In Self-Test (BIST). Since we focus on how to select effective paths for characterizing timing variations, the testing approaches used to obtain D are not discussed further; see [18]-[24] for detailed explanations. As the measurement accuracy has an impact on the fitting errors, measurement errors are considered in the experiments. The parameter Π can be obtained by Electronic Design Automation (EDA) tools, and V is the unknown timing variation distribution to be calculated.
Note that some single-sensitized paths can propagate both a rising and a falling transition. As the rising delay and the falling delay of the same gate usually differ, each such path can provide two equations. Thus, in the rest of the paper, one such path is counted as two paths.
As Eq. (3) is formulated based on the assumption that the timing variations of gates in the same grid are similar, a solution V that satisfies every equation generally does not exist. Therefore, we use the method of least squares to calculate V:
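The least-squares step can be illustrated numerically. The sketch below is not the paper's implementation: it assumes the relative delay model D(K) = D nom (K)(1 + v), so that each entry of Π is the summed nominal delay of a path inside one grid and the right-hand side is the measured path delay minus the nominal path delay. The helper names (`build_system`, `fit_variation`) and the toy circuit are hypothetical.

```python
import numpy as np

def build_system(paths, nominal_delay, grid_of, n_grid, measured_delay):
    """Assemble Pi (paths x grids) and the right-hand side.

    Assumption: Pi[p, g] is the summed nominal delay of the elements of
    path p that lie in grid g, and the right-hand side is the measured
    path delay minus the nominal path delay.
    """
    pi = np.zeros((len(paths), n_grid))
    for p, elements in enumerate(paths):
        for k in elements:
            pi[p, grid_of[k]] += nominal_delay[k]
    rhs = np.asarray(measured_delay, dtype=float) - pi.sum(axis=1)
    return pi, rhs

def fit_variation(pi, rhs):
    """Least-squares estimate of the per-grid timing variation V."""
    v, *_ = np.linalg.lstsq(pi, rhs, rcond=None)
    return v

# Hypothetical toy circuit: four gates placed in three grids.
nominal_delay = {"H": 1.0, "I": 2.0, "J": 1.5, "L": 0.5}
grid_of = {"H": 0, "I": 0, "J": 1, "L": 2}
paths = [["H", "J"], ["I", "L"], ["H", "I", "J", "L"], ["J", "L"]]

# Simulated noise-free measurements from a known per-grid variation.
true_v = np.array([0.02, -0.01, 0.03])
measured = [sum(nominal_delay[k] * (1.0 + true_v[grid_of[k]]) for k in p)
            for p in paths]

pi, rhs = build_system(paths, nominal_delay, grid_of, 3, measured)
v_fit = fit_variation(pi, rhs)
print(np.round(v_fit, 4))
```

With noise-free data and at least N Grid linearly independent paths, the fit recovers the injected per-grid variation; with real measurements the residual is nonzero and the solution is only the best fit in the least-squares sense.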
Data Sampling
The proposed key idea is shown in Fig. 3 . We will solve the above mentioned least square problem to characterize the timing variation distribution. As the quality of timing characterization is determined by the sampled data, this section will analyze the sufficiency and uniformity of the sampled data. Then in the next section, we will elaborate how to select the paths to obtain the sampled data.
The fitting error metric can evaluate the accuracy of the fitted timing distribution. The relative fitting error of a grid g, represented by E(g), is:
where N Gate (g) represents the number of gates or wires in the grid g, and k is a gate or wire in g, so v(g) represents the mean timing variations of the gates or wires in g, and v F (g) denotes the fitted timing variation of g. As Eq. (4) contains N Grid unknowns, so at least N Grid paths should be selected for constructing N Grid equations, and they are expected to have no linear correlation, i.e. without redundant equations. According to the sampling theory [25] , the sampled data should be sufficient and uniform. Sampling sufficiency means at least how many gates or wires in grids should be sampled, while sampling uniformity means the sampled gates or wires should be evenly distributed in the grids.
Obviously, to accurately fit timing variations, every grid should be sampled. In general, higher fitting accuracy can be achieved with more samples. Meanwhile, the decrease of fitting error will dramatically slow down after the number of samples reaches a point. Assume the number of data is N, and the data is in the Gaussian distribution with a standard deviation σ. To achieve an absolute fitting error d with a confidence level u, at least N S samples should be randomly selected [25] :
For example, assuming a grid contains N = 100 gates, and the σ of their timing variations is 0.01, then to achieve the absolute fitting error d = 0.01 with the confidence level u = 1.96 (confidence coefficient 95%), at least N S = 4 gates should be sampled. The theoretical analysis can help us to set the number of samples.
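The arithmetic of this example can be checked with a short script. The sample-size formula itself is not legible in this copy, so the sketch assumes the standard finite-population formula for estimating a mean, N S = n0 / (1 + n0 / N) with n0 = (u σ / d)^2, which reproduces the value of 4 quoted above. The function name `min_samples` is hypothetical.

```python
import math

def min_samples(n_population, sigma, d, u):
    """Minimum random sample size for estimating a mean.

    Assumption: the standard finite-population formula from sampling
    theory, n0 / (1 + n0 / N) with n0 = (u * sigma / d) ** 2.
    """
    n0 = (u * sigma / d) ** 2           # sample size for an infinite population
    return math.ceil(n0 / (1.0 + n0 / n_population))

# The example from the text: N = 100, sigma = 0.01, d = 0.01, u = 1.96.
print(min_samples(100, 0.01, 0.01, 1.96))
```

Halving the tolerated error d roughly quadruples n0, which is why the required number of sampled gates per grid grows quickly as the accuracy target tightens.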
In regard to the sampling uniformity, to reflect the overall timing variation of the circuit, the sampled gates or wires are expected to be evenly distributed in the grids. In summary, the selected paths are expected to meet the following constraints.
C-1: To meet the requirement of Eq. (5), the selected paths must be single-sensitized paths;
C-2: For sampling sufficiency, at least N Grid paths should be selected to cover all the grids, and every grid is expected to be covered by different paths and through different gates or wires, which reduces the possibility of constructing linearly correlated equations in Eq. (4);
C-3: For sampling uniformity, the gates or wires covered by the selected paths are expected to be evenly distributed in each grid.
Among them, C-1 is a strict constraint that must be met, while C-2 and C-3 are loose constraints that may not be met completely; the proposed LAPS method tries to satisfy them as far as possible.
Layout Aware Path Selection
The pseudo-code of the LAPS method is given in Algorithm 1. To cover a grid, the main function LAPS selects a gate located in the grid, and then calls the sub-function Select One Path to generate a complete path through the gate. We use the example in Fig. 4 to introduce the algorithm.
Fig. 4 An example of LAPS.
Algorithm 1 LAPS
For the sake of simplicity, only the transistor layout is illustrated in the figure. The layout is divided into 9 grids, and there is a selected path P 1 .
Line-1 and Line-2 of Alg. 1 present two user-defined parameters, N Grid and N S S P . From Line-6 to Line-12, one path selection is attempted per iteration until the total number of selected paths reaches N S S P . If no more single-sensitized paths can be found, the function LAPS ends. Based on the constraint C-2, all the grids are expected to be covered by the selected paths, and each grid is expected to be covered as many times as possible through different gates. Thus, in Line-7, LAPS starts by selecting a grid G T that has been covered by the fewest selected paths, and then tries to select a path across G T . If more than one grid satisfies this requirement, one of them is randomly selected. The sub-function Select A Grid is shown in Line-30 to Line-36. Next, in Line-8, a gate K T is selected. It belongs to G T and has been covered by the fewest selected paths. If more than one candidate gate remains, the constraint C-3, which considers the uniform distribution, guides the further selection of K T among them: the gate that brings the average layout coordinate of the selected gates closest to the center of its grid is selected as K T . The sub-function Select A Gate is shown in Line-37 to Line-54. As K T may not be an input, in Line-9, an input K I that can reach K T is selected by a similar strategy based on C-2 and C-3. The major difference between selecting K T and selecting K I is that, instead of analyzing every gate of G T in Line-37 and Line-40, every gate that can reach K T is analyzed for selecting K I .
The sub-function Select One Path, called in Line-10, selects a path that starts from K I and goes through K T . At the beginning, the path contains only the input gate K I . Then its succeeding gates are selected and appended to the path one by one until an output gate is reached. The detailed steps are as follows. Firstly, to ensure that the path goes through K T , the succeeding gates that are not in the logic cone of K T are removed. Secondly, the succeeding gates that cannot satisfy the constraint C-1 are removed. Finally, the constraints C-2 and C-3 are used to determine the order in which the remaining succeeding gates are pushed into the stack K Stack. The K Stack is used for backtracking when the currently selected partial path cannot satisfy C-1. If no gates are left in the K Stack, then no single-sensitized path is available, and this path selection fails. Failed paths are recorded to avoid unnecessary retries in the future. Otherwise, once the currently selected gate K s is an output gate, the selection succeeds.
For example, in Fig. 4, at the beginning, neither G 5 nor G 7 has been covered by the path P 1 , so we randomly select one of them; assume G 5 is selected as G T . Then, in G 5 , since the gate C has not been covered yet, it is selected as K T . Two input gates, A and B, can reach C. Both of them are in the same grid G 4 , where two other gates M and N have already been covered. The average layout coordinate of B, M, and N is closer to the center of G 4 than that of A, M, and N, so B is selected as K I . Now K T is C and K I is B, so the currently selected partial path is k => B => C. Here C has two fanout gates, D and F. According to C-1, the partial path k => B => C => D is a single-sensitized path when b 2 = 1, c 2 = 1, and d 1 = 1, while k => B => C => F is a single-sensitized path when b 2 = 1, c 2 = 1, and f 2 = 0. Although both D and F satisfy C-1, the average layout coordinate of D and U is closer to the center of G 8 than that of F and P is to the center of G 2 . Hence D is selected as the gate K s in the next loop, and the currently selected partial path becomes k => B => C => D. D has only one succeeding gate, E. The partial path k => B => C => D => E is a single-sensitized path when b 2 = 1, c 2 = 1, d 1 = 1, and e 1 = 0. However, to make e 1 = 0, i 1 and i 2 must be 0 too, which conflicts with b 2 = 1. Thus, E does not satisfy C-1. In this case, the other partial path k => B => C => F is tried. Finally, the complete path k => B => C => F => G => g is successfully selected.
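The grid and gate selection rules described above can be sketched as follows. This is an illustrative reading of the text, not the pseudo-code of Algorithm 1: the function names, argument layout, and coordinates are all hypothetical, and ties that survive the centroid rule are simply broken by list order.

```python
import math
import random

def select_grid(grid_cover_count, rng=random):
    """C-2: pick a grid covered by the fewest selected paths (random tie-break)."""
    fewest = min(grid_cover_count.values())
    candidates = [g for g, c in grid_cover_count.items() if c == fewest]
    return rng.choice(candidates)

def select_gate(candidates, gate_cover_count, position, selected_positions, grid_center):
    """C-2 then C-3: least-covered gate, ties broken by centroid closeness."""
    fewest = min(gate_cover_count[k] for k in candidates)
    least_covered = [k for k in candidates if gate_cover_count[k] == fewest]

    def centroid_distance(k):
        # Distance from the grid center to the mean coordinate of the
        # already selected gates plus the candidate k.
        pts = selected_positions + [position[k]]
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        return math.hypot(cx - grid_center[0], cy - grid_center[1])

    return min(least_covered, key=centroid_distance)

# Hypothetical coordinates loosely following the G 4 situation of Fig. 4:
# M and N are already covered, A and B are the uncovered candidates.
position = {"A": (0.1, 0.1), "B": (0.5, 0.9), "M": (0.2, 0.2), "N": (0.8, 0.4)}
cover = {"A": 0, "B": 0, "M": 1, "N": 1}
chosen = select_gate(["A", "B", "M", "N"], cover, position,
                     [position["M"], position["N"]], (0.5, 0.5))
print(chosen)

grid = select_grid({"G4": 1, "G5": 0, "G7": 0}, random.Random(0))
print(grid)
```

The same `select_gate` routine can serve for both K T and K I : only the candidate list changes, from the gates of G T to the inputs that can reach K T .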
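The backtracking search described above can be sketched as a depth-first search over partial paths. This is a simplified illustration under stated assumptions, not the paper's Select One Path: the sensitization check is passed in as a callback (`is_single_sensitized`, standing in for the external checker), `reaches_target` and `rank` are hypothetical helpers for the logic-cone test and the C-2/C-3 ordering, and the recording of failed paths is omitted.

```python
def select_one_path(k_input, k_target, fanout, outputs,
                    reaches_target, is_single_sensitized, rank):
    """Search a path from k_input through k_target to an output gate."""
    stack = [[k_input]]                      # each entry is a partial path
    while stack:
        path = stack.pop()
        current = path[-1]
        if current in outputs and k_target in path:
            return path                      # complete single-sensitized path
        candidates = []
        for nxt in fanout.get(current, []):
            # Step 1: before K_T is on the path, keep only gates that lead to it.
            if k_target not in path and nxt != k_target and not reaches_target(nxt):
                continue
            # Step 2: drop extensions that violate C-1.
            if not is_single_sensitized(path + [nxt]):
                continue
            candidates.append(nxt)
        # Step 3: push the least preferred first so the best one is tried first.
        for nxt in sorted(candidates, key=rank, reverse=True):
            stack.append(path + [nxt])
    return None                              # stack exhausted: selection fails

# Hypothetical netlist loosely following the Fig. 4 walk-through.
fanout = {"B": ["C"], "C": ["D", "F"], "D": ["E"], "F": ["G"]}
outputs = {"E", "G"}
preference = {"D": 0, "F": 1}                # D is preferred over F by C-3

def reaches_target(gate):
    return gate == "B"                       # only B leads to the target C

def is_single_sensitized(path):
    return "E" not in path                   # stand-in: extending to E conflicts

result = select_one_path("B", "C", fanout, outputs, reaches_target,
                         is_single_sensitized, lambda k: preference.get(k, 0))
print(result)
```

In this toy run the search first extends toward D, finds that E violates the sensitization stand-in, and backtracks to F, mirroring the order of events in the Fig. 4 example.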
Experimental Results

Experiment Flow
The experiment flow is shown in Fig. 5. First of all, commercial EDA tools are used to generate layouts and extract normalized delays of gates and wires for the benchmark circuits. In our experiments, four ISCAS'89 benchmark circuits (s13207, s15850, s35932, and s38417) and the two largest ITC'99 benchmark circuits (b18 and b19) are used. We use a TSMC 65nm technology library to conduct synthesis, placement, and routing. For each benchmark circuit, 10^6 paths are randomly selected. On average, the gates contribute more than 99% of the total delay of a path, so for these benchmark circuits the wires have a much smaller effect on path delays. Thus, in this experiment, only the timing variations of gates are considered. In some other circuits, the wires can contribute a significant share of the path delay. As explained before, the proposed method can also characterize their timing variation distributions.
Secondly, the VARIUS model [16] is used to simulate the process variations of transistor threshold voltage and channel length. In [16], a multivariate normal distribution with a spherical spatial correlation structure is applied for the spatially correlated process variations. In VARIUS, the parameter phi, the correlation distance relative to the grid width, is set to 0.5. The spatially uncorrelated process variations are generated with a Gaussian distribution. The grid size of VARIUS is set so that every gate in the circuit has its own variation.
Next, the proposed LAPS method is used to select the paths for timing characterization. An open-source tool [26] is adopted to check whether a partial or complete path is single-sensitized. With the injected timing variation distribution, the timing variation of every gate is known, so the measurement of path delays (D in Eq. (4)) can be simulated. Meanwhile, the matrix Π can be derived from the normalized delays of gates. With D and Π, the least-squares problem of Eq. (4) is then solved by a commercial mathematical tool to fit the timing variation distribution V. Finally, by comparing the injected timing variation distributions with the fitted distributions, the fitting error E(g) is calculated for every grid g.
Selected Paths
The selected paths are expected to meet the three constraints C-1, C-2 and C-3, so N Grid , N S S P , and R P/G = N S S P /N Grid are set to different values according to the different scales of benchmark circuits. For ISCAS'89 benchmark circuits, all the grids can be covered. For ITC'99 benchmark circuits, when the number of grids is large, some grids are not covered, but the percentage of covered grids is still greater than 99%. The sampling sufficiency is shown in Table 1 .
In Table 1 , column D1 gives the average number of selected paths to cover a grid. For example, when N Grid = 100 and N S S P = 100, a grid is covered by 14.78 selected paths on average.
Column D2 gives the average number of selected paths to cover a gate. As explained before, if a single-sensitized path can propagate both a rising transition and a falling transition, this path is counted as two paths of N S S P . When N Grid and N S S P are small, the selected paths indeed cover different gates in a grid. When N Grid and N S S P increase, as the total number of single-sensitized paths is limited by the circuit structure, the selected paths are more likely to cover the same gates. On average, when N Grid = 100 and N S S P = 100, a gate is only covered by 2.03 selected paths; when N Grid = 6400 and N S S P = 19200, a gate is covered by 4.68 paths.
Columns D3 and D4 give the average number and the percentage of gates covered by the selected paths in a grid, respectively. For example, when N Grid = 100 and N S S P = 100, on average, 7.28 gates are covered in a grid.
In general, Table 1 shows the selected paths achieve the expected sampling sufficiency required by C-2.
As for the third constraint C-3, the selected gates are expected to be uniformly distributed in each grid. Figure 6 illustrates the distributions of selected gates. For each circuit, every grid is divided into 5 × 5 = 25 sub-grids. The height of every column represents the accumulated number of selected gates in each sub-grid. We can see that the sub-grids of a circuit contain similar numbers of selected gates.
The runtime for selecting one path depends on the complexity of the circuit, and the total runtime increases nearly linearly with N S S P . For the circuit s13207, it takes less than 1 second to select a path, while for the circuit b19, it takes about 15 seconds to select one path.
Table 1 Sampling sufficiency of selected paths. AN: Average number of gates in a grid; D1: Average number of selected paths to cover a grid; D2: Average number of selected paths to cover a gate; D3: Average number of gates covered by the selected paths in a grid; D4: Average percentage of gates covered by the selected paths in a grid.
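The sub-grid counting behind this kind of uniformity plot can be sketched in a few lines. The function below is a hypothetical helper, not the authors' script: it bins gate coordinates of one grid into n_sub × n_sub sub-grids, and the counts of several grids could be added element-wise to obtain accumulated values.

```python
def subgrid_histogram(points, grid_origin, grid_size, n_sub=5):
    """Count selected gates per sub-grid of one layout grid."""
    counts = [[0] * n_sub for _ in range(n_sub)]
    x0, y0 = grid_origin
    width, height = grid_size
    for x, y in points:
        # Clamp so that points on the upper or right edge fall in the last bin.
        col = min(int((x - x0) / width * n_sub), n_sub - 1)
        row = min(int((y - y0) / height * n_sub), n_sub - 1)
        counts[row][col] += 1
    return counts

# Hypothetical gate coordinates inside a unit-size grid.
selected = [(0.1, 0.1), (0.9, 0.9), (0.5, 0.5), (1.0, 1.0)]
hist = subgrid_histogram(selected, (0.0, 0.0), (1.0, 1.0))
for row in hist:
    print(row)
```

A nearly flat histogram indicates that C-3 is met well, while a few dominant bins would indicate that the selected gates cluster in one part of the grid.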
Fitting Errors
For simplicity, the following characters are used in the legends of subsequent figures:
Δ: (Delay under Process Variation − Normalized Delay) / Normalized Delay
MAX: Maximum; MIN: Minimum; AVG: Average; SD: Standard Deviation
First of all, the fitting errors are evaluated when only spatially correlated process variations are injected. The experimental results are shown in Fig. 7. Overall, the fitting errors range from 1.72% to 4.36%. For the same number of grids, more selected paths cover more gates in each grid, so lower fitting errors are achieved. When N Grid = 100 and N S S P = 100, the average fitting error is 3.18%; with N S S P = 300, the average fitting error is 2.42%.
These data show that the proposed method achieves high accuracy under spatially correlated process variations. In reality, spatially uncorrelated process variations also exist. Figure 8 illustrates the fitting errors, where R UNC represents the percentage contribution of spatially uncorrelated process variations to the timing variations of gates. When R UNC = 30%, the fitting errors range from 2.82% to 5.29%. Similarly, with more selected paths, lower fitting errors are achieved. As more spatially uncorrelated process variation may result in a larger standard deviation of gate timing variations in each grid, the same number of covered gates in a grid results in a higher fitting error.
Besides spatially uncorrelated process variations, measurement errors also exist in reality. Although repeated measurement can reduce these errors [27], the delays of selected paths still cannot be obtained exactly [16], [18], [27]. Hence, the fitting errors are further evaluated when measurement errors with a Gaussian distribution (SD ≈ 1.6%) exist. Figure 9 illustrates the fitting errors for the R UNC = 30% setting of Fig. 8. As the number of selected paths increases, the fitting errors decrease. When R P/G = 3.0, the average fitting error reduces to 5.47%. In a rough comparison with the previous work [18], whose average fitting error is 8.35%, the proposed method improves the accuracy.
Fig. 8 Fitting errors for spatially uncorrelated process variations (N Grid is 400 for s13207, s15850, s35932, and s38417; N Grid is 1600 for b18; N Grid is 3025 for b19)
Fig. 9 Fitting errors for measurement errors
Conclusion
This paper proposes the LAPS method for post-silicon timing characterization. We analyze the sufficiency and the uniformity of sampled data for effectively fitting timing variations. Then we select paths with consideration of single sensitization, sampling sufficiency, and sampling uniformity. Experiments on benchmark circuits show that, by selecting only hundreds of paths, we can keep the fitting errors of the timing distribution below 4.4% when only spatially correlated process variations exist, below 5.3% when spatially uncorrelated process variations also exist, and below 8.2% when measurement errors are involved.
