We propose SoftCorner, a novel approach to estimating the worst-case delay (3σ delay) of a VLSI system. SoftCorner modifies deterministic static timing analysis (DSTA) to overcome its tendency to give pessimistic estimates of the system delay when 3σ corner delays are used for logic gates. The basic idea of SoftCorner is to use corner delays that are relaxed to less than 3σ. To select the relaxed gate delays, SoftCorner uses the delay information obtained from the K most-critical paths to determine the degree of relaxation, and constructs a model that represents the probability that each candidate delay of a logic gate is selected. This probability model guides the selection of gate delays; the worst system delay under the relaxed delay criterion is obtained by running the selection module until the mean of the results converges. SoftCorner can also estimate the system delay at an arbitrary percentile. In experiments, SoftCorner estimated the system delay of benchmark circuits at the 99.87th, 95th, and 85th percentiles with an average error <2% compared with Monte-Carlo (MC) simulation, and produced the estimates 5.89×10^4 times faster than MC simulation on average.
I. INTRODUCTION
As technology scales down further, fine control of the fabrication process becomes increasingly difficult, so the relative importance of process variations (PVs) has continuously increased [1]. PVs impose uncertainties on device characteristics that are affected by process parameters. As a result, timing components such as gate delay and arrival time at the circuit nodes become random variables. Therefore, designers seek to bound the system delay at a given criterion, such as three standard deviations (3σ) above the mean (i.e., the 99.87th percentile), which is related to the timing yield.
Many methods to estimate the system delay at the target percentile have been suggested. One approach is to estimate the form of the system delay distribution and then use its cumulative distribution function to determine the system delay at the target percentile. Another approach is to obtain the system delay at the target percentile directly, as in deterministic static timing analysis (DSTA).
Among the methods of the first approach, which estimate the system delay at the target percentile after obtaining the distribution, Monte-Carlo (MC) simulation is the most fundamental and accurate. MC simulation generates random samples from the parameter spaces and uses the samples in transistor-level timing simulations [2]. After a large number of simulations, the estimated system delay converges to the true system delay. MC simulation is the most accurate method because it does not require any assumptions about the distribution of the timing components. However, MC simulation is impractically slow.
For this reason, many analytic model-based methods have been proposed that do not require numerous simulations [1], [3]-[8]. However, widespread use of these methods in industry is not practical, for several reasons. First, the quantity of statistical foundry data in standardized format is insufficient [9]. Moreover, many challenges such as the analysis of latch-based designs and interconnect analysis, which have been addressed in early static timing analysis (STA) algorithms [10], [11], have not been solved [9]. Therefore, instead of analytic model-based methods, many MC-based methods have been proposed to reduce the complexity of traditional MC simulation.
First, various variance reduction sampling (VRS) methods, which increase the convergence rate when obtaining data moments (e.g., mean and variance), have been considered as alternatives to MC simulation [12]-[14]. Veetil et al. [13] proposed stratification + hybrid quasi MC (SH-QMC), which applies different VRSs to different parameters according to the impact of each parameter on the circuit delay. SH-QMC greatly reduces the number of samples required to estimate the converged moments of the distributions of the timing components.
However, there are limitations to obtaining the distributions of the timing components using SH-QMC. The first limitation is related to complexity. SH-QMC greatly reduces complexity compared to MC simulation, but still requires a lot of simulations to converge compared to DSTA and analytic model-based methods. This disadvantage is often an important hurdle to meeting time-to-market constraints.
In addition, SH-QMC has disadvantages related to accuracy. Above all, it is quite difficult to calculate the error bound or the number of samples required for convergence [15]. Also, accuracy is lost when the system delay distribution is estimated from the reduced number of samples. If the system delay followed a specific density function, the parameters of that function could be obtained from the rapidly converged moments. However, the form of the system delay distribution is not known, especially in modern process technology, although the gate delay is well known to follow a lognormal distribution at low voltage [16]. In this case, the system delay distribution can be estimated using a histogram. However, because a histogram is essentially a discrete function, it provides insufficient information about the probability density function (PDF). This drawback is exacerbated when the number of samples is reduced, because the bins must widen as the number of samples decreases, which makes the histogram less accurate [17], [18].
Another MC-based method was proposed in [9] . Merrett and Zwolinski [9] proposed MC static timing analysis (MCSTA) to overcome the high computational complexity of MC simulation. MCSTA first characterizes the cell delays using a number of samples extracted from PVs. Then, the numerous cell delay data are stored in variance cell libraries (VCLs). MCSTA selects and allocates one gate delay from the VCLs to each gate of the circuit. Using the selected gate delays, the traditional STA is run. This method greatly improves the efficiency compared to MC simulation, but the memory size is huge and the search time is long because it uses very large-sized libraries.
The second approach, which obtains the system delay at the target percentile directly, involves traditional DSTA. Traditional DSTA estimates the 3σ point of the system delay distribution by assuming that every gate delay is at the 3σ point of its own distribution. However, as within-die (WID) variation increases in modern technology [20], traditional DSTA is known to produce overly pessimistic results [21]. Therefore, many DSTA-based methods have been proposed to overcome this pessimism. These methods are more computationally efficient than MC-based methods and are more applicable in industry than the analytic model-based methods.
Deterministic delay model (DDM)-based STA is a DSTA-based algorithm [22]. DDM-based STA introduces a formula for a less extreme δσ point (δ < 3) than the conventional 3σ corner used in DSTA. However, DDM-based STA requires some unrealistic assumptions because it uses a first-order model to derive the equation. For this reason, this method causes large errors, especially when applied to recent process technology.
We propose SoftCorner, which is a DSTA-based algorithm that reduces pessimism by using one of the candidate corner values for gate delay. SoftCorner has three advantages over the previous methods: it (1) does not require considerable simulation time as MC-based methods do; (2) requires a reasonable library size because only candidates for corner values are stored; and (3) does not require any assumptions about the distributions of the timing components or the linearity of the parameters on the timing components.
II. BACKGROUND
Statistical static timing analysis (SSTA) can be classified into three categories (Table I) according to the stage in the design flow in which it is used [19] .
Early process-specific SSTA is used when the design is not specified. No circuit information, including placement and correlation information, is available, so early process-specific SSTA is performed on generic paths. Although this method has the advantage of allowing quick optimization of the design, its analysis is the least accurate.
Early design-specific SSTA is performed after the design is determined, but while other circuit information such as placement and correlations is not yet known. Therefore, it can be used in the pre-placement design phase. This type of SSTA is the most practical and has moderate accuracy compared to the other types. It is the most frequently used because it enables quick optimization with reasonable accuracy.
Late design-specific SSTA can be performed when the netlist, placement, and correlation information have all been determined. This method has the highest accuracy, but does not allow rapid optimization of the design. Therefore, it cannot be used in situations where design decisions must be made early in the design phase [23] .
SoftCorner is designed for use in early design-specific SSTA. SoftCorner can be used for quick optimization during early stages of the design flow even if placement and correlation information are lacking. In the pre-placement design step, only die-to-die (D2D) variation and uncorrelated within-die (WID) variation are determined [24] . Because the information about placement and correlation is not known at this stage, spatially-correlated WID variation is not considered. Therefore, SoftCorner requires information on the netlist, process variations including D2D and uncorrelated WID components, and the user-defined target yield.
III. PROPOSED METHOD TO REDUCE PESSIMISM OF TRADITIONAL DSTA
SoftCorner uses one of the candidate corner values stored in the library. These candidates are less extreme than the corner values used in the traditional DSTA algorithm. By using one of these candidates, SoftCorner can reduce the pessimism of the traditional DSTA method. SoftCorner consists of three steps (Fig. 1).
In Step 1, the distributions of the gate delay and output slew of all logic gates are obtained using MC simulation. Because the distributions are obtained using MC simulation, SoftCorner does not require any assumptions about the PDFs of the timing components. From the obtained distributions, several candidates of corner values are stored in the library. Each candidate has a probability of being used as a delay (or output slew) in the STA engine.
In Step 2, the probability model is determined. This probability model depends on the input netlist and the user-defined target percentile. Step 2 consists of three processes after reading the input netlist. During statistical timing analysis, the maximum path delay among the K most-critical path delays is a random variable, $KCP_{max}$. Therefore, during Step 2, the upper and lower bounds of the cumulative distribution function (CDF) of $KCP_{max}$ are obtained using [25] after extracting the K most-critical paths [32]. Then the target delay value is set using the obtained bounds. Finally, a probability model is adopted whose expected delay is equal to the target delay; this model represents the probability that a certain candidate will be selected as the corner value during timing analysis.
In Step 3, one candidate is allocated differently for each logic gate in the input circuit. For the allocation, the probability model determined in Step 2 is considered. Using the delay and slew corner values allocated to the logic gates of the circuit, the system delay at the target percentile is estimated using an STA engine. The processes of each step will be detailed in the following sections.
A. GENERATION OF LIBRARIES OF CANDIDATES FOR CORNERS
The first step of SoftCorner is the generation of libraries that contain the candidate corner values. SoftCorner adopts the Liberty Variation Format (LVF), which has one delay value per load-slew combination for the logic gate [26]. Therefore, the distributions of gate delay and output slew for the $i$th output capacitance and the $j$th input slew are obtained using MC simulation. The candidate corner values from these distributions of MC data are stored in the libraries (e.g., Fig. 2). Because the converged MC data are used to obtain the distributions, SoftCorner does not require any assumptions about the distributions when it extracts any $p$th percentile point ($p \le 99.87$). One of the candidates in the library is selected and used as the corner value in STA. The candidates are less than or equal to the traditional worst gate delay ($d^{N}_{i,j}$). Therefore, the use of the proposed library can effectively reduce the pessimism of traditional DSTA.
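The extraction of candidates from converged MC data can be sketched as follows. This is a minimal illustration, not the authors' library generator: the lognormal delay samples, the spacing of the sigma levels, and the function name `candidate_corners` are all assumptions made for the example. Each candidate is the empirical delay whose CDF value equals that of a standard normal distribution at Cσ, so no distribution is fitted to the data.

```python
import random
from statistics import NormalDist

def candidate_corners(samples, sigma_levels):
    """Empirical delay at the CDF value Phi(C) for each sigma level C."""
    ordered = sorted(samples)
    n = len(ordered)
    std_normal = NormalDist()            # standard normal: mean 0, s.d. 1
    corners = []
    for c in sigma_levels:
        p = std_normal.cdf(c)            # C = 3 gives about 0.9987
        index = min(n - 1, int(p * n))   # simple rank-based percentile, clamped
        corners.append(ordered[index])
    return corners

# Hypothetical MC data for one (load, slew) entry: lognormal delays in ps.
rng = random.Random(0)
mc_delays = [rng.lognormvariate(1.0, 0.1) for _ in range(20000)]
sigma_levels = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # assumed candidate spacing
library_entry = candidate_corners(mc_delays, sigma_levels)
print([round(d, 3) for d in library_entry])
```

The resulting list is non-decreasing, and its last entry plays the role of the traditional 3σ corner.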
B. CREATION OF PROBABILITY MODEL
The second step of SoftCorner is to construct a probability model for all the candidates of each logic gate. Each candidate $d^{k}_{i,j}$ has a probability P of being selected as the corner value. For each logic gate, P depends on the depth (the number of logic gates) of the path in which the logic gate is included.
The expected delay of a logic gate decreases as the depth of the path in which the logic gate is included increases. This is because of the cancellation effect caused by the uncorrelated WID PV components. As the depth of a path increases, the uncorrelated WID PV components of the logic gates in the path increasingly cancel out [27], because the WID components of all logic gates in the path rarely increase or decrease concurrently, especially for deep paths.
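The cancellation effect can be made concrete with a simplified calculation. The sketch below assumes identical gates whose delays are independent and normally distributed (only uncorrelated WID variation, no D2D component); these assumptions and the function names are illustrative and are not part of SoftCorner. Under them, the 3σ delay of a path of depth D grows with √D, whereas summing per-gate 3σ corners grows with D, so the per-gate corner that reproduces the path-level 3σ delay is 3/√D.

```python
import math

def path_three_sigma(depth, mu, sigma):
    """3-sigma delay of a chain of `depth` gates with independent normal delays."""
    return depth * mu + 3.0 * sigma * math.sqrt(depth)

def corner_sum(depth, mu, sigma):
    """Traditional DSTA estimate: every gate sits at its own 3-sigma corner."""
    return depth * (mu + 3.0 * sigma)

def equivalent_gate_sigma(depth):
    """Per-gate sigma multiplier that reproduces the path-level 3-sigma delay."""
    return 3.0 / math.sqrt(depth)

for depth in (1, 4, 16, 64):
    pessimism = corner_sum(depth, 10.0, 1.0) / path_three_sigma(depth, 10.0, 1.0) - 1.0
    print(depth, round(equivalent_gate_sigma(depth), 3), f"{100 * pessimism:.1f}%")
```

The pessimism of the all-corner sum is zero for a single gate and grows with depth, which is why deeper paths call for smaller expected corner values.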
To incorporate this cancellation effect, as the depth of a path increases, SoftCorner increases the Ps of the smaller candidates to reduce the expected value of the gate delay. In contrast, if the logic gate is included in a short path, SoftCorner decreases the Ps of the smaller candidates to increase the expected value of the gate delay. In other words, the probability model in SoftCorner follows two conditions: (1) For logic gates included in a path that has large depth, P must be increased for small candidates. (2) For logic gates included in a path that has small depth, P must be decreased for small candidates.
To reflect the above two conditions, the probability that the Cth candidate of a logic gate in a path with depth D is selected is represented as P(C, D). The C value of $d^{k}_{i,j}$ is set such that the CDF value at $d^{k}_{i,j}$ (Fig. 2) is equal to that at the Cσ point of a standard normal distribution. For example, C = 3 for $d^{N}_{i,j}$ because the CDF value at $d^{N}_{i,j}$ was set to 99.87%, which is the CDF value of a standard normal distribution at the 3σ point. Including another variable x, which adjusts the expected gate delay according to C and D, the probability model is represented as
$$P(C, D, x) = \frac{a^{C(D-x)}}{\sum_{i} a^{i(D-x)}}. \qquad (1)$$
The sum of the probabilities over all Cs is normalized to 1 by dividing $a^{C(D-x)}$ by $\sum_{i} a^{i(D-x)}$, because one of the N candidates must be selected.
In (1), a is a constant and x is a variable. After a is set, the value of x is determined by the input circuit and the user-defined target yield. Although SoftCorner uses an exponential function as the probability model, other forms can be used in a similar manner if conditions (1) and (2) are satisfied and the existence and uniqueness of the solution of the model are guaranteed (Section IV). The constant a controls the exponential property of the probability model, and its range is 0 < a < 1. The exponential property of the probability model weakens as a increases (Fig. 3a, b). Therefore, when only one logic gate is considered, a affects the variance of the selected corner values of the gate delay.
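The behavior of the normalized exponential model can be explored with a short sketch. It assumes the form described in the text, with weights $a^{C(D-x)}$ normalized over the candidates; the candidate sigma levels, the default a = 0.7, and the function names are illustrative choices. The weights are evaluated in log space so that extreme values of D − x do not overflow.

```python
import math

def selection_probabilities(sigma_levels, depth, x, a=0.7):
    """P(C, D, x) = a**(C*(D - x)) / sum_i a**(i*(D - x)), evaluated stably."""
    log_w = [c * (depth - x) * math.log(a) for c in sigma_levels]
    top = max(log_w)                       # subtract the max to avoid overflow
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]

def expected_level(sigma_levels, probs):
    """Expected sigma level of the selected candidate."""
    return sum(c * p for c, p in zip(sigma_levels, probs))

sigma_levels = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # assumed candidate levels
for depth in (2, 8, 32):
    probs = selection_probabilities(sigma_levels, depth, x=8.0)
    print(depth, round(expected_level(sigma_levels, probs), 3))
```

For a fixed x, deeper paths shift probability toward the smaller candidates, while a very large or very small x drives every gate to the largest or smallest candidate, respectively.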
The variable x is related to the degree of pessimism reduction. If x is too small, the smallest candidate (0σ) will be selected for all logic gates regardless of D (Fig. 4a). If x is too large, the largest candidate (3σ) will be selected for all logic gates regardless of D (Fig. 4b); in this case, the results of SoftCorner are the same as those of traditional DSTA. Therefore, we can reduce the pessimism by reducing x. To make SoftCorner also applicable to input circuits in which pessimism never occurs or occurs excessively, the range of x is set to
$$-10\,d_{max} \le x \le 10\,d_{max}. \qquad (2)$$
We should determine how much to decrease x (i.e., how much to reduce the pessimism). In SoftCorner, the pessimism is reduced until the expected value of $KCP_{max}$ under the probability model is equal to the target delay value calculated using [25]. $KCP_{max}$ is a random variable in statistical timing analysis: depending on the process parameter values, the critical path and its delay change at every run, so $KCP_{max}$ has a certain distribution. The exact form of this distribution is difficult to obtain even with recent methods in path-based statistical timing analysis. However, [25] proposed a method that uses stochastic majorization theory to obtain the upper and lower bounds (Fig. 5, solid lines) of the CDF of $KCP_{max}$.
In this process, the path delay is assumed to follow a normal distribution. This assumption is reasonable in most cases because, according to the central limit theorem, even though the delay of each logic gate does not follow a normal distribution, the path delay, which is the sum of several non-normal random variables, rapidly converges to a normal distribution [28]. The path delay distribution is well approximated as a normal distribution as long as D > 4 [16]. To validate the normality of the path delay distribution, we used inverter chains of different depths and applied the quantile-quantile (Q-Q) plot [29] and the Shapiro-Wilk (S-W) test [30], which are frequently used as normality tests (Fig. 6). For each inverter, 1000 delay values were simulated using HSPICE. In a Q-Q plot, if the quantiles of the samples ('+' signs in Fig. 6) lie on the quantiles of the normal distribution (straight line in Fig. 6), the samples can be approximated by a normal distribution. Also, the larger the S-W test value is, the closer the samples are to a normal distribution. Fig. 6 shows that the path delay distribution is close to a normal distribution when the depth is larger than four.
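The convergence of path delay toward normality can be illustrated without a circuit simulator. The sketch below replaces the Q-Q plot and S-W test used in the paper with a simpler diagnostic, the sample skewness, and replaces HSPICE data with hypothetical lognormal gate delays; both substitutions are assumptions made only for illustration. A normal distribution has zero skewness, and the skewness of a sum of D independent, identically distributed delays shrinks in proportion to 1/√D.

```python
import random
import statistics

def sample_skewness(values):
    """Third standardized moment; near 0 for a normal distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mean)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def path_delay_samples(depth, n_samples, rng):
    """Sum of `depth` independent lognormal gate delays (hypothetical parameters)."""
    return [sum(rng.lognormvariate(0.0, 0.5) for _ in range(depth))
            for _ in range(n_samples)]

rng = random.Random(1)
skew_by_depth = {depth: sample_skewness(path_delay_samples(depth, 20000, rng))
                 for depth in (1, 4, 16)}
for depth, skew in skew_by_depth.items():
    print(depth, round(skew, 2))
```

The skewness falls steadily as the depth increases, consistent with the claim that deeper paths are better approximated by a normal distribution.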
With these reasonable preconditions, the upper ($d_{upper}$) and lower ($d_{lower}$) bounds of the CDF of $KCP_{max}$ can be obtained (Fig. 5).
In statistical timing analysis, $d_{targ}$ differs from the actual system delay at the target percentile, because a path that is not included in the K most-critical paths can be the path with the maximum delay and thus determine the system delay. Therefore, the upper and lower bounds in [25] cannot be used alone to estimate the system delay at any percentile. SoftCorner uses the method proposed in [25] only to determine how much to reduce the pessimism, and it can therefore consider paths that are not included in the K most-critical paths but can still be critical.
After setting d targ , the x value is determined. This means that the probability model to be used in STA engine is fixed. x affects the Ps of the candidates and therefore affects the expected value of the gate delay value. As x increases, the P of the larger C at any D increases (Figs. 4c, d ). For this reason, the expected path delay increases as x is increased. The probability model is determined so that the expected maximum path delay using the model is equal to d targ .
First, the expected value of $KCP_{max}$, $\max\{E[d_{path,j}(x)]\}$, is calculated for a certain x. The x value at which $\max\{E[d_{path,j}(x)]\}$ is equal to $d_{targ}$ is selected. In this step, the Newton-Raphson (NR) method is used as follows. The $k$th candidate is selected according to the probability model defined in (1). Therefore, the expected delay of a logic gate included in a path of depth D is defined as in (3). If the logic gate is included in more than one path simultaneously, the minimum depth among the depths of those paths is used, to avoid optimism.
The value of x that solves (5) becomes the x of the probability model. The NR method yields the solution of (5): x at the $(t+1)$th step is calculated from x at the $t$th step as
$$x_{t+1} = x_t - \frac{g(x_t)}{g'(x_t)},$$
where $g'(x_t)$ is the derivative of $g(x_t)$ with respect to x. The existence and uniqueness of the solution of (5) will be proven in Section IV.
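The search for x can be sketched end to end under stated assumptions. The code below takes g(x) to be the maximum expected path delay minus $d_{targ}$, which is inferred from the surrounding text, and uses a hypothetical circuit of disjoint paths built from identical gates that share one candidate table. The bisection safeguard around the NR update is an addition for robustness when the slope is nearly zero far from the solution; it is not described in the paper. All names and numbers are illustrative.

```python
import math

A = 0.7                                               # model constant a (0 < a < 1)
LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]          # assumed C values
DELAYS = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0]   # hypothetical candidates, ps

def gate_stats(depth, x):
    """Return (E[d_gate], dE[d_gate]/dx) for a gate on a path of given depth."""
    log_w = [c * (depth - x) * math.log(A) for c in LEVELS]
    top = max(log_w)                                  # stabilize the exponentials
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    p = [wi / total for wi in w]
    mean_d = sum(pi * d for pi, d in zip(p, DELAYS))
    mean_c = sum(pi * c for pi, c in zip(p, LEVELS))
    mean_cd = sum(pi * c * d for pi, c, d in zip(p, LEVELS, DELAYS))
    # dE/dx = -ln(a) * Cov(C, d), which is positive because 0 < a < 1
    return mean_d, -math.log(A) * (mean_cd - mean_c * mean_d)

def g_and_slope(path_depths, x, d_targ):
    """g(x) = max_j E[d_path,j(x)] - d_targ and its slope on the maximizing path."""
    best_delay, best_slope = -math.inf, 0.0
    for depth in path_depths:
        mean_d, slope = gate_stats(depth, x)
        if depth * mean_d > best_delay:
            best_delay, best_slope = depth * mean_d, depth * slope
    return best_delay - d_targ, best_slope

def solve_x(path_depths, d_targ, lo, hi, tol=1e-9, max_iter=200):
    """Newton-Raphson on g(x), kept inside a sign-change bracket [lo, hi]."""
    if not (g_and_slope(path_depths, lo, d_targ)[0] < 0.0
            < g_and_slope(path_depths, hi, d_targ)[0]):
        raise ValueError("g(x) does not change sign on [lo, hi]")
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):
        g, slope = g_and_slope(path_depths, x, d_targ)
        if abs(g) < tol:
            break
        if g < 0.0:
            lo = x
        else:
            hi = x
        x_new = x - g / slope if slope > 0.0 else lo   # NR update
        if not (lo < x_new < hi):
            x_new = 0.5 * (lo + hi)                     # bisection safeguard
        x = x_new
    return x

path_depths = [8, 12, 16]          # hypothetical K most-critical paths
d_max = max(path_depths)
d_targ = 172.0                     # hypothetical target delay, ps
x_star = solve_x(path_depths, d_targ, -10 * d_max, 10 * d_max)
print(round(x_star, 4))
```

The initial sign-change check mirrors the existence argument: the search is attempted only when g is negative at the lower end of the range of x and positive at the upper end.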
We now show the range in which the result of SoftCorner lies. Using (1), $KCP_{max}$ is expected to be equal to $d_{targ}$, because the solution of (5) is chosen as x. Although SoftCorner uses the probability model that makes the expected $KCP_{max}$ equal to $d_{targ}$ to determine the degree by which pessimism is reduced, the probability model is applied globally to all logic gates in the circuit in the next step. Therefore, the method can consider paths that are not included in the K most-critical paths but that can be the most critical path. As a result, SoftCorner usually gives a result that is larger than $d_{targ}$. In addition, because each logic gate has relaxed candidates below the 3σ point, the result of SoftCorner is smaller than that of traditional DSTA. Therefore, the SoftCorner method is more effective than both traditional DSTA and [25].
C. ALLOCATION OF DELAY AND SLEW VALUES
One of the candidates in the library is allocated to each logic gate in the circuit (Fig. 7) . The allocation is performed considering the probability P(k, D, x) that the k th corner will be selected (III.B). Various ways to consider the probability P(k, D, x) may be available in the allocation step. SoftCorner considers the probability using a seed value that is extracted from the standard normal distribution.
The allocation step proceeds as follows. (i) We divide the range of a random variable that follows the standard normal distribution so that each range corresponds to a specific candidate. (ii) For a logic gate, a random seed is generated from the standard normal distribution. (iii) According to the range to which the random seed belongs, the corresponding candidate is selected and used as the corner value for the logic gate. In step (i), the range is divided so that the probability that the seed falls in a range is the same as the probability that the corresponding candidate is selected. Therefore, in the first step, the range of a random variable that follows the standard normal distribution is divided into N ranges ($R_k$, $1 \le k \le N$) as
$$R_1 = (-\infty, r_1],\quad R_2 = (r_1, r_2],\quad \ldots,\quad R_N = (r_{N-1}, \infty) \qquad (9)$$
so that the probability mass of the $k$th range, $\int_{R_k} f(z)\,dz$, is equal to $P(k, D, x)$ in (1), where $f(z)$ is the PDF of the standard normal distribution. Because $P(k, D, x)$ is determined in Section III.B, $r_k$ ($1 \le k \le N-1$) can be set to satisfy (9).
FIGURE 8. (a) The probability of candidates of a logic gate and (b) the standard normal distribution from which a random seed comes and the corresponding candidate of each range.
In Fig. 8, we provide an example for the N = 6 case; one range corresponds to one candidate. In this example, if the generated random seed d satisfies $r_1 < d < r_2$, the second candidate is selected, because the probability of this event is the same as the probability of the second candidate, $P(2, D, x)$.
In this manner, the allocation module can simply follow the probability model determined in III.B. Using the allocated candidate values, the STA engine is run, and produces the system delay at the target yield.
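The allocation procedure of this section can be sketched with the standard library alone. The example assumes a hypothetical set of six selection probabilities (matching the N = 6 case of Fig. 8 only in size); the function names are illustrative. The boundaries are obtained from the inverse CDF of the standard normal distribution at the cumulative probabilities, and a drawn seed is mapped to its range by binary search.

```python
import bisect
import random
from statistics import NormalDist

def range_boundaries(probs):
    """Boundaries r_1..r_{N-1}: the normal mass of range k equals probs[k-1]."""
    std_normal = NormalDist()
    bounds, cumulative = [], 0.0
    for p in probs[:-1]:
        cumulative += p
        q = min(max(cumulative, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < q < 1
        bounds.append(std_normal.inv_cdf(q))
    return bounds

def allocate(bounds, rng):
    """Draw a standard-normal seed and return the 0-based candidate index."""
    seed = rng.gauss(0.0, 1.0)
    return bisect.bisect_right(bounds, seed)

probs = [0.05, 0.10, 0.20, 0.30, 0.20, 0.15]   # hypothetical P(k, D, x), N = 6
bounds = range_boundaries(probs)
rng = random.Random(42)
counts = [0] * len(probs)
for _ in range(50000):
    counts[allocate(bounds, rng)] += 1
frequencies = [c / 50000 for c in counts]
print([round(f, 3) for f in frequencies])
```

Over many draws, the selection frequencies approach the model probabilities, which is the property that step (i) is designed to guarantee.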
IV. EXISTENCE AND UNIQUENESS OF THE SOLUTION OF PROBABILITY MODEL
A. EXISTENCE
The existence of the solution of (5) can be proved as follows. $E[d_{path,j}(x)]$ is an increasing function of x. Therefore, a solution of (5) exists when the sign of g(x) changes from negative to positive as x increases (Fig. 9a). To verify that (5) satisfies this condition, we should examine the ranges of the $d_{targ}$ and $\max(E[d_{path,j}])$ terms in (5). First, $d_{targ}$ is determined as the maximum value among the $\gamma \cdot \sigma_i + \mu_i$ values of the K most-critical paths, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the $i$th critical path delay [25]. γ depends on K and the correlations among the K most-critical paths. From our experiments based on [25], γ ranged from 2.95 to 4.5 when 10 ≤ K ≤ 300. In Fig. 9(b), the range of γ and the corresponding range of $d_{targ}$ are described.
Next, $\max(E[d_{path,j}(x)])$ increases as x increases (Fig. 4). When the minimum value $x = -10d_{max}$ is used, the result is the same as $KCP_{max}$ when the minimum candidate in the library is used for all logic gates (Fig. 4a). Similarly, when the maximum value $x = 10d_{max}$ is used, $\max(E[d_{path,j}(x)])$ equals $KCP_{max}$ when the maximum candidate is used for all logic gates (Fig. 4b). The solution of (5) exists when $d_{targ}$ lies between these two values (Fig. 9b). Therefore, if the circuit satisfies both of the following conditions, a solution to (5) exists:
(3) $\max(E[d_{path,j}(-10d_{max})]) < d_{targ}$;
(4) $d_{targ} < \max(E[d_{path,j}(10d_{max})])$.
Using Conditions (3) and (4), the existence of the solution of (5) can be simply tested before determining the probability model. Also, the existence of the solution can be easily ensured by changing the range of the candidates in the library. If the pessimism of the circuit is too severe to satisfy Condition (3), the value $d^{1}_{i,j}$ of the first candidate should be decreased. In the same manner, in rare cases, the value of $d^{N}_{i,j}$ should be increased if Condition (4) is not satisfied due to weak pessimism. These two modifications can be performed without much extra runtime.
B. UNIQUENESS
To demonstrate the uniqueness of the solution of the probability model, we should prove that g(x) in (5) is a strictly monotonically increasing function; that is, that $g'(x) > 0$ for all x. If (3) is strictly monotonically increasing, then g(x) is as well. Therefore, we will show the uniqueness of the solution by verifying that (3) is a strictly monotonically increasing function. First, the derivative of (3) is
$$\frac{dE[d_{gate,i}(x)]}{dx} = \frac{-\ln(a)}{\left(\sum_{i} a^{i(D-x)}\right)^{2}} \sum_{k}\sum_{i} (k-i)\,a^{(k+i)(D-x)}\,d^{k}. \qquad (10)$$
In (10), $-\ln(a)/\left(\sum_{i} a^{i(D-x)}\right)^{2} > 0$ is always true because 0 < a < 1. Therefore,
$$\sum_{k}\sum_{i} (k-i)\,a^{(k+i)(D-x)}\,d^{k} > 0 \qquad (11)$$
should be proven to show that (5) is a strictly monotonically increasing function.
The sign of each term $(k-i)\,a^{(k+i)(D-x)}\,d^{k}$ of (11) can be positive or negative depending on the relative size of k and i, because $a^{(k+i)(D-x)}\,d^{k}$ is always positive. We now prove (11) by showing that the sum of the two terms of (11) with (k, i) = (m, n) and (k, i) = (n, m), where m and n are arbitrary distinct elements of the set from which k and i are drawn, is always larger than or equal to zero:
$$(m-n)\,a^{(m+n)(D-x)}\,d^{m} + (n-m)\,a^{(n+m)(D-x)}\,d^{n} \ge 0. \qquad (12)$$
Because (11) is the summation of $(k-i)\,a^{(k+i)(D-x)}\,d^{k}$ over all combinations of k and i from the same set, proving (12) for all m and n from the set directly proves (11). We assume that m > n; the proof under the opposite assumption proceeds in the same manner. Because m > n, the first and second terms on the left-hand side of (12) are positive and negative, respectively. Meanwhile, $d^{m} > d^{n}$ if m > n, because the candidates are stored in the library in increasing order, as described in Section III.A. Therefore, the absolute value of the first term is always larger than that of the second term:
$$\left|(m-n)\,a^{(m+n)(D-x)}\,d^{m}\right| > \left|(n-m)\,a^{(n+m)(D-x)}\,d^{n}\right|. \qquad (13)$$
Therefore, (12) is always true, because the absolute value of the positive term is always larger than that of the negative term.
Because (12) holds as described above, (11) is also always true. We have thus proved that $dE[d_{gate,i}(x)]/dx$ in (10) is always positive. Therefore, (5) has a unique solution.
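The step from (3) to the derivative in (10) can be made explicit with the quotient rule. The following is a sketch under the assumption, inferred from the text rather than stated in it, that (3) is the expectation of the candidate delays under model (1), i.e. $E[d_{gate}(x)] = \sum_k d^{k} a^{k(D-x)} / \sum_i a^{i(D-x)}$. The gate subscript is dropped here to avoid a clash with the summation index.

```latex
\begin{align*}
\frac{d}{dx}\,a^{k(D-x)} &= -k\ln(a)\,a^{k(D-x)},\\[4pt]
\frac{dE[d_{gate}(x)]}{dx}
 &= \frac{\Big(\sum_k -k\ln(a)\,a^{k(D-x)}d^{k}\Big)\Big(\sum_i a^{i(D-x)}\Big)
        -\Big(\sum_k a^{k(D-x)}d^{k}\Big)\Big(\sum_i -i\ln(a)\,a^{i(D-x)}\Big)}
        {\Big(\sum_i a^{i(D-x)}\Big)^{2}}\\[4pt]
 &= \frac{-\ln(a)}{\Big(\sum_i a^{i(D-x)}\Big)^{2}}
    \sum_k\sum_i (k-i)\,a^{(k+i)(D-x)}\,d^{k},\\[4pt]
\sum_k\sum_i (k-i)\,a^{(k+i)(D-x)}\,d^{k}
 &= \sum_{m>n} (m-n)\,a^{(m+n)(D-x)}\,\big(d^{m}-d^{n}\big) \;>\; 0 .
\end{align*}
```

The last line groups each pair (m, n) with its mirror (n, m), and the diagonal terms with k = i vanish. Every grouped term is positive when the candidates are stored in strictly increasing order, which is the pairing argument used in the proof.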
V. APPLICATION OF PROPOSED METHOD IN DESIGN FLOW
SoftCorner belongs to the early design-specific SSTA methods and is used in pre-layout verification (Fig. 10). The proposed module takes the netlist, the PV information, and the target yield as inputs. In addition, a timing constraint $d_{const}$ is given. Running the proposed module yields the system delay $d_{sys}$ at the target percentile; that is, $d_{sys}$ is the delay that the system must be allowed in order to achieve the target yield. Therefore, the design passes if $d_{const} \ge d_{sys}$. Otherwise, the yield at the timing constraint is lower than the target yield, and the design should be modified to achieve the target yield.
A user can use SoftCorner to determine the yield of the design at the pre-defined d const . First, the user enters the expected yield as the user-defined target yield input. 
VI. EXPERIMENTAL RESULTS
A. EXPERIMENTAL ENVIRONMENT
In this section, SoftCorner is compared with various benchmark methods: two MC-based methods and two DSTA-based methods. SH-QMC [13] and MCSTA [9] were used as the MC-based benchmark methods. These methods first estimate the distribution of the system delay and then obtain the system delay at the target percentiles. Therefore, we ran these methods, as well as MC simulation, until the first through third moments of the distribution converged. Traditional DSTA and DDM-based STA [22] were used as the DSTA-based benchmark methods. SoftCorner directly estimates the values at the target percentiles, so we ran it until the mean of the simulation results converged. We compared the errors and runtimes incurred when estimating the system delays at several target percentiles. The errors were evaluated by comparing the results with the converged MC simulation results.
In experiments, 7-nm predictive technology model-multi gate (PTM-MG) [31] was used as the transistor model. The variations of the gate length, oxide thickness, fin thickness, and fin height were considered. For D2D variations, the 3σ values of the process parameter distributions were set to 18% of their mean values. For random WID variations, the 3σ values were set to 6% of their mean values. For traditional DSTA, the 3σ point of the gate delay distribution was used as the gate delay value to estimate the 3σ point of the system delay distribution. In SH-QMC, the parameter s was set to 2.5; this is the standard for selecting the critical parameters. Among the four process parameters, two were assumed to be critical, one was assumed to be moderately critical, and one was assumed to be non-critical. For MCSTA, 10,000 instances for each logic gate were included in VCL. For SoftCorner, a of the probability model was set to 0.7 and can be changed as long as the existence of the solution is guaranteed. Also, with reference to [25] , 300 paths were extracted to obtain the bounds of KCP max . The experiments were performed on ten ISCAS 85 and three ISCAS 89 benchmark circuits. We implemented SoftCorner and the benchmark methods in C++ language on Intel (R) Xeon (R) E5-2690 @ 2.90 GHz.
B. ACCURACY COMPARISON
SoftCorner uses the upper and lower bounds of $KCP_{max}$ in Step 2 to determine the probability model. MC simulation results were generally larger than the upper and lower bounds (Figs. 11a, b). This difference is due to paths that are not included in the K most-critical paths but can have the maximum path delay in some PV samples. SoftCorner successfully considered those paths because it uses the bounds only to determine the degree of pessimism reduction and then applies the resulting probability model to the whole circuit.
The variable of the probability model is determined by the K most-critical paths of the input circuit. Therefore, applying the probability model to the whole circuit can introduce errors in the estimated path delays, especially for paths that are not included in the K most-critical paths. However, applying the probability model, which is determined from the K most-critical paths, to the whole circuit does not cause a critical error in the estimated system delay. To validate this, we estimated the path delay at the 99.87th percentile using DSTA and the proposed method and compared the results with MC simulation (Fig. 12). We first made a circuit that consists of several inverter chains with depths from 1 to 64. Then, we determined the probability model following the SoftCorner algorithm with different K values. The probability model was applied to every logic gate, and every path delay was estimated at the 99.87th percentile (Fig. 12a). The path delay errors relative to the MC simulation results are plotted in Fig. 12b. The error of the estimated path delay decreases as K increases, and the proposed method shows much smaller error than DSTA for all path depths and all K values. Although the error is large for short paths, the error of long path delays matters much more for estimating the system delay, because long paths are more likely to determine the system delay.
The y-axis data labeled "MC" and "Proposed" in Fig. 11 are listed in the "MC" and "Proposed" columns of Table II, respectively, to validate the accuracy of the proposed method against the other methods. The estimated system delays at the 99.87th percentile are compared for the proposed method and the benchmark methods ("Delay" columns in Table II), together with the percentage errors of the estimates relative to MC simulation ("Error" columns in Table II). Mean absolute percentage errors are shown in the last row of Table II. Among the MC-based algorithms, SH-QMC estimated the target system delay most accurately (1.30% mean absolute percentage error). MCSTA selects random delay values from the VCLs at the gate level. The probability that all logic gates simultaneously select extreme values of the gate delay distribution among the 10,000 instances in the VCL is therefore very low, and the estimated system delay was ∼6.6% lower than the MC simulation results. Among the DSTA-based algorithms, SoftCorner showed the most accurate results (2.04% average error). DSTA showed overly pessimistic results, as expected, because it assumes that all logic gates take the 3σ values of their gate delay distributions. DDM-based STA uses δσ points of the gate delay with δ < 3 when estimating the 3σ value of the system delay distribution. However, the accuracy of this method is limited, especially for recent technology nodes, because it is based on a first-order model and relies on several unrealistic assumptions. The accuracy of SoftCorner was comparable with that of SH-QMC, which is the most accurate MC-based method. SoftCorner can also estimate other percentile points with high accuracy. When the target percentiles were the 95th and 85th, SoftCorner estimated the system delays with mean absolute percentage errors of 1.31% and 1.49%, respectively, compared to the MC simulation results (Table III).
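The two quantities used in this comparison, the delay at a target percentile of the MC samples and the mean absolute percentage error across circuits, can be computed as in the following sketch. It assumes a nearest-rank percentile definition, which the text does not specify, and the numbers in the usage example are made-up placeholders rather than values from Table II.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile of `samples` for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def mean_absolute_percentage_error(estimates, references):
    """Average of |estimate - reference| / |reference| over circuits, in percent."""
    errors = [abs(e - r) / abs(r) for e, r in zip(estimates, references)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Placeholder data: MC delay samples of one circuit, then per-circuit
    # estimates versus MC references (all hypothetical, in ps).
    mc_samples = [30.0 + 0.05 * i for i in range(100)]
    print("95th percentile:", percentile(mc_samples, 95))
    print("MAPE:", mean_absolute_percentage_error([37.4, 51.0], [37.1, 52.0]))
```

For a Gaussian distribution the 99.87th percentile corresponds to the 3σ point, so a call such as `percentile(samples, 99.87)` on a sufficiently large sample set gives the MC reference for the 3σ system delay.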
The estimation results of SoftCorner vary from run to run. The variation occurs because SoftCorner selects corner values according to the probability model, so the corner value selected for a given gate changes at every run. However, because the expected value of the maximum among the K most-critical paths is set to d targ , the variation is small and not critical. Fig. 13 overlays the histograms of the SoftCorner results and the MC simulation results to show the mean and standard deviation of the SoftCorner results. When the c1908 circuit was used and the target percentile was 99.87, the mean of the SoftCorner results was 37.27 ps, with 0.41% error compared to the MC simulation results (Fig. 13, dotted line). The standard deviation (s.d.) of the SoftCorner results was 0.12 ps, which is much smaller than the mean. Similarly, when the target percentiles were 95 and 85, the mean values of the SoftCorner results were 35.42 ps (s.d. = 0.08 ps) and 34.95 ps (s.d. = 0.13 ps), respectively. The mean and standard deviation of the MC results were 34.41 ps and 0.84 ps, respectively. Because the standard deviation of the SoftCorner results is much smaller, the proposed method converges much faster than MC simulation. In other words, SoftCorner provides the estimate at the target percentile much faster than MC simulation.
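The link between a small run-to-run spread and fast convergence can be sketched as follows. The stopping rule is an assumption made for illustration (stop when the standard error of the running mean drops below a relative tolerance); the text only states that runs are repeated until the mean converges. The two `run_once` stand-ins draw Gaussian samples using the mean and s.d. values quoted above for c1908 and do not run any timing analysis.

```python
import math
import random


def run_until_mean_converges(run_once, rel_tol=1e-3, min_runs=10, max_runs=100000):
    """Repeat `run_once` until the standard error of the mean is small.

    Assumed stopping rule: std_error <= rel_tol * |mean|. The running mean and
    variance are updated with Welford's online algorithm.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # sum of squared deviations from the running mean
    while n < max_runs:
        x = run_once()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_runs:
            std_error = math.sqrt(m2 / (n - 1) / n)
            if std_error <= rel_tol * abs(mean):
                break
    return mean, n


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins (assumed Gaussian) with the spreads quoted in the text.
    narrow = run_until_mean_converges(lambda: rng.gauss(37.27, 0.12))
    wide = run_until_mean_converges(lambda: rng.gauss(34.41, 0.84))
    print("s.d. 0.12 ps: mean %.2f ps after %d runs" % narrow)
    print("s.d. 0.84 ps: mean %.2f ps after %d runs" % wide)
```

Since the standard error of the mean scales as s.d./√n, the number of runs needed for a given tolerance grows with the square of the spread, which is why the narrower distribution settles after far fewer runs.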
C. RUNTIME COMPARISON
All methods reduced the runtime compared to MC simulation (Fig. 14). First, MCSTA reduced the runtime to 1.37 × 10^−2 times that of MC simulation on average; this reduction occurs because MCSTA replaces the HSPICE timing model with the VCL, which shortens the runtime of a single simulation at the expense of accuracy. DDM-based STA reduced the runtime much further, to 3.67 × 10^−10 times that of MC simulation, but it relies on impractical assumptions and is therefore not applicable, especially to modern technology. SH-QMC greatly reduced the number of simulations, but it still required a much longer runtime than the other benchmark methods because it is an MC-based algorithm and does not reduce the runtime of a single simulation. SoftCorner reduced the runtime to 5.90 × 10^−4 times that of SH-QMC and to 1.89 × 10^−5 times that of MC simulation.
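The runtime ratios above can be converted into speedup factors by taking reciprocals, as in the short sketch below. The reported ratios are taken from the text; the per-circuit ratios in the second part are hypothetical and only illustrate that averaging ratios and averaging speedups over circuits generally give different numbers, since the text does not state how the averages were formed.

```python
def speedup(runtime_ratio):
    """Speedup factor corresponding to runtime_ratio = t_method / t_reference."""
    return 1.0 / runtime_ratio


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Runtime of SoftCorner relative to each reference, as reported in the text.
    reported = {"SH-QMC": 5.90e-4, "MC simulation": 1.89e-5}
    for reference, ratio in reported.items():
        print(f"vs {reference}: about {speedup(ratio):.3g}x faster")

    # Hypothetical per-circuit runtime ratios (not from the paper).
    per_circuit = [1.0e-5, 4.0e-5]
    print("speedup of mean ratio:", speedup(mean(per_circuit)))
    print("mean of speedups:     ", mean(speedup(r) for r in per_circuit))
```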
VII. CONCLUSION
This paper has presented SoftCorner, a novel DSTA-based statistical timing analysis method that uses less-extreme corner values than existing methods. SoftCorner is suitable for users who want to estimate the system delay at an arbitrary target percentile. SoftCorner first uses MC simulation data to develop libraries that contain the candidate corner values. Therefore, SoftCorner does not require any assumptions about the distribution forms of the timing components. Depending on the input circuit and the target yield, SoftCorner constructs the probability model and uses it to guide the allocation of different corner values to different logic gates. Experiments verified the effectiveness of SoftCorner by comparing its accuracy and runtime with those of other benchmark methods. SoftCorner estimated the system delays at the 99.87th, 95th, and 85th percentiles with average errors of 2.04%, 1.31%, and 1.49%, respectively, with respect to MC simulation results. This accuracy was comparable to that of the most accurate MC-based benchmark method, SH-QMC. However, because SH-QMC is an MC-based method, it required a much longer runtime than SoftCorner: SoftCorner consumed only 5.90 × 10^−4 times as much runtime as SH-QMC.
