Abstract: In this paper, we propose a novel flow to enable computationally efficient statistical characterization of delay and slew in standard cell libraries. The distinguishing feature of the proposed method is the use of a limited set of combinations of output capacitance, input slew rate and supply voltage for the extraction of statistical timing metrics of an individual logic gate. The efficiency of the proposed flow stems from the introduction of a novel, ultra-compact, nonlinear, analytical timing model with only four universal regression parameters. This model facilitates the use of maximum-a-posteriori belief propagation to learn the prior distribution of the timing model parameters for the target technology from past characterizations of library cells belonging to various other technologies, including older ones. The framework then uses Bayesian inference to extract the new timing model parameters from an ultra-small set of additional timing measurements in the target technology. The proposed method is validated and benchmarked on several production-level cell libraries, including a state-of-the-art 14-nm technology node and a variation-aware, compact transistor model. For the same accuracy as the conventional lookup-table approach, the new method achieves at least a 15x reduction in simulation runs.
I. INTRODUCTION
A standard cell library capturing statistical information on delay and output slew variations is at the core of statistical static timing analysis (SSTA), and cost-efficient statistical characterization of such libraries has become essential. The most widely used statistical library cell characterization method is based on the look-up table (LUT) approach, where gate propagation delay ($T_d$), output transition time ($S_{out}$) and their variations are stored in a look-up table for different combinations of inputs such as cell type, input slew ($S_{in}$), load capacitance ($C_{load}$), supply voltage ($V_{dd}$), and other parameters [1].
The runtime complexity of such a statistical LUT-based approach is $O(N_{sample} \cdot N_{LUT})$, where $N_{sample}$ is the number of SPICE runs needed to obtain each mean and variance value and $N_{LUT}$ is the number of input vector combinations. This approach quickly becomes infeasible as either $N_{LUT}$ or $N_{sample}$ increases. Historically, circuit-level Monte Carlo (MC) simulation has been employed to generate samples in the process parameter probability space [2]. Such an approach allows variability-aware analysis to be implemented with minor changes on top of existing characterization tools, but requires a large number of MC runs. To address this challenge, EDA vendors have proposed several approaches to library characterization based on sensitivity analysis. For instance, the Composite Current Source (CCS) model is adopted by the Synopsys PrimeTime SSTA tool and the sensitivity-based effective-current-source model (S-ECSM) is adopted by the Cadence statistical tool. All of these approaches model the statistical impact of process parameter variations as a linear superposition of the impact of each parameter in the response model of the affected metric. Several Response Surface Methodologies (RSMs) have also been proposed to explore the sparsity of the process regression coefficients. An example of such a strategy is Least-Angle Regression (LAR), which uses $L_1$-norm regularization [3]. One major benefit of regularizing with the $L_1$-norm is that it results in a sample complexity that is logarithmic in the number of features (e.g., principal components). For statistical characterization of standard cells, an error propagation technique using linear sensitivity analysis and an RSM using Brussel Design of Experiments (DoE) was proposed in [4].
The Brussel DoE performs statistical feature selection, keeping only those features that are most relevant to the response under consideration. It then uses a model selection algorithm to build a suitable regression model for all the responses. More recently, several statistical circuit simulators based on uncertainty quantification have been successfully applied to avoid the huge number of repeated simulations in conventional Monte Carlo flows [5] - [9].
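To make the $O(N_{sample} \cdot N_{LUT})$ cost quoted above concrete, the following short sketch counts the SPICE runs of a LUT-based flow for one timing arc. The grid dimensions and the sample count are hypothetical values chosen only for illustration; they are not figures from this paper.

```python
def lut_spice_runs(n_slew, n_load, n_vdd, n_sample):
    """SPICE runs for one timing arc: N_LUT grid points times N_sample process samples."""
    n_lut = n_slew * n_load * n_vdd
    return n_lut * n_sample

# Hypothetical grid: 7 slews x 7 loads x 3 supply voltages, 1000 Monte Carlo samples each.
runs = lut_spice_runs(n_slew=7, n_load=7, n_vdd=3, n_sample=1000)
print(runs)  # 147000
```

Adding one more table axis (for example a multi-$V_t$ option) multiplies this count again, which is why a reduction in the number of input combinations pays off directly.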
On the other hand, the expensive simulation cost of the statistical LUT-based approach is not only due to high dimensionality of the process space, but also due to high dimensionality of the cell input space (e.g., cell type, input slew S in , load capacitance, supply voltage V dd , etc.). This problem is further exacerbated as more design options are provided in recent technologies (e.g., multi-V t , multi-V dd ). While most of the existing work focuses on exploring the sparsity of the regression coefficients of the process space with a reduced process sample size for each input space vector, correlations between different cells and different input vectors within the same cell have not been considered in the open literature, to the best of our knowledge. This has been the main motivation of this work which proposes a novel acceleration method that operates in the library input space rather than its process space and that can be added to any acceleration used in the process space. This is achieved through the systematic use of recent advances in statistics and semiconductor metrology that we apply to the development of computationally efficient statistical characterization algorithms for standard cell libraries. We propose two key techniques to explore correlations in library input space. The first is a novel ultra-compact, analytical model for gate timing characterisation, and the second is a Bayesian learning algorithm for the parameters of the aforementioned timing model using past library characterizations along with a very small set of additional simulations from the target technology. Bayesian approaches were initially introduced in the area of VLSI design for post-Silicon validation and parameter extraction [10] - [15] . The intrinsic simplicity of the proposed timing model combined with the Bayesian learning [16] framework is capable of building very accurate circuit response representations.
The rest of this paper is organized as follows. Section 2 introduces basic notation and formulates the problem of statistical characterization in library input space. Section 3 describes prior work on gate delay modelling and presents our novel ultra-compact analytical model for gate delay and slew. Section 4 presents our Bayesian algorithm which learns timing model parameters from past library characterizations and a very small set of additional simulation runs in the target technology. The foundation of this algorithm is the use of maximum-a-posteriori (MAP) estimation. In Section 5, our new methods are validated on the library characterization in state-of-the-art 14-nm and 28-nm technology and compared with the LUT method. Our conclusions are given in Section 6.
II. PROBLEM FORMULATION
In library characterization, an accurate model for cell delay ($T_d$) and output slew ($S_{out}$) is developed given the following input data: a cell type, input slew ($S_{in}$), output load capacitance ($C_{load}$), transition direction (RISE/FALL), and supply voltage ($V_{dd}$). To formalize the library characterization problem, we consider an individual logic gate with multiple inputs and one output. For simplicity, we start from the standard assumption that only one timing arc is modelled at a time, which implies that we do not consider simultaneous input switching. For $p$ input variables $\xi = \{\xi_1, \xi_2, \ldots, \xi_p\}$, such as $S_{in}$, $V_{dd}$, $C_{load}$, etc., the cell response is modeled as the following two functions:

$$T_d = f_T(\xi) \tag{1}$$

$$S_{out} = f_S(\xi) \tag{2}$$
The problem of nominal library characterisation is to estimate $f_T$ and $f_S$ given $k$ input vectors $\{\xi\} = \{\xi^{(1)}, \xi^{(2)}, \ldots, \xi^{(k)}\}$ and $k$ output observations $\{T_d^{(1)}, \ldots, T_d^{(k)}\}$ and $\{S_{out}^{(1)}, \ldots, S_{out}^{(k)}\}$, such that the timing prediction error with respect to a baseline case is minimized under the condition that $k$ is very small. The nominal baseline case is defined by SPICE simulations under $n$ different input vectors ($n \gg k$) sampled randomly within the input space $\xi$.
Let us now denote by $\{T_d\}$ an ensemble of delay observations. This ensemble has been generated for a given input vector but under varying process parameters. We now formulate the problem of statistical library characterisation in input space as that of estimating $f_T$ and $f_S$ given $k$ input vectors $\{\xi\} = \{\xi^{(1)}, \xi^{(2)}, \ldots, \xi^{(k)}\}$ and $k$ ensembles of output observations $\{\{T_d^{(1)}\}, \ldots, \{T_d^{(k)}\}\}$ and $\{\{S_{out}^{(1)}\}, \ldots, \{S_{out}^{(k)}\}\}$, such that the prediction error for the statistical metrics with respect to a statistical baseline case is minimized under the condition that $k$ is very small. The statistical baseline case is defined by statistical SPICE simulations using the same $n$ different input vectors ($n \gg k$) as the nominal baseline case, where the SPICE simulations are now executed according to the Monte Carlo method in process space. The metrics of the statistical baseline case include the mean and standard deviation of delay and output slew at each input vector $i \in \{1, \ldots, n\}$. They are denoted as $\mu_{T_d}^{(i)}$, $\sigma_{T_d}^{(i)}$, $\mu_{S_{out}}^{(i)}$ and $\sigma_{S_{out}}^{(i)}$.
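The statistical baseline metrics and the prediction error described above can be sketched as follows. This is a minimal illustration that assumes the Monte Carlo results are stored as an array with one row per input vector and one column per process seed. The average relative error used here is an assumed metric, and the function names are hypothetical.

```python
import numpy as np

def ensemble_stats(samples):
    """samples has shape (n_inputs, n_seeds); returns per-input mean and sample std."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=1), samples.std(axis=1, ddof=1)

def mean_relative_error(predicted, baseline):
    """Average of |predicted - baseline| / |baseline| over all input vectors."""
    predicted = np.asarray(predicted, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    return float(np.mean(np.abs(predicted - baseline) / np.abs(baseline)))

# Two input vectors, three process seeds each (made-up delays in ps).
mu, sigma = ensemble_stats([[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]])
```

The same two helpers apply unchanged to output slew ensembles.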
III. MODEL FOR DELAY AND OUTPUT SLEW
Accurate gate level modeling for delay and slew estimation has become a major challenge for nanometric technologies. Historically, the transistor delay has been simply approximated by C load V dd /I dsat , where I dsat is the drain current at V gs = V ds = V dd . A more accurate model, named the alpha-power law, was later proposed in the early 1990s [17] where a closed-form expression was derived for the delay of an inverter. A simplified version of the alpha-power law was proposed in [18] . More recently, a simple analytical expression for the intrinsic MOSFET delay, using physics-based models for the effective current and the total gate switching charge, was proposed to better describe nanometric technologies [19] , [20] .
Although these advanced delay models provide accurate description of transition activity in the cell, they are still quite complex, and detailed process information is required to fit the entire model.
Our first goal, therefore, is to contribute an ultra-compact timing model that is at once a generalisation of older models and one whose parameters allow a sparse representation of input space vectors. Fig. 1(a) shows the key factors that affect the delay and output slew of an inverter. In this work, we consider the impact of input slew ($S_{in}$), output load capacitance ($C_{load}$), supply voltage ($V_{dd}$), and driving strength ($I_{eff}$). The pull-up network is replaced with an "equivalent" PMOS device while the pull-down network is replaced with an "equivalent" NMOS device.
To find our ultra-compact model, we first study gate delay in a simple inverter and then generalize it to any combinatorial logic cell. Recent studies [21] - [23] show that the simple $C_{load} V_{dd} / I_{dsat}$ metric follows the experimental inverter delay much better if the on-current in the denominator is replaced with an effective current $I_{eff}$ representing the average switching current. In line with the intrinsic transistor delay defined in [19], we model cell delay as
where $k_d$ is a scaling factor used to obtain a good fit to the actual cell delays. $I_{eff}$ is defined as
and can be evaluated easily through performance modeling or through a circuit simulation that takes into account process variations [19], [24]. Since our focus is to model delay and output slew as functions of the input variables, as in (1) and (2), we assume that $I_{eff}$ is known for each input vector. Note that the direct link between process parameters and delay is still preserved in the $I_{eff}$ current. To generalize the above model to any combinatorial logic cell, we simply map each gate onto an "equivalent inverter" and use the inverter characterization to estimate delays and output slews [25] - [27]. Fig. 1(b) shows the equivalent inverter of a NAND2, where the pull-up network is replaced with a PMOS device and the pull-down network is replaced with an NMOS device. The charge transferred to or from the load capacitance during switching is equal to
where $C_{par}$, $V$ and $\alpha$ are all fitting parameters. Compared with the simple $C_{load} V_{dd} / I_{dsat}$ metric, several effects have been considered: (1) $C_{par}$ is introduced to account for parasitic capacitances, such as those associated with junctions and interconnects, which are not included in $C_{load}$; (2) $V$ is introduced to compensate for the inaccuracy of the delay model at low $V_{dd}$; and (3) a linear coefficient $\alpha$ is introduced to account for the impact of $S_{in}$ on delay. The estimation of $f_T$ and $f_S$ is thereby converted into a parameter extraction problem for $\{k_d, C_{par}, V, \alpha\}$. A special feature of this simple delay model is that the same form is used to describe not only delay but also output slew $S_{out}$, albeit with a different set of values for the fitting parameters, as illustrated in Fig. 2, where $T_d$ and $S_{out}$ are simulated through SPICE using a 14-nm industrial design kit and two separate $V$ values are extracted for $T_d$ and $S_{out}$. For different groups of $C_{load}$ and $S_{in}$ combinations, a constant value of
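The text above describes the roles of the four fitting parameters without fixing the functional form at this point. The sketch below therefore adopts one plausible reading as an explicit assumption: the switched charge is $(C_{load} + C_{par} + \alpha S_{in})(V_{dd} + V)$ and the delay is $k_d$ times that charge divided by $I_{eff}$. The class name, function name and units are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TimingParams:
    k_d: float    # dimensionless scaling factor
    c_par: float  # parasitic capacitance, fF
    v_off: float  # voltage correction V, volts
    alpha: float  # input slew coefficient, fF/ps

def compact_timing(p, s_in_ps, c_load_ff, vdd, i_eff_ua):
    """Assumed form: k_d * (C_load + C_par + alpha*S_in) * (V_dd + V) / I_eff.

    With fF, V and uA as units, the result is in ns.
    """
    charge = (c_load_ff + p.c_par + p.alpha * s_in_ps) * (vdd + p.v_off)
    return p.k_d * charge / i_eff_ua

# With k_d = 1 and the other parameters at zero, the form reduces to C_load * V_dd / I_eff.
simple = compact_timing(TimingParams(1.0, 0.0, 0.0, 0.0), 10.0, 3.0, 0.8, 60.0)
```

Under this reading, the same function would serve for output slew by passing a second `TimingParams` instance with its own fitted values.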
The first step is to transfer the observed training samples $\{T_d^{(i)}\}$ and $\{S_{out}^{(i)}\}$ ($i = 1, \ldots, k$) to the parameter subspace $\{k_d, C_{par}, V, \alpha\}$ and use both to derive a probability distribution on the parameter space. The pdfs of $\{k_d, C_{par}, V, \alpha\}$ for delay and output slew can then be calculated and the parameter extraction problem solved using maximum-a-posteriori (MAP) estimation.
Without loss of generality, we describe the MAP estimation for delay parameter subgroup P T = {k d , C par , V , α}. Parameters for output slew are estimated in a similar manner.
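First, we assume that $P_T$ follows a Gaussian distribution $P_T \sim N(\mu_{P_T}, \Sigma_{P_T})$:

$$\mathrm{pdf}(P_T) = \frac{1}{\sqrt{(2\pi)^4 \, |\Sigma_{P_T}|}} \exp\left[-\frac{1}{2} (P_T - \mu_{P_T})^T \Sigma_{P_T}^{-1} (P_T - \mu_{P_T})\right] \tag{6}$$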
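where $\mu_{P_T}$ and $\Sigma_{P_T}$ are the mean vector and covariance matrix of the parameter subgroup $P_T$, respectively. Next, we assume that $\mu_{P_T}$ follows a conjugate Gaussian prior distribution $\mu_{P_T} \sim N(\mu_{t0}, \Sigma_{t0})$:

$$\mathrm{pdf}(\mu_{P_T}) = \frac{1}{\sqrt{(2\pi)^4 \, |\Sigma_{t0}|}} \exp\left[-\frac{1}{2} (\mu_{P_T} - \mu_{t0})^T \Sigma_{t0}^{-1} (\mu_{P_T} - \mu_{t0})\right] \tag{7}$$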
where $\mu_{t0}$ and $\Sigma_{t0}$ are the mean vector and covariance matrix of $\mu_{P_T}$, respectively. We also define the delay model precision as $\beta_{f_{T_d}}$, which equals the inverse variance of the modeling errors across different technologies. Given $\mu_{P_T}$ and $\beta_{f_{T_d}}$, we calculate the likelihood of observing the delay $T_d^{(i)}$ at the $i$th input condition, associated with the subspace distribution $\mathrm{pdf}(P_T)$, as
The learning of the precision $\beta_{f_{T_d}}$ is a key step in this method. In practice, $\beta_{f_{T_d}}$ represents our "uncertainty" about the proposed delay model at different input conditions, which arises from its inability to capture certain physical effects. While these precisions depend on the details of the technologies, they show a strong systematic trend across different input conditions $\xi$. In this work, the parameters $\mu_{P_T}$ extracted from past technologies are used to learn the systematic precision $\beta_{f_{T_d}}$ at different input conditions.
Characterizations from a variety of technology nodes enable us to propagate our historical belief to a new technology node. While generic or broad historical technologies can be used to learn approximate precisions, in order to achieve the highest applicable prior precision, the best historical technologies would be those with the same design or process choices as the target technology. For example, if we intend to fit a library in a low power process, appropriate historical technologies would also be technologies in low power processes. Therefore a bias-variance tradeoff is needed in the selection of historical libraries.
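One way to read the precision-learning step described above is sketched below. For each input condition, the modelling error of the fitted timing model is collected in every historical technology, and the precision is taken as the inverse of the error variance across those technologies. The array layout, the use of absolute rather than relative errors, the second moment about zero, and the small variance floor are all assumptions made for this illustration.

```python
import numpy as np

def learn_precision(model_pred, spice_ref, floor=1e-12):
    """Per-input precision from historical libraries.

    model_pred, spice_ref: arrays of shape (n_tech, n_inputs) holding the fitted-model
    delay and the SPICE reference delay for each historical technology.
    Returns beta of shape (n_inputs,).
    """
    err = np.asarray(model_pred, dtype=float) - np.asarray(spice_ref, dtype=float)
    # Mean squared error about zero, so a systematic offset also counts as uncertainty.
    var = np.mean(err ** 2, axis=0)
    # The floor avoids division by zero where the model happens to be exact.
    return 1.0 / np.maximum(var, floor)

# Two historical technologies, two input conditions (made-up numbers).
beta = learn_precision([[1.1, 2.0], [0.9, 2.2]], [[1.0, 2.0], [1.0, 2.0]])
```

Input conditions where the compact model has historically fitted poorly receive a low precision, so observations taken there carry less weight in the subsequent Bayesian update.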
The detailed learning process proceeds as follows. 
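After the estimation of likelihood and precision, we are able to transfer the delay characterization $\{T_d\}$ to the parameter subspace $P_T$ and obtain the conditional probability of observing $T_d^{(i)}$ given $\mu_{P_T}$ and $\beta_{f_{T_d}}(\xi^{(i)})$. We then combine this conditional probability with the prior distribution $\mathrm{pdf}(\mu_{P_T})$ in (7) to accurately estimate $\mu_{P_T}$. Assuming each delay simulation is ideal, we can write the likelihood function $\mathrm{pdf}(T_d \mid \mu_{P_T})$ as the product of the per-sample likelihoods:

$$\mathrm{pdf}(T_d \mid \mu_{P_T}) = \prod_{i=1}^{k} \mathrm{pdf}\left(T_d^{(i)} \mid \mu_{P_T}, \beta_{f_{T_d}}(\xi^{(i)})\right) \tag{10}$$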
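According to Bayes' theorem, the conditional distribution $\mathrm{pdf}(\mu_{P_T} \mid T_d)$ is proportional to the product of the prior $\mathrm{pdf}(\mu_{P_T})$ and the likelihood function $\mathrm{pdf}(T_d \mid \mu_{P_T})$:

$$\mathrm{pdf}(\mu_{P_T} \mid T_d) \propto \mathrm{pdf}(\mu_{P_T}) \cdot \mathrm{pdf}(T_d \mid \mu_{P_T}) \tag{11}$$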
The precision β fT d is learned from historical cell delay characterization and is therefore independent of the observation T d .
Consequently,
Substituting (10) and (12) into (11) yields:
The last step is maximum-a-posteriori (MAP) estimation, which finds the optimal estimate of $\mu_{P_T}$ that maximizes the logarithm of the posterior distribution, $\ln \mathrm{pdf}(\mu_{P_T} \mid T_d)$. It can be mathematically formulated as the optimization problem

$$\max_{\mu_{P_T}} \; \ln \mathrm{pdf}(\mu_{P_T} \mid T_d) \tag{14}$$
Substituting (7) and (8) into (14) and removing the constant terms yields:
where the objective in (15) is a sum of concave quadratic functions. Hence the maximization in (15) is a convex programming problem and can be solved both efficiently and robustly. So far we have described the characterization of an individual library cell, without statistical characterization. The efficient statistical library cell characterization proceeds as follows.
$N_{sample}$ different seeds for each cell under process variation are generated through Monte Carlo (MC) simulation or Design of Experiments (DoE) [4]. For the $j$th seed in each cell, $\{T_d\}$ and $\{S_{out}\}$ under $k$ input conditions are simulated through a SPICE simulation using the .ALTER statement. $P_T^{(j)}$ and $P_S^{(j)}$ are extracted for the $j$th seed through the proposed Bayesian inference with maximum-a-posteriori (MAP) estimation. For a targeted input condition $\xi$, the probability distributions of delay and output slew are calculated as $\mathrm{pdf}(f_T(\xi, P_T))$ and $\mathrm{pdf}(f_S(\xi, P_S))$.
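The per-seed MAP extraction and the resulting delay ensemble described above can be sketched together as follows. Several things here are assumptions rather than the method as published: the functional form of the compact model, the Gaussian likelihood with per-input precision `beta`, and the solver. Because the assumed model is nonlinear in its parameters, the sketch uses a damped Gauss-Newton iteration that only accepts steps lowering the negative log posterior. All function names are hypothetical.

```python
import numpy as np

def model(theta, x):
    """Assumed compact form; theta = (k_d, C_par, V, alpha),
    columns of x = (S_in, C_load, V_dd, I_eff)."""
    k_d, c_par, v_off, alpha = theta
    s_in, c_load, vdd, i_eff = x.T
    return k_d * (c_load + c_par + alpha * s_in) * (vdd + v_off) / i_eff

def neg_log_posterior(theta, x, t_obs, beta, mu0, prec0):
    """Gaussian likelihood (precision beta per input) plus Gaussian prior, constants dropped."""
    r = t_obs - model(theta, x)
    d = theta - mu0
    return 0.5 * np.sum(beta * r ** 2) + 0.5 * d @ prec0 @ d

def map_estimate(x, t_obs, beta, mu0, cov0, n_iter=50):
    """Damped Gauss-Newton MAP estimate, started from the prior mean."""
    x = np.asarray(x, dtype=float)
    t_obs = np.asarray(t_obs, dtype=float)
    beta = np.asarray(beta, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    prec0 = np.linalg.inv(cov0)
    theta = mu0.copy()
    cost = neg_log_posterior(theta, x, t_obs, beta, mu0, prec0)
    lam = 1e-3
    for _ in range(n_iter):
        # Forward-difference Jacobian of the model with respect to theta.
        f0 = model(theta, x)
        jac = np.empty((len(t_obs), 4))
        for j in range(4):
            step = np.zeros(4)
            step[j] = 1e-6 * max(1.0, abs(theta[j]))
            jac[:, j] = (model(theta + step, x) - f0) / step[j]
        grad = -jac.T @ (beta * (t_obs - f0)) + prec0 @ (theta - mu0)
        hess = jac.T @ (beta[:, None] * jac) + prec0
        delta = np.linalg.solve(hess + lam * np.diag(np.diag(hess)), -grad)
        new_cost = neg_log_posterior(theta + delta, x, t_obs, beta, mu0, prec0)
        if new_cost < cost:
            theta, cost, lam = theta + delta, new_cost, lam * 0.3
        else:
            lam *= 10.0  # reject the step and increase damping
    return theta

def statistical_characterization(x_per_seed, t_per_seed, beta, mu0, cov0, x_target_per_seed):
    """One MAP fit per Monte Carlo seed, then the delay ensemble at the target input."""
    delays = []
    for x_tr, t_tr, x_tg in zip(x_per_seed, t_per_seed, x_target_per_seed):
        theta = map_estimate(x_tr, t_tr, beta, mu0, cov0)
        delays.append(model(theta, np.atleast_2d(np.asarray(x_tg, dtype=float)))[0])
    delays = np.array(delays)
    return delays, delays.mean(), delays.std(ddof=1)
```

The target input is passed per seed because $I_{eff}$, which carries the process dependence, is assumed here to differ from seed to seed. The returned `delays` array can be turned into a histogram to inspect the shape of the predicted distribution, not only its mean and standard deviation.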

V. VALIDATION
In this section, two library cell characterization examples in several cutting-edge CMOS technologies are used to demonstrate the efficiency of our proposed method. All test cases, as well as the historical library cell characteristics, are generated using different BSIM-based industrial design kits reflecting real measurements. To test and compare with the prior art, we have also implemented both deterministic extraction and statistical extraction using a look-up table (LUT) approach.
The baseline characterization is defined in this work by a 1000-point Monte Carlo simulation sampled randomly within the whole input space $\xi = \{S_{in}, C_{load}, V_{dd}\}$. Note that these points only represent different operating conditions for a target cell; the effects of process variation are not included. Fig. 5 shows a scatter plot of the 1000 points in the input space at which we compare our characterization results with standard methods. The first example is a nominal delay and output slew characterization for a library designed in a commercial state-of-the-art 14-nm FinFET technology. Both fitting and testing samples are generated through SPICE simulation using a well-calibrated compact transistor model. Fig. 6 shows the average prediction error with respect to the baseline characterization for the proposed model with Bayesian inference, the proposed model with least-squares error function optimization, and the look-up table approach.

The second example is a statistical delay and output slew characterization for a library designed in a commercial state-of-the-art 28-nm bulk-silicon technology, which is different from the model used in the first example. The baseline characterization is defined as in the previous example, with 1000 input combinations sampled randomly within the whole space $\xi = \{S_{in}, C_{load}, V_{dd}\}$. In this case, 1000 seeds under process variation are generated for each cell to obtain the statistical distributions of delay and output slew for the different input combinations.
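The random sampling of baseline input vectors described above can be sketched as follows. The text only states that the points are sampled randomly, so the uniform distribution, the parameter ranges and the function name are assumptions chosen for illustration.

```python
import numpy as np

def sample_input_space(n_points, bounds, seed=0):
    """Draw n_points uniformly inside a box; bounds maps a name to (low, high)."""
    rng = np.random.default_rng(seed)
    lows = np.array([lo for lo, _ in bounds.values()])
    highs = np.array([hi for _, hi in bounds.values()])
    # One row per baseline point, one column per input variable.
    return rng.uniform(lows, highs, size=(n_points, len(bounds)))

# Hypothetical ranges for input slew (ps), load capacitance (fF) and supply voltage (V).
bounds = {"s_in_ps": (1.0, 100.0), "c_load_ff": (0.5, 20.0), "vdd_v": (0.6, 1.0)}
points = sample_input_space(1000, bounds)
```

For the statistical baseline, each of these rows would additionally be simulated under every process seed, giving 1000 x 1000 SPICE evaluations per timing arc in the second example.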
The accuracy of the statistical characterization is measured by the error functions $E(\mu_{T_d})$, $E(\mu_{S_{out}})$, $E(\sigma_{T_d})$ and $E(\sigma_{S_{out}})$, which compare the predicted mean and standard deviation of delay and output slew against the statistical baseline. Fig. 7 and Fig. 8 show the average prediction error for the mean and standard deviation of delay and output slew for a library designed in a commercial state-of-the-art 28-nm technology, compared with the baseline characterization, using the proposed method and the look-up table approach. Up to 20X runtime speedup is observed at the same characterization accuracy in the mean value and standard deviation of $T_d$ and $S_{out}$. Fig. 9 shows the delay probability density obtained with the proposed method using seven training input combinations and with an interpolation of look-up tables using 60 training input combinations, together with the baseline distribution from SPICE Monte Carlo simulation. The input combination is $V_{dd} = 0.734$ V, $S_{in} = 5.09$ ps, $C_{load} = 1.67$ fF. The proposed method gives a much better prediction of the delay distribution and correctly predicts the non-Gaussian distribution at low $V_{dd}$.

Fig. 9. Delay probability density simulated using baseline simulation, proposed method and an interpolation of look-up tables together with baseline distribution using SPICE Monte Carlo simulation with an input combination of $V_{dd} = 0.734$ V, $S_{in} = 5.09$ ps, $C_{load} = 1.67$ fF.

VI. CONCLUSION
In this paper we have presented an entirely different perspective on the acceleration of library characterizations. While previous authors have emphasized the use of statistical techniques to address the efficient design of variation-aware standard cell libraries by working in process space, in our work we use similar techniques for the efficient design of these libraries by working in the traditional library input space of input slew, output load, and supply voltage. The main insight that has enabled this shift in perspective is the contribution of a new ultra-compact timing model for standard cells that is a powerful and accurate generalization of the simple $C_{load} V_{dd} / I_{dsat}$ metric. This new analytical timing model transfers the library characterization problem from one of input parameter sweep to one of machine learning and sparse sampling. Machine learning is used to develop priors of the timing model coefficients using old libraries, while sparse sampling is used to provide the extra data points needed to build the new library in the target technology. Our methods have resulted in a 15X reduction in simulation runs with respect to baseline techniques that use random sampling methods.
