frequency, leakage power, etc.) to differ from design specifications.
The undesired uncertainties in circuit performance can lead to analog/ mixed-signal circuit failures and yield loss at nanoscale. As such, it has become extremely critical for highprecision analog/RF circuits such as phase-locked loop (PLL) and custom/ mixed-signal circuits such as SRAM arrays, which both have tight operating margin due to lower power supply and higher operating frequency. Thereby, the parametric yield, defined as the percentage of fabricated circuits that can function correctly, has become one important criterion to evaluate design robustness under process variation at nanoscale.
In general, the parametric yield can be estimated in either the performance domain or the parameter domain, as shown in Figure 1 . The performance domain is defined as a space consisting of all possible performance merits (i.e., voltage, gain, bandwidth, etc.), while the parameter domain is a space spanned by all circuit parameters (i.e., channel width, threshold voltage, etc.) and bounded by their minimum and maximum values. In both Editor's notes: Accurate yield estimation is always an important director of design. For analog/mixed signal circuits, the dominant yield loss mechanisms are parametric in nature. This paper provides an informative discussion of varied approaches to parametric yield estimation, including recently developed methods that provide a highly accurate and fast alternative to Monte Carlo methods for some types of analysis.
VAnne Gattiker, IBM Austin Research Lab domains, the successful samplings form a bounded region called the ''yield body'' or ''success region,'' and the nonlinear boundary that separates the success and failure regions is called the ''yield boundary. '' As a result, the percentage of satisfied samplings in both domains can be used to estimate the yield rate equivalently. Many performance-domain techniques have become available in past few decades: the Monte Carlo (MC) method repeatedly draws samples, runs simulations, and evaluates the yield rate, which can be easily applied to high-dimensional problems. However, it is extremely time consuming. To relieve the problem, the quasi Monte Carlo (QMC) method [1] , [2] has been proposed to generate quasi-random numbers rather than purely random samplings, which can save a large number of samples. Similarly, Latin hypercube sampling (LHS) [3] has been developed to generate multivariate samples by dividing the range of each variable into equally probable intervals and placing only one sample in each grid cube containing the same position (or called Latin hypercube). Moreover, importance sampling (IS) [4] [5] [6] has been established to efficiently handle the rare event (i.e., millions of samples can capture only one failure) which changes the sampling density to draw more failed samples.
Meanwhile, a large number of parameter-domain methodologies have been developed as well: nonlinear surface sampling [7] locates the points on the ''yield boundary'' by searching along the nonlinear performance surface in the parameter domain. The global search method [8] can find one point on the ''yield boundary'' with onetime simulation and thus save more runtime. The response surface modeling method [9] tries to model the performance as a polynomial function of all variable parameters and then evaluate the yield estimation. In addition, the design centering method [10] , the performance modeling method [11] , and other advanced techniques can also be used for parametric yield estimation.
In this paper, we present several aforementioned techniques, including the MC method and its variants (e.g., QMC [1] , [2] and IS [4] [5] [6] ) in the performance domain, and two recently established methods in the parameter domain (e.g., the nonlinear surface sampling method [7] and the global search method [8] ). We have compared all these approaches quantitatively by several circuit examples from the perspectives of both accuracy and efficiency. Moreover, we discuss the advantages and limitations of each method and further analyze the remaining challenges for parametric yield estimation of analog/mixed-signal circuits.
It is worthwhile to point out that this paper only covers a subset of well-established techniques due to the limited space, and interested readers are encouraged to refer to the resources in the reference section.
Performance domain yield estimation
In this section, we discuss the yield estimation methods, which are in the performance domain and include the MC method and its derivations. For illustration purpose, first we briefly present the fundamental idea behind each method and then have a quantitative comparison using one sixtransistor SRAM cell example at the end of this section.
Monte Carlo method
The most straightforward method in the performance domain is the MC method, which first generates tens of thousands of samples with the probability distribution of variable parameters. Then, it performs circuit simulation with each sample to evaluate the performance merit of interest. With the given performance constraints, the MC method can identify the successful or acceptable samples. In this way, the percentage of successful samples can be used to estimate the yield rate.
The advantage of the MC method is its simplicity and generality; it can be applied to arbitrary distributions of parameters and performance merits without any a priori knowledge. However, it is very time consuming to achieve high accuracy since its convergence rate to the exact value is only Oð1= ffiffiffiffi N p Þ, where N is the total number of samples. Therefore, it is not suitable for practical applications.
Quasi Monte Carlo method
An alternative to the MC approach is called the QMC method [1] , [2] , which uses quasi-random sequences rather than purely random samplings. The QMC method starts with the generation of quasirandom numbers (or representative samples), such as Faure (1982), Neiderreiter (1987), Sobol (1967), or Halton (1960) sequences. It then converts the samples following those specific distributions to the ones following the desired distribution.
For example, the Sobol sequence follows a uniform distribution u $ Uð0; 1Þ and can be converted to the Gaussian distribution using x ¼ F À1 ðuÞ, where F À1 is the inverse cumulative distribution function (inverse CDF) of the Gaussian distribution. The resultant sequence x follows the Gaussian distribution. As such, circuit simulation can be performed with the x sequence to obtain performance merits, and the yield is estimated as the percentage of successful samples.
The key idea behind the QMC method is to try to cover the entire parameter space evenly with deterministic samples rather than pure random numbers. Therefore, the QMC method can use fewer samples and potentially improve both accuracy and efficiency when compared with the MC approach.
In addition, the convergence rate of the QMC method can be reduced to Oð1=N Þ in the optimal cases, much faster than the MC approach. However, the convergence rate becomes OðlnðN d Þ=N Þ for multiple dimensions in the worst case, where d is the number of dimensions. As such, the performance of the QMC method can degrade with the dimensions.
Importance sampling
To improve the efficiency of the MC and QMC methods, the IS method has been proposed [4] [5] [6] , which shifts the sampling distribution toward the failure region and makes each sample more useful.
For example, we consider the original probability density function (PDF) of a variable parameter shown in Figure 2 , and assume the failure region locates at right tail. As such, IS tries to shift the sampling function toward the failure region which can increase the probability of sampling within the failure region.
IS can reduce the number of samples required to achieve a desired accuracy, especially in the case where the failure region is small for rare failure events [4] . However, it is always challenging to obtain an optimal sampling distribution or shift vector efficiently (e.g., the mean shift vector for one parameter in Figure 2 ), since this depends on the actual distribution of the performance merit and is unknown beforehand. Therefore, many techniques have been developed to find the optimal sampling distribution or the shift vector in recent literatures [5] , [6] , which will not be discussed in this paper due to limited space.
Experimental results
To evaluate the discussed methods, we apply them to estimate the yield of a six-transistor SRAM cell shown in Figure 3 , which is particularly IEEE Design & Test vulnerable to process variation due to everdecreasing supply voltage and reduced noise margin. One SRAM cell is used to store one memory bit, and a typical implementation involves six transistors: the four transistors Mn1, Mn3, Mp5, and Mp6 have two stable states, i.e., either a logic ''0'' or ''1, '' and the two additional access transistors Mn2 and Mn4 serve to control the access to the cell during read and write operations. The word line is used to determine whether the cell should be accessed (connected to the bit line), and the bit line is used to read/write the actual data from/to the cell.
In this paper, the SRAM yield is defined by the data retention failure due to static noise margin (SNM) variations, and the threshold voltages of all transistors are modeled as independent random variables. In general, SNM is used to measure the amount of noise voltage that is needed to flip the stored data in one SRAM cell. Typically SNM can be measured by the length of the maximum embedded square in the butterfly curves which consists of the voltage transfer curves (VTCs) of the two inverters in the SRAM cell. When SNM is smaller than zero, the butterfly curves collapse and data retention failure happens.
To investigate how the various methods might behave in a more realistic context, we apply the MC, mixture IS [4] , and spherical sampling methods [6] to estimate the yield rate due to data retention failures, where both Hspice and BSIM4 transistor models are used. Note that the failures have become rare events and the yield rate is very close to 99.99%. As such, we first compare the evolution of failure rate estimation from different methods in Figure 4 , which shows that MC converges very slowly and QMC can significantly improve the convergence. In addition, both mixture IS and spherical sampling can reach the similar estimation results with much fewer samples.
Moreover, Table 1 further shows the number of samples and circuit simulations required by different methods when the target accuracy level and confidence interval are set to 95%. From this table, it can be observed that the QMC method can achieve around 5.3X speedup over the MC method, while IS-based methods can be tens or even hundreds times faster than the MC method with the same accuracy. Also, it is worthwhile to point out that the performance of IS-based methods is very sensitive to the shifted sampling distribution, which can be demonstrated by the different speedup obtained from mixture IS and spherical sampling. We can see that previous conclusions can still hold: the MC method tries to randomly select samplings to cover the entire space evenly, so it needs a large number of samplings. The QMC method can reduce the number of samplings by covering the entire space with deterministic sequences. Moreover, the IS method can shift the sampling distribution toward the failure regions so as to pick more failed samplings for improved accuracy and efficiency.
Parameter domain yield estimation
In order to avoid the large number of samples and simulations for estimating the yield at the performance domain, several approaches have been proposed to estimate the yield in the parameter domain. Generally, the variable circuit parameters with performance constraint can define one hypervolume in the parameter domain, and the points within this hypervolume denote those samples that provide satisfied performance merits. Therefore, the parametric yield estimation is to calculate the ratio of the volume of the hypervolume to that of the entire parameter domain.
One most straightforward method is to find all the points in the hypervolume with massive samples, which is expensive and unnecessary because the points on the hypersurface boundary can provide adequate information of the hypervolume. Therefore, the parametric yield estimation methods in the parameter domain aim to locate points on the hypersurface boundary (or called ''yield boundary'') efficiently and accurately.
In this section, we will present two yield estimation approaches (i.e., nonlinear surface sampling [7] and QuickYield [8] in the parameter domain, which can efficiently estimate the yield with very high accuracy along with hundreds of times larger speedup over MC. Also, we use a ring oscillator as an example to compare these methods, and the reasons are twofold: 1) QuickYield [8] can only be applied to direct current (dc) analysis and periodical steady state (PSS) analyses and the evaluation of oscillator period needs PSS analysis; and 2) the methods in the parameter domain calculate the yield rate with geometrical computation (i.e., area, volume), therefore, they become very inaccurate for circuits with ''rare failure events'' (i.e., the failure probability of the SRAM circuit is very small) Figure 5 . (a) Performance surface over parameter domain, and (b) the yield boundary in the parameter domain [7] . Table 1 Number of simulation runs required by performance-domain methods to achieve the same accuracy for SRAM cell failure estimation. because the failure region becomes very tiny and the yield rate is very close to 100%.
Nonlinear surface sampling
Yield estimation with nonlinear surface sampling (YENSS) [7] aims to find the yield boundary as shown in Figure 5a . For example, the performance surface can be defined with each point on the surface corresponding to a sampling point in the parameter domain. With the given performance constraint, the surface can be divided into success and failure regions, which are separated by the intersection of these two surfaces (called the ''yield boundary''shown in Figure 5b ). To that end, it starts from the nominal design point (P0 in Figure 5a ) and searches along the tangent direction locally on the performance surface. As such, the yield can be estimated by the area (or volume) ratio of the bounded region to that of the entire parameter domain.
This method can avoid massive sampling but has twofold disadvantages: first, local search is inefficient to find multiple boundary points, since a large number of expensive simulations are required; second, this method can only handle small-scale problems with up to three parameters. In [8] , we have shown that a global search algorithm can resolve these limitations.
Global-search-based yield estimation
According to our discussion so far, previous methods in the parameter domain mostly utilize ''local search'' shown in Figure 6a : generating samples of variable parameter, performing simulation at samples, and then comparing with the given performance constraints to find the samples on the yield boundary. Typically, the iteration needs to be repeated many times to find one point on the yield boundary.
In order to improve the efficiency, ''global search'' has been proposed in [8] recently to find each point on the yield boundary with onetime simulation without sampling. It breaks the framework in ''local search'' by introducing the constraints as an extra equation into the original circuit system and treating the variable parameter as an unknown (as shown in Figure 6b) .
Specifically, one circuit can be described by a differential algebraic equation (DAE) system as
Here, xðtÞ are the state variables including node voltage and branch current. qðxðtÞÞ contains active components such as charges and fluxes, and f ðxðtÞÞ describes passive component and b denotes input sources. As such, the DAE system can determine a performance surface in the parameter domain shown in Figure 7a .
Meanwhile, the performance constraint further locates a constraint plane in Figure 7a . It is clear that the yield boundary in the parameter domain shown in Figure 7b is the intersection boundary of these two surfaces in Figure 7a , which is equivalent to the solution of an augmented system including DAE and performance constraint. As a result, we can introduce the constraint into the DAE system and add the variable parameter as the extra unknown as d dt q xðtÞ ð Þþf xðtÞ ð Þþb ¼ 0
The second equation of Hð p ; f m Þ is the general expression of performance constraint [7] , [12] , where p is the variable parameter and f worst is the worst performance that can be accepted. It should be clarified that the above nonlinear system is a determined system when the number of variable parameters p equals the number of performance constraints Hð p ; f m Þ. However, when the number of performance constraints is smaller than that of variable parameters, the nonlinear system can only consider a part of parameters at one time in order to handle only the deterministic system. Moreover, this nonlinear system can be denoted as F ðxðtÞ; p Þ ¼ 0 for the ease of notation and be solved with the Newton-Raphson method which needs to calculate the Jacobian matrix JðXÞ ¼ @F =@X (X ¼ ½x T ; p is a vector of unknowns). For dc analysis, the Jacobian matrix simply becomes
Similarly, one augmented system for PSS analysis can be constructed in discrete time domain with the finite difference method. Interested readers are referred to [8] and [12] for more details. As such, the resultant solution contains the parameter value of p corresponding to the point on the yield boundary in the parameter space, thereby, one boundary point can be found with only onetime simulation. Since it searches for the boundary point within the entire parameter space, we name this scheme ''global search. '' In the following, we further show the advantages of the global-search-based yield estimation.
Experimental results
In this section, we consider a three-stage ring oscillator, as shown in Figure 8 , to compare the methods in the parameter domain, where the period is considered as the performance merit. The nominal period of the oscillator T norm is 7.2028 ns calculated via PSS simulation. The design specification requires that the variation in period T be within AE2.5% of T norm . For illustration purpose, we consider the effect of random variations in the width of MOSFET in the first stage with 40% perturbation range from their nominal values. The nominal width of Mp1 is 3 m and that of Mn1 is 2 m. Note that the same methods can be applied to other parameter variations, such as threshold voltage, channel length, etc.
In fact, there exist two boundaries in the parameter space, which correspond to T min and T max . Therefore, we need to solve two augmented systems, and each of them considers one performance constraint as follows:
As a result, we can observe these two nonlinear boundaries from the results of QuickYield in Figure 8b , which are identical to the result from the MC method in Figure 8c . Also, YENSS in [8] can capture the same boundaries as Figure 8b , but it needs more circuit simulations due to local search. Note that we consider two variable parameters (e.g., channel widths of the first stage as Wp and Wn), which implies that the system in (2) becomes an underdeterministic system. To solve it, we convert it into a deterministic system by fixing the values of Wp at different points (shown in Figure 8b ) and solving for the corresponding values of Wn. In other words, the system (2) has 1-D Hð p ; f m Þ and 1-D p parameter.
To evaluate the efficiency with the same accuracy (95% accuracy with 95% confidence interval), we compare the runtime of MC, YENSS obtained from [7] by normalizing with respect to its MC runtime, and QuickYield in Table 2 . From this table, the MC method needs 44 073.8 s, while YENSS can achieve 139X speedup over it. Moreover, QuickYield can obtain 519X speedup over MC and be 4X faster than YENSS at a similar accuracy.
Note that the runtime comparison in Table 2 only contains the runtime of circuit simulations, and thereby QuickYield can achieve more speedup over YENSS when the expensive sensitivity analysis in YENSS is considered. Furthermore, extensive experiments show that the accuracy and runtime of QuickYield can scale with the number of boundary points [8] , which is suitable for analog/RF circuits with the high-precision design requirement.
THE METHODS IN the performance domain (e.g., the MC method and its variants) tend to follow the ''sampling-and-checking'' procedure and thereby make them extremely time consuming. As for the IS method, it is a critical but not trivial problem to find one good alternative distribution, as shown in Figure 2 , which can provide fast convergence rate of probability estimation so as to reduce the number of required samples and circuit simulations. To summarize, the primary unresolved issue for methods in the performance domain is how to efficiently identify as few samples as possible to provide accurate yield estimation.
As for the methods in the parameter domain, on the one hand, they can provide extremely efficient yield estimation by searching yield boundary and approximate the yield with the ratio of area or volume of the success region to that of the entire parameter space; on the other hand, they are also facing the same high-dimensional challenge as the MC-based sampling approaches: usually a typical process design kit contains more than 100 independent random variables to model the global variations and five to ten extra independent random variables to model random mismatches for a single transistor. As such, it easily involves 1000þ random variables in the parametric yield analysis of a typical analog/mixed-signal circuit. Table 2 Number of circuit simulations required by parameter-domain methods to achieve the same accuracy for the SRAM cell yield estimation.
Therefore, the methods in the parameter domain become quite computationally expensive for highdimensional problems, since these methods require expensive calculations for the hypervolume of the success region in the high-dimensional space. Moreover, when ''rare failure events'' are considered, the area/hypervolume of the failure region becomes very tiny so that the methods in the parameter domain depending on calculation of area/volume ratios cannot provide accurate yield estimation. For a subset of casesVdc and PSSVQuickYield can provide significant additional speed improvements over YENSS.
In this paper, the problem of variability-aware parametric yield estimation for analog/mixed-signal circuits is introduced: the first is the existing approaches in the performance domain, such as the MC method and its variants; and the second is the newly developed approaches in the parameter domain, such as YENSS and QuickYield. The pros and cons of all these methods are evaluated and compared using different circuit examples. Moreover, the potential challenges facing the variability-aware parametric yield estimation are also elaborated on to facilitate the future research. 
