Aiding design and test optimization of analog circuits requires accurate models that can reliably capture complex dependencies of circuit performances on essential circuit and device parameters, and test signatures. We present a novel Bayesian learning technique, namely relevance vector and feature machine (RVFM), for characterizing analog circuits with sparse statistical regression models. RVFM not only produces accurate models learned from a moderate amount of simulation or measurement data, but also computes a probabilistically inferred weighting factor quantifying the criticality of each parameter as part of the overall learning framework, hence offering a powerful enabler for variability modeling, failure diagnosis, and test development. Compared to other popular learning-based techniques, the proposed RVFM produces more accurate models, requires less training data, and extracts more reliable parameter rankings. The effectiveness of RVFM is demonstrated in terms of the statistical variability modeling of a low-dropout regulator (LDO) and the built-in self-test (BIST) development of a charge-pump phase-locked loop (PLL).
INTRODUCTION
As the complexity of analog/mixed-signal (AMS) circuits keeps increasing at a rapid pace, the tasks of design, verification and test have become significant challenges. To optimize design, verification and test, it is essential to characterize how circuit performances and specifications depend on various circuit and device parameters or test signatures. Doing so is not trivial, since the targeted dependencies are usually complex and nonlinear with deep-rooted correlations, which makes it hard to reliably quantify the importance of numerous parameters.
For characterizing sophisticated circuit systems, machine learning techniques based on circuit simulations or measurements have been proven to be effective and produced promising outcomes. For example, support vector machines (SVMs) [1] are used as nonlinear classifiers in [2] to capture the mapping from input parameters to circuit performance. A regression extension to SVM is employed in [3] to rank circuit parameters based on their correlations with unexpected timing deviations. Additionally, Bayesian inference is often used to build statistical circuit models. For instance, a co-learning Bayesian model is proposed in [4] to efficiently model the performance of AMS circuits.
Input parameters of vastly different amplitudes are often normalized to the same value range before being fed to a machine learning algorithm. This normalization, among other issues, may amplify the impact of redundant or noisy parameters in the model and make it more vulnerable to noisy training data and to noisy or redundant input features (parameters). Since the sensitivities of circuit performances to various parameters may vary vastly, it is important to extract accurate statistical models and reliable parameter criticality simultaneously. While traditional feature selection or importance ranking techniques may help to identify and select some important parameters out of a large parameter set, building models only with the selected parameters usually degrades the model performance, and few of those techniques can guide the model to achieve higher accuracy [5, 6]. These difficulties present important roadblocks to analog/mixed-signal circuit characterization with machine learning techniques.
To achieve both objectives, this paper proposes a novel Bayesian learning framework for characterizing analog circuits with sparse statistical regression models. The proposed framework is named relevance vector and feature machine (RVFM) and can be considered as a significant extension to the SVM [1] and the relevance vector machine (RVM) [7]. The RVFM simultaneously seeks relevant training samples (i.e. vectors) and parameters (i.e. features) to derive a sparse model in both the vector and parameter spaces based upon a novel decomposed feature-weighted kernel function. As a result, the RVFM not only produces accurate models learned from a moderate amount of simulation or measurement data, but also computes a probabilistically inferred weighting factor quantifying the criticality of each parameter as part of the overall learning framework, hence offering a powerful enabler for variability modeling, failure diagnosis, and test development. In addition, an iterative algorithm is developed for efficient training of the proposed RVFM.
Compared to other popular learning-based techniques, the RVFM produces more accurate models, requires less training data, and extracts more reliable parameter rankings. The effectiveness of RVFM is demonstrated in terms of the statistical variability modeling of a low-dropout regulator (LDO) and the built-in self-test (BIST) development and optimization of a charge-pump phase-locked loop (PLL).
BAYESIAN LEARNING
Throughout this paper bold capital letters denote matrices and bold letters in lower case denote vectors. For a matrix X, we use X(i, j) to denote its entry at the i-th row and j-th column. We denote the i-th entry of a vector x by x(i).
Kernel Machines
Assume that there are F circuit parameters of interest, so that the circuit is described by a parameter (feature) vector x. A sample of the circuit is defined by a pair {xi, ti}, where ti is the circuit performance under the configuration xi. Given N collected samples, the objective of regression is to capture the mapping Ψ : x → t with a function y(x) whose output can be used as a prediction of the performance t.
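For analog/mixed-signal circuits, the mapping Ψ from the parameters to the performance is usually nonlinear. For this, a kernel method in the following form is often used:
$$y(\mathbf{x}) = \sum_{i=1}^{N} \mathbf{w}(i)\, K(\mathbf{x}, \mathbf{x}_i) \tag{1}$$
where x is the sample being predicted, xi are the training samples, and K(x, xi) is the kernel function. By applying (1) to all the training samples, the regression model can be rewritten as:
$$\mathbf{t} = \mathbf{\Phi}\mathbf{w} + \mathbf{e} \tag{2}$$
where t is the vector of the N target values, w is the weight vector, Φ is the N × N design matrix with Φ(i, j) = K(xi, xj), and e is the additive noise.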
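As a minimal sketch of the kernel expansion (1) and the design matrix used in (2), the following builds Φ(i, j) = K(xi, xj) for a small set of samples. The Gaussian kernel, its width `gamma`, the toy data, and the function names are assumptions made here for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two parameter vectors; gamma is an assumed width."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

def design_matrix(X, kernel=gaussian_kernel):
    """Phi(i, j) = K(x_i, x_j) for the N training samples stored as rows of X."""
    N = X.shape[0]
    Phi = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            Phi[i, j] = kernel(X[i], X[j])
    return Phi

def predict(x, X, w, kernel=gaussian_kernel):
    """y(x) = sum_i w(i) * K(x, x_i)."""
    return sum(w[i] * kernel(x, X[i]) for i in range(X.shape[0]))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # N = 5 samples, F = 3 parameters (toy data)
w = rng.normal(size=5)        # arbitrary weights, for illustration only
Phi = design_matrix(X)
```

Evaluating `predict` at a training sample gives the same value as the corresponding entry of `Phi @ w`, which is the noise-free part of the matrix form.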
Existing Bayesian Models
The relevance vector machine (RVM) [7] is a sparse Bayesian model providing a viable probabilistic framework for regression. The Bayesian network model of the RVM is shown in Fig. 1. The RVM is used to probabilistically determine the model (2). The given target values are modeled as random variables by assuming that every entry of the additive noise e is a zero-mean Gaussian random variable with variance σ². Further assuming the independence of the targets tn gives the following probability distribution:
$$p(\mathbf{t} \mid \mathbf{w}, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\!\left(-\frac{\lVert \mathbf{t} - \mathbf{\Phi}\mathbf{w} \rVert^2}{2\sigma^2}\right) \tag{3}$$
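Unlike deterministic learning models (e.g. the SVM) that compute w directly, the RVM places independent zero-mean Gaussian priors on the entries of w, with a precision (inverse variance) αi for each w(i), and computes α instead of w in the training process:
$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{N} \mathcal{N}\!\left(\mathbf{w}(i) \mid 0, \alpha_i^{-1}\right) \tag{4}$$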
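If αi < ∞, xi is called a relevance vector, since wi then has a variance greater than zero, which allows xi to contribute to the decision function. Note that the RVM performs prediction with the posterior probability of the internal variables (i.e. the weights). Via convolution of Gaussian distributions, the covariance and mean of the posterior p(w|t, α, σ²) can be shown to be respectively:
$$\mathbf{\Sigma} = \left(\sigma^{-2}\mathbf{\Phi}^{T}\mathbf{\Phi} + \mathbf{A}\right)^{-1} \tag{5}$$
$$\boldsymbol{\mu} = \sigma^{-2}\,\mathbf{\Sigma}\mathbf{\Phi}^{T}\mathbf{t} \tag{6}$$
where A = diag(α1, α2, ..., αN).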
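The posterior for fixed α and σ² can be computed directly. The sketch below assumes the standard RVM posterior, Σ = (σ⁻²ΦᵀΦ + A)⁻¹ and μ = σ⁻²ΣΦᵀt with A = diag(α1, ..., αN), and treats each αi as a precision; the function name and the toy data are hypothetical.

```python
import numpy as np

def rvm_posterior(Phi, t, alpha, sigma2):
    """Posterior covariance and mean of w for fixed alpha and sigma^2.

    alpha holds one precision per weight: a very large alpha(i) pins w(i) near zero.
    """
    A = np.diag(alpha)
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + A)
    mu = Sigma @ Phi.T @ t / sigma2
    return Sigma, mu

rng = np.random.default_rng(1)
Phi = rng.normal(size=(8, 4))                      # toy design matrix
t = Phi @ np.array([1.0, 0.0, -2.0, 0.0]) + 0.01 * rng.normal(size=8)
alpha = np.array([1.0, 1e12, 1.0, 1e12])           # second and fourth weights pruned
Sigma, mu = rvm_posterior(Phi, t, alpha, sigma2=1e-4)
```

With the two large precisions, the corresponding entries of `mu` are driven to (numerically) zero, which is the pruning behavior that makes the model sparse.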
The objective of the Bayesian network is to find the most probable model parameters given the training samples, i.e. to maximize the posterior probability p(w, α, σ²|t). By Bayes' rule, this objective is equivalent to:
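For reference, in the standard RVM derivation, and assuming flat hyperpriors over α and σ² (an assumption stated here, not taken from the text above), the training objective reduces to maximizing the marginal likelihood obtained by integrating out the weights:

$$\max_{\boldsymbol{\alpha},\,\sigma^2}\; p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2) = \int p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, d\mathbf{w} = \mathcal{N}\!\left(\mathbf{t} \mid \mathbf{0},\; \sigma^2\mathbf{I} + \mathbf{\Phi}\mathbf{A}^{-1}\mathbf{\Phi}^{T}\right)$$

The integral has a closed form because both factors are Gaussian in w and the model is linear in w.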
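A fast training algorithm is proposed in [8] to compute the optimal α and σ². After training, for any sample x, the expectation of its predicted target, in terms of the posterior mean μ, is:
$$\mathbb{E}[t \mid \mathbf{x}] = \boldsymbol{\mu}^{T}\boldsymbol{\phi}(\mathbf{x}) \tag{8}$$
and the variance, in terms of the posterior covariance Σ, is:
$$\mathrm{Var}[t \mid \mathbf{x}] = \sigma^2 + \boldsymbol{\phi}(\mathbf{x})^{T}\mathbf{\Sigma}\,\boldsymbol{\phi}(\mathbf{x}) \tag{9}$$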
where φ(x) is a vector of size N whose i-th entry is defined by φ(x)(i) = K(x, xi). Under the same framework, a feature ranking technique is proposed in [6]. By defining the feature vector fi = (x1(i), x2(i), ..., xF(i))^T and a new design matrix Φ′(i, j) = K(fi, fj), [6] proposes the following feature weighting model:
$$\mathbf{t}' = \mathbf{\Phi}'\mathbf{v} + \mathbf{e} \tag{10}$$
where t′(i) = K(t, fi) and v is the vector of weights of all features. Since (10) adapts the RVM to identify relevant features, we refer to this technique as RFM.
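The feature-space quantities used by the RFM can be sketched as follows. The sketch assumes that fi collects the i-th parameter across all N training samples (so that fi and t have the same length), and uses a Gaussian kernel; both choices, and all names, are assumptions for illustration rather than the exact construction of [6].

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an assumed width."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

def feature_space_design(X, t, kernel=gaussian_kernel):
    """Build the F x F feature-space design matrix and the length-F target.

    Column i of X (the i-th parameter across all N samples) plays the role of f_i.
    """
    F = X.shape[1]
    feats = [X[:, i] for i in range(F)]
    Phi_f = np.array([[kernel(fi, fj) for fj in feats] for fi in feats])
    t_f = np.array([kernel(t, fi) for fi in feats])
    return Phi_f, t_f

rng = np.random.default_rng(5)
X = rng.normal(size=(10, 3))   # N = 10 samples, F = 3 parameters (toy data)
t = X[:, 1].copy()             # toy target identical to the second parameter
Phi_f, t_f = feature_space_design(X, t)
```

In this toy case the transformed target is largest for the second parameter, the one the target was copied from.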
RELEVANCE VECTOR AND FEATURE MACHINE
To achieve both high model accuracy and high-quality feature weighting, we propose the RVFM, whose conceptual structure is shown in Fig. 2.
Here the key idea is to explore both vector weighting w in the sample space and feature weighting v in the parameter space simultaneously.
The RVFM Bayesian Network
As the first step towards developing the RVFM, we realize parameter weighting or feature ranking by directly assigning weights v to the parameters. For this, the Bayesian network described in Fig. 1 is extended to derive the new model on the left of Fig. 3. It is further assumed that the entries of v are also Gaussian random variables with p(v|β) = ∏j N(v(j) | 0, βj⁻¹). The feature weights then enter the model through the kernel function (e.g. a Gaussian kernel with feature weighting). However, this nonlinear kernel makes the training of the network extremely challenging since the deterministic relationship from w and v to t is nonlinear. More specifically, the following optimization objective of the Bayesian training process is not analytically computable and hence hinders the optimization-based training process:
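As a sketch of where the difficulty comes from, assume the same flat hyperpriors as in the RVM (an assumption made here). The marginal likelihood then requires integrating out both weight vectors:

$$p(\mathbf{t} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}, \sigma^2) = \iint p(\mathbf{t} \mid \mathbf{w}, \mathbf{v}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, p(\mathbf{v} \mid \boldsymbol{\beta})\, d\mathbf{w}\, d\mathbf{v}$$

When v sits inside a nonlinear kernel, the likelihood is no longer Gaussian in v, so the integral over v has no closed form, in contrast to the RVM case where the model is linear in w.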
To address the above difficulty, instead of defining v as the weights of each parameter, we define a new kernel with tractable feature weights by assigning weights to the decomposed kernel functions:
Substituting the new kernel function into the design matrix of (2) reveals an interesting and useful property of the new general linear model: w and v are now interchangeable. One way to express (2) is:
where the new design matrix Φ has a new size of N ×F and is now defined by
It is important to note that, due to terms like wivj in (13), the target t is now bilinear in the vector weights w and the feature weights v. Furthermore, we replace the product of two random variables wivj by a single random variable ui,j, which leads to our proposed computable linear Bayesian network shown on the right of Fig. 3. If we define a new vector u of size N·F whose entries are u((i−1)F + j) = ui,j = w(i)·v(j), the model becomes linear in u:
where Φu is an N ×(N ·F ) matrix with the entries Φu(i,
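The interchangeability of w and v can be checked numerically. The sketch below assumes that the decomposed kernel is a weighted sum of one-dimensional Gaussian kernels, K(x, xi) = Σj v(j) kj(x(j), xi(j)); this per-parameter form, the width `gamma`, the toy data, and all names are assumptions for illustration rather than the exact definitions used in this work.

```python
import numpy as np

def per_feature_kernels(X, gamma=1.0):
    """G[n, i, j] = k_j(x_n(j), x_i(j)), a one-dimensional Gaussian kernel per parameter."""
    diff = X[:, None, :] - X[None, :, :]   # shape (N, N, F)
    return np.exp(-gamma * diff ** 2)

rng = np.random.default_rng(2)
N, F = 6, 3
X = rng.normal(size=(N, F))
w = rng.normal(size=N)   # vector (sample) weights
v = rng.normal(size=F)   # feature (parameter) weights
G = per_feature_kernels(X)

# Fix w: an N x F design matrix whose weights are v.
Phi_v = np.einsum("nij,i->nj", G, w)
# Fix v: an N x N design matrix whose weights are w.
Phi_w = np.einsum("nij,j->ni", G, v)
# Flatten both: an N x (N*F) design matrix whose weights are u(i, j) = w(i) * v(j).
Phi_u = G.reshape(N, N * F)
u = np.outer(w, v).reshape(N * F)

y_v, y_w, y_u = Phi_v @ v, Phi_w @ w, Phi_u @ u
```

All three products give the same noise-free output, which is the sense in which the bilinear model can be read as linear in v, linear in w, or linear in u.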
In the RVM, the zero-mean Gaussian prior distribution of w tends to help the model converge to a sparse solution, since it results in the marginal prior distribution over w being a Student-t distribution. A majority of the entries of w are likely to be zero or very close to zero, and thus can be pruned. Similarly, in our proposed Bayesian model, we use a zero-mean Gaussian as the prior of u to achieve sparse solutions. Considering the nature of u, if ui,j is pruned, either the i-th sample (the whole i-th row) or the j-th parameter (the whole j-th column) should be pruned as well. To reflect this, we define a proper prior for u as:
$$p(\mathbf{u} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) = \prod_{i=1}^{N}\prod_{j=1}^{F} \mathcal{N}\!\left(u_{i,j} \mid 0, (\alpha_i\beta_j)^{-1}\right) \tag{15}$$
To see how this new prior works, let us assume that αi → ∞; then all the ui,j, j = 1, ..., F, are zero, i.e. the i-th sample is discarded from the set of relevance vectors. Similarly, if βj < ∞, the j-th parameter is relevant and there should be at least one non-zero ui,j for i ∈ [1, N].
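The posterior covariance and mean of p(u|t, α, β, σ²) under the proposed Bayesian network are found to be:
$$\mathbf{\Sigma}_u = \left(\sigma^{-2}\mathbf{\Phi}_u^{T}\mathbf{\Phi}_u + \mathbf{A}_u\right)^{-1} \tag{16}$$
$$\boldsymbol{\mu}_u = \sigma^{-2}\,\mathbf{\Sigma}_u\mathbf{\Phi}_u^{T}\mathbf{t} \tag{17}$$
where Au = diag(α1β1, α1β2, ..., αNβF) and Φu is the new design matrix defined in (14).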
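The product structure of Au can be illustrated with a short sketch. It assumes the posterior takes the same form as in the RVM with Au = diag(αiβj) in place of A; the function name, the random design matrix, and the toy values are hypothetical.

```python
import numpy as np

def rvfm_u_posterior(Phi_u, t, alpha, beta, sigma2):
    """Posterior of u when u(i, j) has precision alpha(i) * beta(j)."""
    A_u = np.diag(np.outer(alpha, beta).ravel())   # ordering: (i, j) -> i * F + j
    Sigma_u = np.linalg.inv(Phi_u.T @ Phi_u / sigma2 + A_u)
    mu_u = Sigma_u @ Phi_u.T @ t / sigma2
    return Sigma_u, mu_u

rng = np.random.default_rng(4)
N, F = 4, 3
Phi_u = rng.uniform(size=(N, N * F))   # stand-in for the N x (N*F) design matrix
t = rng.normal(size=N)
alpha = np.ones(N)
alpha[1] = 1e12                        # second sample made irrelevant
beta = np.ones(F)
Sigma_u, mu_u = rvfm_u_posterior(Phi_u, t, alpha, beta, sigma2=1e-2)
U = mu_u.reshape(N, F)                 # row i holds u(i, 1..F)
```

A single large αi suppresses the entire i-th row of `U`, so pruning a sample removes all N·F-space weights tied to it at once; a large βj would do the same for a column.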
Efficient Training Algorithm
The marginal likelihood maximization [8] required in training the RVM model is solved in an iterative process similar to the well-known expectation maximization (EM) algorithm. Due to the required matrix operations, the computational complexity in each iteration is O(M³) if there are M relevance vectors. For the proposed RVFM whose architecture is shown in Fig. 2, if there are M relevance vectors and E relevance features, the number of the internal model variables is M·E. Applying the RVM training algorithm will lead to a very high computational complexity of O(M³E³). To address this computational challenge, our proposed efficient algorithm leverages the elegant property that w and v are interchangeable in the bilinear Bayesian network, reducing our model to a standard RVM. As Fig. 4 shows, fixing α and moving the resulting expectation of w into the design matrix (i.e. converting Φu in (14) to Φ in (13)) will reduce every column in Fig. 4 to a single weight vj with its prior βj. Similarly, row-wise reduction by fixing β converts the proposed network to another RVM network with w and α.
The above discussion suggests an efficient iterative training process. In each iteration, we reduce the model row-wise and column-wise, respectively, and update α and β subsequently. The complexity in each iteration is now either O(M³) or O(E³). Note that ui,j = wivj is defined only for the conceptual model development and illustration purposes. In the actual training process, only α and β rather than w, v or u are updated in each iteration.
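The alternating procedure can be sketched as follows. This is not the fast algorithm of [8]: for brevity it uses the simpler fixed-point re-estimation of the precisions known from the RVM literature, keeps σ² fixed, and carries the posterior means of w and v between the two reduced problems. The per-parameter Gaussian kernels, the clipping constants, the toy data, and all names are assumptions for illustration.

```python
import numpy as np

def per_feature_kernels(X, gamma=1.0):
    """G[n, i, j] = k_j(x_n(j), x_i(j)), an assumed one-dimensional Gaussian kernel."""
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-gamma * diff ** 2)

def rvm_step(Phi, t, prec, sigma2, lo=1e-6, hi=1e6):
    """One fixed-point re-estimation of the weight precisions of a linear model."""
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(prec))
    mu = Sigma @ Phi.T @ t / sigma2
    # gamma(i) in [0, 1] measures how well the data determine weight i.
    gamma = np.clip(1.0 - prec * np.diag(Sigma), 0.0, 1.0)
    new_prec = np.clip(gamma / (mu ** 2 + 1e-12), lo, hi)
    return new_prec, mu

def train_rvfm_sketch(G, t, sigma2=1e-2, iters=20):
    """Alternate between an N x F problem in v and an N x N problem in w."""
    N, _, F = G.shape
    alpha, beta = np.ones(N), np.ones(F)
    w = np.ones(N) / N
    v = np.ones(F)
    for _ in range(iters):
        Phi_v = np.einsum("nij,i->nj", G, w)   # fix w: every column reduces to one v(j)
        beta, v = rvm_step(Phi_v, t, beta, sigma2)
        Phi_w = np.einsum("nij,j->ni", G, v)   # fix v: every row reduces to one w(i)
        alpha, w = rvm_step(Phi_w, t, alpha, sigma2)
    return alpha, beta, w, v

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 4))
t = np.sin(X[:, 0])                            # toy target driven by the first parameter
G = per_feature_kernels(X)
alpha, beta, w, v = train_rvfm_sketch(G, t)
```

Each pass inverts one F × F and one N × N matrix instead of a single (N·F) × (N·F) matrix, which is where the saving over direct training on u comes from. Large entries of `alpha` or `beta` mark samples or parameters that the sketch treats as irrelevant.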
EXPERIMENTS
To demonstrate the superiority of the proposed RVFM, we compare its performance with popular learning-based techniques including the SVM [1] and the RVM [7]. We also compare the RVFM with the RFM outlined in Section 2.2 in terms of parameter (feature) ranking.
Variability Analysis of an LDO
Building an accurate regression model for a given analog performance and ranking the process parameters by importance are key to understanding the impact of process variability on analog circuits. Since simulations or measurements are usually expensive, it is of great significance to build an accurate regression model and obtain reliable parameter weighting with a moderate number of samples, which turns out to be a task well handled by the proposed method.
We investigate the process variations in a realistic low-dropout regulator (LDO) design (Fig. 5) proposed in [9]. We build RVFMs to analyze the impact of process variations on LDO specifications including its quiescent current, undershoot of the output voltage Vout and load regulation. Channel length variations of all transistors in the LDO are modeled at the SPICE level using a commercial 90nm CMOS technology design kit. We use various numbers of simulation samples to build a regression model relating the modeled process parameters with each targeted specification and test the accuracies of these models using a testing set of 1,000 simulation samples. The results are shown in Fig. 6.
In this experiment, normalized mean square error (NMSE) is used as the metric to evaluate the performance of the predictors trained with different techniques. As Fig. 6 shows, the RVFM outperforms the popular SVM and RVM in all cases, achieving NMSEs that are one order of magnitude lower.
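For concreteness, one common definition of the NMSE divides the mean square prediction error by the variance of the targets. The sketch below uses that definition; the normalization by the target variance and the function name are assumptions, since the exact normalization is not spelled out here.

```python
def nmse(targets, predictions):
    """Mean square error normalized by the variance of the targets (assumed definition)."""
    n = len(targets)
    mean_t = sum(targets) / n
    mse = sum((t - y) ** 2 for t, y in zip(targets, predictions)) / n
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    return mse / var_t

perfect = nmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # exact predictions
baseline = nmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])   # always predicting the mean
```

Under this definition a predictor that always returns the mean of the targets scores 1, so values well below 1 indicate that the model explains most of the variation.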
We compare the feature rankings produced by the RVFM and the RFM in Fig. 8. To evaluate the quality of the ranking, for each design specification, we train two RVMs only on the process parameters selected by RVFM and RFM, respectively. A parameter is selected by RVFM or RFM if its expected weight is greater than 0.01. This procedure is first applied to the regression model with 20 channel length variations, i.e. the three columns on the left of Fig. 7. Then, the same procedure is applied to an expanded full parameter set of 60 parameters involving variations of each transistor's channel length, oxide thickness and threshold voltage (on the right of Fig. 7). The resulting NMSEs and the numbers of parameters selected indicate that the RVFM produces more reliable parameter weighting and reaches similar sparsity compared to the RFM. We use design knowledge to provide further insights and validation for the parameter rankings of the 20 channel length variations computed by the RVFM. For example, based on the analysis in [10], a major portion of the multi-loop LDO's quiescent current is consumed by the fastest two loops in the output stage, and hence the variation on M2 has a significant impact on the quiescent current. Moreover, variations on M3, M7, M8 and M9 may lead to mismatches and considerable changes at the two output nodes of the error amplifier, one of which is Vg of M2. This analysis matches the ranking shown in Fig. 8(a). The undershoot of the LDO is mainly determined by the load capacitor and the loop bandwidth, which is further determined by the error amplifier (involving M3 ∼ M10), the fast loop in the output stage (M12), and the in-band zero locations defined as:
where ga is the output admittance of the error amplifier defined by the gm of M7 ∼ M10. The ranking of the RVFM in Fig. 8(b) is reliable since it captures all these relevant variations.
Load regulation of the LDO is mainly determined by the DC loop gain, which is the product of the gains of all stages in the loop. The gain of the EA stage is inversely proportional to the gm of M7 ∼ M10, and the second stage comprises M17 and M11. Again, the ranking of the RVFM as shown in Fig. 8(c) successfully identifies all these important variations.
PLL BIST Scheme Optimization
Built-in self-test (BIST) is very effective in detecting operational failures of deployed analog/mixed-signal circuits. Based on the concept of alternative test, efficient BIST solutions can be formed by collecting low-cost test signatures and relating the signatures to targeted performance specifications using statistical prediction models. The effectiveness of BIST heavily depends on the quality of the selected signatures and the tradeoffs between accuracy, overhead, and test time. We apply the RVFM to the BIST of a charge-pump PLL targeting three key specifications: lock-time (LT), frequency overshoot (OVS), and jitter (JT). Fig. 9 shows the PLL along with three BIST schemes using various test signatures. Jitter, frequency overshoot and lock-time are important specifications but cannot be easily measured directly on the chip. To capture failures in those specifications, the first candidate BIST scheme [11] collects the readouts of the counter in the divider as its test signature, while the second scheme [12] collects the accumulated up and dn phase detector outputs via integrators and time-to-digital converters (TDCs). The third scheme is an example of IDDQ testing, measuring the quiescent currents of the charge pump (CP) and the voltage-controlled oscillator (VCO) as test signatures similar to the approach of [13].
The first two schemes operate in a special test mode in which, instead of feeding back the divider output, the one-buffer-delayed reference clock is first fed to the phase detector for 8 reference cycles with a cycle time of 0.1 us. Then, the reference clock input is replaced by the double-delayed reference clock for another 8 cycles. Each cycle generates one signature for Scheme 1 and two for Scheme 2, making a total of 16 and 32 signatures for Schemes 1 and 2, respectively. Scheme 3 reads out two signatures, i.e. the CP and VCO quiescent currents, in the quiescent mode. Recently, learning-based classifiers like the SVM have been trained to perform the failure detection in BIST [11, 14]. To make better use of the collected test signatures, we apply the proposed RVFM in each scheme. We fit the target specification to a sigmoid function before employing the RVFM as a classifier for failure detection. Three classification techniques, the SVM, RVM, and RVFM, are trained with 200 simulation samples and tested with 4,000 samples. The classification errors are compared in Fig. 10, which shows the superior BIST classifier accuracy of the proposed RVFM.
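One way to read the sigmoid step is as a soft pass/fail label that a regression model can be trained on and that is then thresholded for classification. The sketch below follows that reading; the specification limit, the steepness, the 0.5 decision threshold, the convention that larger values mean failure, and the function names are all assumptions for illustration.

```python
import math

def soft_label(spec_value, limit, steepness=10.0):
    """Squash a specification into (0, 1); values above `limit` tend toward 1 (fail)."""
    return 1.0 / (1.0 + math.exp(-steepness * (spec_value - limit)))

def is_failure(predicted_soft_label, threshold=0.5):
    """Turn a predicted soft label back into a pass/fail decision."""
    return predicted_soft_label > threshold

at_limit = soft_label(1.0, limit=1.0)   # exactly on the limit
slow = soft_label(1.5, limit=1.0)       # e.g. a normalized lock time above its limit
fast = soft_label(0.5, limit=1.0)       # comfortably within the limit
```

A sample exactly at the limit maps to 0.5, and the steepness controls how sharply the label saturates on either side.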
In addition, the RVFM also produces reliable ranking among test signatures, which can be further leveraged to improve the efficiency of BIST schemes. For example, the RVFM ranks the 16 test signatures in Scheme 1 as shown in Fig. 11 when building the classifier for jitter failure detection. The tenth signature is the last one with a significant weight. After that, the remaining 6 signatures are of little importance and can be considered as redundant. Using the same procedure, we reduce the test time for each of the three specifications for Scheme 1 as reported in Table 2 .
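The truncation rule can be sketched as follows, using the 0.1 us cycle time of the test mode and one signature per cycle for Scheme 1. The weights below are made up for illustration and are not the values of Fig. 11 or Table 2; the significance threshold and the function name are likewise assumptions.

```python
def last_significant(weights, threshold):
    """1-based index of the last signature whose weight magnitude reaches threshold."""
    last = 0
    for k, wgt in enumerate(weights, start=1):
        if abs(wgt) >= threshold:
            last = k
    return last

CYCLE_TIME_US = 0.1   # reference cycle time of the test mode

# Hypothetical weights for the 16 signatures of Scheme 1.
weights = [0.9, 0.4, 0.7, 0.05, 0.3, 0.2, 0.01, 0.6, 0.02, 0.5,
           0.001, 0.002, 0.0, 0.001, 0.003, 0.0]

cycles = last_significant(weights, threshold=0.01)
test_time_us = cycles * CYCLE_TIME_US
full_time_us = len(weights) * CYCLE_TIME_US
```

With these hypothetical weights the test stops after the tenth signature, i.e. after 10 of the 16 cycles, since later signatures cannot be collected without also running the earlier cycles.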
Assuming that realizing all three schemes on-chip does not lead to significant overhead, we seek to improve BIST accuracy by leveraging the signatures of all the schemes. While combining all the signatures can offer the best accuracy, it may not be completely efficient due to the existence of redundant signatures. For this, we train an RVFM on all the signatures across the three schemes to predict the jitter. Based on the signature ranking shown in Fig. 12, we collect the first three signatures in Scheme 1 and the first signature in Scheme 2. Although the third-to-last signature in Scheme 2 also possesses a notable weight, collecting this signature is not cost-effective in terms of test time, and thus it is discarded. For Scheme 3, only the quiescent current of the VCO is selected, which can be measured in 0.6 us according to [13]. Based on these five selected signatures, we synthesize an optimized combined BIST scheme for each specification and show the results in Table 1. As can be seen, by using the proposed RVFM, the BIST accuracy can be boosted to over 99.88% with a test time reduction of about 40%.
CONCLUSION
This paper proposes a novel sparse Bayesian learning framework named relevance vector and feature machine to capture circuit characteristics and analyze circuit performance dependencies on assorted parameters or signatures via a statistical regression model. The advantages of the proposed framework are demonstrated by variability analysis of an LDO and BIST optimization for a charge-pump PLL. 
ACKNOWLEDGMENT

