I. INTRODUCTION

M
ANAGING leakage is among the most eminent challenges to be hurdled by the integrated-circuits industry, with the advent of deep-submicrometer technologies in the nanometer regime. Several effects contribute to chip leakage currents; weak channel inversion leading to subthreshold leakage, band-to-band tunneling, and gate-oxide tunneling are among the many factors that induce leakage at various levels [1] . Being the dominant leakage component in modern highperformance designs operating at high temperatures [2] , [3] , subthreshold leakage plays a particularly critical role, especially given its high sensitivity to threshold-voltage V th variations and the scaling trends of V th .
High subthreshold leakage comes as a price tag on better performance and reduced active power, as a result of the need to scale the metal-oxide-semiconductor field-effect transistor (MOSFET) threshold voltage (V th ) to accompany the reduction in supply voltages (V dd ) and oxide thickness. This is because the decrease of V th translates as an exponential increase in subthreshold leakage current I sub as can be seen from the following expression of I sub , based on the Berkeley short- channel insulated-gate field-effect transistor (IGFET) model 3 (BSIM3) device model [4] :
where V T is the thermal voltage, and n and I 0 are constant factors related to device characteristics. I sub reportedly increases at a rate as high as about 5× per generation [5] , or tenfold for a 0.1-V decrease of threshold voltage [6] , with total subthreshold leakage current in dense high-performance chips forecast to be about half the total chip current [7] . In today's 1.2-V 0.13-µm technologies, V th is about 0.3 V (25% of V dd ). Compare these figures with older 1-µm technologies, using a supply voltage of 5 V and having a threshold voltage of about 0.8 V (16% of V dd ) [8] . Clearly, the scaling trends of threshold voltage do not keep up with those of supply voltage.
The disparity in scaling the supply and threshold voltages is especially consequential to pursuing aggressive designs, inducing tighter design margins, thus placing process variations under great scrutiny. In particular, since the gap between V dd and V th has narrowed, variations in the level of supply voltage become very significant when it comes to meeting timing closure. The network that delivers supply voltage to circuit devices is referred to as the power grid. The consequences of power-grid voltage-level variations in the estimation of a circuit timing vulnerability are being currently researched [9] , [10] , and it has been reported in an industrial survey that 50% of chips would fail to meet their timing requirements if no power-grid verification were done prior to tapeout [11] . However, modern power-grid-verification tools fail to account for the impact of the statistical variations of leakage on the power-grid voltage levels. Our present work addresses this issue, and we focus on subthreshold leakage, referred to thereafter simply as "leakage." Preliminary versions of this work appeared in [12] and [13] .
If leakage were uniform or predictable across the chip, then the contribution of leakage to the supply voltage drop can be simply accounted for by adding a deterministic leakageinduced component to the total voltage drop on the power grid. However, this is not the case and process variations in V th and the transistor channel length cause exponentially larger variations in leakage currents, as a direct consequence of the exponential dependence of leakage currents on V th or channel 0278 -0070/$20.00 © 2006 IEEE length [4] , [14] . In the 0.1-µm technology, it is possible to get 30 mV standard deviation in V th [15] . Considering a supply voltage of, say, around 1 V, this means that a ±3σ interval for V th would span 18% of the supply! For whole chips, leakage variations have been measured at almost 20×, leading to a slowdown of the chip operating frequency by a factor of 1.4 [16] , [17] . These variations are also known to have a significant within-die local component [18] , [19] , so that transistors in close proximity on the layout can have significant variations in their leakage currents. This is also referred to as mismatch and its effect on delay has been studied [20] .
Recent publications have focused on the characterization of leakage currents subject to process variations. In [14] , an approach was proposed to estimate the leakage current considering within-die variations. In [21] , the sensitivity of leakage currents was studied with respect to physical parameters, including channel length, oxide thickness, and channel doping. This last approach was extended in [4] , resulting in a technique for computing the probability distribution function (pdf) of leakage currents, including die-to-die and intradie variations. Iterative methods were suggested in [22] to estimate chip leakage considering variations in the supply voltage and temperature distribution across the chip, and a probabilistic approach was introduced in [23] to estimate statistical parameters of the chip leakage current.
We will assume statistical variability of leakage currents, i.e., that the statistics of leakage currents are available, with the knowledge that both Monte Carlo and analytical techniques can be used to obtain this information [4] . While worst case analysis is necessary when considering powergrid voltage drop due to global correlated die-to-die variations [18] in leakage currents, intradie (or within-die) variations require statistical analysis in order to avoid overly pessimistic conclusions. There are no good tools today to estimate this statistical voltage drop on the power grid due to leakage-current variations.
The paper is organized as follows: Section II presents the equations that provide a model for the power grid and presents a breakdown of currents loading the grid in order, isolating the components that represent leakage-current variability. Section III further discusses leakage-current variations, where it is argued that corner-case analysis should be used to account for die-to-die variations, but that statistical analysis is necessary when dealing with within-die variations. Focusing on within-die variations, we first propose an analytical probability distribution model for the voltage drop at every node in Section IV, and we derive a relation between the secondorder statistics of leakage currents and grid voltage drops. Observing that the main computational difficulty lies in variance calculations, we propose a numerical Monte Carlo technique to estimate the variances and covariances in Section V. In Section VI, we suggest a statistical verification for power grids that is aware of leakage variability, and the lognormal voltage-drop model is used to derive verification conditions. Section VII derives direct and iterative criteria to check these conditions, where the Monte Carlo technique proves very useful again. Results are shown in Section VIII and we conclude in Section IX.
II. SYSTEM MODELS
A. Power-Grid Equations
We consider a resistance-capacitance (RC) model of the power grid [24] . Let t be the total number of nodes on the grid. C4 sites make up for nodes with a voltage source of value V dd to ground attached to them. Let p be the number of such sites, then t = N + p, where N is the number of nodes with no voltage source attached. These nodes are of interest since the voltage level of the p C4 sites is known. Let c k (k = 1, . . . , N) be the capacitance from every node to ground and C the diagonal matrix of all such capacitances. Let i k (t) be the value of the current source to ground attached to node k [note that if no current source is attached to some node, then i k (t) = 0, ∀t]. Let i(t) be the vector whose kth component is i k (t). Let G be the conductance matrix [25] of the power grid, obtained from the resistive branches. Let u k (t) be the voltage at node k and u(t) be the vector of all such node voltages. Modified nodal analysis [25] applied to the power grid leads to the following system equation:
where every component of the vector
be the voltage drop at node k, and V (t) the vector of voltage drops, then (2) can be written as
This is a revised equation for the node voltage drops, where the voltage sources have been set to 0 and the direction of the currents reversed. In the following, we will be mainly concerned with the above revised system. When the circuit is in standby state, we solve (3) in direct current (dc) state, discarding the capacitance matrix C and eliminating the time dependence from the voltage drops and currents, to yield
In the sequel, we use the static power-grid model to derive the distribution of the node voltage drops on the power grid and their variances/covariances, due to within-die leakage-current variations. The transient model is used to calculate the vector of voltage-drop means (see Section IV).
B. Breakdown of Current Components
Since the power grid has the property of being a linear system as shown in (2), the grid response, i.e., its voltage drops due to circuit currents, can be obtained simply by adding up the grid response to individual current components. In particular, both active and leakage-current effects ought to be considered when determining the grid response, which can be done by superposition of individual responses to both switching and leakage currents (static and dynamic).
Process-induced variations on the leakage component of the total current are relevant in the context of this work. It is standard practice to break down these process variations into dieto-die and within-die components [26] , [27] . For the variations on a given parameter, the die-to-die component takes the same value for all instances of that parameter within a single chip, but differentiates among distinct chips. The within-die component causes variations within a given chip, depending on location.
We therefore model the total current as follows:
where I active refers to the active current, I sub,nom is the nominal value of leakage current at the design point, I sub,dd is the die-to-die component of leakage variations, and I sub,wd is the within-die component of these variations. By linearity of the power grid, we can sum up the contributions of the four current components in (5) to get the total voltage drop on the power grid as follows:
Voltage drops on the power grid induced by active currents are calculated on either a transient basis or on a dc basis, using average or peaks of active currents, or combinations thereof. Typically, this is what power grid verification tools perform today. Clearly, active currents and nominal leakage constitute deterministic cases. This leaves inter-and intradie variations to be incorporated in the analysis and verification of the power grid.
III. DEALING WITH LEAKAGE VARIABILITY
A. Die-to-Die Variations
We recall the following monotonicity property of the power grid [28] .
Proposition (monotonicity): If v(t) is the voltage drop due to i(t) and v * (t) is the voltage drop due to i * (t), then the power grid has the following property:
which we will express by saying that the grid is monotonic. Note that the monotonicity property applies to both dc and transient power-grid analyses. A similar result was earlier proven [29] for the special case of an RC tree driven by a single voltage source. The monotonicity property allows one to handle die-to-die variations easily.
Since die-to-die variations are correlated across the chip, corner-case leakage currents on a given chip (due, for example, to largest variations in transistor threshold voltage) are not unlikely and are accounted for through case-file analysis, since, by virtue of the monotonicity property of the grid, these corner leakage currents correspond to corner voltage drops on the grid. Therefore, it becomes very easy to calculate V I sub,dd for all power-grid nodes, by setting the components of the dieto-die leakage variations at the top of their range (e.g., at the mean + 3σ point) and solving the grid once [using (3) or (4)], then setting them at the bottom of their range (e.g., at the mean − 3σ point) and solving the grid again. Cases other than the ±3σ corners (e.g., the median or 50th percentile of leakage) can be used to reflect more aggressive verifications.
B. Intradie Variations
Handling within-die variations is inherently more difficult than die-to-die variations, primarily because the currents are not all correlated, so that occurrence of a corner case is highly unlikely.
Within-die variations have both systematic and random components. Systematic variations arise from the observed waferlevel variation trends reflected on a given die [18] , [30] or the spatial locations of some features on the die and their context in terms of the neighboring layout patterns. They account for the local layout-dependent correlations due to physical parameter variations across a chip, and the correlations that arise from "environmental conditions" [23] , such as supply voltage and temperature. Random variations constitute the residual component of the total variations, which, in essence, cannot be explained systematically [30] , and are modeled as statistically independent random variables (RVs). Ideally, one wants to extract systematic variations and treat them as "deterministic" [31] . This approach, however, necessitates detailed variation models and is hard to put to use prelayout. It is standard practice today to ignore the systematic within-die component, at least in the early stages of the design, for several reasons: 1) systematic trends are layout-dependent, so that incorporating systematic variation models can occur only postlayout; 2) within-die correlation generally dies down quickly across short distances and 3) detailed variation models are often not easily available or readily usable by the designers.
These considerations extend to the special case of withindie leakage-current variability. Specifically, within-die leakage variations are induced by within-die variations in physical device parameters such as threshold voltage and channel length [14] , and are also affected by supply voltage and temperature [22] . This work focuses on within-die leakage variations induced by threshold voltage tolerances within-chip. We will proceed under the assumption that leakage currents in different areas of a chip are statistically independent, which effectively amounts to considering all within-die V th variations on a random basis. To emphasize this, we make explicit the following.
Assumption 1: At any time point, we model all within-die leakage-current variations as statistically independent.
Since corner-case analysis would yield overly pessimistic predictions of the effect of intradie variations because of their locality, statistical analysis needs to be effectively put to use. Analytical methods can be found in the literature to derive expressions for the statistics of within-die leakage-current variations (see [4] and [14] ), namely, their means and variances. Given such distributional information, our problem is to estimate the means and variances (and possibly covariances) of the corresponding grid voltage drops V I sub,wd . The rest of this paper develops the statistical analysis and verification of the grid in response to intradie leakage variations.
IV. STATISTICS OF NODE VOLTAGE DROPS
A. Distributional Model for Node Voltage Drops
It is helpful to distinguish between two types of leakage in integrated circuits. A circuit certainly draws leakage current when it is in standby or sleep mode, which may be referred to as the standby leakage. The circuit also draws leakage current when it is active. Indeed, a logic gate draws leakage current any time that its supply is "ON." Even inside a switching window, part of the current drawn from the supply may be attributed to leakage. The leakage drawn by the circuit during its active (nonstandby) states may be referred to as the dynamic leakage. The grid response to standby leakage may be obtained by a dc analysis of the grid, using only a resistive model, whereas response to dynamic leakage requires a transient analysis, using and RC or a resistance-inductance-capacitance (RLC) model of the grid.
In order to characterize the distribution of voltages at the nodes of the power grid, we have found it necessary to introduce the following "pseudostatic" assumption.
Assumption 2: In order to obtain the distribution of the power-grid voltage drops in response to within-die variations in leakage currents, we assume that the grid may be solved as a dc system at every/any time point. This is purely a simplifying assumption which helps arrive at a precise characterization for the voltage drop distributions. Notice that this assumption is automatically true for the case of standby leakage. Thus, our analysis is exact for standby leakage. For dynamic leakage, since the leakage current of a logic gate is constant when it is not switching, then this assumption may be acceptable in practice, especially since we will include in the analysis some dynamics of the system through the computation of the mean response (see Section IV-B).
With this assumption and using (4), we have
where we consider I sub,wd to be the random vector representing within-die leakage variations, and V I sub,wd the corresponding grid-response vector, with reference to (5) and (6) . In the following, and for simplicity, we drop the subscripts indicating within-die leakage variations.
From (8), it is clear that the voltage drop at an arbitrary node i, V i , can be expressed as
where q ij is the (i, j)th entry of G −1 . As (9) shows, the voltage drop at any node is a linear combination of the leakage currents loading the grid.
Since intradie leakage variations arise from intradie variations in transistor threshold voltage, modeled as Gaussian [14] , then intradie leakage variations are lognormal [4] , by virtue of the exponential relation between transistor leakage and its threshold voltage shown in (1) .
If the I j are lognormal, then the q ij I j are also lognormal. Hence, under Assumption 1, grid node voltages, given in (9) , are each a summation of independent lognormal RVs. Sums of independent lognormal variables have been extensively studied and characterized in the literature pertaining to communications, and it was found that such sums can be accurately captured by another lognormal RV [32] . Therefore, we model the distribution of a node voltage drop as lognormal. We provide empirical data to corroborate this argument in Section VIII-B.
Since
where µ W i and σ W i are, respectively, the mean and standard deviation of W i . The cumulative distribution function (cdf )
where Φ(·) is the cdf of the standard normal distribution (Gaussian with 0 mean and unit variance). From (10) and (11), it is clear that the distribution of V i has two parameters: the mean and standard deviation of the associated Gaussian RV W i . µ W i and σ W i are related to the mean µ V i and standard deviation σ V i of V i as follows [33] :
From (12) and (13), we clearly see that computing the mean and variance of the voltage drop V i completely specifies the distribution of V i .
B. Distribution Parameters
Mean calculation is actually trivial. Since the system (3) is linear, then due to the linearity of the mean (E[·]) operator [33] , one can write
Thus, if we solve the system (3) once, using simply the means of leakage current variations as inputs, the solution gives the voltage means at all the nodes, obtained from the dynamic model of the grid. Note that mean calculation may be performed on a transient basis, and introduces dynamic properties of the power grid into our formulation.
We proceed to derive the covariance matrix of the node voltage drops induced by the random vector of within-die leakage current variations. Recall that the off-diagonal entries of the covariance matrix of a random vector represent pairwise covariances, while the diagonal entries are the variances of the corresponding RVs [34] . Under static conditions (Assumption 2), we can write from (14)
We now combine (4) and (15) to yield
Multiplying each side by its transpose and applying the expected value operator to each side leads to
We recognize the expectations on the left-and right-hand sides of (17) as being simply covariance matrices of node voltage drops Σ V and within-die leakage-current variations Σ I , respectively. Thus, the above result can be rewritten as
Since G is symmetric, G = G T . Therefore, (18) becomes
Under the assumption of statistical independence of within-die leakage-current variations (Assumption 1), implying that Σ I is diagonal, it can be seen from (19) that
where [·] ij is the (i, j)th entry of a matrix, q ij is as defined in (9), and σ
is the variance of the ith leakage current, that is, the (i, i)th entry of Σ I . Specifically, the variance σ
Let Var(·) be the operator that takes a random vector and returns the vector of variances of each component of that random vector, respectively. Define a matrix G −1 (2) such that
Now, (21) can be written in matrix form as
From the above discussion, we have established that calculating the voltage drop means can be easily done. However, we observe from (19) - (23) that computing the covariance matrix of the voltage drops or even only their variances requires full knowledge of the cumbersome G −1 , the inverse of the grid conductance matrix. The next section discusses estimating these second-order statistics efficiently.
V. VARIANCE/COVARIANCE ESTIMATION
A. A Current-Sampling Approach
One way of estimating the (output) variance/covariance of voltage given the (input) variance of current is to proceed by brute-force random sampling on the currents. Generate a randomly chosen vector of current values, according to the lognormal distributions of the currents, and solve the system (4) for a corresponding sample of the voltages. Do this repeatedly and collect statistics on the voltages. Stop the sampling when the desired statistics have been estimated with sufficient accuracy. We have implemented this technique and found that it is viable in some cases, but that it suffers serious disadvantages regarding the accuracy of the variance/covariance estimates: As compared with the column-sampling approach, to be presented below, up to ten times as many variance/covariance estimates may not reach convergence within the same execution time. Further, even in the suggested simple brute-force scheme, since the sample variance can be related to a χ 2 distribution, one will be required to evaluate certain percentiles of the χ 2 distribution for every sample, in order to check whether an estimator of the variance/covariance has converged. This can get very expensive because it requires numerical integration of the χ 2 pdf-using a table is not a practical option because the number of samples can be huge. As such, this currentsampling approach will not be discussed further; in contrast, we present a numerical Monte Carlo technique, based on a column-sampling method, which effectively transforms the variance/covariance-estimation problem to a mean-estimation problem, so that simple Monte Carlo methods, based on the central limit theorem, can be applied.
B. A Numerical Monte Carlo Approach
We propose a numerical Monte Carlo technique to estimate variances/covariances of node voltage drops [12] , [35] . Observe that (20) is a weighted summation. (20) can be rewritten as
Since N k=l p l = 1 and p l ≥ 0, l = 1 · · · N , then we can view the p l weights as being probability values associated with the q il q jl values, so that the summation in (24) becomes the mean (weighted average) of all the q il q jl values in the ith and jth rows. If we define an RV r ij as being a discrete RV that takes the values r ijl = q il q jl with probabilities p l , l = 1, 2, . . . , N, then we can write (24) as
Let the mean of r ij be µ ij = E[r ij ] and its variance be v ij .
We can now use methods of mean estimation from statistics, basically Monte Carlo random sampling [36] , in order to estimate the population mean µ ij using the mean of a much smaller sample (say, of size n N ) from the population, i.e., using the sample mean.
Using a weighted random-number generator, we generate, according to the probabilities p l , a sequence of indices of columns of G −1 to be included in the sample. From these, we form the following sample mean for any pair of rows i and j: (26) where L is the set of indices included in the random sample.
We also compute the sample standard deviation of r ij , s ij ≥ 0 given by
Note that s 2 ij is an estimator of v ij . Now,r ij itself, being a sample mean, can be considered as an RV, with mean µ ij (since the sample mean is an unbiased estimator of the mean of an RV [33] ) and variance s 2 ij /n (for large n). Then, we have that
is an unbiased estimator of
ij /n. Furthermore, by the central limit theorem [33] ,r ij will be approximately normally distributed, so that the RV (σ ij −σ ij ) √ n/(Ss ij ) is normal with zero mean and unit variance [36] , for large n. 139 , r 135 , and r 137 , with corresponding sample meanr 13 and sample standard deviation s 13 . The sample mean and standard deviation are updated when column 2 is sampled. This process continues until convergence ofr 13 , at which point Sr 13 is taken as an estimator of the covariance between the voltage at nodes 1 and 3. As will be discussed below, the convergence of the sample mean is directly related to how small the sample standard deviation becomes, and to the number n of samples taken. Observe that G −1 is a symmetric matrix, so that when columns i and j of G −1 are sampled, the covariance between V i and V j and the variances of V i and V j can be computed exactly, as per (20) . With reference to Fig. 1 , since columns 9, 5, 7, and 2 have been sampled, the variances of, and pairwise covariances between V 9 , V 5 , V 7 , and V 2 can be computed, thus eliminating the need to bring estimates for these quantities to converge via the sampling process.
We can use tables of the standard normal to establish how large n should be in order for the sample standard deviation ofr ij to be small enough for it to be a viable estimator of µ ij , with a certain confidence, and up to a predefined tolerance [36] . For example, if it is desired to have (1 − α) × 100% confidence (where α is a small positive number, 0 < α < 1) that the following is true: then it is known from sampling theory [36] that n should be larger than n 0 where
where z α/2 is such that the area to the right of it under the pdf of the standard normal curve is equal to α/2, i.e., Φ(z α/2 ) = 1 − α/2. Thus, for instance, for 95% confidence, α = 0.05 and z α/2 = 1.960; for 99% confidence, α = 0.01 and z α/2 = 2.576. In practice, one samples until n is larger than 30 or 50 or so, then starts to use (30) to monitor convergence. Observe that tables are not needed to compute z α/2 , a task that can be easily done by software [e.g., using the erf(·) function] and that z α/2 needs to be evaluated only once. The process described above can be used to estimate any entry in the covariance matrix Σ V of the node voltage drops. In particular, the diagonal entries of Σ V represent the variances of these voltage drops and can be estimated by setting i = j, for i = 1, . . . , N in the above calculations. In the following, we are mainly concerned with the described Monte Carlo method applied for variance calculations.
C. Error Bound for Variance Estimation
Our intent here is to approximate the variance of every node voltage by choosing a meaningful error bound for use in (30) , with i = j, to check for the convergence of the estimated variance value. Using the same notation as in the previous section, we have from (28) that
Let δ be a small positive number, 0 < δ < 1. It makes sense to achieve a user-defined error bound on σ V i defined relative to the supply voltage V dd as follows: In other words, we want to find [to be used in (30) ] as a function of δ in order for the following to be true:
To simplify the notation, let x = µ ii and x 0 =r ii . Also, let y = √ µ ii = √ x and y 0 = √r ii = √ x 0 , and let γ = δV dd / √ S. Notice that γ > 0. We want to find in terms of δ so that
There are two cases to consider, according to whether y 0 is small or not, as shown in Fig. 2 . When y 0 is small, small enough so that y 0 − γ < 0, then (since y > 0 in all cases) in order to guarantee that |y − y 0 | < γ, it is sufficient to impose an upper bound on y in the simple form y − y 0 < γ. This is achieved by imposing an upper bound on x in the simple form x − x 0 < 1 , where 1 is the corresponding upper bound on (x − x 0 ), as shown in Fig. 2(a) . Let ∆x = x − x 0 and ∆y = y − y 0 . Since y 2 = x, then ∆(y 2 ) = ∆x, and since ∆(y 2 )=(y 0 +∆y) 2 − y 2 0 = (∆y) 2 + 2y 0 ∆y, then ∆x = ((∆y) 2 + 2y 0 ∆y). For ∆y = γ, 1 = ∆x = (γ 2 + 2y 0 γ) > 0. In order to guarantee (34) , in this case, we need to set = 1 = (2y 0 γ + γ 2 ). When y 0 is not so small, i.e., when y 0 − γ > 0, then we need to consider both upper and lower bounds, as shown in Fig. 2(b) . If 2 is the lower bound on (x − x 0 ), as shown in the figure, then we may compute 2 by setting ∆y = −γ, which leads to ∆x = (γ 2 − 2y 0 γ) < 0, and we set 2 = −∆x = (2y 0 γ − γ 2 ) > 0. Since 2 < 1 , as is obvious from the expressions found for each, then, in order to guarantee (34) in this case, we need to set = 2 = (2y 0 γ − γ 2 ). Notice that the condition y 0 > γ translates tor ii > δ 2 V 2 dd /S. In summary, then
Notice also that, in either case, > δ 2 V 2 dd /S. Plugging (35) into (30) leads to
where the + or − sign depends on whetherr ii is smaller or larger than δ 2 V 2 dd /S, respectively. To summarize, if it is desired to estimate σ V i to within ±δV dd , with (1 − α) × 100% confidence, then n must be larger than the threshold given by (36) . This provides a useful tradeoff between accuracy and speed, as more samples would be required for smaller δ.
By transforming the variance-estimation problem into a mean-estimation problem, we were effectively able to apply Monte Carlo methods to estimate the variance of each node voltage drop. Our sampling procedure, where the likelihood of a column to be sampled is driven by the current load at the corresponding node, intuitively amounts to giving a greater chance for a node with high-current variance and the corresponding column of G −1 to be sampled, which amounts to treating at as likely to have a greater impact on voltage-drop variances on the grid than nodes with lower current variances. The error bound (δV dd ) and confidence level (1 − α) impose a statistical bound on the error. The Monte Carlo technique along with the stopping criteria derived above thus enable the estimation of the variances of grid voltage drops, while circumventing the problem of knowing the full inverse of the grid conductance matrix. Since information on the mean voltage drop is readily available (Section IV-B), we have all parameters necessary to complete the description of the probability distribution of the node voltage drops in response to within-die leakage variations, given in (10)-(13).
VI. POWER-GRID VERIFICATION
A. Verification Methodology
Within-die variability of power-grid node voltage drops appears as a process-induced background level of noise on the grid. In this section, we propose to verify the grid in the presence of this noise [13] . The idea is to verify whether the "bulk of the distribution" at any/every every node falls below a userspecified "safety threshold-voltage level." The user specifies what exactly is meant by the bulk of the distribution, by specifying a percentage of the pdf to be below the threshold so that a node may be deemed "safe." The safety threshold voltage and the confidence level can be chosen to reflect aggressive designs or more conservative choices. For instance, when designers are able to make a statement such as: "Node i is safe if, with confidence 1 − β i , the voltage drop at i is less than V Ti volts," the approach described in this paper enables the verification of such a statement for all nodes, specifying parametrically the levels of "high confidence" 1 − β i and "safety threshold voltage" V Ti . This leads to a statistical definition of safety, which we state as follows.
Definition 1: Node i is said to be safe if
In the above, P{} denotes a probability, V i the voltage drop at node i (due to within-die leakage variations), V Ti the safety threshold voltage at node i, and β i a small positive number between 0 and 1. The term 1 − β i specifies the bulk of the voltage drop that is to be below the threshold V Ti to assert that a node is safe, and we would refer to it as the safety parameter. The choice of the safety parameter and the safety threshold voltage at a given node takes into account how critical a high voltage drop at that node is, and we made explicit their dependence on the node. Given safety parameters and threshold voltages at all nodes, our objective is to verify each node in the grid, i.e., tell which nodes/regions are safe with respect to the background noise, and which present a hazard.
Let V 1−β i be the 1 − β i percentile of the voltage drop at node i, i.e., V 1−β i is such that Thus, V 1−β i becomes a figure of merit of the verification procedure. It can be viewed as a parametric measure of the voltage drop on the grid nodes taking into account process variations on the leakage currents: If the safety parameter is 50%, this percentile is the median voltage drop, 90% and 95% represent more conservative measures, and 100% represents an upper bound on the voltage drop. The verification problem reduces to determining whether the 1 − β i percentiles of voltage drops are greater or less than a given safety threshold for any node i.
Let z β i be such that Φ(z β i ) = 1 − β i . From (11), we have
where W i = ln(V i ) is a Gaussian RV with mean µ W i and standard deviation σ W i , as defined in Section IV-A. From (37), and using (12) and (13), we can derive an expression for V 1−β i in terms of the mean and variance of V i :
The above equation relates V 1−β i to µ V i and σ
. We argued in Section IV-A that the mean can be easily computed. This leaves the variance of voltage drop as the only parameter on which V 1−β i depends, and we write V 1−β i (σ
) to emphasize this dependence. The following outlines our verification methodology. We first rewrite the safety/unsafety definition (Definition 2) in terms of voltage variances and derive critical variance values for verifying the node safety status (Section VI-B). Then we derive quick direct checks to see where the actual variance falls with respect to those critical values (Section VII-A). For nodes where direct checks do not provide conclusive information, we make use of the numerical Monte Carlo technique described in Section V-B to estimate the variance of the voltage at these nodes (Section VII-B), and hence, the 1 − β percentile voltage drops, and determine the node safety status.
B. Critical Variances
, and using (38) , one can easily show the following.
3) lim ). In order to translate the safety/unsafety criteria on V 1−β i to conditions on the variances, nodes will be divided into three groups according to the values of V Ti , µ V i , and z β i .
Group 1: Includes all nodes i such that
These nodes will all satisfy V 1−β i < V Ti , for all possible values of the variance of their voltage drop (0 < σ
, and therefore are safe irrespective of their variances.
The reason nodes satisfying the Group 1 criterion are always safe can be seen graphically by drawing a horizontal line at
. Such a line is always above the curve representing V 1−β i , as can be noted from Fig. 3 . Therefore, V 1−β i < V Ti , and these nodes are safe. 
Group 2:
Includes all nodes i such that
A node in this group is safe if and only if (iff ) the variance of its voltage drop σ
Conversely, a node in this Group is unsafe iff σ 
[see Fig. 4(b) ]. With this, determining whether a node i is safe or unsafe reduces to knowing where the variance of the voltage at that node is located with respect to σ Observe that the group of each node as well as critical variances are known automatically, and require no a priori knowledge of the variance of the node voltage drop, since z β i is directly obtained knowing β i , V Ti is a user-defined parameter, and the means of the voltage drops are easily calculated using (14) .
VII. CHECKING VOLTAGE VARIANCES AGAINST CRITICAL VALUES
A. Direct Criteria
In this section, we make use of bounds that may directly reveal whether V 1−β i is greater or less than the safety threshold V Ti , without having to compute the variances. From the previous section, if a node is in Group 1, then this node is safe for all possible values of the variance. The following provides similarly useful checks.
For convenience, we reproduce (21)
where q ij was defined to be the (i, j)th entry of G −1 . Since σ I j ≥ 0 and q ij ≥ 0, ∀i, j, then (43) yields
Let Std(·) be the operator that takes a random vector and returns the corresponding vector of standard deviations. The above leads to the following upper bound on deviations of node voltage drops:
where the vector inequality is defined componentwise. Given a lower-upper (LU) factorization of G, and the variances of the within-die-variability components of leakage currents, the cost of computing the upper bound given in (45) is only one forward/backward solve. Now, using the notation of Section V-B, we can write from (24)
This leads to a lower bound on Std(V) as follows:
where Var(·) is the variance vector operator, defined in Section IV-B. Given an LU factorization of G, the cost of computing this lower bound given in (48) is only one forward/ backward solve.
Putting (45) and (48) together yields the following interval on Std(V)
The above inequality provides an upper and a lower bound on the standard deviations, and hence the variances, of the voltage drop at each node on the power grid, at the cost of two forward/backward solves.
To put these results to work, let u 
B. Verification Through Column Sampling
For nodes in Group 2 or Group 3, where the variance bounds given in (49) do not provide conclusive information regarding their safety status, we can estimate their voltage-drop variances so as to determine the location of these variances with respect to critical values. For variance estimation, we make use of the numerical Monte Carlo method described in Section V-B.
Observe that the variance estimates obtained by the Monte Carlo technique inherently carry a certain error, bounded by a given confidence level (as in Section V). Some error will then be propagated to the estimates of the 1 − β percentiles of voltage drops, as per (38) , so that a certain confidence level will be required when asserting whether V 1−β i is greater or less than V Ti . Accordingly, we extend Definition 2 to consider that a node is safe if
where α is a small number between 0 and 1. In this perspective, for the nodes successfully checked in Section VII-A, α = 0.
Establishing the desired confidence level on the relative position of V 1−β i and V Ti for node i translates to establishing that same confidence level on the relative position of the variance of the ith node voltage and the corresponding critical variances (σ node i is in Group 3). Therefore, when applying the Monte Carlo technique for variance estimation, the estimate of the variance per se matters less than where the variance is with respect to critical values. For example, suppose node i is in Group 3, variance estimation is only needed in order to obtain high confidence on whether σ
is less than (greater than) σ 2 3,i , so that the node can be marked as unsafe (safe). Therefore, while the sampling procedure, as described in Section V-B, is unchanged, new error bounds, i.e., stopping criteria, must be derived.
A few cases need to be considered. If node i is in Group 2, the estimate at node i of the variance of the voltage dropσ ii (following the notation of Section V-B) may fall to the left of σ We recognize that the probability of the true variance being outside the interval where the variance estimate lies is not very high, and indeed for α small enough, less than 1 − α. Based on this, in the sampling process, we seek to find the smallest number of samples n that verifies, with confidence 1 − α, whether the node is safe or unsafe, according to the interval where the variance estimate lies.
Assume node i is in Group 2. Then, as was shown in Section VI-B,
1,i , 1 − α confidence that the node is safe is reached when
The above can be written as
where Var(·) represents the variance of an RV. From the arguments presented in Section V-B, we know that (σ
ii /n, where S and s ii are as defined in Section V-B, and n is the number of samples. Therefore, the above condition reduces to
That is, ifσ ii < σ 2 1,i , then a 1 − α confidence level that node i is safe is attained iff n satisfies (50). Note that (50) does not give a closed-form solution for n, but it is easy to check it using the erf(·) function. Observing that in this case σ ii < σ 2 1,i , we can obtain a closed-form sufficient condition for n to satisfy in order to verify safety, by neglecting P{σ
where z α is such that: Φ(z α ) = 1 − α. Identical reasoning applies whenσ ii > σ 2 2,i , and we obtain that the necessary and sufficient condition on n to verify safety is
and that a sufficient condition that yields a closed form for n is
Finally, if σ
2,i , we need to find the number of samples that will establish 1 − α confidence that node i is unsafe, i.e., that σ 
In order to write a closed-form sufficient condition, we recall that a 1 − α confidence interval on σ 2 V i is given by [36] 
So, to obtain a 1 − α confidence level that σ 
Obtaining bounds for nodes in Group 3 is easier as we have only one critical variance. Ifσ ii > σ 2 3,i , then we need to check for safety, i.e.
P σ
This is equivalent to
Note that (54) is a necessary and sufficient condition on n to achieve 1 − α confidence that node i is safe. Similarly, if σ ii < σ 2 3,i , we obtain 1 − α confidence that node i is unsafe iff n satisfies
Note that n 4 = n 5 , and that (54) and (55) are closed-form necessary and sufficient conditions on n, the required number of samples. The following summarizes convergence of node i after n samples.
If node i is in Group 2:
Ifσ ii > σ 
C. Residual Nodes
It can be seen from (51)-(55) that the number of samples required to establish the desired confidence level may be large if the estimated variance is very close to σ With an estimateσ ii of σ
, we use (38) to define a natural estimator of V 1−β i given bŷ
Hence, ifσ ii is close to σ 
Knowing the variations of
(see Fig. 3 ), it is easy to determine the point
, described in Section VI-B, and depending on the values ofσ ii , v 1 , and v 2 , v m can be equal to v 1 , v 2 , or µ
, as follows V ub,i can then be written as
(57) Fig. 5 illustrates a case for a residual Group 3 node.
Residual nodes are simply nodes having their V 1−β very close to the safety threshold voltage that it becomes difficult to tell whether they are safe or not. They are on the verge of safety or unsafety, as they were defined. Observe that V ub,i is always greater than V Ti (otherwise node i would have converged), therefore, V ub,i − V Ti can be viewed as the required increase in the safety threshold on node i to establish safety on that node.
VIII. RESULTS
A. Test Grids
The proposed analysis and verification methods have been implemented and tested on a number of test-case grids. Not having access to power grids from industrial designs, and because we need a large number of grids under different conditions, we have opted to generate a number of grids ourselves. In our test grids, technology and topology parameters are user specified. The grid-generation process is automatic and employs a pseudorandom-number generator, based on the builtin function random(·) in C. Starting with a square uniform grid of a given size, we proceed to randomly delete a userspecified percentage of nodes (which we call the degree of nonuniformity), thus rendering the grid structurally nonuniform. Typical geometric and physical grid characteristics (e.g., grid dimensions) and characteristics of the fabrication process (e.g., sheet resistance of a particular level of metallization) are given by the user, leading to an initial value of the conductance of every branch. When a node is deleted, the conductances of the remaining surrounding edges (branches) are increased by a random amount around a user-specified percentage of their initial values. The rationale behind this is to allow the nonuniform grid to be loaded with currents comparable to its uniform predecessor while exhibiting comparable current-resistance (IR) drops. The number of V dd (C4) sites is supplied by the user and C4s are placed randomly on the grid. The C4s and current sources are then distributed at random over the grid nodes. The user also specifies a typical value of leakage-current variance and mean and the assigned values are generated randomly around these typical values. Current sources are then distributed at random over the grid. All results were obtained on a 1-GHz dual-processor Sun Fire server with 4.0 GB of main memory. The implementation was carried out by writing C programs, and LU factorization and forward/backward solves done through the package UMFPACK [37] , using Basic Linear Algebra Subprograms (BLAS) routines from the Sun Performance Library [38] . Fig. 6 corroborates the fact that voltage drops are well modeled by a lognormal distribution. We randomly generated n s = 10 000 independent load vectors, with each individual current load following a lognormal distribution, and with different current loads in any given vector being statistically independent from one another. We collected voltage-drop data on all grid nodes, in response to each of the randomly generated load vectors, then computed the natural logarithms of each node voltage drop, i.e., ln(V i,j ), where V i,j is the voltage drop of node i and in response to current vector j, for i = 1, . . . , N, and j = 1, . . . , n s . We then obtained the sample meanμ i and sample standard deviation s i of each ln(V i ), for i = 1, . . . , N. The y-axes in Fig. 6 represent voltage-drop data transformed as (ln(V i,j ) −μ i )/s i , ∀i, j. The point being to verify lognormality of all voltage drops, the task becomes checking these transformed voltage drops, which are statistically independent, against a standard normal distribution. This was done graphically, and the data were fit on normal scores plots [33] , shown in Fig. 6 . As can be seen from the figure, voltage drops showed good fits, except for certain outliers, validating the choice of lognormal distributions to model the voltage drops on the power grid, induced by independent within-die leakagecurrent variations. Fig. 7 verifies the accuracy of the proposed column-sampling method for variance estimation. The correlation plot in Fig. 7(a) shows the estimated standard deviations versus exact ones, for a grid of 138 500 nodes, obtained after convergence of 100% of the grid nodes, reached after 14 589 samples. The estimated standard deviations are obtained via the column-sampling approach and the exact standard deviations obtained via the deterministic solution of (23) . The plot confirms the accuracy of the results. The error distribution is shown in Fig. 7(b) : Only one among 138 500 data points (exact standard deviation = 1.25%V dd and estimated standard deviation = 1.76%V dd ) carried an error larger than 0.5%V dd , and this error is 0.51%V dd . The number of data points that carried errors larger than 0.4%V dd was 70. The average error over all variance estimates is 0.06%V dd . For reference, we note that the mean standard deviation of the voltage drop over all nodes is 0.99%V dd and the maximum standard deviation of the voltage drop at any node is 3.77%V dd . The mean and maximum standard deviation are obtained from the exact values, representing the x-axis of Fig. 7(a) . Clearly, the number of samples required for convergence depends on the error bound δ and the actual variance values to be estimated. For a given power grid and with a fixed error bound, it can be observed from (35) that as the standard deviations to be estimated increase, the number of required samples increases quadratically.
B. Lognormality of Voltage Drops
C. Variance Estimation
An important point, underscored in Fig. 8 , is that a small fraction of nodes require a very large number of samples to reach convergence, as defined in (36): In the figure, and for a grid of 264 592 nodes, 95% of grid nodes converge after 1566 samples, 99% converge after 3304 samples, but it takes more than 13 000 samples for all nodes to reach convergence. In effect, for the "difficult" 1% of nodes, the 1 − α confidence interval after 3304 samples extends beyond the error bound (δV dd ). Fig. 9 shows the number of samples needed for convergence of 100% of the grid nodes as the error bound (δ) is varied, as a percentage of the supply voltage V dd . The figure shows a sharp increase in the total number of samples for an error bound of 1% of V dd or less. Indeed, if δ is greater than 5% of V dd , convergence is attained immediately after 50 samples, while more than 5000 samples would be needed to attain convergence when the error bound is fixed at less than 1% of V dd . Table I shows characteristics and results for a number of power grids whose sizes were chosen small enough to allow one to compute the voltage variances by solving (23) deterministically, and compare results with the proposed Monte Carlo approach. The statistical estimation was conducted at 95% confidence (α = 0.05) and with a 1% error bound (δ = 0.01). The fourth column of the table shows the mean and maximum voltage-drop standard deviation, taken over all the grid nodes. The values and spread of the voltage standard deviation continue the trend observed in Fig. 7 . The columnsampling process was stopped when 99% of grid nodes reached convergence, as specified in (36) . Since each column sample contributes to the estimation of every node voltage drop, each sample serves to update the variance estimate at all nodes, so that by the time the last column sample is taken, the confidence interval on the standard-deviation estimate of the majority of nodes is smaller than the error bound (δV dd ). This explains the relatively small average errors in the nodevoltage standard-deviation estimates, shown in the fourth column, which are found to be in the order of 0.1%-0.2% of V dd . By the same token, we expect the number of nodes, where the error in the standard deviation estimate is actually greater than δV dd , to be small relative to the grid size. Indeed, the sixth column shows the percentage of nodes (including nonconverging ones) where the error in the voltage standard-deviation estimate is in fact greater than the error bound of 1% of V dd . In most cases, this percentage is effectively smaller than 1%, the percentage of nodes that did not converge. The fifth column reports the maximum error on any node: In each case, this error is quite comparable to the error bound, mostly in the range of 1%-2% of V dd , and exceeding 2% of V dd only in one in ten cases. The last two columns of Table I compare the runtime of the column-sampling method with the deterministic solution of (23) . The speedup ranges from about 12.7× for the 17 678-node grid, to about 50× for the 142 956-node grid, and is, on average, about 26×.
Results on larger grids are shown in Table II . These grids are too large for the deterministic solution of node voltage variances to be carried out, therefore, only the proposed columnsampling approach was applied to them, and only runtime and memory-usage data are shown. It is clear that this technique can be applied to relatively large grids, with a tight (δ = 1%, in this case) error bound.
D. Verification of Power Grids
Tables III and IV show the overall performance of the proposed verification approach on grids of various sizes. The grids in the first table are small enough so that we were also able to solve (23) deterministically to obtain the exact value of the variance at each node, and consequently, the exact value of the 1 − β percentile of the voltage drop. The tables show runtimes for LU factorization, application of the direct checks (see Section VII-A), and verification through column sampling (see Section VII-B). for carrying out grid verification by a deterministic solution of (23), allowing us to compute exact values for grid voltage variances and therefore, to determine the safety status of each node with 100% confidence, i.e., without resorting to statistical variance estimation through column sampling. With this, we can measure the error of our verification approach when column sampling is put to use: The last column of Table III reports the resulting errors, defined as the ratio of the number of nodes that TABLE II  VARIANCE-ESTIMATION RESULTS ON LARGER GRIDS THAT DO NOT ALLOW A DETERMINISTIC SOLUTION   TABLE III  VERIFICATION RESULTS ON SMALL GRIDS, WHERE CHECKING THE ACCURACY OF THE PROPOSED APPROACH IS POSSIBLE   TABLE IV  PERFORMANCE OF THE PROPOSED VERIFICATION APPROACH ON LARGE GRIDS were deemed safe and are actually unsafe and vice versa, to the total number of nodes, excluding residual nodes.
Observe the difference in number of samples required for variance estimation (Tables I and II) and grid verification  (Tables III and IV) , which directly translates in significantly smaller runtimes for grid verification. This difference is due to the fact that Group 1 nodes and upper and lower variance bounds rule out the need for column sampling to estimate voltage variances for a fraction of nodes. Also, the sampling process for grid verification does not seek an estimate of the node voltage variance per se, but rather to establish a given confidence level on whether this variance is greater or less than a critical value. This implies a difference in the stopping criteria used for column sampling between the varianceestimation and grid-verification problems, such that sampling for grid verification may stop when establishing the desired confidence level on the location of the variance estimate with respect to critical variances, without having an accurate estimate of the variance itself.
We observed in our experiments that some grids were fully verifiable by the direct criteria, described in Section VII-A, in which case the runtime is dramatically reduced. The first grid in Table III and the second grid in Table IV are examples of such grids. The errors, shown in Table III , were very much in accordance with the specified bounds, and in the case of grids that were verifiable using only direct criteria, the error is 0. Fig. 10 shows distribution of the distance between the upper bounds on V 1−β i and the safety threshold voltages V Ti , for the grid of 1.3 million nodes, featured in the last row of Table IV . The safety parameter (1 − β) was set at (90%) for all nodes and the confidence level for convergence (1 − α) at 95%. The resolution was fixed at 1% of V dd (δ = 0.01). Fig. 10 shows that there were 14 362 residual nodes, corresponding to 1.1% of the grid size. The number of such nodes is primarily related to the user-specified resolution and how it compares with the actual variances of the node voltages. All experiments feature a resolution of 1% of V dd , and the number of residual nodes was consistently small, compared to the grid size (see also Tables III  and IV) . It was stated in Section VII-C that the distance between V ub,i and V Ti is an absolute measure of how much the requirement on the safety threshold voltage at a residual node should be relaxed (i.e., how much V Ti should increase) in order to deem a node safe. Fig. 10 illustrates that this distance is small (the average being 0.96% of V dd ), implying that V 1−β i and V Ti are indeed close (that closeness being controlled by the resolution).
IX. CONCLUSION
Technology-scaling trends exacerbate the importance of proper leakage management and reinforce the need for robust power distribution across the chip. This work addressed the vulnerability of chip power distribution to process-induced leakage-current variations. Leakage variations appear as random drops of the power-grid voltage levels and may be thought of as a background level of noise on the grid. We suggested a lognormal statistical model for the voltage drops on the grid in the presence of this noise and a numerical Monte Carlo technique to overcome the computational complexity associated with the grid size in order to determine the statistical parameters of this leakage-induced noise. With this ability, we were able to formulate a verification methodology in order to localize areas of the grid that are particularly susceptible to voltage violations due to leakage variations. Results showed the efficiency and applicability of the proposed approach.
