Power grid verification in modern integrated circuits is an integral part of early system design where adjustments can be most easily incorporated. In this work, we describe an early verification approach under the framework of current constraints where worst-case node voltage drops are computed via linear programs proportional to the grid size. We propose an efficient method based on a sparse approximate inverse technique to greatly reduce the size of such linear programs while ensuring a user-specified over-estimation margin (in volts) on the exact solution.
INTRODUCTION
A well-designed power grid in integrated circuits (ICs) should guarantee the proper logic functionality at the intended design speed. As the supply and threshold voltages decrease with technology scaling, full-chip verification is getting more and more challenging. A key concern is the fact that logic circuits slow down under reduced supply voltages thus putting overall circuit timing performance at risk. Hence, power grid analysis and verification has become central to reliable high-speed chip design.
Today, grid verification is typically done by simulation. Such an approach requires full knowledge of the current waveforms drawn by every logic block attached to the grid. These waveforms would then be used to simulate the grid and get the voltage drop at every node. Verifying the grid in this manner proves to be problematic. For one thing, the number of current traces needed to cover the space of voltage drops exhibited on the grid is intractable for modern designs where grids can consist of several million nodes. Another major drawback is that a simulation based flow does not allow for early grid verification, when grid modifications can be * This work has been supported in part by the Semiconductor Research Corporation (SRC) (www.src.org). most easily incorporated. The need, then, is for a verification approach that does not depend on simulation, i.e., a vectorless approach. Therefore, we adopt the framework of partial current specifications, in the form of current constraints [1] , which will be detailed in section 2.2. Under such constraints, power grid verification becomes a problem of computing the maximum voltage drops subject to current constraints.
In [1] , the authors adopted a DC grid model and formulated the verification problem as a set of linear programs (LPs). This approach identifies coarse problems with the grid but is oblivious to violations related to the grid dynamics. In [2] , the authors considered an RC model of the grid and focused on developing upper bounds on the worst-case voltages using a convergent iterative process that requires the solution of LPs while stepping repeatedly through time. A later work [3] gave a closed form bound which required a single LP per node. For all these formulations, finding the worst-case voltage drop everywhere on the grid would require the application of as many LPs as there are nodes, in a sequence, where every LP is proportional to the size of the grid. This becomes prohibitive for large designs; we often refer to this as the sequentiality problem. In this paper, we propose a novel approach to alleviate sequentiality based on an approximate inverse technique. This is a rigorous method that takes advantage of locality on the grid. It captures, for any given node, the current sources that are most influential, on that node, in a fast automated way, and then it formulates a reduced-size optimization problem while ensuring a user-specified over-estimation margin (in volts) on the solution of the exact LPs. Experimental results in section 5 show that even for small over-estimation values, the improvement in runtime is dramatic.
BACKGROUND 2.1 The Power Grid Model
Consider an RC model of the power grid, where each branch is represented by a resistor and where there exists a capacitor from every node to ground. This work is not applicable in its present form to RLC grids, because it takes advantage of special properties of circuit matrices in the RC case. We are working on extending this to the more general case. In a power grid, some nodes have ideal current sources (to ground) representing the currents drawn by the circuits tied to the grid at those nodes, while other nodes may be connected to ideal voltage sources representing the connection to the external voltage supply.
Let the power grid consist of n+p nodes, where nodes 1, 2, . . . , n have no voltage sources attached, and the remaining nodes (n + 1), (n + 2), . . . , (n + p) are the nodes where the p voltage sources are connected. Let i(t) be the element-wise non-negative vector of all current sources connected to the grid. We assume that ∀k = 1, . . . , n, i k (t) is well-defined, so that nodes with no current source attached have i k (t) = 0. With these assumptions, we can write the RC model for the power grid as [2] :
where v(t) is the n × 1 vector of time varying voltage drops (difference between V dd and node voltages). C is the n × n diagonal capacitance matrix, since all capacitors were assumed to be nodeto-ground. G is the n × n conductance matrix, which is known to be a diagonally-dominant symmetric positive definite M-matrix (so that G −1 ≥ 0). Using a finite difference approximation, a discrete-time version of (1) can be written as:
where
) is also a symmetric positive definite Mmatrix, so that A −1 ≥ 0. For properties of M-matrices, the reader is refered to [4] .
Current Constraints
In our approach, we perform a verification of the grid in the absence of complete information about currents drawn by the underlying circuit, what may be called a vectorless approach. We do this because the currents are typically hard to specify, for at least two reasons: 1) there is a large variety of possible current waveforms, so that the worst case is hard to determine up-front and simulation of a large set of waveforms is prohibitively expensive, and 2) grid design and verification cannot wait until the chip design is nearly-complete, and is typically done early in the design flow, so that the details of the underlying circuitry may not yet be available or complete.
However, while users may not know the exact circuit currents, there is always some knowledge from previous design projects which users can bring to bear. It is such knowledge that one aims to capture with the concept of current constraints, originally introduced in [1] . Current constraints capture the uncertainty about the circuit currents arising from both unknown circuit behaviors and the fact that one is uncertain about circuit details early in the design flow. The aim is to verify that the grid is safe (i.e., its voltages remain within certain bounds), under all possible transient current waveforms which satisfy these constraints. Two types of constraints are defined: local constraints and global constraints. Local constraints are upper bounds on individual current sources, where a current source can represent a single logic gate or cell, but more typically should represent a larger block. They can be expressed as:
where i L represents the peak value of current that this current source can draw -it is a "DC" upper bound on the transient waveform i(t). It is possible to also consider upper bounds that are themselves transient waveforms over time (transient constraints), but in this paper we restrict our work to the case of DC constraints. It is important to remember, however, that it is only the constraints that are DC; the currents themselves are transient. To ensure that these constraints are well-defined for every node of the grid, we enforce the condition that any node with no current source connected would have a zero i L component. If only local constraints are provided, the problem is much simplified, but the results would be overly pessimistic, because it is never the case that all chip components simultaneously draw their maximum currents, hence the need for global constraints, which are upper bounds on the sums of groups of current sources. They represent the peak total power dissipation of a group of circuit blocks. Assuming we have a total of κ global constraints, they can be expressed in matrix form as:
where S is a κ × n matrix that consists only of 0s and 1s which indicate which current sources are present in each global constraint. As for the case of local constraints, note that i G is a fixed time-independent upper bound, a DC constraint, but the currents themselves are transient waveforms over time. How would one obtain/specify these constraints in practice? If a logic block is available and small enough to simulate, then one can generate the constraints by an "offline" process of simulation, which can be viewed as a characterization of that block. If the block is not yet available or is too large to simulate, then one would need to rely on design expertise and engineering judgement (how big it is, what its power needs were in a previous technology and how scaling would affect those needs, etc.). If, early in the design flow, nothing is known about this block, not even its detailed functionality, one typically is able to come up with an area budget for it. From that, and from the projected power density (Watts/μm 2 ) for the target process technology, one can generate a rough current constraint for it. The bottom line: something is typically known about that block, which with good engineering judgement can be formulated into constraints. Chip designers typically use "power budgets" early in the design flow to help verify the grid, often making use of simple spreadsheet applications. The proposed constraint-based approach is a scientific and reliable approach for making use of such budgets, and it depends on good designer expertise and judgement. After all, if truly nothing is known about the circuit currents, then the grid simply cannot be verified.
Another possibility is that the current constraints can be used to implement a "spec-based" design flow; a chip-level designer would simply specify the constraints based on design expertise, and the grid is verified under these constraints. The constraints now become design guidelines to be observed in subsequent design activity. If all design teams follow these guidelines and verify their blocks, the end result would be a grid that is safe by construction.
Together, the local and global constraints define a feasible space of currents, which we denote by F , so that i(t) ∈ F if and only if it satisfies (3) and (4) . Later in the paper, we will define algorithms that operate on a vector i(t) and which are applicable at any value of time t. In that context, we will use the shorthand i ∈ F to denote the fact that i(t) is feasible, for any given t.
PROBLEM DEFINITION
In this section, we will show how grid verification boils down to solving linear programs that depend on a certain matrix inverse. The aim is to lay the ground-work for the rest of the paper, in which this inverse will be conservatively approximated using a very efficient approach. The novelty of this paper is in adapting a previous inverse approximation technique to the power grid verification problem. However, we do need to provide some detailed background on grid verification, to motivate the need for finding an approximation to the inverse matrix.
We are interested in the vector of maximum node voltage drops, over all possible currents in F . Using (2), we can write:
Consider the special case where the grid had no stimulus for all t ≤ 0 so that v(0) = 0. At time t = Δt, we can write:
At 2Δt and at 3Δt, we have:
This can be repeated, so that at any future time pΔt, we have:
At every point in time t ∈ [0, pΔt], the current vector i(t) must be feasible, i.e., we must have i(t) ∈ F and, under that condition, we are interested in the worst-case voltage attained (separately) by each component of v(pΔt). In order to compactly capture this notion, we introduce the following notation. Let f (x) : R n → R n be a vector function whose components will be denoted f 1 (x), . . . , fn(x), and let A ⊂ R n . Now, define a vector z ∈ R n , such that z i is the maximum of f i (x) over all x ∈ A. We will denote this by the following operator:
where the "emax(·)" notation is introduced to denote the fact that this is an element-wise maximization. Notice that each component z i may be found separately by solving the optimization problem:
Maximize:
Using this notation, we can write the worst-case voltage drops at all nodes at time τ = pΔt as:
where the notation "∀i(t) ∈ F" means that, for every time point t ∈ [0, τ], the current i(t) satisfies all the (local and global) constraints.
In practice, we are interested in the steady state solution where the system becomes independent of the initial condition (i(t) = 0, ∀t ≤ 0). Since the RC grid model is a dynamical system with a limited memory of its past, then the steady state solution can be obtained by evaluating (12) at points far away from the initial condition, i.e., as p → ∞. Thus, the general solution to the exact voltage drop maximization problem is:
Although the RC model is dynamic, i.e., its currents and voltages vary with time, the constraints are DC and do not depend on time. Hence, F is the same for each time step. With this, the components of (13) can be "decoupled," leading to:
where i is simply an n × 1 vector of variables that satisfies the (local and global) constraints, without reference to any particular point in time. This is a major simplification of the problem, as it has the advantage that the number of constraints for each maximization is fixed and does not span all previous time-points. Unfortunately, both (13) and (14) are of theoretical interest only. They cannot be directly computed, as they stand, because they have to be evaluated for a large number of time steps until convergence and because the element-wise maximization requires linear programs (LPs) proportional to the number of nodes in the grid which for modern designs is in the thousands or even the millions. In previous work, various approaches to simplify (14) have been proposed, two of which will be relevant to our work, and they are as follows.
In [1] , a DC model of the grid was verified subject to the current constraints. The approach employed a resistive grid model and offered a major simplification of (14). In this case, the exact maximum voltage drop vector can be computed with a single emax operation as follows: In another approach, and given an RC grid, the authors in [3] proposed an upper bound on the exact worst-case voltage drop vector, under transient currents. As t → ∞, the bound was found to be:
where I is the identity matrix. The result in (16) has a run-time advantage over (14) as it requires a single emax operation. Once that is computed, the evaluation of (16) is relatively cheap. It involves a single vector scaling and a single standard linear solve (LU factorization of G). Thus, we see that both approaches [1] and [3] give a formulation that involves the inverse of a large matrix, G in one case, and A in the other. More generally, both problem formulations involve solving the same/following generic problem:
where E is understood to be either A in the case of transient analysis [3] or G in the case of DC analysis [1] . Note that E −1 is symmetric, because E (be it A or G) is symmetric. In the mentioned previous work, the matrix inverse E −1 was not found explicitly. Instead, the problem was transformed into the voltage domain, so that it makes use of the original sparse matrix E. However, even then, the problem remained quite expensive, limiting the size of grids that may be verified. As an alternative, and in the rest of the paper, we propose an efficient approach to compute a tight conservative approximation of v , based on a sparse approximation of the matrix inverse, followed by a solution of (17) using linear programming.
PROPOSED SOLUTION
Let M be an n × n matrix and let the vector m k denote the kth column of M . Let e k be the n × 1 vector consisting of all zeros, except for its kth component which is 1. Then, consider the n × 1 vector, called the kth residual, defined by:
It is clear that, if we choose M = E −1 , then r k = 0, ∀k. In general, the norm of the residual r k 2 (the Euclidean or l-2 norm in this case) is positive, and becomes zero only when M = E −1 . Thus, the norms of the residuals, for all k, provide a measure of how far M is from being equal to the desired matrix inverse E −1 . Let M E −1 , so that when M takes the value M , then r k 2 = 0, ∀k, and we want:
Typical power grids have a mesh structure, where a node has a small number of neighboring nodes. Such a structure gives rise to a matrix E (be it G or A) that is sparse, symmetric, positive definite, and banded. For such matrices, it is well known in the literature that their inverse has entries whose values decay exponentially as one moves away from the diagonal [5] . We illustrate this property in Fig. 1(a) , which shows the structure of the G −1 matrix for a 100-node grid. In Fig. 1(a) , every entry is represented, at the appropriate location, by a square whose area reflects that entry's magnitude, and very small valued entries are represented by dots. It is clear that the majority of the entries are extremely small with the exception of few significant bands in the neighborhood of the main diagonal. As expected, G −1 entries decay as one moves away from the diagonal. This is a key observation that has been the motivation for published work [6] on finding approximations to the inverse matrix that basically ignore the extremely small values in the inverse. A matrix that has a large number of entries whose values are extremely small is refered to as being practically sparse. In this case, M = E −1 is practically sparse. Therefore, every component of v , which we will denote v k , is mainly the result of the contribution of very few current sources, specifically those that correspond to the significant entries in row k of M . This is not surprising, because one expects that the voltage at a node depends mostly on the nearby current sources. This is well-known in power grid research, as in [7] which finds that a current source has a localized contribution on the grid. This is often refered to as the locality effect. The difficulty, with locality, is to develop rigorous methods for determining exactly where the neighborhood is, for a given node, and how far it extends. By using methods based on discovering the significant entries of a practically sparse matrix, we believe that we have found a rigorous approach for taking advantage of locality.
Our contribution, described in the next two sub-sections, is as follows. Given a user-specified over-estimation margin (in volts) for v , we devise a fast automated method to formulate a smallersize optimization problem, based only on the current sources that are most influential on each node, and we solve it to provide a solution that meets the over-estimation margin specified by the user. We start by briefly describing the existing approach of inverse matrix approximation and then, in the following section, we describe how we have adapted this to solve the power grid verification problem.
Inverse Approximation Method (SPAI)
A sparse approximate inverse technique (SPAI) technique is given in [6] . Given some user-defined tolerance η, SPAI computes an approximationM to the actual inverse matrix M . It operates by finding anM matrix that gives a very small residual norm, as an approximation to the unknown M (which, as we saw above, would give a zero residual). The technique starts with an arbitrary initial matrix M and iteratively refines its columns by minimizing the Frobenius norm EM − I F subject to η. Recall that for any n × n matrix H, the Frobenius norm is defined as [4] :
where h ij are the entries of H, which can be re-written as:
where h k is the kth column of H. Obviously, the minimum of (21) (which is zero) occurs when H = 0. Likewise, the minimum of EM − I F (which is zero) occurs when M = E −1 = M . By reducing EM − I F until it is less than η, SPAI finds anM that is a good approximation to M = E −1 . To see how SPAI works, consider that:
In this way, the problem of finding an approximate inverseM separates into n independent least-squares problems:
For each k = 1, 2, . . . , n, vary m k so as to minimize r k 2 2 until an m k is found for which r k 2 ≤ η.
While r k 2 > η, the iterative refinement of m k proposed by [6] attempts to improve on the column by appending the most profitable entries, i.e., the ones that will result in the largest reduction in r k 2 . In this manner, the technique automatically captures the sparsity pattern by including the significant entries of the inverse while avoiding fill-ins.
The numerical study in [6] shows that SPAI captures the main entries of E −1 extremely well. Moreover, it is an extremely efficient technique. It is inherently parallel as the columns ofM can be computed independently of one another. Moreover, the computation of a column is cheap; it requires a matrix-vector product and several QR factorizations of small submatrices of the original sparse matrix E.
An Upper Bound Vector
TheM obtained by SPAI can be used to give an approximate solution to the exact optimization problem in (19). However, for power grid verification, we are interested in, not simply any approximation to the v , but in a conservative approximation to v , i.e., an upper-bound on it. In other words, we want anM which, if used instead of M in (19), would produce av that we can use to construct an upper boundv on v , so that v ≤v. In this section, we describe how we adapt SPAI to our needs, by modifying its stopping criterion, and then how we generate our upper boundv based on the matrixM achieved by SPAI under the new stopping criterion. Crucially, we will show that the overestimation v − v is under control and can in fact be specified up-front.
Stopping criterion
Normally, SPAI terminates when r k 2 is smaller than a usersupplied η, for every k. However, in order to arrive at predictable over-estimation bounds, as we will see below, a new stopping criterion will now be introduced. If m k is the kth column of
where · ∞ is the infinity norm. We define the error tolerance k > 0, where k ∈ R is a scalar. As we will see below, k is derived from another user-specified error-tolerance on voltages. We modify SPAI so that it stops when, for every k, we have:
which achieves the condition that, for every k:
It remains to explain how E −1 ∞ is found, when E −1 is unknown! This is easy to do, in fact, because E is an M-matrix, as follows. Let u be the n×1 vector of all 1s, so that
Then, E −1 u is a vector whose every component is the sum of all the entries in the corresponding row of E −1 . Because E −1 ≥ 0, then each component of E −1 u is actually the sum of the absolute values of all the entries in that row of E −1 . Thus, the component with the largest value in E −1 u is, in fact, the "largest absolute row sum" of E −1 which is its infinity norm. In other words, if x is a vector such that Ex = u, then it's clear that:
Finding this x is easy, by performing one LU -factorization of E, which we have to do anyway as part of power grid verification, followed by a backward and a forward solve starting with u.
Algorithm 1 UPPER BOUND
Input: E, δ Output:v 1: Compute E −1 ∞ using (26):
Backward solve: Ux = y
Computev using (27):
Upper bound
Once SPAI terminates with its approximation matrixM , using the new stopping criterion, we run our optimization (19) usingM instead of M , to get:v = emax
and this optimization run is quite fast, as we will see in the results section. In this section, we will show how thev resulting from this optimization is used to generate a tight upper boundv on v , in a way that the over-estimation v − v is under control and predictable up-front. We will do this by first showing how (25) leads to an upper bound on M , based onM and k . We then show how this leads to an upper bound on v , based onv and k . SPAI starts with an initial M = I, and tries to maintain sparsity and reduce fill-ins throughout the process. So, many of the entries ofm k will be equal to 0. Let C k be the set of non-zero entries inm k . The number of such entries, denoted by |C k |, is often considerably smaller than n. To compute an upper bound on m jk , for all j, k, two cases have to be considered, according to whetherm jk ∈ C k or not. Ifm jk / ∈ C k , i.e.,m jk = 0, then (25) yields:
so that k is an upper-bound on m jk , with a maximum overestimation error of k . In the second case, when m jk ∈ C k , then, (25) gives:
so thatm jk + k is an upper-bound on m jk , with a maximum over-estimation error of 2 k . Let be the n × 1 vector such that
We will now prove that these upper bounds on the entries of M lead to an upper boundv on v , as follows:
where u is the same vector of all 1s introduced above, so that
The proof of this result is given in the following claim, which also provides the measure of over-estimation between v andv.
Claim 1. For every k = 1, 2, . . . , n, let u k be the n × 1 vector whose jth entry is 1 ifm jk ∈ C k , and otherwise 0. Then, v in (30) is an upper-bound on v , with a maximum over-estimation error of:
Proof. In the following, and for any k ∈ {1, 2, . . . , n}, we will prove that 0 ≤v k − v k ≤ δ k . The proof addresses first the lower bound (of 0) and then the upper bound (of δ k ).
Algorithm 2 Modified SPAI(E, E
Minimize Em k −e k 2 and update m k {this is the result of a QR factorization of a sub-matrix of E }
7:
Update r k and k 8: end while 9: end for 10: return (M, )
The lower bound: For the kth component of v , and using the bounds on the entries of M seen above, and because M andM are both symmetric, it is easy to see that:
The upper bound: It is easy to see that, if the optimalv k is achieved at some current vector i , then it must also be the case that n j=1m kjîj =v k , whereî is obtained from i by setting to 0 every jth entry of i for whichm kj = 0. In the following, we make use of such a vectorî at whichv k is achieved.
If
By making use of the fact that |m jk − m jk | ≤ k , due to (25), and because M andM are both symmetric, this expands as follows: Our algorithm accepts δ as a user-specified error-tolerance on node voltage drops. During the algorithm, as SPAI is iterating on a column, causing changes in u k , we repeatedly update k based on:
until such time when the stopping criterion (24) is met. By claim 1, this achieves the result that: v − v ∞ ≤ δ (33) Algorithm 1 provides a high-level description of our approach which uses our modified version of SPAI (algorithm 2). The algorithm takes a user-defined δ, computes E −1 ∞ and calls Modified SPAI which returnsM and . The algorithm will then computev leading tov.
EXPERIMENTAL RESULTS
The above algorithm, including the modified SPAI component, has been implemented in C++. It computes the column approximations of the matrix inverse, generates the reduced optimization problem and solves it using the Mosek optimization package [8] . A number of tests were conducted on a set of randomly-generated power grids and computations were carried on a 64-bit Linux machine with 8 GB of memory. The grids are based on user specifications, including grid dimensions, metal layers (M1-M4), pitch and width per layer, and C4 and current source distribution. Four global constraints were specified for the grids under study. Fig. 1(b) showsM as a result of running Algorithm 2 on the 100-node grid mentioned earlier. By comparing this figure to Fig. 1(a) , it is clear that Modified SPAI captures the significant entries of the inverse extremely well while avoiding fill-ins. The resultingM is practically sparse with an average of 20 entries per row/column.
We ran Algorithm 1 on a 21,099-node grid for δ = 5mV and monitored the relative over-estimation error for the kth entry as Fig. 2 shows a scatter plot of such errors, in percent, versus the entries of v . The figure also shows the curve corresponding to δ = 5mV where a point on the curve represents a v entry over-estimated by exactly 5mV. On the other hand, a point with an over-estimation greater than 5mV will lie in the region above the curve. From the resulting data distribution, it is clear that our algorithm guarantees the specified over-estimation δ for all entries, as it should, by claim 1. Note that for some entries, especially those with magnitudes less than δ, the relative error can be high. This, however, is of no concern in practice, as the critical v entries are those that are significantly larger than δ. In that case, the relative error incurred is small. The absolute error is always, of course, guaranteed to be less than δ, as we have seen. In fact, the absolute errors can be much less than δ, as seen in the histogram embedded in the above figure, which shows that the actual over-estimation is often less than δ/2.
Before proceeding with the results, we note that computing the exact v in (19), against which we will compare our results, does not have to involve the expensive computation of M . The problem can be transformed as shown in [1] from an optimization in the current space to an optimization in the voltage space by defining a vector y, such that Ey = i, and then solving v = emax(y), ∀Ey ∈ F. The exact solution, using this approach, remains expensive and involves solving one LP for every node on the grid. Our algorithm also solves an LP for every node, but each of these LPs is much smaller, because it benefits from locality (we retain only the most influential current sources, as determined by the modified SPAI routine). The result is an extremely fast optimization run. Table 1 gives runtimes for computingv subject to different values of δ. The runtimes take into account the Modified SPAI algorithm runtime. The table also gives the CPU time required for computing v , using the above mentioned approach of converting the problem to the voltage domain. Our method was tested on all grid nodes. Given the size of the test grids, it was impossible to solve all n exact LPs associated with the computation of v . For this purpose, we chose 1000 random nodes, solved the 1000 corresponding exact LPs, and estimated the runtime of the exact approach. Results show that even for relatively small δ's, our proposed approach runs several orders of magnitude faster than the exact computation which exhibits impractical runtimes. For instance, computingv for a 50,444-node grid subject to δ = 5mV takes around 15.67 hours while computing v requires around 222.7 hours. This is over 14X speedup, and the last test case in the table shows an even higher, 27X, speedup! Thus, our method makes checking a power grid under incomplete current specification a feasible and practical solution. It is also highly parallelizable, both in the modified-SPAI module and in the rest of the algorithm where the separate LPs can be run in parallel.
CONCLUSION
We describe an early verification approach under the framework of capturing circuit uncertainty via current constraints and maximizing node voltage drops over the constraint space. Our proposed method takes advantage of locality on the grid and formulates, for any given node, a reduced-size optimization problem while ensuring a user-specified over-estimation margin (in volts) on the exact solution. With this technique, verifying power grids under incomplete current specification becomes scalable and practical. The method shows a drastic improvement in runtime, doing in hours what required days in previous approaches.
