Abstract-As part of early system design, one must verify that the power grid provides the underlying logic circuitry with voltage levels that are within specified ranges. In this paper, we describe a vectorless verification approach that can be applied early in the design process. We adopt an RLC model of the grid in the framework of current constraints that capture uncertainty about circuit details and activity. With just a few linear programs and one linear system solve, our proposed approach provides tight conservative bounds on the maximum and minimum worstcase voltage drops at every node of the grid. Results show the accuracy and speed of our technique thus making it practical and scalable.
I. Introduction
T HE RISING demand for low voltage modern integrated circuits (ICs) has made efficient analysis of power grids a critical task. A key concern is the fact that a poorlydesigned grid will result in excessive fluctuations in the voltage levels supplied to the underlying logic thus slowing it down and putting the overall circuit timing performance at risk. Therefore, in order to ensure correct logic functionality, there is a clear need for efficient grid verification.
There are two main causes of voltage violations; IR drop, which is the result of the resistivity of the metal rails, and Ldi/dt noise, which is due to the inductance of the rails and the grid-package interconnections. In today's designs, the increasing number of transistors and the high operating frequencies lead to large current demands and fast current transients both of which produce large IR drops and considerable Ldi/dt noise [1] - [3] . Note, in particular, that such inductive noise can result in either drops or overshoots where voltage levels at the grid nodes exceed that of the supply. Therefore, grid verification should account for both voltage behaviors and grid safety must be stated in terms of the maximum and minimum worst-case voltage levels at each node on the grid.
Today, grid verification is typically done by simulation. Such an approach requires full knowledge of the current wave-forms drawn by every logic block attached to the grid. These waveforms would then be used to simulate the grid and get the voltage drop at every node. Verifying the grid in this manner proves to be problematic. For one thing, the number of current traces needed to cover the space of voltage drops exhibited on the grid is intractable for modern designs where grids can consist of several million nodes. Several attempts have been made to address the problem of searching the current space for the worst-case current waveforms, including the computation of bounds on possible currents [4] or of current statistics [5] . Another major drawback is that a simulation based flow does not allow for early grid verification, when grid modifications can be most easily incorporated. The need, then, is for a verification approach that does not depend on simulation, i.e., a vectorless approach. Therefore, we adopt the framework of partial current specifications, in the form of current constraints [6] , which will be detailed in Section II-C. The constraints specify a feasible space in which currents can vary during circuit operation. Under such constraints, RLC grid verification becomes a problem of computing the maximum and minimum voltage drops. In [7] , the authors computed these worst-case voltage drops by using a convergent iterative process that requires the solution of linear programs (LPs) while stepping repeatedly through time. This approach becomes prohibitive even for medium size grids.
In this paper, we propose a novel technique to compute upper bounds on the maximum worst-case voltage drops and lower bounds on the minimum worst-case voltage drops on the grid. These bounds require a few LPs and one linear system solve. Experimental results in Section V show that the improvement in runtime of our technique is dramatic while ensuring small error values.
II. Background A. Power Grid Model
We consider an RLC model of the power grid where each branch is represented either by a resistor, referred to as an r-branch, or by a resistor in series with an inductor, referred to as an rl-branch. Two sets of nodes exist; one is internal to the rl-branches. We denote its members as internal nodes. The members of the other set are referred to as external nodes. Some external nodes may be connected to ideal voltage sources (assuming flip-chip technology, we will refer to an ideal supply voltage source as a C4, with the understanding that any parasitics that are part of a true C4 pad structure have 0278-0070/$26.00 c 2011 IEEE already been modeled and included in the grid description). As a result, the inductance in our grid model represents the inductive component of either the grid interconnect or the pad structure that connects the grid to the external voltage sources. There exists also a capacitor from every external node to ground and some of these nodes have ideal current sources (to ground) representing the currents drawn by the logic circuits tied to the grid. In the derivation of the system equations, we will use the state-variable approach to network analysis [8] .
Let the power grid consist of m + n 1 + p nodes, where nodes 1, 2, . . . , m are internal nodes, nodes (m + 1), (m + 2), . . . , (m + n 1 ) are external nodes with no voltage sources attached, and the remaining nodes (m+n 1 +1), (m+n 1 +2), . . . , (m+n 1 +p) are the external nodes where the p voltage sources are connected. Let c k be the capacitance from every external node k to ground. Let i s,k (t) be the current source connected to external node k, where the direction of positive current is from the node to ground. We also assume that i s,k (t) is defined for every node k = 1, . . . , n where n = m + n 1 so that all external nodes with no current source attached have i s,k (t) = 0, ∀t and all internal nodes have i s,k (t) = 0, ∀t. Let i s (t) be the vector of all i s,k (t) sources. Let u k (t) be the voltage at every internal or external node k, k = 1, . . . , n and let u(t) be the vector of all u k (t) voltage signals. Moreover, let i l (t) represent the inductive branch currents where l = 1, . . . , m, and let i(t) be the vector of all inductive branch currents. Notice that m also represents the number of inductors in the grid. Fig. 1 is an example of an RLC grid model.
B. System Equations
The time-domain equations that describe the circuit can be derived by applying Kirchoff's current law (KCL) at every node k, where k = 1, 2, . . . , n. In general, applying KCL at every non-C4 node of the grid, we get
where G is an n × n conductance matrix as in the traditional modified nodal analysis formulation [9] . The G matrix is known to be diagonally-dominant symmetric positive definite M-matrix (so that G −1 ≥ 0). G 0 is an n×n matrix of the conductance elements connected to V dd sources [6] . C is an n × n diagonal matrix of node capacitances with the understanding that entries corresponding to internal nodes are 0, and M is an n × m incidence matrix whose elements are either ±1 or 0, as in [8] . The term ±1 occurs in location m ij of the matrix when node i is connected to the jth inductor, else a 0 occurs. The sign of the non-zero terms depends on the positive direction of current in the branch and on the node under consideration. If the current assignment is away from the node, then the sign is positive, else it is negative. Finally, v dd is a constant vector whose entries are all equal to V dd (the supply voltage).
If we set all i s,k (t) = 0, ∀t, then no voltage drop should occur on the grid. So, all branch currents will be zero and all node voltages will have a value of V dd . System (1) then becomes Define v k (t) = V dd − u k (t) to be the voltage drop at node k and v(t) as the vector of all voltage drops. Then, if we use (1) and (2), the first system equation becomes
Notice that in (3) we do not take into account the relationship between the inductor branch current and the inductor voltage. It remains to model this relationship. The necessary equations were derived in [8] and are simply stated here. Notice that these equations will be applied to all inductances. So, the number of equations will be m. Relating all inductive branch currents to the voltage drop across the respective inductors we get the second system equation
where L is an m × m diagonal matrix of inductance values. Notice that the matrix multiplying the voltage vector is the transpose of the incidence matrix [8] . The pair of equations (3) and (4) represent the complete behavior of the power grid, as a dynamical system. Using a backward finite difference approximation v (t) ∼ =
v(t)−v(t− t) t
, a discrete-time version of (3) and (4) can be written as
The equation variables are the node voltages and the inductive branch currents. Note that G + C t is also a symmetric positive definite M-matrix. For properties of M-matrices, the reader is referred to [10] .
C. Current Constraints
In our approach, we perform a verification of the grid in the absence of complete information about currents drawn by the underlying circuit, what may be called a vectorless approach. We do this because the currents are typically hard to specify, for at least two reasons: 1) there is a large variety of possible current waveforms, so that the worst case is hard to determine up-front and simulation of a large set of waveforms is prohibitively expensive, and 2) grid design and verification cannot wait until the chip design is nearly-complete, and is typically done early in the design flow, so that the details of the underlying circuitry may not yet be available or complete.
Current constraints, originally introduced in [6] , capture the uncertainty about the circuit currents arising from both unknown circuit behaviors and the fact that one is uncertain about circuit details early in the design flow. The aim is to verify that the grid is safe (i.e., its voltages remain within certain bounds), under all possible transient current waveforms which satisfy these constraints. Two types of constraints are defined: local constraints and global constraints.
Local constraints are upper bounds on individual current sources, where a current source can represent a single logic gate or cell, but more typically should represent a larger block. They can be expressed as
where i L is the vector of peak values that the current sources can draw-it is a "DC" upper bound on the transient waveform vector i s (t). It is important to remember, however, that it is only the constraints that are DC; the currents themselves are transient. To ensure that these constraints are well-defined for every node of the grid, we enforce the condition that any node with no current source connected would have a zero i L component.
If only local constraints are provided, the problem is much simplified, but the results would be overly pessimistic, because it is never the case that all chip components simultaneously draw their maximum currents, hence the need for global constraints, which are upper bounds on the sums of groups of current sources. They represent the peak total power dissipation of a group of circuit blocks. Assuming we have a total of κ global constraints, they can be expressed in matrix form as
where U is a κ × n matrix that consists only of 0s and 1s which indicate which current sources are present in each global constraint. As for the case of local constraints, note that i G is a fixed time-independent upper bound, a DC constraint, but the currents themselves are transient waveforms over time. How would one obtain/specify these constraints in practice? If a logic block is available and small enough to simulate, then one can generate the constraints by an "offline" process of simulation, which can be viewed as a characterization of that block. If the block is not yet available or is too large to simulate, then one would need to rely on design expertise and engineering judgment (how big it is, what its power needs were in a previous technology and how scaling would affect those needs, and so on). If, early in the design flow, nothing is known about this block, not even its detailed functionality, one typically is able to come up with an area budget for it. From that, and from the projected power density (watts/µm 2 ) for the target process technology, one can generate a rough current constraint for it. The bottom line: something is typically known about that block, which with good engineering judgment can be formulated into constraints. After all, if truly nothing is known about the circuit currents, then the grid simply cannot be verified.
Another possibility is that the current constraints can be used to implement a "spec-based" design flow; a chip-level designer would simply specify the constraints based on design expertise, and the grid is verified under these constraints. The constraints now become design guidelines to be observed in subsequent design activity. If all design teams follow these guidelines and verify their blocks, the end result would be a grid that is safe by construction.
Together, the local and global constraints define a feasible space of currents, which we denote by F, so that i s (t) ∈ F if and only if it satisfies (7) and (8) . Later in the paper, we will define algorithms that operate on a vector i s (t) and which are applicable at any value of time t. In that context, we will use the shorthand i s ∈ F to denote the fact that i s (t) is feasible, for any given t.
III. Problem Definition
Before we proceed, we will show how the exact RLC grid verification boils down to solving linear programs that depend on a certain matrix inverse and on some matrixmatrix multiplications. The aim is to lay the ground-work for the rest of the paper, in which upper bounds on the maximum worst-case voltage drops and lower bounds on the minimum worst-case voltage drops are computed using a very efficient approach that minimizes the cost of computing the inverse and that limits the number of required matrix-matrix multiplications. The novelty of this paper is two-fold. First, we transform the RLC grid into a reduced circuit to alleviate the problems associated with its system matrices and to allow for fast matrix manipulations. Second, we derive efficient tight upper and lower bounds on the maximum and minimum worst-case voltage drops respectively. However, we do need to provide some detailed background on the RLC grid verification to motivate our contribution.
We are interested in the vectors of maximum and minimum worst-case voltage drops, over all possible currents in F. To simplify the notation, let
so that, combining (5) and (6), we can write
Consider the special case where the grid had no stimulus for all t ≤ 0, so that x(0) = 0. At time t = t, we can write
At 2 t and at 3 t, we have
This can be repeated, so that at any future time p t, we have
At every point in time t ∈ [0, p t], the input vector b(t) must be feasible, i.e., we must have i s (t) ∈ F. To simplify the notation, we define F b to be the feasible space of
Under that condition, we are interested in the maximum and minimum worst-case voltage attained (separately) by each component of the v(p t) subvector of x(p t). In order to compactly capture this notion, we introduce the following notation. Suppose f (c) : R n → R n is a vector function whose components are denoted f 1 (c), . . . , f n (c), and let A ⊂ R n . Now, define a vector z ∈ R 2n , such that, with i ∈ {1, 2, . . . , n}, z i is the maximum of f i (c) over all c ∈ A and z n+i is the minimum of f i (c) over all c ∈ A . We will denote this by the following operator:
where the "eopt(·)" notation is introduced to denote the fact that this is an element-wise optimization. Notice that each component z i , ∀i = 1, . . . , n may be found separately by solving the following maximization problem:
and each component z n+i , ∀i = 1, . . . , n may be found separately by solving the following minimization problem:
Using this notation, we can express the maximum and minimum worst-case voltage drops at all nodes at time τ = p t by
where the notation "∀b(t) ∈ F b " means that, for every time point t ∈ [0, τ], the current i s (t) satisfies all the (local and global) constraints. In practice, we are interested in the steady state solution where the system becomes independent of the initial condition (b(t) = 0, ∀t ≤ 0). Since the RLC grid model is a dynamical system with a limited memory of its past, then the steady state solution can be obtained by evaluating (18) at points far away from the initial condition, i.e., as p → ∞. Thus, the general solution to the exact voltage drop maximization and minimization problem is
Although the RLC model is dynamic, i.e., its currents and voltages vary with time, the constraints are DC and do not depend on time. Hence, F is the same for each time step. With this, the components of (19) can be "decoupled," leading to
where b is simply an (n + m) × 1 vector of variables with a subvector i s that satisfies the (local and global) constraints, without reference to any particular point in time. This is a major simplification of the problem, as it has the advantage that the number of constraints for each optimization is fixed and does not span all previous time points. Unfortunately, (20) is of theoretical interest only. It cannot be directly computed, as it stands, because it has to be evaluated for a large number of time steps until convergence and because the element-wise optimizations require a matrix inverse A −1 and LPs that are proportional to the number of nodes plus the number of inductive branch currents in the grid which for modern designs is in the thousands or even the millions. Also, it includes matrix-matrix multiplications at every time step. As an alternative, and in the rest of the paper, we propose a solution to efficiently verify the voltage drops on the RLC grid. We will find an upper bound on the maximum worst-case voltage drop and a lower bound on the minimum worst-case voltage drop on every node on the grid. These bounds will first require a few applications of the "eopt(·)" operator. Instead of p applications, only r applications are needed where r will be defined below. The vector resulting from the above operation will then be combined with an analytical expression to give the bounds at p → ∞.
A. Discussion
Before we proceed with our proposed solution, we will provide some discussion to highlight the impact of this paper and the significance of grid verification under an RLC grid model. The discussion will focus on a small 6-node RLC grid.
One of the difficulties in analyzing RLC grid models, especially the portions around a C4 pad, is that only certain temporal arrangements of the circuit currents may lead to the worst-case voltage drops on the grid. An example is shown in Fig. 2 , which illustrates the current stimulus that induces the maximum worst-case voltage drop on one of the grid nodes. The figure also shows the voltage waveform resulting from simulating the grid, using HSPICE, subject to that current stimulus. This current stimulus was found by running the exact verification approach in (20) . The approach also reported a maximum worst-case voltage drop of 63.68884 mV. This worst-case value was confirmed by the voltage waveform generated by HSPICE as it showed a maximum voltage drop of 63.67949 mV. Another example is shown in Fig. 3 , which illustrates the current stimulus that causes the maximum overshoot (minimum worst-case voltage drop) on the same grid node. This stimulus, along with a maximum worst-case overshoot of 23.7449925 mV, is also found by running the exact verification technique. Again, HSPICE was used to confirm the results by simulating the grid subject to that current stimulus. In this case, HSPICE reported a maximum overshoot of 20.96346 mV. The HSPICE-generated voltage waveform is also shown in Fig. 3 . An important aspect that was revealed by simulation using HSPICE is that the worst-case voltage waveforms gradually build up in magnitude until they reach their peaks.
As one can see, the current waveforms leading to the worstcase drops are quite complex even for this small 6-node grid. Such complexity arises from the transient system behavior and is compounded by imposing local and global constraints on the power grid current sources under the vectorless verification framework. Note that the global constraints are the reason some currents do not transition from zero to their peak values at certain points in time. As a result, to uncover such current loads, along with the worst-case voltage drops, we employed the novel and unique approach that is derived in (20) . From Table I in Section V, it is also clear that using this exact approach becomes computationally prohibitive even for small-size grids. Therefore, our proposed upper and lower bound approach, is a fast verification technique that can uncover the fact that there exists a combination of feasible current waveforms which can result in voltage violations, be they overshoots or voltage drops.
IV. Proposed Solution A. Grid Transformation
An alternative way of writing (6) is
where i(t) and i(t − t) are the inductive branch currents at time t and t − t, respectively. Note that these are the currents in what we previously defined as rl-branches. Fig. 4 shows a sample branch with two external nodes j and k, with voltage drops v j and v k , an internal node d, with voltage drop v d , and a branch current i l . Depending on the assigned direction of the inductive branch current, two cases arise. Fig 4(a) shows the case where i l flows away from the internal node d. The KCL equation at node d at time t can be written as Applying (23) at time t − t, we get
Fig 4(b) shows the case where i l flows toward the internal node d. In this case, the KCL equation at node d at time t − t gives
Depending on the current assignment in every rl-branch, we can either apply (24) or (25) to all the internal nodes to obtain
whereĜ is an m × n matrix whose kth row is that of either G or −G and it corresponds to the internal node of the kth rl-branch. Combining (21) and (26) eliminates i(t − t) and leads toĜ
Then, we substitute i(t) from (27) into (5) to get
It is clear that (28) technically eliminates the inductive branch currents from the equation variables thus drastically reducing size of the system under verification. A similar transformation technique was employed in [11] for a different grid model and using the trapezoidal finite difference approximation. In our paper, the RLC grid represented by (5) and (6) is now transformed into a circuit with similar properties as the RC grid model [12] . As is the case for the system matrix in the RC case, and unlike A in (9) ,
is sparse, symmetric, positive definite, and a banded Mmatrix. For such a matrix, it is well known that its inverse has entries whose values decay exponentially as one moves away from the diagonal [13] . Such matrices are often referred to as practically sparse and their inverses can be efficiently computed, with a high degree of precision, by using a sparse approximate inverse technique (SPAI) [14] . As we will see in Section IV-B, computing the matrix inverse efficiently and accurately is crucial for our approach.
B. Upper and Lower Bounds

1)
Overview: Now we will show how the upper and lower bounds can be obtained. The derivation of the main result will proceed in two steps. First, we will prove that the bounds at time t can be obtained from the bounds at time t − r t, where r is a crucial parameter that we will define below and which represents a small number of time steps. Second, we will show that this means that the true upper and lower bounds at infinity can be written as a matrix-vector product. The matrix is the inverse of (I − R), where R is a matrix to be introduced below, that depends on r. The vector will be obtained by r applications of the "eopt(·)" operator.
2) Bounds on Voltage Drops That Are r t Steps Apart: We use (28) to write an equation analogous to (10) as follows:
where
+MĜ. Because the RLC grid is a stable linear system and because the backward difference approximation used in (5) and (6) 
where i s (t) ∈ F for t ∈ [0, p t]. Let r be a positive integer such that p/r is an integer, then grouping every r consecutive terms in (30) leads to
As we will see in Section IV-B4, it is possible to choose an appropriate choice of r, which allows the computation of the upper and lower bound vectors. Define
keeping in mind that both N and s(t) are functions of r. Claim 1: Equation (31) can be represented as a recursive relation as follows:
where v(0) = 0 for all t ≤ 0.
Proof: Let v (t) be the value of (31), and v † (t) be that of (34). It is obvious that the case at time t = r t is satisfied,
the claim is true by induction if we prove the following for all t = p t:
Assuming the left-hand side of (35) is true, we get
Using (31) at t = (p t − r t), we obtain
Combining (36) and (37), and using (32) and (33) leads to
Let j = k + 1, then we can re-write equation (38) as follows:
Therefore, v (p t) = v † (p t). This completes the proof. We will now show that (34) leads to upper and lower bound vectors on the worst-case voltage drop on the grid. Define
Notice from (33) that (40) depends on the last r time steps. But we know that the current constraints are DC and that F is the same for each time point, then the result of running "eopt(s(t))" would be the same for any t. We can therefore decouple the components of (40) leading to
where i s is simply an n × 1 vector of variables that satisfies both local and global constraints. Thus, the result of applying eopt(s(t)) for any consecutive r time steps is the vector composed of s 
we have v lb (t) ≤ v(t) ≤ v ub (t), ∀t = r t, 2r t, . . . , p t. Proof: Since v(0) = 0 for all t ≤ 0, then the base case at time t = 0 is satisfied as an equality, i.e., v lb (0) = v(0) = v ub (0) = 0. Then, the claim is true by induction if we prove the following for all t = r t, 2r t, . . . , p t:
Assuming the left-hand side of (44) is true, and by making use of the fact that S is a non-positive matrix and N − S is a non-negative matrix, we get
(N − S)v lb (t − r t) ≤ (N − S)v(t − r t) ≤ (N − S)v ub (t − r t) Sv ub (t − r t) ≤ Sv(t − r t) ≤ Sv lb (t − r t).
(45)
Adding the above two equations leads to (N − S)v lb (t − r t) + Sv ub (t − r t) ≤ Nv(t − r t) ≤ (N − S)v ub (t − r t) + Sv lb (t − r t).
Adding (46) to s
max , then utilizing (34) and (43), we obtain
This completes the proof. Therefore, given bounds on v(t − r t), bounds on v(t) exist and can be computed using claim 2.
3) Bounds in the Steady State: Even though one can only compute bounds on voltage drops that are r time steps ahead in time, we are interested in upper and lower bounds in the steady state case, i.e., when t = p t → ∞.
Let y(t) = [v ub (t) v lb (t)]
T and w = [s 
where R is a 2n × 2n matrix defined as
Writing (47) at every time step r t, 2r t, . . . , p t, yields y(r t) = w y(2r t) = Rw + w y(3r t) = R 2 w + Rw + w . . .
y(p t) = (R
It is therefore clear that the convergence of the upper/lower bound depends on the convergence of the matrix series (R p + . . . + R + I) as p → ∞. It is well known [10] that the series (R p +. . .+R+I) converges as p → ∞ if and only if ρ(R) < 1, under which condition the limit of the series is (I − R) −1 . Consequently, as p → ∞, the series in (49) converges to
provided that ρ(R) < 1, a condition which we now explore. 4) Convergence: Denote the set of all eigenvalues of a matrix X by σ(X). We mentioned earlier that the spectral radius is the magnitude of the largest eigenvalue and since ρ(D −1 E) < 1, then we can say
The spectral mapping theorem [16] provides that, if X is a square matrix and k is an integer, then σ(X k ) = {λ k : λ ∈ σ(X)}. From this and by using (32), it follows that
So, in order to show that ρ(R) < 1, it will be useful to relate the spectral radius of R to that of N.
Claim 3:
The set of the eigenvalues of R, called the spectrum of R, is the union of the spectrums of N and Q.
Proof: Recall that the eigenvalues of R are the roots of the following characteristic polynomial:
where λ ∈ C. Replacing R by its block-matrix representation from (48), and using |X| to denote the det(X), then
Since the determinant of a matrix is invariant under elementary row additions, and by using the block-matrix determinant [17] , we manipulate (54) to obtain
Using (53) and (55), it is easy to see that the eigenvalues of R should satisfy
The spectrum of R is thus the union of the spectrums of N and N − 2S. Recall from claim 2 that 2S = N − Q. Hence, Q = N − 2S. This completes the proof. As a result, the series in (49) converges, as p → ∞, if and only if ρ(N) < 1 and ρ(Q) < 1. We already saw that the spectral radius of N is less than 1, but that of Q is not trivial and is expensive to compute. In this paper, we employ a conservative criterion to bound ρ(Q) based on the fact that, for any matrix X and any matrix p-norm, we have ρ(X) ≤ X p [10] . More specifically, we will bound ρ(Q) by the minimum of the Q ∞ and the Q 1 norms that are defined as
Since (58) and (59) are the maximum row/column sum of the absolute value of the entries of Q, and since Q is the matrix of element-wise absolute value of the entries of N, we have N 1 = Q 1 and N ∞ = Q ∞ . Therefore, a sufficient condition for the convergence of the series in (49), as p → ∞, is when the minimum of N ∞ and N 1 is less than unity.
This can be achieved by making use of the fact that for any matrix X such that ρ(X) < 1, we have lim k→∞ X k p = 0 for any p-norm [10] . This implies that X k p asymptotically decreases with increasing k. From this, and from the fact that N = (D −1 E) r and ρ(D −1 E) < 1, we can say that
where N is the set of integers. Therefore, our approach will be to find such an r 0 for either p = 1 or p = ∞, set r = r 0 , so that either N 1 < 1 or N ∞ < 1, which ensures convergence.
In Section V, we show that the value r 0 required to guarantee convergence is small in practice.
5) Choice of Time
Step: We digress briefly to mention that, when transient analysis is used, the exact worst-case voltage drop values, as well as their upper and lower bounds, depend on the choice of time step t. To see why, observe that in all of the above derivations, we are implicitly assuming that currents may change arbitrarily in their feasibility region F within t time. This would allow currents to switch from zero to their local constraint values, or vice versa, within a single time step. This is clearly not true when t is very small because the switching activity of the currents is dictated by the speed of the underlying logic circuit. Note that our implicit assumption, when used with a small t, is conservative because obviously all slower current scenarios are covered. On the other hand, t has to be small enough to capture the true transient behavior of the grid voltages. As a result, design expertise should guide the choice of t, by striking a balance between the dynamics of the current loads and the grid transient behavior. We also caution the reader that too small a t might introduce noise in the computations and render them useless due to roundoff error issues that arises when t is close to the machine epsilon [18] .
Note that the convergence of our approach, as evident from Section IV-B4, is immune to the choice of time step t. There always exists a value of r for which the matrix series in (49) converges.
C. Upper and Lower Bound Algorithm
In this section, we present a high-level description of our proposed approach that finds upper and lower bounds on the worst-case voltage drops on the grid. Algorithm 1, RLC BOUND, takes as input an RLC circuit along with the local and global constraints. It returns upper and lower bounds on the voltage drop on every node on the grid as the final output.
The algorithm starts by constructing the reduced system as described by (28) is added to its jth row.
After that, D −1 is efficiently computed (using SPAI [14] ) and stored in memory. By capturing only the significant entries of the inverse, the SPAI method guarantees a sparse yet very accurate approximation of D −1 . For more information on SPAI, the reader is referred to Appendix A. Once D −1 is computed, the algorithm then computes w in (42) for r = 1 which requires n maximizations and n minimizations. The fact that D −1 is sparse allows us to reduce the size of each of these optimizations resulting in quite fast optimization runs. Then, RLC BOUND calls Procedure 1, VAR STEP OPTIMIZATION. The purpose of this procedure is to find an r, and the corresponding w, such that either N 1 < 1 or N ∞ < 1. It first computes N for r = 1 (step 1) and stores it in memory. The procedure then iterates while checking N 1 and N ∞ . As long as the condition fails, r is incremented and an update of w is carried out as follows. Based on (42), we can express w for a given r as
The first term on the right-hand side is simply w at r − 1, and (D −1 E) r−1 in the second term is N, also at r − 1. Since both quantities were computed and stored a priori, the update simply requires a matrix-matrix multiplication (step 4), and one "eopt" (step 5). Procedure 1 also keeps on updating N using (32). As soon as an r is reached such that either N 1 < 1 or N ∞ < 1, Procedure 1 exits and returns the current r, N, and w values to Algorithm 1. The latter resumes execution at step 7. As discussed in Section IV-B4, the series in (49) is now guaranteed to converge and the closed form solution of the upper and lower bounds is computed in steps 7-8 via an LU factorization and a forward/backward solve.
Note that the multiplications in steps 1, 4, and 6 of Procedure 1 drop any entry in the result whose value is below some user-defined threshold. This maintains the sparsity structure of N and P as long as r is small, thus allowing the reduction in the size of the LPs required for updating w is step 5.
V. Experimental Results
Algorithm 1, along with Procedure 1, has been implemented in C++. The algorithm uses the Mosek optimization package [19] to solve the required linear programs. We carried out several experiments on a set of randomly-generated power grids, using a 2.6 GHz Linux machine with 8 GB of memory. The grids were based on user specifications, including grid dimensions, metal layers, pitch and width per layer, percentage of rl-branches, and C4 and current source distribution. These C4s and current sources were randomly placed on the grid. Moreover, the circuits generated include user-defined RLC models for the package-grid interconnections and all the experiments were performed on grids with up to ten global constraints and 1.1 V supply voltage.
To assess the quality of our results, we computed the maximum and minimum worst-case voltage drops on the grid using two exact verification approaches. The first approach was proposed by the authors of [7] where these worst-case voltage drops at every node are computed by solving a set of LPs at consequent time steps until convergence. The drawback of this approach is that the number of constraints is multiplied by the number of time steps which, even for small grids, can lead to potentially very large optimization problems. The second exact method was formulated in this paper for the RLC case as given in (20) . Even though the method also uses LPs to compute the worst-case voltage drops, its advantage over the previous exact technique is that the number of constraints at each time step is fixed and does not span previous time-points. Still, the method requires a matrix inverse that is computed once, in addition to matrix-matrix multiplications and solving linear programs at every time step until convergence.
Note that we used the same t value for the exact methods as well as our upper/lower bound technique. We assumed the underlying logic circuitry of all the grids under study can run at a maximum frequency of 10 GHz, so that, as was done in [20] , we use a t value of 1 ps. Table I shows the speed and accuracy of our proposed technique for computing the upper and lower bound voltage drop vectors. We compare our results with the vectors of maximum and minimum worst-case voltage drops resulting from the two exact approaches we discussed earlier. As an accuracy measure, we report the maximum absolute error between the upper and lower bound vectors obtained by our technique and the vectors resulting from the exact verification. Column 4 shows the maximum absolute error between the upper bound voltage drop vector and the exact maximum voltage drop vector, and column 5 shows the maximum absolute error between the lower bound voltage drop vector and the exact minimum voltage drop vector. The number of grid nodes is reported in column 1, and the value of r for which our approach converges is shown in column 2. The table also reports the runtimes of the three methods under consideration as well as the number of time steps required by the two exact approaches to converge. Results show that our proposed method resulted in a maximum absolute error of around 10mV across all nodes on the grids under study. Note that this error comes from two main sources. The first one is the sparse approximation of D −1 . We keep this to a minimum by using a very small k in (67), in the range of 10 −9 to 10 −7 . This means that the maximum difference between an entry in the kth column of the exact inverse and the corresponding entry in the kth column of the approximate inverse is between 10 −9 and 10 −7 . The second, and main, source of error comes from the computation of the upper/lower bound vectors via (50). The reason an error is incurred is because our method performs optimizations for the first r time steps only, where r is a very small number of time steps compared to what is required in the exact approach.
The error comparisons could not be extended to larger grids and we were able to perform these comparisons for grids with a maximum of 8835 nodes since the exact approaches are computationally expensive for larger grids. The exact technique of [7] took 3.08 days for the 187-node grid, while that presented in (20) took 3.45 days for the 8835-node grid. Results show that our approach runs several orders of magnitude faster than the exact computations which exhibit impractical times even for small grids. For instance, computing the upper and lower bound vectors for the 8835-node grid took 39.93 minutes while computing the exact worst-case voltage drops using (20) took 3.45 days. This is over 124X speed up! Moreover, our technique allows one to verify grids with a relatively large number of nodes; verification of a 170 193-node grid requires 36.19 h. This would have taken several days or even weeks in the exact case. Note also that, for all the grids under consideration, the maximum value of r for which (49) converges is very small thus limiting the number of matrix multiplications and the number of linear programs. Thus, our method makes checking an RLC power grid a feasible and practical solution.
One might argue that such grids are still small when compared to full-chip grids containing millions of nodes. Nevertheless, our approach is important for at least two reasons: 1) it is an approach to rigorously check the safety of an RLC grid in a truly vectorless approach, and 2) our method can be applied to the top-level main feeder network of the grid that is not as large as grids at the end of the design flow. The ability to test the main feeder network early in the design flow is a major advantage of our technique. We believe that it can lead to practical methods for early vectorless grid verification. Table II shows a breakdown of the runtime of the major components of our approach for the largest seven grids we investigated. Column 2 represents the time it takes to compute D −1 using SPAI. Column 3 shows the runtime of the matrixmatrix multiplications required, and columns 4 and 5 report the runtimes of the optimization component and the linear system solve, respectively. Even though the matrix-matrix multiplications comprise a significant portion of the total runtime of the algorithm, there is scope to achieve considerable speedup in computing D −1 as well as in performing the linear optimization step, by exploiting parallelism inherent in SPAI and "eopt." From Appendix A, the columns of the approximate inverse can be computed by n independent leastsquare problems. Also, recall from Section III that "eopt" is an element-wise optimization where each element can be found separately. As a result, given a cluster of multi-core machines, the runtime of these two components, if parallelized, would be reduced by a factor equal to the total number of cores in the cluster. This represents a significant speedup given that the combined runtime of SPAI and "eopt" for the grids under study represents 30% to 55% of the total runtime. Therefore, with parallelism exploited, our algorithm can be fast enough to verify power grids with a larger number of nodes.
To further showcase the accuracy of our algorithm, Fig. 5 shows a scatter plot of the relative errors, in percent, versus the maximum worst-case voltage drops on a 2325-node grid. The figure also shows the curve corresponding to an absolute error of 5 mV where a point on the curve represents a node voltage drop that is over-estimated by exactly 5 mV. On the other hand, a point with an over-estimation greater than 5 mV will lie in the region above the curve. From the resulting scatter plot, it is clear that such absolute errors are very small meaning that our algorithm is very accurate. Note that for some maximum worst-case voltage drops, especially those with small magnitudes, the relative error can be high. This, however, is of no concern in practice, as the critical voltage drops are those that are relatively large. In that case, the relative error incurred is small. A similar behavior is observed for the relative errors versus the minimum worst-case voltage drops.
VI. Conclusion
With the rising demand of low voltage designs, power grid verification is becoming increasingly important. We introduced a fast and efficient inductance-aware early verification technique in the framework of current constraints. Our approach transforms the RLC grid into a reduced circuit to alleviate the problems associated with its system matrices then, it computes conservative bounds on the worst-case voltage drops on the grid. The approach requires only a few linear programs and one linear system solve thus making early grid verification under the RLC model practical and scalable. Prior art simply could not handle grids of more than a couple thousand nodes, and even for such systems runtimes were prohibitive. Our method showed a drastic improvement in runtime; verifying a grid of about 170 000 nodes took just over 36 h. Results also showed that, for the girds under study, the errors incurred by our technique were relatively very small, never exceeding 10 mV.
Appendix
It is obvious from Section IV-B that D −1 is essential for the computation of the upper and lower bounds and needs to be computed in a fast and accurate manner so as not to compromise the speed and accuracy of the overall technique. In Section IV-A, we stated that in addition to eliminating the inductive branch currents and reducing the system size, the advantage of the RLC grid transformation is that D −1 is practically sparse and thus can be efficiently computed by a sparse approximate inverse technique.
In what follows, we will briefly explain the SPAI given in [14] . Let M be an n×n matrix and let the vector m k denote the kth column of M. Note that this M is different from the incidence matrix defined in Section II-B. Let e k be the n × 1 vector consisting of all zeros, except for its kth component which is 1. Then, consider the n × 1 vector, called the kth residual, defined by Given some user-defined tolerance η, SPAI computes an approximationM to the actual inverse matrix M . It operates by finding anM matrix that gives a very small residual norm, as an approximation to the unknown M (which, as we saw above, would give a zero residual). The technique starts with an arbitrary initial matrix M and iteratively refines its columns by minimizing the Frobenius norm DM − I F subject to η. Recall that for any n × n matrix H, the Frobenius norm is defined as [10] 
where h ij are the entries of H, which can be re-written as
He k
where h k is the kth column of H. Obviously, the minimum of (63) (which is zero) occurs when H = 0. Likewise, the minimum of DM − I F (which is zero) occurs when M = D −1 = M . By reducing DM − I F until it is less than η, SPAI finds anM that is a good approximation to M = D −1 . To see how SPAI works, consider that DM − I 
In this way, the problem of finding an approximate inverseM separates into n independent least-squares problems For each k = 1, 2, . . . , n, vary m k so as to minimize r k 2 2 until an m k is found for which r k 2 ≤ η. While r k 2 > η, the iterative refinement of m k proposed by [14] attempts to improve on the column by appending the most profitable entries, i.e., the ones that will result in the largest reduction in r k 2 . In this manner, the technique automatically captures the sparsity pattern by including the significant entries of the inverse while avoiding fill-ins.
The numerical study in [14] shows that SPAI is stable and it captures the main entries of D −1 extremely well. Moreover, it is an extremely efficient technique. It is inherently parallel as the columns ofM can be computed independently of one another. Moreover, the computation of a column is cheap; it requires a matrix-vector product and several QR factorizations of small submatrices of the original sparse matrix D.
Normally, SPAI terminates when r k 2 is smaller than a user-supplied η, for every k. 
where · ∞ is the infinity norm. We define the error tolerance k > 0, where k ∈ R is a scalar. As the authors did in [21] , we modify SPAI so that it stops when, for every k, we have
which achieves the condition that, for every k which is its infinity norm. In other words, if b is a vector such that Db = a, then it is clear that
Finding this b is easy, by performing one LU-factorization of D, followed by a backward and a forward solve starting with a.
