Abstract-Decoupling capacitors (decaps) are typically used to reduce the noise in the power supply network. Because the delay of gates and interconnects is affected by the supply voltage level, decaps can be used to improve the circuit performance as well. In this paper, we present an analytical delay model under IR drop, Ldi/dt noise, and decaps to study how decaps affect both the gate and interconnect delay. Given a floorplanning solution, we study how to allocate the whitespace for decap insertion so that the delay is minimized under the given noise and area constraints. We employ the Sequential Linear Programming method to solve the non-linear whitespace allocation problem. Our experimental results show that intelligent decap allocation makes further delay reduction possible without adding any additional decap.
I. INTRODUCTION
Signal integrity is a very important issue in VLSI technology. Simultaneous switching of digital circuit elements can cause considerable IR-drop and Ldi/dt noise in the power supply network, and this power supply noise may cause logic faults. On-chip decoupling capacitors (decaps) are widely used to mitigate the power-supply noise problem. By charging up during steady-state conditions, decaps can assume the role of the power supply and provide the current needed during the simultaneous switching of multiple functional blocks.
Non-ideal power supplies also affect circuit delay significantly. The IR drop and Ldi/dt noise along the power supply network decrease the V_dd level, which in turn slows down the gates connected to it. While buffer insertion has been a popular method to reduce circuit delay, decap insertion is another useful method to improve circuit performance. If decaps are inserted carefully, both the power-supply noise and circuit delay problems can be mitigated simultaneously. Decaps are usually inserted in the whitespace of a layout. Since it is the floorplanning stage that determines the whitespace regions, decap planning is highly effective during the floorplanning stage for optimizing delay and noise. Because the noise/decap-aware delay varies across floorplans, a designer can choose the floorplan with the best performance, at the cost of more optimization time. The results of our integrated floorplanning algorithm show this trade-off.
The history of utilizing decaps for delay minimization is short. The work in [1] introduces a gate delay model with respect to decap. Delay in this model is a linear function of voltage drop, and the coefficients are determined by multiple regression analysis. The authors of [2] apply this model to the decap planning problem and meet the delay target by Lagrangian relaxation. Compared with [1] and [2], our work considers both gate and interconnect delay. In addition, our analytical model captures the highly non-linear relation between delay and decap. The work in [3] considers both power-supply noise and delay minimization in gate-level placement. Their delay model is related to the voltage change by a non-linear function.
Leakage is another significant issue in decap planning, because the leakage current flowing through decaps raises the temperature of the chip. In a practical design, the leakage current is proportional to the decap area, so the decap area should be restricted, even though more decap stabilizes the voltage level at each power grid more effectively. On the other hand, if the voltage level is already close to the ideal V_dd, decap insertion is not useful. Therefore, we propose an efficient sequential linear programming algorithm to plan decaps under the leakage and other constraints. Additional work on decap-aware floorplanning for 2D circuits is presented in [4], where the objective is to minimize the floorplan area while suppressing the power supply noise below the specified limit. The authors of [5] present a circuit model that allows non-adjacent decap access, instead of only the adjacent decap access in [4]. Iterative linear programming for decap planning is applied in [6], [7]. A time-domain method of power-grid analysis is proposed in [8]. Compared with these works, our work considers both noise and delay objectives. The contributions of this paper are as follows:
• We present an analytical resistance-inductance-capacitance (RLC) delay model that captures the non-linear relation between gate/interconnect delay and decap/IR-drop/Ldi/dt noise. Our model correctly captures the impact of decap planning on the noise and delay objectives, and can directly calculate the delay from the block at any place in the power grid. SPICE simulation results validate the usefulness of our formula.
• We present an effective floorplanning-level decap planning algorithm that utilizes the noise/decap-aware delay model for rigorous performance optimization under noise and area constraints. We employ a highly effective linear approximation to solve the original non-linear program in a sequential fashion.
II. DECAP-AWARE DELAY AND NOISE ANALYSIS

A. Power-supply network and RLC Circuit Model
The power-supply network model used in this work is shown in Figure 1. R_P and L_P are the package parasitics of the power pins; all other components are on-chip. The capacitors, such as C_A and C_B, combine the parasitic capacitance (not drawn in the power grid due to limited space) and the decap. We consider only the nearest power-supply pin, and both the shortest and the second shortest paths from that pin (footnote 1). The SPICE simulation in [4] shows that the error caused by this assumption is less than 10%. More complex models without these two assumptions can be designed based on [9], but they would be too time consuming for use in the floorplanning stage. Because our model with these two limiting assumptions still gives a good approximation, as demonstrated by [4], we apply both of them in the analysis.
Figure 2 shows the circuit model used for the noise/decap-aware delay analysis. The physical model consists of a non-ideal power supply (due to IR and Ldi/dt noise), decaps, an RC buffer model that is attached to the non-ideal power supply network, a wire, and a final capacitive load. This RLC model has the following three major parts:
1) Power supply: the resistors, inductors, and decaps between the power pin and the load have the same notations as in Figure 1. I_P, I_A1, I_B1, I_B2, and I_B3 are the currents to the power-supply grids adjacent to P, A1, B1, B2, and B3, respectively (these loads are not shown in Figure 1 because of limited space). I_L is the current of the load without the output buffer. The non-ideal voltage level is denoted by V_a.
2) Driving buffer: we use a first-order circuit model to represent the driving buffer that is connected to the non-ideal power supply. The voltage level at the output of the buffer is denoted by V_b. D_in denotes the buffer delay, R_out denotes the output resistance, and C_par denotes the parasitic capacitance of the buffer.
3) Wire and load: we use the lumped π-model to represent the interconnect, where R_w, C_w, and L_w respectively denote the wire resistance, wire capacitance, and wire inductance. C_L denotes the loading capacitance, and V_c is the voltage level at the input of the load.
The delay analysis measures the delay from node V_a to V_c in Figure 2.
B. Delay Analysis
The voltage level at each node of the circuit in Figure 2 can be solved by Kirchhoff's current law (KCL) and Kirchhoff's voltage law (KVL). The circuit can be separated into five parts denoted by subscripts g, c, l, v, and i, which represent the branches containing resistors, capacitors, inductors, voltage sources, and current sources, respectively. Because of limited space, we represent the circuit in matrix form. A_g is the incidence matrix of the resistor branches, and the incidence matrices of the other parts are defined in the same way. Therefore, KCL can be written as
KVL can be written as
Footnote 1: The two paths for other power-supply network scenarios can be generated in the same way as in Figure 1, which is used for ease of explanation.
where V_n is the vector of all node voltages. The constitutive relations of the resistance, capacitance, and inductance branches are G V_g = I_g, sC V_c = I_c, and sL I_l = V_l, where G, C, and L are the diagonal matrices whose elements are the values of conductance, capacitance, and inductance in the corresponding branch, respectively. The decaps and inductors along the power grids are already charged before this transient analysis. Therefore, an additional voltage source is connected to each decap in parallel, and an additional current source is connected to each inductor except L_w in series.
By applying the modified nodal analysis (MNA) method [10] , the following matrix equation can be obtained:
X is the vector of the transfer functions in s-domain. These transfer functions can be used to obtain the response in time domain. Therefore, the delay of V c can be calculated from the moments of its transfer function. These moments are denoted as m i (i ≥ 0), one element in
where m_i is the i-th moment vector of the transfer functions of all nodes in the circuit, and G^{-1} is calculated by LU decomposition. Several formulas for delay calculation by moment matching have been proposed in [7], [10], [11], and [12]. However, in the noise/decap-aware delay model, both poles and zeros must be considered, unlike in [12], because the decaps and inductors along the power grid have already been charged as part of the initial conditions. Therefore, a new set of formulas is needed. Keeping terms up to s^2, the truncated transfer function of V_c is
The moment expansion of H(s) is
Matching the moments of H(s) to the moment of the transfer function of node c obtained before, we arrive at,
When a_1^2 − 4a_2 > 0, the poles of this system are −p_1 and −p_2. Therefore, the response of V_c is
Because p 1 < p2, the second term of response decreases much more rapidly than the first term. Therefore, only the dominant pole needs to be considered,
Assuming the signal rises from 0 to 50% of V dd during the elapsed time, the solution to the following equation is the delay of the circuit under analysis:
Solving for t, the analytical noise/decap-aware model is as follows:
The total delay includes the buffer delay Din, the value of which can be measured at different voltage levels and used during decap planning by table look-up.
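The moment-to-delay flow described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the system is written as (G + sC)X = b, so the moments follow the standard recurrence m_0 = G^{-1} b and m_i = −G^{-1} C m_{i−1}; the reduced model is the all-pole form 1/(1 + a_1 s + a_2 s^2), which omits the numerator (zero) terms and the initial-condition sources that the model in this paper includes; and the input is a unit step. The function names and the two-stage RC ladder used as input are hypothetical.

```python
import math

def lu_factor(a):
    """Doolittle LU with partial pivoting; returns (lu, perm)."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0.0:
            raise ValueError("singular matrix")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]
    return lu, perm

def lu_solve(lu, perm, b):
    """Solve A x = b using the factors from lu_factor."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):                      # forward substitution (unit L)
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):            # back substitution (U)
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y

def moments(G, C, b, order):
    """m_0 = G^-1 b, m_i = -G^-1 C m_(i-1); G is factored only once."""
    n = len(G)
    lu, perm = lu_factor(G)
    m = [lu_solve(lu, perm, b)]
    for _ in range(order):
        rhs = [-sum(C[i][j] * m[-1][j] for j in range(n)) for i in range(n)]
        m.append(lu_solve(lu, perm, rhs))
    return m

def two_pole_fit(m0, m1, m2):
    """Match m0 / (1 + a1*s + a2*s^2) to the first three moments (assumed form)."""
    m1, m2 = m1 / m0, m2 / m0
    return -m1, m1 * m1 - m2

def dominant_pole_delay(a1, a2):
    """50% delay keeping only the slow pole; distinct real poles only."""
    disc = a1 * a1 - 4.0 * a2
    if disc <= 0.0:
        raise ValueError("this sketch covers only the case a1^2 - 4*a2 > 0")
    p1 = (a1 - math.sqrt(disc)) / (2.0 * a2)   # dominant (slow) pole
    p2 = (a1 + math.sqrt(disc)) / (2.0 * a2)
    # v(t) ~ 1 - p2/(p2 - p1) * exp(-p1*t) = 0.5, solved for t
    return math.log(2.0 * p2 / (p2 - p1)) / p1

# Hypothetical two-stage RC ladder with R = C = 1 (normalized units).
G = [[2.0, -1.0], [-1.0, 1.0]]
C = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 0.0]
m = moments(G, C, b, order=2)
a1, a2 = two_pole_fit(m[0][1], m[1][1], m[2][1])   # far-end node
t50 = dominant_pole_delay(a1, a2)
```

For this ladder the fit recovers a_1 = 3 and a_2 = 1, which is the exact far-end transfer function of the toy circuit, so the sketch can be checked by hand before it is applied to a larger network.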
Similarly, when a_1^2 − 4a_2 < 0, let p = a_1/(2a_2) and q = Δ/(2a_2); the response is
Setting v_c(t) = 50% V_dd gives an implicit equation, which can be solved approximately based on the first moment, m_1. Therefore, the total delay is
When a_1^2 − 4a_2 = 0, the response is
Setting v_c(t) = 50% V_dd, the total delay is
where W(x) is the solution of w · exp(w) = x, which can be solved by iteration. If the interconnection is a multi-terminal net, more nodes will be added into the circuit model. Our method can also be applied to calculate the moments at these additional nodes.
To concentrate on the effect of the decaps, only two-terminal nets are used in this paper 2 . Section IV-A presents the SPICE simulation results of the delay model.
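The equation w · exp(w) = x mentioned above defines the Lambert W function, and the iterative solution can be sketched as follows. This is a minimal illustration, assuming x ≥ 0 (principal branch) and using Newton's method; the paper does not state which iteration it uses, and the function name and tolerances are hypothetical.

```python
import math

def lambert_w(x, tol=1e-12, max_iter=100):
    """Solve w * exp(w) = x for w >= 0 by Newton iteration (x >= 0 only)."""
    if x < 0.0:
        raise ValueError("this sketch handles only x >= 0")
    w = math.log1p(x)            # starting guess at or above the root
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))   # f(w) / f'(w)
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            return w
    raise RuntimeError("Newton iteration did not converge")

w_e = lambert_w(math.e)   # w * exp(w) = e has the solution w = 1
w_5 = lambert_w(5.0)
```

Because w · exp(w) − x is increasing and convex for w ≥ 0 and the starting guess ln(1 + x) never lies below the root, the Newton iterates decrease monotonically toward the solution.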
C. Noise Analysis
We use the method presented in [4] and [5] to calculate IR-drop and Ldi/dt noise from a given floorplan. A uniform RLC-mesh is used to model the P/G network. The edges of the mesh have resistive impedances and inductances. The mesh contains power supply nodes and power consuming nodes. The dominant paths of a block are the shortest path and the second shortest path from the nearest power supply to the block requiring most of the current. The voltage noise of a block, denoted V_noise^(k), is simply the voltage change caused by the resistors and inductors along its dominant paths. We also consider the extra current provided by the decoupling capacitors added to some of the mesh nodes. Depending on the location of the blocks, some edges are included in more dominant paths than others. In this case, the voltage drop is larger on those "popular" edges and the blocks that use them.
Footnote 2: We decompose multi-pin nets into a set of source-sink two-pin nets, because our decap planning is performed in the floorplanning phase.
In the worst case, a module would draw all of its switching current from its decap. Let Q^(k) = ∫_0^{t_s} I^(k)(t) dt denote the maximum charge drawn from the power supply by block B^(k), where I^(k)(t) is the current demand, and t_s is the switching period. The decap demand is calculated as follows:
where V tol denotes the voltage-drop constraint. This worst-case demand is used during our optimization as the demand.
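As a worked illustration of the worst-case demand, the sketch below integrates a sampled current waveform to obtain the charge Q^(k) and converts it to a capacitance. It assumes the demand takes the simple form C = Q / V_tol, which is a natural reading of the worst case described above but is not confirmed by the surviving text. The function names, the trapezoidal integration, and the triangular waveform are hypothetical.

```python
def charge_drawn(times, currents):
    """Trapezoidal estimate of Q = integral of I(t) dt over the switching period."""
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    q = 0.0
    for k in range(1, len(times)):
        q += 0.5 * (currents[k] + currents[k - 1]) * (times[k] - times[k - 1])
    return q

def decap_demand(times, currents, v_tol):
    """Assumed worst-case form: all switching charge comes from the decap."""
    if v_tol <= 0.0:
        raise ValueError("v_tol must be positive")
    return charge_drawn(times, currents) / v_tol

# Hypothetical block: triangular 0.2 A current pulse over a 1 ns switching period,
# with a 0.1 V voltage-drop tolerance.
t = [0.0, 0.5e-9, 1.0e-9]
i = [0.0, 0.2, 0.0]
demand = decap_demand(t, i, v_tol=0.1)   # farads
```

With these numbers the charge is 0.1 nC and the resulting demand is 1 nF.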
III. NOISE AND DELAY-AWARE DECAP PLANNING
A. Problem Formulation
The decaps are usually inserted in the whitespace of a layout. Since it is the floorplanning stage that determines the whitespace regions, decap planning is highly effective at the floorplanning stage. One way to plan the decaps is to allocate the whitespace after the floorplan is fixed (= post-floorplanning). This problem is stated as follows:
We use the noise/decap-aware delay model shown in Equations (1-3) to measure point-to-point delay values in the floorplan. We note that the delay of a wire is a non-linear function of IR-drop, Ldi/dt noise, decap, and the wire length. We employ the sequential linear programming (SLP) method to solve this highly non-linear, post-floorplan whitespace allocation problem.
B. Non-linear Optimization Formulation
Considering the area, noise, and leakage constraints, our non-linear decap planning problem is formulated as follows:
Subject to
x_ik = 0, if ws_k is not adjacent to blk_i (10)
where S_b is the set of functional blocks, and S_w is the set of whitespace regions in the given floorplan. D_ij is the delay from block i to block j based on our noise/decap-aware delay model (= Equations (1-3)). x_ik is the portion of whitespace k adjacent to block i that is used for decap insertion. C_ox is the capacitance per unit area. The area of whitespace j is A_j. The decap requirement of block i, denoted by C_i, is computed based on the voltage-drop constraint as shown in Equation (5). Equation (6) is our objective function, where the total delay among all two-pin wires is minimized. Equation (7) is the decap demand (= voltage-drop noise) constraint, which states that the total amount of decap adjacent to a block should exceed its demand for decap. Our voltage-drop and decap demand analysis is presented in Section II-C. Equation (8) is the whitespace area constraint, which states that the total amount of whitespace used for decap allocation cannot exceed the amount available. Equation (9) limits the total current leakage of decap, which is proportional to the total area of the decaps. Equation (10) states that only adjacent whitespace may be used to meet the decap demand of the blocks.
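The constraints described above can be made concrete with a small feasibility check. This is a sketch under stated assumptions, since the displayed forms of Equations (7) to (9) are not reproduced here: the demand constraint is taken as C_ox · Σ_k A_k x_ik ≥ C_i, the whitespace constraint as Σ_i x_ik ≤ 1, and the leakage constraint as a cap on the total decap area. The function name, the parameter `max_decap_area`, and the example data (arbitrary units) are hypothetical.

```python
def check_allocation(x, area, c_ox, demand, adjacent, max_decap_area, eps=1e-9):
    """Return the list of violated constraints for allocation x.

    x[i][k] is the fraction of whitespace k assigned to block i;
    adjacent[i] is the set of whitespace indices adjacent to block i.
    """
    n_blk, n_ws = len(x), len(area)
    violations = []
    for i in range(n_blk):
        # Demand (noise) constraint: decap next to block i covers its demand.
        cap = sum(c_ox * area[k] * x[i][k] for k in range(n_ws))
        if cap + eps < demand[i]:
            violations.append(("demand", i))
        for k in range(n_ws):
            if x[i][k] < -eps:
                violations.append(("negative", i, k))
            # Adjacency constraint: only adjacent whitespace may be used.
            if x[i][k] > eps and k not in adjacent[i]:
                violations.append(("adjacency", i, k))
    for k in range(n_ws):
        # Whitespace constraint: a region cannot be over-allocated.
        if sum(x[i][k] for i in range(n_blk)) > 1.0 + eps:
            violations.append(("whitespace", k))
    # Leakage constraint, assumed to be a cap on the total decap area.
    used = sum(area[k] * x[i][k] for i in range(n_blk) for k in range(n_ws))
    if used > max_decap_area + eps:
        violations.append(("leakage",))
    return violations

area = [10.0, 20.0]              # whitespace areas
adjacent = [{0}, {0, 1}]         # block 0 touches ws 0; block 1 touches both
demand = [4.0, 6.0]              # decap demand per block
ok = check_allocation([[0.8, 0.0], [0.2, 0.5]], area, 0.5, demand, adjacent, 25.0)
bad = check_allocation([[0.8, 0.1], [0.3, 0.5]], area, 0.5, demand, adjacent, 25.0)
```

The first allocation satisfies every constraint, while the second both over-allocates whitespace 0 and assigns block 0 a region it is not adjacent to.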
C. Sequential Linear Programming
We solve the non-linear optimization problem formulated in Section III-B using the sequential linear programming method, because the objective function of this problem is complicated, while the constraints are all linear. Our method consists of two steps, namely, initial planning, and iterative improvement. The goal of the initial planning step is to obtain a feasible solution for subsequent iterative improvement. During initial planning, our primary goal is to obtain a solution that satisfies the voltage-drop constraint, that is, a solution that meets the decap demand of the blocks. Note that the floorplan area will be expanded if the existing whitespace is not enough to meet the decap demand. In this case, we perform the voltage-drop analysis and decap demand computation again to reflect the changes in the floorplan.
We improve the delay of the initial solution during the iterative improvement step while suppressing noise. In this step, the floorplan area is not further expanded, but the decap allocation is re-adjusted to improve delay under the noise constraints. During the SLP-based improvement step, the non-linear delay objective is approximated linearly. In addition, a set of bounds is added to the solution space to ensure that the linear approximation stays close to the original objective function. The SLP-based iterative improvement step terminates when the update to the solution becomes negligible.
1) Initial Planning Step: The initial planning step is formulated as the following LP:
x_ik = 0, if ws_k is not adjacent to blk_i (13)
The objective is to minimize the difference between the decap demand and the decap adjacent to each block. When this difference becomes zero, the decap demand is fully met and the voltage-drop noise constraint is satisfied. We minimize this difference because the amount of existing whitespace in the given floorplan may not be enough. In this case, floorplan expansion is necessary to add more whitespace. Thus, we repeat this iterative LP minimization and floorplan expansion until our objective becomes zero, that is, until the decap demand is satisfied (footnote 3). Constraints (12) and (13) in this LP are identical to Equations (8) and (10).
2) Iterative Improvement Step: Let C^n_Ai and C^n_Bi denote the amount of decap adjacent to block i along the shortest and the second shortest path, respectively, during the n-th iteration. Because the floorplan is fixed, the decap is the only variable in this step. Therefore, the delay can be represented as a function of the decap, f(C^n_Ai, C^n_Bi).
The SLP-based iterative improvement step proceeds as follows:
• step 0: set n = 0; obtain C^n_Ai and C^n_Bi (∀i ∈ S_b) from the whitespace allocation results of the Initial Planning Step.
• step 1: n = n + 1;
• step 2: compute the linearized version of our delay objective function using f(C^{n−1}_Ai, C^{n−1}_Bi).
• step 3: solve LP_n, the LP of the n-th iteration.
• step 4: if the update to the solution is negligible, then terminate; otherwise, go to step 1.
Footnote 3: The total number of iterations depends on the noise constraint and the initial floorplan. Our experimental results show the amount of expansion needed for each circuit in Section IV-B.
Fig. 3. Validation of our analytical noise/decap-aware delay model based on SPICE simulation (curves: circuit model, formula).
Step 2 of the SLP requires linearization of the objective function used in the original NLP, that is, Equation (6). Therefore, f(C^n_Ai, C^n_Bi), the non-linear function relating delay to decap, is approximated by a linear form within the following bounds:
where C^{n−1}_Ai and C^{n−1}_Bi denote the assignment variables from the previous iteration. The following constraints are derived from the above bounds.
where the additional notation X as a subscript of x means using only the decap along the path X.
The linear approximation of f (C n Ai , C n Bi ) within the above bound is
Our LP at the n-th iteration is formulated as follows:
subject to (7), (8), (9), (10), (14), (15), (16)
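The overall SLP loop can be sketched on a toy problem. Everything specific in this sketch is an assumption made for illustration: the delay function Σ w_i / c_i is a hypothetical stand-in for the noise/decap-aware delay model, the constraints are reduced to a per-block demand and a single total decap budget, and the linearization bound is taken as [C/ρ, ρC] around the previous iterate. Because the toy LP has only box bounds and one budget constraint, it is solved greedily rather than with a general LP solver. The rule of rejecting a non-improving step and shrinking ρ is a common SLP safeguard, not something stated in the text.

```python
def delay(c, w):
    """Hypothetical delay model: decreasing and convex in the decap c_i."""
    return sum(wi / ci for wi, ci in zip(w, c))

def delay_gradient(c, w):
    return [-wi / (ci * ci) for wi, ci in zip(w, c)]

def solve_lp(g, lo, hi, budget):
    """Minimize sum(g_i * c_i) subject to lo <= c <= hi and sum(c) <= budget.

    With one budget row this is a continuous knapsack, so a greedy fill in
    order of most negative gradient is optimal.
    """
    c = lo[:]
    remaining = budget - sum(lo)
    for i in sorted(range(len(g)), key=lambda i: g[i]):
        if g[i] >= 0.0 or remaining <= 0.0:
            break
        add = min(hi[i] - lo[i], remaining)
        c[i] += add
        remaining -= add
    return c

def slp(c0, w, demand, budget, rho=2.0, tol=1e-9, max_iter=1000):
    c, best = c0[:], delay(c0, w)
    for _ in range(max_iter):
        g = delay_gradient(c, w)                       # linearize at c
        lo = [max(d, ci / rho) for d, ci in zip(demand, c)]
        hi = [ci * rho for ci in c]                    # bound [C/rho, rho*C]
        cand = solve_lp(g, lo, hi, budget)
        val = delay(cand, w)
        if val < best - tol:
            c, best = cand, val                        # accept the LP step
        else:
            rho = 1.0 + (rho - 1.0) / 2.0              # tighten the bound
            if rho - 1.0 < 1e-6:
                break                                  # updates are negligible
    return c, best

w = [1.0, 4.0]               # hypothetical delay weights per block
demand = [0.5, 0.5]          # decap demand from the noise constraint
c_opt, d_opt = slp([0.5, 0.5], w, demand, budget=3.0)
```

For this toy instance the loop reaches c = [1, 2] with total delay 3, which matches the analytical optimum c_i ∝ √w_i under the budget.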
IV. EXPERIMENTAL RESULTS
We implemented our decap planning algorithm using C++/STL and ran it on a 1.2GHz Celeron-M processor with 1GB memory. The parameters used in this paper are based on 70nm technologies except for Table II, where we use 250nm for comparison with existing works. We report our results based on the MCNC and GSRC benchmark circuits. Power and ground pins are all assumed to be uniformly distributed on the boundary of the floorplan.
A. SPICE Simulation Results
Figure 3 shows the SPICE simulation of the RLC-model shown in Figure 2. We compare this to our analytical noise/decap-aware delay model, Equations (1-3). The length of the interconnect is 3mm so that we can analyze global connections common to block-level interconnects. We observe that our formulas closely match the SPICE simulation. We also verify the non-linear relation between the delay and decap, where increasing decap size reduces the delay in a non-linear fashion. Figure 4 shows simulation scenarios with different wire lengths, load sizes, power-supply networks, and distances from the block to the nearest decap. Table I shows the setups and maximum error of these experiments. We vary only one parameter for each experiment to observe the sensitivity of the model to each parameter. Figures 3 and 4 show that our delay model is reliable across various scenarios on the floorplan. Figure 5 shows the SPICE-based voltage curves of various nodes for the RLC-circuit shown in Figure 2. V_in denotes the logic value applied to the input of the driver. V(a) shows the non-ideal behavior of the power supply (footnote 4). V(b) corresponds to the output voltage of the driver, and V(c) denotes the voltage at the input of the final load.
B. Decap Planning Results
We first compare our results to an existing work that is based on "effective decap distance" [5] (footnote 5). We use the 250nm technology parameters used in [5]. Table II shows the comparison between [5] and our algorithm in terms of floorplan area, decap cost, and runtime. Note that [5] does not report delay results since it only optimizes decap cost under a noise constraint. In addition, the current density values in both results are based on random assignment within the same range. Thus, a strict head-to-head comparison is not possible. Nevertheless, these results show that our results are comparable to those of [5]. We use more up-to-date 70nm technology for all subsequent results. Table III shows a comparison among the following algorithms.
• Noise-C minimizes decap cost under a noise constraint (noise-C). Delay is ignored in this case. This is designed to reproduce results in [4] and [5] for a more up-to-date technology node. Therefore, only the initial planning step is applied in this algorithm.
• Delay-O, noise-C minimizes delay (delay-O) under a noise constraint (noise-C). This algorithm corresponds to our nonlinear programming formulation presented in Section III-C.
• "Integrated floorplan" integrates the delay-aware decap planning algorithm (= delay-O, noise-C) into the floorplanning process. Our Sequence-Pair-based [14] floorplanner performs decap planning at every move during the low-temperature region of annealing, and chooses the result with the minimum delay.
Note that the same value for the noise constraint is imposed on all of the above algorithms. Our two major observations are as follows:
• The "decap" and "% ave-dly reduce" columns show that the delay-oriented algorithm (= delay-O, noise-C) obtains better delay reduction at comparable decap cost compared to the pure noise-oriented algorithm (= noise-C). Therefore, further delay reduction is possible with a similar decap cost. Further investigation reveals that our delay-oriented algorithm allocates more decaps to the blocks with more interconnects incident to them. A similar trend can be observed from the "maximum delay" column (= maximum among all interconnect delay values).
• The area increase from floorplan expansion is reported under the "% area expand" column. This area overhead, mainly used to satisfy the noise constraint, ranges from 0 to 12%.
We note that the integrated floorplanning algorithm obtains smaller delay than the other algorithms, because the delay of other floorplans may be smaller than that of the final floorplan after decap planning. Therefore, the selection among different floorplans can further reduce the delay in spite of the longer runtime. The total runtime is proportional to the number of iterations used in the SLP, and depends on the number of temperature levels used for decap planning. Lastly, Table IV shows the impact of the linearization bound on the delay function. If the value of the decap in the previous iteration is C, then the linearization bound is [C/ρ, ρC]. We change ρ and
