One of the most crucial design challenges for high-performance Systems-on-Chip is coping with power supply noise, which is aggravated by high operating frequencies, the huge number of functional blocks and technology scaling. Unlike traditional post-physical-design static voltage drop analysis, this work focuses on a priori dynamic voltage drop evaluation. It takes into account transient currents and on-chip and package RLC parasitics while exploring the power grid design solution space: design countermeasures can thus be defined early, and long post-physical-design verification cycles can be shortened. As shown by an extensive set of results, a carefully extracted and modular grid library assures a realistic evaluation of the impact of parasitics on noise and facilitates the construction of the power network; furthermore, statistical analysis guarantees a correct evaluation of the current envelope, and SPICE simulations endorse the reliability of the results.
Introduction
The need to integrate an increasing number of functional units in the same circuit raises the design complexity of the System-on-Chip (SoC) scenario. One of the most critical concerns is the routing and sizing of the interconnects delivering both signals and power supply to the functional units. In fact, the relentless scaling of transistor feature sizes that makes the SoC integration level achievable is strictly entangled with signal and power supply integrity issues, which aggressively challenge the design of interconnect systems. In particular, the Vdd and Gnd signals are exposed to deviations from their nominal values because of the combined effect of two factors: the increasing currents to be delivered to the huge number of active devices, and the parasitics of both on-chip and package-to-die wires, which become less negligible as technology scales down and frequencies rise.
The phenomena underlying Power Supply Noise (PSN) are voltage drop (IR drop) and switching noise (L dI/dt) [1]. The former is due to the large current needed by power-hungry blocks and to the wire resistance. This tends not to decrease proportionally in scaled technologies. The tradeoff between interconnect density requirements, which push toward smaller wires, and the material and geometrical countermeasures proposed by process engineers is the key to limiting the impact of IR drop in future technology nodes. The latter is related to the increasing current transients allowed by scaled transistors and required by the frequency constraints of performance-driven applications. In addition, the inductive behavior of interconnects becomes less and less negligible as frequencies increase, thus worsening the harmful effect of switching noise. Both IR drop and L dI/dt are related to the on-chip power supply network as well as to the package-to-die power delivery system. As reviewed in [2], the package-level parasitic inductance has traditionally dominated the total inductance of the power distribution network. Conversely, the on-chip wire resistance has been recognized in past years as having the most aggressive impact on the total PSN drop. This classification no longer seems suitable for modern high-performance SoCs, as the forecast rate of increase of the transient current is more than double that of the average current [3]. Furthermore, the use of flip-chip package technology in place of wirebonding balances the ratio between the on-chip and package inductance impact. These aspects imply that switching noise will in the future be less negligible with respect to IR drop [4], and that on-chip and package parasitics have influences on PSN which are difficult to disentangle [5], [6], [7].
An evolved PSN classification has recently been introduced. It differentiates the Static Voltage Drop (SVD), that is $I_{avg}R$, associated with the average gate/block current, from the Dynamic Voltage Drop (DVD), that is $I(t)R + L\,dI/dt$, due to the transient gate/block current. The latter includes not only the resistive drop, but also switching noise and thus the on-chip and package inductive impact. DVD evaluation is considered at the time of writing the most trustworthy indication of PSN, as it accurately takes into account transient currents. In fact, it has been shown in [8] that using an average IR drop for all the gates in a circuit, by means of corner analyses (worst/best case power supply voltage) or of the derating factor methodology (gate delay varying linearly with the average power supply voltage variation), leads to completely inaccurate results in terms of circuit timing analysis. For example, in a medium-performance industrial design (340K gates) synthesised using a 0.13µm technology, the critical path analysis performed with the derating factor method led to an underestimation of approximately 50% of the noise effect on timing, compared to an accurate transient SPICE simulation. The importance of taking into account transient currents for both IR drop and switching noise when evaluating PSN effects is thus beyond doubt.
Another important point concerns the design flow stage in which PSN is taken into account. Traditional design methodologies use predefined and overdesigned power grids as much as possible. This is risky in current and future technology nodes, and not practicable in crowded high-performance modern designs, in which interconnect resources must be carefully assigned. In more accurate design methodologies, physical designers create the power network, verify the power supply voltage variations with back-end tools, and adjust the power grid sizes and/or the block placement. In high-performance design this is not a one-shot phase; on the contrary, it is repeated many times until the constraints are met, leading to intolerable time-to-market delays.
Even if this accurate back-end analysis phase cannot be avoided, a prediction of the power grid design criteria would help the designer close the loop in a shorter time. This is even more important when not only the impact of the on-chip power network on supply noise is considered, but the influence of wirebonding or flip-chip is taken into account as well. As a matter of fact, the choice of the connection points between package and on-chip grid, their parasitics and their layout jointly affect PSN. Even if an on-chip power grid has been accurately designed, its DVD performance may be completely compromised once package parasitics are included in the analysis. At the time of writing, the possibility of knowing the DVD amount and its connection with the power grid parameters is confined to back-end analysis. This means that a post-physical-design netlist must be extracted (for both interconnects and logic gates) and a time-consuming vector-based SPICE-like simulation must be executed. While this is sometimes feasible as a final verification step, it is not suitable for a trial-and-error design method. Furthermore, the package impact is often neglected at this design stage. The aim of this work is thus to assess noise statistics and their dependency on package and on-chip design parameters. Furthermore, a methodology is proposed for estimating, in an early design phase, the potential DVD and its relation to the design variables. The solution space can thus be explored before the physical design step is taken. In this way, when designing the on-chip power grid, near-optimal solution criteria can be adopted, and rapid design closure can be achieved. The prediction regards dynamic voltage drop and covers technology node, geometry, topology, and on-chip and package power supply design alternatives.
The rest of the paper is organized as follows: In section 2 previous works on the subject are reviewed and in section 3 the proposed methodology is described. In section 4 the structure used for the PSN evaluation is analyzed and in section 5 the achieved results are discussed. Conclusions are drawn in section 6.
Previous works
The most important PSN aspects to be analyzed are: the electrical behaviour of the power supply network, the current envelopes flowing through the metal lines, the power grid topology and its connection to the package, and the package type and parasitics. In most of the works addressing PSN analysis, the power supply grid parasitics are first extracted and the corresponding network is then, in most cases, simplified to reduce the computational resources needed for the electrical simulation. The challenging tasks concern precisely both the accuracy of parasitics extraction and power grid modeling. The goal is to achieve a good trade-off between precision, and thus reliability of the results, and the possibility of reasonably managing the complexity of the extracted grid. Past works have pursued this goal with different methodologies. They differ in the way the grid parasitics are extracted and in whether the extraction considers or neglects inductance (RLC vs. RC parasitic networks). For example, in [9] macromodels for grid subsets are created, while in [10] the grid is reduced to a coarser structure mapped back to the original grid. In [11] and [12] the grid is modeled using transmission line theory and, in particular, in [4] lossy transmission lines are used for modeling power grid blocks with frequency-dependent properties. In [13] a finite difference time domain method is used, based on the solution of Maxwell's equations in the time domain. The circuit switching activity has a strong impact on the amount and distribution of IR drop and switching noise. Depending on the required accuracy and on the use for which this information is derived, different analysis approaches can be applied. The simplest way to assess the current value in a macroblock is to sum up the worst-case currents of all the gates in the block, under the assumption that their activity timing windows are superposed.
The more the considered block is made of dynamic logic, and the fewer combinational paths are present, the more realistic this assumption is. In any case, it has been shown that this method leads to an extremely pessimistic evaluation of the wire width necessary to overcome electromigration and IR drop, especially for big blocks or circuits. Two problematic assumptions lie at the base of this approach: the coincidence of the switching windows for all the gates, and the equal value and "direction" of all the switching currents. Such a method was abandoned when routing congestion became a concern, and when simple directives to routing tools were no longer sufficient to solve the problem. As an alternative, in [14] the use of transistor-level or gate-level simulations is proposed to find the current waveform drawn by the circuit block, applying a user-defined or a random input vector. In some other works genetic algorithms are used to obtain an input set that produces a worst-case voltage drop at all the power bus nodes. Most of the time these approaches are resource consuming; furthermore, it may not be true that the maximum total instantaneous current produces the maximum voltage drop at all power bus nodes. Another approach, frequently found in the literature, is instead input independent; it differs from the previous one because it performs a static timing analysis to find the minimum and the maximum time during which a gate may switch. Results can still be pessimistic because gates are assumed to switch in the same direction. If detailed information is needed, the only reliable approach is to take into account both polarity and timing information for the current spikes.
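The pessimism of the worst-case sum can be illustrated with a minimal sketch. It is not taken from any of the cited works: the gate count, pulse width, peak value and the uniform distribution of the start times are illustrative assumptions, and the function names are hypothetical. The sketch compares the sum of all gate peaks (coincident windows, same direction) with the peak of an envelope in which each gate switches at its own time and with its own polarity.

```python
import random

def triangle(t, t0, width, peak):
    """Triangular current pulse starting at t0, with base `width` and apex `peak`."""
    half = width / 2.0
    if t <= t0 or t >= t0 + width:
        return 0.0
    if t <= t0 + half:
        return peak * (t - t0) / half
    return peak * (t0 + width - t) / half

def block_current_bounds(n_gates=50, t_ck=3.0, width=0.4, i_peak=1.0, seed=0):
    """Return (worst-case sum, peak of the timing- and polarity-aware envelope).

    Times are in ns and currents in mA; all default values are assumptions.
    """
    rng = random.Random(seed)
    # Hypothetical gates: random start time within the clock cycle, random polarity.
    gates = [(rng.uniform(0.0, t_ck - width), rng.choice((-1.0, 1.0)) * i_peak)
             for _ in range(n_gates)]
    # Worst-case method: every gate peaks at the same instant, in the same direction.
    worst_case_sum = n_gates * i_peak
    # Timing-aware method: sample the summed waveform over one clock cycle.
    steps = 600
    times = [t_ck * k / steps for k in range(steps + 1)]
    envelope_peak = max(abs(sum(triangle(t, t0, width, p) for t0, p in gates))
                        for t in times)
    return worst_case_sum, envelope_peak

worst, envelope = block_current_bounds()
print(f"worst-case sum: {worst:.1f} mA, envelope peak: {envelope:.2f} mA")
```

Under these assumptions the envelope peak can never exceed the worst-case sum, and it is much smaller whenever the pulses are spread in time or partly cancel in polarity.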
Within this methodology different solutions have been proposed: in [15] temporal and spatial correlations and directions of switching currents are accounted for using a constrained graph approach; in [16] the dependency of the block current waveform on its input vector is captured using frequency domain current macro-models, while in [17] current signatures are generated using complex compression methods. The other point strictly related to DVD is the influence of package parasitics. Switching noise became a concern when the parasitic inductance of the bonding wires connecting the core pads to the package reached critical values. The circuit pads are normally directly connected to signal buffers driving large currents, which induce dangerous glitches on the on-chip power distribution wires. The flip-chip (C4) full grid array technology has in many cases replaced wirebonding in high-performance ICs. The inductance assumes smaller values in this case. Furthermore, the power distribution technique has changed from the traditional interdigitated style to a hierarchical full-area power supply pin connection. This allows a better distribution of the supply voltage to all the core points. Given this scenario, it is necessary, when modeling the power grid, to take into account the connection of the power supply voltage to the package, and the equivalent model of the flip-chip or bonding wire interconnects. Many works in the literature have characterized the differences between the flip-chip and the wirebonding technology. In particular, in [18] the electrical performance of the two is analyzed using a 3D electromagnetic simulator: results show that C4 bumps perform better than bonding wires. However, the mutual impact of package and on-chip parasitics on PSN is not frequently taken into account. In [19] the effective resistance between circuit blocks and power supply bumps is taken into account during the floorplanning design stage.
In [20] the package is analyzed in connection with the on-chip power supply distribution, while the impact of decoupling capacitors on package and on-chip DVD is discussed in [21]. However, in the last two works the focus is on package parasitics, while both the on-chip grid and the current sources of the functional blocks are modeled in a simplified way. This is too simple for assessing not only the impact of the on-chip grid design parameters on both global and local PSN, but also the mutual package-die influence on PSN.
Even if apparently unrelated to the previous discussion, the electromigration (EM) problem should also be taken into account when dealing with power grid design, as it is strictly tied to DVD for several reasons. First, a wire stressed by EM has a growing resistance, and thus a higher impact on IR drop; second, the EM mechanism involves Joule heating, that is, the net lifetime is a function of the real metal temperature. This depends on the RMS current drawn by the net and on the metal resistance, which is itself dependent on temperature. Lifetime and PSN are therefore related to one another by current, metal characteristics and temperature, and the design directives for solving the two problems may conflict. The methodology proposed in this paper for early PSN assessment does not consider the EM problem, but it can be easily taken into account using, for example, the formulations expressed in [22].
Proposed methodology
The most widespread back-end PSN analysis tools [23], [24] use RC extracted values of the power grid networks (only a few cases include inductance), and compute an average SVD using estimated switching activity factors and clock frequency. The results thus consist only of post-physical-design average values: the information obtained comes late and is neither DVD-aware nor package-aware. A different perspective is at the root of this work: its goal is not the generation of an accurate back-end analysis algorithm; on the contrary, it aims at assessing the DVD behavior and its dependency on the design parameters. We propose to use this methodology for rapidly predicting the expected noise when a set of design conditions is given. These include: technology, power grid parameters (metal layers, wire geometry, grid density, hierarchical topology, parasitics), circuit parameters (number of gates, operating frequencies, current switching activities), package variables (flip-chip or wirebonding parasitics, number and position of connections) and decoupling capacitor insertion (total capacitance amount, radius of effect, on-chip or off-chip decap). Given the abovementioned parameters, the power grid and package structure is generated in a modular way using a library of simple grid and package components and current generators described using the SPICE syntax. The library grid components differ in technology, metal type and geometry. Their layouts are designed and their parasitics are extracted, so that the final library of RLC block parasitics is used for the simulations. The current generators in the library are statistically described so that the switching activities of complex circuit blocks are meaningfully modeled as transient events. Furthermore, the statistical variables chosen reflect the parameters of the circuit whose PSN is to be predicted. The grid based on these library components is generated according to the chosen topology and hierarchy.
The package and decaps are included in the structure as well, again using ad-hoc library elements. These models are simple, however, as they are included in the work with the aim of analysing the package impact on the on-chip DVD and not the detailed flip-chip or wirebonding behavior itself. Once the configuration to be analyzed is chosen and the grid is hierarchically described, an electrical simulation is performed using the Monte-Carlo analysis feature embedded in the SPICE engine, so that the statistical parameters of the current generators are varied and the final DVD data generated are statistically meaningful.
In this paper we consider a few grid configurations and report DVD values and their dependency on the chosen parameters. To the best of our knowledge, there are no previous works focusing on an a priori assessment of power grid dynamic voltage drop noise that takes into account both on-chip and package parameters, as well as realistic transient current switching activities. The results give the designer approximate but reliable indications of the expected circuit DVD for the chosen configuration, and thus directives for the subsequent power grid physical design step.
DVD analysis structure
The purpose of this work is thus to build, in a simple way, a framework in which various parameters are used for noise estimation. The relations between the noise figures found and the variable set help in deriving guidelines to predict and reduce power grid noise in early design phases. For this reason, the trade-off between accuracy and flexibility suggested the creation of libraries composed of simple but highly configurable structures for interconnects, current sources and package, which are briefly described in the following. It is important to underline that the variables and configurations considered in this context do not exhaust all the real possibilities, as the aim is to show the feasibility of the methodology and to assess the DVD noise amount in typical cases. The framework organization is nevertheless flexible, and further variables and/or structures can easily be added to enlarge the library. For example, one of the variables is technology: we consider here 0.25µm (hereinafter $T_1$), 0.18µm ($T_2$) and 0.13µm ($T_3$), as metal data and cell library information were fully available for them at the time of writing. Once these data (real foundry data, not only ITRS estimates) are available for other up-to-date and future technology nodes, the same analyses reported here can be performed using the same methodology.
Power grid library
In high-performance designs the power grid is often organized as sketched in figure 1, where, for example, a top layer (M6) is used for distributing the power supply voltage to macro-blocks. Each of them has a denser power grid wired on a lower layer (for example M4), which distributes the supply to smaller blocks. Often the metal layer used is different for the horizontal and vertical directions, and Vdd and Gnd are routed on the same layer and interleaved. We thus consider the L-shaped layout in figure 2 as the basic structure for the power supply mesh. The metal used can be the same for both directions or can be different: the variables $i$ and $j$ identify the metal layer used for the vertical and horizontal wires respectively, while the $M_i - M_j$ via identifies the simple or stacked via between the two layers. When building the library we consider all the plausible $(i, j)$ combinations of layers in this structure (metal layers from 1 to 6 are allowed by the technologies considered in this work). As for the geometry, in the restricted library considered for this work we used two possible widths ($w$ in the figure): one is the minimum allowed for power supply stripes by the technology taken into account ($w_{min}$) and the other is five times larger than the minimum ($5 \times w_{min}$). The $w_{min}$ value may not be realistic in real power networks; it is used here for comparing different technologies at a reference point, as the minimum width allowed for power supply wires is the only fixed width for each technology. The wire length is another variable, which in this work assumes two values: $l_1$ = 10µm and $l_2$ = 100µm (the same note as for $w$ is valid here). The length granularity is based on the typical sizes of an "average gate" in a library of the technology considered as a starting point, that is $T_1$ = 0.25µm.
We assume that an "average gate" $G_T$ in the library for technology $T$ has each parameter $P$ (area, width, current peak and width, etc.) averaged over the real gate parameters in a library of $K$ gates:

$$P_{G_T} = \frac{1}{K} \sum_{k=1}^{K} P_{G_k}$$

where $G_k$ is a library gate; thus $P_{G_T}$ is an average parameter $P$ of the average gate $G_T$. In the case of this L-shaped structure, we assume that the area enclosed within the L-shaped square in the minimum length case $l_1$ corresponds to the area of two gates of average size for technology $T_1$ = 0.25µm, that is $A_{l_1 \times l_1} = 2\,A_{G_{T_1}}$. The other cases in terms of length and of technology are a consequence of this assumption: given $l_l$, where $l = 1, 2$ in our case, the number of gates enclosed for the other technologies in the corresponding $l_l \times l_l$ area is computed considering the "average gate" area:

$$N^{T}_{l} = \frac{l_l \times l_l}{A_{G_T}}$$

where $N^{T_1}_{1} = 2$. These choices are clearly neither general nor extremely accurate, but they make it easy to build the basic blocks to be used for creating a complex grid in a modular way. The layout of these L-shaped structures is generated using Cadence [25]; the corresponding GDSII files are used for extracting the parasitics using SPACE [26] for resistance and capacitance, and FASTHENRY [27] for inductance. The equivalent circuit is thus a T-RLC structure in each branch of the L-shaped block, sketched in figure 3 (the parameters of the current generators are described in the following section). In table 1 some of the values extracted in the $l_1$ = 10µm and $l_2$ = 100µm cases are reported for the three technologies. For $l_2$ only the M4/M6 and M5/M6 cases are reported, as they are the most typical for this length, which in fact represents a higher hierarchical block. The $M_i - M_j$ via resistance values are not reported for the sake of brevity. The equivalent circuit of the L-shaped structure is used to build, in a regular and modular way, a mesh of the desired shape and size, so that we can emulate a mesh powering a circuit with a high number of gates. An example is reported in figure 4, in which a certain number of L-shaped structures of length $l_1$ (or $l_2$) is used for creating a mesh so that a global area is covered. The equivalent circuit of this mesh is created automatically using the hierarchical constructs allowed by the SPICE language. This is clearly an approximation, as the "inter-segment" capacitances and the mutual inductances are neglected. As said before, however, this is not a back-end analysis methodology. Moreover, two further points should be underlined. [...] sample mesh, whose results are not reported here for the sake of brevity, compared to the approximated one, the inter-segment capacitances turned out to be more than 10 times smaller than the ground capacitance $C$ of the single wire segment.

Table 1. Parasitic parameters extracted for the basic $l_1$ and $l_2$ structures.
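The modular construction of the T-RLC blocks can be sketched as a small netlist generator. This is only an illustration under stated assumptions, not the library used in this work: the subcircuit and node names are hypothetical, the ground return is collapsed onto node 0, and the numerical values are placeholders, since the extracted values of table 1 are not reproduced here. Each wire branch becomes a T network (half of R and L on each side, C at the centre node where a current generator can later be attached), and two branches joined by a via form one L-shaped block.

```python
def t_rlc_branch(name, r_ohm, l_henry, c_farad):
    """SPICE subcircuit for one wire branch as a T network.

    Half of R and L sits on each side of the centre node `mid`;
    C connects `mid` to node 0 (simplifying assumption for the return path).
    """
    return "\n".join([
        f".subckt {name} a b mid",
        f"R1 a n1 {r_ohm / 2:g}",
        f"L1 n1 mid {l_henry / 2:g}",
        f"C1 mid 0 {c_farad:g}",
        f"L2 mid n2 {l_henry / 2:g}",
        f"R2 n2 b {r_ohm / 2:g}",
        ".ends",
    ])

def l_shaped_block(name, vertical, horizontal, r_via_ohm):
    """L-shaped block: a vertical and a horizontal branch joined by a via."""
    return "\n".join([
        f".subckt {name} top right tap_v tap_h",
        f"Xv top cv tap_v {vertical}",      # vertical wire on metal i
        f"Rvia cv ch {r_via_ohm:g}",        # Mi-Mj via at the corner
        f"Xh ch right tap_h {horizontal}",  # horizontal wire on metal j
        ".ends",
    ])

# Placeholder parasitics (hypothetical, NOT the values of table 1).
print(t_rlc_branch("wire_m1_l1", 0.1, 1e-11, 2e-15))
print(l_shaped_block("lblock_m1_m1_l1", "wire_m1_l1", "wire_m1_l1", 2.0))
```

Larger meshes can then be described by instantiating the L-shaped subcircuit repeatedly inside higher-level subcircuits, which mirrors the hierarchical description mentioned above.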
Blocks current sources
As for the block current, we use a current generator connected to each intersection point of the T in the T-RLC model (see figure 3). This does not mean that the gate is really connected to that point of the wire, as it will probably be joined to an M1 stripe at many contact points. The current source represents the current delivered to an ensemble of gates connected to one another and placed in some way within the considered area, and contacted to the [...]
It is important to note that, within a clock cycle, each gate switching event contributes to the global current envelope at a time that depends on the propagation delay through a path, and with positive or negative polarity. Moreover, it should be underlined that the current can be different from zero even if the gate output is not switching. Exhaustive details on this point are not shown here for the sake of brevity, but a simple example clarifies it: suppose that the inputs a and b of an AND gate are initially 0 and 1 respectively, and that at instant $t_0$ they go to 1 and 0 respectively. The output should not change, but an internal current transient will occur, which is not negligible with respect to the "active" current case, as the on-off and off-on transistor switching is not instantaneous. Correctly modeling the current generators is therefore important to avoid the typical errors that lead to an extreme overestimation when the delays among switching events are not considered or when only the worst case is taken into account.
In this work, thus, the current pulse is shaped as a triangle, whose parameters (initial time, peak time, peak value, final time) are described such that they can be varied within fixed ranges using a normal statistical distribution. In particular, both the peak value and its activation delay are associated with a normal distribution, while the other parameters are dependent values (for the sake of simplicity). The parameters are chosen such that the current envelope can shift within a clock cycle and can have positive or negative polarity. When the circuit is simulated, the statistical engine embedded in SPICE, that is the Monte-Carlo analysis feature, explores the probability space with many iterations (it has been shown in past works that 30 iterations are enough to achieve statistically meaningful results). An example of the statistical variation of the current for one of the generators is shown in figure 6. The peak and timing values chosen for the current generators are related to the number of "average gates" comprised in the L-shaped structure for each $l_l \times l_l$ area, and are consistent with the previously defined number $N^T_l$. In the case of the simple block of length $l_1$ and for $T_1$, $I_{source}$ models the activity of one gate only; thus, the ranges for the current peak are averaged among the most used gates chosen in the 0.25µm cell library. For modeling the current shapes of a generic block of gates, that is, an $l_l \times l_l$ mesh structure, the number of gates enclosed in the corresponding area is computed using the previously defined $N^T_l$. We then performed a characterisation of the possible current activities of the library gates and of the typical delays and clock cycles, using some of the ISCAS85 benchmarks [28] having a similar number of gates ($N^T_l$). The current and timing values are used for defining the ranges of the normal distribution parameters. We aim at decoupling our analysis from a given circuit connectivity: for this reason we collapse the characterisation into the parameters of a normal distribution. For example, in figure 7 we report a normal PDF obtained from the characterization of the current peak parameter for an "average gate" in 0.13µm technology ($Peak_{G_{T_3}}$). For the $l_1$ case, the values associated with the 3σ limit of the normal distribution used to vary the current peaks are 1mA for $T_1$, 0.8mA for $T_2$, and 0.48mA for $T_3$, while in all three cases the mean value is around 0. This means that the majority of the current values (those within ±1σ) are chosen, e.g. for $T_1$, between -0.33mA and +0.33mA. The timing parameters for the current envelopes are described in a more complicated way and are not reported here for brevity. They are in any case a function of the clock cycle and of the rising and falling times; the values used are $T_{ck}$ = 3ns and $t_r$ = 0.2ns for $T_1$, $T_{ck}$ = 1.2ns and $t_r$ = 65ps for $T_2$, and $T_{ck}$ = 0.5ns and $t_r$ = 30ps for $T_3$. In summary, each of the generators in the circuit is associated with its own statistical parameters (one for the peak and one for the activation delay), so that a Monte-Carlo simulation of the circuit can be performed. In this way we emulate the mesh as connected to a combinational circuit in which the gates switch in either direction and at any time within a clock cycle, with meaningful current peak and timing values. The analysis is then independent of the circuit logical connectivity.
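The statistical description of one current generator can be sketched as follows. This is a minimal illustration, not the SPICE Monte-Carlo setup used in this work: the peak follows the zero-mean normal distribution whose 3σ limit is given above, while the delay distribution (centred at mid-cycle, with 3σ equal to half a cycle, clipped to the cycle) and the pulse width of two rise times are assumptions made here, since the actual timing description is not reported. Function and key names are hypothetical.

```python
import random

def sample_pulse(rng, peak_3sigma_ma, t_ck_ns, t_r_ns):
    """Draw one triangular current pulse (times in ns, current in mA)."""
    # Peak: zero-mean normal; the sign of the sample gives the polarity.
    i_peak = rng.gauss(0.0, peak_3sigma_ma / 3.0)
    # Assumption: rise and fall each last t_r, so the base is 2 * t_r wide.
    width = 2.0 * t_r_ns
    # Assumption: activation delay is normal around mid-cycle, then clipped
    # so that the whole pulse stays inside the clock cycle.
    delay = rng.gauss(t_ck_ns / 2.0, t_ck_ns / 6.0)
    delay = min(max(delay, 0.0), t_ck_ns - width)
    return {"t_start": delay, "t_peak": delay + t_r_ns,
            "t_end": delay + width, "i_peak": i_peak}

def monte_carlo(n_iter=30, seed=1, **params):
    """Repeat the draw n_iter times (30 iterations, as in the text)."""
    rng = random.Random(seed)
    return [sample_pulse(rng, **params) for _ in range(n_iter)]

# T_1 values from the text: 3-sigma peak 1 mA, T_ck = 3 ns, t_r = 0.2 ns.
runs = monte_carlo(peak_3sigma_ma=1.0, t_ck_ns=3.0, t_r_ns=0.2)
for run in runs[:3]:
    print(run)
```

With a 3σ limit of 1mA the standard deviation is about 0.33mA, so roughly two thirds of the sampled peaks fall between -0.33mA and +0.33mA, in line with the statement above for $T_1$.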
Package library
The package elements are composed, as suggested in the literature, of a resistance and an inductance connected in series. Their values depend on the type of package, e.g. wirebonding or flip-chip, and in this work are chosen among the measured values reported in [18] and [20]. For the wirebonding case (WB) the suggested resistance is around 140mΩ, while the inductance is 3.5nH. In the flip-chip (C4) case the chosen resistance is 100mΩ and the inductance is 1.5nH.
Their connections to the on-chip mesh vary depending on the modular circuit created, as explained in the following sections.
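To give a feeling for the relative weight of the two terms in a series R-L package element, the following sketch evaluates the resistive and inductive contributions for the WB and C4 values quoted above. The current through the connection (10mA) and the linear ramp over 0.2ns are purely illustrative assumptions, not data from this work, and the function name is hypothetical.

```python
# (R in ohm, L in henry) for the two package models, as quoted in the text.
PACKAGES = {"WB": (0.140, 3.5e-9), "C4": (0.100, 1.5e-9)}

def package_drop(kind, i_peak_a, t_rise_s):
    """Return (resistive, inductive) voltage contributions in volts.

    Assumes the current ramps linearly from 0 to i_peak_a in t_rise_s,
    so that dI/dt = i_peak_a / t_rise_s during the ramp.
    """
    r_ohm, l_henry = PACKAGES[kind]
    resistive = r_ohm * i_peak_a
    inductive = l_henry * i_peak_a / t_rise_s
    return resistive, inductive

# Hypothetical stimulus: 10 mA through one connection, ramping in 0.2 ns.
for kind in ("WB", "C4"):
    ir, ldidt = package_drop(kind, 10e-3, 0.2e-9)
    print(f"{kind}: IR = {ir * 1e3:.2f} mV, L dI/dt = {ldidt * 1e3:.1f} mV")
```

Under these assumed numbers the inductive term is far larger than the resistive one for both package types, and the lower C4 inductance reduces it accordingly. In the full structure the on-chip grid capacitance and the decoupling capacitors also shape the transient, which this isolated calculation does not capture.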
Results
The simulations have been performed in a few phases. Firstly, the attention has been focused on geometrical and technological parameters (section 5.1), while in the following steps the analysis has been directed toward realistic cases of power meshes (section 5.2), of package (section 5.3) and decoupling capacitors choices (section 5.4).
Impact of geometry and technology
We compare here mesh topologies powering an increasing number of gates and having different shapes as well. The structures are sketched in figure 8. In a first simple case (A) the L-shaped component is connected horizontally five times; this means that the shape is $l_l \times 5l_l$. The width is also varied and can assume in the whole structure the value $w_{min}$ or, alternatively, $5 \times w_{min}$. The number of gates powered is then 10 for the $l_1$ and 0.25µm technology case, while in the other length and technology cases 10 is the number of gate blocks (GB), each containing a number of gates that is a function of $N^T_l$. Hereinafter we will refer to GB in all cases for the sake of simplicity. Structure B has a square shape, as five A blocks are connected vertically. Its size is then $5l_l \times 5l_l$ and thus the GB number is 50. This structure is used for hierarchically creating structure C, using a horizontal connection, leading to a mesh powering 250 GB and measuring $5l_l \times 25l_l$. Finally, a bigger square mesh, D, based on B, spans $25l_l \times 25l_l$ and powers 1250 GB. The nominal power supply in these simple experiments is connected only to the left side of the mesh (assuming an interdigitated distribution from higher layers) routed in metal 2, while, in this first simulation set, M1 is used for both metals $M_i$ and $M_j$.

Fig. 9. Example of noise waveforms measured at the worst-case node (far from the nominal power supply) in the C structure at the Vdd node for $T_3$. Thirty waveforms are superposed after the Monte-Carlo iterations in the left graph, while one of them is extracted in the right graph.
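The sizes and gate-block counts of the four structures follow directly from how many L-shaped cells are tiled in each direction. The short sketch below reproduces that arithmetic; the dictionary layout and function names are hypothetical, and the figure of two gate blocks per L-shaped cell is inferred from the 10 GB quoted for the five cells of structure A.

```python
# Gate blocks per L-shaped cell, inferred from 10 GB in the 5-cell structure A.
GB_PER_CELL = 2

# (rows, columns) of L-shaped cells in each structure of figure 8.
STRUCTURES = {"A": (1, 5), "B": (5, 5), "C": (5, 25), "D": (25, 25)}

def gate_blocks(name):
    """Number of gate blocks (GB) powered by the given structure."""
    rows, cols = STRUCTURES[name]
    return GB_PER_CELL * rows * cols

def size_um(name, l_um):
    """(height, width) of the structure in micrometres for cell length l_um."""
    rows, cols = STRUCTURES[name]
    return rows * l_um, cols * l_um

for name in "ABCD":
    print(name, gate_blocks(name), "GB,", size_um(name, 10), "um for l_1")
```

For the $l_2$ = 100µm cells the same counts apply to gate blocks, with each block standing for the larger number of gates given by $N^T_l$.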
In figure 9 we report an example of noise waveforms at node Vdd for the C structure (T 3 , l 1 , 5 × w min ) due to the statistical variation of the current generator parameters performed by the Monte-Carlo engine. The voltage is measured at the grid node where it was found being the worst one, that is at the opposite side of the nominal power supply connection points. It is important to note (both from the left and the right graph) that this is a dynamic noise, that is both under and over-voltages are reached with a similar probability. This happens because the current values change polarity with the same probability in the two directions and because of the inductive presence in the electrical model. In spite of the fact that normally only the under-voltage is considered in PSN analyses, the over-voltage should not be neglected, as it implies a delay variation of the powered gates which not necessarily is less critical than the one caused by the under-voltage [8] .
In the following results, only the maximum and minimum Vdd values measured over all 30 iterations are reported, instead of all the values, for the sake of readability. Figure 10 refers to T_1, figure 11 to T_2 and figure 12 to T_3. The nominal power supplies are 2.5V for T_1, 1.8V for T_2 and 1.2V for T_3. In each of these three figures, two graphs compare the results for the four structures A, B, C and D based on the l_1 (bottom) and l_2 (top) lengths. Moreover, for each value we compare the results measured for meshes whose wire width is the minimum one allowed by the technology for power supply delivery (w_min) and for meshes where it is 5 times the minimum. As expected, when the number of gates increases but the shape is maintained (as from 10 to 250 GB, or from 50 to 1250 GB), the worst-case noise is higher; for example, in the l_2 case for T_1, an increase from 10 to 250 GB (rectangular shape, 1:25 increase) leads to an over/under-voltage 3.5 times higher. The same noise enhancement is observed when the powered gate blocks go from 50 to 1250 (square shape, 1:25 increase). This trend is similar in the other two technologies as well (figures 11 and 12).
Figure caption (fragment): [...] figure 8) and between two wire widths. Error bars display the variability of the measured data.

The increment is thus similar for the rectangular and the square shapes, but the absolute values differ between the two. In fact, it is important to note that, going from 10 to 50 GB, the noise peak does not increase but even decreases (1.96 times smaller for T_1). The same can be noted when comparing 250 with 1250 GB (1.92 times smaller under-voltage). This is clearly due to the shape of the structures and to the fact that in this simulation set the nominal power supply is distributed from one side only. The square case, even if bigger in terms of number of gate blocks, has a lower drop because the maximum distance from the nominal power supply does not increase. Furthermore, the overall capacitance of the metal wires tends to equalize the over-voltage. This is an important result from the designer's point of view, as these noise figures give not only general directives on the most convenient shape to adopt when a cluster of gates must be placed and its power mesh designed, but also quantitative values for the DVD expected as a consequence of this design choice. This trend is similar in the other two technologies as well. For an easier comparison among the three technologies, the percentage variations with respect to the nominal power supply are reported in table 2 for the l_2, w_min, M1 case with 250 and 1250 GB. Results show that scaling down has a negative effect on noise, which is also confirmed by the results that follow.
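The geometric part of this argument can be checked with a few lines of Python. The sketch below is illustrative only: it describes each structure as a number of rows and columns of L-shaped cells, assumes two gate blocks per cell (inferred from structure A, where five cells power 10 GB), and takes the number of columns as the maximum distance from the left-side supply, in units of the cell length. The names used are hypothetical.

```python
# Structures as (rows, columns) of L-shaped cells, following the text:
# A is one row of five cells, B stacks five A, C joins five B side by side,
# and D is the 25 x 25 square.
STRUCTURES = {"A": (1, 5), "B": (5, 5), "C": (5, 25), "D": (25, 25)}

# Assumption: each L-shaped cell powers 2 gate blocks (5 cells -> 10 GB in A).
GB_PER_CELL = 2

def gate_blocks(rows, cols):
    """Number of gate blocks powered by a rows x cols mesh."""
    return rows * cols * GB_PER_CELL

def max_distance_from_left_supply(rows, cols):
    """Largest horizontal distance from the left edge, in cell lengths."""
    return cols

if __name__ == "__main__":
    for name, (rows, cols) in STRUCTURES.items():
        print(name, gate_blocks(rows, cols), max_distance_from_left_supply(rows, cols))
```

Going from A to B, or from C to D, multiplies the gate block count by five while leaving the distance from the supply unchanged, whereas going from A to C or from B to D multiplies both the count and the distance.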
Table 2. Comparison among the three technologies: percentage variations with respect to the nominal power supply in the l_2, w_min, M1 case with 250 and 1250 GB.

Further simulations have been performed considering the other metal layers. For the sake of brevity we report here only the results obtained for the 1250 GB mesh (l_1). A synoptic view for the three technologies is given in figure 13. The first metal layer is the worst one in all the technologies, although this is more pronounced in the 0.25µm one. As underlined before, technology scaling does not necessarily imply a noise reduction: especially in the realistic case of the wider wire, for metal layers other than the first, noise increases in the more scaled technologies. In this case the cause is the nonlinear reduction of resistivity with technology scaling. From the data reported in table 1 it should be noted that, even if the material used for the metallization is the same in the three technologies, namely a Cu-Al alloy, the impact of the high-resistance barrier in damascene metallization processes results in a higher effective resistance in more scaled processes. It is interesting to point out that, while in the 0.25µm technology both M5 and M6 can be used to strongly reduce PSN, in the 0.13µm technology the difference introduced by the use of different metal layers is less evident, with the exception of M6. This kind of information can be used when planning the hierarchy of the power grid distribution in an early design phase.
Impact of topology, activity and hierarchy
In this set of simulations the minimum block size used is the 1250 GB one. Results have been obtained for the three technologies, but only those for 0.13µm, being the most interesting, are reported in the following. The first analysis, reported in figure 14, compares the effect of different nominal power supply delivery schemes. The left-side distribution used in the first simulation step (section 5.1) is compared with a similar case in which two opposite sides are powered with nominal Vdd and, furthermore, with the optimal case of nominal distribution all around the block perimeter. As expected, the worst-case noise is reduced: by 67% going from one side to two sides and by 83% going from one to four sides. Also as expected, the point suffering the worst noise peak (reported in the figure) shifts from the side to the center of the block. The possibility of evaluating information of this kind dynamically and rapidly can greatly help the designer when planning how to distribute the supply voltage through the grid powering macros of different sizes and numbers of gates. The designer can thus choose the best solution by trading off noise constraints, grid design complexity and wire resource allocation.
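The shift of the worst-case point from the far side to the center can be illustrated with a deliberately simplified static model. The Python sketch below is not the dynamic analysis performed in this work. It assumes a one-dimensional wire with equal segment resistances R and equal DC currents I drawn at every node, and it ignores inductance and capacitance, so its numbers are not comparable with the 67% and 83% reported above. The function names are hypothetical.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal system; sub[0] and sup[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / m
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def ladder_drops(n_nodes, both_sides, ir_unit=1.0):
    """Static drop (Vdd - v) at each node of a uniform wire, in units of I*R.

    The left end is always tied to Vdd; the right end is tied to Vdd only
    when both_sides is True. Kirchhoff's current law at node k gives
    -d[k-1] + 2*d[k] - d[k+1] = I*R.
    """
    sub = [-1.0] * n_nodes
    sup = [-1.0] * n_nodes
    diag = [2.0] * n_nodes
    if not both_sides:
        diag[-1] = 1.0  # the last node has no right-hand neighbour
    rhs = [ir_unit] * n_nodes
    return solve_tridiagonal(sub, diag, sup, rhs)

def worst_node(drops):
    """Return (1-based node index, drop) of the node with the largest drop."""
    k = max(range(len(drops)), key=lambda i: drops[i])
    return k + 1, drops[k]

if __name__ == "__main__":
    print("one side :", worst_node(ladder_drops(25, both_sides=False)))
    print("two sides:", worst_node(ladder_drops(25, both_sides=True)))
```

With 25 nodes the worst drop of this toy model moves from the last node to the middle node when the second side is connected, and falls from 325 to 84.5 in units of I·R.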
Another analysis concerns the effective switching activity of gates. This simulation has been carried out by deactivating some of the generators (in percentages from 100% to 20%) uniformly over the circuit area. We report in table 3 the voltage variations as a percentage of the nominal supply voltage. The structure used is the 1250 GB one with the nominal power supply distributed around the block perimeter. As a general comment, reducing the activity of gates, for example through clock gating, gives perceptible results, but only a strong reduction in generator activity leads to important improvements in noise. This could have been expected. In fact, the current causing the over-voltages is the global current shape on each grid branch; that is, the inactivity of some generators may have only a small impact on the global current envelope. Moreover, in this work each generator is modeled with a statistical variation within extremes that include small current values as well, so that a realistic gate behaviour is already taken into account. This is especially important because in previous works one of the main problems was the reliable modeling of gate activity: basing it on the worst case only leads to a strong overestimation of power grid noise [15].
We then performed a further simulation step on blocks that emulate bigger square circuits, closer to up-to-date high-performance ICs. Using the D structure previously defined, we simulated three further square structures organized respectively as E = 5D × 5D (31k GB), F = 10E × 10E (3.1M GB) and G = 5F × 5F (78M GB). The power supply is distributed hierarchically: the lines of the D blocks are routed in metal M4 (l_1, w = 5 × w_min), while higher-level meshes are routed in M6. The nominal power supply is distributed along the whole circuit perimeter. In figure 15 the worst-case DVD (measured at the circuit center) is reported for the four structures D, E, F and G and for the three technologies. As pointed out before, the 0.13µm case shows the worst noise, as better metal characteristics do not completely compensate for the increased number of gates and the shorter current transients. Furthermore, these results show that increasing circuit complexity is a critical parameter: in the F and G cases, which include most state-of-the-art SoC sizes (156mm² and 38cm² respectively), the dynamic noise increases up to about 15% and 30% respectively. According to the ITRS this trend is expected to be confirmed by 0.90µm, 0.65µm and 0.40µm nodes.
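The gate block counts quoted for the hierarchical structures follow directly from the replication factors. As a minimal check, the Python sketch below multiplies the 1250 GB of structure D by the square of each factor; the function and variable names are hypothetical.

```python
D_GATE_BLOCKS = 1250

# Each level is an n x n array of the previous one: E = 5D x 5D,
# F = 10E x 10E, G = 5F x 5F.
HIERARCHY = [("E", 5), ("F", 10), ("G", 5)]

def hierarchy_counts(base=D_GATE_BLOCKS, levels=HIERARCHY):
    """Return the number of gate blocks of D and of every higher level."""
    counts = {"D": base}
    current = base
    for name, n in levels:
        current *= n * n
        counts[name] = current
    return counts

if __name__ == "__main__":
    for name, count in hierarchy_counts().items():
        print(f"{name}: {count} GB")
```

The result is 31 250 GB for E, 3 125 000 GB for F and 78 125 000 GB for G, in agreement with the rounded values of 31k, 3.1M and 78M given in the text.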
Impact of package
The results reported in the previous steps have been obtained without considering the influence of the package. Two effects must be considered: the number and position of the connection points from die to package, and the package parasitics. If a wirebonding (WB) technology is chosen, the connections are peripheral and the parasitic inductance has a relevant impact. On the contrary, the flip-chip (C4) package allows a less inductive connection and makes it possible to distribute the external power supply connection points over the whole die area.
The number of connections to the external pads is a further variable, even if it is related to the package technology used. In this work, as we are presenting a methodology, only a fixed number of connections is considered (with the exception of one case) for the sake of brevity. However, thanks to the modularity of the approach, a variable number of Vdd and Gnd pads can be introduced in the analysis. To show the impact of these effects on noise, in this simulation set we used structure G in the 0.13µm case. It is interesting to consider the distribution of noise over the chip area, as the package connection style and parasitics influence this aspect as well. Figure 16 thus shows the noise waveforms (for one Monte-Carlo simulation) at five measurement points. Partitioning the G square into 25 identical squares (5 × 5), we measure the voltage at the center of each square and identify these points as 1.1, 1.2, ..., 1.5 for the first row, 2.1, 2.2, ... for the second, and so on. Since the structure is perfectly regular, we report here only the points detailed in the figure, which belong to one of the eight triangles into which square G can be divided. In this case the connections to the nominal power supply are peripheral (WB package), but no package parasitics are included in this first simulation. As expected, voltage variations worsen as the measurement point gets nearer to the die center. Figure 17 then compares the WB and C4 connection styles at two measurement points (corner, that is 1.1, and center, that is 3.3), still without package insertion. In the C4 case the ideal Vdd generators are distributed not only along the periphery but within the circuit as well. It is interesting to note how a uniform distribution of connection points allows a better noise equalization (0.72V at the center for WB and 0.94V for C4).
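The reduction of the 25 measurement points to those lying in one triangle can be written down explicitly. The following Python sketch is an illustration with hypothetical function names: it folds every point of the 5 × 5 partition onto a representative using the mirror and diagonal symmetries of the square, and collects the distinct representatives.

```python
def canonical(row, col, n=5):
    """Map point row.col (1-based) to its representative under the eight
    symmetries of an n x n square (mirrors and diagonal reflections)."""
    r = min(row, n + 1 - row)   # fold top/bottom
    c = min(col, n + 1 - col)   # fold left/right
    return (min(r, c), max(r, c))  # fold across the main diagonal

def unique_points(n=5):
    """Sorted representatives of all measurement points of the partition."""
    return sorted({canonical(r, c, n)
                   for r in range(1, n + 1)
                   for c in range(1, n + 1)})

def label(point):
    """Format a (row, col) pair in the row.col notation used in the text."""
    return f"{point[0]}.{point[1]}"

if __name__ == "__main__":
    print(", ".join(label(p) for p in unique_points()))
```

For the 5 × 5 partition this yields 1.1, 1.2, 1.3, 2.2, 2.3 and 3.3, which are the measured points listed in the caption of table 6.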
More complete results are in table 4, in which voltages at the six measure points are reported for the two packages (without parasitics) after 30 Monte-Carlo 
Impact of decoupling capacitors
When facing power supply noise, geometry and topology are not the only means used to reduce its effect: decoupling capacitors are also inserted within the die [29], [21] and in the first-level package [30]. In this work we suppose that the physical design step allows decoupling capacitors to be placed not only at the die periphery but wherever possible in the die area. For the simulations we use the capacitors C_ground in the equivalent circuit of the L-shaped structure in figure 3. In the previous simulations C_ground models the intrinsic network capacitance only, while in this case its value is increased so that it includes both the intrinsic net capacitance and the decoupling capacitance. Several simulations have been carried out, but for the sake of brevity only two results are reported in the following: a distributed decoupling capacitance of 50fF and a doubled capacitance of 100fF for each C_ground capacitor. Simulation results for the G structure are reported in table 6. Noise is clearly reduced in the WB case, where an average over-voltage decrease of 14% is obtained, improving to a 17% reduction if the distributed capacitance is doubled; furthermore, a better voltage distribution is obtained within the circuit. In the C4 case noise is reduced by about 4%, and by up to 11.6% in the double-capacitance case. This result is comparable to the dense C4 case. Optimal results can thus be achieved by trading off between these two strategies, taking routing congestion into account as well.
Table 6. Decoupling capacitance insertion. Minimum and maximum voltages at Vdd for structure G. Comparison between WB and C4 connection styles. Package parasitics included in the simulation. Measured points: (1.1, 1.2, 1.3, 2.2, 2.3, 3.3). Reported results are in volts.

Finally, figure 18 gives a synoptic view of all the cases considered for circuit G. Worst-case over-voltage and under-voltage are shown as percentages of the nominal Vdd. It is evident that a power supply noise estimation carried out without considering package parasitics is not reliable: in the WB case the under-voltage may worsen from 22% to 36%, while in the C4 case the over-voltage may grow from 12% to 21%. Furthermore, the C4 density is a variable to be taken into account, as a dense connection to C4 solder bumps ensures a dynamic voltage drop reduction comparable to that obtained by inserting decoupling capacitors. This last technique is normally adopted at the end of the design flow, so that capacitors are inserted wherever possible in the design. A prediction of the total distributed capacitance needed to reduce the dynamic voltage drop could therefore be of help if available before the design stage. As a general comment, it is interesting to note that the capability to analyze these noise figures early, as a function of the design parameters, makes it possible to define design countermeasures without the need for long post-physical-design simulations.
Conclusions
This paper presented a new methodology for estimating the Dynamic Power Supply Noise in an early design phase, so that the often-repeated post-physical-design verification step can be executed only when near-optimal choices on power grid design have already been made. Unlike previous methodologies, the proposed one predicts dynamic noise and not only static IR drop, as transient currents are considered and inductance is included in the analysis as well. The prediction is made possible by a library used for the automatic generation of power grid structures, whose geometrical and technological parameters are diversified on the basis of the complexity of the circuit to be analyzed: many grid structures, topologies and hierarchies can be simulated thanks to the library modularity. Package parasitics and the connection to the on-chip grid can be included in the analysis as well. Their mutual impact can thus be simulated, bringing together two aspects that are considered separately in previous power supply noise evaluation methods. Currents drawn by the circuits are modeled with current generators whose shapes are varied with statistical Monte-Carlo analysis: frequency, rise times, current peaks and gate switching delay within the clock cycle can be inserted as parameters, so that DVD is estimated using realistic gate switching activities. Although a restricted variety of parameters has been used in this work, the validity of the methodology has been extensively shown, as worst-case Dynamic Voltage Drop predictions have been obtained for various power grid structures and working conditions. Results demonstrate that a realistic evaluation of the impact of parasitics is ensured by a grid library whose parasitics have been carefully extracted and whose modularity facilitates the global power network construction. A correct current envelope evaluation is guaranteed by statistical analyses. Furthermore, reliable results are endorsed by accurate Spice simulations.
