University of Michigan, Ann Arbor, MI
ICCAD'03, November 11-13, 2003, San Jose, California, USA.
Copyright 2003 ACM 1-58113-762-1/03/0011 ... $5.00.
Introduction
Power supply networks are essential in providing the devices on a die with a reliable and constant operating voltage. Due to the interconnect resistance and inductance of the on-chip and package supply networks, the supply voltage delivered to various devices on a die is non-ideal and exhibits both spatial and temporal fluctuations. These fluctuations in the supplied voltage can result in a reduction in operating frequency and can compromise functional stability. Power supply integrity is therefore a critical concern in high-performance designs.
The voltage drop that develops in a supply network can be broadly classified into IR-drop, which is the voltage drop due to the parasitic resistances of the interconnects, and Ldi/dt drop, which is the voltage drop due to the inductance of I/O pads and the parasitic inductance of the supply interconnects. In today's high-end designs, it is not uncommon for the supply network to conduct as much as 50-100 Amperes of total current [1,6]. As semiconductor technology is scaled and the supply voltage is reduced, the total current that must be supplied by the power network is expected to increase even further, making it more difficult to meet stringent supply integrity constraints. In particular, the Ldi/dt voltage drop is expected to become more prominent as it worsens with both increasing current demand and clock frequency [2]. Furthermore, IR-drop and Ldi/dt drop interact in a non-trivial manner and the total drop is not always the sum of the individual voltage drops.
The voltage fluctuations in a supply network can inject noise in a circuit, leading to functional failures in the design. Extensive work has therefore been focused on modeling and efficient analysis of the worst-case voltage drop in a supply network [2-7]. However, with decreasing supply voltages, the gate delay is becoming increasingly sensitive to supply voltage variation as the headroom between Vdd and Vt is consistently reduced [12]. For instance, in 0.13 µm technology, a 10% variation in the Vdd and Gnd voltages can result in a 30% delay variation for typical gates. With ever-diminishing clock cycle times, accurate analysis of the supply voltage impact on circuit performance has therefore become a critical issue.
In this paper, we present a new approach for the analysis of supply voltage induced delay variations. Power supply analysis has been complicated by the enormous size of the supply network. For modern processors, it is not uncommon for the supply network to be represented by an RLC circuit requiring more than 60 million elements.
Simulation of such a large circuit is extremely challenging and significant progress has been reported in developing efficient simulation approaches [3,5,7]. However, even with effective acceleration methods, it is typically not possible to simulate a supply network for more than a handful of clock cycles in reasonable time. Selecting the simulation vectors that exhibit the worst-case supply voltage drops is therefore a key issue in supply network verification. The supply voltage fluctuation is strongly dependent on the simulation vectors that determine the currents drawn by the devices from the supply network. Hence, critical supply integrity problems can go undetected if worst-case simulation vectors are not applied, regardless of the simulation accuracy.
A number of methods have therefore been proposed that use Genetic Algorithms or other search methods to automatically find vectors that maximize the total current drawn from the supply network [8,10]. These approaches are typically computationally intensive and are limited to circuit blocks, rather than full-chip analysis. In addition, a number of vectorless approaches for constructing worst-case currents have been proposed using either propagation of timing windows [9] or constraint graph formulations [11]. Vectorless approaches have the advantage that they are conservative, meaning that the supply drop will be overestimated, rather than underestimated. However, these methods address only static IR-drop analysis, and not Ldi/dt drop, which has become a key concern in supply integrity analysis. Also, they do not consider the impact of supply fluctuations on delay. Recently, a statistical approach for analyzing the impact of supply noise on delay was also presented [14].
Power supply variation can impact the circuit delay in two ways:
First, a reduced supply voltage lessens the gate drive strength, thereby increasing the gate delay. Second, a difference in the supply voltage between a driver and receiver pair creates an offset in the voltage with which the driver/receiver gates reference the signal transition. This has the effect of creating either a positive or negative time shift in the perceived signal transition at the receiver gate, as illustrated in Figure 1. This dual nature of the supply voltage impact on circuit delay was observed in [13], and complicates the generation of simulation vectors that maximize the delay along a particular circuit path. Increasing the voltage drop at a particular location may worsen the delay of one gate while improving the delay of another. Therefore, a vector must be determined that results in an optimal combination of the often conflicting goals of introducing both reduced drive strengths and supply voltage shifts such that the total delay along a path is maximized.
Traditionally, the impact of supply noise on delay has been accounted for by reducing the operating voltage of all library cells by the expected supply voltage drop during library characterization. This assumes that the worst-case expected voltage drop occurs in all places of the design. This yields a very conservative analysis since, in practice, the worst-case drop can occur in only a small region at any one point in time. On the other hand, this approach ignores the impact of voltage shifts between driver/receiver pairs, thereby possibly underestimating the worst-case delay in certain situations. Also, it only accounts for static IR-drop.
In this paper, we therefore present a new approach for the analysis of the impact of power supply drops on circuit delay. The proposed approach is vectorless, allowing for efficient analysis, and accounts for both IR-drop and Ldi/dt drop effects. We develop a linear model that accounts for both the impact of driver strength reduction and voltage shifts between driver/receiver pairs. Based on this model, we formulate the task of determining the worst-case impact of supply noise on a path delay as a constrained linear optimization problem where the currents of the different blocks are the optimization variables. We use both spatial and temporal superposition of the voltage drops resulting from the currents of individual circuit blocks. Linear constraints are then formulated both for the total power consumption of a chip and for individual block currents. Constraints between currents of different blocks, or of a single block in consecutive clock cycles, can be formulated to express both the spatial and temporal correlations that exist between circuit blocks.
The proposed approach has the advantage that accurate constraints can be extracted from extensive gate level simulation data that is readily available during the design process, thereby significantly improving the accuracy of the analysis while avoiding the need for lengthy and time consuming power grid simulation. We implemented the proposed methods and tested them on benchmark circuits, including a power grid from an industrial processor design. We show that the traditional analysis may overestimate the change in delay of a path by more than 50% and demonstrate the effectiveness of our analysis.
The remainder of this paper is arranged as follows. Section 2 describes our model for delay variations with respect to supply voltage fluctuations. Section 3 presents the problem formulations and optimization method for maximizing the impact of power grid fluctuations on delay. Section 4 presents the results obtained for different power grids. In Section 5, we draw our conclusions.
Delay Model for Supply Fluctuations
In this Section, we present our approach for modeling the impact of voltage variations on the delay of a circuit path. Since the voltage variations in a power grid are typically very slow compared to the transition time of a switching gate [15], we can make the simplifying assumption that the supply voltages are constant during the switching transitions. From the perspective of the path delay, we are therefore concerned with the impact of fixed voltage offsets from the nominal Vdd and Vss voltages on the delay of a circuit path. Note however that dynamic IR-drop and Ldi/dt drop effects will be the cause of these voltage offsets.
A voltage drop at a power supply point can impact the delay of a gate through one of the following two mechanisms: (1) a reduction in the drive strength of the gate, which increases its delay, and (2) a relative shift between the supply voltages of the driver and receiver gates, which offsets the voltage at which the receiver perceives the signal transition. The relative shift between the driver and receiver gates is likely to be larger if the gates are far apart than if they are close together. Therefore, nets that transmit signals across the chip have a higher likelihood of shifts in supply voltage between their driver and receiver pair and hence are more susceptible to power grid noise.
The relative magnitude of the above two mechanisms depends on the input slope and output loading of a gate. The sensitivity of gate delay to driver strength reduction will increase with output loading, while the sensitivity to voltage shifts will increase with slower input signal transition times.
In order to maximize the delay of a path, it is necessary to induce voltage drops in the supply network such that the delay of each gate is increased through both mechanisms: reduction of driver strength and voltage shifts between successive gates in the path. A possible voltage assignment that maximizes the voltage shift between consecutive gates in a circuit path is shown in Figure 2.
Figure 3. A driver-receiver pair in a non-ideal supply network.
Consider a gate G with local supply voltages $V_{dd,g}$ and $V_{ss,g}$, and supply voltages $V_{dd,in}$ and $V_{ss,in}$ at the preceding driver gate (Figure 3). As shown in Figure 3(b), the propagation delay $\tau$ between the input and output transitions of a gate is measured at 1/2 the nominal supply voltage point to ensure a common reference between successive gates. The delay of the receiver gate depends on the $V_{dd,g}$ and $V_{ss,g}$ voltages at the receiver gate itself, the voltages $V_{dd,in}$ and $V_{ss,in}$ at the preceding driver gate, the input transition time and the output load. For the purpose of our discussion, we consider a fixed output load, although in our actual implementation gates are characterized over a range of output loads.
We express the delay $\tau$ and the transition time $t_{out}$ at the output of gate G as follows:

$$\tau = f(V_{dd,g}, V_{ss,g}, V_{dd,in}, V_{ss,in}, t_{in})$$

$$t_{out} = g(V_{dd,g}, V_{ss,g}, V_{dd,in}, V_{ss,in}, t_{in})$$

In general, $f$ and $g$ are nonlinear functions of their variables. However, the voltage drop in a power grid network is restricted and is typically within the range of ±10% of $V_{dd,nominal}$. We found that within this range, the delay of a gate is close to linear. Figure 4 shows the delay of a gate as the supply voltages, including $V_{dd,in}$ and $V_{ss,in}$, are varied by ±20%. The delay curves in Figure 4 show that $f$ and $g$ can be accurately modeled as linear functions for reasonable supply voltage variations. We therefore express the change in delay, $\Delta\tau$, of a gate with respect to its delay at nominal supply voltages as follows:

$$\Delta\tau = a_1 \Delta V_{dd,g} + a_2 \Delta V_{ss,g} + a_3 \Delta V_{dd,in} + a_4 \Delta V_{ss,in} + a_5 \Delta t_{in}$$

where $\Delta V_{dd,g}$, $\Delta V_{ss,g}$, $\Delta V_{dd,in}$ and $\Delta V_{ss,in}$ are the deviations of the four supply voltages from their nominal values and $\Delta t_{in}$ is the change in the input transition time from its nominal value. Similarly, we express the change in the transition time $\Delta t_{out}$ at the output of a gate with respect to its transition time at nominal supply voltages as follows:

$$\Delta t_{out} = b_1 \Delta V_{dd,g} + b_2 \Delta V_{ss,g} + b_3 \Delta V_{dd,in} + b_4 \Delta V_{ss,in} + b_5 \Delta t_{in}$$

We compared the delay values determined using our linear model with delay values obtained through SPICE simulation for a low-to-high propagation delay of an inverter in 0.13 micron technology with a nominal power supply of 1.2 V, for different combinations of maximum supply voltage variations. We also compared the accuracy of the proposed delay model for more than 3000 randomly generated voltage and transition time variations of ±10%, which resulted in an average error of 0.74% and a maximum error of 8.1%.
It should be noted that while we linearly model the change in delay due to supply voltage variations, the nominal delay itself is not a linear function of output load and nominal input transition time. We therefore used a non-linear, table based model, similar to that used in Synopsys Design Compiler, to model the dependence of nominal delay and output transition time on output load and nominal input transition time. For each possible load and input transition time condition, we also determined different linear fitting constants a1-a5 and b1-b5, which are stored in a table along with the nominal delay and output transition time values.
Circuit path delay model
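We now consider the variation of the delay, $\Delta\tau_{path}$, of a circuit path due to supply voltage variations at different supply connections along a path as shown in Figure 2(a). In general, the change in the delay of the nth gate is given by:

$$\Delta\tau_n = a_{1,n}\Delta V_{dd,n} + a_{2,n}\Delta V_{ss,n} + a_{3,n}\Delta V_{dd,n-1} + a_{4,n}\Delta V_{ss,n-1} + a_{5,n}\Delta t_{out,n-1} \qquad \text{(EQ 6)}$$

and the change in its output transition time is given by:

$$\Delta t_{out,n} = b_{1,n}\Delta V_{dd,n} + b_{2,n}\Delta V_{ss,n} + b_{3,n}\Delta V_{dd,n-1} + b_{4,n}\Delta V_{ss,n-1} + b_{5,n}\Delta t_{out,n-1} \qquad \text{(EQ 7)}$$

where $a_{i,n}$ and $b_{i,n}$ are the regression coefficients for gate n; $\Delta V_{dd,n}$ and $\Delta V_{ss,n}$ are the supply voltage drops at gate n; and $\Delta V_{dd,n-1}$ and $\Delta V_{ss,n-1}$ are the supply voltage drops at its driver gate, n-1. The delay of gate n is therefore defined in terms of the change of the output transition time of gate n-1, leading to a recursive definition of the overall path delay. The total delay change of a circuit path is the sum of the changes of the gate delays along the path and is expressed as follows:

$$\Delta\tau_{path} = \sum_{n} \Delta\tau_n \qquad \text{(EQ 8)}$$

Here we assume a nominal signal transition at the input of the first gate of the path. However, the analysis can be easily extended to account for non-ideal input signal transitions.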
Equations 8 and 9 model the change in the delay of a path as a linear function of supply voltages at the individual gate connections. In the next section, we propose a method to express these supply voltages as a linear function of block currents and formulate the problem of maximizing delay as a linear optimization problem.
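The recursion in EQ 6 and the companion expression for the output transition time can be illustrated with a short sketch. The `GateCoeffs` container, the function name and all coefficient values below are hypothetical and chosen only for illustration. Voltage deviations are taken as signed offsets from nominal, and the input transition at the first gate is assumed to be nominal.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GateCoeffs:
    """Hypothetical per-gate fitting constants (a1..a5 and b1..b5)."""
    a: Sequence[float]  # delay sensitivities (ps/V for a1..a4, ps/ps for a5)
    b: Sequence[float]  # output transition sensitivities, same layout


def path_delay_change(gates: List[GateCoeffs],
                      dvdd: Sequence[float],
                      dvss: Sequence[float],
                      dt_first: float = 0.0) -> float:
    """Sum the per-gate delay changes along a path.

    dvdd[n] and dvss[n] are the supply deviations at gate n (n = 1..N);
    index 0 holds the deviations at the driver of the first gate.
    dt_first is the transition time change at the input of the first gate.
    """
    total = 0.0
    dt_prev = dt_first
    for n, gate in enumerate(gates, start=1):
        # Same variable order as the linear model: local Vdd, local Vss,
        # driver Vdd, driver Vss, change in input transition time.
        x = (dvdd[n], dvss[n], dvdd[n - 1], dvss[n - 1], dt_prev)
        d_tau = sum(c * v for c, v in zip(gate.a, x))
        # The output transition change feeds the next gate (the recursion).
        dt_prev = sum(c * v for c, v in zip(gate.b, x))
        total += d_tau
    return total


# Two identical, made-up inverters; a Vdd droop is a negative deviation.
inv = GateCoeffs(a=(-100.0, 100.0, 50.0, -50.0, 0.5),
                 b=(-40.0, 40.0, 0.0, 0.0, 0.8))
dvdd = [0.0, -0.10, -0.05]
dvss = [0.0, 0.05, 0.0]
delta = path_delay_change([inv, inv], dvdd, dvss)
print(f"path delay change: {delta:.3f} ps")
```

Because every step is a weighted sum, the result is linear in the supply deviations, which is the property the optimization formulation relies on.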
Maximum Delay Variation Formulation
We now discuss how the supply voltages can be expressed as a linear function of the current sources using both spatial and temporal superposition and accounting for both IR-drop and Ldi/dt drop. We then show how the problem of maximizing delay change for a circuit path can be formulated as a linear optimization problem with linear constraints.
We consider a power supply network composed of RLC elements, current sources and voltage sources. We first consider an independent current source $i_m(t)$, applied at node m, and denote the voltage response generated at any node n due to the current $i_m(t)$ as $V_{n,m}(t)$. Given a set of current sources $i_m(t)$, the response $V_n(t)$ at any node n in the circuit due to this set of current sources acting together is the summation of all the responses at node n caused by the individual current sources:

$$V_n(t) = \sum_m V_{n,m}(t) \qquad \text{(EQ 11)}$$

This is the well known principle of superposition, applied spatially across the different current sources of a supply network.
However, $V_n(t)$ in EQ11 depends on the entire current waveform $i_m(t)$, and requires that the entire current waveform is simulated for each current source. This complicates the formulation of the delay maximization problem since the number of possible current waveforms $i_m(t)$ can be very large and enumerating all possibilities would be impossible. We therefore approximate an arbitrary current waveform $i_m(t)$ using a piece-wise constant waveform with a discretization of time into time steps $T_s$, as shown in Figure 5(a). Given the total duration $T_w$ of waveform $i_m(t)$ and the time step size $T_s$, the number of discretizations $S$ is given by $T_w = T_s \cdot S$. If the discretization time step is chosen sufficiently small, the piece-wise constant approximation of the continuous waveform has negligible error. We now represent the piece-wise constant current waveform as the sum of a series of current pulses of duration $T_s$, each shifted in time by one time step, as shown in Figure 5(b) and expressed as follows:

$$i_m(t) = \sum_{i=0}^{S-1} I_{m,i}\, p(t - iT_s) \qquad \text{(EQ 12)}$$

where

$$p(t) = \begin{cases} 1, & 0 < t \le T_s \\ 0, & \text{otherwise} \end{cases}$$

and $I_{m,i}$ is the magnitude of the piece-wise constant approximation of the current $i_m(t)$ in the interval $iT_s$ to $(i+1)T_s$.
Conceptually, we can therefore replace each current source $i_m(t)$ at node m with a set of $S$ current pulse sources $i_{m,i}(t)$ connected to the same node in the grid. Note that each current pulse $i_{m,i}(t)$ is a scaled and shifted version of the unit current pulse $i_u(t)$ with a unit pulse height and a pulse width of $T_s$:
$$i_u(t) = \begin{cases} 1, & 0 < t \le T_s \\ 0, & \text{otherwise} \end{cases} \qquad \text{(EQ 13)}$$

Due to the nature of a power supply network, the voltage response $V^u_{n,m}(t)$ at node n due to a single unit current pulse $i_u(t)$ at node m will reach steady state and approach the nominal supply voltage given sufficient time. The difference $\Delta V^u_{n,m}(t)$ of this voltage from the nominal supply voltage therefore approaches zero given sufficient time. We assume that this voltage difference has diminished below a specified error threshold at time $T_k = K \cdot T_s$.
Since any finite length current waveform $i_m(t)$ can be represented by a finite set of current pulse sources, we can compute the voltage response at node n by summing the responses from each of the individual current pulse sources, using linear superposition. Moreover, since the power supply network is linear, the response resulting from each current pulse is simply a shifted and scaled version of the response resulting from a unit current pulse. We can therefore express the change in the voltage response $\Delta V_{n,m}(t)$ from the nominal supply voltage due to the current source $i_m(t)$ as follows:

$$\Delta V_{n,m}(t) = \sum_{i=0}^{S-1} I_{m,i}\, \Delta V^u_{n,m}(t - iT_s) \qquad \text{(EQ 14)}$$

where $I_{m,i}$ is the magnitude of the piece-wise constant current waveform approximated in the interval $iT_s$ to $(i+1)T_s$.
In this manner, we can compute the response of any node in the network due to an arbitrary current $i_m(t)$ using a single simulation of a unit current pulse and combining scaled and shifted versions of this response, using EQ14. The only approximations in this approach arise from the piece-wise constant representation of the current waveform and the finite simulation length of the unit current pulse response. Given a sufficiently fine grain discretization and sufficient simulation length of the unit current pulse response, arbitrary accuracy can be obtained. Also, the computational complexity grows linearly with respect to the unit pulse response simulation length $T_k$ and the number of discretizations $S$ of the current waveform $i_m(t)$. Typically, the length $T_w$ of the waveforms $i_m(t)$ will be much greater than the unit pulse response time $T_k$. Since the simulation time of the supply network will by far dominate the run time effort, the proposed approach will provide a speedup of approximately $T_w/T_k$ compared to simulating the entire current waveform $i_m(t)$. It should also be noted that the current waveform $i_m(t)$ can be approximated not only by a sequence of square current pulses, but also by other current pulse shapes, using a similar analysis.
Finally, we combine the temporal superposition with spatial superposition to obtain the voltage fluctuation $\Delta V_n(t)$ at a node n due to a set of arbitrary current sources $i_m(t)$ at nodes m as follows:

$$\Delta V_n(t) = \sum_m \sum_{i=0}^{S-1} I_{m,i}\, \Delta V^u_{n,m}(t - iT_s) \qquad \text{(EQ 15)}$$

Delay maximization formulation
We apply the above formulation to the problem of delay maximization, using a linear optimization formulation with the current values as optimization variables. We first divide the chip into circuit blocks and simulate the minimum and maximum currents of each circuit block using Powermill or Verilog simulations, or estimate them on the basis of a previously fabricated part. In a microprocessor design, these circuit blocks could be, for example, the instruction fetch stage, instruction decode stage, execute stage, caches and the main memory control units. We make the simplifying assumption that the total current in a circuit block is evenly divided among its power supply points. This has the advantage that the voltage sensitivities $\Delta V^u_{n,m}(t)$ can be computed with respect to the total current of a circuit block, instead of with respect to each individual current source point in a circuit block. This greatly reduces the number of optimization variables in our formulation and improves its efficiency.
When selecting circuit blocks, it is therefore important that each block is sufficiently small to ensure that the spatial distribution of the currents within a circuit block does not significantly impact the voltage response. For high-performance processors, with tight and uniform supply grids over multiple layers of metal, the spatial distribution of the total block current is typically not significant for moderate size blocks [17]. If necessary, however, the proposed approach can be extended for non-uniform current distributions. It is also desirable that circuit blocks are selected such that their currents are independent, reducing the need to incorporate constraints between the currents of different blocks in the delay maximization formulation.
The current waveform for a circuit block typically has an approximately triangular shape within a clock cycle, as shown in Figure 6, reflecting a higher switching activity at the start of the clock cycle than at the end of the clock cycle [16]. We currently approximate the current waveform for a circuit block in a single clock cycle with a trapezoidal waveform, as shown in Figure 6. We note, however, that the proposed approach is not restricted to a specific current profile and different current profile approximations could be used as well.
Figure 6. Block current waveform.
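The temporal superposition of EQ14 can be sketched numerically as follows. The sketch samples all waveforms once per time step, and a first-order linear recurrence stands in for the power grid simulation; that recurrence, its parameters `alpha` and `beta`, and the function names are assumptions made purely for illustration and are not the grid model used in the paper.

```python
def superpose(unit_response, pulse_heights):
    """Temporal superposition: dv[k] = sum_i I_i * u[k - i].

    unit_response holds K samples of the response to a unit current pulse
    (assumed negligible after K samples); pulse_heights holds the S
    piece-wise constant current magnitudes.
    """
    dv = [0.0] * (len(pulse_heights) + len(unit_response) - 1)
    for i, amp in enumerate(pulse_heights):
        for k, u in enumerate(unit_response):
            dv[i + k] += amp * u  # response to pulse i, scaled and shifted
    return dv


def simulate(currents, alpha=0.5, beta=-0.01, extra=0):
    """Toy linear 'grid': v[k] = alpha * v[k-1] + beta * I[k]."""
    v, out = 0.0, []
    for cur in list(currents) + [0.0] * extra:
        v = alpha * v + beta * cur
        out.append(v)
    return out


K = 30  # length of the stored unit pulse response
unit_response = simulate([1.0], extra=K - 1)    # one simulation, reused below
waveform = [2.0, 5.0, 3.0, 0.5, 4.0, 1.0]       # made-up current magnitudes

fast = superpose(unit_response, waveform)       # scaled and shifted copies
direct = simulate(waveform, extra=K - 1)        # full simulation, for reference
max_err = max(abs(a - b) for a, b in zip(fast, direct))
print(f"max difference: {max_err:.2e}")
```

The only discrepancy between the two results comes from truncating the unit response after `K` samples, which mirrors the role of the error threshold at time T_k in the text.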
The block current within a clock cycle may vary not only in magnitude but also in shape with different input data. Some input vectors will result in more switching activity at the start of the cycle, while other input vectors may result in more switching activity at the end of the cycle. However, with the scaling of process technology, the clock frequency has increased significantly while the resonance frequency of the supply network has steadily decreased. For a 1-2 GHz processor, typical resonance frequencies of the power supply network are in the range of 30-80 MHz [18]. Any change in the shape of the current waveform within a single clock cycle therefore impacts frequencies that are well above the resonance frequency of the power distribution network and has little impact on the voltage waveforms. This is illustrated in Figure 7, where the voltage response of a node in the grid for two current waveform shapes with equal total charge is shown. One waveform has a triangular shape, while the other waveform uses the trapezoidal approximation, as shown in Figure 6. The simulations show that the two voltage responses are nearly indistinguishable. Note that, if necessary, the proposed approach can be extended such that each clock cycle is divided into multiple time steps and is represented with a series of consecutive current pulses, allowing for different waveforms within a clock cycle.
Figure 7. Variation of voltage at a node in the power grid with different clock cycle waveform shapes.
Based on Figure 6, we also observe that the voltage response $V^u_{n,m}(t)$ within a clock cycle is nearly constant and can be approximated with a fixed voltage value $V_{n,m,i}$. Based on EQ15, we now express the voltage variation of a Vdd node n as a function of the currents $I_{m,i}$ of the circuit blocks m, as follows:

$$\Delta V_{dd,n} = \sum_m \sum_{i=0}^{S-1} I_{m,i}\, \Delta V_{dd,n,m,i} \qquad \text{(EQ 16)}$$

where $\Delta V_{dd,n,m,i}$ is the sensitivity of the Vdd voltage node n with respect to the current of block m after i clock cycles of delay. Similarly, we express the voltage variation of a Vss node as:

$$\Delta V_{ss,n} = \sum_m \sum_{i=0}^{S-1} I_{m,i}\, \Delta V_{ss,n,m,i} \qquad \text{(EQ 17)}$$

where $\Delta V_{ss,n,m,i}$ is the sensitivity of the Vss node n with respect to the current of block m after i clock cycles of delay. We now formulate the problem of maximizing delay as a linear optimization problem as follows:
Maximize:

$$\Delta\tau_{path} = \sum_n \Delta\tau_n \qquad \text{(EQ 18)}$$

such that the gate delay changes $\Delta\tau_n$ follow the linear model of EQ 6 and EQ 7, the supply voltage variations follow EQ16 and EQ17, and the block currents satisfy:

$$I_{m,min} \le I_{m,i} \le I_{m,max} \qquad \text{(EQ 22)}$$

$$\sum_m I_{m,i} \le I_{peak} \qquad \text{(EQ 23)}$$

The constraint in EQ22 expresses that the current of a block must have a value between its maximum and minimum possible value, as determined from Powermill or Verilog simulation. The constraint in EQ23 forces an upper bound on the total current of the chip. This expresses that, while the currents of individual blocks may vary dramatically from cycle to cycle, the total power of the chip typically has a well known maximum. This upper bound on the total current can be computed using either chip-level Verilog simulation or by scaling the maximum power of a similar design in an older technology. Other constraints expressing dependencies between different circuit blocks or between different clock cycles can be added as well using linear inequalities, as explained in the following Section.
To compute $\Delta V_{dd,n,m,i}$ and $\Delta V_{ss,n,m,i}$, a unit trapezoidal current source waveform is, in turn, applied at each circuit block and the voltage drop of all nodes is measured for S subsequent clock cycles, until the voltage drop becomes insignificant. This is a time consuming step, but for a typical processor design at most a few tens of circuit blocks are required and the simulation is performed only once for each circuit block, after which the results can be reused for the analysis of any number of circuit paths. The optimization in EQ18 through EQ23 is implemented using a CPLEX linear optimization package. For typical power grids, the number of variables is of the order of thousands, which can be easily solved using standard linear solution methods. Finally, we note that the optimization solution not only provides the maximum expected increase in the circuit path delay, but also provides the exact current waveforms for each circuit block that produce this delay variation. Such a worst-case "block current trace" can be simulated by the designer to verify the predicted delay change and can give insight into the operation of the supply grid.
Generation of block current constraints
In most processor designs, however, correlations between the currents of different blocks, or between currents of a block in consecutive clock cycles, will also arise. For instance, positive correlation between the currents of two pipeline stages can arise when data is passed from one pipeline stage to the next, or negative correlation may exist between the currents of two circuit blocks that operate mutually exclusively.
We therefore incorporate linear constraints in the proposed formulation to express such correlations. It should be noted that the delay maximization formulation is conservative, meaning that it will overestimate the change in delay due to supply voltage fluctuations. This is the result of the optimization formulation, which automatically maximizes the delay change within the bounds of the provided constraints. Incorporating additional constraints in the analysis is therefore an effective method to reduce the conservatism of the analysis. Any linear constraint can be represented in the proposed formulation and a number of different approaches for automatically generating such constraints can be used. In this paper, we propose the use of gate level power simulation, such as a Verilog based simulator, to extract correlation constraints. By simulating a large set of chip level simulation vectors, the correlation between the currents of different blocks in one clock cycle or between currents of blocks in different clock cycles can be observed and can be represented using linear constraints. Figure 8 shows a scatter plot of the currents of a Multiplier and an ALU block in an Alpha processor. The X-axis of the scatter plot corresponds to the current of the Multiplier block and the Y-axis corresponds to the current of the ALU. The entire processor design was simulated, and the currents of the ALU and Multiplier blocks were computed using pre-characterized power data in the cell library. Each point in the scatter plot represents a simulated clock cycle. In total, more than ten thousand clock cycles were simulated using a number of benchmark programs. Note that many of the scatter points coincide.
Figure 8. Correlation between Multiplier and ALU block currents.
Since the Alpha processor is a single issue machine and was designed with clock gating for reduced power consumption, the Multiplier and ALU blocks cannot be active in the same clock cycle. This negative correlation is evident from the L-shaped scatter points in Figure 8. To express this correlation in the delay maximization formulation, we generate the linear constraint shown by the solid line in Figure 8 and express it with the following inequality:

$$I_{mul,t} + 1.36\, I_{ALU,t} \le 1.7 \qquad \text{(EQ 24)}$$

It is clear that the constraint in EQ24 will reduce the predicted delay increase of the analysis by preventing the Multiplier and the ALU from simultaneously exhibiting their maximum current values.
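The effect of adding a correlation constraint to the linear program can be illustrated with a two-variable toy problem. The sketch below is not the CPLEX setup used in the paper: it solves a bounded two-variable LP by enumerating the vertices of the feasible region, and the delay sensitivities, the block current bounds and the function name `solve_lp_2d` are all hypothetical. The correlation constraint uses the coefficients 1.36 and 1.7 of EQ24 purely as example numbers.

```python
from itertools import combinations


def solve_lp_2d(c, constraints, eps=1e-9):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r).

    Assumes a bounded, non-empty feasible region, so the optimum lies at an
    intersection of two constraint boundaries. Returns (value, x, y).
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundaries, no vertex
        x = (r1 * b2 - r2 * b1) / det   # Cramer's rule
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + eps for a, b, r in constraints):
            val = c[0] * x + c[1] * y
            if best is None or val > best[0]:
                best = (val, x, y)
    return best


# Hypothetical delay sensitivities (ps per unit current) of two blocks.
sens = (3.0, 2.0)
# Per-block bounds: 0 <= I_mul <= 1.5 and 0 <= I_alu <= 1.0 (made-up values).
box = [(1.0, 0.0, 1.5), (-1.0, 0.0, 0.0), (0.0, 1.0, 1.0), (0.0, -1.0, 0.0)]
# Negative-correlation constraint: I_mul + 1.36 * I_alu <= 1.7.
corr = [(1.0, 1.36, 1.7)]

best_box = solve_lp_2d(sens, box)
best_corr = solve_lp_2d(sens, box + corr)
print("without correlation constraint:", best_box)
print("with correlation constraint:   ", best_corr)
```

With only the per-block bounds, the optimum places both blocks at their maximum current. Adding the correlation constraint removes that corner from the feasible region and lowers the predicted worst case.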
An example of a correlation between currents in different clock cycles is shown in Figure 9, where the current of the instruction
Voltage drop formulation
We observe that the proposed method for delay maximization can be easily reformulated to compute the maximum voltage drop at a particular circuit node. In this case, we maximize the voltage drop, again subject to linear constraints and with the block currents as optimization variables, as follows:
Maximize:

$$\Delta V_n = \sum_m \sum_{i=0}^{S-1} I_{m,i}\, \Delta V_{n,m,i}$$

Figure 9. Correlation between IF stage in cycle t and ID stage in cycle t+1.
fetch stage in cycle t is plotted against the current of the instruction decode stage in cycle t+1. Since data is passed from the instruction fetch stage to the instruction decode stage, a correlation can arise, as is clearly visible from the scatter plot in Figure 9. In this case, the correlation is captured using two constraints, as illustrated in Figure 9 and expressed as follows:
'.711F, I "ID, t + I ' 3.5
Although in this paper we manually extract constraints from the correlation data, it is clear that such constraints could be easily generated automatically by finding a polyhedron that encompasses all generated current points. The use of gate level power simulation has the advantage that very extensive suites of test vectors are readily available and block current data can be obtained from them with minimum overhead during the design process. Also, gate level simulation is typically performed for many millions of clock cycles. The proposed approach allows realistic constraints to be extracted, based on extensive simulation data, while at the same time avoiding the need to evaluate long power grid vectors, which would lead to intractable simulation times.
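One way to generate such constraints automatically for a pair of block currents is to take the convex hull of the simulated points and turn each hull edge into a linear inequality. The sketch below uses the standard monotone chain convex hull algorithm; this is one possible choice and not necessarily what the authors had in mind. The scatter points and function names are made up for illustration.

```python
def convex_hull(points):
    """Monotone chain convex hull; returns vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_constraints(points):
    """Return (a, b, r) triples meaning a*x + b*y <= r, one per hull edge."""
    hull = convex_hull(points)
    cons = []
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        a, b = y2 - y1, x1 - x2          # outward normal of a CCW edge
        cons.append((a, b, a * x1 + b * y1))
    return cons


# Made-up L-shaped scatter: each point is (block A current, block B current).
scatter = [(0.0, 0.0), (1.7, 0.0), (1.2, 0.1), (0.0, 1.25),
           (0.1, 0.9), (0.2, 0.2), (0.5, 0.05)]
constraints = hull_constraints(scatter)
for a, b, r in constraints:
    print(f"{a:+.3f} * I_A {b:+.3f} * I_B <= {r:.3f}")
```

Every simulated point satisfies the generated inequalities by construction, so the constraints never exclude an observed operating condition. For more than two correlated blocks the same idea applies with a higher-dimensional hull.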
Results
The proposed approaches for determining the worst-case voltage drop and maximum increase in delay of a path were implemented and tested on a number of grids of different sizes for both flip-chip and wire bond package models. Grid-1 through Grid-8 are different size grids in 9 layers of metal, generated using pitches and widths of an industrial microprocessor design. Grid-9 is the grid of an industrial processor, extracted using a commercial extraction tool, and consists of over 1 million elements. For each chip, the design was partitioned into a number of blocks. The maximum and minimum current of each block and the total maximum power of the chip were then obtained through either Verilog simulation or chip area estimates. Table 2 shows the results for worst-case voltage drop computation, using the approach described in Section 3.1. We compare the obtained results with two traditional approaches for voltage drop analysis. In the first approach (Peak Curr) shown in Table 2, all blocks are assigned their maximum switching current, so that all blocks draw their peak current simultaneously. In the second approach (Avg. Curr), we assign an average current to each block. The last column shows the voltage drop obtained from the constrained maximization approach, where blocks with low sensitivity will be switching with lower currents while blocks with higher sensitivity will switch with higher currents. The current drawn by each block will change in every clock cycle so as to maximize the voltage drop at a given node due to both IR-drop and Ldi/dt drop. Table 2 shows that the peak current approach overestimates the worst-case voltage drop by a maximum of 64% and by 37% on average over all test cases. On the other hand, the average current approach underestimates the worst-case drop by as much as 61% and by 51% on average. Table 3 shows the results of the proposed delay maximization approach.
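To illustrate why the three voltage drop estimates compared in Table 2 differ, the following sketch evaluates them on a toy example. It assumes that only per-block bounds (EQ22) and a per-cycle total current bound (EQ23) are present, in which case the linear program separates by clock cycle and a greedy allocation is optimal. All sensitivities and current values are made up, and the function names are hypothetical.

```python
def fixed_current_drop(sens, currents):
    """Voltage drop when block m draws currents[m] in every clock cycle."""
    return sum(s * currents[m] for m, row in enumerate(sens) for s in row)


def max_voltage_drop(sens, i_min, i_max, i_peak):
    """Constrained maximum of sum_m sum_i sens[m][i] * I[m][i].

    sens[m][i] is the drop at the node per unit current of block m in
    cycle i. Assumes sum(i_min) <= i_peak so that a feasible point exists.
    """
    total = 0.0
    for i in range(len(sens[0])):
        cur = list(i_min)                    # every block at its minimum
        budget = i_peak - sum(cur)           # remaining total current
        order = sorted(range(len(sens)), key=lambda m: -sens[m][i])
        for m in order:                      # most sensitive block first
            if sens[m][i] <= 0 or budget <= 0:
                break
            step = min(i_max[m] - i_min[m], budget)
            cur[m] += step
            budget -= step
        total += sum(sens[m][i] * cur[m] for m in range(len(sens)))
    return total


# Three blocks, two clock cycles of history (volts per unit current).
sens = [[0.04, 0.02], [0.01, 0.005], [0.03, 0.0]]
i_min, i_max, i_peak = [0.2, 0.2, 0.2], [1.0, 1.0, 1.0], 2.0

peak = fixed_current_drop(sens, i_max)
avg = fixed_current_drop(sens, [(lo + hi) / 2 for lo, hi in zip(i_min, i_max)])
constrained = max_voltage_drop(sens, i_min, i_max, i_peak)
print(f"peak: {peak:.3f} V, constrained max: {constrained:.3f} V, average: {avg:.3f} V")
```

In this toy case the peak current estimate exceeds the constrained maximum because it ignores the total current bound, while the average current estimate falls below it.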
Table 3 shows the maximum expected delay increase of a critical path for each chip, as determined by the proposed constrained optimization approach (Constr Max). The results are compared with two traditional approaches. In traditional approach 1, the worst-case voltage drop of the power supply network is applied at all voltage supply points of the gates constituting the critical path. This is equivalent to the common practice of lowering the operating voltage of all cells in the library by the worst-case expected voltage drop during timing characterization. Table 3 shows that this approach overestimates the increase in delay compared to the constrained maximization approach by 135% on average. It should be noted, however, that the overestimation depends on the placement of the gates in the path on the chip, and is worse for paths that are distributed over a significant area of the die. In traditional approach 2, the worst voltage drop at each gate location is first determined using the constrained voltage maximization formulation described in Section 3.1. Each local worst-case drop is then applied simultaneously at all gates in the path. This approach is less conservative than traditional approach 1, since many nodes have a local worst-case drop that is less than the worst-case drop of the chip as a whole. Nevertheless, it is still conservative: Table 3 shows that it overestimates the delay by 44.7% on average compared with the constrained delay maximization approach.
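The ordering of the three approaches follows from a standard property of maximization, sketched below under an assumed linearized delay model. The notation is introduced here for illustration and is not taken from the paper: a feasible set of block-current vectors, a supply drop at each gate that is linear in those currents, and a non-negative delay sensitivity per gate.

```latex
% Assumed notation (not from the paper):
%   \mathbf{i} \in \mathcal{F}   feasible block-current vector over all cycles
%   \Delta v_g(\mathbf{i})        supply drop seen by gate g, linear in \mathbf{i}
%   k_g \ge 0                     delay sensitivity of gate g to its supply drop
%   n                             ranges over all nodes of the grid
\begin{align*}
\underbrace{\max_{\mathbf{i}\in\mathcal{F}} \sum_{g} k_g\,\Delta v_g(\mathbf{i})}_{\text{constrained delay maximization}}
&\;\le\;
\underbrace{\sum_{g} k_g \max_{\mathbf{i}\in\mathcal{F}} \Delta v_g(\mathbf{i})}_{\text{traditional approach 2}} \\
&\;\le\;
\underbrace{\Bigl(\sum_{g} k_g\Bigr) \max_{n}\,\max_{\mathbf{i}\in\mathcal{F}} \Delta v_n(\mathbf{i})}_{\text{traditional approach 1}}
\end{align*}
```

The first inequality is strict whenever no single feasible current pattern produces the local worst-case drop at every gate of the path at once, which is more likely when the gates are spread over a large area of the die.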
In Table 4, we demonstrate the effectiveness of incorporating additional constraints between block currents into the formulation. We repeated the analysis of Grid-1 of the Alpha processor, but added several linear constraints expressing correlations between currents of different blocks and between block currents in different clock cycles. The constraints were obtained using extensive Verilog simulation, as described in Section 3. Table 4 shows the increase in delay of 5 critical paths with and without these correlation constraints. Although only a few constraints were added to the analysis, the computed delay increase was reduced by as much as 21.7%, and by 16.5% on average, showing the effectiveness of this approach. Figure 9 shows the current waveforms generated by the delay maximization approach for Grid-1. As can be seen, the currents generated by the analysis are time varying and exploit the time dependence of IR-drop and Ldi/dt drop. The run time for the linear optimization was less than 1 sec for all the grids, since the linear optimizer can solve linear maximization problems very quickly. The initial step of computing sensitivities is the computationally intensive part of this approach, but its cost can be considerably reduced using fast linear solvers.
Conclusion
In this paper, we have presented a new approach for computing the maximum delay increase of a critical path due to power supply voltage fluctuations. The analysis is vectorless and considers both IR-drop and Ldi/dt drop. We presented an accurate model for the path delay as a function of the supply voltages and then formulated the delay maximization problem as a constrained linear optimization problem. We also discussed how linear constraints can be added to the formulation to represent correlations between block currents. The analysis was implemented and tested on a number of benchmark grids, including the power grid of an industrial processor, and the results demonstrate the effectiveness of the proposed approach. This work was funded by research grants and contracts from
