Abstract- Power dissipation is becoming a major obstacle for integrated circuit design, especially in server and pervasive computing technologies. Careful consideration of power requirements is expected to bring major changes in the way we design integrated circuits and analyze their performance. This paper proposes a practical methodology to evaluate the short-circuit power of static CMOS gates through effective use of the information available from timing analysis. We introduce three methods to estimate the short-circuit power of a static CMOS circuit without requiring explicit circuit simulation. The proposed methodology offers practical advantages over previous approaches, which rely heavily on simple, special-purpose device models. The approach is tested on an extensive set of benchmark examples with several device models and is found to be very accurate.
I. Introduction
During input signal transitions, both the NMOS and PMOS blocks in a static CMOS circuit conduct simultaneously for a short period of time, causing a direct current flow between the power rails and resulting in short-circuit power. Prediction of short-circuit power is of increasing importance as power shows signs of limiting circuit performance. Many experts expect that scaling trends will make short-circuit power as important as the dynamic power dissipated in a logic stage [1].
In this paper, we propose a methodology to predict the short-circuit power based on industry-recognized timing models. Typically, a substantial effort is spent in industry on modeling for timing verification. Hence, many integrated device manufacturers and fabless design companies make extensive use of timing libraries in their verification flows. A typical timing library models the gate delay under various input and load conditions. Common engineering practice, to first order, characterizes each gate per switching input signal, and the output waveform is often approximated by a piecewise linear waveform through several selected datapoints. Typically, the timing models include the average gate delay, measured at 50% of the full logic value, and the output slew for pre-characterized datapoints. The typical timing rules can be formulated similar to the equations below:
$$t_{out,50} = F_1(t_{in,slew}, C_L), \qquad t_{out,slew} = F_2(t_{in,slew}, C_L) \tag{1}$$
where $C_L$ is the load capacitance and $t_{in,slew}$ is the input slew (transition time). Traditionally, $F_1$ and $F_2$ are selected as polynomial functions or stored in multi-dimensional tables. The function arguments may also include other parameters such as Vdd. Characterized functions for (1) are heavily used in static timing analysis, which is part of the state-of-the-art sign-off methodology. In a similar manner, short-circuit power for each gate can be pre-characterized for each switching input signal, as Dartu discusses in [2]. For average short-circuit power, simple polynomial models have been proposed, but unlike timing models, pre-characterized short-circuit power models are not widely adopted and are not part of the standard gate libraries. In fact, most of the previous research on short-circuit power has focused on closed-form analytical expressions for an equivalent inverter circuit using simplified device models, primarily to obtain a basic device-centric intuition for the short-circuit power. Most notably, [3] uses the Shichman-Hodges model, and [4, 5] use the alpha-power law model described in [6]. These approaches generally attempt to solve the set of differential equations for a switching inverter loaded with a nominal load capacitance. However, the accuracy and efficiency of their formulas depend largely on the assumed simple device models and on the assumptions made about device operation during signal transitions. For example, [4, 5, 7, 8, 9] all evaluate the inverter output waveform under the assumption of zero PMOS device current in order to obtain a differential equation with a closed-form solution for the output waveform. The output waveform expression is then used to deduce the actual nonzero PMOS current, whose time-domain integration gives the total short-circuit power for the signal transition.
Most recently, in [10], a more complex model (MM9 [11]) is combined with the alpha-power model.
Our goal in this paper is not to obtain a closed-form expression, but instead to lay out a practical methodology for evaluating the short-circuit power of typical static CMOS circuits whose timing models are available. The described methodology offers a practical and useful way to get around incomplete performance characterizations. We focus on the inverter circuit model with general RC loads since, from the power perspective, a static CMOS gate can be macromodeled as an inverter for each input combination [12]. One significant advantage of our approach is that it uses general device models, or general device i-v characteristics, instead of building on a simplified lower-dimensional device model. This makes the approach more applicable for circuit designers who can only use established timing models and typical device model cards in their analyses.
Section II summarizes the short-circuit current for static CMOS inverters. In Section III, we briefly discuss previous work in the field. In Section IV, the proposed methodology is outlined in detail. The results and findings are presented and discussed in Section V. In Section VI, we conclude with final remarks and outline some future directions.
II. Short Circuit Current
In a static CMOS logic gate, the short-circuit current $I_{SC}$ is observed when both the NMOS and PMOS devices form a DC path between the power rails. The power associated with this current is referred to as short-circuit power. Since this power is delivered by the voltage supply ($V_{DD}$), the total short-circuit power (total energy per transition) can be written in terms of $V_{DD}$ and the integral of $I_{SC}(t)$ over the switching period $T$.
In power analysis, the short-circuit power $P_{SC}$ must be added to $P_D$, the power required to charge and discharge the load capacitance. $P_D$ and $P_{SC}$ are both considered dynamic, whereas leakage is considered a static phenomenon.
In Fig. 1, a simple inverter is driven by a rising ramp input, resulting in a short-circuit current $I_{SC}$ through the PMOS device. Assuming the input signal begins to rise at the origin, the time interval for short-circuit current starts at $t_0$, when the NMOS device turns on, and ends at $t_1$, when the PMOS device shuts off. During this time interval, the PMOS device moves from linear-region operation to saturation (unless an exceptionally fast input causes the device to shut off before the output voltage starts falling). Based on the ramp input signal with a rise time $T_R$, $t_0$ and $t_1$ can be expressed as:
The average short-circuit power can be specified as the integral of the short-circuit current between $t_0$ and $t_1$:
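For reference, one set of expressions consistent with the description above is sketched here. It assumes an input ramp that rises linearly from 0 to $V_{DD}$ over $T_R$, and it introduces $V_{tn}$ and $V_{tp}$ for the NMOS and PMOS threshold voltages and $E_{SC}$ for the energy per transition. These symbols, and the normalization of the average power by the switching period $T$, are assumptions rather than notation taken from the text.

```latex
\begin{align*}
t_0 &= \frac{V_{tn}}{V_{DD}}\, T_R, &
t_1 &= \frac{V_{DD} - |V_{tp}|}{V_{DD}}\, T_R, \\
E_{SC} &= V_{DD} \int_{t_0}^{t_1} I_{SC}(t)\, dt, &
P_{SC} &= \frac{E_{SC}}{T}.
\end{align*}
```

The first row follows from the input crossing $V_{tn}$ (NMOS turns on) and $V_{DD} - |V_{tp}|$ (PMOS turns off). The second row restricts the supply-energy integral to the interval in which $I_{SC}$ is nonzero.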
III. Previous Approaches
One of the earliest approaches to modeling short-circuit power is presented in [3]. Using the inverter model with a basic device model, a fairly simple formula for $P_{SC}$ is given. This formula assumes that the PMOS device operates in saturation during the short-circuit current interval. This simple model does not account for load capacitance and is very inaccurate for short-channel devices. Vemuri proposed a more accurate model [4], which uses the alpha-power law model. In this model, the output waveform is solved explicitly by neglecting the short-circuit current. The output waveform is then used to evaluate $I_{SC}$ with the alpha-power law model, and several approximations for the integration for $P_{SC}$ are presented. In [8], a similar model is derived for the pi (CRC) load model. This work makes a triangular approximation of $I_{SC}$ to calculate the power. Most recently, [5] proposed updated formulas for the short-circuit power model of an inverter with alpha-power law device models.
In previous approaches, simplified device models are used extensively to develop closed-form formulas or to derive analytical relationships between device models and the short-circuit power. They are also used to compare the models with actual circuit simulation results. While these works are important for early estimation and trend analysis for CMOS technologies, they are not effective during the actual design verification process. The primary reason is that present-day device models tend to be much more complex than the ones assumed. Another difficulty is the need for a conversion or extraction tool that produces an equivalent alpha-power model from the actual device models, which may well be opaque to the user (like BSIM). A further concern with existing power models is the treatment of complex interconnect RC(L) loads, which critically influence deep-submicron delays.
IV. Proposed Methodology

A. Signal Abstractions
In digital design, electrical signals are often approximated by piecewise linear waveforms for visualization and timing related computations. This simplification brings several advantages in representing the circuit performance by a few simple temporal variables, such as delay and slew.
Despite its popularity, piecewise linear waveform modeling can introduce severe inaccuracies in short-circuit power evaluation. Fig. 2 shows the input-output behavior of an inverter driving a pi-load. The data is taken for the 0.5 micron technology node, and the input signal is assumed to be a saturated ramp. Fig. 2 also shows an enlarged trace of the short-circuit current ($1000\,I_{sc}$) from supply to ground through the PMOS device. All waveforms in Fig. 2 are obtained by circuit simulation (SPICE) using level 3 device models. The figure also shows the piecewise linear abstraction of the output waveform as a falling transition matching the 50% point of the output. The output slope is calculated using the 20% and 80% points of the full logic value. When the piecewise linear output waveform is used to calculate the short-circuit current, the $v_{ds}$ of the PMOS device is approximated as zero for a brief time interval, causing zero short-circuit current. By enforcing the piecewise linear input and output waveforms, another short-circuit current waveform is generated via circuit simulation. As shown by the trace labeled "$1000\,I_{sc}$ of pwl i/o" in Fig. 2, the piecewise linear output waveform approximation clearly underestimates the area beneath the $I_{ds}$ curve, and therefore the total short-circuit power and energy. Note, however, that the inaccurate piecewise linear model predicts the maximum short-circuit current very closely. Note also that the base of the short-circuit current waveform is closely related to the threshold voltages of the devices and depends mostly on the input signal.
These observations motivate us to search for the maximum value of the short-circuit current using the abstractions of the input and output signal waveforms. For simplicity, we describe the methodology for static CMOS inverters whose input and output signal waveforms are characterized as piecewise linear functions. Generalization to other gates is possible by transforming the CMOS gate into an equivalent inverter that preserves the delay and supply current [12]. Since most pre-characterization for static timing analysis is performed for each timing arc on a single input-output pair, equivalent inverter macro-modeling appears to be practically feasible. We believe that an inverter model for a general gate will yield sufficiently valid results for power analysis. Similarly, other generalizations of waveform models are also applicable within the framework discussed below.
B. Assumptions on Device Modeling
In our methodology, the NMOS and PMOS device models are assumed to be quite general in order to cover existing device technologies. In general, the behavior of a MOSFET device is represented by a set of nonlinear relations corresponding to multiple operating regimes. Since the drain-current source is the primary output of most existing models, we will use the following model for the MOSFET device:
Equation (5) can be made as complex as necessary to cover all three distinct operation modes (cut-off, linear region, and saturation) with great accuracy. The vector $p_{dev}$ includes the device-related and environmental parameters that affect the device operation. Examples of such parameters are temperature, oxide thickness, and the channel length reduction factor. Hence, the $p_{dev}$ vector can be considered as the device model parameters entered in a typical SPICE card.
The other mathematical relation concerns the operating regime of the MOSFET device. Generally, the device is assumed to be active in both the linear and saturation regions. The operating regime also depends on the device model characteristics and terminal voltages. However, the relation between the operating regime and device model parameters is often simpler than (5).
where $p_v$ and $\alpha$ are elements of $p_{dev}$. Throughout this paper, we make use of the fact that the function $g$ is relatively simple. We assume that the timing models for all different gates are available in the form (1) and that the equivalent inverter representations are known. Furthermore, we assume that efficient implementations of the device model functions $f$ and $g$ are given. One of the most practical ways to implement $f$ is the use of multi-dimensional tables, as done in many timing simulation tools. The device current for a particular configuration can be interpolated from the sampled data points stored in a table. Such tables may be constructed for different device sizes, or proper scaling formulas can be used.
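A table-based device model of this kind can be sketched as follows. The sketch is illustrative only: the class name `CurrentTable`, the treatment of all voltages as magnitudes, and the two boundary functions are assumptions introduced here. The boundary functions use the textbook saturation conditions of the Shichman-Hodges and alpha-power law models, with `p_v` and `alpha` playing the roles of $p_v$ and $\alpha$; whether they match the exact expressions used for $g$ in this work is not confirmed.

```python
from bisect import bisect_right


class CurrentTable:
    """Drain current sampled on a (v_gs, v_ds) grid, bilinearly interpolated."""

    def __init__(self, vgs_axis, vds_axis, ids):
        self.vgs_axis = list(vgs_axis)   # strictly increasing |v_gs| samples
        self.vds_axis = list(vds_axis)   # strictly increasing |v_ds| samples
        self.ids = ids                   # ids[i][j] = current at (vgs_axis[i], vds_axis[j])

    @staticmethod
    def _locate(axis, x):
        """Return the lower cell index and the fractional position inside it."""
        x = min(max(x, axis[0]), axis[-1])              # clamp to the table range
        i = min(bisect_right(axis, x) - 1, len(axis) - 2)
        return i, (x - axis[i]) / (axis[i + 1] - axis[i])

    def __call__(self, vgs, vds):
        i, u = self._locate(self.vgs_axis, vgs)
        j, w = self._locate(self.vds_axis, vds)
        low = self.ids[i][j] * (1.0 - w) + self.ids[i][j + 1] * w
        high = self.ids[i + 1][j] * (1.0 - w) + self.ids[i + 1][j + 1] * w
        return low * (1.0 - u) + high * u


def g_shichman_hodges(vgs, vds, vt):
    """Zero on the linear/saturation boundary |v_ds| = |v_gs| - |v_t|."""
    return vds - (vgs - vt)


def g_alpha_power(vgs, vds, vt, p_v, alpha):
    """Zero where |v_ds| equals the alpha-power law saturation voltage."""
    overdrive = max(vgs - vt, 0.0)
    return vds - p_v * overdrive ** (alpha / 2.0)
```

Each current evaluation then costs two index searches and a few multiplications, independent of how complex the underlying device model is. Inputs outside the sampled range are clamped to the table edge in this sketch; a production implementation might extrapolate or reject them instead.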
For the inverter circuit shown in Fig. 1, the timing models approximate the output waveform by a saturated ramp function:
Using (9) 
To preserve simplicity, the $v_{sb}$ terms are dropped. In fact, for the simple inverter equivalent of the static gates, $v_{sb}$ is almost constant during the transitions. In the following subsections, we propose three different schemes to estimate the maximum short-circuit current using the parametrizations given in (10) and (11). We assume that the time interval for short-circuit current, that is, the support of the $I_{dsp}$ trace, can be found using the formula given in (3) with knowledge of $T_R$ and the threshold voltages.
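The idea of parametrizing the PMOS terminal voltages by time can be illustrated with a short sketch. The conventions below are assumptions made for illustration: the input rises linearly from 0 to $V_{DD}$ over $T_R$ starting at $t = 0$, $D_F$ is taken as the delay between the 50% crossings of the input and the output, and $T_F$ is treated as the full-swing duration of the falling output ramp. All function names are hypothetical.

```python
def clamp01(x):
    """Limit x to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def v_in(t, vdd, t_r):
    """Rising saturated ramp: 0 at t = 0, vdd at t = t_r."""
    return vdd * clamp01(t / t_r)


def v_out(t, vdd, t_r, t_f, d_f):
    """Falling saturated ramp whose 50% point trails the input 50% point by d_f."""
    t_half = 0.5 * t_r + d_f          # time of the output 50% crossing
    t_start = t_half - 0.5 * t_f      # time at which the output starts to fall
    return vdd * (1.0 - clamp01((t - t_start) / t_f))


def pmos_bias(t, vdd, t_r, t_f, d_f):
    """Return (|v_gs|, |v_ds|) of the PMOS pull-up, whose source sits at vdd."""
    vgs_mag = vdd - v_in(t, vdd, t_r)
    vds_mag = vdd - v_out(t, vdd, t_r, t_f, d_f)
    return vgs_mag, vds_mag
```

Under these conventions the output is still at $V_{DD}$ early in the transition, so $|v_{ds}|$ is exactly zero there. This reproduces the brief zero-current interval that the piecewise linear abstraction introduces.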
C. Saturation-Based Short-Circuit Power Modeling
For the inverter equivalent excited by a rising ramp input, the PMOS device operates in the linear and saturation regions within the short-circuit interval. The first model we propose relies on the assumption that the maximum short-circuit current occurs at the boundary between these two operating regions. The boundary conditions can be found by solving the previously described functional $g_{pmos}$. In this scheme, we solve for the timepoint of the saturation region boundary, $t_{sat}$, using the approximate functional:
$$g_{pmos}^{pwl}(t_{sat}, T_R, T_F, D_F, p_{dev}) = 0 \tag{12}$$
For many device models, (12) takes the simple form of a single nonlinear equation. Most industrial device models implement variations of (6) or (7) with minor modifications. Since the terminal voltages are now parametrized by the time variable, (12) reduces to a single-variable nonlinear algebraic equation and can be solved by Newton methods. The solution $t_{sat}$ is then used to predict the terminal voltages of the PMOS device and the maximum short-circuit current as:
The maximum short-circuit current is then used to predict the total short-circuit power (energy) for the signal transition by a triangular approximation of the $I_{ds}$ trace:
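The saturation-based scheme can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: bisection replaces the Newton iteration mentioned in the text, the triangle is assumed to span the full interval from $t_0$ to $t_1$ with its apex at the estimated maximum current, and the square-law device and waveform numbers in the example are hypothetical values in arbitrary units.

```python
def solve_t_sat(g_of_t, t0, t1, iterations=60):
    """Bisection for g(t) = 0 on [t0, t1]; g must change sign on the interval."""
    g_lo, g_hi = g_of_t(t0), g_of_t(t1)
    if g_lo * g_hi > 0.0:
        raise ValueError("no saturation boundary bracketed in [t0, t1]")
    lo, hi = t0, t1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        g_mid = g_of_t(mid)
        if g_lo * g_mid <= 0.0:       # root lies in [lo, mid]
            hi = mid
        else:                         # root lies in [mid, hi]
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


def triangular_energy(vdd, i_peak, t0, t1):
    """Supply energy for a triangular current pulse of height i_peak on [t0, t1]."""
    return 0.5 * vdd * i_peak * (t1 - t0)


# Hypothetical example (arbitrary units); all voltages are magnitudes.
VDD, VT, K, T_R = 2.0, 0.5, 1e-3, 1.0


def vgs(t):                           # PMOS |v_gs| for the rising input ramp
    return VDD * (1.0 - min(max(t / T_R, 0.0), 1.0))


def vds(t):                           # PMOS |v_ds| for an output falling from t = 0.3
    return VDD * min(max((t - 0.3) / 2.0, 0.0), 1.0)


def ids(v_gs, v_ds):                  # square-law device, stand-in for a table lookup
    ov = v_gs - VT
    if ov <= 0.0 or v_ds <= 0.0:
        return 0.0
    if v_ds < ov:
        return K * (ov * v_ds - 0.5 * v_ds * v_ds)
    return 0.5 * K * ov * ov


t0 = VT / VDD * T_R                   # NMOS turns on
t1 = (1.0 - VT / VDD) * T_R           # PMOS turns off
t_sat = solve_t_sat(lambda t: vds(t) - (vgs(t) - VT), t0, t1)
i_sat = ids(vgs(t_sat), vds(t_sat))
energy = triangular_energy(VDD, i_sat, t0, t1)
```

Bisection needs only evaluations of $g$, so it also works when the boundary condition comes from a table rather than an analytical expression. A Newton iteration would converge in fewer steps when the derivative is available.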
D. Maximum Linear Region Current Based Short-Circuit Power Modeling
The second model originates from a different assumption about the time of maximum short-circuit current. In this model, we assume that the PMOS device conducts the maximum current in linear-region operation. Therefore, we search for this particular timepoint ($t_{max}$) using device models detailing the linear-region operation. Following the parametrization in (10), this search can be done by two different approaches.
The first method is generally applicable to analytical device models. This method solves for the zero of the time-gradient of (10). Similar to the previous approach, the computed $t_{max}$ is used to evaluate the maximum PMOS device current and the total short-circuit energy by the following formulas:
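When the device current is only available through a table or a black-box model, the maximum can also be located without derivatives. The sketch below uses a golden-section search as one such option; it is an assumption chosen for illustration and not necessarily the search used by the authors. The current trace in the example is a hypothetical square-law linear-region expression in arbitrary units.

```python
import math


def golden_section_max(func, lo, hi, tol=1e-9):
    """Locate the maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc < fd:                   # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = func(d)
        else:                         # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = func(c)
    return 0.5 * (lo + hi)


# Hypothetical example (arbitrary units); all voltages are magnitudes.
VDD, VT, K = 2.0, 0.5, 1e-3


def linear_region_current(t):
    v_gs = VDD * (1.0 - t)            # |v_gs| for an input ramp with unit rise time
    v_ds = max(t - 0.3, 0.0)          # |v_ds| once the output starts to fall
    return K * ((v_gs - VT) * v_ds - 0.5 * v_ds * v_ds)


# Search between the start of conduction and the saturation boundary.
t_max = golden_section_max(linear_region_current, 0.25, 0.6)
i_max = linear_region_current(t_max)
```

Each iteration reuses one of the two previous evaluations, so the number of device evaluations grows only logarithmically with the requested time resolution. The peak current can then be converted to an energy, for instance with the same triangular approximation as in the saturation-based model, if that assumption is carried over.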
E. The Quad Model
The third approach can be described as a combination of the previous methods. After evaluating two different estimates for the maximum short-circuit current (at $t_{max}$ and $t_{sat}$), one can construct a quadrilateral $I_{ds}$ trace through these datapoints, ending at $(t_1, 0)$. Then, the total short-circuit energy for the rising input transition is determined as:
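A minimal sketch of the quadrilateral computation is given below. It assumes that the trace starts at $(t_0, 0)$, passes through the two estimated points, ends at $(t_1, 0)$, and is connected in time order; the function name and argument names are hypothetical.

```python
def quad_model_energy(vdd, t0, t1, t_max, i_max, t_sat, i_sat):
    """Supply energy for a piecewise-linear current trace through four points."""
    points = sorted([(t0, 0.0), (t_max, i_max), (t_sat, i_sat), (t1, 0.0)])
    area = 0.0
    for (t_a, i_a), (t_b, i_b) in zip(points, points[1:]):
        area += 0.5 * (i_a + i_b) * (t_b - t_a)   # trapezoid between neighbors
    return vdd * area
```

When the two estimates coincide, the quadrilateral collapses to the triangle used by the individual models, so the three schemes agree in that limit.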
Typical current waveforms for the saturation-based model, the max-based model, and the quad model are illustrated in Fig. 3.
V. Results
To verify the accuracy and efficiency of the proposed models, we performed experiments with different device models and circuit configurations. For the reported experiments, we followed the equivalent circuit structure shown in Fig. 1. The load network is realized as a pi (CRC) load model. To benchmark a wide range of circuit and device models, we selected various sizes of MOSFET devices and different input signals and load networks. The device models used in this section are selected from the 0.25 micron technology node.
Note that the piecewise linear signal waveforms are constructed (emulated) from actual simulation results, matching the output waveform at its 20% and 80% points. This is done intentionally to avoid the impact of errors in the timing models. In practice, this is not feasible, and timing models need to be evaluated for a given input signal and load network parameters. Although the timing models are characterized for capacitive loads, they work well for general RC interconnect loads when used with effective capacitance methods [2]. This allows our model to handle arbitrary loads instead of pure capacitances only, just as in complex delay computations. For all the reported experiments, detailed circuit simulation (SPICE) is used to obtain the exact results for the output waveform, the maximum short-circuit current, and the total short-circuit power. The device models are also used to construct the device model tables for use in the proposed models.
A. Alpha-Power Law Model
In this experiment, we tested the inverter equivalent circuit with alpha-power law device models. The proposed models are implemented in a program that uses interpolation-based tables with linear scaling to evaluate the device current. A 30x30 table is constructed for various $v_{ds}$, $v_{gs}$ pairs. The indicator function $g$ for the operating regime is taken as equation (7). By varying the driver sizes, input transition time, and total load capacitance, we obtained 72 different circuit configurations. Fig. 4a-b shows the results obtained with the proposed models. Fig. 4a shows that the $t_{sat}$ and $t_{max}$ based models overestimate the total short-circuit power but display high correlation. Based on the samples, the correlations between $P_{sc}^{spice}$ and $P_{sc}^{sat}$ and between $P_{sc}^{spice}$ and $P_{sc}^{max}$ are both found to be larger than 0.99. Fig. 4b shows more accurate results for the maximum short-circuit current prediction with the proposed models. The quad model results are very similar to those obtained from $P_{sc}^{max}$. From a careful investigation, we find that the support of the $I_{sc}$ curve actually overestimates the true interval for the short-circuit current. This is partly due to the fact that the models ignore Miller coupling.
B. Level 3 Device Models
The second experiment is similar to the previous one. The major differences are the use of Level 3 device models and a larger number of circuit configurations. The $t_{sat}$ based model uses equation (6). One of the reasons for the lower accuracy is the use of the simple Shichman-Hodges model instead of the original device-level model. Note that the correlation between the total short-circuit power estimates and the actual SPICE results is around 0.85, which is still significant. The correlation profile for the maximum short-circuit current is similar.
C. BSIM Device Model
In this experiment, we repeat the analysis with BSIM3v2 device models for the 0.25 micron technology. Similar to the second experiment, we evaluate the models for 225 different circuit configurations. From the BSIM3v2 models, we generate a 50x50 $I_{ds}$ table for each PMOS size to use in the proposed models. The $t_{max}$ based model uses a bisection search to find the maximum current with few device evaluations. The Shichman-Hodges model is used to determine the operating regime in the $t_{sat}$ based model. The results are depicted in Fig. 6a-c.
From the results, we see that the total short-circuit power estimates of the proposed models agree very well with the actual SPICE results. The correlation between each of the three estimates and the actual power is larger than 0.99. Fig. 6b shows how the models vary with respect to the stage delay. We clearly see that short-circuit power decreases as the stage delay increases. This can be a result of larger capacitance and weaker devices. More importantly, the proposed models are relatively more accurate for larger values of power and smaller delays, which suggests potential benefits for future technologies.
In this section, the results obtained for the total short-circuit power and the maximum short-circuit current were discussed. $I_{scmax}$ is also critical for circuit performance, since it directly relates to the maximum instantaneous power dissipation. Applying the proposed methodology to falling input transitions is straightforward. Further extensions to generic logic gates or series-connected MOSFET structures are possible using an inverter equivalent circuit similar to [5, 12].
VI. Conclusions
In this paper, we have proposed a new practical methodology for evaluating short-circuit power dissipation using analysis results from timing models. The approach is fairly accurate and offers a practical way to get around incomplete performance characterizations. Because standard delay models use linear waveforms, the computation of short-circuit power is significantly more efficient. More importantly, our methodology does not depend on the specific device models used and can be applied with present-day device models. The model shows good accuracy when compared with SPICE and correlates well with the values predicted by simulations. A possible extension is to take subthreshold leakage currents into account.
