ABSTRACT
INTRODUCTION
Accurate power estimation is a critical step in the analysis and design of CMOS circuits in nanometer process technologies. The difficulty stems mainly from two sources. The first is input pattern dependence, i.e., accurate power calculation requires knowledge of a "typical" or "expected" input stream. The second is variability in the shape of the input signal waveform. This variability is caused by variations in key physical and electrical characteristics of CMOS logic cells and interconnects and/or by different sources of noise, such as DC voltage drop on supply lines and crosstalk noise on signal lines. The first issue has been addressed in the past by various statistical or probabilistic power estimation methodologies [1] [2]. The latter has received little attention from the low power design community. To partially address this shortcoming, the present paper develops a method for calculating short circuit power under noisy signal waveforms.
This work is sponsored in part by the Semiconductor Research Corporation (research ID #1423).

Power consumption in CMOS VLSI circuits comprises three components: switching, short circuit (SC), and leakage.

The switching component is the power consumed to cause gate output transitions. It follows the well-known relation $P_{sw} = 0.5\, C\, V_{dd}^2 f \alpha$. Here $C$ is the switched load capacitance, $f$ is the clock frequency, and $\alpha$ is the expected number of output transitions per clock cycle. For a detailed treatment, the reader may refer to [3].

The next component is the SC (or rush-through) power dissipation. SC power is consumed by current flowing between the power rails (i.e., from the power supply to ground). This current passes through a direct path that is temporarily established during an output transition. The SC current at each time instant therefore depends on the operating regions of the transistors in the logic cell, which means that it depends on both the input and the output voltage values. A well-known equation for time-averaged SC power dissipation is [4]: $P_{SC} = \frac{1}{12}\, k\, \tau_{in} (V_{dd} - 2V_T)^3 f \alpha$. In this equation, $\tau_{in}$ is the input transition time, $V_T$ is the threshold voltage of the transistors, and $k$ is the effective transconductance parameter of the logic gate.

The leakage component is rising very fast relative to the switching component due to lower $V_T$ values and thinner gate oxides. It accounts for subthreshold conduction, gate oxide tunneling currents, and reverse-biased p-n junction currents. For a detailed treatment, the reader may refer to [5].

The focus of this paper is on SC energy dissipation.¹ For years, it has been stated and generally accepted that SC power can be kept small (say, less than 10% of the switching power) by following a few simple design guidelines. Examples are: do not overdrive a load, and do not allow the transition time (inverse of the slew rate) of the intermediate signals in a circuit to become too long. We show in this paper that SC energy dissipation can be comparable to other sources of energy dissipation even for a well-designed circuit in current CMOS designs (e.g., refer to Figures 4(a) and 4(b) in Section 4).
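The two closed-form expressions above can be evaluated directly: the switching relation $P_{sw} = 0.5\, C V_{dd}^2 f \alpha$ and the SC estimate of [4]. The sketch below is a minimal illustration, not the paper's method. Only $V_{dd} = 1.2$ V comes from the paper's 130nm library. The load capacitance, $k$, $V_T$, $f$ and $\alpha$ are assumed values. The SC formula was derived for a clean ramp input, so it sees only $\tau_{in}$ and nothing about the waveform shape.

```python
def switching_power(c_load: float, vdd: float, freq: float, activity: float) -> float:
    """P_sw = 0.5 * C * Vdd^2 * f * alpha, in watts."""
    return 0.5 * c_load * vdd ** 2 * freq * activity


def short_circuit_power(k: float, tau_in: float, vdd: float, vt: float,
                        freq: float, activity: float) -> float:
    """Ramp-input SC estimate P_SC = (1/12) * k * tau_in * (Vdd - 2*VT)^3 * f * alpha.

    Returns 0 when Vdd <= 2*VT, because a ramp input can then never turn
    both transistors on at the same time.
    """
    overdrive = vdd - 2.0 * vt
    if overdrive <= 0.0:
        return 0.0
    return k * tau_in * overdrive ** 3 * freq * activity / 12.0


if __name__ == "__main__":
    # Hypothetical parameters; only Vdd = 1.2 V is taken from the paper.
    vdd, vt, freq, activity = 1.2, 0.3, 1e9, 0.1
    p_sw = switching_power(c_load=10e-15, vdd=vdd, freq=freq, activity=activity)
    for tau in (100e-12, 200e-12, 400e-12):
        p_sc = short_circuit_power(k=1e-4, tau_in=tau, vdd=vdd, vt=vt,
                                   freq=freq, activity=activity)
        print(f"tau_in = {tau * 1e12:.0f} ps: P_SC / P_sw = {p_sc / p_sw:.3f}")
```

With these assumed numbers, the ratio $P_{SC}/P_{sw}$ is 0.025, 0.050 and 0.100 for 100, 200 and 400 ps. Doubling $\tau_{in}$ doubles the predicted SC power, which is why long transition times are discouraged by the usual guidelines.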
This is mostly due to the increasing effect of noise, primarily crosstalk noise, and its impact on the shape of the voltage signal waveforms inside the circuit. Increases in transistor packing density and clock frequency amplify the effect of capacitive crosstalk noise. Interconnect lines become thicker and narrower (and, for global interconnects, longer), which aggravates the crosstalk noise amplitude. This in turn distorts the voltage signal waveforms further and tends to increase the effective transition time of signals subjected to crosstalk noise.
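The link between wire geometry and noise amplitude can be seen with a first-order charge-sharing estimate. This model is not taken from the paper. It assumes a quiet, undriven victim line with coupling capacitance $C_c$ to the aggressor and capacitance $C_g$ to ground. Under those assumptions, a full-swing aggressor transition induces a peak glitch of about $V_{dd}\, C_c / (C_c + C_g)$. The function name and the capacitance values below are hypothetical.

```python
def peak_crosstalk_noise(vdd: float, c_couple: float, c_ground: float) -> float:
    """Charge-sharing estimate of peak crosstalk noise on a quiet victim line.

    Assumes the aggressor swings the full Vdd and the victim is left undriven
    (floating) during the event, so the result is a pessimistic upper bound;
    a driven victim recovers through its driver and sees a smaller glitch.
    """
    if c_couple < 0 or c_ground < 0 or c_couple + c_ground == 0:
        raise ValueError("capacitances must be non-negative and not both zero")
    return vdd * c_couple / (c_couple + c_ground)


if __name__ == "__main__":
    vdd = 1.2  # supply voltage of the paper's 130nm library
    # Hypothetical wire geometries: thicker, narrower lines move capacitance
    # from ground (Cg) to the neighbour (Cc) at roughly constant total.
    for c_c, c_g in ((20e-15, 60e-15), (30e-15, 50e-15), (40e-15, 40e-15)):
        ratio = c_c / (c_c + c_g)
        noise = peak_crosstalk_noise(vdd, c_c, c_g)
        print(f"Cc/(Cc+Cg) = {ratio:.3f} -> peak noise = {noise:.2f} V")
```

As the coupling ratio grows from 0.25 to 0.5, the estimated glitch grows from about 0.30 V to 0.60 V. A glitch of that size is large enough to reshape a transitioning signal, not just a quiet one.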
The remainder of this paper is organized as follows. Section 2 reviews previous SC power calculation techniques. Section 3 describes our current-based logic cell model for SC power calculation. Section 4 presents our experimental results. [...]

[...] [4], [6]-[10]. These [...]

Current-based modeling has proven to be a highly effective approach for delay calculation in STA tools [13]-[17]. Croix et al. in [13] [...] [17] to resolve the shortcomings of [13]-[16]. Interested readers may refer to the paper for details. We point out that none of the above-mentioned current-based approaches (including [17]) [...]

¹ The operating frequency of the circuit, $f$, is assumed to be fixed during the analysis and optimization steps considered in this paper. Recalling the relation $P = E f$, we use "energy calculation" and "power calculation" interchangeably.
EXPERIMENTAL RESULTS
To show the effectiveness of CSPC, the model was compared with Hspice simulations [18]. Figure 3 compares CSPC results with Hspice for several crosstalk-induced noisy waveforms. These waveforms are applied to a minimum-sized inverter in our 130nm cell library with a Vdd of 1.2V. As seen, the output waveforms generated by CSPC closely match those generated by Hspice.

Figure 4 shows a further comparison with Hspice for crosstalk-induced noisy waveforms applied to a minimum-sized inverter with an FO4 load in the same library. Figure 4(a) is for the case where only one aggressor injects the noise. The transition time at the input node of the aggressor and victim lines is set to 300ps. The figure depicts the input voltage, output voltage, and SC current waveforms obtained by CSPC and by Hspice. The CSPC-generated waveforms closely match the corresponding ones generated by Hspice.

Figure 4(b) shows another example with an identical experimental setup, except that there are two aggressor lines. This example indicates that the accuracy of CSPC does not degrade even when the input voltage waveform is more heavily distorted. The SC energy dissipation for Figure 4(a) is 2.68fJ (2.78fJ) by Hspice (CSPC). For Figure 4(b) it is 15.65fJ (15.74fJ). This is more than a 5X rise in SC energy dissipation when the number of aggressors increases from one to two. More aggressor lines lengthen the time interval during which both the NMOS and PMOS transistors conduct, and this in turn significantly raises the SC energy consumption.

Figure 4(c) illustrates the results for a minimum-size, FO4-loaded NAND3 in which crosstalk noise is injected into one input through three aggressors. The other two inputs are held at a steady, non-controlling (high) logic value. The transition times at the inputs of the aggressor line drivers and of the NAND victim input line are set to 300ps. The SC energy dissipation for this case is 27.71fJ (28.01fJ) by Hspice (CSPC), i.e., the error of CSPC is less than 1.1%.

To compare CSPC with conventional techniques, we implemented the technique of Dartu et al. [11]. In that technique, input signals are approximated by smooth saturated ramp waveforms to be compatible with pre-characterized lookup tables. Figure 4(b) illustrates the SC waveform for one such ramp approximation. The corresponding SC energy dissipation is calculated as 7.1fJ. This is less than half of the actual SC energy dissipation under the noisy waveform (i.e., only 45.9% of the 15.45fJ reported by Hspice). The shape of the waveform must therefore not be ignored during SC power calculation.

² For cells realizing more than one logic function (such as an AND cell, which is simply a NAND cell followed by an INV cell), the characterization process should be repeated for each logic function.
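The explanation for the 5X rise and for the ramp-approximation error rests on how long the input stays in the window where neither transistor is cut off, $V_{Tn} < v_{in} < V_{dd} - |V_{Tp}|$. The sketch below is a minimal, input-only proxy for that interval and is not the CSPC model. It ignores the output voltage, which the paper notes also sets the SC current. The threshold values and both waveforms are hypothetical. The waveforms are piecewise-linear, so the time spent in the window can be computed exactly.

```python
from typing import List, Tuple

# Piecewise-linear waveform: (time in ps, voltage in V) breakpoints, time increasing.
Waveform = List[Tuple[float, float]]


def both_on_duration(wave: Waveform, vtn: float, vtp_abs: float, vdd: float) -> float:
    """Total time (ps) with VTn < v_in < Vdd - |VTp|, i.e. neither device cut off.

    Input-only proxy: it ignores the output voltage, which also sets the
    actual short-circuit current, so it indicates a trend rather than energy.
    """
    lo, hi = vtn, vdd - vtp_abs
    if hi <= lo:
        return 0.0  # empty window: both devices can never be on together
    total = 0.0
    for (t0, v0), (t1, v1) in zip(wave, wave[1:]):
        if v1 == v0:
            # Flat segment: either entirely inside the window or entirely outside.
            frac = 1.0 if lo < v0 < hi else 0.0
        else:
            # Segment parameter s in [0, 1] at which each threshold is crossed.
            s_lo = (lo - v0) / (v1 - v0)
            s_hi = (hi - v0) / (v1 - v0)
            start = max(0.0, min(s_lo, s_hi))
            end = min(1.0, max(s_lo, s_hi))
            frac = max(0.0, end - start)
        total += frac * (t1 - t0)
    return total


if __name__ == "__main__":
    vdd, vtn, vtp_abs = 1.2, 0.3, 0.3  # Vdd from the paper; thresholds hypothetical
    # Saturated ramp with a 300 ps transition, as a lookup-table flow would assume.
    ramp = [(0.0, 0.0), (300.0, 1.2), (600.0, 1.2)]
    # Hypothetical crosstalk-distorted rise: an opposing aggressor pulls the
    # input back down mid-transition before it finally reaches Vdd.
    noisy = [(0.0, 0.0), (150.0, 0.6), (250.0, 0.45), (400.0, 1.2), (600.0, 1.2)]
    print(f"ramp : {both_on_duration(ramp, vtn, vtp_abs, vdd):.1f} ps")
    print(f"noisy: {both_on_duration(noisy, vtn, vtp_abs, vdd):.1f} ps")
```

Here the clean ramp spends 150 ps in the window, while the distorted waveform spends 265 ps. A ramp fitted for lookup-table compatibility would hide exactly this extra interval.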
To investigate the accuracy of CSPC for complex logic cells, a 10x AOI22 (And-Or-Invert) cell was studied, where x denotes the minimum size of an AOI22. The cell was FO4-loaded. One of its input nodes was subjected to crosstalk noise through an 80fF coupling capacitance, and the other inputs were set to their non-controlling values. For complex gates, we used the same characterization process as for an inverter. The corresponding aggressor and victim lines were driven by 10x inverters. The arrival time of the signal transition at the input of the victim line driver was set to 10ps. The arrival time at the aggressor line driver (i.e., the noise injection time) was swept from 100ps to 250ps in 1ps steps. Figure 5 depicts [...]

An automated test was performed to validate CSPC against Hspice for different logic cell types. It used an experimental setup similar to that of the previous AOI22 experiment. For each logic cell, 150 noisy input waveforms were applied by sweeping the noise injection time. For each noisy input, the transient analysis period and step size were set to 4ns and 3.3ps, respectively. Table 1 summarizes the average and maximum [...]

A 10x AOI22 was also considered under an experimental setup similar to the one in Figure 5. This time, however, the cell input under crosstalk attack was kept quiet (i.e., it did not switch). In addition, the arrival time of the aggressor line was held constant, while its transition time was swept from 200ps to 400ps in 1ps steps. Figure 7 [...]
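A validation sweep of this kind is commonly summarized by the average and maximum relative error of the model against the reference simulator. The sketch below shows that bookkeeping, assuming percent error with respect to the Hspice value. The helpers `relative_error_pct`, `error_summary` and `sweep_pairs` are hypothetical. So are the `reference` and `model` callables, which stand in for wrappers around Hspice and CSPC; the paper does not describe this code. The demonstration reuses only the three (Hspice, CSPC) SC energy pairs reported above for Figures 4(a) through 4(c).

```python
from typing import Callable, Iterable, List, Tuple


def relative_error_pct(reference: float, estimate: float) -> float:
    """Percent error of an estimate with respect to a nonzero reference value."""
    return 100.0 * abs(estimate - reference) / abs(reference)


def error_summary(pairs: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (average, maximum) percent error over (reference, estimate) pairs."""
    errors = [relative_error_pct(ref, est) for ref, est in pairs]
    if not errors:
        raise ValueError("need at least one (reference, estimate) pair")
    return sum(errors) / len(errors), max(errors)


def sweep_pairs(injection_times_ps: Iterable[int],
                reference: Callable[[int], float],
                model: Callable[[int], float]) -> List[Tuple[float, float]]:
    """Evaluate a reference simulator and a model at every noise-injection time.

    `reference` and `model` are hypothetical callables returning SC energy for
    one injection time (e.g. wrappers around a circuit simulator and the model).
    """
    return [(reference(t), model(t)) for t in injection_times_ps]


if __name__ == "__main__":
    # (Hspice, CSPC) SC energies in fJ reported for Figures 4(a), 4(b) and 4(c).
    reported = [(2.68, 2.78), (15.65, 15.74), (27.71, 28.01)]
    avg, worst = error_summary(reported)
    print(f"average error = {avg:.2f}%, maximum error = {worst:.2f}%")
```

For the three reported pairs this gives an average error of about 1.80% and a maximum of about 3.73%. For a full sweep, the injection times of the AOI22 experiment (100ps to 250ps in 1ps steps) would be passed as `range(100, 251)` to `sweep_pairs`, and the result fed to `error_summary`.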
