Abstract
Motivation
Modern IC production lines offer deep-submicron technologies. Ever-shrinking structures open up new possibilities for a higher degree of functional integration and for more complex applications in handier cases. For marketing, environmental, and reliability reasons, low power consumption is gaining importance. The operating time of portable, battery-powered applications is limited by their energy consumption. As the power consumption of complete chips increases, cooling problems arise that strongly influence packaging and its cost.
Power reduction of an application can be achieved by technology improvements, voltage scaling [1], and design decisions for low power [2]. Because of the high demands on lowering energy consumption, all of these possibilities must be exploited.
Within design for low power, the power consumption of a given design solution needs to be evaluated. Power estimation tools are therefore available at different levels of abstraction: circuit level, gate level, RT level, and, in current investigations, higher levels as well. Power calculation is most accurate at the lower levels. However, this accuracy is paid for with lower simulation performance. Which level of abstraction is best suited for calculating power depends on the given constraints (accuracy, calculation time, and the design information available).
Besides validating design solutions, tools at lower levels of abstraction are needed for characterizing higher-level modules. For RT-module characterization one can choose between circuit-level and gate-level tools. Even though this characterization has to be done only once for a module library, the use of SPICE-like tools is commonly not feasible because of the modules' high complexity. On the other hand, inaccuracies introduced during the characterization process decrease the simulation accuracy at higher levels.
Within this paper we compared the following gate- and circuit-level tools with each other by applying them to a set of ISCAS'85 and datapath-module benchmark circuits:
- HSPICE (Version 93a) from META-Software,
- PowerMill (Version 5.1) from EPIC,
- Glitch Power Simulator GliPS from OFFIS,
- Toggle Power Simulator TPS from OFFIS.
GliPS is a stand-alone event-driven simulator that puts high emphasis on accurately modelling signal waveforms, including incomplete transitions (defined as glitches, cf. fig. 1) [8]. TPS calculates power from simple toggle-count information, which is extracted from VERILOG-XL simulations [10]. TPS takes output loads and precharacterized gate-internal power losses into account. Within the VERILOG-XL simulation an inertial delay model and an SDF file (considering slope effects) are used. These four simulators cover a wide range of accuracy and simulator performance.
Within this paper a glitch is defined as a group of two or more colliding output waveforms which are so close together that the corresponding voltage waveform reaches neither V_SS nor V_DD in between (cf. fig. 1). The energy consumption of a glitch is usually less than that of the underlying complete transitions and hence must be calculated differently. The glitch model [8] used within GliPS determines the data that is essential for considering glitches within the energy calculation.
In the next section some parameters for gate level power modelling are introduced. In section 3 delay modelling alternatives, which influence accuracy of activity estimation, are dealt with. In section 4 the impact of these modelling parameters will be evaluated by analysing benchmark simulations. In the final section conclusions are drawn.
Power calculation of CMOS gates
The average power consumption of a single CMOS gate can be divided into three parts:
The power consumption due to leakage currents is much smaller than the other two (dynamic) components and hence is often neglected in power calculation. This holds except for very low supply voltages and, as a consequence, very low threshold voltages [2]. The impact of leakage currents is not discussed further in this paper.
During switching, a conducting path through the pull-up and pull-down networks of a gate is present, and as a consequence a short-circuit current flows. The third component is the capacitive component, which accounts for the charging of switched capacitors. The short-circuit charge Q_SC is often considered together with the charging of gate-internal capacitors (not including fanin and fanout capacitors) as internal charge. If the internal charge is used within characterization, the capacitive component contains only the fanout capacitor (including the fanin capacitors of consecutive gates).
We first discuss single complete output-transitions which are caused by single input-transitions. This model will afterwards be extended to consider glitches (i.e. partial transitions).
The short-circuit charge, or the internal charge consumption, depends on the voltages at gate-internal nodes, the input port causing the transition, the switching output port(s), possibly some stable logic input values, the input slopes, and the output loads.
Within TPS and GliPS, precharacterized internal charge consumption data is available for each cell. TPS uses a constant internal charge per gate transition, and as a consequence the effects listed above can only be considered on average. Within GliPS, linear interpolation is used to obtain the desired charge and delay values from precharacterized data, which is organized in lookup tables. All of the effects listed above can be modelled within these tables. However, the voltages at gate-internal nodes, which are not uniquely determined by the input and output logic assignments, are not considered. The characterization data is derived from circuit-level simulations of single gates.
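The difference between the two characterization styles can be illustrated with a minimal sketch. It assumes a hypothetical two-dimensional table indexed by input slope and output load; the axis points, charge values, units, and function names are invented for illustration and are not taken from GliPS or TPS, whose tables may cover further parameters.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Return index i with axis[i] <= x <= axis[i + 1] (x already clamped to the axis)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def interpolate_charge(slopes, loads, table, slope, load):
    """Bilinear interpolation in a precharacterized internal-charge table.

    table[i][j] holds the charge for slopes[i] and loads[j].
    Queries outside the characterized range are clamped to the table border.
    """
    slope = min(max(slope, slopes[0]), slopes[-1])
    load = min(max(load, loads[0]), loads[-1])
    i = _bracket(slopes, slope)
    j = _bracket(loads, load)
    u = (slope - slopes[i]) / (slopes[i + 1] - slopes[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    q00, q01 = table[i][j], table[i][j + 1]
    q10, q11 = table[i + 1][j], table[i + 1][j + 1]
    return (q00 * (1 - u) * (1 - v) + q01 * (1 - u) * v
            + q10 * u * (1 - v) + q11 * u * v)

def constant_charge(table):
    """Toggle-count style: a single average charge per gate transition."""
    values = [q for row in table for q in row]
    return sum(values) / len(values)

# Hypothetical characterization data (slopes in ns, loads in fF, charge in fC).
SLOPES = [0.1, 0.5, 1.0]
LOADS = [10.0, 50.0, 100.0]
TABLE = [
    [20.0, 24.0, 30.0],
    [26.0, 30.0, 36.0],
    [34.0, 38.0, 44.0],
]

q_table = interpolate_charge(SLOPES, LOADS, TABLE, slope=0.3, load=30.0)
q_const = constant_charge(TABLE)
```

With a constant value, every transition is charged the same amount regardless of slope and load, so the dependence on these parameters is only captured on average.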
If a partial transition occurs instead of a complete transition the capacitive energy consumption is:
This equation also holds for complete transitions with ΔV = V_DD. The capacitive power calculation is straightforward:
The sum can be obtained by logic simulation over a sufficiently long time interval [7] using a gate-level glitch model [8]. Equation 2 holds for all kinds of glitches. As an example, a dynamic glitch consisting of three ramps is illustrated in figure 2.
As a rough approximation, the internal charge that is consumed by a complete output transition is also scaled by ΔV_i / V_DD within GliPS.
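The capacitive energy and power expressions of this section can be written down directly. The following is a minimal sketch; the supply voltage, load capacitance, ramp swings, and observation interval are hypothetical values, and the finite interval only approximates the limit T → ∞.

```python
def glitch_cap_energy(v_dd, c_load, delta_v):
    """Capacitive energy of one (possibly partial) transition: 1/2 * V_DD * C_L * dV."""
    return 0.5 * v_dd * c_load * abs(delta_v)

def cap_power(v_dd, c_load, delta_vs, interval):
    """Average capacitive power over a finite observation interval T."""
    return 0.5 * v_dd * c_load * sum(abs(dv) for dv in delta_vs) / interval

def scaled_internal_charge(q_internal_full, v_dd, delta_v):
    """Rough approximation: full-transition internal charge scaled by dV / V_DD."""
    return q_internal_full * abs(delta_v) / v_dd

# Hypothetical dynamic glitch with three ramps: 0 V -> 3 V -> 1 V -> 5 V.
V_DD = 5.0            # volts (assumed)
C_LOAD = 100e-15      # farads (assumed)
RAMPS = [3.0, -2.0, 4.0]

e_glitch = sum(glitch_cap_energy(V_DD, C_LOAD, dv) for dv in RAMPS)
e_complete = glitch_cap_energy(V_DD, C_LOAD, V_DD)
p_cap = cap_power(V_DD, C_LOAD, RAMPS, interval=10e-9)
```

In this example the three ramps add up to a swing of 9 V, so the dynamic glitch consumes more capacitive energy than the single complete transition it finally results in, but less than three complete transitions would.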
$$P = P_{\text{leakage}} + P_{\text{short-circuit}} + P_{\text{Cap}}$$
$$E_{\text{GlitchCap}} = \tfrac{1}{2} \cdot V_{DD} \cdot Q_{\text{Cap}} = \tfrac{1}{2} \cdot V_{DD} \cdot C_L \cdot \Delta V = E_{\text{complete tr.}} \cdot \frac{\Delta V}{V_{DD}}$$
$$P_{\text{Cap}} = \frac{1}{2} \cdot V_{DD} \cdot C_L \cdot \lim_{T \to \infty} \frac{\sum_i \Delta V_i}{T}$$
Figure 2: Example for a dynamic glitch
Delay modelling of static CMOS
In the last section it was emphasised that the dynamic power consumption of CMOS gates is dominant. Hence activity calculation plays an important role in power simulation. During one computation cycle, circuit nodes may switch several times because of different path delays. This leads to unnecessary power consumption, which can be as high as 65% for arithmetic units [6]. To account for this unnecessary switching, the delay model is decisive. With a zero-delay model, only the necessary power is calculated. Within the unit-delay model, real delays are approximated very roughly, leading to path delays which may have nothing to do with reality.
When real-time delays are used, two conventional ways of delay modelling are available: the transport delay model and the inertial delay model. Within the transport delay model, all input events of a gate that cause a transition at the output port(s) are propagated to the consecutive gates. This leads to an overestimation of activity. Within the inertial delay model, output pulses that are shorter than a given threshold are filtered.
The switching data calculated with the transport or inertial delay model does not include information about partial transitions. Hence each transition is associated with a full V_DD swing when power is calculated according to equation 2. In the case of a partial transition, which does not start from V_DD or V_SS, the delay will be shorter than the one characterized for a full transition. That is, the pulse is narrower in reality than the pulse derived from the inertial or transport delay model (cf. fig. 3). This reduction of pulse width is important for possible glitch filtering in consecutive gates.
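The effect of the two conventional delay models on the toggle count can be sketched as follows. This is a deliberately simplified illustration that works on a list of output toggle times of a single gate; the function names, the threshold, and the event times are hypothetical, and real simulators apply the filtering inside their event queue.

```python
def transport_events(toggle_times):
    """Transport delay model: every output toggle is propagated."""
    return sorted(toggle_times)

def inertial_events(toggle_times, min_pulse_width):
    """Inertial delay model (simplified): pulses shorter than the threshold vanish.

    Two consecutive toggles closer together than min_pulse_width form a short
    pulse, so both of its edges are removed.
    """
    kept = []
    for t in sorted(toggle_times):
        if kept and t - kept[-1] < min_pulse_width:
            kept.pop()        # drop the leading edge of the short pulse
        else:
            kept.append(t)    # pulse (so far) is wide enough, keep the edge
    return kept

# Hypothetical output toggles of one gate (in ns) with a 0.2 ns wide pulse.
toggles = [1.0, 1.2, 3.0, 5.0]
n_transport = len(transport_events(toggles))
n_inertial = len(inertial_events(toggles, min_pulse_width=0.5))
```

Neither variant knows about partial transitions: each remaining toggle is still counted as a full swing, which is the limitation the enhanced glitch model addresses.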
These two phenomena are considered within the enhanced gate-level glitch model [8], on which GliPS is based. Existing enhanced glitch modelling algorithms have been compared in [9]. The main differences to conventional gate-level simulators are:
• a transition is modelled by linear ramps,
• ramps are derived from delay and slope information,
• ramps are scheduled into the event queue of the simulator when the ramp starts,
• glitches are represented by ramps crossing each other.
The ramp and power calculation is done dynamically during simulation. This allows GliPS to consider different input slopes at an instance's input pin. Input ramps are generated by the driving gate's output. A static calculation of the driving gate's output slope would require averaging over all possible input slopes of the driver for all possible input pins. As a consequence, the accuracy would decrease.
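How two crossing ramps yield a partial swing can be illustrated geometrically. The sketch below is an assumption-laden simplification and not the algorithm of [8]: V_SS is taken as 0 V, the rising ramp is a straight line starting at 0 V, and the falling ramp is treated as a straight line that would start at V_DD at its own start time. All names and numbers are hypothetical.

```python
def glitch_peak(v_dd, t_rise_start, rise_time, t_fall_start, fall_time):
    """Return (t_cross, v_peak) for a rising ramp followed by a falling ramp.

    rise_time and fall_time are the durations of a full 0 V <-> V_DD swing.
    If the ramps do not cross below V_DD, the rise completes and
    (None, v_dd) is returned, i.e. two complete transitions.
    """
    inv_r = 1.0 / rise_time
    inv_f = 1.0 / fall_time
    # Solve v_dd * (t - t_r) / rise_time == v_dd * (1 - (t - t_f) / fall_time) for t.
    t_cross = (1.0 + t_rise_start * inv_r + t_fall_start * inv_f) / (inv_r + inv_f)
    v_peak = v_dd * (t_cross - t_rise_start) * inv_r
    if v_peak >= v_dd:
        return None, v_dd
    return t_cross, v_peak

# Hypothetical example: 1 ns ramps, the falling ramp starts 0.5 ns after the rising one.
t_x, v_pk = glitch_peak(v_dd=5.0, t_rise_start=0.0, rise_time=1.0,
                        t_fall_start=0.5, fall_time=1.0)
```

The peak voltage obtained this way is the swing ΔV of both partial ramps, which can then be used in the capacitive energy calculation instead of a full V_DD swing.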
Practical results
The simulators HSPICE (Version 93a), PowerMill (Version 5.1), GliPS and TPS were evaluated for the benchmark circuits given in table 1. The first three benchmark circuits were generated from Synopsys' DesignWare and contain complex gates such as full adders. The ISCAS'85 benchmarks consist of basic gates (like AND, OR, NAND, NOR, EXOR) only. The designs are mapped onto Atmel ES2's 1.0 µm process, and layout-extracted data is available. Interconnects were modelled by single capacitors within HSPICE and PowerMill.
For TPS, delay calculation was done using the Cadence Delay Calculator (SDF enhanced wire delay model), which statically considers input-slope and output-load effects.
Within PowerMill the transistor characterizations were run in advance (not included in the performance data), and two alternatives were distinguished:
• accurate mode: the options set_sim_spd 0.2 and set_powr_acc 1 were applied,
• default mode: no user-defined options were used.
In table 2 the achieved accuracies of charge consumption and in table 3 the simulator performances are reported, with HSPICE as reference. It was not possible to simulate all patterns within one simulation run using HSPICE. As a consequence, the simulations were split into several runs.
Two sources of error in accuracy can be distinguished: errors in activity estimation and errors due to the power model.
In table 4 the activity accuracy of the default PowerMill mode, TPS and GliPS is compared to the accurate PowerMill mode results. For TPS a high portion of charge-estimation error results from activity errors.
The decrease in accuracy is dramatic especially for the circuits with high circuit depth (c6288, c3540, c1355). Hence the main source of error of TPS is the inaccuracy of its activity estimation. This observation is also documented in fig. 4 (the decrease of activity for circuit-depth positions above 100 is due to the circuit structure). The deeper a net is located within the circuit, the higher the activity estimation errors are. These results also indicate that the accuracy of local charge estimation for single gates decreases with circuit depth.
The accuracy data given above refer to simulations of a large set of random patterns. The error of charge consumption for a single change of input pattern is typically much higher, since the error may average out if a long pattern sequence is analyzed. In table 5 the maximum deviation of charge consumption is given. The maximum error of TPS for a single change of input pattern is above 100%. For the ISCAS'85 benchmarks the maximum error of GliPS is comparable to that of PowerMill (default mode). The more complex gates used for the DesignWare modules were modelled as black-box components within GliPS and TPS (even though GliPS is capable of modelling these gates more accurately). For this reason the maximum deviation of GliPS is higher than that of PowerMill for the DesignWare modules. Fig. 5 contains a plot of the number of patterns in a given error interval. Both the average deviation and the variation of the data are higher for TPS than for GliPS.
RT-level power models may contain a large number of parameters, which need to be characterized using lower-level simulators (i.e. gate- or transistor-level). Commonly only a subset of the complete set of input patterns can be used to characterize a specific RT-level power model parameter, which is more error-prone than the charge estimation for the whole pattern sequence.
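Why the error for a whole pattern sequence can be much smaller than the error for a single pattern change, or for a small subset of patterns, can be seen from a small numerical sketch. The charge values below are invented for illustration and are not taken from the tables of this paper.

```python
def sequence_error(estimate, reference):
    """Relative error of the total charge over the whole pattern sequence."""
    return (sum(estimate) - sum(reference)) / sum(reference)

def max_pattern_error(estimate, reference):
    """Largest relative error for a single change of input pattern."""
    return max(abs(e - r) / r for e, r in zip(estimate, reference))

# Hypothetical per-pattern charge (pC): reference simulator vs. a faster estimator.
reference = [10.0, 20.0, 30.0, 40.0]
estimate = [15.0, 15.0, 33.0, 37.0]

err_sequence = sequence_error(estimate, reference)          # over- and underestimates cancel
err_single = max_pattern_error(estimate, reference)         # worst single pattern
err_subset = sequence_error(estimate[:1], reference[:1])    # characterization on a subset
```

Here the over- and underestimates cancel over the full sequence, whereas a parameter characterized from only the first pattern inherits the full 50% deviation.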
Conclusions
In this paper we have compared different power estimation methodologies at different levels of abstraction. The main task is to find a good compromise between accuracy and simulation performance for the given constraints. A key observation is that activity estimation plays a major role. Simple toggle-count-based gate-level simulators (like TPS) deliver acceptable accuracy (in terms of power and activity) only for circuits with small logical depth. For circuits with moderate or large logical depth, the delay needs to be modelled more accurately. This is possible using the new gate-level power estimation tool GliPS, in which delays and power are carefully modelled. Its accuracy is comparable to that of transistor-level simulators, while it runs more than one order of magnitude faster. GliPS can also be used for advanced timing analysis.
