Abstract t
Introduction
The supply voltage for future gigascale integrated systems are projected to scale to 0.37V for the 35nm, 17GHz generation [1] to reduce electric field strengths and also power dissipation ( Fig. 1) , increases of which are projected to be driven by higher clock rates, higher overall capacitance and larger chip sizes. A key challenge in the design of bulk Si CMOS logic circuits will be to meet the projected performances given the competing requirements of high performance and low standby power at low voltages [1, 2, 3] in the presence of threshold voltage reductions due to short-channel effects and subthreshold swing increases due to the 2D electrostatic charge coupling between gate and source/drain terminals of the MOSFET. A methodology [4] simultaneously considering the device, circuit and system levels of the design hierarchy and distinguishing local and global clock rates, is employed to minimize total power dissipated from a static CMOS critical path gate during a clock cycle. This methodology assumes a realistic environment of chip size, logic gate count, clock frequency, wiring capacitance, critical path depth and range of operating temperature. This analysis uses physical and stochastic models, verified by HSPICE, MEDICI and actual microprocessor implementations to investigate opportunities to scale Vdd to the optimal point corresponding to the limits of CMOS power dissipation where leakage power balances switching power dissipation, and when device capacitance balances wiring capacitance.
Circuit and Device Models
The performance of a generic CMOS processor is modeled assuming a global critical path of 15 [6], 2-way NAND stages, each stage driving average wire lengths (Fig  4) . Average wire lengths, in units of gate pitches, are determined (Table 1 ) from stochastic interconnect distributions [7] , derived recursively using Rents rule, and verified for an actual microprocessor in Fig. 5 . In logicintensive CMOS chips, packing densities are interconnect limited [8] where the effective size of a gate is determined by its wireability [9] . The gate pitch is estimated from NTRS projections for microprocessor chip size, and logic transistor count after discounting the extrapolated increases in cache size and cache area for high performance processors (Fig 6) . Assuming equal interconnect crosssectional dimensions, and that neighboring wiring planes in a multi-level network provide an approximate ground plane, total capacitance per unit length, including fringing effects, is estimated using analytical models in [ 10] Device performance is modeled using compact low-voltage Transregional MOSFET models [11, i2] (Figs 7, 8, 9 ) that predict circuit performance in the sub-threshold, saturation and linear regions of operation providing continuous and smooth transitions across region boundaries. High fieldeffects on carrier mobility are incorporated by adopting the mobility reduction model in [13] . Smoothness and continuity of the drain current expressions in the triode, saturation and the subthreshold regions are obtained by requiring differentiability and continuity of the product of the effective mobility and the areal charge density of inversion layer carriers. Low field mobility dependence on temperature and doping concentration is estimated using empirical models reported in [14] . The doping profile for the RD structure is selected as one that yields the smallest depletion depth, corresponding to the least DIBL effects for a given Vto and gate oxide thickness [15] . Increases in leakage current due to DIBL (Drain Induced Barrier Lowering) effects are calculated using 2D subthreshold models [6] that accurately predict the threshold voltage roll-off and subthreshold swing increase (Fig 10) dependence on supply voltage, device geometries and doping profile. The 2-way NAND gate, as a basic circuit building block in the critical path, has a performance that parallels that of any other circuit actually used in processor critical paths in reflecting technology improvements [16] . The improved delay dependence on fan-in at short channel lengths [17] due to a smaller reduction in the saturation drain current with a rise in the source voltage of the topmost series-connected MOSFET is modeled physically by calculating the fractional reduction of the normalized saturation drain current for the series-connected struc. [18] .
Minimum Power CMOS Random Logic Networks
Power drain of a static CMOS gate is minimized by scaling the supply voltage while meeting the performance required by scaling the threshold voltage and increasing the channel widths until further decrease in threshold voltage, increases total power due to a dominating static component [3] (Figs 10, 11) and further increases in device size increase total power due to larger gate sizes [19] (Fig 12) . Optimal supply voltage (Fig 13) , device threshold voltage and gate sizes are calculated corresponding to a simultaneous solution at these minima (Table 2 ). For a given wiring load, the performance of a static CMOS gate increases asymptotically with increasing (W/L) ratios, with gate delays reaching past the knee of the asymptotic dependence of delay on channel width, for wiring capacitance less than or equal to 40% of the total load capacitance. This point corresponds to minimum power with respect to gate size where further increases in gate size increases power linearly while permitting only asymptotic reductions in supply voltage. Critical path gates clocked at high local frequencies are assumed to be only 5 stages long and drive wire lengths averaged within a macrocell of a Short-wire' cellular array architecture (Fig 14) . Assuming gates are sized so that wiring capacitance is 40% of the total load, the cell count (Table 3) is calculated using the stochastic interconnect distribution by imposing a maximum heat removal coefficient of 50 W/cA on the average wire length of the cell, calculated using the stochastic distribution. Total CMOS power increases exponentially (Fig 15) for a given generation, with increases in clock frequency due to an exponential rise in the supply voltage necessary to meet shrinking cycle times and the accompanying increases in leakage current due to threshold voltage reductions and subthreshold swing increases. The maximum heat removal coefficients of the package thus impose limits on CMOS performance.
Summary and Conclusions
The limits on CMOS energy dissipation shown to be imposed by static power and by wiring capacitance, are investigated using a methodology that conjointly employs physical short-channel MOSFET drain current and threshold voltage roll-off and subthreshold swing roll-up models in tandem with stochastic wiring distributions. Optimum supply voltages, device threshold voltages, and device channel widths corresponding to minimum total power are calculated out to year 2014 for local and global critical paths. These projections are consistent with technology and cycle time forecasts by the NTRS. Limits on the performance of CMOS logic circuits are shown to be imposed by total power dissipation which increases exponentially with clock frequency. Limits on the cycle time performance imposed by power dissipation are projected for the same period. Constraints imposed by NTRS projected package heat removal coefficients, permit local clock rates to apply only within a macrocell whose 8 size and total number are calculated using the stocahstic distribution. 
References

