Digital circuits operating in the subthreshold region provide the minimum energy solution for applications with strict energy constraints. This paper examines the effect of sizing on energy for subthreshold circuits. We show that minimum sized devices are theoretically optimal for reducing energy. A fabricated 0.18µm test chip is used to compare normal sizing and sizing for minimum VDD. Measurements show that existing standard cell libraries offer a good solution for minimizing energy in subthreshold circuits.
Introduction
Emerging applications such as distributed sensor networks or medical devices treat low energy, rather than performance, as the primary concern. Minimum energy operation for low-performance situations occurs in the subthreshold region [1], [2]. Increasing leakage energy at low supply voltages offsets the reduced active energy and creates a minimum energy point. Many designs exhibit a minimum energy operating point above the minimum achievable VDD, and this operating point is a function of several parameters [1], [3]. In general, designs with larger leakage energy relative to active energy have a higher optimum VDD. This paper examines the effect of device sizing on minimum energy operation. After considering theoretically optimal sizing, we explore minimum energy operation for standard cell designs. A fabricated 0.18µm test chip provides measurements for the analysis.
Optimal Sizing for Minimum Energy
Sizing influences the energy consumption of a circuit in two primary ways. First, sizing directly affects energy consumption by changing the switched capacitance and the leakage current. Second, sizing affects the minimum voltage at which the circuit functions, which can change the absolute minimum energy point. This section explores these two effects.
A. Sizing for a given VDD
It has been proposed that theoretically optimal minimum energy circuits should use minimum sized devices [4], and first-order equations confirm this result for most cases. Equation (1) gives the subthreshold propagation delay of a characteristic inverter with switched capacitance Cg = αCL, where α is the activity factor and CL is the load capacitance, assuming average fanout.
K is a delay fitting parameter. The expression for current in the denominator of (1) models the on current of the characteristic inverter, so it accounts for transitions through both NMOS and PMOS devices. Thus, the terms I0,g and VT,g are fitted parameters that do not correspond exactly to the MOSFET parameters of the same name. For a single inverter, the dynamic (EDyn), leakage (EL), and total (ET) energy per cycle are expressed in (2)-(4), assuming rail-to-rail swing (VGS = VDD for the "on" current).
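The quantities referenced as (1)-(4) can be sketched in a first-order form that matches the description above. This is an illustrative reconstruction, not a copy of the original equations. It assumes the usual exponential subthreshold current with slope factor n and thermal voltage V_th (symbols not defined in this text), and it ignores DIBL and the drain-voltage dependence of the leakage current.

```latex
\begin{aligned}
t_d &= \frac{K\,C_g\,V_{DD}}{I_{0,g}\,e^{(V_{DD}-V_{T,g})/(n V_{th})}}\\[4pt]
E_{Dyn} &= C_g\,V_{DD}^{2}\\[4pt]
E_L &= I_{0,g}\,e^{-V_{T,g}/(n V_{th})}\,V_{DD}\,t_d
     = K\,C_g\,V_{DD}^{2}\,e^{-V_{DD}/(n V_{th})}\\[4pt]
E_T &= E_{Dyn} + E_L = C_g\,V_{DD}^{2}\left(1 + K\,e^{-V_{DD}/(n V_{th})}\right)
\end{aligned}
```

In this form, I_{0,g} and V_{T,g} cancel out of the leakage energy, and every term carries the factor C_g.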
Total energy per cycle is proportional to Cg, so minimum sized devices give the minimum Cg and minimize ET for a single inverter at a given VDD. Equations for the energy of arbitrary circuits can use the inverter analysis as a foundation. Assuming a critical path depth of LDP characteristic inverter delays gives an operating frequency f = 1/(td·LDP). We can likewise use the inverter capacitance and leakage current to define the total switched capacitance, Ceff, and the total leakage width, Weff. These changes give the expression for total energy per operation in an arbitrary circuit (5). This equation provides an estimate for a generic circuit, since Ceff and Weff approximate other parameters that can change.
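A small numerical sketch can make the minimum energy point concrete. It assumes the first-order form ET = Ceff·VDD² + Weff·Ioff·VDD·LDP·td, with td taken from the inverter delay model and Weff counted in characteristic-inverter units. Every parameter value below (K, I0, VT, Cg, the 1000-gate circuit, LDP = 50) is hypothetical and not taken from the test chip. The search is restricted to an assumed functional range of 0.1V to 1V, because this simplified model is not meaningful as VDD approaches zero.

```python
import math

# Hypothetical first-order parameters (illustrative only, not from the test chip).
N_VTH = 1.5 * 0.0259  # slope factor n = 1.5 times thermal voltage at ~300 K, volts
K = 3.0               # delay fitting parameter
I0 = 1e-6             # fitted I0,g of the characteristic inverter, amps
VT = 0.4              # fitted VT,g, volts
CG = 2e-15            # characteristic switched capacitance Cg = alpha * CL, farads


def inverter_delay(vdd):
    """Characteristic inverter delay t_d = K * Cg * VDD / I_on (subthreshold)."""
    i_on = I0 * math.exp((vdd - VT) / N_VTH)
    return K * CG * vdd / i_on


def total_energy(vdd, c_eff, w_eff, l_dp):
    """Energy per operation: Ceff*VDD^2 plus leakage over one cycle of L_DP delays.

    w_eff is the total leakage width in units of characteristic inverters.
    """
    e_dyn = c_eff * vdd ** 2
    i_off = I0 * math.exp(-VT / N_VTH)     # off current per characteristic inverter
    t_cycle = l_dp * inverter_delay(vdd)   # 1 / f = t_d * L_DP
    return e_dyn + w_eff * i_off * vdd * t_cycle


def min_energy_point(c_eff, w_eff, l_dp, v_lo=0.1, v_hi=1.0, steps=900):
    """Grid-search the optimum VDD over an assumed functional range [v_lo, v_hi]."""
    grid = [v_lo + (v_hi - v_lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda v: total_energy(v, c_eff, w_eff, l_dp))


# Hypothetical 1000-gate circuit with a 50-inverter critical path.
v_opt = min_energy_point(c_eff=1000 * CG, w_eff=1000, l_dp=50)
print(f"optimum VDD ~ {v_opt:.3f} V")
```

In this form I0 and VT cancel out of the leakage term, so the optimum depends mainly on the ratio Weff·LDP·K·Cg/Ceff. A larger leakage share pushes the optimum VDD higher, which agrees with the trend stated in the introduction.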
Assuming that the majority of gates in a typical design are sized similarly, a universal increase in transistor sizes will increase both Ceff and Weff, raising the energy per operation. This type of sizing change is unlikely to decrease the critical path delay, because the ratio of input to output capacitance of each gate stays roughly constant; the typical assumption of fixed capacitive loads is invalid here [9]. Thus, minimum sizing also minimizes energy per operation for most generic circuits. One special case that violates this trend is a circuit with a small number of critical paths relative to the total number of paths. In this case, increasing device sizes on the critical path can reduce LDP with negligible increases in Ceff and Weff, lowering ET. Minimum sized devices generally minimize energy consumption in subthreshold for a given VDD. However, sizing also affects the minimum operating voltage, which can change the total energy per operation, ET.
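The two sizing cases above can be compared with a toy energy model. This is a minimal sketch, assuming ET = Ceff·VDD² + Weff·Ioff·VDD·Top, where Top is the time per operation. The off current per unit width, the 2X uniform scale factor, the 2% capacitance growth, and the 20% critical-path speedup are all hypothetical numbers chosen only to show the direction of each effect.

```python
def energy_per_op(vdd, c_eff, leak_width, t_op, i_off_per_width=1e-9):
    """Switched energy plus leakage energy integrated over one operation.

    i_off_per_width is a hypothetical off current per unit leakage width (A).
    """
    return c_eff * vdd ** 2 + leak_width * i_off_per_width * vdd * t_op


# Hypothetical baseline at a leakage-heavy operating point (VDD = 0.3 V).
VDD, C_EFF, W_EFF, T_OP = 0.3, 1e-12, 1e3, 1e-6
base = energy_per_op(VDD, C_EFF, W_EFF, T_OP)

# Case 1: every device scaled by s. Ceff and Weff both scale by s, but gate
# input/output capacitance ratios are unchanged, so t_op is not reduced.
s = 2.0
uniform = energy_per_op(VDD, s * C_EFF, s * W_EFF, T_OP)

# Case 2: only the few critical-path gates are upsized. Ceff and Weff grow by a
# small fraction (2%, assumed) while the critical path gets 20% faster (assumed).
critical = energy_per_op(VDD, 1.02 * C_EFF, 1.02 * W_EFF, 0.8 * T_OP)

print(f"uniform upsizing: {uniform / base:.2f}x, critical-path upsizing: {critical / base:.2f}x")
# prints: uniform upsizing: 2.00x, critical-path upsizing: 0.86x
```

Uniform upsizing scales every term by s. Only the critical-path case can lower ET, because the shorter cycle time cuts leakage energy faster than the small capacitance increase adds switching energy.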
B. Sizing and minimum operating voltage
Transistor sizing also affects the functionality of CMOS circuits at low supply voltages. Minimum VDD operation occurs when the PMOS and NMOS devices have the same current (e.g., [6]). Previous efforts have explored well biasing to match the device currents for minimum voltage operation of ring oscillators [7]. Sizing can create the same symmetry in device current. Fig. 1 shows the minimum voltage at which a ring oscillator maintains a 10%-90% voltage swing. The optimum PMOS/NMOS width ratio across all process corners is 12. A similar analysis of minimum voltage operation while retaining 10% noise margins gives a lower minimum voltage at the typical corner and a higher worst-case minimum voltage, but the same optimum size ratio. Fig. 2(a) shows the VTCs at the minimum VDD of 70mV for several P/N ratios. The gain is somewhat degraded, but the optimum-sized curve is symmetrical and shows good noise margins. Fig. 2(b) shows the output of a 9-stage ring oscillator at the minimum voltage for the same sizes.
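The idea of sizing for current symmetry can be sketched as a width sweep. The sweep increases the PMOS width until the PMOS current at |VGS| = VDD/2 reaches the NMOS current, which is the condition for the inverter to switch at VM = VDD/2. This is a minimal sketch using a simple exponential subthreshold model that ignores DIBL and the VDS dependence. The device parameters (I0 per unit width, VT) are hypothetical, so the result does not reproduce the ratio of 12 from Fig. 1. In this simplified model the ratio is also independent of VDD, because both devices see the same |VGS|.

```python
import math

N_VTH = 1.5 * 0.0259  # assumed slope factor n = 1.5 times thermal voltage (V)


def subthreshold_current(width, vgs, i0_per_width, vt):
    """First-order subthreshold current; ignores DIBL and the VDS dependence."""
    return width * i0_per_width * math.exp((vgs - vt) / N_VTH)


def matching_pn_ratio(vdd, nmos, pmos, max_ratio=30.0, step=0.01):
    """Sweep the PMOS/NMOS width ratio until the PMOS current at |VGS| = VDD/2
    reaches the NMOS current, i.e. the inverter trips at VM = VDD/2."""
    i_n = subthreshold_current(1.0, vdd / 2, *nmos)
    for k in range(1, int(round(max_ratio / step)) + 1):
        ratio = k * step
        if subthreshold_current(ratio, vdd / 2, *pmos) >= i_n:
            return ratio
    return None  # no ratio within the sweep balances the currents


# Hypothetical (I0 per unit width in A, VT in V) pairs, mimicking a weaker PMOS.
NMOS = (1e-6, 0.45)
PMOS = (0.5e-6, 0.50)
r = matching_pn_ratio(0.2, NMOS, PMOS)
print(f"matching P/N width ratio ~ {r:.2f}")
```

A real sweep would use the foundry device models at each process corner instead of this closed-form current.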
Since symmetrical devices give minimum VDD operation, a simple comparison of currents in NMOS and PMOS devices shows the approximate optimum size for minimizing VDD. The switching threshold of a symmetric inverter is VM = VDD/2. Sweeping the width of the "on" NMOS and PMOS devices at VDD/2 … Sizing according to this ratio allows operation at a lower VDD but increases the energy consumed at a given VDD (equation (5)). The energy savings from lowering VDD are at best proportional to VDD² if leakage is still negligible. Fig. 1 shows that sizing an inverter lowers its minimum supply voltage by only 60mV, giving a best-case energy reduction from the lower voltage to 0.20²/0.26² ≈ 0.6X. This improvement is not worthwhile if all PMOS devices are increased in size by 12X, because the added PMOS capacitance raises the switched capacitance by far more than the roughly 1.7X (1/0.6) gained from the lower supply. Thus, minimum sized devices are theoretically optimal for reducing energy per operation when accounting for the impact of sizing on both voltage and energy consumed.
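The trade-off above can be checked with a short calculation. It multiplies the best-case VDD² gain from the 60mV reduction by the capacitance increase from widening every PMOS 12X. This is a sketch under stated assumptions. Gate capacitance is taken to scale with total NMOS plus PMOS width, and the baseline P/N width ratio of 1 is hypothetical. Any realistic baseline still gives a net energy increase.

```python
def voltage_scaling_gain(v_old, v_new):
    """Best-case dynamic energy ratio from lowering VDD: (v_new / v_old)^2."""
    return (v_new / v_old) ** 2


def upsizing_cost(w_n, w_p_old, w_p_new):
    """Switched-capacitance ratio, assuming gate cap scales with total N+P width."""
    return (w_n + w_p_new) / (w_n + w_p_old)


gain = voltage_scaling_gain(0.26, 0.20)  # 0.20^2 / 0.26^2
cost = upsizing_cost(1.0, 1.0, 12.0)     # hypothetical baseline P/N = 1, PMOS widened 12X
net = gain * cost
print(f"voltage gain {gain:.2f}x, capacitance cost {cost:.1f}x, net energy {net:.2f}x")
# prints: voltage gain 0.59x, capacitance cost 6.5x, net energy 3.85x
```

A net factor above 1 means the upsized design consumes more energy per operation, even after the supply reduction.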
Standard Cells and Minimum Energy
Standard cell libraries help digital circuit designers reduce the design time for complex circuits through synthesis. Most standard cell libraries focus on high performance, although including low-power cells is becoming more popular [9]. Lower power cells generally use smaller sizes. One standard cell library geared specifically toward low power uses branch-based static logic to reduce parasitic capacitance, together with a reduced set of standard cells. Eliminating complicated cells with large stacks of devices and using a smaller total number of logic functions was shown to reduce power and improve performance [10]. Standard cell libraries have not been designed specifically for subthreshold operation. This section evaluates the performance of a 0.18µm standard cell library in subthreshold operation.
We use an 8-bit, 8-tap, parallel, programmable FIR filter as a benchmark to compare normal cell selection with cells sized to minimize the operating voltage. … of static CMOS logic. Additionally, most of the cells operate at 300mV in the worst case, which is close to the optimum result shown in the previous section for a ring oscillator. The cells that exhibit the worst case (failing at 300mV) are flip-flops and complex logic gates with stacks of series devices (e.g., AOI). We eliminated the problematic cells by preventing the synthesis tool from selecting logic gates with large device stacks and by resizing the offending flip-flop cell.
Fig. 5 shows a schematic of the D flip-flop. In the standard implementation, all of the inverters use small NMOS and only slightly larger PMOS devices, except I3, which is several times larger to reduce C-Q delay. At the FS corner (fast NMOS, slow PMOS), the narrow PMOS in I6 cannot hold N3 at a one when CK is low. This is because the combined, strong off current of the NMOS devices in I6 and I3 (larger sized) overcomes the weakened, narrow PMOS device in I6. Tying back to the ring oscillator in Fig. 1, the combined NMOS devices create an effective P/N ratio that is less than one. To prevent this, we reduced the size of I3 and strengthened I6. The larger feedback inverter creates some energy overhead; however, the resized flip-flop can operate at 300mV at all process corners in simulation. Fig. 6 shows the lowest operating voltage for the cells in the minimum-VDD FIR filter. The number of cell types has been reduced, and all of the cells work at 300mV across all corners. The next section uses test chip measurements to compare the filter sized for minimum VDD with the normal filter. The first filter was synthesized using the unmodified synthesis flow and normal cells (Fig. 4). The second filter was synthesized using the modified flow, in which some cells were omitted and some cells were resized to minimize VDD (Fig. 6).
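The keeper contention in the flip-flop can be illustrated with a crude current-ratio proxy. The proxy divides the on current of the PMOS in I6 by the summed off current of the NMOS devices in I6 and I3. A ratio below 1 corresponds to the "effective P/N ratio less than one" described above. This is a hypothetical first-order sketch, not the authors' simulation. The FS-corner device parameters and all widths are assumed, and "strengthening I6" is modeled only as widening its PMOS.

```python
import math

N_VTH = 1.5 * 0.0259  # assumed n * thermal voltage (V)


def sub_vt_current(width, vgs, i0_per_width, vt):
    """First-order subthreshold current; ignores DIBL and VDS dependence."""
    return width * i0_per_width * math.exp((vgs - vt) / N_VTH)


def hold_ratio(vdd, w_p_i6, w_n_i6, w_n_i3, nmos, pmos):
    """Crude proxy for holding N3 high: pull-up current of the I6 PMOS (|VGS| = VDD)
    divided by the summed off current (VGS = 0) of the NMOS devices in I6 and I3."""
    pull_up = sub_vt_current(w_p_i6, vdd, *pmos)
    leak = sub_vt_current(w_n_i6 + w_n_i3, 0.0, *nmos)
    return pull_up / leak


# Hypothetical FS-corner devices: fast (low-VT) NMOS, slow (high-VT) weak PMOS.
NMOS_FS = (1e-6, 0.35)
PMOS_FS = (0.3e-6, 0.55)

# Hypothetical widths: large I3 NMOS before resizing; smaller I3, wider I6 PMOS after.
before = hold_ratio(0.3, w_p_i6=1.0, w_n_i6=1.0, w_n_i3=4.0, nmos=NMOS_FS, pmos=PMOS_FS)
after = hold_ratio(0.3, w_p_i6=3.0, w_n_i6=1.0, w_n_i3=1.0, nmos=NMOS_FS, pmos=PMOS_FS)
print(f"hold ratio before resizing: {before:.2f}, after: {after:.2f}")
```

The proxy only captures the direction of the fix. Whether a node actually flips depends on the trip point of the feedback loop, which needs circuit simulation.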
Measured Results from Test Chip
Both filters can operate using an external clock or an on-chip clock generated by a ring oscillator that matches the critical path delay of the respective filter. Input data comes from an off-chip source or from an on-chip linear-feedback shift register (LFSR). Fig. 7 shows the measured performance versus VDD for the two filters, using their respective critical path ring oscillators and the LFSR data to produce one pseudorandom input per cycle. The minimum-VDD filter exhibits a 10% delay penalty relative to the standard filter. Both filters operate in the range of 3kHz to 5MHz over VDD values of 150mV to 1V, and both are fully functional below 200mV. Fig. 8(a) shows an oscilloscope plot of the standard filter working correctly at VDD=150mV. The clock in this plot is produced by the on-chip ring oscillator. The reduced drive current and the large capacitance of the chip's output pads cause the slow rise and fall times in the clock, but the signal is still full swing. One bit of the output is shown. Fig. 8(b) shows an oscilloscope plot of the clock output for VDD=65mV. This clock signal is generated off-chip and fed through the on-chip clock tree. Although the output swing is degraded and the signal is noisy, it shows the robustness of CMOS logic to low supply voltages.
Fig. 9 shows the measured total energy per output sample of the two FIR filters versus VDD. The solid lines are extrapolations of Ceff·VDD² for each filter, and the dashed lines show the measured leakage energy per cycle. Both filters clearly exhibit an optimum supply voltage for minimizing the total energy per cycle. Within the granularity of the measurements, the optimum … cycle of 50% in the filter sized for minimum VDD. The figure also shows the worst-case minimum VDD for the two filters (cf. Fig. 4, Fig. 5). Accounting for overhead at the worst-case minimum VDD, the minimum-VDD FIR offers a reduction in total energy of less than 10% at the worst-case process corner, but this improvement comes at a cost of 50% at the typical corner. Simulations show that the measured overhead in the minimum-VDD filter results primarily from restricting the cell set available to the synthesis tool. Since the tool was not optimized for the smaller set of cells, we did not see the improvements that are possible through this approach [10]. Using only sizing to create the minimum-VDD filter would have decreased the overhead. However, the shallow nature of the optimum point in Fig. 9 shows that the unmodified standard cell library does not use much extra energy by failing at a higher VDD at the worst-case corner. Thus, existing libraries provide good solutions for subthreshold operation. Simulation shows that a minimum-sized implementation of the FIR filter has half the switched capacitance of the standard FIR, so a mostly minimum-sized library would theoretically provide minimum energy circuits.
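The shallowness of the energy optimum can be quantified with the same first-order model used for a generic circuit. This is a minimal sketch in normalized units. It assumes ET ∝ VDD²·(Ceff + A·e^(-VDD/(n·Vth))), with a hypothetical leakage coefficient A = Weff·LDP·K·Cg = 1000·Ceff. The numbers therefore illustrate the shape of the curve, not the measured Fig. 9 data.

```python
import math

N_VTH = 1.5 * 0.0259  # assumed n * thermal voltage (V)


def energy(vdd, c_eff=1.0, a_leak=1000.0):
    """Normalized first-order energy per operation:
    VDD^2 * (Ceff + A * exp(-VDD / (n*Vth))), with A = Weff*LDP*K*Cg (hypothetical)."""
    return vdd ** 2 * (c_eff + a_leak * math.exp(-vdd / N_VTH))


def optimum_vdd(v_lo=0.1, v_hi=1.0, steps=900):
    """Grid search over an assumed functional VDD range."""
    grid = [v_lo + (v_hi - v_lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=energy)


v_opt = optimum_vdd()
for dv in (-0.05, 0.05, 0.10):
    v = v_opt + dv
    print(f"VDD = {v:.3f} V: {energy(v) / energy(v_opt):.2f}x the minimum energy")
```

With these assumed values, running 50mV away from the optimum costs only on the order of 10-20%. This is the kind of shallow minimum that makes failing at a somewhat higher worst-case VDD inexpensive.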
Conclusion
This paper has examined device sizing for subthreshold operation. For typical circuits and modern technologies, the optimum supply voltage for minimizing energy is higher than the failure point of minimum sized devices at the typical corner. Thus, minimum sized devices are theoretically optimal for minimizing energy. Even if the minimum energy point for a certain process corner or unusual circuit occurs at a supply voltage where minimum sized devices cannot function, the shallow nature of the optimum means that up-sizing to lower the minimum operating voltage is not worthwhile. Measurements from a test chip, shown in Fig. 10, confirm that existing static CMOS standard cell libraries function well in subthreshold. Resizing or restricting cell usage in such libraries can lower the worst-case minimum VDD, but the overhead increases energy consumption at the typical corner. In theory, a standard cell library that primarily uses minimum-sized devices would minimize energy per operation.
