Sub-threshold operation has proven beneficial for energy-constrained systems, as it enables minimum energy consumption in logic circuits during active computation [1], and reduces leakage current in components that must be constantly powered. Previous sub-Vt research, for example a 0.13μm processor with an 8b ALU and 2Kb SRAM [2], has demonstrated substantial energy savings. However, process scaling presents a new challenge in the form of heightened intra-die variation. Efficient power conversion and delivery is another essential consideration for a micro-power system. This paper presents a 65nm sub-Vt SoC featuring a microcontroller core and custom 128Kb SRAM functional in sub-threshold, powered by a switched-capacitor DC-DC converter that delivers variable load voltages from 0.3V to 0.6V.
The sub-Vt microcontroller implements a 16b RISC architecture with unified instruction and data memory, supporting 27 instructions and 7 addressing modes based on the standard MSP430 instruction set. GPIO ports and a watchdog timer are included as peripherals. A JTAG interface enables controlling the CPU externally and allows programming of the SRAM at start-up. The system features three low-power modes in which clocks to the CPU and peripherals are shut off, as indicated in Fig. 16.7.1.
The logic is divided into two power domains; unused blocks are power-gated during standby using an on-chip PMOS sleep transistor, as shown in Fig. 16.7.1. During standby, the SRAM and key CPU states can be powered at 300mV for data retention while minimizing leakage. The oscilloscope plot of Fig. 16.7.6 illustrates the core logic emerging from standby after power has been restored to the gated components. Additionally, the memory interface provides a one-entry cache, where a 64b memory row is stored during a single memory read. Successive 16b accesses to the same row require no further memory activity.
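The behavior of the one-entry cache can be illustrated with a minimal sketch. It is not the implemented hardware: the class name RowCache, the word-addressed interface, and the read counter are hypothetical, and only reads are modeled since write handling is not described here.

```python
class RowCache:
    """Hypothetical model of a one-entry cache holding one 64b memory row."""

    WORDS_PER_ROW = 4  # one 64b row holds four 16b words

    def __init__(self, memory_rows):
        # memory_rows: list of rows, each a list of four 16b words
        self.memory_rows = memory_rows
        self.cached_row_index = None
        self.cached_row = None
        self.array_reads = 0  # number of reads that reach the SRAM array

    def read_word(self, word_address):
        """Return the 16b word at word_address, reading the array only on a miss."""
        row_index, offset = divmod(word_address, self.WORDS_PER_ROW)
        if row_index != self.cached_row_index:
            # Miss: fetch the whole 64b row in a single memory read.
            self.cached_row = list(self.memory_rows[row_index])
            self.cached_row_index = row_index
            self.array_reads += 1
        return self.cached_row[offset]
```

In this model, four sequential word reads from one row cost a single array access, which is the case the cache is meant to exploit.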
Logic design must account for the exponential effect of Vt variation on sub-Vt currents. At the 65nm node, even a static CMOS logic style does not guarantee functionality in sub-Vt, as illustrated by Monte Carlo simulations in Fig. 16.7.2. Process variation can randomly weaken the pull-up or pull-down network, thus degrading noise margins of logic gates. In a sub-Vt register, inverters with reduced output levels decrease the hold SNM of latches and affect data retention. Moreover, as shown in the transient simulation of Fig. 16.7.2, a clock buffer with reduced output swing can cause contention, thus impeding signal propagation.
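The exponential sensitivity mentioned above can be made concrete with the textbook sub-threshold current model, in which current scales as exp((VGS - Vt)/(n·kT/q)). The sketch below is illustrative only: the slope factor n = 1.5 and the 300K temperature are assumed values, not parameters of the 65nm process used here.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_current_ratio(delta_vt, slope_factor=1.5, temp_kelvin=300.0):
    """Ratio I(Vt + delta_vt) / I(Vt) at fixed gate bias in sub-threshold.

    slope_factor (n) and temp_kelvin are assumed, illustrative values.
    """
    return math.exp(-delta_vt / (slope_factor * thermal_voltage(temp_kelvin)))
```

Under these assumptions, a 50mV upward shift in Vt cuts the drive current by more than 3×, which is why a randomly weakened pull-up or pull-down network can lose to its opposing leakage.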
A custom cell library was developed to address the above challenges. Logic gates were sized to provide sufficient noise margins in Monte Carlo simulation as per [3]. The maximum fan-in was restricted to 3, allowing logic to be synthesized efficiently while avoiding excessive stacking. Several register designs were evaluated in sub-Vt, and the multiplexer-based static register was chosen for robustness. The cross-coupled inverters and clock buffers were sized appropriately to avoid the above failure modes.
To minimize leakage power in the memory, this system includes a 128Kb SRAM designed to operate down to the same minimum VDD as the core logic. The 8T bit-cell consists of a 6T cell and a 2T read-buffer, which removes read SNM limitations by isolating the storage node from the read bitline. To improve readability, sub-Vt readline leakage is eliminated by controlling the feet of all un-accessed read-buffers. To improve write-ability, peripheral write drivers lower the cell supply voltage to weaken the internal cell feedback, while boosting the wordline by 50mV strengthens the access devices. The SRAM is primarily based on [4] with two main modifications: the number of bit-cells in a column is reduced to 64 from 256 to improve speed and read reliability; devices in the read-buffer are lengthened to achieve higher speed and lower read current variability in sub-Vt.
A custom timing closure approach was developed to account for increased delay variability in sub-Vt. As shown in Fig. 16.7.3, delay distributions of timing paths in the microcontroller can have similar means but drastically different variances. Furthermore, setup and hold time distributions of various register types are not well modeled by canonical forms. To address these issues while keeping simulation times reasonable, a custom script was applied to an exhaustive list of timing paths generated by commercial software, in order to select a subset of paths for detailed analysis. The paths are first binned by nominal hold time margin. Within each bin, the script then estimates the delay variation (σ/μ) of each path, considering both device sizes and logic depth. Finally, paths with the highest σ/μ from each bin undergo Monte Carlo simulation, which gives accurate delay distributions and indicates whether extra buffering is necessary for satisfying hold time constraints.
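The path-selection flow described above can be sketched as follows. This is not the script used in this work: the function names, the path record layout (hold_margin, gate_areas), and the σ/μ model are all assumptions. The model here takes per-stage delay variability to scale as 1/sqrt(device area), with equal mean stage delays and independent stages, so that σ/μ falls with both larger devices and greater logic depth.

```python
import math
from collections import defaultdict


def estimate_sigma_over_mu(gate_areas):
    """Crude sigma/mu estimate for one path (assumed model, arbitrary units).

    Each stage has unit mean delay and sigma proportional to 1/sqrt(area);
    stage variances add because stages are treated as independent.
    """
    depth = len(gate_areas)
    variance_sum = sum(1.0 / area for area in gate_areas)
    return math.sqrt(variance_sum) / depth


def select_paths(paths, bin_width, per_bin=1):
    """Bin paths by nominal hold margin, then keep the most variable per bin."""
    bins = defaultdict(list)
    for path in paths:
        bins[math.floor(path["hold_margin"] / bin_width)].append(path)

    selected = []
    for key in sorted(bins):
        ranked = sorted(
            bins[key],
            key=lambda p: estimate_sigma_over_mu(p["gate_areas"]),
            reverse=True,
        )
        selected.extend(ranked[:per_bin])  # candidates for Monte Carlo simulation
    return selected
```

Only the paths returned by select_paths would then be passed to Monte Carlo simulation, which keeps the simulation load bounded by the number of bins rather than the number of paths.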
An efficient DC-DC converter supplying ultra-low voltages is essential for realizing full power savings in sub-Vt. In this system, a switched-capacitor (SC) DC-DC converter with on-chip charge-transfer (flying) capacitors can provide voltages from 0.3V to 1.1V, although the logic and SRAM only utilize voltages up to 0.6V. The SC converter, shown in Fig. 16.7.4, improves upon a 0.18μm prototype described in [5]. The converter employs five different gain settings to maintain nearly constant efficiency as the load voltage changes. A suitable gain setting (G<4:0>) is chosen depending on the load voltage being delivered. Charge-recycling techniques are added to this implementation to mitigate the loss in efficiency due to bottom-plate parasitics. The charge-recycling block employs an on-chip bond-wire inductor and a 30pF charge-recycling capacitor. The converter uses an all-digital Pulse Frequency Modulation control method to regulate the output voltage. The enable signal enW<2:0> adjusts the width of the switches to reduce switching losses when the load power delivered scales down. The converter achieves a peak efficiency of 78% at 530mV load voltage and 182μA load current. Its efficiency while delivering 500mV load voltage is provided in Fig. 16.7.4. The SC converter together with the charge-transfer capacitors occupies 0.12mm² in total area and can deliver load power levels up to 400μW. Fig. 16.7.7. The minimum energy point occurs at 500mV, where the logic and memory together consume 27.3pJ/cycle operating at 434kHz. The DC-DC converter achieves 75% efficiency at the corresponding power level with the microcontroller as a load. During standby mode, the DC-DC converter enables dynamically scaling VDD to 300mV, where the combined power for core logic and SRAM is less than 1μW, as plotted in Fig. 16.7.6.
Accounting for the efficiency loss at ultra-low power levels, this provides a 2.1× leakage power reduction compared to keeping V DD constant at 500mV during standby. 
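The active-mode figures above imply a load power and a converter input power that follow from simple arithmetic. The sketch below works these out from the reported 27.3pJ/cycle, 434kHz, and 75% efficiency; the function and variable names are illustrative, and the derived values are estimates rather than measurements.

```python
def load_power_watts(energy_per_cycle_joules, clock_hz):
    """Average load power from energy per cycle and clock frequency."""
    return energy_per_cycle_joules * clock_hz


def input_power_watts(load_power, converter_efficiency):
    """Power drawn at the converter input for a given load power."""
    return load_power / converter_efficiency


# Reported operating point at the minimum energy point (500mV).
p_load = load_power_watts(27.3e-12, 434e3)
p_in = input_power_watts(p_load, 0.75)
energy_per_cycle_at_input = 27.3e-12 / 0.75  # joules per cycle, converter included
```

This gives a load power of about 11.8μW and an input power of about 15.8μW, or roughly 36.4pJ/cycle once converter loss is included.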

