A Leakage-Biased Domino (LB-Domino) circuit family is proposed that maintains high speed in active mode but can be rapidly placed into a low-leakage inactive state by using leakage currents themselves to bias internal nodes. A 32-bit Han-Carlson domino adder is used to compare LB-Domino with conventional single-Vt and dual-Vt domino circuits. For equal delay and noise margin, the LB-Domino technique gives a two-decade reduction in steady-state leakage energy compared to a dual-Vt technique.
Introduction
Energy dissipation has emerged as the primary design constraint for many systems, from portable electronics to high-performance microprocessors. Until recently, the dominant cause of energy dissipation in digital CMOS has been dynamic switching of load capacitances. Continuing reductions in feature size reduce capacitance and supply voltage, and hence dynamic switching energy per operation. However, to maintain performance, threshold voltages must also be scaled down with the supply voltage. Unfortunately, lowering the threshold voltage increases static leakage current exponentially. Within a few process generations, energy dissipation from static leakage current is predicted to become comparable to dynamic switching energy [3, 4].
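The exponential dependence can be made concrete with the standard first-order subthreshold model. This is a textbook approximation, not a result of this paper. In it, S is the subthreshold swing in volts per decade of current and I_0 is a process-dependent constant:

```latex
I_{\mathrm{sub}} \approx I_0 \, 10^{(V_{GS} - V_t)/S},
\qquad
\frac{I_{\mathrm{sub}}\big|_{V_t - \Delta V_t}}{I_{\mathrm{sub}}\big|_{V_t}} = 10^{\Delta V_t / S}
```

Under this model, each reduction of Vt by one swing S multiplies off-state leakage by roughly ten. S cannot fall below (kT/q) ln 10, about 60 mV/decade at room temperature, and it grows with temperature.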
A number of techniques have been proposed to combat this increase in leakage power. These approaches can be divided into two categories. The first category focuses on the static design-time selection of slow transistors on non-critical paths. [...] to force internal dynamic nodes into a low-leakage state. However, the internal node is pulled down through a PMOS. This leaves the possibility of an intermediate voltage on the dynamic node of the first stage of a domino chain if the data inputs are not high. The intermediate voltage can cause short-circuit current in the static output inverter until leakage through the input transistors finally pulls the dynamic node to ground.

This paper presents a new DDFT circuit family, Leakage-Biased Domino (LB-Domino). LB-Domino uses sleep transistors only on non-critical paths and uses the leakage current itself to bias internal critical paths into a minimal-leakage state: leakage currents are used to apply the optimal sleep vector. This technique has little impact on active energy or delay when applied to conventional domino circuitry. LB-Domino provides a low-leakage state that can be rapidly entered and exited with low transition energy overhead. This enables fine-grain leakage reduction, where small subcircuits can be deactivated for short periods of time.
Leakage-Biased Domino
An LB-Domino buffer is shown in Figure 1. This example is a footless domino buffer without a clock transistor in the dynamic pull-down stack, but the LB technique can also be applied to footed domino stacks. Only two small sleep transistors are added to a conventional CMOS domino gate:

- a high-Vt PMOS in series with the keeper power supply;
- a high-Vt NMOS in series with the static output logic pull-down.

When the sleep signal is deasserted, the circuit operates as a conventional domino gate. Performance degradation is minimal because there are no additional series transistors in the critical evaluate path.
To place the circuit into sleep mode, the clock signal is left high after an evaluate cycle and the sleep signal is asserted (sleep=1 and sleepb=0). If the data input was high, node1 has already been discharged. If the data input was low, node1 is high, but leakage through the NMOS dynamic pull-down stack slowly discharges it to ground. (The precharge and keeper pull-up transistors are high-Vt devices with significantly lower leakage than the pull-down stack.) The NMOS sleep transistor is added to prevent any short-circuit current in the static output logic while the dynamic node discharges to ground. The static output, node2, rises as the static pull-up turns on. As the leakage current of one domino gate causes its output node to rise, the NMOS transistors in the pull-down stacks of the following domino gates turn on, accelerating the discharge of their internal dynamic nodes. In this way, LB-Domino gates bias themselves into a low-leakage state: the internal dynamic nodes are discharged low and the static nodes are charged high, regardless of the input vector.
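A first-order estimate shows how long leakage alone takes to pull a precharged dynamic node down to the switching threshold V_M of its static output inverter. This is our own simplification, not a formula from the paper. It treats the node as a capacitance C_node discharged by a roughly constant pull-down leakage current I_leak:

```latex
t_{\mathrm{leak}} \approx \frac{C_{\mathrm{node}} \, (V_{DD} - V_{M})}{I_{\mathrm{leak}}}
```

Because t_leak scales as 1/I_leak, a leakier process should reach the low-leakage state sooner. Once one stage's output starts to rise, the subthreshold current in the next stage's pull-down grows exponentially with its gate voltage. That stage therefore discharges much faster than the first one did.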
When the internal dynamic node is discharged, the main leakage is across the high-Vt PMOS precharge transistor, which is turned off by the clock signal remaining high. The leakage path of the static output includes at least two series NMOS transistors, one of which is a high-Vt device. A conventional precharge cycle is used to move from sleep mode back to active mode.
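The sleep entry and exit sequence above can be summarized with a logic-level Python sketch of a chain of LB-Domino buffers. This is a hypothetical model: the `Stage` type and the `precharge`, `evaluate` and `sleep_settle` functions are illustrative names, and voltages and timing are abstracted away entirely. The model records which mechanism, in the paper's description, leaves each dynamic node low. Stages that are still high are discharged either by pull-down leakage alone (the first such stage) or because the previous stage's output has risen.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Logic-level state of one hypothetical LB-Domino buffer stage."""
    node1: int = 1  # dynamic node (1 = precharged high)
    node2: int = 0  # static output driven by the output inverter


def precharge(chain):
    """Clock low (also the wake-up step): precharge every dynamic node high."""
    for st in chain:
        st.node1, st.node2 = 1, 0


def evaluate(chain, data_in):
    """Clock high, sleep deasserted: the chain acts as ordinary domino buffers."""
    x = data_in
    for st in chain:
        if x:
            st.node1 = 0  # pull-down stack on: discharge node1
        # otherwise the keeper holds node1 high
        st.node2 = 1 - st.node1
        x = st.node2
    return x


def sleep_settle(chain):
    """Clock held high, sleep asserted (keeper and static pull-down cut off).

    Returns, per stage, what leaves node1 low: 'already low' if it was
    discharged during evaluation, 'leakage' if only pull-down leakage can
    discharge it, or 'driven' if the previous stage's rising output turns
    its pull-down on. Every stage ends with node1 = 0 and node2 = 1,
    whatever the input vector was.
    """
    mechanisms = []
    for i, st in enumerate(chain):
        if st.node1 == 0:
            mechanisms.append("already low")
        elif i > 0 and chain[i - 1].node2 == 1:
            mechanisms.append("driven")
        else:
            mechanisms.append("leakage")
        st.node1, st.node2 = 0, 1  # static pull-up drives node2 high
    return mechanisms


if __name__ == "__main__":
    chain = [Stage() for _ in range(4)]
    precharge(chain)
    evaluate(chain, 0)          # data low: every dynamic node stays high
    print(sleep_settle(chain))  # first stage by leakage, the rest driven
    precharge(chain)            # a normal precharge cycle wakes the chain
```

With the data input low, only the first stage has to wait for leakage. Each later stage is driven by its predecessor, which is the ripple the paper describes.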
Compared with MHS-Domino, LB-Domino has a simpler sleep mechanism that is compatible with, but does not require, a clock-delayed keeper. LB-Domino also avoids short-circuit current in the static output inverter of the first gate of a domino chain.
Evaluation Methodology
The carry-generation circuit of a 32-bit Han-Carlson adder [12] was used to evaluate LB-Domino. The carry-generation circuit is pure domino, with six levels of alternating dynamic and static logic. The basic propagate-generate cells are shown in Figure 2. Four variants of the design were compared:

- LVT uses only low-Vt transistors.
- DVT is a dual-Vt design in which only evaluation-phase transistors are low-Vt.
- LB is an LB-Domino design based on the DVT design, with high-Vt sleep transistors added to the keeper feedback circuits and the static logic pull-downs.
- LB2 is another LB-Domino design that uses high-Vt devices only for the precharge transistors and the added sleep transistors.
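The following bit-level Python model sketches the function that the evaluated carry-generation network computes. It uses the textbook Han-Carlson prefix structure, which we assume here rather than take from the paper's netlist:

- odd bits first absorb their even neighbor;
- a Kogge-Stone tree then runs on the odd bits only;
- a final level fills in the even bits.

For n = 32 this gives log2(32) + 1 = 6 prefix levels, which we take to correspond to the six logic levels quoted above. The model captures logic only, not the dynamic/static alternation or transistor sizing. The function names are hypothetical.

```python
def combine(hi, lo):
    """Prefix operator on (generate, propagate) pairs: hi group above lo group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def han_carlson_carries(a, b, n=32):
    """Return (carries, levels); carries[i] is the carry out of bit i, carry-in 0."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    levels = 0

    # Level 1: every odd bit absorbs the even bit just below it.
    gp = [combine(gp[i], gp[i - 1]) if i % 2 else gp[i] for i in range(n)]
    levels += 1

    # Kogge-Stone levels on the odd bits only, with spans 2, 4, ..., n/2.
    span = 2
    while span < n:
        gp = [combine(gp[i], gp[i - span]) if i % 2 and i >= span else gp[i]
              for i in range(n)]
        levels += 1
        span *= 2

    # Last level: every even bit above bit 0 absorbs the odd bit just below it.
    gp = [combine(gp[i], gp[i - 1]) if i % 2 == 0 and i > 0 else gp[i]
          for i in range(n)]
    levels += 1

    return [g for g, _ in gp], levels


def han_carlson_add(a, b, n=32):
    """n-bit sum (mod 2**n) assembled from the prefix carries."""
    carries, _ = han_carlson_carries(a, b, n)
    total = 0
    for i in range(n):
        p_i = ((a >> i) ^ (b >> i)) & 1
        c_in = carries[i - 1] if i > 0 else 0
        total |= (p_i ^ c_in) << i
    return total


if __name__ == "__main__":
    print(han_carlson_carries(0, 0)[1])    # number of prefix levels for n = 32
    print(han_carlson_add(0xFFFFFFFF, 1))  # carry ripples through all 32 bits
```

Each list comprehension reads only the previous level's values, so one comprehension corresponds to one level of the prefix network.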
For all four designs, the input and output noise margins of all dynamic circuits were set to 10% of the supply voltage. The precharge and evaluation delays were equalized to within 1% through transistor sizing. The circuits were designed for an existing TSMC 180nm process and for a projected 70nm process obtained from the BPTM project [2]. Table 1 lists the process parameters, including a simulation temperature of 100°C. All simulations used HSPICE.
Since both active energy and leakage power depend on the inputs, three different input vectors were considered (Table 2): vec1 discharges no dynamic nodes, vec3 discharges all dynamic nodes, and vec2 discharges half and leaves half high.
Results
Figures 3 and 4 show the delay and active energy consumption for the 180nm and 70nm processes, respectively. The active energy of DVT is greater than that of LVT because the high-Vt keeper transistors must be sized up to give equal noise margin and equal precharge delay. For the same reason, the active energy of LB is greater than that of DVT. However, LB2 can meet the delay constraints with only a small increase in active energy over the LVT design because it uses only a small number of high-Vt transistors. The steady-state leakage of LB and LB2 is independent of the input vector because leakage currents bias the internal nodes into the lowest-leakage state over some transition time. The LB schemes have worst-case sleep-state leakage currents around two decades lower than those of the LVT and DVT designs. For the 180nm process, LB is preferred for circuits that spend enough time in sleep mode, because its leakage is lower than that of LB2. For more active circuits, LB2 is preferable: it has lower active energy and reasonable steady-state leakage. For the 70nm process, LB2 is always better than LB, since it has both lower active energy and lower steady-state leakage.
Figures 7 and 8 show how energy consumption evolves over time when the circuit is put into a sleep state, for the 180nm and 70nm processes respectively. The energy curves show the energy consumed when the circuit sleeps for the specified time. This includes the cost to transition the circuit into and out of the sleep state (e.g., the energy to switch the gates of the sleep transistors). For the LVT and DVT schemes, the sleep energy is simply proportional to sleep time because leakage currents are constant.

The sleep-energy curve of LB has a very different shape. There is a large jump in energy after a short sleep time: around 20 ns for 180nm and around 1 ns for 70nm. At this point, the static output of the first domino stage charges up to the threshold voltage and causes the following stage to move rapidly to the low-leakage state. This process quickly ripples through the chain of domino gates. The energy stored in any precharged dynamic nodes is lost and must be restored during precharge when the circuit is next woken up. This lost energy causes the steep rise in effective sleep energy. After this point, the energy curve has a very shallow slope because of the lowered leakage currents.
For short sleep times, the LB schemes require more total energy than simply idling an LVT or DVT circuit. For longer sleep times, however, the energy cost of discharging the internal dynamic nodes is amortized, and the lower sleep leakage current yields lower overall energy. For LB and LB2, the crossover point is around 2 µs in the 180nm process for the worst case (vec1). In the 70nm process, the crossover point is under 10 ns because active energy scales down faster than leakage power.
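The paper gives these curves graphically rather than as a formula. The Python sketch below is our own piecewise-linear reading of them, built on three assumptions:

- before the collapse, the LB circuit leaks at the same power as the idling LVT/DVT circuit it is compared against;
- the collapse happens instantly at t_collapse;
- every parameter value is made up and is not data from the paper.

Under these assumptions the crossover has the closed form t_collapse + (E_control + E_restore) / (P_idle - P_sleep). This shows why the crossover shrinks when switching energy scales down faster than leakage power.

```python
from dataclasses import dataclass


@dataclass
class SleepEnergyModel:
    """Piecewise-linear sleep-energy model; every field is a hypothetical input.

    Units: energies in pJ, powers in uW, times in us (1 pJ / 1 uW = 1 us).
    """
    p_idle: float      # leakage power of the LVT/DVT circuit left idling
    p_sleep: float     # steady-state leakage of the LB circuit after collapse
    t_collapse: float  # sleep time at which the domino chain collapses
    e_control: float   # energy to switch the sleep-transistor gates in and out
    e_restore: float   # precharge energy lost in the collapse, repaid on wake-up

    def idle_energy(self, t):
        """LVT/DVT: constant leakage, so energy grows linearly with time."""
        return self.p_idle * t

    def sleep_energy(self, t):
        """LB: assumed to leak like the idle circuit until the collapse,
        then pay e_restore once and leak at p_sleep afterwards."""
        if t < self.t_collapse:
            return self.e_control + self.p_idle * t
        return (self.e_control + self.e_restore + self.p_idle * self.t_collapse
                + self.p_sleep * (t - self.t_collapse))

    def break_even_time(self):
        """Sleep time beyond which sleeping beats idling (inf if never)."""
        if self.p_sleep >= self.p_idle:
            return float("inf")
        overhead = self.e_control + self.e_restore
        return self.t_collapse + overhead / (self.p_idle - self.p_sleep)


if __name__ == "__main__":
    # Illustrative numbers only: sleep leakage two decades below idle leakage.
    model = SleepEnergyModel(p_idle=1.0, p_sleep=0.01, t_collapse=0.02,
                             e_control=0.1, e_restore=0.9)
    t_star = model.break_even_time()
    print(f"break-even sleep time: {t_star:.3f} us")
```

With these made-up numbers, the break-even time is about 1.03 µs. Halving the transition energies (e_control + e_restore) roughly halves the excess over t_collapse. Raising p_idle has a similar effect.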
Conclusion
As leakage currents become more significant, the leakage currents themselves can be used to bias nodes into low-leakage states. When used to dynamically deactivate critical-path circuits in projected 70nm process technologies, LB-Domino provides a two-decade reduction in steady-state leakage current compared with low-Vt or dual-Vt domino at equal delay and noise margin. LB-Domino has sub-cycle deactivation and reactivation latencies. Because leakage currents are used to bias the circuit, it also has low transition energy overhead. Using LB-Domino to place circuits into a sleep state can yield net energy savings even for sleep times under 10 ns. This makes dynamic fine-grain circuit deactivation practical: small pieces of an active system can be powered down for short periods of time to save leakage energy.
