I. INTRODUCTION
The SIA roadmap identifies two distinct paths for voltage scaling: one for low power and one for high performance [1]. The low-power branch of the scaling predictions is intended for battery-operated applications that require minimal power consumption for long battery lifetime. The high-performance branch is appropriate for microprocessors and the like. As seen in Table I, both branches show fairly aggressive scaling of the supply voltage, especially considering how many generations the supply remained at 5 or 3.3 V. The battery-powered applications require a reduced supply because of the quadratic dependence of power on supply voltage. Microprocessors warrant smaller supplies because they need smaller transistor dimensions for higher density without suffering from breakdown. Nonetheless, even for the low-power branch, the supply voltage seems to bottom out at 0.9 V. To understand the reason behind this predicted lower bound, it is essential to investigate the tradeoffs and limitations associated with lowering the supply voltage.
Because current drive is dependent on the gate overdrive, $V_{dd} - V_t$, simply reducing the supply voltage results in an excessive increase in delay. One model of the scaling dependencies for inverter delay is derived in [2], [5]. Even with a threshold as low as 0.2 V [5], [6], it will be difficult to maintain decent performance for a supply lower than 1 V with conventional circuitry. Such reasoning led to the predictions published in the 1994 SIA Roadmap and reprinted in Table I.
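The delay model of [2] is not reproduced in this text. As a rough illustration of why delay grows when the supply approaches the threshold, the following sketch substitutes the widely used alpha-power law for the drive current. That law, the exponent $\alpha$, and the load capacitance $C_L$ are assumptions introduced here and are not necessarily the form used in [2].

```latex
% Assumed alpha-power-law sketch (not the model of [2]):
% drive current set by gate overdrive, delay set by charging C_L to V_dd.
\begin{align}
I_{on} &\propto (V_{dd} - V_t)^{\alpha}, \qquad 1 \le \alpha \le 2, \\
t_d &\propto \frac{C_L V_{dd}}{I_{on}}
     \propto \frac{C_L V_{dd}}{(V_{dd} - V_t)^{\alpha}}
     = \frac{C_L V_{dd}^{\,1-\alpha}}{\left(1 - V_t/V_{dd}\right)^{\alpha}} .
\end{align}
```

Under this assumption, the delay diverges as $V_{dd}$ approaches a fixed $V_t$, and the factor that depends on the ratio $V_t/V_{dd}$ stays fixed only if $V_t$ is scaled along with $V_{dd}$.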
The above discussion was based on a critical assumption: that we use the same threshold voltage in both the on- and off-state. In 1994, Assaderaghi et al. [7] introduced a scheme of connecting the gate to the body in a silicon-on-insulator (SOI) MOSFET to obtain low leakage (high $V_t$) and high drive (low $V_t$) for extremely low supply voltages. This configuration was termed dynamic threshold MOS (DTMOS). I-V curves illustrating the improvement are shown in Fig. 1.
A couple of drawbacks of this technique should be mentioned. The immediate problem with this scheme is that it is limited to 0.5-V supplies because forward biasing the source-body junction would lead to excessive gate current. Reference [7] proposed using a limiter transistor to allow higher supply voltages while keeping the body below 0.5 V (for NMOS). That variation, along with others, will be discussed in this paper. Other drawbacks of this technique are the area penalty and the process complexity associated with contacting and isolating the body of each transistor. This paper will confront such issues and attempt to convince the reader that using supply voltages in the range of 0.5-1.5 V is feasible in today's technology with minor circuit adjustments.
II. OUTLINE
After a brief discussion of our methods, we present a power/delay analysis for several inverter designs that use varying body bias to control the threshold voltage. The data will show that there are several good alternatives to standard CMOS when we need to raise the $V_t/V_{dd}$ ratio above 1/4. Then we will extend these circuit techniques to pass-transistor logic. It has been shown that various forms of complementary pass-transistor logic (CPL) can be much more energy or area efficient than regular CMOS or transmission-gate logic [3], [5], [8]-[11], [13]. Thus, DTMOS variations have the potential of enhancing such efficiency or alleviating the need for alternative pass-gate circuits. We will conclude with a discussion of the results and future work.
III. METHOD
For the SPICE simulations to be accurate, we took great pains to optimize the process file for all ranges of operation. This optimization was accomplished using BSIMPRO. With that software, we measured isolated bulk transistors and extracted the optimized process file. The transistors were made using a 0.18-µm process with a 0.3-V threshold voltage. The resulting process files were used in SPICE circuit simulations for power and delay measurements over various supply voltages and loads. Each transistor was split into ten transistors, each with 1/10 of the desired width, to model accurately the RC delay associated with charging the body. Delay values were obtained for a given circuit with the input driven by a similar circuit and the output loaded by a capacitor, as shown in Fig. 2. Power values reflect the power used in the entire circuit.
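To illustrate why the body is modeled with several segments rather than one lumped RC, the sketch below computes the Elmore delay at the far end of a uniform RC ladder. The ladder topology (body contacted at one end of the width), the function name, and the resistance and capacitance values are assumptions for illustration; they are not taken from the simulated process.

```python
def elmore_delay_far_end(r_total, c_total, n_segments):
    """Elmore delay at the far end of an RC ladder of n equal segments.

    Assumes the body is driven from one end, with total body resistance
    r_total and total body capacitance c_total spread uniformly.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    # Capacitor k (1-based) is charged through k series resistor segments.
    return sum(k * r_seg * c_seg for k in range(1, n_segments + 1))


if __name__ == "__main__":
    r_body = 10e3    # ohms, hypothetical total body resistance
    c_body = 5e-15   # farads, hypothetical total body capacitance
    for n in (1, 2, 10, 100):
        tau = elmore_delay_far_end(r_body, c_body, n)
        print(f"{n:4d} segments: {tau / (r_body * c_body):.3f} x RC")
```

In this simplified picture, one lumped segment gives a delay of RC, ten segments give 0.55 RC, and the fully distributed limit is 0.5 RC, so a ten-way split already lands within about 10% of the distributed value.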
IV. INVERTER COMPARISON
For the traditional inverter analysis, three different styles (Fig. 3) are compared to standard CMOS. The first variation is the DTMOS inverter introduced in [7]. The second inverter is similar to the first but with a limiter transistor to limit each body-source junction to 0.5-V forward bias. The limiter transistors are minimum-sized devices. The reference voltages are kept at 0.5 V for NMOS and $V_{dd}$ - 0.5 V for PMOS for the most efficient operation. These reference voltages can either be provided separately, or the threshold voltages of the limiter devices can be adjusted such that $V_{dd}$ and GND can be used. The third inverter also uses minimum-sized auxiliary devices to augment the current drive by manipulating the body bias [12]. This time, however, the body-source junction is not directly clamped. Rather, any excess current caused by forward biasing is used to charge/discharge the output.
Plots for delay and power-delay product for various supply voltages are shown in Figs. 4 and 5. For supply voltages below 0.5 V, the clear choice is the straight DTMOS inverter. For supply voltages between 0.5 and 1.5 V, both the limiter and augmented cases show significant improvement over the traditional CMOS inverter. The difference is noticeable when $V_t/V_{dd}$ exceeds 1/5. This difference results from the change in threshold: since the threshold voltage is lower than its off-state value when the device is switching, the current is greater. Thus, the current is more accurately modeled using the on-state $V_t$ rather than the off-state value. Consequently, the scaling for constant delay requires a constant ratio of on-state $V_t$ to $V_{dd}$. Leakage still dictates that the off-state $V_t$ remain at 0.2 V or higher, but a reduced on-state $V_t$ allows for a $V_{dd}$ well below 1 V. For the process simulated in this paper, the on-state $V_t$ is 0.15 V under forward body bias. The rapid increase in delay occurs at a lower $V_{dd}$ for DTMOS because the true $V_t$ is lower. Since each of the DTMOS variations adds capacitance, the advantages of these variations are more apparent with heavier output loading.
V. PASS-GATE COMPARISON
As mentioned earlier, traditional scaling analyses emphasize the importance of trying to maintain a constant $V_t/V_{dd}$ to preserve the switching speed [2]. As we try to scale $V_{dd}$ below 1 V, it becomes tempting to raise this ratio slightly. Unfortunately, the reduced overdrive is not the only source of delay penalty when we raise the ratio. In single-transistor pass-gate logic, a $V_t$ drop is lost across the device when it tries to pull the output high [3], [5], [8], [9], [11], [13], [14]. This signal degradation subsequently slows the pulldown of the output buffer and causes leakage because the pullup is not fully off. Since DTMOS circuits potentially lower the on-state threshold voltage to zero or below [15], pass-gate logic may benefit further from such designs.
Using DTMOS in pass-gate logic is a fairly new idea. Initial publications have already been made on the topic [14], [16], but a more complete analysis is needed. We investigated the impact of having a dynamic threshold on pass-gate logic both with and without restoration for a broad range of supply voltages. Restoration has commonly been used to alleviate the $V_t$ drop problem by assisting the pass-gate pullup at the output [8], [11], [13], [14]. DTMOS could be substituted for restoration in pass-gate logic, but a combination of the two can help scale the voltage even further.
The DTMOS variations illustrated in Fig. 3 were simulated in simple pass-gate circuits. The augmented circuit style needed to be customized for pass-gate logic, so a second auxiliary device was added to provide a new, symmetric design, shown in Fig. 6. Three different restoration schemes were simulated and are shown in Fig. 7. A simple XOR function was used for the pass-transistor logic simulations. As mentioned earlier, the DTMOS advantage is even greater for larger loading, both because the speed improvement is increased and because with larger drivers, the area penalty for the minimum-sized auxiliary devices in two of the DTMOS designs is negligible. Nonetheless, output loading was kept at 0.1 pF for all circuits, and modestly sized devices were used in these simulations, with a 10 : 1 ratio of driver to auxiliary device widths.
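The threshold-drop problem described in this section can be sketched to first order as follows. The model ignores body effect in the pass device (other than through the threshold value supplied) and subthreshold conduction. The function names, the 0.4-V conventional pass-gate threshold, and the 0.3-V buffer PMOS threshold magnitude are hypothetical values chosen for illustration; only the 0.15-V on-state threshold is taken from the text.

```python
def pass_gate_high_level(v_dd, v_t_pass):
    """Highest level an NMOS pass gate (gate at v_dd) can pull its output to.

    First-order estimate: v_dd - v_t_pass, clamped to the range [0, v_dd].
    """
    return max(0.0, min(v_dd, v_dd - v_t_pass))


def buffer_pullup_is_off(v_dd, v_in_high, v_tp_abs):
    """True if the output buffer's PMOS pull-up is off for this input level.

    The PMOS source sits at v_dd, so it is off only while
    v_dd - v_in_high stays below the PMOS threshold magnitude.
    """
    return (v_dd - v_in_high) < v_tp_abs


if __name__ == "__main__":
    v_dd = 1.0
    v_tp_abs = 0.3  # hypothetical buffer PMOS threshold magnitude
    cases = (("conventional", 0.4), ("DTMOS on-state", 0.15))
    for label, v_t in cases:
        v_high = pass_gate_high_level(v_dd, v_t)
        off = buffer_pullup_is_off(v_dd, v_high, v_tp_abs)
        print(f"{label}: high level {v_high:.2f} V, pull-up off: {off}")
```

With these assumed numbers, the conventional pass gate leaves the buffer pull-up partially on, while the lower on-state threshold brings the high level close enough to the supply to turn it off, which is the leakage mechanism that restoration otherwise addresses.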
For each design, restoration option 3 gave the largest improvement. The other restoration schemes showed the same general trends. The restoration option 3 comparisons for a given pass-gate logic style are shown in Figs. 8 and 9. These figures indicate that straight DTMOS is preferable for supplies below 0.5 V and that the limiter and augmented DTMOS styles are advantageous for supply voltages between 0.5 and 1.5 V. Even with restoration, the DTMOS alternatives offer a significant improvement over a wide range of voltages. A third alternative for using DTMOS at higher supply voltages is to use the straight DTMOS design with no auxiliary devices but have a boosted ground so that the DTMOS circuit only sees a small part of the full supply [16]. This third method may serve as a way to interface I/O circuits with ultralow-power DTMOS circuits.
VI. PROCESS ISSUES
The main focus of this paper has been to present methods to overcome the 0.5-V limitation on DTMOS supply voltage so that the DTMOS idea can be used over a broader range. We also touched on the area penalty associated with DTMOS, saying that for large devices driving large loads, the added area for either the auxiliary devices or the body contacts can be neglected. A final complication with DTMOS circuits should also be addressed: process complexity. Depending on the process, opening body contacts and isolating the bodies for each transistor can add process steps. In some circumstances, an additional masking step may be required for the body contact. Isolation comes naturally for DTMOS when implemented on SOI wafers but is quite difficult for bulk silicon wafers. Reference [17] presented the first bulk DTMOS results using 1.5-µm trenches and a four-well process, which is not a small task for most fabs. Other bulk-Si DTMOS options are currently being investigated. Another process challenge is to provide a low-resistance path to the body under the entire width of the transistor to minimize charging/discharging delay of the body and to maximize the DTMOS advantage. Bulk DTMOS is advantageous here because it is easier to provide a low-resistance path without affecting the threshold voltage. Body doping schemes for low resistance are currently being investigated.
VII. DISCUSSION
As mentioned in the inverter analysis, the on-state $V_t$ for the process simulated in this paper is 0.15 V under forward body bias. This corresponds to a negative $\Delta V_t/\Delta V_{bs}$; that is, $V_t$ falls as the body is forward biased. In [15], it has been shown that with an optimized channel doping profile, a 1 : 1 ratio can be obtained for $\Delta V_t/\Delta V_{bs}$. This makes DTMOS look quite promising for ultralow-power applications. For example, with a 0.5-V supply, one could engineer a 0.5-V off-state $V_t$ with a 0-V on-state $V_t$. According to the delay model (1), this would even have superior delay to a standard design with $V_t/V_{dd}$ of 0.2 V/1 V. Furthermore, it is conceivable that a 0.2-V supply could be used with DTMOS given a 0.2-V off-state $V_t$ and a 0-V on-state $V_t$. Such a supply voltage is very attractive to ultralow-power product designers.
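The comparison in this section can be checked numerically under an assumed delay model. Because the delay expression of [2] is not available in this text, the sketch below substitutes the alpha-power law, with delay proportional to $V_{dd}/(V_{dd} - V_t)^{\alpha}$ at fixed load. The function name and the default exponent of 1.3 are assumptions for illustration, not values from the paper.

```python
def relative_delay(v_dd, v_t_on, alpha=1.3):
    """Relative inverter delay under an assumed alpha-power law.

    delay ~ v_dd / (v_dd - v_t_on) ** alpha, with the load held fixed.
    The value is meaningful only as a ratio between two designs.
    """
    if v_dd <= v_t_on:
        raise ValueError("supply must exceed the on-state threshold")
    return v_dd / (v_dd - v_t_on) ** alpha


if __name__ == "__main__":
    standard = relative_delay(1.0, 0.2)  # standard design: 0.2 V / 1 V
    dtmos = relative_delay(0.5, 0.0)     # DTMOS: 0.5-V supply, 0-V on-state threshold
    print(f"standard 1.0 V / 0.2 V : {standard:.3f}")
    print(f"DTMOS    0.5 V / 0.0 V : {dtmos:.3f}")
    print(f"DTMOS faster: {dtmos < standard}")
```

With an exponent of 1.3, the 0.5-V DTMOS case comes out slightly faster than the 1-V standard case, in line with the claim above. The result depends on the assumed exponent: with a long-channel value of 2, the same expression favors the standard design.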
VIII. CONCLUSION
In this paper, we demonstrated several DTMOS techniques that can be used over a broad range of supply voltages. DTMOS delay and efficiency become superior to traditional designs as the voltage is reduced and the loading is increased. We showed that with a reduced on-state threshold voltage, pass-transistor logic can continue to be used in low-power applications and that a combination of DTMOS and restoration can provide the best performance.
