Introduction
Standalone and embedded random-access memories (RAMs) have evolved rapidly, and their high density, low power, and low cost have contributed to improving the affordability and performance of electronic systems such as computers, communication systems, and consumer products. In research and development, the density of standalone RAMs has reached the 4-Gb level for dynamic RAMs (DRAMs) [1] and 72-Mb for static RAMs (SRAMs) [2, 3] , along with a reduced RAM cell area, as shown in Figure 1 [4] .
In embedded RAMs (e-RAMs), recent developments have focused on high speed under low voltages, exemplified by the 1.5-V, 300-MHz, 16-Mb DRAM macro [5] and the 1.5-V, 1-GHz, 24-Mb L3-SRAM cache [6] . Device miniaturization and the rapidly growing demand for mobile or power-aware systems have resulted in an urgent need to reduce power-supply voltage (V CC ) (Figure 2) . In standalone RAMs, the standard V CC has been reduced to as low as 1.8 V. In e-RAMs, the voltage has been lowered even more, because it is based on that of the logic circuits in microprocessing units (MPUs) [7] , reaching below 1.5 V. In particular, the need for e-RAMs to have low-voltage and small memory cells will become increasingly greater, because they are expected to occupy more than 90% of the area of systems-on-a-chip (SoCs) [8] . Reducing the supply voltage to the region below 1 V, however, places three stringent constraints on design [4] :
• Maintaining a high signal-to-noise-ratio (S/N) for RAM cells to operate stably.
• Reducing the leakage currents (especially gate-tunnel current and subthreshold current) in MOSFETs, which increase considerably when the gate-oxide thickness (t OX ) and the threshold voltage (V T ) are reduced.
• Suppressing speed variations that become prominent at low voltages as a result of design parameter variations.
Unless these problems are solved, RAMs will never be able to operate reliably. In addition, the low-power advantage of CMOS circuits will be lost, and we can envision a scenario in which even CMOS SoCs would suffer from huge dissipations of dc power caused by subthreshold currents, as was the case in the recent bipolar and BiCMOS large-scale integration (LSI) eras. In particular, reducing subthreshold current is extremely important in RAM circuit design and in random logic LSIs. To the best of our knowledge, the importance of reducing subthreshold currents in low-voltage high-speed room-temperature operation LSIs only became apparent in 1991 [9] as a result of innovative developments with 1.5-V high-speed DRAMs [10, 11] . In addition to the preceding reduction schemes through dynamic substrate control and power switches [12] , other key solutions to reduce subthreshold current were proposed in the early 1990s [13] [14] [15] [16] [17] , although these were all in the standby mode. A solution to reduce subthreshold current in the active mode was presented as early as 1993 using a hypothetical 16-Gb DRAM [18] . Although numerous attempts have subsequently been made in both RAMs and logic LSIs, the problem of reducing subthreshold current in the high-speed active mode remains unsolved, especially in random logic LSIs.
Trends and challenges with low-voltage RAMs
There are three major issues in producing low-voltage RAMs: stable RAM-cell operation, reduced leakage currents, and suppression of speed variations that are prominent at a lower voltage. However, developments toward creating a smaller cell and lower power dissipation with the simplest processes possible must also be viewed as major concerns for RAMs, because the three issues are closely related to the degree of device miniaturization and low-voltage operation. The intention of this section is to clarify the issues common to both DRAM and SRAM technology trends. For this discussion, we have mainly assumed the standalone RAM chip shown in Figure 3. The chip comprises a RAM array, iterative circuit blocks such as decoders and drivers, peripheral logic circuits, I/O circuits, and on-chip voltage generators that bridge the supply-voltage gap between the memory cell array and peripheral circuits.
Figure 1
Research and development trends in DRAMs and SRAMs: (a) Memory capacity per chip. (b) Memory cell area. Data for 32-Mb [2] and 72-Mb [3] SRAMs has been added to original data [4]. Data from recent conferences has been added to original data [7].
Cell signal charge
The signal charge, Q S (Q S = C S V DD /2, where C S is storage capacitance), has been reduced through device miniaturization and low voltage, as shown in Figure 4(a) [9, 19]. This reduction destabilizes DRAM-cell operations because of a smaller signal voltage on the data line (DL) in a noisy memory array and larger soft-error rates (SERs). The Q S of SRAMs is significantly smaller than that of DRAMs by 1 to 1.5 decades. Thus, the SERs of SRAMs increase rapidly as a result of decreased parasitic C S and rapid reduction in operating voltage despite spatial scaling. In contrast, the SERs of DRAMs decrease gradually with device scaling, as shown in Figure 4(b) [20], as a result of the intentionally increased C S and spatial scaling that causes less collection of charges. The Q S is effectively reduced by the ever-increasing necessary V T , V T variation, and V T mismatch under a given V DD . As shown in Figure 5(a), the necessary V T of RAM cells must be increased with greater memory capacity even under ever-lowering V DD . The increase in V T is due to specifications, where the maximum refresh time, t REFmax , required of standalone DRAMs must lengthen with memory capacity, and the data-retention current of SRAMs in power-aware systems must remain almost constant. The V T variation slows down the half-V DD DRAM sensing and reduces the available signal charge of SRAM cells. The V T mismatch between cross-coupled/paired MOSFETs in a large number of DRAM sense amplifiers (SAs) and SRAM cells also increases with increased memory capacity and decreased device size, degrading the sensing margin of DRAM cells and the voltage margin of SRAM cells [4].
Figure 4
Trends in signal charge and soft-error immunity of RAMs: (a) Signal charge for DRAMs and SRAMs presented at ISSCC and Symposium on VLSI Circuits. Data for 1-Gb DRAM and SRAMs has been added to that reported in [9] and [19]. (b) SER cross section for DRAMs and SRAMs [20].
Unfortunately, even in the absence of extrinsic variations (implant nonuniformity and channel length/width variations), there is an intrinsic V T variation that increases with device scaling as a result of random microscopic fluctuations in dopant atoms in the extremely small channel area. The standard deviation for this intrinsic random V T variation is expressed by
\[ \sigma(V_T) = \frac{q}{C_{OX}} \sqrt{\frac{N_A D}{3LW}}, \]
where q is the electronic charge, C OX is the gate-oxide capacitance per unit area, N A is the impurity concentration, D is the depletion layer width under the gate, L is the channel length, and W is the channel width [21]. The standard deviation of V T mismatch (offset voltage), σ(δV T ), is √2 times σ(V T ). The maximum V T mismatch |δV T | MAX , however, depends not only on the device parameters, but also on the number of MOSFETs, N, used in the chip. The ratio m = |δV T | MAX /σ(δV T ) increases with N, and its expected value is expressed by
The calculated maximum V T mismatch in the n-MOSFETs used in DRAM SAs and SRAM cells is shown in Figure 5(b). It should be noted that the δV T in SRAM cells, as much as 50 mV in a 128-Mb SRAM, is more serious because of larger N and smaller LW. Enlarging MOSFETs to reduce the δV T is fatal for a large-capacity SRAM because of increased SRAM cell area, while it can be done for DRAM SAs without substantially increasing the chip area because only one SA is placed on a pair of DLs. One method to solve the V T -mismatch problem of DRAM SAs is the mismatch-compensation circuit technique [22, 23], which, however, causes area and access overheads. Therefore, a column-redundancy technique is needed to eliminate a certain percentage of SAs with excessive δV T to maintain the ratio m′ = |δV T |′ MAX /σ(δV T ) at a constant. Here, |δV T |′ MAX is the maximum δV T after application of a redundancy technique. For example, if the ratio of spare columns to normal columns is 1/256 (0.4% of array area penalty), |δV T |′ MAX is limited to 2.9σ(δV T ). As a result, the memory capacity limitation is extended by at least three generations, as Figure 5(b) shows. An efficient test method to detect and replace defective SAs (with excessive δV T ) is also needed. On the other hand, the mismatch of SRAM cells results in random bit defects, which require quite a large number of programmable elements for storing defective addresses (three million for a 32-Mb SRAM with 128-kb spare cells). Thus, an on-chip error-checking and correcting (ECC) circuit is indispensable [24, 25].
Figure 5
Threshold voltage, V T , requirement and V T mismatch issues in scaled RAMs: (a) Minimum necessary V T s at room temperature to maintain the leakage charge of the DRAM cell as low as 10% of the signal charge and the retention current of SRAM chips as low as 1 µA, both at 100°C, assuming V T (extrapolated) = V T (1 nA/µm) + 0.25 V, S-factor = 120 mV/decade at 100°C, ΔV T /ΔT = −2.
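The 2.9σ limit quoted for a spare-column ratio of 1/256 can be checked with a short calculation. The sketch below assumes that δV T follows a zero-mean Gaussian distribution and that the replaced fraction of sense amplifiers equals the spare-to-normal column ratio; the function name is hypothetical and is used for illustration only.

```python
from statistics import NormalDist


def max_mismatch_after_redundancy(spare_ratio: float) -> float:
    """Return m' = |dVT|'_MAX / sigma(dVT) after replacing the worst SAs.

    Assumption (not from the source): dVT is zero-mean Gaussian, and the
    fraction `spare_ratio` of sense amplifiers with the largest |dVT| is
    replaced, so both tails beyond +/- m' * sigma are removed.
    """
    if not 0.0 < spare_ratio < 1.0:
        raise ValueError("spare_ratio must be between 0 and 1")
    # Two-sided tail: P(|x| > m') = spare_ratio, so Phi(m') = 1 - spare_ratio / 2
    return NormalDist().inv_cdf(1.0 - spare_ratio / 2.0)


if __name__ == "__main__":
    m_prime = max_mismatch_after_redundancy(1 / 256)
    print(f"m' = {m_prime:.2f}")  # close to the 2.9 quoted in the text
```

Under these assumptions the result depends only on the replaced fraction, which is why holding the spare ratio fixed holds m′ constant as memory capacity grows.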
Leakage currents
Both subthreshold current and gate-tunnel current greatly affect the operation of RAM cells and peripheral circuits, not only in the standby mode but also in the active mode.
Subthreshold leakage current
In a DRAM cell, a subthreshold leakage current flowing from the cell storage node to the data line shortens the data retention time. In an SRAM, the data retention current of the cell caused by the leakage increases dramatically with decreasing V T , as Figure 6(a) shows [26]. For example, the subthreshold current of a 1-Mb SRAM array reaches as much as 10 A at V T = 0 V and 50°C, although it can be as small as 3 µA at V T = 0.65 V, which corresponds to the maximum retention current acceptable for a standalone SRAM for cellular-phone applications. Here, V T = 0 and 0.65 V are minimum V T s corresponding to nominal V T s of 0.1 V and 0.75 V, respectively, with an assumption of a V T variation of ±0.1 V. Thus, the currents prevent the V T of both DRAM and SRAM cells from scaling, as mentioned above. The leakage current in peripheral circuits, even in the active mode, also becomes huge, as exemplified in Figure 6(b) by a hypothetical 16-Gb DRAM [18]. At present, our main focus is on subthreshold current in the standby mode, because the V T is still too high. For further reductions in V T , however, even numerous circuits, especially the iterative circuit blocks that are inactive during the active period, will start to generate subthreshold currents, causing a huge active current in the chip.
Gate-tunnel leakage current
A solution to the issue of gate-tunnel leakage current is also urgently required in designing RAMs for power-aware systems because the gate-oxide thickness, t OX , has been rapidly decreasing, as Figure 7 shows [27]. Recently, MPUs (and thus on-chip SRAM caches) have accelerated the trend to reduce t OX at a rate of ×0.175 over the last ten years, which is almost two times faster than that for standalone DRAMs, and thus, operation of core circuits at less than 1.5 V has become popular.
Figure 6
Leakage current issues in SRAM and DRAMs: (a) SRAM cell leakage current plotted against cell V T for various junction temperatures, T j . Reproduced from [26] with permission; © 1998 IEEE. (b) Trends in DRAM active current [18].
(Figure 7) [28], and a 3.3-ns-cycle 6.6-ns-access 16-Mb macro with a dual V DD (1.5/2.5 V) and triple t OX (1.7/2.2/5.2 nm) [5]. Even for standalone DRAMs, the dual-t OX approach would, in the future, be useful for high speed and low power. In this case, the thin t OX of the periphery would follow the International Technology Roadmap for Semiconductors (ITRS) [29], while the thick t OX of memory cells would follow a different path [Figure 5(a) and Figure 7], because it is not scalable, even if devices become increasingly miniaturized, as previously explained. Note that improvements in MPU and DRAM performance will slow down, because the pace of the t OX reduction projected by the ITRS [8] will slow down. Moreover, even the ITRS projection cannot be achieved without reducing the rapidly increasing gate-tunnel current developed at a t OX of less than 2-3 nm. Unfortunately, however, there have only been a limited number of circuit solutions. For example, the gate leakage current in RAM cells can be suppressed to some extent by reducing the supply voltage [25, 30]. The gate leakage current in peripheral circuits can be suppressed by shutting off the supply path by inserting a thicker-t OX switch [31]. These schemes can be applied only in the standby mode. Since the current in the active mode must also be reduced, development of new gate-dielectric materials with low leakage and high dielectric constant appears to be the most desirable solution.
Speed variations and other issues with peripheral circuits
It is essential to suppress speed variations of peripheral circuits because the degree of speed variation for any given variation in design parameters is increased by lowering V DD , exemplified by σ(V T )/(V DD − V T ) [32]. Unfortunately, variations in design parameters such as V T increase with technology scaling, as mentioned previously. The challenge is to instantaneously raise the gate-input voltage, to reduce speed variations through stringent controls of design parameters, such as V T , and to control V T or compensate for V T variation through circuit techniques. Power management is an effective way to suppress speed variations, as well as to reduce the power of power-aware systems. Testing methodology that is relevant to leakage currents is also a major area of concern.
Low-voltage RAM cells
DRAM cells
One-transistor cells for standalone DRAMs
stacked-capacitor open-DL cell [35]. Here, the open-DL cell necessitates a low-impedance array to suppress inherent array noises [4, 36] generated by imbalances between a pair of DLs, each of which is placed in different subarrays. For standalone DRAMs, as many memory cells as possible must be connected to each DL pair to realize a smaller chip by reducing the overhead area at each DL-division, thus causing a larger C D . Instead, a large signal charge, Q S , is needed for the necessary signal voltage. Thus, a larger C S is desirable to lower V DD , which has been attained with sophisticated vertical (stacked/trench) capacitors and high dielectric constant (high-k) thin films. The subthreshold current caused by the resulting low V T is cut by the negative word-line (NWL) scheme [4] with a δ gate-offset during nonselected periods, as is discussed in the subsection on circuit applications in Section 4. NWL also reduces the high-level word-line voltage necessary for a full-V DD write operation, enabling the use of a thinner-t OX MOSFET for a given stress voltage [37]. Hence, low-voltage operations with a resulting small subthreshold swing (S-factor) are realized.
Figure 7
Trends in gate-oxide thickness, t OX , for DRAMs and MPUs presented at ISSCC and Symposium on VLSI Circuits [27].
One-transistor cells for e-DRAMs
The key to achieving high-performance e-DRAM is to use logic-compatible processes with a non-self-aligned cell contact and a MOS-planar capacitor, together with an extremely small subarray obtained through the multi-divided DL [4]. The resultant increased cell area may be acceptable for e-DRAMs as long as it is significantly smaller than the six-transistor (6-T) full CMOS SRAM cell [7]. In addition, the resulting small C S is acceptable because of the resulting small C D , which still enables a sufficient signal voltage. Even increased SERs due to the small C S could be handled by using an ECC [24]. The small subarray coupled with the low contact resistance of cells reduces array-relevant line delays that are major bottlenecks in the access/cycle path. Thus, DRAMs could achieve an even faster access time than SRAMs as a result of the smaller physical size of their subarrays for a given memory capacity. In addition, the small subarray, coupled with circuit techniques such as multi-bank interleaving, pipeline operation, and direct sensing [4], solves the speed problem in the row-cycle of DRAMs. A good example is the so-called 1-T SRAM** [38], which incorporated a 1-T DRAM cell with a C S smaller than 10 fF using a single polysilicon planar capacitor and an extensive multi-bank scheme with 128 banks (32 Kb in each) that can operate simultaneously. Somasekhar et al. achieved a row-access frequency higher than 300 MHz for a 0.18-µm, 1.8-V, 2-Mb e-DRAM with a planar capacitor cell [39].
Gain cells
Gain cells such as 3-T and 4-T cells seem to be promising when the supply voltage is reduced to less than 1 V [40]. 2 when a self-aligned contact, triple polysilicon, and vertical capacitors are used. The cell becomes larger when the contact is replaced by a non-self-aligned contact. The 3-T and 4-T DRAM cells and the 6-T SRAM cell are also shown in the figure. They do not require a special capacitor [7] and they can be fabricated by a logic-compatible process with non-self-aligned contact and single polysilicon. Obviously, in terms of the cell area and simplicity of process, the 3-T cells are attractive compared with 1-T cells and the 6-T cell. Their advantages become more prominent at a lower V DD . Figure 8(b) compares effective cell areas as a function of V DD . Here, the effective cell area is the sum of the actual cell area and overhead area involved in the DL divisions. Note that even a high-Q S 1-T cell requires more DL divisions at a lower V DD to maintain the necessary signal, causing a rapid increase in the effective cell area with decreasing V DD [8, 27]. The lack of gain in the 1-T cell is responsible for the increase. On the other hand, the 3-T, 4-T, and 6-T cells are all gain cells that can develop a sufficient signal voltage without increasing the number of DL divisions, even at a lower V DD , and thus provide a fixed effective cell area that is independent of the V DD . Actually, however, the V DD has a lower limit for each cell. For the 3-T cell, it would be around 0.3 V, assuming a V T for the storage MOSFET of around 0 V, an NWL scheme of V WL = −0.5 V for both read/write lines, and a low V T for the read/write MOSFETs of V T (r) = 0 and V T (w) = 0.3 V. An initial stored voltage (V store ) of 0.3 V for the cell, and even a decayed V store of 0.1 V, can be discriminated because of the gain if an improved sensing scheme is developed. The detection of and compensation for V T variations and an additional capacitor at the storage node would further improve stability and reliability. For the 4-T cell, it would be as high as 0.8 V, because the V T of cross-coupled MOSFETs must be higher than 0.8 V to ensure enough t REFmax , and thus the V DD must be higher than this voltage. For the 6-T SRAM cell, it would be around 0.3 V if a raised supply voltage (V DH ) (e.g., 0.5 V) were supplied from an on-chip charge pump, as explained in the next subsection. Consequently, the effective cell area of the 3-T cell would be smaller than other cells at a V DD of less than 0.7 V. Note that the small polysilicon vertical-transistor 2-T 5F² cell recently proposed by Nakazato et al. [41] is another example of a gain cell, despite the small current drivability of the transistor.
Figure 8
Possible cell structures for low-voltage DRAMs [7, 27]
In any event, in addition to the low junction temperature caused by the ultralow V DD , the wide voltage margin provided by gain cells would enable a sufficient t REFmax . Adjusting the potential profile of the storage node to suppress the pn-leakage current further lengthens the t REFmax and preserves the refresh busy rate, even in larger-memory-capacity DRAMs [4], or it lowers the data retention current in the standby mode. Even if the t REFmax were short, fast e-DRAMs, combined with a small subarray and new architectures, would allow the t REFmax to be drastically shortened, as discussed in the following.
Figure 10
Leakage-current components and reduction in an SRAM cell.
The t REFmax is expressed as t REFmax = n(t RC /γ), where n is the number of refresh cycles, t RC is the RAS cycle time, and γ is the refresh busy rate, defined as γ = n(t RC /t REFmax ) [4]. This means that t REFmax can be made smaller by reducing the product n t RC or by increasing γ. Figure 9 shows an example of t REFmax for a 64-Mb DRAM. There are two cases; the first is for a standalone DRAM where n = 4k (4k refresh cycles) and the second is for an e-DRAM where n = 64. Note that t REFmax can be as short as 0.64 µs for t RC = 1 ns and a refresh busy rate of 10%, while it is 40 ms for a standalone DRAM. Here, a 10% refresh busy rate may be acceptable if refreshes are hidden, as has been done in the 1-T SRAM [38]. One drawback of this scheme is an increase in the refresh current (I REF ), which is expressed as
where M is the memory capacity (i.e., 64 Mb in this example). I REF can increase to as high as 1.3 A in e-DRAMs, while it is as low as 0.32 mA in standalone DRAMs. However, this current may be acceptable for high-performance applications, such as the on-chip cache memories of high-performance MPUs [39] .
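The refresh-time relation t REFmax = n(t RC /γ) can be evaluated directly. The sketch below reproduces the e-DRAM case from the text (n = 64, t RC = 1 ns, γ = 10%); the standalone-DRAM cycle time of 100 ns used in the second call is a hypothetical value chosen only for illustration, and the function names are not from the source.

```python
def t_refmax(n_refresh_cycles: int, t_rc_s: float, busy_rate: float) -> float:
    """Maximum refresh time t_REFmax = n * t_RC / gamma, in seconds."""
    if not 0.0 < busy_rate <= 1.0:
        raise ValueError("busy_rate must be in (0, 1]")
    return n_refresh_cycles * t_rc_s / busy_rate


def refresh_busy_rate(n_refresh_cycles: int, t_rc_s: float, t_refmax_s: float) -> float:
    """Refresh busy rate gamma = n * t_RC / t_REFmax (dimensionless)."""
    return n_refresh_cycles * t_rc_s / t_refmax_s


if __name__ == "__main__":
    # e-DRAM case from the text: n = 64, t_RC = 1 ns, gamma = 10 %
    print(f"e-DRAM t_REFmax = {t_refmax(64, 1e-9, 0.10) * 1e6:.2f} us")
    # Standalone case: n = 4096 and t_REFmax = 40 ms are from the text;
    # t_RC = 100 ns is an assumed value, not given in the source.
    gamma = refresh_busy_rate(4096, 100e-9, 40e-3)
    print(f"standalone busy rate = {gamma * 100:.2f} %")
```

With the assumed 100-ns cycle, the standalone busy rate comes out near 1%, an order of magnitude below the 10% accepted for the e-DRAM case.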
SRAM cells
Reducing cell area is the greatest concern in SRAMs, as is suggested by the on-chip, 3-MB, L3 cache [6] . The loadless CMOS, 4-T SRAM [42] shows promise because the cell area is only 56% of that of the 6-T cell. However, it suffers from the data-pattern problem, and it is difficult to accurately control the nonselected word-line voltage to maintain the load current. At the present time, the 6-T cell is the best, despite its large area, because it enables the use of a simple process and design made possible by the wide-voltage margin of the cell. Even in the 6-T cell, however, subthreshold currents and gate-tunnel currents as well as the gate-induced drain leakage (GIDL) increase the retention current with lowering V T and decreasing t OX [43] . Thus, this applies strict limits on how much V T can be reduced. In addition, the soft-error issue is another concern.
To solve this problem, many driving methods and an optimal design for the cell of a small low-voltage cache have been proposed [4, 44]. Recently, a new driving scheme (Figure 10) has been proposed and applied to a 1.5-V, 27-ns access, 6.42 × 8.76 mm², 16-Mb SRAM [25]. The scheme, which lowers the data-line voltage from 1.5 V to 1 V and raises the ground line to 0.5 V at an active-to-standby mode transition, reduces the total leakage current per cell in the standby mode. At ambient temperature, the measured total current of the conventional scheme is 95 fA. The largest component is the sum of subthreshold current and GIDL current of the n-MOSFET and p-MOSFET, although the V T s are as large as 0.7 V and −1 V. The gate-tunnel current of the n-MOSFET is comparable to the above, despite an electrical t OX as thick as 3.7 nm. The scheme greatly reduces the total current (to 17 fA). An offset source driving (discussed in the subsection on circuit applications in Section 4) by 0.5 V applied to the driver and transfer n-MOSFETs and an electric field relaxation by 0.5 V for all MOSFETs are responsible for the reduction. The reduction is more remarkable at a higher temperature. At 90°C, the total current of the conventional scheme is drastically increased to 1240 fA because of an increase in the subthreshold-current component. Note that GIDL current and gate-tunnel current are insensitive to temperature. The scheme reduces the total current to 102 fA. To cope with the increased SER caused by the reduced signal charge in the standby mode, an ECC was incorporated with a speed penalty of 3.2 ns and an area penalty of 9.7%, although an additional cell-capacitor can also improve the SER [Figure 5(a)] [45, 46]. Figure 11 shows another solution. The cell features a combination of a low-V T transfer MOSFET coupled with an NWL, a boosted power supply (V DH ), and high-V T cross-coupled MOSFETs [47, 48].
The NWL increases cell read-current (Icell) without inducing subthreshold current in transfer MOSFETs. The high-V T MOSFETs reduce the subthreshold current. The V DH increases the signal charge, Q S , and the drivability of driver MOSFETs against the high V T and V T imbalance. As a result, the cell read-current and the static noise margin (SNM) are dramatically improved, as shown in Figures 12 and 13. The cell read-current increases while SNM decreases as the V T of transfer MOSFETs decreases. However, both the current and SNM increase as the V DH is raised. A usual design condition of Icell ≥ 20 µA and SNM ≥ 100 mV can be realized by V DH − V DD ≥ 100 mV at 1.0-V V DD [Figure 12(a)]. Even at a 0.8-V V DD and the same V DH − V DD , it is realized by a lower V T of the transfer MOSFETs [Figure 12(b)]. Moreover, the cell features a strong immunity against V T imbalance, the same as δV T in the previous section. Figure 13 shows SNM calculated for the worst combination of V T imbalance in a cell. For example, at an imbalance of 100 mV, the lower limit of V DD to achieve an SNM of 100 mV is 0.6 V without boosting (i.e., V DH = V DD ). However, it becomes as low as 0.3 V at V DH − V DD = 100 mV. There are no V DD
Figure 11
Improved SRAM cell [48]
and static noise margin (SNM). In the standby mode, however, the generator current becomes larger than the total leakage current of the cell array, calling for a generator-current reduction through circuit techniques that are familiar to DRAM designers [4].
Reduction of subthreshold current in peripheral circuits
Reduction scheme concepts
Increasing V T is the best way to reduce the subthreshold current I leak of a MOSFET that is expressed by
where plus values refer to n-MOSFETs and minus values to p-MOSFETs, V T is the actual threshold voltage, S is the subthreshold swing, K is the body-effect coefficient, and λ is the drain-induced barrier lowering (DIBL) factor [49]. Here, q is the electronic charge, k is the Boltzmann constant, and T is the absolute temperature. Usually I leak is reduced to 1/10 with a V T increment of only 0.1 V (i.e., S ≈ 0.1 V/decade at 100°C). The two ways of obtaining a high-V T MOSFET from a low-actual-V T MOSFET are by increasing the doping level of the MOSFET substrate and by applying reverse biases. Thus, the selective use of the resulting high-V T MOSFETs in low-actual-V T circuits or the reverse biasing of low-actual-V T circuits decreases circuit subthreshold currents.
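The rule of thumb that I leak drops by one decade for every S volts of added V T can be sketched as follows. This is a minimal illustration that assumes a purely exponential subthreshold characteristic and ignores body effect and DIBL; the function names are hypothetical.

```python
import math


def leakage_reduction_factor(delta_vt_v: float, s_v_per_decade: float = 0.1) -> float:
    """Factor by which I_leak falls when V_T is raised by delta_vt_v volts.

    Assumes I_leak is proportional to 10 ** (-V_T / S), with S in V/decade.
    """
    if s_v_per_decade <= 0.0:
        raise ValueError("subthreshold swing must be positive")
    return 10.0 ** (delta_vt_v / s_v_per_decade)


def vt_increase_for_reduction(factor: float, s_v_per_decade: float = 0.1) -> float:
    """V_T increment (in volts) needed to cut I_leak by the given factor."""
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return s_v_per_decade * math.log10(factor)


if __name__ == "__main__":
    # 0.1 V of extra V_T at S = 0.1 V/decade gives one decade of reduction
    print(leakage_reduction_factor(0.1))
    # V_T increment for a three-decade reduction at the same S
    print(vt_increase_for_reduction(1000.0))
```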
Although there have been many attempts to develop reverse-biasing schemes, the basic concepts can still be categorized into the three shown in Table 1:
• (A) Gate-source (V GS ) reverse biasing.
• (B) Substrate-source (V BS ) reverse biasing.
• (C) Drain-source voltage (V DS ) reduction.
Here, the V GS reverse biasing scheme can be further categorized as V S -control with a fixed V G (A1) [14, 15] and V G -control with a fixed V S (A2) [13] . The V BS reverse biasing schemes can be categorized as V B -control with a fixed V S (B1) [12, 50] and V S -control with a fixed V B (B2) [51, 52] .
The efficiencies for reducing leakage for offset voltage δ are plotted in Figure 14 using 0.1-µm MOSFET parameters. The reduction efficiency of (A2) is the I leak ratio without and with V GS reverse bias:
\[ r_1 = 10^{\delta/S}. \]
This is quite large because δ has been directly added to the low-actual V T . The reduction efficiency of (B1) is calculated in the same manner:
This is smaller than r 1 because of the square-root dependence on δ and the small K. (C) has quite a small reduction efficiency of
because of the small λ, unless V DS approaches thermal voltage (kT/q), where I leak is drastically reduced as the second factor of Equation (3). Scheme (A1) has the largest reduction efficiency of r 1 r 2 r 3 because all three effects are combined. (B2) has a reduction efficiency of r 2 r 3 , which is larger than that of (B1) because of the additional effect of reducing V DS . Note the inherently small offset voltage required to reduce the given leakage provided by scheme (A). This not only effectively reduces the subthreshold current in low-power mode, but also achieves a faster recovery time in high-speed mode, as is explained in the next subsection.
The concepts involve two types of biasing, static and dynamic. The former, or so-called dual-V T scheme, is to statically combine low-V T MOSFETs and the resulting high-V T MOSFETs in core circuits. A CMOS dual-V T scheme [53, 54] in which a low V T is applied only to the critical path occupying a small portion of the core is quite effective in simultaneously achieving high speed and low leakage current, although the basic scheme was proposed for an n-MOSFET 5-V 64-Kb DRAM [55]. A difference in V T of 0.1 V reduces the standby subthreshold current to one-fifth its value for a single low V T , although an excessive V T difference might cause a race condition problem between low- and high-V T circuits. The dual-V T scheme is also applied to SRAMs [54, 56]. It was reported that a combination of dual V T and dual V DD achieved a high-speed low-power 1-V e-SRAM [56]. Another application of the dual-V T scheme is a high-V T power switch [12, 14-18] that can cut the subthreshold current of an internal low-V T core in standby mode, as described in the subsection on circuit applications. High-V T MOSFETs can easily be produced in a DRAM [57] by using the internal supply voltages that are required by DRAMs, as explained in the subsection on applications to RAMs.
Figure 14
Leakage reduction efficiency of various concepts in Table 1. Plotted using 0.1-µm MOSFET (channel length = 90 nm, gate-oxide thickness = 2 nm) parameters.
Table 1
Concepts to create effective high-V T n-MOSFETs. The arrows indicate subthreshold leakage current (I leak ).
The high V T , however, eventually restricts the lower limit of V DD as the transconductance of the MOSFET degrades at a lower V DD .
The latter (dynamic biasing) changes the V T so that it is low enough in high-speed modes, such as active mode with no reverse bias, while in low-power modes, such as standby mode, it is increased by changing bias conditions, as shown in Table 1.
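The relative sizes of the reduction efficiencies for schemes (A1), (A2), (B1), (B2), and (C) can be explored numerically. The sketch below assumes a common exponential subthreshold model in which the gate offset, the body effect (through K and an assumed surface-potential term 2ψ), and DIBL (through λ) each add to the exponent; the model form, the symbol 2ψ, all default parameter values, and the function names are assumptions made for illustration and are not taken from the source.

```python
import math


def reduction_efficiencies(delta_v, s=0.1, k_body=0.2, two_psi=0.8, dibl=0.05):
    """Return (r1, r2, r3) for an offset voltage delta_v in volts.

    Assumed model (not from the source):
      I_leak ~ 10 ** ((V_GS - V_T - K*(sqrt(2psi + V_SB) - sqrt(2psi)) + lam*V_DS) / S)
    The (1 - exp(-q*V_DS/kT)) factor is ignored, so r3 is only meaningful
    while V_DS stays well above the thermal voltage.
    """
    r1 = 10.0 ** (delta_v / s)  # V_GS reverse bias
    r2 = 10.0 ** (k_body * (math.sqrt(two_psi + delta_v) - math.sqrt(two_psi)) / s)  # V_BS
    r3 = 10.0 ** (dibl * delta_v / s)  # V_DS reduction through DIBL
    return r1, r2, r3


def scheme_efficiencies(delta_v, **params):
    """Combine r1, r2, r3 as described for the schemes of Table 1."""
    r1, r2, r3 = reduction_efficiencies(delta_v, **params)
    return {"A1": r1 * r2 * r3, "A2": r1, "B1": r2, "B2": r2 * r3, "C": r3}


if __name__ == "__main__":
    for name, value in scheme_efficiencies(0.1).items():
        print(f"{name}: {value:.2f}")
```

With these hypothetical parameters, (A1) gives the largest reduction and (B2) exceeds (B1), in line with the qualitative ranking described above.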
Circuit applications
This section reviews dynamic biasing schemes based on the above basic concepts, assuming circuits in which all MOSFETs have a low actual V T .
Gate-source self-reverse biasing (A1)
Figure 15(a) is a circuit diagram for self-reverse biasing. It features a low-V T switch p-MOSFET Q SP inserted between the source of the MOSFET Q P and V DD . The MOSFET Q SP stacked to Q P is a kind of power switch, working as a source impedance turning on and off during respective active and standby modes. A subthreshold current flowing from Q P when Q SP and Q P are off in the standby mode generates an offset voltage, δ, on V DL as shown in Figure 15(b), automatically providing a reverse bias δ to Q P so that the current is eventually reduced. This biasing is a combination of V GS reverse biasing, V BS reverse biasing, and V DS reduction, with V GS reverse biasing providing the primary effect and V BS reverse biasing and V DS reduction providing the secondary effects, as described above. The gate voltage is V DD , not V DL , to take advantage of the V GS reverse bias. Note that no matter how large the original leakage current at Q P is, it is eventually confined to the constant current of Q SP through the automatic adjustment of the offset voltage δ. Here, δ is expressed as V TS − V TP + S log(W P /W S ), and the current reduction ratio is expressed as 10^(−δ/S) if secondary effects are neglected [4]. Thus, the reduction is adjustable with δ, that is, V TS and W S . If V TS is high enough, the current is completely cut off with a larger δ, creating a perfect switch. A large δ, however, results in slow recovery time, large charging/discharging current, and spike noise at mode transitions. If V TS is low enough, however, δ becomes smaller (allowing leakage flow), causing an imperfect (leaky) switch, but the above problems are reduced. Moreover, a low-V T switch is favorable to reduce the necessary channel width of Q SP , because the increased transconductance can supply the accumulated current of the logic core with a smaller channel width, especially at a lower V DD .
Figure 15
Circuits for self-reverse biasing (A1) [14, 15]: (a) Principle; (b) operating waveforms; (c) application to iterative circuits. W S and W P denote the respective channel widths of Q SP and Q P . V TS and V TP denote the respective threshold voltages of Q SP and Q P .
Figure 16
Leakage reduction due to stacking MOSFETs using concepts (A1) and (C) in Table 1. The same parameters as in Figure 14 are used.
Sharing a low-V T switch through iterative circuits in RAMs [Figure 15(c)] is quite effective [14, 15]. Because a feature of RAM circuits is that only one of the iterative circuits is active, W S can be comparable to W P with little speed penalty in the active mode, while δ = S log(nW P /W S) in the standby mode for V TS = V TP. Therefore, both the leakage and the area penalty resulting from adding Q SP become negligible as n (and thus δ) increases. To be more precise, secondary effects must be taken into consideration: The substrate connection of Q P to V DD creates substrate reverse bias. The effect of reduced V DS is also added if δ is large (i.e., a small V DS).
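The relations quoted above for the offset voltage and the current reduction ratio can be evaluated numerically. The following is a minimal sketch rather than a design formula: the function name and the numerical values (S = 0.1 V/decade, V TS = V TP = 0.3 V, W S = W P, n = 100) are hypothetical, and secondary effects (V BS reverse biasing and V DS reduction) are neglected, as in the text.

```python
import math


def self_reverse_bias(v_ts, v_tp, s, w_p, w_s, n=1):
    """Return (delta, leakage_ratio) for a switch Q_SP shared by n circuits Q_P.

    delta         = V_TS - V_TP + S * log10(n * W_P / W_S)   [V]
    leakage_ratio = 10 ** (-delta / S)
    s is the subthreshold swing in V/decade. Secondary effects are neglected.
    """
    delta = v_ts - v_tp + s * math.log10(n * w_p / w_s)
    return delta, 10.0 ** (-delta / s)


# Hypothetical example: equal thresholds, W_S = W_P, 100 circuits share one switch.
delta, ratio = self_reverse_bias(v_ts=0.3, v_tp=0.3, s=0.1, w_p=1.0, w_s=1.0, n=100)
print(f"delta = {delta:.3f} V, leakage reduced to {ratio:.3g} of the original")
```

Under these assumed values the offset is S log(100) = 0.2 V, so the total leakage of the 100 circuits is confined to roughly that of a single circuit.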
An extreme case of W S = W P and n = 1 is in the I leak reduction of series-connected MOSFETs, the so-called stacking effect [58, 59]. This effect can be explained by a combination of self-reverse biasing (A1) and V DS reduction (C), as Figure 16 shows, though (C) is not used alone. The leakage current of Q P is reduced through self-reverse biasing, while that of Q SP is reduced through reducing V DS. The lowering of the node voltage V M at the connection and the I leak reduction efficiency are determined by the equilibrium of the two currents and expressed by the crossing point of the two curves. Because the reduction efficiency becomes larger as the number of series MOSFETs becomes larger, the I leak of NAND gates using series-connected n-MOSFETs is efficiently reduced.
Offset gate driving (A2)
Figure 17(a) shows offset gate driving, where the input voltage is "overdriven" by δ. This is difficult to apply to random logic circuits because the logic swing of the output must be smaller than that of the input. However, it is useful to reduce I leak in bus drivers [13], in power switches that have a low actual V T (Figure 17(b) [60]), and in RAM cells (Figure 17(c) [47, 61]), as was previously explained. Offset gate driving applied to an imperfect switch reduces I leak in standby, realizing an effectively perfect switch. However, the problems of a perfect switch described above arise.
Figure 17
Circuits for offset gate driving (A2) [13]: (a) Principle; (b) application to power switch [60]; (c) application to RAM cells (negative word line) [47, 61].
Substrate (well) driving (B1)
Figure 18(a) shows the circuit for substrate (well) driving, where the substrate voltages of MOSFETs in core circuits change between active and standby modes [12, 50, 62, 63]. Figure 18(b) shows the operating waveforms. This scheme can also be applied to reduce I leak in power switches (Figure 18(c) [64]).
Offset source driving (B2)
Figure 18(d) [51, 52] shows the circuit for offset source driving, with switches Q SP and Q SN inserted between the MOSFET sources and power supplies. Note that this is quite different from (A1), though both utilize source switches. The input (gate) voltage of (A1), which is the output of the previous stage, is "full swing" (V DD), while that of (B2) is not (i.e., V DL or V SL). This difference results in the large discrepancy in I leak reduction efficiency, as shown in Figure 14. From this viewpoint, power switches [17] applied to logic circuits can be categorized as (B2). Another application of this scheme is to reduce I leak in SRAM cells [25, 65], as was discussed earlier.
Comparison
There is a big difference between the two schemes (A) and (B) in mode-transient time, especially recovery (standby-to-active) time. In V GS reverse biasing, the small voltage swing, δ, enables quick recovery (several nanoseconds). In V BS reverse biasing, however, it takes more than 100 ns for recovery when it is applied to a power line, because V BS reverse biasing requires a large V B swing (ΔV B) or V S swing (ΔV S), which is usually more than 1.5 V for a given change in V T (ΔV T). The necessary voltage swing imposes different requirements on substrate driving (B1) and offset source driving (B2). In (B1), the necessary voltage, which is the sum of V DD and ΔV B, is significantly larger than V DD. For example, existing MOSFETs with a body-effect coefficient (K) of 0.2 V^(1/2) require a ΔV B as large as 2.5 V to reduce the current by two decades with a 0.2-V ΔV T. A larger-K MOSFET is needed to reduce the swing. However, this slows down the speed in stacked circuits, such as NAND gates. In contrast, the K value decreases with MOSFET scaling, implying that the necessary ΔV B will continue to increase further in the future owing to a lower K, and there will be a need for a larger ΔV T reflecting the low-V T era. Eventually, this will enhance short-channel effects and increase other leakage currents, such as the GIDL current [66]. A shallow reverse V B setting, or even a forward V B setting in active mode, is also required to effectively increase V T in standby mode, because V T is more sensitive to V B [4]. However, the
Figure 18
Circuits for substrate-source voltage (V BS ) reverse biasing: (a) Substrate (well) driving (B1) [12, 50] ; (b) its operating waveforms; (c) application to power switch [64] ; (d) offset source driving (B2) [51, 52] ; (e) its operating waveforms.
These problems include spike current and CMOS latch-up during power-on and mode transitions, V BB degradation caused by increased substrate current in high-speed modes and screening tests at high stress V DD, and slow recovery time as a result of poor current drivability of the on-chip charge pump. In offset source driving (B2), the necessary voltages and voltage swing at any node are smaller than V DD. This control becomes ineffective as V DD is lowered owing to a smaller substrate bias. However, the problems described above accompanied by an on-chip V BB generator are not expected.
The energy overhead of offset source driving (B2) through mode transitions is usually larger than that of substrate driving (B1). This is because the parasitic capacitances of source lines (V DL and V SL) are larger than those of substrate lines (V BBP and V BBN), though the necessary δ is smaller, as shown in Figure 14. The parasitic capacitances of V BBP and V BBN consist mainly of junction capacitances between substrate (well) and source/drain of MOSFETs, while those of V DL and V SL include the gate capacitances of on-state MOSFETs as well as junction capacitances. The energy overhead of self-reverse biasing (A1) is quite small because of small and self-adjusted δ.
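The substrate-swing figure quoted in this comparison (a ΔV B of about 2.5 V for a 0.2-V ΔV T with K = 0.2 V^(1/2)) can be checked with the textbook body-effect relation V T = V T0 + K(√(2ψ F + V SB) − √(2ψ F)). The sketch below is illustrative only: the surface-potential term 2ψ F = 0.6 V, the zero initial bias, and the subthreshold swing of 0.1 V/decade are assumptions not stated in the text, and the function names are hypothetical.

```python
import math


def delta_vb_for_delta_vt(delta_vt, k, two_psi_f=0.6, v_sb0=0.0):
    """Substrate swing (V) needed to raise V_T by delta_vt.

    Inverts delta_vt = K * (sqrt(2psi_F + V_SB0 + dVB) - sqrt(2psi_F + V_SB0)).
    two_psi_f = 0.6 V and v_sb0 = 0 V are assumed values.
    """
    root0 = math.sqrt(two_psi_f + v_sb0)
    return (delta_vt / k + root0) ** 2 - (two_psi_f + v_sb0)


def decades_of_reduction(delta_vt, s=0.1):
    """Decades of subthreshold-current reduction for a V_T increase (s in V/decade, assumed)."""
    return delta_vt / s


dvb = delta_vb_for_delta_vt(delta_vt=0.2, k=0.2)
dvb_low_k = delta_vb_for_delta_vt(delta_vt=0.2, k=0.1)  # hypothetical scaled device
print(f"K = 0.2: dVB = {dvb:.2f} V for {decades_of_reduction(0.2):.0f} decades")
print(f"K = 0.1: dVB = {dvb_low_k:.2f} V")
```

With these assumptions the first case gives a swing close to the 2.5 V cited above, and halving K raises the required swing sharply, which is the scaling concern raised in the text.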
Applications to RAMs
Features of RAMs
In the active mode, reducing leakage is extremely difficult because of the limited time to control it. In the standby mode, it is rather easy because there is sufficient time available. Fortunately, however, RAM peripheral circuits favor the reduction of subthreshold current (I leak ) (Figure 19 ) compared with random logic gates, because of the inherent features of RAMs described in the following. These are exemplified by the modern synchronous DRAM in the figure.
Use of iterative circuit blocks
RAMs consist of multiple iterative circuit blocks with low activation ratios, such as row/column decoders and drivers, each of which has quite a large total-channel width involving subthreshold current. In addition, all circuits in each block, except the selected one, are inactive, even during the active period. This enables I leak to be controlled simply and effectively with a smaller area penalty than logic LSIs, as shown in Figure 15(c) .
Use of input-predictable logic
RAMs are composed of input-predictable circuits, allowing circuit designers to predict all node voltages in the chip and to prepare the most effective subthreshold-current reduction scheme (e.g., V GS self-reverse biasing) in advance. As for input nodes, which are not predictable, the level-fixing input buffer (Figure 20) [15] can force the internal node voltages to be predictable. In standby mode (signal STANDBY is at high level), internal nodes including a i, its complement ā i, and the following-stage outputs are forced to be at predetermined levels, irrespective of input node A i. Similar techniques are applied to logic LSIs, though their node voltages are usually unpredictable because they contain registers or latches to retain internal states. Latches (Figure 21) [59] that fix the output level while retaining the latched data are effective in reducing I leak in sleep mode. Level-fixing flip-flops [67] combined with self-reverse biasing [15], power switches [60], and level holders [18] enable quick recovery from sleep mode. These techniques, in turn, can be applied to RAM peripheral circuits with registers or latches.
Slow cycle
RAMs feature a slow cycle t RC compared with random logic gates, and this allows each circuit to be active for only a short period within the "long" memory cycle, leaving additional time to control the subthreshold current. This is true for DRAM row circuits, which are slow enough to accept leakage controls. However, the column circuits in modern DRAMs (Figure 19) feature a fast burst cycle and unpredictable circuit operation (every column may be selected during the memory cycle). Therefore, it is difficult to reduce I leak in column circuits in the active mode. This is the case for high-speed SRAMs and logic LSIs.
Figure 20
Method to make the internal nodes of RAMs predictable. Each node voltage during standby mode (standby signal is at high level) is in parentheses. Q SP, Q SN, and solid inverters consist of high-V T MOSFETs. Other logic gates consist of low-V T MOSFETs. Reproduced from [15] with permission; © 1993 IEEE.
Use of robust circuits
RAMs do not use leakage-sensitive circuits, such as dynamic NOR gates, that require a level keeper to prevent malfunctions caused by leakage [68] . The decoders of modern CMOS DRAMs consist of dynamic (for the row) and static (for the column) NAND gates to reduce the power (Figure 19 ). NAND decoders discharge only one output node in a selected decoder, while the NOR decoders used in the n-MOS era discharged all output nodes in decoders, except for the selected one.
In contrast, it is difficult to reduce I leak in random logic circuits because of the noniterative circuit topology, higher activation ratio, unpredictable node states, and faster cycle. Dual static V T [53] , the stack effect in NAND gates described above, and circuit reordering [69] are effective to some extent in reducing I leak in the standby mode of logic LSIs. However, reducing I leak in random logic circuits in the active mode is more difficult. The only scheme that has been reported thus far is dual static V T , though it has limited reduction efficiency because of the limited V T difference, as previously explained. More effective schemes have yet to be discovered.
Applications to DRAM standby mode
The reduction of subthreshold leakage current applied to iterative circuit blocks, such as a word-driver block, is extremely important in memory design. For example, a low-V T p-MOS switch [Q SP in Figure 22(a)] [14, 15] shared with the n word drivers of a 256-Mb DRAM [70] enables the common power line, V DL, to drop by δ as a result of the total subthreshold current flow of nI when the switch is off in standby mode. As it provides each p-MOS driver, Q, with a δ self-reverse bias, the subthreshold current, I, eventually decreases. Hence, even if an on-chip charge pump for the raised supply V DH necessary for DRAM word-line bootstrapping suffers from poor output-current drivability, the V DH is well regulated. In the active mode, the selected word line is driven after V DL is connected to a supply voltage, V DH, by turning on Q SP. Here, the channel width of Q SP can be reduced to an extent comparable to that of Q without a speed penalty because of the low activation ratio, 1/n, of the drivers. In a 256-Mb chip, a δ as small as 0.25 V reduced the standby subthreshold current of word drivers and decoders by two decades [Figure 22(b)] without inflicting penalties in terms of speed and area.
Another example is shown in Figure 23(a). This 256-Mb SDRAM [57] with a hierarchical word-line structure utilizes the self-reverse biasing described above combined with "pseudo" multiple static V T using substrate biasing. The circled MOSFETs in the figure are in the subthreshold region during standby mode. Here, self-reverse biasing is applied only to p-MOSFETs (open circles) that produce larger subthreshold current. This is because p-MOSFETs have larger total channel width and larger subthreshold swing due to the buried-channel MOSFET structure. The n-MOSFETs (shaded circles) and the p-MOSFETs in the column decoder have higher V T due to the respective well bias V BB and V DH. By combining both schemes, the total subthreshold leakage current in the power-down/self-refresh mode is reduced to one sixth, as Figure 23(b) shows. The current can be further reduced by applying both schemes to the peripheral circuits.
Figure 23
Various leakage-reduction schemes applied to 256-Mb SDRAM [57]: (a) Application to array-associated circuitry; (b) leakage-current reduction. V T is defined by a current density of 10 nA/15 µm. The peripheral circuits component is from peripheral MOSFETs without substrate bias.
Applications to DRAM active mode
In the future, with a further reduction in V T , the subthreshold leakage current, I DC , will exceed the capacitive current, I AC , and eventually dominate the total active current, I ACT , of the chip [ Figure 6 (b)], as pointed out as early as 1993 [18, 71] . V GS back-biasing applied to an iterative circuit block, which is divided into m subblocks, each consisting of n/m circuits (Figure 24) , confines the currents to that of a single selected sub-block [18] . This is because all nonselected sub-blocks have no substantial subthreshold current due to V GS back-biasing (Figure 22 ) when the switch of the selected sub-block, including the selected word line, is turned on while the others remain off. The above-mentioned multi-static V T also reduces current. The subthreshold currents of low-V T circuits on the critical path are reduced by combining power switches and high-V T level holders (Figure 25 ) [18, 72] . The power switch goes off just after evaluating the input of the low-V T circuit and holding the evaluated output at the holder. This prevents the output from discharging, allowing the switch to quickly turn on at the necessary time to prepare for the next evaluation. This is a good example of the principle of avoiding large voltage swings with heavily capacitive loads. In fact, it has been reported that these circuits could reduce the active current of a hypothetical 16-Gb DRAM [18, 71] 
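The effect of dividing an iterative block into sub-blocks can be illustrated with a simple leakage budget. This is a rough sketch under stated assumptions, not the analysis of [18]: it assumes that each nonselected sub-block is limited to the off current of its own switch (the self-reverse-biasing property described earlier), that every circuit in the selected sub-block leaks at its full value, and that the currents used (1 nA per circuit and per switch) and the block sizes are hypothetical.

```python
def active_leakage(n, m, i_circuit, i_switch):
    """Active-mode subthreshold current of n circuits split into m sub-blocks.

    Selected sub-block (switch on): n/m circuits each leak i_circuit.
    Each of the m - 1 nonselected sub-blocks (switch off) is limited to i_switch.
    Both the model and the parameter names are illustrative assumptions.
    """
    if m < 1 or n % m != 0:
        raise ValueError("n must be divisible by m, and m must be at least 1")
    selected = (n // m) * i_circuit
    nonselected = (m - 1) * i_switch
    return selected + nonselected


# Hypothetical block: 4096 drivers, 1 nA each, switch off current also 1 nA.
undivided = active_leakage(n=4096, m=1, i_circuit=1e-9, i_switch=1e-9)
divided = active_leakage(n=4096, m=64, i_circuit=1e-9, i_switch=1e-9)
print(f"undivided: {undivided * 1e9:.0f} nA, 64 sub-blocks: {divided * 1e9:.0f} nA")
```

In this toy budget the division cuts the active leakage by about a factor of 32. The benefit saturates once the switch currents of the nonselected sub-blocks become comparable to the leakage of the selected one.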
Speed variations and other issues with peripheral circuits
Other key peripheral circuits are sense amplifiers and low-voltage supporting circuits, such as level shifters, stress-release I/O circuits, and on-chip supply-voltage generators in RAM chips (Figure 3). They play important roles in the stability and speed of RAMs. However, well-known logic-gate blocks in peripheral circuits are also important in terms of suppression of speed variations, as explained earlier. Power management is essential for high-speed, low-power designs of the blocks. Testing methodology that is relevant to leakage currents is also a major area of concern.
Figure 24
Active leakage current reduction with partial activation of multi-divided subarray [18] .
Sense amplifiers
Sense amplifiers (SAs) are inherently slow because they handle a small signal, and thus require a high-speed design achieved by reducing speed variations. The design of SAs [4], which usually have a cross-coupled circuit configuration for low power and small area, can be different for DRAMs and SRAMs. This is because the necessary size, the number in a chip, and the circuit operation are usually different. DRAMs feature a huge number of tiny SAs in a chip, because one SA must be placed at each data line due to refresh requirements. In addition, in the standard mid-point (half-V DD) sensing of DRAMs [4], the SA must operate at the lowest voltage (i.e., half-V DD) in the chip, despite the resulting halved data-line power without a dummy cell and with a low-noise array [4]. As a result, the statistically large V T variations, σ(V T), and low-voltage operation slow down sensing with a wide spread in speed. Increasing the size of SA MOSFETs to reduce σ(V T) and using redundancy and/or ECC to prevent SAs from acquiring an excessively large δV T are effective solutions that are similar to those associated with the V T -mismatch issue previously explained in the subsection on cell signal charge in Section 2. In overdrive sensing [73, 74], this problem is solved by applying a higher voltage solely to SA inputs by isolating the data line from the SA or by capacitive coupling. Using additional capacitors may be acceptable in e-DRAMs, where area is of less concern. The recently presented full-V DD (or ground) sensing with a dummy cell [5], which is a revival of the kind of sensing done during the n-MOS DRAM era of the 1970s, solves the problem with a raised voltage (i.e., V DD). SRAMs have a small number of SAs on a chip, although they must be highly sensitive for a higher speed. Thus, in addition to some of the above solutions for DRAMs, a low-voltage current SA [75] may be acceptable despite the increase in area.
Low-voltage supporting circuits
High-speed level shifters that are proposed for SoCs [76, 77] and bridge the internal low-voltage core and high-voltage I/O circuits could be used for RAMs. Low-cost stress-release I/O circuits [78-80] that manage the high voltage at the interface with a single thin t OX are also important. On-chip supply-voltage generators [4] continue to be essential in the stable operation of RAM cells with high supply voltages and in standardizing the power supply of standalone RAMs. In addition, they reduce subthreshold currents with multi-V T (Figure 23) and speed variations at lower external supply voltages, as discussed below. Key issues are a high efficiency of voltage conversion, a high degree of accuracy in the output voltage, low power during the standby period, and a low cost of implementation [27].
Figure 26
Active current reduction in hypothetical 16-Gb DRAM. V T is defined by a current density of 10 nA/5 µm. Reproduced from [18] with permission; © 1994 IEEE.
Power management
Power management is a solution to suppress speed variations and further reduce the power dissipation of power-aware systems through static and dynamic control of supply voltages. Power management can also effectively reduce subthreshold currents with V BB control, as mentioned earlier. Many schemes have thus far been proposed. The following subsections give a brief discussion of power-management problems that DRAM designers have experienced, followed by various viewpoints on power-management schemes that have been proposed by logic designers principally for SoCs.
In the past, DRAM designers encountered numerous problems that occurred even in static or quasi-static V BB and V DD. It is well known that the DRAM has been the only large-volume production LSI using a substrate bias that is supplied from an on-chip V BB generator. In the n-MOS DRAM era, when a quasi-static V BB was supplied to the p-type substrate of the whole chip (i.e., both array and periphery), the generator caused instabilities (surge current [55] or a degraded V BB level [4]) at power-on and during burn-in high-voltage stress tests, and shortened the refresh time of cells due to minority-carrier injection to cells [4]. Poor current drivability of the generator consisting of charge pumps, a large substrate current generated from the peripheral circuits, and the substrate structure were mainly responsible for the instabilities. Even so, DRAM designers were fortunate because both the static bias setting of a deep V BB of about −2 V to −3 V and a sufficiently high V T of about 0.5 V allowed stable chip operation with small changes in V T, even with quite large quasi-static V BB variations and V BB noise [4]. In the CMOS era, substrate bias was removed from peripheral circuits primarily to eliminate instabilities caused by the generator and has only been supplied to the array to ensure stable operation.
Even a V DD bump as small as ±10% made dynamic circuits unstable during the n-MOS era. This was due to a charge being trapped at floating nodes when voltage bumps were applied, causing malfunctions at the next cycle. Note that almost all peripheral circuits and DRAM cells were dynamic. Thus, a small diode-connected n-MOS (i.e., level keeper) was connected to the floating nodes of peripheral circuits to allow trapped charges to escape. However, bumps degraded the voltage margin of n-MOS cells, calling for grounded-plate cell capacitors [4] as a partial solution. Even in the CMOS era, memory cells, sensing-relevant circuits (such as data-line precharge circuits and sense amplifiers), and row decoders/drivers were still dynamic, while other peripheral circuits have been static. Half-V DD sensing [81] (coupled with a half-V DD cell plate and a boosted word line) has been a circuitry solution because the margins of DRAM cells and the relevant sensing circuits are maintained wide despite voltage bumps. A CMOS feedback level keeper that is familiar to logic designers has been widely used for other dynamic circuits.
Static control of power-supply voltages
Static control is effective in suppressing speed variations of logic circuits while preserving stability of memory cells and memory-cell-relevant circuits. When V BB or internal V DD is statically controlled on the basis of parameter variations, inter-die speed variations can be suppressed, although intra-die speed variations remain unimproved. Negative effects, if any, when supply voltages are controlled statically or quasi-statically could be managed, as memory designers have done thus far. Controlling V BB with an on-chip V BB generator to adjust V T (the basic idea dates back to 1976 [62, 82, 83] ) could be widely used to suppress the variations if the previously discussed drawbacks are rectified. Controlling forward V BB , however, is more effective in reducing speed variations [84 -86] because the V T -V BB characteristics are more sensitive to V BB [4] . For example, controlling forward V BB reduced V T variations in logic circuits and improved speed of operations by 10% [85] . If a forward V BB is used, however, the requirements to suppress noise become more stringent, calling for a uniform distribution of the forward V BB throughout the chip [27] . Additional current consumption, in the form of bipolar current induced by the forward V BB , is another matter [85] that must be considered.
Control of internal V DD with an on-chip voltage-down converter (i.e., series regulator) [4] seems to be more practical, because the instabilities discussed above are not involved. In fact, a V DD control with both an off-chip buck converter and an internal-delay-detecting circuit [87] reduced the variation between speeds of the worst and best design conditions from five times to ±20% at 0.5 V. However, the use of an on-chip voltage-down converter instead of the buck converter may be more practical because designs of the converter are simpler and have been well established in DRAM designs despite a lower conversion efficiency.
Dynamic control of power-supply voltages
Dynamic control reduces power dissipation and subthreshold currents. However, the problems described above might be compounded and become serious if dynamic control of V DD and/or V BB were applied to RAM chips, because they involve wide and dynamic changes in supply voltages and extremely low V T . Nevertheless, many attempts have been made, although only for the logic blocks of SoCs. Unfortunately, RAM cells and their relevant circuits are incompatible with dynamic controls, and thus they should at least be "quiet." Moreover, they must operate at a higher V DD . Their inherently small voltage margins are responsible for the requirements for the quiet and higher-V DD operation, as previously explained. Thus, as long as the controls never cause detrimental effects to RAM cells and their relevant circuits, some of them could be applied to parts of peripheral logic circuits (e.g., static circuits) in RAM chips or RAM blocks in SoCs. Note that SRAM blocks using full CMOS SRAM cells may accept dynamic voltage controls to some extent because of wide voltage margins, although care should be taken if dynamic sensing schemes are adopted.
Power switches [88] completely cut leakage currents of internal core circuits, although they incur a long recovery time on heavily capacitive internal power lines, as was explained in the subsection on circuit applications in Section 4. Dynamic voltage scaling (DVS) [89, 90] , in which the clock frequency and V DD vary dynamically in response to the computational load, provides reduced energy consumption per process during periods when few computations are performed, while still providing peak performance when required. Note that the highest V DD and lowest V DD that DVS can accept must be determined by the breakdown voltage of MOSFETs and the stability of RAM cells, respectively. This approach, however, becomes less effective in the low-V DD era because the range across which it is possible to vary V DD becomes narrower. In addition, successful operation over a wide range of V DD requires the accurate tracking of all circuit delays. Furthermore, applying DVS would make dynamic circuits (e.g., e-DRAMs) unstable without a level keeper [90] , although resultant instabilities depend on the changing rate of V DD and clock frequency.
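The narrowing benefit of DVS at low V DD can be illustrated with a deliberately crude model. The sketch below assumes that the clock frequency scales linearly with V DD and that the dynamic energy per operation scales as V DD squared; neither assumption comes from the text, the function name is hypothetical, and the voltage limits are example values standing in for the breakdown-limited maximum and the RAM-cell-stability-limited minimum.

```python
def dvs_operating_point(load, v_max, v_min):
    """Pick V_DD for a fractional load and return (v_dd, relative_energy_per_op).

    Assumptions (illustrative only): frequency is proportional to V_DD, and
    dynamic energy per operation is proportional to V_DD ** 2.
    v_max stands for the MOSFET breakdown limit, v_min for RAM-cell stability.
    """
    if not 0.0 < load <= 1.0:
        raise ValueError("load must be in (0, 1]")
    v_dd = max(v_min, load * v_max)  # cannot go below the stability floor
    return v_dd, (v_dd / v_max) ** 2


# Hypothetical wide range (1.8 V down to 0.9 V) versus narrow range (1.0 V down to 0.8 V).
wide = dvs_operating_point(load=0.25, v_max=1.8, v_min=0.9)
narrow = dvs_operating_point(load=0.25, v_max=1.0, v_min=0.8)
print(f"wide range:   V_DD = {wide[0]:.2f} V, energy/op = {wide[1]:.2f} of peak")
print(f"narrow range: V_DD = {narrow[0]:.2f} V, energy/op = {narrow[1]:.2f} of peak")
```

Under these assumptions the same light load saves far less energy per operation when the allowed V DD range is narrow, which is the trend described above for the low-V DD era.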
For partially depleted (PD) SOIs, a wide changing of V DD may cause additional instabilities due to the floating-body effect. DVS does not reduce subthreshold currents; these currents are reduced by elastic-V T CMOS [52], where the clock frequency, V DD, and V BB are all dynamically varied. However, substrate noise may be coupled from the V DD power line when V DD is changed, which is hazardous in an on-chip V BB scheme. The cost and complexity of design are additional problems.
System-level low-power techniques introduced into a SoC would be effective if the problems described above could be solved. For example, ChipOS [91] was introduced to specify the acceptable maximum power and thus, maximum junction temperature. The power of the logic block for each sub-block is managed by controlling the gated clock and power switch to achieve a given power budget. In autonomous decentralized low-power systems [92, 93] , the frequency, supply voltage, substrate bias voltage, and power switch of each sub-block are all controlled by the system, according to its supplied processing load, to achieve the minimum power consumption. Even in this scheme, high-speed controls (e.g., for fast wake-up) of subthreshold currents of selected and nonselected sub-blocks would be essential.
Testing
Testing of low-voltage RAMs is problematic. A large subthreshold current makes it difficult to discriminate between defective and non-defective V DD currents (i.e., I DDQ currents), and thereby poses a problem in the I DDQ testing of low-voltage CMOS circuits. I DDQ testing with the application of a reverse V BB [94] is effective when low-temperature measurement and multi-V T design are combined. Lowering V DD only at detection is also important because it dramatically reduces GIDL currents. The unusual temperature dependence of speed (even nullified) at a lower V DD [95, 96] is another concern in speed testing.
Future prospects
On the basis of the above, we present future perspectives on low-voltage RAMs in terms of devices and processes, memory cells, peripheral circuits, and architectures.
Devices and processes
Device structure of RAM chips
In the near future, RAMs must unavoidably take at least a dual-t OX , dual-V T , and dual-V DD approach because of different requirements between RAM cells and peripheral circuits, as discussed in the subsections on cell signal charge and leakage currents in Section 2. RAM cells require an ever-higher V T ( Figure 5 ) and thus, a high V DD and thick t OX for stable and reliable operation. In contrast, peripheral circuits (or logic blocks on a SoC) require a low V DD , low V T , and thus, thin t OX for fast and low-power operations, according to ITRS trends [8] . For a higher I/O interface voltage, a triple t OX would be popular.
Low-leakage currents
Even one of the most-advanced schemes (Figure 10) would be less effective for lower-voltage, larger-capacity SRAMs. The resultant total current is as large as 1.6 A even for a memory capacity as small as 16 Mb, even when a large V T, a thick t OX, and offset source driving are all combined. Thus, much larger V T and thicker t OX are needed in the future, calling for new devices such as fully depleted (FD) SOIs with a reduced S-factor and new gate-insulator materials. In addition, lowering V DD while keeping the voltage swing the same to preserve the effectiveness of the scheme increases SER to unacceptable levels because of decreased Q S in the standby mode, calling for soft-error-immune devices as well as on-chip ECC circuits.
Low voltage and high speed
PD-SOIs [97] have been successfully used for products such as MPUs because they improve the performance of standard digital logic by 20-35% over the comparable bulk process due to reduced capacitance. Major concerns with PD-SOIs, however, are the instabilities [4, 97] caused by the floating body. In particular, the resulting V T variations degrade margins of cells and their relevant circuits, and the degradation is further enhanced at a lower V DD. For SRAMs, some solutions have been proposed. These include reducing the number of cells connected to one column [98] to lower the accumulated subthreshold leakage from nonselected cells. A body contact applied to the paired MOSFETs of a sense amplifier [99] reduces sense-amplifier offset. A body-tied substrate with partial trench isolation [100, 101] is a solution to significantly improve immunity against soft errors while eliminating instabilities. The floating body in DRAMs degrades data retention time in the 1-T DRAM cell [102]. A combination of bulk for the DRAM cell array and an SOI for the peripheral circuits [103] is a solution despite the costly substrate structure.
The use of a dynamic threshold MOSFET (DTMOS) [104] , which is built with the body connected to the gate and thus enables a non-floating body, is attractive in terms of low-voltage operation and the suppression of speed variations. This lowers the upper limit of the V DD to less than 0.5 V, even at room temperature, because of the rapid increase in pn-forward current [85] . However, the feature of self-corrective V T [85, 87] that DTMOS provides can suppress speed variations.
Although the concept of DTMOS was originally proposed with PD-SOIs despite the highly resistive body, it has also been realized with bulk MOSFETs with a low-resistive body [87]. Coupled with an internal V DD control, the DTMOS with bulk MOSFETs reduced the delay variation (i.e., delay difference between the worst and best design conditions) to one-fiftieth at 0.5 V. In addition, it realized a drive current three times greater than a conventional CMOS, while reducing the subthreshold current by two orders of magnitude.
FD-SOIs are also attractive in low-voltage operation because of the reduced S-factor, a small junction capacitance, small body-bias effects, and a small layout area. Thus, excellent performances [87, 105-107] have been achieved with multi-V T (dual/triple) FD-SOI, despite low voltages (0.3-0.5 V) and still large (0.25-µm) FD-SOI processes. In the 0.1-µm or less era, however, we need to reduce additional V T variations [108], if any, caused by thickness variations of the thin body and to attain multi-V T in specific MOSFETs to reduce subthreshold currents, although uses of special gate materials [109] and gate doping [110] have been proposed. Note that realizing multi-V T through dynamic V BB is impossible with FD-SOIs because of the lack of a body.
Because it seems unlikely that device and process solutions will be developed in time, the pace at which V DD is being lowered should be slowed so that larger MOSFETs are acceptable. Hence, vertical MOSFETs [111], which accept a large channel length and a thick t OX without sacrificing density, might be effective. Vertical MOSFETs may also reduce RAM cell areas [41]. If the above attempts are unsuccessful, low-temperature bulk CMOS [112] may have a resurgence in the future.
Memory cells
In addition to small, high-speed ECC circuits, new RAM cells such as gain cells are indispensable, as explained in the subsection on DRAM cells in Section 3. In the long run, however, high-speed, high-density nonvolatile RAMs show strong potential for use as low-voltage memories. In particular, the leakage-free and soft-error-free structures and the nondestructive read-out and non-charge-based operations that they could provide are attractive in terms of achieving fast cycle times, low power with zero standby power, and stable operation, even at a lower V DD. Simple planar structures, if possible, would cut costs. In this sense, magnetic RAMs (MRAMs) [113] and Ovonic Unified Memories** (OUMs**) [114] are appealing propositions. In MRAMs, one major challenge remains: reducing the magnetic field needed to switch the magnetization of the storage element. In OUMs, managing the proximity heating of the cell is an issue. In addition, the scalability and stability required to ensure nonvolatility remain unresolved because development is still in its early stages.
Peripheral circuits and architectures
As far as RAMs are concerned, the subthreshold currents in the active mode could be reduced by improving the above-described CMOS circuits, unless they are too fast. In high-speed RAMs, such as fast SRAMs or high-speed column-mode DRAMs, however, current reduction is extremely difficult, as discussed in the subsection on applications to RAMs in Section 4. This suggests that a high-speed SoC will suffer from incredibly high power dissipated by its random logic gates because it may remain impossible to control subthreshold currents from these logic gates at a sufficiently high speed. Hence, the number of gates must be reduced. This implies that new SoC architectures will be required, such as memory-rich SoCs, which effectively reduce the subthreshold current. In addition to new architectures, low-power techniques learned from "old circuits," such as bipolar, BiCMOS, E/D MOS, capacitive boosting, CML circuits, and even I²Ls, might be necessary.
Summary
