First, key issues for low-voltage (<1 V) embedded RAMs are summarized in terms of stable operation, suppression of leakage (gate-tunneling/subthreshold) currents, and speed variation of memory cells and peripheral logic circuits. Next, DRAM and SRAM cells that cope with the above issues, circuit design focusing on the subthreshold-current issue, and suppression of or compensation for design-parameter variations to reduce the speed variations are discussed. Voltage converters and power management for low-power and low-voltage operation are also explained. Finally, based on the above discussions, a perspective is given with emphasis on needs for simple/high signal-to-noise ratio memory cells (such as gain cells) with a pure logic compatible process, high-speed subthreshold-current reduction focusing on active mode, and memory-rich SoC architectures.
INTRODUCTION
Low-voltage embedded RAM (eRAM) is vital for low-power system-on-a-chip (SoC) technology [1]. To take advantage of device miniaturization, however, the power-supply voltage (VDD) of eRAMs must keep up with the rapid lowering of the VDD of MPUs, as discussed later. As VDD makes the transition to sub-1-V levels there are three major challenges [1]: stable operation of memory cells against reduced signal charges; suppression of the ever-increasing leakage (subthreshold and tunneling) currents of the MOSTs in the peripheral logic circuits and memory cells; and compensation for the increased variations of speed that are seen at lower levels of VDD.
In this paper, low-voltage eRAM circuits are discussed with emphasis on the subthreshold-leakage current issue. First, low-voltage trends are discussed, and then recent developments addressing the above issues for DRAM and SRAM cells and for peripheral logic circuits are investigated. Other design issues for low-VDD operation, such as on-chip supply-voltage converters, power management, and testing methodology to cope with the subthreshold current, are also described. A position is then taken that emphasizes needs for gain cells, subthreshold-current reduction circuits for use in active mode, and memory-rich SoC architectures.
2.
CURRENT GENERATION OF LOW-VOLTAGE eRAM TECHNOLOGY
Device miniaturization for the core logic and embedded SRAM cache (i.e., eSRAM) of high-end MPUs has recently been accelerated by the use of an MOST structure with a thinner gate oxide (tox), and thus a lower VDD and a lower threshold voltage (VT) for lower power and higher speeds (Fig. 1) [2]. Here, the dual-VDD and dual-tox device approach has been maintained with an MOST that has a thicker tox and higher VT for the higher-VDD I/O circuitry. As a result, a 1-V VDD and a 2-3-nm tox have become popular for use in the core and eSRAM.
Trends in Gate-Oxide Thickness
For standard (stand-alone) DRAMs, operation with a single external VDD and an on-chip voltage-down converter (i.e., series regulator) has been used to realize power-supply standardization despite the internally dual-VDD operation. In addition, the single thick tox (necessary for word-bootstrapping of the cells) has been used throughout the chip for low cost. Recently, however, a dual-VDD and dual-tox device approach, similar to that taken with MPUs, has been adopted to achieve higher speeds for eDRAMs. One example is a recent 8-Mb eDRAM with 3.7-ns access (Fig. 1) [3].
Figure 2. Two kinds of leakage currents, subthreshold current (Is) and gate-tunneling current (Ig), and a power switch for their reduction.
In any event, the above three issues arise with a rapid lowering of VDD. In particular, leakage-current reduction is essential for all the LSIs of the future. There are two major leakage currents: the subthreshold current and the gate-oxide tunneling current of the MOSTs (Fig. 2). A tunneling-current reduction circuit with a power switch, in which an MOST with a thick tox and high VT is used, has recently been proposed (Fig. 2) [4]. However, continued reduction in tox will eventually come to depend on the development of an MOST with a gate insulator of some high-k material, because even the circuit mentioned above would not be able to manage the exponential increases in current that result from lower and lower values of tox.
Memory Cells and Relevant Circuits
DRAM: For the low-voltage one-transistor one-capacitor (1-T) DRAM cell, using vertical capacitors and high-permittivity thin films to maintain sufficient signal charge, and adjusting the potential profile of the storage node to suppress the pn-leakage current, are critical ways of maintaining the cell's signal voltage, soft-error immunity, and retention times even as VDD is lowered.
Low-voltage circuits are also important. The negative word-line (NWL) scheme (Fig. 3) [5] cuts the subthreshold current from the storage node to the data line, despite the low actual VT, by applying a gate-source back-bias of δ during the non-selected period. This makes it possible to reduce the word-line voltage in a full-VDD write operation by δ, which allows the use of an MOST with a thinner tox for a given device reliability. The resultant small subthreshold swing (i.e., S-factor) enables low-voltage operation. The multidivided data-line scheme, in which data-line capacitance (CD) and line delay [...] (Fig. 4) [6], in which a 1-T DRAM cell with a CS as small as less than 10 fF is realized by a single poly-Si planar capacitor, and an extensive multi-bank scheme with 128 banks (32 Kb in each) that are simultaneously operable is used.
A row-access frequency of greater than 300 MHz was achieved for a 0.18-µm 1.8-V 2-Mb eDRAM.
A DRAM cache with the same capacity as a single bank is used to hide the refresh operation: Even when there is a conflict between a normal access and a refresh operation at the same bank, a cache hit occurs while permitting a refresh cycle for that bank since all of the data in that bank have been copied to the cache.
Low-voltage high-speed sensing is also essential for the well-known midpoint sensing, by which the data-line power is halved without using a dummy cell, since using the half-VDD data-line voltage makes the sense-amplifier (SA) operation quite slow. In the overdrive sensing scheme [7, 8, 9], this problem is solved by applying a higher voltage solely to the SA inputs, either by isolating the data line from the SA or by capacitive coupling. The use of additional capacitors may be acceptable in eDRAMs for which area is not a concern.
Figure 5. 6-T SRAM cell and the voltage margin (a), and improved cell (b).
SRAM: Of the many proposed designs [1], the 6-T full-CMOS cell is most suitable despite the fact that it is large. This is because of the simple process and the ease of design provided by the cell's wide voltage margin. Even for this cell, subthreshold currents must be reduced and the voltage margin must be widened as VT is lowered by lowering VDD. Note that the subthreshold current for VT = 0 V would lead to a retention current of as much as 10 A in a 1-Mb SRAM array [1]. This places a strict limit on the reduction of VT. The worsening of voltage (noise) margins as VDD and VT are lowered (Figs. 5 and 6) creates a demand for a decrease in the ratio of transconductance of the transfer MOST to that of the driver MOST. The voltage margin is also worsened by variations in VT and mismatches of VT between paired MOSTs, both of which increase with device miniaturization [1]. In addition, for a low-VT transfer MOST, the margin is further worsened by the data pattern of non-selected cells in the column, as shown in Fig. 7: the subthreshold leakage currents from the non-selected cells may be greater than the selected cell's current. This causes a read failure. A hierarchical data-line scheme provides a partial solution to this problem by limiting the number of memory cells connected to the data line [10, 11]. Using a data equalizer for current compensation [12] is also effective, despite the penalty in terms of area. Applying a gate-source offset driving scheme (Fig. 8) to the memory cell allows the use of a low-VT transfer MOST for high speed and a negligible data-line current [13], although eleven transistors are required for each cell.
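The data-pattern read failure and the benefit of limiting the number of cells per data line can be illustrated with a rough calculation. The sketch below assumes the usual exponential subthreshold model, in which the off current falls by one decade for every S volts of VT. The reference current, S-factor, read current, and safety margin are all illustrative assumptions, not values taken from the cited designs.

```python
def off_current(vt, i_ref=1e-7, s_factor=0.1):
    """Off current (A) of one transfer MOST at VGS = 0.

    Assumed model: I = i_ref * 10**(-vt / s_factor), with i_ref the
    current at VGS = VT and s_factor the subthreshold swing in V/decade.
    Both defaults are illustrative assumptions.
    """
    return i_ref * 10 ** (-vt / s_factor)


def max_cells_per_data_line(i_read, vt, margin=10.0, i_ref=1e-7, s_factor=0.1):
    """Largest cell count per data line for which the worst-case total
    leakage of the non-selected cells stays below i_read / margin."""
    i_leak = off_current(vt, i_ref, s_factor)
    non_selected = int(i_read / (margin * i_leak))
    return non_selected + 1  # add the selected cell itself


if __name__ == "__main__":
    # Hypothetical 50-uA cell read current; sweep the transfer-MOST VT.
    for vt in (0.4, 0.3, 0.2, 0.1):
        n = max_cells_per_data_line(i_read=50e-6, vt=vt)
        print(f"VT = {vt:.1f} V -> at most about {n} cells per data line")
```

Under these assumptions, each 0.1-V reduction in VT cuts the allowable number of cells per data line by roughly a factor of ten, which is why a hierarchical data line becomes attractive for low-VT transfer MOSTs.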
The loadless CMOS 4-T SRAM (Fig. 9) [14] is attractive because the cell size is only 56% of that of a conventional 6-T cell. Transfer MOSTs of the non-selected cells at a VDD word-line (WL) voltage work as load elements, and the load current to keep the storage node high is supplied from a data line that has been precharged to VDD. The WL voltage, however, must be precisely controlled, as shown in the figure, so as to keep the load current (i.e., the off current of the transfer MOST) to more than a hundred times the off current of the driver MOST. The data-pattern problem outlined above also arises in this approach.
[Figure label: WL-voltage level compensation (WLC)]
Almost all of the above problems may be solved by using high-VT cross-coupled MOSTs combined with a boosted power supply, as shown in Fig. 5(b) [15]. High VT for the paired MOSTs is necessary to obtain a low subthreshold current. The raised power-supply voltage increases the transconductance of the driver MOST and compensates for variations in VT and mismatches of VT. Even a low-VT transfer MOST for high speed is then acceptable, due to the increased transconductance of the driver MOST. Negative word-line biasing reduces the leakage currents from cells in the column. Using a high-VT transfer MOST with word bootstrapping is an alternative approach that achieves the same result.
Peripheral Logic Circuits
Subthreshold Current: Many attempts [1] have been made to reduce subthreshold currents in standby mode. Gate-source back-biasing [1] is the most effective for memory applications. For example, applying a back-bias of as little as 0.25 V to a word-driver block reduces the standby subthreshold current by 2-3 decades without inflicting penalties in terms of speed and area. In combination with a multiply divided power-line scheme [1], this approach is able to confine subthreshold currents to a single sub-power line. Applying dual-VT [16] to the critical paths of logic circuits is quite effective, reducing the standby subthreshold current to one-fifth of its value for a single low VT. In another form of multiple-VT design called random modulation [17], four kinds of VT are applied to further reduce the current in sub-1-V circuitry, despite the additional cost and complexity of design. Applying the well-known dynamic substrate back-bias (VBB) reduces the current by 1-2 decades. Even so, the required VBB swing is as large as 1-3 V, and this approach becomes less effective with device scaling [18]. The smaller body constant, enhanced short-channel effects, and increases in other leakage currents such as the gate-induced drain leakage (GIDL) current [18] are responsible for this drawback. Even if the subthreshold current is still a small fraction of the active current because VT is still high, the subthreshold current in the active mode makes the operation of dynamic circuits, such as the dynamic NAND for the decoder, unstable by discharging the floating nodes. The level keeper proposed for logic circuits (Fig. 10) [19] would also be effective in memory design, although the consequences would be fatal if the subthreshold current increased beyond what the keeper can manage.
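The quoted reduction of 2-3 decades for a 0.25-V back-bias is consistent with a simple exponential subthreshold model, as the following sketch shows. It assumes that the current falls by one decade for every S volts of gate-source back-bias; the S-factor values swept here are illustrative assumptions, and second-order effects such as GIDL are ignored.

```python
def leakage_reduction_decades(back_bias, s_factor):
    """Decades of subthreshold-current reduction for a gate-source
    back-bias (V), given the S-factor in V/decade (assumed model)."""
    return back_bias / s_factor


def leakage_ratio(back_bias, s_factor):
    """Fraction of the original subthreshold current that remains."""
    return 10 ** (-back_bias / s_factor)


if __name__ == "__main__":
    # Assumed S-factors between 83 and 125 mV/decade.
    for s in (0.125, 0.100, 0.083):
        d = leakage_reduction_decades(0.25, s)
        r = leakage_ratio(0.25, s)
        print(f"S = {s * 1e3:.0f} mV/dec: {d:.1f} decades, ratio {r:.1e}")
```

The same relation explains why a smaller S-factor makes a given back-bias more effective.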
Speed Variation: The amount of speed variation for a given variation in design parameters is increased by lowering VDD. Control of VBB and internal VDD [20, 21] in accordance with the parameter variations reduces the speed variation. Controlling a forward VBB is more effective in reducing speed variations [22, 23] than controlling a reverse VBB. This is because VT is more sensitive to VBB in the forward-bias region [1]. The forward VBB in [23], for example, reduced the variation of logic circuits and improved operation speed by 10%. If a forward VBB is used, however, requirements for noise suppression become more stringent, i.e., there is a call for the uniform distribution of the forward VBB throughout the chip. However, having a quasi-floating VBB at each MOST induces instabilities such as history effects, as in SOI, resulting in intra-die fluctuations in VT [24]. This is especially so for high-speed LSIs. When the clock cycle is shorter than the noise decay time in the substrate, which may be more than 10 ns for a substrate capacitance of 1 nF, substrate noise may accumulate and cause a variation in VT. In particular, eDRAMs may couple spike noises to the floating VBB when many data lines are simultaneously charged and discharged, unless a triple-well substrate is used. Additional current consumption, in the form of bipolar current induced by the forward VBB, is another matter that must be considered.
Other Design Issues
On-Chip Supply-Voltage Converters: On-chip voltage conversion is essential for the stable operation of memory cells, reduction of the subthreshold current, and suppression of variations in speed with fewer external supply voltages. A high efficiency of voltage conversion, a high degree of accuracy in the output voltage, and a low cost of implementation are the key issues. A voltage-up converter using a charge pump suffers from poor current-driving ability, which means that subthreshold currents at the load must be strictly limited to maintain the boosted voltage level. There are two types of voltage-down converter: the series regulator and the switching regulator. The series regulator, widely used in modern commercial DRAMs, provides a small output current and a poor efficiency (η) of around 50% when the difference between input and output voltages is large. A regulator of this type, however, offers a highly accurate output voltage, and it is possible to implement all of the required components on a single chip. The switching regulator has a good efficiency, even in the high output-current region. However, a regulator of this type inherently suffers from a large ripple in the output voltage and a poor transient response for large changes in the current. In general, it also requires a large NMOST and PMOST, and external parts such as an inductor and capacitor. There have been many attempts to solve these problems. A combination of a charge pump and series regulator achieves a high efficiency and precise output voltages [25], and is applicable to the NWL scheme described above. In the hybrid converter [26], consisting of both a series regulator and a switching regulator, the switching regulator is turned on and off according to the load current to maintain a high efficiency across a wide range of load currents.
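The efficiency limit of the series regulator follows from the fact that the full load current flows from the input through the pass device, so the efficiency cannot exceed the ratio of output to input voltage. The sketch below is a first-order model under that assumption. The quiescent-current parameter and the example voltages are hypothetical and serve only to illustrate the trend.

```python
def series_regulator_efficiency(v_in, v_out, i_load, i_quiescent=0.0):
    """First-order efficiency of a series (linear) regulator.

    The load current is drawn from v_in and dropped across the pass
    device, so efficiency is bounded by v_out / v_in. i_quiescent models
    the regulator's own bias current (an assumed extra loss term).
    """
    p_out = v_out * i_load
    p_in = v_in * (i_load + i_quiescent)
    return p_out / p_in


if __name__ == "__main__":
    # Hypothetical operating points with a 100-mA load.
    for v_in, v_out in ((2.5, 1.8), (2.0, 1.0), (3.3, 1.0)):
        eta = series_regulator_efficiency(v_in, v_out, i_load=0.1)
        print(f"{v_in} V -> {v_out} V: efficiency = {eta:.0%}")
```

Halving the voltage gives an efficiency of about 50%, matching the figure quoted above for a large input-output difference.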
Power Management: Here are some viewpoints of memory designers on the methods of power management for SoC technology that have been proposed by logic designers. Although it incurs a long recovery time at heavily capacitive internal power lines, power switching with a high-VT MOST completely cuts the leakage currents of the internal core circuits. Dynamic voltage scaling (DVS) [27, 28], in which the clock frequency and VDD are dynamically varied in response to the computational load, provides a reduced energy consumption per process during periods of light computational load, while still providing peak performance when this is required. However, this approach becomes less effective in the low-VDD era, since the range across which it is possible to vary VDD becomes narrow. In addition, successful operation over a wide range of VDD requires the accurate tracking of all circuit delays. Furthermore, applying DVS makes dynamic circuits (e.g., eDRAMs) unstable. This is due to the trapping of charge at floating nodes when voltage bumps are applied. DVS does not reduce subthreshold currents. However, in elastic-VT CMOS [29], where the clock frequency, VDD, and VBB are all dynamically varied, these currents are reduced. The cost is the added complexity of design. Recently, low-power designs for which it is possible to specify the maximum level of power have been proposed. In ChipOS [30], for example, each circuit block's power is managed by controlling the gated clock and power switch to achieve a given power budget. The confinement of junction temperature and load current that this allows may permit the use of temperature-sensitive eRAM, or the suppression of fluctuations in the voltage output of on-chip voltage converters.
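Why DVS loses effectiveness as the usable VDD range narrows can be seen from the standard relation for dynamic switching energy, E = C·VDD². The sketch below uses that relation only; leakage energy, which the text notes DVS does not reduce, is deliberately left out, and the nominal and minimum voltages are hypothetical examples.

```python
def dynamic_energy_per_op(c_switched, vdd):
    """Dynamic switching energy per operation (J), E = C * VDD**2."""
    return c_switched * vdd ** 2


def dvs_energy_saving(vdd_nominal, vdd_min):
    """Best-case fractional saving in dynamic energy per operation when
    VDD is scaled from vdd_nominal down to vdd_min (leakage ignored)."""
    return 1.0 - (vdd_min / vdd_nominal) ** 2


if __name__ == "__main__":
    # Assumed (nominal, minimum) VDD pairs for an older and a sub-1-V design.
    for v_nom, v_min in ((2.5, 1.0), (1.0, 0.7)):
        s = dvs_energy_saving(v_nom, v_min)
        print(f"{v_nom} V -> {v_min} V: up to {s:.0%} less energy per operation")
```

With these example numbers the attainable saving drops from about 84% to about 51% as the nominal VDD falls from 2.5 V to 1 V.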
Testing: A large subthreshold current makes the discrimination of defective and non-defective IDDQ currents difficult, and thereby poses a problem for the IDDQ testing of low-voltage CMOS circuits. IDDQ testing with the application of a reverse VBB (Fig. 11) [31] is effective when low-temperature measurement and multi-VT design are combined.
3.
FUTURE TRENDS
The 1-T DRAM cell is not suitable for low-voltage operation because the reduction of its signal voltage is not acceptable. Further division of the data line, aiming at a smaller CD to offset the reduction in cell-signal levels at a lower VDD, sharply increases the effective cell area, as shown in Fig. 12. This area is defined as the sum of the actual cell area and the total area of the additional cell-related circuits (such as the sense amplifier) in each division. Using gain cells would be one solution to the problem, because they generate a high enough signal voltage without requiring any data-line division even at low values of VDD, and thus provide a fixed effective cell area that is independent of VDD. For eSRAM, the advanced 6-T cell (Fig. 5(b)) may continue to be used.
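The trade-off between signal voltage and effective cell area can be made concrete with the standard charge-sharing expression for a 1-T cell read with a half-VDD precharged data line, vS = (VDD/2)·CS/(CS + CD). The sketch below is an illustration under assumptions: the capacitances, the required signal level, and the area figures are hypothetical, and the per-cell overhead is modeled simply as the sense-circuit area of one division shared among the cells in that division.

```python
def signal_voltage(vdd, c_s, c_d):
    """Read signal (V) of a 1-T cell with half-VDD data-line precharge."""
    return 0.5 * vdd * c_s / (c_s + c_d)


def max_data_line_capacitance(vdd, c_s, v_sig_min):
    """Largest CD (F) that still yields a signal of v_sig_min."""
    return c_s * (0.5 * vdd / v_sig_min - 1.0)


def effective_cell_area(cell_area, overhead_area, cells_per_division):
    """Cell area plus the division's sense-circuit area shared per cell
    (an assumed way of apportioning the overhead)."""
    return cell_area + overhead_area / cells_per_division


if __name__ == "__main__":
    c_s = 10e-15  # assumed storage capacitance, 10 fF
    for vdd in (1.8, 1.0, 0.6):
        c_d = max_data_line_capacitance(vdd, c_s, v_sig_min=0.1)
        print(f"VDD = {vdd} V: CD must stay below {c_d * 1e15:.0f} fF")
```

Because CD is roughly proportional to the number of cells on each divided data line, a lower VDD forces shorter divisions, and the shared overhead per cell grows accordingly.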
The issue of leakage current presents a real challenge in the design and testing of a low-voltage high-speed SoC. The development of innovative MOSTs that have new gate materials, and of subthreshold-current reduction circuits, is the key to tackling this challenge. In particular, with a further reduction in VT, subthreshold-current reduction circuits for use in high-speed active mode will be indispensable, even for static logic circuits. This is because the subthreshold current comes to exceed the capacitive current and eventually dominates the total active current of the chip [1]. In this case, the low-power advantage of CMOS circuits is lost. If this problem is not solved, we can envision a scenario in which even a CMOS SoC would suffer from huge levels of dc power dissipation caused by subthreshold currents, as was the case in the bipolar and BiCMOS LSI eras of the recent past. Thus, low-power techniques for bipolar and BiCMOS circuits might be revived to help reduce the power dissipation of the SoC. Reducing the number of random logic gates is one vital way of reducing the active-mode subthreshold current, since control of subthreshold currents from random logic gates at sufficiently high speed may remain impossible.
Hence, new SoC architectures such as a memory-rich SoC, by which the subthreshold currents are effectively reduced, will be needed.
4.

