This paper proposes a comprehensive SRAM cell optimization scheme that minimizes leakage power under ultra-low standby supply voltage (V DD ). The theoretical limit of the data retention voltage (DRV), the minimum V DD that preserves the state of a memory cell, is derived to be 50 mV for an industrial 90 nm technology. A DRV design model is developed on parameters including body bias, sizing, and channel length. A test chip was implemented and measured to obtain DRV sensitivities to key design parameters. Based on this, a low-leakage SRAM cell design methodology is derived and the feasibility of a 270 mV standby V DD is demonstrated, including a safety margin of 100 mV. As a result, the SRAM leakage power is reduced by 97%.
INTRODUCTION
Technology scaling and the fact that a larger fraction of a chip is devoted to memory have made SRAM leakage control increasingly important. In microprocessor designs, memory consumes a significant portion of the system power budget during light-duty operation. For example, a past study on a 0.13 µm high-end processor showed that leakage energy accounted for 30% of L1 cache energy and 80% of L2 cache energy. 1 For mobile applications low standby power is crucial: because active periods are short compared with standby, leakage power usually determines battery life.
To minimize SRAM leakage power, many architecture and circuit level techniques have been proposed, including dynamic-biasing, V DD -gating, and novel cell design. Dynamic-biasing techniques adjust the substrate-source and gate-source biases to enhance active driving strength and create low-leakage paths. [1] [2] [3] [4] V DD -gating techniques use sleep transistors to turn off unused memory sections, 5 6 or reduce V DD to the data-retention level of the memory. 7 8 Recently a 10 T SRAM cell with improved margins under very low V DD was proposed. The low voltage operation reduces both active and standby power. 9 Compared to these existing approaches, this work focuses on improving the conventional 6 T SRAM cell for ultra-low voltage standby operation, based on an in-depth understanding of the SRAM cell standby voltage limit. A comprehensive optimization methodology is developed, applying a combination of V DD -gating, dynamic-biasing, and sizing techniques. With the target of ultra-low power mobile applications, the design goal is to achieve maximum standby power saving and reliable data retention, with minimum penalty on area, speed, and read-write stability.
An effective method to minimize SRAM leakage power is to reduce the memory standby V DD . The minimum V DD that preserves memory data is the data retention voltage (DRV). Measurement results from a 32 Kbit SRAM module implemented in 130 nm technology showed that SRAM cell DRV ranges from 60 mV to 390 mV. At a 100 mV safety margin above the DRV, SRAM leakage power can be reduced by 85%. 8 Based on predictive device models, 10 SPICE simulations showed that the DRV increases with technology scaling due to larger process variations (Fig. 1). At a 32 nm node with 700 mV V DD , the DRV with 3σ process variations reaches 570 mV for a standard SRAM cell. As a result, DRV-aware optimization is critical for future low-power and reliable SRAM design. Figure 1 also shows that by simply using a 5% larger channel length (L) for the four transistors in the SRAM cell inverter loop, a 10 ∼ 80 mV reduction in DRV and a 50 ∼ 90% saving in leakage power can be achieved. This is due to reduced device mismatch with larger channel length. As technology scales, the tuning effect on DRV and leakage saving becomes more significant.
Starting from an analysis of the theoretical bound of SRAM DRV, this work thoroughly investigates the impact of design parameters and process variations on SRAM low-voltage data-retention behavior. Based on DRV sensitivities modeled and measured from a 90 nm technology test chip, DRV-aware SRAM optimization methods are derived. Finally, a low-leakage SRAM cell design methodology is summarized.
DRV MINIMIZATION
The DRV of an SRAM cell is determined by both design parameters and process specifications. 8 In the standby mode, data storage node voltages in a 6T SRAM cell, i.e., V 1 and V 2 in Figure 2 , can be solved from the balance of currents:
These currents may consist of both sub-threshold (sub-V th ) leakage and gate leakage, if DRV is smaller than V th . Although gate leakage becomes more pronounced at 90 nm node and below, its magnitude is still secondary to sub-V th leakage, especially at low voltage. 11 Therefore, sub-V th leakage is the basis of our derivation of DRV in this work. SRAM cell data-retention stability is further indicated by the static noise margin (SNM) of its voltage transfer curves (VTC). SNM approaches zero when V DD is reduced to DRV, which can be expressed by the following conditions: 
Theoretical DRV Limit
The above equations provide the cornerstone to quantitatively evaluate DRV under any process and design conditions. In reality, variations of process parameters are the limiting factors of DRV, as shown in our previous work. 8 On the other hand, it is also important to understand the fundamental limit of DRV in theory, assuming ideal process and design conditions. This understanding will help guide the optimization of technology and design in the long term. For the first time, such a theoretical limit is analyzed and presented in this section. During the data retention mode, the leakage current mismatch between the two data-holding inverters is the dominant factor that causes high DRV. To investigate the theoretical limit, we first eliminate the mismatches by assuming that the SRAM cell is manufactured with ideal process conditions. In practice this variation-free condition may be approximated by increasing the channel lengths of the transistors (at fixed W /L ratio) to minimize the impact of process variations. Furthermore, leakage through the two pass transistors is another factor that causes current mismatch, especially for the typical SRAM standby scheme that connects both bitlines to V DD . In this theoretical analysis we assume that the pass transistors are ideal switches with zero leakage (i.e., I 5 = I 6 = 0). In a realistic design this may be achieved by applying a negative word line voltage during standby. Finally, in order to attain the maximum data-retention SNM, a perfectly balanced PMOS to NMOS strength ratio (P /N = 1) is assumed, i.e., those devices have symmetrical V th , sub-threshold swing and sizing. In practice we can approximate such a balance by applying standby body bias control to a standard-size SRAM cell. Usually reverse-biasing the NMOS devices during standby helps achieve a balanced leakage current ratio for optimal data retention.
Note that this standby balance strength assumption (P /N = 1) does not conflict with the standard sizing ratio required for SRAM active read and write operations, since the body bias can be reset to zero during active operation.
In the next section, the impact of these three assumptions on both DRV reduction and standby SNM optimization will be analyzed. Based on these assumptions, Eq. (1) becomes:
Using the expression of sub-V th current, 8 Eq. (3) can be expanded into:
where v T is the thermal voltage kT /q and nv T ln10 is the sub-threshold swing. Then, by solving (∂V 1 /∂V 2 ) from each of Eqs. (4), and using the condition of Eq. (2), the theoretical limit of DRV is solved as:
Note that n for an ideal CMOS technology is 1 (i.e., a swing of 60 mV/dec), which gives DRV = 36 mV. For a typical 90 nm technology with n = 1.5, DRV goes up to 50 mV. This matches well with the SPICE simulation result from an industrial 90 nm technology, as shown in Section 2.2. At the theoretical limit of DRV, V 1 = V 2 = DRV/2; as a consequence, the SRAM cell loses the capability to distinguish the stored data. Equation (5) provides the theoretical lower bound of DRV for 6T SRAM design, no matter how well we can optimize the size or V th of the transistors. Note that if the sub-V th swing could be reduced to 0 (i.e., n = 0), DRV could decrease to 0 V. In a realistic design, process variations and deviation of design parameters from the ideal condition cause DRV to be much higher than that of Eq. (5). The methods that help a realistic SRAM design approach the ideal DRV are discussed in the next section.
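The quoted limits can be reproduced with a short numerical check. The sketch below is not the paper's derivation: it assumes the textbook sub-threshold current model I = I0 exp(V GS /(n v T )) (1 - exp(-V DS /v T )) for both devices, together with the three idealizations above (no mismatch, I 5 = I 6 = 0, P /N = 1). Under those assumptions the unity-gain condition at V 1 = V 2 = V DD /2 gives DRV = 2 v T ln(1 + n), which evaluates to about 36 mV for n = 1 and about 47 mV for n = 1.5 at 300 K, consistent with the values stated in the text. The function names and the 300 K temperature are choices made for this illustration.

```python
import math

K_B_OVER_Q = 8.617333e-5  # Boltzmann constant divided by electron charge, in V/K


def thermal_voltage(temp_k=300.0):
    """Thermal voltage v_T = kT/q in volts (300 K is an assumed temperature)."""
    return K_B_OVER_Q * temp_k


def drv_limit(n, temp_k=300.0):
    """Closed form obtained from the unity-gain condition at V1 = V2 = VDD/2.

    Assumption: textbook sub-threshold model, ideal symmetric cell.
    """
    return 2.0 * thermal_voltage(temp_k) * math.log(1.0 + n)


def inverter_vout(vin, vdd, n, temp_k=300.0):
    """Solve I_n = I_p for the output voltage of one ideal inverter by bisection."""
    vt = thermal_voltage(temp_k)

    def imbalance(vout):
        # Currents are normalized to the common prefactor I0 (P/N = 1).
        i_n = math.exp(vin / (n * vt)) * (1.0 - math.exp(-vout / vt))
        i_p = math.exp((vdd - vin) / (n * vt)) * (1.0 - math.exp(-(vdd - vout) / vt))
        return i_n - i_p  # monotonically increasing in vout

    lo, hi = 0.0, vdd
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def midpoint_gain(vdd, n, dv=1e-6):
    """Central-difference slope of the VTC at vin = VDD/2."""
    upper = inverter_vout(vdd / 2 + dv, vdd, n)
    lower = inverter_vout(vdd / 2 - dv, vdd, n)
    return (upper - lower) / (2 * dv)


for n in (1.0, 1.5):
    vdd = drv_limit(n)
    print(f"n = {n}: DRV limit = {1e3 * vdd:.1f} mV, midpoint gain = {midpoint_gain(vdd, n):.3f}")
```

Evaluating midpoint_gain at V DD = drv_limit(n) returns a slope close to -1, which is the point where the two VTCs stop enclosing a bistable region and the SNM reaches zero.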
Improving Realistic SRAM Cell to Approach DRV Limit
In realistic SRAM cell design, the DRV is much greater than the theoretical limit because of process variations and performance-driven cell design that is un-optimized for low voltage data-retention. Based on an industry standard SRAM design with realistic process variations, the methods to minimize DRV can be observed by analyzing the SNM during data-retention mode.
In Figure 3 the solid lines show the VTC of a standard cell under the DRV condition. The unbalanced VTC openings have three causes: a weak PMOS to NMOS strength ratio (P /N ) that skews the VTC; process variations that further degrade both curves, especially the one with the weaker PMOS; and leakage through the pass transistor that connects the node storing state zero to the bitline at V DD . Therefore, to improve SNM and reduce DRV, the following techniques can be used:
(1) Reduce process variation with larger channel length
(2) Use balanced P /N strength ratio during standby
(3) Suppress pass transistor leakage during standby
The improvements on SNM are shown in Figure 3 . In design practice, the P /N ratio and pass transistor leakage can be controlled with body bias control and negative word line voltage during standby, which avoids any impact on active memory operations. Increasing L at fixed W /L ratio involves a tradeoff with cell area. The impact of larger L on active operation parameters (data access delay, read, and write noise margins) will be analyzed in Section 6.3 of this paper.
In Figure 4 , the quantitative impact of these optimization techniques on DRV is illustrated. Simulated with an industrial 90 nm technology model, the DRV of a standard-size SRAM cell with 20% V th and L local mismatches is around 260 mV. By reducing the process variations to zero, DRV can be lowered to 140 mV. Next, by tuning the P /N ratio and suppressing the pass transistor leakage, DRV approaches the technology's theoretical limit of 50 mV. In design practice such an effective optimization may not be possible. The theoretical limit therefore remains theoretical, but taking these DRV factors and the corresponding design methods into account helps designers build more reliable SRAM cells for low-voltage standby operation.
A DRV DESIGN MODEL
In previous work, 8 a DRV model based on process parameters of each individual transistor in the SRAM cell was developed as follows:
V th i can be accurately modeled by Eq. (7), with the second and third terms representing body bias and DIBL effects.
To facilitate low-power memory design, it is important to have a DRV design model that describes DRV sensitivity to the key SRAM design parameters such as transistor sizing, channel length, and body bias. Such a design model can be derived based on the above DRV model in Eqs. (6-7).
SRAM Design Variables
The design parameters used to optimize an SRAM cell are summarized in Figure 5 . These variables are the W /L sizing ratio and channel length of the PMOS pull-up transistors ( p , l p ), NMOS pull-down transistors ( n , l n ) and NMOS access transistors ( a , l a ), the body bias voltages of PMOS and NMOS devices (V PB , V NB ), and the bitline standby voltages (V BL , V BL ). Among these variables, the access transistor sizing and bitline voltages have a less significant effect on DRV, because l a in standard SRAM cell sizing is larger than the minimum length and the access transistor gates are held stably at ground. The DRV design model is based on the other variables, which have a larger impact on DRV, including the body bias voltages and the sizing of the PMOS pull-up and NMOS pull-down transistors. In Section 5, measured DRV sensitivities on all parameters will be analyzed and compared to this model.
Table I . DRV design model.
DRV Design Model
A DRV design model is important for evaluating, both qualitatively and quantitatively, the DRV of an SRAM under different process and design conditions. A model based on key design parameters helps designers optimize the SRAM cell for ultra-low power applications. Such a DRV model is presented in this section.
Using Eq. (7), i i in Eq. (6.3) can be expressed in terms of design parameters. From Eqs. (6, 6.1-6.3), an expression for DRV in terms of the key design parameters is proposed and summarized in Table I . This model is general and scalable across all design parameters. Representative coefficients in the model are extracted by comparing it with industrial 90 nm technology data. DRV decreases with stronger PMOS but increases again when the forward bias causes PMOS leakage to be stronger than that of NMOS with zero body bias. At fixed W /L ratio, larger channel length (l p , l n ) reduces process variation and DRV. The widths (w p , w n ) alone have very little impact on DRV. Comparison between modeled and measured data in Section 5 verifies this model. This model can be used to derive design guidelines that minimize the standby power of an SRAM cell.
Qin et al.
SRAM Cell Optimization for Ultra-Low Power Standby
TEST CHIP DESIGN
Table II shows the design of array sizing, normalized to an industry standard SRAM cell. A-arrays are a series of 25 memory arrays, from A1 to A25, with PMOS and NMOS channel lengths varied between 1 and 3 times the standard value. Similarly, the 25 B-arrays use different sizing ratios for NMOS pull-down and PMOS pull-up transistors. While larger values were tested for most variables, a smaller n was used because the pull-down NMOS transistors in the standard SRAM cell design are already strong. C-arrays test four configurations of access transistor sizing and channel length. The rest are D-arrays with mixed sizing and channel length experiments on pull-down NMOS and pull-up PMOS transistors.
Besides sizing experiments, several other standby controls were implemented in this chip, including configurable body bias voltages (V NB , V PB ) and standby bitline voltage control. During standby mode, the bitlines can be connected to either V DD or ground, or be left floating. The configurable body bias control did not involve area overhead in this design, since the industrial-IP SRAM module on which this test chip is based uses a separate metal grid for body bias connections. By reconnecting this grid from the V DD and ground contacts to the external V PB and V NB pins, flexible body bias control was achieved. However, in high-density SRAM cell designs, such body bias control may involve an extra area penalty. Finally, a ground switch on each array enables leakage measurement on a per-array basis. The design diagram and chip layout are shown in Figure 7 .
DRV MEASUREMENT RESULTS AND MODEL VERIFICATION
The DRV of every individual memory cell on the test chip was measured by repeatedly reading a written state back from the cell after a period of time in a low voltage standby mode. The read and write operations are conducted at 1 V supply voltage, while the standby V DD is lowered step by step until the cell state read after standby becomes the opposite of the state written before standby. Due to process variations, every realistic SRAM cell has a predominant state, and always returns to this state when V DD is lower than DRV, no matter what the original cell state is. During DRV testing each cell is measured twice, with pre-written state 1 and 0, respectively. The measurement that writes the cell's predominant state always reads out the same value even with zero standby V DD , while the other measurement provides the DRV of this memory cell, i.e., the standby V DD at which the non-predominant state flips to the predominant state.
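The measurement procedure described above can be sketched in a few lines. This is an illustration only, not the test program used for the chip: `readback` is a hypothetical stand-in for the write at 1 V, standby, and read-back sequence, `make_cell` is a toy behavioral cell model, and the 10 mV step is an assumed sweep resolution.

```python
def make_cell(predominant_state, drv_mv):
    """Toy cell model (hypothetical): below its DRV the cell falls into its predominant state."""
    def readback(written_state, standby_mv):
        if standby_mv >= drv_mv:
            return written_state      # data retained
        return predominant_state      # data lost, cell returns to predominant state
    return readback


def lowest_retaining_mv(readback, state, start_mv=1000, step_mv=10):
    """Lower the standby VDD in steps; return the lowest voltage that still retains `state`."""
    last_good = None
    for standby_mv in range(start_mv, -1, -step_mv):
        if readback(state, standby_mv) != state:
            break                     # read-back flipped: stop the sweep
        last_good = standby_mv
    if last_good is None:
        raise ValueError("cell does not retain data even at the start voltage")
    return last_good


def measure_cell_drv_mv(readback, start_mv=1000, step_mv=10):
    """Measure twice, with pre-written 0 and 1.

    The predominant state survives down to 0 V, so the larger of the two
    results is the DRV of the cell (rounded up to the sweep resolution).
    """
    per_state = [lowest_retaining_mv(readback, s, start_mv, step_mv) for s in (0, 1)]
    return max(per_state)


cell = make_cell(predominant_state=1, drv_mv=140)
print(measure_cell_drv_mv(cell), "mV")
```

Because the sweep is discrete, the reported DRV is the true value rounded up to the next step; a finer step trades measurement time for resolution.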
Measurement Results
The measured DRV of one chip is shown in Figure 8 , indicated by grayscale intensity. The average DRV of the standard size arrays (A1 and B1) is around 140 mV. DRV reduces with larger channel length (A arrays), and increases when the PMOS is sized between 2 and 3 times the standard size (B11 ∼ B20). For the B21 ∼ B25 arrays, where the PMOS is 3X the standard size, the strong pull-up strength causes instability or malfunction during write operation (at 1 V). Such failures are indicated by the black spots in Figure 8 . In contrast to the high DRV sensitivity to pull-up PMOS and pull-down NMOS sizing, pass transistor sizing shows little impact on DRV (C arrays). As discussed in Section 3.1, this is because of three factors: the large l a in the standard SRAM cell, the stable ground level voltage on the access transistor gates, and the relatively small impact of transistor sizing on leakage and DRV. On the other hand, body bias affects transistor leakage exponentially and has a large impact on DRV, as shown in the following measurement results.
Figure 9 shows the average DRV values measured from cells on 15 chips, showing the impact of various design parameters on DRV. DRV sensitivity to body bias is the highest. In Figure 9 (a), for each V NB there is an optimal V PB that minimizes DRV, because a balanced P /N ratio is a key factor in DRV optimization. At zero V NB , -0.2 V forward-biased V PB is optimal. This indicates a weak P /N ratio in standard cell sizing. On the other hand, V NB has a two-fold impact on DRV, due to the two types of NMOS devices in an SRAM cell. Forward-biased V NB significantly increases pass transistor leakage and leads to higher DRV. This makes the P /N balance effect less obvious. Overall the measurement results show that, in order to minimize the DRV mean value, reverse-biasing V NB to suppress pass transistor leakage and adjusting V PB accordingly to achieve a balanced P /N strength ratio (zero V PB in this design and technology) are the most effective methods.
Figure 9 (b) shows that DRV generally reduces with larger channel length at fixed W /L ratio. The shape of the DRV versus l n curves is a result of the NMOS device characteristic, since for this device V th and its variance are lowest with l n at 1.5X the standard length. Figure 9 (c) shows that the widths have relatively little impact on DRV, while the preference for a balanced P /N strength ratio can still be observed. Finally, experiments at different standby bitline voltages lead to less than 10 mV difference in DRV, since the dependence of pass transistor leakage on bitline voltage is weak. DRV increases with temperature at about 5 mV per 10 °C.
Model Verification
The DRV design model presented in Section 3 can be verified by comparing both the model-predicted DRV mean and its statistical distribution with measurement data. As shown in Table III , the errors between predicted and measured DRV mean values are less than 5%. Figure 10(a) plots the modeled DRV mean values together with measurement data over body bias, the most influential design parameter.
Figure 10(b) shows the DRV distribution measured from 3840 standard sized SRAM cells. Shown in the same figure is the distribution of 3840 predicted DRV values generated from the DRV design model, with an input of process variation data in a Gaussian distribution extracted from the 90 nm technology. Comparison between the measured and model-predicted DRV distributions shows a close match. Therefore, when designing the minimum standby V DD of a memory module, this DRV design model can be used to predict the worst case DRV among cells in the whole memory.
OPTIMIZATION
With the support of the DRV design model and the test chip leakage measurement results, an optimization analysis for the lowest worst case DRV and minimum SRAM leakage power is presented in this section.
Worst Case DRV Minimization
In order to design a practical low-voltage SRAM leakage suppression scheme, the standby V DD of the memory chip needs to be derived based on the worst case DRV among all SRAM cells. As verified in Section 5, the DRV design model predicts the DRV distribution given the memory size, SRAM cell design and magnitude of process variations. Based on model prediction, Figure 11 shows the worst case DRV for an aggregate memory size of 3840 bits (the A1 array of 15 test chips with 256 bits in each array) over body bias and channel length. These predictions were confirmed with measurement data. As shown in Figure 11 (a), reverse biased V NB is effective in minimizing the worst case DRV due to reduced pass transistor leakage and improved P /N ratio. V PB is optimized in the forward bias region for stronger PMOS and less variation. Figure 11(b) shows that larger channel length effectively lowers the worst case DRV by reducing device mismatch, but involves a tradeoff with area overhead. A designer may select the optimal point to balance memory reliability and low power requirements within the area constraint. For example, with 50% larger channel length (40% extra area), a 50 mV reduction in worst case DRV and a 50% leakage power saving can be achieved. In a larger memory, the reduction in worst case DRV is more dramatic.
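The dependence of the standby V DD on the worst case cell, and hence on memory size, can be illustrated with a small Monte Carlo sketch. It is deliberately simplified and is not the DRV design model of this paper: per-cell DRV is drawn directly from a Gaussian, whereas the paper feeds Gaussian process variations into the DRV model. The 140 mV mean is the measured average of the standard arrays; the 25 mV sigma, the seed, and the function names are hypothetical. Only the 100 mV safety margin and the 3840-bit and 256-bit sizes come from the text.

```python
import random


def sample_drv_mv(num_cells, mean_mv=140.0, sigma_mv=25.0, seed=0):
    """Draw one DRV value per cell (Gaussian shape and sigma are assumptions)."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean_mv, sigma_mv)) for _ in range(num_cells)]


def standby_vdd_mv(drv_samples_mv, margin_mv=100.0):
    """Standby supply = worst case DRV among all cells + safety margin."""
    return max(drv_samples_mv) + margin_mv


cells = sample_drv_mv(3840)           # aggregate size used in the text
one_array = cells[:256]               # a single 256-bit array

print(f"256 bits : standby V DD >= {standby_vdd_mv(one_array):.0f} mV")
print(f"3840 bits: standby V DD >= {standby_vdd_mv(cells):.0f} mV")
```

Because the standby supply is set by the maximum over all cells, it can only stay equal or rise as more cells are added, which is why narrowing the DRV spread pays off more in larger memories.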
Leakage Power Minimization
While reverse biased V NB and larger channel length reduce both DRV and leakage current, forward biased V PB minimizes DRV but increases leakage. To investigate the optimal bias scheme for leakage minimization, the leakage of SRAM arrays on the test chip was measured under different voltage and bias conditions. During leakage measurement, only the ground switch of the measured array is turned on, and the ground switches of all other 63 SRAM arrays are turned off.
Typically the leakages through the other turned-off ground switches add up to about 5 times the leakage being measured (through the array with its ground switch turned-on).
By carefully estimating the turned-off resistance of each array and extracting the turned-on array leakage current from the measurement results, the individual array leakage currents are obtained and analyzed as follows. Figure 12 (a) shows the measured leakage power of a standard sized array at a 100 mV margin above the worst case DRV at various body biases. Although reverse-biased V NB and forward-biased V PB minimize the worst case DRV, reverse bias on both V NB and V PB minimizes the leakage power, with a saving of 60%.
In summary, SRAM leakage power minimization requires reverse body bias and larger channel length at a cost of area. DRV can be further reduced by forward biasing V PB , but with higher leakage power. The impact of larger channel length and body bias control on memory active operation metrics (performance, read, and write reliability) will be analyzed in the next section.
The leakage savings by applying DRV-aware SRAM cell optimization methods are quantitatively shown in Figure 12 (b), which plots measured leakages of standard size array A1 and array A7 with 50% larger channel length. Leakage power can be reduced by 10X from standard cell standby at 1 V (A) to standby at 320 mV (B), the un-optimized worst case DRV plus 100 mV safety margin. With 400 mV reverse body biases for both PMOS and NMOS (bias scheme I), the worst case DRV does not change but leakage power reduces by 2X (C). Furthermore, by using larger channel length in A7, the worst case DRV plus safety margin is 50 mV lower, and leakage power is reduced by another 2X (D). Overall standby power saving with optimized SRAM design at 270 mV standby V DD is 75% compared to standard cell standby at un-optimized DRV, and 97.5% compared to standard cell standby at 1 V.
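The percentages quoted for Figure 12(b) follow directly from the stated reduction factors, as the short check below shows. The factors (10X, 2X, 2X), the 320 mV standby level, the 100 mV margin, and the 50 mV DRV reduction are taken from the text; the variable and function names are chosen for this illustration.

```python
# Leakage reduction factors quoted in the text for design points A, B, C, D.
REDUCTION_STEPS = [
    ("A -> B: standby V DD lowered from 1 V to 320 mV", 10.0),
    ("B -> C: 400 mV reverse body bias on PMOS and NMOS", 2.0),
    ("C -> D: array A7 with 50% larger channel length", 2.0),
]


def relative_leakage(steps):
    """Leakage power at each design point, normalized to point A."""
    levels = [1.0]
    for _label, factor in steps:
        levels.append(levels[-1] / factor)
    return levels


def saving_percent(reference, optimized):
    return 100.0 * (1.0 - optimized / reference)


level_a, level_b, level_c, level_d = relative_leakage(REDUCTION_STEPS)
saving_vs_1v = saving_percent(level_a, level_d)    # compared to standby at 1 V
saving_vs_drv = saving_percent(level_b, level_d)   # compared to un-optimized DRV standby

# Standby voltage bookkeeping, in mV.
MARGIN_MV = 100
worst_case_drv_mv = 320 - MARGIN_MV                     # un-optimized worst case DRV
optimized_standby_mv = worst_case_drv_mv - 50 + MARGIN_MV

print(f"saving vs 1 V standby: {saving_vs_1v:.1f}%")
print(f"saving vs un-optimized DRV standby: {saving_vs_drv:.1f}%")
print(f"optimized standby V DD: {optimized_standby_mv} mV")
```

The combined 40X reduction corresponds to the 97.5% saving, and the two final 2X steps alone give the 75% saving relative to standby at the un-optimized DRV.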
Finally, Figure 13 shows the impact of leakage optimization on the measured DRV distribution. Compared to the standard array DRV distribution, the A7 array DRV distribution under 400 mV reverse body bias moves towards the lower end with a narrower spread. The DRV mean is 30 mV lower and the worst case DRV is 50 mV lower. In future work, by applying an error correction scheme to correct the errors at the tail of the DRV distribution, the minimum memory standby V DD can be further reduced.
The Impact of Optimization on Active Operation Metrics
Besides the reduction in DRV and leakage power, the impact of the optimizations on active (read and write) SRAM operations is analyzed in this section. Figure 14 shows the simulated active operation reliability margins and read delay with and without the DRV-aware design optimizations. The simulations use an industrial 90 nm technology with 20% local variations in V th and channel length. The read margin is defined as the maximum square between the inverter VTC curves during read operation, 12 and the write margin is defined according to a recently proposed method. 13 The read access delay is characterized as the time it takes to discharge the bitline to the 90% V DD level, with an approximate bitline capacitive load of 5 pF.
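For intuition about the read delay definition, a first-order estimate is sketched below. It assumes the cell discharges the bitline with a constant current, which is a simplification and not the SPICE setup used for Figure 14; the 50 µA read current is a hypothetical value, while the 5 pF load and the 90% V DD threshold come from the text.

```python
def read_delay_s(c_bitline_f, vdd_v, i_read_a, swing_fraction=0.10):
    """Time to discharge the bitline from V DD to 90% of V DD.

    First-order estimate t = C * dV / I with a constant cell read current
    (an assumption made for this sketch).
    """
    if i_read_a <= 0.0:
        raise ValueError("read current must be positive")
    return c_bitline_f * swing_fraction * vdd_v / i_read_a


# 5 pF bitline load (from the text), 1 V supply, hypothetical 50 uA read current.
delay = read_delay_s(c_bitline_f=5e-12, vdd_v=1.0, i_read_a=50e-6)
print(f"estimated read delay: {delay * 1e9:.1f} ns")
```

In this approximation the delay scales inversely with the read current, so a delay reduction of 10% corresponds to roughly 11% more cell current.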
As shown in Figure 14(a) , the write margin decreases with larger channel length, especially l p . This is because larger l p reduces the PMOS V th and makes it more difficult to write '0' into the SRAM cell inverter that originally holds state '1'. This situation can be improved by applying 400 mV reverse PMOS body bias and 400 mV forward NMOS body bias (bias Scheme II) during write operation. By boosting the NMOS to PMOS strength ratio, the write margin can be improved by 80 mV to 100 mV, resulting in higher reliability than the original SRAM cell without DRV optimizations.
On the other hand, Figure 14 (b) shows that the read margin improves with larger channel length by 5 mV to 35 mV, due to the reduced mismatch in the read access path formed by the pass transistor and the pull-down NMOS device. Therefore the body bias control can be used to improve the other important SRAM cell design metric, the read performance. By applying 400 mV forward body bias to both NMOS and PMOS (bias Scheme III), the read margin is slightly lower than without body bias, but the read delay is reduced by more than 10% (Fig. 14(c)).
In summary, the DRV optimization techniques of applying body bias control and using larger channel length can be used effectively to improve both the active and standby operations. The critical tradeoff is between the optimized results and the area penalty caused by larger channel length and body bias control. Another factor that should be taken into account for high-speed SRAM design is the time it takes to change the body bias voltages during active operation, which may reduce the potential performance gain.
