Abstract: We propose a novel charge-sharing bit-line 10T SRAM with differential read and single-ended (SE) write. The decoupled read port provides a high noise margin. The read bit-lines are not charged to full V_DD; instead, they share charge during a read-1 operation. A new write driver is proposed for SE write, which charges the write bit-line conditionally. A virtual power rail is used to suppress bit-line leakage. Compared with 6T SRAM, the charge-sharing scheme potentially consumes only 25% of the read and 50% of the write dynamic power. Thorough comparisons with 6T at the 45nm node show that the proposed 10T design has 2× the read static noise margin, a 71% reduction in total read power, and a 48% reduction in total write power.
Introduction
Static random access memory (SRAM) is the most widely used digital macro, and its design is crucial as it dictates the overall performance and power budget of the SoC. The 6T SRAM is a standard macro; however, its use is limited to higher V_DD values because its read static noise margin (RSNM) degrades severely at low V_DD. The 6T SRAM has conflicting read and write requirements [1], and proper sizing is required for specific applications. Also, low-power (LP) operation and the sub-threshold regime are of special interest to the research community [2].
Many designs and techniques have been proposed to solve, or compensate for, the RSNM and power dissipation problems of 6T SRAM. In this paper, we present a 10T SRAM cell for ultra-LP applications. Our 10T SRAM provides decoupled differential read and single-ended (SE) write operation. A BL pre-charging scheme, a charge-sharing technique, and a write driver are presented to lower power dissipation. The rest of the paper is organized as follows. Section II provides a brief survey. Section III describes the working principle of the proposed cell and its associated scheme. Section IV provides the evaluation results using a 45nm technology node. Finally, Section V concludes the paper.
Existing implementations
In a conventional 6T SRAM, the bit-lines (BLs) are pre-charged to V_DD before read and write. During read, BL (BLB) discharges through the cell, while BLB (BL) stays at V_DD. BL switching can account for more than half of the total read power [3]. For the write operation, one of the BLs is discharged through the write driver. The dynamic power of an SRAM is proportional to the BL voltage swing (∆V_BL). A write requires a full BL discharge, whereas a read with a sense amplifier (SA) requires a ∆V_BL of only about 100 mV.
An 8T SRAM uses a separate read port and isolates the internal nodes from I_read, so sizing for read and write can be done independently. The SRAM cell by Singh et al. uses an SE decoupled read and a write-assist technique for SE write [4]. A low-power technique using separate read and write circuits is proposed in [5]. SRAM cells such as 7T [6], 8T [7] and 9T [8] cut off the internal storage nodes from the I_read path during the read operation to achieve higher RSNM. However, the internal nodes are left floating and are affected by capacitive coupling during read. The 9T SRAM cells by Liu [9] and Lütkemeier [2] have differential and decoupled read capability. Several other SRAM cells, such as 10T [10] and 9T [11], are SE and are designed to reduce BL leakage. Hierarchical BLs, charge recycling between different BL pairs, adaptive body biasing and multi-threshold technology have also been proposed. However, these state-of-the-art designs provide certain benefits while lacking others. An SE decoupled read provides higher RSNM but makes the SA design challenging and power hungry. Differential decoupled read ports provide high RSNM, but BL leakage and high dynamic power dissipation become issues. Several cells that reduce BL leakage are either SE or slow.
Proposed 10T SRAM and charge sharing scheme
The proposed 10T SRAM cell is shown in Fig. 1(a). Transistors M1 to M4 form the cross-coupled inverters that store the data bit Q. M5 is the write access transistor. M6 to M10 constitute our decoupled read port [12]. The cross-coupled inverters are minimum sized, as they only store data. M5 is a strong transistor and thus forces the data from WBL onto node Q. The proposed write driver is shown in Fig. 1(c). Fig. 1(b) shows our pre-charging scheme. We do not pre-charge to the full V_DD level. Rather, RBL is charged to V_P1 and RBLB is charged to V_P2 before the read operation, where V_DD > V_P1 > V_P2.
Write operation
The control signal W is asserted to perform the write operation. First, the write enable signal (W_en) is activated, which charges WBL only if D_in = 0. This differs from the 6T case, where both BLs are pre-charged and one of them is discharged for each write operation. Thus, conditional charging of WBL potentially saves 50% of the write power (assuming equiprobable data bits).
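The 50% figure can be illustrated with a short sketch. It assumes, for illustration only, that the 6T bit-line and the 10T WBL have the same capacitance and both swing by the full V_DD; the function name and parameters are hypothetical and not taken from the design.

```python
def expected_write_energy(c_bl, v_dd, p_zero=0.5):
    """Expected supply energy per write (joules) for 6T and conditional-WBL 10T.

    Assumptions (illustrative, not from the paper): equal line capacitance
    c_bl for both designs and a full v_dd swing whenever a line is charged.
    """
    e_full_swing = c_bl * v_dd ** 2   # energy drawn to recharge a fully discharged line
    e_6t = e_full_swing               # 6T: one BL is discharged and recharged on every write
    e_10t = p_zero * e_full_swing     # 10T: WBL is charged only when D_in = 0
    return e_6t, e_10t


e_6t, e_10t = expected_write_energy(c_bl=100e-15, v_dd=1.0)
print(f"10T / 6T write energy ratio: {e_10t / e_6t:.2f}")
```

With `p_zero` set away from 0.5, the same sketch shows how the saving shifts with the data statistics.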
Reading a 1
Q is 1 (QB = 0), and hence M7 is ON while M9 and M10 are OFF. The read signal (R) turns ON M6 and M8, so RBL and RBLB get connected through M6-M8. As the BLs are at different voltage levels, charge sharing takes place and current flows from RBL to RBLB. Thus, the RBL voltage starts decreasing and the RBLB voltage starts increasing. The final voltage of both RBLs is (V_P1 + V_P2)/2. We can deactivate the read signal as soon as RBL has discharged (and RBLB has charged) by ∆V_BL/2. A dynamic voltage level shifter (LS) can be used as a pre-sense-amplifier that converts this difference into ∆V_BL. During the next pre-charge, RBL will be charged by V_P1, and RBLB will be discharged through V_P2. Thus, the average dynamic power dissipation during the read-1 operation is:
where α_1 is the probability of reading a '1' and f is the frequency of operation.
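One plausible form, counting the charge C_BL·∆V_BL/2 drawn from V_P1 and the equal charge returned to V_P2 during the next pre-charge, is sketched below. The symbol C_BL (bit-line capacitance) and the assumption that the V_P2 source absorbs the returned charge are introduced here for illustration.

```latex
\[
P_{\mathrm{read1}} = \alpha_1 \, C_{BL} \, \left(V_{P1} - V_{P2}\right) \, \frac{\Delta V_{BL}}{2} \, f \tag{1}
\]
```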
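The charge-sharing step of the read-1 operation can be checked numerically. The sketch below assumes ideal, equal bit-line capacitances and ignores leakage and switch resistance; the function names and the example voltages are hypothetical.

```python
def share_charge(v1, v2, c1=1.0, c2=1.0):
    """Common voltage reached when two capacitors at v1 and v2 are connected."""
    return (c1 * v1 + c2 * v2) / (c1 + c2)


def read1_levels(v_p1, v_p2, dv_bl):
    """RBL and RBLB levels when R is de-asserted after each line moved dv_bl / 2."""
    step = dv_bl / 2
    v_settle = share_charge(v_p1, v_p2)   # neither line can pass the settling value
    rbl = max(v_p1 - step, v_settle)      # RBL falls from V_P1
    rblb = min(v_p2 + step, v_settle)     # RBLB rises from V_P2
    return rbl, rblb


# Example values (assumed): V_P1 = 0.6 V, V_P2 = 0.4 V, required swing 100 mV
print(share_charge(0.6, 0.4))
print(read1_levels(0.6, 0.4, 0.1))
```

With equal capacitances the settling value is the midpoint of the two pre-charge levels, which matches the (V_P1 + V_P2)/2 expression in the text.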
Reading a 0
Q is 0 (QB = 1), and thus M7 is OFF while M9 and M10 are ON. The read signal (R) turns ON M6 and M8. RBL gets connected to the virtual power rail (V_VDD), which is at V_DD during read, and RBLB gets connected to ground. Thus, current flows from V_DD to RBL and from RBLB to ground. We can de-assert the read control signal (R) as soon as the sum of the absolute voltage changes of the two RBLs exceeds ∆V_BL. The dynamic LS converts this difference into ∆V_BL because, in both the read-1 and read-0 cases, the RBLs move in opposite directions, i.e., if RBL is charging then RBLB is discharging, and vice versa. For the next pre-charge, RBL will be discharged through V_P1 and RBLB will be charged by V_P2. Assuming equal changes in RBL and RBLB, the average dynamic power with probability α_0 of reading a '0' is:
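Counting C_BL·∆V_BL/2 of charge drawn from V_DD during the read, the same charge returned to V_P1, and C_BL·∆V_BL/2 drawn from V_P2 during the next pre-charge, a plausible form is sketched below. C_BL (bit-line capacitance) is a symbol assumed here, as is the ability of the V_P1 source to absorb the returned charge.

```latex
\[
P_{\mathrm{read0}} = \alpha_0 \, C_{BL} \, \left(V_{DD} - V_{P1} + V_{P2}\right) \, \frac{\Delta V_{BL}}{2} \, f \tag{2}
\]
```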
Average dynamic power
A single BL during read operation has average dynamic power of:
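In the standard switched-capacitance form, with C_BL denoting the bit-line capacitance (a symbol assumed here), this reads:

```latex
\[
P_{BL} = \alpha_{0 \rightarrow 1} \, C_{BL} \, V_{DD} \, \Delta V_{BL} \, f \tag{3}
\]
```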
where α_0→1 is the switching activity factor of the BL, f is the frequency of operation and ∆V_BL is the BL voltage swing. A 6T SRAM has two BLs, and for each read operation one of the BLs is pulled down. Thus, modeling the 6T as a single BL with an activity factor of 1, its dynamic power becomes:
In the case of the 10T SRAM, using α_1 = α_0 = 0.5 in Eqs. 1 and 2, the average dynamic power is:
Thus, the proposed design with its pre-charging scheme potentially provides a 75% reduction in dynamic read power compared with 6T. However, the values of V_P1 and V_P2 can affect the efficiency of operation and the potential power savings.
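Setting the activity factor to 1 in the single-BL model, and again writing the bit-line capacitance as C_BL (an assumed symbol), gives:

```latex
\[
P_{6T} = C_{BL} \, V_{DD} \, \Delta V_{BL} \, f \tag{4}
\]
```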
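The 75% reduction stated next fixes this result relative to the 6T single-BL model. Written with an assumed bit-line capacitance C_BL, it would take the form:

```latex
\[
P_{10T} = \frac{1}{4} \, C_{BL} \, V_{DD} \, \Delta V_{BL} \, f \tag{5}
\]
```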
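The comparison can be explored with a small numerical sketch. It is a simplified model, not the authors' simulation: it assumes each bit-line moves by ∆V_BL/2 per read, that charge returned to a pre-charge source is fully recovered, and that both designs see the same bit-line capacitance. The function names and the V_P1 and V_P2 values are hypothetical; the 100 fF, 100 mV and 500 MHz figures are the ones quoted in the text.

```python
def read_power_6t(c_bl, v_dd, dv_bl, f):
    """6T modeled as a single bit-line with activity factor 1."""
    return c_bl * v_dd * dv_bl * f


def read_power_10t(c_bl, v_dd, v_p1, v_p2, dv_bl, f, alpha1=0.5):
    """Average 10T read power under the assumed charge-accounting model."""
    alpha0 = 1.0 - alpha1
    half_swing = dv_bl / 2
    # Read 1: charge drawn from V_P1 and returned to V_P2 at the next pre-charge
    p_read1 = alpha1 * c_bl * (v_p1 - v_p2) * half_swing * f
    # Read 0: charge drawn from V_DD and V_P2, returned to V_P1
    p_read0 = alpha0 * c_bl * (v_dd - v_p1 + v_p2) * half_swing * f
    return p_read1 + p_read0


c_bl, v_dd, dv_bl, f = 100e-15, 1.0, 0.1, 500e6
p6 = read_power_6t(c_bl, v_dd, dv_bl, f)
p10 = read_power_10t(c_bl, v_dd, v_p1=0.6, v_p2=0.4, dv_bl=dv_bl, f=f)
print(f"10T / 6T dynamic read power: {p10 / p6:.2f}")
```

In this model the V_P1 and V_P2 terms cancel when α_1 = α_0 = 0.5, so the ratio is one quarter regardless of the pre-charge levels. For other read statistics the ratio depends on V_P1 − V_P2, which is one way the choice of pre-charge voltages can affect the savings.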
Simulation conditions and results
We compare our scheme with the 6T SRAM scheme using 45nm technology models. The typical process corner is used, and all simulations are done at 27 °C for different values of supply voltage. For accurate results, drain and source capacitances have been modeled. A distributed bit-line model is used with an effective capacitance of 100 fF. We use a boosted read signal for 10T, whereas a boosted word-line cannot be used with 6T, as it would inject higher current, disturb the internal node and thus reduce RSNM. The 6T SRAM is designed for proper read and write operations.
The operation of the 10T and 6T SRAMs is shown in Fig. 2(a). The transient waveforms show the increase, decrease and pre-charging of the BLs at a V_DD of 1 V and a frequency of 500 MHz for the read bit pattern '11001'. The effective ∆V_BL of both schemes is also shown. The proposed pre-charging scheme and 10T SRAM achieve a higher ∆V_BL than the 6T case. Also, 100 mV of ∆V_BL is reached faster than in 6T. However, as the supply voltage decreases, the slew rate of ∆V_BL of the 10T decreases. Fig. 2(b) shows the RSNM butterfly curves of both SRAM designs at 400 mV and 1 V. The bar graph in Fig. 2(c) compares the RSNM values at different supply voltages. On average, from 0.4 V to 1 V, the 10T SRAM achieves 2.13× the RSNM of 6T.
The write operation is shown in Fig. 3. As single-ended SRAMs have difficulty with the write operation, we have used a boosted write control signal and a strong write access transistor. Also, the minimum-sized inverters help improve writability. Transient write waveforms for both the write-1 (W1) and write-0 (W0) operations at a V_DD of 1 V are shown in Fig. 3a, and the write delay for two different values of V_DD is shown in Fig. 3b. The 10T cell has a smaller delay at the lower V_DD, while it has a higher write-1 (W1) delay at the higher supply voltage.
Layouts of the 6T and 10T cells are shown in Fig. 4a and 4b, respectively. In the layouts, the minimum transistor width is 80nm, and the effective channel length is 40nm. For the 6T SRAM, a cell ratio (defined as CR = β_driver/β_access) of 1.5 is used, while the access and load transistors are of the same width. For the 10T SRAM, a wider write driver is used, and the cross-coupled inverters are minimum sized. The bit-lines (RBL, RBLB, WBL) and power lines (VDD, GND) run vertically and are shared by the column. The control signals (R, W) and the virtual power rail run horizontally and are shared by each row. The area of the 10T SRAM is 0.92 µm², which is 1.8 times the area of the 6T cell. However, the power reduction and RSNM improvement outweigh the area cost of the 10T SRAM.
Read power dissipation results are shown in Fig. 5a for different values of supply voltage and operating frequency. On average, 10T achieves about a 71% reduction in total read power compared with 6T. Write power dissipation results are shown in Fig. 5b. The proposed write driver charges the WBL only when a 0 is to be written. For equiprobable data, 10T dissipates half of the write dynamic power of the 6T SRAM. On average, over the range of operating voltages, 10T achieves about a 48% reduction in total write power compared with 6T.
Total and BL leakage power results for a 32-bit SRAM column are shown for both designs in Fig. 5c for two different values of V_DD at the typical process corner. Results are shown for three cases: (1) all 32 stored bits are '1', (2) all 32 stored bits are '0', and (3) 16 bits are '1' and 16 bits are '0' (the average data case). The all-'0's case is the worst case for the 10T SRAM, as both M9 and M10 are ON. At 0.6 V, 10T has higher BL and total leakage in the all-'0's case; however, the average case shows that 6T and 10T have almost similar leakage at this voltage. At 1 V, 10T has the same leakage as 6T in the all-'0's case. The all-'1's case shows reduced leakage for 10T, and hence, in the average case, the 10T SRAM offers reduced leakage at 1 V. Thus, on average, 10T has lower BL and total leakage than 6T at the higher V_DD, and almost similar BL and total leakage at the lower V_DD.
Conclusion
A low-power 10T SRAM employing a charge-sharing technique is presented. The design provides decoupled differential read access and single-ended write access. Compared with 6T SRAM in a 45nm technology, read decoupling provides 2.1× the RSNM, pre-charging and bit-line charge sharing achieve about a 71% reduction in total read power, and the single WBL along with the proposed write driver allows about 48% savings in total write power.
