Abstract: This paper presents a novel 12T SRAM bitcell suitable for subthreshold operation. To make a bit-interleaving structure feasible and eliminate half-select disturbance, the proposed cell uses a single pass-gate for read operation and dual pass-gates for write operation. Additionally, dedicated transistors decouple the access path from the true storage node, which both enhances read stability and ensures sufficient sensing margin. A multi-threshold-voltage scheme is used to improve writability and reduce leakage consumption. Simulation results show that the proposed cell offers 1.8X the read static noise margin (RSNM) and 1.6X the negative write static noise margin (WSNM) of the traditional 6T cell at 0.4 V, and that sensing margin and access performance are improved compared with the 10T cell.
Introduction
With the increasing demand for energy-saving applications in recent years, power consumption rather than performance has become the primary design concern [1]. As embedded static random access memories (eSRAM) account for a dominant portion of the total power in modern SoCs, low-power SRAM design is both necessary and promising. Reducing the supply voltage (VDD) decreases the dynamic power quadratically and the leakage power linearly. Therefore, voltage scaling down to the near-threshold or subthreshold region has remained the major focus of low-power design for both logic circuits and SRAM [2]. In the near-threshold or subthreshold region, random dopant fluctuation (RDF), line edge roughness (LER) and similar effects aggravate process variation and transistor mismatch [3], and PVT (process, voltage, temperature) variations have an exponential impact on the transistor drive current. The worsening process variation and mismatch dramatically reduce the on-current to off-current ratio (Ion/Ioff), which makes it difficult to distinguish the read current of the accessed cell from the cumulative leakage current of the unaccessed cells and leads to read errors. Besides, the traditional 6T bitcell has conflicting transistor-strength requirements for read stability and writability, and sizing transistors to meet these requirements is ineffective because of Vth variations at low voltage. All of the above factors cause the conventional 6T SRAM to fail in the near- or sub-Vth region.
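The scaling argument above can be illustrated with a first-order sketch. It assumes the textbook models P_dyn = a·C·VDD²·f and P_leak = I_leak·VDD with a constant leakage current; the function names and all numeric values are hypothetical and are not taken from this work.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order switching power: activity * C * VDD^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


def leakage_power(i_leak, vdd):
    """First-order static power, treating the leakage current as constant."""
    return i_leak * vdd


# Hypothetical values, for illustration only.
C_EFF = 1e-12    # switched capacitance in farads
FREQ = 10e6      # clock frequency in hertz
I_LEAK = 1e-6    # leakage current in amperes
NOMINAL_VDD = 1.2
SCALED_VDD = 0.4

dyn_ratio = (dynamic_power(C_EFF, SCALED_VDD, FREQ)
             / dynamic_power(C_EFF, NOMINAL_VDD, FREQ))
leak_ratio = leakage_power(I_LEAK, SCALED_VDD) / leakage_power(I_LEAK, NOMINAL_VDD)

print(f"dynamic power ratio: {dyn_ratio:.3f}")
print(f"leakage power ratio: {leak_ratio:.3f}")
```

Under these assumptions, scaling VDD by one third cuts dynamic power to one ninth but leakage power only to one third. In a real device the leakage current itself also depends on VDD, so the linear model is only a lower bound on the saving.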
Various peripheral write- and read-assist circuits have been developed to make low-voltage 6T SRAM reliable. Wordline overdrive (WLOD) [4], collapse cell VDD (CCVDD) [5] and negative bitline (NBL) voltage [6] are the main measures taken to improve the WSNM. The WLOD method boosts the word line (WL) above VDD and thereby increases the strength of the pass gate (PG) during write operation. However, WLOD degrades the read stability of the half-select (HS) cells. The CCVDD scheme lowers the cell supply voltage to weaken the pull-up PMOS and thus increases writability, but the hold static noise margin (HSNM) of the HS cells on the same VDD line deteriorates. The NBL scheme assists the write operation by driving one BL negative to strengthen the PG on the write '0' side, but the pump capacitors it requires cost additional area and power. For the sake of read stability, wordline underdrive (WLUD) is used in [7] at the cost of cell performance. A suppressed-BL technique is exploited in [8] to improve read stability; nevertheless, this implementation incurs more area overhead and higher power consumption than WLUD.
On the other hand, diverse novel SRAM bitcells have been invented to solve the challenges faced by the 6T cell. Read-path decoupled SRAM cells such as 8T [9, 10] and 10T [11] are proposed to make the RSNM equal to the HSNM by isolating the SRAM cell node from the read bitline. Because the soft-error rate (SER) increases as VDD scales down [12], it is necessary to implement an error correction code (ECC) scheme that can address multi-bit soft errors on the basis of a bit-interleaving structure, in which adjacent bits belong to different logical words [13]. Unfortunately, the robust read-path decoupled cells are not compatible with conventional bit-interleaving because the HS cells suffer read disturbance during write operation [14]. Moreover, the 8T and 10T structures suffer from slow single-ended sensing and additional read-bitline leakage consumption.
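A minimal sketch of why bit-interleaving helps ECC is given below. It assumes a simple modulo column mapping and a hypothetical interleaving factor of four; the function names and the mapping are illustrative and are not taken from [13].

```python
def column_of(word, bit, interleave):
    """Physical column of `bit` of logical `word` within one row.

    Bits with the same index from different words sit side by side.
    """
    return bit * interleave + word


def owner_of(column, interleave):
    """Inverse mapping: (word, bit) stored at a physical column."""
    return column % interleave, column // interleave


INTERLEAVE = 4  # hypothetical: four logical words share one row

# Suppose one particle strike upsets three physically adjacent columns.
upset_columns = [5, 6, 7]
errors_per_word = {}
for col in upset_columns:
    word, _bit = owner_of(col, INTERLEAVE)
    errors_per_word[word] = errors_per_word.get(word, 0) + 1

print(errors_per_word)
```

With this mapping, the three adjacent upsets fall in three different logical words, so each word sees at most one flipped bit and a single-error-correcting code per word is sufficient.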
In summary, considering stability, soft-error, speed and power issues together, bitcells that offer differential sensing, noise immunity and bit-interleaving capability are attractive and competitive for subthreshold SRAM design.
Previous differential sensing cells
The traditional 6T cell (referred to as "6T") is shown in Fig. 1(a). The conflicting transistor-strength requirements for read stability and writability necessitate a specific size relationship among the pull-up (PU), pull-down (PD) and pass-gate (PG) transistors, namely W_PD > W_PG > W_PU. Large transistors not only enlarge the layout but also increase the leakage current. In addition, when a bit-interleaving structure is used to reduce SER, the HS cells suffer from read disturbance. As a result, write-assist schemes such as WLOD and CCVDD are unavailable.
A differential 10T bitcell (referred to as "10T") is used in [15] to enable separate access for read and write operations, as shown in Fig. 1(b). This 10T cell features dual pass-gates MNPG and MNPGI for write access and a dedicated read footer MNFO to enhance stability. Writability is reduced by the stacked NMOS transistors between the bitline and the internal node. To boost writability, hybrid regular and high Vth (RVT and HVT) or low and regular Vth (LVT and RVT) devices are adopted. The two configurations offer either a lower minimum voltage (VMIN), or a higher VMIN with reduced static consumption. For example, in the higher-VMIN mode, better writability is obtained by implementing the PU and PD transistors as HVT while the pass gates and the read footer transistors remain RVT. However, this 10T cell suffers from slower sensing or sensing errors owing to BL leakage, which is data and PVT dependent. This issue limits the minimum operating voltage. Fig. 1(c) shows a Schmitt trigger based, differential, 10T SRAM bitcell (referred to as "SC") proposed in [16]. With its built-in feedback mechanism, the SNM of the SC cell improves significantly. Compared with the conventional 6T cell, the SC bitcell gives better read stability as well as better writability. Note that the read access time gets longer because of the two stacked PD NMOS transistors. Besides, read disturbance of HS cells on the same wordline still exists during write operation when a bit-interleaving structure is used.
Reference [17] presents a Schmitt trigger based 12T bitcell (referred to as "SC 12T"), shown in Fig. 1(d). Because N6 and N9 isolate BL disturbance from the true storage nodes Q and QB during read operation, the RSNM is in theory further improved compared with the SC cell. However, writability deteriorates greatly as a result of the stacked access NMOS transistors. Hence, the authors employ a multiple-threshold transistor scheme to improve write performance: LVT transistors are used for N7 and N10 while RVT transistors are used for the remaining devices, at the cost of reduced RSNM and increased BL leakage consumption.
Proposed differential multi-Vth 12T cell
To remedy the shortcomings of the previous 6T, 10T, SC and SC 12T cells, we design a stable, reliable and bit-interleaving capable 12T bitcell suitable for subthreshold operation, referred to as "12T" and shown in detail in Fig. 2.
The proposed cell consists of four parts: the symmetric access part (NAL1, NAL2, NAR1, NAR2), the back-to-back inverter part (PL1, NL1, PR1, NR1), the read footer part (NLD1, NRD1), and the BL leakage compensation (BLLC) part (PLD1, PRD1). Because the stacked access transistors degrade cell writability and the additional leakage current paths (from node Q to Q2 and from QB to QB2) increase static power consumption, a multi-Vth design approach is used. On the read path, NAL1, NLD1, NAR1 and NRD1 are implemented as RVT transistors to balance fast access time against low BL leakage. To improve write performance and lower leakage power, LVT NMOS devices are used for NAL2 and NAR2 while PL1, PR1, NL1 and NR1 are configured as HVT transistors. The BLLC devices PLD1 and PRD1 are RVT PMOS transistors, chosen for their BL leakage compensation strength, whose purpose is to improve the read sensing margin. It is noteworthy that LVT PLD1 and PRD1 would provide better leakage compensation; however, such an allocation weakens the ability to hold '1' and thus decreases the SNM. The multi-Vth design incurs an additional area penalty (analyzed in Section 6).
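The threshold-voltage allocation described above can be summarized as a small lookup table. The sketch below only restates the assignment given in the text; the dictionary layout and the helper name `devices_with` are illustrative choices, not part of the design flow.

```python
# Threshold flavor of each of the twelve transistors, as stated in the text.
VTH_ASSIGNMENT = {
    # symmetric access part
    "NAL1": "RVT", "NAR1": "RVT",   # on the read path
    "NAL2": "LVT", "NAR2": "LVT",   # chosen to improve write performance
    # back-to-back inverter part
    "PL1": "HVT", "NL1": "HVT", "PR1": "HVT", "NR1": "HVT",
    # read footer part
    "NLD1": "RVT", "NRD1": "RVT",
    # BL leakage compensation (BLLC) part
    "PLD1": "RVT", "PRD1": "RVT",
}


def devices_with(flavor):
    """Return the sorted names of all devices using the given Vth flavor."""
    return sorted(name for name, vth in VTH_ASSIGNMENT.items() if vth == flavor)


for flavor in ("LVT", "RVT", "HVT"):
    print(flavor, devices_with(flavor))
```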
Simulation results
HSPICE simulations based on SMIC 55 nm CMOS technology are carried out for the proposed 12T cell and the previous 6T, 10T, SC and SC 12T cells to demonstrate the merits of our cell. To eliminate the effects of size optimization, all transistors in all cells are minimum size. The higher-VMIN mode with lower BL and VDD leakage is adopted for the 10T cell, as described in Section 2.
Hold static noise margin (HSNM)
1000 Monte Carlo simulations are run to measure the HSNM of the five cells (the proposed cell and the four previous ones) at 0.4 V and 25°C. For ease of comparison, fitted normal distribution curves are plotted in Fig. 3(a). Fig. 3(b) shows the butterfly diagram formed by the VTCs (voltage transfer characteristic curves).
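The normal-distribution fit amounts to estimating a mean and a standard deviation from the Monte Carlo samples. The sketch below shows that step on synthetic placeholder data generated with Python's standard library; the sample values, the seed and the function names are assumptions for illustration and are not simulation results from this work.

```python
import random
import statistics


def fit_normal(samples):
    """Return (mean, sample standard deviation) of a list of SNM values."""
    return statistics.mean(samples), statistics.stdev(samples)


def relative_change(reference_mean, new_mean):
    """Fractional change of new_mean with respect to reference_mean."""
    return (new_mean - reference_mean) / reference_mean


# Synthetic placeholder data in volts, NOT results from the paper.
rng = random.Random(0)
ref_snm = [rng.gauss(0.100, 0.010) for _ in range(1000)]
new_snm = [rng.gauss(0.086, 0.010) for _ in range(1000)]

ref_mu, ref_sigma = fit_normal(ref_snm)
new_mu, new_sigma = fit_normal(new_snm)

print(f"reference: mean={ref_mu:.4f} V, sigma={ref_sigma:.4f} V")
print(f"new:       mean={new_mu:.4f} V, sigma={new_sigma:.4f} V")
print(f"mean change: {relative_change(ref_mu, new_mu):+.1%}")
```

In practice the sample lists would be replaced by the SNM values extracted from each Monte Carlo run of the circuit simulator.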
In hold mode, the VTC of the 12T cell shows a bend, as seen in Fig. 3(b); this bend results from an additional pull-down path and shrinks the enclosed area. Assuming that the voltage at node Q (VQ) sweeps from 0 V to 0.4 V, the PMOS transistors PR1 and PRD1 are initially on whereas the NMOS transistors NR1 and NRD1 are off. When VQ rises to about 0.1 V, NRD1 is weakly turned on while PRD1 has not yet shut off, so a weak additional pull-down path is introduced, which causes the bend. In the second half of the sweep, after PRD1 turns off, the VTC of the 12T cell follows a trend similar to that of the 6T cell. The bend leads to an HSNM reduction. According to the Monte Carlo results in Fig. 3(a), the mean HSNM of our cell is 14% lower than that of the 6T cell. However, such a small reduction is acceptable because HSNM is not the decisive factor for low-voltage operation.
Read static noise margin (RSNM)
The RSNM of the five cells is measured at 0.4 V and 25°C by running 1000 Monte Carlo simulations. Fig. 4(a) presents the fitted normal distribution curves to show the improvement clearly. Butterfly curves at 0.4 V and 0.8 V for the 6T and 12T cells are shown in Fig. 4(b). The improvement in RSNM mean value and standard deviation versus VDD is shown in Fig. 4(c) and Fig. 4(d), respectively.
During the read phase of our 12T cell, signal WL is enabled whereas signal WWL is disabled. BLL and BLR are precharged to VDD before reading. For the read '0' operation (VQ = VQ2 = '0', VQB = VQB2 = '1'), the read current through NAL1 and NLD1 disturbs only node Q2. Because PLD1 is shut off, the noise at node Q2 is kept away from the true storage node Q. As a result, the RSNM of the 12T cell improves by 80% and 20% compared with the 6T and SC cells, respectively. Fig. 4(c) indicates that a higher VDD yields a larger RSNM improvement over the 6T cell.
Write static noise margin (WSNM) and read access time
A write operation is executed when both WL and WWL are enabled in the 12T cell. As shown in Fig. 5(a), a more negative WSNM indicates stronger writability. Thanks to the multi-Vth design technique, both write '0' and write '1' performance improve, and thus our 12T cell has the strongest writability below 0.6 V among the five cells. Compared with the 6T cell, the 12T cell achieves a 64% writability enhancement at 0.4 V, and the improvement shrinks as the supply voltage increases. Access time is defined as the time required to develop a 50 mV bitline differential voltage after the wordline signal WL is asserted during the read phase, and a configuration of 64 cells per bitline is used for all five cells. All cells are initialized to the same '0' state (VQ = '0', VQB = '1') at 25°C before measurement. As shown in Fig. 5(b), the LVT access transistors in the SC 12T cell reduce its access time by 66% compared with the 6T cell at 0.4 V, at the cost of a heavily degraded RSNM (shown in Fig. 4(a)). For the SC cell, a 58% access time penalty is incurred at 0.4 V due to the stacked PD NMOS transistors. The 10T cell suffers from BL leakage and has 11% and 27% access time increases at 0.4 V and
Leakage consumption
The total bitcell leakage current shown in Fig. 6 consists of the VDD leakage current and the BL leakage current. Because of its LVT access transistors, the SC 12T cell leaks far more current than the other four cells. The leakage consumption of our 12T cell is somewhat lower than that of the 6T cell owing to the use of HVT transistors.
Bit-interleaving and BL leakage compensation capability
Half-select disturbance is a serious problem that a bit-interleaving structure must overcome. Fig. 7 shows the selected cell (cell0), the HS cells (cell1 and cell2) and the unselected cell (cell3) during write operation. For HS cell2 in the same column, no read disturbance is introduced to the internal storage node because signal WL[j] is disabled, and the enabled signal WWL[m] brings no noise either. For HS cell1 in the same row, BLL[n] and BLR[n] are precharged to VDD before the write operation. When WL[i] is asserted, a high-voltage read disturbance is generated at node Q2 or QB2 (depending on the stored data) because read current flows from the BL to ground. Fortunately, the true storage node Q or QB is shielded from the disturbance because PLD1 or PRD1 is disabled. Consequently, a disturbance-free bit-interleaving structure is feasible with our 12T cell.
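The four cell states during a write can be sketched as a small selection model. The sketch assumes, as the description above implies, that WL is shared along a row and WWL along a column, and that a cell is written only when both are asserted; the function names and the 2x2 array are illustrative.

```python
def control_signals(row, col, sel_row, sel_col):
    """Return (WL, WWL) seen by the cell at (row, col).

    Assumption: WL is shared along a row and WWL along a column.
    """
    return row == sel_row, col == sel_col


def cell_state(wl, wwl):
    """Classify a cell from its two control signals during a write."""
    if wl and wwl:
        return "selected (written)"
    if wl:
        return "row half-selected"
    if wwl:
        return "column half-selected"
    return "unselected"


SEL_ROW, SEL_COL = 0, 0  # hypothetical position of the selected cell
states = {}
for r in range(2):
    for c in range(2):
        wl, wwl = control_signals(r, c, SEL_ROW, SEL_COL)
        states[(r, c)] = cell_state(wl, wwl)

for position, state in sorted(states.items()):
    print(position, state)
```

In this model, position (0, 0) corresponds to cell0, the row half-selected cell to cell1, the column half-selected cell to cell2 and the remaining cell to cell3.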
1000 Monte Carlo simulations are carried out to illustrate the BL leakage issue, as shown in Fig. 8. For the differential-sensing 10T cell at low voltage, the dramatically reduced Ion/Ioff makes it difficult to distinguish the read current of the accessed cell on one bitline from the cumulative leakage current of the unaccessed cells on the other bitline. Besides, the leakage current is sensitive to PVT variations and depends on the stored data. Peripheral assist circuits used to compensate the leakage current degrade read performance and consume additional power. Thus, self-adaptive BL leakage compensation is necessary for reliable sensing without a loss of access speed. The BLLC PMOS transistors PLD1 and PRD1 in our 12T cell adaptively compensate bitline leakage during read operation, and thus a larger sensing margin (Fig. 8) and better access performance (Fig. 5(b)) are obtained compared with the 10T cell.
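A rough sketch of why a shrinking Ion/Ioff threatens differential sensing follows. It assumes a worst case in which every unaccessed cell on the opposite bitline leaks the same off-current, and it reuses the 64-cells-per-bitline figure from the access-time setup (whether Fig. 8 uses the same configuration is an assumption). The function name and the Ion/Ioff values are hypothetical.

```python
def worst_case_current_ratio(i_on, i_off, cells_per_bitline):
    """Read current of the accessed cell divided by the summed leakage of
    the other cells, assuming every unaccessed cell leaks i_off onto the
    opposite bitline (worst-case stored data)."""
    return i_on / ((cells_per_bitline - 1) * i_off)


CELLS_PER_BITLINE = 64

# Hypothetical Ion/Ioff ratios; the off-current is normalized to 1.
for ion_ioff in (1e4, 1e3, 1e2):
    ratio = worst_case_current_ratio(ion_ioff, 1.0, CELLS_PER_BITLINE)
    print(f"Ion/Ioff = {ion_ioff:>7.0f} -> read/leakage = {ratio:.2f}")
```

Once the ratio approaches one, the two bitlines discharge at similar rates and the sense amplifier can no longer resolve the stored value reliably, which is the situation the BLLC devices are meant to avoid.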
Area cost
The layouts of the 6T cell and the proposed 12T cell, based on the SMIC 55 nm CMOS process, are depicted in Fig. 9; no DRC dimension rules are violated. The 12T cell occupies 2.58X the area of the conventional 6T cell due to the additional transistors and the multi-Vth design method. The summary comparison in Table I indicates that the proposed 12T bitcell is competitive and promising for low-power SRAM design.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61474135).
Fig. 9. Layout of 6T cell and 12T cell
