Robust high-density subthreshold SRAMs are indispensable for emerging ultra-low-power applications such as implantable devices, medical instruments, and wireless sensor networks. Conventional 6T SRAMs operated in the subthreshold region fail to meet density and yield requirements because of reduced static noise margin (SNM), poor writability, a limited number of cells per bitline, and reduced bitline sensing margin. 8T and 10T SRAM cells have been proposed to improve the SNM by decoupling the SRAM cell nodes from the bitline, making the read-mode SNM equal to the hold-mode SNM [1,2]. This paper introduces several circuit techniques for designing robust high-density subthreshold SRAMs: (i) a decoupled cell for read margin improvement, (ii) use of the reverse short channel effect (RSCE) for write margin improvement, (iii) elimination of data-dependent bitline leakage to enable long bitlines, (iv) a virtual ground replica scheme for improved bitline sensing margin, (v) a writeback scheme for data preservation during write, and (vi) optimal gate sizing based on subthreshold logical effort.
Fig. 18.5.1 shows the proposed 10T SRAM cell. When read is enabled (RWL=1), the read bitline (RBL) is conditionally discharged through pull-down transistors M7, M8, and M9 depending on 'QB'. The cell node is isolated from the bitline during the read operation, so the cell retains its hold-mode SNM. The 10T SRAM cell has an SNM of 76mV at a supply voltage of 0.2V, whereas that of a conventional 6T SRAM cell is 14mV. When read is disabled (RWL=0), node A is held at VDD, so the bitline leakage flows from node A to RBL regardless of the data stored in the SRAM cell. A write is performed by asserting WWL while the write data are loaded onto the write bitlines (WBL, WBLB). To improve the write margin, previous techniques applied a wordline voltage higher than the cell voltage to increase the drive current of the write access transistors [2]. Instead, we exploit the RSCE that prevails in the subthreshold region to improve cell writability without introducing a separate high VDD. RSCE is observed in modern CMOS devices because of the halo pocket implants used to compensate for Vt roll-off [3]. RSCE is not a concern in conventional superthreshold designs since it does not affect the device characteristics of minimum-channel-length transistors. However, in the subthreshold region, where DIBL is reduced and current depends exponentially on Vt, RSCE causes the operating current to increase with longer channel length, as shown in Fig. 18.5.2. For a fixed device width, minimum delay is achieved at a channel length of 0.36µm in the subthreshold region, three times the minimum channel length of 0.12µm. For equal drive current, device width can be reduced as the channel length is increased, which lowers the junction capacitance, a significant contributor to write power consumption. Using RSCE yields further advantages: a better subthreshold slope owing to the longer channel length, and a reduced impact of random dopant fluctuation owing to the larger gate area at equal drive current. Writability of the proposed SRAM cell is improved by using a 3× longer channel length for the write access transistors (Fig. 18.5.2, bottom). The resulting writability is equivalent to that of a conventional scheme using a WWL voltage 70mV above the nominal 0.2V supply voltage. Longer-channel devices are also used for the static CMOS gates in the SRAM row decoding path to reduce delay, power consumption, and circuit variability.
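The RSCE argument above can be illustrated with a first-order subthreshold current model. This is only a sketch under stated assumptions: the prefactor `i0`, the slope factor `n`, and both Vt values (including the assumed 60mV Vt reduction at 3× channel length) are hypothetical numbers chosen for illustration, not data from this work. It shows how an exponential gain from a lower Vt can outweigh the linear 1/L penalty of a longer channel.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q near 27 degC, in volts


def subthreshold_current(width_um, length_um, vgs, vds, vt, i0=1e-7, n=1.5):
    """First-order subthreshold drain current in amperes.

    i0 and n are illustrative fitting constants (assumptions),
    not values taken from the paper.
    """
    gate_term = math.exp((vgs - vt) / (n * THERMAL_VOLTAGE))
    drain_term = 1.0 - math.exp(-vds / THERMAL_VOLTAGE)
    return i0 * (width_um / length_um) * gate_term * drain_term


# Hypothetical Vt values: RSCE is assumed to lower Vt by 60 mV when the
# channel is drawn 3x longer. The real shift depends on the process.
i_short = subthreshold_current(1.0, 0.12, vgs=0.2, vds=0.2, vt=0.35)
i_long = subthreshold_current(1.0, 0.36, vgs=0.2, vds=0.2, vt=0.29)

print(f"I(L=0.36um) / I(L=0.12um) = {i_long / i_short:.2f}")
```

In this model a Vt reduction and an equal increase in gate voltage have the same effect on current, which is one way to read the comparison between a longer-channel access transistor and a boosted WWL voltage.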
The small Ion-to-Ioff ratio in the subthreshold region limits the number of cells per bitline and negatively impacts SRAM density. Bitline leakage from the unaccessed cells can rival the read current of the accessed cell, making it hard to distinguish between the bitline high and low levels. Previous techniques suffer from data-dependent bitline leakage, which can cause the RBL high level to droop or the RBL low level to rise depending on the data stored in the unaccessed cells of a bitline. Fig. 18.5.3 shows that, for worst-case data patterns, the bitline voltage for data '1' may be lower than that for data '0', which can cause the read buffer to generate an incorrect output. A 0.3V subthreshold SRAM with 256 cells on a single bitline has been reported in [2]. The proposed 10T SRAM cell eliminates the data-dependent bitline leakage problem by turning on M10 (Fig. 18.5.1) in all unaccessed SRAM cells (RWL is low) while M8 and M9 are turned off. The drain voltage of M10 therefore becomes VDD and forces the leakage current to always flow from the cell into the bitline regardless of the data stored. A bitline swing of 130mV is achieved at a 0.2V supply, and the bitline high and low levels are constant with respect to the data pattern of the column.
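A back-of-the-envelope sketch of why a small Ion-to-Ioff ratio limits bitline length: it compares the summed leakage of all unaccessed cells with the read current of the one accessed cell. The ratios used below (100 and 1000) are hypothetical placeholders, and the model assumes every unaccessed cell leaks the same Ioff in the worst-case direction.

```python
def worst_case_leakage_ratio(cells_per_bitline, ion_to_ioff):
    """Total leakage of the unaccessed cells divided by one cell's read current.

    Assumes each unaccessed cell contributes an identical Ioff (a
    simplification); a result near or above 1.0 means the leakage
    rivals the read current.
    """
    unaccessed = cells_per_bitline - 1
    return unaccessed / ion_to_ioff


# Bitline lengths match the four test quadrants; the ratios are hypothetical.
for n_cells in (128, 256, 512, 1024):
    for ratio in (100, 1000):
        rel = worst_case_leakage_ratio(n_cells, ratio)
        print(f"{n_cells:5d} cells, Ion/Ioff={ratio:5d}: leakage/read = {rel:.2f}")
```

When the leakage is made data-independent, as in the proposed cell, this total no longer changes with the stored pattern, so it shifts the bitline levels by a fixed amount instead of closing the gap between them unpredictably.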
Sense amplifiers are replaced with static-inverter-type read buffers in subthreshold SRAMs to achieve a higher bitline sensing margin. In addition to the data-independent bitline leakage cell, a virtual ground replica scheme is proposed in this work to obtain the largest sensing margin by automatically tracking the optimal trip point of the read buffers under PVT variations. A replica bitline with hardwired data and control signals generates the highest logic '0' voltage, which becomes the virtual ground for the read buffers. As shown in Fig. 18.5.4, the technique keeps the trip point of the read buffer midway between the bitline high and low levels, offering an ideal sensing margin.
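The effect of the virtual ground on the sensing margin can be sketched with an idealized read buffer. The assumptions here are strong and are not taken from the measurements: the inverter is treated as perfectly symmetric, so its trip point sits midway between its two supply rails; the bitline high level is taken to equal the 0.2V supply; and the low level is taken as 0.07V, which follows from the 130mV swing only under that same assumption.

```python
def read_buffer_trip_point(vdd, buffer_gnd):
    """Trip point of an idealized symmetric inverter between vdd and buffer_gnd.

    Real inverters deviate from the midpoint with sizing and PVT; this is
    an assumption made purely for illustration.
    """
    return buffer_gnd + (vdd - buffer_gnd) / 2.0


def sensing_margins(bl_high, bl_low, trip):
    """Return (high-side margin, low-side margin) in volts."""
    return bl_high - trip, trip - bl_low


VDD = 0.20      # supply voltage from the text
BL_HIGH = 0.20  # assumed: bitline high level sits at the supply
BL_LOW = 0.07   # assumed: 130 mV below the assumed high level

# Read buffer referenced to true ground.
trip_gnd = read_buffer_trip_point(VDD, 0.0)
print("true ground   :", sensing_margins(BL_HIGH, BL_LOW, trip_gnd))

# Read buffer referenced to a virtual ground equal to the logic '0' level,
# standing in for the replica bitline output.
trip_vgnd = read_buffer_trip_point(VDD, BL_LOW)
print("virtual ground:", sensing_margins(BL_HIGH, BL_LOW, trip_vgnd))
```

With true ground the two margins are unequal and the smaller one limits sensing; raising the buffer ground to the logic '0' level balances them in this idealized model.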
A 4.1×1.5mm² SRAM with 480kb cells was fabricated in 8M 0.13µm CMOS (Fig. 18.5.7). Each SRAM quadrant has a different number of cells per bitline (128, 256, 512, and 1024) to test the effectiveness of the proposed techniques in enabling long bitlines for high density. The quadrant with 1024 cells per bitline is functional down to 0.20V, running at 120kHz (27°C). Fig. 18.5.5 (left) shows the measured leakage current of the SRAM core as the supply voltage is swept. The leakage current is reduced to 10.2µA by lowering the supply voltage to 0.20V (27°C). Fig. 18.5.5 (right) shows that utilizing the RSCE yields a 28% improvement in the row decoder path delay (1.22µs→0.88µs). The measured virtual ground voltages are shown in Fig
