Abstract-The growing demand for multimedia-rich applications in handheld portable devices continuously drives the need for large, high-speed embedded Static Random Access Memory (SRAM) to enhance system performance. Many circuit techniques, e.g. body bias and bit-line charge recycling, have been proposed to expand design margins at low-voltage operation while reducing leakage current in standby mode, but these gains come at the cost of speed, an issue that has not been widely addressed. In addition, with the continuous scaling of CMOS, process variations also affect the performance of SRAMs. This paper presents an analysis of low-leakage SRAM together with the speed factor.
I. INTRODUCTION
The rising demand for multimedia-rich applications in handheld portable devices continues to drive the need for large, high-speed SRAM (Static Random Access Memory) to enhance system performance. Power-sensitive portable devices need to reduce dynamic and standby power consumption in order to meet battery-lifetime targets. The leakage power in the CPU is dominated by the large on-die SRAMs, since the rest of the processor has been greatly optimized to reduce power consumption. Low transistor leakage has traditionally been achieved by increasing the transistor threshold voltage (Vt), gate length and gate dielectric thickness at the cost of speed and area. The low-power requirement has often been met by compromising SRAM performance through the adoption of slower, low-leakage transistors and a lower supply voltage.
Lowering Vdd in SRAM has been applied to reduce the standby leakage power and the active power associated with switching the highly capacitive bitlines and wordlines during active operations. However, the operating margin of the SRAM cell sets the lower limit of the operating voltage. To ensure adequate read/write margins of the SRAM cells, the SRAM is required to run above a minimum supply voltage during both the active mode and the standby mode. Moreover, device scaling of large SRAM has not kept pace with technology scaling due to the rising process variations in scaled SRAM cells and the growing size of embedded SRAM [2]. Many circuit techniques have been proposed [2, 3, 4] to expand design margins at low-voltage operation while reducing leakage current in standby mode. However, the performance cost of trading off transistor speed for lower leakage is not clearly addressed. This paper analyzes the low-leakage SRAM cell along with the effect on speed.
Manuscript received September 29, 2010.
Section II discusses the conventional 6T-SRAM cell, and Section III presents the analysis and discusses various SRAM cell topologies. Table 1 compares the performance of conventional SRAM cells at various technology nodes.
II. CONVENTIONAL 6T-SRAM CELL
A typical SRAM, shown in Fig. 1, consists of several blocks, e.g., memory cell arrays, an address decoder, column multiplexers, sense amplifiers, I/Os, and control circuitry. The functionality and design of every component of an SRAM block can be found in [7]. A schematic of the 6T SRAM cell is shown in Fig. 1. The bit value stored in the cell is preserved as long as the cell is connected to a supply voltage greater than the Data Retention Voltage (DRV). This is due to the presence of cross-coupled inverters in the 6T SRAM cell. In an SRAM cell, the pull-down NMOS transistors and the pass-transistors reside in the read path. To achieve high read stability, the pull-down transistors are made stronger than the pass-transistors. The pull-up PMOS transistors and the pass-transistors, on the other hand, are in the write path. Although using strong PMOS transistors improves the read stability, it degrades the write margin. Proper sizing of the pass-transistors is required to achieve an adequate write margin [5].
III. ANALYSIS OF CONVENTIONAL 6T-SRAM CIRCUITS
In this section we analyze the performance of various SRAM circuits with respect to speed and leakage power. The methodologies that have been used to improve the speed and leakage power consumption of the cell are discussed below:
A. Dynamic SRAM PMOS FBB (Forward Body Bias)
An effective approach to reducing leakage power involves dynamically changing the body bias, as shown in Fig. 2. When the block enters the standby state, reverse body bias (RBB) is applied to increase the Vt of the transistors, thus decreasing the sub-threshold leakage current. When the block returns to the active state, RBB is removed to decrease the Vt of the transistors and thus restore their nominal performance [7].
Fig. 2. Dynamic Body Bias [7]
The key issue with this approach is that the range of threshold adjustment is limited, which in turn limits the amount of sub-threshold leakage reduction. However, the advantage of this approach over sleep transistors is that it can be implemented without incurring any delay penalty. This can be done by applying forward body bias (FBB) when the block is in the active state. Under FBB, the Vt of the devices is lowered, thus increasing the performance.
As a tradeoff, FBB increases the subthreshold leakage. It is also crucial to limit FBB to ensure that the source-bulk pn-junction remains in cutoff when FBB is applied. Lowering the power supply is known to be one of the best methods of reducing the power consumption of integrated circuits. Low-power operation can be achieved by lowering the operating voltage in active and/or inactive modes. Moreover, an SRAM can typically retain data at a lower voltage (Standby-VCCmin) than the minimum voltage needed for Read/Write (Active-VCCmin).
Fig. 3. SRAM Circuit using Forward Body [8]
The PMOS strength in the 6T-SRAM cell is essential to maintain cell stability during the active mode. This is particularly important for low-voltage operation. The process solution of using low-Vt PMOS is often prohibitive due to the excessively large transistor leakage, but the adaptive body-bias design is shown to be effective in improving transistor performance without the cost in leakage power [9]. A column-based SRAM PMOS body bias has been proposed to improve both the Read and the Write VCCmin [10]. That design keeps the N-well forward-biased in idle mode, which increases the leakage power, and charges it above Vcc for the selected column during a write operation. In this design, a dynamic forward body bias for the PMOS in the SRAM cell is developed to improve the robustness of low-voltage operation while meeting stringent product power requirements at minimum design overhead. The sub-array based design is adopted to keep the area overhead minimal. It applies forward bias to the activated sections of the array during both Read and Write operations. Although this degrades the Write-Margin, the overall VCCmin distribution is improved, since the VCCmin for large arrays is Read-limited. The amount of FBB is determined by the ratio of two PMOS devices (PL and PD), which has a built-in programming control [9].
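The dependence of threshold voltage and sub-threshold leakage on body bias described in this subsection can be illustrated with the textbook body-effect and sub-threshold relations. The following is a minimal sketch, not the circuit of [9]: the device parameters (`vt0`, `gamma`, `phi_f`, `n`) are illustrative assumptions that do not come from the cited works, and the sign convention is that a positive `v_sb` stands for reverse body bias and a negative `v_sb` for forward body bias.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def threshold_voltage(v_sb, vt0=0.35, gamma=0.3, phi_f=0.4):
    """Body-effect model: Vt = Vt0 + gamma*(sqrt(2*phi_f + Vsb) - sqrt(2*phi_f)).

    All default parameter values are assumed, illustrative numbers.
    """
    surface = 2.0 * phi_f
    if surface + v_sb <= 0.0:
        # Very large forward bias is outside the validity of this simple model.
        raise ValueError("forward body bias too large for this model")
    return vt0 + gamma * (math.sqrt(surface + v_sb) - math.sqrt(surface))


def relative_leakage(v_sb, n=1.5, temp_k=300.0):
    """Sub-threshold leakage relative to zero body bias: exp(-dVt / (n*kT/q))."""
    v_thermal = BOLTZMANN * temp_k / ELECTRON_CHARGE
    delta_vt = threshold_voltage(v_sb) - threshold_voltage(0.0)
    return math.exp(-delta_vt / (n * v_thermal))


if __name__ == "__main__":
    for label, v_sb in (("RBB", 0.5), ("zero bias", 0.0), ("FBB", -0.4)):
        print(f"{label:>9}: Vt = {threshold_voltage(v_sb):.3f} V, "
              f"relative leakage = {relative_leakage(v_sb):.2f}x")
```

Under these assumed parameters, reverse bias raises Vt and lowers leakage while forward bias does the opposite, which is the tradeoff discussed above. The guard on large forward bias loosely mirrors the requirement that the source-bulk junction must not turn on.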
To meet fast and dynamic requirements, an NMOS pull-down path, formed by transistors ND and NS and controlled by a pulse signal, is employed to achieve a fast voltage transition at the N-well. A feedback or shut-off mechanism is also used to prevent the N-well voltage from dropping too low and causing excessive junction leakage. The trip points of the inverters, SI1 and SI2, are optimized to meet this need, as shown in Fig. 3. The pull-down signal pulse is programmable and generated off the wakeup signal (Sleep5p). It starts the discharge of the N-well voltage one cycle before the WL is turned on in order to ensure that the N-well voltage has reached the intended static level. It is shown that the design achieves high-frequency operation over a wide voltage range, with a maximum operating frequency of 2.7 GHz at as low as 0.9 V and 3.8 GHz at 1.1 V [9]. This operating frequency in 45nm technology represents a 25% improvement over previously reported results in 65nm technology. The 256Kb dual blocks operate at 2.3-4.2 GHz and consume 16-20 mW total power using a 0.7-1.2 V variable VCORE, a 1.2 V fixed VLLC and a standby of 0.9 V at 85 °C. Power reduction decreases at a very high activity of one access every three cycles (33% activity), as shown in Fig. 4.
Fig. 4. Chart showing voltage vs. frequency and power [14]
B. Bit-Line Charge Recycling
The charge-recycling SRAM (CR-SRAM) reduces the read and write powers by recycling the charge in the bit lines. When N bit lines recycle their charges, the swing voltage and power of the bit lines are reduced to 1/N and 1/N², respectively. The CR-SRAM utilizes a hierarchical bit-line architecture to perform charge recycling without static noise margin degradation in the memory cells. It can recycle the charge in the bit lines during both read and write operations, whereas the conventional charge-recycling SRAM recycles the charge only during write operations [21]. The proposed CR-SRAM not only reduces the read power through the charge-recycling read operation, but also avoids the power and delay overheads due to the read-to-write mode change of the conventional charge-recycling SRAM. Because the hierarchical bit line avoids static noise margin degradation, the CR-SRAM can reduce both read and write powers with good reliability. In simulation, the CR-SRAM saves 17% read power and 84% write power compared with the conventional SRAM.
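The 1/N and 1/N² scaling quoted above can be tabulated directly. The sketch below simply evaluates those two ratios for a few values of N. The function name and the normalization to a full-swing bit line are illustrative choices, and the model ignores the overheads of a real charge-recycling circuit, which is one reason the simulated savings reported above differ from these ideal ratios.

```python
def charge_recycling_ratios(n_bitlines):
    """Return (swing_ratio, power_ratio) relative to a full-swing bit line,
    using the ideal 1/N and 1/N^2 scaling stated in the text."""
    if n_bitlines < 1:
        raise ValueError("need at least one bit line")
    swing_ratio = 1.0 / n_bitlines
    power_ratio = 1.0 / n_bitlines ** 2
    return swing_ratio, power_ratio


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        swing, power = charge_recycling_ratios(n)
        print(f"N = {n}: swing = {swing:.3f} x Vdd, "
              f"bit-line power = {power:.4f} x nominal")
```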
C. Single-Ended 10T-SRAM Cell (10T-S SRAM)
To improve on the 8T-SRAM, a 10T non-precharge SRAM with a single-ended read bit-line has been proposed [5], hereafter called the "10T-S SRAM". Two PMOS transistors are appended to the 8T SRAM cell, which results in the combination of the conventional 6T cell, an inverter and a transmission gate. The additional signal (/RWL) is the inverse of the read word-line (RWL). It controls the additional PMOS transistor (P4) at the transmission gate. While RWL and /RWL are asserted and the transmission gate is on, a stored node is connected to the RBL through the inverter. A precharge circuit is not necessary because the inverter fully charges/discharges the RBL [12]. In all the SRAMs, the worst cell with the worst threshold-voltage variation determines the delay. At a supply voltage of 1.0V, the 10T-S SRAM is faster than the 8T SRAM because it needs neither the precharge circuit nor the keeper circuit.
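The read path described above can be summarized in a small behavioral model. This is only a logic-level sketch with hypothetical names (`read_bitline`, `stored_node`, `rbl_prev`). It assumes that the inverter drives the complement of the stored node onto the RBL and that the RBL keeps its previous value while the transmission gate is off; it says nothing about timing or electrical behavior.

```python
def read_bitline(stored_node, rwl, rwl_bar, rbl_prev):
    """Logic-level model of the 10T-S read port (values are 0 or 1).

    The transmission gate is taken to conduct when RWL is high and /RWL
    is low. The RBL is then driven by the inverter, i.e. to the complement
    of the stored node, so no precharge step is modeled. Otherwise the RBL
    is assumed to hold its previous value.
    """
    gate_on = (rwl == 1) and (rwl_bar == 0)
    if gate_on:
        return 1 - stored_node  # inverter output drives the RBL
    return rbl_prev


if __name__ == "__main__":
    for q in (0, 1):
        print(f"stored node = {q}, RBL during read = {read_bitline(q, 1, 0, 0)}")
```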
D. Ultra Low-Power SRAM
A ULP SRAM with column redundancy, electrically programmable fuse, and PBIST (Programmable Built-in Self-Test) has been built and packaged in an 8-metal-layer Land Grid Array (LGA) package with flip-chip technology. The large SRAM size was intended to explore the limitations of the process technology and the variability of the SRAM cell [20].
The leakage of the 1Mb SRAM in the standby, low-leakage-power and high-speed modes is as follows. In standby mode, the SRAM macro draws 12 µA of leakage at a Vdd of 0.5V. The leakage current increases to 22 µA for low-power operation at a Vdd of 0.7V and to 90 µA for operation at a Vdd of 1.2V.
The maximum access frequency of the 1Mb SRAM is shown in Fig. 5. The SRAM can operate over a wide range of supply voltages from 1.2V down to 0.5V. It achieves a frequency of 1.1GHz at the nominal voltage of 1.2V and 250MHz at 0.7V. This performance represents the highest reported access frequency for the same class of SRAM standby power consumption and SRAM size.
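The leakage currents quoted above translate into leakage power as P = I × Vdd. The short sketch below performs that arithmetic on the three reported operating points. The list layout and names are illustrative; only the voltage and current values come from the text.

```python
# (mode, Vdd in volts, leakage current in amperes) as reported for the 1Mb SRAM
OPERATING_POINTS = [
    ("standby", 0.5, 12e-6),
    ("low power", 0.7, 22e-6),
    ("high speed", 1.2, 90e-6),
]


def leakage_power_uw(vdd, current_a):
    """Leakage power in microwatts, P = I * Vdd."""
    return vdd * current_a * 1e6


if __name__ == "__main__":
    for mode, vdd, current in OPERATING_POINTS:
        print(f"{mode:>10}: {leakage_power_uw(vdd, current):.1f} uW at {vdd} V")
```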
E. Sub-threshold 10T-SRAM Cell
A 256kb 65nm bulk CMOS test chip uses the 10T bitcell and the architecture shown in Fig.5. The memory has eight 32kb blocks with 256 rows and 128 columns each. A single 128-bit Dual I/O (DIO) bus serves all eight blocks. In the initial instantiation of the sub-threshold memory, only one read or write can occur per cycle; however, the 10T bitcell would allow a read and a write access to the same block in one cycle. Such a dual-port instantiation of the memory would require a second Dual I/O bus and additional peripheral logic. A combined global wordline and block select signal asserts a local wordline that triggers either WLRD or WLWR. For a write access for the accessed row turns off. The write drivers simply consist of inverters with transmission gates, which turn off when the memory is not writing to minimize leakage on the write bitlines (BL and BLB). The power supply to the WL drivers is routed separately to allow a boosted WL voltage. This technique improves the access speed and increases the robustness to local variations. The read bitline (RBL) is precharged prior to a read access, and its steady-state value is "sensed" using a simple inverter. Column and row redundancy is a ubiquitous technique used in commercial memories to improve yield. For this analysis of the SRAM, one redundant row and one redundant column per block are assumed to be available.
Fig. 6. Sub-threshold 10T-SRAM Cell [14]
At 27 °C, the 10T memory saves 2.5X and 3.8X in leakage power by scaling from 0.6V to 0.4V and 0.3V, respectively, and over 60X when Vdd scales from 1.2V to 0.3V. Scaling also gives the expected savings in active energy per read access. Fig. 7 shows the measured frequency of operation versus Vdd (the 1.2V speed of 200MHz is a simulation result, because the testing board did not support high-speed testing). The maximum measured operating speed at 400mV is 475kHz [14].
Fig. 7. Frequency versus VDD [14]
Voltage scaling is an effective strategy for minimizing the power consumption of SRAMs. Further, as SRAMs continue to occupy a dominating portion of the total area and power in modern ICs, the resulting total power saving is significant. Unfortunately, conventional SRAMs based on the 6T bit-cell fail to operate at voltages below approximately 700mV, both because of the reduced signal levels and because of increased variation. In sub-Vt, in particular, threshold voltage variation has an exponential effect on the drive current, resulting in increased cell instability and a severely degraded read current. To address these limitations, an 8T bit-cell is incorporated into a 65nm, 256kb SRAM, which achieves full read and write functionality deep into the sub-Vt regime at 350mV. At this voltage, the total leakage power is 2.2µW and the operating speed is 25 kHz, as shown in Fig. 8 and Fig. 9. The significantly reduced speed is expected in sub-Vt and is acceptable for low-throughput, energy-constrained applications. At 350 mV, the leakage power represents almost 85% of the total power consumption, so leakage reduction is a critical consideration. Additionally, the tradeoff between the size of a sense amplifier and its statistical offset is emerging as a primary limitation to SRAM scaling in advanced technologies. In this design, enabling sub-Vt write requires the use of circuit assists that result in a layout where sense-amplifier multiplexing between adjacent columns is impractical. Accordingly, the sense-amp scaling limitation is stressed, necessitating a different approach for managing the offset-area tradeoff. The concept of sense-amp redundancy is introduced, and it is demonstrated that, for a given area constraint, errors in the sensing network due to offsets can be reduced by over an order of magnitude.
In this design, a factor-of-five improvement is expected with the implemented scheme, which incorporates a simple startup control loop.
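Two of the quantitative points in the sub-Vt discussion above can be checked with simple arithmetic. The sketch below (i) splits the total power at 350 mV into leakage and active parts from the reported 2.2 µW leakage and its roughly 85% share, and (ii) illustrates the exponential sensitivity of sub-Vt drive current to a threshold-voltage shift using the standard relation I ∝ exp(-ΔVt / (n·kT/q)). The slope factor n = 1.5, the thermal voltage at about 300 K and the ±50 mV shifts are assumptions chosen for illustration, not values from the cited design.

```python
import math

THERMAL_VOLTAGE_300K = 0.02585  # kT/q in volts at about 300 K (assumed temperature)


def power_breakdown(leakage_uw, leakage_fraction):
    """Return (total, active) power in microwatts given leakage and its share."""
    total = leakage_uw / leakage_fraction
    return total, total - leakage_uw


def subvt_current_ratio(delta_vt, n=1.5, v_thermal=THERMAL_VOLTAGE_300K):
    """Sub-Vt drive current relative to nominal for a threshold shift delta_vt (V)."""
    return math.exp(-delta_vt / (n * v_thermal))


if __name__ == "__main__":
    total, active = power_breakdown(2.2, 0.85)
    print(f"total = {total:.2f} uW, active = {active:.2f} uW")
    for shift in (-0.05, 0.0, 0.05):
        print(f"dVt = {shift * 1e3:+.0f} mV -> "
              f"current x{subvt_current_ratio(shift):.2f}")
```

With these assumed parameters, a threshold shift of a few tens of millivolts changes the drive current by a factor of several, which is consistent with the degraded read current and cell instability noted above.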
IV. CONCLUSION
As battery-operated devices are in great demand, battery lifetime is a prime concern for their reliability, but extending it comes at the cost of speed. In high-speed circuits where speed is the major concern, such as wireless communications, these low-leakage SRAMs fail. For low-leakage, high-speed circuits, attention should be paid to both factors, speed and power. This paper tries to address the issue of speed, which remains largely uncovered in SRAM designs. In a conventional 6T-SRAM cell, as devices move to smaller feature sizes (180nm to 45nm), area improves, which directly enables high-density circuits, a good feature for multimedia systems where more memory is required. On the other hand, scaling brings several other issues related to device performance. For the 180nm technology node with a W/L of 1 and dual boosting of the wordline, speed is achieved at the cost of area and power, whereas at the 130nm feature size with the gate feedback cell, power is improved but speed and stability are compromised. The SRAM cell at the 90nm node with increased Vth of the NMOS transistors improves power (Vdd 0.5V and Vdd (min) is 44mV) and speed simultaneously. The Self-Write-Back Sense Amplifier and Cascade Bit-line Scheme at 65nm technology (cell size 0.495 µm²) is best suited for small-size SRAM and low-voltage operation. The access speed depends on the number of sense amplifiers. The high-k metal gate and FBB for pMOS at the 45nm feature size allow a high-density memory circuit along with improvements in power and speed performance, with extra care needed for data stability.
