Designing power-efficient and robust SRAM is attractive for low-voltage, high-performance systems in mobile and battery-powered electronics. To reduce the power consumed by bit-line activity, a new bit-line charge recycling circuit is proposed for 8T SRAMs. By eliminating the analog blocks required by existing circuits in the literature, the proposed charge recycling scheme reduces design complexity. In addition, two types of SRAM cells are employed to improve write robustness, and a hierarchical bit-line structure is applied to reduce read power consumption. Post-layout simulations show that the proposed design improves WSNM and SNM by 3.08 and 2.62 times, respectively, compared with a conventional 6T SRAM design in the same technology. Compared with the 6T SRAM design, the proposed design also reduces write and read power consumption by 64.2% and 27.5%, respectively. Moreover, at the same supply voltage (e.g., 1.2V), post-layout simulation shows that the proposed design can run at a 5 times higher clock rate than existing designs in the literature. At the same clock frequency requirement (e.g., 100MHz), a lower supply voltage (e.g., 0.7V) can sustain robust operation of the proposed design.
Introduction
CMOS technology scaling driven by Moore's law has rapidly increased VLSI design capability and complexity. Today, SRAM is widely embedded in low-power and high-performance systems, for example as CPU and GPU caches. SRAM stands for Static Random Access Memory, which retains stored data as long as the power supply is present. A typical SRAM cell uses a six-transistor structure and is therefore referred to as a 6T SRAM cell. The 6T cell occupies less chip area and hence achieves higher integration density than other types of SRAM cells. However, as the supply voltage decreases, it becomes difficult for the 6T cell to maintain a sufficient static noise margin (SNM) to meet operational stability requirements [1] [2]. In contrast, the 8T SRAM cell adds two transistors (i.e., N1 and N2 in Figure 1 (a)) in the read path. Because the write and read signal paths are completely decoupled in the 8T cell, its SNM is substantially improved. As a consequence, the 8T SRAM cell is attractive and feasible for low-power systems with low supply voltages.
Among the various components of SRAM power consumption, cycle-by-cycle bit-line charging and discharging is the primary contributor, since bit lines are long metal routes with significant parasitic capacitance. As depicted in Figure 1 (b), before a write operation occurs in an SRAM cell, both bit lines are pre-charged to VDD. Then, during the write operation, one bit line (i.e., WBL0_N) performs a full-swing discharge from VDD to GND, which is the main source of SRAM power consumption. To decrease power consumption, it is highly desirable to recycle and reuse this charge (i.e., C×VDD, where C is the bit-line capacitance). The concept of charge recycling was first proposed by B. S. Kong in 1996 [3]. The principle of charge recycling is similar to energy harvesting [4] [5], which has been widely used in low-power electronics. Later, researchers applied charge recycling techniques to SRAM design [6] [7]. The prior methods in [6] [7] demonstrated the capability of reducing bit-line power consumption in SRAM to some extent. However, these solutions have several inherent drawbacks. First, additional reference voltage sources are needed to pre-charge bit lines to different voltages before read and write operations [6] [7]. These reference voltage sources increase system design complexity and power consumption. Second, write performance is highly dependent on the bit-line voltage difference: the more charge recycling bit-line pairs an SRAM employs (e.g., N=4 or 8), the smaller the voltage difference. Third, the same SRAM cell is used for all voltage swings, although write performance differs among them (e.g., a write with a bit-line voltage difference between GND and ¼ VDD is more robust than one between VDD and ¾ VDD, as discussed in Section 3.2). These features can cause instability, restrict the application to low-voltage SRAMs, and limit charge recycling efficiency.
Therefore, it is necessary to investigate new charge recycling schemes that overcome the drawbacks of prior solutions. This is the focus of this paper.
The contributions of this paper are summarized as follows: (a) we propose a novel 8T charge recycling SRAM circuit (8T-CR SRAM). Compared with existing designs in the literature, the proposed scheme eliminates additional reference voltage sources, leads to lower design complexity and implementation cost, and is applicable to low-supply-voltage systems; (b) we employ two types of SRAM cells in one design to balance write performance and enhance read/write robustness; (c) we implement the proposed design in 65nm CMOS technology and present post-layout simulation results to quantify its benefits. Post-layout simulations demonstrate that the proposed design substantially improves write/read robustness compared with a conventional 6T SRAM design in the same technology. Moreover, at the same supply voltage (e.g., 1.2V), post-layout simulation shows that the proposed design can run at a 5 times higher clock rate than conventional designs. At the same clock frequency (e.g., 100MHz), the proposed design reduces the required supply voltage from 1.2V to 0.7V.
The remainder of this paper is organized as follows. Section 2 presents a review of the related work on charge recycling SRAM design. In Section 3, we present the proposed design scheme including circuit structure, operational timing chart, and robustness analysis. In Section 4, we present the validation and benefits of our proposed SRAM scheme for energy-efficient operation, while Section 5 concludes the paper.
Related Work
Figure 2 (a) shows the charge recycling SRAM design from prior research [6]. This design consists of a pair of basic 6T SRAM cells plus additional circuits (i.e., voltage sources, a resistor divider, MOS switches, two analog amplifiers, an additional power line to the cells, etc.), which degrade SRAM integration density. As illustrated in Figure 2 (b), the ¼ VDD and ¾ VDD reference voltages are generated by the resistor divider. In the pre-charge phase of a write operation, switch EQ is turned on and EV is turned off. With either S or P turned on, BL0 and BL0_N are pre-charged to ¼ VDD, and BL1 and BL1_N are pre-charged to ¾ VDD. In the evaluation phase, switch EQ is turned off and EV is turned on. The input data drive switches P and S, one of which is turned on to build the proper bit-line voltage swing (e.g., P1 and P2 are turned on, while S1 and S2 are turned off). BL1_N is then charged to VDD, and BL0 is discharged to GND. BL1 and BL0_N share their initial charge and finally settle at about ½ VDD. Therefore, the bit-line voltage difference is ½ VDD, instead of the full swing in a non-charge-recycling SRAM.
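The settling value of about ½ VDD follows directly from charge conservation. The short Python sketch below reproduces it under stated assumptions: BL1 and BL0_N have equal capacitance, the switch is ideal, and leakage is ignored. The function name `shared_voltage` is hypothetical and only illustrates the arithmetic.

```python
def shared_voltage(c1: float, v1: float, c2: float, v2: float) -> float:
    """Final voltage of two capacitors connected by an ideal switch (charge conservation)."""
    return (c1 * v1 + c2 * v2) / (c1 + c2)


VDD = 1.2   # volts; the 1.2 V standard supply used in this paper
C_BL = 1.0  # normalized bit-line capacitance; only the ratio of the two capacitances matters

# Prior scheme [6]: BL1 pre-charged to 3/4 VDD, BL0_N pre-charged to 1/4 VDD.
v_final = shared_voltage(C_BL, 0.75 * VDD, C_BL, 0.25 * VDD)
print(f"BL1 and BL0_N settle at {v_final:.2f} V")  # 0.60 V, i.e. VDD/2
```

If the two bit lines had unequal capacitances, the same formula would weight the result toward the line with the larger capacitance.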
Besides the aforementioned design complexity and cost overhead, this design suffers from another drawback. Figure 2 (a) is a simplified circuit in which the number of charge recycling bit-line pairs is 2 (i.e., N=2), so the bit-line voltage swing is ½ VDD. In practice, to maximize the charge recycling benefit, the prior designs [6] [7] choose N=4 or 8. The bit-line voltage swing during a write operation is then only ¼ VDD or ⅛ VDD, which is insufficient to ensure robust writes in low-voltage systems. Hence, it is challenging to use these prior circuit schemes in low-voltage SRAM systems, where VDD is usually below 1V. In addition, write performance is unbalanced across different voltage swings. From the above discussion, it is evident that new charge recycling approaches are needed to enable low-voltage, robust, and high-performance SRAM solutions.
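To make the shrinking swing concrete, the sketch below tabulates the write swing for N = 2, 4 and 8. It assumes a first-order swing of VDD/N, which matches the ½, ¼ and ⅛ VDD figures quoted above; the 0.8 V supply and the function name `write_swing` are illustrative assumptions, not values from [6] [7].

```python
def write_swing(vdd: float, n_pairs: int) -> float:
    """First-order bit-line write swing VDD / N for N charge recycling bit-line pairs."""
    if n_pairs < 1:
        raise ValueError("n_pairs must be at least 1")
    return vdd / n_pairs


for vdd in (1.2, 0.8):      # standard supply vs. an illustrative sub-1 V supply
    for n in (2, 4, 8):     # N values discussed for [6] [7]
        swing_mv = 1000 * write_swing(vdd, n)
        print(f"VDD = {vdd:.1f} V, N = {n}: write swing = {swing_mv:.0f} mV")
```

Under this model, a 0.8 V supply with N = 8 leaves only 100 mV of bit-line swing for the write.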
Proposed New Charge Recycling Scheme
In this section, we discuss the proposed 8T-CR SRAM system with a more efficient bit-line charge recycling scheme. First, the proposed circuit structure is described. Second, we introduce two types of SRAM cells to enhance write stability. Third, we discuss how the hierarchical bit-line structure is implemented in this design. Finally, we compare the SNM and WSNM with those of other SRAM cells.
Proposed Charge Recycling Scheme
The proposed charge recycling scheme is shown in Figure 3. The idea of the proposed bit-line charge recycling method is to adaptively share charge between two adjacent bit lines with different pre-charge states. A decoder selects one of the four switches (S0, S1, S2, S3) based on the input data. The truth table is shown in Table 1.
In the write operation of a conventional SRAM, one of the two bit lines discharges from VDD to GND to form a full-swing voltage difference, which is then transferred into the SRAM cell. Since an SRAM cell consists of two cross-coupled inverters that amplify and restore a differential signal, the proposed SRAM uses a half-swing voltage difference for writing. Bit lines BL0 and BL0_N are pre-charged to VDD, while bit lines BL1 and BL1_N are pre-charged to GND. Before the word line is enabled, one of the two bit lines BL1 and BL1_N is charged up from GND, while one of the two bit lines BL0 and BL0_N is discharged from VDD. An efficient way to obtain a half-swing voltage difference for both SRAM cells is to connect the appropriate pair of bit lines directly by turning on one of the four switches. Compared with the prior designs [6] [7], the proposed scheme requires only 4 MOS switches and no reference voltage sources. Figure 4 shows the simulated waveforms of the proposed SRAM circuit in a write operation with f=500MHz and VDD=1.2V (standard power supply). During the evaluation phase, NMOS switch S3 is turned on, and bit lines BL1_N and BL0_N start to share the charge stored previously. When the charge sharing is complete, switch S3 is turned off and the write word line (WWL) rises from low to high, writing the bit-line signals into the SRAM cells.
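The selection logic can be illustrated with a small behavioral sketch. It assumes that writing "1" means driving BL above BL_N (the same convention as the WSNM test, where writing "1" pulls BL_N low), equal bit-line capacitances, and ideal switches. It returns which pair of bit lines to connect rather than a switch name, because the exact assignment of S0 to S3 in Table 1 is not reproduced here; the function names are hypothetical. Under this convention, the BL0_N/BL1_N pair that Figure 4 shows being shared through S3 corresponds to d0 = 1, d1 = 0.

```python
from typing import Dict, Tuple


def bitlines_to_connect(d0: int, d1: int) -> Tuple[str, str]:
    """Pick the bit-line pair to short for data bits d0 (column 0) and d1 (column 1).

    Assumed convention: writing "1" drives BL above BL_N.
    """
    # Column 0 is pre-charged to VDD, so the line that must end up lower falls to VDD/2.
    falling = "BL0_N" if d0 else "BL0"
    # Column 1 is pre-charged to GND, so the line that must end up higher rises to VDD/2.
    rising = "BL1" if d1 else "BL1_N"
    return falling, rising


def write_bitline_voltages(d0: int, d1: int, vdd: float) -> Dict[str, float]:
    """Bit-line voltages after charge sharing, assuming equal bit-line capacitances."""
    v = {"BL0": vdd, "BL0_N": vdd, "BL1": 0.0, "BL1_N": 0.0}
    a, b = bitlines_to_connect(d0, d1)
    v[a] = v[b] = (v[a] + v[b]) / 2  # equal capacitances -> simple average
    return v


for d0 in (0, 1):
    for d1 in (0, 1):
        print(d0, d1, bitlines_to_connect(d0, d1), write_bitline_voltages(d0, d1, 1.2))
```

For every data combination, column 0 sees VDD against ½ VDD and column 1 sees ½ VDD against GND, which is why the two columns benefit from different cell types.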
Here, we analyze the bit-line dynamic power consumption of each write operation in conventional 6T, 8T, and the proposed 8T-CR SRAM cells. For simplicity, we study two columns of SRAM cells (i.e., 4 bit lines) as an example. During a write operation in a conventional cell, one bit line of each SRAM cell is discharged to GND, so after the write operation two bit lines must be charged from GND back to VDD. Therefore, the dynamic power consumption of conventional 6T or 8T SRAM cells is

$$P_{\mathrm{6T,8T}} = P_{\mathrm{col0}} + P_{\mathrm{col1}} = \frac{1}{2}CV_{DD}^2 + \frac{1}{2}CV_{DD}^2 = CV_{DD}^2$$

For the proposed 8T-CR SRAM cells, the voltages of the 4 bit lines after a write operation are VDD, ½ VDD, ½ VDD, and GND. Since only one bit line needs to be charged, from ½ VDD to VDD, the dynamic power consumption of the 8T-CR SRAM cells is

$$P_{\mathrm{8T\text{-}CR}} = \frac{1}{2}C\left(\frac{1}{2}V_{DD}\right)^2 = \frac{1}{8}CV_{DD}^2$$

From the above estimation, the proposed 8T-CR SRAM design theoretically reduces the dynamic power consumption of a write operation by 87.5% (i.e., 1 − 1/8).
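The 87.5% figure can be checked numerically with the same per-bit-line accounting, ½·C·ΔV², used in the two equations above. The sketch uses normalized C and VDD, since the ratio does not depend on either; `bitline_energy` is a hypothetical helper name.

```python
def bitline_energy(c: float, delta_v: float) -> float:
    """Per-bit-line dynamic term 1/2 * C * dV^2, as used in the estimate above."""
    return 0.5 * c * delta_v ** 2


C, VDD = 1.0, 1.0  # normalized units; the ratio is independent of both

e_conventional = 2 * bitline_energy(C, VDD)       # two bit lines restored from GND to VDD
e_charge_recycling = bitline_energy(C, VDD / 2)   # one bit line restored from VDD/2 to VDD

reduction = 1 - e_charge_recycling / e_conventional
print(f"theoretical write-power reduction: {reduction:.1%}")  # 87.5%
```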
Circuit implementations of two columns of SRAM cells were also simulated in SPICE to validate the above power estimation. Figure 5 shows the time-averaged power consumption of each bit line for the 6T, 8T, and proposed 8T-CR designs. Due to leakage power consumption, the simulated write power consumption of the proposed 8T-CR design is 75% lower than that of the conventional 6T or 8T design, rather than the theoretical 87.5%. This result is close to the theoretical estimate, which confirms that the proposed charge recycling scheme can significantly reduce dynamic power consumption.
Robust SRAM cells for different voltage swings
The prior designs [6] [7] use a single type of SRAM cell for all bit lines, regardless of their differential signal levels. It is well known that an NMOS transistor passing a "1" suffers a threshold voltage loss. Consequently, write performance degrades when the bit-line differential signals are VDD and ½ VDD.
To balance write performance across different voltage swings, two types of SRAM cells are proposed in this work. As shown in Figure 3, the conventional 8T SRAM cell is marked as N-type, and the other type of 8T SRAM cell is marked as P-type. The only difference is that PMOS transistors P3 and P6 in the P-type cell replace NMOS transistors N3 and N6 in the N-type cell. N-type cells are used for bit-line differential signals GND and ½ VDD; P-type cells are used for bit-line differential signals VDD and ½ VDD. In this way, the strong "1" dominates the write operation in P-type cells, and the strong "0" dominates the write operation in N-type cells. The two cross-coupled inverters respond to these strong signals, and their positive feedback reinforces the write operation. Because two cell types are used, an inverter (INV) is added to each row: the conventional word line signal WWL controls the N-type cells, while the complementary word line signal WWL_N controls the P-type cells. The area cost of one inverter per row is negligible. In addition, this inverter splits a long word line into two segments, which improves the word line driving capability or allows a smaller word line driver.
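A first-order pass-transistor model shows why the cell type is matched to the signal levels. The sketch below assumes an NMOS access device clips a high level at VDD − Vtn, a PMOS access device clips a low level at |Vtp|, and ignores body effect and the feedback of the cross-coupled inverters. The threshold values of 0.4 V are hypothetical, not 65nm process parameters, and the function names are illustrative only.

```python
VDD = 1.2
VTN = 0.4  # hypothetical NMOS threshold voltage, not a 65nm process value
VTP = 0.4  # hypothetical |PMOS threshold voltage|


def nmos_pass(v_in: float, vdd: float = VDD, vtn: float = VTN) -> float:
    """First-order level passed by an NMOS access transistor: a high level is clipped at VDD - Vtn."""
    return min(v_in, vdd - vtn)


def pmos_pass(v_in: float, vtp: float = VTP) -> float:
    """First-order level passed by a PMOS access transistor: a low level is clipped at |Vtp|."""
    return max(v_in, vtp)


def passed_differential(v_high: float, v_low: float, pass_fn) -> float:
    """Differential left after both bit-line levels pass through the access devices."""
    return pass_fn(v_high) - pass_fn(v_low)


cases = {
    "NMOS access, VDD vs VDD/2 (single cell type)": (VDD, VDD / 2, nmos_pass),
    "PMOS access, VDD vs VDD/2 (P-type cell)": (VDD, VDD / 2, pmos_pass),
    "NMOS access, VDD/2 vs GND (N-type cell)": (VDD / 2, 0.0, nmos_pass),
}
for name, (hi, lo, fn) in cases.items():
    print(f"{name}: {passed_differential(hi, lo, fn):.2f} V")
```

In this toy model, an NMOS-accessed cell keeps only about 0.2 V of the 0.6 V differential between VDD and ½ VDD, while the P-type and N-type assignments each preserve the full ½ VDD.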
Hierarchical Bit-line Structure
In a conventional SRAM read operation, a single SRAM cell drives the whole bit line with its large parasitic capacitance, so a sense amplifier (SA) is used to shorten the read time. However, the power consumption of the SA itself is non-negligible. To speed up bit-line discharging, the hierarchical bit-line concept was proposed in the literature [9]. The basic idea is that a single SRAM cell drives a short sub bit-line, which in turn drives the global bit-line. This reduces both the total read time and the overall power consumption.
In this work, we adopt the hierarchical bit-line concept and implement our hierarchical SRAM system as follows. As illustrated in Figure 3, every 16 SRAM cells share one sub read bit-line (Sub-RBL), and each Sub-RBL drives the read bit-line (RBL) through an inverter and an NMOS transistor. Only one read word line (RWL) is enabled during a read operation. Unselected SRAM cells cannot discharge the pre-charged Sub-RBL, so their corresponding Sub-RBLs remain high. If the selected SRAM cell stores "1" (i.e., node A is "1" in Figure 3), it discharges its Sub-RBL, and RBL is then also discharged to low, so the final output is high. On the other hand, if the selected SRAM cell stores "0" (i.e., node A is "0" in Figure 3), its Sub-RBL and RBL remain high, so the output is low. Cells are marked as N-type or P-type according to their pass transistor type (as discussed in Section 3.2), but their read operations are identical.
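The read path described above can be captured in a logic-level sketch. It keeps the 16-cells-per-Sub-RBL grouping from Figure 3 and assumes the output is the complement of RBL, as implied by "RBL is discharged, so the output is high"; the 64-cell column, the function name and the absence of any timing or analog behavior are simplifying assumptions.

```python
from typing import List

CELLS_PER_SUB_RBL = 16  # every 16 cells share one Sub-RBL, as in Figure 3


def hierarchical_read(cells: List[int], selected: int) -> int:
    """Logic-level model of the hierarchical read path; returns the output bit."""
    rbl = 1  # global RBL is pre-charged high
    for start in range(0, len(cells), CELLS_PER_SUB_RBL):
        group = range(start, min(start + CELLS_PER_SUB_RBL, len(cells)))
        # Each Sub-RBL is pre-charged high; only a selected cell storing "1" discharges it.
        sub_rbl = 0 if (selected in group and cells[selected] == 1) else 1
        # The inverter drives the NMOS gate with NOT Sub-RBL; when it turns on, RBL is pulled low.
        if sub_rbl == 0:
            rbl = 0
    return 1 - rbl  # output is high when RBL has been discharged


column = [1, 0, 1, 1] * 16  # hypothetical 64-cell column
print(hierarchical_read(column, 2), hierarchical_read(column, 1))
```

Because unselected Sub-RBLs stay high, only the sub-block containing the selected cell ever switches, which is where the read-power saving comes from.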
SNM Analysis
Static noise margin (SNM) is a metric for evaluating the stability of SRAM cells [1] [2]. In this work, we compare the SNM of different SRAM cells (i.e., 6T, 8T, and 8T-CR) in 65nm ultra-low-power CMOS technology with a 1.2V standard power supply. Figure 6 shows that the SNM of the 8T and 8T-CR SRAM cells is about 2.6 times that of the 6T SRAM cell. Figure 6 also shows that the proposed 8T-CR SRAM cell achieves the same SNM as the 8T cell, because both cells decouple the read and write data paths. SRAM designers can therefore optimize the read or write performance of 8T SRAM cells separately by tuning the W/L ratios of transistors on the read or write path. In contrast, the W/L ratio of each transistor in a 6T SRAM cell must be designed precisely, because the sizing requirements for read and write performance conflict.
WSNM Analysis
Five common approaches for measuring the write static noise margin (WSNM) of an SRAM cell are introduced in [8]. In this work, we use the bit-line margin as the WSNM metric. As shown in Figure 7 (a), to evaluate WSNM, an SRAM cell is configured for writing "1". A SPICE simulation sweeps the BL_N voltage from high to low, and the write margin is defined as the BL_N value at which Q and QB flip [8]. The higher this trip-point value, the easier it is to write the cell, which implies a more robust SRAM cell.
The simulation results are shown in Figure 7 (b). The WSNM of the proposed 8T-CR SRAM cell is approximately 1.2V, which is much higher than that of the conventional 6T SRAM cell (389mV) and the 8T SRAM cell (515mV). Figure 7 (b) therefore indicates that the write robustness of the proposed 8T-CR SRAM cell is significantly improved.
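The bit-line margin extraction can be sketched as a post-processing step on a high-to-low DC sweep. The sketch assumes Q is judged to have flipped once it crosses VDD/2, and its resolution is limited by the sweep step; the sweep values are toy numbers for illustration only, not simulation results. The last line simply computes the ratios of the WSNM values reported in Figure 7 (b); the 6T ratio rounds to the 3.08 times enhancement quoted in the abstract.

```python
from typing import Optional, Sequence


def write_margin(bl_n_sweep: Sequence[float], q: Sequence[float], vdd: float) -> Optional[float]:
    """Bit-line write margin: the BL_N value at which Q first flips high.

    bl_n_sweep must be ordered from high to low, matching the high-to-low DC sweep.
    Returns None if the cell never flips (write failure).
    """
    for v_bl_n, v_q in zip(bl_n_sweep, q):
        if v_q > vdd / 2:  # Q has crossed mid-rail, i.e. Q and QB have flipped
            return v_bl_n
    return None


# Toy sweep data for illustration only (not simulation results): Q flips once BL_N <= 0.4 V.
sweep = [round(1.2 - 0.1 * i, 2) for i in range(13)]
q_toy = [0.0 if v > 0.4 else 1.2 for v in sweep]
print(write_margin(sweep, q_toy, vdd=1.2))  # 0.4

# Ratios of the WSNM values reported in Figure 7 (b).
print(f"8T-CR vs 6T: {1.2 / 0.389:.2f}x, 8T-CR vs 8T: {1.2 / 0.515:.2f}x")
```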
Simulation Results
A 64Kb SRAM system based on the proposed 8T charge recycling technique has been implemented in 65nm CMOS technology. Figure 8 shows the layout of this design; the total area is 0.16 mm² (0.42 mm × 0.39 mm). The area overhead of the charge recycling control switches is negligible. Because both P-type and N-type SRAM arrays are used, the P-type array is sized up slightly to achieve the same driving strength as the N-type array. As a result, the overall area of the proposed design is 14.8% larger than that of a conventional 8T SRAM design.
In this section, we provide simulation results that validate the benefits of the proposed 8T-CR SRAM design. Figure 9 plots the required minimum supply voltage against the target clock frequency for the 6T, 8T, and 8T-CR designs, with a "Pass" region and a "Fail" region. For a given supply voltage, the 8T-CR cell can read or write at a higher clock frequency. For example, at a supply voltage of 0.8V, the 6T and 8T SRAM cells operate up to a maximum clock frequency of 300MHz, whereas the proposed 8T-CR cell supports read/write operations up to 500MHz. Process corner and temperature (-25℃ to 70℃) variations are considered in Figure 9.
For a fair cell-level comparison, the conventional 6T, 8T, prior charge recycling designs [6] [7], and the proposed 8T-CR SRAM cells were implemented in 65nm ultra-low-power CMOS technology and simulated with a 1.2V supply. Table 2 summarizes the noise margins and power consumption of these five design schemes. In terms of robustness, this work achieves the highest WSNM, SNM, and hold SNM, indicating that the proposed 8T-CR SRAM is the most robust of the five.
The proposed 8T-CR SRAM design also consumes much less power than conventional 6T or 8T designs. When the number of charge recycling bit-line pairs is 2 (i.e., N=2), the write power of the proposed 8T-CR SRAM is close to that of the design in [7] and lower than that of the design in [6], owing to the floating supply lines used in [6]. With N=2, this work also achieves the lowest read power consumption. Note, however, that [6] and [7] are implemented with N=6 and N=8, respectively, so their voltage swings, and hence their power consumption, are much lower than those of the proposed 8T-CR design. Table 3 summarizes the system-level comparison of different SRAM designs in several respects. The prior design in [12] used a similar hierarchical bit-line structure but did not employ any charge recycling; therefore, its power consumption is the highest. Compared with the charge recycling SRAM designs [6] [7], the proposed solution has lower design complexity, because it does not use voltage sources, analog amplifiers, additional power lines for the cells, a WSA (write sense amplifier), or an RSA (read sense amplifier). At the same supply voltage, the proposed solution achieves a maximum operating frequency of 900MHz, about 6.2 times that of its counterpart [7]. If the proposed design operates at a clock frequency of 145MHz, its required supply voltage can be as low as 0.7V (compared with 1.2V in [7]). Running the proposed SRAM system at 0.7V and 145MHz reduces total power consumption by about 90% compared with operation at 1.2V and 900MHz. As indicated in Table 3, the proposed SRAM system has the highest WSNM, which helps avoid write failures in low-supply-voltage applications. The proposed SRAM system also benefits long-term reliability.
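As a rough plausibility check on the voltage and frequency scaling, the sketch below applies the textbook dynamic-power relation P ∝ C·V²·f with the same switched capacitance at both operating points. This is a simplifying assumption: it ignores leakage and any change in activity, so it estimates only the dynamic component. The function name is hypothetical.

```python
def dynamic_power_ratio(v_new: float, f_new: float, v_ref: float, f_ref: float) -> float:
    """Ratio of dynamic power C*V^2*f at a new operating point to a reference point (same C)."""
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


ratio = dynamic_power_ratio(v_new=0.7, f_new=145e6, v_ref=1.2, f_ref=900e6)
print(f"dynamic power ratio = {ratio:.3f}, dynamic-only reduction = {1 - ratio:.1%}")
```

The dynamic-only estimate (about 94.5%) is somewhat larger than the roughly 90% total reduction reported above, which is consistent with part of the total power, such as leakage, not scaling with frequency.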
According to VLSI reliability studies [10] [11], a circuit operating at a lower supply voltage suffers less voltage stress, gate aging, and heat dissipation. Therefore, the proposed design can improve the long-term reliability of SRAM systems, because it operates robustly at a lower supply voltage.
Conclusion
SRAM is widely used in modern electronic systems, and low-power, high-performance SRAM design is increasingly attractive for achieving long operational lifetimes in a variety of electronic systems. In this work, we propose a new bit-line charge recycling scheme for 8T SRAM and describe its design concept and operating mechanism. A design example implemented in 65nm technology, together with simulation results, demonstrates the capabilities and benefits of the proposed idea. The proposed design achieves more robust write/read operation, a higher maximum operating frequency, a lower minimum supply voltage, and lower design complexity.
Comparison table column headings (remainder of table not recoverable): Hierarchical bit line [12]; Write charge recycling [6]; Write and read charge recycling [7]; This work.
List of figures
Figure 2. Existing SRAM charge recycling method [6].
Figure 6. SNM comparison chart.
List of tables

