# Southern Illinois University Carbondale **OpenSIUC**

Articles

Department of Electrical and Computer Engineering

Fall 10-1-2016

# Power Efficient SRAM Design with Integrated Bit Line Charge Pump

Xu Wang

Shanghai Jiao Tong University, wangxu0737@163.com

Yuanzhi Zhang

Southern Illinois University Carbondale, yzzhang@siu.edu

Chao Lu

Southern Illinois University Carbondale, chaolu@siu.edu

Zhigang Mao

Shanghai Jiao Tong University, maozhigang@ic.sjtu.edu.cn

Follow this and additional works at: http://opensiuc.lib.siu.edu/ece articles

#### Recommended Citation

Wang, Xu, Zhang, Yuanzhi, Lu, Chao and Mao, Zhigang. "Power Efficient SRAM Design with Integrated Bit Line Charge Pump." *International Journal of Electronics and Communications* 70, No. 10 (Fall 2016): 1395-1402. doi:10.1016/j.aeue.2016.08.002.

This Article is brought to you for free and open access by the Department of Electrical and Computer Engineering at OpenSIUC. It has been accepted for inclusion in Articles by an authorized administrator of OpenSIUC. For more information, please contact opensiuc@lib.siu.edu.

# Power Efficient SRAM Design with Integrated Bit Line Charge Pump

Xu Wang<sup>1,2</sup>, Yuanzhi Zhang<sup>3</sup>, Chao Lu<sup>3</sup>, Zhigang Mao<sup>1</sup>

<sup>1</sup> Department of Micro-Nano Electronics, Shanghai Jiao Tong University, Shanghai, China

<sup>2</sup> Lonely Mountain Electronic Technology (Shanghai) Co., Ltd., Shanghai, China

<sup>3</sup> Electrical and Computer Engineering Department, Southern Illinois University Carbondale, IL, USA

Email: wangxu0737@163.com, yzzhang@siu.edu, chaolu@siu.edu, maozhigang@ic.sjtu.edu.cn

# **Abstract**

Bit line toggling during SRAM write operations accounts for the largest portion of power dissipation. To reduce this power loss and achieve power-efficient memory, we propose a new SRAM design that integrates charge pump circuits to harvest and reuse bit line charge. In this work, a power-efficient charge recycling SRAM is designed and implemented in 180nm CMOS technology. Post-layout simulation demonstrates an 11% power saving with a 3.8% area overhead when the bit width of the SRAM is 8. Alternatively, a 22% power reduction is obtained if the bit width is extended to 64. Compared with existing charge recycling SRAM schemes, the proposed SRAM is robust to process variation, demonstrates good read/write stability, and offers a better trade-off between design complexity and power reduction.

**Keywords:** SRAM; low power; charge pump; bit line.

#### 1. Introduction

In the past few decades, CMOS technology has been scaling down following Moore's law. Transistor channel length has been decreasing with the use of advanced manufacturing processes. With the tremendous boost in integrated circuit performance, interconnection metals have become more and more complex. Signal switching in these interconnection lines results in significant power consumption as well as heat generation. These drawbacks become more severe in memory systems, where interconnection metals dominate the power dissipation of the entire block. For example, the charging and discharging activity of bit lines is the primary cause of power consumption in SRAM, since bit lines are long routing metals connected to a large column of memory cells. Therefore, in order to lower the power consumption of SRAM, suppressing bit line swing is highly important [1].

The first charge pump was proposed by Dickson [2]. Later, a variety of optimized charge pumps were presented in the literature [3-7]. The principle of a charge pump is to use two clock signals to modify the connection states of capacitors. In most cases, a charge pump provides a stable voltage supply other than the power supply, so charge pumps are widely used as DC-DC converting circuits in power management systems [8].

The contribution of this work is that we integrate a charge pump circuit with SRAM and realize an efficient charge recycling SRAM system. The whole memory system was designed and implemented in a 180nm CMOS process. The post-layout simulation shows an 11% power reduction and only a 3.8% area overhead due to the integration of the bit line charge pump when the SRAM has a bit width of 8. If the bit width of the SRAM increases to 64, the resultant power saving is about 22%. Furthermore, this SRAM layout was embedded into an 8051 MCU layout to verify its operation compatibility. Simulation results demonstrate that the 8051 MCU system works correctly with the proposed charge pump SRAM.

The remainder of this paper is organized as follows. Section 2 presents a review of the related works on low power SRAM system design. In Section 3, we present the proposed charge pump design scheme. In Section 4, we present the simulation results including operational timing chart, power consumption analysis, efficiency calculation and PVT simulation. In Section 5, we provide the validation and benefits of our proposed SRAM scheme for energy-efficient operation, while Section 6 concludes the paper.

#### 2. Previous Works

As shown in Fig. 1, an adiabatic charge pump was used in SRAM design in [9]. In that work, a SRAM array is divided into two parts: a Producer slice and a Consumer slice. Virtual ground units are utilized to collect dirty charge at the ground node of the SRAM Producer slice. The operation principle is described as follows. Two virtual ground units alternately collect charge from the Producer and deliver it to the Consumer. Each virtual ground unit is implemented by an adiabatic charge pump circuit, as illustrated in Fig. 1. When the virtual ground unit collects charge, the switch Producer and the switch Charge are turned on, while the switch Consumer, the switch In Use, and the switch Reset are turned off. In this configuration, the charge from the SRAM Producer slice flows into the capacitor in the virtual ground unit. This is a charge collection and storage process. Later, when the virtual ground unit is configured to supply charge to the SRAM Consumer slice, the switch Consumer and the switch In Use are turned on, while the switches Producer, Charge, and Reset are turned off. In this way, a DC voltage source is placed in series with the capacitor, so the voltage potential at the top plate of the capacitor is boosted. The previously stored charge in the capacitor flows into the SRAM Consumer slice. This is a charge utilization and recycling process.



**Figure 1.** The existing adiabatic charge pump SRAM method [9].

The aforementioned scheme has the potential to reduce the power consumption of a SRAM system by up to 25% [9]. However, it also has several disadvantages. First, the time-multiplexing operation of two virtual ground units is complicated. When the voltage level of one virtual ground unit is low, a separate DC voltage supply is required to enable charge recycling. The need for such a separate DC voltage supply is an overhead of system implementation. In addition, a large on-chip capacitor is usually required in the virtual ground unit to collect charge from the SRAM Producer, which increases the design area and implementation cost. Second, the capacitor discharge process has a negative impact on the stability of the power supply. Because the capacitor discharges continuously, the voltage at its top plate keeps decreasing with time. This destabilizes the power supply to the Consumer. In particular, the varying body bias voltage of transistors results in more substrate noise and increases the probability of SRAM operation failure.

To deal with these design disadvantages in [9], in this work, the authors propose to use a charge pump circuit to recycle the bit line charge, which is the dominant source of power dissipation.

# 3. The Proposed SRAM System with Bit Line Charge Pump

#### 3.1 The conceptual circuit of our proposed SRAM

Our proposed SRAM system is depicted in Fig. 2. The charge pump is connected to the bit lines of the SRAM. When the charge pump is in active mode, the states of the switches S1 and S2 depend on the input data. The charge stored on the enabled bit line is shared with the capacitors (C1 and C2). Through this charge sharing, the bit line is discharged and its voltage level is lowered. The non-enabled bit line keeps its pre-charged high voltage level. Later, when the memory write operation is complete, the two capacitors in the charge pump are connected in series, thus creating a higher voltage level to move charge back to the bit line which shared its charge. With proper timing control of the switches (S1-S5), efficient bit line charge recycling is carried out.



Figure 2. The schematic of proposed charge pump SRAM

## 3.2 The Operating mechanism

Next, the operating mechanism of our proposed SRAM system is explained in detail. We take the write operation as an example. Assume data "0" is to be written into the SRAM cell, so the data decoder turns on switch S1 and turns off switch S2.

Step 1: As illustrated in Fig. 3(a), this is the charge collection process. The switches (S3 and S5) are turned on and S4 is turned off. Hence, the capacitors (C1 and C2) are in parallel. The previously stored charge on the bit line (BL) is split and flows into the two capacitors. At the end of this step, the voltage level of the bit line (BL) is about 1/3 VDD (this value depends on the BL parasitic capacitance and the values of C1 and C2), while the voltage level of BL\_N remains at VDD.

Step 2: As shown in Fig. 3(b), this is the SRAM write process. The switch S1 is turned off, so the charge pump is disconnected from the memory cell. Thus, the charge stored in both capacitors in the last step is preserved. The voltage level of the word line (WL) is high. Because the voltage difference between the two bit lines (BL and BL\_N) is large enough (2/3 VDD), data "0" can be successfully written into the memory cell.

Step 3: As shown in Fig. 3(c), this is the charge recycling process. The switches (S1 and S4) are turned on and the switches (S3 and S5) are turned off. Both capacitors in the charge pump are connected in series, which produces a higher voltage level that forces the charge in the capacitors to move back to the bit line (BL).
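The three steps above can be approximated with an idealized, lossless charge-conservation model. The sketch below is only an illustration, not the authors' simulation. It assumes the pump capacitors start fully discharged, that C1 = C2 = C_BL (chosen so that Step 1 ends near 1/3 VDD), and that the bit line holds its Step 1 voltage during the write. The bit line capacitance is the 50.6 fF value used in Section 4.2, and the function names are hypothetical.

```python
VDD = 1.8          # supply voltage (V), from Section 4.1
C_BL = 50.6e-15    # bit line capacitance (F), value used in Eq. (2)
C1 = C2 = C_BL     # ASSUMPTION: sized so that Step 1 ends near VDD/3


def share_parallel(c_bl, v_bl, c1, c2, v_cap=0.0):
    """Step 1: BL shares its charge with C1 and C2 connected in parallel."""
    total_charge = c_bl * v_bl + (c1 + c2) * v_cap
    return total_charge / (c_bl + c1 + c2)


def recycle_series(c_bl, v_bl, c1, c2, v_cap):
    """Step 3: C1 and C2 are stacked in series and connected back to BL."""
    c_series = c1 * c2 / (c1 + c2)   # equivalent capacitance of the stack
    v_stack = 2.0 * v_cap            # each capacitor still holds v_cap
    total_charge = c_bl * v_bl + c_series * v_stack
    return total_charge / (c_bl + c_series)


# Step 1: charge collection (capacitors assumed empty beforehand).
v_step1 = share_parallel(C_BL, VDD, C1, C2)
# Step 2: write; the pump is disconnected, so the capacitors keep v_step1.
# Step 3: charge recycling from the series stack back to BL.
v_step3 = recycle_series(C_BL, v_step1, C1, C2, v_step1)

print(f"BL after Step 1: {v_step1:.3f} V")
print(f"BL after Step 3: {v_step3:.3f} V")
```

Under these assumptions the bit line falls to VDD/3 in Step 1 and is lifted again in Step 3 without drawing charge from the supply. The model ignores switch resistance, leakage, and the cell's own pull-down, so it is not expected to reproduce the post-layout waveform values.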



Figure 3. Working steps of bit line charge pump SRAM

In the read and hold cycles, the proposed SRAM design exhibits the same operation as the conventional 6T-SRAM design. As shown in Fig. 4(a), in the read operation, S1 and S2 are turned off, so the charge pump is completely disconnected from the SRAM cell. As S3 and S4 are turned on, the sense amplifier is connected to BL and BL\_N. The voltage difference between BL and BL\_N, which reflects the data stored in the SRAM cell, is read out through the sense amplifier. In the hold operation, as shown in Fig. 4(b), S1, S2, S3 and S4 are turned off, so both the charge pump and the sense amplifier are disconnected from the SRAM cells.



**Figure 4.** System diagram for SRAM read and hold operations

#### 3.3 Charge pump multiplexing structure

In order to realize an area-efficient circuit implementation, charge pumps do not have to be inserted at every bit line in a SRAM system. In this proposed design, the number of charge pumps equals the data width. As shown in Fig. 5, each charge pump is shared among multiple memory cells and bit lines through a multiplexer. For example, assume the data width is N, which means N bit cells are accessed during each write or read operation. Then there are only N active bit-line pairs, and hence the required number of charge pumps is N. If the column number of a SRAM array is M, then each multiplexer has M/N input terminals. In Fig. 5, the SRAM array consists of 64 columns and the bit width is 8; therefore, 8 charge pumps are enough and 8:1 multiplexers are utilized.
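The sizing rule above (N charge pumps, each behind an M/N-input multiplexer) can be written as a small helper. This is a minimal sketch; the function name `plan_charge_pumps` is hypothetical, and the requirement that M be a multiple of N is an assumption made here so that the ratio is an integer.

```python
def plan_charge_pumps(columns, data_width):
    """Return (number of charge pumps, multiplexer inputs per pump).

    columns    -- number of SRAM columns (M in the text)
    data_width -- bits accessed per read/write (N in the text)
    """
    if columns <= 0 or data_width <= 0:
        raise ValueError("columns and data_width must be positive")
    if columns % data_width != 0:
        # ASSUMPTION: the array is organized so that M is a multiple of N.
        raise ValueError("columns must be a multiple of data_width")
    return data_width, columns // data_width


# The configuration described for Fig. 5: 64 columns, 8-bit data width.
pumps, mux_inputs = plan_charge_pumps(columns=64, data_width=8)
print(f"{pumps} charge pumps, each behind a {mux_inputs}:1 multiplexer")
```

For the same 64-column array, widening the data path to 64 bits would give 64 pumps and no multiplexing, which is consistent with more bit lines taking part in charge recycling at larger bit widths.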



Figure 5. Structure of charge pump multiplexing

### 4. Post-layout Simulation Results

#### 4.1 Simulation waveforms

The proposed SRAM system was designed and implemented in the SMIC 180nm CMOS process. The capacity of the SRAM is 8Kb. The post-layout simulation was carried out at supply voltage VDD=1.8V and clock frequency f=100MHz. The switches S3 and S5 in Fig. 3 are controlled by signal CLK1, while the switch S4 is controlled by signal CLK2. In Fig. 6, when CLK1 changes to high, the voltage level of the bit line (BL\_N) decreases because of the charge sharing with the capacitors (C1 and C2). When the word line signal (WL) is high, input data is written into the SRAM cell. Due to the positive feedback of the cross-coupled inverters in the SRAM cell, the voltage level of BL\_N continues to decrease. Later, when CLK2 becomes high, the voltage level of the bit line (BL\_N) jumps due to the charge injected from the charge pump.



Figure 6. Waveform of the proposed charge pump SRAM

#### 4.2 Dynamic power analysis of bit line

The system power loss consists of dynamic and static power, as shown below [10]:

$$Power = \alpha N_{gate} C V_{DD}^2 f + \sum_{N_{gate}} I_{eff} V_{DD} \tag{1}$$

Here, $\alpha$ is the switching activity, defined as the probability of switching during a clock cycle. $N_{gate}$ is the number of gates which the node has to drive. C is the total capacitance and f is the operating clock frequency. $I_{eff}$ and $V_{DD}$ are the effective current and supply voltage. In SRAM systems, dynamic power consumption of bit lines is the dominant component. In this work, the proposed charge pump scheme only reduces the dynamic power loss; the static power loss remains the same. Based on the post-layout simulation waveforms, we conducted the following power saving estimation.

According to Eq. (1), assuming $\alpha$ and $N_{gate}$ are equal to 1, the dynamic power consumption of a single bit line of a conventional 6T-SRAM in one write operation is calculated as:

$$P_{6T} = C_{bitline} V_{DD}^2 f = 5.06 \times 10^{-14} \times 1.8^2 \times 100 \times 10^{6} = 1.64 \times 10^{-5}\ (W) \tag{2}$$

Here  $C_{bitline}$  is the bit line capacitance, whose value is obtained from layout extraction. In contrast, the dynamic power consumption of our proposed charge pump SRAM (CP-SRAM) in one write operation is:

$$P_{CP} = C_{bitline} V_{swing}^2 f = 5.06 \times 10^{-14} \times (1.8 - 1.02)^2 \times 100 \times 10^{6} = 3.08 \times 10^{-6}\ (W) \tag{3}$$

The voltage levels of the bit lines are extracted from the post-layout simulation in Fig. 6. The proposed charge pump scheme significantly reduces dynamic power consumption by restricting the bit-line voltage swing and recycling bit-line charge.
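The arithmetic in Eqs. (2) and (3) can be reproduced with a few lines of Python. The sketch below uses only the values quoted in the text; the function and variable names are hypothetical, and $\alpha$ and $N_{gate}$ are taken as 1, as in the derivation above.

```python
def bitline_dynamic_power(c_bitline, v_swing, freq):
    """Dynamic power C * V^2 * f of one bit line (alpha = N_gate = 1)."""
    return c_bitline * v_swing ** 2 * freq


C_BITLINE = 5.06e-14   # extracted bit line capacitance (F)
VDD = 1.8              # supply voltage (V)
V_RECYCLED = 1.02      # bit line level read from Fig. 6 (V)
FREQ = 100e6           # clock frequency (Hz)

p_6t = bitline_dynamic_power(C_BITLINE, VDD, FREQ)               # Eq. (2)
p_cp = bitline_dynamic_power(C_BITLINE, VDD - V_RECYCLED, FREQ)  # Eq. (3)
saving = 1.0 - p_cp / p_6t

print(f"P_6T = {p_6t:.3e} W")
print(f"P_CP = {p_cp:.3e} W")
print(f"Single bit line saving in one write: {saving:.1%}")
```

The resulting ratio applies to a single bit line during one write operation only. It is not the system-level saving, which also includes read and hold cycles and static power.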

#### 4.3 Power efficiency of charge pump

Power efficiency is an important metric to evaluate charge pump design. In this sub-section, we calculate and simulate the power efficiency of our proposed charge pump. The power efficiency of a charge pump circuit is expressed as:

$$\eta = \frac{P_{out}}{P_{in}} \times 100\% = \frac{V_{out}(t) \times I_{out}(t)}{V_{in}(t) \times I_{in}(t)} \times 100\% = \frac{V_{out} \times I_{out}(t)}{V_{in} \times I_{in}(t)} \times 100\% \tag{4}$$

 $V_{out}(t)$  is the steady-state value of v(BL\_N) when CLK2 is turned off,  $V_{in}(t)$  is the steady-state value of v(BL\_N) when CLK1 is turned off in Fig. 6.  $V_{out}(t)$  and  $V_{in}(t)$  can be obtained from simulation waveform,  $I_{out}(t)$  and  $I_{in}(t)$  can be calculated as:

$$I_{out}(t) = \int_{T_3}^{T_4} i_{out}(t) dt \tag{5}$$

$$I_{in}\left(t\right) = \int_{T_{1}}^{T_{2}} i_{in}\left(t\right) dt \tag{6}$$

Furthermore, the integral of $i_{out}(t)$ or $i_{in}(t)$ can be calculated as the area under the waveform $i_{out}(t)$ or $i_{in}(t)$ (i.e., waveform i(BL\_N) in Fig. 6), denoted $S_{out}$ or $S_{in}$. Each of these areas is approximately the area of a triangle. Finally, the power efficiency of the proposed charge pump is calculated as:

$$\eta = \frac{V_{out} \times I_{out}(t)}{V_{in} \times I_{in}(t)} \times 100\% = \frac{V_{out} \times \int_{T_3}^{T_4} i_{out}(t) dt}{V_{in} \times \int_{T_1}^{T_2} i_{in}(t) dt} \times 100\% \approx \frac{V_{out} \times S_{out}}{V_{in} \times S_{in}} \times 100\%$$

$$\approx \frac{1.02(V) \times \left[\frac{1}{2} \times 271.4(\mu A) \times 0.481(ns)\right]}{0.603(V) \times \left[\frac{1}{2} \times 306.9(\mu A) \times 0.847(ns)\right]} \times 100\% = 84.95\% \tag{7}$$
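The triangle approximation in Eq. (7) can be checked numerically. The sketch below plugs in the peak currents, pulse widths and voltages quoted in Eq. (7); the helper names are hypothetical.

```python
def triangle_area(peak, base):
    """Area of a triangular current pulse: 1/2 * peak current * duration."""
    return 0.5 * peak * base


def pump_efficiency(v_out, s_out, v_in, s_in):
    """Efficiency in percent, using (V_out * S_out) / (V_in * S_in)."""
    return 100.0 * (v_out * s_out) / (v_in * s_in)


# Values quoted in Eq. (7): peak currents in A, pulse widths in s.
s_out = triangle_area(271.4e-6, 0.481e-9)   # charge delivered back to BL_N
s_in = triangle_area(306.9e-6, 0.847e-9)    # charge collected from BL_N

eta = pump_efficiency(v_out=1.02, s_out=s_out, v_in=0.603, s_in=s_in)
print(f"Charge pump efficiency: {eta:.2f} %")
```

Because the efficiency is a ratio, the unit prefixes cancel; the same result is obtained if the currents are entered in µA and the times in ns.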

#### 4.4 Read/Hold/Write static noise margin

Static Noise Margin (SNM) is an evaluation metric that characterizes the noise immunity of a SRAM cell [11]. When a SRAM cell is in read mode, two transistors (i.e., N1 and N2 in Fig. 7(a)) are turned on. It is possible that the previously stored data in the memory cell may be overwritten by the pre-charged bit lines (i.e., BL or BL\_N in Fig. 7(a)). Therefore, read SNM is used to evaluate the stability of the SRAM cell during the read operation. The schematic for finding SNM is shown in Fig. 7(a) [12], where the word line signal WL is set to "1" and then the DC voltage supply varies from VDD to GND to change the voltage at node A from "1" to "0". The resultant voltage transfer curve is plotted as the solid line in Fig. 7(b). Similarly, hold SNM indicates the SRAM data stability when WL is set to "0" in Fig. 7(a), and the resultant simulation result is shown as the dashed line in Fig. 7(b). The simulated read and hold SNM are 0.38V and 0.84V, respectively.



Figure 7. Read/Hold SNM simulation

Correspondingly, write SNM is an evaluation metric that characterizes how easily data can be written into a SRAM cell. The write SNM is defined as the bit line voltage value at which the state of the memory cell flips [13]. The ease of a write operation depends on how close the bit line needs to be driven to ground. Therefore, the required low level of the bit line voltage during a write operation is an indicator of write stability [14]. As depicted in Fig. 8(a), the word line signal WL is set to "1" and the initial voltage of node B is set to "1", then the BL\_N voltage varies from VDD to GND. The voltage transfer curves at nodes A and B are shown in Fig. 8(b), where the write SNM value is 355.7 mV, which is the BL\_N voltage value when the voltage at node A is equal to that at node B.



Figure 8. Write SNM simulation

The SNM simulations in this section are static ones, which heavily depend on the sizing of the SRAM cells. The swing level of the bit line voltage does not impact these SNM simulation results. In this work, our proposed CP-SRAM cell is sized the same as the conventional 6T-SRAM cell, thus the same read/hold/write SNM values are obtained from the SNM simulations in Fig. 7(a) and Fig. 8(a). Even though these SNM values do not indicate any discrepancy, circuit operation differences still exist. For example, in a write operation, the bit line of 6T-SRAM shows a full swing of 1.8V, while the bit line voltage swing of our proposed CP-SRAM is 1.2V as shown in Fig. 6. Moreover, one of the two bit lines is driven by GND for 6T-SRAM, while it is driven by the capacitors in the charge pump for our proposed CP-SRAM.

#### 4.5 Process and temperature simulation

As mentioned in Section 4.4, static SNM simulations may overlook subtle drawbacks, so herein we carry out dynamic simulations of read and write operations to estimate the stability of our proposed CP-SRAM. Synopsys NanoSim was used in post-layout simulations to reveal the impacts of process variation. Five process corners (SS, SF, TT, FS and FF) and a temperature range of -55°C to 125°C were used. The post-layout simulation results in Fig. 9 demonstrate that the read and write times rise linearly with temperature. Under the worst case (i.e., SS corner, 125°C), the write time and read time are 4.42ns and 1.32ns, respectively. Fig. 9 validates that the integrated charge pump in CP-SRAM is robust and does not result in severe timing degradation or functional failure during read or write operations.



(a) Comparison of write time vs. temperature in different corners



(b) Comparison of read time vs. temperature in different corners

**Figure 9.** PVT dynamic simulation

# 5. Layout Photo and Performance Comparison

Fig. 10 shows the layout view of the proposed SRAM, which is embedded into an 8-bit 8051 MCU system. The total system area is $1466 \times 1465 \mu m^2$. The CP-SRAM area is $432 \times 309 \mu m^2$, and it consists of 128 rows and 64 columns of SRAM cells. The charge pump circuit is located at the bottom of the CP-SRAM array.



Figure 10. Layout photo of an MCU with embedded charge pump SRAM

As shown in Table 1, the proposed CP-SRAM leads to a slight area overhead (3.8%) and a remarkable power saving (11%) compared with the conventional 6T-SRAM design. In addition, there is a 44% reduction in bit line current consumption. Note that, unlike the results in Section 4.2, where the calculation is conducted for a single bit line during one write operation, the bit line average current in Table 1 is measured during 100,000 clock cycles of normal read/write/hold operation. Therefore, the bit line current in Table 1 is composed of read and write operation currents as well as leakage current, and this 44% current reduction is more representative of practical operation. Moreover, as discussed in Section 4.4, the Read/Hold/Write SNM values are obtained from HSPICE simulations. The Read/Hold/Write SNM values of the CP-SRAM are the same as those of the conventional 6T-SRAM design.

**Table 1.** Comparison of conventional and proposed SRAM

|           | Area (mm <sup>2</sup> ) | Bit line current (μA) | Total power (mW) | Read/Hold/Write Margin (V) |
|-----------|-------------------------|-----------------------|------------------|----------------------------|
| 6T-SRAM   | 0.1285                  | 31.02                 | 2.09             | 0.38/0.84/0.36             |
| CP-SRAM   | 0.1334                  | 17.45                 | 1.86             | 0.38/0.84/0.36             |
| Reduction | -3.8%                   | 44%                   | 11%              | None                       |

Table 2 lists the percentage of power consumption reduction versus bit width. The percentage of power saving becomes increasingly significant as the bit width increases. This is because more SRAM bit lines participate in charge recycling when more bit lines are accessed in each operation.

**Table 2.** Comparison of power consumption vs. bit width

| Bit width | 6T-SRAM power (mW) | CP-SRAM power (mW) | Reduction |
|-----------|--------------------|--------------------|-----------|
| 8         | 2.09               | 1.86               | 11%       |
| 16        | 2.5                | 2.13               | 15%       |
| 32        | 2.94               | 2.37               | 19%       |
| 64        | 3.49               | 2.72               | 22%       |
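The reduction column of Table 2 follows directly from the two power columns. The sketch below recomputes it from the tabulated values; the names `TABLE_2` and `reduction_percent` are hypothetical.

```python
# Bit width -> (6T-SRAM power in mW, CP-SRAM power in mW), from Table 2.
TABLE_2 = {
    8: (2.09, 1.86),
    16: (2.5, 2.13),
    32: (2.94, 2.37),
    64: (3.49, 2.72),
}


def reduction_percent(baseline, proposed):
    """Relative power reduction of the proposed design, in percent."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    width: round(reduction_percent(p_6t, p_cp))
    for width, (p_6t, p_cp) in TABLE_2.items()
}

for width, value in reductions.items():
    print(f"bit width {width:>2}: {value}% reduction")
```

The rounded values match the table, and the reduction grows monotonically with bit width.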

Table 3 summarizes the related literature results for comprehensive comparison and discussion. Reference [9] collects and recycles the virtual ground line charge instead of the bit line charge. Even though power consumption in [9] could be reduced to some extent, the scheme also has several disadvantages, which have been discussed in Section 2. Based on the available results, we deduced the power consumption per size of [9] as follows. The energy of the SRAM system in [9] is reported as 3.25J when the benchmark is executed for 1 billion instructions. Since the clock frequency is 1GHz, we estimate the power consumption as 270.8mW. As the SRAM size is 17Mb (*i.e.*, 512Kb L1-I Cache, 512Kb L1-D Cache and 16Mb L2-Cache), the power consumption per size is about 15.56$\mu$W/Kb. In addition, the implementation technology is 32nm and the supply voltage is 0.7V in [9]. Hence, the power consumption should be multiplied by 6×1.8<sup>2</sup>/0.7<sup>2</sup> to enable an equivalent comparison with this work (*i.e.*, 180nm technology and 1.8V supply voltage). Finally, the estimated power consumption per size in reference [9] is 617.3$\mu$W/Kb, which is higher than the 232.5$\mu$W/Kb of this work. Furthermore, as mentioned in [9], if the number of word lines is 128, that design has an energy saving of 5.6% over conventional 6T-SRAM, while our proposed work results in an energy saving of 11%.
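The normalization applied to [9] can be retraced step by step. The sketch below starts from the 270.8mW estimate given in the text and takes the scaling factor of 6 as given by the source. It assumes 17Mb means 17 × 1024 Kb, which is consistent with the quoted 15.56µW/Kb. The function names are hypothetical.

```python
def power_per_kb_uw(power_mw, size_kb):
    """Power per memory size in microwatts per Kb."""
    return power_mw * 1000.0 / size_kb


def scale_to_this_work(power, tech_factor, vdd_target, vdd_source):
    """Apply the text's scaling: tech_factor * (Vdd_target / Vdd_source)^2."""
    return power * tech_factor * (vdd_target / vdd_source) ** 2


# Reference [9]: 270.8 mW for 17 Mb (ASSUMPTION: 1 Mb = 1024 Kb).
ref9_raw = power_per_kb_uw(270.8, 17 * 1024)
# Factor of 6 and the 0.7 V -> 1.8 V supply scaling are taken from the text.
ref9_scaled = scale_to_this_work(ref9_raw, 6, vdd_target=1.8, vdd_source=0.7)

# This work: 1.86 mW total power for the 8 Kb CP-SRAM (Table 1).
this_work = power_per_kb_uw(1.86, 8)

print(f"[9] before scaling: {ref9_raw:.2f} uW/Kb")
print(f"[9] after scaling:  {ref9_scaled:.1f} uW/Kb")
print(f"This work:          {this_work:.1f} uW/Kb")
```

Carrying the unrounded intermediate value through gives a scaled figure a fraction of a µW/Kb below the quoted 617.3µW/Kb, which is obtained when the intermediate result is first rounded to 15.56µW/Kb.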

**Table 3.** Comparison summary of related SRAM designs

|                        | PATMOS [9]                                                             | ISQED [15]                                                          | TVLSI [16]                                                        | This work                                    |
|------------------------|------------------------------------------------------------------------|---------------------------------------------------------------------|-------------------------------------------------------------------|----------------------------------------------|
| Technology & Condition | BPTM 32nm model<br>Simulation                                          | BPTM 180nm<br>model Simulation                                      | TSMC 180nm Chip Test                                              | SMIC 180nm<br>Post-Layout<br>Simulation      |
| Low-power<br>Technique | Recycling virtual ground<br>line charge of SRAM                        | Recycling virtual ground<br>(source-line) charge of SRAM            | 4T-cell with Dual-Vth                                             | Bit-line<br>charge-pump                      |
| SRAM Size              | 17Mb                                                                   | 256Kb                                                               | 32Kb                                                              | 8Kb                                          |
| Area per Size          | No layout                                                              | No layout                                                           | 0.016mm <sup>2</sup> /Kb                                          | 0.017mm <sup>2</sup> /Kb                     |
| Power per Size         | 617.3µW/Kb, which was<br>reported as 5.6%<br>reduction over<br>6T-SRAM | No specific value,<br>but reported<br>14% reduction over<br>6T-SRAM | 1375µW/Kb, no<br>reported reduction<br>percentage over<br>6T-SRAM | 232.5µW/Kb,<br>11% reduction<br>over 6T-SRAM |
| Supply Voltage         | 0.7V                                                                   | 1.8V                                                                | 1.8V                                                              | 1.8V                                         |
| Frequency              | 1GHz                                                                   | 100MHz                                                              | 100MHz                                                            | 100MHz                                       |
| SNM                    | Not mentioned                                                          | Not mentioned                                                       | 200mV                                                             | 380mV                                        |

Reference [15] presented a charge recycling technique for SRAM in TFT-LCD process and application. Two ideas were introduced. First, the voltage level of virtual ground is significantly boosted to reduce the leakage current in SRAM. Second, charge in virtual ground gets recycled and re-used through switching the source-line to connect with other source-lines which have different voltage levels. In order to implement both ideas, each row of SRAM array shares one virtual ground point (called "source-line" in [15]). Because this design requires an additional 900mV supply to boost the source-line voltage level, it results in design complexity overhead and user inconvenience. Under the iso-simulation condition (*i.e.*, 100MHz, 1.8V), even though the specific power value was not mentioned in [15], it was reported that 54% and 14% of power reduction were obtained, due to boosting virtual ground and charge recycling from virtual ground, respectively. Note the simulation results in [15] were based on predictive technology model without considering layout routing and parasitic extraction, so 14% of power savings may vary with the specific implementation technology.

Because of the use of a 4T memory cell structure, reference [16] has a smaller area (*i.e.*,  $0.016 \text{mm}^2/\text{Kb}$ ,  $784.7 \times 663.1 \mu\text{m}^2$  for a 32Kb SRAM core without pads) than our proposed design (*i.e.*,  $0.017 \text{mm}^2/\text{Kb}$ ). However, the power consumption of our proposed system is lower due to the use of bit line charge recycling. The power consumption mentioned in [16] is 20mW in BIST mode and 44mW in synchronous operation mode. Considering that the memory system is 32Kb, the power consumption per size is  $625 \mu\text{W/Kb}$  and  $1375 \mu\text{W/Kb}$  in BIST and synchronous operation mode, respectively. In terms of write/read stability, reference [16] reports a low SNM (*i.e.*, only 200mV), while our proposed system has better noise immunity (*i.e.*, SNM=380mV).

From the above discussion, we can see that the proposed CP-SRAM design is a good trade-off between design complexity and power savings. The proposed CP-SRAM design does not need an extra DC supply voltage, which is required in the previous designs [9], [15]. As a result, the proposed solution is easier to implement. Circuit simulations have validated that the Read/Hold/Write SNM values of the CP-SRAM are as good as those of conventional 6T-SRAM. Due to the use of an integrated charge pump block for bit line charge recycling, the proposed design achieves a power reduction of 22% for a bit width of 64 and 11% for a bit width of 8 over conventional 6T-SRAM.

## 6. Conclusion

To the best of our knowledge, this is the first time that an integrated charge pump circuit has been proposed to work inside SRAM systems and to recycle bit line charge in memory write operations. Circuit design, analysis and VLSI implementation of an 8Kb CP-SRAM system are presented in this work. Compared to the conventional 6T-SRAM design, the proposed system achieves a power saving of 11% with an area overhead of only 3.8%. In contrast with existing charge recycling SRAM designs, the proposed CP-SRAM design is robust to process variation and demonstrates good read/write stability, as well as a better trade-off between design complexity and low power consumption.

# 7. Acknowledgment

This work is sponsored by Shanghai Pujiang Program of China (15PJ1431600).

#### References

- [1] Moriwaki S, Kawasumi A, Suzuki T, Sakurai T, Miyano S. 0.4V SRAM with bit line swing suppression charge share hierarchical bit line scheme. In: IEEE Custom Integrated Circuits Conference (CICC). 2011. p. 1-4.
- [2] Dickson JF. On-chip high-voltage generation in MNOS integrated circuits using an improved voltage multiplier technique. IEEE J Solid State Circuits 1976; 11:374-378.
- [3] Favrat P, Deval P, Declercq MJ. A high-efficiency CMOS voltage doubler. IEEE J Solid State Circuits 1998; 33:410-416.
- [4] Ying TR, Ki WH, Chan M. Area-efficient CMOS charge pumps for LCD drivers. IEEE J Solid State Circuits 2003; 38:1721-1725.
- [5] Lu C, Park SP, Raghunathan V, Roy K. Efficient power conversion for ultra low voltage micro scale energy transducers. In: IEEE Design, Automation & Test in Europe Conference & Exhibition (DATE). 2010. p. 1602-1607.
- [6] Lu C, Raghunathan V, Roy K. Efficient Design of Micro-Scale Energy Harvesting Systems. IEEE J Emerging and Selected Topics in Circuits and Systems 2011; 1:254-266.
- [7] Lu C, Park SP, Raghunathan V, Roy K. Stage number optimization for switched capacitor power converters in micro-scale energy harvesting. In: IEEE Design, Automation & Test in Europe Conference & Exhibition (DATE). 2011. p. 1-6.
- [8] Keung KM, Manne V, Tyagi A. A Novel Charge Recycling Design Scheme Based on Adiabatic Charge Pump. IEEE Trans Very Large Scale Integration (VLSI) Systems 2007; 15:733-745.
- [9] Keung KM, Tyagi A. SRAM CP: A Charge Recycling Design Schema for SRAM. Springer Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation 2006; 4148:95-106.
- [10] Morifuji E, Yoshida T, Kanda M, Matsuda S, Yamada S, Matsuoka F. Supply and threshold-voltage trends for scaled logic and SRAM MOSFETs. IEEE Trans Electron Devices 2006; 53:1427-1432.
- [11] Seevinck E, List F, Lohstroh J. Static noise margin analysis of MOS SRAM cells. IEEE J Solid State Circuits 1987; 22:748-754.
- [12] Calhoun BH, Chandrakasan AP. Static noise margin variation for sub-threshold SRAM in 65-nm CMOS. IEEE J Solid State Circuits 2006; 41:1673-1679.
- [13] Wang J, Nalam S, Calhoun BH. Analyzing static and dynamic write margin for nanometer SRAMs. ACM/IEEE Int. Symposium Low Power Electronics and Design (ISLPED). 2008. p. 129-134.
- [14] Zhang K, Bhattacharya U, Chen Z, Hamzaoglu F, Murray D, Vallepalli N, Wang Y, Zheng B, Bohr M. A 3-GHz 70-Mb SRAM in 65-nm CMOS technology with integrated column-based dynamic power supply. IEEE J Solid State Circuits 2006; 41:146-151.
- [15] Kim KJ, Kim CH, Roy K. TFT-LCD application specific low power SRAM using charge-recycling technique. IEEE Int. Symposium Quality of Electronic Design (ISQED). 2005. p. 59-64.
- [16] Wang CC, Tseng YL, Leo HY, Hu R. A 4-kB 500-MHz 4-T CMOS SRAM using low-V<sub>THN</sub> bitline drivers and high-V<sub>THP</sub> latches. IEEE Trans Very Large Scale Integration (VLSI) Systems 2004; 12:901-909.

# **Authors' Biographies**



Xu Wang received the B.S. degree in communication engineering from Harbin Institute of Technology, Harbin, China, in 2005 and the M.S. degree in Microelectronics and Solid State Electronics from Harbin Institute of Technology in 2007. He is a Ph.D. candidate in Integrated Circuit Design and System at Shanghai Jiao Tong University, Shanghai, China. He was a visiting scholar at Purdue University, West Lafayette, IN, USA, from 2012 to 2013. He now holds a senior position at Lonely Mountain Electronic Technology (Shanghai) Co., Ltd. in Shanghai, China.

His research interests include full-custom SRAM design, low power electronics and methodology, ASIC physical design, and power sign-off.



Yuanzhi Zhang received the B.S. and M.S. degrees in Electronic Engineering from Shandong University, China, in 2011 and 2014, respectively. He has been working toward his Ph.D. degree at Southern Illinois University Carbondale, IL, United States, since August 2015.

His research interests include HEVC/H.265 video/image processing circuit and system optimization, low power SRAM VLSI design and methodology, and 3D-IC system design.



Chao Lu received the B.S. degree in electrical engineering from Nankai University, Tianjin, China, in 2004 and the M.S. degree from the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology, Hong Kong, in 2007. He obtained his Ph.D. degree at Purdue University, West Lafayette, Indiana, in 2012. From 2013 to 2015, he worked as an R&D circuit design engineer at Arctic Sand Technologies Inc. and Tezzaron Semiconductors. He is now an assistant professor in the Electrical and Computer Engineering Department of Southern Illinois University Carbondale. His research interests include design of micro-scale energy harvesting systems, HEVC/H.265 video/image processors, power efficient memory design, and power management IC design for ultra low power applications. Mr. Lu was the recipient of the Best Paper Award of the International Symposium on Low Power Electronics and Design (2007).



Zhigang Mao received his B.S. degree in Semiconductor Devices and Physics from Tsinghua University, Beijing, China, in 1986, the M.S. degree from SUPELEC, Gif, France, in 1988, and the Ph.D. degree in Information and Communication from Université de Rennes I, France, in 1992. From 1992 to 2006, he was a professor at Harbin Institute of Technology, China. He is now the dean of the Department of Micro-Nano Electronics, Shanghai Jiao Tong University, China.

He was the main designer of the first IC card chip in China. He has received one national award for Science and Technology Progress and two provincial awards for Science and Technology Progress. His current research interests include VLSI design methodology, high-speed digital circuit design technology, low power electronics, signal processor architecture, and hardware security technology and reliability in semiconductor devices.