This paper proposes a 10T dual-port (DP) SRAM bit-cell that improves the static noise margin (SNM) and solves write/read disturb issues in nano-scale CMOS technologies. In addition to the row access transistor used in the bit-cell, a Y-access MOS (column-direction access transistor) is added to improve the noise margin of dummy-read cells and to isolate the pre-charge noise of the bit-lines in synchronous or asynchronous clock operation. The paper also proposes a scheme that combines the row access transistors and shares a bit-line with the adjacent bit-cell. This scheme halves the number of bit-lines and reduces the write/read buffer current consumed by pre-charging the bit-lines to VDD. Furthermore, the number of Y-passgates (column-direction access transistors) can also be halved with the proposed DP 10T SRAM architecture. The results show that the write/read buffer current consumption is reduced by over 30% compared with the conventional DP 8T structure for VDD from 1.4 V to 0.6 V.
INTRODUCTION
As CMOS technology continues to scale down to deep nanometer nodes, system-on-chip (SoC) designs demand high-density embedded SRAM for high speed, low power, and area efficiency. They also require more memory bandwidth. The most widely used element is the single-port (SP) SRAM, which has one write/read port that can be activated at a time and offers random access. However, driven by the recent demand for more memory bandwidth, dual-port (DP) and multi-port SRAMs have increasingly been investigated, because they offer parallel operation for high-speed communication and video applications, which SP SRAM cannot provide. An SoC that employs dual-port or multi-port SRAM for parallel operation can improve memory bandwidth significantly. [1] [2] [3] [4] [5] Since SP SRAM has only one clock with which to activate the WL and access the memory cell synchronously, data accesses must execute serially. In contrast, DP SRAM has two independent clocks, so it can access memory cells through two different ports separately and execute read or write operations in parallel. The conventional single-port and dual-port SRAM cells are shown in Figure 1. Although DP SRAM has the advantage of accessing cells for read and write simultaneously, it also creates write/read disturb or conflict problems during common row access. Techniques that solve the disturb issue in synchronous DP SRAM have been reported in Refs. [6, 7]. However, the disturb issue still exists in both synchronous and asynchronous DP SRAM applications. One paper has proposed a test procedure to detect the worst Vmin in asynchronous clock operation. [8] With this method, the circuit can screen for the worst bit in an array that suffers from the write/read disturb issue.
This paper proposes adding a Y-access MOS to the bit-cell to solve the DP SRAM write/read disturb issue that occurs during common row access. The cells of the unselected columns are kept in a hold state by disabling their Y-access MOS transistors. Therefore, the write/read disturb phenomenon that affects dummy-read cells in the conventional DP 8T SRAM does not occur. Furthermore, using the Y-access transistor, we also propose a scheme that connects adjacent cells and shares the bit-line between them, which halves the number of bit-lines. The scheme therefore saves the power otherwise consumed by pre-charging the bit-lines of dummy-read cells to VDD.
The rest of this paper is organized as follows. Section 2 describes the concerns of the dual-port SRAM cell in the four operating modes used to access cells within an array.
Wang and Hwang
A 45 nm 10T Dual-Port SRAM with Shared Bit-Line Scheme for Low Power Operation
Section 3 proposes the DP 10T cell and shows that it overcomes the SNM and write/read disturb issues of dummy-read cells. The layout of the DP 10T cell is also presented. Section 4 describes the DP 10T SRAM architecture: the normal array with its Y-passgate connection, and how the Y-passgate connection can be simplified. Section 5 provides simulation results and compares the write/read buffer current consumption and access time delay of the DP 10T and DP 8T SRAMs. Section 6 summarizes the proposed technique and concludes the paper.
DUAL-PORT CELL CONCERN
Dual-port SRAM operation has four access modes, which can be categorized as shown in Figure 2. The first mode accesses A-port and B-port on different rows and different columns for write/read operations, as shown in Figure 2(a). The second mode accesses A-port and B-port on different rows but the same column, as shown in Figure 2(b). Both of these access modes activate only a single port on any one row. As shown in Figures 2(a) and (b), in the different-row access modes, when the WL of A-port or B-port is activated for write/read operations, the WL of the other port on the same row is deactivated. Thus, no access conflicts occur. Figure 2(c) shows a situation where write and read operations are conducted on the same row but different columns, causing an access conflict problem. When the WL of A-port is activated for write/read operations, the other cells in the same row also have their A-port WL activated, creating the dummy-read condition. If the WL of B-port is activated for write/read operations in a different column, that cell is under the A-port dummy-read condition and the write/read disturb issue occurs. The write disturb issue hinders the ability to write data that differ from the dummy-read data. The read disturb issue degrades the cell read current and weakens the read capability.
Other cells on different columns of the dummy-read row also suffer SNM degradation, because the pre-charged bit-line disturbs the internal nodes of the cell. Figure 2(d) shows write/read operations activated in the same row and same column. Concurrent write operations through two ports with the same input address should be inhibited, because writing different data into a cell through two ports simultaneously causes a serious DC conflict and significant leakage current. In the Figure 2(d) situation, reading through both ports, or writing through one port while reading through the other, is still permitted and is frequently used in dual-port SRAM. What follows is a discussion of the various write/read disturb issues and their influence on cell stability and performance. The write and read disturb issues of dual-port SRAM operations, illustrated in Figure 3, urgently require solutions. Figure 2(c) shows that when the cell in the left column executes a write/read operation at A-port, the right cell becomes a dummy-read cell at A-port, with its BL2A and BL2AB pre-charged to high. If B-port attempts to write 0 into the cell through BL2B, its internal storage node encounters a write data disturb and is difficult to flip, as shown in Figure 3(a). This is called a write disturb. The other issue, called a read disturb, is explained as follows. When the cell in the left column executes a read operation at A-port, the BL2A and BL2AB of the right cell are pre-charged to high and its storage node "0" rises slightly. Thus, if B-port attempts to read 0 through BL2B, its internal storage node encounters a read data disturb, as shown in Figure 3(b).
When a dummy read occurs in the cell of an unselected column, the read static noise margin (RSNM) becomes worse than the hold SNM (HSNM) (with the WL of both ports deactivated), as the butterfly curve in Figure 4(a) shows. For example, when the WL of A-port is activated, the BL1A and BL1AB shown in Figure 2(c) are pre-charged to high, disturbing the internal storage node, which causes the RSNM to deteriorate, as shown by the red butterfly curve in Figure 4(a). When both ports are activated, the deterioration of the RSNM is the worst, as shown by the green butterfly curve in Figure 4(a).
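The four access modes of Figure 2 can be summarized in a short sketch. The Python below is only an illustration of the classification described in this section; the function and enum names are hypothetical and do not come from the design itself, and the disturb check reflects only the conventional DP 8T behaviour described for Figure 2(c).

```python
from enum import Enum


class AccessMode(Enum):
    """Labels follow the panels of Figure 2 (names are illustrative)."""
    DIFF_ROW_DIFF_COL = "a"
    DIFF_ROW_SAME_COL = "b"
    SAME_ROW_DIFF_COL = "c"
    SAME_ROW_SAME_COL = "d"


def classify_access(a_row, a_col, b_row, b_col):
    """Classify a simultaneous A-port / B-port access by row and column."""
    same_row = a_row == b_row
    same_col = a_col == b_col
    if not same_row:
        return (AccessMode.DIFF_ROW_SAME_COL if same_col
                else AccessMode.DIFF_ROW_DIFF_COL)
    return (AccessMode.SAME_ROW_SAME_COL if same_col
            else AccessMode.SAME_ROW_DIFF_COL)


def is_permitted(mode, a_is_write, b_is_write):
    """Only two concurrent writes to the same cell are inhibited."""
    both_write = a_is_write and b_is_write
    return not (mode is AccessMode.SAME_ROW_SAME_COL and both_write)


def dummy_read_disturb_risk_8t(mode):
    """Conventional DP 8T: same row, different columns gives a dummy read."""
    return mode is AccessMode.SAME_ROW_DIFF_COL


# A-port at (row 3, column 0), B-port at (row 3, column 2): Figure 2(c) case.
mode = classify_access(3, 0, 3, 2)
print(mode.value, dummy_read_disturb_risk_8t(mode))
```

The sketch treats the modes purely as address comparisons; it does not model the electrical behaviour that makes the RSNM worse when both word-lines are active.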
THE PROPOSED DUAL-PORT 10T CELL
Figure 5(a) shows the proposed DP 12T cell unit scheme. In this scheme, the access MOS AXR and BXR are shared with the right adjacent cell, and the other access MOS, AXL and BXL, are shared with the left adjacent cell. Therefore, the cell effectively appears as the 10T cell shown in Figure 5(b). In a conventional DP 8T cell scheme, the X-direction access MOS (row-direction control) AXL, AXR, BXL, and BXR connect the internal node to the BL and BLB of A-port and B-port, respectively. The DP 12T cell scheme adds the Y-direction access MOS (column-direction control) AYL, AYR, BYL, and BYR between the X-direction access MOS and the internal node. Figure 5(b) shows a scheme with four adjacent DP bit-cells connected. The adjacent cells in column 0 and column 1 are connected by Y-direction access MOS controlled by AY0 and AY1. They share a common X-direction access MOS controlled by AWL, which connects to a common bit-line ABLB0 shared between column 0 and column 1. Similarly, the adjacent cells in column 1 and column 2 are connected by Y-direction access MOS controlled by AY1 and AY2. They share a common X-direction access MOS controlled by AWL, which connects to a common bit-line ABL2 shared by column 1 and column 2. As shown in Figure 5(b), the bit-line pair of column 1 is ABLB0 and ABL2, and the bit-line pair of column 2 is ABL2 and ABLB2. The same rule applies to B-port. Only the even-numbered bit-line pairs are retained, whereas the odd-numbered bit-line pairs are omitted. Applying this criterion to the entire cell array, the total number of bit-lines is halved compared with the conventional DP 8T cell array.
SNM Issue Overcome
Figure 6 shows the DP 10T cell under various operation schemes and the corresponding SNM distribution curves. Figure 6(a) shows a cell with the WLs of both ports activated. The cell is in a two-port read state when A-port and B-port reads happen in the same row. The cells in the selected columns have the worst SNM. Figure 6(b) shows a cell with the WL of A-port activated. The cell is in a one-port read state when only an A-port read happens. The mean RSNM of 101.5 mV for the cells in the selected columns is better than the 85.2 mV obtained when both WLs are activated. These two cases are the dummy-read modes previously explained for DP 8T cells, as shown in Figures 2(a)-(c). Although the RSNM value is marginally better than that of the DP 8T cell, the SNM degradation issue remains. The advantage of the proposed DP 10T cell is that it includes a pair of Y-direction access MOS to control the connection to the internal storage node. When the WL of A-port is activated in one row and one column, the other unselected cells in that row are isolated by their deactivated Y-access MOS. As shown in Figure 6(c), although AWL is activated, AY remains deactivated. Therefore, the RSNM of the cell is nearly equal to its HSNM. Figure 6(d) shows the hold state of the DP 10T cell, in which both AWL and BWL are deactivated and both AY and BY also remain deactivated. Figure 6(e) shows that the orange curve (one WL activated, Y deactivated) nearly coincides with the green curve (all WLs deactivated). Furthermore, their mean SNM is the same, at 206.5 mV. This means that the SNM deterioration occurring in conventional DP 8T dummy-read cells can be eliminated by using DP 10T cells instead. The RSNM and HSNM curves shown in Figure 6(e) are the results of 10,000 Monte Carlo simulation runs at 0.6 V VDD, 25 °C, and the FS corner.
Overcoming Write and Read Disturb Issues
Figure 7 shows a diagram of four adjacent DP 10T cells connected to form four columns in one row. When A-port reads in Column 0, the A-port BL/BLB of the other columns is pre-charged to high. If B-port writes 0 in Column 2, a write disturb occurs in conventional DP 8T cells. In Figure 7, because AY2 is deactivated, the storage node of the 10T cell is isolated from ABL2. Therefore, writing 0 from BBL2 through the BY2-controlled access MOS onto the storage node does not cause any conflict. Similarly, when B-port writes in Column 2, the B-port BL/BLB of Column 0 is pre-charged to high. If A-port reads 0 at Column 0 with BY0 deactivated, the storage node is isolated from BBLB0 and no read disturb occurs. Thus, by employing the DP 10T cell structure, the write and read disturb issues are overcome.
Layout Implementation
Figure 8(a) shows a two-cell scheme that demonstrates the bit-line number reduction, which reduces the bit-line and word-line capacitance. The ABL2 and BBL2 bit-lines are shared between Columns 1 and 2, as required by the scheme shown in Figure 8(a). Additionally, ABLB0 and BBLB0 are shared between Columns 0 and 1. Therefore, the number of bit-lines is halved. Furthermore, the number of bit-line metal tracks and the bit-line junction capacitance are reduced. As shown in the layout of Figure 8(b), the junction areas of ABL2 and BBL2 lie symmetrically on the boundary between Columns 1 and 2; therefore, the BL junction capacitance is halved. Additionally, AWL or BWL controls only one gate for each cell. As shown in Figure 8(a), Column 1 employs only the AWL/BWL gates connected to ABLB0/BBLB0, and Column 2 employs only the AWL/BWL gates connected to ABL2/BBL2. In conventional DP 8T cells, AWL/BWL connects to both ABL/BBL and ABLB/BBLB separately. Therefore, the WL gate capacitance of the DP 10T scheme is halved. The layout area of the DP 10T cell is 3.115 µm², as shown in Figure 8(b). The area of the conventional DP 8T cell is 1.131 µm², as shown in Figure 8(c).
The DP 10T cell layout shown in Figure 8(b) indicates that the storage node connects to the NMOS controlled by AY1 and BY1, followed by the NMOS controlled by AWL and BWL. The series NMOS controlled by AY1 and AWL maintains vertical symmetry with the series NMOS controlled by BY1 and BWL, so the height of the cell layout extends in the BL direction. Additionally, the MOS controlled by AY1 and BY1 on the left side maintains symmetry with the MOS controlled by AY1 and BY1 on the right side, so the width of the cell layout extends in the WL direction. The aim of the DP 10T cell layout is to place ABLB0 and ABL2 on the boundary of the unit cell so that the cell can be tiled into an entire array. By contrast, conventional DP 8T cells have no NMOS controlled by AY1 and BY1 and do not need to maintain the vertical symmetry; thus, their layout is more compact, with less width and height.
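The bit-line sharing rule and the Y-access isolation described in this section can be illustrated with a small sketch. The Python below is not taken from the design; the function names are hypothetical. The odd-column pairs (ABLB0/ABL2 for column 1) and the even-column pair of column 2 (ABL2/ABLB2) follow the text, while the pair for column 0 (ABL0/ABLB0) is an assumption extrapolated from the even-numbered naming pattern.

```python
def a_port_bitline_pair(col):
    """Return the A-port bit-line pair used by a column under sharing.

    Even columns keep their own pair; odd columns borrow one line from
    each even-numbered neighbour (e.g. column 1 uses ABLB0 and ABL2).
    """
    if col % 2 == 0:
        return (f"ABL{col}", f"ABLB{col}")
    return (f"ABLB{col - 1}", f"ABL{col + 1}")


def distinct_bitlines(num_cols):
    """Collect the distinct A-port bit-lines needed by num_cols columns."""
    lines = set()
    for col in range(num_cols):
        lines.update(a_port_bitline_pair(col))
    return lines


def storage_node_connected(wl_active, y_active):
    """The storage node sees the bit-line only if WL and Y are both on."""
    return wl_active and y_active


shared = distinct_bitlines(4)
conventional = 2 * 4  # one BL/BLB pair per column in a DP 8T array
print(sorted(shared), len(shared), conventional)
```

In this toy model a four-column slice needs five A-port lines instead of eight, because the last odd column still references a line (ABL4) that it would share with the next column. Over a wide array this edge line becomes negligible and the count approaches half. `storage_node_connected(True, False)` corresponds to the dummy-read column of Figure 6(c): AWL is active, but the deactivated AY keeps the cell isolated.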
DUAL-PORT SRAM ARCHITECTURE
The proposed 1 Mb dual-port SRAM architecture is shown in Figure 9. The 1 Mb SRAM chip is divided into 8 blocks.
The blocks are stacked from bottom to top and numbered from block 0 to block 7. Every block has its own A-port and B-port control circuits, which are placed under or over its dual-port cell array. The dual-port 10T cell is shown in Figure 8(a). Adjacent blocks share a common area for the same port (A/B). Overall, 32 I/O circuits correspond to the vertical cell array columns. Every time the SRAM writes data into a cell, the data first enter a write buffer in the I/O block. Next, the data are delivered to a local write driver in the local A/B-port control circuit. The selected block activates its local write driver and writes the data into the cell.
Similarly, when SRAM attempts to read data from a cell, the data must first be read out by a local read buffer. The selected block passes the data through the activated A/B-port control circuit to the data output buffer in one I/O block.
Normal Cell Array and Y-Passgate Connection
Figure 10(a) shows a four-cell row connected with an A-port Y-passgate scheme. In a read cycle, when RDL is activated at a low level and one yselt signal is high, the data on one ABL/ABLB pair pass to DL and DLB through a Y-passgate. For example, column 2 employs the ABL2/ABLB2 pair and delivers data to DL/DLB when yselt2 is high in a read cycle. Similarly, column 3 employs the ABLB2/ABL4 pair and delivers data to DLB/DL when yselt3 is high in a read cycle. DL/DLB then pass the data to a local read buffer to be read.
Simplified Y-Passgate Connection
Figure 10(a) shows that each ABL connects to two adjacent Y-passgates. That is, an ABL can be shared by two adjacent Y-select signals. For example, ABL2 is shared by yselt1 and yselt2: if either yselt1 or yselt2 is high in a read cycle, data can pass through the Y-passgate onto the ADL line. Likewise, ABLB2 is shared by yselt2 and yselt3: if either yselt2 or yselt3 is high in a read cycle, data can pass through the Y-passgate onto the ADLB line. Thus the number of Y-passgates can be reduced, as shown in Figure 10(b). Comparing Figures 10(a) and (b) shows that the Y-passgate number is reduced. Therefore, the junction capacitance of the ADL/ADLB lines is halved and the current consumption can be reduced further.
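The simplified selection rule can be sketched as one passgate per shared bit-line, enabled by the OR of the two adjacent column-select signals. The Python below is an illustrative model only: the function name is hypothetical, the select signals are modelled as a list of booleans indexed by column, and the handling of the array edges (ABL0 and the last ABL line) is an assumption.

```python
def passgate_enables(ysel):
    """Map each A-port bit-line to its single passgate enable.

    ysel[i] is True when column i is selected. ABLk (k even) is enabled
    by ysel[k-1] OR ysel[k]; ABLBk by ysel[k] OR ysel[k+1], matching the
    ABL2 and ABLB2 examples in the text. Out-of-range columns count as
    unselected (an edge assumption).
    """
    n = len(ysel)

    def sel(i):
        return 0 <= i < n and bool(ysel[i])

    enables = {}
    for k in range(0, n + 1, 2):
        enables[f"ABL{k}"] = sel(k - 1) or sel(k)
        if k < n:
            enables[f"ABLB{k}"] = sel(k) or sel(k + 1)
    return enables


# Selecting column 3 should open only its pair, ABLB2 and ABL4.
en = passgate_enables([False, False, False, True])
print([name for name, on in en.items() if on])
```

In this model a four-column slice needs five passgates, one per distinct bit-line, rather than one per bit-line per column. Selecting column 2 enables ABL2 and ABLB2, and selecting column 3 enables ABLB2 and ABL4, which agrees with the column pairs given for Figure 10(a).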
SIMULATION RESULTS AND PERFORMANCE COMPARISON
Figure 11(a) shows the local read buffer circuit, which contains a sense amplifier. Figure 11(b) displays the waveform sequence of the signals related to a B-port read. First, when the B-port word-line WLB is activated, the internal nodes of the bit-cell, bl_in and blb_in, deliver the stored data 1 and 0. After the read control signals RDL and SMPRS receive an active-low pulse, the BL and BLB data pass into the sense amplifier. When SOE becomes active low, the sense amplifier starts to latch and propagates the data onto the global data bus, GRDT and GRDC. GRDT and GRDC are initially pre-charged to the VDD level. As SOE goes low, they leave the pre-charged state and begin accepting data from the local sense amplifier. As shown in Figure 11(b), when GRDC goes low, the SRAM output signal Q becomes high and the read operation completes. Because of the proposed DP 10T cell structure, the dual-port SRAM architecture can share a bit-line across the WL-access MOS of adjacent columns, thereby reducing the number of bit-lines. Furthermore, the simplified Y-passgate scheme also reduces the DL/DLB line capacitance. The effect of the bit-line number reduction can be verified from the current consumed when the bit-lines are pre-charged back to VDD. Because dummy-read operations do not exist in the proposed DP 10T cell array, the current consumption of the dummy-read cells is reduced, regardless of whether the write and read operations are executed in the same row. The write/read buffer current consumption of the proposed DP 10T architecture is compared with that of the conventional DP 8T structure. Figure 12(a) shows the comparison for one-port write/read buffers. The results indicate that the current consumption of the write/read buffer is reduced by more than 30%.
Figure 12(b) shows the write/read buffer current consumption when both ports are enabled on the same row but different columns. For A-port read with B-port read, the reduction was approximately 30% compared with conventional DP 8T cell arrays. For both-port write operations, the reduction was more than 30% compared with conventional DP 8T cell arrays.
Because the DP 10T cell includes an additional Y-direction access MOS between the internal node and the WL control gate, the read and write access times are affected. When a cell is read, the data must first pass through the Y-direction access MOS before reaching the bit-line through the WL control gate. The series resistance of the Y-direction access MOS and the WL control gate increases the read delay compared with conventional DP 8T cells. Additionally, because the macro area overhead of the proposed DP 10T SRAM is 172%, the access time from the clock trigger to the bit-cell data access is also affected. As shown in Figure 13, the read access time delay increases by 10% to 30% over the VDD range of 1.4 V to 0.6 V, and the write access time delay increases by 30% to 50% over the same range. A comparison of the overall performance of the DP 10T and DP 8T SRAMs is shown in Table I. Figure 14(a) shows a diagram of the bit-line leakage path of the DP 10T array. In the hold state or standby condition, the WL control MOS and the Y-access MOS are deactivated while the BL is pre-charged to VDD. Because the BL leakage current must pass through two disabled series-connected MOS to reach the "0" storage node, the sub-threshold leakage current is lower than in the conventional DP 8T cell array, which has only one WL control MOS on the leakage path. Figure 14(b) shows the BL leakage reduction of the proposed DP 10T cell relative to the conventional DP 8T cell array. At 0.6 V VDD, the BL leakage is reduced by over 40% at the FF corner. Furthermore, because the DP 10T array has only half as many bit-lines as the DP 8T array, the total BL leakage current is reduced by more than 50%.
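The two leakage effects compound, which a short calculation makes explicit. The Python below is a first-order sketch, assuming that total bit-line leakage scales as per-bit-line leakage times the number of bit-lines; that scaling model and the function name are assumptions for illustration, and the 40% figure is used as if it were an exact value rather than the lower bound reported at 0.6 V VDD and the FF corner.

```python
def total_bl_leakage_ratio(per_bl_reduction, bl_count_ratio):
    """Ratio of DP 10T to DP 8T total bit-line leakage.

    per_bl_reduction: fractional leakage reduction per bit-line (0.40
    for the "over 40%" figure). bl_count_ratio: DP 10T bit-line count
    divided by the DP 8T count (0.5 when the bit-lines are halved).
    """
    return (1.0 - per_bl_reduction) * bl_count_ratio


ratio = total_bl_leakage_ratio(0.40, 0.5)
total_reduction = 1.0 - ratio
print(f"total leakage ratio {ratio:.2f}, reduction {total_reduction:.0%}")
```

Under this simple model the total bit-line leakage falls to about 30% of the DP 8T value, which is consistent with the statement that the total is reduced by more than 50%. Halving the bit-line count alone accounts for 50%, and the stacked off-transistors add to it.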
We simulated A-port and B-port write/read operations executed in a common row. A-port activates column 7 and executes write/read operations for four consecutive cycles, while B-port activates column 0 and also executes write/read operations, as shown in Figure 15(a). The waveforms in Figure 15(b) indicate that the internal node of the DP 10T cell can be written and read as required.
CONCLUSION
This paper has presented novel design techniques that improve the write and read capability of a 1 Mb dual-port SRAM chip in TSMC 45 nm low-power CMOS technology. The proposed DP 10T cell avoids the write/read disturb issue that affects dummy-read cells during common row access. The issue is overcome by adding column-direction MOS that disable access from the bit-line to the internal node. Furthermore, the bit-line can be shared between adjacent columns by sharing a common WL control MOS in the DP 10T SRAM architecture. This scheme halves the number of bit-lines and lowers the write/read current consumption. A 1 Mb 10T dual-port SRAM chip has been designed and implemented.
