1-V ultra low-power SRAM circuit techniques are described for word-bit configurable memory macrocells. A shared-bitline SRAM cell architecture with modified address assignment is proposed to reduce wasted memory-cell current to zero while suppressing the area penalty. For the new SRAM cell design, we devise a multiplexer-merged charge-transfer amplifier for high-sensitivity read operation and a bitline precharge scheme with an equalizing line for high-speed write-recovery operation. A 1-V operating 64-kb (2kw x 16b x 2) test chip was designed using a 0.35-µm multithreshold-voltage CMOS (MTCMOS) logic process.
INTRODUCTION
High-performance low-power LSIs are in high demand because they enable the efficient production of smaller, lighter portable information equipment. Ultra low-power systems with mW-order power dissipation will be necessary for various coming services. Low-voltage system LSIs including wide-band memory macrocells, which provide high performance and low power, are very suitable for providing such services economically. To develop these system LSIs efficiently with a short turn-around time (TAT), SRAM macrocells that are configurable for any word-bit organization are needed. Therefore, low-voltage configurable SRAM macrocells with ultra low-power dissipation are strongly desired.
Low-voltage techniques and partial memory-cell activation methods, such as a divided-wordline scheme and selection by a column-decoded signal, have been reported to greatly reduce the power dissipation of SRAM [1], [2], [3]. However, they need a fully customized design and layout for each specification, which increases design TAT and chip cost. To shorten the design TAT, a word-bit configurable memory macrocell that employs the methodology of abutting nine kinds of leaf cells has been reported [4]. However, memory-cell current is wasted because a bitline multiplexer is necessary for word-bit configurable macrocells. This paper describes low-power SRAM circuit techniques for word-bit configurable macrocells. In the next section, conventional low-power SRAM techniques are described in more detail, and the problems to be solved are explained. After that, a shared-bitline SRAM cell architecture with modified address assignment is proposed to suppress the area penalty with no wasted memory-cell current. Moreover, we present a multiplexer (MUX) merged charge-transfer amplifier for high-sensitivity read operation and a novel bitline precharge scheme for high-speed write-recovery operation. Finally, a 64-kb test chip designed using a 0.35-µm multithreshold-voltage CMOS (MTCMOS) process is described and simulation results are presented.
LOW POWER SRAM TECHNIQUES
Conventional low-power SRAM techniques are summarized in Table 1. The methods in references [1] and [2] lower the supply voltage and suppress the voltage swing to reduce the power dissipation. In these schemes, low-Vth MOSFETs and a boosted or negative voltage are used to prevent a speed penalty at a low supply voltage. These techniques are very useful for low-power SRAM macrocells, but they need a fully customized design for each specification. The single-bitline cross-point architecture [3] is a low-power technique for large-scale external SRAMs. It is not suitable for low-voltage SRAM macrocells because the speed penalty is large due to the single-bitline scheme. Moreover, this approach also needs a fully customized design, which increases design TAT and chip cost.
A 1-V operating word-bit configurable SRAM macrocell has been reported [4]. It uses an MTCMOS technique to reduce the power dissipation, and an SRAM macrocell of any word-bit size is generated by abutting nine kinds of leaf cells. This enables us to generate low-power SRAM macrocells that differ in word count and/or data bits in a short design TAT. However, memory-cell current is wasted because a bitline multiplexer (MUX) is necessary for word-bit configurable memory macrocells in a practical layout design.
The method that uses a column-decoded signal [3] is effective in reducing the wasted memory-cell current for configurable SRAM macrocells. However, there are some problems to be solved. First, the number of transistors in an SRAM cell increases from 6 to 8 when the differential scheme is used. This area penalty is a serious issue. Moreover, the Y select line (YSEL), which is controlled by a column-decoded signal, has a large parasitic capacitance, so the power dissipation increases because of the charging and discharging of the YSEL. Finally, the memory-cell current is reduced because two access transistors are connected serially in an SRAM cell. This increases access time. A boosted wordline scheme, which increases the memory-cell current, should not be used because the boosted voltage increases the complexity of the whole LSI design. So, another approach is needed to maintain speed performance.
SHARED-BITLINE SRAM CELL ARCHITECTURE
We propose the novel SRAM cell architecture shown in Fig. 1, which eliminates wasted memory-cell current. When Cell A is selected, Q3 and Q4 are cut off and the datum stored in Cell A appears on BL0 and /BL1. Data do not conflict on a bitline shared between two neighboring cells because the two cells are never selected at the same time. The MUX needs no customized design for shared bitlines (see Fig. 2). As explained above, this architecture reduces the number of transistors in a memory cell to the equivalent of 7 and, as a result, suppresses the area penalty with no wasted memory-cell current. A YSEL controls Q3 and Q4, which are not connected to bitlines directly, to avoid adding a large parasitic capacitance to a bitline. Moreover, the parasitic capacitance of wordlines is reduced because the number of transistors connected to a wordline is cut by almost half. This enhances wordline selection speed and reduces the power due to charge and discharge.
The parasitic capacitance of the YSEL is larger than that of a bitline because two transistors, for instance Q3 and Q4, are connected to the YSEL in each memory cell. The power dissipation due to charging and discharging this parasitic capacitance should be suppressed. Since the YSEL is controlled by a column address, we modify the assignment of the column address. That is, the column address is assigned to more significant bits in a memory address. Because a more significant bit generally changes less often than a less significant bit, this modified address assignment makes it possible to lower the effective operating frequency of the YSEL and reduce the power due to selecting the YSEL. Therefore, the new SRAM cell architecture reduces wasted memory-cell current to zero while suppressing the area penalty.
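The effect of the modified address assignment can be illustrated with a minimal sketch. It assumes a purely sequential access pattern over one 2k-word bank and a 3-bit column-address field, which are illustrative assumptions; the function name `field_toggle_rate` is hypothetical. The sketch compares how often the column field (and hence the YSEL) changes when it occupies the three least significant bits versus bits A<6> to A<8>.

```python
def field_toggle_rate(low_bit, width, n_accesses):
    """Fraction of sequential accesses on which an address field changes.

    low_bit:    position of the field's least significant bit
    width:      number of bits in the field (3 selects one of 8 columns)
    n_accesses: number of consecutive addresses visited, starting at 0
    """
    mask = ((1 << width) - 1) << low_bit
    toggles = 0
    for addr in range(1, n_accesses):
        # Bits that differ between consecutive addresses, limited to the field.
        if (addr ^ (addr - 1)) & mask:
            toggles += 1
    return toggles / (n_accesses - 1)


N_WORDS = 2048  # assumed: one 2k-word bank, accessed sequentially

# Column address on the least significant bits: YSEL changes on every access.
rate_low = field_toggle_rate(low_bit=0, width=3, n_accesses=N_WORDS)

# Column address on A<6>..A<8>: YSEL changes once every 64 accesses.
rate_high = field_toggle_rate(low_bit=6, width=3, n_accesses=N_WORDS)

print(f"column field on A<0..2>: toggles on {rate_low:.1%} of accesses")
print(f"column field on A<6..8>: toggles on {rate_high:.1%} of accesses")
```

Under this sequential-access assumption, the YSEL switching activity falls by roughly a factor of 64. Random access patterns would show a much smaller benefit, so the gain depends on the address locality of the application.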
PERIPHERAL CIRCUIT TECHNIQUES FOR SHARED-BITLINE SRAM CELL ARCHITECTURE

High-sensitivity reading scheme
In the proposed memory cell architecture, memory-cell current is reduced because two access transistors, for instance Q1 and Q3 or Q2 and Q4 (Fig. 1), are connected serially. This reduces the bitline signal and increases access time. The MUX-merged charge-transfer sense amplifier (Fig. 2) is devised to prevent access time from increasing. A selected transistor in the MUX acts as a charge-transfer amplifier and amplifies the reduced bitline signal. After that, the second sense amplifier amplifies the output signal of the MUX-merged sense amplifier to a CMOS-level signal. This reading circuit scheme enhances the sensitivity of the read operation and suppresses the increase of access time due to the reduced memory-cell current. Figure 3 shows the highly sensitive reading scheme using the charge-transfer amplifier. Q5 in Fig. 3(a) is a selected transistor in the MUX and acts as a charge-transfer amplifier. When Q6 precharges a dataline to VDD, a bitline is precharged to VDD-Vth and then Q5 is cut off.
Fig. 3 High-sensitivity reading scheme
In conventional SRAM, bitlines are precharged to VDD to enlarge the margin against unintended writing during read access. The shared-bitline memory cell has a larger static noise margin than the standard 6-transistor cell because of the low conductivity of the series-connected access transistors. Therefore, the bitline precharge level can be lowered to VDD-Vth, and the MUX-merged charge-transfer amplifier becomes available for the reading circuit. This scheme suppresses bitline swing in the write operation, so the power dissipation of a bitline due to charge and discharge is also reduced in write access.
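The benefit of the charge-transfer read described above can be sketched with an idealized charge-conservation model. The model assumes that all charge drawn by the cell from the bitline is resupplied through the MUX transistor from the much smaller dataline capacitance, and that the dataline swing is limited to about Vth (the initial difference between the dataline at VDD and the bitline at VDD-Vth). All capacitances, the cell current, and the sensing time below are illustrative assumptions, not values from the design.

```python
def bitline_swing(i_cell, t_sense, c_bitline):
    """Signal developed directly on the bitline when no charge transfer occurs."""
    return i_cell * t_sense / c_bitline


def dataline_swing(i_cell, t_sense, c_dataline, vth):
    """Ideal charge-transfer signal on the dataline.

    The charge removed by the cell is replaced from the dataline, so the
    dataline falls by Q / C_dataline, up to a ceiling of about Vth.
    """
    return min(i_cell * t_sense / c_dataline, vth)


# Illustrative assumptions only.
I_CELL = 5e-6      # cell read current [A]
T_SENSE = 1e-9     # signal development time [s]
C_BL = 200e-15     # bitline capacitance [F]
C_DL = 20e-15      # dataline capacitance [F]
VTH = 0.3          # threshold of the MUX transistor [V]

dv_bl = bitline_swing(I_CELL, T_SENSE, C_BL)
dv_dl = dataline_swing(I_CELL, T_SENSE, C_DL, VTH)

print(f"bitline signal without charge transfer: {dv_bl * 1e3:.1f} mV")
print(f"dataline signal with charge transfer:   {dv_dl * 1e3:.1f} mV")
print(f"ideal voltage gain (C_BL / C_DL):       {dv_dl / dv_bl:.1f}")
```

In this idealized picture the gain equals the ratio of bitline to dataline capacitance until the Vth ceiling is reached. A real circuit would fall short of this because of subthreshold conduction and the finite transconductance of the transfer transistor.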
High-speed write-recovery scheme
The write operation is accomplished by lowering the bitline level to GND. Reading data sequentially after write access requires a high-speed write-recovery operation, which precharges the bitline level to VDD-Vth in our circuit. Accordingly, an NMOSFET that has large conductivity should be added to a bitline to precharge it quickly in a practical design. However, when a bitline is precharged by the NMOSFET, the write-recovery time increases because the effective gate-source voltage decreases, which lowers the conductivity, as the bitline level approaches VDD-Vth (see Fig. 3(b)).
The high-speed write-recovery scheme is shown in Fig. 4. In this scheme, every bitline is connected to an equalizing line through a PMOSFET. When /BL1 is being recovered from GND to VDD-Vth, current flows from the precharge transistors of the other bitlines through the equalizing line. In a conventional memory cell architecture, precharge transistors work for each bitline pair because all memory cells selected by a wordline discharge their respective bitline pairs. On the other hand, the proposed architecture enables all precharge transistors to work for only one bitline because no current flows except in the bitline selected by the YSEL. Therefore, the conductivity of the precharge NMOSFET is effectively multiplied by the number of bitlines. Consequently, high-speed write-recovery operation is achieved. This scheme is also effective for enhancing precharge speed after read operation.
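The speed-up from sharing precharge transistors through the equalizing line can be estimated with a simple square-law sketch. It assumes that each precharge NMOSFET operates in saturation with current (k/2)(VDD - Vth - V)^2, that N such transistors charge a single bitline in parallel, and that the PMOSFET switch resistance and the equalizing-line capacitance are negligible. The device parameter k, the bitline capacitance, and the settling margin are illustrative assumptions. Integrating C dV/dt = N (k/2) u^2 with u = VDD - Vth - V gives the closed form used below.

```python
def recovery_time(n_precharge, c_bitline, k, vdd, vth, delta):
    """Time for a bitline to rise from GND to within `delta` of VDD - Vth.

    Solves C * dV/dt = n * (k / 2) * u**2 with u = VDD - Vth - V, which
    integrates to 1/u(t) = 1/u0 + n * k * t / (2 * C).
    """
    u0 = vdd - vth  # overdrive when the bitline starts at GND
    return (2.0 * c_bitline / (n_precharge * k)) * (1.0 / delta - 1.0 / u0)


# Illustrative assumptions only.
C_BL = 200e-15   # bitline capacitance [F]
K = 200e-6       # NMOSFET transconductance parameter [A/V^2]
VDD = 1.0        # supply voltage [V]
VTH = 0.3        # low-Vth NMOSFET threshold [V]
DELTA = 0.05     # how close to VDD - Vth counts as recovered [V]

t_single = recovery_time(1, C_BL, K, VDD, VTH, DELTA)
t_shared = recovery_time(8, C_BL, K, VDD, VTH, DELTA)

print(f"one precharge transistor:   {t_single * 1e9:.1f} ns")
print(f"eight sharing via equalizer: {t_shared * 1e9:.1f} ns")
```

The 1/delta term shows why a single NMOSFET recovers slowly near VDD-Vth, and the 1/N factor shows how sharing shortens the recovery in proportion to the number of contributing transistors under these assumptions.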
DESIGN RESULTS
To confirm the proposed architecture, a 1-V operating synchronous 64-kb (2kw x 16b x 2) SRAM macrocell, which has a word-bit configurable design, was designed with a 0.35-µm triple-metal MTCMOS logic process. Threshold voltages (Vth) of the NMOSFETs are 0.3 and 0.57 V, and those of the PMOSFETs are -0.3 and -0.65 V. Figure 5 shows the read/write circuit configuration for one-bit data width using the proposed techniques. Memory cells, the write buffer and input/output registers are composed of high-Vth MOSFETs to reduce subthreshold leakage current in the standby mode. The number of multiplexed bitlines is 8. A level-hold circuit is connected to the equalizing line to prevent the voltage of the equalizing line from floating during read or write operation. A dataline pair has an active load of PMOSFETs to compensate for the voltage of the higher bitline in the write operation. Figure 6 shows the layout image of a memory cell. The size is 13.8 µm x 4.4 µm. Increasing the memory-cell dimension along the bitline direction should be avoided because this increases the parasitic capacitance of the bitline, and causes an increase of access time and power dissipation during write operation. This layout achieves the sharing of a bitline and an access transistor between neighboring cells with no increase along the bitline direction. The cell size is determined by the transistor design rule, so there is no increase due to the YSEL. The area penalty is 15% compared to the standard 6-transistor cell. This value is half of the area increase of an unshared 8-transistor cell. The test chip layout plot is shown in Fig. 7. The size of the macrocell is 2 mm x 2.9 mm, and the area penalty is 13%.
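As a rough consistency check on the reported dimensions, the following sketch derives the total cell-array area from the cell size and bit count and compares it with the macrocell area. The resulting array fraction and the inferred 6-transistor and unshared 8-transistor cell areas are derived here from the stated figures and are not reported in the text; 2kw is taken as 2048 words.

```python
# Figures stated in the text.
WORDS = 2048             # 2kw, taken as 2048 words
BITS_PER_WORD = 16
BANKS = 2
CELL_W_UM = 13.8         # cell width [um]
CELL_H_UM = 4.4          # cell height [um]
MACRO_W_MM = 2.0         # macrocell width [mm]
MACRO_H_MM = 2.9         # macrocell height [mm]
CELL_PENALTY = 0.15      # area penalty versus the standard 6-transistor cell

total_bits = WORDS * BITS_PER_WORD * BANKS
cell_area_um2 = CELL_W_UM * CELL_H_UM
array_area_mm2 = total_bits * cell_area_um2 * 1e-6   # 1 mm^2 = 1e6 um^2
macro_area_mm2 = MACRO_W_MM * MACRO_H_MM
array_fraction = array_area_mm2 / macro_area_mm2

# Derived, not reported: cell areas implied by the stated 15% penalty and by
# the statement that this is half the increase of an unshared 8-transistor cell.
cell_6t_um2 = cell_area_um2 / (1.0 + CELL_PENALTY)
cell_8t_um2 = cell_6t_um2 * (1.0 + 2.0 * CELL_PENALTY)

print(f"total bits:            {total_bits}")
print(f"cell area:             {cell_area_um2:.2f} um^2")
print(f"cell array area:       {array_area_mm2:.2f} mm^2")
print(f"macrocell area:        {macro_area_mm2:.2f} mm^2")
print(f"array share of macro:  {array_fraction:.0%}")
print(f"implied 6T cell area:  {cell_6t_um2:.1f} um^2")
print(f"implied 8T cell area:  {cell_8t_um2:.1f} um^2")
```

The cell array accounts for roughly two thirds of the macrocell area under these figures, with the remainder available for decoders, the MUX-merged amplifiers, and other peripheral circuits.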
For the test chip, the simulated maximum operating frequency at 1-V supply is 33 MHz. This is adequate performance for ultra low-power applications. Figure 8 shows the simulated power dissipation at 10-MHz operation in read access, in comparison with the conventional 1-V word-bit configurable SRAM macrocell. A conventional checkerboard test pattern was used. The power consumed in the memory cell array is reduced from 1728 µW to 128 µW due to the shared-bitline cell architecture. This is because the number of activated memory cells is reduced to one-eighth for the same number of output-data bits, and the cell current is reduced by 40% due to the two serially connected access transistors. The increase of power dissipation due to selecting the YSEL is suppressed because the column address is assigned to A<6>, A<7> and A<8>. Consequently, the total power dissipation is reduced to 486 µW, which is 1/4 that of the conventional 1-V word-bit configurable SRAM macrocell.
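The two reduction factors quoted above can be checked against the reported array power with a short calculation. The sketch multiplies the conventional array power by the one-eighth activation ratio and by the 40% cell-current reduction, then compares the estimate with the reported 128 µW. The energy per cycle and the conventional total are derived here from the stated figures and are approximate, since the text gives the ratio only as 1/4.

```python
# Figures stated in the text.
P_ARRAY_CONV_UW = 1728.0    # conventional cell-array power [uW]
P_ARRAY_REPORTED_UW = 128.0 # reported shared-bitline cell-array power [uW]
P_TOTAL_UW = 486.0          # reported total power [uW]
F_OP_HZ = 10e6              # operating frequency for the power figures [Hz]
ACTIVE_CELL_RATIO = 1 / 8   # activated cells relative to the conventional macro
CURRENT_REDUCTION = 0.40    # cell-current reduction from series access transistors

# Estimate: fewer active cells, each drawing less current.
p_array_est_uw = P_ARRAY_CONV_UW * ACTIVE_CELL_RATIO * (1.0 - CURRENT_REDUCTION)
rel_error = abs(p_array_est_uw - P_ARRAY_REPORTED_UW) / P_ARRAY_REPORTED_UW

# Derived, not reported: energy per cycle and the conventional total implied
# by the stated "1/4" ratio.
energy_per_cycle_pj = P_TOTAL_UW * 1e-6 / F_OP_HZ * 1e12
p_total_conv_est_uw = P_TOTAL_UW * 4.0

print(f"estimated array power: {p_array_est_uw:.1f} uW "
      f"(reported {P_ARRAY_REPORTED_UW:.0f} uW, {rel_error:.1%} apart)")
print(f"energy per cycle:      {energy_per_cycle_pj:.1f} pJ")
print(f"implied conventional total: about {p_total_conv_est_uw:.0f} uW")
```

The estimate lands within about 1% of the reported array power, so the two stated factors account for essentially all of the reduction in the cell array.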
SUMMARY
1-V ultra low-power SRAM circuit techniques have been described for word-bit configurable macrocells. A shared-bitline SRAM cell architecture with modified address assignment reduces wasted memory-cell current to zero while suppressing the area penalty. For the new SRAM cell design, we devised a MUX-merged charge-transfer amplifier, which enables high-sensitivity read operation with a simple circuit configuration, and a bitline precharge scheme with an equalizing line, which accomplishes high-speed write-recovery operation for NMOSFET precharge. A 1-V operating 64-kb (2kw x 16b x 2) test chip was designed using a 0.35-µm MTCMOS logic process. The simulated power dissipation is 1/4 (486 µW at 10-MHz operation) that of the conventional 1-V word-bit configurable SRAM macrocell with a 13% area increase. The proposed architecture enables us to generate, in a short TAT, SRAM macrocells of any word-bit size that have ultra low-power dissipation close to that of fully customized SRAM macrocells.
