Abstract -This paper introduces a two-port BiCMOS static memory cell that combines ECL-level word-line voltage swings and emitter-follower bit-line coupling with a static CMOS latch for data storage. With this cell, referred to as a CMOS storage emitter access (CSEA) cell, it is possible to achieve access times comparable to those of high-speed bipolar SRAM's while preserving the high density and low power of CMOS memory arrays.
I. INTRODUCTION
THE TIME required to access static memory has a significant influence on the throughput achievable in high-performance data processing systems. In such systems, static memories with access times under 5 ns are required if the machine cycle times of emerging high-speed processing units are to be fully exploited. In addition to access time itself, simultaneous multiport access is becoming an increasingly important means of raising overall system speed. However, conventional multiport memory designs are typically characterized by a large cell area and relatively slow access. The highest speed static memories have generally been realized using advanced bipolar technology [1]-[4]. However, the large cell area and standby power dissipation have precluded the scaling of these circuits to increasingly higher levels of integration. The performance of CMOS static memories has been improved dramatically through the continued scaling of technology. However, access times have remained greater than 10 ns, and CMOS memory design continues, therefore, to focus on low-power, high-density applications [5]-[7]. Recently, BiCMOS technology has been used to significantly enhance the speed of CMOS static memory arrays [8], [9]. However, most BiCMOS SRAM's described to date combine conventional CMOS cells with the use of bipolar transistors in the sense amplifiers [10] and for driving large capacitive loads [8]. In these designs, the access time remains limited by factors such as the large voltage swing on the word lines, the limited cell output current, and the number of circuit stages used [11].
Manuscript received March 21, 1988; revised May 17, 1988. This work was supported in part by a fellowship from IBM and by DARPA under Contract N00014-87-K-0828.
The authors are with the Electrical Engineering Department, Center for Integrated Systems, Stanford University, Stanford, CA 94305.
IEEE Log Number 8822440.
This paper introduces a BiCMOS two-port static memory cell, and associated access circuitry, with which it is possible to achieve access times comparable to those of high-speed bipolar SRAM's while retaining the high density and low power of CMOS memory arrays. The cell, referred to as a CMOS storage emitter access (CSEA) cell, combines ECL-level word-line voltage swings and emitter-follower bit-line coupling with a static CMOS latch. Thus, ECL signal swings are maintained throughout the entire READ path. The cell's separate single-ended READ and WRITE ports allow it to be read and written simultaneously, making it especially attractive for multiport memory design. Compared with conventional multiport designs, the CSEA memory offers extremely high speed and small size. The CSEA cell is only one-third larger than a single-port, six-transistor CMOS static memory cell. To demonstrate the operation of the CSEA cell, a complete 4K x 1-bit two-port SRAM has been designed and integrated in a 1.5-µm BiCMOS technology. This memory operates from a single 5.2-V supply with ECL-compatible I/O. Under nominal operating conditions the measured READ access time for this prototype design is 3.8 ns and the power dissipation is 520 mW. The worst-case access time at a case temperature of 100°C is only 4 ns with a power dissipation of 620 mW. The access time is also nearly independent of variations in power supply voltage and data patterns.
The CSEA cell and the design constraints resulting from the single-ended READ and WRITE operations are considered in Section II. The detailed circuit design of the 4K x 1-bit two-port memory is then presented in Section III. The measured performance of the prototype memory is reported in Section IV, and conclusions are summarized in Section V.
The READ word line (RWL) serves as the positive supply for the internal latch of the CSEA cell, and the cell is read by raising this word line. In this paper, a ONE is defined to be stored in the cell when storage node 2 in Fig. 1 is high; that is, M2 and M3 are ON, while M1 and M4 are OFF.
When a ONE is stored in the cell, an increase in the potential of RWL is coupled directly to the base of the output emitter follower Q6, which forms a differential pair with sense-amplifier input transistor Q7. When a ZERO is stored in the cell, the base of Q6 is grounded through M4 and is unaffected by changes in RWL. Thus, by setting the bit-line reference level to the midrange of the RWL swing, the bit-line current will be switched between Q6 and Q7 depending on the data stored in the cell. A nominal word-line swing of only 550 mV is sufficient to reliably switch the bit-line current.
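As a rough illustration of why a 550-mV swing is ample, the sketch below evaluates the current-steering ratio of an ideal emitter-coupled pair when the reference sits at the midrange of the RWL swing, so that the base of Q6 is 275 mV above the reference during the read of a ONE. The ideal exponential transistor law, the two temperatures, and all function names are assumptions made for this example; the other cells sharing the bit line are ignored here.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def thermal_voltage(temp_kelvin):
    """Return kT/q in volts."""
    return BOLTZMANN_J_PER_K * temp_kelvin / ELECTRON_CHARGE_C


def steering_ratio(base_difference_v, temp_kelvin):
    """Collector-current ratio of an ideal emitter-coupled pair.

    Assumes matched transistors obeying I = Is * exp(Vbe / (kT/q)).
    """
    return math.exp(base_difference_v / thermal_voltage(temp_kelvin))


if __name__ == "__main__":
    rwl_swing_v = 0.550            # nominal swing quoted in the text
    margin_v = rwl_swing_v / 2.0   # reference assumed at midrange
    for temp_kelvin in (300.0, 373.15):  # illustrative temperatures
        ratio = steering_ratio(margin_v, temp_kelvin)
        print(f"T = {temp_kelvin:.2f} K, I(Q6)/I(Q7) = {ratio:.3g}")
```

Under these idealized assumptions the ratio stays in the thousands even at the higher temperature, which is consistent with the statement that the nominal swing switches the bit-line current reliably.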
The pass transistor through which the CSEA cell is written, M5, is controlled by the WRITE word line (WWL). Full CMOS logic levels are used on both WWL and the WRITE bit line (WBL). In order to write a ZERO into the cell (node 2 low), M4 must be turned on by pulling node 1 high through M5 from a high level on WBL. This is more difficult to accomplish than writing a ONE into the cell because M5 operates as a source follower and its threshold is increased by the body effect. Therefore, the logic thresholds of the cell inverters must be adjusted to provide equivalent margins for writing either a ONE or a ZERO. To provide the drive for a single-ended WRITE, M5 is considerably larger with respect to M3 and M4 than would be the case in a conventional balanced CMOS static cell. In writing a ZERO, M5 operates as a source follower with a correspondingly reduced drive capability. To balance the delays for writing either a ONE or a ZERO, the threshold of the M2, M4 inverter is set slightly lower than that of the M1, M3 inverter.
Since the READ and WRITE operations are separated in the CSEA memory, the delay associated with the WRITE recovery of the bit lines in a conventional CMOS six-transistor memory is avoided.
The CSEA cell can be selected simultaneously for both reading and writing because the READ and WRITE accesses are single-ended and use different storage nodes within the cell latch. Even when the WWL for a cell has been selected, the cell can be accurately read at the normal speed (as long as it is not actually being written) because the disturbance at the base of Q6 resulting from WWL going high is small. Thus, it is possible to simultaneously read one cell and write another even if they are in the same row.
B. Single-Ended READ
The minimum RWL swing needed to reliably distinguish the data stored in the CSEA cell can be determined by considering the two worst-case conditions depicted in Fig. 2. In Fig. 2(a) a ONE is stored in the cell being read, while all other cells in the same column contain a ZERO. Under these circumstances Q6 must switch the bit-line current away from Q7. To ensure a particular current sensing ratio $K = I_1/I_2$, where $I_1$ is the current flowing in Q6 and $I_2$ is the current in Q7, the difference between the high level of RWL, $V_{RWL(H)}$, and the bit-line reference, $V_{REF}$, must satisfy the relationship

$$V_{RWL(H)} - V_{REF} \geq \frac{kT}{q} \ln K \tag{1}$$

where k is the Boltzmann constant, T is temperature, and q is the electron charge.
As illustrated in Fig. 2(b), the worst case for reading a ZERO occurs when all unselected cells in the column contain a ONE. In this situation the bit-line reference is not compared with the ZERO level stored in the internal latch; instead, the bit-line current is switched between the reference transistor Q7 and the emitter-follower output transistors of the unselected cells, all of which have their bases pulled up to the low level of RWL. For this case the current sensing ratio is defined as $K = I_2/I_1$, where $I_2$ is the current in Q7 and $I_1$ is the total current flowing in the unselected cells of the column. To achieve a specified K, the required difference between the low level of RWL, $V_{RWL(L)}$, and the bit-line reference is

$$V_{REF} - V_{RWL(L)} \geq \frac{kT}{q} \ln\left[(N-1)K\right] \tag{2}$$

where N is the number of cells in a column. It follows from (1) and (2) that the minimum swing on RWL needed to maintain a given current sensing ratio is

$$V_{RWL(H)} - V_{RWL(L)} \geq \frac{kT}{q} \ln\left[(N-1)K^{2}\right] \tag{3}$$
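The swing requirement can be checked numerically with a short sketch. It assumes ideal exponential emitter-coupled-pair behavior, a high-level margin of (kT/q) ln K for reading a ONE, and a low-level margin of (kT/q) ln[(N-1)K] for reading a ZERO against N-1 unselected cells; the function names and the choice of 300 K are assumptions made for this example rather than values taken from the measurements.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def min_swing_in_thermal_voltages(cells_per_column, sensing_ratio):
    """Minimum RWL swing expressed as a multiple of kT/q.

    Sum of the assumed margins: ln K above the reference for a ONE,
    and ln[(N - 1) K] below the reference for a ZERO.
    """
    high_margin = math.log(sensing_ratio)
    low_margin = math.log((cells_per_column - 1) * sensing_ratio)
    return high_margin + low_margin


def min_swing_volts(cells_per_column, sensing_ratio, temp_kelvin):
    """Minimum RWL swing in volts at the given temperature."""
    thermal_voltage = BOLTZMANN_J_PER_K * temp_kelvin / ELECTRON_CHARGE_C
    return thermal_voltage * min_swing_in_thermal_voltages(
        cells_per_column, sensing_ratio
    )


if __name__ == "__main__":
    multiple = min_swing_in_thermal_voltages(64, 100)
    volts = min_swing_volts(64, 100, 300.0)
    print(f"minimum swing = {multiple:.2f} kT/q = {volts * 1e3:.0f} mV at 300 K")
```

With these assumptions the bound for a 64-cell column and K = 100 comes out somewhat above 13 kT/q, comfortably below the 550-mV nominal swing used in the design.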
For N = 64 and K = 100, the minimum swing is 13.

To change the state of the latch, the drive through the pass transistor M5 must be strong enough to overcome either the pull-up of M1 or the pull-down of M3 so that the voltage at the internal node 1 crosses the logic threshold of the M2, M4 inverter. However, for cells on the same row with WWL high, the unselected WBL's must be biased at a level such that turning on M5 will not change the state of the M2,

under which WRITE disturbances are evaluated. The disturbance of a cell is more severe when RWL is low because of the reduced margins and lower loop gain of the latch.

Since WWL and WBL are driven at full CMOS levels (GND and VEE), the design parameters governing WRITE and WRITE disturbance that remain to be determined are the level at which WBL is biased for the unselected cells,

In order to change the data stored in a cell from a ZERO (node 2 low) to a ONE, node 1 must be brought below the M2, M4 threshold. Thus, the WBL voltage must be less than that corresponding to the intersection of curves A and E in Fig. 5; that is, $V_{WBL}$ must lie in region I as identified in the figure. Similarly, to write a ZERO into the cell $V_{WBL}$ must be in region V, above the intersection of curves B and E. Disturbance of an unselected cell severe enough to change its data is avoided by biasing WBL to a voltage in region III, above the intersection of curves C

Transfer curves for single-ended WRITE and WRITE disturbances.
D. Cell Layout
The size of the CSEA cell is governed largely by the required six interconnections: GND, VEE, RWL, RBL, WWL, and WBL. Shown in Fig. 6 is a layout of the cell using 1.

The block diagram of a 4K two-port CSEA memory is shown in Fig. 7. The memory is organized as a 64-row x 64-column array. Separate READ and WRITE address inputs provide direct access to the READ and WRITE ports of the cell array. In addition, separate data input and output paths are used to improve system performance by eliminating the need for multiplexing the data bus, thus reducing the I/O delay in the critical path.
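The organization just described can be pictured with a small behavioral model: a 64 x 64 bit array with one READ port and one WRITE port, each with its own address input and its own data path. This is only a functional sketch. The class name, the method names, and the mapping of the upper six address bits to the row and the lower six to the column are assumptions made for illustration; the model carries no timing or circuit detail.

```python
class TwoPortSram:
    """Functional model of a 4K x 1-bit memory with separate ports."""

    ROWS = 64
    COLUMNS = 64

    def __init__(self):
        # Every cell starts at ZERO in this model (an arbitrary choice).
        self._cells = [[0] * self.COLUMNS for _ in range(self.ROWS)]

    @classmethod
    def decode(cls, address):
        """Split a 12-bit address into (row, column); mapping is assumed."""
        if not 0 <= address < cls.ROWS * cls.COLUMNS:
            raise ValueError("address out of range")
        return divmod(address, cls.COLUMNS)

    def write(self, write_address, data_in):
        """WRITE port: store one bit supplied on the data input."""
        if data_in not in (0, 1):
            raise ValueError("data must be 0 or 1")
        row, column = self.decode(write_address)
        self._cells[row][column] = data_in

    def read(self, read_address):
        """READ port: return one bit on the separate data output."""
        row, column = self.decode(read_address)
        return self._cells[row][column]


if __name__ == "__main__":
    memory = TwoPortSram()
    memory.write(write_address=130, data_in=1)  # row 2, column 2
    # A different cell in the same row is read through the other port.
    print(memory.read(read_address=131), memory.read(read_address=130))
```

Because the two ports take independent addresses and never share a data bus, the model needs no arbitration logic between a read and a write to different cells.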
IV. EXPERIMENTAL RESULTS
The prototype 4K x 1-bit two-port memory design was integrated in a 1.5-µm, 5-GHz, double-metal, double-poly BiCMOS technology. A die photo of this experimental CSEA memory is shown in Fig. 15; the die size is 2.5 x 3.5 mm². In laying out the circuit, particular attention was given to isolating the low-voltage-swing current-switching bipolar circuits from power supply noise generated in the CMOS circuits. Separate GND pads, which serve as the positive supply voltages, are used for the bipolar and CMOS circuits, along with three VEE pads. Additional pads for monitoring and, if desired, altering the primary reference levels are included along with internal test pads to aid in the characterization of the memory.

The performance of the experimental memory is summarized in Table I. Shown in Fig. 16 are oscillographs of the measured READ row access time for both worst- and best-case data patterns at room temperature with a supply voltage of -5.2 V. Under these conditions the power dissipation is 520 mW. The worst-case data pattern is that where all the cells in the array other than the one being read store a ONE. In this case, the parasitic capacitances on the READ word lines and bit lines are maximized. The best-case pattern is that where all cells in the array except the one being read store a ZERO. From Fig. 16 it is apparent that no significant data dependency is observed for the READ row access time; for both conditions, the access time is less than 3.8 ns. Fig. 17 shows the READ column access time at room temperature with the nominal supply voltage. Again, no

with a supply voltage of -4.5 V, and no changes in the access times were observed. The insensitivity of the access times to temperature is a consequence of using temperature-compensated voltage swings and low-temperature-coefficient polysilicon resistors in the READ path.

The independent READ and WRITE capability of the memory is demonstrated in the oscillograph of Fig. 19. In this measurement, one cell is being read while another is simultaneously being written. The cell being read is also chosen so that it is in a WRITE disturbance condition; that is, its WWL has been taken high. Similarly, the cell being written is in a worst-case condition because its RWL is high. This experiment also confirms that the WRITE disturbance does not degrade the READ access time. Because the READ and WRITE ports are separate and independent, there is no arbitration problem between READ and WRITE addresses. When the two ports simultaneously read and write the same cell, the relationship between the READ and WRITE data is purely one of propagation delay; i.e., the READ port reflects the newly written data after an elapsed time approximately equal to the sum of the WRITE ENABLE pulse width and the READ access time.
The measured minimum WRITE ENABLE pulse width is 4 ns. Setup and hold times of the WRITE address signals with respect to the WRITE ENABLE pulse are less than 1 ns. The propagation delays of the ECL/CMOS translators in the WRITE address and WRITE ENABLE paths balance each other during the WRITE operation. Thus, the minimum WRITE ENABLE pulse width is determined by the delay in the dynamic decoder and the time needed to write data into the cell. The minimum WRITE cycle time is the sum of the minimum WRITE ENABLE pulse width and the address setup and hold times. This suggests that the prototype CSEA memory can be cycled at rates above 150 MHz.
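The cycle-rate estimate can be reproduced with simple arithmetic. The sketch below takes the 4-ns minimum WRITE ENABLE pulse width and, as an assumption, uses 1 ns as an upper bound for each of the setup and hold times (the text states only that they are less than 1 ns). It also evaluates the read-after-write delay described earlier, using the 3.8-ns READ access time as an illustrative value. The function names are hypothetical.

```python
def min_write_cycle_ns(pulse_width_ns, setup_ns, hold_ns):
    """Minimum WRITE cycle: pulse width plus address setup and hold."""
    return pulse_width_ns + setup_ns + hold_ns


def cycle_rate_mhz(cycle_time_ns):
    """Convert a cycle time in nanoseconds to a rate in megahertz."""
    return 1000.0 / cycle_time_ns


def read_after_write_ns(pulse_width_ns, read_access_ns):
    """Delay before the READ port reflects newly written data."""
    return pulse_width_ns + read_access_ns


if __name__ == "__main__":
    # Setup and hold are assumed equal to their 1-ns upper bounds.
    cycle_ns = min_write_cycle_ns(4.0, 1.0, 1.0)
    print(f"cycle bound: {cycle_ns:.1f} ns")
    print(f"rate bound: {cycle_rate_mhz(cycle_ns):.1f} MHz")
    print(f"read-after-write: {read_after_write_ns(4.0, 3.8):.1f} ns")
```

Since the actual setup and hold times are below the assumed 1 ns, the real cycle time is shorter than this bound, which agrees with the statement that the memory can be cycled above 150 MHz.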
Since most of the power in the CSEA memory is dissipated in the peripheral circuits, a 16K two-port SRAM implemented using the same circuit techniques and technology is projected to have a typical READ access time of 4.5 ns with a moderate increase in power dissipation to 750 mW.
V. SUMMARY
This paper has described a new BiCMOS memory cell that combines the density and low power dissipation of CMOS cells with the access times achievable in bipolar SRAM's and independent READ and WRITE access. Key features of the CSEA cell are the use of small, ECL-like, word-line voltage swings and emitter-follower coupling to the bit line in a design wherein a static CMOS latch is used for data storage. The CSEA cell also employs single-ended READ and WRITE access to the cell, thus providing a multiport capability with little increase in cell size. The CSEA memory cell has been incorporated into the design of an experimental 4K x 1-bit two-port SRAM. This prototype memory was integrated in a 1.5-µm 5-GHz BiCMOS technology, and a READ access time of 4 ns with a power dissipation of 520 mW was achieved. The experimental results confirm that in a BiCMOS technology it is possible to achieve access times comparable to those of bipolar SRAM's in a multiport static memory design that retains CMOS latches for data storage. The design approach adopted in the CSEA memory thus provides significantly enhanced flexibility at the system architecture level without compromising speed, power, or integration density.
ACKNOWLEDGMENT
The authors wish to thank Integrated Device Technology, Inc. for fabricating the circuits. They are especially grateful to F. C. Hsu and C. C. Wu of IDT for numerous helpful discussions. They are also indebted to J. B. Kuo and R. W. Dutton of Stanford, who provided the initial inspiration for this project.
