Abstract - This paper describes a 64K CMOS RAM with an access time of 15 ns. The RAM was built using a technology with self-aligned TiSi2, single-level metal, an average minimum feature size of 1.35 µm, and a minimum effective channel length of 1.1 µm. An access time of 10 ns is possible with the word line stitched on a second level of metal and some minor redesign. High speed is achieved through innovative circuits and design concepts. New CMOS circuits include a sense-amp set signal generator, a row decoder, and an input circuit. These circuits use CMOS devices to advantage for high-speed, safe operation. A layout-rule-independent graphics tool was used for the artwork design.
, we presented a 20-ns 64K NMOS design [5]. Also included on the plot is a 0.78X scaling of that design presented at the 1985 International Symposium on VLSI Technology, Systems, and Applications, which gave access times as fast as 11 ns [10]. In this paper we will describe a 64K CMOS RAM with measured access times of under 15 ns and simulated access times of 10 ns with the addition of a second level of metal and some minor redesign. The characteristics of the 64K CMOS RAM are given in Table I. The high speed of this CMOS RAM is due to a combination of technology and innovative CMOS peripheral circuitry. After a brief description of the technology, three of the key circuits will be described: the sense-amplifier set generator, the row decoder, and the input circuit.
In each case, the advantageous use of CMOS devices for high speed while maintaining low-power safe
Manuscript received May 5, 1986; revised May 20, 1986.
1) The chip can be operated at minimum cycle time since inputs may be changed during an access.
2) The chip is insensitive to glitches on the inputs once the short sampling period at the beginning of a cycle ends and the inputs are disconnected from the internal chip circuitry.
3) Data outputs are always latched in a valid state or are in a high-impedance state, except when they are in transition.
4) Precharging of internal nodes is automatically initiated at the end of an access.
5) The chip has the same cycle time for any combination of READ and WRITE operations, even if data-in and data-out pins are shared.
IV. KEY CIRCUITS
The development of new CMOS peripheral circuitry was key to the high-speed access which was a major objective of the 64K CMOS RAM design. Fig. 4 shows a simplified block diagram of the access path, with the delay through each block indicated.
In most of the access path, data simply ripple from block to block, with one block activating the next one. Care was taken to achieve a uniform distribution of delay throughout the critical path. Three of the more important new circuits developed for this design will be described: the self-timed array and sense-amplifier circuitry; the row decoder, which uses an innovative two-stage NOR and NAND decoder; and the address buffer.
During a READ or WRITE operation, a row and a column decoder will be selected. The selected row decoder will cause its associated word line to go high and the selected column decoder will turn on the gates of the n and p complementary parallel bit-switch devices. The use of dual bit switches is necessary to avoid threshold drops in propagating the cell signal from the bit lines to the I/O lines during a READ or in propagating the signal in the reverse direction during a WRITE. Since the bit lines and I/O lines are high at the start of a READ cycle, the p-channel device forms the best path for conducting the signal. When an I/O line is set to a low level during a WRITE, the n-channel bit switch provides the best path for discharging the bit line to a good low level.
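The threshold-drop argument for the dual bit switches can be illustrated with an idealized static model. The sketch below is not taken from the paper: the supply voltage and the threshold magnitudes are hypothetical round numbers, and body effect and all dynamic behavior are ignored.

```python
# Idealized DC levels passed by n-channel, p-channel, and parallel bit switches.
# VDD, VTN, and VTP_ABS are hypothetical illustration values, not from the paper.
VDD = 5.0      # assumed supply voltage, volts
VTN = 1.0      # assumed n-channel threshold, volts
VTP_ABS = 1.0  # assumed p-channel threshold magnitude, volts


def nmos_pass(v_in, vdd=VDD, vtn=VTN):
    """Level delivered by an n-channel switch with its gate at vdd.

    It passes lows fully but clamps highs one threshold below the gate.
    """
    return min(v_in, vdd - vtn)


def pmos_pass(v_in, vtp_abs=VTP_ABS):
    """Level delivered by a p-channel switch with its gate at ground.

    It passes highs fully but cannot pull below one threshold above ground.
    """
    return max(v_in, vtp_abs)


def dual_switch(v_in):
    """Parallel n/p switch: the device that conducts without a drop wins."""
    n_out = nmos_pass(v_in)
    p_out = pmos_pass(v_in)
    # Keep whichever output is closer to the level being driven.
    return n_out if abs(n_out - v_in) <= abs(p_out - v_in) else p_out


if __name__ == "__main__":
    for v in (0.0, VDD):
        print(v, nmos_pass(v), pmos_pass(v), dual_switch(v))
```

Under these assumptions the n-channel device alone loses a threshold on a high level, the p-channel device alone loses a threshold on a low level, and the parallel pair passes both rails intact, which matches the roles described for READ and WRITE.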
At the end of each word line is the sense-amp set signal generator. Node A falling turns on the 10/1 p-channel device 3, which causes φSET to rise in its slow set mode of operation.
A short time later the output (node B) of the second stage of the set signal generator will rise, causing n-channel device 6 to turn on, which in turn discharges the FS line to a low level. Device 7 (a large 50/1 p-channel device) connects the FS and φSET lines. When the FS line discharges, device 7 turns on and causes φSET to rise in its fast set mode of operation. The slow and fast set slopes and the delays between them can be adjusted by changing the sizes of the devices in the set signal generator and device 7.
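The two-slope behavior of the set signal can be pictured as a piecewise-linear ramp. The following is only a behavioral sketch: the start times, slopes, and supply voltage are hypothetical parameters chosen for illustration, standing in for the device sizes that the text says control the slopes and the delay between them.

```python
def set_signal(t_ns, t_slow=0.0, t_fast=1.0,
               slow_slope=0.5, fast_slope=3.0, vdd=5.0):
    """Piecewise-linear model of the sense-amp set signal, in volts.

    All default values are hypothetical:
      t_slow      time at which the slow set mode begins (ns)
      t_fast      time at which the fast set mode begins (ns)
      slow_slope  ramp rate in the slow set mode (V/ns)
      fast_slope  ramp rate in the fast set mode (V/ns)
      vdd         level at which the signal saturates (V)
    """
    if t_ns <= t_slow:
        return 0.0
    if t_ns <= t_fast:
        v = slow_slope * (t_ns - t_slow)
    else:
        # Slow segment up to t_fast, then the steeper fast segment.
        v = slow_slope * (t_fast - t_slow) + fast_slope * (t_ns - t_fast)
    return min(v, vdd)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
        print(t, set_signal(t))
```

Changing `slow_slope`, `fast_slope`, or `t_fast` in this model plays the role of resizing the set signal generator devices and device 7.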
In addition to the slow and fast set signal and self-timing from the accessed word line, high-speed operation is further improved by the small p-channel decoupling devices between the small capacitance nodes of the sense amplifier (SA and SAN) and the high capacitance I/O lines. These decoupling devices make it possible to set the sense amplifier much faster for the same differential signal compared to a sense amplifier without decoupling devices.
The SA and SAN nodes are directly connected to the data-out buffer for further amplification before the signal is driven off-chip.
Simulated sense-amplifier waveforms are given in Fig. 6. The two distinct slopes of the φSET signal can be clearly seen. The smooth transition in slope from slow to fast occurs in conjunction with the increased differential voltage build-up across the sense-amplifier nodes. It can also be seen that the small p-channel decoupling devices make it possible to set the sense amplifier as φSET rises without having to discharge the large bit-line or I/O line capacitances.
An SEM of the array and sense-amplifier circuitry is shown in Fig. 7. The set signal generators are at the end of the word lines. As can be seen, the φSET line and FS line run the entire length of the array. The sense-amplifier layout is symmetrical and balanced. This layout was generated with the layout-rule-independent physical design tool, and the symmetry was retained as layout rules were changed.
B. Row Decoder
The CMOS row decoder of Fig. 8
Even with the very conservative bounce protection delay of 0.8 ns, the row decoder is still very fast, with a nominal delay from the higher order address bits rising to the word line rising of only 2.6 ns.
The circuit for input of TTL addresses and data has high speed, low power dissipation, and safe operation. It is shown in a simplified version in Fig. 11, with the complete schematic shown in Fig. 12. Activated by the clock input falling, the circuit converts TTL levels to CMOS on-chip drive, latches the input state, and then disconnects the external input from the internal circuitry during an access. Following an access and the rise of the clock input, the circuit is designed to quickly precharge the internal nodes and the address lines for cycle time minimization.
The power dissipation and delay skew as a function of TTL variations and device parameter variations are well contained by this circuit design, which also provides very high speed. The delay through the circuit from the rise of the clock input until the rise of the large capacitance address lines is only 1.9 ns. As will be described in this section, CMOS devices are key to the high-speed safe operation of this circuit, especially as used in the two distinctive portions in Fig. 11: the nonlinear front end and the self-referencing latch.
A salient feature of the high-speed input circuit is the nonlinear front end, which gives the voltage characteristic shown in Fig. 11. Because of the body-effected threshold voltage of p-channel device 2, a solid ground is provided at node B over the full range of low-input TTL signal levels.
For input voltages less than 1.8 V the input device is cut off and devices 9 and 10, shown in Fig. 12, hold node B to ground. Very small devices can be used in the inverter that drives device 10, so that power dissipation due to intermediate voltages on node A (input to the inverter) is small.
If node B is at ground as the clock falls, the latch sets with node D high and with no steady-state power dissipation.
If the input voltage prior to latch activation is greater than 1.8 V, the small capacitance on node B will be quickly charged high through the input devices and the latch will set with node C high, causing input device 2 to be turned off and device 8 to be turned on. This will result in a good high being provided on node B, thereby cutting off any momentary power dissipation in the latch.
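The effect of the nonlinear front end on node B can be summarized with a small behavioral model. This is a sketch under stated assumptions rather than a circuit simulation: the 1.8 V switching point and the 0.8 V worst-case TTL low come from the text, while the 5 V supply, the 2.0 V minimum TTL high (the usual TTL input specification), and the treatment of an input at exactly 1.8 V are assumptions made here.

```python
FRONT_END_THRESHOLD_V = 1.8  # switching point quoted in the text
TTL_LOW_MAX_V = 0.8          # worst-case TTL low quoted in the text
TTL_HIGH_MIN_V = 2.0         # assumed: standard TTL minimum input high
VDD_V = 5.0                  # assumed supply voltage


def node_b_level(v_in, vdd=VDD_V, threshold=FRONT_END_THRESHOLD_V):
    """Steady-state level on node B for a given external input voltage.

    Below the threshold the input device is cut off and node B is held
    at ground; above it node B ends up at a good high. An input exactly
    at the threshold is arbitrarily treated as low in this model.
    """
    return vdd if v_in > threshold else 0.0


if __name__ == "__main__":
    for v in (0.0, TTL_LOW_MAX_V, 1.5, TTL_HIGH_MIN_V, 3.5):
        print(f"input {v:.1f} V -> node B {node_b_level(v):.1f} V")
```

In this model every valid TTL low, including the 0.8 V worst case, maps to a solid ground on node B, and every valid TTL high maps to a full rail, which is the property the latch relies on for low steady-state power.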
As shown on the complete schematic of the input circuit
The self-referencing CMOS latch in the input circuit is key to providing high speed, low power, safe operation.
Referring to Fig. 11, a balanced physical design is used, so that p-channel devices 5 and 6 are well matched to each other, as are the n-channel latch devices 3 and 4. At the beginning of an access, the clock input is high and nodes C and D are low, resulting in both devices 3 and 4 being cut off. Shortly after the beginning of an access, the clock input will fall, turning on p-channel device 7 and charging nodes C and D until the n-channel latch sets in the direction determined by which of the p-channel steering devices 5 and 6 is most conductive.
If node B has been charged high through the input devices 1 and 2, p-channel device 5 will be much more conductive than p-channel device 6, steering the setting of the n-channel latch so that node C goes high with negligible variation in delay as a function of variation in the TTL high level. If node B is low, then initially devices 5 and 6 will be equally conductive. However, as nodes C and D both begin to charge high, device 5 will quickly become less conductive than device 6, thereby steering the setting of the latch so that node D goes high. If the nonlinear front end were not used to provide a good ground at node B for any external input voltage less than 1.8 V, then the worst-case TTL low of 0.8 V would result in substantially longer delay through the input circuit relative to the delay for a good TTL high.
However, the self-referencing CMOS latch used in conjunction with the nonlinear front end results in good control of delay and power dissipation variations for the full range of TTL input levels, while also providing high-speed operation.
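The logical behavior of the latch described above can be captured in a few lines. This is a purely behavioral sketch, assuming node B has already settled to a clean high or low; it does not model the steering devices, delays, or analog levels, and the function name is introduced here only for illustration.

```python
def latch_outputs(clock_high, node_b_high):
    """Return the logic states of (node C, node D) as booleans.

    clock_high   True while the clock input is high (precharge state)
    node_b_high  True if node B has been charged high by the front end
    """
    if clock_high:
        # Before an access, and again after it, both nodes are held low.
        return (False, False)
    # Clock low: the latch sets in the direction steered by node B.
    return (True, False) if node_b_high else (False, True)


if __name__ == "__main__":
    # One access with a high input: precharge, evaluate, precharge again.
    for clk in (True, False, True):
        print("clock high" if clk else "clock low ", latch_outputs(clk, True))
```

Exactly one of the two nodes goes high during an access, and both return low once the clock rises again, consistent with the precharge behavior described for the input circuit.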
V. RESULTS
The chip has been extensively tested using N² test patterns. Functionality has been measured on all 16 data inputs and outputs for a power supply voltage of 3-6 V. The chip is also operational with a ±10-percent power supply variation over the full range of TTL input levels.
VI. PHYSICAL DESIGN
The use of a ground-rule-independent graphics tool for the high-performance 64K CMOS RAM artwork design is unique. The graphics tool enables timely accommodation
The artwork for this chip was derived from existing artwork for a chip previously designed using the ground-rule-independent tool. To make this conversion, ~90 percent of the ground rules changed. Roughly six man weeks of time were needed for the conversion. Since the tool was then in an early stage of development, it is felt that this time could be reduced by possibly an order of magnitude. In addition, the tool has been used to generate several versions of the chip for various process development vehicles. The version of the chip described here used the preexisting nonoptimized pad cage used by all versions of the design.
An example will illustrate the potential of the ground-rule-independent layout tool. The SEM of Fig. 14 shows the previously described input circuit used as a row address buffer. The same address buffer shown in Fig. 15(a) has an effective channel length of 1.1 µm and an area of 19321 µm². In Fig. 15(b) several ground rules were changed, including a change in effective channel length to 0.7 µm. For the same device width-to-length ratios the area reduces to 13689 µm². An examination of the aspect ratio of the two plots clearly reveals a more complicated transformation than a simple scaling. The total real time to generate 
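The address-buffer areas quoted in this section can be checked against what a uniform linear shrink would give. The short calculation below is only an illustration: it assumes that "simple scaling" means shrinking every dimension by the ratio of the two effective channel lengths, which is an interpretation made here rather than a definition given in the paper.

```python
def uniform_scaled_area(area_um2, l_old_um, l_new_um):
    """Area after shrinking every linear dimension by l_new/l_old."""
    s = l_new_um / l_old_um
    return area_um2 * s * s


# Values quoted in the text for the address buffer of Fig. 15.
AREA_A_UM2 = 19321.0   # at 1.1 um effective channel length
AREA_B_UM2 = 13689.0   # at 0.7 um effective channel length

predicted_um2 = uniform_scaled_area(AREA_A_UM2, 1.1, 0.7)
actual_ratio = AREA_B_UM2 / AREA_A_UM2
uniform_ratio = predicted_um2 / AREA_A_UM2

if __name__ == "__main__":
    print(f"uniform-shrink area: {predicted_um2:.0f} um^2")
    print(f"actual area ratio:   {actual_ratio:.2f}")
    print(f"uniform area ratio:  {uniform_ratio:.2f}")
```

Under that assumption the reported area ratio is about 0.71, whereas a uniform shrink by the channel-length ratio would give about 0.40, so the layout was not simply scaled by a single factor.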
VII. SUMMARY
A 64K CMOS RAM with an access time of 15 ns has been described. The RAM was built using a single level of metal, an average minimum feature size of 1.35 µm, and an effective channel length of 1.1 and 1.2 µm for n- and p-channel devices, respectively. An access time of 10 ns is possible with the word line stitched on a second level of metal, an effective channel length of 1.0 µm, and some minor redesign. High speed has been achieved through innovative circuits and design concepts. A layout-rule-independent graphics tool was used for the artwork design.
