CMOS array design automation techniques by Lombardi, T. & Feller, A.
Produced by the NASA Center for Aerospace Information (CASI) 
https://ntrs.nasa.gov/search.jsp?R=19790017147 2020-03-21T21:55:01+00:00Z
63/33 26858
CMOS ARRAY DESIGN AUTOMATION TECHNIQUES
NASA CONTRACTOR
REPORT
NASA CR-150221
(NASA-CR-150221) CMOS ARRAY DESIGN
AUTOMATION TECHNIQUES Final Report (RCA
Applied Computer Systems Lab.)
36 p
HC A03/MF A01 CSCL 09C
N79-29118
Unclas
By T. Lombardi and A. Feller
Advanced Technology Laboratories
Government Systems Division, RCA
Camden, New Jersey 08102
December 1976
Final Report
Prepared for
NASA - GEORGE C. MARSHALL SPACE FLIGHT CENTER
Marshall Space Flight Center, Alabama 35812
TABLE OF CONTENTS
Section
1 INTRODUCTION
2 DESIGN OBJECTIVES
  A. Access Time
  B. Pinout
  C. Outputs
  D. Programming Options
  E. Power Dissipation
  F. Implementation
3 CIRCUIT DESIGN
  A. General
  B. NMOS Memory Array
  C. 1-of-64 Decoder
  D. Output Decode
  E. Input/Output Buffers and Decoders
  F. Layout
  G. Testing
  H. Simulation
4 CHIP STATISTICS
5 CONCLUSIONS
6 RECOMMENDATIONS
APPENDIX
LIST OF ILLUSTRATIONS
Figure
1 ATL078 block diagram
2 NMOS array interconnect
3 NMOS memory matrix
4 Memory programming links
5 Section of PMOS structure of 1-of-64 decoder
6 1-of-8 decoder logic
7 ATL078 data path
8 Chip select decode and tristate logic
9 ATL078 block layout
A-1 ATL078 word and bit locations
A-2 Repeatable 2 x 2 array of program links
LIST OF TABLES
Table
1 ATL078 Pinout
2 Simulation of Worst-Case Access Path
3 ATL078 Chip Statistics
Section 1
INTRODUCTION
This report describes the development of a 4096-bit CMOS SOS ROM organized
512 words by 8 bits. A significant feature of this ROM is that it can be programmed
either at the metal mask level or by a laser beam after wafer processing has been
completed.
Commercially available ROM and PROM chips are made from two technologies:
bipolar and MOS. Bipolar ROM chips offer typical access times of from 20 to 150 ns.
The 20-to-50 ns range is representative of ECL type designs. ECL chip size is limited
due to power considerations, with 1024 bits or less being typical.
The majority of bipolar ROM designs utilize TTL logic. State-of-the-art TTL
ROM design offers 16K bits per chip, while PROM design tops out at 8K bits. Access
times from 50 to 150 ns are typical, with active power dissipations of 500 to 1000 mW
per chip. Standard TTL voltage requirements (5 V ± 5 to 10%) are specified for ROM
chips fabricated with this type of logic.
ROM chips utilizing MOS technology are made primarily with PMOS, although some
NMOS and CMOS chip types are available. As a class, MOS ROMs have access times
from about 300 ns to several microseconds. PMOS and NMOS ROMs require from one to
three (normally two) voltage supplies to function, while CMOS ROMs require just one
supply voltage, which typically can be varied over a wide range (3 to 18 V). Some input
and/or output compatibility exists between MOS ROMs and TTL circuitry, although most
circuit designs utilize special interface circuit elements.
PMOS and NMOS power requirements span the same range as TTL bipolar (500 to
1000 mW), although some chip types have power dissipations in the 150 to 200 mW range.
CMOS ROM chips reduce power consumption relative to PMOS, NMOS, and bipolar chips
by a factor of ten. A CMOS design requires more chip area to accommodate the
same number of bits as NMOS, PMOS, or bipolar designs. NMOS ROMs are available
with 16K bits and PROMs presently offer 8K bits. The largest CMOS ROM or PROM
presently has 1K bits.
The design of modern equipment has moved toward larger degrees of implementation
with CMOS circuits. The characteristics of CMOS have been well documented and will not
be elaborated on here except to point out that the extremely low power dissipation of
CMOS has removed a critical hurdle to the development of LSI circuit types.
Improvements in CMOS technology and the maturity of the SOS (silicon on sapphire)
process have increased on-chip circuit density and speed to the point where VLSI
(over 1000 gates) is possible with system speeds better than low-power Schottky TTL.
One reason for the popularity of CMOS is that it has a wide operating voltage
range (3 to 15 V) that can be varied by the system designer to accommodate power, speed,
and interfacing problems. High-speed CMOS-SOS LSI systems are typically operated at
voltages above 5 V in order to realize an increase in circuit speed. Previously, when such
systems required RAM and ROM components, the only speed-compatible memories
were bipolar. Recently, RCA has introduced a series of CMOS SOS RAM chips that
provide speed compatibility with bipolar RAMs, while having the low power and wider
operating voltage characteristics of CMOS. There is, however, no comparable (P)ROM
component on the market. A 10 V CMOS SOS LSI system requiring a high-speed PROM or
ROM is forced to use a bipolar product. This has at least three negative effects: (1) an
extra power supply voltage (5 V) is required; (2) the low-power CMOS SOS system has
been compromised by the use of higher powered TTL; and (3) components in addition to
the ROMs are required to interface the input and output voltage levels of the bipolar and
CMOS subsystems.
An alternative to using bipolar ROM chips in high-speed, low-power systems is to
develop a ROM using a high-speed, low-power technology. CMOS SOS is such a tech-
nology.
The development of a 4K CMOS SOS ROM fills a void left by available ROM chip
types, and also makes the design of a totally CMOS major high-speed system more
realizable.
Section 2
DESIGN OBJECTIVES
A. ACCESS TIME
The 4K CMOS SOS ROM designed by RCA was designated the ATL078. Its organization
is 512 words long by 8 bits wide.
The design philosophy behind the development of the ATL078 was to make it speed
compatible with existing bipolar ROMs at the system level. Specifically, this was
interpreted to mean that the CMOS ROM operating at +10 V should provide access times
comparable to a bipolar ROM (operating at 5 V) interfacing with a +10 V CMOS system. A
nominal time allotted for a bipolar system access was 180 ns. This was broken up as:
10 ns, 10 V to 5 V conversion; 30 ns, address buffering; 120 ns, worst-case ROM
access; and 20 ns, 5 V to 10 V conversion. A comparable CMOS ROM system access
would avoid the two level-shifting stages so that the system access would break down as:
30 ns, address buffering; and 150 ns, CMOS ROM access.
The bipolar system access values came from a bipolar MROM* memory
in the SUMC-IIIC computer. The SUMC-IIIC is a CMOS SOS LSI computer whose
construction is primarily CMOS except for bipolar ROM memories, main memories, and
associated interfaces. The bipolar PROM being used in the SUMC-IIIC was the Intel
3604L-6, which has a worst-case access time of 120 ns over a temperature range of 0
to 75°C. The target access time of the ATL078 was determined to be 150 ns as shown
previously. If this were to be a worst-case access at 75°C, then derating the CMOS
ROM at 0.3% per °C yielded a target worst-case access time for the CMOS ROM of
130 ns or better at 25°C. This was expected to yield typical access times of 50 to 70
ns. The design philosophy was to make this ROM purely static in nature so that the
cycle time and access time had target values that were identical.
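The access-time budget and the temperature derating quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 0.3% per °C derating is applied linearly over the 50°C span from 25°C to 75°C, which the report does not state explicitly.

```python
# Access-time budgets quoted in the text (all values in ns).
bipolar_budget = {"10V_to_5V": 10, "address_buffering": 30,
                  "rom_access": 120, "5V_to_10V": 20}
cmos_budget = {"address_buffering": 30, "rom_access": 150}

def derate_to_25c(t_hot_ns, rate_per_c=0.003, t_hot_c=75.0, t_ref_c=25.0):
    """Assumed linear derating: t_hot = t_ref * (1 + rate * (t_hot_c - t_ref_c))."""
    return t_hot_ns / (1.0 + rate_per_c * (t_hot_c - t_ref_c))

bipolar_total = sum(bipolar_budget.values())   # system access with level shifting
cmos_total = sum(cmos_budget.values())         # system access without level shifting
target_25c = derate_to_25c(150.0)              # 75 C worst case referred to 25 C

print(bipolar_total, cmos_total, round(target_25c, 1))
```

Under this assumption both budgets sum to the same system access time, and the 25°C figure lands close to the 130 ns target given in the text.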
B. PINOUT
The pinout selected for the ATL078 was influenced by the pinouts of existing 4K
bipolar ROM chips. The pinout chosen was that of the Intel 3604L PROM (which is
identical with the Intel 3304L-6 ROM). This would allow the SUMC-IIIC to be used as a test
bed for samples of processed ROM chips. This pinout also came within one pin of being
directly compatible with the MMI 5340, the Intersil 5605, and the Harris 76.13-5.
*Microprogram ROM
The pinout selected is shown in Table 1. Of the 24 pins on the package, 23 were
used. The pin requirements for this chip were: 9 address lines, 8 output lines, 4 chip
select lines (2 positive logic and 2 negative logic), 1 ground line, and 1 voltage line.
The 9 address lines selected 1 of 512 words, each word being 8 bits wide. The 4 chip
select lines were required to maintain compatibility with the 3604L-6. All 4 lines were
internally decoded to control the 8 output drivers. This provided for a fast access from
chip select that was estimated to be 30 ns. The voltage and ground pins provided inputs
for the CMOS voltage levels of from 3 to 15 volts. It was considered desirable that the
chip be able to work over the full CMOS voltage range to enhance its system potential.
TABLE 1. ATL078 PINOUT
Pin   Function      Pin   Function
1     A7            24    N/U
2     A6            23    A8 (MSB)
3     A5            22    VDD
4     A4            21    CS
5     A3            20    CS
6     A2            19    CS
7     A1            18    CS
8     A0 (LSB)      17    O8 (MSB)
9     O1 (LSB)      16    O7
10    O2            15    O6
11    O3            14    O5
12    GND           13    O4
C. OUTPUTS
Two output types are available from bipolar TTL ROM chips: open collector and
tristate. Tristate outputs, which seemed to be a clearly superior
choice, can be designed in CMOS with little cost in terms of chip area. The benefit
offered by tristates is a reduction in system power due to the elimination of output
pull-up resistors. Since reducing system power is consistent with a basic design
objective in going to a CMOS structure, tristates were used for each of the 8 outputs
of the ATL078.
The one precaution required of a tristate design was to reduce current spiking to a
tolerable and nondestructive level. Current spiking occurs when two or more tristates
turn on simultaneously, pulling to opposite logic levels. This happens frequently during
the transition between chip select (CS) states. The higher On resistance of MOS de-
vices (compared to bipolar) typically reduces this spiking current to tolerable levels for
all but extremely large output transistor sizes. Without placing undue restrictions on
the sizes of the output tristate devices, it was felt that a CMOS tristate output could be
designed to drive typical system loads (40 to 80 pF) while still maintaining the target
system speed.
D. PROGRAMMING OPTIONS
A basic goal in the design of this 4K CMOS ROM was to make it programmable at the
metal level either by modifying the metal mask or by using a laser beam to cut metal
links on a completely processed chip.
Programming a ROM by fabricating unique metal masks is a common practice
employed by many ROM manufacturers. In the CMOS-SOS process, the metal mask is the
next-to-last mask, followed only by the mask that opens holes in the passivation layer.
This means that a large portion of the chip processing has been completed before the
metal programming mask is required to define a unique chip. This could be used as a
mechanism to allow for partial processing of a number of wafers before specific ROM
types were required, thus providing for faster turnaround after a unique metal mask has
been defined. The processing time required for the final two mask types would define
the delivery time for chips programmed this way.
As an alternative to programming with unique metal masks, a conventionally pro-
cessed SOS ROM could be programmed using a directed laser beam. The SOS tech-
nology is ideally suited to such an approach since the epitaxial silicon islands that form
transistors are normally separated from one another by the sapphire substrate. Sapphire
is a very hard, transparent material that acts both as a surface on which to grow the
epitaxial silicon and as dielectric isolation between adjacent transistors. It also acts as a fine surface on which
to cut metal lines with a laser, since no damage will be done to the active transistor
semiconductor. Thus, the chip performance should not be affected by the programming
operation if sufficient area is left on the sapphire to sever the programming links.
Tests were performed using a xenon laser on CMOS SOS 4007 equivalents (dual
complementary pair plus inverter). The laser had a 0.2-mil kerf. Numerous cuts were
made through the metal interconnect (12,000 Å) on several chips. Subsequent probing
of these chips indicated no degradation of transistor performance compared to the
pre-cutting measurements.
An automated laser programming capability would include a laser source and a
programmable transport system. After the initial expense of setting up such a system,
the programming operation would be comparable in speed and efficiency to
programming stations now available for MOS and bipolar PROM chips. Aside from fast
turnaround, small quantity runs of numerous types would become feasible and practical
since the ATL078 could be treated as either a PROM or a ROM.
If the ATL078 were treated as a PROM, then the entire wafer processing operation
could be completed long before the chips were to be programmed. To operate in this
manner, some provision should be made to allow for a pretesting capability on the
unprogrammed chip. Some possibilities for pretesting were: (1) power on and monitor
leakage current; (2) addition of extra words of memory to allow for a partial test of the
address decoders and output drive circuitry; or (3) some variation or combination of the
previous two possibilities that would provide as much information about the functional
operation as was reasonable without seriously impacting the chip design. Before
the chip design was undertaken, it was recognized that some provision for pretesting
should be made; however, it was decided to focus on arriving at an optimum chip
configuration before pretest options were added. At that time, a tradeoff could be made
that would allow implementation of the pretest options that would have the least impact
on the chip area.
E. POWER DISSIPATION
The power dissipation of CMOS is composed of a static and a dynamic component.
In normal CMOS design, the static component consists of the sum of the leakages through
the Off transistors in each of the complementary structures throughout a chip. The
dynamic component is equal to the sum of the CV²f losses over the chip. The only
chip-constant term in the CV²f expression is the capacitance (C), which represents the gate
and the interconnect overlap capacitance. The operating voltage (V) and the operating
frequency (f) are user dependent, making the dynamic dissipation of CMOS parts strongly
a function of the system in which they are used.
At the onset of this chip development, the chip circuitry was assumed to be totally
CMOS except for the 4096 memory elements. The memory elements were conceptualized
as NMOS transistors whose source was tied to either VDD (highest chip potential)
or VSS (lowest chip potential). Accessing an NMOS memory element would bring its
drain either to VSS or to within one threshold drop of VDD (source follower). In
accessing a logic "1" state from an NMOS transistor in this fashion, its drive capability would
be reduced and the delay associated with this element would be increased. An alternative
to this approach would be to assist the NMOS device in pulling up to a logic "1" state by
using a biased-on PMOS device whose source was tied to VDD. This would provide for
a faster access, but would also increase the dc chip power requirement.
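To make the dependence on V and f concrete, the CV²f expression can be evaluated directly. The sketch below is illustrative only: the capacitance and frequency values are hypothetical placeholders chosen for round numbers and do not come from the report.

```python
def dynamic_power_w(c_farads, v_volts, f_hertz):
    """Dynamic dissipation of a switched capacitance: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hertz

# Hypothetical values for illustration only (not from the report).
C_TOTAL = 100e-12   # assumed 100 pF of total switched capacitance
F_OP = 5e6          # assumed 5 MHz access rate

p_10v = dynamic_power_w(C_TOTAL, 10.0, F_OP)
p_5v = dynamic_power_w(C_TOTAL, 5.0, F_OP)

print(p_10v * 1e3, "mW at 10 V")
print(p_5v * 1e3, "mW at 5 V")
```

Because the voltage enters as a square, halving the supply cuts the dynamic term by a factor of four for the same capacitance and frequency, which is why the user's choice of operating point dominates this component.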
The overriding philosophy guiding the development of this chip was that it must be
speed compatible at 10 V with a comparable bipolar chip (operating at 5 V). Since this chip
was to be CMOS SOS, a power saving of better than 10 to 1 could be expected over
comparable bipolar parts. If, however, it were necessary to sacrifice some power
in order to maintain the speed objective, this would be done, since the resultant part
would still be considerably lower powered than a bipolar ROM. Since bipolar ROM
chips require from 500 to 1000 mW to operate, a 10-to-1 power saving would still
enable a CMOS SOS ROM to dissipate 50 to 100 mW. This range of values was taken as a
design goal and was meant to include both static power dissipation and dynamic power
dissipation.
The static leakage of a CMOS SOS chip having 5000 to 10,000 transistors should be 100 to
500 µA at 10 V. This corresponds to a 1 to 5 mW static dissipation. For a CMOS ROM
design having no pull-up devices aiding the NMOS memory elements, this represented a
realistic goal. Adding pull-up devices to the NMOS memory elements would increase the
static dissipation; however, it was expected that no more than 30 mW should be required.
This left 20 to 70 mW as a target value for dynamic dissipation.
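The power budget above is a short chain of arithmetic, sketched below with the figures quoted in the text. The variable names are ours. Note that the 20 to 70 mW dynamic target is obtained by subtracting only the 30 mW pull-up allowance from the 50 to 100 mW goal; the 1 to 5 mW leakage is not subtracted separately in the report's figures.

```python
V_SUPPLY = 10.0                       # operating voltage, volts

bipolar_mw = (500.0, 1000.0)          # bipolar ROM dissipation range, mW
power_saving = 10.0                   # expected CMOS SOS saving factor
goal_mw = tuple(p / power_saving for p in bipolar_mw)

leakage_ua = (100.0, 500.0)           # static leakage range, microamps
# microamps * volts = microwatts; divide by 1000 for milliwatts
leakage_mw = tuple(i * V_SUPPLY / 1000.0 for i in leakage_ua)

pullup_allowance_mw = 30.0            # upper bound allowed for pull-up devices
dynamic_target_mw = tuple(g - pullup_allowance_mw for g in goal_mw)

print(goal_mw, leakage_mw, dynamic_target_mw)
```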
F. IMPLEMENTATION
The circuit implementation of this chip was to be totally silicon gate SOS. Design
rules used in laying out this chip were to be the standard CMOS SOS rules. The validity
and maturity of these rules have been proven by close to one hundred successfully
fabricated chip types.
The implementation of all input, output, decoding, and buffering circuitry was to be
done in CMOS. The only section of the ROM that would not be complementary MOS would
be the memory elements. These elements were to be NMOS, either implemented as a
low-power array or aided by pull-up devices.
Final decisions as to the size of all transistor stages and as to the method of
interconnection were to rely on computer simulation programs. Of particular importance
in this area was the speed-power tradeoff associated with outputting logic "1" data from
the NMOS memory array.
Section 3
CIRCUIT DESIGN
A. GENERAL
The memory array of the ATL078 consists of 4096 NMOS transistors whose sources
are connected to either VDD or ground. Circuit implementation of this array is achieved
by separating the 4096 transistors into 8 separate blocks of memory. Individual blocks
consist of 64 words by 8 bits (see Fig. 1).
Each 64-by-8 block is driven by its own 1-of-64 decoder, whose inputs are the
least significant 6 bits of the memory address, A0-A5.
The 3 most significant address bits, A6-A8, are used as the inputs to a 1-of-8
decoder. The outputs of this decoder control the multiplexed output lines of the eight
64-by-8 memory arrays.
Eight bus lines connect the outputs of the 64-by-8 memory arrays. Each bus line
feeds sensing and shaping circuitry, whose output drives a tristate buffer.
The tristate buffers are controlled by the decoder of the 4 chip-select (CS) lines.
The chip-select lines are decoded on-chip as an AND function. Using these 4
chip-select inputs, up to 16 chips (8K words) can be stacked before external decode
circuitry is required.
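The organization described above can be summarized as a small behavioral model. This is a sketch of the logical addressing only, not of the circuit: the function names and the example ROM contents are hypothetical, and the physical placement of words and bits on the chip (covered in the Appendix) is not modeled.

```python
WORDS, BITS = 512, 8

def split_address(addr):
    """Return (slice_index, row_index) for a 9-bit address.

    A0-A5 select 1 of 64 rows; A6-A8 select 1 of 8 slices.
    """
    if not 0 <= addr < WORDS:
        raise ValueError("address must be in 0..511")
    return addr >> 6, addr & 0x3F

def read(rom, addr, chip_selected):
    """Read one word from rom, a list of 8 slices of 64 byte values each.

    Returns None to represent the high-impedance state when deselected.
    """
    if not chip_selected:
        return None
    slice_index, row_index = split_address(addr)
    return rom[slice_index][row_index]

# Hypothetical contents: each word stores the low 8 bits of its own address.
rom = [[(s * 64 + r) & 0xFF for r in range(64)] for s in range(8)]

max_chips = 2 ** 4               # 4 chip-select lines, decoded as an AND
max_words = max_chips * WORDS    # words addressable without external decode

print(split_address(300), read(rom, 300, True), max_words)
```

The last two lines reproduce the stacking limit quoted in the text: 16 chips of 512 words each give 8K words.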
B. NMOS MEMORY ARRAY
The NMOS memory array is broken up into 8 64-by-8 slices. Each slice has 64
rows and 8 columns of NMOS transistors. Every transistor in the array has its
source connected to either VDD or ground in the programmed state.
Figure 2 is a representation of the 64-by-8 NMOS memory array interconnect.
Metal lines run vertically, bringing VDD and ground to each NMOS element.
Additional vertical metal lines carry the output signals from the memory elements to
the output multiplexer. Row-select lines run horizontally in the array. The
row-select lines are polysilicon lines, which serve as the transistor gates as well as the
second level of interconnect in the SOS process.
Fig. 1. ATL078 block diagram.
Fig. 2. NMOS array interconnect. (All vertical lines, carrying VDD, GND, and sense out,
are aluminum; the horizontal row-select lines are polysilicon; 62 additional rows lie
between the two row-select lines drawn.)
Each row-select line acts as the common gate for 8 NMOS transistors. In the
64-by-8 slice, 64 row-select lines are required. The drains of each of the 64 NMOS
devices in a column are connected together by a metal sense line. Eight sense lines
run vertically in each 64-by-8 slice.
When the ATL078 is addressed, one row in each of the 64-by-8 memory slices is
selected, while the remaining 63 rows are unselected. Each of the 8 bits on a selected
row places its data on a sense output line. Only one NMOS drain of the 64 drains
connected to a sense output line controls the state of the line at a time.
Figure 3 shows a section of the NMOS memory array. Eight complete memory cells
are shown (in a 4-by-2 array). The cell size is 2.5 by 1.55 mils. Each NMOS device
in the memory array has a gate width of 1.6 mils and a gate length of 0.25 mil.
The 8 cells shown in Fig. 3 each have their two programming links intact. This
would represent the case where laser programming was to be performed at a later time.
Figure 4 is a "blowup" of the programming links for 4 cells, each of which shows
a different programming state. The upper lefthand cell has both links intact, while
the two righthand cells each represent cells that were programmed at the metal mask
level (one tied to GND and one tied to +V). The cell in the lower left of Fig. 4 is
representative of a link that has been laser programmed. The vertical VDD and ground
buses are each 0.[illegible] mil wide, as are the programming links. Separation between
the voltage or ground line and the NMOS epitaxial silicon is 0.2 mil. This 0.2 mil is
the programming area for a laser where metal exists over sapphire. A registration
and alignment error of 0.1 mil is allowed in any of the four axes. If the 0.1-mil
error occurs in the direction of the epitaxial silicon, then it is possible that the laser
may cut some metal over the epitaxial material. The epitaxial silicon involved in
this area is not in the conduction path between the NMOS source and NMOS gate, so that
circuit performance should not be affected.
The design chosen for this ROM utilizes 8 64-by-8 memory slices and 8 1-of-64
decoders (one decoder driving each memory slice). The design could have been
implemented using 4 decoders if each memory array were 64-by-16; taken further, only
one decoder would be required if the memory array were 64-by-64. The deciding factor
as to the final implementation was the access-time requirement of the chip. Referring
to Fig. 3, it can be seen that the row select lines are polysilicon.
In a 64-by-8 slice, each row-select line acts as a series gate for 8 NMOS
transistors. The resistivity of N type polysilicon is taken as approximately 200 ohms per
square. This series resistance, combined with the NMOS gate capacitance, creates
an RC time constant whose worst-case delay for a 64-by-8 slice is 12 ns. Increasing
the memory slice by a factor of 2, from 64-by-8 to 64-by-16, increases the RC delay down
the row-select line from 12 ns to 48 ns. While it was felt that 12 ns could be tolerated
for this portion of the access, 48 ns would have slowed down the overall access so that
it would have exceeded the target access time.
Fig. 3. NMOS memory matrix. (Labels: program link, GND, VDD, output, cell size.)
Fig. 4. Memory programming links. (Labels: unprogrammed bit, bit tied to ground,
laser programmed bit, bit tied to VDD.)
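The fourfold jump from 12 ns to 48 ns is what one would expect if the polysilicon row-select line behaves as a distributed RC line: doubling its length doubles both its resistance and the gate capacitance it carries. The sketch below assumes pure length-squared scaling from the 12 ns figure; the function name is hypothetical and the 64-column value is our extrapolation, not a number from the report.

```python
def row_select_delay_ns(columns, base_delay_ns=12.0, base_columns=8):
    """Scale the distributed-RC delay with the square of the line length.

    Assumes resistance and capacitance both grow linearly with the number
    of columns served by one row-select line.
    """
    return base_delay_ns * (columns / base_columns) ** 2

for columns in (8, 16, 64):
    print(columns, row_select_delay_ns(columns))
```

Under this assumption a single 64-by-64 array served by one decoder would see a row-select delay several times larger than the whole access budget, which is consistent with the choice of eight narrow slices.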
C. 1-OF-64 DECODER
Eight 1-of-64 decoders were used to drive the 8 64-by-8 NMOS memory slices of
the 4K memory. The 1-of-64 decoder was functionally implemented as 64 6-input NOR
gates with 6 series PMOS and 6 parallel NMOS transistors forming the NOR function.
Because of the regularity associated with this type of decoder, the series PMOS portion
was able to be implemented as a large tree type decoder.
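The logical behavior of a decoder built from 64 6-input NOR gates can be sketched as follows. This is a functional model only, with hypothetical function names; it says nothing about the tree layout or the transistor sizing. Each gate receives, for every address bit, either the true or the complement signal, chosen so that all six inputs are 0 for exactly one address.

```python
def nor(*inputs):
    """NOR gate: output is 1 only when every input is 0."""
    return int(not any(inputs))

def decode_1_of_64(addr6):
    """Return the 64 row-select outputs for a 6-bit address (A0-A5)."""
    bits = [(addr6 >> i) & 1 for i in range(6)]
    outputs = []
    for row in range(64):
        # Where the row's bit is 0 the gate sees the true address bit;
        # where it is 1 the gate sees the complement. XOR expresses both.
        gate_inputs = [bits[i] ^ ((row >> i) & 1) for i in range(6)]
        outputs.append(nor(*gate_inputs))
    return outputs

selected = decode_1_of_64(37)
print(selected.index(1), sum(selected))
```

A logic "1" on exactly one output corresponds to the single selected memory row.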
A section of the PMOS portion of the 1-of-64 decoder is shown in Fig. 5. Ten series
transistors are defined by the 10 vertical polysilicon gates shown in the figure. The 8
rightmost gates are address bits A2-A5 and their complements. Since each address bit
utilizes either a true or a complement state for each of the 64 decode conditions, only
4 of the 8 gates can be used for any one address decoder. The remaining 4 gates are
eliminated from a particular decoder by short-circuiting the source to the drain of the
transistor formed by the excess gate. The short-circuiting links for 4 of the decode
conditions are shown in Fig. 5.
The polysilicon forming each of the 8 gates, A2-A5 and their complements, extends
the entire height of the decoder array, which is 99.2 mils. Two gates are required for
each address line, one for the PMOS section and one for the NMOS section of the decoder.
Each gate is contacted twice along its width to reduce the RC time constant associated
with distributing the address state along the entire polysilicon gate width. One such
contact to the A2 address gate is shown in Fig. 5.
Addresses A0 and A1 and their complements occupy only two polysilicon columns in
the PMOS section of the decoder. The leftmost polysilicon gate in Fig. 5 is A0. It
extends for 48.9 mils, or approximately half of the height of the decoder. The complement
of A0 also extends for 48.9 mils but occupies the same column as A0, covering the other
half of the decoder column. Both A0 and its complement are contacted twice. The A1
column alternates between A1 and its complement in four segments. Each of the 4
transistor gates is 24.4 mils wide and is contacted once.
The NMOS portion of the decoder has ten vertical polysilicon gates in a scheme
similar to that of the PMOS section. Six parallel NMOS devices are formed by
defining either the presence or absence of epitaxial silicon underneath complementary
gate addresses. The NMOS and PMOS sections of the 1-of-64 decoder taken together
are 23.1 mils wide and 99.2 mils high.
The 6 series PMOS transistors forming each portion of a 6-input NOR gate for
the 1-of-64 decoder have a net effective transistor width of 0.57 mil. This results
from 6 series PMOS transistors having the following widths: 48.9, 24.4, 11.7, 5.5,
2.4, and 1 mils. Each section of the NMOS portion of the decoder consists of 6
parallel NMOS transistors. Each NMOS device is 0.8 mil wide.
Fig. 5. Section of PMOS structure of 1-of-64 decoder. (Labels: shorting link, A2 bus,
A1 bus.)
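The 0.57-mil effective width follows from combining the six series PMOS devices the way series conductances combine, by summing reciprocals. The sketch below assumes all six devices share the same channel length, so that conductance is proportional to width; the function name is hypothetical.

```python
def series_effective_width(widths):
    """Effective width of series transistors of equal channel length.

    Conductance is taken as proportional to width, so series devices
    combine like series conductances: 1/W_eff = sum(1/W_i).
    """
    return 1.0 / sum(1.0 / w for w in widths)

pmos_widths_mils = [48.9, 24.4, 11.7, 5.5, 2.4, 1.0]
w_eff = series_effective_width(pmos_widths_mils)

print(round(w_eff, 2))
```

The narrowest device dominates the sum, which is why the tree can use very wide transistors near the root without gaining much drive.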
The resultant 6-input NOR gate is unbalanced, having over twice the capability
to drive a logic "0" compared to a logic "1". Since a logic "1" selects a memory
row for interrogation, a previous address should be unselected before new address
select information is generated. This will save power, since two NMOS devices in
the same column should not simultaneously be On.
D. OUTPUT DECODE
Each 64-by-8 NMOS memory slice is separated from the output bus by a group
of 8 transmission gates (one for each bit). Data bus information may come from
any of the 8 64-by-8 slices. Only one group of transmission gates will be turned
on at a time so that only one 64-by-8 memory slice will control the output data bus.
Command controls for the 8 groups of (8) transmission gates come from a
1-of-8 decoder. This decoder acts on the three most significant address bits,
A6-A8, to provide an enable signal for one of the 8 banks of transmission gates.
The other 7 banks of transmission gates are turned off so that their associated
memory bits are isolated from the data bus. The logic for the 1-of-8 decoder is
shown in Fig. 6.
Eight output data bus lines connect each of the common outputs of the 8
multiplexed 64-by-8 memory slices. Computer simulations at 10 V were made from a
row-select input of the NMOS memory slice to the output data bus lines. Other
simulations of the critical data path had revealed that a 35-ns access was required
for this section if the desired chip access time was to be achieved. The results
of the simulation for this section of the data path indicated that a worst-case
access of 60 ns could be expected accessing a logic "1" from the NMOS memory
elements. Accessing a logic "0" took only 20 ns.
In order to speed up this section of the access time, 8 PMOS transistors were
placed on the output data bus lines. The PMOS devices had their drains tied to the
output data bus lines, their sources tied to VDD, and their gates tied to ground.
Several transistor widths were tried; however, a width of 0.5 mil (L = 0.25 mil)
performed the best. With these pull-up devices inserted into the simulation, the access
time to a logic "1" or logic "0" was 35 ns. In addition, the logic swing on the
output data bus lines was the full supply voltage, thus increasing noise immunity over
the case where no pull-up devices were used. This is a more important consideration
if operation at the lower end (3 to 7 V) of the operating voltage scale is anticipated.
Fig. 6. 1-of-8 decoder logic.
Static power is dissipated by the pull-up transistors only when a logic "0" is
being output to the data bus lines. Under this condition, each pull-up device will
require 0.3 mA of dc at 10 V. The worst-case situation would have all 8 PMOS
devices conducting simultaneously, resulting in 2.4 mA of static current. At 10
V, this represents 24 mW worst-case power dissipation. This was an acceptable
tradeoff in order to maintain the target access time.
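Since a pull-up draws current only while its bus line is held at logic "0", the static power depends on the word being read. A minimal sketch using the 0.3 mA per device and 10 V figures above (the function name is hypothetical):

```python
def pullup_static_power_mw(word, bits=8, i_per_zero_ma=0.3, vdd=10.0):
    """Static pull-up power for one output word.

    Each bit at logic 0 leaves its PMOS pull-up conducting; bits at
    logic 1 draw no static pull-up current.
    """
    ones = bin(word & ((1 << bits) - 1)).count("1")
    zeros = bits - ones
    return zeros * i_per_zero_ma * vdd   # mA * V = mW

print(pullup_static_power_mw(0x00))   # all zeros: worst case
print(pullup_static_power_mw(0xFF))   # all ones: no pull-up current
print(pullup_static_power_mw(0x0F))   # four zeros
```

The all-zeros word reproduces the 24 mW worst case, which sits inside the 30 mW allowance set in the design objectives.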
E. INPUT/OUTPUT BUFFERS AND DECODERS
The least significant 6 address bits, A0-A5, are buffered upon coming on-chip
and are then fanned out to the 8 1-of-64 decoders in the array. Each 1-of-64 decoder
sees 24 inputs: the 6 address bits and their complements, each taken twice. At the
input to each of the 1-of-64 decoders, 12 inverting and 12 non-inverting buffers shape
the address information and directly drive their associated 1-of-64 decoders. Figure
7 shows the data path for an address access on the ATL078. The first address input
stage is a large inverter that must drive 32 stages (4 stages for each of the 8 1-of-64
arrays). The second and third inverters shown in the data path of Fig. 7 represent a
non-inverting input stage to the 1-of-64 decoder. Two of this type stage and two single
inverter stages are driven by each address line on each of the 1-of-64 decoders.
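As a quick consistency check on the buffer counts in this paragraph, the following sketch recomputes them from the stated organization. All constants come from the text; the variable names are hypothetical.

```python
ADDRESS_BITS = 6        # A0-A5 feed every 1-of-64 decoder
COPIES_PER_SIGNAL = 2   # each true and complement signal is taken twice
NUM_DECODERS = 8        # 1-of-64 decoders in the array

# True and complement of every address bit, each taken twice.
inputs_per_decoder = ADDRESS_BITS * 2 * COPIES_PER_SIGNAL

# Per decoder, each address line drives 2 non-inverting stages
# and 2 single-inverter stages.
stages_per_line_per_decoder = 2 + 2
inverting_buffers = ADDRESS_BITS * 2
non_inverting_buffers = ADDRESS_BITS * 2

# Load seen by the first (large) address input inverter.
stages_driven_by_input_inverter = stages_per_line_per_decoder * NUM_DECODERS

print(inputs_per_decoder, inverting_buffers, non_inverting_buffers,
      stages_driven_by_input_inverter)
```

The counts agree with the 24 decoder inputs, the 12 inverting and 12 non-inverting buffers, and the 32 stages driven by each address input inverter.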
Fig. 7. ATL078 data path.
The output stages of the ATL078 are also shown in Fig. 7. Each data output
bus that connects the multiplexed outputs of the 64-by-8 slices drives a balanced
inverter, which acts as a sense amplifier. The rise and fall time of the input to
this inverter is slow, so the sense amplifier is kept small to reduce loading.
Following the sense amplifier is an intermediate buffer that further shapes the
output data waveform and builds up the drive capability to drive the output tristate.
The output tristate logic and the chip-select decode circuitry are shown in Fig. 8.
When the tristate turns on, data enters the tristate logic and passes through input
transmission gates to an inverter. When the tristate is in its high-impedance state,
the input transmission gates turn off, thus isolating the memory data from the output
inverter. Inputs to the NMOS and PMOS elements of this inverter are held low and
high respectively by single-ended MOS devices.
The chip-select decode circuitry is a buffered NANDing of the four chip-select
inputs. The output of this circuitry is active low and drives all 8 tristate output
buffers. Each tristate stage has its own inverter to form the required complement
of the chip-select enable signal.
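The enable logic described here can be modeled behaviorally as in the sketch below. It assumes that all four chip-select inputs are presented to the NAND in their asserted-true form (the pin polarities are not restated in this section), and it treats `data_bit` as the level that appears at the pin, ignoring the inversion in the output stage. The function names are hypothetical.

```python
from typing import Optional, Sequence


def chip_enable_n(chip_selects: Sequence[bool]) -> bool:
    """Active-low enable: NAND of the four chip-select inputs.
    ASSUMPTION: each input is already in its asserted-true form."""
    if len(chip_selects) != 4:
        raise ValueError("expected four chip-select inputs")
    return not all(chip_selects)


def tristate_output(data_bit: int, enable_n: bool) -> Optional[int]:
    """Return the driven logic level, or None for high impedance."""
    enable = not enable_n  # each stage forms its own complement
    return data_bit if enable else None


print(tristate_output(1, chip_enable_n([True, True, True, True])))   # driven
print(tristate_output(1, chip_enable_n([True, False, True, True])))  # high-Z
```

Returning `None` for the high-impedance state is only a modeling convenience; it stands for the condition in which the transmission gates isolate the memory data from the output inverter.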
All chip inputs and outputs are protected from static charge by diode stacks,
which are connected from the signal to either VDD or ground. The diode stacks consist
of 4 sets of back-to-back diodes, having a typical voltage breakdown of 24 to 30 V.
Each diode in the stack is 2 mils wide.
Fig. 8. Chip select decode and tristate logic.
In addition to the input-output diode stacks, a single diode stack is placed between
VDD and ground. This stack also consists of 4 sets of back-to-back diodes,
with each diode having a width of 10 mils.
F. LAYOUT
The block layout of the ATL078 is shown in Fig. 9. The step-and-repeat size
is 252 by 257 mils.
The periphery of the chip contains the input inverting buffers for address bits
A0 through A5, the 1-of-8 decoder using address bits A6 through A8, the chip select
decode circuitry, and the 8 output tristates.
The center of the chip can be considered as 4 quadrants, each containing 1K
of memory. In actual implementation, each quadrant contains 2 64-by-8 memory
slices, 2 1-of-64 decoders with associated input (inverting and non-inverting) buffers,
2 banks of data-out multiplexers (transmission gates), 2 sense amplifiers, and 2
intermediate buffers.
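The address partitioning implied by this organization (A0-A5 selecting one of 64 rows, A6-A8 selecting one of eight 64-by-8 slices) can be sketched as follows. The function name is hypothetical, and the sketch covers only the logical split, not the physical placement of slices on the chip.

```python
from typing import Tuple


def split_address(address: int) -> Tuple[int, int]:
    """Split a 9-bit address into (slice_index, row).

    row         -> A0-A5, decoded by a 1-of-64 decoder
    slice_index -> A6-A8, decoded by the 1-of-8 decoder
    """
    if not 0 <= address < 512:
        raise ValueError("address must be in 0..511")
    row = address & 0x3F        # low 6 bits
    slice_index = address >> 6  # high 3 bits
    return slice_index, row


print(split_address(0))    # first word of slice 0
print(split_address(320))  # first word of slice 5
print(split_address(511))  # last word of slice 7
```

Eight slices of 64 words by 8 bits give the 4096 bits of the array, two slices per 1K quadrant.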
Figure 9 shows the 2 64-by-8 memory slices as one 64-by-16 block. The 2 slices
are connected together, sharing common polysilicon row-select lines. This can be
done without increasing delay, since each row-select line is driven from both of its
ends by 1-of-64 decoders. What results is an averaging effect that reduces the delay
to the centermost (worst-case) NMOS bits of the 64-by-16 memory slice.
Interconnect and I/O wiring on the chip occupies the channels surrounding the
perimeters of the 4 quadrant sections. Address inputs to the 8 1-of-64 decoders are
routed in the 3 vertical channels on the chip, as well as the topmost horizontal channel.
Output control information from the 1-of-8 multiplexer controller also runs in these
channels. The 8 data buses connecting the multiplexed outputs of the memory slices
run in the center horizontal channel. The sense amplifiers and intermediate buffers
for each of the 8 data lines are in the blocks labeled OUTPUT DECODE in Fig. 9. Output
data from the intermediate buffers to the tristate drivers is routed down the 3 vertical
wiring channels and across the bottom horizontal channel.
G. TESTING
A pre-programming test capability was incorporated into the design of the ATL078
for chips that are processed to be laser programmed. Chips that are programmed at
the mask level require no special test circuitry and may be treated as normal memory
chips for test purposes.
Fig. 9. ATL078 block layout.
Chips processed for laser programming have two metal links for each memory bit,
one link tied to ground and one link tied to VDD. As a result,
a direct short circuit between VDD and ground exists, preventing power from being
applied to the chips. This would result in a condition where no testing could be
performed on the chip until it was laser programmed. Because of yield loss,
considerable wasted effort could be expended programming nonfunctional chips.
To allow power to be applied to the chip before laser programming, all of the
vertical VDD buses in the 64-by-8 memory slices have been isolated from the
remainder of the VDD bus structure. These vertical VDD buses, which supply
programmed VDD to the memory elements, are electrically common and are brought
to a common input test pad physically located between pads 21 and 22.
With the test pad floating or tied to ground, power can be applied to the ATL078
chip while all the programming links are still intact. This will allow all memory
address locations to be accessed while leakage is monitored. Normally a logic "0"
will be accessed from all memory elements, thus giving logic "1" output data on
all 8 outputs. Should an address decoder, an NMOS memory element, or a data
out multiplexer show a failure, the pull-up device(s) on the data output line(s) will
pull up the internal data bus(es) and drive the tristate output(s) to a logic "0".
When the chip is to be operated normally, the test pad can be tied to VDD.
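A pre-programming screen based on this behavior could look like the sketch below. It assumes a hypothetical tester hook, `read_word(address)`, that returns the 8 output bits as an integer; no such interface is described in the report. The two simulated chips exist only to exercise the function.

```python
from typing import Callable, List

EXPECTED_UNPROGRAMMED = 0xFF  # logic "1" on all 8 outputs with links intact


def screen_unprogrammed_chip(read_word: Callable[[int], int]) -> List[int]:
    """Return the addresses whose outputs are not all logic 1.
    `read_word` is a hypothetical tester interface."""
    return [address for address in range(512)
            if (read_word(address) & 0xFF) != EXPECTED_UNPROGRAMMED]


# Simulated devices (illustrative only).
def good_chip(address: int) -> int:
    return 0xFF


def faulty_chip(address: int) -> int:
    # One failing bit at a single address drives that output to logic 0.
    return 0xF7 if address == 138 else 0xFF


print(screen_unprogrammed_chip(good_chip))    # no failing addresses
print(screen_unprogrammed_chip(faulty_chip))  # the single failing address
```

A chip that returns an empty list would be a candidate for laser programming; leakage monitoring, also mentioned above, is outside the scope of this sketch.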
H. SIMULATION
The entire critical data path of the ATL078 was simulated using a computer-aided
simulation program. Simulation was heavily used in design and layout of the chip.
Validation of transistor sizes and interconnect approaches was performed before they
were incorporated into the chip.
The critical data path as shown in Fig. 7 was simulated in four pieces at a 10 V
operating voltage. With a 15 pF load on the output, the worst-case access time was
found to be 117 ns. This increased to 126 ns with a 50 pF output load.
Table 2 shows a breakdown of the delays associated with each of the four parts
of the critical access path. In each case, the slowest possible data path is considered.
Figure 7 has the four simulated sections identified as AB, BC, CD and DE.
TABLE 2. SIMULATION OF WORST-CASE ACCESS PATH

Circuitry                                      Delay (ns)

Input buffers to input of 1-of-64,             22
including decoder gate RC

1-of-64 decode, driving NMOS memory            36
matrix, including row-select RC

NMOS memory matrix through multiplexers        42
and sense amplifier

Intermediate amplifier and                     17 (15 pF)     26 (50 pF)
output tristate

Total                                          117 (15 pF)    126 (50 pF)
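The totals in Table 2 are the sums of the four section delays, as the sketch below confirms. Assigning the table rows to sections AB, BC, CD and DE in the order listed is an assumption made for illustration; the delay values are those in the table.

```python
# Section delays from Table 2 (ns). Row-to-section mapping is assumed
# to follow the order in which the rows are listed.
FIXED_SECTION_DELAYS_NS = {"AB": 22, "BC": 36, "CD": 42}
OUTPUT_SECTION_DELAY_NS = {15: 17, 50: 26}  # section DE, keyed by load in pF


def total_access_ns(load_pf: int) -> int:
    """Worst-case address access time for a tabulated output load."""
    return sum(FIXED_SECTION_DELAYS_NS.values()) + OUTPUT_SECTION_DELAY_NS[load_pf]


for load in (15, 50):
    print(load, "pF:", total_access_ns(load), "ns")
```

Only the output stage depends on the external load, so the first three sections contribute a fixed 100 ns.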
Section 4
CHIP STATISTICS
A summary of the ATL078 chip statistics is presented in Table 3. The ATL078
is a 4096-bit CMOS SOS ROM organized as 512 words by 8 bits. Each of the 8 outputs
is implemented as tristate logic.
The technology used in laying out the ROM circuitry is the 7-mask SOS silicon
gate technology. Standard design rules were used to implement the circuitry. These
rules reflect a mature, producible process. Provisions were made to program the
ATL078 in either of two ways: either at the metal mask level (level G) during
processing, or by means of a directed laser beam after processing. Use of a laser beam
to program the chip requires cutting metal links over the sapphire substrate.
The total number of MOS transistors used to implement the ATL078 is 8782. The
chip step-and-repeat dimensions are 252 by 257 mils, for a transistor density of 7.4
square mils per transistor. The chip fits in a 24-pin package such as the Metceram
80-0131. Only 23 of the 24 package pins are bonded to the chip. The chip has 24 pads,
23 bonded to the package pins and one used for pre-programming testing. This test pad
is wired to VDD in normal chip operation. The pinout used for this chip is directly
compatible with the Intel 3604L-6 PROM and the 3304AL-6 ROM.
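The density figure can be reproduced from the chip dimensions and transistor count, as in this short sketch. The numbers are those quoted in the text; the variable names are illustrative.

```python
CHIP_WIDTH_MILS = 252
CHIP_HEIGHT_MILS = 257
TRANSISTOR_COUNT = 8782
WORDS, BITS_PER_WORD = 512, 8

chip_area_sq_mils = CHIP_WIDTH_MILS * CHIP_HEIGHT_MILS
density_sq_mils_per_transistor = chip_area_sq_mils / TRANSISTOR_COUNT
total_bits = WORDS * BITS_PER_WORD

print(chip_area_sq_mils, "sq. mils")
print(round(density_sq_mils_per_transistor, 1), "sq. mils per transistor")
print(total_bits, "bits")
```

Note that the density is computed over the full step-and-repeat area, so it includes wiring channels, pads, and scribe lines as well as active circuitry.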
Computer simulations were made of the address and chip access times using the
worst-case data path for each. The simulations were made at 25°C and at an operating
voltage of 10 V. At a 15 pF external load, the chip-select access time is 35 ns,
increasing 0.25 ns per pF as the load increases. The address access time was simulated
as 117 ns at a 15 pF load. This delay also increases 0.25 ns per pF with additional
external loading so that at 50-pF load the delay is 126 ns. The cycle time is identical to
the worst-case address access time since the chip design uses only static logic.
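The load dependence quoted here is linear, so the 50 pF figures can be reproduced with the sketch below. The 15 pF base values and the 0.25 ns/pF slope are from the text; applying the same slope at loads other than those quoted is an extrapolation assumed for illustration.

```python
BASE_LOAD_PF = 15.0
SLOPE_NS_PER_PF = 0.25


def access_time_ns(base_ns_at_15pf: float, load_pf: float) -> float:
    """Access time at `load_pf`, scaled linearly from the 15 pF value."""
    return base_ns_at_15pf + SLOPE_NS_PER_PF * (load_pf - BASE_LOAD_PF)


address_access_50pf = access_time_ns(117.0, 50.0)
chip_select_access_50pf = access_time_ns(35.0, 50.0)

print("address access at 50 pF:", round(address_access_50pf), "ns")
print("chip-select access at 50 pF:", round(chip_select_access_50pf), "ns")
```

The unrounded results are 125.75 ns and 43.75 ns, which round to the 126 ns and 44 ns values reported for a 50 pF load.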
Static power dissipation of the ATL078 is the sum of the standard CMOS leakage
(across Off transistors) and the leakage contributed by the PMOS pull-up devices on
the 8 data output lines. At 10 V, the PMOS pull-up leakage can vary from 0 to 24 mW.
The normal CMOS Off transistor leakage should be 1 to 5 mW so that the total typical
static leakage should be approximately 14 mW. A worst-case data condition could
require 29 mW of static power, while a best-case dissipation could be as low as 1 mW.
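The best-case and worst-case static figures are the sums of the two ranges given in this paragraph, as sketched below. The report does not state how the 14 mW typical value was derived; the last two lines only show that it is consistent with half of the pull-ups conducting, which is an assumption made here for illustration.

```python
PULLUP_RANGE_MW = (0.0, 24.0)   # PMOS pull-up dissipation at 10 V
LEAKAGE_RANGE_MW = (1.0, 5.0)   # CMOS Off transistor leakage

best_case_mw = PULLUP_RANGE_MW[0] + LEAKAGE_RANGE_MW[0]
worst_case_mw = PULLUP_RANGE_MW[1] + LEAKAGE_RANGE_MW[1]

# ASSUMPTION: half of the 8 pull-ups conducting in the typical case.
TYPICAL_QUOTED_MW = 14.0
implied_typical_leakage_mw = TYPICAL_QUOTED_MW - 0.5 * PULLUP_RANGE_MW[1]

print("best case:", best_case_mw, "mW")
print("worst case:", worst_case_mw, "mW")
print("implied typical leakage:", implied_typical_leakage_mw, "mW")
```

Under that assumption the implied leakage falls inside the 1 to 5 mW range quoted for CMOS leakage.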
TABLE 3. ATL078 CHIP STATISTICS

Parameter                        Value or Characteristic

Number of Bits                   4096
Organization                     512 x 8
Outputs                          Tristate
Technology                       CMOS SOS, Standard Design Rules
Number of Process Masks          7
Programming Options              Metal Mask or Laser
Chip Size                        252 x 257 mils
Number of Transistors            8782
Density                          7.4 sq. mils/transistor
Number of I/O Leads              23 (uses 24-pin package)
Package                          Metceram 80-0131
Pin Compatibility                Intel 3604L-6, 3304AL-6
Address Access Time, WC*         117 ns at 15 pF load
  (10 V, 25°C)                   126 ns at 50 pF load
Chip Select Access Time, WC*     35 ns at 15 pF load
  (10 V, 25°C)                   44 ns at 50 pF load
Cycle Time                       Same as Address Access

Power (10 V)                     Typical          Worst Case
  Static                         14 mW            29 mW
  Dynamic, no load               25 mW/MHz        40 mW/MHz

*WC = worst case
The dynamic dissipation on-chip is a sum of the CV²f losses. The principal
contributors to on-chip capacitance are the gate capacitance and the interconnect
overlap capacitance. Summing this capacitance produced a typical calculated dynamic
dissipation of 25 mW per MHz at 10 V. This assumes that 5 of the 9 address lines
change state on every address update. Statistically this represents a high typical value.
If all 9 address lines changed on every address transition (not a likely case), the
dynamic dissipation would not exceed 40 mW per MHz. The dynamic dissipation was
calculated at no output load since output loading is system dependent and only increases
the dynamic dissipation of the selected chips in a system. With an external loading of
50 pF on each of the 8 outputs of the chip, a typical dynamic dissipation of 20 mW per
MHz would be expected (4 outputs changing). With all 8 outputs changing with every
address change, a worst-case dynamic dissipation of 40 mW per MHz could be
attributed to external loading.
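The external-load figures follow from the CV²f relation with the report's convention that each changing output contributes C·V² per address change. The sketch below reproduces them; the function name and default arguments are illustrative assumptions.

```python
def external_load_mw_per_mhz(outputs_switching: int,
                             load_pf: float = 50.0,
                             supply_v: float = 10.0) -> float:
    """Dynamic dissipation (mW per MHz) attributed to external loading,
    using P = N * C * V^2 * f as in the text."""
    capacitance_f = load_pf * 1e-12
    frequency_hz = 1e6  # per MHz
    watts = outputs_switching * capacitance_f * supply_v ** 2 * frequency_hz
    return watts * 1e3  # W -> mW


typical = external_load_mw_per_mhz(4)      # 4 outputs changing
worst_case = external_load_mw_per_mhz(8)   # all 8 outputs changing

print("typical:", round(typical, 3), "mW/MHz")
print("worst case:", round(worst_case, 3), "mW/MHz")
```

Each 50 pF output switching at 10 V contributes 5 mW per MHz, which gives the 20 mW and 40 mW per MHz values quoted above.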
Section 5
CONCLUSIONS
The design of this chip fills a void left by available ROM chip types: that of a
high-speed, low-power ROM that interfaces directly to CMOS systems.
The results of this 4K ROM development have produced a chip whose configuration,
pinout, programming options, speed, and power meet the design objectives.
This 512-by-8 chip, designated the ATL078, has a pinout compatible with the
Intel 3604L-6 PROM or the 3304AL-6 ROM. Programming is done at the metal
mask level or by directed laser beam after processing. The ATL078 operating at
10 V is speed compatible to bipolar ROM chips at the system level, having a fast
chip select access of 44 ns. A worst-case address access takes 126 ns.
At the system level, the power savings of the ATL078 compared to bipolar ROMs
is 10:1 or better. Typical ROM systems should see average chip dissipations of
less than 50 mW per chip, while 500 mW is typical of bipolar systems.
Section 6
RECOMMENDATIONS
The objective of this program was the design of a 4096-bit CMOS-SOS ROM that
could be field programmed by a laser technique. Such a design was completed and
documented. The method of programming by the laser scribing of metal links was
previously demonstrated on several devices. A practical technique for programming
this ROM would involve the modification of the laser scribing or cutting system to
include an x-y step-and-repeat capability, which could be programmed to automatically
scribe the metal links in accordance with the desired bit pattern. The cost of such
systems was estimated as in the $60,000 to $100,000 range. Such a cost would tend
to minimize the number of such programmers and consequently would severely limit
their availability and usability. Therefore, such an approach is not recommended.
The need for a high speed (less than 100 ns access), 4K-bit or higher CMOS-SOS
ROM in the tens of milliwatts range is greater now than ever, especially one that is
reasonably hardened to total dose and dose rate. The CMOS-SOS technology is now
more than capable of providing this rate of access, and basic designs for such a ROM
have already been generated.
Because of the general need for such a low-power, high-speed, radiation-resistant
ROM in various space and air-borne applications, and because both the designs and
technology to produce such a device exist, the design of such a CMOS-SOS ROM chip
is highly recommended.
Appendix
PROGRAMMING LINK LOCATIONS
In order to laser cut the programming links on the 4096 memory bits of the
ATL078, an accurate indication of their location must be given. This appendix
provides that information.
Figure A-1 gives the relative topological layout of all the word and bit locations
on the chip. A proper chip orientation places the chip identifier (ATL078) in the
upper right-hand corner.
An explanation will be given of the information in the lower left hand quadrant
of Fig. A-1. The leftmost and rightmost rectangles are labeled 0→63. These are
the 1-of-64 decoders and the 0→63 indicates the direction of advancing address.
Between the two 1-of-64 decoders is a rectangle with a vertically dashed center line,
representing 64 by 16 bits of memory. The vertical dashing separates the 64-by-16
bits into two 64-by-8 slices. The numbers 320→383 indicate the true addresses
contained in this 64-word slice. Proceeding from the bottom and going upwards,
addresses 320-383 are on the left and addresses 256-319 are on the right. The order
of the output bits is shown in the topmost rectangle of this quadrant. There are 16
numbers in this rectangle, representing two groups of 8 bits each. For example
2-1-3-4 . . . represents output bits O2-O1-O3-O4. The three other quadrants can be
interpreted in similar fashion.
Each bit of memory has two metal links associated with it. One link must be
severed to program the bit. The location of the program links is repeatable in an
array of 2 bits by 2 bits. Figure A-2 shows the repeatable 2-by-2 array. The two
links associated with each bit are labeled "1" and "0". To program a bit so that
the chip output is a high (logic "1") voltage the link labeled "1" must be severed. To
get a low-level chip output, link "0" must be severed. The two links of any one bit
have a horizontal separation of 0.8 mil. The links are 0.4 mil wide. Proceeding
horizontally across a row, the link polarities do not strictly alternate; the polarities
are 1-0-0-1-1-0-0-1- ... and can be seen to be repeatable in groups of 4 links (2 bits).
Every 2 bits is repeatable in 5.0-mil spacings in X, and 3.1-mil spacings in Y.
The location of the bottom edge of the "1" link (on its center-line) for address 320
output bit O2 is X50.9 Y16.2. This references a datum 0 at the intersection of the
centerlines of the scribe lines in the lower left of the chip. (Note: to sever link "1",
cut 0.4 mil in Y; to sever link "0", move 0.8 mil in X and cut 0.4 mil in Y.)
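The cutting rule in the note above can be sketched for the reference bit (address 320, output bit O2) as follows. The sketch assumes that the moves are in the positive X and Y directions and that the "0" link lies to the right of the "1" link, which holds only for bits with the 1-0 orientation in the 1-0-0-1 pattern. The function name is hypothetical, and no attempt is made to generate coordinates for other bits.

```python
from typing import Tuple

Point = Tuple[float, float]

LINK_SEPARATION_X_MILS = 0.8   # horizontal separation of a bit's two links
CUT_LENGTH_Y_MILS = 0.4        # length of the cut in Y
REFERENCE_LINK1: Point = (50.9, 16.2)  # address 320, output bit O2


def cut_segment(link1_bottom: Point, program_value: int) -> Tuple[Point, Point]:
    """Return (start, end) of the laser cut that programs one bit.

    program_value 1 -> sever link "1"; 0 -> sever link "0".
    ASSUMPTION: +X and +Y moves, "0" link to the right of the "1" link.
    """
    if program_value not in (0, 1):
        raise ValueError("program_value must be 0 or 1")
    x, y = link1_bottom
    if program_value == 0:
        x += LINK_SEPARATION_X_MILS
    start = (round(x, 3), round(y, 3))
    end = (round(x, 3), round(y + CUT_LENGTH_Y_MILS, 3))
    return start, end


print(cut_segment(REFERENCE_LINK1, 1))
print(cut_segment(REFERENCE_LINK1, 0))
```

A complete programmer would also need the position of each bit within the repeatable 2-by-2 array of Fig. A-2, which is not reproduced in this sketch.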
Fig. A-1. ATL078 word and bit locations.
Fig. A-2. Repeatable 2 x 2 array of program links.
The locations of the lower left "1" links in the other 3 quadrants are as follows:

Address    Bit    Location of "1" Link
128        2      X162   Y16.2
127        2      X50.9  Y138.6
447        2      X162   Y138.6
*U.S. GOVERNMENT PRINTING OFFICE 1977-740049/265 REGION NO. 4
