Monolithic parallel processor, phase 1A by unknown
Final Progress Report
Phase I(a)
MONOLITHIC PARALLEL PROCESSOR
28 January 1970 To 27 September 1971
Contract No. NAS 5-11577
Prepared by
RCA Solid State Division
Somerville, New Jersey
for
Goddard Space Flight Center
Greenbelt, Maryland
REPRODUCED BY
NATIONAL TECHNICAL
INFORMATION SERVICE
U.S. DEPARTMENT OF COMMERCE
SPRINGFIELD, VA. 22161
https://ntrs.nasa.gov/search.jsp?R=19720012530 2020-03-11T19:00:21+00:00Z
Final Progress Report
Phase I(a)
MONOLITHIC PARALLEL PROCESSOR
28 January 1970 To 27 September 1971
Contract No. NAS 5-11577
Goddard Space Flight Center
Contracting Officer: A.L. Essex
Technical Monitor: F. Link
Prepared by
RCA Solid State Division
Somerville, New Jersey
Project Manager: H. Miiller
Project Engineer: A. Dingwall
for
Goddard Space Flight Center
Greenbelt, Maryland
ABSTRACT
A four-bit parallel processor LSI array was designed and fabricated using
COS/MOS integrated-circuit technology. Twenty-five units were delivered to
NASA to demonstrate full achievement of Phase I(a) goals and to show the
applicability of techniques for high-yield processing. The design features
include the provision for interconnecting groups of parallel-processor chips to
form an expanded processor of any desired word length. This 800-transistor
"computer on a chip" circuit has the logic capability of a medium-size,
medium-speed, general-purpose computer suitable for sophisticated scientific data
processing.
The ability to fabricate this device repetitively has now been demonstrated.
TABLE OF CONTENTS
Section
I    THE PARALLEL PROCESSOR
II   PARALLEL PROCESSOR LOGIC DESIGN
     A. General Description of Four-Stage Processor
     B. General Logic Description
     C. Operational Modes
     D. Overflow Detection
     E. Zero Detection
     F. Negative Detection
     G. Conditional Operation
     H. Instruction Repertoire
     I. Serial-Shift Operations
     J. Parallel Commands
     K. Timing
     L. Mode-Independent Switches
     M. Mode-Dependent Switches
     N. Shift Switches
     O. Conditional Operation
     P. Overflow Indicator
     Q. Expansion to 16-Stage Processor
     R. Electrical Performance
III  IMPROVED PHOTOMASK FABRICATION
     A. Limitation in Large-Chip Size Photomask Technology
     B. Automated Photomasking Equipment
     C. Type 1600 10X Reticle Generator
     D. Type 1795 Chrome-Master Photorepeater
     E. Processing
IV   REFERENCES
LIST OF ILLUSTRATIONS
Figure  Title
1       Parallel Processor, Array Chip
2       Terminal Assignment Diagram
3       Logic Diagram
4       Operational Modes
5       Schematic Interconnection of Four Four-Stage Chips
6       ADD Perform Time
7       Digitizer Plotter System Employed to Digitize Parallel Processor Data
8       Type 1600 Automatic Reticle Generator
SECTION I
THE PARALLEL PROCESSOR
A 36-month developmental program has been conducted to design, develop,
and fabricate monolithic complementary-symmetry MOS (COS/MOS) large-scale
parallel processor arrays (1). This task was completed successfully and 25
parallel-processor arrays of 800-transistor complexity were designed, fabricated,
tested, and delivered to the NASA Goddard Space Flight Center.
The major objective of this Phase I(a) project was to fabricate high-
quality photomasks for this "large" chip-size device using computer automated
techniques in order to permit reproducible fabrication of this device with
extremely low standby leakage levels. This task was successfully accomplished.
High performance units of this arithmetic unit can now be repetitively gener-
ated to meet NASA's needs.
The original concept of functional operation of the array was conceived
by R. J. Lesniewski (2-4), at that time with the NASA Goddard Space Flight
Center, as part of a program to realize an ultralow-power computer that re-
quires a minimal number of array types. The logic was extended and tech-
niques for interconnecting groups of chips were implemented by RCA Airborne
Systems Division, Burlington, Massachusetts, and by RCA Solid State Division,
Somerville, New Jersey.
Among the unique design features of the parallel processor is its modu-
larity feature. By using external mode controls, a powerful multifunction
array capability is achieved which allows one chip to be used in a variety of
processing applications, and allows the generation of complete arithmetic
units using a single-array type.
The parallel processor has four-bit arithmetic processing and storage
capabilities with full functional decoding, expansion capability, and time
sharing of data pins. The array has 27 input/output pins with an equivalent
logic complexity of 200 two-input gates implemented on a 146- by 155-mil chip
containing 775 active devices.
The RCA TA5716, a four-bit, COS/MOS parallel processor, is an example of
one building-block of a powerful computer arithmetic unit suitable for use in
low-power, medium-speed applications. Because of its unique mode-controlled
logic, n-bit arithmetic units of 8-, 12-, 16-, and 32-bit lengths can be
constructed by interconnecting several TA5716 processors. Significant design
features of the TA5716 include:
• High reliability COS/MOS circuitry
• Look-ahead-carry for higher speed operation of n-bit arrays
• 16-instruction repertoire
• Single-phase clocking
• Easily expandable to n-bit operation
• Bidirectional data buses to minimize interconnections
• Full instruction decoding on chip
• Fully static operation
• Medium-speed operation: Add time for four-bit = 1.3 microseconds
  (typical), 16-bit = 2 microseconds (typical)
• Low standby power (<10 microwatts typical)
• Low dynamic power (10 milliwatts typical)
• Full military temperature range (-55°C to +125°C)
• High noise immunity: 45 percent of VDD typical over full
  temperature range
• Operation from a single power supply of 3 to 15 volts
• High input impedance: 10^11 ohms typical at 25°C
SECTION II
PARALLEL PROCESSOR LOGIC DESIGN
A. GENERAL DESCRIPTION OF FOUR-STAGE PROCESSOR
Figure 1 is a photograph of the parallel processor chip. The four-stage
parallel processor basically is a four-stage shift register that has both
serial and parallel access. The logic associated with the register allows
parallel-two's complement addition, AND, OR, and EXCLUSIVE OR logic oper-
ations; and right, left, or right-cyclic shifts.
Figure 2 shows the lead requirements for the four-stage processor. All
control lines are encoded, with five leads used for instructions; four leads
for control; one lead for timing; two leads for power; and five leads for the
following conditions:
a. negative indication
b. zero indication
c. overflow indication
d. overflow input/output
e. conditional input
Because information will enter and leave on the same line, six leads are
required for the four-stage register to transfer data, with four of the leads
used for parallel access and the remaining two leads used for serial access.
In addition, four leads are available for expansion to a multiple of four (16
for example) stage processor.
B. GENERAL LOGIC DESCRIPTION
The logic configuration for the four-stage parallel processor is shown in
Figure 3. Functional gating is used where possible to achieve a low device
count. The hardware requirement is about 750 devices and 27 bonding pads.
Figure 1. Parallel Processor, Array Chip
Figure 2. Terminal Assignment Diagram (TA5716, top view, terminals 1 through 28)
Figure 3. Logic Diagram
C. OPERATIONAL MODES
For this discussion mode is defined as the ability of the parallel pro-
cessor to control the transfer of either serial data or carries due to arith-
metic operations. The parallel processor will be capable of operation in one
of four modes. For simplicity, consider the parallel processor as a strictly
serial device. Serial data can enter or leave either side of the register.
Since there is only one lead on either side of the register, the serial trans-
fer must be bidirectional. The manner in which modes control the serial-data
lines is as follows:
a. Mode 0 (A, Figure 4) - Data can enter or leave from either side.
b. Mode 1 (B, Figure 4) - Data can enter or leave the register on
the left side during any serial operation.
c. Mode 2 (C, Figure 4) - Data can enter or leave on the right side.
d. Mode 3 (D, Figure 4) - Serial data may neither enter nor leave
the register, regardless of the nature of the serial operation;
furthermore, the register is bypassed electrically, i.e., there
is an electrical bidirectional path from the right serial lead
to the left serial lead. The most-significant (leftmost) bit
is used as the sign bit.
D. OVERFLOW DETECTION
A two's complement overflow is defined as having occurred if the signs of
the two initial words are the same and the sign of the result is different
while performing the ADD instruction.
The parallel processor will be capable of detecting and indicating the
presence or absence of an arithmetic two's complement overflow. Overflows
will be detected and indicated only during operation in Mode 2 or Mode 3. In
either mode, only four instructions (AD, SMZ, SM, and SUB) will have the
potential of causing a two's complement overflow. If an overflow is detected
and stored by a flip flop, only one of the five instructions (AD, SMZ, SM, SUB
or IN) can change the overflow indicator.
Figure 4. Operational Modes (A. Mode 0; B. Mode 1; C. Mode 2; D. Mode 3)
Occurrence of a two's complement overflow is represented by a "1" in the
overflow flip flop. The absence of an overflow is represented by a "0" in
the overflow flip flop. The flip flop will be set to zero or one according
to whether an overflow does not or does occur.
When any one of the three subtraction instructions is used, the sign bit
of the data being subtracted will be complemented, and this value will be
used in the same manner as one of the initial signs (as in the add instruction)
to detect overflows.
If an overflow occurs, the final sign will be one's complemented. This
means that the final sign returns to the same polarity as the original sign.
The overflow flip flop will be updated at the same time that the new re-
sult is stored in the parallel processor.
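The sign-comparison rule above can be sketched in a few lines of Python. This is only an illustrative model, not the chip's gate-level logic: the function name `add4_with_overflow` and its arguments are hypothetical, and the model assumes a stand-alone four-bit word in which subtraction is carried out by adding the one's complement of the data plus a forced carry of 1.

```python
MASK = 0xF   # four-bit word
SIGN = 0x8   # most-significant bit is the sign bit

def add4_with_overflow(reg, data, subtract=False):
    """Return (stored_result, overflow_flag) for a 4-bit two's complement op."""
    # For subtraction the data word is complemented, which also complements
    # its sign bit, as the text describes for overflow detection.
    operand = (~data & MASK) if subtract else (data & MASK)
    carry_in = 1 if subtract else 0
    total = (reg + operand + carry_in) & MASK
    # Overflow: the two initial signs agree and the result sign differs.
    same_signs = (reg & SIGN) == (operand & SIGN)
    overflow = same_signs and (total & SIGN) != (reg & SIGN)
    if overflow:
        # The final sign is one's complemented, restoring the original sign.
        total ^= SIGN
    return total, int(overflow)
```

For example, adding 0111 and 0001 overflows (7 + 1 does not fit in four signed bits), so the flag is set and the stored sign bit is flipped back to the original polarity.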
E. ZERO DETECTION
The parallel processor will be capable of detecting the condition of all
zeros. This operation will be independent of modes. A condition of all zeros
will be represented by a "1" on the zero indicator line; otherwise, this line
will be zero. If the particular four-bit processor represents the least
significant set of bits, ZI should be tied to +V. ZI on all other parallel
processor arrays should be attached to the previous zero indicator line.
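The chaining of zero indicators can be illustrated with a short Python sketch. It assumes, as a reading of the paragraph above, that each chip reports "1" only when its own four bits are zero and its ZI input is "1"; the names `zero_indicator` and `chained_zero` are hypothetical.

```python
def zero_indicator(register_bits, zi):
    """Zero-indicator output of one four-bit section."""
    return int(zi == 1 and register_bits == 0)

def chained_zero(sections):
    """sections[0] is the least significant chip, whose ZI is tied to +V (1)."""
    line = 1
    for value in sections:
        # Each later chip takes the previous zero indicator as its ZI input.
        line = zero_indicator(value, line)
    return line
```

With this wiring the indicator of the last chip in the chain reflects the whole word: it is "1" only if every section holds 0000.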
F. NEGATIVE DETECTION
The parallel processor will be capable of detecting the presence of a
negative number. This operation is independent of modes. If the condition
is true, a "1" will appear on the negative indicator line; otherwise, a "0"
will appear. A "1" in the most-significant bit position will indicate a nega-
tive representation.
G. CONDITIONAL OPERATION
Once the instruction and mode have been applied, only the clock pulse will
be required to change the state of the register. If this pulse could be
inhibited in the ON condition, all instructions would behave as a NO-OP.
The clock pulse can be constrained by using "conditions." A conditional
input (C) is compared with a control line (B), and a second control line (A)
defines whether or not to test the conditional input line (C). An instruction
will be permitted to operate under the following conditions:
a. Unconditional
b. The conditional input is positive
c. The conditional input is negative
H. INSTRUCTION REPERTOIRE
Four encoded lines will be used to represent 16 instructions. A fifth
line will be used solely to represent an OUT command. Encoded instructions
will be as follows:
NO-OP
Left shift
Right shift
Rotate (cycle) right
Input
Subtract from memory (SM)
Count up
Count down
Clear to zero
Set to one
AND
OR
EXCLUSIVE OR
Subtract from zero (SMZ)
Add (AD)
Subtract (SUB)
I. SERIAL-SHIFT OPERATIONS
a. Rotate (cycle) right - This operation is internal. The contents of
the register will shift to the right, cyclic fashion, with the left-
most stage accepting data from the rightmost stage, regardless of
mode. Data may leave the register serially on the right data line
only while the register is in Mode 2 or Mode 0. Data may leave the
left data line serially while in Mode 1 or Mode 0.
b. Right shift - The contents of the register generally will shift to the
right under the following conditions:
(1) In Mode 0, data may enter serially on the left data line, shift
through the register, and leave on the right data line.
(2) In Mode 1, data may enter serially on the left data line. The
right data line effectively will be open-circuited.
(3) In Mode 2, data may leave serially on the right data line. The
left data line effectively will be open-circuited. Vacant spaces
will be filled with zeros.
(4) In Mode 3, serial data neither may enter nor leave the register;
however, the contents will shift to the right, and vacated places
will be filled with zeros.
c. Left shift - The contents of the register generally will shift to the
left under the following conditions:
(1) In Mode 0, data may enter the right data line, shift through the
register, and leave on the left data line.
(2) In Mode 1, data may leave serially on the left data line. The
right data line effectively will be open-circuited. All vacant
positions will be filled with zeros.
(3) In Mode 2, data may enter serially on the right data line. The
left data line effectively will be open-circuited.
(4) In Mode 3, data neither may enter nor leave the register; however,
the contents will shift to the left, and vacated places will be
filled with zeros.
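The mode rules for the right and left shifts can be summarized in a small behavioral sketch. It is a model under stated assumptions rather than the actual circuit: bit 3 is taken as the leftmost stage, an open-circuited exit line is represented by `None`, and the function names `shift` and `rotate_right` are hypothetical. The serial-output behavior of the rotate instruction is not modeled.

```python
def shift(reg, direction, mode, serial_in=0):
    """Return (new_register, serial_out) for one shift of a 4-bit register."""
    left_open = mode in (2, 3)    # left data line open-circuited
    right_open = mode in (1, 3)   # right data line open-circuited
    if direction == "right":
        incoming = 0 if left_open else (serial_in & 1)   # vacated place gets 0
        serial_out = None if right_open else (reg & 1)
        new = (incoming << 3) | (reg >> 1)
    elif direction == "left":
        incoming = 0 if right_open else (serial_in & 1)
        serial_out = None if left_open else ((reg >> 3) & 1)
        new = ((reg << 1) & 0xF) | incoming
    else:
        raise ValueError("direction must be 'right' or 'left'")
    return new, serial_out

def rotate_right(reg):
    """Cyclic right shift: leftmost stage takes the rightmost bit in any mode."""
    return ((reg & 1) << 3) | (reg >> 1)
```

For instance, a right shift of 1011 in Mode 0 with a serial input of 1 gives 1101 and puts the old rightmost bit on the right data line, while the same shift in Mode 3 gives 0101 with nothing entering or leaving.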
J. PARALLEL COMMANDS
a. CLEAR - sets register to zero.
b. SET - sets register to all ones.
c. OR - processes contents of register with value on parallel-data lines
in a logical OR function.
d. AND - processes contents of register with value on parallel-data lines
in a logical AND function.
e. EXCLUSIVE OR - processes contents of register with value on parallel-
data lines in a logical EXCLUSIVE OR function.
f. IN - loads value on parallel-data lines into register.
g. OUT - outputs contents of register on parallel-data lines.
h. SUB:
(1) In Mode 1, adds to the contents of the register the two's comple-
ment of whatever is on the parallel-data lines. Generated
carries may leave on the left serial line. The overflow indicator
is not altered.
(2) In Mode 2, adds to the contents of the register the one's comple-
ment of whatever is on the parallel-data lines. Carries may
enter on the right serial line but may not leave on the left
data line. The absence or presence of an overflow is registered.
(3) In Mode 0, same as Mode 2, except carries may leave on the left
data line. The overflow indicator is not altered.
(4) In Mode 3, same as Mode 1, except carries may not leave on the
left data line. The absence or presence of an overflow is
registered.
i. COUNT UP:
(1) In Mode 1, internally adds one to the contents of the register
and permits any resulting carry to leave on the left serial-data
line. No data enters or leaves either the parallel lines or the
right serial line.
(2) In Mode 2, adds to the contents of the register whatever is on
the right serial-data line. No data enters or leaves either the
parallel lines or the left serial line.
(3) In Mode 0, adds to the contents of the register whatever is on
the right serial line and permits any resulting carry to leave
on the left data line. No data enters or leaves the parallel
lines.
(4) In Mode 3, internally adds one to the contents of the register.
No data enters or leaves the register on any serial-data or
parallel-data line.
j. COUNT DOWN:
(1) In Mode 1, internally subtracts one from the contents of the
register and permits any resulting carry to leave on the left
serial-data line. No data enters or leaves either the parallel
lines or the right serial line.
(2) In Mode 2, subtracts one from the contents of the register and
adds to this result whatever is on the right serial-data line.
No data enters or leaves the parallel lines or the left data line.
(3) In Mode 0, subtracts one from the contents of the register and
adds to this result whatever is on the right serial-data line and
permits any resulting carry to leave on the left data line. No
data enters or leaves the parallel lines.
(4) In Mode 3, internally subtracts one from the contents of the
register. No data enters or leaves either the parallel lines or
the serial lines.
k. AD:
(1) In Mode 1, adds the contents of the register to whatever is on
the parallel-data lines and allows any resulting carry to leave
on the left data line. The right serial-data line is open-cir-
cuited. The overflow indicator is not altered.
(2) In Mode 2, adds the contents of the register to whatever is on
the parallel-data lines and the right serial-data line. Any over-
flows will set the overflow indicator. The left serial-data
line is open-circuited. The absence or presence of an overflow
is registered.
(3) In Mode 0, adds the contents of the register to whatever is on
the parallel-data lines and the right serial-data line. Any
resulting carry may leave on the left serial-data line. The
overflow indicator is not altered.
(4) In Mode 3, adds contents of the register to whatever is on the
parallel-data line. Any resulting carry will set an overflow
indicator. The two serial-data lines are open-circuited. The
absence or presence of an overflow is registered.
l. SM - same operation as AD, except the contents of the register are
two's complemented during addition in Mode 1 and Mode 3. In Mode 0 or
Mode 2, the contents of the register are one's complemented and added
to whatever is on the right serial-data line and the parallel-data
lines. Overflows occurring in Mode 1 or Mode 0 do not alter the overflow
indicator. The presence or absence of overflows is registered
on the overflow indicator in Mode 2 or Mode 3.
m. SMZ:
(1) In Mode 0, one's complements the contents of the register and
adds whatever is on the right serial-data line to the contents of
the register. Any resulting carry may leave the left serial line.
Any overflow will not alter the overflow indicator.
(2) In Mode 1, two's complements the contents of the register and
permits any carry to leave on the serial line. Nothing may enter
the right serial line. Any overflow will not alter the overflow
indicator.
(3) In Mode 2, one's complements the contents of the register and
adds whatever is on the right serial line to the contents of the
register. Carries may not leave the left serial line. The
absence or presence of an overflow will alter the overflow indi-
cator.
(4) In Mode 3, two's complements the contents of the register. Serial
data neither may enter the right serial line nor leave the left
serial line. The overflow indicator will be at zero.
n. NO-OP - The NO-OP condition will inhibit the clock signal before the
D-type flip flops.
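The way the modes steer carries for AD and SUB can be sketched as follows. This is a simplified behavioral model that ignores the overflow indicator; `add_step` and `add8` are hypothetical names, and pairing a Mode 1 low section with a Mode 2 high section to form an eight-bit unit is an assumption drawn from the expansion scheme described later in this section.

```python
def add_step(reg, data, mode, carry_in=0, subtract=False):
    """One AD (or SUB) on a 4-bit section; returns (result, carry_out).

    carry_out is None when the left serial line is open-circuited.
    """
    operand = (~data & 0xF) if subtract else (data & 0xF)
    if mode in (1, 3):
        # Right line inhibited: two's complement needs a forced carry of 1.
        cin = 1 if subtract else 0
    else:
        # Modes 0 and 2: the carry enters on the right serial line.
        cin = carry_in & 1
    total = reg + operand + cin
    carry_out = ((total >> 4) & 1) if mode in (0, 1) else None
    return total & 0xF, carry_out

def add8(x, y, subtract=False):
    """Eight-bit operation built from two sections (low in Mode 1, high in Mode 2)."""
    low, carry = add_step(x & 0xF, y & 0xF, 1, subtract=subtract)
    high, _ = add_step(x >> 4, y >> 4, 2, carry_in=carry, subtract=subtract)
    return (high << 4) | low
```

Here the low section generates the carry internally and passes it out on its left line, and the high section accepts it on its right line, which is the division of labor the Mode 1 and Mode 2 descriptions imply.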
K. TIMING
Transfer of data is accomplished by using a D-type flip flop which
requires one clock pulse to transfer data on the input into the storage
element.
The D-type flip flop consists of two double inverters which may feed back
on themselves through transmission gates providing a stable state. When the
clock is low, transmission gates "1" and "3" are active and gates "2" and "4"
are inactive. This state permits the retention of data by the second inverter
pair while allowing the incoming data to define the state of the first
inverter pair.
When the clock undergoes a low-to-high transition, the states of all trans-
mission gates are changed. During this transition the flip-flop input becomes
isolated and the first inverter pair is stabilized by opening the feedback
transmission gate holding the information which was on the data line. Mean-
while, the second inverter pair loses its feedback and a path is established
from the first stage. For a set of D flip flops in a shift-register configura-
tion, the effect of this transition is to permit the first stage of each flip
flop to store information from the output of the previous flip flop before the
second stage of the flip flops changes due to the new flip-flop input. During
the high-to-low transition, the new data are transferred to the second inverter
pair, in a manner similar to the original transfer, and the normal storage
mode is assumed.
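The two-stage behavior described above can be modeled in Python as a pair of latches. This is a logical sketch only, with no transmission-gate or timing detail; the class `DFlipFlop` and the helper `clock_shift_register` are hypothetical names.

```python
class DFlipFlop:
    """Two inverter pairs: the first follows the input while the clock is low,
    the second takes the first pair's value while the clock is high."""

    def __init__(self):
        self.first = 0
        self.second = 0   # drives the flip-flop output

    def apply(self, clock, d=None):
        if clock == 0:
            self.first = d            # input defines the first pair; second holds
        else:
            self.second = self.first  # input isolated; second follows the first
        return self.second

def clock_shift_register(flops, serial_in):
    """Apply one full clock cycle (low phase, then high phase) to a chain."""
    inputs = [serial_in] + [f.second for f in flops[:-1]]
    for flop, d in zip(flops, inputs):
        flop.apply(0, d)              # every first stage samples the old outputs
    for flop in flops:
        flop.apply(1)                 # then every output updates together
    return [f.second for f in flops]
```

Because every first stage captures its neighbor's old output before any second stage changes, the chain shifts by exactly one position per clock pulse.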
L. MODE-INDEPENDENT SWITCHES
The state of the control lines to the processing logic for the 15 operating
instructions is shown in Table I. True data from the parallel inputs are
gated into the parallel processor when K1 is high. The pertinent equation is
$K_1 = \bar{a}\,\bar{c}\,d + a\,c + c\,\bar{d}$ (1)
Complementary data from the parallel inputs are gated into the processor by
K2 , which is given by
. . . . . . . . . . . . . . . . . . . . . . . . . . . (2 )
True information in the register is gated into the processor by K3, which
is given by
$K_3 = \bar{a}\,\bar{b} + c$ (3)
and the complementary information is gated by K4, where
$K_4 = b\,\bar{c}$ (4)
Control K5 is used to set all ones into the processor for one operand. Control
K5 can be gated through for a SET or can be used in COUNT DOWN. The pertinent
equation is
$K_5 = \bar{a}\,\bar{b}\,\bar{d} + a\,\bar{c}\,\bar{d}$ (5)
The EXCLUSIVE OR can be inhibited when XI is high, which allows the OR operation
to be formed. The switching equation is
$XI = \bar{a} + \bar{d}$ (6)
The IN transmission gate is used to load the register in parallel, where
$IN = a\,b\,\bar{c}\,\bar{d}$ (7)
TABLE I. OPERATIONAL CODE FOR THE PARALLEL PROCESSOR ARRAY

             Operational
             Code     Mode Independent                          Mode Dependent
Instruction  a b c d  K1 K2 K3 K4 K5 XI IN SUM AND OR Ro R  L   RN    RR1   LT1   RR0   LT0   LN    Q     J     CAR   M
Active modes                                                    (2,3) (0,1) (0,1) (0,2) (0,2) (1,3) (3,1) (3,1) (0,2) (0,1)
NOP          0 0 0 0  0  0  1  0  1  1  0  0   0   0  0  0  0   0     0     0     0     0     0     0     0     1     1
AND          0 0 0 1  1  0  1  0  0  1  0  0   1   0  0  0  0   0     0     0     0     0     0     0     0     1     1
CNTD         0 0 1 0  1  0  1  0  1  1  0  1   0   0  0  0  0   0     0     0     0     0     0     1     1     1     1
CNTU         0 0 1 1  0  0  1  0  0  1  0  1   0   0  0  0  0   0     0     0     0     0     0     0     0     1     1
SMZ          0 1 0 0  0  0  0  1  0  1  0  1   0   0  0  0  0   0     0     0     0     0     0     0     0     1     1
SM           0 1 0 1  1  0  0  1  0  1  0  1   0   0  0  0  0   0     0     0     0     0     0     0     0     1     1
AD           0 1 1 0  1  0  1  0  0  1  0  1   0   0  0  0  0   0     0     0     0     0     0     1     1     1     1
SUB          0 1 1 1  0  1  1  0  0  1  0  1   0   0  0  0  0   0     0     0     0     0     0     0     0     1     1
SET          1 0 0 0  0  0  0  0  1  1  0  0   0   1  0  0  0   0     0     0     0     0     0     0     0     0     0
CLEAR        1 0 0 1  0  0  0  0  0  0  0  0   0   1  0  0  0   0     0     0     0     0     0     0     0     0     0
XOR          1 0 1 0  1  0  1  0  0  1  0  0   0   1  0  0  0   0     0     0     0     0     0     1     1     0     0
OR           1 0 1 1  1  0  1  0  0  0  0  0   0   1  0  0  0   0     0     0     0     0     0     0     0     0     0
IN           1 1 0 0  0  0  0  1  1  1  1  0   0   0  0  0  0   0     0     0     0     0     0     0     0     0     0
L            1 1 0 1  0  0  0  1  0  0  0  0   0   0  0  0  1   0     0     1     0     1     1     0     0     0     0
R            1 1 1 0  1  0  1  0  0  1  0  0   0   0  0  1  0   1     1     0     1     0     0     1     1     0     0
Ro           1 1 1 1  1  1  1  0  0  0  0  0   0   0  1  1  0   0     1     0     1     0     0     0     0     0     0
The SUM transmission gate is used to perform all add-type instructions, where
$SUM = \bar{a}\,(b + c)$ (8)
Logic operations are performed by the AND and OR switches, where
$AND = \bar{a}\,\bar{b}\,\bar{c}\,d$ (9)
and
$OR = a\,\bar{b}$ (10)
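As a cross-check on the switching equations, the mode-independent controls can be evaluated for any instruction code. The sketch below is an illustration only: the placement of the complements is an assumption inferred from the 0/1 entries of Table I (the overbars are not legible in every printed equation), K2 is omitted because equation (2) is not recoverable, and the function name `switches` is hypothetical.

```python
def switches(a, b, c, d):
    """Evaluate mode-independent switch controls for instruction bits a, b, c, d."""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d   # complemented instruction lines
    return {
        "K1": (na & nc & d) | (a & c) | (c & nd),   # true parallel data
        "K3": (na & nb) | c,                        # true register contents
        "K4": b & nc,                               # complemented register contents
        "K5": (na & nb & nd) | (a & nc & nd),       # all ones as one operand
        "IN": a & b & nc & nd,                      # parallel load
        "SUM": na & (b | c),                        # add-type instructions
        "AND": na & nb & nc & d,
        "OR": a & nb,
    }

# AD is coded 0110: true data and true register contents feed the adder.
ad = switches(0, 1, 1, 0)
```

For the AD code this gives K1 = K3 = SUM = 1 and all other listed switches 0, which matches the AD row of Table I.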
M. MODE-DEPENDENT SWITCHES
Code definitions for the modes were selected such that there is no need
to decode. The definitions selected are as follows:
Mode   C2   C1
0      0    0
1      0    1
2      1    0
3      1    1
When high or "1," line C1 or C2 indicates which side of the processor is
inhibited. Lead C1 corresponds to the right side; lead C2 corresponds to
the left side.
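Because the mode lines need no decoding, the mapping from (C2, C1) to mode number and inhibited side is direct. A minimal sketch, with the hypothetical helper name `decode_mode`:

```python
def decode_mode(c2, c1):
    """Interpret the two mode-control lines of one four-stage section."""
    return {
        "mode": 2 * c2 + c1,
        "left_inhibited": bool(c2),    # C2 high inhibits the left side
        "right_inhibited": bool(c1),   # C1 high inhibits the right side
    }
```

Mode 3 (both lines high) inhibits both sides, which corresponds to the bypassed register described under Operational Modes.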
Transmission gate Q (Figure 3) is used to force a "1" into the carry of
the least significant adder during a COUNT UP or SUBTRACT operation and during
Mode 1 or Mode 3. The equation is
$Q = \bar{C}_1 + c\,\bar{d}$ (11)
A zero is forced into the carry by J, which is given by
$J = C_1 \cdot c\,\bar{d}$ (12)
During Mode 0 and Mode 2, a carry is brought in from a previous array by the
CAR transmission gate, where
$CAR = \bar{C}_1 \cdot \bar{a}$ (13)
A carry may propagate out during Modes 0 and 1 by using an M, which is given
by
$M = \bar{C}_2 \cdot \bar{a}$ (14)
N. SHIFT SWITCHES
The R and L transmission devices are mode-independent gates used to perform
the right and left shifts. $R_N$, $R_o$, $R_{R,1}$, $R_{R,0}$, $L_{T,1}$, $L_{T,0}$, and $L_N$ are mode-
dependent shift controls. The equations are summarized as follows:
$R_N = C_2 \cdot a\,b\,c\,\bar{d}$ (15)
$R_o = a\,b\,c\,d$ (16)
$R_{R,1} = \bar{C}_2 \cdot a\,b\,c$ (17)
$L_{T,1} = \bar{C}_2 \cdot a\,b\,\bar{c}\,d$ (18)
$R_{R,0} = \bar{C}_1 \cdot a\,b\,c$ (19)
$L_{T,0} = \bar{C}_1 \cdot a\,b\,\bar{c}\,d$ (20)
$L_N = C_1 \cdot a\,b\,\bar{c}\,d$ (21)
$R = a\,b\,c$ (22)
$L = a\,b\,\bar{c}\,d$ (23)
Note: R denotes right, L denotes left
O. CONDITIONAL OPERATION
The clock pulse operates on condition. Three control lines will be defined
to permit conditional instructions. These lines will be labeled A, B, and C.
The following truth table defines interactions among A, B, and C:
C  B  A  Permits Operation
0  0  0  Yes
0  0  1  Yes
0  1  0  Yes
0  1  1  No
1  0  0  Yes
1  0  1  No
1  1  0  Yes
1  1  1  Yes
The truth table reduces to the condition $(\bar{A} + B \cdot C + \bar{B} \cdot \bar{C})$ under which a data-transfer
operation is to take place. This expression combined with the clock pulse
accomplishes the data transfer, provided the processor is not in the NO-OP
state.
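The reduction can be checked mechanically against the truth table. The sketch below encodes the table rows as given and evaluates the reduced expression for each; the names `permits` and `TRUTH_TABLE` are hypothetical.

```python
def permits(c, b, a):
    """Reduced condition: not A, or B and C agree."""
    return int((not a) or (b and c) or ((not b) and (not c)))

# (C, B, A) -> 1 if the operation is permitted, per the truth table.
TRUTH_TABLE = {
    (0, 0, 0): 1, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 1,
}

matches = all(permits(*row) == allowed for row, allowed in TRUTH_TABLE.items())
```

In words: with A low the instruction is unconditional, and with A high it executes only when the conditional input C equals control line B.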
P. OVERFLOW INDICATOR
Switch SIGNA operates and puts the truth sum in the register under either
one of the following conditions: overflows have not been detected; or the
mode of operation and the instruction are such that overflow detection is not
needed. This condition can be summarized by
C2 · ab + C2 · ab + abc (24)
Switch SIGNB operates when an overflow occurs, and a "1" is placed in the
overflow flip flop. The complement of the most significant bit sum output
also is placed in the register; hence,
ab · C2 (25)
A zero is placed in the overflow flip-flop when the following condition is
true:
ab · C2 (26)
The overflow flip flop can only be clocked during (ab · C2) or during the
IN instruction. Data may be entered or removed from the overflow flip flop
on the OVERFLOW I/O line during the IN and OUT commands, respectively.
Q. EXPANSION TO 16-STAGE PROCESSOR
The four-stage parallel processor is designed such that four processors
can be interconnected monolithically to form the 16-stage processor. The
wafer will be diced such that four operating four-stage processors will form a
16-stage processor. The interconnection scheme is shown in Figure 5. In
sections 2 and 3, the C1 and C2 mode controls are tied to ground; therefore,
these sections are held in Mode 0. In section 1, C2 is tied to ground; therefore,
the section can operate only in Mode 0 or Mode 1. In section 4, lead C1
is tied to ground; therefore, this section can operate only in Mode 0 or
Mode 2. The modes for the 16-stage processor are determined by inputs C1 and
C2, as shown.
The bypass leads of sections 1 and 4 are connected as shown in Figure 5.
When C1 and C2 are both "1," indicating Mode 3, the 16-stage register is
bypassed; and the left serial-data line of chip 4 and the right serial-data
line of section 1 are connected.
Leads Ro1 of section 4 and Ro2 of section 1 are connected as shown in
Figure 5 to allow the rotate operation. Ro denotes the rotate (cycle shift)
function.
The ZERO INDICATOR leads are connected as shown. If a "0" occurs in the
16-stage parallel processor, a "1" will appear on the indicator of the leftmost
unit.
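The grounding scheme fixes how the two external mode inputs fan out to the four sections. A minimal sketch follows, assuming that section 1 receives only the external C1 (its C2 grounded) and section 4 receives only the external C2 (its C1 grounded); the function names are hypothetical.

```python
def section_controls(c2, c1):
    """Per-section (C2, C1) levels for the external 16-stage mode inputs."""
    return {
        1: (0, c1),   # section 1: C2 grounded, so Mode 0 or Mode 1 only
        2: (0, 0),    # sections 2 and 3: both grounded, always Mode 0
        3: (0, 0),
        4: (c2, 0),   # section 4: C1 grounded, so Mode 0 or Mode 2 only
    }

def section_modes(c2, c1):
    """Mode number of each section, using mode = 2*C2 + C1."""
    return {n: 2 * s2 + s1 for n, (s2, s1) in section_controls(c2, c1).items()}
```

With both external inputs high (16-stage Mode 3), section 1 runs in Mode 1, sections 2 and 3 in Mode 0, and section 4 in Mode 2, so only the two outer serial lines of the 16-stage register are inhibited.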
R. ELECTRICAL PERFORMANCE
The parallel processor units delivered to NASA met all original electrical
specifications with respect to speed and standby leakage power. At standby,
the current drain was typically less than 1 microampere.
All units were fabricated with a "standard threshold voltage" process,
giving thresholds of approximately 1.9 volts for both n-type and p-type
transistors. As shown in Figure 6, worst-case four-bit add performance at a
10-volt power supply was less than 1 microsecond with most other instructions
faster. This speed performance could be further increased by a factor of
approximately 2 by using a "low" threshold voltage process and a 12-volt supply.
Figure 5. Schematic Interconnection of Four Four-Stage Chips
Figure 6. ADD Perform Time (VDD = 10 V; time scale = 0.5 µs/cm; traces: CLOCK,
OUT LINE, DATA 1, DATA 2, DATA 3, DATA 4, LEFT DATA LINE). Sequence shown:
apply data and IN instruction; on rising edge of clock, load data (1111) in
register; change data to 1000 and apply ADD instruction; load results of ADD
in register; apply OUT signal and observe results of ADD on data lines. The
photograph depicts the minimum ADD perform time for proper circuit operation.
SECTION III
IMPROVED PHOTOMASK FABRICATION
A. LIMITATION IN LARGE-CHIP SIZE PHOTOMASK TECHNOLOGY
During Phase I of this program to fabricate an 800-transistor LSI monolithic
parallel processor COS/MOS chip for NASA, it became apparent that the
major, and unanticipated, technical difficulty was in obtaining large-chip
size, low-defect-level photomasks of good resolution and dimensional fidelity.
These limitations are associated with old piecing techniques of photomask
fabrication for the large chips, using handcut artwork and multiple photo-
graphic reductions.
Because the large-chip size (0.155 inch by 0.146 inch) of the parallel
processor exceeded the maximum useful capability of the reducing lenses at
that time, it became necessary to prepare each photomask in four parts and
step-and-repeat each of the quadrants individually. In this process, it was
necessary to insert each reticle "blind" into the photorepeater. It was
observed that the accuracy with which each quadrant could be located
mechanically with respect to its neighbors was such that the desired maximum design
tolerance of 50 microinches could not be consistently maintained, and random
positioning errors greater than 100 microinches within a chip resulted. Such
pattern misregistration cannot be tolerated in the COS/MOS fabrication-sequence
procedure which requires the successive alignment of seven photomasks, each
subject to independent random quadrant location. In three masks (n+, p+, and
metallization), relative alignment is absolutely critical to avoid shorts and/or
to ensure that the gate metal overlaps both source and drain diffusions for
proper MOS transistor operation.
Extraordinary alignment techniques had been needed to deliver the limited
number of parallel processors fabricated under the original Phase I program.
Such techniques obviously were not adequate to support a reproducible process
for more than a limited supply of parallel processor engineering samples. In
order to fabricate additional large-chip COS/MOS devices in a routine manner
successfully, it was necessary to employ better mask fabrication methods.
B. AUTOMATED PHOTOMASKING EQUIPMENT
To fabricate improved parallel-processor photomasks during Phase I(a) of
the program, RCA used technically superior photomask-making equipment not
available to the industry at the start of the original Phase I program. This
equipment consists of a digitizer plotter for digital tape preparation, an
automatic reticle generator, and a chrome-master photorepeater.
Automatic artwork generation required that design information be digitized.
Tapes to automatically draft and generate reticles were prepared from digital
data entered mechanically from a digitizer plotter as indicated in Figure 7.
The redesigned parallel processor incorporated a number of minor improvements
which increased its speed.
C. TYPE 1600 10X RETICLE GENERATOR
Once debugged tapes were available, 10X photographic reticles were generated
by the Mann automatic pattern generator; these 10X reticles are used to
make the final photomasks in one step with a special Mann chrome-master
photorepeater. Figure 8 shows the David W. Mann Company type 1600 reticle generator
with a PDP-8/S central control digital computer. Photomasks fabricated with
this equipment at RCA have shown both superb image detail and dimensional fi-
delity, and represent a very significant advance in photomask art.
Use of this equipment eliminated previous photomask problems, which were
observed in the Phase I parallel-processor program. Here, photographic re-
ductions of pieced 500X artwork presented major technical difficulties in
image definition, distortion, registration within masks, corner rounding,
dimensional control, and timely delivery of masks. The ability to fabricate 10X
reticles directly solved these problems.
Figure 7. Digitizer Plotter System Employed to Digitize Parallel Processor Data
Figure 8. Type 1600 Automatic Reticle Generator
The type 1600 pattern generator is a fully automatic, computer-directed,
highly accurate and reliable system for producing 10X reticles without
intermediate artwork generation and reduction. Digital input tape controls all
automatic functions. Input data on the nine-channel magnetic tape includes X
and Y coordinates of the center of exposure, and height and width dimensions
of the rectangular exposure. Height and width of the area exposed in a single
flash on the 10X pattern may be varied in 240 discrete steps from 0.5 to 120
mils, a total of 57,600 sizes. The microset scale for both the scanning and
stepover axes assures positional precision of ± 0.00001 inch on the 10X reticle
up to 3 by 3 inches. The guaranteed reticle resolution is superb and more
than adequate for the parallel processor, i.e., 650 line pairs per millimeter
over the entire circuit pattern area.
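The aperture and precision figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes the 240 settings are evenly spaced and that positional error scales linearly with the 10:1 reduction from reticle to final mask; neither assumption is stated in the report, and the variable names are illustrative only.

```python
# Figures quoted in the text for the type 1600 pattern generator.
MIN_APERTURE_MILS = 0.5    # smallest flash dimension on the 10X reticle
MAX_APERTURE_MILS = 120.0  # largest flash dimension on the 10X reticle
STEPS_PER_AXIS = 240       # discrete settings for height and for width
REDUCTION = 10             # 10X reticle reduced to a 1X mask

# Assumption: the 240 settings are evenly spaced, implying 0.5-mil steps.
step_mils = MAX_APERTURE_MILS / STEPS_PER_AXIS

# Height and width are set independently, so the counts multiply.
aperture_sizes = STEPS_PER_AXIS * STEPS_PER_AXIS

# Assumption: positional error scales linearly with the 10:1 reduction.
precision_10x_microinch = 0.00001 * 1e6
precision_1x_microinch = precision_10x_microinch / REDUCTION

print(step_mils, aperture_sizes, round(precision_1x_microinch, 3))
```

Under these assumptions the 57,600 figure is simply 240 x 240, and the quoted reticle precision corresponds to roughly 1 microinch on the final mask, well inside the 50-microinch design tolerance discussed in Section III.A.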
D. TYPE 1795 CHROME-MASTER PHOTOREPEATER
Photomasks for this project were prepared from the 10X reticles using a
new David W. Mann type 1795, six-head, chrome-master photorepeater. RCA
received the first model available, and the parallel processor was one of the
first projects to utilize this equipment.
The major technical advantages of this equipment are the generation of
durable, ultralow-defect level, chromium masters with superior edge acuity;
and the exceptionally large, guaranteed chip size (0.250 inch by 0.250 inch)
over which precision control of image resolution and dimensional accuracy can
be maintained.
The importance of ultralow photomask defect levels in an 800-transistor
LSI array is clear. A single imperfection in any of the seven photomasks used
in the COS/MOS fabrication sequence will cause the chip in which the defect
falls to be inoperative. The enhanced durability and freedom from pinholes of
chrome masters will minimize imperfections and ensure that pattern damage does
not occur when making contact prints from the master.
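The effect of requiring all seven masks to be defect-free over the chip can be illustrated with a simple yield estimate. The sketch below uses a Poisson defect model, which is an assumption brought in for illustration and not a model given in the report; the function name mask_limited_yield and the defect densities are hypothetical.

```python
import math

CHIP_AREA_SQ_IN = 0.155 * 0.146   # chip size quoted in the report
MASK_COUNT = 7                    # photomasks in the COS/MOS sequence


def mask_limited_yield(defects_per_sq_in,
                       chip_area=CHIP_AREA_SQ_IN,
                       masks=MASK_COUNT):
    """Fraction of chips with no defect in any mask.

    Assumption: defects on each mask are independent and Poisson
    distributed, so one mask is clean over the chip with probability
    exp(-D * A), and all masks are clean with that value to the power
    of the mask count.
    """
    per_mask = math.exp(-defects_per_sq_in * chip_area)
    return per_mask ** masks


# Hypothetical defect densities (defects per square inch per mask).
for density in (1.0, 5.0, 20.0):
    print(density, round(mask_limited_yield(density), 3))
```

Because the per-mask probabilities multiply, a defect density that looks tolerable for a single mask compounds quickly over seven levels on a chip of this size.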
The wide 0.25-inch field of the improved lenses used in the 1795
photorepeater was also of major significance in this program. The maximum chip
area that could be usefully photorepeated with older equipment for the
parallel-processor program was less than 0.120 inch by 0.120 inch. This area
was smaller than the parallel-processor chip size (0.155 inch by 0.146 inch).
In the previous program, it was necessary to attempt to piece final images by
multiple photorepeater runs with resulting registration problems. This class
of problem was avoided with the new equipment since the new parallel-processor
chip size was well within the limits of the 1795 photorepeater.
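The field-size comparison above can be made concrete by counting how many separately exposed pieces a chip needs for a given useful field. The helper reticle_parts below is a hypothetical illustration that assumes simple rectangular tiling with no overlap between pieces; the older field is taken at its quoted upper bound of 0.120 inch.

```python
import math


def reticle_parts(chip_w, chip_h, field_w, field_h):
    """Number of rectangular pieces needed to cover the chip.

    Assumption: pieces tile the chip on a simple grid with no overlap.
    All dimensions are in inches.
    """
    cols = math.ceil(chip_w / field_w)
    rows = math.ceil(chip_h / field_h)
    return cols * rows


CHIP = (0.155, 0.146)          # parallel-processor chip size
OLD_FIELD = (0.120, 0.120)     # upper bound quoted for older equipment
NEW_FIELD = (0.250, 0.250)     # guaranteed field of the 1795 photorepeater

print(reticle_parts(*CHIP, *OLD_FIELD))   # 4
print(reticle_parts(*CHIP, *NEW_FIELD))   # 1
```

The count of four for the older field agrees with the four-quadrant masks described in Section III.A, while the 1795 field covers the chip in a single exposure.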
E. PROCESSING
As anticipated, the improved photomasks made it possible to fabricate the
parallel processor reproducibly. Yield was observed in the first wafer
processed in a high-yield line, and 25 low-leakage units were delivered to
NASA, fulfilling all delivery and performance specifications on this
project.
SECTION IV
REFERENCES
1. A.A. Alaspa and A.G.F. Dingwall, "COS/MOS Parallel Processor Array,"
IEEE J. Solid-State Circuits, Vol. SC-5, No. 5, October 1970.
2. R.J. Lesniewski, "A Large Scale Integration (LSI) Computer Concept
Utilizing only Five Types of General Purpose Digital Arrays," Master of
Science Thesis, University of Maryland, 1970.
3. R.J. Lesniewski and F.J. Link, "A Complementary-MOS Space Craft Data
Handling System," Proc. GOMAC Conference, 1969.
4. R.J. Lesniewski and D.H. Schaefer, "Goddard Space Flight Center Ultra
Low Power Computer Development Program," Proc. Third NASA Intracenter
Microelectronics Conference, February 1968.
