Monolithic parallel processor  Progress report, 28 Jun. - 27 Dec. 1968 by unknown
General Disclaimer 
One or more of the Following Statements may affect this Document 
 
 This document has been reproduced from the best copy furnished by the 
organizational source. It is being released in the interest of making available as 
much information as possible. 
 
 This document may contain data, which exceeds the sheet parameters. It was 
furnished in this condition by the organizational source and is the best copy 
available. 
 
 This document may contain tone-on-tone or color graphs, charts and/or pictures, 
which have been reproduced in black and white. 
 
 This document is paginated as submitted by the original source. 
 
 Portions of this document are not fully legible due to the historical nature of some 
of the material. However, it is the best reproduction available from the original 
submission. 
 
 
 
 
 
 
 
Produced by the NASA Center for Aerospace Information (CASI) 
https://ntrs.nasa.gov/search.jsp?R=19690019909 2020-03-12T03:14:36+00:00Z
for
GODDARD SPACE FLIGHT CENTER
Greenbelt, Maryland
PERIODIC PROGRESS REPORT
MONOLITHIC PARALLEL PROCESSOR
(28 JUNE 1968 TO 27 DECEMBER 1968)
CONTRACT NO. NAS5-11511
Prepared by
RCA ELECTRONIC COMPONENTS
Somerville, New Jersey
PERIODIC PROGRESS REPORT
MONOLITHIC PARALLEL PROCESSOR
(28 JUNE 1968 TO 27 DECEMBER 1968)
CONTRACT NO. NAS5-11577
GODDARD SPACE FLIGHT CENTER
CONTRACTING OFFICE
	 A.L. ESSEX
TECHNICAL MONITOR- R.J. LESNIEWSKI
PREPARED BY
RCA ELECTRONIC COMPONENTS
Somerville, New Jersey
PROJECT MANAGER: J. HUBRAND
PROJECT ENGINEER- A. DINGWALL
for
GODDARD SPACE FLIGHT CENTER
GREENBELT, MARYLAND
ABSTRACT
A four-bit parallel processor has been designed. The logic design
conforms to the contract objectives. The array of 750 devices has been laid
out on a chip 145 mils by 155 mils. A composite drawing was completed and
rubyliths were prepared. Extensive checking of the rubyliths (with the
assistance of R. Lesniewski of NASA) was used to reduce the probability of
error in the layout. A test matrix was developed to permit testing of the
processing and control portions of the parallel processor.
TABLE OF CONTENTS
Section                                                            Page
I     SUMMARY OF PROGRAM STATUS . . . . . . . . . . . . . . . . .    1
      A. Introduction . . . . . . . . . . . . . . . . . . . . . .    1
      B. Program Objectives . . . . . . . . . . . . . . . . . . .    2
      C. Program Status . . . . . . . . . . . . . . . . . . . . .    4
II    PARALLEL PROCESSOR LOGIC DESIGN . . . . . . . . . . . . . .    5
      A. General Description of Four-Stage Processor  . . . . . .    5
      B. General Logic Description  . . . . . . . . . . . . . . .    5
      C. Operational Modes  . . . . . . . . . . . . . . . . . . .   15
      D. Overflow Detection . . . . . . . . . . . . . . . . . . .   15
      E. Zero Detection . . . . . . . . . . . . . . . . . . . . .   17
      F. Negative Detection . . . . . . . . . . . . . . . . . . .   17
      G. Conditional Operation  . . . . . . . . . . . . . . . . .   18
      H. Instruction Repertoire . . . . . . . . . . . . . . . . .   18
      I. Serial-Shift Operations  . . . . . . . . . . . . . . . .   19
      J. Parallel Commands  . . . . . . . . . . . . . . . . . . .   20
      K. Timing . . . . . . . . . . . . . . . . . . . . . . . . .   23
      L. Mode-Independent Switches  . . . . . . . . . . . . . . .   24
      M. Mode-Dependent Switches  . . . . . . . . . . . . . . . .   26
      N. Shift Switches . . . . . . . . . . . . . . . . . . . . .   27
      O. Conditional Operation  . . . . . . . . . . . . . . . . .   27
      P. Overflow Indicator . . . . . . . . . . . . . . . . . . .   28
      Q. Expansion to 16-Stage Processor  . . . . . . . . . . . .   29
III   BREADBOARD AND COMPUTER STUDIES . . . . . . . . . . . . . .   31
      A. Summary  . . . . . . . . . . . . . . . . . . . . . . . .   31
      B. Details  . . . . . . . . . . . . . . . . . . . . . . . .   31
      C. Computer Studies of Switching Speed  . . . . . . . . . .   34
IV    COS/MOS FUNCTIONAL LOGIC GATING . . . . . . . . . . . . . .   47
      A. General Discussion . . . . . . . . . . . . . . . . . . .   47
      B. Translation Procedure  . . . . . . . . . . . . . . . . .   50
V     DESIGN AND LAYOUT . . . . . . . . . . . . . . . . . . . . .   57
      A. Introduction . . . . . . . . . . . . . . . . . . . . . .   57
      B. Layout Design Considerations and Implementation  . . . .   60
      C. EXCLUSIVE OR . . . . . . . . . . . . . . . . . . . . . .   62
VI    PROCESSING  . . . . . . . . . . . . . . . . . . . . . . . .   75
      A. Introduction . . . . . . . . . . . . . . . . . . . . . .   75
      B. COS/MOS Fabrication Process  . . . . . . . . . . . . . .   76
      C. Capacitance-Voltage Curves . . . . . . . . . . . . . . .   78
VII   TESTING . . . . . . . . . . . . . . . . . . . . . . . . . .   81
      A. Introduction . . . . . . . . . . . . . . . . . . . . . .   81
      B. Initial Design Fault Isolation . . . . . . . . . . . . .   81
      C. Production Test Procedures . . . . . . . . . . . . . . .   81
      D. Parallel Processor Test Procedure  . . . . . . . . . . .   85
VIII  CONCLUSIONS AND RECOMMENDATIONS . . . . . . . . . . . . . .   91
IX    PROGRAM FOR NEXT INTERVAL . . . . . . . . . . . . . . . . .   93
LIST OF ILLUSTRATIONS
Figure  Title                                                      Page
1   Four-Stage Processor, Lead Requirements . . . . . . . . . . .    6
2   Logic Diagram . . . . . . . . . . . . . . . . . . . . . . . .    7
3   Implementation of Basic Gates in the Array Design Logic . . .    8
4   Operational Modes . . . . . . . . . . . . . . . . . . . . . .   16
5   Schematic Interconnection of Four Four-Stage Chips  . . . . .   30
6   Breadboard of the Processor . . . . . . . . . . . . . . . . .   32
7   Construction of Functional Gates Using the CD4007D as the
    Basic Building Block  . . . . . . . . . . . . . . . . . . . .   33
8   Input and Output Waveforms for Control Section  . . . . . . .   35
9   Zero Indicator  . . . . . . . . . . . . . . . . . . . . . . .   36
10  Clock Circuit . . . . . . . . . . . . . . . . . . . . . . . .   37
11  Typical Operating Waveforms . . . . . . . . . . . . . . . . .   38
12  Example of Form Used to Enter Data Into the Computer  . . . .   41
13  Computer Calculated Switching Times . . . . . . . . . . . . .   45
14  Building Block  . . . . . . . . . . . . . . . . . . . . . . .   48
15  Functional Gating . . . . . . . . . . . . . . . . . . . . . .   49
16  Layout of the Parallel Processor  . . . . . . . . . . . . . .   58
17  Lead Configuration for the Four-Bit Slice Using 28-Lead
    Flat Pack . . . . . . . . . . . . . . . . . . . . . . . . . .   59
18  Parallel Processor Functional Layout  . . . . . . . . . . . .   61
19  A Functional Logic Block: D Flip Flop . . . . . . . . . . . .   63
20  D Flip-Flop Logic Schematic . . . . . . . . . . . . . . . . .   64
21  Transmission Gate and Inverter Circuit Schematic  . . . . . .   65
22  Transmission Gate Layout  . . . . . . . . . . . . . . . . . .   66
23  Block Diagram Representation for Layout Development of
    D Flip Flop . . . . . . . . . . . . . . . . . . . . . . . . .   67
24  D Flip-Flop Composite Layout  . . . . . . . . . . . . . . . .   68
25  Block Diagram for EXCLUSIVE-OR Layout Development . . . . . .   72
26  EXCLUSIVE-OR Composite Layout . . . . . . . . . . . . . . . .   73
27  COS/MOS Fabrication Process . . . . . . . . . . . . . . . . .   77
28  Capacitance-Voltage Curves for Clean and Impure Oxides  . . .   79
29  Test Requirement for Three-Input NAND Gate with Inputs
    Active High . . . . . . . . . . . . . . . . . . . . . . . . .   83
30  Test Requirement for Functional Gate  . . . . . . . . . . . .   84
SECTION I
SUMMARY OF PROGRAM STATUS
A. INTRODUCTION
This first periodic report describes a program whose objective is to
design, develop, and fabricate monolithic, complementary-symmetry MOS (COS/MOS),
large-scale, parallel processor arrays. The overall program is divided into
a 12-month phase and a six-month phase. The work effort to be performed during
these two phases is as follows: the fabrication of monolithic, four-bit
parallel processor arrays of more than 750 components, i.e., MOS transistors
and diodes; and the fabrication of monolithic, 16-bit parallel processor
arrays of more than 3000-component complexity. The 16-bit parallel processors
are to be fabricated by interconnecting clusters of four-stage arrays on one
chip.
The COS/MOS technology to be applied to these parallel processor devices,
although relatively new, is well suited to large-scale integration because of
the minimal area requirements, low power dissipation, and suitability for
functional logic gating. Of special advantage for remote applications are
essentially negligible standby power dissipation (the design goals call for
less than 1.2 x 10^-5 watts standby power at ambient), and low operational
power at significant operational speed (centered design goals are 10 x 10^-3
watts at a data transfer rate of 4 x 10^-6 seconds).
Successful achievement of the fabrication goals of this program will
represent a significant advance in the COS/MOS state of the art. RCA has
applied considerable corporate effort to the development of COS/MOS technology.
Since August 1967, when the commercial availability of the TA5361 dual three-
input NOR gate was first announced, a family of basic integrated circuits has
been made available commercially, including dual D flip flops, inverters,
transmission gates, quad gates, and dual gates. In addition, RCA has available
MSI arrays such as 16-bit memory cells, seven-stage counters, and 18-stage shift
registers. RCA's demonstrated experience in COS/MOS design and large-scale
production is currently unmatched, as evidenced by the fact that RCA is still
the sole commercial source for these products. The four-bit and 16-bit parallel
processors, however, represent devices having a complexity level more than an
order of magnitude greater than the COS/MOS devices available commercially.
B. PROGRAM OBJECTIVES
The parallel processor program consists of two distinct phases. The
objective of the first phase is to design, develop, and fabricate individual
four-bit parallel processor chips. The objective of the second phase is to
develop and employ high-yield processing techniques so that monolithic clusters
of four good four-bit parallel processor arrays will be found with reasonable
yield on silicon wafers. These clusters then are to be interconnected mono-
lithically and packaged to form the 16-bit parallel processor arrays.
Detailed discussions of logic objectives of the parallel processor arrays
are given in Section II. In addition to the logic requirements, the specific
design features and objectives are as follows:
a. Size - a monolithic four-stage array approximately 0.145 inch
by 0.155 inch.
b. Power - supply voltage +8 to +12 volts.
c. Standby power dissipation at ambient - less than 1.2 x 10^-5 watts.
Operating power 10 x 10^-3 watts at a data transfer rate of
4 x 10^-6 seconds at ambient with a supply voltage of +10 volts
and with 50 picofarads on each of the output lines.
d. The circuit will operate over the temperature range +100°C to
-50°C.
e. A "0" is represented by ground potential and a "1" by the supply
potential.
f. The complete circuit will require 27 leads, and will be mounted
in a 28-lead ceramic flat pack.
g. All leads except power leads have a capacitive load not greater
than 5 picofarads, including package capacitance.
h. A decoded command alone will not effect a data transfer. A
general transfer (timing) pulse with a decoded command will effect
a data transfer.
i. (1) Each output will be able to drive a 50-picofarad load to
within 90 percent of the supply voltage for an output change from
"0" to "1" in 4 x 10^-6 seconds from initiation of the "Out"
instruction.
(2) Each output will be able to drive a 50-picofarad load to
within 10 percent of the supply voltage for an output change from
"1" to "0" in 4 x 10^-6 seconds from initiation of the "Out"
instruction.
j. Complete execution of non-carry-generating instructions other than
"Out" will take no longer than 2 x 10^-5 seconds from the applica-
tion of the instruction at ambient with a supply voltage of +10
volts.
k. Complete execution of a worst-case four-bit parallel addition
shall take no more than 2 x 10^-6 seconds from application of the
add instruction at ambient with a supply voltage of +10 volts.
l. The parallel processor shall be capable of operating in one of
four modes, which will be controlled by two lines.
m. The register should require two timing pulses, one nested in the
other; the smaller pulse will be one-half the larger. The circuit
is capable of operating with the width of the larger pulse
between 2 x 10^-6 seconds and 1 x 10^-3 seconds.
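As a rough check on objective i, the average current an output stage must
supply follows from I = C x dV / dt. The sketch below is illustrative only: it
assumes the +10 volt supply named in objective c and treats the transition as
a constant-current charge, a simplification that is not part of the contract
specification. The function name is hypothetical.

```python
def average_drive_current(c_load, v_supply, fraction, t_transfer):
    """Average current (amperes) needed to move a capacitive load through
    `fraction` of the supply voltage in `t_transfer` seconds.

    Assumes constant-current charging, which is only an approximation
    of how a real COS/MOS output stage behaves.
    """
    delta_v = fraction * v_supply
    return c_load * delta_v / t_transfer


# Objective i(1): 50 pF driven from "0" to 90 percent of +10 V in 4 microseconds.
i_avg = average_drive_current(50e-12, 10.0, 0.9, 4e-6)
print(f"{i_avg * 1e6:.1f} microamperes")
```

Under these assumptions the requirement works out to an average output
current on the order of one hundred microamperes.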
Design objectives for the 16-bit parallel processor generally are similar
to those of the four-bit parallel processor. The following changes, however,
result from cascading four units in a single package.
a. Standby power dissipation at ambient - less than 5 x 10^-5 watts.
Operating power of 40 x 10^-3 watts at a data-transfer rate of
4 x 10^-6 seconds at ambient, with a supply voltage of +10 volts
and 50 picofarads on each output line.
b. The complete circuit will require approximately 40 leads.
c. Complete execution of a worst-case 16-bit parallel addition will
take no longer than 6 x 10^-6 seconds from application of the add
instruction at ambient with a supply voltage of +10 volts.
C. PROGRAM STATUS
The logic design, the layout, and the cutting of rubyliths were completed
for the parallel processor arrays during this reporting period. Current
effort is directed towards final checking and correction of the rubyliths so
that wafer fabrication can commence. Tasks still to be completed include
high-yield processing of parallel processor wafers, testing, debugging (if
required), and packaging of chips.
SECTION II
PARALLEL PROCESSOR LOGIC DESIGN
A. GENERAL DESCRIPTION OF FOUR-STAGE PROCESSOR
The four-stage parallel processor basically is a four-stage shift
register that has both serial and parallel access. The logic associated with
the register allows parallel two's complement addition; AND, OR, and EXCLUSIVE
OR logic operations; and right, left, or right-cyclic shifts.
Figure 1 shows the lead requirements for the four-stage processor. All
control lines are encoded, with five leads used for instructions; four leads
for control; one lead for timing; two leads for power; and five leads for the
following conditions:
a. negative indication
b. zero indication
c. overflow indication
d. overflow input/output
e. conditional input
Because information will enter and leave on the same line, six leads are re-
quired for the four-stage register to transfer data, with four of the leads
used for parallel access and the remaining two leads used for serial access.
In addition, four leads are available for expansion to a processor whose
number of stages is a multiple of four (16, for example).
B. GENERAL LOGIC DESCRIPTION
The logic configuration for the four-stage parallel processor is shown in
Figure 2. Functional gating is used where possible to achieve a low device
count. Figure 3 shows the implementation of the basic gates used in the array
design logic. The hardware requirement is about 750 devices and 27 bonding
pads.

Figure 1. Four-Stage Processor, Lead Requirements
(Leads shown: left serial data line, right serial data line, negative
indicator, zero indicator, bypass, overflow indicator, conditional input,
10 V supply, parallel data lines, instruction lines a b c d e, modes
control C1 C2, timing pulses, and conditional control A B.)

Figure 2. Logic Diagram

Figure 3. Implementation of Basic Gates in the Array Design Logic
(Sheets 1 through 7)
(Sheet 2 includes the following truth table for a two-input gate with
inputs X and Y and output Z.)
X     Y     Z
0     0     VDD
0     VDD   VDD
VDD   0     VDD
VDD   VDD   0
C. OPERATIONAL MODES
For this discussion, mode is defined as the ability of the parallel pro-
cessor to control the transfer of either serial data or carries due to arith-
metic operations. The parallel processor will be capable of operation in one
of four modes. For simplicity, consider the parallel processor as a strictly
serial device. Serial data can enter or leave either side of the register.
Since there is only one lead on either side of the register, the serial trans-
fer must be bidirectional. The manner in which modes control the serial-data
lines is as follows:
a. Mode 0 (A, Figure 4) - Data can enter or leave from either side.
b. Mode 1 (B, Figure 4) - Data can enter or leave the register on
the left side during any serial operation.
c. Mode 2 (C, Figure 4) - Data can enter or leave on the right side.
d. Mode 3 (D, Figure 4) - Serial data neither may enter nor leave
the register, regardless of the nature of the serial operation;
furthermore, the register is bypassed electrically, i.e., there
is an electrical bidirectional path from the right serial lead
to the left serial lead.
The most-significant (leftmost) bit is used as the sign bit.
Figure 4. Operational Modes (A. Mode 0; B. Mode 1; C. Mode 2; D. Mode 3)

D. OVERFLOW DETECTION
A two's complement overflow is defined as having occurred if the signs of
the two initial words are the same and the sign of the result is different
while performing the ADD instruction.
The parallel processor will be capable of detecting and indicating the
presence or absence of an arithmetic two's complement overflow. Overflows
will be detected and indicated only during operation in Mode 2 or Mode 3. In
either mode, only four instructions (AD, SMZ, SM, and SUB) will have the
potential of causing a two's complement overflow. If an overflow is detected
and stored by a flip flop, only one of the five instructions (AD, SMZ, SM, SUB,
or IN) can change the overflow indicator.
Occurrence of a two's complement overflow is represented by a "1" in the
overflow flip flop. The absence of an overflow is represented by a "0" in
the overflow flip flop. The flip flop will change between zero and one as
overflows do not or do occur.
When any one of the three subtraction instructions is used, the sign bit
of the data being subtracted will be complemented, and this value will be
used in the same manner as one of the initial signs (as in the add instruction)
to detect overflows. If these conditions are true, the final sign will be
that of the one's complement.
The overflow flip flop will be updated at the same time that the new
result is stored in the parallel processor.
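The overflow rule above can be sketched in a few lines of Python. This is
only a behavioral illustration of the stated definition (same operand signs,
different result sign), not a description of the chip's gate-level logic; the
function names are hypothetical.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1      # 0b1111
SIGN = 1 << (WIDTH - 1)      # most-significant (leftmost) bit is the sign


def add_with_overflow(x, y, carry_in=0):
    """Add two 4-bit two's complement words; return (result, overflow)."""
    result = (x + y + carry_in) & MASK
    same_sign_in = (x & SIGN) == (y & SIGN)
    overflow = same_sign_in and (result & SIGN) != (x & SIGN)
    return result, int(overflow)


def sub_with_overflow(x, y):
    """Compute x - y by adding the one's complement of y plus one.

    The sign of the complemented operand is used in the overflow test,
    as the text describes for the subtraction instructions.
    """
    return add_with_overflow(x, ~y & MASK, carry_in=1)


print(add_with_overflow(0b0101, 0b0011))   # 5 + 3 does not fit in four bits
print(sub_with_overflow(0b1000, 0b0001))   # -8 - 1 does not fit either
```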
E. ZERO DETECTION
The parallel processor will be capable of detecting the condition of all
zeros. A "1" on the ZI input, indicating all zeros in the previous stage or
that the stage is the least significant stage, is required. This operation
will be independent of modes. A condition of all zeros will be represented by
a "1" on the zero indicator line; otherwise, this line will be zero. If the
particular four-bit processor represents the least significant set of bits,
ZI should be tied to +V.
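The chaining of the zero indicator through the ZI input can be illustrated
as follows. This is a sketch under the assumption that each chip's zero
indicator output feeds the ZI input of the next more significant chip; the
function names are hypothetical.

```python
def zero_indicator(bits, zi):
    """Zero indicator of one four-bit stage.

    bits: contents of the four-bit register.
    zi:   ZI input, 1 when all less significant stages are zero
          (tied to +V, i.e. 1, on the least significant stage).
    """
    return int(zi == 1 and (bits & 0xF) == 0)


def word_is_zero(word16):
    """Chain four stages, least significant first, to test a 16-bit word."""
    zi = 1  # least significant chip: ZI tied to +V
    for k in range(4):
        nibble = (word16 >> (4 * k)) & 0xF
        zi = zero_indicator(nibble, zi)
    return zi


print(word_is_zero(0x0000), word_is_zero(0x0100))
```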
F. NEGATIVE DETECTION
The parallel processor will be capable of detecting the presence of a
negative number. This operation is independent of modes. If the condition
is true, a "1" will appear on the negative indicator line; otherwise, a "0"
will appear. A "1" in the most-significant bit position will indicate a
negative representation.
G. CONDITIONAL OPERATION
Once the instruction and mode have been applied, only the clock pulse will
be required to change the state of the register. If this pulse could be
inhibited in the ON condition, all instructions would behave as a NO-OP.
The clock pulse can be constrained by using "conditions." A conditional
input is compared with a control line, and a second control line defines
whether or not to test the conditional input line. An instruction will be
permitted to operate under the following conditions:
a. Unconditional
b. The conditional input is positive
c. The conditional input is negative
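The clock gating described above can be sketched behaviorally. The names
`sense` and `test` are hypothetical labels for the two conditional control
lines; the text does not say which physical lead plays which role, so this is
an assumption made only for illustration.

```python
def clock_enabled(cond_input, sense, test):
    """Decide whether the clock pulse reaches the register.

    test = 0: unconditional, the clock always passes.
    test = 1: the clock passes only when the conditional input
              matches the value on the `sense` control line.
    """
    if not test:
        return True
    return cond_input == sense


def step(register, next_value, cond_input, sense, test):
    """Apply one clock pulse; an inhibited pulse behaves as a NO-OP."""
    if clock_enabled(cond_input, sense, test):
        return next_value
    return register


print(step(0b0011, 0b0100, cond_input=1, sense=0, test=1))  # register unchanged
```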
H. INSTRUCTION REPERTOIRE
Four encoded lines will be used to represent 16 instructions. A fifth
line will be used solely to represent an OUT command. The encoded instructions
will be as follows:
NO-OP
Left shift
Right shift
Rotate (cycle) right
Input
Subtract from memory (SM)
Count up
Count down
Clear to zero
Set to one
AND
OR
EXCLUSIVE OR
Subtract from zero (SMZ)
Add (AD)
Subtract (SUB)
I. SERIAL-SHIFT OPERATIONS
a. Rotate (cycle) right - This operation is internal. The contents of
the register will shift to the right, in cyclic fashion, with the left-
most stage accepting data from the rightmost stage, regardless of
mode. Data may leave the register serially on the right data line
only while the register is in Mode 2 or Mode 0. Data may leave the
left data line serially while in Mode 1 or Mode 0.
b. Right shift - The contents of the register generally will shift to the
right under the following conditions:
(1) In Mode 0, data may enter serially on the left data line, shift
through the register, and leave on the right data line.
(2) In Mode 1, data may enter serially on the left data line. The
right data line effectively will be open-circuited.
(3) In Mode 2, data may leave serially on the right data line. The
left data line effectively will be open-circuited. Vacant spaces
will be filled with zeros.
(4) In Mode 3, serial data neither may enter nor leave the register;
however, the contents will shift to the right, and vacated places
will be filled with zeros.
c. Left shift - The contents of the register generally will shift to the
left under the following conditions:
(1) In Mode 0, data may enter the right data line, shift through the
register, and leave on the left data line.
(2) In Mode 1, data may leave serially on the left data line. The
right data line effectively will be open-circuited. All vacant
positions will be filled with zeros.
(3) In Mode 2, data may enter serially on the right data line. The
left data line effectively will be open-circuited.
(4) In Mode 3, data neither may enter nor leave the register; however,
the contents will shift to the left, and vacated places will be
filled with zeros.
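The shift rules above can be summarized in a small behavioral model. This is
a sketch for illustration only: the register is held as a 4-bit integer, an
open-circuited lead is represented by `None`, and all names are hypothetical
rather than taken from the logic diagram.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

# Serial leads usable in each mode, following the mode descriptions:
# Mode 0 both sides, Mode 1 left only, Mode 2 right only, Mode 3 neither.
LEFT_ENABLED = {0: True, 1: True, 2: False, 3: False}
RIGHT_ENABLED = {0: True, 1: False, 2: True, 3: False}


def right_shift(reg, mode, left_in=0):
    """Return (new_reg, right_out); right_out is None if that lead is open."""
    fill = (left_in & 1) if LEFT_ENABLED[mode] else 0   # zero fill otherwise
    out = (reg & 1) if RIGHT_ENABLED[mode] else None
    return (fill << (WIDTH - 1)) | ((reg & MASK) >> 1), out


def left_shift(reg, mode, right_in=0):
    """Return (new_reg, left_out); left_out is None if that lead is open."""
    fill = (right_in & 1) if RIGHT_ENABLED[mode] else 0
    out = ((reg >> (WIDTH - 1)) & 1) if LEFT_ENABLED[mode] else None
    return ((reg << 1) & MASK) | fill, out


def rotate_right(reg):
    """Internal cyclic shift: the leftmost stage takes the rightmost bit."""
    return ((reg & 1) << (WIDTH - 1)) | ((reg & MASK) >> 1)


print(right_shift(0b1011, mode=2))             # bit leaves on the right, zero fill
print(left_shift(0b1011, mode=3, right_in=1))  # input ignored in Mode 3
```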
J. PARALLEL COMMANDS
a. CLEAR - sets register to zero.
b. SET - sets register to all ones.
c. OR - processes contents of register with value on parallel-data lines
in a logical OR function.
d. AND - processes contents of register with value on parallel-data lines
in a logical AND function.
e. EXCLUSIVE OR - processes contents of register with value on parallel-
data lines in a logical EXCLUSIVE OR function.
f. IN - loads value on parallel-data lines into register.
g. OUT - outputs contents of register on parallel-data lines.
h. SUB:
(1) In Mode 1, adds to the contents of the register the two's comple-
ment of whatever is on the parallel-data lines. Generated
carries may leave on the left serial line. The overflow indicator
is not altered.
(2) In Mode 2, adds to the contents of the register the one's comple-
ment of whatever is on the parallel-data lines. Carries may
enter on the right serial line but may not leave on the left
data line. The absence or presence of an overflow is registered.
(3) In Mode 0, same as Mode 2, except carries may leave on the left
data line. The overflow indicator is not altered.
(4) In Mode 3, same as Mode 1, except carries may not leave on the
left data line. The absence or presence of an overflow is
registered.
i. COUNT UP:
(1) In Mode 1, internally adds one to the contents of the register
and permits any resulting carry to leave on the left serial-data
line. No data enters or leaves either the parallel lines or the
right serial line.
(2) In Mode 2, adds to the contents of the register whatever is on
the right serial-data line. No data enters or leaves either the
parallel lines or the left serial line.
(3) In Mode 0, adds to the contents of the register whatever is on
the right serial line and permits any resulting carry to leave
on the left data line. No data enters or leaves the parallel
lines.
(4) In Mode 3, internally adds one to the contents of the register.
No data enters or leaves the register on any serial-data or
parallel-data line.
j. COUNT DOWN:
(1) In Mode 1, internally subtracts one from the contents of the
register and permits any resulting carry to leave on the left
serial-data line. No data enters or leaves either the parallel
lines or the right serial line.
(2) In Mode 2, subtracts one from the contents of the register and
adds to this result whatever is on the right serial-data line.
No data enters or leaves the parallel lines or the left data line.
(3) In Mode 0, subtracts one from the contents of the register and
adds to this result whatever is on the right serial-data line and
permits any resulting carry to leave on the left data line. No
data enters or leaves the parallel lines.
(4) In Mode 3, internally subtracts one from the contents of the
register. No data enters or leaves either the parallel lines or
the serial lines.
k. AD:
(1) In Mode 1, adds the contents of the register to whatever is on
the parallel-data lines and allows any resulting carry to leave
on the left data line. The right serial-data line is open-cir-
cuited. The overflow indicator is not altered.
(2) In Mode 2, adds the contents of the register to whatever is on
the parallel-data lines and the right serial-data line. Any over-
flows will set the overflow indicator. The left serial-data
line is open-circuited. The absence or presence of an overflow
is registered.
(3) In Mode 0, adds the contents of the register to whatever is on
the parallel-data lines and the right serial-data line. Any
resulting carry may leave on the left serial-data line. The
overflow indicator is not altered.
(4) In Mode 3, adds contents of the register to whatever is on the
parallel-data line. Any resulting carry will set an overflow
indicator. The two serial-data lines are open-circuited. The
absence or presence of an overflow is registered.
l. SM - same operation as AD, except the contents of the register are
two's complemented during addition in Mode 1 and Mode 3. In Mode 0 or
Mode 2, the contents of the register are one's complemented and added
to whatever is on the right serial-data line and the parallel-data
lines. Overflows occurring in Mode 1 or Mode 0 do not alter the over-
flow indicator. The presence or absence of overflows is registered
on the overflow indicator in Mode 2 or Mode 3.
m. SMZ:
(1) In Mode 0, one's complements the contents of the register and
adds whatever is on the right serial-data line to the contents of
the register. Any resulting carry may leave the left serial line.
Any overflow will not alter the overflow indicator.
(2) In Mode 1, two's complements the contents of the register and
permits any carry to leave on the serial line. Nothing may enter
the right serial line. Any overflow will not alter the overflow
indicator.
(3) In Mode 2, one's complements the contents of the register and
adds whatever is on the right serial line to the contents of the
register. Carries may not leave the left serial line. The
absence or presence of an overflow will alter the overflow indi-
cator.
(4) In Mode 3, two's complements the contents of the register. Serial
data neither may enter the right serial line nor leave the left
serial line. The overflow indicator will be at zero.
n. NO-OP - The NO-OP condition will inhibit the clock signal before the
D-type flip flops.
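The mode-dependent behavior of the AD and SUB commands listed above can be
collected into one behavioral sketch. It is an illustration of the stated
rules only, not the chip's logic: `None` stands for a lead that is closed or
an indicator that is not altered, and the function and parameter names are
hypothetical.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def add_type(reg, data, mode, right_in=0, subtract=False):
    """Model AD (subtract=False) or SUB (subtract=True) for one stage.

    Returns (new_reg, carry_out_left, overflow).
    """
    operand = (~data & MASK) if subtract else (data & MASK)
    if mode in (0, 2):
        carry_in = right_in & 1          # carry enters on the right serial line
    else:
        carry_in = 1 if subtract else 0  # Modes 1 and 3: two's complement internally
    total = (reg & MASK) + operand + carry_in
    new_reg = total & MASK
    # A carry may leave on the left line only in Mode 0 or Mode 1.
    carry_out = (total >> WIDTH) if mode in (0, 1) else None
    # The overflow indicator is updated only in Mode 2 or Mode 3.
    if mode in (2, 3):
        overflow = int((reg & SIGN) == (operand & SIGN)
                       and (new_reg & SIGN) != (reg & SIGN))
    else:
        overflow = None
    return new_reg, carry_out, overflow


print(add_type(0b0101, 0b0011, mode=3))                 # overflow registered
print(add_type(0b0101, 0b0111, mode=1, subtract=True))  # 5 - 7, carry may leave
```

For example, `add_type(0b0101, 0b0011, mode=3)` adds 5 and 3, produces the
four-bit pattern 1000, and registers an overflow because both operands were
positive and the result sign is negative.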
K. TIMING
Transfer of data is accomplished by using a D-type flip flop (Figure 1:)
which requires one clock pulse to transfer data on the input into the storage
element.
The D-type flip flop consists of two double inverters which may feed back
on themselves through transmission gates, providing a stable state. When the
clock is low, transmission gates "1" and "3" are active and gates "2" and "4"
are inactive. This state permits the retention of data by the second inverter
pair while allowing the incoming data to define the state of the first
inverter pair.
When the clock undergoes a low-to-high transition, the states of all trans-
mission gates are changed. During this transition the flip-flop input becomes
isolated and the first inverter pair is stabilized by opening the feedback
transmission gate, holding the information which was on the data line. Mean-
while, the second inverter pair loses its feedback and a path is established
from the first stage. For a set of D flip flops in a shift-register configura-
tion, the effect of this transition is to permit the first stage of each flip
flop to store information from the output of the previous flip flop before the
second stage of the flip flops changes due to the new flip-flop input. During
the high-to-low transition, the new data are transferred to the second inverter
pair, in a manner similar to the original transfer, and the normal storage
mode is assumed.
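The two-stage transfer described above is what allows a chain of D flip
flops to shift by exactly one position per clock pulse. A minimal behavioral
sketch follows; it ignores the individual transmission gates and models only
the two inverter pairs, and the class and function names are hypothetical.

```python
class DFlipFlop:
    """Two-stage D flip flop: a first and a second inverter pair."""

    def __init__(self, value=0):
        self.first = value    # first inverter pair
        self.second = value   # second inverter pair (drives the output)


def clock_pulse(chain, serial_in):
    """Apply one clock pulse to a shift-register chain of flip flops."""
    # Clock low: each first pair follows its input while the second pairs
    # hold, so every stage samples the previous stage's old output.
    inputs = [serial_in] + [ff.second for ff in chain[:-1]]
    for ff, d in zip(chain, inputs):
        ff.first = d
    # Low-to-high transition: the inputs are isolated and each second pair
    # takes the value held by its own first pair.
    for ff in chain:
        ff.second = ff.first
    return [ff.second for ff in chain]


chain = [DFlipFlop() for _ in range(4)]
history = [clock_pulse(chain, bit) for bit in (1, 0, 1, 1)]
print(history)
```

Because every first pair captures its input before any second pair changes,
a bit entering the chain advances one stage per pulse rather than racing
through several stages.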
L. MODE-INDEPENDENT SWITCHES
The state of the control lines to the processing logic for the 15 operat-
ing instructions is shown in Table I. True data from the parallel inputs are
gated into the parallel processor when K1 is high. The pertinent equation is
K1 = acd + ac + ca                                                  (1)
Complementary data from the parallel inputs are gated into the processor by
K2, which is given by
K2 = bcd                                                            (2)
True information in the register is gated into the processor by K3, which
is given by
K3 = ab + z                                                         (3)
and the complementary information is gated by K4, where
K4 = be                                                             (4)
Control K5 is used to set all ones into the processor for one operand. Control
K5 can be gated through for a SET or can be used in COUNT DOWN. The pertinent
equation is
K_ = abc + acd	 . . . . . . . . . . . . . . . . . . . . . . . . (5)
The EXCLUSIVE OR can be inhibited when XI is high, which allows the OR opera-
tion to be formed. The switching equation is
XT = ad	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 (6)
The IN transmission gate is used to load the register in parallel, where
IN = abed	 . .	 .	 . .	 .	 . . . .	 .	 . .	 .	 .	 . .	 . . .	 ( I
[Table I: state of the control lines for the 15 operating instructions; entries not legible in this copy]
The SUM transmission gate is used to perform all add-type instructions, where
SUM = a (b + c)  . . . . . . . . . . . . . . . . . . . . . . . (8)
Logic operations are performed by the AND and OR switches, where
AND = abcd  . . . . . . . . . . . . . . . . . . . . . . . . . . (9)
and
OR = ab  . . . . . . . . . . . . . . . . . . . . . . . . . . . (10)
M. MODE-DEPENDENT SWITCHES
Code definitions for the modes were selected such that there is no need
to decode. The definitions selected are as follows.
Mode   C2   C1
 0     0    0
 1     0    1
 2     1    0
 3     1    1
When high or "1," line C1 or C2 indicates which side of the processor is
inhibited. Lead C1 corresponds to the right side; lead C2 corresponds to
the left side.
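The mode coding can be illustrated with a short sketch. It assumes that the two code columns are C2 and C1 in that order, so that the mode number is the binary value C2 C1; the function names are hypothetical and used only for illustration.

```python
def mode_lines(mode):
    """Return (C2, C1) for Mode 0..3, assuming mode = binary value C2 C1."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2, or 3")
    return (mode >> 1) & 1, mode & 1


def inhibited_sides(mode):
    """List the sides of the processor that are inhibited in a given mode."""
    c2, c1 = mode_lines(mode)
    sides = []
    if c1:
        sides.append("right")  # lead C1 corresponds to the right side
    if c2:
        sides.append("left")   # lead C2 corresponds to the left side
    return sides


for m in range(4):
    print(m, mode_lines(m), inhibited_sides(m))
```

Since each control line maps directly to one side, no decoding logic is needed between the mode code and the inhibit function.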
Transmission gate Q (Figure 2) is used to force a "1" into the carry of
the least significant adder during a COUNT UP or SUBTRACT operation and during
Mode 1 or Mode 3. The equation is
Q = C1 (c + d)  . . . . . . . . . . . . . . . . . . . . . . . . (11)
A zero is forced into the carry by J, which is given by
J = C1 · cd  . . . . . . . . . . . . . . . . . . . . . . . . . (12)
During Mode 0 and Mode 2, a carry is brought in from a previous array by the
CAR transmission gate, where
CAR = C1 · a  . . . . . . . . . . . . . . . . . . . . . . . . . (13)
A carry may propagate out during Modes 0 and 1 by using an M, which is given
by
M = C2 · a  . . . . . . . . . . . . . . . . . . . . . . . . . . (14)
N. SHIFT SWITCHES
The R and L transmission devices are mode-independent gates used to perform
the right and left shifts. Ro, RN, RT0, RT1, LT0, LT1, and LN are
mode-dependent shift controls. The equations are summarized as follows:
RN
 = C,
	 •	 abcd	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 . .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 (15)
R	 = abcd	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 . .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 ( 16)0
R.:.1 =	 C 2	abc	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 . .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 (17)
LT =	 C 2	abcd	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 . .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 (18)
R,... =	 C.	 abc	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 . .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 ( 19)I 1
LT -	 C l	 abcd	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 . .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 (20)
L	 = C,	 •	 abcd	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 . .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 ( 21)
R= abc	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 . .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 (22)
L = P	 •	 abcd	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 . .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 .	 (23)
Note: R denotes right, L denotes left
O. CONDITIONAL OPERATION
The clock pulse operates on condition. Three control lines will be defined
to permit conditional instructions. These lines will be labeled A, B, and C.
The following truth table defines interactions among A, B, and C:
C   B   A   Permits Operation
0   0   0   Yes
0   0   1   Yes
0   1   0   Yes
0   1   1   No
1   0   0   Yes
1   0   1   No
1   1   0   Yes
1   1   1   Yes
The truth table reduces to the condition that a data-transfer operation is to
take place when $\bar{A} + B \cdot C + \bar{B} \cdot \bar{C}$ is true. This
expression combined with the clock pulse accomplishes the data transfer,
providing the processor is not in the NO-OP state.
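The reduction of the truth table can be checked exhaustively. The sketch below is illustrative: the function names are hypothetical, the complement bars in the reduced expression are taken from the truth table itself, and the third table row is read as C = 0, B = 1, A = 0.

```python
from itertools import product

# Truth table from the text, keyed by (C, B, A). True means the operation
# is permitted. The third row is read as C=0, B=1, A=0 (an assumption,
# since that row is partly illegible in the source).
PERMITS = {
    (0, 0, 0): True, (0, 0, 1): True, (0, 1, 0): True, (0, 1, 1): False,
    (1, 0, 0): True, (1, 0, 1): False, (1, 1, 0): True, (1, 1, 1): True,
}


def transfer_permitted(a, b, c):
    """Reduced condition: NOT A, or B AND C, or NOT B AND NOT C."""
    return bool((not a) or (b and c) or ((not b) and (not c)))


def clocked_transfer(clock, a, b, c, no_op=False):
    """A transfer needs the clock pulse, the condition, and no NO-OP state."""
    return bool(clock) and not no_op and transfer_permitted(a, b, c)


mismatches = [
    (c, b, a)
    for c, b, a in product((0, 1), repeat=3)
    if transfer_permitted(a, b, c) != PERMITS[(c, b, a)]
]
print(mismatches)
```

An empty list of mismatches means the reduced expression agrees with all eight rows of the table.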
P. OVERFLOW INDICATOR
Switch SIGNA operates and puts the true sum in the register under either
one of the following conditions: overflows have not been detected; or the
mode of operation and the instruction are such that overflow detection is not
needed. This condition can be summarized by
SIGNA = (Sf · S1 · S2 · Sf · S1 · S2) · C2 · ab + C2 · ab + abc  . . . . . (24)
Switch SIGNB operates when an overflow occurs, and a "1" is placed in the
overflow flip flop. The complement of the most significant bit sum output
also is placed in the register; hence,
SIGNB = (Sf · S1 · S2 + Sf · S1 · S2) · ab · C2  . . . . . . . . . . . . . (25)
A zero is placed in the overflow flip flop when the following condition is
true:
(Sf · S1 · S2 + Sf · S1 · S2) · ab · C2  . . . . . . . . . . . . . . . . . (26)
The overflow flip flop can only be clocked during (ab · C2) or during the
IN instruction. Data may be entered or removed from the overflow flip flop,
on the OVERFLOW I/O line, during the IN and OUT commands, respectively.
Q. EXPANSION TO 16-STAGE PROCESSOR
The four-stage parallel processor is designed such that four processors
can be interconnected monolithically to form the 16-stage processor. The
wafer will be diced such that four operating four-stage processors will form a
16-stage processor. The interconnection scheme is shown in Figure 5. In
sections 2 and 3, the C1 and C2 mode controls are tied to ground; therefore,
these sections are held in Mode 0. In section 1, C2 is tied to ground;
therefore, the section can operate only in Mode 0 or Mode 1. In section 4,
lead C1 is tied to ground; therefore, this section can operate only in Mode 0
or Mode 2. The modes for the 16-stage processor are determined by inputs C1
and C2, as shown.
The bypass leads of sections 1 and 4 are connected as shown in Figure 5.
When C1 and C2 are both "1," indicating Mode 3, the 16-stage register is
bypassed, and the left serial-data line of section 4 and the right serial-data
line of section 1 are connected.
Leads Ro1 of section 4 and Ro2 of section 1 are connected as shown in
Figure 5 to allow the rotate operation. Ro denotes the rotate (cyclic shift)
function.
The ZERO INDICATOR leads are connected as shown. If a "0" occurs in the
16-stage parallel processor, a "1" will appear on the indicator of the
leftmost unit.
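The effect of grounding the mode leads can be sketched as follows. This is an illustration under one stated assumption: the 16-stage C1 input is taken to drive lead C1 of section 1, and the 16-stage C2 input is taken to drive lead C2 of section 4. The function name is hypothetical.

```python
def section_modes(c2, c1):
    """Mode seen by each four-stage section for 16-stage inputs C2 and C1."""
    wiring = {
        1: (0, c1),  # section 1: C2 grounded, so Mode 0 or Mode 1 only
        2: (0, 0),   # section 2: both leads grounded, held in Mode 0
        3: (0, 0),   # section 3: both leads grounded, held in Mode 0
        4: (c2, 0),  # section 4: C1 grounded, so Mode 0 or Mode 2 only
    }
    return {sec: 2 * s_c2 + s_c1 for sec, (s_c2, s_c1) in wiring.items()}


for c2 in (0, 1):
    for c1 in (0, 1):
        print(f"16-stage mode {2 * c2 + c1}: {section_modes(c2, c1)}")
```

Under this reading, the two inner sections never leave Mode 0, and only the two outer sections respond to the mode inputs.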
Figure 5. Schematic Interconnection of Four Four-Stage Chips
SECTION III
BREADBOARD AND COMPUTER STUDIES
A. SUMMARY
The logic diagram, as received from RCA Burlington, was evaluated by
constructing a breadboard (Figure 6) of the entire processor, and by computer
programming a portion of the processor. The breadboard verified that the
major logic and control functions were correct. The computer runs showed that
the proposed device sizes (1.1 mils) were adequate.
B. DETAILS
The contract specified that only one stage (plus control logic) of the
four-stage parallel processor would be breadboarded. However, in order to
verify the total logic package, and to aid in verifying the test program, it
was decided to breadboard the entire parallel processor. The breadboard was
constructed on 12 printed circuit cards which were mounted on a rack with
plug-in sockets. Only commercially available RCA COS/MOS devices were used for
the breadboard. The CD4007D was the primary building block for the breadboard,
and several CD4000D's, CD4001D's, CD4002D's, and CD4003D's were used to
complement its fabrication. A total of 121 COS/MOS packages were required to
complete the breadboard.
The control logic was constructed first because it is needed to drive all
the other sections. Functional gating was used wherever appropriate so that
the breadboard would correspond as closely as feasible to the actual IC.
Commercially available COS/MOS NOR gates were used wherever possible. For
example, the RN control required a zero-level-active NAND gate. Since a
zero-active NAND gate is identical to an active-high NOR gate, RCA's CD4000D
could be used (see A, Figure 7). The CD4007D was the most extensively used
building block. This device is essential for constructing the functional gates
such as shown in Figure 7.
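The NAND/NOR equivalence used here can be checked with a short truth-table sketch. It assumes that "zero-active" means the low level is treated as logical 1 on both inputs and output; the function names are hypothetical and a two-input gate is used for brevity.

```python
from itertools import product


def nor(a, b):
    """Ordinary (active-high) two-input NOR acting on logic levels 0 and 1."""
    return int(not (a or b))


def zero_active_nand(a, b):
    """NAND evaluated with the low level treated as logical 1 (assumption)."""
    a_true, b_true = 1 - a, 1 - b           # a low level means "asserted"
    out_true = int(not (a_true and b_true))  # NAND on the asserted values
    return 1 - out_true                      # asserted output is a low level


equivalent = all(
    nor(a, b) == zero_active_nand(a, b) for a, b in product((0, 1), repeat=2)
)
print(equivalent)
```

This is De Morgan's theorem applied to the level convention, which is why a standard NOR package can stand in for the gate drawn as a zero-active NAND.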
Figure 6. Breadboard of the Processor
[Figure 7 panels: A. Schematic of active-low NAND gate; B. Schematic of basic building block CD4007D in breadboard; C. Functional gate in control section; D. Implementing the functional gate using CD4007D. Note: numbers refer to pin numbers of the CD4007D; subscript refers to unit number; two CD4007D's required.]
Figure 7. Construction of Functional Gates for the Parallel Processor
Using the CD4007D as the Basic Building Block
After the control logic section was completed, it was checked against the
truth table received from RCA Burlington. Figure 8 shows the output waveforms
of the control section for all 16 possible input combinations.
The arithmetic and accumulator sections were built one stage at a time and
debugged before proceeding. The register in each stage was implemented using
the RCA CD4003D dual-data flip flop.
Two minor changes were made in the breadboard so that the logic functions
could be fabricated with standard NOR gates instead of building up NOR or NAND
gates with the CD4007D. The first change was in the zero indicator. The
logic diagram as received from RCA Burlington is shown in A, Figure 9. The
breadboarded version is shown in B, Figure 9. The second change involved the
clock circuit. The circuit received from RCA Burlington is shown in A,
Figure 10. The breadboarded circuit is shown in B, Figure 10.
The complete breadboard was exercised for most major modes of operation
such as shift right, shift left, ADD, LOGICAL AND, EXCLUSIVE OR, etc. The
worst-case ADD of two four-bit numbers was 1.9 microseconds, or within
specifications. Figure 11 shows some typical operating waveforms.
C. COMPUTER STUDIES OF SWITCHING SPEED
A program for computing the switching speed of COS/MOS circuits was
developed by A. Feller of RCA Camden, New Jersey. (1) This program was adapted
to BTSS II (Basic Time Sharing System) by J. E. Meyer, RCA Princeton, New
Jersey. The program enabled RCA engineers in Somerville to submit data to
the computer center via the teletype facilities in Somerville, New Jersey.
The computer center processed these data in batch mode because of the length
of the program. Output of the batch processor was mailed back to Somerville.
This last step introduced considerable delay in debugging individual runs, as
turn-around time (time from data submission to receipt of mail) was usually
2 to 3 days.
(1) A. Feller, Computer Analysis and Simulation of MOS Circuits, ISSCC,
February 20, 1969.
[Figure 8 panels: A. Control logic output waveforms for the 16 possible control combinations; B. Mode-dependent outputs from control section]
Figure 8. Input and Output Waveforms for Control Section
[Figure 9 panels: A. Original logic diagram; B. As breadboarded]
Figure 9. Zero Indicator
[Figure 10 panels: A. As originally designed; B. As breadboarded]
Figure 10. Clock Circuit
[Figure 11 panels: A. Addition; B. Exclusive OR; C. Shift right, left; D. Set and clear; E. Addition using minimum clock delay]
Figure 11. Typical Operating Waveforms (Sheets 1 and 2 of 2)
Figure 12 is an example of the form in which data are entered into the
computer. This form was developed to enable the engineer to simply fill in
circuit parameters at his desk, and then turn the form over to a technician
for actual typing onto the teletype. This input program asks for the following:
number of independent nodes; number of dependent nodes; and number of
resistors, capacitors, p-MOS, and n-MOS. It then asks for resistor and
capacitor values and nodes connected to; active device parameters; chip
parameters; mode (mode 2 = internally programmed pulse; mode 1 = user-programmed
pulse); pulse parameters; initial node voltages (ask for DC SOLUTION if initial
node voltages are unknown); and, finally, printer output intervals and total
run time. The format of all of the data described above is changed and written
on a simulated tape file called TAPE 3. To process these data, the user types
/BATCH 01 02 03 and then tape 3 is batch processed by the main programs written
on tapes 1 and 2. (See reference 1 for details of the main program.)
The circuit parasitic elements must be determined before computations can
begin because they must be entered separately. The only major parasitic
element in most LSI COS/MOS circuits is capacitance: gate to source-drain and
drain-substrate (AC ground). Data on these parasitic capacitances are contained
in an internal RCA report (2). The data were checked by making a comparison
with measured data on the capacitance of a simple device, the CD4007D. There
was reasonable agreement. When these values were used in the program, they
were checked further by running a simple inverter pair with the actual width
and length of the CD4007D. Pair delay was computed to be 32.3 nanoseconds. The
data sheet specifies a minimum of 33 nanoseconds. Minimum speed is the best
speed observed in production devices. The median or average speed of a device
normally is approximately twice the minimum speed. Hence, knowing the
minimum speed enables the median speed of a large lot to be obtained easily.
Maximum speed, as specified in the data sheet, is an arbitrarily picked number
for testing purposes, and is too imprecise a number to be used in computer
studies. There is no practical upper limit on how slow an actual device can
be (due to processing differences, defects, etc.), while minimum speed is
firmly fixed by physical limitations.
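The comparison above amounts to a small piece of arithmetic, sketched here. The variable names are hypothetical, "speed" is treated as pair delay in nanoseconds, and the factor of two is the rule of thumb quoted in the text rather than a measured value.

```python
computed_pair_delay_ns = 32.3  # simulated inverter-pair delay from the text
datasheet_minimum_ns = 33.0    # data-sheet minimum (best observed) delay

# Relative deviation of the simulation from the data-sheet minimum.
deviation_pct = (
    100.0 * (computed_pair_delay_ns - datasheet_minimum_ns) / datasheet_minimum_ns
)

# Rule of thumb from the text: the median is about twice the minimum.
median_estimate_ns = 2.0 * datasheet_minimum_ns

print(f"deviation from data sheet: {deviation_pct:.1f} %")
print(f"estimated median pair delay: {median_estimate_ns:.0f} ns")
```

On these figures the simulated delay falls within a few percent of the data-sheet minimum, which is the sense in which the model was judged adequate.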
(2) D. Burr, Computer Analysis of Switching Speeds in Digital CMOS Circuits,
GEL 658-2, August, 1968.
C-MOS TIME-SHARING COMPUTER FORM
F. CARLSON
Y2621
DIAL 452-2020 TO ACCESS COMPUTER
BEGIN 22	 /W 14 5
/ON 21245	 CClVrRCL
01/06/6? 13.13 ?2
READY
/CATALOG
*L-INPUT	 *L-TAPE04	 *L-TAPE01	 *TESTTAPE
*L-PLOT	 *L-TAPE32	 *L-TAPE59
READY
/CODE INPUT
1160
PLS ENTER CIRCUIT DESCRIPTION
TYPE # OF INDEPENDENT NODES, DEPENDENT NODES,
RESISTORS,CAPACITORS,PMOS,NMOS
NI:
ND:
NR:
NC:
NP:
NN:
RESISTOR # :VALUE(KOHMS),NODE,NODE
R 1:
R 2:
R 3:
R 4:
R 5:
R 6:
R 7:
R 8:
R 9:
R10:
CAPACITOR # :VALUE(PF),NODE,NODE (ALL DEP. NODES MUST HAVE A CAPACITOR)
C 1:
C 2:
C 3:
C 4:
C 5:
C 6:
C 7:
C 8:
C 9:
C10:
C11:
C12:
C13:
C14:
C15:
C16:
C17:
Figure 12. Example of Form Used to Enter Data Into the Computer (Sheet 1 of 3)
PMOS # :VT,CH.WIDTH(MILS),CH.LENGTH(MILS),D NODE,
G NODE,S NODE
PMOS 1:
PMOS 2:
PMOS 3:
PMOS 4:
PMOS 5:
PMOS 6:
PMOS 7:
PMOS 8:
PMOS 9:
PMOS10:
NMOS # :VT,CH.WIDTH,CH.LENGTH,D NODE,G NODE,S NODE
NMOS 1:
NMOS 2:
NMOS 3:
NMOS 4:
NMOS 5:
NMOS 6:
NMOS 7:
NMOS 8:
NMOS 9:
NMOS10:
ENTER FOLLOWING CHIP PARAMS----REL PERMITTIVITY OXIDE,SILICON,
OXIDE THICKNESS(ANG),N MOBILITY,DONOR CONCENTRATION,P MOBILITY,
ACCEPTOR CONCENTRATION,GAMMA
EPX=
ESI=
TOX=
UEFFN=
DONORS=
UEFFP=
ACCEPT=
GAMMA=
WHAT KIND OF RUN IS THIS?
MODE 1 OR MODE 2:
ENTER PULSE HEIGHT,WIDTH(SEC),TAU(SEC),&NODE
V1;-
TP=
TAU=
vaD<<
DO YOU NEED DC SOLUTION?
ENTER INDEP NODE VOLTS AND DEP NODE INITIAL VOLTS
NODE 1 MUST BE MOST +, NODE 2 MOST -
INDEP NODE = CONSTANT VOLTS
NODE 1=
NODE 2=
NODE 3=
Figure 12. Example of Form Used to Enter Data Into the Computer (Sheet 2 of 3)
DEP NODE = INITIAL VOLTS
NODE 4=
NODE 5=
NODE 6=
NODE 7=
NODE 8=
NODE 9=
NODE10=
NODE11=
NODE12=
NODE13=
NODE14=
NODE15=
NODE16=
NODE17=
NODE18=
NODE19=
NODE20=
ENTER CONTROL INFO----PRINT INTERVAL,STOP TIME,
INITIAL INTEGRATION INTERVAL,MAX EIGENVALUE(SEC OR 1/SEC)
TPRINT=
TSTOP=
TINTED=
EIGEN=
STOP AFTER x:140
1160
IF, UNDER /CATALOG, A L-TAPE03 APPEARS, TYPE
DROP TAPE03
MACHINE WILL ANSWER "READY"
Figure 12. Example of Form Used to Enter Data Into the Computer (Sheet 3 of 3)
The test circuit was reprogrammed into the computer with slightly lower
mobilities and with gate-to-output feedthrough capacitance eliminated. This
capacitance was added instead to the capacitance from output to ground. This
approximation was necessary to speed up computer run time; the small time
constants involved caused the computer to go through an excessive number of
integrations. The result of this run was a pair delay time of 35.5 nanoseconds.
Since the data sheet specified a minimum of 33 nanoseconds, it was felt that
this model was adequate for use in simulating a portion of the parallel
processor. Table II lists the capacitance parameters used for the MOS model.
TABLE II. PARASITIC CAPACITANCE VALUES USED FOR
COMPUTER MODEL OF COS/MOS CIRCUITS
Element         Parameter                                 Value
Each pn pair    Capacitance of gate to substrate          0.21 pF/1.1 mil
                or AC ground                              channel width
Each p drain    Capacitance of drain to substrate         0.23 pF/1.21
                or AC ground (includes feedthrough        square mil
                capacitance from gate)
Each n drain    Capacitance of drain to substrate         0.28 pF/1.21
                or AC ground (includes feedthrough        square mil
                capacitance from gate)
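As a rough illustration of how the tabulated values might be applied, the sketch below estimates the capacitance on one inverter output node. It assumes that each value scales linearly with channel width or drain area, which the report does not state; the function and parameter names are hypothetical.

```python
# Table II values, expressed per unit width or area (linear scaling assumed).
GATE_PF_PER_MIL = 0.21 / 1.1          # gate capacitance per mil of channel width
P_DRAIN_PF_PER_SQ_MIL = 0.23 / 1.21   # p drain capacitance per square mil
N_DRAIN_PF_PER_SQ_MIL = 0.28 / 1.21   # n drain capacitance per square mil


def output_node_capacitance_pf(load_widths_mils, p_drain_sq_mils, n_drain_sq_mils):
    """Sum the drain capacitance of the driver and the gate capacitance
    of every pn pair it drives (hypothetical helper)."""
    gate_load = sum(GATE_PF_PER_MIL * w for w in load_widths_mils)
    drains = (P_DRAIN_PF_PER_SQ_MIL * p_drain_sq_mils
              + N_DRAIN_PF_PER_SQ_MIL * n_drain_sq_mils)
    return gate_load + drains


# One inverter with 1.21-square-mil drains driving one 1.1-mil-wide pn pair.
c_node = output_node_capacitance_pf([1.1], 1.21, 1.21)
print(f"{c_node:.2f} pF")
```

For the single-load case shown, the estimate is simply the sum of the three table entries.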
The arithmetic, register, and control portions of the first stage were
programmed to check switching speed. The sections actually programmed are
shown schematically in Figure 13 along with the waveforms at appropriate points.
are from start of control signal to leading edge of circuit.
2enr points).
[Figure 13: sections of the first stage programmed for the switching-speed study, with waveforms; not legible in this copy]
SECTION IV
COS/MOS FUNCTIONAL LOGIC GATING
A. GENERAL DISCUSSION
Any complex logic function expressed by a truth table or a Boolean equation
may be translated into a hardware schematic by one of two methods: the
building-block method, which utilizes standardized AND and OR circuits; or the
functional-gating method, which utilizes components to synthesize the function.
The preference within the computer industry traditionally has been for the
building-block approach, despite the fact that functional gating requires
fewer components to realize the same logic. This preference has been due, in
part, to the more complex appearance of functional logic schematics and to the
subsequent greater difficulty caused by this complex appearance in debugging
and troubleshooting. Compare the identical logic in Figures 14 and 15, for
example, in terms of tracing a signal path.
Another traditional reason for avoiding functional logic has been the
complexity of gate design; as long as gates are structured of passive resistors
and capacitors as well as active devices, this reason is valid. Fan-in,
fan-out, load current, drive current, offset voltage, and level shifting all
must be considered in conventional gate design, and it has been found
preferable to limit building blocks to a few well-designed types that are
repeated over and over.
With the advent of complementary MOS array technology, the situation has
changed. As will be shown in the following paragraph, the translation of
Boolean expressions into COS/MOS functional logic is extremely simple and
straightforward. The saving in components is impressive (40 components in
Figure 14, for example, compared to 18 components in Figure 15). There is no
penalty in circuit-design costs, because only p-and-n active devices are used.
Each p-and-n pair constitutes a basic decision-making element, with a single,
Figure 14. Building Block
Figure 15. Functional Gating
common input to the gate of each device; they always operate as a pair, with
one device on and the other off. There is no load current, the output is at
either supply or ground potential, there is no level shifting (since inputs
and outputs are at the same potential), and they may be combined in series or
parallel configurations with other p-and-n pairs without restriction.
Since there is no access to individual components in an array configuration,
the computer builder performs no debugging or troubleshooting on the
packaged array. The array is a true "black box" (the pellet whimsically has
been called a "black speck"); therefore, the logic can be presented arbitrarily
by building-block gates or any other simulative symbols.
B. TRANSLATION PROCEDURE
A typical example will be used to demonstrate the simple step-by-step
procedure for translating a Boolean expression into a functional logic
schematic. The logic in building-block form is shown in Figure 14. The logic
will be that of the circuit shown in sheet 7 of Figure 3, with A, B, C, etc.,
replacing the actual signals.
(1) The Boolean expression is:
(A · B · C · D · E) + (C · F · D · E) + (G · D · E) + (E · H) + I = Z . . . . . . . (27)
(2) Write the expression in vertical columns, using a separate
column for each term and a separate line for each different
variable:
A
B
C    C
D    D    D
E    E    E    E
     F
          G
               H
                    I
(3) Circle the common variables:
A
B
C    C
D    D    D
E    E    E    E
     F
          G
               H
                    I
(4) Organize the n devices first. A series string of n units (each
vertical column) represents an AND function, since all units must
be on simultaneously to close the circuit path. Remember that
one end of each n string must be at the output, and one end must
be at ground. Since E is common to four of the five columns,
choose E to be ground; therefore, F, G, and H will connect to the
output, and I (sole variable in its column) must connect to both
output and ground:
A                        Output
B                        I
C    C                   Ground
D    D    D
E    E    E    E
(5) Box the variables and combine common elements:
[schematic of the combined n section; not legible in this copy]
Step (5) completes the n section.
Every n unit must have an associated p unit, with the same input variable
connected to each pair. The p units also must form a path from supply (+VDD)
to the output line.
From De Morgan's theorem, the relationship between AND and OR is that of
negated variables: $\overline{A \cdot B} = \bar{A} + \bar{B}$.
Since the variable that turns on an n device also turns off its companion
p unit, the negation is automatic, and the n series string is equivalent to
the p parallel units.
To derive the p schematic from the n section which was completed in step (5),
therefore, it is necessary merely to configure series units in parallel and
parallel units in series. NA is in series with NB, for example, and the pair
is paralleled by NF. The p functions, therefore, are parallel PA and PB in
series with PF.
(6) Similarly, NC is in series with the NA, NB, and NF group, and
this combination, in turn, is paralleled by NG. Replacing series
with parallel and parallel with series, PC will be parallel to
PA, PB, and PF, and PG will be in series with the combination:
[partial p schematic; not legible in this copy]
(7) Continue toward completion of the p section, replacing parallel
with series and series with parallel:
[completed p schematic; not legible in this copy]
(8) Redraw with conventional symbols:
[final schematic; not legible in this copy]
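The series/parallel interchange used in steps (4) through (8) can be checked with a small sketch. It rests on two assumptions: expression (27) is read as ABCDE + CFDE + GDE + EH + I, and the n section is the nested network implied by steps (4) to (6). The tuple encoding and function names are hypothetical.

```python
from itertools import product

VARS = "ABCDEFGHI"

# n section: ("s", ...) is a series connection, ("p", ...) a parallel one.
# Assumed reading: E*(H + D*(G + C*(F + A*B))) in parallel with I.
N_NET = ("p",
         ("s", "E",
          ("p", "H",
           ("s", "D",
            ("p", "G",
             ("s", "C",
              ("p", "F", ("s", "A", "B"))))))),
         "I")


def dual(net):
    """Replace series with parallel and parallel with series."""
    if isinstance(net, str):
        return net
    kind, *parts = net
    return ("p" if kind == "s" else "s", *[dual(part) for part in parts])


def conducts(net, device_on):
    """True when the network forms a closed path for the given device states."""
    if isinstance(net, str):
        return device_on[net]
    kind, *parts = net
    states = [conducts(part, device_on) for part in parts]
    return all(states) if kind == "s" else any(states)


def device_count(net):
    return 1 if isinstance(net, str) else sum(device_count(p) for p in net[1:])


def expression_27(v):
    a, b, c, d, e, f, g, h, i = (v[name] for name in VARS)
    return bool((a and b and c and d and e) or (c and f and d and e)
                or (g and d and e) or (e and h) or i)


P_NET = dual(N_NET)
for bits in product((0, 1), repeat=len(VARS)):
    v = dict(zip(VARS, bits))
    n_on = conducts(N_NET, {k: bool(x) for k, x in v.items()})  # n on when input high
    p_on = conducts(P_NET, {k: not x for k, x in v.items()})    # p on when input low
    assert n_on == expression_27(v)  # n section realizes the expression
    assert n_on != p_on              # exactly one path conducts at any time

print(device_count(N_NET) + device_count(P_NET))
```

Under these assumptions the device count comes to 18, in line with the component count quoted for Figure 15, and the output is never connected to supply and ground at the same time.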
SECTION V
DESIGN AND LAYOUT
A. INTRODUCTION
A monolithic four-stage parallel processor array (Figure 16) which forms
the basic cell of a 16-stage parallel processor has been designed in partial
fulfillment of Phase I requirements of NASA Contract NAS 5-11577 for the
Goddard Space Flight Center. The parallel processor basically is a four-stage
shift register that is capable of both serial and parallel access. The encoded
control logic that is wired in facilitates two's complement addition, AND, OR,
and EXCLUSIVE-OR logic operations, as well as right, left, and right cyclic
shifts. It also provides for negative, zero, and overflow indications.
COS/MOS logic and COS/MOS circuit configuration are used to design an array
that functionally is a computer on a chip except for memory and buffer
circuits. It requires low quiescent power dissipation and provides computing
times of less than a microsecond. The unique COS/MOS array design yielded a
high packing density, making feasible a circuit containing 750 n- and p-channel
devices packed in a chip size of 150 mils square. When completed, this will
be the largest COS/MOS array fabricated in industry to date.
Figure 1 shows the lead requirement and Figure 17 shows the lead
configuration for the four-bit slice to be packaged in a 28-lead flat-pack
integrated circuit. Figure 5 is a schematic layout, indicating the
interconnection of four four-stage chips to form a monolithic 16-bit parallel
processor. The lead configuration has been chosen such that all the input
encoding controls fan either on the top or the bottom of the slice, the input
data lines fan to the right, and the remaining indicators and controls fan to
the left. Since the control leads are duplicated at the top and the bottom of
the slice, this arrangement makes feasible the extension to a 16-stage
processor by two-layer metal, beam lead, or simple wire-bonding techniques. The
wire-bonding technique in combination with a suitably designed package will be
given primary consideration during Phase II of the contract. This layout makes
possible monolithic interconnection by wire bonding either between adjacent
chips or around nonoperating chips. A special ceramic 40-lead flat pack will be
developed for the 16-stage processor during Phase II of the contract. Some
preliminary designs are already under consideration.
Figure 16. Layout of the Parallel Processor
Figure 17. Lead Configuration for the Four-Bit Slice Using 28-Lead Flat Pack
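The packing density implied by these figures can be worked out directly. The sketch below uses only the device count and chip size quoted in the text plus the standard conversion of 1 mil = 0.0254 mm; the variable names are hypothetical, and the result is an average that includes interconnect and bonding-pad area.

```python
MIL_TO_MM = 0.0254  # one mil is one thousandth of an inch

devices = 750        # n- and p-channel devices quoted in the text
side_mils = 150.0    # chip size of 150 mils square

area_sq_mils = side_mils ** 2
area_sq_mm = (side_mils * MIL_TO_MM) ** 2

sq_mils_per_device = area_sq_mils / devices
devices_per_sq_mm = devices / area_sq_mm

print(f"{sq_mils_per_device:.0f} square mils per device")
print(f"{devices_per_sq_mm:.1f} devices per square millimeter")
```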
B. LAYOUT DESIGN CONSIDERATIONS AND IMPLEMENTATION
The circuit schematic of the four-stage parallel processor is shown in
Figure 2. The complexity of the array required a systematic design approach.
The design philosophy centered on optimizing the array with a minimum of
interconnect wiring and tunnels required to feed the decoded control lines to
optimized functional logic blocks on parallel data lines. Thus, emphasis was
placed on achieving a layout suitable for the control-logic decoder with
vertical input instruction lines fanning in from both ends and the horizontal
decoded control lines fanning out at appropriate levels to feed the logic
blocks in successive columns. Figure 18 is a schematic of the layout with
functional logic blocks. Conservative design tolerances, in terms of channel
widths, clearances between contact and diffusion edge, clearances between metal
lines, etc., and particularly between p+ and n+ power lines, were followed to
minimize leakage. This conservative approach resulted in a design trade-off
between yield and chip size. Since precise alignment of all lines in the array
is desirable, all the gates were positioned in one direction (horizontal with
reference to Figure 18), necessitating a larger chip area. To further
facilitate the extension of this layout to Phase II of the contract, bonding
pads for input instructions and control lines were provided at both ends and
at the top and the bottom of the processor. The 10-percent increase in chip
area, which is the result of this feature, is a worthwhile sacrifice
considering the advantage gained towards implementing Phase II.
Parts of the four vertical data columns are similar and repetitive.
Advantage is taken of similarity and symmetry in the design such that two
data columns share a P well, and the two control logic columns with their
interconnecting cross tunnels also share a P well. In this way, the design
contains a total of only three P wells and thereby minimizes well periphery.
Figure 18. Functional Layout of the Parallel Processor
:wo functional logic blocks. the 1) flip flop and the EXC'.USIVE OF., arz
discussed here to illustrate the development t:f the layout. Figure 19 is the
logic block diagram of a D flip flop with its truth table. The specification
calls for the D flip - lop to transfer D input to Q output on lob. Go High clock
transition, and for .he Q output to be buffered. F i gure 20 gtves the logic
schematic of the twc basic components, the transmission gate and the inverter,
from which the D flip flor can be built. The complementary circuit configura-
tion for these two components is shown in Figure 21. The -ransmission gates
are closed or opened by the control signal, e.g., clock iti this Figure, that
is generated elsewhere in the control logic column. Signals from the control
logic column feed several transmission gates, and the inverter required with
each transmission gate is localized and paired. A method of wiring the combina-
tion of inverter and transmission Kate is indicated in Figure 22.
A particular advantage of this wiring method is that a number of such
building blocks can be stacked up. In this configuration the device source and
drain diffusions are found along the vertical, while the signal interconnect
lines are found along the horizontal with diffused power-supply connections.
The block diagram representation for layout development of the D flip flop
(Figure 23) exemplifies this approach. The transmission gates and inverters
are positioned appropriately one above the other to facilitate a very compact
layout. The composite layout that follows this scheme is illustrated in
Figure 24. The cross-hatched areas represent p+ diffusion regions and the
dotted areas represent n+ diffusion regions. The shaded horizontal areas
represent metal interconnect lines. By removing either a cross-hatched or
dotted tone from the contact area, which can be seen clearly except for the
shading over it, this composite can be used to single out the different masks
required for fabrication.
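As an illustration only, the clocked behavior called for in the specification
(D transferred to Q on the low-to-high clock transition) can be sketched in
Python. The sketch assumes a conventional master-slave arrangement in which the
normally closed transmission gate (CTR) conducts while the clock is low and the
normally open transmission gate (OTR) conducts while the clock is high; the
report's schematic is not legible enough to confirm the exact wiring. The class
and method names are hypothetical, and the signal inversions introduced by the
inverters are ignored.

```python
class DFlipFlop:
    """Behavioral sketch of a master-slave D flip flop (assumed arrangement)."""

    def __init__(self):
        self.master = 0  # level held by the first (master) loop
        self.slave = 0   # level held by the second (slave) loop, seen at Q

    def step(self, d, clock):
        """Apply input level d at the given clock level (0 or 1); return Q."""
        if clock == 0:
            # Assumed: CTR at the master input conducts, so the master
            # follows D while the slave loop holds its previous level.
            self.master = d
        else:
            # Assumed: OTR at the slave input conducts, so the slave takes
            # the level captured by the master; the master loop holds.
            self.slave = self.master
        return self.slave


ff = DFlipFlop()
ff.step(1, 0)          # clock low: D is sampled by the master only
q_rise = ff.step(1, 1) # low-to-high transition: Q takes the sampled level
q_hold = ff.step(0, 1) # D changes while the clock is high: Q is unchanged
```

In this model a change on D while the clock is high does not reach Q until the
next low-to-high transition, which is the behavior the truth table of Figure 19
is described as specifying.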
Figure 19. A Functional Logic Block: D Flip Flop (logic block and truth table)
Figure 20. D Flip-Flop Logic Schematic
Figure 21. Transmission Gate and Inverter Circuit, Schematic
A. Inverter
B. Transmission gate (normally closed between terminals 1 and 2 when the
clock is low)
Figure 22. Transmission Gate Layout
A. Block representation for layout
B. Device topology
NOTE: CTR = normally closed transmission gate (normal when the clock is low)
LEGEND: P+ source or drain; N+ source or drain; gate region with metalization
Figure 23. Block Diagram Representation for Layout Development of D Flip Flop
NOTES: INV = inverter; CTR = normally closed transmission gate;
OTR = normally open transmission gate
Figure 24. D Flip-Flop Composite Layout
LEGEND: N+ region; P+ region
C. EXCLUSIVE OR
This functional logic block, which is repeated on each data line, is a
case where the design methods described in the previous paragraph were combined
with the functional gating technique discussed initially, and in detail, in
Section III of this report. The logic specified for this relatively simple
functional block is as follows:
EXCLUSIVE-OR LOGIC (logic diagram with inputs X and Y)
In addition to NOR and NAND, the logic shown above involves OR gates. OR
gates would require a stage of inversion if complementary devices are to be
used in assembling the logic. The device count necessarily would go up. To
minimize the number of devices required to generate the function, the following
technique, using functional gating, is applied.
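\[
\begin{aligned}
\text{EXCLUSIVE-OR output} = S &= X\bar{Y} + \bar{X}Y \\
&= X\bar{Y} + X\bar{X} + \bar{X}Y + Y\bar{Y} \\
&= \bar{X}\,(X+Y) + \bar{Y}\,(X+Y) \\
&= (\bar{X}+\bar{Y})\,(X+Y) \\
&= \overline{XY}\,(X+Y) \\
&= \overline{\overline{\overline{XY}\,(X+Y)}} \\
&= \overline{(XY) + \overline{(X+Y)}}
\end{aligned}
\]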
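The reduction of the EXCLUSIVE-OR function to a NOR form can be checked
exhaustively over the four input combinations. The following is a minimal
sketch, with a hypothetical function name, that evaluates each stage of the
reduction (as reconstructed here with the complement bars restored) and
compares it with the EXCLUSIVE OR of X and Y.

```python
from itertools import product


def xor_reduction_steps(x, y):
    """Evaluate each stage of the EXCLUSIVE-OR reduction for bits x and y."""
    nx, ny = 1 - x, 1 - y  # complements of X and Y
    return [
        (x & ny) | (nx & y),                        # X.Y' + X'.Y
        (x & ny) | (x & nx) | (nx & y) | (y & ny),  # add the null terms
        (nx & (x | y)) | (ny & (x | y)),            # factor out (X+Y)
        (nx | ny) & (x | y),                        # (X'+Y')(X+Y)
        (1 - (x & y)) & (x | y),                    # (XY)'(X+Y)
        1 - ((x & y) | (1 - (x | y))),              # NOR of XY and (X+Y)'
    ]


def reduction_is_consistent():
    """Return True if every stage equals XOR for all input combinations."""
    return all(
        step == (x ^ y)
        for x, y in product((0, 1), repeat=2)
        for step in xor_reduction_steps(x, y)
    )
```

The last entry is the form the text describes as a NOR output: one NOR input
is the product XY and the other is the complement of X+Y.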
The final reduction is a NOR output which can be functionally designed as
indicated below:
(Circuit diagram: p string and n string with a common output, supplied
from VDD.)
Putting the p string and the n string together and generating the complement
of X+Y from a simple NOR gate, the EXCLUSIVE-OR circuit becomes:
(Circuit diagram: EXCLUSIVE-OR circuit, supplied from VDD.)
Figure 25 illustrates the device topology for the functionally designed
EXCLUSIVE-OR circuit. Figure 26 is the composite layout generated for this
logic functional block. The composite employs the design considerations
outlined previously for the D flip-flop circuit.
Figure 25. Block Diagram for Exclusive-Or Layout Development
(Blocks shown: NOR gate and functional gate producing the EXCLUSIVE-OR output.)
Figure 26. Exclusive-Or Composite Layout
LEGEND: P+ region; N+ region
SECTION VI
PROCESSING
A. INTRODUCTION
Processing of complex, LSI, COS/MOS semiconductor devices requires an
unusual degree of manufacturing sophistication. Although processing on the
parallel processor circuits has not started yet, it is anticipated that
high-yield-fabrication procedures will account for a significant portion of the
engineering effort expended in this program. The following processing
considerations, pertinent to this program, will require unusual engineering
attention.
a. The successful fabrication of 750-component LSI chips for Phase I
and 3000-component LSI chips for Phase II would represent a
significant challenge for any integrated-circuit technology. The
challenge is great in the COS/MOS technology where processing
sequences are highly complex.
b. Successful fabrication of COS/MOS devices requires ultraclean
processing to avoid mobile charges in the gate oxide and/or
gate-oxide defects.
c. Unusually high-accuracy alignments and accurate geometric control
of components are essential to ensure proper device operation.
d. The large chip size (0.145 inch by 0.155 inch) is at the limits
of present photomask art for generating images without distortion
and with high resolution.
e. High-yield techniques are essential to achieve the objectives
of the program at a reasonable yield. Special handling and
photolithographic fabrications appear to be essential.
B. COS/MOS FABRICATION PROCESS
The first step in the fabrication process is shown in A, Figure 27. In
this step a low concentration "well" is diffused into the homogeneous N-type
silicon which is typically 1 to 2 ohm-cm material. Next, the high concentra-
tion regions which form the source and drain of the P-channel devices are
diffused as shown in B, Figure 27. In a similar manner, high concentration
n-regions are diffused within the well to form the source and drain for the
N-channel devices as shown in C, Figure 27. The physical separation between
these diffused regions determines the channel length and directly affects
device characteristics.
Up to this point, the fabrication of complementary MOS devices is similar
to the fabrication of conventional bipolar planar devices, i.e., processing
steps for both types of device include silicon dioxide growth, photoresist
application, oxide etch, high-temperature deposition and diffusion of impuri-
ties, and masking operations. At this point, however, MOS fabrication enters
a critical phase because in MOS construction the purity of the silicon dioxide
directly over the channel and beneath the metal gate is far more critical than
in bipolar devices. Any mobile charges in this oxide due to sodium or calcium
impurities move under the influence of the high field between the gate and
substrate and can induce uncontrolled leakage currents between the source and
drain. These leakage currents were a major obstacle in the development of
early MOS enhancement devices.
To ensure that a pure channel oxide will exist in the critical region, the
existing oxide is etched back to the surface in the source-channel-drain
areas as shown in D, Figure 27. The wafer is then thoroughly cleaned, and
a thermal oxide is grown in a specially prepared and maintained "clean" furnace.
A small amount of phosphorus-doped oxide is usually added as a "getter" dur-
ing this process to immobilize any stray ions that may intrude on the surface.
This step is illustrated in E, Figure 27. Precautions are also taken during
the metal evaporation phase to ensure that no contaminants reach the surface
of the channel oxide beneath the gate metalization. These special precautions
require the use of ultra-pure aluminum wire and a scrupulously clean vacuum
chamber. The contact-opening and metal-etch steps are shown in F, Figure 27.
[Figure 27: panels A through F; figure text not legible in this reproduction]
C. CAPACITANCE-VOLTAGE CURVES
The capacitance-voltage measurement technique has been valuable in
developing the clean-oxide clean-metal process. Briefly, this technique
involves the measurement of the capacitance due to the oxide dielectric in
series with the capacitance of the surface space-charge layer. Ideally, the
induced junction is due solely to the field between the metalized electrode (or
channel if this were a gate) and the substrate. As explained previously,
however, mobile charges in impure oxide can induce an inversion layer and,
therefore, change the bias voltage at which the series capacitance appears.
Figure 28 shows capacitance-voltage curves for clean and impure oxides after
one minute at 300°C, with 10 volts positive bias.
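The series combination described above can be written out as a short worked
relation. This is a textbook sketch rather than a formula taken from the
report; the symbols $C_{ox}$ (oxide capacitance), $C_{sc}$ (space-charge layer
capacitance), $Q_m$ (mobile charge per unit area), $\varepsilon_{ox}$, and
$t_{ox}$ are introduced here for illustration.

```latex
% Measured capacitance: oxide in series with the surface space-charge layer
C = \frac{C_{ox}\, C_{sc}}{C_{ox} + C_{sc}},
\qquad
C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}} \quad \text{(per unit area)}

% Assumed limiting case: mobile charge Q_m drifted to the silicon interface
% displaces the curve along the voltage axis by roughly
|\Delta V| \approx \frac{Q_m}{C_{ox}}
```

Under this simplification, a larger voltage shift after the bias-temperature
stress indicates more mobile charge in the oxide, which is how the clean and
impure oxide curves are compared.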
Figure 28. Capacitance-Voltage Curves for Clean and Impure Oxides
(Axes: capacitance versus voltage, 0 to 6 volts. Clean oxide: 0.6 volt shift
due to movement of ions, between the initial curve and the curve after
+300°C, 1 min, +10 V. Impure oxide: 4.2 volt shift due to movement of ions
under the same conditions.)
SECTION VII
TESTING
A. INTRODUCTION
	
A significant dilemma in the implementation of [illegible] logic is the
problem of verifying the proper operation of the fabricated arrays. Not only
is it necessary to fabricate and test volume quantities of an array, but it is
also essential to isolate any initial array design faults.
B. INITIAL DESIGN FAULT ISOLATION
When the design and the sample fabrication of an array is complete, it is
necessary to verify that all design phases have been performed in a proper and
logical fashion. To aid in "debugging" the parallel processor array, a number
of internal test probe pads were included, in addition to the ordinary bond
pads. These internal test points were strategically located to provide ease
in troubleshooting should a failure be revealed in the logic design or in the
mask design. By probing these pads it is possible to isolate areas of the
array and verify their proper operation. By using these pads, therefore, a
fault can easily be located, at least to one gate structure and often to the
exact device. This capability greatly facilitates the process of locating a
logic or mask error without elaborate rechecking.
C. PRODUCTION TEST PROCEDURES
When it is determined that the masks are correct and functional arrays
are producible, a test matrix is required which will exercise each array
thoroughly to select functionally usable chips. In this procedure, only the
27 bond pads are accessible to the test words in which the inputs will
automatically be sequenced and the outputs verified.
In order to exhaustively test the array, all possible combinations of the
15 required inputs would be applied. This would require 2¹⁵ (over 32,000) test
words to be applied. All outputs would be monitored for each test word. This
procedure results in an exorbitant period of test time to provide an
indication of array operability. Hence, it behooves the designer to devise a
short, concise test sequence which is practical in time, yet will supply
assurance that the array is indeed operating as required.
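The size of the exhaustive test set follows directly from the number of
inputs. As a small sketch (the function names are hypothetical), the count and
the words themselves can be generated as follows; each word is one combination
of binary levels on the 15 inputs.

```python
from itertools import product


def exhaustive_word_count(n_inputs):
    """Number of test words needed to apply every input combination."""
    return 2 ** n_inputs


def exhaustive_test_words(n_inputs):
    """Lazily yield every combination of n_inputs binary input levels."""
    return product((0, 1), repeat=n_inputs)


word_count = exhaustive_word_count(15)
```

Because the count doubles with every added input, the reduced test matrix
described next targets individual device pairs instead of input combinations.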
The basic concept leading to the reduced test matrix for the parallel
processor involves the verification of proper operation of each pn device pair.
To achieve this, each logic gate structure must be made active in all possible
modes, independently, then made inactive in as few conditions as possible where
it can be shown that each device is capable of controlling the logic output.
The procedure is illustrated in Figure 29 where four (4) tests are all
that are required to test a three-input NAND gate. In the first test, if
any of the p devices are "on," the output will be false, or there will be a
DC current path from +V to ground which can easily be detected in the power
tests. In the second, third, and fourth tests, an n device that is "on" will
be detected in the power test, and the p device that is "off" will show up as
a logic error. Also, any error resulting from metalization shorts will be
detected in power tests. The extension of this procedure to functional logic
structures is quite simple and provides an even greater reduction of "tests per
input" than for the NAND/NOR structure. (See Figure 30.)
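The four-test argument for the three-input NAND gate can be illustrated with a
small fault simulation. This is a sketch under stated assumptions, not the
report's procedure: each device is modeled as a switch that may be stuck "on"
or stuck "off", the n devices are taken to be in series to ground and the p
devices in parallel to +V, and an output that is left undriven is assumed to be
observed as a logic error. All function and variable names are hypothetical.

```python
INPUTS = ("A", "B", "C")


def nand_response(levels, fault=None):
    """Return (driven_output, dc_path) for a complementary 3-input NAND.

    levels maps input name to 0 or 1; fault is None or a pair such as
    ("Ap", "off"), naming one device and the state it is stuck in.
    """
    def conducts(device, normally_on):
        if fault is not None and fault[0] == device:
            return fault[1] == "on"
        return normally_on

    n_on = [conducts(name + "n", levels[name] == 1) for name in INPUTS]
    p_on = [conducts(name + "p", levels[name] == 0) for name in INPUTS]
    pull_down = all(n_on)  # n devices in series to ground
    pull_up = any(p_on)    # p devices in parallel to +V
    dc_path = pull_up and pull_down
    if pull_up and not pull_down:
        return 1, dc_path
    if pull_down and not pull_up:
        return 0, dc_path
    return None, dc_path   # undriven output, or a supply-to-ground path


def reduced_tests():
    """All inputs high, then each input low in turn (N + 1 test words)."""
    tests = [dict.fromkeys(INPUTS, 1)]
    for name in INPUTS:
        word = dict.fromkeys(INPUTS, 1)
        word[name] = 0
        tests.append(word)
    return tests


def detects(tests, fault):
    """True if any test shows a DC path or a wrong (or undriven) output."""
    for levels in tests:
        expected, _ = nand_response(levels)
        out, dc_path = nand_response(levels, fault)
        if dc_path or out != expected:
            return True
    return False


ALL_FAULTS = [(name + kind, state)
              for name in INPUTS for kind in "np" for state in ("on", "off")]
all_detected = all(detects(reduced_tests(), fault) for fault in ALL_FAULTS)
```

In this model the four words cover all twelve single-device faults, against
eight words for an exhaustive test of the same gate. Dropping any one of the
words leaves at least one fault uncovered, for example a stuck-off Ap device
when the word with A low is omitted.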
A    B    C    Z    TESTS
+V   +V   +V   G    An Bn Cn ON, Ap Bp Cp OFF
G    +V   +V   +V   Ap ON, An OFF
+V   G    +V   +V   Bp ON, Bn OFF
+V   +V   G    +V   Cp ON, Cn OFF
NOTE: An = n device controlled by A
      Ap = p device controlled by A
Figure 29. Test Requirement for Three-Input NAND Gate With Inputs Active High
K1   K2   Dx   Z    TESTS
1    0    1    0    K1n Dxn ON, K1p Dxp OFF
0    1    0    0    K2n D̄xn ON, K2p D̄xp OFF
1    0    0    1    K2p Dxp ON, K2n Dxn OFF
0    1    1    1    K1p D̄xp ON, K1n D̄xn OFF
Figure 30. Test Requirement for Function Gate
The testing of a bilateral current device (transmission gate) requires
a different test technique. In testing the conducting state of a transmission
gate in the parallel processor, a sensitized path through the transmission
gate was required to change the state of a D flip flop. The changing-of-state
requirement is derived from the fact that if the transmission gate was off,
the D flip flop input would be open circuited and the charge-storage property
of the device would prevent a change in the flip flop for millisecond time
durations. To test for nonconducting states, signals are applied such that
each data terminal of the transmission gate in question has active signals
present which are of opposite state, i.e., "1" and "0." If the gate is
conducting, a low resistance path, approximately 10⁴ ohms, will be present and
can be tested for in quiescent power checks. The quiescent power of a chip
should be less than 12 microwatts, and the leakage from two opposite state
inverters through a transmission gate would be (P = E²/R = (10)²/(3 × 10⁴ Ω)
≈ 3 milliwatts) more than two orders of magnitude above the normal quiescent
power. This approach provides a good tolerance margin for detecting this type
of failure.
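The margin quoted for the quiescent power check can be reproduced with a few
lines of arithmetic. The sketch below assumes E = 10 volts and R = 3 × 10⁴
ohms, values consistent with the 3-milliwatt result quoted in the text; the
function name is hypothetical.

```python
def conducting_gate_power_watts(supply_volts, path_ohms):
    """Power drawn through a conducting path across the supply, P = E^2 / R."""
    return supply_volts ** 2 / path_ohms


QUIESCENT_LIMIT_WATTS = 12e-6  # 12 microwatt quiescent power limit from the text

# Assumed values: 10 V supply and a 3 x 10^4 ohm path through the gate.
leak_watts = conducting_gate_power_watts(10.0, 3e4)
margin = leak_watts / QUIESCENT_LIMIT_WATTS
```

With these assumed values the leakage is roughly 3.3 milliwatts, a few hundred
times the quiescent limit, which agrees with the "more than two orders of
magnitude" stated above.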
The utilization of this approach allows the testing of parallel data paths
simultaneously. Analysis of the individual gate structures along the serial
path is achieved by establishing logic paths to and from the structure in
question so the required gate input conditions are directly manipulated by the
data inputs, and the gate output is directly monitored at the array data
outputs.
D. PARALLEL PROCESSOR TEST PROCEDURE
The test procedure (Table III) for the parallel processor was devised using
essentially two semi-independent test procedures: one for the processing por-
tion of the logic and one for the control portion of the logic. In testing
the processing portion of the array, it is assumed that the control logic is
functioning properly, and the control portion testing is based upon outputs of
control gates being monitored by using conditions on the inputs to the pro-
cessor portion, which is assumed correct, to define logic paths. This is
essentially a process of establishing "logic paths" to and from the structure
to be tested.
The test matrix table has been configured in such a manner as to provide
a complete picture of the relationship between a given test word and the
specific device structure that is being functionally tested. The information
contained in the right-hand column enumerates which particular gate structures
are under test and whether these specific gate structures are being programmed
for an active or for a specific inactive state to exist at the outputs. Thus,
it can be seen that the table provides the user with any and all information
that is required to functionally test out a given chip.
TABLE III. TEST MATRIX FOR THE PARALLEL PROCESSOR ARRAY
(Table body not legible in this reproduction. Legible column groups: mode
independent, mode dependent, clock, data inputs, and tests for.)
With reference to the test matrix table, the first 26 test words are tests
of the control portion of the array. For the majority of tests the array is
in mode III and all of the 16 processor instructions are examined. For each
control instruction (R, AND, L, etc.), the data inputs are defined so as many
test conditions as possible can be monitored. After the initial 16 words,
any untested conditions can be tested.
Upon successful completion of the testing of the control logic, the
processing portion of the array is checked. The arithmetic and shifting
portions are checked independently. In testing the arithmetic portion, the
first EXCLUSIVE-OR structure of the full adders, the carry generation
structures, and the input gating structures are tested independently. Tests
for AND, OR, and SUM transmission gates, and the second EXCLUSIVE-OR structure
are mixed into the above tests.
Testing of the shift register portion is accomplished in a straightforward
shifting test. This test checks out the L, R, and R0 transmission gates and
checks the D flip flop operation. Tests for special indicators are checked
during the control test portion of the test sequence.
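A shifting test of the kind described can be sketched as follows. The model is
a hypothetical four-stage register with left and right shifts only; it does
not represent the R0 path or the actual control inputs of the array, and the
expected patterns are simply a single "1" walked through the register in each
direction.

```python
class ShiftRegister4:
    """Hypothetical 4-stage shift register; index 0 is the leftmost stage."""

    def __init__(self):
        self.bits = [0, 0, 0, 0]

    def clock(self, direction, serial_in):
        """Shift one place left ("L") or right ("R") and return the state."""
        if direction == "L":
            self.bits = self.bits[1:] + [serial_in]
        elif direction == "R":
            self.bits = [serial_in] + self.bits[:-1]
        else:
            raise ValueError("direction must be 'L' or 'R'")
        return tuple(self.bits)


class StuckStageRegister(ShiftRegister4):
    """Same register with stage 2 stuck at 0, to show a failing case."""

    def clock(self, direction, serial_in):
        super().clock(direction, serial_in)
        self.bits[2] = 0
        return tuple(self.bits)


# Expected states while a single 1 followed by 0s is shifted in.
EXPECTED = {
    "L": [(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0)],
    "R": [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (0, 0, 0, 0)],
}


def shifting_test(register):
    """Walk a 1 through the register in both directions; True if all match."""
    for direction, patterns in EXPECTED.items():
        for serial_in, expected in zip([1, 0, 0, 0, 0], patterns):
            if register.clock(direction, serial_in) != expected:
                return False
    return True
```

Every stage must both pass and hold the walking "1" for the test to succeed,
so a single shift sequence exercises the shift path and the storage element
of each stage together.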
In the test matrix, a column contains information as to what conditions
are being tested for in each test word. Examination of this column will reveal
that all conditions required by Section II are satisfied.
SECTION VIII
CONCLUSIONS AND RECOMMENDATIONS
A four-bit parallel processor has been designed. The logic design conforms
to the contract objectives. The array of 750 devices has been laid out on a
chip 145 mils by 155 mils. A composite drawing was completed and rubyliths
were prepared. Extensive checking of the rubylith (with the assistance of
R. Lesniewski of NASA) was used to reduce the probability of error in the
layout. A test matrix was developed to permit testing of the processing and
control portions of the parallel processor.
It is recommended that the program continue as planned.
SECTION IX
PROGRAM FOR NEXT INTERVAL
During the next interval, the four-bit parallel processor unit will be
fabricated and tested.
