Onboard processor addressing and packaging evaluation  Final report by unknown
ONBOARD PROCESSOR
ADDRESSING AND PACKAGING EVALUATION
FINAL REPORT
April 30, 1970
Prepared For
NASA GODDARD SPACE FLIGHT CENTER
Greenbelt, Maryland
WESTINGHOUSE DEFENSE AND SPACE CENTER
Aerospace and Electronics Systems Division
https://ntrs.nasa.gov/search.jsp?R=19700024793 2020-03-11T23:11:17+00:00Z
ONBOARD PROCESSOR
ADDRESSING AND PACKAGING EVALUATION
FINAL REPORT
April 30, 1970
Prepared For
NASA GODDARD SPACE FLIGHT CENTER
Greenbelt, Maryland
WESTINGHOUSE DEFENSE AND SPACE CENTER
Aerospace and Electronics Systems Division
Baltimore, Maryland

TABLE OF CONTENTS

1.0       INTRODUCTION
2.0       TASK 1
2.0.1     Outline of Changes for Task 1
2.0.1.1   Technique I
2.0.1.2   Technique II
2.1       Addressing Technique I
2.1.1     Modifications to the CPU
2.1.2     Addition of Instructions
2.1.2.1   Load Immediate Instructions
2.1.2.2   Plus Immediate Instructions
2.1.2.3   Minus Immediate Instructions
2.2       Addressing Technique II
2.2.1     Design Modifications
2.3       Summary of Task 1
2.3.1     Technique I
2.3.2     Technique II
3.0       TASK 2
3.0.1     Additional Instructions
3.0.2     Functions Deleted
3.0.3     Instructions Removed
3.0.4     Instructions Modified
3.1       Addition of Instructions
3.1.1     Instruction Description and Discussion
3.1.1.1   Indirect Instructions
3.1.1.2   Test for Zero Instructions
3.1.1.3   Exchange Instructions
3.2       Deletions
3.2.1     Scale Register and Associated Instructions
3.2.2     Modifications of the Divide Instruction
3.2.3     Elimination of the OR-AND Flip Flop
3.3       Summary of Task 2
4.0       TASK 3
4.1       Problems Encountered in Partitioning Control Logic
4.2       Evaluation of Present Control Logic
4.3       Approaches to Control Logic Redesign
4.3.1     General Purpose Logic Structure Approach
4.3.2     Partitioning for Substrates
4.3.3     Summary of Control Logic Redesign
4.4       Microprogrammed Control Logic
4.4.1     Partitioning the Memory
4.4.2     External Logic Partitioning
4.4.3     Problems Introduced by Microprogramming
4.4.4     Summary on Microprogrammed Control
4.5       Summary of Task 3

LIST OF ILLUSTRATIONS

Figure    Title
4-1       Examples of Recurring Structures
4-2       General Purpose Logic Structures
4-3       General Purpose Logic Structures
4-4       General Purpose Logic Structures
4-5       General Purpose Logic Structures

LIST OF TABLES

Table     Title
2-1       Control Modifications for Instructions Utilizing Immediate Addressing
2-2       Addressing Mode Selection
3-1       Gate Changes for Task 2
3-2       Division Corrections
4-1       Characteristics of General Purpose Logic Structures
4-2       Results of Logic Partitioning

1.0 INTRODUCTION

This report summarizes the work performed on NASA Contract NAS5-10667,
modification number three. Objectives of this contract were to improve the
programming capability of the OBP system and at the same time reduce CPU
control circuitry for improved packaging feasibility. The Statement of Work
associated with this contract was divided into three main tasks. The first
task, Task 1, dealt with the investigation of two techniques for improving
the addressing capability of the OBP. The evaluation of Task 1 indicated that
neither addressing approach was suitable for the OBP system, and as a result
a new approach was selected for Task 2. Section 3.0 of this report lists
those changes selected for Task 2. After these changes were finalized, they
were designed into the CPU, and drawings were updated. The changes associated
with Task 2 were implemented so that the control circuitry of the CPU was
reduced. This reduction in control circuitry aided the implementation of
Task 3.

The objective of Task 3 was to evaluate the control circuitry on
the basis of redesigning it for better partitioning. Existing control
circuitry in the CPU was organized in groups that had similar logic structures.
These groups were then used in the partitioning of the control logic. The
results of this reorganization were evaluated and it was felt that a more
efficient approach to partitioning could be taken. Therefore, a preliminary
investigation into the use of read-only memories for the control section was
undertaken, since this approach appears to simplify the partitioning task. The
results of this study are also presented in the report.

2.0 TASK 1

The purpose of Task 1 of this report was to evaluate two addressing
techniques for the OBP-CPU. These two addressing approaches were chosen as
the most desirable approaches to consider at the outset of the contract.
Direct inputs from OBP programmers indicated both approaches would provide
the desired addressing flexibility. Therefore the selection of the preferred
approach was primarily dependent on its impact on the hardware. The following
addressing techniques were evaluated.

2.0.1 Outline of Changes for Task 1

The following outline illustrates the changes to the CPU for
Techniques I and II.

2.0.1.1 Technique I

Modifications:
1. Bits 7-12 replace bits 1-5 for minor op-code decoding.
2. Bits 1-6 of the instruction word are used as the immediate
   operand.

Instructions Added:
1. LOAD A IMMEDIATE
2. PLUS A IMMEDIATE
3. MINUS A IMMEDIATE
4. LOAD EA IMMEDIATE
5. PLUS EA IMMEDIATE
6. MINUS EA IMMEDIATE

2.0.1.2 Technique II

Modifications:
1. Bits 12 and 13 of the instruction word are used as an addressing
   mode field for decoding one of four addressing modes.
2. Addition of immediate addressing to applicable major op-code
   instructions. An 11 bit immediate operand is used.
3. Addition of indirect addressing to applicable major op-code
   instructions. A 5 bit page register is required to maintain
   the 16 bit address.

2.1 Addressing Technique I

The discussion of this technique is divided into two sections. The
first section discusses the general modifications required in the CPU
and the second section discusses the specific details of the additional
instructions.

2.1.1 Modifications to the CPU

The implementation of this approach required a change to the instruction
field decoding. A total of six instructions was added to the instruction
set and all six were decoded as minor op-code instructions. Since these
instructions added immediate addressing capabilities to the CPU, it was
desirable to utilize the least significant six bits of the instruction word
as the immediate value, and this selection was incorporated into the design.
This decision was made because of the available gating and flow organizations,
and the inherent programming ease associated with the use of the least
significant bits. The selection of the least significant six bits as the
immediate value required a change in the minor op-code field decoding. In
the original system, bits 1-5 were decoded as the minor op-code. Due to the
addition of the immediate instructions, the minor op-code field was moved
to bits 7-12 and associated decoding was changed to satisfy decoding
requirements.
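
The field layout described for Technique I can be illustrated with a short
sketch. It is not taken from the report: the function names are hypothetical,
and it assumes that bit 1 is the least significant bit of the word, which
follows from the text treating bits 1-6 as the least significant six bits.

```python
def field(word: int, low_bit: int, high_bit: int) -> int:
    """Extract bits low_bit..high_bit of a word (assumption: bit 1 is the LSB)."""
    width = high_bit - low_bit + 1
    return (word >> (low_bit - 1)) & ((1 << width) - 1)


def decode_technique_1(word: int) -> dict:
    """Split a Technique I minor op-code instruction word into its two fields."""
    return {
        "immediate": field(word, 1, 6),   # bits 1-6: immediate operand
        "minor_op": field(word, 7, 12),   # bits 7-12: relocated minor op-code
    }


# Hypothetical instruction word: minor op-code 0b101010, immediate value 7.
word = (0b101010 << 6) | 0b000111
decoded = decode_technique_1(word)
print(decoded)
```

The op-code value used here is arbitrary; the report does not list the actual
minor op-code assignments.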

2.1.2 Addition of Instructions

Technique I provided immediate addressing capability for the OBP
programs by adding six minor op-code instructions to the instruction set.
These added instructions were similar to existing major op-code instructions,
the difference being the use of bits 1-6 of the instruction word as an
operand rather than as part of an address for the operand fetch. The following
instructions were investigated:
1. LOAD A IMMEDIATE
2. PLUS A IMMEDIATE
3. MINUS A IMMEDIATE
4. LOAD EA IMMEDIATE
5. PLUS EA IMMEDIATE
6. MINUS EA IMMEDIATE

2.1.2.1 Load Immediate Instructions

Two load immediate instructions, LOAD A IMMEDIATE and LOAD EA
IMMEDIATE, were evaluated for Technique I. These load immediate instructions
gate the immediate operand field of the instruction word through the adder
and into the selected register.

Two phases (φ1, φ2) are required to implement these instructions, with
phase one utilized as the normal instruction fetch phase. The second phase,
φ2, executes the immediate load by gating the immediate operand into the
adder, whose output is in turn clocked into the selected register. Since load
functions do not require carry delays through the adder, only one clock cycle
is necessary to implement the load. A total of three clock cycles is required
to complete the load immediate instructions.

The existing instructions, LET and SET EXTENSION WITH, are similar in
function to the load immediate instructions. In φ2, the existing instructions
fetch an operand from memory and load the selected register. The fetch-load
function is executed in three clock cycles, for a total of five clock cycles
for complete execution. Since the load immediate instruction requires three
clock cycles for completion, two clock cycles are saved.

The total hardware impact of these instructions was minimal, with the
majority of the changes being implemented in the control area. The MORA,
SUMACC, SUMEA, EA clock and ACC clock control circuits require modification.
In addition, logic was added to decode these instructions and the adder inputs
were changed. The gates controlling the most significant 12 bits of the A
input to the adder were changed from two inputs to three inputs in order to
inhibit these bits during the load immediate instructions.

2.1.2.2 Plus Immediate Instructions

Two plus immediate instructions, which add the immediate value to the
accumulator or to the extended accumulator, were evaluated for Technique I.
The plus immediate instructions gate the immediate value of the instruction
word into one set of adder inputs and the selected register into the other set
of adder inputs. Then, the adder output is clocked into the selected
register.
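
The load immediate and plus immediate data paths can be sketched as follows.
This is only an illustration under stated assumptions: an 18 bit word with the
immediate in the six least significant bits, a simple modular adder, and
hypothetical function names. The report does not describe overflow handling,
so the wrap-around shown here is an assumption.

```python
MASK18 = (1 << 18) - 1   # 18 bit register and adder width
IMM_MASK = 0x3F          # six least significant bits of the instruction word


def adder(a_input: int, b_input: int, carry_in: int = 0) -> int:
    """18 bit adder model; results wrap modulo 2**18 (assumption)."""
    return (a_input + b_input + carry_in) & MASK18


def load_immediate(word: int) -> int:
    """Pass the immediate through the adder with the upper 12 bits inhibited."""
    a_input = word & IMM_MASK      # upper 12 bits of this input are gated off
    return adder(a_input, 0)       # nothing is gated into the other input


def plus_immediate(register: int, word: int) -> int:
    """Add the immediate field of the instruction word to the selected register."""
    a_input = word & IMM_MASK      # immediate on one set of adder inputs
    return adder(a_input, register)  # selected register on the other set


acc = load_immediate(0o777777)     # only the low six bits survive
acc = plus_immediate(acc, 0o000005)
print(acc)
```

The same two functions serve for the A and EA variants, since the report
states that the instructions differ only in the register selected.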

These two instructions have similar phasing and are implemented by
utilizing much of the same gating. These instructions require two phases
(φ1, φ2), with phase one used as the normal instruction fetch phase. Phase
two (which requires 2 clock cycles) gates in the immediate portion of the
instruction word, inhibits the remaining inputs of that set of adder inputs,
and also gates the selected register into the other set of adder inputs. The
adder output is then clocked into the selected register and the instruction is
completed in four clock cycles. In the original CPU design, the PLUS
instruction (which is very similar to the plus immediate instructions) adds
the MOR to the ACC. During the second phase of the instruction, the operand
fetch and add are executed. These two operations are completed in three clock
cycles. Thus, the PLUS instruction is five clock cycles in duration, only one
cycle longer than the plus immediate instructions.

Very few additional gates were required to implement the plus immediate
instructions since existing phasing was used to implement the two instructions.
Circuitry was required for the decoding logic, the MORA, ACCB, EAB, SUMACC, and
SUMEA control lines, as well as the ACC and EA clocks.

2.1.2.3 Minus Immediate Instructions

Two minus immediate instructions, which subtract the immediate
value from the accumulator or from the extended accumulator, were evaluated
for Technique I. The minus immediate instructions subtract the immediate
portion of the instruction word from the selected register and then store
the result in the selected register.

Two methods for implementing these two instructions were considered.
The first approach investigated the possibility of utilizing a load immediate
into the register not being selected and then negating this value. This
approach destroyed the contents of the unused register and thus was
discarded as unacceptable.

The second approach, which was selected for the design, subtracts
the immediate value directly from the selected register.

The organization of the two minus immediate instructions is similar
to the MINUS instruction, requiring two phases (φ1, φ2) and utilizing phase
one as the instruction fetch phase. Phase two, which requires 2 clock cycles,
gates the complement of the immediate operand to the least significant six
bits of one set of adder inputs and ones into the remaining 12 inputs.
Simultaneously, the selected register, along with the C0 carry, is gated into
the other set of adder inputs. The result of this 2's complement addition
is then clocked into the selected register. A total of four clock cycles is
required to complete the minus immediate instruction, as compared with five
clock cycles for the original MINUS instruction.
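
The cycle counts quoted in Sections 2.1.2.1 through 2.1.2.3 can be collected
into a small sketch. The two cycle instruction fetch phase is not stated
directly; it is inferred here from the three cycle load immediate with its one
cycle second phase. The three cycle second phase for MINUS is likewise inferred
from its five cycle total. All names are hypothetical.

```python
FETCH_CYCLES = 2  # inferred: 3 cycle load immediate minus its 1 cycle phase two

# Clock cycles spent in phase two, as quoted (or inferred) from the text.
PHASE_TWO_CYCLES = {
    "LOAD IMMEDIATE": 1,
    "LET": 3,
    "PLUS IMMEDIATE": 2,
    "PLUS": 3,
    "MINUS IMMEDIATE": 2,
    "MINUS": 3,  # inferred from the five cycle total
}


def total_cycles(instruction: str) -> int:
    """Total clock cycles: instruction fetch phase plus execution phase."""
    return FETCH_CYCLES + PHASE_TWO_CYCLES[instruction]


def cycles_saved(original: str, immediate: str) -> int:
    """Clock cycles saved by using the immediate form of an instruction."""
    return total_cycles(original) - total_cycles(immediate)


for original, immediate in [("LET", "LOAD IMMEDIATE"),
                            ("PLUS", "PLUS IMMEDIATE"),
                            ("MINUS", "MINUS IMMEDIATE")]:
    print(original, "->", immediate, ":", cycles_saved(original, immediate))
```

Only the load immediate form saves two cycles; the arithmetic forms save one,
because the adder still needs its carry propagation time.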

Most of the additional logic for the minus immediate instructions was
added in the control, with only the adder inputs being changed in the
arithmetic-register area. The 12 most significant gates of one set of adder
inputs were changed from one input gates to two input gates for use in
"gating on" those 12 most significant inputs of the adder. The "gating on"
feature sets ones into these inputs so that a 2's complement addition can be
performed. In the control, decoding and phasing gates were needed to implement
the two phase operation associated with these instructions. Additional gating
was required for the MORA, ACCB, C0, EAB, EA clock and ACC clock signals.
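
The 2's complement addition used by the minus immediate instructions can be
checked with a short sketch. It assumes an 18 bit adder and a six bit
immediate, as described above; the function and constant names are
hypothetical.

```python
MASK18 = (1 << 18) - 1             # 18 bit adder width
IMM_MASK = 0x3F                    # six least significant bits
UPPER_ONES = MASK18 & ~IMM_MASK    # the 12 most significant inputs "gated on"


def minus_immediate(register: int, word: int) -> int:
    """Subtract the six bit immediate by adding its 2's complement."""
    immediate = word & IMM_MASK
    # Complemented immediate in the low six bits, ones forced into the rest.
    a_input = UPPER_ONES | (~immediate & IMM_MASK)
    carry_in = 1                   # the C0 carry completes the 2's complement
    return (a_input + register + carry_in) & MASK18


print(minus_immediate(10, 3))
print(minus_immediate(0, 1) == MASK18)  # an 18 bit minus one is all ones
```

Forcing ones into the upper 12 inputs sign-extends the complemented six bit
value, which is why the upper gates must be "gated on" rather than inhibited
as they are for the load and plus immediate instructions.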

2.2 Addressing Technique II

Addressing Technique II did not add instructions as such to the OBP,
but added two addressing capabilities (immediate and indirect addressing)
which affected a number of instructions. Only modifications to the CPU
were necessary to implement these changes and they are discussed in the
following paragraphs.

2.2.1 Design Modifications

During the initial evaluation of this addressing approach, the removal
of the index register was considered to facilitate implementation of this
technique. After the initial investigation of the OBP program was completed,
it was decided that a definite need for indexing existed, and the index
register was not removed.

With these two addressing modes added to the CPU, a total of four
addressing modes was available in the CPU and some type of detection circuitry
was necessary. To properly detect these modes, it was decided that bits 12
and 13 of the instruction word would be used as a 2 bit field for address
mode selection. The addressing modes and their associated codes are listed
in Table 2-2.

Immediate addressing capability was added to applicable instructions
as referenced in Table 2-1. This addressing feature permitted the use of the
least significant 11 bits of the instruction word as an immediate operand.
Since the basic flow organization of the CPU was such that operands were
gated through the adder and the adder also controlled register to register
flow, it was advantageous to use the existing flow organization for the
immediate addressing scheme.

Because bit 12 of the instruction word was used for address mode
detection, and an 11 bit immediate operand would satisfy the majority of
computational requirements, an 11 bit immediate operand was selected for
implementation in the CPU. The use of only an 11 bit operand required that the
decoding of the immediate mode control the 7 most significant inputs
of the adder, which were utilized when the full 18 bit operand was selected.

TABLE 2-1
Control Modifications for Instructions Utilizing Immediate Addressing

Seven MSB of Adder    Seven MSB of Adder    Six LSB of MOR
Gated Off             Gated On
------------------    ------------------    -----------------
PLUS                  IF SUB                SET SCALE
TRANSFORMED BY        IS EQUAL              SHIFTED BY
LET                   IS GREATER            CYCLED BY
ANDED WITH            IS LESS               DOUBLE SHIFTED BY
TIMES                 MINUS                 DOUBLE CYCLED BY
THEN GO TO
ORED WITH
SET EXTENSION WITH
USE SUB
GO TO
DIVIDED BY
EORED WITH
SUB PLUS
EXECUTE
HALT
IO
RESUME

TABLE 2-2
Addressing Mode Selection

Address Mode Field
Bit 13    Bit 12      Addressing Mode
------    ------      --------------------
  0         0         DIRECT ADDRESSING
  0         1         IMMEDIATE ADDRESSING
  1         0         INDEXING
  1         1         INDIRECT ADDRESSING
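
The mode selection in Table 2-2 can be expressed as a small decoder. The
sketch below is illustrative only: it assumes bit 1 is the least significant
bit of the instruction word, so that bits 12 and 13 sit at shift positions 11
and 12, and the class and function names are hypothetical.

```python
from enum import Enum


class AddressMode(Enum):
    """Codes from Table 2-2, written as (bit 13, bit 12)."""
    DIRECT = 0b00
    IMMEDIATE = 0b01
    INDEXING = 0b10
    INDIRECT = 0b11


def address_mode(word: int) -> AddressMode:
    """Decode bits 13 and 12 of the instruction word (bit 1 assumed LSB)."""
    return AddressMode((word >> 11) & 0b11)


def immediate_operand(word: int) -> int:
    """The least significant 11 bits serve as the immediate operand."""
    return word & 0x7FF


word = (0b01 << 11) | 0x123    # immediate mode, operand 0x123
print(address_mode(word).name, hex(immediate_operand(word)))
```

In this layout the same low 11 bits are read as an address field in the other
three modes.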

Two sets of adder inputs had to be controlled so that both "gating on" and
"gating off" provisions were present. In order to accomplish this, the "A"
inputs to the adder were changed as follows. First, the seven most significant
two input gates were changed to three input gates for "gating off" control.
Second, the seven most significant one input gates were changed to two input
gates for "gating on" control. Also associated with the immediate addressing
was the inhibit control on the memory request lines, as well as associated
clock and input control modifications. In order to properly utilize the
immediate approach in shift, cycle and scale setting instructions, control
circuitry was added to control the setting of the scale register and operation
counter with the least significant six bits of the instruction word. Table 2-1
lists the instructions and their associated control.

In general, the execution time of instructions using the adder was
decreased by only one clock cycle with immediate addressing. In the
original design, the fetch and add phases were overlapped so they could be
executed in three clock cycles. To decrease this time further, either a faster
adder must be utilized or the memory cycle time for the OBP must be decreased.

Indirect addressing was implemented for the instructions referenced
in Table 2-1. Indirect addressing uses the normal operand fetch phase as
an address fetch phase, which fetches the address used to fetch the operand.
This additional phase adds two clock cycles and an extra memory cycle to
each instruction utilizing this mode of addressing.

The original system was designed to access 65K of memory through
a 12 bit address field and a 4 bit page which was appended to the address field.
The Page Register value was loaded into the four most significant bits of the
address register during the instruction fetch phase. Thus, the first
4096 core locations could be accessed without setting the page register.
When higher core locations were accessed, the Page Register had to be set
with a Set Page instruction prior to executing the given instruction.

When bit 12 of the instruction word was selected as a field bit for
address mode selection, only an 11 bit address field was available to address
memory. An 11 bit address field only allows access to the first
2048 locations of core; therefore some means of controlling the 12th bit of
the address field was required. Since the present CPU organization utilized
the Page Register for address control, the utilization of this organization
would enable complete address modification with minimal impact on hardware.
Therefore, an additional bit was appended to the Page Register.
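
The effect of moving one bit from the address field to the Page Register can
be sketched as below. The original layout (page value in the four most
significant bits of a 16 bit address) is stated in the text; placing the five
bit page in the five most significant bits is the natural extension but is an
assumption here, as are the function names.

```python
def original_address(page: int, address_field: int) -> int:
    """Original system: 4 bit page over a 12 bit address field."""
    return ((page & 0xF) << 12) | (address_field & 0xFFF)


def technique_2_address(page: int, address_field: int) -> int:
    """Technique II: 5 bit page over an 11 bit address field (assumed layout)."""
    return ((page & 0x1F) << 11) | (address_field & 0x7FF)


# With the page register left at zero, the reachable range is halved.
print(original_address(0, 0xFFF))       # last location reachable originally
print(technique_2_address(0, 0x7FF))    # last location reachable in Technique II
print(technique_2_address(1, 0))        # first location needing a page change
```

Both layouts still span the full 16 bit address range; only the size of the
region reachable without a Set Page instruction changes.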

With the introduction of indirect addressing, an additional control
phase similar in structure to the index phase was added to the CPU. The
decoded output of the address mode selection field selected this phase to
fetch an address, which was clocked into the address register and used as
the normal operand address. The associated control circuitry was modified
when this phase was added to the CPU, and additional control was added to
request memory during the indirect address fetch. Since the instruction word
bits 12 and 13 were decoded for the address mode selection, it was also
necessary to ensure that the gate delays through the decoder to the phase
selection circuitry were not critical to the system operation. Therefore, the
number of gates connected in a serial fashion had to be limited. This
necessitated the use of a parallel gating structure and resulted in a greater
number of gates.

2.3 Summary of Task 1

Task 1 was organized so that two addressing techniques were evaluated
for the OBP system. A summary of these techniques follows:

2.3.1 Technique I

The addition of immediate addressing instructions was evaluated in
Technique I of this report. Six instructions were added to the instruction
set, providing the capability to use immediate operands in OBP programs.
The following six instructions were added:
1. Load A Immediate
2. Plus A Immediate
3. Minus A Immediate
4. Load EA Immediate
5. Plus EA Immediate
6. Minus EA Immediate
The addition of these instructions required the change of the minor op-code
field from bits 1-5 to bits 7-12 of the instruction word, since bits 1-6
are used as the immediate operand field. It was possible to use bits 7-12
as the immediate operand, but that approach had a significant impact on
hardware, besides the confusion associated with using these bits as an
immediate value.

The purpose of adding immediate instructions to the CPU is to
execute instructions utilizing partial word operands with a decrease in
execution time. The savings in program time, however, must be justified by a
limited increase in hardware. While the total impact on hardware associated
with these instructions was minimal, the savings in program time was very
limited. The immediate add and immediate 2's complement add (subtract)
instructions are only one clock cycle faster than the original PLUS and MINUS
instructions, and this minimal saving in execution time does not justify the
implementation of this technique. In addition, the OBP programmers expressed
their feeling that more addressing capability than that gained with this
technique was desired for increased programming flexibility. Thus, the
technique was rejected.

2.3.2 Technique II

The addition of immediate addressing and indirect addressing
capabilities for the OBP system was evaluated in Technique II of this report.
The implementation of this technique dictated that some type of addressing
mode selection scheme be implemented in the CPU to distinguish between the
four possible addressing modes. As a result, bits 12 and 13 of the instruction
word were selected as the field for use in address mode selection. These
bits were decoded and are referenced in Table 2-2.

Since the immediate addressing technique utilized the least significant
11 bits of the instruction word as an immediate operand, the need for
an operand fetch was eliminated. Because the immediate operand did not use
all 18 bits, a method of controlling the remaining seven bits was required.
Consequently, the adder gating was changed. Since the operand fetch and add
phases were overlapped in the original CPU instruction, a gain of only one
clock cycle was realized for any immediate instruction fully utilizing the
adder. While the additional circuitry associated with the immediate addressing
approach was minimal, it was not justified by the limited savings in execution
time.

When indirect addressing was incorporated into the CPU, the addition
of a bit to the Page Register, the control of the memory request line, and
the new phasing added had a large impact on the control circuitry of the CPU.
In addition, indirect addressing adds two clock cycles to the execution time
of applicable instructions. The additional execution time and associated
increase in hardware were the main factors in rejecting this approach.

It was desirable to reduce execution time in the approaches discussed
above so that program efficiency would be improved. Two means of reducing
instruction execution time were considered: either design a faster adder,
or reduce the basic memory cycle time. Since these two changes would have a
considerable impact on hardware, they were rejected.

After evaluating both techniques, it was decided that the goals
which originally motivated these changes (increased program execution speed
and flexibility with minimal hardware impact) were not achieved.

3.0 TASK 2

The study associated with Task 1 dealt with the investigation of
two approaches for improving the addressing capability of the OBP system.
These two techniques were evaluated and then discussed with NASA personnel
and programmers associated with the OBP. The results of these discussions
led to the decision to eliminate both addressing approaches since neither
proved satisfactory for the OBP system. Instead, two alternative approaches
for improving the addressing capability of the OBP were suggested.
First, multiple index registers could be added to the CPU. Second,
the one index register could be utilized along with indirect store and indirect
load instructions. The second approach was chosen primarily because of the
increased hardware associated with the first approach.

In addition to the two indirect instructions, it was requested
that CPU registers be available for use as loop counters for controlling both
program loop execution and branching. To satisfy this request, instructions
to test and increment registers were added to further increase the addressing
and operational features of the OBP system.

Accompanying the test instructions was the request for the ability
to manipulate and transfer data in the CPU registers. In particular, it was
felt useful to compute an index value in the accumulator and transfer this
value to the subscript register (SS), while not destroying the present value
of the SS. This operation is performed in the present OBP by executing
the following four instructions: PLUS, SAVE SUBSCRIPTS IN, YIELD, and
USE SUBSCRIPT. These four instructions could be replaced by PLUS and
EXCHANGE instructions. In order to satisfy these requests, three instructions
were added to the instruction set to exchange the accumulator, the extended
accumulator, and the subscript register.

Although the new instructions were proposed with the improvement of
system performance in mind, the decision to implement them was subject to
consideration of the impact of integrating these instructions into the design.
It was particularly desirable to delete any circuitry not being used in the
present system applications. The investigation of the hardware impact of these
instructions was carried out with this thought in mind, and it was found that
certain instructions and functions were not utilized by the programmers in the
present programs. As the evaluation of implementing Task 3 was concluded,
the instructions were added with the least possible amount of additional
hardware, and all circuitry having limited application in present programs
was removed. The following changes were incorporated.

3.0.1 Additional Instructions
1. LOAD INDIRECT
2. STORE INDIRECT
3. IF EA ≠ 0, SET D and INCREMENT EA
4. IF SS ≠ 0, SET D and INCREMENT SS
5. EXCHANGE A & SS
6. EXCHANGE A & EA
7. EXCHANGE EA & SS

NOTE:
(1) EA - Extended Accumulator
(2) SS - Subscript Register
(3) A - Accumulator
(4) D - Decision Flip Flop
(5) MOR - Memory Operand Register

3.0.2 Functions Deleted
1. SCALE REGISTER
2. OR-AND FLIP FLOP

3.0.3 Instructions Removed
1. LET SCALE
2. SET SCALE
3. OR
4. AND

3.0.4 Instructions Modified
1. NORMALIZE - The normalized count is now stored in the subscript
   register.
2. MULTIPLY - Scaling is deleted from multiply.
3. DIVIDE - Scaling, correction cycles, and overflow tests
   are deleted from divide.

The chart in Table 3-1 illustrates the impact of these changes.

3.1 Addition of Instructions

Based on the results of Task 1, seven instructions (specified in
Paragraph 3.0.1 of this report) were added to the OBP instruction set. The
intent of these instructions was to improve the programmer's ability to write
programs directly associated with the OAO satellite applications.

The organization of the CPU is such that data flow paths interconnect
the register circuitry via the adder. These flow paths were used to implement
the new instructions and thus only the circuitry associated with the CPU
control logic was increased.

3.1.1 Instruction Description and Discussion

Seven new instructions have been selected for integration into the
CPU design. A discussion of these instructions is given in the following
paragraphs.

3.1.1.1 Indirect Instructions

Two indirect instructions, INDIRECT LET and INDIRECT YIELD, were
selected to be incorporated into the CPU design. The selection of these two
instructions was based on the desire for additional capability to easily access
buffer memory areas not readily accessible with one index register. This
increased capability could be achieved with multiple index registers. However,
indirect addressing provided similar capabilities with less hardware.

The indirect load instruction loads data into the accumulator in the
same manner as the LET instruction, except that the indirect load requires
an additional phase which is two clock cycles long. This phase fetches an
address which is set into the address register, which in turn is used as
the normal operand fetch address. Since all memory access functions
require two clock cycles, the indirect load is two cycles longer than a
LET instruction. Three phases (φ1, φ2, φ3) are required to implement this
instruction. Phase one is the normal instruction fetch phase, phase two
fetches the address for the operand fetch, and phase three fetches the
operand and loads it into the accumulator.

The indirect store instruction stores data in memory in the same
manner as a YIELD instruction except for the data storage phase. The indirect
store instruction requires an additional phase used to fetch an address that
is set into the address register and used as an address to store data. The
indirect store requires three phases (φ1, φ2, φ3) for complete execution
and is one clock cycle longer than a YIELD. Phase one is the normal
instruction fetch phase, phase two fetches the address used for data storage,
and phase three stores the data at that address.
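
The second and third phases of the two indirect instructions can be modelled
in a few lines. This is a behavioural sketch only: memory is represented as a
Python list, indexing is omitted, and the function names are hypothetical.

```python
def indirect_load(memory: list, pointer_address: int) -> int:
    """Model of phases two and three of the indirect load."""
    operand_address = memory[pointer_address]   # phase two: fetch the address
    return memory[operand_address]              # phase three: fetch the operand


def indirect_store(memory: list, pointer_address: int, value: int) -> None:
    """Model of phases two and three of the indirect store."""
    target_address = memory[pointer_address]    # phase two: fetch the address
    memory[target_address] = value              # phase three: store the data


memory = [0] * 16
memory[3] = 9        # location 3 holds a pointer to location 9
memory[9] = 1234     # location 9 holds the operand

accumulator = indirect_load(memory, 3)
indirect_store(memory, 3, 77)   # overwrites the word the pointer refers to
print(accumulator, memory[9])
```

Changing the pointer word at location 3 redirects both instructions to a
different buffer without touching the single index register, which is the
capability the programmers requested.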

Both instructions were decoded as major op-code instructions with
indexing available for both. The associated phasing, decoding, and control
modifications were minimal due to the use of existing phasing and flow
organizations. Control was also added to the memory request logic for the
extra memory cycles. Eighteen (18) gates were required to implement these
instructions.

3.1.1.2 Test for Zero Instructions

Two instructions (TEST SS and TEST EA) were added to the instruction
set to test register contents for zero. These instructions required no
operand fetch and thus were implemented as minor op-code instructions. They
permit the SS and EA registers to be used as pointers or counters for calling
up sequential arrays of data, and for performing loop and branch operations.
The instructions were implemented in two phases (φ1, φ2) and are completed
in five clock cycles. The instructions operate as follows.
Phase one executes the normal instruction fetch function, while
phase two performs the test for zero on the selected register. If the
register under test is zero, no action is initiated in phase two and the
instruction is completed, but if the register tested in phase two is not
zero, the "D" flip flop is set and the register under test is incremented
by one.
The implementation of these instructions required little
additional hardware since existing zero detection circuitry was used for
performing the tests. Each register under test is gated into the adder and the
"SUM=0" output is tested for a "1". This output is then used to control the
setting of the "D" flip flop and the incrementing of the register. No new
circuitry was introduced into the register-arithmetic section of the CPU
and only a small amount was added in the control area. Full use of existing
flow paths was employed to hold the additional logic to a minimum. Twelve (12)
gates were required to implement these two instructions.
3.1.1.3 Exchange Instructions
Three of the seven new instructions are "register exchange" instructions.
These instructions consist of exchanging the accumulator and the
extended accumulator, exchanging the accumulator and the subscript register,
and exchanging the extended accumulator and the subscript register.
Addition of these instructions allows the programmer to exchange
registers without using several memory access instructions. One particular
application of the exchange accumulator and subscript register is in the
computation and use of an index value. First the index value is computed in the
accumulator, then the accumulator and subscript registers are exchanged
so that the computed index value can be used immediately without destroying
the existing index value.
The exchange instructions consist of two phases (φ1, φ2), and are
five clock cycles in duration. No operand fetch is required with these
instructions and they are decoded as minor op-code instructions. Phase one
is the normal instruction fetch phase and phase two performs the exchange.
When executing an exchange with the A, the following sequence is executed
in phase two. The MOR is cleared and the A is gated through the adder
and stored in the MOR. Then the EA or SS (depending on the instruction being
executed) is gated through the adder and is clocked into the A. The MOR
is then gated through the adder to the EA or SS and the register is clocked.
If the instruction exchanges the EA and SS, the following sequence
is executed. During phase two, the MOR is cleared and the EA is gated through
the adder and stored in the MOR. Then the SS is gated through the adder and
clocked into the EA. Finally, the MOR is gated into the adder and clocked
into the SS to complete the instruction.
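The three-step transfer used by the exchange instructions can be written out as a short Python sketch. The register file is modeled as a dictionary and the function name exchange is hypothetical; the sketch shows only the order of the transfers, not the adder gating itself.

```python
# Illustrative only: registers are dictionary entries and each assignment
# stands for one "gate through the adder and clock" step in phase two.

def exchange(registers, first, second):
    """Exchange two registers using the MOR as temporary storage."""
    regs = dict(registers)
    regs["MOR"] = 0                   # the MOR is cleared
    regs["MOR"] = regs[first]         # first register -> adder -> MOR
    regs[first] = regs[second]        # second register -> adder -> first register
    regs[second] = regs["MOR"]        # MOR -> adder -> second register
    return regs


start = {"A": 1, "EA": 2, "SS": 3, "MOR": 99}
after_a_ss = exchange(start, "A", "SS")      # exchange A and SS
after_ea_ss = exchange(start, "EA", "SS")    # exchange EA and SS
```

Because the temporary copy is held in an existing register, the sketch needs no storage beyond the registers already named in the text, which is consistent with the statement that only control circuitry was added.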
These instructions make use of the existing register organization,
with register to register flow accomplished via the adder, and therefore only
control circuitry modifications are required. Twenty-one (21) gates are
required to implement these instructions.
3.2 Deletions - Circuitry and Instructions
In addition to the seven instructions added in Task 2, the following
circuitry and instructions were deleted.
The following deletions were made:
1. Removal of the OA flip flop.
2. Removal of the scale register.
The following instructions were modified:
1. Normalize
2. Multiply
3. Divide
The following instructions were removed:
1. LET SCALE
2. SET SCALE
3. OR
4. AND
The decision to remove registers and instructions was a combined hardware-software
decision. When the two indirect instructions were added, two major op-codes
were required and only one major op-code decoding was available for these
instructions. The two alternatives which existed were either to add additional
circuitry for the decoding, or to delete existing major op-code instructions.
To properly evaluate the alternatives, the hardware and programming aspects
of these changes were investigated. An evaluation of the first alternative,
the addition of decoding circuitry, verified there would be a considerable impact
on the hardware and programming manual documentation and this alternative was
rejected. The second alternative was to investigate the possibility of removing
major op-code instructions. An evaluation of this alternative showed that
removal of the seldom used scaling function would free a major op-code.
The following section is a discussion of the removal of this function.
3.2.1 Scale Register and Associated Instructions
In addition to its limited use, the scale register and its associated
circuitry were non symmetrical relative to other register and control logic.
This made it difficult to partition the scale register logic and since a
total of 113 gates and flip flops were associated with the scale register,
its removal was attractive from the standpoint of hardware simplification.
The deletion of the scale register eliminated the LET SCALE
instruction (major op-code) and the SET SCALE instruction (minor op-code).
In addition, three other instructions, NORMALIZE, TIMES, and DIVIDED BY, were
affected by the removal of the scale register. In all, a total of 113 gates and
flip flops were removed by the elimination of the scale register.
First, the NORMALIZE instruction, which previously stored the
normalized count in the scale register, had to be changed. Since the
normalized count must be saved, two possible solutions were considered. First,
the count could be stored in a fixed memory location. Second, the count could
be stored in a CPU register. The register store approach was considered more
practical since extra logic and execution time is required for the memory
store-cycle of the first approach. Storage of the normalize count in a
register utilizes the basic flow paths of CPU, and requires negligible change to
the logic. An additional six gates are required on the adder. However, these
replace the six originally used on the scale register. The subscript register
was selected as the register to be used for storing the normalized count
since the accumulator and extended accumulator are normalized during the
NORMALIZE instruction. Only a small amount of control logic is required
to implement this transfer.
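The idea of saving the normalize count in the subscript register can be sketched in Python. This is a deliberate simplification and not the OBP algorithm: it normalizes a single 18 bit two's complement word rather than the combined accumulator and extended accumulator, and it assumes the conventional definition of a normalized word (sign bit differs from the bit below it). The names normalize, BITS, and registers are illustrative.

```python
# Simplified, hypothetical model: one 18 bit word instead of the A/EA pair.
BITS = 18                     # assumed word length
MASK = (1 << BITS) - 1


def normalize(word):
    """Shift left until the sign bit and the bit below it differ."""
    count = 0
    while count < BITS - 1:
        sign = (word >> (BITS - 1)) & 1
        below = (word >> (BITS - 2)) & 1
        if sign != below:
            break                         # word is normalized
        word = (word << 1) & MASK         # one left shift
        count += 1
    return word, count


registers = {"A": 0b101, "SS": 0}
# The shift count lands in the subscript register instead of a scale register.
registers["A"], registers["SS"] = normalize(registers["A"])
```

In this model the count is available in SS immediately after the instruction, with no memory store cycle required.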
A second instruction affected by the removal of the scale register
was multiply (TIMES). TIMES originally had a scaling phase that shifted the
product, based on the value in the scale register. The scaling phase and
its associated hardware were eliminated with the removal of the scale register.
The third instruction affected by the scale register was DIVIDED BY.
The divide instruction had a scaling phase that shifted the dividend prior
to performing the divide itself. Again, the direction and amount of the
shift were determined from the value in the scale register. The scaling phase
and its associated hardware were removed with the elimination of the scale
register.
3.2.2 Modification of the Divide Instruction
In the original design of the CPU, the decision to implement a
total hardware divide was based on the evaluation of the usage of a divide
instruction. The initial investigations indicated that the divide would be used
frequently and would prove most efficient, both in power saved and execution
time saved, if it was totally performed with hardware. Further investigations
into the application of the divide instruction in the system programs
indicated that the divide instruction was used to a very limited extent. This
knowledge, together with the fact that divide used a considerable amount
of control logic, prompted the decision to eliminate the majority of the
circuitry associated with divide.
The original hardware divide made both the divisor and dividend
positive prior to performing the divide algorithm so that no correction to the
quotient would be required. After the divide was completed, the sign of the
remainder was tested to see if it was negative. If it was negative, the divisor
was added to the remainder for the correction cycle. After the correction
cycle was completed, the quotient was placed in the accumulator and the
remainder in the extended accumulator. Also included in the hardware divide were
the divide overflow tests.
It was decided that all special phasing associated with the divide
algorithm would be removed, and the divide instruction was reduced to performing
the standard add/subtract shift cycles of the non-restoring division algorithm.
This modification eliminated 81 gates and flip flops. As a result,
prior to the divide, the programmer must ensure that the divisor is larger
in magnitude than the dividend to prevent overflow from occurring. In addition,
if an exact quotient (the quotient may be off by 1 in the least significant bit)
or remainder is desired, the appropriate correction cycles must be performed
as indicated in Table 3-2.
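For readers unfamiliar with the algorithm, the add/subtract shift cycle and a final correction cycle can be illustrated with the textbook non-restoring division of non-negative integers. The Python sketch below is not the OBP implementation: it ignores the signed arithmetic and the operand scaling required by the hardware, and it does not reproduce the corrections of Table 3-2. The function name and the 6 bit word length of the example are assumptions.

```python
# Textbook non-restoring division for non-negative integers (illustrative).

def nonrestoring_divide(dividend, divisor, bits):
    """Return (quotient, remainder) using add/subtract shift cycles."""
    mask = (1 << bits) - 1
    remainder = 0                  # partial remainder; may go negative
    quotient = dividend & mask     # dividend is shifted out, quotient shifted in
    for _ in range(bits):
        msb = (quotient >> (bits - 1)) & 1
        remainder = 2 * remainder + msb        # shift the pair left one bit
        quotient = (quotient << 1) & mask
        if remainder >= 0:
            remainder -= divisor               # subtract cycle
        else:
            remainder += divisor               # add cycle (no restore step)
        if remainder >= 0:
            quotient |= 1                      # quotient bit is 1
    if remainder < 0:
        remainder += divisor                   # correction cycle
    return quotient, remainder


example = nonrestoring_divide(13, 3, bits=6)
```

The last step corresponds to the kind of correction that the modified instruction leaves to the programmer: without it, a negative partial remainder would be returned as is.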
3.2.3 Elimination of the OR-AND Flip Flop
The OR-AND (OA) flip flop was initially designed into the CPU
as a means for controlling the setting and resetting of the "D" flip flop.
The setting of the OA indicated an "and conditional test" to reset the "D"
flip flop, while the resetting of the OA indicated an "or conditional test"
to set the "D" flip flop. An examination of the system programs indicated
that only the "OR" state of the OA flip flop was being used. Since only one
state of the flip flop was being used, the decision was made to eliminate it.
Directly associated with the elimination of the OA flip flop was
the deletion of two minor op-code instructions, the OR and AND instructions.
Removal of these two instructions, the OA flip flop, and the setting of the "D"
flip flop resulted in a decrease in control circuitry.
3.3 Summary of Task 2
Task 2 was organized to make changes in the CPU design that would give
the programmer greater flexibility in addressing and other programming operations.
The total impact on hardware was the prime factor in deciding which
approach to take.
A total of seven new instructions were added to the OBP instruction
set. Table 3-1 illustrates that a total of 51 gates were required to fully
implement these seven instructions. The additional flexibility and programming
speed gained from these instructions was large, relative to the percentage
increase in hardware to implement the seven instructions. The hardware was
held to a minimum by utilizing the existing CPU flow paths to implement
these instructions and the circuitry added was entirely in the control area.
Four instructions, the scale register and the OA flip flop were
removed. TIMES, DIVIDED BY, and the NORMALIZE instructions were affected by
the removal of the scale register. In addition, DIVIDED BY had overflow tests
and setup and correction cycles removed from the hardware.
The total change in the circuitry resulted in a reduction of circuits
used in the CPU. While the seven instructions added circuitry to the control
area, the deletions removed much more control circuitry. In comparing the
additions with the deletions, the additions tend to follow the basic symmetry
of the CPU design, while the deletions tended to be unsymmetrical. A total
decrease of 160 gates was realized with the improved instruction set. Table 3-1
indicates where the 160 gates were removed from the logic.
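The gate counts quoted in this section can be cross-checked with a few lines of Python. The sketch assumes that the 160 gate figure is a net decrease (removals minus additions); the residual it computes is an inference from the quoted numbers and is not a figure stated in the text. The dictionary keys are descriptive labels chosen for illustration.

```python
# Gate counts quoted in Section 3; the labels are illustrative.
added = {
    "indirect load/store": 18,
    "test for zero": 12,
    "register exchange": 21,
}
removed_itemized = {
    "scale register": 113,
    "divide simplification": 81,
}
net_decrease = 160            # assumed to be removals minus additions

total_added = sum(added.values())
total_removed = net_decrease + total_added
# Removals not itemized in the text (inferred, not a quoted figure).
residual_removed = total_removed - sum(removed_itemized.values())
```

The additions sum to the 51 gates quoted above; under the stated assumption the itemized removals leave a small residual that would have to come from the remaining deletions.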
The elimination of unique-type control logic from the CPU was most
desirable from a packaging standpoint. The basic register-arithmetic portion
of the CPU is readily partitioned, while the control logic does not appear
to be readily partitionable. To enhance packaging efficiency of the control
logic, the non-similar logic is to be minimized. The removal of the scale
register and OA flip flop and simplification of the divide instruction eliminated
a portion of that logic which is difficult to partition and thus should simplify
the packaging of the control logic.
4.0 TASK 3
This section discusses the results of a study of methods for redesigning
the OBP control logic in such a way as to introduce a greater degree of
symmetry or regularity into its structure. It is hoped that such a redesign will
facilitate partitioning the logic into a relatively small number of general
purpose logic structures and thereby make it feasible to mount it on hybrid
substrates.
In the breadboard, almost all inputs and outputs from individual gates
require external connections by way of pins on the printed circuit boards.
Since the number of pins is restricted, the density of gates is severely
limited. It is hoped that by partitioning the logic into more complex structures
with many of the interconnections made on the substrate itself, the gate
density may be increased and the size of the processor reduced. To make
this packaging method economically feasible, however, it is necessary that
the number of distinct types of substrates be minimized. Thus complex logic
structures which occur repeatedly throughout the control circuitry must
be identified.
4.1 Problems Encountered in Partitioning Control Logic
Unlike computer register logic in which essentially identical logic
structures are associated with each bit, control logic is in general a
formless irregular conglomeration of gates and flip flops. This is a
result of the fact that this logic is used to generate all the unrelated timing
and data dependent conditions which control the various data transfers
and transformations within the computer. Each block of logic performs a
function which is different from, and independent of, the functions of
neighboring blocks.
A further complication results from the tendency of the logic
structures which generate individual control signals to be relatively small
groups of gates with large numbers of inputs and outputs. These external
connections are difficult to reduce because they come from and go to a wide
variety of places in the processor. For example, the signal which sets the
END flip flop has over 20 inputs from such sources as the phase flip flops,
various tests of the contents of the operation counter, selected bits of the
memory operand register, instruction decoder, etc. It is impossible to package
all these control signal sources on the same substrate with the END flip flop.
As long as conventional gates and flip flops are used as the basic control
building blocks, it appears that pin limitations are still the factor
which controls the degree to which packaging density may be increased.
4.2 Evaluation of Present Control Logic
Before a redesign of the OBP control logic was attempted, the
feasibility of partitioning the present design was evaluated. The logic was
partitioned into as few types of fairly complex structures as possible. This
not only provided an estimate of the magnitude of the problem but also
served as a reference by which other designs could be judged. During a
previous study, the register logic was partitioned into 18 sixty pin substrates
of three different types. Since there are roughly the same number
of gates in the control logic as in the register logic, this number provides
an order of magnitude goal for the control logic partitioning.
As expected, the frequency of occurrence of similar logic structures
sharply decreased as their complexity increased. In fact, no groups containing
more than three gates were found which occurred often enough (e.g., more than
eight times) to be considered general purpose structures. Examples of the
recurring structures which were found are shown in Figure 4-1. It can
be seen that the savings in external connections is small as long as the
occurrence of similar complex structures is so limited.
Using these small general purpose blocks and implementing the
rest of the control logic with discrete gates and flip flops, a partitioning
scheme was organized which required 60 substrates (assuming 60 pins per
substrate) of five different types.
4.3 Approaches to Control Logic Redesign
Two approaches to the redesign of the OBP control logic were
investigated. The first consisted of constructing several relatively complex
logic structures which could serve as general purpose logic blocks. These
blocks were not identical to any block actually appearing in the control,
but were similar to several slightly different ones. Each block was designed
to be substituted for a maximum number of similar structures with a minimum
wastage of pins and gates. By replacing most of the control logic with a
few groups of gates in this manner, a much greater degree of regularity was
introduced without changing the basic building blocks of the system.
Figure 4-1. Examples of Recurring Structures
The other approach which appeared to offer a significant reduction
in the complexity of the OBP control was the use of a microprogrammed LSI
memory to store a majority of the control signals for the CPU. By sequencing
through a block of locations whose outputs directly or indirectly generate
the OBP control signals, many of the functions presently performed by discrete
gates can be performed instead by a much more compact LSI memory.
In the following paragraphs, these two design approaches will be
discussed in more detail.
4.3.1 General Purpose Logic Structure Approach
The functions performed by the control logic are determined by
the various algorithms used to implement the instruction set. Given these
constraints, however, the actual configuration of gates and flip flops necessary
to perform the operations in the proper order is not fixed. In the original
OBP control, the logic was designed with economy of gates in mind. However, symmetry
is a more important characteristic for partitioning. Therefore, extra pins and
gates can be included if by doing so it becomes possible to design one circuit
which can replace each of several slightly different ones. The pins wasted by
this practice will hopefully be more than compensated for by the pins saved
through the use of a more complex interconnection pattern on the substrates.
To design these structures, the OBP control circuits were categorized
by structural similarity. Then one general purpose circuit was designed to
implement the switching function represented by each of these similar logic
structures. In general this necessitated wasting inputs and even entire
gates when the number of inputs varied from structure to structure. Output
pins were wasted when some circuits required the true signal to be brought
out, others required the complement, and some required that both be available.
By providing the smallest number of external connections necessary to meet
the needs of all the structures in the group, this waste was minimized.
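The trade-off described above, one general purpose circuit sized for the widest variant at the cost of unused inputs on the narrower ones, can be made concrete with a small Python sketch. The input counts are invented for illustration and do not correspond to any structure in the OBP control; the function name is hypothetical.

```python
# Hypothetical example: four slightly different circuits are replaced by
# one general purpose circuit sized for the widest of them.

def general_purpose_cost(input_counts):
    """Return (inputs_on_general_circuit, total_wasted_inputs)."""
    widest = max(input_counts)                     # circuit must fit every variant
    wasted = sum(widest - n for n in input_counts)
    return widest, wasted


variants = [3, 4, 4, 5]        # inputs needed by each original circuit (invented)
widest, wasted = general_purpose_cost(variants)
types_before, types_after = len(set(variants)), 1
```

In this invented case four unused inputs are accepted in exchange for reducing three distinct circuit types to one, which is the kind of exchange the redesign relies on.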
Approximately 90 percent of the OBP control logic was implemented
with 13 general purpose logic structures. Examples of these and their characteristics
are shown in Figures 4-2 to 4-5 and Table 4-1. The 10 percent of
the control logic which is not represented by these circuits consists primarily
of miscellaneous single gates and expander inputs which can probably be mounted
on the same substrates as the larger structures, thereby obviating the need
for a special substrate type.
4.3.2 Partitioning for Substrates
Given the first order partitioning of organizing discrete gates
into general purpose logic structures, there remains the second order partitioning
problem of placing these structures on substrates. The logic structures were
assigned to the substrates systematically to minimize both the total number of
substrates and the wastage of gates. However, some wastage was unavoidable
where logic structures would not fit on substrates in exactly the quantities
required by the system.
In the partitioning of the control in its present form, it was
assumed that the substrates to be used would have 60 pins. This assumption was
made because all the hybrid substrates manufactured by Westinghouse thus far
have been of this type. However, since control logic partitioning is so
severely affected by pin limitations, it seemed reasonable to investigate the
consequences of using substrates with more pins. A somewhat arbitrary decision
was made to perform the partitioning using substrates having 60, 90, 120 and
150 pins. Increments of 30 pins were felt to be large enough to show significant
differences in the partitions. The upper limit of 150 pins was chosen as
the greatest number likely to be available on a substrate.

Figure 4-2. General Purpose Logic Structures
Figure 4-3. General Purpose Logic Structures
Figure 4-4. General Purpose Logic Structures

Logic Structure Type   No. of Gates   No. of Pins   Quantity Required
I                      22             53            3
II                     8              19            _0
III                    6              18            7
IV                     5              9             7
V                      14             28            8
VI                     7              11            10
VII                    6              16            9
VIII                   3              7             41
IX                     5              _             5
X                      _              12            16
XI                     10             19            8
XII                    6              12            8
XIII                   4              6             32
Table 4-1. Characteristics of General Purpose Logic Structures

The results of this partitioning are summarized in Table 4-2. It
can be seen that even if general purpose logic structures are available, the
gains are small for 60 pin substrates. A large improvement is achieved by
using 90 pin substrates. As the number of pins is increased to 120 and 150,
further reductions in both the number of substrates and the number of types
can be made. However, as the number of integrated circuit chips increases, the
amount of substrate area used for interconnections also increases. Consequently,
it may prove necessary to use larger substrates to physically realize the 120
and 150 pin configurations.
4.3.3 Summary of Control Logic Redesign
It appears that significant reductions in the total number of
substrates and the number of substrate types may be achieved in this manner but
only by relaxing the pin limitations to at least 90 pins per substrate. At
present only 60 pin substrates are available but no fundamental reasons are
known why larger numbers of pins and perhaps larger substrates (should overcrowding
prove to be a problem) cannot be made available. It is also
interesting to note (see Table 4-2) that the total number of external connections
tends to be reduced by using substrates with more pins. This implies that an
increase in reliability is possible through the use of general purpose logic
structures since the primary sources of failures are wired connections.

Pins/Substrate   No. of Types   No. of Substrates   Max. No. of Chips/Substrate   No. of Pins (total)
60               7              52                  13                            3120
90               5              27                  18                            _50
120              _              22                  22                            2_30
150              3              16                  29                            2_00
Table 4-2. Results of Logic Partitioning

The substrate count for the control logic only approached the 18
substrate goal set by the register logic partitioning as the number of pins
was increased to 120. The fact that such reductions cannot be achieved
with 60 pin substrates demonstrates the problems caused by the inherently
large numbers of external connections required for control logic.
4.4 Microprogrammed Control Logic
To design a microprogrammed control unit, each machine instruction
is subdivided into "microinstructions", each of which represents the control
operations needed during one period of the system clock. Each microinstruction
is represented by one word in the memory. When a microinstruction is accessed,
the bits of this word are used to provide the necessary control signals.
Blocks of microinstructions are addressed in sequence to execute complex
machine instructions. If data dependent conditions are required, the control
memory output is used as an input to an external logic circuit which
implements the necessary function.
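The organization described above, one control word per clock period with each bit driving one control signal, can be sketched in Python. Everything specific in the sketch is an assumption: the six signal names, the three-word microprogram (modeled loosely on an accumulator/extended accumulator exchange), and the start-address table are illustrative and are not taken from the OBP design.

```python
# Hypothetical control signals; bit i of a control word drives SIGNALS[i].
SIGNALS = ["gate_a_to_adder", "gate_ea_to_adder", "gate_mor_to_adder",
           "clock_a", "clock_ea", "clock_mor"]


def encode(*active):
    """Build a control word with the named signals set to 1."""
    word = 0
    for name in active:
        word |= 1 << SIGNALS.index(name)
    return word


def decode(word):
    """List the control signals asserted by a control word."""
    return [name for i, name in enumerate(SIGNALS) if (word >> i) & 1]


# Control memory: one word per clock period of the machine instruction.
CONTROL_MEMORY = [
    encode("gate_a_to_adder", "clock_mor"),
    encode("gate_ea_to_adder", "clock_a"),
    encode("gate_mor_to_adder", "clock_ea"),
]
MICROPROGRAMS = {"EXCHANGE_A_EA": (0, 3)}    # (start address, length)


def execute(instruction):
    """Step through the block of microinstructions for one instruction."""
    start, length = MICROPROGRAMS[instruction]
    return [decode(CONTROL_MEMORY[addr]) for addr in range(start, start + length)]


signals_per_clock = execute("EXCHANGE_A_EA")
```

Data dependent conditions are left out of this sketch; in the scheme described in the text they would be formed by logic external to the memory.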
In addition to data conditioning logic, circuits external to the
memory will be needed to control the order and timing of the microinstruction
addressing sequence. It has been estimated that the external logic will
require approximately 300 gates, or roughly 30 percent of the present control.
4.4.1 Partitioning the Memory
For the portion of control logic represented by the memory, partitioning
is simple and efficient. First, since the interconnections other
than address inputs and control signal outputs are all made on memory chips,
difficulties due to pin limitations are greatly lessened. Secondly, since
all the memory chips are connected in the same manner, only one type of
substrate need be designed for the entire memory. If read/write memories
are used, all memory substrates will be exactly identical. If read-only
memories are used, everything except the contents of the individual chips
will be identical for all substrates. In either case, the use of one substrate
type to implement 70 percent of the control logic represents a major
achievement.
A preliminary design of the control memory needed for the OBP
indicates that a memory of approximately 150 words by 120 bits should be
adequate. This size assumes that all words are of uniform length and is
therefore a maximum figure. Since all microinstructions do not make use of
all the control outputs, it may be possible to design a memory with individual
blocks of words containing only the control bits needed to perform particular
microinstructions. For the following results, however, a uniform word
length memory was assumed.
To permit an estimate to be made of the number of substrates required,
it was further assumed that a 256 word by 8 bit memory chip was used as the
basic building block. This is the size of one of the largest monolithic MOS
memory chips currently available and was considered a realistic candidate
for selection should this design approach be selected.
Using these assumptions it was determined that 15 such chips would
be required. It was found that these could be placed on four identical
60 pin substrates.
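The chip count follows directly from the memory and chip dimensions given above, as the short Python calculation below shows. The function name and the even distribution of chips over substrates are assumptions made for illustration.

```python
import math


def chips_required(words, bits, chip_words=256, chip_bits=8):
    """Number of chips needed to build a words x bits memory."""
    banks = math.ceil(words / chip_words)     # chips needed to cover the words
    slices = math.ceil(bits / chip_bits)      # chips needed to cover the word length
    return banks * slices


chips = chips_required(words=150, bits=120)   # 1 bank x 15 slices
substrates = 4
max_chips_per_substrate = math.ceil(chips / substrates)
```

Since 150 words fit within one 256 word chip, the count is set entirely by the 120 bit word length divided into 8 bit slices.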
4.4.2 External Logic Partitioning
Although a detailed partitioning scheme was not worked out for the control
logic which was not incorporated into the memory, extrapolation from the results
obtained from the design of the general purpose logic structures described
earlier should provide reasonably accurate estimates of what can be achieved.
Assuming a linear relationship between the number of gates and the number of
substrates required, it should be possible to place the 300 external gates on
16 sixty pin substrates or six 90 pin substrates. If 90 pin substrates are
used, it appears that all of the control logic could be mounted on
approximately 10 substrates.
4.4.3 Problems Introduced by Microprogramming
One of the most serious drawbacks to the microprogramming approach
described so far is the slow speed of presently available low power P-channel
MOS memories. These memories typically have access times of the order of
one or two microseconds. The clock presently used in the OBP has a pulse
duration of approximately 320 nanoseconds and a period of 1500 nanoseconds.
Since all operations must be completed before the clock pulse goes high,
there are only 1180 nanoseconds per clock period in which to manipulate or
transform data. It is obvious that problems occur with a one microsecond
ROM in these circumstances since the time remaining after the control signals
are available is insufficient for most processor operations.
One solution to this problem is to use a faster memory such as the
available high speed bipolar TTL memories. However, the power dissipation
for a bipolar memory of the required size would be more than 100 watts. This
figure is clearly out of the range of interest for the OBP. On the other hand,
memories are proposed for the near future which should operate at
sufficiently high speed and require an extremely low amount of power. These
may provide the best solution when they become available.
A second possibility is to reduce the clock frequency and thereby
increase the period. It has been calculated that a reduction from the present 667
KHz to 400 KHz would allow sufficient time for a PMOS ROM to operate properly.
It is beyond the scope of this report to determine whether a decrease in frequency
of this magnitude can be tolerated. 400 KHz was shown to allow sufficient
time to access a control word and perform an 18 bit addition in one clock
cycle.
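The timing budget behind this option can be worked through numerically. The Python sketch below uses the clock figures quoted in this section; it additionally assumes that the clock pulse remains 320 nanoseconds wide at the lower frequency and that the ROM access time is exactly one microsecond. The function names are illustrative.

```python
def period_ns(freq_khz):
    """Clock period in nanoseconds for a frequency given in KHz."""
    return 1_000_000 / freq_khz


def working_time_ns(period, pulse_ns=320, rom_access_ns=0):
    """Time left in one period after the clock pulse and any ROM access."""
    return period - pulse_ns - rom_access_ns


present = working_time_ns(1500)                               # discrete control logic
present_with_rom = working_time_ns(1500, rom_access_ns=1000)  # ROM at present clock
slowed_with_rom = working_time_ns(period_ns(400), rom_access_ns=1000)
```

Under these assumptions a one microsecond ROM at 400 KHz leaves the same working time per period as the present 667 KHz clock leaves without a ROM, while at the present clock only a small fraction of the period would remain.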
The third method is to compensate for the access time of presently
available ROM's by overlapping the microinstruction fetch with the execution
of the previous control word. If the ROM output word is stored in a clocked buffer register,
the next microinstruction can be accessed during the execution of the present
one. The present control word is protected by the buffer since the new output
word cannot enter the register until the next clock pulse. In this way the
necessity of waiting after each microinstruction for the ROM to produce the
next control word is avoided.
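The overlap can be pictured as a two-stage schedule in which the buffer register holds the word being executed while the ROM is already producing the next one. The Python sketch below is a schematic model only; the function name and the use of strings for control words are assumptions.

```python
# Schematic model of overlapped fetch: in each clock period the buffer
# register drives the CPU while the ROM accesses the following word.

def overlapped_schedule(rom):
    """Return (period, word_executing, word_being_fetched) for each period."""
    schedule = []
    buffer = None                     # buffer register is empty before the first pulse
    for period in range(len(rom) + 1):
        fetching = rom[period] if period < len(rom) else None
        schedule.append((period, buffer, fetching))
        buffer = fetching             # next clock pulse latches the ROM output
    return schedule


timeline = overlapped_schedule(["word0", "word1", "word2"])
```

After the first period, which only fills the buffer, one control word is executed in every period regardless of the ROM access time, provided the access completes within one period.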
The major obstacle to this approach is the amount of hardware needed
to buffer the ROM outputs. To buffer 120 control lines, a like number of
flip flops will be needed. If discrete flip flops are used this means
increasing the amount of hardware by approximately one third. However, there
are currently available MSI devices which would reduce the required amount of
hardware and could be placed on the same substrates as the ROM's. The use of
either counters or other MSI functions capable of acting as clocked output
buffers may make this method practical.
Another obstacle to this approach is the two clock cycle add time of the
present design. In the context of microprogramming, this means that the same
location must be accessed continuously for two clock cycles in control words
where additions occur. The control logic needed to accomplish this would
probably be a phase flip flop scheme much like the one presently used.
A preliminary design showed that approximately 50 additional gates
would be required to produce all the necessary timing conditions. This
approach is undesirable both because it requires a substantial amount of control
logic external to the memory and because it complicates the timing control.
To avoid these problems, it is suggested that a high speed adder be investigated.
4.4.4 Summary of Microprogrammed Control
Microprogrammed control provides a great reduction in the OBP
control logic through the replacement of 70 percent of the logic by compact
LSI memories. The logic which cannot be programmed in the memory can be
effectively partitioned through the use of general purpose logic structures.
Although the speed of presently available P-channel MOS memories
presents some design problems, it is anticipated that these will be avoided
when large densely packaged CMOS memories become available. The use of
CMOS memories will also greatly reduce the power requirements of the OBP.
The adoption of a one clock cycle adder will also enhance the advantages
offered by this design approach.
Because most of the connections in a microprogrammed control unit are
made on the memory chip itself, the number of necessary wired connections
is relatively small. Consequently system reliability should be improved.
4.5 Summary of Task 3
Both approaches described above will facilitate the partitioning
of the OBP control logic. However, the general purpose logic structure approach
is really only a partial solution in that by itself it allows only some of
the reductions that are possible when it is used in conjunction with microprogramming.
As mentioned above, it should be possible to mount the control
on 52 sixty pin substrates, using the general purpose logic technique alone.
If, on the other hand, most of the logic is incorporated into a memory, and
general purpose structures are used to partition the rest, it is estimated
that 20 sixty pin substrates should be sufficient. If 90 pin packages are
made available, the numbers of required substrates for the two techniques
can be reduced to 27 and 10 respectively.
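The substrate totals quoted in this summary can be reproduced from the figures given earlier in Section 4. The short Python tally below assumes that the control memory occupies four substrates in both the 60 pin and 90 pin cases; the variable names are illustrative.

```python
# Substrate counts quoted in Section 4, keyed by pins per substrate.
general_purpose_only = {60: 52, 90: 27}
external_logic = {60: 16, 90: 6}      # 300 gates left outside the control memory
memory_substrates = 4                 # assumed unchanged for both pin counts

microprogrammed = {pins: memory_substrates + count
                   for pins, count in external_logic.items()}
savings = {pins: general_purpose_only[pins] - microprogrammed[pins]
           for pins in microprogrammed}
```

The tally gives 20 and 10 substrates for the combined approach, against 52 and 27 for general purpose logic structures alone.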
Based on these results it appears that the optimum design approach
is to employ microprogramming to the maximum extent possible, using general
purpose logic structures to partition the remaining logic. Since it has been
shown that significant reductions in the required number of substrates can
be made by relaxing pin limitations, it is further suggested that the
possibility of using substrates with more pins be seriously considered.
It should be noted that although the use of substrates has been
assumed throughout this section, the results apply equally to any type of
replaceable package (e.g., printed circuit cards, etc.). Substrates were
used because they appeared to be the most likely packaging device for the
next generation OBP.
