A comparison of processor technologies by Wachter, Eddie R.
Virginia Commonwealth University 
VCU Scholars Compass 
Theses and Dissertations Graduate School 
1983 
A comparison of processor technologies 
Eddie R. Wachter 
© The Author 
Downloaded from 
https://scholarscompass.vcu.edu/etd/5617 
College of Humanities and Sciences 
Virginia Commonwealth University 
This is to certify that the thesis prepared by Eddie R. Wachter 
has been approved by his committee as satisfactory completion 
of the thesis requirement for the degree of Master of Science. 
Dr. James E. Ames, IV 
Director of Thesis 
Dr. Richard Allan 
Committee Member 
Dr. Francis R. Kane 
Committee Member 
Committee Member 
Dr. William E. Haver 
Department Chairman 
Dr. Elske v. P. Smith
Dean, College of Humanities 
and Sciences 
A Comparison of Processor Technologies
A thesis submitted in partial fulfillment of the
requirements for the degree of Master of Science
at Virginia Commonwealth University.
by
Eddie R. Wachter
Director: Dr. James E. Ames, IV
Affiliate Assistant Professor
Department of Mathematical Sciences
Virginia Commonwealth University
Richmond, Virginia
May, 1983
Acknowledgements
I wish to thank Dr. Ames for his support and assistance
in preparing the study, Dr. Richard Allan for his
experience and insights which guided me in the right
directions, and above all, my family for understanding
the time involved in pursuing a degree at night over
the last four years.
Table of Contents
List of Figures
Abstract
Introduction
Processor Technology
Central Processor Organization
Main Storage Systems
Input/Output Systems
Reliability, Availability, Serviceability
Applications
Futures
List of Figures 
Figure 1 - Switching Speed vs. Power Consumption
Figure 2 - Amdahl 580 Bus Organization
Figure 3 - CDC CPU Block Diagram
Figure 4 - IBM 308X Processor Units
Figure 5 - IBM Central Processor Flow Diagrams
Figure 6 - Univac 1100/90 System Configuration
Abstract 
The purpose of this paper is to present a discussion
of the technology implementation and design of four very 
high performance mainframe computer systems. The systems 
evaluated are: 
Amdahl 580 Series 
CDC 170 Series 800 
IBM 308x Series 
Univac 1100/90 Series 
Included in this evaluation is a survey of the technology 
used, its characteristics, packaging and performance. 
Each system component is evaluated on the basis of design 
philosophy, technology, and the total system design with 
regard to reliability, availability, and performance.
Introduction 
Basic to all computer systems is a pre-defined and
established architecture and the physical implementation
of this architecture. These two areas are often
confused when a study is undertaken, and therefore some
background is presented here to distinguish between the
two structures.
The architecture of a machine is not a blueprint for
the design of a computer system, but a description of the
logical appearance, the conceptual structure, and the
functional behavior of the processor, as viewed by the
programmer. This information about the architecture of
the machine defines how the hardware will respond to the
software. When programs running on different machine
implementations produce identical results as defined by
a single architecture, the implementations are considered
compatible. Most major computer manufacturers, such as
those studied here, have developed families of compatible
computer systems in which the architecture of the
machines within the family is the same, yet the
implementation of that architecture may vary from one
machine model to another.
The reasons that manufacturers develop different
implementations are manifold. One is to take advantage
of technological changes that have been developed in the
laboratory and are now commercially available. Another
is to functionally enhance a product with specialized
hardware, microcode, and features which add to the
versatility of the machine. But most of all,
implementations are changed to provide varying levels of
processing speed, also known as computer performance,
within a product line. These different implementations
provide different operating speeds, yet are functionally
compatible.
This is especially true where plug compatible mainframes
(PCM) compete, such as IBM and Amdahl. Both
manufacturers conform to the IBM System/370 Architecture,
yet the physical implementation of that architecture by
each vendor is completely different. As long as all
machines within a family of systems or all PCM machines
meet the specific architectural requirements, the system
evaluation criteria shift to other factors, such as
price, flexibility, reliability, and performance.
In order to understand the particular areas just
mentioned, a closer look at the physical implementation
is required. How the manufacturer has designed the
functional components of the processor, from the type of
circuitry used and the design of the functional parts to
how they all communicate to achieve the architectural
standard, has a direct bearing on performance,
reliability, and flexibility.
It is this aspect of processor design that is
presented here.
Processor Technology
The circuitry design and technology used in
mainframe computers are the primary determinants of the
processing power of the computer. The resulting
performance of the Central Processing Unit (CPU) is
determined by the cycle time and the number of instructions
that can be performed in a given cycle. The number of
instructions in a given cycle depends on the particular
architecture of the machine, the number of logic levels
between driving and receiving registers, and the degree
of parallelism in the design. Since the cycle time of
the machine is very dependent on the technology chosen
for the hardware implementation, namely the circuits,
packaging, and interconnections, substantial emphasis has
been placed on research in this area. Improvements in
processor throughput can then be made simply by changing
to a faster logic circuit without changing the basic
architecture of the computer.
Using the traditional engineering 1/3 rule for delay
estimates, the cycle time of a processor is roughly
allocated as follows:
1/3 circuit switching
1/3 loading and unloading of power
1/3 interconnection transmission.
For example, if the cycle time is to be 24 ns (nanoseconds,
or billionths of a second), then each function
should be completed in 8 ns. If they can be completed
in less than 8 ns, the cycle time can be reduced
accordingly.
The circuit switching time is the total circuit delay,
not that of an individual circuit. Since there are
typically 6 to 8 levels of logic between registers, the net
delay per level should be no more than about 1 ns to stay
within the total allocation of 8 ns.
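The arithmetic of the 1/3 rule can be sketched in a few lines. This is only an illustration of the budget described above; the function name cycle_budget and its parameters are chosen here for the example and do not come from any manufacturer's documentation.

```python
def cycle_budget(cycle_ns, logic_levels):
    """Split a cycle into three equal shares and spread one share over the logic levels."""
    share_ns = cycle_ns / 3.0               # switching, power loading, interconnection
    per_level_ns = share_ns / logic_levels  # switching budget for each level of logic
    return share_ns, per_level_ns


# The 24 ns example from the text, for 6 and for 8 levels of logic.
for levels in (6, 8):
    share, per_level = cycle_budget(24.0, levels)
    print(f"{levels} levels: {share:.1f} ns per share, {per_level:.2f} ns per level")
```

With 8 levels the budget is exactly 1 ns per level; with 6 levels it relaxes to about 1.33 ns.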
In order to understand the performance associated
with the CPU's studied, it is necessary to understand the
types of circuit technology used by the manufacturers in
each component of their CPU. Each technology has its own
characteristics and applicability to different processes,
and the industry has standardized on a few of them.
Those currently in manufacture, regarded as "state
of the art", are:
TTL - Transistor-Transistor Logic
ECL - Emitter Coupled Logic
NMOS - Negative Channel Metal Oxide Semiconductor
GaAs - Gallium Arsenide
Josephson Junctions
Figure 1 diagrams the switching speeds and power
consumptions of each of these technologies.
Transistor-Transistor Logic, TTL, developed in the
mid 1960's, has provided both the standard for interfaces
from computers to peripheral equipment, and mainframe
logic circuits. Since it is a more mature technology,
TTL manufacturing costs are low and yields are high.
Moderate power is required to power the TTL gate, and
switching time is slow, on the order of 1/3 to 1/4 as
fast as ECL. TTL circuits require special termination of
each gate to eliminate transmission-line-like effects,
and are generally mass produced in common form, using
part of the circuits on each chip.
Figure 1 - Switching Speed vs. Power Consumption (plot not reproduced; horizontal axis: power consumption, in thousandths of a watt)
ECL, Emitter Coupled Logic, is the fastest
commercially available technology and has been around for
approximately 10 years. It operates 3 to 4 times faster
than TTL, although its power consumption is higher, on
the order of twice as much. Another attribute of ECL is
its capability to drive more circuits (place signals to
many other circuits), thus requiring less circuitry to
perform a logic function. This leads to higher
reliability due to fewer circuits being used.
A consideration in using this type of circuitry is that
ECL is very sensitive to circuit length, which is
compensated for with higher power requirements. ECL is used
for main logic units and high speed memory.
Active element or volatile memories are the latest
technology for fabricating computer memory. Early memory
devices such as magnetic cores have been replaced with
large scale integration (LSI) memories which are much
faster, more compact, and require less power.
NMOS, Negative Channel Metal Oxide Semiconductor,
is the dominant memory technology. It offers moderate
speed in switching, very high density, low power
consumption, and a reasonable manufacturing cost.
Using NMOS technology, 256k (256,000) chips are possible,
whereas current processors are now using 16k chips.
The primary reason that 64k and above chips are not
currently used is the manufacturing cost associated
with the fallout percentage of denser chips. Once this
problem is solved, 64k to 256k chips will be standard.
GaAs, Gallium Arsenide (compound semiconductor), a
new technology, offers three to four times the device
speed potential of silicon-based circuits. This
technology has a number of shortcomings to be dealt with,
such as manufacturing difficulty and high power
consumption, but offers the advantages of high speed,
a high degree of fanout, and increased speed at 0°F.
GaAs offers an alternative to Josephson Junctions with
a reasonable environment, but as with any temperature
sensitive device, control of the environment becomes the
determining factor in whether or not a technology is used.
Josephson Junctions are switching devices that have a very
fast switching speed due to their superconductivity, a
result of their operation near absolute zero temperature.
Their development has been limited to laboratory
experimentation due to the temperature requirements,
and they are therefore not currently applicable to
commercial use.
Once the basic technology is established by a
manufacturer, the chip and its associated packaging
levels are designed according to the architectural
features of the machine. The number of circuits per
chip and the interconnections of the chips weigh heavily
on the processing speed of the CPU. If pulses must
travel great distances (greater than 10 cm) to the next
logic circuit, the speed of the machine is reduced
accordingly. It is therefore a prime design criterion
to place on a chip as many circuits as possible that are
related to a single operation, and to place chips with
related functions in proximity to one another to reduce
the interconnection distance.
The following is a list of implied "rules" each
manufacturer follows in designing the physical structure
of their processor in order to achieve speed, reliability,
and performance.
1. Use high speed switching circuitry
as the basis for logic circuits.
2. Maximize the number of circuits per chip.
3. Reduce the number of interconnections and
the distance between them.
4. Place needed microcode, where used, close
to circuitry that requires it.
5. Cool the technology to provide the
necessary speed and reliability of
the circuitry.
How each manufacturer has implemented these "rules"
depends on the technology they are familiar with and their
design objectives for the processor, such as speed,
flexibility, redundancy, cooling technology, and price.
Let us now examine each manufacturer against the
components (chips and technology) described above.
Amdahl Corporation's 580 series of processors uses
ECL circuit technology, placing 400 circuits on a chip
with a 95% use factor. Because of ECL circuitry, gate
delays are in the 400 picosecond (trillionth of a second)
range, power required per chip is 3.0 watts, and the fanout,
or ability to drive other circuits, is 2 to 5 circuits. Due
to the high power requirements, uniform cooling of the
chip is necessary and this is accomplished by the use of
a cooling fin bonded directly to the chip. Air from the
computer room is drawn over the chip across the cooling
fins and out of the mainframe to be cooled by the computer
room air conditioning. Amdahl processors therefore require
no coolant refrigeration units or chilled water units
to dissipate heat generated by the processor.
Each chip is soldered into an appropriate multiple
chip carrier (MCC), of which 8 are required to make up
the logic circuitry of the processor. Each MCC contains
121 chips made up of logic and Random Access Memory (RAM)
chips to hold the microcode. The RAM chips are placed
next to logic chips requiring the microcode to reduce the
interconnect time of the signal. Each MCC is a 14 layer
printed circuit board and the 8 MCC's are housed in an MCC
stack with side panels for interconnections, which are
also multi-layered printed circuit boards, and the whole
MCC stack occupies 5.6 cubic feet. The 8 MCC's are
designated as follows:
5 - CPU
    1 - Instruction Unit
    1 - Execution Unit
    1 - Storage Unit
    2 - High Speed Buffer
1 - Input/Output Processor (2nd optional)
1 - Console Processor
1 - Memory Bus Controller
The functional replaceable unit therefore consists
of the 121 chip MCC, each of which performs a particular
function of the CPU.
Control Data Corporation's Cyber 170 Model 875
processor utilizes ECL circuitry on plug-in circuit
boards that are not generally considered large scale
integration (LSI). The boards are mounted in a logic
chassis that is a functional unit, i.e. CPU, I/O control,
and memory. Diagnostics are done to individual chips,
and chips are replaced at that level. Utilizing ECL
technology, CDC has the advantages of high fanout and low
gate delays, but a somewhat high power requirement. Cooling
is accomplished by a closed loop chilled water system.
The CPU consists of 3 main bays or frames, each
housing a particular component: the central processor,
the input/output unit, and the central memory. The
central processor consists of 9 independent functional
units and a controlling central processor.
IBM Corporation's 308x series of processors uses TTL
technology for its primary CPU circuitry. The TTL chip
has a theoretical 704 gates per chip with an effective
utilization of approximately 60% or 400 circuits.
Switching time per gate is 1200 picoseconds with a low
power requirement of .4 to 2.7 watts per chip. Fanout
ability is 1 to 3 circuits. Cooling of the chip is
accomplished by use of a heat conduction mechanism of
a cylinder touching a chip, transferring the heat of the
chip to a helium chamber, through a lexan interposer to
a water jacket where heat is carried out of the processor
by a series of water hoses. The unit, called a Thermal
Conduction Module (TCM), contains 133 chips per unit and
is the field replaceable unit of the processor. Chips
can either contain logic circuitry or hold microcode for
use by neighboring chips. The central processor consists
of, depending on CPU model, 19 to 54 TCM's. The basic
uniprocessor model 3083 is configured with TCM's as
follows:
8 - CPU
    1 - Execution Element
    1 1/2 - Instruction Element
    3 - High Speed Buffer
    2 - Control Store Element (microcode control)
    1/2 - Variable length instruction execution
6 - External Data Controller - channel controller
5 - System Controller
Each major function, CPU, EXDC, and SC, is housed
in its own multilayered printed circuit board.
Functions are placed in proximity to other frequently
referenced functions, and microcode is distributed to
memory chips on each TCM. Communication between CPU,
EXDC, and SC is accomplished via cable connections.
The Sperry Univac 1100/90 series of computers uses
high speed ECL technology incorporating gate array
implementation, with 168 gates per chip. Each chip
dissipates approximately 5 watts of heat. Univac calls
its new implementation High Performance packaging, which
is 10 times as dense as previous models. A custom
rectangular ceramic chip carrier is used with 54 connecting
leads from the chip to the circuit board. The field
replaceable unit is this chip carrier. The circuit boards
are placed in pairs in a stack arrangement with a liquid
cooling plate between each board of the pair. Board to
board interconnection is accomplished via two side panels
and a backplane. Boards are mounted in 3 functional
frames: CPU, I/O processing, and memory. Cooling of the
units is accomplished by chilled water systems and a
radiator effect of drawing air over a radiator type
dissipator and between the cards.
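The chip figures quoted above for three of the vendors can be set side by side with a small sketch. The class and field names are hypothetical and exist only for this illustration; the numbers are those given in the text, with the upper power figure used for IBM. No gate counts are quoted for the CDC boards, so CDC is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChipTech:
    maker: str
    logic: str
    gates_per_chip: int
    use_factor: Optional[float]  # fraction of gates used, where the text quotes one
    watts_per_chip: float        # highest per-chip figure quoted in the text

    def usable_gates(self):
        """Gates actually used per chip, or None when no use factor is quoted."""
        if self.use_factor is None:
            return None
        return round(self.gates_per_chip * self.use_factor)

    def milliwatts_per_gate(self):
        """Chip power spread evenly over all gates on the chip."""
        return 1000.0 * self.watts_per_chip / self.gates_per_chip


CHIPS = [
    ChipTech("Amdahl 580", "ECL", 400, 0.95, 3.0),
    ChipTech("IBM 308x", "TTL", 704, 0.60, 2.7),
    ChipTech("Univac 1100/90", "ECL", 168, None, 5.0),
]

for chip in CHIPS:
    print(chip.maker, chip.logic, chip.usable_gates(),
          round(chip.milliwatts_per_gate(), 1))
```

The IBM product of 704 gates and 60% comes to about 422, which the text rounds to 400 circuits.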
Central Processor Organization
Once the basic circuit technology is established and 
the packaging has been designed, the next most critical 
factor in processor speed is the component construction 
of the central processor. It is the primary function of 
the central processor to take an instruction, decode it 
into its component parts, perform the requested operation, 
and either store the results or indicate the results by
setting some type of flag or condition code. The
components of the central processor, how they perform
their tasks, and their interrelation all contribute to
overall system performance. Instruction timing,
microcoding of operations, and instruction control are the
components of the central processor and the functions
that must be managed.
The cycle time of a particular machine does not
necessarily translate directly to processor speed.
Rather, it is the number of cycles that it takes to
execute an instruction that determines the MIPs, or
millions of instructions per second. Theoretically, if
all machines had similar cycle times and performed one
instruction per cycle, then these machines would perform
equally. Since technology has its limiting factors in
number of circuits and switching times, other design
concepts come into play. One of these is pipelining
instructions, where 2 or more instructions are in various
phases of execution at the same time. Each instruction
to be executed must be verified for a valid operation,
have its operands fetched, set a condition code, be
executed, and have its results stored. If each function
can be performed on a different instruction in a
different phase, theoretically this results in one
instruction executed every machine cycle, provided a good
algorithm for branch determination is used.
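The claim that pipelining approaches one instruction per cycle can be checked with a simple count. This is an idealized sketch: it assumes every phase takes one cycle and that no branch or operand dependency ever stalls the pipeline, which a real machine cannot guarantee. The function names are illustrative only.

```python
def sequential_cycles(n_instructions, n_phases):
    """Each instruction passes through every phase before the next one starts."""
    return n_instructions * n_phases


def pipelined_cycles(n_instructions, n_phases):
    """A new instruction enters every cycle; the last one drains after n_phases."""
    if n_instructions == 0:
        return 0
    return n_phases + n_instructions - 1


n, phases = 1000, 5
print(sequential_cycles(n, phases), "cycles without overlap")
print(pipelined_cycles(n, phases), "cycles with full overlap")
print(round(n / pipelined_cycles(n, phases), 3), "instructions per cycle")
```

For long instruction streams the fill time of the pipeline becomes negligible and the rate tends toward one instruction per cycle.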
Another way that instruction execution time can be
improved is to utilize multiple processors to perform
instruction execution in parallel. This parallelism can
be accomplished either by universal processors, each of
which executes all instructions, or specialized processors
that execute particular types of instructions such as
boolean, floating point, decimal, etc. The degree of
overlap or parallelism controlled by the instruction
processor has a direct bearing on the throughput of the
processor. Should these concepts be used together, such as
pipelining and parallelism, substantial gains in throughput
can be achieved.
In order to make the machine as flexible as possible, 
where new features can be added without significant 
changes to the processor, microcode is used to control 
the flow of logic. While somewhat slower than hardware, 
microcode is much more flexible. Not only is it used for 
logic control, but diagnostic functions utilize microcode 
to facilitate scanning of circuits to determine their 
status so that complete system monitoring can be 
accomplished. 
There are two types of microcode used in high speed
processors, horizontal and vertical. Horizontal
microcode is a wide word or bit string which generally
controls one machine cycle. Each bit within the word
controls a data path of the operation, a 1 signaling to
take the path, a 0 not to. Horizontal microcode is used
where high speed and flexibility are needed, since the
testing of a bit determines the data path. Vertical
microcode, on the other hand, is characterized by being
a set of instructions, much like a mini program, that
are tailored to the specific task. Vertical microcode
is easily written and modified, and is generally not
hardware dependent. As a result, simulation of routines
is a prime candidate for vertical microcode.
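The bit-per-path idea behind horizontal microcode can be illustrated with a short sketch. The data path names below are hypothetical and are not taken from any of the machines studied; the point is only that each bit of the control word enables or disables one path.

```python
# Hypothetical data paths: bit i of the control word gates PATHS[i].
PATHS = ["reg_to_alu_a", "reg_to_alu_b", "alu_to_reg", "mem_to_reg", "reg_to_mem"]


def decode_horizontal(word):
    """Return the data paths enabled by a horizontal microword (1 = take the path)."""
    return [name for bit, name in enumerate(PATHS) if (word >> bit) & 1]


# One microword for one machine cycle: route two registers through the ALU
# and write the result back to a register.
print(decode_horizontal(0b00111))
```

Because every path has its own bit, no decoding of an opcode is needed within the cycle, which is the source of the speed advantage described above.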
Controlling the flow of instructions, data, and 
results between main memory, instruction units, and 
other components of the processor is the purpose of a 
system controller. Where there is no overlap and
everything is done sequentially (each task must be
completed before the next begins), no system controller
is needed. But in order to take advantage of overlap,
pipelining and parallelism, control of the system must
be undertaken. A prime implementation of using a
controller is the use of a bus system architecture,
where the controller provides paths and message traffic
control between the major components, ensuring that as
many components of the processor are busy as possible.
The efficient use of all system resources along with 
minimal serialization and efficient system control 
enhances the throughput of the machine. 
Each manufacturer has designed its CPU to exploit the
features of its product, based on the target market
for the machine and the limits of the technology.
Also key to the internal design of some machines is
the ability to offer different MIP rates from the
same machine by varying internal communications. This
allows the manufacturer to build one basic machine
and upgrade a processor by enabling intra-system
communication and overlap. This concept is most
prevalent in the commercial processor environment.
Amdahl 
The Amdahl 580 computer system is designed as a
uniprocessor rated at 13 MIPs. This rated speed is
achieved by utilizing a cycle time of 24 ns and a pipeline
architecture whereby several instructions are in
various stages of execution at the same time. The CPU
is composed of 8 multiple chip carriers (MCC's), each of
which has a particular function within the processor
complex. They are housed within a stack 5.6 cubic feet
in size, mounted horizontally in the stack, and
connected by two uni-directional communication buses.
The buses are distributed by 2 multi-layered printed
circuit boards that form the sides of the stack
(Figure 2). Five of the stack implemented functional
units compose the CPU:
Instruction Unit (I-Unit): Fetches, decodes, and 
controls instruction execution 
Execution Unit (E-Unit): Provides computational
facilities of the CPU 
Storage Unit (S-Unit): Controls instruction 
operand storage and retrieval facilities 
Instruction Buffer (I-Buffer): High speed buffer 
storage for instruction streams 
Operand Buffer (O-Buffer): High speed buffer 
storage for operand data 
Within the Amdahl 580, two sets of functions are
performed simultaneously: instruction fetch, which
provides a doubleword of instruction stream every cycle,
holding it in the I-Unit for execution, and instruction
execution. Instruction fetch for each cycle looks at the
instruction needed to fill the instruction buffer, be it
a branch address or the next logical instruction, and has
it fetched and prepared for holding in the instruction
buffer. Extensive buffering of target instruction
streams allows for early decision making on branch
instructions and minimal loss from wasted instruction
fetches.
Figure 2 - 580 Bus Organization (diagram not reproduced; units shown on the buses: Console, IOP1, IOP0, MBC, S-Unit, two Buffers, I-Unit, E-Unit)
Instruction execution is accomplished by presenting
a new instruction to the pipeline (execution process).
The pipeline consists of the following 5 phases:
Generate - decode the instruction, validate
the opcode, generate operand addresses, and
send the opcode to the E-Unit
Buffer - access the operand buffer for operands
Logic - logical functions, comparisons, etc.;
condition codes are set
Execution - calculations are done
Write - results are stored in registers or O-Buffer
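How instructions overlap in a five-phase pipeline of this kind can be traced cycle by cycle. The sketch below labels the phases by their initial letters in the order listed above and assumes each phase takes exactly one cycle with no stalls; it is not a model of the actual Amdahl control logic.

```python
PHASES = ["G", "B", "L", "E", "W"]  # initial letters of the five phases, in order


def occupancy(n_instructions):
    """One row per cycle, mapping phase letter to instruction index (or None)."""
    if n_instructions <= 0:
        return []
    total_cycles = len(PHASES) + n_instructions - 1
    table = []
    for cycle in range(total_cycles):
        row = {}
        for stage, phase in enumerate(PHASES):
            instr = cycle - stage  # instruction that entered `stage` cycles ago
            row[phase] = instr if 0 <= instr < n_instructions else None
        table.append(row)
    return table


for cycle, row in enumerate(occupancy(3), start=1):
    cells = " ".join(f"{p}:{'-' if row[p] is None else row[p]}" for p in PHASES)
    print(f"cycle {cycle}: {cells}")
```

In the third cycle of the trace, three different instructions occupy the first three phases at once, which is the overlap the text describes.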
Extensive use of microcode is made in the Instruction
Unit and Execution Unit for instruction control.
Horizontal microcode is used for fast access to control
logic flow. The microcode is loaded on 4K RAM chips on
the MCC associated with the function, and the RAM chips
are located close to the logic chips that use it. Access
time of the RAM chips is 7 ns, within 1/3 of the 24 ns
cycle time of the machine.
The Storage Unit along with the High Speed Buffer
provides storage for instructions and operands. The
Storage Unit consists of one MCC, while the High Speed
Buffer consists of two MCC's, one for instructions and
one for operands. Both are 32K bytes in size and have an
access time of one cycle, 24 ns. The Storage Unit
receives and processes all data traffic between the CPU
data buffers and main memory, and all instruction unit
data requests for instruction preparation, taking
advantage of the bus architecture.
With each MCC being a functional unit and the side
panels acting as buses, control of all processor
functions is critical. Another MCC, the Memory Bus
Controller (MBC), provides communication paths and message
traffic control between the major functional parts. Each
MCC can direct requests to another MCC via the bus
system, and the MBC determines whether or not it needs
to be involved to control access. A pass-thru circuit
on the MBC allows for a direct connection between
functional MCC's.
Amdahl has built in concepts for providing additional
performance processors using the same design. A slower
speed model is achieved by changing the instruction
pipeline or increasing the cycles per instruction. A
larger throughput processor is achieved by attaching the
buses of two processors together, forming a multiprocessor
configuration.
Current announcements by Amdahl are a range of 580
processors with a MIP range of 10 to 24 MIPs.
5850 - uniprocessor - 64K HSB - 10 MIPs
5860 - uniprocessor - 64K HSB - 13 MIPs
5870 - attached processor - 128K HSB - 22 MIPs
5880 - multiprocessor - 128K HSB - 24 MIPs
CDC
The CDC Cyber 170 875 is a uniprocessor design using
a central processing unit, nine independent functional
units, a storage move unit, and a central memory control.
It is classified as a uniprocessor only for the reason
that a second processor with nine additional functional
units can be added to form an attached processor
configuration.
Each of the nine functional units is a specialized
arithmetic unit with an algorithm for performing a portion
of the central processor instructions. The processor is
designed with a 25 ns cycle time, rated at 19 MIPs.
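A cycle time and a MIPs rating together imply an average number of cycles per instruction. The sketch below applies that relationship to the two pairs of figures quoted in this chapter (24 ns and 13 MIPs for the Amdahl 580; 25 ns and 19 MIPs for the CDC 875). The function name is illustrative, and the results are only as good as the vendor ratings they are derived from.

```python
def implied_cycles_per_instruction(cycle_ns, mips):
    """Average cycles per instruction implied by a cycle time and a MIPs rating."""
    cycles_per_second = 1e9 / cycle_ns
    instructions_per_second = mips * 1e6
    return cycles_per_second / instructions_per_second


for name, cycle_ns, mips in [("Amdahl 580", 24.0, 13.0), ("CDC 170/875", 25.0, 19.0)]:
    cpi = implied_cycles_per_instruction(cycle_ns, mips)
    print(f"{name}: about {cpi:.2f} cycles per instruction")
```

The two machines have nearly the same cycle time, so the difference in rated MIPs comes almost entirely from the average number of cycles each spends per instruction.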
The central processing unit consists of operating
registers and control logic to prefetch instructions and
pass them to the appropriate functional unit, and store
the results to central memory control. Each unit is
independent of the others and multiple units can be
in operation at any one time, providing overlapped
instruction execution. The functional units are:
Boolean unit: logical operations
Shift unit: shift left, right, pack, unpack
Normalize unit: floating point normalization
Floating Add unit: add, subtract floating point
Long Add unit: add, subtract extended floating point
Multiply unit: multiply floating point
Divide unit: divide floating point
Population Count unit: count 1 bits in operand
Increment unit: address generation
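The routing of operations to independent functional units can be sketched as a lookup table. The unit names follow the list above; the operation mnemonics are invented for this example and are not CDC instruction codes.

```python
# Hypothetical mnemonics mapped to the nine functional units listed above.
UNIT_FOR_OP = {
    "and": "Boolean",
    "shift_left": "Shift",
    "normalize": "Normalize",
    "fadd": "Floating Add",
    "long_add": "Long Add",
    "fmul": "Multiply",
    "fdiv": "Divide",
    "popcount": "Population Count",
    "address": "Increment",
}


def dispatch(ops):
    """Group operations by the functional unit that would execute them."""
    by_unit = {}
    for op in ops:
        by_unit.setdefault(UNIT_FOR_OP[op], []).append(op)
    return by_unit


work = ["fadd", "fmul", "and", "fadd"]
for unit, ops in dispatch(work).items():
    print(unit, ops)
```

Operations that land on different units can proceed at the same time, while operations queued for the same unit must share it.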
The Storage Move Unit performs all block copy transfers
to and from extended memory. The Central Memory Control
unit controls the flow of data between central memory
and the system components (Figure 3).
The central processor uses a 12 word instruction
stack that performs a function similar to high speed
or cache memory. The instruction stack receives
instructions from main memory and can hold up to 48
instructions. Executed instructions are not discarded
but are retained, providing a facility to loop within
the stack or branch to a stacked instruction.
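The behavior of such an instruction stack can be sketched as a small bounded buffer of instruction words. The class below is a simplification under stated assumptions: the oldest word is dropped when a thirteenth is loaded, and each word holds at most four instructions (48 divided by 12). The replacement policy and all names are assumptions made for the example, not details given in the text.

```python
from collections import deque


class InstructionStack:
    WORDS = 12
    INSTRUCTIONS_PER_WORD = 4  # 48 instructions over 12 words, per the text

    def __init__(self):
        # Assumed policy: when full, the oldest word falls out of the stack.
        self.words = deque(maxlen=self.WORDS)

    def load_word(self, word_address):
        """Bring one instruction word in from main memory."""
        self.words.append(word_address)

    def holds(self, word_address):
        """True if a branch target is already stacked, so no memory fetch is needed."""
        return word_address in self.words

    def capacity(self):
        return self.WORDS * self.INSTRUCTIONS_PER_WORD


stack = InstructionStack()
for address in range(20):
    stack.load_word(address)
print(stack.holds(15), stack.holds(3), stack.capacity())
```

A loop whose body fits in the twelve most recently loaded words can therefore run without further instruction fetches from main memory.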
[Figure 3]
Instruction execution is accomplished by presenting
operand data via registers to the appropriate functional
unit by the central processor. The operation is
performed within the number of clock cycles required
and the results are placed in output registers. The
central processor then reads these registers and places
the results in memory. Quasi-pipelining is used on most
functional units since data passes through a set of
registers during each cycle within the functional unit.
Therefore a new set of operands can be started into the
functional unit each clock cycle, even though it takes
multiple clock cycles to complete an operation. This
approach provides a theoretical maximum of one instruction
executed per functional unit per cycle.
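The benefit of starting a new set of operands every clock
cycle can be shown with a short calculation. The sketch
below is illustrative only: the function names and the
5 cycle latency used in the note are assumptions, not
figures from the vendor.

```python
def cycles_unpipelined(n_ops: int, latency: int) -> int:
    """Each operation must finish before the next one starts."""
    return n_ops * latency


def cycles_pipelined(n_ops: int, latency: int) -> int:
    """A new operand set enters the unit every cycle, so the
    last result appears `latency` cycles after the last start."""
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1)
```

With an assumed latency of 5 cycles, ten operations need
50 cycles without overlap and 14 cycles when a new
operation is started each cycle, which approaches the
theoretical limit of one result per cycle as the number of
operations grows.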
All central processor components are hardwired,
with a very high use of specialized instructions that
would otherwise be delegated to microcode. Extensive
use of system registers provides for rapid task switching
and concurrent execution of programs. This complement
of registers also greatly reduces the number of main
storage references required.
Additional processing capability, rated at 32 MIPs,
can be achieved by the addition of a second central
processor, forming an attached processor configuration.
IBM
The IBM 308x series of processors was originally
designed as a dyadic processor, the central processing
unit consisting of two identical central processors
running under the control of a system controller.
Variations of this configuration have been developed
and will be discussed later. The 3083 uniprocessor,
model J, will be used as the base machine for descriptive
purposes. The 3083 is a 26ns cycle machine rated at
7.5 MIPs. This MIP value is lower than Amdahl or CDC,
even though the cycle time is relatively the same, the
reason being that IBM does not use bus or pipeline
architecture.
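The comparison of MIP ratings at similar cycle times can
be expressed as average instructions completed per machine
cycle. The following is a minimal sketch; the function
name is hypothetical, and the only inputs are the cycle
times and MIP ratings quoted in this chapter.

```python
def instructions_per_cycle(cycle_ns: float, mips: float) -> float:
    """Average instructions completed per machine cycle.

    cycles per second       = 1e9 / cycle_ns
    instructions per second = mips * 1e6
    """
    cycles_per_second = 1e9 / cycle_ns
    return (mips * 1e6) / cycles_per_second


# Figures quoted in the text: (cycle time in ns, rated MIPs).
quoted = {
    "25ns machine rated at 19 MIPs": (25.0, 19.0),
    "IBM 3083-J": (26.0, 7.5),
}
ratios = {name: instructions_per_cycle(*v) for name, v in quoted.items()}
```

On these figures the 25ns machine averages about 0.475
instructions per cycle and the 3083-J about 0.195, so the
difference in MIP rating comes from work done per cycle
rather than from the cycle time itself.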
The basic 308x Processor Unit consists of four
components: central storage; system controller (SC);
external data controller (EXDC); and one or more central
processors (Figure 4). Each component, except central
storage, is composed of multiple thermal conduction
modules (TCMs), sets of which are designed for a
specific purpose. The TCMs are mounted vertically to
large backplanes that are multi-layered circuit boards.
These boards are connected together to form the processor
unit.
The system controller provides all communication
facilities between components. All storage requests,
high speed cache requests, I/O requests and dispatching
of work take place within the system controller.
Each component acts as an individual processor to the
SC and is handled accordingly. Some models of the 308x
perform overlap of instruction fetch and execution under
the control of the SC.
[Figure 4: 308x configurations (3083, 3081, 3084)]
The central processor consists of five functional 
elements each of which is located on one or more TCMs. 
(Figure 5). The components and their functions are:
Instruction Element - controls instruction 
sequencing, initiates instruction requests, 
decodes instructions, generates operand address 
and executes most arithmetic and logical 
operations. Controlled by horizontal microcode.
Variable Field Element - executes all variable
field length storage to storage instructions.
Controlled by horizontal microcode. 
Execution Element - executes fixed point multiply 
and divide instructions and all floating point 
operations. Unit is hardwired, no microcode used.
Buffer Control Element - 32K cache memory using 
a store-in algorithm. 
Control Store Element - controls sequencing of 
microcode throughout the central processor. 
Within each TCM are both logic and microcode chips. Time
to access microcode on the same TCM is 16ns. 
[Figure 5: central processor (CP) elements: Control Store
Element (CSE), Buffer Control Element (BCE), Variable
Field Element (VFE), Instruction Element (IE), Execution
Element (EE)]
Each central processor has its own 32K high speed
buffer, and management of the buffer is controlled by
the system controller. Non-store-through (store-in)
algorithms are used to provide a higher internal
performance level than store-through when considered in
conjunction with central storage access time. Access
time of cache is 52ns or 2 cycles. Management of
references to the same memory locations by a
multi-central-processor system is controlled by the system
controller. Under the store-in philosophy, if two
processors reference the same location, the data can be
transferred from one high speed buffer to the other by
and through the system controller, and processing is
resumed. The store-in algorithm also supports a feature
called hardware checkpoint retry. The basic concept is to
establish a checkpoint at some particular instruction N,
store the content of all registers in a backup set of
registers and store changed cache values in a push down
stack. Should an error occur, registers and cache will be
returned to pre-error status to provide for integrity. The
failing instruction is retried a specific number of times,
each time the registers and cache are reset, and then
before logging the error, the registers are reset once
more.
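The checkpoint retry sequence described above can be
sketched in software. This is a simplified model under
stated assumptions, not IBM's implementation: the class
and function names are hypothetical, Python dictionaries
stand in for the registers and the high speed buffer, and
a hardware error is represented by a RuntimeError.

```python
class CheckpointedCache:
    """Store-in buffer model that saves overwritten values on a
    push down stack so writes can be undone."""

    _MISSING = object()

    def __init__(self):
        self.lines = {}
        self.undo_stack = []

    def write(self, addr, value):
        # Save the pre-write value before changing the line.
        self.undo_stack.append((addr, self.lines.get(addr, self._MISSING)))
        self.lines[addr] = value

    def checkpoint(self):
        self.undo_stack.clear()

    def rollback(self):
        # Pop in reverse order to restore the checkpoint state.
        while self.undo_stack:
            addr, old = self.undo_stack.pop()
            if old is self._MISSING:
                self.lines.pop(addr, None)
            else:
                self.lines[addr] = old


def execute_with_retry(instruction, registers, cache, max_retries=3):
    """Run instruction(registers, cache); on error restore the
    checkpoint and retry. Returns False if every attempt fails,
    leaving registers and cache in their pre-error state."""
    backup = dict(registers)          # backup register set
    cache.checkpoint()
    for _ in range(1 + max_retries):
        try:
            instruction(registers, cache)
            return True
        except RuntimeError:
            registers.clear()
            registers.update(backup)
            cache.rollback()
    return False                      # caller logs the error
```

The push down stack only has to hold values changed since
the checkpoint, which is practical because a store-in
buffer keeps changed data in the buffer rather than writing
it through to central storage immediately.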
IBM has currently announced 6 models of 308x
processors:
3083-E  uniprocessor         32K HSB  -  3.5 MIPs
3083-B  uniprocessor         32K HSB  -  5.5 MIPs
3083-J  uniprocessor         32K HSB  -  7.5 MIPs
(The variances in speed of the above processors
are attributed to the use or non-use of
instruction pre-fetch and the number of
translation lookaside buffer entries.)
3081-G  dyadic processor     64K HSB  -  10 MIPs
        (2 3083-B central processors)
3081-K  dyadic processor     64K HSB  -  13 MIPs
        (2 3083-J central processors)
3084-Q  quadratic processor  128K HSB -  24 MIPs
        (2 3081-K processors)
Sperry Univac
The Sperry Univac 1100/90 series of computers is a
multi-processor providing a range of performance from
7 to 25 MIPs. One to four central processors, one to
four main storage units, and one to four I/O processors
in any configuration can be used to design a system
(Figure 6).
Each CPU consists of five separate components, each
performing a specific function:
Instruction pipelining - a three level pipeline
  that provides for overlapped execution of three
  instructions. A "wraparound" feature is provided
  so that intermediate results from one instruction
  can be used in the next instruction in the pipe.
32K high speed buffer for instructions, with an
  access time of 30ns per instruction.
32K high speed buffer for operands, with an
  access time of 30ns per operand.
Arithmetic unit divided into 3 distinct, special
  purpose components:
    binary arithmetic component
    high speed multiply component
    decimal arithmetic component
  (Each component is optimized to reduce
  execution time for its specific task.)
Duplicate X file to accelerate operand and
  instruction address formation. This file
  provides two copies of the contents of each
  index register for internal manipulation.
Each component is hardwired, using no microcode, and
achieves a 7.5 MIP uniprocessor with a 30ns cycle time.
[Figure 6: 1100/90 system configuration; legible labels
include CPU, IOP, Common Bus, and 4096K Words, with
optional units marked "opt."]
Communication between the CPUs, I/O processors, and
main memory management is accomplished by a specially
developed interrupt structure that uses a
request/acknowledgement scheme. Each component passes
messages to the others using dedicated mailboxes, an
implementation of a bus design with little of the overhead
of a system control mechanism.
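A request/acknowledgement exchange through dedicated
mailboxes can be sketched as follows. The sketch is an
illustration only; the class, function and message names
are hypothetical, and the interrupt that would signal a
waiting message on the real machine is not modeled.

```python
from collections import deque


class Mailbox:
    """Dedicated message queue owned by one component."""

    def __init__(self, owner):
        self.owner = owner
        self.messages = deque()

    def post(self, sender, kind, payload=None):
        self.messages.append((sender, kind, payload))

    def take(self):
        return self.messages.popleft() if self.messages else None


def request(sender_box, target_box, payload):
    """Sender posts a request directly into the target's mailbox."""
    target_box.post(sender_box.owner, "REQUEST", payload)


def service_one(own_box, boxes):
    """Target handles one message and acknowledges a request to
    its sender. Returns False when the mailbox is empty."""
    msg = own_box.take()
    if msg is None:
        return False
    sender, kind, payload = msg
    if kind == "REQUEST":
        boxes[sender].post(own_box.owner, "ACK", payload)
    return True
```

Because each component owns its mailbox and messages go
directly from sender to receiver, no central controller
sits in the path of every exchange.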
Optionally, a built-in performance monitoring set
of hardware collection registers can be referenced to
analyze system performance. Using this performance
tool, overhead of a software monitor to gather
performance data is eliminated.
Since no system controller feature is utilized, and
communication is accomplished via mailboxing messages,
configurations are free form, and no relationships
exist between CPUs, I/O processors, and memory units
to force specific configurations. As a result, typical
CPU conflict and associated overhead are not factors
and system communications do not bottleneck on one
system component.
Main Storage Systems
The primary function of main storage in a processor
is to provide a storage area for instructions and data
that will be acted on by the central processor. The
activity of the main storage system is determined by
instructions and data references required by the CPU.
Therefore, the speed of the processor is somewhat
dependent upon the speed of the memory accesses required.
Various methods have been implemented to accelerate
fetch and store requests. Most notable is the concept
of a high speed cache buffer. This buffer, usually
considered part of the central processor, holds a certain
amount of the last fetched data from main storage.
Fabricated from high speed random access chips, access
to the data within the high speed buffer is usually
one machine cycle. Should the requested data not be
in the high speed buffer, a storage request is sent to
a storage control unit for fetching of the requested
data. A storage fetch operation can take anywhere from
one to twelve machine cycles, depending on a number of
factors such as type of memory used, number of
simultaneous requests, and amount of data transferred
per machine cycle. Once the data is fetched, it is
moved to the high speed buffer where it is accessed by
the central processor. Highest performance can be
achieved if most fetch and store operations are done
within the high speed buffer. Consequently, the greater
the size of the high speed buffer, the better the
machine throughput.
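The effect of the high speed buffer on average access time
can be estimated with a simple model. This is a sketch
under stated assumptions: the function name and the hit
ratios are hypothetical, while the one cycle buffer access
and the twelve cycle storage fetch are the figures given
above.

```python
def effective_access_cycles(hit_ratio: float,
                            hit_cycles: float,
                            miss_penalty_cycles: float) -> float:
    """Average machine cycles per reference.

    Every reference pays the buffer access; a miss also pays
    the main storage fetch.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_cycles + (1.0 - hit_ratio) * miss_penalty_cycles
```

With a one cycle buffer and a twelve cycle storage fetch,
an assumed hit ratio of 0.90 gives 2.2 cycles per
reference and 0.99 gives 1.12 cycles, which is why a larger
buffer, with its higher hit ratio, improves throughput.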
In order to provide fast access to memory, some
designs have interleaved memory into arrays in which
consecutive addresses are in different memory modules.
For example, if a machine had four way interleaving
and the central processor requested four consecutive
memory locations, all four could be fetched simultaneously,
reducing memory fetch time by a factor of four.
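The four way example can be worked through in a short
sketch. The function names are hypothetical, and the model
assumes the simplest mapping, in which the module is the
address modulo the number of ways and each module serves
one request per memory cycle.

```python
from collections import Counter


def module_for_address(address: int, ways: int = 4) -> int:
    """Consecutive addresses fall in consecutive modules."""
    return address % ways


def fetch_cycles(addresses, ways: int = 4) -> int:
    """Memory cycles needed when modules work in parallel and
    each module serves one request per cycle."""
    load = Counter(module_for_address(a, ways) for a in addresses)
    return max(load.values(), default=0)
```

Four consecutive addresses such as 100 through 103 land in
four different modules and are fetched in one memory cycle.
Addresses 0, 4, 8 and 12 all fall in module 0 and take four
cycles, so the factor of four applies only when references
are spread across the modules.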
Each system studied has incorporated certain
attributes of main storage systems to optimize
performance of the machine. Memory systems are generally
standard within each vendor's product line, which allows
for easier upgradeability.
Amdahl 
Amdahl's Main Storage system is composed of a
Memory Bus Controller (MBC) and a Main Storage Unit (MSU).
The MSU is a 16MB to 64MB unit, composed of 16K Dynamic
NMOS chips with a cycle time of 288ns (12 machine cycles)
fetching 8 bytes each cycle.
The MBC provides communication paths and message
traffic control between components of the processor.
A data integrity unit assures that the current version
of a data line (32 bytes) is accessed where multiple
copies of a data line can exist in the operand buffer,
instruction buffer, I/O processors, or main memory.
A component of the MBC, the Main Storage Controller
(MSC), receives data requests for accesses to memory,
sets the appropriate control latches and generates
error checking and correction codes.
The MSU consists of 8 bit bytes arranged in 32 byte
lines. Four way interleaving and four way quarterline
multiplexing (each quarterline being 8 bytes in length)
provide data bus paths of 72 bits (8 bit bytes plus 1
bit parity per byte). Each 8 byte message can be
transferred every machine cycle. The 580 MSU has been
optimized for main memory data fetches, which are the
most common memory requests. A scenario of a memory
fetch is as follows:
Storage unit generates a read request
MBC receives read request
MSC unit receives request
MSC uses message opcode and address portion
  to create control signals for MSU
MSU accesses one of the four quarterlines from
  one of the four interleaves and latches them
  to a data-out register which is 8 bytes wide
Transfer of data is done to requesting unit.
CDC
The CDC 875 processor's central memory consists of 256K
to 1024K words of 4K bipolar memory organized into 16
logically independent banks. The banks are phased so that
successive addresses are in different banks, much like
interleaving. One word can be fetched every 25ns, and
having a three word transfer scheme, the memory cycle
time is 75ns, with a word size of 60 bits. Four memory
interface ports, each with a three word buffer, are
provided. Each central processor (maximum 2) is connected
to a port, and each set of peripheral processors
(maximum 2) is connected to a port. A data distributor
services each of the memory interface ports on a
priority basis and multiplexes data between the ports
and memory. The data distributor also does error checking
and correction. No cache memory is available since
the bipolar memory speed is a very fast 75ns.
As an optional feature, Extended Semiconductor
Memory (ESM) provides up to two million words of
additional capacity. Data is transferred between main
memory and ESM at the rate of 10 million words/second.
When used with the Direct Extended Memory Access feature,
where data is stored on disk, ESM functions as a buffer
between disk and central memory. A Unified Extended
Memory feature is standard which allows main memory to be
partitioned into areas reserved for execution and areas
reserved for large data storage.
IBM 
The IBM 308x Processor contains between 8 and 64
MB (megabytes) of memory in 8 MB increments fabricated
from 16K MOS technology housed within the central
processing unit. Main storage access time is 312ns
or 12 machine cycles, fetching 8 bytes per memory cycle.
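The access time and fetch width can be turned into a rough
transfer rate for a single storage array. The sketch below
is an estimate only: the function name is hypothetical, and
it assumes back-to-back fetches with no interleaving or
overlap, so it understates what the full storage system can
deliver.

```python
def fetch_bandwidth_mb_per_s(bytes_per_fetch: int, cycle_ns: float) -> float:
    """Megabytes (10**6 bytes) per second for back-to-back,
    non-overlapped fetches from a single storage array."""
    fetches_per_second = 1e9 / cycle_ns
    return bytes_per_fetch * fetches_per_second / 1e6
```

At 8 bytes every 312ns a single array delivers about
25.6 megabytes per second. The same calculation for the
288ns, 8 byte fetch quoted for the Amdahl MSU gives about
27.8 megabytes per second.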
Central storage is divided into Basic Storage
Elements (BSE). Each BSE has either 8 MB or 16 MB of
storage and contains logic for fetching doublewords
(8 bytes) from or storing to data arrays in each BSE.
The BSE logic performs four functions:
Data storage and retrieval for the complex
Central storage communication with the CPU
  via the System Controller
Error Checking and Correction
Storage regeneration control
Central memory is 2 way interleaved by dividing the
data arrays into Basic Storage Modules (BSM).
Interleaving of contiguous 2K blocks of storage
provides for simultaneous access of separate Basic
Storage Modules by multiple CPU processors, I/O
processors, or combinations of processors. Key
controlled storage protection is used for both store
and fetch protection. Each 4K block of storage is
protected by a 7 bit protect key, and is regulated
by the System Controller. Storage requests can be
queued, up to a maximum of 8, within the System
Controller.
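The block sizes above determine which storage module and
which protect key apply to a given address. The sketch
below is an illustration under assumptions: the names are
hypothetical, and the key matching rule (a matching key or
key 0 allows any access, and a mismatched key may fetch
only from a block that is not fetch protected) is modeled
on System/370 conventions rather than taken from the text.

```python
KEY_BLOCK_BYTES = 4096      # one protect key per 4K block (from the text)
INTERLEAVE_BYTES = 2048     # contiguous 2K blocks alternate between modules
MODULES = 2                 # 2 way interleaving (from the text)


def storage_module(address: int) -> int:
    """Basic Storage Module that holds the given byte address."""
    return (address // INTERLEAVE_BYTES) % MODULES


def access_allowed(address, program_key, block_keys, is_store):
    """Assumed key check, modeled on System/370 conventions.

    block_keys maps a 4K block number to
    (access_key, fetch_protected).
    """
    access_key, fetch_protected = block_keys[address // KEY_BLOCK_BYTES]
    if program_key == 0 or program_key == access_key:
        return True               # matching key, or key 0
    if is_store:
        return False              # mismatched keys can never store
    return not fetch_protected    # fetch allowed unless protected
```

Two requests to addresses 0 and 2048 fall in different
modules and can proceed at the same time, while both are
covered by the single key of the first 4K block.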
Sperry Univac 
The Univac 1100/90 system consists of up to 4 Main
Storage Units (MSU), each composed of 4 million words
of storage, fabricated from 65K chips. Each MSU has
four storage banks, each of which acts independently,
giving each MSU the capability to handle a total of
four simultaneous requests from IOPs and CPUs.
The storage system will interface to CPU buffers of up
to 8 words, using an 8 word block transfer.
Two or four way interleaving is used and it
automatically allocates consecutive block (8 words)
addresses to separate storage banks. Each CPU can
stack up to 16 write requests if a busy condition is
sensed for the MSU. Main storage can also be logically
partitioned to each CPU, providing diagnostic and
storage protection capabilities.
Input/Output Systems
An Input/Output system consists of components that
transfer data between the processor and peripheral
devices. These components generally consist of channels,
which handle data transfer protocol and sense and status
interpretations, and one or more I/O processors which
interface between the processor and the channels.
Communication channels exist between the I/O processor
and the central processor for receiving commands for
data transfer and between the I/O processor and the main
storage system for the transferring of data into or
from main storage and peripheral devices.
In order to provide flexibility of devices that are
attached to a channel, the I/O processors are generally
microcoded to allow for many different device types
and configurations. The speed of the I/O processor and
the number of channels it controls have a direct bearing
on the aggregate data rate, the amount of data
transferred through the I/O processor. With current
high speed direct access devices, block multiplexor
channels (channels that interleave blocks of data from
multiple devices on the same channel) are the dominant
channel type. Byte or word channels function much
the same as block multiplexor channels except bytes
or words are interleaved rather than blocks. Byte or
word channels are measured in the kilobyte or
kiloword range whereas block multiplexor channels are
measured in the megabyte range.
Redundancy, sharing of resources, and speed of I/O
processors are the areas addressed by the processors
studied here. Each has implemented a different design
based on the type of workload, amount of built-in
reliability, and relationship the I/O processor has with
the other components of the system.
Amdahl 
The Amdahl 580 provides 16 or 32 multiplexor channels
implemented in LSI technology. Each set of 16 channels
is controlled by an I/O processor (IOP) which is the
primary interface between peripheral devices and the CPU.
The IOP consists of three components:
I/O controller (IOC)
Bus Handler
16 interface handlers, one for each channel
An IOP including the IOC and Bus Handler is implemented
on a single multiple chip carrier, MCC, and is shared
by the 16 interface handlers. Each channel has 256
subchannels and can accommodate up to 6.0 megabytes
per second data rate. The maximum aggregate data rate
for the first 16 channels is 50 megabytes per second,
with the second IOP increasing the total data rate to
80 megabytes per second.
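The per channel and aggregate figures imply that not all
16 channels of an IOP can run at the full channel rate at
once. The sketch below makes that explicit. It assumes a
simple model in which the IOP delivers the sum of the
active channel rates up to its aggregate limit; the
function names are hypothetical.

```python
def aggregate_rate(active_channels: int,
                   channel_mb_s: float,
                   iop_limit_mb_s: float) -> float:
    """Data rate an IOP can sustain: the sum of the channel
    rates, capped by the IOP's aggregate limit."""
    return min(active_channels * channel_mb_s, iop_limit_mb_s)


def channels_at_full_rate(channel_mb_s: float, iop_limit_mb_s: float) -> int:
    """Number of channels that can run at full rate together."""
    return int(iop_limit_mb_s // channel_mb_s)
```

With 6.0 megabytes per second per channel and a 50
megabyte per second limit, eight channels can run at full
rate together; sixteen would demand 96 megabytes per
second and are held to 50.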
Data paths in and out of an IOP are via the bus
system. The Bus Handler is the interface and provides
data buffering when required. The IOC performs the
processing for the IOP and manages the IOC and the 16
Interface Handlers. Normal data transfer, including
channel protocol and data buffering, is done by the
Interface Handlers. Data and commands are fetched
directly from the storage unit, via the data bus,
thereby reducing contention between IOPs and the CPU.
Subchannel queuing provides the ability to hold I/O
activities that were denied access to the system,
typically due to a busy device or channel. Once the
desired device or channel is available, the held request
is released for processing. Use of the feature reduces
the load on the CPU.
One or two byte multiplexor channels are interfaced
to the processor via the console interface. Each byte
multiplexor channel has an Interface Handler and
supports a data rate up to 200 kilobytes per second.
CDC 
The CDC 875 contains an I/O unit (IOU) to perform all
control over external devices connected to the system.
The IOU is composed of the following functional areas:
Peripheral Processors (PP)
I/O Channels
Central Memory access
Data Channel Connector (DCC)
Real time clock
Communications interface
Maintenance register
A basic IOU contains 10 PPs and 12 I/O channels and
can be expanded to 20 PPs in groups of 5 with a maximum
of 24 I/O channels. Each PP is an independent processor
with its own memory and each set of 10 PPs comprises
a multiplexing system which allows the PPs to share
common hardware for arithmetic, logical and I/O
operations without losing independence. Multiplexing
is achieved by a set of 10 identical registers, one
for each PP, rotated to provide time slicing of the
set of PPs.
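The rotation of the register sets can be pictured as a
circular queue in which the PP at the head uses the shared
hardware for one time slot. This is a sketch of the idea
only; the function name is hypothetical and a slot is an
abstract unit, not a measured time.

```python
from collections import deque


def barrel_schedule(n_pps: int = 10, n_slots: int = 25):
    """Return the order in which PPs use the shared hardware."""
    barrel = deque(range(n_pps))
    order = []
    for _ in range(n_slots):
        order.append(barrel[0])   # PP whose registers are at the head
        barrel.rotate(-1)         # bring the next PP's registers round
    return order
```

Each PP receives every tenth slot regardless of what the
others are doing, which is how the PPs share hardware
without losing independence.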
Each PP communicates with the central processor via
central memory and flag settings, while PPs communicate
with each other over an internal interface. Each PP
has a 4K (12 bit words) memory, and each PP executes
programs alone or in conjunction with other PPs to
control data transfers. Requests from the central
processor are stored in central memory, fetched by a
PP, translated into I/O requests, and scheduled and
completed under the control of the PP program. The
programs use the 4K memory as a data buffer between
external devices and central memory. Any PP can access
central memory directly via the central memory access
function. Since data transfer is in 12 bit words, the
central memory access function assembles 5 successive
12 bit words into one 60 bit central memory word,
and likewise disassembles a 60 bit word for a write
operation. Any PP can access any I/O channel and
transfer at the rate of one 12 bit word every 500ns,
and all channels can be active at the same time. A
maximum of 20 PPs can simultaneously read central
memory for commands, and a maximum of 5 PPs can write
to central memory at the same time. Due to time slicing,
each PP will be in a particular state of execution,
so central memory bottlenecks are not a problem.
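The assembly of five 12 bit words into one 60 bit word,
and the reverse operation, can be sketched as follows. The
function names are hypothetical, and placing the first
12 bit word in the high-order position is an assumption
made for the illustration.

```python
PP_WORD_BITS = 12
WORDS_PER_CM_WORD = 5
PP_MASK = (1 << PP_WORD_BITS) - 1


def assemble(pp_words):
    """Pack five 12 bit words into one 60 bit word, with the
    first word in the high-order position (assumed ordering)."""
    if len(pp_words) != WORDS_PER_CM_WORD:
        raise ValueError("exactly five 12 bit words are required")
    cm_word = 0
    for w in pp_words:
        if not 0 <= w <= PP_MASK:
            raise ValueError("value does not fit in 12 bits")
        cm_word = (cm_word << PP_WORD_BITS) | w
    return cm_word


def disassemble(cm_word):
    """Split one 60 bit word back into five 12 bit words."""
    if not 0 <= cm_word < (1 << 60):
        raise ValueError("value does not fit in 60 bits")
    return [(cm_word >> shift) & PP_MASK for shift in range(48, -1, -12)]
```

At one 12 bit word every 500ns, a channel needs 2500ns to
supply the five words that make up one central memory word.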
IBM
The IBM 308x series of computers contains an
External Data Controller (EXDC), fabricated of LSI
chips that perform channel functions. Provisions for
up to 24 channels with a data rate of up to 3 megabytes
per second on each block multiplexor channel are
available. Four of the 24 channels can be byte
multiplexor channels with a data rate of up to 500
kilobytes/second. The EXDC contains the following
logical elements:
Channel processing element (CPE)
Data server element (DSE)
Interface adapter element (IAE)
The CPE is a specialized processor for controlling
I/O instructions and interrupts. It performs queuing
of I/O requests, manages channel path selection, and
communicates with the system controller. It is driven
by pageable vertical microcode, has a two byte data path,
and is packaged on one TCM. The CPE is shared by up to
three DSEs, each DSE handling 8 channels. Each DSE is
a horizontal microcoded processor, housed on one TCM,
with the microcode being shared by the 8 channels on
an equal round robin basis. Each DSE has 256 bytes
of data buffering capability per channel and controls
the transfer of data to and from central storage. Each
DSE port is connected to a non-LSI Interface Adapter
Element, outboard from the processor. This hardwired
IAE contains 8 bytes of data buffering and communicates
and handles all data traffic sequences on the channel.
Sperry Univac
The Univac 1100/90 contains one to four I/O processors
(IOPs), each with at least 8 word channels and 4 block
channels, each set comprising a channel module. Each
IOP can handle a total of six modules in any combination,
with at least one being a block multiplexor module.
Block multiplexor channels transfer up to 4.3 megabytes
per second on input and 3.7 megabytes per second on
output, with each block module having an aggregate data
rate of 17.2 megabytes per second. Word channels, in
sets of 8, have a maximum data rate of 3.7 megabytes
per second per channel and 18 megabytes per second per
word module.
The IOP receives commands via a Universal Processor
Interrupt/mailbox system. The IOP does all channel
processing without interrupting the central processor
until completion. Queued I/O requests are kept in the
IOP if device or path busy conditions are detected. The
IOP communicates directly with main storage, completing
the I/O request and signaling the requesting unit that
I/O processing is complete via the mailbox system.
Reliability, Availability & Serviceability
The systems described in this paper all use some form
of LSI technology as the basic building blocks for the
processor. Compared to previous non-LSI technology of
prior machines, these processors require an approach to
error detection, fault isolation, and service different
from their predecessors. But along with the technology
comes an intrinsic reliability of a reduction in failures
due to fewer physical connections and off-chip data
paths, where most failures occur.
In the past, error re-creation strategy satisfied
most diagnostic procedures because conventional tools
and human intelligence were sufficient for problem
isolation and repair. With the LSI packages used in
these machines, fewer individual components are
available for repair or replacement, and diagnostics
are structured to "call out" the failing field
replaceable unit (FRU). Consequently, the fewer the
FRUs, the easier the machine is to service.
Should major components, such as central processors,
I/O processors, etc. be replicated, system availability
can be enhanced, provided that adequate real time problem
diagnosis and built in system reconfiguration functions
are available. The recording of any such actions, along
with all intermittent problems for early problem
detection, provides a vehicle for the vendor to follow
what is happening or has happened within the processor,
on parts such as LSI chips. This concept has removed the
need to probe manually various circuits for problem
diagnosis and provides real time capabilities to vendor
personnel.
Each manufacturer has implemented these features
according to their processor design and components.
Amdahl 
Amdahl has improved the reliability of the processor
by utilizing the concept of fewer FRUs and by
replacing discrete wiring with printed circuit boards
for processor component interconnections. The high
circuit density has allowed Amdahl to place an entire
functional unit, such as the central processor, on a
single MCC which provides fast fault isolation and repair.
Fault isolation is accomplished by having each MCC
contain logic which records the identity of the circuit
on the MCC detecting the error. Source data, control
signals and latch contents are all recorded using the
console processor. Built-in logic scan facilities can
determine and/or set the contents of system latches
for diagnostic purposes. Specialized RAMs are included
on each MCC to maintain a microcode and bus transaction
history for fault tracing. RAM is also used for
maintaining a history of main memory correctable errors.
The console processor, implemented on one MCC,
connects to a console complex which includes a floppy
and hard disk, a system scan facility, a CRT/keyboard
and an AMDAC control. Using AMDAC, any function done
locally can be performed from a remote diagnostic
facility via telephone connection to the console. Since
the console processor is implemented in LSI and is part
of the central processor, a separate microcomputer
based support processor located in the console provides
additional diagnostic capability. Component replication
is done only in an attached or multiprocessor system.
CDC 
CDC has implemented maintenance diagnostic facilities
in a fault-analyze mode. No automatic reconfiguration
is available, with the exception of an attached processor
environment, where loss of one central processor will not
cause total system outage. Failed peripheral processors
can be configured out of the system and degraded I/O
systems can be run.
Basic error detection is used and data is stored in
maintenance registers for interrogation. Diagnostics
are done down to the board level and the FRU is the
failing board. RTA, Remote Technical Assistance, is
available via telephone lines to allow for remote
diagnostics in support of local service personnel.
IBM
IBM has implemented RAS in much the same form as
Amdahl since the 308x includes much the same features,
processors and facilities. IBM has also included many
of the same concepts, such as real time logging,
diagnostic chips on each TCM, and remote access to
system functions. The basic dyadic 3081 processor
contains two central processors with automatic system
reconfiguration should one fail. This is not true
of the UP 3083 models.
There are also a limited number of FRUs per
processor, between 18 and 54 TCMs depending on model,
each of which contains approximately 133 LSI chips. Each
TCM contains diagnostic circuitry, called Logic Support
Stations, which uses the Level Sensitive Scan Design
capabilities of the Processor Controller to record the
status of the machine down to particular latches. The
facility can be used for loading Engineering Changes to
microcode as well as for diagnosing hardware functions.
The Processor Controller, a separate unit, contains the
necessary disk, modems and consoles for recording all
activity within the processor complex.
Sperry Univac 
Univac provides advanced processor availability
through the use of multiple CPUs and IOPs, providing
continued system operation should a component fail.
Partitioning of CPUs and IOPs can be accomplished so
that a component can be removed from the system,
diagnosed, repaired, and re-added to the system without
an outage.
The system support processor, a micro-processor
based unit, contains disk, memory and its own operating
system that is used to communicate with each component
of the CPU. Partitioning of components is done using
this processor, and it also does recovery and isolation
action when errors are discovered. Logging of all
actions is done for later analysis.
Diagnostics are done down to the board containing
the LSI chips. This board is the FRU of the 1100/90
system.
Applications
Each of the computer systems studied has been
designed for a particular segment of the computer
marketplace and components have been designed to provide
high performance relative to the types of applications
run on each system.
The ability of a system to perform work can be
defined two ways:
    Performance - How fast can the work be accomplished
    Throughput - How much work can be done in X
    amount of time
These concepts are applicable to the two major types
of workloads:
    On-line - requiring fast response
    Batch - requiring high degree of overlap
Each area demands a specific type of design, yet most
computer users have a mixture of both on their systems.
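The difference between the two definitions can be
illustrated with a short Python sketch. The job size and
the two machine configurations below are hypothetical
values chosen only to make the arithmetic easy to follow.
They are not measurements of any system in this study,
and multiprocessor overhead and I/O time are ignored.

```python
def seconds_per_job(job_instructions, mips):
    """Performance view: CPU seconds one job needs on one processor."""
    return job_instructions / (mips * 1_000_000)


def jobs_per_hour(job_instructions, mips, processors):
    """Throughput view: jobs finished per hour when every
    processor works through its own stream of identical jobs."""
    return processors * 3600.0 / seconds_per_job(job_instructions, mips)


JOB_INSTRUCTIONS = 600_000_000  # hypothetical job size

# Hypothetical configurations: one fast processor vs. two slower ones.
configs = {
    "fast uniprocessor": {"mips": 12.0, "processors": 1},
    "slower dual processor": {"mips": 8.0, "processors": 2},
}

for name, cfg in configs.items():
    t = seconds_per_job(JOB_INSTRUCTIONS, cfg["mips"])
    n = jobs_per_hour(JOB_INSTRUCTIONS, cfg["mips"], cfg["processors"])
    print(f"{name}: {t:.0f} s per job, {n:.0f} jobs per hour")
```

Under these assumptions the uniprocessor finishes each job
sooner (50 s against 75 s), while the dual processor
completes more jobs per hour (96 against 72).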
Specific performance in a commercial environment is
generally measured in a benchmark, where a characteristic
workload is run and measured on a particular computer
system and re-run and measured on another system for
comparison. Likewise the same procedure can be done
within a family of systems to determine the improvement
to be gained in upgrading from one system to another.
The applications stated here are based on
manufacturers' claims and industry standards, along with
conclusions based on information provided on internal
processor organization and features. Each processor
will perform differently on different workloads, but
general trends can be seen based on architectural
implementations.
Amdahl 
The Amdahl 580 series of processors is designed
around a high MIP rate (13 MIPs) uniprocessor and
independent I/O processors. Relative to other systems, it
is the fastest commercial uniprocessor available. Because
of the high MIP rate, the 580 lends itself very well to
on-line applications, where speed of turnaround of a
transaction is of utmost importance.
Generally, in dealing with an on-line application,
there is a single server queue mechanism, such as a
control region or program, dispatching work. The speed
of the on-line system is dependent on the speed of this
control region. Since the 580 is a fast uniprocessor,
the control region can then serve more items per unit
of time than any other processor, thereby providing
the fast on-line response time.
With regard to batch processing, the 580 provides
fast turnaround of batch jobs due to a high degree of
performance, that is, jobs will execute in less CPU
time due to a high MIP rate. The effort placed on I/O
processors supports both batch and on-line applications.
Amdahl has microcoded the execution unit for
flexibility at the expense of performance of some
instructions. Where other manufacturers have hardwired
floating point instructions, Amdahl has utilized
microcode at the expense of speed. Therefore, the
580 does not perform as well in a highly computational
environment where floating point calculations exist.
Since the 580 was not designed for this marketplace,
this has not affected Amdahl sales.
CDC 
The CDC 875 machine has been designed for the
scientific computing facet of the commercial data
processing community. The hard-wired and specialized
functional units provide high speed fixed and floating
point calculations and the word size and register usage
provide for a high degree of accuracy in computations.
With its peripheral processors, the 875 provides good
I/O overlap of CPU processing, and since I/O does not
interrupt the system, more productive cycle time is
achieved.
IBM
The IBM 308x processors have been designed for
throughput. Since the maximum speed of a single
processor is 7.5 MIPs, high speed units are made from
tightly coupled dual (dyadic) processors, providing
13 MIPs. Since 2 jobs can be executing simultaneously,
a high degree of batch throughput is achieved.
Independent I/O processors also help to improve
throughput. With regard to on-line applications, the
relatively slow MIP rate will tend to limit the speed
of transaction turnaround.
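The effect of single-processor speed on transaction
turnaround can be sketched by treating the control region
as a single-server (M/M/1) queue. The queueing model, the
instructions-per-transaction figure and the arrival rate
are assumptions introduced here for illustration. Only the
13 MIPs and 7.5 MIPs rates come from the text.

```python
def mm1_response_time(mips, instr_per_txn, arrivals_per_sec):
    """Mean response time in seconds for an M/M/1 queue whose
    server is a single processor running at `mips`."""
    service_rate = mips * 1_000_000 / instr_per_txn  # transactions/second
    if arrivals_per_sec >= service_rate:
        raise ValueError("arrival rate saturates the server")
    return 1.0 / (service_rate - arrivals_per_sec)


INSTR_PER_TXN = 500_000   # hypothetical path length of one transaction
ARRIVALS_PER_SEC = 10.0   # hypothetical transaction arrival rate

for label, mips in [("13 MIPs uniprocessor", 13.0),
                    ("7.5 MIPs single processor", 7.5)]:
    t = mm1_response_time(mips, INSTR_PER_TXN, ARRIVALS_PER_SEC)
    print(f"{label}: {t * 1000:.1f} ms mean response")
```

With these assumed values the mean response time is
62.5 ms at 13 MIPs and 200.0 ms at 7.5 MIPs. A second
processor does not shorten this figure when the control
region itself runs on only one processor at a time.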
IBM has microcoded all instructions except floating
point, which are hard-wired. Therefore, the IBM 308x
does very well in a scientific environment where
multiple computation-intensive programs must run.
This also carries over into the commercial batch
environment, providing a high degree of concurrency
in batch processing.
Sperry Univac
The Univac 1100/90 series provides a high degree of
redundancy at a moderate 6.5 MIP rate per processor.
Multiple CPU configurations provide for a high throughput
level, much the same as IBM. Since the 1100/90 system
can easily be partitioned, processors can be dedicated to
tasks to provide a good overall throughput
ratio. The independence of the I/O processors
contributes to the throughput. The partitioning aspect
lends itself to ease of testing of new applications,
systems software, and maintenance/repair. It has
obviously been designed for flexibility.
Futures
Each manufacturer described has ultimately the same
goals in mind, to provide the highest performance, most
reliable processor available. With the large software
and peripheral investment each customer has made, the
changes made by the manufacturer must be evolutionary
rather than revolutionary. Within the next 5 to 10
years, computer manufacturers will concentrate on
technology and its adaptability to reliability,
serviceability, and performance.
Basic technological improvements will come in two
categories, semiconductor technology and packaging
technology. With regard to semiconductor improvements,
the use of other transistor technology (Figure 1)
will provide the needed lower gate switching delays.
Gallium Arsenide (GaAs), currently under development,
provides a 4x decrease in signal delay time compared with
silicon, the most commonly used medium. TTL will most
likely be phased out since it is a 1960's technology and
its speed cannot compete with ECL. Josephson Junctions,
though very fast and low in power consumption, will not
be practical until 1990-1995 due to logistical problems
with cooling, serviceability, and manufacture. Vendors
will seek to reduce the delays per circuit with less
regard to heat dissipation, since it is easier to cool
circuitry than it is to increase its speed.
With regard to packaging technology, the aim in
the future will be to reduce the communication time
between circuits and between chips. By placing multiple
chips in a sealed package, chip to chip connections can
be made without wiring, reducing signal delays. Since
a ceramic substance slows down electrical signals,
ceramic will be used only to hold a larger, more dense
chip, perhaps a multi-layered cube of circuits.
Memory systems will be improved but not at the
expense of cost. Cost per bit has been and will remain
the dominant factor in memory technology. Chips
exceeding 256K bits have been developed but yields are
not high enough to warrant large manufacturing facilities.
By 1985-1990, 256K chips should be commercially available.
Manufacturers expect the cycle time of RAM chips to
be in the 2-3 ns range, providing very fast memories
and ultimately affecting the cycle time of the
processor.
Input/Output systems will be improved by taking
advantage of technology developed by the communications
industry, that is, the use of fiber optics for
peripheral connections to the mainframe processor. Not
only will this provide for faster data transfer, but more
reliable and less bulky equipment will be required.
The effect of all these technological changes on the
computers of 1985-1990 will probably be somewhere near
the following figures:
    Processors - 30 MIP uniprocessors will be
    available, and can be configured into 2 and
    4 way multiprocessor systems.
    Memory - Memory size will be an average of
    200-300 MB, fabricated from high density 256K
    chips. High speed buffer speeds will reach
    2-3 ns, lowering the machine cycle time.
    Input/Output - Fiber optics will be used to
    provide for 20 MB/second channel rates, a 7x
    improvement over current rates, with higher
    accuracy and fewer design constraints.
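Two of the projections above can be checked with simple
arithmetic, sketched here in Python. The sketch assumes
that "MB" means 2^20 bytes and "256K" means 256 x 1024
bits, and it ignores parity and error-correction bits.
These conventions are assumptions, not statements from
the manufacturers.

```python
KBIT = 1024            # bits, assumed meaning of "K" for chip density
MB = 1024 * 1024       # bytes, assumed meaning of "MB"


def chips_needed(memory_mb, bits_per_chip=256 * KBIT):
    """Number of data chips for `memory_mb` megabytes of storage,
    ignoring parity and ECC bits."""
    total_bits = memory_mb * MB * 8
    return -(-total_bits // bits_per_chip)  # ceiling division


def implied_current_rate(future_mb_per_sec=20.0, improvement=7.0):
    """Channel rate implied for current systems by a 7x improvement."""
    return future_mb_per_sec / improvement


print(chips_needed(200), chips_needed(300))   # 6400 9600
print(round(implied_current_rate(), 2))       # 2.86
```

Under these assumptions a 200-300 MB memory needs roughly
6,400 to 9,600 data chips, and the 7x figure implies a
current channel rate of a little under 3 MB/second.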
The processors of the 1985-1990 era will, as a
result of these technological changes, be physically
smaller, somewhat compatible with current systems
architecture, inherently more reliable, and easier to
service due to a reduced number of physical components
that are field replaceable. Each manufacturer must
anticipate the advantages of technological advances
and evaluate them as to applicability to their
processors, and use them to provide machines that meet
the needs of the data processing community.
List of References
Amdahl Corporation. 580 Technical Introduction, 1982.
Borgerson, B.R. and others. "The Evolution of the
    Sperry Univac 1100 Series: A History, Analysis,
    and Projection," Communications of the ACM,
    Volume 21 No. 1, 1978.
Control Data Corporation. CDC Cyber 170 Computer Systems
    Models 865 and 875 Hardware Reference Manual, 1982.
Control Data Corporation. Cyber 170 Series 800 Computer
    Systems: Architectural Overview, 1982.
Credeur, K.R. Comparisons of Some Large Scientific
    Computers. NASA N82-11817/5, 1981.
Datapro Research Corporation. Amdahl 580 System. 1982.
Datapro Research Corporation. Control Data Cyber 170
    Series 800. 1982.
Hamacher, V.C. and others. Computer Organization. New York:
    McGraw Hill, 1978.
IBM Journal of Research and Development. Vol. 26 No. 1,
    January 1982.
IBM Corporation. IBM 3083 Functional Characteristics. 1982.
Reddi, S.S. and Feustel, E.A. "A Conceptual Framework for
    Computer Architecture," ACM Computing Surveys Vol. 8
    No. 2, 1976, pp. 277-300.
Satyanarayanan, M. Multiprocessors: A Comparative Study.
    Englewood, N.J.: Prentice Hall, 1980.
Scannell, T. "Univac Caps Large-Scale 1100 Line,"
    Computerworld, (July 19, 1982).
Siewiorek, Daniel P. and others. Computer Structures:
    Principles and Examples. New York: McGraw Hill, 1982.
Sperry Univac. Sperry Univac 1100/90 Facts and Figures.
    1982.
Wu, Lin C. VLSI and Mainframe Computers. IEEE COMPCON
    78 Spring, San Francisco, 1978.
Vita
