Ramrod: an experimental multi-microprocessor by Rabinowitz, Alan Errol
RAMROD: AN EXPERIMENTAL
MULTI-MICROPROCESSOR
Alan Errol Rabinowitz
A Thesis Submitted to the Faculty of Engineering,
University of the Witwatersrand, Johannesburg,
for the Degree of Doctor of Philosophy.
Johannesburg 1982.

DECLARATION
I hereby declare that this thesis is my own work
and that it has not previously been submitted for
a degree at any other university.
Alan Errol Rabinowitz
23rd Day of December, 1982
PAGE (iii)
ABSTRACT
The computer architect of the 80's faces an apparently
intractable dilemma: Computer manufacturers have to contend
with the soaring costs incurred in producing custom-made
chips, and would prefer to use commercially-available,
state-of-the-art, large-scale integrated circuits. Product
users, however, demand highly reliable, realistically-priced
systems which are nevertheless flexible enough to meet
changing needs.
It is generally accepted that to be reliable and flexible a
system should be conceptually simple and inherently
fault-tolerant. Further, accepting the necessity for
maintenance, it becomes clear that the architecture should
be totally modular both for hardware and for software.
This thesis is an attempt to reconcile these seemingly
conflicting demands. An architecture is proposed, based on
the frequently-used principle of closely-coupled
multiprocessors, which avoids the pitfalls of
over-complexity and too-heavy software dependence.
PAGE (iv)
The proposed system is inherently simple, making use of a
single, high-speed time-division multiplexed bus to provide
for communication between processors and memory. Software
complexity is reduced by adopting a distributed,
hardware-oriented operating system. Simplicity is enhanced
by the use of a unified memory structure, whereby the user
may freely allocate local or global memory, or a mixture of
both.
Of importance is the use throughout of
commercially-available, large-scale integrated circuits.
This is particularly relevant as the work was undertaken in
isolation from the centres of research into custom-made
microelectronics.
The author has developed the proposed system to prototype
level. The prototype has been subjected to a series of
performance evaluation tests, and the results obtained
prove the viability of the technique adopted, and
demonstrate its promise for the future.
PAGE (v)
ACKNOWLEDGEMENTS
The author wishes to express his sincere appreciation to the
following people without whose help this thesis would not
have been written.
Professor M.G. Rodd, his supervisor, Head of the
Department of Electrical Engineering, University of the
Witwatersrand, for his guidance, enthusiasm, interest and
unselfish help in the writing of this thesis, for advice on
the solution of technical problems and for the opportunity
to be able to 'mirror' his thoughts.
Susan, his wife, for her patience, tolerance, enthusiasm
and encouragement throughout his 'career as a student' and
her help in preparing the diagrams.
Riva Rachel, Aviva Esther and Yona Mordechai, his children,
who helped provide a reason for completing the work.
Ralph and Anne Rabinowitz, his parents, for their material 
and moral support.
The technicians of the Department of Electrical Engineering 
for their contribution to the technical work and 'repairs' 
done to the project.
PAGE (vi)
Gus Finucci and Johann Lambrechs, his colleagues, for their
help through tight spots.
Sue Rood for the task of making the thesis legible and
intelligible.
Finally the author wishes to thank the University of the
Witwatersrand for the use of equipment, the CSIR for
providing a grant for purchasing components, and Perseus for
providing the author with a Research Fellowship.
PAGE (vii)
GLOSSARY OF TERMS AND ABBREVIATIONS
CCU      Computer Control Unit
EPROM    Electrically Programmable Read Only Memory
TTL      Transistor-Transistor Logic
CPA      Central Processor Array
TDM      Time-Division Multiplex
CPU      Central Processing Unit
ECL      Emitter Coupled Logic
ALU      Arithmetic and Logic Unit
RAM      Random Access Memory
MOS      Metal Oxide Silicon
IC       Integrated Circuit
DC       Direct Current
VDU      Visual Display Unit
ns       nano-seconds
mA       milli-amps (1/1000 amps)
V        Volts
nano     10**-9
milli    10**-3
micro    10**-6
kilo     10**3
mbytes   mega bytes
mega     10**6
VLSI     Very Large Scale Integrated
PAGE (viii)
CONTENTS                                                PAGE
ABSTRACT .............................................. (iii)
ACKNOWLEDGEMENTS ...................................... (v)
GLOSSARY OF TERMS AND ABBREVIATIONS ................... (vii)
LIST OF GRAPHS, TABLES AND FIGURES .................... (xii)
1 INTRODUCTION ........................................ 1-1
  1.1 The Growing Demand for Computing Power .......... 1-1
  1.2 Distributed Control ............................. 1-3
  1.3 The Influence of Technology on Architecture ..... 1-15
  1.4 Software ........................................ 1-18
  1.5 Multiprocessors ................................. 1-20
  1.6 Ramrod: a Multiprocessor Architecture ........... 1-26
  1.7 Conclusion ...................................... 1-27
2 MULTIPROCESSORS ..................................... 2-1
  2.1 Multiprocessor Structures ....................... 2-1
  2.2 Interconnection Strategies ...................... 2-4
  2.3 Shared Memory ................................... 2-13
  2.4 Time-Division Multiplexed Bus ................... 2-14
  2.5 Supervisor Control .............................. 2-17
  2.6 An Overview of Ramrod ........................... 2-17
  2.7 Conclusion ...................................... 2-20
3 A REAL-TIME OPERATING SYSTEM FOR Ramrod ............. 3-1
  3.1 The Role of an Operating System ................. 3-1
  3.2 Multiprocessor Operating Systems ................ 3-3
  3.3 The use of the Operating System in Ramrod ....... 3-5
  3.4 Basic Structure of the Operating System ......... 3-7
  3.5 Inter-Task Communications ....................... 3-12
  3.6 User Task to Operating System Communication ..... 3-16
  3.7 Conclusion ...................................... 3-16
4 HARDWARE STRUCTURE .................................. 4-1
  4.1 System Overview ................................. 4-?
  4.2 Basic Structure of Ramrod ....................... 4-?
  4.3 Physical Construction ........................... 4-32
  4.4 Conclusion ...................................... 4-35
5 IMPLEMENTATION OF THE OPERATING SYSTEM .............. 5-1
  5.1 Operating System Kernel ......................... 5-2
  5.2 Local Operating System .......................... 5-5
  5.3 Conclusion ...................................... 5-?
6 SOFTWARE STRUCTURE .................................. 6-1
  6.1 Data Flow Approach .............................. 6-2
  6.2 Task Definition ................................. 6-6
  6.3 Inter-Task Communication ........................ 6-8
  6.4 Conclusion ...................................... 6-15
7 EVALUATION OF SYSTEM ................................ 7-1
  7.1 Practical Limitations ........................... 7-2
  7.2 Factors Influencing the Relative Comparison ..... 7-4
  7.3 Program used in Relative Comparison ............. 7-6
  7.4 Results ......................................... 7-?
  7.5 Conclusion ...................................... 7-12
8 CONCLUSION .......................................... 8-1
  8.1 Uniqueness of Ramrod ............................ 8-2
  8.2 Commercial Viability of Ramrod .................. 8-4
  8.3 Critical Analysis of Ramrod ..................... 8-5
  8.4 Future Enhancements ............................. 8-6
  8.5 Conclusion ...................................... 8-7
APPENDICES
  A Emitter Coupled Logic ............................. A-1
  B Microprogramming Bit-Slice Technology ............. B-1
  C The Modelling of a Circular Bus ................... C-1
  D Currently available Multiprocessors ............... D-1
  E Input/Output interfacing .......................... E-1
  F The Exoslice Development System ................... F-1
  G The Circuitry ..................................... G-1
  H Marketing Costs of Ramrod ......................... H-1
  I High Level Description of Programs ................ I-1
  J Reliability ....................................... J-1
REFERENCES ............................................ R-1
Index to Figures, Tables and Circuit Diagrams
Figures
No.   Title                                           PAGE
2.1   Typical Multiprocessor                          2-3
2.2   Shared-bus                                      2-5
2.3   Crossbar switch                                 2-9
2.4   Multiport Memory                                2-12
2.5   System Diagram                                  2-19
3.1   Slave Processor to memory segment Pairing       3-11
4.1   Ramrod block Diagram                            4-3
4.2   Timing on the Shared-Bus                        4-5
4.3   Circular Construction                           4-14
4.4   Microinstruction format                         4-23
4.5   Master Controller                               4-31
4.6   View of Ramrod                                  4-34
6.1   Data flow Instructions                          6-5
7.1   Operating System Sequence                       7-7
7.2   Execution Sequence                              7-9
A-1   ECL Structure                                   A-4
A-2   Series Gating                                   A-4
A-3   Collector Dotting                               A-5
A-4   10804 latch                                     A-5
B-1   Conventional and Microprogrammed Computers      B-3
B-2   Typical Microprogrammed Computer                B-5
C-1   Thevenin Equivalent of Driving gate             ?
C-2   Capacitor Resistor network                      C-15
D-1   Cyba-M                                          D-2
D-2   Siemens 4004/220, 230                           D-4
D-3   Siemens 201                                     D-6
D-4   C.mmp                                           D-8
D-5   The Banyan Multi-Microcomputer System           D-10
D-6   The Intel 432 System                            D-13
F-1   Microinstruction execution steps                F-7
J-1   Serial reliability                              J-5
J-2   Parallel Reliability                            J-5
J-3   Composite Reliability                           J-6
J-4   Ramrod's Reliability                            J-6
Circuit Diagrams
No.     Title                                         PAGE
G-1     Microprocessor Module                         G-2
G-2     ECL latch Module                              G-4
G-3     Memory Module                                 G-6
G-4     Control Board                                 G-8
G-5(a)  Central Processor Array                       G-10
G-5(b)  Input/Output                                  G-11
G-6(a)  Computer Control Unit                         G-13
G-6(b)  Pipeline Registers                            G-14
Graphs
No.   Title                                           PAGE
1     Failure Rate of Components                      1-8
2     Average task delay time as a
      function of task characteristics                1-21
C-1   Comparison between predicted and
      observed results                                C-7
C-2   Comparison of rising edges for
      various termination resistors                   C-9
C-3   Comparison of falling edges for
      various termination resistors                   C-10
C-4   Comparison of the voltage cross-section
      on the bus at various times for
      various termination resistors                   C-12
Tables
No.   Title                                           PAGE
1.1   Cost/Performance ratio                          1-4
1.2   Reliability of Components                       1-5
7.1   Execution Times                                 7-11
CHAPTER 1
INTRODUCTION
"And I directed my heart to know wisdom, and to know madness
and folly, but I have perceived that this also is a torture
of the spirit. For where there is much wisdom there is much
vexation, and he that increaseth knowledge increaseth pain"
[Ecclesiastes i 17,18].
This thesis proposes a technique for interconnecting a large
number of microprocessors to form a simple, inexpensive but
efficient computer system. The system is inherently modular,
thus enhancing reliability, maintainability, and
testability.
1.1 The Growing Demand for Computing Power
In order to cope with the rapid advance of technology and
the ever-increasing demands of society, particularly in
respect of automation, there is a need for the provision of
more computing power at lower cost. One need only look
at fields such as those mentioned below to see that the
'supercomputer' is very much in demand.
Short-range weather forecasting requires accurate and
highly complex weather modelling. Computer assisted
tomography, which involves high-speed signal processing
and imaging, as well as the modelling of organs such as the
heart, needs advanced equipment for computing at speeds
approaching 100 million floating point operations
(megaflops) per second [Sl.\]. Nuclear fusion researchers
could use a computer 100 times faster than any existing
machine for modelling the plasma instabilities of proposed
fusion power generators [SPG 8(A j.
One of the world's most complex undertakings in the past two
decades has been the USA Department of Defense (DOD)
Ballistic Missile Program. A critical part of the large
research and development investment in this program has
been the effort to develop data processing hardware and
software technologies to meet the computational challenges
of this complex problem. The Ballistic Missile Program
needs a computing system that will deliver a throughput of
hundreds of megaflops, with a high degree of
confidence that correct execution will occur. This
challenges even the most advanced technologists [DAV 80].
PAGE 1-3
The computer engineer, who takes on himself the burden of 
designing such a machine faces a great challenge. He must 
bear in mind that a computer is ultimately designed for the 
end user, and it is the user's evaluation that counts, as it 
is he who will be in the most intimate relationship with the 
computer. The computer, therefore, has to be user 
acceptable in terms of reliability, maintainability and 
safety.
1.2 Distributed Control
Amongst the many criteria which determine the choice of a
particular design is that of overall cost. A feel for this
criterion may be obtained from table 1.1. In this, Sugarman
compares the processor cost/performance for a particular
sample problem requiring 83800 flops for each iteration. It
can be seen that the AP-120B peripheral array processor is 7
times as cost-effective as its nearest rival, the CRAY-1
supercomputer [SUG 80].
PAGE 1-4
Machine        Mflops    $/flop    Installation
AP-120B          5.9       .03        .15
CRAY-1          38.4       .21       8
STAR-100        16.8       .48       8
VAX 11/780        .26      .77        .2
CDC 7600         3.3       .91       3
ILLIAC IV        9.1      1.1       10
CDC 6600          .63     1.59       1
IBM 370/165       .87     2.3        2
TABLE 1.1 COST/PERFORMANCE RATIO [SUG 80]
(A MegaFlop (Mflop) is a million floating point operations.)
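The cost-effectiveness claim drawn from table 1.1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary and the function names `relative_cost` and `ranking` are assumptions introduced here, and the $/flop figures are simply those quoted from [SUG 80] in the table.

```python
# $/flop figures as quoted in table 1.1 [SUG 80].
DOLLARS_PER_FLOP = {
    "AP-120B": 0.03,
    "CRAY-1": 0.21,
    "STAR-100": 0.48,
    "VAX 11/780": 0.77,
    "CDC 7600": 0.91,
    "ILLIAC IV": 1.1,
    "CDC 6600": 1.59,
    "IBM 370/165": 2.3,
}


def relative_cost(machine, baseline="AP-120B"):
    """Return how many times more a flop costs on `machine` than on `baseline`."""
    return DOLLARS_PER_FLOP[machine] / DOLLARS_PER_FLOP[baseline]


def ranking():
    """Return the machine names ordered from cheapest to dearest per flop."""
    return sorted(DOLLARS_PER_FLOP, key=DOLLARS_PER_FLOP.get)


if __name__ == "__main__":
    for name in ranking():
        print(f"{name:12s} {relative_cost(name):6.1f} x the AP-120B cost per flop")
```

Run as a script, this lists each machine's cost per flop as a multiple of the AP-120B figure; the CRAY-1 entry corresponds to the factor of 7 mentioned in the text.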
PAGE 1-5
Of interest from the above comparisons is the observation
that parallel processors, which are
cheaper than supercomputers, can be used in situations such
as those mentioned previously.
1.2.1 Reliability
An essential trade-off to be considered in computer design
is the complexity of the computer versus the power (or
throughput). It is common knowledge that the more powerful
a computer is, the more complex it becomes [SUG 81]. It is
also common knowledge that complex electronics, unless
highly integrated, becomes increasingly unreliable and
costly. This is easy to explain: from table 1.2 it can be
seen (a) that the reliability of a computer board decreases
40-fold when compared to the reliability of a single
integrated circuit because of the increase in components and
complexity, and (b) that the reliability decreases
dramatically as the number of components per system
increases.
PAGE 1-6
Component                          Order of Magnitude of FITs
Transistor                         10 to 100
LSI component                      100 to 1000
Solder connection                  2 to 20
Switch (per contact)               30 to 300
Plug connection (per contact)      30 to 300
1 Board computer (25 chips)        4000 to 40000
TABLE 1.2 Reliability of Components [KOP 81]
(1 FIT = 1 failure in 10**9 hours, i.e. 115000 years)
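The FIT figures in table 1.2 can be turned into more familiar quantities with a short calculation. The sketch below assumes constant, independent failure rates (so that the rates of parts in series simply add); the function names and the component counts in the usage note are hypothetical and are not taken from [KOP 81].

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def fit_to_mtbf_hours(fit):
    """Mean time between failures, in hours, for a failure rate given in FIT."""
    return 1e9 / fit


def fit_to_mtbf_years(fit):
    """Mean time between failures, in years, for a failure rate given in FIT."""
    return fit_to_mtbf_hours(fit) / HOURS_PER_YEAR


def series_fit(parts):
    """Total FIT of a system that fails as soon as any one part fails.

    `parts` is an iterable of (fit_per_item, item_count) pairs.
    """
    return sum(fit * count for fit, count in parts)


if __name__ == "__main__":
    print(f"1 FIT is one failure in about {fit_to_mtbf_years(1):.0f} years")
    # Hypothetical board: 25 LSI chips at 100 FIT and 500 solder joints at 2 FIT.
    board = series_fit([(100, 25), (2, 500)])
    print(f"board: {board} FIT, MTBF about {fit_to_mtbf_years(board):.1f} years")
```

With 8760 hours to the year, 1 FIT works out at roughly 114000 years, which agrees with the rounded figure in the table note. The hypothetical board shows how quickly the total failure rate grows once many parts and connections are combined.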
PAGE 1-7
Graph 1 shows the characteristic failure-rate curve for an
electronic device. Early failures, such as infant mortality
or burn-in failures, occur at a high initial rate which
decreases when the weak units have died out. The useful life
period, which is the most important period because it is the
key to reliability prediction, is followed by the wearout
period. Wearout failure results from degradation of the
strength of a device and exposure to the environment [DOY].
GRAPH 1 FAILURE RATE OF COMPONENTS
(failure rate against time; regions marked: infant mortality, useful life)
PAGE 1-5
The society in which we live is becoming very
safety-conscious, and increasingly dependent on computers.
Therefore a computer which has the function of, say,
controlling production machines, or on which we rely for the
handling of critical data, has to be extremely reliable.
One need only look at what happened at Three Mile Island.
The nuclear reactor at Three Mile Island was controlled by
hardwired logic and many small computers. There was no
central controlling facility nor was there communication
between the distributed control points. At the moment of
crisis the operators were faced with many indicator lights
and perhaps, if there had been an interconnection of the
control points, the near-disaster might not have occurred.
For this to be available, clearly a highly reliable
computing system is vital. NASA, too, has an aircraft
energy efficiency research program needing ultra-reliable
computers that would counteract faults automatically. The
aircraft flies very close to the limits of stability and
therefore a computer with a fast response time is needed
(rather than a human) to control it. The probability of a
computer failure during a flight must be less than the
probability of mechanical structure failure during the same
period. Thus an ultra-reliable and fast computer is needed.
PAGE 1-10
Reliability is normally defined as the "probability that a
system will function within the specified limits for at
least a specified time under specific environmental
conditions" [KOP 81]. It is concerned with all the parts of
the system (hardware, software, printed circuit boards,
etc.), their interaction, the interconnection mechanisms
between the various parts and finally, naturally, depends on
the mechanical construction.
In striving to achieve a high degree of software 
reliability, problems such as software validation are 
encountered. Present techniques are inadequate for 
evaluating the reliability of software, and perhaps the only 
way of checking software is by exhaustive testing [LAM].
Bernhard maintains that system validation problems are 
primarily related to software, and that no guidelines exist 
for determining software reliability [BERC]. Making the 
software simple and well defined can help in solving these 
problems, but the programmer can never claim with total 
certainty that his program is error-free (see 1.4).
PAGE 1-11
The reliability of a computer system requires a thorough
investigation. Reliability involves both software and
hardware and it was decided to limit the study of
'reliability' in this thesis primarily to that of hardware
(appendix J). Software reliability is dealt with on the
"Keep it simple and well-defined" precept.
One of the most critical factors influencing the reliability
of a computer system is the interconnection structure of the
system. This is because, although the reliability of the
individual components can be maximized, the overall
reliability of the system will be related to the component
interconnections, which are not usually duplicated and are
inherently unreliable (being mainly mechanical in nature).
In practice, there are various techniques available for
attaining a high degree of reliability. One can
achieve reliability through either the use of inherently
highly-reliable components or through the introduction of
redundancy. (Redundancy here implies that the system has
more resources than are absolutely necessary for its
operation.) Highly reliable systems have been described as
"systems with a structure independent of any critical
resource that has a relatively high failure rate."
PAGE 1-12
The cost of increasing the reliability of components, as
achieved during manufacture, is very high. Therefore
fault-tolerance is normally adopted on the premise that it
is more economical to build redundant systems than to strive
for ever more reliable components. A fault-tolerant
system is one which can survive multiple faults that would
normally bring a conventional computer to a halt [STI].
Of importance in a redundant system is the ability to detect
an error. Error detection presupposes that the result of a
step in a program can be related to an acceptance criterion.
In a system with redundancy, additional resources typically
are used to form an error detection module which may be
separated from the actual active processing modules [KOP
81].
A key issue in fault-tolerant design is the size of the
unit that is to be replaced in the event of a failure: the
Smallest Replaceable Unit (SRU). The SRU
is generally visualised as the unit which is removed by the
service engineer once he has localised a fault to a
particular unit, which is then replaced by an identical one.
It is also clear that an SRU must be testable; specifically
this requires it to have well defined interfaces [KOP 81]. The
SRU could be a resistor or transistor at one extreme or a
complete board at the other. Since the costs of electronic
components are steadily decreasing it becomes economically
PAGE 1-13
feasible to think in terms of a complete board as the SRU.
From the above it may be concluded that a well-structured
computer should have inherent fault-tolerance
built into its architecture, by having redundant components.
By adopting such a design, however, it would seem that
reliability is achieved at the expense of simplicity. This
thesis discusses an architecture for a computer system that
is reliable, partially fault-tolerant and (of importance)
simple and well-structured.
1.2.2 Maintainability
The user of a system is primarily concerned with the
availability of the system for his use. Availability is a
function of the Mean Time Between Failure (MTBF) and the
Mean Time To Repair (MTTR). As a failure occurs, the faulty
module is replaced by the service engineer and the user can
then carry on operating the machine as if nothing had
happened. Provided the principle of fault-tolerance is
adopted, however, during the diagnosis and repair time the
user will simply experience a slight drop in performance.
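The dependence of availability on MTBF and MTTR can be made concrete with the usual steady-state relation, availability = MTBF / (MTBF + MTTR). This relation is a standard textbook form and is assumed here; it is not quoted from this section of the thesis. The function names and the figures in the example are likewise hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the system is usable."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours, mttr_hours, hours_per_year=8760):
    """Expected hours per year during which the system is under repair."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * hours_per_year


if __name__ == "__main__":
    # Hypothetical figures: one failure every 999 hours, one hour to swap a board.
    print(f"availability: {availability(999, 1):.3f}")
    print(f"downtime: {downtime_hours_per_year(999, 1):.2f} hours per year")
```

The sketch shows why a short repair time matters as much as a long time between failures: halving the MTTR improves availability by about as much as doubling the MTBF.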
PAGE 1-14
There is clearly a trade-off between maintainability and
reliability, both being linked to the availability of the
system (see appendix J). Maintainability, which can be
defined as the probability of repair in a given time,
implies that the system must be modular. If the SRU is
extracted for repair, the system must be able to tolerate
this removal and recover once the module is re-inserted.
A module is characterised by the function it performs. It
is essentially a 'black box' which transforms a set of
inputs to a set of outputs. In designing systems using
modules the designer assumes that other modules, except the
one on which he is working, work to specification. Testing
is done in a similarly easy fashion.
Of importance too, is the practical realisation that once a 
computer system has been installed, the user inevitably 
needs to increase its capacity! Enhancement of a computer 
system can be achieved much more readily in a 
well-structured, modular design.
PAGE 1-15
1.3 The Influence of Technology on Architecture
The advance of technology is sometimes too rapid for the
system designer, in that by the time his design is
functional there may be newer and more powerful components
available which might more easily accomplish his required
tasks. This problem is never more apparent than in the
world of electronics, and particularly the digital area,
where the pace of technological innovation is staggering.
The electronic designer has three approaches available to
him when utilizing state-of-the-art digital hardware. These
may be summarized as follows:-
1. The use of Custom Designed Integrated Circuits.
The engineer designs highly complicated integrated 
circuits from the transistor junction level
normally using the support of a Computer Aided 
Design system (CAD). These components are then 
fabricated especially to meet the required
function. Clearly cost is a problem unless volume
is high (typically > 10000 units).
2. The use of Readily available VLSI. The engineer
attempts to utilise integrated circuits which have 
already been manufactured and which perform
specific functions.
PAGE 1-16
3. The use of Semi-Custom Logic (e.g. a Logic Array).
In this technique the integrated circuit
manufacturer produces a chip which is complete from
the semiconductor point-of-view, but which lacks
the final interconnection of the various logic
functions that are performed by the semiconductor
junctions. Thus the designer of a circuit
typically has two to three thousand logic gates
available for his design. Using CAD techniques, he
then creates a system using only the types of
components available on the particular array chip
in which he is interested. Once again, using CAD
facilities, the designer optimises the
interconnection of the gates to give himself a
system which meets his requirements. The final
interconnection of the components (the metallisation
process) is relatively cheap, and the approach
is cost-effective for a medium level of production
[ROD 82].
In addition to the points mentioned above, a fundamental
premise in design is that a designer should strive to
utilise state-of-the-art technology; this, of course, is in
itself a situation requiring much thought. A case in point
is the Josephson Junction. Conference papers continue to be
delivered on this technology but the scientific world still
waits for a commercial computer based on Josephson
Junctions.
PAGE 1-17
Josephson devices, which are based on superconductivity and
tunnelling, are very attractive for ultra-high-performance
computers. They are extremely fast-switching (<10
picoseconds) and have extremely low power dissipation (<500
nanowatts per circuit). However, they have to operate near
Absolute Zero (-270 deg C) in order to
function according to their specifications. This temperature
requirement causes undue environmental complexity as well as
additional costs for refrigeration, and inconvenience in
system debugging and servicing [ANA 80].
Therefore even though Josephson Junctions are undoubtedly
superior in most aspects to any other logic family
available, there is a natural reluctance amongst computer
designers to use this technology until it has been
proven and tested.
An important factor which has to be considered is the local 
situation. As the work for this thesis took place in
relative isolation from the centres where electronic 
technological advances are normally made, the decision was 
made to design a completely modular system based on locally 
available technology. This ruled out the use of Custom 
designed circuits, as well as that of logic arrays - this 
latter industry being still in its infancy in South
PAGE 1-18
Africa[NOV]. However the majority of leading Integrated 
Circuit Producers are represented in the country, and thus 
the bulk of commercially available components could be 
considered.
Finally, from a maintenance point-of-view the approach
adopted appears to have much merit. One has always to
ensure that the local maintenance personnel can cope with
the technology they are servicing, and that replacement
components are readily available.
1.4 Software
The complicated aspect of software reliability has not been 
dealt with in detail, as it is beyond the immediate scope of 
this particular investigation.
However, a few general guidelines which should be adhered to
in attempting to produce reliable software have formed the
basis of all software developed in this project. These are
as follows:-
1. The specification of the program should be kept
simple and accurate and must be well documented to
allow non-technical persons to understand the
software.
PAGE 1-19
2. The software must be well designed with clear
meaningful documentation, in order to reduce effort
in testing and maintenance.
3. The software must preferably be organised in a tree 
structure, in order to make reading and 
understanding easier.
4. The software should be written in a modular
structure with loose coupling between modules, so 
that any module can be extended, replaced or 
removed without affecting the other modules.
5. The software should be designed in a top-down
fashion, which first describes the problem in a
very high-level way, and then proceeds to give
lower levels of description until the level is
reached which contains definitions of indivisible
functions [LAM].
Because most of the operating system has been implemented in
hardware, and only a minimal amount of software (designed
using the above principles) is required to complement the
operating system, it is felt that this approach ensures a
relatively high degree of software reliability. However it
must be emphasised again that this aspect of reliability was
not considered in detail in this thesis.
1.5 Multiprocessors
Rodd has applied the theory of traffic movement through a
telephone exchange to the analysis of the performance of
multitasking industrial control computers and has produced a
graph (graph 2) which mirrors the expression derived for the
mean delay experienced by a task in a queue which can form
in a multitasking computer system [ROD 75]. This was done
in order to predict the performance of the system. As can
be seen, the delay time in the queue increases as the
average request rate increases. There are clearly various
ways to increase the throughput of a system, as may be
deduced from these curves:
PAGE 1-21
GRAPH 2 AVERAGE TASK DELAY TIME AS A FUNCTION
OF TASK CHARACTERISTICS
(vertical axis: mean delay in queue, in seconds)
PAGE 1-22
1. Make the task length shorter, i.e. simplify tasks
or increase speed of processing.
2. Decrease the average request rate, i.e. reduce the
demands made on the processor.
3. Make the computer faster, i.e. increase overall
operating speed (as well as achieving (1) above).
4. Increase the number of processors, i.e. 
decentralize the processing.
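The four options listed above can be compared numerically with a simple queueing model. The expression from [ROD 75] is not reproduced in this section, so the sketch below substitutes the textbook M/M/c queue (Erlang C formula) as a stand-in; that choice, the function name `mean_queue_delay` and all the numbers in the example are assumptions made purely for illustration.

```python
from math import factorial


def mean_queue_delay(request_rate, task_length, processors=1):
    """Mean time a task waits in the queue for an M/M/c system.

    request_rate: average task arrivals per second (Poisson arrivals assumed)
    task_length:  mean task execution time in seconds (exponential assumed)
    processors:   number of identical processors sharing one queue
    """
    offered_load = request_rate * task_length
    utilisation = offered_load / processors
    if utilisation >= 1.0:
        raise ValueError("queue grows without bound: utilisation >= 1")
    # Erlang C: probability that an arriving task has to wait.
    waiting_term = (offered_load ** processors / factorial(processors)) / (
        1.0 - utilisation
    )
    idle_terms = sum(offered_load ** k / factorial(k) for k in range(processors))
    p_wait = waiting_term / (idle_terms + waiting_term)
    service_rate = 1.0 / task_length
    return p_wait / (processors * service_rate - request_rate)


if __name__ == "__main__":
    print("baseline             ", mean_queue_delay(8, 0.1))
    print("1/3. shorter tasks   ", mean_queue_delay(8, 0.05))
    print("2. fewer requests    ", mean_queue_delay(4, 0.1))
    print("4. two processors    ", mean_queue_delay(8, 0.1, processors=2))
```

In this model options 1 and 3 both appear as a shorter mean task length, and the delay grows sharply as the utilisation approaches one, which matches the qualitative behaviour described for graph 2.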
Powerful, large-word-size mainframe computers and
supercomputers have made high-data-rate processing feasible,
but these systems are not economical for laboratory
environments, data acquisition, process plants or reduction
applications. Minicomputers on the other hand are
economical, but are technically unsatisfactory because of
their limited computing speed and smaller fixed-point
word size [ALE 81]. Therefore the microprocessor, which is
cheap, and which can be interconnected to form a powerful
multiprocessor computer, can fill the gap left by the two
other computer systems.
PAGE 1-23
Decentralisation of a computer system implies that there is
a distribution of intelligence (i.e. processors). As has
been seen in table 1.1, parallel processors (of which the
multiprocessor is one type) compare very well with
'supercomputers' on a cost/performance ratio. There is less
reliance on a centralised facility, and processors can be
added on a more flexible basis and in smaller increments.
Multiprocessors inherently rely heavily on parallelism to
enhance throughput and computation. With such a hardware
structure many elementary data routing and processing
functions can be implemented concurrently, improving total
processing speeds by 10 to 100 times over typical
minicomputers.
A multiprocessor architecture increases productivity through
parallel processing, and maximises the likelihood that a
processor will be available when it is requested. The
system can generally be tailored to user requirements in a
more flexible manner than in a centralised facility,
because each processor in the system can be used to perform
a separate function.
PAGE 1-24
A multiprocessor computer should also be inherently modular 
and therefore the cost of increasing its processing 
capability is smaller than that incurred when expanding a 
large computer. Redundancy at a hardware level is naturally 
easier in a multicomputer than in a monolithic central 
facility as extra modules (which are added on to take an 
active or passive part in the system) can take over the
execution of a task in the event of a processor failure.
As has been pointed out in appendix J, a multiprocessor 
system that has redundant units is ideally more reliable 
than a uniprocessor system. An additional factor to 
consider when designing a redundant modular system is that 
the system should 'gracefully degrade'. This idea is 
illustrated in the following example. An on-line airline 
booking system is a distributed computer with user terminals
in each booking office and with a centralised data base.
The failure of any terminal should not inhibit other users
from accessing the common data base. This is usually
referred to as "graceful degradation" in that failures will
cumulatively affect the overall system performance but not
cause immediate and total system failure.
In such a system reconfiguration is, however, necessary when 
a permanent error, like a processor failure, occurs. At the 
conception of a redundant system it has therefore to be 
decided at what level redundancy is to be implemented - at 
system level, subsystem level or at a component level (as in 
the discussion of the SRU above). Therefore it is logical to 
make the SRU (i.e. a complete board ) the redundant 
component as well.
From the previous sections it may be seen that the choice of 
components of a multiprocessor is critical. As mentioned in
1.2.1, the SRU should be a complete circuit board. A
microprocessor computer board will provide a convenient
basis for reconfiguration after an error and should
therefore be the SRU.
It can therefore be concluded that a multiprocessor computer
is a simpler alternative to a bigger computer in most
applications.
1.6 Ramrod: A Multiprocessor Architecture
Ramrod, as the multiprocessor structure developed in this 
thesis has been named, was designed using a master-slave 
approach as it was felt that there was a need to provide for 
supervision of the slave processors with respect to their 
intercommunication, execution of tasks and probable failure. 
This is of particular importance in an experimental system 
which Ramrod essentially is.
For this reason it was concluded that the master had to be
more sophisticated and more powerful than the actual
processors. Therefore the master was designed using
bit-slice technology and the instruction set was custom
built to suit the application (see 4.2), whilst the slave
processors were selected to be simple, single-board
computers. Using bit-slice technology for the master
implies that the designer has complete control over the
architecture and many features are therefore included to
provide it with properties inherent in operating systems.
1.7 Conclusion
Many inexpensive and relatively powerful single-board 
computers are currently available on the market and can 
therefore form the SRU of the multiprocessor system. In the 
event of a processor failure, the faulty processor board can 
be replaced by a working one, and redundancy achieved at the
same level. The multiprocessor system can have redundant
idle boards ready to take over should a processor
failure occur.
This approach to architecture is currently receiving much
attention: a leading German Computer Architect Wolfgang
Giloi maintains that "The distributed multiprocessor system 
is the only known architectural form that can satisfy high 
cost effectiveness, modular extensibility, fault tolerance 
and simplification of software production and maintenance 
simultaneously" [GIL BEHR].
Any multitasking computer system has its activities
co-ordinated via an operating system. In the case of a
multiprocessor the operating system may itself be
distributed with a part of its functions performed by the 
master processor and other parts by the various slave 
processors. This should result in a highly efficient
computer system as there is only partial reliance on the
master processor, and each processor shares in the execution
of the operating system [TRAKH]. Of interest is the
implication that the various component parts of the
operating system can themselves be executed truly in
parallel!
As will be shown in the next chapter, the preferred 
interconnection strategy for a multiprocessor is a shared 
bus in which the processors access common memory. As will 
be shown this is an optimal solution despite claims that a 
shared bus has serious bandwidth limitations.
Ramrod has such a shared-bus structure with a wide 
bandwidth, this having been achieved by a technique that 
appears to be novel. Ramrod has been designed, built and 
tested. The prototype, although suffering from certain 
timing problems, has been successfully evaluated and the 
methods used are shown to be viable. The result is a 
multiprocessor system which makes use of commercial 
well-understood computing elements and which is reliable, 
modular and easy to maintain.
CHAPTER 2
MULTIPROCESSORS AND AN INTRODUCTION TO RAMROD
Before dealing with the actual scheme adopted in Ramrod this
chapter will provide a general background to multiprocessors
and the various possible strategies which may be used. The
interconnection philosophy of Ramrod will then be discussed 
in this light.
2.1 Multiprocessor Structures
A multiprocessor typically has the following attributes:
1. The system contains two or more processors of 
comparable capabilities.
2. All processors share access to common memory, but 
may have local memory.
3. All processors share access to Input/Output 
channels, control units and peripheral devices.
4. The entire system is controlled by one
operating system. [ENS 74]
Inherent in this definition is the concept of a
multiprocessor system as a so-called 'tightly coupled'
distributed computer. This implies that the various
processors in the system are in close proximity to each
other and have access to a common memory and common
Input/Output system.
A typical multiprocessor will take the form shown in Figure 
2.1. Processors (P1-Pn) are connected to Memory Elements
(M1-Mn) or other peripheral devices. Thus communication
between the processors and resources (memory, I/O, peripherals)
takes place through an interconnection which is
often referred to as the Processor/Memory switch.
FIGURE 2.1 TYPICAL MULTIPROCESSOR
The following sections provide more detailed descriptions of 
the major interconnection technologies and their advantages 
and disadvantages. The shared bus, the cross bar switch and 
the multiport memory are compared in terms of cost, 
reliability, system throughput and transfer capacity.
Discussions of systems using these architectures are to be
found in appendix D.
2.2 Interconnection Strategies
2.2.1 Shared Bus
The simplest switch for a multiprocessor system is a common 
bus connecting the units as shown in Figure 2.2.
FIGURE 2.2 SHARED BUS
The shared bus can be centrally polled, i.e. the processors 
only transmit when selected by the controller. Bus 
contention is avoided by using schemes such as :
1. Fixed priorities, which allow processors with a
higher priority to gain access to the bus even if
another lower priority processor presently has
access.
2. First-in-first-out: the processor which first
made the request is granted access to the bus.
3. Daisy chaining: the processors are asked in turn
whether they have made a request, and only then can
a processor be given access to the bus.
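The three arbitration schemes listed above can be sketched in a few lines. This is only an illustrative model, not the arbitration logic of any particular machine: the function names, the priority table and the polling order are all assumptions introduced for the example.

```python
from collections import deque

def fixed_priority(requests, priority):
    """Grant the bus to the requester with the best rank.

    `priority` is a hypothetical table: processor id -> rank,
    where a lower rank means a higher priority.
    """
    if not requests:
        return None
    return min(requests, key=lambda p: priority[p])

def first_in_first_out(pending):
    """Grant the bus to the processor whose request arrived first."""
    return pending.popleft() if pending else None

def daisy_chain(requests, order):
    """Ask each processor in turn; the first one found requesting wins."""
    for p in order:
        if p in requests:
            return p
    return None

# Processors 0, 2 and 3 all want the bus in the same cycle.
requests = {0, 2, 3}
winner_priority = fixed_priority(requests, {0: 2, 1: 0, 2: 1, 3: 3})
winner_fifo = first_in_first_out(deque([3, 0, 2]))   # 3 asked first
winner_chain = daisy_chain(requests, order=[0, 1, 2, 3])
```

With the same set of requests the three schemes pick different winners, which is the practical difference between them: fixed priority can starve low-ranked processors, whereas first-in-first-out and daisy chaining depend only on arrival order and position in the chain respectively.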
On the other hand, the bus may be interrupt driven by the 
processors, which request bus usage. This scheme allows 
random usage of the bus. However, an interrupt system can
cause problems: if, during one processor's control of the
bus, a second processor requests the bus, access can be
granted to this second processor, and the first's data is
lost. On the other hand, should all interrupts be disabled
during a bus access then the requesting processors will have
to wait for access and processor idle time is increased.
The bus can be a Time-Division Multiplexed (TDM) bus, where
each processor is allocated a time slot, or it can be
Frequency-Division Multiplexed (FDM), in which each
processor has a particular transmit/receive frequency.
The shared bus is simple to design and construct, but has
bandwidth limitations inasmuch as the number of active or
passive units connected to it is limited [WEI 81]. This
reduction in bandwidth results because when more units are
connected to the bus, the bus is simply unable to keep up
with the increase in communication which accompanies the
addition of units [ZOC]. As there is only one path for all
data transfers, the total transfer rate within the system is
limited by the speed of access of devices onto the bus and 
the actual bus bandwidth.
The shared bus is usually connected to a common memory
(indeed so are the other strategies) and therefore memory
contention is also an obvious problem in that there is only
one bus and one access to the memory connected on the other
side, so there is a likelihood that two or more processors
will try to access the same area of memory simultaneously.
This problem can, however, be overcome by dividing the
memory into segments, and allowing only one processor at a
time to access a segment.
This scheme is the least costly in terms of the hardware 
used, and is also the least complex in terms of components 
as the bus can be totally passive. Modification is achieved 
simply by physically adding or removing functional units.
However, a single bus system is naturally unreliable in that
if the bus fails then a total system failure occurs.
2.2.2 Cross Bar Switch
The cross bar switch as shown in figure 2.3 has separate 
paths from the processors to each memory and I/O unit. The 
functional units (processors, memories and I/O) need not be 
concerned with the bus interface as the switch contains all
the control, switching and priority arbitration logic.
FIGURE 2.3 CROSSBAR SWITCH
This is the most complex interconnection scheme because the 
number of connections is necessarily large and because of 
the extra logic needed in the switch. The complexity grows
rapidly (with the product of the numbers of processors and
passive units) as the number of units becomes large.
Functional units, however, are simple and inexpensive 
because they do not need the extra logic to drive the 
interface and the potential exists for a high data transfer 
rate, since there is a separate path available to each unit.
Reliability is reasonable and can be improved by redundancy 
of the units. System efficiency is high because 
simultaneous transfers between processors and memory units 
can be accomplished.
Clearly the switching elements are the major drawback to 
such a scheme, but it must be pointed out that Intel is 
about to produce an LSI circuit with a large number of 
cross-bar switches for their new range of multi-processors 
[ENS 74]. This will obviously reduce the cost factor as well
as the complexity discussed above.
2.2.3 Multiport Memory
In a multiport memory system the control, switching and
priority arbitration are concentrated at the interface to
the passive units, and not in the switch as in the cross bar
scheme. Figure 2.4 shows that each processor has a separate 
port and bus connecting it to each memory and I/O unit.
FIGURE 2.4 MULTIPORT MEMORY
This approach is the most expensive, since multiport 
memories are costly and each memory has to have contention 
logic built in, in order to arbitrate between processors 
competing for the resource. High data transfer rates can be 
achieved, but expandability is difficult as more logic is 
required to increase the number of memory ports, or to share 
the existing ports amongst all the processors.
This scheme has its use in a system which has a limited 
number of processors, but clearly becomes unwieldy as soon 
as the number gets large.
2.3 Shared Memory
All of the interconnection strategies, previously mentioned, 
usually use shared memory which provides for a means of 
interaction between microprocessors. This interaction can 
be enhanced if there is no distinction between local memory 
of a single processor and global memory of the 
multiprocessor system. This is clearly a unique feature 
which has the advantage of being able to incorporate a truly 
distributed data base, since processors can address all the 
memory simultaneously. However care must be taken to ensure 
security of data.
2.4 Time-Division Multiplexed Bus
The Time Division Multiplexed (TDM) shared bus offers the
best capability of all the interconnections discussed as it
is simple, cheap, easy to implement and is a passive
interconnection. The apparent limitations of the TDM bus
are:
1. Memory Contention
2. Reduced Bandwidth
3. Bus Contention
Fortunately these can largely be overcome as outlined below.
2.4.1 Memory Contention
A shared bus, as mentioned previously, is usually associated 
with common memory and therefore memory contention can 
occur. A system that allows the programmer to use a range 
of program addresses which may be different from the range 
of physical memories available (known as virtual memory 
addressing) may circumvent memory contention.
2.4.2 Bandwidth Limitations
Bandwidth limitations can be overcome by using high-speed
technology to achieve a high data communication rate on the
shared bus. In addition if the speed of access to the bus
is increased then the overall bandwidth should also be 
increased.
2.4.3 Bus Contention
Examining the simple time-shared bus, it is found that 
contention occurs when several processors are making heavy 
use of the bus and when there are no mechanisms to resolve
this contention (which causes processor idle time). Therefore
a model of the shared bus can be made as a master/slave
relationship (where each slave runs a single user task and
the masters provide the requested service) in order to
consider the problem of contention.
Let the slave request rate be $L$ requests per second (a mean of
$1/L$ seconds between requests), and let slaves only
process when serviced by a master.
Let
$N$ = number of slaves
$N_{avg}$ = average number of active slaves
$M$ = number of masters
$M_{avg}$ = average number of masters
$\rho = L/U$, and $P_i$ = probability of $i$ slaves in queue
[...]$_{avg}$ = average number of total busy processors
$W_{avg}$ = average waiting time in queue
$s_m = 1/U$ = expected service time of requests
$w_m$ = expected waiting time in queue
It can be shown [TOO 78] that
[...]
$$W_{avg} = w_m + s_m = \frac{N - N_{avg}}{N_{avg} \, L}$$
This simply states that the average slave waiting time
increases as more slaves become idle ($N - N_{avg}$) while
waiting for service from masters.
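The trend can be checked numerically with a small sketch. It assumes, beyond what the text states, that requests and service times are exponentially distributed (a finite-source queue with N slaves and M masters), and it takes P[i] to be the probability that i slaves are waiting for or receiving service. The function name `slave_queue` is introduced here for illustration only.

```python
def slave_queue(N, M, L, U):
    """Steady state of N slaves served by M masters.

    L is the request rate of one active slave, U the service rate of
    one master. Exponential timing is an assumption of this sketch.
    Returns (P, N_avg, W_avg).
    """
    # Unnormalised probability of i slaves waiting for or receiving service.
    weights = [1.0]
    for i in range(1, N + 1):
        arrival = (N - (i - 1)) * L   # only active slaves issue requests
        service = min(i, M) * U       # at most M masters can be busy
        weights.append(weights[-1] * arrival / service)
    total = sum(weights)
    P = [w / total for w in weights]

    idle = sum(i * p for i, p in enumerate(P))  # average N - N_avg
    N_avg = N - idle
    W_avg = (N - N_avg) / (N_avg * L)           # relation quoted in the text
    return P, N_avg, W_avg

# One master, increasing numbers of slaves: the wait per request grows.
for n in (1, 2, 5):
    _, n_avg, w_avg = slave_queue(n, M=1, L=1.0, U=2.0)
    print(n, round(n_avg, 3), round(w_avg, 3))
```

With a single slave and a single master the formula reduces to the bare service time 1/U, since there is never anyone else in the queue; adding slaves without adding masters raises the wait.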
The same conclusion can be reached by examining the bus
utilization factor, which is the fraction of the time that a
particular processor will make use of the data bus during an 
instruction cycle.
This master/slave approach reinforces the need for some sort 
of control to supervise the allocation of memory to 
processors and the allocation of time slots to processors 
for execution of tasks. If this time slot (or bus 
utilization factor) is reduced then a significant increase 
in system throughput is achieved.
2.5 Supervisor Control
The key to the success of a master/slave multiprocessor
is the supervisory control.
Software has been shown to be less reliable than hardware
[KOP 81] and a large program can never really be proved
correct. Rod [ROD 76] has shown in his investigation that
the implementation of an operating system kernel in hardware
has much promise. The use of bit-slice technology offers
the designer a chance to design the architecture of the
controller to suit the application.
Therefore, a bit-slice master controller was designed to
contain several features of the kernel of an operating
system. This will be discussed at a later stage.
2.6 An Overview of Ramrod
Ramrod, of which a
simplified diagram is shown in figure 2.5, was designed with
the following features:
1. Distributed Operating System
2. TDM shared bus
3. Tightly-coupled slave processors
4. Common shared memory, with no distinction between
global and local memory.
5. Intelligent Input/Output Control
FIGURE 2.5 SYSTEM DIAGRAM (slave processors, memories, processor-to-memory bus, I/O bus, disk unit, printer, user consoles, operating system controller)
2.7 Conclusion
A multicomputer system with distributed control is
probably the only architectural form that has the potential
to satisfy all major architectural goals such as cost
effectiveness, modular extensibility, fault tolerance and
decomposition of software complexity [GIL BEHR]. The
multiprocessor is an interconnection of uniprocessors and it
is this interconnection scheme which forms the basis of this
thesis.
Ramrod is controlled
by a bit-slice processor which has the function of a system
supervisor. There are presently 5 slave processors each of
[...]
memory modules, which initially consist of 256 byte
segments, are interfaced to the processors via a time-shared
bus [...]
(ECL).
The I/O section has not been implemented in the prototype
and is the subject of later development (appendix H).
The following chapter describes the hardware structure
whereas chapter 4 discusses the software in more detail.
CHAPTER 3
A REAL-TIME OPERATING SYSTEM FOR RAMROD
"Then did I see in the whole work of GOD, that a man is 
not able to find out the work that is done under the 
sun, inasmuch as though a man were able to toil to seek 
for it he would not find it, and even if he were wise 
to think to know it, he would yet not be able to find 
it" [Eccl viii 17].
This chapter discusses aspects of real-time operating 
systems which must be considered when providing the 
supervisory control required by Ramrod. In order to 
meet the objectives of speed and reliability it was 
decided to place as much as possible of the operating 
system in hardware rather than in software. Finally in 
order to meet the criterion of reliability, it was 
decided to distribute the operating system as far as 
possible throughout Ramrod.
3.1 The Role of an Operating System
In general an operating system has the prime function
of transforming raw hardware into a system more
amenable to its users. In addition, it should make the
best possible use of available hardware so as to be
generally more cost-effective.
An operating system has to be able to:
1. Provide maximum system reliability with a 
minimum of operator intervention.
2. Exclude the user from details of
implementation, i.e. make the system appear to
the user as simple as possible.
3. Give the impression that the user has the sole 
use of the computer.
In general then a real-time, multi-user operating 
system should be able to:
1. Perform input/output either to a peripheral 
and/or to a user
2. Dispatch tasks to processors according to some 
predefined algorithm
3. Perform multitasking, i.e. allow many tasks 
to be executed concurrently
4. Supervise communications between tasks and/or 
the operating system
5. Recognize and service interrupts
3.2 Multiprocessor Operating Systems
In the system developed in this thesis, Ramrod, in 
which there are many slave processors available for 
executing tasks, the operating system has an extra 
function to perform in scheduling the slave processors 
for execution of tasks. Once a task is dispatched to a 
free slave processor then the slave processor still has 
to be initiated. This is a true multitasking or
multiprocessing environment and tasks can be said to be
running concurrently. It must be noted that
concurrency in a parallel processor computer is true
concurrency as processors can execute tasks absolutely
simultaneously, whereas there is only 'apparent'
concurrency in a uniprocessor computer (since the
so-called concurrent tasks are actually being processed
serially).
In order to have a near-linear increase in processing
power in relation to an increase in processors, the
master controller (in which the kernel of the operating 
system resides) should preferably have a cycle time 
faster than that of a single processor. This ensures 
the ability to control the slave processors as well as 
execute the kernel of the operating system.
An advantage of having some of the operating system
removed from the processing environment is that the
master can be faster than the slaves and thus have more
effective control over the system. Secondly, a
purpose-built hardware structure should be able to
accommodate and execute the function of an operating
system better than a general microprocessor. This is
largely due to the fact that the actual functions
performed by an operating system are relatively simple
and require little data manipulation. The functions
do, however, require to be executed as fast as 
possible, in order not to degrade the performance of 
the actual processors.
In order to have enough power to control many 
processors on the one hand and to contain an operating 
system on the other hand, bit-slice architecture (which 
is very fast and is microprogrammable, see app E), 
offers the opportunity of designing a purpose-built 
powerful operating system processor, as mentioned in
2.4.
3.3 The use of the Operating System in Ramrod
The Ramrod operating system has certain essential
functions to perform. These are summarized as 
follows:-
1. Each processor in the Ramrod multiprocessor
structure must be allocated a task to execute,
and these have to be loaded into the common
memory from an external source. A segment of 
memory must be assigned to each task, so that 
each processor can execute a task 
independently of other processors. A task is
considered to be an activity which provides a
function such as Input, Output or it may be an
execution of a program, or segment of a
program [LIST].
2. From 1 above it may be seen that each
processor in Ramrod has to be allocated a time 
slice in order that it may access memory on 
the other side of the common bus. In
addition, the particular memory segment
selected has to be enabled. A processor must 
be able to address any or all of the memory 
segments in order that the system can be said 
to contain a virtual memory. To achieve this,
therefore, some sort of intelligent control is
needed.
3. A list of information pertaining to the
location of defined tasks in memory,
processors scheduled to run tasks, and the
status of tasks must be monitored so that the
operating system knows the configuration of
the system at any point in time. An interface
to the system user is also required in order
that such system information may be accessed,
as well as providing an overall ability to
communicate with the system and its component
parts.
4. Should a processor fail, the task
which it has been executing must be
redispatched to the next available working
processor.
5. Defined tasks will require the
ability to communicate with others and this
must be supervised in order that security of
information may be assured. One processor may
also need to draw on the results produced by
another processor.
In view of the above, more than dedicated logic is
needed for the total control of the system. Therefore,
an intelligent master controller must be created which,
in essence, contains some basic features of the kernel
of an operating system, and may indeed implement these
facilities in hardware (see 4.2).
As has been pointed out in chapter 1, distribution of
the hardware improves system reliability and similarly
distribution of the operating system will improve
software reliability. Thus it was decided to
distribute the operating system as much as possible.
The major effect of this is to provide for limited
operation in the event of the failure of a particular
section of the system.
3.4 Basic Structure of the Operating System
It has successfully been shown that a hardware-based 
operating system implemented using bit slice technology 
can indeed work with a high degree of efficiency 
[ROD 76].
As was demonstrated in Rod's work, this approach has
the advantage of a high operating rate and a
relatively simple hardware structure. This also
aids in the debugging of the hardware/software
structure. The core of an operating system is the
executive or the nucleus, and it is this that will be
[...] The
nucleus concerns itself with memory management,
input/output, task dispatching, processor scheduling,
inter-task communication and interrupts.
The operating system proposed for Ramrod also has the
highly desirable property of being partially
distributed. This increases the reliability of the
system as a whole, because once a slave processor is
executing a task, it needs no assistance from the
master until inter-task communication is wanted or the
task terminates. Should the master fail, a slave
can still carry on executing until one of the above two
terminating conditions occurs. This feature is
important in view of the strategic role of the
operating system construction. Clearly, as appendix J
indicates, the master is a
weak point and the failure of the master should not
cause a total system collapse. Therefore an effort
should be made to distribute the operating system
wherever possible.
Thus routines related to the function of the 
Input/Output module are implemented in the slave 
processors while the rest of the kernel of the 
operating system is incorporated in the master 
bit-slice processor.
Another important consideration in the design of the 
operating system is the control over memory usage. 
Memory management is concerned with loading tasks into 
memory, ensuring that there is place for the task to 
reside and finally removing completed tasks. This can 
be combined in Ramrod with task dispatching since the 
memory is common to all processors, and therefore a 
particular memory segment can be assigned to any 
processor. A Task Control Block (TCB) table is kept to
inform the nucleus where the task resides, the state of
the task and to which processor it has been dispatched.
Thus dynamic rescheduling is achieved by allowing 
another free slave processor access to the segment of 
memory.
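A minimal sketch of such a table is given below. It is not the Ramrod implementation: the field names, the three task states and the `Nucleus` class are assumptions made for illustration. What it shows is that re-dispatching a task after a processor failure changes only the processor field, because the task stays in the same segment of common memory.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaskControlBlock:
    task: str
    segment: int                      # memory segment where the task resides
    state: str = "ready"              # hypothetical states: ready, running, done
    processor: Optional[int] = None   # slave the task is dispatched to

class Nucleus:
    def __init__(self, n_processors):
        self.free = set(range(n_processors))
        self.tcbs = {}

    def load(self, task, segment):
        """Record that a task has been loaded into a memory segment."""
        self.tcbs[task] = TaskControlBlock(task, segment)

    def dispatch(self, task):
        """Pair the task's segment with a free slave, if there is one."""
        if not self.free:
            return None
        tcb = self.tcbs[task]
        tcb.processor = min(self.free)
        self.free.remove(tcb.processor)
        tcb.state = "running"
        return tcb.processor

    def processor_failed(self, processor):
        """Re-dispatch the failed slave's task; the slave is not freed."""
        for tcb in self.tcbs.values():
            if tcb.processor == processor and tcb.state == "running":
                tcb.state, tcb.processor = "ready", None
                return self.dispatch(tcb.task)
        return None

nucleus = Nucleus(n_processors=3)
nucleus.load("A", segment=0)
first = nucleus.dispatch("A")
second = nucleus.processor_failed(first)
```

After the simulated failure the task is running on a different slave, while its segment number in the table is unchanged.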
Pairing a memory segment and a slave processor is
accomplished by selection, by the master, of a segment
of memory simultaneously with the selection of a
processor for access on the TDM common bus. A modulo n
counter (where n = number of processors) generates
consecutive addresses for reading a fast Read/Write
memory (RAM), whose output selects or deselects
processors and memory segments. The master controller
has the ability to rewrite this fast RAM, thus allowing
any combination of processor-memory communication.
This is illustrated in figure 3.1, which shows that the
master controller determines which devices are allowed
access onto the TDM bus.
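The pairing mechanism can be modelled in software as follows. This is a sketch under stated assumptions: each word of the fast RAM is taken to hold one (processor, segment) pair, and the class and method names are invented for the example; the actual word format in Ramrod is not given in this section.

```python
class SlotRam:
    """Modulo-n counter addressing a small RAM of (processor, segment) pairs."""

    def __init__(self, n):
        self.n = n
        self.counter = 0
        # Assumed power-up contents: processor i paired with segment i.
        self.ram = [(i, i) for i in range(n)]

    def rewrite(self, slot, processor, segment):
        """Master controller changes which pair is enabled in a time slot."""
        self.ram[slot] = (processor, segment)

    def tick(self):
        """One TDM time slot: read the RAM, then advance the counter."""
        pair = self.ram[self.counter]
        self.counter = (self.counter + 1) % self.n
        return pair

bus = SlotRam(3)
bus.rewrite(slot=1, processor=1, segment=2)   # processor 1 now sees segment 2
slots = [bus.tick() for _ in range(4)]
```

Because the counter wraps modulo n, every processor is offered the bus once per n slots, and a single write by the master is enough to hand a memory segment from one processor to another.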
FIGURE 3.1 SLAVE PROCESSOR TO MEMORY SEGMENT PAIRING
3.5 Inter-Process Communications
In any system in which several tasks are being executed
in parallel the situation will always occur when
processors require to exchange data. Inter-process
communication can be defined, in this context, as
message passing between processes. For example, in a
simple arithmetic calculation Z = (X+Y) * (X-Y), one task
could do the addition, a second task the subtraction
and a third task the multiplication. The first two
tasks have to pass their data to the third task, in
order that the multiplication can occur. Thus the
third task waits until it receives messages from task 1
and task 2. Inter-process communication (IPC) has to
be co-ordinated by the operating system which must
know, amongst other things, who the partners to the
message are, so that processes can cooperate correctly
in the manipulation of data. Whilst a more detailed
discussion of communications appears in Chapter 6, the
discussion which follows outlines how Inter-Process
Communication (IPC) is presently implemented in Ramrod.
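The three-task example can be sketched with threads standing in for slave processors and a queue standing in for the message path. This illustrates message passing in general and is not a description of how Ramrod carries it out; the function name `run` and the message labels are assumptions.

```python
import queue
import threading

def run(x, y):
    """Compute (x + y) * (x - y) with three cooperating tasks."""
    inbox = queue.Queue()    # messages addressed to the third task
    result = queue.Queue()

    def add():               # task 1
        inbox.put(("sum", x + y))

    def subtract():          # task 2
        inbox.put(("difference", x - y))

    def multiply():          # task 3: blocks until both messages arrive
        parts = dict(inbox.get() for _ in range(2))
        result.put(parts["sum"] * parts["difference"])

    threads = [threading.Thread(target=t) for t in (multiply, add, subtract)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result.get()

z = run(5, 3)
```

The third task is started first on purpose: it simply waits on its inbox, so the order in which the other two tasks finish does not matter.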
Of importance in the consideration of IPC methodology 
is the danger of deadlocks. This arises because 
resources are usually allocated to processors on the 
basis of their availability without any predetermined
allocation algorithm. Deadlock can be explained as in
the following example: A user task is granted the
printer for outputting data and then requests the card
reader to read in data; another user task is using the
card reader and then requests the printer so that it
can output results. If these resources can only be
used by one process at a time, and neither process will
release the resource it holds, then deadlock occurs.
In order to prevent deadlock (or deadly embrace),
Dijkstra proposed the semaphore [DIJ], as a
'non-negative integer, which apart from initialisation of
its value, can be acted upon only by the operations
Wait and Signal' [LIS]. The Wait and Signal functions
can be summarized as follows:
Wait(s) : when s>0, decrement s
Signal(s): increment s
Thus a resource (printer etc.) can only be allocated to
one process at a time. This approach is widely used and could be
implemented in Ramrod (see chapter 6).
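The Wait and Signal operations summarised above can be sketched as follows. The sketch uses a condition variable from the Python standard library to make a waiting task sleep until s becomes positive; the `Semaphore` class and the `demo` function are written for this illustration and say nothing about how a Ramrod implementation would look.

```python
import threading

class Semaphore:
    """Non-negative integer acted upon only by wait() and signal()."""

    def __init__(self, value):
        if value < 0:
            raise ValueError("a semaphore cannot be negative")
        self._value = value
        self._cond = threading.Condition()

    def wait(self):
        with self._cond:
            while self._value == 0:   # when s > 0, decrement s
                self._cond.wait()
            self._value -= 1

    def signal(self):
        with self._cond:
            self._value += 1          # increment s
            self._cond.notify()

def demo(n_tasks=4):
    """Several tasks share one printer; return the peak number of holders."""
    printer = Semaphore(1)
    holders, peak = [0], [0]

    def task():
        printer.wait()
        holders[0] += 1
        peak[0] = max(peak[0], holders[0])
        holders[0] -= 1
        printer.signal()

    threads = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak[0]
```

With the semaphore initialised to 1, `demo()` never observes more than one task holding the printer at once, whatever the thread scheduling.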
Another more practical solution is, however, possible. 
If a program is partitioned into tasks, which are
executed sequentially, such that there is no need for
any inter-task communications until a task is
terminated, and inter-task communication only takes
place between adjacent tasks, then deadlock can never
occur! Thus when partitioning the program, if a point
in a task is reached where communication with another 
task is needed, this is the place where the user should 
partition the program (see 6.2).
It is suggested that the user, who writes the programs 
for his particular needs should do the partitioning of 
the tasks in this manner. This clearly is possible to 
implement automatically but it is beyond the scope of 
this present investigation to include software which 
will partition the tasks according to the above
specification. In the present system this is carried
out manually. Thus using data flow techniques (chapter
6) the only communication between one task and another
occurs either at the beginning or at the end of the
task. This highly pragmatic approach has proven most
useful, and surprisingly easy to implement. It is
however only a partial and somewhat crude solution. As
will be discussed in the next section, Ramrod has
provided many other possible hardware mechanisms which
may be used to implement Inter-Process Communication.
Thus Ramrod is a useful testbed for evaluating a
variety of proposals.
3.5.1 C o m m u n i c a t i o n  m e c h a n i s m s  p r o v i d e d  b y  R a m r o d —
Ramrod provides three mechanisms through which tasks 
can communicate with each other.
1. An intelligent Input/Output controller 
(Ethernet, see appendix E), which allows any 
processor to be connected to any peripheral, 
or to any other processor.
2. A vector interrupt system to the master: A 
task can suspend itself once IPC is required 
and be woken up at a later stage. This is 
analogous to Hoare's communicating sequential 
processes (see chapter 6).
3. Tasks communicate by passing data through 
common memory (see earlier discussion on 
Dijkstra's semaphores, which can be 
implemented via this mechanism).
However it must be pointed out that the above are only 
mechanisms, and do not provide for deadlock avoidance! 
They do show, however, the power of the structure of 
Ramrod as an experimental tool.
3.6 User Task to Operating System Communication
Communication between any user task and the operating
system is effected simply by means of a 'watchdog'
timeout signal which interrupts the master controller.
A task must contain instructions which continuously 
trigger a monostable multivibrator, which will time out 
if the task terminates or if a failure occurs. The 
interrupts of the slave processors are vectored so that 
the master can identify the interrupt. The master can
then check whether the timeout was caused by a
processor fault or a task fault, or whether the processor 
has finished executing the task.
In summary this simple mechanism is extremely powerful 
and provides both for a termination indication, as well 
as the ability to detect a processor failure.
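The watchdog mechanism described above can be sketched in software. This is an illustration under stated assumptions: the Watchdog class stands in for the retriggerable monostable, time is an abstract tick count, and classify_timeout with its two inputs (a task-done flag and a self-test result) is a hypothetical way for the master to tell the three cases apart; the text does not specify how that check is made.

```python
class Watchdog:
    """Retriggerable timeout, standing in for the monostable multivibrator."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = None  # no deadline until the first trigger

    def trigger(self, now):
        # A running task re-triggers the monostable, pushing the deadline out.
        self.deadline = now + self.timeout

    def expired(self, now):
        # The timeout fires only if no trigger arrived within the period.
        return self.deadline is not None and now >= self.deadline


def classify_timeout(task_done_flag, self_test_ok):
    """Hypothetical master-side check made after a watchdog interrupt."""
    if task_done_flag:
        return "task finished"
    if self_test_ok:
        return "task fault"
    return "processor fault"
```

A task that keeps calling trigger never times out; once it stops, for whatever reason, the master is interrupted and can then investigate the cause.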
3.7 Conclusion
Implementing the operating system in hardware (by 
purpose-designed architecture) makes the overall
system reliable and flexible, because (as stated
before) hardware is naturally more reliable than 
software. In addition, because the operating system
can be microprogrammed, it is claimed that
the system is flexible, as the architecture is easily
modified to suit the user's needs. The operating 
system functions have to be complemented by only a 
minimal amount of software, and this adds to reliability.
Furthermore the operating system is distributed, in that 
Input/Output and certain local control routines are 
implemented in the slave processors. Thus the 
reliability of the system as a whole is enhanced.
CHAPTER 4 
SYSTEM HARDWARE
deliver those that practised" [Eccl viii 8].
The block diagram of Ramrod was discussed in 2.6 and
in this chapter the hardware of both the bit-slice
master processor and the multiprocessors is 
outlined. A more detailed discussion of this
hardware structure is to be found in appendix G.
In order for any new architecture to have economic 
relevance, it must be simple and efficient, and meet
the needs of its potential users. In order to 
achieve these aims it should exhibit such features as 
fault-tolerance and provision of the necessary 
redundancy. The multiprocessor structure of Ramrod 
fulfils these criteria.
4.1 System Overview
As has been discussed previously (sect. 1.5), 
distribution of work over several conventional
processors with common storage is one approach to
increasing processing speed. However, a serious
problem with common, shared-memory multiprocessor 
systems is that all the memory is accessible by all 
processors, and therefore special support is required 
to ensure that processors do not access the same 
address simultaneously, thereby corrupting each
other's data [ACER 82].
Figure 4.1 shows the overall system block diagram 
with the Master Controller (MC), which is in charge of 
the system. The MC, which is a bit-slice, hardware
based real-time operating system, controls the
data/address latches on both sides of the Time
Division Multiplexed (TDM) bus. As the latches are 
identical, the hardware can be said to be modular.
[Figure 4.1: Ramrod block diagram. Labelled blocks: peripherals 1 to N and Ethernet interfaces 1 to N on the Ethernet bus; microprocessors 1 to N, each with data/address latches; the emitter coupled logic data and address buses; data/address latches in front of the random access memory; and the master controller, designed from the 2900 bit-slice series family, with control lines to the latches and processors.]
The multiprocessor architecture proposed and designed 
makes use of conventional microprocessors, with their 
relatively slow processing times. Of great 
significance is the fact that the cycle time of a 
single processor in the system is not significantly
...
average cycle time for a conventional microprocessor 
is approximately 1 microsecond, this time being set 
primarily by speed of memory access. Figure 4.2 
shows how all the processors communicate with the 
common memory by way of the TDM bus. While the
...
the first processor and this time is available for
use by the other processors. Each processor uses the
bus for a very short period, and if there are 50
processors then this period is 20 nanoseconds. Thus 
with the present system 50 microprocessors are able 
to communicate with each other with almost no 
degradation in performance of any of the 
microprocessors.
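The slot arithmetic quoted above (a 1 microsecond processor cycle shared among 50 processors, giving 20 nanoseconds each) can be checked with a short calculation. The function names are illustrative only, and the sketch assumes the cycle is divided into equal slots with no guard time between them.

```python
def slot_time_ns(cycle_time_ns, processors):
    """Bus time available to each processor within one processor cycle."""
    return cycle_time_ns / processors


def max_processors(cycle_time_ns, slot_ns):
    """Number of equal slots of slot_ns that fit into one processor cycle."""
    return cycle_time_ns // slot_ns
```

Read the other way round, a bus able to complete a transfer in 20 nanoseconds can serve 50 such processors per cycle before any of them has to wait.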
[Figure 4.2: Timing on the shared bus, showing one microprocessor cycle with the address to memory (processor 1) and the data from memory (processor 1).]
The cyclic operation occurs as follows: each
processor deposits data and addresses into its
latches and, when these are given access to the bus,
data are transferred into latches on the other side of 
the bus.
It should be noted that the data can be sent to more 
than one set of memory latches, thus giving the 
processors access to more than one memory segment 
simultaneously. In addition it should be noted that 
the concept of a distributed data base can be 
implemented easily on this type of computer system, 
as a global variable with many copies can be 
simultaneously updated by one processor. A 
distributed data base implies that each processor in 
the system has its own copy of the data base.
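The simultaneous update of several copies of a global variable can be sketched as a single bus write captured by every enabled set of memory latches. This is a software illustration only; the MemorySegment class, its enabled flag and the bus_write function are hypothetical stand-ins for the latch-enable hardware.

```python
class MemorySegment:
    """One memory segment behind its own set of data/address latches."""

    def __init__(self, size=256):
        self.cells = [0] * size
        self.enabled = False  # set when this segment's latches are enabled


def bus_write(segments, address, value):
    """One bus slot: every segment whose latches are enabled captures the write."""
    written = 0
    for segment in segments:
        if segment.enabled:
            segment.cells[address] = value
            written += 1
    return written
```

Enabling two of three segments and issuing one write updates both copies in the same slot, while the third segment is left untouched.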
4.2 Basic Structure of Ramrod
The following section provides an overview of the 
various components of Ramrod. Full details of actual 
implementation, with the circuit diagrams, appear in 
Appendix G.
4.2.1 Microprocessor Module
The microprocessor slave module consists of an ci.iB5 
microprocessor together with memory and 
associated support chips. It is also provided with a 
serial data channel to allow access during
development (so that a terminal could be provided to
each slave processor and hence allow direct control).
These slave processors have to be synchronised with 
the Time-Division Multiplexed (TDM) bus in order that 
there should be a minimal amount of processor idle 
time (as discussed in section 2.4.1). In addition, 
in order that the operating system can be 
distributed, a limited number of operating system 
functions must be present on each processor (so that 
failure of the master controller is not critical in 
the short term). The slave processors can therefore 
continue executing the tasks dispatched to them until 
the tasks suspend themselves, and thus the system can 
"gracefully degrade".
The local operating system is implemented in a 
resident Erasable Programmable Read Only Memory 
(EPROM) on the slave processor module and includes 
additional software to enable the slave to be 
self-tested.
4.2.2 TDM bus and Interface
The system designed is tolerant of processor failure 
but, as with the conventional common bus, it is very
sensitive to bus failure. A catastrophic failure
occurs if the bus fails and therefore a dual, 
redundant bus must be provided to minimize the 
possibility of system failure due to bus failure. 
The system can contain two sets of identical latches 
for each processor and memory segment. Thus, when 
one bus fails (which can be detected by the master 
processor), its associated latches are disabled and 
the second set of latches is enabled. A second 
control board (see 4.2.7.8) can achieve this 
switching of latches.
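The switch-over between the two sets of latches can be sketched as follows. This is an illustrative model only: the bus names "A" and "B", the LatchSet and Module classes and the fail_over function are assumptions made for the example, and the detection of the bus failure by the master is taken as given.

```python
class LatchSet:
    """One set of latches connecting a module to one of the two buses."""

    def __init__(self, bus):
        self.bus = bus
        self.enabled = False


class Module:
    """A processor or memory module with a latch set on each bus."""

    def __init__(self):
        self.latches = {"A": LatchSet("A"), "B": LatchSet("B")}
        self.latches["A"].enabled = True  # bus A is the working bus initially


def fail_over(modules, failed_bus):
    """Disable the latches on the failed bus and enable those on the spare."""
    spare = "B" if failed_bus == "A" else "A"
    for module in modules:
        module.latches[failed_bus].enabled = False
        module.latches[spare].enabled = True
    return spare
```

Because every module carries both latch sets, the switch is a change of enable signals only and no module needs to be moved or reconfigured.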
Most multiprocessor systems with a common memory and 
bus suffer from bandwidth limitations, since the bus 
bandwidth will not increase even though more
processors are added (as has been mentioned
previously in section 2.2.1). Thus, what is needed 
is a state-of-the-art design, capable of high speeds 
of transfer, inexpensive and uncomplicated.
When choosing a logic family for the implementation 
of the bus and interface there are several factors to 
consider, i.e. noise immunity, logic flexibility,
speed and some practical considerations. Obviously, 
for each application the factors must have a certain 
priority. In the case of the bus controller the 
highest priority is given to speed, as this
determines the transfer rate across the bus. Then 
the priorities are: logic flexibility, practical
considerations and noise immunity.
4.2.2.1 Speed
In order to decrease the degradation in processor 
performance, the transfer rate of the bus must be 
high. Unfortunately the faster the logic, the higher 
the cost and the power dissipation. When considering 
high speed, the number of levels of gating becomes an 
added factor, which in turn is a function of the 
logic flexibility.
Gate propagation delay is perhaps the most important 
measure of speed. It is defined as the time taken 
for an output to appear from a gate after the signal 
has been entered at the input.
4.2.2.2 Logic Flexibility -
Reduction of the component count for a particular 
device is dependent on the flexibility of the logic 
family used. Flexibility is roughly related to the 
number of different outputs the integrated circuit 
(IC) has available. Wire-ORing, the capability of 
tying more than one output together, also 
significantly reduces component count. Other factors 
to consider are:
1. ... unnecessary.
2. ... driving capability, because the faster 
the signal the more closely a short line acquires 
the characteristics of a transmission line [MOT B].
3. Input/output interfacing, i.e. interfacing to 
the bus and from the bus to memory.
4. ... levels are not Transistor-Transistor Logic 
(TTL) levels.
5. Multiple gating, thus reducing chip count.
4.2.2.3 Practical Considerations of Logic Choice -
Before committing a design to paper, the availability 
of the components has to be ascertained, and second 
sourcing has to be considered. Since a budget is 
normally allocated to a project and subdivided for the 
various sections, the cost factor plays a part in the 
selection of the component. If the logic to be used 
is "unusual" then the designer has to address 
problems such as what power supplies are required as 
this might necessitate extra power supplies over and 
above the normal single +5 volts requirement of TTL 
based systems.
4.2.2.4 Noise Immunity -
As a system such as Ramrod might have to operate in 
an electrically noisy environment, it must have high 
noise immunity (it was originally conceived for use 
in process control). There are two types of noise 
immunity to be considered, i.e. internal and 
external. When the circuits themselves switch from 
one level to another, internal noise is generated, 
whereas external noise is caused by external devices. 
A good measure of immunity is the voltage difference 
between the two logic levels, as the greater the
difference between levels the higher the noise level 
must be in order to corrupt the data.
4.2.2.5 Comparisons of Logic Families -
As the highest priority is speed, only Emitter 
Coupled Logic (ECL) and Advanced Schottky 
Transistor-Transistor Logic (AST) were considered for 
use in the bus system, as these are the only 
currently available, off-the-shelf, logic families 
fast enough for the application. The advantages and 
disadvantages of these two families are tabulated in 
the appendix.
Both ECL and AST have the same availability and 
second sourcing problems in South Africa, i.e. both 
are difficult to obtain, and the costs are generally 
the same.
The major disadvantages of using ECL are the several 
power supplies required and the need for thoroughness 
in testing. However, as ECL is at least twice as 
fast as AST, it was chosen as the logic in which the 
bus and interface were to be implemented.
4.2.3 Bus Interface
The interface to the ECL bus is implemented via a 
bidirectional logic level translating latch, which 
provides a rapid means of converting microprocessor 
or memory TTL levels to the bus' ECL levels. In 
addition the latches need ECL control signals, which 
are provided by simple one-way TTL to ECL 
translators.
Each microprocessor and memory module has its own set 
of latches, and as the latches on either side of the 
bus are identical, there is no need to design an 
extra latch module. The latches, as mentioned above, 
can translate in either direction and can thus be 
used on both sides of the bus. The control signals of 
each latch, which are derived from the processor and 
memory boards, determine its operating mode.
4.2.4 The Time-Division Multiplexed Bus (TDM)
The TDM bus adopted is unusual in that it is circular 
and is joined at the ends! The philosophy behind the 
structure is simple; to ensure minimum transmission 
time of signals on the bus the physical distance 
between any processor and the memory unit should be 
kept at a minimum (fig 4.3).

The use of a circular bus for ECL has not been widely 
documented, although Sanderson and Zoccoli [ZOC] 
have designed a multiprocessor system using a
circular ECL bus, where they state the advantages of
using such a construction; they do not, however,
go into detail. Therefore the modelling of the bus 
has been the subject of an additional investigation 
(Appendix C). This investigation has shown the
principle to be viable and has revealed design
parameters.
Note that there are actually two buses which the
system uses:
a) The ECL TDM bus
b) A TTL bus for power and control signals
4.2.5 Memory Module
In the prototype each module contained a relatively
small memory segment (2 56 Bytes Random Access Memory, 
RAM) together with the logic needed to produce the 
signals for reading from and writing to the latches. 
This was selected partly on economic grounds - a 
practical, lull scale system clearly would have 
larger memory segments. The speed of this memory 
need not be particularly high because the slave
PAGE 4-16
processors access this memory via the TDM bus and 
have to wait their turn for a time slice.
4.2.6 Input/Output Module
The implementation of the input/output section
adopted is similar to the memory interface concept
developed in that it is possible to pair a device on
the I/O bus to any other device. It is advantageous 
to have interchangeability amongst processors for I/O 
functions just as for memory. Implementation must 
also take into account possible
processor-to-processor communication as well as 
processor-to-peripheral communication. Thus the 
input/output interface must be highly flexible.
In order to implement this a highly intelligent and 
fast interface is needed. The only medium found to 
fulfil these criteria lies in the adaptation of a
simple but high-speed link based on the principles of
the Ethernet System. Ethernet has the highly 
desirable feature that no master controller is 
required. Any device wishing to use the bus simply
'listens' and seizes the bus when it is free.
Simultaneous transmissions are ignored and 
retransmission takes place after a "random" wait time 
(see Appendix E for more details of Ethernet).
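The listen, seize and retransmit behaviour described above can be illustrated with a small simulation. It is a sketch only, under simplifying assumptions that are not from the text: time advances in whole ticks, a transmission occupies one tick, all stations start with a message ready, and the "random" wait is drawn from a seeded generator so that the run is repeatable. The function name simulate and its parameters are hypothetical.

```python
import random


def simulate(stations, max_wait=8, seed=0, max_ticks=1000):
    """Return the order in which stations transmit successfully."""
    rng = random.Random(seed)
    ready_at = {s: 0 for s in stations}  # tick at which each station next tries
    done = []
    for tick in range(max_ticks):
        if not ready_at:
            break
        contenders = [s for s, t in ready_at.items() if t <= tick]
        if len(contenders) == 1:
            # The bus is free and only one station seizes it: success.
            done.append(contenders[0])
            del ready_at[contenders[0]]
        elif len(contenders) > 1:
            # Simultaneous transmissions are discarded; each station
            # retries after its own random wait.
            for s in contenders:
                ready_at[s] = tick + 1 + rng.randrange(max_wait)
    return done
```

No master controller appears anywhere in the loop: each station decides for itself when to try again, which is the property that makes the scheme attractive for the I/O bus.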
However it must be pointed out that initially, in 
order to simplify testing of the system, I/O was 
achieved via dedicated processors. The software has 
an identity section which can determine whether the 
processor is connected to a terminal or a disk 
operating system, or whether it is simply a task
processor. Work on the Ethernet controller is
currently taking place in a related project. (see
Appendix E for details).
4.2.7 Bit-Slice Master Controller
The need was established (sect 1.4) for a hardware
based operating system with the following facilities:
1. An interface to the multiprocessor structure 
to schedule processors
2. A sophisticated interrupt hierarchy. 
Interrupts may come from the processors, 
after failure or task termination, or from 
real-time clocks. Any hardware-implemented 
operating system must be able to deal with 
these interrupts and respond accordingly.
3. The processor must have available a limited
amount of high-speed storage in which it can
hold Task Control Blocks, pointers, stacks 
and constants.
...
there are two important factors which must be kept in
mind: speed and flexibility. Flexibility is
desirable so that the functions can be as universal
as possible and so that the processor can be expanded 
if needed.
4.2.7.1 Bit-Slice Architecture -
It has been shown [ROD 7C] that bit-slice 
architecture offers the best features for
...
The designer has almost complete control over the
... processor required, and in
addition the bit-slice processor is
... This makes it superior to other
architectural techniques: it offers a greater degree 
of flexibility in specifying a computer's instruction 
repertoire, while also resulting in considerable 
simplification in the logic.
Bit-slice microprocessors are capable of high-speed 
operation since they are based on bipolar technology, 
often resulting in cycle times of less than 100 
nanoseconds.
Currently the following bit-slice microprocessor 
families are widely used and relatively freely 
available:
1. The Intel 3000 series
2. The Motorola 10800 ECL series
3. The Advanced Micro Devices (AMD) 2900 Low Power
Schottky series
From a user's point of view the differences are few 
but the main designer's criteria are local 
availability and development tools. The AMD series 
was chosen because of the local support and second
sourcing and primarily because a cheap emulation tool
was available in the form of an extension to the 
Motorola EXORciser (see Appendix F).
Turning to the actual design, the bit-slice processor 
can be subdivided into two parts: the Computer
Control Unit (CCU) and the Central Processor Unit 
...
Bit-slice architecture is essentially the same as
that of a normal processor with one basic 
difference: each functional unit has only (for
instance in the 2900 series) a 4 bit wide word, and to
make an 8 bit ... two units need to be
inter-connected. Fundamental to any microprocessor 
based system is the determination of the
... advantage that their micro-instruction set is
...
the actual architecture, the structure of the 
microinstruction must be discussed.
4.2.7.2 Microinstruction -
The principal decision which has to be made by the 
designer of a microprogrammable logic system is the
...
format the designer has to bear in 
mind the facilities required and the external control 
... this format.
There are generally two classifications of 
microinstructions: Vertical or Horizontal. A
Horizontal microinstruction will control the 
operation of many resources in parallel, and can be 
unlimited in width but in practice is normally up to
64 bits wide. (In actual fact this is often 
determined by practical issues - such as the maximum 
size which a development facility can support). In 
contrast, a vertical microinstruction is similar to a 
normal machine code instruction and affects only a 
single primitive operation. After reviewing the 
requirements of the bit-slice processor it becomes 
apparent that the chosen format must be horizontal, 
in order to achieve the parallelism required.
The designation of fields within the chosen
format is shown in figure 4.4. This means 
that in one horizontal microinstruction the following 
operations may be specified:
1. Control of Central Processor Array (CPA)
2. Control of next address generation
3. Control of status of flags from the CPA
4. Control of input/output functions, including 
local memory, control board, interrupt unit
and microprocessors
[Figure 4.4: Microinstruction format.]
4.2.7.3 The Computer Control Unit (CCU) -
The major function of the CCU is the sequencing of 
instructions, i.e. the determination of the order in 
which instructions are to be fetched from the 
microinstruction store. Generally the microprogram
sequencer contains:
1. A microprogram counter register which will
increment after each clock cycle, thereby
selecting sequential addresses.
2. A condition code multiplexer whereby the
status of flags or other bits can be tested 
for conditional branching.
3. A multiplexer which can select between the
counter register and a directly specified
address.
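The three sequencer elements listed above can be sketched as one next-address function. This is a simplified illustration, not the AMD sequencer's actual instruction set: the three operation names ("continue", "jump", "cond_jump") and the flag layout are assumptions made for the example, and only the 12-bit address width is taken from the text.

```python
ADDRESS_MASK = 0xFFF  # 12-bit microprogram address


def step(pc, op, direct, flags=0, test_bit=0):
    """Return the next microprogram address.

    pc       -- current value of the microprogram counter register
    op       -- hypothetical next-address operation name
    direct   -- directly specified address from the microinstruction
    flags    -- status bits presented to the condition code multiplexer
    test_bit -- which status bit the condition multiplexer selects
    """
    sequential = (pc + 1) & ADDRESS_MASK  # counter register incremented
    if op == "continue":
        return sequential
    if op == "jump":
        return direct & ADDRESS_MASK
    if op == "cond_jump":
        condition = (flags >> test_bit) & 1
        # Address multiplexer: direct address if the tested bit is set.
        return (direct & ADDRESS_MASK) if condition else sequential
    raise ValueError("unknown next-address operation: %r" % (op,))
```

A conditional jump therefore costs nothing extra when the condition is false: the sequencer simply falls through to the next sequential address.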
The AMD sequencers and next address unit used allow 
the addressing of 2**12 = 4096 locations with 2**4 = 16 
next address instructions for the control of
conditional branching instructions. The actual
address space is organised into a 1-dimensional 
array, 1024 by 64 bits wide. Each microinstruction
supplies 4 bits for the next address control and 12
bits for the actual address. The scheme has an 
advantage in that it allows the user to write his 
instructions in a sequential fashion. In addition, 
most other conventional programming techniques can be 
used (for example subroutines, where return addresses 
are automatically stacked and unstacked).
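The 4-bit next address control and the 12-bit address can be illustrated by packing them into a 64-bit microinstruction word. The field widths are from the text; the bit positions (the low 16 bits of the word) and the function names are assumptions made for this sketch, since the actual positions are defined by the microinstruction format figure.

```python
WORD_MASK = (1 << 64) - 1  # 64-bit horizontal microinstruction


def pack_sequencing(word, control, address):
    """Place the next-address control (4 bits) and address (12 bits) in a word.

    The fields are assumed, for illustration only, to occupy bits 15..12
    and bits 11..0 respectively.
    """
    if not 0 <= control < 2 ** 4:
        raise ValueError("next address control must fit in 4 bits")
    if not 0 <= address < 2 ** 12:
        raise ValueError("address must fit in 12 bits")
    cleared = word & WORD_MASK & ~0xFFFF  # keep the other 48 bits unchanged
    return cleared | (control << 12) | address


def unpack_sequencing(word):
    """Recover (control, address) from a packed microinstruction word."""
    return (word >> 12) & 0xF, word & 0xFFF
```

The remaining 48 bits are left untouched by the packing step, in keeping with the horizontal format in which each field controls its own part of the machine independently.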
4.2.7.4 The Central Processor Array (CPA) -
The CPA of the bit-slice microcontroller is similar 
in functional operation to that of the Arithmetic and 
Logic Unit (ALU) of a conventional von-Neumann type 
structure. The CPA can execute the following 
operations:
1. ALU functions
2. Address and route data to and from local
memory
3. Route data to and from I/O interface to the
control board
4. Mask the interrupt control unit
5. Provide status bits to be routed to the CCU
The AMD CPA contains 16 general-purpose registers to 
hold stack pointers, memory address pointers and 
system constants. A fast look ahead carry unit is 
provided to make fast arithmetic computations
possible. A status and shift control unit is
included to control status and other functions
usually associated with an ALU.
The logical operation of the ALU is determined by a 
x /-bit control code and 6 bits determine the 
operation of the status control unit.
4.2.7.5 Input/Output Control Unit -
The input/output control has the following functions:
1. Local memory read/write
2. Control board read/write
3. Microprocessor memory read/write
4. Microprocessor hold and reset
5. Masking of interrupts
The programming of the control board (4.2.7.8) is 
achieved by the I/O control unit. While this board 
is being programmed its outputs are inhibited to 
prevent unwanted processor-memory combinations from 
taking place.
Upon receipt of an interrupt the input/output unit 
can either hold the particular processor or reset it 
to begin executing a task. In order to interrogate 
the memory of any microprocessor the MC behaves like 
a slave processor and simply reads the memory.
4.2.7.6 Interrupt Unit -
Ramrod accepts interrupts from each microprocessor 
and can interrogate its memory to find out the type 
of interrupt. Only 5 levels of interrupt are used, 
although the number is theoretically expansible to 
any number of levels.
A slave processor generates an interrupt request
signal which instructs the CCU to jump to the
interrupt service routine, where the identity of the 
interrupting processor is determined. This is
achieved by the interrupt controller which supplies 
the sequencer with an address corresponding to the
interrupt level. This is normally known as a
vectored interrupt.
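The vectoring step can be sketched as a mapping from interrupt level to a service routine address. Only the use of 5 levels comes from the text; the base address, the spacing between vectors and the rule that the lowest-numbered pending level is served first are assumptions made purely for illustration.

```python
LEVELS = 5           # number of interrupt levels used
VECTOR_BASE = 0x100  # hypothetical start of the vector area
VECTOR_STRIDE = 4    # hypothetical spacing between service routine entries


def highest_pending(pending_mask):
    """Return the lowest-numbered pending level, or None if nothing is pending."""
    for level in range(LEVELS):
        if (pending_mask >> level) & 1:
            return level
    return None


def vector_address(level):
    """Address the interrupt controller would hand to the sequencer."""
    if not 0 <= level < LEVELS:
        raise ValueError("interrupt level out of range")
    return VECTOR_BASE + level * VECTOR_STRIDE
```

Since the address itself identifies the level, the service routine does not need to poll the processors to discover which one raised the request.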
4.2.7.7 Local Memory -
An operating system needs storage for tables, task
blocks etc. The registers in the CPA are
insufficient, and in addition scratch pad use is also 
necessary, so ordinary Metal Oxide Silicon (MOS)
memory is made available for this purpose. This is 
similar to RAM in a simple microprocessor system.
4.2.7.8 Control Board -
The control board consists of two identical sections 
of very fast RAM which are used to enable the 
microprocessor and memory modules respectively. The 
Master Controller can only write the enabling signals 
into this memory and the actual reading of the memory 
is accomplished by a modulo n counter where n is the 
number of processors in the parallel array. The data 
which is read from this RAM provide the enabling 
signals for the latches which allow processor/memory 
communication.
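The control board described above can be modelled as two small tables stepped by a modulo n counter. This is a software sketch only: the ControlBoard class, its method names and the use of processor and memory labels as "enabling signals" are illustrative assumptions.

```python
class ControlBoard:
    """Two identical RAM sections read out by a modulo n counter."""

    def __init__(self, n):
        self.n = n
        self.processor_ram = [None] * n  # which processor latch to enable
        self.memory_ram = [None] * n     # which memory latch to enable
        self.counter = 0

    def program(self, slot, processor, memory):
        # Written only by the Master Controller.
        self.processor_ram[slot % self.n] = processor
        self.memory_ram[slot % self.n] = memory

    def tick(self):
        # Hardware read-out: one processor/memory pairing per time slot.
        pair = (self.processor_ram[self.counter], self.memory_ram[self.counter])
        self.counter = (self.counter + 1) % self.n
        return pair
```

Re-pairing a processor with a different memory segment is then just a matter of the master rewriting one table entry; the counter keeps cycling through the slots regardless.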
4.2.7.9 Control Store -
A key factor in the design of a bit-slice 
microprocessor is the cycle time of the processor. 
Bit-slice timing can be calculated from the worst 
time taken for data to traverse the data path. The 
access time of the control store has a direct 
influence on this cycle time.
The data path begins with the instruction being 
fetched from memory and being presented to the 
pipeline register. From there the individual bits 
are available for control of the relevant parts of 
the system. While the rest of the system is 
operating on this instruction, the sequencer 
generates the next address, which is supplied to the 
control store, the store is accessed and the 
instruction is ready for entry into the pipeline 
register.
Thus the read time of the control store is included 
in the cycle time, which is reduced by the method of 
pipelining outlined above. However there is still a 
need for a fast-access memory.
Bit-slice processors are usually designed with a 
decoding PROM which accepts macroinstructions, or 
normal instructions of processors, and calls 
subroutines of microinstructions to implement the 
macroinstruction. If this PROM is dispensed 
with, then the user can write his program on the 
microinstruction level, thus improving speed.
A disadvantage is that the user has to write the full 
64 bits irrespective of how many bits he needs. The 
inner workings of the processor are also not 
transparent to the user. Figure 4.5 shows the 
general structure of the bit-slice processor 
discussed.
4.3 Physical Construction
The physical construction of Ramrod's bus structure 
is shown in figure 4.6. The latch boards plug 
directly on the buses so as to ensure that the ECL 
signals are generated as close as possible to the 
bus. The TTL control signals and power lines go 
through the latch board to the board that is plugged 
'piggy back' fashion on to it. This 'piggy back' 
board can either be a memory or a processor board and 
it then logically determines the mode in which the 
latch board is to function.
The bus itself was implemented using double sided 
"scotch-flex" cable with one complete side grounded. 
Edge connectors were connected directly onto this 
cable.
The control board which supplies the TTL control 
signals plugs directly into the TTL bus and is driven
by the MC via a set of cables.
The master controller is located external to the bus 
structure and, in practice, was located within an
expansion chassis associated with the EXORciser
development system.
The slave processors, memory boards and latch modules 
are cooled by a fan which is mounted on top of the 
bus structure. ECL requires power supplies different 
from that of TTL and thus there are five supplies 
(+5, -12, +12, -5.2, -2 volts) connected and sensed 
at the top of the circular construction. These 
supplies are in addition to those of the bit-slice 
master controller as power requirements for the 
circular construction are high and if the supplies 
were not duplicated then there would be a significant 
power drop to the circular construction. Figure 4.6 
shows a complete view of Ramrod.

4.4 Conclusion
Up to now the hardware structure has been developed,
and the following chapter describes the basis of the 
software structure of Ramrod.
CHAPTER 5
IMPLEMENTATION OF THE OPERATING SYSTEM
"For everything there is a season; and a proper time 
for every pursuit under the heavens. There is a time 
to be born and a time to die; a time to plant and a 
time to pluck up what hath been planted; a time to 
kill and a time to heal; a time to break down 
and a time to build up; a time to weep and a time to 
laugh" [Eccl iii 1,2,3,4].
As has been stated before, the operating system of 
Ramrod has been distributed amongst the various 
processors in the system in order to increase the 
reliability of the computer as a whole. The kernel 
of the operating system is implemented in the 
bit-slice master processor while most of the routines 
which control input/output and local processing are 
resident in EPROMs on the slave processor boards. 
Additional details of the operating system software 
are to be found in appendix I and only high level 
functional descriptions are discussed in this chapter. 
Actual listings of the software can be obtained from 
the Dept. of Electrical Engineering at the 
University of the Witwatersrand.
5.1 Operating System Kernel
The operating system has been simplified as far as 
possible in order to implement only the essential 
functions which are required to evaluate Ramrod. It 
must be pointed out that additional features still 
have to be implemented to provide a full, commercial 
system.
The operating system kernel, as currently implemented 
in the master controller contains the following 
functions:
1. Dispatcher
A list is kept of the status of the tasks in 
the system (Task Control Blocks TCB). The 
dispatcher has the function of scanning the 
list of TCB's and when a task is waiting to 
be executed the dispatcher looks for an 
available processor on the processor status 
list.
2. Scheduler
Once a task has been dispatched to a 
processor, the processor has to be initiated 
so as to run. The scheduler has therefore 
the prime function of enabling processors.
The 'round robin' scheme of scheduling is
actually implemented in the hardware (see 
4.2.7.8) whereby processors are allowed 
access to the bus, and hence to the common
memory, in turn (i.e. time slicing the
bus).
3. Memory Manager
This module keeps a list of the memory 
segments showing which are free or which are 
occupied. Once a task has been executed its 
memory segment joins the 'free' list. When 
a task is loaded this module is consulted in 
order to find a 'free' segment.
4. Interrupt Handler
The interrupt handler determines the source 
of an interrupt and proceeds to service the 
interrupt after saving the volatile 
environment of the interrupted routine. 
When the interrupt has been serviced 
execution of the interrupted routine is 
resumed.
5. Input/Output Module
It is the function of the input/output 
module to initiate an I/O operation, on 
request. Tasks are loaded from an external 
source or can be entered by the user from a 
console which is connected to one of the 
slave processors. This module is an 
interface between the nucleus of the 
operating system and the routines which are 
resident in the slave processors.
It should be noted that in the prototype 
system, tasks were resident on the disks of 
the associated EXORciser development system. 
The operating system obtained tasks from 
this system and loaded them into Ramrod's
memory as required. This technique obviated 
the need for a dedicated disk controller and 
disk.
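The interplay of the dispatcher and the memory manager described above can be sketched as follows. This is a minimal illustration only, not the microcode of the master controller: the class and field names (TaskControlBlock, Kernel, processor_free, free_segments) and the status strings are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskControlBlock:
    # Hypothetical TCB layout; the thesis does not list the actual fields.
    name: str
    status: str = "waiting"            # "waiting", "running" or "done"
    processor: Optional[int] = None
    segment: Optional[int] = None


class Kernel:
    def __init__(self, n_processors, n_segments):
        self.tcbs = []                                  # list of TCBs
        self.processor_free = [True] * n_processors     # processor status list
        self.free_segments = list(range(n_segments))    # 'free' segment list

    def load(self, name):
        """Memory manager: give a newly loaded task a free segment."""
        if not self.free_segments:
            raise MemoryError("no free memory segment")
        tcb = TaskControlBlock(name, segment=self.free_segments.pop(0))
        self.tcbs.append(tcb)
        return tcb

    def dispatch(self):
        """Dispatcher: scan the TCBs and pair waiting tasks with free processors."""
        started = []
        for tcb in self.tcbs:
            if tcb.status != "waiting":
                continue
            try:
                p = self.processor_free.index(True)
            except ValueError:          # no processor available at present
                break
            self.processor_free[p] = False
            tcb.processor, tcb.status = p, "running"
            started.append(tcb)
        return started

    def complete(self, tcb):
        """On completion the processor and the segment are released."""
        self.processor_free[tcb.processor] = True
        self.free_segments.append(tcb.segment)
        tcb.status, tcb.processor, tcb.segment = "done", None, None
```

With two processors and three loaded tasks, the third task remains waiting until one of the first two completes and its processor is returned to the status list.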
5.2 Local Operating System
The component of the operating system contained in 
each slave processor has the following functions:
1. It can act as an extension to the 
input/output module of the operating system.
2. It can function autonomously in order to 
enable self testing of the slave processor
3. It can directly control the processing 
activities of the slave processor and will 
only execute user tasks as requested by the 
master controller
The microcomputer determines, on power up, what type
of function it is to perform. The processor writes 
its identity into a location of common memory, and 
if the processor is reset it can determine its mode 
of operation by reading this location. The 
determination of this mode, in the final system, is 
automatic, but during development it was determined 
by transmitting data through a Universal Synchronous 
Asynchronous Receiver Transmitter (USART) and 
immediately reading the data on the input. This 
feature enabled direct control of each slave
processor during system testing. Once the identity 
of the processor is determined its mode can only be
changed by a power down sequence or by the master
controller, which can reprogram the appropriate memory 
location.
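The mode mechanism just described may be pictured with the short sketch below. It is an illustration under stated assumptions: the mode codes, the base address MODE_BASE and the function names are hypothetical, and a Python dictionary stands in for the common memory.

```python
# Hypothetical mode codes; the thesis names the modes but not their encoding.
IO_EXTENSION, SELF_TEST, TASK_PROCESSING = 1, 2, 3
MODE_BASE = 0x0100    # assumed base address of the per-slave mode locations


def mode_location(slave_id):
    """Address of the common memory location that holds a slave's mode."""
    return MODE_BASE + slave_id


def power_up(common_memory, slave_id, mode):
    """On power up the slave records its identity (mode) in common memory."""
    common_memory[mode_location(slave_id)] = mode
    return mode


def reset(common_memory, slave_id):
    """After a reset the slave recovers its mode by reading the same location."""
    return common_memory[mode_location(slave_id)]


def master_reprogram(common_memory, slave_id, new_mode):
    """The master controller may change a slave's mode by rewriting the location."""
    common_memory[mode_location(slave_id)] = new_mode


common_memory = {}
power_up(common_memory, 2, IO_EXTENSION)
mode_after_reset = reset(common_memory, 2)
master_reprogram(common_memory, 2, TASK_PROCESSING)
mode_after_reprogram = reset(common_memory, 2)
```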
5.2.1 Input/Output Extension to the Operating System
In this mode the slave processor behaves as an 
intelligent terminal, and can be connected to a user 
console or to a host computer. As mentioned before, 
this allows a user direct access to each slave 
processor - a most valuable aid during development. 
For example, programs which are to be run by a slave 
and which have been developed on, say, a development 
system can be loaded via this routine into Ramrod's 
common memory. In addition the user can view the 
system on the console. This gives the user a way of 
getting his programs into Ramrod's memory without 
direct control from the master controller, again a 
useful aid during development.
5.2.2 Self-Testing Routines
In order to test the microcomputer initially the 
following routines are included in the slave 
operating system:
1. Identify, on power up, the function the 
processor is to perform (i.e. it can be a 
slave processor executing tasks as set by 
the master controller, a processor which 
communicates with the user via a VDU, or a 
processor which can load tasks from an 
external source e.g. a disk operating 
system).
2. Substitute or update any memory in the slave 
processor's address space
3. Display contents of this memory on screen
4. Insert code into any of this memory
5. Execute program inserted by user
5.2.3 Slave Task Processing
In this mode the slave processors execute tasks at a 
specific location in the common memory address space. 
This location is in the common memory area and 
therefore tasks which have been loaded via the master 
controller I/O module and which have been dispatched 
to this particular processor are executed after a 
request by the master processor. On completion the 
master is notified by means of the mechanism 
described in 3.6.
5.3 Conclusion
Graceful degradation of the system is assured because 
if the master fails the slave processors can continue 
functioning until their tasks are completed.
As can be seen in appendix J, the hardware 
reliability of Ramrod depends on duplication of the 
master and the TDM common bus, whereas the software 
reliability is greatly enhanced by distribution of 
the operating system.
In summary, the operating system is distributed and 
contains the following:
1. Scheduler
2. Dispatcher
3. Memory Manager
4. Input/Output Manager
5. Interrupt Handler
In addition a list of information pertaining to the 
status of tasks, processors and memories is 
maintained by the master controller.
The distributed operating system which has been 
described is essentially simple and has proved to be 
most effective. It appears to be both an effective 
tool and a successful combination of hardware, 
firmware and software.
CHAPTER 6
APPLICATION SOFTWARE STRUCTURE
"For all this did I reflect over in my heart and to
explain all this, that the righteous, and the wise,
and their services are in the hand of GOD; that man 
knoweth neither love nor hatred; it is all ordained 
before them" [Eccl ix 1].
Whilst this thesis has set out to produce an
operational system and has concentrated on the
fundamental design issues, it is important to give 
some attention to methods which may be used to 
construct applications software.
Therefore this chapter discusses an appropriate 
method, and the techniques discussed are utilised in 
a relatively simple example which will form the basis 
for the practical evaluation of Ramrod.
It is common knowledge that parallel processing can 
be greatly enhanced by using techniques adopted from 
data flow languages. Computations represented by 
cyclic data flow graphs can be automatically unfolded 
to expose all parallelism to the underlying hardware 
[AGER 82].
The discussion that follows presents a pragmatic 
introduction to such techniques and shows how they 
may be implemented in production.
6.1 Data Flow Approach
Data flow machines attempt to provide concurrency in 
operation in order to achieve high speed of 
[...]
allow the computer architecture to be visible to the 
programmer in order to achieve parallelism. This is 
unnatural as the language then closely reflects the 
behaviour of the computer rather than the manner in 
which the programmer normally thinks [ACR]. The data 
flow language approach on the other hand, directly 
reflects the programmer's thoughts whilst making the
computer's architecture transparent.
A data flow language is defined as a "language based 
entirely upon the notion of data flowing from one
[...]
concept has the advantage of allowing the data flow 
language program to be represented graphically.
Using a data flow language is extremely [...] 
as sub-programs can be understood entirely on the
[...] altering another module's variables.
[...]
the modules that look independent can be executed
independently, and modules can therefore run 
concurrently [DAV 82].
The data flow machine, which is a direct image of the
language it supports, is in contrast to the 
[...]
computer model, and it is based on the following two 
principles:
"Asynchrony. All operations executed when and only
[...]
there are no side effects" [GAJ].
Asynchrony denotes an execution mechanism in which 
data values pass through nodes in data flow graphs as 
tokens, and an operation is initiated whenever all 
input tokens are present at a node in the graph. 
Functionality implies that any two enabled operations
[...].
Figure 6.1 shows how Z = (X+Y) * (X-Y) is graphically 
represented and therefore computed. The functions +, 
-, * are called actors; they reside at nodes, and 
nodes are connected by arcs. Data flows on arcs from 
one node to another in a stream of discrete tokens. 
Tokens are considered carriers of data objects [DAV]. 
It must be noted that an actor, or operator, cannot 
be initiated before all of its tokens are available 
(see chapter 7).
FIGURE 6.1 DATA FLOW INSTRUCTIONS
The data flow computer is designed to recognise which 
of the instructions are enabled. All such 
instructions are dispatched to execution units as 
soon as they are available.
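The firing rule can be illustrated with a small token-driven interpreter for the graph of Z = (X+Y) * (X-Y). This is a sketch under assumptions: the graph encoding (GRAPH), the node names and the function run are introduced here for illustration and do not describe any particular data flow machine.

```python
import operator
from collections import deque

# node name -> (actor, number of input ports, [(target node, target port), ...])
GRAPH = {
    "add": (operator.add, 2, [("mul", 0)]),
    "sub": (operator.sub, 2, [("mul", 1)]),
    "mul": (operator.mul, 2, [("Z", 0)]),
}


def run(graph, tokens):
    """Feed (node, port, value) tokens into the graph.

    A node fires only when a token is present on every input port.
    Returns the values that leave the graph and the order of firing.
    """
    waiting = {name: {} for name in graph}
    outputs, fired = {}, []
    queue = deque(tokens)
    while queue:
        node, port, value = queue.popleft()
        if node not in graph:              # arc leaves the graph: an output
            outputs[node] = value
            continue
        actor, arity, targets = graph[node]
        waiting[node][port] = value
        if len(waiting[node]) == arity:    # all tokens present: node enabled
            args = [waiting[node][p] for p in range(arity)]
            waiting[node] = {}
            fired.append(node)
            result = actor(*args)
            for target, target_port in targets:
                queue.append((target, target_port, result))
    return outputs, fired


x, y = 5, 3
outputs, fired = run(
    GRAPH, [("add", 0, x), ("sub", 0, x), ("add", 1, y), ("sub", 1, y)]
)
```

The multiplication cannot fire until both the sum and the difference have produced their tokens, whereas the addition and subtraction are independent of each other and could be dispatched to separate execution units.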
The data flow concept can be extrapolated to 
conventional von Neumann structures by having the 
processing elements operate simultaneously on tasks 
rather than on instructions. If a more global 
outlook is taken, it can be seen that the task can be 
defined in a similar way to a data flow instruction 
such that it is enabled if all input conditions are 
met, and it is suspended when an output condition
occurs.
6.2 Task Definition
The actual mechanism of automatically decomposing a 
program into tasks which will fit into the above 
category is beyond the scope of this investigation. 
The program must be decomposed before entry into the 
system. One solution is to write the programs in a 
data flow functional language, which has inherent 
properties for parallel processing (see 7.3).
Using the data flow concept, a task in this 
multiprocessor environment is defined as the smallest, 
functional unit of software, which requires inputs 
for execution to begin and which only terminates when 
an output condition occurs. Therefore a task is 
autonomous and must run to completion before any 
communication with another task. This concept is, of 
course, very interesting as it reflects one of the 
original proposals discussed in section 3.5 to avoid 
deadlock.
Inter-task communication is thus kept to a minimum 
and each task has a single indivisible function. 
Thus the instruction in figure 6.1 is a task which 
accepts two inputs, X and Y, and produces an output 
Z.
However, it must be remembered (see section 3.5) that 
the above approach to Inter-Process Communication 
(IPC) is relatively limited, and other mechanisms 
should be investigated. Ramrod is an excellent 
vehicle for experimenting with these ideas. The next 
section discusses a selection of possible IPC 
mechanisms, and it is shown how they may be 
implemented in Ramrod.
6.3 Inter-Task Communication
A distributed multiprocessor computer system needs a 
sophisticated communication medium to provide for the 
necessary inter-task communication. Reliability, 
redundancy and modularity (as mentioned in 1.2.1) are 
requirements for the implementation of such a medium. 
This communication medium is the key to flexible 
implementation of redundancy and expansibility 
[MAC].
The interaction between tasks arises when two 
concurrent (truly concurrent in the multiprocessor 
system and pseudo concurrent in the uniprocessor 
environment) tasks need to exchange data.
The concurrent tasks have access to common memory 
variables which represent the state of physical 
resources, and which are used to communicate data 
between cooperating tasks. In general the common 
variables can represent shared objects called 
resources, and in order to share resources the 
concurrent tasks need to be synchronised. 
Synchronisation is defined as an ordering of 
operations in time and in the multitasking 
environment this implies that "operations A and B must 
never be executed at the same time", i.e. mutual
exclusion [BRI 73]. A more detailed discussion of 
Inter-Process Communication primitives has been 
undertaken by Macleod [MAC].
The traditional ways of handling Inter Process 
Communication are:
1. Semaphores
A semaphore is a synchronising variable 
(flag) which informs a task whether the 
resource it wishes to share is available or 
unavailable [DIJ].
2. Critical Regions
A concurrent task can only access common 
variables within a critical region. The 
task can only enter a critical region within 
a finite time, and only one task at a time 
can be inside a critical region. The task 
can remain in the critical region for a 
finite time only [BRI 73].
3. Communicating Sequential Processes [HOA 78]
Input/Output are basic primitives of 
programming. Parallel processing using 
Communicating Sequential Processes (CSP) is 
a fundamental program structuring method. 
This communication is considered as being 
synchronised input/output.
A process communicates with another process 
by naming it as its destination for output, 
while at the same time the second process 
names the first as a source for its input. 
When both processes are ready to transfer 
data the value to be output is copied from 
the source to the destination. A
disadvantage of this close synchronisation 
scheme is that if one of the processes 
finishes before the other there will be a 
certain amount of idle time by the processor 
concerned, and there is also a limit on the 
amount of parallelism achieved.
Verifying programs in a uniprocessor 
environment is difficult enough, and Hoare 
therefore states that there is no method for 
verification of programs in a multiprocessor 
environment.
4. ADA [US DOD]
One of the most exciting developments in 
real time languages is the ADA language, 
which is a project of the United States 
Department of Defense. ADA is similar to 
CSP in that it has a low level construct for 
the synchronisation of parallel tasks. ADA 
incorporates the concept of a rendezvous, in 
which two processes communicate with each 
other at a specific (real) time, for 
interprocess communication.
5. Primitives for Distributed Computing [LISK]
An advantage of a distributed organisation 
is reduced contention for a single CPU, but 
this is replaced by contention for the 
communication medium. Other advantages are 
speed of response from the CPUs, better
reliability, higher capability and
expansibility.
The basic construct of this IPC method is 
called a guardian which consists of objects 
and processes. An object contains 
data (integers etc.) and a process is an 
execution of a sequential program. 
Communication by processes in different 
guardians is by means of message passing. 
The guardian exists entirely at a single 
node of a distributed system. Once a 
message has been sent, the sending process 
can proceed. Receiving messages are 
associated with a timeout which is necessary 
because an expected response may not arrive 
due to errors or failures.
Ports, which have global names, allow 
queueing of messages as they provide some 
buffer space. If this buffer space is full 
the message is lost. The port is a 
unidirectional gateway into a guardian and 
is described by the type of messages that 
can be sent to it.
6. MARS [KOP 82]
In the MARS project IPC differs for state 
messages and event information. An event is 
a happening at a point in time, whereas 
state information deals with attribute
values of objects which are only valid 
for a certain period of time. However,
event and state information are related as a 
change of state is an event.
An event message is queued at the receiver
and can only be removed by that receiver
when it is read. A state message is valid
for a specified period of time and can be
read by several tasks many times. The IPC
mechanism is, on a high-level, a broadcast 
medium with a group addressing capability 
(Ethernet?).
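As one concrete illustration from the list above, the port of method 5 (a buffered, unidirectional gateway in which a message is lost when the buffer is full, and in which receipt carries a timeout) can be sketched as follows. The class Port and its method names are assumptions made for this example; Python's standard queue module stands in for the buffer space.

```python
import queue


class Port:
    """Unidirectional gateway with bounded buffer space (hypothetical sketch)."""

    def __init__(self, name, capacity):
        # capacity must be greater than zero; queue.Queue treats 0 as unbounded
        self.name = name
        self._buffer = queue.Queue(maxsize=capacity)

    def send(self, message):
        """Non-blocking send: the sender proceeds at once.

        Returns False if the buffer was full and the message was lost.
        """
        try:
            self._buffer.put_nowait(message)
            return True
        except queue.Full:
            return False

    def receive(self, timeout):
        """Wait up to `timeout` seconds; None means no message arrived."""
        try:
            return self._buffer.get(timeout=timeout)
        except queue.Empty:
            return None


port = Port("results", capacity=2)
accepted = [port.send(m) for m in ("a", "b", "c")]   # third message is lost
```

The timeout on receipt is what allows a receiving process to recover when an expected response never arrives because of an error or failure elsewhere in the system.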
The above mechanisms have been shown to be viable and
the author does not wish to debate their merits.
However, it is not clear how to determine which 
method is most suitable for a particular case of 
Inter Process Communication (IPC).
Ramrod, in this thesis, does not set out to solve the 
problem of choosing an IPC mechanism but rather
provides a vehicle for testing them. All the above
methods can be implemented in Ramrod as there are 3
ways to support IPC (see 3.5.1).
1. Via common memory. As a slave processor has
access to any other slave processor's
memory, all of the above IPC methods are 
able to be implemented in Ramrod.
2. Via Input/Output. As the I/O interface is
intelligent, one processor can address
another processor using unique names for
each processor (CSP).
3. Via communication through the master using
interrupts. The master can interrogate a
slave processor and determine what it wants, 
and most of the methods listed above can be 
implemented.
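The first of these routes, IPC via common memory, relies on mutual exclusion when several tasks update the same variable. The sketch below shows the idea with three concurrent tasks sharing one counter; it is an illustration only, in which Python threads stand in for slave processors, a dictionary stands in for common memory, and threading.Lock plays the part of the critical region guard. All names are assumptions.

```python
import threading


def run_tasks(n_tasks=3, increments=1000):
    """Let n_tasks concurrent tasks update one common variable safely."""
    common_memory = {"counter": 0}      # stands in for a common memory variable
    region = threading.Lock()           # guards the critical region

    def task():
        for _ in range(increments):
            with region:                # only one task inside at a time
                common_memory["counter"] += 1

    threads = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return common_memory["counter"]


final_count = run_tasks()
```

Because every update takes place inside the critical region, the final count equals the number of tasks multiplied by the number of increments, whatever the interleaving of the tasks.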
6.4 Conclusion
This chapter has attempted to provide a mechanism 
which may be adopted in order to produce application 
software for Ramrod. It is suggested that Data Flow 
techniques seem to be appropriate and in the next 
chapter a simple program is developed on this basis. 
In addition this chapter has looked at Inter Process 
Communication mechanisms and it has been shown that 
Ramrod is capable of implementing all of these - thus 
enhancing the value of the system as an experimental 
tool.
CHAPTER 7
EVALUATION OF SYSTEM
"For who knoweth what is good for man in this life, the 
number of the days of his vain life that he should spend 
them as a shadow. For who can tell a man what will be after 
him under the sun" [Eccl vi 12].
It has been claimed that Ramrod is more efficient than a 
conventional uniprocessor, but there is a difficulty in 
proving this. Efficiency is usually defined as the ratio 
between useful work performed and the total work performed. 
In this project, however, efficiency is evaluated on a 
comparative basis between Ramrod and a uniprocessor computer 
of similar power to one of the slave processors. Evaluation 
techniques (outlined by Rod [ROD 76]) can really be only 
applied to one system and cannot form the basis of 
comparison between two fundamentally different types of 
computer system. Ramrod's architecture is similar to that 
of an array processor and therefore it should be used for 
vector processing in order to utilise it as efficiently as 
possible. Therefore, when evaluating Ramrod this point must 
be kept in mind and thus merely obtaining a run time for
[...]
The problem is analogous to calculating the reliability of 
Ramrod, since conventional reliability theory is really of 
little significance in a fault-tolerant system (appendix J). 
Therefore it was decided to limit the evaluation of Ramrod 
to using it as a simple vector processor operating on an 
array of integers, while allowing a uniprocessor to do the 
same operation on the array and then comparing the 
respective performance. This comparison is naturally not 
totally valid but, more than anything else, it does 
illustrate the vital factor that the system developed has 
much merit and provides an indication as to how it can be 
used.
7.1 Practical Limitations
As the project was by definition very large, and because 
many of the ideas such as sharing common memory and time 
slicing the Emitter Coupled Logic (ECL) bus are almost 
unique, some of the architectural features developed have 
not been fully implemented in the prototype. The software, 
as well, has been simplified in order that the basically 
novel thoughts of Ramrod be demonstrated and proved to be 
viable.
The operating system has been simplified by allowing the 
user to load his tasks via the console in addition to using 
the master operating system functions to load tasks. 
Purpose designing the operating system for the evaluation of 
Ramrod also reduces the complexity of the operating system
i.e. all the features in a complete operational system have 
not been included - only those that are absolutely necessary 
to run the test program.
It must be clearly understood that the omission of the above 
features does not in any way undermine their importance and 
contribution to the project. These features can easily be 
incorporated into the system because of the modularity of 
Ramrod.
An additional factor which is usually examined in the area 
of evaluation, is the question of memory utilisation, but as 
the control store is large enough for the operating system 
and because tasks reside in the common memory, this 
evaluation is not relevant in the present situation.
7.2 Factors influencing the Relative Comparison
In order to synchronise the slave microprocessor to the 
Time- Division-Multiplexed (TDM) common bus the "Ready" line 
of the 8085 has been used. Thus for a read memory cycle the 
address is first transferred across the bus and the 8085 is 
held 'unready' until the data returns from the memory. 
Initially this double cycle sequence only applied to the 
read memory cycle, but as the 8085 has a multiplexed 
data/address bus it was found necessary to make the write 
memory cycle a double cycle as well, thus effectively 
doubling the time taken for writing data to memory. This 
factor obviously influences the run time of the 8085 slave 
processor and must be kept in mind when comparing the 
execution figures of Ramrod and a uniprocessor system. The 
Bus Enabling Signals (BES) (figure 4.2 time slots) were
originally chosen as having a period of 1 micro-second and
therefore the logic on the memory cards was designed with
this in mind to provide the read/write, select and clock 
pulses using monostable multivibrators. It has 
subsequently been determined that the BES frequency can be 
increased to 2MHz, thereby significantly reducing the run
time of a task.
The method used for interrupting the bit-slice master 
processor is via the watchdog circuitry which takes 14 
milliseconds to time-out. Therefore the execution time of 
the task should be reduced by this time period as an
alternative method for interrupting could be designed. The
slave processor can generate an exclusive address in order
to signal the master, though the watchdog circuitry is still
needed to indicate a malfunction. Thus there would be two 
interrupts from each slave.
The uniprocessor system used in the comparative studies was 
one of the slave processors executing in isolation. This is 
a preferred solution as it incorporates the double cycles 
mentioned above, and hence provides a direct comparison in 
terms of execution times.
7.3 Program Used in Relative Comparison
As was mentioned earlier, the architecture of Ramrod is 
similar to that of an Array Processor so it was decided to 
undertake the evaluation by making Ramrod do an exercise on 
an array of integers.
Figure 7.1 illustrates how the program runs and shows how 
data flow techniques are applied.
The program calculates the maximum of an array of numbers.
The array is divided by the number of slave processors that
are present and each subdivision becomes the input for the
operators (slaves) 1, 2 and 3. The routines are initiated 
by the appearance of tokens (subdivisions). They operate on 
the arrays and are terminated when the output (maximum) 
occurs. Operator 4 can only be initiated when all of its
tokens (maxima) are present at the input. It terminates 
once the output (absolute maximum) appears.
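The structure of the test program can be sketched in a few lines. This is not the 8085 code that was run on Ramrod: Python threads stand in for the three slave processors, and the function names split and absolute_maximum are assumptions made for illustration. The array is assumed to be non-empty.

```python
from concurrent.futures import ThreadPoolExecutor


def split(array, n):
    """Divide the array into at most n contiguous subdivisions."""
    size = -(-len(array) // n)                    # ceiling division
    return [array[i:i + size] for i in range(0, len(array), size)]


def absolute_maximum(array, n_slaves=3):
    """Operators 1..n find partial maxima; a final operator combines them."""
    subarrays = split(array, n_slaves)
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        maxima = list(pool.map(max, subarrays))   # operators 1, 2 and 3
    return max(maxima)                            # operator 4: needs all maxima


data = [(i * 37) % 101 for i in range(96)]        # 96 arbitrary integers
result = absolute_maximum(data)
```

The final call to max cannot begin until every partial maximum has been returned, which mirrors the rule that operator 4 is initiated only when all of its tokens are present.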
FIGURE 7.1 OPERATING SYSTEM SEQUENCE (flow chart: start; divide up integers into subarrays; calculate maximum of each subarray; calculate absolute maximum)
7.4 Results
Figure 7.2 shows the execution sequence of the test program. 
When several slave processors are executing identical tasks 
simultaneously there is a possibility that two or more tasks 
will send interrupts to the master controller 
simultaneously. These tasks therefore contain dummy loops so 
that their execution times differ.
FIGURE 7.2 EXECUTION SEQUENCE (sequence: memories assigned to a processor for the user to enter data; memories assigned to the slave processors; processors get maxima; memories assigned to PROC2; PROC2 gets absolute maximum; memories reassigned; user views data)
Below is a table which shows the steps for the test program 
with their execution times.
    Function                                  Execution Time

1.  Assign 3 memories to a slave              13 microseconds
    processor so that user can enter
    array
2.  User generates interrupt, after           550 nanoseconds
    insertion, which is detected and
    serviced
3.  These 3 memories are reassigned to        10 microseconds
    3 slave processors, for calculation
    of maxima
4.  Slave processors calculate their          1600 microseconds
    maxima
5.  Three interrupts are generated and        1250 nanoseconds
    serviced
6.  These 3 memories are assigned to          10 microseconds
    another slave processor so that it
    calculates absolute maximum of array
7.  This slave processor calculates           180 microseconds
    maximum of 3 numbers
8.  Generation and service of interrupt       550 nanoseconds
9.  Assign memories to console slave          12.6 microseconds
    processor so that user can view
    result

Table 7.1 Execution Times
Note: The time taken for the user to input the data is not
relevant, and applies to the execution time of the single 
processor as well.
Thus the total run time for the system to calculate the 
maximum of 96 integers is approximately 1.8 milli-seconds. 
If one slave processor were to operate on the entire array 
it would take 5.2 milli-seconds, whereas a processor which 
does not have any wait states inserted takes 2.6 
milli-seconds to operate on 96 integers (see figure 7.1).
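The total quoted above can be checked by adding the nine entries of Table 7.1 after converting them to a common unit. The short calculation below does this; the variable names are introduced for the example, and the ratio at the end is simply derived from the two figures quoted in the text rather than being a separately measured result.

```python
# Execution times of steps 1..9 of Table 7.1, all expressed in microseconds
# (550 ns = 0.55 us, 1250 ns = 1.25 us).
TIMES_US = [13, 0.55, 10, 1600, 1.25, 10, 180, 0.55, 12.6]

total_us = sum(TIMES_US)                 # about 1827.95 microseconds
total_ms = total_us / 1000               # about 1.8 milli-seconds

SINGLE_SLAVE_MS = 5.2                    # one slave processor, whole array
ratio = SINGLE_SLAVE_MS / round(total_ms, 1)

print(f"total run time: {total_ms:.2f} ms")
print(f"single slave / Ramrod: {ratio:.1f}")
```

Step 4, the calculation of the partial maxima, accounts for 1600 of the roughly 1828 microseconds, so the run time is dominated by the work done in parallel rather than by memory reassignment or interrupt servicing.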
7.5 Conclusion
Ramrod performs very well under the given conditions and 
when the Bus Enabling Signal (BES) frequency was indeed
increased the machine became even more powerful. The 
results must be viewed whilst keeping this point in mind. 
The fastest time for Ramrod to do the above example was in 
the region of 1 milli-second thus making it 2.5 times faster
than a single processor. However one must bear in mind that
Ramrod has the ability to allow the processors to 
intercommunicate and therefore its overall power is 
difficult to estimate. In addition, if the operating system 
were better scheduled then there would have been no need to 
have a separate processor to maximise the relative maxima 
and the total run time could then be reduced by an added 
factor of 300 micro-seconds. It is estimated that it would
take a processor 1.3 milli-seconds to operate on an array of 
24 integers (96/4 = 24).
CHAPTER 8
CONCLUSION
"The end of the matter is, let us hear the whole; Fear GOD 
and keep his commandments; for this is the whole duty of 
man. For every deed will GOD bring into the judgement
concerning everything that hath been hidden whether it is 
good or whether it is bad" [Eccl xii 13,14].
Very Large Scale Integrated (VLSI) microcomputer components 
are highly cost-effective because of the high volume at 
which they are produced, and therefore future computer
architectures must utilise this dramatic advance in
technology [GIL BEHR].
This thesis sets out to define a computer system which is 
highly cost-effective and whose architecture is based on 
data-flow techniques in order to provide a more efficient 
way of data access than the conventional computers.
However, the architecture was also based on a high degree of 
fault-tolerance, modular extensibility and simplicity.
8.1 Uniqueness of Ramrod
The project brought out the unique features, detailed below, 
in order to reconcile these seemingly conflicting demands of 
modular architecture on the one hand and simplicity and
fault-tolerance on the other hand.
1. The Time-Division Multiplexed (TDM) common bus
It was shown that a common shared bus does not 
necessarily have a low bandwidth, and can indeed be 
used very efficiently to time division multiplex 
many devices - the key being the very short time 
required by each processor to access the bus.
2. Circular Bus
The TDM bus is constructed in a circular fashion 
and joined at the ends. Ramrod proves the 
viability of using a circular bus to improve signal 
levels and hence to decrease the maximum delay 
between devices.
3. Distributed Operating System
In order to increase the reliability of the system, 
as a whole, and to provide for graceful degradation 
the operating system is distributed amongst the 
processors, and the kernel of the operating system 
is built into the hardware of the bit-slice master 
processor.
4. Local/Global Memory
Another feature of Ramrod is that it does not 
differentiate between local and global memory. 
This offers many useful properties such as a simple 
mechanism for implementing a distributed data base.
5. Inter Process Communication (IPC)
There are three methods by which Ramrod can achieve 
IPC, thus making it a good test bed for developing 
ideas about IPC:
(a) via the I/O module using Ethernet
(b) via the common, shared bus
(c) via the Master Processor
6. Readily Available Components
The architecture, although novel, uses freely 
available components and thus maintainability and 
extensibility are assured.
8.2 Commercial Viability of Ramrod
Ramrod can be used in applications as diverse as 
process control on the one hand and data base management on 
the other hand, and this is perhaps one of its most important 
contributions to technology. In addition the system is 
relatively cheap but powerful. A cursory calculation shows 
that the cost of developing and marketing this computer can 
be in the region of R25,000 - R35,000 thus placing it in the 
lower bracket of minicomputers, with, of course, more 
relative power.
The cost of software development for the purpose of testing 
Ramrod has been included in the above calculations but the 
cost of producing software for making the machine as 
versatile as is claimed in Chapter 1 could not be 
ascertained and is clearly considerable.
8.3 Critical Analysis of Ramrod
It has been claimed that Ramrod can support 50
microprocessors and 50 memory segments, but the 
investigation carried out into the ability of ECL to drive 
the circular bus (see appendix C) shows that additional 
circuitry is required. This might increase the propagation 
delay, which would have an effect on the overall system 
throughput.
The actual operation of Ramrod was marred by problems
relating to the construction of the ECL bus. Whilst the 
timing was shown to be viable, the critical nature of this 
timing made it subject to temperature problems.
ECL dissipates a great deal of heat, the amount depending 
on the level of the power supplies and on the termination
resistors, which in turn affect the ECL logic levels. Any
variation in ambient temperature clearly affects all these
parameters, and during hot weather the system suffered from
occasional intermittent faults, attributed to timing
problems.
Finally, the choice of the actual physical bus was a poor 
decision - the interconnection from the edge connectors to 
the scotch-flex system proved to be unreliable and was the 
source of many mechanical failures.
8.4 Future Enhancements
The basic design of Ramrod is sound but there is still room 
for improvement which can be achieved by:
1. Improving the present design to overcome mechanical 
problems resulting from the physical bus 
construction
2. The incorporation of those features mentioned in 
Chapter 1 so as to permit a fully operational 
vehicle which may be used in long-term experiments.
In particular the following points require attention:
1. In order to ensure stable power supplies on each 
board, regulators must be resident on each printed 
circuit board.
2. The latch module needs to be redesigned so that the 
ECL chips are closer to the bus.
3. Both the TTL and ECL Busses must be constructed 
from a flexible printed circuit board so as to 
provide a more reliable mechanical structure.
4. In order to increase the number of slave processors
and memory segments, the loading of the ECL bus 
chosen needs additional investigation.
5. In order to include an intelligent I/O interface 
the work on Ethernet needs completion.
6. Better software support is needed to develop the
microcode. A related project is investigating a
highly flexible microassembler and emulator [WILD].
7. Perhaps standard processor and memory cards could 
be used instead of purpose-built hardware thus 
making Ramrod universal.
8.5 Conclusion
In summary, Ramrod has been designed and built to a 
prototype stage and tests were run to show its viability. 
Although it suffers from certain problems relating to the 
mechanical structure and is also somewhat temperature 
dependent, it has proved to be a most successful and in many 
ways unique design. In providing an extremely powerful 
computer which makes use of freely available components it 
has met its prime design objectives and shown much 
promise for future development.
APPENDIX A 
EMITTER COUPLED LOGIC
A.1 Introduction
A comparison of the fastest commercially available 
state-of-the-art technologies (ECL and AST) as well as a 
discussion on the use of ECL is outlined in the following
pages.

Emitter Coupled Logic

Advantages                          Disadvantages
Propagation delay 2-3ns             Has different power supplies
                                    from standard TTL
Low output impedance                Has different logic levels
                                    from standard TTL
Can drive transmission lines        Extra power supply for
                                    transmission line
Very high fan-out                   All outputs need pull down
                                    resistors
Complementary outputs               High power dissipation
High output drive capability        Large ground plane needed
Slow rising edges
Wire-ORing possible
High input impedance, therefore
unused inputs go low

Advanced Schottky TTL

Advantages                          Disadvantages
TTL compatible, i.e. same           Propagation delay twice as
levels, power supplies              long as ECL
Low power consumption               Cannot drive transmission lines
High noise immunity                 No wire-ORing
                                    Fast output transition,
                                    therefore reflections and
                                    crosstalk
                                    Thresholds: low levels
                                    slightly offset from TTL

Emitter Coupled Logic is a non saturating form of digital 
logic which eliminates transistor storage time as a speed 
limiting characteristic and permits very high speed
operation. "Emitter Coupled" refers to the manner in which 
the emitters of a differential amplifier within the
integrated circuit (IC) are connected. The differential 
amplifier provides high impedance inputs and voltage gain 
within the circuit. Emitter follower outputs restore the 
logic levels and provide low output impedance for good line
driving and high fanout capability.
A typical ECL gate structure is shown in figure A-1 as well 
as the available separate functions.
ECL has two ground inputs which eliminate crosstalk between 
circuits in a package. In order that unused inputs may be 
left open, 50 kilo-Ohm "pinch" resistors drain input 
transistor leakage current and hold these unused inputs at a 
fixed logic zero level.
Typical logic levels for ECL are -0.98v which is a logic 
high level and -1.75v the logic low level.
In order to increase logic flexibility, speed and power 
efficiency two techniques of connecting the differential 
amplifiers are used. Figure A-2 illustrates the SERIES 
GATING technique which permits the generation of up to 2n 
logic functions from n inputs with one current source, while 
COLLECTOR DOTTING (illustrated in figure A-3) allows the 
logic AND function to be achieved by interconnecting one 
collector node of separate differential current switches 
together. A third technique, WIRE-ORing, enables the logic 
OR function to be generated by tying together two or more 
emitter follower transistors. A disadvantage of ECL is that 
the number of WIRE-OR connections is limited to 6. 
Therefore bus drivers need to be used when this limit 
is exceeded.
FIGURE A-1 ECL STRUCTURE (by courtesy of Motorola Inc.)
FIGURE A-2 SERIES GATING (by courtesy of Motorola Inc.)
FIGURE A-3 COLLECTOR DOTTING (by courtesy of Motorola Inc.)
FIGURE A-4 10804 LATCH (by courtesy of Motorola Inc.)
Using ECL for high speed logic design can result in more 
problems than using AST transistor-transistor logic: the 
gate propagation delay is only about 2ns, so the delays
introduced by the wiring become significant. Therefore 
wiring lengths should be reduced as much as possible. Using 
wiring with a delay of 2.0ns/ft means that there is 
approximately one gate delay for every foot of wiring.
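The rule of thumb above can be sketched as a small calculation. The two constants are the figures quoted in the text; the function names are illustrative only.

```python
# Figures quoted in the text: wiring delay of 2.0 ns per foot and a
# typical ECL gate propagation delay of about 2 ns.
WIRING_DELAY_NS_PER_FT = 2.0
GATE_DELAY_NS = 2.0


def wiring_delay_ns(length_ft: float) -> float:
    """Delay contributed by a wiring run of the given length in feet."""
    return length_ft * WIRING_DELAY_NS_PER_FT


def equivalent_gate_delays(length_ft: float) -> float:
    """Express the wiring delay as a number of typical gate delays."""
    return wiring_delay_ns(length_ft) / GATE_DELAY_NS
```

A six-inch run, for instance, costs half a gate delay under these figures, which is why the layout has to be treated as part of the timing budget.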
Transmission line principles should be employed in order to 
design interconnections between ICs. Line lengths approach 
the quarter wavelength of the signal and therefore 
distortion and reflections can occur. Lines must be 
properly terminated with matching impedances to avoid these 
and other associated problems.
ECL designers have further minimised crosstalk by
deliberately slowing the rise and fall times to more than 
3ns.
Manufacturers recommend that only one-sided printed circuit 
boards be used, keeping the second side as a ground plane in 
order to reduce noise generation as well.
The characteristic impedance $Z_0$ of a single line over a 
ground plane separated by a dielectric medium, i.e. a 
microstrip line, is calculated by:

$$Z_0 = \frac{87}{\sqrt{\epsilon_r + 1.41}} \ln\left(\frac{5.98\,h}{0.8\,w + t}\right)$$

where $\epsilon_r$ = relative dielectric constant 
$w$ = width of microstrip 
$t$ = thickness of microstrip 
$h$ = thickness of printed circuit board
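As a minimal sketch, the microstrip formula can be evaluated numerically as follows. The function name and the board dimensions used in the usage note are assumptions chosen for illustration; they are not measurements from Ramrod.

```python
import math


def microstrip_impedance(er: float, w: float, t: float, h: float) -> float:
    """Characteristic impedance in ohms of a microstrip line.

    er: relative dielectric constant of the board material
    w:  width of the microstrip
    t:  thickness of the microstrip
    h:  thickness of the printed circuit board
    w, t and h must share the same length unit.
    """
    if min(er, w, t, h) <= 0:
        raise ValueError("all parameters must be positive")
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h / (0.8 * w + t))
```

With the assumed values er = 4.7, h = 1.6 mm, w = 2.8 mm and t = 0.035 mm the result is close to 50 ohms; widening the track lowers the impedance.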
ECL's logic levels of -0.98v and -1.75v are derived from the 
-5.2v power supply. The reason for this power supply as 
opposed to the normal +5v supply is that it helps to reduce 
noise generation when the emitter followers switch from one 
level to the other.
The designers of ECL circuits incorporated another useful 
feature into their designs by including at least one 
inverted output signal in an IC package. For example the 
10104 quad 2 input AND gate has one inverted output, i.e. 
the NAND function is derived.
The 10195 HEX INVERTER/BUFFER has 6 EXCLUSIVE-OR gates with 
one input commoned. Therefore the IC can be configured as a 
buffer or inverter.
A.2 Sample Data
The 10104 Quad 2 Input AND gate
Propagation delay is 2."ns typical while rise and fall times 
are approximately 3ns. The power consumed per gate is 
35mW (no load).
The 10804 ECL/TTL Inverting Bidirectional Transceiver with
Latch
Referring to the block diagram in figure A-4 the reader will
notice that there are four control signals needed to operate
this package. The OUTPUT DISABLE when at a logic low level
disables both the ECL and TTL output buffers, while at a
logic high level these buffers are enabled. The ECL/TTL
signal allows control of the direction of data transfer and 
translation.
The LATCH BYPASS select line allows the latch circuitry to 
be bypassed for fast data transfer. When it is at a logic low 
level data is directed to both the latch input and output 
buffer simultaneously, and this enhances the speed of 
translation and throughput.
APPENDIX B
MICROPROGRAMMING AND BIT-SLICE TECHNOLOGY
Bit-slice technology and microprogramming are reviewed in
this appendix , in order to provide a general background of
the master controller which has been developed to control
the operation of the processors in the multiprocessor 
structure.
Bit-slice microprocessor families are not revolutionary; 
rather they represent a new stage in the evolution of the 
design of central processing units (CPUs).
In machines designed from small scale integrated technology, 
where integrated circuits could only hold a small number of 
basic components, the ALU would occupy one printed circuit 
board, the registers another board, and so on, so a complete 
CPU would occupy many boards or cards. The logic was 
commonly separated into n bit wide sections, thus one card 
would contain a small chunk of the total processing unit, and 
the cards were cascadable.
With the introduction of MSI and LSI it became economically
feasible to include more of the control logic on one
'chip', and eventually the single 'chip' microprocessor was 
developed.
The bit-slice microprocessor represents a further stage in 
the development of microprocessor technology in that the 
processor is again sliced as before, but this time each 
'chip' is a complete chunk, and can be cascaded to form an 
n bit wide processor. In addition the bit-slice 
microprocessor has been specifically designed to be used in 
microprogrammed machines.
The organisation of a conventional computer is shown in 
figure B-1. Essentially, four major sections may be
identified:
the memory
the input/output facilities 
the ALU
the control unit
The control unit or central processing unit (CPU) provides 
for overall control of the various sections of the computer.
FIGURE B-1 CONVENTIONAL COMPUTERS
The organisation of a microprogrammed computer structure is 
shown in figure B-2. The essential difference between the 
above two structures lies in the mode of operation of the 
CPU. In the microprogrammed computer, the control store 
contains sets of primitive operation codes, which are termed 
microinstructions. Each component part of a
microinstruction specifies an elementary logical or 
arithmetic process to be effected in the computer. A 
machine code instruction is executed by a series of 
microinstructions contained in the control store.
FIGURE B-2 TYPICAL MICROPROGRAMMED COMPUTER
Microprogramming allows the designer flexibility in the 
design of his instruction set. A macro instruction or 
machine code instruction is performed by executing several 
micro instructions in sequence. The machine code 
instruction is used as a pointer to this sequence. These micro 
instructions are usually stored in a control memory within 
the bit-slice architecture.
The micro-instruction word is broken up into several fields, 
each of which defines a particular function within the 
machine. Thus the longer the word the more is achieved in 
any one instruction and the faster the computation. The 
designer has to analyse the trade-off between the width and 
the depth of the instructions. A micro-instruction word is 
typically 32, 56, 64 or 128 bits wide.
The structure of the system can be altered by the 
organisation of the microprogram word fields, allowing the 
design to closely match the function it must perform.
A bit-slice microprocessor system requires a lot more 
components than the two previously mentioned microprocessor 
designs, and will therefore be more expensive and consume 
more power, but will be more powerful and faster.
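To make the idea of a micro-instruction word divided into fields concrete, the following sketch packs and unpacks a 32 bit word. The field names and widths are hypothetical and chosen only for illustration; they are not the layout used in the Ramrod master processor.

```python
# Hypothetical 32 bit micro-instruction layout: (field name, width in bits),
# listed from the most significant field to the least significant.
FIELDS = [
    ("branch_address", 12),
    ("sequencer_op", 4),
    ("alu_function", 6),
    ("alu_source", 3),
    ("alu_destination", 3),
    ("condition_select", 4),
]
WORD_WIDTH = sum(width for _, width in FIELDS)  # 32 bits for this layout


def pack(values: dict) -> int:
    """Assemble the fields into one micro-instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Split a micro-instruction word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Widening the word means adding entries to the field list, so that more functions are controlled in one micro-instruction at the cost of a wider control memory. This is the width versus depth trade-off described above.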
A control memory, usually a programmable read only memory 
(PROM), contains the microprogram words. The operation of 
the system is as follows: A sequence of micro-instructions
in this memory is executed to fetch an instruction from 
external main memory, which is then decoded and passed 
through a mapping PROM to generate the address of the first 
micro-instruction which is to be executed to perform the 
required macro-instruction. The sequencer controls the 
branch to the required address. The instructions are 
fetched from the control memory and then other operations 
such as ALU functions, testing etc. are performed by the 
rest of the system. Then a branch is made back to the 
instruction fetch cycle, at which point there may be 
branches to other sections of micro-code.
The pipeline register essentially splits the system into two 
parts. It contains the micro-instruction currently being 
executed. This instruction is fed to the rest of the system 
which performs the required operation while the next 
instruction is fetched and placed in the pipeline register. 
Thus the presence of this register allows the 
micro-instruction fetch cycle to occur in parallel with the 
data operation rather than serially, effectively doubling 
the clock frequency.
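The effect of the pipeline register can be sketched with a simple timing model. The model and the timings used in the usage note are assumptions made for illustration: each micro-instruction needs one control memory fetch and one data operation, and with the pipeline register the two overlap.

```python
def serial_time_ns(count: int, fetch_ns: int, execute_ns: int) -> int:
    """Total time when each fetch must finish before its execution starts."""
    return count * (fetch_ns + execute_ns)


def pipelined_time_ns(count: int, fetch_ns: int, execute_ns: int) -> int:
    """Total time when the next fetch overlaps the current execution.

    The first fetch cannot be overlapped; after that, one micro-instruction
    completes every cycle, the cycle being set by the slower of the two steps.
    """
    if count == 0:
        return 0
    cycle_ns = max(fetch_ns, execute_ns)
    return fetch_ns + count * cycle_ns
```

For 1000 micro-instructions with an assumed 100 ns fetch and 100 ns data operation, the serial figure is 200000 ns and the pipelined figure is 100100 ns, a speed-up that approaches two only when the fetch and execute times are equal.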
APPENDIX C 
THE MODELLING OF THE CIRCULAR BUS
C.1 Introduction
An investigation into the operation of the parallel ECL 
circular bus was undertaken by Messrs Bradford and Hunter as 
a final year undergraduate project and was supervised by the 
author.
A preliminary literature survey showed that very little 
information is available in the field of circular busses. 
Zoccoli and Sanderson [ZOC] claim that they use a circular 
bus for their computer but do not give enough 
detail. It is known, as well, that the Cray super-computers 
use circular ECL busses but there is no information about 
this for general public consumption.
Therefore in order to fully understand the operation of the 
bus it was decided to model the bus as well as conduct 
practical experiments.
C.2 Model of the Bus
A computer program was used to simulate the operation of the 
bus and this mathematical model was compared against the 
measurements observed practically. (The program can be
obtained from the Dept. of Elec. Eng. at the University 
of the Witwatersrand).
The model assumes that there are no dielectric or copper 
losses and therefore the characteristic impedance of the bus 
$Z$ becomes:

$$Z = \sqrt{L/C}$$

where $L$ = .56 micro-henry's/m 
and $C$ = 82 pico-farads/m, 
therefore

$$Z = 82 \text{ Ohms}$$

But taking into account the capacitive loading of the edge 
connectors of 2 pico-farads/connection, $Z$ = 72 Ohms.
The propagation delay of the bus is

$$T = \sqrt{LC} = 7.85 \text{ nano-seconds/m}$$
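The lossless-line relations used by the model can be checked numerically with the quoted values of L and C. This is a sketch only: the function names are illustrative, and the loaded capacitance is inferred here from the quoted 72 Ohm figure rather than taken from the thesis.

```python
import math

L_PER_M = 0.56e-6   # inductance per metre quoted in the text (H/m)
C_PER_M = 82e-12    # capacitance per metre quoted in the text (F/m)


def characteristic_impedance(l_per_m: float, c_per_m: float) -> float:
    """Z = sqrt(L/C) for a lossless line, in ohms."""
    return math.sqrt(l_per_m / c_per_m)


def delay_per_metre(l_per_m: float, c_per_m: float) -> float:
    """T = sqrt(L*C) for a lossless line, in seconds per metre."""
    return math.sqrt(l_per_m * c_per_m)


def capacitance_for_impedance(l_per_m: float, z_ohms: float) -> float:
    """Capacitance per metre that yields the impedance z_ohms (assumption:
    connector loading only adds capacitance and leaves L unchanged)."""
    return l_per_m / z_ohms ** 2
```

Under these assumptions the unloaded line gives about 82.6 ohms and about 6.8 ns/m. Using the capacitance implied by the loaded 72 ohm figure (roughly 108 pF/m) gives about 7.8 ns/m, which suggests that the quoted delay refers to the loaded bus.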
Similarly the characteristic impedance of the tracks and 
their propagation delay are:
where t = thickness of the track 
and w = width of the track
f L . C ) * * 1/2 =17.36 nano-seconds/ it 
er = relative dielectric constant
Each board in the system has a terminating resistor to -2 
volts and when all of the gates are disabled the voltage on 
the bus settles to -2 volts. If a transceiver is enabled to 
transmit a high level (-.85V) then there is a voltage swing 
of 1.15 V in 3.5 nano-seconds (propagation delay of the 
gate). It must be noted that receiving gates represent the 
same high impedance to the bus as do inactive gates. This 
disabled to high level transition as well as the inverse 
transition only are considered as the voltage swings are 
large compared to the other voltage swings (.375 V).
The Thevenin Equivalent of a driving gate is shown in figure 
C.1 and has a Vth = .7 V and a source impedance of 7 Ohms 
irrespective of the load current.
As the model assumes no dielectric or copper losses, direct 
modelling of lumped capacitance is prevented, and the 
capacitance is rather modelled as being distributed. The
rise and fall times are modelled as being linear.
FIGURE C-1 THEVENIN EQUIVALENT OF DRIVING GATE
(Rt = 50 ohms, Zt = 50 ohms, Rth = 7 ohms, Vth = -0.689 Volts)
C.3 Results
Graph C.1 shows a comparison of the predicted and observed
results on various boards on the bus, for a termination
resistance of 270 Ohms. It reveals a difference in the rise
and fall times of 2 - 3 nano-seconds which can be attributed 
to the assumption of a lossless line.
GRAPH C-1 COMPARISON BETWEEN PREDICTED AND 
OBSERVED RESULTS (Boards 1, 3 and 5)
Graphs C.2 and C.3 compare the rising and falling edges for 
different terminating resistors and it appears that there is 
a critical resistance for a good termination.
GRAPH C-2 COMPARISON OF RISING EDGES FOR
VARIOUS TERMINATION RESISTORS
GRAPH C-3 COMPARISON OF FALLING EDGES FOR
VARIOUS TERMINATION RESISTORS (Boards 1 and 3)
Graph C.4 shows the cross section voltage along the bus at 
various boards. The slope of the wavefront determines 
whether an overshoot will occur or not.
GRAPH C-4 COMPARISON OF THE VOLTAGE CROSS-SECTION
ON THE BUS AT VARIOUS TIMES FOR VARIOUS
TERMINATION RESISTORS
As the signal passes each board its magnitude is decreased 
and hence the slope of the voltage cross section becomes 
steeper than the critical slope and no overshoot occurs. 
Decreasing the termination resistance decreases the 
transmission coefficient and no overshoot is obtained. This 
also increases the reflection coefficient and allows less of 
the incident pulse to arrive at the gate. If this 
resistance is chosen carefully enough then the reflection 
coefficient can be increased to allow large reflections but 
not have too much of an overshoot, and allow enough of the 
pulse to arrive at the gate for correct detection.
C.4 Conclusion
ECL gates can drive a 50 Ohm load terminated to -2 V. For a 
high level output (-.85 V) the current drawn is 23 milli-amps 
(mA) but the manufacturers claim that MECL 10,000 series can 
source 50mA for surge conditions.
Once a stable steady state logic level is reached then there
is a constant flow of current and only DC conditions apply.
Thus for 10 boards there are 10 resistors connected in 
parallel, therefore the effective resistance is R/10. At the high
level (-.85 V) the current drawn is 1.15/(R/10). This current 
must not exceed 50mA, thus R > 230 Ohms and a 270 Ohm
termination resistance is recommended.
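The DC argument above can be sketched as a short calculation. The 1.15 V swing and the 50mA surge limit are the figures from the text; the function names are illustrative.

```python
SWING_V = 1.15        # swing from -2 V to the -.85 V high level
MAX_SURGE_A = 0.050   # surge current quoted for MECL 10,000 series


def drive_current_a(r_per_board_ohms: float, boards: int) -> float:
    """Current drawn at the high level with one terminator per board,
    all terminators being in parallel (effective resistance R/boards)."""
    return SWING_V / (r_per_board_ohms / boards)


def min_termination_ohms(boards: int, max_current_a: float = MAX_SURGE_A) -> float:
    """Smallest per-board resistor that keeps the current within the limit."""
    return SWING_V * boards / max_current_a
```

For 10 boards the limit comes out at 230 ohms, and the recommended 270 ohm resistors draw roughly 43mA. A single 50 ohm load draws the 23mA quoted earlier. Because the result scales with the number of boards, the termination has to be chosen for a fixed configuration.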
This termination is however for a fixed number of boards and 
if the number varied then the current and logic levels would 
be changed. Figure C.2 shows a resistor-capacitor network 
which overcomes this problem. R2 is chosen so that at 
stable conditions the equivalent load is 50 Ohms and Rcap 
= 23 Ohms which allows the maximum 50mA surge current to 
flow. The capacitance slows down the rise and fall times 
but improved logic levels are introduced.
FIGURE C-2 CAPACITOR RESISTOR NETWORK
APPENDIX D 
CURRENTLY AVAILABLE MULTIPROCESSORS
Currently available multi-microprocessors are reviewed
below, in order to appreciate how the author chose the 
present structure of Ramrod.
D.1 CYBA-M
Cyba-M was a vehicle for research into multi-microprocessor 
systems initially undertaken by Swansea University College, 
and now at UMIST in Manchester. Figure D-1 shows its 
basic structure, consisting of 15 identical Processing 
Elements, each of which comprises a microprocessor, a switch 
and some local memory. The global memory is a 10 Mbyte/sec 
memory, accessed through a 16 port switch, which determines 
the highest priority request generated by the node switches. 
The Image memory (which provides the I/O facilities) is a 
distributed bus structure with a maximum data rate of 2.5 
Mbytes/sec. It is accessed through another 16 port switch 
which is functionally identical to the Global Memory Switch. 
The 16th port is for use by the command console, which 
exercises total system control.

FIGURE D-1 CYBA-M
The disadvantages are:
The command console, similar to a Master Processor, is very
complex from both the hardware and software points of view.
The Global Memory is very fast and, being multiport, is
therefore very expensive. In addition, the priority
circuitry is complex. The switches are relatively simple
(2-1 multiplexers) but nevertheless add to the complexity of 
the whole system.
D.2 The Siemens 4004/220/230
The design is based on the star configuration and comprises 
a dedicated central processor, a dedicated input/output 
processor, a hard wired maintenance processor and a memory 
system. All these work asynchronously and exchange 
information via a co-ordinator (figure D-2).
FIGURE D-2 SIEMENS 4004/220/230
The disadvantages are:
Each processing element is dedicated to a particular function 
and therefore if it fails, that function can no longer be 
carried out. There is no redundancy in the system to allow 
for such failures, and the system, although it has a 
maintenance processor, is not able to readily recover from 
faults.
D.3 The Siemens SMS 201
The SMS 201 has a Multiple Instruction Multiple Data (MIMD) 
structure for high speed numerical computations. Each 
processor (PR) has a dedicated Arithmetic Processing Unit 
attached to it. In addition each processor has its own 
program and data memory as well as a communication memory 
(CM) which connects the module to other modules, and to a 
main processor (MPR) via an interconnection network (ICN) 
(figure D-3).
FIGURE D-3 SIEMENS 201
The disadvantages are: Too much reliance is placed on the
main processor. The communication memory is the channel for 
inter-processor communication and as such is quite 
complicated and therefore expensive. The interconnection 
network must be sufficiently intelligent to cater for 
priorities and to resolve conflicts.
D.4 The Carnegie-Mellon C.mmp
The multiprocessor is comprised of 16 DEC PDP-11 
minicomputers, each having its own private memory space and 
own input/output device. The PDP-11 Unibus is used for I/O 
as well as for inter-processor communication. There is a 
large shared memory which is accessed by the processor's 
address translator through a 16 by 16 crossbar switch 
(figure D-4).
FIGURE D-4 C.mmp
The disadvantages are :
The crossbar switch is complex and expensive and the address 
translator has to be able to resolve memory conflicts. 
Although there is no main processor the system is not fault 
tolerant, as a task on a failed processor module cannot be 
re-allocated.
D.5 The Banyan Multi-microcomputer System (BMS)
The BMS is composed of 15 Z8001 processors interconnected
with 15 memory segments by a 4x4 crossbar switch. The 
interconnection is fully parallel, unidirectional and 
packet switched. Overall control resides in a Vax 11/780 
which accesses the rest of the system, via a Unibus adapter, 
using I/O transactions.
The BMS has the disadvantage of a complex crossbar switch. 
In addition there are local interfaces (I/Fn) to provide 
communication between the crossbar switch (SN) and the 
processors (Pn) or the memories (SMn) (see figure D-5).
FIGURE D-5 THE BANYAN MULTI-MICROCOMPUTER SYSTEM [McD]
D.6 INTEL iAPX 432 Multiprocessor System
The iAPX 432 is a 32 bit microprocessor which has an ADA 
compiler. It comprises two chips forming a General Data 
Processor (GDP). It has been designed for multiuser 
applications and offers the user transparent
multiprocessing, i.e. the number of GDPs can be increased 
or decreased without the software having to be rewritten.
The designer is free to choose his own bus structure and the 
432 uses a standard interconnection protocol. Input/Output 
is achieved through the Interface Processor (IP) which 
programs a group of programmable associative memories 
(window registers) to map the I/O subsystem's address space.
The 432 uses virtual addressing such that only 7% of 
microprogram space is used. The 432 can operate in two 
modes: In the master mode a component operates normally
whilst in the checker mode the output pins reverse 
themselves and operate as special input pins. These pins 
sample data and compare this data to the data that would 
have been sent if the chip was operating in the master mode. 
Thus a highly fault-sensitive system can be built.
Instructions can vary in length from zero to three operands, 
and can thus support scalar, vector and record data types, 
such as found in ADA. There are no registers; memory and 
a hardware supported special stack are used for operands.
The architecture is object-orientated, and objects 
provide an identical framework for everything from simple 
bytes to messages that are sent to another processor. Objects 
are stored in segments of the address space, and they are 
always addressed via an object descriptor which contains 
information pertaining to the type and location. An access 
descriptor indicates the location of the object descriptor 
which is the only way to address an object. Thus the 432 
has a two level operation for memory requests.
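The two level operation can be sketched as follows. This is a conceptual model only: the class names, field names and table structure are hypothetical and do not reproduce Intel's actual descriptor formats.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectDescriptor:
    """Hypothetical descriptor holding the type and location of an object."""
    object_type: str
    segment_base: int
    length: int


@dataclass
class AccessDescriptor:
    """Hypothetical reference that locates an object descriptor."""
    descriptor_index: int


def read_byte(memory: bytes, object_table: List[ObjectDescriptor],
              access: AccessDescriptor, offset: int) -> int:
    """Read one byte of an object through both levels of indirection."""
    # Level 1: the access descriptor locates the object descriptor.
    descriptor = object_table[access.descriptor_index]
    # Level 2: the object descriptor locates and bounds the segment.
    if not 0 <= offset < descriptor.length:
        raise IndexError("offset lies outside the object")
    return memory[descriptor.segment_base + offset]
```

Because every reference passes through the object descriptor, the type and bounds of an object can be checked on each access, which is one way to picture the protection mechanisms described for the 432.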
The 432 has a hardware operating system, can handle 
complex software applications, and has many software
protection mechanisms as well as an extensive hardware fault 
detection mechanism. Thus it is very powerful and offers 
the computer architect an ideal basis for developing a
real-time multiprocessing system (figure D-6).
FIGURE D-6 THE INTEL 432 SYSTEM [RAT]
APPENDIX E 
INPUT/OUTPUT INTERFACING
The requirements of the input/output modules of Ramrod are 
listed, and a brief introduction to Ethernet is given, 
with a view to using Ethernet as a communication medium on 
the I/O side of Ramrod.
E.1 Requirements
The input/output section of the multiprocessor system is 
required to handle communications between peripherals and 
processors on the one hand and between processor and 
processor on the other hand. Therefore the I/O bus must 
have a high degree of intelligence.
The I/O bus must have the same facilities as the TDM common 
memory bus discussed earlier, that is if a processor fails 
then another processor must be able to 'hook' onto the now 
vacant peripheral. Processors must be transparent to each 
other and to the peripherals, and must be able to communicate 
with any device that is connected to the bus.
Another important feature required from the intelligent I/O 
is that there be no fixed master controller of the bus, that 
is, if the controller fails another device can become the
controller. This increases reliability and provides for 
redundancy.
In order to implement inter-task communication on the I/O 
bus the message's destination will probably be another 
task's identity, and the bus will have to be clever enough 
to determine which processor is executing this task.
The interface to the bus needs to be modular and relatively 
simple so that it can fit onto one printed circuit board 
similar to the TTL/ECL interface boards, and it should be 
bidirectional. If an interface board is removed the system 
should not be affected, and at least 50 processors and 50 
peripherals must be able to be connected to the system.
In addition the software overhead for protocols which 
control the information transfer between transmitting and 
receiving devices must not be too high.
E.2 Ethernet
E.2.1 Introduction
A project involving the design of an Ethernet Controller was
undertaken by S.A. E l U s o v as an MSc project in the Dept.
of Elec. Eng. at the Univ. of the Witwatersrand, with the
idea of incorporating Ethernet on the Input/Output side of 
Ramrod.
Ethernet is a local area network which evolved out of the
Aloha network at the University of Hawaii. Studies of the
Aloha network revealed a number of problems and refinements
were undertaken at the Xerox Palo Alto Research Centre in 
the mid 1970's.
E.2.2 Network Configuration
The maximum network configuration is as follows:
1. A coaxial cable, terminated in its characteristic
impedance at each end, constitutes a cable segment.
A segment may contain a maximum of 500 meters of 
coaxial cable.
2. A maximum of 100 station transceiver connections 
may be made per segment.
3. Segments can be joined together using repeaters,
provided that the longest path between any two
transceivers is less than 1500 meters, and that
there are no more than 2 repeaters in the path 
between any two stations.
4. Repeaters do not have to be located at the ends of 
segments, nor is the user limited to one repeater 
per segment. In fact, repeaters can be used not 
only to extend the length of the channel, but to 
extend the topology from one to three-dimensional.
A station on the Ethernet network is connected to the 
coaxial medium via an Ethernet controller. The controller 
is joined to a transceiver, which is fixed on to the coaxial 
cable, by a transceiver cable consisting of six shielded 
twisted pairs not more than 50 meters in length.
E.2.3 Message exchanging in Ethernet 
E.2.3.1 The Transmitting Station
Before broadcasting, the transmitting station must ensure 
that no other station is busy using the medium. This is 
achieved by "carrier sensing" whereby the transmitter of a 
station is prevented from becoming active until all 
transitions on the coaxial cable have ceased.
As there is nothing to prevent two or more stations from 
scheduling a transmission for the same message slot, 
collisions will occur. Due to the ability of a station to 
"carrier sense", collisions will only occur at the start of 
a message. The time interval during which collisions can 
occur is called the "collision window", which is long enough 
to allow for signals to propagate throughout the medium.
When a collision does occur, the transmitting station must 
stop transmitting its message and start transmitting a 
"jam". A "jam" is a burst of noise that ensures that all
nodes will detect that a collision has taken place. After
sending the jam, the station controller will enter a binary 
exponential backoff algorithm to randomise the re-scheduling 
of the transmission. In order to take into account 
increased traffic during busy periods, the backoff algorithm 
increases its mean value exponentially with the number of
collisions of the message.
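The backoff step described above can be sketched as follows. This is only an illustration of a binary exponential backoff, not the controller's actual logic: the names backoff_slots and mean_backoff_slots, the cap on the exponent and the limit on the number of attempts are assumptions that do not appear in the text.

```python
import random

MAX_EXPONENT = 10   # assumed cap on the doubling of the backoff range
MAX_ATTEMPTS = 16   # assumed number of attempts before the message is abandoned


def backoff_slots(collisions, rng=random):
    """Pick a random delay, in slot times, after `collisions` collisions."""
    if collisions < 1:
        raise ValueError("backoff applies only after at least one collision")
    if collisions > MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, transmission abandoned")
    exponent = min(collisions, MAX_EXPONENT)
    # The delay is uniform over 0 .. 2**exponent - 1 slots, so the range
    # doubles with each further collision of the same message.
    return rng.randrange(2 ** exponent)


def mean_backoff_slots(collisions):
    """Mean of the distribution that backoff_slots draws from."""
    exponent = min(collisions, MAX_EXPONENT)
    return (2 ** exponent - 1) / 2
```

Under these assumptions the mean delay is half a slot after the first collision and three and a half slots after the third, which is the exponential growth of the mean referred to in the text.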
E.2.3.2 The Receiving Stations
The receiver must continually monitor the line to detect any 
broadcasts. Message packets are broadcast randomly over the 
medium. In order for the receiver to extract the data from 
the information stream, a synchronization burst must precede 
the transmission.
All messages must be examined to determine their destination 
address. Each station on the Ethernet can be addressed in 
the following ways:
1. Physical Address : A unique address associated
with the station, and distinct from the address of 
any other station on any Ethernet.
2. Multicast Address : An address that can be set up
under software control that will be accepted. This 
means that more than one station can use the same 
address.
3. Broadcast Address : This address is accepted by
all stations on any Ethernet system. It can be
used by a station when it is connected on to the
network to indicate that it has become an active 
station.
Once a message has been accepted by the receiver, it must 
first perform an error check to determine if there were any 
transmission errors, before handing the message packet to 
the host processor.
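The three ways of addressing a station can be illustrated with a small acceptance filter. This is a sketch only: the Station class, its method names and the all-ones value used for the broadcast address are assumptions made for the example, and the error check that follows acceptance is not modelled.

```python
BROADCAST = 0xFFFFFFFFFFFF  # assumed all-ones 48 bit broadcast address


class Station:
    """Hypothetical receiver-side address filter for one station."""

    def __init__(self, physical_address):
        self.physical_address = physical_address
        self.multicast_addresses = set()  # set up under software control

    def join_multicast(self, address):
        """Enable reception of one multicast address."""
        self.multicast_addresses.add(address)

    def accepts(self, destination):
        """True if a packet with this destination address is accepted."""
        return (destination == self.physical_address
                or destination == BROADCAST
                or destination in self.multicast_addresses)
```

A station accepts its own physical address and the broadcast address at all times, and a multicast address only after software has enabled it.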
E.2.4 Comparison of Ethernet
In Token Bus [RAV] nodes are connected to a common bus in a 
virtual ring. In order to transmit a node must be in
possession of the 'token', and therefore the method of
access is highly organised and there is an absence of 
collisions. However there is a possibility that a faulty
node could create a duplicate token or that the token could
get lost. This means that extra logic is needed to prevent 
these possibilities.
Ring network [RAV] on the other hand interconnects nodes in a 
loop with messages travelling around the loop in one 
direction. Access is deterministic and priorities can be 
assigned thereby preventing collisions. However as each 
node acts as a repeater, the reliability of the network 
depends on the reliability of a single node. The removal of 
a node from the network can result in messages circulating 
indefinitely.
Ethernet has a major disadvantage in that as the loading
becomes heavy collisions increase and the channel
utilisation decreases.
E.2.5 Summary
Ethernet, a bit serial communication medium, can operate at 
up to 10 Megabits per second. A typical packet has a 64 bit 
preamble, 48 bit destination and source addresses, 16 bit data 
type word, 368 to 12000 bits of data, 32 bit Cyclic 
Redundancy Check and a 96 bit packet gap [CRA]. Thus an 
information packet can range from 672 to 12304 bits.
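The packet size range quoted above follows from adding up the field widths, counting the 48 bit address once for the destination and once for the source. The short sketch below repeats that arithmetic; the function names and the transfer time helper are illustrative only.

```python
PREAMBLE = 64
DESTINATION = 48
SOURCE = 48
TYPE_WORD = 16
CRC = 32
PACKET_GAP = 96
DATA_MIN, DATA_MAX = 368, 12000


def packet_bits(data_bits):
    """Total bits occupied on the medium by one packet, gap included."""
    if not DATA_MIN <= data_bits <= DATA_MAX:
        raise ValueError("data field must be 368 to 12000 bits")
    return (PREAMBLE + DESTINATION + SOURCE + TYPE_WORD
            + data_bits + CRC + PACKET_GAP)


def transfer_time_us(data_bits, rate_mbps=10):
    """Time on the medium in microseconds at the given bit rate."""
    return packet_bits(data_bits) / rate_mbps
```

With the minimum data field this gives 672 bits and with the maximum 12304 bits, in agreement with the figures in the text.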
Ethernet, which consists of coaxial bus segments, can be 
expanded passively by adding transceivers and coaxial cable. 
If needed signal strength can be buffered by connecting a 
simple packet repeater.
E.2.6 Protocols
Transferring information packets from one device to another 
requires methods for error correction, flow control, process 
naming, security and accounting. These methods are usually 
termed protocols. Ethernet has a simple error controlling 
packet protocol, called Ethernet File Transfer Protocol 
(EFTP), which is implemented in the interface to Ethernet.
E.3 Conclusion
Ethernet fulfills all the above criteria and is therefore 
the most suitable bus communication medium. However as the 
hardware is not so readily available the Ethernet bus is 
being designed in a related project and until then the I/O 
bus will have dedicated processors for each peripheral.
It should be noted that until recently Ethernet
implementations were not commercially available. Intel has 
announced their NDS-11 network development system [HUG].
A P P E N D I X  F
THE EXOSLICE DEVELOPMENT SYSTEM
An introduction to the Motorola EXORciser Development system 
and the FAST package, which allows a user to emulate and 
design his bit-slice hardware, is given below.
The EXOslice bit slice development system has been designed 
to run on the M6800 EXORciser microprocessor development 
system. It allows the user's slice system to be slaved to 
the EXORciser via Peripheral Interface Adapter (PIA) cards.
The Flexible Aid for Sliced-processor Test (FAST) monitor 
allows the designer to develop and debug programs for use 
in his hardware.
FAST used in conjunction with the Motorola Diskette 
Operating System (MDOS) can be operated in a floppy disk 
environment.
F.1 The EXORciser
The M6800 EXORciser is a system development tool used in 
the design and development of M6800 microprocessor systems. 
Basically the EXORciser assists the system designer by 
allowing debugging of software and hardware emulation.
Once the EXORciser has been loaded the user can look at the 
contents of memory and perform the Motorola Active 
Interface (MAID) functions as listed below.
MAID enables the user to:
i) Examine and change, if necessary, contents of a memory 
location or an MPU register. 
ii) Execute a program.
iii) Single step the program or run until a previously 
inserted breakpoint is encountered.
iv) Perform decimal-octal-hexadecimal conversions as well 
as calculate offsets for the relative addressing mode.
F.2 MDOS
The M6800 Diskette Operating System (MDOS) enables the user 
to develop his software easily on the EXORciser. It is an 
interactive operating system that interprets commands from 
the operator's console.
The user can store or retrieve data, in the form of files, 
on a diskette, process this data or activate other user 
commands from the diskette. There are various system
commands that allow the user, for example, to initiate and 
format diskettes and check them for errors. Command 
chaining can be achieved by storing commands in a special 
command file and then invoking this file. MAID is entered
once an object file has been loaded into the memory space so 
that the program can be executed.
Files can be edited either by using the Co-Resident Editor 
or the updated version EDIT1. The EDIT1 editor 
automatically assigns line numbers to each file line, but 
otherwise is faster and more efficient than the former 
editor.
F.3 MASM
The Macro Assembler (MASM) has been designed for
microprogrammed bit slice processor development.
The user must first of all define his microword size and 
then the mnemonics and the format of the microword in the 
DEFINITION PHASE, which reads a definition source file and 
creates an assembly source file. The definition allows for
implicit or explicit field lengths. Overlapping fields can
be achieved by using 'don't care' fields.
Once a program has been written using the assembly language 
defined in the previous phase it can be assembled during the 
ASSEMBLY PHASE.
When the program has been successfully assembled then the 
resulting object file can be merged with another system file 
to allow it to be loaded during the execution of FAST.
A disadvantage of the macro assembler is that the user must 
actually list the whole microword even though he may not 
wish to use all the fields. The number of fields is limited 
and therefore a long microword with too many fields will 
have to have some of its fields joined together.
F.4 EXOSLICE
EXOslice has been designed to extend the EXORCISER'S 
emulating capability. Once the program has successfully been 
assembled the user's bit-slice hardware can be directly 
coupled to the main system. This is achieved by using the
Flexible Aid for Slice Testing (FAST) program.
The EXOslice subsystem is capable of being connected to the 
ECL 10800 bit-slice family or the 2900 bit-slice family.
The subsystem is made up of the following components:
(a) Input/Output modules which feature 32 ECL output lines, 16 
ECL input lines and 4 ECL output control lines. These can
be expanded to 5 modules thus allowing a 160 bit word
length. The I/O module has 3 Peripheral Interface Adaptors 
(PIA) thus allowing the EXORCISER to read and write words
greater than the 6800's 8 bit word. A decoding Programmable 
Read Only Memory (PROM) allows the FAST software to
consecutively address all output lines followed by all input 
lines.
(b) In order to interface to a TTL 2900 series bit slice 
system an ECL to TTL module is provided for each I/O module.
Control signals are generated from the EXORCISER in order
to allow the user's bit slice system to be slaved to the
EXORCISER. The user's microprogram storage is then
effectively replaced by the main system's Read/Write
storage. The EXOR Clk signal enables the user's system to be 
single stepped.
FAST can also be used without previously using the macro
assembler. Definition can be achieved during the running of 
the FAST program and instructions can be loaded, examined,
changed, inserted or deleted as in any other available 
emulator.
The program FAST emits a clock pulse each time a new 
microword is put out, and the special reset pulses.
Figure F-1 shows the functional steps during a 
microinstruction execution. The line table is a buffer which 
temporarily stores all data going to or coming from the 
hardware interface.
FIGURE F-1 Microinstruction execution steps
Similarly to MAID, FAST enables the user to insert, display 
or remove breakpoints for subsequent program running. The 
program can be executed step by step or free run. User's 
data can be manipulated as files from the diskette and thus 
previously saved or assembled programs can be loaded 
directly while operating in FAST.
Once a word has been successfully defined a hardware 
configuration list can be obtained.
FAST unfortunately has a maximum of 11 fields and the
designer must keep this in mind when designing his 
microword.
In the DEBG or MPGM modes the format of the micro word is
hexadecimal by word and vice versa. A far better
system would be to divide the word into fields defined by
the user and allow him to use the hexadecimal format for
each field. This would decrease debugging time 
considerably.
A P P E N D I X  G
THE CIRCUITRY
G.1 The Microprocessor Module
A timer is gated into the RST pin of the 8085 processor so 
that the processor is reset after a power up sequence. 
This can also be done manually by a RESET button or by the 
Master Controller.
A monostable and D type latch form the basis of the 'watch 
dog' alarm. When a trigger, in this case a read of common 
memory, is not received by the alarm, the processor is held 
by the READY signal, the MC is notified and an LED is 
lit.
The latch control signals are triggered by a master 
processor pulse which enables the processor to latch its 
address and then its data into the latches (write cycle). A 
read is accomplished by latching the address and holding the 
processor until the next cycle when the data returns (figure G-1).
FIGURE G-1 MICROPROCESSOR MODULE
G.2 The ECL latch Module
This module consists of Bidirectional Translating latches
which are controlled by ECL signals translated by a TTL to 
ECL translator. There are termination resistors on every
point of access to the ECL bus so that the bus is terminated
at every output (figure G-2).
FIGURE G-2 ECL LATCH MODULE
G.3 The Memory module
A read/write signal is generated from the R/W signal 
received from the ECL bus by the monostables and a JK flip 
flop. This is done in order to generate the correct width 
pulses for controlling the latches on the memory side 
(figure G-3).
FIGURE G-3 MEMORY MODULE
G.4 The Control Board
A modulo 5 counter, driven by an external clock, accesses 
two identical sets of fast memory. The MC can write the 
desired data to these memories and the counter then reads 
successive locations. The outputs of the memories are the 
master memory pulses and the master processor pulses. It 
should be noted that any combination of pulses can be 
obtained (figure G-4).
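The behaviour of the control board can be modelled in a few lines. The sketch below is a software illustration only, not the circuit: the ControlBoard class, its method names and the use of one integer per time slot as a pulse word are assumptions made for the example.

```python
class ControlBoard:
    """Two identical pulse memories stepped by a shared modulo counter."""

    def __init__(self, slots=5):
        self.slots = slots
        self.memory_pulses = [0] * slots      # master memory pulse words
        self.processor_pulses = [0] * slots   # master processor pulse words
        self.counter = 0

    def write(self, slot, memory_pulse, processor_pulse):
        """Master controller loads the pulse pattern for one time slot."""
        self.memory_pulses[slot] = memory_pulse
        self.processor_pulses[slot] = processor_pulse

    def tick(self):
        """One external clock: output the current slot, then advance."""
        pulses = (self.memory_pulses[self.counter],
                  self.processor_pulses[self.counter])
        self.counter = (self.counter + 1) % self.slots
        return pulses
```

Because the pulse words are simply data written by the master controller, any pattern of pulses can be produced, and the sequence repeats after five clocks.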
FIGURE G-4 CONTROL BOARD
G.5 The Central Processor Array
Four 4 bit slices are joined to form a 16 bit ALU. The
status and fast look ahead units are included to speed up 
computations. A multiplexer selects data either from the 
local memory or from the pipeline registers (figure 
G-5(a)).
One way latches enable the ALU to communicate with local 
memory, common memory and other I/O.
Real time execution is enabled via a multiplexer or 
alternatively the EXORCISER provides all the necessary 
control signals (figure G-5(b)).
FIGURE G-5(a) CENTRAL PROCESSOR ARRAY
FIGURE G-5(b) INPUT/OUTPUT
G.6 The Computer Control Unit
Three 4 bit sequencers give 2**12 = 4K by 64 locations in the 
control store for the microinstructions. The next address 
unit enables the sequencers to function more efficiently, by 
adding extra codes. A vector input to the sequencer can be 
obtained from the interrupt unit circuitry, or directly from 
the pipeline.
The interrupt unit can recognise an interrupt from each of 
the processor modules. The interrupts however have to be 
correctly pulsed by a set of monostables.
The next address unit also controls a 12 bit counter which 
produces a signal once a preset condition has occurred. 
This signal as well as status flags are routed through a 
multiplexer to be tested, with polarity, by the next address 
unit. The interface to the processor common memory is 
derived in a similar fashion to that of the processor 
modules (figure G-6(a)).
The pipeline registers are one way latches which collect 
their data from very fast RAM (control store). This control 
store is replaced by the EXORCISER during operation of FAST, 
but the RAM can be loaded from the EXORCISER for real time 
processing. The control signals for the rest of the system 
are derived from the pipeline registers (figure G-6(b)).
FIGURE G-6(b) PIPELINE REGISTERS
A P P E N D I X  H
COSTS OF RAMROD
The cost of marketing a product can be basically divided
into two areas:
1. Development cost
2. Production cost
H.1 Development Costs
The development costs of the complete Ramrod system must 
take into account the following:
1. Microcoding @R100/4 lines - R3000
2. Other software - R3000
3. Research and development @R250/day for 3 man years 
- R320,000
4. Equipment such as Logic Analysers, Oscilloscopes, 
Multimeters etc - R30,000
5. Emulation on a development system costing R53,000
Clearly the last three items are the most costly and 
dominate the development cost of a commercial product. 
They, perhaps, overlap on the other costs. The 
development cost is in the region of
H.2 Production Costs
Production costs include purchasing components for the
complete system and the production of a single Ramrod system 
can be calculated from the following:
1. Printed circuit board layout for 4 boards - R1000
2. Printed circuit board manufacture - R508
3. Mechanical work and structure - M O M
4. Technical work (soldering etc) - R2B00
5. Integrated Circuits - R130B
6. Printed Circuit Boards - R1200
7. Miscellaneous components (Fan etc) - R5fi0
Thus it costs approximately R9000 to produce a single Ramrod
system which consists of 5 slave processors, a master
processor, memory modules and 10 latch modules.
H.3 Marketing Cost
The selling cost of a marketable product is a function of
the amortised development costs, production costs, normal 
application software, sales and support necessary to
maintain the product.
A P P E N D I X  I
HIGH LEVEL DESCRIPTION OF SOFTWARE
This section provides a High Level Description of the main 
routines used in the master controller and the local 
operating system. The description is based on a simplified
form of Pascal.
I.1 MAIN ROUTINE
Read number of tasks to be loaded;
REPEAT
WHILE memsegment still free ; search memory table
Read in Tasks
IF freeProcessor found THEN ; search processor table 
BEGIN
Dispatch task to processor 
Schedule processor to run 
END
UNTIL no tasks left.
I.1.1 INTERRUPT ROUTINE 
Disable Interrupts 
Read interrupt ID.
Call IntService (ID.)
Return to main routine.
IntService (ID.)
Enable Interrupts 
Service interrupt 
Return
Dispatch Task to Processor 
Assign FreeMem to FreeProc; 
Schedule Processor to Run
Assign Timeslot to Proc;
Program control board memory
Program control board memory
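The dispatching step of the main routine can be sketched in Python as follows. This is an interpretation of the high level description rather than the master controller's code: the function name dispatch and the use of plain lists for the processor and memory tables are assumptions, and the programming of the control board memory is left out.

```python
def dispatch(tasks, free_processors, free_segments):
    """Pair each task with a free processor and a free memory segment.

    Returns the assignments made and the tasks still waiting.
    """
    schedule = []
    pending = list(tasks)
    processors = list(free_processors)   # stand-in for the processor table
    segments = list(free_segments)       # stand-in for the memory table
    while pending and processors and segments:
        # Assign the next free memory segment and processor to the task.
        schedule.append((pending.pop(0), processors.pop(0), segments.pop(0)))
    return schedule, pending
```

Tasks that cannot be placed stay in the pending list until a processor and a memory segment become free.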
I.2 Local Operating System 
START: If Identity Equals slave
then Execute user task
ELSE BEGIN
set up Usart;
IF identity equals load
THEN load tasks from disc;
ELSE BEGIN
Ask user for command 
CASE of Command
1: display memory 
2: Execute program 
3: Test Common memory 
4: Inform status of Ramrod 
5: insert data into memory 
6: Move data 
7: Substitute data 
8: Display registers 
END 
END
END
END
Ask User for Command
Read Command from Console 
Call Command Routine
Display Memory 
REPEAT
Read start address, end address 
Display address, data 
UNTIL end of address
Execute Program 
Read start address
Put start address in Program Counter 
Execute
Test Common Memory 
REPEAT
Write random data into memory 
Read data and compare 
UNTIL end of memory
Inform User Status of System 
DO 209 times 
BEGIN
Read number of processors
Display data
Read number of memories
Display data
Read number of Tasks
Display data
END
Insert Data 
REPEAT
Read address, data 
Write data into address 
UNTIL End Of Command (EOC) Character
Move Data 
REPEAT
Read destination address 
Read end address
Read source address
Move data from source to destination 
UNTIL end address 
Substitute Memory 
REPEAT
Read address 
Read data
Write data into address 
UNTIL EOC
Display Registers 
REPEAT
Write contents of register into memory 
Display "Reg", data 
UNTIL no more registers.
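As an illustration of the Test Common Memory routine listed above, the loop of writing random data and reading it back could look like this in Python. The name check_memory and the use of any indexable object of byte-wide cells as the memory are assumptions made for the sketch.

```python
import random


def check_memory(memory, rng=random):
    """Write a random byte to every location and read it back.

    Returns the list of addresses whose read-back value differed.
    """
    failures = []
    for address in range(len(memory)):
        pattern = rng.randrange(256)
        memory[address] = pattern          # write random data into memory
        if memory[address] != pattern:     # read data and compare
            failures.append(address)
    return failures
```

An empty list means that every location returned what was written to it.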
A P P E N D I X  J
RELIABILITY [SMI]
The reliability of a system is primarily influenced by its 
complexity. The fewer the parts and the fewer the types of 
materials and components involved then the greater is the 
probability of an inherently reliable product. In addition 
the use of redundant parts, whose individual failure does 
not cause the overall product to fail, is a common method to 
achieve a higher reliability.
It is good engineering practice to satisfy reliability
requirements, but the engineer must bear in mind that the 
mathematical aspects of the subject, although important, 
serve only to refine requirements and do not themselves
create a reliable product.
It is clear that the cost of making a system more reliable 
must be offset, in part, by a saving in maintenance to 
justify it. Maintainability and reliability, together, 
dictate the availability of the equipment, and are 
interdependent for the following reasons:
1. If the system's reliability is partly dependent on 
redundancy, it will be more reliable if the repair
time (maintainability) of an SRU is improved.
Thus maintainability can contribute directly to the
reliability.
2. The design and assurance activities to achieve both 
of these parameters are, generally, the same.
3. The overall availability of the system, i.e. the 
'up time', is also dependent on both these 
parameters.
Availability is defined as the ratio of the up time to the 
total time. Up time is defined as the Mean Time Between 
Failure (MTBF) whereas total time is the sum of up time and 
'down time'. Smith [SMI] makes a distinction between down 
time and the Mean Time To Repair (MTTR) but for the purpose 
of this thesis they are considered the same.
Thus $Av = \dfrac{MTBF}{MTBF + MTTR}$
Availability is achieved by a combination of maintainability 
and reliability and there is a trade off between these two 
parameters as explained in the following example:
A system which has a MTBF of 100 hours and a MTTR of 1 
hour has an Av = 100/101, the same Av as a system with 
MTBF = 200 hours and MTTR = 2 hours. Clearly the
reliability of the latter case is greater than that of the 
former while the converse is true of the maintainability.
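The trade off can be checked numerically from the definition of availability as MTBF divided by the sum of MTBF and MTTR. The sketch below uses exact fractions; the function name is illustrative, and the figures used (MTBF of 100 hours with an MTTR of 1 hour, and MTBF of 200 hours with an MTTR of 2 hours) are the pair that gives an availability of 100/101 in both cases.

```python
from fractions import Fraction


def availability(mtbf_hours, mttr_hours):
    """Ratio of up time to total time, with down time taken as the MTTR."""
    return Fraction(mtbf_hours, mtbf_hours + mttr_hours)


# Two systems with different reliability and maintainability
# but the same availability.
short_lived_quick_repair = availability(100, 1)
long_lived_slow_repair = availability(200, 2)
```

Both values reduce to 100/101, which shows that availability alone does not distinguish a system that fails often but is repaired quickly from one that fails rarely but takes longer to repair.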
Reliability as mentioned above is influenced by the 
complexity of a system, and thus a uniprocessor system will 
probably be more reliable than a multiprocessor system. 
However if the factors of redundancy and repair are 
introduced then the multiprocessor becomes much more
reliable.
The repair rate is defined as the inverse of the MTTR.
Therefore if a redundant system is periodically repaired, 
whether or not faults are present, each time it is repaired 
the reliability calculations begin anew.
There follow calculations of the reliability and the MTBF 
of various systems including Ramrod.
Figure J.1 shows the reliability of a system consisting of 
several parts where the failure of any block causes a system 
failure (e.g. a two board computer).
Thus $R = R_a \cdot R_b$
Figure J.2 shows the situation where all blocks must fail in
order to cause a total system failure (e.g. a redundant
processor system).
Thus $R = R_a + R_b - R_a \cdot R_b$
Figure J.3 illustrates a situation which is a composite of J.1 
and J.2.
Thus $R = R_a \cdot (R_a + R_b - R_a \cdot R_b)$
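The three combination rules can be written as small helper functions, which makes it easy to try them on sample values. This is a sketch for illustration: the function names are assumptions, the block reliabilities are taken as independent probabilities, and the composite case follows the formula quoted for figure J.3.

```python
from math import prod


def series(*reliabilities):
    """Every block must work: the reliabilities multiply."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """The system fails only if every block fails."""
    return 1 - prod(1 - r for r in reliabilities)


def composite(ra, rb):
    """Block A in series with a redundant pair, as in the J.3 formula."""
    return series(ra, parallel(ra, rb))
```

For two blocks, parallel(ra, rb) expands to ra + rb - ra*rb, which is the expression given for figure J.2.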
The reliability diagram of Ramrod is illustrated in figure 
J.4. It will be analytically proven, and it can be seen 
from the diagram as well, that failure of either the master 
or the ECL bus causes a system failure. However it should be 
noted that the system will gracefully degrade to a
uniprocessor computer if the master fails, as mentioned in 
chapter 6, and unfortunately there is no way to show this in 
the mathematical model. Therefore the calculated reliability 
of Ramrod is much worse than its actual reliability.
FIGURE J-1 SERIAL RELIABILITY
FIGURE J-2 PARALLEL RELIABILITY
FIGURE J-3 COMPOSITE RELIABILITY
FIGURE J-4 RAMROD'S RELIABILITY (blocks: ECL BUS, MASTER, PROC1 to PROC5, MEM1 to MEM5)
A uniprocessor system with $L = 20000$ FITS and 2 boards:
now $MTBF = \int_0^\infty R(t)\,dt$
where $R = e^{-2Lt}$, thus $MTBF = \dfrac{1}{2L} = 2.8$ years
for a quintuple redundant system 
$R_{p5} = R^5 - 5R^4 + 10R^3 - 10R^2 + 5R$
thus $MTBF = \dfrac{1}{10L} - \dfrac{5}{8L} + \dfrac{10}{6L} - \dfrac{10}{4L} + \dfrac{5}{L} = 20$ years
Now if Repair is introduced then 
MTEF = u4
where u = 1/24 and thus u >> L 
therefore
MIBF = 1C18 years.
However Ramrod has a reliability of:
R - = a . R • i o 5 ^
therefore MTBF = 1.3 years
Or if Ramrod is considered as 4 identical units, and
any failure causes a system failure, then
$MTBF = \dfrac{1}{4L} = 1.4$ years, which is almost 1.3 years.
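The series figures can be reproduced with a few lines of arithmetic. The sketch below assumes a constant failure rate per unit, a value of L = 20000 FITS (a reading of the source value that reproduces the quoted 1.4 years), one FIT taken as one failure per 10**9 hours, and 8760 hours per year; the function name is illustrative.

```python
HOURS_PER_YEAR = 8760
FIT = 1e-9  # one failure per 10**9 hours (assumed definition of a FIT)


def series_mtbf_years(l_fits, units):
    """MTBF of `units` blocks in series, each with failure rate l_fits."""
    rate_per_hour = units * l_fits * FIT
    return 1 / rate_per_hour / HOURS_PER_YEAR


two_board_uniprocessor = series_mtbf_years(20000, 2)   # 1/2L
four_identical_units = series_mtbf_years(20000, 4)     # 1/4L
```

Under these assumptions 1/2L comes to roughly 2.85 years and 1/4L to roughly 1.43 years.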
Thus it is obvious that the master and the ECL bus are the 
bottlenecks and must be duplicated. Now if these are 
duplicated then for a double redundant system
$R_{p2} = 2R - R^2$
then the reliability of Ramrod is
Rp22 * Rp5‘' 
therefore
: n -  1
H l 26L 24L 22L 2UL iBL loL Vi. 12L1CL dL 
= 6 .7 years.
However if repair time is now introduced, then Ramrod can 
be analysed as a quadruple redundant system which requires 
three units to operate, because if one unit fails then there 
is still the other identical unit which can now take over
operation.
Thus $MTBF(Ramrod) = \dfrac{7L + u}{12L^2}$ = 50 x 1 ° years
REFERENCES
[AGE 82] AGERWALA, T. and ARVIND, "Data Flow Systems"; 
IEEE Computer Vol. 15 No. 2, February 1982.
[ALE 81] ALEXANDER, P, "Array Processor Design Concepts"; 
Computer Design December 1981.
[ANA 80] ANACKER, W, "Josephson Computer Technology: An IBM
Research Project"; IBM J. Res. Develop. Vol 24
No. 2 March 1980.
[ACR] ACKERMAN, W.B, "Data Flow Languages"; Computer
Vol. 15 No. 2 February 1982.
[AGR 76] AGRAWALA, A.K, RAUSCHER, T.G., "Foundations of 
Microprogramming Architecture, Software and
Applications"; Academic Press, Inc 1976
[AMD 1] ADVANCED MICRO DEVICES The AM2900 Family Data Book. 
1979
[AMD 2] ADVANCED MICRO DEVICES Build a Microcomputer Series. 
1979
[ALD] AL-DABASS, D, "Microprocessor based Parallel Computers 
and their Application to the solution of Control
Algorithms"; Control Systems Centre Report,
University of Manchester, Jan 1977.
[ARD] ARDEN, B.W., GINOSAR, R., "MP/C: A
Multiprocessor/Computer Architecture"; I.E.E.E.
Transactions on Computers, Vol. C-31, No. 5, May 
1982.
[BRI 78] BRINCH HANSEN, P., "Distributed Processes: a 
concurrent programming concept"; Comm A.C.M. Vol 
21, No. 11 Nov. 1978 pp 934-941
[BLA] BLAKE, R.E., "Advantages to be gained from Process 
Control by Computer"; Electronics and Power, March
1977
[BOW] BOWEN, B.A., BUHR, R.J.A., "The Logical Design of 
Multiple Microprocessor Systems"; Prentice Hall Inc.
New Jersey 1980
[BAR] BARRON, D.W., "Computer Operating Systems"; Chapman and 
Hall London 1971
[BSO 1] BOSTON SYSTEMS OFFICE BSO Cross Librarian (NLIR)
User Manual. 15 Feb. 1981.
[BSO 2] BOSTON SYSTEMS OFFICE BSO Cross-Reference Program
(MRHF) User Manual. 18 May 1981.
[BSO 3] BOSTON SYSTEMS OFFICE BSO Cross Linkage Editor
(MLINK) User Manual. 2 July 1981.
[BSO 4] BOSTON SYSTEMS OFFICE BSO Relocating Cross Assembler 
f n a n f t rM User Manual. 7 July 1981.
[BSO 5] BOSTON SYSTEMS OFFICE BSO Object File Conversion 
Utility (OBJCMV) User Manual. 7 July 1981.
[BSO 6] BOSTON SYSTEMS OFFICE BSO Simulator/Debugger
(SI8035) User Manual. 7 July 1981.
[BIS] BISCAERI, J., GAGO, A., "Low-Cost Multiprocessing 
System"; Electronics Letters Vol 17 no. 24, 26 Nov
1981.
[BLO] BLOOD, W.R.Jr., "MECL System Design Handbook"; 2nd ed. 
Motorola Inc, 1972
[BRI 73] BRINCH-HANSEN, P., "Operating System Principles"; 
Prentice-Hall, 1973
[BAK] BAKER, K. "Specifying The System" Microprocessors and 
Microsystems, Sept 1980 Vol. 4 No. 7
[BAR] BARTEE, T.C. "Digital Computer Fundamentals"; 3rd ed. 
Tokyo: McGraw-Hill Kogakusha Ltd., 1972.
[BRK] BRINKMAN, E.L., "A Selection of Multi-Microcomputer
Systems"; Mini-Micro Systems, JAN 1979.
[BUH] BUHR, R.J.A., ET AL. "Why Multiple Microprocessors";
International Symposium on Mini and Micro computers 
Montreal Canada, 1977.
[BERT] BERNHARDT, D., and SCHMITTKK, C., "...
Implementation of Fault-Tolerant Multi-Microcomputer
S..."; ... No. 4, May 1981.
[BERD] BERNHARD, R., "The 'no-downtime' computer";
I.E.E.E. Spectrum, September 1977, pp 83-37.
[CRA] CRANE, R.C., "Software pack and controller link DEC
computers in an Ethernet"; Electronics, Dec. 15, 1981
[DOY] DOYLE, E.A.Jr., "How Parts Fail"; I.E.E.E. Spectrum 
October 1981.
[DES] DESIMONE, S.E., "Test Techniques for ECL loaded
Boards"; Computer Design, June 1982.
[DIJ] DIJKSTRA, E.W., "Co-operating Sequential Processes",
reprinted in "Programming Languages", edited by
F. Genuys, NATO Advanced Study Institute, Academic
Press, London 1968, pp 43-112.
[DAV 78] DAVIDSON, J., et al., "A Generalized Multiprocessor
System"; I.E.E.E., 1978.
[DAV 82] DAVIS, A.L., KELLER, R.M., "Data Flow Program
Graphs"; IEEE Computer Vol. 15 No. 2, February 1982.
[DAV 80] DAVIS, C.G., COUCH, R.L., "Ballistic Missile
..."; ... November 1980
[DEN] DENNIS, J.B., "Data Flow Supercomputers"; Computer
November 1980
[...] ... "... for the New SNCF Computer Systems Network". Paper
read at the 8th ORE Colloquium, Madrid, 5 and 6 May
1981.
[ENS 80] ENSLOW, P.H.Jr., "What is a 'Distributed' Data
Processing System?"; Computer, Jan 1978, Vol. ..., pp
75-96
[ENS 74] ENSLOW, P.H.Jr., Comtre Corporation,
"Multiprocessors and Parallel Processing"; New
York: John Wiley and Sons, 1974
[EUR] EUROMICRO JOURNAL; Vol 5:
[FEL] FELDMAN, J.A., "High Level Programming for Distributed
Computing"; Comm. A.C.M. Vol. 22 No. 6, June
1979, pp 353-368.
[FAR] FARBER, G., "Principles and Applications of
Decentralized Process Control Computer Systems";
Distributed Process Computer Systems.
[GIL BEHR] GILOI, W.K., BEHR, P.M., "Making Distributed
Multicomputer Systems Safe and Programmable";
Internal report at the Technical University
Berlin, West Germany.
[GAJ] GAJSKI, D.D., et al, "A Second Opinion On Data Flow
Machines and Languages"; IEEE Computer Vol. 15 No.
2, February 1982.
[HOA 78] HOARE, C.A.R., "Communicating Sequential
Processes"; Comm. A.C.M. Vol. 21, No. 8, Aug.
1978, pp 666-677.
[HOP] HOPKINS, A.L., et al, "FTMP - A Highly Reliable
Fault-Tolerant Multiprocessor for Aircraft";
Proceedings of the I.E.E.E., Vol. 66 No. 10, Oct
1978.
[HOA 72] HOARE, C.A.R., "Towards a Theory of Parallel 
Programming"; Operating System Techniques, Academic 
Press, New York, 1972 pp 61-71
[HUD] HUGHES, P., DOONE, T., "Multi-Processor Systems";
Microelectronics and Reliability, Vol. 15 pp
281-293, Pergamon Press, 1977.
[HUG] HUGHES, J., "Development Systems: Ethernet";
Computer Design, May 1982.
[HIR] HIRD, D.J., ... M., "Bit-slice Microprocessors -
their use and application";
Electronics and Power, Vol 25, No. 4, March 1979, pp
179-184
[HOP 80] HOPKINS, R.L., "Meeting the Challenge of Automated
ECL Testing"; Computer Design, Sept 1980, pp 115-12...
[INT 1] INTEL CORPORATION MCS 80/85 Family User's Manual,
Oct. 1979.
[INT 2] INTEL CORPORATION Component Data Catalog, 1983.
[INT 3] INTEL CORPORATION Peripheral Design Handbook, Aug.
1980.
[INT 4] INTEL CORPORATION Memory Design Handbook, Jan.
1981.
[INT 5] INTEL CORPORATION SDK 5 Kit User's Manual.
[JOH] JOHNSON, D., "Logic Analyser and µP Development
System, Aid in Debugging Multiprocessing Networks";
Digital Design, Nov 1980
[KAH] KAHNE, S., et al., "Automated Control by Distributed
Intelligence"; Scientific American ...
[KART] KARTASHEV, S.P. and KARTASHEV, S.I., "Supersystems
for the 80's"; I.E.E.E. Computer, Nov 1980
[KARP] KARPLUS, W.J., and COHEN, D., "Architectural and
Software Issues in the Design and Application of
Array Processors"; I.E.E.E. Computer, Sept. 1981
[KER] KERGUELEN, R., "Use of Micro-Computers in Distributing
Processing on the SNCF". Paper read at the 8th ORE
Colloquium, Madrid, 5 and 6 May 1981.
[KOP 81] KOPETZ, H., "Distributed Computer Control Systems".
Course presented by The Continuing Engineering
Education Division, University of the
Witwatersrand, 4 to 6 Nov. 1981.
[...] ... Technical University of Berlin Report MA 82/2, April
1982.
[KOY] KOYAMA, S., MIURA, R., "A Multiprocessor System for
Fast On-Line Simulation of Dynamical Systems",
reprinted from Simulation of Systems, Delft 1976,
North-Holland Amsterdam: 1976
[KOY] KOYAMA, S., MIURA, R., "An all-Digital Dynamical
System Simulator using Parallel Processing",
reprinted from A link between Science and
Applications of Automatic Control, New York and
Oxford: Pergamon Press, 197...
[KOY 77] KOYAMA, S., ISURUGI, Y., et al, "A Realization of a
DDA System for Continuous Dynamical System Simulation
with a Universal Multimicroprocessor System 'HARPS'",
Euromicro Newsletter Vol. 3, No. 4, 1977.
[LAM] LAMBRECHS, J.S.D., RODD, M.G., "Highly Reliable
Software for use in Fail-Safe Control"; Preprints of
the 3rd IFAC/IFIP Symposium on Software for Computer
Control, 5-8 Oct. 1982
[LISK] LISKOV, B., "Primitives for Distributed Computing";
Proc. 7th Symposium on Operating Systems Principles,
Pacific Grove, California, Dec. 1979, pp 33-42
[LIST] LISTER, A.M., "Fundamentals of Operating Systems";
The Macmillan Press Ltd., 197...
[MCD] MCDONALD, W.C., WAYNE SMITH, R., "A flexible test-bed
for Real-Time Applications"; Computer, Oct. 1982, pp
25-39
[MET] METCALFE, R.M., BOGGS, D.R., "Ethernet: Distributed
Packet Switching for Local Computer Networks";
Communications of the ACM, July 1976, Vol. 19, No. 7
[MAC] MACLEOD, I.M., RODD, M.G., "An Evaluation of the
applicability of Interprocess Communication Primitive
Proposals to Distributed Process Control"; Preprints
of the 3rd IFAC/IFIP Symposium on Software for
Computer Control, 5-8 Oct. 1982
[MAK] MAKINO, K., KOYAMA, S., et al., "A ...
Digital Simulator (UOSS) Using a Hierarchical
Distributed Multi-processor Technology", reprinted
from Simulation of Systems '79, Sorrento 1979,
North-Holland Amsterdam: 1979
[MAI] MAISEY, D., "Distributed Processing for Industry";
New Electronics, September 9, 198...
[MIC] MICRO NEWS, A Newsletter from L'Electron S.A.
Microprocessor Division; ...
[MOT B] MOTOROLA INC. "MECL High Speed Integrated Circuits";
Series B.
[MOT 1] MOTOROLA INC. "EXOslice User's Guide"; Switzerland:
1977
[MOT 2] MOTOROLA INC. "...";
Switzerland: 1975
[MOT 3] MOTOROLA INC. "M68MDOS3 EXORdisk II/III Operating
System User's Guide"; 1st ed., ...
[...] ... "...
Telecommunications"; Electronics and
Instrumentation, Vol 11 No. 4, April 1980, pp ...
[NAD] NADIR, J., McCORMICK, B., "Bus Arbiter Streamlines
Multiprocessor Design"; Computer Design, June
1980, pp. 103-109.
[NOV] NOVAK M., "Gate Arrays - fabrication, design and 
economics"; MSc Research Report Dec. 1982, 
University of the Witwatersrand, Johannesburg
[PAT] PATEL, J.H., "Performance of Processor-Memory
Interconnections for Multiprocessors"; I.E.E.E.
Transactions on Computers Vol. C-30 No. 10, October
1981.
[POL] POLCZYNSKI, M.H., "Multiple µP Control System raises
throughput without bus conflicts"; Electronic Design,
Jan. 7, 1982.
[PEB] PEBERDY, N., "Digital Electronics-Logic Families";
The Electrical Engineer, Sept 1980, pp 13-20, Thompson
South Africa
[RAV] RAVASIO, P.C., et al, "Local Computer Networks";
North-Holland, Amsterdam, 1982.
[RAB] RABINOWITZ, A.E., and RODD, M.G., "Ramrod a
Multi-Microprocessor Computer"; Proc. 2nd South African
Computer Symposium, Oct. 1981, Pretoria.
[ROD 76] RODD, M.G., "Organisation of Industrial
Computers"; PhD Thesis, University of Cape Town,
1976
[ROD 82] RODD, M.G., "The Impact of Microelectronics on
Distributed Control Systems"; Inaugural Lecture for
the Head of the Dept. of Electrical Engineering,
University of the Witwatersrand, Johannesburg, 2..th
October 1982.
[RAT] RATTNER, J., LATTIN, W.W., "ADA determines the
Architecture of 32 bit Microprocessor"; Electronics,
Feb. 24, 1981.
[SUG 80] SUGARMAN, R., "'Superpower' Computers"; I.E.E.E.
Spectrum, April 1980.
[SMI] SMITH, D.J., "Reliability and Maintainability in
Perspective"; MacMillan, 1981.
[SAT] SATYANARAYANAN, M., "Commercial Multiprocessing
Systems"; Computer, May 1980, pp 75-96
[STI] STIFFLER, J.J., "How Computers Fail"; I.E.E.E.
Spectrum, October 1982.
[...] ..., Jr., "...
Technology and Architecture"; I.E.E.E. Transactions
on Computers, Vol. C-31, No. 9, May 1982.
[TOR] TORRERO, E.A., "They said it couldn't be done";
I.E.E.E. Spectrum, Sept. 198...
[TAN] TANAKA, Y., MIYASHITA, K., et al, "HARPS (Hokkaido
University Array Processor System): A New
Hierarchical Array Processor System"; 2nd Euromicro
Symposium on Micro Architecture, Venice: Oct 1976
[TOO] TOONG, H.D., "Multi-Microprocessor Systems"; Siemens
Forsch.-u. Entwickl.-Ber. Bd 7 (1978) Nr.
6, Springer-Verlag 1978.
[TRAKH] TRAKHTENGERTS, E.A., SHURAITS, Yu.M., "Software
Design for Multiprocessor Systems Computer Control";
Internal report at the Institute of Control Sciences,
Moscow, USSR.
[THE] THEIS, D., "Array Processor Architecture"; I.E.E.E.
Computer, Sept. 1981.
[TEX 1] TEXAS INSTRUMENTS INCORPORATED The TTL Data Book for
Design Engineers; 1973
[TEX 2] TEXAS INSTRUMENTS INCORPORATED Supplement to the TTL
Data Book; 1974
[US DOD] UNITED STATES DEPT. of DEFENCE, "Reference Manual
for the ADA Programming Language"; July 1980
[VIC] VICK, C.R., et al, "Adaptable Architectures for
Supersystems"; Computer, November 1980.
[WEI] WEITZMAN, C., "Distributed Micro/Minicomputer Systems,
Structure, Implementation and Application";
Prentice-Hall, N.J. 1980
[WAT] WATSON, I., GURD, J., "A practical Data Flow
Computer"; IEEE Computer, February 1982
[WIL] WILKES, M.V., STRINGER, J.B., "Microprogramming and the
Design of the Control Circuits in an Electronic
Digital Computer"; reprinted in "Computer
Structures: Readings and Examples"
[WILD] WILD, n., "A Support System for Development of a
Microprogrammed Controller"; MSc Dissertation (in
preparation) Dec. 1982, University of the
Witwatersrand, Johannesburg
[...] ..., A.K., "A Multi-Microcomputer Interface";
Microelectronics and Reliability Vol 19 pp
513-522: Pergamon Press Ltd. 1980
[ZAK] ZAKS, R., WILMINK, J., NICOUD, J.D., "Microcomputer
Architectures"; Euromicro Symposium. Amsterdam:
North Holland, Oct 1977.
[ZOC] ZOCCOLI, M.P., SANDERSON, A.C., "Rapid Bus
Multiprocessor System"; Computer Design, Nov 1981, pp
189-200

Author Rabinowitz A E
Name of thesis Ramrod: an experimental multi-microprocessor 
PUBLISHER:
University of the Witwatersrand, Johannesburg 
©2013
