Sensitivity analysis of transputer workfarm topologies by Johnson, Timothy J.
Calhoun: The NPS Institutional Archive
Theses and Dissertations Thesis Collection
1989
















Thesis Advisor: Chyan Yang
Approved for public release; distribution unlimited

Unclassified
Security Classification of this page
REPORT DOCUMENTATION PAGE
1a Report Security Classification
Unclassified
1b Restrictive Markings
2a Security Classification Authority
2b Declassification/Downgrading Schedule
3 Distribution/Availability of Report
Approved for public release; distribution is unlimited.
4 Performing Organization Report Number(s) 5 Monitoring Organization Report Number(s)




7a Name of Monitoring Organization
Naval Postgraduate School
6c Address (city, state, and ZIP code)
Monterey, CA 93943-5000
7b Address (city, state, and ZIP code)
Monterey, CA 93943-5000
8a Name of Funding/Sponsoring Organization 8b Office Symbol
(If Applicable)
9 Procurement Instrument Identification Number
8c Address (city, state, and ZIP code) 10 Source of Funding Numbers
Program Element Number Project No Task No Work Unit Accession No
11 Title (Include Security Classification) Sensitivity Analysis of Transputer Workfarm Topologies
12 Personal Author(s) Timothy J. Johnson




14 Date of Report (year, month, day)
September 1989
15 Page Count
79
16 Supplementary Notation The views expressed in this thesis are those of the author and do not reflect the official
policy or position of the Department of Defense or the U.S. Government.
17 Cosati Codes
Field Group Subgroup
18 Subject Terms (continue on reverse if necessary and identify by block number)
Network, Workfarm, Load Balancing, Linear Network, Tree Network,
Transputers, Multiprocessors
19 Abstract (continue on reverse if necessary and identify by block number)
Parallel processing structures such as multiprocessor arrays and pipelining enhance throughput tremendously
for suitable algorithms having high degrees of concurrency. However, if the time to process different
workpackets becomes irregular, much of the advantage over traditional sequential processing systems may be
lost.
In an attempt to produce a more flexible response to workload demands, a transputer workfarm was
investigated. Two network topologies, a linear model and a tree model, were built using the transputer as the
processing element (PE), or worker. An algorithm was developed which could be run independently on all
workers in the workfarm. Each worker produced results independent of the other workers. By altering specific
variables within the algorithm, the network performance could be changed. The results from this thesis illustrate
how these parameters affect each network and provide comparative information between the linear model and the
tree model.
20 Distribution/Availability of Abstract
[X] unclassified/unlimited [ ] same as report [ ] DTIC users
21 Abstract Security Classification
Unclassified
22a Name of Responsible Individual
Chyan Yang




DD FORM 1473, 84 MAR 83 APR edition may be used until exhausted
All other editions are obsolete
security classification of this page
Unclassified





Lieutenant, United States Navy
B.S., Norwich University, 1980
Submitted in partial fulfillment of the requirements
for the degree of






Parallel processing structures such as multiprocessor arrays and
pipelining enhance throughput tremendously for suitable algorithms
having high degrees of concurrency. However, if the time to process
different workpackets becomes irregular, much of the advantage over
traditional sequential processing systems may be lost.
In an attempt to produce a more flexible response to workload
demands, a transputer workfarm was investigated. Two network
topologies, a linear model and a tree model, were built using the
transputer as the processing element (PE), or worker. An algorithm
was developed which could be run independently on all workers in the
workfarm. Each worker produced results independent of the other
workers. By altering specific variables within the algorithm, the
network performance could be changed. The results from this thesis
illustrate how these parameters affect each network and provide
comparative information between the linear model and the tree model.















4. Serial Links
5. Timers









1. General Algorithm Structure for All Topologies
2. Linear Algorithm




A. Linear Topology
B. Tree Topology
C. Comparison of Topology Performance
V. CONCLUSIONS and DISCUSSION
APPENDIX A DETAILED SOURCE CODE - LINEAR TOPOLOGY
APPENDIX B LINEAR MODEL GRAPHIC DATA (NON-ADDRESS MODE)
APPENDIX C LINEAR MODEL GRAPHIC DATA (ADDRESS MODE)
APPENDIX D TREE MODEL GRAPHIC DATA (NON-ADDRESS MODE)
APPENDIX E TREE MODEL GRAPHIC DATA (ADDRESS MODE)
LIST OF REFERENCES
BIBLIOGRAPHY
INITIAL DISTRIBUTION LIST
ACKNOWLEDGEMENTS
I would like to thank several people who contributed to the
completion of this thesis. First, I thank my advisor, Prof. Yang, who
provided me with a great deal of guidance, including new ways to look
at the problem. Without his help I would have had much greater
difficulty in completing as much as I was able to. Secondly, I would
like to thank Captain Rod Scott, RCAF, for providing a source code
base for part of the linear model from which to work. Next, I thank
Prof. Kodres for his generosity in providing the facilities, transputers,
computers, and study space without which none of this would have
been possible.
Finally, to Prof. Shing for being able to fulfill the requirements of the




To my wife Sandie who was my inspiration for completing my
degree and kept me looking to the future. She was very supportive
throughout my study program despite having to handle her own career
and was especially strong during the last four months when we were
separated because of her career requirements. She persevered





Distributed computing systems provide an exciting avenue for
enhancing performance of hardware and software systems. Traditional
multiprocessor systems utilize a one- or two-dimensional array of
processors using nearest-neighbor or pipeline computing structures
[Ref. 1]. Many networks will operate efficiently when each processor
shares the computational load, i.e., achieves load-balancing.
Programmers and users have a unique environment in which
algorithms possessing a concurrent nature can be run with much
higher efficiencies than previously encountered in sequential
programming structures. Most solutions to existing problems have
been approached from a sequential processing aspect. A significant
percentage of these problems have varying degrees of underlying
concurrency which may be exploited. The most dramatic advantage to
be gained by exploiting the concurrent nature of the algorithms is the
increased throughput. By dividing a task into several smaller tasks and
processing them concurrently on separate processing elements (PEs),
a tremendous decrease in processing time may be observed.
Once a task has been determined to have some inherent
concurrency, either a system must be built to conform to the nature of
the problem, or the problem must be configured to run on existing
parallel processing systems. Stone [Ref. 1] points out that there are
several different parallel processing structures and philosophies
available, each with its own distinct advantage. The purpose of this
thesis is to investigate one of these structures, called a workfarm. The
PEs within the workfarm may be configured into many topologies;
only two are examined in depth here, the linear topology and
the tree topology.
B. WORKFARM CONCEPT
Suppose a problem can be broken into a finite number of identical
parts, each of which takes a different amount of time to solve. Due to
varying processing times, nearest-neighbor or pipeline designs may
encounter difficulties caused by an unbalanced work load on adjacent
processors resulting in communications delays.
A logical alternative is a processor workfarm. In a workfarm, each
processor (worker) executes the same functional block of code on
individual workpackets independently of adjacent workers. This
design is inherently different from the nearest-neighbor arrays and
pipelines in that adjacent workers operate asynchronously with
respect to each other. A controlling processor distributes the
workpackets to the workfarm whenever a results packet is returned
from the network. This synchronization is handled within the
software at the controller level.
In the case of a linear model, Figure 1.1, a controller distributes
workpackets to the first worker in the workfarm. The first worker
will process as many packets as it is able to handle and send the
remaining workpackets on to the rest of the workfarm. This happens
for each successive worker until the end worker is encountered.
Results are returned to the controller by trickling back up the linear
network in the opposite direction to the flow of work.
(diagram: TRAM (controller) followed by a workfarm of workers labelled 100 through 800)
Figure 1.1 Linear Model Workfarm
The sensitivity of a given topology is analyzed by altering various
parameters within the algorithm which might affect the performance
of the workfarm. Parameters which may be altered to observe this
effect include the input buffersize of each worker (how many
workpackets a worker may buffer before having to pass additional
workpackets on), the size of the workpackets (how many iterations
must be done per workpacket), etc. The results presented in this
thesis indicate that buffersize and the size of the workpackets
(stripsize) are limiting factors in the linear model workfarm. In
addition, two modes of operation for the linear model have been
studied, non-addressed and addressed. In both cases the controller
sends a new workpacket out to the network each time it receives
results from a previously processed workpacket. In the non-
addressed mode the first available worker encountered by the
workpacket will grab the workpacket for processing or buffering, if
sufficient space is available. On the other hand, the same controller in
the addressed mode will send the workpacket directly to the specific
worker in the chain which just returned a results packet. Each of
these designs may have its own advantages.
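The difference between the two modes can be illustrated with a minimal Python sketch. It is not taken from the thesis source code; the function name `pick_worker` and the `free_slots` list (remaining buffer space per worker, ordered from the controller outward) are hypothetical names introduced only for this illustration.

```python
def pick_worker(free_slots, returned_from, addressed):
    """Return the index of the worker that receives the next workpacket.

    free_slots[i]  -- packets worker i can still accept (hypothetical model)
    returned_from  -- index of the worker whose results just arrived
    addressed      -- True for addressed mode, False for non-addressed mode
    """
    if addressed:
        # Addressed mode: the packet goes straight back to the worker
        # that just returned a results packet.
        return returned_from
    # Non-addressed mode: the first worker along the chain with room
    # grabs the packet for processing or buffering.
    for index, free in enumerate(free_slots):
        if free > 0:
            return index
    return None  # no worker currently has room


# Worker 3 returned results, but worker 2 also has room and is closer.
print(pick_worker([0, 0, 1, 1], returned_from=3, addressed=False))
print(pick_worker([0, 0, 1, 1], returned_from=3, addressed=True))
```

In this simplified model the non-addressed mode favours workers nearer the controller, while the addressed mode replaces work exactly where it was consumed.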
II. THE TRANSPUTER
A. OVERVIEW
The T800 [Ref. 2] transputer is just one in a family of transputers
produced by INMOS. The parallel architecture, augmented with the
CSP-based (Communicating Sequential Processes) [Ref. 3] language,
OCCAM [Ref. 4], makes it an ideal and inexpensive tool to conduct
research in topics of concurrency. Due to the serial link
intercommunication structure of the transputer, a variety of network
topologies can be easily implemented. Prior to discussing these
topologies, a greater understanding of transputer architecture is
required.
B. ARCHITECTURE
The T800 is a single chip implementation of what is traditionally a
separate microprocessor and several support chips. Figure 2.1 shows
a simple breakdown of the T800 architecture. Transputer regions are
subdivided into a RISC technology microprocessor, on-chip RAM, four
paired input/output serial links, an external memory interface module
and a systems services module. In addition the T800 model

























Figure 2.1 T800 Transputer Block Diagram
1. Processor
The processor is a RISC machine containing instruction logic,
an instruction pointer, a workspace pointer, an operand register and
three source/destination stack-like registers. All registers are 32 bits
long. Four Gbytes of memory may be addressed. The first 4 Kbytes of
this address space are on-chip RAM. The various registers have the
following functions:
• workspace pointer - points to an area of memory containing
local variables
• instruction pointer - points to the next instruction to be
executed
• operand register - used to form instruction operands
• A, B, C registers - evaluation stack
a. Instructions
Instructions refer to the stack implicitly. For example,
evaluation of an add instruction adds the contents of A to the contents
of B and places the sum in A. Overflow protection is not provided in
hardware as this is easily handled by the compiler.
All instructions in the instruction set have the same format
and are representatives of the most commonly used instructions in
most programs. Each instruction is a single byte which can be
decomposed into two 4 bit segments. The upper 4 bits contain the
function code and the 4 lower order bits are a data value. This alone
limits the number of functions to 16. However, most program
operations involve the loading of small literal values and the loading
and storing of one of a small number of variables. Two of the 16
functions, prefix and negative prefix, provide for extending the length
of the instruction operand. The prefix instruction first loads its 4 data
bits into the operand register and then shifts this value left by 4
bits. The negative prefix instruction complements the
operand register prior to executing the shift. This scheme can create
operands in the range of -256 to 255 using just one of the
appropriate prefix instructions.
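The operand-building scheme described above can be sketched in Python as follows. This is an illustrative model, not INMOS code: the function name `decode_operands` is hypothetical, and the function codes used for prefix (0x2), negative prefix (0x6), and load constant (0x4) are assumptions about the instruction encoding that the text does not state.

```python
PFIX = 0x2  # assumed function code for "prefix"
NFIX = 0x6  # assumed function code for "negative prefix"


def decode_operands(code: bytes):
    """Return a list of (function, operand) pairs from single-byte instructions."""
    operand_register = 0
    decoded = []
    for byte in code:
        function, data = byte >> 4, byte & 0xF
        # Every instruction loads its 4 data bits into the operand register.
        operand_register |= data
        if function == PFIX:
            operand_register <<= 4
        elif function == NFIX:
            # Complement first, then shift.
            operand_register = (~operand_register) << 4
        else:
            decoded.append((function, operand_register))
            operand_register = 0  # cleared after each non-prefix instruction
    return decoded


# One prefix byte followed by a load (assumed code 0x4) spans -256..255.
print(decode_operands(bytes([0x2F, 0x4F])))  # largest value with one prefix
print(decode_operands(bytes([0x6F, 0x40])))  # smallest value with one negative prefix
```

Chaining more than one prefix byte extends the operand further in the same way, four bits at a time.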
b. Concurrency
The fundamental programming structure is known as a
process, which is simply a sequence of instructions. A single
transputer can run concurrent processes independent of a network of
transputers. This is allowed by multiplexing high and low priority
processes. Low priority processes are run whenever high priority
processes are idle or are waiting for communications. Typically high
priority processes are of short duration while most low priority
processes are of longer durations. The user specifies which processes
run at high priority and which at low priority. Because scheduling
is handled by the transputer itself, no software kernel is needed.
2. FPU
The T800 also houses a 64-bit floating point unit. The FPU
performs single and double length arithmetic conforming to the floating
point standard ANSI-IEEE 754-1985. This FPU is capable of
sustaining 2.25 MFLOPS processing concurrently with the CPU on a
30 MHz transputer model. However, for this thesis the T800 was
operated at 20 MHz.
3. Memory
The T800 is configured with 4 KBytes of on-chip static
random access memory. This memory serves as the lowest address
block for the 4 GBytes of memory addressable by the T800. The
remainder of the 4 GBytes must be supplied as external memory via a
32-bit bus. The 4 KBytes of on-chip RAM may be accessed via the 32-bit
internal bus for read/write operations in one clock cycle.
4. Serial Links
The four pairs of input/output serial links provide the means
for building networks of processors. These links allow the
implementation of CSP by providing a means to make direct
communications channels between processors within the network.
5. Timers
There are two hardware timers within the T800 which
operate at two distinct levels. The high level timer provides 1 µsec




The workfarms to be investigated in this thesis required the ability
to be easily configured into a variety of network topologies. The T800
transputer and its programming language, OCCAM, were chosen
because of the simplicity in implementing multiprocessor networks
with them. As previously stated, all parallel processing structures have
advantages and disadvantages specific to their design. For this reason
two different topologies have been studied in this thesis to determine
their efficiencies relative to a given independent algorithm as the
test-bed. The two topologies chosen were the linear model and the tree
model. Since the goal was to determine the most efficient model for
the given independent algorithm, each model was tested by varying
specific parameters within the algorithm which might have influence
on the interprocessor communications and actual time spent doing
calculations. In each case the desired goal was to minimize the time
to complete a given batch of work and to observe the resulting load
balance for all workers in the network. The models are looked at in
more detail in the following sections.
Given a sufficient number of transputers, imagination is the only
limit to the number and design of possible topologies. However,
practicality from a hardware and software implementation standpoint
would argue that more regular and symmetrical structures be
investigated. With regard to these considerations, two other
topologies were of interest given the number of transputers available
for this network study. However, due to time considerations, a loop
model and a cube model could not be implemented. A basic
discussion of the loop and cube topologies will be included in the next
section, but it will be limited to hardware configuration only.
B. Models
1. Linear
The linear model is a straightforward application of the
transputers located on the B003 boards and the TRAM. A block
diagram of the linear topology is shown in Figure 1.1. The model is
implemented by simply connecting the transputers in a one
dimensional array and ensuring that the appropriate link connections
are defined in the software. A diagram of the linear workfarm model




(link not used) (link not used)
Figure 3.1 Linear Model Physical Link Diagram
The arrows depict the direction of data flow along the links. Note that
these are not bidirectional links, but rather both the in and out
complements for each link are used as a pair to provide output and
input communications. Workpackets are generated in the controller
and sent to the network via the outward flow of the links. Results are
passed back up the links to the controller in the other direction.
2. Tree
The tree model is a more significant deviation from a simple
linear design topology. The TRAM has three pairs of output and input
links to choose from for connecting to the workfarm network. For the
tree model, two of these link pairs are used. Each link pair connects
to one of the B003 boards. The root worker of the B003 board already
has links to its two orthogonally adjacent neighbors and a simple
hardwire connection is made from its fourth available link pair to the
diagonally positioned worker on the board. This configuration creates
two branches from the TRAM with each branch splitting into three
more branches. The configuration is shown abstractly in Figure 3.2
and schematically in Figure 3.3.
(diagram: worker 1 and worker 2 on the upper row; workers 3 through 8 on the lower row)
(links not used) (links not used)
Figure 3.3 Tree Model Physical Link Diagram
3. Loop
The loop topology is a direct derivative of the linear model.
The data flow is unidirectional instead of bidirectional as in the linear
model. Figure 3.4 depicts the loop model in a block diagram.
Workpackets flow from the controller to the network. The results
continue to flow in the same direction as workpackets and are
transmitted by the last worker in the chain back to the controller via a






































Figure 3.5 Loop Model Physical Link Diagram
4. Cube
The cube is the most complicated of the four models. It
combines features of the tree and the loop topologies. Figure 3.6 is an
abstract diagram of the nodes and the link connections. Output from
the controller would flow to the first worker where the workpackets
(and results) could be forwarded to one of the three adjacent nodes
along three different channels depending on availability. Each of these
three adjacent nodes could in turn forward the workpackets (and
results) to two other nodes, as shown in the figure. Ultimately all
results will converge on the last worker in the cube which sends the
results on to the controller via a separate link.
Figure 3.6 Cube Model Block Diagram
C. Algorithm
1. General Algorithm Structure for All Topologies
As mentioned in the previous section, there are several
possible topologies in organizing a workfarm, e.g., linear, loop, tree,
cube, and many others. The linear and tree topologies were
investigated in this project. To test these topologies, an algorithm was
designed to produce independently processable workpackets. Each
topology employs a similar set of processes to accomplish the task.
The processes implemented may differ slightly due to the
communication requirements in the different topologies. Process
names will be in bold print for the remainder of this paper. Common
to each of the workfarms is the controller. The purpose of the
controller is to coordinate the various processes necessary to 1)
generate work based on the number of strips the graphics screen is
subdivided into, 2) send the work out to the network of transputers
for processing, 3) receive the results from the network and 4)
monitor the time it takes to process each batch of work. A general





Figure 3.7 Linear Model Controller Process
The complete boxed diagram represents the controller
process. This is loaded specifically into the root transputer. Each of
the circled processes is part of the overall process and they run in
virtual parallelism. In other words, they are not truly running in
parallel. The four processes are time sliced based on their priority
levels. Since they are all specified to run at high priority, they each
receive equal processing time. Therefore, no one process receives
priority over any other.
The arrowed lines between processes indicate designated
software channels between those processes along which data may be
passed. Arrows extending beyond the bounds of the controller signify
channels being routed along physical links to and from other
hardware. The work and results channels are used to communicate
with the network of workers while the to.graph and from.graph
channels are used for sending and receiving data between the display
and external filing hardware.
2. Linear Algorithm
The workfarm is where the workpackets are processed. The
process which performs the calculations is consistent in each worker
and in each topology. Any differences in the algorithm loaded into the
network workers are due to communications requirements which
depend on the specific topology being employed. It is also important
to note that since more than one layer of workers is being used in
every workfarm, physical links must be established between upper
level workers and lower level workers in the hierarchy so that
workpackets may be communicated to all workers in the farm. The
process to be loaded into the linear workfarm is pixel.gen and is





Figure 3.8 Linear Model Pixel.gen Process
The channels labelled work.in, work.out, results.in and
results.out are along physical links and they are connected either to
the TRAM (in the case of the first worker in the farm) or to other
adjacent workers. The same discussion for processes internal to
controller applies to the process pixel.gen. Note that pixel.gen has two
levels of subprocesses. Processes router, gen.pixel and bypass are all
at the same level while processes my.buffer, buffer and mix are all one
level deeper.
The linear workfarm is implemented in two schemes. The
first scheme is a non-addressed mode in which the controller sends
workpackets out to the network without specifying which worker is
going to process the workpacket. The second scheme uses an
addressed mode in which each workpacket contains the destination
worker's unique address within the array. This allows a new
workpacket to go directly to the worker that has returned a
completed workpacket rather than to an opportunistic worker closer
to the controller as the non-addressed mode would allow.
The basic operation of the algorithm is as follows. Controller
contains work.valve, display, gen.work and stopwatch. These four sub-
processes are running in parallel within controller. Display initializes
the network of workers, via gen.work, with initialization data
necessary to carry out the processing of the workpackets (initialization
phase). This is sent from the controller via the channel annotated
work.out. This output is fed to each of the workers by entering the
first worker in line via its input channel work.in. Once a worker has
been initialized with the appropriate data it will signal back to
work.valve to send workpackets. Each worker will first get one
workpacket to be processed by gen.pixel, then will fill its buffer (store,
within router, is not visible in these figures) to the limit set by the
variable buffer.size. When this task has been completed, this worker is
full and any more initial workpackets will be forwarded to the next
worker in line via channel work.out from router and channel work.in
of the next worker. This process is repeated until all of the initial
batch of workpackets has been sent to the network. Since the linear
model has been implemented using two schemes, non-addressed and
addressed, they have different methods for sending workpackets out
to the array of workers. An integer value in display, called address, is
set to either 999 or some other number. When address is set to 999,
the non-addressed mode is used for workpacket distribution. Otherwise,
workpacket distribution is done according to the addressed mode.
During the distribution of the initial batch of workpackets for the non-
addressed mode, a workpacket is shipped without regards to which
worker will receive it. The first worker with an available buffer
opening will absorb that workpacket. The number of workpackets to
be sent out during this phase equals the number of workers in the
array multiplied by the quantity (buffersize + 1). With a standard
number of eight workers in the array, the number of workpackets
initially sent would be:
$$\#\text{workpackets} = 8 \times (10 + 1) = 88$$
where 8 is the number of workers, 10 is a possible buffersize, and 1 is
the workpacket being processed.
In the non-addressed mode, it is conceivable that the first couple of
workers will begin processing their buffered packets prior to all of the
workers receiving an equal percentage of the initial allotment.
Therefore, until processing overtakes the first processors, they may
accept larger percentages of the initial workpacket allotment. As soon
as the first processor completes a packet, it will return the results via
my.buffer and mix. The results are then passed from pixel.gen to
controller via the channels results.out and results.in respectively. Once
in controller, work.valve will send the results on to display.
In the addressed mode, work.valve sends the initial
workpackets to the workfarm deterministically. Work.valve sends the
initial workpackets as bundles to each worker in the array so every
worker receives exactly the same initial number of workpackets. The
number of workpackets in each bundle is the same and is equal to
the current buffersize plus one, i.e.,
$$\#\text{workpackets/worker} = 10 + 1 = 11$$
where 10 is a possible buffersize and 1 is the workpacket being processed.
Consequently with eight workers in the workfarm the total number of
packets shipped initially in this example would be 88. The major
difference here is that each worker will get the same number of initial
packets.
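The initial distribution arithmetic for both modes can be checked with a short Python sketch. The function names are hypothetical, and the non-addressed model assumes that no worker finishes a packet while the initial batch is still being distributed, which the text notes may not hold in practice.

```python
def initial_batch(workers, buffersize):
    """Total workpackets sent during the initial distribution phase."""
    return workers * (buffersize + 1)


def addressed_bundles(workers, buffersize):
    """Addressed mode: every worker gets one bundle of buffersize + 1 packets."""
    return [buffersize + 1] * workers


def non_addressed_fill(workers, buffersize):
    """Non-addressed mode: each packet is absorbed by the first worker with room.

    Simplifying assumption: no packet completes during the initial phase.
    """
    capacity = buffersize + 1  # one packet in process plus the buffer
    load = [0] * workers
    for _ in range(initial_batch(workers, buffersize)):
        for worker in range(workers):
            if load[worker] < capacity:
                load[worker] += 1
                break
    return load


print(initial_batch(8, 10))
print(addressed_bundles(8, 10))
print(non_addressed_fill(8, 10))
```

Under that assumption both modes end the initial phase with the same per-worker load; the modes diverge only once early workers begin completing packets and accepting more.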
The elapsed time for each batch of work was measured by
triggering stop.watch to get a start time at the end of the initialization
phase. Then after work.valve receives the results from the last
workpacket, stop.watch is triggered again to get the stop time. The
elapsed time is the difference between the stop time and the start
time and represents the total amount of time taken to process all of
the workpackets after the initialization phase for the network.
Several parameters are important in this workfarm. One of
the most important parameters is the variable stripsize. One goal
might be to section a monitor screen up into parts (or strips) and then
randomly generate a color for each pixel (one byte for each pixel) of
that strip. Reducing the stripsize reduces the number of pixels (or
bytes) that need to be calculated in that strip. Consequently the
processing time for a strip decreases along with stripsize. However,
the other side of this tradeoff is that as the stripsize shrinks, the
maximum number of strips required to complete the screen increases.
These two variables are inversely proportional. In order to calculate
the color for the screen randomly, a random number generation
library routine is called. It is actually called twice, once in gen.work
and then again in gen.pixel. Both results are used in gen.pixel. The
value calculated by gen.pixel is compared against the value forwarded
by gen.work. If the former is not within plus or minus the
window.width of the latter, then gen.pixel recalculates a random
number and does the comparison again. This procedure causes each
workpacket to be processed for a different period of time. Once a
match has been made within the constraint of the window.width, the
random number created by gen.pixel is returned with the results. The
reason for replacing the random number value with the one calculated
in gen.pixel is to scale the random number to a value between 0 and
255. Since there are only 256 possible colors per pixel, the calls
made by gen.pixel must produce results within the range of 0 to
255. As the window.width requirement becomes more
constrained, the process takes longer to converge and
processing time increases.
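The retry loop described above can be sketched in Python. This is an illustration under assumptions, not the OCCAM source: the function name `gen_pixel`, the use of Python's `random` module in place of the library routine, and drawing directly in the range 0 to 255 are all choices made for this example.

```python
import random


def gen_pixel(target, window_width, rng):
    """Draw colours until one falls within window_width of target.

    target       -- value forwarded by the work generator (0..255)
    window_width -- allowed distance between the draw and the target
    rng          -- a random.Random instance (stand-in for the library routine)
    Returns (colour, attempts); attempts models the variable processing time.
    """
    attempts = 0
    while True:
        attempts += 1
        colour = rng.randrange(256)  # 256 possible colours per pixel
        if abs(colour - target) <= window_width:
            return colour, attempts


rng = random.Random(1989)  # fixed seed only so the run is repeatable
for width in (64, 16, 4):
    colour, attempts = gen_pixel(target=128, window_width=width, rng=rng)
    print(width, colour, attempts)
```

Because the number of attempts is random, each pixel and therefore each workpacket takes a different amount of time, and a narrower window makes long runs of rejected draws more likely.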
Every run of the workfarm decreases the stripsize by one half.
The maximum stripsize is 128 bytes and the minimum stripsize is 1
byte. Each reduction in stripsize effectively doubles the maximum
workload. It also reduces the amount of processing time for each
workpacket because each successively smaller stripsize contains only
half as many bytes as the previously larger stripsize. By reducing the
processing time per workpacket, the rate at which workpackets and
results will be transmitted increases. It is possible that the first few
workers will become a communications bottleneck while attempting
to cope with the demands of more distal workers. This is because
more distal workers have less of a requirement for communications
due to their positioning within the network.
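The halving schedule and its effect on the number of workpackets can be tabulated with a small Python sketch. The function names and the `screen_bytes` value are hypothetical; the text does not give the screen size, so 1024 bytes is used purely as a placeholder.

```python
def stripsize_schedule(max_size=128, min_size=1):
    """Stripsizes used on successive runs, halving from max_size to min_size."""
    sizes = []
    size = max_size
    while size >= min_size:
        sizes.append(size)
        size //= 2
    return sizes


def strips_needed(screen_bytes, stripsize):
    """Strips (workpackets) required to cover the screen, rounded up."""
    return -(-screen_bytes // stripsize)  # ceiling division


screen_bytes = 1024  # placeholder value, not taken from the thesis
for size in stripsize_schedule():
    print(size, strips_needed(screen_bytes, size))
```

Each halving of the stripsize doubles the number of strips, so the per-packet computation shrinks while the number of packets and results that must pass through the workers nearest the controller grows.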
3. Tree Algorithm
The tree topology required some additional communication
considerations that were not required in the linear model. Since the
tree topology is a branching structure, allowances had to be made for
communications to follow these branches freely in the non-addressed
mode. Workpackets need to be able to go to any worker that is
available. The algorithm was basically the same as the pixel.gen used
in the linear algorithm with additional branching communication
channels. The controller was modified to handle two output channels
and two input channels to the network as shown in Figure 3.9. A new















Figure 3.9 Tree Model Controller Process
There were two modified versions of pixel.gen. The first,
pixel.gen2, was modified to handle three input and output channels as
required by the second level workers. In addition, pixel.gen2 was also
modified to allow a second level worker to reroute any workpacket
among the third level workers in the non-addressed mode. This
allowed a workpacket to be cycled through the third level workers
until one that could process or buffer it was found. Figure 3.10 shows
pixel.gen2 with the channel, reroute, which allows the rerouting of






Figure 3.10 Tree Model Second Level Worker Pixel.gen2 Process
The third level workers did not require any additional output
channels for branched workers and were not modified to handle this. If
additional workers were to be appended to the third level workers,
then pixel.gen3 could be modified the same as pixel.gen2 to allow for
communications. Pixel.gen3 does include a channel, reroute, which
will send an unprocessed workpacket back up to the second level
worker for redistribution to another third level worker. Pixel.gen3 is










The load balancing results [Ref. 5], produced by varying the
stripsize and buffersize parameters, are graphically depicted in
Appendix B and Appendix C. Appendix B contains the results for the
non-addressed mode operation of the linear topology. Appendix C
contains the results for the addressed mode operation. The graphs
are broken down by stripsize with workpackets done per worker
versus buffersize. Examining these graphs shows a wide range of load
balancing. Large stripsizes corresponding to larger workpacket sizes
resulted in more even load balancing for both the addressed and non-
addressed modes of operation. For the large stripsizes, buffersize did
not seem to have a significant effect on the load balancing. As the
stripsize was decreased below 32, the load balancing became less
symmetric with the most distal workers in the network taking on less
work with small buffersizes. The load balancing appears to smooth out
as the buffersize increases with stripsizes down through 8, but careful
inspection of the actual loads shows that the further the worker is
from the controller, the greater the load it will carry in terms of
workpackets processed. This can be attributed to the decreased
processing time associated with smaller workpackets. Decreasing the
workpacket size reduces the processing time required by any given
worker. This results in a communications buildup that increases the
more proximal to the controller a worker is. Those workers closest to
the controller become inundated with communication demands to
serve workpackets and receive results from the most distal workers
because of the faster processing time due to smaller packets. For the
non-addressed mode, serving workpackets to the network that are
smaller than 16 bytes in length results in severely unbalanced loading.
The addressed mode provides a slight advantage over the non-
addressed mode in that there is a point of convergence for load
balancing given any stripsize. A buffersize of 1 produced a reasonable
load balance for all workers in the network regardless of stripsize.
Distributing workpackets to the workers by request, rather than by the
first-available method used in the non-addressed mode, resulted in the
pattern seen in Appendix C.
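To make the stripsize values in this discussion concrete, the controller source in Appendix A computes the number of workpackets per batch as maxwork := (512*512)/stripsize. The short Python calculation below (Python is used only for illustration, and the function name `packets_per_batch` is hypothetical) shows how quickly the packet count, and with it the communication traffic, grows as the stripsize shrinks.

```python
def packets_per_batch(stripsize: int, image_side: int = 512) -> int:
    """Workpackets per batch, following maxwork := (512*512)/stripsize."""
    return (image_side * image_side) // stripsize


for s in (128, 64, 32, 16, 8, 4):
    print(f"stripsize {s:3d}: {packets_per_batch(s)} workpackets")
```

Halving the stripsize doubles the number of workpackets that must pass through the workers nearest the controller.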
B. Tree Topology
Due to the bilateral topology of the tree, the load balancing results,
as shown in Appendix D and Appendix E, were as expected for large
values of stripsize. The load was shared evenly between both main
branches of the tree. In addition the tertiary level workers shared
identical loads within a single branch. The tertiary workers in one
main branch also handled exactly the same amount of work as the
tertiary level workers in the other main branch. Both workers in the
second level handled the same amount of work, but processed fewer
packets than any given tertiary worker. This was expected as their
position in the network dictated their task in mediating
communications between the controller and the tertiary workers.
Both the non-addressed and addressed modes responded similarly
to the decrease in stripsize. The load balancing remained symmetric
through a stripsize of sixteen at which point the load balancing
became asymmetrical. As a result of faster processing time for smaller
packets, whichever branch gained control of the initial workpackets
first would continue to absorb most of them for the remainder of a
batch of work.
C. Comparison of Topology Performance
In order to do a comparative analysis of the two topologies, timing
results were obtained for both the linear and tree models for both
operating modes over a range of workpacket sizes. Table 4.1 contains
sample results for each of these categories. These figures were
obtained for a buffersize equal to 2 in all cases. In comparing the
linear with the tree for any given mode and stripsize, it is clear that
the tree has a temporal advantage in processing a complete batch of
work. Distributing work to the network deterministically using the
addressed mode resulted in quicker processing times over the non-
addressed mode. This was due to less processing time being required
to determine if a worker could handle the work. The deterministic
method merely checked to see if a worker was the destination worker.
If not, the workpacket was passed on. For example, with a large
stripsize (128 bytes), the tree topology using the addressed mode is




















Table 4.1 Comparison of Linear Model and
Tree Model Timing Results
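The two per-worker routing checks described before Table 4.1 can be sketched as below. This is an illustrative Python outline, not the thesis's occam code; the function names and return labels are hypothetical. It shows why the addressed check is cheaper: it is a single comparison, whereas the non-addressed check must consult the worker's busy flag and buffer occupancy.

```python
def route_addressed(packet_dest: int, my_id: int) -> str:
    """Addressed mode: one comparison against the destination worker."""
    return "accept" if packet_dest == my_id else "forward"


def route_non_addressed(busy: bool, buffered: int, buffer_size: int) -> str:
    """Non-addressed mode: process if idle, buffer if there is room,
    otherwise pass the workpacket on to the next worker."""
    if not busy:
        return "process"
    if buffered < buffer_size:
        return "buffer"
    return "forward"


# A packet addressed to worker 3 passes workers 1 and 2 untouched.
print([route_addressed(3, w) for w in (1, 2, 3)])
# A busy worker whose buffer (buffersize 2) is full forwards the packet.
print(route_non_addressed(busy=True, buffered=2, buffer_size=2))
```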
Although the linear model was simpler to set up, the tree topology
did have some advantages which might make it more useful. In terms
of timing, the tree topology was faster than the linear model. For
many applications the difference in speed between the two topologies
may not be of any consequence to the user. However, for applications
where every additional margin of speed is of the essence, a tree type
model may prove advantageous by creating the shortest
communication distances, with the fewest intermediate nodes
between a root controller and the most distal workers in the network.
The linear model could potentially become bogged down in
communication delays with increasing numbers of workers arranged
in that topology.
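The communication-distance argument can be illustrated with a small hop count. The Python sketch below is an assumption-laden example, not a measurement: the function names are hypothetical, and the tree shape used (two second level workers, each feeding two third level workers) is assumed for illustration only. It counts the links a workpacket crosses from the controller to each worker.

```python
from statistics import mean


def linear_hops(n_workers: int) -> list[int]:
    """Worker k in a chain sits k links away from the controller."""
    return list(range(1, n_workers + 1))


def tree_hops(n_second: int, n_third_per_second: int) -> list[int]:
    """Second level workers are 1 link away, third level workers 2."""
    hops = []
    for _ in range(n_second):
        hops.append(1)
        hops.extend([2] * n_third_per_second)
    return hops


for name, hops in (("linear", linear_hops(6)), ("tree", tree_hops(2, 2))):
    print(name, "max hops:", max(hops), "mean hops:", round(mean(hops), 2))
```

With six workers in each arrangement, the most distal worker is six links away in the chain but only two links away in the assumed tree, and the gap widens as workers are added to the chain.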
Load balancing considerations make the tree a better model for
processor utilization. There was a greater percentage of processors in
the tree network doing useful processing as compared with the linear
model. This was particularly evident as the stripsize decreased below
thirty-two.
V. CONCLUSIONS and DISCUSSION
Difficulties encountered were centered around debugging. The
Transputer Development System, TDS, that was used did not provide
any debugging facilities. Therefore, attempting to localize bugs in the
network was a time consuming and not altogether enjoyable task. The
controller level of each of the networks was simpler to debug because
it interfaced directly with the monitor controlling boards. However,
the network of workers was essentially invisible to debugging except
at the source code level. Possible alternatives to working in the
Transputer Development System would be using the Profiler [Ref. 6]
and NDB (network debugger) [Ref. 7] produced by Parasoft
Corporation. The Profiler is a performance monitor composed of an
Execution Profiler, a Communication Profiler, and an Event Profiler.
The Execution Profiler monitors time spent in individual routines, the
Communication Profiler evaluates the time spent in communications
and I/O, and the Event Profiler shows interactions between processors
and allows user-specified events to be monitored. NDB is a symbolic
source and assembly level debugger for parallel computers. Using this
tool it is possible to determine how far a program has run. Even with
these tools, however, the network still could not be monitored
completely.
The tree network, running in the non-addressed mode, did not
execute any batches of work beyond the first two within a stripsize of
eight. Due to time constraints and the inherent difficulties with
debugging, this problem remains to be resolved. Data from this
configuration with stripsizes ranging from 128 to 16 as compared to
data from the other network configurations appeared to be accurate
and was considered useful.
The transputer has an inherent simplicity based on the CSP
philosophy which makes it a valuable tool for workfarm research.
Many topologies, such as a cube or loop, can be investigated quickly
and simply using this device. Additional research should
investigate shared memory schemes to enhance the utility
of this device in problems requiring shared databases. In addition,
more effort should be applied towards the production of debugging
tools for multiprocessor networks.
Another approach to debugging small networks of transputers,
i.e., fewer than 16, could lie in the CSP design philosophy of the
transputer itself. A debug-specific hardware board could be designed
which taps one link to each transputer in the network thus enabling a
debug program to monitor each of the processes executing in any or
all of the transputers in the network connected to the debug
hardware. Obviously this would defeat the connectivity of each of the
transputers in the network to some degree. However, in the interest
of providing more debugging capability and hence more robust
programs, this may be an area worth investigating.
APPENDIX A
DETAILED SOURCE CODE - LINEAR TOPOLOGY
-- link definitions
VAL link0out IS 0 :
VAL link1out IS 1 :
VAL link2out IS 2 :
VAL link3out IS 3 :
VAL link0in IS 4 :
VAL link1in IS 5 :
VAL link2in IS 6 :
VAL link3in IS 7 :
-- declarations
VAL numT8 IS 8 :
VAL numT4 IS 0 :
VAL numTs IS numT8+numT4 :
-- channel declarations
CHAN OF ANY to.graph, from.graph :
CHAN OF ANY graph.to.mouse, mouse.to.graph :
CHAN OF ANY to.net, from.net :
CHAN OF ANY :
CHAN OF ANY to.time, from.time :
[numT8+numT4] CHAN OF ANY results :
[numT8+numT4] CHAN OF ANY work :
-- VAL assignments
VAL work.in IS
VAL work.out IS
VAL results.in IS





PROC controller (CHAN OF ANY to.graph, from.graph, work, results)
--Process which provides control for workfarm. Contains four internal





CHAN OF ANY new.work, init, to.display, from.display :
CHAN OF ANY to.time, from.time :
CHAN OF ANY report :
PROC work.valve (CHAN OF ANY results, new.work,
                 to.time, from.time, to.display, from.display, report)
--PROC work.valve monitors the network for incoming results which it
--sends to PROC display. At the beginning and end of each batch of
--work, PROC work.valve gets the current time from PROC stop.watch.
--At the end of a batch of work, PROC work.valve computes the elapsed





























command = c.result
  SEQ
    results ? x; y; [pixels FROM 0 FOR stripsize]; address
    to.display ! c.result; x; y;
                 [pixels FROM 0 FOR stripsize]
    new.work ! address








command = c.report
  SEQ
    from.time ? time
    to.display ! c.time; time
command = c.report.data
  SEQ
    results ? packets; compares; num
    to.display ! c.report.data; packets; compares; num
command = c.init
  SEQ
    results ? seed; window.width; rmax;
              stripsize; lastT; buffer.size
    to.time ! TRUE -- start timing
    IF
      address = 999
        SEQ i = 0 FOR iters
          new.work ! address
      TRUE
        SEQ i = 0 FOR iters/(buffer.size+1)
          SEQ j = 0 FOR buffer.size








from.display ? stripsize; lastT; address; buffer.size
  SEQ
    iters := lastT TIMES (buffer.size+1)
    maxwork := (512*512)/stripsize
    workdone := 0
PROC gen.work (CHAN OF ANY in, out, init, report)
--PROC gen.work creates new work as dictated by PROC work.valve. Work
--is only sent to the network as results are returned. The one
--exception to this is the initialization of the network for each
--batch of work. At that time PROC display provides the initial
--values for the parameters to be used by the network in doing its
--calculations. PROC display sends this information to PROC gen.work
--which then sends it to the network. When a reply is received by
--PROC work.valve that the network has been initialized then a trigger
--is sent to PROC gen.work to send a specific number of workpackets to
--the network. As results are received by PROC work.valve, PROC


















PROC random (INT rnum, VAL INT rmax)

















out ! c.ray; x; y; randnum; address
workdone := workdone + 1










init ? Tseed; window.width; maxrand; stripsize; lastT; buffer.size
  SEQ
    out ! c.init; Tseed; window.width; maxrand; stripsize;
          lastT; buffer.size





out ! c.report
PROC display (CHAN OF ANY in, out, to.graph, from.graph, init)
--PROC display generates the initial data required by the network to
--process the workpackets. It sends this data to PROC gen.work which
--in turn sends the initialization data to the network. PROC display
--also receives incoming results from PROC work.valve and sends them





INT num, packets, total :
INT maxwork, reports, time, buffer.size :
INT x, y, command :








BOOL active, not.done :
VAL seed IS 127463 (INT32) :






    params[0] := VT220.set.scroll
    params[1] := top
    params[2] := bottom
    to.graph ! c.to.VT220; mode; params
PROC clear.screen ()
  SEQ
    mode := 0
    params[0] := VT220.clear.screen




    params[0] := VT220.tab
    to.graph ! c.to.VT220; mode; params
PROC c.return ()
  SEQ
    mode := 0
    params[0] := VT220.return
    to.graph ! c.to.VT220; mode; params
PROC print.screen ()
  SEQ
    mode := 0
    params[0] := VT220.print.screen
    to.graph ! c.to.VT220; mode; params
PROC write.num (VAL INT num)
  SEQ
    mode := 1
    params[0] := VT220.num
    params[1] := num
    to.graph ! c.to.VT220; mode; params
PROC write.num.xy (VAL INT num, x, y)
  SEQ
    mode := 1
    params[0] := VT220.num.xy
    params[1] := num
    params[2] := x
    params[3] := y
    to.graph ! c.to.VT220; mode; params
PROC write.text (VAL []BYTE text)
  SEQ
    mode := 2
    params[0] := VT220.text
    len := SIZE text
    to.graph ! c.to.VT220; mode; params; len; text
PROC write.text.xy (VAL []BYTE text, VAL INT x, y)
  SEQ




    params[0] := VT220.text.xy
    params[1] := x
    params[2] := y
    len := SIZE text




    params[0] := VT220.highlight




    params[0] := VT220.underline
    to.graph ! c.to.VT220; mode; params








to.graph ! c.get.mouse
from.graph ? x.pos; y.pos; m.l; m.m; m.r
WHILE (NOT m.r) AND (NOT m.m) AND (NOT m.l)
  SEQ
    to.graph ! c.get.mouse












WHILE (stripsize >= 1)
  SEQ
    SEQ buffer.size = 1 FOR 20
      SEQ
        -- color monitor


















write.text.xy
reply
c.init.crt; sony
reply
c.select.screen;
reply
c.clear.screen;
reply
c.display.screen;
reply
c.select.colour.table;
reply










c.to.host; window.width
c.to.host; stripsize
c.to.host; buffer.size
write.text.xy ("Window.width : ", 3, 1)
write.num.xy (window.width, 3, 12)
write.text.xy ("# Transputers: ", 3, 50)
write.num.xy (lastT, 3, 65)
write.text.xy ("Random number range - ", 4, 1)
write.num.xy (rmax, 4, 25)




write.num.xy
c.return ()
underline ()






(buffer.size, 5, 63)
Work Done Compares")
out ! stripsize; lastT; address; buffer.size
init ! seed; window.width; rmax; stripsize; lastT; buffer.size
reports := 0











[pixels FROM 0 FOR stripsize]




= maxwork + 1
command = c.report.data
  SEQ
    in ? packets; compares; num




c.to.host; num
c.to.host; packets
c.to.host; (INT compares)


















write.text ("Test ")
write.num (command)








to.graph ! c.to.host; maxwork
to.graph ! c.to.host; time
c.return ()
write.text ("Total work: ")
write.num (maxwork)
c.return ()
write.text ("Time (usec): ")
write.num (time)
c.return ()




PROC stop.watch (CHAN OF ANY in, out)
--PROC stop.watch provides timing information from the high priority
--clock running in 1 usec ticks. PROC stop.watch is called in this


















work.valve (results, new.work, to.time, from.time,
            to.display, from.display, report)
gen.work (new.work, work, init, report)
display (to.display, from.display, to.graph, from.graph, init)
stop.watch (to.time, from.time)
SKIP
PROC pixel.gen (CHAN OF ANY work.in, work.out, results.out, results.in,
                VAL INT mynum)
--PROC pixel.gen is run by each worker in the workfarm. It contains
--processes which control the routing, processing and buffering of
--workpackets and processes which control the flow of results back up to
--the controller.
#USE "raycom.tsr" -- ray tracer command definitions
-- declarations
CHAN OF ANY to.ray.tr, from.ray.tr, requestmore : -- internal channels
PROC router (CHAN OF ANY work.in, work.out, to.ray.tr, requestmore,
             VAL INT mynum)
--PROC router controls the flow of workpackets. If a given worker in
--the network cannot handle the current workpacket it is either
--buffered or sent to another worker in the network.
-- declarations




VAL buffmax IS 20 :





INT address, buffer.size :
INT command, lastT :
INT32 seed, myseed :









PROC store (INT x, y, randnum, address)
--PROC store is the buffer for workpackets in any given worker. If
--the worker is not busy and PROC store has available workpackets,
--one will be sent by PROC send to PROC gen.pixel
  SEQ
    buffcount := buffcount + 1
    IF
      (buffcount = (buffer.size-1)) OR (buffcount = (buffmax-1))
        SEQ
          bufferfull := TRUE
          xtemp[buffcount] := x
          ytemp[buffcount] := y
          rantemp[buffcount] := randnum




          xtemp[buffcount] := x
          ytemp[buffcount] := y
          rantemp[buffcount] := randnum
          addresstemp[buffcount] := address
PROC send ()
--PROC send removes workpackets from the workpacket buffer and
--routes them to PROC gen.pixel if PROC gen.pixel is not busy and
--the buffer has at least one workpacket.
  SEQ
    to.ray.tr ! c.ray; xtemp[buffcount]; ytemp[buffcount];
                rantemp[buffcount]; addresstemp[buffcount]
    bufferfull := FALSE
    packets.done := packets.done + 1
    IF
      buffcount > empty













-- (NOT busy) & requestmore ? sendmore
--   SKIP
work.in ? command
IF
  command = c.ray
    SEQ






    to.ray.tr ! c.ray; x; y; randnum; address
    packets.done := packets.done + 1
    busy := TRUE
bufferfull
  work.out ! c.ray; x; y; randnum; address
TRUE






to.ray.tr ! c.ray; x; y; randnum; address
packets.done := packets.done + 1
busy := TRUE
TRUE
store (x, y, randnum, address)
TRUE




work.in ? seed; window.width; rmax;
          stripsize; lastT; buffer.size
myseed := seed
seed := seed + 1231 (INT32)
work.out ! c.init; seed; window.width; rmax; stripsize;
           lastT; buffer.size





command = c.report
  SEQ
    work.out ! c.report
    to.ray.tr ! c.report.data; packets.done
command = c.test
  SEQ
    work.in ? command
    work.out ! c.test; command
TRUE
  SKIP
PROC bypass (CHAN OF ANY results.in, results.out, from.ray.tr)
--PROC bypass receives results from PROC gen.pixel or from other
--workers downstream trying to send their results up to the controller.
--It provides a control point for the flow of results to the controller.
-- declarations
#USE "raycom.tsr" -- ray tracer command definitions
CHAN OF ANY mine, theirs : -- internal channels
PROC my.buffer (CHAN OF ANY in, out)
--PROC my.buffer receives results from PROC gen.pixel within that
--specific worker and forwards that data on to PROC mix for
--transmission to the controller
-- declarations






INT mynum, packets.done :
INT64 compares :










command = c.result
  SEQ
    in ? x; y; [pixels FROM 0 FOR stripsize]; address
    out ! c.result; x; y;
          [pixels FROM 0 FOR stripsize]; address
command = c.report.data
  SEQ
    in ? packets.done; compares; mynum






out ! c.test; command
TRUE
  SKIP
PROC buffer (CHAN OF ANY in, out)
--PROC buffer receives results from downstream workers and forwards
--the results to PROC mix for transmission to the controller
-- declarations
#USE "raycom.tsr" -- ray tracer command definitions




INT mynum, packets.done :
INT64 compares :










command = c.result
  SEQ
    in ? x; y; [pixels FROM 0 FOR stripsize]; address
    out ! c.result; x; y;
          [pixels FROM 0 FOR stripsize]; address
command = c.report.data
  SEQ
    in ? packets.done; compares; mynum
    out ! c.report.data; packets.done; compares; mynum
command = c.report
  out ! c.report
command = c.init
  SEQ
    in ? seed; window.width; rmax;
         stripsize; lastT; buffer.size





out ! c.test; command
TRUE
  SKIP
PROC mix (CHAN OF ANY mine, theirs, out)
--PROC mix acts as the control point for sending results from this
--worker or from a downstream worker depending on whether the active
--input channel is from PROC my.buffer or PROC buffer respectively.
-- declarations
#USE "raycom.tsr" -- ray tracer command definitions




INT mynum, packets.done :
INT64 compares :










command = c.result
  SEQ
    mine ? x; y; [pixels FROM 0 FOR stripsize]; address
    out ! c.result; x; y;
          [pixels FROM 0 FOR stripsize]; address






out ! c.test; command
command = c.report.data
  SEQ
    mine ? packets.done; compares; mynum





command = c.result
  SEQ
    theirs ? x; y; [pixels FROM 0 FOR stripsize]; address
    out ! c.result; x; y;
          [pixels FROM 0 FOR stripsize]; address
command = c.init
  SEQ
    theirs ? seed; window.width; rmax;
             stripsize; lastT; buffer.size





out ! c.test; command
command = c.report
  out ! c.report
command = c.report.data
  SEQ
    theirs ? packets.done; compares; mynum




my.buffer (from.ray.tr, mine)
buffer (results.in, theirs)
mix (mine, theirs, results.out)
PROC gen.pixel (CHAN OF ANY in, requestmore, out, VAL INT mynum)
--PROC gen.pixel processes the workpackets by generating random
--numbers until one of them falls within a specific tolerance
--(window.width) of a random number generated by PROC gen.work in the
--controller process. The results are sent to the local PROC
--my.buffer and then to PROC mix for transmission to the controller
--either directly or via an intermediate worker.
-- declarations




















VAL sendmore IS 0 (BYTE) :
PROC random (INT rnum, VAL INT rmax)












command = c.ray
  SEQ
    in ? x; y; randnum; address
    SEQ i = 0 FOR stripsize
      SEQ
        random (rnum, 255)












            compares := compares + 1 (INT64)
            IF
              window.width >= temp
                closenough := TRUE
              TRUE
                random (rnum, rmax)
    out ! c.result; x; y; [pixels FROM 0 FOR stripsize]; address
    requestmore ! sendmore
command = c.init
  SEQ
    in ? seed; window.width; rmax; stripsize
    out ! c.init; stripsize
    compares := 0 (INT64)
command = c.report.data
  SEQ
    in ? packets.done










router (work.in, work.out, to.ray.tr, requestmore, mynum)
bypass (results.in, results.out, from.ray.tr)
-- low priority
gen.pixel (to.ray.tr, requestmore, from.ray.tr, mynum)




graphics (in, out, from.mouse, to.mouse)





mouse (from.graph, to.graph, from.net, to.net)
PLACED PAR
  PROCESSOR 2 T4 -- mouse/terminal process
    PLACE mouse.to.graph AT link0out :
    PLACE graph.to.mouse AT link0in :
    PLACE from.net AT link2out :
    PLACE to.net AT link2in :
    VT220mouse (graph.to.mouse, mouse.to.graph, from.net, to.net)
  PROCESSOR 7 T4 -- graphics process
    PLACE to.graph AT link3in :
    PLACE from.graph AT link3out :
    PLACE mouse.to.graph AT link0in :
    PLACE graph.to.mouse AT link0out :
    graph (to.graph, from.graph, mouse.to.graph, graph.to.mouse)
  PROCESSOR 10 T8 -- controller process
    PLACE to.graph AT link1out :
    PLACE from.graph AT link1in :
    PLACE work[0] AT link0out :
    PLACE results[0] AT link0in :
    controller (to.graph, from.graph, work[0], results[0])
  PLACED PAR i = 0 FOR numT8-1 -- T800 ray tracers
    PROCESSOR ((i+1)*100) T8
      PLACE work[i] AT work.in[i] :
      PLACE work[i+1] AT work.out[i] :
      PLACE results[i] AT results.out[i] :
      PLACE results[i+1] AT results.in[i] :
      pixel.gen (work[i], work[i+1], results[i], results[i+1], ((i+1)*100))
  PROCESSOR ((numT8)*100) T8
    PLACE work[numT8-1] AT work.in[numT8-1] :
    PLACE results[numT8-1] AT results.out[numT8-1] :
    pixel.gen (work[numT8-1], endloopT8, results[numT8-1],




LINEAR MODEL GRAPHIC DATA (NON-ADDRESS MODE)






























































































































































































LINEAR MODEL GRAPHIC DATA (ADDRESS MODE)


























































































































































































































































TREE MODEL GRAPHIC DATA (NON-ADDRESS MODE)



























































































TREE MODEL GRAPHIC DATA (ADDRESS MODE)



























































































































































[Graph: Stripsize = 4]




























































1. Stone, Harold S., High Performance Computer Architecture,
Addison Wesley, Reading, Massachusetts, 1987.
2. Transputer Reference Manual, January 1987, INMOS Ltd.,
Bristol, United Kingdom.
3. Hoare, C. A. R., "Communicating Sequential Processes,"
Communications of the ACM, vol. 21, no. 8, pp. 666-677, August
1978.
4. A Tutorial Introduction to OCCAM Including Language Definition,
March 1988, INMOS LTD., Bristol, United Kingdom.
5. Yang, C. and Johnson, T. J., "A Sensitivity Analysis of the Linear
Model Workfarm," accepted for presentation, SIAM Parallel
Processing Conference, December 1989.
6. Profiler: A Profiling System for Parallel Computers, 1988,
Parasoft Corporation, Mission Viejo, California.
7. NDB: A Source Level Debugger for Parallel Computers, "A Guide
to Debugging C and Fortran Programs," 1988, Parasoft
Corporation, Mission Viejo, California.
BIBLIOGRAPHY
Bryant, Gregory R., "Design, Implementation and Evaluation of an
Abstract Programming and Communications Interface for a Network of
Transputers," M.S. Thesis June 1988, Naval Postgraduate School,
Monterey, California.
Cloughley, William R., "Evaluation of Work Distribution Algorithms and
Hardware Topologies in a Multitransputer Network," M.S. Thesis June
1988, Naval Postgraduate School, Monterey, California.




1. Defense Technical Information Center
Cameron Station
Alexandria, Virginia 22304-6145
2. Library, Code 0142
Naval Postgraduate School
Monterey, California 93943-5002
3. Department Chairman, Code 62
Department of Electrical Engineering
Naval Postgraduate School
Monterey, California 93943
4. Dr. Chyan Yang, Code 62YA
Department of Electrical Engineering
Naval Postgraduate School
Monterey, California 93943
5. Dr. Uno R. Kodres, Code 52KR
Department of Computer Science
Naval Postgraduate School
Monterey, California 93943
6. Dr. Man-Tak Shing, Code 52SH
Department of Computer Science
Naval Postgraduate School
Monterey, California 93943













10. Mr. Shawn DeKalb




Mr. James F. Johnson
RD #1
Granville, New York 12832
12. Lieutenant Timothy J. Johnson
16229 Arena PI.
Ramona, California 92065
13. AEGIS Modeling Laboratory, Code 52








