This paper presents a comparative analysis for the PxMxB multi-bus computer system. The system consists of a set of P processors that access a set of M shared memories via a set of B buses. We begin by describing the system and the underlying assumptions to model it.
INTRODUCTION
Analyses of multi-processor computer systems have attracted tremendous attention over the past two decades. Many computer architectures have been studied and their potential performance characteristics evaluated [MarSan 1986]. Via a simulation approach, unnecessary simplifying assumptions may be eliminated. Moreover, a simulation model may capture the dynamics of a system in great detail.
In this paper we analyze the performance of the basic PxMxB multi-processing system. We present an analytical model and then compare it numerically against two sets of simulation results.
OVERVIEW
Consider a multi-processor architecture with:
P identical parallel processors,
M equally referenced memories, and
B global buses.
Every processor may access at a given time any of the M memories using one of the B buses, as depicted in Figure 1.
Figure 1. A multi-processing system
In a given machine cycle a processor generates a memory request (i.e. a store or a fetch) with a probability of $\theta$. If the addressed memory is free and a bus is available, the request is satisfied in the next cycle. If, however, another processor places a request to that memory in the same cycle, or a bus is not available, the request is reintroduced in the following cycle until it gets satisfied. During the wait for the memory request to be completed the issuing processor is blocked.
ASSUMPTIONS

In a given cycle only one out of multiple requests to a given memory may be serviced. Service time is one cycle.
At the beginning of a cycle a request may be issued to a memory only if one of the buses is available. Therefore, if B < M, at most B requests are serviced simultaneously.
A request satisfied in a given cycle enables the processor to continue its execution in the following cycle. If, however, the request is blocked because the memory is engaged in servicing another request, or all buses are busy, the request is reissued in the next cycle.
After every cycle all buses are disengaged from any processor/memory association.
4.3
The probability $\pi_k$ that in a given cycle exactly k memories are referenced is therefore:
$$\pi_k = \sum_{r=k}^{P} P_r \, q_{r,k}, \qquad k = 1, \dots, M$$
The bandwidth of the system, $BW(\theta)$, is the rate at which the system completes memory requests. It equals:
$$BW(\theta) = \sum_{k=1}^{M} \min(k, B) \, \pi_k$$
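The bandwidth sum can be evaluated numerically once the request distribution $P_r$ of 4.1 and the probabilities $q_{r,k}$ of 4.2 are available. The following Python sketch is illustrative only. The function names are hypothetical, and the normalization of $q_{r,k}$ is an assumption: only the factors $\binom{M}{k}$ and $\binom{r-1}{k-1}$ are legible in the text, so the sketch divides their product by its sum over k, which equals $\binom{M+r-1}{r}$ by Vandermonde's identity.

```python
from math import comb


def request_pmf(P, theta):
    """P_r: probability that r of the P processors issue a request (binomial)."""
    return [comb(P, r) * theta**r * (1 - theta) ** (P - r) for r in range(P + 1)]


def q(r, k, M):
    """Probability that r requests address exactly k of the M memories.

    ASSUMPTION: the two binomial factors are normalized by their sum over k,
    comb(M + r - 1, r). The original normalization is not legible.
    """
    return comb(M, k) * comb(r - 1, k - 1) / comb(M + r - 1, r)


def memories_pmf(P, M, theta):
    """pi_k: probability that exactly k memories are referenced in a cycle."""
    p = request_pmf(P, theta)
    k_max = min(P, M)
    pi = [0.0] * (k_max + 1)
    pi[0] = p[0]  # no requests at all, so no memory is referenced
    for k in range(1, k_max + 1):
        pi[k] = sum(p[r] * q(r, k, M) for r in range(k, P + 1))
    return pi


def bandwidth(P, M, B, theta):
    """BW(theta) = sum over k of min(k, B) * pi_k."""
    pi = memories_pmf(P, M, theta)
    return sum(min(k, B) * pi[k] for k in range(1, len(pi)))


if __name__ == "__main__":
    # Hypothetical configuration: 8 processors, 4 memories, 2 buses.
    print(bandwidth(8, 4, 2, 0.5))
```

Two limiting cases give a quick sanity check: with a single processor the bandwidth reduces to $\theta$, and with a single memory it reduces to $1 - (1-\theta)^P$.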
ANALYTICAL APPROXIMATION
The analysis presented here is very similar to the one outlined in [MacDougall 1987] and [Mudge 1984]. The main difference is the introduction of the $q_{r,k}$ probabilities in 4.2.
(a) Assume initially that at the beginning of a cycle there are no outstanding requests from previous cycle(s).
4.1
The probability $P_r$ of having r requests in a given cycle is given by:
$$P_r = \binom{P}{r} \, \theta^{r} (1-\theta)^{P-r}, \qquad r = 0, 1, \dots, P$$
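4.2
The probability $q_{r,k}$ that in a given cycle r requests are addressed to exactly k memories is given by:
$$q_{r,k} = \binom{M}{k} \binom{r-1}{k-1} \, [\ldots], \qquad k = 1, \dots, M, \quad r = k, \dots, P$$
(b) Assume now that at the beginning of a given cycle there are also requests that were not satisfied in previous cycles.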
4.5
We apply here the approximation BW(a) for the Bandwidth, where
It is thus implied in the approximation for the bandwidth that colliding requests only increase the effective rate of emitting memory requests. In reality, though, a colliding request is readdressed in the next cycle to the same memory. Thus if, for example, 3 requests are issued to a given memory in a given cycle, at least 2 requests are posted for that memory in the following cycle. The analytical approximation does not take this dependency into account; it assumes a uniform distribution of requests at the beginning of every cycle.
4.6
Denote by c the average number of cycles before a request is issued. The geometric distribution yields:
The average time T, between the completion of consecutive requests at the individual proces- 
THE SIMULATION MODEL
We built the simulation model with Emula4 [Halachmi 1993]. Emula4 is a process-oriented network simulation language designed for the MS-WINDOWS platform. It is a multi-tasked simulation superset of the standard programming language C [Kernighan 1988].
The main logic of the simulation model consists of two types of processes, Fetch and Execute. At the simulation start a pair of these processes is spawned for each of the P CPUs.
The Fetch process contains the logic of processing machine instructions at a given CPU. Every once in a while a memory request is generated and the CPU is blocked until a response is received for this memory request.
An Execute process serves all memory requests for a given CPU.
When a memory request registers at an Execute process, the addressed memory is checked.
If busy, the request is delayed for a cycle.
Likewise, if a bus is not available, the request is not processed in that cycle. This wait may last several cycles until both a memory and a bus are found available to carry out the request. At this point the memory and the bus are seized for a cycle, and then a response is sent back to the corresponding Fetch process, unblocking the CPU to continue processing machine instructions until the next memory request is issued. 
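The logic described above can be sketched as a cycle-by-cycle loop. The following Python fragment is a minimal illustration and not the Emula4 model: the function name `simulate`, the `sticky` flag, and the random arbitration among competing processors and memories are all assumptions made for this sketch. With `sticky=True` a blocked request is retried on the same memory; with `sticky=False` it is redrawn uniformly, as the analytical model assumes.

```python
import random


def simulate(P, M, B, theta, cycles=20000, sticky=True, seed=1):
    """Return completed memory requests per cycle for a P x M x B system."""
    rng = random.Random(seed)
    pending = [None] * P  # memory each blocked processor waits for, or None
    completed = 0
    for _ in range(cycles):
        for p in range(P):
            if pending[p] is None:
                # A running processor issues a request with probability theta.
                if rng.random() < theta:
                    pending[p] = rng.randrange(M)
            elif not sticky:
                # Simplified model: a retried request picks a fresh memory.
                pending[p] = rng.randrange(M)
        # Group the outstanding requests by addressed memory.
        by_memory = {}
        for p, m in enumerate(pending):
            if m is not None:
                by_memory.setdefault(m, []).append(p)
        # ASSUMPTION: random arbitration. At most B memories obtain a bus,
        # and each of them serves exactly one of its competing requests.
        memories = list(by_memory)
        rng.shuffle(memories)
        for m in memories[:B]:
            winner = rng.choice(by_memory[m])
            pending[winner] = None  # unblocks the processor for the next cycle
            completed += 1
    return completed / cycles


if __name__ == "__main__":
    # Hypothetical configuration: 8 processors, 4 memories, 2 buses.
    print(simulate(8, 4, 2, 0.5, sticky=False))
    print(simulate(8, 4, 2, 0.5, sticky=True))
```

The sketch counts a request as completed in the cycle in which it wins both a memory and a bus. This shifts completions by one cycle relative to the description above but leaves the long-run throughput unchanged.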
THE FETCH & EXECUTE CODE
The model was first coded with the same set of assumptions as described for its analytical counterpart. Next we modified it (as above) to accommodate the more realistic assumption that if a request is not satisfied in
NUMERICAL RESULTS
The simulation model was executed with the number of processors varying between 1 and 10.
For every run we measured the system throughput and compared it with the analytical approximation derived via 4.10 and 4.11.
Two sets of results were generated:
I. corresponds to the system with the same assumptions as applied for the analytical model.
II. corresponds to the system without the simplifying assumption about memory references.
The numerical results are presented in Table 1 and Figure 2 . It is apparent from these results that for a small number of processors (relative to the number of memories) the analytical formula approximates well the throughput of the system. As the number of processors grows, the increase in throughput diminishes.
Also notice that the analytical approximation overshoots the anticipated values, especially those of set II. The analytical formula deteriorates partially due to the essential (but unrealistic) assumption that if a request to a given memory is not satisfied in a cycle, the request is reissued in the subsequent cycle, but not necessarily to the same memory unit. There is, of course, no need to commit to such a simplifying assumption in the simulation model.
CONCLUSION
Analytical models and simulation models work side by side. They are complementary to each other. In the case of the simple interconnection bus system, the iterative analytical procedure provides quick and valuable estimates. When we need to analyze a more detailed system, and cannot afford to make gross simplifying assumptions, the simulation approach is the ultimate choice.
