Simulation experiments for performance analysis of multiple-bus multiprocessor systems with nonexponential service times by Onyuksel, Ibrahim & Irani, Keki B.
Simulation experiments for performance analysis
of multiple-bus multiprocessor systems
with nonexponential service times
Ibrahim H. Onyuksel








Ann Arbor, MI 48109.
IBRAHIM H. ONYUKSEL received the Yuksek Muhendis (Dipl.-Ing.) degree
in electrical engineering from Istanbul Technical University, Istanbul, Turkey,
in 1969, the M.S. degree in electrical engineering from Cornell University,
Ithaca, NY, in 1977, and the Ph.D. degree in computer, information and
control engineering from the University of Michigan, Ann Arbor, MI, in 1985.
From 1971 to 1972 he was with the Turkish State Radio and Television
Network, and from 1972 to 1973 he was with Marmara Scientific and Industrial
Research Institute in Turkey, where he worked on RF communication
hardware. Currently he is an Assistant Professor in the Department of
Computer Science at Northern Illinois University, DeKalb, IL. Before joining NIU,
he was on the faculty of Northwestern University, Evanston, IL, and George
Mason University, Fairfax, VA. His current research interests include
performance evaluation of computer and communication systems, and
queueing network models.
Dr. Onyuksel is a member of the IEEE and ACM. He was on the program
committee of COMPSAC 1986 and 1987.
KEKI B. IRANI received the B.S. degree in mechanical engineering and the
B.S. degree in electrical engineering from the University of Bombay,
Bombay, India, in 1946 and 1947, respectively, and the M.S.E. degree in
electrical engineering in 1949 and the Ph.D. degree in 1953, both from the
University of Michigan, Ann Arbor, MI.
From 1950 to 1957 he was a Research and Design Engineer for Philips
Telecommunication Industries, The Netherlands, where he worked on
coaxial telephone systems. He joined the faculty of the Department of Elec-
trical Engineering, University of Kansas in 1956, and the University of
Michigan faculty in 1961. He is now with the University of Michigan, where
he is a Professor in the Department of Electrical Engineering and Computer
Science. He has worked in the areas of electrical power systems, electrical
networks, communications, control and computers. His current research
interests are computer architecture, distributed computer systems,
databases and artificial intelligence.
Dr. Irani is a member of the IEEE, ACM, Sigma Xi, and Tau Beta.
ABSTRACT
A simulation model (program) is constructed for performance
analysis of multiple-bus multiprocessor systems with shared
memories. It is assumed that the service time of the common
memory is either hypo- or hyperexponentially distributed. Process-
ing efficiency is used as the performance index. To investigate the
effects of different service time distributions on the system perfor-
mance, comparative results are obtained for a large set of input
parameters. The simulation results show that the error in approx-
imating the memory access time by an exponentially distributed
random variable is less than 6% if the coefficient of variation is
less than 1, but it increases drastically with this factor if it is greater
than 1.
Key words: Multiprocessor systems, nonexponential service
times, performance modeling and analysis, pro-
cessing efficiency, simulation models.
INTRODUCTION
In early multiprocessor systems, a crossbar switch was used to connect
the processors to the common memory. For example, a widely
known crossbar system is the C.mmp multi-minicomputer [1]. The
performance of crossbar multiprocessor systems has been analyzed
in recent years [2]-[6]. However, crossbar interconnection networks
are becoming less and less interesting due to their complexity and
their cost. Recent proposals and implementations show that a more
attractive alternative would be a bus-structured interconnection
network [7], [8].
The performance of bus-oriented multiprocessor systems has been
studied by many researchers. Fung and Torng [9] developed a deter-
ministic model for the analysis of memory contention and bus con-
flicts in multiple-bus multiprocessor systems. Goyal and Agerwala
[10] proposed two generic classes of multiple-bus systems, and they
analyzed the performance of these systems using the in-
dependence approximation introduced by Hoogendoorn [4].
In a way similar to that suggested by Enslow [11], a multiple-bus
multiprocessor system, as depicted in Figure 1, can be described
by its characteristics as follows:
• The multiprocessor system contains two or more processors of
comparable capabilities. Each processor has its own local
memory unit.
• All processors share access to a common memory, which
consists of several modules.
• Processors and common memory modules are connected by
multiple buses.
• The allocation of common resources to processors is
controlled by a controller unit.
Processors execute segments of programs in their local memories
until they need to access the common memory. When a processor
requests access to the common memory, it computes an address,
including a memory module number, and then signals the controller
for a connection to the referenced module. Requests for
connection are assumed to be independent from one processor to
another, and more than one processor may request access
simultaneously. A processor can access the common memory via
one of the time-shared buses if the referenced module is free and
a bus is available for connection. The configuration of this type of
system is usually denoted by a triple p x m x b, where p, m, and
b are the number of processors, the number of memory modules, and
the number of buses, respectively.
Figure 1. A typical multiple-bus multiprocessor system.
The main reason for using multiprocessor systems and dividing
the common memory into several modules is to achieve better per-
formance for the proposed system. In theory, a multiprocessor
system with N independent processors can compute a given pro-
blem at most N times faster than any one of the processors can.
However, theoretical speedup cannot be achieved in practice
because of sharing the common resources among the processors.
Section 2 presents a closed queueing network model for the per-
formance analysis of a multiple-bus multiprocessor system. Sec-
tion 3 shows a simple way to use hypo- and hyperexponential
distributions to match the first two moments of a random variable.
Finally, in Section 4, our simulation model is explained and the
effects of the input parameters on the multiprocessor system’s per-
formance are discussed.
Performance Modeling
To analyze the performance of a multiprocessor system with N in-
dependent processors under conflicts, the behavior of the system
can be modeled by a closed queueing network with N classes of
customers, two stages of parallel servers (processors and common
memory modules), and several passive resources (buses for
processor-memory connection).
The queueing network model of the multiple-bus multiprocessor
system (Fig. 1) can be described by the following assumptions:
(1) When a processor requests access to the common memory,
a connection is immediately established between the processor
and the referenced module, provided that the referenced
module is not being accessed by another processor and
a bus is available for connection.
(2) A processor cannot issue another memory request if its
present request has not been granted.
(3) The duration between the completion of a request and the
generation of the next request to the common memory is an
independent, exponentially distributed random variable with
the same mean value of $1/\lambda$ for all processors.
(4) The duration of an access by a processor to the common
memory is an independent, identically distributed random
variable with the same mean value of $1/\mu$ for all memory
modules.
(5) The probability of a request for access from a processor to a
common memory module is independent of the module and
it is equal to $1/m$.
If a queueing network model satisfies assumption (5), then it is
called a uniform reference model (URM). Although this assump-
tion considerably simplifies the analysis, it may not be very realistic
for some systems. Several researchers have attempted to solve the
problem with nonuniform access probabilities. However, their
methods are applicable only to small-scale systems [12]-[15].
The goal of the analysis of the queueing network model is to deter-
mine the values of a performance measure for a given set of input
parameters. A performance measure is merely an index which can
be used to represent the performance of a system. In this paper,
processing efficiency (PE), which is equal to the expected value
of the percentage of active processors, is used as a direct measure
of the "computing power" of a multiprocessor system. A
processor is called active if it is executing instructions in its private
memory; an active processor is neither accessing nor waiting
to get access to the common memory. Most of the other
performance measures for the queueing network model of a
multiple-bus multiprocessor system are related to PE by very simple
equations [16], and an exact, closed-form solution for PE with
exponentially distributed memory access times is obtained by Irani and
Onyuksel [17].
Figure 2: The hypoexponential server.
Non-exponential Service Times
In this paper, we assume that the service time of the common
memory is either hypo- or hyperexponentially distributed, because
these distributions are sufficient to match the first two moments
of a given random variable. Let $X$ be a random variable. If $X$ has an
exponential distribution, then the coefficient of variation for $X$ is
known to be $C_X = \sigma_X / E[X] = 1$, where $E[X]$ is the expected value
and $\sigma_X$ is the standard deviation of $X$. If $C_X \neq 1$, then it is possible
to find a hypoexponential distribution for $C_X < 1$ and a
hyperexponential distribution for $C_X > 1$ that exactly matches the first
two moments of $X$.
An r-stage Erlang distribution can be realized by an r-stage serial
server such that each stage of the server has an exponentially
distributed service time with the same service rate for all stages.
The Erlang distribution can be generalized by relaxing the restriction
that each stage of the server has the same service rate. This is called
the hypoexponential distribution. Figure 2 illustrates the r-stage
hypoexponential server, where each stage of the server has an
exponentially distributed service time with the mean value of $1/\mu_i$
for the ith stage. Within the service facility, at most one of the stages
may be occupied by a customer, and no new customer may enter
the server until the previous one departs. Customers enter from
the left and depart to the right.
Let $\mu_i = k_i \mu$ for $i = 1, \ldots, r$ be the exponential service rate of the
ith stage of an r-stage hypoexponential server, and let $Y$ be the
probability distribution corresponding to this server with $E[Y] = 1/\mu$. It is known that
where $x_i = 1/k_i$ for $i = 1, \ldots, r$. Equation (1) yields
where $x_i \le 1$ ($k_i \ge 1$) for $i = 1, \ldots, r$. By definition of $C_Y$ and by
equations (1) and (2), it is obtained
It can be easily shown that $C_Y$ reaches its minimum value for
$x_1 = \cdots = x_r = 1/r$ (Erlang distribution), which yields
$C_Y = 1/\sqrt{r}$. In other words, if $X$ is a probability distribution with
$C_X \in [1/\sqrt{r}, 1]$, then there exists an r-stage hypoexponential
distribution $Y$ which exactly matches the first two moments of $X$.
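The relations this derivation relies on can be sketched as follows. This is a reconstruction from the definitions above and the standard moments of a sum of independent exponential stages; it is not necessarily the exact form or numbering of the original displayed equations, and the pairwise-product form in the last line is an added identity.

```latex
% r-stage hypoexponential Y with stage rates \mu_i = k_i \mu and x_i = 1/k_i
\begin{align*}
E[Y] &= \sum_{i=1}^{r} \frac{1}{\mu_i} = \frac{1}{\mu} \sum_{i=1}^{r} x_i ,
\qquad
\mathrm{Var}[Y] = \sum_{i=1}^{r} \frac{1}{\mu_i^{2}} = \frac{1}{\mu^{2}} \sum_{i=1}^{r} x_i^{2} , \\
E[Y] = \frac{1}{\mu} \;&\Longrightarrow\; \sum_{i=1}^{r} x_i = 1 , \\
C_Y^{2} &= \frac{\mathrm{Var}[Y]}{E[Y]^{2}} = \sum_{i=1}^{r} x_i^{2}
         = 1 - 2 \sum_{i<j} x_i x_j , \\
x_1 = \cdots = x_r = \tfrac{1}{r} \;&\Longrightarrow\; C_Y^{2} = r \cdot \frac{1}{r^{2}} = \frac{1}{r} .
\end{align*}
```

For r = 2 these relations reduce to $x_1 + x_2 = 1$ and $x_1 x_2 = (1 - C_Y^2)/2$, so the two stage rates follow from a quadratic equation.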
Combining equations (3) and (4) yields a more suitable expression
for $C_Y$ as follows:
An r-stage hyperexponential distribution can be realized by an
r-stage parallel server such that each stage of the server has an
exponentially distributed service time. Figure 3 illustrates the r-stage
hyperexponential server. As for the hypoexponential server, at most
one of the stages may be occupied by a customer, and upon entry into
the service facility the customer proceeds to service stage i with
probability $\alpha_i$.
Let $\mu_i = k_i \mu$ for $i = 1, \ldots, r$ be the exponential service rate of the
ith stage of an r-stage hyperexponential server, and let $Z$ be the
probability distribution corresponding to this server with $E[Z] = 1/\mu$.
It is known that
where $x_i = 1/k_i$ for $i = 1, \ldots, r$. Equation (6) yields
where $x_i \le 1/\alpha_i$ ($k_i \ge \alpha_i$) for $i = 1, \ldots, r$. By definition of $C_Z$ and
Figure 3. The hyperexponential server.
A Special Case
If $\alpha_1 = \cdots = \alpha_r = 1/r$, then equations (8) and (9) can be
simplified as follows:
The preceding expressions show that $C_Z$ reaches its maximum
value for $x_1 = r$ and $x_i = 0$ for $i = 2, \ldots, r$, which yields
$C_Z = \sqrt{2r - 1}$. In other words, if $X$ is an arbitrary probability
distribution with $C_X \in [1, \sqrt{2r - 1}]$, then there exists an r-stage
hyperexponential distribution $Z$ (with equal branching
probabilities) which exactly matches the first two moments of $X$.
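The moment relations behind this special case can be sketched as follows. This is a reconstruction from the definitions above and the standard moments of a mixture of exponentials, offered as an aid to the reader; the form and numbering of the original displayed equations may differ.

```latex
% r-stage hyperexponential Z: stage i chosen with probability \alpha_i,
% stage rates \mu_i = k_i \mu, and x_i = 1/k_i
\begin{align*}
E[Z] &= \sum_{i=1}^{r} \frac{\alpha_i}{\mu_i} = \frac{1}{\mu} \sum_{i=1}^{r} \alpha_i x_i ,
\qquad
E[Z^{2}] = \sum_{i=1}^{r} \frac{2 \alpha_i}{\mu_i^{2}} = \frac{2}{\mu^{2}} \sum_{i=1}^{r} \alpha_i x_i^{2} , \\
E[Z] = \frac{1}{\mu} \;&\Longrightarrow\; \sum_{i=1}^{r} \alpha_i x_i = 1 ,
\qquad
C_Z^{2} = \frac{E[Z^{2}]}{E[Z]^{2}} - 1 = 2 \sum_{i=1}^{r} \alpha_i x_i^{2} - 1 , \\
\alpha_i = \tfrac{1}{r} \;&\Longrightarrow\; \sum_{i=1}^{r} x_i = r ,
\qquad
C_Z^{2} = \frac{2}{r} \sum_{i=1}^{r} x_i^{2} - 1 , \\
x_1 = r,\; x_2 = \cdots = x_r = 0 \;&\Longrightarrow\; C_Z^{2} = \frac{2}{r}\, r^{2} - 1 = 2r - 1 .
\end{align*}
```

The last line is a limiting case: $x_i = 0$ corresponds to an infinitely fast stage, so the bound $\sqrt{2r - 1}$ is approached rather than attained by finite rates.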
Simulation Experiments
Because of its generality and its simplicity, the most popular
approach to analyzing the performance of a computer system is to use
a simulation technique. However, running a simulation program is
usually very costly, and its execution time is governed by the
number of sample points, which directly determines the error in
the final output statistics. In most cases, the error can be reduced
by additional computational time.
For the performance analysis of a multiple-bus multiprocessor
system with a common memory, a simulation program was con-
structed. The program was run on the Michigan Terminal System
(MTS) for several multiprocessor configurations with hypo- or
hyperexponential memory access times.
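A minimal event-driven sketch of such a simulation is given below, in Python rather than as the authors' MTS program. The names (`simulate_pe`, `hypo_service`, and so on), the first-come-first-served arbitration among blocked requests, the single shared random stream, and the time-averaged estimate of PE are all assumptions made for illustration; the paper does not specify them.

```python
import heapq
import random


def simulate_pe(p, m, b, lam, service, n_events=55_000, warmup=5_000, seed=1):
    """Estimate PE (mean fraction of active processors) for a p x m x b system.

    service(rng) returns one memory access time; lam is the request rate of
    an active processor. Assumed: FCFS arbitration and a time-averaged PE.
    """
    rng = random.Random(seed)
    # Every processor starts active and schedules its first request.
    events = [(rng.expovariate(lam), "request", i) for i in range(p)]
    heapq.heapify(events)
    busy = [False] * m      # True while a module is being accessed
    held = [None] * p       # module currently held by each processor
    free_buses = b
    waiting = []            # blocked (processor, module) pairs, oldest first
    active = p
    now = t_start = area = 0.0
    for k in range(n_events):
        t, kind, i = heapq.heappop(events)
        if k == warmup:
            t_start = now   # statistics begin once the warm-up events are done
        if k >= warmup:
            area += active * (t - now)
        now = t
        if kind == "request":
            active -= 1
            waiting.append((i, rng.randrange(m)))   # uniform reference model
        else:               # "done": free module and bus, processor resumes
            busy[held[i]] = False
            free_buses += 1
            active += 1
            heapq.heappush(events, (now + rng.expovariate(lam), "request", i))
        # Grant each waiting request whose module is free while buses remain.
        blocked = []
        for proc, mod in waiting:
            if free_buses > 0 and not busy[mod]:
                busy[mod] = True
                free_buses -= 1
                held[proc] = mod
                heapq.heappush(events, (now + service(rng), "done", proc))
            else:
                blocked.append((proc, mod))
        waiting = blocked
    return area / (p * (now - t_start))


def exponential_service(mu):
    return lambda rng: rng.expovariate(mu)


def hypo_service(rates):
    """Serial stages: one exponential draw per stage, summed."""
    return lambda rng: sum(rng.expovariate(r) for r in rates)


def hyper_service(rates):
    """Parallel stages: one stage chosen uniformly, then one exponential draw."""
    return lambda rng: rng.expovariate(rng.choice(rates))
```

For example, `simulate_pe(2, 2, 2, 1.0, hypo_service([2.0, 2.0]))` corresponds to a 2 x 2 x 2 system with a two-stage Erlang access time and equal request and service rates. With exponential service the same configuration can be checked against a four-state Markov chain, which gives PE = 4/9.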
For random number generation, the pseudo-random sequences
generated by a subroutine (on MTS) were used. This is a
multiplicative congruential generator, which generates a sequence
$\{u_i\}$ uniformly distributed on the interval (0,1). To eliminate the
dependency between various events of the simulation experiment,
different seed numbers (randomly selected) are assigned to each
independent sequence of events. For example, the interval between
subsequent access requests of a processor to the common memory
is independent of the others, so different seed numbers are
used for each processor.
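As an illustration of the generator type, the sketch below implements a multiplicative congruential generator with one stream per processor. The multiplier 16807 and modulus 2^31 - 1 are the well-known "minimal standard" constants and are an assumption here; the constants of the MTS subroutine are not given in the text. The class name and seed values are likewise hypothetical.

```python
class MultiplicativeCongruential:
    """x_{n+1} = A * x_n mod M, reported as u_n = x_n / M in (0, 1)."""

    A = 16807          # assumed multiplier ("minimal standard" generator)
    M = 2**31 - 1      # assumed prime modulus

    def __init__(self, seed):
        if not 0 < seed < self.M:
            raise ValueError("seed must lie in 1..M-1")
        self.state = seed

    def next_int(self):
        self.state = (self.A * self.state) % self.M
        return self.state

    def next_uniform(self):
        # The state is never 0 or M, so the result is strictly inside (0, 1).
        return self.next_int() / self.M


# One stream per processor, each started from its own (hypothetical) seed.
seeds = [12345, 67890, 13579, 24680]
streams = [MultiplicativeCongruential(s) for s in seeds]
first_draws = [g.next_uniform() for g in streams]
```

Streams seeded this way are different segments of the same underlying cycle, so widely separated seeds are preferable when many draws are taken from each stream.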
For the simulation program, there are two types of probability
distributions to generate, namely, the uniform distribution and the
exponential distribution. Since hypo- and hyperexponential
distributions can be represented by a serial or a parallel combina-
tion of several exponential distributions, they do not need to be
generated separately.
Let $U$ and $X$ be uniformly distributed random variables in the
intervals (0,1) and (a,b), respectively, and let $Y$ be an exponentially
distributed random variable with parameter $\lambda$. It is then shown in
Reference [18] that
$$X = a + (b - a)\,U \tag{10}$$
$$Y = -\frac{1}{\lambda} \ln U \tag{11}$$
If the modules of the common memory are numbered from 1 to
m and if $I$ is a random integer uniformly distributed between 1 and
m, then by assigning a = 1 and b = m + 1 in equation (10), $I$ can
be generated from $X$ as follows:
$$I = \lfloor X \rfloor = \lfloor 1 + mU \rfloor$$
Let $V$ be an r-stage hypoexponential distribution with parameters
$\mu_1, \ldots, \mu_r$. If $\{Y_i\}$ is a sequence of independent, exponentially
distributed random variables corresponding to the stages of the
hypoexponential distribution such that $V = \sum_{i=1}^{r} Y_i$, and if $\{U_i\}$ is
a sequence of independent, uniformly distributed random
variables in the interval (0,1), then equation (11) yields
$$V = -\sum_{i=1}^{r} \frac{1}{\mu_i} \ln U_i$$
If $V$ is an r-stage Erlang distribution with parameter $\mu$, then
$\mu_1 = \cdots = \mu_r = r\mu$. Thus, the preceding equation yields
$$V = -\frac{1}{r\mu} \ln \prod_{i=1}^{r} U_i$$
Let $W$ be an r-stage hyperexponential distribution with parameters
$\mu_1, \ldots, \mu_r$. If a stage in the parallel server is chosen uniformly, and
if $I$ is a random integer uniformly distributed between 1 and r, then
$$W = -\frac{1}{\mu_I} \ln U_2, \qquad I = \lfloor 1 + r U_1 \rfloor$$
where $U_1$ and $U_2$ are independent, uniformly distributed random
variables in the interval (0,1) and $\mu_I$ is chosen uniformly from
$(\mu_1, \ldots, \mu_r)$ by the index variable $I$.
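The generation rules described above can be sketched in a few lines of Python using the inverse-transform method. The function names are hypothetical, and each function takes the uniform variates as arguments so that the mapping from uniform to target distribution stays explicit.

```python
import math
import random


def uniform_ab(u, a, b):
    """Map u in (0,1) to a uniform variate on (a,b)."""
    return a + (b - a) * u


def exponential(u, rate):
    """Map u in (0,1] to an exponential variate with the given rate."""
    return -math.log(u) / rate


def module_index(u, m):
    """Uniform integer in 1..m from u in [0,1), using a = 1 and b = m + 1."""
    return int(uniform_ab(u, 1, m + 1))


def hypoexponential(us, rates):
    """Serial stages: sum of one exponential variate per stage."""
    return sum(exponential(u, mu) for u, mu in zip(us, rates))


def erlang(us, mu):
    """r-stage Erlang with mean 1/mu: every stage has rate r * mu."""
    r = len(us)
    return -math.log(math.prod(us)) / (r * mu)


def hyperexponential(u1, u2, rates):
    """Parallel stages: u1 picks the stage uniformly, u2 gives its service time."""
    stage = int(1 + len(rates) * u1)      # integer in 1..r for u1 in [0,1)
    return exponential(u2, rates[stage - 1])


rng = random.Random(2024)
draw = lambda: 1.0 - rng.random()         # in (0,1], so log(0) cannot occur
v = hypoexponential([draw(), draw()], [1.12, 9.41])
w = hyperexponential(rng.random(), draw(), [0.5, 2.0])
```

Passing `1.0 - rng.random()` to the logarithm avoids a zero argument, since Python's `random()` returns values in [0, 1).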
The objective of our simulation experiments is to investigate the
behavior of the multiple-bus multiprocessor system in its
equilibrium condition. Since the behavior of the simulation model
during the initial transient period does not represent the equilibrium
behavior of the system, the data observed during this period are
discarded. Let $N_s$ be the number of sample points used to compute
the final statistics and $N_t$ be the number of discarded data points.
Thus, the program runs until $N = N_t + N_s$
data points are observed. In order to estimate the transient period,
we made a number of preliminary pilot runs (with different seed
numbers for each run) and compared the final statistics at various
"ages." These simulation experiments showed that if we discard
the initial $N_t$ = 1,000 sample data points of each run, the effect
of the transient period and the selection of seed numbers on the final
statistics becomes less than 1%. Of course, the selection of $N_t$
seems rather arbitrary, but if $N$ is sufficiently large, then it is
reasonable to believe that the error, which is made by considering
the system in equilibrium after $N_t$ data points, is negligible.
It is clear that the simulation error on the final statistics decreases
as $N_s$ increases. However, if $N_s$ is too large, then running the
simulation program will become very costly. Therefore, a lower
bound for $N_s$ must be estimated for a given error percentage.
Suppose $E[Y]$ is chosen as a performance index. If the samples for
$E[Y]$, $Y_1, \ldots, Y_{N_s}$, are statistically independent, then for
sufficiently large values of $N_s$, it is shown in Reference [18] that
$$P\left[\bar{Y} - z_{\alpha/2} \frac{s_Y}{\sqrt{N_s}} \le E[Y] \le \bar{Y} + z_{\alpha/2} \frac{s_Y}{\sqrt{N_s}}\right] \approx 1 - \alpha \tag{12}$$
where
$$\bar{Y} = \frac{1}{N_s} \sum_{i=1}^{N_s} Y_i$$
is the sample mean,
$$s_Y^2 = \frac{1}{N_s - 1} \sum_{i=1}^{N_s} (Y_i - \bar{Y})^2$$
is the sample variance, and $z_{\alpha/2}$ is the upper $100(\alpha/2)$ percentile
of the standard normal distribution.
The interval $\left[\bar{Y} - z_{\alpha/2} s_Y/\sqrt{N_s},\; \bar{Y} + z_{\alpha/2} s_Y/\sqrt{N_s}\right]$ is said
to be the $100(1-\alpha)$ percent confidence interval for $E[Y]$. Equation
(12) shows that $E[Y]$ is contained in the interval with probability
$(1-\alpha)$. For the confidence interval considered above, the length $L$
of the interval may be written as
$$L = \frac{2\, z_{\alpha/2}\, s_Y}{\sqrt{N_s}}$$
and it follows
$$N_s = \left(\frac{2\, z_{\alpha/2}\, s_Y}{L}\right)^2 \tag{13}$$
For given $\alpha$ and $s_Y$ values, $N_s$ can be determined by the preceding
equation so that the confidence interval will have a prescribed
length. In general, $s_Y$ is not known in advance, but it can be
estimated by a pilot run.
For the simulation experiments, $\alpha = 0.05$ and $L/(2\bar{Y}) = 1\%$ were chosen.
First, a pilot run was made by using $N_s$ = 100,000 ($N_t$ = 10,000)
sample points for the 2 x 2 x 2 system with $\rho = \lambda/\mu = 1$ and
$C_S = 1/\sqrt{2}$, where $C_S$ is the coefficient of variation for the service
time $S$ of the common memory: PE = 45.37% and $\sigma_{PE}$ = 35.80%
were obtained. This yields $L$ = 0.91% for a 1% simulation error with
95% confidence, and by equation (13), we obtain $N_s \approx 24{,}000$.
However, $N_s$ = 50,000 with $N_t$ = 5,000 were selected to obtain
more precise results.
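The pilot-run arithmetic can be checked with a short sketch. The function name is hypothetical, and the percentile is taken from Python's `statistics.NormalDist` rather than from a table, so the result differs slightly from a hand calculation with z = 1.96 and L rounded to 0.91%.

```python
import math
from statistics import NormalDist


def required_samples(mean, std, rel_error, alpha=0.05):
    """Smallest N_s whose confidence interval has half-length rel_error * mean.

    Uses N_s = (2 * z * s_Y / L)^2 with L = 2 * rel_error * mean.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 percentile
    length = 2.0 * rel_error * mean               # full interval length L
    return math.ceil((2.0 * z * std / length) ** 2)


# Pilot-run values quoted in the text: PE = 45.37 %, sigma_PE = 35.80 %.
n_s = required_samples(mean=45.37, std=35.80, rel_error=0.01)
```

With these inputs `n_s` comes out a little under 24,000, in line with the rounded figure in the text.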
Tables 1 and 2 give the simulation statistics (mean value,
standard deviation, and 95% confidence interval) on PE for a family
of bus-sufficient (BS) systems with p, m, b = 2, 4, 6, 8, $\rho$ = 1, and
$C_S$ = 0.90, $1/\sqrt{2}$. Simulation results are also compared with
the exponential service times ($C_S$ = 1).$^1$ Let us define $\Delta$PE =
(PE - PE$_1$)/PE$_1$, with PE$_1$ being the PE of a system with an
exponential server. In fact, $\Delta$PE is a direct measure of the effect of $C_S$
on the system performance. Since $1/\sqrt{2} \le C_S < 1$ for both cases
of the example, the service time of the common memory can
be approximated by a two-stage hypoexponential distribution
with $\mu_1 = k_1 \mu$ and $\mu_2 = k_2 \mu$: for $C_S$ = 0.90, $k_1$ = 1.12 and
$k_2$ = 9.41, and for $C_S = 1/\sqrt{2}$, $k_1 = k_2 = 2$.
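The stage-rate multipliers can be recovered from a target coefficient of variation with a short sketch. It assumes the two-moment conditions $x_1 + x_2 = 1$, $x_1^2 + x_2^2 = C^2$ for the two-stage hypoexponential case and, for the two-stage hyperexponential case with equal branching probabilities, $x_1 + x_2 = 2$, $x_1^2 + x_2^2 = C^2 + 1$, with $x_i = 1/k_i$. The function names are hypothetical.

```python
import math


def two_stage_hypo(c):
    """Return (k1, k2) for a two-stage hypoexponential with C = c."""
    if not (1.0 / math.sqrt(2.0) <= c < 1.0):
        raise ValueError("need 1/sqrt(2) <= c < 1 for two serial stages")
    # x1 + x2 = 1 and x1 * x2 = (1 - c^2) / 2 give a quadratic in x.
    d = math.sqrt(max(0.0, 2.0 * c * c - 1.0))
    x1, x2 = (1.0 + d) / 2.0, (1.0 - d) / 2.0
    return 1.0 / x1, 1.0 / x2


def two_stage_hyper(c):
    """Return (k1, k2) for a two-stage hyperexponential, branching 1/2 each."""
    if not (1.0 < c < math.sqrt(3.0)):
        raise ValueError("need 1 < c < sqrt(3) for two equally likely stages")
    # x1 + x2 = 2 and x1^2 + x2^2 = c^2 + 1 give x = 1 +/- sqrt((c^2 - 1) / 2).
    d = math.sqrt((c * c - 1.0) / 2.0)
    x1, x2 = 1.0 + d, 1.0 - d
    return 1.0 / x1, 1.0 / x2


k1, k2 = two_stage_hypo(0.90)
print(round(k1, 2), round(k2, 2))
```

For C = 0.90 the hypoexponential fit reproduces the multipliers 1.12 and 9.41 quoted above, and for C = 1/sqrt(2) it returns two equal stages with k = 2 (the Erlang case).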
The values of $\Delta$PE in Tables 1 and 2 show that, for
$1/\sqrt{2} \le C_S < 1$, the error in approximating the service time of
the common memory by an exponential distribution is less than
5%. Indeed, simulation results show that the maximum error is
about 4.51 ± 0.43% with 95% confidence (the 8 x 6 x 6 system
with $\rho$ = 1).
The extreme point is $C_S$ = 0 (deterministic access times).
For the 6 x 4 x 2 bus-deficient (BD) system, the exact results
for $C_S$ = 1 and the simulation results for $C_S$ = 0$^2$ are compared
for $\rho$ = 0.1, 0.5, 1.0. This comparison yielded $\Delta$PE =
0.64, 5.55, 4.78%, respectively. These values show that even for
the extreme case, the maximum error is less than 6%.
Running a simulation program for large-scale multiprocessor
systems with less than 1% error is computationally prohibitive.
Even for small-scale systems our simulation program took, on
the average, 100 seconds of CPU time (on MTS) to produce one
data point for N = 55,000. This seems to be a drawback in using
a simulation technique for the performance analysis.
To investigate the characteristics of PE for $C_S > 1$, the
simulation program was run for $C_S$ = 1.10 and $\sqrt{2}$. The results are
tabulated in Tables 3 and 4. Since $1 < C_S < \sqrt{3}$ for both cases
of the example, the service time of the common memory can
be approximated by a two-stage hyperexponential distribution
with a uniform selection for each stage, where $\mu_1 = k_1 \mu$ and
Table 1. Simulation statistics on PE for $\rho$ = 1 and $C_S$ = 0.90.
$^1$ The exact results for the exponential case are obtained from Reference [19].
$^2$ Simulation results for fixed access times are obtained from Reference [16].
Table 2. Simulation statistics on PE for $\rho$ = 1 and $C_S = 1/\sqrt{2}$.
Table 3. Simulation statistics on PE for $\rho$ = 1 and $C_S$ = 1.10.
Table 4. Simulation statistics on PE for $\rho$ = 1 and $C_S = \sqrt{2}$.
The values of $\Delta$PE in Tables 3 and 4 show that for
$1 < C_S < \sqrt{2}$, the error in approximating the service time of the
common memory by an exponential distribution is less than 8%.
The values in Tables 1-4 yield the following observation:
If $C_S < 1$ then PE > PE$_1$, and if $C_S > 1$ then PE < PE$_1$. Thus, to
increase the PE of a system, $C_S$ needs to be decreased for the service
time of the common memory.
CONCLUSION
In this paper, the performance analysis of a typical multiple-bus
multiprocessor system is extended beyond the exponential
distribution and a simulation model is developed for hypo- and
hyperexponentially distributed memory access times.
Processing efficiency is used as a primary performance measure.
For a large set of values, the effect of $C_S$ on PE is investigated and the
comparative results are presented. If the coefficient of variation for
the service time of the common memory of the multiple-bus
multiprocessor system is less than 1, then our results show that
approximating the service time by an exponential distribution will
not produce a large percentage of error on the system performance.
Even in the worst case (constant service time), $\Delta$PE is less than 6%
with 95% confidence.
REFERENCES
[1] W.A. Wulf and C.G. Bell, "C.mmp - A multi-miniprocessor," in Proc. AFIPS, 1972,
pp. 765-777.
[2] D.P. Bhandarkar, "Analysis of memory interference in multiprocessors," IEEE Trans.
Comput., vol. C-24, pp. 897-908, Sept. 1975.
[3] F.S. Baskett and A.J. Smith, "Interference in multiprocessor computer systems with
interleaved memory," Commun. ACM, vol. 19, pp. 327-334, Jun. 1976.
[4] C.H. Hoogendoorn, "A general model for memory interference in
multiprocessors," IEEE Trans. Comput., vol. C-26, pp. 998-1005, Oct. 1977.
[5] B.R. Rau, "Interleaved memory bandwidth in a model of a multiprocessor com-
puter system," IEEE Trans. Comput., vol. C-28, pp. 678-681, Sept. 1979.
[6] D.W. Yen, J.H. Patel, and E.S. Davidson, "Memory interference in synchronous
multiprocessor systems," IEEE Trans. Comput., vol. C-31, pp. 1116-1121, Nov. 1982.
[7] R.J. Swan et al., "CM* - A modular multi-microprocessor," in Proc. AFIPS, 1977,
pp. 637-644.
[8] J.V. Levy, "Buses: The skeleton of computer structures," in Computer Engineer-
ing: A DEC View of Hardware System Design, C.G. Bell, J.C. Mudge, and J.E.
McNamara, eds., Digital Equipment Corp., 1978.
[9] F. Fung and H.C. Torng, "On the analysis of memory conflicts and bus conten-
tions in a multiple-microprocessor system," IEEE Trans. Comput., vol. C-28, pp. 28-37,
Jan. 1979.
[10] A. Goyal and T. Agerwala, "Performance analysis of future shared storage
systems," IBM J. Res. Develop., vol. 28, pp. 95-108, Jan. 1984.
[11] P.H. Enslow, "Multiprocessor organization - A survey," ACM Comput. Surveys,
vol. 9, pp. 103-129, Mar. 1977.
[12] A.S. Sethi and N. Deo, "Interference in multiprocessor systems with localized
memory access probabilities," IEEE Trans. Comput., vol. C-28, pp. 157-163, Feb. 1979.
[13] K.O. Siomalas and B.A. Bowen, "Performance of crossbar multiprocessor
systems," IEEE Trans. Comput., vol. C-32, pp. 689-695, Jul. 1983.
[14] H.C. Du and J.L. Baer, "On the performance of interleaved memories with
nonuniform access probabilities," in Proc. 1983 Int'l Conf. on Parallel Processing,
pp. 429-436.
[15] D. Towsley, "An approximate analysis of multiprocessor systems," in Proc. 1983
ACM Sigmetrics Conf. on Meas. and Mod. Comput. Syst., pp. 207-213.
[16] M. Ajmone Marsan and M. Gerla, "Markov models for multiple-bus
multiprocessor systems," IEEE Trans. Comput., vol. C-31, pp. 239-248, Mar. 1982.
[17] K.B. Irani and I.H. Onyuksel, "A closed-form solution for the performance analysis
of multiple-bus multiprocessor systems," IEEE Trans. Comput., vol. C-33, pp.
1004-1012, Nov. 1984.
[18] H. Kobayashi, Modeling and Analysis: An Introduction to System Performance
Evaluation Methodology, Addison-Wesley, 1978.
[19] I.H. Onyuksel, Markovian Queueing Network Models for Performance Evalua-
tion of Multiple-Bus Multiprocessor Systems, Ph.D. Dissertation, The Univ. of
Michigan, Ann Arbor, Mich., May 1985.
