This paper presents a mathematical model to measure the amount by which a computer's speed is reduced when it time-
All storage units operate synchronously, even though each satisfies processor requests independently, and their read/write cycles do not overlap. (This departure from the way modern storage units function is discussed later in the paper.) Cycle duration is the same for all storage units. Input/output channels as well as central processing units are referred to as processors, although they can be distinguished by assigning special values to certain parameters (the tie-breaking probabilities).
Each processor, i, can request the use of a storage unit for only one cycle at a time, and does so with probability $p_i$. Thus, the demand pattern of each processor is equivalent to a sequence of Bernoulli trials. If a processor fails to get the use of a storage unit for a requested cycle, it automatically repeats its request in the next cycle. The sequences of Bernoulli trials are therefore intermittently shifted forward, a process that can be regarded as a Markov chain. Each storage unit can satisfy only one request per cycle.
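The demand model just described can be simulated directly, which is also a convenient way to check the analytic results that follow. The Python sketch below is illustrative only: the function name `simulate`, the seed and cycle count, and the encoding of tie-breaking as a priority level plus a random weight within a level are assumptions of the sketch. That encoding covers the equal-priority and absolute-priority cases used in this paper, but not every possible set of tie-breaking probabilities.

```python
import random


def simulate(demand, priority, weight, cycles=200_000, seed=1):
    """Monte Carlo sketch of the synchronous storage-contention model.

    demand[x][j]: probability that processor x, when not delayed,
                  requests storage unit j in a given cycle (row sum <= 1)
    priority[x]:  hypothetical encoding; a higher level always wins
    weight[x]:    hypothetical random tie-break weight within a level
    Returns the estimated stretching factor of every processor.
    """
    rng = random.Random(seed)
    n = len(demand)
    pending = [None] * n  # unit on which a refused request is waiting
    delayed = [0] * n     # number of cycles each processor lost
    for _ in range(cycles):
        requests = {}     # storage unit -> processors requesting it
        for x in range(n):
            unit = pending[x]
            if unit is None:  # fresh Bernoulli trial for this cycle
                r, acc = rng.random(), 0.0
                for j, p in enumerate(demand[x]):
                    acc += p
                    if r < acc:
                        unit = j
                        break
            if unit is not None:
                requests.setdefault(unit, []).append(x)
        pending = [None] * n
        for unit, xs in requests.items():
            top = max(priority[x] for x in xs)
            eligible = [x for x in xs if priority[x] == top]
            winner = rng.choices(eligible, [weight[x] for x in eligible])[0]
            for x in xs:
                if x != winner:  # refused: retry the same unit next cycle
                    pending[x] = unit
                    delayed[x] += 1
    # Stretching factor = 1 / (1 - fraction of cycles lost to contention).
    return [1.0 / (1.0 - d / cycles) for d in delayed]


# Two equal processors sharing one storage unit, p = 0.78 each.
print(simulate([[0.78], [0.78]], priority=[0, 0], weight=[1, 1]))
```

Because a refused request is retried before any new trial is drawn, each processor's Bernoulli sequence is shifted forward exactly as described above.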
Two processors and two storage units

Consider the case of two processors, A and B. Storage unit j (j = 1, 2) is requested for each cycle by processor A with probability $p_{aj}$ and by processor B with probability $p_{bj}$. If both processors request the same storage unit, j, for the same cycle, processor A wins with probability $\pi_{aj}$ and processor B wins with probability $\pi_{bj}$. Thus, $0 \le p_{aj} + p_{bj} \le 2$ and $\pi_{aj} + \pi_{bj} = 1$ for j = 1, 2. Each time a processor's request is not satisfied, the refused request and all of that processor's subsequent requests are postponed one cycle. We therefore have two parallel sequences of Bernoulli trials that are intermittently shifted forward. The shifting process can be described as a finite Markov chain
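with the five states shown in Table 1. Let $P_{ij}$ (i, j = 1, ..., 5) be the conventional transition probability of going from state i to state j; these probabilities form the 5 × 5 transition matrix P.

Table 1
State   Condition
1       No processor is delayed
2       Processor A is delayed on storage unit 1; B is using 1
3       Processor B is delayed on storage unit 1; A is using 1
4       Processor A is delayed on storage unit 2; B is using 2
5       Processor B is delayed on storage unit 2; A is using 2

For example, to go from state 2 to state 3, which has probability $P_{23}$, we require that processor B demand a cycle on storage unit 1 (which has probability $p_{b1}$) and that processor A win the resulting conflict (which has probability $\pi_{a1}$). We do not require a request by processor A, since in state 2 processor A already has a pending request for storage unit 1. Thus $P_{23} = p_{b1}\pi_{a1}$.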
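Applying the reasoning used for $P_{23}$ to every pair of states gives the full transition matrix. The version below is a sketch derived from the rules stated above, with rows as "from" states and columns as "to" states in the order of Table 1; it assumes that state 1 is the state in which neither processor is delayed. Each row sums to 1, and the example $P_{23} = p_{b1}\pi_{a1}$ appears in row 2, column 3.

```latex
P = \begin{pmatrix}
1 - p_{a1}p_{b1} - p_{a2}p_{b2} & p_{a1}p_{b1}\pi_{b1} & p_{a1}p_{b1}\pi_{a1} & p_{a2}p_{b2}\pi_{b2} & p_{a2}p_{b2}\pi_{a2} \\
1 - p_{b1} & p_{b1}\pi_{b1} & p_{b1}\pi_{a1} & 0 & 0 \\
1 - p_{a1} & p_{a1}\pi_{b1} & p_{a1}\pi_{a1} & 0 & 0 \\
1 - p_{b2} & 0 & 0 & p_{b2}\pi_{b2} & p_{b2}\pi_{a2} \\
1 - p_{a2} & 0 & 0 & p_{a2}\pi_{b2} & p_{a2}\pi_{a2}
\end{pmatrix}
```

Row 2, for instance, reflects that in state 2 processor A re-requests unit 1, so contention continues only if B requests unit 1 again; otherwise both processors are served and the system returns to state 1.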
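The Markov chain represented by the matrix P is both irreducible and aperiodic. Thus, if the matrix P is multiplied by itself many times, it converges to a matrix with identical rows:

$$\lim_{n\to\infty} P^{n} = \Lambda = \begin{pmatrix} P_1 & P_2 & \cdots & P_5 \\ \vdots & \vdots & & \vdots \\ P_1 & P_2 & \cdots & P_5 \end{pmatrix}$$

The elements $P_j$ (j = 1, ..., 5) of the limit matrix $\Lambda$ are the limiting probabilities that the system will be found in state j. By definition, the sum of the five probabilities $P_1, \ldots, P_5$ equals 1.

Since the limiting operation converges to $\Lambda$, it follows that

$$\Lambda P = \Lambda, \quad\text{that is,}\quad P_j = \sum_{i=1}^{5} P_i P_{ij} \quad (j = 1, \ldots, 5).$$

This matrix equation represents five simultaneous equations in five unknowns, the limiting state probabilities. One of the five equations is redundant (each row of P sums to 1), so it is replaced by the normalization condition $P_1 + \cdots + P_5 = 1$, and the resulting system can be readily solved by the techniques of matrix algebra. In general, there are n unknowns for n states.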
Processor A is delayed one cycle each time state 2 or 4 is entered. Therefore, in the limit, processor A is delayed $(P_2 + P_4)X$ cycles for every X cycles. Consequently, after X cycles, processor A has advanced only $X - (P_2 + P_4)X$ cycles, so that the stretching factor for processor A is $[1 - (P_2 + P_4)]^{-1}$. This factor can be interpreted as a ratio, $T_c/T_0$, where $T_0$ is the time to do a certain task on processor A without contention and $T_c$ is the corresponding time with contention. Similarly, for processor B, which is delayed each time state 3 or 5 is entered, the stretching factor is $[1 - (P_3 + P_5)]^{-1}$.
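The whole procedure (build P, solve $\Lambda P = \Lambda$ with the normalization condition, then sum the delay states) fits in a short Python sketch. The function names are hypothetical, the state indices are Table 1's numbering shifted down by one, the transition entries follow the rules of the model as stated above, and plain Gauss-Jordan elimination stands in for "the techniques of matrix algebra". It accepts any number N of storage units.

```python
def transition_matrix(pa, pb, pia):
    """Transition matrix for two processors, A and B, and N storage units.

    pa[j], pb[j]: probability that A (B) requests unit j in a fresh cycle
    pia[j]:       probability that A wins a conflict on unit j
    Index 0: nobody delayed; 1 + 2j: A delayed on unit j; 2 + 2j: B delayed on j.
    """
    n = 1 + 2 * len(pa)
    P = [[0.0] * n for _ in range(n)]
    for j, (a, b, wa) in enumerate(zip(pa, pb, pia)):
        wb = 1.0 - wa
        sa, sb = 1 + 2 * j, 2 + 2 * j
        # From the idle state: both request unit j and one of them loses.
        P[0][sa] = a * b * wb
        P[0][sb] = a * b * wa
        # A waits on j: only B's fresh request for j causes a new conflict.
        P[sa][sa], P[sa][sb], P[sa][0] = b * wb, b * wa, 1.0 - b
        # B waits on j: only A's fresh request for j causes a new conflict.
        P[sb][sa], P[sb][sb], P[sb][0] = a * wb, a * wa, 1.0 - a
    P[0][0] = 1.0 - sum(P[0][1:])
    return P


def limiting_probabilities(P):
    """Solve Lambda P = Lambda plus sum = 1 by Gauss-Jordan elimination."""
    n = len(P)
    # Row j encodes sum_i x_i (P[i][j] - delta_ij) = 0.
    M = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] + [0.0]
         for j in range(n)]
    M[-1] = [1.0] * n + [1.0]  # replace the redundant equation by normalization
    for c in range(n):
        r = max(range(c, n), key=lambda k: abs(M[k][c]))  # partial pivoting
        M[c], M[r] = M[r], M[c]
        for k in range(n):
            if k != c and M[k][c] != 0.0:
                f = M[k][c] / M[c][c]
                M[k] = [u - f * v for u, v in zip(M[k], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def stretching_factors(pa, pb, pia):
    x = limiting_probabilities(transition_matrix(pa, pb, pia))
    delay_a = sum(x[1::2])  # states in which A is delayed
    delay_b = sum(x[2::2])  # states in which B is delayed
    return 1.0 / (1.0 - delay_a), 1.0 / (1.0 - delay_b)


# Example: two storage units, A favored on unit 1 and B favored on unit 2.
print(stretching_factors([0.4, 0.3], [0.3, 0.4], [0.8, 0.2]))
```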
By solving the transition matrix P for the limiting probabilities, $P_i$, we have: $\pi_{ai}$ = the probability that A will be granted the storage unit if both A and B request storage unit i for the same cycle, and
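Carrying the solution through symbolically gives one closed form for the two stretching factors. The following is a derivation sketch from the matrix P under the model's assumptions; the abbreviations $x_j$, $y_j$, $T_j$, $D_j$, and $r_j$ are introduced here, and the arrangement may differ from the paper's own equations.

```latex
% Balance equations for the two states blocked on unit j (A waits: x_j, B waits: y_j)
x_j = \pi_{bj}\,T_j, \qquad y_j = \pi_{aj}\,T_j, \qquad
T_j = P_1\,p_{aj}p_{bj} + p_{bj}\,x_j + p_{aj}\,y_j
% Solving for T_j, with D_j = 1 - \pi_{aj}p_{aj} - \pi_{bj}p_{bj} and r_j = p_{aj}p_{bj}/D_j:
x_j = \pi_{bj}\,r_j\,P_1, \qquad y_j = \pi_{aj}\,r_j\,P_1, \qquad
P_1 = \Bigl(1 + \sum_j r_j\Bigr)^{-1}
% Stretching factors
S_a = \frac{1}{1 - \sum_j x_j} = \frac{1 + \sum_j r_j}{1 + \sum_j \pi_{aj}\,r_j}, \qquad
S_b = \frac{1}{1 - \sum_j y_j} = \frac{1 + \sum_j r_j}{1 + \sum_j \pi_{bj}\,r_j}
```

For a single storage unit with $\pi_{a1} = \pi_{b1} = 1/2$, both factors reduce to $(1 + r_1)/(1 + r_1/2)$, so equal priority yields equal stretching factors even when $p_a \ne p_b$; with $p_a = p_b = p$ this becomes $2(1 - p + p^2)/(2 - 2p + p^2)$.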
A channel is distinguished from a processor by the probability with which it prevails in obtaining the use of a storage unit for a cycle when it and a processor both request the unit for the same cycle. Ordinarily, the channel has priority, so this situation is equivalent to the case of two processors and N storage units in which one processor is privileged over the other. Therefore, let A be the channel and B the processor; then $\pi_{ai} = 1$ and $\pi_{bi} = 0$ for all i. The stretching factors then become:
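With $\pi_{ai} = 1$ and $\pi_{bi} = 0$ the chain simplifies enough to solve by hand. The sketch below derives the resulting factors directly from the model as a check: the states in which A waits can never be entered, and B keeps waiting on unit j only while the channel keeps requesting that unit.

```latex
y_j = P_1\,\frac{p_{aj}\,p_{bj}}{1 - p_{aj}}, \qquad
P_1 = \Bigl(1 + \sum_{j=1}^{N} \frac{p_{aj}\,p_{bj}}{1 - p_{aj}}\Bigr)^{-1}
S_a = 1, \qquad
S_b = \frac{1}{1 - \sum_j y_j} = \frac{1}{P_1} = 1 + \sum_{j=1}^{N} \frac{p_{aj}\,p_{bj}}{1 - p_{aj}}
% With a single storage unit (N = 1): S_b = 1 + p_a p_b / q_a, where q_a = 1 - p_a
```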
The number of states increases from five to the seven shown in Table 2 when we add a third processor to the case of two processors and one storage unit, while the number of independent parameters increases from three to eight. Explicit general formulas are therefore harder to obtain and more cumbersome to use. The fifteen system parameters are the following, of which only eight are independent: $p_x$ is the probability that processor X requests a storage unit for any given cycle, where x = a, b, c. Define $q_x = 1 - p_x$. The forty-nine transition probabilities are specified in the Appendix.
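The counts quoted here can be reproduced under one natural reading of the parameters, which is an assumption of this sketch: a state is the set of processors currently delayed on the single storage unit, and a tie-breaking probability is given for each ordered pair and for each processor in a three-way conflict (the notation $\pi_{xy}$ and $\pi_{x,yz}$ is assumed here).

```latex
\text{states: } 2^3 - 1 = 7 \quad (\text{one request is always served, so at most two processors wait}), \qquad 7 \times 7 = 49 \text{ transition probabilities}
\text{parameters: } \underbrace{3}_{p_x} + \underbrace{3}_{q_x} + \underbrace{6}_{\pi_{xy}} + \underbrace{3}_{\pi_{x,yz}} = 15
\text{independent: } \underbrace{3}_{p_x} + \underbrace{3}_{\pi_{xy} + \pi_{yx} = 1} + \underbrace{2}_{\pi_{a,bc} + \pi_{b,ac} + \pi_{c,ab} = 1} = 8
```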
For this and all other cases involving at least three processors, it is best first to substitute the numerical values of the parameters and then to solve the associated set of linear simultaneous equations. However, explicit formulas are given for two particular situations.
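A minimal Python sketch of that numerical route for three processors and one storage unit follows. It assumes that a state is the set of delayed processors, and it uses hypothetical parameter containers `pair` and `triple` for the tie-breaking probabilities. Instead of matrix algebra it approximates the limiting probabilities by repeatedly multiplying by the transition matrix, which mirrors the limiting argument used for the two-processor case.

```python
from itertools import combinations, product


def chain(p, pair, triple):
    """Transition matrix for three processors sharing one storage unit.

    p[x]:         request probability of processor x in a fresh cycle
    pair[(x, y)]: probability that x wins a conflict with y (hypothetical keys)
    triple[x]:    probability that x wins a three-way conflict
    States are the sets of delayed processors (at most two can be delayed).
    """
    names = sorted(p)
    states = [frozenset(s) for k in range(3) for s in combinations(names, k)]
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        free = [x for x in names if x not in s]  # processors drawing fresh trials
        for asks in product([False, True], repeat=len(free)):
            prob = 1.0
            for x, a in zip(free, asks):
                prob *= p[x] if a else 1.0 - p[x]
            contenders = set(s) | {x for x, a in zip(free, asks) if a}
            if not contenders:
                P[index[s]][index[frozenset()]] += prob
                continue
            for w in contenders:  # w is served; the others become delayed
                if len(contenders) == 1:
                    pw = 1.0
                elif len(contenders) == 2:
                    (other,) = contenders - {w}
                    pw = pair[(w, other)]
                else:
                    pw = triple[w]
                P[index[s]][index[frozenset(contenders - {w})]] += prob * pw
    return states, P


def limiting(P, steps=5000):
    """Approximate the limiting probabilities by repeated multiplication by P."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return x


def stretching_factors(p, pair, triple):
    states, P = chain(p, pair, triple)
    x = limiting(P)
    return {name: 1.0 / (1.0 - sum(q for s, q in zip(states, x) if name in s))
            for name in sorted(p)}


# Channel c has absolute priority; a and b split their conflicts evenly.
pair = {("a", "b"): 0.5, ("b", "a"): 0.5, ("c", "a"): 1.0, ("a", "c"): 0.0,
        ("c", "b"): 1.0, ("b", "c"): 0.0}
triple = {"a": 0.0, "b": 0.0, "c": 1.0}
print(stretching_factors({"a": 0.78, "b": 0.78, "c": 0.028}, pair, triple))
```

Setting the channel's request probability to zero reduces this chain to the two-processor, one-unit case, which is a convenient consistency check.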
In one case, one processor representing a channel, C, with an arbitrary storage demand rate and high priority, is involved with two processors, A and B, each having an arbitrary storage demand rate but lower priority than the channel. Thus, $p_a$, $p_b$, and $p_c$ are arbitrary. A and B are given equal priority in a conflict between them by $\pi_{ab} = \pi_{ba} = 1/2$. Absolute priority is given to C in a conflict with either A or B by $\pi_{ca} = \pi_{cb} = 1$, and in a conflict with both A and B by $\pi_{c,ab} = 1$.
Using these values, the following set of three simultaneous equations can be derived, from which three of the limiting state probabilities, including $P_7$, may be calculated:
( q c + PbPJP, -( q c + P a P c P .
Similarly, processor B is delayed whenever states 2, 5, and 7 are entered. The limiting probabilities of the three states in which the channel is delayed are zero, since the channel can never be delayed in contention with the other processors.
$P_1$ can be obtained from the normalization condition: $P_1$ and the other three nonzero limiting probabilities sum to 1.
Thus,
If the two processors, A and B, have the same storage demand rate, so that $p_a = p_b$, then the stretching factors for A and B are identical. For this case, the limiting state probabilities can be calculated directly from the following:

In the second case, the three processors have identical storage demand rates, but the priority scheme is arbitrary. Thus, $p_a = p_b = p_c = p$ and $q = 1 - p$. In addition, A is favored in the event of simultaneous requests for storage unit 1, because it wins conflicts with B four times out of five. In the same way, B is favored in requests for storage unit 2.
Note that the utilization of storage unit 2 is not 110 percent, because the programs being executed by A and B have finite length. Furthermore, if A and B both wanted storage unit 2 all the time, the stretching factor would be 2 for each (assuming the contested cycles were assigned to A and B alternately). Therefore, the stretching factors given by Equations 7 and 8 are
Simulation studies
The mathematical model is useful within the limitations of analytic techniques in general. The analytic formulas can be derived only within the framework of certain restrictive assumptions, and using them successfully depends on fairly precise knowledge of the various probabilities that enter the resulting equations. A study was therefore conducted to determine whether the mathematical model was valid in the general, nonrestrictive case or only within its rather limiting premises. The study was also intended to produce the sets of probabilities required by the analytic formulas.
The mathematical model of the interacting processors and shared storage units rests on two main assumptions. First, the storage units operate cyclically and synchronously regardless of processor demand for access; this also implies that main storage cannot be interleaved. Second, a processor's requests for access to storage are independent of its prior demands (a processor's requests form a sequence of Bernoulli trials). Although critical for the derivation of the analytic formulas, the first assumption is of questionable realism when compared with an actual storage unit, which operates during a cycle only when a processor requests it, incorporates overlapping read/write cycles, and may operate with a degree of interleaving. Likewise, the realism of the second assumption, and thus of the entire model, may be questioned, because processor requests for the use of storage are in fact not independent of one another.
Simulation of the multiprocessing environment was used to determine the predictive accuracy of the analytic technique. It should be emphasized that this work was not done to check the analytic formulas against a real system; it was important only to validate them against a system that was not based on the same restrictive assumptions, a system that, incidentally, incorporated all the relevant features and complexity of a real system. However, the multiprocessor model that was constructed

The model operates at the instruction level, rather than simulating the execution of complete job steps or programs, and it includes such processor features as eight-byte prefetching of instructions, and branching and accessing of data dependent upon the assigned instruction length. The storage units operate asynchronously and are structured to allow interleaving. The direct-access device, when operating, transmits at a rate of 200 kilobytes per second, generating the highest-priority service requests in both the independent and the contention storage units at constant intervals of 40 microseconds. Each of the CPUs, in contrast, provides a new instruction immediately upon completing its use of the storage unit in processing the previous instruction. As these elements interact, storage contention occurs, and, in addition, the probabilities required for the analytic formulas previously discussed may be derived.

Table 3 presents the simulation results pertinent to verification of the analytic technique. The simulations included models both with and without the generalized direct-access device, and storage units ranging in complexity from no interleaving to four-way interleaving. The accesses per instruction include both instruction and data accesses.
Because of the considerable contention for storage-unit cycles among the multiprocessing CPUs, the independent system executed a greater number of instructions than either CPU of the multiprocessor; nevertheless, the total instruction rate of the multiprocessor was generally greater than that of the independent system. The enhancement percentages measure this increased rate relative to the independent system. They are obtained by subtracting the independent processor's instruction rate from the total multiprocessor instruction rate and dividing the result by the independent instruction rate.
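The enhancement percentage is simple arithmetic, and it can be tied to the analytic model under one extra assumption: that each multiprocessor CPU would run at the independent system's rate if there were no contention, so that contention divides each CPU's rate by its stretching factor. The function names and the rates in the example below are hypothetical, not figures from Table 3.

```python
def enhancement_percent(independent_rate, multiprocessor_total_rate):
    """Enhancement as defined in the text: (total - independent) / independent."""
    return 100.0 * (multiprocessor_total_rate - independent_rate) / independent_rate


def enhancement_from_stretch(n_cpus, stretch):
    """Hypothetical link to the analytic model: if each of n_cpus CPUs would run
    at the independent rate without contention, contention divides each CPU's
    rate by `stretch`, so the total is n_cpus / stretch times the independent rate."""
    return 100.0 * (n_cpus / stretch - 1.0)


# Hypothetical rates, not figures from Table 3.
print(enhancement_percent(500_000, 2 * 350_000))  # two CPUs at 350,000 each
print(enhancement_from_stretch(2, 1.6))
```

Under that assumption, a two-CPU multiprocessor shows a positive enhancement whenever its stretching factor is below 2.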
It seems logical to begin verifying the analytic technique with the simple example in which a processor contends only with a channel for their shared storage unit. This corresponds to the case of one processor, one channel, and N stores, considered previously. With only one storage unit, N is equal to 1, and Equation 6 reduces to

The probability, $p_b$, of a storage access by the processor may be determined by first calculating the maximum number of requests for service that a single processor can make in one second. With two-way interleaving, it may be assumed that one-half of the processor's requests spend the minimum amount of time, 0.75 microsecond, in use of storage. The remaining requests cannot effectively use the interleaving potential and are forced to spend 1.5 microseconds being serviced by main storage. The actual instruction rate of the independent processor without input/output interference is shown in Table 3 to be 488,219 instructions per second. This figure varies according to the instruction mix.
Table 3 also shows that approximately 1.42 accesses to main storage were required for each instruction. This figure seems reasonable if one notes, first, that the four-byte instructions, which constitute the majority of the instruction set of the simulation model, each require one data access, and, second, that two such instructions can be fetched per access, given a storage width of eight bytes. The deviation of this figure from the expected value of 1.5 depends on the relative percentage of two-byte instructions, which do not access storage for data, and of six-byte instructions, which require two such data accesses. This value of accesses per instruction varies with different equipment configurations and instruction mixes.
The probability of a storage access by the processor is, then, equal to

The corresponding simulation stretching factor may be calculated by dividing the independent processor's instruction throughput without I/O interference by the same processor's throughput with such interference. This fraction may be seen, from

In the case of one processor, one channel, and one store, then, it appears that the analytic and simulation approaches yield nearly identical results.

The verification of a slightly more complex analytic formula presents itself in the case of two processors and N stores, where N is again made equal to 1. In reality, this configuration would be equivalent to a multiprocessing system with no I/O devices. If it is assumed that each processor has an equal probability of requesting this storage unit for any particular cycle and, further, that a cycle under contention is granted with equal probability to either of the two processors, then $p_a = p_b$ and Equations 3 and 4 become equal. Letting $p_a = p_b = p$,

Substitution of the previously determined probability, $p_a = 0.780$, yields
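The arithmetic behind the processor and channel probabilities can be reconstructed as follows. Two steps are assumptions of this sketch rather than statements of the paper: that the average storage service time is the mean of the 0.75- and 1.5-microsecond cases, and that the channel probability is that mean service time divided by the 40-microsecond request interval. Both reproduce the quoted values of 0.780 and 0.028. The closed-form stretching factors are derived from the Markov chain for a single storage unit, not quoted from the paper's equations.

```python
# Figures quoted in the text (Table 3 and the storage timing assumptions).
instructions_per_second = 488_219
accesses_per_instruction = 1.42
mean_service_us = (0.75 + 1.5) / 2  # assumption: half fast, half slow requests

# Assumption: p = actual access rate / maximum possible access rate.
max_requests_per_second = 1e6 / mean_service_us
p = instructions_per_second * accesses_per_instruction / max_requests_per_second

# Assumption: the channel requests once per 40 us, measured in mean service times.
p_channel = mean_service_us / 40.0

# Closed forms derived from the single-unit chain.
s_channel_case = 1 + p_channel * p / (1 - p_channel)    # one processor + priority channel
s_two_cpus = 2 * (1 - p + p * p) / (2 - 2 * p + p * p)  # two CPUs, equal p and priority
print(round(p, 3), round(p_channel, 3), round(s_channel_case, 3), round(s_two_cpus, 3))
```

With these inputs the symmetric two-processor factor comes out near 1.58, while the processor-plus-channel factor stays only slightly above 1.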
The corresponding simulation stretching factor may be obtained by dividing the independent processor throughput by the average throughput of a single CPU in the multiprocessor:

stretching factor (simulation) = (488,219 instructions/second) / (331,5Ti4 instructions/second)

This indicates that the analytic formula yields a value about seven percent higher than simulation. This error was fairly consistent over several runs with differing parameters, indicating that the difference arises from the assumptions made and not from statistical variation in the simulation. Considering the grossness of the assumptions, it is a very modest error.
The final verification deals with the case of three processors and one storage unit, where one of the three processors assumes the attributes of a channel. Following the premises of the previous example, evaluation of the analytic stretching factor merely involves substituting the processor and channel probabilities, 0.780 and 0.028, respectively, into Equations 8 and 9.
Simulation of the same case produces a stretching factor through division of the independent processor throughput

Table 3 also displays the analytic and simulation values for the environments in which the storage allows either no interleaving or four-way interleaving. The conclusion to be drawn from the correspondence of the three sets of figures is that an analytic approach can reasonably be expected to yield results within ten percent of those derived from an actual simulation of the same problem. In this particular instance, it should be noted that the percentage deviation of the analytic from the simulation figures increases as the degree of storage interleaving increases. This merely highlights the fact that the assumptions of the analytic technique become increasingly inaccurate as the complexity of the model is augmented.
If the premises of the second example are relaxed so that the two contending processors request storage cycles with differing probabilities, calculating the analytic stretching factor for the case of three processors and one storage unit involves solving the three simultaneous equations of Equation 7. Figure 2 shows the results for the case of unequal probabilities, $p_a = 0.9$ and $p_b = 0.6$. No storage interleaving is assumed. The stretching factor is plotted as a function of increasing channel activity. It is important to note that the simulation model indicates a difference between the stretching factors for processors A and B under the condition of no I/O requests, which the analytic model does not predict. With increasing I/O activity, a difference in the stretching factors for the two processors appears in the analytic model, but the spread remains more modest than the
___ = 1.487
Summary
The analytic approach appears to be useful in providing approximate stretching factors for storage contention. However, if the results must be accurate to much better than 10 to 15 percent, it is usually necessary to resort to simulation; the speed advantage of the analytic technique is ordinarily offset by its inability to mirror changes in model complexity as readily as simulation can.
