Abstract
Multiprocessors are increasingly regarded as favorable candidates for controllers in critical real-time control applications such as aircraft. Their considerable tolerance of component failures, together with their great potential for high throughput, are contributory factors.
In this paper, we first present a logical classification of multiprocessor structures with control applications in mind. We point out that one important subclass has hitherto been neglected by analysts. This is a class of systems with a common memory, minimal interprocessor communication, and perfect processor symmetry.
The performance characteristic of the greatest importance in real-time applications is the response time distribution. Indeed, we have shown in a separate paper [2] how it is possible to characterize rigorously and objectively the performance of a real-time multiprocessor given the application, the multiprocessor response time distribution, and the component failure characteristics. We therefore present here a computation of the response time distribution for a canonical model of a real-time multiprocessor.
To do so, we approximate the multiprocessor by a blocking model and present a means for efficient analysis. Two separate models are derived: one created from the system's point of view, and the other from the point of view of an incoming task. The former model is analyzed along largely conventional lines. For the latter model, an artificial server is used, and the system is transformed into a queueing network.
Introduction
is the task of optimal (or sub-optimal) program partitioning. Parallelizing compilers that translate code written in von Neumann languages to exploit any possible parallelism are frequently inefficient, in the sense that the compiler overhead is usually considerable and that it is difficult, if not impossible, to find every possible parallelism.
However, there are applications where program partitioning arises quite naturally and no parallelizing compiler is necessary. One such application is real-time control, where the global control function divides naturally into weakly interacting tasks or atom functions [1]. Indeed, it is proper to regard the control function as a well-defined set of tasks, each relatively independent of the others. The tasks can be triggered by a variety of sources (environmental stimuli, timers, or operators) and correspond to specific software packages being run. In most cases, the triggering by stimuli is random and can be characterized by probability distributions.
The major characteristics of a multiprocessor control system are therefore that (a) the control function consists of a number of weakly interacting and well-defined tasks, and (b) the load can be characterized stochastically with reasonable accuracy. The second characteristic enables us to obtain performance evaluations more accurately than would otherwise be the case, while the first influences the architecture of the interconnection structure.
How well a multiprocessor controller performs clearly depends on the requirements of the application. We showed in [2] how controller performance could be rigorously and objectively evaluated, given the control application and the multiprocessor response time distribution. Keeping this foundation in mind, we evaluate in this paper the performance of a canonical model of multiprocessors for real-time control.
Because the entire set of programs that the control system will ever execute is well-defined, the designer has an option: the system can either have a common, or mass, memory that contains all the applications software (thus requiring the transfer of the relevant software upon a task trigger), or every processor can hold in its private memory all the applications software it will ever need. The latter alternative eliminates the need to transfer software, thus reducing overall response time. Choosing this alternative presents the designer with two sub-alternatives: either provide each processor with a private memory so large that the set of control programs required in the control of the process can be held in it in its entirety, or preallocate tasks to specific processors. Based on the above observation, the following can be considered to be canonical logical models of multiprocessors used in real-time control.
Type 1: A Type 1 multiprocessor controller is one in which the processors do not specialize; in other words, tasks are not reserved for specific processors, and each processor in the system may be allocated any task. The process of task allocation is typically dynamic. This type is divided into two subclasses depending on the size of the processors' private memory. In a Type 1a system, the private memory of the processors is large enough to hold both the applications and executive software in its entirety.
In a Type 1b system, on the other hand, the private memory is too small to hold all the applications software, and the software must be transferred with each task trigger. This transfer can either be single or involve paging.
The tradeoff between requirements of memory size and task execution time in this subclassification is clear. As a general rule, the processors allocated a task execute it in its entirety without interruption unless some of the processors fail. There is thus little or no interaction between the individual processors in this type of system. The best known implementation of the Type 1b structure is the Draper Laboratory's Fault-Tolerant Multiprocessor (FTMP) [5].
Type 2: In a Type 2 system, the processors are preallocated specific tasks (or subtasks) and the software related to these is loaded in their private memory. With the identification of specific tasks with particular processors comes the problem of reallocating tasks on processor failure. A system of this type is thus too inflexible to reconfigure itself easily upon failure; its reliability is instead generally obtained through physical redundancy in system components. In general, Type 2 is used when the individual tasks have significantly different time/safety criticality (e.g., flight control and navigation tasks in aircraft applications). Processor interaction can be considerable.
Type 3: A Type 3 system is a composite of a Type 1 and a Type 2 system.
Figure 1 shows graphically the above classification of multiprocessors. Notice that the classification is logical and is different from most conventional ones, which are usually based on physical interconnection structures.
A considerable body of literature has developed around the problem of analyzing multiprocessors. Almost invariably, the procedure is to use a Markov model of the system to solve for such performance measures as throughput, reliability, availability, etc. Type 1a systems have been the focus of a great deal of attention. The tendency of almost all authors is to assume identical processors and an identical exponential service time distribution for all job classes, upon which the system degenerates into an M/M/m queue. This analysis is then embedded in a determination of the multiprocessor performance. One good example is the work on closed-form estimations of performability by Meyer [3].
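For orientation, the M/M/m reduction mentioned above has a well-known closed form for the mean response time (the Erlang C formula). The following is a minimal Python sketch of that textbook result, not of any analysis in this paper; the function names `erlang_c` and `mmm_mean_response` and the arguments `lam` (arrival rate), `mu` (per-processor service rate), and `m` (number of processors) are illustrative assumptions.

```python
from math import factorial


def erlang_c(m: int, lam: float, mu: float) -> float:
    """Probability that an arriving job has to wait in an M/M/m queue."""
    a = lam / mu          # offered load
    rho = a / m           # utilization of each server
    if rho >= 1.0:
        raise ValueError("unstable queue: lam must be smaller than m * mu")
    # Term for 'all m servers busy', weighted by the geometric queue tail.
    tail = (a ** m / factorial(m)) / (1.0 - rho)
    # Terms for 0 .. m-1 busy servers.
    head = sum(a ** k / factorial(k) for k in range(m))
    return tail / (head + tail)


def mmm_mean_response(m: int, lam: float, mu: float) -> float:
    """Mean response time (waiting plus service) of an M/M/m queue."""
    return erlang_c(m, lam, mu) / (m * mu - lam) + 1.0 / mu
```

With `m = 1` the expression collapses to the familiar M/M/1 value `1 / (mu - lam)`, which gives a quick sanity check.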
Type 2 systems can be modelled as queueing networks, and there is a large body of literature on this topic.
Type 1b, however, has been almost totally neglected. This is odd, considering that one of the few multiprocessors actually to be constructed (i.e., FTMP [5]) is an example of this type of system. Also, it is likely that Type 1b systems will grow in importance as time progresses, since this type is ideal when the job mix is composed of a large number of tasks, each called relatively infrequently. Again, the analysis of Type 3 systems requires, as a prerequisite, the analysis of Type 1 systems. In this paper, we shall analyze an important subclass of the Type 1b system.
This paper is organized as follows. In Section 2, we analyze an important subclass of multiprocessors, obtaining an approximate expression for its response time. We conclude in Section 3 by describing briefly some extensions of this work presently being undertaken, and their implications for real-time multiprocessor design.
Response-Time Analysis of Type 1b System Without Paging

Description of the Real-Time Multiprocessor
The multiprocessor we analyze is shown in Figure 2. It has a dispatcher allocating tasks to $c$ identical processors. Service at the dispatcher is FCFS, and all the tasks place a statistically identical demand upon the computational resources of the system. When a processor is assigned a task, it first sends to the common mass memory for the relevant applications software. This is the only reference the processor makes to the common memory during a single execution. Task arrival is modelled as a Poisson process with rate $\lambda$, and all service time distributions are exponential (in the sequel we also consider non-exponential service times): at the processors the mean service rate is $\mu$, and at the memory the mean software transfer rate (per task) is $\mu_m$. Note that $\mu$ represents the actual execution rate for an individual task on a single processor; it does not take into account software transfer time.
An assumption tacitly made in almost all work in this area is that the input process is Poisson and that all service times are exponential. The principal reasons for such an assumption are that (i) while both input and service distributions are non-exponential in practice, they can sometimes be approximated well by Poisson and exponential assumptions, and (ii) analysis of a system with general arrival and service distributions is almost impossibly difficult.
The first point ensures that the exponential assumption leads to at least an approximate model of reality. It is important, however, to check this fact for each particular model by employing alternative approaches, e.g., simulation. In this paper, we carry out this task by means of a simulation program that assumes Weibull service distributions. The exponential distribution is a special case of the Weibull. By varying the standard deviation of the service distribution while keeping the mean constant, we obtain an indication of the range of input intensities for which the exponential assumption is a good approximation.
In this model, there is the problem of simultaneous possession of resources by the tasks. That is, there is a period when, immediately after being allocated a processor by the dispatcher, the processor queues up for service at the common memory for the applications software. During this period, the processor is forced to remain idle. The present system does not involve multiprogramming.
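The simultaneous possession of a processor and the common memory can be made concrete with a small simulation. The sketch below is an illustration in Python, not the simulation program used in this paper; the function name `simulate_type1b` and its parameters are assumptions. It relies on the fact that, under FCFS dispatching, tasks reach the memory in arrival order, so a simple recursion over tasks suffices and no general event list is needed.

```python
import heapq
import random


def simulate_type1b(lam, mu, mu_m, c, n_tasks, seed=0):
    """Return the response times of n_tasks tasks.

    lam  : Poisson arrival rate
    mu   : execution rate of a task on one processor
    mu_m : software transfer rate at the common memory
    c    : number of identical processors
    """
    rng = random.Random(seed)
    proc_free = [0.0] * c   # min-heap of instants at which processors fall idle
    mem_free = 0.0          # instant at which the memory finishes its current transfer
    now = 0.0
    responses = []
    for _ in range(n_tasks):
        now += rng.expovariate(lam)                       # next arrival
        # FCFS dispatch: the task takes the earliest available processor.
        dispatch = max(now, heapq.heappop(proc_free))
        # The processor is held idle while it queues for the memory.
        transfer_start = max(dispatch, mem_free)
        mem_free = transfer_start + rng.expovariate(mu_m)  # software transfer
        done = mem_free + rng.expovariate(mu)              # actual execution
        heapq.heappush(proc_free, done)
        responses.append(done - now)
    return responses
```

At very light load the mean of the returned values should be close to `1/mu_m + 1/mu`, since queueing at both the dispatcher and the memory is then rare.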
Since there is a period when a task is in possession of both a processor and the common memory, this multiprocessor does not fit into any of the well-behaved queueing models (such as M/M/m, G/M/m, etc.). There are two approaches to obtaining a solution for the response time distribution. The first is iterative and employs the method of surrogate servers as in [4]. The second, which we present here, also involves surrogate or artificial servers but is non-iterative in nature. It consists of approximating the multiprocessor by a model in which simultaneous possession of resources does not occur, but where blocking takes place. We have thus translated our problem into the context of a blocking model. An important consequence of this is that the case where paging is allowed for can be handled by an immediate extension of this approach. To solve this problem, we employ artificial servers. Artificial or fictitious servers have been used in a number of models. In [6], they are employed in the analysis of an open queueing network with blocking. Our analysis, although superficially similar, is non-iterative and treats a system with parallel servers with blocking. In such a setting, the iterative technique presented in [
We begin by noting that the system as shown in Figure 2 can be approximated by that in Figure 3. The approximation is justified since the time spent by a task receiving service (as distinct from waiting in the queue) at the dispatcher is negligible compared to the memory service time. According to our approximate model, an incoming task waits in the Memory-Dispatcher (M.D.) queue, and, provided it is admitted
To analyze this model, we portray it from two complementary points of view. The first, the system-oriented model, is as in Figure 3. On the other hand, Figure 4 depicts the system as seen by an incoming task, i.e., it is a task-oriented model.
2.2. System-Oriented Model
Denote by $p_{i,j}(t)$ the probability at time $t$ that there are $i$ tasks in the M.D. queue (including the one receiving service at memory if no blocking is taking place), and that $j$ processors are executing (i.e., the applications software has been transferred to these $j$ processors and actual execution is taking place). In this section, we determine the steady-state values $p_{i,j} = \lim_{t \to \infty} p_{i,j}(t)$.
Blocking of incoming tasks begins to occur when $i+j=c$ and a new task arrives before any admitted task has been completed. Unless otherwise stated, a variable in the sequel is assumed to be zero if one or more of its subscript indices become negative. The state-transition diagram appears as Figure 5 and, assuming steady-state is achieved, the following global balance equations can be written down:

\begin{align}
\lambda\, p_{0,0} &= \mu\, p_{0,1} \tag{1} \\
(\lambda + j\mu)\, p_{0,j} &= (j+1)\mu\, p_{0,j+1} + \mu_m\, p_{1,j-1} && \text{for } 0<j<c \tag{2} \\
(\lambda + c\mu)\, p_{0,c} &= \mu_m\, p_{1,c-1} \tag{3} \\
(\lambda + \mu_m)\, p_{i,0} &= \lambda\, p_{i-1,0} + \mu\, p_{i,1} && \text{for } i>0 \tag{4} \\
(\lambda + j\mu + \mu_m)\, p_{i,j} &= \lambda\, p_{i-1,j} + \mu_m\, p_{i+1,j-1} + (j+1)\mu\, p_{i,j+1} && \text{for } 0<j<c,\ i>0 \tag{5} \\
(\lambda + c\mu)\, p_{i,c} &= \lambda\, p_{i-1,c} + \mu_m\, p_{i+1,c-1} && \text{for } i>0 \tag{6}
\end{align}

The boundary condition is:

$$\sum_{i=0}^{\infty} \sum_{j=0}^{c} p_{i,j} = 1 \tag{7}$$

To solve for $p_{i,j}$, we use the method of generating functions. Define

$$g_j(z) = \sum_{i=0}^{\infty} p_{i,j}\, z^i \tag{8}$$

Using generating functions and manipulating, we obtain the following recursion for $c>1$:
In particular,
Equating the right-hand sides of (11) and (14), we have:
The only remaining unknowns are the $p_{0,i}$ for $0 \le i \le c-1$. The equations required to solve for these are derived as follows.
The boundary condition (7) yields the relation

$$\sum_{j=0}^{c} g_j(1) = 1 \tag{17}$$

Also, the generating functions $g_j(z)$ must converge in the unit disc $|z| \le 1$. All poles of the generating functions lying in the unit disc must therefore be cancelled out by corresponding zeros. The use of this condition yields further equations in the $p_{0,i}$ for $0 \le i \le c-1$ by (15), since the zeros of the generating function are functions of the values taken by the $p_{0,i}$. Note that no additional equations can be obtained from a further invocation of (10), since the apparent additional pole at the origin in (10) is cancelled out by a zero.
It can be shown that the above equations are sufficient to permit the computation of the $p_{0,i}$ values. The generating functions $g_j(z)$ are therefore completely determined. By inverting them (numerically, in most instances), the steady-state probabilities $\{p_{i,j}\}$ can be obtained.
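As a numerical cross-check on the steady-state probabilities, one can bypass generating functions altogether. The Python sketch below is a swapped-in technique, not the method of this paper: it truncates the M.D. queue at a hypothetical maximum length `max_queue`, uniformizes the continuous-time Markov chain of the system-oriented model, and iterates the resulting discrete-time chain until it stops changing. The function name `steady_state`, the truncation level, and the tolerances are all assumptions made for illustration.

```python
def steady_state(lam, mu, mu_m, c, max_queue=40, tol=1e-12, max_iter=200000):
    """Approximate p[(i, j)] for i tasks in the M.D. queue and j busy processors."""
    states = [(i, j) for i in range(max_queue + 1) for j in range(c + 1)]
    # Uniformization rate: strictly larger than any total exit rate,
    # so every state keeps a self-loop and the iteration is aperiodic.
    big = lam + mu_m + c * mu + 1.0
    p = {s: 1.0 / len(states) for s in states}
    for _ in range(max_iter):
        q = dict.fromkeys(states, 0.0)
        for (i, j), mass in p.items():
            stay = 1.0
            if i < max_queue:            # arrival (dropped at the truncation level)
                q[(i + 1, j)] += mass * lam / big
                stay -= lam / big
            if i > 0 and j < c:          # software transfer ends, execution starts
                q[(i - 1, j + 1)] += mass * mu_m / big
                stay -= mu_m / big
            if j > 0:                    # one of the j executing tasks completes
                q[(i, j - 1)] += mass * j * mu / big
                stay -= j * mu / big
            q[(i, j)] += mass * stay
        change = max(abs(q[s] - p[s]) for s in states)
        p = q
        if change < tol:
            break
    return p
```

For lightly loaded systems the truncation error is negligible, and the result can be checked against the balance equations directly, for example by confirming that the rate out of the empty state equals the rate into it.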
Recall that the above analysis is for c>1. For the case when c=1 (this is also a special case of the blocking problem considered in 
Remark 2:
In the above treatment of the system-oriented model, the existence of a steady-state has been implicitly assumed. That is, it is assumed that $\lim_{t \to \infty} p_{i,j}(t)$ exists. From a theorem in stochastic processes (see, for example, [9]), it follows that since the underlying Markov chain is irreducible and aperiodic, any solution of the balance equations that also satisfies the boundary condition is unique and represents $\lim_{t \to \infty} p_{i,j}(t)$.
2.3. Task-Oriented Model
In this model, we employ an artificial server to account for blocking delay. This yields an approximate solution for the response time distribution.
An incoming task views the system as expressed in Figure 4. With a probability $\alpha$, it is blocked. The blocking is expressed in terms of an artificial server, to which the incoming task branches if it finds the system blocked, i.e., the service rate distribution of the artificial server is the same as the departure rate distribution for the tasks executing on the processors. When the service time at the processors is exponentially distributed, the artificial server also has an exponential service time distribution.
Implicit in this analysis is the assumption that the transients have died down and a condition representative of steady-state exists at all times. This is patently not true. In most cases, considering that controllers of this type are generally lightly loaded, the probability of blocking is very small. However, once a blocking cycle begins, the probability of its continuing is greater than $\alpha$ for obvious reasons. Even so, it is likely that the transients in both the blocking and nonblocking modes will play a not inconsiderable role in determining the queue behaviour. This is especially the case when the input intensity is neither very low nor near to driving the system into saturation. In this intermediate range, one would expect the system to switch between the blocked and the unblocked state with a fairly high frequency. Hence, any model that is based on the assumption of no transients would be expected to provide less accurate results in the intermediate range of input intensity than in the extreme ranges. (Indeed, the relative accuracy between analytical results at two different intensities might be used to obtain some indication of the frequency with which the system switches between the blocked and the unblocked state.)
If we assume that non-transient behaviour is exhibited at all times, the problem lends itself to a particularly simple solution and we obtain an approximation to the true solution. It will be shown by simulation that this approximation yields reasonable results.
Under the assumption of steady-state, the system may be regarded as the queueing network shown in Figure 4. The artificial server represents the blocking delay. Under the queueing rules for this model, there is no queueing for the processors once memory access is completed; a task is admitted into the memory only when there is a free processor ready and waiting for it.
It only remains to compute the value of $\alpha$ and the artificial service rate, $\mu_{as}$. It is easy to see that
2.4. Validation
In an attempt to validate the analytical results obtained above, a multiprocessor with three processors was considered.
As a reference, a GPSS simulation model was developed for the system. The input intensity was varied over a wide range, and the results obtained through simulation were compared with those obtained from analysis. In Table 1, we present the mean response times from the simulation and the analysis.
Throughout the range of intensities studied, the analytical and simulation results were acceptably close (to within 10% in most cases).
Non-exponential Service Rates
While the extreme complexity of analyzing systems with blocking and general service distributions forces the analyst to assume exponential service distributions, it is common knowledge that programs on many occasions have non-exponential service requirements. This can be modelled by equating the mean of the actual service time to a fictitious exponential quantity. In such instances, the sensitivity of the model to this inaccuracy is crucial.⁸
We test the robustness of our model to non-exponential service time distributions by assuming those to be Weibull, which is a more general distribution than the exponential. This distribution has the distribution function $F_{We}(t) = 1 - e^{-(at)^{\eta}}$. The mean is $\Gamma(1 + 1/\eta)/a$. The exponential and the Rayleigh distributions are special cases of the Weibull, obtained by setting $\eta=1$ and $\eta=2$, respectively. When $\eta<1$, the standard deviation is greater than the mean; when $\eta=1$, the standard deviation equals the mean; and when $\eta>1$, the standard deviation is less than the mean. Our test for model robustness takes the form of plotting the ratio of the mean response time when $\eta$ is 0.5 and 0.7, respectively, to that when $\eta=1$ (i.e., the exponential distribution). The mean response time stays close to the exponential value for a relatively large value of nominal input intensity. As expected, at high intensities, there is a marked divergence from the value predicted by the exponential distribution. However, in critical control applications, the utilization is almost always very low to allow sufficiently broad margins of safety. Note that the actual input intensity is greater than the nominal value as defined in Figure 6, obtained by computer simulation.
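The idea of varying the spread of the service time while holding its mean fixed can be sketched in Python as follows. This is an illustration only, not the simulation program used for Figure 6; the helper names are assumptions. Note that Python's `random.weibullvariate(alpha, beta)` takes a scale `alpha` (a time, i.e., the reciprocal of a rate-style parameter) and a shape `beta`.

```python
import random
from math import gamma, sqrt


def weibull_scale_for_mean(mean, shape):
    """Scale b such that F(t) = 1 - exp(-(t / b) ** shape) has the given mean."""
    return mean / gamma(1.0 + 1.0 / shape)


def weibull_cv(shape):
    """Coefficient of variation (standard deviation over mean); scale-free."""
    g1 = gamma(1.0 + 1.0 / shape)
    g2 = gamma(1.0 + 2.0 / shape)
    return sqrt(g2 / (g1 * g1) - 1.0)


def sample_service_times(mean, shape, n, seed=0):
    """Draw n Weibull service times with the requested mean and shape."""
    rng = random.Random(seed)
    scale = weibull_scale_for_mean(mean, shape)
    return [rng.weibullvariate(scale, shape) for _ in range(n)]
```

A shape of 1 reproduces the exponential case (coefficient of variation exactly 1), while shapes of 0.5 and 0.7 give heavier-tailed service times with the same mean.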
⁸ Indeed, the success of such models as the central server is due in large measure to a relative insensitivity to this inaccuracy.
Discussion
In this paper, we have presented a logical taxonomy for multiprocessor systems used in control applications, and analyzed one important class. It is clear from the model developed here that what is obtained is an upper bound for the actual response times. There are two queues in this system: one in front of the artificial server and the other in front of the Memory-Dispatcher. The model incorporates both queues since, when blocking occurs, the task at the head-of-the-line (H.O.L.) position is waiting for the first of the tasks admitted to the system to leave so that it can be unblocked and begin to access the M.D. Tasks that have entered the queue after the H.O.L. task have this latter task as an additional impediment to entry into the system. The system represented in Figure 4 follows immediately from the above argument. Implicit in this model is the assumption of independence between the M.D. and the processors. This is clearly a simplification; the blocking ensures that no such independence exists. Due to the correlation between the activity of the M.D. and the processors, the model tends to overestimate the response times. A more accurate model should attempt to deal with this by subtracting a correction term to account for the correlation.
Extension to Type 1b systems with I/O represented by paging and admitting of multiprogramming, although nontrivial, is not difficult. One would have here three classes of jobs: one in front of, or being served by, the memory-dispatcher; one undergoing service; and a third consisting of jobs either being served by the memory after receiving a portion of their service at the processor or waiting to reaccess the processor. It is also not difficult to think of a number of different systems based on variations of the above basic theme. For instance, one might obtain new models based on assigning priorities to the tasks. In all these cases, the artificial server approach can be used to advantage.
Taking account of multiple job classes with FCFS service and no priority distinction between the classes is, however, very difficult to do exactly, since an unmanageably large number of states results. Suitable approximations must therefore be sought. These are likely to be only moderately accurate. One approach might be to use clustering techniques to identify "metaclasses" (groups of classes) and then to employ some averaging techniques. This method should be reasonably accurate at low intensities. At high intensities, an approach based on the diffusion approximation should be explored. Also, the reader should bear in mind that we have analyzed a logical family of computer systems. This analysis is therefore valid for a wide range of physical implementations.
The model presented provides not only the mean response times, but is also a means to obtain the response time distribution. This response time distribution, once obtained, can be used in a number of ways. As already mentioned, we showed in [2] how response time distributions together with a probabilistic model of the computer and a full mathematical description of the control application can be processed to provide rigorous and objective means for evaluating the performance of a real-time multiprocessor system in the context of its application.
Tradeoffs can also be studied and the multiprocessor system refined thereby. For example, one might consider trading off processor number against processor [7].
