Recursive analytical expressions for speedup and solution time for a multilevel tree sequentially processing a divisible load under cut through switching are developed. Such cut through switching is shown to be more efficient than store and forward switching. Aerospace applications include sensor networks, radar, and satellite imagery processing.
I. INTRODUCTION
The processing of massive amounts of data on distributed and parallel networks is becoming more and more common. Aerospace applications include radar and infrared tracking, satellite imaging, and sensor networks. Over the past 15 years, a number of researchers have mathematically modeled such processing using a divisible load scheduling model [1] [2] [3] that is useful for data parallelism applications.
Divisible loads are ones that consist of data that can be arbitrarily partitioned among a number of processors interconnected through some network. Divisible load modeling usually assumes no precedence relations among the data. For instance, a satellite image processing system may involve a stream of independent images arriving at a robotic tape silo for storage and then being distributed to processors on a cluster computer. In this situation divisible load scheduling theory can answer the question of how to optimally schedule image transmissions and assign load to processors to maximize parallel system speedup and minimize overall processing time.
One reason that motivated the creation of divisible load theory was the need for a means of modeling integrated measurements, communication, and computation in sensor networks [4]. At the time of [4] there was a recognized need to combine the communication and computation aspects of sensor network modeling. Divisible load theory allows such integration. Because of its underlying linear model and continuous mathematics framework, divisible load modeling is very tractable. It has been used to accurately and directly model such features as specific network topologies and scheduling policies [2, 4-11, 14], computation versus communication load intensity [2, 4], time-varying inputs [15], multiple job submission [2, 16-18], and numerous applications such as image processing, databases, and multimedia.
A. Our Contribution
The goal of this paper is to quantify the superiority of virtual cut through switching over store and forward switching for networks processing divisible loads. Under store and forward like switching, a node in a tree must receive the load for all of its descendants before beginning to distribute load to them [19]. This clearly can be improved upon by using virtual cut through switching [12, 13], which allows a node to distribute load to its descendants even as it continues to receive its descendants' load. Both strategies are analyzed here, and the paper ends with a comparison. Note that the majority of the divisible load scheduling literature to date involves store and forward switching. To achieve our goal we discuss tree networks. Tree networks, though of interest in their own right, can be used to span any interconnection topology to deliver load to processors. They are thus relevant for many load distribution policies in such commonly used interconnection networks as hypercubes and meshes. Moreover, a logical (Ethernet or wireless channel like) bus can be modeled as a single level tree (root and adjacent children only) with load distributed sequentially to children and all links having the same transmission speed. Multilevel trees can model cascaded networks as discussed in Section IIIB.
A multilevel tree is considered here where at time t = 0 the divisible load is available at the root node. Load is optimally (in a solution time or speedup sense) distributed throughout the tree to gain the benefit of parallel processing of the load. Our main results are closed-form solutions under a variety of scheduling scenarios (including the use of fat trees) for the solution time and speedup of single level tree networks and recursive expressions for homogeneous multilevel tree networks. These recursive expressions are novel. Calculating speedup for heterogeneous trees is also discussed. This paper begins with the development of some notation and analytic background in Section II. The scheduling method using cut through switching from level to level is derived in Section III. The scheduling method using store and forward switching from level to level is derived in Section IV. In Section V we compare the performance of cut through switching scheduling and store and forward scheduling. The conclusion appears in Section VI.
II. MODEL AND NOTATION

A. Model and Notation for Single Level Tree
If a node begins to process its load as soon as the load begins to be received, we call this simultaneous start. It is illustrated in Fig. 1 (where each node contains a miniature timing diagram; such diagrams are explained later).
For a heterogeneous single level tree, which can be collapsed into an equivalent node, the notation is as follows.
$\alpha_0$: Load fraction assigned to root processor.
$\alpha_i$: Load fraction assigned to ith link-processor pair.
$w_i$: Inverse computing speed on ith processor.
$w_{eq}$: Inverse computing speed of equivalent node collapsed from single level tree.
$z_i$: Inverse link speed on ith link.
$T_{cp}$: Computing intensity constant. Entire load can be processed in $w_i T_{cp}$ seconds on ith processor.
$T_{cm}$: Communication intensity constant. Entire load can be transmitted in $z_i T_{cm}$ seconds over ith link.
$T_{f,m}$: Finish time of an equivalent node collapsed from a single level tree composed of one root node and $m$ children nodes. $T_{f,m}$ is equal to $w_{eq} T_{cp}$.
$T_{f,0}$: Finish time for the entire divisible load solved on the root processor (i.e., a tree without any children nodes but the root node). $T_{f,0}$ is equal to $1 \times w_0 T_{cp}$, that is, $w_0 T_{cp}$.
DEFINITION 1 $\gamma_{eq}$, the ratio of the inverse computing speed on an equivalent node to that on the root node: $\gamma_{eq} = w_{eq}/w_0$.
An equivalent node can exactly represent the operating characteristics of the subnetwork it replaces. Equivalent element modeling is a feature of linear models, such as circuit theory, queueing theory, and divisible load theory.
DEFINITION 2 Speedup, the ratio of finish time on one processor (i.e., the root node) to that on an equivalent node collapsed from a single level tree. It is thus a measure of parallel processing advantage. This value is equal to the ratio of the inverse computing speed on the root node to that on an equivalent node, i.e., the inverse of $\gamma_{eq}$. Hence, $\text{Speedup} = T_{f,0}/T_{f,m} = w_0/w_{eq} = 1/\gamma_{eq}$.
B. Model and Notation for Multilevel Tree
A heterogeneous multilevel tree network is too complex to admit a closed-form solution for speedup. Therefore, a homogeneous multilevel tree network where root processors are equipped with a front-end processor for off-loading communications is evaluated.
Suppose that after a subroot receives all of the assigned fraction of load for its descendants, it starts distributing these loads to its descendants concurrently. This strategy is called "store and forward switching with simultaneous distribution" (see Fig. 2 ).
The notation for a multilevel homogeneous fat tree is as follows.
$\alpha_{j,0}$: Load fraction assigned to root processor of jth level subtrees.
$\alpha_{j,i}$: Load fraction assigned to ith link-processor pair of jth level subtrees.
$w^{i}_{eq_{j-1}}$: Inverse computing speed of the equivalent ith node, which represents the $(j-1)$th level subtree consisting of collapsed single level subtrees from level 1 ascending to level $j-1$. In a homogeneous multilevel tree, we assume that $w_{eq_{j-1}} = w^{i}_{eq_{j-1}}$ ($i = 1,2,\ldots,m$).
Processing finish time of k level homogeneous tree with one root node and m equivalent children nodes.
DEFINITION 3 $p_{j-1,i}$, the multiplier of the inverse capacity of the ith link at level $j$ (see Fig. 2). The value of the multiplier $p_{j-1,i}$ is defined as the inverse of the total number of children processor descendants at and below layer $j-1$ for the ith subtree. The variable $p_{j-1,i}$ allows fat tree modeling. A fat tree allocates more capacity to nodes near the root to improve the transmission speed. In a homogeneous multilevel fat tree, $p_{j-1} = p_{j-1,i}$ ($i = 1,2,\ldots,m$). Hence,
With this choice of $p_{j-1}$, the transmission capacity between a level $j$ parent node and its children nodes is $1/(p_{j-1}z)$, which is larger than the capacity of bottommost level links by a factor of $1/p_{j-1}$. This implicitly indicates that each node within an equivalent subtree from layer $j-1$ down to layer 0 has an equivalent transmission capacity of $1/z$ to the root.
DEFINITION 4 $\gamma_j$, the ratio of the inverse computing speed on an equivalent node at level $j$ to that on the root node: $\gamma_j = w_{eq_j}/w$.
DEFINITION 5 Speedup, the ratio of finish time on one processor (i.e., the root node) to that on an equivalent node collapsed from a subtree from level $k$ to level 1. This value is also equal to the ratio of the inverse computing speed on the root node to that on an equivalent node, i.e., the inverse of $\gamma_k$. Hence, $\text{Speedup} = w/w_{eq_k} = 1/\gamma_k$.
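Definition 3 describes the fat tree multiplier in words. The sketch below is one reading of that definition for a homogeneous tree in which every non-leaf node has $m$ children: the subtree hanging off a level $j$ link is taken to contain $1 + m + \cdots + m^{j-1}$ processors (layers 0 through $j-1$). The function name and this node count are assumptions made for illustration, not formulas quoted from the text.

```python
def fat_tree_multiplier(m: int, j: int) -> float:
    """Assumed multiplier p_{j-1} for the links at level j (j >= 1).

    Reads Definition 3 as: p_{j-1} = 1 / (number of processors in the
    subtree fed by a level-j link), for a homogeneous m-ary tree.
    """
    if m < 1 or j < 1:
        raise ValueError("m and j must be positive integers")
    processors = sum(m ** n for n in range(j))  # layers 0 .. j-1
    return 1.0 / processors


if __name__ == "__main__":
    # Bottommost links (level 1) feed a single leaf, so the multiplier is 1.
    for level in (1, 2, 3):
        print(level, fat_tree_multiplier(2, level))
```

Under this reading a level $j$ link has capacity $1/(p_{j-1}z)$, that is, one share of $1/z$ for each processor it feeds, which agrees with the remark that every node sees an equivalent capacity of $1/z$ to the root.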
III. CUT THROUGH SWITCHING WITH SEQUENTIAL DISTRIBUTION
For the purposes of determining the optimal load allocations, the single level trees within the overall multilevel tree are divided into a root single level subtree (level $k$) and single level subtrees below the root subtree (levels $1, 2, \ldots, k-1$). It is assumed that all data is available (and stored) at the root at $t = 0$. Thus the root can immediately deliver load to its children at level $k$. This is the root node with data storage case.
For the other single level trees, it is assumed that a root's load must be completely received by a single level tree root before load is distributed to its children. After this, load is relayed through the root to its children in virtual cut through mode. This is the root node without data storage case. Note that the root node without data storage policy is particularly appropriate when bandwidth is limited, as in a homogeneous tree.
In both cases, only the simultaneous start strategy is considered. In the simultaneous start strategy each processor begins processing the received data while it continues to receive the data. This strategy was originally described by Kim [20] .
In the following both cases are examined, first in the context of a single level tree in isolation and then in the context of multilevel trees.
In this section we evaluate the performance of the multilevel tree model using the sequential distribution model under cut through switching and the simultaneous start strategy. The sequential distribution model is more involved to analyze than the better performing simultaneous distribution model. It applies to situations in which the parent nodes do not have enough capacity and ability to support simultaneous distribution.
A. Processors Using Simultaneous Start Model in a Single Level Tree
The following two subsections discuss sequential distribution in single level trees: one model with data storage and the second model without data storage.
1) Single Level Tree: Root Node with Data Storage: The process of sequential load distribution can be represented by Gantt chart-like timing diagrams, as illustrated in Fig. 3 . Here the horizontal axis is time and communication appears above the axis and computation appears below the axis. According to the figure, the fundamental recursive equations of the system can be formulated as follows:
The normalization equation for the single level tree with intelligent root is
This gives m + 1 linear equations with m + 1 unknowns. From (6)
where $q_i = (w_{i-1}T_{cp} - z_{i-1}T_{cm})/(w_i T_{cp})$. We assume $w_{i-1}T_{cp} > z_{i-1}T_{cm}$; that is, communication must be faster than computation. See also [2] for a discussion of the best choices of $w_i$ and $z_i$. According to (9) and (10), the normalization equation (8) leads to
Then the value of $\alpha_1$ is
Therefore, the finish solution time is derived as follows:
Since a single level tree can be collapsed into a single equivalent node, the equivalent inverse computation speed w eq of a collapsed node can be derived as follows:
According to Definition 1 in Section II, $\gamma_{eq}$ is equal to $w_{eq}/w_0$. Thus, from (14)
Since
where $\alpha_0 = 1$, and speedup is the ratio of computation time on one processor to computation time on the entire tree with $m$ children, we obtain
As a special case, consider the situation of a homogeneous network where all children processors have the same inverse computing speed and all links have the same inverse transmission speed (i.e., $w_i = w$ and $z_i = z$ for $i = 1,2,\ldots,m$). Note the root $w_0$ can be different from $w_i$. Therefore
where $\sigma = zT_{cm}/(wT_{cp})$ and $i = 2,3,\ldots,m$. Consequently,
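The allocation procedure of this subsection can be sketched numerically. The code below is a minimal illustration, not the authors' program: it assumes the recursion takes the form $\alpha_1 = (w_0/w_1)\alpha_0$ and $\alpha_i = q_i\alpha_{i-1}$ with $q_i$ as defined in the text, and it takes the finish time to be $\alpha_0 w_0 T_{cp}$, so that speedup equals $1/\alpha_0$. The function name and argument layout are hypothetical.

```python
def solve_single_level_with_storage(w, z, t_cp=1.0, t_cm=1.0):
    """Load fractions and speedup for a single level tree whose root stores the load.

    w = [w_0, w_1, ..., w_m]  inverse computing speeds, root first
    z = [z_1, ..., z_m]       inverse link speeds, in distribution order
    Assumes sequential distribution with simultaneous start.
    """
    m = len(w) - 1
    if len(z) != m:
        raise ValueError("need exactly one link per child")
    link = [0.0] + list(z)            # link[i] == z_i, matching the text's indices
    ratios = [1.0]                    # ratios[i] == alpha_i / alpha_0
    if m >= 1:
        ratios.append(w[0] / w[1])    # root and child 1 finish at the same time
    for i in range(2, m + 1):
        slack = w[i - 1] * t_cp - link[i - 1] * t_cm
        if slack <= 0:
            raise ValueError("model assumes w_{i-1} T_cp > z_{i-1} T_cm")
        q_i = slack / (w[i] * t_cp)   # q_i as defined in the text
        ratios.append(q_i * ratios[-1])
    total = sum(ratios)               # normalization: all fractions sum to 1
    alphas = [r / total for r in ratios]
    return alphas, 1.0 / alphas[0]    # speedup = 1 / gamma_eq = 1 / alpha_0


if __name__ == "__main__":
    alphas, speedup = solve_single_level_with_storage([1.0] * 4, [0.2] * 3)
    print(alphas, speedup)
```

For a root and three identical children with $\sigma = 0.2$, the unnormalized fractions are 1, 1, 0.8 and 0.64, so the sketch reports a speedup of about 3.44, which equals $1 + (1 - (1-\sigma)^m)/\sigma$ for $m = 3$.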
2) Single Level Tree: Root Node without Data Storage: The process of load distribution can be represented by Gantt chart-like timing diagrams, as illustrated in Fig. 4 .
The fundamental recursive equations of the system can be formulated as follows:
This gives m + 1 linear equations with m + 1 unknowns. Now from (20)
where
It is assumed that $w_0T_{cp} > z_0T_{cm}$ (communication is faster than computation for the 0th link and root). Following a derivation similar to that in Section IIIA1, the expressions for $\gamma_{eq}$ and speedup can be obtained as follows, with $\gamma_{eq} = w_{eq}/w_0$:
Therefore,
As a special case, consider the situation of a homogeneous network where all children processors have the same inverse computing speed and all links have the same inverse transmission speed (i.e., $w_i = w$ and $z_i = z$ for $i = 1,2,\ldots,m$). Note the root $w_0$ can be different from $w_i$. Then
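As a hedged worked step for this homogeneous case, suppose the first recursive equation has the form $\alpha_0 w_0 T_{cp} = \alpha_1 w T_{cp} + \alpha_0 z_0 T_{cm}$, mirroring the level $j$ equation used later for multilevel trees, and that the finish time is $\alpha_0 w_0 T_{cp}$. Both are assumptions of this sketch; $k_1$ is a label introduced here for illustration.

```latex
\begin{align*}
\alpha_1 &= k_1\,\alpha_0, \qquad k_1 = \frac{w_0 T_{cp} - z_0 T_{cm}}{w\,T_{cp}}, \qquad
\alpha_i = (1-\sigma)^{\,i-1}\alpha_1 \quad (i = 2,3,\ldots,m),\\
1 &= \alpha_0 + \sum_{i=1}^{m}\alpha_i
   = \alpha_0\left[1 + k_1\,\frac{1-(1-\sigma)^{m}}{\sigma}\right],\\
\text{Speedup} &= \frac{1}{\gamma_{eq}} = \frac{1}{\alpha_0}
   = 1 + k_1\,\frac{1-(1-\sigma)^{m}}{\sigma}.
\end{align*}
```

Setting $z_0 = 0$ (no reception delay at the root) and $w_0 = w$ gives $k_1 = 1$, which is the form obtained under the same assumptions for the root node with data storage case.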
B. Processors with Simultaneous Start. Homogeneous Multilevel Tree Analysis
For purposes of illustration we consider two types of multilevel tree scheduling for homogeneous trees. Here homogeneous trees have processors with identical computing speeds and links with identical transmission speeds. Homogeneous networks arise in practice in such instances as cluster computer installations. Calculating speedup for heterogeneous trees is discussed in Section IIIC.
One type of multilevel tree scheduling discussed here uses sequential distribution of load from a node to its children for all tree levels. This is a good model for a cascaded series of Ethernets (utilizing collision domain interface cards) or a cascaded series of wireless channels. The second type of multilevel tree scheduling uses simultaneous distribution of load from the tree root to its children (in the topmost level of the tree) and sequential distribution for tree levels below that (levels $j$, $j = 1,2,\ldots,k-1$). By way of example, this can model a large capacity robotic tape silo feeding a number of clusters simultaneously using ATM links. Each cluster consists of one or more cascaded Ethernets implementing sequential distribution (because of the use of collision domain interface cards).
In this section we develop recursive expressions for solution time and for speedup for the two scenarios. This is done first for levels $j = 1,2,\ldots,k-1$ (with sequential distribution) and then in two subsections for the topmost level $k$ (for both sequential and simultaneous distribution). At the end of these two subsections the overall recursive expressions are presented.
The methodology developed in this section can be applied to other combinations or exclusive uses of scheduling policies at each tree level. The ones discussed here are natural for a first study. Note also that if one is interested solely in the optimal allocations of load to processors, rather than speedup calculations, the methodology of [21] 
The process of load distribution for the multilevel fat tree network using cut through switching for computing and communicating can be represented by Gantt chart-like timing diagrams. According to an equivalent single level fat tree at level $j$ (see Fig. 5), the fundamental recursive equations can be formulated as follows:
$$\alpha_{j,0}\, w T_{cp} = \alpha_{j,1}\, w_{eq_{j-1}} T_{cp} + \alpha_{j,0}\, p_j z T_{cm} \qquad (28)$$
$$\alpha_{j,i-1}\, w_{eq_{j-1}} T_{cp} = \alpha_{j,i}\, w_{eq_{j-1}} T_{cp} + \alpha_{j,i-1}\, p_{j-1} z T_{cm}, \quad i = 2,3,\ldots,m. \qquad (29)$$
The normalization equation for the single level tree with an intelligent root (that can process load as well as distribute it) is
This gives m + 1 linear equations with m + 1 unknowns. Now from (28)
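Here the expression for $k_{eq_{j-1}}$ is manipulated as follows:
$$k_{eq_{j-1}} = \frac{wT_{cp} - p_j zT_{cm}}{w_{eq_{j-1}}T_{cp}} = \frac{w}{w_{eq_{j-1}}} - \frac{w\,p_j zT_{cm}}{w_{eq_{j-1}}\,wT_{cp}} \qquad (32)$$
$$k_{eq_{j-1}} = \frac{1 - p_j\sigma}{\gamma_{j-1}}$$
where $\sigma = zT_{cm}/(wT_{cp})$. It is also assumed that $wT_{cp} > p_j zT_{cm}$ (communication is faster than computing). Here (29) gives
$$\alpha_{j,i} = q_{eq_{j-1}}\,\alpha_{j,i-1}$$
where $i = 2,3,\ldots,m$ and $q_{eq_{j-1}} = (w_{eq_{j-1}}T_{cp} - p_{j-1}zT_{cm})/(w_{eq_{j-1}}T_{cp})$. It is assumed as before that $w_{eq_{j-1}}T_{cp} > p_{j-1}zT_{cm}$. The expression for $q_{eq_{j-1}}$ can be manipulated as follows:
$$q_{eq_{j-1}} = 1 - \frac{p_{j-1}zT_{cm}}{w_{eq_{j-1}}T_{cp}} = 1 - \frac{p_{j-1}\sigma}{\gamma_{j-1}}$$
where $\sigma = zT_{cm}/(wT_{cp})$ and $\gamma_{j-1} = w_{eq_{j-1}}/w$. According to (31) and (34), the normalization equation (30) for the jth subtree leads to
Hence the value of $\alpha_{j,1}$ is expressed as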
Consequently, the equivalent finish time becomes 
According to (34), (35), and (39), the expression for $\gamma_j$ is derived as follows:
Since the inverse computation capability of each node is $w$, it follows that $w_{eq_0} = w$. Hence the initial value of $\gamma_j$ is $\gamma_0 = w_{eq_0}/w = 1$.
For a homogeneous multilevel nonfat tree, the bandwidth of each transmission link is the same, that is, $p_j = 1$. So from (40)
According to (42) and $\gamma_0 = 1$, we can derive $\gamma_1$.
Consequently, the general form of $\gamma_j$ for the nonfat tree is obtained as
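The level-by-level recursion for $\gamma_j$ can be sketched in code. The block below is an illustration under stated assumptions rather than a transcription of the paper's equations: it solves (28), (29) and the normalization condition directly, identifies $\gamma_j$ with $\alpha_{j,0}$, and for the fat tree assumes $p_{n} = 1/\sum_{l=0}^{n} m^{l}$. The nonfat closed form in `gamma_nonfat_closed_form` was derived here by induction from that recursion and is checked numerically against it; all function names are hypothetical.

```python
from math import isclose


def fat_multiplier(m: int, n: int) -> float:
    """Assumed p_n = 1 / (1 + m + ... + m**n) for a homogeneous m-ary fat tree."""
    return 1.0 / sum(m ** l for l in range(n + 1))


def gamma_cut_through(m: int, j: int, sigma: float, fat: bool = False) -> float:
    """gamma_j for levels below the top (root node without data storage)."""
    gamma = 1.0                                               # gamma_0 = 1
    for level in range(1, j + 1):
        p_lower = fat_multiplier(m, level - 1) if fat else 1.0  # p_{j-1}
        p_upper = fat_multiplier(m, level) if fat else 1.0      # p_j
        if p_lower * sigma >= gamma:
            raise ValueError("model assumes communication faster than computation")
        q = 1.0 - p_lower * sigma / gamma                     # q_{eq_{j-1}}
        k = (1.0 - p_upper * sigma) / gamma                   # k_{eq_{j-1}}
        gamma = 1.0 / (1.0 + k * (1.0 - q ** m) / (1.0 - q))  # normalization
    return gamma


def gamma_nonfat_closed_form(m: int, j: int, sigma: float) -> float:
    """Closed form implied by the recursion when every p_j = 1."""
    nodes = sum(m ** n for n in range(j + 1))                 # processors in the subtree
    return sigma / (1.0 - (1.0 - sigma) ** nodes)


if __name__ == "__main__":
    for depth in range(4):
        print(depth, gamma_cut_through(2, depth, 0.1),
              gamma_nonfat_closed_form(2, depth, 0.1))
```

In the nonfat case the exponent is simply the number of processors in the level $j$ subtree, so under these assumptions each added processor multiplies the unprocessed remainder by $(1-\sigma)$.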
2) Level k Subtree: Root Node with Data Storage: In this subsection two types of distribution model for the topmost level subtree, level $k$, are discussed. One is sequential distribution, the other is simultaneous distribution. Generally simultaneous distribution requires that a central processing unit (CPU) be fast enough to continually load all output buffers to its children. If the buffer capacity of the parent node cannot satisfy this basic requirement for simultaneous distribution, the sequential distribution model discussed in IIIB2a can be used. The simultaneous model is described in IIIB2b.
2a) Level k subtree using sequential distribution: In this part the expression for the topmost level subtree using sequential distribution is derived. The start strategy used here is simultaneous start.
The timing diagram of level k subtree using sequential distribution is illustrated in Fig. 6 . According to this figure, the fundamental recursive equation can be obtained as follows:
$$\alpha_{k,i-1}\, w_{eq_{k-1}} T_{cp} = \alpha_{k,i}\, w_{eq_{k-1}} T_{cp} + \alpha_{k,i-1}\, p_{k-1} z T_{cm}, \quad i = 2,3,\ldots,m. \qquad (45)$$
The normalization equation for the topmost subtree is
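This gives $m + 1$ linear equations with $m + 1$ unknowns. Then from (44), $\alpha_{k,1} = k_{eq_{k-1}}\,\alpha_{k,0}$. Here, we let
$$k_{eq_{k-1}} = \frac{wT_{cp}}{w_{eq_{k-1}}T_{cp}} = \frac{w}{w_{eq_{k-1}}} = \frac{1}{\gamma_{k-1}}$$
Now from (45)
$$\alpha_{k,i} = q_{eq_{k-1}}\,\alpha_{k,i-1} = (q_{eq_{k-1}})^{i-1}\,\alpha_{k,1}, \quad i = 2,3,\ldots,m.$$
Let
$$q_{eq_{k-1}} = 1 - \frac{w\,p_{k-1} zT_{cm}}{w_{eq_{k-1}}\,wT_{cp}} = 1 - \frac{p_{k-1}\sigma}{\gamma_{k-1}}$$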
According to (48) and (49), the normalization equation becomes
Consequently, the value of ® k,1 can be obtained as follows:
The equivalent finish time T 
According to Definition 1 in Section II and equations (48), (50), and (51), we obtain the expression for $\gamma_k$ as follows:
Then the speedup is
1) For a homogeneous multilevel nonfat tree the bandwidth of each transmission link is the same, that is, $p_j = 1$. Then from (42), setting $j = k-1$ gives $\gamma_{k-1}$.
Hence, (58) can be solved as follows:
This leads to a closed-form solution, and the speedup of the multilevel nonfat tree is
2) For a homogeneous multilevel fat tree, the expressions for the $\gamma$s are as follows: $\gamma_0 = 1$ (63)
where $j = 1,2,\ldots,k-1$ (64)
Consequently, the speedup is expressed as
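As a hedged worked step, the top level result for sequential distribution can be recovered from the relations stated above. The sketch assumes (44) has the form $\alpha_{k,0} wT_{cp} = \alpha_{k,1} w_{eq_{k-1}} T_{cp}$, which is what $k_{eq_{k-1}} = 1/\gamma_{k-1}$ implies, and identifies $\gamma_k$ with $\alpha_{k,0}$; it is a reconstruction, not a quotation of (58) and (59).

```latex
\begin{align*}
\alpha_{k,1} &= \frac{\alpha_{k,0}}{\gamma_{k-1}}, \qquad
\alpha_{k,i} = (q_{eq_{k-1}})^{\,i-1}\alpha_{k,1}, \qquad
q_{eq_{k-1}} = 1 - \frac{p_{k-1}\sigma}{\gamma_{k-1}},\\
1 &= \alpha_{k,0} + \alpha_{k,1}\sum_{i=1}^{m}(q_{eq_{k-1}})^{\,i-1}
   = \alpha_{k,0}\left[1 + \frac{1-(q_{eq_{k-1}})^{m}}{p_{k-1}\sigma}\right],\\
\gamma_k &= \alpha_{k,0}
   = \left[1 + \frac{1-(q_{eq_{k-1}})^{m}}{p_{k-1}\sigma}\right]^{-1},
\qquad
\text{Speedup} = \frac{1}{\gamma_k} = 1 + \frac{1-(q_{eq_{k-1}})^{m}}{p_{k-1}\sigma}.
\end{align*}
```

As a sanity check, a single level tree ($k = 1$, $\gamma_0 = 1$, $p_0 = 1$) gives a speedup of $1 + (1-(1-\sigma)^m)/\sigma$, the expected homogeneous single level form when the root has the same speed as its children.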
2b) Level k subtree using simultaneous distribution: In the topmost level subtree, the root of this level is the topmost root for the entire tree. Since all the data for distribution in the topmost root is already stored in this node, it is not necessary for this node to wait for data to come in from its parent (if the parent exists) under cut through transmission. Therefore, unlike the rest of the levels below level k, in a nonfat tree, the topmost level k can use simultaneous distribution instead of sequential distribution to improve the performance.
Using simultaneous distribution, the timing of the top equivalent subtree, level $k$, is illustrated in Fig. 7. According to Fig. 7, the fundamental recursive equations are
$$\alpha_{k,0}\, w T_{cp} = \alpha_{k,1}\, w_{eq_{k-1}} T_{cp} \qquad (67)$$
$$\alpha_{k,i-1}\, w_{eq_{k-1}} T_{cp} = \alpha_{k,i}\, w_{eq_{k-1}} T_{cp}, \quad i = 2,3,\ldots,m. \qquad (68)$$
In addition, the normalization equation for the single level subtree with intelligent root (that can process load as well as distribute it) is
This gives $m + 1$ linear equations with $m + 1$ unknowns. These equations can be solved recursively in the same manner as was done in the previous sections to obtain
$$k_{eq_{k-1}} = \frac{wT_{cp}}{w_{eq_{k-1}}T_{cp}} = \frac{w}{w_{eq_{k-1}}} = \frac{1}{\gamma_{k-1}}$$
The speedup of this multilevel tree using simultaneous distribution in the topmost level subtree is
1) Consider a homogeneous multilevel nonfat tree, which uses simultaneous distribution in the topmost level subtree but sequential distribution in the levels below the topmost level. (Nonfat tree means that the bandwidth of every transmission link is the same, $p_j = 1$.) Now from (43), we obtain $\gamma_{k-1}$.
Hence the value of $\gamma_k$ can be obtained as follows:
Finally, the speedup of a multilevel nonfat tree using sequential distribution below the topmost level and simultaneous distribution at the topmost level is
2) For a homogeneous multilevel fat tree, the values of the $\gamma$s are as follows: $\gamma_0 = 1$
The speedup is
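A short worked step, offered as a sketch that follows directly from (67), (68) and the normalization condition, with $\gamma_k$ identified with $\alpha_{k,0}$ as an assumption of this sketch:

```latex
\begin{align*}
\alpha_{k,1} = \alpha_{k,2} = \cdots = \alpha_{k,m} &= \frac{\alpha_{k,0}}{\gamma_{k-1}},\\
1 = \alpha_{k,0} + \sum_{i=1}^{m}\alpha_{k,i} &= \alpha_{k,0}\left(1 + \frac{m}{\gamma_{k-1}}\right),\\
\gamma_k = \alpha_{k,0} = \frac{\gamma_{k-1}}{\gamma_{k-1} + m},
\qquad
\text{Speedup} &= \frac{1}{\gamma_k} = 1 + \frac{m}{\gamma_{k-1}}.
\end{align*}
```

Because the link terms drop out of (67) and (68), the top level contributes no communication delay in this model; only $\gamma_{k-1}$, computed for the levels below, carries the effect of $\sigma$ and of the fat tree multipliers.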
C. Speedup Calculation for Heterogeneous Trees
For tractability and to produce recursive expressions, Section IIIA and Section IIIB assumed a homogeneous symmetrical tree. Speedup can be calculated for heterogeneous and nonsymmetrical trees by collapsing single level subtrees, starting from the bottom of the multilevel tree and working upwards, until a single equivalent processor representing the operating characteristics of the entire tree is found [2, 21]. Using this, the calculation of speedup relative to a reference processor is straightforward. Since relatively simple recursive algebraic expressions are not possible, this procedure is best done recursively by a computer program.
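The bottom-up collapsing procedure might be programmed along the following lines. This is a minimal sketch under stated assumptions, not the program of [2, 21]: it assumes sequential cut through distribution with simultaneous start at every level, treats the tree root as a node with data storage and every other subtree root as a node without data storage, and takes each equivalent speed as $w_{eq} = \alpha_0 w$. The `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    w: float                       # inverse computing speed of this processor
    z: float = 0.0                 # inverse speed of the link from its parent
    children: list = field(default_factory=list)  # in distribution order


def collapse(node: Node, t_cp: float = 1.0, t_cm: float = 1.0,
             is_root: bool = True) -> float:
    """Return the equivalent inverse computing speed of the subtree at `node`."""
    if not node.children:
        return node.w
    child_w = [collapse(c, t_cp, t_cm, False) for c in node.children]
    child_z = [c.z for c in node.children]
    # A non-root subtree root first receives its own fraction over its link.
    own_rx = 0.0 if is_root else node.z * t_cm
    first = (node.w * t_cp - own_rx) / (child_w[0] * t_cp)
    if first <= 0:
        raise ValueError("model assumes communication faster than computation")
    ratios = [1.0, first]          # alpha_i / alpha_0 for i = 0, 1
    for i in range(1, len(child_w)):
        slack = child_w[i - 1] * t_cp - child_z[i - 1] * t_cm
        if slack <= 0:
            raise ValueError("model assumes communication faster than computation")
        ratios.append(ratios[-1] * slack / (child_w[i] * t_cp))
    alpha_0 = 1.0 / sum(ratios)    # normalization
    return alpha_0 * node.w        # w_eq = alpha_0 * w (finish time / T_cp)


def speedup(root: Node, t_cp: float = 1.0, t_cm: float = 1.0) -> float:
    """Speedup relative to the root processor working alone."""
    return root.w / collapse(root, t_cp, t_cm, True)


if __name__ == "__main__":
    def leaves():
        return [Node(1.0, 0.1), Node(1.0, 0.1)]
    tree = Node(1.0, children=[Node(1.0, 0.1, leaves()), Node(1.0, 0.1, leaves())])
    print(speedup(tree))
```

The order of each `children` list is the order in which load is distributed, so reordering children generally changes the result for a heterogeneous tree. For the homogeneous binary two level example in the `__main__` block, the sketch agrees with the value $(1 + \sigma - (1-\sigma)^6)/\sigma$ obtained from the same assumptions.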
D. Use of Multi-Installment Scheduling
A known technique for decreasing the time processors wait to receive load under sequential distribution is to distribute load sequentially and periodically in small installments or rounds [7, 18]. Multi-installment scheduling can be used in conjunction with the optimal cut through switching presented here to boost speedup. If communication speeds are faster than computation speeds (the assumption made here), then as installment size shrinks a (saturating) performance improvement results. If communication speeds are relatively slower than computation speeds (as is the case in some wireless networks), "gaps" will result in the timing and the equations presented here will not be valid. In this case, though, the use of store and forward switching will result in excessive store and forward delay. The use of cut through switching, by comparison, will lead to a dramatic performance improvement.
IV. SEQUENTIAL DISTRIBUTION USING STORE AND FORWARD SWITCHING
Under store and forward switching, a node must completely receive the load for itself and its descendants before beginning to compute and distribute load to its children.
Again, for the purposes of determining the optimal load allocations, the single level trees within the overall multilevel tree are divided into the root single level subtree (level $k$) and the single level subtrees below the root subtree (levels $1, 2, \ldots, k-1$). It is assumed that all data is available (and stored) at the root at $t = 0$. Thus the root can immediately deliver load to its children at level $k$. This is the root node with data storage case.
For the other single level trees, it is assumed that its load must be completely received by a single level tree root before being distributed to its children. After this, load is relayed through the root to its children in store and forward mode. This is the root node without data storage case.
In both cases, only the simultaneous start strategy is considered. In the simultaneous start strategy each processor begins processing the received data while it continues to receive the data. In the following both cases are examined, first in the context of single level trees in isolation and then in the context of multilevel trees.
A. Processors with Sequential Distribution. Homogeneous Multilevel Fat Tree Analysis
A fat tree architecture is now considered where upper links have more capacity than lower links in such a way that each node has bandwidth $1/z$ to the root.
We proceed by aggregating single level subtrees into equivalent processors, starting from the bottom of the tree and working upwards [21] . The tree's bottommost single level subtrees are at level 1, and the tree's topmost (including the root) subtree is at level k.
Consider a homogeneous multilevel fat tree network where all processors have the same inverse computing speed $w$, and all links of level $j$ have the same inverse transmission speed $p_{j-1}z$. We use the same tree labeling as in Fig. 2. The value of $p_{j-1}$ is defined in Definition 3 from Section II, that is,
Again, the process for the load distribution of a multilevel fat tree network using store and forward switching for computing and communicating from upper level to lower level can be represented by Gantt chart-like timing diagrams. We derive the speedup of the whole multilevel tree by successively collapsing single level trees into equivalent nodes until the entire tree is collapsed into an equivalent node. We first use the root without data storage model for levels $j = 1,2,\ldots,k-1$, and then use the root with data storage model for the top level (level $k$).
B. Level j Subtree. Root Node without Data Storage
The Gantt chart-like timing diagram for the jth level subtree is illustrated in Fig. 8. According to Fig. 8, the fundamental recursive equations of the jth level tree network are
$$\alpha_{j,0}\, w T_{cp} = \alpha_{j,1}\, w_{eq_{j-1}} T_{cp} + 1 \times p_j z T_{cm} \qquad (80)$$
$$\alpha_{j,i-1}\, w_{eq_{j-1}} T_{cp} = \alpha_{j,i}\, w_{eq_{j-1}} T_{cp} + \alpha_{j,i-1}\, p_{j-1} z T_{cm}, \quad i = 2,3,\ldots,m. \qquad (81)$$
Here as we move up the tree, collapsing single level trees into equivalent processors, the single level trees consist of a root with inverse speed w and children nodes of inverse speed w eq j¡1 . The normalization equation for the jth single level tree with intelligent root is
This yields m + 1 linear equations with m + 1 unknowns. Now using (80),
where $k_{eq_{j-1}} = w/w_{eq_{j-1}} = 1/\gamma_{j-1}$ and $\sigma = zT_{cm}/(wT_{cp})$. (84)
Now from (81),
$$\alpha_{j,i} = \frac{w_{eq_{j-1}}T_{cp} - p_{j-1}zT_{cm}}{w_{eq_{j-1}}T_{cp}}\,\alpha_{j,i-1} = q_{eq_{j-1}}\,\alpha_{j,i-1}$$
where $i = 2,3,\ldots,m$. Naturally $w_{eq_{j-1}}T_{cp} > p_{j-1}zT_{cm}$, as communication is assumed to be faster than computation. We note that
$$q_{eq_{j-1}} = \frac{w_{eq_{j-1}}T_{cp} - p_{j-1}zT_{cm}}{w_{eq_{j-1}}T_{cp}} = 1 - \frac{p_{j-1}zT_{cm}}{w_{eq_{j-1}}T_{cp}} \times \frac{w}{w} = 1 - \frac{p_{j-1}\sigma}{\gamma_{j-1}}$$
Consequently,
According to (83) and (85), the normalization equation (82) becomes
Finally, one obtains the value of ® j,1 :
Proceeding as in the previous section, one can find $\gamma_j$, the inverse of speedup, as
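To make the difference between the two switching modes concrete, the following sketch iterates both recursions for a nonfat tree ($p_j = 1$). It is an illustration under assumptions, not the paper's formula: the store and forward update is obtained here by solving (80), (81) and the normalization condition with $\gamma_j$ identified with $\alpha_{j,0}$, and the cut through update is obtained the same way from (28) and (29). The function name is hypothetical.

```python
def gamma_nonfat(m: int, j: int, sigma: float, store_and_forward: bool) -> float:
    """gamma_j for levels below the top of a homogeneous nonfat m-ary tree."""
    gamma = 1.0                              # gamma_0 = 1 (a single processor)
    for _ in range(j):
        if sigma >= gamma:
            raise ValueError("model assumes communication faster than computation")
        q = 1.0 - sigma / gamma              # q_{eq_{j-1}} with p_{j-1} = 1
        a = (1.0 - q ** m) / sigma           # (1 - q^m) / (p_{j-1} sigma)
        if store_and_forward:
            # From (80): alpha_{j,1} = (alpha_{j,0} - sigma) / gamma_{j-1}
            gamma = (1.0 + sigma * a) / (1.0 + a)
        else:
            # From (28): alpha_{j,1} = alpha_{j,0} (1 - sigma) / gamma_{j-1}
            gamma = 1.0 / (1.0 + (1.0 - sigma) * a)
    return gamma


if __name__ == "__main__":
    for depth in range(1, 5):
        ct = gamma_nonfat(2, depth, 0.1, store_and_forward=False)
        sf = gamma_nonfat(2, depth, 0.1, store_and_forward=True)
        print(depth, 1.0 / ct, 1.0 / sf)     # equivalent-node speedups
```

Under these assumptions the store and forward value of $\gamma_j$ is never smaller than the cut through value at the same depth, so the equivalent node it produces is slower at every level.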
C. Level k Subtree. Root Node with Data Storage
In this subsection two types of distribution models for the topmost level subtree, level $k$, are discussed. One is sequential distribution, the other is simultaneous distribution. Generally simultaneous distribution requires that a CPU be fast enough to continually load all output buffers to its children. Given the specification above, the timing diagrams and recursive formulae for speedup are the same as in Section IIIB2.
1) Level k Subtree Using Sequential Distribution: The timing diagram of the level $k$ subtree using sequential distribution is the same as that illustrated in Fig. 6. According to Fig. 6, the solutions for $\gamma_k$ and speedup are obtained as in (58) and (59), as follows:
2) Level k Subtree Using Simultaneous Distribution: The timing diagram of the level $k$ subtree using simultaneous distribution is the same as illustrated in 
V. SUMMARY AND PERFORMANCE EVALUATION
For cut through switching and store and forward switching the recursive speedup formulae are developed as above and summarized as follows.
A. Homogeneous Multilevel Fat Tree
In this part, we summarize the recursive formulae for a multilevel fat tree using sequential distribution under cut through switching and store and forward switching.
For level $j$: $\gamma_0 = 1$
$\gamma_j$ (store and forward) =
(96) where $j = 1,2,\ldots,k-1$.
1) If the distribution at level $k$ is sequential,
The speedup is 
B. Homogeneous Multilevel Nonfat Tree
The homogeneous multilevel nonfat tree models using cut through switching and store and forward switching under sequential distribution and simultaneous start assume that the bandwidth of every transmission link is the same, $p_j = 1$. This is a special case of the homogeneous multilevel fat tree. The formulae for the tree using cut through switching can be obtained in closed form. The following formulae apply only to the model using cut through switching. $\gamma_0 = 1$
1) If the distribution at level $k$ is sequential,
2) If the distribution at level $k$ is simultaneous,
Fig. 9. Ratio of speedup for multilevel tree models using simultaneous start or staggered start to that for an ideal model.
C. Performance Evaluation
According to the recursive equations (95), (96), (97), and (99) for the fat tree model, and setting $p_j = 1$ (where $j = 0,1,\ldots,k-1$) for the nonfat tree model, we obtain the ratio of the speedup for these eight cases to the speedup of the ideal model and illustrate the result in Fig. 9. The ideal model has extremely fast communication time.
As shown in Fig. 9, the ratio of the speedup of the store and forward models for the fat tree and nonfat tree networks to that of the ideal model approaches zero very quickly as the number of tree levels is increased. This means that even if the store and forward model uses a fat tree network, the speedup saturates quickly under sequential distribution. The cut through model has the best performance with a fat tree network. Even with a nonfat tree network, the cut through model performs better than a store and forward model with a fat tree network.
VI. CONCLUSION
The most important results of this paper are simple recursive solutions for speedup and solution time for a divisible load optimally scheduled on a multilevel tree with virtual cut through switching. This is done for a variety of scheduling features under a number of scenarios. This work is more general than the exact situations discussed here and the methodology can be applied to a wide variety of load distribution scheduling policies.
Aerospace applications will certainly see the increasing use of multiple sensor/multiple processor systems. In such systems the ability to do processing in a solution time optimal manner decreases response time and minimizes the amount of hardware necessary to accomplish a task. With the increasing ubiquity and decreasing cost of such systems, this tractable performance evaluation approach should be of interest.
ACKNOWLEDGMENTS
The assistance of M. Moges in preparing this article is appreciated.
Jui Tsun Hung was a lecturer in the Wu Feng Institute of Technology and Commerce, Taiwan, from 1991 to 1997. In 1998, he joined Golden Circuit Electronics Corporation, Taoyuan, Taiwan, and worked on manufacturing printed circuit boards. He joined Wintek Corporation, Taichung, Taiwan, in 1999, as an R&D electrical engineer, where he was primarily engaged in LCD driver circuit modular design. He is currently a senior engineer in Memes Technology Corporation, Nangkang Software Park, Taipei, Taiwan, where his research interests are RF/microwave circuit design for wireless applications and scheduling theory for general tree models.
Dr. Robertazzi is presently a professor in the Department of Electrical and Computer Engineering at Stony Brook University, Stony Brook NY. In supervising a very active research area, he has published extensively in the areas of parallel processor and grid scheduling, ad hoc radio networks, telecommunications network planning, ATM switching, queueing, and Petri networks. Dr. Robertazzi has authored, coauthored or edited four books in the areas of performance evaluation, scheduling and network planning.
