Abstract-The performance characteristics of banyan and crossbar communications networks are compared in a VLSI environment, where it is assumed that the entire network resides on a single VLSI chip and operates in a circuit switched mode. A high-level model of the space (area) and time (delay) requirements for these networks is developed and relative performance comparisons are made based on a space-time product measure. The results differ significantly from those obtained with more traditional analyses which are usually based on switch aggregate comparisons and SSI-based delay calculations. The analysis presented shows that the area required by both networks grows as $O(N^2)$. Time delay grows as $O(N)$ for the crossbar, and approximately $O[N^a (\log_2 N)^2]$ for the banyan where 0 < a < 1. This contrasts with traditional results which yield $O(N \log N)$ and $O(\log N)$ switch and delay growth for banyan networks.
I. INTRODUCTION
In recent years there has been increasing interest in tightly coupled multiprocessor computer systems. This has been due both to the enhanced performance possibilities for such systems and the steady decrease in hardware costs associated with these systems. One central issue in their design relates to the network over which the multiple processors communicate. Clearly, as the number of processors increases, the characteristics of this communications network can become critical to overall system performance and cost. This has led to a wide variety of studies aimed at characterizing and quantifying the performance of such networks [1]-[12]. This paper focuses on two network types: the crossbar [13]-[15] and the banyan [16], and compares their performance characteristics in a VLSI [17] environment, where it is assumed that the entire communications network resides on a single VLSI chip. Both networks are also assumed to operate in a circuit switched mode; that is, an entire designated path through the network is held during transmission of information between input and output ports. This path is held for the length of the transaction and then released.
The comparison is based on a performance measure which includes both the chip implementation area required for a network and the time delay imposed by the network. Thus, both space and time are included in the performance criterion. Note that previous work on network evaluation [12] considered bandwidth and number of switches in the network as the principal measures of interest. This makes sense in an environment where wires and connection paths are of negligible cost compared to switching element costs. The situation changes, however, when network and switch implementation takes place on a single chip. In such an environment, connection paths may use substantial amounts of the chip area, thus reducing the area available to the switch elements themselves, and thereby reducing the size of a network that can be implemented on a given chip. The time delay associated with the paths may also contribute to the overall delay in a nonnegligible manner.

Manuscript received June 6, 1980; revised October 11, 1980. This work was supported in part by the National Science Foundation under Grant MCS 78-20731 and NCHSR Grant HS03792. The author is with the Center for Computer Systems Design, Washington University, St. Louis, MO 63130.
The remainder of the paper explores these ideas more fully. The second section reviews the general characteristics of the crossbar and banyan networks. The third section presents a VLSI implementation model. Expressions for the chip area required by each network are developed as a function of the network size and other key parameters. The fourth section considers the time delay properties of the networks, and additional expressions are presented for network delay, again as a function of network size and certain key parameters. An overall space-time performance measure is derived in Section V, and performance curves are presented and discussed. The final section contains a summary and the conclusions of the paper.
The principal contributions of the paper relate first to the development of a VLSI oriented model and methodology for comparing communications networks and second, to some particular conclusions associated with a banyan/crossbar network comparison. With regard to the latter, the most interesting conclusion is that, using a space-time product performance measure, the differences between banyan and crossbar networks are significantly smaller than predicted by traditional analysis, which is usually based on simple switch count considerations. Both networks are shown to grow in space as $O(N^2)$. Time delay is shown to grow as $O(N)$ for the crossbar and approximately $O[N^a (\log_2 N)^2]$ for the banyan, where 0 < a < 1. This contrasts with traditional results which yield $O(N \log N)$ and $O(\log N)$ switch and delay growth for banyan networks. For relatively small values of N (i.e., N < 64), the two networks are shown to be comparable when a space-time product measure is used.
II. CROSSBAR AND BANYAN NETWORKS: GENERAL CHARACTERISTICS
Figs. 1 and 2 show the overall structure of eight input/eight output crossbar and banyan networks, respectively. The switch connection patterns permitted in the networks are also illustrated. The figures shown represent a typical multiprocessor configuration, as in the B-5500 [18] or C.mmp [19], with the P and M notation standing for processor and memory subsystems. Routing of messages through the network can be performed on the basis of an address associated with each output node. Certain key properties of these networks can be identified.
1) Full Interconnection Capability: Both networks permit any single input/output connection to be made by placing the appropriate switches in the proper positions.
2) Blocking/Nonblocking Capabilities: In the crossbar network, as long as each input port addresses a unique output port, no blocking occurs in the network. That is, all messages can be routed through the network simultaneously. In the banyan network this is not the case, and under certain conditions messages going to different output ports will require use of the same path between two switches. Since under the assumed protocol only one message can hold a given path during message transmission, blocking will occur. This blocking reduces the bandwidth of the banyan network and introduces added delays in system operation. Note that by addressing unique output ports, the analysis emphasizes the blocking characteristics inherent in the network, excluding blocking effects due to output port conflicts. Fig. 3 illustrates the probability of message blocking PN as a function of network size N. The network size is defined as the number of inputs (or outputs) in an N input/N output (square) network. It is assumed that all the messages enter the network at the same time, address unique output ports, and randomly win path access in the case of path contention. The results were obtained using simulation methods and demonstrate how the blocking probability increases with the network size. The form of this
A least squares curve fit for the ten networks of size $N = 2^i$,
5) Delay Costs (D'): Given that a path through a network is available, traditional SSI based analysis assumes that delay through the network is primarily proportional to the average number of switches through which the message passes. If output ports are uniformly addressed, then the average number of switches a message will pass through in a crossbar network (D'CB) of size N is N, while for a banyan network this average number is $\log_2 N$.
In the banyan case, however, there is the possibility that the path will not be available (i.e., the message will be blocked), and this must be accounted for in determining the average message delay (DBA). Assume messages are of equal length and begin and end simultaneously. Also, assume a saturated system where messages are always available to be sent at each input port, and a message retry protocol where blocked messages reenter the system with the next message batch. Under these conditions, if PN is the probability of a message being blocked in a network of size N, then the average delay through a banyan network is given by
6) Performance Measure: One simple overall performance measure for a network is the product of component and average delay costs. For the crossbar and banyan cases these are given below.
2(1 - PN) N j
The remainder of this paper considers how this performance measure changes when one moves from an SSI environment to a VLSI environment where the entire communications network is implemented on a single chip. In particular, one must now consider the required chip area for a given network as an important performance component, rather than the number of switches per se. Furthermore, there must be some consideration of area related parameters such as network layout and topology. These parameters have generally been of low importance in traditional SSI-based network comparisons. In terms of network delay, added consideration must now be given to the connection path contributions to overall delay. Finally, the close relationship between layout, area, and delay must be quantified.
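The traditional measure outlined in items 2), 5), and 6) can be illustrated numerically. The sketch below is not the simulator behind Fig. 3. It assumes an omega-style (shuffle-exchange) banyan with destination-tag routing, which is only one possible banyan topology, and it resolves each link conflict by picking a random winner. It further assumes switch counts of N^2 for the crossbar and (N/2) log2 N for the banyan, consistent with the switch ratio (log2 N)/2N quoted in Section III, and it treats retries as independent attempts, so that the expected number of passes is 1/(1 - PN). All function names are hypothetical.

```python
import math
import random


def blocking_probability(n_ports, trials=2000, seed=0):
    """Estimate the fraction of messages blocked in one batch.

    Assumed model (not taken from the paper): omega-style banyan, one
    message per input, destinations form a random permutation, and when
    several messages need the same inter-stage link a random one wins
    and the others are dropped for this batch.
    """
    stages = n_ports.bit_length() - 1      # log2(N) for N a power of two
    mask = n_ports - 1
    rng = random.Random(seed)
    blocked = 0
    for _ in range(trials):
        dests = list(range(n_ports))
        rng.shuffle(dests)
        alive = list(range(n_ports))       # sources still holding a path
        for k in range(1, stages + 1):
            rng.shuffle(alive)             # random winner per contested link
            claimed = {}
            for s in alive:
                # Link occupied after k stages of destination-tag routing.
                link = ((s << k) | (dests[s] >> (stages - k))) & mask
                claimed.setdefault(link, s)  # first claimant keeps the link
            alive = list(claimed.values())
        blocked += n_ports - len(alive)
    return blocked / (trials * n_ports)


def banyan_delay(n_ports, p_block):
    """Average switch-count delay with retries: log2(N) / (1 - P_N)."""
    return math.log2(n_ports) / (1.0 - p_block)


def traditional_cost_ratio(n_ports, p_block):
    """Banyan/crossbar ratio of (switch count) x (average delay)."""
    crossbar = (n_ports ** 2) * n_ports    # N^2 switches, average delay N
    banyan = (n_ports / 2) * math.log2(n_ports) * banyan_delay(n_ports, p_block)
    return banyan / crossbar


if __name__ == "__main__":
    for n in (4, 16, 64):
        p = blocking_probability(n)
        print(n, round(p, 3), traditional_cost_ratio(n, p))
```

Under these assumptions the ratio shrinks roughly as (log2 N)^2 / N^2, which is the behavior the VLSI model in the following sections calls into question.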
III. A VLSI AREA MODEL
In order to develop a VLSI area model without performing detailed network design and layout, certain assumptions must be made and general parameters defined. Some of these are design related, while others are technology related. We begin with those which are design related and apply to both banyan and crossbar networks. The objective initially is to develop equations for the area taken by a single switch.
Consider first the geometric shape of the individual switches which make up the network. While the actual shape of a switch on the chip may be difficult to describe since it comprises a group of individual components connected together in a particular manner, it is convenient to assume each switch fits into a square of area A, whose side is of length L. Because of the nature of the networks, and the assumption of comparable switch complexity in banyan and crossbar networks, all switches are taken to be of equal size.
The logic associated with the particular switch design can be roughly divided into two parts: one part is associated with data paths (i.e., registers, multiplexers, etc.), while the other is associated with control of these data paths. This control logic for instance would include facilities for local control of the switch position. These parts can be considered to occupy areas Ad and Ac, respectively. The size of the data portion of the logic is directly related to the number of communication lines associated with each path. This is denoted by w and is referred to as the path width. This path width, along with other parameters, will directly affect both network bandwidth and chip pin requirements. The latter point will be discussed in a later section. Under many design circumstances it is reasonable to assume that the control portion of the logic is independent of the path width.
A key design parameter y can now be defined as the ratio of Ac to Ad given a path width equal to one:

$y = A_c / A_d \quad (w = 1)$. (6)
Based on the experience with an SSI crossbar switch design [14], [15], a y on the order of 10 or 20 seems reasonable. This, however, is a design parameter and may vary widely depending on the control procedures used and the resulting logic design. The next general assumption relates to the orientation of paths entering and leaving each square switch. It is assumed that one path of width w is associated with each side of the square. While this is clearly a "natural" layout assumption for a crossbar network, some juggling of alternatives indicates it is also reasonable in the banyan case. Note that except for certain switches on the periphery of the network, the others have two entering and two leaving paths.
A second parameter K can now be defined as the ratio of the data area Ad to the minimum data area required for a path width equal to w. This minimum area is related to the feature size X which is determined by the particular technology and fabrication process used. Considering the basic unit of length to be the feature size X, for connections laid out in metalization layers, Mead and Conway [17] recommend a minimum line width and minimum distance between adjacent lines of 3 units each. A path containing w parallel communications lines would thus be about 6w units wide. Since each switch side is assumed to have w connecting lines, a lower bound on the data portion of the switch area is $(6w)^2$. K may therefore be defined as

$K = A_d / (6w)^2 \quad (K > 1)$. (7)

K represents the increased area in Ad over the minimum required to support a w wide path. Thus, for w = 1, Ad = 36K and from (6) one obtains Ac = 36Ky.
Substituting for Ac, and from (7) for Ad, in the relationship A = Ac + Ad we now obtain an expression for the switch area in terms of the parameters y, w, and K.
The length of a switch side L is therefore
This concludes the development of a set of general assumptions and definitions related to the switch area. In the following sections these are incorporated in overall topological models for the crossbar and banyan networks.

The banyan VLSI area model is more difficult to derive because, unlike the crossbar, paths between switches cross each other on a single plane, and the area implications of these crossovers are a function of both circuit layout [20] and fabrication technology (e.g., number of connect layers assumed). Determining the "optimal" layout for a given circuit is generally an unsolved problem except for certain simple cases [21]. Due to this, the approach here is to develop area expressions based on "reasonable" layout assumptions. Constraints of analytical simplicity tend to bias the results toward the banyan network; however, indications are that the bias is not a major one.
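The switch-area definitions given earlier in this section can be combined in a few lines. The sketch below assumes only what the text states: Ad = K(6w)^2 from the definition of K, and Ac = 36Ky fixed at its w = 1 value because the control logic is taken to be independent of path width. Under those assumptions A = 36K(y + w^2) and L is its square root. The function names and the sample parameter values are hypothetical.

```python
import math

MIN_PITCH = 6  # feature sizes per line: 3 for the wire plus 3 for the spacing


def switch_area(y, w, k):
    """Switch area A = A_c + A_d, in squared feature sizes.

    A_d = K * (6w)^2 follows from the definition of K, and
    A_c = 36 * K * y follows from the definition of y at w = 1,
    with A_c assumed independent of the path width w.
    """
    data_area = k * (MIN_PITCH * w) ** 2
    control_area = k * MIN_PITCH ** 2 * y
    return control_area + data_area


def switch_side(y, w, k):
    """Side length L of the square switch, in feature sizes."""
    return math.sqrt(switch_area(y, w, k))


if __name__ == "__main__":
    # Illustrative values only: y = 10, K = 1, path widths 1 through 8.
    for w in (1, 2, 4, 8):
        print(w, switch_area(10, w, 1), round(switch_side(10, w, 1), 2))
```

With y on the order of 10 to 20, the control area dominates for narrow paths, and the data area takes over once w^2 exceeds y.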
With regard to the fabrication technology, the model assumes two layers of metal interconnect, where one layer provides for all horizontal paths and the other layer provides for all vertical paths. This can be extended to a three layer model. Further extensions to larger numbers of layers are possible, but more difficult to quantify. In addition, it is possible to lay out some of the connections in the nonmetalization layers present. This will change the line width and spacing somewhat, but will not change the analysis significantly. For simplicity it is assumed that connections between connect layers (i.e., z direction) take negligible area.
To obtain the banyan network area, consider first the horizontal length of the network. Two possible layouts for a portion of the network are illustrated in Fig. 4(b) and (c). The layout of Fig. 4(b) produces the minimum horizontal length and that length will be used in the analysis that follows. Note that this layout will actually be possible only if the vertical spacing occupied by the 2w lines between two switches is less than the switch side length. A more detailed analysis can take this constraint into account. However, the effects on space-time product are not significant in most cases of interest and will not be pursued here.
With a network of size N, we obtain the following expression for the minimum horizontal lengths:
The first term in (10) represents the length contribution due to the switches themselves, while the second term is due to the separation between the switches required by the interconnect paths.
A similar approach can be taken for the vertical length. Here, however, the distances between switch rows will increase as one moves from one switch row to the next (one level to the next). The reason for this is that the number of path crossovers between succeeding rows increases as one moves from the input ports of the network to the output ports. The result is that the number of adjacent horizontal paths which must be routed increases with each level. Since these paths reside on the same connect layer and must be separated by a minimum distance of 3 units, the vertical distance between the switches increases correspondingly. The minimum vertical distance between switch rows for the first level is 3 as shown in Fig. 4(d) [17]. In this case, however, the area associated with the drivers will increase from level to level. This second approach, which minimizes delay, is considered here.
In order to determine the area contribution of the drivers, it is first necessary to develop an expression for the level to level line lengths. 
The first term represents the added vertical contribution to the switch due to the drivers and the second term (i.e., L) is the contribution due to the basic square switch itself. The overall vertical length for a banyan network is thus given by

$L_{BAV} = L_{BAVL} + L_{BAVS}$. (17)
The banyan area is thus $A_{BA} = L_{BAH} L_{BAV}$. To compare the space requirements for the two networks consider the ratio ABA/ACB. The ratio can be expressed as a function of the design parameters y, K, w, v, and AV. Fig. 7 shows this ratio as a function of the network size N and the path width w. Notice that as the path width increases, the area taken by the banyan increases. This is to be expected since, due to crossover considerations in the banyan case, the path width strongly affects row separation. Although not given here, it should be noted that the area ratio reaches an asymptotic value as N increases. This is an entirely different result from that obtained from a traditional cost analysis based on a ratio of the number of switches in each network [i.e., $(\log_2 N)/2N$]. This traditional approach always shows the banyan being less costly than the crossbar, with the ratio of the costs asymptotically approaching zero for large N. The finite nonzero limit arises from the fact that although the number of switches in the banyan network increases as $O(N \log_2 N)$, the interconnect area increases as $O(N^2)$.
IV. A VLSI TIME DELAY MODEL
One approach to determining delay time through a network is to divide this time into two parts. The first relates to the delay time associated with logic within the switch. The second relates to the delay time encountered in driving the interswitch paths. The sum of these times over an average path constitutes the average delay for a network.
An exact expression for the delay through the switch logic elements requires a detailed knowledge of the switch design. An approximate expression, however, can be obtained by specifying the number of logic levels (m) a signal passes through in a switch and by making some specific assumptions [17] as

$D_{NOR} = f(1 + q)\tau$ (19)

where f is the gate fan-out, $\tau$ is the transit time through the active region of an MOS transistor, and q is the ratio of the pull-up to pull-down transistor impedances.

Note that using the Mead and Conway design rules [17], a minimum size driver would be a square with each side being 2 feature sizes in length. The delay associated with driving such lines is given by

a is the ratio of the interswitch metalization path capacitance per unit area Cw to the transistor gate capacitance per unit area Cg. p is the length of the interswitch connect path to be driven.
In the second case the lines being driven are long when compared to the feature size, and thus the line capacitance is large when compared to the gate capacitance of a minimum size driver. To drive these long paths in minimum time a sequence of driver stages is assumed, where the first driver stage is of minimum size, each successive driver is g times larger than the previous driver, and Nd drivers are present. If Nd is given by

Mead and Conway show that the delay time is given by

$D_{DR} \approx \tau (g / \log_2 g) \log_2 (0.75ap)$ (22)

and that DDR is minimized when g = e, the base of natural logarithms. The delay is near minimum when g = 4 and this value will be used for simplicity. Thus,

$D_{DR} \approx 2\tau \log_2 (0.75ap)$. (23)

Note that the 0.75 factor arises from the per unit area definitions of the capacitance, the metalization wire width of 3, and the minimum driver area of 4.
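The choice of g = 4 in place of the optimum g = e can be checked directly from the form of (22). The sketch below evaluates the multiplier g / log2 g and the resulting driver delay; the function names and the sample values of a and p are hypothetical, and delay is expressed in units of the transit time.

```python
import math


def stage_factor(g):
    """Delay multiplier g / log2(g) for a stage-to-stage size ratio g > 1."""
    return g / math.log2(g)


def driver_delay(a, p, tau=1.0, g=4.0):
    """Staged-driver delay tau * (g / log2 g) * log2(0.75 * a * p).

    a   : wire-to-gate capacitance ratio per unit area
    p   : path length in feature sizes (assumed long, so 0.75*a*p > 1)
    tau : transit time; the default of 1.0 gives delay in units of tau
    """
    load_ratio = 0.75 * a * p
    if load_ratio <= 1.0:
        raise ValueError("staged drivers only apply to long paths")
    return tau * stage_factor(g) * math.log2(load_ratio)


if __name__ == "__main__":
    for g in (2.0, math.e, 3.0, 4.0, 8.0):
        print(round(g, 3), round(stage_factor(g), 4))
    # Illustrative values only: a = 0.1 and a 1000-unit path.
    print(driver_delay(a=0.1, p=1000))
```

The multiplier is exactly 2 at g = 4 and only slightly lower at g = e, which is why the simpler value costs little.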
A. A Crossbar Delay Model
As indicated earlier, the average number of switches in a crossbar path is N. Furthermore, the crossbar layout discussed in Section III-A indicated a switch separation equal to several feature sizes (i.e., p = 3). Substituting in (21) and adding in the switch delays expressed in (20) results in an overall delay given by

$D_{CB} = 2.5Nmf\tau + (N - 1)\tau(1 + 2.25a)$. (24)

Note the average path in an N × N crossbar contains N switches and N - 1 interswitch paths.
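Equation (24) is simple enough to evaluate directly. The sketch below is a literal transcription, with delay in units of the transit time by default. The coefficient 2.25 equals 0.75 × 3, which matches the p = 3 separation, though that reading is an inference here. The function name and the sample parameter values are hypothetical.

```python
def crossbar_delay(n, m, f, a, tau=1.0):
    """Average crossbar delay from (24).

    n   : network size (N inputs, N outputs)
    m   : logic levels per switch
    f   : gate fan-out
    a   : wire-to-gate capacitance ratio per unit area
    tau : transit time; the default of 1.0 gives delay in units of tau
    """
    switch_delay = 2.5 * n * m * f * tau           # N switches on the average path
    path_delay = (n - 1) * tau * (1.0 + 2.25 * a)  # N - 1 short interswitch paths
    return switch_delay + path_delay


if __name__ == "__main__":
    # Illustrative values only: m = 2 logic levels, fan-out 3, a = 0.4.
    for n in (8, 16, 32, 64):
        print(n, crossbar_delay(n, m=2, f=3, a=0.4))
```

Both terms are linear in N, which is the O(N) crossbar delay growth used in the comparison that follows.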
B. A Banyan Delay Model
An equivalent analysis must now be done for the banyan case. Two complicating considerations enter here, however, which were not present in the crossbar case. First, the path lengths between switches vary both from level to level and within a given level. Second, the probability of path blocking must be taken into account.
This second point can be handled by directly following the same assumptions which led to (2). Thus, if PN is the probability of a message being blocked in a network of size N, and DBAS is the delay associated with a message passing through the network once, then the overall average banyan delay is

The first term represents the contribution due to the $\log_2 N$ rows of switch delays, while the second term (under the summation sign) represents the contribution due to the $\log_2 (N/2)$ interrow path delays. This second term is an average over the type 1 and type 2 paths.
C. Banyan/Crossbar Delay Comparisons
Corresponding to the area ratio discussed in Section III-C, the delay ratio DBA/DCB can now be formed using (24) and (26). Note that this ratio is independent of $\tau$, the transit time. Fig. 8 shows this ratio as a function of network size N and path width w. The curves indicate that as w increases, the delay associated with the banyan increases. This is due to the greater interrow distances which occur in the banyan case as the path width grows. Such growth does not occur in the crossbar case.
The curves also show that the banyan delays get relatively smaller as N increases. This results from the $O(N)$ delay growth in the crossbar case and the slower delay growth in the banyan case. The slower delay growth in the banyan case results from three factors. The first is the delay through the switches themselves, which grows as $O(\log N)$. The second is the interrow staged driver delays, which grow as $O[(\log_2 N)^2]$. Since these two factors are additive, the term $O[(\log_2 N)^2]$ dominates. The third factor, which is multiplicative, is the blocking probability, which increases delay as $O(1/(1 - P_N))$. Using the function given in (1), the overall banyan delay growth is $O[N^a (\log_2 N)^2]$, where a is approximately 0.19. For large N, this last term is asymptotically smaller than N, and hence from a delay standpoint the banyan network will be better than the crossbar for large N. Note, however, that the crossbar is still relatively much better than predicted by traditional analysis, which shows the banyan delay growth as $O(\log N)$.
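The asymptotic claim can be illustrated by comparing the two leading-order growth terms. The sketch below drops all constants, so it shows only the trend of N^a (log2 N)^2 against N and says nothing about where the actual delay curves of Fig. 8 cross. The exponent 0.19 is the value quoted above; the function names are hypothetical.

```python
import math

ALPHA = 0.19  # exponent quoted in the text for the blocking contribution


def banyan_growth(n, alpha=ALPHA):
    """Leading-order banyan delay growth N^a * (log2 N)^2, constants dropped."""
    return n ** alpha * math.log2(n) ** 2


def growth_ratio(n, alpha=ALPHA):
    """Banyan growth term divided by the crossbar's O(N) growth term."""
    return banyan_growth(n, alpha) / n


if __name__ == "__main__":
    for exponent in (4, 8, 12, 16, 20):
        n = 2 ** exponent
        print(n, growth_ratio(n))
```

The printed ratio falls steadily as N grows, consistent with the banyan growth term being asymptotically smaller than N.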
V. NETWORK SPACE-TIME COMPARISON
Given that banyan and crossbar network area (space) and delay (time) characteristics have been derived, the space-time product can be formed using (9), (18)
From this, the ratio CBA/CCB can be obtained. This ratio is plotted in Fig. 9 along with the equivalent ratio as obtained from a traditional analysis (5). The VLSI analysis shows much less difference in performance than the traditional analysis.
As an example, consider the SSI-based crossbar switch discussed in [14] and [15].

The area model, based on general geometric chip layout considerations, was derived from the topological properties of the networks and several layout assumptions. One assumption was that only two connect layers were present, one of them for vertical and the other for horizontal paths. Use of more interconnect layers will reduce the banyan area, though at a significant fabrication cost. The analysis presented may be extended to these cases.
The time model was based on determining the average path of a message through the network. The average number of switches and the average path length were calculated. These results were combined with some simple time delay models based on an NMOS implementation, a typical NOR gate component, factors such as logic fan-out and number of levels, and a sequence of staged drivers to drive interswitch paths. Expressions for the average delay through the network were presented.
The area and time models were used to evaluate an overall space-time performance measure for the networks. The difference between the two networks using this measure is much less than predicted by traditional SSI-based analysis and for

puter element. In this case the "size" of the algorithm will generally be much larger; the unit of execution can be regarded as a "process" in the operating system sense. Here, asynchronous system operation is an option.
Tightly coupled multiprocessor systems, where the processors share a common high performance random access memory, suffer from connection complexity that is costly and thus tends to offset the favorable economics of microprocessors [9], [11], [12]. In multicomputer systems, processor-memory units are interconnected in a network and computation-enabling data are passed using a message passing protocol. Although the interconnection is much simpler, messages may pass through a number of network nodes before reaching their destinations. The message processing load and message delay can offset the advantage of simple connectivity.
When considering such multicomputer systems, regular networks of low degree having small maximum message path lengths (or graph diameters) are of interest. The regularity permits a local network to be implemented, or extended, using a "standard" building block. Given the pin limitation on microcomputers, a relatively small number of bidirectional buses at a node is a natural limitation. Clearly, nodes with two buses (degree 2) limit the regular network to cycles. With 2n + 1 nodes in a cycle the maximum message path length is n [2], [10], [13]. This diameter is acceptable if n is relatively small, but even for several tens of computers path lengths become too large. There are many families of regular graphs of degree 3 [1], [3]-[8], and it is, of course, possible to greatly reduce the graph diameter for a given number of nodes in comparison to the degree 2 case, or cycle. This paper is concerned with a particular family of degree 3 graphs called Chordal Rings, which can be formed by adding chords to a simple cycle, or ring.
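The cycle diameter quoted above can be checked with a short breadth-first search. The sketch below covers only the degree 2 case (a plain ring); it does not model chords, and the function name is hypothetical.

```python
from collections import deque


def ring_diameter(num_nodes):
    """Diameter of a simple cycle (every node has degree 2), found by BFS."""
    if num_nodes < 3:
        raise ValueError("a cycle needs at least 3 nodes")
    dist = {0: 0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for nxt in ((node + 1) % num_nodes, (node - 1) % num_nodes):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    # A cycle looks the same from every node, so the largest distance
    # from node 0 is the diameter.
    return max(dist.values())


if __name__ == "__main__":
    for n in (1, 5, 10, 25):
        print(2 * n + 1, ring_diameter(2 * n + 1))
```

For 2n + 1 nodes the search returns n, so a ring of 51 computers already has a maximum path length of 25 hops.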
For regular graphs it is possible to compute, at each node, the path(s) between any source and destination node-as opposed to using a tabular representation of the network stored at each node. This path determination is particularly simple with the Chordal Ring geometry and an algorithm is given.
0018-9340/81/0400-0291$00.75 © 1981 IEEE
