Abstract: The DIMOND is a building block for communication networks in which throughput is more important than transmission delay. Its main attraction stems from the fact that it allows the routing of messages through a network to be completely distributed over the individual building blocks.
I. INTRODUCTION
One of the characteristic aspects of a computer architecture is the interconnection of communicating units such as processors, peripherals, and memories. In present-day computers the communication function is often implemented by means of a single switch (bus or crossbar). This implementation is in harmony with the other main functions (such as processing and storage) also being performed by single modules.
However, such an implementation of an essential function may suffer from one or more of the following shortcomings (for a detailed discussion see [1]).
1) Vulnerability: A failure in an essential module has a catastrophic impact on the system.
2) Potential Performance Limitations: A shared module can easily become the bottleneck of the system.
3) Poor Adaptability: In the light of specific requirements, the function often proves to have been under- or overdesigned.
These potential shortcomings argue strongly in favor of distributing all essential functions. Moreover, the use of microprocessors is expected to lead to systems with distributed processing. For such systems it is natural to distribute the communication function as well. In this paper we describe the DIMOND, a building block for communication networks. A message (containing routing information) arriving at a DIMOND is switched to a designated output port, where it is stored in a register.
Our aim in designing the DIMOND was to achieve a building block for communication networks that would be both universal and elementary. Furthermore, it should allow the routing of messages through a network to be completely distributed over the individual building blocks. The DIMOND is universal in that it comprises both the fork and the join function for message streams; in other words, it is able to separate and merge message streams. The DIMOND is elementary in that it comprises the minimum number of ports with which these functions can be realized. Moreover, ad hoc control functions are avoided, i.e., the component comprises only those control functions which, in our view, are essential for a switching component to be generally applicable.
II. FUNCTIONAL DESCRIPTION
As stated, the DIMOND (Fig. 1) [...]

When two DIMOND's are interconnected for communication, the data lines and the control lines that perform a hand-shaking procedure are interconnected directly (Fig. 2), i.e., $in_i(B) = out_i(A)$. In the sequel such a connection will be denoted as input $i(B)$ = output $i(A)$.

III. COMMUNICATION NETWORKS
One main application field for the DIMOND is the implementation of communication networks. The routing of a message through such a network can be performed sequentially by a number of DIMOND's.
A network interconnecting n senders with n receivers will be termed an n-by-n node. As stated in the Introduction, the DIMOND is the most elementary (two-by-two) node. Two basic structures, loop and tree, allowing the construction of an n-by-n node out of DIMOND's, are described below and their relative merits discussed.
In the examples given, the routing information takes the form of a destination address. More precisely, a message consists of two fields: an information field and an address field. Since four-by-four nodes are constructed, the address field contains two bits ($ad_1$, $ad_0$).
1) Loop Structure: In Fig. 3 the DIMOND's are connected in a loop structure. The remaining input and output of each DIMOND are connected to a sender and a receiver, respectively.
Messages offered to a DIMOND with a destination address equal to the receiver address of that DIMOND are switched to the receiver. All other messages are passed on to the next DIMOND in the loop (sequential search).
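The sequential search can be traced with a short sketch. It assumes, hypothetically, that the DIMOND at loop position k serves the receiver with address k and that positions increase in the direction of travel; the function name `loop_path` is illustrative only.

```python
from __future__ import annotations


def loop_path(sender_pos: int, dest: int, n: int) -> list[int]:
    """Loop positions a message visits in an n-DIMOND loop (sequential search).

    Assumption: the DIMOND at position k serves the receiver with address k.
    """
    if not 0 <= dest < n:
        raise ValueError("destination address outside the loop")
    path = [sender_pos]
    pos = sender_pos
    while pos != dest:         # not this DIMOND's receiver: pass the message on
        pos = (pos + 1) % n
        path.append(pos)
    return path                # the last DIMOND switches the message to its receiver


# Sender at position 2 of a four-DIMOND loop, destination address 1:
print(loop_path(2, 1, 4))      # [2, 3, 0, 1]
```

With destinations chosen uniformly (including the sender's own receiver), a message passes through (n + 1)/2 DIMOND's on average under this model; for n = 4 the path lengths are 1, 2, 3, and 4.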
The signals on the destination lines depend on the address field of the message as follows:
there are bits in the address field. Each DIMOND switches a message depending on one particular bit in the destination address. In the example, for A and B: $des_0 = ad_1(in_0)$ and $des_1 = ad_1(in_1)$; for C and D: $des_0 = ad_0(in_0)$ and $des_1 = ad_0(in_1)$, where $ad_k(in_i)$ denotes address bit $ad_k$ of the message at input $in_i$.
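The destination-line equations can be exercised in a small sketch of the four-by-four tree. Stage one (A, B) switches on $ad_1$ and stage two (C, D) switches on $ad_0$, as in the equations. The sketch also shows the address-bit reuse mentioned under item 5) of the comparison: each DIMOND overwrites the bit it has consumed. The wiring is a hypothetical choice, since Fig. 4 is not reproduced here. The rule that a DIMOND writes its input-port index into the freed bit is likewise hypothetical. Under this particular wiring, the message arrives carrying its sender's address.

```python
from __future__ import annotations

# Hypothetical wiring (an assumption, not taken from the paper's figure):
#   sender s = 2*s1 + s0 enters first-stage DIMOND "AB"[s0] on input port s1;
#   output o of first-stage DIMOND x feeds second-stage DIMOND "CD"[o] on port x;
#   output o of second-stage DIMOND y delivers to receiver 2*y + o.


def route(sender: int, dest: int) -> tuple[list[str], int, int]:
    """Return (DIMONDs visited, receiver reached, address field on arrival)."""
    addr = [(dest >> 1) & 1, dest & 1]    # [ad_1, ad_0]
    s1, s0 = (sender >> 1) & 1, sender & 1

    # Stage 1 (A or B): des = ad_1 selects the output port.
    x, port = s0, s1
    out = addr[0]
    addr[0] = port                        # consumed bit ad_1 now holds sender bit s1

    # Stage 2 (C or D): des = ad_0 selects the output port.
    y, port = out, x
    out = addr[1]
    addr[1] = port                        # consumed bit ad_0 now holds sender bit s0

    receiver = 2 * y + out
    return ["AB"[x], "CD"[y]], receiver, (addr[0] << 1) | addr[1]


print(route(3, 0))   # (['B', 'C'], 0, 3)
```

Every message passes through exactly two DIMOND's, one per address bit. For all sixteen sender/destination pairs, the message reaches the addressed receiver, and its address field then holds the sender address.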
In order to avoid indefinite blocking of particular message streams, it is advisable to use alternating priority signals for A, B, C, and D.
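The text does not specify how the priority signal alternates. One plausible reading is sketched below, where prio = i is assumed to favor input $in_i$. In this reading, prio is toggled after every conflict, so the input that lost a conflict wins the next one and no stream can be blocked indefinitely. The class name is hypothetical.

```python
from typing import Optional


class AlternatingPriority:
    """Hypothetical arbiter for the prio signal of one DIMOND output register."""

    def __init__(self, prio: int = 0) -> None:
        self.prio = prio                  # assumption: prio = i favors in_i

    def grant(self, want0: bool, want1: bool) -> Optional[int]:
        """Return the input allowed to proceed, or None if neither requests."""
        if want0 and want1:
            winner = self.prio
            self.prio = 1 - self.prio     # the loser is favored at the next conflict
            return winner
        if want0:
            return 0
        if want1:
            return 1
        return None


arb = AlternatingPriority()
print([arb.grant(True, True) for _ in range(4)])   # [0, 1, 0, 1]
```

Toggling per conflict, rather than per clock cycle, is a design choice of this sketch. Under a per-cycle toggle, conflicts that always fell on the same phase could still starve one input.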
In the table given below, the two structures are compared with respect to the following aspects (it is assumed that n is a power of 2).
1) The number of DIMOND's needed to build a node.
2) The average number of DIMOND's a message has to pass through between sender and receiver.
3) Is the network prone to congestion? More precisely, can the throughput of the network decrease when the number of messages in the network increases? The throughput of the tree deso(A) = ad lado ofino(A); -desi(A) = adj ado ofini(A) deso(B) = ad1 I ado of ino(B); desI(B) = adI ado of inI(B) deso(C) = ad1 ado of ino(C); desI(C) = adI ado of inI(C)
Moreover, in order to minimize the probability of congestion, messages within the loop are given priority over incoming messages, i.e., for A, B, C, and D: prio = 0.
2) Tree Structure: In a tree structure, a path between any sender and any receiver consists of as many DIMOND's as there are bits in the address field.

The throughput of the tree structure will reach a saturation point, after which additional messages in the tree have no influence on the throughput. The loop structure, however, has maximum throughput when the registers on the loop are alternately full and empty. Beyond that point, every additional message on the loop decreases the throughput. In the extreme case, in which all registers on the loop contain a message, the throughput even drops to zero (deadlock). Because messages in the loop have priority over incoming messages, the deadlock situation can only arise when all senders put a message on an empty loop simultaneously. Extending the loop with one additional DIMOND, to which no sender is connected, prevents this situation from arising, since that DIMOND never puts a message on the loop.
4) The need for external circuitry to implement the routing function.
5) The number of bits needed for addresses if a receiver requires the address of the sender. In a loop structure the message then has to contain both the destination and the sender address. In the tree structure each DIMOND consumes a particular address bit that is not used again by any of its successors; hence, that bit location is available to insert one bit of the sender address. The structure can thus be made to let a message start with its destination address and arrive with the address of its sender.

A device connected to both an input and an output port of a node is called a subscriber of the node. Clearly, n subscribers can be connected to an n-by-n node so as to allow bidirectional communication among all subscribers. Both implementations of the node allow simultaneous communication between a number of subscribers. In such applications, the loop structure has the advantage that one dedicated DIMOND can be allocated to each subscriber.
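The congestion behavior of the loop under aspect 3) can be illustrated with a deliberately simplified model. In this model, messages only circulate; none are delivered or injected. In each clock cycle a message advances only if the next register was empty at the start of that cycle, so a register vacated in a cycle is not refilled until the next one. This matches the observation that a message leaves a hole behind it. The model is an assumption for illustration, not the paper's analysis, and `loop_throughput` is a hypothetical name.

```python
from __future__ import annotations


def loop_throughput(occupied: list[bool], cycles: int = 40, warmup: int = 40) -> float:
    """Average number of message moves per clock cycle on a closed loop of registers.

    Simplified model (assumption): no delivery or injection; a message advances
    iff the next register was empty at the start of the cycle.
    """
    regs = list(occupied)
    n = len(regs)
    moves = 0
    for t in range(warmup + cycles):
        nxt = [False] * n
        step = 0
        for i in range(n):
            if not regs[i]:
                continue
            j = (i + 1) % n
            if regs[j]:
                nxt[i] = True          # blocked: successor register still full
            else:
                nxt[j] = True          # advance into the empty successor register
                step += 1
        if t >= warmup:
            moves += step              # count only cycles after the warm-up period
        regs = nxt
    return moves / cycles


print(loop_throughput([True, False] * 4))   # 4.0: alternately full and empty
print(loop_throughput([True] * 8))          # 0.0: every register full (deadlock)
```

On an eight-register loop, the alternating pattern moves four messages per cycle. With seven of the eight registers full, only the message directly behind the single hole can move, giving one move per cycle. With every register full, nothing moves, which is the deadlock described above.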
For the orderly construction of composite networks, the following property is of interest: any two nodes, regardless of structure and size, can be interconnected by joining one output port of one node to one input port of the other, and vice versa. Each constituent node (with its subscribers) may then be considered as a single subscriber of the other. Thus, for example, hierarchical networks can be implemented.

In the first clock phase, it is determined which inputs have to be copied into which registers. The copy allowances so determined are stored in four flip-flops: $c_{00}$, $c_{01}$, $c_{10}$, and $c_{11}$ ($c_{01}$ is the allowance to copy $in_0$ into $reg_1$, etc.).
In the second clock phase, two actions are performed concurrently:
If the DIMOND is implemented in terms of integrated circuits, it is advisable to design two separate circuits: a control circuit comprising the copy allowances and the status flip-flops, and a switching circuit comprising the crossbar and the output registers (see Fig. 5).
Such an implementation allows a modular composition of the data path, since one control circuit can control an arbitrary number of switching circuits.
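The two-phase clocking described above can be modeled behaviorally. The list of second-phase actions is incomplete in this section, so the sketch rests on stated assumptions:

- $c_{ij}$ may be set only if $reg_j$ is empty at the start of the cycle.
- Input $in_i$ requests $reg_{des_i}$.
- When both inputs request the same register, prio = i favors $in_i$.
- The handshake with the successor is reduced to a `deliver` call.

The class and method names are hypothetical.

```python
class DimondModel:
    """Behavioral sketch of one DIMOND under assumed semantics (see lead-in)."""

    def __init__(self) -> None:
        self.regs = [None, None]          # reg_0, reg_1; None means empty

    def phase1(self, inputs, des, prio):
        """First clock phase: compute copy allowances c[i][j] (copy in_i into reg_j)."""
        c = [[False, False], [False, False]]
        for j in (0, 1):
            if self.regs[j] is not None:
                continue                  # reg_j still holds a message
            wanting = [i for i in (0, 1) if inputs[i] is not None and des[i] == j]
            if len(wanting) == 2:
                wanting = [prio]          # both inputs want reg_j: prio decides
            if wanting:
                c[wanting[0]][j] = True
        return c

    def phase2(self, inputs, c):
        """Second clock phase: perform the granted copies; report accepted inputs."""
        accepted = [False, False]
        for i in (0, 1):
            for j in (0, 1):
                if c[i][j]:
                    self.regs[j] = inputs[i]
                    accepted[i] = True
        return accepted

    def deliver(self, j):
        """The successor takes reg_j (handshake not modeled); reg_j becomes empty."""
        msg, self.regs[j] = self.regs[j], None
        return msg


d = DimondModel()
c = d.phase1(["x", "y"], [1, 1], prio=0)   # both inputs request reg_1
print(c, d.phase2(["x", "y"], c), d.regs)  # in_0 wins; reg_1 now holds "x"
```

Separating `phase1` from `phase2` mirrors the split into a control part (the allowances) and a switching part (the copies). An input that is refused simply presents its message again in the next cycle.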
V. A FIFO BUFFER WITH MINIMAL DELAY
In this section the implementation of a FIFO (first in, first out) buffer with the aid of DIMOND's is described. First, a FIFO buffer with two places is constructed out of a single DIMOND (Fig. 6). By induction from the previous case, it can be proved that this configuration indeed behaves as a FIFO buffer with six places.

The type of FIFO buffer described has two attractive properties.
1) By adding one DIMOND, the capacity of the buffer is modularly augmented by two places.
2) The buffer has minimal delay in the sense that the delay of an empty buffer equals the delay of one DIMOND. More generally, the number of registers a message has to pass through depends only on the number of messages already waiting in the queue and not on the buffer capacity.
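The internal construction of the buffer is not reproduced here, so the following sketch models only its external behavior. It captures first-in, first-out order and property 1), a capacity of two places per DIMOND. It does not attempt to model the register-level delay of property 2). The class name is hypothetical, and refusing a message when full stands in for the handshake holding the sender.

```python
from collections import deque


class MinimalDelayFifoModel:
    """External behavior only (assumption): FIFO order, two places per DIMOND."""

    def __init__(self, num_dimonds: int) -> None:
        self.capacity = 2 * num_dimonds   # property 1): each DIMOND adds two places
        self.items = deque()

    def offer(self, msg) -> bool:
        """Accept msg if there is room; otherwise the sender must wait."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(msg)
        return True

    def take(self):
        """Remove and return the oldest message, or None if the buffer is empty."""
        return self.items.popleft() if self.items else None


buf = MinimalDelayFifoModel(3)            # three DIMOND's give six places
print(buf.capacity)                       # 6
```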
VI. PROCESSING NETWORKS
In communication networks, messages are transmitted from one DIMOND to another without modification, and the control signals are used as described in the previous sections. However, it is also possible to use DIMOND's in processing networks. As a case in point, we present a processing network that accepts messages and delivers them in sorted sequence according to some ordering relation (Fig. 8).
Assume an unordered sequence of data items to be offered at $in_1(A)$.
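The arrangement of Fig. 8 is not reproduced in this section. As a hedged illustration of the kind of processing network meant, the sketch below uses a generic technique: a linear chain of compare-and-pass cells, also known as a systolic insertion sort. It is not necessarily the authors' network. Each cell holds at most one item, keeps the smaller of its stored item and an arriving item, and passes the larger one on. For clarity, the simulation processes items one at a time rather than pipelined. `chain_sort` is a hypothetical name.

```python
def chain_sort(items, num_cells=None):
    """Sort a list by passing each item along a chain of one-item cells."""
    cells = [None] * (len(items) if num_cells is None else num_cells)
    for x in items:
        for k in range(len(cells)):
            if cells[k] is None:          # first empty cell: the item settles here
                cells[k] = x
                break
            if x < cells[k]:              # keep the smaller item, pass the larger on
                cells[k], x = x, cells[k]
        else:
            raise ValueError("more items than cells")
    return [c for c in cells if c is not None]


print(chain_sort([5, 2, 2, 9, 1]))        # [1, 2, 2, 5, 9]
```

Read from the first cell onward, the chain holds the items in sorted order. Only the ordering relation `<` is used, so any totally ordered item type works.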
VII. CONCLUDING REMARKS
The DIMOND is an attractive building block for communication networks in which throughput is more important than transmission delay. According to the taxonomy in [1], DIMOND networks can be classified as IDD networks (indirect, decentralized routing, dedicated data paths). The main attraction of the DIMOND stems from the fact that it allows the routing of messages through a network to be completely distributed over the individual building blocks. This ability distinguishes the DIMOND from any related proposal of which the authors are aware.
In [3] a component is proposed consisting of a two-by-two line switch and a binary storage element controlling the switch. The networks described are capable of permuting a set of input lines onto a set of output lines. Messages are not stored in the network and part of the control for such a network has to be performed by a central module.
In [2], [4], and [5] three basic building blocks are described which allow the modular construction of networks composed of a hierarchy of interconnected loops. Around each loop, data blocks circulate which cannot be delayed. The data blocks may be full or empty. Control is not completely distributed over the subscriber modules, since each loop requires a module performing supervisory functions.
The implementation given in Section IV requires one central clock for all elements in a network. This implies that the distances between elements in one network should be small. This restriction on its application could be removed by designing the DIMOND so as to operate asynchronously with respect to its environment.
The amount of circuitry required for the implementation of the DIMOND is small for a large-scale integration (LSI) chip, leaving room for the following additions.
1) The data registers ($reg_0$ and $reg_1$) may be replaced by minimal delay FIFO buffers (Section V). This modification does not affect the minimum transit time of the module; it only increases the buffer capacity. Such an extension would alleviate the problem of congestion and eliminate the problem of deadlock in a loop structure.
2) By multiplexing the data lines, the number of bits in a message can be doubled without affecting the transit time of the DIMOND. This is possible because a message leaves a hole (empty register) after its transmission to a successor; consequently, the data lines are not used by a subsequent message for at least one clock cycle. In that idle cycle the data lines can be used to transmit the second half of the message. A message then consists of a truck (first half) and a trailer (second half), where the truck contains the routing information.

[4]. In particular, the study of asynchronous circuits has led to the development of several schemes for designing and controlling networks of interconnected asynchronous modules. One crucial factor underlying the different approaches is the nature of the delays that are assumed to occur in the network.
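The truck/trailer multiplexing of item 2) in the DIMOND's concluding remarks can be sketched at the bit level. The sketch assumes the message is a 2·width-bit integer carried on width data lines. It also assumes that the truck, sent first, is the high half and carries the routing bits. All function names are hypothetical.

```python
def split_message(msg: int, width: int) -> tuple:
    """Split a 2*width-bit message into (truck, trailer) for width data lines.

    Assumption: the truck (sent first) is the high half and carries the routing bits.
    """
    if not 0 <= msg < 1 << (2 * width):
        raise ValueError("message does not fit in 2*width bits")
    mask = (1 << width) - 1
    return (msg >> width) & mask, msg & mask


def join_message(truck: int, trailer: int, width: int) -> int:
    """Reassemble the message on the receiving side."""
    return (truck << width) | trailer


def line_schedule(messages, width):
    """Values on the data lines, cycle by cycle, for back-to-back messages:
    each truck is followed by its trailer in the otherwise idle cycle."""
    schedule = []
    for m in messages:
        schedule.extend(split_message(m, width))
    return schedule


print([hex(v) for v in line_schedule([0xABCD, 0x1234], 8)])
# ['0xab', '0xcd', '0x12', '0x34']
```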
ACKNOWLEDGMENT
Two of the approaches to network design that are particularly useful with respect to current technology are the "speed-independent circuit" [1], [2] and the "propagation-limited network" [3]. The basic assumption of the speed-independent model is that network delay is confined to the logic gates and that intergate (line) delay is zero. The propagation-limited approach assumes that delay between modules is significant while delay within a module is negligible. Taken together, the two views provide a reasonably accurate model of an asynchronous network composed of interconnected logic modules in which a module might be as small as a single LSI chip.
In such a network, one recognizes that there is only limited access available to each module. In addition to data paths between modules, it is necessary to have control paths between them to govern the movement of data and to monitor the operation of the modules themselves. Inclusion of control requires the use of access paths to a module, thus reducing the available data paths. Hence, although the potential data processing capability of a module is large, it will be restricted if the number of data paths is limited.
One remedy to this problem of maximizing the data processing potential of a module with a fixed number of access paths is to increase the capacity of the data paths by permitting more than two signal levels to be transmitted on each path. In this case, the use of multivalued logic circuits in the design of the modules is required. However, merely allowing multivalued signals to be used neither ensures that the data processing capability is increased nor guarantees that the control structure will not require more access paths. Further, if the logic design of a nonbinary network is considerably more difficult than that of a binary network, the effort to incorporate the multivalued signals may outweigh the benefit resulting from their use in a system. The purpose of this paper is to briefly describe one possible approach to the design of networks composed of asynchronous

0018-9340/80/1000-0889$00.75 © 1980 IEEE
