Abstract-Of concern here are asynchronous modules, i.e., those whose activity is regulated by initiation and completion signals with no clocks being present. First a number of operating conditions are described that are deemed essential or useful in a system of asynchronous modules, while retaining an air of independence of particular hardware implementations as much as possible. Second, some results are presented concerning sets of modules that are universal with respect to these conditions. That is, from these sets any arbitrarily complex module may be constructed as a network. It is stipulated that such constructions be speed independent, i.e., independent of the delay time involved in any constituent modules. Furthermore it is required that the constructions be delay insensitive in the sense that an arbitrary number of delay elements may be inserted into or removed from connecting lines without affecting the external behavior of the network.
INTRODUCTION
It has been suggested that computer design will, in the future, be dominated by organizations which employ large arrays of modules operating simultaneously. This is attributed both to the use of "large scale integration," in which the cost and size of each module is very small, and also the realization that "parallelism" is capable of providing a substantial increase in computing speed and hardware utilization. Most work done to date on arrays of modules has been concerned with synchronous modules, i.e., those which are controlled by a master clock. In this paper, we will be concerned with asynchronous modules, i.e., those whose activity is regulated by initiation and completion signals, with no clocks being present. Computer designers have long recognized that utilization of hardware could be improved by the use of asynchronism, since the length of time that an operation requires, as viewed by a system containing that operation, is equal to the actual time required, rather than the least upper bound, as in the synchronous case. Moreover, the problem of randomly varying or unbounded delays may be avoided by using asynchronism, providing a kind of built-in error checking. The fact that a system may be subject to evolutionary changes, or modification while operating, may also be provided for. The approach here is also useful in systematic "flowchart" design methods in which a system can be designed and implemented almost directly from a flowchart with little regard for considerations such as races, hazards, fan-out, etc. Such schemes have the effect of reducing the complexity of the design-automation process. A number of systems have been proposed or designed in this spirit.
Representative discussions may be found in Muller [23], Clark [7], [8], and Dennis [11]. The purpose of this paper is twofold. First, we intend to make precise a number of operating conditions deemed essential or useful in a system of asynchronous modules, while retaining an air of independence of particular hardware implementations as much as possible. Second, we present some results concerning sets of modules that are universal with respect to these conditions. That is, from these sets we are able to construct any arbitrarily complex module as a network. It is stipulated that such constructions be speed-independent, i.e., independent of the delay time involved in any constituent modules. It is here that the techniques differ from more conventional investigations of asynchronous switching (cf. [28]). In the latter, dependence on timing to avoid races, etc., presents such overwhelming difficulties that any proposed modifications to a system would likely require that it be redesigned. In the present work we are interested in constructing arbitrarily large systems without global considerations of such things. Furthermore, we are interested in a practical problem that does not presently seem to be amenable to treatment by such theories, namely, the problem of simultaneous input changes.
We are especially interested in sets of modules that have as few interconnecting lines as possible, since the latter would tend to be representable by arrays of lower interconnection complexity. Although universal sets of modules are presented here, the author's intention should not be construed as a suggestion that all computers be constructed of such modules. In many cases, commonly used modules could be implemented much more efficiently than the methods presented here would suggest. In such cases a decomposition technique, which constructs a network using cheaply implemented modules wherever possible to replace more complex constructions, would be very useful. Furthermore, the need for nontrivial arbitration arises, for example, in systems in which certain resources such as a memory must be shared between two "processes" in such a manner that either can use the resource but not both simultaneously. The modules described in Definition 3.5 are useful in this regard.
Condition 4: There may be an arbitrary delay between the assimilation of an input signal by a module and the production of a corresponding output signal. This delay is always finite, but is not necessarily bounded.
Conditions 1 and 2 have been introduced mainly for consistency with certain types of physical implementations. It is possible to remove them and obtain a different class of modules as far as mathematical modeling is concerned. Condition 3 is introduced because there is no way of determining the exact order of two signals occurring sufficiently close together, due to physical limitations on the speed of signal propagation inside the module itself. Condition 4 is fundamental to the assumptions of asynchronism.
Definition 1.3: A network is a collection of modules with some of their lines interconnected. If an input line of a module is unconnected, then it is an input line to the network. If an output line of a module is unconnected, then it is an output line of the network.
Just as modules are required to satisfy certain conditions, networks of modules are required to satisfy the following additional conditions.
Condition 5: At most two modules in a network are ever connected by the same line, and this line must be an input to one module and an output from the other.
Condition 6: If a signal is produced by one module on an input line to another module, it must be assimilated before a second signal occurs on the same line.
Condition 7: A line interconnecting two modules has no intrinsic delay.
Whereas Conditions 1-4 were restrictions on module operation, Conditions 5-7 restrict the operation of an interconnection of modules. Conditions 5 and 6 have been introduced for the same reason as Conditions 1 and 2 and could be removed for more generality. Condition 7 is introduced primarily for ease in presenting the model. It will be seen later that no generality is lost as far as the results of this paper are concerned.
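As an illustrative sketch only, Conditions 5 and 6 can be phrased as executable checks. The function `check_condition_5`, the class `Line`, and the representation of a network as (line, module, direction) triples are assumptions made for this example and do not appear in the formal model.

```python
from collections import Counter


def check_condition_5(connections):
    """Return True if no line has more than one producer or more than one consumer.

    `connections` is an iterable of (line, module, direction) triples, where
    direction is "out" (the module drives the line) or "in" (the module reads it).
    This representation is hypothetical, chosen only for illustration.
    """
    producers = Counter()
    consumers = Counter()
    for line, _module, direction in connections:
        if direction == "out":
            producers[line] += 1
        elif direction == "in":
            consumers[line] += 1
        else:
            raise ValueError("direction must be 'in' or 'out'")
    lines = set(producers) | set(consumers)
    return all(producers[l] <= 1 and consumers[l] <= 1 for l in lines)


class Line:
    """A line that may hold at most one unassimilated signal (Condition 6)."""

    def __init__(self):
        self.pending = False

    def signal(self):
        # A second signal before assimilation would violate Condition 6.
        if self.pending:
            raise RuntimeError("Condition 6 violated: previous signal not assimilated")
        self.pending = True

    def assimilate(self):
        if not self.pending:
            raise RuntimeError("no signal present to assimilate")
        self.pending = False
```

A line with only a consumer plays the role of a network input, and a line with only a producer plays the role of a network output, in the sense of Definition 1.3.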
In some previous models [11], [23]
1) It permits a network of any size to be designed without regard to relative spatial distance between modules (which correspond physically to time-delays).
2) Finally we assume that every module satisfies, and every realization must be made to satisfy, the following condition.
Condition 11 (Finite-blocking condition): If a signal is present on an input line to a module, this signal must eventually be assimilated.
This condition is related to the "finite-delay" property in [18], [19], the "reliability condition" in [24], and absence of "blocking" and "individual blocking" in [20].
In most realizations presented in the sections to follow, it will be obvious that the conditions stated previously hold and therefore unnecessary to give explicit mention of them. They were, however, an important criterion in selecting the modules presented here and it is quite easy to find pitfalls in which sets of modules appear to be universal, but fail to satisfy these conditions. The easiest condition to overlook seems to be that of delay-insensitivity, although the finite-blocking condition is also subtle.
We now wish to investigate classes of modules which are universal in the sense that from modules of this class we can obtain realizations of any module. We begin by defining universality, and then showing that a certain set of modules is universal for a restricted class of networks. This is then used to obtain results about larger classes of networks. [8], [27]. This module is called a "union" module in [11]. Fig. 3.
Several call modules may be interconnected to achieve any number of calling pairs. One way to do so is by the represents the "called" pair of lines. To prevent subsequent diagrams from being too cluttered, an abbreviation is adopted for this configuration as shown in Fig. 4 Fig. 7 .
Using the modules presented previously, we are now able to show the construction of an arbitrary serial module m = (I, O, N, A) where N = (Q, q0, I, A, f, g). Assume that N has n states. We record the current state as a 0 state in one of a series of (n - 1) S modules. The network of Fig. 8 shows the means of testing and setting the state using calls and D-calls. Assume that the current state is q. A signal on the jth input line initiates a control sequence as shown in Fig. 9 so as to first determine the state, given by f(q, j), and then set the new state, given by g(q, j). By "merge to g(q, j)" in the diagram, we mean that for each output line there is a tree of merge modules which has inputs from exactly those sequences which are to produce an output on the line g(q, j). This completes the proof of Theorem 2.1. An alternate construction, which uses only $\lceil \log_2 |Q| \rceil$ S-modules for the state recording network, is also possible. We leave this construction as an exercise for the reader.
Definition 2.6: A set of modules is said to have modularity n if n is the maximum number of lines on any one module in the set.
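The external behavior that the construction must reproduce can be sketched as a small state machine. This is a behavioral illustration only, not the network of Figs. 8 and 9: it assumes that f is the next-state function and g selects the output line, as the "merge to g(q, j)" remark suggests, and the class name `SerialModule` and the toggle example are hypothetical.

```python
class SerialModule:
    """Behavioral sketch of a serial module: one input signal at a time.

    f maps (state, input line) to the next state; g maps (state, input line)
    to the output line that receives the completion signal. Both are given
    as dictionaries here, which is an assumption made for illustration.
    """

    def __init__(self, initial_state, f, g):
        self.state = initial_state
        self.f = f
        self.g = g

    def signal(self, j):
        """Assimilate a signal on input line j and return the output line signalled."""
        q = self.state
        out_line = self.g[(q, j)]
        self.state = self.f[(q, j)]
        return out_line


# Hypothetical two-state example: successive signals on "t" alternate
# between output lines "t0" and "t1".
toggle = SerialModule(
    "q0",
    f={("q0", "t"): "q1", ("q1", "t"): "q0"},
    g={("q0", "t"): "t0", ("q1", "t"): "t1"},
)
print([toggle.signal("t") for _ in range(3)])  # ['t0', 't1', 't0']
```

Because the module is serial, the sketch needs no arbitration: each call to `signal` completes before the next input is presented.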
In the preceding discussion, it was shown that a set with modularity 7 and cardinality 2 is universal for the class of serial modules.
We now show that there are universal sets of modularity 6 and 5 for the class of serial modules. The modules G and H as shown in Fig. 10 may be combined to realize an S module. Hence we have the following.
Proof: Consider the realization of a parallel module from only serial modules. The requirement that only one input be signalled at a time means that the realization is separable into disjoint serial parts. However, in general this is not possible; e.g., the output of the J module depends on both inputs, which could be signalled concurrently.
We next investigate sets which are generally universal. show only the case m = n = 2. In the figure, certain outputs of the DC modules are not used, and are marked "nc" to avoid cluttering. Also, the state changing network, an instance of Fig. 8
Corollary: The set {M, G, H, F, AC} (modularity 6) is wbw-universal for the class of all modules.
IV. IMPLEMENTATIONS
The method of signal flow described in the preceding sections may appear unusual to some readers. It is the purpose of this section to discuss some possible signalling conventions and module implementations which could be used in practice. The manner in which implementations relate to the choice of "atomic" modules will also be discussed.
It should be mentioned that in implementing atomic modules, we are permitted to make assumptions about delays. This seems essential. In fact, it can be shown that certain modules, specifically those with "essential hazards" [28] , require delays for their implementation. The utility of describing a network in terms of modules, however, is that the modules provide a kind of "sphere of protection" around those parts of the network in which delays are critical. In fact, we may introduce the concept of a quasi-realization as being a realization in which certain lines are granted immunity from the delay-insensitivity condition, and then relate this concept to the discussion in previous sections, but we will not do so.
There are three seemingly natural signalling conventions which may be used for communication from a module m to another module m' by a line L.
1) Pulse: m sends a pulse on L. m' must either transmit the pulse immediately, or contain a flip-flop which is set by the pulse, indicating the signal's arrival. In this case, m' has essentially assimilated the pulse when it resets the flip-flop.
2) Symmetric Transition: Line L is considered to have a value of one of two possible levels, 0 and 1. A signal is indicated by a transition from logic value 0 to 1, or from logic value 1 to 0. m' assimilates the signal by acting on the level change.
3) Asymmetric Transition: A signal is indicated by a transition from logic value 0 to 1. The value must be reset to 0 before m produces another signal.
These conventions are summarized in Fig. 18. Each of these types of signalling appears useful in finding simple implementations of certain modules, but none seems to yield universally simple implementations. Hence it is instructive to discuss conversion from one type of signal to the other. To convert a pulse to a symmetric transition, the scheme shown in Fig. 19 is used. T represents a standard "trigger" flip-flop. To convert a symmetric transition to a pulse, the scheme of Fig. 20 is used. The ⊕ symbol indicates the standard "EXCLUSIVE-OR" and Δ indicates a delay whose length is roughly equal to the intended duration of the pulse.
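The two conversions just described can be sketched on sampled waveforms. This is a minimal discrete-time illustration, assuming each waveform is a list of 0/1 samples and the delay is a whole number of samples; the function names are hypothetical and the code models only the logical behavior, not the circuits of Figs. 19 and 20.

```python
def pulse_to_transition(pulses, level=0):
    """Trigger flip-flop: toggle the output level on each rising edge of the input."""
    out = []
    prev = 0
    for p in pulses:
        if p == 1 and prev == 0:  # rising edge of a pulse
            level ^= 1
        out.append(level)
        prev = p
    return out


def transition_to_pulse(levels, delay=1, initial=0):
    """EXCLUSIVE-OR of the line with a copy of itself delayed by `delay` samples.

    Each level change yields a pulse whose width equals the delay.
    `initial` is the assumed line value before the first sample.
    """
    out = []
    for i, v in enumerate(levels):
        delayed = levels[i - delay] if i >= delay else initial
        out.append(v ^ delayed)
    return out


pulses = [0, 1, 0, 0, 1, 0, 0]
levels = pulse_to_transition(pulses)
print(levels)                       # [0, 1, 1, 1, 0, 0, 0]
print(transition_to_pulse(levels))  # [0, 1, 0, 0, 1, 0, 0]
```

With a delay of one sample the round trip reproduces the original pulse train; a longer delay widens each pulse, which matches the remark that the delay sets the intended pulse duration.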
To complete the picture, we must discuss conversion between symmetric and asymmetric transition signalling. We first note that a module employing asymmetric signalling can be viewed as one with symmetric signalling, with every other transition indicating a reset. Hence it suffices to consider the symmetric case only in specifying implementations. Furthermore, due to the necessity of resetting in the asymmetric case, this type of signalling appears to be usable only when every line is paired with an oppositely directed line which indicates assimilation. In this paired case, a conversion from one convention to the other is shown in Fig. 21, where the component modules are S and M modules using symmetric transition signalling. It is interesting that the same network performs the conversion in either direction.
Under the assumption that a module is serial, several techniques are available for implementation. One can use the Huffman synthesis approach [16] for any of the signalling conventions, the "pulse-mode" technique [22] for pulse signalling, and the "transition logic" [6] analog of pulse mode for transition signalling. Some modules which seem to be naturally implemented using transition logic are shown in Fig. 22. MC denotes the "Muller C" element, an element with memory. An implementation of MC using NOR gates is given in Fig. 23. If both inputs of the MC are the same, then the output is equal to the input. If they are different, then the output retains its former value. Hence in Fig. 22(c), assuming that both inputs are initially the same value, no output transition occurs until transitions have occurred on both inputs, which is effectively the behavior of a Join. In Fig. 22(d), an (I, J) pair of transitions causes a transition on the "pulser" output of exactly one of the MC elements. However, a transition has also occurred on one input of two of the other MC elements. The feedback loops have the effect of cancelling these signals, through the use of EXCLUSIVE-OR's. The Δ denotes a delay which is introduced for the purpose of allowing the circuit to stabilize before signalling its environment. The S and ATS modules seem to be naturally suited to implementation using pulse signalling, as shown in Figs. 24 and 25. The reader may wish to observe this by attempting the design using transition-mode signalling. The ATS module implementation shown is still rather complicated and deserves explanation. Referring to Fig. 25 Unfortunately, there is a problem with this implementation. Since the relative timing of R and T is not phenomena has been reported [5], [9], [21], [31] but apparently they are not widely recognized.
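The Muller C element rule stated above (the output follows the inputs when they agree and otherwise holds its former value) can be sketched behaviorally as follows. The class name `MullerC` is an assumption for this example, and the code models only the logical rule, not the NOR-gate circuit of Fig. 23.

```python
class MullerC:
    """Behavioral sketch of a two-input Muller C element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        """Apply the current input levels and return the output level."""
        if a == b:
            self.out = a  # inputs agree: output takes their common value
        return self.out   # inputs differ: output retains its former value


# Join behavior under transition signalling: starting with both inputs at 0,
# the output changes only after a transition has occurred on both inputs.
c = MullerC()
print([c.update(a, b) for a, b in [(1, 0), (1, 1), (1, 0), (0, 0)]])
# [0, 1, 1, 0]
```

In the printed sequence the output makes one transition (0 to 1) after both inputs have risen and one transition (1 to 0) after both have fallen, which is the Join behavior described for Fig. 22(c).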
As time increases after the occurrence of this phenomenon, it can be observed empirically that the probability that the flip-flop has not stabilized grows smaller. Hence making Δ2 large increases the reliability, but by no means to 100 percent. Note that by the arbitration condition, it does not matter for this module in which state FF1 finally settles because if it does not reflect the true state of Tr, it certainly will on the next T pulse, as R cannot occur again until the To output occurs, by the specification of the module's operation.
To the author's knowledge, no totally "reliable" solution to this problem exists. A recent workshop was directed to the problem without definite conclusions [32]. The interested reader may also wish to compare the above cited work to [33], [34]. Such problems appear to occur whenever the arbitration condition (Condition 3) does not hold in a trivial way; that is, the output produced depends on the order of assimilation of simultaneous inputs. For example, the implementation of the Join module as shown in Fig. 23 does not appear to have this flaw. We reemphasize at this point that it is for such reasons that we would like to do without nontrivial arbitration whenever possible.
V. GENERALIZATIONS AND RELATION TO OTHER MODELS
Discussed in this section are some generalizations of the model presented in previous sections and the relation of these generalizations to other asynchronous computation models. We first consider a generalization which allows multivalued signals on lines. This is motivated by the problem of transmitting "data" from a module m to a module m' which can take on one of n > 1 values. In the present scheme, this value could be encoded as a "unary-encoded" signal on one out of n lines. A more efficient encoding could be used which first encodes the data in binary, then represents each bit as a signal on one out of two lines, requiring at most $2\lceil \log_2 n \rceil$ lines. This method is discussed in [3], [14], [23]. A still more efficient method is described in [27]; however, it is not amenable to description in the present model because assumptions about delay times are involved. The idea is that the data are encoded in binary using $\lceil \log_2 n \rceil$ lines, but an extra line accompanies these lines. A signal on this extra line indicates when data are being transmitted. This signal lags the data slightly, so that when it arrives at m', it is certain that the data lines have their proper values, assuming the delays in all lines in the bundle are the same. Other methods for implementing multiple-valued asynchronous signals are discussed in [3]. If the model presented here were extended so that lines could hold multiple-valued signals, then this implementation could be represented conveniently. Here then is a case in which it may be beneficial to extend the present model, since it does not succinctly represent the idea of data transmission. Related to this are practical considerations for bus structures, which are discussed in [35].
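The line counts of the three encodings can be compared with a short calculation. This is a sketch for illustration: the function names are hypothetical, and the rail-numbering convention in `two_line_encode` (rail 1 carries a 1 bit, rail 0 carries a 0 bit, least significant bit first) is an assumption not fixed by the text.

```python
def ceil_log2(n):
    """Smallest k with 2**k >= n, for n >= 1."""
    return (n - 1).bit_length()


def lines_unary(n):
    """One line per value."""
    return n


def lines_two_per_bit(n):
    """Binary encoding with each bit signalled on one of two lines."""
    return 2 * ceil_log2(n)


def lines_with_extra_line(n):
    """Binary data lines plus one accompanying line that lags the data."""
    return ceil_log2(n) + 1


def two_line_encode(value, n):
    """Return (bit index, rail) pairs naming the line signalled for each bit."""
    bits = ceil_log2(n)
    return [(i, (value >> i) & 1) for i in range(bits)]


n = 10
print(lines_unary(n), lines_two_per_bit(n), lines_with_extra_line(n))  # 10 8 5
print(two_line_encode(5, n))  # [(0, 1), (1, 0), (2, 1), (3, 0)]
```

Only the first two encodings stay within the delay-insensitive model; the third relies on matched delays within the bundle, as noted above.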
Another generalization is possible in which concurrent input signals are given relative priorities which would govern their order of assimilation by a module. It [29] and, in a loose sense, the "cells" or "locations" of [18], [19].
It may be noted that for each of the modules described herein, there is a representation of the state-transition table in terms of a "Petri net," as described in [15], provided that we impose on the latter a condition which is the equivalent of the finite-blocking property. This means that Petri nets are, in a sense, universal for the representation of such structures. Other investigations have shown that Petri nets, properly restricted, can be implemented using modules such as the ones described here [25], [30].
Another variation allows a line to be connected to only two modules, but allows it to be both input and output to each [4]. Condition 1 is therefore removed. This temporarily defies intuition, until we view it in the light of paired signalling conventions using transitions, such as discussed in [11], [23]. Still more complex lines are discussed in [25], [30]. The author is convinced that the answer to questions concerning universality will not be found in traditional investigations of universal switching elements, as such investigations have not striven for the properties deemed necessary here, e.g., speed-independence, delay-insensitivity, etc. The work by Muller [24] perhaps comes closest to these desiderata, but since it was concerned with "autonomous" and, in a sense, "serial" networks, it is not clear that the results are relevant to the problem presented. Another discussion, more similar to the present one, is found in Petri [26]. A preliminary negative result is cited in [10].
Finally we mention again that problems of this nature also relate to problems encountered in "concurrent programming" of computers. Since the flow of control in asynchronous modular networks is similar to that in concurrent programming, answers to questions presented here can yield answers to the questions of the adequacy of programming language constructs in effecting parallel computations. The questions we ask here also seem to ask something about the fundamental nature of asynchronous concurrent processes, apart from the physical realm of one implementation. For example, although the mutual exclusion problem [12], [13], [20], a form of arbitration, appears to be solved for software processes, what has in fact happened is that the problem has been pushed to a lower level. That is, if the processors are truly asynchronous, there must be an arbitrating device between them and the memory. Many readers will undoubtedly be aware of other such similarities.
ACKNOWLEDGMENT
The author wishes to thank the anonymous referee and J. Banning and D. Carroll for their comments and suggestions.
