Ebergen, J.C., Arbiters: an exercise in specifying and decomposing asynchronously communicating components, Science of Computer Programming 18 (1992) 223-245.
Introduction
As computations are distributed over more and more processes to achieve a high degree of parallelism, a growing need arises for primitives that realize the proper synchronization and communication between these processes. One of these primitives has to guarantee the mutually exclusive access of processes to their critical sections. It has to arbitrate among a number of concurrent requests of the processes for entering their critical section. Only one of these requests may be granted at a time. A hardware primitive that realizes such a function is called an arbiter.
Unfortunately, circuit realizations of arbiters exhibit the fundamental problem of metastable behaviour [1]. This means that there may be an indefinite delay before a decision is made which of the pending requests will be granted. As a consequence, arbiters cannot be used safely in purely synchronous circuits, where decisions must be reached within a fixed clock period. Therefore, most circuits in which arbiters are used are a special type of asynchronous circuits called speed-independent circuits [14].
A speed-independent circuit is a network of basic elements of which the correctness is independent of delays in the response times of the basic elements. In case the correctness of the network is independent of the response times of the connection wires as well, we say the circuit is a delay-insensitive circuit [13]. The usefulness, flexibility, and potential of these types of asynchronous circuits have been demonstrated by many authors [10, 13, 16] and, most recently, by Ivan Sutherland in his 1988 Turing Award lecture [18]. It is believed that these circuits form a challenging new area of circuit design which differs considerably from the traditional design techniques for synchronous circuits. One of the most challenging tasks is to develop a formalism and notation for the design of these circuits.
Several notations and formalisms are used to specify the communication behaviour of asynchronous circuits. Petri Nets and Signal Transition Graphs are widely used [3, 5, 12]. State graphs are also used, but they become unmanageable in case a high degree of parallelism is involved. The notation of CSP [7] has been used successfully by Martin and others [10, 20]. In this paper we use a simple notation similar to regular expressions and CSP. We show that communication behaviours can be specified succinctly in this notation, and that such specifications may even help in deriving decompositions in a calculational style.
The design approach is based on trace theory, an event-based formalism without any time metric. It was developed for the specification and design of parallel computations and delay-insensitive circuits [2, 6, 15, 19, 21]. A variant of trace theory is used by Dill, who was the first to design an automatic verifier for speed-independent circuits [4]. In our formalism we reason about circuit elements as (abstract) components that communicate asynchronously. The communication behaviours are represented by sequences of occurrences of events, also called traces. The communication events and the relative ordering of their occurrences are the only topics of interest. We will not consider any gate or switch level implementation of our components nor give a quantitative analysis of the delays in a circuit.
We illustrate the method by the design of some arbiters. We first formally specify the communication behaviour of the four-phase arbiter. We then present a decomposition of the general arbiter, which arbitrates among $n \ge 2$ requests, into basic arbiters, which arbitrate between only two requests. The decomposition is based on the idea of a simple token ring. The token-ring idea has been applied by many authors [8-10], sometimes with similar circuits as a result, using different formalisms.
The design of a general arbiter is interesting for several reasons. First of all, an arbiter is a circuit that exhibits both parallel and nondeterministic behaviour. Therefore, it is a nice example to illustrate how parallel and nondeterministic behaviour are dealt with in the formalism and notation presented. Secondly, the example illustrates nicely how one can derive a decomposition in a calculational style, in particular when parallelism is involved. Thirdly, several arbiter decompositions have been given in the literature which turned out to have errors [4]. This shows that finding arbiter decompositions is indeed a nontrivial and challenging task.
Let us illustrate the mechanistic interpretation with the specifications of the MERGE and the TOGGLE given in Fig. 3. The environment of the MERGE initially may produce either an input a or an input b; the component may then produce an output c, after which the environment may produce an input again, and this behaviour repeats. The TOGGLE distinguishes the odd and even occurrences of the input a: after every odd occurrence of a it may produce output b, and after every even occurrence of a it may produce output c. Here too, there is a strict alternation of inputs and outputs, where the environment may start by producing an input.
Sequential behaviours
From the mechanistic interpretation it follows that a specification is a prescription for both component and environment.
Because of the inclusion of the environment prescription, we can specify the conditions under which correct (component) behaviour must be guaranteed.
Parallel behaviour
A component which involves parallel behaviour is the (Muller) C-ELEMENT. A C-ELEMENT has two input terminals and one output terminal. As a logical circuit it is often specified as follows: if both inputs are 1 (0), then the output will become 1 (0), otherwise the output remains the same. In our formalism, we prescribe a special communication behaviour for the C-ELEMENT. This communication behaviour is given in the state graph of Fig. 4 together with a schematic of the C-ELEMENT.
In terms of a command, this C-ELEMENT can be specified by pref *[(a?; b? | b?; a?); c!]. In order to give concise specifications when parallel behaviour is involved, we introduce the operation weaving. Formally, the weave $E_0 \parallel E_1$ of two trace structures represented by the commands $E_0$ and $E_1$ is defined by
$E_0 \parallel E_1 = \langle\, iE_0 \cup iE_1,\; oE_0 \cup oE_1,\; \{\, t \in (aE_0 \cup aE_1)^* \mid t \downarrow aE_0 \in tE_0 \wedge t \downarrow aE_1 \in tE_1 \,\} \,\rangle$
where $t \downarrow B$ denotes the trace t projected on alphabet B, i.e., the trace t from which all symbols not in B have been deleted. We stipulate that weaving has highest priority, then concatenation, and then union. Notice that, in a weave, common symbols must match. One could also say that weaving expresses "parallel behaviour with synchronization on common symbols". There are two special cases of weaving $E_0$ and $E_1$:
(1) if $aE_0 \cap aE_1 = \emptyset$, weaving amounts to interleaving or shuffle;
(2) if $aE_0 = aE_1$, weaving amounts to intersection.
Weaving is commutative, associative, and has $\varepsilon$ as identity, i.e., $E \parallel \varepsilon = E$.
A weave of two trace structures can also be seen as the "conjunction" of two behaviours: every behaviour that is in accordance with $E_0$ and $E_1$ is contained in the weave and vice versa. We use this property for specifying communication behaviours that have to satisfy several requirements.
For each of the requirements we then specify a communication behaviour and, subsequently, take the weave of these behaviours as the complete specification. For example, the C-ELEMENT can be considered as a conjunction of two behaviours: one behaviour that prescribes the alternation of a's and c's and one behaviour that prescribes the alternation of b's and c's.
The reader may be tempted to interpret the weave as the (parallel) composition of components in the sense of "connecting the circuits specified by the weavands". We emphasize, however, that weaving should be considered here solely as an operation to construct trace structures for expressing the communication behaviour of one component.
The (de)composition of components is discussed later.
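The projection and weaving operations can be tried out on small, finite trace sets. The following Python sketch is illustrative only: the function names, the representation of traces as tuples, and the length bound `max_len` are assumptions made for the example and are not part of the formalism. It builds the C-ELEMENT behaviour as the conjunction of "a and c alternate" and "b and c alternate".

```python
def project(trace, alphabet):
    """Return the trace with every symbol outside `alphabet` deleted."""
    return tuple(s for s in trace if s in alphabet)


def prefixes_of_cycle(cycle, max_len):
    """Prefix-closed trace set of a repeated cycle, truncated at max_len."""
    return {tuple(cycle[i % len(cycle)] for i in range(n))
            for n in range(max_len + 1)}


def weave(alpha0, traces0, alpha1, traces1, max_len):
    """Traces over alpha0 | alpha1 (length <= max_len) whose projections
    on alpha0 and alpha1 lie in traces0 and traces1 respectively."""
    alphabet = sorted(alpha0 | alpha1)
    result, frontier = set(), {()}
    while frontier:
        result |= frontier
        # Trace sets are prefix-closed, so longer traces are found by
        # extending shorter ones with a single symbol.
        frontier = {
            t + (s,)
            for t in frontier if len(t) < max_len
            for s in alphabet
            if project(t + (s,), alpha0) in traces0
            and project(t + (s,), alpha1) in traces1
        }
    return result


# C-ELEMENT as a conjunction: a/c alternate and b/c alternate.
c_element = weave({"a", "c"}, prefixes_of_cycle(("a", "c"), 6),
                  {"b", "c"}, prefixes_of_cycle(("b", "c"), 6), 6)
```

In this sketch the common symbol c must match in both behaviours, so c can only occur after both a and b have occurred, in either order.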
The four-phase arbiter
A basic component that exhibits both parallel behaviour and nondeterministic behaviour is the four-phase arbiter. The basic four-phase arbiter communicates with two processes, process 0 and process 1 say. Each process is connected to the four-phase arbiter by four terminals. For process 0, we denote these terminals by the symbols r0, g0, f0, and a0, according to the following interpretations that are associated with them: r0, request for grant; g0, grant; f0, release (or free) the grant; a0, acknowledgement of release.
A similar interpretation holds for the symbols r1, g1, f1, and a1, but now related to process 1. The specification of the four-phase arbiter by means of a command is given in Fig. 5. The command for the four-phase arbiter can be explained as follows. First, we consider the communication with process 0 in isolation, i.e., we consider the symbols r0, g0, f0, and a0 only. With respect to these communication actions the behaviour is a repetition of request, grant, release, and acknowledgement of release. This behaviour is expressed in the first line of the command. A similar reasoning applies to the communication behaviour with respect to the symbols r1, g1, f1, and a1, which gives rise to the second line of the command.
Finally, we have to specify that the processes have mutually exclusive access to their critical sections, which is the only synchronization requirement between the two communication behaviours. Each process is in its critical section between the grant and the successive release of the grant. The mutual exclusion condition then amounts to requiring that the parts g0!; f0? and g1!; f1? do not overlap, i.e., either process 0 is in its critical section or process 1 is in its critical section. This leads to the third line of the command. Notice that only the third line introduces the nondeterminism in the behaviour. The complete specification of the four-phase arbiter is simply the conjunction of the three behaviours specified above: the communication behaviour between the arbiter and process 0, the communication behaviour between the arbiter and process 1, and the behaviour to guarantee mutual exclusion.
Since "conjunction" of behaviours is conveniently expressed by weaving, we obtain the command of Fig. 5.
It is also possible to specify a four-phase arbiter that starts in a state different from 00. For example, the four-phase arbiter that starts in state 02 is specified in Fig. 7. In contrast to the arbiter of Fig. 5, the arbiter of Fig. 7 ... The schematic of this four-phase arbiter is also given in Fig. 7.
Remark. Circuit implementations of the four-phase arbiter can be made with a so-called ME element [16], MERGES, and TOGGLES.
From the specification of the basic four-phase arbiter in Fig. 5 we can easily construct a specification for a four-phase arbiter that arbitrates among n processes, n > 0. For example, for n = 3 we obtain the specification
pref *[r0?; g0!; f0?; a0!]
|| pref *[r1?; g1!; f1?; a1!]
|| pref *[r2?; g2!; f2?; a2!]
|| pref *[g0!; f0? | g1!; f1? | g2!; f2?]
Notice that these specifications are linear in n, while state graphs for such arbiters are exponential in n. Another attractive property of this command is that the specification concerns with respect to parallel behaviour and nondeterministic behaviour are clearly separated.
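The structure described above (one request, grant, release, acknowledge cycle per process, plus one mutual-exclusion line) can be explored mechanically. The Python sketch below is illustrative only: the function names, the encoding of each line as a small state machine, and the state labels are assumptions made for the example, not notation from the paper. It forms the weave as a synchronized product and checks that no reachable state has two processes between grant and release.

```python
def arbiter_components(n):
    """One (alphabet, transitions, start) triple per line of the command."""
    comps = []
    for i in range(n):
        cycle = [f"r{i}", f"g{i}", f"f{i}", f"a{i}"]
        trans = {(k, cycle[k]): (k + 1) % 4 for k in range(4)}
        comps.append((set(cycle), trans, 0))
    # Mutual-exclusion line: the parts g_i; f_i may not overlap.
    mutex = {}
    for i in range(n):
        mutex[("idle", f"g{i}")] = i
        mutex[(i, f"f{i}")] = "idle"
    mutex_alphabet = {f"{c}{i}" for i in range(n) for c in "gf"}
    comps.append((mutex_alphabet, mutex, "idle"))
    return comps


def reachable_states(comps):
    """Joint states of the weave: components step together on common symbols."""
    alphabet = set().union(*(alpha for alpha, _, _ in comps))
    start = tuple(s for _, _, s in comps)
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for sym in alphabet:
            nxt = []
            for (alpha, trans, _), s in zip(comps, state):
                if sym not in alpha:
                    nxt.append(s)            # component not involved
                elif (s, sym) in trans:
                    nxt.append(trans[(s, sym)])
                else:
                    break                    # symbol not allowed here
            else:
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return seen


def mutually_exclusive(n):
    """True if at most one process is between grant and release (state 2)."""
    return all(sum(1 for i in range(n) if state[i] == 2) <= 1
               for state in reachable_states(arbiter_components(n)))
```

Under this encoding the number of transitions written down grows as 6n, whereas the set of reachable joint states grows much faster, which mirrors the remark about linear commands versus exponential state graphs.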
A tentative token-ring decomposition
In order to become more familiar with specifying communication behaviours by means of commands, we discuss a tentative token-ring decomposition for a four-phase arbiter arbitrating among n processes, $n \ge 2$. The decomposition is based on the following idea. We have a ring-wise connection of n components in which a token is traveling clockwise from component to component. Each component communicates with its neighbours in the ring and with one process. The components are called token-ring interfaces. A process requests the token from the token-ring interface in order to enter its critical section. If the token-ring interface receives the token and there is a pending request, it may grant the token to the process. Otherwise the token is sent on to the next token-ring interface. Upon exit from the critical section, the process releases the token to the token-ring interface. Since there is only one token in the ring, at most one process can be in its critical section. Accordingly, mutual exclusion is guaranteed. For n = 3 the tentative decomposition into token-ring interfaces is depicted in Fig. 8.
The first line specifies the communication behaviour between the token-ring interface and process 0. The second line specifies the communication behaviour of the token-ring interface with its neighbours.
The third line specifies the mutual exclusion condition and can be explained as follows. After the token-ring interface has received the token, it repeatedly arbitrates between either sending the token to the next token-ring interface (and waiting until the token is received again) or granting the token to process 0 (and waiting until the token is released). Process 0 is in its critical section in the part g0!; f0?, and in the part n1!; n0? one of the other processes may be in its critical section. A similar property holds for the other token-ring interfaces. We use this freedom in manipulating commands later for the verification of the decomposition.
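The token-ring idea can be illustrated with a small randomized simulation. The Python sketch below is a simplification under stated assumptions: it abstracts from the individual communication symbols, and the function name, the action names, and the seeded scheduler are hypothetical. Each step picks one enabled action at random: a process requests, the interface holding the token either passes it clockwise or grants it to its own pending process, or a process releases the token.

```python
import random


def simulate_token_ring(n, steps, seed=0):
    """Randomly schedule a ring of n token-ring interfaces; return grant counts."""
    rng = random.Random(seed)
    pending = [False] * n   # process i has requested and is not yet granted
    in_cs = [False] * n     # process i is in its critical section
    token = 0               # interface holding the token, None while lent out
    grants = [0] * n
    for _ in range(steps):
        enabled = [("request", i) for i in range(n)
                   if not pending[i] and not in_cs[i]]
        enabled += [("release", i) for i in range(n) if in_cs[i]]
        if token is not None:
            enabled.append(("pass", token))
            if pending[token]:
                enabled.append(("grant", token))
        action, i = rng.choice(enabled)
        if action == "request":
            pending[i] = True
        elif action == "grant":
            pending[i], in_cs[i], token = False, True, None
            grants[i] += 1
        elif action == "release":
            in_cs[i], token = False, i
        else:  # pass the token clockwise to the next interface
            token = (i + 1) % n
        # Only one token exists, so at most one process holds it.
        assert sum(in_cs) <= 1, "mutual exclusion violated"
    return grants
```

Because the interface chooses nondeterministically between passing and granting, nothing in this sketch forces a pending request to be served eventually; it only demonstrates mutual exclusion.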
Correctness criteria
Now that we have a tentative decomposition, we formulate the conditions we have to verify in order to conclude that the decomposition is correct. In this section we discuss four conditions, which are based on the abstract mechanistic interpretation we gave in Section 3. Informally, a network of components forms a decomposition of a component E, if this network may produce any trace in tE, provided the environment of this network produces the inputs as specified in E. Furthermore, in the network of components no computation interference may occur. We formulate the conditions for the tentative decomposition given in Fig. 8. Let E be the arbiter arbitrating among three processes as given in Section 5, let $E_i$ be the ith token-ring interface for $1 \le i \le 3$, and let $E_4$ denote the IWIRE. Two conditions concern the so-called structure of the network and two conditions concern the behaviour of the network. We first discuss the conditions on the structure of the network.
In the network $(E_0, E_1, E_2, E_3, E_4)$ there must be no dangling inputs and outputs, i.e., every input is connected to an output and every output is connected to an input. In formula:
$(\cup\, i : 0 \le i < 5 : oE_i) = (\cup\, i : 0 \le i < 5 : iE_i)$. (1)
If (1) holds, we say that the network $(E_0, E_1, E_2, E_3, E_4)$ is closed.
The second condition is that outputs of distinct components are not connected to each other. In formula:
$oE_i \cap oE_j = \emptyset$ for $0 \le i, j < 5$ and $i \ne j$. (2)
If (2) holds, we say that the network is free of output interference. (Notice, however, that inputs may be connected to each other.) Condition (2) guarantees that each symbol can be produced by at most one component.
Conditions (1) and (2) together guarantee that each symbol can be produced by exactly one component and be received by at least one component.
Conditions (1) and (2) are conditions on the structure of the network and are formulated in terms of the alphabets of the trace structures. Conditions (1) and (2) are satisfied by the network $(E_0, E_1, E_2, E_3, E_4)$, as can be verified easily. The next two conditions are behavioural conditions; they are phrased in terms of the trace sets and the alphabets.
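Both structural conditions are checks on alphabets only, so they are easy to mechanize. The Python sketch below is illustrative: the function names and the representation of a component as an (inputs, outputs) pair are assumptions made for the example. The sample network is a hypothetical one, a WIRE from a to b split into two WIRES through a fresh symbol x, with the first entry playing the role of the environment (inputs and outputs of the specified WIRE swapped).

```python
def is_closed(components):
    """Condition (1): the union of all outputs equals the union of all inputs."""
    inputs = set().union(*(ins for ins, _ in components))
    outputs = set().union(*(outs for _, outs in components))
    return outputs == inputs


def free_of_output_interference(components):
    """Condition (2): no symbol is an output of two distinct components."""
    seen = set()
    for _, outs in components:
        if seen & outs:
            return False
        seen |= outs
    return True


# Hypothetical example network: (inputs, outputs) per component.
network = [
    ({"b"}, {"a"}),   # environment: produces a, receives b
    ({"a"}, {"x"}),   # first WIRE
    ({"x"}, {"b"}),   # second WIRE
]
```

For this example both checks succeed; removing the second WIRE would leave x as a dangling output, and adding a second producer of x would introduce output interference.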
The third condition requires that the environment prescription for any component in the network may not be violated. We can simulate the network by generating traces of symbols, representing joint behaviours of the components in the network. Formally, we construct the trace set X of all joint behaviours as follows. Initially, $X = \{\varepsilon\}$. Choose a trace t, symbol z, and index i, $0 \le i < 5$, such that $t \in X \wedge z \in oE_i \wedge tz \downarrow aE_i \in tE_i$ holds. In other words, after joint behaviour t, component $E_i$ can produce output z.
If for all j, $0 \le j < 5$, we have $tz \downarrow aE_j \in tE_j$, then we add tz to X. In other words, if all other components can accept z, i.e., their environment prescription is not violated, then tz is a joint behaviour as well. If some component cannot accept z, we stop the simulation and say that the network has computation interference. Our third condition is:
The network is free of computation interference.
When the network is free of computation interference, X represents the set of all traces that can be constructed with the above simulation.
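The simulation just described can be sketched for finite-state components. The Python code below is a minimal illustration under stated assumptions: every component is given as a deterministic state machine (so that joint states can stand in for traces), the network is assumed to be free of output interference, and the dictionary layout and function names are hypothetical. It reports computation interference as soon as some component produces an output that another component with that symbol in its alphabet cannot accept.

```python
def wire(inp, out):
    """WIRE: strict alternation of input `inp` and output `out`."""
    return {"in": {inp}, "out": {out}, "start": 0,
            "trans": {(0, inp): 1, (1, out): 0}}


def has_computation_interference(network):
    """Explore joint behaviours; True if some output cannot be accepted."""
    start = tuple(comp["start"] for comp in network)
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for i, comp in enumerate(network):
            for z in comp["out"]:
                if (state[i], z) not in comp["trans"]:
                    continue  # component i cannot produce z in this state
                nxt = list(state)
                for j, other in enumerate(network):
                    if z in other["in"] | other["out"]:
                        if (state[j], z) not in other["trans"]:
                            return True  # component j cannot accept z
                        nxt[j] = other["trans"][(state[j], z)]
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return False


# Environment that waits for b before producing the next a.
patient_env = {"in": {"b"}, "out": {"a"}, "start": 0,
               "trans": {(0, "a"): 1, (1, "b"): 0}}
# Environment that may produce a again without waiting for b.
hasty_env = {"in": {"b"}, "out": {"a"}, "start": 0,
             "trans": {(0, "a"): 0, (0, "b"): 0}}

ok_network = [patient_env, wire("a", "x"), wire("x", "b")]
bad_network = [hasty_env, wire("a", "x"), wire("x", "b")]
```

In the second network the environment can send a while the first WIRE is still waiting to produce x, which violates that WIRE's environment prescription and is reported as computation interference.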
A less operational, and perhaps more formal, formulation of absence of computation interference can be given as follows. In case there is no computation interference, the joint behaviour of the network is equivalent to the trace set of the weave of all components in the network. In formula, X = W, where $W = t(E_0 \parallel E_1 \parallel E_2 \parallel E_3 \parallel E_4)$. Furthermore, if there is computation interference for certain t, z, and i, then it follows that the above property does not hold for this t, z, and i as well. Therefore, absence of computation interference and the above property are equivalent.
The fourth condition is that every trace of the component specified (here E) may also occur in the simulation.
When no computation interference occurs, the joint behaviour of the network can be represented by W (or X). The fourth condition then becomes:
$W \downarrow aE = tE$, (4)
i.e., the behaviour of the network with respect to the alphabet of E is exactly the trace set of E. Condition (4) excludes, for example, decompositions of the general arbiter where only process 0 would be granted and never process 1 or 2. This condition also excludes the decomposition of components into the so-called "accept-everything-do-nothing" module, i.e., a component that accepts every possible input but never produces any output. On the other hand, although condition (4) requires that each trace in tE may occur in the simulation, it does not require that some traces are guaranteed to occur. Consequently, neither fairness nor absence of deadlock or livelock is guaranteed by condition (4) (as we shall see later). For this reason, other works on asynchronous circuit design [2, 4] have restricted themselves to conditions (1), (2), and (3) only.
In this paper we consider the above four conditions as our correctness criteria for a decomposition. They can be generalized naturally to any network of components.
We mention two properties of decomposition which can be readily verified. The first property states that any component can be decomposed into itself, i.e., for any component E we have the identity decomposition:
$E \rightarrow (E)$.
The second property is that in any decomposition we can introduce components $\varepsilon$ without invalidating the decomposition. For example, we have $E \rightarrow (E, \varepsilon)$. Component $\varepsilon$ can be seen as the "identity" component: it has no communication terminals and it does not do anything.
Notice also that the ordering of components in a decomposition is immaterial: if $E \rightarrow (E_1, E_2)$, then also $E \rightarrow (E_2, E_1)$.
As an example of a decomposition, a WIRE can be decomposed into three other WIRES. Although this is a rather trivial decomposition, verifying the correctness of the token-ring decomposition essentially boils down to verifying a decomposition like this, as we shall see.
Two theorems on decomposition
Verifying the four conditions of decomposition can be automated. David Dill [4] has developed an automatic verifier that checks the first three conditions. Such an algorithm basically constructs a finite state graph for the joint behaviour of the network from the state graphs of the components.
Unfortunately, the state graph for the joint behaviour can be exponential in n, where n is the number of components in the network. Consequently, the time complexity of such a verification algorithm can be exponential in n as well. In the case of our arbiter decomposition, where there is a high degree of parallelism, a straightforward verification would indeed be exponential in n. Fortunately, we have two theorems on decomposition that enable us to verify decompositions more efficiently. One theorem can be characterized as "decomposition by stepwise refinement" and one theorem can be characterized as "decomposition by partwise refinement".
The first theorem, which enables us to decompose a component by stepwise refinement, is called the Substitution Theorem. The theorem expresses that in a decomposition in which a component, say $E_2$, is used, we may safely substitute component $E_2$ by one of its decompositions, provided that the symbols introduced in the decomposition of $E_2$ are fresh symbols.
Theorem 1 (Substitution Theorem). Let components $E_0$, $E_1$, $E_2$, $E_3$, and $E_4$ satisfy $E_0 \rightarrow (E_1, E_2)$ and $E_2 \rightarrow (E_3, E_4)$.
If, furthermore, $(aE_0 \cup aE_1) \cap (aE_3 \cup aE_4) = aE_2$,
then $E_0 \rightarrow (E_1, E_3, E_4)$.
Notice that the condition on the alphabets is essentially a void condition, since it can always be satisfied by an appropriate renaming of the symbols introduced in the decomposition of $E_2$. Theorem 1 applies to decompositions into two components only. The generalization of this theorem to decompositions into an arbitrary number of components is straightforward and is omitted here. (A proof of the Substitution Theorem can be found in [6].)
The Separation Theorem allows us to find a decomposition by partwise refinement. The theorem expresses that if a component is specified by a weave $E \parallel F$, we can first try to find decompositions for the parts E and F and then find a decomposition for $E \parallel F$ by collecting all commands in the decompositions and weaving those commands that have common outputs.
We use the notation
$E \rightarrow (i : 1 \le i < n : E_i)$
to stand for "component E can be decomposed into the network of components $E_i$, $1 \le i < n$".
Theorem 2 (Separation Theorem). Let components E, F, $E_i$, $F_i$, with ...
The Separation Theorem can be generalized to decompositions of components expressed by weaves of more than two commands. The generalization is straightforward. A proof of the Separation Theorem can be found in [6].
Verification of token-ring decomposition
The token-ring decomposition can be verified, and perhaps derived, in a calculational style by application of the Separation Theorem.
Since E is written as a weave of four commands, we obviously try to find decompositions for each of these parts and then apply the Separation Theorem. E is written as E = F. ) I Notice that EE,JaE, = E,, and that G,, where G,, is the command EE, with !x? and !y? replaced by x! and y? respectively, is a specification for a four-phase arbiter similar to the one in Fig. 7. Applying the Separation Theorem to these decompositions, we find the decomposition.
From the decomposition it follows that each time a token is received, the four-phase arbiter is released for an indefinite period to allow for a pending request to be granted. In case there is no pending request, or after the process has released the token, the request y? will eventually result in a grant n1!, i.e., the sending of the token to the next token-ring interface. By application of the Substitution Theorem, we may substitute the decomposition of Fig. 9 into the three token-ring interfaces of Fig. 8. Thus, by stepwise and partwise refinement, we obtain a decomposition of a four-phase arbiter, arbitrating among n processes, n > 0, into n basic four-phase arbiters (with a special initialization), n WIRES, and one IWIRE as depicted in Fig. 10.
After a request by process 0, the token may be sent on to the next token-ring interface repeatedly without ever being sent to process 0. This phenomenon, where internal actions do take place but no external actions are performed, is known as livelock [7]. Absence of livelock is not required by our definition of decomposition.
Livelock-free decompositions of arbiters have been given by Seitz [17] and Martin [11]. Both initial solutions, however, contained errors. The corrected versions can be found in [4].
Concluding remarks
We have illustrated a method for the specification and decomposition of asynchronously communicating components by the design of a component for a well-known difficult problem: the construction of large arbiters from small ones. The exercise yielded a surprisingly simple outcome and the decomposition could be verified, and perhaps derived, with relative ease.
The program notation of commands allowed for a concise specification of communication behaviours of components. The parallel behaviours of arbiters could be specified conveniently by a conjunction of specific behaviours, i.e., by a weave of commands.
Had we used state graphs instead, we would have had to analyze much larger specifications.
The formalism also enabled us to design a component by stepwise refinement, thanks to the Substitution Theorem, and by partwise refinement, thanks to the Separation Theorem. The Separation Theorem demonstrates that the weaving operator is helpful not only for finding succinct specifications of components, but also for finding decompositions of components. The syntax of commands may be of assistance in the derivation of a decomposition: we can try to manipulate and expand a command, by rewriting and inserting internal symbols at particular places, in such a way that the decomposition can be recognized in the expansion.
There may be many places where the internal symbols can be inserted, and finding the appropriate places is still a bit of magic. By carefully studying a number of exercises, we hope to develop some heuristics for deriving decompositions. The manner in which we have derived a decomposition shows that the task of finding decompositions is very similar to the task of programming. Programs also are derived by stepwise or partwise refinement, by rewriting, and by introducing local variables. Deriving circuit designs in a similar calculational style may help to master the ever-increasing complexity in VLSI design.
