Citation for published version (APA): Rem, M. (1990). The nature of delay-insensitive computing. (Computing science notes; Vol. 9020). Eindhoven: Technische Universiteit Eindhoven.
Introduction
Almost all digital circuits contain clocks; not the type of clock that tells the time, but rather more like metronomes: in its simplest form a clock produces a periodic signal that alternates between a low and a high voltage level. Its high and low going transitions are used to synchronize different parts of the circuit. Now imagine that the circuit has an input wire whose voltage level is sensed during the period when the clock is high, i.e. from a high going to the next low going transition. This sensing is done by producing the logical conjunction of the levels of the input wire and the clock. The result is stored in a flip-flop. A flip-flop is a device with two stable states; it enters one of these states depending on the level of the voltage it is offered.
If the input wire that is sensed happens to make a high going transition towards the end of the clock period, the voltage produced may be just a small 'runt' pulse, cf. Fig. 1. If the flip-flop is offered such a marginal pulse, it may linger for a while in a metastable state before entering one of its stable states. Unfortunately, there is no upper bound for the time the flip-flop may stay in the metastable state. This phenomenon is known as the metastability phenomenon [3, 13]. It is sometimes referred to as the glitch phenomenon.
It is essential for clocked circuits that the clock period be chosen sufficiently long to guarantee that all parts of the circuit stabilize within the clock period. The metastability phenomenon obviously conflicts with this timing constraint.
The example above exhibits metastability in the presence of asynchronous inputs, but metastability also arises in arbitration and synchronization. An arbiter is a device that is used to establish mutual exclusion among asynchronous requests. A synchronizer is a device that delays an asynchronous input in such a way that it is synchronized with another signal. The latter is usually the clock. Both arbiters and synchronizers can be realized only if we impose no upper bound on the time they take to produce their outputs. In essence, they do not produce their outputs until they have left the metastable states they possess.
In delay-insensitive systems we accept the fact that the durations of subcomputations may be unbounded. We, therefore, do not use an autonomous clock to synchronize the parts, but we have the different components of the system signal their completion explicitly [1]. We are aware that it may take quite some time before completions are signaled, but we cater to this by designing the system in such a way that its correct functioning does not depend on these delays.
A system consists of components and connecting wires. It is called delay-insensitive if it functions correctly under arbitrary and possibly varying delays in components and wires. Of course, the delays will affect the operating speed of the system, but this is not considered part of the 'correct functioning'. The type of correctness we do have in mind will be made precise in the sequel.
Communicating data
In order to acquire an operational appreciation of delay-insensitivity, we discuss the problem of delay-insensitively communicating data from one component to another. The problem is to send one bit of information from component S to component R, cf. Fig. 2.
As a first try, we connect the components by two wires: wire v to convey the bit, and wire r to signal that the data have been sent. The latter is known as a 'data valid' signal. Initially both wires are low. Component S first gives wire v the value of the bit to be communicated; after that it makes wire r high. Component R waits until wire r is high, after which it copies (for instance, into a flip-flop) the value of wire v.
The above scheme will solve the problem only if we know that the delay in wire v does not exceed that in wire r. Such a delay assumption, known as a 'bundling constraint', cannot, of course, be made if we want the communication to be delay-insensitive.
The solution is to code the bit to be communicated in such a way that R can detect its arrival [20]. This requires at least two wires to convey the bit: one wire can only have two states (low and high), but we need a third state to indicate the absence of a value. Dual-rail encoding is a technique that uses two wires per bit, cf. Fig. 3. The absence of a value is coded by two low wires. Value 0 is sent by making wire v0 high, and value 1 by making v1 high. The two wires are never high simultaneously.
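As a minimal sketch of the dual-rail code just described, the Python functions below map a bit to the levels of the two wires and back. The function names and the constant EMPTY are illustrative assumptions, not notation from the text; rejecting the state with both wires high reflects the rule that the two wires are never high simultaneously.

```python
EMPTY = (0, 0)  # both wires low: no value is present


def encode(bit):
    """Return the levels (v0, v1) that carry one bit in dual-rail code."""
    if bit not in (0, 1):
        raise ValueError("a bit must be 0 or 1")
    return (1, 0) if bit == 0 else (0, 1)


def decode(v0, v1):
    """Return 0 or 1 if a value has arrived, None if the wires are empty."""
    if (v0, v1) == EMPTY:
        return None
    if (v0, v1) == (1, 1):
        raise ValueError("both wires high: not a valid dual-rail state")
    return 0 if v0 == 1 else 1
```

The receiver can thus detect the arrival of a value from the wires themselves: decode returns None until exactly one of the two wires has gone high.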
The above scheme is not very useful if more bits have to be communicated successively: when may we decide that S can again send a bit? The only way out is to have R acknowledge that the bit has been received, cf. Fig. 4. Now the receiver is the one that initiates the communication, viz. by making (request) wire a high. The sender does not start sending the bit until it has received this request. The schemes of active and passive sending are also known as data driven and demand driven, respectively.
C-element
The communication protocols developed above can easily be adapted for sending multiple-bit messages. We employ two wires per bit and extend the protocols straightforwardly, cf. Fig. 6. Since R acknowledges complete messages only, one acknowledge wire suffices.
We have seen that 1-bit messages can be acknowledged by means of an OR-gate. An interesting question is what mechanism we need for 2-bit messages. Consider the case that S is active. One may be tempted to generate signal a as the conjunction of v0 ∨ v1 and w0 ∨ w1, cf. Fig. 7. This implementation, however, is erroneous. A possible sequence of events is
v0↑ ; w0↑ ; a↑ ; v0↓ ; a↓
At this point the sender is allowed to transmit another message. However, the low going transition on w0 is still on its way, which can interfere with the next message. The problem is that the low going transition on a is generated too early. Obviously, the AND-gate should be replaced by an element that does not produce a low going transition on its output until both inputs have gone low.
Such an element is known as a Muller C-element, or simply C-element, cf. Fig. 8. It is sometimes called a last-of or a rendezvous element. If both inputs a and b have equal values, this value is also produced at output c; otherwise c remains what it was. This is a state-holding element: if the values at a and b differ, the value at c equals the last common value of the inputs.
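The difference between the AND-gate and the C-element can be made concrete with a small level-based sketch in Python. The class and function names are illustrative assumptions, and delays are ignored here: the sketch only shows which output level each element settles to for a given pair of input levels.

```python
class CElement:
    """State-holding element: the output follows the inputs only when they agree."""

    def __init__(self):
        self.c = 0  # all wires are initially low

    def update(self, a, b):
        if a == b:
            self.c = a  # equal inputs: their common value appears at the output
        return self.c   # differing inputs: the output keeps its previous value


def and_gate(a, b):
    """Plain conjunction, for comparison."""
    return a & b


# Input levels seen by the acknowledge logic: one input goes high, then the
# other, then the first goes low again while the second is still high.
levels = [(1, 0), (1, 1), (0, 1), (0, 0)]

c_element = CElement()
and_outputs = [and_gate(x, y) for x, y in levels]
c_outputs = [c_element.update(x, y) for x, y in levels]
```

With these input levels the AND-gate output drops as soon as the first input goes low, whereas the C-element output stays high until both inputs are low.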
A C-element is often used to synchronize different components, cf. Fig. 9. Components P and Q have to be synchronized to accomplish 'mutual inclusion', i.e., they each have a synchronization point at which they must wait for the other component to reach its synchronization point. This can be realized by the following protocol for P:
and similarly for Q. Statement S represents the part that is executed in mutual inclusion with component Q.
Think transitions
Above we have tried to give a conventional description of a C-element, viz. by giving how the output values depend on the input values. Such descriptions, however, are not very adequate for use in delay-insensitive systems. In delay-insensitive systems the transitions are the important events, and what should be specified are the possible orders in which these events may take place [15]. For the C-element these possible orders may be specified by the following behavioral expression:
(a↑, b↑ ; c↑ ; a↓, b↓ ; c↓)*
It expresses that first input wires a and b go high (the comma, which takes priority over the semicolon, expresses concurrency), after which output wire c goes high (the semicolon expresses order), which is followed by a and b going low, after which c goes low. From then on it starts all over again (the asterisk expresses repetition).
The assumption is again that initially all wires are low. If we neglect the directions of the transitions the above expression may be written as
(a,b;c)*
We draw a scheme that shows how the values on the output wires depend on those on the input wires, writing 'low' as 0 and 'high' as 1:
The fact that we have different output values for the same input combination shows that C-elements are indeed sequential (or state-holding) elements.
A behavioral expression specifies an interface between a component and its environment. It specifies when the component may produce output transitions, but it also specifies when its environment may offer input transitions: input transitions are not allowed to arrive at 'wrong moments'. If an input transition arrives 'out of order' this is called computation interference. Now it is becoming clear what we mean by 'correct functioning' of a system. A system consists of components, each specified by the possible orders in which the transitions may occur. The components should be such that the system cannot exhibit computation interference.
In delay-insensitive systems one usually discerns a second correctness requirement, besides absence of computation interference, and that is absence of transmission interference. We speak of transmission interference if there is a connecting wire at which there are at least two transitions simultaneously present. We can phrase transmission interference as a form of computation interference by saying that each wire from point a to point b is a component with (a↑ ; b↑ ; a↓ ; b↓)*, or simply (a ; b)*, as its behavioral expression.
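Viewing a wire as a component with behavior (a ; b)* suggests a simple check, sketched below in Python under the assumption that an event sequence is given as a string in which 'a' marks a transition entering the wire and 'b' a transition leaving it. The function name is hypothetical.

```python
def transmission_interference(events):
    """Return True if two transitions are ever present on the wire at once."""
    in_flight = 0
    for event in events:
        if event == "a":        # a transition enters the wire
            in_flight += 1
            if in_flight > 1:
                return True     # a second transition entered before the first left
        elif event == "b":      # a transition leaves the wire
            if in_flight == 0:
                raise ValueError("a wire cannot deliver a transition it never received")
            in_flight -= 1
        else:
            raise ValueError("unknown event: " + repr(event))
    return False
```

A sequence such as "abab" follows (a ; b)*, while "aab" offers the wire an input 'out of order', which is exactly computation interference for the wire seen as a component.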
The behavioral expression does not give a complete description of what a component 'can do'. Consider, for example, the following expression:
(a? ; c! ; b? ; d!)*
Symbols '?' and '!' specify that a and b are inputs and c and d outputs. We have not mentioned the directions of the transitions. This component can be implemented by just two wires that connect a with c and b with d. The same two wires would, however, also implement, for example,
(a? ; c! ; a? ; c! | b? ; d! ; b? ; d!)*
where the bar denotes the choice-operator, similar to the plus in regular expressions. The bar has a lower priority than the comma and the semicolon.
Next replace in the above expression d by c, so that only one output remains:
(a? ; c! ; a? ; c! | b? ; c! ; b? ; c!)*
This component may be implemented by an OR-gate, as the following table shows:
In contrast to that of the C-element, this table exhibits exactly one output value per input combination. Such processes are called combinational.
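The distinction between sequential and combinational behavior can be checked mechanically. The Python sketch below walks through one trace of a component, toggles wire levels at each transition, and records the output levels seen for each input combination. The function names are illustrative, and the sampling rule (levels are recorded just before each input transition, i.e. when the component is waiting for its environment) is an assumption of this sketch rather than something prescribed by the text.

```python
def io_table(trace, inputs, outputs):
    """Map each observed input combination to the set of output combinations."""
    level = {s: 0 for s in inputs + outputs}   # all wires are initially low
    table = {}
    for symbol in trace:
        if symbol in inputs:
            # The component is quiescent here: sample before the input changes.
            key = tuple(level[s] for s in inputs)
            value = tuple(level[s] for s in outputs)
            table.setdefault(key, set()).add(value)
        level[symbol] ^= 1                      # every event is a transition
    return table


def is_combinational(table):
    """True if every input combination has exactly one output combination."""
    return all(len(values) == 1 for values in table.values())


# One cycle of a C-element: a and b go high, c goes high, b and a go low, c goes low.
c_table = io_table("abcbac", ["a", "b"], ["c"])

# An OR-gate used with one input at a time: the input goes high, c follows,
# the same input goes low, c follows; first for a, then for b.
or_table = io_table("acacbcbc", ["a", "b"], ["c"])
```

For the C-element trace the input combination (1, 0) is seen with output 0 on the way up and with output 1 on the way down, so the table is not combinational; for the OR-gate trace every input combination has a single output value.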
Formal definition of processes and systems
Before giving a formal (operational) definition of delay-insensitivity, we must first define what processes and systems are. We use a simple trace-theoretic model for processes:
A process, sometimes referred to as a directed process, is a triple $(I, O, T)$, in which $I$ is the set of input symbols, $O$ is the set of output symbols, and $T$, the trace set, is a prefix-closed set of traces over $I \cup O$.
Example 1 Consider process $(I, O, T)$ with $I = \{a, b\}$, $O = \{c\}$, and $T = \{\varepsilon, a, b, ab, ba, abc, bac, abca, baca, \ldots\}$, where $\varepsilon$ denotes the empty trace. This process is a C-element. We usually specify it by the behavioral expression (a?,b?;c!)*.
A system is a set of processes, such that each symbol of a process occurs in exactly one process as input symbol and in exactly one process as output symbol. The connecting wires are not modeled explicitly; each symbol represents a wire, running from the process of which it is an output symbol to the process of which it is an input symbol. Thus we have defined what is known as a closed system (no dangling inputs or outputs) with point-to-point connections. Both conditions may be weakened, but the restricted definition suffices for our purposes.
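The trace set of the C-element can be generated from a small state machine. The Python sketch below is one possible encoding, not part of the formal model: the transition table, the state numbering and the function names are assumptions made for illustration. Because a trace is accepted exactly when the machine can follow it from the initial state, the resulting set is prefix-closed by construction.

```python
from itertools import product

# States: 0 = waiting for a and b, 1 = a received, 2 = b received, 3 = both received.
C_ELEMENT = {(0, "a"): 1, (0, "b"): 2, (1, "b"): 3, (2, "a"): 3, (3, "c"): 0}


def in_trace_set(trace, delta=C_ELEMENT, start=0):
    """True if the machine can follow every symbol of the trace in order."""
    state = start
    for symbol in trace:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return True


def traces_up_to(length, alphabet="abc"):
    """All traces of at most the given length; '' stands for the empty trace."""
    return {
        "".join(t)
        for k in range(length + 1)
        for t in product(alphabet, repeat=k)
        if in_trace_set(t)
    }
```

Calling traces_up_to(3) yields the empty trace together with a, b, ab, ba, abc and bac.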
Example 2 Consider the system consisting of four processes specified by
$P_0$: (a?,b?;c!)*
$P_1$: (d! ; e! ; c?)*
$P_2$: (d? ; a!)*
$P_3$: (e? ; b!)*
Process $P_0$ is a C-element. A pictorial impression of the system is shown in Fig. 11.
Definition of delay-insensitivity
Consider a system of n processes: $P_0, P_1, \ldots, P_{n-1}$, where $P_i = (I_i, O_i, T_i)$. The states of the system are the n-tuples $(t_0, t_1, \ldots, t_{n-1})$ with $t_i \in (I_i \cup O_i)^*$. We define the reachable states of the system as follows:
1) $(\varepsilon, \varepsilon, \ldots, \varepsilon)$ is reachable
2) if $(t_0, \ldots, t_i, \ldots, t_{n-1})$ is reachable ($0 \le i < n$) and
$a \in O_i \wedge t_i a \in T_i$ or $a \in I_i \cap O_j \wedge a\#t_j > a\#t_i$
then $(t_0, \ldots, t_i a, \ldots, t_{n-1})$ is reachable
3) no other states are reachable
where $a\#t$ denotes the number of occurrences of symbol $a$ in trace $t$.
The idea behind the above definition is that in state $(t_0, t_1, \ldots, t_{n-1})$ trace $t_i$ is the current trace of process $P_i$. Condition 1) expresses that the initial state is reachable. In the course of a computation current traces are extended only. They can be extended with output symbols and with input symbols. The rule governing these extensions distinguishes output and input. Condition 2) expresses that the current trace of a process may be extended with an output symbol if the extended trace belongs to the trace set of the process. Notice that the prefix-closedness implies that then the current trace was in the trace set as well. The second part of 2) expresses that the current trace may be extended with an input symbol if that symbol happens to be 'on its way', i.e. if it has been output more often than it has been received. This extension may lead to a current trace that is not in the trace set of the process. The reception of an input is actually the only way to bring the current trace outside the trace set. The model captures that processes do control (by their trace sets) the sending of outputs but not the reception of inputs.
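Examples of reachable states for the system of Example 2 are
$(\varepsilon, \varepsilon, \varepsilon, \varepsilon)$, $(\varepsilon, d, \varepsilon, \varepsilon)$, $(\varepsilon, de, \varepsilon, \varepsilon)$, $(\varepsilon, de, \varepsilon, e)$, $(\varepsilon, de, \varepsilon, eb)$, $(b, de, \varepsilon, eb)$
We have now all ingredients to define delay-insensitivity for systems. State $(t_0, t_1, \ldots, t_{n-1})$ is called safe if
$(\forall i : 0 \le i < n : t_i \in T_i)$
$\wedge$
$(\forall i, j, a : a \in O_i \cap I_j : a\#t_i - a\#t_j \le 1)$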
The first condition expresses the absence of computation interference and the second one the absence of transmission interference. A system is called delay-insensitive if all its reachable states are safe.
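The definition can be turned into a small checker, sketched below in Python for processes that are given as finite state machines. Several choices here are assumptions of the sketch and not part of the model: each process is a triple (inputs, outputs, transition table) that starts in state 0; a system state is abstracted to the machine state of every process plus, per wire, the number of transitions that have been sent but not yet received; and the search stops at the first unsafe state, which is enough to decide whether all reachable states are safe. The helper cyclic and all other names are hypothetical.

```python
from collections import deque


def cyclic(symbols):
    """Transition table of the strictly sequential process (s0 ; s1 ; ...)*."""
    n = len(symbols)
    return {(k, s): (k + 1) % n for k, s in enumerate(symbols)}


# C-element (a?,b?;c!)*: a and b may arrive in either order, then c is produced.
C_ELEMENT = {(0, "a"): 1, (0, "b"): 2, (1, "b"): 3, (2, "a"): 3, (3, "c"): 0}


def is_delay_insensitive(system):
    """Explore the reachable states; return False at the first unsafe one."""
    wires = sorted(set().union(*(outs for _, outs, _ in system)))
    start = ((0,) * len(system), (0,) * len(wires))
    seen, queue = {start}, deque([start])
    while queue:
        states, in_flight = queue.popleft()
        for i, (ins, outs, delta) in enumerate(system):
            for k, a in enumerate(wires):
                nxt = delta.get((states[i], a))
                if a in outs and nxt is not None:
                    # Output: allowed by the trace set of process i.
                    if in_flight[k] == 1:
                        return False        # transmission interference
                    change = 1
                elif a in ins and in_flight[k] == 1:
                    # Input: the symbol is on its way, wanted or not.
                    if nxt is None:
                        return False        # computation interference
                    change = -1
                else:
                    continue
                new = (states[:i] + (nxt,) + states[i + 1:],
                       in_flight[:k] + (in_flight[k] + change,) + in_flight[k + 1:])
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
    return True


# The four processes of Example 2.
example2 = [
    ({"a", "b"}, {"c"}, C_ELEMENT),
    ({"c"}, {"d", "e"}, cyclic("dec")),
    ({"d"}, {"a"}, cyclic("da")),
    ({"e"}, {"b"}, cyclic("eb")),
]

# A hypothetical two-process system: the sender emits d and then e, and the
# receiver insists on getting them in that order. Wire delays can reorder them.
reordering = [
    ({"c"}, {"d", "e"}, cyclic("dec")),
    ({"d", "e"}, {"c"}, cyclic("dec")),
]
```

The first system passes the check, in agreement with the claim that Example 2 is delay-insensitive. The second fails, because e may overtake d on its way to the receiver.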
The system of Example 2 is an example of a delay-insensitive system. The following example is not delay-insensitive. Process $\bar{P}$ denotes the reflection of process $P$, i.e. if $P = (I, O, T)$ then $\bar{P} = (O, I, T)$.
Decomposition
Suppose a computation is specified as a process and we have to design a delay-insensitive implementation for it. In other words, we have to find a set of processes into which the specified process can be decomposed [21, 12, 11, 8, 18].
Let $P$ be a process and let $X$ be a set of processes such that $\bar{P} \notin X$. We define set $X$ to be a decomposition of process $P$ if set $X \cup \{\bar{P}\}$ is a delay-insensitive system.
as the following table of reachable states shows:
where Q is the process given by (a! ; c? ; b! ; d?)*. It is, however, also a decomposition of, for example, (a? ; c! ; a? ; c! | b? ; d! ; b? ; d!)*, as can be easily checked. This proves the claim made in Section 4. It also shows that composition cannot simply be the inverse of decomposition. A suitable definition of composition can be found in [17, 4].
Example 7 A 3-input C-element can be decomposed into two 2-input C-elements:
A decomposition rule is useful only if it satisfies the substitution property. This property states that if process $P$ decomposes into $X \cup \{Q\}$ and process $Q$ decomposes into $Y$ then $P$ decomposes into $X \cup Y$. Our decomposition rule indeed satisfies the substitution property, provided that distinct names are used for the internal wires in $X$ and $Y$.
Example 8 In this example a process is decomposed into a set of just one process. In other words, the latter process implements, or 'refines', the other process.
Consider process P, given by P: Process Q differs from process P in that it does not produce output c. Process P can be decomposed into process Q, as the following table shows:
This example demonstrates that in the choice between outputs the designer is allowed to make an a priori choice. The word 'allowed' means here, of course: without running the risk of causing computation or transmission interference, since these are the only correctness concerns we have introduced. In particular, we have not considered progress requirements.
A designer is not allowed to make an a priori choice between inputs. For example, process P does not decompose into Q:
Here we have computation interference: ac is not a trace of Q. As an aside we mention that Q does decompose into P.
An interesting question is whether a process decomposes into itself. This is in general not the case. Process $P_1$ of Example 2 is a process that does not decompose into itself, as we observed in Example 4.
Processes that decompose into themselves are known as delay-insensitive processes. The C-element is an example of a delay-insensitive process. There are several characterizations of delay-insensitive processes, the oldest of which was given by J.T. Udding [16]. As we have seen in Example 2, processes that are not delay-insensitive can very well be used to construct delay-insensitive systems.
Building blocks
The typical way of designing an inverter in CMOS is shown in Fig. 12. The input is forked to two transistors. This is clearly not a delay-insensitive decomposition of an inverter into two transistors: if one of the two branches of the fork is exceptionally slow, a conducting connection between power and ground is maintained, a situation that is more commonly known as a short circuit.
Figure 12: A CMOS inverter
Individual transistors are simply too primitive to be used as building blocks for delay-insensitive compositions. Delay-insensitive systems require building blocks of a higher aggregation level. Ebergen [5] has outlined a finite set of building blocks into which all delay-insensitive processes can be decomposed. This set consists of two types of C-elements, a fork, an exclusive OR, a toggle, and an arbiter. Internally such building blocks will not be delay-insensitive. They correspond to what Seitz [14] has termed equipotential regions.
As mentioned in Section 4, combinational processes are processes that have exactly one output combination for each combination of input values. An example of a combinational process is M:
(a?, b? ; d! ; c? ; e!)*
as the following table of input values and corresponding output values shows:
M is a process with two outputs. According to the table above, output d is the majority of the input values and output e is a copy of input c. Let process P be specified by (d? ; c!)*. Then C-element (a?, b? ; e!)* can be decomposed into {M, P}:
Thus we have exhibited a delay-insensitive decomposition of a sequential process into two combinational processes. Brzozowski and Ebergen [2] have shown that sequential processes cannot be decomposed into sets that contain only forks, i.e. processes of the form (a? ; b!, c!)*, and single-output combinational processes. Martin [9] shows that extending these sets with C-elements does not help very much. Essentially, the only sequential processes that can then be built are various forms of C-elements.
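The decomposition of a C-element into the majority process M and the connecting process P can be illustrated at the level of values. The Python sketch below feeds the majority output d back as input c and reads the copy e as the result. It is a functional sketch only: it assumes the feedback loop settles before the next input change and says nothing about delays, and the function names are illustrative.

```python
def majority(a, b, c):
    """Output d of M: the majority of the three input levels."""
    return 1 if a + b + c >= 2 else 0


def c_element_via_majority(input_levels):
    """Levels of e for a sequence of (a, b) pairs, using M with feedback through P."""
    c = 0                         # feedback wire, initially low
    outputs = []
    for a, b in input_levels:
        d = majority(a, b, c)     # M computes the majority
        c = d                     # P passes d on to input c of M
        outputs.append(c)         # e is a copy of c
    return outputs


def c_element(input_levels):
    """Reference behavior: the output changes only when both inputs agree."""
    c = 0
    outputs = []
    for a, b in input_levels:
        if a == b:
            c = a
        outputs.append(c)
    return outputs
```

One pass through the loop suffices for the feedback to settle: if a and b agree, the majority equals their common value whatever c is, and if they differ, the majority equals c itself.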
Speed-independent
In the speed-independent computing model, which is older than the delay-insensitive one [10], all delays are assumed to be in the components. The wires do not exhibit delay, which makes transmission interference not an issue.
In order to define speed-independence more precisely, we need to change our definition of reachable states (which models asynchronous communication) into one that is based on synchronous communication. For synchronously reachable the second condition in the definition of reachable reads:
2) if $(t_0, \ldots, t_i, \ldots, t_j, \ldots, t_{n-1})$ is reachable and
A system is called speed-independent if all states that are synchronously reachable are safe.
The reachable states under synchronous communication form a subset of those that are reachable under asynchronous communication. Delay-insensitive systems are, consequently, also speed-independent. The converse is not true.
We show that a C-element can speed-independently be decomposed into a single-output combinational process $P_0$ and a fork $P_1$ [6]:
$P_0$: (a?, b? ; d! ; e?)*
Process $P_1$ is a kind of fork that is (in speed-independent settings) often referred to as an isochronic fork. In order to demonstrate that C-element $C$: (a?,b?;c!)* can speed-independently be decomposed into $\{P_0, P_1\}$, we investigate system $\{P_0, P_1, \bar{C}\}$, with $\bar{C}$ given by (a!, b! ; c?)*. This system is indeed speed-independent:
System $\{P_0, P_1, \bar{C}\}$ is not delay-insensitive. An important difference between speed-independence and delay-insensitivity is that in the speed-independent model we can realize forks that guarantee that one of their outputs arrives earlier at a component than the other one does.
Conclusion
Starting with the problem of communicating data, we have gradually found our way to an operational, but precise, definition of delay-insensitivity. The virtue of this operational model is not only its relative simplicity, but also its clear relation with computing media in general and VLSI circuitry in particular. We have used trace theory [19, 7] to formulate these definitions, since traces are very well-suited to express nontemporal relations between events. Our treatment exhibits a clear separation between the communication model, which captures the types of delays we want the correctness of the system to be independent of, and the correctness concerns. We have discussed two communication models: one in which the delays are both in the components and in the wires, and one in which the delays are just in the components. With respect to correctness we have, throughout the paper, stuck to just one correctness concern: absence of interference.
Design is nothing else than decomposing large problems into smaller ones, until the latter problems either are trivial or have been solved before. Therefore, we have extensively addressed the concept of decomposition, interleaved with many examples. There is a limit to delay-insensitivity: one ends up with primitive building blocks of one kind or another. We have briefly discussed the nature of these blocks.
Acknowledgements
I am indebted to Tom Verhoeff, who inspired the operational model in this paper. Ivan Sutherland coined the title of Section 4. Acknowledgements are also due to Kees van Berkel and the members of the Eindhoven VLSI Club for numerous discussions on the ins and outs of delay-insensitivity.
