Kropf [7] has collected together a number of problems that can be used as benchmarks for hardware verification, one of which is a bus arbiter. The function of the bus arbiter is to assign the use of a shared resource on each clock cycle to one out of $N$ subsystems that may want to use it, in such a way that no subsystem is denied access forever. The most significant component in its implementation is a round-robin scheduler. Rather than verify the existing implementation of the scheduler, this paper shows how to construct a correct implementation from the given requirements. We make use of both point-free and pointwise relation algebra.
1 Introduction

Kropf [7] has collected together a number of problems that can be used as benchmarks for hardware verification, one of which is a bus arbiter. The function of the bus arbiter is to assign the use of a shared resource on each clock cycle to one out of $N$ subsystems that may want to use it, in such a way that no subsystem is denied access forever. The most significant component in the implementation of the bus arbiter is a round-robin scheduler. Rather than verify a given implementation of a scheduler, we consider in this paper the much more instructive problem of constructing a scheduler from its specification. The basis of our construction is the algebra of relations; we specify the problem within the algebra and then calculate a correct and efficient implementation.

In the next section we formulate the task and in the section thereafter we present the algebra in which our solution is expressed. Subsequently, we outline our calculations and then we give the calculations in detail. The paper is concluded by a discussion of what has been achieved.

2 The Task

A bus arbiter is a device that should assign the use of a resource at each clock cycle to at most one out of $N$ subsystems that may want to use it, and it should do so in such a way that no subsystem is denied access forever. More specifically, a bus arbiter is a circuit that maps a stream of $N$ boolean inputs, representing requests to use the resource, to $N$ boolean outputs, representing acknowledgements that the resource may be used (see footnote 1). Let us call the input stream $req$ and the output stream $ack$. It is easier to think of an element of $\mathbb{B}^N$ as a subset of $\{0, \ldots, N-1\}$, so we write $n \in req.t$ to mean that the $n$-th component of $req$ is high at time $t$. The specification of the arbiter, as given by Kropf [7], is then:
1. No two output wires are asserted simultaneously: for each instant $t$, $|ack.t| \le 1$.
2. Acknowledge is not asserted without request: $ack.t \subseteq req.t$.
3. Every persistent request is eventually acknowledged: there is no pair $(n, t)$ such that $\forall(t' : t \le t' : n \in req.t' \wedge n \notin ack.t')$.
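The three requirements can be explored on finite traces. The following is a minimal sketch, not part of the original development: the names `check_safety` and `longest_wait` are hypothetical, streams are truncated to Python lists of sets, and the third requirement (which concerns infinite streams) is only approximated by measuring the longest run of unacknowledged requests.

```python
from typing import List, Set


def check_safety(req: List[Set[int]], ack: List[Set[int]]) -> bool:
    """Requirements 1 and 2 at every instant of a finite trace."""
    return all(len(a) <= 1 and a <= r for r, a in zip(req, ack))


def longest_wait(req: List[Set[int]], ack: List[Set[int]], n: int) -> int:
    """Longest run of consecutive instants at which wire n requests
    without being acknowledged (a finite stand-in for requirement 3)."""
    longest = run = 0
    for r, a in zip(req, ack):
        run = run + 1 if (n in r and n not in a) else 0
        longest = max(longest, run)
    return longest
```

A trace violates the third requirement only if some wire waits forever; on a finite prefix, `longest_wait` merely bounds how long a wire has waited so far.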
Kropf himself suggests an implementation. The arbiter should normally grant acknowledge to the request that is lowest in index, and fall back to a round-robin scheme when there are too many requests. This is accomplished as follows: at any given moment there is a privileged wire. For simplicity, we may take $t \bmod N$ to be the privileged wire at any time $t$. If wire $n$ is privileged, and is asserting a request, and it was asserting its request the previous time it was privileged, then it is acknowledged. This way any wire will be acknowledged in less than $2N$ clock cycles. One way to implement this arbiter is to construct two modules, called for instance LT and RR, the first one granting the request with the lowest index asserted in its input, and the second one implementing the round-robin algorithm. The first module is combinational, while the second one has state. Module LT simply returns the lowest numbered signal that is asserted.

Footnote 1: Note that we are not interested in dealing with metastability problems. Metastability is an effect that may occur whenever a signal is sampled asynchronously; what may happen is that the signal floats in a state that cannot be interpreted as a logical "true" or as a logical "false". It should be clear that this effect cannot be modelled within the framework of this paper. What we call "wire" here is a mathematical abstraction of physical wires.
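We focus in this paper on the development of the round-robin scheduler, RR. Suppose $\delta^N$ is a function that delays an input stream by $N$ clock cycles. That is, $(\delta^N.a).t = a.(t-N)$ for every stream $a$ and time $t$. In terms of this function, the round-robin rule described above states that $ack.t = \{t \bmod N\} \cap req.t \cap (\delta^N.req).t$ for all $t$. The task we undertake in this paper is to construct a circuit that implements RR as specified above.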
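The round-robin rule (the privileged wire $t \bmod N$ is acknowledged when it requests now and requested $N$ cycles earlier) can be simulated directly. This is an illustrative sketch only: the function name `rr` is hypothetical, streams are finite lists of sets, and the choice that there are no requests before time 0 is an assumption made so that the simulation has a starting point.

```python
from typing import List, Set


def rr(req: List[Set[int]], N: int) -> List[Set[int]]:
    """Acknowledge wire t mod N at time t iff it requests at t and at t - N."""
    ack = []
    for t, now in enumerate(req):
        k = t % N  # the privileged wire at time t
        # Assumption: no wire was requesting before time 0.
        earlier = req[t - N] if t >= N else set()
        ack.append({k} if (k in now and k in earlier) else set())
    return ack
```

With every wire requesting at every instant, each wire is acknowledged once per round after the first $N$ cycles, which is consistent with the bound of fewer than $2N$ cycles mentioned above.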
Relation Algebra
We will write our specifications and our circuits in point-free relation algebra. A brief introduction to our style of relation algebra follows; for a more complete treatment see [1].
A (binary) relation over a set $U$ is a set of pairs of elements of $U$. For $x$, $y$ in $U$ and $R$ a relation over $U$, we write $x\langle R\rangle y$ instead of $(x, y) \in R$. When a relation $R$ satisfies $x\langle R\rangle y \wedge z\langle R\rangle y \Rightarrow x = z$ we say that the relation is deterministic. In that case it may be considered as a function with domain on the right side and target on the left side; we denote by $R.y$ the unique $x$ such that $x\langle R\rangle y$ holds, if such an $x$ exists. The reason for this name is that we usually interpret relations as programs taking input from the right and producing output on the left. In this way a deterministic relation is interpreted as a deterministic program. We usually use the letters $f$, $g$, $h$ to stand for deterministic relations. We use the convention that "." associates to the right so that $f.g.x$ should be parsed as $f.(g.x)$. (This is contrary to the convention used in the lambda calculus.)

Relations are ordered by the usual set inclusion ordering. Hence the set of relations forms a complete lattice. The relation corresponding to the empty set is denoted by $\bot\!\!\bot$, and the relation that contains all pairs of elements of $U$ is denoted by $\top\!\!\top$. The identity relation, $\iota$, is defined by $x\langle\iota\rangle y \equiv x = y$. The composition of two relations $R$, $S$ is denoted by $R \circ S$ and defined by $x\langle R \circ S\rangle y \equiv \exists(z :: x\langle R\rangle z \wedge z\langle S\rangle y)$. Composition is associative and has unit element $\iota$. The converse of a relation $R$ is written $R^{\cup}$ and is defined by $x\langle R^{\cup}\rangle y \equiv y\langle R\rangle x$.
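For finite carriers these definitions can be tried out concretely. The sketch below is an illustration under the assumption that a relation is represented as a Python set of (output, input) pairs, following the convention that input is on the right; the function names are hypothetical.

```python
def compose(R, S):
    """x <R o S> y iff there is a z with x <R> z and z <S> y."""
    return {(x, y) for (x, z) in R for (w, y) in S if z == w}


def converse(R):
    """x <converse R> y iff y <R> x."""
    return {(y, x) for (x, y) in R}


def identity(U):
    """The identity relation on the finite set U."""
    return {(x, x) for x in U}


def is_deterministic(R):
    """Every input (right element) has at most one output (left element)."""
    outputs = {}
    return all(outputs.setdefault(y, x) == x for (x, y) in R)
```

For a deterministic relation `f`, the composition `compose(f, converse(f))` is contained in the identity, which is the property used later when distributing composition through split.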
A monotype is a relation $A$ such that $A \subseteq \iota$. An example of a monotype is $\mathbb{N}$, defined by $n\langle\mathbb{N}\rangle m \equiv n = m \wedge (n \text{ is a natural number})$. (Bird and De Moor [2] use the name "coreflexive" instead of monotype.) There is a one-to-one correspondence between the subsets of $U$ and the monotypes, and this makes it possible to embed set calculus in relation calculus. The left domain of relation $R$, denoted $R^{<}$, is the least monotype $A$ such that $A \circ R = R$. As its name suggests, $R^{<}$ represents the set of all $x$ such that $x$ is related by $R$ to some $y$. Similarly, the right domain of relation $R$, denoted $R^{>}$, is the least monotype $A$ such that $R \circ A = R$.
The relation $R \triangle S$ (pronounced $R$ split $S$) is defined as the least relation $X$ such that for all $x$, $y$ and $z$, $(x, y)\langle X\rangle z \equiv x\langle R\rangle z \wedge y\langle S\rangle z$. Note that the requirement that $R \triangle S$ be the least relation satisfying the above equation in $X$ implies that there is no $y$ such that $x\langle R \triangle S\rangle y$ when $x$ is not a pair. That is, the left domain of $R \triangle S$ is a set of pairs. In general, composition does not distribute through split. However, it is the case that

$$(R \triangle S) \circ T = (R \circ T) \triangle (S \circ T) \;\Leftarrow\; S \circ T \circ T^{\cup} \subseteq S . \tag{1}$$

The antecedent holds, for example, when $T$ is a deterministic relation (since then $T \circ T^{\cup} \subseteq \iota$). It also holds if $S$ is a so-called left condition: that is, if $S = S \circ \top\!\!\top$.

We define $R \times S$ (pronounced $R$ times $S$) by $(x, y)\langle R \times S\rangle(z, v) \equiv x\langle R\rangle z \wedge y\langle S\rangle v$.
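Split, times and the distribution law (1) can be checked on small finite relations. The following sketch assumes the same set-of-pairs representation as before (output on the left, input on the right); all names are hypothetical. It also shows that the law can fail when $T$ is not deterministic.

```python
def compose(R, S):
    return {(x, y) for (x, z) in R for (w, y) in S if z == w}


def split(R, S):
    """(x, y) <split R S> z iff x <R> z and y <S> z."""
    return {((x, y), z) for (x, z) in R for (y, w) in S if z == w}


def times(R, S):
    """(x, y) <times R S> (z, v) iff x <R> z and y <S> v."""
    return {((x, y), (z, v)) for (x, z) in R for (y, v) in S}


R = {(1, "a"), (2, "b")}
S = {(3, "a"), (4, "b")}
T_det = {("a", 0), ("b", 1)}     # deterministic: each input has one output
T_nondet = {("a", 0), ("b", 0)}  # input 0 has two possible outputs

law_holds_det = compose(split(R, S), T_det) == split(compose(R, T_det), compose(S, T_det))
law_holds_nondet = compose(split(R, S), T_nondet) == split(compose(R, T_nondet), compose(S, T_nondet))
```

In the non-deterministic case the left-hand side is still contained in the right-hand side; only the equality is lost.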
The following properties are easily proved:

This use of a comma to separate arguments is, in our view, highly inadvisable for several reasons, but in particular because of the relative size of the ",". In this example, because "," is smaller than ";" many people incorrectly read (3) as R; (S; T); V ] particularly when, as often happens, the author pays no consideration to spacing. The notation used in this document has been carefully chosen to enhance readability. We have also been careful to space formulae so as to automatically suggest the correct parsing of expressions to the human eye and thus minimise the need to refer to a

We now generalise the relational product to $n$-wide products. A product of a single relation is the relation itself. A product of two relations is the usual relational product. The product of $n$ relations $R_0$ through $R_{n-1}$ is $R_0 \times (R_1 \times (\ldots \times R_{n-1}))$. By adopting the convention that $\times$ associates to the right, we can write the above product as simply $R_0 \times R_1 \times \ldots \times R_{n-1}$. Corresponding to $n$-wide products we have $n$-tuples. For instance, a 3-tuple has the shape $(a, (b, c))$ for some $a$, $b$, $c$. By adopting the convention that pairing associates to the right, we write the above tuple as simply $(a, b, c)$.

We now define a notion of left and right arity for relations. First we introduce the $\iota(n)$ family of monotypes:
$$\iota(1) = \iota , \qquad \iota(n+1) = \iota \times \iota(n) \quad \text{for } n \ge 1 . \tag{4}$$

For any expression $E$ in the positive integers, we can assign a corresponding monotype $\iota(E)$ by means of definition (4). When the expression is just a numeral or a single letter, we can drop parentheses, and write $\iota n$ in place of $\iota(n)$. We say that $R$ has right arity $n$ iff $R = R \circ \iota n$.

Similarly, we say that $R$ has left arity $n$ whenever $R = \iota n \circ R$.

We can now define new operations. The map operation, $map_n$, generates the product of $n$ copies of a circuit. For every $R$, we have $map_n.R = \iota n \circ map_n.R = map_n.R \circ \iota n$.
Corresponding to the fusion law (2) we have the map fusion law:

$$map_n.(R \circ S) = map_n.R \circ map_n.S . \tag{5}$$

Generalising from making $n$ copies of a circuit, we also use maps to combine $n$ different circuits. The term $map.(k : 0 \le k < n : R_k)$ denotes a circuit with $n$ inputs and $n$ outputs, the relation between the $k$th input and output being determined by circuit $R_k$. Note that this more general map also obeys a fusion law like (5).

The zip operation is well-known in functional programming. Informally, $zip_n$ transforms a pair of $n$-tuples into an $n$-tuple of pairs. One common way to define it is:

The above is an acceptable relational definition, since a function is also a relation. The arity of $zip_n$ is given by $zip_n = \iota n \circ zip_n = zip_n \circ \iota n \times \iota n$.

A law about $zip_n$ is

$$zip_n \circ map_n.R \triangle map_n.S = map_n.(R \triangle S) ; \tag{6}$$

the proof is by induction on $n$.
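For deterministic relations, law (6) can be checked with ordinary functions. The sketch below is illustrative only: it uses flat Python tuples rather than the right-nested tuples of the text, and the names `map_n`, `split` and `zip_n` are stand-ins chosen here for the corresponding operators.

```python
def map_n(f):
    """Apply f independently to every component of a tuple."""
    return lambda xs: tuple(f(x) for x in xs)


def split(f, g):
    """Functional counterpart of split: feed the same input to f and g."""
    return lambda x: (f(x), g(x))


def zip_n(pair):
    """Turn a pair of n-tuples into an n-tuple of pairs."""
    xs, ys = pair
    return tuple(zip(xs, ys))


def inc(x):
    return x + 1


def dbl(x):
    return 2 * x


xs = (1, 2, 3)
lhs = zip_n(split(map_n(inc), map_n(dbl))(xs))  # zip after a split of maps
rhs = map_n(split(inc, dbl))(xs)                # map of the split
```

Both sides pair each incremented component with the corresponding doubled component.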
About circuits
Following established practice (see [3, 5, 9]) we model a circuit as a relation between arbitrary collections of streams, a stream being a total function with domain the integers. Abusing language somewhat, we will use the word "circuit" to mean an actual circuit, or a relation between streams as described above. Context should make clear which one is meant. We usually denote streams by small letters taken from the beginning of the alphabet.

Given a relation $R$, a relation between streams can be constructed by "lifting": $a\langle\dot{R}\rangle b \equiv \forall(t :: a.t\,\langle R\rangle\, b.t)$. Hence for any $R$, relation $\dot{R}$ is a circuit. Note that, for deterministic relation $f$, stream $a$ and integer $t$, $f.a.t = (\dot{f}.a).t$. We refer to this property in our calculations by the hint "lifting". Circuits can be built by relational composition and product: given $R$ and $S$, two circuits, the relations $R \circ S$ and $R \times S$ are also circuits. A particular relation on streams is the primitive delay, denoted by $\partial$ and defined by $a\langle\partial\rangle b \equiv \forall(t :: a.(t+1) = b.t)$.
The delay relation, written $\delta$, is a generalisation of primitive delay to arbitrary pairings of streams. It is defined as the least fixed point of the function $X \mapsto \partial \cup X \times X$.

Delay can be thought of informally as the union of an infinite list of terms, $\delta = \partial \cup \partial \times \partial \cup \partial \times (\partial \times \partial) \cup (\partial \times \partial) \times \partial \cup (\partial \times \partial) \times (\partial \times \partial) \cup \ldots$. Note that $\delta$ is deterministic. The delay relation is polymorphic in the sense that it applies the primitive delay $\partial$ to a collection of wires, independently of the shape of the collection. (The formal statement of this property, equation (7), and other properties of delays are proved in [11].) Finally, the feedback of a circuit $R$, written $R^{\sigma}$, is defined by

$$a\langle R^{\sigma}\rangle b \equiv a\langle R\rangle(b, a) . \tag{8}$$

We may now summarize our means of constructing circuits:
1. If $R$ is a relation then $\dot{R}$ is a circuit.
2. If $R$, $S$ are circuits, then $R \circ S$, $R \times S$ and $R \triangle S$ are circuits.
3. Delays are circuits.
4. If $R$ is a circuit, then $R^{\sigma}$ is a circuit.
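Lifting and delay are easy to model when a stream is represented as a Python function from integers to values. This is a sketch under that assumption, restricted to deterministic circuits (functions on streams); the names `lift`, `delay` and `delay_n` are hypothetical.

```python
def lift(f):
    """Lift a function on values to a circuit acting pointwise on a stream."""
    return lambda a: (lambda t: f(a(t)))


def delay(b):
    """Primitive delay: the stream a with a(t + 1) = b(t) for all t."""
    return lambda t: b(t - 1)


def delay_n(b, n):
    """Delay a stream by n clock cycles."""
    return lambda t: b(t - n)


def negate(x):
    return -x


def squares(t):
    return t * t
```

Because a lifted circuit acts independently at each instant, it commutes with delay: `delay(lift(negate)(squares))` and `lift(negate)(delay(squares))` denote the same stream.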
A circuit $R$ is said to be combinational if it is defined exclusively by means of the first two items in the above list; i.e., if delay and feedback do not appear in its definition. A circuit term has an interpretation as a picture that is often useful as an aid to understanding how a circuit term is interpreted as a real circuit. Figure 2 shows the correspondence between pictures and circuit terms.
Design Steps
We recall from section 2 that our task is to construct a hardware circuit implementing the round-robin scheduler RR. Because our development of the round-robin scheduler is quite long we begin by giving an overview. Some of the terms used in this overview may not be completely clear at this stage. They will however be explained in full detail later. The steps are as follows:
1. Low level specification.

In the first step we reformulate the given specification of RR using tuples of booleans to represent sets. The new specification takes the form

$$RR = lt \circ intersect \circ \iota \triangle \delta^N . \tag{9}$$

In this specification the definition of RR has been split into three components. There are two advantages of a modular specification. One is ease of understanding, which is of considerable help in ensuring that the informal requirements are correctly recorded in the formal specification. The other is that it is much easier to identify potential inefficiencies in the implementation.

2. Efficiency analysis.

In the second step we analyse the three components with respect to implementability. The lt component is, at first sight, a potential bottleneck. However, this turns out not to be the case. It can in fact be implemented using what is called a "cyclic multiplexer". The problem with the implementability of RR as specified by (9) is the component $\iota \triangle \delta^N$. The area required for its implementation is $O(N^2)$ since it consists of $N$ delays each with arity $N$. The conclusion of this phase is thus that it is this component on which we should focus our attention.

3. Goal.

Having analysed the source of inefficiency in (9) we can proceed to formulating the goal. Specifically, we wish to construct flip-flops $ff_{k,N}$ such that

$$RR = lt \circ intersect \circ \iota \triangle map.(k : 0 \le k < N : ff_{k,N}) \tag{10}$$

and such that each such component has at most one delay element. (The components are called "flip-flops" because this is a name that is commonly given to memory elements.)

4. Simplification of the goal.

The first step in the achievement of the goal is to simplify it so that it becomes more manageable. The resulting requirement on $ff_{k,N}$, numbered (11), specifies the behaviour of the flip-flop $ff_{k,N}$ only at times $t$ such that $t \bmod N = k$. At other times its behaviour is unspecified. This increased latitude (compared to the definition of $\delta^N$, whose behaviour is specified at all instants) is what is needed to construct an efficient implementation of the circuit.

5. Construction of the flip-flops.

The final step is the construction of the components $ff_{k,N}$. Again in a series of steps, we calculate that

$$ff_{k,N} = (\delta \circ cmx_{k,N})^{\sigma} , \tag{12}$$

where $cmx_{k,N}$ is a cyclic multiplexer. Thus the implementation of the flip-flop does indeed require only one delay element. The combination of (10) and (12) is then the desired implementation of the round-robin scheduler.
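The idea behind the flip-flop (a single delay element fed back through a cyclic multiplexer) can be simulated on a finite trace. The sketch below is an illustration, not the calculated circuit itself: it assumes the flip-flop latches its input at times $t$ with $t \bmod N = k$ and otherwise holds its previous value, it represents streams as lists starting at time 0, and the initial content `init` is an arbitrary assumption. The function name is hypothetical.

```python
def flip_flop(b, k, N, init=None):
    """Return the stream a with a[t + 1] = b[t] if t % N == k else a[t].

    a[0] is the assumed initial content of the single memory cell.
    """
    a = [init]
    for t in range(len(b) - 1):
        a.append(b[t] if t % N == k else a[t])
    return a


N, k = 4, 1
b = list(range(100, 120))  # an arbitrary input trace
a = flip_flop(b, k, N)
# At privileged times the single cell agrees with an N-cycle delay of b.
agrees = all(a[t] == b[t - N] for t in range(N, len(b)) if t % N == k)
```

At non-privileged times the output generally differs from the $N$-cycle delay, which is exactly the latitude that the simplified goal allows.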
This then is the overview. Let us now present the full details.

In software, lt would be implemented with a straightforward if-then-else statement. In Ruby the full generality of an if-then-else statement is typically shunned. In this case, however, the full generality is not needed and the component can be implemented using a so-called cyclic multiplexer [8]. For each $k$, $0 \le k < N$, the cyclic multiplexer $cmx_{k,N}$ has two input streams and one output stream. Its function is to copy the input value in the first stream to the output at times $t$ such that $t \bmod N = k$ and to copy the input values from the second stream to the output stream at all other times. Formally,

$$a\langle cmx_{k,N}\rangle(b, c) \equiv \forall(t : t \bmod N = k : a.t = b.t) \wedge \forall(t : t \bmod N \ne k : a.t = c.t) .$$
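The cyclic multiplexer is deterministic, so it can be modelled as a function on streams. A minimal sketch, assuming streams are Python functions from integers to values; the name `cmx` mirrors the text, while the curried calling convention is a choice made here.

```python
def cmx(k, N):
    """Cyclic multiplexer: select the first stream when t mod N == k,
    and the second stream at all other times."""
    def circuit(b, c):
        return lambda t: b(t) if t % N == k else c(t)
    return circuit


def first(t):
    return ("b", t)


def second(t):
    return ("c", t)


out = cmx(1, 3)(first, second)
```

Python's `%` returns a non-negative result for a positive modulus, so the selection also behaves as expected at negative times.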
In practice, a cyclic multiplexer can be implemented by a combination of a counter and a so-called demultiplexer, using techniques that can be found in any circuit design textbook; see for instance Katz [6]. More discussion on the implementation of lt is in [11].

Comparing the definition of lt with that of $cmx_{k,N}$ it is clear that lt can be implemented by pairing each of the $N$ bits of the input stream with a stream of false bits and passing each pair of bits to the corresponding cyclic multiplexer. That is,

$$lt = map.(k : 0 \le k < N : cmx_{k,N} \circ \iota \triangle K_F) , \tag{15}$$

where $K_F$ is a circuit that ignores its input and constantly outputs the value $F$ (false). This concludes this section. The combination of (13) and (15) is a correct implementation of the round-robin scheduler.
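Construction (15) can be mimicked with stream functions: bit $k$ of the input is paired with a constant-false stream and passed through the $k$-th cyclic multiplexer. This is a sketch under the assumption that the input is a stream of $N$-tuples of booleans; `lt` and `cmx` mirror the names in the text and `const_false` is a hypothetical stand-in for $K_F$.

```python
def cmx(k, N):
    """Cyclic multiplexer on streams represented as functions of time."""
    def circuit(b, c):
        return lambda t: b(t) if t % N == k else c(t)
    return circuit


def const_false(t):
    """Stand-in for K_F: a stream that is constantly false."""
    return False


def lt(N):
    """Keep only the privileged bit t mod N of an N-bit request stream."""
    def circuit(req):
        def out(t):
            return tuple(
                cmx(k, N)(lambda u, k=k: req(u)[k], const_false)(t)
                for k in range(N)
            )
        return out
    return circuit


def all_requesting(t):
    return (True, True, True)


def some_requesting(t):
    return (True, False, True)
```

At each instant at most one output bit can be high, namely the bit whose index equals $t \bmod N$.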
Efficiency Analysis and the Goal

One advantage of a modular specification like (13) is that it simplifies the task of identifying potential inefficiencies. We need only examine each component in turn.

Assuming that a cyclic multiplexer has an efficient implementation, it is clear that the two components lt and intersect have efficient implementations. The bottleneck in the implementation is in fact the component $\delta^N$. Note that the input arity of this component is $N$ since the input stream is in fact a stream of $N$ bits. The total area required for its implementation is thus $O(N^2)$. On the other hand, it seems plausible that an $O(N)$ implementation can be found for RR (although not for the component $\delta^N$) since at any stage only $N$ bits need to be recorded. Specifically, at any time $t$ it suffices to record the value of the $k$th input bit only at the last time that it was privileged. We can express our intuition about the memory component that is required as follows. For each input bit $k$ we replace the $\delta^N$ component by a memory element $ff_{k,N}$, "ff" standing for "flip-flop" (this being the name often given by circuit designers to memory elements). That is, we wish to design $ff_{k,N}$ such that $RR = lt \circ intersect \circ \iota \triangle map.(k : 0 \le k < N : ff_{k,N})$. Moreover, the implementation of the flip-flops should involve at most one delay element.
Simplifying the Goal
In the following discussion it will be useful to introduce the name specRR for the term $lt \circ intersect \circ \iota \triangle \delta^N$ and the name impRR for the term $lt \circ intersect \circ \iota \triangle map.(k : 0 \le k < N : ff_{k,N})$. The goal is to derive $ff_{k,N}$ such that specRR = impRR. In this section we simplify the goal by splitting it up into separate requirements for the individual flip-flops and by eliminating the map term in the definition of intersect. We begin by splitting the goal up.

10 Construction of the flip-flops

From the definition of $D_{k,N}$ it is clear that (21) specifies the behaviour of $ff_{k,N}$ only at times $t$ such that $t \bmod N = k$; at all other times there is complete latitude in its behaviour. It is this latitude that we now exploit.

The component $\delta^N$ can be seen as a memory element that stores $N$ input values. This is because its implementation demands that the input value at each time $t$ is recorded for use at time $t+N$, after which it can be discarded. From (21) it is clear, however, that it suffices to record the input value only at times $t$ such that $t \bmod N = k$, that is, once every $N$ clock beats. The crucial step in the calculation of $ff_{k,N}$ below is thus to replace the function mapping $t$ to $t-N$ by a function that is constant for $N$ time intervals. A well known example of such a function is the function mapping $t$ to $t \;\mathrm{div}\; N$.

This is the crucial step discussed above. We replace "$t-N$" using the property of modular arithmetic:
We have thus calculated a functional specification of $ff_{k,N}$:
Discussion
Programming has been variously described as an "art", a "craft", a "discipline", a "logic" and a "science". In this paper we have tried to illustrate the calculational style of programming at work. The "art" of calculation lies in the way the original problem is dissected into manageable chunks; the "craft" is formulating the heuristics and goals that lead to a successful resolution of the individual calculations. The "discipline" is to maintain a uniform granularity in executing calculational steps, thus avoiding errors of oversight and omission. The "logic" lies in the way the individual pieces are combined, and the "science" is making the right choice of formalism, at the right level of detail, at the right time. In all these aspects of the programming task, the guarantee of quality is the combination of concision with precision.

In this paper we have done our utmost to add yet another clear illustration of the calculational method to the extensive literature on the topic, this time in an application area that is relatively unusual. We have shown how the natural decomposition in the original problem statement is preserved and indeed exploited in shaping the overall calculation. We have discussed at some length the heuristics governing the individual calculations so that the immediate and long-term goals are always clear. We have applied both pointwise and point-free relation algebra, the choice of which to use being governed by the concern for concision without loss of precision. And we have made all our calculational steps explicit and straightforward so that the reader is able to make an easy local check of their accuracy. In this way we hope to have made a modest but lasting contribution to the mathematics of program construction.
