We present a derivation of a circuit that recognizes regular languages, originally developed by Foster and Kung [4]. We make use of point-free relation algebra, in a style combining elements from earlier work in the Eindhoven Mathematics of Program Construction group [0], and from Ruby [6]. First we derive non-systolic recognizers, much in the same way as functional programs are derived. Then we make use of standard circuit transformation techniques, recast in the relation algebra framework, to obtain circuits that are very close to the ones presented by Foster and Kung.
Introduction
In 1982, Foster and Kung [4] presented a specialised silicon compiler that constructs recognizers for regular languages. The compiler was presented without formal justification; indeed, they did not present a formal specification of the functionality of the compiler. Their informal description of the functioning left much room for alternative interpretations. Subsequently, Backhouse [1] verified the correctness of Foster and Kung's compiler. His task amounted primarily to reverse engineering: trying to discover the specification satisfied by the compiler. This resulted in the discovery of an error in Foster and Kung's construction, acknowledged by Foster in his Ph.D. thesis [3]. Otherwise the formal calculations in Backhouse's report were disappointingly complicated and not judged by its author to be worthy of widespread publication. In this paper we present a formal derivation of Foster and Kung's compiler. The complexities of the earlier verification have been overcome in two ways: by exploiting (point-free) relation algebra rather than elementary predicate calculus, and by a judicious decomposition of the design task. Our design consists of first deriving a non-systolic implementation, followed by a transformation of this design to a systolic version using standard techniques ("slowing" and "retiming" [6]). A precise definition of "systolic" can be found in Leiserson's thesis [8]. For our purposes, a circuit is systolic when it can be seen as a network of processing elements interconnected by wires, these wires being interrupted by delays (registers). The presence of the delays on the wires has an important effect on the minimum clock period that can be assigned to a circuit. In fact, the presence of long wires uninterrupted by delays forces the designer to assign a larger clock period to the circuit, and that in turn may have a negative effect on the overall performance. Foster presents a formal verification of the compiler in his Ph.D. thesis [3].
Both the specification and the implementation have been adapted in order to overcome the error in [4]. We, ourselves, have as yet been unable to understand Foster's arguments, possibly because we do not understand how to describe the non-standard components he uses as stream transducers. In this paper we take the easy way out and avoid the problem rather than overcome it. A further weakness of this paper is that the individual calculations are not tied together into a completely rigorous whole. These weaknesses are pointed out in the relevant places. A formal derivation of a similar compiler has also been given by Kaldewaij and Zwaan [7], but their implementation is not systolic, in the sense that the minimum clock period that can be assigned to their circuits is a function of the length of the regular expression to be matched. In contrast, the circuits that we generate can be assigned a clock period that is independent of the number of sequence operators in the regular expression, although it does depend on the number of star and choice operators. The paper is organized as follows: in the next section, we introduce the reader to relation algebra, and our definition of "circuit". In section 2 we show how to specify the problem with relation algebra. Then in section 3 we derive a first, non-systolic version of the recognizers. In section 4 standard circuit transformations are applied to obtain a systolic version. Finally, in section 5 some conclusions are drawn.
About circuits and relation algebra
We will write our specifications and our circuits in point-free relation algebra. A brief introduction to our style of relation algebra follows; for a more complete treatment, see [0].
A (binary) relation over a set U is a set of pairs of elements of U. For x, y in U and R a relation over U, we write $x\langle R\rangle y$ instead of $(x, y)$ in R. When a relation R satisfies $x\langle R\rangle y \wedge z\langle R\rangle y \Rightarrow x = z$ we say that the relation is deterministic. In that case it may be considered as a function with domain on the right side and target on the left side; we denote by $R.y$ the unique x such that $x\langle R\rangle y$ holds, if such an x exists. The reason for this name is that we usually interpret relations as programs taking input from the right and producing output on the left. In this way a deterministic relation is interpreted as a deterministic program. We usually use the letters f, g, h to stand for deterministic relations. We use the convention that "." associates to the right so that $f.g.x$ should be parsed as $f.(g.x)$. (This is contrary to the convention used in the lambda calculus.) Relations are ordered by the usual set inclusion ordering. Hence the set of relations forms a complete lattice. The relation corresponding to the empty set is denoted by $\bot\!\!\bot$, and the relation that contains all pairs of elements of U is denoted by $\top\!\!\top$. The identity relation, I, is defined by $x\langle I\rangle y \equiv x = y$.
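The composition of two relations R, S is denoted by $R \circ S$ and defined by $x\langle R \circ S\rangle y \equiv \exists(z :: x\langle R\rangle z \wedge z\langle S\rangle y)$. Composition is associative and has unit element I. The converse of a relation R is written $R^{\cup}$ and is defined by $x\langle R^{\cup}\rangle y \equiv y\langle R\rangle x$.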
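On a finite universe these definitions can be tried out directly. The following is a minimal sketch in Python, assuming relations are represented as sets of (left, right) pairs; the function names and the example relation `succ` are illustrative and do not come from the paper.

```python
# Finite-universe sketch: a relation is a set of (left, right) pairs,
# read as "output on the left, input on the right".
U = range(4)

def identity(universe):
    """I: x<I>y holds exactly when x == y."""
    return {(x, x) for x in universe}

def compose(r, s):
    """R o S: x<R o S>y iff there is a z with x<R>z and z<S>y."""
    return {(x, y) for (x, z1) in r for (z2, y) in s if z1 == z2}

def converse(r):
    """Converse of R: swap the two sides of every pair."""
    return {(y, x) for (x, y) in r}

def is_deterministic(r):
    """x<R>y and z<R>y imply x == z: every input has at most one output."""
    outputs = {}
    for (x, y) in r:
        if outputs.setdefault(y, x) != x:
            return False
    return True

# Example: the successor function on 0..2, input n on the right, n+1 on the left.
succ = {(n + 1, n) for n in range(3)}
```

With this representation, `compose(succ, succ)` relates 2 to 0 and 3 to 1, and composing with `identity(U)` on either side leaves `succ` unchanged.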
A monotype is a relation A such that $A \subseteq I$. An example of a monotype is $\mathbb{N}$, defined by $n\langle\mathbb{N}\rangle m \equiv n = m \wedge (n \text{ is a natural number})$. There is a clear one-to-one correspondence between the subsets of U and the monotypes; this makes it possible to embed set calculus in relation calculus. The left domain of relation R, denoted $R^{<}$, is the least monotype A such that $A \circ R = R$. As its name suggests, $R^{<}$ represents the set of all x such that x is related by R to some y.
A left condition is a relation R such that $R = R \circ \top\!\!\top$. Clearly, if R is a left condition, then, for all x, $\exists(y :: x\langle R\rangle y) \equiv \forall(z :: x\langle R\rangle z)$. This suggests that a left condition may also be interpreted as a set, as we may take it to represent the set of values x such that $\exists(y :: x\langle R\rangle y)$. We usually abuse notation by writing $x \in R$ in place of $\exists(y :: x\langle R\rangle y)$, when R is a left condition.
A right condition is defined analogously, but we will not need to use right conditions in this paper. There is obviously a 1-1 correspondence between monotypes and left conditions given by the functions $R \mapsto R^{<}$ and $R \mapsto R \circ \top\!\!\top$. Making the right choice of which to use can simplify calculations a great deal. We use both in this paper.
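The correspondence between monotypes and left conditions can be illustrated on a small universe. This is a sketch under the same assumption as before (relations as sets of pairs); the helper names `left_domain`, `is_monotype`, `is_left_condition` and `as_set` are hypothetical.

```python
U = range(4)
I = {(x, x) for x in U}                    # identity relation
TOP = {(x, y) for x in U for y in U}       # the relation containing all pairs

def compose(r, s):
    """R o S on relations represented as sets of (left, right) pairs."""
    return {(x, y) for (x, z1) in r for (z2, y) in s if z1 == z2}

def is_monotype(a):
    """A monotype is a relation contained in the identity."""
    return a <= I

def left_domain(r):
    """The monotype representing all x related by R to some y."""
    return {(x, x) for (x, _y) in r}

def is_left_condition(r):
    """R is a left condition when R equals R composed with TOP."""
    return r == compose(r, TOP)

def as_set(r):
    """The subset of U that a monotype or a left condition represents."""
    return {x for (x, _y) in r}

R = {(1, 0), (1, 3), (2, 2)}
A = left_domain(R)        # monotype view of the set {1, 2}
C = compose(A, TOP)       # left-condition view of the same set
```

Both `A` and `C` stand for the set {1, 2}; `left_domain(C)` gives back `A`, which is the round trip between the two representations.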
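The relation $R \triangle S$ (pronounced R split S) is defined as the least relation X such that for all x, y and z, $(x, y)\langle X\rangle z \equiv x\langle R\rangle z \wedge y\langle S\rangle z$. Note
if f is a deterministic relation. We define $R \times S$ (pronounced R times S) by $(x, y)\langle R \times S\rangle(z, v) \equiv x\langle R\rangle z \wedge y\langle S\rangle v$.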
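A small sketch of split and times on finite relations, again assuming the set-of-pairs representation; the names `split` and `times` and the example relations are illustrative only.

```python
def split(r, s):
    """R split S: (x, y)<X>z iff x<R>z and y<S>z (both read the same input z)."""
    return {((x, y), z) for (x, z) in r for (y, z2) in s if z == z2}

def times(r, s):
    """R times S: (x, y)<R x S>(z, v) iff x<R>z and y<S>v (independent inputs)."""
    return {((x, y), (z, v)) for (x, z) in r for (y, v) in s}

R = {(1, 0), (2, 1)}
S = {(5, 0), (6, 0), (7, 2)}

shared_input = split(R, S)     # only input 0 is accepted by both R and S
paired_input = times(R, S)     # every pair of an R-pair and an S-pair
```

Split feeds one input to both components, so only inputs in both right domains survive; times runs the two components side by side on a pair of inputs.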
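The two projection relations are defined as follows: the left projection relates x to $(y, z)$ exactly when $x = y$, and the right projection relates x to $(y, z)$ exactly when $x = z$. The following properties are easily proved: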
The large number of binary operators that we use may make it difficult to parse our expressions; but the precedences were carefully chosen in order to minimise the need for parentheses, and the spacing around operators hints at the way to read a formula. See table 0 for a complete list of precedences.
Following established practice (see [5, 6, 11]) we model a circuit as a relation between arbitrary collections of streams, a stream being a total function whose domain is the integers. Abusing language somewhat, we will use the word "circuit" to mean an actual circuit, or a relation between streams as described above. Context should make clear which one is meant. We usually denote streams by the letters a through e.
As an alternative to our definition, it is possible to define streams as functions on the natural numbers (rather than the integers); but this leads to a more complicated theory, where many equalities no longer hold (see [6] for details). Our definition corresponds in a sense to ignoring initialisation problems. One may see here an analogy with traditional derivation of programs, where one can factor a proof of correctness into a proof of partial correctness together with a proof of termination. What we have instead is a derivation of a circuit that is correct, provided that the circuit can be initialized. This leaves us with the obligation of proving that our circuits can be correctly initialized. We will not devote much space to this latter problem. We trust that the reader will see that our circuits can be initialized, provided that there is a way to set the contents of all boolean delays to false, and all character delays to some value different from the encoding of all symbols in T. We assume that some "reset" wire exists in the implementation that performs this function, and we will not give further mention to this issue.
Given a relation R, a relation between streams can be constructed by "lifting": $a\langle\dot{R}\rangle b \equiv \forall(n :: a.n\,\langle R\rangle\, b.n)$. Hence for any R, relation $\dot{R}$ is a circuit. Note that, for deterministic relation f, stream a and integer m, $f.a.m = (\dot{f}.a).m$. We refer to this property in our calculations by the hint "lifting". Circuits can be built by relational composition, and product: given R and S, two circuits, the relations $R \circ S$ and $R \times S$ are also circuits.
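Lifting can be sketched by modelling a stream as a Python function from integers to values. This is an illustration only: `lift`, `lifted_holds` and the example streams are hypothetical names, and the lifted relation can of course only be checked on a finite window of times.

```python
def lift(f):
    """Lift a function on values to a function on streams, pointwise in time."""
    def lifted(a):
        return lambda n: f(a(n))
    return lifted

def lifted_holds(rel, a, b, window):
    """Check a<lifted R>b on a finite window: a.n <R> b.n for every n in it."""
    return all((a(n), b(n)) in rel for n in window)

is_odd = lambda n: n % 2 == 1            # a stream of booleans over the integers
negate = lift(lambda x: not x)           # the lifted negation function
NOT = {(True, False), (False, True)}     # negation as a relation on values
```

Here `negate(is_odd)(m)` equals `not is_odd(m)` for every integer m, which is the "lifting" property for this particular f.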
A particular relation on streams is the primitive delay, denoted by $\partial$ and defined by $a\langle\partial\rangle b \equiv \forall(n :: a.(n+1) = b.n)$. The antidelay is defined to be the converse of delay. In the interpretation as circuits, a delay is a memory element that outputs the contents of memory on the left side, and at every clock tick replaces the contents of memory with the input on its right side. The interpretation of antidelays is the same, with the role of "left" and "right" reversed. Note that both delay and antidelay are deterministic. We define the identity relation for streams in a way that is similar to how we defined delay. The primitive stream identity relates a to b exactly when $\forall(n :: a.n = b.n)$.
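Because delay and antidelay are deterministic, they can be sketched as functions on streams. The sketch below assumes the reading given by the memory-element interpretation (the output at time n+1 is the input at time n) and models streams as functions on the integers; the names `delay` and `antidelay` are illustrative.

```python
def delay(b):
    """Delay: the output at time n is the input at time n - 1."""
    return lambda n: b(n - 1)

def antidelay(a):
    """Antidelay, the converse of delay: the output at time n is the input at n + 1."""
    return lambda n: a(n + 1)

squares = lambda n: n * n      # an example stream over the integers
delayed = delay(squares)
```

On streams over the integers the two functions are mutually inverse: applying one after the other gives back the original stream at every time, including negative times.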
The identity on arbitrary pairings of streams is then defined by
The delay relations are polytypic in the sense that they apply the primitive delay $\partial$ to a collection of wires, independently of the shape of the collection.
Formally,
(Note that this and other properties of delays are proved in appendix A.) A similar domain property that we use frequently is:
These equations express the fact that applying delay to a pair of (collections of) wires is the same as applying delay to each component of the pair. From this property, and the corresponding one for antidelay, one immediately obtains the following useful distributivity properties, one for each of the two operators concerned:
and from (0) and the fact that delays are deterministic, one obtains
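Finally, the feedback of a circuit R is the circuit that relates a to b exactly when $a\langle R\rangle(b, a)$; that is, the output a is fed back as the second component of the input.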
We may now summarize our means of constructing circuits:
0. If R is a relation then $\dot{R}$ is a circuit.
1. The projections are circuits.
2. If R, S are circuits, then $R \circ S$, $R \times S$, $R \triangle S$, and $R^{\cup}$ are circuits.
3. Delays and antidelays are circuits.
4. If R is a circuit, then the feedback of R is a circuit.
A circuit R is said to be combinational if it is defined exclusively by means of the first three items in the above list; i.e., if delay, antidelay and feedback do not appear in its definition. A circuit term has an interpretation as a picture that is often useful as an aid to understanding how a circuit term is interpreted as a real circuit. A picture shows which "parts" of a circuit are connected; and interconnections are important in order to evaluate the circuit's performance. Figure 0 shows the correspondence between pictures and circuit terms. The picture interpretation shows the presence of combinational paths in the circuit. A picture may be seen as a graph, where combinational elements and delays are nodes, and wires are edges. A combinational path is a path in the picture that does not contain delays. One important parameter for the efficiency of a circuit implementation is the clock speed. Roughly speaking, the shortest clock period that can be assigned to a circuit implementation grows with the length of its longest combinational path. A circuit is said to be systolic when it is built out of small modules, interconnected by wires interrupted by delays (see [8]).
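The graph reading of a picture suggests a simple way to measure combinational paths. The sketch below is an illustration under stated assumptions, not part of the paper's method: the circuit is given as an adjacency dictionary, delay nodes are listed separately, path length is counted in combinational elements, and there is assumed to be no cycle that avoids every delay.

```python
from functools import lru_cache

def longest_combinational_path(edges, delays):
    """Number of combinational elements on the longest delay-free path.

    `edges` maps each node to the nodes its output wires lead to and
    `delays` is the set of delay nodes. Assumes no delay-free cycle.
    """
    nodes = set(edges) | {v for targets in edges.values() for v in targets}

    @lru_cache(maxsize=None)
    def longest_from(node):
        if node in delays:
            return 0                      # a delay ends every combinational path
        onward = [longest_from(v) for v in edges.get(node, ())]
        return 1 + max(onward, default=0)

    return max((longest_from(v) for v in nodes), default=0)

# Three combinational elements wired in a row, without and with delays between them.
chain = {"c1": ["c2"], "c2": ["c3"], "c3": []}
pipelined = {"c1": ["d1"], "d1": ["c2"], "c2": ["d2"], "d2": ["c3"], "c3": []}
```

For `chain` the longest combinational path has three elements; once every wire is interrupted by a delay, as in `pipelined`, it has one, independently of how long the chain is.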
There are many optimisation techniques that can be used to improve the performance of circuits. Here we will make use of retiming and slowdown.
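Retiming (see [9]) is a transformation that is essentially based on two laws, numbered (6) and (7): given that R is a circuit as defined above, R commutes with delay (6) and R commutes with antidelay (7).
These laws can be proved by structural induction (see appendix A). Combining (6) with property (8), which states that delay and antidelay are mutually inverse, we obtain property (9); and from the combination of (8) and (7), we obtain property (10). Both (9) and (10) state, for all circuits R, that R is unchanged when a delay is placed on one side of it and an antidelay on the other.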
(Note that the two retiming laws (9) and (10) break down when the domain of streams is taken to be the natural numbers rather than the integers. Instead of equalities one obtains isomorphisms up to retiming, making calculations more cumbersome.)
Another optimisation technique is slowdown [12]. Given a circuit R, the circuit $slow.R$ is obtained by replacing every occurrence of delay and antidelay in R by multiple delays and multiple antidelays, respectively. The slowed circuit is not equivalent to the original one; it has different timing properties. The reason for implementing a slowed version of a circuit is that the extra delays that are introduced can be shifted around by means of the retiming laws, with the general goal of making the circuit more systolic.
The relation between R and $slow.R$ is formally described in [12, p. 8]. Our use of slow is one of the places we alluded to in the introduction where our account is not completely rigorous. To illustrate the use of slowing, suppose we are implementing circuit (11), of the form $(\ldots R \ldots)^n$ for some $n > 0$, where R is combinational. The last line of the calculation is a circuit whose interpretation has no long combinational paths:
In fact, the length of the longest combinational path is no longer dependent on n. The term enclosed in a dashed box in the picture would not normally be implemented. Rather, it can be thought of as a timing specification of the circuit. It tells us what precisely is the timing difference between the original circuit, and the one that is actually implemented. Suppose one were to implement the circuit in the last picture, minus the part in the dashed box. In order to obtain a circuit equivalent to the slowed version of circuit (11), one should delay the input on the upper wire by three clock ticks, and "anticipate" the output of the lower wire by the same number of ticks. (Of course, while delaying is certainly an implementable operation, anticipating usually is not.) Note that the placement of delays and antidelays (outside the dashed box) implies that the flow of data through the circuit is both from left to right and from right to left; this is called "contra-flow".
The specification
The problem we want to consider is that of formulating a syntax-directed construction of a systolic circuit that (repeatedly) recognizes strings in the language denoted by a regular expression. The syntax of a regular expression is given by the BNF grammar $E ::= t \mid E + E \mid E ; E \mid E^*$ where t stands for all elements of a given finite alphabet T.
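The grammar can be transcribed as a small datatype. The following Python sketch is illustrative only: the class names are hypothetical, and the helpers `nullable` (does the language contain the empty word?) and `admissible` (does no starred subexpression have a body containing the empty word?) are added here because membership of the empty word matters for starred expressions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:            # a single letter t of the alphabet T
    t: str

@dataclass(frozen=True)
class Alt:            # E + F
    left: object
    right: object

@dataclass(frozen=True)
class Seq:            # E ; F
    left: object
    right: object

@dataclass(frozen=True)
class Star:           # E*
    body: object

def nullable(e):
    """True when the empty word belongs to the language of e."""
    if isinstance(e, Sym):
        return False
    if isinstance(e, Alt):
        return nullable(e.left) or nullable(e.right)
    if isinstance(e, Seq):
        return nullable(e.left) and nullable(e.right)
    if isinstance(e, Star):
        return True
    raise TypeError(f"not a regular expression: {e!r}")

def admissible(e):
    """True when no starred subexpression has a body containing the empty word."""
    if isinstance(e, Sym):
        return True
    if isinstance(e, Star):
        return not nullable(e.body) and admissible(e.body)
    return admissible(e.left) and admissible(e.right)
```

For example, `Star(Sym("a"))` is admissible, while `Star(Star(Sym("a")))` is not, although both denote the same language.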
To begin with, we define a mapping from E to the set of stream transducers F, where
Thus, given a regular expression E, the recognizer for E maps a pair consisting of a stream of characters and a stream of booleans to a stream of booleans. In order to avoid the error in [4] we shall restrict the regular expressions to those expressions not including a subexpression $E^*$ such that the empty word is a member of E. It is well known that this does not reduce the expressive power of regular expressions and that every regular expression can be easily transformed to one of this form. We denote the isomorphism between streams of booleans and left conditions by tt (standing for "times true") and define it by, for all integers m and n and all streams of booleans e, $m\langle tt.e\rangle n \equiv e.m$. For instance, if the stream e is defined for all n by $e.n \equiv (n \text{ is odd})$, then $tt.e = \{1, 3, 5, \ldots\}$. We also introduce a binary relation on integers $mem.(E, a)$, for each expression E and each stream of letters a, defined by $m\langle mem.(E, a)\rangle n \equiv a[n, m) \in E$
where $a[n, m)$ denotes the string $a.n ; a.(n+1) ; \ldots ; a.(m-1)$. Note the switch in the order of m and n. Returning to the example where E = t ; t, we have that if the stream a is defined by $a.n = t$ for all n, then $m\langle mem.(t ; t, a)\rangle n \equiv m = n + 2$. Another way to look at mem is as a set transformer. If we compose $mem.(E, a)$ after a left condition, we obtain another left condition:
$mem.(E, a) \circ (R \circ \top\!\!\top) = (mem.(E, a) \circ R) \circ \top\!\!\top$. So if $R \circ \top\!\!\top$ can be interpreted as the set $\{0, 1, 5\}$, then, given a defined as above, $mem.(t ; t, a) \circ R \circ \top\!\!\top$ could be interpreted as $\{2, 3, 7\}$.
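The set-transformer reading of mem can be tried out on a finite window of times. The sketch below is an illustration under stated assumptions: expressions are encoded as nested tuples (`("sym", t)`, `("alt", e, f)`, `("seq", e, f)`, `("star", e)`), streams are Python functions on integers, and the names `matches` and `mem_image` are hypothetical.

```python
def matches(expr, a, n, m):
    """Does the segment a.n ; ... ; a.(m-1) belong to the language of expr?"""
    kind = expr[0]
    if kind == "sym":
        return m == n + 1 and a(n) == expr[1]
    if kind == "alt":
        return matches(expr[1], a, n, m) or matches(expr[2], a, n, m)
    if kind == "seq":
        # Try every cut point k between the two parts.
        return any(matches(expr[1], a, n, k) and matches(expr[2], a, k, m)
                   for k in range(n, m + 1))
    if kind == "star":
        # Either the empty segment, or a non-empty first piece followed by more.
        return n == m or any(matches(expr[1], a, n, k) and matches(expr, a, k, m)
                             for k in range(n + 1, m + 1))
    raise ValueError(f"unknown expression kind: {kind}")

def mem_image(expr, a, times, window):
    """Times m in `window` reachable from some start time n in `times`."""
    return {m for m in window for n in times
            if n <= m and matches(expr, a, n, m)}

all_t = lambda n: "t"                        # the stream that is t at every time
t_then_t = ("seq", ("sym", "t"), ("sym", "t"))
image = mem_image(t_then_t, all_t, {0, 1, 5}, range(0, 10))   # {2, 3, 7}
```

Each start time is moved two ticks later, which reproduces the example from the text.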
The following properties of mem are easily verified (see appendix B):
$m\langle mem.(t, a)\rangle n \equiv m = n+1 \wedge a.n = t$
$mem.(E + F, a) = mem.(E, a) \cup mem.(F, a)$
$mem.(E ; F, a) = mem.(F, a) \circ mem.(E, a)$
$mem.(E^*, a) = (mem.(E, a))^*$ (12)
These properties provide ample justification for choosing to use relation algebra in the formal specification of the recognizer: the function $E \mapsto mem.(E, a)$ is a homomorphism from the algebra of expressions to the algebra of relations.
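The homomorphism property means that mem can be computed by structural recursion, using only relational operators. The sketch below illustrates this on a finite, contiguous window of times; the tuple encoding of expressions and the names `compose`, `closure` and `mem` are assumptions made for the example.

```python
def compose(r, s):
    """R o S on relations represented as sets of (left, right) pairs."""
    return {(x, y) for (x, z1) in r for (z2, y) in s if z1 == z2}

def closure(r, window):
    """Reflexive, transitive closure of r over the times in `window`."""
    result = {(n, n) for n in window}
    while True:
        bigger = result | compose(r, result)
        if bigger == result:
            return result
        result = bigger

def mem(expr, a, window):
    """mem.(expr, a) restricted to `window`, built following properties (12)."""
    kind = expr[0]
    if kind == "sym":       # m = n + 1 and a.n = t
        return {(n + 1, n) for n in window
                if n + 1 in window and a(n) == expr[1]}
    if kind == "alt":       # choice becomes union
        return mem(expr[1], a, window) | mem(expr[2], a, window)
    if kind == "seq":       # sequence becomes composition, in reversed order
        return compose(mem(expr[2], a, window), mem(expr[1], a, window))
    if kind == "star":      # star becomes reflexive, transitive closure
        return closure(mem(expr[1], a, window), window)
    raise ValueError(f"unknown expression kind: {kind}")

all_t = lambda n: "t"
t = ("sym", "t")
```

Note the reversed order in the sequence case: the relation for F is composed after the relation for E, because input is on the right.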
We say that a circuit f (formally a stream transducer, i.e., an element of F) recognizes regular expression E when the following holds, for all $a \in Stream(T)$ and $e \in Stream(\mathbb{B})$: $tt.f.(a, e) = mem.(E, a) \circ tt.e$. A way to read this is: the set of times at which e is true, that is $tt.e$, is transformed by mem into a set that must be exactly the same as the set of times at which $f.(a, e)$ is true.
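The specification can be checked on a finite window for a candidate recognizer. The sketch below is an illustration, not the paper's circuit: it uses the single-character property of mem ($m = n+1 \wedge a.n = t$) to propose a candidate `char_recognizer`, and the names `tt`, `char_recognizer` and `mem_char_image` are hypothetical.

```python
def tt(e, window):
    """The set of times in `window` at which the boolean stream e is true."""
    return {n for n in window if e(n)}

def char_recognizer(t):
    """Candidate recognizer for the single letter t, as a stream function."""
    def f(a, e):
        # True at time m when e was true one tick earlier and that letter was t.
        return lambda m: e(m - 1) and a(m - 1) == t
    return f

def mem_char_image(t, a, times):
    """mem.(t, a) applied to a set of start times: n goes to n + 1 when a.n = t."""
    return {n + 1 for n in times if a(n) == t}

alternating = lambda n: "ab"[n % 2]          # ... a b a b ... with "a" at even times
enable = lambda n: n in (0, 1, 4)            # the stream e, true at times 0, 1 and 4

lo, hi = -5, 10
lhs = tt(char_recognizer("a")(alternating, enable), range(lo + 1, hi + 1))
rhs = mem_char_image("a", alternating, tt(enable, range(lo, hi)))
```

Both sides come out as the set {1, 5}: the enabled times 0 and 4 carry the letter a, the enabled time 1 does not.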
A non-systolic recognizer
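Once the specification is made clear, deriving a (non-systolic) recognizer is easy. We begin by deriving the recognizer for a single character. We have:
    ... $\exists(n :: a[n, m) \in E \wedge n \in S))$
$\Rightarrow$    { assume $\varepsilon \notin E$ and $mem.(E, a)^{>} \subseteq \mathbb{N}$; predicate calculus }
    $\forall(m : m \in \mathbb{N} : m \in S \Rightarrow \exists(n : n \in \mathbb{N} : n < m \wedge n \in S)) \wedge S \subseteq \mathbb{N}$
$\Rightarrow$    { the natural numbers are well-founded }
    $S = \emptyset$
$\equiv$    { calculus, $S = X \circ \top\!\!\top$ }
    $X = \bot\!\!\bot$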
We have thus found that the assumptions $\varepsilon \notin E$ and $mem.(E, a)^{>} \subseteq \mathbb{N}$ together imply that $mem.(E, a)$ is well-founded. The second of these assumptions is equivalent to postulating that the stream a is such that if a segment of a is a word in E, then this segment is wholly contained in the non-negative "half". Actually, it simplifies matters if we make an even stronger postulate, namely that for all $n < 0$, the value of $a.n$ is some character not appearing in E. This corresponds to asserting that the circuit is fed invalid input until time 0. One may think of time 0 as the moment after the circuit is reset. Given this assumption, we may henceforth just say that $mem.(E, a)$ is well-founded if $\varepsilon \notin E$.
We are now ready to tackle the derivation of the circuit that recognizes $E^*$. Assume f recognizes E. Assume also that $\varepsilon \notin E$. It is apparent from the picture interpretation of the above circuit that there is a path that is not interrupted by delays from one side to the other of the circuit. Such a path, often called a "combinational path", places a constraint on the implementation since the longer the path, the longer it takes for the real circuit to stabilize after a change in the input. This forces the implementer to use a clock with a longer period, with a possible negative effect on the performance of the whole circuit (see, for example, [9]). Even worse is the fact that for every string recognizer, this path grows in length with the length of the string. In order to apply the optimisation techniques we have introduced in section 1, we should try to transform our circuits so that they exhibit contra-flow. Taking inspiration from Foster and Kung's work, we concentrate on sequence. We begin by considering a general technique for introducing contra-flow in a circuit. Suppose we want to implement a circuit R, taking input on the right side and producing output on the left side. A simple way to introduce contra-flow is to implement the circuit of equation (14) instead of R. The relation between the two is a straightforward consequence of the retiming equations (6) and (7) and the domain equations (2).
Returning now to recognizers, let $g \triangle f$ be the recognizer for E ; F. (The expression has to be parsed as $(((E_1 ; E_2) ; E_3) ; \ldots) ; E_n$ to achieve this result.) The benefit of this transformation is maximised in the case that the expression to be recognized is a sequence of characters. Before slow-down the constructed circuit has of course a pair of wires stretching across the full breadth of the circuit. After slowing and retiming the circuit is completely systolic (see figure 3). If the regular expression does not have this very special shape the benefit is diminished. Consider for instance expressions of the form $E_1 + E_2 + E_3 + \ldots$
The translation of such expressions has the form This way we have reduced the problem of nding a recursive de nition for to the problem of nding a recursive de nition for , where the ( term) part does not occur. We now proceed by cases. Note rst, however, that the requirement on :E satis es :E plumb < = slow: :E plumb < is met if :E = slow: :E The addition of the context condition plumb < is needed for the application of equation (18) Finally, we get to the sequence operator. As mentioned earlier, this is where the context condition is needed:
:(E ; F) plumb < = 
Conclusions
In the usual squiggol style, one works with syntactic terms that can be interpreted as both mathematical functions, and computer programs. What one does then is to take a term and transform it according to rules that do not change the functional interpretation, but may, and should, change the efficiency of the term interpreted as a program. What we did in the last section is very similar, except that instead of working with a simple term, we had to improve the efficiency of a term-valued function. This is how the term-transforming functions of the last section come into being. The characterisation of such a function as a function from relations to relations is simple; but it is not as easy to specify formally what we expect it to do as a function from syntactical terms to syntactical terms. What we had in mind as we worked is "apply the useful transformations as thoroughly as possible." It could prove fruitful to apply further work to develop notations for cleanly specifying term transformation functions of this kind. An interesting element of our derivation is its use of the unique extension property (uep) for regular languages in the case of a starred expression. The fact that the subexpression should not include the empty word is a necessary and sufficient condition for application of the uep. This is where the error occurred in Foster and Kung's original paper. The non-uniqueness of solutions to certain equations in relation calculus corresponds to indeterminate behaviour in the corresponding circuits.
At the present stage of our work we are not completely content with the clarity and rigour of our derivation. We are very satisfied with the derivation of the non-systolic recognizer and with the division of the derivation into two phases. Our dissatisfaction is with the formal presentation of retiming and slowdown and, in particular, tying together individual calculations. We are currently endeavouring to eliminate these weaknesses.
A Proof of the delay and retiming laws
This section contains proofs of the properties of delay and, in particular, the retiming laws. The proof of (2) The proof of the second half, and of the corresponding properties of , are similar.
The next calculation establishes property (8).
Note that all the proofs we have given until now can be easily modified to prove the corresponding properties for antidelay. We will then assume that the reader is convinced that both (6) and (7) hold for lifting, projections, and delays.
Suppose now that both (6) and (7) 
