Abstract: Path expressions were originally proposed by Campbell and Habermann [1] as a mechanism for process synchronization at the monitor level in software. Not unexpectedly, they also provide a useful notation for specifying the behavior of asynchronous circuits. Motivated by this potential application we investigate how to directly translate path expressions into hardware.
Introduction
As the boundary between software and hardware grows less and less distinct, it becomes increasingly important to investigate methods of directly implementing various programming language features in hardware.
Since many of the problems in interfacing hardware devices involve some form of process synchronization, language features for synchronization deserve considerable attention in such investigations. In this paper we consider the problem of directly implementing path [...]

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

[...] shown how to implement Petri nets as circuits by using a PLA-like device called an asynchronous logic array [11]. Thus, an obvious method for compiling path expressions into circuits would be to first translate the path expression into a Petri net and then to implement the Petri net as a circuit using an asynchronous logic array. However, careful examination of Lauer and Campbell's scheme shows that a multiple path expression consisting of M paths each of length K can result in a Petri net with K^M places. Thus, the naive approach will in general be infeasible if the number of individual paths in a multiple path expression is large.
For the case of a path expression with a single path their scheme does result in a Petri net which is comparable in size to the path expression.
However, direct implementation of such a net using Patil's ideas may still result in a circuit with an unacceptably large area. An asynchronous logic array for a Petri net with P places and T transitions will have area proportional to P·T regardless of the number of arcs in the net. Since the nets obtained from path expressions tend to have sparse edge sets, this quadratic behavior may waste significant chip area.
Perhaps the work that is closest to ours is due to Li and Lauer [...] In contrast, the circuits produced by the construction described in this paper have area proportional to N·log(N), where N is the total length of the multiple path expression under consideration. Furthermore, this bound holds regardless of the number of individual paths or the degree of synchronization between paths. As in [3] and [4], the basic idea is to generate circuits for which the underlying graph structure has a constant separator theorem [7]. For path expressions with a single path the techniques used by [3] and [4] can be adapted without great difficulty. For multiple paths with common event names, however, the construction is not straightforward, because of the potential need for synchronization at many different points on each individual path.
Moreover, the actual circuits that we use must be much more complicated than the synchronous ones used in [3] and [4]. Since events are inherently asynchronous in our model, all of our circuits must be self-timed. This requires the use of special circuit design techniques and significantly complicates the proof that this circuit correctly captures the semantics of path expressions.
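The size figures quoted in this introduction can be compared with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, constant factors are ignored, the logarithm is assumed to be base 2, and N is assumed to equal M·K when all M paths have length K.

```python
import math


def naive_petri_places(k, m):
    """Worst-case place count quoted for M paths each of length K."""
    return k ** m


def logic_array_area(places, transitions):
    """Area of an asynchronous logic array, proportional to P*T."""
    return places * transitions


def proposed_area(n):
    """Area of the construction in this paper, proportional to N*log(N).

    Base-2 logarithm is an assumption; the text does not fix the base.
    """
    return n * math.log2(n)


if __name__ == "__main__":
    m, k = 6, 10          # hypothetical: 6 paths, each of length 10
    n = m * k             # assumed total length of the multiple path expression
    print("naive places:", naive_petri_places(k, m))
    print("proposed area (up to a constant):", round(proposed_area(n)))
```

Even for this small hypothetical input the exponential place count dwarfs the N·log(N) figure, which is the point of the comparison.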
The Semantics of Path Expressions
In this section we give a simple but formal semantics for path expressions in terms of partially ordered multisets of events [12]. We also relate our semantics to the one in terms of Petri nets given by Lauer and Campbell [6].

If the ordering relation of a pomset P over Σ is a total order, then we can naturally associate a sequence of elements of Σ with P; we will use S(P) to denote this sequence. In fact, a pomset should be regarded as a natural generalization of a sequence in which certain elements are permitted to be concurrent; this is why the concept is useful in modeling systems where several events may occur simultaneously.
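The relationship between a totally ordered pomset and its sequence S(P) can be sketched as follows. This is a minimal illustration, assuming a pomset is given as a labelling of elements by event names plus a set of (earlier, later) pairs that generate an acyclic order; the representation and the function names are hypothetical.

```python
from itertools import combinations


def transitive_closure(pairs):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def sequence_of(labels, order):
    """Return S(P) as a list of event names, or None if P is not totally ordered.

    labels: dict mapping each element to its event name (names may repeat).
    order:  set of (earlier, later) element pairs, assumed acyclic.
    """
    closure = transitive_closure(order)
    for a, b in combinations(labels, 2):
        if (a, b) not in closure and (b, a) not in closure:
            return None  # a and b are concurrent, so no unique sequence exists
    # In a total order, an element's rank is its number of predecessors.
    ranked = sorted(labels, key=lambda x: sum(1 for _, q in closure if q == x))
    return [labels[x] for x in ranked]
```

For example, three elements labelled a, b, a with order 1 < 2 < 3 yield the sequence a, b, a, while dropping the pair (2, 3) leaves element 3 concurrent with the others and no sequence is returned.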
Definition 2 If P = (Q, <, F) is a pomset over Σ and Σ1 ⊆ Σ, then the restriction of P to Σ1 is the pomset [...]

A simple path expression is a regular expression with an outermost Kleene star. The only operators permitted in the regular expression are (in order of precedence) "*", ";", and "+". The "*" operator is the Kleene star, ";" is the sequencing operator, and "+" represents exclusive choice. Operands are event names from some set of events Σ that we will assume to be fixed in this paper. The outermost Kleene star is usually represented by the delimiting keywords path ... end. Thus (a)* would be represented as path a end.
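Because a simple path expression is a regular expression with the same operator precedence as ordinary regular expressions, its body can be translated almost character by character into a pattern for Python's `re` module. The sketch below assumes single-letter event names and checks only histories made of complete iterations of the path (not proper prefixes); the function names are hypothetical.

```python
import re


def path_to_regex(body):
    """Translate the body of `path ... end` into a compiled Python regex.

    Assumes every event name is a single letter.
    """
    out = []
    for ch in body:
        if ch.isspace() or ch == ";":
            continue              # sequencing is plain concatenation in `re`
        if ch == "+":
            out.append("|")       # exclusive choice becomes alternation
        elif ch in "()*" or ch.isalpha():
            out.append(ch)        # grouping, Kleene star, event names
        else:
            raise ValueError(f"unexpected character {ch!r}")
    # The delimiters path ... end stand for the outermost Kleene star.
    return re.compile("(?:" + "".join(out) + ")*")


def is_complete_history(body, events):
    """True if `events` consists of zero or more full iterations of the path."""
    return path_to_regex(body).fullmatch("".join(events)) is not None
```

With this sketch, `is_complete_history("a;(b+c)", "abac")` holds because the history is two iterations (a then b, a then c), while the history "aa" is rejected.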
A multiple path expression is a set of simple path expressions. As we will see shortly, each additional simple path expression further constrains the order in which events can occur. However, we cannot simply take as our semantics for multiple path expressions the intersection of the languages corresponding to the individual path expressions; two events whose order is not explicitly restricted by one of the simple path expressions may be concurrent. For example, in the multiple path expression for the readers and writers problem discussed in the introduction the two read events R1 and R2 may occur simultaneously.
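One way to see why events unrelated by any single path are left unordered is to check a history against each path separately, looking only at the events that path mentions. The sketch below is an assumption-laden illustration, not the formal semantics: it handles only totally ordered histories made of complete iterations, the two paths are hypothetical, and each is written directly as a Python regex.

```python
import re

# Hypothetical multiple path expression:  path a;b end   path c end
# Each entry pairs the path's alphabet with its language as a Python regex.
PATHS = [
    (set("ab"), re.compile(r"(?:ab)*")),
    (set("c"), re.compile(r"(?:c)*")),
]


def restrict(history, alphabet):
    """Keep only the events of `history` that belong to `alphabet`."""
    return "".join(e for e in history if e in alphabet)


def consistent(history, paths=PATHS):
    """True if the restriction of `history` to every path is in that path's language."""
    return all(rx.fullmatch(restrict(history, alpha)) is not None
               for alpha, rx in paths)
```

Both "acb" and "cab" pass the check, since no path relates c to a or b; this freedom of interleaving is what the pomset semantics expresses as concurrency. The history "ba" fails because the first path orders a before b.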
Nevertheless, we will still have occasion to use ordinary regular expressions in giving the semantics for path expressions; if R is an ordinary regular expression over Σ, then Σ_R ⊆ Σ will be the set of symbols of Σ that actually appear in R and L_R ⊆ Σ_R* will be the regular language which corresponds to R.

[...]

1. The external world raises REQ_e to indicate that it would like to proceed with event e.
2. The synchronizer raises ACK_e to allow the external world to proceed with event e.
3. The external world lowers REQ_e, signifying completion of event e.
4. The synchronizer lowers ACK_e, signifying the end of the cycle and permission to begin a new one.
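The four steps above can be sketched as a small state model for a single event e. The class and method names are hypothetical, and the model simply rejects any step taken out of order; it says nothing about timing or about the circuit that produces ACK.

```python
class EventHandshake:
    """Four-phase REQ/ACK cycle for one event (illustrative model only)."""

    def __init__(self):
        self.req = False  # driven by the external world
        self.ack = False  # driven by the synchronizer

    def raise_req(self):  # step 1
        if self.req or self.ack:
            raise RuntimeError("step 1 needs REQ and ACK low")
        self.req = True

    def raise_ack(self):  # step 2
        if not self.req or self.ack:
            raise RuntimeError("step 2 needs REQ high and ACK low")
        self.ack = True

    def lower_req(self):  # step 3
        if not (self.req and self.ack):
            raise RuntimeError("step 3 needs REQ and ACK high")
        self.req = False

    def lower_ack(self):  # step 4
        if self.req or not self.ack:
            raise RuntimeError("step 4 needs REQ low and ACK high")
        self.ack = False

    @property
    def event_in_progress(self):
        """The event is taking place exactly while REQ and ACK are both high."""
        return self.req and self.ack


def run_cycle(handshake):
    """Run one full cycle and record event_in_progress after each step."""
    trace = []
    for step in (handshake.raise_req, handshake.raise_ack,
                 handshake.lower_req, handshake.lower_ack):
        step()
        trace.append(handshake.event_in_progress)
    return trace
```

Running one cycle records the event as in progress only after step 2 and before step 3, and a second cycle cannot begin until ACK has been lowered again.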
In this implementation, an event will occur during the period between steps 2 and 3 in this protocol, where both REQ and ACK are high.
Thus, multiple occurrences of any event e are non-overlapping in time.

Each sequencer block in Figure 3- [...] The output of the latch at the end of the C gate for e, which is labeled CLR_e, is connected to each of the NOR gates in front of the arbiter which corresponds to event e or to some event mutually exclusive to e.
The following is an informal description of how the circuit works.
The circuit behaves as shown in the timing diagram in Figure 3 [...]

With this notation in place we state some propositions, or axioms, that describe the properties of the circuit of Figure 3-2. These properties will be used to prove that the circuit is safe and live. The propositions that are not self-evident will be justified in later sections of this paper.

[...]

Proof: See the appendix. □
As a converse to theorem 10 we would like to show that our circuit can produce any valid trace Ext, such that T(Ext) ∈ Tr(M), for at least some behavior of the external world. However, for some traces T ∈ Tr(M), there does not exist any Ext such that T(Ext) = T, so there is no way any circuit can produce the required trace Ext. This happens when T does not sufficiently constrain the order in which the elements may occur, so that any actual set of time intervals will have fewer concurrent elements than T. Given such a T it is necessary to constrain its partial order relation further, by adding additional (consistent) precedence relationships. It is easy to show using definition 4 that this will never remove T from the set Tr(M). We shall show that whenever T is sufficiently constrained so that it falls in a class of traces we call layered, then for some behavior of the external world T(Ext) for our circuit will equal this modified T. In general, any trace P will have a corresponding layered trace T which preserves most of the parallelism of P. It is easy to show that for any trace P, there exists a layered trace T which differs from P only in that the partial order relation of P is a restriction of that of T.

[...] We show that a compact layout for the sequencer exists, so that circuits of this type can be implemented economically in VLSI.
Since a simple path expression is a regular expression, the sequencer for a simple path expression is similar to a recognizer for the regular [...]

Now that the behavior of a sequencer has been described, we show how to construct a sequencer for any path. A sequencer has two parts: [...]

Because of the definitions of Start and End, the leftmost latch is loaded from ENB whenever at least one TR is on and no TA is on, while the rightmost latch is loaded to update RES whenever at least one TA is on and no TR is on. The two latches are never loaded at the same time; in fact, because TR and TA follow the four cycle signalling convention, there is a non-zero time between the end of the load signal for one latch and the start of the load signal for the other. Thus there is no combinational path through the cell.

[...] We define ENB and RES to be correct if they meet the following conditions:
• ENB is true for a sub-circuit if each sequence of events satisfying the expression for the sub-circuit may be the next sequence to occur.
• RES is true for a sub-circuit if some sequence of events satisfying the sub-circuit has just occurred, and ENB was true before the beginning of that sequence.

We shall prove the stronger statement that all ENB signals in the recognizer are correct. This proof is based upon the structure of the recognizer. An ENB signal in a recognizer is set by one of four sources:
• The operand port of a "*" or "+" cell;
• The left operand port of a ";" cell;
• The right operand port of a ";" cell;
• [...]

[...] length n that has k types of input events is laid out in this fashion, the area of the layout is no more than O(n(log n + k)). This is due to the structure of the recognizer circuits. All recognizer circuits are trees, which can be laid out with all nodes on a line and edges running parallel to the line using no more than O(log n) wiring tracks [7]. Thus the height of the circuit in Figure 4-7 is O(log n + k) while its width is O(n).
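The claim that a tree can be placed on a line with only O(log n) wiring tracks can be checked numerically in one easy special case. The sketch below is not the separator-based construction of [7]; it assumes a complete binary tree, places its nodes in in-order sequence, and counts the largest number of edges crossing any gap between adjacent nodes. The function names are hypothetical.

```python
def inorder_positions(height):
    """Map heap indices (root = 1) of a complete binary tree to in-order positions."""
    n = 2 ** height - 1
    pos = {}
    counter = 0

    def visit(node):
        nonlocal counter
        if node > n:
            return
        visit(2 * node)          # left subtree first
        counter += 1
        pos[node] = counter      # then the node itself
        visit(2 * node + 1)      # then the right subtree

    visit(1)
    return pos


def wiring_tracks(height):
    """Maximum number of tree edges crossing any gap in the in-order layout."""
    pos = inorder_positions(height)
    n = len(pos)
    # Each non-root node c has an edge to its parent c // 2.
    edges = [(pos[c // 2], pos[c]) for c in range(2, n + 1)]
    best = 0
    for gap in range(1, n):      # gap lies between positions gap and gap + 1
        crossing = sum(1 for a, b in edges if min(a, b) <= gap < max(a, b))
        best = max(best, crossing)
    return best
```

For a tree of height h (n = 2^h - 1 nodes) this layout needs h - 1 tracks in the cases checked here, which grows logarithmically in n, in line with the O(log n) bound quoted above.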
Implementation of the Arbiter
In this section we briefly elaborate on the arbiter shown in Figure 3 [...]

Hence an arbiter is simply a transducer that takes a set of inputs and produces a set of outputs, subject to the constraints outlined earlier.
Moreover, it is implicitly assumed that the arbiter is oblivious of any static or dynamic structure of the path expressions other than those represented by the conflict graph and the set of events enabled; in particular, it has no knowledge of the syntactic structure of the path expression, nor does it know the internal states of the individual sequencers. Clearly, one can build non-oblivious arbiters that may perform better, but this will be at the expense of conceptual simplicity and the area needed for additional logic and global wires.
To motivate our design we shall briefly discuss the problems with some simple schemes. In particular, we show that any deterministic oblivious arbiter gives rise to starvation of an event which is continually [...]

The difficulty of building a fair deterministic arbiter can be illustrated by an example. Let Σ = {A1, A2, ..., An} be a set of events. To try to build a fair arbiter for Σ we might assign a priority number from 0 through n-1 to each event, where the priority corresponds to the number of times the event is blocked, i.e., the number of times the event is enabled but not selected by the arbiter. At any instant the arbiter selects from the set of enabled events the one with the highest priority number.
When an enabled event is selected its priority number is reinitialized to the lowest value. On the other hand, if the enabled event is not selected its priority number is incremented by one. It seems that since an event Ai can have at most n-1 neighbors in the conflict graph, and since each time it is blocked at least one of its neighbors is selected with a resulting increment in its own priority, after the n-th attempt Ai must have the highest priority among all the neighboring events and hence must be selected. However, an event may never be enabled even if its request is still pending, because sequencing conditions imposed by the path expression may block the event. In order to make this observation concrete consider the following path expression: [...]
Assume that the external client always requests permission to perform all three events A, B and C. Let the priorities of all three be 0 initially.
As a result, initially A and B are enabled. Assume that B is selected, making B's priority 0 and A's priority 1. In the next instant, A and B will again be enabled. But now A has the higher priority and will be selected, so that A's priority becomes 0 and B's becomes 1. Continuing in this fashion, it is easy to see that the sequence chosen will be B A B A B A .... The trouble with this scheme is that C will never be enabled even if its request is pending. This example can be extended to the following lemma. [...]

Before proceeding further, let us consider the path expression path A + B end, where the conflict graph is G = (V, E) = ({A, B}, {[A, B]}).
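The starvation scenario for the priority scheme described above can be reproduced with a short simulation. The path expression used in the original example is not reproduced here, so the sketch assumes a hypothetical one with the same behavior, path B + A;(B + A;C) end, in which C becomes enabled only after two consecutive occurrences of A. The tie-breaking order and all names are likewise assumptions.

```python
# State machine for the hypothetical expression  path B + A;(B + A;C) end.
# NEXT[state] maps each enabled event to the state reached when it occurs.
NEXT = {
    0: {"A": 1, "B": 0},
    1: {"A": 2, "B": 0},
    2: {"C": 0},
}
TIE_BREAK = ["B", "A", "C"]  # assumed fixed order used to resolve equal priorities


def simulate(steps):
    """Run the priority arbiter with all three requests permanently pending."""
    state = 0
    priority = {"A": 0, "B": 0, "C": 0}
    chosen = []
    for _ in range(steps):
        enabled = [e for e in TIE_BREAK if e in NEXT[state]]
        # max() returns the first maximal element, so ties follow TIE_BREAK.
        winner = max(enabled, key=lambda e: priority[e])
        for e in enabled:
            # Selected event is reset; enabled but unselected events are bumped.
            priority[e] = 0 if e == winner else priority[e] + 1
        state = NEXT[state][winner]
        chosen.append(winner)
    return "".join(chosen)
```

Under these assumptions `simulate(8)` yields the alternation B A B A ..., the arbiter never reaches the state in which C is enabled, and C's priority never rises because it is never counted as blocked.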
Seitz [14] has shown how to build an arbiter for such a two-event conflict structure using an interlock element, as shown in Figure 5-2.
Circuit operation in Figure 5-2 is most easily visualized starting with neither client requesting, v1 and v2 both near 0 volts, and both outputs high. If any single input, say A_in, is lowered then v1 is driven high.
[...]

Proof: By induction on the number of rising transitions of TR's:

1. (First transition): Let the corresponding event be e. By proposition 9 initially all TA's are low and all CLR's are high, hence all IN's are low initially. By proposition 7 all TA's will remain low until the first rising transition of TR_e. By the same proposition DIS_e will not change until the first rising transition of TR_e. If DIS_e were not low, IN_e would remain low (see [...]). Hence by proposition 6, TR_e would remain low, a contradiction.

2. (For a succeeding transition): Let the corresponding event be p and that of the previous transition q. While TR_q is high no TA or TR other than TA_q or TR_q can be high (proposition 6 and lemma 19). Until CLR_q goes high, TR_q must remain high (see Figure 3-2). Once CLR_q goes high, all IN_a, with a ∈ Σ_Rj, will be low after a short delay (see Figure 3-2). Assuming the variation in this delay for different a's is less than the delay of the arbiter in lowering TR_q, all TR_a with a ≠ q will continue to remain low until CLR_q is lowered (see Figure 3-2). All TA_a, with a ≠ q, also continue to remain low (proposition 7). But CLR_q remains high at least until TA_q is lowered (see Figure 7). Hence by the time TR_p is raised, all TA's will be low. Also TR_p could not have been raised if IN_p were low (proposition 6). But if DIS_p was high when TA_p was last lowered then IN_p would now be low (see Figure 3-2), assuming the main NOR gate plus the 2-input NOR gate have a lesser delay than the Muller C-element plus the SR flip-flop. Moreover, DIS_p cannot change before TR_p is raised (proposition 7). Hence DIS_p must be low when TR_p is raised. The proof for the second half of the cycle is more straightforward.
By lemma 8, once all REQ's are lowered, within a finite time all relevant TA's will be raised, causing the corresponding CLR's to go high. As a result all relevant IN's go low (see figure [...]) and hence by proposition 6 all ACK's go low within a finite time, completing the cycle. □
