Compiling Path Expressions into VLSI Circuits by Anantharaman, T. S. et al.
Compiling Path Expressions 
into VLSI Circuits
T. S. Anantharaman
E. M. Clarke
M. J. Foster†
B. Mishra
Carnegie-Mellon University
Pittsburgh, Pennsylvania 15213
June 1985 
CTCS-166-3S 
†Current address: Department of Computer Science, Columbia University, New York, New York 10027.
This research was partially supported by NSF Grant MCS-82-16706, and the Defense Advanced Research
Projects Agency (DOD), ARPA Order No. 3597, monitored by the Air Force Avionics Laboratory under
Contract F33615-81-K-1539.
Abstract: Path expressions were originally proposed by Campbell and Habermann [2] as a mechanism for
process synchronization at the monitor level in software. Not unexpectedly, they also provide a useful
notation for specifying the behavior of asynchronous circuits. Motivated by these potential applications we
investigate how to directly translate path expressions into hardware.
Our implementation is complicated in the case of multiple path expressions by the need for synchronization
on event names that are common to more than one path. Moreover, since events are inherently asynchronous
in our model, all of our circuits must be self-timed.
Nevertheless, the circuits produced by our construction have area proportional to $N \cdot \log(N)$ where N is the
total length of the multiple path expression under consideration. This bound holds regardless of the number
of individual paths or the degree of synchronization between paths. Furthermore, if the structure of the path
expression allows partitioning, the circuit can be laid out in a distributed fashion without additional area
overhead.
1. Introduction 
As the boundary between software and hardware grows less and less distinct, it becomes increasingly
important to investigate methods of directly implementing various programming language features in
hardware. Since many of the problems in interfacing hardware devices involve some form of process
synchronization, language features for synchronization deserve considerable attention in such investigations.
In this paper we consider the problem of directly implementing path expressions as self-timed VLSI circuits.
Path expressions were originally proposed by Campbell and Habermann [2] for restricting access by other
processes to the procedures of a monitor. For example, the simple readers and writers problem with two
reader processes and a single writer process is solved by the following multiple path expression:
path R1 + W end,
path R2 + W end.
The first path expression prohibits a read operation by the first process from occurring at the same time as a
write operation. The second path expression enforces a similar restriction on the behavior of the second
reader process. In a computation under control of the multiple path expression, the two read operations may
occur simultaneously, but a read and write operation cannot occur at the same time.
A simple path expression is a regular expression with an outermost Kleene star. The only operators
permitted in the regular expression are (in order of precedence) "*", ";", and "+". The "*" operator is the
Kleene star, ";" is the sequencing operator, and "+" represents exclusive choice. Operands are event names
from some set of events $\Sigma$ that we will assume to be fixed in this paper. The outermost Kleene star is usually
represented by the delimiting keywords path ... end. Thus (a)* would be represented as path a end. Roughly,
the sequence of events allowed by a simple path expression must correspond to the sequences accepted by the
regular expression.
A multiple path expression is a set of simple path expressions. As we will see shortly, each additional simple
path expression further constrains the order in which events can occur. However, we cannot simply take as
our semantics for multiple path expressions the intersection of the languages corresponding to the individual
path expressions: two events whose order is not explicitly restricted by one of the simple path expressions may
be concurrent. For example, in the multiple path expression for the readers and writers problem discussed in
the introduction the two read events R1 and R2 may occur simultaneously. Nevertheless, we will still have
occasion to use ordinary regular expressions in giving the semantics for path expressions.
Path expressions are useful for process synchronization for two reasons. First, the close relationship
between path expressions and regular expressions simplifies the task of writing and reasoning about programs
which use this synchronization mechanism. Secondly, the synchronization in many concurrent programs is
finite state and thus can be adequately described by regular expressions. For precisely the same reasons, path
expressions are useful for controlling the behavior of complicated asynchronous circuits. The readers and
writers example above could equally well describe a simple bus arbitration scheme. In fact, the finite-state
assumption may be even more reasonable at the hardware level than at the monitor level.
Path expressions may be useful in coordinating the actions of distributed systems. Distributed systems are
typically locally synchronous, with each device having a local clock, but globally asynchronous, since no
global clock is sent to every device. If two devices in such a system share a resource, but do not share a global
clock, some means of synchronizing their actions must be provided. An asynchronous device that enforces a
path expression could be used as a synchronizer in this case. Using such a synchronizer, separate devices in a
distributed system could run without a global clock, synchronizing their actions only when necessary.
This brings us to the topic of this paper: What is the best way to translate path expressions into circuits?
Lauer and Campbell have shown how to compile path expressions into Petri nets [7], and Patil has shown how
to implement Petri nets as circuits by using a PLA-like device called an asynchronous logic array [13]. Thus,
an obvious method for compiling path expressions into circuits would be to first translate the path expression
into a Petri net and then to implement the Petri net as a circuit using an asynchronous logic array. However,
careful examination of Lauer and Campbell's scheme shows that a multiple path expression consisting of M
paths each of length K can result in a Petri net with $K^M$ places. Thus, the naive approach will in general be
infeasible if the number of individual paths in a multiple path expression is large.
For the case of a path expression with a single path their scheme does result in a Petri net which is
comparable in size to the path expression. However, direct implementation of such a net using Patil's ideas
may still result in a circuit with an unacceptably large area. An asynchronous logic array for a Petri net with P
places and T transitions will have area proportional to $P \cdot T$ regardless of the number of arcs in the net. Since
the nets obtained from path expressions tend to have sparse edge sets, this quadratic behavior may waste
significant chip area.
Perhaps the work that is closest to ours is due to Li and Lauer [10], who do indeed implement path
expressions in VLSI. However, their circuits differ significantly from ours: in particular, their circuits are
synchronous, and synchronization with the external world (which is, of course, inherently asynchronous) is
not considered. This means that the entire circuit, not just the synchronization, must be described using path
expressions. Furthermore, their circuits use PLA's that result in an area complexity of $O(N^2)$. Rem [15] has
investigated the use of a hierarchically structured path expression-like language for specifying CMOS circuits.
Although he does show how certain specifications can be translated into circuits, he does not describe how to
handle synchronization or give a general layout algorithm that produces area efficient circuits.
In contrast, the circuits produced by the construction described in this paper have area proportional to
$N \cdot \log(N)$ where N is the total length of the multiple path expression under consideration. Furthermore, this
bound holds regardless of the number of individual paths or the degree of synchronization between paths. As
in [4] and [5] the basic idea is to generate circuits for which the underlying graph structure has a constant
separator theorem [8]. For path expressions with a single path the techniques used by [4] and [5] can be
adapted without great difficulty. For multiple paths with common event names, however, the construction is
not straightforward, because of the potential need for synchronization at many different points on each
individual path. Moreover, the actual circuits that we use must be much more complicated than the
synchronous ones used in ([4], [5]). Since events are inherently asynchronous in our model, all of our circuits
must be self-timed and the use of special circuit design techniques is required to correctly capture the
semantics of path expressions.
The paper is organized as follows: A formal semantics for path expressions in terms of partially ordered
multisets [14] is given in section 2. In sections 3, 4, and 5 we give a hierarchical description of our scheme for
implementing path expressions as circuits. In section 4 we first describe how the complete circuit interfaces
with the external world. We then show how to build a synchronizer that coordinates the behavior of the
circuits for the individual path expressions in a multiple path expression. In section 3 we describe a circuit for
implementing single path expressions which we call a sequencer. In section 5 we show how the arbiter circuit
used in section 4 can be implemented. We also argue that these circuits are correct and can be laid out
efficiently. The conclusion in section 6 discusses the feasibility of our implementation and the possibility of
extending it to other synchronization mechanisms like those used in CCS and CSP.
2. The Semantics of Path Expressions
In this section we give a simple but formal semantics for path expressions in terms of partially ordered
multisets of events [14]. An alternative semantics in terms of Petri nets is given by Lauer and Campbell in [7].
A pomset may be regarded as a generalization of a sequence in which certain elements are permitted to be
concurrent; this is why the concept is useful in modeling systems where several events may occur
simultaneously.
Definition 1: A partially ordered multiset (pomset) over $\Sigma$ is a triple $(Q, \le, F)$ where $(Q, \le)$ is a partially
ordered set and F is a function which maps Q into $\Sigma$. □
An example of a pomset is shown in Figure 2-1. We use subscripts to distinguish different elements of Q
that map to the same element of $\Sigma$. In this case $Q = \{A_1, A_2, A_3, B_1, B_2, B_3, C_1, C_2, C_3\}$ and $\Sigma = \{A, B, C\}$. Note
that we could have alternatively defined a pomset as a directed acyclic graph in which each node is labeled
with some element of $\Sigma$.
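[Figure: pomset diagram with nodes $B_1$, $B_2$, $B_3$ in the top row, $A_1$, $A_2$, $A_3$ in the middle row, and $C_1$, $C_2$, $C_3$ in the bottom row, joined by ordering arcs]
Figure 2-1: An example pomset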
If the ordering relation of a pomset P over $\Sigma$ is a total order, then we can naturally associate a sequence of
elements of $\Sigma$ with P; we will use S(P) to denote this sequence.
Definition 2: If $P = (Q, \le, F)$ is a pomset over $\Sigma$ and $\Sigma_1 \subseteq \Sigma$, then the restriction of P to $\Sigma_1$ is the
pomset $P|_{\Sigma_1} = (Q_1, \le_1, F_1)$ where $Q_1 = \{d \in Q \mid F(d) \in \Sigma_1\}$ and $\le_1$, $F_1$ are restrictions of $\le$, F to $Q_1$,
respectively. □
If P is a totally ordered pomset over $\Sigma$ and $\Sigma_1 \subseteq \Sigma$, then $S(P|_{\Sigma_1})$ is just the subsequence of S(P) obtained by
deleting all of those elements of $\Sigma$ which are not in $\Sigma_1$. If R is an ordinary regular expression over $\Sigma$, then
$\Sigma_R \subseteq \Sigma$ will be the set of symbols of $\Sigma$ that actually appear in R and $L_R \subseteq \Sigma_R^*$ will be the regular language which
corresponds to R.
Definition 3: Let $\Sigma$ be a finite set of events; a trace over $\Sigma$ is a finite pomset $T = (Q, \le, F)$ over $\Sigma$. We
say that $i \in Q$ is an instance of an event $e \in \Sigma$ if $F(i) = e$. An instance $i_1$ of event $e_1$ precedes an instance $i_2$
of event $e_2$ if $i_1$ precedes $i_2$ in the partial order $\le$. An instance $i_1$ of event $e_1$ is concurrent with an instance
$i_2$ of event $e_2$ if neither instance precedes the other. □
In the example above $A_1$ precedes $A_2$, but $B_1$ and $C_1$ are concurrent.
Definition 4: Let R be a simple path expression with event set $\Sigma_R$. A trace T is consistent with R iff $T|_{\Sigma_R}$
is totally ordered and $S(T|_{\Sigma_R})$ is a prefix of some sequence in $L_R$. If M is a multiple path expression,
then a trace T is consistent with M iff it is consistent with each simple path expression R in M. $Tr_\Sigma(M)$ is
the set of all traces which are consistent with M. □
Consider, for example, the multiple path expression M:
path A;B end,
path A;C end.
with $\Sigma = \{A, B, C\}$. It is easy to see that the trace in Figure 2-1 is consistent with each of the simple path
expressions in M and hence is in $Tr_\Sigma(M)$.
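Definitions 2 through 4 can be exercised on a small example. The following Python sketch is only illustrative: the function names are hypothetical, each simple path is given as a hand-written deterministic automaton in which every state can still reach acceptance (so "is a prefix of some sequence" reduces to "the automaton never gets stuck"), and the ordering arcs of the example trace are an assumed reading of the first two rounds of Figure 2-1.

```python
from itertools import product

def transitive_closure(pairs):
    """Smallest transitive relation containing the given (earlier, later) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def restricted_sequence(label, before, sigma1):
    """Return S(T|sigma1) as a list, or None if T|sigma1 is not totally ordered."""
    items = [i for i in label if label[i] in sigma1]
    for x, y in product(items, repeat=2):
        if x != y and (x, y) not in before and (y, x) not in before:
            return None  # two concurrent instances survive the restriction
    # In a total order, the number of predecessors of an item is its position.
    ordered = sorted(items, key=lambda x: sum((y, x) in before for y in items))
    return [label[i] for i in ordered]

def consistent(label, arcs, paths):
    """paths: list of (event set, transition dict of a DFA with start state 0)."""
    before = transitive_closure(arcs)
    for sigma_r, dfa in paths:
        seq = restricted_sequence(label, before, sigma_r)
        if seq is None:
            return False
        state = 0
        for e in seq:
            if (state, e) not in dfa:
                return False  # not a prefix of any sequence in L_R
            state = dfa[(state, e)]
    return True

# Assumed trace: instance names map to events; arcs are (earlier, later) pairs.
label = {"A1": "A", "B1": "B", "C1": "C", "A2": "A", "B2": "B", "C2": "C"}
arcs = [("A1", "B1"), ("A1", "C1"), ("B1", "A2"), ("C1", "A2"),
        ("A2", "B2"), ("A2", "C2")]
M = [({"A", "B"}, {(0, "A"): 1, (1, "B"): 0}),   # path A;B end
     ({"A", "C"}, {(0, "A"): 1, (1, "C"): 0})]   # path A;C end
print(consistent(label, arcs, M))
```

Dropping the arc from B1 to A2 leaves B1 and A2 concurrent, so the restriction to {A, B} is no longer totally ordered and the trace is rejected, even though B1 and C1 may remain concurrent without harm.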
3. Implementing the Sequencer for a Simple Path Expression 
This section shows how to construct a sequencer that enforces the semantics of a simple path expression. 
The sequencer circuit is constructed in a syntax-directed fashion based upon the structure of the simple path 
expression. We show that a compact layout for the sequencer exists, so that circuits of this type can be 
implemented economically in VLSI. 
Since a simple path expression is a regular expression, the sequencer for a simple path expression is similar 
to a recognizer for the regular expression. Although schemes for recognition of regular languages have been 
proposed that avoid broadcast [4], we will use a scheme that requires broadcast of events throughout the
sequencer [5, 12]. Because our scheme for interconnecting sequencers (see section 4) requires broadcast, the
broadcast within an individual sequencer carries no additional penalty. A sequencer for a simple path
expression is built up from primitive cells, each corresponding to one character in the path. The syntax of the
path determines the interconnection of the cells in the sequencer. In this section, we first describe the
behavior of a sequencer for a simple path expression, then give a syntax-directed construction method.
The outside world communicates with a sequencer using three lines for each event:
• $TR_e$: a signal to the sequencer that event e is about to commence in the outside world;
• $TA_e$: an acknowledgement from the sequencer that the execution of event e has been noted by the
sequencer;
• $DIS_e$: a status line indicating that action e would violate the path constraints, so that $TR_e$ should not
be asserted by the outside world. It is valid when $TR_e$ and $TA_e$ are both low.
These communication lines interact in a complex way. For a single type of event, the signals $TR_e$ and $TA_e$
follow the four-cycle signaling convention (for an example see Section 4). For different types of events, the
outside world must guarantee the correct interaction of TR signals by ensuring that only one TR signal for an
event satisfying the simple path expression is asserted at any time. The outside world can use the DIS status
lines to determine which requests to send to the sequencer.
The sequencer also has a part to play in ensuring the correct interaction of TR, TA and DIS. Besides
generating a $TA_e$ signal that follows the four-cycle convention with $TR_e$, it must ensure that the signal $DIS_e$ is
correct as long as no TR or TA signal is asserted. This guarantee means that if no TA is asserted, and neither
$DIS_{e_1}$ nor $DIS_{e_2}$ is true, then the outside world may choose arbitrarily between $e_1$ and $e_2$, letting either of them
through to the simple path sequencer. On receiving a $TR_e$ signal, then, the sequencer must assert $TA_e$, adjust
its internal state to reflect the occurrence of event e, and assert the proper set of DIS lines while awaiting the
negation of $TR_e$ before negating $TA_e$.
More formally we require the following propositions to hold:
Proposition 5: (Sequencer protocol): For any sequencer $SEQ_j$,
1. $TA_e$ is raised only if $TR_e$ is high.
2. $TA_e$ is lowered only if $TR_e$ is low.
3. $DIS_e$ is stable while all TR's and TA's are low. □
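The three conditions of the sequencer protocol can be checked mechanically on a sampled waveform. The sketch below is a hypothetical checker, not part of the circuit: it assumes a single event line, so that "all TR's and TA's are low" is approximated by this event's own TR and TA being low, and it assumes the samples are taken often enough that each signal transition is seen.

```python
def protocol_violations(samples):
    """Check the sequencer protocol on time-ordered (tr, ta, dis) samples.

    Returns a list of (step, message) pairs; an empty list means no
    violation was observed in this waveform.
    """
    problems = []
    for step, (prev, cur) in enumerate(zip(samples, samples[1:]), start=1):
        (p_tr, p_ta, p_dis), (c_tr, c_ta, c_dis) = prev, cur
        if not p_ta and c_ta and not c_tr:
            problems.append((step, "TA raised while TR low"))
        if p_ta and not c_ta and c_tr:
            problems.append((step, "TA lowered while TR high"))
        idle_before = not p_tr and not p_ta
        idle_after = not c_tr and not c_ta
        if idle_before and idle_after and p_dis != c_dis:
            problems.append((step, "DIS changed while TR and TA low"))
    return problems

# One complete four-cycle exchange; DIS changes only while a cycle is in progress.
good = [(0, 0, 0), (1, 0, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 1)]
print(protocol_violations(good))
```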
Proposition 6: (Sequencer safety and liveness): For any sequencer $SEQ_j$, assume that at all times
• no two TR's are high simultaneously,
• $TR_e$ is raised only if $DIS_e$ and all TA's are low,
• $TR_e$ is lowered only if $TA_e$ is high.
Then the following hold:
1. $TA_e$ is raised within a finite time of $TR_e$ being raised.
2. $TA_e$ is lowered within a finite time of $TR_e$ being lowered.
3. For any sequencer $SEQ_j$, whenever all TA's and TR's are low, exactly those events e will have $DIS_e$
low for which $S(T|_{\Sigma_{R_j}})$ can be extended by e to give a prefix of some sequence in
$L_{R_j}$. □
Now that the behavior of a sequencer has been described, we show how to construct a sequencer for any
simple path expression. A sequencer has two parts: a controller and a recognizer. The controller is connected
directly to the rest of the outside world and generates both the TA signals and some control signals for the
recognizer. The recognizer keeps track of which events in the path have been seen and generates the DIS
signals.
Figure 3-1 shows the controller for a simple path P. The controller accepts the signals $TR_e$ from the
sequencer for each event e that appears in P. It generates the signals $TA_e$ along with Start and End. The
meaning of $TA_e$ is that all actions caused by $TR_e$ have been completed. In this realization, $TA_e$ is just a delayed
version of $TR_e$, where the delay is long enough to let the sequencer stabilize. An upper bound on this delay can
be computed from the layout of the rest of the circuit. Thus the sequencer is self-timed but not delay
insensitive. A delay insensitive circuit will be described in a separate paper [1]. It has been omitted in this
paper as it unnecessarily complicates an understanding of how the sequencer works. Start and End are
essentially two phase clock signals that control the movement of data through the recognizer for P. Roughly,
Start is true from the time one TR is asserted until the corresponding TA is asserted, while End is true from the
time TR is deasserted until TA is also deasserted. The element labelled M.E. (Mutual Exclusion) is an interlock
element as shown in Fig. 5-2. It is required to guarantee that the two clock phases are strictly non-overlapping.
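The rough description of Start and End can be restated as two Boolean conditions: Start holds while some TR is high and no TA is high, and End holds while some TA is high and no TR is high. The sketch below is a purely logical model under that reading (the function name `clock_phases` is hypothetical); it ignores gate delays, which is exactly what the M.E. element has to handle in the real circuit.

```python
def clock_phases(tr, ta):
    """Return (start, end) for the current values of the TR and TA lines.

    tr, ta: dicts mapping event names to the Boolean level of TR_e / TA_e.
    """
    some_tr = any(tr.values())
    some_ta = any(ta.values())
    start = some_tr and not some_ta   # request seen, not yet acknowledged
    end = some_ta and not some_tr     # request withdrawn, acknowledge still high
    return start, end

# Walk one four-cycle exchange for event "a" while event "b" stays idle.
steps = [
    ({"a": False, "b": False}, {"a": False, "b": False}),  # idle
    ({"a": True,  "b": False}, {"a": False, "b": False}),  # TR_a raised
    ({"a": True,  "b": False}, {"a": True,  "b": False}),  # TA_a raised
    ({"a": False, "b": False}, {"a": True,  "b": False}),  # TR_a lowered
    ({"a": False, "b": False}, {"a": False, "b": False}),  # TA_a lowered
]
phases = [clock_phases(tr, ta) for tr, ta in steps]
print(phases)
```

In this idealized model the two phases can never be true together, because their defining conditions contradict each other; the interlock is needed only because real signals change with delay.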
The recognizer for a path accepts the TR signals and generates the $DIS_e$ signals. It is made up of sub-circuits
corresponding to subexpressions of the path. To construct the recognizer for a path, we parse the path using a
context-free grammar. Productions that are used in parsing the path determine the interconnections of
sub-circuits to form the recognizer. Non-terminals that are introduced in the parse correspond to primitive
cells used in the circuit.
Recognizers are constructed using the following grammar for simple path expressions.
S → path R end
R → R;R | (R + R) | (R)* | <event>
The terminal symbols in the grammar correspond to primitive cells; there is one type of cell for the "+"
symbol, one for the "*" symbol, one for the ";" symbol, and one for each event. The non-terminals
correspond to more complex circuits that are formed by interconnecting the primitive cells. Using the
method described in [3], semantic rules attached to the productions of the grammar specify how the circuits





Figure 3-1: The controller for path P
To keep track of which events in the path have occurred and which are legal, the sub-circuits of a recognizer
communicate using the signals ENB (enable) and RES (result). If ENB is asserted at the input of a circuit for a
subexpression at the beginning of a cycle (when START is asserted), the subcircuit begins keeping track of
events starting with that cycle, and asserts RES after a cycle if the event sequence so far is legal for the
subexpression. The ENB input may be asserted before any cycle, and the subcircuit must generate a RES signal
whenever any of the previous ENB inputs by itself would have required it. At the top level ENB is asserted only
once, before the first cycle. Between cycles each subcircuit deasserts the DIS signal for an event if the
occurrence of that event during the next cycle is legal (this is the case if the subcircuit would assert DIS for
some subsequent sequence of events even if ENB were not asserted any more). These event signals from all
subcircuits are combined to generate the external DIS signals.
Figure 3-2 shows the cell for event e. Two latches, clocked by Start and End, control the flow of ENB and
RES signals. The latches are transparent when their enable is asserted and hold their previous value otherwise.
The latch pair forms a level-triggered master-slave D flip-flop, clocked by the non-overlapping clock signals
Start and End.
The event cell in Figure 3-2 propagates a 1 from ENB to RES only if event e occurs. When this cell is used in
a recognizer for a path expression, the ENB input will be true if and only if event e is permitted by the
expression. Thus, if ENB is true it negates $DIS_e$ for the path, as shown in the figure. When a request TR is
made, the output of the AND gate is loaded into the leftmost latch. If this request is $TR_e$, this output is 1;
otherwise it is 0. In either case the output of the AND gate is propagated to RES through the latch when TR is
lowered.
[Figure: event cell with inputs ENB and $TR_e$, two latches clocked by Start (some TR and no TA) and End (some TA and no TR), output RES, and a $DIS_e$ output combined with the other cells for e]
Figure 3-2: Cell for event e in path P
Figures 3-3 and 3-4 show the cells for the ";" (sequencing) and "+" (union) operators. These are strictly
combinational circuits. The circuit for ";" feeds the RES signal from the circuit at its left into the ENB signal
for the circuit to its right. The circuit for "+" broadcasts its ENB signal to its operands and combines the RES
signals from its operands in an OR gate. It will be seen that the combination (union) of multiple recognitions
by each subcircuit is essential in allowing them to be built up recursively, and exploits the fact that the union
and sequencing operators are distributive over union.¹
Figure 3-5 shows the cell for the "*" operator. The cell enables its operand after receiving a 1 on
either its own ENB or its operand's RES. Every time the operand is enabled the "*" cell also puts out a 1 on its
own RES. It therefore outputs 1 on RES after 0 or more repetitions of its operand's expression. The additional
AND gate sets the output to 0 momentarily after each event, thereby preventing the formation of a latch when
two or more "*" cells are used together. This cell is responsible for making the minimum cycle duration
depend on the path expression. During the first phase of a cycle the sequencer has to perform an ε-closure of
the simple path expression. This delay is directly reflected in the gate delay between the ENB input and RES
output of the "*" cell. These delays will add up for an expression like ((a*; b*); (c*; d*)).
¹ This is also the reason why this method cannot be used for extended regular expressions with complement/intersection by
inverting/ANDing the corresponding RES outputs: the complement/intersection operators are not distributive over union.
Figure 3-3: Cell for ";"
Figure 3-4: Cell for "+"
Figure 3-5: Cell for "*"
When larger circuits are made from these cells, the RES and ENB signals retain their meanings. Each event
cell or sub-circuit formed from several cells accepts one input ENB and produces one output RES. In general
we define a pair of ENB and RES to be correct if the following applies at the beginning of each cycle (just
before START is deasserted):
• ENB is true if and only if the sequence of events so far can be extended by any sequence of events
satisfying the expression of the subcircuit controlled by the ENB/RES pair, to give a prefix of some
sequence in $L_{R_j}$.
• RES is true if and only if some sequence of events satisfying the subcircuit has just completed, and
ENB was true just before the beginning of that sequence.
In addition, a sequencer has a signal INIT, not shown in the figures, which clears the RES outputs of all event
(leaf) cells and generates the ENB input for the root cell (which must be a "*" cell, if there is an outermost
implied Kleene star) during the first cycle (an RS flip-flop set by the INIT signal and reset by END can be
used to generate this ENB signal).
The semantic actions for the productions of the grammar describe the interconnections of the cells in
Figures 3-2, 3-3 and 3-4. Attributes are attached to the symbols of the grammar to represent the sets of events
that appear in the path. These sets determine which TR and TA signals are combined to produce Start and
End.
S[A] → path R[A] end
Hook the RES output of R to its ENB input, and connect INIT.
R[A ∪ B] → R[A];R[B]
Connect the RES output for R[A] to the ENB input of R[B].
R[A ∪ B] → (R[A] + R[B])
Connect the R's to the operand ports of a + cell.
R[A] → (R[A])*
Connect R to the operand port of a * cell.
R[{e}] → event e
Use a cell for e as the circuit for R.
Figure 3-6 shows a recognizer for the path path a;(a + b);c end constructed using this syntax-directed
technique.
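The syntax-directed construction can be mimicked in software to see which events a recognizer leaves enabled. The following Python sketch is a behavioral model only, not the gate-level circuit: the class and function names are hypothetical, each event cell is reduced to a single stored RES bit, and the settling of the combinational ENB/RES network (including the feedback through "*" cells and through the path ... end connection) is replaced by an explicit least-fixed-point evaluation, which is an assumption of this sketch.

```python
class Event:
    def __init__(self, name):
        self.name = name
        self.latch = False  # RES output of the cell
        self.enb = False    # ENB input seen in the current settle pass

class Seq:
    def __init__(self, left, right):
        self.left, self.right = left, right

class Union:
    def __init__(self, left, right):
        self.left, self.right = left, right

class Star:
    def __init__(self, body):
        self.body = body

def leaves(node):
    if isinstance(node, Event):
        return [node]
    if isinstance(node, Star):
        return leaves(node.body)
    return leaves(node.left) + leaves(node.right)

def drive(node, enb):
    """Apply ENB to a subcircuit, record it on the event cells, return RES."""
    if isinstance(node, Event):
        node.enb = node.enb or enb
        return node.latch
    if isinstance(node, Seq):        # RES of the left feeds ENB of the right
        return drive(node.right, drive(node.left, enb))
    if isinstance(node, Union):      # broadcast ENB, OR the two RES signals
        left = drive(node.left, enb)
        right = drive(node.right, enb)
        return left or right
    # Star: the operand is enabled by ENB or by its own RES; RES is the same value.
    if not enb and drive(node.body, False):
        enb = True
    if enb:
        drive(node.body, True)
    return enb

class Recognizer:
    def __init__(self, root):
        self.root = root
        self.cells = leaves(root)
        self._settle(init=True)      # INIT enables the root before the first cycle

    def _settle(self, init):
        for cell in self.cells:
            cell.enb = False
        # path ... end: the RES output of the root is hooked back to its ENB input.
        if not init and drive(self.root, False):
            init = True
        if init:
            drive(self.root, True)

    def enabled(self):
        """Events whose DIS line is low (some cell for the event is enabled)."""
        return {c.name for c in self.cells if c.enb}

    def fire(self, event):
        if event not in self.enabled():
            raise ValueError(f"DIS is high for {event!r}")
        for cell in self.cells:      # each cell latches ENB AND (this is my event)
            cell.latch = cell.enb and cell.name == event
        self._settle(init=False)

# path a;(a + b);c end
r = Recognizer(Seq(Event("a"), Seq(Union(Event("a"), Event("b")), Event("c"))))
print(sorted(r.enabled()))
```

Calling `r.fire("a")` and then `sorted(r.enabled())` shows both a and b enabled; after a further `r.fire("b")` only c is enabled, and after `r.fire("c")` the path restarts with a.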
All recognizers constructed by this procedure perform the correct function, as required by Propositions
5 and 6. The former follows directly from the control circuit while the latter is equivalent to the following: If
a recognizer is initialized and some sequence of events 'clocked' into the circuit, the recognizer will output 1




Figure 3-6: A recognizer for path a;(a+b);c end
expression. To prove this we show that the ENB input of an event cell in the recognizer is 1 if and only if the
event corresponding to this cell is permitted by the path. As shown in Figure 3-2, $DIS_e$ is 1 if and only if none
of the cells for event e is enabled. Therefore, proving that an event cell has its ENB signal set if and only if the
corresponding event is permitted in the path will show that the recognizer is functionally correct. In other
words, we wish to prove that all ENB signals for event cells are correct, according to the definition of ENB
above.
We shall prove the stronger statement that all ENB signals in the recognizer are correct. This proof is based
upon the structure of the recognizer. An ENB signal in a recognizer is set by one of four sources:
• The operand port of a "+" or "*" cell;
• The left operand port of a ";" cell;
• The right operand port of a ";" cell;
• The INIT signal.
In the first and second cases the signal is correct if and only if ENB for the operator cell is correct. In the
third case the signal comes from the RES port of a recognizer for an initial subexpression. Therefore it is
correct if and only if the RES signal for the subexpression is correct. In the fourth case the signal is asserted
only at the start of the recognition and is correct by definition. Thus, to prove that the circuits are correct, we
need only prove that if the ENB signal for any recognizer is correct then so is the RES signal.
Once again, the proof of correctness is based upon the structure of a recognizer. In a correct recognizer the
RES signal is true at time $t_1$ if and only if the ENB signal is true at some preceding time $t_0$ and the events
between $t_0$ and $t_1$ obey the path. A recognizer that is a single event cell is clearly correct. A recognizer for
path a;b built by composition of correct subrecognizers for a and b is also correct, since if $RES_b$ is true at time
$t_2$ then there must be some time $t_1$ when $RES_a$ was true, with all intervening events satisfying path b. But then
there must have been a time $t_0$ when $ENB_a$ was true and all events between $t_0$ and $t_1$ must satisfy path a. By
definition of composition, then, the events between $t_0$ and $t_2$ satisfy a;b. A recognizer for path (a)* is correct
if its subrecognizer is correct, since it outputs 1 and enables its operand if and only if ENB or $RES_a$ is true.
Finally, a recognizer for path a + b is correct if both subrecognizers are correct, since if RES is true then one
of $RES_a$ or $RES_b$ must be true, and if one of $ENB_a$ or $ENB_b$ is true then ENB must be true. Since all methods of
constructing recognizers have been shown to lead to correct circuits, recognizers constructed using this
procedure are functionally correct.
Now that circuits have been designed and proved correct, we give compact layouts for them. The floorplan
for a sequencer, shown in Figure 3-7, has the cells that make up the recognizer arranged in a line with the
controller to one side. The TR signals flow parallel to the line of recognizer cells to enter the controller, and
the Start and End signals emerge from the controller to flow parallel to the line of cells. The ENB and RES





Figure 3-7: The floorplan for a sequencer
The layout in Figure 3-7 is fairly small. If the sequencer for a path of length n that has k types of input
events is laid out in this fashion, the area of the layout is no more than $O((n + k)(\log n + k))$. This is due to
the structure of the recognizer circuits. All recognizer circuits are trees, which can be laid out with all nodes
on a line and edges running parallel to the line using no more than $O(\log n)$ wiring tracks [8]. Thus the height
of the circuit in Figure 3-7 is $O(\log n + k)$ while its width is $O(n + k)$.
4. Synchronizers for Multiple Path Expressions 
This section describes our implementation of synchronizers for multiple path expressions. Figure
4-1 illustrates the interface between a synchronizer and the external world. Each event e is associated with a
request line $REQ_e$ and acknowledge line $ACK_e$. The synchronizer cooperates with the external world to ensure
that these request and acknowledge lines follow a 4-cycle protocol:
1. The external world raises $REQ_e$ to indicate that it would like to proceed with event e.
2. The synchronizer raises $ACK_e$ to allow the external world to proceed with event e.
3. The external world lowers $REQ_e$, signifying completion of event e.
4. The synchronizer lowers $ACK_e$, signifying the end of the cycle and permission to begin a new one.
In this implementation, an event will occur during the period between cycles 2 and 3 in this protocol, where
both REQ and ACK are high. Thus, multiple occurrences of any event e are non-overlapping in time, since any

















Figure 4-1: A synchronizer
The synchronizer in Figure 4-1 could be used to coordinate processes in a distributed system. Each of the
devices in the system would be a client of the synchronizer; only a subset of the REQ and ACK lines would go
to each device. Before performing an action, each client would request permission from the synchronizer and
wait until permission was granted. In this way, harmonious cooperation could be ensured with only a small
amount of inter-device communication. Because of the symmetric nature of the protocol any client could act
either as a master or a slave relative to other clients. A slave would always assert all REQ's and wait for a
response through the ACK's telling it what to do, whereas a master would assert REQ's only for those events it
wishes to proceed with and use the ACK's only to get its timing right.
An overview of a synchronizer circuit is shown in Figure 4-2. The circuit shown is self-timed but not delay
independent, as it makes certain assumptions about gate delays which will be described later. Some of the
building blocks in the circuit are described below.
[Figure 4-2 signal labels: ACK_1, ACK_e, REQ_1, REQ_e, CLR_e]
Figure 4-2: A synchronizer circuit
The C gate in Figure 4-2 is a Muller C-element; the output of a C-element remains low until all inputs are 
high and thereafter remains high until all inputs are low again. Its behavior then cycles. For an 
implementation see [16]. 
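The behavior of the C-element described above can be modelled in a few lines. This is a behavioral sketch only, not the transistor-level implementation of [16]; the class name CElement and its update method are hypothetical.

```python
class CElement:
    """Behavioral model of a Muller C-element with any number of inputs."""

    def __init__(self):
        self.out = 0  # output starts low

    def update(self, *inputs):
        if all(inputs):          # every input high: output goes (or stays) high
            self.out = 1
        elif not any(inputs):    # every input low: output goes (or stays) low
            self.out = 0
        # otherwise the inputs disagree and the previous output is held
        return self.out
```

Feeding the input pairs (0,0), (1,0), (1,1), (0,1), (0,0) in sequence shows the output holding its value whenever the inputs disagree.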
The arbiter in Figure 4-2 enforces pairwise mutual exclusion over the outputs corresponding to pairs of
events which occur in the same path expression. In addition to enforcing mutual exclusion the arbiter tries to
raise any output whose input is high. Many implementations of arbiters will have metastable states during
which fewer signals than possible may be high at the output. Despite the metastable states, however, once an
output signal has been raised, it must remain high as long as the corresponding input remains high. The
implementation of such an arbiter is discussed in detail in section 5.
Each sequencer block in Figure 4-2 ensures that the sequence of events satisfies one of the simple path
expressions that comprise the multiple path expression. It was described in the last section. The synchronizer
circuit contains one sequencer for each simple path expression, so that each simple path expression is satisfied
by an execution event trace. For each event e that appears in a simple path, the corresponding sequencer has
three connections: a request TR_e, an acknowledge TA_e, and a disable DIS_e. Events are sequenced by executing
a 4-cycle protocol over one pair of the TR/TA lines. The DIS outputs of the sequencer are only valid between
these cycles (when all TR and TA are low), and indicate which events would violate the simple path. The
synchronizer will not initiate a cycle for any event whose DIS line is high. The implementation of the
sequencer is given in section 3.
We now describe how the components of the circuit are interconnected. Refer to Figure 4-2. Let SEQ_e
denote the set of sequencers for simple paths that contain event e. Every sequencer in SEQ_e has its DIS_e signal
connected to a NOR gate for e, its TA_e signal connected to a C gate for e, and its TR_e signal connected to ACK_e.
The output of the latch at the end of the C gate for e, which is labeled CLR_e, is connected to each of the NOR
gates in front of the arbiter which corresponds to event e or to some event mutually exclusive to e.
Notice that there is no intrinsic need for the synchronizer to be centralized as long as the constraints
themselves do not require it. Whenever the multiple path expression can be partitioned into disjoint sets of
paths so that paths in different sets do not refer to the same event, then each set can be implemented as a
circuit independently of the others.
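The partitioning condition above can be illustrated by grouping paths that share event names. The following is a minimal sketch, assuming each simple path is given only by the set of event names it mentions; the function partition_paths and the example event names are hypothetical.

```python
def partition_paths(paths):
    """Group path indices so that paths in different groups share no event.

    `paths` is a list of sets of event names, one set per simple path.
    """
    groups = []  # list of (events in group, indices of paths in group)
    for idx, events in enumerate(paths):
        merged_events, merged_paths = set(events), [idx]
        remaining = []
        for g_events, g_paths in groups:
            if g_events & merged_events:
                # This group shares an event with the new path: merge them.
                merged_events |= g_events
                merged_paths = g_paths + merged_paths
            else:
                remaining.append((g_events, g_paths))
        remaining.append((merged_events, merged_paths))
        groups = remaining
    return [sorted(p) for _, p in groups]
```

Each returned group could then be given its own synchronizer, since no event name is shared between groups.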
The following is an informal description of how the circuit works. The circuit behaves as shown in the
timing diagram in Figure 4-3. When REQ_e is raised, event e is not allowed to proceed unless each sequencer in
SEQ_e signals that at least one e-type transition is enabled by negating DIS_e. Once this happens IN_e is raised,
provided no mutually exclusive event is executing the second half of its cycle (and hence has its CLR high). If
the arbiter decides in favor of some other pending event mutually exclusive to e, the above process repeats
until e again gets a chance at the arbiter. Otherwise ACK_e will be raised and latched by the NOR gate
arrangement in front of the arbiter. At this point the external world may proceed with event e.
Simultaneously each sequencer in SEQ_e will find TR_e high and after some time raise TA_e. When all
sequencers in SEQ_e have raised TA_e and the external world acknowledges completion of event e by lowering
REQ_e, CLR_e will be raised. This causes ACK_e to be lowered. Each sequencer in SEQ_e will find TR_e low and
after some time lower TA_e. When all such sequencers are done, CLR_e is lowered, and the cycle is completed.
To formally establish the correctness of our circuit, we must establish two things: first, we must show that
the circuit allows only semantically correct event traces; second, that the circuit will allow any semantically
correct event trace for some behavior of the external world. These properties of the circuit are often called










Figure 4-3: Synchronizer timing
proof will make use of properties of the various circuit components shown in Figure 4-2. We list the most
important of these properties as propositions, namely those relating to the sequencer, the arbiter, and the
external world. Properties of other circuit components such as SR flip-flops, NOR gates, etc., are assumed to
be well known and are used without further discussion. The proof also makes certain assumptions about the
delays of the components:
1. The delay of the main NOR gate plus the 2-input OR gate is less than that of the main Muller C-element
plus the SR flip-flop.
2. The maximum variation in delay for the NOR gates in front of the arbiter is less than the
minimum delay of the arbiter.
We begin by introducing some notation that will be needed in the proof. Let the sequencers be denoted by
SEQ_1 ... SEQ_p corresponding to the path expressions R_1 ... R_p ∈ M, and let Σ_R1 ... Σ_Rp be the subsets of Σ that
actually appear in R_1 ... R_p respectively. Let I be a set of time intervals, which may include semi-infinite
intervals extending from some finite instant to infinity. Each element in I is labelled by an element in Σ.
Define T(I) to be the trace which has an element for each element in I and has the obvious partial order
defined between elements whose time intervals are non-overlapping. Referring to Figure 4-3, let
• Ext = set of time intervals labelled 'external',
• Int = set of time intervals labelled 'internal',
• Seq(j) = set of time intervals labelled 'sequencer' for sequencer SEQ_j.
For every interval i ∈ Int with label e there are corresponding intervals with the same label in Ext and in every
Seq(j) such that e ∈ Σ_Rj, namely those which start at the same time. We assume that the starting points of
intervals in Int lie within some finite time period of interest and the intervals in Ext and Seq(j) are restricted
to intervals corresponding to those in Int.
With this notation in place we state some propositions, or axioms, that describe the properties of the circuit
of Figure 4-2. These properties will be used to prove that the circuit is safe and live. The propositions that are
not self-evident will be justified in later sections of this paper.
Proposition 7: (External world protocol): For all events e,
1. REQ_e is raised only if ACK_e is low.
2. REQ_e is lowered only if ACK_e is high. □
Proposition 8: (Arbiter safety and liveness):
1. For any events e1, e2 that are mutually exclusive, ACK_e1 and ACK_e2 are never high simultaneously.
2. For any event e, ACK_e is raised only if IN_e is raised.
3. For any event e, ACK_e is lowered only if IN_e is low, and within a finite time of IN_e being lowered.
4. Consider any set of events Σ' ⊆ Σ, such that no two events in Σ' are in the same path expression.
Then if all IN_e, e ∈ Σ', are raised, within a finite time all ACK_e, e ∈ Σ', must be raised. □
Proposition 9: (Initialization)
1. Sequencers are initialized with all TA's low.
2. The synchronizer circuit SR flip-flops are initialized to make all CLR's high. □
The following theorem states that a synchronizer satisfying Propositions 7 through 9 is provably safe.
Theorem 10: (Synchronizer Safety): T(Ext) ∈ Tr_Σ(M).
Proof: See the appendix. □
As a converse to Theorem 10 we would like to show that our circuit can produce any valid trace Ext, such
that T(Ext) ∈ Tr_Σ(M), for at least some behavior of the external world. However, for some traces T ∈ Tr_Σ(M),
there does not exist any Ext such that T(Ext) = T, so there is no way any circuit can produce the required trace
Ext. This happens when T does not sufficiently constrain the order in which the elements may occur, so that
any actual set of time intervals will have fewer concurrent elements than T. Given such a T it is necessary to
constrain its partial order relation further, by adding additional (consistent) precedence relationships. It is
easy to show using Definition 4 that this will never remove T from the set Tr_Σ(M). We shall show that
whenever T is sufficiently constrained so that it falls in a class of traces we call layered, then for some behavior
of the external world T(Ext) for our circuit will equal this modified T.
Definition 11: A trace P = (Q, S, U) is called layered if Q can be subdivided into a sequence of subsets,
such that for any i1, i2 ∈ Q, i1 precedes i2 iff the subset in which i1 lies precedes the subset in which i2
lies. □
The trace in Figure 2-1 is layered, since its elements can be subdivided into the sequence of subsets
{(A1), (B1, C1), (A2), (B2, C2), (A3), (B3, C3)} with the above property. If the size of each subset were one, then the
trace would be totally ordered.
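Definition 11 can be checked mechanically on a finite trace. The sketch below assumes the trace is given as a list of element names together with its full (transitively closed) strict precedence relation as a set of pairs; the function layers_of is hypothetical. It relies on the observation that in a layered trace all elements of one layer have the same number of predecessors, so grouping by that count recovers the layers if they exist.

```python
def layers_of(elements, precedes):
    """Return the layers of a layered trace, or None if it is not layered.

    `precedes` is the set of all pairs (x, y) with x strictly before y.
    """
    # Number of predecessors of each element; equal within a layer.
    rank = {x: sum((y, x) in precedes for y in elements) for x in elements}
    for a in elements:
        for b in elements:
            # Layered: a precedes b exactly when a's layer comes earlier.
            if a != b and ((a, b) in precedes) != (rank[a] < rank[b]):
                return None
    layers = {}
    for x in elements:
        layers.setdefault(rank[x], []).append(x)
    return [sorted(layers[r]) for r in sorted(layers)]
```

A trace in which A1 precedes B1 and C1, and all three precede A2, yields the layers (A1), (B1, C1), (A2); a trace with a < b and an unrelated element c is rejected.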
In general, any trace P will have a corresponding layered trace T which preserves most of the parallelism of
P. It is easy to show that for any trace P, there exists a layered trace T, which differs from P only in that the
partial order relation of P is a restriction of that of T.
Theorem 12: (Synchronizer Liveness): Given any layered trace P ∈ Tr_Σ(M), our circuit will produce an
event trace Ext, such that T(Ext) = P for some behavior of the external world.
Proof: See the appendix. □
5. Implementation of the Arbiter
In this section we briefly elaborate on the arbiter shown in Figure 4-2 to show that the conditions assumed
for it can be met. In older literature the term arbiter refers to a device which selects a single event from a
mutually exclusive set of requests. In this paper the term is used in a somewhat less restrictive sense. All
events need not be mutually exclusive, and the arbiter may select more than one event concurrently, as long as
no two mutually exclusive events are selected simultaneously. In addition, the arbiter should be fair when
forced to choose between events. This is much harder to achieve than just the mutual exclusion requirement.
The following observation helps to simplify the arbiter: a pair of events occurring in any single path
expression must be mutually exclusive. This is due to the role that each event plays in enforcing
synchronization among a set of multiple path expressions that all contain the same named event. The
arbitration function can thus be represented by a conflict graph, in which each event is denoted by a vertex
and the relation between a pair of mutually exclusive events is denoted by an undirected edge. Our observation
shows that the resulting conflict graph for a set of path expressions consists of a set of overlapping cliques,
where a clique of k nodes A_1, A_2, ..., A_k corresponds to a path expression R with
Σ_R = { A_1, A_2, ..., A_k }. The conflict graph represents the static structure of a set of path expressions.
Figure 5-1 shows a multiple path expression with its conflict graph.




path (A + B + D) end
path (B;(C + D);E) end
path (E + F + G) end
Figure 5-1: The conflict graph of a path expression
enabled at any instant (an event with a pending request is enabled if it does not violate the sequencing
constraints of any path expression). The dynamic structure of the set of path expressions is represented by
an active subgraph of the conflict graph induced by the set of vertices corresponding to the events enabled at
that instant. The function of the arbiter is to select an independent set of this subgraph, thus ensuring that
only one of any pair of mutually exclusive events is enabled. In this paper we require the arbiter to respond
whenever it can and not introduce deliberate wait states. More formally we define a maximally parallel set of
events to be an independent set of the active subgraph, such that it is not a subset of any other independent
set of the active subgraph. We require the arbiter to respond with a maximally parallel set without waiting for
any input change or introducing deliberate delays. In general there will be more than one possible maximally
parallel set, and the arbiter need not choose the largest one. Note that events overlap in time, hence when the
arbiter makes its selection some of the events in the subgraph may already be selected, and this further
constrains the possible choices of the arbiter.
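The notions of conflict graph and maximally parallel set can be made concrete with a short sketch. It assumes each path expression is represented only by the tuple of event names it contains (so each path contributes a clique), and it extends an already-selected set greedily in alphabetical order, which is just one of the possible maximally parallel choices. The function names conflict_graph and extend_to_maximal are hypothetical.

```python
from itertools import combinations


def conflict_graph(paths):
    """Adjacency sets of the conflict graph: each path contributes a clique."""
    adj = {}
    for events in paths:
        for e in events:
            adj.setdefault(e, set())
        for a, b in combinations(events, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def extend_to_maximal(adj, enabled, selected=()):
    """Greedily extend `selected` with enabled events until no more fit."""
    chosen = set(selected)
    for e in sorted(enabled):
        # An event may be added only if it conflicts with nothing chosen so far.
        if e not in chosen and not (adj[e] & chosen):
            chosen.add(e)
    return chosen
```

With the three paths over {A, B, D}, {B, C, D, E} and {E, F, G}, and with A, C, F and G enabled, the greedy extension returns {A, C, F}; G is left out because it conflicts with F.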
The arbiter should be fair when faced with a choice. So far we have not defined what we mean by fairness.
The definition is complicated because events with pending requests need not be enabled. Because of logic
delays, the circuits keeping track of the path expression states may think a particular event is still enabled
even though the arbiter has just acknowledged a conflicting event. For our purposes such an event is
considered not enabled. The most commonly used definitions of fairness that allow pending events to be
disabled are due to Lehmann, Pnueli and Stavi [9]. The definitions apply to infinite execution traces. An
arbiter is fair if all the infinite execution traces it produces are fair.
1. Impartiality: Each pending event is infinitely often acknowledged in the trace. (Must be fair to all
events.)
2. Fairness: Each pending event is either infinitely often acknowledged or almost everywhere disabled
in the trace. (Need be fair only to events that are infinitely often enabled.)
3. Justice: Each pending event is either infinitely often acknowledged or infinitely often disabled.
(Need only be fair to events that are after some finite time continuously enabled.)
The order of these definitions is such that if an arbiter is fair according to one definition it will also be fair 
according to any succeeding definition but not the other way round. Note that these definitions do not require 
different events to be acknowledged with equal fairness. All that is required is that no event is starved.
Since we do not allow deliberate wait states it is not possible for an arbiter for path expressions to be fair
according to the first definition. Consider for instance the following path expression:
path (A + B); C end
path D; (A + E) end
Suppose that each event takes the same amount of time to execute externally and that new requests for each
event are forthcoming as soon as allowed by the protocol. Then simultaneous execution of D and B will
alternate with simultaneous execution of C and E without the arbiter ever having to block any event. Yet
event A will never execute even if it remains continually ready. If, however, the first request for event B is
delayed by the time it takes to execute an event, then initial execution of event D may be followed by
alternate executions of A and (D, C). Note that neither the duration of external events nor the occurrence of
external requests is under the control of the circuit.
The third definition is easy to satisfy and all arbiters to be described in this paper satisfy this condition. In
fact this kind of fairness is probably all that is required for most practical applications. However, it is clearly
not the strongest form of fairness that can be enforced.
The second definition of fairness can be realized using a simple LRU-type deterministic arbitration
algorithm. Let there be k events. We assign a priority number from 0 to k-1 to each event, where the priority
corresponds to the number of times the event is blocked, i.e., the number of times the event is enabled but not
selected by the arbiter. At any instant the arbiter selects from the set of enabled events in order of priority.
When an enabled event is selected its priority number is reinitialized to the lowest value. On the other hand,
if the enabled event is not selected its priority number is incremented by one. Since each event is enabled an
infinite number of times, any particular event can have at most k-1 neighbors in the conflict graph, and since
each time it is blocked at least one of its neighbors is selected with a resulting increment in its own priority,
after the k-th attempt it will have the highest possible priority. It is possible to show (using induction on k) that
when it gets enabled next it will have the highest priority, and hence get selected. Since this will happen an
infinite number of times, this ensures fairness according to the second definition. The LRU algorithm has the
added advantage that the response time to different events is approximately balanced.
However, even the second definition is not the strongest possible form of fairness that can be enforced for
path expressions. Consider for instance the path expression path ((A;C) + (D;A)) end. As before assume that
all events are pending at all times. The execution sequence DADADA... then is fair according to this definition
even though event C is starved (event C is never enabled). We could have done better, however, since
ACDAACDA... is also a legal execution sequence.
Obviously the strongest form of fairness enforceable lies somewhere between definitions 1 and 2. We do not
know the strongest form of fairness that can be enforced for path expressions. Intuitively the fairest arbiter
would always cause starvation for the least number of events possible. It is not possible to
characterize this form of fairness just in terms of execution traces. Reference must be made to the sequencing
constraints that enable/disable pending requests, which in our case involves the complete path expression.
The problem can be greatly simplified by requiring the arbiter to be oblivious of the sequencing constraints
and therefore equate a disabled event with an event not requesting. This restriction will also tend to simplify
the logic since the arbiter size need not depend on the size of the path expressions, but only on the alphabet
size. It should be kept in mind, however, that like our previous restriction requiring prompt response, this
restriction limits the kind of arbiters possible.
We shall describe a probabilistic arbitration algorithm for an oblivious arbiter whose infinite execution
traces will be "fair" with probability 1 where "fair" is defined by either of definitions 2 and 3. It also holds for
stronger forms of fairness and therefore realizes some kind of fairness between definitions 1 and 2. The
algorithm is as follows: Whenever the set of currently executing events is not a maximally parallel set, find all
ways of extending this set with enabled events so that the new sets are maximally parallel, choose one of them
at random, and then acknowledge the events in the selected extension. Every time an event is no longer
disabled there is a finite probability that it will be acknowledged, and if this is the case infinitely often the
event will be infinitely often acknowledged. It follows that this algorithm ensures fairness in the sense of the
second or third definition above. It will also prevent starvation for event C in the last example above.
Although this algorithm is currently only of theoretical interest since we do not know of any efficient
implementations, it forms the basis of several efficient arbiter implementations below.
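The probabilistic algorithm just described can be sketched directly, if inefficiently. The code below is an illustrative brute-force version, assuming the conflict graph is given as a dictionary of adjacency sets and that a uniform choice among the maximal extensions is acceptable; it enumerates all subsets and is therefore exponential, in keeping with the remark that no efficient implementation is known. The function names and the rng parameter are hypothetical.

```python
import random
from itertools import combinations


def maximal_extensions(adj, executing, enabled):
    """All ways to extend `executing` to a maximally parallel set."""
    executing = set(executing)
    # Candidates: enabled events that do not conflict with anything executing.
    free = sorted(
        e for e in enabled if e not in executing and not (adj[e] & executing)
    )
    independent = []
    for r in range(len(free) + 1):
        for combo in combinations(free, r):
            s = set(combo)
            if all(not (adj[e] & s) for e in s):  # no two chosen events conflict
                independent.append(s)
    # Keep only extensions not strictly contained in another one.
    return [s for s in independent if not any(s < t for t in independent)]


def probabilistic_arbiter(adj, executing, enabled, rng=random):
    """Pick one maximal extension at random; its events are acknowledged."""
    return rng.choice(maximal_extensions(adj, executing, enabled))
```

For a conflict graph in which W conflicts with both R1 and R2, and with nothing executing, the two maximal extensions are {W} and {R1, R2}, so W is chosen with nonzero probability on every such occasion.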
We first show that no deterministic oblivious arbiter can do as well as our probabilistic algorithm. We show
that every deterministic oblivious arbiter gives rise to starvation of an event which is continually requesting
for some path expression for which the probabilistic algorithm (described above) does not cause such
starvation. Later we consider ways of physically implementing the probabilistic algorithm. We look at several
direct implementations that appear to work at first sight but have problems when examined more closely. We
show that a straightforward extension of Seitz's scheme [16] for a two-input arbiter to a general conflict graph
results in an unfair arbiter. We present one attempt to rectify this problem based on graph coloring, and show
why it does not work. Finally, we present a somewhat non-standard scheme implemented in CMOS which
forms a best direct approximation to the probabilistic algorithm described above. All of these schemes also
suffer from the drawback that critically balanced circuit elements are needed and/or the level of noise in the
circuit must exceed the amount of imbalance. Finally we show a practical way of implementing such an
algorithm given an oracle that generates a random sequence of bits. Such an oracle can be physically
approximated by an off-chip thermal noise source that is amplified and digitised.
The difficulty of building a fair deterministic arbiter that matches the probabilistic arbiter can be illustrated
by an example. Consider the following path expression:
path (A;C) + (B;(A + B)) end.
Assume the LRU algorithm described previously is being used, and that the external clients always request
permission to perform all three events A, B and C. Let the priorities of all three be 0 initially. As a result,
initially A and B are enabled. Assume that B is selected, making B's priority 0 and A's priority 1. In the next
instant, A and B will again be enabled. But now A has the higher priority and will be selected, so that A's
priority becomes 0 and B's becomes 1. Continuing in this fashion, it is easy to see that the sequence chosen
will be BABABA.... The trouble with this scheme is that C will never be enabled even if its request is
pending. Increasing the number of levels of priority will not help. This example can be extended to the
following lemma.
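The starvation in this example can be reproduced with a small simulation. The sketch below hand-encodes the path expression path (A;C) + (B;(A + B)) end as a three-state machine and applies the LRU rule (selected event reset to 0, other enabled events incremented). The state names, the tie_break parameter and the omission of the upper bound k-1 on priorities are assumptions made for illustration.

```python
def lru_trace(steps, tie_break="B"):
    """Simulate the LRU arbiter on: path (A;C) + (B;(A + B)) end."""
    # Hand-built state machine for the path expression (illustrative names).
    transitions = {
        "start": {"A": "after_A", "B": "after_B"},
        "after_A": {"C": "start"},
        "after_B": {"A": "start", "B": "start"},
    }
    priority = {"A": 0, "B": 0, "C": 0}
    state, trace = "start", []
    for _ in range(steps):
        enabled = sorted(transitions[state])
        best = max(priority[e] for e in enabled)
        candidates = [e for e in enabled if priority[e] == best]
        # Ties are broken in favor of `tie_break` when possible.
        chosen = tie_break if tie_break in candidates else candidates[0]
        for e in enabled:
            # Selected event is reset; blocked enabled events gain priority.
            priority[e] = 0 if e == chosen else priority[e] + 1
        trace.append(chosen)
        state = transitions[state][chosen]
    return "".join(trace)
```

With B winning the initial tie the trace is BABABA..., and C never appears. If A wins the initial tie instead, C is acknowledged once and the trace then settles into the same alternation.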
Lemma 13: Let M be a deterministic finite-state transducer implementing an oblivious deterministic
arbiter. Then there exists a path expression over Σ = { A, B, C } such that one event, say C, will be
starved even though its request is continually pending. Moreover the probabilistic algorithm does not
cause such starvation for this path expression.
Proof: Let M be a deterministic finite-state transducer whose alphabet is Σ = { A, B, C }. Let the states
of M be S = { s_1, s_2, ..., s_m }. Let the conflict graph, G, for the path expression be the complete graph
on the vertices A, B and C. We construct a path expression P with the conflict graph G such that M
causes the starvation of the event C. Notice that because of the nature of the conflict graph G, if at any
instant A and B (but not C) are enabled then at most one of A and B may be selected by M.
Let s_1 be an arbitrarily chosen state of M. We conduct an experiment on M by continuously providing A
and B as the enabled inputs, starting with M in the state s_1. If we present a string of inputs
{ A, B }, { A, B }, ..., { A, B } of length m then we notice that at the 1st input { A, B }, the
transducer deterministically goes from the state s(1) = s_1 to a state s(2) while outputting A or B. Let s(1),
s(2), ..., s(m+1) be the sequence of states and a ∈ { A, B }^m be the output string produced as a result
of the experiment. As a consequence of the pigeon-hole principle, some two states in the sequence of
states will be the same. Of all such pairs, let s(i) and s(j) be two such states closest to s_1. Assume that i < j
and let k be the smallest multiple of (j - i) such that k ≥ i. Without loss of generality assume that M
outputs B when in state s(i) with the input { A, B }.
Let P be the path expression
path (A + B)^(i-1); ((A;C) + B); (A + B)^(k-i) end
It is easy to see that P has G as the conflict graph and if the requests for A, B and C are continuously
pending then the sequence of outputs will be a string in { A, B }^ω and C will never be enabled.
The probabilistic algorithm would have no problem with the path expression since from any state (of the
path expression) it could reach the state enabling C with finite probability, and hence enable C an
infinite number of times in an infinite trace. □
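The construction in the proof can be traced in code. The sketch below assumes the transducer is given as a function step(state) returning (output, next_state) when the enabled set is always { A, B }; it finds the first repeated state, computes i, j and k as in the proof, and prints the resulting path expression as text. Instead of assuming without loss of generality that the output in state s(i) is B, it places C after whichever event is not granted there. The function name and the textual form of the result are hypothetical.

```python
def starving_path_expression(step, start):
    """Return (i, j, k, expr) for a transducer driven with input {A, B}.

    `step(state)` must return (output, next_state), output being "A" or "B".
    """
    seen = {}      # state -> 1-based position t with s(t) == state
    outputs = []   # outputs[t - 1] is the event granted in state s(t)
    state, t = start, 1
    while state not in seen:
        seen[state] = t
        out, state = step(state)
        outputs.append(out)
        t += 1
    i, j = seen[state], t            # first repetition: s(i) == s(j)
    period = j - i
    k = -(-i // period) * period     # smallest multiple of (j - i) with k >= i
    winner = outputs[i - 1]          # event granted in state s(i)
    loser = "A" if winner == "B" else "B"
    # C is placed after the event that is never granted at this position.
    expr = f"path (A + B)^{i - 1}; (({loser};C) + {winner}); (A + B)^{k - i} end"
    return i, j, k, expr
```

For a two-state transducer that alternates between granting B and granting A, the result is i = 1, j = 3, k = 2, which gives a path expression close to the LRU example above.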
The result of the above lemma can also be stated as follows: A deterministic oblivious arbiter needs at least
N/2 states to do as well as one using the probabilistic algorithm, where N is the size of the path expression,
whereas the probabilistic algorithm requires no internal state. The actual bound on the minimum number of
states required may be much larger.
Before proceeding further, let us consider the path expression path A + B end, where the conflict graph is
G = (V, E) = ({ A, B }, {[A, B]}). Seitz [16] has shown how to build an arbiter for such a structure using an
interlock element, as shown in Figure 5-2.
[Figure 5-2; label in figure: high threshold buffers]




Circuit operation in Figure 5-2 is most easily visualized starting with neither client requesting, v_1 and v_2
both near 0 volts, and both outputs high. If any single input, say A_in, is lowered then v_1 is driven high,
resulting in A_out being lowered; B_out remains unaffected. Moreover, once A_out is lowered, and as long as
A_in is kept low, the interlock element remains in this stable state irrespective of what happens to B_in. If A_in is
now raised high then the element returns to its initial condition if B_in is still high; or B_out is lowered if B_in is
lowered in the meantime.
However, the interesting situation occurs when both A_in and B_in are lowered concurrently or within a
very short interval of time. In this case the cross-coupled NOR gates enter a metastable state, which is resolved
after an indeterminate period of time in favor of either A or B. Since this resolution depends on the thermal
noise generated by the gates, it is inherently probabilistic. In this case the outputs of the NOR gates themselves
cannot be used as the outputs. High threshold inverters between the NOR gates and the outputs prevent false
outputs during the metastable condition.
It would seem natural to extend Seitz's idea by generalizing it to the conflict graph for an arbitrary set of
path expressions. Roughly speaking, we may construct a circuit by homomorphically transforming the
conflict graph to a circuit by replacing each vertex with a NOR gate and each edge with a cross-coupling of
NOR gates corresponding to the pair of vertices on which the edge is incident. However, such an
implementation in NMOS has some severe problems, which will be clarified if we consider the circuit for the
readers-writers path expression:
path R1 + W end
path R2 + W end
where the pair R1 and W and the pair R2 and W are mutually exclusive. The conflict graph and the circuit for
this expression are shown in Figure 5-3.
Consider the situation when the circuit is in the non-requesting condition and all three requests, R1, R2
and W, arrive concurrently. An infinitesimally short interval Δt after all three requests arrive, let us assume
that the voltages at the outputs (of the NOR gates) have increased by an infinitesimally small value Δv << v_th.
The pull-down MOS transistors may be assumed to be operating in their linear region. If all pull-ups are
assumed to provide equal active resistance, the output of the NOR gate corresponding to W will grow less
rapidly than those corresponding to R1 or R2. The cumulative effect of this imbalance will result in a low
output for W's NOR gate and high outputs for R1's and R2's. Hence if R1, R2 and W request continuously
then the request for W will never go through, resulting in W's starvation. An apparent fix to this problem is to
increase the ratio of pull-up to pull-down for W's NOR gate to twice that of R1's and R2's. But if this is done
Figure 5-3: (a) The Conflict Graph and (b) The Arbiter in NMOS.
in a static manner then, when only R1 and W are requesting, W will have an unfair advantage over R1.
The imbalance that favors certain arbiter inputs over others will not occur if the conflict graph is complete. 
A second arbiter design makes use of this observation. We first obtain a minimal vertex coloring for the 
conflict graph, i.e., an assignment of colors to the vertices of the graph so that no two adjacent vertices receive 
the same color. This task is, of course, NP-complete. However, it only needs to be done once, and there are 
heuristics that will come within a factor of two of the minimum number of colors. Events that correspond to 
vertices within the same color class may occur simultaneously without violating our constraint on the behavior 
of adjacent vertices. Thus, we only need to arbitrate between color classes, and the conflict graph for the color 
classes will be complete. A schematic diagram for this second design is shown in Figure 5-4.
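The coloring step of this second design can be sketched with a simple greedy heuristic. This is an illustration only: it orders vertices by decreasing degree and gives each the smallest color not used by its neighbors, which yields a proper coloring but is not guaranteed to be minimal and is not claimed to be the heuristic referred to above. The function names greedy_coloring and color_classes are hypothetical.

```python
def greedy_coloring(adj):
    """Proper (not necessarily minimal) vertex coloring of a conflict graph."""
    color = {}
    # Visit high-degree vertices first; sorted() is stable, so ties keep
    # the dictionary's insertion order.
    for v in sorted(adj, key=lambda v: -len(adj[v])):
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj) + 1) if c not in used)
    return color


def color_classes(color):
    """Group events by color; each class shares one input of the arbiter."""
    classes = {}
    for v, c in color.items():
        classes.setdefault(c, []).append(v)
    return [sorted(classes[c]) for c in sorted(classes)]
```

For the readers-writers conflict graph this produces two classes, (W) and (R1, R2), so only a two-input arbiter between color classes would be needed.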
For each color class an OR gate is used to collect the inputs that correspond to vertices in the class.
Additional AND gates are used to combine each arbiter output with all the inputs that correspond to vertices
in that color class. Assuming that all of the initial OR gates have the same delay and that all of the final AND
gates also have the same delay, the second design will be fair.
Figure 5-4: An Arbiter based on graph coloring
Although the second design appears, at first, to have solved the problem with the original design, further
thought shows that in reality the second scheme may not be that much better than the first. First of all, the
assumption that all of the OR gates have the same delay may not be very realistic. If the standard NMOS
implementation for OR gates is used, the delay through a gate will depend on the number of inputs that are
high; the argument is essentially the same as the one that is used to show the imbalance in the first arbiter
design. Thus, if more inputs in one color class are on than in another color class, the events in the first color
class would always win the arbitration.
Moreover, the second design does not acknowledge maximally parallel sets. A conflict graph consisting of
2n vertices arranged in a ring may be colored with just two colors. If n > 2 there will be two vertices with
different colors that are not adjacent. Assume that both request service at the same time and that all of the
other vertices remain inactive. Because the two events belong to different color classes our arbiter design will
not let them occur in parallel. Since the vertices are not adjacent, however, they should be allowed to proceed
in parallel.
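The ring counterexample can be checked by enumeration. The sketch below assumes the 2n ring vertices are numbered 0 to 2n-1 and colored by parity, and lists every pair of vertices that have different colors but are not adjacent; the function name ring_counterexample is hypothetical.

```python
def ring_counterexample(n):
    """Pairs in a ring of 2n vertices that differ in color but do not conflict."""
    size = 2 * n
    adj = {v: {(v - 1) % size, (v + 1) % size} for v in range(size)}
    color = {v: v % 2 for v in range(size)}  # proper 2-coloring of an even ring
    return [
        (u, v)
        for u in range(size)
        for v in range(u + 1, size)
        if color[u] != color[v] and v not in adj[u]
    ]
```

For n = 2 no such pair exists, while for n = 3 the pairs (0, 3), (1, 4) and (2, 5) could run in parallel but would be serialized by an arbiter that only arbitrates between color classes.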
An arbiter that tries to configure itself dynamically for the problem with two readers and one writer is
shown in Figure 5-5. To see how this scheme tries to remedy the problem discussed earlier, consider the
situation when the circuit is in the non-requesting condition and all three requests, R1, R2 and W, arrive
concurrently. An infinitesimally short interval Δt after all three requests arrive, the voltages at the outputs
will have increased by an infinitesimally small value Δv << v_th. The pull-down MOS transistors are in their
linear region. However, since active resistances of the pull-up transistors depend on the neighboring events
Figure 5-5: The Arbiter for 1-Writer-2-Readers Problem in CMOS.
that are enabled, the pull-up resistance of the gate associated with W is exactly half of that associated with R1
or R2. This provides a balance among pull-up resistances and results in almost equal rate of growth of
voltages at the outputs. Hence the interlock elements enter their metastable states more or less
simultaneously; and the metastable condition is resolved either in favour of R1 and R2 or in favour of W, the
choice governed by statistical thermal phenomena.
A similar analysis shows that the circuit behaves correctly when only two out of three requests arrive
concurrently. However, if only one request, say W, arrives while all its neighbours remain in their
non-requesting condition the circuit behaves somewhat differently. In this case the pull-up transistor with input
(W, R1, R2) will turn on, thus allowing the output of the gate to go high. It is important to observe that the
pull-up transistors are controlled dynamically by the requests for the neighbouring events: if there is a
request for the neighbouring event then only the pull-up corresponding to the event turns on; and if there is
no request for the neighbouring events then only the pull-up corresponding to the event itself turns on. For
this to be implemented correctly it is essential that the pull-up corresponding to the event itself be turned on
only after a delay necessary for the requests for the neighbouring events to propagate to the gate of the
pull-up. Unfortunately the time constants associated with the arbiter outputs differ since the capacitances are
not dynamically adjusted and hence even this circuit fails to be balanced (even theoretically).
We now describe a probabilistic arbiter that does not rely on critical balancing of circuit elements, or the
presence of noise in the circuit itself. It makes use of an external oracle that works as a random bit generator.
This can be practically realized in a separate isolated circuit that uses thermal noise (or some other source of
noise) to generate a random bit pattern. The arbiter itself is only required to ensure mutual exclusion, and the
simple extension of Seitz's arbiter described above will perform this function. The only difference is the
presence of a delay element at each input. The delay elements can be digitally switched on or off (by
bypassing them), and are large enough so that if two conflicting events are enabled at the same time, and one
is delayed by the delay element, the other is sure to be passed by the arbiter. This means that the delay should
exceed the gate delay of the arbiter (when no conflicts occur). The delay elements are each controlled by a
1-bit register, which determines if the delay is on or off. A new value is loaded into each register from a
(separate) oracle each time the corresponding event gets enabled. This means that whenever a new set of events
gets enabled, their 'priorities' are randomly 1 or 0. It is easy to show that any maximally parallel set then has a
finite chance of being selected (when just its events have priority 1 and all others have priority 0), which is just
what the probabilistic algorithm requires. To ensure that the random bits clocked into the different registers
are largely uncorrelated, the oracle is split into multiple oracles by clocking it into a shift register at a high
rate. The parallel outputs of the shift register will be largely uncorrelated if all bits in the register get shifted
out once for every arbitration cycle. Lower clocking rates will still work, since the outputs will still be partially
uncorrelated. A tapped delay line could be used instead of the shift register.
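The claim that every maximally parallel set can be selected may be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the conflict graph is the 1-writer-2-readers example, priority 1 means the delay element is bypassed, and ties inside a priority class are broken by name in place of the real arbiter's nondeterminism. The names `CONFLICTS`, `arbitrate` and `reachable_outcomes` are hypothetical.

```python
from itertools import product

# Hypothetical conflict graph for the 1-writer-2-readers problem:
# the writer W conflicts with both readers; the readers do not conflict.
CONFLICTS = {"W": {"R1", "R2"}, "R1": {"W"}, "R2": {"W"}}

def arbitrate(conflicts, enabled, priority):
    """Grant a conflict-free subset of the enabled events.

    Events with priority 1 bypass their delay element and reach the
    mutual-exclusion arbiter first; events with priority 0 arrive later.
    Ties inside one priority class are broken by name, which is an
    assumption standing in for the arbiter's real nondeterminism.
    """
    order = sorted(enabled, key=lambda e: (-priority[e], e))
    granted = set()
    for event in order:
        # An event is granted only if no conflicting event was granted first.
        if not (conflicts[event] & granted):
            granted.add(event)
    return frozenset(granted)

def reachable_outcomes(conflicts, enabled):
    """Enumerate every pattern of oracle bits and collect the granted sets."""
    events = sorted(enabled)
    outcomes = set()
    for bits in product((0, 1), repeat=len(events)):
        outcomes.add(arbitrate(conflicts, enabled, dict(zip(events, bits))))
    return outcomes

if __name__ == "__main__":
    print(reachable_outcomes(CONFLICTS, {"W", "R1", "R2"}))
```

With all three events enabled, the enumeration reaches both maximally parallel sets, {W} and {R1, R2}. Each corresponds to at least one bit pattern, so each has nonzero probability when the oracle bits are random.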
For many path expressions, the LRU algorithm is just as fair as the probabilistic algorithm and has the
advantage that the response times are approximately balanced, instead of being a complex function of the
conflict graph as in the probabilistic algorithm. For such path expressions the use of the LRU algorithm is
preferable. A way of realizing the LRU algorithm in hardware has not yet been described. One realization is
to use logically controllable delay lines in front of an arbiter that ensures mutual exclusion, just as in the case
of the probabilistic algorithm. However, in this case each of the k event inputs has k delay lines (in series) and
the delay lines are controlled directly by their priority: each time an event is blocked, an additional delay line
is switched off for it, whereas if the event is acknowledged all its delay lines are switched on again, reducing its
priority to the lowest level. This circuit requires just $O(k^2)$ area.
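The delay-line scheme can be sketched as a behavioural model. The code below is an illustration, not the authors' circuit: the class name `LruArbiter` is hypothetical, every input starts with all k delay lines switched on, and ties between equally delayed requests are broken by name.

```python
class LruArbiter:
    """Toy model of an LRU arbiter built from k switchable delay lines per input."""

    def __init__(self, conflicts):
        self.conflicts = conflicts
        self.k = len(conflicts)
        # Number of delay lines currently switched on for each event.
        # All k lines on means lowest priority.
        self.delays_on = {event: self.k for event in conflicts}

    def cycle(self, requesting):
        """Run one arbitration cycle and return the set of acknowledged events."""
        # Fewer active delay lines means the request arrives earlier.
        order = sorted(requesting, key=lambda e: (self.delays_on[e], e))
        granted = set()
        for event in order:
            if not (self.conflicts[event] & granted):
                granted.add(event)
        for event in requesting:
            if event in granted:
                # Acknowledged: all delay lines switched on again.
                self.delays_on[event] = self.k
            else:
                # Blocked: one more delay line switched off.
                self.delays_on[event] = max(0, self.delays_on[event] - 1)
        return granted

if __name__ == "__main__":
    arbiter = LruArbiter({"W": {"R1", "R2"}, "R1": {"W"}, "R2": {"W"}})
    for _ in range(4):
        print(sorted(arbiter.cycle({"W", "R1", "R2"})))
```

When all three events request continuously, this model alternates between acknowledging the two readers and acknowledging the writer, which is the balanced behaviour the text attributes to the LRU algorithm.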
More direct ways of combining the advantages of the LRU algorithm with the probabilistic algorithm
remain to be investigated.
6. Conclusion 
Since our circuits have the constant separator property, a more compact O(N) layout is possible using the
techniques of [5]. However, while it is definitely possible to automatically generate the O(N·log(N)) layout
that we propose, it is much more difficult in practice to generate the O(N) layout of [5]. Furthermore, the
O(N) layout will occupy less area only for very large N. We suspect that ease of generating the layout will win
over asymptotic compactness in this case. One of the authors (M. Foster) is currently implementing a silicon
compiler for path expressions, based on the ideas in this paper.
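To see why the linear layout only pays off for very large N, consider a rough worked comparison. The constants $c_1$ and $c_2$ below are hypothetical; the paper does not give values for them.

```latex
% Assumed area models with hypothetical constants c_1, c_2 > 0:
A_{\log}(N) = c_1\, N \log_2 N, \qquad A_{\mathrm{lin}}(N) = c_2\, N .
% The N log N layout is the larger of the two exactly when
A_{\log}(N) > A_{\mathrm{lin}}(N)
\iff c_1 \log_2 N > c_2
\iff N > 2^{\,c_2 / c_1}.
```

Under these assumptions the crossover point grows exponentially in the ratio $c_2 / c_1$. For instance, if the linear layout carried a constant factor twenty times larger, it would win only for N above $2^{20}$, roughly one million.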
Finally, we plan to investigate extensions of our construction to appropriate finite state subsets of CSP [6]
and CCS [11]. In the case of CSP the subset will only permit boolean valued variables and messages which are
signals. If the number of message types is fixed, we conjecture that area bounds comparable to those in
section 3 can be obtained. Arrays of processes in which the connectivity of the communication graph is low
can be treated specially for a more compact layout. Such a finite-state subset of CSP may even be more useful
than the path expression language discussed in the paper for high level description of various asynchronous
circuits.
References 
1. Anantharaman, T. A. "A delay insensitive regular expression recognizer." (1985).
2. Campbell, R. H. and A. N. Habermann. The Specification of Process Synchronization by Path
Expressions. In Lecture Notes in Computer Science, Volume 16, G. Goos and J. Hartmanis, Ed.,
Springer-Verlag, 1974, pp. 89-102.
3. Foster, M. J. Specialized Silicon Compilers for Language Recognition. Ph.D. Th., CMU, July 1984.
4. Foster, M. J. and Kung, H. T. "Recognize Regular Languages with Programmable Building-Blocks."
Journal of Digital Systems VI, 4 (Winter 1982), 323-332.
5. Floyd, R. W. and Ullman, J. D. "The Compilation of Regular Expressions into Integrated Circuits."
Journal of the Association for Computing Machinery 29, 3 (July 1982), 603-622.
6. Hoare, C. A. R. "Communicating Sequential Processes." Comm. ACM 21, 8 (1978).
7. Lauer, P. E. and Campbell, R. H. "Formal Semantics of a Class of High-Level Primitives for Coordinating
Concurrent Processes." Acta Informatica 5 (June 5 1974), 297-332.
8. Leiserson, C. E. Area-Efficient VLSI Computation. Ph.D. Th., Carnegie-Mellon University, 1981.
9. D. Lehman, A. Pnueli, J. Stavi. "Impartiality, Justice and Fairness: The Ethics of Concurrent
Termination." Automata, Languages and Programming, (1981), 265-277.
10. Li, W. and P. E. Lauer. A VLSI Implementation of Cosy. Tech. Rept. ASM/121, Computing
Laboratory, The University of Newcastle Upon Tyne, January, 1984.
11. Milner, Robin. A Calculus of Communicating Systems. Volume 92: Lecture Notes in Computer Science.
Springer-Verlag, Berlin Heidelberg NY, 1980.
12. Mukhopadhyay, A. "Hardware Algorithms for Nonnumeric Computation." IEEE Transactions on
Computers C-28, 6 (June 1979), 384-394.
13. Patil, Suhas S. An Asynchronous Logic Array. MAC TECHNICAL MEMORANDUM 62,
Massachusetts Institute of Technology, May, 1975.
14. Pratt, V. R. On the Composition of Processes. Symposium on Principles of Programming Languages,
ACM, January, 1982.
15. Rem, Martin. Partially ordered computations, with applications to VLSI design. Eindhoven University of
Technology, 1983.
16. Seitz, C. L. "Ideas About Arbiters." Lambda First Quarter (1980), 10-14.
Appendix: Proof details 
Refer to section 4:
Lemma 14: If the same assumptions as in proposition 6 are satisfied, then $T(Seq(j))$ is consistent with $R_j$.
Proof: From proposition 6 it follows that $Seq(j)$ consists of non concurrent time intervals. The result is
therefore easy to prove by induction on the number of intervals in $Seq(j)$, using the same proposition. □
Lemma 15: For each element i in Int with label e, the corresponding elements in Ext and $Seq(j)$ are
subintervals of i.
Proof: Follows from the properties of the circuit in fig 4-2 (see also fig 4-3). □
Lemma 16: For any $R_j \in M$, $T(Int)|_{\Sigma_{R_j}}$ is a totally ordered multiset.
Proof: It is easy to show that $T(Int)|_{\Sigma_{R_j}} = T(Int|_{\Sigma_{R_j}})$. But $Int|_{\Sigma_{R_j}}$ consists of 'internal events' of the
path expression $R_j$, during each of which the corresponding ACK is high. Hence by proposition 8, no two
such events overlap, and therefore $T(Int)|_{\Sigma_{R_j}}$ is a totally ordered multiset. □
Lemma 17: For any $R_j \in M$, $T(Int)|_{\Sigma_{R_j}} = T(Ext)|_{\Sigma_{R_j}}$.
Proof: For any element i of $T(Int)$ that is also in $T(Int)|_{\Sigma_{R_j}}$, the corresponding element of $T(Ext)$ will be
in $T(Ext)|_{\Sigma_{R_j}}$ (definition 2), since they must map to the same alphabet symbol $e \in \Sigma_{R_j}$. Hence these traces have
the same number of elements. Also from lemma 15 it follows that if $i_1$ and $i_2$ are two elements of
$T(Int)|_{\Sigma_{R_j}}$ satisfying one or none of "$i_1$ precedes $i_2$" and "$i_2$ precedes $i_1$", the corresponding elements of
$T(Ext)|_{\Sigma_{R_j}}$ will satisfy at least the same relationships. In other words the partial order of $T(Int)$ is a
restriction of that of $T(Ext)$. But by lemma 16 $T(Int)|_{\Sigma_{R_j}}$ is a totally ordered multiset. Hence from the
above $T(Ext)|_{\Sigma_{R_j}}$ will have the same partial order relationship and, therefore, be the same totally ordered
multiset. □
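The restriction and total-order notions used in lemmas 16 and 17 can be made concrete with a small sketch. It assumes that a trace is modelled as a list of labelled time intervals and that one interval precedes another when it ends strictly before the other begins; the `Interval` type and the function names are illustrative, not definitions from the paper.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Interval:
    label: str    # event symbol
    start: float  # time the interval begins
    end: float    # time the interval ends

def precedes(a, b):
    """a precedes b when b starts strictly after a ends."""
    return b.start > a.end

def restrict(trace, alphabet):
    """Keep only the intervals whose label lies in the given alphabet."""
    return [i for i in trace if i.label in alphabet]

def is_totally_ordered(trace):
    """True when every pair of intervals is ordered, i.e. no two overlap."""
    return all(precedes(a, b) or precedes(b, a) for a, b in combinations(trace, 2))

if __name__ == "__main__":
    trace = [Interval("a", 0, 2), Interval("b", 1, 3), Interval("c", 4, 5)]
    print(is_totally_ordered(restrict(trace, {"a", "c"})))  # a and c do not overlap
    print(is_totally_ordered(restrict(trace, {"a", "b"})))  # a and b overlap
```

In this toy trace the restriction to {a, c} is totally ordered while the restriction to {a, b} is not, since the intervals for a and b overlap in time.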
Lemma 18: For any $R_j \in M$, $T(Seq(j)) = T(Int)|_{\Sigma_{R_j}}$.
Proof: Follows from lemma 15 and 16 in the same way as in the proof of lemma 17. The only difference
is that $T(Seq(j))|_{\Sigma_{R_j}} = T(Seq(j))$. □
Lemma 19: For any sequencer $SEQ_j$, no two TR's are high simultaneously.
Proof: The two TR's would be two ACK's of events in the same path expression $R_j$, which cannot be high
simultaneously by proposition 8. □
Lemma 20: For any sequencer $SEQ_j$, $TR_e$ is raised only if $DIS_e$ is low and all TA's are low.
Proof: By induction on the number of rising transitions of TR's:
1. (First transition): Let the corresponding event be e. By proposition 9 initially all TA's are low, and
all CLR's are high, hence all TR's are low initially. By proposition 5 all TA's will remain low until
the first rising transition of $TR_e$. By the same proposition $DIS_e$ will not change until the first rising
transition of $TR_e$. If $DIS_e$ were not low, $IN_e$ would remain low (see Figure 4-2). Hence by
proposition 8, $TR_e$ would remain low, a contradiction.
2. (For a succeeding transition): Let the corresponding event be p and that of the previous transition
q. While $TR_q$ is high no TA or TR other than $TA_q$ or $TR_q$ can be high (proposition 8 and lemma 19).
Until $CLR_q$ goes high, $TR_q$ must remain high (see Figure 4-2). Once $CLR_q$ goes high, all $IN_a$, with
$a \in \Sigma_{R_j}$, will be low after a short delay (see Figure 4-2). Assuming the variation in this delay for
different a's is less than the delay of the arbiter in lowering $TR_q$, all $TR_a$ with $a \neq q$ will continue to
remain low until $CLR_q$ is lowered (see Figure 4-2). All $TA_a$, with $a \neq q$, also continue to remain low
(proposition 5). But $CLR_q$ remains high at least until $TA_q$ is lowered (see Figure 5). Hence by the
time $TR_p$ is raised all TA's will be low. Also $TR_p$ could not have been raised if $IN_p$ were low
(proposition 8). But if $DIS_p$ was high when $TA_q$ was last lowered then $IN_p$ would now be low (see
Figure 4-2), assuming the main NOR gate plus the 2-input NOR gate have a lesser delay than the
Muller-C element plus the SR Flip-Flop. Moreover, $DIS_p$ cannot change before $TR_p$ is raised
(proposition 5). Hence $DIS_p$ must be low when $TR_p$ is raised.
Lemma 21: For any sequencer $SEQ_j$, $TR_e$ is lowered only if $TA_e$ is high.
Proof: The NOR gate arrangement in front of the arbiter insures that once $TR_e$ is high it remains high
until $CLR_e$ is raised, and this can occur only if $TA_e$ is high (see Figure 4-2). Moreover once $TA_e$ is high it
will remain high until $TR_e$ is lowered (proposition 5). □
Theorem 10
Proof: Lemmas 19, 20, 21 satisfy the preconditions of proposition 6. Hence $T(Seq(j))$ is consistent with $R_j$
for any $R_j \in M$. By lemma 18 and definition 4, $T(Int)$ is consistent with $R_j$ for any $R_j \in M$. By lemma
17 and definition 4, $T(Ext)$ is consistent with $R_j$ for any $R_j \in M$. Hence by definition 4, $T(Ext) \in Tr_\Sigma(M)$. □
Lemma 22: If $T \in Tr_\Sigma(M)$ is layered, then each subset (cf. definition 11) of T has the property that no two
elements in it are instances of events in $\Sigma_{R_j}$ for any $R_j \in M$.
Proof: Any two elements $i_1, i_2$ (corresponding to events $e_1, e_2$) in the same subset of T must be concurrent
(definitions 3, 11). Suppose $e_1, e_2 \in \Sigma_{R_j}$ with $R_j \in M$. Then $T|_{\Sigma_{R_j}}$ will include $i_1, i_2$, which will be
concurrent (definition 2). Hence $T|_{\Sigma_{R_j}}$ cannot be a total order and therefore $T \notin Tr_\Sigma(M)$ (definition 4),
leading to a contradiction. Hence the result. □
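The property stated in lemma 22 is easy to test on a candidate layering. The sketch below assumes each subset of the layered trace is given as a collection of event names and each path expression is represented only by its alphabet; the function name `layer_violations` and the example alphabets are hypothetical.

```python
def layer_violations(layers, alphabets):
    """Return (layer index, path name, events) for every layer that holds
    two or more events belonging to the alphabet of a single path."""
    violations = []
    for index, layer in enumerate(layers):
        for path, alphabet in alphabets.items():
            shared = sorted(e for e in layer if e in alphabet)
            if len(shared) > 1:
                violations.append((index, path, shared))
    return violations

if __name__ == "__main__":
    # Two hypothetical paths that synchronize on the common event "b".
    alphabets = {"R1": {"a", "b"}, "R2": {"b", "c"}}
    print(layer_violations([["a", "c"], ["b"]], alphabets))  # no violations
    print(layer_violations([["a", "b"]], alphabets))         # a and b share R1
```

A layering with no violations is what the lemma guarantees for any layered trace consistent with M. A reported violation marks two concurrent events of the same path, which contradicts the total order required of that path's restricted trace.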
Theorem 12
Proof: The behavior we require of the external world is that it simultaneously raise REQ for all events in
the first subset of T, wait until all corresponding ACK are high, then simultaneously lower all REQ, wait
until all ACK are low, then repeat this cycle for the next subset of T, and so on. We need to show that
under these conditions the circuit responds within a finite amount of time in each cycle. The result then
follows directly.
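The required behavior of the external world is a four-phase REQ/ACK cycle per subset, which can be sketched as a small driver loop. The circuit model here is a deliberately idealized stand-in in which each ACK simply follows its REQ after one step; it is an assumption for illustration and does not model the synchronizer of Figure 4-2. The names `IdealCircuit` and `drive` are hypothetical.

```python
class IdealCircuit:
    """Stand-in for the synchronizer: ACK follows REQ after one step (an assumption)."""

    def __init__(self, events):
        self.req = {e: False for e in events}
        self.ack = {e: False for e in events}

    def step(self):
        # Idealized response: every ACK copies its REQ.
        self.ack = dict(self.req)

def drive(circuit, layers, max_steps=100):
    """Apply the four-phase REQ/ACK cycle to each subset of the layered trace."""
    completed = []
    for layer in layers:
        for level in (True, False):          # raise all REQ, then lower all REQ
            for event in layer:
                circuit.req[event] = level
            for _ in range(max_steps):       # wait for all ACK to match
                if all(circuit.ack[e] == level for e in layer):
                    break
                circuit.step()
            else:
                raise RuntimeError("circuit did not respond within max_steps")
        completed.append(sorted(layer))
    return completed

if __name__ == "__main__":
    circuit = IdealCircuit(["a", "b", "c"])
    print(drive(circuit, [["a", "c"], ["b"]]))
```

The `max_steps` bound plays the role of the finite response time that the proof establishes. A circuit that failed to respond would raise an error here instead of completing the cycle.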
As shown in the proof of lemma 20, all ACK's are initially low. Hence they are low at the beginning of
each of the cycles mentioned in the previous paragraph. At the beginning of each such cycle, Ext, Int and
every $Seq(j)$ with $R_j \in M$ get redefined. Let $T_p$ denote T restricted to subsets before the current cycle. It
is easy to show by induction on the number of cycles and definition 4 that at the beginning of each cycle
$T(Ext) = T_p$ and $T_p \in Tr_\Sigma(M)$. Hence for any $R_j \in M$, $S(T_p|_{\Sigma_{R_j}})$ is a prefix of some element in $L_{R_j}$. If
the next subset contains an instance $i_l$ of event $e_l$, then for each $R_j \in M$ such that $e_l \in \Sigma_{R_j}$, $S(T_p|_{\Sigma_{R_j}})$
can be extended by $i_l$ to give a prefix of some sequence in $L_{R_j}$; in fact this extension gives the next value
of $T_p|_{\Sigma_{R_j}}$ (see lemma 22). But by lemmas 18, 17, for any $R_j \in M$, $T(Seq(j)) = T(Ext)|_{\Sigma_{R_j}} = T_p|_{\Sigma_{R_j}}$.
Hence for each $R_j \in M$ such that $e_l \in \Sigma_{R_j}$, $T(Seq(j))$ can be extended by $i_l$ to give a prefix of some
sequence in $L_{R_j}$. Thus by proposition 6, the corresponding sequencers $SEQ_j$, with $e_l \in \Sigma_{R_j}$, will have $DIS_{e_l}$
low. This applies to any $e_l$ in the next subset of T.
Therefore at the beginning of any cycle, when $REQ_{e_l}$ for any event $e_l$ in the next subset of T is raised, all
$DIS_{e_l}$ inputs to the NOR gate for event $e_l$ (see Figure 4-2) will be low. Also within a finite amount of
time all relevant $TA_{e_l}$'s must go low by proposition 6, since the corresponding $TR_{e_l}$'s are already low.
Hence $CLR_{e_l}$ will go low, and $IN_{e_l}$ will go high for each $e_l$ in the next subset of T. It follows from
proposition 8 and lemma 22 that all ACK's corresponding to events in the next subset of T will be raised
within a finite amount of time.
The proof for the second half of the cycle is more straightforward. By lemma 6 once all REQ's are
lowered, within a finite time all relevant TA's will be raised, causing the corresponding CLR's to go high.
As a result all relevant IN's go low (see figure 4-2) and hence by proposition 8 all ACK's go low within a
finite time, completing the cycle. □
Table of Contents 
1. Introduction 
2. The Semantics of Path Expressions 
3. Implementing the Sequencer for a Simple Path Expression 
4. Synchronizers for Multiple Path Expressions 







List of Figures
Figure 2-1: An example pomset
Figure 3-1: The controller for path P
Figure 3-2: Cell for event e in path P
Figure 3-3: Cell for ";"
Figure 3-4: Cell for "+"
Figure 3-5: Cell for "*"
Figure 3-6: A recognizer for path a;(a+b);c end
Figure 3-7: The floorplan for a sequencer
Figure 4-1: A synchronizer
Figure 4-2: A synchronizer circuit
Figure 4-3: Synchronizer timing
Figure 5-1: The conflict graph of a path expression
Figure 5-2: Seitz's Interlock Element
Figure 5-3: (a) The Conflict Graph and (b) The Arbiter in nMOS.
Figure 5-4: An Arbiter based on graph coloring
Figure 5-5: The Arbiter for 1-Writer-2-Readers Problem in CMOS.
