Production Rule Verification for Quasi-Delay-Insensitive Circuits by Cook, James N.
Production Rule Verication for
QuasiDelayInsensitive Circuits
James N Cook
Department of Computer Science
California Institute of Technology
Pasadena California
In Partial Fulllment of the Requirements
for the Degree of Master of Science
June  
Dedicated to Jerry Ellen
and especially Emily
Copyright © James Cook
  Introduction
Circuit designs have become extremely complex. The need to manage this
complexity has led to the development of new automated circuit design methods.
Currently, these methods lean toward silicon compilation: the automated
transformation of a high-level circuit description into a transistor-level
implementable design. A particularly interesting class of circuits for automatic
generation is the class of delay-insensitive circuits. Delay-insensitive circuits
are designed to operate correctly with any arbitrary but finite delay in wires or
in operators. Delay-insensitive circuits must be asynchronous, since the use of a
clock would bound the range of delays possible for correct operation. The
property of delay-insensitivity has several desirable consequences for automated
design:
- Facilitated layout: A delay-insensitive design will function correctly after
  arbitrary changes to the lengths of its wires.
- Elimination of global clock signals: This eliminates difficulties in
  distributing the clock signal simultaneously to all parts of the circuit.
- Inherently modular designs: Any component of a delay-insensitive design can be
  replaced by another logically equivalent component, even if the new component
  has different delays.
- Speed optimization: Transistors can be arbitrarily resized without concern for
  the correctness of the circuit.
- Increased robustness: Delay-insensitive designs are less sensitive than other
  designs to manufacturing process variations, operating temperatures, source
  voltages, etc.
Of course the above advantages come at some cost Elimination of the global
clock signal requires that subcircuits must generate completion signals to indicate
that they are done with their computation This requires additional wiring and
increases the complexity of the subcircuits In addition the concurrent nature of
computation in such circuits can be more dicult to analyze than computation in
clocked circuits
The concurrent nature of delayinsensitive circuit design can make them extremely
dicult to debug Although many errors show up as shorts or hazards during
simulation there is no guarantee that they will appear Simulations are performed
by assigning delays to various components of the circuit it is possible that these
assigned delays will mask such an error Errors undetected in simulation will cause

a circuit design to lose its delayinsensitivitythe circuits correctness becomes
dependent on the actual delays being similar to those assumed in the simulation
Fortunately, these types of errors can be detected in high-level descriptions of
a circuit design. Circuits can be conveniently expressed as lists of production
rules, a notation developed by Alain Martin at the California Institute of
Technology. Lists of production rules can be guaranteed to be free of shorts and
hazards by examining them for two properties called stability and
noninterference. Delay-insensitive circuits must have these properties for all
possible sets of component delays. While it is possible to manually verify sets
of production rules for these properties, such manual verification is error
prone and becomes unwieldy for all but the smallest circuits.
In this document we present an automated method for the verification of
delay-insensitive circuits expressed as production rules. We begin with a
description of Martin's design method and a specification of the production rule
notation. We precisely define stability and noninterference and relate them to
delay-insensitivity. We give sequential and parallel algorithms for performing
verification. We provide several examples of the verification method and
describe our implementation of the algorithms. We conclude with a summary of
this work.

 Production Rules
Alain Martins research group at the California Institute of Technology Cal
tech has developed a synthesis method and a set of design tools for quasidelay
insensitive circuits The class of completely delayinsensitive circuits has been
proven to be quite limited 	 quasidelayinsensitive circuits are delayinsensitive
under the assumption of isochronic forks and allow the design of a larger class of
circuits Under the isochronic fork assumption some of the gate outputs that are
connected to multiple gate inputs are labeled isochronicthese outputs are as
sumed to arrive at all the connected inputs simultaneously see 	  for details
Quasidelayinsensitive circuits are a superset of speedindependent circuits speed
independent circuits can be considered as quasidelayinsensitive circuits where all
wires are labeled isochronic 	 In this document we exclusively consider quasi
delayinsensitive circuits despite the use of the phrase delayinsensitive to refer to
them
The synthesis method works as follows. The designer begins by writing a program
describing the circuit's behavior; this program has the form of a collection of
concurrently executing sequential processes that communicate over one-way data
channels. The notation used for these programs is called CSP (Communicating
Sequential Processes) and is based on C.A.R. Hoare's original notation. The CSP
program is next transformed into a handshaking expansion by reducing it to an
equivalent set of processes where all communication actions have been replaced
with manipulations of shared variables. This handshaking expansion is then
transformed into a set of production rules in which all explicit sequencing has
been removed. Production rules are the lowest-level program description in the
design method; they can be easily simulated and can be automatically implemented
in CMOS. They are also sufficiently general to be useful in specifying VLSI
circuits outside the context of this design method.
In the production rule notation, a circuit is described in terms of its
variables and the conditions under which transitions on these variables occur.
Each production rule has the form G → S, where G is a boolean expression on the
circuit's variables and S is a simple assignment (e.g., z↑ and z↓ correspond to
z := true and z := false). G is called the guard of the production rule.
Multiple production rules with identical guards are often written with a single
guard and several simple assignments: G → x↑, y↑, z↑ is an abbreviated form of
G → x↑, G → y↑, G → z↑.
A production rule fires (executes its assignment) some time after its guard
evaluates to true. If the firing of a production rule does not change any
circuit variable's value, then the firing is called vacuous. If a firing does
change some variable, the firing is called effective.
Production rules describe both combinatorial and state-holding gates. Figure 1
shows the production rules and transistor implementations of a NAND gate and an
inverting Muller C-element. Note that the staticizer used to hold the
C-element's state is not explicitly described in the production rules. This is
due to the semantics of production rule firing: each firing is equivalent to an
assignment to a variable. Thus the variable should hold its value until some
other production rule fires and a different assignment takes place.
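NAND gate:
    a ∧ b → c↓
    ¬a ∨ ¬b → c↑
Inverting Muller C-element:
    d ∧ e → f↓
    ¬d ∧ ¬e → f↑
[Transistor schematics. NAND gate: inputs a and b, output c, supplies Vdd and
GND. C-element: inputs d and e, output f, supplies Vdd and GND, with a staticizer
(node w).]
Figure 1: Implementation of NAND gate and inverting C-element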
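The difference between a combinatorial gate and a state-holding gate can be made concrete with a small Python sketch. The encoding of a rule as a (guard, variable, value) tuple and the helper name settle are assumptions made for this illustration; they are not part of the production rule notation or of any existing tool.

```python
# Each production rule is (guard, variable, value): the guard is a predicate on
# the state, and firing the rule assigns `value` to `variable`.
NAND = [
    (lambda s: s["a"] and s["b"], "c", False),          # a & b -> c down
    (lambda s: not s["a"] or not s["b"], "c", True),    # ~a | ~b -> c up
]
C_ELEMENT = [
    (lambda s: s["d"] and s["e"], "f", False),          # d & e -> f down
    (lambda s: not s["d"] and not s["e"], "f", True),   # ~d & ~e -> f up
]


def settle(rules, state):
    """Fire effectively enabled rules until no firing would change the state."""
    state = dict(state)
    changed = True
    while changed:
        changed = False
        for guard, var, value in rules:
            if guard(state) and state[var] != value:    # effective firing
                state[var] = value
                changed = True
    return state


# The NAND output is determined by the current inputs alone.
nand_out = settle(NAND, {"a": True, "b": False, "c": False})["c"]

# With d != e neither C-element guard is true, so f keeps its previous value.
held_high = settle(C_ELEMENT, {"d": True, "e": False, "f": True})["f"]
held_low = settle(C_ELEMENT, {"d": True, "e": False, "f": False})["f"]
print(nand_out, held_high, held_low)
```

Because no rule is enabled when d and e disagree, the assignment semantics alone account for the C-element holding its state, which is why the staticizer does not appear in the rules.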
There are two properties that production rule sets must satisfy; these
properties are termed stability and noninterference. Stability is defined as
follows:
    A production rule G → S is stable if every time G becomes true, it
    remains true until the assignment S is completed.
Noninterference only relates to complementary production rules, that is,
production rules of the form G_1 → z↑ and G_2 → z↓ for some z. Noninterference
is defined as follows:
    Two complementary production rules G_1 → z↑ and G_2 → z↓ are
    noninterfering if and only if ¬G_1 ∨ ¬G_2 holds invariantly.
We call a set of production rules stable and noninterfering if each individual
production rule is stable and all complementary production rules are
noninterfering.
Informally the stability and noninterference requirements for production rules fol
low directly from their implementation in CMOS Figure  gives the implementa
tion of two complementary production rules that will illustrate these requirements
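[Schematic of two complementary production rules on d, one guarded by a and one
guarded by b and c. Labels in the schematic: inputs a, b, c; supplies Vdd and
GND; nodes d and d'; staticizer node w.]
Figure 2: Production rule implementation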
The stability and noninterference requirements now become clear. If the
production rule that sets d high is unstable, then its guard may become false
momentarily and quickly return to true. Thus the conducting path from Vdd to d'
may not remain conducting sufficiently long for node d to be set high, which
could assign an indeterminate value to d. Furthermore, if these two production
rules are interfering, then both guards might become simultaneously true,
resulting in a short from Vdd to GND that could also assign an indeterminate
value to d.
Although simulation can be used to check for instability and interference, it is
not guaranteed to find all such problems. Take, for example, a production rule
set containing the following production rules:
    (1) a → b↑
    (2) b → a↓
    (3) a ∧ b → c↑
Assume that the simulator approximates physical delays by assigning to each
production rule a delay between the time it becomes enabled and the time it
fires. The delay could be based, for example, on the sizes and fanouts of the
gates that would implement the circuit. Suppose the delays assigned to rules
(1), (2), and (3) are such that rule (3) has a shorter delay than rule (2). Then
whenever the simulator finds a true, the following sequence will occur:
    1. a becomes true
    2. (1) fires and b becomes true
    3. (3) fires and c becomes true
    4. (2) fires and a becomes false
Thus the simulator will find that whenever a ∧ b becomes true, it remains true
until c↑ completes. Unfortunately, this is not true in general: if the delay
associated with (2) were made shorter than the delay associated with (3), then a
would become false before (3) would be able to fire. Thus production rule (3)
would be unstable, and the circuit could fail. This circuit is therefore not
delay-insensitive: changing component delays can change its behavior. To detect
this sort of error, more exhaustive verification methods than simple simulation
are required.

 Verication Method
We begin with the single assumption we require for verication We require only
that the production rule set be closed before we can check for stability and nonin
terference
A set of production rules is closed if and only if for every variable other
than Reset used in a production rules guard there exists a production
rule describing an assignment to that variable
The assumption of closure is necessary because we intend to verify our production
rule set with method similar to simulation We need to have production rules for
each variable that will change including the circuits inputs from the environment
Note that exact specication of the environmentwith production rules is not always
possible for example synchronization and arbitration cannot be expressed in terms
of production rules See Appendix A for information about how environment
specication is done in practice
  Sequential Algorithm
Assume a closed production rule set P with variables x_1, x_2, ..., x_n.
A circuit state is a vector with one element per circuit variable. Each vector
element can have the value 0 or 1. (If the circuit's reset logic is also being
tested, vector elements can also have the value U, for undefined.) For example,
S[k] is the value of x_k in state S.
A production rule is a two-tuple consisting of a boolean expression and an
assignment. We define several functions on states and production rules.
For production rule p, trans(p) is the simple assignment (transition) that will
be performed by that production rule. For example,
trans(x_1 ∧ x_2 ∧ x_3 → x_4↑) = x_4↑.
For production rule p and state S, enb(p, S) is true if and only if the boolean
expression of p evaluates to true (i.e., p is enabled) when its variables are
assigned the values in S.
For production rule p and state S, eff(p, S) is true if and only if
    enb(p, S) ∧ ((trans(p) = x_k↑ ∧ S[k] = 0) ∨ (trans(p) = x_k↓ ∧ S[k] = 1)).
Thus eff(p, S) is true if and only if production rule p can cause the circuit to
change state when in state S. We call such production rules effectively enabled
in S.
For production rule p and state S, result(p, S) is defined as follows:
- trans(p) = x_k↑ ∧ enb(p, S) implies
  result(p, S) = ⟨S[1], ..., S[k−1], 1, S[k+1], ..., S[n]⟩
- trans(p) = x_k↓ ∧ enb(p, S) implies
  result(p, S) = ⟨S[1], ..., S[k−1], 0, S[k+1], ..., S[n]⟩
- ¬enb(p, S) implies result(p, S) = S
For example, result(x_1 → x_2↑, ⟨1, 0⟩) = ⟨1, 1⟩, but
result(x_1 → x_2↑, ⟨0, 0⟩) = ⟨0, 0⟩.
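The three functions enb, eff, and result translate almost directly into code. The sketch below is one possible Python rendering; the Rule tuple, the 0-based variable indices, and the representation of a guard as a Python callable are assumptions of this illustration, not part of the definitions above.

```python
from typing import Callable, NamedTuple, Tuple

State = Tuple[int, ...]              # State[k] is the value (0 or 1) of variable k


class Rule(NamedTuple):
    guard: Callable[[State], bool]   # boolean expression over the state
    var: int                         # index of the assigned variable
    value: int                       # 1 for an up-transition, 0 for a down-transition


def enb(p: Rule, s: State) -> bool:
    """p is enabled in s: its guard evaluates to true."""
    return bool(p.guard(s))


def eff(p: Rule, s: State) -> bool:
    """p is effectively enabled in s: enabled, and firing it changes s."""
    return enb(p, s) and s[p.var] != p.value


def result(p: Rule, s: State) -> State:
    """State reached by firing p in s; s itself when p is not enabled."""
    if not enb(p, s):
        return s
    return s[:p.var] + (p.value,) + s[p.var + 1:]


# x_1 -> x_2 up, with x_1 stored at index 0 and x_2 at index 1.
rule = Rule(guard=lambda s: s[0] == 1, var=1, value=1)
print(result(rule, (1, 0)), result(rule, (0, 0)))
```

Note that a rule can be enabled without being effectively enabled: in state (1, 1) the guard of the rule above holds, but firing it would be vacuous.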
    let R = {S_init} and let M = {}
    while R ≠ {} do
        remove state S from R
        let E = {p ∈ P | enb(p, S)}
        let E' = {p ∈ E | eff(p, S)}
        if ∃ p, q ∈ E, 1 ≤ k ≤ n such that trans(p) = x_k↑ and trans(q) = x_k↓ then
            report interference between p and q in state S
        for each p ∈ E' do
            let S' = result(p, S)
            if ∃ q ∈ E' such that ¬enb(q, S') then
                report q unstable
            if S' ∉ M then
                let R = R ∪ {S'}
        end for
        let M = M ∪ {S}   (mark S)
    end while
Figure 3: Sequential verification algorithm
Figure  gives the sequential verication algorithm in terms of the above functions
and set operations The algorithm searches all states that can be reached by
production rule rings beginning in initial state S
init
 During this search if the
algorithm nds a state in which two complementary production rules are both
enabled it reports an interference problem If it nds a transition between states
that disables an enabled production rule before it can re it reports an instability
R and M are sets of circuit states All states in R remain to be examined M
is a set of marked states which the algorithm has already examined S and S
 
are states E and E
 
are sets of production rules with all production rules in E
enabled in S and all production rules in E
 
eectively enabled in S  S
init
is the
circuits initial state P is a set containing all the circuits production rules
Claim This algorithm nds all instabilities and interferences in production rule
set P over variables x
 
     x
n


Proof Dene a directed graph G as follows Let each vertex of the graph be one
of the 
n
possible circuit states Let there be an edge from vertex x to vertex y
if there exists a production rule p such that resultp  x  y Note that G may
contain cycles Let all vertices be initially unmarked We claim that all vertices
reachable from the vertex corresponding to state S
init
will eventually be marked
First we prove that the algorithm terminates Since there are a nite number
of production rules the inner for loop must terminate Consider the set R M 
States are never removed from this setif a state is removed fromR it is eventually
added toM  Initially S
init

M  Furthermore each S
 
added to R is not inM ie
R M   due to the test in the if statement Thus each iteration of the while
loop moves one state from R to M  Since the number of possible reachable states
on n variables is nite jRM j is nite Therefore R will eventually become empty
and the while loop will eventually terminate Thus the algorithm terminates
Suppose there exists some state reachable from S
init
that is unmarked upon ter
mination Then there must exist some nonempty subset U of unmarked states
reachable from S
init
 The rst statement of the algorithm places S
init
into R so
the while loop will execute at least once and S
init
will be marked Since S
init
is
initially marked and all states in U are reachable from S
init
 there must exist at
least one marked vertex in G connected to an unmarked vertex In the algorithm
before a state is marked all states immediately reachable from it are added to R
Additionally each S removed from R is eventually marked Since the algorithm
only terminates when R is empty there can be no such marked vertex connected
to an unmarked vertex Thus all reachable vertices are eventually marked so all
reachable states are eventually visited
Assume that two production rules p
 
and p

are interfering By the denition of
interference there exists some reachable circuit state T such that enbp
 
 T  
enbp

 T  Since all reachable states are marked state T must be marked by the
algorithm Before T is marked however set E will contain both p
 
and p

 and
the interference between p
 
and p

will be reported
Assume that some production rule q is unstable Then by the denition of stability
there exists production rule p and reachable state T such that ep  T   eq  T 
 enbq  resultp  T  Since all reachable states are marked T will be marked
by this algorithm Before T is marked however set E
 
will contain both p and q
Thus S
 
will eventually become resultp  T  and the instability will be reported
 

  NPHardness
The need to check all circuit states reachable from S
init
makes the timecomplexity
of the above algorithm exponential in the number of circuit variables It is un
likely that this timecomplexity can be substantially improved as both instability
and interference checking are NPhard We will later discuss how to eciently
implement this algorithm
Problem Given a set P of production rules an initial state S
init
 and two com
plementary production rules p
 
and p

in P  decide if p
 
and p

are interfering
simultaneously enabled in a state reachable from the initial state under some set
of delays for production rule rings
Theorem Interference checking is NPhard
Proof To prove that interference checking is NPhard we reduce the satisability
problem SAT to it Let E be an instance of SATa boolean expression in
conjunctive normal formover variables x
 
       x
k
 Construct production rule set
P over variables x
 
       x
k
and e as
true
true
E
 x
 
    x
k

 e
 e
Let the initial state S
init
be the state in which all variables in P are false We claim
that E is satisable if and only if there exists interference between true  e and
E  e If E is satisable there exists some assignment to x
 
    x
k
such that
E evaluates to true Let x
 
    x
j
be the true variables in this assignment There
exists a set of delays in which true  x
 
 through true  x
j
 all have shorter
delays than true  x
j 
 through true  x
k
 Thus the circuit can reach a state in
which x
 
    x
j
are true and x
j 
    x
k
are false In this state both true  e and
E  e will be enabled so there will be interference between these two production
rules
Conversely if there exists interference between true  e and E  e then there
exists some state in which E  e is enabled which implies that E can evaluate
to true Construction of the above production rule set can be done in polynomial
time Thus satisability reduces to interference detection and the reduction can
be done in polynomial time Therefore interference detection is NPhard  
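The reduction can be exercised on small formulas. The Python sketch below builds the rule set for a CNF formula, searches the reachable states for a state in which both rules on e are enabled, and cross-checks the answer against a brute-force satisfiability test. The clause encoding (a positive integer i for x_i, a negative integer for its negation), the function names, and the choice of true → e↑ and E → e↓ as the complementary pair are assumptions of this illustration. The search itself is exponential; only the construction of the rules is polynomial.

```python
from itertools import product


def reduction(clauses, k):
    """Rules (guard, var, value) over x_1..x_k (indices 0..k-1) and e (index k)."""
    def formula(s):                        # E, evaluated on the x part of a state
        return all(any(s[abs(l) - 1] == (l > 0) for l in c) for c in clauses)

    rules = [(lambda s: True, i, 1) for i in range(k)]    # true -> x_i up
    rules.append((lambda s: True, k, 1))                  # true -> e up
    rules.append((formula, k, 0))                         # E -> e down
    return rules


def interferes(rules, k):
    """True if some reachable state enables both rules on e (the last two rules)."""
    up, down = rules[-2], rules[-1]
    start = (0,) * (k + 1)
    todo, seen = [start], {start}
    while todo:
        s = todo.pop()
        if up[0](s) and down[0](s):
            return True
        for guard, var, value in rules:
            if guard(s) and s[var] != value:              # effective firing
                t = s[:var] + (value,) + s[var + 1:]
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
    return False


def satisfiable(clauses, k):
    """Brute-force SAT test, used only to cross-check the reduction."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=k))


sat_formula = [[1, -2], [2, 3]]            # (x1 | ~x2) & (x2 | x3)
unsat_formula = [[1], [-1]]                # x1 & ~x1
print(interferes(reduction(sat_formula, 3), 3),
      interferes(reduction(unsat_formula, 1), 1))
```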
Problem Given a set P of production rules an initial state S
init
 and a production
rule p decide if p is unstable p is unstable if and only if there exist states S
 
and
S

reachable from S
init
such that p is eectively enabled in S
 
 disabled in S

 and
the ring of some production rule moves the circuit from S
 
to S



Theorem Instability checking is NPhard
Proof To prove that instability checking is NPhard we reduce the satisability
problem to it Let E be an instance of SAT over variables x
 
       x
k
 Construct
production rule set P over x
 
       x
k
and e  f as
true
E
e
 x
 
    x
k

 e
 f 
Let initial state S
init
be the state in which all variables in P are false We claim
that E is satisable if and only if production rule e  f  is unstable If E is
satisable there exists some assignment to x
 
    x
k
such that E evaluates to true
Let x
 
    x
j
be the true variables in this assignment There exists a set of delays
in which the delay for E  e is less than some constant D and the delays for
true  x
 
 through true  x
j
 are at least D time units shorter than the delays
for true  x
j 
 through true  x
k
 and e  f  Thus the circuit can reach a
state in which x
 
    x
j
are true x
j 
    x
k
are false and e  f  is eectively
enabled but has not yet red In this state both E  e and e  f  will be
enabled this state corresponds to S
 
 Production rule E  e will then re
disabling e  f  Thus e  f  is unstable
Conversely if e  f  is unstable then there exists some state in which e is true
Since E  e is the only production rule encoding a transition on e and the initial
state has all variables false E must have evaluated to true and therefore be sat
isable Construction of the above production rule set can be done in polynomial
time Thus satisability reduces to instability detection and the reduction can be
done in polynomial time Therefore instability detection is NPhard  
   Parallel Algorithm
The problem of circuit state space searching appears suitable for parallel
solution, since there are potentially many states to search and each state can
be examined independently. This problem is not merely a tree search, however,
because the state graph may contain cycles. When new states to check are
discovered, they must be tested for membership in the set of states already
examined, to prevent checking states more than once. This membership check
complicates the parallel algorithm. We now describe a parallel adaptation of the
above algorithm for the message-passing model of computation.
The basic idea for the parallel algorithm is to distribute the sets R and M over
several processes. Each process will have local sets R_i and M_i such that the
union of all R_i forms R and the union of all M_i forms M. The way in which
these sets are distributed will greatly affect the efficiency of this algorithm.
Distributing the set R is trivial. Whenever a new state is generated, it can be
sent to an arbitrary process for checking: checking a state for stability and
interference requires only the state itself and a description of the circuit's
production rules. We require that each process has access to a copy of the
production rule set being checked.

Distributing the set M is more difficult. We cannot arbitrarily assign states to
processes, as each process needs to check states for membership in the global
set M. Thus we are forced to take a somewhat atypical approach to
parallelization: rather than directly mapping work onto processes, we instead
map the state space. The goal of this mapping is to have each process be
responsible for some subset of the state space. Each process i will maintain in
M_i a record of previously checked states for subset i. It will use R_i to store
a set of states remaining to be checked; all states in R_i will also belong to
subset i. We therefore need to develop a mapping from states to processes that
can be used to generate these subsets.
Our state-space-to-process mapping needs to be reasonably uniform for the
parallel computation to be efficient. By uniform we mean that the number of
states stored in each M_i should be roughly equal. If any one process is
responsible for a large number of states, then the speed of the computation will
be limited by the speed with which that process can decide membership and
generate successor states. Furthermore, the space required to store all
reachable states may be much larger than the storage space available for a
single process; if the state space is unevenly divided, then some processes may
run out of storage.
One important consideration in choosing such a mapping is that we desire the
circuit's reachable states to map uniformly onto the processes. This is more
difficult than mapping all possible states onto processes, since we have little
a priori knowledge of which states will be reachable.
Fortunately, there is a way to perform this mapping: we use techniques developed
for hashing. Since a state is a list of boolean values, we represent it with a
sequence of bits, which we can consider as either a string of 8-bit characters
or as a large integer. We can then use existing hash functions on these strings
or integers to perform the mapping. Thus for any state S, hash(S) is the number
of the process responsible for that state.
We can utilize our hash function to distribute both R and M. Whenever we
generate a new state S' to be examined, we send it to process hash(S'). That
process can then check whether S' has been previously examined by examining its
local M_i set and, if not, adding it to its local R_i set for future
examination. This method is particularly efficient because it does not require
processes to query each other about states belonging to M. Each state is sent
directly to the process responsible for it, and no response message is required.
Footnote (on giving each process a copy of the production rule set): This is not
an unreasonable requirement, since both the number of production rules and the
size of their representations tend to be small. For example, the control
circuitry for an asynchronous microprocessor required fewer than two hundred
production rules to describe.
Each process i runs the following program:
    let R_i = {} and let M_i = {}
    repeat
        while message pending do
            receive state T
            if T ∉ M_i then
                let R_i = R_i ∪ {T}
        end while
        remove state S from R_i (if R_i is empty, wait for a message)
        let E = {p ∈ P | enb(p, S)}
        let E' = {p ∈ E | eff(p, S)}
        if ∃ p, q ∈ E, 1 ≤ k ≤ n such that trans(p) = x_k↑ and trans(q) = x_k↓ then
            report interference between p and q in state S
        for each p ∈ E' do
            let S' = result(p, S)
            if ∃ q ∈ E' such that ¬enb(q, S') then
                report q unstable
            let d = hash(S')
            if d = i and S' ∉ M_i then
                let R_i = R_i ∪ {S'}
            else if d ≠ i then
                send S' to process d
        end for
        let M_i = M_i ∪ {S}   (mark S)
    end repeat
The algorithm is initiated by sending S_init to process hash(S_init).
Figure 4: Parallel verification algorithm
Figure 4 gives the parallel verification algorithm. This algorithm is almost
identical to the sequential one; the only difference is that whenever a new
state S' is generated, it is not necessarily added to the local R_i. Instead, it
may be sent to some other process, as dictated by the hash function's mapping.
As mentioned above, this also takes care of the distributed M membership check.

An important consideration in messagepassing parallel algorithms is the locality
of communication If messages are sent arbitrarily between processes then the
algorithm will be less ecient than if we can somehow guarantee that processes
will only send to processes on nearby computing nodes This increased e
ciency stems from two sources rst shorter physical communication distances
decrease the time required to transport the message and second the message
passing network may be able to carry more messages simultaneously Of course
this optimization will be dependent on the architecture of the multicomputer used
to run the algorithm the physical wires between processors determine which are
neighbors and which are not For multicomputers that use a hypercube architec
ture such as C L Seitzs Cosmic Cube 	 we can select a hash function that
will cause most communications to be from a process to one of its neighbors
The ability to localize communication in this manner stems from a crucial re
alization about production rule simulation When we re an eectively enabled
production rule exactly one bit of the state changes This follows directly from
the denition of eective All states S
 
that are generated by the above algorithm
will dier from S in exactly one bit Furthermore neighboring processors in a
hypercube have IDs that dier in exactly one bit Our goal is therefore to nd
a hash function which in addition to uniformly distributing our reachable circuit
states maps states which dier in exactly one bit to processors with IDs that dif
fer in as few bits as possible In practice uniformity can be traded for locality of
communication We give several hash functions with dierent localityuniformity
ratios
One deciency of this method is that it only produces local communication on
hypercube architectures However the problem of mapping hypercubes onto other
architectures has been studied by other researchers 	   and will not be dis
cussed here
String Hashing
A substantial amount of research has been devoted to finding simple hash
functions that generate uniform distributions from string inputs. One such hash
function, described by Pearson, has the benefits of being string-based and quick
to compute, even on small microprocessors. The function is computed as follows:
    h := 0
    for i in 1..length(state) loop
        h := Table[h XOR state[i]]
    end loop
    return h MOD process_count


    0 = 000    1 = 001
    2 = 010    3 = 011
    4 = 100    5 = 101
    6 = 110    7 = 111
Figure 5: Processor IDs in a hypercube architecture
As given this hash function only works for up to  processes In 	 Pearson
describes ways of extending it to larger values Table contains the numbers 
in random order We use each character of String to store  bits of state infor
mation This hash function was found to produce the most uniform distribution
of states to processes of any hash function tried Unfortunately this hash function
does not guarantee any sort of locality and in our tests produced uniform random
communication distances
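A Python sketch of this string hash is given below. The fixed random seed, the bit-packing order in pack_state, and the function names are assumptions of this illustration; any fixed permutation of 0 through 255 can serve as the table.

```python
import random

_rng = random.Random(0)              # assumed seed, used only to fix the table
TABLE = list(range(256))
_rng.shuffle(TABLE)                  # the numbers 0..255 in random order


def pack_state(bits):
    """Pack a sequence of 0/1 values into bytes, 8 state bits per character."""
    out = bytearray((len(bits) + 7) // 8)
    for k, bit in enumerate(bits):
        if bit:
            out[k // 8] |= 1 << (k % 8)
    return bytes(out)


def pearson_hash(state_bytes, process_count):
    """Table-driven string hash, reduced to a process number."""
    h = 0
    for ch in state_bytes:
        h = TABLE[h ^ ch]
    return h % process_count


state = pack_state([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
owner = pearson_hash(state, 16)
print(len(state), owner)
```

Since h never exceeds 255, the result can address at most 256 processes, which is the limitation noted above.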
XOR Folding
Another approach we tried was to design a function specifically to generate
local communications. One way of doing so is the following. Given 2^n processes,
break the state bits into w separate n-bit words. Exclusive-or these words
together and return this value as the value of the hash function. This can be
done as follows:
    h := 0
    mask := (1 LEFTSHIFT n) - 1
    for i in 1..w loop
        h := h XOR (state AND mask)
        state := state RIGHTSHIFT n
    end loop
    return h
The exclusive-or operator has the property that for a XOR b = c, if a' differs
from a in exactly one bit, then a' XOR b = c', with c' differing from c in
exactly one bit. Thus this hash function has the property that states differing
in exactly one bit will generate hash values that differ in exactly one bit. All
communications in the hypercube are therefore guaranteed to be with neighbors.
Unfortunately, this hash function does not distribute states as uniformly as
Pearson's method.
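The one-bit locality property can be checked mechanically. In the Python sketch below the state is held as an integer; the function name xor_fold and the particular 12-bit test state are assumptions of this illustration.

```python
def xor_fold(state, n, w):
    """Fold w words of n bits each (state is an int) into one n-bit value."""
    h = 0
    mask = (1 << n) - 1
    for _ in range(w):
        h ^= state & mask
        state >>= n
    return h


n, w = 3, 4                                  # 2**3 = 8 processes, 12 state bits
state = 0b101_110_001_011
home = xor_fold(state, n, w)

# Flip each state bit in turn and measure the hypercube distance between the
# process owning the original state and the process owning the new state.
neighbours = [state ^ (1 << b) for b in range(n * w)]
distances = [bin(home ^ xor_fold(t, n, w)).count("1") for t in neighbours]
print(home, distances)
```

Every successor state lands on a process whose ID differs from the sender's in exactly one bit, so each message travels a single hypercube link.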
 Prime Hashing
There is a third choice for a hash function We can treat our state as a large integer
and hash it with the method suggested in Knuth 	
return state MOD prime	 MOD processcount
Since the state may contain a large number of bits, the modulo operations must
be implemented as multiprecision calculations, but this can be easily done by
utilizing the property
$(a \cdot 2^w + b) \bmod p = (((2^w \bmod p)(a \bmod p)) \bmod p + (b \bmod p)) \bmod p$.
Thus the modulo computation can be performed with w-bit words for any positive w.
This hash function distributes states more uniformly than XOR folding and often
produces shorter message distances than Pearson's method. This is due to the fact
that changes in the low-order bits in state are often, although not always,
reflected in changes in the low-order bits of the hash function result. Of course
this hash method depends on the ordering of the bits in the state, but on average
it performs well.

Table  Uniformity and communication distance in a hypercube for the Pearson, XOR,
and Prime hash functions. For each test case, the standard deviation (SD) of the
size of M_i is given, as is the average message distance (avg dist) and the average
number of states per process. Test cases are cells from a parallel-to-serial
converter and a cache controller.
The table above gives some data on the relative efficiency of the above schemes.
In general, Pearson's hash function appears to be most useful for systems with
limited memory, where uniform distribution of states is extremely important, or
communication architectures where message locality is relatively unimportant. XOR
folding seems most useful for hypercubes, where the relative inefficiency of the
distribution is offset by the communication locality. Prime hashing seems to cover
most systems in between.


 Examples
We begin with a simple example illustrating the verification of a correct production
rule set. The following set of production rules describes a simple oscillating circuit:
    1: a → b↑
    2: a → c↑
    3: b ∧ c → a↓
    4: ¬a → b↓
    5: ¬a → c↓
    6: ¬b ∧ ¬c → a↑
The variables in this circuit will transition as follows: first a↑, followed by both
b↑ and c↑ in any order, followed by a↓, followed by b↓ and c↓ in any order. Let us
step through the sequential verification algorithm for this circuit.
For any circuit that does not have explicit reset production rules (i.e., rules
involving a variable named Reset), we assume that its initial state has all variables
low. We write states as vectors; for example, ⟨0,1,0⟩ corresponds to the state with
a low, b high, and c low. Our initial state is therefore ⟨0,0,0⟩. We place ⟨0,0,0⟩
into set R and begin the first while loop.
We choose and remove the state ⟨0,0,0⟩ from R. Production rules 4, 5, and 6
have true guards in this state and are therefore enabled. We let E = {4, 5, 6}.
In this state b is already low and c is already low, so production rules 4 and
5 are vacuous. Thus we let E' = {6}. There is no pair of production rules
in E that calls for up and down transitions on the same variable, so there is no
interference in this state. We enter the for loop and choose 6 in E'. Firing 6 in
state ⟨0,0,0⟩ will lead to ⟨1,0,0⟩, so we set S' = ⟨1,0,0⟩. All production rules in
E' are still enabled in S'; thus there is no instability in the transition from
⟨0,0,0⟩ to ⟨1,0,0⟩. S' is not in R or M, so it is added to R. We then exit the for
loop, add S to M, and continue.
The diagram in the figure below shows the reachable state graph that will be explored
by this algorithm. In no state is there a pair of production rules in E that call for
up and down transitions on the same variable (e.g., 1 and 4 are never both in E).
This implies that in no reachable state are two such production rules enabled, and
the production rule set is therefore noninterfering.
Also note that for every state, each production rule in E' is contained in E for the
neighboring state. This implies that if a production rule becomes enabled and is
effective, then it will not be disabled until after firing. Thus the production rule
set is stable as well.

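    State 000: E = {4, 5, 6}, E' = {6}       next: 100
    State 100: E = {1, 2, 6}, E' = {1, 2}    next: 110, 101
    State 110: E = {1, 2},    E' = {2}       next: 111
    State 101: E = {1, 2},    E' = {1}       next: 111
    State 111: E = {1, 2, 3}, E' = {3}       next: 011
    State 011: E = {3, 4, 5}, E' = {4, 5}    next: 001, 010
    State 001: E = {4, 5},    E' = {5}       next: 000
    State 010: E = {4, 5},    E' = {4}       next: 000
Figure  State graph explored by verification algorithm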
Let us now examine an incorrect production rule set. The following production
rule set contains an unstable production rule:
 a
 b
 a
 b
 a  b
 b
 a
 b
 a
 c
We begin as above with the initial state h    i in set R In this state production
rules  and  are enabled and are placed into E These two are obviously
noninterfering and we let E
 
be the eective production rules in E Thus E
 

fg The result of  in h    i is h    i so we add this state to R and h    i
to M 
In state h    i production rules   and  are enabled and noninterfering Only
  is eective and the result of   in h    i is h    i We add this state to R
add h    i to M and continue
In state h    i production rules    and  are enabled and noninterfering
Of these  and  are eective so they are placed into E
 
 The result of  in
h    i is h    i However in h    i production rule  is disabled Thus it
is possible for  to become eectively enabled and then disabled without ring
Since  is in E
 
 the algorithm reports  as unstable and therefore detects this
problem
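Both walk-throughs can be reproduced with a small sketch of the sequential search. The Python below is an illustration under stated assumptions, not the prlint code: the tuple representation of rules is invented for the example, the oscillator rules are our reading of the first example, and the `faulty` rule set is a hypothetical set with the same shape as the second example (an a/b oscillator plus a rule a ∧ b → c↑), since the exact rules are not assumed here. Stability is checked only for the effective rules other than the one fired.

```python
from collections import deque


def verify(rules, variables):
    """rules: (name, guard, target, value); guard is a tuple of (variable, value)."""
    idx = {v: i for i, v in enumerate(variables)}

    def holds(guard, s):
        return all(s[idx[v]] == val for v, val in guard)

    initial = (0,) * len(variables)  # no Reset rules: all variables start low
    remaining, seen, errors = deque([initial]), {initial}, []
    while remaining:
        s = remaining.popleft()
        enabled = [r for r in rules if holds(r[1], s)]
        # Interference: two enabled rules drive one variable in opposite directions.
        for i, r in enumerate(enabled):
            for q in enabled[i + 1:]:
                if r[2] == q[2] and r[3] != q[3]:
                    errors.append(("interference", (r[0], q[0]), s))
        effective = [r for r in enabled if s[idx[r[2]]] != r[3]]
        for r in effective:
            t = list(s)
            t[idx[r[2]]] = r[3]
            t = tuple(t)
            # Stability: every other effective rule must still be enabled in t.
            for q in effective:
                if q is not r and not holds(q[1], t):
                    errors.append(("unstable", q[0], s))
            if t not in seen:
                seen.add(t)
                remaining.append(t)
    return seen, errors


oscillator = [
    ("1", (("a", 1),), "b", 1),
    ("2", (("a", 1),), "c", 1),
    ("3", (("b", 1), ("c", 1)), "a", 0),
    ("4", (("a", 0),), "b", 0),
    ("5", (("a", 0),), "c", 0),
    ("6", (("b", 0), ("c", 0)), "a", 1),
]
faulty = [  # hypothetical rule set, see the assumptions above
    ("a+", (("b", 0),), "a", 1),
    ("b+", (("a", 1),), "b", 1),
    ("a-", (("b", 1),), "a", 0),
    ("b-", (("a", 0),), "b", 0),
    ("c+", (("a", 1), ("b", 1)), "c", 1),
]

ok_states, ok_errors = verify(oscillator, ("a", "b", "c"))
bad_states, bad_errors = verify(faulty, ("a", "b", "c"))
```

For the oscillator the error list is empty; for the hypothetical faulty set the search reports the rule for c as unstable in the state where a and b are high and c is low.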

 Implementation
 Eciency Issues
As shown above both stability and noninterference checking are NPhard prob
lems Since the verication algorithm searches all reachable states for a set of
production rules it can be very slow run time exponential in the number of
circuit variables in the worst case In practice however circuits reach only a
tiny fraction of their state space Furthermore there are several implementation
tricks that can speed the run time of this algorithm
The rst and most important implementation issue is the choice of data structures
for R and M  As the algorithm is written we will need to perform insertion
deletion retrieval and membership for union operations on R and insertion and
membership on M  The membership operation must be particularly ecient as
membership in M is tested for every S
 
 and membership in R is tested in the
union let R  R  fS
 
g for those S
 
not in M  Fortunately we can avoid many
of these membership tests by implementing the sets R and R M instead of R
and M  Since each S we remove from R is eventually added to M  we need not
remove states from R M  Furthermore the nal if statement requires only one
membership checkif the state S
 
is not in R  M it is added to both R and
R M  Thus we never need to check for membership in R and we can implement
it eciently with a FIFO queue We will frequently check for membership in RM 
however so we implement it with a hash table All set operations can therefore be
performed in O time on average
The choice of data structure for the production rule set is also important. One
good way to do this, due to Steve Burns, is to build an expression tree for each
production rule's guard, include the transition at the root of the tree, and include
only one copy of each variable in the forest of production rules. Thus each variable
has a list of uplinks, leading to expressions involving it, and a list of downlinks,
leading to all transitions on it. Sets of enabled production rules can be stored
as lists of pointers into this data structure. Expressions can be evaluated by
associating a value with each node. When a production rule is fired, the expressions
affected by the firing can be updated by traversing the uplinks of the variable to
which the production rule assigned.
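A minimal sketch of such a forest, in Python, is given below. It is an illustration of the idea rather than the structure used in prlint: the class names `Var`, `Node`, and `Rule` and the function `assign` are assumptions, and the two sample rules are taken from our reading of the oscillator example (b ∧ c → a↓ and ¬b ∧ ¬c → a↑).

```python
class Var:
    def __init__(self, name):
        self.name = name
        self.value = False
        self.uplinks = []    # leaf nodes of expressions that read this variable
        self.downlinks = []  # rules whose transition assigns this variable


class Node:
    """Guard expression node; op is "var", "not", "and", or "or"."""

    def __init__(self, op, *kids, var=None):
        self.op, self.kids, self.var = op, kids, var
        self.parent = None
        for kid in kids:
            kid.parent = self
        if var is not None:
            var.uplinks.append(self)
        self.value = self.compute()

    def compute(self):
        if self.op == "var":
            return self.var.value
        if self.op == "not":
            return not self.kids[0].value
        values = [kid.value for kid in self.kids]
        return all(values) if self.op == "and" else any(values)


class Rule:
    def __init__(self, guard, target, up):
        self.guard, self.target, self.up = guard, target, up
        target.downlinks.append(self)

    def enabled(self):
        return self.guard.value


def assign(var, value):
    """Set var, then re-evaluate only the expression nodes above it."""
    var.value = value
    for node in var.uplinks:
        while node is not None:
            new = node.compute()
            if new == node.value:
                break  # nothing changed here, so nothing above changes either
            node.value = new
            node = node.parent


a, b, c = Var("a"), Var("b"), Var("c")
lower_a = Rule(Node("and", Node("var", var=b), Node("var", var=c)), a, up=False)
raise_a = Rule(
    Node("and", Node("not", Node("var", var=b)), Node("not", Node("var", var=c))),
    a,
    up=True,
)
```

Calling `assign(b, True)` touches only the nodes reachable from b's uplinks, which is the local update the paragraph above describes; a full re-evaluation of every guard is never needed.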
It is also not necessary to implement both E and E'. If only E is implemented,
the function result can return a flag if an ineffective production rule is passed
to it. The check for instability can be performed while updating the production
rule data structure: if a transition was effectively enabled and the update routine
disables it, then instability can be reported.

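[Diagram: the production rule a ∧ b → c↑ stored as an ∧ node over the variables a
and b, with the up transition on c at its root. The variables a, b, and c also link
to the production rules for a, the production rules for b, the other production
rules for c, and the expressions involving c.]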
Figure  Proposed production rule structure

Table  Run times for prlint on several test cases. For each test case the table
lists the number of variables, the number of states, and the run time in seconds.
P and S are highly sequential subcircuits in a parallel-serial converter; REG is a
collection of register processes; HALFCACHE is a collection of cache controller
subcircuits; SERIAL is a full serial-parallel converter. All times are for a
SPARCstation IPX.
 prlint Implementation
The CAST Caltech Asynchronous Synthesis Tools design tool package developed
at the California Institute of Technology contains several programs that deal with
production rules One of these prsim is a fast and memory ecient production
rule simulator It can be used for high level simulation of large circuits described
with production rules Another suite of programs bubble cellgen and Vgladys
transform a production rule description of a circuit into a CMOS implementation
Thus production rules can be used for description and testing of existing circuit
designs as well as synthesis of new circuits In either case however the production
rules must be stable and noninterfering if they are to describe a correct circuit
We have implemented the above sequential algorithm in a program called prlint
which will be incorporated into the CAST design tool package This program
performs simple consistency checks on input production rule sets then applies the
algorithm to verify stability and interference
The inner loop of prlint is given in the figure below. We use a FIFO queue to store
the states remaining to be checked and a hash table for those checked previously. The
prs data structure contains a description of the production rule set being verified.
Since applying a state to the production rule set and recomputing all the guards
requires a substantial amount of computation, we use fire_pr and undo_fire_pr
to make local changes to the data structure. With this method prlint checks
approximately  circuit states per second on a SPARCstation IPX
The sample output figure below shows what prlint prints when it is run on the second
example from the previous section. The initial warnings are generated by a simple
syntax checker built into prlint; all following output is generated by the
verification algorithm.

void simple_check(struct Prs *prs, StateVector initial)
{
    int i;
    int enabledcount;
    int result;
    StateVector state, stateprime;
    PRPtr enabled[MAX_ENABLED];
    struct Fifo *remaining;
    struct Hashtable *checked;

    remaining = create_Fifo();
    checked = createHashtable();
    put_Fifo(remaining, initial);
    fastAddHashtable(checked, initial);
    while (NULL != (state = get_Fifo(remaining))) {
        apply_state(prs, state);
        enabledcount = build_enabled_list(prs, enabled);
        (void) check_interference(prs, enabled, enabledcount);
        for (i = 0; i < enabledcount; i++) {
            result = fire_pr(prs, enabled[i]);  /* fire PR and check stability */
            if (result != VACUOUS && result != EXCLUDED) {
                stateprime = new_state_vector(prs);
                retrieve_state(prs, stateprime);
                undo_fire_pr(prs, enabled[i]);
                if (NULL == associateHashtable(checked, stateprime)) {
                    fastAddHashtable(checked, stateprime);
                    put_Fifo(remaining, stateprime);
                } /* if */
            } /* if */
        } /* for */
    } /* while */
} /* simple_check */
Figure  Inner loop of sequential algorithm implementation


Cprlint printprs printstates sampleprs

Production Rules
a  b
a  b
b  a  c
b  a
b  a
Warning Variable c has only one type of transition
Warning Variable c is set but not used
Warning Reset variable not found
Warning assuming all variables initialize to false	
Check Vars b c a

Run circuit
Checking 
Checking 
Checking 
Error Unstable production rule
b  a  c
Checking 
Checking 
Checking 
Checking 
Checking 

Statistics
Production rules 
Variables 
States Visited 
Possible States 
Figure  Sample prlint output

prlint supports several command line options that provide more information
about the circuit as it is simulated. These are documented in the online manual
page.

 Conclusions
We have presented a method for the verification of quasi-delay-insensitive circuit
designs. This verification requires that circuits be expressed in terms of production
rules, which we have presented as a means to describe delay-insensitive circuits.
Given this production rule description, we carry out a search of the circuit's
reachable states and check for stability and interference errors, which correspond to
possible shorts and hazards. We have shown how this search can be performed
both sequentially and concurrently, and have implemented both algorithms. We
have given examples of the search method and have described our implementation
of an automated verification tool.
We hope that this verification method and its implementation will be used not
only to check circuits generated by Martin's synthesis method, but also as a way
to verify the delay-insensitivity of other circuit designs.


A Channel Declarations
In Martins synthesis method parts of a circuit exchange information via commu
nication channels These channels are implemented as sets of wires that exchange
data in a fourphase protocol For example consider two circuits that need to
synchronize their operation This can be done by performing a communication
between the circuits that exchanges no data Such communications can be imple
mented as follows
[Diagram: Circuit A and Circuit B connected by the wires r and a.]
Communication on this channel takes place as follows. When circuit A becomes
ready to synchronize, it raises wire r. When circuit B becomes ready to synchronize,
it waits for r to become high, then raises a in acknowledgment. A then lowers r
and B lowers a. The part of the communication performed by circuit A is called
the active part, and the part performed by circuit B is called the passive part.
Assume then that we only have production rules describing circuit A In order to
close the production rule set we must specify the behavior of this channel This
can be done with the following production rules
r
r
 a
 a
In general however all behaviors of channels leading to a circuits environment
cannot be described solely with production rules For example consider a circuit
with a onebit input channel connected to the environment Such a circuit would
be implemented as follows
[Diagram: the Circuit and the Environment connected by the wires d_o, dt, and df.]
Communication on this channel could be either passive or active. In a passive
communication, the circuit would wait for either dt or df to be set by the
environment. Once it received this information, it would raise d_o in acknowledgment.
Then the environment would lower both dt and df, after which the circuit would
lower d_o. In an active communication, however, the circuit would initiate the
communication by raising d_o as a request. The environment would respond by raising
either dt or df, which would be acknowledged by the circuit lowering d_o. The
environment would complete the communication by lowering dt and df.
The problem with describing this circuit's behavior in production rules is that
there is no mechanism for raising only one of two data wires. Assume we are
describing a passive communication. If we use the following production rules for
the environment, then both data wires will become high:
    ¬d_o → dt↑, df↑
    d_o → dt↓, df↓
It is also not possible to disable one of the first two production rules after the
other has fired. Attempting this solution leads to the following production rules:
    1: ¬d_o ∧ ¬df → dt↑
    2: ¬d_o ∧ ¬dt → df↑
    3: d_o → dt↓, df↓
The problem with this production rule set is that it contains an instability. When
all variables are false, production rules 1 and 2 are enabled. However, the firing
of 1 disables 2 and vice versa, leading to an instability.
We have devised a notation called port declarations that compactly describes both
synchronization and data channels. This notation is convenient for closing
production rule sets and is accepted as input in prlint. The syntax for these
declarations is based on that of channel declarations in the CAST program prif [11];
these declarations can be copied into files that will be used with prlint.
The syntax for a port declaration is as follows:
    <port type> port  <input list>  <output list>
A port type is one of active or passive. An input list is a comma-separated list
of variables representing the wires in the channel that are inputs to the circuit we
are describing. An output list is a comma-separated list of variables representing
the output wires in this channel. During verification, at most one of the variables
in the input list will be true at once. Further, it is an error for the circuit to set
more than one output variable true simultaneously. Thus a port declaration for
the above passive communication would be

passive port  dt df do 	
Port declarations can be thought of as macros that expand to lists of production
rules (possibly involving an error flag) and lists of variables that need to be kept
mutually exclusive. The above example would expand to:
    ¬d_o → dt↑, df↑
    d_o → dt↓, df↓
    excl(dt, df)
Thus the verifier will check states reached by firing dt↑ and also by firing df↑, but
will be prevented from checking states reached by firing both dt↑ and df↑ by the
excl statement. In prlint, these mutual exclusion statements are handled by the
fire_pr function called from the main checking loop. As can be seen in the figure
showing the inner loop of prlint, this function returns EXCLUDED if a production
rule is prevented from firing by such a statement.
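The effect of an excl statement on firing can be sketched in a few lines of Python. This is an assumed, simplified model and not the fire_pr routine itself: the function name `fire`, the dictionary representation of states, and the use of a string for the EXCLUDED result are all choices made for the illustration.

```python
EXCLUDED = "EXCLUDED"


def fire(state, target, value, excl_groups):
    """Return the successor of state, or EXCLUDED if an excl group forbids the firing."""
    if value:  # only raising a variable can violate mutual exclusion
        for group in excl_groups:
            if target in group and any(state[v] for v in group if v != target):
                return EXCLUDED
    successor = dict(state)
    successor[target] = value
    return successor


# The passive input port from the text: excl(dt, df).
excl_groups = [{"dt", "df"}]
idle = {"do": False, "dt": False, "df": False}
after_dt = fire(idle, "dt", True, excl_groups)
after_both = fire(after_dt, "df", True, excl_groups)
```

Starting from the idle state, the search can follow dt↑ or df↑ separately, while the attempt to raise df after dt yields EXCLUDED, so the state with both data wires high is never explored.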
If instead we were describing an active communication we would use the following
port declaration
active port  dt df do 	
which would correspond to the following production rules:
    d_o → dt↑, df↑
    ¬d_o → dt↓, df↓
    excl(dt, df)
Note that this example has essentially the same functionality as the one above: the
circuit receives a single bit input from the environment. The change from passive
to active requires the circuit to raise d_o before the environment responds; this is
reflected in the reversal of the direction of the dt and df transitions.
[Diagram: the Circuit and the Environment connected by the wires co, ct, and cf.]
For a nal example let us consider an active communication where a circuit sends
a data bit to the environment Again the port declaration for this channel is
simple

active port  ci ct cf 	
This port declaration corresponds to the following production rules:
    ct ∨ cf → c_i↑
    ¬ct ∧ ¬cf → c_i↓
    ct ∧ cf → error↑
In this case the environment waits for one of ct, cf to become true before
acknowledging with c_i. If possible, we would like to ensure that the input circuit
does not raise both ct and cf; this would be a violation of the dual-rail
communication protocol. This is accomplished with the ct ∧ cf → error↑
pseudo-production rule. This production rule is treated by prlint like any other, but
if the variable error becomes true, then a special error is generated.
As shown above, port declarations give a compact description of communication
between a circuit and its environment. Although most useful for specifying data
channels, they may also be used to describe the synchronization channels previously
mentioned. Furthermore, port declarations generalize easily to an arbitrary
number of inputs and outputs. We have found the notation convenient for closing
sets of production rules for use with prlint and believe that it presents a useful
extension to the production rule notation.

References
[1] Shahid H. Bokhari. On the mapping problem. IEEE Trans. Computers, March.
[2] Steven M. Burns. Performance Analysis and Optimization of Asynchronous
Circuits. PhD thesis, California Institute of Technology.
[3] David L. Dill and Edmund M. Clarke. Automatic verification of asynchronous
circuits using temporal logic. In Henry Fuchs, editor, Chapel Hill Conference
on VLSI. Computer Science Press.
[4] C. A. R. Hoare. Communicating sequential processes. Communications of the
ACM, August.
[5] Donald E. Knuth. The Art of Computer Programming: Searching and Sorting.
Addison-Wesley.
[6] Ten-Hwang Lai and Alan P. Sprague. Placement of the processors of a
hypercube. IEEE Trans. Computers, June.
[7] F. Thomson Leighton. Introduction to Parallel Algorithms and Architectures:
Arrays, Trees, Hypercubes. Morgan Kaufmann Publishers.
[8] Alain J. Martin. Compiling communicating processes into delay-insensitive
VLSI circuits. Distributed Computing.
[9] Alain J. Martin. The limitations to delay-insensitivity in asynchronous
circuits. In William J. Dally, editor, Sixth MIT Conference on Advanced Research
in VLSI. MIT Press.
[10] Alain J. Martin. Programming in VLSI: From communicating processes to
delay-insensitive circuits. In C. A. R. Hoare, editor, Developments in Concurrency
and Communication. Addison-Wesley. UT Year of Programming Institute on
Concurrent Programming.
[11] Alain J. Martin et al. CAD tools for VLSI design. Report CS-TR,
California Institute of Technology.
[12] Peter K. Pearson. Fast hashing of variable-length text strings.
Communications of the ACM, June.
[13] Charles L. Seitz. The Cosmic Cube. Communications of the ACM, January.
[14] Jan L. A. van de Snepscheut. Trace Theory and VLSI Design. Lecture Notes in
Computer Science. Springer-Verlag.


