Synthesis of asynchronous control circuits with automatically generated relative timing assumptions by Stevens, Kenneth & Cortadella, Jordi
Synthesis of asynchronous control circuits with automatically 
generated relative timing assumptions
Jordi Cortadella¹*, Michael Kishinevsky², Steven M. Burns² and Ken Stevens²
¹Univ. Politècnica de Catalunya, Barcelona, Spain and ²Strategic CAD Lab, Intel Corporation, USA
Abstract
This paper describes a method for the synthesis of asynchronous circuits with relative timing.
Asynchronous communication between gates and modules typically utilizes handshakes to ensure
functionality. Relative timing assumptions in the form “event a occurs before event b” can
be used to remove redundant handshakes and associated logic. This paper presents a method for
automatic generation of relative timing assumptions from the untimed specification. These
assumptions can be used for area and delay optimization of the circuit. A set of relative timing
constraints sufficient for the correct operation of the circuit is back-annotated to the designer.
Experimental results for control circuits of a prototype iA32 instruction length decoding and
steering unit called RAPPID (“Revolving Asynchronous Pentium® Processor Instruction Decoder”)
show significant improvements in area and delay over speed-independent circuits.
1 Introduction
Asynchronous communication utilizes handshaking to ensure functionality, which incurs some area
and delay penalty with respect to synchronous design. Timing information can be used to combat
the full handshake overhead in area and delay by removing redundant handshakes and associated
logic. Since absolute timing information is mostly unknown until layout is complete, relative
timing information in the form “event a occurs before event b” is a natural representation of
timing that can be used in the design flow.
Relative timing (RT) was used for the design of a prototype iA32 instruction length decoding and
steering unit called RAPPID (“Revolving Asynchronous Pentium® Processor Instruction Decoder”)
that was fabricated and tested successfully [15, 16]. Silicon results show significant
advantages, in particular a performance of 2.5-4.5 instructions per ns, with manageable risks
using this design technology. Compared with a comparable 400 MHz clocked circuit, RAPPID
achieves three times faster performance and half the latency, dissipating only half the power
and requiring a minor area penalty. Another experiment with a circuit based on timing
assumptions is described in [2].
The design flow for synthesizing relative timing circuits is as follows. Relative timing
assumptions are provided by the user or extracted by the algorithm presented in this paper.
The circuits are then designed using the assumptions for area and delay optimization. RT
circuits can be optimized with respect to the untimed circuits for two reasons:
• RT assumptions reduce the set of reachable states 
and hence increase the number of don’t care states 
for logic optimization of all signals.
• It is possible to extend the set of states in which a sig­
nal is enabled without changing the set of reachable 
states if other enabled signals are known to be or can 
be made faster than the early enabled (a.k.a. lazy) 
signal. This additional flexibility adds local don’t 
cares that can differ from one signal to another.
*This work was supported by a grant from Intel Corporation and was done during a visit to SCL in summer 1998.
A (possibly relaxed) subset of the timing assumptions used for optimization is back-annotated
by the tool and becomes the set of timing constraints. Different valid netlists require
different timing constraints. The circuits are then designed to meet the relative orderings, or
it is verified that the restrictions are already satisfied by the delays in the system. Methods
based on separation analysis [6], geometric timing [10], and relative timing can be deployed
for verification [12].
In [3] it is shown that relative timing synthesis can be 
automated using lazy transition systems in which enabling 
and firing regions for signal transitions are distinguished. 
This paper enhances the method of [3] in three major ways.
•  A method for automatic generation of timing assumptions starting from a speed-independent
(untimed) specification is presented. Most of the timing assumptions used in RAPPID circuits
can be automatically extracted. Only architectural or environmental assumptions on the inputs
need to be specified by the user.
•  A method for automatic backannotation of RT constraints sufficient for the correct operation
of a circuit is developed.
•  A method for timing-aware state encoding is deployed. It reduces the number of state signals
and generates timing assumptions for state signals if necessary. It has a significant positive
effect on both area and performance.
Section 2 presents basic theory and models. Section 3 describes a method for automatic
generation of RT assumptions. Section 4 presents a technique for extracting timing constraints
for a derived RT netlist and briefly describes timing-aware state encoding. Section 5 presents
experimental results.
2 Basic notions
For brevity, we assume the reader to be familiar with Petri nets, a formalism used to specify
concurrent systems. We refer to [9] for a general tutorial on Petri nets. Lazy transition
systems and lazy state graphs were introduced in more detail in [3].
2.1 Transition Systems and State Graphs
A transition system (TS) is a quadruple [13] $TS = (S, E, T, s_{in})$, where $S$ is a non-empty
set of states, $E$ is a set of events, $T \subseteq S \times E \times S$ is a transition
relation, and $s_{in}$ is an initial state. The elements of $T$ are called the transitions of
TS and will often be denoted by $s \xrightarrow{e} s'$ instead of $(s, e, s')$.
State Graphs (SGs) are binary interpreted transition systems: every state is assigned a binary
vector of signal values in the specified circuit; every event is interpreted as a rising
($a+$) or falling ($a-$) transition of a signal $a$. The notation $a*$ is used when the
direction of the signal transition is not specified. The set of signals of an SG is called
$X = I \cup O$, where $I$ and $O$ denote the sets of input and output signals respectively.
A labeling function $v : S \to \{0,1\}^n$ assigns a vector of signal values to each state
($n = |X|$). We will call $v_a(s)$ the value of signal $a$ in state $s$. An SG is consistent if
rising and falling transitions alternate for every signal on any path in the SG. Examples of a
TS and an SG are given in Figure 1.(b) and Figure 2.(b), respectively.
Fig. 1: (a) Petri net, (b) Transition System.
Fig. 2: (a) STG for the xyz example, (b,e) SGs with timing domains, (c,d,f) Circuits.
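As a minimal sketch of the consistency condition, the check below tests that every arc labeled $a+$ ($a-$) flips exactly bit $a$ from 0 to 1 (1 to 0), which implies that rising and falling transitions of each signal alternate on every path. The arc list `ARCS` is an assumed reconstruction of the SG of Figure 2.(b), chosen to agree with the regions and next-state values quoted in the text; the function names are illustrative only.

```python
SIGNALS = "xyz"

# Assumed reconstruction of the SG of Figure 2.(b): (source code, event, target code).
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]


def arc_is_consistent(src, event, dst):
    """a+ must flip bit a from 0 to 1, a- from 1 to 0; all other bits stay put."""
    sig, direction = event[0], event[1]
    idx = SIGNALS.index(sig)
    before, after = ("0", "1") if direction == "+" else ("1", "0")
    expected_dst = src[:idx] + after + src[idx + 1:]
    return src[idx] == before and dst == expected_dst


def is_consistent(arcs):
    """True if every arc respects the binary labeling of its end states."""
    return all(arc_is_consistent(*arc) for arc in arcs)


print(is_consistent(ARCS))
```

An arc such as `("000", "x+", "000")` would be rejected, since the label claims a rising transition that the state codes do not show.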
2.2 Signal Transition Graph
A Signal Transition Graph (STG) is a Petri net (PN) in which transitions are labeled with
rising and falling signal transitions as in an SG. An example of a PN is shown in
Figure 1.(a). This PN corresponds to the TS in Figure 1.(b). An STG has an associated SG in
which each reachable marking corresponds to a state and each transition between a pair of
markings corresponds to an arc labeled with the same event as the transition. Figure 2.(a)
depicts an STG with three signals, $x$, $y$ and $z$, corresponding to the SG in Figure 2.(b).
For simplicity, places with only one input and one output transition are often omitted in STGs.
2.3 Excitation and quiescent regions
The excitation region of an event $a*$, denoted by $ER(a*)$, is the set of states such that
$s \in ER(a*) \Leftrightarrow s \xrightarrow{a*}$. The quiescent region of $a+$, denoted by
$QR(a+)$, is the set of states such that
$s \in QR(a+) \Leftrightarrow v_a(s) = 1 \wedge s \notin ER(a-)$. Similarly,
$s \in QR(a-) \Leftrightarrow v_a(s) = 0 \wedge s \notin ER(a+)$. In Figure 2.(b),
$ER(x-) = \{101, 111\}$ and $QR(x-) = \{001, 011, 010\}$.
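The two region definitions can be evaluated mechanically. The sketch below is illustrative only: `ARCS` is an assumed reconstruction of the SG of Figure 2.(b) (state codes are the values of x, y, z), and `er` and `qr` are hypothetical helper names.

```python
SIGNALS = "xyz"

# Assumed reconstruction of the SG of Figure 2.(b): (source code, event, target code).
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
STATES = {s for s, _, _ in ARCS} | {d for _, _, d in ARCS}


def er(event):
    """Excitation region: states with an outgoing arc labeled `event`."""
    return {src for src, ev, _ in ARCS if ev == event}


def qr(event):
    """Quiescent region of a+ (a-): a equals 1 (0) and a- (a+) is not enabled."""
    sig, direction = event[0], event[1]
    idx = SIGNALS.index(sig)
    stable_value = "1" if direction == "+" else "0"
    opposite = sig + ("-" if direction == "+" else "+")
    return {s for s in STATES if s[idx] == stable_value and s not in er(opposite)}


print(sorted(er("x-")))  # ['101', '111']
print(sorted(qr("x-")))  # ['001', '010', '011']
```

Under the assumed arcs, the two printed sets coincide with the values of $ER(x-)$ and $QR(x-)$ quoted above.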
2.4 Lazy transition systems
The main distinctive feature of a lazy system is that it can assume a non-zero delay between
the enabling of a transition and its firing. Due to this, the set of states in which a
transition is enabled might be larger than the set of states in which the transition fires.
Definition 2.1 (Enabling and firing regions) The enabling region, $EnR(a*)$, of a signal
transition $a*$ is the set of states in which transition $a*$ is enabled. The firing region,
$FR(a*)$, of a signal transition $a*$ is the set of states from which $a*$ can fire, i.e.
$s \in FR(a*) \Leftrightarrow \exists s' : s \xrightarrow{a*} s'$.
A potentially enabling region, PEnR, gives an upper bound for the set of states which can be
selected as an actual enabling region in the RT-implementation. The freedom in choosing the
enabling region within the PEnR gives additional possibilities for logic optimization. It is
easy to see the following correspondence between the introduced regions:
$FR(a*) \subseteq EnR(a*) \subseteq PEnR(a*)$. We will defer discussion of examples until
Sections 4.1-4.2.
Definition 2.2 (Lazy state graph) A transition $a*$ is called lazy if $EnR(a*) \neq FR(a*)$.
A state graph is called lazy (lazy SG) iff at least one transition is lazy¹.
The correctness properties of SGs can be easily trans­
ferred onto lazy SGs. A lazy SG  is consistent, determinis­
tic and commutative if the underlying SG  has these proper­
ties. Persistency property must be generalized for enabling 
and firings as discussed in Section 2.8.
2.5 Timing assumptions
Timing assumptions can be conservatively expressed by stating that one event happens before or
after another.
Difference assumptions. A difference assumption $b* < a*$ (read $b*$ before $a*$), involving
two potentially concurrent events $a*$ and $b*$, assumes that, due to certain timing
characteristics, whenever $b*$ and $a*$ are both enabled, $b*$ always fires earlier than $a*$.
In an SG this assumption can be represented by the concurrency reduction of $a*$ with respect
to $b*$. RT difference assumptions allow one to eliminate states unreachable in the timing
domain, similar to state elimination based on absolute timing information in [10, 11]. They are
not sufficient, however, for expressing lazy behavior of signals.
Early enabling assumption. Suppose that transition $a*$ triggers the firing of transition $b*$,
i.e. $a*$ and $b*$ are ordered in the specification. Assume that $a*$ can be made “faster”
than $b*$ in the circuit. Then the enabling of $b*$ can be started earlier, e.g., from the
events triggering $a*$, and the proper ordering of $a*$ before $b*$ will still be ensured by
the timing properties of the implementation. In a lazy SG this results in the backward
expansion of $PEnR(b*)$ into $FR(a*)$.
Simultaneity assumption. The simultaneity assumption is a relative notion, which is defined on
a set of concurrent transitions $T$ with respect to a reference transition $a*$. It states that,
from the point of view of $a*$, the skew of the firing times of the transitions from $T$ is
negligible. This assumption can be viewed as a local fundamental mode of $T$ with respect to
$a*$ and hence as a generalization of burst-mode machines [14, 17]. An example of the
application of the simultaneity assumption is discussed in Section 4.2.
Assumptions relating only input events cannot be au­
tomatically generated from the circuit behavior and can be 
provided by the designer or generated from the implemen­
tation of the environment.
2.6 Next-state functions
The implementation of an SG as a logic circuit is done through the definition of the
next-state function for each output signal and binary vector. For SGs it is defined as follows:
$$f_a(Z) = \begin{cases} 1 & \text{if } \exists s \in ER(a+) \cup QR(a+) \text{ s.t. } v(s) = Z \\ 0 & \text{if } \exists s \in ER(a-) \cup QR(a-) \text{ s.t. } v(s) = Z \\ - & \text{otherwise} \end{cases}$$
¹ Since we target the optimization of the output signals of a circuit, lazy behavior of input
signals is not considered.
The next-state function $f_a$ is correctly defined when the SG has the CSC property, i.e. there
is no pair of reachable states $(s, s')$ such that $v(s) = v(s')$ and
$s \in ER(a+) \cup QR(a+)$ and $s' \in ER(a-) \cup QR(a-)$. Note that $f_a$ is an incompletely
specified function with a don’t care (DC) set corresponding to those binary vectors without any
associated state in the SG. The logic netlist is speed-independent if the SG is deterministic,
commutative and output-persistent [4].
In the SG of Figure 2.(b), the DC set is empty since all binary vectors have a corresponding
state in the SG. As an example, $f_x(101) = 0$ and $f_y(101) = f_z(101) = 1$, since signals
$x$ and $y$ are enabled, and $z$ is stable in that state. The Karnaugh maps for the next-state
functions are depicted in Figure 3.(a).
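The definition of $f_a$ translates directly into a lookup over the regions. The sketch below is a hedged illustration: `ARCS` is an assumed reconstruction of the SG of Figure 2.(b), state codes double as the binary vectors $Z$ (valid here because each code labels exactly one state), and all function names are hypothetical.

```python
SIGNALS = "xyz"

# Assumed reconstruction of the SG of Figure 2.(b): (source code, event, target code).
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
STATES = {s for s, _, _ in ARCS} | {d for _, _, d in ARCS}


def er(event):
    """Excitation region of `event`."""
    return {src for src, ev, _ in ARCS if ev == event}


def qr(event):
    """Quiescent region of `event` (signal stable at the value the event produces)."""
    sig, direction = event[0], event[1]
    idx = SIGNALS.index(sig)
    value = "1" if direction == "+" else "0"
    opposite = sig + ("-" if direction == "+" else "+")
    return {s for s in STATES if s[idx] == value and s not in er(opposite)}


def next_state(sig, code):
    """f_sig(code): '1' on ER(a+) u QR(a+), '0' on ER(a-) u QR(a-), '-' otherwise."""
    if code in er(sig + "+") | qr(sig + "+"):
        return "1"
    if code in er(sig + "-") | qr(sig + "-"):
        return "0"
    return "-"


print([next_state(sig, "101") for sig in SIGNALS])  # ['0', '1', '1']
```

With the assumed arcs, no vector maps to `'-'`, which matches the observation that the DC set of the untimed specification is empty.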
For a lazy SG the next-state functions are defined differently:
$$f_a(Z) = \begin{cases} 1 & \text{if } \exists s \in FR(a+) \cup QR(a+) \text{ s.t. } v(s) = Z \\ 0 & \text{if } \exists s \in FR(a-) \cup QR(a-) \text{ s.t. } v(s) = Z \\ - & \text{otherwise} \end{cases}$$
Note that this definition generally gives more don’t care vectors than the definition for an SG,
for two reasons:
•  More states are unreachable, since timing assump­
tion can reduce concurrency
• States in (PEnR — FR) do not belong to either FR, 
or QR, and hence are included into the DC-set.
As an example, in the lazy SG of Figure 2.(e), $f_x(101) = -$ and $f_y(101) = f_z(101) = 1$,
as explained in Section 4.2.
The conditions for speed-independent implementability 
can be trivially extended to lazy SGs.
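A small sketch of the lazy definition for signal $x$ follows. The region sets are assumptions chosen to mirror the description of the simultaneity example (state 001 unreachable, states 101 and 110 in $PEnR(x-) - FR(x-)$); they are not taken from the figure itself, and `lazy_next_state` is a hypothetical name.

```python
# Assumed regions for signal x in the lazy SG of the simultaneity example.
FR_X_PLUS = {"000"}           # x+ fires here
QR_X_PLUS = {"100"}           # x stable at 1 and not potentially enabled to fall
FR_X_MINUS = {"111"}          # x- fires here
QR_X_MINUS = {"011", "010"}   # x stable at 0
PENR_X_MINUS = {"101", "110", "111"}  # potential enabling region of x-


def lazy_next_state(code, fr_plus, qr_plus, fr_minus, qr_minus):
    """Next-state value for a lazy SG: only FR and QR states constrain the function."""
    if code in fr_plus | qr_plus:
        return "1"
    if code in fr_minus | qr_minus:
        return "0"
    return "-"  # unreachable state, or state in PEnR - FR


for code in ["100", "101", "110", "111", "001"]:
    print(code, lazy_next_state(code, FR_X_PLUS, QR_X_PLUS, FR_X_MINUS, QR_X_MINUS))
```

States in `PENR_X_MINUS - FR_X_MINUS` fall through to `'-'`, which is how local don’t cares arise in this formulation.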
2.7 Logic synthesis
From the next-state functions of an SG, a speed-independent circuit can be derived by
implementing the boolean equation of each output signal as an atomic complex gate [8] or as a
generalized C-element [1, 7]. For example, a speed-independent complex gate implementation for
the STG in Figure 2.(a) is the netlist:
$x = \overline{z + \bar{x}\,y}$; $y = x + z$; $z = x + z\,\bar{y}$.
Similarly, from the next-state function specification corresponding to a lazy SG, an RT-circuit
can be derived in the form of complex gates or generalized C-elements, as illustrated by the
examples in Sections 4.1-4.2.
2.8 Monotonic covers
Not every logic function derived from the definition of the next-state function satisfies the
hazard-freedom conditions and is hence valid. The following definition is related to hazards in
the behavior of asynchronous circuits.
Given two sets of states $S_1$ and $S_2$ of an SG such that $S_2 \subseteq S_1$, we will say
that $S_1$ is a monotonic cover of $S_2$ if for each transition $s \to s'$:
$(s \in S_1 - S_2 \Rightarrow s' \in S_1) \wedge (s \in S_2 \Rightarrow s' \notin S_1 - S_2)$
Only monotonic covers of FRs can be selected as EnRs for hazard-free solutions for a logic
netlist [3]. If $S_1 = EnR(a*)$ and $S_2 = FR(a*)$, then (1) no disabling of $a*$ is possible
and (2) there are no transitions from $FR(a*)$ to $EnR(a*) - FR(a*)$, i.e., no disabling of
firings for $a*$ is possible either. Hence, persistency of $a*$ in the RT implementation is
guaranteed. For example, in the SG of Figure 2.(b), the set $\{101, 110, 111\}$ is a monotonic
cover of $ER(x-)$. However, the set $\{100, 101, 111\}$ is not, since the transition
$100 \to 110$ violates the conditions for monotonicity.
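The monotonic cover condition is a simple check over the arcs of the SG. The sketch below is illustrative: `ARCS` is an assumed reconstruction of the SG of Figure 2.(b) and `is_monotonic_cover` is a hypothetical helper name.

```python
# Assumed reconstruction of the SG of Figure 2.(b): (source code, event, target code).
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]


def is_monotonic_cover(s1, s2, arcs):
    """True if s1 is a monotonic cover of s2 (with s2 a subset of s1)."""
    if not s2 <= s1:
        raise ValueError("S2 must be a subset of S1")
    extra = s1 - s2
    for src, _, dst in arcs:
        if src in extra and dst not in s1:
            return False  # the cover is left before reaching S2's behaviour
        if src in s2 and dst in extra:
            return False  # a transition from S2 back into S1 - S2
    return True


ER_X_MINUS = {"101", "111"}
print(is_monotonic_cover({"101", "110", "111"}, ER_X_MINUS, ARCS))  # True
print(is_monotonic_cover({"100", "101", "111"}, ER_X_MINUS, ARCS))  # False
```

The second candidate fails on the arc from 100 to 110, the same transition the text singles out.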
3 Automatic generation of relative timing assumptions
3.1 Ordering relations
Let $TS = (S, E, T, s_{in})$ be a transition system. Assume that every event in $E$ corresponds
to a single connected excitation region.
Definition 3.1 (Conflict) An event $e_1 \in E$ disables another event $e_2 \in E$ if
$\exists s_1 \xrightarrow{e_1} s_2$ such that $s_1 \in ER(e_2)$ and $s_2 \notin ER(e_2)$. Two
events $e_1, e_2 \in E$ are in direct conflict if $e_1$ disables $e_2$ or $e_2$ disables $e_1$.
Definition 3.2 (Concurrency) Two events $e_1, e_2 \in E$ are concurrent (denoted by
$e_1 \parallel e_2$) if they form a state diamond, i.e.
1. $ER(e_1) \cap ER(e_2) \neq \emptyset$,
2. $\forall s \in ER(e_1) \cap ER(e_2) : (s \xrightarrow{e_1} s_1) \in T \wedge (s \xrightarrow{e_2} s_2) \in T \Rightarrow \exists s_3 \in S : (s_1 \xrightarrow{e_2} s_3) \in T \wedge (s_2 \xrightarrow{e_1} s_3) \in T$.
Definition 3.3 (Trigger) An event $e_1 \in E$ triggers another event $e_2 \in E$ (denoted by
$e_1 \to e_2$) if $\exists s_1 \xrightarrow{e_1} s_2$ such that $s_1 \notin ER(e_2)$ and
$s_2 \in ER(e_2)$.
Definition 3.4 (Enabled before) Let $e_1, e_2 \in E$ be two concurrent events. $e_1$ can be
enabled before $e_2$ (denoted by $e_1 \triangleleft e_2$) if $\exists s_1 \to s_2$ such that
$s_1 \in ER(e_1) - ER(e_2)$ and $s_2 \in ER(e_1) \cap ER(e_2)$.
Definition 3.5 (Enabled simultaneously) Let $e_1, e_2 \in E$ be two concurrent events. $e_1$
and $e_2$ can be enabled simultaneously (denoted by $e_1 \Diamond e_2$) if
$\exists s_1 \to s_2$ such that $s_1 \notin ER(e_1) \cup ER(e_2)$ and
$s_2 \in ER(e_1) \cap ER(e_2)$.
Definition 3.4 can be extended to sets of events as follows.
Definition 3.6 (Enabled before a set of events) Let $e \in E$ be an event pairwise concurrent
with all the events in the set $X = \{e_1, \ldots, e_n\} \subseteq E$. $e$ can be enabled before
$X$ (denoted by $e \triangleleft X$) if $\exists s_1 \xrightarrow{e'} s_2$ such that
$s_1 \in ER(e) - ER(X)$, $s_2 \in ER(e) \cap ER(X)$ and $e' \notin X$, where
$ER(X) = ER(e_1) \cup \ldots \cup ER(e_n)$.
Figure 1.(b) depicts the transition system derived from the Petri net of Figure 1.(a). The
following facts can be derived using the definitions above: $\neg(a \parallel b)$,
$c \parallel f$, $c \triangleleft f$, $c \Diamond e$, etc. Event $d$ cannot be enabled before
$\{e, f\}$, but can be enabled before $\{e, f, g\}$ since there is a transition
$s_9 \xrightarrow{h} s_{19}$ such that $s_9 \in ER(d) - ER(\{e, f, g\})$,
$s_{19} \in ER(d) \cap ER(\{e, f, g\})$ and $h \notin \{e, f, g\}$.
3.2 Delay model
The delay model for events presented in this section gives an informal, intuitive motivation
for the automatic generation of timing assumptions. This model refers to the delay of the
events in the TS. The delay of an event is defined as the difference between its enabling time
and its firing time. Three types of events are considered:
Non-input events: the delay is in the interval $[1 - \varepsilon, 1 + \varepsilon]$
Fast input events: the delay is in the interval $(1 + \varepsilon, \infty)$
Slow input events: the delay is in the interval $[\Delta, \infty)$
The synthesis approach also assumes that (1) the delay of a gate implementing a non-input
event can be lengthened by delay padding or transistor sizing, (2) the delay of two gates can
always be made longer than the delay of one gate (hence, one can assume that
$\varepsilon < 1/3$), and (3) the circuit will never take longer than $\Delta$ time units (the
minimum delay of a slow input event) to become stable from any state of the system under a
quiescent environment.
The previous assumptions on the timing behavior of the 
circuit can be translated into assumptions on the firing order 
of the events.
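The bound $\varepsilon < 1/3$ quoted in the delay model can be recovered in one step, assuming the non-input delay interval $[1 - \varepsilon, 1 + \varepsilon]$ applies independently to each gate: the fastest possible chain of two gates must still be slower than the slowest single gate.

```latex
\begin{align*}
d_{\min}(\text{two gates}) &= 2(1 - \varepsilon), \qquad
d_{\max}(\text{one gate}) = 1 + \varepsilon, \\
2(1 - \varepsilon) > 1 + \varepsilon
  \;&\Longleftrightarrow\; 1 > 3\varepsilon
  \;\Longleftrightarrow\; \varepsilon < \tfrac{1}{3}.
\end{align*}
```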
3.3 Rules for deriving timing assumptions
We present rules for deriving timing assumptions in the fol­
lowing format: (1) ordering relations that must be satisfied 
in a (Lazy) SG  for a rule to be applied, (2) automatic timing 
assumption that can be generated, and (3) informal justifi­
cation of a rule based on the above delay model.
3.3.1 Assumptions between non-input events
The following rules can be applied for deriving timing assumptions between non-input events,
$e_1, e_2, e_3 \in E$:
I. Event enabled before another event.
Ordering relations: $(e_1 \parallel e_2) \wedge (e_1 \triangleleft e_2) \wedge (e_2 \ntriangleleft e_1) \wedge \neg(e_1 \Diamond e_2)$.
Difference timing assumption: $e_1$ fires before $e_2$.
Delay assumptions: one gate shorter than two gates.
II. Events simultaneously enabled.
Ordering relations: $(e_1 \parallel e_2) \wedge (e_1 \Diamond e_2) \wedge (e_2 \ntriangleleft e_1)$.
Difference timing assumption: $e_1$ fires before $e_2$.
Delay assumptions: delay of $e_2$ longer than delay of $e_1$.
III. Event triggered by events simultaneously enabled.
Ordering relations: $(e_1 \parallel e_2) \wedge (e_1 \ntriangleleft e_2) \wedge (e_2 \ntriangleleft e_1) \wedge [(e_1 \to e_3) \vee (e_2 \to e_3)]$.
Simultaneity timing assumption: $e_1$ and $e_2$ simultaneous wrt $e_3$.
Delay assumptions: one gate shorter than two gates.
IV. Early (speculative) enabling for ordered events.
Ordering relations: $(e_1 \to e_2)$.
Early enabling timing assumption: $e_1$ fires before $e_2$ (but $e_2$ can be enabled
concurrently with $e_1$).
Delay assumptions: delay of $e_1$ shorter than delay of $e_2$.
Let us illustrate the previous cases with the example of Figure 1, assuming that all events
are non-input. Timing assumptions of type I can be derived for the pairs of events $(c, f)$,
$(c, g)$ and $(e, d)$, where the first element of the pair is assumed to fire before the
second. Timing assumptions of type II can be applied to the pairs $(b, h)$ and $(c, e)$. Timing
assumptions of type III can be applied, e.g., to the events triggered by the pair $(b, h)$,
which triggers the events $c$, $e$ and $g$. Timing assumptions of type IV can be applied, e.g.,
to the event $d$ triggered by the event $c$. If this assumption applies, then the potential
enabling region for $d$ includes the states $\{s_2, s_5, s_8, s_{12}, s_{15}, s_{18}, s_{21}\}$
as don’t care states for the values of the next state function for signal $d$, in addition to
the originally present states $\{s_3, s_6, s_9, s_{13}, s_{16}, s_{19}, s_{22}\}$.
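The ordering relations that guard rules I, II and IV can be computed directly from a transition system. The sketch below is a hedged illustration on the xyz example rather than on Figure 1, whose arcs are not listed in the text: `ARCS` is an assumed reconstruction of the SG of Figure 2.(b), the system is assumed deterministic, and all function names are hypothetical.

```python
# Assumed reconstruction of the SG of Figure 2.(b): (source code, event, target code).
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
EVENTS = sorted({ev for _, ev, _ in ARCS})


def er(event):
    """Excitation region of `event`."""
    return {src for src, ev, _ in ARCS if ev == event}


def succ(state, event):
    """Target of the arc labeled `event` leaving `state`, or None."""
    for src, ev, dst in ARCS:
        if src == state and ev == event:
            return dst
    return None


def concurrent(e1, e2):
    """Definition 3.2: every common excitation state closes a state diamond."""
    common = er(e1) & er(e2)
    if not common:
        return False
    for s in common:
        s3 = succ(succ(s, e1), e2)
        if s3 is None or s3 != succ(succ(s, e2), e1):
            return False
    return True


def triggers(e1, e2):
    """Definition 3.3: some e1-arc enters ER(e2) from outside."""
    return any(ev == e1 and s not in er(e2) and d in er(e2) for s, ev, d in ARCS)


def enabled_before(e1, e2):
    """Definition 3.4: an arc from ER(e1) - ER(e2) into ER(e1) & ER(e2)."""
    return any(s in er(e1) - er(e2) and d in er(e1) & er(e2) for s, _, d in ARCS)


def enabled_simultaneously(e1, e2):
    """Definition 3.5: an arc from outside both regions into their intersection."""
    return any(s not in er(e1) | er(e2) and d in er(e1) & er(e2) for s, _, d in ARCS)


def rule_I(e1, e2):
    return (concurrent(e1, e2) and enabled_before(e1, e2)
            and not enabled_before(e2, e1) and not enabled_simultaneously(e1, e2))


def rule_II(e1, e2):
    return (concurrent(e1, e2) and enabled_simultaneously(e1, e2)
            and not enabled_before(e2, e1))


pairs = [(a, b) for a in EVENTS for b in EVENTS if a != b]
print("type I :", [p for p in pairs if rule_I(*p)])
print("type II:", [p for p in pairs if rule_II(*p)])
```

Under the assumed arcs, rule I proposes $y+ < x-$ and rule II allows either order of $y+$ and $z+$; the pair of assumptions $z+ < y+$ and $y+ < x-$ used in Section 4.1 is one such choice.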
3.3.2 Assumptions between non-input and input 
events
Assume that $e_1, e_2 \in E$ are a non-input and an input event respectively and that they are
concurrent.
V. Input not enabled before non-input event.
Ordering relations: $(e_1 \parallel e_2) \wedge (e_2 \ntriangleleft e_1)$.
Difference timing assumption: $e_1$ fires before $e_2$.
Delay assumptions: delay of environment longer than delay of one gate.
This assumption covers the ones of type I and II for the case in which $e_2$ is an input event.
The delay assumption used in this case states that the response time of the environment will
always be longer than the delay of one gate.
Assume that $e \in E$ is a slow input event, $X = \{e_1, \ldots, e_n\} \subseteq E$ is a set of
non-input events and $e$ is pairwise concurrent with all the events in $X$.
VI. Slow environment not enabled before non-input events.
Ordering relations: $(\forall e_i \in X : e \parallel e_i) \wedge (e \ntriangleleft X)$.
Difference timing assumptions: $X$ fires before $e$.
Delay assumptions: delay of slow input event longer than $\Delta$ (delay of stabilizing the
circuit under a quiescent environment).
To illustrate the meaning of this timing assumption, consider that $h$ is an input event and
$d$ is a slow input event in the example of Figure 1. The rest of the events are non-input.
After firing the events $a$, $b$ and $c$, a state in which $d$, $e$ and $h$ are enabled is
reached (state $s_3$). At this point it can be assumed that $e$ and $f$ will fire before $d$
(two gate delays vs. slow environment). However, no assumptions can be made about the firing
order between $d$ and $g$, since $g$ is preceded by an input event ($h$) for which no upper
bound on its delay can be assumed. If $h$ were a non-input event, $d$ would be assumed to fire
after $h$ and $g$ also.
4 Backannotation of timing constraints
After logic synthesis, the validity of the timing assumptions must be verified or validated to
ensure the correct function of the circuit. However, the circuit may be correct for a set of
states larger than the one defined by the timed domain, which can be obtained by a set of less
stringent timing assumptions. In other words, some of the timing assumptions are redundant for
a particular logic synthesis solution, while some others can be relaxed. This section attempts
to answer the following question:
Can we derive a minimal set of timing assumptions sufficient for a circuit to be correct?
This set of timing assumptions backannotated for a given logic synthesis solution is called
timing constraints. Timing assumptions (both manual and automatic) are part of the
specification and provide additional freedom for logic synthesis, while timing constraints are
part of the implementation, since they constitute requirements that are sufficient for a
particular netlist solution to be valid.
4.1 Example 1
Let us analyze the example in Figure 2. The shadowed states in the SG of Figure 2.(b)
correspond to the timed domain determined by the timing assumptions
$z+ < y+$ and $y+ < x-$
Under these assumptions, logic synthesis can be performed by considering the states 110 and 001
unreachable, i.e. in the don’t care set of the logic functions for all signals $x$, $y$, $z$.
The circuits of Figures 2.(c) and 2.(d) behave correctly under the previous assumptions.
Looking at the circuit of Figure 2.(c) we observe that:
•  The gates $x = \overline{z + \bar{x}\,y}$ and $y = x + z$ are correct implementations for
the whole untimed domain.
•  The gate $z = x$ is a correct implementation for all the states except for 001. In this
state $x = 0$ and $z-$ is enabled according to the next state function of the implementation,
but it is not enabled in this state according to the original state graph specification.
Thus, even though the circuit may have been obtained using the two previous assumptions, only
one relative timing constraint, $y+ < x-$, must be ensured for the circuit to be correct. In
general, each gate of the circuit is correct for a subset of the untimed domain which is also a
superset of the timed domain. The circuit is correct for those states in which all gates are
correct.





Fig. 3: Next state functions for xyz example: (a) Original untimed specification;
(b) Specification for RT assumptions “$z+ < y+$ and $y+ < x-$”; (c,d) Implementations from
Figures 2.(c,d); (e) Specification for RT assumption “$y+$, $z+$ simultaneous with respect to
$x-$”; (f) Implementation from Figure 2.(f). Legend: global DC, local DC.
4.2 Example 2
Let us consider the same example under the simultaneity assumption “$y+$ and $z+$ are
simultaneous with respect to $x-$”. Under this assumption state 001 is unreachable and becomes
a don’t care for all signals. In addition, states 101 and 110 become don’t cares for signal
$x$, since both belong to the potential $EnR(x-)$ according to the semantics of the
simultaneity assumption. Only one timing constraint, $z+ < x-$, is sufficient for the circuit
in Figure 2.(f) to be correct. Gate $x = \bar{y}$ is not enabled in 101, hence concurrency is
reduced in this state with respect to the original untimed SG and state 001 becomes unreachable
under any gate delays. State 110, on the contrary, corresponds to the concurrency expansion for
enabling of $x-$. This enabling is lazy since $110 \in EnR(x-) \wedge 110 \notin FR(x-)$.
Figure 3 shows Karnaugh maps for the next state functions of signals $x$, $y$ and $z$ for the
specifications and implementations corresponding to the examples above. The legend shows that
timing assumptions provide two types of don’t care vectors in RT specifications: global don’t
cares, corresponding to states unreachable due to timing assumptions, and local don’t cares,
which differ for different signals. In the RT implementations some states become unreachable
due to untimed concurrency reduction, and therefore discrepancies in the corresponding values
of the next state functions compared with the original untimed specification can be ignored;
some discrepancies correspond to concurrency reduction (disabling of signal transitions without
persistency violation); and finally, other discrepancies correspond to lazy enabling and
require timing constraints for correct circuit behavior.
4.3 Correctness of RT circuit
Let $S$ be an original untimed SG with a finite set of reachable states $\mathcal{U}$² and
initial state $s_0$. Let $G$ be a circuit netlist implementing $S$ under timing constraints
$C$. A pair $\langle G, C \rangle$ is called a relative timing circuit (RT circuit). It defines
a lazy SG, $L_{\langle G,C \rangle}$, with a set of reachable states $\mathcal{U}_L$. The
RT-circuit implementation can contain more signals than the original specification $S$ if some
state signals are inserted for resolving state conflicts. Let us assume that $S$ has $n$
signals and $L$ has $k$, $k \geq n$, signals. Then for comparing states one needs to use a
homomorphism $h : B^k \to B^n$ that, given an implementation state, hides the $(k - n)$ new
internal signals and obtains a specification state. The homomorphism $h$ is naturally extended
to sets of states.
An RT-circuit is said to be correct if the following conditions are satisfied:
1. $h(\mathcal{U}_L) \subseteq \mathcal{U}$, i.e. no states outside the original untimed domain
are reachable by the RT-circuit.
2. All signals persistent in $S$ are also persistent in the lazy SG $L_{\langle G,C \rangle}$.
All state signals inserted in $L_{\langle G,C \rangle}$ are persistent. Commutativity and
determinism are preserved.
3. The initial state is preserved with respect to the I/O interface, i.e., if $s_0 \in S$ and
$s'_0 \in L_{\langle G,C \rangle}$ are the initial states of the original SG and of the lazy SG
corresponding to the implementation, then there is a path $s_0 \xrightarrow{\lambda} h(s'_0)$
or $h(s'_0) \xrightarrow{\lambda} s_0$ in $S$ such that the sequence $\lambda$ contains only
events of internal signals, not observable by the environment.
4. No events disappear: if $ER_S(e) \neq \emptyset$, then
$FR_L(e) \neq \emptyset \wedge h(FR_L(e)) \subseteq ER_S(e)$.
5. No new deadlock states appear in $L_{\langle G,C \rangle}$.
4.4 Theory for backannotation
For ease of exposition let us assume that no state signals are inserted in the RT circuit, and
therefore the number of signals stays the same for $S$ and $L$. We will briefly discuss how
state signal insertion is done in Section 4.8. Let $\mathcal{U}$ be the set of states reachable
in the untimed domain of a state graph and $\mathcal{T} \subseteq \mathcal{U}$ the set of
states reachable under a set of timing assumptions, both manual (provided by the user) and
automatic (derived for synthesis according to the rules of Section 3). Let us assume that we
have a circuit with $m$ output signals. Let $G = \{g_{a_1}(X), \ldots, g_{a_m}(X)\}$ (where $X$
is the set of signals) be a set of gates implementing the RT circuit, where $g_{a_i}(X)$
denotes the boolean function implemented by the gate of signal $a_i$.
Reachable states in the untimed domain
Let us call $\mathcal{R}(G)$ the set of states reachable in the untimed domain for the circuit
$G$. Note that, in general, $\mathcal{U} - \mathcal{R}(G) \neq \emptyset$ due to the reduction
of concurrency imposed by the circuit, and $\mathcal{R}(G) - \mathcal{U} \neq \emptyset$ due to
the expansion of concurrency for enabling of lazy transitions. The latter states are not
generated by our procedure since they must be unreachable in the RT domain anyway. The former
states are of interest, since they do not require any timing constraints (see the examples in
Sections 4.1 and 4.2). Let us denote $\mathcal{U}_G = \mathcal{R}(G) \cap \mathcal{U}$.
$\mathcal{U}_G$ can be calculated as follows:
1. For each output signal $a_i$, calculate
$disabled(a_i) = \{s \in \mathcal{U} \mid s \notin EnR_G(a_i) \wedge s \in ER_S(a_i)\}$, i.e.
states in which $a_i$ was enabled in the untimed domain in SG $S$, but made stable by the
circuit.
2. For each output signal $a_i$, remove all arcs $s \xrightarrow{a_i}$ from the SG for all
states $s \in disabled(a_i)$.
3. Calculate the new set $\mathcal{U}_G = \mathcal{R}(G) \cap \mathcal{U}$ of reachable states.
²Our implementation is currently limited by the bounded untimed STGs and SGs. It can be easily
extended to unbounded untimed STGs by making unbounded
Fig. 4: Formulation of the backannotation problem. $\{C_1, C_2, C_3\}$ is the set of timing
constraints sufficient for correctness of the RT solution.
States with incorrect behavior
Let us call $incorrect(G) \subseteq \mathcal{U}_G$ the set of states inside $\mathcal{U}_G$
that are required to be unreachable for the correctness of the circuit. These states can be
calculated as follows:
1. For each output signal $a_i$, calculate
$incorrect(a_i) = \{s \in (\mathcal{U} - \mathcal{T}) \mid s \in EnR_G(a_i) \wedge s \in QR_S(a_i)\}$,
i.e. states in which $a_i$ was stable in the untimed domain, but enabled in the circuit.
2. $incorrect(G) = \mathcal{U}_G \cap \left(\bigcup_{a_i} incorrect(a_i)\right)$
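The two procedures above can be sketched on the xyz example. Everything below is illustrative: `ARCS` is an assumed reconstruction of the SG of Figure 2.(b); the gates $y = x + z$ and $z = x$ follow Section 4.1; the gate $x = \bar{y}$ follows Section 4.2 and is used only to show a non-empty disabled set; all function names are hypothetical.

```python
SIGNALS = "xyz"

# Assumed reconstruction of the SG of Figure 2.(b): (source code, event, target code).
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
STATES = {s for s, _, _ in ARCS} | {d for _, _, d in ARCS}


def spec_enabled(sig, state):
    """state lies in ER_S(sig+) or ER_S(sig-) of the untimed specification."""
    return any(src == state and ev[0] == sig for src, ev, _ in ARCS)


def gate_enabled(sig, gate, state):
    """state lies in EnR_G(sig): the gate output differs from the current value."""
    x, y, z = (int(b) for b in state)
    return gate(x, y, z) != int(state[SIGNALS.index(sig)])


def disabled(sig, gate):
    """Step 1: enabled in the specification, but kept stable by the gate."""
    return {s for s in STATES if spec_enabled(sig, s) and not gate_enabled(sig, gate, s)}


def reachable(arcs, init="000"):
    """States reachable from `init` along `arcs`."""
    seen, stack = set(), [init]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(d for src, _, d in arcs if src == s)
    return seen


def untimed_reachable(gates):
    """Steps 2-3: drop the arcs of disabled transitions, then recompute reachability."""
    kept = [(s, ev, d) for s, ev, d in ARCS
            if not (ev[0] in gates and s in disabled(ev[0], gates[ev[0]]))]
    return reachable(kept)


def incorrect(gates, timed):
    """States outside the timed domain where a gate fires a quiescent signal."""
    bad = set()
    for sig, gate in gates.items():
        bad |= {s for s in STATES - timed
                if gate_enabled(sig, gate, s) and not spec_enabled(sig, s)}
    return untimed_reachable(gates) & bad


GATES_EX1 = {"y": lambda x, y, z: x | z, "z": lambda x, y, z: x}
TIMED_EX1 = STATES - {"110", "001"}
GATES_EX2 = {"x": lambda x, y, z: 1 - y}

print(sorted(incorrect(GATES_EX1, TIMED_EX1)))      # ['001']
print(sorted(disabled("x", GATES_EX2["x"])))        # ['101']
print(sorted(STATES - untimed_reachable(GATES_EX2)))  # ['001']
```

For the Section 4.1 gates only state 001 is flagged, in line with the single constraint $y+ < x-$; for the gate $x = \bar{y}$ the disabled transition in 101 already makes 001 unreachable without any constraint.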
Backannotation: problem formulation
We need a set of constraints that make the states in $incorrect(G)$ unreachable. A trivial
solution to this problem is to take the complete set of timing assumptions used for logic
synthesis, i.e. those for which $\mathcal{T}$ is the set of reachable states. Our goal,
however, is to find the least stringent set of constraints sufficient to make the circuit
correct. Given a set of timing constraints $C = \{C_1, \ldots, C_n\}$, we will call
$\mathcal{R}(C) \subseteq \mathcal{U}$ the set of states reachable after applying $C$ in the
untimed domain. In general, the problem can be formulated as follows (see Figure 4):
Find a set of constraints $C$ with the largest $\mathcal{R}(C)$ such that
1. $\mathcal{T} \subseteq \mathcal{R}(C) \subseteq \mathcal{U} - incorrect(G)$
2. $\forall s \in \mathcal{T} : (s \in EnR_G(a_i*) \wedge s \notin ER_S(a_i*)) \Rightarrow \exists a_j* : s \xrightarrow{a_j*} s' \wedge s' \in \mathcal{T} \wedge (a_j* < a_i*) \in C$
The first condition guarantees that no incorrect states inside $\mathcal{U}$ are reachable
(constraints $C_1$, $C_2$ in Figure 4), whereas the second makes sure that no states outside
$\mathcal{U}$ can be reached in the RT circuit (constraint $C_3$ in Figure 4).
4.5 Finding a set of timing constraints
Relative timing constraints are defined in terms of the firing
order of events. Constraining the firing order between a pair of
events only makes sense when they can be enabled simultaneously
and fire in any order, i.e. when they are concurrent.
Thus, each timing constraint $C_i$ can be denoted by an
ordered pair of concurrent events, e.g. $C_i = (e_j < e_k)$.
Given a constraint $C_i = (e_j < e_k)$, we define the set of
arcs $disabled(C_i)$ as
$disabled(C_i) = \{\, s \xrightarrow{e_k} s' \mid \exists\, s \to s_1 \to \cdots \to s_n :$

Fig. 5: Example for backannotation with table of unreachable
states for each pair of ordered events.

In particular, the path $s_1 \to \cdots \to s_n$ can be empty if
$s \in ER(e_j) \cap ER(e_k)$. $disabled(C_i)$ is the set of arcs with
label $e_k$ that must not fire in order for $e_j$ to fire before $e_k$,
i.e. those arcs with source states in which both events are
concurrent or preceding $ER(e_j) \cap ER(e_k)$ inside $ER(e_k)$.
Given a set of constraints $C = \{C_1, \ldots, C_n\}$, $\mathcal{R}(C)$ is
the set of reachable states after removing the arcs in
$\bigcup_{C_i \in C} disabled(C_i)$
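
One way to read this definition operationally is sketched below. The code follows the prose description only (arcs labelled $e_k$ whose source lies in $ER(e_k)$ and can reach $ER(e_j) \cap ER(e_k)$ without leaving $ER(e_k)$); the function names, the arc representation and the small state graph are assumptions made for illustration.

```python
from collections import deque

def excitation_region(arcs, event):
    """ER(event): states with an outgoing arc labelled `event`."""
    return {src for (src, e, _dst) in arcs if e == event}

def disabled_arcs(arcs, first, second):
    """Arcs labelled `second` to remove so that `first` fires before `second`."""
    er_second = excitation_region(arcs, second)
    sources = excitation_region(arcs, first) & er_second   # empty-path case
    frontier = deque(sources)
    while frontier:                       # walk backwards, staying inside ER(second)
        state = frontier.popleft()
        for (src, _e, dst) in arcs:
            if dst == state and src in er_second and src not in sources:
                sources.add(src)
                frontier.append(src)
    return {(s, e, d) for (s, e, d) in arcs if e == second and s in sources}

def reachable_after(initial, arcs, constraints):
    """R(C): states reachable once the arcs disabled by every constraint are removed."""
    removed = set()
    for first, second in constraints:
        removed |= disabled_arcs(arcs, first, second)
    kept = [arc for arc in arcs if arc not in removed]
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        for (src, _e, dst) in kept:
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen

# Hypothetical SG: b is enabled from sp onwards, d only after a has fired.
arcs = [("sp", "a", "s0"), ("sp", "b", "sq"), ("sq", "a", "s1"),
        ("s0", "b", "s1"), ("s0", "d", "s2"), ("s1", "d", "s3"), ("s2", "b", "s3")]

removed = disabled_arcs(arcs, "d", "b")             # constraint d < b
states = reachable_after("sp", arcs, [("d", "b")])
```

In this toy graph the constraint removes the `b` arc leaving `s0` (where both events are enabled) and the `b` arc leaving `sp` (which precedes `s0` inside the excitation region of `b`), but keeps the `b` arc leaving `s2`, where `d` has already fired.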
4.6 Example 3
Figure 5 shows an example for deriving a set of timing
constraints for backannotation. Initially we have $\mathcal{U} =
\{s_0, \ldots, s_{10}\}$ and $\mathcal{T} = \{s_0, s_1, s_2, s_5, s_8, s_9, s_{10}\}$. Let us
assume that $s_6$ and $s_7$ are the states in which the behavior
of the circuit is incorrect. The table in Figure 5 contains
the set of states that become unreachable by reducing the
concurrency between each pair of concurrent events³. For
example, by imposing the order $d < b$, the states $s_2$ and $s_3$
become unreachable.
The problem to be solved is the following: find a set of
ordering constraints between pairs of events such that the
new set of reachable states covers $\mathcal{T}$ and does not intersect
the set of incorrect states $\{s_6, s_7\}$. Moreover, we want to
maximize the set of reachable states, i.e. to find the least
stringent set of timing constraints.
The problem can be posed as a covering problem. The
cells of the table in bold correspond to those constraints that
do not remove any state from $\mathcal{T}$. The covering problem can
be formulated as follows:
$(e < c) \wedge (b < d \vee b < e)$
with the minimum-cost solution $C = \{e < c,\ b < e\}$ and
$\mathcal{R}(C) = \{s_0, s_1, s_2, s_4, s_5, s_8, s_9, s_{10}\}$
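
As a quick sanity check of the solution quoted above, the first condition of the problem formulation can be evaluated directly on the sets given in this example. The sketch is only a set-inclusion test with a hypothetical helper name; the second condition needs the arcs of the SG in Figure 5, which the example does not list.

```python
def satisfies_condition_1(T, R_C, U, incorrect):
    """True when T is contained in R(C) and R(C) avoids every incorrect state."""
    return T <= R_C <= (U - incorrect)

# Sets as stated in the example (states named s0..s10).
U = {f"s{i}" for i in range(11)}
T = {"s0", "s1", "s2", "s5", "s8", "s9", "s10"}
incorrect = {"s6", "s7"}
R_C = {"s0", "s1", "s2", "s4", "s5", "s8", "s9", "s10"}   # after {e < c, b < e}

ok = satisfies_condition_1(T, R_C, U, incorrect)
pruned = sorted(U - R_C)      # states made unreachable by the two constraints
print(ok, pruned)             # True ['s3', 's6', 's7']
```

Besides the two incorrect states, the chosen constraints remove only `s3`, which lies outside the timed domain.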
4.7 Solving the covering problem
The covering problem for backannotation does not correspond
to a unate covering problem, since the cost of the
final solution (number of disabled arcs) is not the sum of
the costs of the individual constraints.
Currently, petrify uses a greedy approach to solve
the covering problem that can be easily implemented by
symbolic BDD-based techniques. It consists of
choosing the constraint that removes the maximum number
of arcs whose destination is in $incorrect(G)$ and that
have not been removed by previous constraints. This process
is repeated until all incorrect states become
unreachable.
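
The greedy selection can be sketched with explicit sets as follows. This is an illustrative approximation rather than petrify's code: the text points to a symbolic BDD-based implementation, whereas the sketch enumerates arcs; the candidate table mapping each constraint to the arcs it disables, the state graph and all names are hypothetical.

```python
from collections import deque

def reachable(initial, arcs):
    """States reachable from `initial` over a set of (src, event, dst) arcs."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        for (src, _e, dst) in arcs:
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen

def greedy_constraints(initial, arcs, candidates, incorrect):
    """Pick constraints until no incorrect state is reachable.

    `candidates` maps a constraint name to the set of arcs it disables.
    """
    removed, chosen = set(), []
    while reachable(initial, arcs - removed) & incorrect:
        def gain(name):
            # arcs not yet removed that lead into an incorrect state
            return sum(1 for arc in candidates[name] - removed
                       if arc[2] in incorrect)
        remaining = [name for name in candidates if name not in chosen]
        best = max(remaining, key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("remaining incorrect states cannot be removed")
        chosen.append(best)
        removed |= candidates[best]
    return chosen, reachable(initial, arcs - removed)

# Hypothetical SG with two diamonds; s2 and s5 are assumed to be incorrect.
arcs = {("s0", "a", "s1"), ("s0", "b", "s2"), ("s1", "b", "s3"), ("s2", "a", "s3"),
        ("s3", "c", "s4"), ("s3", "d", "s5"), ("s4", "d", "s6"), ("s5", "c", "s6")}
incorrect = {"s2", "s5"}
candidates = {"a < b": {("s0", "b", "s2")},
              "b < a": {("s0", "a", "s1")},
              "c < d": {("s3", "d", "s5")}}

chosen, states = greedy_constraints("s0", arcs, candidates, incorrect)
```

The sketch assumes that the candidate table already excludes constraints that would remove states of the timed domain, corresponding to the bold cells of the table in Figure 5.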
3For simplicity, unreachable states are reported in the table for this example. In 
general, the analysis must be performed by calculating the removed disabled arcs. 
In this particular case, the resulting analysis is the same.









Fig. 6: (a) FIFO controller, (b) Specification, (c) Specification
with state encoding signal, (d) RT implementation
with gC elements, (e) Timing constraints sufficient for
correctness.
In some cases, not all the incorrect states can be made
unreachable since the timed state space has been produced
by early enabling some events. In those cases, a similar
iterative process is executed to cover those incorrect states
that can be legalized by early enabling. As an example,
consider the state $s_6$ in Figure 5. Assume that $s_6$ is incorrect
since the next-state function indicates that $f$ is enabled
in that state. The state could be made correct by extending
$EnR(f)$ towards $s_6$ and imposing the type-IV constraint
$e < f$.
4.8 Timing aware state encoding
The problem of state encoding consists of inserting state signals
to resolve CSC conflicts. State encoding in our implementation
is automatically solved using an extension of the
method presented in [4]:
•  Only those encoding conflicts reachable in the RT
domain are considered in the cost function, so that
no effort is invested in solving conflicts unreachable
in the RT domain, $\mathcal{T}$.
•  Automatic timing assumptions can be generated for
inserted state signals using rules from Section 3,
implying that the state signals can be implemented as
5 Experimental results
5.1 Academic examples
The results for well-known academic benchmarks
are presented in Table 1. Tables 1.(a) and 1.(b)
present the results for specifications with and without state
coding conflicts, respectively. SIa and SI denote speed-independent
designs optimized for area and for delay, respectively, and
TI denotes the relative timing results.
For each experiment, area is estimated as the number
of literals of the set and reset networks of generalized C
elements. Delay (response time) is estimated as the average
number of non-input events in the critical path between
the firing of two input events. Comparing the columns SI
and TI, we observe a reduction of about 40% in area. The
reduction in response time is less than 5% if we consider
all events to have a delay of one time unit. However, the
performance improvement would be much more pronounced if it
were evaluated with actual delays, given that the logic of
the timed implementation is much simpler. We report this
analysis in Section 5.2.
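
The aggregate figures can be re-derived from the fifteen benchmark rows of Table 1 reproduced in this text. The sketch below assumes that the first three numeric columns of each row are the areas for SIa, SI and TI and that the next three are the corresponding response times. That column reading is an assumption, as is the value 1.11 for the SI response time of alloc-outbound.

```python
# (name, area SIa, area SI, area TI, response SIa, response SI, response TI)
# The meaning of the columns is an assumption stated in the paragraph above.
rows = [
    ("adfast",         18, 31, 13, 2.17, 1.00, 1.00),
    ("alloc-outbound", 20, 23, 22, 1.50, 1.11, 1.00),
    ("master-read",    65, 79, 45, 2.29, 1.33, 1.29),
    ("mmu0",           33, 47, 20, 2.31, 1.38, 1.38),
    ("mmu1",           25, 32, 15, 1.60, 1.12, 1.12),
    ("mr0",            50, 51, 30, 1.60, 1.45, 1.15),
    ("mr1",            36, 39, 20, 2.25, 1.19, 1.19),
    ("nak-pa",         24, 35, 24, 1.25, 1.00, 1.00),
    ("nowick",         18, 19, 16, 1.50, 1.17, 1.00),
    ("ram-read-sbuf",  30, 26, 21, 1.10, 1.00, 1.00),
    ("sbuf-ram-write", 24, 44, 24, 1.63, 1.00, 1.00),
    ("sbuf-read-ctl",  18, 21, 16, 2.00, 1.50, 1.50),
    ("seq3",           18, 22, 18, 1.50, 1.00, 1.00),
    ("seq-mix",        23, 28, 24, 1.40, 1.20, 1.10),
    ("vmebus",         22, 33, 17, 2.29, 1.57, 1.57),
]

area_si = sum(r[2] for r in rows)                 # total literals, SI
area_ti = sum(r[3] for r in rows)                 # total literals, TI
resp_si = sum(r[5] for r in rows) / len(rows)     # mean response time, SI
resp_ti = sum(r[6] for r in rows) / len(rows)     # mean response time, TI

area_reduction = 1 - area_ti / area_si
resp_reduction = 1 - resp_ti / resp_si
print(f"area: {area_reduction:.1%}, response time: {resp_reduction:.1%}")
```

Under these assumptions the totals are 530 and 325 literals, i.e. an area reduction of roughly 39%, and the mean response time drops by about 4%, in line with the figures quoted above.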
5.2 Example: a FIFO controller
In this section we trace the development of a FIFO cell
(specified in Figures 6.(a),(b)), a simplified abstraction of
a part of the RAPPID design. The modules at the left and
right sides of the controller operate at a speed similar to that
of the controller itself. In fact, these events are generated by twin
modules connected at each side. For this reason, it is not
wise to assume that the input events are slow.
We simulated four FIFOs using different implementations
of the FIFO cell and measured the cycle time of the
FIFO and the forward latency (the average event propagation
time from li to ro) of a cell. The results, normalized to the
delay of an inverter with fan-out four in a given technology,
are shown in Table 2.
Table 2: Performance comparison of FIFOs normalized to
a fan-out four inverter delay
For the first relative timing FIFO (reported in the first
row) we use an RT circuit derived by petrify using only
the automatic timing assumptions presented in Figure 6.(e).
Proper transistor sizing is required for correct operation of
the circuit. No user-defined assumptions on the environment
are used. The timing analysis explained in Section 3
has been applied to the specification, and state encoding
has been automatically solved as described in Section 4.8.
With this strategy, only one additional state signal, x, was
required, as shown in Figure 6.(c)⁴. There are some interesting
aspects of this implementation:
• The state signal x switches concurrently with
other activity in the circuit. This is a result of the state
encoding strategy of petrify, which attempts to increase
the concurrency of new state signals until they
disappear from the critical paths according to the delay
model explained in Section 3.
•  The response time of the circuit with regard to the
environment is only one event (two inverters), i.e. as
soon as an output event is enabled it fires without
requiring the firing of any other internal event.
Finally, the implementation of Figure 6.(d) requires some
timing constraints to be correct. After applying the method
proposed in Section 4, five timing constraints between pairs
of concurrent events have been derived that are sufficient
for the circuit to be correct. They are graphically represented
in Figure 6.(e).
The constraints lo+ < x- and ro+ < x- are not independent.
Since the implementation of x is x = lo + ro, it
is always guaranteed that one of them will hold, whereas
the other must be ensured. Since lo+ and ro+ are enabled
simultaneously, these constraints will always hold if the delay
of two gates is longer than the delay of one gate. Of
the remaining constraints, the most stringent is x- < ri+. In
the worst case, both ri+ and x- will be enabled simultaneously
by ro+. In this case, the delay of x- is required to
be shorter than the delay of ri+ (from the environment). In
case of a very fast environment, this can be enforced by different
techniques, e.g. transistor sizing or delay padding for
gate x.
For the second FIFO (the second row of the table) we
derived a speed-independent circuit using petrify in the
mode of automatic concurrency reduction [5] without constraining
I/O concurrency of the cell. Because of concurrency
reduction, only one state signal was required [4], as
in the case of the automatic RT solution. However, the state
signal was on a critical cycle and the implementation of lo
and ro contained additional p-transistors, which made the
speed-independent circuit 20-30% slower than the RT one.
5.3 RAPPID control circuits
In this section we compare manually optimized RT control
circuits used for RAPPID [16, 15] with those derived automatically
with petrify. For each example, Table 3
reports: manual (obtained by applying relative timing manually),
automatic (obtained automatically by petrify
and applying relative timing) and speed-independent
(obtained automatically by petrify without concurrency
reduction).

4This new specification is not strictly a Petri net, since the arcs from lo+ and ro+
to the OR place indicate an or-causality relation: x- is triggered by the first event
to fire, whereas the token produced by the latest event is implicitly consumed. An

adfast          18  31  13   2.17  1.00  1.00   2  1  0
alloc-outbound  20  23  22   1.50  1.11  1.00   2  2  2
master-read     65  79  45   2.29  1.33  1.29   7  1  3
mmu0            33  47  20   2.31  1.38  1.38   3  3  0
mmu1            25  32  15   1.60  1.12  1.12   2  2  1
mr0             50  51  30   1.60  1.45  1.15   3  3  2
mr1             36  39  20   2.25  1.19  1.19   4  3  0
nak-pa          24  35  24   1.25  1.00  1.00   1  1  1
nowick          18  19  16   1.50  1.17  1.00   1  1  1
ram-read-sbuf   30  26  21   1.10  1.00  1.00   1  1  0
sbuf-ram-write  24  44  24   1.63  1.00  1.00   2  2  1
sbuf-read-ctl   18  21  16   2.00  1.50  1.50   1  1  1
seq3            18  22  18   1.50  1.00  1.00   2  2  2
seq-mix         23  28  24   1.40  1.20  1.10   2  2  2
vmebus          22  33  17   2.29  1.57  1.57   1  1  0
Table 1: Experimental results: specifications without CSC (a) and with CSC (b).

Design      m    a    s     m    a    s     m    a    s
FIFO-A      22   22   46    3.0  3.0  9.0   2.5  2.5  5.7
FIFO-B      16   15   46    2.0  2.0  9.0   2.0  2.0  5.7
Byte-cntr   32   27   71    4.0  3.0  5.0   3.0  2.5  4.1
Tag-unit    31   47   112   4.0  4.0  8.0   4.0  2.7  6.9
Summary 101 111 275 3.3 7.75 3.0 2.4 5.6
Table 3: Comparison for two generic representative examples
(fifo) and two control circuits from RAPPID (byte-control,
tag-unit). Response time is measured in gate delays,
area in transistors. m: manual, a: automatic, s: speed-independent.
From the table it can be deduced that automatic solutions
are quite comparable with manually optimized RT
designs. The improvement in response time by applying
relative timing is about a factor of 2, substantially better
than for the examples of Table 1. This is because the designers
of these circuits had a stronger interaction with the
tool and provided aggressive timing assumptions on the environment
that could not be derived automatically.
6 Conclusions
The method for automatic generation of timing assumptions
presented in this paper allows the designer to concentrate
on defining those timing assumptions that can only
be deduced from a detailed knowledge of the environment.
The technique for automatic back-annotation of timing
constraints relative to a particular RT circuit provides
the necessary timing information for the down-stream tools.
Timing-aware state encoding allows area/delay optimization
of RT circuits.
Relative timing presents a “middle-ground” between
clocked and asynchronous circuits, and is a fertile area for
CAD development. Burst-mode [14, 17] and speed-independent
specifications are at opposite extremes of a
more general class of relative timing specifications.
Acknowledgments We would like to thank Shai Rotem,
Luciano Lavagno, Alex Kondratyev and Alexandre
Yakovlev for their contributions in motivating this work
and developing the theory for synthesis with relative timing.
References
[1] S. Burns. General conditions for the decomposition of state holding
elements. In International Symposium on Advanced Research in
Asynchronous Circuits and Systems, Aizu, Japan, March 1996.
[2] W. S. Coates, J. K. Lexau, I. W. Jones, S. M. Fairbanks, and I. E.
Sutherland. A fifo data switch design experiment. In Proc. International
Symposium on Advanced Research in Asynchronous Circuits
and Systems, pages 4-17, 1998.
[3] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno,
A. Taubin, and A. Yakovlev. Lazy transition systems: application to
timing optimization of asynchronous circuits. In Proceedings of the
International Conference on Computer-Aided Design, pages 324-331,
November 1998.
[4] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and
A. Yakovlev. A region-based theory for state assignment in speed-independent
circuits. IEEE Transactions on Computer-Aided Design,
16(8):793-812, August 1997.
[5] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and
A. Yakovlev. Automatic synthesis and optimization of partially
specified asynchronous systems. In DAC, pages 100-115, June
1999.
[6] Henrik Hulgaard and Steven M. Burns. Bounded delay timing analysis
of a class of CSP programs with choice. In Proc. International
Symposium on Advanced Research in Asynchronous Circuits and
Systems, pages 2-11, November 1994.
[7] Alain J. Martin. Synthesis of asynchronous VLSI circuits. In
J. Staunstrup, editor, Formal Methods for VLSI Design, chapter 6,
pages 237-283. North-Holland, 1990.
[8] D. E. Muller and W. C. Bartky. A theory of asynchronous circuits.
In Annals of Computing Laboratory of Harvard University, pages
204-243, 1959.
[9] T. Murata. Petri Nets: Properties, analysis and applications.
Proceedings of the IEEE, pages 541-580, April 1989.
[10] Chris J. Myers. Computer-Aided Synthesis and Verification of Gate-Level
Timed Circuits. PhD thesis, Dept. of Elec. Eng., Stanford University,
October 1995.
[11] Chris J. Myers and Teresa H.-Y. Meng. Synthesis of timed asynchronous
circuits. IEEE Transactions on VLSI Systems, 1(2):106-119,
June 1993.
[12] Radu Negulescu and Ad Peeters. Verification of speed-dependences
in single-rail handshake circuits. In Proc. International Symposium
on Advanced Research in Asynchronous Circuits and Systems, pages
159-170, 1998.
[13] M. Nielsen, G. Rozenberg, and P.S. Thiagarajan. Elementary transition
systems. Theoretical Computer Science, 96:3-33, 1992.
[14] S.M. Nowick. Automatic Synthesis of Burst-Mode Asynchronous
Controllers. PhD thesis, Stanford University, Dept. of Computer
Science, 1993.
[15] S. Rotem, K. S. Stevens, R. Ginosar, P. A. Beerel, C. J. Myers,
K. Yun, R. Kol, C. Dike, M. Roncken, and B. Agapiev. RAPPID:
An asynchronous instruction length decoder. In Proc. ASYNC, April
1999.
[16] K. S. Stevens, S. Rotem, and R. Ginosar. Relative timing. In Proc.
ASYNC, April 1999.
[17] Kenneth Yi Yun. Synthesis of Asynchronous Controllers for Heterogeneous
Systems. PhD thesis, Stanford University, August 1994.
