Synthesis of asynchronous control circuits with automatically generated relative timing assumptions by Cortadella, Jordi et al.
Synthesis of asynchronous control circuits with automatically 
generated relative timing assumptions 
Jordi Cortadella¹, Michael Kishinevsky², Steven M. Burns² and Ken Stevens²
¹Univ. Politècnica de Catalunya, Barcelona, Spain and ²Strategic CAD Lab, Intel Corporation, USA
Abstract 
This paper describes a method for the synthesis of asynchronous circuits with relative timing. Asynchronous communication between gates and modules typically utilizes handshakes to ensure functionality. Relative timing assumptions in the form “event a occurs before event b” can be used to remove redundant handshakes and associated logic. This paper presents a method for automatic generation of relative timing assumptions from the untimed specification. These assumptions can be used for area and delay optimization of the circuit. A set of relative timing constraints sufficient for the correct operation of the circuit is back-annotated to the designer. Experimental results for control circuits of a prototype IA32 instruction length decoding and steering unit called RAPPID (“Revolving Asynchronous Pentium® Processor Instruction Decoder”) show significant improvements in area and delay over speed-independent circuits.
1 Introduction 
Asynchronous communication utilizes handshaking to ensure functionality, which incurs some area and delay penalty with respect to synchronous design. Timing information can be used to combat the full handshake overhead in area and delay by removing redundant handshakes and associated logic. Since absolute timing information is mostly unknown until layout is complete, relative timing information in the form “event a occurs before event b” is a natural representation of timing that can be used in the design flow.
Relative timing (RT) was used for the design of a prototype IA32 instruction length decoding and steering unit called RAPPID (“Revolving Asynchronous Pentium® Processor Instruction Decoder”) that was fabricated and tested successfully [15, 16]. Silicon results show significant advantages (in particular, a performance of 2.5-4.5 instructions per ns) with manageable risks using this design technology. Compared with a 400MHz clocked circuit, RAPPID achieves three times the performance and half the latency, dissipates only half the power, and requires a minor area penalty. Another experiment with a circuit based on timing assumptions is described in [2].
The design flow for synthesizing relative timing circuits is as follows. Relative timing assumptions are provided by the user or extracted by the algorithm presented in this paper. The circuits are then designed using the assumptions for area and delay optimization. RT circuits can be optimized with respect to the untimed circuits for two reasons:
• RT assumptions reduce the set of reachable states and hence increase the number of don’t care states for logic optimization of all signals.
• It is possible to extend the set of states in which a signal is enabled without changing the set of reachable states if other enabled signals are known to be or can be made faster than the early enabled (a.k.a. lazy) signal. This additional flexibility adds local don’t cares that can differ from one signal to another.
This work was supported by a grant from Intel Corporation and was done during a visit to SCL in summer 1998.
0-7803-5832-5/99/$10.00 © 1999 IEEE
A (possibly relaxed) subset of the timing assumptions used for optimization is back-annotated by the tool and becomes timing constraints. Different valid netlists require different timing constraints. The circuits are then designed to meet the relative orderings, or verified to show that the restrictions are already satisfied by the delays in the system. Methods based on separation analysis [6], geometric timing [10], and relative timing can be employed for verification [12].
In [3] it is shown that relative timing synthesis can be automated using lazy transition systems in which enabling and firing regions for signal transitions are distinguished. This paper enhances the method of [3] in three major ways.
• A method for automatic generation of timing assumptions starting from a speed-independent (untimed) specification is presented. Most of the timing assumptions used in RAPPID circuits can be automatically extracted. Only architectural or environmental assumptions on the inputs need to be specified by the user.
• A method for automatic backannotation of RT constraints sufficient for the correct operation of a circuit is developed.
• A method for timing-aware state encoding is deployed. It reduces the number of state signals and generates timing assumptions for state signals if necessary. It has a significant positive effect on both area and performance.
Section 2 presents basic theory and models. Section 3 describes a method for automatic generation of RT assumptions. Section 4 presents a technique for extracting timing constraints for a derived RT netlist and briefly describes timing-aware state encoding. Section 5 presents experimental results.
2 Basic notions 
For brevity, we assume the reader to be familiar with Petri nets, a formalism used to specify concurrent systems. We refer to [9] for a general tutorial on Petri nets. Lazy transition systems and lazy state graphs were introduced in more detail in [3].
2.1 Transition Systems and State Graphs
A transition system (TS) is a quadruple [13] $TS = (S, E, T, s_{in})$, where $S$ is a non-empty set of states, $E$ is a set of events, $T \subseteq S \times E \times S$ is a transition relation, and $s_{in}$ is an initial state. The elements of $T$ are called the transitions of TS and will often be denoted by $s \xrightarrow{e} s'$ instead of $(s, e, s')$.
State Graphs are binary interpreted transition systems: every state is assigned a binary vector of signal values in the specified circuit; every event is interpreted as a rising ($a+$) or falling ($a-$) transition of a signal $a$. The notation $a*$ is used if one is not specific about the direction of the signal transition. The set of signals of an SG is called $X = I \cup O$, where $I$ and $O$ denote the sets of input and output signals respectively.
A labeling function $v : S \to \{0,1\}^n$ assigns a vector of signal values to each state ($n = |X|$). We will call $v_a(s)$ the value of signal $a$ in state $s$. An SG is consistent if rising and falling transitions alternate for every signal on any path in the SG. Examples of a TS and an SG are given in Figure 1.(b) and Figure 2.(b), respectively.
Fig. 1: (a) Petri net, (b) Transition System.
Fig. 2: (a) STG for the xyz example, (b,e) SGs with timing domains, (c,d,f) Circuits.
2.2 Signal Transition Graph 
A Signal Transition Graph (STG) is a Petri net (PN) in which transitions are labeled with rising and falling signal transitions, as in an SG. An example of a PN is shown in Figure 1.(a). This PN corresponds to the TS in Figure 1.(b). An STG has an associated SG in which each reachable marking corresponds to a state and each transition between a pair of markings to an arc labeled with the same event as the transition. Figure 2.(a) depicts an STG with three signals, x, y, and z, corresponding to the SG in Figure 2.(b). For simplicity, places with only one input and one output transition are often omitted in STGs.
2.3 Excitation and quiescent regions 
The excitation region of an event $a*$, denoted by $ER(a*)$, is the set of states such that $s \in ER(a*) \Leftrightarrow s \xrightarrow{a*}$. The quiescent region of $a+$, denoted by $QR(a+)$, is the set of states such that $s \in QR(a+) \Leftrightarrow v_a(s) = 1 \wedge s \notin ER(a-)$. Similarly, $s \in QR(a-) \Leftrightarrow v_a(s) = 0 \wedge s \notin ER(a+)$. In Figure 2.(b), $ER(x-) = \{101, 111\}$ and $QR(x-) = \{001, 011, 010\}$.
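The two regions can be computed directly from an arc list. The sketch below is illustrative only: the arc list `ARCS` is an assumption, reconstructed from the regions and next-state values quoted in the text for the xyz example (Figure 2.(b) itself is not reproduced here), and the helper names are hypothetical.

```python
# State graph of the xyz example as labeled arcs between binary state
# codes (signal order x, y, z). ASSUMPTION: reconstructed from the facts
# quoted in the text, not copied from Figure 2.(b).
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
SIGNALS = "xyz"
STATES = {s for s, _, _ in ARCS} | {t for _, _, t in ARCS}


def value(state, signal):
    """v_a(s): value of `signal` in `state`."""
    return int(state[SIGNALS.index(signal)])


def ER(event):
    """Excitation region: states with an outgoing arc labeled `event`."""
    return {s for s, e, _ in ARCS if e == event}


def QR(event):
    """Quiescent region: the signal already has the value reached by
    `event` and the opposite transition is not excited."""
    signal, sign = event[0], event[1]
    stable = 1 if sign == "+" else 0
    opposite = signal + ("-" if sign == "+" else "+")
    return {s for s in STATES
            if value(s, signal) == stable and s not in ER(opposite)}


print("ER(x-) =", sorted(ER("x-")))
print("QR(x-) =", sorted(QR("x-")))
```

Under this reconstruction the two printed sets agree with the values of ER(x-) and QR(x-) given above.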
2.4 Lazy transition systems 
The main distinctive feature of a lazy system is that it can assume a non-zero delay between the enabling of a transition and its firing. Due to this, the set of states in which a transition is enabled might be larger than the set of states in which the transition fires.
Definition 2.1 (Enabling and firing regions) The enabling region, $EnR(a*)$, of a signal transition $a*$ is the set of states in which transition $a*$ is enabled. The firing region, $FR(a*)$, of a signal transition $a*$ is the set of states from which $a*$ can fire, i.e. $s \in FR(a*) \Leftrightarrow \exists s' : s \xrightarrow{a*} s'$.
A potentially enabling region, PEnR, gives an upper bound for the set of states which can be selected as an actual enabling region in the RT implementation. The freedom in choosing the enabling region within the PEnR gives additional possibilities for logic optimization. It is easy to see the following correspondence between the introduced regions: $FR(a*) \subseteq EnR(a*) \subseteq PEnR(a*)$. We will defer discussion of examples until Sections 4.1-4.2.
Definition 2.2 (Lazy state graph) A transition $a*$ is called lazy if $EnR(a*) \neq FR(a*)$. A state graph is called lazy (lazy SG) if at least one transition is lazy¹.
The correctness properties of SGs can be easily transferred onto lazy SGs. A lazy SG is consistent, deterministic and commutative if the underlying SG has these properties. The persistency property must be generalized for enablings and firings, as discussed in Section 2.8.
2.5 Timing assumptions 
Timing assumptions can be conservatively defined in a form stating that one event happens before or after another.
Difference assumptions. A difference assumption $b* < a*$ (read: $b*$ before $a*$), involving two potentially concurrent events $a*$ and $b*$, assumes that, due to certain timing characteristics, whenever $b*$ and $a*$ are both enabled, $b*$ always fires earlier than $a*$. In an SG this assumption can be represented by the concurrency reduction of $a*$ with respect to $b*$. RT difference assumptions allow one to eliminate states unreachable in the timing domain, similar to state elimination based on absolute timing information in [10, 11]. They are not sufficient, however, for expressing lazy behavior of signals.
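As an illustration of concurrency reduction, the following sketch removes the arcs of the later event from every state in which both events are enabled and then recomputes the reachable states. The arc list for the xyz example, the initial state 000 and the function names are assumptions made for this example; the two assumptions applied are the ones used later in Section 4.1.

```python
# ASSUMPTION: arc list of the xyz state graph (signal order x, y, z),
# reconstructed from facts quoted in the text; 000 is assumed initial.
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
INITIAL = "000"


def enabled(arcs, event):
    """States of `arcs` in which `event` can fire."""
    return {s for s, e, _ in arcs if e == event}


def reachable(arcs, initial):
    """States reachable from `initial` by following `arcs`."""
    seen, frontier = {initial}, [initial]
    while frontier:
        s = frontier.pop()
        for src, _, dst in arcs:
            if src == s and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen


def assume_before(arcs, first, second):
    """Difference assumption first < second: wherever both events are
    enabled, drop the arc labeled `second`."""
    both = enabled(arcs, first) & enabled(arcs, second)
    return [(s, e, t) for s, e, t in arcs
            if not (e == second and s in both)]


untimed = reachable(ARCS, INITIAL)
timed_arcs = assume_before(assume_before(ARCS, "z+", "y+"), "y+", "x-")
timed = reachable(timed_arcs, INITIAL)
print("unreachable in the timed domain:", sorted(untimed - timed))
```

With this reconstruction the states eliminated by the two assumptions are 110 and 001.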
Early enabling assumption. Suppose that transition $a*$ triggers the firing of transition $b*$, i.e. $a*$ and $b*$ are ordered in the specification. Assume that $a*$ can be made “faster” than $b*$ in the circuit. Then the enabling of $b*$ can be started earlier, e.g., from the events triggering $a*$, and the proper ordering of $a*$ before $b*$ will still be ensured by the timing properties of the implementation. In a lazy SG this results in the backward expansion of $PEnR(b*)$ into $FR(a*)$.
Simultaneity assumption. The simultaneity assumption is a relative notion, which is defined on a set of concurrent transitions T with respect to a reference transition $a*$. It states that, from the point of view of $a*$, the skew of the firing times of the transitions from T is negligible. This assumption can be viewed as a local fundamental mode of T with respect to $a*$, and hence as a generalization of burst-mode machines [14, 17]. An example of the application of the simultaneity assumption is discussed in Section 4.2.
Assumptions relating only input events cannot be automatically generated from the circuit behavior and can be provided by the designer or generated from the implementation of the environment.
2.6 Next-state functions 
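The implementation of an SG as a logic circuit is done through the definition of the next-state function for each output signal and binary vector. For SGs it is defined as follows:
$$f_a(z) = \begin{cases} 1 & \text{if } \exists s \in ER(a+) \cup QR(a+) : v(s) = z \\ 0 & \text{if } \exists s \in ER(a-) \cup QR(a-) : v(s) = z \\ - & \text{otherwise} \end{cases}$$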
¹As we are targeted at optimization of output signals of a circuit, lazy behavior of input signals is not considered.
The next-state function $f_a$ is correctly defined when the SG has the CSC property, i.e. there is no pair of reachable states $(s, s')$ such that $v(s) = v(s')$ and ($s \in ER(a+) \cup QR(a+)$ and $s' \in ER(a-) \cup QR(a-)$). Note that $f_a$ is an incompletely specified function with a don’t care (DC) set corresponding to those binary vectors without any associated state in the SG. The logic netlist is speed-independent if the SG is deterministic, commutative and output-persistent [4].
In the SG of Figure 2.(b), the DC set is empty since all binary vectors have a corresponding state in the SG. As an example, $f_x(101) = 0$; $f_y(101) = f_z(101) = 1$, since signals x and y are enabled, and z is stable in that state. The Karnaugh maps for the next-state functions are depicted in Figure 3.(a).
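A minimal sketch of how the next-state function of an SG can be tabulated is given below. It assumes the arc list of the xyz example shown in the code (a reconstruction from the values quoted in the text) and represents don’t cares simply as vectors absent from the table; the names are hypothetical.

```python
from itertools import product

# ASSUMPTION: arc list of the xyz state graph (signal order x, y, z),
# reconstructed from facts quoted in the text.
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
SIGNALS = "xyz"
STATES = {s for s, _, _ in ARCS} | {t for _, _, t in ARCS}


def next_state(signal):
    """Tabulate f_a as {binary vector: 0 or 1}; vectors without an
    associated state are don't cares and are left out of the table."""
    i = SIGNALS.index(signal)
    rising = {s for s, e, _ in ARCS if e == signal + "+"}
    falling = {s for s, e, _ in ARCS if e == signal + "-"}
    table = {}
    for s in STATES:
        if s in rising:        # s in ER(a+)
            table[s] = 1
        elif s in falling:     # s in ER(a-)
            table[s] = 0
        else:                  # quiescent: the signal keeps its value
            table[s] = int(s[i])
    return table


F = {a: next_state(a) for a in SIGNALS}
dc_set = {"".join(bits) for bits in product("01", repeat=3)} - STATES
print({a: F[a]["101"] for a in SIGNALS})
print("DC set:", dc_set)
```

Because every state of this SG has a distinct binary code, the table is well defined without a separate CSC check; a general implementation would have to detect two states with the same code and conflicting values.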
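For a lazy SG the next-state functions are defined differently:
$$f_a(z) = \begin{cases} 1 & \text{if } \exists s \in FR(a+) \cup QR(a+) : v(s) = z \\ 0 & \text{if } \exists s \in FR(a-) \cup QR(a-) : v(s) = z \\ - & \text{otherwise} \end{cases}$$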
Note that this definition generally gives more don’t care vectors than the definition for an SG, for two reasons:
• More states are unreachable, since timing assumptions can reduce concurrency.
• States in (PEnR - FR) do not belong to either FR or QR, and hence are included into the DC-set.
As an example, in the lazy SG of Figure 2.(e), $f_x(101) = -$; $f_y(101) = f_z(101) = 1$, as explained in Section 4.2.
The conditions for speed-independent implementability can be trivially extended to lazy SGs.
2.7 Logic synthesis 
From the next-state functions of an SG, a speed-independent circuit can be derived by implementing the boolean equation of each output signal as an atomic complex gate [4] or as a generalized C-element [1, 7]. For example, a speed-independent complex gate implementation for the STG in Figure 2.(a) is a netlist:
$$x = \overline{z + \bar{x} y}; \quad y = x + z; \quad z = x + z \bar{y}.$$
Similarly, from the next-state function specification corresponding to a lazy SG, an RT-circuit can be derived in the form of complex gates or generalized C-elements as illustrated by an example in Sections 4.1-4.2.
2.8 Monotonic covers
Not every logic function derived from the definition of the next-state function satisfies hazard-freedom conditions, and hence not every such function is valid. The following definition is related to hazards in the behavior of asynchronous circuits.
Given two sets of states $S_1$ and $S_2$ of an SG such that $S_2 \subseteq S_1$, we will say that $S_1$ is a monotonic cover of $S_2$ if for each transition $s \xrightarrow{e} s'$:
$$(s \in S_1 - S_2 \Rightarrow s' \in S_1) \wedge (s \in S_2 \Rightarrow s' \notin S_1 - S_2)$$
Only monotonic covers of FRs can be selected as EnRs for hazard-free solutions for the logic netlist [3]. If $S_1 = EnR(a*)$ and $S_2 = FR(a*)$, then (1) no disabling of $a*$ is possible and (2) there are no transitions from $FR(a*)$ to $EnR(a*) - FR(a*)$, i.e., no disabling of firings for $a*$ is possible either. Hence, persistency of $a*$ in the RT implementation is guaranteed. For example, in the SG of Figure 2.(b), the set {101, 110, 111} is a monotonic cover of ER(x-). However, the set {100, 101, 111} is not, since the transition $100 \xrightarrow{y+} 110$ violates the conditions for monotonicity.
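The monotonic cover condition can be checked arc by arc. The sketch below assumes the arc list of the xyz example shown in the code (a reconstruction from the facts quoted in the text); `is_monotonic_cover` is a hypothetical helper name.

```python
# ASSUMPTION: arc list of the xyz state graph (signal order x, y, z),
# reconstructed from facts quoted in the text.
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]


def is_monotonic_cover(s1, s2, arcs):
    """True if s1 is a monotonic cover of s2 (s2 must be a subset of s1):
    no arc leaves s1 from s1 - s2, and no arc goes from s2 into s1 - s2."""
    if not s2 <= s1:
        raise ValueError("S2 must be a subset of S1")
    ring = s1 - s2
    for src, _, dst in arcs:
        if src in ring and dst not in s1:
            return False
        if src in s2 and dst in ring:
            return False
    return True


ER_x_minus = {s for s, e, _ in ARCS if e == "x-"}
print(is_monotonic_cover({"101", "110", "111"}, ER_x_minus, ARCS))
print(is_monotonic_cover({"100", "101", "111"}, ER_x_minus, ARCS))
```

The second candidate fails because of the arc from 100 to 110, which leaves the candidate set from a state outside ER(x-).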
3 Automatic generation of relative timing assumptions
3.1 Ordering relations 
Let $TS = (S, T, E, s_0)$ be a transition system. Assume that every event in $E$ corresponds to a single connected excitation region.
Definition 3.1 (Conflict) An event $e_1 \in E$ disables another event $e_2 \in E$ if $\exists s_1 \xrightarrow{e_1} s_2$ such that $s_1 \in ER(e_2)$ and $s_2 \notin ER(e_2)$. Two events $e_1, e_2 \in E$ are in direct conflict if $e_1$ disables $e_2$ or $e_2$ disables $e_1$.
Definition 3.2 (Concurrency) Two events $e_1, e_2 \in E$ are concurrent (denoted by $e_1 \parallel e_2$) if they form a state diamond, i.e.
1. $ER(e_1) \cap ER(e_2) \neq \emptyset$,
2. $\forall s \in ER(e_1) \cap ER(e_2) : (s \xrightarrow{e_1} s_1) \in T \wedge (s \xrightarrow{e_2} s_2) \in T \Rightarrow \exists s_3 \in S : (s_1 \xrightarrow{e_2} s_3) \in T \wedge (s_2 \xrightarrow{e_1} s_3) \in T$.
Definition 3.3 (Trigger) An event $e_1 \in E$ triggers another event $e_2 \in E$ (denoted by $e_1 \to e_2$) if $\exists s_1 \xrightarrow{e_1} s_2$ such that $s_1 \notin ER(e_2)$ and $s_2 \in ER(e_2)$.
Definition 3.4 (Enabled before) Let $e_1, e_2 \in E$ be two concurrent events. $e_1$ can be enabled before $e_2$ (denoted by $e_1 \lhd e_2$) if $\exists s_1 \to s_2$ such that $s_1 \in ER(e_1) - ER(e_2)$ and $s_2 \in ER(e_1) \cap ER(e_2)$.
Definition 3.5 (Enabled simultaneously) Let $e_1, e_2 \in E$ be two concurrent events. $e_1$ and $e_2$ can be enabled simultaneously (denoted by $e_1 \diamond e_2$) if $\exists s_1 \to s_2$ such that $s_1 \notin ER(e_1) \cup ER(e_2)$ and $s_2 \in ER(e_1) \cap ER(e_2)$.
Definition 3.4 can be extended to sets of events as follows.
Definition 3.6 (Enabled before a set of events) Let $e \in E$ be an event pairwise concurrent with all the events in the set $X = \{e_1, \ldots, e_n\} \subseteq E$. $e$ can be enabled before $X$ (denoted by $e \lhd X$) if $\exists s_1 \xrightarrow{e'} s_2$ such that $s_1 \in ER(e) - ER(X)$, $s_2 \in ER(e) \cap ER(X)$ and $e' \notin X$, where $ER(X) = ER(e_1) \cup \ldots \cup ER(e_n)$.
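The pairwise relations of Definitions 3.2 to 3.5 translate almost literally into predicates over an arc list. The sketch below is illustrative: it uses the xyz state graph (an arc list reconstructed from facts quoted in Section 2, hence an assumption) rather than the transition system of Figure 1, assumes a deterministic TS, and all function names are hypothetical.

```python
# ASSUMPTION: arc list of the xyz state graph (signal order x, y, z),
# reconstructed from facts quoted in the text; the TS is deterministic.
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
SUCC = {(s, e): t for s, e, t in ARCS}


def ER(event):
    return {s for s, e, _ in ARCS if e == event}


def concurrent(e1, e2):
    """Definition 3.2: both events are enabled somewhere and close a
    state diamond wherever they are both enabled."""
    common = ER(e1) & ER(e2)
    if not common:
        return False
    for s in common:
        s1, s2 = SUCC[(s, e1)], SUCC[(s, e2)]
        if SUCC.get((s1, e2)) is None or SUCC.get((s1, e2)) != SUCC.get((s2, e1)):
            return False
    return True


def triggers(e1, e2):
    """Definition 3.3: some arc labeled e1 enters ER(e2) from outside."""
    return any(e == e1 and s not in ER(e2) and t in ER(e2)
               for s, e, t in ARCS)


def enabled_before(e1, e2):
    """Definition 3.4: some arc goes from ER(e1) - ER(e2) into the
    common part of both excitation regions."""
    return any(s in ER(e1) - ER(e2) and t in ER(e1) & ER(e2)
               for s, _, t in ARCS)


def enabled_simultaneously(e1, e2):
    """Definition 3.5: some arc enters the common part from a state in
    which neither event is enabled."""
    return any(s not in ER(e1) | ER(e2) and t in ER(e1) & ER(e2)
               for s, _, t in ARCS)


print(concurrent("y+", "z+"), enabled_simultaneously("y+", "z+"))
print(concurrent("y+", "x-"), enabled_before("y+", "x-"),
      enabled_before("x-", "y+"))
print(triggers("x+", "y+"), concurrent("x-", "z+"))
```

On this graph y+ and z+ are concurrent and simultaneously enabled by x+, while y+ can be enabled before x- but not the other way round.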
Figure 1.b depicts the transition system derived from the Petri net of Figure 1.a. The following facts can be derived using the definitions above: $\neg(a \parallel b)$, $c \parallel f$, $c \lhd f$, $c \diamond e$, etc. Event d cannot be enabled before $\{e, f\}$, but can be enabled before $\{e, f, g\}$ since there is a transition $s_9 \xrightarrow{h} s_{19}$ such that $s_9 \in ER(d) - ER(\{e, f, g\})$, $s_{19} \in ER(d) \cap ER(\{e, f, g\})$ and $h \notin \{e, f, g\}$.
3.2 Delay model
A delay model for events presented in this section gives an informal intuitive motivation for the automatic generation of timing assumptions. This model refers to the delay of the events in the TS. The delay of an event is defined as the difference between its enabling time and its firing time. Three types of events are considered:
Non-input events: the delay is in the interval $[1 - \varepsilon, 1 + \varepsilon]$
Fast input events: the delay is in the interval $(1 + \varepsilon, \infty)$
Slow input events: the delay is in the interval $[\Delta, \infty)$
The synthesis approach also assumes that (1) the delay of a gate implementing a non-input event can be lengthened by delay padding or transistor sizing, (2) the delay of two gates can always be made longer than the delay of one gate. Hence, one can assume that $\varepsilon < 1/3$, (3) the circuit will never take longer than $\Delta$ time units (minimum delay of a slow input event) in becoming stable from any state of the system and a quiescent environment.
The previous assumptions on the timing behavior of the 
circuit can be translated into assumptions on the firing order 
of the events. 
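The bound of 1/3 on the delay deviation of non-input events follows from a short worst-case comparison. The derivation below is a sketch under one assumption: “two gates longer than one gate” is read as the minimum delay of two consecutive non-input events exceeding the maximum delay of a single one, with $\varepsilon$ denoting the deviation used in the non-input delay interval.

```latex
\[
2(1-\varepsilon) > 1+\varepsilon
\;\Longleftrightarrow\; 1 > 3\varepsilon
\;\Longleftrightarrow\; \varepsilon < \tfrac{1}{3}.
\]
```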
3.3 Rules for deriving timing assumptions 
We present rules for deriving timing assumptions in the following format: (1) ordering relations that must be satisfied in a (lazy) SG for a rule to be applied, (2) automatic timing assumption that can be generated, and (3) informal justification of a rule based on the above delay model.
3.3.1 Assumptions between non-input events
The following rules can be applied for deriving timing assumptions between non-input events, $e_1, e_2, e_3 \in E$:
I. Event enabled before another event.
Ordering relations: $(e_1 \parallel e_2) \wedge (e_1 \lhd e_2) \wedge (e_2 \not\lhd e_1) \wedge (e_1 \not\diamond e_2)$.
Difference timing assumption: $e_1$ fires before $e_2$.
Delay assumptions: one gate shorter than two gates.
II. Events simultaneously enabled.
Ordering relations: $(e_1 \parallel e_2) \wedge (e_1 \diamond e_2) \wedge (e_2 \not\lhd e_1)$.
Difference timing assumption: $e_1$ fires before $e_2$.
Delay assumptions: delay of $e_2$ longer than delay of $e_1$.
III. Event triggered by events simultaneously enabled.
Ordering relations: $(e_1 \parallel e_2) \wedge (e_1 \not\lhd e_2) \wedge (e_2 \not\lhd e_1) \wedge [(e_1 \to e_3) \vee (e_2 \to e_3)]$.
Simultaneity timing assumption: $e_1$ and $e_2$ simultaneous wrt $e_3$.
Delay assumptions: one gate shorter than two gates.
IV. Early (speculative) enabling for ordered events.
Ordering relations: $(e_1 \to e_2)$.
Early enabling timing assumption: $e_1$ fires before $e_2$ (but $e_2$ can be enabled concurrently with $e_1$).
Delay assumptions: delay of $e_1$ shorter than delay of $e_2$.
Let us illustrate the previous cases with the example of Figure 1, assuming that all events are non-input. Timing assumptions of type I can be derived for the pairs of events (c, f), (c, g) and (e, d), where the first element of the pair is assumed to fire before the second. Timing assumptions of type II can be applied to the pairs (b, h) and (c, e). Timing assumptions of type III can be applied, e.g., to the events triggered by the pair (b, h), which triggers the events c, e and g. Timing assumptions of type IV can be applied, e.g., to the event d triggered by the event c. If this assumption applies, then the potential enabling region for d includes states {s2, s5, s8, s12, s15, s18, s21} as don’t care states for the values of the next-state function for signal d, in addition to the originally present states {s3, s6, s9, s13, s16, s19, s22}.
3.3.2 Assumptions between non-input and input events
Assume that $e_1, e_2 \in E$ are a non-input and an input event respectively and they are concurrent.
V. Input not enabled before non-input event.
Ordering relations: $(e_1 \parallel e_2) \wedge (e_2 \not\lhd e_1)$.
Difference timing assumption: $e_1$ fires before $e_2$.
This assumption covers the ones of type I and II for the case in which $e_2$ is an input event. The delay assumption used in this case states that the response time of the environment will always be longer than the delay of one gate.
3.3.3 Assumptions between non-input events and slow 
input events 
Assume that $e \in E$ is a slow input event, $X = \{e_1, \ldots, e_n\} \subseteq E$ is a set of non-input events and $e$ is pairwise concurrent with all the events in $X$.
Ordering relations: $(\forall e_i \in X : e \parallel e_i) \wedge (e \not\lhd X)$.
Difference timing assumptions: $X$ fires before $e$.
Delay assumptions: delay of slow input event longer than $\Delta$ (delay of stabilizing the circuit under a quiescent environment).
To illustrate the meaning of this timing assumption we will consider that h is an input event and d is a slow input event in the example of Figure 1. The rest of the events are non-input. After firing the events a, b and c, a state in which d, e and h are enabled is reached (state s3). At this point it can be assumed that e and f will fire before d (two gate delays vs. slow environment). However, no assumptions can be made about the firing order between d and g, since g is preceded by an input event (h) for which no upper bound on its delay can be assumed. If h were a non-input event, d would be assumed to fire after h and g as well.
4 Backannotation of timing constraints 
After logic synthesis, the validity of the timing assumptions must be verified or validated to ensure the correct function of the circuit. However, the circuit may be correct for a set of states larger than the one defined by the timed domain, which can be obtained by a set of less stringent timing assumptions. In other words, some of the timing assumptions are redundant for a particular logic synthesis solution, while some others can be relaxed. This section attempts to answer the following question:
Can we derive a minimal set of timing assumptions sufficient for a circuit to be correct?
This set of timing assumptions backannotated for a given logic synthesis solution is called timing constraints. Timing assumptions (both manual and automatic) are part of the specification and provide additional freedom for logic synthesis, while timing constraints are part of the implementation, since they constitute requirements to be met that are sufficient for a particular netlist solution to be valid.
4.1 Example 1 
Let us analyze the example in Figure 2. The shadowed states in the SG of Figure 2.(b) correspond to the timed domain determined by the timing assumptions
z+ < y+ and y+ < x-
Under these assumptions, logic synthesis can be performed by considering the states 110 and 001 unreachable, i.e. in the don’t care set of the logic functions for all signals x, y, z.
The circuits of Figures 2.(c) and 2.(d) have a correct behavior under the previous assumptions. Looking at the circuit of Figure 2.(c) we observe that:
• The gates $x = \overline{z + \bar{x} y}$ and $y = x + z$ are correct implementations for the whole untimed domain.
• The gate $z = x$ is a correct implementation for all the states except for 001. In this state x = 0 and z- is enabled according to the next-state function of the implementation, but it is not enabled in this state according to the original state graph specification.
Thus, even though the circuit may have been obtained using the two previous assumptions, only one relative timing constraint, y+ < x-, must be ensured for the circuit to be correct. In general, each gate of the circuit is correct for a subset of the untimed domain which is also a superset of the timed domain. The circuit is correct for those states in which all gates are correct.
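The gate-by-gate argument of Example 1 can be replayed mechanically by comparing each gate with the specified next-state value in every untimed state. The sketch below rests on two assumptions: the arc list of the xyz state graph and the three gate equations taken for the circuit of Figure 2.(c) are both reconstructed from the values quoted in the text, and the helper names are hypothetical.

```python
# ASSUMPTION: arc list of the xyz state graph (signal order x, y, z),
# reconstructed from facts quoted in the text.
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
SIGNALS = "xyz"
STATES = {s for s, _, _ in ARCS} | {t for _, _, t in ARCS}

# ASSUMPTION: gate equations read for the circuit of Figure 2.(c).
GATES = {
    "x": lambda x, y, z: int(not (z or (not x and y))),
    "y": lambda x, y, z: int(x or z),
    "z": lambda x, y, z: x,
}


def spec_next(signal, state):
    """Next-state value of `signal` required by the untimed SG."""
    if any(s == state and e == signal + "+" for s, e, _ in ARCS):
        return 1
    if any(s == state and e == signal + "-" for s, e, _ in ARCS):
        return 0
    return int(state[SIGNALS.index(signal)])


def wrong_states(signal):
    """Untimed states in which the gate disagrees with the specification."""
    return {s for s in STATES
            if GATES[signal](*map(int, s)) != spec_next(signal, s)}


wrong = {a: wrong_states(a) for a in SIGNALS}
print(wrong)

# Arcs entering the single offending state show which ordering has to be
# enforced to keep it unreachable.
entering = [(s, e) for s, e, t in ARCS if t == "001"]
print(entering)
```

The only arc entering 001 is an x- arc leaving 101, a state in which y+ is also enabled, which is consistent with y+ < x- being the single constraint that has to be backannotated.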
Fig. 3: Next-state functions for the xyz example: (a) Original untimed specification; (b) Specification for RT assumptions “z+ < y+ and y+ < x-”; (c,d) Implementations from Figures 2.(c,d); (e) Specification for RT assumption “y+, z+ simultaneous with respect to x-”; (f) Implementation from Figure 2.(f). Legend: specification: global DC, local DC; implementation: unreachable, concurrency reduction, timing constraints required.
4.2 Example 2
Let us consider the same example under a simultaneity assumption “y+ and z+ are simultaneous with respect to x-”. Under this assumption state 001 is unreachable and becomes a don’t care for all signals. In addition, states 101 and 110 become don’t cares for signal x, since both belong to the potential EnR(x-) according to the semantics of the simultaneity assumption. Only one timing constraint, z+ < x-, is sufficient for the circuit in Figure 2.(f) to be correct. Gate $x = \bar{y}$ is not enabled in 101, hence concurrency is reduced in this state with respect to the original untimed SG and state 001 becomes unreachable under any gate delays. State 110, on the contrary, corresponds to the concurrency expansion for enabling of x-. This enabling is lazy since $110 \in EnR(x-) \wedge 110 \notin FR(x-)$.
Figure 3 shows Karnaugh maps for the next-state functions of signals x, y, and z for specifications and implementations corresponding to the examples above. A legend shows that timing assumptions provide two types of don’t care vectors in RT specifications: global don’t cares corresponding to states unreachable due to timing assumptions, and local don’t cares that differ for different signals. In the RT implementations some states become unreachable due to untimed concurrency reduction, and therefore discrepancies in the corresponding values of the next-state functions compared with the original untimed specification can be ignored; some discrepancies correspond to concurrency reduction (disabling of signal transitions without persistency violation); and finally, other discrepancies correspond to lazy enabling and require timing constraints for correct circuit behavior.
4.3 Correctness of RT circuit 
Let $S$ be an original untimed SG with a finite set of reachable states $U$ and initial state $s_0$. Let $G$ be a circuit netlist implementing $S$ under timing constraints $C$. A pair $\langle G, C \rangle$ is called a relative timing circuit (RT circuit). It defines a lazy SG, $L_{\langle G,C \rangle}$, with a set of reachable states $U_L$. The RT-circuit implementation can contain more signals than the original specification $S$ if some state signals are inserted for resolving state conflicts. Let us assume that $S$ has $n$ signals and $L$ has $k$, $k \geq n$, signals. Then for comparing states one needs to use a homomorphism $h : B^k \to B^n$ that, given an implementation state, hides the $(k - n)$ new internal signals and obtains a specification state. The homomorphism $h$ is naturally extended to sets of states.
An RT-circuit is said to be correct if the following conditions are satisfied:
1. $h(U_L) \subseteq U$, i.e. no states outside the original untimed domain are reachable by the RT-circuit.
2. All signals persistent in $S$ are also persistent in the lazy SG $L_{\langle G,C \rangle}$. All state signals inserted in $L_{\langle G,C \rangle}$ are persistent. Commutativity and determinism are preserved.
3. The initial state is preserved with respect to the I/O interface, i.e., if $s_0 \in S$ and $s'_0 \in L_{\langle G,C \rangle}$ are the initial states of the original SG and the lazy SG corresponding to the implementation, then there is a path $s_0 \xrightarrow{\tau} h(s'_0)$ or $h(s'_0) \xrightarrow{\tau} s_0$ in $S$ such that the sequence $\tau$ contains only events of internal signals, not observable by the environment.
4. No events disappear: if $ER_S(e) \neq \emptyset$, then $FR_L(e) \neq \emptyset \wedge h(FR_L(e)) \subseteq ER_S(e)$.
5. No new deadlock states appear in $L_{\langle G,C \rangle}$.
4.4 Theory for backannotation 
For the ease of exposition let us assume that no state signals are inserted in the RT circuit, and therefore the number of signals stays the same for $S$ and $L$. We will briefly discuss how state signal insertion is done in Section 4.8. Let $U$ be the set of states reachable in the untimed domain of a state graph and $\mathcal{T} \subseteq U$ the set of states reachable under a set of timing assumptions, both manual (provided by the user) and automatic (derived for synthesis according to the rules of Section 3). Let us assume that we have a circuit with $m$ output signals, $a_1, \ldots, a_m$. Let $G = \{g_{a_1}(X), \ldots, g_{a_m}(X)\}$ (where $X$ is the set of signals) be a set of gates implementing the RT circuit, where $g_{a_i}(X)$ denotes the boolean function implemented by the gate of signal $a_i$.
Reachable states in the untimed domain
Let us call $\mathcal{R}(G)$ the set of states reachable in the untimed domain for the circuit $G$. Note that, in general, $U - \mathcal{R}(G) \neq \emptyset$ due to the reduction of concurrency imposed by the circuit, and $\mathcal{R}(G) - U \neq \emptyset$ due to expansion of concurrency for enabling for lazy transitions. The latter states are not generated by our procedure since they must be unreachable in the RT domain anyway. The former states are of interest, since they do not require any timing constraints (see examples 4.1 and 4.2). Let us denote $U_G = \mathcal{R}(G) \cap U$. $U_G$ can be calculated as follows:
1. For each output signal $a_i$, calculate $disabled(a_i) = \{s \in U \mid s \notin EnR_G(a_i) \wedge s \in ER_S(a_i)\}$, i.e. states in which $a_i$ was enabled in the untimed domain in SG $S$, but made stable by the circuit.
2. For each output signal $a_i$, remove all arcs $s \xrightarrow{a_i}$ from the SG for all states $s \in disabled(a_i)$.
3. Calculate the new set $U_G = \mathcal{R}(G) \cap U$ of reachable states.
²Our implementation is currently limited to bounded untimed STGs and SGs. It can be easily extended to unbounded untimed STGs by making unbounded (infinitely growing) markings of STGs unreachable in the RT domain.
Fig. 4: Formulation of the backannotation problem. {C1, C2, C3} is the set of timing constraints sufficient for correctness of the RT solution.
States with incorrect behavior 
Let us call $incorrect(G) \subseteq U_G$ the set of states inside $U_G$ that are required to be unreachable for the correctness of the circuit. These states can be calculated as follows:
1. For each output signal $a_i$, calculate $incorrect(a_i) = \{s \in (U - \mathcal{T}) \mid s \in EnR_G(a_i) \wedge s \in QR_S(a_i)\}$, i.e. states in which $a_i$ was stable in the untimed domain, but enabled in the circuit.
2. $incorrect(G) = U_G \cap (\bigcup_{a_i} incorrect(a_i))$
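The two calculations above can be sketched for a single gate. The example below applies them to the gate taken for signal x in Example 2 (x equal to the complement of y) and assumes that the gates of y and z follow the untimed specification; the arc list, the initial state 000, the timed domain and all helper names are assumptions made for illustration.

```python
# ASSUMPTION: arc list of the xyz state graph (signal order x, y, z),
# reconstructed from facts quoted in the text; 000 is assumed initial.
ARCS = [
    ("000", "x+", "100"), ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"), ("101", "x-", "001"),
    ("111", "x-", "011"), ("001", "y+", "011"), ("011", "z-", "010"),
    ("010", "y-", "000"),
]
INITIAL = "000"


def reachable(arcs, initial):
    seen, frontier = {initial}, [initial]
    while frontier:
        s = frontier.pop()
        for src, _, dst in arcs:
            if src == s and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen


U = reachable(ARCS, INITIAL)      # untimed domain
T = U - {"001"}                   # timed domain of Example 2 (assumed)


def enabled_in_spec(state):
    """x is excited in the untimed SG."""
    return any(s == state and e[0] == "x" for s, e, _ in ARCS)


def enabled_in_circuit(state):
    """x is excited by the assumed gate x = not y."""
    x, y, _ = map(int, state)
    return int(not y) != x


# Step 1 and 2 of the first list: states where the gate keeps x stable
# although the specification enables it, and the arcs removed there.
disabled_x = {s for s in U if enabled_in_spec(s) and not enabled_in_circuit(s)}
arcs_G = [(s, e, t) for s, e, t in ARCS
          if not (e[0] == "x" and s in disabled_x)]
U_G = reachable(arcs_G, INITIAL) & U

# States outside the timed domain where the gate enables x although the
# specification keeps it stable, restricted to U_G.
incorrect_x = {s for s in U - T
               if enabled_in_circuit(s) and not enabled_in_spec(s)}
incorrect_G = U_G & incorrect_x

# States inside the timed domain with early (lazy) enabling of x.
lazy_x = {s for s in T if enabled_in_circuit(s) and not enabled_in_spec(s)}

print(disabled_x, sorted(U - U_G), incorrect_G, lazy_x)
```

In this sketch the only incorrect state, 001, is already outside U_G because the gate disables x- in 101, so no constraint is needed to exclude it; the lazily enabled state 110 is the one that calls for an ordering of x- after a concurrent event.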
Backannotation: problem formulation 
We need a set of constraints that make the states in $incorrect(G)$ unreachable. A trivial solution to this problem is to take the complete set of timing assumptions used for logic synthesis, i.e. those for which $\mathcal{T}$ is the set of reachable states. Our goal, however, is to find the least stringent set of constraints sufficient to make the circuit correct. Given a set of timing constraints $C = \{C_1, \ldots, C_p\}$, we will call $\mathcal{R}(C) \subseteq U$ the set of states reachable after applying $C$ in the untimed domain. In general, the problem can be formulated as follows (see Figure 4):
Find a set of constraints $C$ with the largest $\mathcal{R}(C)$ such that
1. $\mathcal{T} \subseteq \mathcal{R}(C) \subseteq U - incorrect(G)$
2. $\forall s \in \mathcal{T} : (s \in EnR_G(a_i*) \wedge s \notin ER_S(a_i*)) \Rightarrow \exists a_j* : s \xrightarrow{a_j*} s' \wedge s' \in \mathcal{T} \wedge (a_j* < a_i*) \in C$
The first condition guarantees that no incorrect states inside $U$ are reachable (constraints C1, C2 in Figure 4), whereas the second makes sure that no states outside $U$ can be reached in the RT circuit (constraint C3 in Figure 4).
4.5 Finding a set of timing constraints
Relative timing constraints are defined in terms of the firing order of events. Constraining a firing order between a pair of events only makes sense when they can be enabled simultaneously and fire in any order, i.e. when they are concurrent. Thus, each timing constraint $C_i$ can be denoted by an ordered pair of concurrent events, e.g. $C_i = (e_j < e_k)$.
Given a constraint $C_i = (e_j < e_k)$, we define the set of arcs $disabled(C_i)$ as
$$disabled(C_i) = \{\, s \xrightarrow{e_k} s' \mid \exists\, s \rightarrow s_1 \rightarrow \cdots \rightarrow s_n : s_1, \ldots, s_{n-1} \in ER(e_k) \wedge s_n \in ER(e_k) \cap ER(e_j) \,\}$$
order   unreachable
b<d     {s4, s7}
b<e     {s7}
c<d     {s4, s5, s7, s8}
c<e     {s7, s8}
d<b     {s2, s3}
d<c     {s3}
e<b     {s2, s3, s5, s6}
e<c     {s3, s6}
Fig. 5: Example for backannotation with table of unreachable states for each pair of ordered events.
In particular, the path $s_1 \rightarrow \cdots \rightarrow s_n$ can be empty if $s \in ER(e_j) \cap ER(e_k)$. $disabled(C_i)$ is the set of arcs with label $e_k$ that must not fire in order for $e_j$ to fire before $e_k$, i.e. those arcs with source states in which both events are concurrent or preceding $ER(e_j) \cap ER(e_k)$ inside $ER(e_k)$.
Given a set of constraints $C = \{C_1, \ldots, C_n\}$, $\pi(C)$ is the set of reachable states after removing the arcs in
$$\bigcup_{C_i \in C} disabled(C_i)$$
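The definitions of $disabled(C_i)$ and $\pi(C)$ can be sketched with explicit sets. The state graph below is a hypothetical example (events b and d are concurrent in s1, and d is already enabled in s0), and the function names `er`, `disabled` and `pi` are illustrative only; a symbolic implementation would work on BDDs rather than on enumerated arcs.

```python
# Hypothetical state graph: (source, event, destination) arcs.
ARCS = {
    ("s0", "a", "s1"), ("s0", "d", "s5"), ("s5", "a", "s3"),
    ("s1", "b", "s2"), ("s1", "d", "s3"),
    ("s2", "d", "s4"), ("s3", "b", "s4"),
}
INITIAL = "s0"


def er(event, arcs):
    """ER(event): states in which the event is enabled."""
    return {src for (src, ev, _dst) in arcs if ev == event}


def disabled(first, second, arcs):
    """Arcs labelled `second` that must not fire so that `first` fires earlier."""
    er_second = er(second, arcs)
    # States where both events are enabled (the empty path of the definition).
    blocked = er(first, arcs) & er_second
    # Add states of ER(second) that reach a blocked state while staying in ER(second).
    changed = True
    while changed:
        changed = False
        for (src, _ev, dst) in arcs:
            if src in er_second and dst in blocked and src not in blocked:
                blocked.add(src)
                changed = True
    return {(s, ev, d) for (s, ev, d) in arcs if ev == second and s in blocked}


def pi(constraints, arcs, start):
    """States reachable after removing disabled(C_i) for every constraint."""
    removed = set()
    for (first, second) in constraints:
        removed |= disabled(first, second, arcs)
    kept = arcs - removed
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for (src, _ev, dst) in kept:
            if src == state and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


print(sorted(disabled("b", "d", ARCS)))
print(sorted(pi([("b", "d")], ARCS, INITIAL)))
```

For the constraint b < d, both the d-arc leaving s1 (where b and d are concurrent) and the d-arc leaving s0 (which precedes s1 inside ER(d)) are removed, so s3 and s5 drop out of the reachable set.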
4.6 Example
Figure 5 shows an example for deriving a set of timing constraints for backannotation. Initially we have $U = \{s_0, \ldots, s_{10}\}$ and $I = \{s_0, s_1, s_2, s_5, s_8, s_9, s_{10}\}$. Let us assume that $s_6$ and $s_7$ are the states in which the behavior of the circuit is incorrect. The table in Figure 5 contains the set of states that become unreachable by reducing the concurrency between each pair of concurrent events³. For example, by imposing the order d < b, the states $s_2$ and $s_3$ become unreachable.
The problem to be solved is the following: find a set of ordering constraints between pairs of events such that the new set of reachable states covers I and does not intersect the set of incorrect states $\{s_6, s_7\}$. Moreover, we want to maximize the set of reachable states, i.e. to find the least stringent set of timing constraints.
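For an example of this size the search can be done exhaustively. The sketch below is an illustration, not the procedure used by petrify: the sets `I`, `INCORRECT` and `UNREACHABLE` are the values of this example as read from Figure 5 (an assumption of this sketch), the cost is taken to be the number of states removed, and `least_stringent` is a hypothetical helper name.

```python
from itertools import combinations

U = {f"s{i}" for i in range(11)}
I = {"s0", "s1", "s2", "s5", "s8", "s9", "s10"}
INCORRECT = {"s6", "s7"}
# Unreachable states per ordering, as read from the table of Figure 5 (assumed).
UNREACHABLE = {
    "b<d": {"s4", "s7"},
    "b<e": {"s7"},
    "c<d": {"s4", "s5", "s7", "s8"},
    "c<e": {"s7", "s8"},
    "d<b": {"s2", "s3"},
    "d<c": {"s3"},
    "e<b": {"s2", "s3", "s5", "s6"},
    "e<c": {"s3", "s6"},
}


def least_stringent(table):
    """Exhaustively pick the constraint set that removes the fewest states."""
    best, best_removed = None, None
    names = sorted(table)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            removed = set().union(*(table[c] for c in subset))
            if removed & I or not INCORRECT <= removed:
                continue  # must keep I reachable and hide every incorrect state
            # Strict comparison: smaller subsets win ties because they come first.
            if best is None or len(removed) < len(best_removed):
                best, best_removed = subset, removed
    return best, U - best_removed


constraints, still_reachable = least_stringent(UNREACHABLE)
print(constraints, sorted(still_reachable))
```

Taking the union of the per-constraint unreachable sets is only valid as a shortcut for this example; in general the removed arcs have to be computed and reachability recomputed.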
The problem can be posed as a covering problem. The cells of the table in bold correspond to those constraints that do not remove any state from I. The covering problem can be formulated as follows:
$$(e < c) \wedge (b < d \vee b < e)$$
with the minimum-cost solution $C = \{e < c, b < e\}$ and $\pi(C) = \{s_0, s_1, s_2, s_4, s_5, s_8, s_9, s_{10}\}$.
³ For simplicity, unreachable states are reported in the table for this example. In general, the analysis must be performed by calculating the removed disabled arcs. In this particular case, the resulting analysis is the same.
4.7 Solving the covering problem
The covering problem for backannotation does not correspond to a unate covering problem, since the cost of the final solution (number of disabled arcs) is not the sum of the cost of each constraint.
Currently, petrify uses a greedy approach to solve the covering problem that can be easily implemented by symbolic BDD-based techniques. It merely consists in choosing the constraint that removes the maximum number of arcs whose destination is in incorrect(G) and that have not been removed by previous constraints. This process is iteratively repeated until all incorrect states become unreachable.
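A greedy selection of this kind can be sketched as follows. The sketch makes several assumptions that are not taken from petrify: it counts incorrect states rather than arcs (the simplification used for the example of Figure 5), it only considers constraints that leave I intact, and it breaks ties in favour of the constraint that removes fewer states overall. The name `greedy_cover` and the dictionary encoding are hypothetical.

```python
def greedy_cover(candidates, incorrect):
    """Pick constraints greedily until every incorrect state is removed."""
    remaining = set(incorrect)
    chosen = []
    while remaining:
        # Gain: incorrect states newly removed; ties favour smaller removals
        # (the tie-breaking rule is an assumption of this sketch).
        best = max(
            candidates,
            key=lambda c: (len(candidates[c] & remaining), -len(candidates[c])),
        )
        if not candidates[best] & remaining:
            raise ValueError(f"cannot make {sorted(remaining)} unreachable")
        chosen.append(best)
        remaining -= candidates[best]
    return chosen


# Constraints of the example that do not remove any state of I (assumed encoding).
CANDIDATES = {
    "b<d": {"s4", "s7"},
    "b<e": {"s7"},
    "d<c": {"s3"},
    "e<c": {"s3", "s6"},
}
print(greedy_cover(CANDIDATES, {"s6", "s7"}))
```

Without the tie-breaking rule the greedy choice could equally start with b < d, which also hides s7 but removes one more state; a greedy pass is a heuristic and does not guarantee the least stringent set.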
Fig. 6: (a) FIFO controller, (b) Specification, (c) Specification with state encoding signal, (d) RT implementation with gC elements, (e) Timing constraints sufficient for correctness.
In some cases, not all the incorrect states can be made unreachable since the timed state space has been produced by early enabling some events. In those cases, a similar iterative process is executed to cover those incorrect states that can be legalized by early enabling. As an example, consider the state $s_6$ in Figure 5. Assume that $s_6$ is incorrect since the next-state function indicates that f is enabled in that state. The state could be made correct by extending EnR(f) towards $s_6$ and imposing the type-IV constraint e < f.
4.8 Timing aware state encoding 
The problem of state encoding consists in inserting state signals for resolving CSC conflicts. State encoding in our implementation is automatically solved using an extension of the method presented in [4]:
• Only those encoding conflicts reachable in the RT domain are considered in the cost function, so that no effort is invested in solving conflicts unreachable in the RT domain, I.
• Automatic timing assumptions can be generated for inserted state signals using rules from Section 3, implying that the state signals can be implemented as k? logic.
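The first point can be illustrated with a small sketch that counts only the CSC conflicts whose two states both lie in the RT state space. The state table, the set `I` and the function names are hypothetical, and a conflict is taken here to be a pair of states with the same binary code but different sets of enabled output events.

```python
from itertools import combinations

# Hypothetical states: binary code and set of enabled output events.
STATES = {
    "s0": ("00", frozenset({"x+"})),
    "s1": ("10", frozenset()),
    "s2": ("00", frozenset()),         # same code as s0, different outputs
    "s3": ("10", frozenset({"y+"})),   # same code as s1, different outputs
}
I = {"s0", "s1", "s2"}  # assumed states reachable in the RT domain


def csc_conflicts(states):
    """Pairs of states with equal codes but different enabled outputs."""
    return {
        (p, q)
        for p, q in combinations(sorted(states), 2)
        if states[p][0] == states[q][0] and states[p][1] != states[q][1]
    }


def rt_conflicts(states, rt_states):
    """Conflicts in which both states are reachable in the RT domain."""
    return {
        (p, q)
        for (p, q) in csc_conflicts(states)
        if p in rt_states and q in rt_states
    }


print(sorted(csc_conflicts(STATES)), sorted(rt_conflicts(STATES, I)))
```

Here the conflict between s1 and s3 needs no state signal because s3 is unreachable under the timing assumptions, so only the conflict between s0 and s2 would enter the cost function.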
5 Experimental results 
5.1 Academic examples 
The results for the well-known benchmarks used in academia are presented in Table 1. Tables 1.(a) and 1.(b) present the results for specifications with and without state coding conflicts, respectively. $SI_a$, $SI_d$ and TI represent area and delay optimization for speed-independent design, and relative timing results, correspondingly.
For each experiment, area is estimated as the number of literals of the set and reset networks of generalized C elements. Delay (response time) is estimated as the average number of non-input events in the critical path between the firing of two input events. Comparing the columns $SI_d$ and TI, we observe a reduction of about 40% in area. The reduction in response time is less than 5% if we consider all events to have a delay of one time unit. However, the performance improvement would be much more pronounced if it were evaluated with actual delays, given that the logic of the timed implementation is much simpler. We report this analysis in Section 5.2.
5.2 Example: a FIFO controller 
In this section we trace the development of a FIFO cell (specified in Figures 6.(a),(b)), a simplified abstraction of a part of the RAPPID design. The modules at the left and right sides of the controller have a similar speed to the controller itself. In fact, these events are generated by twin modules connected at each side. For this reason, it is not wise to assume that the input events are slow.
We simulated four FIFOs using different implementations of the FIFO cell and measured a cycle time of the FIFO and a forward latency (an average event propagation time from li to ro) of a cell. The results, normalized to the delay of an inverter with fan-out four in a given technology, are shown in Table 2.
Table 2: Performance comparison of FIFOs normalized to a fan-out four inverter delay. (Only one entry is legible: SI reshuffled, 7.6.)
For the first relative timing FIFO (reported in the first row) we use an RT circuit derived by petrify using only automatic timing assumptions presented in Figure 6.(e). Proper transistor sizing is required for correct operation of the circuit. No user-defined assumptions on the environment are used. The timing analysis explained in Section 3 has been applied to the specification, and state encoding has been automatically solved as described in Section 4.8. With this strategy, only one additional state signal, x, was required, as shown in Figure 6.(c)⁴. There are some interesting aspects of this implementation:
• The state signal x switches concurrently with other activity in the circuit. This is a result of the state encoding strategy of petrify, which attempts to increase the concurrency of new state signals until they disappear from the critical paths according to the delay model explained in Section 3.
• The response time of the circuit with regard to the environment is only one event (two inverters), i.e. as soon as an output event is enabled it fires without requiring the firing of any other internal event.
Finally, the implementation of Figure 6.(d) requires some timing constraints to be correct. After applying the method proposed in Section 4, five timing constraints between pairs of concurrent events have been derived that are sufficient for the circuit to be correct. They are graphically represented in Figure 6.(e).
The constraints lo+ < x- and ro+ < x- are not independent. Since the implementation of x is x = lo + ro, it is always guaranteed that one of them will hold, whereas the other must be ensured. Since lo+ and ro+ are enabled simultaneously, these constraints will always hold if the delay of two gates is longer than the delay of one gate. Of the remaining constraints, the most stringent is x- < ri+. In the worst case, both ri+ and x- will be enabled simultaneously by ro+. In this case, the delay of x- is required to be shorter than the delay of ri+ (from the environment). In case of a very fast environment, this can be forced by different techniques, e.g. transistor sizing or delay padding for gate x.
For the second FIFO (the second row of the table) we derived a speed-independent circuit using petrify in the mode of automatic concurrency reduction [15] without constraining I/O concurrency of the cell. Because of concurrency reduction only one state signal was required [4], as in the case of the automatic RT solution. However, the state signal was on a critical cycle and the implementation of lo and ro contained additional p-transistors, which made the speed-independent circuit 20-30% slower than the RT one.
5.3 RAPPID control circuits 
In this section we compare manually optimized RT control circuits used for RAPPID [16, 15] with those derived automatically with petrify. For each example, Table 3 reports: manual (obtained by applying relative timing manually), automatic (obtained automatically by petrify and applying relative timing) and speed-independent (obtained automatically by petrify without concurrency reduction).
⁴ This new specification is not strictly a Petri net, since the arcs from lo+ and ro+ to the OR place indicate an or-causality relation: x- is triggered by the first event to fire, whereas the token produced by the latest event is implicitly consumed. An equivalent Petri net is a bit more cumbersome and is omitted for simplicity.
Table 1: Experimental results: specifications without CSC (a) and with CSC (b). The table reports area, response time and number of state signals for $SI_a$, $SI_d$ and TI. Only the following rows are legible together with their circuit name:
circuit          Area               Response time      State signals
                 SI_a  SI_d  TI     SI_a  SI_d  TI     SI_a  SI_d  TI
mmu0             33    47    20     2.31  1.38  1.38   3     3     0
sbuf-ram-write   24    44    24     1.63  1.00  1.00   2     2     1
Table 3: Comparison for two generic representative examples (fifo) and two control circuits from RAPPID (byte-control, tag-unit). Response time is measured in gate delays, area in transistors. m: manual, a: automatic, s: speed-independent. Only the following row is legible together with its circuit name:
circuit     Area (# tr.)     Worst case response time   Average case response time
            m    a    s      m    a    s                m    a    s
tag-unit    31   47   112    4.0  4.0  8.0              4.0  2.7  6.9
From the table it can be deduced that automatic solutions are quite comparable with manually optimized RT designs. The improvement in response time by applying relative timing is about a factor of 2, substantially better than for the examples of Table 1. This is because the designers of these circuits had a stronger interaction with the tool and provided aggressive timing assumptions on the environment that could not be derived automatically.
6 Conclusions
The method for automatic generation of timing assumptions presented in this paper allows the designer to concentrate on defining those timing assumptions that can only be deduced from a detailed knowledge of the environment. The technique for automatic back-annotation of timing constraints relative to a particular RT circuit provides necessary timing information for the down-stream tools. Timing-aware state encoding allows area/delay optimization of RT circuits.
Relative timing presents a “middle-ground” between clocked and asynchronous circuits, and is a fertile area for CAD development. Both burst-mode [14, 17] and speed-independent specifications are at opposite extremes of a more general class of relative timing specifications.
Acknowledgments We would like to thank Shai Rotem, Luciano Lavagno, Alex Kondratyev and Alexandre Yakovlev for their contributions in motivating this work and developing the theory for synthesis with relative timing.
References
[1] S. Burns. General conditions for the decomposition of state holding elements. In International Symposium on Advanced Research in Asynchronous Circuits and Systems, Aizu, Japan, March 1996.
[2] W. S. Coates, J. K. Lexau, I. W. Jones, S. M. Fairbanks, and I. E. Sutherland. A FIFO data switch design experiment. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 4-17, 1998.
[3] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, A. Taubin, and A. Yakovlev. Lazy transition systems: application to timing optimization of asynchronous circuits. In Proceedings of the International Conference on Computer-Aided Design, pages 324-331, November 1998.
[4] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev. A region-based theory for state assignment in speed-independent circuits. IEEE Transactions on Computer-Aided Design, 16(8):793-812, August 1997.
[5] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev. Automatic synthesis and optimization of partially specified asynchronous systems. In DAC, pages 100-115, June 1999.
[6] Henrik Hulgaard and Steven M. Burns. Bounded delay timing analysis of a class of CSP programs with choice. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 2-11, November 1994.
[7] Alain J. Martin. Synthesis of asynchronous VLSI circuits. In J. Staunstrup, editor, Formal Methods for VLSI Design, chapter 6, pages 237-283. North-Holland, 1990.
[8] D. E. Muller and W. C. Bartky. A theory of asynchronous circuits. In Annals of Computing Laboratory of Harvard University, pages 204-243, 1959.
[9] T. Murata. Petri Nets: Properties, analysis and applications. Proceedings of the IEEE, pages 541-580, April 1989.
[10] Chris J. Myers. Computer-Aided Synthesis and Verification of Gate-Level Timed Circuits. PhD thesis, Dept. of Elec. Eng., Stanford University, October 1995.
[11] Chris J. Myers and Teresa H.-Y. Meng. Synthesis of timed asynchronous circuits. IEEE Transactions on VLSI Systems, 1(2):106-119, June 1993.
[12] Radu Negulescu and Ad Peeters. Verification of speed-dependences in single-rail handshake circuits. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 159-170, 1998.
[13] M. Nielsen, G. Rozenberg, and P.S. Thiagarajan. Elementary transition systems. Theoretical Computer Science, 96:3-33, 1992.
[14] S.M. Nowick. Automatic Synthesis of Burst-Mode Asynchronous Controllers. PhD thesis, Stanford University, Dept. of Computer Science, 1993.
[15] S. Rotem, K. S. Stevens, R. Ginosar, P. A. Beerel, C. J. Myers, K. Yun, R. Kol, C. Dike, M. Roncken, and B. Agapiev. RAPPID: An asynchronous instruction length decoder. In Proc. ASYNC, April 1999.
[16] K. S. Stevens, S. Rotem, and R. Ginosar. Relative timing. In Proc. ASYNC, April 1999.
[17] Kenneth Yi Yun. Synthesis of Asynchronous Controllers for Heterogeneous Systems. PhD thesis, Stanford University, August 1994.
