Synthesis of Speed Independent Circuits Based on Decomposition
Tomohiro Yoneda* Hiroomi Onda Chris Myers†
National Institute of Informatics, Tokyo Institute of Technology, University of Utah
yoneda@nii.ac.jp onda@yt.cs.titech.ac.jp myers@ece.utah.edu
Abstract
This paper presents a decomposition method for speed-independent circuit design that is capable of significantly reducing the cost of synthesis. In particular, this method synthesizes each output individually. It begins by contracting the STG to include only transitions on the output of interest and its trigger signals. Next, the reachable state space for this contracted STG is analyzed to determine a minimal number of additional signals which must be reintroduced into the STG to obtain CSC. The circuit for this output is then synthesized from this STG. Results show that the quality of the circuit implementation is nearly as good as the one found from the full reachable state space, but it can be applied to find circuits for which full state space methods cannot be successfully applied. The proposed method has been implemented as a part of our tool nutas (Nii-Utah Timed Asynchronous circuit Synthesis system), and its very first version is available at http://research.nii.ac.jp/~yoneda.
Key Words: Decomposition, synthesis, STGs, abstraction, speed-independent circuits.
1. Introduction
Logic synthesis [1, 2, 3] from low level specification languages is one of the major approaches to the automated synthesis of asynchronous circuits. This approach can potentially synthesize more optimized circuits with higher performance than other methods such as syntax directed translation methods [4, 5, 6, 7, 8, 9]. It, however, usually requires an enumeration of the state space of the given specification, and it often suffers from the state explosion problem.
Thus, large specifications expressed in hardware description 
languages have usually been synthesized by syntax directed 
translation methods or similar techniques that do not require 
state space enumeration, sometimes with local optimiza­
tion techniques such as [10]. This paper tackles the chal­
lenge of using logic synthesis also for large specifications
* This research is supported by JSPS Joint Research Projects.
† This research is supported by NSF Japan Program award INT-0087281 and SRC grant 2002-TJ-1024.
derived from hardware description languages, because it has the potential to provide further global optimization in the future through timed circuit synthesis [11]. In this approach,
a specification written in some high-level language is first 
translated to a signal transition graph (STG), and, then logic 
synthesis is applied to this STG. This method requires a 
compiler to generate STGs with the complete state coding 
(CSC) property, and an efficient logic synthesis method. A preliminary tool for the former is described in [12], and an improved version is described in [13]. Guaranteeing CSC by
such a correct-by-construction method, which may not give 
optimal solutions in the number of inserted state variables, 
is practical for large STGs, because automatic CSC solvers 
sometimes do not handle such STGs well. Note that only 
a small number of inserted state variables are actually used 
to implement each output, and so, the delays of the circuits 
are not significantly affected even in a non-optimal solution. This paper addresses the latter issue, and aims at reducing the average cost of logic synthesis from STGs by decomposing a specification and running the logic synthesis procedure on each small sub-specification.
The idea of decomposition based synthesis was first proposed by Chu [14]. In his work, one primary output is selected, and the given STG is modified by replacing each
transition for the signal that does not affect the output by 
a dummy transition. Then, the modified STG is reduced 
by eliminating selected dummy transitions while preserving 
the behavior. A correct circuit can be synthesized from this 
reduced STG with usually much smaller cost. This work, 
however, had two open problems. First, the reduction of 
STGs, called contraction, was not formalized. For a simple STG such as a marked graph, contraction is easy, but no formalized algorithm for the general case was known at that time. Second, it was not straightforward to decide whether a signal actually affects the output signal, and no algorithm for making this decision is given in his thesis. As
for the first problem, Vogler and Wollowski recently formal­
ized the contraction algorithm using a bisimulation relation 
in [15], and Zheng and Myers developed a timed contrac­
tion algorithm in [16]. On the other hand, Puri and Gu tried 
to solve the second problem in [17]. Their algorithm greed­
ily removes an irrelevant signal (with respect to the output 
signal) such that the number of CSC conflicts does not in­
crease by hiding the signal. This algorithm is, however, not
Proceedings of the 10th International Symposium on Asynchronous Circuits and Systems (ASYNC'04)
1522-8681/04 $20.00 ©2004 IEEE Computer Society
so helpful for our purpose, because it needs the state graph 
of the original STG, which cannot be constructed due to 
state explosion for very large STGs. Beister, Eckstein, and 
Wollowski proposed a similar decomposition based method 
for extended-burst-mode machines [18].
The main contribution of this work is to propose a new 
algorithm to find a sufficient set of input signals for a given 
output for the decomposition based synthesis approach. The 
algorithm starts with a small set of signals that are certainly needed for the output signal, and uses only the state
graphs of the contracted STGs for determining other neces­
sary input signals. Since the state graphs of the contracted 
STGs are usually very small, it does not suffer from the 
state explosion problem. Furthermore, its decision proce­
dure computes candidates of the necessary signals in many 
cases more directly than the greedy algorithm in [17], al­
though some cases need heuristics.
The proposed algorithm, however, has the following re­
strictions on the class of STGs to be handled. First, the given 
STG must be 1-safe and output semi-modular. Intuitively, the number of tokens in each place never exceeds one in a 1-safe STG, and output transitions neither disable nor are disabled by any other transitions in an output semi-modular STG. Note that these are required in almost
all logic synthesis algorithms. Second, the given STG must 
have CSC. This is not so restrictive for our purpose, because 
our compiler from a high-level specification language guar­
antees it as mentioned before. Finally, the given STG must 
satisfy the following two properties: (1) the guided simu­
lation with respect to each output signal terminates, and 
(2) for every two reachable markings of the STG, either 
one is reachable from the other. These requirements come 
from the analysis method of the CSC violation traces, and 
are explained in Section 4. We believe that many specifi­
cations to which logic synthesis is applied satisfy these re­
quirements. At least, every benchmark circuit specification 
shown in Section 5 satisfies them.
The rest of this paper is organized as follows. Section 2 
shows the basic theory of our decomposition based synthe­
sis, where several notations are based on [15] and [19]. Sec­
tion 3 describes the overview of the proposed method, and 
Section 4 explains in detail how the input sets are deter­
mined, which is the main issue of this paper. Several exper­
imental results are shown in Section 5, and Section 6 gives 
our conclusion.
2. Basic theory
An STG $G = (P, T, F, \mu^0, l, In, Out)$ is a labeled net, where $P$ is a set of places, $T$ is a set of transitions ($P \cap T = \emptyset$), $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation, $\mu^0$ is the initial marking, $l : T \to (In \cup Out) \times \{+, -\} \cup \{\lambda\}$ is the labeling function, and $In$ and $Out$ are the input and output signal sets. Let $sig(G)$ denote $In \cup Out$. A transition $t$ with $l(t) = \lambda$ is called dummy. For $w \in sig(G)$, a $w$-transition denotes a transition $t$ with $l(t) = w+$ or $w-$. For any transition $t$, $\bullet t = \{p \in P \mid (p, t) \in F\}$ and $t\bullet = \{p \in P \mid (t, p) \in F\}$ denote the source places and the destination places of $t$, respectively. Transitions $t$ and $t'$ such that $\bullet t \cap \bullet t' \neq \emptyset$ are said to be in conflict. Note that when STGs $G$, $G_1$, etc. are considered, their corresponding components $P$, $T$, etc., $P_1$, $T_1$, etc. are implicitly considered.
A marking $\mu$ of $G$ is any subset of $P$. A transition $t$ is enabled in a marking $\mu$ if $\bullet t \subseteq \mu$ (all its source places have tokens in $\mu$); otherwise, it is disabled. If a transition $t$ is enabled in $\mu$, it can fire, and a new marking $\mu' = (\mu - \bullet t) \cup t\bullet$ is obtained, denoted by $\mu \xrightarrow{t} \mu'$. For a sequence $v = t_1 t_2 \cdots$ of transitions, $\mu \xrightarrow{v} \mu'$ is defined similarly ($\mu$ is equal to $\mu'$ for an empty $v$). $v$ is called a trace if there exists $\mu'$ such that $\mu^0 \xrightarrow{v} \mu'$. Let $trace(G)$ denote the set of all traces of $G$. A trace may contain multiple occurrences of the same transition. In this paper, it is assumed that those occurrences are distinguished in some appropriate way, such as by attaching firing counts.
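The firing rule above can be illustrated with a minimal Python sketch. The four-transition net `NET` and the helper names `enabled`, `fire`, and `run` are hypothetical and chosen only for illustration; markings are sets of places, as in the 1-safe setting assumed by the paper.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Marking = FrozenSet[str]

# Hypothetical toy net: transition -> (source places, destination places).
NET: Dict[str, Tuple[Set[str], Set[str]]] = {
    "a+": ({"p0"}, {"p1"}),
    "x+": ({"p1"}, {"p2"}),
    "a-": ({"p2"}, {"p3"}),
    "x-": ({"p3"}, {"p0"}),
}

def enabled(net, marking: Marking, t: str) -> bool:
    """t is enabled when all its source places are marked."""
    pre, _ = net[t]
    return pre <= marking

def fire(net, marking: Marking, t: str) -> Marking:
    """Return (marking - source places) | destination places."""
    if not enabled(net, marking, t):
        raise ValueError(f"{t} is not enabled")
    pre, post = net[t]
    return frozenset((marking - pre) | post)

def run(net, initial: Marking, trace: List[str]) -> Marking:
    """Fire a sequence of transitions; an empty sequence leaves the marking unchanged."""
    m = initial
    for t in trace:
        m = fire(net, m, t)
    return m
```

Calling `run(NET, frozenset({"p0"}), ["a+", "x+"])` walks the token from `p0` to `p2`.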
Each marking has a state vector, which represents the values of signals in $In \cup Out$. Different markings may have the same state vector. In this paper, a state implies a state vector or a set of markings with the same state vector. It is sometimes convenient to annotate a state with the information whether the outputs are excited to rise or fall. For this purpose, R or F is used in addition to 0 or 1 in state vectors. R represents the binary value of 0, but it implies that the output is excited to rise. F indicates the signal is 1, but it is excited to fall. When these two notations with and without F/R should be distinguished, we call the former decorated states, and the latter nondecorated states. For example, suppose that two markings $\mu$ and $\mu'$ have decorated states (1010) and (101R). They have the common nondecorated state (1010), but the behavior of the output is different in those markings. This situation is called a CSC violation, and these two markings are a CSC violation pair. If an STG has a CSC violation pair, we say that the STG does not have CSC. Otherwise, it has CSC. If an STG does not have CSC, a circuit cannot be synthesized from the STG.
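As a small illustration of this definition, the following Python sketch finds CSC violation pairs among markings whose decorated states are given as strings over 0, 1, R, and F. The function names and the marking names are assumptions made for this example only.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def nondecorated(state: str) -> str:
    """Drop the excitation information: R stands for 0, F stands for 1."""
    return state.replace("R", "0").replace("F", "1")

def csc_violation_pairs(decorated: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pairs of markings that share a nondecorated state but differ in decoration."""
    pairs = []
    for (m1, s1), (m2, s2) in combinations(sorted(decorated.items()), 2):
        if s1 != s2 and nondecorated(s1) == nondecorated(s2):
            pairs.append((m1, m2))
    return pairs
```

With the example from the text, markings decorated as (1010) and (101R) are reported as a violation pair.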
The property called output semi-modularity is also necessary to synthesize a circuit from an STG. Formally, this property is violated if and only if there are two chains $v_1$ and $v_2$ of dummy transitions such that their first transitions are in conflict, and either of the non-dummy transitions that follow $v_1$ or $v_2$ is related to an output signal. The simplest case is that $v_1$ and $v_2$ are empty, so that an output transition disables or is disabled by an input or output transition.
For $x \in Out$, $ES(x+)$ denotes the set of reachable nondecorated states where $x$ can rise, and $QS(x+)$ is the set of reachable nondecorated states where $x$ is stable high. $ES(x-)$ and $QS(x-)$ are defined similarly. The other states are unreachable, and this set is denoted by $UR$. From the definition of CSC, an STG has CSC if and only if its $ES(x+)$, $QS(x+)$, $ES(x-)$, and $QS(x-)$ sets are disjoint for each $x \in Out$.
In this paper, the implementation technologies considered are atomic gates and generalized-C (gC) elements. A circuit for an atomic gate implementation for each $x \in Out$ is defined by a cover $C(x)$, which is a set of states where the logic function of the circuit produces 1. The cover is correct with respect to $G$, if it satisfies
$$C(x) - UR = ES(x+) \cup QS(x+).$$
The gC implementation needs two covers $C(x+)$ and $C(x-)$, and they are correct with respect to $G$, if they satisfy
$$ES(x+) \subseteq C(x+) - UR \subseteq ES(x+) \cup QS(x+),$$
$$ES(x-) \subseteq C(x-) - UR \subseteq ES(x-) \cup QS(x-).$$
If the covers are correct with respect to $G$, then the corresponding circuit is also correct with respect to $G$.
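The two correctness conditions translate directly into set comparisons. The sketch below is only an illustration under the assumption that states are represented as bit strings and that the sets ES, QS, and UR are already known; the function names are hypothetical.

```python
from typing import Set

def atomic_cover_correct(cover: Set[str], ur: Set[str],
                         es_plus: Set[str], qs_plus: Set[str]) -> bool:
    """Atomic gate: C(x) - UR must equal ES(x+) | QS(x+)."""
    return (cover - ur) == (es_plus | qs_plus)

def gc_cover_correct(cover: Set[str], ur: Set[str],
                     es: Set[str], qs: Set[str]) -> bool:
    """gC element: ES <= C - UR <= ES | QS, for the set cover or the reset cover."""
    reachable_part = cover - ur
    return es <= reachable_part <= (es | qs)
```

The gC condition is weaker than the atomic one: the reachable part of the cover has to contain the excitation states, and may include any subset of the corresponding quiescent states.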
For a nondecorated state $s$ and a set $D$ of signals, the $D$-closure of $s$, denoted by $C_D(s)$, is the set of all nondecorated states, including $s$, such that their state vectors are the same if the signals in $D$ are projected out. The core of a $D$-closure is the common state vector obtained by projecting out the signals in $D$. For example, for $s = (abcd) = (1101)$ and $D = \{a, b\}$, $C_D(s) = \{0001, 0101, 1001, 1101\}$ and its core is $(cd) = (01)$. The mappings from a $D$-closure $C_D(s)$ to its core $s'$ and its inverse are defined by $proj_D(C_D(s))$ and $proj_D^{-1}(s')$. Note that both are one-to-one mappings. The $D$-closure and these mappings are extended to sets as follows: $C_D(S) = \bigcup_{s \in S} C_D(s)$, $proj_D(C_D(S)) = \{proj_D(C_D(s)) \mid s \in S\}$, and $proj_D^{-1}(S') = \bigcup_{s' \in S'} proj_D^{-1}(s')$.
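The D-closure and its core are easy to compute for small examples. The following Python sketch reproduces the example above; the representation of states as bit strings indexed by a string of signal names is an assumption made for illustration.

```python
from itertools import product
from typing import Set

def closure(state: str, signals: str, d: Set[str]) -> Set[str]:
    """All states that agree with `state` on every signal outside d."""
    free = [i for i, name in enumerate(signals) if name in d]
    result = set()
    for bits in product("01", repeat=len(free)):
        chars = list(state)
        for i, b in zip(free, bits):
            chars[i] = b
        result.add("".join(chars))
    return result

def core(state: str, signals: str, d: Set[str]) -> str:
    """The common state vector after projecting out the signals in d."""
    return "".join(c for c, name in zip(state, signals) if name not in d)
```

For `s = "1101"` over signals `"abcd"` and `D = {"a", "b"}`, `closure` returns the four states listed in the text and `core` returns `"01"`.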
For an STG $G$ and $x \in Out$, a set $D$ of signals is an irrelevant input set for $x$, if
1. $D \subseteq In \cup Out - \{x\}$,
2. $C_D(ES(x+)) - UR = ES(x+)$, and
3. $C_D(ES(x-)) - UR = ES(x-)$.
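The three conditions can be checked mechanically when the state sets are small enough to enumerate. The sketch below is an illustration only: it assumes states are bit strings over a string of signal names, and the example state sets used with it are hypothetical (an output x that follows a, with b toggling freely).

```python
from itertools import product
from typing import Set

def closure_of_set(states: Set[str], signals: str, d: Set[str]) -> Set[str]:
    """Union of the D-closures of all states in `states`."""
    free = [i for i, name in enumerate(signals) if name in d]
    result = set()
    for s in states:
        for bits in product("01", repeat=len(free)):
            chars = list(s)
            for i, b in zip(free, bits):
                chars[i] = b
            result.add("".join(chars))
    return result

def is_irrelevant(d: Set[str], x: str, signals: str,
                  es_plus: Set[str], es_minus: Set[str], ur: Set[str]) -> bool:
    """Check conditions 1-3 of the irrelevant input set definition."""
    if x in d or not d <= set(signals):
        return False
    return (closure_of_set(es_plus, signals, d) - ur == es_plus
            and closure_of_set(es_minus, signals, d) - ur == es_minus)
```

Note that this check needs the excitation sets of the full STG, which is exactly what the method proposed in this paper avoids computing; the sketch only makes the definition concrete.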
From this definition, the following lemma holds.
Lemma 1 For an STG $G$ with CSC and an irrelevant input set $D$ for its output $x$, $C_D(ES(x+))$, $C_D(QS(x+))$, $C_D(ES(x-))$, and $C_D(QS(x-))$ are disjoint.
The proof is shown in the appendix. From this lemma and $C_D(S) \supseteq S$ for any set $S$, $QS(x+)$ and $QS(x-)$ satisfy
1. $C_D(QS(x+)) - UR = QS(x+)$, and
2. $C_D(QS(x-)) - UR = QS(x-)$,
when $G$ has CSC and $D$ is an irrelevant input set.
For an STG $G$ and a set $D$ of signals, let $G_D$ denote an STG obtained from $G$ by making transitions related to signals in $D$ dummy. Then, the following lemma holds.
Lemma 2 Suppose that an STG $G$ has CSC and is output semi-modular. For any $x \in Out$ and any irrelevant input set $D$ for $x$, a speed-independent circuit for $x$ synthesized from $G_D$ is correct with respect to $G$.
Intuitively, this can be explained as follows. From the above properties for $C_D(ES(x+))$, $C_D(ES(x-))$, $C_D(QS(x+))$, and $C_D(QS(x-))$, even if the values of signals in $D$ are changed in a state, the resulting state falls in the same state set (i.e., $ES(x+)$, $ES(x-)$, $QS(x+)$, or $QS(x-)$) as the original state, if it is reachable. Hence, the behavior of an STG is not affected by projecting out the signals in $D$. A more formal proof is shown in the appendix.
On the other hand, for a non-irrelevant input set $D$, $G_D$ no longer has CSC as shown below.
Lemma 3 Suppose that an STG $G$ has CSC, and for $x \in Out$, a set $D$ of signals with $D \subseteq In \cup Out - \{x\}$ is not an irrelevant input set for $x$. Then, $G_D$ does not have CSC with respect to $x$.
(Proof) Since $D$ is not an irrelevant input set, there exist either states $s \in ES(x+)$ and $s' \in C_D(s) - UR$ such that $s' \notin ES(x+)$, or states $s \in ES(x-)$ and $s' \in C_D(s) - UR$ such that $s' \notin ES(x-)$. In the former case, $s'$ must be in $QS(x+) \cup ES(x-) \cup QS(x-)$. But the value of $x$ in $s$ is 0, and so is the value of $x$ in $s'$ because $x \notin D$. Thus, $s' \in QS(x-)$ holds. $s$ and $s'$ are mapped to the same state in $G_D$. Hence, $s \in ES(x+)$ and $s' \in QS(x-)$ imply that $G_D$ has a CSC violation with respect to $x$. A similar discussion holds for the latter case. (Q.E.D.)
For STGs $G_1$ and $G_2$, a simulation from $G_1$ to $G_2$ is a relation $S$ between markings of $G_1$ and $G_2$ satisfying
• $(\mu_1^0, \mu_2^0) \in S$, and
• for all $(\mu_1, \mu_2) \in S$ and all $\mu_1 \xrightarrow{v} \mu_1'$ with $v = t_1 t_2 \cdots t_n$ ($n \ge 0$), there exists some firing sequence $v' = t_1' t_2' \cdots t_m'$ ($m \ge 0$) and $\mu_2'$ such that $\mu_2 \xrightarrow{v'} \mu_2'$, $l_1(v) = l_2(v')$, and $(\mu_1', \mu_2') \in S$, where for $u = u_1 u_2 \cdots u_k$ ($k \ge 0$), $l(u)$ is obtained from $l(u_1) l(u_2) \cdots l(u_k)$ by deleting $\lambda$.
If $B$ is a simulation from $G_1$ to $G_2$, and $B^{-1}$ is a simulation from $G_2$ to $G_1$, then $B$ is a bisimulation between $G_1$ and $G_2$. Let $G_1 \sqsubseteq G_2$ and $G_1 \approx G_2$ denote that there exist a simulation from $G_1$ to $G_2$ and a bisimulation between $G_1$ and $G_2$, respectively.
For an STG $G$, $x \in Out$, and $V \subseteq sig(G)$ such that $x \in V$, $abs(G, V, x)$ is any STG with the input signal set $V - \{x\}$ and the output signal set $\{x\}$ such that $abs(G, V, x) \approx G_D$ with $D = sig(G) - V$. $abs(G, V, x)$ can be obtained by the net contraction algorithm, and it can usually be constructed such that its state space is much smaller than that of $G$. The details can be found, for instance, in [15].
Theorem 1 Suppose that an STG $G$ has CSC and is output semi-modular. For $x \in Out$ and some $V \subseteq sig(G)$ with $x \in V$, if $abs(G, V, x)$ has CSC, then a speed-independent circuit for $x$ synthesized from $abs(G, V, x)$ is correct with respect to $G$.
(Proof) If $D = sig(G) - V$ is not an irrelevant input set for $x$, then from Lemma 3, $G_D$ does not have CSC with respect to $x$. This implies that $abs(G, V, x)$ does not
decomposition_based_synthesis(G) {
  forall x ∈ Out {
    G_abs = obtain_synthesizable_abs(G, x)
    if (G_abs == "impossible") then abort
    C_x = logic_synthesis(G_abs)
  }
}
Figure 1. Top-level algorithm for synthesis.
have CSC either, because $abs(G, V, x) \approx G_D$. Hence, $D$ is an irrelevant input set for $x$. From Lemma 2, a correct speed-independent circuit for $x$ is synthesized from $G_D$. Again, from the bisimilarity between $G_D$ and $abs(G, V, x)$, $abs(G, V, x)$ produces the same circuit as the one obtained from $G_D$. (Q.E.D.)
From this theorem, if an input set $V$ such that $abs(G, V, x)$ has CSC is determined, a correct speed-independent circuit for $x$ can be synthesized efficiently. The main contribution of this work is to develop its decision procedure without constructing the state graph of $G$.
3. Decomposition based synthesis overview
The top level algorithm for the proposed decomposition based synthesis is shown in Figure 1. It tries to compute a synthesizable abstraction $G_{abs}$ for each output signal $x$ of $G$. This is actually $abs(G, V, x)$ that has CSC for some $V$. If it is impossible, then it is proven in the theorem shown later that $G$ does not have CSC, and so the algorithm aborts. Otherwise, an ordinary speed-independent logic synthesis tool such as Petrify or ATACS is applied to $G_{abs}$ to synthesize a circuit for $x$.
The algorithm for obtaining a synthesizable abstraction is shown in Figure 2. It first constructs the initial input set for $x$ by taking the signals that make $x$ enabled, called trigger signals for $x$¹. This is because trigger signals belong to no irrelevant input sets as shown in the proof of Lemma 2.
For this initial input set $V$, the algorithm next computes $abs(G, V, x)$², and checks whether it has CSC. If it does, the algorithm returns it. Otherwise, some set of traces of $abs(G, V, x)$ that cause CSC violations, $CSCV$, is extracted³ by generating and checking the state graph of $abs(G, V, x)$. Note that this state graph is usually much smaller than that of the original $G$. The algorithm then analyzes each $g \in CSCV$ and tries to find candidate inputs
¹ This is actually implemented in a conservative way such that it takes the signals related to the first non-dummy transitions reached from $x+$ or $x-$ transitions by the upward net traversal of $G$.
² A simplified version of the algorithm shown in [15] is used to compute $abs(G, V, x)$.
³ In our current implementation, one shortest trace is selected for each CSC violation pair, because using all CSC violation traces is very expensive.
obtain_synthesizable_abs(G, x) {
  V = initial_input_set(G, x)
  loop {
    G_abs = obtain_abs(G, V, x)
    if (G_abs has CSC) then return G_abs
    CSCV = obtain_CSC_violation_trace_set(G_abs)
    forall g ∈ CSCV {
      candidate = analyze_CSCV_trace(g, G, V, x, μ^0, null)
      if (candidate == "impossible") then
        return "impossible"
      add_constraints_matrix(candidate)
    }
    newV = solve_covering_problem()
    V = V ∪ newV
  }
}
Figure 2. Algorithm to obtain synthesizable abstraction.
to be added in order to resolve the CSC violation. The algorithm may decide that it is impossible to resolve the CSC violation by adding any signals in $G$. In this case, the algorithm returns "impossible". Otherwise, candidate contains a set of requirements such that each requirement is a set of signals, and it is satisfied if at least one of them is added to $V$. In order to resolve the CSC violation, every requirement must be satisfied. Those requirements are added to the constraint matrix to set up a covering problem. This process is repeated for every CSC violation trace in $CSCV$. Finally, the covering problem is solved for those requirements, and the optimal set of signals is added to $V$. This $V$ is used to compute a new $abs(G, V, x)$, and the algorithm repeats the above process.
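The covering step can be pictured as a small hitting-set problem: pick signals so that every requirement contains at least one picked signal. The Python sketch below is a brute-force illustration that assumes "optimal" means minimum cardinality; the paper does not state the cost function or the solver actually used, and the name `solve_covering` is hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Set

def solve_covering(requirements: List[Set[str]]) -> Optional[Set[str]]:
    """Smallest set of signals that intersects every requirement.

    Returns None when some requirement is empty and thus cannot be satisfied.
    Exhaustive search, intended only for small illustrative inputs.
    """
    universe = sorted(set().union(*requirements)) if requirements else []
    for k in range(len(universe) + 1):
        for choice in combinations(universe, k):
            chosen = set(choice)
            if all(chosen & req for req in requirements):
                return chosen
    return None
```

For the requirements `{a, b}`, `{b, c}`, and `{d}`, the sketch selects `{b, d}`: `d` is forced by the third requirement, and `b` alone satisfies the first two.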
4. Analyzing CSC violation trace
This section shows the detailed algorithm for analyze_CSCV_trace that analyzes the CSC violation traces and determines a set of requirements for an appropriate input set for $x$. As shown in Figure 2, analyze_CSCV_trace receives $g$ as a parameter. This $g$ is a trace of $abs(G, V, x)$ that causes a CSC violation. In order to find an appropriate input set for $x$, it is necessary to obtain a concrete trace of $G$ that corresponds to $g$. This can be done without generating the state graph of $G$ by a technique similar to the one used for partial order reduction, which we call guided simulation. This section starts with the guided simulation algorithm, and then shows how to determine the candidates for the input. In the following, interface signals are the signals used in $abs(G, V, x)$, i.e., the signals in $V$, and noninterface signals are the remaining signals of $G$, i.e., the signals in $D = sig(G) - V$. The corresponding transitions are called similarly.

analyze_CSCV_trace(g, G, V, x, μ, f) {
  if (g is empty) {
    candidate = find_inputs(f, G, V, x)
    return candidate
  }
  g_1 = pick the first transition of g
  N = find_firing_trans(μ, g_1)
  if (N is empty) return "backtrack"
  forall t ∈ N {
    μ' = fire(μ, t)
    g' = g
    if (t == g_1)
      remove first transition of g'
    result = analyze_CSCV_trace(g', G, V, x, μ', f·t)
    if (result ≠ "backtrack")
      exit forall
  }
  return result
}
Figure 3. Algorithm for the guided simulation.
4.1. Guided simulation
For a given abstracted trace $g$, the guided simulation obtains a trace $f$ of $G$ such that the trace obtained by projecting out the noninterface signals from $f$ is equal to $g$. More formally, $del(D, f) = g$ with $D = sig(G) - V$, where for $h = t_1 t_2 t_3 \cdots$ and a set $E$ of signals, if $l(t_1) \in E \times \{+, -\}$, then $del(E, h) = \eta$, otherwise $del(E, h) = t_1 \eta$, with $\eta = del(E, t_2 t_3 \cdots)$.
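The projection del can be written in a few lines. In this Python sketch, transitions are represented directly by their labels (for example `"a+"`), and the dummy label is written `"λ"`; both choices are simplifying assumptions for illustration, and the function names are hypothetical.

```python
from typing import List, Optional, Set

def signal_of(label: str) -> Optional[str]:
    """Signal name of a label such as 'a+' or 'a-'; None for the dummy label."""
    return None if label == "λ" else label[:-1]

def delete_signals(e: Set[str], trace: List[str]) -> List[str]:
    """del(E, h): drop every transition whose signal is in E, keep the rest in order."""
    return [t for t in trace if signal_of(t) not in e]
```

A concrete trace `f` corresponds to an abstracted trace `g` when `delete_signals(D, f) == g` for the set `D` of noninterface signals.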
Although an interface transition and a noninterface transition that are concurrent can fire in any order, our algorithm fires all possible noninterface transitions before firing an interface transition because this greatly simplifies the analysis algorithm shown later. We call a trace satisfying this property a regular trace. This simplification can, however, cause situations where the guided simulation does not terminate. This happens, for example, when there exists a loop in which noninterface transitions can fire forever. Practically, such situations can be detected by counting the consecutive firings of noninterface transitions and checking whether the count exceeds some upper bound.
The algorithm for the guided simulation is shown in Figure 3. It forms the body of the recursive procedure analyze_CSCV_trace. If $g$ is nonempty, the algorithm picks the first transition of $g$, denoted by $g_1$, and computes by find_firing_trans($\mu$, $g_1$) a set of transitions that should be fired for $g_1$. Figure 4 shows the algorithm for find_firing_trans. It first computes a set of potentially necessary transitions for $g_1$, which is $\{g_1\} \cup (enabled(\mu) \cap NonIF)$ if $g_1$ is enabled, where $NonIF$ is the set of all noninterface transitions, and necessary($\mu$, $g_1$) otherwise. The former is for satisfying regularity, and the latter is a minimal set of enabled noninterface transitions such that $g_1$ can never be enabled if none of those transitions is fired. For example, consider the nets shown in Figure 5, where the transitions except for $g_1$ are noninterface. In the case of Figure 5(a), the necessary set of $g_1$ is $\{t_1\}$ or $\{t_2\}$. $\{t_1\}$ can be a necessary set of $g_1$, because $g_1$ cannot be enabled if $t_1$ is not fired. Similarly, $\{t_2\}$ can be another necessary set of $g_1$. On the other hand, in the case of Figure 5(b), even if $t_1$ is not fired, $g_1$ may be enabled by a token produced by firing $t_2$. If neither $t_1$ nor $t_2$ is fired, $g_1$ cannot be enabled. Thus, the necessary set of $g_1$ for this case is $\{t_1, t_2\}$. find_firing_trans then chooses a set of transitions that should actually be fired. The first two conditions are for firing enabled noninterface transitions earlier than $g_1$ in order to satisfy regularity. If result contains an enabled noninterface transition that conflicts with no other transitions (i.e., $conflict(u) = \{u\}$, where $conflict(u) = \{u' \mid \bullet u \cap \bullet u' \neq \emptyset\}$), then it can be fired alone. Otherwise, all conflicting transitions except for $conflict(g_1)$ are returned and used for backtracking. Finally, if there exist no transitions that are concurrent with $g_1$, $g_1$ and its conflicting enabled noninterface transitions are returned.

find_firing_trans(μ, t) {
  if (t ∈ enabled(μ))
    result = {t} ∪ (enabled(μ) ∩ NonIF)
  else result = necessary(μ, t)
  if (∃u ∈ result ∩ NonIF s.t. conflict(u) = {u})
    return {u}
  if (result − conflict(t) ≠ ∅)
    return (result − conflict(t))
  return result
}

necessary(μ, t) {
  if (t is already visited) return ∅
  if (t ∈ enabled(μ)) return {t}
  result = the set of all transitions
  forall (p ∈ •t − μ) {
    π = ∅
    forall (t' ∈ •p ∩ NonIF)
      π = π ∪ necessary(μ, t')
    if (π is smaller than result)
      result = π
  }
  return result
}
Figure 4. Algorithms for finding transitions to be fired and for constructing necessary sets.
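The necessary-set construction can be sketched in Python as follows. This is an illustration under several assumptions: "smaller" is read as smaller cardinality, the visited set is shared across the whole recursion, and the two example nets are guesses at the shapes described for Figure 5(a) and 5(b) (the figure itself is not reproduced here). The dictionary names `pre` and `producers` are hypothetical.

```python
from typing import Dict, Optional, Set

def necessary(pre: Dict[str, Set[str]], producers: Dict[str, Set[str]],
              non_if: Set[str], marking: Set[str], t: str,
              visited: Optional[Set[str]] = None) -> Set[str]:
    """Enabled noninterface transitions, one of which must fire before t can be enabled.

    pre[t] is the set of source places of t; producers[p] is the set of
    transitions that put a token into place p.
    """
    visited = set() if visited is None else visited
    if t in visited:
        return set()
    visited.add(t)
    if pre[t] <= marking:
        return {t}
    result = None  # stands for "the set of all transitions"
    for p in sorted(pre[t] - marking):  # every unmarked source place of t
        pi = set()
        for u in sorted(producers.get(p, set()) & non_if):
            pi |= necessary(pre, producers, non_if, marking, u, visited)
        if result is None or len(pi) < len(result):
            result = pi
    return result
```

An empty result means that no noninterface firing can ever enable `t` from this marking, which is the situation that triggers backtracking in the guided simulation.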
Figure 5. Examples for the necessary set.

If find_firing_trans returns more than one transition, then those transitions are fired one by one in analyze_CSCV_trace. This is because it is impossible to find the exact transition that should be fired in the marking, and so the algorithm relies on backtracking. If an inappropriate transition is fired, then it becomes impossible to fire the next transition of $g$ in some marking, and an empty set is returned by find_firing_trans. In this case, backtracking occurs by returning "backtrack".
Although this backtracking mechanism guarantees that the concrete trace corresponding to $g$ is obtained, a $G$ with many conflicts sometimes causes a lot of backtracking. Our tool supports two options that we consider to be practical solutions for this problem. One is to keep every conflicting transition in $abs(G, V, x)$, and the other is to keep some of those conflicting transitions, which are selected by the users through specific comments in the STG file. When translating specifications written in a high-level language, our compiler automatically suggests the selection of conflicting transitions that should be kept in $abs(G, V, x)$ by using the latter option.
The fired transitions are appended to $f$ in each recursive call of analyze_CSCV_trace. When $g$ becomes empty, $f$ holds the corresponding concrete trace, which is passed to find_inputs.
4.2. Determining the input set
Each CSC violation trace $g$ of $abs(G, V, x)$ constructed by obtain_CSC_violation_trace_set is assumed to satisfy that $g = (g_0, g_1)$, $\mu_0' \xrightarrow{g_0} \mu_1'$ ($\mu_0'$ is the initial marking of $abs(G, V, x)$), $\mu_1' \xrightarrow{g_1} \mu_2'$, $\mu_1'$ and $\mu_2'$ are a CSC violation pair, and there are no markings between $\mu_1'$ and $\mu_2'$ that have the same nondecorated state as that of $\mu_1'$. The corresponding concrete trace $f$ generated by the guided simulation is also of the form $(f_0, f_1)$ such that $f_0$ and $f_1$ end with interface transitions. Let $\mu_1$ and $\mu_2$ denote the markings of $G$ obtained by $f_0$ and $f_1$, respectively (see Figure 6). Note that this assumption simplifies the input set decision procedure, but in order to guarantee that every CSC violation pair is reached from the initial marking by a simple path, the STG needs to satisfy the property that for every two of its reachable markings, either one is reachable from the other.

Figure 6. Labeling of a concrete trace.
For two interface transitions $a$ and $b$ in $f$, if $a$ fires before $b$, it is denoted by $(a, b) \in R^f_{order}$. For two (interface or noninterface) transitions $t_1$ and $t_2$ in $f$, if $t_1$ causes $t_2$, that is, $t_2$ fires by consuming the token produced by the firing of $t_1$, it is denoted by $(t_1, t_2) \in R^f_{cause}$. If $t_1$ and $t_2$ are related by the transitive closure of the union of $R^f_{order}$ and $R^f_{cause}$, i.e., $(t_1, t_2) \in (R^f_{order} \cup R^f_{cause})^*$, then we say that $t_1$ is an ancestor of $t_2$ in $f$, denoted by $[t_1 \Rightarrow t_2]$. Since the specific abstracted trace $g$ is focused on, this ancestor relation represents an actual causality relation with respect to $g$. The reason why the ordering relation of interface signals is also included is that if the firing order of concurrent interface transitions in $g$ changes, then it is considered to be a different CSC violation trace with a different CSC violation state pair, and such a different trace is handled separately in the forall loop of obtain_synthesizable_abs. This ancestor relation plays a key role in the proposed algorithm. Thus, in the first step of find_inputs, this ancestor relation is set up, which is actually done by constructing a data structure similar to an occurrence net [20].
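The ancestor relation can be illustrated by a naive transitive-closure computation over the two relations. This Python sketch is not the occurrence-net-like data structure mentioned above; it assumes the order and cause relations are already available as sets of pairs of uniquely named transition occurrences, and it leaves reflexive pairs out.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

def ancestor_relation(order: Set[Pair], cause: Set[Pair]) -> Set[Pair]:
    """Transitive closure of the union of the order and cause relations."""
    closure = set(order) | set(cause)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

A pair `(t1, t2)` in the returned set corresponds to "t1 is an ancestor of t2" in the sense defined above.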
In order to resolve the CSC violation between $\mu_1$ and $\mu_2$, it is necessary to find a noninterface signal $w$ such that $f_1$ contains an odd number of $w$-transitions. If a $w$-transition certainly fires an odd number of times in $f_1$, then the CSC violation is resolved. However, the existence of concurrent transitions makes this decision difficult. Thus, we need to define the following notions (see Figure 7).
$\mathrm{final}(h)$ denotes the last transition in $h$, and $\mathrm{chain}(w, h)$ is a sequence $(t_0 t_1 t_2 \cdots t_{n-1})$ of all $w$-transitions (i.e., $l(t_i) = w+$ or $w-$ for each $i$) firing in $h$ in this order. For a trace $h = (h_1, h_2)$ such that at least $h_2$ ends with an interface transition, and a noninterface signal $w$, $w$ is confined by $h_2$ in $h$, if
Proceedings of the 10th International Symposium on Asynchronous Circuits and Systems (ASYNC'04)
1522-8681/04 $20.00 ©2004 IEEE
• when a $w$-transition fires in $h_1$, for the last transition $t_{e1}$ of $\mathrm{chain}(w, h_1)$, $[t_{e1} \leadsto_h \mathrm{final}(h_1)]$, and
• when a $w$-transition fires in $h_2$, for the first transition $t_{s2}$ and the last transition $t_{e2}$ of $\mathrm{chain}(w, h_2)$, $[\mathrm{final}(h_1) \leadsto_h t_{s2}]$ and $[t_{e2} \leadsto_h \mathrm{final}(h_2)]$.
If $w$ is confined by $h_2$ in $h$, it is guaranteed that all $w$-transitions that can fire between $\mathrm{final}(h_1)$ and $\mathrm{final}(h_2)$ are just those in $\mathrm{chain}(w, h_2)$. From the regularity and the assumption that $h_2$ ends with an interface transition, any noninterface transition that is concurrent with $\mathrm{final}(h_2)$ fires before $\mathrm{final}(h_2)$. Thus, $\mathrm{final}(h_2)$ is an ancestor of the next $w$-transition that fires after $h_2$. For this reason, it is not necessary to consider the first $w$-transition after $h_2$. In the case of $f = (f_0, f_1)$, $f_0$ also ends with an interface signal. Thus, the condition $[\mathrm{final}(h_1) \leadsto_h t_{s2}]$ is not necessary either. The other cases shown later, however, need this condition.
If $w$ is confined by $h_2$ in $h$, and $\mathrm{chain}(w, h_2)$ contains an odd number of transitions, then $w$ is odd-confined by $h_2$ in $h$. The notion even-confined is defined similarly.
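The confined, odd-confined and even-confined notions can be checked mechanically once an ancestor relation is available. The following is a minimal sketch, not the authors' code. It assumes that a trace is a list of transition labels such as `"w+"`, that each label occurs at most once in the trace, that the ancestor relation is given as an already closed set of pairs, and that a transition counts as its own ancestor. All function names are hypothetical.

```python
def chain(signal, trace):
    """All transitions of `signal` in `trace`, in firing order."""
    return [t for t in trace if t[:-1] == signal]


def precedes(t1, t2, anc):
    """Ancestor test; a transition is treated as its own ancestor (assumption)."""
    return t1 == t2 or (t1, t2) in anc


def is_confined(w, h1, h2, anc):
    """Check the two conditions for `w` being confined by h2 in h = (h1, h2)."""
    c1, c2 = chain(w, h1), chain(w, h2)
    # Last w-transition of h1 must be an ancestor of final(h1).
    if c1 and not precedes(c1[-1], h1[-1], anc):
        return False
    # First/last w-transition of h2 must lie between final(h1) and final(h2).
    if c2 and not (precedes(h1[-1], c2[0], anc) and precedes(c2[-1], h2[-1], anc)):
        return False
    return True


def is_odd_confined(w, h1, h2, anc):
    return is_confined(w, h1, h2, anc) and len(chain(w, h2)) % 2 == 1


def is_even_confined(w, h1, h2, anc):
    return is_confined(w, h1, h2, anc) and len(chain(w, h2)) % 2 == 0


# Hypothetical example: w+ is ordered between the interface transitions a+ and b+.
h1, h2 = ["a+"], ["w+", "b+"]
ordered = {("a+", "w+"), ("w+", "b+"), ("a+", "b+")}
concurrent = {("a+", "b+")}  # here w+ is concurrent with a+ and b+
```

With `ordered`, the signal `w` is odd-confined by `h2`; with `concurrent`, the same trace gives no guarantee about when `w+` fires, so `w` is not confined at all.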
Consider the first interface transition in $f_1$, and divide $f_1$ into $f_2$ and $f_3$ with it, i.e., $f_1 = (f_2, f_3)$ and $f_2$ ends with this first interface transition. Figure 6 shows the relation among $f_0 \cdots f_3$. The following lemma holds.
Lemma 4 The CSC violation with respect to $f$ is resolved by adding a noninterface signal $w$ to $V'$, if $w$ is odd-confined by $f_1$ in $f$, and $f_2$ contains no $w$-transitions.
(Proof) Since $w$ is odd-confined by $f_1$ in $f$, it is guaranteed by the ancestor relation that the state vector of $\mu_1$, projected to $V' \cup \{w\}$, is different from that of $\mu_2$. Furthermore, from both the assumption with respect to $g$ that there are no markings between $\mu_1$ and $\mu_2$ that have the same binary state vector as $\mu_1$, and the assumption that $f_2$ contains no $w$-transitions, no new CSC violation pair is introduced.
(Q.E.D.)
In cases that $f_2$ contains $w$-transitions, every marking reached in $f_2$ before $\mu_3$ has the same state vector as $\mu_1$ if it is projected to $V'$, because $\mu_3$ is obtained by the first interface transition in $f_1$. Thus, those markings in odd positions cause a CSC violation with $\mu_2$, because their values of $w$ are the same as that of $\mu_2$. One example is shown in Figure 8. The first $w+$ in $f_2$ leads to a marking with state vector (11072), and this state has a CSC violation with $\mu_2$. This CSC violation cannot be resolved only using the signal $w$.
Figure 8. Resolving CSC violation by adding essential signals (state vectors are given over (wabx) and (uwabx)).
Such a CSC violation, however, can be resolved by adding another noninterface signal $u$ in addition to $w$ such that $u$ is odd-confined by $f_4$ in $f$ and the first $u$-transition in $f_4$ fires in $f_3$, where $f_4$ is the suffix of $f_1$ after the first $w$-transition. The final column of Figure 8 shows how the CSC violation is resolved in this case. This additional signal $u$ is called essential for $w$ in $f$. Hence, in this case, a signal $w$ together with its essential signal $u$ resolves the CSC violation. The precise definition of essential signals is as follows.
Suppose that $w$ is odd-confined by $f_1$ in $f$, and $\mathrm{chain}(w, f_1) = t_1 t_2 \cdots t_n$, where $t_k$ is the last $w$-transition that fires in $f_2$. For every odd integer $i \le k$, a noninterface signal $u_i$ ($\ne w$) is essential for $w$ in $f$ with respect to $i$, if (1) $u_i$ is odd-confined by $h_i$ in $f$, where $h_i$ is the suffix of $f_1$ after $t_i$, and (2) for the first $u_i$-transition $t_{u_i}$ in $h_i$,
  if either $t_{i+1}$ does not exist or $[\mathrm{final}(f_2) \leadsto_f t_{i+1}]$,
  then $[\mathrm{final}(f_2) \leadsto_f t_{u_i}]$,
  else $[t_{i+1} \leadsto_f t_{u_i}]$
holds. A set of essential signals that includes some essential signal $u_i$ for every odd integer $i \le k$ is called a sufficient essential signal set for $w$.
Figure 9 shows a sufficient essential signal set $\{u_1, u_3\}$ for $w$. Each essential signal distinguishes the (common) state vector of all the markings between the one obtained by $t_i$ and the one where $t_{i+1}$ fires from that of $\mu_2$. For this reason, the first $u_i$-transition $t_{u_i}$ must certainly fire after $t_{i+1}$. Furthermore, $\mathrm{final}(f_2)$ plays the role of $t_{i+1}$ in the last section of $f_2$. From these discussions, the following theorem, which is a generalized version of Lemma 4, holds.
Figure 9. An example where more than one essential signal is necessary.
Figure 10. An STG where our sufficient condition does not work (inputs: a, b; outputs: c, x1, x2).
Theorem 2 The CSC violation with respect to $f$ is resolved by adding a noninterface signal $w$ that is odd-confined by $f_1$ in $f$ and its sufficient essential signal set to $V'$.
Note that this is a sufficient condition. Thus, even if signals satisfying the above condition are not found, CSC violations may still be resolved. For example, consider the STG $G$ shown in Figure 10 and its output $c$. The trigger signals for $c$ are $a$ and $b$. abs($G$, {a, b, c}, c) has a CSC violation trace $g$ = c+, a+, b+, b- with $g_0$ = c+, a+ and $g_1$ = b+, b-. The guided simulation generates $f_0$ = c+, x1+, x2+, a+ and $f_1$ = b+, x1-, b-. x1-transitions appear an odd number of times in $f_1$. x1 is, however, not confined by $f_1$ in $f$, because x1+ and a+ are concurrent, and so [x1+ $\not\leadsto_f$ a+]. There exists no other noninterface signal that is confined by $f_1$. Hence, no signals satisfy the above condition. However, $G$ itself has CSC. Thus, although x1 and x2 do not satisfy our sufficient condition, x1 together with x2 can resolve the CSC violation.
On the other hand, it is possible to decide that a given 
CSC violation can never be resolved by adding any signals. 
One sufficient condition is as follows.
Theorem 3 If there exists a (noninterface or interface) transition $t$ in $f_2$ such that $[t \leadsto_f \mathrm{final}(f_2)]$, and for any noninterface signal $u$, $u$ is even-confined by the suffix of $f_1$ from $t$ in $f$, then $G$ has no CSC (see Figure 11).
(Proof) The state vector of the marking where $t$ fires is the same as that of $\mu_2$. Furthermore, from $[t \leadsto_f \mathrm{final}(f_2)]$ and the regularity, they certainly cause a CSC violation. Since every noninterface signal $u$ is even-confined, no noninterface signal can distinguish those state vectors. Hence, this CSC violation cannot be resolved even by adding all noninterface signals. This means that $G$ has no CSC. (Q.E.D.)
Figure 11. Unresolvable CSC violation.
If the above condition is satisfied, find_input returns “impossible”. Otherwise, it tries to find noninterface signals that can resolve the given CSC violation. If it succeeds, there are usually many choices. For example, suppose that signals $a$, $b$, $c$, and $d$ can be $w$, and $c$ has a sufficient essential signal set $\{e\}$, and $d$ has a sufficient essential signal set $\{f\}$ or $\{g\}$. The whole condition can be expressed by
$$a \lor b \lor (c \land e) \lor (d \land (f \lor g)).$$
Table 1. Experimental results (1).

| Circuit | (#I,#O) | Petrify CPU (s) | Petrify Mem (MB) | Petrify area | Proposed CPU (s) (Petrify+other) | Proposed Max (MB) | Proposed area | Proposed ave. #L |
|---|---|---|---|---|---|---|---|---|
| cb | (10,10) | 9.6 | 4.6 | 82 | 3.5 = (3.3+0.2) | 3.4 | 82 | 1.2 |
| cachem | (11,16) | 219.5 | 7.7 | 122 | 36.8 = (36.1+0.7) | 4.0 | 123 | 1.5 |
| lf6 | (21,41) | † (>59272.4) | (>742) | - | 98.9 = (94.9+4.1) | 4.7 | 200 | 1.2 |

(†: BDD manager overflow: > 30000000 nodes)

Table 2. Experimental results (2).

| Circuit | (#I,#O) | Petrify CPU (s) | Petrify Mem (MB) | Petrify area | Proposed CPU (s) (Petrify+other) | Proposed Max (MB) | Proposed area |
|---|---|---|---|---|---|---|---|
| FTR5_2mul_csc | (7,19) | 78.8 | 7.8 | 151 | 32.4 = (31.2+1.2) | 4.3 | 150 |
| TTR2_2mul_d_csc | (7,19) | 240.2 | 12.3 | 150 | 179.6 = (176.9+2.7) | 5.8 | 151 |
| LMS4_prl2_csc | (9,18) | 354.6 | 18.6 | 177 | 28.3 = (26.6+1.7) | 4.5 | 177 |
To set up the covering problem, this is transformed into product-of-sum form such as
$$(a \lor b \lor c \lor d)(a \lor b \lor c \lor f \lor g)(a \lor b \lor e \lor d)(a \lor b \lor e \lor f \lor g).$$
Each clause is a requirement (see Section 3), and find_input returns a set of those requirements.
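The transformation into product-of-sum form can be reproduced with a small recursive distribution step. The sketch below is illustrative only: the nested-tuple encoding of the condition and the name `to_pos` are assumptions made for this example, not part of the paper's tool.

```python
from itertools import product


def to_pos(expr):
    """Convert a nested and/or expression into a set of clauses.

    `expr` is either a signal name (str) or a tuple ("and" | "or", sub, ...).
    Each returned frozenset is one clause, i.e. a disjunction of signals.
    """
    if isinstance(expr, str):
        return {frozenset([expr])}
    op, *args = expr
    parts = [to_pos(arg) for arg in args]
    if op == "and":
        # A conjunction simply collects the clauses of its operands.
        return set().union(*parts)
    if op == "or":
        # Distribute: pick one clause from every operand and merge them.
        return {frozenset().union(*combo) for combo in product(*parts)}
    raise ValueError(f"unknown operator: {op!r}")


# a or b or (c and e) or (d and (f or g)), the example from the text
condition = ("or", "a", "b", ("and", "c", "e"), ("and", "d", ("or", "f", "g")))
requirements = to_pos(condition)
```

For the example condition this yields exactly the four clauses listed above, each of which corresponds to one requirement of the covering problem.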
As long as the condition of Theorem 3 does not hold, even if no noninterface signals satisfy the sufficient condition for resolving the CSC violation, it is worth adding some signals as shown in the example of Figure 11. find_input uses some heuristics to choose those signals, for instance, choosing a noninterface signal whose transitions appear an odd number of times in $f_1$.
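The odd-occurrence heuristic mentioned above amounts to a parity count over the transitions of $f_1$. A minimal sketch follows, assuming transitions are written as labels such as `"x1-"` (signal name followed by a sign); the function name `odd_signals` is hypothetical, and the example uses the trace $f_1$ = b+, x1-, b- given for Figure 10.

```python
from collections import Counter


def odd_signals(trace, noninterface):
    """Noninterface signals whose transitions occur an odd number of times in `trace`."""
    counts = Counter(label[:-1] for label in trace)  # strip the trailing +/- sign
    return sorted(s for s in noninterface if counts[s] % 2 == 1)


f1 = ["b+", "x1-", "b-"]
candidates = odd_signals(f1, {"x1", "x2"})
```

Here only `x1` is selected, which matches the observation in the text that x1-transitions appear an odd number of times in $f_1$ while x2 does not occur in it.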
5. Experimental results
The proposed method has been implemented using the C 
language. This section evaluates the potential performance 
of the proposed method and the quality of the synthesized 
circuits. The experiments here have been done on a 2.8 
GHz Pentium 4 workstation with 4 gigabytes of memory. 
As for the selection of conflicting transitions in order to 
avoid backtracking, the compiler suggestion is used for the 
first set of examples, and the option to keep every conflict­
ing transition is used for the second and third sets. Note that 
the performance of Petrify shown in these experiments is 
just used for suggesting the complexity of each example, 
not for comparing the superiority of both methods. On the 
other hand, the estimated area sizes shown by Petrify are a 
good criterion to evaluate the overhead of our method.
The first set of experimental results is shown in Ta­
ble 1. In this experiment, the STGs generated from a high- 
level specification language by the method shown in [12] 
are used. Those are for the sub-circuits of the instruction 
cache system for TITAC2 asynchronous microprocessor 
[21]. These STGs originally contain many dummy transi­
tions generated by the compiler from the high-level speci­
fication language, and those dummy transitions degrade the
Table 3. Experimental results (3).

| Circuit | (#I,#O) | Petrify CPU (s) | Petrify area | Proposed CPU (s) | Proposed area |
|---|---|---|---|---|---|
| alloc-outbound | (4,5) | 0.05 | 21 | 0.12 | 21 |
| atod | (3,4) | 0.06 | 23 | 0.18 | 23 |
| chu150 | (3,3) | 0.06 | 21 | 0.12 | 21 |
| chu172 | (3,3) | 0.03 | 7 | 0.06 | 7 |
| converta | (2,3) | 0.06 | 28 | 0.19 | 28 |
| dff | (2,2) | 0.05 | 14 | 0.14 | 14 |
| master-read | (6,8) | 2.23 | 53 | 1.72 | 53 |
| mp-forward-pkt | (12,3) | 0.12 | 29 | 0.08 | 29 |
| nak-pa | (4,6) | 0.14 | 31 | 0.23 | 31 |
| nowick | (3,3) | 0.08 | 25 | 0.10 | 25 |
| pe-rcv-ifc | (4,7) | 1.68 | 64 | 1.61 | 65 |
| pe-send-ifc | (5,5) | 1.38 | 64 | 1.51 | 64 |
| ram-read-sbuf | (5,6) | 0.23 | 32 | 0.28 | 32 |
| rcv-setup | (3,2) | 0.03 | 14 | 0.12 | 14 |
| sbuf-ram-write | (5,7) | 0.27 | 30 | 0.40 | 30 |
| sbuf-read-ctl | (3,5) | 0.08 | 19 | 0.13 | 19 |
| sbuf-send-pkt2 | (4,5) | 0.15 | 25 | 0.33 | 25 |
| sendr-done | (2,2) | 0.02 | 7 | 0.03 | 7 |
| trimos-send | (3,6) | 0.38 | 48 | 1.38 | 48 |
| vbe10b | (4,7) | 0.30 | 47 | 0.62 | 47 |
| vbe4a | (3,3) | 0.06 | 9 | 0.05 | 9 |
| vbe6a | (4,4) | 0.22 | 42 | 0.66 | 42 |
| vmebus-arb | (3,2) | 0.23 | 14 | 0.46 | 14 |
| wrdata | (1,4) | 0.03 | 22 | 0.19 | 22 |
| wrdatab | (4,6) | 0.43 | 53 | 0.86 | 53 |
performance of the synthesis process by Petrify or the contraction process in our method. Thus, the reduced STGs obtained by removing those dummy transitions from the original STGs are used. The second column of the table shows the number of input signals and output signals of the circuits. The third main column shows the CPU times, the memory usage, and the estimated area of the circuits synthesized by Petrify with the gC implementation option. Petrify cannot synthesize “lf6” due to BDD manager overflow. The fourth main column shows the results of our method. For fair comparison, Petrify is used for the logic synthesis of each abstracted STG, e.g., Petrify is run 41 times to synthesize the sub-circuit for each output in the case of “lf6”, although any logic synthesis tool can be used. Our CPU times show both the run times of this final logic synthesis process by Petrify and the remaining run times for contraction, state space generation of abstracted STGs, and analysis of CSC violation traces. The final sub-column shows the average loop counts in obtain_synthesizable_abs.
These results show that our method efficiently handles a large specification that the traditional method cannot handle. The area overhead seems small, but since the large example cannot be synthesized by Petrify, more comparisons are necessary.
For this purpose, some examples from [22] and the stan­
dard benchmark examples are used. Table 2 and Table 3 
show these results. From these results, the area overhead is 
just 0.2%. Thus, even though our method uses restricted in­
formation for synthesizing sub-circuits, the quality at least 
with respect to the area size is not badly affected.
6. Conclusion
This paper presents a decomposition method for efficient synthesis of very large speed-independent circuits. While this method does have some overhead for small circuits, it allows for the synthesis of large circuits that could not be synthesized using flat synthesis methods. The experimental results show that the area overhead appears to be very small.
Although the theory and algorithms presented in this paper are for untimed circuits, the proposed idea can be extended to timed circuit synthesis by minor modification as follows. Let $G_{timed}$ be a timed STG, where transitions have the earliest and latest firing times, and $G_{untimed}$ be its untimed version (i.e., an STG obtained from $G_{timed}$ by removing all earliest and latest firing times). If $G_{untimed}$ satisfies the requirements for our method, it is safe to apply our decision procedure to $G_{untimed}$ in order to obtain the necessary signals of $G_{timed}$. Since the decision procedure does not use timing information, the input signal sets may not be optimal, i.e., some redundant signals may be included, but necessary signals are never missed. Thus, if the resultant input sets are used by the timed contraction of $G_{timed}$ followed by the timed logic synthesis procedure, an optimized circuit using the timing information can be synthesized for $G_{timed}$. The timed circuit synthesis using this idea from high-level specification languages together with its compiler is future work, as is the comparison with other approaches like syntax directed translation.
Acknowledgment
We’d like to thank H. Saito for giving us his benchmark 
circuits. We also thank the referees for their helpful com­
ments.
References
[1] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev. Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers. IEICE Transactions on Information and Systems, E80-D(3):315-325, March 1997.
[2] P. A. Beerel, C. J. Myers, and T. H.-Y. Meng. Covering conditions and algorithms for the synthesis of speed-independent circuits. IEEE Transactions on Computer-Aided Design, March 1998.
[3] R. M. Fuhrer, S. M. Nowick, M. Theobald, N. K. Jha, B. Lin, and L. Plana. Minimalist: An environment for the synthesis, verification and testability of burst-mode asynchronous machines. Technical Report TR CUCS-020-99, Columbia University, NY, July 1999.
[4] Steven M. Burns and Alain J. Martin. Syntax-directed translation of concurrent programs into self-timed circuits. In J. Allen and F. Leighton, editors, Advanced Research in VLSI, pages 35-50. MIT Press, 1988.
[5] Kees van Berkel, Joep Kessels, Marly Roncken, Ronald Saeijs, and Frits Schalij. The VLSI-programming language Tangram and its translation into handshake circuits. In Proc. European Conference on Design Automation (EDAC), pages 384-389, 1991.
[6] Euiseok Kim, Jeong-Gun Lee, and Dong-Ik Lee. Automatic process-oriented control circuit generation for asynchronous high-level synthesis. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 104-113. IEEE Computer Society Press, April 2000.
[7] Joep Kessels and Ad Peeters. The Tangram framework: Asynchronous circuits for low power. In Proc. of Asia and South Pacific Design Automation Conference, pages 255-260, February 2001.
[8] Doug Edwards and Andrew Bardsley. Balsa: An asynchronous hardware synthesis language. The Computer Journal, 45(1):12-18, 2002.
[9] A. Bystrov and A. Yakovlev. Asynchronous circuit synthesis by direct mapping: Interfacing to environment. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 127-136, April 2002.
[10] Tiberiu Chelcea and Steven M. Nowick. Resynthesis and peephole transformations for the optimization of large-scale asynchronous systems. In Proc. ACM/IEEE Design Automation Conference, June 2002.
[11] Chris J. Myers and Teresa H.-Y. Meng. Synthesis of timed asynchronous circuits. IEEE Transactions on VLSI Systems, 1(2):106-119, June 1993.
[12] Tomohiro Yoneda and Chris Myers. Synthesizing timed circuits from high level specification languages. NII Technical Report, NII-2003-003E, 2003.
[13] A. Matsumoto. High level synthesis of asynchronous circuits (in Japanese). Master Thesis, Tokyo Institute of Technology, 2004.
[14] Tam-Anh Chu. Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic Specifications. PhD thesis, MIT Laboratory for Computer Science, June 1987.
[15] Walter Vogler and Ralf Wollowski. Decomposition in asynchronous circuit design. In J. Cortadella, A. Yakovlev, and G. Rozenberg, editors, Concurrency and Hardware Design, volume 2549 of Lecture Notes in Computer Science, pages 152-190. Springer-Verlag, 2002.
[16] H. Zheng, E. Mercer, and C. J. Myers. Modular verification of timed circuits using automatic abstraction. IEEE Transactions on Computer-Aided Design, 22(9), September 2003.
[17] Ruchir Puri and Jun Gu. A modular partitioning approach for asynchronous circuit synthesis. In Proc. ACM/IEEE Design Automation Conference, pages 63-69, June 1994.
[18] J. Beister, G. Eckstein, and R. Wollowski. From STG to extended-burst-mode machines. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 145-158, April 1999.
[19] Chris Myers. Asynchronous Circuit Design. John Wiley & Sons, 2001.
[20] Kenneth McMillan. Using unfoldings to avoid the state explosion problem in the verification of asynchronous circuits. In G. v. Bochman and D. K. Probst, editors, Proc. International Workshop on Computer Aided Verification, volume 663 of Lecture Notes in Computer Science, pages 164-177. Springer-Verlag, 1992.
[21] Akihiro Takamura, Masashi Kuwako, Masashi Imai, Taro Fujii, Motokazu Ozawa, Izumi Fukasaku, Yoichiro Ueno, and Takashi Nanya. TITAC-2: An asynchronous 32-bit microprocessor based on scalable-delay-insensitive model. In Proc. International Conf. Computer Design (ICCD), pages 288-294, October 1997.
[22] H. Saito. Synthesis of Globally Delay Insensitive Locally Timed Asynchronous Circuits from Register Transfer Level Descriptions. PhD thesis, University of Tokyo, 2003.
Appendix
Proof of Lemma 1 From $x \notin D$, the value of $x$ distinguishes the elements in $C_D(ES(x+))$ from those in $C_D(QS(x+))$ and $C_D(ES(x-))$. Similarly, $C_D(QS(x-))$ is disjoint from $C_D(QS(x+))$ and $C_D(ES(x-))$. $C_D(ES(x+))$ and $C_D(QS(x-))$ are disjoint for the following reason. Suppose that they have a common element $s$. Then, there must exist $s_1 \in ES(x+)$ and $s_2 \in QS(x-)$ such that $s \in C_D(s_1)$ and $s \in C_D(s_2)$. Then, from the definition of $D$-closure, $C_D(s_1) = C_D(s_2)$ must hold, and so $s_2 \in C_D(s_1)$ is implied. This, however, contradicts $C_D(ES(x+)) - UR = ES(x+)$, because $ES(x+)$ and $QS(x-)$ are disjoint. Hence, $C_D(ES(x+))$ and $C_D(QS(x-))$ are disjoint. The disjointness between $C_D(ES(x-))$ and $C_D(QS(x+))$ can be shown similarly. (Q.E.D.)
Proof of Lemma 2 For this proof, we focus on the gC implementation, and furthermore, only $C(x+)$ is considered, because the proof for the other cover and the atomic gate implementation can be done similarly. Let $ES'$ and $QS'$ denote the excitation and stable state sets of $G_D$. From the construction of $G_D$, those are obtained by $ES'(x+) = \mathrm{proj}_D(C_D(ES(x+)))$, $QS'(x+) = \mathrm{proj}_D(C_D(QS(x+)))$, and so on. Since some unreachable states of $G$ are mapped into those excitation or stable states of $G_D$, the unreachable state set $UR'$ of $G_D$ satisfies $\mathrm{proj}_D^{-1}(UR') \subseteq UR$.
The first thing to be shown is that $G_D$ has CSC and is output semi-modular. The former is straightforward from Lemma 1. For the latter, suppose that $G_D$ is not output semi-modular. This can happen if an output transition $t_x$ is enabled by an irrelevant transition $t_d$ in $G$, and $t_d$ is replaced by a dummy transition in $G_D$, resulting in a new chain of dummy transitions to $t_x$ from a dummy transition conflicting with some transition. Let $s_1 \xrightarrow{t_d} s_2$ be the state transition caused by $t_d$. Since $t_d$ makes $t_x$ enabled, $s_1 \notin ES(x+)$ and $s_2 \in ES(x+)$ hold (assuming that $t_x$ is a rising transition of an output $x$). This, however, contradicts that $t_d$ is a transition related to an irrelevant signal, because $s_1 \in C_D(s_2)$ but $s_1 \notin ES(x+) \cup UR$. Therefore, $G_D$ is output semi-modular. A similar discussion holds for the case that $t_d$ is followed by a chain of dummy transitions. Since $G$ is output semi-modular, there are no other cases in which $G_D$ is not output semi-modular. Hence, $G_D$ can be concluded to be output semi-modular. The signals related to transitions like $t_d$ are called trigger signals for $x$, and they do not belong to any irrelevant input set for $x$.
The above two facts guarantee the existence of a correct circuit with respect to $G_D$. Let $C'(x+)$ denote one of its covers. Note that this cover satisfies
$$ES'(x+) \subseteq C'(x+) - UR' \subseteq ES'(x+) \cup QS'(x+). \tag{1}$$
The next thing to be shown is that this cover also satisfies the correctness condition of $G$. Since $C'(x+)$ should be considered in the state space of $G$, this is shown by
$$ES(x+) \subseteq \mathrm{proj}_D^{-1}(C'(x+)) - UR \subseteq ES(x+) \cup QS(x+). \tag{2}$$
The above $\mathrm{proj}_D^{-1}(C'(x+)) - UR$ can be rewritten as follows.
$$\begin{aligned}
\mathrm{proj}_D^{-1}(C'(x+)) - UR
&= \mathrm{proj}_D^{-1}\bigl((C'(x+) - UR') \cup (C'(x+) \cap UR')\bigr) - UR \\
&= \bigl(\mathrm{proj}_D^{-1}(C'(x+) - UR') \cup \mathrm{proj}_D^{-1}(C'(x+) \cap UR')\bigr) - UR \\
&= \mathrm{proj}_D^{-1}(C'(x+) - UR') - UR.
\end{aligned} \tag{3}$$
This (3) is obtained from $\mathrm{proj}_D^{-1}(UR') \subseteq UR$. Applying $\mathrm{proj}_D^{-1}$ to (1) derives
$$\begin{aligned}
\mathrm{proj}_D^{-1}(ES'(x+)) &\subseteq \mathrm{proj}_D^{-1}(C'(x+) - UR') \subseteq \mathrm{proj}_D^{-1}(ES'(x+) \cup QS'(x+)) \\
C_D(ES(x+)) &\subseteq \mathrm{proj}_D^{-1}(C'(x+) - UR') \subseteq C_D(ES(x+)) \cup C_D(QS(x+)) \\
C_D(ES(x+)) - UR &\subseteq \mathrm{proj}_D^{-1}(C'(x+) - UR') - UR \subseteq (C_D(ES(x+)) - UR) \cup (C_D(QS(x+)) - UR) \\
ES(x+) &\subseteq \mathrm{proj}_D^{-1}(C'(x+) - UR') - UR \subseteq ES(x+) \cup QS(x+)
\end{aligned} \tag{4}$$
The final relation (4) holds because $D$ is an irrelevant input set. From (3) and (4), (2) is obtained. (Q.E.D.)
