Technology mapping of speed-independent circuits based on combinational decomposition and resynthesis by Cortadella, Jordi et al.
Technology Mapping of Speed-Independent Circuits Based on
Combinational Decomposition and Resynthesis

Jordi Cortadella*
Universitat Politècnica de Catalunya, 08071 Barcelona, Spain

Michael Kishinevsky, Alex Kondratyev
The University of Aizu, Aizu-Wakamatsu, 965-80 Japan

Luciano Lavagno†
Politecnico di Torino, 10129 Torino, Italy

Alex Yakovlev‡
University of Newcastle upon Tyne, NE1 7RU England
Abstract 
This paper presents a solution to the problem of sequential multi-level logic synthesis of asynchronous speed-independent circuits. The starting point is a technology-independent speed-independent circuit obtained using, e.g., the monotonous cover conditions. We describe an algorithm for the factorization of this circuit aimed at implementing it in a given standard cell library, while preserving speed-independence. The algorithm exploits known efficient factorization techniques from combinational multi-level logic synthesis, but also achieves Boolean simplification. Experimental results show a significant improvement in terms of number and complexity of solvable circuits with respect to existing methods.
1 Introduction 
Recent years have seen a revival of interest in 
the sub-class of asynchronous circuits called speed- 
independent circuits. Such circuits have been char- 
acterized by Muller in his seminal paper [9] as be- 
ing hazard-free using the unbounded gate delay model. 
Even though neglecting wire delays can be restric- 
tive in practice, speed-independent circuits are a good 
starting point for synthesis and optimization proce- 
dures that use more detailed and realistic delay mod- 
els. On the other hand, very efficient analysis and 
synthesis techniques, supported by CAD tools, exist 
for speed-independent circuits today. 
Current synthesis techniques still suffer from a severe limitation: either they assume that the implementation library contains AND gates with unbounded fanin and "free" input inversions ([1, 5, 8]) or they use non-standard "hazard absorbing" flip-flops whose effectiveness in practice still needs to be evaluated ([11]). Other results on the implementability of semi-modular circuits without inputs using two-input/two-output AND and OR gates ([15]) are only interesting from a theoretical standpoint, due to their extremely high implementation cost.
*This work has been partly supported by the Ministry of Education of Spain (CICYT TIC 95-0419).
†This work has been partly supported by MURST research project "VLSI architectures".
‡This work has been partly supported by the U.K. EPSRC GR/J52327 and the British Council Programme (Spain) Acciones Integradas MDR/1996/97/1159.
Only recently have people begun to analyze the decomposability of speed-independent circuits using a given, realistic standard cell-like library. The approach described in [13] works only under the fundamental mode assumption¹, which is overly restrictive and does not fit well theoretically with the unbounded delay assumption. The same authors describe in [12] a method to perform technology mapping for speed-independent circuits that only decomposes existing gates (e.g., a 3-input AND into two 2-input ANDs), without any further search of the implementation space. They do not explore complex decompositions that could use multi-cube divisors or decompose several gates simultaneously. The same limitations also affect the work of [1, 2].
Most recent examples of relevant work can be found in [10, 11, 4]; each of them lacks flexibility either with respect to the gate library, the scope of optimization or the extent of logic sharing.
The main contribution of this paper is an efficient 
solution of the technology mapping problem for speed- 
independent circuits. We have developed a body of 
theory that allows us to prune the search space when 
looking for solutions. We use classical logic synthesis 
techniques for combinational multi-level logic in order 
to find good candidate functions for the decomposition. We then derive efficient filtering conditions that guarantee:
• speed-independent implementability of the new signals, and
• a bound on the global increase in complexity of the circuit, due to the need to acknowledge the new signals.
¹I.e., the environment is not allowed to change circuit inputs unless the circuit is stable.
1066-1409/97 $10.00 © 1997 IEEE
2 Theoretical background 
In this section we introduce theoretical concepts re- 
quired for understanding our decomposition method. 
These concepts are subdivided into three parts: (1) 
circuit specification and its basic logic implementabil- 
ity; (2) conditions of hazard-free decomposition of 
complex gates; and (3) correctness-preserving trans- 
formations to ensure those conditions. They are sum- 
marized in the following subsections. 
2.1 State Graphs and Logic 
A State Graph (SG) is a labeled directed graph 
whose nodes are called states. Each arc of an SG is 
labeled with an event, that is a transition (rising or 
falling) of an input or output signal of the specified 
circuit. Each state is labeled with a vector of signal values. An SG is consistent if its state labeling $v : S \to \{0,1\}^n$ is such that:
• for each edge $(s, a_i^+, s')$, $v_i(s) = 0$, $v_i(s') = 1$, while $v_j(s) = v_j(s')$ for all $j \neq i$;
• for each edge $(s, a_i^-, s')$, $v_i(s) = 1$, $v_i(s') = 0$, while $v_j(s) = v_j(s')$ for all $j \neq i$.
Informally, this means that in every transition se- 
quence from the initial state, rising and falling tran- 
sitions alternate for each signal. Figure 1,a shows an 
SG which is consistent. 
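The two consistency rules can be checked arc by arc. The following is a minimal sketch, assuming states are written as bit-strings and events as strings such as "a+" or "b-"; the function name `is_consistent` and the toy graphs are illustrative assumptions, not part of the original method.

```python
def is_consistent(signals, arcs):
    """Check the two consistency rules on every arc (s, event, s2).

    signals: ordered signal names, e.g. ["a", "b"]
    arcs:    iterable of (s, event, s2); states are bit-strings such as "01",
             events are strings such as "a+" or "b-" (an assumed encoding).
    """
    for s, event, s2 in arcs:
        name, sign = event[:-1], event[-1]
        i = signals.index(name)
        before, after = ("0", "1") if sign == "+" else ("1", "0")
        # the switching signal must go 0 -> 1 (rising) or 1 -> 0 (falling)
        if s[i] != before or s2[i] != after:
            return False
        # every other signal must keep its value across the arc
        if any(s[j] != s2[j] for j in range(len(signals)) if j != i):
            return False
    return True


SIGNALS = ["a", "b"]
GOOD = [("00", "a+", "10"), ("10", "b+", "11"),
        ("11", "a-", "01"), ("01", "b-", "00")]
BAD = [("00", "a+", "11")]  # signal b changes together with a
```

Here `is_consistent(SIGNALS, GOOD)` holds, while `BAD` is rejected because a second signal changes on the same arc.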
The set of signals A whose transitions label SG arcs is partitioned into a (possibly empty) set of inputs A_I, which come from the environment, and a set of outputs or state signals A_O that must be implemented.
In addition to consistency, the following two properties 
of the SG model are needed for their implementability 
in a logic circuit. 
The first property is speed-independence, required for the existence of a hazard-free circuit implementation. It consists of three constituents: determinism, commutativity and output-persistency. An SG is called deterministic if for each state s and each label a* there can be at most one state s' such that $s \xrightarrow{a^*} s'$. An SG is called commutative if whenever two transitions can be executed from some state in any order, then their execution always leads to the same state, regardless of the order. An event a* is called persistent in state s if it is enabled at s and remains enabled in any other state reachable from s by firing another event b*. An SG is called output-persistent if its output signal events are persistent in all states. Any transformation (e.g., insertion of new, auxiliary, signals for decomposition), if performed at the SG level, may affect all three properties.
Figure 1: An example of State Graph and logic decomposition (benchmark hazard.g)
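The three constituents of speed-independence can be phrased as simple checks over the arc list. This is an illustrative sketch only: the arc encoding, the function names and the two toy graphs are assumptions, and the commutativity check presumes a deterministic SG.

```python
from collections import defaultdict


def successors(arcs):
    """Map each state to {event: set of next states}."""
    succ = defaultdict(dict)
    for s, e, t in arcs:
        succ[s].setdefault(e, set()).add(t)
    return succ


def is_deterministic(arcs):
    return all(len(ts) == 1
               for evs in successors(arcs).values() for ts in evs.values())


def is_commutative(arcs):
    """Assumes determinism: firing a;b and b;a from one state must meet."""
    succ = successors(arcs)

    def step(s, e):
        nxt = succ.get(s, {}).get(e)
        return next(iter(nxt)) if nxt else None

    for s, evs in succ.items():
        for a in evs:
            for b in evs:
                if a == b:
                    continue
                ab, ba = step(step(s, a), b), step(step(s, b), a)
                if ab is not None and ba is not None and ab != ba:
                    return False
    return True


def is_output_persistent(arcs, outputs):
    """An enabled output event must stay enabled after any other event fires."""
    succ = successors(arcs)
    for s, evs in succ.items():
        for a in evs:
            if a[:-1] not in outputs:
                continue
            for b, targets in evs.items():
                if b != a and any(a not in succ.get(t, {}) for t in targets):
                    return False
    return True


DIAMOND = [("00", "a+", "10"), ("00", "b+", "01"),
           ("10", "b+", "11"), ("01", "a+", "11")]
CONFLICT = [("00", "a+", "10"), ("00", "b+", "01")]  # each event disables the other
```

The `DIAMOND` graph passes all three checks, whereas `CONFLICT` fails output-persistency as soon as `a` or `b` is declared an output.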
The second property, Complete State Coding 
(CSC), becomes necessary and sufficient for the ex- 
istence of a logic circuit implementation. A consistent 
SG satisfies the CSC property if for every pair of states 
s,s' such that v(s) = v(s'), the set of output events 
enabled in both is the same. (The SG in Figure 1,a is 
output-persistent and has CSC.) CSC does not however 
restrict the type of logic function implementing each 
signal, and therefore CSC and SG speed-independence 
ensure hazard-freedom only if each signal is imple- 
mented as a single atomic gate. The complexity of 
such gate can however go beyond that provided in a 
concrete library or technology. 
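The CSC condition compares the output events enabled in states that share a binary code. A minimal sketch, assuming states are identified by arbitrary names and mapped to codes by a dictionary (the names `has_csc`, `ARCS`, `BAD_CODE` and `GOOD_CODE` are hypothetical):

```python
from collections import defaultdict


def has_csc(arcs, code, outputs):
    """code: dict state -> binary code; outputs: set of output signal names."""
    enabled = defaultdict(set)
    for s, e, _ in arcs:
        if e[:-1] in outputs:
            enabled[s].add(e)
    # group the sets of enabled output events by binary code
    by_code = defaultdict(set)
    for s, c in code.items():
        by_code[c].add(frozenset(enabled.get(s, set())))
    # CSC holds if equally coded states enable the same output events
    return all(len(event_sets) == 1 for event_sets in by_code.values())


ARCS = [("s0", "x+", "s2"), ("s1", "a-", "s3")]
# s0 and s1 share code "10", but only s0 enables the output event x+
BAD_CODE = {"s0": "10", "s1": "10", "s2": "11", "s3": "00"}
# an extra (assumed) state signal separates s0 from s1
GOOD_CODE = {"s0": "100", "s1": "101", "s2": "110", "s3": "001"}
```

With `outputs={"x"}`, the first coding violates CSC and the second satisfies it, which mirrors the use of an additional signal to resolve coding conflicts.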
2.2 Hazard-free implementability 
The decomposition of a complex gate into smaller gates creates new signals that are not part of the original specification. In order to guarantee that these new signals do not produce hazards, we must ensure that their covers satisfy the important property of monotonicity, which is defined in this section.
Necessary and sufficient conditions for output-persistent implementation using unbounded fanin AND gates (with unlimited input inversions), bounded fanin OR gates and C elements were given in [8] (extending a previous result of [1]). In this work we are considering a similar basic implementation architecture, called the standard-C architecture, which is described in Figure 2. The difference from previous work is that instead of unbounded fanin AND and OR gates for the first and second levels, we will allow only implementable gates, that is, gates which exist in the chosen library. The conditions derived in [8] are correct also in the presence of input inversions if the delay of the inverter does not exceed that of the remaining logic on the fastest feedback loop involving the inverter itself.
Figure 2: The standard-C architecture extended for 
complex gates 
The concepts of excitation and quiescent regions are essential for defining the hazard-freedom conditions. A set of states is called an excitation region for event a* (denoted by $ER_j(a^*)$) if it is a maximal connected set of states such that for every state $s \in ER_j(a^*)$ there is an event $s \xrightarrow{a^*}$. Since any event a* can have several separated ERs, an index j is used to distinguish between different connected occurrences of a* in the SG. Similarly to ERs, we define switching regions (denoted by $SR_j(a^*)$) as connected sets of states reached immediately after the occurrence of an event.
The set of events entering states of an excitation region $ER_j(a^*)$ from outside the region is called a set of trigger events for event a*. Looking at a circuit implementation of an SG, signals whose events are trigger for an event of a signal a will certainly be inputs (called trigger signals for a) to the logic circuit implementing a.
The quiescent region $QR_j(a^*)$ of a given signal transition with excitation region $ER_j(a^*)$ is a maximal set of states s reachable from $ER_j(a^*)$ such that (1) a is stable in s and (2) s is not reachable from any other $ER_k(a^*)$ such that $k \neq j$ without going through $ER_j(a^*)$². Examples of ER, SR and QR are shown in Figure 1,a.
Let $c_j(a^*)$ denote one of the first-level complex AND-OR gates in the standard-C architecture. $c_j(a^*)$ is a correct monotonous poly-term cover³ for the generalized excitation region $ER_j(a^*)$ if
1. $c_j(a^*)$ covers (i.e., its Boolean equation evaluates to 1) all states of $ER_j(a^*)$.
2. $c_j(a^*)$ does not cover any state from $ER_i(a^*) \cup QR_i(a^*)$, where $i \neq j$.
3. $c_j(a^*)$ changes at most once within $QR_j(a^*)$.
Under these conditions, it is possible to show that the outputs of the first-level gates are one-hot encoded, which means that any valid Boolean decomposition of the second-level OR gates will be speed-independent.
²Note that contrary to [1, 8] in this paper we consider only the so-called restricted quiescent regions, which do not include states reachable directly from two different ERs of the same signal (condition 2 of the definition).
³Here for simplicity we consider the definition of Monotonous Cover without the extension by the so-called backward quiescent regions and without considering covering of multiple regions by the same cover. However, all the results can be easily generalized for this extension as well.
The chosen architecture also covers the case in 
which a signal in the specification admits a combi- 
national implementation (called a complete cover). In 
that case the set and reset network are the comple- 
ment of each other, and the C element with identical 
inputs can be simplified to a wire (see Figure 2,b,c). 
2.3 Property-preserving event insertion 
Our decomposition method is essentially behavioural: the extraction of new signals at the structural (logic) level must be matched by an insertion of their transitions at the behavioural (SG) level. Event insertion is an operation on the SG which selects a subset of states, splits each state in it into two states and creates, on the basis of these new states, an excitation and switching region for a new event. Figure 3 shows the chosen insertion scheme, analogous to that used by most authors in the area [14], in the three main cases of insertion with respect to the position of the states in the insertion set ER(x) (entrance to, exit from or inside ER(x)).
Figure 3: Event insertion scheme 
State signal insertion must also preserve the speed-independence of the original specification, which is required for the existence of a hazard-free asynchronous circuit implementation. Formally, we say that an insertion state set ER(x), in an SG A' obtained from a deterministic and commutative SG A by inserting event x, is a speed-independence preserving subset (SIP-set) iff (1) for each $a \in E$, if a is persistent in A, then it remains persistent in A', and (2) A' is deterministic and commutative. An efficient method of finding SIP-sets based on the notion of regions has been proposed in [6].
Assume that the set of states S in an SG is parti- 
tioned into two subsets which are to be encoded by 
means of an additional signal. This new signal can 
be added either in order to satisfy the CSC condition, 
or to break up a complex gate into a set of smaller 
gates. In the latter case, a new signal is added to rep- 
resent the output of the intermediate gate added to 
the circuit. 
Let r and $\bar{r} = S - r$ denote the blocks of such a partition. In order to implement such an encoding, we need to insert appropriate transitions of the new signal in the border states between the two subsets.
In this paper we shall consider the so-called input border (IB) of a partition block r, denoted by IB(r), which informally is a subset of states of r which have predecessors not in r. We call IB(r) well-formed if there are no events leading from states in r − IB(r) to states in IB(r).
Insertion of a new signal can be formalized with the notion of I-partition ([14] used a similar definition). Given an SG with a set of states S, an I-partition is a partition of S into four blocks: $S^0$, $S^1$, $S^+$ and $S^-$. $S^0$ ($S^1$) defines the states in which x will have the value 0 (1). $S^+$ ($S^-$) defines ER(x+) (ER(x−)). For a consistent encoding of x, the only allowed events crossing boundaries of the blocks are the following: $S^0 \to S^+ \to S^1 \to S^- \to S^0$, $S^+ \to S^-$ and $S^- \to S^+$.
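The input border, its well-formedness and the allowed block crossings of an I-partition can be expressed directly over the arc list. The sketch below is illustrative; the block labels "S0", "S+", "S1", "S-", the function names and the four-state ring are assumptions made for the example.

```python
# Block crossings allowed for a consistent encoding of the new signal x
ALLOWED = {("S0", "S+"), ("S+", "S1"), ("S1", "S-"), ("S-", "S0"),
           ("S+", "S-"), ("S-", "S+")}


def input_border(block, arcs):
    """States of the block that have a predecessor outside the block."""
    return {t for s, _, t in arcs if t in block and s not in block}


def is_well_formed(block, arcs):
    """No arc may lead from block - IB(block) into IB(block)."""
    ib = input_border(block, arcs)
    return not any(s in block and s not in ib and t in ib for s, _, t in arcs)


def is_i_partition(block_of, arcs):
    """block_of maps each state to one of 'S0', 'S+', 'S1', 'S-'."""
    for s, _, t in arcs:
        bs, bt = block_of[s], block_of[t]
        if bs != bt and (bs, bt) not in ALLOWED:
            return False
    return True


RING = [("p0", "e1", "p1"), ("p1", "e2", "p2"),
        ("p2", "e3", "p3"), ("p3", "e4", "p0")]
BLOCKS = {"p0": "S0", "p1": "S+", "p2": "S1", "p3": "S-"}
```

On this ring the partition is a valid I-partition and IB({p1, p2}) = {p1} is well-formed; adding an arc from p0 directly to p2 would cross from S0 to S1 and is rejected.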
3 The technology mapping method 
As described in the previous section, any deter- 
ministic, commutative, output-persistent SG with the 
CSC property and satisfying the Monotonous Cover 
conditions can be implemented using the standard- 
C architecture. This section describes how the 
potentially large abstract gates derived from the 
Monotonous Cover implementation can be decom- 
posed into library gates while maintaining speed- 
independence. The potentially huge search space is 
limited by an efficient search algorithm that prunes 
decompositions that are guaranteed to violate speed- 
independence. 
Figure 4: Decomposition of the cover function c(a*) 
To simplify the implementation of c(a*), a sub-function f is extracted from it and is implemented by a separate gate with output $f^s$. In a common form c(a*) is represented by c(a*) = f * g + r (where f, g and r are arbitrary Boolean functions), as shown in Figure 4. This approach is more general than [12, 4], where switchings of a new signal $f^s$ must be acknowledged by $c_j(a^*)$ only and gate f always has a fan-out equal to 1. The acknowledgment of $f^s$ by covers different from c(a*) (see e.g. the function c(b*) in Figure 4) offers two advantages: the sharing of a gate f by several cover functions can simplify the implementation and, more importantly, succeed in many cases for which a local acknowledgment fails (see the experimental results).
The overall algorithm for signal insertion aimed at logic decomposition is sketched below. The next sections describe each step in more detail.
while circuit is not implementable do
    Calculate monotonous covers for all events;
    a* = event with the most complex cover;
    /* Kernels, co-kernels, AND/OR decomposition */
    D = {set of divisors for c(a*)};
    for each f ∈ D do
        Find I-partition for f;
        Evaluate progress for decomposition of c(a*);  /* (property 3.1) */
        Estimate progress for all other covers;        /* (property 3.2) */
    end for
    if there is no f ∈ D that can make progress on c(a*)
    then /* The cover c(a*) cannot be decomposed */
        return;
    else
        f_best = f with valid I-partition and best global decomposition progress;
        Insert a new signal with f_best's I-partition;
    end if
end while
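The control flow of the loop above can be mimicked on a deliberately simplified model. In the sketch below every cover is a single cube (a set of literals), candidate divisors are two-literal sub-cubes, and the I-partition and progress checks are reduced to caller-supplied placeholder functions; none of these simplifications, nor the names `decompose`, `is_valid` and `progress`, come from the original algorithm.

```python
from itertools import combinations


def decompose(covers, max_literals, is_valid, progress):
    """Toy version of the loop: covers maps an event name to one cube."""
    covers = {e: set(c) for e, c in covers.items()}
    inserted = []
    while any(len(c) > max_literals for c in covers.values()):
        # a* = event with the most complex cover
        target = max(covers, key=lambda e: len(covers[e]))
        # D = candidate divisors (here: all 2-literal sub-cubes)
        candidates = [frozenset(p)
                      for p in combinations(sorted(covers[target]), 2)]
        valid = [f for f in candidates if is_valid(f)]  # stands in for the I-partition check
        if not valid:
            return None  # the cover cannot be decomposed
        best = max(valid, key=lambda f: progress(f, covers))
        name = "f%d" % len(inserted)
        inserted.append((name, best))
        # global acknowledgement: every cover containing the divisor uses the new signal
        for c in covers.values():
            if best <= c:
                c -= best
                c.add(name)
    return covers, inserted


def shared(f, covers):
    """Placeholder progress measure: number of covers that contain divisor f."""
    return sum(f <= c for c in covers.values())


EXAMPLE = {"p+": {"a", "b", "c"}, "q+": {"a", "b", "d"}}
```

Calling `decompose(EXAMPLE, 2, lambda f: True, shared)` extracts the shared sub-cube `ab` once and reuses it in both covers, which is the effect of sharing a gate among several cover functions in this toy setting.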
The algorithm can be tuned by trading-off efficiency 
and quality of the results. For example, other events 
different from a* can be also selected for decomposi- 
tion in case no good divisor is found for a*. 
Note that the algebraic divisors are only used for a 
preliminary choice of the function of the new signal to 
be added to the SG. The well-formedness conditions 
are then used to refine this function, so that it has 
a speed-independent implementation. The implemen- 
tation of every signal in the circuit is recomputed at 
every step. This practically implements boolean di- 
vision and can even obtain sequential decomposition. 
The conditions discussed below are used to guarantee 
progress at each step. We prune those divisors that 
would excessively increase the complexity of other sig- 
nals due to the requirement to acknowledge every tran- 
sition of the new signal to satisfy speed-independence. 
We use the example hazard.g from the set of asynchronous benchmarks to illustrate our algorithm. Its SG is shown in Figure 1,a and an MC-implementation of the output signals c and x is presented in Figure 5,a. Our target is the decomposition of function $S_x$ into two-input gates, because two-input gates are a standard worst case against which the performance of a decomposition algorithm can be measured.
Figure 5: Circuits for a hazard.g example before (a) 
and after (b) decomposition 
3.1 Logic decomposition
We assume here familiarity with multi-level logic synthesis (see [3] for more details). As traditionally done in multi-level combinational synthesis, we have chosen algebraic division as the main operation for logic decomposition. However, other divisors (e.g. Boolean divisors) might also be considered within this scheme. For each event a*, there may be several optimal functions that can implement a monotonous cover c(a*). Instead of finding decompositions for all valid covers, we choose only one of the minimum-cost covers and seek algebraic divisors for it, so as to avoid an explosion in the computational cost of the search.
Thus, for each cover c(a*) we seek algebraic divisors, aiming at decompositions of the following type: c(a*) = f * g + r, where g is the quotient c(a*)/f. AND-decomposition is done when r = 0, whereas OR-decomposition occurs when g = 1.
To find good divisors f for c(a*) the following functions are considered:
• Kernels and co-kernels of c(a*).
• If c(a*) is a poly-term cover, any subset of terms of the sum-of-product expression (OR-decomposition).
• If c(a*) is one cube, any subset of literals of the cube (AND-decomposition).
• Recursive decomposition of the previous candidates, e.g. sub-kernels and AND/OR-decomposition of kernels.
This generation of divisors is heuristically pruned 
to avoid an explosion of candidates for functions with 
many terms or cubes with many literals. Experimental 
results have shown this type of decomposition to be 
very effective. 
Example hazard.g (Figure 1). Function $S_x$ consists of a single 3-literal cube $\bar{a}dc$. It can be decomposed in three ways: by functions $\bar{a}d$, $\bar{a}c$ and $dc$.
Example 2. For the cover c(z*) = ab + ac + def the following divisors are generated (trivial 1-literal divisors are not considered): the kernel b + c, the OR-decompositions ab, ac, def, ab + ac, ab + def and ac + def, and the AND-decompositions de, df and ef.
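The divisor list of Example 2 can be reproduced with a few lines of enumeration. This is only a sketch under simplifying assumptions: cubes are strings of positive single-letter literals, kernels are found only for single-literal co-kernels, and no heuristic pruning is applied; the function names are hypothetical.

```python
from itertools import combinations


def or_divisors(terms):
    """Proper, non-empty subsets of the terms of a sum-of-products cover."""
    return [" + ".join(sub)
            for k in range(1, len(terms))
            for sub in combinations(terms, k)]


def and_divisors(cube):
    """Proper sub-cubes with at least two literals (1-literal divisors skipped)."""
    return ["".join(sub)
            for k in range(2, len(cube))
            for sub in combinations(cube, k)]


def single_literal_kernels(terms):
    """Quotients of the cover by one literal that keep at least two terms.

    Simplification: a full kernel computation would also try multi-literal
    co-kernels and make each quotient cube-free.
    """
    kernels = []
    for lit in sorted(set("".join(terms))):
        quotient = [t.replace(lit, "") for t in terms if lit in t]
        if len(quotient) >= 2:
            kernels.append(" + ".join(quotient))
    return kernels


COVER = ["ab", "ac", "def"]  # c(z*) = ab + ac + def
```

For `COVER`, the three functions return the kernel `b + c`, the six OR-decompositions and, applied to the cube `def`, the AND-decompositions `de`, `df` and `ef` listed in Example 2.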
3.2 Speed-independent implementation of decomposition function
A Boolean function f defines a bipartition $\{S^0, S^1\}$ of the set of states of the SG, where $S^0$ ($S^1$) contains states in which f = 0 (f = 1). To insert a signal $f^s$ that realizes function f, it is necessary to find two additional sets of states: $ER(f^{s+}) \subseteq S^1$ and $ER(f^{s-}) \subseteq S^0$ in which $f^s$ is enabled and fires from 0 to 1 and from 1 to 0, respectively.
Let us denote by IB(f+) the set of SG states in which the function f changes the value from 0 to 1, i.e. $IB(f+) = \{ s \in A \mid \exists\, s_1 \to s \wedge f(s_1) = 0 \wedge f(s) = 1 \}$. Clearly IB(f+) must be included into $ER(f^{s+})$. Signal $f^s$ can have a speed-independent implementation if and only if its excitation regions are well-formed SIP sets [6]. These sets can be obtained by the following iterative procedure:
Generation of $ER(f^{s+})$
1. Start from $ER(f^{s+}) = IB(f+)$.
2. Force well-formedness: recursively insert in $ER(f^{s+})$ all states from $S^1$ that are direct predecessors of some state from $ER(f^{s+})$.
3. Force SIP properties: if some state diamond intersects with $ER(f^{s+})$ in an illegal way, remove the illegal intersection by inserting in $ER(f^{s+})$ the corresponding states of the diamond.
4. Preserve (if necessary) the input-output interface: if an input signal b can be delayed in an insertion of $f^s$ by $ER(f^{s+})$, then extend $ER(f^{s+})$ beyond the ER(b*), thus removing $f^{s+}$ from the set of triggers of b. Return to Step 2.
Calculation of $ER(f^{s+})$ stops either if at some step $ER(f^{s+})$ intersects with $S^0$ (then no legal $ER(f^{s+})$ exists) or a fixed point is reached. It can be shown [7] that the above procedure always finds the well-formed SIP set $ER(f^{s+})$ (if it exists) that is minimal and unique. Calculation of $ER(f^{s-})$ can be done similarly based on IB(f−).
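The fixed-point structure of the procedure can be sketched as follows. Only Steps 1 and 2 are implemented; Steps 3 and 4 are represented by an optional callback `extend`, which is a hypothetical placeholder that returns additional states to insert. State names and the example graph are assumptions made for illustration.

```python
def generate_er(states, arcs, f, extend=lambda er: set()):
    """Return the grown insertion set, or None if it would reach S0.

    states: iterable of state names; arcs: (s, event, t) triples;
    f: predicate on states giving the value of the divisor function;
    extend: placeholder for the SIP and interface steps (Steps 3 and 4).
    """
    s1 = {s for s in states if f(s)}
    # Step 1: start from the input border IB(f+)
    er = {t for s, _, t in arcs if not f(s) and f(t)}
    while True:
        # Step 2: predecessors inside S1 of states already in the set
        new = {s for s, _, t in arcs if t in er and s in s1}
        new |= extend(er)
        if not new <= s1:
            return None  # the set intersects S0: no legal ER exists
        if new <= er:
            return er    # fixed point reached
        er |= new


STATES = ["p", "q", "r", "u"]
ARCS = [("p", "e1", "r"), ("q", "e2", "r"), ("u", "e3", "q")]
ON = {"q", "r", "u"}  # states where f = 1; p is the only state with f = 0
```

Starting from IB(f+) = {r}, the closure adds q and then u, so `generate_er(STATES, ARCS, ON.__contains__)` returns {r, q, u}; an `extend` callback that asks for state p forces the failure case.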
Example hazard.g. Figure 1,b implies that a decomposition of $S_x = \bar{a}cd$ by a function $f = \bar{a}d$ is illegal, because f intersects illegally with a state diamond {1011, 0011, 1001, 0001} and this intersection cannot be removed by extending IB(f+) without hitting the states where f = 0.
Decompositions based on functions $\bar{a}c$ and $dc$ are valid and the corresponding excitation regions of signal $f^s$ are shown in Figure 1,c,d. Note that states {0011, 1001} are included into $ER(f^{s-})$ (Figure 1,d) to preserve the I/O interface; otherwise input events a− or d− would be delayed by the insertion of $f^s$. State 0001 is included into $ER(f^{s-})$ to preserve speed-independence.
3.3 Progress Analysis 
A proper choice of $ER(f^{s+})$ and $ER(f^{s-})$ guarantees the possibility of a speed-independent implementation of a signal $f^s$. However, it does not guarantee that the speed-independence of a target cover function c(a*) = f * g + r is not disturbed when f is implemented by a separate gate. Conditions for safe substitution of f by signal $f^s$ in c(a*) are given by the following property. These conditions are formulated in terms of states of the original SG, i.e. before $f^s$ is actually inserted. Hence the conditions can be efficiently checked without reconstructing the SG (as in [4]). Note, though, that in the formulation of the property we must consider the set $QR(a^*)' \supseteq QR(a^*)$, which also includes states in the excitation regions of the transitions of a after QR(a*), if $f^{s-}$ becomes a trigger for those transitions.
Property 3.1 Let c(a*) = f * g + r be a monotonous cover of ER(a*), and $ER(f^{s+})$ and $ER(f^{s-})$ be the excitation regions of a signal $f^s$ (obtained as discussed above). The implementation $f^s * g + r$ of c(a*) satisfies the MC conditions iff:
1. If $s_1 \in (ER(a^*) \cap f * g * \bar{r}) \cap ER(f^{s+})$ and $s_1 \xrightarrow{a^*} s_2$, then state $s_2 \notin ER(f^{s+})$. This ensures that $f^s$ evaluates to 1 in all states of ER(a*) that are covered only by f * g.
2. For any state s outside $ER(a^*) \cup QR(a^*)'$ we have $s \notin ER(f^{s-}) \cap g$. This ensures that the cube $f^s * g$ cannot evaluate to 1 outside $ER(a^*) \cup QR(a^*)'$.
3. For any state $s \in (QR(a^*) \cap f * g * \bar{r})$, $s \notin ER(f^{s+})$, and for any state $s_1$ predecessor of a state $s_2 \in QR(a^*)' \cap ER(f^{s-}) \cap g$, $s_1 \in r + g$. These two conditions ensure the monotonous behavior of $f^s * g$ inside $QR(a^*)'$.
The proof is in [7].
Example hazard.g. All the conditions of Property 3.1 are satisfied for the decompositions $\bar{a}c$ and $dc$, and for both of them $S_x$ can be safely decomposed into two AND gates.
3.4 Cost estimation 
We now estimate the complexity of the circuit ob- 
tained after decomposition with respect to that of the 
original circuit. 
Complexity of the implementation of $f^s$
It can be shown, by analyzing the MC conditions ([7]), that $f^s = f$ is a correct complete cover for a signal $f^s$.
Events for which $f^s$ is not a trigger
The preconditions for these events are not modified by the insertion of $f^s$, and hence we can use the same implementation as before the decomposition⁴.
Events for which $f^s$ is a trigger
Here we have two different cases.
1. Trigger event $f^{s*}$ replaces another trigger event d* for some ER(e*). In this case we try to substitute the literal $f^s$ for d in the cover function c(e*) = d * m + n. This can be done by checking conditions that are similar to those of Property 3.1.
2. Trigger event $f^{s*}$ does not replace any other trigger event for ER(e*). In this case the complexity of the cover function for ER(e*) can in general increase (unless the expanded don't care space induced by $f^{s*}$ permits further simplification). This increase can be moderate (one literal) if the conditions of the following property are satisfied.
Property 3.2 [7] Let c(b*) be a monotonous cover for event b* in SG A. If in the SG A' obtained from A by inserting the new signal $f^s$:
1. event $f^{s+}$ is a trigger for b*;
2. $ER(f^{s+}) \cap SR(b^*) = \emptyset$; and
3. $c(b^*) \cap ER(f^{s-}) = \emptyset$,
then the implementation $c(b^*) * f^s$ in A' for event b* satisfies the Monotonous Cover conditions.
⁴It is possible, though, that $f^s$ can be used to simplify the implementation of those signals as well, by increasing the don't care space.
This property is used as a heuristic filter to select can- 
didate divisors that are guaranteed not to increase 
excessively the complexity of the implementation of 
other signals. 
Note that this conservative estimation is applied not only for Case 2 but also for Case 1 when the substitution of an old trigger signal by a new one fails.
Example hazard.g. In the decomposition using $dc$ (Figure 1,d) signal $f^s$ becomes a new trigger for x− without replacing any other trigger event. Hence the cover for x− will increase by one literal, while the cover for x+ will decrease by one literal. Hence this decomposition is not useful.
In the decomposition using $\bar{a}c$, event $f^{s-}$ is inserted before c− and replaces trigger event a+. The function for c− will not increase in complexity. The result of the decomposition by function $\bar{a}c$ is shown in Figure 5,b.
4 Experimental results 
The strategy for general logic decomposition previ- 
ously presented has been implemented and applied to 
a set of different benchmarks. Results are shown in 
Table 1. 
We have measured the complexity of each gate as the number of literals required to implement it as a sum-of-product gate, either complemented or not. Thus a 2-input XOR gate ($a\bar{b} + \bar{a}b$) is considered to be a 4-literal gate, whereas the function f = ab + ac + db + dc is also considered a 4-literal gate ($\bar{f} = \bar{a}\bar{d} + \bar{b}\bar{c}$). This model is slightly different from the one used in [4], in which the complexity was measured as the number of different inputs for FPGA lookup tables.
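The literal-count measure can be illustrated on the two functions mentioned above. In this sketch a sum-of-products is a list of cubes, each cube a dictionary from variable name to required value, and the caller supplies both the function and its complement; the representation and the names `literals`, `evaluate` and `gate_complexity` are assumptions, and no logic minimization is performed.

```python
from itertools import product


def literals(sop):
    """Number of literals of a sum-of-products given as a list of cubes."""
    return sum(len(cube) for cube in sop)


def evaluate(sop, point):
    return any(all(point[v] == val for v, val in cube.items()) for cube in sop)


def gate_complexity(sop, sop_complement, variables):
    """Literal count of the cheaper of the two forms, plain or complemented."""
    # sanity check: the second expression really is the complement of the first
    for values in product((0, 1), repeat=len(variables)):
        point = dict(zip(variables, values))
        assert evaluate(sop, point) != evaluate(sop_complement, point)
    return min(literals(sop), literals(sop_complement))


# f = ab + ac + db + dc and its complement a'd' + b'c'
F = [{"a": 1, "b": 1}, {"a": 1, "c": 1}, {"d": 1, "b": 1}, {"d": 1, "c": 1}]
F_BAR = [{"a": 0, "d": 0}, {"b": 0, "c": 0}]
# 2-input XOR and its complement (XNOR)
XOR = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
XNOR = [{"a": 1, "b": 1}, {"a": 0, "b": 0}]
```

Both `gate_complexity(F, F_BAR, ["a", "b", "c", "d"])` and `gate_complexity(XOR, XNOR, ["a", "b"])` evaluate to 4: the first because the complemented form needs only four literals, the second because neither form is cheaper than the other.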
The first set of columns in Table 1 indicates the complexity of the circuit before decomposition. The second set of columns reports the number of signals inserted for decomposition using gates with at most i literals (i = 2, 3, 4). The next column summarizes the results presented by Siegel [12] about the implementability of the circuit with only 2-input gates. All the implementations have been verified to be speed-independent.
Of the 32 examples, only 6 were not implemented (n.i.) with 2-literal gates. Only one 6-input AND gate in pe-send-ifc and two 5-literal gates in tsend-bm were not decomposed when attempting to implement these circuits with 4-literal gates. Our results show a significant improvement over those presented in [12], and only one circuit (pe-rcv-ifc) could not be implemented with 2-literal gates from that benchmark suite.
Global acknowledgement allows our method to effectively decompose complex gates with high fan-in (6 or 7 literals). This is shown by circuits like mr1 and vbe10b that were implemented with 2-literal gates. Figure 6 illustrates this fact, depicting the circuit vbe10b before and after logic decomposition into 2-literal gates.
The final columns present a rough estimation of 
the cost for speed-independence-preserving logic de- 
composition. The cost is evaluated as the number of 
Circuit 
doc-outbound 
chu133 
chul50 
converta 
dff 
ebergen 
half 
hazard 
master-read 
mmu 
mp-forward-pkt 
mrO 
nU-1 
nak-pa 
nowick 
pe-rcv-ifc 
pe-send-ifc 
ram-read-sbuf 
rcv-setup 
Tdf t  
sbuf-ram- writ e 
sbuf-send-ctl 
sbuf-send-pkt2 
seqmix 
seq4 
trimos-send 
tsend-bm 
vbe5b 
vbe5c 
vbe6a 
vbelOb 
wrdatab 
# gates with n literals 
n = 2  3 4 5 6 7 
4 2  
2 2  
3 2  
5 2  
2 2  
5 2  
1 1  
2 2  
4 4 1  
4 2 1 1 1  
2 2  
5 1 2 3 1 1  
3 2 2 1 1  
8 1  
6 2  
10 10 3 1 
10 7 7 1 1  
4 3  
1 1 
1 3  
2 4  
2 1 
5 4  
2 3  2 
1 2  1 
6 3 
8 6 4 2  1 
3 1  
1 
4 4 
2 5  1 
5 7 1  I 10 2 
our tech. mapping 
i = 2  i = 3  i = 4  
1 
2 
2 
1 
2 
3 
1 
2 
8 1 
n.i. n.i. 2 
2 
n.i. 5 4 
9 4 3 
1 
2 
ni .  5 1 
n.i. n.i. n.i. 
2 
2 1 
5 2 
4 
n.i. 1 
6 
15 3 2 
4 1 1 
11 3 
n.i. n.i. n.i. 
1 
1 
8 4 
10 3 1 
1312 
1611 
2113 
1412 
2112 
712 
3717 
1412 
1311 
1712 
1513 
1412 
2313 
1412 
312 
3719 
yes I 1313 1713 
no 
Yes 
no 
no 
no 
no 
5417 4619 
2414 2414 
2113 1611 
2414 2314 
1111 
2211 
911 
2210 
2413 2916 
2813 3013 
3816 75/10 
2215 2517 
3316 4718 
713 613 
1312 1312 
3816 1816 
4317 3317 
I 5215 62 j 8  
I 624192 6481109 
Table 1: Experimental results 
literals of the combinational gates and the number of C elements of the circuit. The column "non-SI" reports the cost of decomposing the original implementation of the circuit into 2-literal gates without preserving speed-independence (tech_decomp -a 2 command in SIS). The column "SI" reports the cost of the decomposition preserving speed-independence. In some cases, such as vbe6a, the number of literals is reduced because the decomposition strategy allows sharing logic among different covers. In most cases extra cost is added for the preservation of speed-independence. However, considering that the area of a C element is roughly equivalent to a 3-input AND gate, we can conclude that the cost of preserving speed-independence is not higher than 10% of the area.
5 Conclusions and future work 
In this paper we have shown a solution to the prob- 
lem of multi-level logic synthesis and technology map- 
ping for asynchronous speed-independent circuits. Let 
us summarise our two-step approach. 
The first step (Section 3.1) chooses a candidate for decomposition: algebraic kernels, non-cube-free sub-covers, sub-cubes etc. Different versions are evaluated and the "best" one is taken. The "classical" combinational decomposition stops here.
Figure 6: vbe10b before and after logic decomposition into 2-literal gates.
The second step (Sections 3.2 and 3.3) performs the
actual decomposition: it attempts to find a speed-independent
implementation as similar as possible to
the candidate obtained at step (1). This is based on
a bi-partitioning using SIP-insertion corresponding to
signal x obtained by the first step. Functions for all
signals are derived from scratch. Our complexity arguments
in Section 3.4 show that there is a good chance
that x will get exactly the same function that was
extracted at step (1). However, there is also a chance
that this function will be smaller (thanks to boolean
decomposition). Multiple acknowledgements for x appear
automatically at this function generation step.
Functions for signals that were not decomposed at
step (1) may also change. Hence, the chosen implementation
for x may correspond to a very general sequential
decomposition. Moreover, this is not a local
but a “global” decomposition, since other signals may
change as well.
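The control flow of the two steps can be sketched as follows. This is only an illustration under stated assumptions: the function two_step_decomposition and the three callables it takes are hypothetical names, not part of petrify, and the toy stubs do no speed-independence checking at all. They only show where candidate selection ends and where the insertion of the new signal x begins.

```python
from typing import Callable, Iterable, Optional


def two_step_decomposition(
    circuit: str,
    candidates: Callable[[str], Iterable[str]],
    cost: Callable[[str], int],
    insert_si: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Hypothetical skeleton of the two-step approach described above."""
    # Step 1: enumerate candidate divisors (kernels, sub-covers, sub-cubes)
    # and keep the one with the lowest cost.
    options = list(candidates(circuit))
    if not options:
        return None
    best = min(options, key=cost)
    # Step 2: try to insert a new signal x for that divisor. A real
    # implementation would re-derive all functions and return None when
    # speed-independence cannot be preserved.
    return insert_si(circuit, best)


# Toy stubs (assumptions for illustration only).
def toy_candidates(circuit: str) -> Iterable[str]:
    return ["ab", "a"]


def toy_cost(divisor: str) -> int:
    return -len(divisor)  # prefer the larger divisor


def toy_insert(circuit: str, divisor: str) -> Optional[str]:
    # Plain textual substitution; no hazard or SI analysis is performed.
    return f"x={divisor}; y={circuit.replace(divisor, 'x')}"


result = two_step_decomposition("abc+abd", toy_candidates, toy_cost, toy_insert)
print(result)
```

Keeping the insertion behind a callable that may return None reflects the fact that the second step can fail or yield a function different from the candidate chosen in the first step.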
The method is implemented in the tool petrify. To
the best of our knowledge, the results of the previous
section show that the method produces the
most effective and efficient known decompositions of
the standard set of asynchronous benchmarks (beating
even the fundamental mode solutions). For example,
benchmarks such as vbe10 and wrdatab have been
decomposed for the first time into two-input AND gates
by a software tool.
References 
[1] P. A. Beerel and T. H.-Y. Meng. Automatic gate-level
synthesis of speed-independent circuits. In Proceedings
of the International Conference on Computer-Aided
Design, November 1992.
[2] Peter A. Beerel and Teresa H.-Y. Meng. Logic transformations
and observability don’t cares in speed-independent
circuits. In Proceedings of TAU 1993,
September 1993.
[3] R. K. Brayton, G. D. Hachtel, and A. L. Sangiovanni-Vincentelli.
Multilevel logic synthesis. Proceedings of
the IEEE, 78(2):264-300, February 1990.
[4] S. Burns. General conditions for the decomposition of
state holding elements. In International Symposium
on Advanced Research in Asynchronous Circuits and
Systems, Aizu, Japan, March 1996.
[5] S. Burns and A. Martin. A synthesis method for self- 
timed VLSI circuits. In Proceedings of the Interna- 
tional Conference on Computer Design, 1987. 
[6] J. Cortadella, M. Kishinevsky, A. Kondratyev,
L. Lavagno, and A. Yakovlev. Complete state encoding
based on the theory of regions. In International
Symposium on Advanced Research in Asynchronous
Circuits and Systems, Aizu, Japan, March 1996.
[7] J. Cortadella, M. Kishinevsky, A. Kondratyev, 
L. Lavagno, and A. Yakovlev. Speed-independent 
circuit technology mapping: decomposition of com- 
binatorial logic. Technical report, University of Aizu, 
Japan, September 1996. 
[8] A. Kondratyev, M. Kishinevsky, B. Lin, P. Vanbekbergen,
and A. Yakovlev. Basic gate implementation
of speed-independent circuits. In Proceedings of the
Design Automation Conference, 1994.
[9] D. E. Muller and W. C. Bartky. A theory of asyn- 
chronous circuits. In Annals of Computing Laboratory 
of Harvard University, pages 204-243, 1959. 
[10] Enric Pastor, Jordi Cortadella, Alex Kondratyev,
and Oriol Roig. Structural methods for the synthesis
of speed-independent circuits. In Proc. of European
Design and Test Conference, pages 340-347,
Paris (France), March 1996.
[11] M. Sawasaki, C. Ykman-Couvreur, and B. Lin. Externally
hazard-free implementations of asynchronous
circuits. In Proceedings of the Design Automation
Conference, June 1995.
[12] P. Siegel and G. De Micheli. Decomposition methods
for library binding of speed-independent asynchronous
designs. In Proceedings of the International
Conference on Computer-Aided Design, pages 558-565,
November 1994.
[13] P. Siegel, G. De Micheli, and D. Dill. Automatic tech- 
nology mapping for generalized fundamental mode 
asynchronous designs. In Proceedings of the Design 
Automation Conference, June 1993. 
[14] P. Vanbekbergen, B. Lin, G. Goossens, and H. De 
Man. A generalized state assignment theory for trans- 
formations on Signal Transition Graphs. In Proceed- 
ings of the International Conference on Computer- 
Aided Design, pages 112-117, November 1992. 
[15] V. I. Varshavsky, M. A. Kishinevsky, V. B.
Marakhovsky, V. A. Peschansky, L. Y. Rosenblum,
A. R. Taubin, and B. S. Tzirlin. Self-timed Control of
Concurrent Processes. Kluwer Academic Publishers,
1990. (Russian edition: 1986).
