Technology mapping for speed-independent circuits: Decomposition and resynthesis by Kondratyev, Alex et al.
Alex Kondratyev, The University of Aizu, Japan 
Jordi Cortadella, Univ. Politecnica de Catalunya, Barcelona, Spain* 
Michael Kishinevsky, The University of Aizu, Japan 
Luciano Lavagno, Politecnico di Torino, Italy†
Alex Yakovlev, University of Newcastle upon Tyne, United Kingdom‡
Abstract 
This paper presents the theory and a practical implementation of a
method for multi-level logic synthesis of speed-independent circuits.
An initial circuit implementation is assumed to satisfy the monotonous
cover conditions but is technology independent. The proposed method
performs both combinational (inserting new gates) and sequential
(inserting new memory elements) decomposition of complex gates in a
given standard cell library, while preserving the original behaviour
and speed-independence. The algorithm applies known efficient algebraic
factorization techniques from combinational multi-level logic synthesis,
but also achieves boolean simplification and sequential decomposition.
The method allows sharing of decomposed logic.
1 Introduction 
Speed-independent circuits, originating from D.E. 
Muller's work [11], are hazard-free under the unbounded
gate delay model. With recent progress in developing effi- 
cient analysis and synthesis techniques, supported by CAD 
tools, this sub-class has moved closer to practice, bear- 
ing in mind the advantages of speed-independent designs, 
such as their greater temporal robustness and self-checking 
properties. 
Existing methods of logic synthesis for speed-independent circuits
either assume that the implementation library contains AND gates with
unbounded fanin and "free" input inversions ([1, 5, 9]) or they use
non-standard "hazard absorbing" flip-flops whose effectiveness in
practice still needs to be evaluated ([14]). Other results on the
implementability of semi-modular circuits without inputs using
two-input/two-output AND and OR gates ([10]) are only interesting from
a theoretical standpoint, due to their extremely high implementation
cost.
In attempts to map speed-independent circuits into a more realistic,
standard cell-like library, other sorts of restrictions have been
imposed. For example, the approach described in [16] works only under
the fundamental mode assumption, which is overly restrictive and does
not fit well theoretically with the unbounded delay assumption. The
same authors describe in [15] a method to perform technology mapping
for speed-independent circuits that only decomposes existing gates
(e.g., a 3-input AND into two 2-input ANDs), without any further search
of the implementation space. They do not explore complex decompositions
that could use multi-cube divisors, or decompose several gates
simultaneously. The same limitations also affect the work of [1, 2].
The idea of complete resynthesis of a circuit every time a new signal
is inserted is exploited in [12] for the technology mapping of timed
asynchronous circuits. However, the search space for decomposition is
again limited to a single signal network.
* This work has been partly supported by the Ministry of Education of
Spain (CICYT TIC 95-0419), ACiD-WG (ESPRIT 21949) and integrated
action UK -1995-0203.
† This work has been partly supported by MURST research project
"VLSI architectures".
‡ Work supported by UK EPSRC GRn24038, ACiD-WG (ESPRIT 21949) and
British Council integrated action Spain (MDR/1996/97/1159).
In [13] a method for technology mapping of speed- 
independent circuits using complex gates was presented. 
This method however only identifies when a set of simple 
logic gates can be implemented as a complex gate, but 
cannot perform a speed-independent decomposition of a 
signal function in case it does not fit into a single gate. In 
fact, this method can be used as a post-optimization step 
after our proposed decomposition technique. 
Finally, Burns [4] analyzes the correctness conditions
for a decomposition of a sequential element that is part of 
a speed-independent circuit into two sequential elements 
(or a sequential and a combinational element). Notably, 
these conditions are analyzed using the original (unex- 
panded) behavioural model, thus helping the efficiency of 
the method. This work is, in our opinion, a big step in 
the right direction, but addresses mainly correctness issues. 
It does not describe how to use the efficient correctness 
checks in an optimization loop, and does not allow the 
sharing of a decomposed gate by different signal networks. 
The idea of combinational logic decomposition with 
resynthesis has been proposed in [8,7]. The approach com- 
bines together efficient algebraic factorization techniques 
used in multi-level combinational logic synthesis (finding 
candidates for decomposition), and speed-independence 
preserving signal insertion (the latter idea originated in [17]
and was implemented efficiently in [6]). 
The main contribution of this paper is a generalisation and extension
of the above basic idea so as to cover both combinational and
sequential decomposition. We have developed a body of theory that
allows us to prune the search space when looking for solutions. We
continue to use classical logic synthesis techniques available for
combinational multi-level logic in order to find good candidate
functions for the decomposition. In the case of combinational
decomposition the newly inserted signal is implemented by a library
gate. The insertion of a combinational gate is based primarily on one
of the two transitions of the gate's output (e.g., its rising
transition). The other transition of the combinational gate is fully
determined by the insertion place of the first transition.
0-8186-7922-0/97 $10.00 © 1997 IEEE
A sequential decomposition, based on a new memory element, can improve
the progress of mapping by giving the opposite transition a more
effective role, since the set and reset logic are inserted
independently. In particular, two boolean functions can be decomposed
at the same time with one new signal. Thus, in comparison with [4],
this method:
• targets the search of the solution towards a given library;
• allows logic sharing based on multiple acknowledgments;
• performs global optimization via resynthesis (rather than sequential
  decomposition).
Throughout the paper we use the following notation:
• A stands for the original State Graph, A' for a new State Graph
  obtained by signal insertion.
• a, b, c, ... (lower case Latin letters) are used for signal names
  and for the corresponding literals in Boolean functions.
• x always denotes a new signal, which is inserted in State Graph A to
  decompose a non-implementable function.
• B, C, F, P, Q, R, ... (upper case Latin letters, except A) stand for
  the names of Boolean functions.
2 Theoretical background 
In this section we introduce theoretical concepts required 
for our decomposition method: (1) circuit specification 
and its logic implementability; (2) conditions for speed- 
independent decomposition of complex gates; and (3) 
transformations of state graphs to ensure those conditions. 
2.1 State Graphs and Logic Implementability 
A State Graph (SG) is a labeled directed graph whose nodes are called
states. Each arc of an SG is labeled with an event, that is a rising
(a+) or falling (a-) transition of a signal a in the specified
circuit. We also use the notation a* if we are not specific about the
direction of the signal transition. Each state is labeled with a
vector of signal values. An SG is consistent if its state labeling
$v : S \to \{0,1\}^n$ (one bit per signal) is such that in every
transition sequence from the initial state, rising and falling
transitions alternate for each signal. Figure 1,b shows the SG for the
Signal Transition Graph in Figure 1,a, which is consistent. We write
$s \xrightarrow{a}$ ($s \xrightarrow{a} s'$) if there is an arc from
state s (to state s') labeled with a.
Figure 1: An example of a Signal Transition Graph (a) and State Graph
(b) (benchmark hazard.g). Signals a and d are inputs; state codes are
listed in the order acdz.
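The consistency requirement can be illustrated with a small sketch. It assumes a hypothetical encoding that is not taken from the paper: arcs are `(source, event, target)` triples, events are strings such as `"a+"`, and every state already carries a binary code. Under that assumption, alternation of rising and falling transitions reduces to a per-arc check on the codes.

```python
def is_consistent(arcs, codes, signals):
    """Check every arc against the binary codes of its end states.

    arcs:    iterable of (source, event, target); event is e.g. "a+" or "z-"
    codes:   dict mapping a state to a string of '0'/'1', one char per signal
    signals: signal names in the order used by the codes
    (all three structures are illustrative assumptions)
    """
    index = {name: i for i, name in enumerate(signals)}
    for src, event, dst in arcs:
        name, direction = event[:-1], event[-1]
        i = index[name]
        old, new = ("0", "1") if direction == "+" else ("1", "0")
        # the switching signal must go 0->1 on "+" and 1->0 on "-"
        if codes[src][i] != old or codes[dst][i] != new:
            return False
        # every other signal keeps its value across the arc
        if any(codes[src][k] != codes[dst][k]
               for k in range(len(signals)) if k != i):
            return False
    return True
```

A four-state cycle a+, b+, a-, b- over the codes 00, 10, 11, 01 passes this check, while relabeling the arc from 11 to 01 as a+ fails it.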
The set of all signals whose transitions label SG arcs is partitioned
into a (possibly empty) set of inputs, which come from the environment,
and a set of outputs or state signals that must be implemented. In
addition to consistency, the following two properties of an SG are
needed for its implementability in a speed-independent logic circuit.
The first property is speed-independence. It consists of three
constituents: determinism, commutativity and output-persistency. An SG
is called deterministic if for each state s and each label a there can
be at most one state s' such that $s \xrightarrow{a} s'$. An SG is
called commutative if whenever two transitions can be executed from
some state in any order, then their execution always leads to the same
state, regardless of the order. An event a* is called persistent in
state s if it is enabled at s and remains enabled in any other state
reachable from s by firing another event b*. An SG is called
output-persistent if its output signal events are persistent in all
states. Any transformation (e.g., insertion of new signals for
decomposition), if performed at the SG level, may affect all three
properties.
The second property, Complete State Coding (CSC), 
becomes necessary and sufficient for the existence of a 
logic circuit implementation. A consistent SG satisfies the 
CSC property if for every pair of states s,s' such that 
v(s) = v(s'), the set of output events enabled in both states
is the same. (The SG in Figure 1,b is output-persistent and
has CSC.) CSC does not however restrict the type of logic 
function implementing each signal. It requires that each 
signal is cast into a single atomic gate. The complexity 
of such a gate can however go beyond that provided in a 
concrete library or technology. 
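As an illustration of the CSC definition, the following sketch groups states by their binary code and compares the sets of enabled output events. The data layout (arc triples, a `codes` dictionary, a set of output signal names) is an assumption made for the example, not the representation used by the authors.

```python
def enabled_outputs(arcs, state, outputs):
    """Output events enabled in `state` (events are strings like "z+")."""
    return {ev for src, ev, _ in arcs if src == state and ev[:-1] in outputs}


def has_csc(arcs, codes, outputs):
    """True if states sharing a code enable the same output events."""
    by_code = {}
    for state, code in codes.items():
        by_code.setdefault(code, []).append(state)
    for states in by_code.values():
        reference = enabled_outputs(arcs, states[0], outputs)
        if any(enabled_outputs(arcs, s, outputs) != reference
               for s in states[1:]):
            return False
    return True
```

Two states with the same code where only one of them enables an output event form a CSC conflict; if the differing event belongs to an input, the check still passes.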
2.2 Gate-level implementability without hazards 
Necessary and sufficient conditions for speed-independent
implementation using unbounded fanin AND gates (with unlimited input
inversions), bounded fanin OR gates and C-elements were given in
[1, 9]. In this work we consider a similar basic implementation
architecture, called the standard-C architecture, which is described
in Figure 2. The difference from previous work is that instead of
unbounded fanin gates for the set and reset logic of C-elements, we
allow only implementable gates, that is, gates which exist in the
chosen library.
Figure 2: The standard-C architecture extended for com- 
plex gates 
The concepts of excitation and quiescent regions are essential for
that. A set of states is called an excitation region (ER) for event a*
(denoted by ERj(a*)) if it is a maximal connected set of states such
that $\forall s \in ER_j(a^*) : s \xrightarrow{a^*}$. Since any event
a* can have several separated ERs, an index j is used to distinguish
between different connected occurrences of a* in the SG.
The quiescent region (QR) (denoted by QRj(a*)) of a transition a*,
with excitation region ERj(a*), is a maximal set of states s reachable
from ERj(a*) such that a is stable in s and s is not reachable from any
other ERk(a*) such that k ≠ j without going through ERj(a*)¹. Examples
of ER and QR are shown in Figure 1,b.
Let Cj(a*) denote one of the first-level AND-OR gates in the
standard-C architecture. Cj(a*) is a correct monotonous poly-term
cover² for the excitation region ERj(a*) if the following three
conditions are satisfied:
1. Cover condition: Cj(a*) covers all states of ERj(a*) (i.e., Cj(a*)
   evaluates to 1 in all states of ERj(a*)).
2. One-hot condition: Cj(a*) does not cover any state outside
   ERj(a*) ∪ QRj(a*).
3. Monotonicity condition: Cj(a*) changes at most once along any state
   sequence within QRj(a*).
The conditions above are called the Monotonous Cover conditions, or
MC-conditions for short. Since under these conditions the outputs of
the first-level gates are one-hot encoded, any valid Boolean
decomposition of the second-level OR gates is speed-independent.
¹ Note that contrary to [9, 1] in this paper we use only the so-called
restricted quiescent regions, which do not include states reachable
directly from two different excitation regions of the same signal.
² Here for simplicity we consider the definition of Monotonous Cover
without the extension by the so-called backward quiescent regions and
without considering covering of multiple regions by the same cover.
However, all the results can be easily generalized for this extension
as well.
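The first two MC-conditions are simple set checks and can be sketched as follows. The function name and the representation (a cover as a Python predicate over state codes, regions as sets of states) are assumptions for illustration. The monotonicity condition is deliberately left out, because it needs the arcs inside the quiescent region.

```python
def check_cover_and_one_hot(cover, er, qr, all_states):
    """Return (cover_ok, one_hot_ok) for one cover and one ER/QR pair.

    cover:      predicate on a state, True where the cover evaluates to 1
    er, qr:     excitation and quiescent region as sets of states
    all_states: every state of the SG
    """
    # cover condition: the cover is 1 everywhere in the excitation region
    cover_ok = all(cover(s) for s in er)
    # one-hot condition: the cover is 0 outside ER union QR
    allowed = set(er) | set(qr)
    one_hot_ok = all(not cover(s) for s in all_states if s not in allowed)
    return cover_ok, one_hot_ok
```

With codes over two signals, ER = {10} and QR = {11}, the cover "first signal is 1" satisfies both conditions, whereas "second signal is 0" also covers state 00 and so violates the one-hot condition.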
The standard-C architecture permits a combinational 
implementation of a signal. If the set and reset networks 
are the complements of each other, then a C-element with 
identical inputs can be simplified to a wire (see Figure 
2,b,c). In such a case we say that the signal has a complete
cover.
2.3 Property-preserving event insertion 
Our decomposition method is essentially behavioural -- 
the extraction of new signals at the structural (logic) level 
must be matched by an insertion of their transitions at the 
behavioural (SG) level. Event insertion is an operation on 
a SG which selects a subset of states, splits each of them 
into two states and creates, on the basis of these new states, 
an excitation region for a new event. Figure 3 shows the 
chosen insertion scheme, analogous to that used by most 
authors in the area [17]. 
Figure 3: Event insertion scheme: (a) before insertion, (b) 
after insertion 
State signal insertion must preserve the speed-independence of the
original specification. An inserted signal is denoted by x in this
paper. The corresponding events are denoted x*, x+, x-, or, if no
confusion occurs, simply x. Let A be an SG and A' be a state graph
obtained by insertion of event x. We say that an insertion state set
ER(x) in an SG A is a speed-independence preserving set (SIP-set) iff:
(1) for each event a in A, if a is persistent in A, then it remains
persistent in A', and (2) A' is deterministic and commutative. The
formal conditions for a set of states r to be a SIP-set can be given
in terms of intersections of r with the so-called state diamonds of
the SG [6]. These conditions are illustrated by Figure 4, where all
possible cases of illegal intersections of r with state diamonds are
shown.
It was shown in [6] that the insertion of a signal by 
means of a SIP-set is a necessary and sufficient condition 
to preserve the speed-independence of a corresponding SG. 
This requirement is the most general one in the synthesis 
of speed-independent circuits and it does not restrict the 
solution space unless we go beyond the speed-independent 
class. An efficient method for finding SIP-sets, which 
is based on regions, has been proposed in [6]. The
first method for finding SIP-sets, based on reduction to the
satisfiability problem, was proposed in [17].
Assume that the set of states S in an SG is partitioned into two
subsets which are to be encoded by means of an additional signal. This
new signal can be added either in order to satisfy the CSC condition,
or to break up a complex gate into a set of smaller gates. In the
latter case, the new signal represents the output of the intermediate
gate added to the circuit. Let r and $\bar{r} = S - r$ denote the
blocks of such a partition. For implementing such a partition we need
to insert transitions of the new signal in the border states between r
and $\bar{r}$.
Figure 4: Possible violations of SIP conditions
In this paper we shall consider the so-called input border of a
partition block r, denoted by IB(r), which is informally a subset of
states of r by which r is entered. We call IB(r) well-formed if there
are no arcs leading from states in r - IB(r) to states in IB(r). If a
new signal is inserted using an input border which is not well-formed,
then the consistency property is violated. Therefore, if an input
border is not well-formed, its well-formed speed-independence
preserving closure is constructed, as described by Algorithm 4.1 in
Section 4.
The insertion of a new signal can be formalized with the notion of
I-partition ([17] used a similar definition). Given an SG A with a set
of states S, an I-partition is a partition of S into four blocks:
$\{S^+, S^1, S^-, S^0\}$. $S^0$ ($S^1$) defines the states in which x
will have the stable value 0 (1). $S^+$ ($S^-$) defines ER(x+)
(ER(x-)) in the new SG A'. Therefore, abusing notation, we will often
refer to $S^+$ ($S^-$) as ER(x+) (ER(x-)) when talking about states of
the original SG A or, if confusion may arise, we write $ER_A(x+)$
($ER_A(x-)$). If the insertion of x preserves consistency and
persistency, then the only transitions crossing boundaries of the
blocks are the following:
$S^0 \to S^+ \to S^1 \to S^- \to S^0$.
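The restriction on arcs that cross block boundaries lends itself to a direct check. The sketch below assumes a hypothetical encoding in which each state is mapped to one of the block names `"S0"`, `"S+"`, `"S1"`, `"S-"`; it reports every arc that crosses blocks in a direction other than the cyclic order given above.

```python
# Block crossings permitted by the cyclic order S0 -> S+ -> S1 -> S- -> S0.
ALLOWED = {("S0", "S+"), ("S+", "S1"), ("S1", "S-"), ("S-", "S0")}


def illegal_crossings(arcs, block_of):
    """List the arcs whose end states lie in blocks that may not be linked.

    arcs:     iterable of (source, event, target)
    block_of: dict mapping each state to "S0", "S+", "S1" or "S-"
    """
    bad = []
    for src, event, dst in arcs:
        pair = (block_of[src], block_of[dst])
        if pair[0] != pair[1] and pair not in ALLOWED:
            bad.append((src, event, dst))
    return bad
```

An empty result is only a necessary condition for a valid I-partition in this sketch; the SIP conditions themselves are not examined here.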
3 Decomposition techniques 
We assume here familiarity with multi-level logic syn- 
thesis (see [3] for more details). 
As described in the previous section, any deterministic, commutative,
output-persistent SG satisfying the CSC and the Monotonous Cover
conditions can be implemented using the standard-C architecture. We
assume that C-elements are present in the library³. OR-gates combining
cover functions C(a*) can be decomposed by any standard technique
since their inputs are one-hot encoded. Hence the bottleneck for
technology mapping is the implementation of cover functions C(a*)
using gates available in the library.
As traditionally done in multi-level combinational synthesis, we have
chosen algebraic division as the main operation for logic
decomposition. Thus, for each cover function C(a*) we seek algebraic
divisors, aiming at decompositions of the following type:
C(a*) = F * G + R, where G is the quotient C(a*)/F. AND-decomposition
is done when R = 0, whereas OR-decomposition occurs when G = 1.
³ In fact our technique works and is implemented also for RS- and
D-latches. However, this generalization of the method is omitted due
to lack of space.
Figure 5: Cover function C(a*) (a) and its combinational (b) and
sequential (c) decompositions
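The decomposition C(a*) = F * G + R can be made concrete with a minimal sketch of textbook algebraic (weak) division. It is not the authors' implementation: covers are assumed to be sets of cubes, each cube a `frozenset` of single-character positive literals, and the helper names are hypothetical.

```python
def cube(text):
    """Build a cube from a string of single-character literals."""
    return frozenset(text)


def divide(cover, divisor):
    """Algebraic division: return (quotient, remainder) such that
    cover = divisor * quotient + remainder."""
    quotient = None
    for d in divisor:
        # cubes of the cover that contain d, with d's literals removed
        partial = {c - d for c in cover if d <= c}
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    product = {d | q for d in divisor for q in quotient}
    remainder = set(cover) - product
    return quotient, remainder
```

For C = ab + ac + def and F = b + c this gives G = a and R = def. Dividing by F = ab + ac yields the empty cube as quotient, which is the G = 1 (OR-decomposition) case, and dividing the single cube def by de leaves R empty, which is the AND-decomposition case.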
However, contrary to the classical combinational de- 
composition we use divisor F not for immediate extraction, 
but as a first approximation of the function to be extracted. 
More specifically, function F defines one (sequential
decomposition) or two (combinational decomposition) blocks
of a partition of the state space, which is later used for new 
signal insertion (see Sections 4 and 5 for more details). 
Two ways of decomposing C(a*) are possible:
• combinational decomposition: a divisor F is implemented by a
  combinational gate, x, as shown in Figure 5,b and
• sequential decomposition: an additional latch (e.g., a C-element)
  implements signal x; divisor F is used as one of the input functions
  for the latch as shown in Figure 5,c. Another function (denoted by P
  in the figure) must be extracted from some other cover function.
  Functions F and P form the set and reset functions for the new
  sequential signal x.
In our decomposition technique transitions of x are acknowledged by
several cover functions. This is more general and powerful than
[15, 4], where transitions of x must be acknowledged locally, only by
the cover function C(a*) from which x is extracted. Multiple
acknowledgment offers two advantages: (1) the same signal x can be
shared by several cover functions (this corresponds to the extraction
of common sub-dividers in classical multi-level decomposition) and
(2) a correct speed-independent decomposition can be found even if it
does not exist for solutions with single acknowledgments (see the
experimental results). Note that we do not specifically search for
multiple acknowledgments. They appear automatically due to the signal
insertion technique based on SIP-sets. Hence our solution is correct
by construction and, contrary to [2], never requires iterations with
verification procedures.
To find good divisors F for C(a*) the following functions are
considered:
• Kernels and co-kernels of C(a*).
• If C(a*) is a poly-term cover, any subset of terms of the
  sum-of-products expression (OR-decomposition).
• If C(a*) is one cube, any subset of literals of the cube
  (AND-decomposition).
• Recursive decomposition of the previous candidates, e.g. sub-kernels
  and AND/OR-decomposition of kernels.
This generation of divisors is heuristically pruned to 
avoid an explosion of candidates for functions with many 
terms or cubes with many literals. Experimental results 
(Section 6) have shown this type of decomposition to be 
very effective. In particular, only those decompositions are 
considered that: 
(1) preserve speed-independence and 
(2) guarantee progress in mapping the circuit to the given 
library. 
The first condition is satisfied by finding an I-partition 
for signal x. Many candidates for decomposition are
filtered out at this step, since for many divisors there are 
no valid I-partitions. 
To clarify the second condition assume that function F is extracted
from a cover function C(a*) for combinational decomposition (see
Figure 5,b). If there is a valid I-partition for a new signal x, then
there is a speed-independent implementation for the circuit with
signal x. However, in general, there is no guarantee that function
C(a*) is simplified in the new circuit. The substitution of x for F in
C(a*) does not always preserve speed-independence and hence new fan-in
signals for C(a*) can appear in the implementation. Thus, the progress
condition checks whether a substitution of x for F in C(a*) is valid.
Since multiple acknowledgment of x can appear, the requirement for a
"good decomposition" is the following: the complexity of all functions
in x's fan-out other than C(a*) has to remain the same or to increase
only very moderately. In Section 4.3 we present a computationally
efficient method for the estimation of effective decompositions.
The overall algorithm for logic decomposition is 
sketched below. The next sections describe each step 
in more detail. 
Algorithm 3.1 (Speed-independent decomposition)
while circuit is not mapped to the library do
    Calculate monotonous covers for all events;
    Let a* be the event with the most complex cover;
    Let D_a be a set of divisors for C(a*);
        /* Kernels, co-kernels, AND/OR decomposition */
    Let D_b be a set of divisors for the most complex
        cover functions other than C(a*);
    for each F ∈ D_a do
        Decomposition(F, F);
            /* Check combinational decomposition for F */
        for each P ∈ D_b do
            Decomposition(F, P);
                /* Check sequential decomposition for the pair {F, P} */
        end for
    end for
    if all decompositions fail then
        return; /* Cover C(a*) cannot be decomposed */
    else
        Choose the best decomposition ({F, F} or {F, P});
        Insert a new signal;
            /* by an I-partition defined by the best decomposition */
    end if
end while
Decomposition(F, P):
    Find I-partition for the pair {F, P};
    if not exists then return failure;
    Evaluate progress for decomposition of C(a*);
        /* (Proposition 4.1) */
    if no progress then return failure;
    Estimate progress for all other covers;
        /* (Property 4.5) */
    if implementability is disturbed then return failure;
Note that after each cycle, when a successful decom- 
position is found, the implementation of every signal in 
the circuit is recomputed for the best candidate. Since at 
the recomputation step the new don’t care sets are used 
for all signals, this practically implements sequential de- 
composition and boolean division (i.e., it is far beyond the 
capabilities of algebraic factorization). 
Figure 6: Circuits for the hazard.g example before (a) and after (b)
decomposition
Example hazard.g. This example (from the set of asynchronous
benchmarks) is used for illustrating our algorithm. Its Signal
Transition Graph and SG are shown in Figure 1,a and b. Signals a and d
are inputs, signals c and z are outputs. A speed-independent
implementation of the output signals c and z is presented in Figure
6,a. Our target is the decomposition of function $S_z$ into two-input
gates, because it is a standard worst case against which the
performance of a decomposition algorithm can be measured. Function
$S_z$ consists of a single 3-literal cube $\bar{a}dc$. It can be
decomposed in three ways: by extracting functions $\bar{a}d$,
$\bar{a}c$ and $dc$.
Example 2. For the cover C(y*) = ab + ac + def the following divisors
are generated (trivial 1-literal divisors are not considered): the
kernel b + c, the OR-decompositions ab, ac, def, ab + ac, ab + def and
ac + def and the AND-decompositions de, df and ef.
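The candidate list of Example 2 can be reproduced with a short sketch. The three helpers are hypothetical and deliberately simplified: the kernel search only divides by single literals and skips the trivial kernel (the cover itself), whereas a full kernel extraction recurses further. Terms are assumed to be `frozenset`s of single-character literals.

```python
from itertools import combinations


def or_candidates(terms):
    """Proper, non-empty subsets of the terms of a sum-of-products."""
    return [set(sub) for k in range(1, len(terms))
            for sub in combinations(terms, k)]


def and_candidates(terms):
    """Proper sub-cubes with at least two literals, taken from each term."""
    found = set()
    for term in terms:
        for k in range(2, len(term)):
            found.update(frozenset(sub)
                         for sub in combinations(sorted(term), k))
    return found


def literal_kernels(terms):
    """Cube-free quotients of the cover by one literal (simplified)."""
    kernels = set()
    for lit in set().union(*terms):
        quotient = [t - {lit} for t in terms if lit in t]
        if len(quotient) < 2:
            continue  # a single cube is never a kernel
        common = frozenset.intersection(*quotient)
        kernels.add(frozenset(q - common for q in quotient))
    return kernels
```

For ab + ac + def this produces six OR-candidates, the AND-candidates de, df and ef, and the single kernel b + c, matching the list in the example.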
4 Combinational decomposition
4.1 State partitioning 
In this section we apply the theory of SIP-insertion, 
reviewed in Section 2.3, to a divisor F of a given cover 
C(a*). 
Definition 4.1 (Transition sets) Let $A = \langle V, E \rangle$ be an
SG with a set of states V and a set of events E. Let $S \subseteq V$
be a subset of states and $e \in E$ be an event. The following sets of
states are defined for S and e (see Figure 7):
$$
\begin{aligned}
before(e, S) &= \{s : s \notin S \wedge \exists s' (s \xrightarrow{e} s' \wedge s' \in S)\} \\
entry(e, S)  &= \{s : s \in S \wedge \exists s' (s' \xrightarrow{e} s \wedge s' \notin S)\} \\
leave(e, S)  &= \{s : s \in S \wedge \exists s' (s \xrightarrow{e} s' \wedge s' \notin S)\} \\
after(e, S)  &= \{s : s \notin S \wedge \exists s' (s' \xrightarrow{e} s \wedge s' \in S)\}
\end{aligned}
$$
Figure 7: Illustration of transition sets
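The four transition sets translate almost literally into set comprehensions. The sketch below assumes that the SG is given as a list of `(source, event, target)` triples, which is an illustrative encoding rather than the one used in the paper.

```python
def before(arcs, e, S):
    """States outside S from which event e leads into S."""
    return {s for s, ev, t in arcs if ev == e and s not in S and t in S}


def entry(arcs, e, S):
    """States inside S that are entered by e from outside S."""
    return {t for s, ev, t in arcs if ev == e and s not in S and t in S}


def leave(arcs, e, S):
    """States inside S from which e leads out of S."""
    return {s for s, ev, t in arcs if ev == e and s in S and t not in S}


def after(arcs, e, S):
    """States outside S that are reached by e from inside S."""
    return {t for s, ev, t in arcs if ev == e and s in S and t not in S}
```

With the arcs p -a+-> q -b+-> r -a--> t and S = {q, r}, event a+ enters S (before = {p}, entry = {q}) and event a- leaves it (leave = {r}, after = {t}).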
Pred(S) and Succ(S) give the sets of states outside S reachable in one
step in backward or forward direction, respectively. The input border
IB(S) and the exit border EB(S) of S give the sets of states inside S
from which the states not included in S are reachable in one step in
backward or forward direction, respectively. Our technique operates
with input borders. Set $IB(\bar{F})$ can be computed from Definition
4.1 as follows ($IB(F)$ is computed similarly):
$$
IB(\bar{F}) = \bigcup_{e \in E} entry(e, \{s : F(s) = 0\})
            = \{s : F(s) = 0 \wedge \exists s_1 : s_1 \to s \wedge F(s_1) = 1\}.
$$
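The input border of a block induced by a Boolean function can be sketched in a few lines. The helper names and the encoding (states as code strings, F as a Python predicate, arcs as triples) are assumptions for the example.

```python
def block_of(states, predicate, value):
    """States on which the Boolean function `predicate` equals `value`."""
    return {s for s in states if predicate(s) == value}


def input_border(arcs, block):
    """States of `block` that are entered by an arc from outside `block`."""
    return {dst for src, _, dst in arcs if dst in block and src not in block}
```

For the four-state cycle 00, 10, 11, 01 and F = a AND b, the block where F = 1 is {11} and is entered at 11, while the block where F = 0 is entered at 01.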
An event b* is said to be a trigger event for event a* 
if entry(b*, ER(a*)) ≠ ∅. Informally, by firing trigger
events it is possible to enter the excitation region for a*. 
We also say that signal b is a trigger signal for signal a 
and for event a*. All trigger signals for signal a must be 
included in the support of the logic function implementing 
a and hence each trigger signal will be in the fan-in of a. 
Triggers can be easily derived by observing ERs of a in 
the SG. 
We can also show another property of trigger signals, 
that will be used to estimate the complexity of the logic 
after decomposition. 
Property 4.1 Event x* is a trigger for event b* in SG A'
iff leave(b*, ER(x*)) ≠ ∅.
The proof follows directly from the rules for event
insertion (cf. Figure 3), because if leave(b*, ER(x*)) ≠ ∅
then the firing of b* will be delayed until x* has fired.
Any boolean function F defines a bipartition $\{S_F, S_{\bar{F}}\}$ of
the set of states of an SG: $S_F = \{s : F(s) = 1\}$ and
$S_{\bar{F}} = \{s : F(s) = 0\}$. As discussed in Section 2.3, for
inserting a new signal x it is necessary to find an I-partition,
$\{S^+, S^1, S^-, S^0\}$, based on the bipartition
$\{S_F, S_{\bar{F}}\}$. The four blocks of the I-partition are
constructed as follows:
• $S^- = ER(x-) \subseteq S_{\bar{F}}$ and $S^+ = ER(x+) \subseteq S_F$,
  corresponding to the excitation regions of x in the new SG, are
  obtained by the well-formed closure of the input border sets,
  $IB(\bar{F}) \subseteq S_{\bar{F}}$ and $IB(F) \subseteq S_F$,
  respectively [6].
• $S^1 = S_F - S^+$ and $S^0 = S_{\bar{F}} - S^-$.
The following property states that, if there is a well-formed SIP
closure of the IB, then there is a minimal closure that has strictly
fewer states than any other.
Property 4.2 [8] Let $\{b, \bar{b}\}$ be a bipartition of the SG
states. Let $I_1 \subseteq b$ and let $I_2$ be a minimal well-formed
SIP set such that $I_1 \subseteq I_2 \subseteq b$. Then $I_2$ either
does not exist or is unique.
In particular (the practically useful case), this property holds for
$I_1 = IB(b)$. The proof (see [8]) provides a constructive procedure
for selecting the minimal well-formed SIP closure of the input border
without backtracking and thus is computationally efficient. This
procedure can be summarized as follows. (We further illustrate it by
deriving ER(x-) for F = dc in $S_z$ for the hazard.g example, as shown
in Figure 8.)
Figure 8: Derivation of ER(x-) for the decomposition dc of the
hazard.g example
Algorithm 4.1 Generation of ERs for a new signal,
by example of $S^- = ER(x-)$
1. Let $ER(x-) = IB(\bar{F})$.
2. Find the well-formed closure by recursive application of the
   following rule: if $s \in Pred(ER(x-)) \cap S_{\bar{F}}$, then let
   $ER(x-) = ER(x-) \cup \{s\}$.
3. Preserve (if required) the input-output interface by checking that
   no input signals can be delayed by x. For this do the following:
   for any input signal b, if $s \in after(b^*, ER(x-))$, then let
   $ER(x-) = ER(x-) \cup \{s\}$.
4. Force SIP properties (make any intersection of state diamonds with
   ER(x-) legal by inserting in ER(x-) the corresponding states of the
   diamond). Go to Step 2.
Calculation of ER(x-) stops either if at some step ER(x-) intersects
with $S_F$ (then there is no legal ER(x-)) or a fixed point is
reached. Calculation of ER(x+) is done similarly based on IB(F).
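Steps 1 and 2 of the procedure above can be sketched as a fixed-point iteration. The function below is a partial illustration only: it starts from the input border of a block and absorbs predecessors lying in the same block, and it does not perform the input-output interface check (step 3) or the state-diamond legalization (step 4). The arc encoding is an assumption.

```python
def well_formed_closure(arcs, block):
    """Steps 1-2 only: input border of `block`, closed under
    predecessors that also belong to `block`."""
    # step 1: states of the block entered from outside the block
    region = {dst for src, _, dst in arcs
              if dst in block and src not in block}
    # step 2: absorb predecessors inside the block until nothing changes
    changed = True
    while changed:
        changed = False
        for src, _, dst in arcs:
            if dst in region and src in block and src not in region:
                region.add(src)
                changed = True
    return region
```

In a block {q, r, t} entered at q, with an arc from r back to q and an arc from r on to t, the closure is {q, r}: state r is absorbed because it leads back into the border, while t is not, since no arc leads from t into the region.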
Example hazard.g continued. In the example (see Figure 8,a)
ER(x-) = {1011} (step 1). It is well-formed (step 2). At step 3 we
find that state 0011 ∈ after(a-, ER(x-)) and state
1001 ∈ after(d-, ER(x-)). Therefore, {0011, 1001} are included in
ER(x-) (see Figure 8,b). State diamond {1011, 0011, 1001, 0001}
illegally intersects ER(x-) (step 4). To legalize this, the
intersection state 0001 is included in ER(x-) as shown in Figure 8,c.
Figure 9 shows the results of ER(x*) generation for the decomposition
of $S_z = \bar{a}cd$ with divisors $\bar{a}d$, $\bar{a}c$ and $dc$,
respectively. The choice $F = \bar{a}d$ is not valid (see Figure 9,a),
because F intersects illegally with state diamond
{1011, 0011, 1001, 0001}. This illegal intersection cannot be
corrected by expanding IB(F) without hitting states where F = 0. The
divisors $\bar{a}c$ and $dc$ are valid and the corresponding ERs of
signal x are shown in Figure 9,b,c.
Figure 9: Three attempts to decompose $S_z = \bar{a}cd$ in the
hazard.g example
4.2 Progress Analysis 
If ER(x+) and ER(x-) are derived, then there is a speed-independent
implementation of the SG with a new signal x. However, to ensure
progress in the technology mapping for the target cover function
C(a*) = F * G + R, we would like to have the following implementation
in the new circuit: C(a*) = x * G + R (function F is substituted in
this expression by one literal x)⁴. This is not always possible, since
to preserve speed-independence, C(a*) may require more fan-in signals.
We will formulate progress conditions which define when the
implementation above is valid.
State images. The progress conditions are easily formu- 
lated in terms of the new SG A' . However, constructing 
the new SG is computationally hard and hence it is better 
to use the original SG A (cf. the approach in [4]). For 
this we need to compare the states of A and their images 
in A'. The insertion scheme (Figure 3) determines a binary 
relation (we call it an image relation) between the states 
of A and the states of A'. A state s' from SG A' is said 
to be an image of a state s from A if values of all signals, 
except x ,  are the same in s and in s'. Then, state s is called 
the inverse image of s'. The inverse image for any state 
from A' is unique. The opposite is not true. Each state 
$s \in ER(x^*)$ from SG A has two images s', s'' in A' such
that $s' \xrightarrow{x^*} s''$. All other states in A have one image. The
image relation is expanded to the sets of states. If S is 
a set of states in A', then its inverse image is denoted by 
S-'. To avoid confusion, we will add subscript A or A' to 
address the objects in SGs A and A' if necessary. 
Inverse images for excitation and quiescent regions. The validity of
substituting a new signal x in a cover function C(a*) is checked by
considering the inverse images of $ER(a^*)_{A'}$ and $QR(a^*)_{A'}$.
By construction, only states from $ER(a^*)_A$ have images in which a*
is enabled, hence $ER(a^*)_A$ is the inverse image of $ER(a^*)_{A'}$.
For quiescent regions the image relation is more complicated.
Consider, for example, signal transition a+. For every state
$s \in QR(a+)_A$ there is an image in which signal a is equal to 1,
and therefore $QR(a+)_A \subseteq QR(a+)^{-1}_{A'}$. However,
$QR(a+)^{-1}_{A'}$ can include additional states because some original
signal transitions are delayed by x.
Figure 10: Inverse image for quiescent regions 
This case is illustrated in Figure 10. In SG A state $s \in ER(a-)$. However, in one of its images, s', signal a is equal to 1 and is stable, and therefore $s' \in QR(a+)_{A'}$. Hence, state s is in the inverse image of $QR(a+)_{A'}$. The following procedure computes the inverse image for a quiescent region (using $QR(a_i+)^{-1}_{A'}$ as an example).
4 Algorithm 4.1 does not modify the borders of $S_F$ or $S_{\bar{F}}$, so the combinational solution x = F is always valid. However, the technique described in this section may also find a sequential decomposition with this combinational "seed".
Algorithm 4.2 Computing inverse image for quiescent regions
$QR(a_i+)^{-1}_{A'} = QR(a_i+)_A$;
for each $a_j-$ that succeeds $a_i+$ do
    if $s \in ER(x*) \cap ER(a_j-) \wedge after(a_j-, s) \cap ER(x*) = \emptyset$
    then $QR(a_i+)^{-1}_{A'} = QR(a_i+)^{-1}_{A'} \cup \{s\}$
end for
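A minimal Python transliteration of Algorithm 4.2 is sketched below. The operator after(e, s) is defined elsewhere in the paper, so here it is passed in as a caller-supplied function; that signature, the dictionary of excitation regions, and the function name are assumptions made only for this illustration.

```python
from typing import Callable, Dict, Hashable, Set

State = Hashable


def inverse_image_qr(
    qr_plus: Set[State],
    er_x: Set[State],
    er_falling: Dict[str, Set[State]],
    after: Callable[[str, State], Set[State]],
) -> Set[State]:
    """One reading of Algorithm 4.2 for a quiescent region QR(a_i+)."""
    result = set(qr_plus)                       # start from QR(a_i+) in A
    for event, er_minus in er_falling.items():  # each a_j- that succeeds a_i+
        for s in er_x & er_minus:
            # add s only if after(a_j-, s) does not intersect ER(x*)
            if not (after(event, s) & er_x):
                result.add(s)
    return result
```

The sketch only adds states to the original quiescent region, in line with the observation that the inverse image can be larger than $QR(a+)_A$ but never smaller.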
Before formulating progress conditions we present a useful property that captures conditions for signal x to have a constant value inside the excitation region of the original signal a in the new SG even if the excitation region for x* in the original SG contains states from ER(a*) (x, as before, denotes the signal which is inserted for decomposition).
Property 4.3 [8] Let SG A' be obtained from SG A by insertion of signal x. Let a* be an event. Let ER(x+) be a well-formed SIP closure of the input border for a block of a state partition for SG A, obtained with Algorithm 4.1, such that $ER(x+) \cap ER(a*) \neq \emptyset$. If the following two conditions are satisfied for SG A
then x is equal to 1 in any state of $ER(a*)_{A'}$.
A symmetrical property holds for ER(x- ) .  The next 
proposition states the progress condition by presenting 
conditions for preserving monotonous cover conditions for 
substituting function F with one literal x in the cover 
function C(a*). 
Proposition 4.1 [7] Let $C_A(a*) = F * G + R$ be a monotonous cover of ER(a*) in SG A. Let ER(x+) and ER(x-) be the $S^+$ and $S^-$ sets for inserting a signal x obtained by Algorithm 4.1. The function $C_{A'}(a*) = x * G + R$ satisfies the three conditions for the monotonous cover in the new SG A' iff:
1. Cover condition: $after(a*, (ER(a*) \cap F * G * \bar{R})) \cap ER(x+) = \emptyset$
2. One-hot condition: $\forall s : s \notin ER(a*) \cup QR(a*)^{-1}_{A'} \Rightarrow s \notin ER(x-) \cap G$
3. Monotonicity conditions:
(a) $\forall s : s \in (QR(a*) \cap F * G * \bar{R}) \Rightarrow s \notin ER(x+)$, and
(b) $\forall s : s \in QR(a*)^{-1}_{A'} \cap ER(x-) \cap G \Rightarrow Pred(s) \in G + R$
The proof is given in [8]. The conditions in the above proposition can be informally explained as follows.
Condition 1 ensures the cover condition for $C_{A'}(a*)$ in the new SG A', by detailing Property 4.3. Set $ER(a*) \cap F * G * \bar{R}$ contains those states of ER(a*) in SG A that are covered by F * G, but not by R. Therefore, to satisfy the cover condition in SG A', the image of this set in A' must be covered by the function x * G. If $after(a*, (ER(a*) \cap F * G * \bar{R})) \cap ER(x+) \neq \emptyset$, then there is a transition $s_1 \xrightarrow{a*} s_2$ internal to ER(x+) such that $F(s_1) = G(s_1) = 1$ and $R(s_1) = 0$. Hence, state $s_1$ has two images $s_1'$ and $s_1''$ in A' such that $s_1' \xrightarrow{x+} s_1''$, which implies that signal x has value 0 in $s_1'$ and value 1 in $s_1''$. Therefore, state $s_1' \in ER_{A'}(x+)$ is not covered by $C_{A'}(a*) = x * G + R$ since both x * G and R have value 0 in $s_1'$. The cover condition is violated.
Condition 2 ensures the one-hot condition for $C_{A'}(a*)$ in the new SG A'. Let s be outside $ER(a*) \cup QR^{-1}(a*)$. If $s \in ER(x-) \cap G$ in SG A, then in the new SG A', function x * G evaluates to 1 in the first image s' of s ($s' \xrightarrow{x-} s''$). Hence, for A', function x * G evaluates to 1 outside $ER_{A'}(a*) \cup QR_{A'}(a*)$, which violates the one-hot condition for the cover function $C_{A'}(a*) = x * G + R$.
Condition 3 ensures the monotonicity condition for $C_{A'}(a*)$ in SG A'. Condition 3(a) guarantees that $C_{A'}(a*)$ cannot make a non-monotonous transition of the type "1-0-1" along any path inside $ER_{A'}(a*) \cup QR_{A'}(a*)$. Set $QR(a*) \cap F * G * \bar{R}$ contains the states of QR(a*) that are covered by F * G, but not by R, in SG A. Let some state s from this set belong to ER(x+). Then there are two images for state s in SG A': s' and s'' such that $s' \xrightarrow{x+} s''$. Function x * G evaluates to 0 in s' and to 1 in s''. Neither image is covered by R. Moreover, since states of ER(a*) are covered by $C_{A'}(a*)$, the cover function $C_{A'}(a*)$ performs a non-monotonous transition 1-0-1 along a path within $ER_{A'}(a*) \cup QR_{A'}(a*)$ (this path starts in ER(a*) and contains states s' and s'').
Condition 3(b) ensures that $C_{A'}(a*)$ cannot make a non-monotonous transition of the other type "0-1-0" along any path inside $ER_{A'}(a*) \cup QR_{A'}(a*)$. Assume that there is at least one state, s, such that $s \in QR(a*)^{-1}_{A'} \cap ER(x-) \cap G$ and let its predecessor, $s_1$, be covered neither by G nor by R. Then function $C_{A'}(a*)$ has value 0 in the image, $s_1'$, of $s_1$ (if $s_1$ has two images, then $C_{A'}(a*)$ has value 0 in both). State s has two images in A' ($s' \xrightarrow{x-} s''$). Function x * G evaluates to 1 in the first one, s', and to 0 in the second one, s''. Hence, function $C_{A'}(a*)$ performs a non-monotonous 0-1-0 transition along the path $s_1' \to s' \to s''$ in A'.
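The monotonicity violations discussed for Condition 3 can be made concrete with a small check. The Python sketch below evaluates a cover of the form x * G + R along a path of states and reports whether it changes value more than once; treating "at most one change along the path" as the test, the dictionary-based state encoding, and the helper names are simplifying assumptions for illustration only.

```python
from typing import Dict, List

State = Dict[str, int]  # hypothetical encoding: values of x, G and R in a state


def cover_value(state: State) -> int:
    """Value of the cover x*G + R in one state of the new SG."""
    return (state["x"] & state["G"]) | state["R"]


def is_monotonous(path: List[State]) -> bool:
    """True if the cover changes value at most once along the path."""
    values = [cover_value(s) for s in path]
    changes = sum(1 for u, v in zip(values, values[1:]) if u != v)
    return changes <= 1


# Path s1' -> s' -> s'' from the discussion of Condition 3(b):
# s1' is covered neither by G nor by R, and x falls between s' and s''.
bad_path = [
    {"x": 1, "G": 0, "R": 0},
    {"x": 1, "G": 1, "R": 0},
    {"x": 0, "G": 1, "R": 0},
]
# The same path when the predecessor is already covered by G.
good_path = [
    {"x": 1, "G": 1, "R": 0},
    {"x": 1, "G": 1, "R": 0},
    {"x": 0, "G": 1, "R": 0},
]
```

On `bad_path` the cover takes the values 0, 1, 0, which is the 0-1-0 pattern that Condition 3(b) excludes by requiring the predecessor to be covered by G + R.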
Example hazard.g continued. All the conditions of Proposition 4.1 are satisfied for $F = \bar{a}c$ and $F = dc$, and for both of them $S_z$ can be safely decomposed into two AND gates.
4.3 Cost estimation 
The progress condition (if satisfied) guarantees that the 
implementation of a target cover function C(a*) will be 
simplified as a result of a decomposition. However, to 
accept a decomposition we need to check that it will not 
increase the complexity of logic for other events. We use a 
conservative estimate of logic complexity, in which trigger 
signals play a key role, in order to select candidates for 
decomposition. 
All events (besides the target event a*) can be divided in 3 groups:
• Events x* of signal x. It can be shown, by analyzing the MC conditions, that x = F is a correct complete cover for signal x.
• Events for which x* is not a trigger. The preconditions for these events are not modified by the insertion of x, and hence we can (in the worst case) use the same implementation as before the decomposition. It is possible, though, that x can be used to further simplify the implementation of those signals as well, since the don't care set is increased.
• Events for which x* is a trigger, denoted by Tr(x). For estimating the complexity of such events the following procedure is used.
Algorithm 4.3 Estimating complexity of signals for which x is a trigger
1. for each $b* \in Tr(x)$ do
2.   if x* replaces trigger event d* in ER(b*) then   /* property 4.4 */
3.     if x substitutes d in a cover function C(b*) then   /* proposition 4.2 */
4.       The complexity of C(b*) is not increased
5.     else if x can be added as one additional literal to C(b*) then   /* property 4.5 */
6.       The complexity of C(b*) is increased moderately
7.     else
8.       Decomposition fails
9.     end if
10.  end if
11. end for
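The classification performed by Algorithm 4.3 can be sketched in Python as follows. The three checks (Property 4.4, Proposition 4.2, Property 4.5) are passed in as caller-supplied predicates, and the sketch adopts one reading of the control flow in which the one-extra-literal fallback is tried whenever replacement or substitution does not succeed; both the predicate names and that reading are assumptions, not a statement of how the tool implements it.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class Cost(Enum):
    NOT_INCREASED = 0   # x takes the place of an old trigger literal
    MODERATE = 1        # one extra literal x is added to the cover
    FAILS = 2           # the candidate decomposition is rejected


Check = Callable[[str], bool]


def estimate_trigger_events(events: Iterable[str],
                            replaces_trigger: Check,
                            substitutes: Check,
                            fits_extra_literal: Check) -> Dict[str, Cost]:
    """Classify every event b* for which x* is a trigger."""
    result: Dict[str, Cost] = {}
    for b in events:
        if replaces_trigger(b) and substitutes(b):   # Prop. 4.4 and 4.2
            result[b] = Cost.NOT_INCREASED
        elif fits_extra_literal(b):                  # Property 4.5
            result[b] = Cost.MODERATE
        else:
            result[b] = Cost.FAILS
    return result


def accepted(result: Dict[str, Cost]) -> bool:
    """A candidate is kept only if no event is classified as FAILS."""
    return all(cost is not Cost.FAILS for cost in result.values())
```

Keeping the three checks as separate predicates mirrors the text, where each line of the algorithm is justified by its own property or proposition.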
Below we consider the main steps of Algorithm 4.3.
Replacement of other trigger events by x (line 2 of Algorithm 4.3). Property 4.1 helps to find the set of events Tr(x) for which signal x becomes a trigger. Conditions for replacing a trigger event by a new signal transition x* are stated by the following property.
Property 4.4 [8] An event x* replaces d* as a trigger event for b* in SG A' iff in SG A the following conditions are satisfied:
(1) $entry(d*, ER(b*)) \subseteq ER(x*)$
(2) $before(d*, ER(b*)) \cap ER(x*) = \emptyset$
(3) $after(d*, entry(d*, ER(b*))) \cap ER(x*) = \emptyset$
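Once the three state sets named in Property 4.4 are available, the test itself reduces to set operations. The Python sketch below assumes that the sets for the entry, before and after operators have already been computed (their definitions are given elsewhere in the paper); the function and parameter names are hypothetical.

```python
from typing import Hashable, Set

State = Hashable


def replaces_trigger(entry_states: Set[State],
                     before_states: Set[State],
                     after_states: Set[State],
                     er_x: Set[State]) -> bool:
    """Set-based reading of the three conditions of Property 4.4."""
    return (entry_states <= er_x               # (1) entry set lies inside ER(x*)
            and not (before_states & er_x)     # (2) no overlap before d*
            and not (after_states & er_x))     # (3) no overlap after d*
```

In the hazard.g example, a violation of condition (2) corresponds to a non-empty intersection in the second test, which is why a- and d- stay concurrent with x-.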
Example hazard.g continued. Let us consider a combinational decomposition of $S_z$ using function F = dc. ER(x+) satisfies all the conditions of Property 4.4 and hence x+ becomes a new trigger event for z+ instead of d+. On the other hand, for ER(x-), condition 2 of Property 4.4 is violated for both events a- and d-. Therefore, events a- and d- are concurrent with x- and neither of them is replaced by the new trigger event x-. After inserting signal x, event z- will have three trigger events: x-, a-, d-. For the decomposition based on function $F = \bar{a}c$, the new signal x replaces old trigger signals for both z+ and z-.
Validating substitution of signal x into a cover function other than C(a*) (line 3 of Algorithm 4.3). If a trigger event x* replaces another trigger event d* for some ER(b*), then the next step is to check that signal d can be replaced by signal x in the logic implementation of C(b*). Assume that $C_A(b*) = d * M + N$. We want to check the validity of the substitution $C_{A'}(b*) = x * M + N$. Conditions for the validity of such a substitution are almost identical to those of Proposition 4.1.
Proposition 4.2 Let C(b*) = d * M + N be a monotonous cover of ER(b*) in SG A. Let $\{S^+ = ER(x+), S^1, S^- = ER(x-), S^0\}$ be the I-partition for inserting signal x. The implementation $C_{A'}(b*) = x * M + N$ satisfies the three conditions for monotonous cover in the new SG A' iff:
1. Cover condition: $(after(a*, (ER(a*) \cap d * M * \bar{N})) \cap ER(x+) = \emptyset) \wedge (ER(a*) \cap d * M * \bar{N} \cap ER(x-) = \emptyset)$
2. One-hot condition: $\forall s : s \notin ER(a*) \cup QR(a*)^{-1}_{A'} \Rightarrow s \notin (ER(x-) \cup S^1) \cap M$
3. Monotonicity conditions:
(a) $\forall s : s \in (QR(a*) \cap d * M * \bar{N}) \Rightarrow s \notin ER(x+)$, and
(b) $\forall s : s \in QR(a*)^{-1}_{A'} \cap (ER(x-) \cup S^1) \cap M \Rightarrow Pred(s) \in M + N$
Let us clarify the difference between Propositions 4.1 and 4.2. In Proposition 4.1 signal x is the output of the gate implementing function F and is substituted into C(a*) = F * G + R instead of F. In Proposition 4.2 x substitutes signal d, which is implemented by a gate different from the gate implementing x. Therefore, Conditions 1-3 have a more general form in Proposition 4.2. Indeed, to ensure the cover condition (according to Property 4.3) condition $ER(a*) \cap F * G * \bar{R} \cap ER(x-) = \emptyset$ is required. This condition is automatically satisfied if x = F and x substitutes F in C(a*), whereas it is not if x substitutes signal d. If signal x substitutes function F, x is equal to 1 in the same states as F with the exception of ER(x*). Hence, in the one-hot and the monotonicity conditions, we should only consider states from ER(x-). If x substitutes signal d, then states from $S^1$ should be considered as well.
Note that Proposition 4.2 can also be used when signal x replaces several trigger signals $d_1, \ldots, d_k$. In this case the cover function for b* can be represented as $C(b*) = d_1 * \ldots * d_k * M + N$. After substituting x, the new cover function is $C_{A'}(b*) = x * M + N$.
When the replacement fails (line 5 of Algorithm 4.3). In this case the complexity of a cover function for ER(b*) can in general increase (unless the expanded don't care set induced by x* implies further simplification of C(b*)). If the conditions of the following property are satisfied, then no more than one literal is added to the fan-in of C(b*). We restrict our method to such a moderate increase in complexity only to bound the search space.
Property 4.5 [7, 8] Let $C_A(b*)$ be a monotonous cover for event b* in SG A. If in the SG A' obtained from A by inserting a new signal x the following conditions are satisfied:
1. event x+ is a trigger for b*;
2. $ER(x+) \cap after(b*, ER(b*)) = \emptyset$ and
3. $C(b*) \cap ER(x-) = \emptyset$,
then the cover function $C_{A'}(b*) = C_A(b*) * x$ for event b* in A' satisfies the monotonous cover conditions.
This property is used as a heuristic filter to select candidate 
divisors that are guaranteed not to increase excessively the 
complexity of the implementation of other signals. 
Example hazard.g continued. For a decomposition with F = dc (Figure 9,c) signal x becomes a new trigger for z- without replacing any other trigger. Hence the cover for z- will increase by one literal. A cover for z+ will decrease by one literal. This decomposition is not effective. If $F = \bar{a}c$ is used, then event x- is inserted before c- and replaces trigger event a+. The function for c- will not increase in complexity. The result of decomposition using function $\bar{a}c$ is shown in Figure 6,b.
5 Sequential decomposition 
5.1 Motivation 
Combinational decomposition is limited, since signal insertion using bipartition $\{F, \bar{F}\}$ is based primarily on one of the two transitions of the gate's output (e.g., its rising transition). The other transition of the combinational gate is fully determined by the insertion place of the first transition. Moreover, if x substitutes F in cover function C(a*) = F * G + R, then in most cases, event x+ becomes a trigger to a* and is acknowledged by a* itself. However, x- is often acknowledged by signals different from a, which may increase their complexity. Sequential decomposition, based on a new memory element, can improve the progress of mapping by allowing the opposite transition to play a more effective role, since the set and reset logic are inserted independently. In particular, two boolean functions can be decomposed at the same time with one new signal.
Assume that there are two functions C(a*) = F * G + R and C(b*) = P * Q + T, which are not yet mapped in the library, and such that F * P = 0. Then, in the sequential decomposition, a new signal x is inserted in such a way that x will go to 1 when F is changing from 0 to 1, and go to 0 when P is changing from 0 to 1. Then, both rising and falling transitions of x can be used to simplify the cover functions: x+ to simplify C(a*), and x- to simplify C(b*).
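The set/reset behaviour described above can be sketched with a simple model of a C-element. The Python fragment below checks orthogonality of two functions by enumeration and models the new signal x as a C-element driven by F and the complement of P; the input polarity, the helper names, and the example functions are assumptions made for illustration (the paper itself notes that orthogonality of set and reset functions is not strictly required for C-elements).

```python
from itertools import product
from typing import Callable


def orthogonal(f: Callable[..., int], p: Callable[..., int], n_vars: int) -> bool:
    """True if F * P = 0 for every assignment of the n_vars inputs."""
    return not any(f(*v) and p(*v) for v in product((0, 1), repeat=n_vars))


def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return (a & b) | (prev & (a | b))


def next_x(f_val: int, p_val: int, x: int) -> int:
    """Next value of x: set when F = 1, reset when P = 1, hold otherwise."""
    return c_element(f_val, 1 - p_val, x)
```

With F * P = 0 the inputs (F, not P) are (1, 1) exactly when F holds and (0, 0) exactly when P holds, so x rises on F and falls on P, while in states with F = P = 0 the element keeps its value.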
To illustrate that sequential decomposition can be more powerful than combinational decomposition, let us modify the hazard.g example, by declaring signal c to be an input (example hazard-mod.g). As before, we would like to map the three-literal function $S_z = \bar{a}dc$ into two-input gates. Since signal c is now an input, there are additional constraints for preserving the input/output interface. Indeed, we can no longer make the new event x- a trigger for input c-. Derivation of ER(x+) and ER(x-) for the example hazard-mod.g and the combinational decomposition based on $F = \bar{a}c$ (that succeeded in hazard.g) is shown in Figure 11,a. ER(x-) for hazard-mod.g includes 5 states (instead of 1 for hazard.g, cf. Figure 9,a) and event x- no longer replaces any other trigger event of z-. Decomposition based on $F = \bar{a}c$ makes the cover function for event z- even worse: 4 literals instead of 3 (hence it is not useful). Example hazard-mod.g cannot be mapped using only combinational decomposition. Below we will refer to hazard-mod.g to illustrate the steps of sequential decomposition.
5.2 State partitioning 
Combinational decomposition using function F is based on bipartition of states into two blocks $S_F = \{s : F(s) = 1\}$ and $S_{\bar{F}} = \{s : F(s) = 0\} = \{s : \bar{F}(s) = 1\}$. This bipartition is transformed to an I-partition with four blocks for inserting a new signal x as described in Section 4.1.
Figure 11: Decompositions of hazard-mod.g example: (a) combinational, (b) sequential, (c) logic for signal z
Figure 12: I-partition for sequential decomposition
Sequential decomposition using a pair of orthogonal functions {F, P} defines a partition of states into three blocks $\{S_F, S_P, S_{\bar{F}\bar{P}}\}$ (see Figure 12): $S_F = \{s : F(s) = 1\}$, $S_P = \{s : P(s) = 1\}$, and $S_{\bar{F}\bar{P}} = \{s : F(s) = P(s) = 0\}$. We make the following transformations of the three blocks when constructing an I-partition, $\{S^+, S^1, S^-, S^0\}$, based on the four input borders: IB(F), IB(P), $IB(\bar{F})$, and $IB(\bar{P})$ (see Figure 12).
Algorithm 5.1 Constructing I-partition for sequential decomposition
1. /* Construct D1 and D0, subsets of $S_{\bar{F}\bar{P}}$ */
   (a.i) D1 = all states backward reachable from Pred(IB(P)) without hitting $S_F$;
   (a.ii) Include in D1 all states forward reachable from $IB(\bar{F}) \cup D1$ without hitting IB(P) (see footnote 6);
   (b.i) D0 = all states backward reachable from Pred(IB(F)) without hitting $S_P$;
   (b.ii) Include in D0 all states forward reachable from $IB(\bar{P}) \cup D0$ without hitting IB(F);
2. If $\neg(Succ(D1) \subseteq IB(P) \wedge Succ(D0) \subseteq IB(F))$ then return failure;
3. If ($D1 \cap D0 \neq \emptyset$) then return failure;
4. Construct $S^+ = ER(x+)$ as a well-formed SIP closure of IB(F);
   If $S^+ \cap (S_P \cup D0) \neq \emptyset$ then return failure;
5. Construct $S^- = ER(x-)$ as a well-formed SIP closure of IB(P);
   If $S^- \cap (S_F \cup D1) \neq \emptyset$ then return failure;
6. $S^1 = (S_F \cup D1) - S^+$ and $S^0 = (S_P \cup D0) - S^-$
   /* I-partition for {F, P} is constructed */
Algorithm 5.1 fails to construct an I-partition for {F, P} if any one of steps 2-5 returns failure. Otherwise, the algorithm returns an I-partition $\{S^+ = ER(x+), S^1, S^- = ER(x-), S^0\}$.
5 As before, for ease of presentation, we ignore in this paper the fact that the set and reset functions for C-elements are not required to be orthogonal.
6 The reachability relation is reflexive ($s \to s$) and hence states of $IB(\bar{F})$ can be included into D1.
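Steps 1-3 of Algorithm 5.1 amount to bounded graph reachability, which the following Python sketch illustrates. It assumes that the state graph is given as a successor map, that the blocks and input borders are supplied as precomputed sets, and that Succ(D) denotes the successors of D lying outside D; these choices, the function names, and the toy ring-shaped SG are illustrative assumptions only. The SIP closures of steps 4-5 are not modelled.

```python
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

State = Hashable
Graph = Dict[State, Set[State]]  # state -> set of successor states


def invert(succ: Graph) -> Graph:
    """Build the predecessor map of a successor map."""
    pred: Graph = {}
    for s, nexts in succ.items():
        pred.setdefault(s, set())
        for t in nexts:
            pred.setdefault(t, set()).add(s)
    return pred


def reach(start: Iterable[State], edges: Graph, blocked: Set[State]) -> Set[State]:
    """States reachable from `start` along `edges` without entering `blocked`."""
    seen = {s for s in start if s not in blocked}
    stack = list(seen)
    while stack:
        s = stack.pop()
        for t in edges.get(s, ()):
            if t not in blocked and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def exits(region: Set[State], succ: Graph) -> Set[State]:
    """Successors of `region` that lie outside it (assumed meaning of Succ)."""
    return {t for s in region for t in succ.get(s, ())} - region


def build_d_sets(succ: Graph, s_f: Set[State], s_p: Set[State],
                 ib_f: Set[State], ib_p: Set[State],
                 ib_not_f: Set[State], ib_not_p: Set[State]
                 ) -> Optional[Tuple[Set[State], Set[State]]]:
    """Steps 1-3 of Algorithm 5.1; returns (D1, D0) or None on failure."""
    pred = invert(succ)

    def preds_of(states: Set[State]) -> Set[State]:
        return {p for s in states for p in pred.get(s, ())}

    d1 = reach(preds_of(ib_p), pred, s_f)       # step 1(a.i)
    d1 |= reach(ib_not_f | d1, succ, ib_p)      # step 1(a.ii)
    d0 = reach(preds_of(ib_f), pred, s_p)       # step 1(b.i)
    d0 |= reach(ib_not_p | d0, succ, ib_f)      # step 1(b.ii)
    if not (exits(d1, succ) <= ib_p and exits(d0, succ) <= ib_f):
        return None                             # step 2
    if d1 & d0:
        return None                             # step 3
    return d1, d0


# Toy cyclic SG 0 -> 1 -> ... -> 5 -> 0 with F = 1 only in state 1
# and P = 1 only in state 4.
ring = {i: {(i + 1) % 6} for i in range(6)}
result = build_d_sets(ring, {1}, {4}, {1}, {4}, {2}, {5})
```

In the toy example D1 collects the states between the F block and the P block, and D0 the states between the P block and the F block, so the two sets are disjoint and the construction can proceed to the closure steps.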
Note that the set D1 (and similarly D0) is constructed in two steps 1(a.i) and 1(a.ii), by first applying the backward, and then the forward reachability. If SG A is cyclic (such that the initial state $s_0$ is reachable from any other state of the SG), then step 1(a.ii) can be omitted, since it does not produce any new states in D1. However, if $s_0$ is not a cyclic state for SG A and $s_0 \in S_{\bar{F}\bar{P}}$, then both traversals are needed to identify which set, D1 or D0, state $s_0$ (and its successors) belongs to.
Algorithm 5.1 ensures consistency for the new signal x. Step 2 checks that a path in SG A starting from IB(F) cannot reach ER(x+) without crossing ER(x-) (or symmetrically a path from IB(P) cannot reach ER(x-) without crossing ER(x+)). Step 3 checks that D1 and D0 have no states in common. These two checks guarantee that there are no cycles inside $D1 \cup D0$. Therefore, signal x can only perform consistent transitions in A': $1* \to 0 \to 0* \to 1 \to 1* \to \ldots$
The following property, whose proof can be found in [8], shows that Algorithm 5.1 is sound.
Property 5.1 Let $I = \{S^+ = ER(x+), S^1, S^- = ER(x-), S^0\}$ be an I-partition obtained by Algorithm 5.1 for a pair of functions {F, P} such that F * P = 0. The new SG, A', obtained from A by inserting signal x using I-partition I is consistent and speed-independent.
Example hazard-mod.g. Let us consider the pair of functions $F = \bar{a}c$ and P = & (F is extracted from $S_z = \bar{a}cd$, while P is extracted from $R_z$ = E&). The well-formed SIP sets ER(x+) and ER(x-) defined using {F, P} are shown in Figure 11,b. The sequential decomposition based on {F, P} is feasible because: 1) F * P = 0; 2) any cycle starting in ER(x+) (ER(x-)) crosses ER(x-) (ER(x+)); 3) D1 = {1111, 1011, 0011}, D0 = {1100, 1110}; $D1 \cap D0 = \emptyset$; and hence all conditions of Property 5.1 are satisfied.
5.3 Progress conditions 
Sequential decomposition is aimed at mapping in the library two non-implementable cover functions C(a*) and C(b*). However, the decomposition can be useful if at least one of the functions is simplified. We can accept a sequential decomposition simplifying only one cover function (e.g., C(a*) = F * G + R) in two main cases:
1. Combinational decomposition using F for function C(a*) failed because of the x- event, e.g., the SIP conditions are violated for ER(x-).
2. Combinational decomposition using F for function C(a*) is valid, but it makes logic for some other (than a*) events more complex (e.g., due to the acknowledging event x-).
However, if none of the functions C(a*) or C(b*) is simplified by sequential decomposition, then it is rejected.
Estimating progress for sequential decomposition is very similar to that for a combinational one.
• For target events a* and b*. The validity of substituting signal x into C(a*) and C(b*) is checked by Proposition 4.2. If the substitution is not valid, the conditions of Property 4.5 are applied to implement $C_{A'}(a*)$ or $C_{A'}(b*)$ as $C_A(a*) * x$ or $C_A(b*) * \bar{x}$, correspondingly.
• For events x* of signal x. If conditions of Property 5.1 are satisfied, then signal x can be implemented by a C-element with inputs F and P.
• Events for which x is a trigger. We either check that x can substitute for some other trigger signals in these cover functions (see Proposition 4.2) or (if this check fails) that cover functions can be implemented with at most one extra literal x (see Property 4.5).
• Events for which x is not a trigger. Similar to the combinational case, it can be shown that the complexity of these events cannot increase with the insertion of x.
Example hazard-mod.g continued. For the target event z+, the sequential decomposition with $F = \bar{a}c$ and P = d satisfies the progress condition. It also does not disturb the implementability of event z-. Thus, the sequential decomposition is successful, while all combinational decompositions fail. The final implementation is shown in Figure 11,c.
6 Experimental results 
The strategy for general logic decomposition presented above has been implemented and applied to a set of benchmarks. Results are shown in Table 1.
We have measured the complexity of each gate as the number of literals required to be implemented as a sum-of-product gate, either complemented or not. Thus a 2-input EXOR gate ($a\bar{b} + \bar{a}b$) is considered to be a 4-literal gate, whereas the function ab + ac + db + dc is also considered a 4-literal gate ($\overline{\bar{a}\bar{d} + \bar{b}\bar{c}}$). This model is different from the one used in [4] where technology mapping was targeted at
library (signals)/CPU
i = 2  i = 3  i=4 
- in - 
- 2/2 - 
2/2 - 
In - 
214 - 
3D - 
111 - 
211 - 
n.i. 5/59 2/26 
n.i. 51130 4/91 
101126 4/26 3/20 
- 
- 
- 
- 
- 
- 
8/274 1/39 - 
- 213 - 
- 114.0 - 
213 - - 
n.i. 31450 11203 
n.i. n.i. n.i. 
218 - 
212 111 - 
5/10 213 - 
4/23 - 
3/19 116 - 
61130 - 
- 
- 
- 
91247 2/20 1/12 
4/12 114 114 
101129 3/15 - 
n.i. n.i. n.i. 
111 
111 
- - 
- - 
8131 4/11 - 
7/93 2/11 - 
ion6  3/20 116 
wrdatab 
Total 
Siegel [15]
i = 2  
Yes 
no 
no 
Yes 
no 
no 
Yes 
Yes 
no 
Yes 
no 
Yes 
Yes 
Yes 
no 
no 
no 
no 
#gates with n literals
n = 2 3 4 5 6 7
4 2  
2 2  
3 2  
5 2  
2 2  
5 2  
1 1  
2 2  
4 4 1  
4 2 1 1 1  
2 2  
5 1 2 3 1 1  
3 2 2 1 1  
8 1  
6 2  
10 10 3 1 
10 7 7 1 1  
4 3  
1 1 
1 3  
2 4  
2 1 
5 4  
2 3  2 
1 2  1 
6 3 
8 6 4 2  1 
3 1  
1 
4 4 
2 5  1 
5 7 1  
Circuit
alloc-outbound
chu133
chu150
converta
dff
ebergen
half
hazard
master-read
mp-forward-pkt
mr0
mr1
nak-pa
nowick
pe-rcv-ifc
pe-send-ifc
ram-read-sbuf
rcv-setup
sbuf-ram-write
sbuf-send-ctl
sbuf-send-pkt2
seqmix
seq4
trimos-send
tsend-bm
vbe5b
vbe5c
vbe6a
vbe10b
"U
rpdft
Table 1: Experimental results
the implementation in FPGA 4-input lookup tables. Due to this fact, it is difficult to make a direct comparison with the solutions from [4]. The difference between our approach and that of [4] can be easily shown by the example in Figure 13. For the STG of Figure 13 output signals z and y are implemented by 3-input AND gates. Our tool finds their decomposition into 2-input AND gates, in which both outputs z and y are used to acknowledge switchings of a new signal x. No valid decomposition (preserving speed-independence) exists when x is acknowledged by only one output (either y or z). The method from [4] looks for the decomposition within a single signal network and hence will fail to decompose the 3-input AND gates.
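Returning to the literal-count model used for Table 1, the claim that ab + ac + db + dc counts as a 4-literal gate relies on its complemented sum-of-products form. The Python sketch below checks by exhaustive enumeration that the 8-literal form equals the complement of a 4-literal sum of products; the complemented form is derived here by De Morgan's law and the function names are illustrative.

```python
from itertools import product


def sop(a: int, b: int, c: int, d: int) -> int:
    """ab + ac + db + dc written directly (8 literals)."""
    return (a & b) | (a & c) | (d & b) | (d & c)


def complemented_sop(a: int, b: int, c: int, d: int) -> int:
    """Complement of (not a)(not d) + (not b)(not c): 4 literals, inverted output."""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return 1 - ((na & nd) | (nb & nc))


def xor_sop(a: int, b: int) -> int:
    """2-input EXOR as a(not b) + (not a)b: 4 literals."""
    return (a & (1 - b)) | ((1 - a) & b)


equivalent = all(sop(*v) == complemented_sop(*v)
                 for v in product((0, 1), repeat=4))
```

Because the measure allows the sum-of-products to be complemented, both functions end up with the same cost of 4 literals even though their direct forms differ in size.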
The first set of columns in Table 1 indicates the complexity of the circuit before decomposition. The second set of columns reports the number of signals inserted for decomposition using gates with at most i literals (i = 2, 3, 4), and the CPU time required to find the solution (in seconds, for a Sparcstation 20). The number of inserted signals also shows the number of iterations in technology mapping: the circuit is resynthesized every time a new signal is inserted. The next column summarizes the results presented by Siegel [15] about the implementability of the circuit
lits/latches (i = 2)
non-SI SI 
1 6l3 1514 
1312 1311 
1611 1712 
2113 15l3 
14/2 1412 
21R 2313 
7/2 312 
1412 1412 
37p 3719 
13l3 1413 
54/7 4819 
2414 2414 
2113 1811 
2414 2314 
911 1111 
2210 2211 
2413 2916 
1313 2715 
2813 3013 
3816 4716 
2215 23p 
3316 4118 
13D 1312 
713 613 
3816 19p 
43p 33p 
5215 5416 
637/95 640/109
1 
SIS 1 
2210 
2810 
2410 
5710 
2610 
3410 
1910 
2810 
9310 
2910 
9310 
3510 
2010 
3710 
1110 
2210 
3210 
3810 
7010 
5 110 
7210 
3510 
1610 
8210 
9510 
with only 2-input gates. All realizations have been verified to be speed-independent.
From the 32 examples, only 5 were not implemented (n.i.) with 2-literal gates. Only one 5-input AND gate in pe-send-ifc and two 5-literal gates in tsend-bm were not decomposed when attempting to implement these circuits with 4-literal gates. We significantly improve over the results presented in [15], and only one circuit (pe-rcv-ifc) could not be realized with 2-literal gates from that benchmark suite.
The global acknowledgment allows the method to effectively decompose complex gates with high fan-in (6 or 7 literals). This is shown by circuits like mr1 and vbe10b that were implemented with 2-literal gates. Figure 14 illustrates this fact, depicting the circuit mr1 before and after logic decomposition into 2-literal gates.
The effectiveness of sequential decomposition is illustrated in Figure 15. The insertion of a new latch was crucial to allow the decomposition of a gate that could not be decomposed by the approach presented in [15].
The final columns present a rough estimation of the cost for speed-independence-preserving logic decomposition. The cost is evaluated as the number of literals of the combinational gates and the number of C elements of the circuit. The column "non-SI" reports the cost of decomposing the original implementation of the circuit into 2-literal gates without preserving speed-independence (tech_decomp -a 2 command in SIS). The column "SI" reports the cost of the decomposition preserving speed-independence. In some cases, such as vbe6a, the number of literals is reduced because the decomposition strategy allows sharing logic among different covers. In most cases extra cost is added to preserve speed-independence. However, if we consider that the area of a C element is roughly equivalent to a 3-input AND gate, we can conclude that the area cost of preserving speed-independence is not higher than 5%.
Figure 13: Example abcd: STG (a), 3-AND gate implementation (b), 2-AND gate implementation (c), invalid "local" decompositions (d,e).
The last column shows, for the sake of comparison, the cost of performing technology mapping against a 2-input library (which is roughly the same as a 2-literal library) using the bounded wire delay model, after delay padding ([10]). If we consider the cost of a C element to be 3 literals, the total cost of the speed-independent implementations in the 2-literal library is 640 + 109 x 3 = 967 literals, which is considerably smaller (and probably faster, because there is no need to add delay buffers).
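The cost arithmetic above can be reproduced with a few lines of Python. The sketch weights each C element as 3 literals, as the text does; the SI total (640 literals, 109 C elements) is stated in the text, whereas the non-SI total (637 literals, 95 C elements) is read from the totals row of Table 1 and should be treated as an assumption about that row.

```python
C_ELEMENT_COST = 3  # literals charged per C element, as in the text


def total_cost(literals: int, c_elements: int,
               c_cost: int = C_ELEMENT_COST) -> int:
    """Literal count plus weighted C-element count."""
    return literals + c_cost * c_elements


si_total = total_cost(640, 109)       # speed-independent decomposition
non_si_total = total_cost(637, 95)    # decomposition without preserving SI
overhead = (si_total - non_si_total) / non_si_total
```

Under these assumptions the relative overhead comes out just below 5%, which agrees with the area estimate given for preserving speed-independence.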
7 Conclusions and future work 
In this paper we have shown a solution to the problem 
of multi-level logic synthesis and technology mapping for 
asynchronous speed-independent circuits. The method is 
based on both combinational and sequential decomposition, 
for each of which we apply a two-step approach. 
The first step (Section 3) chooses a candidate for decomposition: algebraic kernels, non-cube-free sub-SOPs, sub-cubes etc. Different versions are evaluated and the "best" is taken; say, it corresponds to the new signal x. Combinational decomposition for synchronous circuits stops here. In the case of sequential decomposition two candidates are considered simultaneously.
Figure 14: mr1 before and after logic decomposition into 2-literal gates.
The second step (Sections 4 and 5) performs the actual decomposition: it attempts to find an optimized speed-independent implementation based on the candidate obtained at step (1). This is based on partitioning the state space into four sets, in which signal x is either stable or changing, while ensuring the speed-independence of the expanded specification (a necessary condition for speed-independent implementability). A new implementation is then derived for each signal, thus achieving global optimization and acknowledgement. The complexity arguments in Section 4.3 show that there is a good chance that x will get exactly the same function which was extracted at step (1). However, there is also a chance that this function will be smaller (thanks to boolean decomposition). Multiple acknowledgments for x appear automatically at this function generation step. Functions for signals which were not decomposed at step (1) may also change. Whenever a combinational decomposition fails to simplify the overall complexity (due to the lack of control in the insertion of the opposite transition x-), the procedure applies a sequential decomposition (where x- is used to simplify one more cover). As a result, the actual function for x may correspond to a very general sequential decomposition. Moreover this is not a local, but a "global" decomposition since other signals may change as well.
The method is implemented in the tool petrify. To the best of our knowledge, the results shown in the last section indicate that the method is the most effective and efficient amongst those available to date for the standard set of asynchronous benchmarks. For example, it is the first time that such examples as vbe10b and wrdatab have been decomposed into two-input AND gates by a software tool.
We are currently working on improving the method to make it complete (i.e. answering the key question of what is the largest class of State Graphs that can be implemented in a given library) and on extending the basic implementation architecture to other types of sequential elements, such as S/R flip-flops or D latches.
Figure 15: Example ebergen before and after logic decomposition into 2-literal gates.
References 
[l] P. A. Beerel and T. H-Y. Meng. Automatic gate-level 
synthesis of speed-independent circuits. In Proceedings of 
the International Conference on Computer-Aided Design, 
November 1992. 
[2] Peter A. Beerel and Teresa H.-Y. Meng. Logic transfor- 
mations and observability don't cares in speed-independent 
circuits. In Proceedings of TAU 1993, September 1993. - .  
R.K. Brayton, G.D. Hatchel, and A.L. Sangiovanni- 
Vincentelli. Multilevel logic synthesis. Proceedings of 
IEEE, 78(2):264--300, February 1990. 
S. Bums. General conditions for the decomposition of 
state holding elements. In International Symposium on 
Advanced Research in Asynchronous Circuits and Systems, 
Aizu, Japan, March 1996. 
S.  Bums and A. Martin. A synthesis method for self- 
timed VLSI circuits. In Proceedings of the International 
Conference on Computer Design, 1987. 
J. Coltadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, 
and A. Yakovlev. Complete state encoding based on the 
theory of regions. In International Symposium on Ad- 
vanced Research in Asynchronous Circuits and Systems, 
Aizu, Japan, March 1996. 
[7] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, 
and A. Yakovlev. Technology mapping of speed- 
independent circuits based on combinational decomposition 
and resynthesis. In Proc. of European Design and Test 
Conference, Paris (France), March 1997.
[8] A. Kondratyev, J. Cortadella, M. Kishinevsky, L. Lavagno, 
and A. Yakovlev. Technology mapping for speed- 
independent circuits. Technical Report TR 96-2-005, Aizu 
University, 1996. 
[9] A. Kondratyev, M. Kishinevsky, B. Lin, P. Vanbekbergen, 
and A. Yakovlev. Basic gate implementation of speed- 
independent circuits. In Proceedings of the Design Automa- 
tion Conference, 1994. 
[10] L. Lavagno and A. Sangiovanni-Vincentelli. Algorithms
for synthesis and testing of asynchronous circuits. Kluwer 
Academic Publishers, 1993. 
[11] D. E. Muller and W. C. Bartky. A theory of asynchronous
circuits. In Annals of Computing Laboratory of Harvard 
University, pages 204-243, 1959. 
[12] Chris J. Myers, Peter A. Beerel, and Teresa H.-Y. Meng. 
Technology mapping of timed circuits. In Asynchronous 
Design Methodologies, pages 138--147. IEEE Computer
Society Press, May 1995. 
[13] Enric Pastor, Jordi Cortadella, Alex Kondratyev, and Oriol
Roig. Structural methods for the synthesis of
speed-independent circuits. In Proc. of European Design and
Test Conference, pages 340--347, Paris (France), March
1996.
[14] M. Sawasaki, C. Ykman-Couvreur, and B. Lin. Externally
hazard-free implementations of asynchronous circuits. In 
Proceedings of the Design Automation Conference, June 
1995. 
[15] P. Siegel and G. De Micheli. Decomposition methods 
for library binding of speed-independent asynchronous de- 
signs. In Proceedings of the International Conference on 
Computer-Aided Design, pages 558-565, November 1994. 
[16] P. Siegel, G. De Micheli, and D. Dill. Automatic technology
mapping for generalized fundamental mode asynchronous 
designs. In Proceedings of the Design Automation Confer- 
ence, June 1993. 
[17] P. Vanbekbergen, B. Lin, G. Goossens, and H. De Man. 
A generalized state assignment theory for transformations 
on Signal Transition Graphs. In Proceedings of the In- 
ternational Conference on Computer-Aided Design, pages 
112--117, November 1992. 
[18] V. I. Varshavsky, M. A. Kishinevsky, V. B. Marakhovsky, 
V. A. Peschansky, L. Y. Rosenblum, A. R. Taubin, and 
B. S. Tzirlin. Self-timed Control of Concurrent Processes.
Kluwer Academic Publishers, 1990. (Russian edition: 1986).
