Logic Decomposition of Asynchronous Circuits Using STG Unfoldings by Khomenko V
Newcastle University e-prints  
Date deposited:  10th September 2010 
Version of file:  Published [Newcastle University Computing Science Technical Report] 
Peer Review Status: Peer reviewed  
Citation for published item: 
Khomenko V. Logic Decomposition of Asynchronous Circuits Using STG Unfoldings. Newcastle upon
Tyne: School of Computing Science, University of Newcastle upon Tyne, 2010
Use Policy: 
The full-text may be used and/or reproduced and given to third parties in any format or medium, 
without prior permission or charge, for personal research or study, educational, or not for profit 
purposes provided that: 
• A full bibliographic reference is made to the original source 
• A link is made to the metadata record in Newcastle E-prints 
• The full text is not changed in any way. 
The full-text must not be sold in any format or medium without the formal permission of the 
copyright holders. 
 
 Robinson Library, University of Newcastle upon Tyne, Newcastle upon Tyne. 
NE1 7RU.   
Tel. 0191 222 6000 
COMPUTING 
SCIENCE 
Logic Decomposition of Asynchronous Circuits Using STG 
Unfoldings 
Victor Khomenko
TECHNICAL REPORT SERIES 
No. CS-TR-1215  August 2010 
Abstract 
A technique for logic decomposition of asynchronous circuits which works on STG 
unfolding prefixes rather than state graphs is proposed. It retains all the advantages of 
the state space based approach, such as the possibility of multiway acknowledgement, 
latch utilisation and highly optimised circuits. Moreover, it significantly alleviates the 
state space explosion, and thus has superior memory consumption and runtime.
© 2010 University of Newcastle upon Tyne. 
Printed and published by the University of Newcastle upon Tyne, 
Computing Science, Claremont Tower, Claremont Road,
Newcastle upon Tyne, NE1 7RU, England. 
Bibliographical details
KHOMENKO, V. 
Logic Decomposition of Asynchronous Circuits Using STG Unfoldings  
[By] V. Khomenko 
Newcastle upon Tyne: University of Newcastle upon Tyne: Computing Science, 2010. 
(University of Newcastle upon Tyne, Computing Science, Technical Report Series, No. CS-TR-1215) 
Added entries
UNIVERSITY OF NEWCASTLE UPON TYNE 
Computing Science. Technical Report Series.  CS-TR-1215
About the author 
Victor Khomenko obtained his MSc with distinction in Computer Science, Applied Mathematics and Teaching of 
Mathematics and Computer Science in 1998 from Kiev Taras Shevchenko University, and PhD in Computing 
Science in 2003 from University of Newcastle upon Tyne.  He is a Program Committee Chair for the International 
Conference on Application of Concurrency to System Design (ACSD'10). He also organised the Workshop on 
UnFOlding and partial order techniques (UFO'07) and Workshop on BALSA Re-Synthesis (RESYN'09).  From 
September 2005 Victor is a Royal Academy of Engineering/EPSRC Post-doctoral Research Fellow, working on 
the Design and Verification of Asynchronous Circuits (DAVAC) project.  Victor’s main interests include model 
checking of Petri nets, Petri net unfolding techniques, verification and synthesis of self-timed (asynchronous) 
circuits. 
Suggested keywords
LOGIC DECOMPOSITION 
UNFOLDING 
ASYNCHRONOUS CIRCUITS 
SAT 
V. KHOMENKO: LOGIC DECOMPOSITION OF ASYNCHRONOUS CIRCUITS USING STG UNFOLDINGS 1
Logic Decomposition of Asynchronous
Circuits Using STG Unfoldings
Victor Khomenko
Abstract—A technique for logic decomposition of asynchronous
circuits which works on STG unfolding prefixes rather than
state graphs is proposed. It retains all the advantages of the
state space based approach, such as the possibility of multiway
acknowledgement, latch utilisation and highly optimised circuits.
Moreover, it significantly alleviates the state space explosion, and
thus has superior memory consumption and runtime.
Index Terms—Logic decomposition, unfolding, asynchronous
circuits, SAT, STG.
I. INTRODUCTION
ASYNCHRONOUS circuits (ACs) are circuits without
clocks. This is a promising type of digital circuits, as they
often have lower power consumption and electro-magnetic
emission, no problems with clock skew and related subtle
issues, and are fundamentally more tolerant of voltage, temper-
ature and manufacturing process variations. The International
Technology Roadmap for Semiconductors report on Design [1,
Table DESN4] predicts that 22% of the designs will be driven
by handshake clocking (i.e. asynchronous) in 2013, and this
percentage will rise to 40% in 2020.
Though the listed advantages look rather attractive in the
view of the current and anticipated microelectronics design
challenges, correct and efficient ACs are notoriously difficult
to synthesise. This paper tackles the problem of logic decom-
position of ACs, i.e. the problem of decomposing large logic
gates into smaller ones without introducing hazards. This is
arguably one of the most complicated problems in the design
flow.
We focus on an important subclass of ACs, called speed-
independent (SI) circuits; this model follows the classical
Muller’s approach [2] and regards each gate as an atomic eval-
uator of a Boolean function, with a delay element associated
with its output. In the SI framework this delay is unbounded,
i.e. the circuit must work correctly regardless of its gates’
delays, and the wires are assumed to have negligible delays
(or, alternatively, wire forks are assumed to be isochronic
— in such a case the circuit is often referred to as quasi-
delay-insensitive (QDI); for the purposes of this paper, these
two models are indistinguishable). Signal Transition Graphs
(STGs) [3], [4] are a formalism for the specification of such
circuits. They are Petri nets in which transitions are labelled
with the rising and falling edges of circuit signals, see the
example in Fig. 1(a–c).
V. Khomenko is a Royal Academy of Engineering/EPSRC Post-Doctoral
Research Fellow. He is affiliated with the School of Computing Science,
Newcastle University, UK. E-mail: Victor.Khomenko@ncl.ac.uk.
This research was supported by the RAENG/EPSRC research fellowship
EP/C53400X/1 (DAVAC) and EPSRC grant EP/G037809/1 (VERDAD).
It should be noted that logic decomposition of ACs is con-
siderably more complicated than the corresponding problem
in synchronous flows. In the traditional synchronous case the
problem can be formulated on a multi-level combinational
Boolean network, which should be mapped to a given gate
library by applying the conventional Boolean methods (in
particular, algebraic or Boolean division). During this process,
the existing algorithms try to minimise some cost function that
takes into account the estimated area and/or delay (sometimes
other metrics such as power consumption are also used).
When moving to ACs, several levels of complexity are
added to the described setup. First of all, the problem can
no longer be formulated as a combinational optimisation, and
one has to deal with a sequential circuit. Second, it is no longer
possible to break up a complex-gate into several smaller ones,
together computing the same function, as hazards can easily
be introduced in this way. (Note that in synchronous circuits
such hazards also occur, but are filtered out by the clock.) In
fact, some STGs which are implementable by complex-gates
can be not implementable in an SI way in some fixed gate
library (e.g. one comprising only one- and two-input gates).
Finally, gate libraries commonly contain latches, and the best
ACs can often be obtained only by utilising them.
The described issues are illustrated by means of an often
used example of a simplified VME bus controller shown in
Fig. 1 (see also [5, Chapter 2]). Assuming that the encod-
ing conflicts have already been resolved by insertion of an
additional internal signal csc (this task should have been
accomplished in the preceding stages of the design flow), one
can start with the STG shown in Fig. 1(c). The complex-
gate implementation shown in Fig. 1(d) can be synthesised
from this STG. Assuming that the gate library contains only
one- and two-input gates and latches, there is a problem of
decomposing the complex-gate implementing csc into smaller
gates.
Unfortunately, it turns out that the naïve logic decomposition
shown in Fig. 1(e) is not SI and even violates
the original specification (though it would be acceptable in
the synchronous framework). Indeed, consider the following
sequence of events:
(0) dsr+ csc+ lds+ ldtack+ d+ dtack+ dsr−
csc− (1) d− dtack− dsr+ (2) csc+ d+
At the initial state marked (0), x = 1 and all the other signals
are 0. At the state marked (1), x− becomes enabled. Since an
SI circuit should work regardless of the gate delays, one has to
allow that the gates implementing x and lds may be relatively
slow, and so the rest of the shown sequence is feasible. When
the state marked (2) is reached, csc+ becomes enabled,
something not expected in the STG in Fig. 1(c). Though csc is
an internal signal which is not observable by the environment,
this malfunction can propagate to an observable output by
producing an unexpected d+. Note that the difference between
the complex-gate implementation in Fig. 1(d) and this naïve
implementation is that in the latter the gate implementing x
is allowed to have an arbitrary delay, while in the former it is
‘inside’ a complex-gate which is assumed to be atomic (and
thus has no internal delays).
Fig. 1. VME bus controller (read cycle): interface (a), the timing diagram (b), the STG with an additional internal signal csc resolving the encoding
conflicts (c), a complex-gate implementation (d), a naïve logic decomposition exhibiting hazards (e), an STG with a new signal map and the corresponding
logic decomposition with multiway acknowledgement (f, g), and an implementation with a C-element (h).
Signals: inputs: dsr, ldtack; outputs: dtack, lds, d; internal: csc, map.
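The hazard can be replayed mechanically. The following is a minimal sketch, not the paper's tooling; it assumes that the naïve decomposition consists of a gate x = csc OR (NOT ldtack) feeding a gate csc = dsr AND x, with ldtack entering x inverted (an assumption that is consistent with x = 1 in the initial state). The function and variable names are illustrative only.

```python
# Replay of the trace from the text on the naive two-gate decomposition of csc.
# Assumption: x = csc OR (NOT ldtack) and csc = dsr AND x.

def complex_gate_csc(s):
    """Atomic complex gate: dsr AND (csc OR NOT ldtack)."""
    return s["dsr"] and (s["csc"] or not s["ldtack"])

def gate_x(s):
    """Value the (possibly slow) gate x is trying to reach."""
    return s["csc"] or not s["ldtack"]

def gate_csc(s):
    """Decomposed csc gate, which reads the current output of x."""
    return s["dsr"] and s["x"]

signals = ["dsr", "ldtack", "lds", "d", "dtack", "csc"]
state = {name: False for name in signals}
state["x"] = True  # initial state (0): x = 1, all other signals are 0

# Trace up to the state marked (2); x and lds are slow and do not switch.
trace = ["dsr+", "csc+", "lds+", "ldtack+", "d+", "dtack+", "dsr-",
         "csc-", "d-", "dtack-", "dsr+"]
for event in trace:
    state[event[:-1]] = event.endswith("+")

x_excited = gate_x(state) != state["x"]                 # x- is still pending
naive_csc_excited = gate_csc(state) != state["csc"]     # csc+ enabled: hazard
atomic_csc_excited = complex_gate_csc(state) != state["csc"]
```

At the final state the decomposed csc gate is excited because it still sees the stale value of x, whereas the atomic complex-gate is stable.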
A correct decomposition into two-input gates is shown in
Fig. 1(g). Note that in contrast to the described hazardous
solution, the new internal signal map is acknowledged by
two gates (see [6], [7] for the concept of acknowledgement).
This illustrates the concept of multiway acknowledgement,
when different transitions of a signal can be acknowledged by
different gates; e.g. in this example map+ is acknowledged
by csc+ and map− by d− (as opposed to the simple case
of local acknowledgement, where the newly inserted signal
is acknowledged only by the gate being decomposed; unlike
the synchronous case, where the local acknowledgement is
sufficient for decomposition, and the multiway one is used
only for optimisation purposes, in the asynchronous case it
is quite common that a complex-gate is not decomposable
using local acknowledgements only, but is decomposable using
multiway ones). This transformation is far from obvious at
the circuit level; the corresponding STG shown in Fig. 1(f) is
much easier to understand.
An implementation utilising a latch is shown in Fig. 1(h),
where Muller’s C-element [2] with the next-state function
[c] = ab ∨ c(a ∨ b) is used. If this latch is present in the
library, this implementation is likely to be superior in terms
of area and performance to the one in Fig. 1(g). However, this
transformation is also non-trivial, and is only possible due
to the fact that there is no globally reachable state at which
dsr = ldtack = 0 and csc = 1 (this is the condition when
the complex-gate in Fig. 1(d) and the C-element in Fig. 1(h)
behave differently); i.e. a global analysis of the state space of
the specification is required, which also takes into account the
knowledge of the environment’s behaviour.
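The condition under which the two gates differ can be confirmed by exhaustive enumeration. This is a small sketch under the assumption that the complex-gate computes dsr AND (csc OR NOT ldtack) and that the C-element receives dsr and the inverted ldtack as its inputs; these input assignments are an interpretation of Fig. 1(d, h), not something the text states explicitly.

```python
from itertools import product

def complex_gate(dsr, ldtack, csc):
    """Assumed complex-gate for csc: dsr AND (csc OR NOT ldtack)."""
    return dsr and (csc or not ldtack)

def c_element(a, b, c):
    """Muller's C-element next-state function: ab OR c(a OR b)."""
    return (a and b) or (c and (a or b))

# All (dsr, ldtack, csc) combinations on which the two gates disagree.
differences = [
    (dsr, ldtack, csc)
    for dsr, ldtack, csc in product([False, True], repeat=3)
    if complex_gate(dsr, ldtack, csc) != c_element(dsr, not ldtack, csc)
]
```

Under these assumptions the only disagreement is at dsr = ldtack = 0 and csc = 1, so the replacement is safe exactly when no reachable state has this encoding.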
PETRIFY [5], [8] is one of the commonly used tools for
synthesis of ACs from STGs. It addresses the issues mentioned
above, in particular it allows for multiway acknowledgements
and can utilise latches. For synthesis, PETRIFY employs the
state space of the STG (in the form of BDDs [9]), and
so it suffers from the combinatorial state space explosion
problem [10] — even a relatively small system specification
can (and often does) yield a very large state space. This puts
practical bounds on the size of control circuits that can be
synthesised using such techniques, which are often restrictive,
especially if the specification is not constructed manually by
a designer but rather generated automatically from high-level
hardware descriptions. For example, designing circuits with
more than 20–30 signals with PETRIFY is often impossible.
In this paper, a different data structure for representing
the state space, viz. STG unfolding prefix, is employed. The
experiments in [11]–[13] show that for the application domain
of ACs, such prefixes are much more compact than the explicit
representation or BDDs, and thus significantly alleviate the
state space explosion. An unfolding-based technique for logic
decomposition of ACs is presented, which has all the nice
features of PETRIFY’s algorithm, but can handle much larger
STGs than PETRIFY, while delivering high-quality circuits.
Together with [11]–[13], it essentially completes the synthesis
flow for asynchronous circuits from STGs that does not involve
building reachability graphs at any stage and yet is a fully
fledged logic synthesis.
II. UNFOLDING PREFIXES
A finite and complete prefix Pref Γ of the unfolding Unf Γ
of an STG Γ is a finite acyclic net which implicitly represents
all the reachable states of this STG together with transitions
enabled at those states. Intuitively, Unf Γ can be obtained by
successive firing of transitions starting from the initial marking
of Γ, under the following assumptions: (i) for each new firing
a fresh transition (called an event) is generated; (ii) for each
newly produced token a fresh place (called a condition) is
generated.
Due to its structural properties (such as acyclicity), the
reachable states of Γ can be represented using configurations
of Unf Γ. A configuration C is a downward-closed set of
events (being downward-closed means that if e ∈ C and f
is a causal predecessor of e then f ∈ C) without choices
(i.e. for all distinct events e, f ∈ C, there is no condition c
in Unf Γ such that the arcs (c, e) and (c, f) are in Unf Γ).
Intuitively, a configuration is a partially ordered execution, i.e.
an execution where the order of firing of some of its events
(viz. concurrent ones) is not important. Moreover, [e] will
denote the local configuration of an event e, i.e. the smallest
(w.r.t. ⊂) configuration containing e (it is comprised of e and
its causal predecessors).
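The two defining properties of a configuration can be checked directly. The sketch below uses a small hypothetical occurrence net (the event and condition names are made up for illustration) and shows one possible way to compute local configurations and to test whether a set of events is a configuration.

```python
# Hypothetical occurrence net: each event lists the conditions it consumes,
# and each condition names the event that produced it (None = initial).
PRESET = {"e1": {"c0"}, "e2": {"c1"}, "e3": {"c1"}, "e4": {"c2"}}
PRODUCER = {"c0": None, "c1": "e1", "c2": "e1"}

def direct_predecessors(e):
    """Events producing the conditions consumed by e."""
    return {PRODUCER[c] for c in PRESET[e]} - {None}

def local_configuration(e):
    """[e]: the event e together with all its causal predecessors."""
    result, todo = set(), [e]
    while todo:
        f = todo.pop()
        if f not in result:
            result.add(f)
            todo.extend(direct_predecessors(f))
    return result

def is_configuration(events):
    """True iff `events` (a set) is downward-closed and has no choices."""
    downward_closed = all(local_configuration(e) <= events for e in events)
    consumed = [c for e in events for c in PRESET[e]]
    choice_free = len(consumed) == len(set(consumed))  # no shared condition
    return downward_closed and choice_free
```

Here e2 and e3 compete for the condition c1, so no configuration contains both of them, and {e2} alone is rejected because its predecessor e1 is missing.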
Unf Γ is infinite whenever Γ has an infinite run; however, if
Γ has finitely many reachable states then Unf Γ eventually
starts to repeat itself and can be truncated (by identifying
a set of cut-off events beyond which no further events are
generated) without loss of information, yielding a finite and
complete prefix Pref Γ. Intuitively, an event e can be declared
cut-off if the already built part of the prefix contains a
configuration Ce (called the corresponding configuration of e)
such that its final marking and encoding coincide with those
of [e] [14] and Ce is smaller than [e] w.r.t. some well-founded
partial order on the configurations of Unf Γ, called an adequate
order [15]. Fig. 2 shows a finite and complete unfolding prefix
of the STG shown in Fig. 1(c); the only cut-off event is
depicted as a double box, and its corresponding configuration
is {e1, e2}.
Fig. 2. A finite and complete prefix of the STG in Fig. 1(c).
Efficient algorithms exist for constructing unfolding pre-
fixes [15], [16], which ensure that the number of non-cut-off
events in Pref Γ can never exceed the number of reachable
states of Γ. Moreover, complete prefixes are often exponen-
tially smaller than the corresponding state graphs, especially
for highly concurrent STGs, because they represent concur-
rency directly rather than by multidimensional ‘diamonds’ as
it is done in state graphs. For example, if Γ consists of 100
transitions which can fire once in parallel, the state graph will
be a 100-dimensional hypercube with 2^100 vertices, whereas
Pref Γ will coincide with the net itself. Since practical STGs
usually exhibit a lot of concurrency, but have rather few choice
points, they are an ideal case for applying unfolding-based
techniques; in fact, in many of the experiments conducted
in [11]–[13] unfolding prefixes are just slightly bigger than
the original STGs themselves. Thus, unfolding prefixes are
well-suited for alleviating the state space explosion in STG
based design.
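The gap between the two representations can be observed on a scaled-down version of the example. The sketch below explicitly explores the state graph of n independent transitions that can each fire once (n = 10 is used so that the exploration stays small), and compares its size with the number of events in the prefix, which for this net equals the number of transitions.

```python
def reachable_markings(n):
    """Count the states of n independent transitions that each fire once."""
    initial = frozenset()               # the set of transitions fired so far
    seen, frontier = {initial}, [initial]
    while frontier:
        marking = frontier.pop()
        for t in range(n):
            if t not in marking:        # t has not fired yet, so it is enabled
                successor = marking | {t}
                if successor not in seen:
                    seen.add(successor)
                    frontier.append(successor)
    return len(seen)

n = 10
state_graph_size = reachable_markings(n)   # grows as 2**n
prefix_events = n                          # the prefix coincides with the net
```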
In [12] the unfolding technique was applied to detection of
encoding conflicts between reachable states of an STG. In [17],
[18] a method for checking validity of transition insertions
on unfolding prefixes was developed, which was successfully
applied to resolution of encoding conflicts in [11]. In [13] the
problem of complex-gate logic synthesis from an STG free
from encoding conflicts was solved. The experiments in [11]–
[13] showed that the unfolding-based approach can handle much
bigger STGs than PETRIFY, without reducing the quality of
produced circuits. This paper proposes a method for logic
decomposition of SI circuits eliminating the necessity of
using atomic complex-gates, which are not very realistic in
practice. This completes the unfolding-based logic synthesis
flow [11]–[13] for SI circuits that does not build the state
graph at any stage. Combined with the STG decomposition
approach of [19], it can be applied, e.g. for control re-synthesis
of BALSA or TANGRAM/HASTE specifications as described
in [11].
Throughout the paper, the following functions will be used.
The final encoding of a configuration $C$ will be denoted by
$Code(C)$; this is a Boolean vector whose elements correspond
to the signals of the STG. Moreover, $Code_x(C)$ will denote
the element of $Code(C)$ corresponding to a signal $x$, and
$Code_X(C)$ is the projection of $Code(C)$ onto a set of signals
$X$. The Boolean function $Out_x(C)$ is true iff $C$ enables an
$x^{\pm}$-labelled event, and the next-state function is defined as
$Nxt_x(C) \stackrel{df}{=} Code_x(C) \oplus Out_x(C)$. Intuitively, the result computed by the
complex-gate implementing an output or internal signal $x$ at
the final state of $C$ should be $Nxt_x(C)$.
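These definitions translate almost literally into code. The following is an illustrative sketch only: it represents a configuration by the labels of its events in some firing order, takes the set of enabled signals as given rather than computing it from a prefix, and uses a hypothetical three-signal fragment.

```python
def code(initial, labels):
    """Final encoding after firing events labelled e.g. 'dsr+' or 'csc-'."""
    encoding = dict(initial)
    for label in labels:
        encoding[label[:-1]] = label.endswith("+")
    return encoding

def nxt(encoding, enabled_signals):
    """Nxt_x = Code_x XOR Out_x, computed for every signal x."""
    return {x: value ^ (x in enabled_signals) for x, value in encoding.items()}

# Hypothetical fragment: after dsr+ has fired, a csc-labelled event is enabled.
initial = {"dsr": False, "csc": False, "lds": False}
c = code(initial, ["dsr+"])
next_state = nxt(c, enabled_signals={"csc"})
```

The gate for csc is thus expected to evaluate to 1 at this state, while the values of dsr and lds stay as encoded.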
III. THE SI LOGIC DECOMPOSITION ALGORITHM
In [20], a logic decomposition algorithm based on Boolean
relations has been proposed. This algorithm is outlined below
(with minor changes).
forever do
    for all non-input signals y do
        S[y] ← ∅
        for all G ∈ {latches, gates} do
            S[y] ← S[y] ∪ decompositions(y, G)
        best_H[y] ← best SI candidate in S[y]
    if for each y, best_H[y] is implementable
        Library matching
        stop
    if for each y, best_H[y] is empty
        fail
    H ← the most complex best_H
    Insert a new signal z implementing H into the STG
First, the algorithm computes the complex-gate implemen-
tation for each non-input signal y. Then it decomposes this
implementation top-down, using a latch or a gate G from the
library, so that the output of G produces the desired signal,
and its inputs are produced by some complex-gates H_i, which
are computed using a Boolean relation solver (see Fig. 3).
For example, signal csc in Fig. 1(d) is implemented by the
complex-gate [csc] = dsr ∧ (csc ∨ ¬ldtack), and when decomposed
with G being the two-input AND gate, [H1] = dsr and
[H2] = csc ∨ ¬ldtack form a possible decomposition.
Fig. 3. General framework for SI decomposition.
Then the algorithm checks which of the computed decompositions
are SI, by trying to insert, in an SI way, new signals
implementing the non-trivial H_i. In the chosen example, as
[H1] = dsr is a trivial function, there is no point in implementing
it as a new signal; hence a signal map implementing
[H2] = csc ∨ ¬ldtack is inserted as shown in Fig. 1(f). In
this STG map triggers not only the signal being decomposed
(csc), but also d; this is the reason why map appears in the
fan-in of the gates implementing both csc and d in Fig. 1(g),
resulting in a multiway acknowledgement for map. (As it is
impossible to insert in an SI way a new signal implementing
[H2] = csc ∨ ¬ldtack and triggering only csc, the incorrect
decomposition in Fig. 1(e) is not considered by the algorithm.)
Once the decompositions are computed for all non-input
signals, and their SI status is evaluated, a heuristically best SI
decomposition is chosen, if there is one.
If all the non-input signals are directly implementable, the
algorithm performs the library matching step to recover some
area and delay before stopping. At this stage, small gates
can be combined into a larger one, if the latter is in the
library; this is guaranteed to preserve the SI property, provided
that the matched gate is atomic. On the other hand, if no
SI decompositions have been found, the algorithm stops and
reports a failure. Otherwise, some decomposition is heuristically
chosen, a new signal implementing one of its complex-gates
H_i is inserted into the STG, and the loop is repeated.
On the next iteration, the implementation of y will depend
on the newly inserted signal, and hence will be simpler; some
heuristics are used to prevent a significant increase in the
complexity of the implementations of the other signals and to
ensure progress.
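The control flow of this loop can be sketched as follows. This is only a skeleton under strong simplifying assumptions: all the callables (best_candidate, is_implementable, complexity, insert_signal) are hypothetical hooks standing in for the Boolean relation solver, the SI check and the heuristics, and the toy model at the end represents a candidate merely by the fan-in of the gate it needs.

```python
def decompose(signals, best_candidate, is_implementable, complexity,
              insert_signal, max_iterations=100):
    """Top-level loop; every callable is a hypothetical hook.

    best_candidate(y)   -> best SI candidate for signal y, or None
    is_implementable(h) -> True if h maps directly onto the library
    complexity(h)       -> number used to rank candidates
    insert_signal(h)    -> inserts a signal implementing h, returns its name
    """
    signals = list(signals)
    for _ in range(max_iterations):
        best = {y: best_candidate(y) for y in signals}
        if all(h is not None and is_implementable(h) for h in best.values()):
            return "done", signals      # library matching would go here
        if all(h is None for h in best.values()):
            return "fail", signals
        h = max((h for h in best.values() if h is not None), key=complexity)
        signals.append(insert_signal(h))
    return "gave up", signals

# Toy model: a candidate is the fan-in of the required gate, and the
# library only offers gates with at most two inputs.
fan_in = {"csc": 3, "d": 2}

def insert_signal(h):
    y = next(s for s, n in fan_in.items() if n == h)
    fan_in[y] -= 1                      # y now reads the new signal instead
    name = f"z{len(fan_in)}"
    fan_in[name] = 2
    return name

status, final_signals = decompose(
    signals=["csc", "d"],
    best_candidate=lambda y: fan_in[y],
    is_implementable=lambda h: h <= 2,
    complexity=lambda h: h,
    insert_signal=insert_signal,
)
```

In the toy run one new signal suffices: the three-input gate of csc is reduced to two inputs, after which every gate fits the library.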
The top-level structure of the algorithm proposed in this
paper is essentially the same; the main difference is that the
insertion of a signal implementing a given Boolean function
is performed using the STG unfolding prefix rather than
BDDs. (The algebraic division based decomposition algorithm
described in [21] can also be handled using the techniques
described in this paper.) Hence one can distill the task of
inserting a new signal, whose implementation is the given
Boolean function F (X), into the STG, and the rest of the
paper focuses on how to solve it using unfolding prefixes.
IV. TRANSFORMATIONS
This paper primarily focuses on SB-preserving transforma-
tions, i.e. ones preserving safeness and behaviour (in the sense
that the original and the transformed STGs are weakly bisimi-
lar, provided that the newly inserted transitions are considered
silent) of the STG. Below several kinds of transition insertions
that will be used for SI logic decomposition are described, and
the algorithms presented in [17], [18] allow one to check their
validity.
Building an unfolding prefix of an STG can be a time-con-
suming operation. However, the approach described in [17],
[18] allows one to avoid a potentially expensive re-unfolding
after each transition insertion, by introducing local modifi-
cations to the existing prefix instead. Moreover, it yields a
prefix similar to the original one, which is advantageous for
visualisation and allows one to transfer some information from
the original prefix to the modified one.
A. Sequential pre-insertion
A sequential pre-insertion is essentially a generalised transi-
tion splitting, and is defined as follows. Given a transition t and
a set of places S ⊆ •t, the sequential pre-insertion S ≀ t is the
transformation inserting a new transition u (with an additional
place) ‘splitting off’ the places in S from t. The picture below
illustrates the sequential pre-insertion {p1, p2} ≀ t.
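[Picture: before the insertion, t has the preset {p1, p2, p3} and the postset {q1, q2, q3}; after it, the new transition u has the preset {p1, p2} and the postset {p}, and t has the preset {p, p3} and the same postset.]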
We will write ≀t instead of S ≀ t if S = •t.
One can easily show that sequential pre-insertions always
preserve safeness and traces (i.e. firing sequences with the
silent transitions removed). However, in general, the behaviour
is not preserved, and so a sequential pre-insertion is not
guaranteed to be SB-preserving (in fact, it can introduce
deadlocks) [17]. Given an unfolding prefix, it is quite easy
to check whether a pre-insertion is SB-preserving [17].
If a sequential pre-insertion S ≀ t is applied to an STG,
the inserted transition should not ‘delay’ an input (as this
would impose a constraint on the environment which was
not present in the original specification), and so t must be a
non-input transition. Moreover, one should take care that the
output-persistency is not violated. ([17] presents an algorithm
allowing one to check that the newly inserted transition will
not be in a dynamic choice relation with any other transition,
which ensures output-persistency.)
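At the level of the net structure, a sequential pre-insertion is a purely local rewiring. The sketch below shows it on a net stored as preset and postset maps; this representation and the default names u and p are assumptions made for illustration, and the validity checks of [17] are not modelled.

```python
def sequential_pre_insertion(preset, postset, t, S, u="u", p="p"):
    """Return new (preset, postset) maps after the pre-insertion of u before t.

    preset/postset map each transition to its set of input/output places.
    S is the set of places split off from t; u and p are placeholder names
    for the new transition and the new place.
    """
    if not S <= preset[t]:
        raise ValueError("S must be a subset of the preset of t")
    new_pre = {k: set(v) for k, v in preset.items()}
    new_post = {k: set(v) for k, v in postset.items()}
    new_pre[u] = set(S)                  # u takes over the places in S
    new_post[u] = {p}                    # and feeds the new place p
    new_pre[t] = (preset[t] - S) | {p}   # t now waits for p instead of S
    return new_pre, new_post

preset = {"t": {"p1", "p2", "p3"}}
postset = {"t": {"q1", "q2", "q3"}}
new_pre, new_post = sequential_pre_insertion(preset, postset, "t", {"p1", "p2"})
```

The example reproduces the insertion {p1, p2} ≀ t: afterwards t consumes p and p3, and its postset is untouched.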
B. Sequential post-insertion
Similarly to sequential pre-insertion, sequential post-inser-
tion is also a generalisation of transition splitting, and is
defined as follows. Given a transition t and a set of places
S ⊆ t•, the sequential post-insertion t ≀S is the transformation
inserting a new transition u (with an additional place) ‘splitting
off’ the places in S from t. The picture below illustrates the
sequential post-insertion t ≀ {q1, q2}.
[Picture: before the insertion, t has the preset {p1, p2, p3} and the postset {q1, q2, q3}; after it, t has the same preset and the postset {p, q3}, and the new transition u has the preset {p} and the postset {q1, q2}.]
We will write t≀ instead of t ≀ S if S = t•.
One can easily show that sequential post-insertions are al-
ways SB-preserving. If a sequential post-insertion is applied to
the STG, the output-persistency is guaranteed to be preserved.
However, one still has to ensure that the inserted transition
does not ‘delay’ any input transitions.
C. Concurrent insertion
Concurrent transition insertion can be advantageous for
performance, since the inserted transition can fire in parallel
with the existing ones. It is defined as follows. Given two
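distinct transitions, t′ and t′′, and an n ∈ {0, 1}, the concurrent
insertion $t' \overset{n}{\longmapsto} t''$ is the transformation inserting a new
transition u (with a couple of additional places) between t′
and t′′, and putting n tokens in the place in its preset. We will
write $t' \longmapsto t''$ instead of $t' \overset{0}{\longmapsto} t''$ and $t' \overset{\bullet}{\longmapsto} t''$ instead of $t' \overset{1}{\longmapsto} t''$.
The picture below illustrates the concurrent insertion $t_1 \overset{\bullet}{\longmapsto} t_3$
(note that the token in p is needed to prevent a deadlock).
[Picture: a net with the transitions t1, t2, t3 ⇒ the same net with the added path t1 → p → u → q → t3, where the new place p is initially marked.]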
In general, concurrent insertions preserve neither safeness
nor behaviour. In fact, safeness is not preserved even if n = 0
(e.g. when in the original net t′ can fire twice without t′′
firing), and deadlocks can be introduced even if n = 1 (e.g.
when in the original net t′′ should fire twice before t′ can
become enabled). In [17], an efficient test whether a concurrent
insertion is SB-preserving, working on an unfolding prefix, has
been developed.
If a concurrent insertion $t' \overset{n}{\longmapsto} t''$ is applied to the STG,
the output-persistency is guaranteed to be preserved, but the
inserted transition should not ‘delay’ an input, and so t′′ must
be a non-input transition.
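The loss of safeness mentioned above is easy to reproduce with a small token game. The net used here is hypothetical: transition a can fire repeatedly on its own and b is independent of it, so a can fire twice without b firing. The helper names are illustrative, and this is a sketch rather than the validity test of [17].

```python
def fire(marking, pre, post, t):
    """Fire transition t if enabled; return the new marking (place -> tokens)."""
    if any(marking.get(p, 0) < 1 for p in pre[t]):
        raise ValueError(f"{t} is not enabled")
    new = dict(marking)
    for p in pre[t]:
        new[p] -= 1
    for p in post[t]:
        new[p] = new.get(p, 0) + 1
    return new

def concurrent_insertion(pre, post, marking, t1, t2, n, u="u", p="p", q="q"):
    """Insert u between t1 and t2, with n tokens in its input place p."""
    pre = {k: set(v) for k, v in pre.items()}
    post = {k: set(v) for k, v in post.items()}
    post[t1].add(p)
    pre[u], post[u] = {p}, {q}
    pre[t2].add(q)
    return pre, post, {**marking, p: n, q: 0}

# Hypothetical safe net: a and b each cycle on their own place.
pre = {"a": {"r"}, "b": {"s"}}
post = {"a": {"r"}, "b": {"s"}}
m0 = {"r": 1, "s": 1}

pre2, post2, m = concurrent_insertion(pre, post, m0, "a", "b", n=0)
for t in ["a", "u", "a", "u"]:       # a fires twice while b never fires
    m = fire(m, pre2, post2, t)
```

After this run the new place q holds two tokens, so the transformed net is no longer safe even though n = 0.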
D. Generalised insertion
Generalised transition insertion (GTI) [18] is a generalisa-
tion of concurrent insertion. It is defined as follows. Given
two non-empty disjoint sets of transitions S and D, called
respectively sources and destinations, the generalised insertion
S ↣|↠ D is the transformation inserting a new transition u
with |S| new places in its preset and |D| new places in its
postset and connecting these places to the transitions in S and
D, respectively, as shown in the picture below. In addition,
some of the new places in the preset of u can be initially
marked.
[Picture: every transition in S feeds its own new place in the preset of u, and every new place in the postset of u feeds its own transition in D.]
In [18], conditions guaranteeing that the transformation is
SB-preserving, which can be checked efficiently on the unfolding
prefix, have been developed. Moreover, since the number of all
possible GTIs usually grows exponentially with the size of the
STG, their straightforward enumeration would be impractical.
Hence, [18] developed a method for computing only poten-
tially useful (in the context of logic decomposition) GTIs.
If a generalised insertion S ↣|↠ D is applied to the STG,
the output-persistency is guaranteed to be preserved, but the
inserted transition should not ‘delay’ an input, and so D must
not contain any input transitions.
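Structurally, a generalised insertion only adds one transition and |S| + |D| places. The following sketch shows this on a net stored as preset and postset maps; the representation and the naming scheme for the new places are assumptions, the initial marking of the new preset places is left out, and the SB-preservation conditions of [18] are not checked.

```python
def generalised_insertion(pre, post, sources, destinations, u="u"):
    """Return new (pre, post) maps after inserting u between S and D."""
    if not sources or not destinations or sources & destinations:
        raise ValueError("S and D must be non-empty and disjoint")
    pre = {k: set(v) for k, v in pre.items()}
    post = {k: set(v) for k, v in post.items()}
    pre[u], post[u] = set(), set()
    for s in sorted(sources):
        place = f"p_{s}"        # hypothetical name of the new place after s
        post[s].add(place)
        pre[u].add(place)
    for d in sorted(destinations):
        place = f"q_{d}"        # hypothetical name of the new place before d
        post[u].add(place)
        pre[d].add(place)
    return pre, post

pre = {"a": {"x1"}, "b": {"x2"}, "c": {"x3"}}
post = {"a": {"y1"}, "b": {"y2"}, "c": {"y3"}}
new_pre, new_post = generalised_insertion(pre, post, {"a", "b"}, {"c"})
```

In the example u waits for both a and b and then enables c, which is the kind of synchronisation a single concurrent insertion cannot express.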
E. Equivalent transformations
It can happen that a sequential post-insertion t ≀ S yields
essentially the same net as a sequential pre-insertion S′ ≀ t′,
where t ∈ ••t′; in particular, this happens if S ∪S′ ⊆ t• ∩ •t′
and |•p| = |p•| = 1 for all p ∈ S ∪ S′. In such a case there
is no reason to distinguish between these two transformations,
e.g. one can convert a post-insertion into an equivalent pre-
insertion whenever possible. Moreover, since post-insertions
are always SB-preserving, there is no need to check the
validity of the resulting transformation.
Furthermore, the following equivalences can be introduced
for the cases involving a GTI:
• Two GTIs S ↣|↠ D and S′ ↣|↠ D′ are equivalent iff
S = S′ and D = D′ (note that the notion of equivalence
is structural rather than behavioural). Furthermore, one
can reduce the number of behaviourally equivalent GTIs
by imposing a certain minimality condition on their
sources and destinations, as described in [18].
• Though a GTI S ↣|↠ D cannot be structurally equivalent
to a sequential pre-insertion S′ ≀ t, it still makes sense to
regard them as equivalent if S = •S′ and D = {t}, as they
are behaviourally equivalent in such a case.
• Similarly, though a GTI S ↣|↠ D cannot be structurally
equivalent to a sequential post-insertion t ≀ S′, it still makes
sense to regard them as equivalent if S = {t} and D = S′•,
as they are behaviourally equivalent in such a case.
• A GTI S ↣|↠ D is equivalent to a concurrent insertion
$t' \overset{n}{\longmapsto} t''$ iff S = {t′} and D = {t′′}. Hence, in practice it
makes sense to impose an additional constraint |S ∪ D| > 2
to avoid incidentally generating a GTI that is equivalent
to some concurrent insertion.
F. Commutative transformations
A pair of transformations commute if the result of their ap-
plication does not depend on the order they are applied. (Note
that a transformation can become ill-defined after applying
another transformation, e.g. t ≀ {p, q} becomes ill-defined after
applying t ≀ {p}.) One can observe that:
• concurrent insertions and GTIs commute with any tran-
sition insertions;
• a sequential pre-insertion and a sequential post-insertion
always commute;
• two sequential pre-insertions S ≀ t and S′ ≀ t′ commute iff
t ≠ t′ or S ∩ S′ = ∅;
• two sequential post-insertions t ≀ S and t′ ≀ S′ commute
iff t ≠ t′ or S ∩ S′ = ∅.
It is important to note that an SB-preserving transition
insertion remains SB-preserving if another commuting SB-
preserving transition insertion is applied first. Hence trans-
formations whose validity has been checked can be cached,
and after some transformation has been applied, the non-
commuting transformations are removed from the cache and
the new transformations that became possible in the modified
STG are computed, checked for validity and added to the
cache.
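The caching scheme can be sketched for sequential insertions as follows. The class and function names are hypothetical; concurrent insertions and GTIs are omitted because they commute with everything and so never leave the cache, and the computation of newly possible transformations is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SeqInsertion:
    kind: str            # "pre" or "post"
    transition: str      # the transition t being split
    places: frozenset    # the set S of places split off from t

def commute(a, b):
    """Commutation rule for two sequential insertions."""
    if a.kind != b.kind:
        return True      # a pre-insertion and a post-insertion always commute
    return a.transition != b.transition or not (a.places & b.places)

def prune_cache(cache, applied):
    """Keep the validated insertions that still commute with `applied`."""
    return {ins for ins in cache if ins != applied and commute(ins, applied)}

a = SeqInsertion("pre", "t", frozenset({"p1", "p2"}))
b = SeqInsertion("pre", "t", frozenset({"p2", "p3"}))    # overlaps with a
c = SeqInsertion("pre", "t2", frozenset({"p4"}))         # other transition
d = SeqInsertion("post", "t", frozenset({"q1"}))         # post-insertion
remaining = prune_cache({a, b, c, d}, applied=a)
```

After a has been applied, only b is discarded, since it is the only cached insertion that shares a transition and a place with a.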
A composite transition insertion is a transformation defined
as the composition of several pairwise commutative transition
insertions. Clearly, if a composite transition insertion consists
of SB-preserving transition insertions then it is SB-preserving,
i.e. one can freely combine SB-preserving transition insertions,
as long as they are pairwise commutative. This property is
useful for logic decomposition: typically, several transitions of
a new internal signal map have to be inserted in each iteration
of the algorithm, in order to preserve the consistency of the
STG. For example, in Fig. 1(f) a composite transformation
comprising two commuting SB-preserving insertions (adding
the new transitions map+ and map−) has been applied in or-
der to insert a new signal map with the given implementation
[map] = ldtack ∨ csc while preserving the consistency of the
STG.
V. FUNCTION-GUIDED SIGNAL INSERTION
As described above, logic decomposition boils down to
inserting into an STG Γ a new internal signal map with a given
implementation [map] = F(X). That is, one has to compute a
consistency-preserving and SB-preserving composite insertion
Î such that, once the corresponding transitions are added to
Γ, it is possible to label each of them map+ or map− so
that the modified STG can be synthesised as an SI circuit,
and F (X) is an implementation of the newly inserted signal
map. In [11], a similar problem of inserting a new signal to
resolve encoding conflicts has been solved by reducing it to
SAT. Below we outline the main idea of that approach.
Given Pref Γ, one can compute a set I of valid (i.e. SB-pre-
serving, SI-preserving, not delaying an input, etc.) insertions as
described in Sect. IV. Note that the number of transformations
in I is relatively small:
• the number of valid sequential pre- and post-insertions is
linear in the number of STG transitions, assuming that
|•t| ≤ c and |t•| ≤ c for every transition t and some
constant c that is independent of the STG;
• the number of valid concurrent insertions is at most
quadratic in the number of STG transitions;
• though the number of valid GTIs can be exponential
in the worst case, only those GTIs that are ‘potentially
useful’ [18] for inserting a signal implementing F are
computed and subsequently used by the proposed approach,
and their number is usually small.
Now one can formulate a SAT problem as follows. For each
insertion I ∈ I we create a Boolean variable, also denoted by
I , indicating whether I ∈ Î . The SAT formula below ensures
that for any satisfying assignment of a SAT instance to be built,
the corresponding composite insertion Î (obtained by taking
the insertions whose corresponding variables are assigned 1) is
valid (i.e. it preserves the consistency of the STG, the chosen
individual insertions commute and introduce no auto-conflicts
or self-triggering):
MUTEX ∧ SA ∧ CUTOFF,
where the MUTEX constraint ensures that no two signal
insertions I, I′ ∈ Î are non-commuting, concurrent, in
auto-conflict or one of them can trigger the other; the sign
alternation constraint SA ensures that a consistent assignment
of signs to the newly inserted transitions is possible, and the
CUTOFF constraint is needed to ensure that the properties
achieved by MUTEX and SA will hold not only for the
configurations of the complete prefix, but also beyond its cut-
off events, i.e. for the full unfolding.
Some further constraints can be appended to this formula
to ensure additional properties. For example, [11] added a
constraint CORE ensuring that some of the encoding conflicts
are resolved (i.e. some progress is made); in this paper we
will add a constraint FUN instead, ensuring that the newly
inserted signal map is implemented by a given Boolean
function F (X):
MUTEX ∧ SA ∧ CUTOFF ∧ FUN. (1)
Generation of the MUTEX, SA and CUTOFF constraints
is described in [11]; hence we concentrate on generating the
FUN constraint, which is the main contribution of this paper;
it should be noted that though the used techniques resemble
those in [11], this contribution is non-trivial and technically
difficult.
The FUN constraint is generated in two steps. First, we
select the subset IF ⊆ I of insertions which are compatible
with F . Then incremental SAT is used to compute a set of
clauses expressing FUN , which depend only on variables I ∈
IF .
We denote by F′x ≝ F|x=0 ⊕ F|x=1 the partial derivative
of F w.r.t. x. Intuitively, F′x(X) = 1 iff F essentially depends
on x when its inputs are a vector X, i.e. its value changes if
the component corresponding to x in X is flipped. (Note that
F′x itself does not essentially depend on x, and the notation
F′x(X) is used only for convenience.) We will also write F(C)
instead of F(Code(C)), and similarly for F′x.
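As a small illustration of the partial derivative, the sketch below computes F′x as the XOR of the two cofactors of F. Representing F as a Python function over a dictionary of signal values is an assumption made for this example only.

```python
def derivative(f, x):
    """Return F'_x = F|x=0 XOR F|x=1 as a function of an assignment."""
    def d(assignment):
        low = f({**assignment, x: 0})    # cofactor F|x=0
        high = f({**assignment, x: 1})   # cofactor F|x=1
        return low ^ high
    return d


# F = ldtack OR csc, the function used in the VME bus controller example
F = lambda a: a["ldtack"] | a["csc"]
dF_ldtack = derivative(F, "ldtack")

# F essentially depends on ldtack exactly when csc = 0
for csc in (0, 1):
    for ldtack in (0, 1):
        print(ldtack, csc, dF_ldtack({"ldtack": ldtack, "csc": csc}))
```

The value passed for x itself is ignored by the derivative, which matches the remark that F′x does not essentially depend on x.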
A. Selecting compatible insertions
We now introduce a notion of a compatible insertion, and
then show how to check the compatibility property on Pref Γ.
The theory developed in [17], [18] states that if some SB-
preserving insertion I is applied to Γ, yielding the STG ΓI ,
then for each configuration C of Unf Γ there is a unique mini-
mal w.r.t. ⊂ configuration ϕI(C) of Unf ΓI such that C can be
obtained from ϕI(C) by removing the events corresponding
to the newly inserted transition tI , i.e. C = ψI(ϕI(C)), where
ψI is the function projecting configurations of Unf ΓI to ones
of Unf Γ.
Let x be a signal of Γ and C be a configuration of Unf Γ.
The predicate Trig is defined as follows: Trig(C, x, I) holds
if there is an x±-labelled event ex in Unf ΓI such that ϕI(C)
does not enable an instance of the newly inserted transition tI
and ϕI(C) ∪ {ex} is a configuration of Unf ΓI enabling tI.
Intuitively, Trig(C, x, I) holds if at the state given by ϕI(C),
x± can fire and trigger tI.
An insertion I ∈ I is called compatible with a Boolean
function F(X) defined over the set X of signals of Γ if for
each configuration C of Unf Γ and each x ∈ X such that
Trig(C, x, I) holds, F′x(ϕI(C)) = 1. The intuition behind
this definition is as follows. Suppose tI is a transition of the
newly inserted signal map implementing F. At the final state
of ϕI(C), tI can be triggered by ex, i.e. firing of x changes the
value of F. The final encodings of the configurations ϕI(C)
and ϕI(C) ∪ {ex} differ only for x, i.e. F must essentially
depend on x at these states, i.e. F′x(ϕI(C)) = 1.
One can now observe that only compatible insertions can
be used to implement F. Indeed, incompatible ones change
the value of map when F′x(ϕI(C)) = 0, i.e. when F must be
stable.
Using the correspondence between the configurations of
Unf Γ and Unf ΓI , one can re-formulate the compatibility of
an I ∈ I as a simple reachability-like property of Γ, which can
efficiently be checked on Pref Γ ([16] outlines an approach for
checking reachability-like properties on unfolding prefixes).
Hence, one can compute IF by simply checking this property
for each insertion in I. For example, the valid insertions
compatible with the function ldtack ∨ csc in the VME bus
controller example are listed below:
I1 : ldtack− ≀
I2 : ldtack− •|−→ d+
I3 : ldtack− •|−→ d−
I4 : ldtack− •|−→ csc−
I5 : ldtack− •|−→ lds+
I6 : ldtack− •|−→ dtack+
I7 : csc− ≀
I8 : csc− |−→ dtack−
I9 : csc− •|−→ d+
I10 : csc− •|−→ csc+
I11 : csc− •|−→ lds+
I12 : csc− |−→ lds−
I13 : csc− •|−→ dtack+
B. Generating the FUN clauses
The set of clauses comprising FUN can now be computed
as follows. For each configuration C of Unf Γ enabling an
instance ex of at least one signal x ∈ X such that F′x(C) = 1,
let
$$\mathcal{I}_C \stackrel{\mathrm{df}}{=} \{ I \in \mathcal{I}_F \mid \mathit{Trig}(C, x, I) \}.$$
If map is the newly inserted signal implementing F then
its transition must be enabled at the state corresponding to
ϕI(C∪{ex}) in ΓI, i.e. some insertion in $\mathcal{I}_C$ must be used to
implement map. This can be expressed by ensuring that the
clause $\bigvee_{I \in \mathcal{I}_C} I$ is in FUN for each such C. (Note that if
$\mathcal{I}_C = \emptyset$ then an empty clause is in FUN, which means that
(1) is unsatisfiable and so one cannot insert a signal.)
An inefficient way of building FUN would be to enumerate
for each x ∈ X the satisfying assignments of the following
Boolean formula:
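$$\mathit{CONF}_C \wedge \mathit{CODE}_{C,X} \wedge \mathit{DER}^x_X \wedge \mathit{EN}^x_C \wedge \bigwedge_{I \in \mathcal{I}_F} \bigl( I \iff \mathit{TRIG}^{x,I}_C \bigr). \qquad (2)$$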
Here, $\mathit{CONF}_C$, which depends only on variables $\mathit{conf}_e$
corresponding to non-cut-off events of Pref Γ, ensures that
$C \stackrel{\mathrm{df}}{=} \{ e \mid \mathit{conf}_e = 1 \}$ is a configuration (and not just
an arbitrary set of events); $\mathit{CODE}_{C,X}$ relates the variables
$\mathit{code}_x$, x ∈ X, to the variables $\mathit{conf}_e$ in such a way that if
the values of all $\mathit{conf}_e$ are fixed and satisfy $\mathit{CONF}_C$ then
$\mathit{Code}_x(C) = \mathit{code}_x$ for all x ∈ X; $\mathit{DER}^x_X$ depends only on
the variables $\mathit{code}_x$, x ∈ X, and ensures that $F'_x(C) = 1$;
$\mathit{EN}^x_C$ ensures that an instance of x is enabled by C; and
$\mathit{TRIG}^{x,I}_C$ depends on the variables $\mathit{conf}_e$ and computes the
value of Trig(C, x, I) for the given x and I. Note that the last
part of the formula relates the variables I corresponding to the
insertions in $\mathcal{I}_F$ to the variables $\mathit{conf}_e$ in such a way that if
the values of all $\mathit{conf}_e$ are fixed, I = 1 iff Trig(C, x, I) holds;
that is, I will occur in the clause being generated only if the
computed satisfying assignment assigns it the value of 1.
However, the number of configurations is usually very large,
and it is computationally infeasible to enumerate them using
a naïve brute-force search. A more efficient approach outlined
below exploits the following two observations:
• The same clause can be generated by many different
configurations, and hence once one such configuration
is found, the others can be excluded from the search.
• If the set of literals of one clause is a subset of the
set of literals of another clause, then the latter clause
is redundant and can be dropped; we say that the former
clause subsumes the latter one. (Note that a clause always
subsumes itself.)
Technically, this can be implemented using incremental
SAT, as follows. Whenever some satisfying assignment of
(2) is computed, the corresponding clause (I1 ∨ . . . ∨ Ik) is
obtained from it and added to FUN . Then, before continuing
the search, the clause (¬I1∨ . . .∨¬Ik), which excludes all the
solutions resulting in the clause subsumed by the current one,
is added to (2), and the process is iterated until the formula
becomes unsatisfiable. In effect, the minimal elements of the
projection of the set of satisfying assignments of (2) onto the
set of variables IF are computed.
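The loop described above can be sketched as follows. This is a minimal illustration under strong assumptions: the satisfying assignments of (2) are given explicitly as a list of trigger sets (one frozenset of insertion names per configuration), and a linear scan stands in for the incremental SAT solver. The function name and the sample data are hypothetical.

```python
def enumerate_fun_clauses(trigger_sets):
    """Collect clauses with blocking, then keep only the minimal ones.

    trigger_sets: list of frozensets, each standing for the insertions
    assigned 1 in one satisfying assignment of (2).
    """
    found = []
    while True:
        # "Solve": pick any assignment not excluded by a blocking clause.
        # The blocking clause (¬I1 ∨ ... ∨ ¬Ik) excludes every superset
        # of {I1, ..., Ik}.
        sol = next((s for s in trigger_sets
                    if not any(b <= s for b in found)), None)
        if sol is None:
            break                      # formula became unsatisfiable
        found.append(sol)
    # Drop clauses strictly subsumed by a smaller clause found later.
    return [c for c in found if not any(d < c for d in found)]


sample = [frozenset({"I1", "I2", "I3"}),
          frozenset({"I1", "I2"}),
          frozenset({"I7", "I8"}),
          frozenset({"I1", "I2", "I7"})]
clauses = enumerate_fun_clauses(sample)
print([sorted(c) for c in clauses])
```

An empty trigger set yields the empty clause and blocks everything else, which corresponds to the case where (1) is unsatisfiable.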
Preliminary experiments show that this technique is quite
efficient; in fact, the number of iterations is usually quite small
in practice, and FUN often contains fewer than five clauses. For
example, the FUN constraint for ldtack ∨ csc in the VME
bus controller example is
(I1 ∨ I2 ∨ I3 ∨ I4 ∨ I5 ∨ I6) ∧ (I7 ∨ I8 ∨ I9 ∨ I10 ∨ I11 ∨ I12 ∨ I13).
Feeding (1) to a SAT solver now yields two possible composite
transition insertions for the new signal implementing ldtack ∨
csc, viz. {I1, I7} and {I1, I12}. However, the latter yields
[csc] = dsr ∧ (map ∧ ldtack ∨ csc) or
[csc] = dsr ∧ (map ∧ lds ∨ csc)
as the possible implementations for csc (i.e. the signal being
decomposed), instead of the expected [csc] = dsr ∧map, and
so the former composite insertion (cf. Fig. 1(f)) is heuristically
chosen by the decomposition algorithm. (One can re-formulate
the problem of checking whether a given composite insertion
yields the expected implementation for the signal being de-
composed as a separate reachability-like property of Γ, which
can be efficiently checked on Pref Γ [16].)
C. Correctness proof
Below we state that the proposed method is sound (note that
the method is incomplete due to the greedy nature of the search
performed by the decomposition algorithm, and because only
‘structural’ insertions are used).
Proposition 1 (Soundness). Let ΓÎ be the result of applying
to an STG Γ a composite insertion Î obtained from some
satisfying assignment of (1). Then [map] = F (X) is a possible
implementation for the newly inserted signal map in ΓÎ .
Proof: The composite insertion Î is SB-preserving and
preserves the consistency and output-persistency of the origi-
nal STG due to the theory developed in [11], [17], [18]. Hence
we only need to prove that [map] = F (X) is a possible
implementation for the newly inserted signal map in ΓÎ . It
is enough to show that for any configuration C of Unf ΓÎ ,
Nxtmap(C) = F (C).
For the sake of contradiction, suppose there is a ‘bad’
configuration of Unf ΓÎ for which this property does not hold.
Let C be a minimal w.r.t. ⊂ bad configuration. Since the
cut-off condition for STG unfoldings takes into account the
encodings [14], and due to the CUTOFF constraint, if a
bad configuration exists in Unf ΓÎ then a configuration with
the same final marking and encoding (including map) exists
already in Pref ΓÎ , i.e. w.l.o.g., C is a configuration of Pref ΓÎ
containing no cut-offs.
Assuming that the phase assignment (map+ or map−) to
the newly inserted transitions is such that Nxtmap(∅) = F (∅),
C ≠ ∅. Moreover, since C is minimal w.r.t. ⊂, each causally
maximal event of C is labelled by a signal in X ∪ {map},
where X is the support of F , since the events with different
labels cannot trigger or disable map (as map is output-
persistent due to the theory developed in [11], [17], [18]) or
change the value of F , and so do not affect the validity of
Nxtmap(C) = F (C).
Let e be a causally maximal event of C. We consider
the possible cases, and show that each of them leads to a
contradiction.
Case 1: Suppose e is labelled by map±. Then, due to
the minimality of C, the configuration C \ {e} is not bad,
i.e. Nxtmap(C \ {e}) = F(C \ {e}). Since C is bad,
Nxtmap(C) ≠ F(C). Since CodeX(C) = CodeX(C \ {e}),
Nxtmap(C \ {e}) ≠ Nxtmap(C), i.e. Codemap(C \ {e}) ⊕
Outmap(C \ {e}) ≠ Codemap(C) ⊕ Outmap(C). Since
Codemap(C) = ¬Codemap(C \ {e}) and Outmap(C \ {e}) =
1, Outmap(C) = 1, i.e. map is either auto-concurrent or self-
triggering, contradicting the MUTEX constraint.
Case 2: Suppose e is labelled by some x ∈ X.
Case 2.1: If F′x(C \ {e}) = 0 then F(C \ {e}) = F(C).
Since C \ {e} is not bad and C is bad, Nxtmap(C \ {e}) ≠
Nxtmap(C). Since Codemap(C \ {e}) = Codemap(C),
Outmap(C \ {e}) ≠ Outmap(C). Since firing e cannot disable
an instance of map (as non-output-persistent transformations
are rejected), Outmap(C \ {e}) = 0 and Outmap(C) = 1, i.e.
e triggers some instance of map, i.e. the property
Trig(ψI(C \ {e}), x, I) holds, where I is the insertion corresponding to
this instance of map. Hence, I is not a compatible insertion,
a contradiction.
Case 2.2: If F′x(C \ {e}) = 1 then F(C \ {e}) ≠ F(C).
Since C \ {e} is not bad and C is bad, Nxtmap(C \ {e}) =
Nxtmap(C). Since Codemap(C \ {e}) = Codemap(C),
Outmap(C \ {e}) = Outmap(C). If Outmap(C \ {e}) = 0 then
an empty clause would be present in FUN, which would make
the whole instance unsatisfiable, leading to a contradiction.
Hence, Outmap(C \ {e}) = Outmap(C) = 1. Since non-
output-persistent transformations are rejected, both C \ {e}
and C enable the same transformation map, i.e. the property
Trig(ψI(C \ {e}), x, I) did not hold for the corresponding
insertion I, and so I ∉ IψI(C\{e}). Due to the MUTEX
constraint, the instances of map cannot be concurrent or in
conflict, and since C enables an instance of I, it cannot enable
instances of any other insertions in Î. Hence IψI(C\{e}) = ∅
and an empty clause is present in FUN, making the whole
instance unsatisfiable, which leads to a contradiction.
D. Encoding conflicts
Two distinct reachable states of an STG are in the Universal
State Coding (USC) conflict if they have the same encoding,
and are in the Complete State Coding (CSC) conflict if they
have the same encoding and enable different sets of non-input
signals. Obviously, a CSC conflict is always a USC one, but,
in general, not vice versa. An STG satisfies the USC/CSC
property if it is free from USC/CSC conflicts.
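The two definitions can be illustrated on an explicitly enumerated state graph. The sketch below is not the unfolding-based check used in this paper; it assumes that each reachable state is given as a pair of its encoding and the set of enabled non-input signals, and the state names and sample data are hypothetical.

```python
from itertools import combinations


def coding_conflicts(states):
    """Return (usc_pairs, csc_pairs) for a dict: name -> (code, enabled_non_inputs)."""
    usc, csc = [], []
    for (n1, (code1, out1)), (n2, (code2, out2)) in combinations(states.items(), 2):
        if code1 == code2:             # same encoding: USC conflict
            usc.append((n1, n2))
            if out1 != out2:           # different enabled non-input signals: CSC conflict
                csc.append((n1, n2))
    return usc, csc


states = {
    "s0": ("00", frozenset({"x"})),
    "s1": ("10", frozenset()),
    "s2": ("00", frozenset({"x"})),    # USC conflict with s0, but not a CSC one
    "s3": ("10", frozenset({"y"})),    # CSC (and hence USC) conflict with s1
}
usc_pairs, csc_pairs = coding_conflicts(states)
print(usc_pairs, csc_pairs)
```

By construction every CSC pair is also reported as a USC pair, mirroring the observation that a CSC conflict is always a USC one.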
It is well-known that the CSC property is one of the
necessary conditions required for implementability of an STG
as an SI circuit [5]. Note that an STG with USC conflicts
still can be synthesised, as long as it does not contain CSC
conflicts. However, USC conflicts indicate redundancy in the
specification (at the state graph level, the states in USC conflict
can be fused without affecting the correctness), and so STGs
with USC conflicts but without CSC ones are rare in practice.
The result below shows that the USC property is preserved by
a function-guided signal insertion.
Proposition 2 (USC). Let ΓÎ be the STG obtained from an
STG Γ by applying a composite insertion Î implementing the
function-guided signal insertion [map] = F (X) as described
in Prop. 1, and Γ had the USC property. Then ΓÎ also has
the USC property.
Proof: For the sake of contradiction, suppose there are
two configurations C and C ′ of Unf ΓÎ whose final states are
in USC conflict. If one of the configurations can be obtained
from the other by adding a map±-labelled event then their
final encodings differ at the position corresponding to map, a
contradiction. Otherwise, the final states of ψÎ(C) and ψÎ(C ′)
are distinct and Code(ψÎ(C)) = Code(ψÎ(C ′)), i.e. Γ had a
USC conflict, a contradiction.
In [21] it was claimed that a function-guided signal insertion
always preserves the CSC property. Fig. 4 demonstrates that
this is not the case: in fact, a USC conflict can be ‘promoted’ to
a CSC one by such an insertion. Indeed, the original STG has
USC conflicts which are not CSC ones, and is implementable
by the circuit [x] = a, [y] = b, [u] = c, [v] = c; however, after
the function-guided insertion [map] = c shown by dashed
boxes, signals u and v are no longer implementable due to
the CSC conflict between the states following c+ in the two
branches.
[Figure 4: STG diagram not reproduced. Inputs: a, b, c; outputs: x, y, u, v; internal: map.]
Fig. 4. An STG illustrating that a USC conflict can be promoted to a CSC
one by a signal insertion.
Hence, in certain rare cases, ΓÎ will not satisfy the CSC
property and thus one will not be able to synthesise some
of its output or internal signals (although the newly inserted
signal map will always be synthesisable due to Prop. 1).
To cope with this problem, the following optimistic strategy
can be used. The algorithm can try to perform an insertion
in the hope that in most cases the CSC property will be
preserved. If the resulting STG does contain a CSC conflict
(this will be detected during the derivation of a complex-
gate implementation in the SI logic decomposition algorithm),
then the corresponding CSC core (see [11]) can be mapped
(using ψÎ ) to a USC core in the original STG. Then the
algorithm backtracks, and solves the problem again, this time
with additional constraints prohibiting composite insertions
which would result in this USC core becoming a CSC core in
the modified STG. The process is iterated until the modified
STG has the CSC property or the SAT instance becomes
unsatisfiable due to the additional constraints.
E. Optimisations
Below we propose a number of optimisations which can
significantly reduce the computation cost of the proposed
method.
First of all, one can observe that the decomposition algo-
rithm can attempt to insert the same function F several times.
Since this is the most expensive part of the algorithm, it makes
sense to cache all the insertion results. For this, it is convenient
to represent F as a BDD, since this ensures the canonicity
and allows for easy comparisons of functions and for simply
using pointers as keys for a hash table. Moreover, since the
composite insertions for F and its complement ¬F essentially
coincide (only the phases of the inserted transitions are
flipped), there is no need to compute the insertion for ¬F if
one for F is already in the cache. (Note that complementing a
BDD is a constant-time operation.)
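The caching idea can be sketched as follows. This illustration substitutes a truth table over a fixed variable order for the BDD as the canonical key, which is an assumption made to keep the example self-contained; the class name, the transition names t1 and t2, and their phases are hypothetical.

```python
from itertools import product


def truth_table(f, variables):
    """Canonical key for F: its truth table over a fixed variable order."""
    rows = product((0, 1), repeat=len(variables))
    return tuple(int(bool(f(dict(zip(variables, r))))) for r in rows)


def flip(phases):
    """Swap the phases of all inserted transitions."""
    return {t: "-" if p == "+" else "+" for t, p in phases.items()}


class InsertionCache:
    def __init__(self):
        self._store = {}

    def add(self, variables, table, phases):
        self._store[(tuple(variables), table)] = dict(phases)

    def lookup(self, variables, table):
        key = (tuple(variables), table)
        if key in self._store:
            return dict(self._store[key])
        # Fall back to the complement of F: same insertions, phases flipped.
        comp = (key[0], tuple(1 - b for b in table))
        if comp in self._store:
            return flip(self._store[comp])
        return None


X = ("ldtack", "csc")
F = lambda a: a["ldtack"] or a["csc"]
not_F = lambda a: not (a["ldtack"] or a["csc"])

cache = InsertionCache()
cache.add(X, truth_table(F, X), {"t1": "+", "t2": "-"})   # hypothetical result
hit = cache.lookup(X, truth_table(not_F, X))
print(hit)
```

With a real BDD package the complement lookup would not require rebuilding the key, since negation is a constant-time operation there.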
When selecting insertions compatible with F, one can
quickly rule out many non-compatible ones by applying the
following cheap test. I can be compatible only if each of its
instances J is triggered only by events labelled by signals in X,
and for each such x-labelled trigger e of J, F′x(C \ {e}) = 1,
where C is the smallest (w.r.t. ⊂) configuration enabling J.
Intuitively, I should be enabled only if firing an x ∈ X
simultaneously triggers an instance of I and changes the value
of F at the current state.
Since only the insertions compatible with F are considered
when the clauses comprising FUN are derived, one can
restrict the search space only to configurations whose maximal
events are labelled with signals in X . This can be easily
achieved by augmenting CONFC with additional clauses.
VI. COST FUNCTION
Once the FUN constraint has been generated, the SAT
problem (1) has to be solved. Typically this problem has
several solutions, and a heuristic cost function is used to guide
the search towards ‘good’ ones, resulting in small area and/or
performance overhead. The constructed SAT instance is solved
several times, with constraints on the value of the cost func-
tion appended to the formula, so that a solution minimising
the value of the cost function is eventually computed. (The
process resembles a binary search on the value of the cost
function.) The cost function we used is similar to the one
used in [11] for the resolution of encoding conflicts, with
the components calculating the number of remaining encoding
conflicts dropped. It takes into account:
• the estimated delay introduced by the insertion;
• the total number of syntactic triggers of all output and
internal signals;
• the number of inserted transitions of a signal;
• the number of signals which are not ‘locked’1 with the
newly inserted signal.
The user can choose the relative weights of the components
of the cost function to guide the resolution process towards
solutions with the desired area/latency trade-off.
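The search for a cost-minimising solution can be sketched as a binary search over a bound on the cost. In this illustration the SAT solver is replaced by a stand-in that scans an explicit list of candidate solutions; the weights, the component names and the candidates are all hypothetical.

```python
# Hypothetical relative weights of the four cost components
WEIGHTS = {"delay": 2, "triggers": 1, "transitions": 1, "unlocked": 1}


def cost(solution, weights=WEIGHTS):
    """Weighted sum of the cost components of a candidate solution."""
    return sum(weights[k] * solution[k] for k in weights)


def minimise(solve_with_bound, upper):
    """Repeatedly solve with a cost bound, tightening it as in a binary search."""
    best, lo, hi = None, 0, upper
    while lo <= hi:
        mid = (lo + hi) // 2
        sol = solve_with_bound(mid)    # a solution with cost <= mid, or None
        if sol is None:
            lo = mid + 1
        else:
            best, hi = sol, cost(sol) - 1
    return best


candidates = [
    {"delay": 1, "triggers": 3, "transitions": 2, "unlocked": 0},   # cost 7
    {"delay": 0, "triggers": 2, "transitions": 2, "unlocked": 0},   # cost 4
    {"delay": 2, "triggers": 3, "transitions": 2, "unlocked": 0},   # cost 9
]


def oracle(bound):
    """Stand-in for the SAT solver: first candidate within the bound."""
    return next((c for c in candidates if cost(c) <= bound), None)


best = minimise(oracle, upper=20)
print(best, cost(best))
```

Changing the entries of `WEIGHTS` shifts the search towards a different area/latency trade-off, in the spirit of the user-selectable weights mentioned above.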
VII. EXPERIMENTAL RESULTS
The unfolding-based logic decomposition algorithm de-
scribed in this paper has been implemented in the MPSAT
tool. In this section we present the results of running it on a
number of benchmarks. To solve the arising SAT instances,
the ZCHAFF solver [24] was used. The results are compared
with those produced by PETRIFY v4.2, which uses BDDs to
represent and manipulate the state graph of the STG. The gate
library petrify.lib that comes with PETRIFY was used as
the target library; it has combinational gates and latches with
up to four inputs, and the derived libraries petrify2.lib
and petrify3.lib were produced by selecting from it
only the gates and latches with up to two and three inputs,
respectively. All the experiments were conducted on a PC with
a Pentium™ IV/3.2GHz processor and 1GB RAM.
1 Two signals are in the ‘lock’ relation [22] if their instances alternate in
every execution sequence, always starting from the same signal. ‘Locking’
the newly inserted signal with as many other signals as possible is a good
heuristic for logic simplification [23].
                    petrify2.lib     petrify3.lib     petrify.lib
Benchmark           PFY     MPSAT    PFY     MPSAT    PFY     MPSAT
ADFAST              F       F        F       656      416     F
DUPLICATOR          184     184      184     184      184     184
ALLOC-OUTBOUND      296     288      256     248      256     248
NAK-PA              304     384      328     328      328     344
NOWICK              240     256      280     264      280     264
RAM-READ-SBUF       320     368      312     360      304     312
SBUF-RAM-WRITE      B       760      496     536      496     608
SBUF-READ-CTL       256     264      248     248      264     264
IRCV-BM             T       T        B       T        B       632
MMU                 712     F        712     712      712     696
MMU0                600     F        616     672      624     664
MMU1                496     552      B       544      B       576
MOD4 COUNTER        T       536      528     520      528     472
MR0                 504     552      480     520      488     504
MR1                 B       504      464     560      440     528
PAR(4)              568     584      504     520      504     560
SEQ MIX             344     320      328     304      328     304
SEQ(8)              744     632      688     632      688     632
MASTER 1882         552     672      528     640      528     640
SPEC SEQ(4)         328     280      304     280      304     280
TRCV-BM             T       T        B       F        B       F
TSEND-BM            F       T        B       F        B       720
Total               5136    5336     7256    7528     7256    7504
Difference                  +3.89%           +3.75%           +3.42%
TABLE I
EXPERIMENTAL RESULTS: ASSORTED SMALL STGS.
A. Assorted small benchmarks
Table I presents the experimental results for a number of
assorted small benchmarks from [25], with CSC conflicts
resolved using the method described in [11]. The logic de-
composition has been performed using the three gate libraries
mentioned above, and the areas of the resulting circuits are
reported. We use ‘F’ to indicate that the tool has terminated
failing to decompose a circuit and ‘T’ to indicate that the
tool has not terminated within 10min (this happens when the
tool keeps inserting new signals without making progress).2
Furthermore, it turned out that occasionally PETRIFY pro-
duced incorrect circuits3 — apparently, there is a bug in its
implementation of the logic decomposition algorithm; these
cases are indicated in the table by ‘B’. The totals in the table
are taken over the benchmarks for which both PETRIFY and
MPSAT succeeded and produced correct solutions.
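The way the totals are formed can be reproduced from the petrify2.lib columns of Table I. The sketch below simply re-adds the published areas over the benchmarks where both tools produced a correct circuit; the data structure and function name are chosen for this illustration only.

```python
# petrify2.lib columns of Table I: benchmark -> (PETRIFY, MPSAT);
# "F", "T" and "B" are the failure markers used in the table.
PETRIFY2 = {
    "ADFAST": ("F", "F"), "DUPLICATOR": (184, 184),
    "ALLOC-OUTBOUND": (296, 288), "NAK-PA": (304, 384),
    "NOWICK": (240, 256), "RAM-READ-SBUF": (320, 368),
    "SBUF-RAM-WRITE": ("B", 760), "SBUF-READ-CTL": (256, 264),
    "IRCV-BM": ("T", "T"), "MMU": (712, "F"), "MMU0": (600, "F"),
    "MMU1": (496, 552), "MOD4 COUNTER": ("T", 536), "MR0": (504, 552),
    "MR1": ("B", 504), "PAR(4)": (568, 584), "SEQ MIX": (344, 320),
    "SEQ(8)": (744, 632), "MASTER 1882": (552, 672),
    "SPEC SEQ(4)": (328, 280), "TRCV-BM": ("T", "T"),
    "TSEND-BM": ("F", "T"),
}


def totals(rows):
    """Sum the areas over benchmarks where both tools succeeded."""
    both = [(p, m) for p, m in rows.values()
            if isinstance(p, int) and isinstance(m, int)]
    return sum(p for p, _ in both), sum(m for _, m in both)


pfy_total, mpsat_total = totals(PETRIFY2)
overhead = round(100 * (mpsat_total - pfy_total) / pfy_total, 2)
print(pfy_total, mpsat_total, overhead)
```

The computed values agree with the Total row and the percentage reported for petrify2.lib in Table I.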
From this table one can observe that PETRIFY and MPSAT
are quite comparable: they have succeeded more or less on
the same benchmarks, and the areas of the resulting circuits
are quite similar (the overall difference is less than 4% for
each of the three gate libraries). It should be noted that the
‘Library matching’4 step of the logic decomposition algorithm
in Sect. III has not been implemented in MPSAT yet, so non-
optimised numbers are given in the MPSAT columns; it seems
reasonable to expect that this optimisation could recover 10–
15% of the area. It is also worth noting that MPSAT’s failures
for the TRCV-BM benchmark are due to the effect described in
Sect. V-D, namely due to promoting a USC conflict to a CSC
one (the current implementation does not try to recover in such
a case and simply fails).
2 Note that these two kinds of failures are pertinent to the decomposition
algorithm in Sect. III due to its ‘greedy’ selection of the decomposition on
each step (without the possibility of backtracking) and the fact that there is
no theoretical guarantee that every circuit can be decomposed using a given
finite gate library.
3 The VERSIFY tool [26] was used to check correctness.
[Figure 5: STG diagram not reproduced. Outputs: w1, w2, w3, w4; x1, x2, x3, x4; y1, y2, y3, y4; z.]
Fig. 5. The PPWKCSC(3, 4) STG modelling three weakly synchronised
pipelines.
One should treat the provided results with caution, as the
parameters like the success rate and the quality of the resulting
circuits reflect the quality of the heuristics used for selecting
a decomposition on each step of the logic decomposition
algorithm, rather than the quality of the function-guided signal
insertion sub-routine, which is the main point of this paper.
However, it is probably safe to make the following important
conclusion. When performing a signal insertion at the level of
the state graph, PETRIFY can completely restructure the STG,
whereas unfolding-based insertion performed by MPSAT is
currently limited to structural insertions considered in Sect. IV.
Nevertheless, these experiments seem to indicate that for logic
decomposition such structural insertions are not too restrictive
in practice compared to the signal insertion at the state graph
level, as STG restructuring seems to be useful only in rare
cases. On the other hand, unfolding-based insertions scale
better (see below), which is in practice a much more desirable
quality than the ability to do restructuring.
B. Scalable benchmarks
We also compared the described method with PETRIFY on
the PPWKCSC(3, n) series of scalable benchmarks modelling
three weakly synchronised pipelines. They are the benchmarks
from the corresponding series used in [12]. In these benchmarks
all the signals are considered outputs, i.e. the control
logic is designed as a closed circuit. The inputs are inserted
after the synthesis is completed, by breaking up some outputs
and inserting the environment into the breaks, thus forming
handshakes (sometimes with an inverter attached to the output
if the environment acts as an active port). Fig. 5 shows the
PPWKCSC(3, 4) STG.
4 Recall that library matching tries to combine small gates into larger ones
(this does not violate the SI property) and to re-shuffle inverters at gates’
inputs and outputs so that the total number of inverters is minimised and as
many gates as possible have an inverted output (as the ‘negative’ logic gates
are usually smaller and faster). To ensure that re-shuffling of the inverters
preserves the SI property both PETRIFY and MPSAT use the pragmatic
assumption that inverters at gates’ inputs (‘bubbles’) have negligible delays.
                  STG                Reachable        Prefix       Time, [s]
Benchmark         |P|/|T|     Sig    states           |B|/|E|      PFY       MPSAT
PPWKCSC(3,3)      34 / 20     10     1024             63 / 36      2         1
PPWKCSC(3,6)      70 / 38     19     524288           183 / 96     52        2
PPWKCSC(3,9)      106 / 56    28     268435456        357 / 183    8475      5
PPWKCSC(3,12)     142 / 74    37     137438953472     585 / 297    >15hrs    11
TABLE II
EXPERIMENTAL RESULTS: SCALABLE BENCHMARKS PPWKCSC(3,n)
MODELLING THREE WEAKLY SYNCHRONISED PIPELINES.
The purpose of this series is to distill as much as possible the
complexity of the core sub-routine in SI logic decomposition,
viz. function-guided signal insertion, which is the focus of this
paper. As mentioned earlier, the performance of the decompo-
sition algorithm and the quality of the resulting circuit is so
much affected by the multitude of heuristics for choosing the
decomposition on each step that it is difficult to compare the
two implementations of this core sub-routine in PETRIFY and
MPSAT. The advantage of the PPWKCSC(3, n) series is that
the impact of the decomposition selection heuristics is very
restricted: each signal except z in these benchmarks can be
implemented either by an inverter or by a two-input C-element
(with one input inverted), and z itself can be implemented by
a three-input C-element, which can be decomposed in two
two-input C-elements. Hence, when decomposing using the
petrify2.lib gate library (which contains, among other
gates and latches, an inverter and two-input C-elements with
all possible input inversions), the impact of the heuristics is
minimised, and PETRIFY and MPSAT compute very similar
solutions by inserting a single signal.
The experimental results for these benchmarks are sum-
marised in Table II. For each benchmark, this table gives
the STG size (numbers of places/transitions and signals), the
number of reachable states, the size of the unfolding prefix
(numbers of conditions/events), and the runtime (in seconds)
to perform logic decomposition by PETRIFY and MPSAT. The
unfolding prefixes were built using the PUNF tool [27]; the
time is not reported because it was negligible in all cases
(≪1sec).
From this table one can see that the number of reachable
states grows exponentially with the size of the STG, whereas
the size of the prefix grows only quadratically. Though using
BDDs helps PETRIFY to alleviate the state space explosion,
its performance still suffers considerably, as it struggled to
decompose a circuit with 37 signals. Overall, MPSAT was
clearly superior in terms of runtime and memory consumption:
the cases which were intractable for PETRIFY were solved by
MPSAT relatively easily. This confirms the observation [11]–
[13] that unfolding prefixes provide an excellent data structure
for representing STG state spaces, as the practical STGs
usually have high concurrency but rather few choices — the
ideal case for unfolding-based techniques.
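The growth rates mentioned above can be checked directly against the figures in Table II. The sketch below is only a sanity check on the four published data points: it tests that the number of reachable states equals 2 raised to the number of signals, and that the prefix sizes have constant second differences, which is what one expects of a quadratic in n sampled at equally spaced values of n.

```python
# Data copied from Table II for n = 3, 6, 9, 12
signals = [10, 19, 28, 37]
states = [1024, 524288, 268435456, 137438953472]
conditions = [63, 183, 357, 585]     # |B|
events = [36, 96, 183, 297]          # |E|


def second_differences(seq):
    """Differences of consecutive differences of a sequence."""
    first = [b - a for a, b in zip(seq, seq[1:])]
    return [b - a for a, b in zip(first, first[1:])]


# Exponential growth: each benchmark has exactly 2**signals reachable states
exponential = all(s == 2 ** k for s, k in zip(states, signals))

print(exponential)
print(second_differences(conditions))   # constant => consistent with quadratic growth
print(second_differences(events))
```

Four data points cannot prove a growth law, so this is merely consistent with the quadratic growth of the prefix stated in the text.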
VIII. CONCLUSIONS
In this paper, we proposed an unfolding-based technique for
solving the logic decomposition problem for SI circuits. It has
all the attractive features of the state space based technique
of [20] (highly optimised circuits, the possibility of mul-
tiway acknowledgement, latch utilisation), and significantly
alleviates the state space explosion. Together with [11]–[13],
this essentially completes the unfolding-based synthesis flow
for SI circuits which does not generate state graphs at any
stage and yet is a fully fledged logic synthesis. Combined
with the STG decomposition approach of [19], this design
cycle can be applied for control re-synthesis of BALSA or
TANGRAM/HASTE specifications.
This technique has been implemented in the MPSAT tool.
The experimental results show that PETRIFY and MPSAT have
similar success rates and similar quality of the produced circuits,
which suggests that the structural insertions considered
in Sect. IV are usually sufficient for logic decomposition,
and complex transformations like STG restructuring are rarely
useful in practice. Furthermore, unfolding-based logic decom-
position scales much better.
As future work, it is planned to implement the library
matching algorithm in MPSAT to recover some area and
performance for the produced circuits. Furthermore, improving
the heuristics for selecting the best decomposition on each step
of the logic decomposition algorithm is an important direction
of research, as these heuristics significantly affect the success
rate of the algorithm as well as the quality of the produced
circuits.
REFERENCES
[1] “International technology roadmap for semiconductors: Design,” 2009,
URL: http://www.itrs.net/Links/2009ITRS/2009Chapters_2009Tables/2009_Design.pdf.
[2] D.E. Muller and W.S. Bartky, “A theory of asynchronous circuits,” in
Proc. Int. Symp. of the Theory of Switching, 1959, pp. 204–243.
[3] T.-A. Chu, Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic
Specifications, Ph.D. thesis, Lab. for Comp. Sci., MIT, 1987.
[4] L. Rosenblum and A. Yakovlev, “Signal graphs: from self-timed to timed
ones,” in Proc. Int. Workshop on Timed Petri Nets. 1985, pp. 199–206,
IEEE Computer Society Press.
[5] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Ya-
kovlev, Logic Synthesis of Asynchronous Controllers and Interfaces,
Springer-Verlag, 2002.
[6] A.J. Martin, “The limitations to delay-insensitivity in asynchronous
circuits,” in Proc. 6th MIT Conference on Advanced Research in VLSI.
1990, pp. 263–278, MIT Press.
[7] V. Varshavsky, Ed., Self-Timed Control of Concurrent Processes, Kluwer
Academic Publishers, 1990, Translated from Russian, published by Na-
uka, Moscow, 1986.
[8] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Ya-
kovlev, “PETRIFY: a tool for manipulating concurrent specifications
and synthesis of asynchronous controllers,” IEICE Transactions on
Information and Systems, vol. E80-D, no. 3, pp. 315–325, 1997.
[9] R.E. Bryant, “Graph-based algorithms for Boolean function manipu-
lation,” IEEE Transactions on Computers, vol. C-35-8, pp. 677–691,
1986.
[10] A. Valmari, Lectures on Petri Nets I: Basic Models, vol. 1491 of Lecture
Notes in Computer Science, chapter The State Explosion Problem, pp.
429–528, Springer-Verlag, 1998.
[11] V. Khomenko, “Efficient automatic resolution of encoding conflicts
using STG unfoldings,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 17, no. 7, pp. 855–868, 2009, Special
Section on Asynchronous Circuits and Systems.
[12] V. Khomenko, M. Koutny, and A. Yakovlev, “Detecting state coding
conflicts in STG unfoldings using SAT,” Fundamenta Informaticae,
vol. 62, no. 2, pp. 1–21, 2004.
[13] V. Khomenko, M. Koutny, and A. Yakovlev, “Logic synthesis for
asynchronous circuits based on Petri net unfoldings and incremental
SAT,” Fundamenta Informaticae, vol. 70, no. 1–2, pp. 49–73, 2006.
[14] A. Semenov, Verification and Synthesis of Asynchronous Control
Circuits Using Petri Net Unfolding, Ph.D. thesis, School of Computing
Science, Newcastle University, 1997.
[15] J. Esparza, S. Römer, and W. Vogler, “An improvement of McMillan’s
unfolding algorithm,” Formal Methods in System Design, vol. 20, no.
3, pp. 285–310, 2002.
[16] V. Khomenko, Model Checking Based on Prefixes of Petri Net Unfold-
ings, Ph.D. thesis, School of Computing Science, Newcastle University,
2003.
[17] V. Khomenko, “Behaviour-preserving transition insertions in unfolding
prefixes,” in Proc. ATPN’07. 2007, vol. 4546 of Lecture Notes in
Computer Science, pp. 204–222, Springer-Verlag.
[18] V. Khomenko, “A new type of behaviour-preserving transition insertions
in unfolding prefixes,” in Proc. ICGT’10. 2010, pp. 75–90, Springer-
Verlag.
[19] V. Khomenko and M. Schaefer, “Combining decomposition and unfold-
ing for STG synthesis,” in Proc. ATPN’07. 2007, vol. 4546 of Lecture
Notes in Computer Science, pp. 223–243, Springer-Verlag.
[20] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, E. Pastor,
and A. Yakovlev, “Decomposition and technology mapping of speed-
independent circuits using Boolean relations,” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no.
9, pp. 1221–1236, 1999.
[21] A. Kondratyev, J. Cortadella, M. Kishinevsky, L. Lavagno, and A. Ya-
kovlev, “Logic decomposition of speed-independent circuits,” Proceed-
ings of the IEEE, vol. 87, no. 2, pp. 347–362, 1999.
[22] P. Vanbekbergen, F. Catthoor, G. Goossens, and H. De Man, “Opti-
mized synthesis of asynchronous control circuits from graph-theoretic
specifications,” in Proc. ICCAD’90. 1990, pp. 184–187, IEEE Computer
Society Press.
[23] J. Carmona and J. Cortadella, “Private communication,” 2006.
[24] S. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, “CHAFF:
Engineering an efficient SAT solver,” in Proc. DAC’01. 2001, pp. 530–
535, ASME Technical Publishing.
[25] J. Carmona and J. Cortadella, “Encoding large asynchronous controllers
with ILP techniques,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 27, no. 1, pp. 20–33, 2008.
[26] O. Roig, Formal Verification and Testing of Asynchronous Circuits,
Ph.D. thesis, Universitat Politecnica de Catalunya, 1997.
[27] “PUNF home page,” URL: http://homepages.cs.ncl.ac.uk/victor.khomenko/tools/punf/.
