Efficient Automatic Resolution of Encoding Conflicts Using STG Unfoldings by Khomenko V
 
 
 
 
 
 
 
University of Newcastle upon Tyne 
   
 
 
 
 
 
 
 
COMPUTING 
SCIENCE 
 
 
 
 
 
 
Efficient Automatic Resolution of Encoding Conflicts Using STG 
Unfoldings 
 
V. Khomenko 
 
 
 
 
 
 
 
 
 
 
TECHNICAL REPORT SERIES 
              
 
No. CS-TR-995 January, 2007 
 
 
 
Efficient Automatic Resolution of Encoding Conflicts Using STG Unfoldings 
 
 
Victor Khomenko 
 
Abstract 
 
 
Synthesis of asynchronous circuits from Signal Transition Graphs (STGs) involves 
resolution of state encoding conflicts by means of refining the STG specification. In 
this paper, a technique for resolving such conflicts by means of insertion of new 
signals is proposed. It is based on conflict cores, i.e. sets of transitions causing 
encoding conflicts, which are represented at the level of finite and complete prefixes 
of STG unfoldings. The experimental results show significant improvements over the 
state space based approach in terms of runtime and memory consumption, as well as 
some improvements in the quality of the resulting circuit. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
© 2007 University of Newcastle upon Tyne. 
Printed and published by the University of Newcastle upon Tyne, 
Computing Science, Claremont Tower, Claremont Road, 
Newcastle upon Tyne, NE1 7RU, England. 
Bibliographical details 
 
KHOMENKO, V. 
 
Efficient Automatic Resolution of Encoding Conflicts Using STG Unfoldings  
[By] V. Khomenko 
 
Newcastle upon Tyne: University of Newcastle upon Tyne: Computing Science, 2007. 
 
(University of Newcastle upon Tyne, Computing Science, Technical Report Series, No. CS-TR-995) 
 
Added entries 
 
UNIVERSITY OF NEWCASTLE UPON TYNE 
Computing Science. Technical Report Series.  CS-TR-995 
 
 
About the author 
 
Victor Khomenko obtained MSc with distinction in Computer Science, Applied Mathematics and Teaching of 
Mathematics and Computer Science in 1998 from Kiev Taras Shevchenko University and PhD in Computing 
Science in 2003 from University of Newcastle upon Tyne.  Successfully completed the EPSRC-sponsored 
"Parallel Model Checking" project. Interests: model checking of Petri nets, Petri net unfolding techniques, self-
timed (asynchronous) circuits.  
 
 
Suggested keywords 
 
CSC CONFLICT,  
STG,  
ASYNCHRONOUS CIRCUITS,  
SYNTHESIS,  
UNFOLDING,  
SAT 
I. INTRODUCTION
ASYNCHRONOUS circuits are a promising type of digital circuits. They have lower power consumption and
electro-magnetic emission, no problems with clock skew and
related subtle issues, and are fundamentally more tolerant
of voltage, temperature and manufacturing process variations.
The International Technology Roadmap for Semiconductors
report on Design [1] predicts that 22% of the designs will
be driven by handshake clocking (i.e., asynchronous) in 2013,
and this percentage will rise to 40% in 2020.
PETRIFY [2], [3] is one of the commonly used tools for
synthesis of asynchronous circuits. As a specification it accepts
a Signal Transition Graph (STG) [4] — a class of interpreted
Petri nets in which transitions are labelled with the rising
and falling edges of circuit signals. For synthesis, PETRIFY
employs the state space of the STG, and so it suffers from the
combinatorial state space explosion problem. That is, even
a relatively small system specification may (and often does)
yield a very large state space. This puts practical bounds
on the size of control circuits that can be synthesised using
such techniques, which are often restrictive, especially if
the specification is not constructed manually by a designer
but rather generated automatically from high-level hardware
descriptions. (For example, designing a control circuit with
more than 20–30 signals with PETRIFY is often impossible.)
Hence, this approach does not scale. Moreover, PETRIFY
cannot guarantee a solution which can be mapped to the gate
library at hand.
[Footnote: V. Khomenko is a Royal Academy of Engineering/EPSRC Post-Doctoral
Research Fellow. He is affiliated with School of Computing Science,
Newcastle University, UK. E-mail: Victor.Khomenko@ncl.ac.uk.]
One way to cope with the state space explosion problem is
to use syntax-directed translation of the specification to a circuit,
thus avoiding building the state space. This is essentially
the idea behind BALSA [5] and TANGRAM [6]. This technique,
although computationally efficient, often yields circuits with
large area and performance overheads compared with syn-
chronous counterparts. This is because the resulting circuits
are highly over-encoded, i.e., they contain many unnecessary
state-holding elements.
For asynchronous circuits to be competitive, one has somehow
to combine the advantages of logic synthesis (high quality
of circuits) and syntax-directed translation (guarantee of a so-
lution, efficiency) while compensating for their disadvantages.
A natural way of doing this is to apply logic synthesis to the
control path extracted from, e.g., a BALSA specification. This
control path can be partitioned into smaller ‘lumps’ which
can be handled by logic synthesis, and the ‘lumps’ on which
it fails (because of either inability to find a solution in the
given gate library or exceeding memory or time constraints)
are implemented using the syntax-directed translation. The
initial experiments conducted in [7] showed that this combined
approach can halve the area devoted to control flow and improve
its latency, compared with the traditional syntax-directed trans-
lation, as long as the size of ‘lumps’ which can be confidently
handled by logic synthesis is sufficiently large.
Arguably, one of the most difficult tasks in logic synthesis
is resolution of Complete State Coding (CSC) conflicts, arising
when semantically different reachable states of an STG have
the same encoding, i.e., the binary vector representing the
value of all the signals in a given state, as illustrated in
Fig. 1(a,b). To resolve a CSC conflict, new internal signals
helping to distinguish between these states must be inserted
into the specification in such a way that its ‘external’ behaviour
does not change. (Intuitively, insertion of a signal elongates the
encoding, introducing thus additional memory into the circuit,
helping to trace the current state.) The quality of the resulting
circuit (in terms of area and latency) depends to a large extent
on the way the new signals were inserted.
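The notion of a CSC conflict can be illustrated on an explicit state graph. The sketch below is a minimal illustration and not the method of this paper (which avoids building state graphs). It assumes the common formalisation that two states are in CSC conflict when their encodings coincide but their sets of enabled output signals differ. The state names, encodings and signal names are made up for the example.

```python
from collections import defaultdict
from itertools import combinations

def csc_conflicts(states):
    """Return triples (state_a, state_b, encoding) of states in CSC conflict.

    `states` maps a state name to a pair (encoding, enabled_outputs).
    Assumption: two states conflict if they share an encoding but
    enable different sets of output signals.
    """
    by_code = defaultdict(list)
    for name, (code, enabled_outputs) in states.items():
        by_code[code].append((name, frozenset(enabled_outputs)))
    conflicts = []
    for code, group in by_code.items():
        for (a, out_a), (b, out_b) in combinations(group, 2):
            if out_a != out_b:
                conflicts.append((a, b, code))
    return conflicts

# Hypothetical toy state graph: s1, s2, s3 share the encoding "101".
states = {
    "s1": ("101", {"x"}),
    "s2": ("101", {"y"}),
    "s3": ("101", {"y"}),
    "s4": ("000", set()),
}
print(csc_conflicts(states))
```

Here s2 and s3 share an encoding but enable the same outputs, so they are not reported; only the pairs involving s1 are.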
The design flow advocated in [7] is as follows. Given a
(potentially large) STG, the CSC conflicts are resolved using
an integer linear programming (ILP) technique to approximate
the state space of an STG. Then the resulting STG (free from
CSC conflicts) is decomposed into smaller components in such
a way that they are also free from CSC conflicts, as described
in [8]. (Typically, each component is responsible for producing
a single signal.) Then these components are synthesised one-by-one
using PETRIFY. This approach can handle much larger
specifications than PETRIFY alone, but its scalability is still
limited since ILP is an NP-complete problem. For example, [7]
reports that it took 28.3 minutes to resolve CSC conflicts in an
STG with 436 places, 398 transitions and 199 signals, followed
by 44.7 minutes of synthesis. Moreover, an ILP approximation
of the state space may work poorly for some STGs, e.g., those
containing many self-loops (i.e., pairs of arcs (p, t), (t, p)
going in opposite directions).
In this paper, we follow a more scalable approach, which
tries to avoid performing expensive operations (such as resolv-
ing CSC conflicts) on the original specification. It works by
proceeding with decomposition immediately, without resolv-
ing CSC conflicts. Hence, the resulting components, unlike
ones in the technique described above, are not free from CSC
conflicts. If a component has a CSC conflict, it can happen
due to one of the following two reasons: (i) this conflict was
present already in the original STG; or (ii) this conflict was
introduced because some of the signals preventing it in the
original STG are not present in the component. The technique
described in [9] allows one to check which of these two
reasons applies, and in case (ii) to find signals which need to be
added to the component to prevent such CSC conflicts. Finally,
the remaining CSC conflicts are resolved in each component,
and the resulting STGs are synthesised.
Although this approach is quite scalable, it can be successful
only if resolution of CSC conflicts and logic synthesis can be
efficiently performed for all components, since a failure to
synthesise even one of them means that the whole STG is not
synthesised. In particular, PETRIFY may be inadequate for this
task because of its rather restrictive limitations on the size of
components. A more promising approach is to employ STG
unfolding prefixes [10]–[12].
A finite and complete unfolding prefix of an STG is a
finite acyclic net which implicitly represents all the reachable
states of this STG together with transitions enabled at those
states. Intuitively, it can be obtained through unfolding the
STG, by successive firing of transitions, under the following
assumptions: (i) for each new firing a fresh transition (called
an event) is generated; (ii) for each newly produced token a
fresh place (called a condition) is generated.
Due to its structural properties (such as acyclicity), the
reachable states of an STG can be represented using configura-
tions of its unfolding. A configuration C is a downward-closed
set of events (being downward-closed means that if e ∈ C
and f is a causal predecessor of e then f ∈ C) without choices
(i.e., for all distinct events e, f ∈ C, there is no condition c
in the unfolding such that the arcs (c, e) and (c, f) are in
the unfolding). Intuitively, a configuration is a partial-order
execution, i.e., an execution where the order of firing of some
of its events (viz. concurrent ones) is not important. We will
denote by [e] the local configuration of an event e, i.e., the
smallest (w.r.t. ⊂) configuration containing e (it is comprised
of e and its causal predecessors).
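The definitions of a configuration and of the local configuration [e] can be sketched in code. This is a minimal illustration on a hand-made prefix; the dictionaries `direct_preds` (direct causal predecessors of each event) and `consumed` (conditions in the preset of each event), as well as the event and condition names, are assumptions introduced for the example.

```python
def causal_predecessors(e, direct_preds):
    """All events that causally precede e (transitive closure)."""
    seen, stack = set(), list(direct_preds.get(e, ()))
    while stack:
        f = stack.pop()
        if f not in seen:
            seen.add(f)
            stack.extend(direct_preds.get(f, ()))
    return seen

def local_configuration(e, direct_preds):
    """[e]: the event e together with its causal predecessors."""
    return causal_predecessors(e, direct_preds) | {e}

def is_configuration(events, direct_preds, consumed):
    """Check downward-closedness and absence of choices."""
    events = set(events)
    # Downward-closed: every causal predecessor of a member is a member.
    for e in events:
        if not causal_predecessors(e, direct_preds) <= events:
            return False
    # No choices: no condition is consumed by two distinct events.
    used = set()
    for e in events:
        if used & consumed[e]:
            return False
        used |= consumed[e]
    return True

# Hypothetical prefix: e1 produces c1 and c2; e2 and e3 both consume c1
# (a choice); e4 consumes c2 and is concurrent with e2 and e3.
direct_preds = {"e2": {"e1"}, "e3": {"e1"}, "e4": {"e1"}}
consumed = {"e1": {"c0"}, "e2": {"c1"}, "e3": {"c1"}, "e4": {"c2"}}

print(is_configuration({"e1", "e2", "e4"}, direct_preds, consumed))
print(sorted(local_configuration("e4", direct_preds)))
```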
The unfolding is infinite whenever the original STG has an
infinite run; however, if the STG has finitely many reachable
states then the unfolding eventually starts to repeat itself
and can be truncated (by identifying a set of cut-off events)
without loss of information, yielding a finite and complete
prefix. Intuitively, an event e can be declared cut-off if the
already built part of the prefix contains a configuration Ce
(called the corresponding configuration of e) such that its final
marking and encoding coincide with those of [e] [13] and Ce
is smaller than [e] w.r.t. some well-founded partial order on the
configurations of the unfolding, called an adequate order [10].
Fig. 1(c) shows a finite and complete unfolding prefix of the
STG shown in Fig. 1(a); the only cut-off event is depicted as a
double box, and its corresponding configuration is {e1, e2}.
Efficient algorithms exist for building such prefixes [10],
[11], which ensure that the number of non-cut-off events in a
complete prefix can never exceed the number of reachable
states of the STG. However, complete prefixes are often
exponentially smaller than the corresponding state graphs,
especially for highly concurrent STGs, because they repre-
sent concurrency directly rather than by multidimensional
‘diamonds’ as it is done in state graphs. For example, if
the original STG consists of 100 transitions which can fire
once in parallel, the state graph will be a 100-dimensional
hypercube with 2^100 vertices, whereas the complete prefix will
coincide with the net itself. Since STGs usually exhibit a lot of
concurrency, but have rather few choice points, their unfolding
prefixes are often exponentially smaller than the corresponding
state graphs; in fact, in many of the experiments conducted
in [14] they are just slightly bigger than the original STGs
themselves. Therefore, unfolding prefixes are well-suited for
alleviating the state space explosion problem.
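The gap between the two representations in the example with n independent transitions can be checked by brute force for small n. The sketch below is only an illustration: it assumes that each transition has its own marked input place and can fire exactly once, so that a state is fully described by the set of transitions fired so far. The function names are hypothetical.

```python
def reachable_state_count(n):
    """Count reachable states of n transitions that each fire once, in parallel."""
    start = frozenset()
    seen = {start}
    frontier = [start]
    while frontier:
        state = frontier.pop()
        for t in range(n):
            if t not in state:          # t has not fired yet, so it is enabled
                nxt = state | {t}
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return len(seen)

def prefix_event_count(n):
    """The prefix coincides with the net itself: one event per transition."""
    return n

for n in (1, 4, 10):
    print(n, reachable_state_count(n), prefix_event_count(n))
```

The state count doubles with every added transition, while the number of events in the prefix grows by one.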
In [14] the unfolding technique was applied to detection of
CSC conflicts between reachable states of an STG. Moreover,
in [15] the problem of complex-gate logic synthesis from an
STG free from CSC conflicts was solved. The experiments
in [14], [15] showed that the unfolding-based approach can handle
much bigger STGs than PETRIFY.
The visualisation method presented in [16] is aimed at
facilitating a manual refinement of an STG with CSC conflicts,
and works on the level of unfolding prefixes. In order to
avoid the explicit enumeration of CSC conflicts, they are
visualised as cores, i.e., sets of transitions ‘causing’ one or
more of them. (A core can be computed as the difference
of two configurations whose final states are in CSC conflict.)
All such cores must eventually be eliminated by adding new
internal signals that resolve the CSC conflicts to yield an STG
satisfying the CSC property. This approach is illustrated in
Fig. 1(c). One can see that the encodings at the beginning
and at the end of the core are the same. This suggests that a
core can be eliminated by the introduction of a new signal,
csc, in such a way that one of its transitions is inserted into
the core, as this would violate the stated property. Note that
at least two transitions, viz. the falling and the rising edges
of the signal, have to be inserted into the STG in order to
ensure consistency [3], [4], a necessary condition for
implementability of an STG as a circuit, which ensures that all the
state encodings are binary. In particular, for every signal s, the
following two properties must hold: (i) in all executions of the
STG, the first occurrence of a transition of s has the same sign
(either rising or falling); (ii) the rising and falling transitions
of s alternate in every execution. In this example, the new
transitions were inserted concurrently to existing ones in order
to minimise the latency of the circuit. After transferring them
into the STG, no more CSC conflicts remain in it, and so one
can proceed with logic synthesis. (Other ways of inserting a
signal to resolve the CSC conflict in this example are also
possible; see Section VI.)
[Figure 1, panels (a), (b) and (c); signals: inputs: dsr, ldtack; outputs: lds, d, dtack; internal: csc]
Fig. 1. An STG modelling the read cycle of the VME bus controller (a),
its state graph showing a CSC conflict between the reachable states M1 and
M2 (b), and its unfolding prefix showing the conflict core corresponding to
this CSC conflict and a way to resolve it by insertion of a new internal signal
csc (c). The order of signals in the binary encodings is: dsr, ldtack, dtack,
lds, d.
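The two consistency properties (i) and (ii) can be sketched as a check over a finite set of sample executions. This is only an illustration: a real consistency check has to cover all executions of the STG, whereas the code below inspects just the traces it is given. Transitions are written as strings such as "csc+" and "csc-" with ASCII signs, and the sample traces are made up.

```python
def traces_are_consistent(traces):
    """Check properties (i) and (ii) on the given sample traces only.

    (i)  the first transition of each signal has the same sign in all traces;
    (ii) rising and falling transitions of each signal alternate in each trace.
    """
    first_sign = {}                      # signal -> sign of its first occurrence
    for trace in traces:
        last_sign = {}                   # signal -> sign of its latest occurrence
        for transition in trace:
            signal, sign = transition[:-1], transition[-1]
            if signal not in last_sign:
                # Property (i): compare with the first sign seen in other traces.
                if first_sign.setdefault(signal, sign) != sign:
                    return False
            elif last_sign[signal] == sign:
                # Property (ii): two edges of the same sign in a row.
                return False
            last_sign[signal] = sign
    return True

# Hypothetical sample traces.
good = [["a+", "csc+", "a-", "csc-"], ["a+", "a-", "a+"]]
bad_alternation = [["csc+", "a+", "csc+"]]
bad_first_sign = [["a+"], ["a-"]]

print(traces_are_consistent(good))
print(traces_are_consistent(bad_alternation))
print(traces_are_consistent(bad_first_sign))
```

The second sample shows why inserting a single transition of a new signal is not enough: without a matching edge of the opposite sign, property (ii) fails as soon as the behaviour repeats.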
The semi-automatic approach of [16] is only feasible for
synthesis of relatively small ‘handcrafted’ blocks. In this paper,
we present a technique which is also based on cores in the
STG unfolding prefix, but is fully automatic and can handle
much larger STGs than PETRIFY, while delivering high-
quality circuits. Together with [14], [15], [17], it essentially
completes the design cycle for synthesis of asynchronous
circuits from STGs that does not involve building reachability
graphs at any stage and yet is a fully fledged logic synthesis.
The conducted experiments show that the proposed method
has significant advantage both in memory consumption and in
execution time compared with the existing state space based
methods, while delivering somewhat better circuits compared
with those produced by PETRIFY and the ILP method of [7].
Combined with the decomposition approach of [9], [18], [19],
this design cycle can be applied for control re-synthesis of
BALSA or TANGRAM specifications as described above.
II. COMPARISON WITH OTHER TECHNIQUES
In this paper, we compare the proposed technique for
resolving CSC conflicts with two other techniques: the one
implemented in PETRIFY [2], [3] and employing the state
graphs and the recent technique based on Integer Linear
Programming (ILP) described in [7].
PETRIFY’s approach is well-documented in [3]. It works
with state graphs, and thus does not scale. However, for
small specifications it typically yields quite good solutions.
Moreover, it has some additional capabilities which neither the
ILP approach of [7] nor the proposed method have, viz. it can
restructure the specification, as illustrated in Fig. 2. However,
in practice the scalability is usually much more desirable than
the ability to do restructuring (as it is useful only in very
special cases).
The recent approach described in [7] works in a very
different way. Instead of exact computation of the state space,
it uses an approximate technique based on Integer Linear
Programming (ILP). Briefly, this approach takes as an input
a lasso-shaped CSC violation trace starting from the initial
state and such that the two states, say, s1 and s2, in CSC
conflict are positioned on the loop of the lasso. Then it tries to
insert a set of new transitions (obtained by splitting existing
transitions) corresponding to a new signal into the STG, in
such a way that the STG remains consistent and the numbers
of such transitions on the parts of the loop between s1 and s2,
as well as between s2 and s1, are odd (i.e., the CSC conflict
is resolved). For this, an ILP problem is formulated, whose
solution gives a set of transitions which need to be split (an
elegant sufficient condition for the consistency of the resulting
STG based on the place redundancy test is used). Moreover,
a heuristic cost function is used to guide the search towards
solutions corresponding to small circuits. This procedure is
iterated until all the CSC conflicts are resolved.
The approach presented in this paper was to some degree
inspired by that in [7], but it has a number of important
differences. It iteratively inserts new internal signals into the
specification until no CSC conflicts remain. On each iteration,
it tries to eliminate some of the CSC conflict cores in the
unfolding prefix [16] by insertion of new signals, guided by
a heuristic cost function. The technique described in [17] is
used to avoid re-unfolding the specification after each iteration,
and to transfer the unresolved conflict cores from iteration to
iteration. The main differences from [7] are described below.
• We use STG unfolding prefixes rather than STGs. This
allows for an exact test of consistency where [7] used an
approximate one, based on redundancy of places.
• We use a SAT solver rather than an ILP one. Besides,
the SAT encoding of the problem is based on a different
idea.
• Unlike [7], the proposed method does not require a lasso-
shaped CSC violation trace (if the STG is not reversible
then it is not always possible to find such a trace even if
there are CSC conflicts).
• Using unfoldings allowed us to efficiently compute vi-
olation traces using the technique described in [14]. In
contrast, for methods working on the STG level, like that
in [7], this is only possible for some restricted net classes,
such as marked graphs or live and safe free-choice nets.
Intuitively, the problem of checking whether a given
safe STG has CSC conflicts is PSPACE-complete [20,
Proposition 5.1], while ILP is in NP, so the knowledge
of a Parikh vector of the violation trace (i.e., a vector of
non-negative numbers representing the number of times
each transition fired in a given execution; it is typically
returned by ILP methods [21]) does not help much — the
reachability problem remains PSPACE-complete even if
such a Parikh vector is provided as a part of the input. In
principle, [7] could also use, e.g., the technique of [14]
for computing violation traces, but this would, to some
degree, defeat the rationale of their approach, since an
unfolding prefix of the STG has to be built for this —
but then it would be natural to employ it for conflict
resolution as well.
The actual approach used in [7] for computing a CSC
violation trace works as follows [22] (unfortunately, this
question was not addressed in [7]). The problem of
CSC conflict detection is formulated as an ILP problem,
which, if infeasible, guarantees that STG has no CSC
conflicts. Otherwise, a Parikh vector of a CSC violation
trace is computed, and an attempt is made to restore
a trace from this Parikh vector by firing one-by-one
the transitions corresponding to its non-zero components
(the corresponding component of the Parikh vector is
decremented after each firing). If at some point none of
such transitions is enabled, one of them is anyway chosen
and fired (leading to a ‘negative’ marking). The process
stops when all the components of the vector become zero.
One can see that a violation trace is produced (and then
resolved by insertion of new signals) even if the computed
solution of the ILP problem is spurious (i.e., the corre-
sponding CSC conflict states are unreachable). Moreover,
the produced violation trace can be spurious (i.e., passing
via negative markings) even if there is a real execution
corresponding to the computed Parikh vector. Hence, the
method of [7] can sometimes insert redundant signals
resolving ‘spurious’ CSC conflicts.
• The transformations used in [7] were limited to simple
transition splitting. The proposed approach allows the use of
a much wider class of transformations; in particular, tran-
sition insertions splitting off just a part of a transition’s
preset or postset, as well as concurrent insertions, are
allowed.
• The proposed method takes into account multiple cores,
whereas the ILP approach considers only a single (per-
haps, spurious) violation trace. In particular, this makes
it possible to choose insertions which resolve many cores
with one signal, reducing thus the total number of inserted
signals and allowing for quicker progress — see the 8-
way sequencer case study in Section VI. Moreover, it
does not require a lasso-shaped violation trace and can
deal with arbitrary violation traces.
• Though the proposed approach is fully automatic, it
inherits the visualisation possibilities described in [16],
which may be useful for interaction with the user.
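The trace-restoration procedure attributed to [7] in the list above (firing the transitions of a Parikh vector one by one, and forcing a firing when nothing is enabled) can be sketched as follows. This is a reading of the textual description only, not the actual implementation of [7]. The net representation (`pre` and `post` dictionaries of place sets), the tie-breaking rule (take the first candidate) and the toy net are assumptions.

```python
def restore_trace(pre, post, marking, parikh):
    """Greedily turn a Parikh vector into a firing sequence.

    Returns (trace, spurious); `spurious` is True if some transition had to
    be fired while disabled, i.e. the trace passes via a 'negative' marking.
    """
    marking = dict(marking)
    remaining = dict(parikh)
    trace, spurious = [], False
    while any(remaining.values()):
        candidates = [t for t, k in remaining.items() if k > 0]
        enabled = [t for t in candidates
                   if all(marking.get(p, 0) >= 1 for p in pre[t])]
        if enabled:
            t = enabled[0]
        else:
            t = candidates[0]            # forced firing of a disabled transition
            spurious = True
        for p in pre[t]:
            marking[p] = marking.get(p, 0) - 1
        for p in post[t]:
            marking[p] = marking.get(p, 0) + 1
        remaining[t] -= 1                # decrement the fired component
        trace.append(t)
    return trace, spurious

# Hypothetical net: p0 -> a -> p1 -> b -> p2, with one token in p0.
pre = {"a": {"p0"}, "b": {"p1"}}
post = {"a": {"p1"}, "b": {"p2"}}
m0 = {"p0": 1}

print(restore_trace(pre, post, m0, {"b": 1, "a": 1}))
print(restore_trace(pre, post, m0, {"b": 1}))
```

The second call uses a Parikh vector with no corresponding execution; a trace is still produced, which is the source of the 'spurious' conflicts mentioned above.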
The described advantages come at the price of increasing
the runtime compared with the method of [7]. However,
the proposed method is much faster than PETRIFY and can
handle quite large specifications. As it is intended for use
in conjunction with the decomposition approach of [9], [18],
[19], it is sufficient for practical applications such as control
re-synthesis.
III. TRANSFORMATIONS
In this paper, we are primarily interested in SB-preserving
transformations, i.e., transformations preserving safeness and
behaviour (in the sense that the original and the transformed
Petri nets are bisimilar, provided that the newly inserted
transitions are considered silent) of the Petri net. Below we
describe several kinds of transition insertions, which we will
use for CSC conflict resolution, and the algorithms presented
in [17] allow one to check their validity.
Building an unfolding prefix of an STG can be a time-
consuming operation. The approach described in [17] allows one to
avoid a potentially expensive re-unfolding after each transition
insertion, by introducing local modifications in the existing
prefix instead. Moreover, it yields a prefix similar to the orig-
inal one, which is advantageous for visualisation and allows
one to transfer some information (e.g., the yet unresolved CSC
cores) from the original prefix to the modified one.
Sequential pre-insertion
A sequential pre-insertion is essentially a generalised transi-
tion splitting, and is formally defined as follows. Given a tran-
sition t and a set of places S ⊆ •t, the sequential pre-insertion
S ≀ t is the transformation inserting a new transition u (with
an additional place) ‘splitting off’ the places in S from t. The
picture below illustrates the sequential pre-insertion {p1, p2}≀t.
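[Picture: on the left, a transition t with input places p1, p2, p3 and output places q1, q2, q3; on the right, the net after the insertion, where p1 and p2 are the input places of the new transition u, the new place p leads from u to t, p3 remains an input place of t, and the output places q1, q2, q3 of t are unchanged.]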
One can easily show that sequential pre-insertions always
preserve safeness and traces. However, in general, the be-
haviour is not preserved, and so a sequential pre-insertion is
not guaranteed to be SB-preserving (in fact, it can introduce
deadlocks) [17]. Given an unfolding prefix, it is quite easy to
check whether a pre-insertion is SB-preserving [17].
If sequential pre-insertion S ≀ t is applied to the STG,
the inserted transition should not ‘delay’ an input (as this
would impose a constraint on the environment which was not
present in the original specification), and so t must not be
an input transition. Moreover, one should take care that the
semi-modularity is not violated. ([17] presents an algorithm
allowing one to check that the newly inserted transition will
not be in a dynamic choice relation with any other transition,
which ensures semi-modularity.)
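The structural effect of a sequential pre-insertion S ≀ t can be sketched on a simple net representation. The code below is a minimal sketch covering only the net surgery; it does not check that the insertion is SB-preserving, that t is not an input transition, or that semi-modularity holds. The representation (`pre` and `post` map each transition to its set of input and output places) and the default names `u` and `p` are assumptions.

```python
def pre_insert(pre, post, t, S, u="u", p="p"):
    """Apply the sequential pre-insertion S ≀ t and return the new net."""
    S = set(S)
    if not S <= pre[t]:
        raise ValueError("S must be a subset of the preset of t")
    if u in pre or any(p in places for places in list(pre.values()) + list(post.values())):
        raise ValueError("the names of the new transition and place must be fresh")
    new_pre = {x: set(places) for x, places in pre.items()}
    new_post = {x: set(places) for x, places in post.items()}
    new_pre[u] = S                       # u takes over the places split off from t
    new_post[u] = {p}                    # the additional place p leads from u to t
    new_pre[t] = (pre[t] - S) | {p}
    return new_pre, new_post

# The insertion {p1, p2} ≀ t on a transition with three inputs and three outputs.
pre = {"t": {"p1", "p2", "p3"}}
post = {"t": {"q1", "q2", "q3"}}
new_pre, new_post = pre_insert(pre, post, "t", {"p1", "p2"})
print(sorted(new_pre["u"]), sorted(new_post["u"]), sorted(new_pre["t"]))
```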
Sequential post-insertion
Similarly to sequential pre-insertion, sequential post-inser-
tion is also a generalisation of transition splitting, and is
formally defined as follows. Given a transition t and a set
of places S ⊆ t•, the sequential post-insertion t ≀ S is the
transformation inserting a new transition u (with an additional
place) ‘splitting off’ the places in S from t. The picture below
illustrates the sequential post-insertion t ≀ {q1, q2}.
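[Picture: on the left, a transition t with input places p1, p2, p3 and output places q1, q2, q3; on the right, the net after the insertion, where the new place p leads from t to the new transition u, q1 and q2 are the output places of u, q3 remains an output place of t, and the input places p1, p2, p3 of t are unchanged.]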
One can easily show that sequential post-insertions always
preserve safeness and behaviour, and hence are always SB-
preserving.
If sequential post-insertion is applied to the STG, the semi-
modularity is guaranteed to be preserved. However, one should
ensure that the inserted transition does not ‘delay’ any input
transitions (as this would impose a constraint on the environ-
ment which was not present in the original specification).
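The structural effect of a sequential post-insertion t ≀ S mirrors that of a pre-insertion and can be sketched in the same style. Again this is only the net surgery under an assumed representation (`pre` and `post` map each transition to its set of input and output places); the check that no input transition is delayed is not included, and the default names `u` and `p` are hypothetical.

```python
def post_insert(pre, post, t, S, u="u", p="p"):
    """Apply the sequential post-insertion t ≀ S and return the new net."""
    S = set(S)
    if not S <= post[t]:
        raise ValueError("S must be a subset of the postset of t")
    if u in pre or any(p in places for places in list(pre.values()) + list(post.values())):
        raise ValueError("the names of the new transition and place must be fresh")
    new_pre = {x: set(places) for x, places in pre.items()}
    new_post = {x: set(places) for x, places in post.items()}
    new_post[t] = (post[t] - S) | {p}    # t now produces p instead of the places in S
    new_pre[u] = {p}                     # the additional place p leads from t to u
    new_post[u] = S                      # u produces the places split off from t
    return new_pre, new_post

# The insertion t ≀ {q1, q2} on a transition with three inputs and three outputs.
pre = {"t": {"p1", "p2", "p3"}}
post = {"t": {"q1", "q2", "q3"}}
new_pre, new_post = post_insert(pre, post, "t", {"q1", "q2"})
print(sorted(new_post["t"]), sorted(new_pre["u"]), sorted(new_post["u"]))
```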
Concurrent insertion
Concurrent transition insertion can be advantageous for
performance, since the inserted transition can fire in parallel
with the existing ones. It is defined as follows. Given two
distinct transitions, t′ and t′′, and an n ∈ {0, 1}, the concur-
rent insertion t′n|−→t′′ is the transformation inserting a new
transition u (with a couple of additional places) between t′
and t′′, and putting n tokens in the place in its preset. We
will write t′ |−→t′′ instead of t′0|−→t′′ and t′•|−→t′′ instead of
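t′1|−→t′′. The picture below illustrates the concurrent insertion t1 •|−→ t3.
[Picture: a net with places p1, p2, p3, p4 and transitions t1, t2, t3 (top), and the same net after the insertion (bottom), where the new transition u, with an input place p (holding one token, since n = 1) and an output place q, has been added between t1 and t3.]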
In general, concurrent insertions preserve neither safeness
nor behaviour. In fact, safeness is not preserved even if n = 0
(e.g., when in the original net t′ can fire twice without t′′
firing), and deadlocks can be introduced even if n = 1
(e.g., when in the original net t′′ should fire twice before t′ can
become enabled). In [17], an efficient test whether a concurrent
insertion is SB-preserving, working on an unfolding prefix, has
been developed.
If a concurrent insertion t′n|−→t′′ is applied to the STG,
the semi-modularity is guaranteed to be preserved, but the
inserted transition should not ‘delay’ an input (as this would
impose a constraint on the environment which was not present
in the original specification), and so t′′ must not be an input
transition.
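The structural effect of a concurrent insertion can be sketched as follows. This is only the net surgery: the SB-preservation test of [17] and the check that t′′ is not an input transition are not included. The net representation, the default names `u`, `p`, `q`, and the chain-shaped example net are assumptions made for the illustration.

```python
def concurrent_insert(pre, post, marking, t1, t2, n, u="u", p="p", q="q"):
    """Insert u between t1 and t2, with n tokens in the place in the preset of u."""
    if t1 == t2:
        raise ValueError("the two transitions must be distinct")
    if n not in (0, 1):
        raise ValueError("n must be 0 or 1")
    new_pre = {x: set(places) for x, places in pre.items()}
    new_post = {x: set(places) for x, places in post.items()}
    new_post[t1].add(p)                  # t1 produces the new place p
    new_pre[u] = {p}
    new_post[u] = {q}
    new_pre[t2].add(q)                   # t2 additionally waits for q
    new_marking = dict(marking)
    new_marking[p] = n
    new_marking[q] = 0
    return new_pre, new_post, new_marking

# Hypothetical chain p1 -> t1 -> p2 -> t2 -> p3 -> t3 -> p4.
pre = {"t1": {"p1"}, "t2": {"p2"}, "t3": {"p3"}}
post = {"t1": {"p2"}, "t2": {"p3"}, "t3": {"p4"}}
m0 = {"p1": 1}
new_pre, new_post, new_m0 = concurrent_insert(pre, post, m0, "t1", "t3", 1)
print(sorted(new_post["t1"]), sorted(new_pre["t3"]), new_m0)
```

In the resulting net u can fire in parallel with t2, which is why such insertions tend to add less latency than sequential ones.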
Equivalent transformations
Sometimes a sequential post-insertion t ≀S yields essentially
the same net as a sequential pre-insertion S′≀t′, where t ∈ ••t′;
in particular, this happens if S ∪ S′ ⊆ t• ∩ •t′ and |•p| =
|p•| = 1 for all p ∈ S ∪ S′. In such a case there is no reason
to distinguish between these two transformations, e.g., one
can convert the post-insertion into an equivalent pre-insertion
whenever possible. Moreover, since post-insertions are always
SB-preserving, there is no need to check the validity of the
resulting transformation.
Commutative transformations
Two transformations commute if the result of their appli-
cation does not depend on the order they are applied. (Note
that a transformation can become ill-defined after applying
another transformation, e.g., t ≀{p, q} becomes ill-defined after
applying t ≀ {p}.) One can observe that:
• a concurrent insertion always commutes with any other
transition insertion;
• a sequential pre-insertion and a sequential post-insertion
always commute;
• two sequential pre-insertions S ≀ t and S′ ≀ t′ commute iff
t ≠ t′ or S ∩ S′ = ∅;
• two sequential post-insertions t ≀ S and t′ ≀ S′ commute
iff t ≠ t′ or S ∩ S′ = ∅.
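The four commutativity rules listed above translate directly into a small predicate. In this sketch an insertion is encoded as a tuple, which is an assumption made for the example: ("pre", t, S) for S ≀ t, ("post", t, S) for t ≀ S, and ("conc", t1, t2, n) for a concurrent insertion.

```python
def commute(a, b):
    """Decide whether two transition insertions commute, per the listed rules."""
    kind_a, kind_b = a[0], b[0]
    # A concurrent insertion commutes with any other insertion.
    if kind_a == "conc" or kind_b == "conc":
        return True
    # A sequential pre-insertion and a sequential post-insertion always commute.
    if kind_a != kind_b:
        return True
    # Two insertions of the same sequential kind: different transitions,
    # or disjoint sets of split-off places.
    (_, t_a, s_a), (_, t_b, s_b) = a, b
    return t_a != t_b or not (set(s_a) & set(s_b))

print(commute(("post", "t", {"p"}), ("post", "t", {"p", "q"})))
print(commute(("pre", "t", {"p1"}), ("pre", "t", {"p2"})))
print(commute(("conc", "t1", "t3", 1), ("pre", "t3", {"p3"})))
```

The first call reproduces the situation from the note above: t ≀ {p, q} and t ≀ {p} overlap on the same transition and therefore do not commute.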
It is important to note that an SB-preserving transition
insertion remains SB-preserving if another commuting SB-
preserving transition insertion is applied first. Hence trans-
formations whose validity has been checked can be cached,
and after some transformation has been applied, the non-
commuting transformations are removed from the cache and
the new transformations that became possible in the modified
STG are computed, checked for validity and added to the
cache. (In particular, in the proposed CSC conflict resolution
procedure, there is no need to check the validity of a particular
transformation if it was checked in a preceding iteration.)
A composite transition insertion is a transformation defined
as the composition of a set of pairwise commutative transition
insertions. Clearly, if a composite transition insertion consists
of SB-preserving transition insertions then it is SB-preserving,
i.e., one can freely combine SB-preserving transition inser-
tions, as long as they are pairwise commutative. This property
is useful for conflict resolution: typically, several transitions of
a new internal signal have to be inserted in each iteration of the
algorithm, in order to preserve the consistency of the STG. For
example, in Fig. 1(c) a composite transformation comprising
two commuting SB-preserving sequential insertions (adding
the new transitions csc+ and csc−) has been applied in order
to resolve the CSC conflict while preserving the consistency
of the STG.
IV. RESOLUTION OF CSC CONFLICTS
On each iteration of the proposed CSC conflict resolution
procedure, a consistency-preserving composite insertion Î
resolving some of the conflict cores is chosen.
Given a finite and complete prefix of the STG unfolding,
one can compute a set I of valid (i.e., SB-preserving, semi-
modularity-preserving, not delaying an input, etc.) insertions
as described in the previous section. (The number of such signal
insertions is polynomial in the size of Σ if
$\max \bigcup_{t \in T} \{|{}^{\bullet}t|, |t^{\bullet}|\}$ is bounded by a constant.) Then we
formulate a SAT problem as follows.
For each insertion I ∈ I we create a Boolean variable, also
denoted by I , indicating whether I ∈ Î . The constraints below
ensure that for any satisfying assignment of a SAT instance to
be built, the corresponding composite insertion Î (obtained
by taking the insertions whose corresponding variables are
assigned 1) is valid (i.e., that it preserves the consistency
of the STG, the chosen individual insertions commute and
introduce no auto-conflicts or self-triggering) and that some of
the conflict cores are resolved (i.e., some progress is made).
This SAT instance will be a conjunction of the constraints
described below.
MUTEX constraint
Two signal insertions, I and I′, are called mutually exclusive
if they are non-commuting, or the inserted transitions are
either concurrent or in auto-conflict, or one of them can trigger
the other.
All these conditions can be checked statically on the prefix,
and one can build an undirected graph G representing the
‘mutually exclusive’ relation on I. Then, for every edge (I, I′)
of G, the transformations I and I′ must not be used together,
which is expressed by the constraint:
$$\bigwedge_{(I, I') \text{ is an edge of } G} (\neg I \vee \neg I').$$
The size of this constraint can be quadratic in |I|. A smaller
translation can be obtained by heuristically covering the edges
of G by a minimum number of cliques (using, e.g., the heuristic
algorithm described in [23]), trying also to minimise the sizes
of individual cliques, and generating the constraint
$$\sum_{I \in Cl} I \le 1$$
for each clique Cl. A translation of this pseudo-Boolean
constraint into a Boolean formula whose size is linear in |Cl|
is possible by introducing auxiliary variables [24], [25].
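The two translations can be compared on a small example. The sketch below uses the sequential-counter encoding, a commonly used linear-size at-most-one encoding with auxiliary variables; it is not necessarily the translation of [24], [25]. Variables are DIMACS-style positive integers, and all function names are illustrative.

```python
from itertools import product


def pairwise_mutex(edges):
    """One binary clause (¬I ∨ ¬I′) per edge of the mutual-exclusion graph."""
    return [[-a, -b] for a, b in edges]


def at_most_one(xs, next_var):
    """Sequential-counter encoding of sum(xs) <= 1 with len(xs) - 1 auxiliary variables.

    xs       -- list of positive variable ids forming a clique
    next_var -- first unused variable id
    Returns (clauses, next unused variable id).
    """
    n = len(xs)
    if n <= 1:
        return [], next_var
    s = list(range(next_var, next_var + n - 1))   # s[i]: some x[j] with j <= i is true
    clauses = [[-xs[0], s[0]]]
    for i in range(1, n - 1):
        clauses += [[-xs[i], s[i]], [-s[i - 1], s[i]], [-xs[i], -s[i - 1]]]
    clauses.append([-xs[n - 1], -s[n - 2]])
    return clauses, next_var + n - 1


def satisfiable_with(clauses, fixed, aux):
    """Brute force: can the auxiliary variables be set so that all clauses hold?"""
    for bits in product([False, True], repeat=len(aux)):
        val = dict(fixed)
        val.update(zip(aux, bits))
        if all(any(val[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False
```

For a clique of size n the sequential counter produces 3n − 4 clauses (for n ≥ 2), whereas the pairwise translation produces n(n − 1)/2.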
Sign alternation constraint
The chosen SAT encoding does not carry any information
concerning the signs (‘+’ or ‘−’) of the inserted transitions.
This is motivated by the desire to reduce the number of
variables in the corresponding SAT instance by exploiting
the following symmetry: it is always possible to flip the
signs of all the transitions corresponding to a given internal
signal without affecting the correctness (consistency, semi-
modularity, etc.) of the STG. However, one still has to ensure
that a consistent assignment of signs to each signal insertion
within the composite signal insertion is possible; given such a
composite insertion, one can statically compute the assignment
using a prefix, by arbitrarily choosing the initial value (0 or 1)
of the newly inserted signal. Hence, without loss of generality,
one can assume that this value is 0 (it can be easily changed
to 1 by flipping the signs of all the transitions corresponding
to the newly inserted signal after the CSC conflict resolution
process is completed).
In part, this condition is ensured by the MUTEX constraint,
which guarantees that the instances of the newly
inserted signals are not concurrent. The purpose of the sign
alternation constraint SA is to ensure that the signs of the
instances of the newly inserted signal alternate in each
configuration of the prefix.
Given a configuration C of the prefix and a composite
insertion Î, we denote by $\mathit{Code}_{\widehat{I}}(C)$ the encoding of the newly
inserted signal at the final state of C. (Recall that we assume
that the initial value of this signal is 0, i.e.,
$\mathit{Code}_{\widehat{I}}(\emptyset) \stackrel{\mathrm{df}}{=} 0$.)
Let J0, . . . , Jk be the instances of I in the prefix, i.e., the
I-labelled events which would have been added to the prefix if
the insertion I were applied to the STG. (They can be computed
statically on the prefix [17].) We extend the usual notation
for presets and postsets to transformation instances; but note
that, depending on the type of insertion, •Ji or Ji• (or both)
may not be in the prefix (until the transformation is applied).
However, the events in ••Ji are in the prefix even before the
transformation is applied.
For a configuration C, let $\#_I C$ be the number of instances
of I which would be inserted by the transformation I into C;
it can be computed statically as follows:
$$\#_I C \stackrel{\mathrm{df}}{=}
\begin{cases}
\#_{t''} C & \text{if } I \text{ is } t' \mathrel{{}^{n}\!\!\mapsto} t'' \\
\#_{t} C & \text{if } I \text{ is } S \wr t \\
\#_{t} C - m & \text{if } I \text{ is } t \wr S,
\end{cases}$$
where m = 1 if C can be extended by some instance of I, and
m = 0 otherwise, and $\#_t C$ denotes the number of t-labelled
events in C.
Assuming that the instances of the new signal within C
can be assigned signs in a consistent way, $\mathit{Code}_{\widehat{I}}(C)$ can be
computed as follows:
$$\mathit{Code}_{\widehat{I}}(C) \iff \bigoplus_{I \,:\, \#_I C \text{ is odd}} I.$$
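As an illustration of the two definitions above, the following sketch computes the instance count of a single insertion from the label counts of a configuration, and then the value of the new signal as the parity over the chosen insertions. The data layout (dictionaries keyed by transition labels and insertion names) and the `can_extend` flag are assumptions made for the example; in the actual method these quantities are derived from the prefix.

```python
def instance_count(kind, label_counts, t, can_extend=False):
    """Number of instances of one insertion I inside a configuration C.

    kind         -- "concurrent" (pass t = t''), "pre" (S ≀ t) or "post" (t ≀ S)
    label_counts -- dict: transition label -> number of such events in C
    t            -- the transition whose events are counted
    can_extend   -- for post-insertions: True if C can be extended by an
                    instance of I (this is the m = 1 case)
    """
    n = label_counts.get(t, 0)
    if kind == "post":
        return n - (1 if can_extend else 0)
    if kind in ("concurrent", "pre"):
        return n
    raise ValueError(f"unknown insertion kind: {kind}")


def code_of_new_signal(chosen, counts):
    """XOR of the variables of all insertions with an odd instance count in C.

    chosen -- dict: insertion name -> bool (the SAT variable I)
    counts -- dict: insertion name -> instance count of I in C
    """
    value = False
    for ins, n in counts.items():
        if n % 2 == 1 and chosen.get(ins, False):
            value = not value
    return value
```

The initial value of the new signal is taken to be 0, as in the text, so the empty configuration yields `False`.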
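The sign alternation constraint needs to ensure that if
I ∈ Î then all its instances J0, . . . , Jk can be assigned
the same sign in a consistent way, i.e., that the values of
$\mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J_0]), \ldots, \mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J_k])$ are the same, where [X]
denotes the minimal (w.r.t. ⊂) configuration containing all the
events in X. This can be accomplished, for each I ∈ I, by
the following constraint:
$$I \Rightarrow \mathrm{SAME}\big(\mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J_0]), \ldots, \mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J_k])\big),$$
where
$$\mathrm{SAME}(x_0, \ldots, x_k) \stackrel{\mathrm{df}}{=} \bigwedge_{i=0}^{k} \big(x_i \Rightarrow x_{(i+1) \bmod (k+1)}\big),$$
and $\mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J_0]), \ldots, \mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J_k])$ are new auxiliary
Boolean variables.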
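The cyclic chain of implications in SAME forces all its arguments to take the same value using only one binary clause per argument. A small sketch that generates these clauses and checks the claim by brute force (DIMACS-style integer literals; the function names are illustrative):

```python
from itertools import product


def same_clauses(xs):
    """Clauses for x_i => x_((i+1) mod len(xs)), each written as (¬x_i ∨ x_next)."""
    k = len(xs)
    return [[-xs[i], xs[(i + 1) % k]] for i in range(k)]


def holds(clauses, val):
    """Evaluate a clause list under an assignment val: variable id -> bool."""
    return all(any(val[abs(lit)] == (lit > 0) for lit in c) for c in clauses)


def same_is_equality(n):
    """Check exhaustively that the clauses hold iff all n variables are equal."""
    xs = list(range(1, n + 1))
    clauses = same_clauses(xs)
    return all(
        holds(clauses, dict(zip(xs, bits))) == (len(set(bits)) == 1)
        for bits in product([False, True], repeat=n)
    )
```

If any variable in the cycle is true, following the implications makes every other variable true, so the only satisfying assignments are all-true and all-false.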
Since for a given t, all insertions of the form t ≀ · and t n|−→ ·
have the same ••J0, . . . , ••Jk, the sign alternation constraints
for a group G of such insertions can be combined as follows:
$$\Big(\bigvee_{I \in G} I\Big) \Rightarrow \mathrm{SAME}\big(\mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J_0]), \ldots, \mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J_k])\big).$$
Note that the SA constraint is defined via $\mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J])$ for
all instances J of all the insertions I ∈ I, and the definition
of $\mathit{Code}_{\widehat{I}}(C)$ assumes that the instances of the new signal
within C can be assigned signs in a consistent way, i.e., they
are not concurrent (which is ensured by MUTEX) and their
signs alternate, which has to be ensured by SA. This mutual
dependency of $\mathit{Code}_{\widehat{I}}(C)$ and SA does not cause problems,
though, due to the following inductive argument. Suppose SA
is incorrect for some configuration C of the prefix. Since
$\mathit{Code}_{\widehat{I}}(X)$ is computed correctly whenever SA is correct
on X, and due to MUTEX no two instances of the new
signal can be concurrent, SA must be incorrect already for
the configuration [••J] ⊂ C for some instance J of I ∈ I.
Since ⊂ is a well-founded order and SA is correct for the
empty configuration, we have a contradiction.
CUTOFF constraint
The sign alternation constraint ensures that the signs of
instances of the newly inserted signal will alternate in any
configuration of the prefix. However, to guarantee consistency,
one still has to add a constraint CUTOFF ensuring that this
is also the case for the configurations of the full unfolding
beyond the cut-off events of the prefix. For this, it is enough
to ensure for each cut-off event e that after Î is applied, the
value of the newly inserted signal is the same in the final states
of [e] and of the configuration corresponding to e.
One may be tempted to express this constraint as
$$\mathit{Code}_{\widehat{I}}([e]) \iff \mathit{Code}_{\widehat{I}}(C^e),$$
for each cut-off event e with a corresponding configuration $C^e$.
However, it does not take into account the following subtlety.
It can happen that some instance J of a post-insertion I ∈ Î is
such that $C^e$ can be extended by J. The definition of $\mathit{Code}_{\widehat{I}}$
does not take J into account (since J will not be in $C^e$ after the
transformation is applied), even though it may become a part
of the corresponding configuration of e after I is applied. To
capture this, a post-insertion I is called $e/C^e$-mismatching if
some instance J of I is such that $C^e$ can be extended by J and
[e] cannot be extended by J. Now such additional instances
of post-insertions can be taken into account as follows:
$$\mathit{Code}_{\widehat{I}}([e]) \iff \mathit{Code}_{\widehat{I}}(C^e) \oplus \bigoplus_{I \in M_e} I,$$
for each cut-off event e with a corresponding configuration $C^e$,
where $M_e$ is the set of $e/C^e$-mismatching post-insertions.
As an optimisation, this constraint can be represented as
$$\neg\Big(\mathit{Code}_{\widehat{I}}([e]) \oplus \mathit{Code}_{\widehat{I}}(C^e) \oplus \bigoplus_{I \in M_e} I\Big),$$
and ⊕-sums can be optimised, as described at the end of
this section. Alternatively, one can observe that if two
post-insertions are commutative and non-concurrent then no
configuration can be extended by both of them. Hence at most one
of the variables in $\bigoplus_{I \in M_e} I$ can be assigned 1, i.e., one can
replace this sub-expression by $\bigvee_{I \in M_e} I$. This can improve
the runtime of the SAT solver and shorten the formula, and the
⊕-sums can still be optimised for $\mathit{Code}_{\widehat{I}}([e]) \oplus \mathit{Code}_{\widehat{I}}(C^e)$.
CORE constraint
To ensure progress, a constraint CORE conveying that at
least one of the conflict cores is resolved is added. Let CS be
a core. A signal insertion I is called hanging w.r.t. CS if, after
it is applied, it directly precedes or succeeds CS. A composite
transition insertion Î is hanging w.r.t. CS if some I ∈ Î is
hanging w.r.t. CS.
One can observe that if Î is hanging w.r.t. CS then CS is
not resolved by Î . In the transformed prefix, this core will
resurface as a core CS′, as one can always ensure that the
encodings at the beginning and at the end of CS′ coincide by
adding, if needed, a hanging instance of I ∈ Î to the core.
CS is resolved by a composite signal insertion Î if an odd
number of signal instances is inserted into it, and none of the
inserted signal instances is hanging w.r.t. CS. By introducing
new auxiliary variables $\mathit{Hanging}_{CS}$ and $\mathit{Resolved}_{CS}$ for each
core CS, CORE can be expressed as follows:
$$\Big(\bigvee_{CS} \mathit{Resolved}_{CS}\Big)
\wedge \bigwedge_{CS} \Big(\mathit{Hanging}_{CS} \iff \bigvee_{I \in H_{CS}} I\Big)
\wedge \bigwedge_{CS} \Big(\mathit{Resolved}_{CS} \iff \Big(\neg \mathit{Hanging}_{CS} \wedge \bigoplus_{I \notin H_{CS} \,\wedge\, \#_I CS \text{ is odd}} I\Big)\Big),$$
where $H_{CS}$ is the set of transition insertions hanging w.r.t. CS.
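To make the CORE constraint concrete, the sketch below evaluates the Hanging and Resolved values for a given choice of insertions, rather than building the SAT formula. The dictionaries `hanging` and `count` are assumed inputs (in the method they are derived from the prefix), and all names are illustrative.

```python
def core_status(chosen, hanging, count):
    """Evaluate (Hanging, Resolved) for every core under a chosen composite insertion.

    chosen  -- set of names of the insertions selected for the composite insertion
    hanging -- dict: core -> set of insertions hanging w.r.t. that core
    count   -- dict: core -> dict: insertion -> number of its instances in the core
    """
    status = {}
    for cs in hanging:
        is_hanging = any(i in chosen for i in hanging[cs])
        parity = False
        for ins, n in count[cs].items():
            # Only non-hanging insertions with an odd instance count contribute.
            if ins not in hanging[cs] and n % 2 == 1 and ins in chosen:
                parity = not parity
        status[cs] = (is_hanging, (not is_hanging) and parity)
    return status


def core_constraint(status):
    """CORE holds when at least one core is resolved."""
    return any(resolved for _, resolved in status.values())
```

Choosing two insertions that each put one instance into the same core leaves that core unresolved, since the total number of inserted instances is even.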
Computation of ⊕-sums
One can notice that the constructed formulae contain many
⊕-sums over the same set of variables I. There is typically
a lot of sharing between them, and so these sums can be
optimised by computing common sub-sums only once.
The problem can be abstractly formulated as follows.
Given m ⊕-sums over the variables x1, . . . , xn, build a small
acyclic Boolean circuit with n inputs and m outputs computing
these ⊕-sums. (Such a circuit can then be converted into a
Boolean formula in the conjunctive normal form, whose size
is linear in the size of the circuit.)
This problem can be solved in a number of ways. The
method described in [24, Chapter 4.7], [26] divides the
variables into n / log n groups of log n variables each, computes
all the possible sums in each group, and forms the circuit
from these sums. For this, at most $\frac{n^2 + mn}{\log n} - m$ binary ⊕-gates
are needed. In the actual implementation, a method based on
preset trees [11, Chapter 4] was used. Experiments show that
it works quite well in practice.
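The grouping method can be sketched as follows. This illustrates the group-based scheme only, not the preset-tree method used in the actual implementation; the function name, the representation of sums as index sets and the rounding of the group size to ⌊log2 n⌋ are assumptions. The sketch evaluates the circuit on concrete input values and counts the binary ⊕-gates it would contain.

```python
import math
from functools import reduce


def xor_sums(values, sums):
    """Evaluate several XOR-sums over shared variables using per-group tables.

    values -- list of 0/1 input values x_0 .. x_(n-1)
    sums   -- list of sets of variable indices, one set per requested sum
    Returns (list of output values, number of binary XOR gates used).
    """
    n = len(values)
    g = max(1, int(math.log2(n))) if n > 1 else 1   # group size, about log n
    gates = 0
    tables = []
    for start in range(0, n, g):
        idx = list(range(start, min(start + g, n)))
        table = [0] * (1 << len(idx))               # table[mask] = XOR of the selected inputs
        for mask in range(1, 1 << len(idx)):
            low = mask & -mask                      # lowest set bit of the mask
            rest = mask ^ low
            bit = values[idx[low.bit_length() - 1]]
            if rest:
                table[mask] = table[rest] ^ bit     # one gate on top of a smaller sum
                gates += 1
            else:
                table[mask] = bit                   # a single input needs no gate
        tables.append((idx, table))
    outputs = []
    for s in sums:
        parts = []
        for idx, table in tables:
            mask = sum(1 << j for j, v in enumerate(idx) if v in s)
            if mask:
                parts.append(table[mask])
        gates += max(0, len(parts) - 1)             # combine the group sums
        outputs.append(reduce(lambda a, b: a ^ b, parts, 0))
    return outputs, gates
```

Each group with g variables contributes fewer than 2^g gates, and each output needs at most one gate per group, which is where the bound quoted above comes from.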
Another optimisation is to use xi ∨ xj instead of xi ⊕ xj
for variables which are known to be mutually exclusive
(e.g., those corresponding to concurrent or non-commutative
transformations).
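The replacement of ⊕ by ∨ is sound because the two operators agree whenever at most one of their arguments is true. A brute-force check of this fact (function names are illustrative):

```python
from itertools import product


def xor_all(bits):
    """Parity of a sequence of Booleans."""
    result = False
    for b in bits:
        result ^= b
    return result


def agree_under_mutex(n):
    """Check that XOR and OR coincide on all n-bit inputs with at most one true bit."""
    return all(
        xor_all(bits) == any(bits)
        for bits in product([False, True], repeat=n)
        if sum(bits) <= 1
    )
```

Without mutual exclusion the two differ, e.g. on two true inputs the parity is false while the disjunction is true.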
V. COST FUNCTION
On each iteration of the method, a heuristic cost function
is used to guide the search towards ‘good’ solutions with
small area and/or performance overhead. The constructed
SAT instance is solved several times, with constraints on the
value of the cost function added to the formula, so that a
solution minimising the value of the cost function is eventually
computed. (The process resembles a binary search on the value
of the cost function.) The cost function we used is a weighted
sum of the following components:
• the estimated number of unresolved CSC cores;
• the estimated number of unresolved Unique State Coding
(USC) cores, i.e., cores corresponding to different
states which have the same encoding (though USC cores
which are not CSC cores are not harmful, they can
become CSC cores once new signals are added to the
STG);
• the estimated delay introduced by the insertion;
• the total number of syntactic triggers of all output and
internal signals;
• the number of inserted transitions of a signal;
• the number of input signals which are not ‘locked’ (see
below) with the newly inserted signal;
• the number of output and internal signals which are not
‘locked’ with the newly inserted signal.
The user can choose the relative weights of the components of
the cost function, e.g., to guide the resolution process towards
solutions with the desired area/latency trade-off.
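The search for a cost-minimal solution can be sketched as a bisection over the cost bound. The oracle `solve_with_bound` stands in for a SAT call with the extra constraint "cost ≤ bound" and is hypothetical, as are the other names; costs are assumed to be non-negative integers.

```python
def weighted_cost(components, weights):
    """Weighted sum of cost components (both given as dicts with the same keys)."""
    return sum(weights[name] * value for name, value in components.items())


def minimise_cost(solve_with_bound, cost, upper):
    """Find a solution of minimal cost by bisection on the cost bound.

    solve_with_bound -- oracle: bound -> a solution with cost <= bound, or None
    cost             -- function: solution -> non-negative integer cost
    upper            -- initial bound, large enough to admit any solution
    """
    best = solve_with_bound(upper)
    if best is None:
        return None                      # the instance has no solution at all
    lo, hi = 0, cost(best)               # no solution costs less than lo; best costs hi
    while lo < hi:
        mid = (lo + hi) // 2
        candidate = solve_with_bound(mid)
        if candidate is None:
            lo = mid + 1
        else:
            best, hi = candidate, cost(candidate)
    return best
```

Using the cost of the solution actually returned, rather than the bound itself, can shrink the interval faster than a plain binary search.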
Estimating the number of unresolved cores
The variables $\mathit{Resolved}_{CS}$ occurring in the CORE constraint
give a possibility to estimate the number of unresolved
cores as follows:
$$\sum_{CS} \neg \mathit{Resolved}_{CS}.$$
This estimate can be improved by taking into account the
following:
• the union of ‘adjacent’ resolved cores will re-appear as a
core on the next iteration of the method;
• if an insertion hanging w.r.t. a USC core is applied then
this core is likely to be ‘promoted’ to a CSC core on the
next iteration of the method.
Estimating the delay
We consider a delay model where each transition of the
STG is assigned an individual delay; e.g., input signals usually
take longer to fire than non-input ones, because they often
denote the end of a certain computation in the environment.
(This delay model is similar to that in [3].) It is quite crude,
but it is hard to significantly improve it, since the exact time
behaviour is only known after the circuit and its environment
are synthesised. For example, one can assume that the delay
of input signal transitions is 3 time units and the delay of
output and internal signal transitions is 1 time unit (these can
easily be adjusted by the designer). Then the delay estimate
is simply the total number of signal insertions I ∈ Î which
increase the latency of some output.
Estimating the logic complexity
The logic complexity can be (roughly) estimated using
the number of syntactic triggers of each output or internal
signal z (i.e., signals whose transitions occur in ••t for some
z-labelled transition t). The set of syntactic triggers of z is
an approximation of the set of real triggers of z (i.e., signals
whose firing can enable z); note that all the real triggers of z
are always in the support of the complex gate implementing z.
In addition to triggers, other signals, called context signals, can
also be in the support of z, so this estimate of logic complexity
is quite crude.
For all output or internal signals y and for all signals x we
create the auxiliary variables $\mathit{Trig}_{Iy}$, $\mathit{Trig}_{xI}$ and $\mathit{Trig}_{xy}$, and
add the following constraints to the formula:
$$\mathit{Trig}_{Iy} \iff \bigvee_{I \in \mathcal{I}^{y}} I,$$
$$\mathit{Trig}_{xI} \iff \bigvee_{I \in \mathcal{I}_{x}} I,$$
$$\mathit{Trig}_{xy} \iff \bigvee_{t_x \in T_x,\ t_y \in T_y,\ t_x \in {}^{\bullet\bullet}t_y} \ \bigwedge_{I \in \mathcal{I}_{t_x|t_y}} \neg I,$$
where $\mathcal{I}^{y}$ and $\mathcal{I}_{x}$ are the sets of transition insertions which
can ‘trigger’ an instance of y (respectively, be triggered by
an instance of x), $T_z$ is a set of transitions corresponding to
a signal z, and $\mathcal{I}_{t'|t''}$ is the set of transition insertions which
‘separate’ t′′ from t′ ∈ ••t′′. Now, the total number of triggers
of all output and internal signals can be computed as
$$\sum_{y} \mathit{Trig}_{Iy} + \sum_{x \text{ triggers } y} \mathit{Trig}_{xy} + \sum_{x} \mathit{Trig}_{xI}.$$
Computing the number of inserted transitions
Sometimes a SAT solver tries to resolve as many cores
as possible by a single signal, inserting a large number of
its transitions into the STG. While ensuring fast progress,
this strategy may lead to very large logic implementing the
inserted signal. Hence it makes sense to make
$|\widehat{I}| = \sum_{I \in \mathcal{I}} I$ (where $\mathcal{I}$ is the set of valid insertions)
a part of the cost function.
Computing the lock relation
Two signals are in the ‘lock’ relation [27] if their instances
(i) cannot be concurrent, and (ii) alternate in every execution
sequence. ‘Locking’ the newly inserted signal with as many
other signals as possible is a good heuristic for logic
simplification [22].
For each signal z we introduce a new auxiliary variable
$\mathit{locked}_z$ tracing whether the newly inserted signal is ‘locked’
with z; hence the number of signals ‘locked’ with the newly
inserted one can easily be computed once these variables are
correctly assigned. Below we develop constraints ensuring
this.
Suppose p is a new (virtual) place such that each z±-labelled
transition puts a token in it, and the transitions labelled by the
newly inserted signal consume a token from it; moreover, let
the initial marking of p, denoted M0(p), be 0 if some instance
of z± causally precedes any instance of I ∈ Î and 1 otherwise.
One can see that for each configuration C, the number of
tokens in p in its final state, denoted $\mathit{Tokens}_p(C)$, can be
computed as
$$\mathit{Tokens}_p(C) = M_0(p) + \#_z C - \#_I C.$$
Moreover, for each cut-off event e with a corresponding
configuration $C^e$, $\mathit{Tokens}_p([e]) = \mathit{Tokens}_p(C^e)$, since
$\#_z[e] = \#_z C^e$ is guaranteed by the unfolder and $\#_I[e] = \#_I C^e$ is
guaranteed by the CUTOFF constraint. Hence, to show that
z is ‘locked’ with the newly inserted signal it only remains to
check that for each instance e of z or I, $\mathit{Tokens}_p([e]) \in \{0, 1\}$,
i.e., for each instance e of z, $\mathit{Tokens}_p([e]) = 1$
and for each instance J of I, $\mathit{Tokens}_p([{}^{\bullet\bullet}J]) = 1$. The
computation can be performed mod 2, since the smallest ‘bad’
configuration will not satisfy the property mod 2. Hence,
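$$\mathit{locked}_z \iff
\bigwedge_{e \in E_z} \Big( \big((M_0(p) + \#_z[e]) \bmod 2\big) \oplus \mathit{Code}_{\widehat{I}}([e]) \Big)
\wedge \bigwedge_{I} \Big( I \Rightarrow \bigwedge_{J \in \mathcal{J}_I} \Big( \big((M_0(p) + \#_z[{}^{\bullet\bullet}J]) \bmod 2\big) \oplus \mathit{Code}_{\widehat{I}}([{}^{\bullet\bullet}J]) \Big) \Big),$$
where $E_z$ is the set of events of the prefix corresponding to
the transitions of a signal z, and $\mathcal{J}_I$ is the set of instances
of I.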
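The token-counting idea behind the constraint can be illustrated on a single firing sequence. This is only a simulation of the virtual place p along one trace, not the prefix-based SAT encoding above; the function name and the representation of a trace as a list of signal names (one entry per transition of that signal, regardless of sign) are assumptions.

```python
def locked_on_trace(trace, z, new):
    """Check on one firing sequence that signals z and new strictly alternate.

    A virtual place receives a token from every transition of z and loses one
    on every transition of the new signal; its initial marking is 0 if z fires
    first and 1 otherwise. The signals alternate on this trace iff every
    transition of either signal sees exactly one token at the moment checked below.
    """
    relevant = [s for s in trace if s in (z, new)]
    if not relevant:
        return True
    tokens = 0 if relevant[0] == z else 1
    for s in relevant:
        if s == z:
            tokens += 1
            if tokens != 1:          # two transitions of z in a row
                return False
        else:
            if tokens != 1:          # two transitions of the new signal in a row
                return False
            tokens -= 1
    return True
```

Concurrency between the two signals is not visible on a single trace, so this check covers only condition (ii) of the lock relation.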
VI. CASE STUDIES AND EXPERIMENTAL RESULTS
The CSC conflict resolution method described in this paper
has been implemented in the MPSAT tool. In this section we
present a number of case studies illustrating some interesting
features of the proposed approach, as well as the results of
running it on a number of benchmarks. To solve the arising
SAT instances, the MINISAT 2 solver¹ has been used. All
the experiments were conducted on a PC with a Pentium™
IV/3.4GHz processor and 2 GB of RAM.
VME bus controller.
The specification of the read cycle of the VME bus controller
is shown in Fig. 1. Although it is a very small benchmark
containing a single conflict core, MPSAT was able to find
17 possible ways to resolve it, listed in Table I. This shows
that the proposed method explores a fairly large design space,
including solution 8 with two set and two reset transitions.
¹ Available from http://www.cs.chalmers.se/Cs/Research/FormalMethods/MiniSat/Main.html
1 ≀lds+, ≀dtack+
2 ≀d−, ≀lds+
3 ≀lds+, dtack+ |−→d−
4 ldtack−≀, ≀dtack+
5 ≀lds+, ≀dtack−
6 ≀d−, ldtack−≀
7 ldtack−≀, dtack+ |−→d−
8 ≀d−, ≀lds+, ≀dtack+, ≀dtack−
9 d+ |−→d−, ≀lds+
10 d+ |−→d−, ldtack−≀
11 lds− |−→lds+, ≀dtack+
12 ≀d−, lds− |−→lds+
13 ≀lds+, dsr− |−→dtack−
14 lds− |−→lds+, dtack+ |−→d−
15 ≀lds+, dtack+ |−→dtack−
16 d+ |−→d−, lds− |−→lds+
17 d+ |−→dtack−, ≀lds+
TABLE I
THE COMPOSITE TRANSITION INSERTIONS RESOLVING THE CSC
CONFLICT SHOWN IN FIG. 1.
Many of these solutions cannot be computed by the method
of [7], as the class of transformations it uses is limited to
transition splitting.
An ‘unresolvable’ conflict.
The STG in Fig. 2(a) was presented in [7]. PETRIFY can
resolve all the CSC conflicts in it by restructuring the net and
inserting two signals, and it was claimed that it is impossible to
resolve CSC conflicts without such a restructuring. However,
MPSAT has found a solution with three signals requiring no
restructuring, shown in Fig. 2(b). (When this was reported to
the authors of [7], they amended their tool and it was able to
resolve the conflicts by inserting four signals.)
[Figure 2: panel (a) shows the original STG, panel (b) the STG with the inserted signals; diagram not reproduced.]
inputs: a, b; outputs: x, y; internal: csc1, csc2, csc3
Fig. 2. An STG from [7] (a) and a way to resolve the CSC conflicts in it
by inserting three signals without restructuring (b).
An 8-way sequencer.
Sequencers are among the standard ‘building blocks’ of
circuits produced from hardware description languages like
BALSA and TANGRAM. The ‘parent’ handshake at port a
initiates eight sequentially ordered ‘child’ handshakes at ports
b, . . . , i. Then the parent handshake completes, and the cycle
continues. (The completion of the last ‘child’ handshake is
reshuffled with the completion of the ‘parent’ handshake for an
early acknowledgement at port a.) Fig. 3 shows the unfolding
prefix of the STG specifying an 8-way sequencer with seven
conflict cores.
Intuitively, at least three bits of additional memory are
needed to implement this specification (by counting how many
of the eight ‘child’ handshakes have been executed so far), so
the CSC conflicts cannot be resolved by insertion of fewer
than three signals. However, it is not trivial to find a solution
with three insertions; in fact, PETRIFY’s solution contains
four new signals. MPSAT was able to find a fully concurrent
solution with three signals shown in Fig. 3 by dotted lines.
Note that to accomplish this the signal csc1 is set and reset
twice in each cycle.
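The intuitive lower bound can be written out as a one-line calculation, under the simplifying assumption that the position among the eight ‘child’ handshakes has to be distinguished by the values of the k newly inserted binary signals alone:

```latex
2^{k} \ge 8 \quad\Longleftrightarrow\quad k \ge \log_2 8 = 3 .
```

This restates the counting argument from the text; it is an informal bound rather than a proof.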
Finding a solution with three signals is only possible by
analysing multiple cores; the method of [7] cannot find such
a solution because it analyses just a single violation trace on
each iteration — in fact, it needs six signals to resolve the
CSC conflicts in this case study.
Assorted small benchmarks
Table II compares the three methods for resolving CSC
conflicts: the state-space based approach implemented in PET-
RIFY, the ILP approach of [7] and the one proposed in this
paper on a number of assorted small benchmarks from [7].
The meaning of the columns in the table is as follows (from
left to right): the name of the problem; the number of places,
transitions, and input and output signals in the original STG;
the number of signals inserted by PETRIFY, the ILP approach
of [7] and the approach proposed in this paper; and the number
of literals in the final complex-gate implementations produced
by the three approaches (the smallest numbers are highlighted).
The numbers in the ‘Pfy’ and ‘ILP’ columns are as reported
in [7], and, for consistency with [7], PETRIFY was used to
synthesise the STGs after the CSC conflicts were resolved.
It should be noted that different sets of weights in the
cost function were used to produce the numbers in the two
‘SAT’ columns: in the former the cost function was aimed
at minimising the number of inserted signals, whereas in the
latter it was aimed at minimising the number of literals in the
final implementation.
[Figure 3: unfolding prefix diagram not reproduced.]
inputs: a0, b1, c1, d1, e1, f1, g1, h1, i1
outputs: a1, b0, c0, d0, e0, f0, g0, h0, i0
internal: csc1, csc2, csc3
Fig. 3. The unfolding prefix of an STG modelling an 8-way sequencer, showing 7 cores and a fully concurrent solution with 3 new signals.
                  STG                  Signals          Literals
Example           |P|/|T|  In/Out    Pfy  ILP  SAT    Pfy  ILP  SAT
ADFAST            15/12    3/3         2    2    2     14   17   21
IRCV-BM           55/46    5/4         2    4    1     38   46   28
MMU               20/16    4/4         3    3    3     29   27   27
MMU0              20/16    4/4         3    5    3     29   33   27
MMU1              24/16    4/4         3    2    2     32   25   25
MR0               31/22    5/6         3    4    3     45   34   29
MR1               25/18    4/5         4    4    3     35   29   27
NAK-PA            22/18    4/5         1    1    1     18   18   18
NOWICK            19/14    3/2         1    1    1     14   13   14
PAR(4)            23/20    5/5         4    4    4     32   32   32
SEQ(8)            36/36    9/9         4    6    3     47   43   44
TSEND-BM          45/39    5/4         2    3    1     39   40   27
ALLOC-OUTBOUND    17/18    4/3         2    2    2     16   16   15
DUPLICATOR        14/12    2/2         2    3    2     19   16   13
MOD4 COUNTER      16/16    1/2         2    4    2     25   28   25
RAM-READ-SBUF     26/20    5/5         1    1    1     18   19   19
SBUF-RAM-WRITE    29/20    5/5         2    2    2     22   29   29
SBUF-READ-CTL     14/12    2/4         1    1    1     15   15   15
MASTER 1882       38/26    6/7         1    1    1     38   38   39
TRCV-BM           53/44    5/4         2    4    2     36   41   34
SEQ MIX           20/20    4/4         3    2    2     20   20   20
SPEC SEQ(4)       20/20    5/5         3    3    2     20   20   20
Total                                 51   62   44    601  599  548
TABLE II
EXPERIMENTAL RESULTS: ASSORTED SMALL STGS.
One can see that in all cases the number of signals inserted by
MPSAT was smaller than or the same as with
the other methods, and it also produced on average 8.5–8.8%
smaller implementations. This may not seem particularly
spectacular, but such an improvement over PETRIFY on small
benchmarks is noteworthy.
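The quoted range can be reproduced from the ‘Total’ row of Table II, assuming the percentages are computed from the total literal counts (the dictionary layout and function name below are illustrative):

```python
# Total literal counts from the last row of Table II.
total_literals = {"Pfy": 601, "ILP": 599, "SAT": 548}


def reduction_percent(baseline, new):
    """Relative reduction of `new` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


vs_petrify = reduction_percent(total_literals["Pfy"], total_literals["SAT"])
vs_ilp = reduction_percent(total_literals["ILP"], total_literals["SAT"])
```

Rounded to one decimal place, the reduction is 8.8% relative to PETRIFY and 8.5% relative to the ILP approach.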
Scalable benchmarks
We also compared the described method with PETRIFY (the
ILP tool of [7] was not yet available from the authors) on
two groups of scalable benchmarks modelling m pipelines
weakly synchronised without arbitration (PPWK(m, n)) and
with arbitration (PPARB(m, n)). They are the benchmarks
from the corresponding series used in [14], with the latter
series modified by ‘factoring out’ the arbiter into the
environment to ensure semi-modularity. Note that in these two series
of benchmarks all the signals except the arbiter’s grants in
PPARB(m, n) are considered outputs, i.e., the control logic
is designed as a closed circuit. The inputs are inserted after
the synthesis is completed, by breaking up some outputs
and inserting the environment into the breaks, thus forming
handshakes (sometimes with an inverter attached to the output
if the environment acts as an active port). Fig. 4 illustrates
these two types of STGs, and the results for these two groups
are summarised in Table III.
              STG                 Prefix      Signals    Literals    Time, [s]
Example       |P|/|T|   In/Out    |B|/|E|     Pfy  SAT   Pfy  SAT    Pfy    SATs  SATl
Marked Graphs
PPWK(2,3)     23/14     0/7       41/24         1    1    35   34     <1     <1    <1
PPWK(2,6)     47/26     0/13      119/63        1    1    71   70      5     <1    <1
PPWK(2,9)     71/38     0/19      233/120       1    1   107  106     34     <1     6
PPWK(2,12)    95/50     0/25      383/195       1    1   142  142    368     <1    18
PPWK(3,3)     34/20     0/10      63/36         2    2    59   54      4     <1    <1
PPWK(3,6)     70/38     0/19      183/96        2    2   112  108    105     <1     6
PPWK(3,9)     106/56    0/28      357/183       2    2   163  162   1838      4    55
PPWK(3,12)    142/74    0/37      585/297       -    2     -  216    mem      5   175
STGs with Arbitration
PPARB(2,3)    48/32     2/13      110/66        2    2    81   81     35     <1     2
PPARB(2,6)    72/44     2/19      218/120       2    3   117  116    118      1    17
PPARB(2,9)    96/56     2/25      362/192       2    2   153  152   1041      2    50
PPARB(2,12)   120/68    2/31      542/282       -    3     -  188    mem      8   159
PPARB(3,3)    71/48     3/19      188/114       3    3   136  131    620     <1    14
PPARB(3,6)    107/66    3/28      368/204       3    3   190  184   5043      2   117
PPARB(3,9)    143/84    3/37      602/321       3    4   244  238  12307      7   354
PPARB(3,12)   179/102   3/46      890/465       -    5     -  292    mem     24   839
TABLE III
EXPERIMENTAL RESULTS: SCALABLE BENCHMARKS.
[Figure 4: STG diagrams not reproduced.]
(a) outputs: x1, . . . , x4, y1, . . . , y4, z
(b) inputs: gx, gy; outputs: x1, . . . , x5, y1, . . . , y5, z, rx, ry
Fig. 4. STGs modelling two weakly synchronised pipelines without arbitration (a) and with arbitration (b).
The meaning of the columns in Table III is the same as
in Table II, except that the sizes of the corresponding finite
and complete prefixes (in terms of the numbers of conditions
and events) are given in the fourth column and the runtimes (in
seconds) are now reported for each of the methods in the last
three columns (for MPSAT, the runtimes for signal and literal
optimisation are reported separately). We use ‘mem’ if there
was a memory overflow. It should also be noted that since
PETRIFY was not able to synthesise some of the resulting
STGs, they were synthesised with the unfolding-based tool
described in [15], which currently does not support
multi-level Boolean minimisation and outputs the equations in the
minimised disjunctive normal form (DNF).
One can see that on these benchmarks PETRIFY and MPSAT
were very close in terms of the number of inserted signals
and the number of literals. However, in terms of runtime and
memory consumption MPSAT was clearly superior: in some
cases the runtime differed by orders of magnitude, and the
cases which were intractable for PETRIFY due to memory
overflow were solved by MPSAT relatively easily.
It should be noted that, depending on whether signals or
literals are minimised, MPSAT’s runtimes can differ significantly
on the same benchmark. This can be explained by the fact that in
the former case many of the parameters of the cost function
(viz. the estimated delay, the total number of syntactic triggers
of all output and internal signals, the number of inserted
transitions, the numbers of inputs and outputs which are not
‘locked’ with the newly inserted signal) are not taken into
account (resulting in a considerable shortening of the SAT
instance), whereas in the latter case only the estimated delay
is not taken into account.
VII. CONCLUSIONS AND FUTURE WORK
This paper proposes a new method for resolution of CSC
conflicts based on STG unfoldings. The problem is re-
formulated in terms of Boolean satisfiability, and a tunable
heuristic cost function is used to guide the design space
exploration towards good solutions.
The presented case studies demonstrate that the proposed
approach explores a large design space and is able to find
interesting solutions which could not be found by other
methods; moreover, the experimental results show it is quite
fast and results in good solutions.
As mentioned in the introduction, the proposed
approach is intended for use in conjunction with the
decomposition method of [9], [18], [19]. This work is almost finished
now, and the preliminary results are very encouraging [28].
In future work, we intend to extend the method to other
transformations, in particular concurrency reduction [29], [30].
Moreover, there is still scope to improve the cost function,
e.g., using the ideas described in [27].
Acknowledgements: The author would like to thank Josep
Carmona and Jordi Cortadella for helpful discussions and
benchmarks, and Mark Schaefer for his feedback concerning
the developed MPSAT tool. This research was supported
by the Royal Academy of Engineering/EPSRC post-doctoral
research fellowship EP/C53400X/1 (DAVAC).
REFERENCES
[1] “International Technology Roadmap for Semiconductors: Design,” 2005,
URL: www.itrs.net/Links/2005ITRS/Design2005.pdf.
[2] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Ya-
kovlev, “PETRIFY: a Tool for Manipulating Concurrent Specifications
and Synthesis of Asynchronous Controllers,” IEICE Transactions on
Information and Systems, vol. E80-D, no. 3, pp. 315–325, 1997.
[3] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Ya-
kovlev, Logic Synthesis of Asynchronous Controllers and Interfaces,
Springer-Verlag, 2002.
[4] T.-A. Chu, Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic
Specifications, Ph.D. thesis, Lab. for Comp. Sci., MIT, 1987.
[5] D. Edwards and A. Bardsley, “BALSA: an Asynchronous Hardware
Synthesis Language,” The Computer Journal, vol. 45, no. 1, pp. 12–18,
2002.
[6] K. v. Berkel, “Handshake Circuits: an Asynchronous Architecture for
VLSI Programming,” International Series on Parallel Computation, vol.
5, 1993.
[7] J. Carmona and J. Cortadella, “State Encoding of Large Asynchronous
Controllers,” in Proc. DAC’06. 2006, pp. 939–944, IEEE Comp. Soc.
Press.
[8] J. Carmona and J. Cortadella, “ILP Models for the Synthesis of
Asynchronous Control Circuits,” in Proc. DAC’03. 2003, pp. 818–826,
IEEE Comp. Soc. Press.
[9] M. Schaefer, “CSC-Aware STG-Decomposition,” in Proc. 18th UK
Asynchronous Forum. 2006, Newcastle University.
[10] J. Esparza, S. Römer, and W. Vogler, “An Improvement of McMillan’s
Unfolding Algorithm,” FMSD, vol. 20, no. 3, pp. 285–310, 2002.
[11] V. Khomenko, Model Checking Based on Prefixes of Petri Net Unfold-
ings, Ph.D. thesis, School of Computing Science, Newcastle University,
2003.
[12] K.L. McMillan, “Using Unfoldings to Avoid State Explosion Problem
in the Verification of Asynchronous Circuits,” in Proc. CAV’92. 1992,
LNCS 663, pp. 164–174, Springer-Verlag.
[13] A. Semenov, Verification and Synthesis of Asynchronous Control
Circuits Using Petri Net Unfolding, Ph.D. thesis, School of Computing
Science, Newcastle University, 1997.
[14] V. Khomenko, M. Koutny, and A. Yakovlev, “Detecting State Coding
Conflicts in STG Unfoldings Using SAT,” Fund. Inf., vol. 62, no. 2, pp.
1–21, 2004.
[15] V. Khomenko, M. Koutny, and A. Yakovlev, “Logic Synthesis for
Asynchronous Circuits Based on Petri Net Unfoldings and Incremental
SAT,” Fund. Inf., vol. 70, no. 1–2, pp. 49–73, 2006.
[16] A. Madalinski, A. Bystrov, V. Khomenko, and A. Yakovlev,
“Visualization and Resolution of Coding Conflicts in Asynchronous Circuit
Design,” IEE Proceedings: Computers & Digital Techniques, vol. 150,
no. 5, pp. 285–293, 2003.
[17] V. Khomenko, “Behaviour-Preserving Transition Insertions in Unfolding
Prefixes,” Tech. Rep. CS-TR-952, School of Computing Science,
Newcastle University, 2006.
[18] W. Vogler and B. Kangsah, “Improved Decomposition of Signal
Transition Graphs,” Tech. Rep. 2004-08, Institut für Informatik, Universität
Augsburg, 2004.
[19] W. Vogler and R. Wollowski, “Decomposition in Asynchronous Circuit
Design,” Tech. Rep. 2002-05, Institut für Informatik, Universität
Augsburg, 2002.
[20] J. Esparza and P. Jančar, “On the Complexity of Consistency and
Complete State Coding for Signal Transition Graphs,” in Proc. ACSD’06.
2006, pp. 47–56, IEEE Comp. Soc. Press.
[21] T. Murata, “Petri Nets: Properties, Analysis and Applications,” Proc. of
the IEEE, vol. 77, no. 4, pp. 541–580, 1989.
[22] J. Carmona and J. Cortadella, “Private Communication,” 2006.
[23] J. Gramm, J. Guo, F. Hüffner, and R. Niedermeier, “Data Reduction,
Exact, and Heuristic Algorithms for Clique Cover,” in Proc. ALENEX’06.
2006, pp. 86–94, SIAM.
[24] I. Wegener, The Complexity of Boolean Functions, Wiley-Teubner Series
in Computer Science, 1987.
[25] N. Eén and N. Sörensson, “Translating Pseudo-Boolean Constraints into
SAT,” Journal on Satisfiability, Boolean Modeling and Computation,
vol. 2, pp. 1–25, 2006.
[26] J.E. Savage, “An Algorithm for the Computation of Linear Forms,”
SIAM Journal on Computing, vol. 3, pp. 150–158, 1974.
[27] P. Vanbekbergen, F. Catthoor, G. Goossens, and H. De Man, “Optimized
Synthesis of Asynchronous Control Circuits from Graph-Theoretic
Specifications,” in Proc. ICCAD’90. 1990, pp. 184–187, IEEE Comp. Soc.
Press.
[28] V. Khomenko and M. Schaefer, “Combining Decomposition and
Unfolding for STG Synthesis,” in preparation, 2007.
[29] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and
A. Yakovlev, “Automatic Handshake Expansion and Reshuffling Using
Concurrency Reduction,” in Proc. HWPN’98, 1998, pp. 86–110.
[30] V. Khomenko, A. Madalinski, and A. Yakovlev, “Resolution of Encoding
Conflicts by Signal Insertion and Concurrency Reduction Based on STG
Unfoldings,” Fund. Inf., 2007, submitted paper.
