Resolution of Encoding Conflicts by Signal Insertion and Concurrency Reduction Based on STG Unfoldings by Khomenko V et al.
School of Computing Science,
University of Newcastle upon Tyne
V. Khomenko, A. Madalinski and A. Yakovlev
Technical Report Series
CS-TR-858
September 2004
Copyright © 2004 University of Newcastle upon Tyne
Published by the University of Newcastle upon Tyne,
School of Computing Science, Claremont Tower, Claremont Road,
Newcastle upon Tyne, NE1 7RU, UK.
V. Khomenko (1), A. Madalinski (2) and A. Yakovlev (2)
(1) School of Computing Science
(2) School of Electrical, Electronic and Computer Engineering
University of Newcastle upon Tyne, NE1 7RU, UK
Abstract
A combined framework for the resolution of encoding
conflicts in STG unfoldings is presented, which extends pre-
vious work by incorporating concurrency reduction in addi-
tion to signal insertion. Furthermore, a novel validity con-
dition is proposed to justify these transformations.
1 Introduction
Signal Transition Graphs, or STGs [2,4], are widely used
for specifying the behaviour of asynchronous control cir-
cuits. They are interpreted Petri nets in which transitions
are labelled with the rising and falling edges of circuit sig-
nals. Synthesis based on STGs involves the following steps:
(a) checking sufficient conditions for the implementability
of the STG by a logic circuit; (b) modifying, if necessary,
the initial STG to make it implementable; and (c) finding
appropriate Boolean next-state functions for non-input sig-
nals.
A commonly used tool, PETRIFY [3, 4], performs all
these steps automatically, after first constructing the reach-
ability graph of the initial STG specification. To gain ef-
ficiency, it uses symbolic (BDD-based) techniques to rep-
resent the STG’s reachable state space. While such an ap-
proach is convenient for completely automatic synthesis, it
has several drawbacks: state graphs represented explicitly
or in the form of BDDs are hard to visualise due to their
large sizes and the tendency to obscure causal relationships
and concurrency between the events, which prevents effi-
cient interaction with the user. Moreover, the combinato-
rial explosion of the state space is a serious issue for highly
concurrent STGs, putting practical bounds on the size of
control circuits that can be synthesised. Thus PETRIFY can
fail to synthesise a circuit, especially if the STG models are
not constructed by a human designer but rather generated
automatically from high-level hardware descriptions.
Where PETRIFY fails, other tools based on alternative
techniques, and in particular those employing Petri net un-
foldings, may succeed. A finite and complete unfolding
prefix of an STG Γ is a finite acyclic net which implicitly
represents all the reachable states of Γ together with transi-
tions enabled at those states. Intuitively, it can be obtained
through unfolding Γ, by successive firings of transitions, under
the following assumptions: (a) for each new firing a
fresh transition (called an event) is generated; (b) for each
newly produced token a fresh place (called a condition) is
generated. The unfolding is infinite whenever Γ has an in-
finite run; however, if Γ has finitely many reachable states
then the unfolding eventually starts to repeat itself and can
be truncated (by identifying a set of cut-off events) without
loss of information, yielding a finite and complete prefix.
Fig. 2(c) shows a finite and complete unfolding prefix (whose
only cut-off event is depicted as a double box) of the
STG shown in Fig. 2(a).
Efficient algorithms exist for building such prefixes [8,
9], which ensure that the number of non-cut-off events in
a complete prefix can never exceed the number of reach-
able states of Γ. However, complete prefixes are often ex-
ponentially smaller than the corresponding state graphs, es-
pecially for highly concurrent Petri nets, because they rep-
resent concurrency directly rather than by multidimensional
‘diamonds’, as is done in state graphs. For example, if the
original Petri net consists of 100 transitions which can fire
once in parallel, the state graph will be a 100-dimensional
hypercube with $2^{100}$ nodes, whereas the complete prefix
will coincide with the net itself.
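The gap between the two representations can be illustrated with a small sketch. It assumes a hypothetical net of n mutually independent transitions, each able to fire once, and counts the states visited by a naive state-graph traversal; the function name `count_reachable_states` is ours and does not belong to any tool mentioned in this paper.

```python
def count_reachable_states(n):
    """Count the states of the state graph of n independent one-shot transitions.

    A state is encoded as a bit mask recording which transitions have fired.
    """
    seen = {0}
    frontier = [0]
    while frontier:
        state = frontier.pop()
        for i in range(n):
            bit = 1 << i
            if not state & bit:              # transition i has not fired yet
                successor = state | bit
                if successor not in seen:
                    seen.add(successor)
                    frontier.append(successor)
    return len(seen)
```

For n = 10 the traversal visits 2^10 states, whereas the complete prefix of such a net contains only 10 events, one per transition.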
In [9, 11] the unfolding technique was applied to the im-
plementability analysis in step (a), viz. checking the Com-
plete State Coding (CSC) condition [4], which requires de-
tecting encoding conflicts between reachable states of an
STG. Since STGs usually exhibit a lot of concurrency, but
have rather few choice points, their unfolding prefixes are
often exponentially smaller than the corresponding state
graphs; in fact, in many of the experiments conducted
in [9,11] they are just slightly bigger than the original STGs
themselves. Therefore, unfolding prefixes are well-suited
for both visualisation of an STG’s behaviour and alleviating
the state space explosion problem.
In [15] the unfolding technique was applied to step (b),
in particular for enforcing the CSC condition (i.e., for the
resolution of CSC conflicts), which is a necessary condition
for the implementability of an STG as a circuit. A CSC
conflict arises when semantically different reachable states
of an STG have the same binary encoding. Fig. 2(b) shows
the state graph of the STG in Fig. 2(a) with a CSC conflict
between states M1 and M2. To resolve a CSC conflict, new
signals helping to distinguish between the involved states
are inserted into the specification in such a way that its ‘ex-
ternal’ behaviour does not change. (Intuitively, insertion of
signals introduces additional memory into the circuit, help-
ing to trace the current state.) In [15] a framework was de-
veloped for an interactive refinement process based on vi-
sualisation of conflict cores, i.e., sets of events causing en-
coding conflicts, which are represented at the level of finite
and complete prefixes of STG unfoldings.
The work in [12] addresses step (c), where unfolding
techniques are used to derive equations for logic gates of
the circuit. Together with [9, 11, 15], it forms a complete
design flow for complex gate synthesis of asynchronous cir-
cuits based on STG unfoldings rather than state graphs.
This paper extends the framework for the visualisa-
tion and resolution of encoding conflicts in [15] (step (b))
by incorporating the concurrency reduction transformation
(which can eliminate encoding conflicts by removing some
of the STG’s reachable states) in addition to signal insertion.
The common belief that concurrency is crucial for per-
formance is questionable. In a highly concurrent specifica-
tion, almost all combinations of signal values are reachable,
and thus Boolean minimisers cannot efficiently exploit the
‘don’t care’ values, which results in large and slow gates
in the final implementation. Moreover, transitions of the
newly inserted signals delay output transitions, and hence
can also increase the delay of the final circuit. Concur-
rency reduction can increase the number of unreachable
states, thus providing more ‘don’t cares’ for logic optimi-
sation. Furthermore, if an encoding conflict is solved by
concurrency reduction rather than signal insertion then no
additional gate is required to implement this signal. Thus,
the elimination of encoding conflicts by concurrency re-
duction may result in a faster and smaller circuit. In gen-
eral, both concurrency reduction and signal insertion are
required to explore a larger solution space, and consider-
ing only one of these techniques may leave out important
solutions. Existing techniques either apply concurrency re-
duction at the state graph level [5, 14] or are restricted to
specific net classes or use local transformations [1] and thus
restrict the design space.
Another important contribution of this paper is a novel
notion of validity, which is used to justify the STG transformations
that solve encoding conflicts. We believe it reflects the intuition
better than other existing notions. Moreover, this notion is much
more general and is also of independent interest: it is formulated
for labelled Petri nets (of which STGs are a special case) and
arbitrary transformations preserving the alphabet of the system.
For example, it can be applied to justifying the
concurrency-increasing transformation used in [16] to convert
speed-independent circuits into delay-insensitive ones.
2 Basic notation
In this section, we first present basic definitions concern-
ing Petri nets and STGs, and then address several key con-
cepts related to net unfoldings [4, 7–9, 17].
2.1 Petri nets
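A net is a triple $N \stackrel{\mathrm{df}}{=} (P,T,F)$ such that $P$ and $T$ are
disjoint sets of respectively places (circles) and transitions
(boxes), collectively known as nodes, and
$F \subseteq (P \times T) \cup (T \times P)$ is a flow relation (arcs). As usual,
${}^\bullet z \stackrel{\mathrm{df}}{=} \{y \mid (y,z) \in F\}$ and
$z^\bullet \stackrel{\mathrm{df}}{=} \{y \mid (z,y) \in F\}$ denote the pre- and postset
of $z \in P \cup T$, and
${}^\bullet Z \stackrel{\mathrm{df}}{=} \bigcup_{z \in Z} {}^\bullet z$ and
$Z^\bullet \stackrel{\mathrm{df}}{=} \bigcup_{z \in Z} z^\bullet$, for all
$Z \subseteq P \cup T$. We assume that ${}^\bullet t \neq \emptyset$, for every
$t \in T$. A marking (tokens) of $N$ is a multiset $M$ of places, i.e.,
$M : P \to \mathbb{N} \stackrel{\mathrm{df}}{=} \{0,1,2,\ldots\}$. An example of a Petri
net with the initial marking $\{p_1, p_2\}$ is shown in Fig. 1(a).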
A net system is a pair $\Sigma \stackrel{\mathrm{df}}{=} (N, M_0)$ comprising a finite net
$N = (P,T,F)$ and an (initial) marking $M_0$. A transition $t \in T$
is enabled at a marking $M$, denoted $M[t\rangle$, if for every
$s \in {}^\bullet t$, $M(s) \geq 1$. Such a transition can be executed or fired,
leading to the marking $M'$ defined by
$M' \stackrel{\mathrm{df}}{=} M - {}^\bullet t + t^\bullet$, where ‘−’ and ‘+’ stand for
the multiset difference and sum respectively. We denote this by
$M[t\rangle M'$. For example, in the net system shown in Fig. 1(a)
transition $t_1$ can fire consuming a token from $p_1$ and producing a
token in $p_3$, which can be expressed as
$\{p_1,p_2\}[t_1\rangle\{p_2,p_3\}$. The set of reachable markings of $\Sigma$
is the smallest (w.r.t. $\subset$) set $[M_0\rangle$ containing $M_0$ and such
that if $M \in [M_0\rangle$ and $M[t\rangle M'$ for some $t \in T$ then
$M' \in [M_0\rangle$. For a finite sequence of transitions
$\sigma = \sigma_1 \ldots \sigma_k$, we write $M[\sigma\rangle M'$ if there are
markings $M_0, \ldots, M_k$ such that $M_0 = M$, $M_k = M'$, and
$M_{i-1}[\sigma_i\rangle M_i$, for $i = 1, \ldots, k$.
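The enabling and firing rules, and the set of reachable markings, can be sketched as follows. The two-transition net below is a hypothetical fragment (only t1 mirrors the example in the text), markings are represented as `collections.Counter` multisets, and the names `NET`, `enabled`, `fire` and `reachable_markings` are ours. The `limit` parameter is an assumption added to stop the search on unbounded nets.

```python
from collections import Counter, deque

# Hypothetical net: transition -> (preset, postset), both multisets of places.
NET = {
    "t1": (Counter({"p1": 1}), Counter({"p3": 1})),
    "t2": (Counter({"p2": 1}), Counter({"p4": 1})),
}
M0 = Counter({"p1": 1, "p2": 1})


def enabled(marking, t):
    """t is enabled if every place in its preset holds enough tokens."""
    preset, _ = NET[t]
    return all(marking[p] >= n for p, n in preset.items())


def fire(marking, t):
    """Return M' = M - preset(t) + postset(t)."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    preset, postset = NET[t]
    # Counter arithmetic drops places whose count falls to zero.
    return marking - preset + postset


def reachable_markings(m0, limit=10_000):
    """Breadth-first computation of the set of reachable markings."""
    def freeze(m):
        return frozenset(m.items())

    seen = {freeze(m0)}
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        for t in NET:
            if enabled(m, t):
                successor = fire(m, t)
                key = freeze(successor)
                if key not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("limit reached; net may be unbounded")
                    seen.add(key)
                    queue.append(successor)
    return seen
```

Firing t1 from {p1, p2} yields {p2, p3}, as in the example above; the search terminates only because this small net is bounded.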
A net system Σ is k-bounded if, for every reachable
marking M and every place p ∈ P, M(p) ≤ k, and safe if
it is 1-bounded. Moreover, Σ is bounded if it is k-bounded
for some k ∈ N. One can show that the set [M0〉 is finite iff
Σ is bounded.
A transition t ∈ T is auto-concurrent if there is a reachable
marking M such that for every p ∈ •t, M(p) ≥ 2. A net system Σ is
non-auto-concurrent if none of its transitions is auto-concurrent.

Figure 1. A Petri net (a) and one of the finite and
complete prefixes of its unfolding (b).
2.2 Branching processes and configurations
Two nodes of a net $N = (P,T,F)$, $y$ and $y'$, are in structural
conflict, denoted $y \# y'$, if there are distinct transitions
$t, t' \in T$ such that ${}^\bullet t \cap {}^\bullet t' \neq \emptyset$ and $(t,y)$ and
$(t',y')$ are in the reflexive transitive closure of the flow
relation $F$, denoted by $\preceq$. A node $y$ is in structural
self-conflict if $y \# y$.
An occurrence net is a net $ON \stackrel{\mathrm{df}}{=} (B,E,G)$ where $B$ is
the set of conditions (places), $E$ is the set of events
(transitions) and $G$ is a flow relation. It is assumed that: $ON$ is
acyclic (i.e., $\preceq$ is a partial order); for every $b \in B$,
$|{}^\bullet b| \leq 1$; for every $y \in B \cup E$, $\neg(y \# y)$ and there are
finitely many $y'$ such that $y' \prec y$, where $\prec$ denotes the
irreflexive transitive closure of $G$. $Min(ON)$ will denote the
elements of $B$ that are minimal w.r.t. $\prec$. The relation $\prec$ is the
causality relation. Two distinct nodes are concurrent, denoted
$y \mathbin{co} y'$, if neither $y \# y'$ nor $y \preceq y'$ nor $y' \preceq y$.
Fig. 1(b) shows an example of an occurrence net where, e.g., the
following relationships hold: $e_1 \prec e_6$, $e_4 \# e_5$ (due to the
choice at $c_1$) and $e_6 \mathbin{co} e_7$.
A homomorphism from an occurrence net ON to a net
system Σ is a mapping h : B∪E →P∪T such that: h(B)⊆P
and h(E)⊆ T (conditions are mapped to places, and events
to transitions); for all e ∈ E, the restriction of h to •e is a
bijection between •e and •h(e) and the restriction of h to e•
is a bijection between e• and h(e)• (transition environments
are preserved); the restriction of h to Min(ON) is a bijection
between Min(ON) and M0 (minimal conditions correspond
to the initial marking); and for all e, f ∈ E, if •e = • f and
h(e) = h( f ) then e = f (there is no redundancy). In Fig. 1(b)
this homomorphism is shown as labels of the nodes.
A branching process of $\Sigma$ is a quadruple
$\beta \stackrel{\mathrm{df}}{=} (B,E,G,h)$ such that $(B,E,G)$ is an occurrence net
and $h$ is a homomorphism from it to $\Sigma$. A branching process
$\beta' = (B',E',G',h')$ of $\Sigma$ is a prefix of a branching process
$\beta = (B,E,G,h)$ of $\Sigma$, denoted $\beta' \sqsubseteq \beta$, if $(B',E',G')$ is a
subnet of $(B,E,G)$ such that: if $e \in E'$ and $(b,e) \in G$ or
$(e,b) \in G$ then $b \in B'$; if $b \in B'$ and $(e,b) \in G$ then
$e \in E'$; and $h'$ is the restriction of $h$ to $B' \cup E'$. For each
$\Sigma$ there exists a unique (up to isomorphism) maximal (w.r.t.
$\sqsubseteq$) branching process, called the unfolding of $\Sigma$ (it is
infinite whenever $\Sigma$ has an infinite execution).
If a branching process $\pi$ is such that $|b^\bullet| \leq 1$ for each of its
conditions $b \in B$, then $\pi$ is called a process. Since one
of the transformations we are discussing is concurrency
reduction, it is convenient to use a partial order rather than
interleaving semantics, and our discussion will be based on
processes, which are a partial-order analogue of traces. The
main difference between processes and traces is that in
the former the events are ordered only partially, and thus
one process can correspond to several traces, which can be
obtained from it as the linearisations of the corresponding
partial order. A Petri net generates a set of processes much
like it generates a language.
A process can be represented as a (perhaps infinite) labelled
acyclic net, with places having at most one incoming
and one outgoing arc. (And a branching process can be
considered as overlaid processes.) A process is maximal if
it is maximal w.r.t. $\sqsubseteq$, i.e., if it cannot be extended by new
events. A maximal process is either infinite (though not every
infinite process is maximal) or leads to a deadlock.
If $\pi$ is a process and $E' \subseteq E$ is a set of events of the
unfolding not belonging to $\pi$ such that the events from $\pi$ and
$E'$ together with their incident conditions induce a process,
then this process will be denoted by $\pi \oplus E'$. Moreover, if $\pi$
is finite and $U \subseteq T$, $\#_U \pi$ will denote the number of events
of $\pi$ with labels in $U$; furthermore, if $t \in T$ then
$\#_t \pi \stackrel{\mathrm{df}}{=} \#_{\{t\}} \pi$.
A configuration of an occurrence net is a finite set of
events $C \subseteq E$ such that for all $e, f \in C$, $\neg(e \# f)$ and, for
every $e \in C$, $f \prec e$ implies $f \in C$. For every event $e \in E$,
the configuration $[e] \stackrel{\mathrm{df}}{=} \{f \mid f \preceq e\}$ is called the
local configuration of $e$. Moreover, for a set of events
$E' \subseteq E$, $[E'] \stackrel{\mathrm{df}}{=} \bigcup_{e \in E'} [e]$. Note that $[E']$ is
finite whenever $E'$ is, and $[E']$ is a configuration if it is finite
and no two events in $E'$ are in structural conflict. The set of
triggers of an event $e \in E$ is defined as
$trg(e) \stackrel{\mathrm{df}}{=} \max_\prec([e] \setminus \{e\})$. For example, in the net
shown in Fig. 1(b) $\{e_1,e_3,e_4\}$ is a configuration whereas
$\{e_1,e_2,e_3\}$ and $\{e_4,e_7\}$ are not (the former includes events
in structural conflict, $e_1 \# e_2$, while the latter does not include
$e_1 \prec e_4$), $[e_9] = \{e_1,e_3,e_4,e_6,e_9\}$,
$[e_6,e_7] = \{e_1,e_3,e_4,e_6,e_7\}$ and $trg(e_4) = \{e_1,e_3\}$.
Intuitively, a configuration is a partial-order execution, i.e., an
execution where the order of firing of some of its transitions
is not important; e.g., the configuration $\{e_1,e_3,e_4\}$
corresponds to two totally ordered executions: $e_1 e_3 e_4$ and
$e_3 e_1 e_4$. Configurations are somewhat similar to finite
processes, the difference being that the former are sets of events
of the unfolding while the latter are nets. However, it is
sometimes convenient to interpret a process as the set of its
events, e.g., in order to apply set operations (such as $\subseteq$, $\supseteq$
and $\setminus$) to it. For example, we will denote by $\pi \setminus \pi'$ the set
of events which are in $\pi$ but not in $\pi'$, and write $E' \subseteq \pi$ to
denote the fact that every event in $E' \subseteq E$ is in $\pi$. Similarly,
$\max_\prec \pi$ will denote the set of causally maximal events of $\pi$.
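The notions of local configuration, structural conflict, configuration and trigger can be sketched on a small example. The occurrence net below is hypothetical (it is not the prefix of Fig. 1(b)), and all identifiers are ours: events are stored with their presets and postsets of conditions, and causality is recovered by following the unique producer of each condition.

```python
from itertools import combinations

# Hypothetical occurrence net: event -> (preset, postset) of conditions.
EVENTS = {
    "e1": ({"b1"}, {"b3"}),
    "e2": ({"b1"}, {"b4"}),          # competes with e1 for b1
    "e3": ({"b2"}, {"b5"}),
    "e4": ({"b3", "b5"}, {"b6"}),
}


def producer(condition):
    """Return the unique event producing the condition, or None if it is minimal."""
    for event, (_, postset) in EVENTS.items():
        if condition in postset:
            return event
    return None


def local_config(e):
    """[e]: the event e together with all of its causal predecessors."""
    result = {e}
    for b in EVENTS[e][0]:
        p = producer(b)
        if p is not None:
            result |= local_config(p)
    return result


def in_conflict(e, f):
    """e # f: distinct events below e and f share an input condition."""
    return any(
        x != y and EVENTS[x][0] & EVENTS[y][0]
        for x in local_config(e)
        for y in local_config(f)
    )


def is_configuration(events):
    """Causally closed and conflict-free set of events."""
    closed = all(local_config(e) <= events for e in events)
    conflict_free = not any(in_conflict(e, f) for e, f in combinations(events, 2))
    return closed and conflict_free


def triggers(e):
    """trg(e): the causally maximal events of [e] without e itself."""
    past = local_config(e) - {e}
    return {f for f in past if not any(f in local_config(g) - {g} for g in past)}
```

In this net {e1, e3, e4} is a configuration, {e1, e2} is not (the two events are in conflict), and {e4} is not (it is not causally closed).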
A cut is a maximal (w.r.t. $\subset$) set of conditions $B' \subseteq B$
such that $b \mathbin{co} b'$, for all distinct $b, b' \in B'$. Every marking in
a branching process reachable from $Min(ON)$ is a cut. Let
$C$ be a finite configuration of a branching process $\beta$. Then
$Cut(C) \stackrel{\mathrm{df}}{=} (Min(ON) \cup C^\bullet) \setminus {}^\bullet C$ is a cut; moreover,
the multiset $Mark(C) \stackrel{\mathrm{df}}{=} h(Cut(C))$ of places is a reachable
marking of $\Sigma$. A marking $M$ of $\Sigma$ is represented in a branching
process $\beta$ if the latter contains a configuration $C$ such that
$M = Mark(C)$. Every marking represented in $\beta$ is reachable,
and every reachable marking is represented in the unfolding
of $\Sigma$. For example, the cut corresponding to the configuration
$\{e_1,e_3,e_4\}$ is $\{c_6,c_7\}$, and the corresponding reachable
marking of $\Sigma$ is $\{p_6,p_7\}$. Similar notation will be used for
finite processes, e.g., $Cut(\pi)$ and $Mark(\pi)$ are defined as
$Cut(C)$ and $Mark(C)$, respectively, where $C$ comprises the
events in $\pi$.
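The definitions of Cut(C) and Mark(C) translate directly into set operations. The sketch below uses a hypothetical occurrence net and a hypothetical homomorphism `H` from conditions to places (neither is taken from Fig. 1); the function names are ours.

```python
from collections import Counter

# Hypothetical occurrence net: event -> (preset, postset) of conditions.
EVENTS = {
    "e1": ({"b1"}, {"b3"}),
    "e2": ({"b1"}, {"b4"}),
    "e3": ({"b2"}, {"b5"}),
    "e4": ({"b3", "b5"}, {"b6"}),
}
MIN_CONDITIONS = {"b1", "b2"}                       # Min(ON)
H = {"b1": "p1", "b2": "p2", "b3": "p3",            # hypothetical mapping h
     "b4": "p4", "b5": "p5", "b6": "p6"}            # from conditions to places


def cut(config):
    """Cut(C) = (Min(ON) ∪ C•) \\ •C for a configuration C."""
    produced = set().union(*(EVENTS[e][1] for e in config))
    consumed = set().union(*(EVENTS[e][0] for e in config))
    return (MIN_CONDITIONS | produced) - consumed


def mark(config):
    """Mark(C) = h(Cut(C)), a multiset of places."""
    return Counter(H[b] for b in cut(config))
```

The empty configuration yields the initial marking, and each further event replaces the conditions it consumes by those it produces. The functions do not check that their argument is a configuration.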
A branching process $\beta = (B,E,G,h)$ of $\Sigma$ is complete if
there is a set $E_{cut} \subseteq E$ of cut-off events such that, for every
reachable marking $M$ of $\Sigma$, there exists a finite configuration
$C$ of $\beta$ such that $C \cap E_{cut} = \emptyset$ and $M = Mark(C)$, and for each
such $C$ and every transition $t$ enabled by $M$, there is an event
$e \notin C$ in $\beta$ such that $h(e) = t$ and $C \cup \{e\}$ is a configuration
($e$ may be in $E_{cut}$). For example, the branching process shown
in Fig. 1(b) is complete w.r.t. the set $E_{cut} = \{e_5, e_{11}, e_{12}\}$
(cut-off events are shown as double boxes).
Although, in general, an unfolding can be infinite, for
every bounded net system Σ one can construct a finite com-
plete prefix of the unfolding of Σ, by choosing an appropri-
ate set Ecut of cut-off events, beyond which the unfolding is
not generated.
2.3 Labelled Petri nets
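Definition 1 (LPN). A labelled Petri net (LPN) is a tuple
$\Upsilon \stackrel{\mathrm{df}}{=} (\Sigma, \mathcal{I}, \mathcal{O}, \ell)$, where $\Sigma$ is a Petri net,
$\mathcal{I}$ and $\mathcal{O}$, with $\mathcal{I} \cap \mathcal{O} = \emptyset$, are respectively finite
sets of inputs (controlled by the environment) and outputs
(controlled by the system), and
$\ell : T \to \mathcal{I} \cup \mathcal{O} \cup \{\tau\}$ is a labelling function, where
$\tau \notin \mathcal{I} \cup \mathcal{O}$ is a silent action (e.g., internal signals in
an STG). ♦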
In this notion, τ’s denote internal signal transitions which
are controlled by the system and not observable by the en-
vironment. They should be distinguished from ‘dummy’
transitions in STGs, which do not correspond to any actual
signals of the circuit and are a syntactic feature. In figures,
we will denote inputs by i or ik, and outputs by o or ok.
A branching process of an LPN $\Upsilon = (\Sigma, \mathcal{I}, \mathcal{O}, \ell)$ is a
branching process of $\Sigma$ augmented with an additional
labelling of its events,
$(\ell \circ h) : E \to \mathcal{I} \cup \mathcal{O} \cup \{\tau\}$, and processes
of an LPN are defined in a similar way. If $\pi = (B,E,G,h)$
is a process of an LPN $\Upsilon = (\Sigma, \mathcal{I}, \mathcal{O}, \ell)$ then the
abstraction of $\pi$ w.r.t. $\ell$ is the labelled partially-ordered set
(with the labels in $\mathcal{I} \cup \mathcal{O}$)
$abs_\ell(\pi) \stackrel{\mathrm{df}}{=} (E', \prec', \ell')$ where:
$E' = \{e \in E \mid \ell(h(e)) \neq \tau\}$; $\prec'$ is the restriction of $\prec$
to $E' \times E'$; and $\ell' : E' \to \mathcal{I} \cup \mathcal{O}$ is such that for all
$e \in E'$, $\ell'(e) = \ell(h(e))$. We will write $abs(\pi)$ instead of
$abs_\ell(\pi)$ if $\ell$ is obvious from the context.
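The abstraction amounts to deleting the silent events and restricting the causal order to what remains. A minimal sketch, assuming the order is given as a transitively closed set of pairs and the labelling is given directly on events (i.e., as the composition of ℓ and h); the names `TAU` and `abstraction` are ours.

```python
TAU = "tau"  # placeholder for the silent label


def abstraction(events, order, label):
    """Return (E', ≺', ℓ') for a process given by its events, order and labels.

    order -- set of pairs (a, b) meaning a ≺ b, assumed transitively closed
    label -- dict mapping each event to an input, an output or TAU
    """
    visible = {e for e in events if label[e] != TAU}
    restricted = {(a, b) for (a, b) in order if a in visible and b in visible}
    visible_labels = {e: label[e] for e in visible}
    return visible, restricted, visible_labels
```

Because the order is transitively closed, a dependency that passes through a silent event survives as a direct pair between the visible events on either side.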
An LPN is input-proper if no input event in its unfolding
is triggered by an internal event, i.e., if for every event $e$ in
the unfolding such that $\ell(h(e)) \in \mathcal{I}$, and for every event
$f \in trg(e)$, $\ell(h(f)) \neq \tau$.
2.4 Signal Transition Graphs
A Signal Transition Graph (STG) is a triple
$\Gamma \stackrel{\mathrm{df}}{=} (\Sigma, Z, \lambda)$ such that $\Sigma = (N, M_0)$ is a net system,
$Z$ is a finite set of signals generating a finite alphabet
$Z^\pm \stackrel{\mathrm{df}}{=} Z \times \{+,-\}$ of signal transition labels, and
$\lambda : T \to Z^\pm$ is a labelling function. The signal transition labels
are of the form $z^+$ or $z^-$, and denote the transitions of a signal
$z \in Z$ from 0 to 1 (rising edge), or from 1 to 0 (falling edge),
respectively. We will also denote by $z^\pm$ a transition of signal $z$
if its direction is not particularly important. For the graphical
representation of STGs a short-hand notation is used, where a
transition can be connected directly to another transition if the
place ‘in the middle of the arc’ has one incoming and one outgoing
arc. An example of an STG specification of a VME bus
controller regulating the communication between a device
and a bus through a data transceiver is shown in Fig. 2(a).
We associate with the initial marking of $\Gamma$ a binary vector
$\nu^0 \stackrel{\mathrm{df}}{=} (\nu^0_1, \ldots, \nu^0_{|Z|}) \in \{0,1\}^{|Z|}$, where $\nu^0_i$
corresponds to the initial value of signal $z_i \in Z$. Moreover, with
a sequence of transitions $\sigma$ we associate an integer signal change
vector
$\nu^\sigma \stackrel{\mathrm{df}}{=} (\nu^\sigma_1, \nu^\sigma_2, \ldots, \nu^\sigma_{|Z|}) \in \mathbb{Z}^{|Z|}$,
so that each $\nu^\sigma_i$ is the difference between the numbers of
occurrences of $z_i^+$- and $z_i^-$-labelled transitions in $\sigma$.
$\Gamma$ is consistent if, for every reachable marking $M$, all
firing sequences $\sigma$ from $M_0$ to $M$ have the same encoding
$Code(M) \stackrel{\mathrm{df}}{=} \nu^0 + \nu^\sigma$, and this vector is binary,
i.e., $Code(M) \in \{0,1\}^{|Z|}$. Such a property guarantees that,
for every signal $z \in Z$, the STG satisfies the following two
properties: (i) the first occurrence of $z$ in the labelling of
any firing sequence of $\Gamma$ starting from $M_0$ has the same sign
(either rising or falling); and (ii) the rising and falling labels
of $z$ alternate in any firing sequence of $\Gamma$. All STGs
considered in the sequel are assumed to be consistent.
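The encoding reached by a single firing sequence can be computed by accumulating the signal change vector. The sketch below is only a partial check: it verifies that one given sequence keeps the encoding binary at every step, not that all sequences leading to the same marking agree. The all-zero initial vector and the sample sequence used in the note are assumptions, and `code_after` is our name.

```python
SIGNALS = ["dsr", "dtack", "lds", "ldtack", "d"]   # signal order used in Fig. 2


def code_after(initial, sigma, signals=SIGNALS):
    """Return initial + (signal change vector of sigma), checking it stays binary.

    sigma is a list of labels such as "dsr+" or "lds-".
    """
    vector = list(initial)
    for label in sigma:
        signal, sign = label[:-1], label[-1]
        if sign not in "+-":
            raise ValueError(f"malformed label {label!r}")
        vector[signals.index(signal)] += 1 if sign == "+" else -1
        if any(v not in (0, 1) for v in vector):
            raise ValueError(f"non-binary encoding after {label}")
    return vector
```

Starting from the assumed initial vector 00000, the sequence dsr+, lds+, ldtack+ gives the encoding 10110, whereas a sequence with two consecutive dsr+ labels is rejected because rising and falling edges fail to alternate.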
The state graph of a consistent STG $\Gamma$ is a tuple
$SG_\Gamma \stackrel{\mathrm{df}}{=} (S, A, s_0, Code)$ such that:
$S \stackrel{\mathrm{df}}{=} [M_0\rangle$ is the set of states;
$A \stackrel{\mathrm{df}}{=} \{M \xrightarrow{\lambda(t)} M' \mid M \in [M_0\rangle \wedge M[t\rangle M'\}$
is the set of transitions; $s_0 \stackrel{\mathrm{df}}{=} M_0$ is the initial state; and
$Code : S \to \{0,1\}^{|Z|}$ is the state assignment function, as defined
above for markings. Fig. 2(b) shows the state graph of the STG
depicted in part (a) of this figure together with the encodings of
all the reachable states.
Signals in $Z$ are partitioned into input signals, $Z_I$, output
signals, $Z_O$, and internal signals, $Z_\tau$. Input signals are
assumed to be generated by the environment, while local
(i.e., output and internal) signals are produced by the logic
gates of the circuit. Logic synthesis derives a Boolean function
$F_z(z_1, \ldots, z_{|Z|})$ for each signal $z \in Z_O \cup Z_\tau$, which
requires the conditions for the enabling of each output signal
transition to be determined without ambiguity by the encoding
of each reachable state. To capture this, let
$Loc(M) \stackrel{\mathrm{df}}{=} \{z \in Z_O \cup Z_\tau \mid \exists t \in T : M[t\rangle \wedge \lambda(t) = z^\pm\}$
be the set of local signals enabled at state $M$. Two states of
$SG_\Gamma$ are in CSC conflict if they have the same encoding but
different sets of enabled local signals. $\Gamma$ satisfies the Complete
State Coding (CSC) property if no two states of $SG_\Gamma$ are in CSC
conflict.
Fig. 2(b) illustrates a CSC conflict between two different
markings, $M_1$ and $M_2$, that have the same encoding, 10110,
but $Loc(M_1) = \{lds\} \neq Loc(M_2) = \{d\}$. This means that,
e.g., the value of $F_{lds}(1,0,1,1,0)$ is ill-defined (it should be
0 according to $M_1$ and 1 according to $M_2$), and thus $lds$ is not
implementable as a logic gate. To cope with this, the STG
should be transformed, e.g., by adding new internal signals,
so that the resulting STG satisfies the CSC property.
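At the level of the state graph, detecting CSC conflicts amounts to grouping states by encoding and comparing their sets of enabled local signals. A minimal sketch follows, in which the encodings and Loc sets are supplied directly rather than computed from an STG; M1 and M2 carry the values quoted above, while states A and B are hypothetical additions, and `csc_conflicts` is our name.

```python
from collections import defaultdict
from itertools import combinations


def csc_conflicts(states):
    """Return the pairs of states with equal codes but different Loc sets.

    states -- dict: state name -> (encoding string, set of enabled local signals)
    """
    by_code = defaultdict(list)
    for name, (code, _) in states.items():
        by_code[code].append(name)

    conflicts = []
    for code, names in by_code.items():
        for a, b in combinations(sorted(names), 2):
            if states[a][1] != states[b][1]:
                conflicts.append((a, b, code))
    return conflicts


STATES = {
    "M1": ("10110", {"lds"}),
    "M2": ("10110", {"d"}),
    "A": ("00000", set()),     # hypothetical: same code and same Loc as B,
    "B": ("00000", set()),     # hence not a CSC conflict
}
```

Only the pair (M1, M2) is reported: A and B share an encoding but enable the same local signals, so the next-state functions remain well defined for them.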
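Note that an STG $\Gamma$ can be considered as a special case of
an LPN with the same underlying Petri net,
$\mathcal{I} \stackrel{\mathrm{df}}{=} Z_I$, $\mathcal{O} \stackrel{\mathrm{df}}{=} Z_O$ and $\ell$ defined as
$$\ell(t) \stackrel{\mathrm{df}}{=} \begin{cases} \tau & \text{if } \lambda(t) = z^\pm \wedge z \in Z_\tau \\ z & \text{if } \lambda(t) = z^\pm \wedge z \notin Z_\tau. \end{cases}$$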
A branching process of an STG $\Gamma = (\Sigma, Z, \lambda)$ is a branching
process of $\Sigma$ augmented with an additional labelling of
its events, $(\lambda \circ h) : E \to Z^\pm$. One can easily check the
consistency of $\Gamma$ once its finite and complete prefix has been
built [17]. A complete unfolding prefix of the STG shown
in Fig. 2(a) is shown in part (c) of this figure, where $e_{12}$ is a
cut-off event.

Figure 2. VME bus controller: the STG for the
read cycle (a), its state graph showing a CSC
conflict (b), its unfolding prefix with the
corresponding conflict core (c), and the equations
for signal insertion (d) and concurrency
reduction (e). The signal order in binary
encodings is: dsr, dtack, lds, ldtack, d.
Inputs: dsr, ldtack; outputs: lds, d, dtack; internal: csc.
(d) lds = csc · (ldtack · dsr + lds) + d
    dtack = d
    d = lds · csc · ldtack + d · dsr
    csc = d + lds · csc
(e) lds = dsr · (lds + ldtack) + d
    dtack = dtack · lds + d
    d = ldtack · dsr · lds
3 Valid transformations
The notion of validity for signal insertion is quite easy
— one can justify such a transformation in terms of weak
bisimulation, which is well-studied. For a concurrency
reduction (or transformations in general), the situation is
more difficult: the original and transformed systems are
typically not even language-equivalent; deadlocks can dis-
appear (e.g., the deadlocks in Dining Philosophers can be
eliminated by fixing the order in which forks are taken);
deadlocks can be introduced; transitions can become dead;
even the language inclusion may not hold (some transfor-
mations, e.g., converting a speed-independent circuit into a
delay-insensitive one [16] can increase the concurrency of
inputs, which in turn extends the language). For the sake of
generality, we discuss arbitrary transformations (not neces-
sarily concurrency reductions or signal insertions).
Intuitively, there are four aspects to a valid transforma-
tion:
I/O interface preservation The transformation must pre-
serve the interface between the circuit and the environ-
ment. In particular, no input transition can be ‘delayed’
by newly inserted signals or ordering constraints.
Conformation Bounds the behaviour from above, i.e., re-
quires that the transformation introduces no ‘wrong’
behaviour. Note that certain extensions of behaviour
are valid, e.g., two inputs in sequence may be accepted
concurrently [6, 16], extending the language.
Liveness Bounds the behaviour from below, i.e., requires
that no ‘interesting’ behaviour is completely elimi-
nated by the transformation.
Technical restrictions It might happen that a valid trans-
formation is still unacceptable because the STG be-
comes unimplementable or because of some other
technical restriction. For example, one usually re-
quires the transformation to preserve the speed-inde-
pendence and boundedness of the STG [4, 5].
In the example below, the original LPN is bounded (in
fact, safe), whereas the concurrency reduction shown
by the dashed arc yields an unbounded LPN, even
though its behaviour may be valid.
[Picture: an LPN with transitions a and b and a dashed arc
representing the concurrency reduction.]
In this section we discuss in the described framework the
notions of validity proposed in [5,6] and present a new one,
which, in our opinion, better reflects the intuition of what a
valid transformation is. Since the first and the last aspects
are well-studied [4], we will concentrate on the remaining
two aspects, viz. conformation and liveness.
3.1 Critical overview of previous validity notions
The liveness restrictions imposed on transformations
in [5] require that (i) no events become dead, and (ii) no
(new) deadlock states appear. As the example in Fig. 3
shows, these restrictions are not sufficient to guarantee the
correctness of the modified LPN. Indeed, the enabling region
of output o has not become empty, and the set of deadlocks
has not changed, even though the transformation is
clearly invalid: in the original specification, output o is
always produced, whereas in the transformed one the environment
can prevent o from occurring by repeatedly choosing
i1 rather than i2.
In [5] a notion of conformation is introduced. However,
it cannot express the liveness conditions, e.g., the Univer-
sal Do-Nothing module, accepting all inputs but not pro-
ducing any outputs, conforms to any specification with the
same alphabet; thus one cannot require the circuit to do any-
thing. The other notion introduced in [5] is based on the
existence of a winning strategy in a certain infinite game,
and is quite complicated. In this paper we propose an ele-
gant bisimulation-style notion which takes the liveness into
account.
Figure 3. Liveness problem: an LPN with the
concurrency reduction shown by the dashed
arc together with its state graphs before and
after the transformation; the enabling regions ER(o)
and ER′(o) and the deadlocks are marked.
3.2 Our notion of validity
For the sake of generality, we discuss arbitrary LPNs
(STGs being a special kind of them). We assume that the
transformation does not change the inputs and outputs of
the system, and we will denote by ϒ and ϒ′ the original and
transformed LPNs, respectively. Since one of the transfor-
mations we are discussing is concurrency reduction, it is
convenient to use a partial order rather than interleaving se-
mantics, and our discussion will be based on processes of
LPNs.
Given processes π of ϒ and π′ of ϒ′, we define a relation
between their abstractions, abs(π) and abs(π′), which holds
iff in π′ the inputs are no less concurrent and the outputs are
no more concurrent than in π. That is, the transformation
is allowed, on one hand, to relax the assumptions about the
order in which the environment will produce input signals,
and, on the other hand, to restrict the order in which outputs
are produced. Thus the modified LPN will not produce new
failures and will not cause new failures in the environment.
Intuitively, abs(π) and abs(π′) are bound by this relation
iff abs(π) can be transformed into abs(π′) in two steps (see
the picture below): (i) the ordering constraints for inputs
are relaxed (yielding a new order ≺′′, which is a relaxation
of ≺); (ii) new ordering constraints for outputs are added,
yielding abs(π′) (thus, ≺′′ is also a relaxation of ≺′).
abs(π) = (S,≺,ℓ)  --step 1 (ϕ)-->  (S′′,≺′′,ℓ′′)  --step 2 (ψ)-->  abs(π′) = (S′,≺′,ℓ′)
Below we give two alternative definitions of such a relation.
Definition 2. Let π and π′ be processes of ϒ and ϒ′,
respectively, abs(π) = (S,≺,ℓ) and abs(π′) = (S′,≺′,ℓ′).
We define abs(π) BJ∗ abs(π′) if there exist a labelled
partially ordered set (S′′,≺′′,ℓ′′) and one-to-one mappings
ϕ : abs(π) → (S′′,≺′′,ℓ′′) and ψ : abs(π′) → (S′′,≺′′,ℓ′′)
preserving the labels and such that:
• ≺′′ = ϕ(≺) ∩ ψ(≺′) (≺′′ is a relaxation of ≺ and ≺′);
• if e is an output event and f ≺ e then ϕ(f) ≺′′ ϕ(e)
(in step 1, existing ordering constraints for outputs are
preserved);
• if e′ is an input event and f′ ≺′ e′ then ψ(f′) ≺′′ ψ(e′)
(in step 2, no new ordering constraints for inputs can
appear). ♦
This definition turns out to be too restrictive in practice,
e.g., in the picture below, the abstraction of the process ob-
tained by adding the dashed arc is not bound to the abstrac-
tion of the original one by BJ∗ , since delaying o indirectly
delays i2.
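[Picture: a process over the events i1, o1 and i2, with the added
ordering constraint drawn as a dashed arc.]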
In practice, one can often assume weak fairness,
i.e., that a transition cannot remain enabled forever: it must
either fire or be disabled by another transition firing. Under
this assumption, the transformation in the picture above
is quite reasonable. The following notion is less restrictive
than the one given in Definition 2. Unlike that definition, it
is mostly concerned with direct ordering constraints.
Let (S,≺) be a partially ordered set and s ∈ S. An s′ ∈ S
is a direct predecessor of s if s′ ≺ s and there is no s′′ ∈ S
such that s′ ≺ s′′ ≺ s. We will denote by DP≺(s) the set of
direct predecessors of an s ∈ S.
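Direct predecessors can be computed from a transitively closed order by discarding every predecessor that has another predecessor above it. The sketch below is illustrative only: the order is a set of pairs, the three-element chain used in the usage note is hypothetical, and the function names are ours.

```python
def transitive_closure(arcs):
    """Return the transitive closure of a set of pairs (a, b) meaning a ≺ b."""
    closure = set(arcs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def direct_predecessors(order, s):
    """DP(s): predecessors s' of s with no s'' such that s' ≺ s'' ≺ s.

    order is assumed to be irreflexive and transitively closed.
    """
    below = {a for (a, b) in order if b == s}
    return {a for a in below if not any((a, m) in order for m in below)}
```

For the chain s1 ≺ s2 ≺ s3 the only direct predecessor of s3 is s2. If the pair (s2, s3) is removed from the closed order, the formerly transitive pair (s1, s3) becomes direct, which is the effect a relaxation has in general.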
Example 1. Consider the acyclic graph representing a par-
tial order, with the direct predecessor relation shown as
solid lines and the transitive arcs shown as dotted lines.
The second graph represents the relaxation of this order ob-
tained by eliminating the arc (s2,s3). Note that some of the
transitive arcs are now a part of the direct predecessor re-
lation.
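[Picture: two graphs over the nodes s1, ..., s6, showing the original partial order and its relaxation obtained by eliminating the arc (s2,s3).]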
Definition 3. Let π and π′ be processes of ϒ and ϒ′,
respectively, abs(π) = (S,≺,ℓ) and abs(π′) = (S′,≺′,ℓ′).
We define abs(π) BJ abs(π′) if there exist a labelled partially
ordered set (S′′,≺′′,ℓ′′) and one-to-one mappings
ϕ : abs(π) → (S′′,≺′′,ℓ′′) and ψ : abs(π′) → (S′′,≺′′,ℓ′′)
preserving the labels and such that:
• ≺′′ = ϕ(≺) ∩ ψ(≺′) (≺′′ is a relaxation of ≺ and ≺′);
• if e is an output event and f ∈ DP≺(e) then ϕ(f) ∈
DP≺′′(ϕ(e)) (in step 1, existing direct ordering constraints
for outputs are preserved, and existing indirect
ones can become direct, e.g., as in the picture below);
[Picture: an example over the events i1, i2 and o, with the two sides related by BJ.]
• if e′ is an input event and f′ ∈ DP≺′(e′) then ψ(f′) ∈
DP≺′′(ψ(e′)) (in step 2, no new direct ordering constraints
for inputs can appear, e.g., as in the picture
below).
[Picture: an example over the events i, o1 and o2, with the two sides related by BJ.]
♦
The following proposition states that BJ is less restric-
tive than BJ∗ .
Proposition 1. Let (S,≺,ℓ) and (S′,≺′,ℓ′) be labelled partially
ordered sets such that (S,≺,ℓ) BJ∗ (S′,≺′,ℓ′). Then
(S,≺,ℓ) BJ (S′,≺′,ℓ′).
Proof. Follows from comparison of Definitions 2 and 3 tak-
ing into account the following facts:
• if s′ ∈ DP≺(s) then s′ ≺ s;
• if ≺′ is a relaxation of ≺, s′ ∈ DP≺(s) and s′ ≺′ s then
s′ ∈ DP≺′(s).
In the rest of this paper we will assume that the weak
fairness condition holds and use BJ rather than BJ∗ .
Example 2. The following hold:
[Pictures: eight instances of labelled partial orders over the input events i, i1, i2 and the output events o, o1, o2, in each of which the BJ relation holds and its inverse BJ⁻¹ does not.]
Note that BJ is an order (if we do not distinguish
order-isomorphic partially ordered sets). In the sequel,
slightly abusing the notation, we will write π BJ π′ instead
of abs(π) BJ abs(π′).
Definition 4 (Validity). ϒ′ is a valid realisation of ϒ, denoted
ϒ ⊸ ϒ′, if there is a relation ∝ between the finite
processes of ϒ and ϒ′ such that π∅ ∝ π′∅ (where π∅ and π′∅
are the empty processes of ϒ and ϒ′, respectively), and for
all finite processes π and π′ such that π ∝ π′:
• π BJ π′;
• for all maximal processes Π′ ⊒ π′, and for all finite
processes π̂′ ⊒ π′ such that π̂′ ⊑ Π′, there exist finite
processes π̃′ ⊒ π̂′ and π̃ ⊒ π such that π̃′ ⊑ Π′ and π̃ ∝ π̃′;
• for all maximal processes Π ⊒ π, and for all finite
processes π̂ ⊒ π such that π̂ ⊑ Π, there exist finite
processes π̃ ⊒ π̂ and π̃′ ⊒ π′ such that π̃ ⊑ Π and π̃ ∝ π̃′. ♦
Intuitively, every activity of ϒ is eventually performed by ϒ′
(up to the BJ relation) and cannot be pre-empted due to
choices, and vice versa, i.e., ϒ′ and ϒ simulate each other
with a finite delay. Note that ⊸ is a pre-order, i.e., a sequence
of two valid transformations is a valid transformation.
In this definition, considering maximal processes is es-
sential. Indeed, according to this notion the transformation
in Fig. 3 is not valid, since in the original LPN no extension
of the process comprising an instance of o within the max-
imal process comprising an infinite sequence of instances
of i1 and an instance of o has a corresponding (in terms of
the BJ relation) process in the transformed LPN, which
would have to fire i2 before it is able to fire o.
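Example 3. The following hold:
[Pictures: three pairs of LPNs over the transitions i1, i2, o, o1 and o2. In the first two pairs the ⊸ relation holds and its inverse does not; in the third pair, which involves unbounded sequences of o, neither ⊸ nor its inverse holds.]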
4 Concurrency reduction
Now we give a general definition of concurrency reduc-
tion (see Fig. 4).
Definition 5 (Concurrency reduction). Given an LPN
ϒ = (Σ,I,O,ℓ) where Σ = (P,T,F,M0), a non-empty set of
transitions U ⊂ T, a transition t ∈ T \ U and an n ∈ N, the
transformation U ⇢ⁿ t, yielding an LPN ϒ′ = (Σ′,I,O,ℓ)
with Σ′ = (P′,T,F′,M′0), is defined as follows:
• P′ ≝ P ∪ {p}, where p ∉ P ∪ T is a new place;
• F′ ≝ F ∪ {(u, p) | u ∈ U} ∪ {(p, t)};
• for all places q ∈ P, M′0(q) ≝ M0(q), and M′0(p) ≝ n.
We will write U ⇢ t instead of U ⇢⁰ t and u ⇢ⁿ t instead
of {u} ⇢ⁿ t. ♦
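The construction in Definition 5 is purely structural, so it can be sketched in a few lines. The `PetriNet` container below, its field names and the default name of the added place are assumptions made for illustration; labels (I, O, ℓ) are omitted because the transformation leaves them unchanged.

```python
from dataclasses import dataclass


@dataclass
class PetriNet:
    places: set
    transitions: set
    arcs: set       # flow relation F, as (source, target) pairs
    marking: dict   # initial marking M0: place -> number of tokens


def concurrency_reduction(net, U, t, n=0, new_place="p"):
    """Return a new net with a fresh place fed by every u in U and consumed by t."""
    U = set(U)
    if not U or not U <= net.transitions:
        raise ValueError("U must be a non-empty set of transitions")
    if t not in net.transitions - U:
        raise ValueError("t must be a transition outside U")
    if n < 0:
        raise ValueError("n must be a natural number")
    if new_place in net.places | net.transitions:
        raise ValueError("the added place must be fresh")
    return PetriNet(
        places=net.places | {new_place},
        transitions=set(net.transitions),
        arcs=net.arcs | {(u, new_place) for u in U} | {(new_place, t)},
        marking={**net.marking, new_place: n},
    )


# Hypothetical two-transition net in which a and b are initially concurrent.
net = PetriNet({"q1", "q2"}, {"a", "b"},
               {("q1", "a"), ("q2", "b")}, {"q1": 1, "q2": 1})
reduced = concurrency_reduction(net, {"a"}, "b")  # b now has to wait for a
```

Since the only change is one extra input place of t, the reduced net cannot enable anything that the original net did not.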
Note that concurrency reduction cannot add new behaviour
to the system; it can only restrict it. Furthermore, one
can easily show that if a concurrency reduction U ⇢ⁿ t such
that ℓ(t) ∉ I is applied to an input-proper LPN ϒ, then the
resulting LPN ϒ′ is also input proper.
The proposition below is quite technical. Its advantage
is that the conditions are imposed only on the original LPN.
Some of these conditions are simplified later for special net
classes.
[Figure 4. Concurrency reduction U ⇢ⁿ t: a new place p, initially holding n tokens, with an incoming arc from each of u1, u2, ..., uk ∈ U and an outgoing arc to t.]
Proposition 2 (Validity condition for a concurrency reduction).
Let U ⇢ⁿ t be a concurrency reduction transforming
an input-proper LPN ϒ = (Σ,I,O,ℓ) into ϒ′ =
(Σ′,I,O,ℓ), such that ℓ(t) ∉ I and for each finite process π
of ϒ, each t-labelled event e and each maximal process
Π ⊒ π⊕{e} of ϒ, there exists a finite process π̃ ⊑ Π of
ϒ such that π̃ ⊒ π⊕{e}, e ∈ max≺ π̃, t ∉ h(π̃ \ (π⊕{e}))
and n + #U π̃ ≥ #t π̃. Then ϒ ⊸ ϒ′.
Proof. We define the relation ∝ between the finite processes
of ϒ and ϒ′ as follows: π ∝ π′ iff there exists a one-to-one
mapping ξ between the nodes of π and the non-p-labelled nodes
of π′, where p is the place added by the concurrency reduction
(note that π does not contain p-labelled conditions),
such that for every condition c and event e of π:
• h(c) = h(ξ(c)) and h(e) = h(ξ(e)) (i.e., ξ preserves the
labels of events and conditions);
• e ∈ •c iff ξ(e) ∈ •ξ(c) and e ∈ c• iff ξ(e) ∈ (ξ(c))•
(i.e., ξ preserves the environments of the conditions;
note that the environments of the events in π′ may
contain additional p-labelled conditions which are not
present in π).
Intuitively, π ∝ π′ iff π′ can be obtained from π by adding a
few p-labelled conditions and the corresponding arcs. Note
that according to this definition, π∅ ∝ π′∅. We proceed by
proving a few properties of this relation.
Claim 1: if π ∝ π′ then π BJ π′.
We choose the labelled partially ordered set (S′′,≺′′,ℓ′′)
in Definition 3 to be abs(π), ϕ to be the identity mapping,
and ψ to be ξ⁻¹ restricted to the events labelled
by non-internal transitions. Both ϕ and ψ preserve
the labels and ≺′′ = ϕ(≺) ∩ ψ(≺′) (the latter holds because
≺′′ = ≺ by definition, and all the arcs in π are
also present in π′, i.e., ≺ is a relaxation of ≺′).
Since ≺′′ = ≺, the only property we still have to prove
is that if e′ is an input event and f′ ∈ DP≺′(e′) then
ψ(f′) ∈ DP≺′′(ψ(e′)). It holds because ℓ(t) ∉ I and ϒ
is input proper, and thus no input event is delayed (either
directly or via a chain of τ-labelled events) by the
transformation.
Claim 2: ξ(Cut(π)) ⊆ Cut(π′).
By the definition of the ∝ relation, ξ preserves the environments
of the conditions, and the claim easily follows
from the fact that a condition in a finite process
belongs to the cut iff its postset is empty. Note that in
general ξ(Cut(π)) ≠ Cut(π′), since the latter may contain
p-labelled conditions which are not present in the
former.
Claim 3: if π ∝ π′ and π′ can be extended by a finite set
of events E′ then π can be extended by a finite set of
events E such that h(E) = h(E′) and π⊕E ∝ π′⊕E′.
If E′ is a singleton {e′} then •e′ ⊆ Cut(π′) and the result
follows from Claim 2 (even if e′ is t-labelled, since
there is no p-labelled place in ϒ able to restrict the firing
of t). Since any finite extension of π′ can be obtained
by a finite sequence of single-event extensions,
the claim follows by induction.
Claim 4: if π ∝ π′, π can be extended by a finite set of
events E and t ∉ h(E) then π′ can be extended by
a finite set of events E′ such that h(E) = h(E′) and
π⊕E ∝ π′⊕E′.
The proof is very similar to that of Claim 3 (but we do
not have to take care of t-labelled events, which are the
only ones consuming tokens from place p in ϒ′).
Claim 5: if π ∝ π′, p ∈ Mark(π′) and π can be extended
by a t-labelled event e then π′ can be extended by a
t-labelled event e′ in such a way that π⊕{e} ∝ π′⊕{e′}.
Since •e ⊆ Cut(π), the result follows from Claim 2 and
the presence of a p-labelled condition in Cut(π′). Note
that e′ can be chosen in a non-unique way if there are
several p-labelled conditions in Cut(π′).
Now we need to demonstrate that the relation ∝ satisfies
Definition 4, i.e., assuming that π ∝ π′ we need to show that
1. π BJ π′.
This property holds by Claim 1.
2. For all maximal processes Π′ ⊒ π′, and for all finite
processes π̂′ ⊒ π′ such that π̂′ ⊑ Π′, there exist finite
processes π̃′ ⊒ π̂′ and π̃ ⊒ π such that π̃′ ⊑ Π′ and π̃ ∝ π̃′.
This property follows from Claim 3.
3. For all maximal processes Π ⊒ π, and for all finite
processes π̂ ⊒ π such that π̂ ⊑ Π, there exist finite
processes π̃ ⊒ π̂ and π̃′ ⊒ π′ such that π̃ ⊑ Π and π̃ ∝ π̃′.
Since any extension π̂ ⊒ π can be obtained by a sequence of
single-event extensions, it suffices to prove this property for
the case π̂ = π⊕{e} ⊑ Π.
If h(e) ≠ t then the result follows from Claim 4. If h(e) = t
then there exists a finite π̃ ⊑ Π such that π̃ ⊒ π⊕{e}, e ∈
max≺ π̃, t ∉ h(π̃ \ (π⊕{e})) and n + #U π̃ ≥ #t π̃. Since e is
a maximal event of both π⊕{e} and π̃, the events in E ≝
π̃ \ (π⊕{e}) are concurrent to e. Moreover, t ∉ h(E), and so
by Claim 4, π⊕E ∝ π′⊕E′ and h(E) = h′(E′) for some E′.
Since π̃ = (π⊕E)⊕{e}, e is t-labelled and t ∉ U,
#U π̃ = #U((π⊕E)⊕{e}) = #U(π⊕E) and
#t π̃ = #t((π⊕E)⊕{e}) = #t(π⊕E) + 1, and thus n + #U π̃ ≥ #t π̃ implies
n + #U(π⊕E) ≥ #t(π⊕E) + 1 > #t(π⊕E).
Since h(π) = h′(π′) and h(E) = h′(E′), h(π⊕E) =
h′(π′⊕E′), and thus n + #U(π′⊕E′) > #t(π′⊕E′), i.e., p ∈
Mark(π′⊕E′), and so by Claim 5, π′⊕E′ can be extended
by a t-labelled event e′ in such a way that π⊕{e} ⊑
(π⊕E)⊕{e} = π̃ ∝ π̃′ = (π′⊕E′)⊕{e′} ⊒ π′.
This validity condition for general LPNs is quite com-
plicated. It can be somewhat simplified if t is a non-auto-
concurrent transition.
Proposition 3 (Validity condition for a concurrency reduction
on non-auto-concurrent nets). Let U ⇢ⁿ t be a
concurrency reduction transforming an input-proper LPN
ϒ = (Σ,I,O,ℓ) into ϒ′ = (Σ′,I,O,ℓ), such that ℓ(t) ∉ I,
and t is non-auto-concurrent and such that for each t-labelled
event e and for each maximal process Π ⊇ [e] of
ϒ there is a finite set EU ⊆ Π of events with labels in U concurrent
to e such that n + #U[e] + |EU| ≥ #t[e]. Then ϒ ⊸ ϒ′.
Proof. Any maximal process Π ⊒ π⊕{e} is also a maximal
extension of [e]. Take π̃ ≝ (π⊕{e}) ∪ [EU]; note that π̃ ⊑ Π
since π⊕{e} ⊑ Π and EU ⊆ Π. All the events in EU are
concurrent to e, and thus e ∈ max≺ π̃. Since t is non-auto-concurrent,
all the t-labelled events in π̃ are in [e], i.e., t ∉
h(π̃ \ (π⊕{e})) and #t π̃ = #t[e]; thus, since #U π̃ ≥ #U[e] + |EU|,
n + #U π̃ − #t π̃ = n + #U π̃ − #t[e] ≥ n + #U[e] + |EU| − #t[e] ≥ 0,
so, by Proposition 2, ϒ ⊸ ϒ′.
Remark 1. This proposition requires the non-auto-concur-
rency of a particular transition rather than the absence of
two transitions with the same label which can be executed
concurrently. That is, the non-auto-concurrency is required
not on the level of LPN, but rather on the level of the under-
lying Petri net. In particular, the non-auto-concurrency is
guaranteed for safe Petri nets.
The next section explains how concurrency reduction
can be employed for resolution of encoding conflicts in STG
unfoldings.
5 Resolution of encoding conflicts
At the level of unfoldings, encoding conflicts can be
compactly represented using conflict cores [15]. Encod-
ing conflicts can be resolved by either adding auxiliary sig-
nals or by concurrency reduction. The former approach was
studied in [15], where additional signals are employed to
disambiguate states having the same binary encodings. The
latter makes some of the states unreachable and thus can
eliminate encoding conflicts.
The resolution of encoding conflicts by signal inser-
tion is illustrated in Fig. 2(c), where the signal csc is in-
serted concurrently to existing transitions in order to min-
imise the latency. The logic equations for this solution are
shown in Fig. 2(d). Fig. 2(c) also shows how to reduce the
concurrency between lds− and dtack− so that state M1 is
removed from the reachability graph shown in Fig. 2(b),
which in turn resolves the encoding conflict. One can see
that in this example the equations for the signal insertion
are more complex than those obtained by concurrency reduction.
This can be explained as follows. The concurrent
insertion of auxiliary transitions avoids delaying any
output transitions but increases the state space and thus re-
duces the ‘don’t care’ set, which is used for logic optimi-
sation. Moreover, signal insertion increases the number of
support variables for output signals. Furthermore, an addi-
tional logic gate is required to implement the inserted sig-
nal. In contrast, concurrency reduction reduces the state
space, increasing the ‘don’t care’ set, while delaying the
output transition dtack−, making it wait until lds− com-
pletes.
In general, concurrency reduction produces smaller cir-
cuits, and it may also be the case that the resulting circuit is
faster due to simplification of the gates. Thus, even though
the system manifests less concurrency, it might be actually
faster due to the events taking less time to fire. On the other
hand, there are situations when signal insertion produces
better solutions.
A combined framework is presented here, which uses
both signal insertion and concurrency reduction to eliminate
cores and the corresponding encoding conflicts. This
allows a larger design space to be explored.
5.1 Encoding conflicts in a prefix
A CSC conflict can be represented as an unordered conflict
pair of configurations 〈C1,C2〉 whose final states are in
CSC conflict, as shown in Fig. 2(c). In [10, 11] two techniques
for detecting CSC conflicts (based, respectively, on
integer programming and SAT) were proposed. Essentially,
they allow for efficiently finding such conflict pairs in STG
unfolding prefixes.
The set of all conflict pairs may be quite large, e.g., due
to the following ‘propagation’ effect: if C1 and C2 can be
expanded by the same event e then 〈C1∪{e},C2∪{e}〉 is
also a conflict pair (unless these two configurations enable
the same set of local signals). Therefore, it is desirable to
reduce the number of pairs that need to be considered, e.g., as
follows. A conflict pair 〈C1,C2〉 is called concurrent if
C1 ⊈ C2, C2 ⊈ C1 and C1 ∪ C2 is a configuration. Below is a
slightly modified version of a proposition proven in [10]:
Proposition 4. Let 〈C1,C2〉 be a concurrent CSC conflict
pair. Then C = C1 ∩ C2 is such that either 〈C,C1〉 or 〈C,C2〉
is a CSC conflict pair.
Thus concurrent conflict pairs are ‘redundant’ and should
not be considered. The remaining conflict pairs can be clas-
sified as follows:
Conflict pairs of type I are such that either C1 ⊂ C2 or
C2 ⊂ C1 (Fig. 2(c) illustrates this type of CSC con-
flicts).
Conflict pairs of type II are such that C1 \ C2 ≠ ∅ ≠ C2 \ C1
and there exist e′ ∈ C1 \ C2 and e′′ ∈ C2 \ C1 such that
e′ # e′′ (Fig. 7(c) illustrates this type of CSC conflicts).
Definition 6 (Core). Let 〈C1,C2〉 be a conflict pair of configurations.
The corresponding complementary set is defined
as CS ≝ C1 △ C2, where △ denotes the symmetric set
difference. CS is a core if it cannot be represented as the
union of several disjoint complementary sets. ♦
For example, the core corresponding to the conflict pair
shown in Fig. 2(c) is {e4, . . . ,e8,e10} (note that for a con-
flict pair 〈C1,C2〉 of type I, such that C1 ⊂ C2, the corre-
sponding core is simply C2 \C1), and the core correspond-
ing to the conflict pair 〈{e1,e4,e6} ,{e2}〉 in Fig. 7(c) is
{e1,e2,e4,e6}.
One can show that every complementary set CS can be
partitioned into C1 \ C2 and C2 \ C1, where 〈C1,C2〉 is a conflict
pair corresponding to CS. Moreover, if CS is of type I
then one of these parts is empty, while the other is CS itself.
An important property of complementary sets is that
for each signal z ∈ Z, the differences between the numbers
of z+- and z−-labelled events are the same in these two
parts (and are 0 if CS is of type I). This suggests that a complementary
set can be eliminated, e.g., by introduction of a
new internal signal and insertion of its transition into this
set, or by ‘dragging’ an existing event into it using additional
ordering constraints, as these would violate the stated
property.
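The balance property of complementary sets can be illustrated by a short sketch. It assumes configurations are given as sets of event names and that a dictionary `label` maps each event to a (signal, sign) pair; the events and labels below are hypothetical and do not correspond to any figure in the paper.

```python
from collections import Counter


def complementary_set(c1, c2):
    """Symmetric difference of two configurations."""
    return c1 ^ c2


def net_signal_change(events, label):
    """Per signal z: (number of z+ events) - (number of z- events); zeros dropped."""
    change = Counter()
    for e in events:
        signal, sign = label[e]
        change[signal] += 1 if sign == "+" else -1
    return {z: d for z, d in change.items() if d != 0}


def parts_balance(c1, c2, label):
    """True iff the two parts C1 \\ C2 and C2 \\ C1 have equal net signal changes."""
    return net_signal_change(c1 - c2, label) == net_signal_change(c2 - c1, label)


# Hypothetical type I pair: C1 is a subset of C2, and C2 \ C1 fires a+ then a-.
label = {"e1": ("b", "+"), "e2": ("a", "+"), "e3": ("a", "-"), "e4": ("c", "+")}
c1, c2 = {"e1"}, {"e1", "e2", "e3"}
# 'Dragging' the extra event e4 into the larger configuration breaks the balance.
c2_dragged = c2 | {"e4"}
```

Here `parts_balance(c1, c2, label)` holds, while `parts_balance(c1, c2_dragged, label)` does not, which mirrors the way an event forced into a complementary set violates the stated property.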
5.2 Core elimination by signal insertion
A framework for visualisation and manual resolution of
encoding conflicts was presented in [15], where cores are
eliminated by signal insertion. By introducing an additional
internal signal and inserting its transition, say csc+, into
the core, one can destroy the core and thus eliminate the
corresponding encoding conflicts. To preserve the consistency of the
STG, the transition’s counterpart csc− must also be inserted
outside the core, in such a way that it is neither concurrent
to nor in structural conflict with csc+. Another restriction
is that an inserted signal transition cannot trigger an input
signal transition (the reason is that this would impose constraints
on the environment which were not present in the
original STG, making it ‘wait’ for the newly inserted signal).
The core in Fig. 2(c) can be eliminated by inserting a
new signal, csc+, somewhere in the core, e.g., concurrently
to e5 and e6 between e4 and e7, and by inserting its comple-
ment outside the core, e.g., concurrently to e11 between e9
and e12. (Note that concurrent insertion of these two transi-
tions avoids an increase in the latency of the circuit, where
each transition is assumed to contribute a unit delay.) Af-
ter transferring this signal into the STG, it satisfies the CSC
property.
It is often the case that cores overlap. In order to min-
imise the number of inserted signals, and thus the area and
latency of the circuit, it is advantageous to insert a signal
in such a way that as many cores as possible are elimi-
nated by it. That is, a signal should be inserted into the
intersection of several cores whenever possible. In [15] the
exploitation of core overlaps is implemented by means of
a height map showing the quantitative distribution of the
cores. Each event located in a core is assigned an altitude,
i.e., the number of cores it belongs to. (The analogy with
a topographical map showing the altitudes may be helpful
here.) ‘Peaks’ with the highest altitude are good candidates
for insertion, since they correspond to the intersection of
the maximum number of cores.
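A height map of this kind amounts to counting, for every event, the cores that contain it. The following is a minimal sketch under the assumption that each core is given as a set of event names; the function names and the sample cores are illustrative only.

```python
from collections import Counter


def height_map(cores):
    """Altitude of each event: the number of cores it belongs to."""
    altitude = Counter()
    for core in cores:
        altitude.update(set(core))
    return altitude


def peaks(cores):
    """Events of maximal altitude, i.e. candidates for signal insertion."""
    altitude = height_map(cores)
    if not altitude:
        return set()
    top = max(altitude.values())
    return {e for e, a in altitude.items() if a == top}


# Three hypothetical overlapping cores; e3 lies in all of them.
cores = [{"e1", "e2", "e3"}, {"e2", "e3", "e4"}, {"e3", "e5"}]
```

For these sample cores, e3 has altitude 3 and is the only peak, so a single transition inserted at e3 would touch all three cores.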
The elimination of encoding conflicts by signal insertion
is schematically illustrated in Fig. 5, which represents typical
cases in STG specifications. Cores ‘in sequence’ can
be eliminated in a ‘one-hot’ manner as depicted in Fig. 5(a).
Each core is eliminated by one signal transition, and its
complement is inserted outside the core, preferably into another
non-adjacent one.¹
An STG that has a core in one of the concurrent branches
can also be tackled in a ‘one-hot’ way, as shown in Fig. 5(b).
Note that in order to preserve the consistency, the transition’s
counterpart cannot be inserted into the concurrent
branch, but can be inserted before the fork transition or after
the join one. In a branch which is in a structural conflict
with another branch, the transition’s counterparts must be
inserted in the same branch somewhere between the choice
and the merge points, as shown in Fig. 5(c).
¹ The union of two adjacent cores is usually a complementary set which
will not be destroyed if both the transition and its counterpart are inserted
into it.
[Figure 5. Strategies for core elimination by signal insertion: (a) sequential; (b) concurrent; (c) structural conflict.]
Obviously, the described cases do not cover all possible
situations and all possible insertions (e.g., one can some-
times insert a new signal transition before the choice point
and its counterparts into each branch, etc.), but they give an
idea how the cores can be eliminated.
5.3 Core elimination by concurrency reduction
Concurrency reduction removes some of the reachable
states of the STG and thus can be used for the resolution
of encoding conflicts. The elimination of conflict cores
by concurrency reduction involves the introduction of ad-
ditional ordering constraints, which fix some order of exe-
cution. In an STG, a fork transition defines the starting point
of concurrency and a join transition defines the end point.
Existing signals can be used to disambiguate the conflicting
states in a core by delaying the starting point or bringing for-
ward the ending point of concurrency. If there is an event
concurrent to the core, and a starting or ending point of con-
currency is in the core, then this event can be forced into the
core by an additional ordering constraint, thus destroying it.
For example, in Fig. 2(c), e9 is concurrent to some of the
events in the core, and the starting point of concurrency is in
the core, so the concurrency reduction shown by the dashed
line in this figure can be used to eliminate the core by ‘drag-
ging’ e9 into it. Two kinds of concurrency reduction based
transformations for core eliminations are described below.
Forward concurrency reduction illustrated in Fig. 6(a)
performs the concurrency reduction h(EU) ⇢ⁿ h(g) in
the STG, where EU is a maximal (w.r.t. ⊂) set of events
outside the core which are in structural conflict with
each other and concurrent to g, an event in the core.
It is assumed that e is in the core, either e ≺ g or e co g,
and for exactly one event f ∈ EU, e ≺ f.
[Figure 6. Core elimination by concurrency reduction: (a) forward; (b) backward.]
Backward concurrency reduction illustrated in Fig. 6(b)
works in a similar way, but the concurrency reduction
h(EU) ⇢ⁿ h(f) is performed. It is assumed that e is in
the core, f is an event outside the core such that f ≺ e,
EU is a maximal (w.r.t. ⊂) set of events which are in
structural conflict with each other and concurrent to f,
such that exactly one event g ∈ EU is in the core, and
either g ≺ e or g co e.
In both cases the core is destroyed by additional ordering
constraints ‘dragging’ f into the core.
These two rules are illustrated by the examples in Fig. 7,
where they are applied to cores of types I (parts (a,b) of
this figure) and II (parts (c,d) of this figure). In Fig. 7(a)
instances of b+ and a− are concurrent to the core. The forward
concurrency reduction b+ ⇢ e− can be applied, because
b+ succeeds e+ and e− succeeds e+. This ‘drags’ b+
into the core, destroying it. Note that f is an input and
thus cannot be delayed, and so the concurrency reductions
b+ ⇢ f+ and b+ ⇢ f− would be invalid. The backward
concurrency reductions e+ ⇢ a− and f+ ⇢ a− can
also be applied to eliminate the conflict core, because a−
precedes e−, and both e+ and f+ are in the core and precede
e−. Either of these reductions ‘drags’ a− into the core,
destroying it.
In Fig. 7(b), d+ is concurrent to events in the core and
precedes c+, an event in the core. The only event in the
core which precedes or is concurrent to c+ is a+. However,
a+ ⇢ d+ is an invalid transformation, which introduces a
deadlock. The concurrency reduction {a+,b+} ⇢ d+ has
been used instead, since b+ # a+ and b+ co d+.
Fig. 7(c,d) show the elimination of type II cores. A forward
concurrency reduction is illustrated in Fig. 7(c). An
instance of d+ is concurrent to the core and succeeds a+,
an event in the core, and therefore it can be used for a forward
reduction. The only possible concurrency reduction is
d+ ⇢ a−, since b+ is an input and thus cannot be delayed.
The backward concurrency reduction technique is illustrated
in Fig. 7(d), where d+ is concurrent to a+ and e+
in the core and precedes b+ in the core. The only events
in the core which either precede or are concurrent to b+
are a+ and e+, and either of them can be used to eliminate
the core. However, both reductions a+ ⇢ d+
and e+ ⇢ d+ are invalid, since they introduce deadlocks.
Thus c+ should be involved, yielding the following
two backward concurrency reductions eliminating the
core: {a+,c+} ⇢ d+ and {c+,e+} ⇢ d+. Note that the
reductions {a+,b+/1} ⇢ d+ and {b+/1,e+} ⇢ d+ do
not eliminate the core, because d+ is ‘dragged’ into both
branches of the core, and so the net sum of signals in these
two branches remains equal. (And our backward concurrency
reduction rule does not permit these two transformations,
since only one event from the set EU is allowed
to be in the core.)
6 Implementation
In our framework, encoding conflicts can be eliminated
by the introduction of auxiliary signals and concurrency re-
duction. A heuristic cost function is applied to select the
best transformation for the resolution of encoding conflicts.
It takes into account: (i) the delay caused by the applied
transformation; (ii) the estimated increase in the complex-
ity of the logic and (iii) the number of cores eliminated by
the transformation. The resolution process involves finding
an appropriate transformation for the elimination of cores
in the STG unfolding prefix, as was explained earlier. The
following steps are used to resolve the CSC conflicts:
1. Construct an STG unfolding prefix.
2. Compute the cores and, if there are none, terminate.
3. Choose areas for transformation (the ‘highest peaks’
in the height map corresponding to the overlap of the
maximum number of cores are good candidates).
4. Compute valid transformations for the chosen areas
and sort them according to the cost function; if no valid
transformation is possible then
• change the transformation areas by including the
next highest peak and repeat step 4;
• otherwise manual intervention by the designer is
necessary; the progress might still be possible if
the designer relaxes some I/O constraints, uses
timing assumptions, etc.
5. Select the best transformation according to the cost function;
if it is a signal insertion then the location for
insertion of the counterpart transition is also chosen.
6. Perform the best transformation and continue with
step 1.
[Figure 7. Elimination of type I (a,b) and type II (c,d) cores.
(a) inputs: b, c, f; outputs: a, d, e; forward reduction: b+ ⇢ e−; backward reductions: e+ ⇢ a−, f+ ⇢ a−.
(b) inputs: a, b; outputs: c, d, e; backward reduction: {a+,b+} ⇢ d+.
(c) inputs: b, d; outputs: a, c; forward reduction: d+ ⇢ a−.
(d) inputs: a, b, f; outputs: c, d, e; backward reductions: {a+,c+} ⇢ d+, {c+,e+} ⇢ d+.]
The described framework is being integrated into our tool
CONFRES [15].
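The control flow of the six steps can be sketched as a loop over pluggable helpers. This is not the CONFRES implementation: every callable passed in (`build_prefix`, `compute_cores`, `candidate_areas`, `valid_transformations`, `cost`, `apply_transformation`) is a hypothetical placeholder, and the sketch assumes that a lower cost means a better transformation, which the text does not state explicitly.

```python
def resolve_conflicts(stg, build_prefix, compute_cores, candidate_areas,
                      valid_transformations, cost, apply_transformation,
                      max_iterations=100):
    """Iterate steps 1-6 until no cores remain (all helpers are hypothetical)."""
    for _ in range(max_iterations):
        prefix = build_prefix(stg)                      # step 1
        cores = compute_cores(prefix)                   # step 2
        if not cores:
            return stg
        chosen, areas = None, []
        # Step 3: areas are assumed to arrive with the highest peaks first.
        for area in candidate_areas(prefix, cores):
            areas.append(area)                          # widen the area if needed
            options = valid_transformations(prefix, cores, areas)  # step 4
            if options:
                # Step 5: assumption that the lowest cost is the best choice.
                chosen = min(options, key=lambda tr: cost(prefix, cores, tr))
                break
        if chosen is None:
            raise RuntimeError("no valid transformation; designer intervention needed")
        stg = apply_transformation(stg, chosen)         # step 6
    raise RuntimeError("iteration limit reached")


# Toy stand-ins: the 'STG' is just the number of remaining cores, and a
# 'transformation' is the number of cores it eliminates.
result = resolve_conflicts(
    3,
    build_prefix=lambda stg: stg,
    compute_cores=lambda prefix: list(range(prefix)),
    candidate_areas=lambda prefix, cores: ["peak"],
    valid_transformations=lambda prefix, cores, areas: [1, 2],
    cost=lambda prefix, cores, tr: -tr,
    apply_transformation=lambda stg, tr: max(0, stg - tr),
)
```

The iteration limit is an addition of the sketch; it only guards the toy loop against non-termination.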
6.1 Cost function
A cost function was developed to heuristically select, on
each iteration of the encoding conflict resolution process, the
best transformation (either a concurrency reduction or a signal
insertion). It comprises three parts, taking into account
the delay penalty inflicted by the transformation, the
estimated increase in the complexity of the logic, and the
number of cores eliminated by the transformation:
\[ \mathit{cost} \stackrel{\mathrm{df}}{=} \alpha_1 \cdot \Delta\omega + \alpha_2 \cdot \Delta\mathit{logic} + \alpha_3 \cdot \Delta\mathit{cores} . \]
The parameters α1,2,3 ≥ 0 are given by the designer and
can be used to direct the heuristic search towards reducing
the delay inflicted by the transformation (α1 is large com-
pared with α2 and α3) or the estimated complexity of logic
(α2 and α3 are large compared with α1).
The first part of the cost function estimates the delay
caused by a transformation. A delay model where each tran-
sition of the STG is assigned an individual delay is consid-
ered; e.g., input signals usually take longer to complete than
non-input ones, because they often denote the end of a cer-
tain computation in the environment. (This delay model is
similar to that in [3,4].) It is quite crude, but it is hard to sig-
nificantly improve it, since the exact time behaviour is only
known after the circuit and its environment are synthesised.
Weighted events’ depths in the unfolding prefix are used
to determine the delay penalty of the transformation. The
weighted depth ωe of an event e is defined as follows:
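\[ \omega_e \stackrel{\mathrm{df}}{=} \begin{cases} \omega_{h(e)} & \text{if } {}^\bullet({}^\bullet e) = \emptyset \\ \omega_{h(e)} + \max_{e' \in {}^\bullet({}^\bullet e)} \omega_{e'} & \text{otherwise,} \end{cases} \]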
where ωt is the delay associated with transition t ∈ T . In
our implementation, these delays are chosen as follows:
\[ \omega_t \stackrel{\mathrm{df}}{=} \begin{cases} 3 & \text{if } t \text{ is an input transition} \\ 1 & \text{otherwise.} \end{cases} \]
These parameters can be fine-tuned by the designer if nec-
essary.
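The weighted depth can be computed by a memoised traversal of the causal predecessors. The sketch below assumes the prefix is given as a dictionary `preds` mapping each event e to the set •(•e) of its immediate causal predecessors and a dictionary `is_input` telling whether the event's transition is an input; both representations, and the four-event example, are illustrative.

```python
def weighted_depths(preds, is_input, input_delay=3, other_delay=1):
    """Weighted depth of every event: own delay plus the deepest predecessor."""
    depth = {}

    def visit(e):
        if e not in depth:
            own = input_delay if is_input[e] else other_delay
            # default=0 covers events whose set of predecessors is empty
            depth[e] = own + max((visit(p) for p in preds.get(e, ())), default=0)
        return depth[e]

    for e in is_input:
        visit(e)
    return depth


# Hypothetical prefix: e1 (input) -> e2 (output) -> e4 (output), and
# e3 (input) -> e4, with e3 concurrent to e1 and e2.
preds = {"e2": {"e1"}, "e4": {"e2", "e3"}}
is_input = {"e1": True, "e2": False, "e3": True, "e4": False}
depths = weighted_depths(preds, is_input)
```

With the default delays, the depths here are 3 for e1 and e3, 4 for e2 and 5 for e4. The recursion is adequate for small prefixes; a very deep prefix would call for an iterative traversal in topological order.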
For a concurrency reduction h(U) ⇢ⁿ t, the delay penalty
∆ω is computed as the difference of the weighted depths
of a t-labelled event after and before the concurrency reduction.
The value of ∆ω is positive if t is delayed by the
transformation, otherwise it is 0. Note that the event’s depth
after the reduction is calculated using the original prefix.
For a signal insertion, several (at least two) transitions
of a new signal csc are added to the STG. For each such
transition t, the inflicted delay penalty $\Delta\omega_t$ is computed, and
then these penalties are added up to obtain the total delay
penalty $\Delta\omega \stackrel{\mathrm{df}}{=} \sum_t \Delta\omega_t$. If the insertion is concurrent, no additional
delay is inflicted ($\Delta\omega_t \stackrel{\mathrm{df}}{=} 0$), since in our delay model
the transitions corresponding to internal signals are fast, and
so their firing time cannot exceed that of the concurrent
transitions. If the insertion is sequential, the inflicted delay
penalty $\Delta\omega_t$ is computed by adding up the delay penalties
of all the transitions u delayed by t: $\Delta\omega_t \stackrel{\mathrm{df}}{=} \sum_u \Delta\omega_t^u$, where
for each such u, the delay penalty $\Delta\omega_t^u$ is computed as the
difference of the weighted depths of a u-labelled event after
and before the transformation. Note that ∆ω is calculated
using the original prefix.
The second part of the cost function, ∆logic, estimates
the increase in the complexity of the logic. The logic com-
plexity is estimated using the number of triggers of each
local signal. The set of triggers of a signal z ∈ Z is defined
on the (full) unfolding as
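\[ \mathit{trg}(z) \stackrel{\mathrm{df}}{=} \Big\{ z' \in Z \;\Big|\; \exists e' \in \bigcup_{(\lambda \circ h)(e) = z^\pm} \mathit{trg}(e) : (\lambda \circ h)(e') = z'^\pm \Big\} , \]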
and can be approximated from below using a finite prefix of
the STG unfolding.
We will also denote by trg′(z) the set of triggers of
a z ∈ Z after the transformation. (Note that for the transformations
which we use, trg′(z) can be approximated using
the original prefix.)
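For a concurrency reduction U ⇢ⁿ t, where t is a z-labelled
transition, the estimated increase in complexity of
the logic ∆logic is computed as
\[ \Delta\mathit{logic} \stackrel{\mathrm{df}}{=} C(|\mathit{trg}'(z)|) - C(|\mathit{trg}(z)|) , \]
where
\[ C(n) \stackrel{\mathrm{df}}{=} \begin{cases} 0 & \text{if } n = 1 \\ 1 & \text{if } n = 2 \\ \left\lceil 2^n / n \right\rceil & \text{if } n > 2 \end{cases} \]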
estimates the number of binary gates needed to implement
an n-input Boolean function. This formula was chosen because
the asymptotic average number of binary gates in a
Boolean circuit implementing an n-input Boolean function
is 2^n/n [18], and because all the triggers of a signal z are always in
the support of the complex gate implementing z. Note that
the maximal number of triggers which can be added is |U |;
the actual number of added triggers can be smaller if some
of the signals labelling the transitions in U were already
triggers of z. (In fact, this number can even be negative,
e.g., as in the first solution for the A/D convertor case study
described in the next section.) This definition of ∆logic dis-
courages solutions using complex gates with too many in-
puts: the penalty is relatively small if the number of triggers
is small, but grows exponentially if the transformation adds
new triggers to a signal which already had many triggers.
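The gate-count estimate C(n) and the resulting ∆logic for a concurrency reduction are easy to tabulate. In the sketch below the function names are illustrative, and rejecting n < 1 is an assumption of the sketch, since the definition of C does not cover a signal without triggers.

```python
def gate_estimate(n):
    """C(n): estimated number of binary gates for an n-input Boolean function."""
    if n < 1:
        # Assumption of this sketch: C is undefined for fewer than one trigger.
        raise ValueError("C(n) is only defined for n >= 1")
    if n == 1:
        return 0
    if n == 2:
        return 1
    return -(-(2 ** n) // n)  # integer ceiling of 2^n / n


def delta_logic(triggers_before, triggers_after):
    """Estimated change in logic complexity for one signal."""
    return gate_estimate(len(triggers_after)) - gate_estimate(len(triggers_before))


# Adding one trigger to a signal with few triggers is cheap...
small = delta_logic({"a", "b"}, {"a", "b", "c"})
# ...but adding one to a signal that already has many is much more expensive.
large = delta_logic({"a", "b", "c", "d", "e"}, {"a", "b", "c", "d", "e", "f"})
```

The values C(3) = 3, C(4) = 4, C(5) = 7 and C(6) = 11 give a penalty of 2 for going from two to three triggers and of 4 for going from five to six, which shows how the penalty grows with the number of existing triggers.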
For a signal insertion, several local signals in the modified
STG can be triggered by the new signal csc. Let Z′ denote
the set of all such signals. For each signal z ∈ Z′, the
increase ∆logic_z in the complexity of the logic implementing z
is estimated, and then these estimates are added up. (Note
that ∆logic_z can be negative for some z ∈ Z′, e.g., when csc
replaces more than one trigger of z.) Moreover, the added
signal csc has to be implemented, and thus introduces additional
logic complexity, which is estimated and added to the
result:
$$\Delta\mathit{logic} \stackrel{\mathrm{df}}{=} \Big(\sum_{z \in Z'} \Delta\mathit{logic}_z\Big) + \Delta\mathit{logic}_{csc},$$
where
$$\Delta\mathit{logic}_z \stackrel{\mathrm{df}}{=} C(|\mathrm{trg}'(z)|) - C(|\mathrm{trg}(z)|)$$
for all z ∈ Z′, and
$$\Delta\mathit{logic}_{csc} \stackrel{\mathrm{df}}{=} C(|\mathrm{trg}'(csc)|).$$
Note that in the case of signal insertion, at most one additional
trigger (viz. csc) per signal can be introduced.
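
A minimal Python sketch of this sum is given below. The data layout is an
assumption for illustration: `affected` maps each signal of Z′ to a pair
(triggers before, triggers after), and `csc_triggers` is the trigger set of
the new signal; the signal names in the example are made up.

```python
from typing import Dict, Set, Tuple


def gate_estimate(n: int) -> int:
    """C(n) as defined above; n >= 1 is assumed."""
    if n < 1:
        raise ValueError("C(n) is only defined here for n >= 1")
    return 0 if n == 1 else 1 if n == 2 else -((-(2 ** n)) // n)


def delta_logic_insertion(
    affected: Dict[str, Tuple[Set[str], Set[str]]],
    csc_triggers: Set[str],
) -> int:
    """Sum the per-signal changes over Z' and add the cost of csc itself."""
    per_signal = sum(
        gate_estimate(len(after)) - gate_estimate(len(before))
        for before, after in affected.values()
    )
    return per_signal + gate_estimate(len(csc_triggers))


affected = {
    "p": ({"a", "b"}, {"a", "b", "csc"}),   # csc added:        3 - 1 = +2
    "q": ({"a", "b", "c"}, {"a", "csc"}),   # csc replaces b,c: 1 - 3 = -2
}
print(delta_logic_insertion(affected, {"a", "q"}))  # 2 - 2 + C(2) = 1
```
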
The third part of the cost function, ∆cores, estimates how
many cores are eliminated by the transformation. It is com-
puted by checking for each core ‘touched’ by the trans-
formation whether it is eliminated or not, using the orig-
inal prefix. While doing this, the following consideration
concerning signal insertion should be taken into account:
if both rising and falling transitions of the new signal are
inserted into the same complementary set, it is not elim-
inated; in particular, if these transitions are inserted into
adjacent cores, the complementary set obtained by uniting
these cores will resurface as a new core on the next iteration
(even though the original cores are eliminated).
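
The consideration above can be sketched in Python under a simplifying
assumption: a core counts as eliminated exactly when one of the two new
transitions is inserted into it and the counterpart is not. The encoding
(sets of core names touched by each new transition) and the function name
are hypothetical; the result is reported as a negative number, matching
the sign used in the ∆cores column of the tables in Fig. 9.

```python
from typing import Set


def delta_cores(plus_in: Set[str], minus_in: Set[str]) -> int:
    """Estimate ∆cores for a signal insertion.

    plus_in / minus_in: names of the cores containing the insertion
    point of csc+ / csc- (hypothetical encoding for this sketch).
    """
    eliminated = plus_in ^ minus_in  # cores touched by exactly one transition
    return -len(eliminated)


# Hypothetical example: csc+ lands inside cores A, B and C, while csc-
# lands inside core C only, so C survives and A, B are eliminated.
print(delta_cores({"A", "B", "C"}, {"C"}))  # -2
```
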
Note that for efficiency reasons the cost function should
be computed on the original unfolding prefix. This strat-
egy significantly reduces the number of times the unfolding
prefix has to be built, saving time.
7 Case studies
In this section, two examples demonstrating the pro-
posed combined framework for the resolution of encoding
conflicts are discussed.
7.1 Weakly synchronised pipelines
Fig. 8(a) shows an STG modelling two weakly synchronised
pipelines without arbitration [11]. The STG exhibits
encoding conflicts resulting in two cores shown in Fig. 8(b),
where two possible concurrency reductions resolving the
CSC conflicts are shown. Both cores can be eliminated
by introducing a causal constraint, either
$z^- \overset{1}{\dashrightarrow} x_1^+$ or $z^- \overset{1}{\dashrightarrow} x_2^+$.
However, the first reduction delays $x_1^+$ and adds z
to the triggers of x1, whereas the second reduction has no
effect on the delay (z− can be executed concurrently with its
predecessor) and on the number of triggers of x2 (as z+
already triggers $x_2^-$). Thus the latter reduction is preferable
according to our cost function, resulting in the STG shown
in Fig. 8(a), with the dashed arc taken into account. The
corresponding equations are presented in Fig. 8(d).
The cores can also be eliminated by an auxiliary signal
csc. Phase one of the resolution process inserts a signal
transition somewhere into the highest peak in the height map,
which comprises the events e8, e10 and e11. For example, in
Fig. 8(c) a signal transition csc+ is inserted after e8 and its
counterpart is inserted outside the cores before e6, ensuring
that the cores are destroyed. Other valid insertions are
possible, e.g., inserting csc+ before e10 and its counterpart
before e6. Both these transformations eliminate all the cores,
and in both of them the newly inserted signal has two
triggers, but the former insertion delays three transitions, adds
the trigger csc to x1 and replaces the trigger x2 of z with csc,
whereas the latter insertion delays two transitions and adds
the trigger csc to x1 and z. The equations corresponding to
these two solutions are shown in Fig. 8(e,f).

[Figure 8, panels (a)-(c): STG and unfolding prefix drawings.]

(d)
    x1 = x2
    x2 = z · (x1 + x2) + x1 · x2
    y1 = y2
    y2 = y1 · (y2 + z) + y2 · z
    z = y2 · (z + x2) + x2 · z

(e)
    x1 = x2 · csc
    x2 = x2 · (z + csc) + x1
    y1 = y2
    y2 = y1 · (y2 + z) + y2 · z
    z = z · y2 + csc
    csc = csc · (y2 + z) + x2

(f)
    x1 = x2 · csc
    x2 = x2 · (z + csc) + x1
    y1 = y2
    y2 = y1 · (y2 + z) + y2 · z
    z = z · y2 + x2 · csc
    csc = csc · (y2 + z) + x2

outputs: x1, x2, y1, y2, z; internal: csc

Figure 8. Weakly synchronised pipelines: the STG (a), its
unfolding prefix with cores, showing transformations resolving
the encoding conflicts (b,c) and the corresponding
equations (d,e,f).

One can see that the implementations derived by signal
insertion are more complex than the one obtained by con-
currency reduction. These two implementations also delay
signals z and x1, whereas the one derived using concurrency
reduction does not have additional delays. Additionally,
the solution obtained by concurrency reduction results in
a symmetrical STG.
7.2 A/D converter
The example shown in Fig. 9 is a part of the A/D con-
verter proposed in [13]. It contains two type I and three
type II cores shown in Fig. 9(a). The events e3, e6, e11
and e13 comprise the highest peak, as each of them belongs
to four cores. The cores in the peak can be eliminated by a
forward concurrency reduction, since events e5 and e6 are
concurrent to the events in the peak and the concurrency
starts in the peak. The valid concurrency reductions are presented in
the table in Fig. 9(c), where the column ‘lits’ shows the to-
tal number of literals in the corresponding equations. The
first four solutions eliminate all the cores in the peak, and
the last one eliminates only one core. Incidentally, the first
four solutions eliminate the remaining core as well, because
the corresponding ordering constraints also act as backward
concurrency reductions. The first solution introduces a large
delay (e11 is delayed by an input event e9) but no additional
triggers (in fact, the number of triggers of Lr is reduced,
since Ar ceases to be its trigger), whereas the second one
does not delay e11 but introduces an additional trigger. The
equations for these two solutions are shown in Fig. 9(b).
The third solution delays e6 by e5, and the fourth solution
delays e6 by e5 and e9; moreover, both these solutions in-
troduce an additional trigger to Ar (which already had three
triggers), and thus are inferior according to our cost func-
tion.
Alternatively, the encoding conflicts can be solved using
signal insertion, by inserting a transition csc+ into the peak
and its counterpart outside the cores belonging to the peak,
preserving the consistency and ensuring that the cores are
destroyed. Recall that input signal transitions cannot be de-
layed by newly inserted transitions, i.e., in the peak csc+
cannot delay e3 and e13. In the second phase, the parts
of the prefix which are concurrent to or in structural
conflict with the inserted transition are faded out, as
consistency would be violated if csc− were inserted there. At
the same time, one can try to eliminate the remaining core
{e5, e9, e16, e18}. The valid signal insertions are shown in
the table in Fig. 9(d), where a sequential insertion is
designated by either ‘after’ or ‘before’ ei, inserting a signal
transition directly either after or before the transition
corresponding to ei. A concurrent signal transition insertion
is denoted by ‖ei to ej, where a signal transition is inserted
between ei and ej.
Solution 6 introduces the smallest delay (only ready−
is delayed), whereas solution 7 has the smallest estimated
logic complexity, but the largest delay (the insertion delays
ready+, Ar− and ready−). Solutions 9 and 11 have the
greatest estimated logic complexity. The equations for
solutions 6, 7 and 9 are presented in Fig. 9(b). One can see that
the equations for solutions 7 and 9 have the same number of
literals, even though their estimated logic complexities were
quite different. This shows that our cost function is not per-
fect, since C(n) is quite a rough estimate of complexity, and
since the cost function does not take the context signals into
account. However, it is not trivial to significantly improve
this cost function without introducing a considerable time
overhead in computing it. (In particular, the context signals
cannot be computed for a particular signal z until all the en-
coding conflicts for z are resolved.)
[Figure 9, panel (a): unfolding prefix drawing.]

(b)
    equations for solution 1
        ready = Laf
        Lr = start · Ad · Ar + Laf · (Ar + ready)
        Ar = Lam · Laf · (Ar + Ad)

    equations for solution 2
        ready = start · ready + Laf
        Lr = start · Ad · ready · Ar + Laf · (Ar + ready)
        Ar = Lam · Laf · (Ar + Ad)

    equations for solution 6
        ready = Laf + csc
        Lr = start · Ad · Ar · csc + Laf · (csc + Ar)
        Ar = Lam · Laf · (Ar + Ad)
        csc = start · csc + Laf

    equations for solution 7
        ready = csc
        Lr = Ar · (start · csc · Ad + Laf)
        Ar = Lam · Laf · (Ar + Ad) + Laf · csc
        csc = start · csc + Laf

    equations for solution 9
        ready = Laf + csc
        Lr = csc · (start · Ar · Ad + Laf)
        Ar = Lam · Laf · (Ar + Ad)
        csc = start · csc + Laf · Ar

(c) concurrency reduction

    #    causal constraint    ∆ω; ∆logic; ∆cores    lits
    1    h(e9) ⇢ h(e11)       3; -1; -5             11
    2    h(e5) ⇢ h(e11)       0; 2; -5              14
    3    h(e5) ⇢ h(e6)        1; 2; -5              14
    4    h(e9) ⇢ h(e6)        4; 2; -5              11
    5    h(e10) ⇢ h(e11)      3; 0; -1              n/a

(d) signal insertion

    #    phase 1       phase 2       ∆ω; ∆logic; ∆cores    lits
    6    ‖e3 to e11    before e16    1; 3; -5              16
    7    after e3      before e16    3; 2; -5              15
    8    before e6     before e16    2; 3; -5              16
    9    before e11    before e16    2; 4; -5              15
    10   after e3      after e9      3; 3; -5              18
    11   after e3      ‖e5 to e16    2; 4; -5              20

inputs: start, Lam, Laf, Ad; outputs: ready, Lr, Ar; internal: csc

Figure 9. Top level of the A/D converter: the unfolding
prefix with cores (a), a selection of equations corresponding
to valid transformations (b), and lists of possible concurrency
reductions (c) and signal insertions (d).

8 Conclusions and future work
This paper presents a combined framework for the resolution
of encoding conflicts in STG unfoldings. This framework
explores a larger design space and allows the designer
to exploit the area/delay tradeoff, which is crucial in synthesis
of many interface controllers, e.g., in the ‘glue logic’ between
IP cores of SoCs. Encoding conflicts are represented
by means of cores, which are sets of transitions ‘causing’
them. The advantage of using cores is that only those parts
of STGs which cause encoding conflicts, rather than the
complete list of CSC conflicts, are considered. Since the
number of cores is usually much smaller than the number
of encoding conflicts, this approach reduces the amount of
information to be analysed.
Moreover, a novel validity condition has been proposed
to justify these transformations, which is also of independent
interest. We have developed a sufficient condition for
the validity of a concurrency reduction on a general LPN, as
well as a simplified version of this condition for the case of
a non-auto-concurrent Petri net.
Future work will focus on the following issues:
• developing an algorithm for checking the validity of
a concurrency reduction and a signal insertion on safe
nets;
• improving the cost function;
• performing the transformations directly on the unfold-
ing prefix rather than the STG whenever possible, in
order to reduce the number of runs of the unfolding
algorithm.
Acknowledgements
We would like to thank Maciej Koutny and Walter Vogler
for helpful comments and suggestions. This research was
supported by the EPSRC grants GR/R16754 (BESST) and
GR/S12036 (STELLA).
References
[1] J. Carmona, J. Cortadella and E. Pastor: A Structural Encod-
ing Technique for the Synthesis of Asynchronous Circuits.
Fundamenta Informaticae 50(2) (2002) 135–154.
[2] T.-A. Chu: Synthesis of Self-Timed VLSI Circuits from
Graph-Theoretic Specifications. PhD Thesis, MIT/LCS/TR-
393 (1987).
[3] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno
and A. Yakovlev: PETRIFY: a Tool for Manipulating Con-
current Specifications and Synthesis of Asynchronous Con-
trollers. IEICE Transactions on Information and Systems
E80-D(3) (1997) 315–325.
[4] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno
and A. Yakovlev: Logic Synthesis of Asynchronous Control-
lers and Interfaces. Springer Verlag (2002).
[5] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno
and A. Yakovlev: Automatic Handshake Expansion and Re-
shuffling Using Concurrency Reduction. Proc. of HWPN’98,
(1998) 86–110.
[6] D. L. Dill: Trace Theory for Automatic Hierarchical
Verification of Speed-Independent Circuits. PhD Thesis 15213,
CMU (1987).
[7] J. Engelfriet: Branching Processes of Petri Nets. Acta Infor-
matica 28 (1991) 575–591.
[8] J. Esparza, S. Römer and W. Vogler: An Improvement of
McMillan’s Unfolding Algorithm. Formal Methods in Sys-
tem Design 20(3) (2002) 285–310.
[9] V. Khomenko: Model Checking Based on Prefixes of Petri
Net Unfoldings. PhD Thesis, School of Computing Science,
University of Newcastle upon Tyne (2003).
[10] V. Khomenko, M. Koutny and A. Yakovlev: Detecting State
Coding Conflicts in STGs Using Integer Programming.
Proc. of DATE’02, IEEE Computer Society Press (2002)
338–345.
[11] V. Khomenko, M. Koutny and A. Yakovlev: Detecting State
Coding Conflicts in STG Unfoldings Using SAT. Proc. of
ACSD’03, IEEE Computer Society Press (2003) 51–60. Full
version: Special Issue on Best Papers from ACSD’03,
Fundamenta Informaticae 62(2) (2004) 1–21.
[12] V. Khomenko, M. Koutny and A. Yakovlev: Logic Synthe-
sis for Asynchronous Circuits Based on Petri Net Unfold-
ings and Incremental SAT. Proc. of ACSD’04, IEEE Com-
puter Society Press (2004) 16–25. Full version: to appear in
Special Issue on Best Papers from ACSD’04, Fundamenta
Informaticae.
[13] D. J. Kinniment, B. Gao, A. Yakovlev and F. Xia: Towards
asynchronous A-D conversion. Proc. of ASYNC’00, IEEE
Computer Society Press (2000) 206–215.
[14] B. Lin, C. Ykman-Couvreur and P. Vanbekbergen: A Gen-
eral State Graph Transformation Framework for Asynchro-
nous Synthesis. Proc. of EURO-DAC’94, IEEE Computer
Society Press (1994) 448–453.
[15] A. Madalinski, A. Bystrov, V. Khomenko and A. Yakovlev:
Visualization and Resolution of Coding Conflicts in Asyn-
chronous Circuit Design. Proc. of DATE’03, IEEE Com-
puter Society Press (2003) 926–931. Full version: Special
Issue on Best Papers from DATE’2003, IEE Proceedings:
Computers & Digital Techniques 150(5) (2003) 285–293.
[16] H. Saito, A. Kondratyev, J. Cortadella, L. Lavagno and
A. Yakovlev: What Is the Cost of Delay Insensitivity? Proc. of
ICCAD’99, IEEE Computer Society Press (1999) 316–323.
[17] A. Semenov: Verification and Synthesis of Asynchronous
Control Circuits Using Petri Net Unfolding. PhD Thesis,
University of Newcastle upon Tyne (1997).
[18] I. Wegener: The Complexity of Boolean Functions. Wiley-
Teubner Series in Computer Science (1987).
