Logic synthesis of speed-independent circuits based on STG decomposition is a promising approach to tackle complexity problems like state-space explosion. Unfortunately, decomposition can result in components that in isolation have irreducible CSC conflicts. Generalising earlier work, we show how to resolve such conflicts by introducing internal communication between the components. The new algorithms are successfully applied to several benchmarks, including very complex STGs arising in the context of control resynthesis.
Introduction
Speed-independent (SI) circuits are an important class of asynchronous circuits with well-known advantages [3]. Signal Transition Graphs (STGs) are a formalism to specify the behaviour of SI circuits. They are interpreted Petri nets in which transitions are labelled with rising and falling edges of the circuit signals.
PETRIFY [3] and PUNF&MPSAT [6] are common tools for logic synthesis of SI circuits from STG specifications. PETRIFY explores the entire state space of the STG and therefore suffers from state-space explosion, while unfolding-based synthesis via MPSAT has complexity problems when solving large SAT problems. Although logic synthesis leads to very efficient circuit implementations (compared e.g. to syntax-directed translation [4]), it is only applicable to rather small specifications.
To cope with these complexity problems, we decompose the STG specification into several smaller component STGs and apply logic synthesis to each of them, see e.g. [9, 10]. In contrast, other approaches such as BALSA [4] derive a netlist that describes how to connect certain handshake components (HCs). Instead of connecting all these HCs, control resynthesis forms suitable clusters of control HCs and specifies their interface behaviour by an STG [5]. Such an STG is usually too complex for pure logic synthesis, but decomposition-based synthesis with our tool DESIJ [10] often succeeds.
The paper is organised as follows: in the next section, we introduce the basic concepts of STGs and their decomposition (including the generalised version). In Section 3, we briefly recapitulate how to avoid self-triggers in component STGs by introducing internal communication between components; furthermore, we suggest how to deal with general irreducible CSC conflicts. In the following section, we present structural techniques for introducing internal communication by correctly inserting new internal signals into the original specification and decomposing it anew. In Section 5, we present experimental results, including results for STGs arising in the context of control resynthesis. We draw conclusions in Section 6.
Due to lack of space, many details (esp. all proofs) had to be omitted, see the full version in [11] .
Basics
This section provides basic notions for STGs and their decomposition; for more details see [3, 10]. We assume acquaintance with multisets over a set A, i.e. functions $ms, ms' : A \to \mathbb{N}_0$ with, e.g., $a \in ms \Leftrightarrow ms(a) > 0$ and $(ms - ms')(a) = \max(0, ms(a) - ms'(a))$.
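Python's `collections.Counter` happens to implement exactly this truncated difference, so it is a convenient stand-in for multisets in small experiments. The following minimal sketch (the names `ms1`, `ms2` and `contains` are ours, for illustration only) shows membership and difference:

```python
from collections import Counter

# Multisets over A represented as Counters: ms[a] is the multiplicity ms(a).
ms1 = Counter({"a": 3, "b": 1})
ms2 = Counter({"a": 1, "b": 2, "c": 4})


def contains(ms: Counter, a) -> bool:
    """a is in ms  iff  ms(a) > 0."""
    return ms[a] > 0


# Counter subtraction keeps only positive counts, i.e.
# (ms1 - ms2)(a) = max(0, ms1(a) - ms2(a)).
diff = ms1 - ms2
```

Elements that do not occur simply have count 0 (`diff["c"] == 0`), matching $ms(a) = 0$.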
Petri Nets and STGs
A Petri net is a 6-tuple $N = (P, T, W, M_N, \Sigma, l)$ where P and T are disjoint finite sets of places and transitions. $W : P \times T \cup T \times P \to \mathbb{N}_0$ is the weight function and $M_N$ the initial marking, where a marking is a multiset of places, i.e. a function $P \to \mathbb{N}_0$ which assigns a number of tokens to each place. The marking of a set of places is defined as the sum of all individual markings. $\Sigma$ is a set of actions, and $l : T \to \Sigma \cup \{\lambda\}$ is the labelling function, where $\lambda$ denotes the empty word. If necessary, we write $P'$, $P_i$ etc. for the nets $N'$, $N_i$ etc. Analogous conventions apply later on.
Information about transition enabling and firing, structural and dynamic conflicts or the reachability graph $RG_N$ of N can be found e.g. in [10]. A place p is called a marked graph place or MG-place if $\sum_{t \in T} W(t,p) = 1 = \sum_{t \in T} W(p,t)$. Unmarked MG-places are not drawn; they are implicitly given by an arc between the respective transitions. A nonempty sequence $w = x_1 x_2 \ldots x_n$ of places and transitions without duplicates is a path (of N) if $W(x_i, x_{i+1}) > 0$ for $1 \le i < n$. Obviously, places and transitions have to alternate on a path. With an abuse of notation, we often consider a path as the set containing its elements, writing for example $p \in w$. A path w is a marked graph path or MG-path if every place of w is an MG-place. For a marking M, the marking $M(w)$ of a path w is defined as $M(w \cap P)$.
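To make these structural notions concrete, here is a small sketch. It assumes (our choice, not the paper's) that a net is given as a set of places, a set of transitions and a dictionary `W` mapping `(x, y)` pairs to arc weights, with missing pairs meaning weight 0; all function names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

Node = Hashable
Weights = Dict[Tuple[Node, Node], int]  # W(x, y); missing pairs have weight 0


def is_mg_place(p: Node, transitions: Set[Node], W: Weights) -> bool:
    """p is an MG-place iff sum_t W(t, p) = 1 = sum_t W(p, t)."""
    into = sum(W.get((t, p), 0) for t in transitions)
    out = sum(W.get((p, t), 0) for t in transitions)
    return into == 1 == out


def is_path(w: List[Node], W: Weights) -> bool:
    """Non-empty, no duplicates, and W(x_i, x_{i+1}) > 0 for 1 <= i < n."""
    return (len(w) > 0 and len(set(w)) == len(w)
            and all(W.get((x, y), 0) > 0 for x, y in zip(w, w[1:])))


def is_mg_path(w: List[Node], places: Set[Node], transitions: Set[Node], W: Weights) -> bool:
    """A path whose places are all MG-places."""
    return is_path(w, W) and all(is_mg_place(x, transitions, W) for x in w if x in places)


def path_marking(w: List[Node], places: Set[Node], M: Dict[Node, int]) -> int:
    """M(w) = M(w ∩ P): the total number of tokens on the places of w."""
    return sum(M.get(x, 0) for x in w if x in places)
```

For example, a place with two incoming arcs (a merge place) is not an MG-place, so any path through it fails `is_mg_path`.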
An STG is a tuple $N = (P, T, W, M_N, In, Out, Int, l)$, where $(P, T, W, M_N, Sig^\pm, l)$ is a Petri net, In, Out and Int are disjoint sets of input, output and internal signals, and $Sig = In \cup Out \cup Int$ is the set of all signals; signature refers to this partition. $Sig^\pm = Sig \times \{+, -\}$ is the set of rising and falling signal edges, which are denoted as $s^+$ and $s^-$ instead of $(s, +)$ and $(s, -)$, respectively. We write $s^\pm$ if it is not important or unknown which direction takes place; if such a term appears more than once in the same context, it always denotes the same direction. $Loc = Out \cup Int$ is the set of locally controlled or just local signals, which are produced by the STG.
Transitions labelled with input/output/... edges are called input/output/... transitions. Transitions with label λ (called dummy transitions) do not correspond to any signal change; in particular, they are not internal transitions. An example of an STG is shown in Fig. 1; input transitions are depicted by bold squares and local transitions by normal squares.
STGs are used for specifying the behaviour of speed-independent (SI) circuits. The idea is as follows: a reachable marking of the STG roughly corresponds to a state of the intended circuit (viz. the values of its signals). If some marking activates an output (or internal) edge, the circuit must produce the same edge if it is in a corresponding state and the environment of the circuit must be ready to receive it; if some marking activates an input edge, the environment is allowed to generate it and the circuit must be ready to receive it.
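For the first step from markings to circuit states, one defines the notion of state assignment: for an STG N, a state vector is a function $sv : Sig \to \{0, 1\}$, assigning a Boolean value to each signal. A state assignment assigns a state vector $sv_M$ to each marking M of $RG_N$ such that for every signal $x \in Sig$ and every pair of markings $M, M' \in RG_N$:
(1) if $M[t\rangle M'$ and $l(t) = x^+$, then $sv_M(x) = 0$ and $sv_{M'}(x) = 1$;
(2) if $M[t\rangle M'$ and $l(t) = x^-$, then $sv_M(x) = 1$ and $sv_{M'}(x) = 0$;
(3) if $M[t\rangle M'$ and $l(t) \notin \{x^+, x^-\}$, then $sv_M(x) = sv_{M'}(x)$.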
If such an assignment exists, it is uniquely defined by these properties¹, and the reachability graph and the underlying STG are consistent. Consistency is necessary for circuit synthesis, since no circuit can be synthesised from an inconsistent STG; in this paper, we assume that all STGs are consistent. The state graph of an STG is its reachability graph where each marking is annotated with its state vector; cf. Figure 1 (bottom), where only the state vectors are shown.

Figure 1. An STG modelling a simplified VME bus controller (top) and its state graph with a CSC conflict between the shaded states (bottom).
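Consistency can be checked, and the state assignment computed, by one breadth-first traversal of $RG_N$: an $x^+$ edge requires $x = 0$ before and $x = 1$ after, an $x^-$ edge the reverse, and every other edge leaves x unchanged. The sketch below assumes $RG_N$ is given explicitly as a list of labelled edges and that the initial state vector `sv0` is known; both the representation and the function name are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Tuple

Label = Optional[Tuple[str, str]]           # ("x", "+"), ("x", "-"), or None for λ
Edge = Tuple[Hashable, Label, Hashable]     # (M, l(t), M') for M [t> M'


def state_assignment(m0: Hashable, sv0: Dict[str, int],
                     edges: List[Edge]) -> Optional[Dict[Hashable, Dict[str, int]]]:
    """Propagate sv0 from the initial marking m0; return None if inconsistent."""
    succ: Dict[Hashable, List[Tuple[Label, Hashable]]] = {}
    for m, lab, m2 in edges:
        succ.setdefault(m, []).append((lab, m2))
    sv = {m0: dict(sv0)}
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        for lab, m2 in succ.get(m, []):
            new = dict(sv[m])                # signals not on the label stay unchanged
            if lab is not None:
                x, direction = lab
                before, after = (0, 1) if direction == "+" else (1, 0)
                if new[x] != before:         # x+ needs x = 0 before, x- needs x = 1
                    return None
                new[x] = after
            if m2 not in sv:
                sv[m2] = new
                queue.append(m2)
            elif sv[m2] != new:              # two routes to M' disagree
                return None
    return sv
```

Every reachable marking is dequeued exactly once and all its outgoing edges are checked, so a `None` result means no state assignment extends `sv0`.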
We now explain the important concept of Complete State Coding (CSC). If there is a state assignment, N has CSC if any two reachable markings $M_1$ and $M_2$ with the same state vector (i.e. $sv_{M_1} = sv_{M_2}$) enable the same local signals. Otherwise, N has a CSC conflict, cf. e.g. the shaded markings in Figure 1 (bottom), and no circuit can be synthesised directly. If CSC is violated, one tries to achieve it by inserting internal signals such that the state vectors of $M_1$ and $M_2$ differ while the external behaviour of the STG is unchanged; thus the internal signal insertion must be input proper [8], i.e. no input edge may be delayed by any internal edge. If CSC cannot be achieved this way, the conflict is called irreducible. Self-triggers are a special type of irreducible CSC conflict, characterised by a transition sequence $M_1 [t_1 t_2\rangle M_2$, where $t_1, t_2$ are labelled with complementary edges of the same input signal, and $M_2$ does not activate the same local signal edges as $M_1$ (note that there must be a place in $t_1^\bullet \cap {}^\bullet t_2$). We call $t_1$ the entry and $t_2$ the exit transition.

¹ At least for every signal $s \in Sig$ which actually occurs.
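Given a state graph, CSC conflicts can be found by grouping reachable markings by their state vector. In this minimal sketch, we assume (hypothetically) that state vectors are tuples over a fixed signal order and that, for each marking, the set of enabled local signals has already been computed:

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple


def csc_conflicts(sv: Dict[Hashable, Tuple[int, ...]],
                  enabled_local: Dict[Hashable, FrozenSet[str]]) -> List[Tuple[Hashable, Hashable]]:
    """Pairs of reachable markings with equal state vectors
    but different sets of enabled local signals."""
    by_vector: Dict[Tuple[int, ...], List[Hashable]] = {}
    for m, v in sv.items():
        by_vector.setdefault(v, []).append(m)
    conflicts = []
    for markings in by_vector.values():
        for i, m1 in enumerate(markings):
            for m2 in markings[i + 1:]:
                if enabled_local[m1] != enabled_local[m2]:
                    conflicts.append((m1, m2))
    return conflicts
```

An empty result means N has CSC; this purely state-based check cannot tell whether a conflict is irreducible.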
Now, we present a number of operations for decomposition. Lambdarising a signal means to change the labelling function such that all transitions corresponding to this signal are labelled with λ, and to remove this signal from the signature. By contrast, hiding a signal set $H \subseteq Out$ from an STG N results in the STG $N/H = (P, T, W, M_N, In, Out \setminus H, Int \cup H, l)$, i.e. some output signals are now considered to be internal. Most important for decomposition is transition contraction:

Definition 2.1 (Transition Contraction) Let N be a Petri net and $t \in T$ with $l(t) = \lambda$, ${}^\bullet t \cap t^\bullet = \emptyset$ and $W(p,t), W(t,p) \le 1$ for all $p \in P$. The t-contraction $N'$ of N is obtained by removing t and replacing its pre- and postset by their cartesian product, as shown in Fig. 2. A transition contraction is called secure if either
We now define redundant transitions and implicit places; the deletion of such a transition or place, respectively (including the incident arcs), is another operation that can be used in our decomposition algorithm. A transition t is redundant if either it is a λ-transition with $W(p,t) = W(t,p)$ for each place p (i.e. t is a loop-only transition), or there is another transition $t'$ with the same label such that $W(p,t) = W(p,t')$ and $W(t,p) = W(t',p)$ for each place p (i.e. t is a duplicate transition).
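Two of the reduction operations can be sketched on a net stored as a weight dictionary (a hypothetical representation). For the contraction, we assume the usual cartesian-product construction in which each new place $(p, p')$ with $p \in {}^\bullet t$, $p' \in t^\bullet$ inherits the arcs and tokens of both p and p'; the caller is assumed to have checked $l(t) = \lambda$. The second function flags loop-only λ-transitions and duplicates, keeping the first of each group of duplicates.

```python
from itertools import product
from typing import Dict, Hashable, Optional, Set, Tuple

Arcs = Dict[Tuple[Hashable, Hashable], int]  # W(x, y); missing pairs mean weight 0


def contract(places: Set, transitions: Set, W: Arcs, M: Dict, t):
    """Sketch of the t-contraction of a λ-transition t (Definition 2.1)."""
    pre = {p for p in places if W.get((p, t), 0) > 0}
    post = {p for p in places if W.get((t, p), 0) > 0}
    if pre & post or any(W.get((p, t), 0) > 1 or W.get((t, p), 0) > 1 for p in places):
        raise ValueError("t is not contractible")
    touched = pre | post
    new_P = places - touched
    new_T = transitions - {t}
    # keep all arcs that neither touch t nor a place of its pre-/postset
    new_W = {(x, y): w for (x, y), w in W.items()
             if t not in (x, y) and x not in touched and y not in touched}
    new_M = {p: M.get(p, 0) for p in new_P}
    for p, p2 in product(pre, post):
        pp = (p, p2)                          # product place replacing p and p2
        new_P.add(pp)
        new_M[pp] = M.get(p, 0) + M.get(p2, 0)
        for s in new_T:
            w_in = W.get((s, p), 0) + W.get((s, p2), 0)
            w_out = W.get((p, s), 0) + W.get((p2, s), 0)
            if w_in:
                new_W[(s, pp)] = w_in
            if w_out:
                new_W[(pp, s)] = w_out
    return new_P, new_T, new_W, new_M


def redundant_transitions(places: Set[str], transitions: Set[str], W: Arcs,
                          label: Dict[str, Optional[str]]) -> Set[str]:
    """Loop-only λ-transitions and duplicates (label None stands for λ)."""
    order = sorted(places)
    seen: Dict[tuple, str] = {}
    redundant: Set[str] = set()
    for t in sorted(transitions):
        pre = tuple(W.get((p, t), 0) for p in order)
        post = tuple(W.get((t, p), 0) for p in order)
        if label[t] is None and pre == post:    # λ-transition with loops only
            redundant.add(t)
        elif (label[t], pre, post) in seen:     # duplicate of an earlier transition
            redundant.add(t)
        else:
            seen[(label[t], pre, post)] = t
    return redundant
```

Contracting t in a chain a, p1, t, p2, b merges p1 and p2 into the single place (p1, p2), which keeps the token of p1.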
A place p is implicit if it can be removed from the net without changing the set of firing sequences. However, detecting implicit places is PSPACE-complete², and during decomposition only redundant places (which are implicit) are deleted.

Definition 2.2 (Redundant Places) The place q is (structurally) redundant [1] if there is a set of places Q, called reference set, with $q \notin Q$, a valuation $V : Q \cup \{q\} \to \mathbb{N}$ and some $d \in \mathbb{N}_0$ which satisfy the following properties for all transitions t:
We conclude by introducing some operations for signal insertion. In particular, gyroscope insertion essentially inserts what is known as a toggle transition.

Definition 2.3 (Place-refinement, subnet-contraction, gyroscope insertion [10]) Let N be an STG, $p \in P$.
(1) Consider a net $N'$ (cf. Figure 3) with:
In $N'$, the labels of the new transitions and their signature can be chosen arbitrarily. Starting from N, $N'$ is called a place-refinement of p with initial marking (in, out); starting from $N'$, N is called a subnet-contraction if $g_1$ and $g_2$ are λ-transitions.
(2) A gyroscope insertion with initial marking (in, out) inserts a new implicit place $p \notin P$ with in + out tokens into N (giving the intermediate $N'$) and applies place-refinement with initial marking (in, out) to it (giving $N''$).
A gyroscope insertion is called an input/output/internal gyroscope insertion if $g_1$ and $g_2$ are labelled in the resulting net with $s^+$ and $s^-$, respectively, and s is a fresh input, output or internal signal, respectively; it is called a dummy gyroscope insertion if $g_1$ and $g_2$ are labelled with λ.
STG Decomposition
For the STG decomposition algorithm of [9], a partition of the output signals of the given specification STG N is chosen, and the algorithm decomposes N into component STGs, one for each set in this partition. For each component, equations for its outputs are derived from the component's own state graph instead of the state graph of N, and the resulting circuits, put in parallel, implement N.
Very often, the total number of states over all component state graphs is much smaller than the state count of N, in which case the decomposition can be seen as successful. It can already be beneficial if each component state graph is smaller than that of N, in particular for reducing peak memory usage.
Of course, the behaviour of the specification should be preserved in some sense; this correctness is captured by a variant of bisimulation tailored to the specific needs of asynchronous circuits, see e.g. [9].
We now discuss our decomposition algorithm in more detail. In the following, we assume a given deterministic, consistent specification N without internal signals (which can be considered as outputs) and, for this paper, with arcs of weight 1 only. First, one chooses a feasible partition, i.e. a family $(In_i, Out_i)_{i \in I}$ for some set I such that the sets $Out_i$ form a partition of Out, $In_i \subseteq Sig \setminus Out_i$ for each i, and furthermore:
• If two output signals $x_1, x_2$ are in structural conflict in N, then they have to be in the same $Out_i$.
• If there are t,t ∈ T with t ∈ (t
Observe: if we have a feasible partition, we can build another feasible one by adding additional input signals to one of the members.
For each member $(In_i, Out_i)$ of the partition, an initial component is generated from N: in a copy of the original STG N, every signal not in $In_i \cup Out_i$ is lambdarised, and the signals in $In_i$ are considered as inputs of this component, even if they are outputs of N. Then reduction operations are applied to an initial component until no λ-labelled transitions remain; currently, the reduction operations in use are: secure contraction of a λ-labelled transition, deletion of an implicit place, and deletion of a redundant transition (cf. Sec. 2.1). The decomposition in Figure 4 is based on the output partition {{x}, {y}}. Since signal a is the only syntactical trigger for x as well as for y, we get two components $C_r$ and $C_d$ with $In_r = In_d = \{a\}$, $Out_r = \{y\}$ and $Out_d = \{x\}$. In the initial component for $C_r$, the x-labelled transition is lambdarised and then contracted, and similarly for $C_d$.
If one cannot contract all λ -transitions, backtracking is applied: the former label of a non-contractible λ -transition is added as a new input to the component and the new corresponding initial component is derived and reduced from the beginning.
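The outer loop of the algorithm, including backtracking, can be sketched as follows. The function `reduce_component` is a hypothetical stand-in for building and reducing an initial component: it either returns a component or names the former signal of a λ-transition that could not be contracted, which is then added as a new input before restarting from N.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Signals = FrozenSet[str]
# Hypothetical reducer: returns (component, None) on success,
# or (None, s) if a λ-transition formerly labelled with signal s is not contractible.
Reducer = Callable[[Signals, Signals], Tuple[Optional[object], Optional[str]]]


def decompose(partition: List[Tuple[Signals, Signals]],
              reduce_component: Reducer) -> List[Tuple[Signals, Signals, object]]:
    components = []
    for ins, outs in partition:
        while True:
            comp, blocking = reduce_component(ins, outs)
            if comp is not None:
                components.append((ins, outs, comp))
                break
            if blocking in ins or blocking in outs:
                raise RuntimeError(f"cannot resolve non-contractible signal {blocking}")
            # backtracking: add the former label as a new input and restart from N
            ins = ins | {blocking}
    return components
```

Since every backtracking step adds a new input signal, the loop terminates for a finite signal set; the guard only protects against a reducer that keeps reporting a signal that is already present.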
In principle, every so-called totally admissible operation can be used for reduction. It is proven in [9] that the decomposition algorithm using arbitrary totally admissible operations always returns a correct decomposition. Here, it is enough to know that the three operations above as well as subnet-contraction from Definition 2.3 are totally admissible.

Now we present a new generalisation of the decomposition algorithm, where we relax the definition of a feasible partition, i.e. we can apply essentially the same algorithm to additional starting points. We still consider the same kind of specification N as above and define a quasi-feasible partition as a family $(In_i, Out_i)_{i \in I}$ as above, but change the first requirement to
• If two output signals $x_1, x_2$ are in dynamic conflict in N, then they have to be in the same $Out_i$.
Theorem 2.4
Also when starting from a quasi-feasible partition, decomposition of a deterministic N results in deterministic components that form a correct implementation of N. If only the operations listed in this paper are used and N is consistent (free of dynamic io-conflicts, resp.), then so are the components.

The proof follows the lines of the proof of Theorem 4.1 in [9]. Carefully checking all eight (sub)proofs, one sees that mostly no essential changes are needed; but proving that secure transition contraction is still totally admissible requires considerable effort to fill two proof gaps in Lemmas 4.3 and 4.4, see [11].
Inserting Internal Communication
Pure application of STG decomposition can lead to irreducible CSC conflicts. Ignoring the clouds in Figure 4, component $C_r$ (including place $p_{st}$) has a self-trigger corresponding to $t_{en}$ and $t_{ex}$ (see the shaded states in (b)), although the initial specification N has none. The key to avoiding such a self-trigger by internal communication is to identify a delay transition in N. Consider a self-trigger of $C_r$ between the markings $M_1$ and $M_2$, i.e. there is a transition sequence $M_1 [t_{en} t_{ex}\rangle M_2$ and $(t_{en}, t_{ex})$ is the entry/exit transition pair of the self-trigger. Intuitively, the fast occurrence of $t_{ex}$ can lead to malfunction of the circuit; our aim is to delay it.

Definition 3.1 (Delay Transition) A transition $t_d$ of N is called a delay transition w.r.t. a transition pair $(t_{en}, t_{ex})$ if $t_d$ is labelled with a local (i.e. non-input) signal edge and $t_d$ occurs in all transition sequences $t_{en} \ldots t_{ex}$ enabled under a reachable marking of N.
Intuitively, if $t_{ex}$ fires after $t_{en}$, then $t_d$ must fire in between, delaying $t_{ex}$ as desired.
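Definition 3.1 can be checked directly on an explicit reachability graph: $t_d$ is a delay transition if, starting from every marking reached by an occurrence of $t_{en}$, no occurrence of $t_{ex}$ is reachable along edges other than $t_d$. This sketch assumes $RG_N$ is available as a list of `(M, t, M')` edges over transition names (a hypothetical representation); in practice the state space is exactly what decomposition tries to avoid, which motivates the structural criterion of Section 4.1.

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, Hashable]  # (M, t, M') for M [t> M'


def is_delay_transition(edges: List[Edge], t_en, t_ex, t_d, local: Set[Hashable]) -> bool:
    """t_d is local and occurs in every sequence t_en ... t_ex of RG_N."""
    if t_d not in local:
        return False
    succ: Dict[Hashable, List[Tuple[Hashable, Hashable]]] = {}
    for m, t, m2 in edges:
        succ.setdefault(m, []).append((t, m2))
    start = {m2 for m, t, m2 in edges if t == t_en}   # markings right after t_en
    seen, queue = set(start), deque(start)
    while queue:
        m = queue.popleft()
        for t, m2 in succ.get(m, []):
            if t == t_ex:
                return False          # t_ex can follow t_en without t_d in between
            if t != t_d and m2 not in seen:
                seen.add(m2)
                queue.append(m2)
    return True
```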
In the next sections, we only consider self-trigger avoidance, since self-triggers appear very often in our benchmark examples, cf. Section 5. Furthermore, a self-trigger is the most severe type of irreducible CSC conflict (w.r.t. our approach), since there is only one transition pair $(t_{en}, t_{ex})$ for which a delay transition has to be found. For general irreducible CSC conflicts, there might be several opportunities:
Consider the two critical components in 
consist of input events only. In principle, MPSAT and PETRIFY are able to report these traces, and they can be mapped to corresponding firing sequences $v_1$ and $v_2$; observe that $v_1$ could even be empty, as in Figure 5b. Following [7], it is enough to consider conflict types as in Figure 5, since other conflicts can be reduced to these; Figure 5a refers to a typical type II conflict (according to [7]) and 5b to a type I conflict. Figure 5 shows that there are usually several pairs $(t_a, t_b)$, like an entry/exit transition pair, where one can look for a delay transition that fires between both transitions. For example, in Figure 5a we can identify the pairs $(t_2, t_3)$ and $(t_4, t_5)$, and in Figure 5b $(t_3, t_4)$, $(t_5, t_6)$, $(t_3, t_5)$ etc. Irreducible CSC conflicts can be avoided if our approach succeeds for just one of those transition pairs. Hence, the focus on self-triggers (which exhibit only one possibility for identifying an entry/exit transition pair) is not a real restriction.
Self-Trigger Avoidance
In [10], it was shown how to avoid a self-trigger even without recalculating the critical and the delay component, but only under strong restrictions: for a critical component with a self-trigger as in Figure 4b, the corresponding specification N must have a structure as in Figure 6a. There must be a restricted MG-path w from a transition $t_1$ labelled with a local signal edge of the delay component to the exit transition $t_{ex}$ via the entry transition ($t_{en}$) and the delay transition ($t_d$); namely, all transitions of w between $t_1$ and $t_{en}$ (excluding $t_1$) must have only one incoming arc, and all transitions between $t_d$ and $t_{ex}$ (excluding $t_{ex}$) must have only one outgoing arc. The sub-STG between $t_{en}$ and $t_d$ must be a path on which all transitions and all places have exactly one incoming and one outgoing arc. These strong restrictions make it relatively easy to find a suitable path, if it exists.
Here, we will deal with much more general cases like the one in Figure 6b , accepting a second reduction pass for the critical and the delay component. In Section 4.1, we propose a structural method to identify a delay transition for a given self-trigger, and we present a structural technique to insert an implicit place into a Petri net as a necessary step for the internal gyroscope insertion into N in Section 4.2. In Section 4.3, we propose how to avoid uncontrollable growth of the delay component by introducing internal communication between the critical and several auxiliary components.
How to Identify a Delay Transition
A delay transition is defined based on traces of N (Definition 3.1), but for efficiency we need a structural condition to identify such a transition. Since it is difficult to find a really sufficient condition (only satisfied by delay transitions) or a necessary condition that is not too restrictive, we propose an indication for a delay transition and just speak of a delay candidate:

Definition 4.1 (Delay Candidate) For a specification N and an entry/exit transition pair $(t_{en}, t_{ex})$, a delay candidate $t_d$ is a local transition on a path $w = w_1 w_2$ starting from $t_{en}$ and leading to $t_{ex}$, where $w_1$ is a path from $t_{en}$ to $t_d$ ($t_{en} \neq t_d$), while $w_2$ is a path from $t_d$ to $t_{ex}$ without merge places, i.e. $\forall p \in w_2 : |{}^\bullet p| = 1$. No transition on $w \setminus \{t_{en}, t_{ex}\}$ has its signal in $Sig_r$.
Observe that the path $w_2$ could even be empty, and then there are no restrictions on the specification's structure. Having no merge places on $w_2$ indicates that $t_d$ has to fire before $t_{ex}$, except that the initial marking of $w_2$ might allow $t_{ex}$ to fire initially without firing $t_d$. The last condition reflects the idea that w is contracted to $p_{st}$ in $C_r$ (cf. Figure 6b), i.e. none of its transitions except $t_{en}$ and $t_{ex}$ is r-relevant; the signals of $C_r$ are called r-relevant, in particular when considered in the context of the full STG N.
Although a delay candidate is not necessarily a delay transition according to Definition 3.1, our criterion is sufficient for many benchmark examples, see Section 5.

Figure 6. Specifications N yielding at least one delay candidate $t_d$ for a potential self-trigger, given by $t_{en}$ and $t_{ex}$; $p_{st} \in t_{en}^\bullet \cap {}^\bullet t_{ex}$ sketches a place of $C_r$ as a result of reducing all elements in w, cf. also Figure 4b.
as well as all arcs to $t_{en}$³ in N, resulting in $N'$. Second, we apply an ordinary depth-first search (DFS) starting at $t_{en}$ in order to determine all transitions in $N'$ that are reachable from $t_{en}$; the others are removed as well. One could also perform a breadth-first search and store the distances from $t_{en}$; this would help to find a delay candidate close to $t_{en}$, see below.
Third, in $N'$ a backward search starting at $t_{ex}$ is applied to find all paths $w_2$ from a delay candidate $t_d$ to $t_{ex}$; the search can be restricted, because no merge places p should be visited (i.e. $|{}^\bullet p| = 1$ must hold). To find all such paths, we modify the backward-directed DFS: when going backward, vertices are marked as visited as usual (this avoids repeated vertices on a path), but when returning from the recursion, the mark is erased (so the vertex can be used in other paths). One can also consider just one or a few paths, hoping that the respective candidates will help to avoid the self-trigger.
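The modified backward DFS can be sketched as follows, assuming (hypothetically) that presets are given as dictionaries `pre_t` (for transitions) and `pre_p` (for places), and that the transitions carrying r-relevant signals are known. Marks are set on the way down and erased on return, so a vertex may appear on several enumerated paths; every local transition reached yields one candidate path $w_2$.

```python
from typing import Dict, List, Set


def backward_paths(t_ex: str, t_en: str,
                   pre_t: Dict[str, Set[str]],     # •t for every transition t
                   pre_p: Dict[str, Set[str]],     # •p for every place p
                   local: Set[str], r_relevant: Set[str]) -> List[List[str]]:
    """Enumerate paths w2 = t_d ... t_ex without merge places (|•p| = 1),
    where t_d is local and no transition before t_ex is r-relevant."""
    results: List[List[str]] = []
    visited: Set[str] = set()

    def dfs(node: str, suffix: List[str]) -> None:
        visited.add(node)
        for p in pre_t.get(node, set()):
            if p in visited or len(pre_p.get(p, set())) != 1:
                continue                      # never visit merge places
            (t,) = pre_p[p]                   # the unique transition in •p
            if t in visited or t == t_en or t in r_relevant:
                continue
            path = [t, p] + suffix
            if t in local:
                results.append(path)          # t is a possible delay candidate
            visited.add(p)
            dfs(t, path)
            visited.discard(p)
        visited.discard(node)                 # erase the mark on return

    dfs(t_ex, [t_ex])
    return results
```

Each candidate still needs a path $w_1$ from $t_{en}$, which is the subject of the next step.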
After a potential path $w_2$ is found, we apply a backward-directed breadth-first search (BFS) to find a shortest path $w_1$ from $t_{en}$. For this, all the vertices of $w_2$ must initially be marked as visited. $t_d$ is a delay candidate only if $t_{en}$ is backward-reachable. The idea is that the algorithm in Section 4.2 starts at $t_{en}$ and tries to reach some $t_d$ quickly; hence, $t_d$ should be chosen such that $w_1$ is short. Usually, there are several possible paths $w_2$ and, hence, several delay candidates. In the future, we will investigate how the delay candidate selection influences synthesis time and the resulting circuit area and performance.

³ The arcs from $t_{ex}$ do not matter for Subsection 4.1.1.
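The BFS for $w_1$ can be sketched with a predecessor map `pred` (a hypothetical name: `pred[x]` is the preset of place or transition x). The vertices of $w_2$ other than $t_d$ are marked visited before the search starts, and parent pointers reconstruct the shortest path in forward order.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def shortest_w1(t_en: str, t_d: str, w2: List[str],
                pred: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Backward BFS from t_d; returns a shortest path t_en ... t_d, or None."""
    visited = set(w2) - {t_d}                 # vertices of w2 are blocked initially
    parent: Dict[str, Optional[str]] = {t_d: None}
    queue = deque([t_d])
    while queue:
        x = queue.popleft()
        if x == t_en:
            path: List[str] = []
            node: Optional[str] = x
            while node is not None:           # follow parents towards t_d
                path.append(node)
                node = parent[node]
            return path                       # t_en ... t_d in forward order
        for y in pred.get(x, set()):
            if y not in visited and y not in parent:
                parent[y] = x
                queue.append(y)
    return None                               # t_en not backward-reachable
```

A `None` result corresponds to the case where $t_d$ is not a delay candidate for this $w_2$.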
Inserting Implicit Places into a Petri Net
For a given entry transition $t_{en}$ and a delay candidate $t_d$ (with a path $w_1$ from $t_{en}$), the algorithm in Figure 7 inserts a redundant place q (w.r.t. some set Q) into an unweighted STG N such that $t_{en} \in {}^\bullet q$ and $t_d \in q^\bullet$.
The insertion of q is initiated by calling the function insertImplicitPlace. In line 9, a forward traversal in N from $t_{en}$ towards $t_d$ is started by calling the place function with the unique place p as argument such that $p \in w_1$ and $p \in t_{en}^\bullet$. Every time the function place is processed, as well as in lines 35 or 41, a redundant place q could be inserted via the operations specified in lines 10, 11 and 12. The redundancy of q is assured, since for every call of place and every new place p of Q, ${}^\bullet p$ as well as $p^\bullet$ is added to the potential preset (preq) and postset (postq) of q, except for loop transitions, see lines 33 and 34 (for a proof see [11]). Consequently, for every token added to p a token is added to q, and for every token removed from p a token is removed from q, too. If the redundant place q were unsafe, we would get a CSC conflict due to firing two edges of the new signal ic; thus p is not added if it would lead to two tokens on q initially (line 31).

Figure 7. Algorithm INSERTIMPLICIT for inserting an implicit place q. It is assumed that $t_{en}$, $t_{ex}$ and $t_d$ are known to the algorithm as global constants.

Figure 8. Functions checkTransitionF and checkTransitionB, which return two Booleans, stop and value. If stop = false, the traversal should be continued and t is not valid (value = false). Otherwise, value says whether t is valid (successful termination) or not; in the latter case, place has to backtrack and remove its current p from Q.
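The bookkeeping behind the redundancy argument can be sketched with multisets. This is not INSERTIMPLICIT itself (the traversal and the checkTransition functions are omitted); it assumes, hypothetically, that the set Q has already been chosen and shows how preq and postq accumulate the presets and postsets of the places in Q, how (generalised) loop transitions cancel out, and how an initially unsafe q is rejected.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def place_q(Q: Iterable[str], pre: Dict[str, Set[str]], post: Dict[str, Set[str]],
            M0: Dict[str, int]) -> Tuple[Counter, Counter, int]:
    """pre[p] = •p and post[p] = p• for each place p in Q."""
    preq, postq = Counter(), Counter()
    for p in Q:
        preq.update(pre[p])     # every token entering p also enters q
        postq.update(post[p])   # every token leaving p also leaves q
    # loop transitions: keep only the net effect of each transition on q
    preq, postq = preq - postq, postq - preq
    m_q = sum(M0.get(p, 0) for p in Q)
    if m_q > 1:
        raise ValueError("q would carry two tokens initially (unsafe)")
    return preq, postq, m_q
```

For the chain $t_{en}\, p_1\, t_1\, p_2\, t_d$ with $Q = \{p_1, p_2\}$, $t_1$ cancels and we obtain ${}^\bullet q = \{t_{en}\}$ and $q^\bullet = \{t_d\}$; if $t_1$ feeds one place of Q but consumes from two, only preq$(t_1) = 0$, postq$(t_1) = 1$ remains, as in Figure 9c.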
To make q useful, the preset and postset of q must fulfil some application-specific requirements, which are tested by calling transitionF for each transition in $p^\bullet$ (lines 36, 37) and transitionB for each transition in ${}^\bullet p$ (lines 38, 39); F and B indicate a forward or backward traversal, respectively. The functions checkTransition{F/B} in lines 16 and 25 are used to terminate the traversal at certain transitions, see Figure 8. They are tuned to make $q^\bullet$ consist of local transitions only (including $t_d$), such that the later gyroscope refinement of q with a new internal signal ic is input proper according to [8], i.e. no input will be delayed by ic. Furthermore, ${}^\bullet q$ should consist of $t_{en}$ and other transitions labelled with r-relevant edges only, such that the recalculated critical component $C_r$ gets no additional relevant signals, except for ic as an additional output; this avoids uncontrollable growth of $C_r$.
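The two requirements can be stated as a predicate on a finished place q. This is only a post-hoc check, not the checkTransition functions of Figure 8, which enforce the requirements during the traversal; the mapping `signal` from transitions to signal names and all other names are hypothetical.

```python
from typing import Dict, Iterable, Set


def q_is_useful(preq: Iterable[str], postq: Iterable[str], t_en: str, t_d: str,
                signal: Dict[str, str], inputs: Set[str], sig_r: Set[str]) -> bool:
    pre, post = set(preq), set(postq)
    return (t_en in pre and t_d in post
            # q• local only: refining q with ic then delays no input (input proper)
            and all(signal[t] not in inputs for t in post)
            # •q r-relevant only: the recalculated C_r gains just ic as an output
            and all(t == t_en or signal[t] in sig_r for t in pre))
```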
If checkTransition{F/B} does not signal that the search should stop, transition{F/B} calls the place function again to find a suitable place in the postset or preset, respectively, of t. With such a call, the post- and preset of q are extended until they fulfil the requirements specified in checkTransition{F/B}. Note that the algorithm is correct for arbitrary functions checkTransition{F/B}, see Proposition 4.2 below.
To ensure that $t_d \in q^\bullet$, the algorithm applies a forward traversal straight to $t_d$ as long as the delayFound flag is not set, see lines 18-20. Similarly, in the implementation of place, the next transition t on $w_1$ is chosen first in line 36 (not shown). Eventually, the flag will be set by checkTransitionF, and then all the other possible paths starting from the places in $w_1$ will be traversed in forward and backward direction.
The algorithm works on a net N and constructs a copy $N'$ extended with q in lines 10, 11 and 12.

Figure 9. Examples for INSERTIMPLICIT: the loops in lines 36 and 38 process the respective nodes from left to right; at transitions labelled with F the forward search terminates successfully, and analogously for B.
We illustrate how INSERTIMPLICIT works with the help of the examples shown in Figure 9. For the example in (a), $w_1$ is $t_{en}\, p_1\, t_1\, p_2\, t_d$; thus, the algorithm calls place$(p_1, t_{en})$ (among other things, $t_1$ is added to postq), transitionF$(t_1)$, place$(p_2, t_1)$ (among other things, $t_1$ is removed from postq) and transitionF$(t_d)$. Now the flag delayFound is set and the last call returns successfully. Next, transitionF$(t_4)$ does not terminate directly, since $t_4$ is an input transition, but the next call of transitionF returns successfully as well. Going back to the call place$(p_1, t_{en})$, transitionB$(t_2)$ and then transitionB$(t_3)$ are performed. In the end, a redundant place q with ${}^\bullet q = \{t_{en}, t_3\}$ is inserted.
Next, consider Figure 9b, first without transition $t_5$. The forward traversal reaches $t_d$, and then transitionF$(t_1)$ and place$(p_3, t_1)$ are called. Here, $t_4$ is added to preq and postq in line 33, and removed again in line 34. Thus, for $t_4$ neither transitionF nor transitionB is called in lines 37 or 39. Eventually, the algorithm inserts a redundant place q with ${}^\bullet q = \{t_{en}, t_3\}$.
Now consider transition $t_5$; when reaching $p_3$, we get the calls transitionF$(t_5)$, place$(p_1, t_5)$ and transitionF$(t_1)$. Here, the check requires the search to continue ($t_1$ is an input transition), but the algorithm fails in line 21, since there is no place left in $t_1^\bullet \setminus Q$. Note that repeatedly traversing places in Q, like $p_3$, would prevent the termination of the algorithm. Figure 9c indicates a situation where $t_1$ has already been visited twice by forward traversals and declared suitable for $q^\bullet$ via line 7 of checkTransitionF. Now place $p_1$ is reached via a call place$(p_1, t)$, and $t_1$ is added to preq in line 33; this would make $t_1$ a (generalised) loop transition for q. But after line 34, preq$(t_1) = 0$ and postq$(t_1) = 1$, and due to the condition in line 38, transitionB is not called for $t_1$; the search terminates by returning true.

Proposition 4.2 For any terminating checkTransition{F/B} operation without side effects on N, Q, preq or postq, the algorithm INSERTIMPLICIT terminates such that the place q is implicit in $N'$. If false is returned, no place q is inserted.
Gyroscope Insertion
Once we have inserted the implicit place q as proposed in Section 4.2, we perform an output gyroscope refinement introducing ic; then we modify the components as necessary and finally hide ic, such that we have introduced internal communication that avoids the self-trigger in $C_r$. A potential problem can be seen when considering Figure 6c.
Assume the unmodified delay component only produces output signal x, another component produces signal s, and we have identified delay transition $t_d$ for the self-trigger between $t_{en}$ and $t_{ex}$. Observe that $q^\bullet$ contains not only $t_d$, but also $t_1$; thus $t_d$ and $t_1$ are in structural conflict because of q, and after q's refinement the conflict is caused by the gyroscope place $p_{out}$ (see also Figure 3). Using feasible partitions, the component for x now has to produce s as well; we get a new component combining the delay component $C_d$ and the component generating signal s. This effect can lead to large delay component STGs from which no circuit can be synthesised because of their complexity.
Thus, for the second reduction pass, we use a quasi-feasible partition as studied in Subsection 2.2. This allows us to keep the original feasible partition, only adding ic as an output for $C_r$ and as an input for all components producing a signal of a transition in the postset of q; these components include $C_d$, and the others are called auxiliary components; cf. algorithm AVOID in Figure 10, which generalises AVOID-0 from [10] since it deals with several auxiliary components in addition to the delay component. This algorithm does not necessarily produce a correct decomposition, but a failure can be recognised:

Proposition 4.3 AVOID is correct in the following sense: if $(C_i)_{i \in I}$ was obtained from N by decomposition (and hence is correct w.r.t. N), then either the partition used by AVOID is quasi-feasible and $((C_i)_{i \in I'}, C'_r, C'_d, (C'_j)_{j \in J})$ with $I' = I \setminus (\{r, d\} \cup J)$ is correct w.r.t. $N'$, which in turn is correct w.r.t. N when hiding {ic}, or there is some self-trigger with the two transitions of ic in $C'_d$.

Figure 11 depicts the resulting circuit architecture when applying the algorithm AVOID (Figure 10) to the specification in Figure 6b (see 6c for the insertion of the implicit place q). Instead of using a feasible partition that enforces a possibly complex delay component generating both signals x and s, we can use a quasi-feasible partition yielding a delay component generating x and an auxiliary component generating signal s only.
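The partition update performed before the second reduction pass can be sketched as below. The representation (a dictionary from component names to `(In_i, Out_i)` pairs), the parameter `postq_signals` (the signals of the transitions in $q^\bullet$) and the default name `"ic"` are all hypothetical; recomputing the components and detecting the failure case of Proposition 4.3 are not shown.

```python
from typing import Dict, Set, Tuple

Partition = Dict[str, Tuple[Set[str], Set[str]]]  # component name -> (In_i, Out_i)


def avoid_partition(partition: Partition, r: str, d: str,
                    postq_signals: Set[str], ic: str = "ic") -> Tuple[Partition, Set[str]]:
    """Keep the original partition, adding ic as an output of C_r and as an input
    of every component producing a signal of a transition in q•."""
    new = {name: (set(ins), set(outs)) for name, (ins, outs) in partition.items()}
    new[r][1].add(ic)                      # C_r produces ic
    auxiliary: Set[str] = set()
    for name, (ins, outs) in new.items():
        if name != r and outs & postq_signals:
            ins.add(ic)                    # this component consumes ic
            if name != d:
                auxiliary.add(name)        # not C_d: an auxiliary component
    return new, auxiliary
```

For the situation of Figure 6c, the components producing x and s both receive ic as input, and the component for s becomes an auxiliary component instead of being merged into the delay component.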
In general, the components have many self-triggers, so AVOID has to be applied repeatedly. Often, k components (k > 1) yield the same self-trigger, i.e. one characterised by the same entry/exit transition pair in N. In this case, a separate internal signal is needed for each of them, since each internal signal has to be produced by the respective component. Still, it is not necessary to find an appropriate delay transition and apply INSERTIMPLICIT k times. After the first run of these algorithms, an implicit place q with its sets ${}^\bullet q$ and $q^\bullet$ is known, and we can insert k concurrent gyroscopes via k copies $q_1, \ldots, q_k$ of q, i.e. with ${}^\bullet q_i = {}^\bullet q$ and $q_i^\bullet = q^\bullet$ for all $i \in \{1, \ldots, k\}$. Observe that this approach massively increases the concurrency degree of the new delay component and also of the auxiliary components. This can be avoided by introducing only one place q and refining it via a sequence of k gyroscopes. More information about these optimisations can be found in [11].
Results
We have integrated the algorithms INSERTIMPLICIT, CHECKTRANSITION and AVOID into our decomposition tool DESIJ. For identifying a delay candidate as well as the paths $w_1$ and $w_2$, only a simple solution is implemented so far; we are currently working on integrating the ideas of Section 4.1.1. At present, we can only avoid self-triggers, but we plan to avoid more general conflicts as proposed in Section 3. We applied decomposition-based logic synthesis via DESIJ, with PETRIFY as synthesis backend, to the benchmark examples in Table 1. Observe that our previous method [10] cannot synthesise an SI circuit for any of the benchmark examples except SPT-6 and SPT-7. We now compare these results with those of MOEBIUS, MPSAT and the pure application of PETRIFY. Initially, none of the benchmark examples satisfies CSC, so we cannot compare our results to decomposition-based synthesis with NUTAS [12].
All benchmark computations were performed on a standard PC. We would like to thank Josep Carmona for providing the MOEBIUS results.
The first two columns of Table 1 report the benchmark names and their sizes in terms of place count (|P|), transition count (|T|), number of input signals (|In|) and number of output signals (|Out|). The remaining columns report the synthesis times for each benchmark in seconds. Note that in the DESIJ column this time is split into decomposition time (Dec), the time for inserting internal communication signals as described in this paper (Int) and the cumulated time for PETRIFY synthesis of the component STGs (Syn). In the MOEBIUS column, the times are also split into the encoding time to solve CSC for the specification N (Enc), the time to compute the projection of N for each output signal (Proj), and the PETRIFY-like logic synthesis for the projected components (Syn). Observe that the encoding time for SPT-7 is 9.5 hours.
Benchmarks 4 to 8 arise when resynthesising handshake circuits [5]. DESIJ handles all benchmarks with up to 100 places or transitions within a few seconds. Observe that MPSAT cannot deal with dummy transitions; we removed all such transitions via contraction, but this failed for the Shifter specification. MPSAT and PETRIFY cannot synthesise a circuit for SPT-6 and SPT-7 due to memory overflow (m.o.); for MOEBIUS, the synthesis time is an order of magnitude higher than for DESIJ-based logic synthesis.
Note that DESIJ-based synthesis is not always the best way to exploit the advantages of logic synthesis. In particular, for a small specification like tsendcsm, MPSAT or PETRIFY synthesis is better suited, since DESIJ often introduces more internal signals [11]. We therefore propose to apply DESIJ to very large specifications, where pure logic synthesis with other tools is impossible or takes unacceptably long.
Conclusion and Outlook
In [10], we presented ideas on how to avoid irreducible CSC conflicts that can result from decomposition; here, we have improved these ideas such that decomposition-based SI logic synthesis can now be applied to many more specifications than before. This can enable SI logic synthesis for very complex STGs, in particular those resulting from control resynthesis of handshake circuits.
To optimise our approach, we will in particular investigate how the choice of the delay candidate according to Section 4.1 can influence synthesis time and the resulting circuit area and performance.
