Introduction
Asynchronous circuits are a promising type of digital circuit with well-known advantages compared to their synchronous counterparts (e.g. [vBJN99]).
In order to support asynchronous system design, CAD tools have been developed, e.g. following syntax-directed translation from a high-level HDL (e.g. BALSA [EB02] or TANGRAM [vB93]) or based on logic synthesis using graph-based specifications such as Signal Transition Graphs (STGs) [Chu87, CKK+02]; the latter are widely used for modelling the behaviour of asynchronous circuits. So far, the most successful initiatives are the approaches based on syntax-directed translation, since they support the designers with a programming language as design entry and guarantee a robust implementation of complex specifications. However, the efficiency of the resulting circuits is not satisfactory since the power of Boolean minimisation techniques cannot be exploited. To overcome this drawback, we aim at incorporating speed-independent (SI) logic synthesis into a design flow based on syntax-directed translation by resynthesis of the system's control path using STGs; see e.g. [CC06] and [WW07].
However, STG logic synthesis suffers from state-space explosion. To cope with it, we apply STG decomposition: instead of synthesising an entire specification STG, it is decomposed into several smaller component STGs, which are synthesised separately. These components together form the desired asynchronous system. Since the decomposition as in [CC06, YOM04] requires a specification with CSC and leads to components with one output only, we apply STG decomposition as in [VW02, VK06]. The only disadvantage of the latter is that it can lead to components which are not SI implementable due to so-called irreducible CSC conflicts.
Here a new solution to this problem is proposed. After decomposition, we try to 'repair' the irreducible CSC conflicts by introducing internal communication between the components such that the overall behaviour of the resulting asynchronous system is preserved. We have successfully applied this approach to benchmarks derived from BALSA components and obtained much better results in terms of synthesis time and circuit area than [CC06] , where CSC conflicts are solved in the specification using integer linear programming.
The next section contains some definitions regarding STGs and their decomposition and introduces a new decomposition operation. In Section 3, we present the basic idea for the introduction of internal communication in order to avoid irreducible CSC conflicts in component STGs, as well as an improved algorithmic solution. Benchmark results are presented in Section 4. Finally, we give our conclusions in Section 5.
All proofs and supporting theorems can be found in [SVWW08] .
Basic Definitions
This section provides the basic notions for Petri nets and STGs. We assume that the reader is familiar with the notions of pre- and postset, the firing rule for transitions and the drawing conventions. For a more detailed explanation cf. e.g. [CKK +
Petri Nets and STGs
A Petri net is a 6-tuple N = (P, T, W, M_N, Σ, l) where P and T are disjoint and finite sets of places and transitions. W : (P × T) ∪ (T × P) → ℕ_0 is the weight function and M_N the initial marking, where a marking is a multiset of places, i.e. a function P → ℕ_0 which assigns a number of tokens to each place. The marking of a set of places is defined as the sum of all individual markings. Σ is a set of actions, and l : T → Σ + {λ} is the labelling function, where λ denotes the empty word.
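The tuple above maps directly onto a small data structure. The following is a minimal sketch, assuming the standard firing rule (which the text presupposes but does not restate); the class name `PetriNet` and its method names are illustrative choices, not part of the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PetriNet:
    places: set
    transitions: set
    weights: dict      # (x, y) -> arc weight W(x, y); absent pairs mean weight 0
    initial: Counter   # initial marking M_N as a multiset of places
    labels: dict       # transition -> action in Sigma, or None for lambda

    def w(self, x, y):
        return self.weights.get((x, y), 0)

    def enabled(self, marking, t):
        # Standard firing rule (assumed): every place holds at least W(p, t) tokens.
        return all(marking[p] >= self.w(p, t) for p in self.places)

    def fire(self, marking, t):
        if not self.enabled(marking, t):
            raise ValueError(f"{t} is not enabled")
        successor = Counter()
        for p in self.places:
            tokens = marking[p] - self.w(p, t) + self.w(t, p)
            if tokens:
                successor[p] = tokens
        return successor


# Hypothetical two-place net: p0 -> t -> p1, with t labelled "a".
net = PetriNet(
    places={"p0", "p1"},
    transitions={"t"},
    weights={("p0", "t"): 1, ("t", "p1"): 1},
    initial=Counter({"p0": 1}),
    labels={"t": "a"},
)
after = net.fire(net.initial, "t")
```

Markings are kept as `Counter` objects so that the multiset view of the definition carries over unchanged; places without tokens are simply absent.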
If necessary, we write P_N etc. for the components of N, and P', P_i etc. for the nets N', N_i etc. Analogous conventions apply later on. For a place p, N−p denotes the net in which p and all dependent elements are deleted; for a marking M, M|_{P'} denotes its restriction to P' ⊆ P.
A nonempty sequence w = x 1 x 2 ...x n of places and transitions without duplicates is a path (of N) if
With an abuse of notation we often consider a path as the set containing its elements, writing for example p ∈ w. w is a marked graph path or MG-path if every place of w is an MG-place. For a marking M, the marking M(w) of a path w is defined as M(w ∩ P). w is called non-joining, non-forking resp. if for every transition t ∈ w,
A transition t is 2-live if for every n ≥ 0 there is a firing sequence which contains t n times; a transition t is live if every reachable marking activates a firing sequence containing t. A net is 2-live, live resp. if each transition is 2-live, live resp. N is called bounded if there is some k ∈ ℕ such that M(p) ≤ k for every reachable marking M and every place p; if k = 1, N is called safe. N is bounded if and only if the set [M_N⟩ of reachable markings is finite. In this paper, we are only concerned with bounded nets.
We lift the notion of enabledness to transition labels: we write M[l(t)⟩M' if M[t⟩M'. This is extended to sequences as usual, deleting λ-labels automatically since λ is the empty word; i.e. M[a⟩M' means that a sequence of transitions fires, where one of them is labelled with a while the others (if any) are λ-labelled.
A net has a dynamic conflict if there are different transitions t_1 and t_2 such that for some reachable marking M: M[t_1⟩ and M[t_2⟩, but ∃p ∈ P : M(p) < W(p, t_1) + W(p, t_2). A dynamic conflict implies a structural conflict, i.e. •t_1 ∩ •t_2 ≠ ∅.
The reachability graph RG_N of a Petri net N is an edge-labelled directed graph on the reachable markings with M_N as root; there is an edge from M to M' labelled l(t) whenever M[t⟩M'. RG_N can be seen as a finite automaton (where all states are accepting). N is deterministic if its reachability graph is a deterministic automaton, i.e. if it contains no λ-labelled transitions and if for each reachable marking M and label a ∈ Σ there is at most one M' with M[a⟩M'.
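As an illustration of these two notions, the sketch below builds the reachability graph of a bounded net by breadth-first search and then checks determinism. The dictionary-based net encoding, the function names and the `max_states` guard are assumptions made for this example only.

```python
from collections import deque


def reachability_graph(pre, post, labels, initial, max_states=100_000):
    """pre[t] / post[t]: dict place -> arc weight; labels[t]: str or None (lambda);
    initial: dict place -> tokens. Returns (root, edges)."""
    def key(marking):
        # Hashable, canonical form of a marking (places without tokens omitted).
        return frozenset((p, n) for p, n in marking.items() if n > 0)

    root = key(initial)
    edges = {}  # marking -> list of (label, successor marking)
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current in edges:
            continue
        if len(edges) >= max_states:
            raise RuntimeError("state limit reached; the net may be unbounded")
        marking = dict(current)
        edges[current] = []
        for t in pre:
            if all(marking.get(p, 0) >= w for p, w in pre[t].items()):
                nxt = dict(marking)
                for p, w in pre[t].items():
                    nxt[p] = nxt.get(p, 0) - w
                for p, w in post[t].items():
                    nxt[p] = nxt.get(p, 0) + w
                successor = key(nxt)
                edges[current].append((labels[t], successor))
                queue.append(successor)
    return root, edges


def is_deterministic(labels, edges):
    # No lambda-labelled transitions, and at most one successor per marking and label.
    if any(label is None for label in labels.values()):
        return False
    for outgoing in edges.values():
        targets = {}
        for label, successor in outgoing:
            if targets.setdefault(label, successor) != successor:
                return False
    return True


# Hypothetical cycle: p0 -a+-> p1 -a--> p0.
cycle_labels = {"up": "a+", "down": "a-"}
root, graph = reachability_graph(
    {"up": {"p0": 1}, "down": {"p1": 1}},
    {"up": {"p1": 1}, "down": {"p0": 1}},
    cycle_labels,
    {"p0": 1},
)

# Hypothetical choice between two a+-labelled transitions leading to different places.
choice_labels = {"u1": "a+", "u2": "a+"}
_, choice_graph = reachability_graph(
    {"u1": {"p0": 1}, "u2": {"p0": 1}},
    {"u1": {"p1": 1}, "u2": {"p2": 1}},
    choice_labels,
    {"p0": 1},
)
```

The explicit construction is exactly what becomes infeasible for large specifications, which is the motivation for decomposition discussed later in the text.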
An STG is a tuple N = (P, T, W, M_N, In, Out, Int, l), where (P, T, W, M_N, Sig±, l) is a Petri net, In, Out and Int are disjoint sets of input, output and internal signals, and Sig = In + Out + Int is the set of all signals; signature refers to this partition of the signals. Sig± = Sig × {+, −} is the set of signal edges or signal transitions; its elements are denoted as s+, s− resp. instead of (s, +), (s, −) resp. s+ denotes that the value of s changes from logical low (written as 0) to logical high (written as 1), and s− denotes the opposite direction. We write s± if it is not important or unknown which direction takes place; if such a term appears more than once in the same context, it always denotes the same direction. Transitions labelled with λ do not correspond to any signal change (cf. state assignment below) and they are also called dummy-transitions.
For STGs, unmarked MG-places are not drawn, but implicitly given by an arc between the respective transitions. Transitions are drawn as rectangles containing their labelling (input transitions with a thick border).
STGs are widely used for specifying the behaviour of asynchronous circuits. The idea is as follows: a reachable marking of the STG roughly corresponds to a state of the intended circuit (viz. the values of its signals). If some marking activates an output (or internal) signal edge, the circuit must produce the same edge if it is in a corresponding state and the environment of the circuit must be ready to receive it; if some marking activates an input, the environment is allowed to produce it and the circuit must be ready to receive it.
For the first step from markings to circuit states, one defines the notion of state assignment: for an STG N, a state vector is a function sv : Sig → {0, 1} where '0' means logical low and '1' logical high. A state assignment assigns a state vector sv_M to each marking M of [M_N⟩, such that for every signal x ∈ Sig and every pair of markings M, M' ∈ [M_N⟩:
Lambdarising a signal means changing the labelling function such that all transitions corresponding to this signal are labelled with λ, and removing this signal from the signature. This operation is important for the decomposition algorithm described below. By contrast, hiding a signal set H ⊆ Out from an STG N results in the STG N/H = (P, T, W, M_N, In, Out − H, Int + H, l), i.e. some output signals are now considered to be internal signals.
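The difference between lambdarising and hiding can be made concrete on the signature and the labelling alone. This is a minimal sketch; the `Signature` class and the encoding of labels as `(signal, direction)` pairs (or `None` for λ) are assumptions of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    inputs: frozenset
    outputs: frozenset
    internals: frozenset


def lambdarise(signature, labels, signal):
    """Relabel every transition of `signal` with lambda (None) and drop the signal."""
    new_labels = {
        t: (None if label is not None and label[0] == signal else label)
        for t, label in labels.items()
    }
    new_signature = Signature(
        signature.inputs - {signal},
        signature.outputs - {signal},
        signature.internals - {signal},
    )
    return new_signature, new_labels


def hide(signature, hidden):
    """N/H: move the output signals in H to the internal signals; labels are unchanged."""
    hidden = frozenset(hidden)
    if not hidden <= signature.outputs:
        raise ValueError("only output signals can be hidden")
    return Signature(
        signature.inputs,
        signature.outputs - hidden,
        signature.internals | hidden,
    )


# Hypothetical signature and labelling.
sig = Signature(frozenset({"a"}), frozenset({"x", "y"}), frozenset())
labels = {"t1": ("a", "+"), "t2": ("x", "+"), "t3": ("a", "-")}

lam_sig, lam_labels = lambdarise(sig, labels, "a")
hidden_sig = hide(sig, {"x"})
```

Lambdarising changes the labelling and shrinks the signature, whereas hiding leaves the net untouched and only reclassifies signals.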
For the parallel composition ∥, we will have to consider the distinction between input and output signals. The idea of parallel composition is that the composed systems run in parallel and synchronise on common signals, corresponding to circuits that are connected on the wires corresponding to these signals. Since a system controls its outputs, we cannot allow a signal to be an output of more than one component; input signals, on the other hand, can be shared. An output signal of a component may be an input of other components, and in any case it is an output of the composition. Internal signals of one component must not be used by the others; this is no serious restriction and can always be achieved by renaming the respective signals (see footnote 2). The parallel composition of STGs N_1 and N_2 is de-
Footnote 2: A composition can also be ill-defined due to what is called computation interference; this is a semantic problem, and we will not consider it here but later in the definition of correctness.
Figure 2. Example of a transition contraction.
The place set of the composition is the disjoint union (∪) of the place sets of the components; therefore, we can consider markings of the composition (regarded as multisets) as the disjoint union of markings of the components. To define the transitions, let A = Sig_1 ∩ Sig_2 be the set of common signals. If e.g. s is an output of N_1 and an input of N_2, then an occurrence of an edge s± in N_1 is 'seen' by N_2, i.e. it must be accompanied by an occurrence of s± in N_2. Since we do not know a priori which s±-labelled transition of N_2 will occur together with some s±-labelled transition of N_1, we have to allow for each possible pairing. Thus, the parallel composition N = N_1 ∥ N_2 is obtained from the disjoint union of N_1 and N_2 by combining each s±-labelled transition t_1 of N_1 with each s±-labelled transition t_2 from N_2 if s ∈ A. Such transitions are pairs and the
for an example of a parallel composition, see Fig. 1 . More generally, we
where w|_{N_i} denotes the projection of the trace w onto the signals of the STG N_i. Obviously, we can define the parallel composition of a finite family (or collection) (C_i)_{i∈I} of STGs as ∥_{i∈I} C_i, provided that no signal is an output signal of more than one of the C_i.
We now introduce transition contraction, which will be most important in our decomposition procedure. We essentially repeat the definition from [VK06], where further discussions can be found.
Definition 2.1 (Transition Contraction) Let N be a Petri net and t ∈ T with l(t) = λ,
For two different transitions t_1, t_2 with t_1 ≠ t ≠ t_2, we call the unordered pair {t_1, t_2} a new conflict pair whenever
A transition contraction is called secure if either
Fig. 2 shows a part of a net and the result of contracting the λ-transition; note that the transitions b− and c+ form a new conflict pair, although they already are in structural conflict before the contraction. In general, the contracted net might fail to be consistent, even if N is; but secure contractions preserve consistency (see [VK06]).
We conclude this section by defining redundant transitions and implicit places; the deletion of such a transition, place resp. (including the incident arcs) is another transformation that can be used in our decomposition algorithm. A transition t is redundant if either it is a λ-transition with W(p, t) = W(t, p) for each place p (i.e. t is a loop-only transition), or there is another transition t' with the same label such that W(p, t) = W(p, t') and W(t, p) = W(t', p) for each place p (i.e. t is a duplicate transition).
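Both kinds of redundant transition can be recognised purely from the arc weights and labels. The sketch below is illustrative only: the function names are hypothetical, and it assumes that a duplicate transition is one that shares its label and all arc weights with another transition.

```python
def weight(weights, x, y):
    # Arc weight W(x, y); pairs without an arc have weight 0.
    return weights.get((x, y), 0)


def is_loop_only(t, places, weights, labels):
    # A lambda-transition whose firing never changes the marking.
    return labels[t] is None and all(
        weight(weights, p, t) == weight(weights, t, p) for p in places
    )


def is_duplicate(t, places, weights, labels):
    # Assumption: same label and identical arc weights as some other transition.
    return any(
        other != t
        and labels[other] == labels[t]
        and all(
            weight(weights, p, t) == weight(weights, p, other)
            and weight(weights, t, p) == weight(weights, other, p)
            for p in places
        )
        for other in labels
    )


def is_redundant(t, places, weights, labels):
    return is_loop_only(t, places, weights, labels) or is_duplicate(
        t, places, weights, labels
    )


# Hypothetical net: t1 and t2 are duplicates, t3 is a lambda loop on p, t4 is neither.
places = {"p", "q"}
labels = {"t1": "a+", "t2": "a+", "t3": None, "t4": "b+"}
weights = {
    ("p", "t1"): 1, ("t1", "q"): 1,
    ("p", "t2"): 1, ("t2", "q"): 1,
    ("p", "t3"): 1, ("t3", "p"): 1,
    ("p", "t4"): 1, ("t4", "q"): 1,
}
```

Note that of two mutual duplicates such as `t1` and `t2` only one may be deleted; after the deletion the remaining transition is no longer redundant.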
A place p is implicit if it can be deleted from the net without changing the set of firing sequences. However, detecting implicit places is PSPACE-complete, and during decomposition only redundant places [Ber87] are deleted. Redundant places are implicit (but in general not vice versa); they are defined on the structure of the net and there are LP techniques to find them. Since these techniques are still not efficient enough, only the subset of shortcut places is deleted. An MG-place p is a shortcut place if there is an MG-path w between t ∈ •p
It is easy to see that shortcut places are indeed redundant. They will be used in the correctness proofs below.
STG Decomposition
Synthesis with STG decomposition works as follows: a partition of the output signals of the given specification STG N is chosen, and the decomposition algorithm decomposes N into component STGs, one for each set in this partition. Then, from each component, equations for the corresponding outputs are derived from the respective reachability graph, instead of deriving the equations from N itself.
Very often, adding up the sizes of these graphs gives a number much smaller than the size of the reachability graph of N, in which case the decomposition can be seen as successful. Actually, it might already be beneficial if each reachability graph is smaller than the one of N, in particular for reducing peak memory.
Of course, the behaviour of the specification should be preserved in some sense; this is captured by a variant of bisimulation, tailored to the specific needs of asynchronous circuits:
there is an STG-bisimulation B between the markings of N and those of C with the following properties:
Here, and whenever we have a collection (C_i)_{i∈I}, P_i stands for P_{C_i}, Out_i for Out_{C_i} etc.
In a simple case, (C_i)_{i∈I} consists of just one component C_1 (immediately implying (C2)).
(C2) ensures that no computation interference occurs, i.e. if a component produces an output (which is under the control of this component), then the other components expect this signal if it belongs to their inputs, and no malfunction of these other components must be feared. (C2) is actually also satisfied for x ∈ Int_i, since internal signals of one component are by definition unknown to the other components.
The following theorem specialises a result from [SV07].
For STGs the notion of speed-independence (SI) [CKK+02] is important: an STG is speed-independent if the intended circuit works correctly under arbitrary delays of the gates (while the signal propagation is considered instantaneous). As a consequence, an STG has to be input proper, i.e. no input becomes enabled by an internal signal, since otherwise the environment might produce the input before the internal signal is produced by the circuit.
In [SV07] it was shown that the above correctness notion implies that (essentially) the implementation is input proper; in particular, if the solution of a CSC-conflict inserts an internal signal in front of an input, it is not correct in the sense of Definition 2.2.
We now describe our decomposition algorithm roughly; for more details cf. [VW02, VK06]. In the following, we assume that we are given a deterministic, consistent specification N without internal signals (see footnote 3). First, one chooses a feasible partition, i.e. a family (In_i, Out_i)_{i∈I} for some set I such that the sets Out_i are a partition of Out, In_i ⊆ Sig \ Out_i for each i and:
• If two output signals x 1 , x 2 are in structural conflict in N, then they have to be in the same Out i .
• If there are t,t ∈ T with t ∈ (t
If we have a feasible partition, we can build another feasible one by adding additional input signals to one of the members. Then, so-called totally admissible operations are applied in order to reduce the initial components until no more λ -labelled transitions remain. Currently, the reduction operations in use are: secure contraction of a λ -labelled transition, deletion of an implicit place, and deletion of a redundant transition; we refer to this current version of our algorithm as the CIR-algorithm.
Unfortunately, it is not always possible to contract all λ -transitions. Besides the technical cases where the contraction is undefined or not secure (possibly leading to an incorrect decomposition), the contraction might also generate a new auto-conflict. This reveals nondeterminism present in the respective initial component but not in the specification. This indicates that the component has not enough information to properly produce its outputs. Such a contraction is disallowed and consequently a new signal is added as follows.
Footnote 3: For the decomposition algorithm, internal signals can be considered as outputs; see [SV07] for more details.
Figure: Gyroscope insertion between t and t' with initial marking (2, 1) via place insertion (here: a shortcut place due to w) and place refinement.
If λ-transitions remain, backtracking is applied, i.e. a new input is added to the component. Technically, this input is added to the initial partition and the new corresponding initial component is derived and reduced from the beginning. The new input signal is taken from the former label of a non-contractible λ-transition. As discussed above, the new partition is feasible again. This cycle of reduction and backtracking is repeated until all λ-transitions of the initial component can be contracted. The algorithm guarantees correctness if arbitrary totally admissible operations are applied.
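The cycle of reduction and backtracking for a single component can be sketched as a loop around a reduction procedure. Everything named here is hypothetical: `reduce_component` stands in for the actual reduction (contraction, place and transition deletion) and is assumed to return the former signal of some non-contractible λ-transition, or `None` once all λ-transitions have been contracted.

```python
def decompose_component(initial_inputs, outputs, all_signals, reduce_component):
    """Return the final input set of one component after backtracking."""
    inputs = set(initial_inputs)
    while True:
        visible = inputs | set(outputs)
        hidden = set(all_signals) - visible   # these signals are lambdarised
        blocking = reduce_component(hidden)   # restart reduction from the beginning
        if blocking is None:
            return inputs
        if blocking not in hidden:
            raise ValueError("reduction reported a signal that is not lambdarised")
        # Backtracking: the signal becomes a new input of the component.
        inputs.add(blocking)


def toy_reduction(hidden):
    # Toy stand-in for the reduction: it only succeeds once "x" is visible.
    return "x" if "x" in hidden else None


final_inputs = decompose_component({"a"}, {"y"}, {"a", "x", "y", "z"}, toy_reduction)
```

Each round removes one signal from the lambdarised set, so the loop terminates after at most as many rounds as there are signals outside the component.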
Place Refinement and Subnet Contraction
Now, we introduce some new operations to deal with signal insertion. A gyroscope insertion essentially inserts what is known as a toggle-transition. A gyroscope insertion with initial marking (in, out) inserts a new implicit place p ∉ P with in + out tokens into N (giving an intermediate net) and applies place refinement with initial marking (in, out) to it (giving the resulting net). A gyroscope insertion between t and t' (t, t' ∈ T) with initial marking (in, out) inserts p between t and t' (i.e. •p = {t} and p• = {t'} with arc weights 1).
A gyroscope insertion is called an input/output/internal gyroscope insertion if g_1 and g_2 are labelled in N with s+, s− resp. and s is a fresh input, output or internal signal resp.; it is called a dummy gyroscope insertion if g_1 and g_2 are labelled with λ.
Subnet-contraction is not really intended as a reduction operation. In principle, however, one could try to apply it if backtracking is the only alternative, though the odds for this to succeed seem to be low.
Proposition 2.5 Subnet-contraction is a totally admissible operation [VK06].
Avoiding Self-Triggering
As mentioned above, the contraction based decomposition of [VW02, VK06] can lead to components having irreducible CSC conflicts. Now, we introduce a new approach to solve such conflicts speed-independently by introducing internal communication between the components. The resulting algorithms may seem simple or restricted, but nevertheless they are of great practical relevance and require non-trivial correctness proofs.
Basic Idea
To see how irreducible CSC conflicts can emerge, consider Fig. 4: the STG N on the left is implemented using component C_r (where x+ is contracted) for output y and C_d for output x; the clouds should be ignored for now. Considering the respective state graphs next to the STGs, one can see that C_r has a CSC conflict between the highlighted states: both have the same state vector (a, y) = (0, 0) and only one of them activates output y.
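A CSC conflict of this kind can be detected on a state graph by grouping states by their state vector and comparing the enabled non-input edges. The sketch below uses a hypothetical four-state graph with signals (a, y); it is not the state graph of Fig. 4, and the function name is illustrative.

```python
from collections import defaultdict


def csc_conflicts(state_vectors, enabled_outputs):
    """state_vectors: state -> tuple of signal values;
    enabled_outputs: state -> set of enabled output (or internal) edges.
    Returns pairs of states with equal state vectors but different enabled outputs."""
    by_vector = defaultdict(list)
    for state, vector in state_vectors.items():
        by_vector[vector].append(state)

    conflicts = []
    for states in by_vector.values():
        for i, first in enumerate(states):
            for second in states[i + 1:]:
                if set(enabled_outputs.get(first, ())) != set(
                    enabled_outputs.get(second, ())
                ):
                    conflicts.append((first, second))
    return conflicts


# Hypothetical run: s0 -a+-> s1 -a--> s2 -y+-> s3, with vectors ordered as (a, y).
state_vectors = {"s0": (0, 0), "s1": (1, 0), "s2": (0, 0), "s3": (0, 1)}
enabled_outputs = {"s2": {"y+"}}

conflicts = csc_conflicts(state_vectors, enabled_outputs)
```

Here `s0` and `s2` share the vector (0, 0) but only `s2` enables y+, so a circuit that sees only the signal values cannot decide whether to raise y.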
This conflict describes a so-called (dynamic) self-trigger, i.e. a firing sequence of two input transitions labelled with the same signal and complementary edges returning to the same state vector. The insertion of an internal signal transition ic+ (at the position of the cloud) to solve this CSC-conflict is not possible, since it would not be input proper, cf. the discussion after Theorem 2.3. In this context, we call the component having the self-trigger critical. Since such self-triggers arise in our benchmarks quite often, we concentrate on them in this paper. To find them with a structural method, we define a structural self-trigger as two transitions t and t' which are labelled with complementary edges of the same input signal and satisfy t ∈ •(•t'); a structural self-trigger is called MG-self-trigger if the place between t and t' is an MG-place. A structural self-trigger is necessary for a dynamic one.
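The structural condition can be checked directly on the arcs of a component: look for a place whose preset contains one transition and whose postset contains the other, with complementary edges of the same input signal. This is a minimal sketch; the arc and label encodings and the function name are assumptions of the example.

```python
from collections import defaultdict


def structural_self_triggers(arcs, labels, inputs):
    """arcs: iterable of (source, target) pairs of the net;
    labels: transition -> (signal, direction) or None for lambda;
    inputs: set of input signals.
    Returns all pairs (t, t2) with t in the preset of the preset of t2."""
    preset = defaultdict(set)
    for source, target in arcs:
        preset[target].add(source)

    found = set()
    for t2, label2 in labels.items():
        if label2 is None or label2[0] not in inputs:
            continue
        for place in preset.get(t2, ()):
            for t in preset.get(place, ()):
                label = labels.get(t)
                if (
                    label is not None
                    and label[0] == label2[0]      # same signal
                    and label[1] != label2[1]      # complementary edges
                ):
                    found.add((t, t2))
    return found


# Hypothetical component: t2 (a+) -> p -> t4 (a-) -> q -> t3 (y+) -> r -> t2.
arcs = [("t2", "p"), ("p", "t4"), ("t4", "q"), ("q", "t3"), ("t3", "r"), ("r", "t2")]
labels = {"t2": ("a", "+"), "t4": ("a", "-"), "t3": ("y", "+")}

pairs = structural_self_triggers(arcs, labels, inputs={"a"})
```

Since a structural self-trigger is only necessary for a dynamic one, the pairs found this way are candidates that still have to be confirmed.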
Since there is a signal edge between a+ and a− in N, one can tune the decomposition not to contract the respective transition; instead, backtracking is performed and x is added as an input for C_r. Although this helps in some cases, in our experience an irreducible CSC conflict very often remains. Another possibility is to add output x to C_r; in this case, C_d will not be generated anymore since C_r is responsible for x and y and consequently larger. If this is applied repeatedly, it easily results in components with large state spaces, destroying the advantage of decomposition.
In this paper, we return to the idea of inserting the internal signal ic between a+ and a−. The problem is that this requires the circuit environment to wait with lowering a for the occurrence of ic, although it does not even know about ic. The new idea is to achieve the desired causal relationship by making ic a communication signal between the components: we identify an output between a+ and a− in N (here x+) and the respective so-called delay component C_d; then, we insert ic as an output in C_r and as an input in front of x+ in C_d. This way, a− waits for x+, which in turn waits for ic. To view the situation from the point of view of C_r: its environment consists of all other components (including C_d) and the circuit environment, and this combined environment now indeed waits with lowering a for the occurrence of ic (i.e. C_d delays the overall environment such that a− cannot arrive too fast, causing malfunction of the circuit).
Finally, we hide ic (cf. set H in Definition 2.2) such that, from the point of view of N, we have inserted internal signal ic at the position of the cloud, and this is correct. This view is the key to proving the new method correct: first insert ic into N preserving correctness; since we can only decompose nets without internal signals, we regard the result as N'/{ic} where ic is an output of N'. Second, we decompose N' with the same initial partition as before, except that ic is added to the outputs of C_r and the inputs of C_d. And indeed, in the reduction of C_r, x+ is contracted, putting ic into the right position. Furthermore, it should be possible to avoid recomputation of the other components: for such a component, ic is lambdarised initially, so it should be possible to remove it immediately since it can even be removed when representing an internal signal. Then, reduction can proceed as in the case of N.
Figure 5. Algorithm AVOID-0 (excerpt):
Input: specification N; critical component C_r with structural self-trigger t_2, t_4; delay component C_d with t_3 'between' t_2 and t_4 in N, l(t_3) ∈ Out
Output: modified specification N'; modified critical component C_r'; modified delay component C_d'
1 in N, perform an output gyroscope insertion for a fresh signal ic via an implicit place p and interme-
For an implementation of our idea, some problems remain to be solved. First, to ensure the practically important consistency, one cannot insert only ic+ between a+ and a−, but one has to introduce a suitable ic−-transition somewhere else, too. This is problematic since it is unclear how to find a proper insertion point. Instead, the cloud is replaced by a gyroscope (simulating a toggle-transition).
Second, it is in no way obvious how to insert ic into a general N while preserving its behaviour, since the usual methods for event insertion would build the reachability graph of N, which we must avoid. These considerations are formalised in the algorithm AVOID-0 in Fig. 5; its correctness is stated next.
For the proof observe: if gyroscope insertion into N results in N', then N'/{ic} is correct w.r.t. N by Proposition 2.6, i.e. N' is correct w.r.t. N when hiding {ic}. Furthermore, ((C_i)_{i∈I}, C_r', C_d') is correct w.r.t. N': one can argue that the components (C_i)_{i∈I} do not have to be recalculated, and the critical and delay component are recalculated and therefore correct. Then the result follows from Theorem 2.3.
If there is another irreducible CSC-conflict, we apply the same algorithm to N' and ((C_i)_{i∈I}, C_r', C_d'). In the end, this gives some (C_i')_{i∈I} which is correct w.r.t. N' when hiding some H and, again by Theorem 2.3, correct w.r.t. N when hiding H + {ic} (the set of all inserted new signals). Furthermore, if there are several components with irreducible CSC-conflicts such that the pairs (C_r, C_d) are disjoint, step 1 can be executed for each of them with respect to the respective modified specification, and then every affected component is newly generated in a single second reduction pass.
Observe that the new decomposition in 3.1 is correct in the sense of Definition 2.2, but it is not even guaranteed that the structural self-trigger actually disappears. Also, the algorithm has to recalculate the affected components; instead we would like to apply gyroscope insertions directly to the calculated components. The next section presents an algorithmic solution for a (practically important) special case where this is possible.
Avoiding Recalculation
We will now consider a very simple case of self-triggering where the specification is essentially an MG-path in the relevant part. We assume we are given a deterministic and live net N; the latter is also a common and sensible assumption on STGs, and in fact 2-liveness is sufficient. Further, we restrict ourselves to the CIR-algorithm.
Let (C_i)_{i∈I} be correct w.r.t. N. In the critical component C_r, let there be an MG-self-trigger due to two input transitions t_2 and t_4 labelled with a+ and a− for a ∈ In_N. These transitions, the specification N and all components are inputs to the algorithm AVOID-1 in Fig. 6, which returns a modified specification, and a modified critical and delay component; the critical component is always generated without a second decomposition pass, the latter if possible.
In general, the algorithm returns FAIL whenever the structural conditions in line 1 for the specification are not fulfilled. A path between t_2 and t_4 (line 1) is guaranteed to exist since such a path exists in C_r and the reduction does not generate new paths, and usually (but not always) there will be an MG-path. However, the path might fail to be non-forking or there might be no proper output x which can help to destroy the self-triggering. In this case, the self-triggering might be avoided by adding additional inputs to C_r as discussed in the previous section. Otherwise, a possible output x and the delay component C_d producing it are chosen (line 2). In the rest of the algorithm, the corresponding gyroscope insertions into N and the components C_r and C_d take place, where for C_d a proper starting transition t_1 has to be found (line 5); cf.
Figure 6. Algorithm AVOID-1 (excerpt):
... r ≠ ∅; a transition t_3 of w is labelled with an output edge x±; w_2, w_3 are the subpaths of w from t_2 to t_3, from t_3 to t_4 resp.; no transition of w − {t_4} has more than one outgoing arc
2 choose C_d such that x ∈ Out_d, choose a fresh signal ic ∉ Sig_N
3 in N, perform an output gyroscope insertion between t_2 and t_3 for ic with the initial marking (M_N(w_2), 0), giving N'
4 in C_r, perform an output gyroscope insertion between t_2 and t_4 for ic with the initial marking (M_N(w_2), M_N(w_3)), delete p_s; this gives C_r'
5 in N, find an MG-path w_1 from some transition t_1 to t_2 with the following properties (t_1 = t_2 is possible):
The idea of the proof is to show that the resulting components C_r' and C_d' could have been constructed by the decomposition algorithm, implying correctness with Theorem 3.1. This involves some supporting theorems stating essentially that the reduction operations of the CIR-algorithm commute with gyroscope insertion, e.g. an implicit place stays implicit when inserting a gyroscope. The conditions on the paths are e.g. needed to prevent that the intermediate place of the gyroscope insertion is duplicated during the hypothetical reduction; there are examples where these restrictions are violated and the results are not correct.
A concrete implementation has some freedom since there might be more than one suitable output on w. In such a case, one might choose the output x and the corresponding delay component C_d such that a path w_1 exists at all or is better suited than others. If a ∈ Sig_d, we have t_1 = t_2 and the conditions for w_1 are trivially fulfilled. In this case, t_2 might additionally be a syntactical trigger of t_3 in C_d; then, it is possible that the signal a is no longer necessary and might be lambdarised in C_d and contracted.
Theorem 3.2 shows that we have inserted into the self-trigger an output that records the occurrence of a+. This removes the original irreducible CSC conflict, but it could very well be the case that there still is some CSC conflict: the first a+ is recorded by ic+, the second one by ic−; so the state vector before the first a+ could easily be the same as the one after the second a−. What we achieved is that this CSC conflict can now be solved by inserting internal signals into C_r alone as described in [CKK+02] and proven correct in the sense of Definition 2.2 in [SV07].
In fact, the irreducible CSC conflict can also vanish completely if the two state vectors just mentioned differ in other signals; see [WW07] for an example. In the experiments reported in the next section, this is actually always the case; here, the reason is that always at least two gyroscopes are inserted that 'fire' alternatingly.
Experimental Results
The approach of the previous section was implemented as part of our existing decomposition tool DESIJ [Sch07]. In this section we compare it with standard logic synthesis by PETRIFY [CKK+02] and MPSAT [KKY04] and with syntax-directed translations of handshake circuits. In the latter, a circuit behaviour is specified e.g. by a BALSA program [EB02]; following its syntax, this is translated into a 'netlist' of predefined handshake components (HCs), which have (handmade) standard implementations. To build the final circuit, these are connected with handshake channels (usually using the 4-phase protocol) as specified by the netlist. This approach is very fast and results in robust (i.e. global delay insensitive) implementations of very large specifications. However, due to the extensive use of handshake communication, the resulting circuits are heavily overencoded and therefore large and slow. We aim at resynthesis of such HC netlists in order to remove unnecessary handshake signals as follows: each HC is specified by a (usually small) STG, and these are composed in parallel according to the netlist to get one overall STG. In this STG, we lambdarise and contract the intercomponent handshake signals such that only the external behaviour remains. The result is then used as input specification for our approach.
The SeqParTree (SPT) benchmark examples used here were first used in [CC06] and consist of two types of handshake components with one passive and two active ports each. A sequencer (;) waits for a handshake on its passive port and performs one handshake on each of its two active ports in sequence, while a paralleliser (||) performs them in parallel. The SPT benchmarks are trees of sequencer and paralleliser components such that the root is a sequencer which has two parallelisers as children and each paralleliser can have two sequencers as children again etc. SPT-n is such a tree with n levels (consisting of 2^n − 1 components); Fig. 7 shows SPT-3.
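The shape of these benchmarks can be reproduced with a short recursive construction. The sketch assumes that every SPT-n is a complete binary tree whose levels alternate between sequencers and parallelisers, which is what the component count of 2^n − 1 suggests; the tuple encoding and function names are illustrative.

```python
def build_spt(levels, kind=";"):
    """Return SPT-`levels` as nested tuples (kind, left, right); None below the leaves."""
    if levels == 0:
        return None
    child_kind = "||" if kind == ";" else ";"
    return (
        kind,
        build_spt(levels - 1, child_kind),
        build_spt(levels - 1, child_kind),
    )


def count_components(tree):
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + count_components(left) + count_components(right)


def leaf_kind(tree):
    # Component type on the deepest level (follow the leftmost branch).
    kind, left, _ = tree
    return kind if left is None else leaf_kind(left)


spt3 = build_spt(3)
sizes = {n: count_components(build_spt(n)) for n in range(2, 8)}
```

Under this assumption the odd levels consist of sequencers, so the seventh level of SPT-7 adds sequencers only.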
All benchmarks were performed on a Pentium 4 HT with 3 GHz and 2 GB RAM. The DESIJ part of the table in Figure 8 reports the results of the decomposition based synthesis of SPT-2 to SPT-7 by applying the new approach, using PETRIFY for logic synthesis of the components. The columns report the number of places, transitions and input/output signals of the corresponding specification. Runtimes are given in seconds for decomposition (deco), insertion of internal communication (int) and synthesis of the components (syn). The ic column reports the number of added internal communication signals, and in column lit, the number of literals for a complex gate implementation is shown. This applies also for the PETRIFY, MPSAT and BALSA parts; note that for the first two, sig denotes the number of internal signals which are introduced during synthesis.
In our examples, the new approach was able to turn components with irreducible CSC-conflicts directly to components with CSC, i.e. no additional internal signals had to be introduced during synthesis; cf. the discussion in the previous section.
To the best of our knowledge, no existing STG synthesis technique is able to synthesise such large specifications. With the method of [CC06] , resynthesis of SPT-5 takes 132 seconds and yields an implementation with 269 literals. In contrast, we are able to synthesise SPT-5 in 10 seconds yielding a much smaller implementation with 195 literals. Moreover, our new approach can even synthesise SPT-6 and SPT-7 in less than 25 minutes. Observe that e.g. SPT-7 is not much harder to resynthesise than SPT-6: intuitively, parallelisers add concurrent behaviour to a specification, increasing the state space drastically; since SPT-7 is derived from SPT-6 by adding only additional sequencers, this effect does not occur.
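To put the SPT-5 comparison with [CC06] in relative terms, the figures quoted above give the following speed-up and literal reduction (a simple restatement of the reported numbers, not an additional measurement):

```latex
\[
  \frac{132\,\mathrm{s}}{10\,\mathrm{s}} = 13.2,
  \qquad
  \frac{269 - 195}{269} = \frac{74}{269} \approx 0.275 ,
\]
```

i.e. synthesis is roughly 13 times faster and the implementation has about 27.5% fewer literals.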
The new approach is also successful w.r.t. resynthesis, i.e. the size of the resulting circuit was decreased when compared to the direct translation. As one can see, implementations resulting from BALSA translations lead to larger circuits (in terms of literals) than the synthesis according to our approach. On average, we reached a reduction of 50%. The impact on the circuit's performance will be investigated in future work.
Compared to pure logic synthesis with PETRIFY or MPSAT, our circuit area is slightly better than that of MPSAT, but twice as large as that of PETRIFY. However, we are not only able to synthesise circuits orders of magnitude faster, but we can even synthesise SPTs with more than 4 levels, which is impossible with pure logic synthesis due to memory overflows.
Conclusions and Outlook
Summing up, irreducible CSC-conflicts resulting from STG decomposition as in [VW02, VK06] can often be solved by introducing internal communication between the components. Since this approach is purely structural, we consider it as a breakthrough in decomposition based logic synthesis. In particular, the resynthesis of handshake specifications of a size never reached before is now possible.
