This paper presents a novel technique for the synthesis of speed-independent circuits. It is based on a partial order representation of the state graph called the STG-unfolding segment. The new method uses an approximation technique to speed up the synthesis process. The method is illustrated on the basic implementation architecture. Experimental results demonstrating its efficiency are presented and discussed.
Introduction
The problem of synthesis of speed-independent circuits from their Signal Transition Graph (STG) specifications has been approached by many researchers. Several tools exist today, such as SIS [10], Assassin [12], Forcage [3] and Petrify [2], which are capable of synthesising circuits of moderate size. All but Forcage use some form of State Graph (SG) representation to obtain truth tables of the implementation logic. Petrify uses Binary Decision Diagrams (BDDs) to represent the SG symbolically and can thus synthesise circuits from larger descriptions. Forcage, on the other hand, uses Change Diagrams (a partial order model) to derive an implementation but is restricted to specifications without choice.
Construction of the SG hits available computational limits due to state explosion. A structural method in [6] can implement STGs while avoiding exhaustive state exploration. It uses the concurrency relation between transitions of the STG to obtain an initial approximation of the implementation. If this approximation does not satisfy the correctness criteria, then iterative refinement is performed using State Machine (SM) decompositions. Although powerful, this method is restricted to SM-decomposable specifications.
The main goal of this work is to develop a method for implementing STGs that cannot be synthesised by the above techniques due to the large size of their SG. The way to achieve this goal is analogous to the one in [6]: it draws upon relations at the event-based, rather than state-based, description level. This method is, however, free from the limitations of [6].
The solution to this problem is found in the use of a partial order approach, already known to have given positive results in STG verification. It is based on an implicit representation of the SG in the form of a finite STG-unfolding segment [9]. It was shown in [9] that such a segment can often be built for those examples where the construction of the SG fails. While the segment is being constructed it is also verified for correctness. Thus, after the verification stage is completed, an implementation can be derived from an already built STG-unfolding segment.
Two approaches are possible within the new synthesis method: exact and approximate. The exact approach obtains an implementation equivalent to that derived from the SG. At the end of the synthesis procedure it produces an implementation by recovering binary states from the segment (similar to the approach of [5]). Although it benefits from the unfolding methodology, which restricts the set of states that need to be examined for each signal, the exact approach may suffer from exponential explosion of states. To combat this complexity, the approximate approach uses the concurrency relation to obtain an initial approximation of the implementation and then to refine it. The structural method of [6] works at the STG level, assuming that two transitions are concurrent if they can ever fire simultaneously. Such a loose approximation may require several computationally costly refinement iterations.
In contrast, our method works with a partial run of the behaviour specified by the STG. Thus it is possible to pinpoint exactly when any two transitions become concurrent. This local information gives a more accurate initial approximation and a more precise refinement. Therefore the implementations can be obtained faster and be better optimised.
Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. © 1997 ACM 0-89791-920-3/97/06 $3.50. DAC97, Anaheim, CA, USA.
The aim of this paper is to suggest and illustrate synthesis of speed-independent circuits from the STG-unfolding segment built for their specifications. The method is illustrated on the atomic complex gate per signal architecture and is compared with the existing approaches.
Synthesis of Speed-Independent circuits
General synthesis approach
We assume that the reader is familiar with the basics of Petri net theory [7]. A marked Petri net (PN) is a tuple $N = \langle P, T, F, m_0 \rangle$ where $P$ and $T$ are non-empty sets of places and transitions, respectively, $F$ is a flow relation and $m_0$ is an initial marking. A Signal (Transition) Graph (STG) [8, 1] is a tuple $G = \langle N, A, L \rangle$ (a labelled PN) where $N$ is a marked PN, $A$ is a set of signals and $L : T \to \{+, -\} \times A$ is a labelling function. STGs are a special case of labelled PNs, used for low-level descriptions of asynchronous circuits. The set of transition labels represents changes of signals: $+a_i$ (for up) and $-a_i$ (for down). The notation $a_i$, when used as a transition label, indicates a transition labelled with a change of $a_i$ regardless of the direction of this change.
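As an illustration of these definitions, the following minimal Python sketch encodes a labelled marked PN for the special case of a safe net, so that a marking is simply a set of marked places. The class name `STG`, its field names and the `handshake` example are assumptions made for this sketch and do not come from any of the tools cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STG:
    """Labelled marked Petri net (safe nets only; all names are illustrative)."""
    pre: dict      # transition -> set of input places (from the flow relation F)
    post: dict     # transition -> set of output places (from the flow relation F)
    label: dict    # transition -> (sign, signal), i.e. the labelling function L
    m0: frozenset  # initial marking, given as the set of marked places

    def enabled(self, marking):
        """Transitions whose input places are all marked."""
        return [t for t in self.pre if self.pre[t] <= marking]

    def fire(self, marking, t):
        """Marking reached by firing t: tokens move from preset to postset."""
        return frozenset((marking - self.pre[t]) | self.post[t])


# Hypothetical two-signal handshake: a+ -> b+ -> a- -> b- -> a+ ...
handshake = STG(
    pre={"a+": {"p0"}, "b+": {"p1"}, "a-": {"p2"}, "b-": {"p3"}},
    post={"a+": {"p1"}, "b+": {"p2"}, "a-": {"p3"}, "b-": {"p0"}},
    label={"a+": ("+", "a"), "b+": ("+", "b"),
           "a-": ("-", "a"), "b-": ("-", "b")},
    m0=frozenset({"p0"}),
)
```

In this example only `a+` is enabled in the initial marking, and firing it moves the token from `p0` to `p1`.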
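Conventionally, to obtain an implementation for an STG $G$, a corresponding SG is built. The SG $S$, also called a State Transition Diagram (STD), is derived by constructing the reachability graph (representing all reachable markings) of the underlying PN and then assigning a binary code $v(s)$ to each vertex $s$, with one component $v_i(s)$ per signal $a_i$. The binary codes must be assigned consistently, i.e. every arc is labelled with exactly one signal transition, and for each pair of states $s_1$ and $s_2$ connected with an arc labelled with a change of $a_i$ the following is true:
$v_i(s_1) = 0$ and $v_i(s_2) = 1$ if the arc is labelled $+a_i$; $v_i(s_1) = 1$ and $v_i(s_2) = 0$ if the arc is labelled $-a_i$; and $v_j(s_1) = v_j(s_2)$ for every $j \neq i$.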
Once a consistent state assignment has been performed, truth tables are obtained for each output signal and an implementation is produced. The process of obtaining a truth table depends on the implementation architecture chosen (for this particular signal).
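The conventional SG construction can be sketched as a breadth-first exploration of reachable markings that carries a binary code along each arc and rejects inconsistent assignments. This is only an illustrative sketch under stated assumptions: the net is safe and bounded, the initial code is supplied by the caller, and the function name `build_state_graph` and its data layout are hypothetical.

```python
from collections import deque


def build_state_graph(pre, post, label, m0, code0):
    """Explore reachable markings of a safe net and assign binary codes.

    Returns (codes, arcs); raises ValueError on an inconsistent assignment.
    """
    m0 = frozenset(m0)
    codes = {m0: dict(code0)}
    arcs = []
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        for t in pre:
            if not pre[t] <= m:
                continue  # t is not enabled in m
            sign, sig = label[t]
            before = codes[m][sig]
            # +a must start from a = 0, -a must start from a = 1
            if before != (0 if sign == "+" else 1):
                raise ValueError(f"inconsistent firing of {t}")
            m2 = frozenset((m - pre[t]) | post[t])
            code2 = dict(codes[m], **{sig: 1 - before})
            if m2 not in codes:
                codes[m2] = code2
                queue.append(m2)
            elif codes[m2] != code2:
                raise ValueError(f"marking reached with two codes via {t}")
            arcs.append((m, t, m2))
    return codes, arcs


# Hypothetical two-signal handshake: a+ -> b+ -> a- -> b- -> a+ ...
PRE = {"a+": {"p0"}, "b+": {"p1"}, "a-": {"p2"}, "b-": {"p3"}}
POST = {"a+": {"p1"}, "b+": {"p2"}, "a-": {"p3"}, "b-": {"p0"}}
LABEL = {"a+": ("+", "a"), "b+": ("+", "b"), "a-": ("-", "a"), "b-": ("-", "b")}
codes, arcs = build_state_graph(PRE, POST, LABEL, {"p0"}, {"a": 0, "b": 0})
```

The explicit enumeration of markings in this sketch is exactly the step that causes state explosion on larger specifications.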
Correctness criteria for synthesis of speed-independent circuits can be divided into general correctness criteria and architecture-specific correctness criteria. The former are behavioural properties of an STG which characterise it as implementable. In addition to the consistent state assignment, they also include:
Boundedness, which guarantees that the behaviour specified by an STG can be implemented in a finite-size circuit;
Semi-modularity (also called "output signal persistency"), which implies that excited output signals cannot be disabled by some input signal change and thus cause a hazard.
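On an explicit SG, semi-modularity can be checked state by state. The sketch below is an assumption-laden illustration rather than the check used by any particular tool: arcs are `(state, transition, next_state)` triples, `label` maps a transition to a `(sign, signal)` pair, and an excited output change is reported if it is disabled by the firing of any other signal change, which is a slightly stricter reading than the input-only wording above.

```python
def persistency_violations(arcs, label, outputs):
    """Return (state, disabled_change, firing_transition) triples."""
    succ = {}
    for s, t, nxt in arcs:
        succ.setdefault(s, {})[t] = nxt

    def excited(s):
        # Signal changes enabled in state s
        return {label[t] for t in succ.get(s, {})}

    bad = []
    for s, out in succ.items():
        for t, nxt in out.items():
            for change in excited(s):
                disabled = change != label[t] and change not in excited(nxt)
                if change[1] in outputs and disabled:
                    bad.append((s, change, t))
    return bad


# Hypothetical fragments: x is an output signal, i is an input signal.
LABEL = {"x+": ("+", "x"), "i+": ("+", "i")}
hazardous = [("s0", "x+", "s1"), ("s0", "i+", "s2")]
safe = hazardous + [("s1", "i+", "s3"), ("s2", "x+", "s3")]
```

In the `hazardous` fragment the input change `i+` removes the excitation of `x+`, so one violation is reported; in the `safe` fragment `x+` remains excited after `i+` and no violation is found.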
The architecture-specific criteria are usually checked during the actual logic synthesis process. Violations of them are generally referred to as coding conflicts and indicate that although the STG is implementable "in principle", some binary state may be associated with different markings, which makes these markings indistinguishable at the circuit level. However, if a cover is obtained in some other way (e.g. using an oracle), it may cover some other states. For example, the method described in [6] uses structural information to obtain covers. Such a cover is called an approximated cover, and needs to be checked for correctness. There are different requirements for correctness of covers according to the implementation architecture chosen.
The following three architecture types are normally considered:
Atomic complex gate per signal implementation;
Atomic complex gate per excitation function implementation;
Atomic complex gate per excitation region implementation.
The first architecture can be considered as a basic type. The other two aim at reducing the size of customised complex gates. In these architectures it is assumed that the output signal is implemented using a memory element. The Set and Reset excitation functions for this memory element are implemented as atomic complex gates (the former) or a network of atomic complex gates (the latter). Depending on which memory element is used, the implementations are divided into i) the standard C-element implementation, which uses a Muller C-element as the memory element, and ii) the RS-latch implementation, where an RS-latch is used.
To demonstrate the novel technique we chose the atomic complex gate per signal architecture. Our method, however, can be easily adapted to the other architectures.
Atomic complex gate per signal implementation
This is a basic architecture for speed-independent circuits studied in [1]. The circuit is implemented as a network of atomic gates. Each gate uniquely implements one output signal. Its boolean function can be represented as a Sum-Of-Products (SOP) or a Sum-Of-Functions (SOF). An example of such a gate is shown in Figure 1(b). Each gate is allowed to be sequential (a latch), i.e. to contain an internal feedback with zero delay. The delay between its internal "ANDing" and "ORing" parts is also assumed to be negligible. The gate depiction is used to denote the implemented boolean function, as the actual implementation is resolved at the transistor level.
... the binary code assigned to the state. The cover C for the implementation is obtained from the terms included in the on-set. The DC-set can be used for optimising the size of C. This is done in standard minimisation tools, such as Espresso [10].
1 Here and further, for simplicity, it is assumed that the on-set is constructed. Usually, the simpler of the on- and off-sets is chosen for implementation.
Obtaining exact covers usually means that all states in the on- or off-set must be known. An approximation algorithm produces approximated covers of the on- and off-sets. Therefore, in this implementation architecture, covers of the on- and off-sets must satisfy the following condition:
Definition 1 Two covers $C_{On}(a_i)$ and $C_{Off}(a_i)$ are said to be correct iff $C_{On}(a_i)$ and $C_{Off}(a_i)$ cover $On(a_i)$ and $Off(a_i)$, respectively, and $C_{On}(a_i) \cap C_{Off}(a_i) \subseteq$ DC-set.
If the covers do not satisfy the above condition, then the approximation is too loose and needs to be refined. If, on the other hand, the covers are exact but still intersect outside the DC-set, then the STG has a Complete State Coding (CSC) problem. In this case it should be corrected by changing the specification, e.g. by inserting additional signals.
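When the on-set, the off-set and the set of all binary codes are small enough to enumerate, the condition of Definition 1 can be checked by brute force. In the sketch below a cube is a string over `0`, `1` and `-` (with `-` meaning "either value"), a cover is a list of cubes, and all function names are assumptions introduced for illustration.

```python
from itertools import product


def cube_covers(cube, state):
    """True if the cube contains the binary state."""
    return all(c in ("-", s) for c, s in zip(cube, state))


def cover_contains(cover, state):
    return any(cube_covers(cube, state) for cube in cover)


def covers_correct(c_on, c_off, on_set, off_set, all_codes):
    """Definition 1: both sets are covered and the overlap lies in the DC-set."""
    dc_set = set(all_codes) - set(on_set) - set(off_set)
    covers_on = all(cover_contains(c_on, s) for s in on_set)
    covers_off = all(cover_contains(c_off, s) for s in off_set)
    overlap = {s for s in all_codes
               if cover_contains(c_on, s) and cover_contains(c_off, s)}
    return covers_on and covers_off and overlap <= dc_set


# Hypothetical two-signal example: the code 01 is unreachable (DC-set).
ALL_CODES = ["".join(bits) for bits in product("01", repeat=2)]
ON_SET, OFF_SET = {"10", "11"}, {"00"}
```

With these sets, the pair `["-1", "1-"]` and `["0-"]` is correct because the covers overlap only on the unreachable code `01`, whereas `["--"]` and `["0-"]` is not, since the overlap includes the off-set code `00`.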
Slices in STG-unfolding segment
STG-unfolding segment
Analysis of STGs using the STG-unfolding segment was studied elsewhere [9]. An STG-unfolding segment is a tuple $G' = \langle T', P', F', L' \rangle$ where $T'$, $P'$ and $F'$ are sets of transitions, places and the flow relation, respectively, and $L'$ is a labelling function which labels each element of $G'$ as an instance of an element of $G$. $G'$ is a partial order obtained from an STG $G$ by the process of its unfolding, which starts from the initial marking. The unfolding process uses the structural properties of the constructed partial order to determine the relations of conflict, concurrency and precedence between instances. These relations are used to decide where to instantiate the next element. The following key notions were introduced in [4]:
The min-set of transitions needed to fire $t'$, including $t'$, is called the local configuration of $t'$ and is denoted as $\lceil t' \rceil$. In contrast to the PN-unfolding [4], the STG-unfolding takes into account the signal interpretation of PN transitions and keeps track of the binary codes reached by transition firing. However, it still examines only a subset of all reachable states and thus is more efficient than SG analysis for a vast number of examples.
Each instance $t'$ of the STG-unfolding segment is assigned a binary code, which is the code reached by firing the transitions in $\lceil t' \rceil$.
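The local configuration and the binary code attached to an instance can be sketched as follows. The representation is an assumption made for illustration: `preds` maps each transition instance to its immediate causal predecessors, and the code is obtained by toggling one signal per instance, which is valid only under consistent state assignment, where the changes of any one signal within a configuration alternate.

```python
def local_configuration(t, preds):
    """All instances that must fire before t, together with t itself."""
    config, stack = set(), [t]
    while stack:
        x = stack.pop()
        if x not in config:
            config.add(x)
            stack.extend(preds.get(x, ()))
    return config


def code_of(config, label, code0):
    """Binary code reached from code0 by firing every instance in config."""
    code = dict(code0)
    for x in config:
        _, sig = label[x]
        code[sig] ^= 1  # each instance toggles exactly one signal
    return code


# Hypothetical segment: a+ enables concurrent b+ and c+, which both precede a-.
PREDS = {"a+": set(), "b+": {"a+"}, "c+": {"a+"}, "a-": {"b+", "c+"}}
LABEL = {"a+": ("+", "a"), "b+": ("+", "b"), "c+": ("+", "c"), "a-": ("-", "a")}
CODE0 = {"a": 0, "b": 0, "c": 0}
```

Here the local configuration of `b+` contains `a+` and `b+` but not the concurrent instance `c+`, so its code has `a` and `b` high and `c` low.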
Similar to its postset, the binary code corresponding to a configuration $C$ is calculated from the $\lceil t' \rceil$ of the transitions comprising it. It was shown in [9] that all states of the SG are represented in the STG-unfolding segment as postsets of some configuration. It was also demonstrated in [9] that an STG-unfolding segment can only be constructed for an STG specification satisfying the boundedness and consistent state assignment criteria. The last general correctness criterion, semi-modularity, can be checked on the STG-unfolding segment in linear time.
Cuts
Each instance $t'$ of the STG-unfolding segment uniquely identifies the following four cuts:
A minimal excitation cut $c^{e}_{min}(t')$, which represents a state at which $t'$ first becomes enabled;
A minimal stable cut $c^{s}_{min}(t')$, which represents a state which is reached by firing $t'$;
A maximal excitation cut $c^{e}_{max}(t')$, which represents a state from which, in a correct STG, no advancement can be made unless $t'$ is fired;
A maximal stable cut $c^{s}_{max}(t')$, which represents a state reached after firing $t'$ from which the firing of any transition leads to a state enabling the next change of the signal $a_i$ labelling $t'$.
Thus each instance identifies states bounding the subset of the on-set (or off-set) of $a_i$ which is found for this particular instance.
Slices
To represent a (connected) set of states we introduce the notion of a slice of the STG-unfolding segment. A slice of the STG-unfolding segment is a set of cuts $S = \langle c_{min}, C_{max} \rangle$ defined by a min-cut of the slice, $c_{min}$, and a set of max-cuts, $C_{max}$, such that $\forall c_i \in S$ the following is true: $c_{min} \preceq c_i$ and $\exists c^{max}_j \in C_{max} : c_i \preceq c^{max}_j$, where $c \preceq c'$ denotes that cut $c$ precedes (or coincides with) cut $c'$. No two cuts in the set of max-cuts are sequential.
In other words, a slice is defined between one min-cut and a set of max-cuts. Every cut in between the min-cut and a max-cut is encapsulated in the slice $S$. Furthermore, for any two cuts $c_i$ and $c_j$ encapsulated by $S$, if $c_i$ precedes $c_j$, then all cuts between $c_i$ and $c_j$ are also encapsulated by $S$. Since each cut represents some state in the SG, for any two states $s_i$ and $s_j$ represented as sequential cuts in a slice, all states on any path from $s_i$ to $s_j$ are also represented as cuts encapsulated in $S$. The number of cuts in the set of max-cuts corresponds to the number of configurations (non-conflicting runs of the STG) which include the configuration producing the min-cut. The elements of the STG-unfolding segment, i.e. places and transitions, bounded by instances in the min-cut and max-cuts are said to belong to the slice. A slice represents a subset of reachable states found in the SG for any STG bounded by the cuts defining it. As discussed earlier, the synthesis of speed-independent circuits is based on finding subsets of reachable states. Therefore, slices of the STG-unfolding segment can be used to identify and represent these subsets.
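The definition of a slice can be made concrete by enumerating its cuts explicitly. The following sketch assumes that one cut precedes another when the second is reachable from the first by firing transition instances of the segment; cuts are sets of place instances, and the function name `slice_cuts` and the example net are hypothetical. The enumeration is exponential in the number of concurrent instances, so it serves only to illustrate the definition.

```python
def slice_cuts(min_cut, max_cuts, pre, post):
    """All cuts c such that min_cut precedes c and c precedes some max-cut."""
    min_cut = frozenset(min_cut)
    targets = {frozenset(c) for c in max_cuts}
    succ, stack = {}, [min_cut]
    while stack:  # forward exploration, stopping at the max-cuts
        c = stack.pop()
        if c in succ:
            continue
        succ[c] = [frozenset((c - pre[t]) | post[t])
                   for t in pre if pre[t] <= c]
        if c not in targets:
            stack.extend(succ[c])
    # Keep only the explored cuts from which some max-cut can be reached.
    good = {c for c in succ if c in targets}
    changed = True
    while changed:
        changed = False
        for c, nxt in succ.items():
            if c not in good and any(n in good for n in nxt):
                good.add(c)
                changed = True
    return good


# Hypothetical fragment: t1 and t2 are concurrent and both precede t3.
PRE = {"t1": {"p1"}, "t2": {"p2"}, "t3": {"p3", "p4"}}
POST = {"t1": {"p3"}, "t2": {"p4"}, "t3": {"p5"}}
cuts = slice_cuts({"p1", "p2"}, [{"p3", "p4"}], PRE, POST)
```

For this fragment the slice holds the four cuts between `{p1, p2}` and `{p3, p4}`, while the cut `{p5}` reached by firing `t3` lies beyond the max-cut and is excluded.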
Cuts and slices are illustrated in Figure 2. Consider a cut c = p ...
Each cut is produced by some configuration of the STG-unfolding segment. Hence, the binary codes of the SG states represented by cuts encapsulated in a particular slice can be recovered by examining its cuts.
Synthesis from STG-unfolding segment
Obtaining exact covers
First, consider the problem of synthesis from the STG-unfolding segment $G'$ by finding exact covers for the on- (off-)set. To implement an output signal of an STG as an atomic gate, its on-set is required. Since its SG is represented as an STG-unfolding segment, the problem is to find a set of slices in this segment which represents all states in the on-set, i.e. an on-set partitioning of $G'$ for $a_i$.
To define each slice we need to identify a min-cut and a set of max-cuts. Of all instances in the STG-unfolding segment, only instances of $+a_i$ may change the value of the corresponding element in the binary codes. Furthermore, for each instance +a ...
For a complete definition of each slice we need to determine a set of max-cuts for each slice. The minimal excitation cut of any instance $-a'_i$ represents the first state at which $-a'_i$ becomes excited.
This cut belongs to the off-set.
For each instance $+a'_i$ the slice must be bounded by a set of cuts which can be reached from the min-cut without exciting $-a_i$. The slice is bounded by the maximal excitation cuts of the immediate predecessors of $next(+a'_i)$, i.e. cuts at which an immediate predecessor of a transition from $next(+a'_i)$ is the only transition to fire. This is the furthest state to which the system can advance from $+a'_i$ without enabling $-a_i$. In the case of the initial transition, the set of max-cuts for the first slice is chosen using $first(a_i)$.
Deriving cover approximation from STG-unfolding segment
The synthesis procedure described in the previous subsection suffers from one drawback. If many concurrent transitions belong to a slice, then obtaining the binary codes for all cuts will suffer from exponential explosion of states. To combat this, an approximation method is suggested.
Two types of nodes can be identified in the on-set of signal $a_i$: those which have $+a_i$ excited and those at which $a_i$ is stable at ...
Cover refinement
Due to the approximated nature of the covers, an on-set cover found from the STG-unfolding segment may implement an incorrect function. Indeed, if an output signal is implemented using an on-set cover approximation which covers a state belonging to the off-set, then the output will change to "1" where it is supposed to be "0". Thus cover approximations obtained using the algorithm described before need to be checked. To check cover correctness, both on- and off-set cover approximations are required. Suppose that both approximated covers for the on- and off-set of $a_i$ were obtained. Suppose also that their intersection is non-empty. The covers' intersection may only belong to the DC-set. However, to find the DC-set all codes in both the on-set and the off-set must be known. Therefore, to ensure that the covers implement the logic functions correctly we check a stronger condition: approximated covers for the on- and off-set are said to be correct if their intersection is empty. The approximation produces semi-optimised covers. Exact covers have an empty intersection by construction. Therefore, if the covers' intersection is non-empty, then they need to be refined until their intersection becomes empty, possibly restoring the exact covers. Thus the use of a stronger condition only affects the quality of optimisation rather than the correctness of the covers.
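The stronger condition can be tested on the cubes alone, without knowing the DC-set. In the sketch below a cube is a string over `0`, `1` and `-`, and two cubes intersect when no position holds opposite defined values; the function names are assumptions introduced for illustration.

```python
def cubes_intersect(b1, b2):
    """True if some binary code is contained in both cubes."""
    return all(x == y or "-" in (x, y) for x, y in zip(b1, b2))


def covers_disjoint(c_on, c_off):
    """Stronger correctness condition: the two covers share no code."""
    return not any(cubes_intersect(b1, b2) for b1 in c_on for b2 in c_off)
```

For example, `covers_disjoint(["1-"], ["0-"])` holds, whereas the covers `["-1"]` and `["0-"]` share the code `01` and would therefore be sent to refinement even if `01` happened to lie in the DC-set.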
If after complete refinement the on- and off-set covers still intersect, then the STG has a CSC problem and cannot be implemented without changes to the specification. Correct refined covers can be optimised using any known minimisation technique. The pseudo-code of the algorithm for deriving covers for the on- and off-sets is shown in Figure 5. The initial on- and off-set cover approximations are found as described in the previous subsection. If the intersection of the approximated covers is not empty, then these covers are refined. Only the concurrency relation was used for finding the approximated covers. Other relations between transitions concurrent to $a'_i$ were ignored. The general idea behind refinement is that some of the information about the cover is restored using these relations. Covers are refined until they are "good enough", i.e. until their intersection becomes empty.
The intersection of the on- and off-set covers may become non-empty due to the approximation of the marked region (MR) cover for some places in the approximation set. These MR cover approximations may intersect with the excitation region (ER) cover approximations of some instances of the opposite signal transition. In this case only the cover approximations for these places (but not for all places in the approximation set) and for the instance of the opposite signal transition need to be refined. The set of signals Sig which cause the intersection is also known. These are exactly those signals whose value is undefined in one of the cubes $B \in C$. Thus we need to consider the problem of refining a cover approximation for an element $x'$ of the STG-unfolding segment with Sig. Informally, at each step the refinement procedure restores the marking component of the reachable states represented by the slice. It finds a set of places which can be marked together with each already partially restored marking. The cover function is then changed to reflect the fact that the partially restored markings now include the places found. Thus in the end, when the procedure terminates, the covers correspond to fully restored markings and cover only states with these marking components.
Since each step refines the value of at least one variable and the set of signals is finite, the refinement procedure will terminate in a finite number of steps, producing an exact cover for the states ...
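One possible reading of how the set Sig is obtained is sketched below: for every pair of intersecting on- and off-set cubes, collect the signals that are left undefined (`-`) in either cube. This interpretation, the cube encoding as strings over `0`, `1` and `-`, and the function name `refinement_signals` are assumptions made for illustration only.

```python
def refinement_signals(c_on, c_off, signals):
    """Signals undefined in at least one cube of an intersecting cube pair."""
    sig = set()
    for b1 in c_on:
        for b2 in c_off:
            # The pair intersects if no position holds opposite defined values.
            if all(x == y or "-" in (x, y) for x, y in zip(b1, b2)):
                sig |= {s for s, x, y in zip(signals, b1, b2) if "-" in (x, y)}
    return sig
```

For signals `a, b, c`, the cubes `1-1` and `11-` intersect on the code `111`, and the sketch returns `{b, c}` as the signals whose values must be refined; disjoint cubes such as `1--` and `0--` contribute nothing.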
Experimental results
The method suggested in this paper was implemented on the basis of the unfolding tool "PUNT". Experiments are divided into two major series. The goal of the first series was to demonstrate the quality of the proposed method. Results of the synthesis procedure, tested on a set of benchmarks, are shown in Table 1. The table presents the time breakdown (in seconds) for synthesising a speed-independent circuit from its STG specification in the atomic complex gate per signal architecture ("PUNT ACG"). Column "UnfTim" shows the time taken to construct the STG-unfolding segment; column "TotTim" shows the total time taken to synthesise a particular circuit (including Espresso optimisation). For comparison, the same set of benchmarks was synthesised using two known tools, Petrify and SIS. Their timings are grouped in the column "Other tools". Literal count (columns "LitCnt") was used as a measure of the quality of the new synthesis method. The literal count shows the total number of literals in the obtained covers of the final implementations. The number of signals (column "Sigs"), which influences the complexity of the specification and its behavioural representation, is also given for each specification.
As can be observed, the synthesis technique based on the STG-unfolding segment produces implementations comparable to those produced by other tools. The timing results show that our technique compares favourably to Petrify. It is also comparable with SIS on the benchmarks with a low signal count, and it becomes increasingly better as the signal count grows. These results show that for small benchmarks, the overheads of constructing the STG-unfolding segment and traversing it may outweigh the time spent on constructing a small reachability graph with an efficient implementation. Using a stronger correctness condition for approximated covers may produce a slightly worse implementation due to the fact that the DC-set is partitioned.
The second series of experiments shows the feasibility of the new method on a set of scalable examples such as the Muller pipeline. Experimental results are shown in Figure 6. As can be observed, existing tools soon choke on the size of the specification, either running out of memory or taking a prohibitively long time. The literal count for all three tools was the same. Both SIS and Petrify exhibit doubly exponential growth of the time taken. The first exponential dependency is due to the state space explosion; the second is due to the exponential complexity of the exact synthesis process used in both tools. In addition, we synthesised a Counterflow pipeline specification [11] which has 34 signals. Of the existing tools, only Petrify was able to synthesise it, taking more than 24 hours. PUNT was able to synthesise it in under 2 hours, thus giving an order of magnitude gain in speed. This is shown on the graph as a circled dot.
Conclusions
In this paper we presented a new method for the synthesis of speed-independent circuits. Our approach is based on the STG-unfolding segment. It uses the segment as a model from which an implementation is obtained. As the size of the STG-unfolding segment is often smaller than the size of the SG, it is possible to synthesise specifications of larger sizes. In addition, due to the smaller size of the semantic model, the implementation can be obtained faster on a number of moderate-sized examples. We demonstrated the applicability of our method on an existing set of benchmarks.
Future development of this method can be directed into exploring heuristics for the refinement procedure, which is the core of our method. In addition, this method can be adapted to the other implementation architectures. In this case, the approximation will be used to obtain the excitation functions for memory elements by finding the slices corresponding to the required regions of the SG. Furthermore, the method can be enhanced by accommodating checks for weaker correctness conditions for approximated covers.
