Abstract: This paper presents a time-efficient method for the decomposition and resynthesis of speed-independent (SI) circuits. Given the specification of an SI circuit, our method first generates its standard C implementation. Combinational decomposition is then performed to decompose each high-fanin gate that does not exist in the gate library into available low-fanin gates. The time efficiency of our method comes from two sources. First, the signal transition graph (STG), whose complexity is polynomial in the worst case, is adopted as the input specification. Second, to reduce the number of resynthesis cycles, which account for a major part of the run time, our method first tries to decompose each high-fanin gate hazard free without adding any signals. For the gates that cannot be decomposed hazard free, two signal-adding methods constructed at the STG level are then applied for resynthesis. This decomposition and resynthesis process is iterated until all high-fanin gates are successfully decomposed or no solution can be found. Experiments on asynchronous benchmarks show that, compared with previous work, our method substantially reduces the run time at the cost of slightly more area.
I. INTRODUCTION
SPEED-INDEPENDENT (SI) circuits are hazard free under the unbounded gate-delay assumption, i.e., gate delays are unbounded but finite and wires have zero or negligible delay. A high-fanin gate that does not exist in the gate library can only be made implementable by decomposing it into available low-fanin gates. This decomposition must be done with care so that speed independence is preserved by the new gates. Several methods for decomposing SI circuits have been proposed. The method in [19] handles the hazard-free decomposition of some gates only; it neither searches the solution space further nor shares gates. In [3], sequential decomposition is performed on a sequential circuit with high-fanin gates after the correctness conditions are analyzed; however, the method lacks an efficient way to apply the correctness check and to share the decomposed gates. An algebraic factorization technique for the combinational and sequential decomposition of complex gates is proposed in [11] and later improved in [7], which uses Boolean algebra for logic decomposition and extends the use of C elements to other types of storage elements. These last two methods, however, are time-consuming for large circuits because both are constructed at the state graph (SG) level. Moreover, they add new signals to the initial signal transition graph (STG) for each gate to be decomposed, which requires more resynthesis cycles. Although some area can be saved, much more CPU time is spent. Our method overcomes this problem by adopting the STG as the input specification and by reducing the number of resynthesis cycles during decomposition. Although the STG has limitations as a circuit specification, its problem complexity is proportional only to the number of signals, which saves a large amount of run time.
Moreover, because an STG is a high-level description, it is also easier for designers to describe circuit behavior with it. Owing to these properties, STGs have previously been used to solve the state conflict problem and to synthesize SI circuits [2], [4], [10], [15], [16], [20]. To our knowledge, however, no previous work has addressed the decomposition and resynthesis of SI circuits at the STG level. In this paper, given the STG specification of an SI circuit, a standard C implementation is first generated. Then, combinational decomposition is performed to decompose the high-fanin gates into low-fanin ones. To avoid explicitly generating all the markings of the STG, several cube properties are analyzed with the help of the concurrency [4] and interleave [16] relations between signals. Since resynthesis cycles account for a major part of the run time, our method reduces them by first searching for a hazard-free decomposition of each high-fanin gate without adding any signals to the STG. By observing the changes of cubes and the firing relations between transitions, we derive the requirements for a hazard-free decomposition. If no such decomposition exists, the initial STG is modified to incorporate as few new signals as possible for resynthesis. Two signal-adding methods constructed at the STG level, intermediate gate acknowledging (IGA) and signal replacing (SR), are developed. This decomposition and resynthesis process terminates when all high-fanin gates are successfully decomposed or no further progress can be made. Compared with previous work, our method substantially reduces the run time at the cost of slightly more area. This paper is organized as follows. Section II gives the background needed for our discussion. Section III discusses the cube properties that allow the whole process to be carried out at the STG level.
Hazard-free decomposition is discussed in Section IV, and the resynthesis procedure for gates that cannot be decomposed successfully is given in Section V. Experimental results are shown in Section VI. Section VII concludes this paper and outlines directions for future work.
II. BACKGROUND
In this section, we recall some basic definitions of Petri nets (PNs) [14] and STGs [4]. We then discuss the implementation structure adopted in our method and the requirements for a hazard-free implementation of an SI circuit.
A. PNs and STGs
An STG is an interpreted PN $N = \langle P, T, F, M_0 \rangle$, where $P$ is a set of places, $T$ is a set of signal transitions, $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation, and $M_0$ is the initial marking. For a node $x \in P \cup T$, the fanin and fanout nodes of $x$ are denoted by ${}^\bullet x$ and $x^\bullet$, respectively. A marking $M$ of a PN is an assignment of tokens to each place. A transition $t$ is enabled in a marking $M$ if each fanin place of $t$ is marked in $M$. When this enabled transition fires, it reaches a new marking $M'$ by removing a token from each place in ${}^\bullet t$ and adding a token to each place in $t^\bullet$. A marking $M_n$ is reachable from the initial marking $M_0$ if there exists a sequence of firings $\sigma = t_1 t_2 \cdots t_n$, called a feasible sequence, that transforms $M_0$ into $M_n$. A path in a PN is a sequence of nodes $x_1 x_2 \cdots x_n$ such that any two adjacent nodes $x_i$ and $x_{i+1}$ are directly connected, i.e., $(x_i, x_{i+1}) \in F$; it is called a cycle if $x_1$ and $x_n$ are identical. A path or cycle is simple if no node appears in it more than once.
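The firing rule above can be illustrated with a minimal token game. The sketch below assumes a safe PN, so a marking can be represented as the set of marked places; the class name `SafePetriNet` and the four-phase handshake net (`req`/`ack`) are hypothetical illustrations, not one of the paper's benchmarks.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Arc = Tuple[str, str]  # (source node, target node): one element of the flow relation F


class SafePetriNet:
    """Token game for a safe PN: a marking is the set of places holding a token."""

    def __init__(self, flow: Set[Arc]):
        self.flow = flow

    def fanin(self, node: str) -> Set[str]:
        return {x for (x, y) in self.flow if y == node}

    def fanout(self, node: str) -> Set[str]:
        return {y for (x, y) in self.flow if x == node}

    def enabled(self, t: str, marking: FrozenSet[str]) -> bool:
        # t is enabled if every fanin place of t is marked
        return self.fanin(t) <= marking

    def fire(self, t: str, marking: FrozenSet[str]) -> FrozenSet[str]:
        if not self.enabled(t, marking):
            raise ValueError(f"{t} is not enabled")
        # remove a token from each fanin place, add one to each fanout place
        return frozenset((marking - self.fanin(t)) | self.fanout(t))

    def run(self, m0: FrozenSet[str], sequence: Iterable[str]) -> FrozenSet[str]:
        """Fire a feasible sequence from m0; raises ValueError if it is not feasible."""
        m = m0
        for t in sequence:
            m = self.fire(t, m)
        return m


# Hypothetical four-phase handshake: req+ -> ack+ -> req- -> ack- -> req+
HANDSHAKE = SafePetriNet({
    ("p1", "req+"), ("req+", "p2"), ("p2", "ack+"), ("ack+", "p3"),
    ("p3", "req-"), ("req-", "p4"), ("p4", "ack-"), ("ack-", "p1"),
})
M0 = frozenset({"p1"})
```

For example, `HANDSHAKE.run(M0, ["req+", "ack+"])` reaches the marking in which only `p3` is marked, while `ack+` is not enabled in `M0`.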
A free-choice (FC) net is a PN in which, if two or more transitions share a fanin place $p$, then $p$ is the only fanin place of each of them. A place is a multifanin or multifanout place if it has more than one fanin or fanout transition, respectively. In an FC net, a multifanout place $p$ is an FC place; when $p$ is marked, only one of its fanout transitions can fire. A PN is live if, for each reachable marking $M$ and each transition $t$, a marking $M'$ in which $t$ is enabled can be reached from $M$. A PN is safe if, for each reachable marking $M$ and each place $p$, $p$ holds at most one token in $M$. A marked graph (MG) is a PN in which each place has exactly one fanin and one fanout transition, and a state machine (SM) is a PN in which each transition has exactly one fanin and one fanout place. It has been proved in [9] that a live and safe FC net can be decomposed into a (potentially exponential) set of strongly connected MG or SM components that covers the net, with each SM component containing exactly one token. Only live and safe FC nets are considered in this paper.
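The three net classes defined above are purely structural, so they can be checked directly on the flow relation. The following is a minimal sketch, assuming the flow relation is given as a set of (source, target) pairs; the function names are illustrative only.

```python
from typing import Set, Tuple

Arc = Tuple[str, str]


def fanin(node: str, flow: Set[Arc]) -> Set[str]:
    return {x for (x, y) in flow if y == node}


def fanout(node: str, flow: Set[Arc]) -> Set[str]:
    return {y for (x, y) in flow if x == node}


def is_marked_graph(places: Set[str], flow: Set[Arc]) -> bool:
    # MG: every place has exactly one fanin and one fanout transition
    return all(len(fanin(p, flow)) == 1 and len(fanout(p, flow)) == 1 for p in places)


def is_state_machine(transitions: Set[str], flow: Set[Arc]) -> bool:
    # SM: every transition has exactly one fanin and one fanout place
    return all(len(fanin(t, flow)) == 1 and len(fanout(t, flow)) == 1 for t in transitions)


def is_free_choice(places: Set[str], flow: Set[Arc]) -> bool:
    # FC: a place shared by several transitions must be the only fanin place of each
    for p in places:
        consumers = fanout(p, flow)
        if len(consumers) > 1 and any(fanin(t, flow) != {p} for t in consumers):
            return False
    return True
```

A single simple cycle such as `p1 -> a+ -> p2 -> a- -> p1` is at the same time an MG, an SM, and an FC net, which matches the remark that each SM component of an MG is a simple cycle.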
For a signal $a$, the rising and falling transitions of $a$ are denoted by $a+$ and $a-$, respectively, and $a*$ represents a generic transition of $a$. If $a$ changes several times in one cycle of the circuit operation, $a*_i$ denotes the $i$-th instance of $a*$. A transition of $a$ and the next transition of $a$ are adjacent; no other transition of $a$ lies between them. An STG is output-semimodular if no output signal transition enabled in any marking can be disabled by other enabled transitions. The implementation of an STG that is not output-semimodular may produce unspecified changes on gate outputs, i.e., hazards. The fanout transitions of an FC place are hence restricted to be input signal transitions. Each marking $M$ of an STG is encoded by the binary values of all signals, and the characteristic function of this binary code represents $M$. The encoding of a marking $M$ is consistent if no rising (falling) transition of a signal $a$ is enabled in $M$ when the value of $a$ in $M$ is 1 (0). An STG satisfies the complete state coding (CSC) property if, for any two different markings with the same binary code, the output signal transitions enabled in them are identical. A CSC violation means that the circuit, while in the same state, has to produce different transitions on gate outputs. Such conflicting states can be distinguished in the circuit by additional memory [5]. A more restrictive condition, the unique state coding (USC) property, requires each reachable marking to be assigned a unique binary code. An STG satisfying the output-semimodularity, consistency, and CSC properties is called an implementable STG, from which a correct SI circuit can be derived [12]. Two transitions $t_1$ and $t_2$ are concurrent if they can fire in the same marking without disabling each other, and they are autoconcurrent [16] if, in addition, they are transitions of the same signal. A transition $u$ is persistent to a transition $t$ if the underlying signal of $u$ is nonconcurrent to $t$.
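The consistency, CSC, and USC properties can be checked by brute force once the reachable markings are listed explicitly, which is exactly the enumeration this paper avoids; the sketch below is only meant as a reference for small examples. It assumes each reachable marking is given as a pair (binary code, set of enabled transition labels), with labels such as `ack+` or `ack+/2`; this input format and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Code = Dict[str, int]              # signal name -> binary value
State = Tuple[Code, Set[str]]      # (code of one reachable marking, transitions enabled in it)


def split(label: str) -> Tuple[str, str]:
    """'ack+' -> ('ack', '+'); an optional instance suffix such as 'ack+/2' is dropped."""
    base = label.split("/")[0]
    return base[:-1], base[-1]


def is_consistent(states: List[State]) -> bool:
    # no a+ may be enabled where a = 1, and no a- where a = 0
    for code, enabled in states:
        for label in enabled:
            sig, d = split(label)
            if (d == "+" and code[sig] == 1) or (d == "-" and code[sig] == 0):
                return False
    return True


def has_csc(states: List[State], outputs: Set[str]) -> bool:
    # markings with equal codes must enable the same output transitions
    seen: Dict[Tuple, FrozenSet[Tuple[str, str]]] = {}
    for code, enabled in states:
        key = tuple(sorted(code.items()))
        outs = frozenset(split(t) for t in enabled if split(t)[0] in outputs)
        if seen.setdefault(key, outs) != outs:
            return False
    return True


def has_usc(states: List[State]) -> bool:
    # every reachable marking must have a unique code
    keys = [tuple(sorted(code.items())) for code, _ in states]
    return len(keys) == len(set(keys))
```

In a hypothetical `req`/`ack` handshake where `ack` is the only output, the four reachable markings have distinct codes, so consistency, CSC, and USC all hold; two markings with the code `req=0, ack=0` that enable different output transitions would violate both CSC and USC.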
A transition $t$ satisfies the persistency property if each trigger transition of $t$ is persistent to $t$. Persistency is a necessary condition for an STG to be realized by the standard C implementation [12]. In our work, the input specification is hence assumed to be an implementable STG satisfying persistency, except in cases where the standard C implementation can be reduced to a combinational implementation. Fig. 1(a) shows the STG specification of the benchmark master-read.g, an MG describing the master read operation of the Intel Multibus (IEEE Standard 796). Fig. 1(b) shows the STG specification of another example, sbuf-send-pkt2.g, taken from the Post Office chip [8]; it is an FC net composed of four MGs.
B. Implementation Structure
Each gate in our work is assumed to be atomic and to satisfy the pure-delay assumption [13]. An atomic gate can be modeled as an instantaneous Boolean function of its inputs followed by a single pure delay on its output. For any input change, an atomic gate responds with the corresponding output change after a finite but potentially unbounded delay. The relative ordering of signal transitions on its inputs is hence preserved irrespective of the actual gate delay. Our implementation structure, shown in Fig. 1(c), is based on a restricted class of asynchronous circuits called the standard C implementation [12]. Each noninput signal is implemented by a signal network consisting of a C element and the combinational set and reset networks. A gate is enabled (disabled) if its output is ready for a rising (falling) change. The set (reset) network, composed of set (reset) transition networks, is responsible for enabling (disabling) the C element. For a signal $a$, if the persistency property is satisfied for each rising transition of $a$ and the set and reset functions of $a$ are complements of each other, $a$ can be realized by the combinational set network alone; in this case, the falling transitions of $a$ need not satisfy persistency. Similarly, $a$ may also be realized by the combinational reset network alone. For circuit implementability or area saving, a set or reset transition network may sometimes be shared by multiple instances of a rising or falling transition. Transition network sharing, however, is not discussed here due to space limitations.
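The functional behavior of one signal network in the standard C structure can be sketched as follows: the set and reset networks are sums of products (one AND gate per cube, ORed together), the set network drives one C-element input, and the complemented reset network drives the other. This is a zero-delay functional model only (it ignores gate delays and hazards), and the function names and the cube representation (a dict from signal to required value, with missing signals as don't cares) are assumptions made for illustration.

```python
from typing import Dict, List

Cube = Dict[str, int]  # signal -> required value; signals not listed are don't cares


def cube_on(cube: Cube, code: Dict[str, int]) -> bool:
    # an AND gate: on only if every literal matches
    return all(code[s] == v for s, v in cube.items())


def sop(cubes: List[Cube], code: Dict[str, int]) -> int:
    """Two-level network: one AND gate per cube, ORed together."""
    return int(any(cube_on(c, code) for c in cubes))


def c_element(x: int, y: int, prev: int) -> int:
    """Muller C element: its output follows the inputs when they agree, otherwise holds."""
    return x if x == y else prev


def standard_c_next(set_cubes: List[Cube], reset_cubes: List[Cube],
                    code: Dict[str, int], prev: int) -> int:
    # set network on one input, complemented reset network on the other
    return c_element(sop(set_cubes, code), 1 - sop(reset_cubes, code), prev)
```

With the set network on and the reset network off the output rises; with both off the C element holds its previous value, which is what lets the set and reset networks switch off independently.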
C. Hazard-Free Implementability
A hazard is an unexpected output transition, either rising or falling, in response to a change on some input(s). An output transition is expected if it occurs in accordance with the STG specification. In an SI circuit, because there is no global clock and gates obey the pure-delay assumption, no internal transition is allowed to occur unexpectedly: an unexpected internal transition propagates to the output and causes an unexpected output transition, i.e., a hazard. Since an internal transition cannot be perceived by the outside environment, it must be acknowledged by some expected output transition(s). A transition $t_1$ is acknowledged by a transition $t_2$ if there exists a gate with the signal of $t_1$ as one of its inputs and the signal of $t_2$ as its output, such that $t_2$ can complete only after $t_1$ has completed. The completion of a signal transition means that the signal has changed from its original value to the new value. This acknowledgment relation can be further extended to signals that do not belong to the same gate.
For the standard C implementation, the enabling or disabling of a transition network is modeled by the set-on or set-off operation of a cube. A cube $c$ is a set of literals, each representing the value of a signal in its original ($a$) or complemented ($\bar{a}$) form. A cube $c$ is set on in a marking $M$, or covers $M$, i.e., $M \in c$, if each literal $a$ or $\bar{a}$ in $c$ has the corresponding binary value 1 or 0 in $M$; otherwise, $c$ is set off in $M$, i.e., $M \notin c$. To obtain the hazard-free conditions, several regions must first be defined. For a transition $a*$, the excitation region of $a*$, denoted by $ER(a*)$, is the maximal connected set of markings in which $a*$ is enabled. All the markings in $ER(a*)$ are covered by a cover cube $C(a*)$, which is then implemented by a single AND gate. The quiescent region (QR) of $a*$, denoted by $QR(a*)$, is the maximal connected set of markings reached from $ER(a*)$ before the next transition of $a$ is enabled. It has been shown in [1] and [12] that the resulting circuit operates speed independently without any hazards if each cover cube $C(a*)$ is a monotonous cover (MC), i.e., it satisfies the following conditions: 1) cover: $C(a*)$ covers all the markings in $ER(a*)$; 2) monotonous: $C(a*)$ changes at most once along any firing sequence within $QR(a*)$; 3) one-hot: $C(a*)$ covers no reachable marking outside $ER(a*) \cup QR(a*)$. Owing to the property of a C element that its two inputs are allowed to have identical values at the same time, the markings covered by $C(a*)$ may, under some constraints, be extended to include markings in $QR(a*)$ [16] for circuit implementability or area saving. Since the outputs of the first-level AND gates in the standard C implementation are one-hot encoded, the SI property is still preserved if any valid Boolean decomposition is applied to the second-level OR gates [19]. Fig. 2 shows a hazard-free implementation of master-read.g. Five three-input AND gates must be decomposed if only two-input gates are available in the gate library.
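The monotonous-cover requirements can be checked by brute force on an explicit state graph, which is exactly the enumeration the STG-level method avoids; the sketch below is only a reference for small examples. It checks three conditions: cover (every marking of ER is covered), one-hot (no marking outside ER and QR is covered), and monotonicity (once the cube falls inside QR it never rises again). It assumes markings are identified by names with known binary codes, ER and QR are given as sets of names, and edges are the firing steps; the handshake data in the usage note are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Cube = Dict[str, int]


def covers(cube: Cube, code: Dict[str, int]) -> bool:
    return all(code[s] == v for s, v in cube.items())


def is_monotonous_cover(cube: Cube, codes: Dict[str, Dict[str, int]],
                        edges: List[Tuple[str, str]], er: Set[str], qr: Set[str]) -> bool:
    # cover: every marking in ER is covered
    if not all(covers(cube, codes[m]) for m in er):
        return False
    # one-hot: no marking outside ER and QR is covered
    if any(covers(cube, codes[m]) for m in codes if m not in er and m not in qr):
        return False
    # monotonous: a firing step that ends in QR must not turn the cube back on
    for m1, m2 in edges:
        if m2 in qr and (m1 in er or m1 in qr):
            if not covers(cube, codes[m1]) and covers(cube, codes[m2]):
                return False
    return True
```

For a hypothetical handshake with markings `s0` (00), `s1` (10), `s2` (11), `s3` (01) over `(req, ack)`, taking `ER(ack+) = {s1}` and `QR(ack+) = {s2, s3}`, the cube `req` is a monotonous cover, whereas the empty (constant-1) cube fails the one-hot condition because it also covers `s0`.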
III. CUBE PROPERTIES FOR PROCESSING AT STG LEVEL
To carry out our decomposition and resynthesis at the STG level, several cube properties are analyzed in this section. The formulation is adapted from [16], with some adjustments to suit our discussion.
A. Finding Markings From Cubes
We first discuss how to capture, by a single cube, the markings that share the same property. For a place $p$ and a transition $t$, we are concerned with the following three types of markings: 1) the markings in which $p$ is marked; 2) the markings in which $t$ is enabled; 3) the markings directly reached by the firing of $t$.
To avoid generating all the markings of an STG, whose number is exponential in the worst case, markings of the same type are captured by a single cube. A cube is only a conservative approximation of the binary codes of the markings to be captured and may cover markings that do not exist. To make this estimate as accurate as possible, the cube should be the smallest one possible, i.e., the one with the largest number of literals [12]. To find the cube, we need the relation between any two nodes in the STG. This relation is derived from the cycle, MG, and SM decompositions of the free-choice STG [4], [16]. Two nodes $x$ and $y$ of an FC net $N$ are ordered if there exists a simple cycle in $N$ to which both belong. They are concurrent if they are not ordered and there exists an MG component of $N$ to which both belong. Lastly, they are conflicting if they are not ordered and there exists an SM component of $N$ to which both belong. If $N$ is an MG, each SM component of $N$ is a simple cycle. In some cases, this net decomposition does not provide enough information to determine the relation: if $x$ and $y$ are not ordered and no MG or SM component contains both of them, their relation (either concurrent or conflicting) is obtained from the relations of their fanin and fanout nodes. A signal $a$ is conflicting with a place $p$ if each transition of $a$ is conflicting with $p$; $a$ is concurrent to $p$ if some transition of $a$ is concurrent to $p$; and $a$ is ordered with $p$ if $a$ is neither conflicting with nor concurrent to $p$.
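The node and signal relations above can be read off once the decompositions are available. The sketch below assumes the simple cycles and the MG and SM components have already been computed and are given as sets of node names (computing them is not shown), and that node-place relations are stored in a dict keyed by unordered pairs; all names and data are hypothetical. When no cycle or component contains both nodes, it returns `"undetermined"`, the case in which the text falls back on the relations of the fanin and fanout nodes.

```python
from typing import Dict, Iterable, List, Set


def node_relation(x: str, y: str, simple_cycles: List[Set[str]],
                  mg_components: List[Set[str]], sm_components: List[Set[str]]) -> str:
    """Relation between two nodes, from precomputed cycle / MG / SM decompositions."""
    if any(x in c and y in c for c in simple_cycles):
        return "ordered"
    if any(x in c and y in c for c in mg_components):
        return "concurrent"
    if any(x in c and y in c for c in sm_components):
        return "conflicting"
    return "undetermined"  # resolve from the relations of the fanin/fanout nodes


def signal_place_relation(transitions_of_a: Iterable[str], p: str,
                          rel: Dict[frozenset, str]) -> str:
    """Relation of signal a to place p, given the relation of each transition of a to p."""
    rels = [rel[frozenset((t, p))] for t in transitions_of_a]
    if all(r == "conflicting" for r in rels):
        return "conflicting"
    if any(r == "concurrent" for r in rels):
        return "concurrent"
    return "ordered"
```

The order of the tests mirrors the definitions: concurrency and conflict are only considered for nodes that are not ordered.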
For a place $p$, the set of markings in which $p$ is marked is called the marked region (MR) [16] of $p$, denoted by $MR(p)$. A cube is then defined to cover all the markings in $MR(p)$.
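The target of this construction, the smallest cube covering a given set of markings, is easy to state on explicit binary codes: a signal contributes a literal exactly when it has the same value in every code, and is a don't care otherwise. The paper derives this cube from node relations without listing the markings; the sketch below computes the same target from an explicit list of codes, as an assumed reference for checking small cases. The function names are illustrative.

```python
from typing import Dict, List


def smallest_cover_cube(codes: List[Dict[str, int]]) -> Dict[str, int]:
    """Smallest cube (largest number of literals) covering every given binary code."""
    if not codes:
        raise ValueError("need at least one code")
    cube = {}
    for s in codes[0]:
        values = {code[s] for code in codes}
        if len(values) == 1:       # constant over all codes: keep it as a literal
            cube[s] = values.pop()
    return cube                    # every other signal is a don't care


def to_literals(cube: Dict[str, int]) -> str:
    # a' denotes the complemented literal
    return " ".join(s if v else s + "'" for s, v in sorted(cube.items()))
```

For the two codes `a=1, b=0, c=1` and `a=1, b=1, c=1`, the result is the cube `a c`: `b` changes between them and therefore becomes a don't care.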
B. Changes of Cubes
We now discuss how a cube is changed by the firing of transitions. For two markings $M_1$ and $M_2$, a marking $M$ is interleaved with $(M_1, M_2)$ if $M$ is always reached before $M_2$ whenever the marking goes from $M_1$ to $M_2$. For a marking $M_1$ in which a cube $c$ is set off and a marking $M_2$ in which $c$ is set on, $c$ is turned on when the marking goes from $M_1$ to $M_2$ if $c$ is set off in every marking interleaved with $(M_1, M_2)$. Conversely, if the marking goes from $M_2$ to $M_1$ and $c$ is set on in every marking interleaved with $(M_2, M_1)$, $c$ is turned off. For $c$ to be set on in a marking $M$, every signal $a$ with binary value 1 in $M$ must appear in $c$ as the literal $a$ or as a don't care, whereas a single signal $a$ with the literal $\bar{a}$ in $c$ suffices to make $c$ set off. The case in which the value of $a$ in $M$ is 0 is symmetric. The condition for turning on a cube, which requires the literals of all signals to match, is hence much stricter than that for turning off a cube, which requires only one mismatching literal.
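The asymmetry between turning a cube on and turning it off can be seen by tracking a cube along one firing sequence. The sketch below assumes the sequence is given as the list of binary codes of the successive markings (a hypothetical input format) and reports each turn-on and turn-off with the index of the marking where it happens.

```python
from typing import Dict, List, Tuple

Cube = Dict[str, int]


def covers(cube: Cube, code: Dict[str, int]) -> bool:
    # set on only if every literal matches; one mismatch is enough to set it off
    return all(code[s] == v for s, v in cube.items())


def cube_changes(cube: Cube, trace: List[Dict[str, int]]) -> List[Tuple[str, int]]:
    """Report each turn-on / turn-off of `cube` along a sequence of marking codes."""
    events = []
    prev = covers(cube, trace[0])
    for i, code in enumerate(trace[1:], start=1):
        cur = covers(cube, code)
        if cur != prev:
            events.append(("on" if cur else "off", i))
        prev = cur
    return events
```

Along the codes `00, 10, 11, 01, 00` over `(a, b)`, the cube `a b'` is turned on at index 1, when both literals first match, and turned off at index 2, as soon as the single literal `b'` is violated.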
Definition 3.4:
A transition set $S$ is a turn-on set of a cube $c$ if $c$ is always turned on by firing all the transitions in $S$; $S$ is a turn-off set of $c$ if $c$ is always turned off by firing any one of the transitions in $S$.
Let $S$ be composed of the transitions $t_1, t_2, \ldots, t_n$. If $S$ is a turn-on set of a cube $c$, then for a marking $M$ in which $c$ is set off and each transition in $S$ is enabled, a marking $M'$ in which $c$ is set on is reached from $M$ by firing all the transitions in $S$. However, $c$ remains set off in any marking reached from $M$ by firing only some (not all) of the transitions in $S$. If $S$ is a turn-off set of $c$, then for a marking $M$ in which $c$ is set on and each transition in $S$ is enabled, a marking in which $c$ is set off is directly reached from $M$ by firing any one of the transitions in $S$. How the firing of a transition $t$ contributes to the turn-on or turn-off change of $c$ is analyzed here through the covering relations between $c$ and the fanin and fanout cubes of $t$. For two cubes $c_1$ and $c_2$, three covering relations are involved.
Relation 1: $c_1$ covers $c_2$, i.e., $c_2 \subseteq c_1$. Relation 2: the intersection of $c_1$ and $c_2$ is empty, i.e., $c_1 \cap c_2 = \emptyset$. Relation 3: the intersection of $c_1$ and $c_2$ is not empty, but $c_1$ does not cover $c_2$, i.e., $c_1 \cap c_2 \neq \emptyset$ and $c_2 \not\subseteq c_1$. If $c_1$ covers $c_2$, $c_1$ covers all the markings covered by $c_2$; if their intersection is empty, no marking is covered by both; if their intersection is not empty but $c_1$ does not cover $c_2$, some marking is covered by both and some marking is covered by $c_2$ but not by $c_1$. The relation "$c_2$ covers $c_1$" is not discussed separately, since it is the first relation with the roles of $c_1$ and $c_2$ interchanged. For a transition $t$, since its fanin cube covers all the markings in $ER(t)$, if a cube $c$ covers the fanin cube of $t$, $c$ also covers all the markings in $ER(t)$. If the intersection of $c$ and the fanin cube is empty, $c$ covers no marking in $ER(t)$. If their intersection is not empty but $c$ does not cover the fanin cube, $ER(t)$ contains some marking covered by $c$ and some marking not covered by $c$. These covering relations between $c$ and the fanin cube of $t$ apply equally to $c$ and the fanout cube of $t$.
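On the Boolean space, the three covering relations follow directly from the literals of the two cubes: two cubes are disjoint exactly when some signal appears with opposite literals, and $c_1$ covers $c_2$ exactly when every literal of $c_1$ also appears in $c_2$. The sketch below assumes cubes are dicts from signal to literal value (1 for $a$, 0 for $\bar{a}$) with missing signals as don't cares; like the cubes in the text, it works on codes rather than on reachable markings, so it is a conservative approximation.

```python
from typing import Dict

Cube = Dict[str, int]  # signal -> literal value (1: a, 0: a'); missing signals are don't cares


def relation(c1: Cube, c2: Cube) -> str:
    """Covering relation of c1 with respect to c2 over the Boolean space."""
    if any(s in c2 and c2[s] != v for s, v in c1.items()):
        return "disjoint"   # Relation 2: opposite literals, empty intersection
    if all(s in c2 for s in c1):
        return "covers"     # Relation 1: every literal of c1 also appears in c2
    return "overlaps"       # Relation 3: nonempty intersection, c1 does not cover c2
```

For instance, `a` covers `a b'`, `a` and `a'` are disjoint, and `a b` overlaps `a` without covering it.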
Using these three covering relations between a cube $c$ and the fanin and fanout cubes of one of its transitions $t$, there are nine cases describing the influence that firing $t$ has on the turn-on or turn-off change of $c$, as shown in Table I. A transition whose underlying signal is a don't care in $c$ is excluded, since its firing has no influence on $c$. In case 1, the firing of $t$ makes no change to $c$, so $t$ is called a static transition of $c$. In case 2, $c$ is turned on by the firing of $t$, so $t$ is called a turn-on transition of $c$ and by itself constitutes a turn-on set of $c$. In case 3, $c$ is turned off by the firing of $t$, so $t$ is called a turn-off transition of $c$ and by itself constitutes a turn-off set of $c$. In case 4, $c$ can be either turned on or kept off by the firing of $t$, so $t$ is called a quasi-turn-on (Q-on) transition of $c$. Similarly, in case 5, $c$ can be either turned off or kept off by the firing of $t$, so $t$ is called a quasi-turn-off (Q-off) transition of $c$. In these two cases, there exist other Q-on or Q-off transitions concurrent to $t$, and these concurrent transitions (including $t$) can be combined to form a turn-on or turn-off set of $c$. In case 6, $c$ is set on in both $ER(t)$ and $EQR(t)$. This case never occurs: if the underlying signal $a$ of $t$ appears as the literal $a$ in $c$, the value of $a$ is 1 in $ER(t)$; after $t$ fires, the consistency property of the STG forces the value of $a$ to 0 in $EQR(t)$, where $c$ is obviously set off. The argument is symmetric if $a$ appears as $\bar{a}$ in $c$. The remaining cases 7-9 cannot occur either, since each would require a marking $M$ in which $c$ is set on and a marking reached from $M$ by firing $t$ in which $c$ is also set on.
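Table I itself is not reproduced here, but the case descriptions pin down most of it. Assuming the fanin cube of $t$ stands for the markings in which $t$ is enabled and the fanout cube for the markings directly reached by firing $t$, one consistent reading is sketched below: a pair of relations (c versus fanin cube, c versus fanout cube) is mapped to the role of $t$, and every pair in which $c$ would be set on both before and after $t$ fires is reported as impossible. The mapping and the function name are this sketch's interpretation, not a transcription of Table I.

```python
from typing import Dict

Cube = Dict[str, int]


def relation(c1: Cube, c2: Cube) -> str:
    if any(s in c2 and c2[s] != v for s, v in c1.items()):
        return "disjoint"
    return "covers" if all(s in c2 for s in c1) else "overlaps"


# (relation of c to the fanin cube, relation of c to the fanout cube) -> role of t
ROLE = {
    ("disjoint", "disjoint"): "static",    # off before and after firing
    ("disjoint", "covers"): "turn-on",     # always turned on
    ("covers", "disjoint"): "turn-off",    # always turned off
    ("disjoint", "overlaps"): "Q-on",      # turned on or kept off
    ("overlaps", "disjoint"): "Q-off",     # turned off or kept off
}


def classify(cube: Cube, signal: str, fanin_cube: Cube, fanout_cube: Cube) -> str:
    if signal not in cube:
        return "don't care"  # firing t has no influence on c
    key = (relation(cube, fanin_cube), relation(cube, fanout_cube))
    return ROLE.get(key, "impossible")  # c would be on both before and after t fires
```

For the cube `a b` and the transition `a+` with fanin cube `a' b` and fanout cube `a b`, the result is `"turn-on"`; if the fanout cube is only known to be `a`, the cube may or may not turn on and the result is `"Q-on"`.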
C. Concurrent Set
Here, we show how to derive the turn-on and turn-off sets of a cube $c$ from the concurrency relation between transitions. Let $T_c$ be the set of all the turn-on, turn-off, Q-on, and Q-off transitions of $c$.
Definition 3.5:
A concurrent set of a cube $c$ is a subset of the turn-on, turn-off, Q-on, and Q-off transitions of $c$ in which any two transitions are concurrent to each other. A concurrent set is maximal if it is not a proper subset of any other concurrent set. A concurrent set composed of only one turn-on or turn-off transition is a maximal concurrent set with the smallest number of transitions. A concurrent set composed only of Q-on transitions is a Q-on set; one composed only of Q-off transitions is a Q-off set. We then have the following results; due to space limitations, all the theorems in this paper are presented without proof.
Theorem 3.1: Let $S$ be a Q-on set of a cube $c$. Then $S$ is also a turn-on set of $c$ if and only if $S$ is maximal.
Theorem 3.2: Let $S$ be a Q-off set of a cube $c$. Then $S$ is also a turn-off set of $c$ if and only if $S$ is maximal.
Since a Q-on transition may be concurrent to a Q-off transition, a maximal concurrent set may contain both Q-on and Q-off transitions.
Theorem 3.3: Let $S$ be a maximal concurrent set of a cube $c$ that contains both Q-on and Q-off transitions. Then $S$ is neither a turn-on nor a turn-off set of $c$, and unexpected turn-on and turn-off changes occur on $c$ during the firing process.
Definition 3.6:
A maximal concurrent set is said to be well behaved if it does not contain Q-on and Q-off transitions at the same time.
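Since concurrency is a symmetric pairwise relation, the maximal concurrent sets of a cube are the maximal cliques of the concurrency graph restricted to the turn-on, turn-off, Q-on, and Q-off transitions of the cube. The sketch below uses the Bron-Kerbosch algorithm (without pivoting) to enumerate them and then applies Definition 3.6. It assumes the role of each transition and the concurrent pairs are already known; all transition names and roles in the usage note are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Bron-Kerbosch (no pivoting): all maximal sets of pairwise-adjacent nodes."""
    cliques: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def maximal_concurrent_sets(roles: Dict[str, str],
                            concurrent_pairs: Iterable[Tuple[str, str]]) -> List[FrozenSet[str]]:
    # keep only transitions that have a role (turn-on, turn-off, Q-on, Q-off) for the cube
    adj: Dict[str, Set[str]] = {t: set() for t in roles}
    for u, v in concurrent_pairs:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    return maximal_cliques(adj)


def is_well_behaved(s: FrozenSet[str], roles: Dict[str, str]) -> bool:
    # Definition 3.6: Q-on and Q-off transitions must not coexist in the set
    return not {"Q-on", "Q-off"} <= {roles[t] for t in s}
```

With Q-on transitions `a+`, `b+`, a Q-off transition `d+`, a turn-off transition `c-`, and concurrent pairs `(a+, b+)` and `(b+, d+)`, the maximal concurrent sets are `{a+, b+}`, `{b+, d+}`, and `{c-}`; the second mixes Q-on and Q-off transitions, so the cube is not well behaved, and its maximal sets are indeed not disjoint.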
For an SI circuit, if the underlying cube of a gate has a maximal concurrent set that is not well behaved, hazards may occur on this gate. A cube is well behaved if all its maximal concurrent sets are well behaved. Take a cube $c$ of master-read.g as an example; Table II lists some properties of the transitions of $c$ and the roles they play. The fanin and fanout cubes of each transition are represented only by the literals related to $c$. Using the concurrency relation between transitions, three maximal concurrent sets are obtained: a turn-on set, a turn-off set, and a set that is not well behaved. After $c$ is turned on by firing all the transitions in the turn-on set, it is turned off by the firing of a transition in the turn-off set. Then, if certain transitions of the third set fire before another one (see Table II), $c$ is turned on again and then turned off once more; otherwise, $c$ does not change until all the transitions in the turn-on set are enabled again in the next operation cycle.
After all these transitions of a well-behaved cube $c$ have been classified into turn-on and turn-off sets, the disjointness of its maximal concurrent sets has to be discussed.
Theorem 3.4:
Let $\mathcal{S}$ be the collection of all the maximal concurrent sets of a well-behaved cube $c$. Then any two sets in $\mathcal{S}$ are disjoint.
Since any two maximal concurrent sets of a well-behaved cube $c$ are disjoint, each turn-on, turn-off, Q-on, or Q-off transition of $c$ belongs to exactly one turn-on or turn-off set in $\mathcal{S}$ and contributes to at most one turn-on or turn-off change of $c$ in one cycle of the circuit operation. Each turn-on or turn-off set of $c$ therefore corresponds one-to-one with a turn-on or turn-off change of $c$. These cube properties can then be applied to check whether a cover cube $c$ is an MC for a transition $a*$, i.e., whether the following three requirements are met: 1) $c$ is well behaved; 2) there exists exactly one turn-on set of $c$, and it is composed of all the trigger transitions of $a*$; 3) there exists exactly one turn-off set of $c$ interleaved with the firing of $a*$ and that of the next transition of $a$. If $c$ is well behaved, $c$ has no unexpected changes. If there exists exactly one turn-on set of $c$ composed of all the trigger transitions of $a*$, $c$ is turned on by firing these trigger transitions, i.e., $c$ is set on in $ER(a*)$; this meets the cover condition that $c$ covers all the markings in $ER(a*)$. If there exists exactly one turn-off set of $c$ interleaved with these two firings, $c$ is turned off before the next transition of $a$ fires. After $c$ is turned off, it remains unchanged until all the trigger transitions of $a*$ are enabled again in the next operation cycle. This meets the remaining monotonous and one-hot conditions.
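The first two MC requirements can be checked mechanically once the maximal concurrent sets and transition roles are known. The sketch below is limited to one operation cycle, assumes the sets and roles are precomputed, and does not model requirement 3), which needs the interleave relation between firings; it also treats overlapping maximal sets in a well-behaved cube as an input error, since Theorem 3.4 rules them out. Function and transition names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def check_mc_requirements(max_sets: List[FrozenSet[str]], roles: Dict[str, str],
                          triggers: Set[str]) -> bool:
    """Requirements 1) and 2) for a cover cube; requirement 3) is not modeled here."""
    # 1) the cube is well behaved: no maximal set mixes Q-on and Q-off transitions
    if any({"Q-on", "Q-off"} <= {roles[t] for t in s} for s in max_sets):
        return False
    # Theorem 3.4: maximal concurrent sets of a well-behaved cube are disjoint
    if any(s1 & s2 for s1, s2 in combinations(max_sets, 2)):
        raise ValueError("maximal concurrent sets overlap; inputs contradict Theorem 3.4")
    # 2) exactly one turn-on set, made up of all the trigger transitions of a*
    on_sets = [s for s in max_sets if {roles[t] for t in s} <= {"turn-on", "Q-on"}]
    return len(on_sets) == 1 and on_sets[0] == frozenset(triggers)
```

With a Q-on set `{a+, b+}` and a single turn-off transition `c-`, the check passes only if the trigger transitions of the target transition are exactly `a+` and `b+`.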
IV. HAZARD-FREE DECOMPOSITION
A high-fanin gate that does not exist in the gate library can only be made implementable by decomposing it into available low-fanin gates. This implicitly introduces new signals into the circuit, each of which must be properly acknowledged; otherwise, hazards may result. As mentioned in Section II-C, no hazard occurs when an OR gate is decomposed, whereas AND gates must be decomposed with care. In the following discussion, gate decomposition therefore refers to the decomposition of AND gates, and we assume that only two-input gates are available in the gate library. Our method consists of two steps: 1) decompose each high-fanin AND gate hazard free into two-input gates; 2) if necessary, generate a new STG, perform resynthesis, and return to the first step. This process is iterated until all high-fanin gates are decomposed or no solution can be found. Our search method is greedy and heuristic; an exhaustive enumeration with backtracking is sometimes necessary to find a solution.
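The two-step iteration can be summarized as a small driver loop. The sketch below is generic: the STG is an opaque object, and the three callbacks (`high_fanin_gates`, `try_hazard_free`, `add_signals`) are hypothetical placeholders for the synthesis step, the hazard-free decomposition of Section IV, and the signal-adding methods of Section V. The round limit is an assumption added so the loop always terminates.

```python
from typing import Callable, Optional, Set, Tuple, TypeVar

S = TypeVar("S")  # whatever object represents an STG and its synthesized circuit


def decompose_and_resynthesize(
    stg: S,
    high_fanin_gates: Callable[[S], Set[str]],
    try_hazard_free: Callable[[S, str], bool],
    add_signals: Callable[[S, Set[str]], Optional[S]],
    max_rounds: int = 100,
) -> Tuple[S, bool]:
    """Greedy outer loop: decompose without new signals first, resynthesize only for failures."""
    for _ in range(max_rounds):
        failed = {g for g in high_fanin_gates(stg) if not try_hazard_free(stg, g)}
        if not failed:
            return stg, True                # every high-fanin gate has been decomposed
        new_stg = add_signals(stg, failed)  # e.g., IGA or SR applied at the STG level
        if new_stg is None:
            return stg, False               # no further progress is possible
        stg = new_stg                       # one resynthesis cycle
    return stg, False
```

The design point mirrors the text: signals are added only for the gates whose hazard-free decomposition fails, so the number of resynthesis cycles stays small when most gates can be decomposed directly.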
A. Decomposition Hazards
Let $g$ be the high-fanin gate to be decomposed in the transition network of a transition $t$ [see Fig. 3(a)]. Two forms of decomposition, one using an intermediate gate together with a primary input and one using two intermediate gates, are shown in Fig. 3(b) and (c), respectively. In Fig. 3(b), hazards may occur if any transition of the intermediate gate or of the final gate is unacknowledged. An unacknowledged transition of the final gate is caused by a transition of the intermediate gate that cannot be acknowledged by an expected transition of the final gate, i.e., a transition with the same behavior as the output of $g$ in Fig. 3(a). No hazard occurs if each transition of the intermediate gate is acknowledged by an expected transition of the final gate. Similarly, to make the decomposition in Fig. 3(c) hazard free, each transition of both intermediate gates must be acknowledged by an expected transition of the final gate.
Here, we show by an example how improper decomposition can lead to hazards and why knowledge of the environment is necessary for a hazard-free decomposition. For a three-input AND gate in Fig. 2, there are three ways to decompose it into two levels of two-input gates, as shown in Fig. 4(a)-(c); the corresponding timing diagrams are depicted in Fig. 4(d)-(f), respectively. The first two decompositions contain unacknowledged transitions. Only the third is hazard free, since each transition of its intermediate gate is acknowledged by an expected transition of the output. This example shows that an AND gate cannot be decomposed arbitrarily; the environment behavior must be taken into account so that each new transition is acknowledged.
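The candidates examined during this search can be enumerated systematically. The sketch below lists every tree of two-input AND gates over a given set of inputs, representing a decomposition as nested tuples (a representation assumed here). It only enumerates structures; each candidate would still have to pass the hazard-free conditions of Section IV-B.

```python
from typing import List, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # an input name or a two-input AND of subtrees


def decompositions(inputs: Sequence[str]) -> List[Tree]:
    """All trees of two-input AND gates over the given inputs (mirror images not repeated)."""
    inputs = tuple(inputs)
    if len(inputs) == 1:
        return [inputs[0]]
    first, rest = inputs[0], inputs[1:]
    n = len(rest)
    results: List[Tree] = []
    for mask in range(2 ** n):
        # `first` always goes to the left group, so (l, r) and (r, l) are not both produced
        left = (first,) + tuple(rest[i] for i in range(n) if mask >> i & 1)
        right = tuple(rest[i] for i in range(n) if not mask >> i & 1)
        if not right:
            continue
        for lt in decompositions(left):
            for rt in decompositions(right):
                results.append((lt, rt))
    return results
```

For three inputs `a`, `b`, `c`, this yields exactly the three candidates `a(bc)`, `(ab)c`, and `(ac)b`, matching the three decompositions of Fig. 4(a)-(c). The count grows as $(2n-3)!!$, e.g., 15 trees for four inputs, which is one reason a greedy, heuristic search is preferred over blind enumeration.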
B. Hazard-Free Conditions
To make the decomposition in Fig. 3(b) hazard free, each transition of the intermediate gate must be acknowledged by a corresponding transition of the final gate. Here the enabling or disabling of a gate is modeled as the turn-on or turn-off change of its cube, and each gate in Fig. 3 (the original gate, the intermediate gate(s), and the final gate) is represented by its own cube. The condition for acknowledging the rising transition of the intermediate gate in Fig. 3(b) is presented first. Theorem 4.1: The rising transition of the intermediate gate can be acknowledged if and only if the intermediate cube is well behaved and, in every MG, the number of turn-on sets of the intermediate cube is less than or equal to that of the original cube. Since the turn-on and turn-off changes of a cube alternate during the firing process, an MG containing exactly one turn-on set of each of these two cubes also contains exactly one turn-off set of each. This, however, does not guarantee that the falling transition of the intermediate gate is also acknowledged; another requirement must be met. Since there is no acknowledgment problem in an MG that contains no turn-off set of the intermediate cube, only MGs containing exactly one turn-off set of it need to be considered. For a decomposition in which the rising transition of the intermediate gate is acknowledged, Theorem 4.2 gives the condition under which its falling transition is also acknowledged: in each such MG, the primary input of the final gate must be kept unchanged between the firing of the turn-on set and the next turn-off set of the intermediate cube. We now illustrate these two theorems with the example in Fig. 4; the turn-on and turn-off sets of each intermediate cube in Fig. 4(a)-(c) are given in Table III. For the intermediate cube in Fig. 4(a), the turn-off set is not interleaved with the required pair of firings, so the falling transition of the intermediate gate cannot be acknowledged; moreover, since this cube has a maximal concurrent set that is not well behaved, unexpected changes occur on it. The intermediate cubes in Fig. 4(b) and (c) are well behaved and contain exactly one turn-on set and one turn-off set, so the rising transition of the intermediate gate can be acknowledged by that of the output. However, the turn-off set in Fig. 4(b) is still not interleaved with the required pair of firings, so its falling transition cannot be acknowledged. In Fig. 4(c), the turn-off set is interleaved with them, so the falling transition of the intermediate gate can be acknowledged by that of the output. For the decomposition in Fig. 3(c), to acknowledge the rising transitions of both intermediate gates, the constraints imposed on the intermediate cube in Fig. 3(b) must be satisfied by both intermediate cubes in Fig. 3(c); i.e., both must be well behaved, and in every MG the number of turn-on sets of each must be less than or equal to that of the original cube. Moreover, to acknowledge the falling transitions of both intermediate gates, the constraint imposed on the primary input in Fig. 3(b) must be satisfied for each of them, with the other intermediate gate playing the role of the primary input. To satisfy the constraint for the first intermediate gate, in an MG that contains a turn-on set and the next turn-off set of its cube, the output of the second intermediate gate must be kept unchanged between the firing of these two sets. If that MG contains no turn-on or turn-off set of the second cube, the second gate's output is certainly kept unchanged, so the falling transition of the first gate can be acknowledged by that of the output; since the roles of the two gates are interchangeable, the falling transition of the second gate can then also be acknowledged. On the other hand, if the MG also contains a turn-on set and the next turn-off set of the second cube, the turn-off set of the second cube must be interleaved with the corresponding firings for the falling transition of the first gate to be acknowledged. Similarly, the turn-off set of the first cube must be interleaved with the corresponding firings for the falling transition of the second gate to be acknowledged. These two constraints cannot be satisfied simultaneously: together they imply that neither turn-off set is interleaved with the required firings, making it impossible for the final cube to be turned off in time. Therefore, to acknowledge the falling transitions of both intermediate gates, the next turn-off sets of the two intermediate cubes cannot exist in the same MG.
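Two of the conditions above reduce to simple bookkeeping once per-MG counts are available: the counting condition of Theorem 4.1, and the necessary condition for Fig. 3(c) that the next turn-off sets of the two intermediate cubes never share an MG. The sketch below assumes these per-MG facts (turn-on set counts, and which MGs contain each next turn-off set) have already been computed from the STG; MG names are hypothetical, and the interleaving constraints are not modeled.

```python
from typing import Dict, Set


def rising_ack_ok(intermediate_well_behaved: bool,
                  on_sets_intermediate: Dict[str, int],
                  on_sets_original: Dict[str, int]) -> bool:
    """Theorem 4.1: well behaved, and per MG #turn-on sets(intermediate) <= #(original)."""
    return intermediate_well_behaved and all(
        n <= on_sets_original.get(mg, 0) for mg, n in on_sets_intermediate.items())


def fig3c_falling_ok(mgs_with_next_off_1: Set[str], mgs_with_next_off_2: Set[str]) -> bool:
    """Necessary condition for Fig. 3(c): the two next turn-off sets never share an MG."""
    return not (mgs_with_next_off_1 & mgs_with_next_off_2)
```

An intermediate cube with two turn-on sets in an MG where the original cube has only one fails the first check; two intermediate cubes whose next turn-off sets both lie in MG `mg1` fail the second.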
During the decomposition of the gate in Fig. 3(a), each possible decomposition must be checked against the theorems above until a hazard-free one is found. The intermediate gate in Fig. 3(b) has to be decomposed again if it is still not available in the gate library; this process is repeated on each new intermediate gate until all gates are implementable. However, a hazard-free decomposition of an AND gate does not always exist: of the five three-input AND gates in Fig. 2, only two can be decomposed successfully.
V. RESYNTHESIS
For a gate that cannot be hazard-freely decomposed, new signals are added to the STG and resynthesis is performed. Assume has been decomposed into ( )-level gates as Fig. 5(a is well behaved, but there exists some MG in which the number of turn-on sets of is greater than that of ; 3) is well behaved and the number of turn-on sets of is less than or equal to that of in any MG, but there exists some MG containing a turn-off set of where the primary input is not kept unchanged between the firing of and . There are unacknowledged rising and falling transitions of in the first two cases, whereas in the third case only the falling transition of cannot be acknowledged. If is not well behaved, there exist unexpectable transitions of . Since an unexpectable transition can never be acknowledged, the only remedy is to find another possible decomposition in which is well behaved. For a well-behaved , two signal-adding methods are developed to generate a new STG for resynthesis.
A. IGA
As shown in Fig. 5(a), for an unacknowledged gate whose underlying cube is well behaved, each unacknowledged transition of can be made acknowledged if becomes explicit in the STG; this is called IGA. The idea of global acknowledgment was suggested in [1] and later formalized at the SG level in [7], [11]. The IGA signal is added to the initial STG so that each transition of can be acknowledged by a corresponding transition specified in the new STG . At the circuit level, the output of is connected, with an inverter attached if necessary, to some signal network that acknowledges . Since adding a new signal to may change the circuit of some signal(s) to be implemented, has to be resynthesized. To add to , each transition of , corresponding to a turn-on or turn-off change of the IGA cube , must be carefully inserted so that still preserves the behavior of . Properties such as CSC, consistency, and persistency, which are originally satisfied in , must also be satisfied in . To add a new transition to , we have to find the transitions connected to it, i.e., the trigger transitions of and the transition to which is connected, i.e., the acknowledge transition of . For each added rising transition and falling transition , these trigger and acknowledge transitions can be obtained with the help of the corresponding turn-on set and turn-off set of . Since the behavior of the outside environment cannot be changed, the transition that can be chosen for acknowledging must be a noninput signal transition.
1) Trigger Transitions of : Since will always be turned on by firing all the transitions in , must be enabled after these transitions fire. The transitions in hence constitute the trigger transitions of .
2) Trigger Transitions of : Since will always be turned off by firing any one of the transitions in , must be enabled after fires. This OR-causality behavior, however, is not allowed in the STG for signal adding. Each turn-off set is hence restricted to consist of only one turn-off transition and this transition uniquely constitutes the trigger transition of .
3) Acknowledge Transition of : Let the turn-on set be composed of Q-on transitions and be its next turn-off set composed of only one turn-off transition . To avoid generating any autoconcurrent transitions of , the acknowledge transition must be chosen such that fires before its next transition . Moreover, to make each trigger transition persistent to , must also be chosen such that fires before the next transition of each . Any transition that is interleaved with ( ) can be chosen to attain this purpose, as Fig. 5(b ; there must be an acknowledge transition in . As Fig. 5(c) 
4) Acknowledge Transition of : Let the turn-off set be composed of only one turn-off transition and be its next turn-on set composed of Q-on transitions . To avoid generating any autoconcurrent transitions of , the acknowledge transition must be chosen such that fires before its next transition . In addition, to make the trigger transition persistent to , must also be chosen such that fires before the next transition of , denoted . Any transition that is interleaved with ( , ) can be chosen to attain this purpose, as Fig. 5(e) shows. This restriction ensures that fires before ; the trigger transition is hence persistent to . Since only after fires will it be possible for to be enabled, must be a transition in or fire before at least one of the transitions in . Therefore, fires before at least one of the transitions in . Since is enabled after the firing of all the transitions in , and hence fire before . No autoconcurrent transition of is generated. Moreover, since is a new trigger transition of and fires before , is persistent to . Similarly, if there exists a multifanin or multifanout place interleaved with ( , ), must be interleaved with ( , ).
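The rules in items 1) to 4) can be prototyped on a small precedence graph. The Python sketch below is hedged in several ways. It assumes one period of the MG has been unrolled into an acyclic graph `succ` (transition to successor transitions). It reads "t is interleaved with (first, last)" as "t fires after first and before last", where `first` and `last` stand for the pair of transitions bounding the window in items 3) and 4). All names (`trigger_of_rise`, `ack_candidates`, labels such as `a+`) are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]  # acyclic unrolling: transition -> successors

def trigger_of_rise(turn_on_set: Iterable[str]) -> Set[str]:
    # AND-causality: the new rising transition waits for every transition
    # of the turn-on set, so all of them become its triggers.
    return set(turn_on_set)

def trigger_of_fall(turn_off_set: Iterable[str]) -> Set[str]:
    # OR-causality is not allowed, so the turn-off set must be a single transition.
    triggers = set(turn_off_set)
    if len(triggers) != 1:
        raise ValueError("turn-off set must contain exactly one transition")
    return triggers

def reachable(succ: Graph, src: str) -> Set[str]:
    """Transitions that fire after src (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque(succ.get(src, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(succ.get(node, []))
    return seen

def ack_candidates(succ: Graph, first: str, last: str,
                   input_transitions: Set[str]) -> List[str]:
    """Noninput transitions interleaved with (first, last)."""
    return sorted(t for t in reachable(succ, first)
                  if t != last
                  and t not in input_transitions   # environment cannot change
                  and last in reachable(succ, t))

succ = {"a+": ["b+", "d+"], "b+": ["c+"], "c+": ["a-"], "d+": ["a-"]}
print(ack_candidates(succ, "a+", "a-", input_transitions={"b+"}))  # ['c+', 'd+']
```

Any candidate returned here would still have to be checked for CSC, consistency, and persistency before the new transition is actually inserted into the STG.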
B. SR
When the intermediate cube belongs to the third unacknowledged case, only the falling transition of cannot be acknowledged. If is turned off before the firing of , another signal-adding method called SR can be applied. For convenience of explanation, the gate to be decomposed is assumed to lie in the transition network of a rising transition , and let be the corresponding trigger or context transition of on the primary input . A gate in the transition network of a falling transition can be handled in the same way. The idea of SR is to replace with a new SR signal so that and its next transition are acknowledged by the rising and falling transitions of , respectively, and is kept unchanged between the firing of and its next transition . The falling transition of can hence be acknowledged by that of . As Fig. 6 shows, is realized at the circuit level as the output of a new OR gate with input set , such that is enabled by the firing of and disabled by the firing of and . Apart from these, is not enabled or disabled by any other transition. Adding a new gate to the implementation changes the circuit of some signal(s) to be implemented, so the whole implementation has to be regenerated. That is, each transition of must be added to the STG and resynthesis has to be performed.
By the duality principle (De Morgan's law), the OR gate can be transformed into an AND gate with all the inputs and the output inverted. The rising transition of , corresponding to the falling transition of , can hence be perceived from the turn-off change of the SR cube . Similarly, the falling transition of , corresponding to the rising transition of , can be perceived from the turn-on change of . To avoid generating any unexpectable change of , must be well behaved. Since is enabled (or is disabled) only by the firing of , must contain exactly one turn-off set . Since is disabled (or is enabled) only by the firing of and , must also contain exactly one turn-on set . The trigger and acknowledge transitions of each rising or falling transition of the SR signal can be obtained from those of the corresponding falling or rising transition of , which are found in the same way as those of an IGA transition. For an STG with a multifanin or multifanout place, the SR signal can be added in the same way as an IGA signal.
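The duality step can be confirmed exhaustively for small fan-in. The short Python sketch below uses illustrative function names only. It models the SR cube, as an assumption for illustration, as the AND of the complemented OR-gate inputs and checks De Morgan's law: the OR output equals the complement of that cube. The SR signal is therefore 1 exactly when the cube is 0, so a rising SR transition coincides with the cube turning off.

```python
from itertools import product
from typing import Sequence

def or_gate(bits: Sequence[int]) -> int:
    return int(any(bits))

def sr_cube(bits: Sequence[int]) -> int:
    # Dual form: an AND gate whose inputs are all inverted.
    return int(all(b == 0 for b in bits))

def or_via_dual(bits: Sequence[int]) -> int:
    # Invert the output of the dual AND gate.
    return 1 - sr_cube(bits)

def duality_holds(fanin: int) -> bool:
    # Compare both forms on every input vector of the given width.
    return all(or_gate(v) == or_via_dual(v)
               for v in product((0, 1), repeat=fanin))

print(all(duality_holds(n) for n in range(1, 6)))  # True
```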
C. Resynthesis Procedure: An Example
After a new STG is created from the initial STG by the IGA or SR signal-adding method, it has to be resynthesized and a new implementation generated. The decomposition work is finished if there exists no undecomposable gate in ; otherwise, another new STG has to be created for the next resynthesis. Since adding a new signal to may result in more undecomposable gates in , must be checked to determine whether progress has been made compared with the initial implementation . Here, progress being made means that the decomposition cost of is lower than that of . The decomposition cost of an implementation is the total decomposition cost of all the undecomposable gates; the decomposition cost of an undecomposable gate is defined as that of the smallest unacknowledged gate in one of the possible decompositions of . For an unacknowledged gate , the decomposition cost is defined as the number of two-input gates into which is decomposed. The more such gates, the more resynthesis cycles are required. Consider a four-input undecomposable gate . If it can only be hazard-freely decomposed into a three-input gate cascaded with a two-input gate, the smallest unacknowledged gate of is a two-input gate whose decomposition cost is one. If no such hazard-free decomposition exists for , the smallest unacknowledged gate of is a three-input gate with decomposition cost two. If progress is made after resynthesis, is created from . Otherwise, it has to be created from again by working on a different acknowledge transition, a different unacknowledged gate, a different signal-adding method, or a different undecomposable gate. In some cases, however, if a circuit cannot be hazard-freely decomposed after the whole solution space has been searched, the condition for deciding whether progress is made has to be relaxed. We now illustrate the resynthesis procedure by decomposing the three undecomposable gates , and in Fig. 2. Fig. 7 gives the whole resynthesis procedure.
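The cost measure and progress test defined above are simple enough to state directly. In this hypothetical Python sketch, an implementation is summarized as a mapping from each undecomposable gate to the fan-in of the smallest unacknowledged gate in each of its possible decompositions. The cost of an n-input unacknowledged gate is taken as n - 1, the number of two-input gates it decomposes into. Charging each undecomposable gate its cheapest option is an assumed reading of "one of the possible decompositions".

```python
from typing import Dict, Iterable, List, Optional

# gate name -> fan-in of the smallest unacknowledged gate, per possible decomposition
Impl = Dict[str, List[int]]

def unack_cost(fanin: int) -> int:
    # An n-input AND gate decomposes into n - 1 two-input gates.
    return fanin - 1

def gate_cost(options: List[int]) -> int:
    # Assumption: charge the cheapest possible decomposition.
    return min(unack_cost(f) for f in options)

def impl_cost(impl: Impl) -> int:
    return sum(gate_cost(opts) for opts in impl.values())

def progress_made(new: Impl, old: Impl) -> bool:
    return impl_cost(new) < impl_cost(old)

def pick_next(old: Impl, candidates: Iterable[Impl]) -> Optional[Impl]:
    """Return the first resynthesized implementation that makes progress.

    Each candidate stands for a different choice of acknowledge transition,
    unacknowledged gate, signal-adding method, or undecomposable gate.
    """
    for cand in candidates:
        if progress_made(cand, old):
            return cand
    return None

# Four-input gate g from the text: a hazard-free 3 + 2 split leaves a
# two-input unacknowledged gate (cost 1); otherwise a three-input one (cost 2).
before = {"g": [3]}
after = {"g": [2]}
print(impl_cost(before), impl_cost(after), progress_made(after, before))  # 2 1 True
print(pick_next(before, [{"g": [3], "h": [2]}, after]))  # {'g': [2]}
```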
Shown in the boxes are the undecomposable gates in each implementation, along with the corresponding cubes; a cube is omitted if it is the same as in the previous implementation. It can be seen from Fig. 7 that, starting from gate , four resynthesis cycles are required to decompose every gate. Progress is first made after the third resynthesis, where has been hazard-freely decomposed by the IGA method. Fig. 8(a) shows how this is achieved at the circuit level. Due to the change in the circuit of signal , can also be decomposed hazard free. The decomposition work is finished after the fourth resynthesis. Fig. 8(b) shows how the new OR gate is added to the circuit. As another example, in sbuf-send-pkt2.g there is only one three-input gate, corresponding to cube , in the transition network of that cannot be hazard-freely decomposed. By the IGA method, two signals and (both with multiple instances of transitions) have to be added to the STG for resynthesis. The decomposed implementation is given in Fig. 8(c), and Fig. 8(d) shows the final STG.
VI. EXPERIMENTAL RESULTS
The method for the decomposition and resynthesis of SI circuits presented in the previous sections has been automated and incorporated into our earlier synthesis tool [10] in approximately 20 000 lines of C code. We have also demonstrated the time efficiency of our method by testing it on the asynchronous benchmarks [18], assuming a gate library in which only two-input basic gates and C elements are available. The experimental results, shown in Table IV, were obtained on a Sparc Ultra-30 workstation with a clock speed of 248 MHz and 128 MB of physical memory. All the decomposition results have also been verified to be correct by the tool developed in [17]. In Table IV, column "Sigs" reports the number of signals added to the initial STG for resynthesis. The complexity of an AND/OR gate is measured as its number of literals, complemented or not. Column "Lits" reports the number of literals of all the combinational gates, and the number of required C elements is given in column "C-eles", where the number in parentheses ("old") indicates the number of C elements in the initial implementation. The required CPU time (in seconds) is reported in column "Time".
It can be seen from Table IV that, since only the combinational decomposition is considered in our method, the number of C elements is the same before and after decomposition for every circuit except vbe5c.g, which uses one fewer C element. We also compared our results with those of [11] (using the experimental data reported in [7]) and of Petrify [6] (using the command "petrify -lit2"), assuming the same gate library. Our method adds fewer signals than [11] and Petrify, implying that it requires fewer resynthesis cycles. If the cost of a C element is taken to be three literals, the total cost of our implementation (Total 1) is 666 literals. Compared with [11], whose total cost is 791 literals, our method achieves an area reduction of approximately 15.8%. Moreover, our run time, normalized to account for the different machines used, is only 16.7% of that of [11], which was obtained on a Sparc 20 machine. On the other hand, we found that the circuit master-read.g cannot be hazard-freely decomposed by Petrify. The total cost of our implementation excluding this circuit (Total 2) is 619 literals, whereas the total cost by Petrify is 581 literals. Although our method needs 6.1% more circuit area, this slightly worse but acceptable result can be further improved in the future by taking sequential decomposition into consideration. The run time required by our method, however, is only 6.3% of that required by Petrify. In addition, we used vbe6a.g as a scalable example to further show the time efficiency of our method. It is an MG with four stages of request-acknowledge pairs executing sequentially. Several of its variations with different numbers of stages were also tested. The run-time comparison with Petrify is depicted in Fig. 9.
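The area figures above follow from a simple cost model: combinational literals plus three literals per C element. The Python sketch below computes the relative area reduction with respect to the 791-literal total of [11] from the totals quoted in the text. The per-gate literal counter is an illustrative helper, and the gate lists it takes are hypothetical rather than taken from Table IV.

```python
from typing import Iterable, Sequence

C_ELEMENT_LITERALS = 3  # cost assumed for one C element, as in the text

def count_literals(gates: Iterable[Sequence[str]]) -> int:
    # Each AND/OR gate contributes one literal per input, complemented or not.
    return sum(len(inputs) for inputs in gates)

def total_cost(literals: int, c_elements: int) -> int:
    return literals + C_ELEMENT_LITERALS * c_elements

def area_reduction(ours: int, theirs: int) -> float:
    # Fraction of the other method's area that is saved.
    return (theirs - ours) / theirs

print(count_literals([["a", "b'"], ["c", "d"]]))   # 4
print(total_cost(4, 2))                            # 10
print(f"{100 * area_reduction(666, 791):.1f}%")    # 15.8%
```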
VII. CONCLUSION
We have presented a time-efficient method for the decomposition and resynthesis of SI circuits. The method starts with the STG specification of an SI circuit and then investigates various hazard-free decompositions for each high-fanin AND gate that does not exist in the gate library. For those gates that cannot be decomposed hazard free, new signals are added to the STG and resynthesis is performed. By adopting the STG as our input specification, reducing the resynthesis cycles, and applying efficient signal-adding methods for resynthesis, the run time is substantially reduced at the cost of only slightly more area. In the future, we plan to extend our method to take sequential decomposition and general storage elements into consideration and to investigate more latch-sharing possibilities for area reduction.
