Abstract-Asynchronous speed independent (SI) circuits based on an unbounded gate delay model often suffer from a high area penalty, owing to the lack of efficient global optimization. This paper presents a Boolean optimization method, based on the transduction method, that optimizes asynchronous SI circuits while preserving hazard-freeness.
I. INTRODUCTION
Signal Transition Graphs (STGs), which are interpreted Petri nets, are commonly used to specify the behavior of asynchronous circuits. Starting from an STG, the asynchronous logic synthesis tool petrify [2] can synthesize the corresponding asynchronous speed-independent (SI) circuit, which behaves correctly under any gate delay.
Even though petrify is well established for the synthesis of asynchronous SI circuits, it does not perform global optimization that exploits the relationships among the logic functions of gates, because each output function is derived separately from the encoded state graph of an STG. This sometimes produces redundant circuits in which the same function appears in different logic networks.
To solve this problem, we show an approach that optimizes asynchronous SI circuits globally using don't cares derived from the given circuit structure. The don't cares are exploited over the whole circuit structure by calculating permissible functions. Permissible functions [4] are functions guaranteeing that a change within them does not affect the circuit outputs. They are frequently used in multi-level logic optimization. One of the representative approaches using permissible functions is the transduction method [4, 5]. The transduction method optimizes a given circuit by sharing common gates and substituting gates so that the total number of gates and/or connections is minimized.
In this paper, we extend the transduction method to asynchronous SI controllers, considering how to calculate permissible functions and which transformations guarantee hazard-freeness. In particular, we focus on the gate substitution algorithm in [4]. Fig.1 shows the proposed optimization flow based on the transduction method. As initial inputs, it accepts the logic functions of a circuit (or a technology mapped circuit) produced by petrify and the corresponding binary encoded state graph. Then, it optimizes the circuit by gate substitution whenever hazard-freeness is guaranteed.
Fig. 1. Framework of this work
According to [3], in petrify, global optimization in terms of gate substitution is carried out when a function is decomposed during technology mapping. However, it is restricted to checking whether some gate can be substituted by the decomposed gate. In addition, the computational complexity increases because the state space must be recalculated each time a function is decomposed. Our approach neither restricts substitution to specific gates nor requires recalculation of the state space, because it optimizes circuits globally without any logic decomposition.
The rest of the paper is organized as follows. In section II, the basic notions of STG-based synthesis are presented. In section III, the calculation of permissible functions for asynchronous SI controllers is discussed. In section IV, we show how to substitute gates in asynchronous SI circuits while preserving hazard-freeness. Finally, we show the experimental results in section V and conclude this work in section VI.
Fig.2.a shows a simple interface between two modules in an asynchronous system, a master (e.g., a processor) and a slave (e.g., memory). The interface involves two signal handshakes, one for controlling the transmission of the address (add and add_ack) and the other for data (data and data_ack). The timing diagram shown in Fig.2.a defines the synchronization protocol between the handshakes.
An STG transition is enabled if all its input places (arcs) contain a token. In the initial marking {p1, p2} of the STG in Fig.2.b, transition add+ is enabled. Every enabled transition can fire, removing one token from every input place of the transition and adding one token to every output place. After the firing of transition add+, the STG moves to a new marking, {p3}, where data+ is enabled. The set of all signals in an STG is partitioned into a set of inputs, which come from the environment, and a set of outputs and state signals that must be implemented.
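The firing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the data structure used by petrify: the net is assumed to be safe (at most one token per place), so a marking is a set of places; the place names p1, p2, p3 follow the text, while p4 and the input/output place sets of each transition are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# transition -> (input places, output places); this structure is assumed for illustration
Net = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]

NET: Net = {
    "add+": (frozenset({"p1", "p2"}), frozenset({"p3"})),
    "data+": (frozenset({"p3"}), frozenset({"p4"})),  # p4 is a hypothetical place
}


def enabled(net: Net, marking: FrozenSet[str]) -> List[str]:
    """Return the transitions whose input places all hold a token."""
    return sorted(t for t, (pre, _) in net.items() if pre <= marking)


def fire(net: Net, marking: FrozenSet[str], transition: str) -> FrozenSet[str]:
    """Remove a token from every input place and add one to every output place."""
    pre, post = net[transition]
    if not pre <= marking:
        raise ValueError(f"{transition} is not enabled in {sorted(marking)}")
    return (marking - pre) | post


m0 = frozenset({"p1", "p2"})
m1 = fire(NET, m0, "add+")
```

Starting from the marking {p1, p2}, only add+ is enabled; firing it yields {p3}, where data+ becomes enabled.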
B. State Graph
By playing the token game (reachability analysis) on a given STG, one can generate a State Graph (SG) in which each node (a marking) is labeled with a vector of signal values (in Fig.2.c, signals that can change in the state are marked with "*") and arcs between pairs of states are labeled with the corresponding fired transitions.
Excitation region and quiescent region. A maximally connected set of states in which a* is enabled is called an excitation region (ER) for transition a* (denoted by ER(a*); e.g., the shadowed set of states in Fig.2.c corresponds to ER(data-)). The quiescent region (QR) for transition a* (denoted by QR(a*)) is a maximal set of states such that a is stable and not reachable from any other ER(a*).
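As a rough sketch of the ER definition, the following Python function collects the states in which a given transition is enabled and splits them into connected components. The SG encoding (a dictionary from state name to a list of labeled arcs) and the five-state example graph are hypothetical; they are not the SG of Fig.2.c.

```python
from typing import Dict, List, Set, Tuple

# state -> list of (transition label, successor state); a hypothetical SG
SG: Dict[str, List[Tuple[str, str]]] = {
    "s0": [("b+", "s1"), ("a+", "s2")],
    "s1": [("a+", "s3")],
    "s2": [("b+", "s3")],
    "s3": [("a-", "s4")],
    "s4": [],
}


def excitation_regions(sg: Dict[str, List[Tuple[str, str]]], transition: str) -> List[Set[str]]:
    """Return the maximal connected sets of states in which `transition` is enabled."""
    excited = {s for s, arcs in sg.items() if any(t == transition for t, _ in arcs)}
    # Undirected adjacency restricted to the excited states.
    adj: Dict[str, Set[str]] = {s: set() for s in excited}
    for s, arcs in sg.items():
        for _, nxt in arcs:
            if s in excited and nxt in excited:
                adj[s].add(nxt)
                adj[nxt].add(s)
    regions: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sorted(excited):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(adj[u] - component)
        seen |= component
        regions.append(component)
    return regions
```

In this example a+ is enabled in s0 and s1, which are joined by the arc b+, so they form a single excitation region.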
Signal consistency. An SG is consistent if in every transition sequence from the initial state, rising and falling transitions alternate for each signal. Fig.2.c shows the SG for the STG in Fig.2.b, which is consistent.
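One common way to check consistency, sketched below, is to propagate binary codes from the initial state and verify that every a+ arc leaves a state with a = 0, every a- arc leaves a state with a = 1, and each state receives a single code. This consistent-state-assignment formulation is an assumption of the sketch (it is slightly stronger than the alternation property alone), and the SG encoding and labels of the form "a+" or "a-" are hypothetical.

```python
from typing import Dict, List, Tuple

StateGraph = Dict[str, List[Tuple[str, str]]]


def is_consistent(sg: StateGraph, init_state: str, init_code: Dict[str, int]) -> bool:
    """Propagate signal values along arcs and reject any contradicting transition."""
    codes = {init_state: dict(init_code)}
    stack = [init_state]
    while stack:
        s = stack.pop()
        for label, nxt in sg[s]:
            signal, sign = label[:-1], label[-1]
            before, after = (0, 1) if sign == "+" else (1, 0)
            if codes[s][signal] != before:
                return False  # e.g. a+ fired while a is already 1
            new_code = dict(codes[s])
            new_code[signal] = after
            if nxt not in codes:
                codes[nxt] = new_code
                stack.append(nxt)
            elif codes[nxt] != new_code:
                return False  # the same state reached with two different codes
    return True


GOOD: StateGraph = {
    "s0": [("a+", "s1")],
    "s1": [("b+", "s2")],
    "s2": [("a-", "s3")],
    "s3": [("b-", "s0")],
}
BAD: StateGraph = {"s0": [("a+", "s1")], "s1": [("a+", "s2")], "s2": []}
```

The four-state cycle GOOD alternates rising and falling transitions for both signals, whereas BAD fires a+ twice in a row.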
Implementability conditions. In addition to consistency, the following two properties are required for an SG to be implementable [...] said to be in CSC conflict (binary codes 100*0 and 10*00 in Fig.2.c).
The following sufficient condition was proved in [1]: an STG can be implemented by a speed-independent circuit if it is consistent, output-persistent, and has CSC.
C. Generalized C-element implementations
If the conditions discussed above are satisfied, one can produce an SI circuit from an STG in which each signal a is implemented in the form a = S + a·R', where S and R are the set and reset functions, respectively, and R' denotes the complement of R. This style of implementation is known as generalized C-element implementation (or gC-implementation). In this work, we focus on circuits derived by this implementation style as optimization targets. Fig.3 shows an example of a gC-implementation for a signal a. Each gate in the first level (i.e., C(a+), C(a+/2)) corresponds to a signal transition and is derived to satisfy the following monotonous cover conditions. Note that a*/i denotes the i-th transition of signal transition a*.
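The behavior of a generalized C-element can be illustrated with a small next-state function. The sketch assumes the usual gC form a = S + a·R' (set, hold, reset) and assumes that S and R are never active at the same time; the function name gc_next is hypothetical.

```python
def gc_next(s: bool, r: bool, a: bool) -> bool:
    """Next value of signal a for set function s, reset function r and current value a."""
    return s or (a and not r)


# Walk through one set / hold / reset / hold cycle.
trace = []
a = False
for s, r in [(True, False), (False, False), (False, True), (False, False)]:
    a = gc_next(s, r, a)
    trace.append(a)
```

The signal rises when S is active, keeps its value while both S and R are inactive, and falls when R is active.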
III. CALCULATION OF PERMISSIBLE FUNCTIONS
A. Permissible Functions
Permissible functions, defined for each net and gate output, represent a set of logic functions in which a change within the functions does not affect circuit outputs (due to don't care space derived from circuit structures).
According to [4], the calculation of permissible functions consists of the following two steps.
1. Calculation of logic values for each gate and net by assigning truth values to the primary inputs (Fig.4.b)
2. Calculation of permissible functions for each gate and net, proceeding from the circuit outputs to the primary inputs (Fig.4.c)
After all of the logic values are calculated (Fig.4.b), the permissible functions are derived by assigning possible don't cares to each input. For example, in an OR gate, all of the inputs must be 0 if the output is 0. However, if the output is 1, one of its inputs must be 1 while the others can be either 0 or 1 (i.e., don't care, denoted by *). In Fig.4.b, the permissible functions of the inputs of the OR gate, w1 and b, can be 01*1 and 0*1*. Applying the same reasoning to the AND gate, we obtain the permissible functions of the circuit in Fig.4.a as shown in Fig.4.c.
In some cases, a gate (or a net) may have several candidates for its permissible functions. For example in Fig.4.c, there is another possibility for the permissible functions of w1 and b, namely 01** and 0*11 (the last element differs from the previous case). The difference comes from the choice of the don't care assignments for w1 and b where the output of the OR gate is 1.
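The OR-gate rule and the dependence on the don't care choice can be sketched as follows. The input vectors w1 = 0101 and b = 0011 and the required output 0111 are assumptions inferred from the permissible functions quoted in the text, not values read from Fig.4; the function name and the priority-based tie-breaking are likewise illustrative.

```python
from typing import Dict, List


def or_gate_input_pfs(values: Dict[str, str], out_pf: str, priority: List[str]) -> Dict[str, str]:
    """Derive one permissible function per OR-gate input.

    values   : logic value vector of each input, e.g. {"w1": "0101"}
    out_pf   : permissible function required at the gate output ('0', '1' or '*')
    priority : order in which inputs are chosen to keep a required 1
    """
    pfs: Dict[str, List[str]] = {name: [] for name in values}
    for i, required in enumerate(out_pf):
        if required in "0*":
            # Output 0 forces every input to 0; a don't care output frees every input.
            for name in values:
                pfs[name].append(required)
            continue
        # Output 1: the first input (in priority order) that is 1 keeps its 1.
        keeper = next((n for n in priority if values[n][i] == "1"), None)
        if keeper is None:
            raise ValueError(f"no input is 1 at position {i}")
        for name in values:
            pfs[name].append("1" if name == keeper else "*")
    return {name: "".join(bits) for name, bits in pfs.items()}


VALUES = {"w1": "0101", "b": "0011"}  # assumed logic values of the OR-gate inputs
first = or_gate_input_pfs(VALUES, "0111", ["w1", "b"])
second = or_gate_input_pfs(VALUES, "0111", ["b", "w1"])
```

Giving priority to w1 yields 01*1 and 0*1*, while giving priority to b yields 01** and 0*11, which are the two candidates mentioned above.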
Since calculating all candidates requires considerable computation time, we concentrate on only a partial set of permissible functions called compatible sets of permissible functions (CSPFs) [4].
Circuits
In addition to the previous procedures, the following considerations are required for the calculation of permissible functions in asynchronous SI controllers.
1. Removal of all feedback loops
2. Assignment of truth values from the corresponding SG
3. Assignment of don't cares except for the states in ERs
Asynchronous SI circuits contain feedback loops because they implement sequential machines. To prevent the iterative calculations caused by these loops, we must cut all loops as in Fig.5.a before the calculation of logic values.
In addition, since the behavior of an asynchronous SI controller is represented by the states of the corresponding SG, the truth values of signals come directly from the corresponding SG (see Fig.5.b).
The last requirement concerns don't care assignment. In asynchronous SI circuits, since the timing of the signal changes of non-input signals (i.e., output or state signals) is represented as an ER on the SG, assigning don't cares to the states in ERs may lead to hazardous behavior during transformations. Therefore, we do not assign a don't care to any state in an ER. This restriction applies to all of the gates and nets in a given circuit. Fig.5.c shows the calculated permissible functions for Fig.5.a.
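The restriction on ER states can be pictured as a post-processing step on a permissible function: any don't care that falls on an ER state is replaced by the actual logic value taken from the SG. The sketch below is one possible reading of that rule; the vector encoding (one character per SG state) and the function name are hypothetical.

```python
from typing import Set


def protect_er_states(pf: str, values: str, er_indices: Set[int]) -> str:
    """Replace don't cares at ER state positions by the real logic value.

    pf         : permissible function, one of '0', '1', '*' per SG state
    values     : logic values of the same gate or net, taken from the SG
    er_indices : positions of the states that belong to some ER
    """
    return "".join(
        values[i] if (i in er_indices and c == "*") else c
        for i, c in enumerate(pf)
    )


restricted = protect_er_states("0*1*", "0011", {1})
```

Here the don't care at position 1 is withdrawn because that state lies in an ER, while the don't care at position 3 is kept.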
IV. TRANSDUCTION METHOD FOR ASYNCHRONOUS SI CIRCUITS
A. Validations of gate substitutions
In order to preserve hazard-freeness after optimization, our approach allows a substitution only if gate g2, which substitutes gate g1, satisfies the monotonous cover conditions of gate g1. This is checked by observing the relationships between the logic values and the corresponding ERs and QRs for both g1 and g2. Before describing the formal substitution conditions in gC-implementations, we define some terminology. Note that g1 is a gate for the i-th transition of signal transition a*. [...]
Proof of Proposition IV.1. The first condition in Prop.IV.1 guarantees the cover condition in the monotonous cover conditions. Since we never assign a don't care to any state in an ER, G_s(g1) is equal to 1 if s is a state in ER(a*/i).
In that situation, if g2 does not cover the state s (i.e., g2 = 0 in state s), it loses the timing to produce signal transition a*/i, which implies hazardous behavior.
The second condition corresponds to the one-hot condition in the monotonous cover conditions. According to the one-hot condition, gate g1 does not cover the states outside ER(a*/i) ∪ QR(a*/i). If gate g2 covers such states and substitutes g1, hazardous behavior may occur at gate g2 in those states and may be propagated to the circuit outputs.
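The first two conditions discussed in the proof can be sketched as simple checks on the value of the candidate gate in each SG state. The functions below are illustrative only: the dictionary encoding, the function names, and the example states are hypothetical, and the third (monotonicity) condition is deliberately not sketched.

```python
from typing import Dict, Set


def covers_er(g2: Dict[str, int], er: Set[str]) -> bool:
    """Cover condition: the candidate gate must be 1 in every state of the ER."""
    return all(g2[s] == 1 for s in er)


def stays_inside(g2: Dict[str, int], er: Set[str], qr: Set[str]) -> bool:
    """One-hot condition: the candidate gate must be 0 outside ER and QR."""
    allowed = er | qr
    return all(v == 0 for s, v in g2.items() if s not in allowed)


ER = {"s1"}          # hypothetical ER(a*/i)
QR = {"s2"}          # hypothetical QR(a*/i)
candidate = {"s0": 0, "s1": 1, "s2": 1, "s3": 0}
rejected = {"s0": 1, "s1": 1, "s2": 1, "s3": 0}
```

The gate named candidate passes both checks, whereas rejected covers s0, which lies outside the ER and QR.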
The third condition corresponds to the monotonicity condition in the monotonous cover conditions. [...]
Fig. 6. An illustration of gate substitution
[...] if, in their CSPFs G(g1) and G(g2), G(g1) includes G(g2) (G(g1) ⊃ G(g2)). For example in Fig.6, g1 is [...] Fig.7). If the conjunction is not empty, we check whether g1 or g2 is included in that conjunction (NewG ⊇ g1 or NewG ⊇ g2). When g1 is included in that conjunction, g2 is substituted by g1 if all of the conditions in Prop.IV.1 are satisfied for g1 with respect to g2. We call this check Case3-check. If the conjunction of the CSPFs exists but neither g1 nor g2 is included in it, we create a set of new gates (New in Fig.7) such that each new gate g is included in the conjunction (NewG ⊇ g).
In this case, we must check all of the conditions of Prop.IV.1 for the newly created gate with respect to g1 and g2. Fig.7 shows pseudo code of the extended gate substitution algorithm.
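The CSPF comparisons used by the substitution algorithm (inclusion, conjunction, and membership of a gate function) can be sketched on vectors over {0, 1, *}. This is not the pseudo code of Fig.7: the function names are hypothetical, and apart from Case3 the case labels returned by classify are placeholders chosen for this sketch.

```python
from typing import Optional


def includes(g_a: str, g_b: str) -> bool:
    """True if every function permitted by g_b is also permitted by g_a."""
    return all(a == "*" or a == b for a, b in zip(g_a, g_b))


def intersect(g_a: str, g_b: str) -> Optional[str]:
    """Conjunction of two CSPFs, or None if they conflict in some position."""
    out = []
    for a, b in zip(g_a, g_b):
        if a == "*":
            out.append(b)
        elif b == "*" or a == b:
            out.append(a)
        else:
            return None
    return "".join(out)


def member(values: str, cspf: str) -> bool:
    """True if the logic value vector of a gate lies inside the CSPF."""
    return all(c == "*" or c == v for v, c in zip(values, cspf))


def classify(g1: str, g2: str) -> str:
    """Relate two CSPFs; only the label CASE3 is taken from the text."""
    if includes(g1, g2):
        return "G1_INCLUDES_G2"
    if includes(g2, g1):
        return "G2_INCLUDES_G1"
    return "CASE3" if intersect(g1, g2) is not None else "DISJOINT"


new_g = intersect("1*0*", "*10*")
```

When classify reports CASE3, a gate whose value vector satisfies member against the conjunction is a substitution candidate, subject to the hazard-freeness conditions of Prop.IV.1.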
Example. To demonstrate how the extended gate substitution algorithm works, we apply it to an example circuit. Fig.8.a shows the SG of this example and Fig.8.b shows a part of the corresponding SI circuit in gC-implementation. The CSPFs are assigned for C-element, set, and reset functions (i.e., C-elements for aout and csc, gate g1, and gate g2). Since the CSPFs of g2 and g3 satisfy neither G(g2) ⊂ G(g3) nor G(g2) ⊃ G(g3), and their conjunction is not empty (G(g2) ∩ G(g3) = 11*100000000000000), this is Case3.
Suppose we substitute gate g3 by gate g2. For the first condition of Prop.IV.1, Case3-check checks the states where [...] of s4), since g2 covers those states, the last condition is also satisfied. As a result, the substitution of g3 by g2 is carried out without leading to any hazardous behavior (Fig.8.d).
[...] [3]) due to the introduction of decomposed gates. Although the formal conditions for substitutions must be considered, we investigate the applicability of our gate substitution algorithm to technology mapped circuits. Hence, in this work, the functionality and hazard-freeness of the optimized circuits were verified using the SI verification tool versify. The formal conditions for substitutions will be considered in our future work.
A. Experiments on gC-Implementations
In this experiment, CSPFs are assigned for C-element, set, and reset functions. Table I shows the result of this experiment. The result shows that we can reduce the area by 20% on average in terms of the number of nodes (17% in terms of the number of literals). In gC-implementations, our approach works well when substituting identical gates or gates that have similar logic.
B. Experiments on Technology Mapped Circuits
As in the previous experiment, we apply our gate substitution algorithm to technology mapped benchmark circuits. In this experiment, we assume that our library has C-element (c = a·b + (a+b)·c), AND, OR, NAND, NOR, and INV gates with up to three fan-ins. CSPFs are assigned for each gate. Table II shows the result of this experiment. The area reduction is about 10% on average in terms of the number of nodes (6% in terms of the number of literals). In this experiment the effect is smaller than for gC-implementations because petrify already tries gate substitution when a function is decomposed. However, our approach can optimize the circuits even after that gate substitution has been carried out.
[...] have redundant circuitry. To solve this problem, we proposed a global optimization method for asynchronous SI controllers that uses the transduction method, extended to preserve hazard-freeness. The experimental results were encouraging in that, on average, the area reductions achieved by our approach were about 20% (for gC-implementations) and 10% (for technology mapped circuits) in terms of the number of nodes. The algorithm discussed in this paper was implemented in Java.
As future work, another optimization method based on maximal sets of permissible functions will be considered because it may give better results. In addition, the formal substitution conditions for technology mapped circuits will be considered.
