Abstract-This paper develops a theoretical framework for the hazard-free gate-level implementation of speed-independent circuits specified by event-based models, such as signal transition graphs (for processes with AND causality and input choice) or their extension, called change diagrams (which allow OR-causality). It presents sufficient conditions, called the generalized monotonous cover requirements, under which a hazard-free circuit can be built within a standard implementation structure. This structure consists of two-level simple-gate combinational logic and a row of latches, each being either a C-element or an RS-latch. A set of semantics-preserving transformations is defined that can be applied to an original behavioral description of the circuit to produce a specification that satisfies the monotonous cover requirement. The transformations are applied at the event-based representation level (to avoid state explosion) and are proved to be effective. The main result of the paper is therefore twofold: 1) a proof that any speed-independent behavior can be implemented at the gate level without hazards and 2) an efficient method for constructing such an implementation. Experimental results show that the proposed method compares very favorably, in area and performance, with previously known techniques.
I. INTRODUCTION
Automated synthesis of asynchronous, or self-timed, circuits has recently become an important issue for the very large scale integration (VLSI) computer-aided design (CAD) community. The scope of application for asynchronous circuits is increasing due to a number of potential advantages:
• greater modularity since one can design, reuse, and maintain components one at a time;
• no problems with clock signal skew, and thus no area or time penalty for the fast and reliable driving of clock lines, since the system does not use a clock at all;
• operational scalability, that is, if one part works slower, the system slows down but does not fail;
• robustness to parameter variations, for example, temperature and supply voltage [24];
• reduction of power dissipation due to the absence of clock activity and because signal transitions are performed only when they are needed [35].
Manuscript received December 8, 1994; revised April 22, 1998. The work of A. Kondratyev was supported in part by the Engineering and Physical Sciences Research Council under Grant GR/L24038. The work of A. Yakovlev was supported by the Engineering and Physical Sciences Research Council under Grants GR/J52327/K70175 and GR/K70175/L28098. This paper was recommended by Associate Editor A. Saldanha.
A. Kondratyev is with the University of Aizu, Aizu-Wakamatsu 965 Japan. M. Kishinevsky is with the Strategic CAD Lab, Intel Corp., Hillsboro, OR 97124 USA.
A. Yakovlev is with the University of Newcastle upon Tyne, Newcastle NE1 74U England.
Publisher Item Identifier S 0278-0070(98)06765-7.
One approach to the design of asynchronous circuits is to rely upon known bounds on gate/wire delays and/or to use a restricted input-output signaling protocol [14], [20], [31], [34], [43]. A typical example of the latter is the fundamental mode protocol, where a new input pattern may be applied to the circuit only when the circuit has completely settled after the previous pattern. This assumption imposes certain timing constraints on the response delay of the environment.
In contrast, speed-independent circuits [15], [23], [28] rely on the unbounded delay model, which uses pessimistic assumptions for gate delays (no bounds are known) and realistic assumptions for wire delays (the skew of wire delays at multiple fan-outs is less than one gate delay). These circuits do not impose global constraints on the environment. Instead, the circuit acknowledges each input change to indicate that the environment is allowed to apply the next input pattern. Therefore, speed-independent circuits are more robust to irregular parameter variations (process, voltage, and temperature variations, noise). Speed-independent circuits are self-checking with respect to stuck-at faults [26], [38] and are easier to verify than bounded-delay circuits [12], [15], [20].
A. Problem Description
An event-based model called the signal transition graph (STG) has become popular as a specification language for the synthesis of asynchronous circuits [4], [15], [20], [27], [37], [41]. A simple example of an STG is shown in Fig. 1(a). It uses arrows to define causal relations between rising (denoted with "+") and falling (denoted with "-") signal transitions in the circuit. For example, a rising transition with two fan-in arrows can occur only when both of its predecessor transitions have occurred. The dots placed on the arcs indicate the initial state of the circuit. Such a representation of circuit behavior can be viewed either as an interpreted Petri net [13] or as a formalization of timing diagrams [20].
If an STG specification satisfies certain correctness criteria (discussed in Section II-B), one can derive a state graph from it. The state graph, called a transition diagram (TD) following the tradition of [26] and [28], has vertices, called states, which are labeled with binary codes, and arcs between the states that correspond to the allowable signal transitions. A TD corresponding to the STG from Fig. 1(a) is shown in Fig. 1(b). To indicate that a signal is excited, i.e., can perform a transition, its value is marked with an asterisk (0* and 1*). If any two states of a TD have different Boolean codes, or if every output signal has the same next value in all states with the same code, i.e., the complete state coding requirement is met (see Section III-A), then the TD defines an incompletely specified logic function for each output signal. The following three sets of Boolean vectors can be derived from the TD: the on-set, the set of states in which the signal value is 1 or 0*, i.e., where the function evaluates to 1; the off-set, the set of states in which the signal value is 0 or 1*, where the function evaluates to 0; and the dc-set (don't-care set), the set of states that are not reachable from the initial state and in which the function is not specified. From the Karnaugh map given in Fig. 1(c), one can derive the logic function of the output signal. The next stage of synthesis is technology mapping into standard gate cells. This function may not be implementable with one standard logic gate for three major reasons.
1) One of the signals occurs in the function in both inverted and noninverted form. This requires an extra inverter to deliver both values. The use of extra inverters can introduce logic hazards at the output of the cell that implements the output signal when the inverted signal performs rising or falling transitions (shown by the arrows inside the Karnaugh map in Fig. 1(c)).
2) The function is self-dependent, i.e., it depends on the output signal itself and consequently requires feedback. An actual implementation of feedback structures must involve at least two negative (inverting) gates inside the feedback loop and is typically based on latches.
3) In general, the logic function may be too complex for one library cell. Because of logic hazards, decomposing "large" logic functions into "smaller pieces" is nontrivial. It was proved in [38] that any logic function derived for a speed-independent specification can be decomposed without hazards into an RS-latch built of two complex gates. Fig. 2(a) shows a Karnaugh map for the functions of an RS-latch that implements the signal in dual-rail form, and Fig. 2(b) shows the RS-implementation of the signal. Although a function implemented as a cube can change its value nonmonotonically along the state sequence, as shown in the Karnaugh map, this does not imply hazards at the outputs of the RS-latch. Indeed, while the cube is changing its value, the stable "0" at the output of the complex gate is held by the "1" from the output of the latch. [A complementary metal-oxide-semiconductor (CMOS) implementation of the complex gate is shown in Fig. 2(c).]
However, a simple-gate implementation based on a trivial decomposition of a complex-gate latch into a standard simple-gate latch and a two-level sum-of-products combinational circuit for the functions S and R is hazardous, as exemplified in Fig. 2(d). A hazard-free implementation using simple gates can be obtained by including redundant literals in the covers of S and R [Fig. 2(e)]. In general, however, it is not always possible to find a nonprime or redundant cover that gives a hazard-free implementation of the S and R functions of the RS-latch. If this is impossible, a transformation of the initial specification is needed.
The main results of the paper can be summarized as follows.
For the speed-independent implementation of a correct STG specification, we use a standard implementation structure. Two such structures are considered: an RS-implementation and a C-implementation. They are based on the use of two-level combinational logic and a latch (either an RS-latch or a C-element) for each output signal. We prove that if each cube implemented by an AND gate in the first level of combinational logic obeys the monotonous cover (MC) requirement, then both standard implementations are speed independent. This requirement is formulated in terms of regions in the state space of the transition diagram that corresponds to the initial STG.¹ We then switch our focus to model transformations, where we present a constructive procedure that transforms an initial STG specification into a form that meets the monotonous cover requirement. From the latter, a speed-independent standard implementation can be directly derived. The method is based on inserting additional internal signal transitions, and it always converges. Only very limited transformations that preserve the language generated by the specification are used here: neither concurrency reduction [36] nor signal reshuffling [23] is allowed. In other words, by equivalence in this paper, we mean trace equivalence with respect to externally observable signals.
¹The term "MC requirements" refers to restrictions that will be imposed on the cover cubes in the sums of products (SOP's) of the S and R functions of an implementation. Bearing in mind the link between a TD (and hence an STG) and the on- and off-sets of the S and R functions it produces, we also use this term for the TD's (STG's) that generate excitation functions satisfying the MC requirements.
We extend these results in several ways. First, we relax the monotonous cover requirement to allow gate sharing in the combinational logic and discuss several optimization techniques. Then we generalize the monotonous cover requirements to allow hazard-free implementation for a wider class of specifications, such as those that are not restricted by the unique entry condition (defined in Section III-C) and those with OR-causality.
B. Related Work
A great deal of research activity has recently focused on the synthesis of asynchronous circuits from STG specifications. We will only discuss the work that directly relates to the hazard-free gate-level implementation of speed-independent circuits.
In [2], Beerel and Meng suggest a synthesis method that is based on the use of the standard C-implementation structure. They define an implementation condition that is equivalent to our monotonous cover requirement for the case of decomposition into simple gates. This work has been an important step toward the implementation of speed-independent circuits with standard gate cells. Our work generalizes the results of [2] by 1) considering a wider class of specifications that can be handled in the synthesis process and 2) extending the theory of monotonous cover to support more aggressive optimization.
The requirement that specifications satisfy the unique entry condition, central in [2], is overly restrictive even for the implementation of circuits with AND causality between signal transitions. We show in this work how to avoid this restriction and apply our methods to specifications that have both AND and OR causal relations between signal transitions.
The conditions of [2] and the "basic" monotonous cover requirement are both targeted at the architecture in which every transition of an output signal in the specification is implemented by a separate cone of logic realized in simple gates (AND, NAND, OR, and NOR). To improve the efficiency of the implementation, we introduce the generalized monotonous cover requirements (where a single cone of logic can be shared to implement several signal transitions) and the polyterm monotonous cover for implementation with complex gates (AND-OR, AND-NOR, etc.).
This work also suggests a way of organizing the logic that implements signal transitions that differs from that of [2]. The monotonous cover requirement is defined for a single cube, which leads to a very simple and, we believe, efficient architecture (one excitation region, one AND gate in the combinational logic), while the conditions in [2] are defined for a set of cubes covering the excitation region. The latter makes the logic more complicated (see the experimental results) and calls for a computationally hard (circuit-level) transformation of the circuit based on additional acknowledgment wires from the gates of the combinational logic. This method does not always converge and sometimes requires extra signals to be added to the specification. The problem of logic decomposition with sharing was recently solved in [10].
Chu [4], [7] suggests a way of implementing TD specifications that are produced by free-choice safe STG's. His logic synthesis is based on complex gates, which may not be implementable as single units that satisfy both the delay model requirements and the constraints of a given technology.
Moon et al. [27] describe an implementation architecture consisting of two-level sum-of-products logic and set/reset-dominant latches. They consider STG specifications with AND causality only. Their hazard elimination procedure is not complete, as it does not always guarantee a hazard-free implementation.
Varshavsky and Kishinevsky [38] present a formally justified method of implementing autonomous (having no external inputs) speed-independent circuits by complex AND-NOR gates and by two-input NAND and NOR gates, as well as distributive circuits without OR-causality by two-input NAND (or, alternatively, NOR) gates only, thus proving theoretical results on the implementability of these classes. However, the proposed constructions are area inefficient and thus impractical.
Yu and Subrahmanyam [42] proposed using the property of a separable cube to define necessary and sufficient conditions for the hazard-free implementation of a limited class of STG's. Their analysis, however, assumes a different implementation architecture (sum-of-products combinational logic without separate asynchronous latches), and it is applied under rather severe constraints on the specification: only marked graphs (STG's without choice) and nonrepetitive transitions of noninput signals are allowed.
A preliminary discussion of the monotonous cover condition, accompanied by a synthesis method, was presented in [17] and [18]. That method formulated the rules for introducing additional signals as a set of Boolean constraints, whose solution can be found using Boolean satisfiability solvers. This approach can handle only relatively small specifications because the Boolean constraints are described in terms of the state graph rather than at the signal-transition level. Furthermore, control over the quality of the resulting logic is rather limited, because there is little opportunity to influence the search performed by the satisfiability solver.
Pastor et al. [33] used the theory of monotonous covers presented in this paper to develop structural methods for the synthesis of speed-independent circuits using cube approximations derived from STG's. Last, [10] presents the state-of-the-art method for decomposition and technology mapping of speed-independent circuits, which is built on the monotonous cover theory.
The rest of this paper is organized as follows. Section II introduces asynchronous circuits and STG's. TD's and their properties are presented in Section III. Section IV defines two standard implementation architectures based on C-elements and RS-latches. Section V presents the monotonous cover requirements. Section VI discusses the reduction of STG specifications to a form satisfying the monotonous cover requirements; to this end, a set of equivalent transformations at the STG level is presented. Section VII presents extensions of the monotonous cover theory that support several optimizations and capture processes with OR-causality. Finally, Section VIII presents experimental results and a comparison (on a set of benchmarks) with previously known methods.
II. MODELING CIRCUIT BEHAVIOR

A. Asynchronous Circuit Model
A circuit is described as an interconnection of logic gates. Every gate is represented by a combination of a function evaluator, which evaluates the corresponding logic function instantly, and an unbounded delay, which is attached to the gate's output. The type of delay can vary depending on the assumptions about the particular physical properties of the electronic devices and wires implementing the circuit. We adopt the most pessimistic view of the properties of the gate delay: any glitch at its input can propagate to the output (more details on delay models can be found in [1] and [26]). All the delays involved in switching and transmitting signals within a gate and through its output wire prior to a fork are assumed to be lumped into the output delay. The skew of signals in the wire delays after the fork is assumed to be less than the minimum gate delay. A delay induced by a wire may be modeled explicitly, if need be, by inserting an auxiliary component, a buffer, into the wire.
A circuit is a set of gates and a set of input nodes, with each gate input connected to exactly one gate output or one input node and with no two outputs tied together. The transient behavior of the circuit may proceed in parallel with signal transitions on the input nodes. In other words, the environment may change some of the input values in response to output changes, according to some protocol. This style of circuit-environment interaction is often called the input-output mode. It is more general and robust than the traditional fundamental mode [14], [20], [31], [34], [43], which restricts the environment by allowing input transitions only when all output signals are stable.
A spontaneous deviation from the intended circuit behavior, called a hazard, can occur in the implementation of an asynchronous circuit. It appears as a short spurious pulse that does not correspond to any signal transition in the specification.
In the framework of speed-independent design, hazard-free behavior is captured by the notion of semimodularity [26], [28], which is defined as follows. An excited signal can either perform the enabled transition (e.g., the falling transition at the output of an excited AND gate whose output is one and at least one of whose inputs is zero) or become disabled because the gate inputs change (e.g., if both inputs of the AND gate become one before the output changes to zero). The latter behavior is considered hazardous because, under such conditions, a spurious pulse might appear at the output of the gate. A circuit is semimodular with respect to a given state and signal if, when the circuit is initialized in that state, no transition of that signal can be disabled during circuit operation. A stricter definition will be given in Section III. In this paper, we use the terms "speed-independent" and "semimodular" circuit as synonyms because semimodular circuits are the widest subclass of speed-independent circuits that are free of hazards under both inertial and pure delay models [41]. (For the original analysis of the relationship between semimodularity and speed independence, see [26] and [28].)
B. STG's
An STG is defined as a Petri net (PN) whose transitions are labeled with the transitions of signals in a modeled circuit. We assume here the standard definition of a PN [29]: $N = (P, T, F, m_0)$, with $P$ being a set of places, $T$ a set of transitions, $F \subseteq (P \times T) \cup (T \times P)$ a flow relation, and $m_0$ the initial marking, which is a function $m_0: P \to \mathbb{N}$, where $\mathbb{N}$ is the set of nonnegative integers. A PN marking is depicted by tokens in the places, whose number is determined by the marking. In this paper, we mainly consider a sufficiently powerful subclass of PN's called free-choice PN's.
A free-choice net is a PN in which, if two or more transitions share one predecessor place, then this is the only predecessor place for all of them. Such a place is called a free-choice place. A marked graph (MG) net is a PN in which each place has exactly one predecessor and one successor transition. MG's allow concurrency but cannot model choice.
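Both net classes can be recognized directly from the flow relation. The following is a minimal Python sketch, assuming a PN is given as a set of place names and a set of (source, target) arcs; the function names are illustrative and do not come from the paper.

```python
from collections import defaultdict


def pre_post(flow):
    """Build predecessor and successor sets for every node of a PN flow relation."""
    pre, post = defaultdict(set), defaultdict(set)
    for src, dst in flow:
        post[src].add(dst)
        pre[dst].add(src)
    return pre, post


def is_free_choice(places, flow):
    """True iff every place with two or more successor transitions is the
    only predecessor place of each of those transitions."""
    pre, post = pre_post(flow)
    return all(
        len(post[p]) < 2 or all(pre[t] == {p} for t in post[p])
        for p in places
    )


def is_marked_graph(places, flow):
    """True iff every place has exactly one predecessor and one successor transition."""
    pre, post = pre_post(flow)
    return all(len(pre[p]) == 1 and len(post[p]) == 1 for p in places)


# A choice between transitions a and b that merges again in place p1.
places = {"p0", "p1"}
flow = {("p0", "a"), ("p0", "b"), ("a", "p1"), ("b", "p1"), ("p1", "c"), ("c", "p0")}
print(is_free_choice(places, flow), is_marked_graph(places, flow))  # True False
```

The example net is free choice (place p0 is the only input of both a and b) but is not a marked graph, because p0 has two successor transitions.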
An STG is a free-choice PN whose transitions are interpreted as signal transitions on the circuit inputs (input transitions) or gate outputs (output transitions). A signal transition can be represented by $a^{+}_{i}$ or $a^{-}_{i}$, where $a$ is the name of the signal; "+" or "-" is the sign of the signal transition, with "+" for the transition of $a$ from zero to one and "-" for the opposite transition (only binary systems are considered here); and $i$ is a subscript denoting the instance number of the $a^{+}$ or $a^{-}$ transition (in one cycle of circuit operation, the signal on an input or gate output may change several times). The subscript can be omitted if the signal transition occurs only once in the circuit operation cycle.
$a^{*}_{i}$ is used to denote either an $a^{+}_{i}$ or an $a^{-}_{i}$ transition.
Definition 2.1: An STG is a triple $G = \langle N, A, \lambda \rangle$, where $N$ is a free-choice PN, $A$ is the set of signal transitions, and $\lambda: T \to A$ is a function that labels each transition of $N$ with a signal transition.
The behavior of an STG follows that of Petri nets [29]. A transition is enabled if all its predecessor places are marked. When an enabled transition fires, the marking of each predecessor place is decremented and the marking of each successor place is incremented. The new marking obtained through the firing of the transition can again enable some transitions. This leads to the dynamic behavior of the STG, analogous to that of a circuit. We can therefore talk about sequences of transitions that fire under the markings reachable from the initial marking $m_0$. Such sequences are called feasible sequences of the STG.
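The firing rule can be made concrete with a small sketch. It assumes, for illustration only, that the net is stored as two dictionaries mapping each transition to its predecessor and successor places (all arc weights equal to one) and that markings are `collections.Counter` objects, so an unlisted place holds zero tokens.

```python
from collections import Counter


def enabled(marking, pre):
    """Transitions all of whose predecessor places are marked."""
    return [t for t, places in pre.items() if all(marking[p] > 0 for p in places)]


def fire(marking, t, pre, post):
    """Marking reached by firing t: one token is removed from each predecessor
    place and one token is added to each successor place."""
    if t not in enabled(marking, pre):
        raise ValueError(f"{t} is not enabled")
    new = Counter(marking)
    for p in pre[t]:
        new[p] -= 1
    for p in post[t]:
        new[p] += 1
    return new


def is_feasible(sequence, m0, pre, post):
    """True iff the transitions of `sequence` can fire one after another from m0."""
    m = Counter(m0)
    for t in sequence:
        if t not in enabled(m, pre):
            return False
        m = fire(m, t, pre, post)
    return True


# Cyclic STG a+ -> b+ -> a- -> b- -> a+ with one token on the place before a+.
pre = {"a+": {"p4"}, "b+": {"p1"}, "a-": {"p2"}, "b-": {"p3"}}
post = {"a+": {"p1"}, "b+": {"p2"}, "a-": {"p3"}, "b-": {"p4"}}
m0 = Counter({"p4": 1})
print(enabled(m0, pre))  # ['a+']
```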
An STG is graphically represented as a directed graph with transitions denoted by their names and places by circles; places that have exactly one predecessor and one successor transition are usually omitted. Transitions of input signals are underlined. The subscripts of signal transitions, if necessary, are placed on the same line as the signal change (separated by commas).
The following two properties of PN's (and hence STG's) reflect their ability to define finite and cyclic processes in circuits.
An example STG is shown in Fig. 3(a). This STG has only one free-choice place, which is initially marked. This place is the predecessor of two transitions, both of which are enabled initially.
We now introduce the notion of STG correctness, which gives the necessary and sufficient conditions for the correspondence between the STG model and its correct implementation as a semimodular circuit [15], [39]. Not every STG is implementable by a circuit. In the previous section, we considered initialized circuits, which have a single initial state. Hence, when targeting an implementation by an initialized circuit, we should require a similar property of the original STG, i.e., in its initial marking, the value of each signal must be determinate (zero or one). We call an STG well initialized if, for every signal $a$, the first transition of $a$ in any feasible sequence starting from $m_0$ has the same direction (either falling or rising). Clearly, in a well-initialized STG, the value of each signal in the initial marking is determinate.
Any sequence of signal transitions in a binary circuit possesses the following property, called switchover correctness: the rising and falling transitions of the same signal must alternate. Finally, for an STG to specify the finite behavior of a circuit, the set of reachable markings cannot grow infinitely, i.e., the underlying Petri net has to be bounded.
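Switchover correctness and well-initialization can both be checked on individual transition sequences. The sketch below is illustrative only: it assumes labels written as in the figures (e.g., `a+` or `a-,2`) and checks only the finite set of sequences it is given, whereas the definitions quantify over all feasible sequences, which in practice would be taken from the reachability graph.

```python
def split_label(label):
    """'a+' or 'a-,2' -> ('a', '+'); the instance subscript after the comma is ignored."""
    base = label.split(",")[0]
    return base[:-1], base[-1]


def switchover_correct(sequence):
    """Rising and falling transitions of every signal alternate in `sequence`."""
    last = {}
    for label in sequence:
        signal, sign = split_label(label)
        if last.get(signal) == sign:
            return False
        last[signal] = sign
    return True


def well_initialized(sequences):
    """Every signal starts with the same direction in all given sequences."""
    first = {}
    for seq in sequences:
        seen = set()
        for label in seq:
            signal, sign = split_label(label)
            if signal not in seen:
                seen.add(signal)
                first.setdefault(signal, set()).add(sign)
    return all(len(signs) == 1 for signs in first.values())


def initial_values(sequence):
    """Initial value implied by the first transition of each signal: '+' means 0."""
    values = {}
    for label in sequence:
        signal, sign = split_label(label)
        values.setdefault(signal, 0 if sign == "+" else 1)
    return values


print(switchover_correct(["a+", "b+", "a-", "b-", "a+,2"]))  # True
print(switchover_correct(["a+", "b+", "a+,2"]))              # False
```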
Our goal is a semimodular implementation of the STG behavior without internal nondeterminism. In this paper, we do not consider implementations involving arbiters [11] . Hence, the free-choice places in the STG must only be associated with external nondeterminism, which corresponds to the designer's partial knowledge of the environment's reactions. Thus, for a correct STG, all the successor transitions of a free-choice place may only be input signal transitions.
Definition 2.2:
An STG is correct iff it satisfies the following four conditions.
1) The STG is well initialized.
2) Each feasible sequence of signal transitions is switchover correct.
3) The STG is bounded.
4) Any free-choice place precedes only input transitions.
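Condition 4) is purely structural, so it can be checked without exploring behavior. A minimal sketch, assuming the net is given as a set of places plus a set of (source, target) arcs, that transition labels carry the signal name before the sign (e.g., `r+`), and that `inputs` is a hypothetical set of input signal names. Conditions 1) to 3) concern feasible sequences and reachable markings and are not covered here.

```python
from collections import defaultdict


def signal_of(label):
    """'a+' or 'a-,2' -> 'a'."""
    return label.split(",")[0][:-1]


def bad_choice_places(places, flow, inputs):
    """Places with two or more successor transitions (free-choice places) that
    precede a transition of a non-input signal, violating condition 4)."""
    succ = defaultdict(set)
    for src, dst in flow:
        if src in places:
            succ[src].add(dst)
    return {
        p for p, ts in succ.items()
        if len(ts) > 1 and any(signal_of(t) not in inputs for t in ts)
    }


# The environment chooses between read (r) and write (w) requests in place p0.
places = {"p0", "p1"}
flow = {("p0", "r+"), ("p0", "w+"), ("r+", "p1"), ("w+", "p1")}
print(bad_choice_places(places, flow, inputs={"r", "w"}))  # set()
```

If `w` were an output instead, place p0 would be reported, because the choice would then be resolved inside the circuit rather than by the environment.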
It was proved in [39] that STG correctness (in terms of Definition 2.2) is necessary and sufficient for implementability of an STG with a semimodular circuit.
STG descriptions are more compact than TD's and are therefore more appropriate for specification and verification.
However, in synthesis of Boolean functions for circuit signals, it is often more convenient and efficient to derive Boolean vectors corresponding to the markings of STG's.
III. TRANSITION DIAGRAMS
A TD is a directed graph where each vertex is in one-to-one correspondence
²In all other cases, the value of the signal in the two states is the same.
It can be proved that a TD obtained from a correct STG always has a consistent state assignment [15]. Fig. 3(b) shows the TD that corresponds to the STG from Fig. 3(a). It is easy to check the consistency of its state assignment.
A. Complete State Coding
Each state $s$ of a TD defines a vertex in the on-set or the off-set of the logic function of a signal $a$. If the next value of $a$ in $s$ is 1, i.e., signal $a$ has either value 1 or 0* in $s$, then $s$ belongs to the on-set. Otherwise, $s$ belongs to the off-set. All states that do not belong to the TD of the circuit, i.e., that are not reachable from the initial state(s), belong to the dc-set. By minimizing the incompletely specified logic functions defined by their on-sets, off-sets, and dc-sets, we derive the gate functions of the implementation.
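The rule above is easy to mechanize. A sketch, assuming for illustration that each reachable TD state is written as a tuple of per-signal marks '0', '1', '0*', '1*' in a fixed signal order, and that the dc-set is every binary code not produced by a reachable state.

```python
from itertools import product

# Next value of a signal for each TD mark (the asterisk means "excited").
NEXT = {"0": 0, "1": 1, "0*": 1, "1*": 0}


def code(state):
    """Binary code of a TD state: the values with excitation marks dropped."""
    return tuple(int(v[0]) for v in state)


def on_off_dc(states, k):
    """On-, off-, and dc-sets of the function of the signal at index k.
    `states` lists the reachable TD states, each a tuple of '0', '1', '0*', '1*'."""
    on = {code(s) for s in states if NEXT[s[k]] == 1}
    off = {code(s) for s in states if NEXT[s[k]] == 0}
    if on & off:
        raise ValueError("CSC conflict: one code has two next values")
    dc = set(product((0, 1), repeat=len(states[0]))) - on - off
    return on, off, dc


# Four-phase handshake on (a, c): a+ -> c+ -> a- -> c- -> a+.
states = [("0*", "0"), ("1", "0*"), ("1*", "1"), ("0", "1*")]
on, off, dc = on_off_dc(states, 1)
print(sorted(on), sorted(off), dc)  # [(1, 0), (1, 1)] [(0, 0), (0, 1)] set()
```

Here the on-set is exactly the set of codes with a = 1, so the derived function for c is simply c = a.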
Unfortunately, such a procedure is not always immediately possible. Two different markings of an STG may correspond to the TD states that have identical Boolean codes, even in the case when the TD has a consistent state assignment. If two states of a TD with identical Boolean codes have different excitation of output signals, then the TD violates the complete state coding (CSC) requirement [5] , and the two states are said to be in a CSC conflict.
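A CSC conflict can be found by grouping TD states by their binary code. The sketch below writes states as tuples of the marks 0, 1, 0*, 1* (an illustrative encoding), and `outputs` holds the indices of output signals. Only differences in output excitation are reported, since differences in input excitation alone do not violate CSC.

```python
from collections import defaultdict


def csc_conflicts(states, outputs):
    """Pairs of TD states with the same binary code but different excitation
    of some output signal.  States are tuples of '0', '1', '0*', '1*'."""
    by_code = defaultdict(list)
    for s in states:
        by_code[tuple(v[0] for v in s)].append(s)
    pairs = []
    for group in by_code.values():
        for i, s in enumerate(group):
            for t in group[i + 1:]:
                if any(s[k].endswith("*") != t[k].endswith("*") for k in outputs):
                    pairs.append((s, t))
    return pairs


# Signals (a, b) with b an output: a+ -> b+ -> b- -> a- -> a+.
states = [("0*", "0"), ("1", "0*"), ("1", "1*"), ("1*", "0")]
print(csc_conflicts(states, outputs=[1]))  # [(('1', '0*'), ('1*', '0'))]
```

Both conflicting states have code 10, but b is excited in one and stable in the other, so no logic function of (a, b) alone can implement b; an internal state signal would be needed.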
A violation of the CSC requirement means that the circuit, while in the same binary state, has to produce different transitions at the gate outputs. This is possible only if the circuit can distinguish such conflict states by means of additional memory, i.e., internal signals that do not exist in the original specification.
If an STG is correct, then there is a semimodular circuit such that its output behavior is equivalent to the STG specification. This circuit can, however, have more signals than the original STG, since some additional state signals are required to ensure the CSC property. Thus, the equivalence between the STG and the circuit behavior is considered only with respect to the signals from the initial STG specification.
We will not discuss here how to ensure the CSC property. Several formal techniques have been proposed recently for the elimination of CSC conflicts by inserting extra internal signals [8], [15], [22], [32], [37]. Thus, we will further consider only TD's that satisfy the CSC property. For such TD's, it is always possible to derive logic functions for all output signals. Usually, some of these logic functions are too complex to be implemented by a single gate. Such a function then has to be decomposed into several gates, and this decomposition must not introduce any hazards, i.e., the decomposed circuit must remain semimodular.
B. Semimodular Transition Diagrams
We defined a TD as an object generated through the token flow simulation of an STG. However, to formally check the properties of a circuit, it is convenient to relate an initialized circuit to its TD. The TD can be constructed through circuit simulation using state traversal, following the circuit model of Section II-A. Efficient techniques for TD generation and asynchronous circuit analysis can be found in [28], [38]. Drawing upon the correspondence between a circuit and its TD, we can formally define the notion of semimodularity in TD terms and speak about a semimodular circuit as a circuit whose TD satisfies certain formal properties.
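The token flow construction of a TD from an STG can be sketched as a breadth-first traversal of the reachable markings. This is only an illustration, not one of the efficient techniques cited above. It assumes a hypothetical dictionary encoding of the net and its labels, detects switchover and state-assignment inconsistencies on the fly, and uses an arbitrary state limit (`limit`) as a crude guard against unbounded nets.

```python
from collections import deque


def build_td(pre, post, m0, labels, v0, limit=10_000):
    """Token flow construction of a TD (sketch).
    pre/post: transition -> set of places; m0: place -> tokens;
    labels: transition -> (signal, '+' or '-'); v0: signal -> initial value.
    Returns (codes, arcs): codes maps each reachable marking to its binary
    code, and arcs lists (marking, transition, next marking) triples."""

    def freeze(m):
        return frozenset((p, n) for p, n in m.items() if n > 0)

    start = freeze(m0)
    codes, arcs, queue = {start: dict(v0)}, [], deque([start])
    while queue:
        mk = queue.popleft()
        m = dict(mk)
        for t in pre:
            if not all(m.get(p, 0) > 0 for p in pre[t]):
                continue  # t is not enabled under this marking
            signal, sign = labels[t]
            target = 1 if sign == "+" else 0
            if codes[mk][signal] == target:
                raise ValueError(f"switchover violated by {t}")
            m2 = dict(m)
            for p in pre[t]:
                m2[p] -= 1
            for p in post[t]:
                m2[p] = m2.get(p, 0) + 1
            nk, code = freeze(m2), {**codes[mk], signal: target}
            if nk not in codes:
                if len(codes) >= limit:
                    raise RuntimeError("state limit reached; the net may be unbounded")
                codes[nk] = code
                queue.append(nk)
            elif codes[nk] != code:
                raise ValueError("inconsistent state assignment")
            arcs.append((mk, t, nk))
    return codes, arcs


pre = {"a+": {"p4"}, "b+": {"p1"}, "a-": {"p2"}, "b-": {"p3"}}
post = {"a+": {"p1"}, "b+": {"p2"}, "a-": {"p3"}, "b-": {"p4"}}
labels = {t: (t[0], t[1]) for t in pre}
codes, arcs = build_td(pre, post, {"p4": 1}, labels, {"a": 0, "b": 0})
print(len(codes), len(arcs))  # 4 4
```

The excitation marks (0*, 1*) of each TD state then follow from the transitions enabled in its marking.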
Definition 3.2 (Conflict) [28], [38]: A state $s$ of a circuit is said to be a conflict state with respect to signal $a$ iff $a$ is excited in $s$ and there is another signal $b$ excited in $s$ such that $s \xrightarrow{b} s'$ and signal $a$ becomes stable in state $s'$. If $a$ is an output signal, then $s$ is said to be an output conflict state; if $a$ is an input while $b$ is an output signal, then $s$ is said to be an input-output conflict state; when both $a$ and $b$ are inputs, $s$ is said to be an input conflict state.
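Definition 3.2 translates directly into a check over the arcs of a TD. The sketch assumes, for illustration, that a TD is a dictionary from states to lists of (signal, successor) arcs, that a signal is excited in a state exactly when it labels one of that state's outgoing arcs, and that every listed state is reachable from the initial state; `inputs` is the set of input signal names.

```python
def conflict_states(td, inputs):
    """Conflict states of a TD (Definition 3.2), with their kind.
    Returns a set of (state, disabled signal a, firing signal b, kind)."""

    def excited(s):
        return {sig for sig, _ in td.get(s, [])}

    found = set()
    for s, arcs in td.items():
        for b, s2 in arcs:
            for a in excited(s) - {b}:
                if a in excited(s2):
                    continue  # a is still excited after b fires
                if a not in inputs:
                    kind = "output"
                elif b not in inputs:
                    kind = "input-output"
                else:
                    kind = "input"
                found.add((s, a, b, kind))
    return found


def output_semimodular(td, inputs):
    """No output or input-output conflict state in td (all states assumed reachable)."""
    return all(kind == "input" for *_, kind in conflict_states(td, inputs))


# Inputs x and y are both excited in s0, and each disables the other.
td = {"s0": [("x", "s1"), ("y", "s2")], "s1": [("z", "s3")], "s2": [("z", "s4")]}
print(output_semimodular(td, inputs={"x", "y"}))  # True
```

The example mirrors the situation discussed for Fig. 3(b): the only conflicts are input conflicts, so the TD is not semimodular but is output semimodular.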
Hazards at the gate outputs can occur if a circuit reaches an output conflict state. Input-output conflict states specify the conditions under which output signals disable inputs. This might impose nonimplementable constraints on the behavior of the environment and should be avoided in a speed-independent implementation. Input conflicts identify states in which the environment determines the direction of control flow. For example, a random-access memory (RAM) controller has a read mode and a write mode controlled by the read and write input signals. This type of input control is represented in the TD of the RAM controller by means of input conflict states, in which both the read and write input signals are excited, but only one of them (the environment's decision) performs a transition.
Definition 3.3 (Semimodularity):
A TD is said to be semimodular (output semimodular) with respect to state $s$ iff no conflict state (no output or input-output conflict state) is reachable from $s$.
Let us return to the TD example in Fig. 3(b). In the initial state, two input signals are excited, but the firing of either of them disables the excitation of the other. Thus, the initial state is a conflict state, and this TD is not semimodular. As both signals are inputs, the initial state is an input conflict state. There are no other conflict states in this TD, so it is output semimodular and can be implemented by a semimodular circuit.
The definition of semimodularity is given for TD's. However, we will equally apply it directly to the circuits associated with the TD's in question. Bearing in mind that we consider speed independence and semimodularity of circuits as synonyms, we have the following correspondence among circuits, STG's, and TD's: speed-independent circuits $\Leftrightarrow$ output semimodular TD's $\Leftrightarrow$ correct STG's.
C. Properties of Transition Diagrams
This section presents a finer analysis of the structural properties of TD's. The foundation laid here will be refined further into a set of properties that a TD derived from a correct STG must satisfy to allow its hazard-free implementation in a number of implementation architectures.
The following definitions relate signal transitions with states of TD's.
Definition 3.4 (Excitation Region):
An excitation region of signal $a$ in a transition diagram is a maximally connected set of states in which $a$ has the same value and is excited.
The excitation region corresponding to transition will be denoted as
Note that there can be several excitation regions for corresponding to multiple transitions of We will call an excitation region that corresponds to a "+" ("-") transition an up-excitation region (down-excitation region).
Definition 3.5 (Quiescent Region) [2]:
The quiescent region corresponding to transition is the maximally connected set of states reachable from such that 1) is stable in and 2) is not reachable from any other without going through .
Excitation region and the following quiescent region are shown by dashed lines in Fig. 3(b).
Definition 3.6 (Minimal State):
A state is said to be a minimal state for excitation region if it has no predecessors within the region. It is denoted as
Definition 3.7 (Unique Entry Condition):
An excitation region is said to satisfy the unique entry condition (UEC) if it has exactly one minimal state.
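Definitions 3.4, 3.6, and 3.7 translate directly into graph computations. The following sketch assumes a hypothetical encoding in which each state is its binary code (one character per signal, in a given order) and each move is a pair (label, next state). Excitation regions are taken as connected components of excited states that agree on the value of the signal; minimal states are those without a predecessor inside the region.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumed encoding: binary-coded states, moves as (label, next_state).
TD = Dict[str, List[Tuple[str, str]]]


def excitation_regions(td: TD, signals: List[str], sig: str) -> List[FrozenSet[str]]:
    """Maximally connected sets of states where `sig` is excited with one value."""
    i = signals.index(sig)
    exc = {s for s, moves in td.items() if any(l[1:] == sig for l, _ in moves)}
    # Undirected adjacency among excited states that agree on the value of sig.
    adj: Dict[str, Set[str]] = {s: set() for s in exc}
    for s in exc:
        for _, t in td[s]:
            if t in exc and t[i] == s[i]:
                adj[s].add(t)
                adj[t].add(s)
    regions, seen = [], set()
    for start in sorted(exc):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        regions.append(frozenset(comp))
    return regions


def minimal_states(td: TD, region: FrozenSet[str]) -> Set[str]:
    """States of the region that have no predecessor inside the region."""
    entered = {t for s in region for _, t in td.get(s, []) if t in region}
    return set(region) - entered


def satisfies_uec(td: TD, region: FrozenSet[str]) -> bool:
    return len(minimal_states(td, region)) == 1


# Signals (a, b, c): a and b rise concurrently, then c rises;
# a and b fall concurrently, then c falls.
td = {
    "000": [("+a", "100"), ("+b", "010")],
    "100": [("+b", "110")],
    "010": [("+a", "110")],
    "110": [("+c", "111")],
    "111": [("-a", "011"), ("-b", "101")],
    "011": [("-b", "001")],
    "101": [("-a", "001")],
    "001": [("-c", "000")],
}
```

Here the up-excitation region of a is {000, 010} with the single minimal state 000, so it satisfies the UEC.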
The notion of "unique entry condition" is important because it is a sufficient condition for the existence of a single cube covering an excitation region. Consider again the TD in Fig. 3(b). The minimal state of can be reached only by firing transition This transition is the only trigger to However, inside , transition is excited, which leads to the nonpersistency of transition with respect to
IV. BASIC IMPLEMENTATION STRUCTURES
In our synthesis method, we consider two implementation structures based on two different types of asynchronous latches: one using Muller C-elements and the other using RS-latches. Both structures are essentially the same except that the latter is dual-rail encoded.
A. Standard C-Implementation
A two-input Muller C-element is an asynchronous memory element with inputs $a$ and $b$ and one output $c$. The next-state equation is $c' = ab + c(a + b)$.
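The next-state equation can be exercised directly as a Boolean function. This is a minimal zero-delay sketch (the function name `c_element` is ours, not from the paper); it confirms that the output follows the inputs when they agree and holds its previous value when they differ.

```python
def c_element(a: int, b: int, c: int) -> int:
    """Next output of a two-input Muller C-element: c' = ab + c(a + b)."""
    return (a & b) | (c & (a | b))


# The output follows the inputs when they agree and keeps its value otherwise.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert c_element(a, b, c) == (a if a == b else c)
```

Note that when the output is already 1 and one input is 1, the value of the other input does not matter; this is the property used later to extend the "don't cares" of the standard C-implementation.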
The implementation structure using C-elements is shown in Fig. 4(a). This structure is called a signal network. A signal network for an output signal is constructed in the following way.
1) For each up-excitation region of the output signal, a region function is derived as a single cube implemented by an AND gate.
2) The region functions are combined by an OR gate to create the up-excitation function of the signal (the down-excitation function is obtained in a similar way).
3) The up- and down-excitation functions are connected to the inputs of a C-element (directly and via an inverter, respectively). Such an implementation is called a standard C-implementation. In addition to C-elements, it requires only AND gates, OR gates, and inverters.
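The three construction steps can be modeled functionally, ignoring gate delays, to see how the signal network computes the next value of an output. In this sketch a cube is a dictionary from signal names to the value its literal requires (1 for a positive literal, 0 for an inverted one); this encoding and the names `sop` and `standard_c_next` are illustrative assumptions. The example output c rises after both a and b have risen and falls after both have fallen, so its up-region is covered by ab and its down-region by a'b'.

```python
from typing import Dict, List

# Assumed encoding: cube = {signal: required value}, state = {signal: value}.
Cube = Dict[str, int]
State = Dict[str, int]


def cube_on(cube: Cube, state: State) -> int:
    """Output of the AND gate implementing one region function."""
    return int(all(state[sig] == val for sig, val in cube.items()))


def sop(cubes: List[Cube], state: State) -> int:
    """OR gate combining the region functions into an excitation function."""
    return int(any(cube_on(c, state) for c in cubes))


def c_element(a: int, b: int, c: int) -> int:
    return (a & b) | (c & (a | b))


def standard_c_next(up: List[Cube], down: List[Cube], state: State, out: str) -> int:
    """Next value of `out`: up-function enters directly, down-function via an inverter."""
    return c_element(sop(up, state), 1 - sop(down, state), state[out])


UP = [{"a": 1, "b": 1}]    # region function of the up-excitation region
DOWN = [{"a": 0, "b": 0}]  # region function of the down-excitation region
```

Because this model is zero-delay, it shows only the logic structure; hazard freedom is the subject of the monotonous cover conditions in Section V.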
If we consider all input inverters as independent gates, the standard C-implementation will no longer be speed independent. To justify the use of input inverters, we consider some realistic bounds on gate delays.
Theorem 4.1 [19]: Assume that a standard C-implementation $C$ of some STG is output-semimodular. Let $C'$ be the same standard C-implementation except that all input inversions of AND gates are implemented by separate inverters. Let $d_{\mathrm{inv}}^{\max}$ be the maximal possible delay of one inverter and $D_{\mathrm{sn}}^{\min}$ be the minimal possible delay of one signal network (the sum of the AND-gate delay, the OR-gate delay, and the C-element delay). Then $C'$ is hazard free under any distribution of gate delays satisfying the following condition: $d_{\mathrm{inv}}^{\max} < D_{\mathrm{sn}}^{\min}$.
The relational constraint on the value of inverter delays given by the latter statement is realistic. This allows us to use the standard C-implementation without considering precise bounds on gate delays.
B. Standard RS-Implementation
We can also use an RS-latch to implement the signal network. An RS-latch is an asynchronous memory element with inputs $S$ and $R$ and dual-rail outputs $Q$ and $\bar{Q}$, which are inverses of each other. The next-state equations are $Q' = S + \bar{R}Q$ and $\bar{Q}' = R + \bar{S}\bar{Q}$. When RS-latches are used, all the internal signals of a circuit are already available in dual-rail form, which allows us to replace every inverted occurrence of an internal signal in the region functions by the signal from the inverse output of the corresponding latch. If all input signals are also presented in dual-rail form, every region function can be implemented simply by an AND gate without input inversions.
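The RS-latch equations can be checked the same way as the C-element. This zero-delay sketch (the name `rs_latch` is illustrative) rejects the input combination in which both S and R are active, anticipating the disjointness requirement discussed for NOR-based latches in Section V, and verifies that the two outputs stay complementary.

```python
from typing import Tuple


def rs_latch(s: int, r: int, q: int, q_bar: int) -> Tuple[int, int]:
    """Next dual-rail state: Q' = S + R'Q and Q_bar' = R + S'Q_bar."""
    if s and r:
        # Simultaneously active set and reset inputs are not allowed here.
        raise ValueError("S and R must not be active at the same time")
    return s | ((1 - r) & q), r | ((1 - s) & q_bar)


# Starting from a complementary state, any legal input keeps Q and Q_bar complementary.
for q, q_bar in ((0, 1), (1, 0)):
    for s, r in ((0, 0), (1, 0), (0, 1)):
        nq, nq_bar = rs_latch(s, r, q, q_bar)
        assert nq != nq_bar
```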
In practice, it is more efficient to use NAND and NOR gates. With RS-latches this comes quite naturally: a two-level AND-OR function can be replaced by a two-level NAND-NAND function with the same inputs and output. The corresponding structure is shown in Fig. 4(b). Such an implementation is called a standard RS-implementation. The possibility of using NAND and NOR gates is an advantage of the RS-implementation over the C-implementation. However, it requires either a dual-rail interface with the environment or special input-output converters to and from the dual-rail form.
V. MONOTONOUS COVER CONDITION
We now investigate the necessary and sufficient conditions for deriving a speed-independent circuit from an STG using the basic implementation structures described in the previous section. Given the relationship between STG's and TD's, we will formulate these conditions in terms of TD's.
To develop sufficient conditions for a hazard-free implementation under the unbounded gate delay model, we first introduce the notion of a cover cube and its correspondence to the excitation regions. We now give the key definitions that capture the sufficient condition for a speed-independent implementation. The monotonous cover requirements are very similar to the conditions of [2], except that here they are first formulated with respect to a single cube, while [2] allows arbitrary logic functions to implement region functions.
We believe that the construction "region function → AND gate" leads to a simpler and more efficient implementation in the basis of simple gates (see the experimental results). Further, we generalize the monotonous cover requirements to allow the sharing of logic between different region functions and consider several exceptions that can optimize monotonous cover implementations by relaxing the requirements.
have to be covered by completely; otherwise, some other cube is required for a correct cover, and thus more than one cube can be turned on inside one excitation region.
The generalized MC conditions give a theoretical basis for optimization based on sharing gates between different region functions. The idea of gate sharing under the standard implementation is not novel. For example, [2] suggests that if two region functions correspond to adjacent cubes with the same set of variables (i.e., cubes that differ in exactly one literal), then they can be merged into one gate implementing the cube obtained by dropping that literal. However, this condition covers only a particular case of sharing, and it is applied after the region functions have been derived, i.e., for a particular cover implementation. Our generalized MC requirement considers the problem before the derivation of region functions and hence can be used for deriving a minimal cover. It gives more general conditions for a proper (hazard-free) sharing of logic. However, we must admit that the search space for choosing the corresponding set of transitions is quite large, and the selection should be guided heuristically.
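The special case of sharing attributed to [2] above, merging two adjacent cubes, is a purely Boolean step. A short sketch, with cubes encoded as dictionaries from variable names to required literal values (an illustrative convention, and `merge_adjacent` is a hypothetical name):

```python
from typing import Dict, Optional

Cube = Dict[str, int]  # variable -> required value (1 = x, 0 = x')


def merge_adjacent(c1: Cube, c2: Cube) -> Optional[Cube]:
    """Merge two cubes over the same variables that differ in exactly one literal.

    Returns the merged cube (with that literal dropped), or None if the cubes
    are not adjacent.
    """
    if c1.keys() != c2.keys():
        return None
    diff = [v for v in c1 if c1[v] != c2[v]]
    if len(diff) != 1:
        return None
    return {v: val for v, val in c1.items() if v != diff[0]}
```

For instance, a b c' and a b' c' merge into a c'. Whether the merged gate still yields a hazard-free implementation is exactly what the generalized MC conditions must establish; the merge itself says nothing about it.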
The importance of the generalized MC requirement is shown by two facts. First, we prove that the standard implementation based on MC cubes will have the behavior equivalent to the original specification (i.e., conformance of implementation to specification). Second, we show that the implementation is free from any hazards (in internal gates and outputs).
To establish the relationship between the behavior of the implementation and that of the specification (STG), we need to define a formal notion of equivalence.
The behavior of an STG and that of a circuit can be compared by the languages they realize. The languages are characterized by sets of traces, i.e., feasible sequences of signal transitions. Signal transitions of the STG and of the circuit are matched by signal name and direction: every rising (falling) transition of a signal in the STG corresponds to any rising (falling) transition of the same signal in the circuit. Both an STG and a circuit can thus be characterized by their trace sets, so one can compare in this way two different STG's, two circuits, or an STG and a circuit. When an STG and a circuit are compared, we always assume that the circuit is initialized in a state in which all internal signals are stable. For standard implementations this is a reasonable assumption: all the latches are connected to the external outputs, and hence, once the latches are initialized to a proper state, all the internal signals eventually become stable.
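Trace equivalence with respect to a set of external signals can be approximated by enumerating traces up to a bounded length and hiding the internal transitions. The sketch below is only a bounded check (it can refute equivalence but cannot prove it for arbitrarily long traces); the transition-system encoding and the names `visible_traces` and `trace_equivalent` are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Assumed encoding: state -> [(label such as "+a", next state)].
LTS = Dict[str, List[Tuple[str, str]]]


def visible_traces(lts: LTS, init: str, external: Set[str], k: int) -> Set[Tuple[str, ...]]:
    """All sequences of external-signal transitions of length <= k (prefix-closed)."""
    traces: Set[Tuple[str, ...]] = set()
    seen = set()
    queue = deque([(init, ())])
    while queue:
        state, trace = queue.popleft()
        if (state, trace) in seen:
            continue
        seen.add((state, trace))
        traces.add(trace)
        for label, nxt in lts.get(state, []):
            if label[1:] in external:
                if len(trace) < k:
                    queue.append((nxt, trace + (label,)))
            else:
                queue.append((nxt, trace))  # internal move: hidden from the observer
    return traces


def trace_equivalent(l1: LTS, i1: str, l2: LTS, i2: str, external: Set[str], k: int) -> bool:
    return visible_traces(l1, i1, external, k) == visible_traces(l2, i2, external, k)


# Specification: +a, +c, -a, -c in a cycle.
spec = {"0": [("+a", "1")], "1": [("+c", "2")], "2": [("-a", "3")], "3": [("-c", "0")]}
# Implementation with a hypothetical internal signal x inserted before each change of c.
impl = {"0": [("+a", "1")], "1": [("+x", "1x")], "1x": [("+c", "2")],
        "2": [("-a", "3")], "3": [("-x", "3x")], "3x": [("-c", "0")]}
```

With external signals {a, c}, the implementation with the extra internal signal is indistinguishable from the specification, which is the kind of equivalence the reduction methods of Section VI rely on.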
Theorem 5.3: If the excitation functions and of each output signal in the TD derived from a correct STG are represented as the sums of cubes and respectively, where corresponds to the monotonous cover of some excitation regions and each region is covered by exactly one cube, then both standard RS- and C-implementations are trace equivalent to the original STG with respect to the set of external signals.
Proof: We will refer to the outputs of AND- and OR-gates in the SOP's of the excitation functions as internal signals so as to distinguish them from the external signals that are present in the original STG.
Let us prove the statement by induction on the length of feasible sequences in the STG and in the circuit.
1) Basis: let the sequence have length 1.
1.1: Assume that this sequence is feasible in the STG. We must show that an equivalent sequence exists in the implementation.
Clearly, transition is enabled in the initial state of the STG. The initial states of the STG and the implementation coincide in their external signals. Therefore, if is an input signal, the environment should enable at the input of implementation, and is feasible in the implementation. If is an output signal, then some cover cube in the excitation function must be "ON" in the initial state of STG. Let, e.g.,
The cover cube in corresponds to an AND-gate in the SOP of Every internal signal of the implementation is stable in the initial state, which means that the output of the chosen AND-gate should be in state 1. By a similar argument, output being an internal signal, is also in state 1. By the MC requirements, no cover cube of any is turned "ON" in the initial state of the STG. Therefore, in the initial state of the implementation, all AND-gates in the SOP of the and the gate itself, have their outputs at zero. Hence, the external output is enabled in the implementation, and sequence is feasible.
Consider an arbitrary feasible sequence in the STG, where the subsequence has length By the induction assumption, there exists a sequence feasible in the implementation such that and are equivalent with respect to the set of external signals. Let state be the final state of . is equivalent to , and therefore the projection of state on the set of external signals belongs to the TD derived from the STG. In the TD state , signal is enabled because of the feasibility of Therefore, and has to be covered by some monotonous cube This cube is implemented by an AND gate and, after the firing of this gate is either enabled or is already in state 1. If it is enabled, then by choosing a proper value for its delay, we can make the sequence feasible in the standard implementation. The same is valid for the OR gate . Thus, without loss of generality, we can assume that after firing the sequence , signal is equal to one. Signal will not be enabled under such a condition only if However, in state , all cover cubes in must be turned off. This means that any AND gate in is either enabled and is going to switch to zero or is already in state 0.
Then, again by choosing proper delays for the gates, we can make a feasible sequence in the implementation, after which all AND gates of and itself will be set to zero contains only internal signals where denotes a pattern with one component in "1" and the remaining in "0," and denotes the pattern where all components are in "0." So the output of an OR-gate repeats the changes on its "hot" input Hence, similar considerations that proved the semimodularity of are also valid for OR-gates.
Case 3 (Output Gates):
The semimodularity of output signals immediately follows from the equivalence of standard implementation and original STG with respect to output signals.
Theorem 5.4 provides sufficient conditions for guaranteeing implementation correctness. There are several cases when these conditions can be softened.
• Degenerative SOP: An SOP implementation becomes degenerative, for example, if a cube consists of one literal or and/or the corresponding excitation function (e.g., ) consists of cube only. Then we can remove the AND gate and the OR gate from its implementation and connect the output directly to the corresponding input of the C-element or latch. In this case, it is sufficient to require that be a correct cover, not necessarily a monotonous one. Stated differently, when is turned "ON" and "OFF" in it does not influence the output signal , as the latter is already in state 1. At the moment of entering the cube will be reset (see Definition 5.2 for correct cover conditions).
• Extending "don't cares" for the standard C-implementation: To operate correctly, an RS-latch based on NOR gates requires that the set and reset functions be disjoint, i.e., their product must be identically zero.
The dual condition must be met for an RS-latch based on NAND gates. For the standard C-implementation, neither of these conditions is necessary, which allows one to expand the dc-set for the set and reset functions. Indeed, if the output of a C-element is in state 1 and its input also has the value "1," the value on the other input is of no importance for the output behavior. Therefore, function can be set to one before i.e., in the states of where which formally violates the condition for correct covering (see Definition 5.2). Consequently, this condition can be relaxed for the standard C-implementation. Note that due to the monotonicity requirement for the function , it has to go high in only once, except when function consists of one literal. In the latter case, we do not need to preserve the monotonicity of (see the previous item), and can consider all states of where as "don't cares" for
VI. REDUCTION TO THE MC FORM
If the initial specification does not satisfy the MC requirement, it has to be altered to allow a hazard-free implementation.
One obvious way is to change the specification semantically, e.g., by concurrency reduction [36] or by signal reshuffling [23]. Such a process will certainly converge because, in the worst case, we may end up with a fully sequential specification without any parallelism. For such specifications to satisfy the MC requirement, it is sufficient to ensure CSC by one of the existing methods. This approach, however, is constrained by two important factors: performance (typically, it slows down the circuit operation) and the I/O protocol (concurrency reduction must not change the behavior of the environment).
In this paper, we rely on a more restricted class of transformations, namely those preserving the language generated by the specification. The task is thus to reduce the specification to the MC form by adding extra signals in such a way that these signals are internal to the implementation, while to an outside observer the implementation shows the same behavior as the initial specification, except possibly for timing/performance characteristics.
The equivalence of our transformations is based on the notion of trace equivalence, introduced in Definition 5.7. Trace equivalence does not require the equivalence of input-output interfaces, i.e., the sets of input and output signals in trace-equivalent models can be different. If one has to preserve the input-output interface, then a stricter equivalence notion is needed, which requires the equivalence of input and output sets in both objects.
Let two STG's have the same set of input signals and the same set of external output signals. If they are trace equivalent with respect to the union of these two sets, they will simply be called equivalent.
The task of reduction to the MC form is now formalized as follows: add new signals to the original STG in such a way that the obtained STG will satisfy the MC conditions and will be equivalent to the original one.
Two questions arise here: 1) Is such a reduction always possible? 2) How can it be done? We will show that for a correct STG the answer to the first question is positive. This fact is proved constructively, i.e., by presenting an efficient technique that reduces any STG to its MC form, so the second question is answered at the same time.
The first attempt to find a general method of ensuring the MC requirements was made in [17] and [18]. There, the MC requirement was formulated as a set of Boolean constraints on the insertion of additional signals: if the additional signals satisfy these constraints, the resulting specification has the MC properties. Such a solution can be found using state-of-the-art Boolean satisfiability solvers. However, this approach can handle only small specifications, and the quality of the resulting logic is low.
We present here another approach based on direct transformations on the STG level, which is computationally more efficient and allows more flexibility in the reduction of logic.
In this section, we restrict ourselves to correct STG's, with the additional requirements that the STG's be safe and that the corresponding TD's satisfy the UEC and have no two states with the same binary code.⁵ Such STG's are called strongly correct STG's.
In fact, this does not limit the descriptive power of correct STG's. Similar to the methods of reduction to CSC form [8] , [15] , [22] , [37] , a TD, and the corresponding STG, can always be reduced to the form with all states encoded by different binary codes. We will assume that this transformation is performed beforehand. The unique entry condition is also not too restrictive. In Section VII-B, we will generalize this condition and show how to ensure this generalized property in an arbitrary STG. We have chosen to work with safe STG's because the procedures of reduction to the MC form are much simpler for them. Moreover, it was proved in [15] that any unsafe STG can be unfolded into an equivalent safe one. Besides, unsafe specifications are relatively rare in practice.
Let us solve the problem of STG reduction to the MC form in two steps: 1) STG reduction to the persistent form; 2) reduction of the persistent STG to the MC form.
A. Eliminating Nonpersistency
The methods for eliminating nonpersistency presented in this section generalize those of [38] . These methods apply structural transformations at the STG level, such as insertion of new events (see [29] ), which preserve trace equivalence. For comparison, model transformations presented in [25] have been based on concurrency reduction.
The following theorem gives an upper bound on the number of signals that have to be inserted to remove nonpersistency for a given pair of transitions in STG's with choice. Moreover, it shows where these signals have to be inserted.
Theorem 6.1: Assume that there is a nonpersistency between a pair of transitions and in a strongly correct STG. This nonpersistency can be eliminated by introducing two additional signals, and no new nonpersistencies arise.
Proof: Let be a trigger nonpersistent transition to i.e.,
We shall prove this theorem by developing an effective method for eliminating nonpersistency between and by inserting two extra signals and into STG. Three properties have to be checked:
• the nonpersistency between and is eliminated and no new nonpersistencies have been created;
• the modified STG is equivalent to the initial one;
• if the initial STG has CSC and consequently allows direct circuit implementation, then the modified STG also meets the CSC requirement.
⁵ The requirement that no two states share a binary code, called unique state coding, is stronger than the CSC condition and is adopted only to simplify the presentation of this paper.
As is a trigger transition to , it must be a predecessor of There are only two cases to consider.
1) is a direct predecessor of [see Fig. 5(a)].
2) The relation of direct precedence between and is mediated by place [see Fig. 6(a)]. This place is explicit in the STG and thus can have more than one predecessor transition.
Case 1: Assume that there is an arc between and Additional signals are inserted in the way shown in Fig. 5(b).
The resulting STG is obviously equivalent to the initial STG, as can be shown by projecting it on the set of the original signals. Let us prove that under the suggested transformation all states of the TD are encoded by different binary codes, i.e., the strong correctness of the STG is preserved. Consider the excitation regions that correspond to the added transitions Let us exclude from the states of e.g., the additional bits corresponding to signals and Clearly, the obtained projection of will coincide with the set of states in the original TD. The original TD has no states with the same binary code. Hence, in , no state will be encoded with the code of some other state of the TD. A similar argument holds for all other excitation regions introduced during this transformation.
We can therefore state that if satisfies the conditions of the theorem, then also obeys the requirements of strong correctness.
Case 2: Assume that , as shown in Fig. 6(a). In this case, a simple modification of the transformation for case 1 will exclude the nonpersistency between and [see Fig. 6(b)]. The equivalence between and , as well as the preservation of CSC, can be shown in the same way as for case 1.
Theorem 6.1 shows how any strongly correct STG can be reduced to the persistent form. For practical purposes, in many cases more efficient ways of eliminating nonpersistencies exist. A useful set of heuristics is presented in [19] . They include methods for eliminating a single nonpersistency by one signal, sharing of signals to eliminate several nonpersistencies at once, etc.
B. Reduction of Persistent STG to MC-Form
According to Theorem 5.1, the persistency property is necessary for the MC requirements, and it can be achieved using the results in Section VI-A. Thus, we can now consider the reduction to the MC form for persistent, strongly correct STG's.
First, we illustrate the method of reducing an STG to the MC form using a simple example of the specification shown in Fig. 7(a), with input signal and output signals In this STG, all output signals are persistent. The Karnaugh map corresponding to the logic function of signal is shown in Fig. 7(c). Excitation region consists of three states. The only signal that is ordered with is and thus cube is chosen to cover However, this cover is neither monotonous nor even correct because cube also covers two states from the off-set of the function for To make this cover correct, we need to split cube into two cubes and , but this violates the MC requirement because two cubes will now be turned on in state 0110 (underlined in Fig. 7) inside one excitation region. Thus, function does not meet the MC requirement, and is therefore hazardous when implemented by simple gates. For all other excitation functions, the MC requirement is satisfied, and the logic for their implementation is hazard free. The obtained set of excitation functions is given by (1).
Fig. 7. Insertion of additional signal x to ensure MC-conditions.
To reduce the cover for to the MC form, we need to distinguish between the case when signal is equal to one after firing ( has to be equal to one) and the case when signal is still at one (after ) but has to become zero after This can be done, for example, by adding signal x, which will be high in and will become low before [see Fig. 7(b)]. Such a transformation results in the following logic:
The area estimate for the logic corresponding to the standard C-implementation for (1) and (2) (using the SIS library from [21]) shows that the hazard-free implementation is even smaller than the original hazardous one: 464 area units for (1) and 374 for (2). The area of the hazard-free implementation can be further reduced to 344 units by extending "don't cares" for signal This allows one to implement signal simply by a C-element with an inverted output. The result strongly depends on where the transitions of signal x are inserted. In our case, the logic is reduced because, after signal x is added, signal can be implemented by a hazard-free combinational circuit without a latch. This is a matter of logic optimization strategy.
The following statement, similar to Theorem 6.1, holds for the reduction to the MC form.
Theorem 6.2: In a persistent, strongly correct STG let the MC-requirement be violated for some excitation region that corresponds to transition Then, by inserting new signals, a persistent, strongly correct STG equivalent to can be derived, where this violation is eliminated and no new violations of the MC-requirement arise.
Proof: We first consider the case when has no places in its predecessor set, i.e., is directly preceded by transitions (without loss of generality, we can assume and ). We will introduce additional signals and into in the way shown in Fig. 8.
Consider the MC properties of and the newly introduced transitions 
1) If was covered by MC cube in then we will ensure the MC cube (if then this cover cube will be Indeed, evidently covers all states of
The correctness of cover follows from the correctness of cover , since cannot be turned on outside because it would imply that had to be turned on somewhere outside Finally, cube changes only once in any trace inside because it is reset by the transition and cannot be set again inside due to the persistency of In fact, the next transition of (a down transition, for our choice of transition polarity) can occur only after in , and thus occurs after in .⁶
2) By a similar argument, it is easy to show that cube is an MC for , and cube is an MC for
3) Cube is a correct cover for because in all states where signal has to be at one, signal is at zero and cube is turned off ( cannot be concurrent to ; otherwise, in the original , is concurrent which contradicts the correctness of ). As this cube consists of only one literal, the correctness of guarantees hazard-free behavior (see the notes about the degenerative cases discussed at the end of Section V), so we need not check the monotonicity of the cover.
4) Let us show that the MC for can be represented by cube Indeed, the monotonicity of this cube follows from the persistency of (any changes of signals should happen only after ), while the correctness of is ensured by the reset of immediately after
5) For all direct successors of or , we will replace, in the corresponding cover cubes, literals and with and respectively. This clearly will not violate their MC-properties. STG is obviously equivalent to the initial STG, as can be shown by finding the projection on the set of the main signals from Finally, this transformation preserves CSC in exactly the same way as in Theorem 6.1.
6) The case when has an input place among its direct predecessors can be treated in a similar way. However, can have several transitions directly preceding it, and therefore, to preserve the consistency of the STG, the up transitions of the corresponding additional signal should be inserted before each of them (see Fig. 9).
⁶ If $+b_k^{j_k}$ has no monotonous cover in $G$, then it might be that $+y_k$ has no monotonous cover in $G'$. However, as cube $y_k$ is an MC for $ER(+b_k^{j_k})$ in $G'$, the insertion of $+y_k$ does not introduce new MC violations, which is necessary for progress.
Theorems 6.1 and 6.2 prove that any strongly correct STG can be transformed into an equivalent STG with MC properties. This implies that, for the adopted architectures (SOP combinational logic plus a latch), we have found a method for the hazard-free implementation of an STG in the basis of simple gates.
In the following section, we extend the reduction methods in two ways. First, we will show how simple modifications to the standard implementation structures can help ensure the MC requirement. Then we will generalize the MC-theory to allow the hazard-free implementation of a wider class of specifications: STG's without the UEC and models with OR-causality will be considered.
VII. EXTENSIONS OF THE MC-THEORY
A. Extending Implementation Structures
1) Inverse Feedback:
The MC conditions can be violated for a cover cube if this cube behaves nonmonotonously in the corresponding quiescent region (i.e., changes more than once in it). One way to ensure the monotonicity of the cube is to reset it immediately upon entering the quiescent region. Note that in all states of an excitation region the value of the signal is the inverse of its value in all states of the following quiescent region. For example, in the states of , signal is equal to zero, and in all states of the quiescent region , is equal to one. Signal has a constant value in and so can be added to the cover cube. This excludes all states of the quiescent region from the cover. In such a case, the requirement for the cover cube to change only once inside the quiescent region is fulfilled automatically.
The use of self-dependent covers of excitation regions allows the MC requirements to be softened and reformulated accordingly. At the implementation level, a self-dependent cover involves an additional inverse feedback wire from the latch output to its corresponding signal network.
This optimization is possible because the inverse feedback is never used to switch the output of a latch. It is only used to reset the gates in the corresponding signal network after the switching of the latch has already occurred. However, some precautions should be taken if inverse feedbacks are used together with other optimizations (e.g., it limits the possibilities of logic sharing).
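The effect of a self-dependent cover can be illustrated on a toy trace. In this sketch (cube encoding and function names are illustrative assumptions), a cube b covering an up-excitation region of a toggles twice in a quiescent-region trace where a = 1; adding the literal a' makes it constantly 0 there while it still covers the excitation-region state.

```python
from typing import Dict, List

Cube = Dict[str, int]  # signal -> required value (1: positive literal, 0: inverted)


def cube_on(cube: Cube, state: Dict[str, int]) -> int:
    return int(all(state[s] == v for s, v in cube.items()))


def self_dependent(cube: Cube, out: str, rising: bool = True) -> Cube:
    """Add the literal of the output itself, at its value inside the excitation
    region (0 before a rising transition, 1 before a falling one)."""
    if out in cube:
        raise ValueError("cube already depends on the output")
    return {**cube, out: 0 if rising else 1}


def changes(cube: Cube, trace: List[Dict[str, int]]) -> int:
    """Number of times the cube's value changes along a trace of states."""
    values = [cube_on(cube, s) for s in trace]
    return sum(1 for x, y in zip(values, values[1:]) if x != y)


# Quiescent-region trace after the rise of a: a stays 1 while b toggles.
qr_trace = [{"a": 1, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
sd = self_dependent({"b": 1}, "a")
```

In hardware the added literal corresponds to the inverse feedback wire from the latch output; as the text notes, it only resets the gate after the latch has switched.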
2) Polyterm Covers: Let us extend the basis of standard gates for the implementation of SOP structure to complex gates (AND-OR). Clearly, since the basis is more powerful, the requirements for hazard-free implementation can be relaxed. Despite such a relaxation, the main objectives of the monotonous cover of excitation functions remain valid. These are:
1) to preserve the one-hot discipline, which assumes that in the SOP structure of the signal network, only one gate can be turned on at a time;
2) to switch the output of the gate that triggers the latch monotonously, i.e., only one transition can occur at the output of this gate before the latch changes its output value.
Definition 7.1 (Polyterm Monotonous Cover):
The union of cubes is a polyterm monotonous cover for if: 1) covers all states in ;
2) the logic function corresponding to changes at most once in any trace of states inside ;
3) does not cover any reachable state outside
Definition 7.1 extends the MC conditions to a polyterm implementation. Evidently, similar to a single-term monotonous cover, the polyterm cover can be generalized to the case of several excitation regions covered by the same set of cubes (see Definition 5.5).
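The three conditions of Definition 7.1 can be checked by exploring the TD. Because the regions named in conditions 2) and 3) are lost in this copy of the text, the sketch leaves them as parameters: `region` is the set of states inside which the cover may change at most once, and `allowed` is the set of reachable states it may cover (in the example both are taken as the excitation region plus the following quiescent region, which is our assumption). Condition 2) is tested by checking that no second value change is reachable inside `region` after any first change. Cubes map bit positions of the state code to required values; all names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

TD = Dict[str, List[Tuple[str, str]]]
Cube = Dict[int, int]  # bit position in the state code -> required value


def cover_value(cubes: List[Cube], code: str) -> int:
    """Value of the polyterm cover (OR of cubes) in a binary-coded state."""
    return int(any(all(code[i] == str(v) for i, v in c.items()) for c in cubes))


def changes_at_most_once(td: TD, cubes: List[Cube], region: Set[str]) -> bool:
    succ = {s: [t for _, t in td.get(s, []) if t in region] for s in region}
    val = {s: cover_value(cubes, s) for s in region}
    for s in region:
        for t in succ[s]:
            if val[s] == val[t]:
                continue
            # A first change s -> t: no second change may be reachable from t.
            seen, queue = {t}, deque([t])
            while queue:
                u = queue.popleft()
                for w in succ[u]:
                    if val[u] != val[w]:
                        return False
                    if w not in seen:
                        seen.add(w)
                        queue.append(w)
    return True


def is_polyterm_mc(td: TD, cubes: List[Cube], er: Set[str], region: Set[str],
                   allowed: Set[str], reachable: Set[str]) -> bool:
    covers_er = all(cover_value(cubes, s) for s in er)                     # 1)
    monotone = changes_at_most_once(td, cubes, region)                    # 2)
    no_spill = not any(cover_value(cubes, s) for s in reachable - allowed)  # 3)
    return covers_er and monotone and no_spill


# Signals (a, b, c); c rises after a and b rise, falls after both fall.
td = {
    "000": [("+a", "100"), ("+b", "010")], "100": [("+b", "110")],
    "010": [("+a", "110")], "110": [("+c", "111")],
    "111": [("-a", "011"), ("-b", "101")], "011": [("-b", "001")],
    "101": [("-a", "001")], "001": [("-c", "000")],
}
er, qr = {"110"}, {"111", "011", "101"}
```

For the up-region of c, the cube ab passes all three checks, while the single literal b fails condition 3) because it also covers state 010.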
In practice, however, the complex gate implementation requires additional correctness criteria, since atomic behavior of a complex gate with input inverters cannot always be achieved. To justify the use of input inverters (see Section IV), we showed that the standard implementations are robust because a malfunction (i.e., hazardous operation) can only happen if the delay of an inverter is greater than the delay of the entire signal network. When a complex gate depends on both the direct and the inverted value of some signal, the conditions for correct operation are determined by the outcome of the race between the inverter and the wire at the inputs of the complex gate. Clearly, the conditions for hazard-free operation in this case should be stricter than when input inverters race against the whole signal network. To guarantee the atomic behavior of a complex gate with input inverters, the following sufficient requirement can be added to the three conditions of Definition 7.1:
4) Any signal transition inside is an internal transition for at least one of the cubes in i.e., both and are covered by some
This condition implies that any transition within one excitation region is internal for at least one of the cover cubes. It can be informally viewed as a generalization of the static hazard elimination condition, which is used to implement Boolean functions in the fundamental mode [30], [34], [38]: every input transition should be internal for at least one cube of the cover that implements the function.
Condition 4) would be violated, for example, by a polyterm cover corresponding to an XOR gate if both its inputs, and , change concurrently within the same excitation region of signal .
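Condition 4) can be checked mechanically once states and cubes are encoded. The sketch below is an illustration under our own assumptions, not the authors' tool: states are bit strings over a fixed signal order, a cube is a string over {'0', '1', '-'}, and the hypothetical function `satisfies_condition_4` receives the transitions inside one excitation region as (source, target) state pairs. The example reproduces the classical static-hazard situation: a single cube covering both end states of a transition satisfies the condition, whereas two minterm cubes that each cover only one end state do not.

```python
def covers(cube: str, state: str) -> bool:
    """True if every literal of `cube` ('0', '1', or '-' for don't care)
    matches the corresponding bit of the binary `state`."""
    return all(lit in ("-", bit) for lit, bit in zip(cube, state))


def is_internal(cube: str, src: str, dst: str) -> bool:
    """A transition src -> dst is internal to a cube if the cube covers
    both end states, so the cube stays on while the transition occurs."""
    return covers(cube, src) and covers(cube, dst)


def satisfies_condition_4(cover: list[str], transitions: list[tuple[str, str]]) -> bool:
    """Condition 4): every transition inside the excitation region must be
    internal to at least one cube of the polyterm cover."""
    return all(any(is_internal(c, s, t) for c in cover) for s, t in transitions)


# Hypothetical excitation region over signals (a, b) in which b rises
# while the region is active: states "10" and "11".
er_transitions = [("10", "11")]
print(satisfies_condition_4(["1-"], er_transitions))        # True
print(satisfies_condition_4(["10", "11"], er_transitions))  # False
```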
B. Extending Class of Specifications
1) Revising the Unique Entry Condition:
Up to now, we have required the excitation regions of an STG to satisfy the UEC requirement, i.e., to have only one minimal state. To illustrate that violations of UEC are not always dangerous, let us consider an example. The excitation region in Fig. 10(b) has two minimal states, and . However, both of them are triggered by the rising transitions of , and cube is a monotonous cover for . Hence, there is a simple way to generalize the unique entry condition.
Definition 7.2 (Generalized Unique Entry Condition):
An excitation region is said to satisfy the generalized unique entry condition if each of its minimal states has the same set of trigger transitions.
If an excitation region satisfies the generalized UEC, it can be covered by a single cube using the technique from Section VI.
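Definition 7.2 can be checked directly on a transition diagram. In this minimal sketch, the encoding is assumed for illustration: the TD is a list of labeled arcs, an excitation region is a set of state names, a minimal state is taken to be a region state with no incoming arc from another region state, and its trigger transitions are the labels of the arcs that enter it from outside the region. The helper names are hypothetical.

```python
Arc = tuple[str, str, str]  # (source state, transition label, target state)


def minimal_states(er: set[str], arcs: list[Arc]) -> set[str]:
    """States of the region that are not entered from another region state."""
    entered_inside = {dst for src, _, dst in arcs if src in er and dst in er}
    return er - entered_inside


def trigger_sets(er: set[str], arcs: list[Arc]) -> dict[str, frozenset[str]]:
    """Map each minimal state to the labels of arcs entering it from outside."""
    return {
        m: frozenset(lbl for src, lbl, dst in arcs if dst == m and src not in er)
        for m in minimal_states(er, arcs)
    }


def satisfies_generalized_uec(er: set[str], arcs: list[Arc]) -> bool:
    """Definition 7.2: all minimal states share the same set of triggers."""
    return len(set(trigger_sets(er, arcs).values())) <= 1


# Hypothetical region {q1, q2} whose two minimal states are reached by
# alternative branches (so the plain UEC is violated in both cases).
same = [("p", "a+", "q1"), ("r", "a+", "q2")]
diff = [("p", "a+", "q1"), ("r", "b+", "q2")]
print(satisfies_generalized_uec({"q1", "q2"}, same))  # True
print(satisfies_generalized_uec({"q1", "q2"}, diff))  # False
```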
Let us consider a TD that violates the generalized UEC, i.e., it has an excitation region in which some minimal states are entered by different sets, and , of trigger signals. It can be shown that, in correct STG's, such states are reached only through alternative branches. Therefore, an STG can be reduced to the generalized UEC form by inserting an additional signal [see Fig. 10(c)], where and are silent transitions of the STG. After this transformation, there will be two separate excitation regions for , each with a single minimal state. At the same time, the generalized UEC will be satisfied for . This technique, described here only informally, appears to be effective in most practical cases.
2) Modeling Processes with OR-Causality:
The class of STG's considered so far allows only one type of causal relation between signal transitions, which simply inherits the enabling and firing rules of the underlying Petri net transitions. It is called AND-causality, as every transition can occur only after all of its direct predecessors have occurred. To specify arbitrary processes in semimodular circuits, OR-causal relations between signal transitions may also be needed [15], [38]. In the latter case, a transition can occur as soon as at least one of its direct predecessors has occurred. It has been proven [40] that OR-causality cannot be adequately captured in the class of free-choice PN's (and hence in the class of correct STG's) at the level of net transitions without involving complex labeling mechanisms. OR-causality can, however, be expressed if we introduce into the model an additional type of arc specific to OR-causal relations. Such a model, called a change diagram (CD), was originally proposed in [39] (without choice) and further developed (to allow choice) in [40]. This extension is important because any circuit with an OR-gate (or, dually, an AND-gate) has an OR-causal relation if concurrent rising (dually, falling) transitions occur at the inputs of the gate.
The formal definition of change diagrams is based on two types of precedence relations between transitions in asynchronous circuits.
1) The strong precedence relation between transitions and , usually depicted by a solid arc in the graphical representation of change diagrams, means that cannot occur without the occurrence of .
2) The weak precedence relation between transitions and , usually depicted by a dashed arc in the graphical representation, means that may occur after an occurrence of . But may also occur after some other transition , which also weakly precedes , without the need for to occur.
Definition 7.3:
A CD is a triple where is a marked directed graph with a set of vertices ; two types of arcs, for strong precedence and for weak precedence; the initial marking on the arcs; is the set of signal transitions; and is the unique labeling function. The strong and weak precedence relations must satisfy the following: 1) they are mutually exclusive, i.e., implies and vice versa, and 2) the predecessor arcs of a transition must be either all of the strong type or all of the weak type. Hence, the set of transitions is partitioned into AND-type transitions (with strong predecessors) and OR-type transitions (with weak predecessors).
The firing rule of CD's is similar to that of PN's, with arcs playing the role of places and flow relation elements at the same time. Each arc is assigned an integer marking, which, unlike a PN marking, can be negative. Initially, each arc in has a marking of one, and each arc not in has a marking of zero.
• An AND-type transition is enabled if all its predecessor arcs have a marking greater than zero.
• An OR-type transition is enabled if at least one predecessor arc has a marking greater than zero.
When an enabled transition fires, the marking of each predecessor arc is decremented and the marking of each successor arc is incremented.
Similar to STG's, CD's generate state-transition semantics represented by TD's. Due to the absence of choice in CD's, the TD's generated by them have no conflict states. It was proven in [15] that the modeling power of correct CD's exactly coincides with that of semimodular TD's. The definitions of liveness, boundedness, safeness, and (strong) correctness, applied to STG's in Section II-B, can be trivially extended to CD's. Fig. 11(a) shows an example of a process with OR-causality specified by a CD. Unlike all other (AND-type) transitions, transition (OR-type) fires if at least one of the transitions or has fired. The TD generated by this CD is shown in Fig. 11(b).
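The firing rule above can be simulated in a few lines. The class below is a sketch of our own (the name `ChangeDiagram` and the dictionary encoding are assumptions, not an existing library): arcs are keyed by (source, target) transition pairs, and transitions listed in `or_type` use the OR enabling rule. The usage example is a hypothetical fragment in which `c+` is OR-caused by `a+` and `b+`; the dummy transition `x` only supplies the initial tokens. It shows how the arc from the late cause `b+` drops to -1 when `c+` fires and returns to 0 when `b+` finally fires.

```python
class ChangeDiagram:
    """Minimal simulator of the CD firing rule (illustrative sketch).

    marking: maps an arc (u, v) between transitions to an integer marking,
             which, unlike a Petri net marking, may become negative.
    or_type: transitions whose predecessor arcs are weak (OR-type); all other
             transitions are AND-type. An AND-type transition with no
             predecessor arcs is treated as always enabled.
    """

    def __init__(self, marking: dict[tuple[str, str], int], or_type: set[str]):
        self.marking = dict(marking)
        self.or_type = set(or_type)

    def _pred_arcs(self, t: str) -> list[tuple[str, str]]:
        return [arc for arc in self.marking if arc[1] == t]

    def _succ_arcs(self, t: str) -> list[tuple[str, str]]:
        return [arc for arc in self.marking if arc[0] == t]

    def enabled(self, t: str) -> bool:
        marks = [self.marking[arc] for arc in self._pred_arcs(t)]
        if t in self.or_type:
            return any(m > 0 for m in marks)  # at least one positive arc
        return all(m > 0 for m in marks)      # every arc positive

    def fire(self, t: str) -> None:
        if not self.enabled(t):
            raise ValueError(f"{t} is not enabled")
        for arc in self._pred_arcs(t):
            self.marking[arc] -= 1            # may go below zero for OR-type
        for arc in self._succ_arcs(t):
            self.marking[arc] += 1


cd = ChangeDiagram(
    {("x", "a+"): 1, ("x", "b+"): 1, ("a+", "c+"): 0, ("b+", "c+"): 0},
    or_type={"c+"},
)
cd.fire("a+")
print(cd.enabled("c+"))          # True: one weak predecessor has fired
cd.fire("c+")
print(cd.marking[("b+", "c+")])  # -1: b+ has not fired yet
cd.fire("b+")
print(cd.marking[("b+", "c+")])  # 0: the late b+ is absorbed
```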
3) Implementation of Processes with OR-Causality: At the TD modeling level [ Fig. 11(b) ], OR-causality always leads to the violation of UEC conditions. Contrary to the violations of UEC in STG's, however, in CD's, different minimal states of an excitation region are entered by different concurrent (not conflict) transitions. Hence, the technique of reduction to the generalized UEC presented in Fig. 10(c) will not work here.
Nevertheless, a simple gate implementation of OR-causal processes can be found. For example, it was proven [38] that any semimodular TD can be implemented in the functional basis of two-input NAND and two-input NOR gates. Together with the above-mentioned result of [15], this implies that any correct CD is also implementable in the same logic basis. The generic implementation constructions proposed in [38], however, are area inefficient and hence impractical. We are therefore interested in an extension of the basic MC-form that would allow OR-causal signal transitions to be produced by their signal networks. The major problem with using the MC-form for OR-causal operation is as follows. The number of cubes that cover the excitation region corresponding to an OR-transition is at least equal to the number of its minimal states. Moreover, these cubes can be simultaneously turned on in this excitation region, thereby breaking the correctness of the MC-based simple gate implementation. For example, in the model of Fig. 11, the subregions and , induced by their respective minimal states and , give rise to the two-cube complex cover . However, the two cubes cannot be separated into two simple AND gates without hazards, because both are turned on in state , which belongs to . To avoid this problem in a simple gate implementation, we must make sure that the cubes that cover excitation regions with OR-causality consist of single literals and hence do not need explicit AND gates in the first layer of the SOP structure. Thus, any MC implementation of a CD with OR-causality would have simple OR gates in the first layer of its signal network, as shown in Fig. 12.
Informally, an OR-monotonous cover is a cover in which an excitation region of an OR-transition is covered monotonously by a set of single literal cubes. A more formal treatment requires the following definitions.
Let be an excitation region of an OR-transition . Let have a set of minimal states . An excitation subregion is the subset of states in that are reachable inside the region from minimal state . Evidently, if for each subregion we can find its own monotonous cover cube , then the set of cubes will satisfy the monotonous polyterm cover conditions for the whole region .
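Computing excitation subregions amounts to a breadth-first search restricted to the region. The sketch below assumes that states are bit strings over (a, b, c), that the TD is given as a hypothetical successor map `succ`, and that `c+` is an OR-transition with concurrent causes `a+` and `b+`, in the spirit of Fig. 11 but not taken from it. It checks only that each subregion is covered by its own single-literal cube; it does not test the full monotonicity conditions. The overlap of the two subregions in state 110 is exactly where both cubes are on at once, which is harmless for the single-literal inputs of an OR gate but would be a problem for two separate AND gates.

```python
from collections import deque


def covers(cube: str, state: str) -> bool:
    """'1', '0', or '-' (don't care) per signal, compared bit by bit."""
    return all(lit in ("-", bit) for lit, bit in zip(cube, state))


def subregion(er: set[str], succ: dict[str, list[str]], m: str) -> set[str]:
    """States of `er` reachable from minimal state `m` without leaving `er`."""
    seen, queue = {m}, deque([m])
    while queue:
        s = queue.popleft()
        for nxt in succ.get(s, []):
            if nxt in er and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical OR-transition c+ with concurrent OR-causes a+ and b+.
succ = {"000": ["100", "010"], "100": ["110"], "010": ["110"]}
er = {"100", "010", "110"}          # c+ is excited once a or b is high
sub_a = subregion(er, succ, "100")  # induced by minimal state 100
sub_b = subregion(er, succ, "010")  # induced by minimal state 010

# One single-literal cube per subregion: a (= "1--") and b (= "-1-").
print(all(covers("1--", s) for s in sub_a))  # True
print(all(covers("-1-", s) for s in sub_b))  # True
print(sub_a & sub_b)                         # {'110'}: both cubes are on here
```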
In the CD of Fig. 11, the subregions and satisfy the polyterm cover . A way toward a simple gate implementation of CD's with OR-transitions is given by the following lemma. OR-gate . According to this corollary, to implement a CD in the basis of simple gates, it is sufficient to ensure the OR-monotonous cover conditions for all OR-transitions. This would satisfy the C-implementation shown in Fig. 12. The following theorem is a direct consequence of the theorems in Section V and the above corollary.
Theorem 7.1: If each excitation region (OR-excitation region) for an output signal of a correct CD satisfies the monotonous (OR-monotonous) cover conditions, then the generalized RS- and C-standard implementations are semimodular.
The next question is whether any correct CD can be implemented in an MC-form. The proof of the following theorem presents a basic technique for transforming a CD that violates the OR-monotonous cover condition into an MC-implementable form.
Fig. 13. (Dotted arcs denote the transitive ordering relations.)
(We now assume that a general synchronizer is found in the original CD or that the CD is transformed into a form with an explicit general synchronizer, as in Fig. 13.) There are two possible positions of a general synchronizer relative to . Case 1: If precedes the transition , where is the next transition of signal after , then the transformation shown in Fig. 14(a) will ensure the OR-monotonous conditions for . (The dotted lines in Fig. 14(a) denote the transitive ordering relations.) Case 2: If precedes , then the cube corresponding to one of the OR-causes will remain on even after signal is reset, which violates the conditions of a correct cover. In this case, the OR-transition itself must also be replaced [see Fig. 14(b)].
The transformations suggested in this proof are not aimed at optimality in achieving the OR-monotonous cover conditions. In practice, far fewer additional signals may be needed. To illustrate this, let us return to the CD from Fig. 11(a). The OR-monotonous cover conditions are violated for OR-transition because and , which imply that cube for the OR-cause will still be on after goes low. One additional signal ensures the OR-monotonous cover for (see Fig. 15). The up-excitation function for will be . Note that adding does not create any new correctness violations.
In such cases, we first need to insert additional signals reducing the specification to an OR-monotonous form (see transformation in Fig. 16 ) and then apply the generalized standard implementation with OR-gates.
The above implementability results were proven for CD's, which do not allow input choice in specifications. To combine OR-causality and choice within the same model, one could use a unified formalism, e.g., causal logic nets (CLN's), defined in [40]. This model, pictorially similar to STG's, associates with every transition a Boolean function defined on the set of predecessor places, thus determining the type of causality. An interested reader may refer to [40] for details, including a classification of OR-causality types and different types of firing rules. In principle, the transformation techniques aimed at OR-monotonicity, described here for CD's, can be applied to a subclass of correct CLN's in which places involved in OR-causal enabling functions are not used for choice.
VIII. EXPERIMENTAL RESULTS
The proposed approach has been tested on the examples presented in this paper and on the known set of benchmarks from [21] . The CAD tool "FORCAGE" [15] was used to derive the excitation functions and check the implementations with respect to their freedom from hazards.
To evaluate the efficiency of our approach, the obtained circuits were realized in the SIS library of simple gates, and their area and delay were estimated. The closest method for simple-gate implementation was proposed by Beerel et al. [2]. The circuits obtained by this method are semimodular and thus are hazard free under any distribution of gate delays. We compare our solutions with those obtained by [2]. Unlike both the technique of [2] and our method, the approach developed in [21] (implemented in the SIS tool) ensures hazard freedom by selecting appropriate ratios between gate delays. We compare our method with that of [21] using the area and delay estimates reported in [2] for bounded-delay synthesis with the SIS tool.
The delays of the implementations in [2] and [21] were evaluated as the worst-case delays of the signal networks. This measure does not fully reflect circuit performance, since neither critical cycles nor cycle times are taken into account. However, it is related to the speed of a circuit, because faster components yield a faster circuit. The delay of a signal network depends on its depth and on the delays of its gates, which in turn are determined by the gates' fan-in. To be consistent with the experimental results from [2] and [21], we used the same strategy to evaluate circuit delay.
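The delay estimate described above, a worst-case path delay in which each gate's delay depends on its fan-in, can be sketched as a longest-path computation over the acyclic combinational part of a signal network. The linear fan-in model and its constants `BASE_DELAY` and `PER_INPUT` are placeholders chosen for illustration; they are not the SIS delay conventions used for Table I, and `network_delay` is a hypothetical helper.

```python
from functools import lru_cache

# Hypothetical delay model: a gate's delay grows linearly with its fan-in.
# These constants are placeholders, not the SIS library values.
BASE_DELAY = 1.0
PER_INPUT = 0.25


def network_delay(fanins: dict[str, tuple[str, ...]], output: str) -> float:
    """Worst-case delay from any primary input to `output`.

    `fanins` maps each gate to its input nodes and must be acyclic; any node
    that is not a key is treated as a primary input arriving at time 0.
    """

    @lru_cache(maxsize=None)
    def arrival(node: str) -> float:
        if node not in fanins:
            return 0.0
        inputs = fanins[node]
        gate_delay = BASE_DELAY + PER_INPUT * len(inputs)
        return gate_delay + max(arrival(i) for i in inputs)

    return arrival(output)


# A two-level AND-OR network: or1 = and1 + and2.
net = {"and1": ("a", "b"), "and2": ("c", "d", "e"), "or1": ("and1", "and2")}
print(network_delay(net, "or1"))  # 3.25
```

Here the three-input AND gate sets the critical path, which illustrates why both depth and fan-in enter the estimate.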
Almost all specifications from the set of benchmarks described in [2] and [21] satisfy the MC properties. However, by choosing the appropriate type of architecture (RS- or C-standard), one can achieve area and/or delay reductions. A particular optimization strategy can also strongly influence the circuit's area and performance.
The classical asynchronous logic transformations [34] do not preserve speed independence. Our method allows optimization at different levels. At the level of MC transformations, one can optimize the circuit area and delay by inserting extra signals. If the specification already satisfies the MC properties, then for each signal we choose the monotonous cover with the minimum literal count among all possible monotonous covers. For the speed-independent standard C-implementation, we use gate-level local optimization similar to that described in [2], but more aggressive. This includes gate sharing and converting AND and OR gates into faster and more area-efficient NAND and NOR gates. Whenever there is a monotonous cover for a signal such that the signal network for signal can be implemented as a simple two-level combinational circuit for without a latch. We also actively use the expansion of the dc-sets for the and functions in the C-implementation, as described in Section V.
For the standard RS-implementation, we additionally allow merging the OR gate of the two-level combinational network with the NOR gates that implement the RS-latch. This reduces the circuit delay (the depth of the signal network decreases by one) and area, while still keeping the circuit in the simple gate basis.
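The merge is valid because an OR gate feeding one input of a NOR gate can be absorbed into a wider NOR gate: NOR(s1 + s2, Q) = NOR(s1, s2, Q). The exhaustive check below illustrates this for the set side of a NOR-based RS-latch, assuming for illustration that the set network has two cubes, s1 and s2; the helper names are hypothetical.

```python
from itertools import product


def nor(*xs: int) -> int:
    return int(not any(xs))


def or_(*xs: int) -> int:
    return int(any(xs))


def merge_is_equivalent() -> bool:
    """Compare the set-side gate of a NOR RS-latch before and after merging.

    In a NOR latch, Q' = NOR(S, Q). Here S = s1 OR s2 is produced by the
    OR gate of the two-level network (two set cubes, assumed for illustration).
    """
    for s1, s2, q in product((0, 1), repeat=3):
        separate = nor(or_(s1, s2), q)  # OR gate, then 2-input latch NOR
        merged = nor(s1, s2, q)         # one 3-input NOR, one level less
        if separate != merged:
            return False
    return True


print(merge_is_equivalent())  # True
```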
The results are shown in Table I. The column labeled "Trans./states" shows the number of signal transitions in the initial STG specification and the number of states in the corresponding transition diagram. The columns labeled "Area" give the total area (excluding routing) of each circuit, using a "generic" standard cell library from SIS. The columns labeled "Del" give the maximum delay inside one signal network, based on the SIS conventions for delay estimates. Although this method does not reveal the actual cycle time of the circuit, we chose it for compatibility with the delay estimates of the Stanford and SIS methods. The columns labeled "SIS" and "Stanford" are taken directly from [2]. The numbers for SIS were obtained in [2] under the fixed delay model (in which the lower and upper bounds of gate and wire delays coincide), with an optimization script that minimizes area. The column labeled "C-impl" presents the area and delay estimates for the standard C-implementation, locally optimized for area. The column labeled "RS-impl" gives the area and delay estimates for the area-optimized standard RS-implementation. The last column shows the better of the standard C- and RS-implementations with respect to area.
We summarize the experimental results in Table II. It compares the area and delay of the Stanford method and of the methods presented in this paper with those of the implementation obtained by the SIS tool. The total area and delay of all benchmark circuits implemented by SIS were normalized to one.
These results show that, in comparison with the SIS implementation, our area-optimized standard implementation reduces the area by about 10% on average and increases the performance by about 34%. The C-implementation and the RS-implementation, separately, produce approximately the same area ( 3 and 2%) and increase the speed by about 27 and 36%, respectively. SIS needs to insert extra delay padding to ensure hazard freedom, and these delays represent a significant fraction of the overall delay through the circuit. By contrast, speed-independent standard implementations are hazard free by construction. This is the reason for our speed advantage over SIS.
In comparison with the Stanford method, our optimized implementation reduces the area by about 21% on average and increases the speed by about 18%. The C-implementation and the RS-implementation separately give area reductions of 16 and 11% and speed increases of 9 and 20%, respectively. Fig. 17 shows why our C-implementation offers some benefits over [2] for the "converta" example. The initial STG is shown in Fig. 17(a). Fig. 17(b) shows the circuit from the benchmark of [2], with an area of 520 units and a delay of 6.0 units. Fig. 17(c) shows our standard C-implementation, with an area of 360 units and a delay of 3.6 units. The circuit area was reduced by using the extended dc-set for the up-function of signal , choosing cover cubes with all literals noninverted for signal , and applying better local optimization for signal . The area can be further reduced, to 320 units, by substituting an XOR gate, available in the asynchronous library from SIS, for the three gates shown in Fig. 17(c).
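For reference, the relative improvements for the "converta" example follow directly from the figures quoted above; the short computation below only restates that arithmetic.

```python
def reduction(before: float, after: float) -> float:
    """Relative reduction, as a percentage of the `before` value."""
    return 100.0 * (before - after) / before


# Figures quoted above for the "converta" example (Fig. 17).
print(f"area:  {reduction(520, 360):.1f}%")               # area:  30.8%
print(f"delay: {reduction(6.0, 3.6):.1f}%")               # delay: 40.0%
print(f"area with XOR gate: {reduction(520, 320):.1f}%")  # area with XOR gate: 38.5%
```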
IX. CONCLUSION
In this paper, we have presented a theoretical framework for the hazard-free gate-level implementation of speed-independent circuits specified by STG's or their extension, called change diagrams. We first considered the best-known class of behaviors, defined by free-choice safe STG's, which allow specifying only circuits with AND-causality between signal transitions. Toward the end, we tackled the most general class of speed-independent specifications, described by CD's with both AND- and OR-causality. We have provided sufficient conditions for hazard-free implementation by standard structures that combine two-level simple-gate combinational logic with latches (either a C-element or an RS-latch). We described these conditions using transition diagrams, a state model that can be generated from the initial event-based specification given as an STG or CD.
We have developed a set of transformation techniques that can be applied at the event-based specification level (which makes them more efficient) to obtain a specification that satisfies the monotonous cover requirement. In a step-by-step manner, we have shown methods for eliminating nonpersistence and nonmonotonous covering, for generalizing the MC conditions to allow optimization based on sharing logic gates, and for handling excitation regions with multiple entry states. An important feature of our approach is that all of the allowed transformations are semantics preserving, i.e., adding new signals does not change the original order between signals as defined by the initial STG or CD. In other words, we allow only very limited alterations of the model; e.g., we forbid concurrency reduction and signal reshuffling.
We have also presented a number of techniques that improve the efficiency, and hence the practicality, of the monotonous cover approach under certain relaxations of the original implementation structures, such as using inverted feedback and complex (AND-OR) gates with appropriate modifications to the MC condition.
We pursue an approach that is in some sense opposite to the one in [2], which achieves hazard freedom by applying some computationally hard, logic-level transformation procedures, yet without a firm guarantee of finding a correct solution.
Our method does not fail, because it is applied at the level of the behavioral model and can be guided by the chosen implementation structure. That is, it can always guarantee a hazard-free solution for any speed-independent specification if the designer is willing to compromise on a number of factors: the gate/wire delay model (inertial or distributed inertial), the model of a complemented literal (a separate inverter or an inhibitor input to the gate), the possibility of using AND-OR-NOT gates, RS-latches, and C-elements, and, last, the area/speed overheads associated with the addition of auxiliary state signals. Unlike [17] and [18], the approach described in this paper reduces specifications to the MC form by direct and constructive transformations instead of reducing the problem to Boolean satisfiability, where the means of controlling the area and performance of the resulting logic are relatively limited.
The experimental results have shown that our present approach to hazard-free implementation of speed-independent circuits appears to improve over the previous work in both qualitative and quantitative domains.
The state-of-the-art implementation of the proposed method is in the tool petrify [9]. This tool also implements new methods for logic decomposition and technology mapping of speed-independent circuits in a library of gates with restricted fan-in [10]; these methods are built upon the MC theory. As the main direction for future work, we consider two-level minimization guided by the monotonous cover constraints and an optimized transformation of STG and CD specifications to the monotonous cover form. It would also be interesting to relax the equivalence criterion to allow signal reshuffling and concurrency reduction.
