Petri nets [46, 37, 45, 48] are a powerful formalism for modeling concurrent systems. They can implicitly describe a vast state space through a succinct representation that gracefully captures the notions of causality, concurrency and conflict between events. Many authors have also chosen Petri nets as a formalism to describe the behavior of asynchronous circuits by interpreting the events as signal transitions, thus coining the term Signal Transition Graph (STG) [50, 4].
Introduction
The synchronous hypothesis, stating that computation and communication take zero time and that events can only happen at some discrete points in time, is quite rigid and does not match physical reality well. However, it has enabled digital circuit design to make incredible progress in recent years. The asynchronous hypothesis, stating that one cannot make any precise assumption on how much time computation and/or communication take, is (apparently) much more flexible and closer to reality. Yet its role in digital circuit design is still marginal. We believe that the main reason for this is the lack of a widely accepted and well-established design flow for asynchronous circuits based on a sound mathematical model, in the way that Finite State Machines and Boolean Algebra are the foundations that have made CAD-supported synchronous digital design possible.
The purpose of this paper is to review recent results in asynchronous circuit analysis and synthesis that are based on a sound mathematical model and yield efficient automated tools. Even though this is not, by itself, sufficient to bring asynchronous circuits (again) into mainstream logic design, since that would require a deep change in the way design is carried out, we hope that it will help to promote the idea that asynchronicity is no longer synonymous with "black magic".
The mathematical model that we use is an interpretation of the widely known Petri Net (PN) model, in which every PN transition models a circuit signal change. We will show, by means of examples, how real (and fictitious, but well-known) problems can be specified in this form, and how they can be solved, both in terms of analyzing the specification properties and in terms of generating a logic circuit implementation.
The class of asynchronous circuits that we consider in this paper is called speed-independent (from the pioneering work of Muller [36, 34]). In this class, computation takes an a priori unbounded amount of time, while communication is assumed to take a small (more precisely, smaller than any computation) amount of time. This may appear unrealistic for modern VLSI technology, where communication across a large chip can actually take much longer than any gate switching. However, we consider it useful because:
(1) any point-to-point (single receiver) communication can be modeled as a computation delay without loss of generality; (2) it is possible to enforce communication protocols between subcircuits that ensure no dependency on communication delays (such protocols are generically termed delay-insensitive [53, 35]); (3) it enables the use of (restricted, as will be described in Section 8) Boolean optimization to implement the circuit efficiently.
Without this hypothesis, optimization capabilities (and hence the applicability of asynchronous logic altogether) for delay-insensitive circuits are almost non-existent, and synthesis techniques amount to little more than a syntax-driven translation from a specification language and peephole optimizations.
Let us consider the use of STGs in the design flow for asynchronous circuits (see Figure 1).
Formal specification of asynchronous circuits. STGs are used as a behavior specification of a system in a top-down design flow (block S1 in Figure 1). The STG model inherits the asynchronous semantics of PNs and can be naturally used for specifying the behavior of the target asynchronous implementation. This approach is illustrated in the paper by considering several design examples. PNs can also be used in a bottom-up design flow (block S2). For this, each circuit module (a netlist of gates) can be syntactically translated into a PN, called a circuit PN [58]. This PN can be used for further verification and for circuit optimization via resynthesis.
Verification of asynchronous circuits. Verification is used at different stages of design (the corresponding blocks in Figure 1 are marked V1, V2 and V3). After a system is specified, one should answer the following question: "Can it be implemented with an asynchronous circuit?" This requires verification of implementability properties (block V1 in Figure 1).
There is an iterative loop of changing the specification until it becomes implementable. While changing the specification, the designer can use the information about violations of implementability properties that is provided by the verification process.
The implementability conditions for STGs are known from the literature [4, 28, 55], where they have been formulated in terms of PN markings. Such an approach requires reachability analysis of a PN or an STG, using explicit generation of a reachability graph (or its binary encoded version, a State Graph), which can be very complex for highly concurrent behavior. There are several well-known techniques to fight the "state explosion problem" in behavioral analysis of Petri nets. Symbolic traversal of a reachability graph based on Binary Decision Diagrams (BDDs) allows an implicit representation that is generally more compact than an explicit enumeration of states [42]. Other methods avoid generating the reachability graph [32, 13, 22, 26] by considering a finite prefix of the equivalent occurrence net. This prefix, called an unfolding, is an acyclic net in which every place has at most one input transition.
In this paper, we will mainly concentrate on these two approaches to property verification using PNs, although for the verification of certain properties they can be complemented by stubborn sets [54] and partial order techniques [16]. Other approaches to asynchronous circuit verification using PNs or similar formalisms can be found in [11, 1, 49, 61].
After the circuit is synthesized, implementation verification is required to check whether the implementation (the circuit) conforms to the specification (block V2 in Figure 1) [11, 49]. In a PN framework, this problem can be viewed as a comparison of two PNs: a PN describing the specification and a circuit PN corresponding to the implementation.
Implementation verification is used for hand-crafted designs, for automatically synthesized circuits if the synthesis method is not correct-by-construction, for debugging synthesis techniques, and for checking that the physical implementation actually corresponds to the design assumptions used in correct-by-construction synthesis methods. In the following, this task will be considered as a comparison of two PNs: one representing the specification and one derived from the circuit. For a correct design, the circuit PN must conform to the specification.
Some properties of the circuit implementation may also need to be verified. For example, any asynchronous circuit should be free of hazards, which are short spurious pulses at the outputs of gates (block V3 in Figure 1). This can be done in the same way as in top-down design (block V1) by translating the circuit specification into a corresponding PN$^{1}$.
$^{1}$ Although verification by analyzing, with partial order techniques, the State Graph derived from a circuit can be more efficient in practice.
Synthesis and optimization of asynchronous circuits. Given a Signal Transition Graph, one can produce a netlist of an asynchronous controller in the target gate library while preserving the specified input-output behavior.
There are two different routes for synthesis. The first approach is a syntax-directed translation of a PN (verified for certain implementability properties beforehand) into a circuit. The direct translation approach is easy to implement but, in most practical cases, it hardly yields an elegant and compact solution. The paper briefly discusses the direct translation methods before going into a more challenging synthesis approach (for more details on structural methods we refer to the survey [60]).
An alternative synthesis procedure is, in a way, similar to the classical logic design of finite state machines. It can be seen as a set of equivalent transformations of the original specification (e.g., by means of event insertion) to satisfy the implementability properties. It includes state assignment by solving the Complete State Coding problem [4, 5], coupled with logic minimization and speed-independent technology mapping to a target library [51, 3, 25]. Related approaches can also be found in [56, 1, 30]. Methods based on the theory of state regions [12, 40, 6] provide an efficient framework for solving the above tasks, as they allow us to treat large sets of markings uniformly. State regions are also well suited for symbolic manipulation. These methods are discussed in Section 8.
Basic Notions
Let $N = \langle P, T, F, m_0 \rangle$ be a Petri net (PN) [37], where $P$ is the set of places, $T$ is the set of transitions, $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation, and $m_0$ is the initial marking. A transition is enabled at a marking if every one of its input places holds at least one token. An enabled transition can fire, removing one token from each of its input places and adding one token to each of its output places. The new marking can again make some transitions enabled. We can therefore talk about sequences of transitions that fire under the markings reachable from the initial marking $m_0$. Such sequences of transitions will be called feasible traces or simply traces. The set of input places of transition $t$ is denoted by $\bullet t$ and the set of output places by $t \bullet$. Similarly, $\bullet p$ and $p \bullet$ stand for the sets of input and output transitions of place $p$. A place $p$ is called a choice place if it has more than one output transition. A choice place is called free-choice if each of its output transitions has only one input place. A PN is free-choice if all its choice places are free-choice.
The set of all markings reachable in $N$ from the initial marking $m_0$ is called the Reachability Set of $N$. Its graphical representation is called the Reachability Graph (RG).
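The firing rule and the construction of a reachability graph can be illustrated with a minimal Python sketch. The net below is a hypothetical example (a fork, two concurrent branches and a join), not one of the figures of this paper, and the representation of a marking as a set of marked places assumes that the net is safe.

```python
from collections import deque

# Hypothetical safe net with a fork (t0), two concurrent branches (t1, t2)
# and a join (t3). Each transition maps to (input places, output places).
NET = {
    "t0": ({"p0"}, {"p1", "p2"}),
    "t1": ({"p1"}, {"p3"}),
    "t2": ({"p2"}, {"p4"}),
    "t3": ({"p3", "p4"}, {"p0"}),
}
M0 = frozenset({"p0"})  # in a safe net a marking is just the set of marked places


def enabled(marking, net):
    """Transitions whose input places are all marked."""
    return [t for t, (pre, _) in net.items() if pre <= marking]


def fire(marking, t, net):
    """Consume the tokens of the input places, produce tokens in the output places."""
    pre, post = net[t]
    return frozenset((marking - pre) | post)


def reachability_graph(m0, net):
    """Breadth-first exploration; returns {marking: {transition: successor marking}}."""
    graph, queue = {}, deque([m0])
    while queue:
        m = queue.popleft()
        if m in graph:
            continue
        graph[m] = {t: fire(m, t, net) for t in enabled(m, net)}
        queue.extend(graph[m].values())
    return graph


RG = reachability_graph(M0, NET)
```

For this net the exploration visits five markings; the marking {p1, p2} enables t1 and t2 concurrently, which is where an explicit enumeration starts to grow with the degree of concurrency.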
An example of a PN and of its reachability graph is shown in Figure 2,a,b. There are two choice places in the PN, p1 and p5; both of them are free-choice places and consequently the PN is free-choice. Since in every reachable marking of this PN there is only one token, each marking of its reachability graph contains only one place. This PN has no concurrency.
A PN is called:
k-bounded, if for every reachable marking the number of tokens in any place is not greater than k (a place is called k-bounded if for every reachable marking the number of tokens in it is not greater than k);
bounded, if there is a finite k for which it is k-bounded;
safe, if it is 1-bounded (a 1-bounded place is called a safe place).
A transition $t_i$ is called non-persistent if $t_i$ is enabled in a reachable marking $m$ together with another transition $t_j$, and $t_i$ becomes disabled after firing $t_j$. Non-persistency of $t_i$ with respect to $t_j$ is also called a direct conflict between $t_i$ and $t_j$. A PN is persistent if it does not contain any non-persistent transition.
In the PN in Figure 2,a, transitions t1 and t2 are both enabled in the initial marking. Firing either of them disables the other, hence this PN is non-persistent.
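Boundedness and persistency can both be checked by enumerating the reachable markings. The following Python sketch does so for a hypothetical net with a single free-choice place; the net, the function names and the `limit` cut-off are illustrative assumptions, and the cut-off is only a crude guard against unbounded nets rather than a decision procedure.

```python
# Hypothetical net with a free choice at p1: either t1 or t2 consumes its token.
NET = {
    "t1": (["p1"], ["p2"]),
    "t2": (["p1"], ["p3"]),
    "t3": (["p2"], ["p1"]),
    "t4": (["p3"], ["p1"]),
}
M0 = {"p1": 1}  # marking as {place: token count}, zero counts omitted


def enabled(m, net):
    return [t for t, (pre, _) in net.items() if all(m.get(p, 0) > 0 for p in pre)]


def fire(m, t, net):
    pre, post = net[t]
    new = dict(m)
    for p in pre:
        new[p] -= 1
    for p in post:
        new[p] = new.get(p, 0) + 1
    return {p: n for p, n in new.items() if n > 0}


def reachable(m0, net, limit=10_000):
    """Depth-first enumeration of reachable markings, aborted after `limit` markings."""
    seen, stack = {}, [m0]
    while stack:
        m = stack.pop()
        key = frozenset(m.items())
        if key in seen:
            continue
        if len(seen) >= limit:
            raise RuntimeError("marking limit reached; the net may be unbounded")
        seen[key] = m
        stack.extend(fire(m, t, net) for t in enabled(m, net))
    return list(seen.values())


def bound(markings):
    """Smallest k such that the explored markings are k-bounded."""
    return max(max(m.values(), default=0) for m in markings)


def direct_conflicts(markings, net):
    """Pairs (ti, tj): ti is enabled together with tj and disabled by firing tj."""
    pairs = set()
    for m in markings:
        en = enabled(m, net)
        for tj in en:
            after = enabled(fire(m, tj, net), net)
            pairs.update((ti, tj) for ti in en if ti != tj and ti not in after)
    return pairs


MARKINGS = reachable(M0, NET)
```

Here `bound(MARKINGS)` evaluates to 1, so the example net is safe, while `direct_conflicts` reports the pairs (t1, t2) and (t2, t1), so the net is not persistent.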
To show the application of PNs to digital circuit design, we will consider the signal interpretation of PNs called Signal Transition Graphs. This model is widely used in asynchronous design for specifying control circuits [4, 28, 33].
Signal Transition Graphs (STGs) are PNs whose transitions are interpreted as changes of circuit signal levels. A signal transition can be represented by $a_j+$ for the j-th transition of signal $a$ from 0 to 1 or $a_j-$ for its j-th transition from 1 to 0, while $a_j$ is a generic name for either a rising or falling transition of $a$. $S_A$ denotes the set of all signals of an STG, which can be divided into input, output and internal signals. This definition of an STG differs from the definition in [4] in two ways: it does not limit the class of PNs to free-choice PNs, and it allows the use of dummy transitions.
An example of an STG is shown in Figure 2,c. This STG is obtained by labeling transitions of the PN from Figure 2,a with signal transitions. STGs are often drawn in their shorthand form, where transitions are denoted by their labels (instead of bars) and places with only one input and one output transition are omitted.
The interpretation of PN transitions by the changes of binary signals allows us to introduce a binary encoded Reachability Graph for STGs. This graph is called a State Graph (SG). Historically, SGs appeared independently of (and actually earlier than) PNs. They were used as a formal language in the pioneering work of D. Muller [36] to specify the behavior of the so-called speed-independent circuits. However, defining an SG as a supplementary model for an STG (as [...] Figure 2,c is shown in Figure 2,d. The * marks the enabled signals in each SG state.
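The binary encoding of a reachability graph can be sketched in Python as follows. The STG used here is a hypothetical four-phase handshake, not the STG of Figure 2; the sketch assumes a safe net, transition labels of the form signal name plus sign (no instance indices), and initial signal values that are given rather than derived.

```python
# Hypothetical four-phase handshake STG: req+ -> ack+ -> req- -> ack- -> req+ ...
# Each signal transition maps to (input places, output places).
STG = {
    "req+": ({"p0"}, {"p1"}),
    "ack+": ({"p1"}, {"p2"}),
    "req-": ({"p2"}, {"p3"}),
    "ack-": ({"p3"}, {"p0"}),
}
M0 = frozenset({"p0"})
V0 = {"req": 0, "ack": 0}  # initial signal values (assumed, not derived)


def state_graph(stg, m0, v0):
    """Return (signal order, arcs), arcs being (code, transition, next code)."""
    signals = sorted(v0)
    start = (m0, tuple(v0[s] for s in signals))
    seen, stack, arcs = {start}, [start], set()
    while stack:
        marking, code = stack.pop()
        for label, (pre, post) in stg.items():
            if not pre <= marking:
                continue  # transition not enabled in this marking
            i = signals.index(label[:-1])
            value = 1 if label.endswith("+") else 0
            new_code = code[:i] + (value,) + code[i + 1:]
            nxt = (frozenset((marking - pre) | post), new_code)
            arcs.add((code, label, new_code))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return signals, arcs


SIGNALS, ARCS = state_graph(STG, M0, V0)
```

Each SG state is the vector of signal values reached together with a marking; with the signal order (ack, req) the handshake cycles through the codes 00, 01, 11 and 10.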
Overview of design methodology
This section presents a simple example illustrating the design of asynchronous circuits using interpreted Petri nets. It is an informal introduction to the methodology that is discussed in more detail in further sections. Figure 3 shows a behavioral specification of an asynchronous bus controller. The timing diagrams for the read and write cycles describe the temporal behavior in terms of signal transitions (events).
The diagrams show different types of relations between the events, namely choice between the read and write cycle when the controller is in its initial state, causality between pairs of ordered events such as req+ → ack+, and concurrency between pairs of events, such as req- and rw- in the read cycle, that can occur in any order.
Petri nets are appropriate to capture all of these relations in a single behavioral model. An interpreted Petri net (Signal Transition Graph) for the controller specification is shown in Figure 3,c.
In the implementation of the controller we will follow the conventional scheme, separating the design of the datapath from the design of the control part. The datapath is a regular part, which is generally built by composing reusable units such as registers, ALUs, multipliers, etc. These units, and their asynchronous composition, require different design methods that we will not consider in this paper (see, e.g., [52]). On the other hand, the controller is generally much more irregular and difficult to reuse or assemble efficiently from basic components larger than standard logic gates. Hence the logic design of control parts is the main target of the methodology presented in this review.
Let us illustrate the design steps by means of the bus controller of Figure 3.
Specification (see Section 4).
At this stage we must formalize our notion of what we are going to implement. The output of the specification step is an STG describing the control part, like that in Figure 3,c.
Verification (see Sections 6 and 7).
Verification generally means gaining confidence that a formal specification does not contain any error. While absolute confidence can never be obtained, due to the impossibility of modeling all errors, our confidence in their absence is increased by checking simple properties, such as: no new request can be produced until the acknowledgement is done, the set phase (rising of request and acknowledgement) is followed by the reset phase (falling of request and acknowledgement), etc. These properties can be analysed via the construction of a binary encoded reachability graph (called a State Graph, see Section 2). It is easy to check that the State Graph of the controller in Figure 3 confirms the correctness of the original specification.
Synthesis (see Section 8).
A State Graph that satisfies the implementability conditions, as checked during the verification step, defines the truth table for every signal of the STG. Hence we could use conventional logic minimization techniques to derive logic equations for all non-input signals. However, these logic equations correspond to arbitrary Boolean functions that may not be available in a given gate library. Therefore the functions should be decomposed into simpler components that can be directly matched with library gates. The latter step is called technology mapping and is non-trivial in asynchronous design, because decomposition may introduce hazards in the circuit implementation. The only non-input signal of the controller in Figure 3 is signal ack, which has a logic equation ack = req + ack rw. Two different implementations of ack obtained by speed-independent technology mapping are shown in Figure 3,d,e.
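How a State Graph defines a truth table can be shown with a small Python sketch. It does not use the bus controller of Figure 3; instead it takes, as an assumed stand-in, the State Graph of a Muller C-element with inputs a, b and output c. In each state the implied value of c is its current value if c is stable and the complemented value if a transition of c is enabled.

```python
SIGNALS = ("a", "b", "c")

# Arcs of a hypothetical State Graph, written as (state code, signal that switches).
# Inputs a and b rise concurrently, then c rises; a and b fall, then c falls.
ARCS = [
    ("000", "a"), ("000", "b"), ("100", "b"), ("010", "a"),
    ("110", "c"),
    ("111", "a"), ("111", "b"), ("011", "b"), ("101", "a"),
    ("001", "c"),
]


def implied_values(arcs, signals, output):
    """Truth table {state code: implied next value of `output`}."""
    i = signals.index(output)
    states = {code for code, _ in arcs}
    excited = {code for code, sig in arcs if sig == output}
    table = {}
    for code in sorted(states):
        value = int(code[i])
        table[code] = 1 - value if code in excited else value
    return table


TABLE = implied_values(ARCS, SIGNALS, "c")
```

For this example the table coincides with the majority function of a, b and c, that is c = ab + c(a + b), which is the well-known equation of the C-element. Any conventional minimizer can be applied to such a table; unreachable codes, if there were any, would act as don't-cares.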
Speci cation of circuits
First, we will show the use of PNs in a top-down design methodology, in which the design flow starts from a behavior specification and ends up with a circuit that implements it. This approach will be illustrated with several examples that will be used throughout the paper.
After this we will discuss how to use PNs in a bottom-up design flow. In this flow the starting point is a set of library blocks, and extracting a behavior specification from them can be useful for verification, optimization and resynthesis.
Set-dominant latch. A set-dominant latch is a basic block of logic design. Like a conventional SR-latch, it has two inputs, S (set) and R (reset), and one output Q. The output Q goes high whenever the set input S is asserted (S = 1) and goes low when the reset input is asserted while the set input is low (R = 1, S = 0) (see Figure 4). Note that, due to the asymmetry in the latch behavior, any combination of input values is feasible in the latch operation, and this latch is more robust than the conventional SR-latch, for which the combination S = R = 1 is forbidden. It is easy to see that the behavior specified by the STG in Figure 2,c can be implemented using a set-dominant latch. From this STG the logic equation for the latch output Q can be derived.
Consumer-producer problem. Let us consider the implementation of a control for a consumer-producer system with multiple consumers and a single producer. Consumers are synchronized with the producer by a buffer of restricted capacity. In the simplest case, when the capacity of the buffer equals 1, each consuming process requires one piece of production in the buffer, and the producer does not start working again until the piece is consumed (the buffer is empty). The latter means that such a system operates only when there are requests for consuming, and is idle if no request is present. The input-output behavior of a control unit for the consumer-producer system is specified by the signals reqi (input request for consuming from the i-th consumer) and con, prod (output grants for consuming and producing, respectively). We assume that only one consumer can produce a request reqi+ at any given time, and that the choice between requesting consumers is non-deterministic. This means that we abstract away from the specification any reason that the environment may have to decide between consumers. Abstraction and non-determinism are powerful mechanisms to simplify the analysis and synthesis tasks of complex concurrent systems.
The control unit communicates with the environment in a handshake fashion. After a consumer has produced a request reqi+, it is granted by con+. Then the granted consumer removes the request (reqi-), and the control unit waits for the synchronization with the producer (prod+) to report the finishing of consuming (con-) and to restart the process from the initial state. The process for a two-consumer system can be specified by the STG shown in Figure 5,a. Note that req1+ and req2+ are rising transitions of two distinct request signals, while $con_1+$ and $con_2+$ are transitions of the same signal. This means that several distinct PN transitions can have the same physical interpretation. For the purpose of optimization, we would like to share the logic controlling different consumers as much as possible. Hence the last part of the handshake communication (event con-) is made common to all processes. This approach is clearly different from a distributed control implementation, where this event would be duplicated in each consumer process.
The implementation of the control unit for a consumer-producer system will be discussed in later sections.
Railway control. Consider an imaginary ring railway with several sections and two trains on them (see Figure 6). The trains move in only one direction and transport goods. Since their speed and their time for loading/unloading are not constrained in any way, an appropriate control is needed to guarantee the safety of the railway. For that, let us assume that while the trains are in motion there must always be at least one free section between them. Since the delays of the processes are arbitrary (according to the problem semantics), the control circuit must be asynchronous and we will design it using STGs. Let us derive the causal relations between events on the railway. Clearly, when a train appears on the i-th section (event ti+), it means that at the same time it leaves the (i-1)-th section (event t(i-1)-).
In fact, because the train has a non-zero length, we shall consider that it first appears on the i-th section and only then is the detector t(i-1) reset, i.e. the arrival of the train at the i-th section is a complex event ti+; t(i-1)-.
The train passes onto the (i+1)-th section if it has arrived at the i-th section (complex event ti+; t(i-1)- has happened) and the semaphore s(i+1) is open.
From the traffic safety requirements it follows that the semaphore on the i-th section can be opened (si+) only if the train has left the (i+1)-th section (event t(i+1) and s(i+2)-), and the semaphore must be closed as soon as the train appears on the i-th section (ti+).
These causal dependencies are shown graphically in Figure 7,a. Based on them, the STGs for the railway system with 6 and 7 sections are generated (see Figure 7,b,c). In the next sections we will discuss how to obtain correct control circuits from these STGs.
Dining philosophers. This is the well-known problem [18] of sharing resources (forks) between concurrent processes (philosophers). The system contains n forks for eating spaghetti and n philosophers sitting around a table (see Figure 8,a for a system of 3 philosophers). To eat spaghetti, a philosopher needs two forks$^{3}$: one from the left and one from the right. Clearly, when philosopher i is eating, his left and right neighbors are each missing a fork and must wait until philosopher i finishes and frees the forks. Taking the left and the right fork are independent actions for a philosopher, e.g., he can take one fork and then wait until the other is free. However, after eating, a philosopher returns the forks simultaneously. Let us refine this classical formulation and allow the philosopher to return the forks independently. This gives additional flexibility in describing a philosopher's behavior. For example, instead of the standard "wise and fair" philosopher, we can now describe a "malicious" philosopher who can intentionally delay returning one of the forks to annoy his neighbor. In Section 7 we will see that this complication has crucial consequences for the complexity of verification.
The specification of each philosopher's behavior is as follows. When philosopher i is ready to eat, he produces the request for eating ($eat_i+$) and takes the left ($lf_i+$) and the right ($rf_i+$) forks if they are available (places $F_i$ and $F_{i+1}$, respectively, contain a token). After eating ($eat_i-$), the philosopher returns the forks ($lf_i-$ and $rf_i-$) and returns to the initial state. The specification for three philosophers is presented by the STG in Figure 8,b. It is easily scalable to n philosophers.
Circuit Petri Nets
In the previous examples PNs were used as a behavioral specification of a system to be designed. Another possible application arises when one would like to extract the behavior of a system that has already been designed and is specified by a netlist of gates (or by logic equations). A set of logic equations can be syntactically translated into a Petri net [58]. This PN (though it looks a bit awkward) can then be translated into a Reachability Graph and a corresponding SG via a token flow simulation. From the SG a new circuit can be synthesized. This suggests an efficient way of performing resynthesis and optimization.
The behavior of each circuit signal a can be represented by a so-called signal cycle (see Figure 9,a). Places a0 and a1 of a signal cycle denote the markings in which signal a has value 0 and 1, respectively. Transition a+ (a-) should fire if signal a is at 0 (1) and the logic function for signal a is at the opposite value 1 (0).
Let us show how to construct circuit PNs using the logic equation of signal a, $a = F(a, b, c, \ldots)$. The cofactor of a Boolean function $F$ with respect to $a$ ($\bar{a}$) is defined as $F_a = F(a=1, b, c, \ldots)$ ($F_{\bar{a}} = F(a=0, b, c, \ldots)$).
Clearly, all states in which a = 0 and the logic function $F$ equals 1 are specified by $F_{\bar{a}} = 1$. Hence transition a+ in a signal cycle must have input places that express the condition $F_{\bar{a}} = 1$. Transition firing in PNs has AND-semantics. Therefore, in order to express the condition $F_{\bar{a}}$ [...] equals 1 and transition a+ might fire. Clearly, the change of signal a from 0 to 1 does not influence the value of $F_{\bar{a}}$, and hence places p1 and p2 are connected with a+ by self-loops (a $p \leftrightarrow t$ arc denotes two arcs: $p \to t$ and $p \leftarrow t$).
The condition for the firing of transition a- is given in the same way by a conjunctive normal form of $F_a$.
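The role of the two cofactors can be made concrete with a short Python sketch. It uses the set-dominant latch described earlier and assumes the next-state equation Q = S + Q·(not R), which is our reading of the behavioral description of that latch rather than an equation quoted from the source. The cofactor with Q fixed at 0 gives the input combinations under which Q+ may fire, and the cofactor with Q fixed at 1 gives those under which Q- may fire.

```python
from itertools import product


def F(q, s, r):
    """Assumed next-state function of a set-dominant latch: Q = S + Q * (not R)."""
    return int(s or (q and not r))


def cofactor(f, value):
    """Fix the signal's own value, leaving a function of the other inputs."""
    return lambda s, r: f(value, s, r)


F_q = cofactor(F, 1)      # F with Q = 1
F_not_q = cofactor(F, 0)  # F with Q = 0

INPUTS = list(product((0, 1), repeat=2))  # all (S, R) combinations
RISE = [(s, r) for s, r in INPUTS if F_not_q(s, r) == 1]  # Q+ may fire (Q is 0, F is 1)
FALL = [(s, r) for s, r in INPUTS if F_q(s, r) == 0]      # Q- may fire (Q is 1, F is 0)
```

Under this assumption Q+ is enabled whenever S = 1, regardless of R, and Q- is enabled only for S = 0, R = 1, which agrees with the informal description of the latch.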
Speed-independent circuits
A circuit is an interconnection of logic gates. Every gate is composed of (1) an instantaneous logic function evaluator and (2) a delay attached to the function output (see Figure 11). All the delays involved in the switching and transmission of signals within a gate and along the connecting wire prior to its fork are assumed to be lumped into the output delay. The skew of the signals after the fork due to wire delays is assumed to be less than the minimum gate delay. A delay induced by a wire may be modeled explicitly, if needed, by inserting an auxiliary component, a buffer, into the wire break.
For asynchronous behavior the exact values of the delays, and even their bounds, are often unknown, hence we will be rather pessimistic and assume that the delays of the gates in a circuit are finite but unbounded. A circuit whose behavior is correct under any values of gate delays is called speed-independent [36]. In this paper we will consider only the design of speed-independent circuits$^{5}$.
A signal $z_i$ is stable in state $s$ if its value in this state, 0 or 1, is equal to the value computed by its logic function, $z_i(s) = F_i(s)$. Otherwise, the signal is said to be enabled. For example, an AND gate with output at 1 and at least one input at 0 is enabled. In any state an enabled signal can change its value. This generates the dynamic behavior of a circuit, i.e. transitions from the initial state to other states.
The key issue for the correctness of asynchronous circuit operation is hazard-freedom. A hazard appears as a pulse (possibly a short spike) that does not correspond to any signal transition in the specification.
An example of hazardous behavior is shown in Figure 12. The circuit in Figure 12,b implements the output c as the XOR function of inputs a and b. If the inputs make a transition from a = 1, b = 0 to a = 0, b = 1, then according to the specification the output c must stay at 1. However, in the circuit it can deviate to 0, i.e. a hazard may occur.
The reason for hazards in an asynchronous circuit is that signals propagate from inputs to outputs with different delays and can compete with each other. Consider, for example, the circuit in Figure 12,b. After the output of the upper AND gate has changed from 1 to 0, the output of the OR gate becomes enabled to change from 1 to 0 as well. However, depending on the relative delays of the gates in the circuit, it may well be that the output of the bottom AND gate changes first from 0 to 1. This race at the inputs of the OR gate can lead to a short spike at its output. Speed-independent circuits, being correct under any distribution of delays, are free from hazards. Indeed, in a speed-independent circuit no enabled signal can be disabled before changing its value. A hazard as in the previous example can occur only because an output becomes disabled due to changes of the gate inputs, since in that case the gate output might or might not switch depending on the gate delay. This property of "proper disabling" is characteristic of speed-independent circuits.
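The notions of enabled signals and of disabling can be checked mechanically. The Python sketch below models an AND-OR realization of XOR along the lines of the example above; the internal signal names x and y and the assumption that the input inverters are absorbed into the AND gates are ours. Starting from the state right after the inputs have changed to a = 0, b = 1, it explores all gate firing orders and reports every case in which an enabled gate is disabled by the firing of another gate.

```python
# Gate functions of a hypothetical AND-OR XOR; input inverters are assumed
# to be part of the AND gates. x is the upper AND, y the bottom AND.
GATES = {
    "x": lambda s: s["a"] and not s["b"],
    "y": lambda s: (not s["a"]) and s["b"],
    "c": lambda s: s["x"] or s["y"],
}


def enabled(state):
    """Gate outputs whose current value differs from their function value."""
    return [g for g, f in GATES.items() if int(bool(f(state))) != state[g]]


def disabling_events(start):
    """Pairs (disabled signal, fired signal) over all interleavings of gate firings."""
    found, seen, stack = set(), set(), [start]
    while stack:
        s = stack.pop()
        key = tuple(sorted(s.items()))
        if key in seen:
            continue
        seen.add(key)
        en = enabled(s)
        for g in en:
            nxt = dict(s, **{g: 1 - s[g]})  # gate g switches
            after = enabled(nxt)
            found.update((o, g) for o in en if o != g and o not in after)
            stack.append(nxt)
    return found


# Inputs have just changed from a=1, b=0 to a=0, b=1; gate outputs are still old.
START = {"a": 0, "b": 1, "x": 1, "y": 0, "c": 1}
HAZARDS = disabling_events(START)
```

The only reported pair is (c, y): once x has fallen, the OR gate is enabled to fall, but it is disabled again if y rises first. This is precisely the violation of "proper disabling" described above, so the circuit is not speed-independent for this input change.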
Implementability
In order to analyze the implementability of an STG by a circuit, we must introduce a notion of equivalence between PNs (in turn inducing an equivalence between STGs) and a notion of equivalence between circuits and STGs. These notions of equivalence may vary depending on the required level of detail (e.g., whether internal signals in a circuit matter or not). The behavior of PNs can be compared by comparing the languages generated by the PNs, where the language of a PN is defined (as usual) as the set of its feasible traces.
Let us start by informally establishing two forms of equivalence of PNs:
Strong Equivalence. Two PNs are strongly equivalent if (1) the number of transitions is the same for both nets and (2) their languages (sets of feasible traces) coincide up to transition renaming.
Trace Equivalence. The set of transitions of each PN is first partitioned into observable (external) and non-observable (internal) transitions. Two PNs are trace equivalent if their languages coincide, up to renaming of observable transitions, once all non-observable transitions are removed from the feasible traces [47]. Contrary to strong equivalence, trace equivalence allows one to map several transitions of one PN into one transition of the other and to ignore internal transitions, checking the equivalence only on observable external transitions. The internal behavior of two PNs can be quite different (e.g., in the number of firing transitions) as long as the external behavior stays the same.
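The projection onto observable transitions can be sketched in Python. Since the language of a cyclic net is infinite, the sketch compares only projected traces up to a chosen length, which is a necessary condition for trace equivalence and not a proof of it. The two transition systems, the internal signal x and the function name are hypothetical.

```python
def observable_traces(arcs, start, observable, max_len):
    """Projected traces of length <= max_len; arcs: {state: [(label, next state)]}."""
    seen, stack, traces = set(), [(start, ())], set()
    while stack:
        state, trace = stack.pop()
        if (state, trace) in seen:
            continue
        seen.add((state, trace))
        traces.add(trace)
        for label, nxt in arcs.get(state, []):
            # internal labels are dropped from the trace but still change the state
            new = trace + (label,) if label in observable else trace
            if len(new) <= max_len:
                stack.append((nxt, new))
    return traces


OBSERVABLE = {"a+", "a-", "b+", "b-"}
# Specification: a+ b+ a- b- repeated.
SPEC = {0: [("a+", 1)], 1: [("b+", 2)], 2: [("a-", 3)], 3: [("b-", 0)]}
# Implementation with an extra internal signal x between a and b.
IMPL = {0: [("a+", 1)], 1: [("x+", 2)], 2: [("b+", 3)],
        3: [("a-", 4)], 4: [("x-", 5)], 5: [("b-", 0)]}
# A behavior that fires b+ before a+ and is therefore not equivalent.
OTHER = {0: [("b+", 1)], 1: [("a+", 0)]}

SAME = observable_traces(SPEC, 0, OBSERVABLE, 6) == observable_traces(IMPL, 0, OBSERVABLE, 6)
DIFF = observable_traces(SPEC, 0, OBSERVABLE, 6) == observable_traces(OTHER, 0, OBSERVABLE, 6)
```

`SAME` is true although the implementation fires six transitions per cycle and the specification only four, whereas `DIFF` is false.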
Similarly, one can talk about the equivalence of an STG and its circuit implementation by comparing the respective languages$^{6}$. Such a comparison is possible since both STGs and circuits define feasible traces of signal transitions (that can be, e.g., represented using state graphs). We say that an STG, D, is:
gate-implementable, if there is a logic circuit C strongly equivalent to D;
implementable, if there is a logic circuit C trace equivalent to D.
The gate-implementability condition guarantees that the circuit has the same number of signals as the STG, i.e. no additional internal signals are required for implementing the specification. Each signal in the circuit is associated with a Boolean function that can be implemented as a complex gate (hence the word "gate" in the definition). In practice, these complex gates are often too large to be physically implementable as a single cell of the gate library and should be decomposed. This design step is called technology mapping; it must be performed without introducing hazards into the circuit, and was considered in [25].
It is often the case that the circuit implementing an STG has more gates than the number of signals in the initial STG speci cation. There are two reasons for this.
The initial STG specification often defines a behavior that can be implemented as a set of logic gates only by adding state signals, as explained in Section 5.3.
The logic decomposition of complex gates into smaller ones also introduces internal signal nodes into the circuit.
The implementability condition captures the equivalence of an STG and a circuit up to observable external behavior.
We add the prefix SI- and say that an STG is SI-implementable (SI-gate-implementable) if the circuit implementing the STG is speed-independent (and hence hazard-free).
Properties required for implementability
In this section we will discuss the following properties of an STG specification that are related to its implementability: redundancy, boundedness, consistency, determinism, and signal and transition persistency.
Redundancy, Boundedness and Determinism
A place is redundant if its removal does not change the language of a PN (i.e., the set of feasible traces). Redundant places do not contribute to any trace behavior of a PN and therefore can be removed without altering the specification. For the sake of simplicity, from now on we will consider only irredundant PNs and STGs.
Any irredundant specification that can be implemented by a logic circuit must have a finite state space (footnote 7). Hence the boundedness of the underlying irredundant STG is a necessary condition for implementability.
An STG (and the corresponding SG) is called deterministic with respect to signal transition $a$ if for any state $s$ there is at most one state $s_1$ such that $s \xrightarrow{a} s_1$. The SG (and corresponding STG) is deterministic if it is deterministic for all signal transitions. Since we consider here only deterministic circuits, determinism of the specification is a necessary condition for implementability.
Consistency
Not every bounded STG can be interpreted as the specification of the switchings of a set of circuit gates. Let us assume, for example, that the following sequence is feasible in an STG: $b_1+, a+, b_2+, \ldots$. After firing $b_1+$ signal $b$ must be at logical 1, and no correct interpretation can be suggested for the following transition $b_2+$. The property of consistency of state assignment guarantees that such a problem never occurs.
An STG and the corresponding SG are consistent (footnote 8) if for any feasible trace from the initial state, rising and falling transitions alternate for each signal. All specifications considered so far in this paper have been consistent.
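The alternation requirement can be illustrated on a single trace. The following is a minimal sketch, not the paper's procedure: it assumes a trace is given as a list of strings such as "a+" or "b-" (transition indices dropped) and that the initial value of every signal is known. The function name `is_consistent_trace` is hypothetical. Checking consistency of an STG requires this to hold for all feasible traces, not just one.

```python
def is_consistent_trace(trace, initial):
    """Check that rising (+) and falling (-) transitions alternate per signal.

    trace   -- list of events such as "a+" or "b-" (assumed encoding)
    initial -- dict mapping each signal name to its initial value (0 or 1)
    """
    value = dict(initial)
    for event in trace:
        signal, sign = event[:-1], event[-1]
        target = 1 if sign == "+" else 0
        # A rising transition of a signal that is already 1 (or a falling
        # transition of a signal that is already 0) violates consistency.
        if value[signal] == target:
            return False
        value[signal] = target
    return True


# The sequence b1+, a+, b2+ from the text, with indices dropped.
bad_trace = ["b+", "a+", "b+"]
good_trace = ["a+", "b+", "a-", "b-"]
print(is_consistent_trace(bad_trace, {"a": 0, "b": 0}))
print(is_consistent_trace(good_trace, {"a": 0, "b": 0}))
```

The first call reports a violation because signal b rises twice in a row.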
If an STG is bounded, consistent and deterministic, then it can be implemented with a circuit. However, additional state signals may be required to resolve state conflicts.
Complete State Coding (CSC)
Let us illustrate, by the example of the consumer-producer system, why inserting state signals can be necessary. The binary encoded SG corresponding to the STG specifying the system with two consumers is shown in Figure 5,b. There are two states encoded with the same binary code 0001 but with a different next value for the output signal prod: in one state, signal prod is stable high, while in the other state it is enabled to fall. Hence the definition of the next state function for signal prod would require two contradictory values, 1 (due to the first state) and 0 (due to the second state), for the same binary vector 0001. Therefore, no circuit with two Boolean functions (corresponding to output signals con and prod) can implement the STG from Figure 5,a.
A pair of states in an SG with the same binary encoding but different enabling of non-input signals is said to be in a Complete State Coding (CSC) conflict. An STG and the corresponding SG satisfy the Complete State Coding property if no pair of SG states has the same encoding but different enablings of non-input signals.
If a bounded and consistent STG with n non-input signals has the CSC property, then it can be implemented by a circuit with n (arbitrarily complex) logic gates.
CSC conflicts can be resolved by encoding the corresponding states with additional signals. These signals distinguish the states in CSC conflict by assigning them different binary codes. In [5] an effective procedure for inserting additional signals to resolve CSC conflicts is presented. This procedure preserves the trace equivalence and speed-independence of the original specification and always converges for safe STGs.
Persistency
All the properties considered up to now do not restrict hazards in the implementation. To capture the speed-independence requirement at the STG level, let us consider the signal persistency property (footnote 9). Signal persistency (similarly to speed-independence, as defined above for circuits) means that if an STG signal is enabled, it fires independently of the firing of other signals (footnote 10). However, one should distinguish between input and non-input signals. For inputs, which are controlled by the environment, it is possible to have a non-deterministic choice, which is represented in the STG model by conflicts, i.e., disabling of one input signal by another input signal. Input conflicts per se do not imply hazardous behavior. For non-input signals, which are produced by circuit gates, signal transition disabling may lead to hazards at the output of the gate, as discussed above, thus making the circuit behavior dependent on the gate delays. Disabling input transitions by non-input transitions, on the other hand, is also considered dangerous, since it can lead to hazards in a logic implementation of the environment (footnote 11).
Definition 5.1 An SG (and the corresponding STG) is signal persistent if:
1. no non-input signal can be disabled by another signal, and
2. no input signal can be disabled by a non-input signal.
A bounded, consistent, deterministic and signal persistent STG can be implemented with a speed-independent circuit.
Signal persistency and transition persistency (see Section 2) are closely related. Clearly, the only source of non-persistency of a signal $a$ is the non-persistency of some transition labeled with $a_i$. Yet, not every non-persistency of $a_i$ leads to a violation of persistency by signal $a$. In Figure 13,a the transitions labeled with $a_1+$ and $b_2+$ are both non-persistent. However, signals $a$ and $b$ are persistent in the corresponding SG in Figure 13.
For fake-free STGs signal persistency and transition persistency coincide, and hence signal persistency can be checked as transition persistency at the STG level without generating a state graph. It can also be shown [24] that fake conflicts for non-input signals can always be replaced by concurrency in STG specifications, and hence excluding fake conflicts does not restrict the expressiveness of STG specifications.
In summary, a bounded, consistent, and deterministic STG is always implementable by a logic circuit; it is gate-implementable if it satisfies the CSC requirement; and it is SI-implementable if it is signal persistent.
Footnote 11: A logic implementation of the environment would also be hazardous in the presence of input-to-input disabling. However, this is considered acceptable here because this non-determinism is often the result of an abstraction in the environment behavior specification, for the purpose of simplifying the STG model. An environment in general must have a persistent specification in order to be implementable as a hazard-free logic circuit.
Footnote 12: They can be distinguished, if necessary, if a true concurrency semantics is used instead of interleaving.
6 State-based verification
The properties of STG implementability are formulated in Section 5.3 in terms of SG states. However, the explicit manipulation of an SG is often problematic because of the exponential growth of its size with respect to the number of signals. Recent developments in symbolic techniques for reachable state space traversal based on Binary Decision Diagrams (BDDs) [2, 42] can be applied to avoid the explicit manipulation of the state set and to mitigate the state explosion problem.
Modeling bounded Petri nets and STGs using Boolean functions
Let $N = \langle P, T, F, m_0 \rangle$ be a safe Petri net and let $M_P$ be the set of all markings of $N$ ($n = |P|$, $|M_P| = 2^n$). A marking can be represented by a Boolean vector $m = (p_1, \ldots, p_n)$, where $p_i = 1$ ($p_i = 0$) denotes that $p_i$ is marked (not marked) (footnote 13). Each set of markings $M \in 2^{M_P}$ can be represented by a characteristic logic function $M : B^n \to B$ that equals 1 for those vectors that correspond to markings in $M$. For example, given the Petri net depicted in Figure 5 ...
Function $E(t)$ ($ASM(t)$) states that all input (output) places of transition $t$ contain a token, and function $NPM(t)$ ($NSM(t)$) states that no input (output) place of $t$ contains a token.
Let us recall a few useful definitions from the theory of Boolean functions. Let $f(x_1, x_2, \ldots, x_i, \ldots, x_n)$ be a Boolean function of $n$ variables. The cofactor of $f$ with respect to literal $x_i$ is $f_{x_i} = f(x_1, x_2, \ldots, 1, \ldots, x_n)$, while the cofactor with respect to literal $x_i'$ is $f_{x_i'} = f(x_1, x_2, \ldots, 0, \ldots, x_n)$. The notion of cofactor can be generalized to a set of variables, e.g., $f_{x_1, x_2} = (f_{x_1})_{x_2}$. The existential abstraction of $f$ with respect to $x_i$ is $\exists_{x_i}(f) = f_{x_i} + f_{x_i'}$.
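The two operations just recalled can be sketched directly on Boolean functions represented as ordinary Python callables. This is only an illustration of the definitions under that assumed representation (a BDD package would operate on graphs instead). The helper names `cofactor` and `exists` and the sample function `f` are hypothetical.

```python
from itertools import product


def cofactor(f, i, value):
    """Return f with its i-th argument (0-based) fixed to `value` (0 or 1).

    The result still accepts all n arguments; the i-th one is ignored.
    """
    def g(*xs):
        ys = list(xs)
        ys[i] = value
        return bool(f(*ys))
    return g


def exists(f, i):
    """Existential abstraction: OR of the positive and negative cofactors."""
    pos, neg = cofactor(f, i, 1), cofactor(f, i, 0)
    return lambda *xs: pos(*xs) or neg(*xs)


def f(x1, x2, x3):
    # Sample function: x1*x2 + x1'*x3
    return (x1 and x2) or ((not x1) and x3)


# Abstracting x1 away from f leaves x2 + x3.
h = exists(f, 0)
for bits in product((0, 1), repeat=3):
    assert h(*bits) == bool(bits[1] or bits[2])
```

For the sample function, the positive cofactor with respect to x1 is x2 and the negative cofactor is x3, so the abstraction is their disjunction.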
The symbolic representation is as follows:
The existential quantification of $t$ from the above formula gives the set of markings $N(M)$ reachable by firing any one enabled transition from a marking in $M$. Here $F_g$ denotes the cofactor of Boolean function $F$ with respect to the set (Boolean product) of literals $g$.
Example. Assume that in the example of Figure 5 ...
However, detecting whether some unsafe marking is reachable can be done by identifying a marking $m$ in which a transition $t$ is enabled and some successor place $p$ of $t$ that is not a predecessor of $t$ is already marked. In that situation, after firing transition $t$, place $p$ will have two tokens. This idea is implemented in the algorithm from Figure 15.
Figure 16: Algorithms to verify persistency
... the bound for each place must be guessed in advance in order to compute the transition function $N(M, t)$. In case the guessed bound of some place is exceeded, the traversal must be restarted from the beginning, with a new definition of the transition function. Hence we can state that non-safeness is a problematic property for state-of-the-art BDD-based techniques.
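The unsafe-marking test described above can be sketched with an explicit traversal. This is a simplified illustration, not the symbolic algorithm of Figure 15: markings are assumed to be frozensets of marked places (valid as long as the net stays safe), and each transition is assumed to be given by its sets of input and output places. The function name `find_unsafe_firing` is hypothetical.

```python
def find_unsafe_firing(transitions, m0):
    """Search the reachable markings for a firing that would put a second
    token into a place.

    transitions -- dict: name -> (frozenset of input places,
                                  frozenset of output places)
    m0          -- initial marking as a frozenset of marked places
    Returns (marking, transition) for the first violation found, else None.
    """
    seen = {m0}
    frontier = [m0]
    while frontier:
        m = frontier.pop()
        for t, (pre, post) in transitions.items():
            if not pre <= m:          # t is not enabled in m
                continue
            # A successor place that is not a predecessor of t and is
            # already marked would receive a second token.
            if (post - pre) & m:
                return m, t
            nxt = (m - pre) | post    # firing rule for a safe marking
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None


# Hypothetical nets used only for illustration.
safe_net = {
    "t1": (frozenset({"p1"}), frozenset({"p2"})),
    "t2": (frozenset({"p2"}), frozenset({"p1"})),
}
unsafe_net = {
    "t1": (frozenset({"p1"}), frozenset({"p1", "p2"})),
}
print(find_unsafe_firing(safe_net, frozenset({"p1"})))
print(find_unsafe_firing(unsafe_net, frozenset({"p1"})))
```

In the second net, t1 keeps p1 marked and adds a token to p2 on every firing, so the second firing is reported.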
Consistency. Verifying that the STG is consistent can be done during the traversal by checking the consistency of the newly generated states. We first define the following characteristic function:
The characteristic function of the states with inconsistent assignment is derived according to the definition of consistency (cf. Section 5 ...).
An algorithm to check transition persistency is shown in Figure 16. Note that only transitions with some common predecessor place can be in a direct conflict. This is used in the algorithm to reduce the search. Let $R(N)$ be the set of reachable markings of $N$. The set of markings with $t_i$ enabled is calculated. Next, the set of markings reachable in one step by firing some transition $t_j \neq t_i$ is obtained. If $t_i$ is not enabled in at least one of those markings, then $t_i$ is not persistent.
Complete State Coding. The CSC requirement can be checked for each non-input signal by defining the following characteristic functions:
$$ER(a+) = \exists_P (R(D) \cdot E(a+))$$
$$ER(a-) = \exists_P (R(D) \cdot E(a-))$$
$$QR(a+) = \exists_P (R(D) \cdot a \cdot \overline{E(a-)})$$
$$QR(a-) = \exists_P (R(D) \cdot \overline{a} \cdot \overline{E(a+)})$$
where $\exists_P F$ denotes the existential abstraction of $F$ with respect to the set $P$ of all variables representing places of the PN. This operation removes all variables in $P$ from the corresponding Boolean functions.
$ER(a*)$ is the set of binary codes that correspond to states in which some $a_i$ is enabled (a set of excitation regions). It is obtained by abstracting the places ($\exists_P$) from the states of the excitation region. $QR(a+)$ (a set of quiescent regions) is the set of binary codes that correspond to states in which $a = 1$ but $a-$ is not enabled (similarly for $QR(a-)$).
A CSC conflict occurs in an SG when, in two states $s_1$ and $s_2$ with the same binary code, some non-input signal $a$ is enabled to rise in $s_1$ and is stable at 0 in $s_2$, or is enabled to fall in $s_1$ and is stable at 1 in $s_2$. This means that state $s_1$ belongs to the excitation region of $a$ while $s_2$ belongs to a quiescent region of this signal. As $s_1$ and $s_2$ have the same binary code, the characteristic functions of the excitation and quiescent regions for $a$ must intersect. Based on this, the CSC requirement for non-input signal $a$ can be checked as follows [41]:
$$CSC(a) = (ER(a+) \cap QR(a-) = \emptyset) \wedge (ER(a-) \cap QR(a+) = \emptyset)$$
$$CSC(D) = \bigwedge_{a \text{ is non-input}} CSC(a)$$
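The intersection test can be sketched on an explicitly enumerated state graph. This is an illustration under assumed data structures, not the BDD-based check: each state is assumed to be a pair of its binary code (a tuple ordered like `signals`) and the set of events enabled in it. Keeping only the codes plays the role of abstracting away the place variables. The function name `csc_conflicts` and the toy data are hypothetical.

```python
def csc_conflicts(states, signals, non_inputs):
    """Return {signal: set of conflicting codes} for all non-input signals.

    states     -- list of (code, enabled) pairs; code is a tuple of 0/1
                  values ordered like `signals`, enabled is a set of
                  events such as "a+" or "a-"
    signals    -- tuple of signal names fixing the bit order of the codes
    non_inputs -- signals for which the CSC requirement is checked
    """
    conflicts = {}
    for a in non_inputs:
        i = signals.index(a)
        er_plus = {c for c, en in states if a + "+" in en}
        er_minus = {c for c, en in states if a + "-" in en}
        qr_plus = {c for c, en in states if c[i] == 1 and a + "-" not in en}
        qr_minus = {c for c, en in states if c[i] == 0 and a + "+" not in en}
        # CSC(a) holds when both intersections are empty.
        bad = (er_plus & qr_minus) | (er_minus & qr_plus)
        if bad:
            conflicts[a] = bad
    return conflicts


# Toy state graph: two distinct states share the code (0, 1); signal a is
# enabled to fall in one of them and stable at 1 in the other.
signals = ("x", "a")
states = [
    ((0, 1), {"a-"}),
    ((0, 1), {"x+"}),
    ((1, 1), set()),
]
print(csc_conflicts(states, signals, ["a"]))
```

An empty dictionary means that CSC holds for every signal that was checked.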
Example. In the example of the consumer-producer system of Figure 5, the set of states where signal prod is stable and equal to 1 (quiescent region of prod) is {0001, 1001, 1011, 0101, 0111, 0011} (see Figure 5,b). The set of states where prod is enabled and equal to 1 (excitation region of prod-) is the single state 0001. The intersection of the excitation region and of the quiescent region is non-empty, thus indicating the presence of a CSC conflict (see Figure 5,b) due to the states with the code 0001.
Determinism and fake conflicts. Determinism and fake conflicts can also be easily checked using manipulations with characteristic Boolean functions.
This section showed that all the properties related to STG SI-implementability can be checked by symbolic traversal of the reachability space. This is not surprising, because the set of reachable states naturally contains all the information needed for the implementation of an STG. Some effort is still required because in the symbolic traversal the reachability space is manipulated implicitly, and hence the check of some properties can have a complex formulation. However, even with an implicit representation of the reachability set, the size of the corresponding BDD might still be exponential with respect to the size of the original STG. For such examples the event-based approach developed in the next section can be more efficient.
Symbolic representation with Binary Decision Diagrams
In this section we briefly explain how sets of states can be represented by means of Boolean functions and efficiently manipulated by using Binary Decision Diagrams (BDDs) [29]. A BDD is a directed acyclic graph with one root and two leaf nodes (0 and 1). Each non-leaf node is labeled with a Boolean variable and has two outgoing arcs with labels 0 and 1. A BDD represents a Boolean function interpreted as follows: each variable assignment has a corresponding path that goes from the root node to one of the leaf nodes. The label of the leaf node is the value of the function for that assignment. As an example, the BDD depicted in Figure 17 ... for further details on how to manipulate Boolean functions efficiently by means of BDDs.
Figure 17 illustrates how the reachability set of markings of a PN can be represented with BDDs. The example relies on the fact that an efficient encoding can be found to represent the reachable markings.
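The path interpretation of a BDD can be made concrete with a small hand-built diagram. The sketch below is an illustration only: the `Node` class, the `evaluate` function and the sample function a*b + c are hypothetical and are not taken from Figure 17 or from any BDD package.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass(frozen=True)
class Node:
    """Non-leaf BDD node; leaves are the Python constants False and True."""
    var: str
    low: "Union[Node, bool]"    # successor along the arc labeled 0
    high: "Union[Node, bool]"   # successor along the arc labeled 1


def evaluate(node, assignment):
    """Follow the path selected by `assignment` from the root to a leaf."""
    while isinstance(node, Node):
        node = node.high if assignment[node.var] else node.low
    return node


# BDD for f = a*b + c with variable order a, b, c. The node testing c is
# shared by two paths, which is what keeps the graph small.
c_node = Node("c", False, True)
b_node = Node("b", c_node, True)
root = Node("a", c_node, b_node)

for a, b, c in product((0, 1), repeat=3):
    assert evaluate(root, {"a": a, "b": b, "c": c}) == bool((a and b) or c)
```

Evaluation visits at most one node per variable, regardless of how many assignments the function has.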
The strategy for encoding is based on the observation that the sets of places $SM_1 = \{p_0, p_1, p_2, p_3\}$, $SM_2 = \{p_4, p_5, p_6\}$ and $SM_3 = \{p_6, p_7, p_8\}$ are state machines of the PN [17]. This information can be structurally obtained by using algebraic methods [9]. State machines correspond to place-invariants of the PN and preserve their token count in all reachable markings. Given the initial marking of the net, at most one of the places of each state machine will be marked at each marking. Thus, the following encoding can be proposed: two Boolean variables ($v_0$ and $v_1$) can be used to encode the token in $SM_1$ ...
7 Event-based verification. Unfolding approach
Methods based on partial orders are well-known techniques to avoid the "state explosion problem" in the behavioral analysis of Petri Nets (PN) [32, 13, 22]. Instead of a reachability graph, a finite prefix (called unfolding) of an equivalent occurrence net (an acyclic net where all places have at most one input transition) is generated. Although we consider the application of the unfolding approach to the same verification tasks as the BDD-based methods of Section 6, these two techniques are unfortunately incomparable in terms of efficiency (i.e., it is very difficult to tell a priori which one is better, except for some very special properties and subclasses of PNs). Hence, they should be viewed as complementary rather than completely independent.
Let us call a PN acyclic if there are no cycles in the graph of the PN. In an acyclic PN some places have no input transitions. We assume that all these places are initially marked with one token and no other places are initially marked. Two examples of a cyclic and an acyclic PN are shown in Figure 18,a,b.
Petri Net unfolding
Definition 7.1 (Ordering relations) Let $N = \langle P, T, F \rangle$ be an acyclic PN and $x_1, x_2 \in P \cup T$. $x_1$ precedes $x_2$ (denoted by $x_1 \Rightarrow x_2$) if $(x_1, x_2)$ belongs to the reflexive transitive closure of $F$, i.e., there is a path in the graph of the PN from $x_1$ to $x_2$.
$x_1$ and $x_2$ are in conflict (denoted by $x_1 \# x_2$) if there exist distinct transitions $t_1, t_2 \in T$ such that ${}^\bullet t_1 \cap {}^\bullet t_2 \neq \emptyset$ (where ${}^\bullet t$ denotes the set of input places of $t$), $t_1 \Rightarrow x_1$, and $t_2 \Rightarrow x_2$. $x_1$ and $x_2$ are concurrent (denoted by $x_1 \| x_2$) if they are neither in precedence nor in conflict.
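The three relations can be computed directly from the structure of a small acyclic net. The following sketch assumes that the net is given as a dictionary from transition names to pairs of input and output place sets, and that conflict is rooted in two distinct transitions sharing an input place. The function name `build_relations` and the sample net are hypothetical and do not reproduce Figure 18.

```python
from itertools import permutations


def build_relations(transitions):
    """Return (precedes, in_conflict, concurrent) for an acyclic net.

    transitions -- dict: name -> (set of input places, set of output places)
    """
    # Successor map over places and transitions (the flow relation F).
    succ = {}
    for t, (pre, post) in transitions.items():
        succ.setdefault(t, set()).update(post)
        for p in pre:
            succ.setdefault(p, set()).add(t)
        for p in post:
            succ.setdefault(p, set())
    # reach[x] holds every y with x => y (reflexive transitive closure).
    reach = {}
    for x in succ:
        stack, seen = [x], {x}
        while stack:
            for y in succ[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        reach[x] = seen

    def precedes(x1, x2):
        return x2 in reach[x1]

    def in_conflict(x1, x2):
        return any(
            transitions[t1][0] & transitions[t2][0]
            and x1 in reach[t1] and x2 in reach[t2]
            for t1, t2 in permutations(transitions, 2)
        )

    def concurrent(x1, x2):
        return not (precedes(x1, x2) or precedes(x2, x1) or in_conflict(x1, x2))

    return precedes, in_conflict, concurrent


# Sample net: t1 forks into p1 and p2; t2 and t4 compete for the token in p1.
net = {
    "t1": ({"p0"}, {"p1", "p2"}),
    "t2": ({"p1"}, {"p3"}),
    "t3": ({"p2"}, {"p4"}),
    "t4": ({"p1"}, {"p5"}),
}
precedes, in_conflict, concurrent = build_relations(net)
print(precedes("t1", "t3"), in_conflict("p3", "p5"), concurrent("t2", "t3"))
```

Conflict is inherited by successors, which is why the places p3 and p5 are in conflict although no transition consumes from both.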
Let us consider these relations for the example of an acyclic PN in Figure 18 ...
... first a cyclic PN is unfolded into an equivalent acyclic net, and then a finite prefix of the acyclic net is used. It is particularly convenient to unfold a PN into a special kind of acyclic PN, called occurrence net in [39, 13], in which every place has at most one input transition. Figure 19,a shows an occurrence net for the PN from Figure 18,a. Each transition $ti$ (place $pj$) of the initial PN has a set of corresponding transitions (places) in the occurrence net, $ti', ti'', ti''', \ldots$ ($pj', pj'', pj''', \ldots$), that are called instantiations of $ti$ ($pj$). It can be shown that under the partition $r$ that associates each transition $t$ of the PN with its instantiations $t', t'', \ldots$, the original PN is trace equivalent to its occurrence net. In what follows, we refer to any object in the occurrence net equivalent to an object in the original cyclic PN by adding one or more apostrophes (or a superscript) to its name. For example, $t'$ and $t$ are the corresponding transitions in an occurrence net and a PN, $m'$ is a marking in the occurrence net and $m$ is the corresponding marking in the PN, etc.
Although the occurrence net for a cyclic PN can be infinite, it is always possible for a bounded PN to truncate the occurrence net up to a finite "complete" subgraph (an unfolding) that carries the same amount of information. We need to introduce several notions in order to define this truncation.
Definition 7.2 (Configurations) [32]. A set of transitions $C' \subseteq T'$ is a configuration in an occurrence net if:
1. for each $t' \in C'$ the configuration $C'$ contains $t'$ together with all its predecessors;
2. $C'$ contains no mutually conflicting transitions.
The minimal configuration that contains $t'$ and all the transitions preceding $t'$ is called a local configuration of the transition $t'$ (denoted $\{\Rightarrow t'\}$). Each configuration $C'$ corresponds to a marking (called the final marking of $C'$) that is reachable from $m_0$ after all the transitions in $C'$ have been fired. The final marking of a local configuration of $t'$ is called the basic marking of $t'$ and denoted $BM(t')'$.
In the occurrence net of the PN from Fig. 19,a, the local configuration $\{\Rightarrow c'\}$ for transition $c'$ is equal to $\{a', b', c'\}$. The basic marking for $c'$ is $BM(c')' = \{p5', p2''\}$. Marking $\{p3', p4'\}$ does not correspond to any local configuration; however, it corresponds to the configuration $\{a', b'\}$.
Cut-offs
Occurrence nets are truncated at the cut-off transitions. Criteria for choosing these transitions for different classes of PNs were suggested in [32, 26, 20]. The necessary condition for a transition $t_i'$ to be a cut-off is that the basic marking of $t_i'$ repeats the basic marking of some other transition $t_j'$ that has been generated earlier in the occurrence net ($t_j'$ is called an image of $t_i'$).
Intuitively, this means that firing $t_i$ generates a marking of the original PN that has already been reached after firing $t_j$. As shown in [32, 26, 20], some more precautions have to be taken in determining cut-offs.
Definition 7. ... where ... is an adequate partial order.
The simplest partial order is defined as the integer ordering on the size (defined as the number of transitions) of configurations [32]. More elaborate (and more efficient) cut-off criteria have been defined in [26, 20]. An unfolding is obtained from the occurrence net by removing all the places and transitions which follow cut-offs.
The unfolding corresponding to the PN of Figure 18,a is shown in Figure 19 ...
The last two cut-off criteria are the most general and efficient in terms of the size of the generated unfolding. The criterion of Esp'95 [20] is optimal, in the sense that it allows one to define a total order between transitions with the same basic marking, i.e., each time two transitions with the same basic marking are generated in an unfolding, one of them will be defined as a cut-off.
Checking implementability properties by unfoldings
Boundedness. Figure 21 ,a shows an example of an unbounded PN. The place p3 is unbounded because the trace t1; t2; t1; t2; : : : can generate an unbounded number of tokens in p3.
Figure 21: Unbounded PN (a) and its occurrence net (b).
Let $N'$ be an unfolding of PN $N$. $N$ is unbounded if and only if there is a transition $t \in T$ that has two instantiations in $N'$, $t'$ and $t''$, such that $t' \Rightarrow t''$ and $BM(t') < BM(t'')$.
Hence, checking boundedness of a PN can be reduced to the analysis of the precedence relation between transitions in an unfolding.
Let us return to the example of Figure 21. As soon as the transition $t1''$ is generated in the unfolding, one can conclude that the original PN is unbounded. Indeed, both $t1'$ and $t1''$ are instantiations of the same transition $t1$, and $t1' \Rightarrow t1''$. Moreover, $BM(t1'') = p2\,p3\,p3\,p4 > BM(t1') = p2\,p3\,p4$, so the marking of place p3 can grow indefinitely and this place is unbounded.
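The comparison of basic markings used in this example is a strict multiset inclusion. A minimal sketch follows, assuming basic markings are mapped back to places of the original PN and stored as `collections.Counter` objects. The function name `strictly_covers` is hypothetical, and the precedence between the two instantiations is assumed to have been established separately.

```python
from collections import Counter


def strictly_covers(big, small):
    """True if multiset `big` contains `small` and has at least one extra token."""
    return big != small and all(big[p] >= n for p, n in small.items())


# Basic markings of t1' and t1'' from the example of Figure 21.
bm_first = Counter(["p2", "p3", "p4"])
bm_second = Counter(["p2", "p3", "p3", "p4"])

# t1' precedes t1'' and BM(t1') < BM(t1''), so the PN is unbounded.
print(strictly_covers(bm_second, bm_first))
```

The extra token sits in p3, which identifies the unbounded place.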
Persistency. An unfolding gives an explicit representation of conflicts. However, not every pair of conflicting transitions is non-persistent (in direct conflict). For example, the conflicting transitions $t3'$, $t4'$ of the PN in Figure 22 do not share any input place and thus cannot disable each other. The conflicting transitions $t1'$, $t4'$ are also not in a direct conflict (although they share the input place $p2'$), because they are never enabled simultaneously.
The structural properties of direct conflicts can be stated as follows [26]: a direct conflict occurs when two transitions share the same input place and none of their predecessors are in conflict (otherwise a direct conflict has already occurred for the predecessors).
Hence the transition persistency check can be reduced to the analysis of conflict relations between transitions in an unfolding. For example, in the PN in Figure 22, transitions $t1'$ and $t4'$ are not in a direct conflict, because $t2'$ is the direct predecessor of $t4'$ and $t2'$ is in conflict with $t1'$. However, $\{t1', t2'\}$ are in a direct conflict. An STG is non-deterministic whenever two transitions with the same label are in a direct conflict. This is clearly a particular case of non-persistency. Similarly, fake conflicts in an STG can be checked based on the direct conflict relation.
Consistency. The ... In the STG shown in Figure 23,a transitions $a_1+$ and $a_2+$ are concurrent. The following trace is feasible: $a_1+, a_2+, a-, \ldots$. After firing $a_1+$ signal $a$ is at logical 1, and therefore the next transition of signal $a$ should be negative. However, the trace contains a positive transition $a_2+$. This shows that auto-concurrency indeed captures one of the features of consistency.
Example. To illustrate the importance of auto-concurrency and sign alternation, let us return to the railway example with seven rail sections from Section 4 (part of its unfolding is presented in Figure 24,a). One can notice that events $si'-$ and $si'+$ are concurrent in the unfolding (see, e.g., $s2'-$ and $s2'+$ in Figure 24,a). This is not the case for the railway system with 6 sections.
The reason for this difference in behavior is that for six sections the traffic of trains is rather strictly synchronized, and each train can move relative to the other only by one section. With more sections in the system, the freedom of motion increases and our seemingly good specification shows inconsistencies, and hence cannot be implemented. To eliminate this problem, let us change the specification and allow a train to appear on the i-th section strictly after the semaphore on the previous section is closed ($s(i-1) \to ti+$). The obtained STG (shown in Figure 24,b) is consistent and can be implemented with a speed-independent circuit. Note that further complication of the system (introducing more sections) does not lead to any incorrectness, and the corresponding STG keeps the implementability properties.
Auto-concurrency and sign alternation allow us to check consistency for all but the "last" signal transition instantiations in an unfolding. To see this, let us return to the example shown in Figure 23 and suppose that transition $a_1+$ is removed from the STG together with places p3 and p4. The remaining part of the unfolding is both non-auto-concurrent and sign alternating, but the feasible trace $b+, a-, b+$ shows a violation of consistency. This fact cannot be explicitly observed in the unfolding, since there is no information about the next transition of signal $b$ after $b'+$. The problem can be solved by considering binary states corresponding to basic markings [27]. Each marking $m'$ in an unfolding of an STG is mapped to one binary state $s$. The values of all the signals in $s$ can be obtained by firing all the transitions from the configuration $C'$ that corresponds to the marking $m'$. We will say that an STG satisfies the proper state assignment property if for any pair of transitions with equal basic markings, the corresponding binary states coincide.
The following statement gives necessary and sufficient conditions for an STG to be consistent. Let $D$ be a bounded STG and $D'$ be an unfolding of $D$. If $D'$ is non-auto-concurrent, sign alternating and has proper state assignment, then $D$ is consistent.
The proper state assignment property helps to distinguish the consistency violation in the unfolding from Figure 23,b. If the initial state of the STG is 00, then the firing of the $a_2+'$ or $b+$ transitions reaches the same marking, but the corresponding binary states are different: 10 and 01 respectively.
We have reduced checking STG consistency to the analysis of ordering relations in an unfolding. Indeed, auto-concurrency and sign alternation are particular cases of concurrency and precedence relations, while binary states corresponding to basic markings can be calculated locally from the binary states corresponding to the preceding transitions.
Complete State Coding. This property is the most difficult to analyze by unfoldings.
CSC conflicts are essentially defined by pairs of binary states, and there is little hope of checking the CSC property without completely analyzing the reachability set.
To alleviate this difficulty somewhat, unfolding methods can at least perform a conservative check of CSC [41]. This analysis is based on an approximation of the reachable state space, by testing sufficient conditions which ensure the absence of CSC conflicts. In case the conditions are satisfied, one can be sure that the STG has the CSC property. Otherwise, it may either be the case that a CSC conflict really exists, or that the conditions were overly conservative (i.e., the analysis can produce false negative results).
For the exact analysis of the CSC property, it is currently better to use the BDD-based approach described in the previous section.
This section considered the application of the unfolding technique to the verification of specific properties related to STG implementability. While unfolding-based analysis is much more powerful (footnote 16), we restricted our consideration to this special case, since its verification is simpler than the general case, but it is sufficient to decide the applicability of our synthesis technique.
Event-based vs state-based verification
Sections 6 and 7 presented two different approaches for the verification of asynchronous systems. Although both approaches are aimed at checking the same properties, they have quite different characteristics in applications.
It is not easy to provide an exact recipe for where to apply each of these methods. Below we will try to give some hints, by comparing the verification results obtained by two software tools: petrify [5], based on BDD techniques, and unfolding [26], based on unfolding methods.
First, let us note that for small/moderate examples both methods are nearly equally good and the analysis times are small. Table 2 presents the results of checking the STG implementability for the examples from Section 4.
Example | SI-impl. | SI-gate-impl. | CPU BDD | CPU unf | Remarks
set-dominant latch (Fig. 2,c) | + | + | 0.16 | 0.1 < 1 |
consumer-producer (Fig. 5,a) | + | - | 0.98 | < 1 | non-CSC
6-section railway (Fig. 7,b) | + | + | 12.1 | < 1 |
7-section railway (Fig. 7,c) | - | - | 27.1 | < 1 | Inconsistent
Modified 7-section railway (Fig. 24,b) | + | + | 52.6 | < 1 |
3 dining philosophers (Fig. 8,b) | + | + | 10.1 | < 1 |
Table 2: Verification results for small examples (CPU times in seconds).
To compare the efficiency of the event-based and state-based verification, let us consider the scalable examples of the railway system with n sections and of the n dining philosophers. The CPU times and the sizes of internal representations (BDD in case of state-based verification and unfolding for event-based methods) are presented in Table 3. Table 3 clearly shows the difficulties of a direct comparison of state-based and event-based methods: the unfolding-based method is superior to the BDD-based technique for the verification of the railway system, while BDD traversal is more efficient for the analysis of the dining philosophers example (for cases with more than 10 philosophers).
There is some intuition behind the inefficiency of unfolding methods for the verification of the dining philosophers. The common parts of alternative branches of the PN are duplicated in the unfolding, and, roughly speaking, the unfolding grows "in width" with respect to the amount of conflicts. Because of the concurrency, different instantiations of the same PN transition can have different basic markings (footnote 17), and hence due to concurrency the unfolding grows "in depth".
Footnote 17: In a sequential process all instantiations of the same transition always have the same basic marking.
The dining philosophers example is characterized by a complicated structure of conflict and concurrency relations. Hence the size of the unfolding quickly blows up when the number of philosophers increases. Note that this difficulty arises only in the refined version of the dining philosophers problem (when returning the left and right forks are considered as independent actions). In the classical version (when forks are returned simultaneously) there is no such concurrency in the behavior of a single philosopher, and the unfolding method works more efficiently than the BDD-based technique.
The above observation about the class of specifications "difficult" for the unfolding approach is far from rigorous. Things are even more vague when talking about the efficiency of BDD-based methods. The size of the BDDs obtained during a symbolic traversal can vary widely and depends on many factors. One of the most important factors is the order of signals during BDD construction: the size of the BDD for the same STG with a "bad" order of signals can be exponentially larger than that with a "good" ordering [24]. However, even using the same heuristics on the ordering of signals, the size of the BDDs can differ by several orders of magnitude for STG specifications that are approximately equal in size. BDDs are more suitable for regular specifications (like our scalable examples) and can handle large examples, while for "unstructured" specifications the BDD size generally blows up faster.
The conclusion from the comparison of the unfolding and BDD approaches can be as follows: they are complementary techniques which allow one to attack the verification problem of asynchronous systems from different sides. Their efficiency depends on different and unrelated features of the specification, and if one technique fails, the other can succeed.
Synthesis of asynchronous circuits from STGs and PNs
Two approaches for designing asynchronous circuits using STGs, syntax-directed compilation and logic synthesis, are shown in Figure 25. The compilation process solves two tasks: (a) matching the STG elements with corresponding primitives from a library and (b) assembling primitives into a proper implementation. This approach is fast and simple. It gives circuit implementations that are linear in the size of the original STG, which allows us to estimate the complexity of the implementation at an early stage of the design process. However, the implementation is often inefficient, both in terms of area and performance. This restricts the application of the structural synthesis methods.
The synthesis-based approach is in most respects the opposite of the structural methods: it is complex and computationally expensive, but it gives much more efficient implementations. In this approach the initial specification is gradually refined until the behavior of each signal can be implemented by a library gate. The refinement is done by means of equivalent transformations (e.g., by inserting new signals) and can be performed either directly at the STG level (route 2.1 in Figure 25) or using the corresponding SG (route 2.2 in Figure 25).
Structural synthesis
There are two basic elements in a PN: places and transitions. Simulating the PN behavior by a circuit can be done using either places or transitions as a modeling basis.
Modeling by places. For simplicity let us consider safe and persistent PNs. The goal of the synthesis procedure is to achieve a correspondence between the marking of every place p and the value of the output signals corresponding to p. The binary value corresponding to the marking of p ("0" if there is no token, "1" if there is a token) can be stored in a latch. When the marking of p is supposed to change according to the token flow, the corresponding latch changes the stored value.
An example of circuit primitives for the place-based implementation of a PN is shown in Figure 26,a [58]. The marking of any place p is encoded by the values at the outputs of the RS-latch, (p, p'), as follows: (0,1) means no token ("p is not marked"), and (1,0) means there is a token ("p is marked"). a1 is the "activity gate" which triggers the transmission of a token from place p to the following places. The token flow in the PN fragment in Figure 26,a is simulated by the circuit in the following way. Initially place p1 is marked (latch (p1, p1') is in state (1,0)) while p2 is not (latch (p2, p2') is in state (0,1)). Passing the token from p1 to p2 starts with the switching a1- of the activity gate a1. This forces the latch (p2, p2') to change state from (0,1) (unmarked) to (1,0) (marked), going through the transient state (1,1) via events p2+, p2'-. The latter triggers the latch (p1, p1') to move into the unmarked state (events p1'+, p1-), which is acknowledged at the cell modeling p2 by switching the activity input a1 (event a1+). The token has shifted from p1 to p2.
From this consideration, it follows that the atomic firing operation of transition t is modeled in the circuit in two steps: first the token appears in p2 and then it is erased in p1. However, this does not lead to any confusion in the modeling process, because in the transient state (when the token resides in both p1 and p2) the token flow is stopped by preventing the activity gate a2 from switching (gate a2 can be set only when latch (p1, p1') goes to the unmarked state).
The basic primitive can be easily modified for the modeling of more complicated token flows. For p3, which is an output place of the join transition t (see Figure 26,b), a token can appear only when both p1 and p2 are marked. This is modeled by AND-ing the falling events, a1- and a2-, from the activity gates of the cells for p1 and p2. For p1, which is an input place of the fork transition t (see Figure 26,c), the token is erased only when it appears in both p2 and p3. This is modeled by AND-ing the falling transitions of signals p2' and p3' for resetting latch (p1, p1'). Note that the restriction of safeness is not crucial for the suggested approach, since unsafe nets can be modeled in a similar way by adding buffers and pipelining the basic primitives. Non-persistency can be modeled by adding input variables from the environment.
By thorough analysis of a PN structure it might be possible to simplify the modeling circuit [57]. However, there is little hope of getting an area-efficient circuit using this approach because of the low level of granularity in modeling PNs: in general it is too expensive to model each place with a separate latch. Still, this approach might be useful for fast prototyping of asynchronous controllers when the synthesis time (not the area or delay cost) is the crucial issue.
Modeling by transitions. The alternative way of simulating the PN behavior by a circuit consists of modeling by transitions. In the simplest form, the firing of transition t is modeled by the switching of the output T of the corresponding circuit primitive. The direction of switching (rising or falling) is irrelevant. Hence, if initially the output T is at 0, every odd firing of transition t corresponds to T+ while every even firing corresponds to T-. Therefore, this kind of modeling is called two-phase.
The first complete set of circuit primitives for two-phase modeling of PN behavior was suggested by Patil and Dennis [44]. The constructions presented there are simple and elegant, but they are suitable only for those safe, persistent nets in which at most one input place can be shared by several transitions. These restrictions are partly removed (safeness is still necessary) in the modeling scheme of Figure 27 [22]. Firing of transition t3 corresponds to the switching of a four-input C-element T3. If initially signals T1, T2, T3, T4, T5 are all at 0, then output T3 will rise only after the rising of signals T1 and T2, which model firings of transitions t1 and t2. The next occurrence of transition t3 is modeled by signal T3 falling. The latter happens only after the falling of T1 and T2 (transitions t1 and t2 must fire for the second time) and the rising of T4 and T5 (the successors of t3 must finish their first firings). Hence the negative feedbacks from the circuits modeling successor transitions prevent the interference of consecutive firing flows for transition t3.
The transition-based implementation is typically more area-efficient than the place-based one, but it is still less efficient than the logic synthesis techniques described in the next section.
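The behavior described for T3 can be sketched in a few lines. This is a behavioral illustration only, under the assumption that the four-input C-element receives T1 and T2 directly and T4 and T5 through inverters (the "negative feedbacks" mentioned above); the function names are hypothetical and the exact wiring of Figure 27 is not reproduced here.

```python
def c_element(inputs, previous):
    """Muller C-element: follow the inputs when they all agree, otherwise hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return previous


def next_t3(t1, t2, t4, t5, t3):
    """Two-phase model of t3: direct inputs from predecessors t1, t2 and
    inverted feedback from successors t4, t5 (assumed wiring)."""
    return c_element((t1, t2, 1 - t4, 1 - t5), t3)


t3 = 0
trace = []
# Each step sets (T1, T2, T4, T5) and records the resulting value of T3.
for t1, t2, t4, t5 in [
    (1, 0, 0, 0),  # only t1 has fired: T3 holds 0
    (1, 1, 0, 0),  # t1 and t2 have fired: T3 rises (first firing of t3)
    (0, 0, 0, 0),  # t1 and t2 fired again, successors not yet: T3 holds 1
    (0, 0, 1, 0),  # only t4 finished its first firing: T3 still holds 1
    (0, 0, 1, 1),  # t4 and t5 finished: T3 falls (second firing of t3)
]:
    t3 = next_t3(t1, t2, t4, t5, t3)
    trace.append(t3)
print(trace)
```

The third and fourth steps show the role of the feedback: T3 cannot fall, and hence t3 cannot fire a second time, before both successors have completed their first firings.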
Synthesis-based methods
Section 5.1 gave the formal conditions which must be satisfied by an STG in order to be implementable. In particular, it was shown that if a deterministic, persistent and consistent STG with n non-input signals satisfies the CSC property, then there exists a speed-independent circuit with n gates whose behavior is strongly equivalent to the original STG. However, this statement does not restrict in any way the types of gates used in such a circuit. It assumes that the implementation basis is not limited and that the gates can implement any arbitrary Boolean function (so-called complex gates). Even though this assumption looks rather unrealistic, methods for complex gate implementation are of interest for the following reasons:
1. A complex gate implementation can satisfy the requirements of a real library (we will see that this is the case with several examples that we have considered so far).
The task of complex gate implementation of an STG with the CSC property is to obtain logic equations for all its non-input signals. It has a straightforward solution if one considers the corresponding SG (see footnote 18). If the Complete State Coding requirement is met, then for any non-input signal of the STG, the SG defines an incompletely specified logic function as three sets of Boolean vectors: the on-set, the set of states where the function evaluates to 1; the off-set, the set of states where it evaluates to 0; and the dc-set (don't care set), the set of states where the function is not specified.
The dc-set includes all the states which are not reachable from the initial state of the SG. We can define which states belong to the on-set and off-set by considering the so-called implied value of a signal, i.e. the next value that the signal is going to take.
If signal a is stable in the SG state s, then s belongs to the off-set of a if a=0 in s and to the on-set of a if a=1 in s.
If signal a is enabled in state s, then its value must be the opposite of the current one, and s belongs to the off-set of a when the current value of a is 1 and to the on-set when it is 0.
Using the "*" notation for enabled signals, the above rules can be briefly formulated as follows:
if a = 1 ∨ a = 0* in state s, then s is included in the on-set of a, while if a = 0 ∨ a = 1*, then s is included in its off-set.
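The rules above translate directly into a small routine. The sketch below is an illustration under stated assumptions: states are written as strings in the "*" notation, the helper names `parse` and `next_state_function` are hypothetical, and the state list used in the example is a hand-written SG of a two-input C-element (signals a, b, c), not one of the paper's figures.

```python
from itertools import product


def parse(state):
    """Split a state such as '110*' into its binary code and enabled flags."""
    code, enabled = [], []
    for ch in state:
        if ch == "*":
            enabled[-1] = True       # '*' marks the preceding signal as enabled
        else:
            code.append(int(ch))
            enabled.append(False)
    return tuple(code), tuple(enabled)


def next_state_function(states, index):
    """Return (on_set, off_set, dc_set) of binary codes for signal `index`."""
    on_set, off_set = set(), set()
    width = 0
    for state in states:
        code, enabled = parse(state)
        width = len(code)
        # Implied value: current value if stable, its complement if enabled.
        implied = 1 - code[index] if enabled[index] else code[index]
        (on_set if implied else off_set).add(code)
    # Codes of unreachable states are don't cares.
    dc_set = set(product((0, 1), repeat=width)) - on_set - off_set
    return on_set, off_set, dc_set


# Hypothetical SG of a C-element with inputs a, b and output c (signal order a, b, c).
c_element_sg = ["0*0*0", "10*0", "0*10", "110*", "1*1*1", "01*1", "1*01", "001*"]
on_c, off_c, dc_c = next_state_function(c_element_sg, index=2)
print(sorted(on_c), sorted(off_c), sorted(dc_c))
```

For this example the on-set consists of the codes with at least two ones, so minimization yields the majority function c = ab ∨ ac ∨ bc. A code that ended up in both the on-set and the off-set would indicate a CSC conflict for that signal.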
Example. Let us illustrate the synthesis procedure by deriving the logic equation for the set-dominant latch from Section 4. Its behavior specification in the form of an SG in Figure 28,a is reproduced from Figure 2,d. The output signal Q is equal to 0 and is stable in the initial state 0*1*0, and hence this state belongs to the off-set of Q. Q keeps the 0 value but is enabled in 110*, so 110* is included in the on-set of Q. Following these rules we can fill the Karnaugh map for signal Q (shown in Figure 28,b) and derive the logic equation Q = S ∨ ¬R·Q with conventional minimization techniques. This is the well-known equation of an SR-latch.
Footnote 18: Synthesis can be performed directly from the STG as well. These methods are based on approximation techniques similar to the one discussed in Section 7.3. However, they are more complicated and we omit them due to lack of space. Details can be found in [43].
The circuits for the semaphore control of the railway system with 6 and 7 sections, automatically derived using the above method (see Figures 7,b, 24,b and Table 2), are shown in Figure 29,a,b. The overall circuits for the railway control are shown in Figure 29,c and d respectively. One can see that for both circuits the parameters of the complex gates are quite reasonable and these implementations can be realized in most real libraries.
If an STG is not SI-implementable, then there is no speed-independent circuit that is trace equivalent to the STG. Such specifications are rejected by the synthesis procedure and must be changed by hand to satisfy the implementability properties.
We next describe a method to satisfy the CSC requirement by inserting additional state signals.
Complete State Encoding
When an STG contains CSC conflicts, they must be disambiguated by additional signals. This can be done by inserting one signal at a time until all conflicts are resolved (see footnote 19). Such an iterative procedure can be formulated as follows: [...]
Example. There is one CSC conflict for states 0*0*01 and 0001* in the SG, and we can separate the conflicting states by the partition shown in Figure 30,b. [...] and signal csc+ should be enabled in all of them. This means that transition csc+ must be inserted concurrently with the transitions between border states (see Figure 30,c). The STG obtained after the insertion of signal csc satisfies the CSC property and its implementation on complex gates is shown in Figure 31,a. One can see that the logic functions of the implementation are more complicated than in the case of the railway system controller, and their complexity may exceed the requirements of the gate library. The complex gate implementation can be decomposed into simpler gates and C-latches by the technology mapper [5] (see Figure 31,b).
We will answer these questions in the next section by using the theory of state regions.
Speed-independence preserving event insertion
Regions, Excitation Regions and Switching Regions. A region [40] is a subset of states with which all transitions labeled with the same event e have exactly the same "entry/exit" relation. This relation will become the predecessor/successor relation in the Petri net.
Let us consider the SG shown in Figure 32. The set of states r1 is a region, since all transitions labeled with a+ enter r1, all transitions labeled with b- exit r1, and transitions labeled with b+ and a- do not cross r1. On the other hand, the set of states {1*1*, 1*0} (shown by the dotted line in Figure 32) [...]
A region r is a pre-region of event e if there is a transition labeled with e which exits r. A region r is a post-region of event e if there is a transition labeled with e which enters r. Figure 32 shows the correspondence between regions and Petri net places. E.g., place p2 corresponds to region r1. Place p2 is an input place for the transition labeled with b- because region r1 is a pre-region of b-.
While regions in an SG are related to places in the corresponding PN, excitation regions are related to transitions of the PN. A set of states is called an excitation region for event a (denoted by ER(a)) if it is a maximal set of states such that for every state s ∈ ER(a) there is a transition labeled with a leaving s.
It can be shown [6] that in the SG corresponding to a safe STG, the excitation region of a persistent transition can be obtained as the intersection of all pre-regions for this transition.
In the SG from Figure 32 the excitation region for a-, ER(a-), is shown by the dotted line. It corresponds to the transition a- in the PN. Similarly to the ER, we define the switching region for event a as the set of states reached immediately after the occurrence of a (denoted by SR(a)).
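These definitions can be checked by brute force on a small SG. The sketch below makes several assumptions: the arc list is a reconstruction of a six-state SG over signals a and b that is consistent with the states and relations quoted in the text (it is not copied from Figure 32), each event is assumed to have a single excitation region, and all function names are hypothetical.

```python
from itertools import combinations

# Reconstructed SG over signals (a, b); '*' marks an enabled signal.
ARCS = [
    ("0*0*", "a+", "10*"), ("0*0*", "b+", "0*1"),
    ("10*", "b+", "1*1*"), ("0*1", "a+", "1*1*"),
    ("1*1*", "a-", "01*"), ("1*1*", "b-", "1*0"),
    ("01*", "b-", "0*0*"), ("1*0", "a-", "0*0*"),
]
STATES = sorted({s for src, _, dst in ARCS for s in (src, dst)})


def crossing(src, dst, r):
    """Classify how the arc src -> dst relates to the set of states r."""
    if src not in r and dst in r:
        return "enter"
    if src in r and dst not in r:
        return "exit"
    return "no-cross"


def is_region(r):
    """All arcs carrying the same label must relate to r in the same way."""
    kinds = {}
    for src, event, dst in ARCS:
        kinds.setdefault(event, set()).add(crossing(src, dst, r))
    return all(len(k) == 1 for k in kinds.values())


def excitation_region(event):
    return {src for src, e, _ in ARCS if e == event}


def switching_region(event):
    return {dst for _, e, dst in ARCS if e == event}


def pre_regions(event):
    """Non-trivial regions that the arcs labeled `event` exit."""
    found = []
    for size in range(1, len(STATES)):
        for subset in combinations(STATES, size):
            r = set(subset)
            exits = all(crossing(s, d, r) == "exit" for s, e, d in ARCS if e == event)
            if exits and is_region(r):
                found.append(r)
    return found


r1 = {"10*", "1*1*", "01*"}
er = excitation_region("a-")
print(is_region(r1), is_region(er), set.intersection(*pre_regions("a-")) == er)
```

In this reconstruction r1 is a region entered by a+ and exited by b-, the excitation region of a- is the set {1*1*, 1*0}, and intersecting the pre-regions of a- gives exactly that excitation region, in line with the property stated above for persistent transitions.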
Insertion scheme. Informally, the event insertion operation selects a subset of states, splits each one of the selected states into two states and creates, on the basis of these new states, an excitation and a switching region for the new event. Figure 33 illustrates the insertion scheme which is typically used in applications to asynchronous design.
A state signal insertion must preserve the SI-implementability properties of the original specification. Formally, we say that a set of states ER(x) selected for the insertion of a new signal in an [...]
1. r is a region, or
2. r is an excitation region of a persistent event a, or
3. r is an intersection of pre-regions of the same event, when r is connected and all its exit events are persistent.
This suggests that good candidates for insertion sets should be sought on the basis of regions and their intersections. Since any disjoint union of regions is also a region, this gives an important corollary: sets of states selected for insertion of new events can be built more efficiently from "bricks" (regions) rather than "sand" (states).
Partitions and borders. We now return to the first problem of bi-partitioning the set of states.
To distinguish the CSC conflicts in an SG, a bi-partition of its states into sets S0 and S1 should be constructed. Even though the conflict relation is defined on pairs of states, the requirements of SIP-sets do not allow us to manipulate single states in the construction of a partition. The problem of finding an optimal SIP partition maximizing the number of distinguished conflicts is intractable, and hence calls for appropriate heuristics.
Such an approach can be based on considering larger sets of states when constructing a partition. Regions and intersections of pre-/post-regions of the same event are good candidates for "bricks" when looking for SIP sets, rather than individual states ("sand"). To construct a partition, adjacent bricks are combined together to form bigger blocks. The union of bricks is guided by a cost function that takes into account the number of distinguished conflicts. A greedy block merging approach can be used in practice.
Example. The SG in Figure 32 has two pairs of CSC conflicts, {10*, 1*0} and {0*1, 01*}.
A partition distinguishing the conflicts is constructed from the intersections of post-regions. The post-regions of transition a+ are regions r1 and r2 (shown in Figure 35 by dashed lines), and those of transition b+ are r3 and r4 (shown by dotted lines). Their intersections are shown by dashed areas in Figure 35 [...] = {0*0*}. Both exit borders (shown by the dashed areas) are well-formed and satisfy the SIP-set requirements. The result of the insertion of the additional signal c using these exit borders is shown at the right in Figure 35. The resulting SG has no CSC conflicts and can be implemented by a speed-independent circuit. This circuit corresponds to the C-latch with two input inverters that we have already discussed in Section 4 (see Figure 10).
In the example of Figure 32, the obtained bipartition distinguishes all CSC conflicts in the original SG. In general, several iterations of signal insertion can be required. On every iteration the specification is refined by excluding some conflicts. The convergence of the procedure was shown in [5] for the class of SGs corresponding to safe STGs.
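The two conflict pairs quoted in the example can be found mechanically. The following sketch assumes that the SG consists of the six states named in the text, written in the "*" notation with signal order (a, b), and that both a and b are non-input signals; the function names are hypothetical. Two states are reported as a CSC conflict when they share the same binary code but enable different sets of non-input signals.

```python
from collections import defaultdict
from itertools import combinations


def parse(state):
    """Return the binary code of a state and the indices of its enabled signals."""
    code, enabled = "", set()
    for ch in state:
        if ch == "*":
            enabled.add(len(code) - 1)   # '*' refers to the preceding signal
        else:
            code += ch
    return code, enabled


def csc_conflicts(states, non_inputs):
    """Pairs of states with equal codes but different enabled non-input signals."""
    by_code = defaultdict(list)
    for state in states:
        by_code[parse(state)[0]].append(state)
    conflicts = []
    for group in by_code.values():
        for s1, s2 in combinations(group, 2):
            if parse(s1)[1] & non_inputs != parse(s2)[1] & non_inputs:
                conflicts.append({s1, s2})
    return conflicts


# States of the SG as named in the text; a (index 0) and b (index 1) assumed non-input.
sg_states = ["0*0*", "10*", "0*1", "1*1*", "01*", "1*0"]
conflicts = csc_conflicts(sg_states, non_inputs={0, 1})
print(conflicts)
```

Running the same check after each signal insertion gives a simple stopping criterion for the iterative procedure: it terminates when the returned list is empty.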
Conclusion
This paper has discussed various techniques for specification, analysis and synthesis of asynchronous circuits that have been developed in recent years based on an interpreted PN specification called STG. We have shown, by means of examples, how the function of an asynchronous control circuit can be specified using an STG, and how an existing asynchronous netlist can be modeled (for analysis and re-synthesis) as an STG. A data path is usually designed independently and is controlled by the control circuit via some form of request/acknowledge or using timing assumptions (bundled data) to simplify the control logic.
We have compared BDD-based versus unfolding-based analysis techniques for STGs. Finally, we have summarized state encoding and logic synthesis techniques for asynchronous circuits using STGs.
We hope that this review has been useful, first of all as a contribution towards the establishment of a practical asynchronous circuit design flow, and second as a paradigm for the establishment of other interpretations of Petri nets for the synthesis of other classes of asynchronous processes. A designer interested in applying these synthesis techniques should use the examples presented in this paper as a guideline to specify asynchronous controllers by means of STGs, and use the tools based on the theory described in the previous sections in order to implement the control logic. An example of such tools is petrify, which is available from http://www.ac.upc.es/vlsi/petrify/petrify.html. The data path, on the other hand, could be designed by using micropipelining or semi-synchronous techniques, in which the clock signals are locally generated by the controllers [52, 15]. Standard physical design tools can then be used, with the additional constraints arising from the need to satisfy the isochronic fork assumptions, to place and route the circuit. Information about such isochronic forks can be obtained directly by using PN-based verification techniques.
We strongly believe that the scope of application of PNs should not be limited to analysis (either by simulation or by means of formal techniques) but be extended to synthesis as well. This is possible only by mapping the interpretation of the net onto a domain for which synthesis techniques can be developed, such as the Boolean domain in this case, and by interpreting PN properties, such as persistence, as desirable (or dangerous) implementation properties, such as hazard-freeness.
In the future, we will need to enhance the techniques summarized in this paper with some notion of time, in order to compete with synchronous circuits that take full advantage of the fact that physical delays are always bounded. Although some work in this area has been done [38], more aggressive and more efficient optimization is still needed. Moreover, a good design flow needs an abstraction mechanism, and unfortunately general PNs are particularly lacking in this respect (mostly because logic circuit abstraction is based on "boxes" with inputs and outputs, while an STG only models sequencing of events). Again, some work has been done [59], but it needs more development.
