Interfaces, by nature, are often asynchronous, since they connect multiple distributed modules (agents) that share no common clock. However, the most recent developments in the theory of asynchronous design in the areas of specifications, models, analysis, verification, synthesis, technology mapping, timing optimization and performance analysis are not widely known and are rarely accepted by industry.
1 Specification with Petri Nets
1.1 From timing diagrams to Petri Nets
Figure 1 depicts the interface of a device with a VME bus. The behavior of the controller is as follows: a request to read from or write into the device is received by one of the signals DSr or DSw, respectively. In a read cycle, a request to read is made through signal LDS. When the device has the data ready (LDTACK), the controller must open the transceiver to transfer data to the bus (signal D). In the write cycle, data is first transferred to the device. Next, a request to write is made (LDS). Once the device acknowledges the reception of the data (LDTACK), the transceiver must be closed to isolate the device from the bus. Each transaction must be completed by a return-to-zero of all interface signals, seeking maximum parallelism between the bus and the device operations.
A PN has two types of vertices, places (denoted by circles) and transitions (denoted by boxes), and arcs from places to transitions and from transitions to places. Places correspond to local states of the system and are used for keeping information about system resources and conditions for the execution of transitions. Places can hold tokens (denoted by black dots). A token in a place indicates that a resource is available or a condition is satisfied. In general, more than one token can be kept in a place, but we will consider only the simplest case: a place cannot contain more than one token. The set of all places currently marked with a token corresponds to the current global state of the net. Such global states are called markings. The initial marking of the PN in Figure 3 is {p0, p1}.
Token game and concurrency
Transitions correspond to system events (signal transitions in the example). A transition is enabled if all its input places contain a token. In the initial marking of the PN in Figure 3 only one transition, DSr+, is enabled; another one, LDS+, is not: only place p1 among its two input places, p1 and p2, contains a token. Every enabled transition can fire. Firing removes one token from every input place of the transition and puts one token into each of its output places. Firing of a transition is an atomic, instantaneous operation, while some unspecified time can pass between the enabling and the firing of the transition. After the firing of transition DSr+ the net moves to a new marking {p1, p2}, and then LDS+ becomes enabled.
Figure 3: STG for the READ cycle.
This process of moving tokens around (a.k.a. the token game) will, in a few steps, fire transition D-. This leads the net into the marking {p7, p8}. In this marking two transitions, DTACK- and LDS-, become enabled. Since their input places are different, they do not conflict for tokens and cannot disable each other. This represents concurrency between DTACK- and LDS-. In total, there are four pairs of concurrent transitions: (DTACK-, LDS-), (DTACK-, LDTACK-), (DSr+, LDS-), and (DSr+, LDTACK-), where concurrency is the potential to fire at the same time.
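The firing rule described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the text gives the arcs p0 -> DSr+ -> p2 and {p1, p2} -> LDS+, while the output place p3 of LDS+ and the helper names `enabled` and `fire` are hypothetical.

```python
# Minimal token-game sketch for a safe (1-bounded) Petri net.
# Only the arcs p0 -> DSr+ -> p2 and {p1, p2} -> LDS+ come from the text;
# the output place p3 of LDS+ is a hypothetical placeholder.
PRE = {"DSr+": {"p0"}, "LDS+": {"p1", "p2"}}
POST = {"DSr+": {"p2"}, "LDS+": {"p3"}}


def enabled(marking, t):
    """A transition is enabled when every input place holds a token."""
    return PRE[t] <= marking


def fire(marking, t):
    """Atomically remove one token per input place and add one per output place."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled in {sorted(marking)}")
    return frozenset((marking - PRE[t]) | POST[t])


m0 = frozenset({"p0", "p1"})   # initial marking from the text
m1 = fire(m0, "DSr+")          # marking reached after DSr+ fires
```

Representing a marking as a set of places is sufficient here because only nets with at most one token per place are considered.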
State graphs
By playing the token game one can generate a Transition System (TS), an abstract state graph in which each arc between a pair of states is labeled with the corresponding fired transition. Figure 4 depicts a TS for the READ cycle. Each state in a TS generated from a PN corresponds to a marking, which is shown to the left of the corresponding state. A TS with states labeled with markings is called a reachability graph of a PN. For Signal Transition Graphs, each state of the corresponding TS can also be associated with a binary code of signal values, which is shown to the right of the state (see footnote 1). A TS with states labeled with binary codes of signals is called a state graph (SG) of an STG.
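As a rough illustration of how a reachability graph is obtained by exhaustively playing the token game, the sketch below performs a breadth-first traversal. The fork/join net used here is a hypothetical example (it is not the VME controller), and `reachability_graph` is an illustrative name.

```python
from collections import deque

# Hypothetical fork/join net: a forks, b and c run concurrently, d joins.
PRE = {"a": {"p0"}, "b": {"p1"}, "c": {"p2"}, "d": {"p3", "p4"}}
POST = {"a": {"p1", "p2"}, "b": {"p3"}, "c": {"p4"}, "d": {"p0"}}


def reachability_graph(initial):
    """Breadth-first token game; returns (set of markings, list of labeled arcs)."""
    start = frozenset(initial)
    states, arcs = {start}, []
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for t in PRE:
            if PRE[t] <= m:                      # t is enabled in m
                nxt = frozenset((m - PRE[t]) | POST[t])
                arcs.append((m, t, nxt))         # arc labeled with the fired transition
                if nxt not in states:
                    states.add(nxt)
                    queue.append(nxt)
    return states, arcs


states, arcs = reachability_graph({"p0"})
```

Explicit enumeration of this kind grows quickly with the amount of concurrency, which is why the more compact techniques discussed later in the paper matter.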
Choice and arbitration
The environment of the device has a choice to request the read or the write operation. Similarly, if arbitration within the device is involved, then the device itself can internally make a non-deterministic choice between two requests. Choice is expressed in PNs by choice places, as shown in Figure 5. Here places p0 and p3 are choice places, places p1 and p2 merge alternative branches of the behavior, and all other places are removed from the figure since they have only one input and one output arc (they are called implicit places).
Footnote 1: For the sake of readability we separate with dots the left handshake signals, the right handshake signals, and the data transceiver control signal; enabled signals are marked with an asterisk.
Property verification. After specifying the design, it is required to check implementability properties to answer the following question: "Can the specification be implemented with an asynchronous circuit?" [14, 16]. Other properties of the specification can be of interest as well, e.g., absence of deadlocks, fairness in serving requests, etc. General-purpose verification techniques can be employed for this analysis [19].
Implementation verification. After the design is done (fully automatically or, especially, with some manual intervention), it is often desirable to check that the implementation is correct with respect to the given specification [10, 24].
Performance analysis and separation between events is required (a) for determining the latency and throughput of the device and (b) for logic optimization based on timing information [12, 22] (see also Section 5).
Properties required for implementability include: boundedness of the PN, to guarantee that the specified state space is finite; consistency of the STG, to ensure that rising and falling transitions alternate for each signal; completeness of state encoding, to check that there are no conflicts in the definition of the Boolean functions for each non-input (i.e., output and internal) signal; and persistency of the STG, to verify that (a) no non-input signal transition can be disabled by another signal transition and (b) no input signal transition can be disabled by a non-input signal transition. The former ensures that no short glitches, known as hazards, can appear at the gate outputs, while the latter ensures that no hazards can occur at the inputs of the device. If all the above properties are satisfied, then the STG specification can be implemented as a so-called speed-independent circuit [20] (see footnote 2). Speed-independence means no hazards under any variations of gate delays, provided that variations of some critical wire delays after forks (so-called isochronic forks) stay within reasonable bounds (e.g., within one gate delay).
Let us illustrate two of the above properties with an example. Two states in the TS in Figure 4 are underlined. They correspond to different markings, {p4} and {p2, p8}, but their binary codes are equal, 10110. Moreover, the enabling conditions in these two states for output signals LDS and D are different. Therefore, the implied value of the next-state Boolean function for signal LDS for vector 10110 should be 1 for the first state and 0 for the second state. This is a conflict in the definition of the function. To resolve this conflict two methods can be employed: (a) inserting an additional state signal whose value distinguishes the two conflicting states, or (b) concurrency reduction. In the first case, one feasible solution is to insert the rising transition of the additional state signal right before LDS+ and its falling transition right before D-. The conflicting states will then be associated with different values of the new state signal. In the second case, a possible solution is to remove the conflicting state {p2, p8} from the specification. The environment should usually stay untouched for compositional reasons; therefore, delaying input signals is not allowed. Hence, signal transition DTACK- can be delayed until LDS- fires. Automatic techniques for solving the state encoding problem are presented, e.g., in [6, 27].
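A check for state encoding conflicts of the kind just described can be sketched as follows. The two states with code 10110 come from the text; which output is excited in each of them (D+ and LDS-), the third state, and the function name `csc_conflicts` are assumptions made for illustration.

```python
from collections import defaultdict

# Each state: (marking name, binary code, enabled non-input transitions).
# The two 10110 states are from the text; the excited outputs and the
# third state are illustrative assumptions.
STATES = [
    ("{p4}", "10110", frozenset({"D+"})),
    ("{p2,p8}", "10110", frozenset({"LDS-"})),
    ("{p5}", "10111", frozenset({"DTACK+"})),
]


def csc_conflicts(states):
    """Return (code, state, state) triples sharing a code but not enabled outputs."""
    by_code = defaultdict(list)
    for name, code, outs in states:
        by_code[code].append((name, outs))
    conflicts = []
    for code, group in by_code.items():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if group[i][1] != group[j][1]:
                    conflicts.append((code, group[i][0], group[j][0]))
    return conflicts


conflicts = csc_conflicts(STATES)
```

States that share a code and also share the same enabled non-input transitions are not reported, since they do not make any next-state function ambiguous.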
To illustrate the persistency property, let us consider transitions DSw+ and DSr+ in Figure 5, assuming for a moment that they are output signals to be implemented. Both are simultaneously enabled, and the firing of either one disables the other. Such behavior cannot be implemented without hazards unless special mutual exclusion elements (arbiters) are used.
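The disabling relation can be detected mechanically on a transition system. The sketch below is a simplified check that ignores the distinction between input and non-input signals made in the full persistency definition; the state names and the transitions following DSr+ and DSw+ are hypothetical.

```python
# Transition system as {state: {transition: next_state}}.
# Hypothetical fragment modeling the choice between DSr+ and DSw+.
TS = {
    "s0": {"DSr+": "s1", "DSw+": "s2"},
    "s1": {"LDS+": "s3"},
    "s2": {"D+": "s4"},
    "s3": {},
    "s4": {},
}


def persistency_violations(ts):
    """List (state, disabled, by): firing `by` in `state` disables `disabled`."""
    violations = []
    for state, moves in ts.items():
        for by, nxt in moves.items():
            for other in moves:
                # `other` was enabled in `state` but is no longer enabled in `nxt`
                if other != by and other not in ts[nxt]:
                    violations.append((state, other, by))
    return violations


violations = persistency_violations(TS)
```

A complete check would additionally filter the reported pairs by signal type, as in conditions (a) and (b) of the persistency property.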
Techniques
There are several techniques for fighting the "state explosion problem" in the analysis of Petri Net-like specifications. Symbolic Binary Decision Diagram-based (BDD) [3] traversal of a reachability graph allows its implicit representation, which is generally much more compact than an explicit enumeration of states [24]. Partial order reductions [11], stubborn sets [26], and the identification method [14] ignore many or even most of the states when analyzing certain properties. Structural properties of PNs (e.g., place invariants) can provide a fast upper approximation of the reachability space [21, 9] and can also be used for dense variable encoding of states in the reachability graph. Structural reductions are useful as a preprocessing step to simplify the structure of the net before traversal or analysis while keeping all important properties. Unfoldings [19, 16] are finite acyclic prefixes of the PN behavior, representing all reachable markings. They are often more compact than the reachability graph and, due to the acyclic property, are well suited for extracting ordering relations between places and transitions (concurrency, conflict and precedence). Different types of unfoldings are also used for performance analysis [12]. More details on the applicability of these techniques can be found in [13].
Footnote 2: Also called quasi-delay-insensitive in the literature [18, 2].
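To give a flavor of the structural approach, the sketch below checks whether a weighting of places is a place invariant, i.e., whether every transition preserves the weighted token count. The fork/join net and the name `is_place_invariant` are hypothetical and serve only as an illustration.

```python
# Hypothetical fork/join net: a forks, b and c run concurrently, d joins.
PRE = {"a": {"p0"}, "b": {"p1"}, "c": {"p2"}, "d": {"p3", "p4"}}
POST = {"a": {"p1", "p2"}, "b": {"p3"}, "c": {"p4"}, "d": {"p0"}}


def is_place_invariant(weights):
    """True if every transition leaves the weighted token sum unchanged."""
    for t in PRE:
        consumed = sum(weights.get(p, 0) for p in PRE[t])
        produced = sum(weights.get(p, 0) for p in POST[t])
        if consumed != produced:
            return False
    return True


# Candidate invariant: exactly one token circulates among p0, p1 and p3.
upper_branch = {"p0": 1, "p1": 1, "p3": 1}
ok = is_place_invariant(upper_branch)
```

Any marking that violates such an invariant is certainly unreachable, which is why invariants yield an upper approximation of the reachability space without traversing it.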
Logic Synthesis
The goal of logic synthesis is to derive a gate netlist that implements the behavior defined by the specification. For simplicity, we will illustrate this step by synthesizing a speed-independent circuit for the read cycle of the VME bus (see Figure 3).
The main steps in logic synthesis are the following: (1) encode the SG in such a way that the complete state coding property holds, which may require the addition of internal signals; (2) derive the next-state functions for each output and internal signal of the circuit; (3) map the functions onto a netlist of gates.
Complete State Coding
As mentioned in Section 2.1, the SG of Figure 4 has state conflicts. A possible method to solve this problem is to insert new state signals that disambiguate the encoding conflicts. Figure 6 depicts a new SG in which a new signal, csc0, has been inserted. Now, the next-state functions for signals LDS and D can be uniquely defined. The insertion of new signals must be done in such a way that the resulting SG preserves the properties required for implementability.
Next-State Functions
When an SG fulfills all the implementability properties, a next-state function can be derived for each non-input signal. Given a signal z, we can classify the states of the SG into four sets: positive and negative excitation regions, ER(z+) and ER(z-), and quiescent regions, QR(z+) and QR(z-).
A state belongs to ER(z+) if z = 0 and z+ is enabled in that state. In this situation, the value of the signal is denoted by 0* in the SG. A state belongs to QR(z+) if z is in a stable 1 state, i.e., z = 1 and z- is not enabled. The definitions of ER(z-) and QR(z-) are analogous.
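The classification into the four regions depends only on the current value of z and on whether a transition of z is enabled. A minimal sketch, with the hypothetical helper name `region`:

```python
def region(value, excited):
    """Classify a state for signal z.

    value   -- current value of z in the state (0 or 1)
    excited -- True if a transition of z (z+ or z-) is enabled in the state
    """
    if value == 0:
        return "ER(z+)" if excited else "QR(z-)"
    return "ER(z-)" if excited else "QR(z+)"


# A state written 0* in the SG (z = 0 and z+ enabled) lies in ER(z+).
r = region(0, True)
```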
The next-state function f_z for a signal z is defined as follows:
f_z(s) = 1 if s ∈ ER(z+) ∪ QR(z+)
f_z(s) = 0 if s ∈ ER(z-) ∪ QR(z-)
f_z(s) = - (don't care) if s is a binary code that does not correspond to any state of the SG
Once the next-state function has been derived, Boolean minimization can be performed to obtain a logic equation that implements the behavior of the signal. In this step it is crucial to make efficient use of the don't care conditions derived from those binary codes that do not correspond to any state of the SG. For the example of Figure 6, the following equations can be obtained:
D = LDTACK · csc0
LDS = D + csc0
DTACK = D
csc0 = DSr · (csc0 + ¬LDTACK)
A well-known result in the theory of asynchronous circuits is that any circuit implementing the next-state function of each signal with only one atomic complex gate is speed-independent. By an atomic gate we mean a gate without internal hazardous behavior [14, 17]. Two possible hazard-free gate mappings for the next-state functions of the READ cycle example are shown in Figure 7(a) and (b).
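The effect of the inserted signal can be checked by evaluating the equations on the two states that previously shared the code 10110. The sketch below assumes the signal order DSr, DTACK, LDTACK, LDS, D for the code, assumes that LDTACK is complemented in the csc0 equation, and assumes csc0 = 1 in the state where D+ is enabled and csc0 = 0 in the state where LDS- is enabled; the function names are illustrative.

```python
def next_values(DSr, DTACK, LDTACK, LDS, D, csc0):
    """Evaluate the four next-state equations on one binary state (0/1 ints)."""
    return {
        "D": LDTACK & csc0,
        "LDS": D | csc0,
        "DTACK": D,
        "csc0": DSr & (csc0 | (1 - LDTACK)),  # assumed complement on LDTACK
    }


def excited(state):
    """Signals whose implied next value differs from their current value."""
    nxt = next_values(**state)
    return {z for z, v in nxt.items() if v != state[z]}


# Shared code 10110, assumed order: DSr, DTACK, LDTACK, LDS, D.
code = dict(DSr=1, DTACK=0, LDTACK=1, LDS=1, D=0)
before_D_rise = excited({**code, "csc0": 1})
before_LDS_fall = excited({**code, "csc0": 0})
```

Under these assumptions only D is excited in the first state and only LDS in the second, so the two states no longer impose contradictory values on any next-state function.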
However, there can be two obstacles in the actual implementation of the next-state functions: (a) a logic function can be too complex to be mapped onto one gate available in the library; (b) the solution may require gates that are not typically present in standard synchronous libraries. The second is the case for the solution in Figure 7(a). A gate pictured as a circle with "C" is a so-called C-element [20]: a popular asynchronous latch with the next-state function c = ab + c(a + b). Its output, c, goes high (low) if both inputs, a and b, go high (low); otherwise, it keeps its previous value.
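The behavior of the C-element follows directly from its next-state function, as the following small sketch shows (the function name `c_element` is illustrative):

```python
def c_element(a, b, c):
    """Next value of a two-input C-element: c' = ab + c(a + b)."""
    return (a & b) | (c & (a | b))


# Enumerate all input/state combinations: (a, b, c) -> next c.
table = {(a, b, c): c_element(a, b, c)
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}
```

The output changes only when both inputs agree with each other and differ from the stored value; in every other case the stored value is held.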
Hazards
A crucial problem that makes logic decomposition for asynchronous design difficult is the problem of hazards [25, 23]. Recent developments in [23] show that if the so-called Fundamental mode is acceptable (inputs cannot change until all internal circuit activity stabilizes), then most of the known methods of logic minimization can be gracefully extended to asynchronous hazard-free minimization. These results can further be extended to FSMs [29].
Unfortunately, the Fundamental mode is often too restrictive and, in particular, is not satisfied for the logic implementing signal functions in synthesis using STGs.
Decomposition and Technology Mapping
One of the partial solutions to logic decomposition in non-fundamental mode, called the monotonous cover requirement [1, 15], allows one to decompose any function into two-level combinational logic and a latch. This does not, however, solve the problem of breaking up gates if the fan-in or fan-out is too large. The latest results [4, 5] allow one to obtain a hazard-free decomposition and then map the decomposed solution onto the available library of gates with restricted fan-in, without [4] or with [5] gate sharing. Applying the method from [5], two other correct solutions can be found for mapping the control for the READ cycle onto a library of two-input gates: the solution in Figure 7(b) uses a standard reset-dominant RS-latch instead of the C-element; the solution in Figure 8(a) uses only combinational gates. The latter seems to be a standard synchronous decomposition for the function of signal csc0 = DSr · (csc0 + ¬LDTACK):
map0 = csc0 + ¬LDTACK
csc0 = DSr · map0
Note, however, that signal map0 is also fed to gate D = LDS map0. It is only because of this multiple acknowledgment of map0 by two different gates that this solution for the READ cycle control is hazard-free: every rising transition at map0 is acknowledged by signal D, while every falling transition is acknowledged by signal csc0. Another synchronous decomposition for csc0, presented in Figure 8(b), is hazardous and cannot be accepted.
The technique for decomposition and technology mapping from [5] is based on using candidates for decomposition extracted by algebraic factorization and Boolean relations, and on inserting hazard-free signals with multiple acknowledgment.
Figure 9: (a) STG extracted for the two-input combinational gate circuit; (b) timing STG with separation constraints sep(D-, LDS-) < 0 and sep(LDTACK-, DSr+) < 0 for the optimized circuit.
4 Back annotation
State regions [8] are sets of states that correspond to a place of the PN (regions) or to a transition of the PN (excitation regions). Entry and exit arcs for a region correspond to input and output transitions of a place. Apart from being useful for state exploration, regions provide another important feature: at any step of the design process, a PN corresponding to the current TS can be extracted and back-annotated to the designer. This is useful both for interaction with the design process and for the performance and timing analysis of the circuit. An example of a PN extraction is shown in Figure 9(a).
Timing Optimization
The power of optimization based on timing information is two-fold.
First, timing constraints always reduce the set of reachable states and hence increase the number of don't care states [22]. Moreover, this concurrency reduction does not introduce new dependencies between signals, since it is based entirely on timing and not on logic ordering. Second, using timing requirements it is possible to extend the set of states in which a signal is enabled without changing the set of reachable states: the enabling of a signal transition does not cause it to fire if other enabled signals are known to be, or can be made, faster.
Let us illustrate how timing information can increase the flexibility in logic optimization using the example of the READ cycle. Assume first that, as a part of the initial specification, it is given that the reset at the right-side handshake is always faster than the next read request at the left-side handshake; formally, the maximal separation [12] between transitions LDTACK- and DSr+ is negative, Sep(LDTACK-, DSr+) < 0. Then there is no need for the additional state encoding signal, and the circuit is simplified to that of Figure 10(a).
Assume next that the physical design level tools achieve control over the delay information using gate and transistor sizing, placement and routing, and constraining interconnect delays. Then the logic-level synthesis tools can perform logic optimization while at the same time generating separation constraints that must be implemented by the physical-level tools. For example, it is possible to start the enabling of LDS- right after DSr- instead of D-, given that the requirement Sep(D-, LDS-) < 0 will be satisfied. This requirement is satisfied if the maximal delay of D- is smaller than the minimal possible delay of LDS-, which can be implemented, e.g., by transistor sizing or delay padding. The resulting circuit, corresponding to both timing requirements, is shown in Figure 10(b). Back-annotation to an extended PN with relational timing constraints (so-called lazy PNs) can be done for circuits optimized based on timing information (see Figure 9(b)).
Figure 10: Circuits for the READ cycle after timing optimization.
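The condition on the delays of D- and LDS- can be written as a one-line check. The sketch below assumes that both transitions become enabled at the same instant (for instance, by the same triggering event) and that each has a delay bounded by an interval (min, max); the delay values and the name `max_separation` are hypothetical.

```python
def max_separation(delay_a, delay_b):
    """Upper bound on t(a) - t(b) for two transitions enabled at the same instant.

    Each delay is a (min, max) interval; the worst case is a firing as late
    as possible and b firing as early as possible.
    """
    return delay_a[1] - delay_b[0]


# Hypothetical delay bounds in arbitrary time units.
d_fall = (1, 2)      # delay of D-
lds_fall = (3, 5)    # delay of LDS-, e.g. after delay padding
constraint_met = max_separation(d_fall, lds_fall) < 0
```

When the transitions are enabled by different events, the separation must instead be computed over the timed behavior of the whole net, as in the analysis methods cited above.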
Other Design Techniques
This paper has presented a design methodology based on Petri net specifications of the behavior of a circuit. However, other models have been proposed in the literature. Among them, we can point out the methods based on burst-mode machines [29] and on syntax-directed [2] or transformation-based [18] translation from process algebras. Burst-mode machines work under the so-called fundamental mode assumption, i.e., after each burst of input events accepted by the system, the environment allows the circuit to stabilize before reacting to the output events. This assumption is realistic for many applications and enables the use of combinational logic minimization methods for synchronous circuits, with ad-hoc extensions to prevent hazardous behavior [23].
Translation from process algebras has been proposed for formalisms derived from CSP. Syntax-directed translation derives a netlist of components that implement the behavior of each of the constructs of the language (parallel and sequential composition, choice, communication, synchronization, etc.). The size of the resulting circuit is linearly dependent on the size of the input description. This fact enables designers and tools to predict the circuit's performance and complexity parameters at the earliest steps of the design process.
Other efforts have been devoted to mapping asynchronous specifications into standard HDLs, aiming at simulation and validation with commercial tools [28].
Summary
In the last few years, the techniques for asynchronous design have matured. Among the applications for asynchronous design we can point out asynchronous interfaces, high-performance computing, low-power and low-emission design, etc. There are also applications at the system level, e.g., hardware-software co-design.
Recently there has been increasing interest in asynchronous design from a few, but large, companies (e.g., Intel, Philips, Sharp, ARM, Cogency, SUN, HP), targeting different goals: low power, high performance, etc.
Asynchrony introduces a new paradigm in logic design. Asynchronous circuits are much more difficult to design and, for this reason, it is crucial to provide CAD tools that handle the most difficult tasks automatically. Most of the steps of the design process presented in this tutorial are supported by the tool petrify, available at the DAC paper home URL: http://www.lsi.upc.es/~jordic/petrify.
For a more complete tutorial on PN-based design of asynchronous control circuits we refer to [7]. For further information on asynchronous design, the reader can look at the Asynchronous Logic Home Page, http://www.cs.man.ac.uk/amulet/async/index.html, and the proceedings of the ASYNC Symposiums.
An extended version of this paper can be found in [13].
