Abstract: This paper presents a new algorithm for timed state space exploration, POSET timing. POSET timing improves upon geometric methods by utilizing concurrency and causality information to dramatically reduce the number of geometric regions needed to represent the timed state space. The utility of POSET timing is illustrated by its application to the automatic synthesis and verification of gate-level timed circuits. Timed circuits are a class of asynchronous circuits that incorporate explicit timing information in the specification, which is used throughout the synthesis procedure to optimize the design. Using POSET timing, our synthesis procedure derives a timed circuit that is hazard-free. The circuit uses only basic gates to facilitate the mapping to semi-custom components, such as standard-cells and gate-arrays. The resulting gate-level timed circuit implementations are 30 to 40 percent smaller and 30 to 50 percent faster than those produced using other asynchronous design methodologies. This paper also demonstrates that timed designs can be smaller and faster than their synchronous counterparts. The POSET timing algorithm can efficiently verify not only our synthesized circuits but also a wide collection of large, highly concurrent timed circuits and systems that could not previously be verified using traditional techniques.
I. Introduction
In recent years, there has been a resurgence of interest in the design of asynchronous circuits due to their ability to eliminate clock skew problems, achieve average case performance, adapt to processing and environmental variations, and provide component modularity. Asynchronous circuits can also lower system power requirements by reducing synchronization power, automatically powering down unused components, removing spurious transitions, and easily adjusting to a dynamic power supply. While asynchronous designs have long been used in interface circuits, they are now being considered for the design of low-power embedded controllers and portable devices due to their low-power advantages.
Traditional academic asynchronous design methodologies use unbounded delay assumptions, resulting in circuits that are verifiably correct, but sacrifice timing for simplicity, leading to unnecessarily conservative designs. In industry, however, timing is critical to reduce both chip area and circuit delay. Due to the lack of formal methods to handle timing information correctly, circuits with timing constraints usually require extensive simulation to gain confidence in the design. Our research bridges this gap by introducing timed circuits in which explicit timing information is incorporated into the specification and utilized throughout the design procedure to optimize the implementation. Timed circuits can be significantly smaller and faster than those produced using traditional methods, and they are more reliable than those produced using ad hoc methods. The specification of timing constraints also facilitates a natural interaction between synchronous and asynchronous circuits.
C. Myers is with the Dept. of Electrical Engineering, University of Utah, Salt Lake City, UT 84112. E-mail: myers@ee.utah.edu.
T. Rokicki
Timing considerations, however, often introduce an additional exponential factor of complexity into the design procedure. As a result, timing analysis has hitherto either been avoided [1], [2], [3], simplified [4], [5], or considered only after synthesis [6]. This paper develops an exact and efficient timing analysis algorithm, POSET timing, and applies it to the automatic synthesis and verification of gate-level timed circuits.
In our previous work, an efficient timing analysis algorithm was developed and applied to incorporate timing considerations into the synthesis of timed circuits [7]. We verified that our timed circuit implementations are hazard-free using Burch's timed circuit verifier [8]. This work, however, is not without its limitations. First, since the timing analysis is limited to choice-free specifications, our synthesis procedure could only be applied to a limited class of circuits. Second, these timed circuit implementations require complex gates, making it difficult to use semi-custom components, such as standard-cells and gate-arrays, which are becoming increasingly important to improve time-to-market. Third, the discrete-time verification approach employed by Burch's verifier is limited in its applicability since the number of discrete-time states grows exponentially with respect to the number of concurrent events.
This paper presents a new timing analysis algorithm, POSET timing, which allows us to develop more general and widely applicable procedures for the synthesis and verification of timed circuits. First, it allows us to extend our synthesis and verification procedures to a very general class of specifications, namely any specification which can be translated into a 1-safe timed Petri net. Second, our new synthesis procedure facilitates the mapping of our implementations to semi-custom components by using additional correctness constraints, thereby producing hazard-free timed circuits using only basic gates such as AND gates, OR gates, and C-elements. Our synthesis procedure has been fully automated in a CAD tool and applied to several examples, resulting in gate-level timed circuit implementations which are 30 to 40 percent smaller and 30 to 50 percent faster than designs using other methodologies.
After synthesis, the POSET timing algorithm is used to verify whether the synthesized timed circuit implementation, back-annotated with delays from a given cell library, satisfies its timed specification. Our verification procedure is shown to be able to rapidly verify larger, more concurrent timed circuits and systems than could previously be verified using traditional techniques.
Section 2 describes the initial specification method, timed Petri nets, and how they can be translated to ones which can be analyzed using the POSET timing algorithm. Section 3 describes the POSET timing algorithm and how it is used to explore the timed state space. Section 4 briefly explains the application of POSET timing to the synthesis and verification of timed circuits. Section 5 gives our conclusions.
II. Timed specifications
Timed Petri nets [9] are a natural way to specify timed circuits and systems. They can, however, be difficult to analyze directly. This work uses timed Petri nets as a specification language, and translates them to orbital nets for analysis. Orbital nets are shown later to be efficiently analyzable using the POSET timing algorithm.
A. Timed Petri nets
A 1-safe timed Petri net (TPN) is modeled by the tuple $\langle P, T, F, M_0, \Delta \rangle$ where $P$ is the set of places, $T$ is the set of transitions, $F \subseteq (P \times T) \cup (T \times P)$ is the set of edges, $M_0 \subseteq P$ is the initial marking, and $\Delta : P \rightarrow \mathbb{N} \times (\mathbb{N} \cup \{\infty\})$ is an assignment of timing requirements to places. A marking is a subset of the places. For a place $p \in P$, the preset of $p$ (denoted $\bullet p$) is the set of transitions connected to $p$ (i.e., $\bullet p = \{t \in T \mid (t, p) \in F\}$), and the postset of $p$ (denoted $p \bullet$) is the set of transitions to which $p$ is connected (i.e., $p \bullet = \{t \in T \mid (p, t) \in F\}$). For a transition $t \in T$, the presets and postsets are similarly defined (i.e., $\bullet t = \{p \in P \mid (p, t) \in F\}$ and $t \bullet = \{p \in P \mid (t, p) \in F\}$).
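As an illustration of the tuple just defined, the following minimal Python sketch stores a timed Petri net and derives presets and postsets from the edge set. The class name, field names, and the two-place example net are hypothetical and are not taken from the authors' tool.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class TimedPetriNet:
    places: frozenset           # P
    transitions: frozenset      # T
    edges: frozenset            # F: pairs (place, transition) or (transition, place)
    initial_marking: frozenset  # M_0, a subset of P
    bounds: dict                # place -> (lower, upper); upper may be inf

    def preset(self, node):
        """All nodes x such that the edge (x, node) is in F."""
        return frozenset(x for (x, y) in self.edges if y == node)

    def postset(self, node):
        """All nodes y such that the edge (node, y) is in F."""
        return frozenset(y for (x, y) in self.edges if x == node)


# Hypothetical cycle: p0 -> t0 -> p1 -> t1 -> p0.
net = TimedPetriNet(
    places=frozenset({"p0", "p1"}),
    transitions=frozenset({"t0", "t1"}),
    edges=frozenset({("p0", "t0"), ("t0", "p1"), ("p1", "t1"), ("t1", "p0")}),
    initial_marking=frozenset({"p0"}),
    bounds={"p0": (1, 8), "p1": (2, inf)},
)
```

Because presets and postsets are computed from the edge set on demand, the same two methods serve both places and transitions.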
Timing is associated with a place as a timing requirement consisting of a lower and upper bound. The lower bound is a nonnegative integer and the upper bound is an integer greater than or equal to the lower bound, or $\infty$. Since real values can be expressed as rationals within any required accuracy, restricting the bounds of timing requirements to be integers does not decrease the practical expressiveness of timed Petri nets.
In a timed Petri net, an untimed state is a marking of the net (i.e., $M \subseteq P$). A timed state is an untimed state with a time-valued clock $clk_i$ associated with each marked place $p_i$ (i.e., $(M, CLK) \subseteq P \times \mathbb{R}$). Each clock advances with time and denotes how long the place has been marked.
The behavior specified by a timed Petri net is defined with an operational semantics composed of two types of operations: firing of transitions and advancement of time. For a given timed state $(M, CLK)$, a transition $t$ is untimed enabled if all places in its preset are marked (i.e., $\bullet t \subseteq M$). The set of untimed enabled transitions is denoted $T_e(M)$. A transition is timed enabled when it is untimed enabled and all places in its preset have a clock that is greater than or equal to the lower bound of the timing requirement on the place (i.e., $\forall p_i \in \bullet t : clk_i \ge l_i$). A transition is expired when it is timed enabled and all places in its preset have a clock that is greater than the upper bound of the timing requirement on the place (i.e., $\forall p_i \in \bullet t : clk_i > u_i$). A transition cannot occur until it is timed enabled, and it must occur before it becomes expired. Any timed enabled transition can be fired instantaneously, and any number of transitions can be fired without time advancing. A transition is fired by removing the tokens in the places in its preset and discarding the clocks. The places in the postset of the fired transition are then marked, and all newly marked places are assigned a clock initialized to zero.
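The enabling and firing rules above can be sketched in a few lines of Python. This is only an illustrative sketch: a timed state is represented here as a dictionary from marked places to clock values, and the place names, bounds, and function names are hypothetical.

```python
def untimed_enabled(preset, clocks):
    """True if every place in the preset is marked (has a clock)."""
    return all(p in clocks for p in preset)


def timed_enabled(preset, clocks, bounds):
    """True if untimed enabled and every preset clock has reached its lower bound."""
    return untimed_enabled(preset, clocks) and all(
        clocks[p] >= bounds[p][0] for p in preset
    )


def advance(clocks, delta):
    """Advance time by uniformly increasing every clock."""
    return {p: c + delta for p, c in clocks.items()}


def fire(preset, postset, clocks):
    """Remove preset tokens and their clocks, then mark the postset with clocks at zero."""
    remaining = {p: c for p, c in clocks.items() if p not in preset}
    remaining.update({p: 0 for p in postset})
    return remaining


# Hypothetical transition with two preset places and one postset place.
bounds = {"pa": (1, 8), "pb": (2, 10), "pc": (0, 5)}
preset, postset = {"pa", "pb"}, {"pc"}
state = {"pa": 3, "pb": 1}

before = timed_enabled(preset, state, bounds)   # pb has not reached its lower bound
later = advance(state, 1)
after = timed_enabled(preset, later, bounds)
next_state = fire(preset, postset, later)
```

In this sketch the limit on how far time may advance is not enforced; that check is the role of max-advance.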
Time is advanced by uniformly increasing the clocks associated with the places by an amount which is less than or equal to max-advance for a given timed state. The function max-advance is defined as the maximum amount of time that can be allowed to advance before some transition would become expired. More formally, for a given timed state $(M, CLK)$, max-advance is defined as follows:
$$\textit{max-advance}(M, CLK) = \min_{t \in T_e(M)} \Big\{ \max_{p_i \in \bullet t} \{ u_i - clk_i \} \Big\}.$$
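The definition of max-advance can be read directly as a nested minimum and maximum. The sketch below assumes that each transition has a non-empty preset and reuses a dictionary of clocks as the timed state; all names and numbers are illustrative.

```python
from math import inf


def max_advance(presets, clocks, bounds):
    """Minimum over untimed enabled transitions of the largest slack in the preset.

    presets: dict transition -> set of preset places (assumed non-empty)
    clocks:  dict marked place -> clock value
    bounds:  dict place -> (lower, upper)
    """
    marked = set(clocks)
    enabled = [ps for ps in presets.values() if ps <= marked]
    return min(
        (max(bounds[p][1] - clocks[p] for p in ps) for ps in enabled),
        default=inf,  # no enabled transition: time may advance without limit
    )


# Hypothetical net: t needs both pa and pb, u needs only pa.
bounds = {"pa": (1, 8), "pb": (2, 10)}
presets = {"t": {"pa", "pb"}, "u": {"pa"}}

both = max_advance(presets, {"pa": 0, "pb": 0}, bounds)
only_t = max_advance({"t": {"pa", "pb"}}, {"pa": 0, "pb": 0}, bounds)
none_enabled = max_advance(presets, {"pb": 0}, bounds)
```

For transition t alone the result is governed by the slower place pb, since t only expires once every preset clock has passed its upper bound.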
These semantics define the set of timed firing sequences, TFS, each of which is a sequence of pairs of transition firings and time values. For simplicity, the time value represents a nonnegative duration since the previous pair. Executing a timed firing sequence $\sigma$ on a timed Petri net results in the timed state $fire(\sigma)$. The set TFS is defined recursively. The empty sequence $\varepsilon$ is in TFS. For every firing sequence $\sigma$ in TFS and for every value of $\tau$ such that $\tau \le \textit{max-advance}(fire(\sigma))$, the sequence $\sigma; (\emptyset, \tau)$ is in TFS, where $\emptyset$ represents an 'empty' firing. In addition, if a transition $t$ is timed enabled in $fire(\sigma)$, then $\sigma; (t, 0)$ is also in TFS. The reachable timed state space, TS, is the range of the function $fire$ over TFS.
B. Orbital nets
In order to use the POSET timing algorithm for timed state space exploration, it is necessary to translate a timed Petri net into an orbital net representation. An orbital net, while similar to a timed Petri net, has several key differences which facilitate the development of efficient timing analysis algorithms. Orbital nets are essentially labeled 1-safe Petri nets extended with automatic net constructions and syntactic shorthands for composition and receptiveness [10]. The net constructions allow us to have relatively straightforward operational semantics, while the syntactic shorthands allow us to compose the nets without an exponential blowup in net size. These features are described in detail in [10]. Orbital nets also allow simultaneous actions (i.e., transitions labeled with sets of actions). These allow us to easily mix behavior and environmental requirements even at the gate model level.
The key difference between orbital nets and timed Petri nets is that timing requirements can be either behavioral (b) or constraint (c) (i.e., $\langle l, u \rangle type$), and that only a single behavioral place can be in the preset of any transition.
The timing bounds associated with a behavioral place are used to specify guaranteed timing behavior. The timing requirements associated with a constraint place are used to specify desired timing behavior, and they do not affect the actual timing behavior. If the timing requirement on a place is omitted, it is assumed to be $\langle 0, \infty \rangle c$. For a given orbital net, the set $B$ is the subset of the places in $P$ which are of type behavioral, and the set $C$ is the set of constraint places.
The POSET timing algorithm described later relies on the fact that each behavioral place represents a single nondeterministic choice of delay that cannot be affected by other behavioral places. In other words, the delay between a transition in the preset of a behavioral place and a transition in the postset of the same place should always fall between the lower and upper bound of the timing requirement on this place. This property simplifies the function max-advance to be the minimum difference over all marked behavioral places between the upper bound of the timing requirement on the place and its clock, or $\infty$ if there are no marked behavioral places. More formally, for a given timed state $(M, CLK)$, max-advance is defined as follows:
$$\textit{max-advance}(M, CLK) = \begin{cases} \min_{p_i \in M \cap B} \{ u_i - clk_i \} & \text{if } M \cap B \neq \emptyset \\ \infty & \text{otherwise.} \end{cases}$$
With this definition of max-advance, a simple all-pairs shortest-path algorithm (Floyd's algorithm) can be at the core of the timed state space exploration algorithm. When there are multiple behavioral places in the preset of a transition, the timing semantics allow the delay between the transition in the preset and postset of one of the behavioral places to exceed the upper bound on its timing requirement when the transition is being constrained by another behavioral place. To avoid this problem, orbital nets are restricted to include at most a single behavioral place in the preset of each transition.
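For orbital nets, the simplified max-advance only inspects marked behavioral places. The following sketch uses the think-time bounds of the two consumers from the running example of Section III (1 to 8 for a, 2 to 10 for b); the place names and the choice to model the resource places as unbounded constraint places are assumptions made for illustration.

```python
from math import inf


def max_advance_orbital(clocks, bounds, behavioral):
    """Smallest remaining slack over marked behavioral places, or inf if there are none."""
    return min(
        (bounds[p][1] - clocks[p] for p in clocks if p in behavioral),
        default=inf,
    )


bounds = {"Pa": (1, 8), "Pb": (2, 10), "Pfi": (0, inf), "Psi": (0, inf)}
behavioral = {"Pa", "Pb"}  # Pfi and Psi are treated as constraint places here

initial = max_advance_orbital({"Pa": 0, "Pb": 0, "Pfi": 0, "Psi": 0}, bounds, behavioral)
no_behavioral = max_advance_orbital({"Pfi": 0, "Psi": 0}, bounds, behavioral)
```

With both consumer clocks at zero the result is 8, the upper bound on the think time of consumer a.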
In an orbital net, a transition is timed enabled when it is untimed enabled and, if there is a behavioral place in its preset, this place's clock is greater than or equal to the lower bound of the timing requirement on the place. Before firing a transition, however, the constraint places in the entire net must be checked, and if any constraint place $p_i$ is marked with $clk_i > u_i$, this firing is marked as a failure. Also, the clocks corresponding to a marking that is removed from a constraint place $p_i$ must be checked, and if $clk_i < l_i$, this firing is also marked as a failure. Finally, after the firing of a transition, every marked behavioral place must have a transition in its postset that is untimed enabled in the new state; if this condition is not satisfied, this firing is a failure. This requirement ensures that every marked behavioral place can fire in all states in which its timing conditions are met, and thus the value of its clock when it fires cannot be controlled by external state. If a failure is detected during synthesis, the specification is inconsistent and must be modified before an implementation can be obtained. If a failure is detected during verification, the timed circuit violates its specification.
C. Translation from timed Petri nets to orbital nets
If timed Petri nets are translated to orbital nets in the obvious way of marking all places as behavioral, then most specifications of interest would naturally have multiple behavioral places in the preset of some transitions. Fortunately, the original timed Petri net specification can always be transformed into an orbital net which satisfies the single behavioral place requirement. This transformed net can then be analyzed to find the reachable states using our efficient timing analysis algorithm. Therefore, the POSET timing algorithm can be applied to any 1-safe timed Petri net.
The transformation from an arbitrary timed Petri net into an orbital net which satisfies the single behavioral place requirement is completed in two steps. The first step in the transformation labels all places of the original timed Petri net as behavioral places (including those that have $\langle 0, \infty \rangle$ timing requirements). After step one, the net satisfies the single behavioral place requirement if and only if for every transition in the original timed Petri net there is only one place in its preset. If at least one transition has two or more places in its preset, then the second step must be performed.
To illustrate the second step, consider a fragment of a timed Petri net that has two places in the preset of a transition, shown in Figure 1(a). The desired timing behavior can be depicted graphically as shown in Figure 1(b). This net can be transformed to the orbital net shown in Figure 2(a), which satisfies the single behavioral place requirement. The idea behind this transformation is that a path through the net is created for each possible ordering of the transitions in the preset. This has the effect that each transition in the preset is given the chance to be the one controlling the firing time of the transition in the postset. For illustration purposes, additional events $c_0$ and $c_1$ are added to the net to occur simultaneously with the two transitions associated with $c$. Note that every place from the original net is behavioral while the newly added places are constraint (i.e., $\langle 0, \infty \rangle c$). The timing behavior of $c_0$ and $c_1$ is shown graphically in Figure 2(b) and (c), respectively. The behavior of these two together is exactly the desired timing behavior of $c$. For $n$ behavioral places, the net is transformed to model the $n!$ possible orderings of the $n$ enabling events. While this transformation can lead to a substantial blowup in the net size ($O(2^n)$ times in the worst case), making the timing analysis more complex, we have found that the value of $n$ tends to be quite small in practical examples.
The transformation is a bit more complicated in the case that one of the behavioral places in the preset has multiple transitions in its postset. In this case, the transformation described above is first applied to each of the transitions in the postset individually. Then, the common transitions and places are stitched together as shown in Figure 3.
It is important to note that an orbital net has both a static and a dynamic restriction. The static restriction is the single behavioral place requirement. The dynamic restriction is that whenever a behavioral place is marked during state space exploration, a sufficient number of constraint places must also be marked such that a transition in the postset of the behavioral place is enabled. Fortunately, the net transformation procedure described above restricts the use of constraint places such that this is always the case. If, however, a designer adds additional constraint places to check various timing properties, then it is possible that the dynamic restriction may be violated.
D. Reachability graphs and state graphs
The goal of timed state space exploration is to find the set of reachable states for the system being analyzed. The reachable untimed state space for a TPN can be represented as a reachability graph (RG). An RG is a graph whose vertices are untimed states (i.e., markings) and whose edges are possible state transitions.
For synthesis, it is useful to be able to determine in which states a signal is untimed enabled to rise or fall. The sets $rise(u)$ and $fall(u)$ provide this information and are defined as follows:
An excitation region for signal $u$ is defined as a maximally connected subset of either $rise(u)$ or $fall(u)$. If it is a subset of $rise(u)$, it is called a set region, and it is denoted $ER(u\uparrow, k)$, where $k$ indicates that it is the $k$th set region. Similarly, a reset region is denoted $ER(u\downarrow, k)$. For each signal $u$, there are two sets of stable, or quiescent, states.
There is the set of states where the signal $u$ is stable high, denoted $QS(u\uparrow)$ (i.e., $QS(u\uparrow) = \{M \in \Phi \mid \lambda(M)(u) = 1 \wedge M \notin fall(u)\}$), and the set where it is stable low, denoted $QS(u\downarrow)$ (i.e., $QS(u\downarrow) = \{M \in \Phi \mid \lambda(M)(u) = 0 \wedge M \notin rise(u)\}$).
A portion of the timed Petri net and SG for a controller for a port selector (SEL) is shown in Figure 4. The SEL example accepts data and a port selection and forwards the data out the selected port. In the initial state, all signals are low and xfer_i is enabled to rise. After xfer_i rises, sel_o and data_o become enabled to rise. Assuming that data_o rises first, in the next state data_i and sel_o are both enabled to rise. However, the maximum delay for sel_o to rise is 20 while the minimum delay for data_i to rise is 40. Therefore, the only possible next transition is for sel_o to rise, and the timing information has removed a possible state transition. The final state graph obtained for the SEL contains 53 states. A state graph generated ignoring all the timing information contains 256 states. The size of the SG and the complexity of the circuitry are strongly correlated. For the SEL example, our synthesis procedure derives a gate-level timed circuit implementation with 27 literals, shown in Figure 5(a). If all the timing information is ignored, synthesis produces a gate-level speed-independent circuit implementation with 44 literals, shown in Figure 5(b). Besides being nearly 40 percent smaller, the timed circuit has reduced latency since it requires gates with at most 3 inputs while the speed-independent circuit requires many large gates, including one with 6 inputs. While in this example it is fairly easy to see the timing relationships between the signal transitions, in general it requires complex analysis. The next section describes a new efficient algorithm for timed state space exploration.
III. Timed state space exploration
The basic idea behind synthesis and verification methods that use explicit state space exploration is that only a finite subset of the firing sequences needs to be considered to compute the complete set of reachable states if the reachable state space is finite or has a finite representation. In orbital nets, the clocks associated with each marking can take on real values, so there are an infinite number of timed states. In order to perform explicit state space exploration, the state space exploration algorithm must either group the timed states into a finite number of equivalence classes or sets, or restrict the set of values that the clocks can attain. This section describes three previously proposed techniques for timed state space exploration: unit-cubes, discrete-time, and geometric timing. Then, it introduces our proposed technique, POSET timing (also known as partial order timing in [10]), which improves upon the geometric methods by making use of concurrency and causality information.
The example in Figure 6 motivates this presentation. This example models two consumers, a and b, and two resources, s (slow) and f (fast). Consumer a thinks for 1 to 8 units of time, then acquires either the slow ($as\uparrow$) or fast ($af\uparrow$) resource. After using the resource, it returns to its initial state. Consumer b is identical, but its think time is from 2 to 10 units of time. Each use of resource s takes from 8 to 10 units of time, and each use of resource f takes from 1 to 3 units of time. While this is presented as an orbital net, it is easily modeled by Alur's timed automata [11], Henzinger's timed transition systems [12], or most other operational models of timed systems.
Alur's unit-cube technique has the best known worst-case complexity for timed state space exploration of general timed systems [11]. This technique enables finite state space exploration by dividing the timed state space for a particular untimed state into equivalence classes of timed states. These equivalence classes are defined such that each component of the execution of a timed automaton (the execution of a transition or the advancement of time) maps an entire equivalence class into one or more successor equivalence classes, and each element of the equivalence class maps into each successor equivalence class. In addition, all enabling conditions of transitions and other allowed observable characteristics are defined identically for the entire equivalence class. Thus, it is possible to explore the entire timed state space by enumerating the equivalence classes.
Each equivalence class consists of an untimed state, the set of clocks that have exceeded the maximum value they are compared against, the set of integer portions of the clock values, and an ordering of the fractional portions of the clock values. This ordering is of the form $(0 \; rel \; f_{i_1} \; rel \; f_{i_2} \ldots f_{i_n})$ where $(i_1 \ldots i_n)$ is some permutation of $(1 \ldots n)$, $f_j$ represents the fractional portion of clock $j$, and $rel$ is one of $<$ or $=$. For the case where there are two marked places and two clocks $clk_1$ and $clk_2$, assuming neither clock has exceeded the maximum value it has been compared against, the equivalence classes are pictured in Figure 7(a); every point, line segment, and interior triangle is an equivalence class. The points are the cases where the fractional components of both clocks are zero; the horizontal and vertical line segments are the cases where one clock has a zero fractional component and the other has a nonzero fractional component; the diagonal line segments are the cases where both clocks have the same nonzero fractional component; and the interior triangles are the cases where each clock has a distinct fractional component. While it is sufficient to store only those equivalence classes that occur after the firing of a transition, the number of such equivalence classes still explodes in practice. For our example, there are a total of 2,463 unit-cube equivalence classes in the total timed state space. Of course, if the timing values are increased, this number rises significantly.
It has been proven, however, that the general unit-cube technique is unnecessary for orbital nets since considering only integer event times gives a full characterization of the continuous-time behavior [10] (this proof is similar to one given by Henzinger et al. in [13] for timed transition systems). In other words, only timed states associated with each discrete-time instance, represented as a point for the two-dimensional case in Figure 7(b), need be considered.
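To make the unit-cube equivalence classes concrete, the sketch below maps a clock valuation to a hashable key built from the components listed above (exceeded clocks, integer portions, and the ordering of fractional portions). Exact rationals are used to avoid floating-point ties; the function name, clock names, and maximum constants are hypothetical.

```python
from fractions import Fraction
from math import floor


def region_key(clocks, max_const):
    """Return a key that is equal for two valuations in the same unit-cube class.

    clocks:    dict clock name -> Fraction value
    max_const: dict clock name -> largest constant the clock is compared against
    """
    exceeded = frozenset(c for c, v in clocks.items() if v > max_const[c])
    live = {c: v for c, v in clocks.items() if c not in exceeded}
    int_parts = tuple(sorted((c, floor(v)) for c, v in live.items()))
    # Group clocks with equal fractional parts, then order the groups ascending.
    groups = {}
    for c, v in live.items():
        groups.setdefault(v - floor(v), []).append(c)
    order = tuple(
        (frac == 0, tuple(sorted(names))) for frac, names in sorted(groups.items())
    )
    return exceeded, int_parts, order


limits = {"x": 10, "y": 10}
triangle_a = region_key({"x": Fraction(1, 4), "y": Fraction(1, 2)}, limits)
triangle_b = region_key({"x": Fraction(1, 3), "y": Fraction(3, 4)}, limits)
other_triangle = region_key({"x": Fraction(1, 2), "y": Fraction(1, 4)}, limits)
diagonal = region_key({"x": Fraction(1, 2), "y": Fraction(1, 2)}, limits)
```

The first two valuations fall in the same interior triangle (the fractional part of x is below that of y), while swapping the order or equating the fractions yields a different class.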
This technique is used by Burch for verifying timed circuits [8], and has a worst-case state space size of $|S|(k+1)^n$, which is better than the unit-cube method by more than a factor of $n!$. The essential restriction in orbital nets that allows this optimization is that clocks are only compared with constants using '$\le$' or '$\ge$', rather than '$<$' or '$>$'; this can be justified for real systems by the imprecise nature of physical time. State space exploration in discrete time is a simplification of that in Alur's unit-cube method in that the ordering relation of the fractional components of the clock is dropped, as is the possibility of time advancing in non-integral units. Our example yields a total of 487 discrete timed states; again, this number rises significantly as the timing requirements increase.
Both unit-cube and discrete-time methods, however, are of little more than theoretical interest because the size of the state space increases exponentially with the concurrency in the net. For a circuit with timing values accurate to two significant digits, with up to six independent concurrent pending events, the state space is easily in excess of $10^{12}$ states, well beyond the capabilities of most finite-state synthesis and verification techniques.
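The growth described here follows from the discrete-time bound $|S|(k+1)^n$. The short sketch below evaluates that bound, reading $k$ as the largest timing constant and $n$ as the number of concurrently running clocks; this reading, and the choice of $k = 99$ to stand for two significant digits, are assumptions made for illustration.

```python
def discrete_state_bound(untimed_states, k, n):
    """Worst-case number of discrete timed states: |S| * (k + 1) ** n."""
    return untimed_states * (k + 1) ** n


# Two significant digits (integer values 0..99) and six concurrent pending events,
# counted for a single untimed state.
per_untimed_state = discrete_state_bound(1, 99, 6)

# Adding one more concurrent event multiplies the bound by k + 1.
growth = discrete_state_bound(1, 99, 7) // per_untimed_state
```

Under these assumptions a single untimed state already accounts for $10^{12}$ discrete timed states, and each additional concurrent event multiplies the bound by 100.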
Other efficient techniques have been developed [14], [15], [16] to find a single exact time separation between two events in various types of specification methods. However, each time a separation is needed during state space exploration, these techniques reanalyze the complete graph, so using them to compute all of the possible separations in a graph is slow, making them impractical for timed state space exploration.
Another approach to represent timed states is to use convex geometric regions (or zones) as shown in Figure 7(c). Even though the worst-case performance is much worse than either the unit-cube or the discrete-time approaches, this approach usually performs well in practice. Dill ... [26]. This algorithm, however, reduces verification time by exploring only part of the timed state space. This may limit the timing properties that can be verified. Furthermore, the entire timed state space is needed for timed circuit synthesis. While reducing the number of interleavings is useful, one region is still required for every firing sequence explored to reach a state. If most interleavings need to be explored, this technique could still result in state explosion.
This section describes POSET timing, which improves upon the geometric methods by making use of concurrency and causality information. This is accomplished through the exploration of partially ordered sets of events as opposed to linear sequences.
A. Geometric regions
Rather than consider at each step a single discrete-time state or a minimum equivalence class of timed states, the geometric timing method considers a larger set of timed states in parallel. Specifically, convex geometric regions of timed states are used to represent the timed state space. These regions are described by a set of constraints which are either lower and upper bounds on specific clock values or differences between pairs of specific clock values. These constraints are usually encapsulated in a matrix (sometimes known as a difference bound matrix).
B. State space exploration with geometric timing
Each geometric region can be considered as an infinite set of timed states which are operated on in parallel. In order to perform state space exploration using geometric timing, this section redefines the operational semantics of orbital nets in terms of these geometric regions as opposed to individual timed states. The aspects of state space exploration that do not consider time are not discussed, since they are the same in both cases. This section describes how these operations work for a single step in a timed sequence, assuming they work for the predecessor sequence; the trivial base case and structural induction on sequences complete the proof that these operations work for all sequences.
Figure 8 shows how geometric timing works on our example. The first column shows the untimed state, the second shows the geometric region before advancing time, and the third shows the region after advancing time. The regions are shown both as constraint matrices and regions in space. The 0th row and column of each matrix represent the time relationships between the clocks on the places and the fictitious clock, $clk_0$, which is always 0. In other words, the 0th row is the maximum value of the corresponding clock, and the 0th column is the negative of the minimum value. The other entries represent time relationships between the clocks.
For example, in row 1, the entry in the second row and third column of the first matrix represents the relationship $clk_a - clk_s \le 8$.
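A constraint matrix of this kind can be tightened to a canonical form with an all-pairs shortest-path pass. The sketch below assumes the convention that entry `m[i][j]` is an upper bound on $clk_j - clk_i$, with index 0 standing for the fictitious clock $clk_0$; it uses the plain cubic Floyd algorithm rather than any incremental quadratic update, and the example region is hypothetical.

```python
from math import inf


def canonicalize(m):
    """Tighten every bound m[i][j] >= clk_j - clk_i via all-pairs shortest paths."""
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m


def is_empty(m):
    """A canonical matrix with a negative diagonal entry describes an empty region."""
    return any(m[i][i] < 0 for i in range(len(m)))


# Hypothetical region over clocks (clk_0, x, y):
#   0 <= x <= 5, 0 <= y <= 9, y - x <= 2, and no explicit bound on x - y.
region = [
    [0, 5, 9],
    [0, 0, 2],
    [0, inf, 0],
]
tight = canonicalize(region)

# Adding the contradictory constraint x >= 6 (stored as clk_0 - x <= -6) empties the region.
contradictory = [row[:] for row in region]
contradictory[1][0] = -6
empty = is_empty(canonicalize(contradictory))
```

Canonicalization lowers the bound on y from 9 to 7 (since x is at most 5 and y exceeds x by at most 2) and derives the implicit bound x - y <= 5.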
The initial state of our example has tokens in $(P_a, P_b, P_{fi}, P_{si})$, with clock values of 0 for the tokens in $P_a$ and $P_b$. Before advancing time, the region is represented as a single point at the origin, as shown in the leftmost region for the initial state in Figure 8. In our original operational semantics, advancing time involves adding some number $t$ to all clocks. For geometric regions, advancing time involves extruding the geometric region in the $clk_1 = clk_2 = \cdots = clk_n$ direction, subject to max-advance, which itself is a convex region. To perform this operation, the algorithm simply sets the maximum bounds on each clock (the first row of the constraint matrix) to their maximum values (if the places are behavioral places) or infinity (if the places are constraint places). The algorithm then recanonicalizes the matrix in time $O(n^2)$. The resulting matrix and geometric region are shown in the rightmost columns of Figure 8. In row 0, the upper bound on $P_a$ is 8 and on $P_b$ is 10. Since both of their clocks are initially zero, max-advance returns 8.
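The time-advance step can be sketched on a small constraint matrix as follows. The sketch assumes that `m[i][j]` bounds $clk_j - clk_i$ with index 0 as the fictitious clock, tracks only the two behavioral clocks of the initial state, and recanonicalizes with the full cubic Floyd pass instead of the quadratic update mentioned in the text.

```python
def canonicalize(m):
    """Tighten every bound m[i][j] >= clk_j - clk_i via all-pairs shortest paths."""
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m


def advance_time(m, upper):
    """Relax the 0th row to each clock's upper bound, then recanonicalize.

    upper[j] is the upper bound of the place owning clock j
    (float('inf') for a constraint place); upper[0] is unused.
    """
    m = [row[:] for row in m]
    for j in range(1, len(m)):
        m[0][j] = upper[j]
    return canonicalize(m)


# Clocks (clk_0, clk_a, clk_b), all zero: a single point at the origin.
origin = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
after = advance_time(origin, [0, 8, 10])
```

Because both clocks started together, the difference constraints keep them equal, so the bound of 10 on $clk_b$ tightens to 8, which agrees with max-advance returning 8 for the initial state.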
Determining whether a particular transition is timed enabled in our original operational semantics entails comparing the clocks with the timing requirements. With geometric regions, the timing analysis algorithm determines the subset of the timed states in the region for which the particular transition is enabled; this is called the enabling region. This can be calculated by introducing the enabling conditions on the transition selected for firing as additional constraints on the region, and recanonicalizing. For orbital nets, these conditions are always a single new minimum value on that behavioral place, and the algorithm can recanonicalize in time $O(n^2)$. The dashed lines on the geometric diagrams in Figure 8 show, for each behavioral place, the minimum firing time constraint added by that place. Thus, for example, in the initial region, any transition in the postset of either $P_a$ or $P_b$ can fire, since introducing the minimum firing times in either case produces a non-empty region.
After selecting an enabled transition, firing that transition involves removing some set of clocks and introducing new clocks initialized to zero. With geometric regions, removing these clocks involves projection of the system of constraints to eliminate a particular set of variables, and introducing new clocks is done by adding a new set of variables equal to zero. For example, to obtain the geometric region in row 1 (before advancing time), the algorithm introduces the minimum firing constraint for $P_b$ and then projects the region onto the y axis (representing the elimination of the token from $P_b$). The algorithm then introduces a new clock for the token in place $P_s$, and initializes it to zero. At this point, time is again advanced by setting the maximum bounds on each clock to their maximum values. In this region, af↑ is enabled because there is some subset of the region above the dotted line representing the token in place $P_a$, but bs↓ is not enabled because the introduction of the minimum firing time of place $P_s$ yields an empty region.
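The two operations used when firing a transition can be sketched as follows, again assuming a matrix in which `d[i][j]` bounds $clk_j - clk_i$ and index 0 is the zero reference. On a canonical matrix, projecting out a clock amounts to deleting its row and column; a new clock equal to zero is added as a fresh row and column tied to the reference. The helper names are hypothetical, and the starting matrix is the enabling region obtained by requiring the clock of $P_b$ to be at least 2.

```python
import math

def canonicalize(d):
    """Floyd-Warshall closure (general O(n^3) form)."""
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def project_out(d, clock):
    """Eliminate one clock from a canonical matrix by dropping its row/column."""
    return [[v for j, v in enumerate(row) if j != clock]
            for i, row in enumerate(d) if i != clock]

def add_zero_clock(d):
    """Append a new clock whose value is exactly zero."""
    n = len(d)
    r = [row + [math.inf] for row in d] + [[math.inf] * n + [0]]
    r[0][n] = 0   # new clock <= 0
    r[n][0] = 0   # new clock >= 0
    return canonicalize(r)

# Canonical enabling region: clocks of P_a (1) and P_b (2) equal, in [2, 8].
enabling = [[0, 8, 8],
            [-2, 0, 0],
            [-2, 0, 0]]
after_firing = add_zero_clock(project_out(enabling, 2))
print(after_firing)
```

In the result, the remaining clock of $P_a$ lies in [2, 8] and the new clock (standing for $P_s$) is 0, so the new clock trails the old one by at least 2 time units.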
While unit-cube and discrete-time methods operate on timed firing sequences, geometric timing operates over untimed firing sequences. For each untimed firing sequence that the orbital net might execute, geometric timing computes directly the full set of reachable states of all possible timed sequences that have the same underlying untimed sequence.
There are many different options for storing the geometric regions and deciding when to backtrack. In general, for every explored untimed state, there are one or more geometric regions that have been seen for that state. For instance, the untimed firing sequence [bs↑] leads to the same untimed state as [bs↑, af↑, af↓] (compare rows 1 and 3 in Figure 8), yet the two sequences yield different geometric regions. One option, perhaps the simplest, is to hash each geometric region into the state table and only backtrack when an identical geometric region is seen. For our example, the total number of geometric regions for the 7 reachable untimed states using this technique is 234, for an average of 33 regions per untimed state. The number of transitions fired in our depth-first exploration was 518.
Another option for state space exploration is to maintain a list of seen geometric regions for each untimed state, and compare each new geometric region against the entire list using the subset operation. It is possible to check if one geometric region is a subset of another in time $O(n^2)$. Then, the search can backtrack when a newly found region is a subset of a previously seen region. Conversely, if the newly found region is a superset of a previously seen region, then any pending state exploration from the previously seen region can be canceled, and the previously seen region can be removed from the list. In our example, using this technique, 20 different geometric regions are found for the 7 untimed states in 78 transition firings.
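A minimal sketch of this bookkeeping is given below. It assumes that regions are non-empty canonical bound matrices of equal size, for which containment reduces to an entry-wise comparison in $O(n^2)$; the function names and the one-clock example regions are hypothetical.

```python
def is_subset(a, b):
    """True if canonical region a is contained in region b (entry-wise <=)."""
    return all(x <= y for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def record_region(seen, new):
    """Update the list of regions seen for one untimed state.

    Returns (updated_list, explore): explore is False when the search
    should backtrack because new is covered by an earlier region.
    """
    if any(is_subset(new, old) for old in seen):
        return seen, False
    kept = [old for old in seen if not is_subset(old, new)]  # drop subsumed
    kept.append(new)
    return kept, True

# One-clock regions: entry [0][1] is the upper bound, -[1][0] the lower bound.
small = [[0, 5], [-2, 0]]   # clock in [2, 5]
big = [[0, 8], [0, 0]]      # clock in [0, 8]

seen, go1 = record_region([], small)
seen, go2 = record_region(seen, big)     # big replaces small
seen, go3 = record_region(seen, small)   # small is now covered: backtrack
print(seen, go1, go2, go3)
```

Canceling the pending exploration of a removed region is left out here; in a depth-first search it would require marking the removed region so that its unexplored successors are skipped.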
Many other optimizations are possible and have been explored by the authors. Some of them yield reasonable savings in runtime, but usually by less than a factor of two; in the face of the combinatorial explosion of timed state space exploration in general, these improvements are typically too small to be worth detailing in this paper.
C. Drawback of geometric timing
While geometric timing can be very efficient, with more concurrent examples, such as the adverse example, adv4x40, shown in Figure 9, the number of geometric regions can rise astronomically. Although this example has only a single untimed state, standard geometric timing techniques generate 219,977,777 distinct geometric regions. This is more than the number of discrete-time states!
The main difficulty with all of the geometric timing techniques is the large number of geometric regions that can exist for each untimed state. The POSET timing technique introduced next tends to dramatically reduce the number of geometric regions for each untimed state, typically down to an average of one or two.
D. Concurrency, causality, and posets
The major source of blowup in the adverse example is the way the standard geometric timing algorithm calculates the set of timed states reachable from a sequence of transition firings; the transition firings are linearly ordered, even if they are concurrent in the system being evaluated. That is, if two concurrent transitions start clocks, the constraints between the two clocks reflect the linear order in which the transitions are fired in the original sequence. For example, when the geometric timing algorithm analyzes the untimed firing sequence [a; b], it obtains the upper geometric region shown in Figure 10, and when the algorithm considers the sequence [b; a], it obtains the lower geometric region. In general, if there are $n$ concurrent transitions that reset clocks visible in the resulting timed state, there are $n!$ different sequences that need to be considered, each of which leads to a distinct geometric region. For this reason, it is important to distinguish the causal ordering of transitions from the non-causal ordering that comes about from the selection of a particular firing sequence.
To solve this problem, our POSET timing algorithm constructs a partially ordered set, or poset, for each untimed firing sequence, which is represented with an acyclic, choice-free unfolding of the original orbital net. The poset reflects the concurrency and causality inherent in the firing sequence. Initially, the unfolded net representing the poset contains a single transition with places in its postset corresponding to each initially marked place. Transitions are added in the same order as they occur in the firing sequence. For each transition in the firing sequence, a correspondingly labeled transition is added to the unfolded net. A set of arcs into the transition is connected from the most recently added places in the unfolded net corresponding to places in the preset of the transition in the original orbital net. Finally, a new set of places corresponding to the places in the postset of the transition in the original net is added, and these places are connected to the new transition. Every place and every transition in the unfolded net, except the first, corresponds to some place and some transition in the original net. Every place and every transition in the original net corresponds to zero or more places and transitions in the unfolded net.
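The construction just described can be sketched as follows, assuming a 1-safe net given as a dictionary from transition names to (preset, postset) pairs. The representation of the poset as a set of causal arcs between transition occurrences, the name `unfold`, and the three-transition example net are assumptions for illustration only.

```python
from collections import defaultdict

def unfold(net, initial_marking, sequence):
    """Build the unfolded net (poset) for one untimed firing sequence.

    Returns the set of transition occurrences and the set of arcs
    (producer occurrence, place, consumer occurrence).
    """
    count = defaultdict(int)
    reset = ("reset", 0)
    producer = {p: reset for p in initial_marking}  # most recent place copies
    events, arcs = {reset}, set()
    for t in sequence:
        pre, post = net[t]
        occ = (t, count[t])          # occurrence index of this firing
        count[t] += 1
        for p in pre:                # consume the most recent copy of p
            arcs.add((producer.pop(p), p, occ))
        for p in post:               # add fresh copies of the postset places
            producer[p] = occ
        events.add(occ)
    return frozenset(events), frozenset(arcs)

# Hypothetical net: a and b are concurrent, c causally follows a.
net = {"a": ({"pa"}, {"qa"}),
       "b": ({"pb"}, {"qb"}),
       "c": ({"qa"}, {"r"})}
interleavings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
posets = {unfold(net, ["pa", "pb"], seq) for seq in interleavings}
print(len(posets))
```

All three interleavings produce the same set of occurrences and arcs, so they collapse to a single poset; only the causal arc from a to c is recorded, not the arbitrary order between b and the other two transitions.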
A poset explicitly represents the concurrency in a particular firing sequence. That is, a particular poset corresponds to many different firing sequences that differ only in the interleavings of concurrent transitions; every such firing sequence fires the same set of transitions and leads to the same final untimed state. For example, the poset represented with the unfolded net shown in Figure 11 corresponds to both firing sequences [a; b] and [b; a].

E. State space exploration with POSET timing

State space exploration proceeds just as it does for the geometric method, except that, for each sequence, the algorithm constructs the corresponding unfolded net. With depth-first search, this is done incrementally. The algorithm also incrementally calculates a poset matrix that stores the firing time relationship among the transitions. For each place $p$ in the poset, there is a unique occurrence of a transition in its preset, $(\bullet p, i)$, and in its postset, $(p\bullet, j)$. Note that $i$ and $j$ are the occurrence indices for these transitions. For each constraint place $p$, the constraint $\tau(\bullet p, i) \le \tau(p\bullet, j)$ is introduced, where the function $\tau(\cdot)$ is the time of the occurrence of the given transition. For each behavioral place $p$ in the resulting unfolded net with a timing requirement of $\langle l, u \rangle_b$, two constraints are introduced. The first reflects the minimum separation, $\tau(\bullet p, i) - \tau(p\bullet, j) \le -l$. The second reflects the maximum separation, $\tau(p\bullet, j) - \tau(\bullet p, i) \le u$. All constraints introduced in this fashion for a given unfolded net must be satisfied. This poset matrix can then be used to produce a geometric region which, after canonicalizing, represents the full set of reachable states for the poset corresponding to the unfolded net. Applying this procedure to the unfolded net shown in Figure 11, the POSET timing algorithm obtains at once the geometric region which encloses both regions shown in Figure 10. Figure 12 shows a poset from our example, and Figure 13 shows the geometric regions found by the evaluation of this poset.
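To illustrate how the separation constraints combine, the sketch below builds a poset matrix for three transition occurrences: a reset followed by two concurrent transitions a and b. Entry `m[i][j]` is taken to be an upper bound on the firing time of occurrence `j` minus that of occurrence `i`. The matrix layout, the function name, and the delay range of 1 to 3 for a are assumptions; the range of 2 to 10 for b is the one used for bs↑ in the example.

```python
import math

def poset_matrix(n_events, behavioral_places):
    """Close the separation constraints of an unfolded net.

    behavioral_places: tuples (pre_event, post_event, lower, upper) giving
    lower <= time(post_event) - time(pre_event) <= upper.
    """
    m = [[0 if i == j else math.inf for j in range(n_events)]
         for i in range(n_events)]
    for pre, post, lower, upper in behavioral_places:
        m[pre][post] = min(m[pre][post], upper)    # maximum separation
        m[post][pre] = min(m[post][pre], -lower)   # minimum separation
    for k in range(n_events):                      # Floyd-Warshall closure
        for i in range(n_events):
            for j in range(n_events):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m

RESET, A, B = 0, 1, 2
m = poset_matrix(3, [(RESET, A, 1, 3), (RESET, B, 2, 10)])
print(m[A][B], m[B][A])
```

The closed matrix bounds the time of b minus the time of a between -1 and 9, so a single set of constraints covers both the case where a fires first and the case where b fires first, rather than one region per interleaving.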
In the initial state, only the initial reset transition has fired, leading to the initial marking. The poset matrix is a singleton 0, which states that the maximum time separation between reset and itself is 0. In general, the poset matrix is extended by adding the new constraints from the firing transition and recanonicalizing. The constraint matrix has three components: minimum values for each marked timed place, maximum values for each marked timed place, and constraints on the differences between two marked places. To compute the constraint on the difference between two marked places, the algorithm copies the constraint on the difference between the firing times of the two transitions in their presets (which may be the same transition) from the poset matrix. The minimum values for newly marked timed places are set to zero; the minimum values for timed places that retain their token are copied from the enabling region of the previous state. The maximum values are computed as before: the maximum delay for behavioral places and infinity for constraint places. The algorithm can then canonicalize the resulting matrix with a specialization of Floyd's algorithm in time $O(n^2)$; this is the time-extruded set of reachable states for all transition sequences that share this partial order.
Returning to our example, to fire bs↑ from the initial state, the algorithm extends the poset with the additional transition. Then, the algorithm computes the enabling region as before; this provides the minimum values for the timed places that retain their token. The algorithm then extends the poset matrix; the empty matrix is extended with the bounds on the separation between the reset transition and the transition bs↑, derived from the behavioral timing place [2; 10]. The resulting matrix is shown in row 1 of Figure 13.
Compare row 2 between Figure 8 and Figure 13 to see that already, after two firings, the reached states are larger for the POSET method; after three firings, the difference is even more dramatic. Unlike in the geometric region method, the geometric region found in row 3 is a superset of that found in row 1, so all further execution from row 1 can be canceled in the POSET timing method. For our example, the POSET timing method can represent the full set of reachable timed states with only 9 geometric regions after only 26 transition firings.
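The figures quoted for the running example can be put side by side with a few lines of arithmetic. The counts of regions and firings are those reported in the text; the dictionary layout and labels are only a convenience for this illustration.

```python
UNTIMED_STATES = 7  # reachable untimed states in the running example

# (geometric regions, transition firings) for each exploration strategy
methods = {
    "geometric, exact-match hashing": (234, 518),
    "geometric, subset checks": (20, 78),
    "POSET timing": (9, 26),
}

ratios = {name: regions / UNTIMED_STATES
          for name, (regions, _) in methods.items()}

for name, (regions, firings) in methods.items():
    print(f"{name}: {regions} regions, {firings} firings, "
          f"{ratios[name]:.2f} regions per untimed state")
```

The ratio drops from roughly 33 regions per untimed state with exact-match hashing to fewer than 3 with subset checks and to fewer than 1.5 with POSET timing.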
The POSET timing algorithm is shown in Figure 14 .
The algorithm begins with an initial state composed of the initial marking, geometric region, and poset matrix. Next, it calculates the set of timed-enabled transitions, and it selects one to fire. The algorithm then restricts the region to the subset where the transition is enabled to fire, canonicalizes this new region, and checks if any constraint places for this transition have not met their lower bounds. Next, the algorithm updates the marking and poset, and computes the new geometric region. The clocks in this new region are then allowed to advance subject to max-advance, and the region is canonicalized and normalized. Normalization is the step which keeps the state space finite by taking care of infinite upper bounds. If in the new region any constraint place is able to exceed its upper bound, a failure is reported.

The improvements due to POSET timing are much more dramatic with many more simultaneous clocks, due to the $n!$ potential orders in which those clocks can be enabled. In fact, the POSET method typically reduces the average number of timed regions for each untimed state to a value close to one. For the adverse example in Figure 9, POSET timing obtains exactly one geometric region corresponding to the one untimed state.
F. Efficiency considerations
The number of transitions in the unfolded net is equal to the length of the firing sequence plus one, and it increases with the depth of our search. Calculating the minimum separations between the occurrence times in the unfolded net, even with our incremental $O(n^2)$ approach, becomes prohibitively expensive as the firing sequence lengthens. In addition, the algorithm needs a poset matrix for each step; this would require a tremendous amount of storage during depth-first search.
To keep $n$ bounded as the depth of our search increases, the algorithm determines what prefix, if any, of the unfolded net can safely be ignored. The algorithm can eliminate any transitions that no longer affect future calculations. In general, the algorithm can eliminate a variable from any set of equations or inequalities whenever it has produced the full set of equations or inequalities that use that variable. Since all constraints introduced through the firing of a transition are associated with places connecting the new transition to the old, once a transition in the unfolded net no longer has any marked places in its postset, it is eliminated from the poset matrix. Thus, our $n$ is, at most, the number of marked places in the original net at any given time, plus one for the current transition.
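The pruning rule can be sketched as below, assuming a canonical poset matrix indexed by transition occurrence and a map from each currently marked place to the occurrence that produced its token. Both data structures and the function name are hypothetical, and the extra row kept for the transition currently being fired is omitted for brevity.

```python
def prune_poset_matrix(matrix, events, producer):
    """Drop occurrences that no longer have a marked place in their postset.

    matrix   -- canonical poset matrix, rows and columns ordered as events
    events   -- list of transition occurrences
    producer -- dict: marked place -> occurrence that produced its token
    """
    live = set(producer.values())
    keep = [i for i, e in enumerate(events) if e in live]
    pruned = [[matrix[i][j] for j in keep] for i in keep]
    return pruned, [events[i] for i in keep]

# Occurrences: reset, then concurrent a and b (same bounds as a canonical
# matrix in which a fires 1..3 and b fires 2..10 time units after reset).
events = [("reset", 0), ("a", 0), ("b", 0)]
matrix = [[0, 3, 10],
          [-1, 0, 9],
          [-2, 1, 0]]
# After a and b fire, only their output places are marked.
producer = {"qa": ("a", 0), "qb": ("b", 0)}

pruned, kept = prune_poset_matrix(matrix, events, producer)
print(kept, pruned)
```

Because the matrix is canonical, the bounds between a and b already account for every constraint that passed through the reset occurrence, so deleting its row and column loses no information.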
As with geometric timing, POSET timing backtracks whenever a new geometric region is a subset of a previously seen geometric region. POSET timing also takes advantage of the fact that if the newly found region is a superset of a previously seen region, then any pending state space exploration from the previously seen region can be canceled, and the previously seen region can be removed from the list.
In addition, a hash table of canonicalized sequences of posets that have been explored can be kept; by comparing each potential extension of the poset against the previously seen ones, we can avoid visiting the same poset multiple times. Performing this check is usually significantly faster than performing all the timing calculations, so this can save execution time at the expense of memory.
IV. Applications
This section describes the application of POSET timing to the synthesis and verification of timed circuits.
A. Synthesis
Synthesis is the process of transforming a specification into a circuit implementation. Our synthesis procedure begins with a specification in a high-level language from which a hazard-free timed circuit implementation is generated using only basic gates such as AND gates, OR gates, and C-elements. After the specification is translated to an orbital net representation, the POSET timing algorithm is used to find the set of reachable states. Our synthesis procedure is briefly described here. For a more complete description, please see [27], [28].
From the resulting SG, there are several different approaches that could be used to obtain a gate-level timed circuit implementation. The first approach is to use a traditional Boolean minimization technique directly. Unfortunately, if the logic is mapped to basic gates and the delays of these gates are considered individually, the implementation may be hazardous. Another approach is to split the design of the rising and falling transitions to obtain a generalized C-element implementation [1] and decompose it to basic gates. The basic structure is depicted in Figure 15(a), in which the upper sum-of-products represents the logic for the set, the lower sum-of-products represents the logic for the reset, and the result is merged with a C-element. This can be implemented directly in CMOS as a single compact gate with weak feedback as shown in Figure 15(b) or as a fully-static gate as shown in Figure 15(c). This technique alleviates some of the hazard problems, but it may still be hazardous when mapped to basic gates. To address this problem, after a generalized C-element implementation is produced and decomposed to basic gates, the design could be back-annotated with delays from the gate library, and the circuit could be verified. While this may often work, it is not clear what to do in the cases in which a hazard does exist. Also, a hazard is a spurious transition which wastes power and does no useful work. In a power-efficient implementation, it is desirable to have logic which is hazard-free both internally and externally. To avoid the hazard concerns discussed above, we take a standard C-implementation approach in which each rising and falling region for each output signal is implemented using an atomic gate (often a single cube or AND gate), which must satisfy certain correctness constraints.
While the general structure of the standard C-implementation is similar to the generalized C-element structure shown in Figure 15(a), each set or reset cube is implemented with an atomic gate that must satisfy certain constraints to guarantee that the merged implementation is a gate-level hazard-free circuit. The approach is conservative in that timing analysis may show that the decomposed generalized C-element implementation is sufficient, but the overhead required to get a safe implementation that is free of internal hazards tends to be small. It has been observed that the atomic gate is often only a single cube. In [28], 23 of the 27 speed-independent benchmarks had a single-cube implementation. In at least one of the four that does not have a single-cube implementation, one exists when realistic timing numbers are incorporated into the design of a timed circuit. In [28], a single-cube algorithm is described that is typically over an order of magnitude faster than the general algorithm used for multi-cube atomic gates. Both algorithms have been implemented for timed circuits. For descriptions of the algorithms, please see [28]. This subsection describes the theory and justifies its correctness for timed circuits.
The cover of a set region $C(u\uparrow, k)$ (or a reset region $C(u\downarrow, k)$) is a set of states for which the corresponding atomic gate in the implementation evaluates to one. In order for a cover to lead to a hazard-free implementation, it must satisfy certain correctness constraints [29], [28]. These constraints guarantee that any gate in the implementation only changes when it is actively driving the output signal to change. This ensures that the transition of the gate is acknowledged.
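First, a correct cover needs to satisfy a covering constraint, which says that the reachable states in the cover must include the entire excitation region but must not include any states outside the union of the excitation region and associated quiescent states, i.e.,

$$ER(u{*}, k) \subseteq [C(u{*}, k) \cap \Phi] \subseteq [ER(u{*}, k) \cup QS(u{*})]$$

where $ER(u{*}, k)$ is the excitation region, $QS(u{*})$ is the set of associated quiescent states, $\Phi$ is the set of reachable states, and "*" indicates either "↑" for set regions or "↓" for reset regions.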
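The covering constraint, as stated in words above, can be checked with ordinary set operations. The sketch below is only an illustration: states are represented by arbitrary labels, and the function and parameter names are hypothetical.

```python
def satisfies_covering(cover, reachable, excitation, quiescent):
    """Check the covering constraint for one set or reset region.

    The reachable states inside the cover must contain the whole excitation
    region and nothing outside excitation | quiescent.
    """
    covered = cover & reachable
    return excitation <= covered and covered <= (excitation | quiescent)

reachable = {"s0", "s1", "s2", "s3"}
excitation = {"s1"}          # states in which the output is excited
quiescent = {"s2"}           # associated quiescent states

ok = satisfies_covering({"s1", "s2", "s9"}, reachable, excitation, quiescent)
misses_er = satisfies_covering({"s2"}, reachable, excitation, quiescent)
too_wide = satisfies_covering({"s1", "s3"}, reachable, excitation, quiescent)
print(ok, misses_er, too_wide)
```

The unreachable state `s9` in the first cover does not violate the constraint because only reachable states are considered; this is why reducing the reachable state space with timing information can admit smaller covers.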
Second, the covers of each excitation region must also satisfy an entrance constraint to ensure hazard-freedom. This constraint says that the cover must only be entered through excitation region states, i.e., does not need to be a maximal connected set of states. It is proven in the appendix that this condition is made redundant by the entrance constraint. A concern may also be raised that a quiescent state may be reachable from multiple excitation regions; this state would be eliminated from any correct cover by the entrance constraint which is another implication of the proof in the appendix.
The correctness of our timed circuits follows directly from the proof for the speed-independent case given in [28]. In [28], it is proven that a standard C-implementation that satisfies these correctness constraints operates correctly regardless of the delay of the gates in the implementation and the environment. In other words, all delays are unknown and fall in the range $\langle 0, \infty \rangle$. In our case, the delays of the gates and environment are known. As for the gate delays, it has already been proven that these correctness constraints guarantee correctness regardless of the delay of the gates, so knowing the delays of the gates does not change that. This delay information, however, has also been used by the POSET timing algorithm to find all possible interleavings between both the input and output signals (i.e., the SG). In other words, the timing information has limited what signal orderings are possible. As long as the states in the reduced state graph are the only reachable states, the correctness constraints guarantee correct circuit operation. Therefore, the only concern would be that the delays of the actual circuit implementation allow additional states. As long as the minimum and maximum delays of the standard C-implementation fall in the given range, there is no problem. Checking this is the subject of the next subsection on timed circuit verification.
The synthesis procedure using the POSET timing algorithm to find the reachable state space has been fully automated within the CAD tool ATACS. We have applied this procedure to several examples and compared our results with designs produced using other asynchronous design methodologies including Beerel's speed-independent method (SYN) [29], Lavagno's method which adds delay elements to remove hazards (SIS) [6], and Yun's burst-mode method (3D) [32]. The results are tabulated in Table I. The first two examples (SEL and SEL2) are different versions of the controller for a port selector. We also synthesized a timed circuit implementation for the controller from the simple asynchronous MMU described in [33]. The timing information for these examples is derived from corner-case SPICE analysis of the datapath units and typical gates used in the controller. This corner-case analysis produces minimum delays for the best-case process, 0 degrees C, and 5.5 volts and maximum delays for the worst-case process, 125 degrees C, and 4.5 volts. Even for these extremely conservative delay estimates, good results are obtained. We also synthesized a DRAM controller which was originally designed using burst-mode in [34]. The last example is the target-send burst-mode portion of a SCSI controller (TSBM) originally specified using a burst-mode finite-state machine in [34]. The delay numbers for these designs are taken from conservative delay estimates from the original design [35].
First, the literal counts (Lit) for the gate-level circuits derived using the generalized C-element (gC) technique and our standard C-implementation technique are compared. Our results show only about a 10 percent increase in literal count for generating a safe implementation that has no internal hazards. For the first three examples, the timed implementations are compared with those produced by SYN and SIS in terms of area, represented by transistor count, and latency, normalized to the delay of an inverter. The timed implementations are about 40 percent smaller and faster than the speed-independent ones produced by SYN. Compared with SIS, the area gains are about the same, but the improvement in delay is now about 50 percent. The table also gives the number of reachable states ($|\Phi|$) for timed and other methods. The results show up to two orders of magnitude fewer states in the timed case. In fact, due to the large state space of the MMU example, SIS runs out of memory during synthesis. The last two examples are compared with the 3D method, with the 3D specifications and results taken from [32] assuming a 0.3 ns inverter delay in a 0.8 μm CMOS process. For these designs, our timed circuits show about a 30 percent improvement in area (comparing literal count) and delay.
The DRAM controller is of particular interest because it is typically implemented as a synchronous circuit. Since a DRAM controller must interface with a synchronous environment, it cannot be implemented as a speed-independent asynchronous circuit, but it can be implemented as a timed circuit that satisfies certain timing constraints. Our gate-level timed circuit implementation is shown in Figure 16(a). A synchronous implementation of the DRAM controller, shown in Figure 16(b), is generated using Berkeley's synchronous synthesis program SIS [36]. Surprisingly, our timed design is about 40 percent smaller and 30 percent faster. The critical path of the timed circuit is only 2 gates and a memory element as compared to 4 gates and a memory element for the synchronous design. There is an additional speed advantage when the margins needed for setting the clock period are taken into account. There is also a significant improvement in power consumption since our timed design produces no spurious transitions. These improvements come about from the sequential don't-care information that is taken into account when the behavior of the environment is considered. In present synchronous design tools, this environmental information is neglected. Of course, a synchronous design could be produced taking this information into account, as illustrated in our next example. If the clock is considered simply as another input, synchronous circuits can also be designed using our methodology. The last example is therefore a synchronous design: a two-bit counter. Using SIS and a standard synchronous gate library, the implementation for the counter shown in Figure 17(a) is derived. This implementation uses 32 transistors and has a critical path through an inverter, 2 NAND gates, and a latch (approximately 6 inverter delays). The complex gate implementation synthesized for the counter is shown in Figure 17(b). This implementation takes 6 transistors for the logic and 16 for the latches.
The critical path through the logic is an inverter, a pass gate, and a latch (approximately 2.5 inverter delays). Our implementation is more than 30 percent smaller and more than twice as fast as the one produced using the synchronous synthesis tool SIS. Comparing the implementations, we find that both implement $C'_0$ using a single inverter. The difference is in the implementation of $C'_1$. Our timed circuit implementation makes use of the information that $C'_1$ only changes in states where $C_0$ is high. Thus, it is implemented using an inverter and a pass gate which is gated on $C_0$. SIS's implementation, on the other hand, does not take into account the sequencing of the states. For example, if a sequence of states in which the counter is counting 00-11-01-10 is possible, this circuit would generate the correct next state given the current state. This extra logic, however, is unnecessary since this counter always goes through the states in the same order: 00-01-10-11-00, etc.

Table I (partial rows as extracted):
.5 n/a n/a
MMU 187 56 62 210 4.5 23,296 412 10 out of memory n/a n/a
DRAM 79 38 38 110 5.5 n/a n/a n/a n/a n/a 46 7
TSBM 113 32 33 140 4.5 n/a n/a n/a n/a n/a 58 7.5

B. Verification

Verification is the process of checking if the synthesized circuit satisfies its specification. Our verification procedure requires both a specification and a circuit implementation, either given in or translated to an orbital net representation. The orbital net for the specification is mirrored (i.e., inputs and outputs are swapped) [37] and composed with the orbital net for the implementation. The state space is then explored using the POSET timing analysis algorithm described earlier. If a failure is detected in the process of exploring the state space, an error trace is returned; otherwise the timed circuit is found to implement its timed specification.
In order to verify timed circuits, our verification procedure adopts trace theory as defined by Dill [37] as its behavioral semantics, as well as Burch's [8] extensions to trace theory semantics for timed circuits. Our verification procedure provides structural constructions and syntactic shorthands for labeled safe Petri nets that correspond to the behavioral semantics operations.
To verify that our synthesized timed circuits implement their timed specifications, our verification procedure begins with the specification in a high-level language and the implementation given as a netlist of basic gates. The specification is translated to an orbital net representation in which the timing requirement for each behavioral place in the preset of an output transition is changed to a constraint place with timing requirement $\langle 0, \infty \rangle_c$. These constraints must be satisfied by the timed circuit implementation. Note that a $\langle 0, \infty \rangle_c$ timing requirement is used rather than the delays given in the original behavioral place, since circuits with delays outside this range may function correctly. For a $\langle 0, \infty \rangle_c$ constraint place to be satisfied, it is necessary for this place to be marked whenever a transition in its postset has its behavioral place marked. In other words, if the circuit produces an output transition, the specification must be in a state in which it is willing to accept that transition. If a performance constraint is desired, additional constraint places with finite delay ranges can be added as well.
For each gate in the implementation, an orbital net is constructed corresponding to an instantaneous function block such as the one given for the AND gate in Figure 18(b). This net is composed with a delay element such as the one in Figure 18(c), with the behavioral timing requirement set by the delay given in the gate library. The behavioral place labeled $\langle 2, 4 \rangle$ indicates that an output occurs between 2 and 4 time units after the preceding input occurs; no behaviors violating this requirement are generated by the net. The constraint places do not constrain the behavior of the net, but if another input event occurs before the preceding output event, then the environment violates the specification. Composition of these nets gives an AND gate operating under the output delay model. In a similar manner, an AND gate operating under the input delay model could also be obtained. The delay model shown in Figure 18(c) is relatively simple, and it suffices for many types of circuits. More complex delay models can be and have been constructed, modeling more accurately the behavior of a gate under hazard conditions (for example, one which models inertial delay); for these, the separation of gate models into combinational function and delay behavior is essential [10]. Each orbital net in the implementation is composed with the other orbital nets as dictated by the connections in the netlist.
To determine if a timed circuit implements its timed specification, the reachable state space is found using the POSET timing algorithm for the orbital net obtained by composing the implementation with its mirrored specification. If a failure is detected while exploring the state space, a sequence of transitions found using a depth-first search is reported that demonstrates the failure. This sequence, however, may be quite long, so after reporting the failure the procedure finds a possibly shorter sequence using a breadth-first search.

The verification procedure described in the previous section has been automated in the tool Orbits, written by Tom Rokicki. This tool has been incorporated into the design system for timed circuits, ATACS. Experimental results are given in Table II; they were run on an HP9000/735 with 144 megabytes of memory using CScheme 7.3. The left four columns indicate values that are the same for geometric and POSET timing. The startup time is the time required to parse the input and construct the appropriate orbital net. The number of net nodes is the sum of the places and transitions in the resulting orbital net. The third column gives an estimate of the number of untimed states. The fourth column gives the number of discrete states, after all timing parameters are divided by their greatest common divisor. The next four columns give the number of geometric regions and the runtime in seconds for verification using standard geometric timing and POSET timing, respectively.
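The scaling applied before counting discrete states can be illustrated with a short helper. The delay values below are hypothetical and the function name is an assumption; the only point is that dividing every timing parameter by the common divisor shrinks the number of discrete time steps without changing the behavior.

```python
from functools import reduce
from math import gcd

def normalize_delays(delays):
    """Divide all integer timing parameters by their greatest common divisor."""
    g = reduce(gcd, delays) or 1   # guard against an all-zero list
    return g, [d // g for d in delays]

# Hypothetical timing parameters, all multiples of 20 time units.
divisor, scaled = normalize_delays([20, 40, 100])
print(divisor, scaled)
```

Timing parameters given to several significant digits tend to share only a small common divisor, which is one reason the discrete state counts can become very large.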
The first half of Table II consists of the automatically synthesized gate-level timed circuits described above. First, we find that the number of discrete states can be quite large, making discrete-time verification difficult, if not impossible. Verification of these examples using POSET timing is also more efficient than the geometric timing approach, especially in the case of the DRAM controller, where the verification time is improved by over an order of magnitude.
The second half of the table consists of other timed circuits and systems that exhibit a high degree of concurrency. For example, the seitz queue element is from [38], and seitz2 is two connected copies of this circuit. The kyy examples [35] have thirty-seven gates and timing parameters given to three significant digits. Where the examples ran out of time or space using the geometric method, the verification is often far from complete. For the seitz2 example, after one hour of CPU time, only 1,404 of the 4,572 untimed states have been seen, yet 473,202 distinct geometric regions have been encountered. One particular untimed state has 13,275 distinct geometric regions at this point. POSET timing for this example finds the entire state space as 5,820 geometric regions in half a minute of CPU time.
One more thing to consider from Table II is the ratio of the number of regions found using POSET timing to the number of untimed states. We find that POSET timing typically yields, on average, very close to one geometric region per untimed state and, in all of our examples, no more than two. This means that the POSET timing approach is achieving a near-optimal representation of the timed state space.
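As a worked instance of this ratio, the seitz2 figures quoted in the text can be compared directly. The helper name `regions_per_state` is hypothetical; the numbers are those reported above, and the geometric figure is only a partial snapshot taken after one hour, so it covers the 1,404 untimed states seen so far rather than the full state space.

```python
def regions_per_state(regions, untimed_states):
    """Average number of geometric regions per untimed state."""
    return regions / untimed_states


# seitz2 with POSET timing: complete state space.
poset_ratio = regions_per_state(5_820, 4_572)

# seitz2 with geometric timing: partial exploration after one hour of CPU time.
geometric_ratio = regions_per_state(473_202, 1_404)

print(f"POSET timing:     {poset_ratio:.2f} regions per untimed state")
print(f"geometric timing: {geometric_ratio:.0f} regions per untimed state (partial)")
# POSET timing:     1.27 regions per untimed state
# geometric timing: 337 regions per untimed state (partial)
```

The POSET ratio of about 1.27 falls between one and two, consistent with the bound stated above, while the partial geometric exploration already averages several hundred regions per untimed state.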
V. Conclusion
This paper describes POSET timing and its application to the automatic synthesis and verification of gate-level timed circuits. The POSET timing algorithm operates on specifications represented as orbital nets. These nets must satisfy the single behavioral place restriction. We describe a net transformation method that can be applied to any 1-safe timed Petri net to produce an orbital net that satisfies the single behavioral place restriction. The POSET timing algorithm extends geometric methods using concurrency and causality information represented in a poset. By considering posets of events instead of linear sequences, the POSET timing algorithm is capable of substantially reducing the number of geometric regions necessary to represent the reachable timed state space. Our original synthesis method for timed circuits is restricted to choice-free systems. POSET timing allows us to extend this synthesis method to a very general class of systems, namely any system that can be specified as a timed Petri net. We also demonstrate the effectiveness of this synthesis procedure on several practical examples, and our results indicate that our timed circuit implementations are significantly smaller and faster than those produced by other asynchronous and synchronous design methodologies. Our verification results show that POSET timing verification can handle some larger, more concurrent examples than the standard discrete or geometric methods. POSET timing also typically finds, on average, very close to one and no more than two geometric regions for every untimed state, which means this approach is achieving a near-optimal representation of the timed state space. The efficiency of the POSET timing algorithm has allowed us to incorporate timing into asynchronous circuit design. Our procedure produces implementations that are both efficient and reliable, opening the door to the use of asynchronous circuits in domains previously dominated by synchronous circuits.
