Abstract-The class of speed independent (SI) circuits offers a promising way to tolerate process variations. However, SI circuits fundamentally assume that the forks in some wires (usually a large percentage of them) are isochronic, and this assumption is increasingly challenged by technology scaling. This paper proposes a method to generate the weakest timing constraints under which an SI circuit works correctly with bounded wire delays. The method applies to all SI circuits, and the generated timing constraints are significantly weaker than those proposed in the current literature that claims the weakest formally proved conditions.
I. INTRODUCTION
As technology shrinks into the deep sub-micron scale, process variations become one of the main obstacles to circuit design. Asynchronous design, which is inherently tolerant of process variations, offers a promising solution to this problem. Among all asynchronous design paradigms, delay insensitive (DI) circuits show the highest tolerance to delay variations in both logic gates and wires. However, as proved in [1], almost no useful asynchronous controller specification has a DI implementation. Speed independent/quasi-delay insensitive (SI/QDI) circuits, which require only the isochronic-fork timing assumption, were therefore introduced to enlarge the class of specifications that can be synthesized while retaining strong tolerance to variations. However, the isochronic-fork assumption becomes unreliable as technology scales: increasingly severe threshold-voltage variations and wire delays (relative to gate delays), together with buffer insertion, can cause isochronic forks to fail. Recent research proved that the isochronic-fork timing assumption can be relaxed into a weaker timing assumption that is easier to satisfy [3], and that only a few forks in SI circuits are dangerous [2]. These studies provide valuable results on hazard detection when the isochronic-fork assumption is relaxed, but the problem remains open. In [3], the authors proved that the adversary-path timing assumption is equivalent to the isochronic-fork timing assumption for the correctness of SI circuits, and they claimed that it is the weakest timing assumption that is both necessary and sufficient for the correct operation of SI circuits. However, the isochronic-fork timing assumption is only a sufficient, not a necessary, condition for the correctness of SI circuits. Therefore, the equivalence between the adversary-path and isochronic-fork timing assumptions cannot prove that the adversary-path timing assumption is necessary.
978-3-9810801-7-9/DATE11/©2011 EDAA
When the logic functions of the gates are taken into account, the timing requirement for correctness is considerably weaker than simply requiring that the circuit contain no adversaries. In [2], the authors proposed a method to generate the timing constraints under which an SI circuit works correctly, but their technique decides whether a circuit glitches by directly comparing transitions in the high-level signal transition graph, so its applicability is quite restricted. In this paper, we present a technique that, for a given environment, generates the weakest timing constraints under which an SI circuit works correctly under the bounded wire delay model. All the generated constraints can be fulfilled by delay padding. Thus, SI circuits in which the few dangerous forks are fixed can tolerate severe process variations in the deep sub-micron era. The paper is organized as follows: Section II introduces the concepts used in the following sections; Section III presents our method for generating the weakest timing constraints for any SI circuit and for fulfilling them; Section IV shows the experimental results; and Section V concludes the paper.
II. PRELIMINARIES
In this section, the definitions and notation used in this paper are introduced. All wires and gates are considered under the pure delay model. A pure delay only shifts transitions by a given time without absorbing any glitches; by contrast, an inertial delay not only delays a transition but also suppresses any pulse narrower than a given width. Excluding the case where glitches are used to eliminate hazards [4], an SI/QDI circuit that works correctly under the pure delay model also works correctly under the inertial delay model. Circuits are usually considered in the context of their environment (ENV). The signals that come from the ENV are inputs to the circuit and are denoted by the set I, the signals fed back to the ENV are outputs and are denoted by the set O, and the internal signals of the circuit are denoted by the set R. Circuit: A circuit is a triple $C = (A, F, \psi)$, where A is a set of signals; F is a set of functions such that, for each signal $a \in R \cup O$, there is a function $f_a \in F$ that computes a; and $\psi$ is a labeling function that labels the wire between a and each fanout signal of a. Given a state s of the circuit (a Boolean vector), $f_{a\uparrow}(s) = \mathrm{TRUE}$ iff $f_a$ evaluates to 1 in s, and $f_{a\downarrow}(s) = \mathrm{TRUE}$ iff $f_a$ evaluates to 0 in s. In this paper, the circuit is under the intra-operator fork timing assumption [3], which assumes that, within a fork, the wires feeding the same gate have the same delay. The behavior of an SI circuit is often depicted by a formal version of a timing diagram called a signal transition graph, which is an interpreted Petri Net [10]. Petri Net (PN): A Petri Net is a quadruple $N = (P, T, F, m_0)$, where P is a finite set of places, T is a finite set of transitions, $F \subseteq (P \times T) \cup (T \times P)$ is a flow relation, and $m_0$ is the initial marking.
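The pure and inertial delay models defined at the beginning of this section can be contrasted with a minimal Python sketch. It assumes a waveform is represented as a time-sorted list of (time, new value) transitions on one wire; the function names and this representation are illustrative assumptions, not part of the paper's method.

```python
from typing import List, Tuple

# Hypothetical representation: a waveform is a time-sorted list of
# (time, new_value) transitions on one wire.
Waveform = List[Tuple[float, int]]


def pure_delay(wave: Waveform, d: float) -> Waveform:
    """Pure delay: shift every transition by d; no pulse is absorbed."""
    return [(t + d, v) for t, v in wave]


def inertial_delay(wave: Waveform, d: float, min_width: float) -> Waveform:
    """Inertial delay: shift by d and suppress pulses narrower than min_width."""
    out: Waveform = []
    for t, v in pure_delay(wave, d):
        if out and t - out[-1][0] < min_width:
            # The previous transition and this one bound a pulse that is too
            # narrow: drop both, so the wire keeps its value from before the pulse.
            out.pop()
        else:
            out.append((t, v))
    return out


glitchy = [(1, 1), (2, 0), (10, 1)]  # a 1-unit pulse, then a real rising edge
print(pure_delay(glitchy, 3))         # [(4, 1), (5, 0), (13, 1)]
print(inertial_delay(glitchy, 3, 2))  # [(13, 1)]
```

The pure delay passes the narrow pulse through unchanged, which is why the analysis in this paper, done under the pure delay model, is the more conservative of the two.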
A place p (transition t) is an input place (transition) of a transition t (place p) if $(p, t) \in F$ ($(t, p) \in F$), and an output place (transition) of a transition t (place p) if $(t, p) \in F$ ($(p, t) \in F$). Signal Transition Graph (STG) [10]: A signal transition graph is a triple $G = (N, A, \lambda)$, where N is the underlying Petri Net, A is a finite set of signals, and $\lambda$ is a labeling function that maps each transition to $A \times \{+, -\}$. For $a \in A$, a+ denotes a rising transition on signal a, a− denotes a falling transition on signal a, and $a^*$ denotes either a+ or a−. Places with exactly one pre-transition and one post-transition are often omitted from an STG for brevity. In the following sections, an STG is decomposed into a set of marked graphs (MGs); in each MG segment, all places are omitted, and the flow relation ⇒ between two transitions is called an arc. An STG with $A = I \cup O$, which depicts only the interactions between the circuit and the environment, is called a specification STG and is denoted by $STG_{spec}$; an STG with $A = I \cup O \cup R$, which depicts the complete event order in a circuit, is called an implementation STG and is denoted by $STG_{imp}$. The method presented in this paper requires that the original $STG_{imp}$ can be decomposed into a set of MGs in which events are explicitly ordered. Currently, the original $STG_{imp}$ is required to be safe and free-choice; a technique to decompose any safe STG into MGs is left for future work. An STG is usually used as a succinct, high-level, event-based model of the orderings of the signal transitions in a circuit. The explicit causality and concurrency between transitions in an STG make it convenient for manipulating the relations between transitions, whereas verification can be done much more easily in a low-level state-based model, the state graph, which explicitly shows every state the circuit can reach. The state graph corresponding to an STG can be derived by traversing all markings the STG can reach from $m_0$ [8].
State Graph (SG) [10]: A State Graph is a quadruple $SG = (A, S, E, \pi)$, where A is a finite set of signals, S is a finite set of states, $E \subseteq S \times S$ is a set of transitions, and $\pi$ is a labeling function that labels each state with a bit-vector over A.
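The derivation of an SG from an STG by traversing all reachable markings can be sketched in Python as follows. The sketch assumes the STG is a safe marked graph (as after the MG decomposition described above), that every arc stands for an implicit place, and that transitions are labeled like `a+` or `req-`; the function name and data layout are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

Arc = Tuple[str, str]                            # (source, target) transition labels
State = Tuple[FrozenSet[Arc], Tuple[int, ...]]   # (marking, bit-vector over signals)


def derive_state_graph(arcs: List[Arc], m0: Set[Arc], v0: Dict[str, int]):
    """Breadth-first traversal of all markings reachable from m0."""
    transitions = sorted({t for arc in arcs for t in arc})
    pre = {t: {arc for arc in arcs if arc[1] == t} for t in transitions}
    post = {t: {arc for arc in arcs if arc[0] == t} for t in transitions}
    signals = sorted(v0)

    start: State = (frozenset(m0), tuple(v0[s] for s in signals))
    states: Set[State] = {start}
    edges: List[Tuple[State, str, State]] = []
    queue = deque([start])
    while queue:
        marking, bits = queue.popleft()
        for t in transitions:
            if not pre[t] <= marking:
                continue  # t is enabled only if every input place holds a token
            # Firing consumes the input tokens and produces the output tokens.
            new_marking = (marking - pre[t]) | post[t]
            values = dict(zip(signals, bits))
            values[t[:-1]] = 1 if t.endswith("+") else 0  # update the bit-vector
            nxt: State = (frozenset(new_marking), tuple(values[s] for s in signals))
            edges.append(((marking, bits), t, nxt))
            if nxt not in states:
                states.add(nxt)
                queue.append(nxt)
    return states, edges


# Four-phase handshake a+ -> b+ -> a- -> b- -> a+, token initially before a+.
cycle = [("a+", "b+"), ("b+", "a-"), ("a-", "b-"), ("b-", "a+")]
states, edges = derive_state_graph(cycle, {("b-", "a+")}, {"a": 0, "b": 0})
print(sorted(bits for _, bits in states))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Each SG state pairs a marking with its bit-vector label, so two markings with the same signal values stay distinct states, as the quadruple definition allows.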
III. RESYNTHESIS AND HAZARD REMOVAL METHOD
In this section, we present a technique to generate the weakest timing constraints under which an SI circuit works under the bounded wire delay model. The technique is based on STG relaxation: it removes one timing ordering from an STG and checks whether a glitch can appear in the new STG. If no glitch can appear, the STG is updated; otherwise, a timing constraint is generated. This process iterates until every timing ordering in the final STG is guaranteed by the gate logic, the interface protocol, or a timing constraint. The 'relaxation cases' in this section ensure that all situations under the bounded wire delay model are considered, the 'relaxation ordering' ensures that the generated timing constraints are the weakest under a certain criterion, and the 'delay padding' method guarantees that all of these timing constraints can be fulfilled. The flow graph of the technique is presented in Fig. 1.
A. Generation of the weakest timing assumption
1) Deriving local STG:
An SI circuit is hazard-free if each gate is hazard-free under hazard-free inputs. This property implies that hazard analysis can be carried out in the local environment of each individual gate, avoiding the full state exploration problem. The local environment for a gate a is the STG (or a set of MGs equivalent to this STG) that depicts only the transition relations between a and its inputs. The local MG on a subset of signals is obtained by projecting the SG of the original MG onto these signals and then using the technique in [7] for deriving a free-choice PN from a finite transition system to obtain the new MG. The projection of an SG on a subset of signals is defined as follows. Projection of a State Graph on a subset of signals [6]:
only one is different between s and s′}
The step that derives the local STG for gate a is performed by the function Deriving_local_STG(MG, a).
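Because the formal projection definition above survives only in fragments, the following Python sketch assumes the usual reading: each state is mapped to its restriction to the kept signals, and an edge is kept only when the two projected states differ in exactly one signal (matching the surviving phrase about s and s′), so edges that fire a hidden signal collapse and are dropped. The function name and the bit-vector state encoding are hypothetical.

```python
from typing import List, Set, Tuple

State = Tuple[int, ...]  # bit-vector over the full signal list


def project_state_graph(signals: List[str], states: Set[State],
                        edges: Set[Tuple[State, State]], keep: List[str]):
    """Project an SG onto the signals in `keep` (assumed definition, see text)."""
    idx = [signals.index(s) for s in keep]

    def restrict(s: State) -> State:
        return tuple(s[i] for i in idx)

    proj_states = {restrict(s) for s in states}
    proj_edges = set()
    for s, t in edges:
        ps, pt = restrict(s), restrict(t)
        # Keep an edge only if the projected states differ in exactly one signal;
        # edges that fire a hidden signal collapse to self-loops and are dropped.
        if sum(x != y for x, y in zip(ps, pt)) == 1:
            proj_edges.add((ps, pt))
    return proj_states, proj_edges


# Cycle a+ b+ c+ a- b- c- over signals (a, b, c), projected onto (a, b).
cyc = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1)]
sg_edges = {(cyc[i], cyc[(i + 1) % len(cyc)]) for i in range(len(cyc))}
ps, pe = project_state_graph(["a", "b", "c"], set(cyc), sg_edges, ["a", "b"])
print(sorted(pe))  # [((0, 0), (1, 0)), ((0, 1), (0, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1))]
```

In the example, the two c transitions disappear and the projected SG is the four-state cycle a+ b+ a− b−, which is the kind of local behavior from which the local MG is then rebuilt.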
2) Timing ordering relaxation:
In order to allow an SI circuit to work under bounded wire delays, certain timing conditions must be satisfied at the inputs of the logic gates in the circuit. For this, we need to analyze the timing relations between signal transitions in the local STG(s) of each gate. Removing the isochronic-fork assumption effectively relaxes some of these relations; in particular, the relations that need to be relaxed are those between input transitions of the gate. The four kinds of arcs that appear in the local STG of a gate a are classified below.
(1) $x^* \Rightarrow y^*$, where $x \in \mathit{fanin}(a)$ and $y = a$;
(2) $x^* \Rightarrow y^*$, where $x = a$ and $y \in \mathit{fanin}(a)$;
(3) $x^* \Rightarrow y^*$, where $x, y \in \mathit{fanin}(a)$ and $x = y$;
(4) $x^* \Rightarrow y^*$, where $x, y \in \mathit{fanin}(a)$ and $x \neq y$.
Arcs (1)-(3) are irrelevant to the isochronic-fork relaxation, because these orders are always guaranteed in the circuit. The critical one is therefore order (4), which assumes that transition $x^*$ propagates to gate a before it propagates along a path that triggers $y^*$ and $y^*$ reaches gate a. Each such order, if reversed by wire delays, corresponds to an adversary path in [3]. Fulfilling all the relations of type (4) guarantees the correctness of the circuit (this is equivalent to the no-adversaries requirement in [3]). However, as will be seen later, this requirement can still be relaxed. In this section, we relax the transition orderings of type (4) as much as possible, provided that no hazards appear in the resulting STG. The relaxation of an arc $x^* \Rightarrow y^*$ in an STG is depicted in Fig. 2. From the STG view, relaxation changes one order assumption into a set of weaker order assumptions; from the circuit view, it removes the requirement that transition $x^*$ reach gate a before transition $y^*$. After relaxation, more states are reachable in the SG corresponding to the new STG. The original STG is replaced by the relaxed one if none of the new states contains a potential hazard; if glitches appear in the newly added states, a timing constraint "$x^*$ must reach gate a before $y^*$", denoted by $x^* \Rrightarrow y^*$, is added to the timing constraint set Rt, and the arc $x^* \Rightarrow y^*$ is marked 'guaranteed' in the STG. Each STG newly generated by relaxation is simplified before the next relaxation by eliminating multiple and redundant arcs.
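The four-way arc classification above can be expressed as a small Python sketch. It assumes arcs are given as (x, y) pairs of signal names, because the direction (+/−) of each transition does not affect the type; the function names are hypothetical, and the sketch only selects which arcs are relaxation candidates without performing the relaxation itself.

```python
from typing import Iterable, List, Set, Tuple


def arc_type(x: str, y: str, a: str, fanin: Set[str]) -> int:
    """Return the type (1)-(4) of an arc x* => y* in the local STG of gate a.

    x and y are signal names; the direction (+/-) of each transition does not
    affect the classification.
    """
    if x in fanin and y == a:
        return 1  # gate input -> gate output
    if x == a and y in fanin:
        return 2  # gate output -> gate input
    if x in fanin and y in fanin:
        return 3 if x == y else 4  # same input signal / two different inputs
    raise ValueError(f"{x}* => {y}* is not an arc of the local STG of {a}")


def relaxation_candidates(arcs: Iterable[Tuple[str, str]], a: str,
                          fanin: Set[str]) -> List[Tuple[str, str]]:
    """Types (1)-(3) are always guaranteed; only type (4) arcs may be relaxed."""
    return [(x, y) for x, y in arcs if arc_type(x, y, a, fanin) == 4]


# Gate c with fanin {a, b}: only the input -> input arc a* => b* is a candidate.
arcs = [("a", "c"), ("c", "b"), ("b", "b"), ("a", "b")]
print(relaxation_candidates(arcs, "c", {"a", "b"}))  # [('a', 'b')]
```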
3) Hazard criterion:
Whether the relaxation of an arc is acceptable depends on whether the newly introduced states cause glitches. A glitch is a premature transition, in which the output of a gate is enabled when it is expected to remain stable. We first define the prerequisite event set of an event: the prerequisite event set of the i-th occurrence of event $a^*$ is $E_{pre}(a^*/i) = \{ z^* : z^* \in (< a^*/i) \}$. The prerequisite event set of each transition on output signal a is calculated before relaxation and is used to check correctness after the relaxation has been carried out. When an arc $x^* \Rightarrow y^*$ of type (4) in an STG is relaxed, one of four cases holds in the SG of the resulting STG:
relaxation case 1: $\forall s \in QR(a+),\ f_{a\downarrow}(s) = \mathrm{FALSE}$ and $\forall s \in QR(a-),\ f_{a\uparrow}(s) = \mathrm{FALSE}$;
relaxation case 2: $\exists s \in QR(a+)$ such that $f_{a\downarrow}(s) = \mathrm{TRUE}$, and $\forall s \in QR_i(a+)$ such that $f_{a\downarrow}(s) = \mathrm{TRUE}$, $s(z) = 1$ … cause any glitches.
Relaxation cases 2 and 3 are situations in which the gate is enabled in some states of $QR(a^*)$, which does not necessarily imply a glitch. Relaxation case 4 is the case where glitches will definitely appear. Case 2 occurs when a transition $x^*$ that is not guaranteed to be acknowledged by the gate output transition $a^*$ becomes, after relaxation, a predecessor of $a^*$. If $x^*$ cannot cause $a^*$ in any case, this is case 2.1; otherwise, it is case 2.2. Case 3 occurs when transition $x^*$ triggers $a^*$ but, once the arc $x^* \Rightarrow y^*$ is relaxed, $y^*$ can trigger $a^*$ instead of $x^*$. If case 2.1, 2.2, or 3 occurs, the STG is modified as shown in Fig. 4 (in case 2.2 or 3, the STG is split into two sub-STGs, each of which is expanded recursively). All the new STG(s) after modification are checked for glitches. If none contains a glitch, the STG is updated; otherwise, a timing constraint $x^* \Rrightarrow y^*$ is added. Arcs marked with a # symbol in Fig. 4 are only used to indicate 'if in that case' (e.g.
if in the case that $x^*$ comes before $y^*$) and will not be relaxed in future steps (they are not timing constraints either). The function that relaxes all possible arcs in an STG is listed in Alg. 2.
4) Optimal relaxation ordering:
Different relative timing constraint sets may be derived if the arcs are relaxed in different orders, as shown in Fig. 5. Four different sets of timing constraints could guarantee the correctness of the circuit:
We prefer to generate the optimal set during the relaxation process rather than generating all of the sets and choosing the best one. Exhausting all relaxation orders implies a time complexity of O(n!) with respect to the number n of arcs to be relaxed, and most of these orders lead to the same results. Here, we take the weakest timing constraint set to be the one whose tightest constraint is the loosest among all sets. The weakest constraint set can be generated by relaxing the tightest arc at each step: this relaxes tighter arcs as much as possible before they become necessary timing requirements for avoiding hazardous states. The tightness of an arc can be determined automatically from its topology in the circuit; for example, a path that crosses the environment is usually considered looser, and a path that contains more levels is looser. This information is derived from $STG_{imp}$, and the function find_tightest_arc(NM) returns the tightest input ⇒ input arc in NM, the set of arcs whose orderings have not yet been guaranteed. The top-level algorithm to derive the weakest timing constraints for an SI circuit to work under the wire delay model is depicted in Alg. 3.
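The greedy "tightest arc first" ordering can be sketched in Python as below. This is a simplified illustration, not Alg. 3 itself: it assumes the tightness measure and the hazard check are supplied as a hypothetical dictionary and callback, and it ignores the fact that a real relaxation rewrites the STG (adding weaker arcs, or splitting it into sub-STGs in cases 2.2 and 3).

```python
from typing import Callable, Dict, List, Set, Tuple

Arc = Tuple[str, str]  # (x*, y*) transition labels, e.g. ("a+", "b+")


def relax_greedily(candidates: List[Arc], tightness: Dict[Arc, float],
                   glitches: Callable[[Set[Arc], Arc], bool]):
    """Relax input => input arcs, always trying the tightest remaining arc first.

    tightness: smaller value = tighter arc (stand-in for the topology measure).
    glitches(relaxed, arc): stand-in for the hazard check of the relaxed STG.
    Returns the relaxed arcs and the timing-constraint set Rt.
    """
    nm = set(candidates)  # NM: arcs whose orderings are not guaranteed yet
    relaxed: Set[Arc] = set()
    rt: List[Arc] = []
    while nm:
        # find_tightest_arc(NM); the arc itself breaks ties deterministically
        arc = min(nm, key=lambda e: (tightness[e], e))
        nm.remove(arc)
        if glitches(relaxed, arc):
            rt.append(arc)  # x* must reach the gate before y*
        else:
            relaxed.add(arc)
    return relaxed, rt


def oracle(relaxed: Set[Arc], arc: Arc) -> bool:
    """Toy hazard check: relaxing c+ => b+ glitches once a+ => b+ is relaxed."""
    return arc == ("c+", "b+") and ("a+", "b+") in relaxed


arcs = [("a+", "b+"), ("c+", "b+"), ("a-", "d+")]
tight = {("a+", "b+"): 1.0, ("c+", "b+"): 2.0, ("a-", "d+"): 3.0}
relaxed_arcs, rt = relax_greedily(arcs, tight, oracle)
print(rt)  # [('c+', 'b+')]
```

The toy oracle also shows why the order matters: trying c+ ⇒ b+ first would relax it and leave a different constraint set, whereas relaxing the tighter arc a+ ⇒ b+ first keeps only the looser arc as a constraint.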
B. Delay padding to fulfill the dangerous timing constraints
When the relaxation is done, all of the generated timing constraints are converted into pairwise delay constraints between a wire and a path by tracing back to the implementation STG, $STG_{imp}$, and then looking up the circuit C. As an example, the $STG_{imp}$ and the circuit of a FIFO are presented in Fig. 7. The delay constraints generated by the technique of the previous section are shown in Tab. 1. Each row states that the delay of the wire in the first column must be smaller than the sum of the delays in the third column. The circuit is guaranteed to work correctly if all of these delay constraints are fulfilled. Most constraints in Tab. 1 are quite loose and can be considered fulfilled automatically; when some constraints are considered dangerous, however, delay padding (or another technique that fixes the order of two events) has to be carried out to guarantee these orders. Fig. 6 shows the possible padding positions (positions 1-5) for guaranteeing the delay constraint that a wire from gate $g_1$ to $g_4$ must be faster than another path between these two gates. Padding at position 1, 3, or 5 (padding on a wire) delays transitions on only one branch of a gate, while padding at position 2 or 4 (padding on a gate) delays all branches of the gate, which is equivalent to increasing the gate delay. Padding on a gate can always fulfill one delay constraint without worsening other delay constraints, but it may unnecessarily delay other branches of a fork; padding on a wire has a smaller performance penalty but may worsen another delay constraint if that wire must itself be faster than another adversary path. A greedy padding policy is used: it tries to pad at position 1 if the corresponding wire does not participate in another delay constraint (in column 1); if it does, it tries position 3.
In the worst case, all of the wires in the adversary path appear in other delay constraints, and padding at position 2 breaks this cyclic dependency.
Under the padding rule described above, all of the delay constraints are guaranteed to be fulfilled (padding on the last gate always fulfills a delay constraint without worsening another, similar to the technique used for synthesis in [9]). Usually, the padding that minimizes the performance penalty places the delay on a wire near the destination gate of an adversary path, choosing a wire that is not the fast path of another delay constraint, i.e., one that does not appear in the first column of the delay constraint table. The constraints in Tab. 1 can be fulfilled by delaying only unidirectional transitions, which introduces smaller performance penalties and can be done using current-starved delays. The performance penalties of different delay elements in the FIFO example are shown in the next section.
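The padding rule above can be sketched in Python under simplifying assumptions: each row of the delay constraint table is modeled as one fast wire plus the list of wire and gate elements on the adversary path, every element has a fixed, known delay, and a violated constraint is padded just enough to reach a chosen margin. The class and function names and the margin parameter are hypothetical, and the sketch does not map its choices onto the specific position numbers of Fig. 6.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class DelayConstraint:
    fast_wire: str        # column 1: this wire must be faster ...
    slow_path: List[str]  # column 3: ... than the summed delays of these elements


def slack(c: DelayConstraint, delay: Dict[str, float]) -> float:
    """Positive slack means the fast wire is strictly faster than the path."""
    return sum(delay[e] for e in c.slow_path) - delay[c.fast_wire]


def padding_site(c: DelayConstraint, table: List[DelayConstraint],
                 gates: Set[str]) -> str:
    """Pick where to pad: the path wire nearest the destination that is not the
    fast wire of any constraint; otherwise the last gate on the path."""
    fast_wires = {k.fast_wire for k in table}
    for e in reversed(c.slow_path):
        if e not in gates and e not in fast_wires:
            return e
    # Assumes the adversary path contains at least one gate.
    return next(e for e in reversed(c.slow_path) if e in gates)


def pad(table: List[DelayConstraint], delay: Dict[str, float],
        gates: Set[str], margin: float) -> Dict[str, float]:
    """Pad every violated constraint. In this model padded elements never appear
    in column 1, so padding can only increase the slack of other constraints."""
    delay = dict(delay)
    for c in table:
        s = slack(c, delay)
        if s <= 0:
            delay[padding_site(c, table, gates)] += margin - s
    return delay


# Hypothetical fragment: wire w14 must beat the path w12 -> g2 -> w24,
# and w24 must in turn beat the path w23 -> g3 -> w34.
table = [DelayConstraint("w14", ["w12", "g2", "w24"]),
         DelayConstraint("w24", ["w23", "g3", "w34"])]
delays = {"w14": 5, "w12": 1, "g2": 1, "w24": 1, "w23": 1, "g3": 1, "w34": 1}
padded = pad(table, delays, gates={"g2", "g3"}, margin=1)
print(padded["w12"], [slack(c, padded) for c in table])  # 4 [1, 2]
```

Here w24 is skipped because it is itself the fast wire of the second constraint, so the delay lands on w12; had every path wire been a fast wire elsewhere, the sketch would fall back to padding gate g2.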
IV. EXPERIMENTS AND RESULTS

A. Comparison of timing constraints
The generated timing constraints are significantly weaker than the constraints implied by the timing assumption proposed in [3], which is currently the weakest formally proved condition. For the FIFO example, the timing assumption of [3] yields three constraints of at most 3 levels (two wires and one gate in the adversary path) and four constraints of at most 5 levels (three wires and two gates) that do not cross the environment, whereas our method generates only one and two such constraints, respectively.
B. Delay penalty introduced by delay padding
The FIFO circuit is simulated in SPICE using the ASU PTM bulk CMOS model library (90 nm to 32 nm) [5] to measure the delay penalty due to padding. Fig. 8 shows the delay penalty for eliminating all glitches at a one-million-gate scale (the scale determines the maximum wire length) using different padding methods (buffers and one-direction current-starved delays). The delays are inserted just to counter the maximum wire-length delay, the environment is assumed to have zero delay, and the delay penalty is calculated as the maximum latency increase in the slowest STG cycle.
V. CONCLUSIONS
Are the timing constraints produced by our method necessary? Their necessity is not guaranteed. One example is the circuit in Fig. 5: four different sets of timing constraints can each guarantee its correctness, so none of them is necessary. Our method generates only the weakest one among all solutions with respect to a certain criterion. Speed independent design offers a good paradigm for tolerating process variations, but increasingly severe wire delays and process variations also challenge its fundamental assumptions. This paper proposes a method that fixes only the few potentially hazardous places in speed independent circuits, ensuring that they work correctly when wire delays dominate and process variations are high.
