Abstract-Analyzing the timing properties of asynchronous systems is essential for characterizing their performance and power. Previous work on timing showed that such systems under and-causality and fixed delays exhibit periodicity properties. We give a different, rigorous, graph-based proof of the exact timing behavior of more general classes of such systems, and conclude their exact periodicity property: each signal transition occurs with the same period after a finite number of occurrences. We establish our results under weaker assumptions about system connectivity/topology, and this paper provides the theoretical foundation for the exact periodicity property to be applied and exploited in circuits containing a combination of synchronous and asynchronous components. We provide simulation-based results for several typical asynchronous circuit topologies to quantify this time period in practical circuits. We also extend our analysis and methods to the case of bounded delay systems. A key consequence of our analysis is that asynchronous circuits can be integrated with synchronous logic via a metastability-free interface, thereby eliminating high-overhead synchronizers when an asynchronous circuit is fully surrounded by synchronous logic.
I. INTRODUCTION
TIMING analysis of asynchronous logic is more complicated than the equivalent problem in synchronous logic. Most synchronous circuits can be partitioned into acyclic regions of combinational logic separated by clocked elements, simplifying timing analysis. The (typically acyclic) combinational logic section is characterized by its minimum and maximum delay. The two delays, combined with the setup and hold time properties of state-holding elements, can be used to characterize the timing properties of synchronous logic. This analysis is part of any synchronous design automation framework, and is used for timing optimization during design and chip implementation.
In contrast, asynchronous logic cannot be readily partitioned into acyclic regions of combinational logic. Many common asynchronous logic gates (the C-element is an obvious example) are inherently state-holding, and the timing behavior of asynchronous circuits must be modeled by examining cyclic paths of timing dependencies. Since the number of cycles in a circuit can easily be exponential in the number of connections between pairs of gates, at first glance this appears to be a computationally intractable problem. Fortunately, there are well-known timing analysis techniques with polynomial complexity that can be used to analyze the timing properties of asynchronous circuits. Event-rule (ER) systems, together with other closely allied approaches including a restricted class of Petri nets [1], [2] known as marked graphs [3], process graphs, and timing constraint graphs, are widely used to model the timing properties of asynchronous systems. In these approaches, the circuit is abstracted away as a collection of events (an event being a signal transition), and the circuit topology is used to construct a graph that captures interevent dependencies and their timing relations. ER systems model and-causality (also known as max-causality), where an event can occur only after all its predecessors have occurred. Since a circuit is normally a nonterminating computation, ER systems are typically infinite structures. To capture the finite nature of a circuit, ER systems corresponding to concurrent systems, such as asynchronous circuits, can be generated from bounded structures referred to as repetitive ER (RER) systems. Intuitively, the ER system can be obtained by an infinite "unrolling" of the RER system, just like a finite while-loop can be unrolled to construct an infinite computation.
W. Hua is with Cornell University, New York, NY 10011 USA (e-mail: wh364@cornell.edu).
R. Manohar is with Yale University, New Haven, CT 06520 USA (e-mail: rajit.manohar@yale.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2017.2693268
Such systems and models have been explored in many previous papers, and properties of MinMax Algebra or MaxPlus Algebra [4]-[8] were also shown to be closely related. Burns [9] introduced strongly connected (where there is a path from any transition to another) RER systems with and-causality and fixed delays, and showed that a minimum-period linear timing function is a good approximation of the timing of events. Gaubert and Klimann [4] used dioid algebra and concluded that such systems will eventually become periodic, and if they are strongly connected, each event will eventually occur with the same period. Some of the recent papers [5]-[8] on Max-Plus Algebra showed that the system associated with the Max-Plus matrix will be periodic after a finite transient and gave bounds on this interval. They also pointed out that if the corresponding system is strongly connected, the period will be the same as well. Hulgaard et al. [10] presented an algorithm for exact bounds on the time of separation of events (TSEs) in concurrent systems with bounded delays. In that work, the analysis of systems with bounded delays is essentially based on that of the fixed delay version of the systems by considering the extreme values of delay bounds, and on the assumption that timing values in the fixed delay version will eventually become periodic and have the same period. Hulgaard et al. [11] also proposed a modification of the TSE algorithm for certain types of nonstrongly connected systems. The modification relied on the empirical observation that such systems often also exhibit periodicity, and their work applied the modified algorithm to the timing verification of "STARI" [11], [12], where the system is not strongly connected and contains both synchronous and asynchronous components. The work [13] provided bounds on average time separations of events based on a stochastic timing model.
Properties of min-max functions were discussed in [14] and [15], where the fixed delay ER system was extended to or-causality [15], [16]. A variant of Winskel's event structures [17], timed ER structures, was introduced in [18], which considered generalized ER systems with bounded timing constraints and added conflict from event structures to model nondeterministic behavior. In particular, the work considered environmental choice, which is also an important part of generalizations of models to or-causality. An efficient algorithm that analyzes the time separation bounds of events in systems with bounded delays and both types of causality was also proposed in [19]. The work [20] revisited the TSE problem, and provided an efficient algorithm that avoids an undetermined number of graph unfoldings, based on the direct evaluation of periodic steady-state behaviors of systems under strongly connected and-causality and fixed delay models.
In this paper, we provide a different rigorous proof of the timing analysis of RER systems, and therefore of the timing characteristics of asynchronous circuits, based directly on graph and path properties. We establish that an RER system with and-causality and fixed delays always exhibits exact periodicity and can be exactly characterized by a periodic function after a finite number of occurrences k of each transition. Our proof of system periodicity does not require the assumption that RER systems are strongly connected. This paper provides the theoretical foundation for the exact periodicity property to be applied and exploited in circuits containing both synchronous and asynchronous logic. We provide an analytical bound on k as a function of system parameters (Section III). These parameters are known ahead of time, or can be found by algorithms with a bounded number of graph unfoldings. Our results hold regardless of initial conditions and the nature of the cycles in the RER system, and we will show that the time taken to reach periodic steady-state is quite short for many practical asynchronous circuit topologies (Section IV). We also provide an extension of our results to bounded delay systems by considering the extreme values of delay bounds, an idea similar to that of previous work [10], [11], [20].
Our results have important practical consequences. In particular, we conclude that if the primary inputs and outputs of an asynchronous circuit are placed in an environment where all the inputs are computed in a single clock domain that is slower than the natural frequency of the asynchronous logic, all the outputs of the asynchronous circuit will be synchronous with respect to the same clock after a finite initial time interval. Hence, we can build metastability-free interfaces between synchronous and asynchronous logic. We provide an example of one such interface in Section IV.
II. BACKGROUND
We now introduce the event rule formalism, and then present the formulation of the timing analysis problem in that framework. Some basic definitions from previous work [9] are repeated here for completeness.
Definition 1: An ER system is a pair $\langle E, R\rangle$ where $E$ is a set of events, and $R \subseteq E \times E \times \mathbb{R}_{\ge 0}$ is a set of rules that define timing constraints.
A rule $r = (e_1, e_2, \alpha)$ is written as $e_1 \xrightarrow{\alpha} e_2$, and indicates that the event $e_2$ cannot occur until $\alpha$ time units after $e_1$ occurs. $e_1$ is said to be the source of $r$ [denoted $\mathrm{src}(r)$], and $e_2$ is said to be the target of $r$ [denoted $\mathrm{tgt}(r)$]. There could be events $f$ for which there are no rules that constrain their timing. In other words, there is no $r \in R$ such that $\mathrm{tgt}(r) = f$. These are referred to as initial events. For a nonterminating computation, the set of events $E$ as well as the set of rules $R$ are infinite.
Often systems with nonterminating computations of interest are repetitive. Hence, although the sets E and R are infinite, they can be represented as unfoldings of a compact, finite representation.
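Definition 2: An RER system is a pair $\langle E', R'\rangle$ where $E'$ is a set of transitions, and $R' \subseteq E' \times E' \times \mathbb{R}_{\ge 0} \times \mathbb{N}$ is a set of rule templates. The ER system corresponding to $\langle E', R'\rangle$ is given by $\langle E, R\rangle$, where $E = E' \times \mathbb{N}$ and $R$ consists of rules of the form $\langle u, i\rangle \xrightarrow{\alpha} \langle v, i + \varepsilon\rangle$ where $r = (u, v, \alpha, \varepsilon) \in R'$ and $i \in \mathbb{N}$. Here, the non-negative integer $i$ is called the occurrence index and $\varepsilon$ is called the occurrence index offset of $r$.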
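The unfolding in Definition 2 can be illustrated with a short sketch. The function name `unfold`, the tuple encoding of rule templates, and the two-transition system below are hypothetical choices made for illustration only; the true ER system is infinite, so the sketch truncates it to a finite window of occurrence indices.

```python
def unfold(templates, n_unfoldings):
    """Return ER rules (source_event, target_event, delay) for occurrence
    indices 0 .. n_unfoldings - 1. An event is a (transition, index) pair."""
    rules = []
    for (u, v, alpha, eps) in templates:
        for i in range(n_unfoldings):
            # Keep only rules whose target still lies inside the finite window.
            if i + eps < n_unfoldings:
                rules.append(((u, i), (v, i + eps), alpha))
    return rules


# Hypothetical system: a -> b within the same occurrence (offset 0),
# b -> a into the next occurrence (offset 1).
templates = [("a", "b", 2, 0), ("b", "a", 3, 1)]
rules = unfold(templates, 3)

# Initial events are events that are the target of no rule.
targets = {tgt for (_src, tgt, _alpha) in rules}
events = {src for (src, _tgt, _alpha) in rules} | targets
initial_events = events - targets
```

In this toy system the only initial event is the first occurrence of `a`, since every later occurrence of `a` is constrained by the previous occurrence of `b`.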
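Definition 3: Given an ER system, the function $t : E \to \mathbb{R}_{\ge 0}$ is said to be a timing function for the ER system iff
$$t(f) \ge t(e) + \alpha \quad \text{for every rule } e \xrightarrow{\alpha} f \in R. \tag{1}$$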
Initial events can be assigned an arbitrary time by a timing function, as there are no rules that constrain them. Notice that there might be multiple rules which have the same target. In this case, each rule places a constraint on the time when the event f can happen and all constraints must be satisfied; hence, this representation captures and-causality.
The timing simulation $\hat{t}$ is a special timing function which captures the notion of the actual time at which each event occurs.
Definition 4: The timing simulation $\hat{t}$ is the timing function defined by
$$\hat{t}(f) = \max\{\hat{t}(e) + \alpha \mid e \xrightarrow{\alpha} f \in R\}$$
with the convention that the max operator evaluates to 0 when the set is empty (for initial events). We assume that the timing simulation assigns time 0 to every initial event unless otherwise specified. Note that the definition of timing simulation is similar to "as soon as possible scheduling." It is easily established that for any timing function $t$, $\hat{t}(e) \le t(e)$ for all events $e$ [9]. Given a potential timing function for the system, we introduce the notion of the slack of a rule in the ER system.
Definition 5: Given a function $t : E \to \mathbb{R}_{\ge 0}$, the timing slack function $s_t : R \to \mathbb{R}$ induced by $t$ is defined by
$$s_t(e \xrightarrow{\alpha} f) = t(f) - t(e) - \alpha.$$
The timing slack is another way to identify whether or not a given function $t$ is in fact a valid timing function; $t$ is a valid timing function iff the slack function is non-negative.
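As a small illustration of the slack test, the sketch below assumes the usual reading of Definition 5, in which the slack of a rule from $e$ to $f$ with delay $\alpha$ is $t(f) - t(e) - \alpha$. The function names and the three-event system are hypothetical.

```python
def slack(t, rule):
    """Slack of one rule (src, tgt, alpha) under the candidate function t."""
    src, tgt, alpha = rule
    return t[tgt] - t[src] - alpha


def is_timing_function(t, rules):
    """t is a valid timing function iff every rule has non-negative slack."""
    return all(slack(t, r) >= 0 for r in rules)


# Hypothetical ER fragment: f waits 10 units after e and 1 unit after g.
rules = [("e", "f", 10), ("g", "f", 1)]
asap = {"e": 0, "g": 0, "f": 10}   # as-soon-as-possible assignment
late = {"e": 0, "g": 0, "f": 12}   # valid, but later than necessary
early = {"e": 0, "g": 0, "f": 5}   # violates the rule from e
```

Under `asap`, the rule from `e` has zero slack (it determines when `f` fires) and the rule from `g` has positive slack; `early` is rejected because the rule from `e` would have negative slack.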
It is sometimes convenient to visualize event rule systems as well as RER systems in graphical format. For event rule systems, the acyclic constraint graph is used to visualize timing constraints [9]. The graph is constructed from the ER system: the set of vertices is the set of events $E$, and a rule $e \xrightarrow{\alpha} f \in R$ is displayed as an edge from $e$ to $f$ with an edge weight label corresponding to the delay $\alpha$.
For RER systems, a similar visualization approach is used and is referred to as the collapsed constraint graph [9]. The set of vertices of the graph is the set of transitions $E'$, and each rule template $(u, v, \alpha, \varepsilon)$ is displayed as an edge from $u$ to $v$ with an edge weight $\alpha$. In addition, if $\varepsilon > 0$, the edges are marked with lines (called "ticks") where the number of lines is the value of $\varepsilon$.
The following definitions deal with the graphical representation of the ER and RER systems.
Definition 6: A path $p$ of length $n$ [denoted as $l(p) = n$] in an RER system is a list of arcs $(e_0, e_1, \ldots, e_n)$ where $(e_i, e_{i+1}, \alpha_i, \varepsilon_i) \in R'$, for some $\alpha_i \in \mathbb{R}_{\ge 0}$ and $\varepsilon_i \in \mathbb{N}$. We define the delay along the path $p$ by $\delta(p) = \sum_i \alpha_i$ and the epsilon-sum $\varepsilon(p) = \sum_i \varepsilon_i$. A path is a simple cycle iff $e_0 = e_n$ and $e_i \ne e_j$ for all other pairs $i, j$ with $i \ne j$.
A path $p$ of length $n$ in an ER system is a list of arcs $(e_0, e_1, \ldots, e_n)$ where $e_i \xrightarrow{\alpha_i} e_{i+1} \in R$, for some $\alpha_i \in \mathbb{R}_{\ge 0}$. We define the delay along the path $p$ by $\delta(p) = \sum_i \alpha_i$. Given a timing function $t$, we can define the slack of the path as $s_t(p) = t(e_n) - t(e_0) - \delta(p)$, which equals the sum of the slacks of the rules along $p$.
We examine paths in two graphs (the constraint graph and the collapsed constraint graph) in the analysis below. To resolve ambiguity, we use the term e-path to refer to a path in the ER system rather than the RER system.
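The path quantities of Definition 6 can be sketched in a few lines. The encoding of a path as a list of rule templates and the helper names are assumptions made for this illustration.

```python
def path_delay(arcs):
    """delta(p): sum of the delays of the rule templates along the path."""
    return sum(alpha for (_u, _v, alpha, _eps) in arcs)


def epsilon_sum(arcs):
    """epsilon(p): sum of the occurrence index offsets along the path."""
    return sum(eps for (_u, _v, _alpha, eps) in arcs)


def is_simple_cycle(arcs):
    """True iff the (non-empty) arc list is chained, returns to its start,
    and visits no transition twice apart from the shared start/end."""
    sources = [u for (u, _v, _alpha, _eps) in arcs]
    chained = all(arcs[k][1] == arcs[k + 1][0] for k in range(len(arcs) - 1))
    closed = arcs[-1][1] == arcs[0][0]
    return chained and closed and len(set(sources)) == len(sources)


# Hypothetical two-transition cycle: a -> b (offset 0), b -> a (offset 1).
cycle = [("a", "b", 2, 0), ("b", "a", 3, 1)]
```

For this cycle the delay is 5 and the epsilon-sum is 1, so one traversal of the cycle advances the occurrence index by one.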
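For example, consider the RER system $\langle E', R'\rangle$ where
$$E' = \{x{\uparrow}, x{\downarrow}, y{\uparrow}, y{\downarrow}\}$$
$$R' = \{(x{\downarrow}, x{\uparrow}, 10, 0),\ (x{\uparrow}, x{\downarrow}, 10, 1),\ (x{\uparrow}, y{\uparrow}, 1, 1),\ (y{\uparrow}, x{\uparrow}, 1, 0),\ (y{\uparrow}, y{\downarrow}, 10, 0),\ (y{\downarrow}, y{\uparrow}, 9.9, 1)\}.$$
The graphical representation of this RER system is shown in Fig. 1 (similar to an example explored in [20]), with part of the ER system corresponding to this RER system shown in Fig. 2. In this example, the initial events are $\langle y{\uparrow}, 0\rangle$ and $\langle x{\downarrow}, 0\rangle$. Thus, $\hat{t}(\langle y{\uparrow}, 0\rangle) = 0$ and $\hat{t}(\langle x{\downarrow}, 0\rangle) = 0$. Consider the transition $x{\uparrow}$. We can set $t(\langle x{\uparrow}, 0\rangle)$ to be any number greater than ten to obtain a valid timing function. However, by definition, only $\hat{t}(\langle x{\uparrow}, 0\rangle) = \max\{\hat{t}(\langle y{\uparrow}, 0\rangle) + 1, \hat{t}(\langle x{\downarrow}, 0\rangle) + 10\} = \max\{0 + 1, 0 + 10\} = 10$ is the timing simulation. In Fig. 2, the timing simulations of the corresponding events are provided. We highlight the rules with zero slack in blue, and the rules with positive slack in bold red.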
While general RER systems allow arbitrary non-negative integers $\varepsilon(r)$ for rules $r \in R'$, in practical cases, $\varepsilon(r)$ is often 0 or 1. Furthermore, if a rule with delay $\alpha$ has $\varepsilon(r) > 1$, it can also be represented by $\varepsilon(r)$ consecutive directed edges, each with delay $\alpha/\varepsilon(r)$ and $\varepsilon = 1$. Thus, without loss of generality, we assume that $\varepsilon(r) \in \{0, 1\}$, and thus $\max\{\varepsilon(r)\} = 1$ for all of the rules in the RER systems discussed in the rest of this paper. Given this assumption, we have the following corollary.
Corollary 1: For any path or e-path p
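The rewriting of a rule template whose offset exceeds one, described just before Corollary 1, can be sketched as follows. The function name and the naming scheme for the intermediate transitions are hypothetical; the intermediate transitions exist only to carry the ticks and do not correspond to signal transitions of the circuit.

```python
from fractions import Fraction


def split_rule(template):
    """Replace a template with offset eps > 1 by eps chained templates,
    each with delay alpha / eps and offset 1."""
    u, v, alpha, eps = template
    if eps <= 1:
        return [template]
    step = Fraction(alpha) / eps
    # Hypothetical names for the eps - 1 intermediate transitions.
    nodes = [u] + [f"{u}->{v}#{k}" for k in range(1, eps)] + [v]
    return [(nodes[k], nodes[k + 1], step, 1) for k in range(eps)]


# A hypothetical rule a -> b with delay 6 and offset 3.
pieces = split_rule(("a", "b", 6, 3))
```

The chain preserves both the total delay and the total epsilon-sum between the original endpoints, which is what the rest of the analysis depends on.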
A straightforward generalization of Definition 4 (timing simulations) from edges to paths is also stated without proof below.
Corollary 2: If $p$ is an e-path from $e$ to $f$, then $\hat{t}(f) \ge \hat{t}(e) + \delta(p)$.
Definition 7: An RER system is said to be valid if for each transition $e$ in the RER system, there is a path $p$ from $e$ back to itself with $\varepsilon(p) = 1$ and $\delta(p) > 0$.
Theorem 1: In a valid RER system, $\forall n_e > 0 : \hat{t}(\langle e, n_e\rangle) > \hat{t}(\langle e, n_e - 1\rangle)$.
Proof: By assumption, for any transition $e$ there is a path $p_e$ from $e$ to itself with $\varepsilon(p_e) = 1$ and $\delta(p_e) > 0$. By construction of the ER system, there is a corresponding e-path $p_{e,n_e}$ from $\langle e, n_e - 1\rangle$ to $\langle e, n_e\rangle$. By Corollary 2,
$$\hat{t}(\langle e, n_e\rangle) - \hat{t}(\langle e, n_e - 1\rangle) \ge \delta(p_{e,n_e}) > 0$$
as desired. The assumption of being valid is a technical one. In any RER system that corresponds to a physical circuit, we will have $\hat{t}(\langle e, n_e\rangle) > \hat{t}(\langle e, n_e - 1\rangle)$, since this says that the $n_e$th signal transition occurs after the $(n_e - 1)$th signal transition on the same variable. If the RER system does not satisfy the constraint of being valid but satisfies the inequality in Theorem 1 by virtue of corresponding to a circuit, we can compute the smallest gap between successive transitions of the same event and add a new "self loop" rule template with a delay that is smaller than the smallest gap. This will not change the timing simulation, and results in an RER system that satisfies our technical requirement.
Definition 8: An RER system is live if each simple cycle $S$ in the system satisfies $\varepsilon(S) \ge 1$.
All of the RER systems discussed in the following sections are assumed to be live and valid. If an RER system is not live, then all the transitions on cycles $S$ with $\varepsilon(S) = 0$ can be collapsed into a single transition without loss of generality [9].
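Liveness can be checked without enumerating cycles. Because offsets are non-negative, a simple cycle has epsilon-sum zero exactly when all of its rule templates have offset zero, so the system is live iff the subgraph of offset-0 templates is acyclic. The sketch below uses a standard depth-first search for that test; the function name and the string encoding of transitions are hypothetical.

```python
def is_live(transitions, templates):
    """True iff no cycle consists solely of offset-0 rule templates."""
    succ = {v: [] for v in transitions}
    for (u, v, _alpha, eps) in templates:
        if eps == 0:
            succ[u].append(v)

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited / on DFS stack / finished
    color = {v: WHITE for v in transitions}

    def has_cycle(u):
        color[u] = GREY
        for w in succ[u]:
            # A GREY successor closes a cycle of offset-0 templates.
            if color[w] == GREY or (color[w] == WHITE and has_cycle(w)):
                return True
        color[u] = BLACK
        return False

    return not any(color[v] == WHITE and has_cycle(v) for v in transitions)


E = ["x-", "x+", "y+", "y-"]
R = [("x-", "x+", 10, 0), ("x+", "x-", 10, 1), ("x+", "y+", 1, 1),
     ("y+", "x+", 1, 0), ("y+", "y-", 10, 0), ("y-", "y+", 9.9, 1)]
# Variant in which the tick on x+ -> x- is removed, creating a zero-tick cycle.
R_not_live = [("x-", "x+", 10, 0), ("x+", "x-", 10, 0)] + R[2:]
```

The example system from Section II passes the check, while the variant with the tick removed does not.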
Definition 9 (Critical Cycle): The cycle mean of a cycle $C$ is defined to be $\delta(C)/\varepsilon(C)$. A critical cycle $C$ of an RER system is a simple cycle with the maximum cycle mean $p^{*}$, that is, $p^{*}$ is the maximum of $\delta(C)/\varepsilon(C)$ over all simple cycles $C$ of the RER system.
For the system shown in Fig. 1, the critical cycle is the cycle through $x{\downarrow}$ and $x{\uparrow}$, with cycle mean $(10 + 10)/1 = 20$.
We use $N = |E'|$ to denote the number of transitions in the RER system, and $D = |R'|$ to denote the number of rule templates in the RER system. We also denote the maximum delay of a rule $r \in R'$ in the RER system by $\alpha_{\max}$. In most previous work, the collapsed constraint graph is assumed to be strongly connected. In this paper, we make a weaker assumption: for any transition $f$ in the RER system, there exists a path $p$ from a transition on some critical cycle to $f$. Given this assumption and the definition of $N$, we have the following corollary.
Corollary 3: For any transition $f$ in the RER system, there exists a path $p$ from a transition $e$ on some critical cycle $C$ to $f$ with $0 \le l(p) \le N - l(C) < N$.
In what follows, we examine e-paths in the ER that correspond to cycles in the corresponding RER system. We sometimes refer to these e-paths as cycles-although the e-path in the ER system is in fact acyclic, it corresponds to a cycle of transitions in the corresponding RER system.
III. PROOF OF PERIODICITY
Given these technical preliminaries, we will prove our main technical result, Theorem 3: the exact periodicity of an RER system with and-causality and fixed delays, based on graph and path properties. It can be stated as follows: there are an integer constant $M \le \mathrm{lcm}\{1, 2, \ldots, N\} < 3^{N}$ [21] ($M$ is defined later in the proof; it characterizes the period of the system, similarly to a parameter in [10], and the bound here corresponds to the worst case scenario) and an integer $k$, bounded as a function of the system parameters including $p'$ ($p'$ is also defined later in the proof; it characterizes, and is at most, the second largest cycle mean of the system), such that for all transitions $e$ and all integers $n \ge k$
$$\hat{t}(\langle e, n + M\rangle) - \hat{t}(\langle e, n\rangle) = M p^{*}$$
where $p^{*}$ denotes the critical cycle mean (Definition 9).
The work [14] implied that the system will have the same asymptotic cycle time under our weaker assumption of system connectivity/topology, and papers [5]-[8] gave bounds on the transient of the system corresponding to a Max-Plus matrix before it exhibits periodicity, and pointed out that if it is strongly connected, the period will be the same and equal to the critical cycle mean. In particular, the systems (graphs) associated with the Max-Plus matrices in those papers only correspond to the RER systems in this paper when every edge has exactly one tick, and the work [6] provided some insights on how to apply their results to more general systems such as those in cyclic scheduling. We provide a different proof strategy. Our results on exact periodicity make weaker assumptions about system connectivity/topology, and thus directly characterize and apply to more general classes of systems, especially the circuits/systems that we discuss later. Although our bounds on k are not directly comparable to those of [5]-[8] because of the different assumptions and definitions, the parameters and terms in our formula are similar in structure to previous results.
The rest of this section establishes this main technical result through a number of small steps. The next three lemmas are straightforward.
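Lemma 1: Let $f$ be a noninitial event in an ER system. Then, there exists at least one rule $r = e \xrightarrow{\alpha} f \in R$ such that $s_{\hat{t}}(r) = 0$.
Proof: If $f$ is a noninitial event, then there is at least one rule that has $f$ as a target. This means that in the equation for $\hat{t}(f)$ (Definition 4), the set on the right-hand side of the equation is nonempty. Hence, there is some rule $r = e \xrightarrow{\alpha} f$ that attains the maximum, that is, $\hat{t}(f) = \hat{t}(e) + \alpha$, and therefore $s_{\hat{t}}(r) = 0$.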
Corollary 4: For every noninitial event, there exists a path from some initial event to it that has zero slack.
As can be seen in Fig. 2 , from any event, we can always find a blue path (zero slack rule) back to an initial event.
Consider the ER system that corresponds to an RER system. For any noninitial event $\langle f, n_f\rangle$, there is a zero slack e-path from some initial event to it (Corollary 4). Consider some event $\langle e, n_e\rangle$ on such a zero slack e-path.
Lemma 2 (Zero Slack Path Timing Property): For all $i \in \mathbb{Z}$ where $i + \min(n_e, n_f) \ge 0$
$$\hat{t}(\langle f, n_f + i\rangle) - \hat{t}(\langle f, n_f\rangle) \ge \hat{t}(\langle e, n_e + i\rangle) - \hat{t}(\langle e, n_e\rangle). \tag{6}$$
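Proof: By Corollary 2, given any e-path $p$ from $\langle e, n_e + i\rangle$ to $\langle f, n_f + i\rangle$, we have
$$\hat{t}(\langle f, n_f + i\rangle) \ge \hat{t}(\langle e, n_e + i\rangle) + \delta(p). \tag{7}$$
By construction, we are given a zero slack e-path $p_0$ from $\langle e, n_e\rangle$ to $\langle f, n_f\rangle$:
$$\delta(p_0) = \hat{t}(\langle f, n_f\rangle) - \hat{t}(\langle e, n_e\rangle). \tag{8}$$
Since the ER system is generated from an RER system, there is an e-path $p_i$ from $\langle e, n_e + i\rangle$ to $\langle f, n_f + i\rangle$ corresponding to $p_0$, and $\delta(p_i) = \delta(p_0)$, so applying (7) to $p_i$ gives
$$\hat{t}(\langle f, n_f + i\rangle) \ge \hat{t}(\langle e, n_e + i\rangle) + \delta(p_0).$$
Using (8) for $\delta(p_0)$ and rearranging the inequality gives us the result.
The proof of Lemma 2 relies on the repetitive nature of the ER system obtained from the RER system: if a zero-slack path exists between $\langle e, n_e\rangle$ and $\langle f, n_f\rangle$, then the delay on that path is a bound on the minimum delay between $\langle e, n_e + i\rangle$ and $\langle f, n_f + i\rangle$ for any $i \ge -\min(n_e, n_f)$.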
Proof:
for all i ≤ k, i ∈ N + , where j i ∈ {1, 2, . . . , k − 1}. Since we have k j i 's, but only k − 1 different elements, the pigeonhole principle indicates the existence of
With all the preparation lemmas above, the following lemma shows that the timing gaps of the occurrences of the same transition can be considered in blocks with finite sizes, where each block is bounded above by a quantity related to the critical cycle mean.
Lemma 4 (Upper Bound):
Given an event f , n f and
where $p^{*}$ is the critical cycle mean (Definition 9).
Proof: By Corollary 4, there always exists a zero slack e-path from some initial event $\langle g, 0\rangle$ to the noninitial event $\langle f, n_f\rangle$. Tracing zero slack rules backward from $\langle f, n_f\rangle$ yields such a path.
If $n_f \ge NI$, then this zero slack e-path has at least $NI$ edges [because $\max\{\varepsilon(r)\} = 1$]. We only have $N$ unique transitions in the RER system, so some transition (denoted as $e$) must appear at least $(NI/N) + 1 = I + 1$ times on this e-path. Consider the $I + 1$ occurrences of transition $e$ that are closest to the initial event $\langle g, 0\rangle$ in time, denoted by events $\langle e, n_0\rangle, \langle e, n_1\rangle, \ldots, \langle e, n_I\rangle$, where $0 \le n_0 < \cdots < n_I \le NI$. (Notice that $e$ might occur more than $I + 1$ times, and we only choose the $I + 1$ occurrences with the smallest $n_i$'s.)
We can decompose the large cycle $(\langle e, n_0\rangle, \ldots, \langle e, n_1\rangle, \ldots, \langle e, n_I\rangle)$ into $I$ cycles $(\langle e, n_0\rangle, \ldots, \langle e, n_1\rangle)$, $(\langle e, n_1\rangle, \ldots, \langle e, n_2\rangle)$, $\ldots$, $(\langle e, n_{I-1}\rangle, \ldots, \langle e, n_I\rangle)$. We denote these cycles as $C_1, \ldots, C_I$. Then by definition, we have that
for all m ≤ I, m ∈ N + . If I is not divisible by any l i s, then I ≥ 2, and we can apply Lemma 3 with k = I and l i = n i − n 0 for i ≤ I, i ∈ N + . Thus, there must exist
where $q \le N$, $q \in \mathbb{N}^{+}$. Notice that we can further decompose the cycles $C_{a+1}, \ldots, C_b$ into simple cycles $S_i$ without changing the total weight and epsilon-sum of the cycles.
Algorithm 1: Computing $M$ and $p'$ for Periodicity Proofs
By Lemma 2 (the case when $i < 0$), we have
as desired. Note that if the q in (12) is found by exactly carrying the steps described above, the equality in (12) holds only when all of the equalities in (15) hold, which further implies that
That is, all of the simple cycles $S_i$ are critical cycles.
Remark 1: Our proof of this lemma also has some structural similarities to that of Lemma 3 in [5], where we set up a comparison between timing simulations and the critical cycle mean. We introduced the parameter $I$ in Lemma 4 because we will need it in later parts of the proof. It is needed when we have to consider the case where an RER system has multiple critical cycles, and some cycles have $\varepsilon(\cdot) > 1$. In those cases, the proof will need to examine timing gaps in blocks whose size is an integer multiple of $I$, where $I$ corresponds to the value of the epsilon-sum.
Given an RER system, we can easily compute the timing simulation until every transition occurs at least $n$ times, giving us $\hat{t}(\langle f, i\rangle)$ for all $f \in E'$ and all $0 \le i \le n$. While computing these values of $\hat{t}$, we can record all rules that have zero slack. Note that there might be multiple rules with a given target that have zero slack, in which case we record just one of them.
We compute $p^{*}$ using a well-known algorithm [9], which has a worst case complexity of $O(N^{2} D)$. In the following analysis, we also need two additional parameters: the least common multiple of the epsilon-sums of all the critical cycles, and the second largest cycle mean. Computing the exact values of these two quantities, however, can be very costly. Instead, we define $M$ and $p'$, as the result of executing Algorithm 1, to be replacements for the two quantities, respectively, and we use them to bound the time that it takes for the system to exhibit periodicity.
Examining Algorithm 1, note that by the same reasoning used in Lemma 4, a zero slack e-path from an initial event to $\langle f, NM\rangle$ must contain at least $M$ cycles. In the algorithm, $G_i$ denotes the simple cycles decomposed from these cycles. The algorithm terminates (when $M$ is not increased by the for loop) because $M$ is bounded from above by the least common multiple of the $\varepsilon(C)$'s for all critical cycles $C$. Also, note that $\varepsilon(G_i) \mid M$ for any critical cycle $G_i$ and $\delta(G_i)/\varepsilon(G_i) \le p'$ for any noncritical $G_i$ obtained while applying the algorithm above. The quantities $M$ and $p'$ appear many times in our equations and bounds, and we will give an intuition of the roles they play in our analysis later.
Suppose we pick an arbitrary event f , n f in the ER system where n f ≥ NM, obtain a zero slack e-path from an initial event to it, and find the q ≤ N, q ∈ N + as in Lemma 4 such thatt
Then, we also have the following corollary based on Lemma 4.
Corollary 5 (Lower Bound):
If at least one of the simple cycles S i (as in Lemma 4) is a critical cycle, then
for all k ∈ N + . Proof: Assume that some transition e on the zero slack e-path is on the critical cycle denoted S x . By construction,
where the last step follows from applying Corollary 2 to the e-path obtained from S x repeated M/ (S x ) times.
Theorem 2 (Stability of Steady-State):
For any transition f , if n f ≥ NM, and
for all k ∈ N + . Proof: For the event f , n f , we follow the steps in the proof of Lemma 4 and obtain q 1 ≤ N, q 1 ∈ N + such that:
Assumption (18) implies that (20) is an equality, and (18) is an equality for all 0 ≤ k < q 1 . Hence, all the simple cycles S i 's from Lemma 4 are critical cycles. Applying Corollary 5 (with k = 1)t
Equation (21) extends (18) to k = −1. We now repeat the argument, starting from event f , n f + M to obtain q 2 ≤ N, q 2 ∈ N + such that
The extended (18) forces the inequalities to be equalities. That ist
The result readily follows by a simple induction.
Remark 2: Hulgaard et al. [10] mentioned that, in strongly connected systems where cycles have more than one tick, timing values being determined by a critical cycle once does not necessarily mean that timing values have entered the periodic steady-state. Additional unfoldings might be necessary before the steady-state is reached, and the number is closely related to solutions to the Frobenius problem [22]. This theorem also reflects this point, and provides an alternative way to verify the timing periodicity (including in systems that are not strongly connected but satisfy our assumptions). In particular, if the periodicity property is maintained $MN$ times, the system is guaranteed to be in the periodic steady-state.
Lemma 5 (Bound of
Proof: We know that there exists a zero slack e-path p 0 from some initial event g, 0 to f , n f . p 0 can be decomposed into simple cycles S i and a cycle-free path p 1 = (g, e 1 , . . . , e l(p 1 )−1 , f ) with 0 ≤ l(p 1 ) < N in the RER system. Note that p 0 corresponds to the path starting from g, 0 , traversing some of the e i 's, then the cycles S i , and then the rest of the e i 's until we reach f , n f . Thereforê
By Corollary 3, we can find a path p 2 with 0 ≤ l(p 2 ) ≤ N − l(C) < N from the transition e to f . Set p to be the e-path corresponding to p 2 . First notice that by Corollary 1,
Thus, we have
Similarly by Corollary 3, we can find a path p 3 with length 
Also, notice that
Then, we have
as desired.
Then, for all transitions f
Proof: For any $n \ge L$ and transition $f$, there exists a zero-slack e-path from some initial event to $\langle f, n\rangle$, and we can find a positive integer $q_1 \le N$ as in Lemma 4 such that
We use S i to denote the ith decomposed simple cycle as in Lemma 4. There also exists a zero slack e-path from some initial event to f , n − q 1 M , and we apply Lemma 4 again to obtain q 2 ≤ N such that
We use S i to denote the ith decomposed simple cycle from Lemma 4. Define
where we repeat the aforementioned procedure k times. k is selected so that
Since the change from K k−1 to K k is q k M and q i ≤ N for all i, this also means K k ≤ n − N. Therefore
Lemma 5 implies that there exists an e-path p 0 with l(p 0 ) < N from some event e, n e to f , n , where e is on some critical cycle C. Thus
by Corollary 1 and the earlier constraint on K k . Corollary 2 implies that
Since, n − K k ≥ N again by earlier constraint on K k , Lemma 5 also provides an e-path p −K k from e, n e − K k to f , n − K k such that
Assume, toward a contradiction, that the inequality in (32) is strict. We use the contrapositive of Corollary 5 (with k := q 1 and n f := n − q 1 M) to conclude that none of the simple cycles S i 's are critical cycles. Using the same argument as (15)
Since n − K k−1 ≥ (M + 1)N > NM, we can repeatedly apply this reasoning to obtain inequalities of the form (39).
Telescoping them, and combining with (32), we get
Let p 0 be the e-path corresponding to p −K k but from e, n e to f , n . Then, by (37) and (40)
Together with (38) and (41)
This is a contradiction, because slack value is non-negative. Thus, the equality in (32) must hold. Then, by Lemma 5, all of S i 's are critical. Thus, for all n ≥ L, we must havê
Theorem 2 then implies that for all n ≥ L + NM
as desired.
Remark 3: Our proof based on graph and path properties (slack of paths) provides a different view from previous work on how the system enters the periodic steady-state with time period $M p^{*}$.
Specifically, think of all the events in the system as partitioned into blocks, with block sizes equal to integer multiples of $M$ but less than $NM$. If there is positive slack from $\langle e, n_e\rangle$ ($e$ is on a critical cycle) to $\langle f, n\rangle$ (which is not in steady-state), then there is always a minimum reduction in slack, namely $p^{*} - p'$, after each block: from e, n e + k 1 → ( y↑, 3 ) = 8.7. After each block (block size is one in this simple case because each cycle has an epsilon-sum of one), there is at least 10 − 9.9 = 0.1 reduction of slack on the path, because transition $x{\uparrow}$ is on the critical cycle and occurs at a slower rate. After the slack vanishes, the timing of occurrence of transition $f$ will be determined by that of $e$, and thus determined by the critical cycle mean, which further implies that the transition $f$ will satisfy the property stated in the inequality (43). Theorem 2 then implies that if a transition satisfies this property for a long enough time, it will always be in the steady-state.
Complexity: We know that max{ (C)} = N for any simple cycle in the RER system. In the worst case, where p ≈ p and M = lcm{1, 2, . . . , N}, we know that M < 3 N [21] . Thus, it can take very long time, O([Np /(p − p )] + N3 N ), for the system to enter the steady-state. Fortunately, the worst case described rarely occurs, especially in real circuits as illustrated below.
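The worst case value $M = \mathrm{lcm}\{1, 2, \ldots, N\}$ and the bound $M < 3^{N}$ cited from [21] can be explored numerically. The sketch below is only a spot check for small $N$, not a proof; the function name is hypothetical, and `math.lcm` with multiple arguments requires Python 3.9 or later.

```python
from math import lcm


def worst_case_M(n):
    """lcm{1, 2, ..., n}: the largest value M can take when the critical
    cycles have epsilon-sums ranging over 1..n."""
    return lcm(*range(1, n + 1))


# Spot check of the cited bound lcm{1..N} < 3**N for small N.
bound_holds = all(worst_case_M(n) < 3 ** n for n in range(1, 41))

# Two critical cycles with epsilon-sums two and three give M = 6.
M_two_three = lcm(2, 3)
```

Even though the worst case grows exponentially, a system whose critical cycles all have small epsilon-sums has a small $M$, as in the case of epsilon-sums two and three above.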
IV. APPLICATIONS
The theoretical upper bound from Section III on the number of event occurrences before the system is guaranteed to show periodic behavior applies to any system that satisfies the requirements described at the beginning of this paper, i.e., any system that can be modeled as an ER system. Because the analysis takes the worst case at every step, the bound is extremely loose in practice. Most practical systems become periodic very quickly, except in some extreme cases, as illustrated next.
A. Abstract Examples
The parameter N corresponds to the number of transitions in the RER system. This parameter provides one measure of system complexity. For an asynchronous circuit, the transitions are of the form x↑ and x↓, where x is the output of a gate. Hence, N is typically twice the number of gates in the asynchronous circuit, and will obviously affect the time it takes for the system to show periodicity. Now, we will explore some abstract examples to analyze the effects of parameters M and p quantitatively.
Consider the example shown in Fig. 3, where there are two critical cycles (p = 5), with epsilon-sums of two and three, respectively. The transition y↑ has two incoming edges, from x↑ and y↓, which belong to the two different critical cycles. Intuitively, the timing of the occurrences of y↑ is dominated by the two cycles in turn. It is therefore reasonable that, in the worst case, y↑ has to occur on the order of M = 2 × 3 = 6 times before it shows the periodic pattern. Indeed, the timing simulation of this example confirms this intuition: the occurrence period is 6.
The bound on k that we give in Theorem 3 justifies the intuition provided by Gaubert and Klimann [4] and Hulgaard et al. [10] that it might take a long time for the system to become periodic if there exists a non-critical cycle whose cycle mean is close to p. Consider the example shown in Fig. 1, which is similar to [20, Fig. 3]. This example takes a very long time to reach the steady-state because p' ≈ p, where p' denotes the largest non-critical cycle mean. There are only four transitions, but the system does not show periodicity (period 20) until after 3612 time units, by which point each transition has occurred more than 180 times.
B. Asynchronous Circuit Examples
We simulated different configurations of token rings, linear FIFOs, and cross-connected FIFOs ("X" topology) of asynchronous weak-conditioned half-buffer (WCHB) pipeline stages. While these circuits contain data and therefore contain OR gates (a simple WCHB FIFO is shown in Fig. 4), they are often designed with symmetric timing characteristics for both true and false data values. In this case, we can construct RER systems that correctly capture their timing properties. In the simulations that follow, a single gate delay is ten time units.
R(a, b) in Table I denotes a ring with a + 2b stages of buffers ( are copied to the FIFOs of length a, b and c, d, respectively, (Fig. 6), and the outputs of the FIFOs are combined to produce the two primary outputs. We simulate enough occurrences of each transition to make sure that the system is indeed in the periodic steady-state, and report the occurrence period M, the time period Mp, and the time at which the observed transitions (arbitrarily chosen) complete their first steady-state period. We refer to this as the "first completion" time. For example, in R(3, 2), the observed output goes from low to high at times 120, 220, 380, 540, 640, 800, 960, 1060, 1220, 1380, . . . Thus, M = 3 and Mp = 420. As can be seen from our results, most simple examples become periodic after a small delay.
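The occurrence period and time period quoted for R(3, 2) can be read off the listed transition times mechanically. The sketch below is only an illustration: the function name `occurrence_period` and its `start` parameter are ours, and it assumes the supplied trace is already in steady-state from index `start` onward.

```python
def occurrence_period(times, start=0):
    """Return (M, M*p): the smallest M for which times[n + M] - times[n]
    is the same constant for every n >= start, or None if no such M fits."""
    for m in range(1, (len(times) - start) // 2 + 1):
        gaps = {times[n + m] - times[n] for n in range(start, len(times) - m)}
        if len(gaps) == 1:
            return m, gaps.pop()
    return None


# Rising-edge times of the observed output of R(3, 2), as listed above.
r32 = [120, 220, 380, 540, 640, 800, 960, 1060, 1220, 1380]
print(occurrence_period(r32))  # (3, 420)
```

Consecutive gaps alternate between 100 and 160 time units, so no single-occurrence period exists; the pattern only repeats every three occurrences.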
C. Asynchronous + Synchronous Circuits
Our analysis framework can model circuits that contain both synchronous and asynchronous components. A simple free-running clock can be modeled as an RER system with two transitions, clk↑ and clk↓, and two rule templates that relate them, whose delays correspond to the durations for which the clock is high and low. These transitions can be used to determine the propagation delays of other signals (the outputs of state-holding elements, for example) in the usual way. If part of the circuit described by the RER system is asynchronous and the clock is truly synchronous (i.e., not pausable or stretchable), there is no path from any event in the asynchronous circuit back to clk↑ or clk↓. Hence, most previous periodicity theory does not apply to such an RER system, because the collapsed constraint graph is not strongly connected. However, if the clock cycle period is in fact the critical cycle period (which is typical), then our results apply as long as there is a path from clk↑ or clk↓ to all the other events in the system.
The immediate consequence is that if there is an asynchronous component whose primary inputs are driven by a clocked environment, and there is a path (in the RER system) from the clock transition to every other transition in the system, then all signals in the entire RER system will be periodic with the same period as the external clock after finitely many signal transitions. In this scenario, we have M = 1, since there is only one critical cycle: the one that determines the clock period. Table II demonstrates the synchronization of asynchronous systems with an external periodic input signal (a clock). Theorem 3 implies that such a system will exhibit exact periodicity, and provides a bound on the time taken to reach this periodic state. The worst case bound is at least 8N, and in our examples N is quite large. Table II shows that, in reality, the system typically enters the steady-state much faster than the bound.
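The locking behavior described above can be reproduced with a toy timing simulation. The sketch below is not one of the circuits in Table II: the rule set, the delays, and the names `clk+`, `clk-`, `x+`, `x-` (standing for clk↑, clk↓, x↑, x↓) are hypothetical. It assumes and-causality with fixed delays, so that occurrence n of a transition happens at the maximum over its rules of the predecessor time plus the rule delay, and it assumes that events with no predecessor occur at time 0.

```python
from functools import lru_cache

# Each rule (src, dst, delay, eps) reads: occurrence n - eps of src must precede
# occurrence n of dst by at least `delay` time units (and-causality).
RULES = [
    ("clk+", "clk-", 100, 0),  # clock high phase
    ("clk-", "clk+", 100, 1),  # clock low phase, giving a clock period of 200
    ("x+", "x-", 95, 0),       # free-running asynchronous loop ...
    ("x-", "x+", 95, 1),       # ... with cycle mean 190 < 200
    ("clk+", "x+", 10, 3),     # x may run up to three occurrences ahead of the clock
]


@lru_cache(maxsize=None)
def t(transition, n):
    """Time of occurrence n (0-based) of `transition`."""
    candidates = [t(src, n - eps) + delay
                  for (src, dst, delay, eps) in RULES
                  if dst == transition and n - eps >= 0]
    return max(candidates, default=0)  # assumption: unconstrained events occur at time 0


# Separation between consecutive occurrences of x+ (evaluated in increasing n,
# so the memoized recursion stays shallow).
gaps = [t("x+", n + 1) - t("x+", n) for n in range(100)]
print(gaps[:3], gaps.index(200))  # [190, 190, 190] 59
```

In this toy system the asynchronous loop initially runs at its own period of 190 and closes its 590-unit head start on the clock by 10 time units per occurrence, so x+ locks to the 200-unit clock period from occurrence 59 onward.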
For this example, we used a linear FIFO array [denoted L(a, b), with a + 2b stages and b initial data values] implemented with weak-conditioned logic, where the input to the array is synchronized to a clock with a fixed period of 200 time units (20 gate delays in our simulations). We selected this period because it is longer than the cycle period of the free-running asynchronous circuit, so that the critical cycle in the system is the clock cycle. The additional asynchronous arithmetic circuit examples consist of 2-, 4-, and 8-bit adders (ADD) and 4-bit and 8-bit array multiplier cores (MUL). In each case, the external data inputs are clocked with the clock cycle period shown, and the time at which the output of the asynchronous core is synchronized to this external clock is shown as well. The adder and multiplier circuits are implemented using precharged half-buffer asynchronous logic [23], assuming symmetric sizing of data wires. Table II shows when the asynchronous circuit outputs synchronize to this external clock. We can see from the table that the outputs of the circuit quickly become synchronous with respect to the external clock. Hence, an asynchronous circuit whose inputs are all synchronized to a single clock will have synchronous outputs after finitely many asynchronous outputs, and the outputs can therefore be read by a clocked circuit without metastability, as implemented in Fig. 7.
In our implementation, the input of an asynchronous circuit (labeled "Async") is connected to the output of a synchronous circuit. Assume that the clock period is p_1 and that the largest cycle mean of the asynchronous circuit is p_2 (which determines the natural frequency of the asynchronous circuit). If p_1 > p_2, the combined system will have critical cycle mean p = p_1 and largest non-critical cycle mean p' = p_2. As we have established, the output of the asynchronous circuit will become synchronous to the external clock, with fixed period p, after finitely many occurrences of transitions. It is then possible to build a metastability-free interface between synchronous and asynchronous logic. To handle the initial transient, when the asynchronous circuit is not periodic, a k-stage FIFO is attached to the output of the asynchronous circuit to hold the initial asynchronous tokens for synchronous reading. We assume that the handshake of a k-stage FIFO is fast enough that it does not change the critical cycle mean p of the whole system (this is typical in our experience, since a FIFO is one of the fastest asynchronous circuits). The output of the FIFO is connected to another clocked component (with the same clock period p) with an enable signal that is initially low.
Note that any finite part of a timing simulation can be easily computed using standard simulation tools. We proceed as follows.
Step 1: We connect a perfect sink (a circuit that acknowledges the data instantly) to the output of the FIFO, and simulate the circuit in Fig. 8. In this case, the value of k does not matter, and each token is removed by the sink as soon as possible. The entire system becomes synchronous to the external clock after each transition occurs a certain number of times, denoted by m.
We record when the ith token arrives at the output of the asynchronous circuit (which is equivalent to the ith occurrence of the output transition) and denote that time by t_i. Since the circuit becomes synchronous after each transition occurs m times, we know that t_{m+j} = t_m + jp for all j ∈ N.
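Because of this periodicity, a finite simulation trace determines every later token time. A minimal sketch follows; the function name `token_time` and the sample numbers are hypothetical, and the trace is assumed to hold t_1, ..., t_m with the system already synchronous at occurrence m.

```python
def token_time(i, trace, m, p):
    """Arrival time t_i of the i-th token (1-based).

    trace[j - 1] holds the simulated t_j for j = 1..m; later tokens follow
    t_{m+j} = t_m + j*p, where p is the clock period.
    """
    if i < 1:
        raise ValueError("token indices start at 1")
    if i <= m:
        return trace[i - 1]
    return trace[m - 1] + (i - m) * p


# Hypothetical trace: three simulated tokens, clock period 200.
trace = [40, 230, 420]
print(token_time(2, trace, m=3, p=200))   # 230, taken from the simulation
print(token_time(10, trace, m=3, p=200))  # 1820 = 420 + 7 * 200
```

Only the first m entries of the simulation need to be stored, which is what makes the finite checks in the following steps possible.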
Each FIFO stage can hold one token. Denote by d the minimum delay for a token at one FIFO stage to propagate to the next stage, assuming the rest of the FIFO is empty.
Step 2: We would like to replace the circuit from Fig. 8 (that destroys the output of the asynchronous circuit) with Fig. 7 .
Suppose we change the enable from low to high after n (n ∈ N+) clock cycles (at time t_e = np). The time t_e has to be large enough that each time the clocked component connected to the output of the FIFO acknowledges a token, there is a token there to acknowledge. Also, k has to be large enough that the FIFO can hold, without overflow, all the tokens that have been produced by the async block but have not yet been acknowledged by the clocked component. To simplify matters, we assume that the enable signal is changed at the negative edge of the clock (i.e., the enable is asserted halfway through the nth clock cycle). Hence, the output circuit reads the FIFO data at times t e + (i
The first token (i = 0) is produced by the async block at time t_1 (from Step 1), and it propagates through the FIFO with delay k × d. Generalizing, it is clear that if the time t_e satisfies the constraints
for all i ≤ m, i ∈ N, we are able to read the tokens stored in the k-stage FIFO with the same clock period p without metastability (Fig. 9). For the case i > m, we know from our periodicity result that t_{m+j} = t_m + jp, and hence (45) for i = m is sufficient to show that the circuit continues to operate correctly for the rest of the data. To characterize k, define a new parameter d' to be the delay for an empty stage to move from the (i + 1)th stage to the ith stage. This parameter characterizes the process in which a token at the (i + 1)th stage is acknowledged and removed, and the token at the ith stage moves to the (i + 1)th stage, effectively creating an empty stage that moves from the (i + 1)th stage to the ith stage. Since d' includes the time it takes for the next FIFO stage to become ready to accept a new token, d' > d. We can also make a reasonable assumption that for any u ∈ N+ such that t_u < t_e + kd' + (1/2)p and
for any u ∈ N+ such that t_u ≥ t_e + kd' + (1/2)p, the FIFO will never overflow. To see this, first notice that at time t_u, the total number of tokens that have been generated is u. Meanwhile, we start to remove the tokens synchronously at time t_e + (1/2)p, and at time
Furthermore, the FIFO might need to be longer to compensate for the fact that, when a token is removed, the empty stage is on the rightmost side, and it takes some more time for this empty stage to "propagate" to the left and effectively create a new space for tokens generated by the asynchronous circuit. Since it takes at most kd' time units for an empty space to propagate from one end of the FIFO to the other, at time
) empty stages are actually created and available at the output of the asynchronous circuit. For simplicity, we drop the case t_u < t_e + kd' and use (47) as a uniform expression for k. Notice that (47) is a sufficient condition for (46) when t_u < t_e + kd'. Theoretically, such a simplification might result in an unnecessarily large value of k in some unusual systems where the difference between t_{i+1} and t_i is many multiples of p for some i. We do not have to worry about this in most practical circuits. Rearranging the inequality, a simple sufficient condition on k such that the FIFO will never overflow is
for any u ∈ N+. We do not actually have to verify infinitely many integer values of u either, since again t_{m+j} = t_m + jp for any j ∈ N. If inequality (48) holds for some u_1 ≥ m, u_1 ∈ N+, it automatically holds for all u ≥ u_1, u ∈ N+.
There are two technical details worth mentioning. First, we can always pick t_e large enough that (45) is satisfied by an appropriate choice of k, so long as the async circuit's output data token rate is approximately equal to its input data token rate. The reason is that d is typically much smaller than p, since an asynchronous FIFO is one of the fastest circuits one can design. The second technical detail concerns the value of kd. We chose d as the empty-FIFO delay, but if k > 1 and we consider the ith token's propagation delay, the FIFO may not be empty. This could result in the ith token taking more time than kd to propagate through the FIFO. However, this stall would occur only because earlier tokens have already reached the output of the FIFO. The stall is therefore a result of the async block producing tokens faster than they can be read by the synchronous output. This scenario also implies that the output can be safely read via the synchronous interface without metastability. Hence, we need not consider this as a separate case, and we simply have to verify (45). The property we have relied on is that, for large enough k, the FIFO will not perturb the timing of the async circuit.
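The check of (45) over a finite simulation trace can be sketched as follows. The exact form of (45) is not reproduced in this sketch's source text, so the code rests on an assumption: it reads the constraint as t_e + (i + 1/2)p ≥ t_{i+1} + kd for i = 0, ..., m, with t_e = np, which is our interpretation of the description of the read times and the FIFO delay given above. The function name `min_enable_cycle` and all numbers are hypothetical.

```python
def min_enable_cycle(trace, m, p, k, d):
    """Smallest n >= 1 such that, with t_e = n*p, every read finds a token waiting.

    ASSUMED form of the constraint (our reading of (45), not a quoted formula):
        t_e + (i + 1/2) * p >= t_{i+1} + k * d    for i = 0..m
    trace[j - 1] holds the simulated t_j for j = 1..m.
    """
    def arrival(j):
        # t_j, extended beyond the trace by periodicity: t_{m+j} = t_m + j*p.
        return trace[j - 1] if j <= m else trace[m - 1] + (j - m) * p

    # Multiply the constraint by 2 to stay in integers:
    #   2*n*p >= 2*(t_{i+1} + k*d) - (2*i + 1)*p
    worst = max(2 * (arrival(i + 1) + k * d) - (2 * i + 1) * p
                for i in range(m + 1))
    n = -(-worst // (2 * p))  # integer ceiling of worst / (2*p)
    return max(1, n)


# Hypothetical numbers: first token at 700, clock period 200, 2-stage FIFO, d = 20.
print(min_enable_cycle([700, 890, 1080], m=3, p=200, k=2, d=20))  # 4
```

With these numbers the first token reaches the FIFO output at 740, so enabling after three cycles (first read at 700) would be too early, while enabling after four cycles (first read at 900) satisfies the assumed constraint for every token. The overflow condition on k would need a separate check and is not covered here.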
We used the method outlined above to implement the metastability-free interface for some of our asynchronous circuit examples. We simulated the timing of transitions using our asynchronous circuit simulation tools. We report in Table III the minimum k and the corresponding minimum t_e that are valid for each asynchronous circuit and that result in metastability-free operation.
The large values of k in Table III correspond to asynchronous circuits that can produce output tokens without receiving any inputs. Note that for most of the circuits, the value of k is either one or two; in the other cases, it is zero or one plus the number of initial tokens that can be output by the async block without receiving any external input. Therefore, we expect that the FIFOs needed in many practical cases (for example, arithmetic units and datapath blocks) will be small.
D. Extensions to Bounded Delay Systems
Work on the problem of TSE considered systems under bounded delay models [10], [11], [20]. The analysis is essentially based on periodic properties under fixed delay models, where the delays of all the rules are set to the maximum or minimum of the range. Notice that if one permits even a tiny and arbitrary perturbation of a delay value [α, α + r] on an edge on the critical cycle, the system is no longer periodic. This can be easily seen by examining a simple ring oscillator in which one of the inverters has variable delay. A perturbation of the delay value on an edge that is not on the critical cycle might also affect the timing properties of the system. In reality, a clock is not exactly periodic either, because of jitter. However, the exact periodicity under fixed delay models reveals the core properties of such systems, and applications based on such properties can be readily generalized to bounded delay systems by methods similar to those in the previous work. We now examand half initial ones) by , or (n+ (1/2) )p (related to initial n and half clock cycles) by , depending on whether n c > n or n c ≤ n. In the RER system, we denote the output transition of the asynchronous block to be f , and thus in the corresponding ER system,t( f , u ) = t u+1 (we denoted t 1 as the first token, but f , 0 is the event of first occurrence of transition f ). Now, fix the delays of all rules except for those in R 1 . Without loss of generality, we increase the delay of an arbitrary edge (rule), clk ↓, n c δ(r 1 ) → clk ↑, n c in the ER system, which corresponds to r 1 in the RER system, from some reference value δ (r 1 ) ∈ [β 1 , β 1 ], by , to δ(r 1 )+ ∈ [β 1 , β 1 ] .
First, in (45), we can find a zero slack e-path p i from some initial event g, 0 to f , i . Notice that a transition e, n e with n e > i is never on p i , which implies that for any n c > i, t i+1 will not change. If n c ≤ i, the left-hand side of the inequality, the time after initial n + i + (1/2) clock cycles, will increase by exactly , and by Theorem 4, the righthand side of the inequality will increase by at most . Thus, the left-hand side of the inequality will always increase by at least the same amount as the right-hand side of the inequality, and δ(r 1 ) ≡ β 1 for all n c ∈ N will be the worst case scenario.
Then, in (51), kp and up will always change at the same time and by the same amount, which always results in a cancellation. Again by Theorem 4, the right-hand side of the inequality will decrease by at most . When n c ≤ n, the (n + (1/2))p will decrease by exactly , which implies a worst case scenario when δ(r 1 ) ≡ β 1 for all n c ≤ n. When n c > n, (n + (1/2))p will remain the same, which implies a worst case scenario when δ(r 1 ) ≡ β 1 for all n c > n.
The same argument holds if we increase clk ↑, n c −1
→ clk↓, n c in the ER system, which corresponds to r 2 in the RER system. Define p min = β 1 + β 2 , p max = β 1 + β 2 , and let d max correspond to the value of d when each rule r ∈ R 3 picks its maximum delay. If 2p min − p max ≥ q d max , where q 2, which is typical in practical circuits, consider the system under the worst case scenario we discussed above, where δ(r 1 ) ≡ β 1 , δ(r 2 ) ≡ β 2 for all n c ≤ n 1 and δ(r 1 ) ≡ β 1 , δ(r 2 ) ≡ β 2 for all n c > n 1 . We can find a metastability-free interface solution (n 1 , k 1 ) to it because when we increase n 1 by 1, t e will be increased by at least p min , and the total increase of delays of rules in the ER system will be p max − p min . Theorem 4 then guarantees that timing simulations of the output transition increase by at most p max − p min . Together with the similar idea to technical details we mentioned in the fixed delay system, we can always find n 1 and k 1 large enough that satisfy both (45) and (48) for the system. Furthermore, (n 1 +1, k 1 +v) where 2 ≤ v ≤ q , v ∈ N are all feasible solutions. Notice that Lemma 6 does not apply directly here because delays of rules depend on n 1 . However, with the reasonable assumption we just made, similar result still holds. To see this, if (45) holds for (n 1 , k 1 ), then for (n 1 + 1, k 1 + v), δ(r 1 ) ≡ β 1 , δ(r 2 ) ≡ β 2 for all n c ≤ n 1 + 1 and δ(r 1 ) ≡ β 1 , δ(r 2 ) ≡ β 2 for all n c > n 1 + 1. The left-hand side of (45) increases by at least p min and the right-hand side of it increases by at most p max − p min + vd max by Theorem 4, for all i, and the inequality still holds by the assumption. Similarly, (51) still holds as well because the left-hand-side of it increases by at least vp min − p max and the right-hand side of it increases by at most vd max for all u.
If δ(r 1 ) ≡ β 1 and δ(r 2 ) ≡ β 2 for all n c , we can find another metastability-free interface solution (n 2 , k 2 ). Corollary 6 applies here, and together with arguments above, we can find a common solution (n 3 , k 3 ) to both systems. Thus, (n 3 , k 3 ) is a solution to worst case scenario when delays of rules in the ER system corresponding to r 1 ∈ R 1 are allowed to change.
Similar arguments can be made for all other edges, and the analysis is much easier than the one above because t_{i+1} (or d and d') are monotone increasing when we increase the delays of rules corresponding to r ∈ R 2 (or r ∈ R 3 ), respectively, and the other parameters do not change in either case. It is easy to check that when r ∈ R 2 , the worst case scenario for (45) is when each rule picks its maximum delay, and that for (51) is when each rule picks its minimum delay. When r ∈ R 3 , the worst case scenario for both (45) and (51) is when each rule picks its maximum delay. A common solution to combinations of these worst case scenarios will be a valid solution for the bounded delay system.
Remark 4: If q and q are large, which is typical in practical systems, in Corollary 6, n 3 and k 3 of the minimum common solution (n 3 , k 3 ) will not be much larger than max{n 1 , n 2 } and max{k 1 , k 2 }, respectively, which is good for the practical feasibility of our design under bounded delay assumptions.
E. Existing Systems
STARI, shown in Fig. 11, is a novel technique for high-bandwidth communication proposed by Greenstreet [12]. It combines synchronous and self-timed design methods, and corresponds to the case in our design where the async component is a collection of wires. There are also at least two commercially developed systems that connect asynchronous components to a synchronous environment without any metastability. One of these systems is a digital FIR filter chip used in the read channel of a disk drive controller [24]. Another is an asynchronous FPGA that provides a synchronous I/O interface and user model [25]. While both of these systems relied on the empirical observation of periodicity in a specific class of asynchronous circuits, our results show that this is not a coincidence. The property of exact periodicity holds for a wide range of asynchronous circuits, and can be exploited for metastability-free interfaces between synchronous and asynchronous logic.
V. CONCLUSION
We considered RER systems under and-causality and a fixed delay model, and gave a different rigorous proof, based on graph and path properties, that regardless of the initial configuration, after a finite number of occurrences k of each transition, the system enters the steady-state, where each transition occurs with a fixed time period Mp and occurrence period M. This paper provides an analytical bound on k and uses a weaker assumption about the RER system. Specifically, we assume that for any transition f in the RER system, there is a path from some transition e on a critical cycle to f. Our results under this weaker assumption provide a theoretical foundation for the exact periodicity property to be applied and exploited in circuits containing a combination of synchronous and asynchronous components. We implemented the analysis and provided experimental results that support it. We also provided an extension of our analysis and methods to the case of bounded delay systems.
In particular, we showed that it is possible to interface asynchronous circuits to synchronous logic without metastability, provided all inputs to the asynchronous circuit are clocked. In future work, we plan to explore efficient and practical algorithms to compute the timing properties of asynchronous circuits. We also plan to extend our results to systems with both and-causality and or-causality using the ideas from [15] and [16] , thereby extending the results to general asynchronous circuits.
