Synthesis of timed asynchronous circuits by Myers, Chris J. & Meng, Teresa H.-Y.
106 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 1, NO. 2, JUNE 1993
Synthesis of Timed Asynchronous Circuits
Chris J. Myers and Teresa H.-Y. Meng
Abstract—In this paper we present a systematic procedure to 
synthesize timed asynchronous circuits using timing constraints 
dictated by system integration, thereby facilitating natural
interaction between synchronous and asynchronous circuits. In
addition, our timed circuits also tend to be more efficient, in both 
speed and area, compared with traditional asynchronous circuits. 
Our synthesis procedure begins with a cyclic graph specification 
to which timing constraints can be added. First, the cyclic graph 
is unfolded into an infinite acyclic graph. Then, an analysis of 
two finite subgraphs of the infinite acyclic graph detects and 
removes redundancy in the original specification based on the 
given timing constraints. From this reduced specification, an 
implementation that is guaranteed to function correctly under the 
timing constraints is systematically synthesized. With practical 
circuit examples, we demonstrate that the resulting timed
implementation is significantly reduced in complexity compared with
implementations previously derived using other methodologies.
I. Introduction
The design of timed asynchronous circuits has recently
gained much attention because of the increasing need 
for asynchronous circuits in mixed synchronous/asynchronous 
environments. Inherent in these environments are timing
constraints (gate, wire, and environment delay information) which
circuits must satisfy and can exploit to optimize the
implementation. Existing asynchronous design techniques either cannot
handle systems with timing constraints, or do not fully utilize 
the information contained in them. This paper presents a 
methodology to synthesize asynchronous circuits that utilizes 
timing constraints throughout the synthesis procedure. As a 
result, our timed circuits retain the same behavior with less 
circuit complexity than earlier implementations.
Many methodologies have been proposed for the synthesis
of speed-independent circuits [1]–[4]. Speed-independent
circuits are very robust since they are guaranteed to work 
independent of the delays associated with their gates, but they 
can be overly conservative when timing constraints are known. 
Timed circuits, on the other hand, are only guaranteed to work 
if the delays fall in the range given in the timing constraints of 
the specification. Utilizing these timing constraints, we trade 
robustness to variations in delays for significant reductions in 
circuit complexity.
Speed-independent circuits are restricted to interfaces where 
their environment only changes inputs in response to changes 
of outputs. Inputs from a synchronous circuit often do not 
satisfy this restriction. In order to address this problem,
fundamental mode synthesis methods have been used [5]–[8],
which assume the environment will wait long enough for the
circuit to stabilize before inputs are changed. Timing analysis
must be performed after synthesis, and appropriate delays may
need to be added to guarantee that this requirement is satisfied.
Since these methods limit the concurrency within a circuit and
do not fully utilize available timing constraints, they may result
in circuits that are larger and slower than necessary.

Manuscript received. This work was supported by an NSF fellowship, ONR
Grant N00014-89-J-3036, and research grants from the Center for Integrated
Systems, Stanford University and the Semiconductor Research Corporation
under Contract 92-DJ-205.
The authors are with the Department of Electrical Engineering, Stanford
University, Stanford, CA 94305.
IEEE Log Number 9209313.
Methods have been proposed to use timing constraints to 
synthesize timed circuits [9], [10]; however, most techniques 
apply timing constraints after synthesis only to verify that 
hazards do not exist. If hazards are detected, delay elements 
are added to avoid them, degrading the performance of the 
implementation. It was shown in [4] that the more conservative
speed-independent model, while resulting in somewhat larger
circuits, actually produces faster circuits than the
timed circuits described in [10]. This surprising result can be
attributed to the fact that these timed circuits often need to have 
delay elements added to the critical path to remove hazards.
Our synthesis procedure uses the timing constraints at 
the outset to enhance performance while minimizing circuit 
complexity. In several practical examples, we show that
significant reductions in circuit complexity (measured in terms
of the literal count needed for the implementation) compared
with previous designs can be achieved using very conservative
timing constraints. In particular, in a memory management
unit designed for use with a real asynchronous microprocessor
[11], [12], the circuit complexity is reduced by over 50% relative to
the speed-independent implementation. Circuit performance is
also enhanced, not only because we have reduced circuit area 
and do not use delay elements, but also because we are able 
to synthesize a more concurrent specification without adding 
state variables. An example of a DRAM controller to be used
with a synchronous processor and a DRAM array is presented
to illustrate a design that cannot be implemented speed-independently.
Circuit complexity is also reduced compared with previous
fundamental-mode designs [13], [7].
This paper contains five sections. Section II describes our 
specification language and timing analysis algorithm. Section 
III discusses our synthesis procedure. Section IV presents 
several practical examples. Section V gives our conclusions.
II. Timing Analysis on Timed Specifications
A wide variety of methodologies for the specification of
asynchronous circuits have been proposed. They can be roughly
grouped into three classes: language based, such as communicating
sequential processes (CSP) [1]; graph based, such as
signal transition graphs (STG) [2]; and finite-state machine
based, such as burst-mode state machines (BSM) [6]. At
a high level, CSP provides a very concise representation
for large designs such as the microprocessor described in
[11]. It is well suited for non-deterministic behavior, but
it can be difficult to specify concurrency within a process.
On the other hand, STG provides a good representation of
concurrency within a process, but it is cumbersome to use for
large designs and cannot specify arbitrary non-deterministic
behavior. Neither of these representations is well suited for
specifying asynchronous circuits in a synchronous environment.
BSM has been used successfully for such applications [13];
this was made possible by assuming fundamental mode,
as opposed to the other two specification methods, which use the
speed-independent model. None of these specification methods
incorporates timing constraints.

1063-8210/93$03.00 © 1993 IEEE
We chose to use a specification language, the event-rule (ER) 
system [14], which is easily derivable from CSP, STG, and 
BSM and incorporates timing constraints. It is shown in [14] 
that specifications that are not disjunctive or non-deterministic 
can be directly transformed into an ER system. A specification 
is disjunctive if there exists a transition in the specification that
is specified to occur after either one transition or another, but
need not be preceded by both. A specification is non-deterministic
if the circuit behavior is determined by a choice
made by either the environment or the circuit. Derivation of 
ER systems from each specification method described above 
(i.e., CSP, STG, and BSM) is illustrated through an example. 
While our synthesis procedure does not presently allow
non-deterministic specifications, we show by example
that some non-deterministic specifications can be transformed
into deterministic specifications, which can then be synthesized.
In order to synthesize timed circuits, timing analysis must 
be used on the ER system specification to deduce timing 
information necessary to detect redundancy in the specification 
from the given timing constraints. More specifically, in timed 
circuits, the timing information needed is the minimum and 
maximum difference in time between any two events (i.e., 
signal transitions) in a circuit specification. Polynomial-time 
algorithms have been developed [15], [16] to determine the 
difference in time between any two events in an acyclic graph. 
Circuit specifications, however, are normally cyclic. Therefore,
to apply these algorithms to circuit synthesis, these results
must be extended to handle cyclic specifications. Recently, an
algorithm has been proposed that finds these time differences
in cyclic graphs in exponential time [17]. In this paper, we
instead propose a polynomial-time heuristic algorithm which
is sufficient for our purposes. Our algorithm unfolds the cyclic 
graph into an infinite acyclic graph and then examines only 
two finite acyclic subgraphs of the infinite graph to determine 
a sufficient bound on the time difference between two events.
2.1. Event-Rule System
The ER system was introduced in [14] for performance
analysis of asynchronous circuits. It was modified to incorporate
bounds on the timing constraints and introduced as a
specification language for timed circuits in [18]. An ER system
is composed of a set of atomic actions, called events, and the causal
dependencies between them, called rules; it can be represented
compactly using an event-rule (ER) schema.
MYERS AND MENG: TIMED ASYNCHRONOUS CIRCUITS
1) Events: An event is defined as “ ... an action which one 
can choose to regard as indivisible—it either has happened or 
has not ... ” [19]. In circuits, events are transitions of signals
from one value to another. There are two transitions associated
with each signal s in a specification, namely, s↑, where ↑
denotes that the signal s is changing from a low to high value,
and s↓, where ↓ denotes that the signal s is changing from
a high to low value.
2) Rules: A rule is a causal dependency between two
events. Each rule is composed of an enabling event, an enabled 
event, and a bounded timing constraint. Informally, a rule 
states that the enabled event cannot occur until the enabling 
event has occurred. If two rules enable the same event then that 
event cannot occur until both enabling events have occurred. 
This causality requirement is termed conjunctive.
The bounded timing constraint places a lower and upper 
bound on the timing of a rule. A rule is said to be satisfied 
if the amount of time which has passed since the enabling 
event has exceeded the lower-bound of the rule. A rule is said 
to be expired if the amount of time which has passed since 
the enabling event has exceeded the upper bound of the rule. 
An event cannot occur until all rules enabling it are satisfied. 
An event must always occur before every rule enabling it has
expired. Since an event may be enabled by multiple rules, the
time between the enabled event and some of its enabling events
may exceed the upper bounds of the corresponding timing
constraints, but this cannot happen for all of its enabling events. These timing
constraints are the same as the max constraints described in 
[15] and the type 2 arcs described in [16].
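The satisfied/expired terminology can be made concrete with a small sketch. The Python below is illustrative only: the class and function names are hypothetical, each rule is reduced to the firing time of its enabling event plus its bound [l, u], and "exceeded the lower bound" is treated as inclusive so that it agrees with the ≤ used in the timed firing rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EnablingRule:
    """A rule enabling some event: when its enabling event fired, and its bound [l, u]."""
    t_enabling: float  # firing time of the enabling event occurrence
    lower: float       # l
    upper: float       # u

    def satisfied(self, now: float) -> bool:
        # Lower bound reached (inclusive, matching the <= of the timed firing rule).
        return now - self.t_enabling >= self.lower

    def expired(self, now: float) -> bool:
        # Strictly more time than the upper bound has passed.
        return now - self.t_enabling > self.upper


def may_fire(rules: List[EnablingRule], now: float) -> bool:
    """The enabled event cannot occur until every enabling rule is satisfied."""
    return all(r.satisfied(now) for r in rules)


def overdue(rules: List[EnablingRule], now: float) -> bool:
    """True once every enabling rule has expired: the event should already have occurred."""
    return all(r.expired(now) for r in rules)
```

With one rule whose enabling event fired at time 0 and another at time 20, both bounded by [0, 5], the event cannot fire at time 10, may fire at time 22 even though the first rule has already expired, and is overdue at time 26, when both rules have expired.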
Finding timing constraints for a specification is not a trivial 
task. Rules can be categorized into environment rules (i.e., the 
enabled event is a transition of an input signal) and internal 
rules (i.e., the enabled event is a transition of a state variable 
or output signal). Timing constraints for environment rules 
can be determined from interface specifications or datapath 
delay estimates. For internal rules, the problem is much more 
difficult since the timing constraints cannot be known until 
the circuit is synthesized, but the circuit cannot be synthesized 
without given timing constraints. To solve this problem, the 
designer should estimate the maximum delay for the gates 
in the library to be used and set the upper-bound of the 
timing constraint in each internal rule to this value. The lower
bound of the timing constraint should usually be set to 0
since optimizations could potentially reduce the gate to nothing 
more than a wire. After a circuit is generated, it should be 
analyzed using a timing analysis tool to verify that the timing 
constraints used are correct. If the circuit violates the timing 
constraints, it must be resynthesized with more conservative 
timing constraints (larger upper bounds in this case). In order 
to avoid resynthesis, conservative values should be used for 
timing constraints on internal rules at the outset. In the design
of interface circuits and other controllers, inputs often come
from off-chip or from a datapath. In these cases, the lower
bound of the timing constraint on environment rules is large 
compared with the upper bound of the timing constraints on 
internal rules. Therefore, a conservative estimate for internal 
gate delays does not significantly affect the complexity of the 
timed implementation.
Fig. 1. The cyclic constraint graph for an SCSI protocol controller (courtesy 
of [20]).
3) Event-Rule Schema: An ER system can be specified
using an ER schema and initialization information described in
the next subsection. An ER schema defines a cyclic constraint
graph, which is a weighted marked graph in which the vertices
are the events, the arcs are the rules, the weights are the
bounded timing constraints, and the initial marking is given by
the values of ε. Each rule of the form (e, f, ε, τ) is represented
in the graph with an arc connecting the enabling event e to the
enabled event f. The arc is weighted with the bounded timing
constraint τ. In other words, each rule corresponds to a graph
segment from e to f labeled τ, drawn with a token on the arc when
the rule is initially marked (i.e., ε = 1). A cyclic constraint graph
is similar to an STG in which timing constraints have been added
to the arcs. The ER schema is defined more formally as follows:
Definition 2.1: (Event-Rule Schema): An event-rule schema
is a pair (E', R'), where E' is a finite set of events and R' is
a finite set of rules. Each rule is denoted as (e, f, ε, τ), where
e and f are two events, ε is defined to be 1 if the rule has an
initial marking and 0 otherwise, and τ = [l, u], where l is the
lower bound and u is the upper bound of the timing constraint
on the rule.
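One way to hold an ER schema in code is sketched below. The Python class and field names are hypothetical, event names use a trailing `+`/`-` for ↑/↓, and the requirement 0 ≤ l ≤ u is an assumption of the sketch rather than part of Definition 2.1. The SCSI rule (q↓, rdy↓, 0, [0, 5]) discussed next would be entered as `add_rule("q-", "rdy-", 0, 0, 5)`.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Rule:
    """One rule (e, f, eps, tau) of an ER schema, with tau = [lower, upper]."""
    enabling: str  # event e
    enabled: str   # event f
    eps: int       # 1 if the rule is initially marked, else 0
    lower: float   # l
    upper: float   # u (may be float("inf"))


@dataclass
class ERSchema:
    """An event-rule schema (E', R'): a finite set of events and a finite set of rules."""
    events: Set[str] = field(default_factory=set)
    rules: List[Rule] = field(default_factory=list)

    def add_rule(self, e: str, f: str, eps: int, lower: float, upper: float) -> Rule:
        # Sanity checks; nonnegative delays are an assumption of this sketch.
        if eps not in (0, 1):
            raise ValueError("initial marking eps must be 0 or 1")
        if not 0 <= lower <= upper:
            raise ValueError("expected 0 <= l <= u")
        rule = Rule(e, f, eps, lower, upper)
        self.events.update((e, f))
        self.rules.append(rule)
        return rule

    def enabling_rules(self, f: str) -> List[Rule]:
        """All rules whose enabled event is f; f must wait for all of them."""
        return [r for r in self.rules if r.enabled == f]
```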
As an example, a SCSI protocol controller, originally specified
with an STG [20], is specified by its cyclic constraint graph
as shown in Fig. 1. An example of a rule in this constraint
graph is the rule between the two events q↓ and rdy↓, which is of
the form (q↓, rdy↓, 0, [0, 5]).
Our synthesis procedure requires that each event in an ER 
schema is uniquely identified. This led to the restriction in [18] 
of only one rising and one falling transition of each signal 
per cycle in the specification. To remove this restriction in 
this paper, each occurrence of the rising and falling transition 
in a cycle is given a unique name. For example, a signal s
specified to rise and fall twice in a cycle is renamed s1 for
the first rising and falling transitions and s2 for the second.
These events are treated separately during the timing analysis; 
however, they are recombined during synthesis as illustrated 
in an example later.
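The renaming step can be sketched as follows, under the simplifying assumption (not from the paper) that one cycle's transitions are listed in a single sequential order and that the k-th rise of a signal pairs with its k-th fall. The function name `rename_occurrences` is hypothetical.

```python
from collections import Counter
from typing import List


def rename_occurrences(cycle: List[str]) -> List[str]:
    """Rename repeated transitions of a signal within one cycle.

    Transitions are written "s+" (rise) or "s-" (fall). A signal that rises
    (or falls) more than once gets its k-th rise and k-th fall renamed to
    "s<k>+" and "s<k>-"; other signals keep their names.
    """
    counts = Counter(cycle)
    repeated = {t[:-1] for t, n in counts.items() if n > 1}
    seen: Counter = Counter()
    renamed = []
    for t in cycle:
        signal, edge = t[:-1], t[-1]
        if signal in repeated:
            seen[t] += 1  # k-th occurrence of this rise (or fall)
            renamed.append(f"{signal}{seen[t]}{edge}")
        else:
            renamed.append(t)
    return renamed
```

For the cycle s↑, s↓, s↑, s↓, a↑, a↓ this yields s1↑, s1↓, s2↑, s2↓, a↑, a↓; the renamed events are only recombined at synthesis time.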
Another requirement is that the cyclic constraint graph is 
well-formed. A cyclic constraint graph is well-formed if it is
strongly connected, every cycle has the sum of the ε values
along the cycle greater than or equal to 1, and for every event
there exists a cycle including the event in which the sum of the
ε values is equal to 1 [17]. Many specifications are not
well-formed, but such specifications can often be synthesized by
transforming them into ones which are well-formed, as shown
later in an example.
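The three well-formedness conditions can be checked directly on the cyclic constraint graph. The Python sketch below is an illustration, not the procedure of [17]: it assumes ε ∈ {0, 1}, takes the event set from the arcs, and relies on two observations. Every cycle carries at least one token exactly when the unmarked arcs form no cycle, and an event lies on a one-token cycle exactly when it sits on an unmarked path from the head of some marked arc back to that arc's tail. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str, int]  # (enabling event e, enabled event f, initial marking eps)


def _reach(adj: Dict[str, List[str]], start: str) -> Set[str]:
    """Vertices reachable from `start` (including itself), by iterative DFS."""
    seen, stack = {start}, [start]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def is_well_formed(arcs: List[Arc]) -> bool:
    events = sorted({a for a, _, _ in arcs} | {b for _, b, _ in arcs})
    if not events:
        return False
    fwd, bwd = defaultdict(list), defaultdict(list)    # all arcs
    fwd0, bwd0 = defaultdict(list), defaultdict(list)  # unmarked arcs only
    for e, f, eps in arcs:
        fwd[e].append(f)
        bwd[f].append(e)
        if eps == 0:
            fwd0[e].append(f)
            bwd0[f].append(e)
    everything = set(events)
    # (1) Strongly connected: one event reaches, and is reached by, all events.
    if _reach(fwd, events[0]) != everything or _reach(bwd, events[0]) != everything:
        return False
    # (2) Token sum >= 1 on every cycle <=> no cycle of unmarked arcs (Kahn's sort).
    indeg = {v: len(bwd0[v]) for v in events}
    ready = [v for v in events if indeg[v] == 0]
    removed = 0
    while ready:
        x = ready.pop()
        removed += 1
        for y in fwd0[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                ready.append(y)
    if removed != len(events):
        return False
    # (3) Every event v lies on a marked arc a -> b closed by an unmarked path b ~> v ~> a.
    covered: Set[str] = set()
    for a, b, eps in arcs:
        if eps == 1:
            covered |= _reach(fwd0, b) & _reach(bwd0, a)
    return covered == everything
```

A four-phase handshake r↑ → a↑ → r↓ → a↓ → r↑ with only the last arc marked passes; marking no arc violates condition 2, and marking two arcs of the single cycle violates condition 3.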
4) Event-Rule System: To construct the ER system, the 
cyclic constraint graph is transformed into an infinite acyclic 
constraint graph. Each event in the ER schema is mapped onto 
an infinite number of event occurrences, each corresponding 
to a different occurrence of that event. The rules are similarly 
mapped. Thus, in the infinite acyclic constraint graph, each
rule occurrence (e, f, i, ε, τ) corresponds to a graph segment
from ⟨e, i − ε⟩ to ⟨f, i⟩ weighted with τ. The occurrence index i
is used to denote each separate occurrence of an event or rule in
the ER schema. The first occurrence has i = 0, and i increments
with each following occurrence. The occurrence-index offset ε is
the difference between the occurrence indices of the enabled event f
and the enabling event e. For each rule occurrence, the value of
the occurrence-index offset ε is the same as the value of the
initial marking ε for the corresponding rule in the ER schema.
A special reset event is added to the set of events in order 
to model the reset of the circuit. For each initially marked
rule (i.e., ε = 1) with enabled event f, a reset rule is added
between the reset event and the event f. This rule models
special timing constraints on the initial occurrence of the event
f. Effectively, the acyclic constraint graph is constructed by
cutting the cyclic constraint graph at the initial marking and
unfolding the graph for an infinite number of cycles. The result
is an ER system as defined below:
Definition 2.2: (Event-Rule System): Given the event-rule
schema (E', R'), define an event-rule system to be a pair
(E, R), where each event occurrence ⟨e, i⟩ in E with i ≥ 0
represents an occurrence of an event e in E', and each rule
occurrence (e, f, i, ε, τ) in R with i ≥ ε is an occurrence
of a rule (e, f, ε, τ) in R'. The event ⟨reset, 0⟩ is added to
E. For each rule in R' in which ε = 1, a rule of the form
(reset, f, 0, 0, τ_0) is added to R.
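A finite prefix of this unfolding is easy to generate. The sketch below (Python, hypothetical names) produces occurrence indices 0 to n − 1 only, since the ER system itself is infinite. The reset bound τ_0 is read from a caller-supplied table, defaulting to [0, 0], because the text leaves its value to the specification.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str, int, float, float]  # (e, f, eps, l, u) in R'
Occ = Tuple[str, int]                      # event occurrence <e, i>
Arc = Tuple[Occ, Occ, float, float]        # (<e, i - eps>, <f, i>, l, u)


def unfold(rules: List[Rule], n_cycles: int,
           reset_bound: Dict[str, Tuple[float, float]]) -> List[Arc]:
    """Finite prefix (occurrence indices 0 .. n_cycles - 1) of the ER system.

    Every rule occurrence (e, f, i, eps, tau) with i >= eps becomes an arc from
    <e, i - eps> to <f, i>. Arcs with eps = 0 stay inside one cycle and arcs
    with eps = 1 point from cycle i - 1 to cycle i, so the result is acyclic
    whenever the unmarked rules contain no cycle.
    """
    arcs: List[Arc] = []
    for i in range(n_cycles):
        for e, f, eps, lo, hi in rules:
            if i >= eps:
                arcs.append(((e, i - eps), (f, i), lo, hi))
    # One reset rule (reset, f, 0, 0, tau_0) per initially marked rule; tau_0 is
    # looked up in reset_bound (a hypothetical input, default [0, 0]).
    for _, f, eps, _, _ in rules:
        if eps == 1:
            lo, hi = reset_bound.get(f, (0.0, 0.0))
            arcs.append((("reset", 0), (f, 0), lo, hi))
    return arcs
```

For a four-phase handshake whose only marked rule is a↓ → r↑, two cycles give three arcs in cycle 0, four in cycle 1 (including ⟨a↓, 0⟩ → ⟨r↑, 1⟩), and one reset arc into ⟨r↑, 0⟩.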
The specified circuit behavior is defined by simulating the 
acyclic constraint graph using the timed firing rule given 
below:
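Definition 2.3: (Timed Firing Rule): Given that t(⟨f, i⟩) is
the exact time of the event occurrence ⟨f, i⟩, it can take on
any value within the bound defined in terms of the times of
the event occurrences that enable it. The bound can be given
as follows:
$$\max_{(e,f,i,\varepsilon,\tau)\in R}\{t(\langle e, i-\varepsilon\rangle) + l\} \;\le\; t(\langle f, i\rangle) \;\le\; \max_{(e,f,i,\varepsilon,\tau)\in R}\{t(\langle e, i-\varepsilon\rangle) + u\},$$
where τ = [l, u] is the bounded timing constraint of each enabling rule.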
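The timed firing rule suggests a direct way to draw example behaviors from a finite acyclic constraint graph: visit occurrences in topological order and pick a time inside each window. In the Python sketch below the uniform choice is an illustrative assumption (Definition 2.3 allows any time in the window), upper bounds are assumed finite, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

Occ = Tuple[str, int]                # event occurrence <f, i>
Arc = Tuple[Occ, Occ, float, float]  # (enabling occ, enabled occ, l, u)
Enabler = Tuple[Occ, float, float]   # (enabling occ, l, u)


def firing_window(times: Dict[Occ, float], enablers: List[Enabler]) -> Tuple[float, float]:
    """Definition 2.3: max(t_e + l) <= t(<f, i>) <= max(t_e + u)."""
    lo = max(times[e] + l for e, l, _ in enablers)
    hi = max(times[e] + u for e, _, u in enablers)
    return lo, hi


def simulate(arcs: List[Arc], rng: random.Random) -> Dict[Occ, float]:
    """Draw one timed behavior of a finite acyclic constraint graph.

    Occurrences with no enabling rule (e.g. <reset, 0>) fire at time 0; every
    other occurrence fires at a uniformly random time inside its window.
    """
    enablers: Dict[Occ, List[Enabler]] = defaultdict(list)
    preds: Dict[Occ, set] = defaultdict(set)
    for e, f, l, u in arcs:
        enablers[f].append((e, l, u))
        preds[f].add(e)
    times: Dict[Occ, float] = {}
    # static_order() yields every enabling occurrence before the ones it enables.
    for occ in TopologicalSorter(preds).static_order():
        if not enablers[occ]:
            times[occ] = 0.0
            continue
        lo, hi = firing_window(times, enablers[occ])
        assert math.isfinite(hi), "sketch assumes finite upper bounds"
        times[occ] = rng.uniform(lo, hi)
    return times
```

Because the window is built from maxima, an event enabled at times 0 and 20 by two [0, 5] rules gets the window [20, 25]: the earlier rule has no effect on either bound.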
A subgraph of the unfolded infinite acyclic constraint graph 
for the SCSI protocol controller is shown in Fig. 2. An example 
of a rule occurrence in this ER system is the one between the two
event occurrences ⟨q↓, 0⟩ and ⟨rdy↓, 0⟩, which is of the form
(q↓, rdy↓, 0, 0, [0, 5]). According to the timed firing rule, the
event occurrence ⟨rdy↓, 0⟩ cannot occur until both the event
occurrences ⟨q↓, 0⟩ and ⟨go↑, 0⟩ have occurred, and it must
occur before 5 time units have elapsed since both of these event
occurrences have occurred.
2.2. Timing Analysis
In order to transform an ER system specification into a timed 
circuit, our synthesis procedure requires a timing analysis 
algorithm to determine the minimum and maximum time 
difference between any two events. We have developed an
efficient polynomial-time timing analysis algorithm to determine
a sufficient estimate of these time differences based on only 
two finite subgraphs of the infinite acyclic constraint graph.
1) Worst-Case Time Difference: A time difference is a
bound on the amount of time between two event occurrences
(see Definition 2.4). The worst-case time difference is a bound 
on the minimum and maximum difference in time between 
two events for any occurrence (see Definition 2.5).
Definition 2.4: (Time Difference): Given two event
occurrences, ⟨u, i − j⟩ and ⟨v, i⟩, and the occurrence-index offset
j between them, where j ≥ 0, the time difference between
these two event occurrences is the strongest bound [L_i, U_i]
such that:
$$L_i \le t(\langle v, i\rangle) - t(\langle u, i-j\rangle) \le U_i$$
Definition 2.5: (Worst-Case Time Difference): Given two
events, u and v, and the occurrence-index offset j between them,
where j ≥ 0, the worst-case time difference between these
two events, [L, U], is:
$$L = \min_{i \ge j}\{L_i\} \quad\text{and}\quad U = \max_{i \ge j}\{U_i\},$$
where [L_i, U_i] is the time difference for each occurrence of u
and v with offset j (as defined in Definition 2.4).
2) Algorithm to Estimate Worst-Case Time Difference: In
our ER systems, a pair of events has an infinite number of
occurrences; however, it is possible to analyze a finite number
of occurrences to find a sufficient estimate of the worst-case
time difference, as defined in Definition 2.6.
Definition 2.6: (Estimate of the Worst-Case Time
Difference): Given the worst-case time difference [L, U] between
two events, an estimate of the worst-case time difference is
any [L', U'] such that L' ≤ L and U' ≥ U.
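Definitions 2.5 and 2.6 reduce to a min/max and a containment test. The Python sketch below (hypothetical names) applies them to a finite list of per-occurrence bounds; the numbers in any example are invented. The real definition ranges over infinitely many occurrences, which is why the text settles for an estimate.

```python
import math
from typing import Iterable, Tuple

Bound = Tuple[float, float]  # [lower, upper]


def worst_case(per_occurrence: Iterable[Bound]) -> Bound:
    """Definition 2.5 on a finite sample: [min of the L_i, max of the U_i]."""
    bounds = list(per_occurrence)
    return min(lo for lo, _ in bounds), max(hi for _, hi in bounds)


def is_estimate(candidate: Bound, actual: Bound) -> bool:
    """Definition 2.6: [L', U'] estimates [L, U] when L' <= L and U' >= U."""
    return candidate[0] <= actual[0] and candidate[1] >= actual[1]


# The trivial estimate [-inf, inf] satisfies Definition 2.6 for any [L, U].
TRIVIAL: Bound = (-math.inf, math.inf)
```

For example, per-occurrence bounds [20, 50], [15, 55], and [18, 40] combine to [15, 55]; [10, 60] is an estimate of that bound, while [16, 60] is not.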
Given two events u and v and an occurrence-index offset j
between them, Algorithm 2.3 determines an estimate of the
worst-case time difference between them by constructing two
finite acyclic subgraphs to be analyzed by Algorithm 2.2. The
first subgraph includes only events and rules with indices i − 1
and i for some arbitrary value of i > 0. A source event is
added to this subgraph, and each rule with ε = 1 and with
index i − 1 is replaced with a rule from the source event
to the enabled event with a timing constraint of [0, ∞]. This
construction guarantees that no timing assumptions are made
about previous cycles, which are not modeled in our finite
graph. For the special case when i = 0, another subgraph is
constructed which includes only events and rules with i = 0.
We prove later that the analysis of these two subgraphs yields
an estimate of the worst-case time difference.
These two subgraphs are acyclic and finite, so the algorithms
described in [15] and [16] can be used to find the time
difference between any two event occurrences ⟨u, i − j⟩
and ⟨v, i⟩ in these graphs. The function MaxDiff (defined
recursively in Algorithm 2.1 [16]) is used to find the upper
bound of the time difference, U_i. MaxDiff is also used to
find the minimum time difference L_i, since
MinDiff(⟨u, i − j⟩, ⟨v, i⟩) = (−1) · MaxDiff(⟨v, i⟩, ⟨u, i − j⟩) [15], [16].
The estimate of the worst-case time difference returned by
Algorithm 2.3 is the minimum of the lower bounds and the
maximum of the upper bounds of the time differences for the
ith and 0th occurrences. In our synthesis procedure, only time
differences with values of j = 0 or j = 1 are of interest, so this
algorithm does not produce a tight bound for j > 1. Also, since
the worst-case time difference is only defined over values of
i where i ≥ j, the 0th occurrence only needs to be considered
if j = 0. Finally, since this algorithm is called repeatedly in
the synthesis procedure, the graphs are created only once for
a given circuit, and once a time difference is calculated for a
particular pair of event occurrences, it is stored in a table so
that it need not be recalculated.
Algorithm 2.1 (Find Max Time Difference in an Acyclic
Graph):
int MaxDiff(acyclic graph G; event occurrences ⟨u, i - j⟩,
            ⟨v, i⟩) {
  If (⟨u, i - j⟩ == ⟨v, i⟩) then maxdiff = 0;
  Else If there is no path from ⟨v, i⟩ to ⟨u, i - j⟩ then
    maxdiff = max over all (e, v, i, ε, τ_ev) ∈ R of
              {MaxDiff(G, ⟨u, i - j⟩, ⟨e, i - ε⟩) + u_ev};
  Else
    maxdiff = min over all (e, u, i - j, ε, τ_eu) ∈ R of
              {MaxDiff(G, ⟨e, i - j - ε⟩, ⟨v, i⟩) - l_eu};
  Return(maxdiff);
}
Here τ_ev = [l_ev, u_ev] and τ_eu = [l_eu, u_eu].
Algorithm 2.2 (Find Time Difference in an Acyclic Graph):
bound TimeDiff(acyclic graph G; event occurrences
               ⟨u, i - j⟩, ⟨v, i⟩) {
  Li = (-1) * MaxDiff(G, ⟨v, i⟩, ⟨u, i - j⟩);
  Ui = MaxDiff(G, ⟨u, i - j⟩, ⟨v, i⟩);
  Return([Li, Ui]);
}
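Algorithms 2.1 and 2.2 translate into a short recursive Python sketch. The graph representation (a dict mapping each occurrence to its enabling rules) and the names `make_max_diff` and `time_diff` are assumptions. The sketch returns ∞ when it reaches an unconstrained root, and `lru_cache` plays the role of the table of stored time differences mentioned above. It follows the recursion as written here and is not a reproduction of the implementations in [15] or [16].

```python
import math
from functools import lru_cache
from typing import Callable, Dict, Hashable, List, Tuple

Node = Hashable
# preds[f] lists every rule enabling f as (e, l_ef, u_ef); the graph must be acyclic.
Preds = Dict[Node, List[Tuple[Node, float, float]]]


def make_max_diff(preds: Preds) -> Callable[[Node, Node], float]:
    """Return max_diff(u, v), an upper bound on t(v) - t(u), following Algorithm 2.1."""

    @lru_cache(maxsize=None)
    def ancestors(x: Node) -> frozenset:
        # x itself plus every node with a path to x.
        result = {x}
        for e, _, _ in preds.get(x, []):
            result |= ancestors(e)
        return frozenset(result)

    @lru_cache(maxsize=None)
    def max_diff(u: Node, v: Node) -> float:
        if u == v:
            return 0.0
        if v not in ancestors(u):
            # No path from v to u: expand v, since t(v) <= max_e (t(e) + u_ev).
            rules = preds.get(v, [])
            if not rules:
                return math.inf  # v is an unconstrained root
            return max(max_diff(u, e) + hi for e, _, hi in rules)
        # v precedes u: expand u, since t(u) >= t(e) + l_eu for every rule e -> u.
        return min(max_diff(e, v) - lo for e, lo, _ in preds[u])

    return max_diff


def time_diff(preds: Preds, u: Node, v: Node) -> Tuple[float, float]:
    """Algorithm 2.2: [L, U] with L <= t(v) - t(u) <= U."""
    max_diff = make_max_diff(preds)
    return -max_diff(v, u), max_diff(u, v)
```

In a small hypothetical graph where z is enabled by x and y through [0, 5] rules, x coincides with a root r, and y follows r by [20, 50], `time_diff` gives [20, 55] for t(z) − t(x). The [0, 5] rule from x to z therefore never determines when z fires.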




Algorithm 2.3 (Find Estimate of the Worst-Case Time Difference in a Cyclic
Graph):
bound WCTimeDiff(ER system (E, R); events u, v;
                 occurrence-index offset j) {
  If (j > 1) then Return([-∞, ∞]);
  Else {
    Construct subgraph G from (E, R) using only
      events and rules with indices i - 1 and i
      for an arbitrary i > 0, and exclude rules with
      enabling event ⟨reset, 0⟩;
    Add event ⟨source, i - 1⟩ to graph G;
    For each rule of the form (e, f, i - 1, 1, τ) in graph G,
      replace it with (source, f, i - 1, 0, [0, ∞]);
    [Li, Ui] = TimeDiff(G, ⟨u, i - j⟩, ⟨v, i⟩);
    If (j == 1) then Return([Li, Ui]);
    Else {
      Construct subgraph G' from (E, R) using only
        events and rules with index i = 0;
      [L0, U0] = TimeDiff(G', ⟨u, 0⟩, ⟨v, 0⟩);
      L' = min(Li, L0);
      U' = max(Ui, U0);
      Return([L', U']);
}}}

Fig. 2. A subgraph of the infinite acyclic constraint graph for the SCSI
protocol controller.
For the example shown in Fig. 2, the estimate of the worst-case
time difference found by Algorithm 2.3 between the two
events rdy↓ and q↓ with occurrence-index offset j = 0 is
the bound [15, 55]. This means that rdy↓ always occurs at
least 15 units of time after q↓, but no more than 55 units of
time after q↓.
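The following self-contained Python sketch puts the pieces of Algorithm 2.3 together. It repeats a compact MaxDiff so that it runs on its own, relabels cycles i − 1 and i as local indices 0 and 1, and, as a simplifying assumption not stated in the text, gives every reset rule the same bound τ_0 (default [0, 0]). Rules are plain tuples (e, f, ε, l, u), and all names are hypothetical.

```python
import math
from functools import lru_cache
from typing import Dict, List, Tuple

Rule = Tuple[str, str, int, float, float]  # (e, f, eps, l, u) in the ER schema
Occ = Tuple[str, int]                      # (event, local occurrence index)
Preds = Dict[Occ, List[Tuple[Occ, float, float]]]
INF = math.inf


def time_diff(preds: Preds, u: Occ, v: Occ) -> Tuple[float, float]:
    """Compact Algorithms 2.1/2.2: [L, U] bounding t(v) - t(u) in an acyclic graph."""

    @lru_cache(maxsize=None)
    def anc(x: Occ) -> frozenset:
        out = {x}
        for e, _, _ in preds.get(x, []):
            out |= anc(e)
        return frozenset(out)

    @lru_cache(maxsize=None)
    def md(a: Occ, b: Occ) -> float:  # upper bound on t(b) - t(a)
        if a == b:
            return 0.0
        if b not in anc(a):
            return max((md(a, e) + hi for e, _, hi in preds.get(b, [])), default=INF)
        return min(md(e, b) - lo for e, lo, _ in preds[a])

    return -md(v, u), md(u, v)


def subgraph_g(rules: List[Rule]) -> Preds:
    """G: cycles i - 1 and i, relabeled as local indices 0 and 1."""
    preds: Preds = {}
    for k in (0, 1):
        for e, f, eps, lo, hi in rules:
            if k - eps >= 0:
                preds.setdefault((f, k), []).append(((e, k - eps), lo, hi))
            else:  # marked rule from the unmodeled cycle: enable from the source
                preds.setdefault((f, k), []).append((("source", 0), 0.0, INF))
    return preds


def subgraph_g0(rules: List[Rule], tau0: Tuple[float, float] = (0.0, 0.0)) -> Preds:
    """G': occurrence 0 only; each marked rule becomes a reset rule bounded by tau0."""
    preds: Preds = {}
    for e, f, eps, lo, hi in rules:
        if eps == 0:
            preds.setdefault((f, 0), []).append(((e, 0), lo, hi))
        else:
            preds.setdefault((f, 0), []).append((("reset", 0), tau0[0], tau0[1]))
    return preds


def wc_time_diff(rules: List[Rule], u: str, v: str, j: int) -> Tuple[float, float]:
    """Algorithm 2.3 sketch: estimate [L', U'] of t(<v, i>) - t(<u, i - j>)."""
    if j > 1:
        return -INF, INF
    li, ui = time_diff(subgraph_g(rules), (u, 1 - j), (v, 1))
    if j == 1:
        return li, ui
    l0, u0 = time_diff(subgraph_g0(rules), (u, 0), (v, 0))
    return min(li, l0), max(ui, u0)
```

Take a made-up four-phase handshake r↑ → a↑ → r↓ → a↓ → r↑ in which every rule is bounded by [1, 2] and only the last rule is initially marked. On it, `wc_time_diff(hs, "r+", "a-", 0)` returns [3, 6], and the cross-cycle bound from a↓ to the next r↑ (j = 1) is [1, 2].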
3) Proof of Correctness: Theorem 2.1 shows that the bound
for the ith occurrence, [L_i, U_i], found in Algorithm 2.3 is an
estimate for all i > 0. Therefore, combining this with the
actual time difference for i = 0 results in an estimate of the
worst-case time difference.
Theorem 2.1: Algorithm 2.3 determines an estimate of the
worst-case time difference between two events.
Proof: In order to show that Algorithm 2.3 returns an
estimate of the worst-case time difference, we must show
that the following inequalities hold: L' ≤ L and U' ≥ U
(from Definition 2.6). If j > 1, then Algorithm 2.3 returns
[L', U'] = [−∞, ∞], which trivially satisfies Definition 2.6.
If j = 1, then it returns [L', U'] = [L_i, U_i]. If j = 0,
then Algorithm 2.3 returns L' = min(L_0, L_i) and
U' = max(U_0, U_i). Since [L_0, U_0] is an actual time difference for
the 0th occurrence, we only need to show that [L_i, U_i] always
yields an estimate for i > 0. A maximum time difference
is calculated recursively in terms of other maximum time
differences (see Algorithm 2.1). Therefore, when calculating
U_i using subgraph G, one of two cases may occur. First, its value
may be independent of maxdiff values for events not in graph
G (i.e., events with indices less than i − 1). If this is the case,
then U_i has the same value for every i ≥ 1. Second, its value may
depend on time differences of earlier events not in graph G. In
this case, just before MaxDiff falls off the end of the graph, it
will call either MaxDiff(G, ⟨source, i − 1⟩, ⟨f, i − 1⟩) (1) or
MaxDiff(G, ⟨f, i − 1⟩, ⟨source, i − 1⟩) (2). Since the rule between
⟨f, i − 1⟩ and ⟨source, i − 1⟩ has timing constraint [0, ∞], (1) will
return ∞, and (2) will return 0. If graph G were extended
to include another cycle, the rule between ⟨source, i − 1⟩
and ⟨f, i − 1⟩ would be replaced with a rule of the form
(e, f, i − 1, 1, τ). Now, either MaxDiff(G, ⟨e, i − 2⟩, ⟨f, i − 1⟩)
would be called, which would return a value less than or equal to
∞, or MaxDiff(G, ⟨f, i − 1⟩, ⟨e, i − 2⟩) would be called, which
would return a value less than or equal to 0 (note that this second
case is never positive because, from the ordering defined by the
rule, we know that e always occurs before f). This relationship
continues to hold if the graph is extended an infinite number
of cycles. Since the value found for case (1) and for case (2)
is greater than or equal to that found if graph G is extended back
further, and since the maximum time difference is calculated by adding
these values to values found on the rest of the graph, we know
that the value calculated for U_i using graph G will be greater than
or equal to the actual value of U_i for all i ≥ 1. Therefore, U' ≥ U,
and we can similarly show that L' ≤ L. Thus, Algorithm 2.3
gives an estimate of the worst-case time difference. □
4) Complexity of the Algorithm: Calculating the time
difference of each pair of events using the MaxDiff algorithm
has complexity O(v · e), where v is the number of vertices
and e is the number of arcs in the graph [15]. Let |E'| and
|R'| be the number of events and rules, respectively, in the
cyclic constraint graph representation. The largest graph which
Algorithm 2.3 analyzes has 2|E'| vertices and 2|R'| arcs.
Therefore, using Algorithm 2.3 to calculate estimates for all
time differences has complexity O(|E'| · |R'|).
5) Extensions to Find a Better Estimate: If either the bound
is not tight enough or there is interest in finding worst-case
time differences of events across more than one cycle (i.e.,
j > 1), the algorithm can be extended by increasing the size
of the subgraphs which Algorithm 2.3 analyzes. Assuming
subgraph G is enlarged to contain c cycles (c = 2 in Algorithm
2.3), the algorithm is modified in the following ways:
1. Construct subgraph G using only events and rules with
indices i − (c − 1), ..., i, where i > (c − 2).
2. Construct subgraph G' using only events and rules with
indices i ≤ (c − 2).
3. If j ≤ (c − 2), then using graph G', find [L_j, U_j], ...,
[L_{c−2}, U_{c−2}].
4. L' = min(L_i, L_j, ..., L_{c−2}) and U' = max(U_i, U_j,
..., U_{c−2}).
In the modified algorithm, estimates of worst-case time
differences with j ≤ (c − 1) can now be found. Theorem 2.1
can easily be extended to show that the modified algorithm
returns an estimate of the worst-case time difference. It is also
easy to show that the complexity of the modified algorithm is
O(c|E'| · c|R'|).
6) Termination of the Algorithm: In order to avoid
unnecessary calculations, the algorithm can be modified to check if
extending the size of the subgraphs analyzed (i.e., increasing
c) is helpful. To do this, the algorithm is modified to return
a best-case estimate, [L_best, U_best], in addition to the
worst-case estimate, [L', U'], where L_best = min(L_j, ..., L_{c−2})
and U_best = max(U_j, ..., U_{c−2}). Given that the actual worst-case
time difference is [L, U], it is easily shown that these estimates
satisfy the inequalities L' ≤ L ≤ L_best and U_best ≤ U ≤ U'.
If tightening the bound to [L_best, U_best] would not result in
less circuitry than [L', U'], then it is not worth increasing c. In
fact, if L_best = L' and U_best = U', then the actual worst-case
time difference [L, U] has been found. In general, increasing
c does not guarantee that the exact bound [L, U] can always 
be found, but in all the circuit examples that we synthesized, 
Algorithm 2.3 (i.e., c = 2) either found the exact bound or at 
least a sufficiently tight bound to detect all redundancies.
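As an illustration of this stopping test, the Python sketch below (hypothetical names) assumes that the estimate is used only for the redundancy check L' > u of Section III. Because L' ≤ L ≤ L_best, a rule with L_best ≤ u can never be shown redundant, so a larger c is only worth trying when L' ≤ u < L_best.

```python
from typing import Tuple

Bound = Tuple[float, float]


def worth_increasing_c(worst: Bound, best: Bound, rule_upper: float) -> bool:
    """Could a larger c change the redundancy verdict L' > u for one rule?

    worst = [L', U'] and best = [L_best, U_best], so L' <= L <= L_best.
    """
    l_worst, l_best = worst[0], best[0]
    if l_worst > rule_upper:
        return False  # already shown redundant with the current c
    if l_best <= rule_upper:
        return False  # L <= L_best <= u: no estimate L' <= L can exceed u
    return True       # L' <= u < L_best: a tighter estimate might prove redundancy


def exact_found(worst: Bound, best: Bound) -> bool:
    """If L_best = L' and U_best = U', then [L, U] itself has been found."""
    return worst == best
```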
III. Synthesis Procedure
Given an ER system specification, we apply our timing 
analysis algorithm to derive an optimized timed circuit
implementation. The synthesis procedure has three steps. The
first step is to detect and remove redundant rules from the 
specification. The second step is to construct a reduced state 
graph. The third step is to derive a circuit implementation 
from the reduced state graph.
3.1. Removing Redundant Rules
The first step in the synthesis procedure is to detect and 
remove redundant rules in the timed specification. Each
internal rule results in a literal in the implementation that
ensures the behavior specified by the rule. If this behavior is
guaranteed without the rule (i.e., the rule is redundant), then
the literal can be removed from the implementation, resulting
in a smaller circuit.
1) Redundant Rules: A rule is redundant in the timed
specification if its omission does not change the behavior specified
by the timed firing rule. This is defined more formally as
follows:
Definition 3.1 (Redundant Rule): A rule (e, f, i, ε, τ) is
redundant for all i ≥ ε if the bound on the time of the event
occurrence ⟨f, i⟩ with the rule removed, as defined below:
$$\max_{(e,f,i,\varepsilon,\tau)\in R_{NR}}\{t(\langle e, i-\varepsilon\rangle) + l\} \;\le\; t(\langle f, i\rangle) \;\le\; \max_{(e,f,i,\varepsilon,\tau)\in R_{NR}}\{t(\langle e, i-\varepsilon\rangle) + u\},$$
where R_NR = R − {(e, f, i, ε, τ) | i ≥ ε}, is the same as the bound
specified in the timed firing rule (see Definition 2.3).
2) Algorithm for Detecting Redundant Rules: If there are 
multiple rules enabling an event, then it is possible that some 
of them are redundant. Algorithm 3.1 checks each rule by using Algorithm 2.3 to find an estimate of the worst-case time difference between the enabling and enabled events. Theorem 3.1 proves that if the lower bound of this estimate is larger than the upper bound of the timing constraint on the rule, then the rule is redundant.
Algorithm 3.1 (Find Redundant Rules)
set FindRed(ER system (E, R)) {
  R_NR = R;
  For each rule of the form (e, f, i, ε, τ) where τ = [l, u] {
    [L', U'] = WCTimeDiff((E, R), e, f, ε);
    If (L' > u) then R_NR = R_NR − {(e, f, i, ε, τ) | i ≥ ε};
  }
  Return(R_NR);
}
The SCSI protocol controller example depicted in Fig. 2 has four events that are enabled by multiple rules: req↓, rdy↓, req↑, and q↑. For the rule (q↓, rdy↓, i, 0, [0, 5]), Algorithm 2.3 estimates the worst-case time difference between the two events rdy↓ and q↓ to be the bound [15, 55]. Since the lower bound of this time difference, 15, is greater than the upper bound of the timing constraint on the rule, 5, the rule is found to be redundant. In other words, the rule between the events q↓ and rdy↓ can be removed without changing the specified behavior. Further analysis finds this to be the only redundant rule.
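Once an estimator for the worst-case time difference is available, Algorithm 3.1 reduces to one comparison per rule. The Python sketch below assumes a caller-supplied function `wc_time_diff(e, f, eps)` standing in for Algorithm 2.3 (hypothetical; it must return the estimate $[L', U']$), and the flattened `Rule` record and event names such as `q-` (for q↓) are illustrative encodings, not the paper's data structures. The usage lines reproduce the SCSI numbers quoted above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Interval = Tuple[float, float]


@dataclass(frozen=True)
class Rule:
    """A rule (e, f, i, eps, [l, u]) in a hypothetical flattened form."""
    enabling: str    # enabling event e
    enabled: str     # enabled event f
    eps: int         # occurrence offset epsilon
    lower: float     # l of the timing constraint [l, u]
    upper: float     # u of the timing constraint [l, u]


def find_redundant(
    rules: Iterable[Rule],
    wc_time_diff: Callable[[str, str, int], Interval],
) -> Tuple[List[Rule], List[Rule]]:
    """Split rules into (non_redundant, redundant) as in Algorithm 3.1.

    wc_time_diff(e, f, eps) stands in for Algorithm 2.3 and must return an
    estimate [L', U'] of t(<f, i>) - t(<e, i - eps>). A rule is redundant
    when L' exceeds the rule's upper bound u.
    """
    non_redundant: List[Rule] = []
    redundant: List[Rule] = []
    for rule in rules:
        l_est, _ = wc_time_diff(rule.enabling, rule.enabled, rule.eps)
        if l_est > rule.upper:
            redundant.append(rule)
        else:
            non_redundant.append(rule)
    return non_redundant, redundant


# SCSI numbers from the text: the estimate for q-down -> rdy-down is [15, 55].
scsi_rule = Rule("q-", "rdy-", 0, 0.0, 5.0)
estimates = {("q-", "rdy-", 0): (15.0, 55.0)}
kept, removed = find_redundant([scsi_rule], lambda e, f, eps: estimates[(e, f, eps)])
```

Keeping the estimator as a parameter mirrors the algorithm's structure: the redundancy test itself never inspects the constraint graph, only the lower bound returned for each rule.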
3) Proof of Correctness: Definition 3.1 defined a redundant 
rule as a rule which could be removed from the ER system 
without changing the behavior specified by the timed firing 
rule. By applying transformations to the timed firing rule, 
Theorem 3.1 proves that Algorithm 3.1 finds redundant rules. 
Theorem 3.1: Algorithm 3.1 finds redundant rules.
Proof (by contradiction): Suppose a rule $(e, f, i, \varepsilon, \tau)$ satisfies the condition set forth in Algorithm 3.1 for redundancy (i.e., $L' > u$), and assume that it is not redundant. In that case, there exists a value of $i$ such that one of the following is true (from Definitions 2.3 and 3.1):
$$t(\langle f, i\rangle) < t(\langle e, i-\varepsilon\rangle) + l \quad \text{or} \quad t(\langle f, i\rangle) < t(\langle e, i-\varepsilon\rangle) + u.$$
Now, subtract $t(\langle e, i-\varepsilon\rangle)$ from each side:
$$t(\langle f, i\rangle) - t(\langle e, i-\varepsilon\rangle) < l \quad \text{or} \quad t(\langle f, i\rangle) - t(\langle e, i-\varepsilon\rangle) < u.$$
These are instances of a worst-case time difference, so they are bounded by $[L, U]$ (from Definition 2.5):
$$L \le t(\langle f, i\rangle) - t(\langle e, i-\varepsilon\rangle) < l \quad \text{or} \quad L \le t(\langle f, i\rangle) - t(\langle e, i-\varepsilon\rangle) < u.$$
Since $L'$ returned by Algorithm 2.3 is an estimate of the worst-case time difference (from Theorem 2.1), $L' \le L$ (from Definition 2.6). Also, $L' > u$ (from Algorithm 3.1) and $u \ge l$ (from Definition 2.2), so the following inequalities hold:
$$l \le u < L' \le L < l \quad \text{or} \quad u < L' \le L < u.$$
Thus, we have a contradiction in each case. □
3.2. Finding the Reduced State Graph
In order to generate a circuit implementation, many method­
ologies transform a higher level specification into a state graph 
so that Boolean minimization techniques can be applied [2], 
[3]. Essentially, a state graph is a graph in which the vertices 
are bitvectors and the arcs are signal transitions. Each bitvector 
specifies the binary value of every signal in the system when 
the system is in that state. In our method, timing analysis is 
utilized to generate a reduced state graph which often has 
significantly fewer states than a state graph generated without 
considering timing constraints. Since the size of the state graph 
and the complexity of the circuitry are strongly correlated, our 
method often results in simpler circuitry compared with other 
methods that do not fully utilize timing constraints.
1) Reduced State Graph: Typically, a state graph is specified
as a set of states and a set of transitions between states
[2], [3]. Algorithm 2.3 can be utilized to detect states that can 
never be reached, resulting in a reduced state graph. These 
unreachable states are removed from the set of states, and 
the transitions leading to them are removed from the set of
transitions. It is not always possible to infer from the reduced 
state graph all enabled transitions, since a transition can be 
enabled in a particular state without an arc leading from it to a 
state where that transition has occurred. Although the transition 
cannot occur in the next state, the fact that it has been enabled 
is needed during synthesis. To solve this problem, a reduced 
state graph is fully characterized by a set of states that contain 
information on enabled transitions as described in Definition
3.2. Each such state is a vertex in the reduced state graph, and 
these vertices are connected by arcs as described in Definition
3.3.
Definition 3.2 (State): Each state $S$ is of the form $S = s_1, \ldots, s_k, \ldots, s_n$, where $n$ is the number of signals in the specification. Each state bit $s_k$ has the value 0 if the signal $s_k$ is low, $R$ if the signal $s_k$ is low but enabled to rise, 1 if the signal $s_k$ is high, and $F$ if the signal $s_k$ is high but enabled to fall. The function VAL$[s_k] = 0$ if $s_k = 0$ or $R$, and VAL$[s_k] = 1$ if $s_k = 1$ or $F$.
Definition 3.3 (Reduced State Graph): A reduced state graph is a graph whose vertices are states and whose arcs are allowed transitions between states. There exists an arc from state $S$ to state $S'$ if there exists a signal $s_k$ such that for all $l \ne k$, VAL$[s_l] =$ VAL$[s'_l]$, and either $s_k = R$ and VAL$[s'_k] = 1$, or $s_k = F$ and VAL$[s'_k] = 0$.
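Definitions 3.2 and 3.3 translate directly into code. In the sketch below, a state is assumed to be a string over {0, R, 1, F} with one character per signal (an encoding chosen for illustration); `val` implements VAL, and `is_arc` tests the arc condition of Definition 3.3.

```python
from typing import List


def val(bit: str) -> int:
    """VAL from Definition 3.2: 0 for '0' or 'R', 1 for '1' or 'F'."""
    if bit in ("0", "R"):
        return 0
    if bit in ("1", "F"):
        return 1
    raise ValueError(f"unknown state bit {bit!r}")


def is_arc(s: str, s_next: str) -> bool:
    """True if Definition 3.3 allows an arc from state s to state s_next."""
    if len(s) != len(s_next):
        return False
    for k, (bit, bit_next) in enumerate(zip(s, s_next)):
        # Signal s_k must be enabled and must reach its new binary value ...
        fires = ((bit == "R" and val(bit_next) == 1)
                 or (bit == "F" and val(bit_next) == 0))
        # ... while every other signal keeps its binary value.
        others_stable = all(val(a) == val(b)
                            for j, (a, b) in enumerate(zip(s, s_next)) if j != k)
        if fires and others_stable:
            return True
    return False


def successors(s: str, candidates: List[str]) -> List[str]:
    """Candidate states reachable from s by a single arc."""
    return [c for c in candidates if is_arc(s, c)]
```

For example, from 0FR00 either the falling signal completes (giving 00R00) or the rising one does (giving 0F100).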
2) Constrained Token Flow: The reduced state graph is de­
rived using constrained token flow described in Algorithm 3.3. 
This is similar to the token flow which is used for finding state
graphs as described in [2], [3]. The algorithm begins with the
initial marking of the constraint graph which is defined as 
the set of rules enabled by reset. The function FindState is 
then used to find the state as defined in Definition 3.2 for the 
marking. Given a marking, an event is enabled if all rules 
which enable that event are in the marking. If in a marking 
more than one event is enabled, all possible event sequences 
need to be generated. With timing constraints, it may be 
possible that one of the enabled events is always preceded 
by another, in which case the function Slow, implemented in 
Algorithm 3.2, is used to check if an enabled event is slower 
than some other enabled event. If so, the occurrence of the 
slower event is postponed. The result is that some states are 
no longer reachable, yielding a reduced state graph. Note that 
if the function Slow is changed to always return FALSE, then
the resulting state graph is the same as that generated by regular
token flow.
Algorithm 3.2: (Check If Event Is Slow)
boolean Slow(ER system (E, R); event occurrence ⟨u, k⟩; marking M) {
  For each event ⟨v, l⟩ that is enabled in M where u ≠ v {
    If (l > k) then {
      [L', U'] = WCTimeDiff((E, R), u, v, l − k);
      If (U' < 0) then Return(TRUE);
    } Else {
      [L', U'] = WCTimeDiff((E, R), v, u, k − l);
      If (L' > 0) then Return(TRUE);
    }
  }
  Return(FALSE);
}
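A Python rendering of Algorithm 3.2 follows. As in the previous sketch, `wc_time_diff(a, b, offset)` is a hypothetical stand-in for Algorithm 2.3, assumed to bound $t(\langle b, j\rangle) - t(\langle a, j - \mathit{offset}\rangle)$; events are represented as (name, occurrence index) pairs.

```python
from typing import Callable, Iterable, Tuple

Event = Tuple[str, int]            # (event name, occurrence index)
Interval = Tuple[float, float]


def slow(u: Event, enabled: Iterable[Event],
         wc_time_diff: Callable[[str, str, int], Interval]) -> bool:
    """Return True if event u is always preceded by some other enabled event."""
    name_u, k = u
    for name_v, l in enabled:
        if name_v == name_u:
            continue
        if l > k:
            # Bound on t(<v, l>) - t(<u, k>): always negative means v fires first.
            _, upper = wc_time_diff(name_u, name_v, l - k)
            if upper < 0:
                return True
        else:
            # Bound on t(<u, k>) - t(<v, l>): always positive means v fires first.
            lower, _ = wc_time_diff(name_v, name_u, k - l)
            if lower > 0:
                return True
    return False
```

The loop returns FALSE only after every other enabled event has been examined, so a single slower competitor is enough to postpone u.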
Fig. 3. (a) State graph for the SCSI protocol controller, (b) Reduced state 
graph for the SCSI protocol controller.
Algorithm 3.3: (Find Reduced State Graph)
set FindRSG(ER system (E, R)) {
  initial_marking = {rules in R of the form (reset, f, 0, 0, τ0)};
  set_of_markings = {initial_marking};
  present_state = FindState((E, R), initial_marking);
  set_of_states = {present_state};
  While (set_of_markings ≠ ∅) {
    Take marking from set_of_markings
      (i.e., set_of_markings = set_of_markings − {marking});
    For each enabled event ⟨f, i⟩ in marking {
      If not (Slow((E, R), ⟨f, i⟩, marking)) then {
        new_marking = marking − {rules in marking of the form (e, f, i, ε, τ)}
                      + {rules in R of the form (f, g, i + ε', ε', τ')};
        present_state = FindState((E, R), new_marking);
        If (present_state ∉ set_of_states) then {
          set_of_states = set_of_states + {present_state};
          set_of_markings = set_of_markings + {new_marking};
        }
      }
    }
  }
  Return(set_of_states);
}
Using this algorithm on the SCSI protocol controller with 
the function Slow replaced with FALSE (i.e., ignoring the 
timing constraints), the state graph obtained contains 20 states 
as shown in Fig. 3(a). If the timing constraints are considered, 
a reduced state graph is derived which contains 16 states as 
shown in Fig. 3(b).
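The effect of Slow on reachability can be illustrated with a deliberately simplified exploration. The sketch below is not token flow over constraint-graph markings: it assumes a toy model in which every low signal may rise once, a state is a tuple of bits, and a hypothetical `slow(state, signal)` predicate encodes the timing knowledge. Passing a predicate that always returns False corresponds to the untimed exploration.

```python
from collections import deque
from typing import Callable, Set, Tuple

State = Tuple[int, ...]


def explore(initial: State, names: Tuple[str, ...],
            slow: Callable[[State, str], bool]) -> Set[State]:
    """Breadth-first reachability in a toy model where each low signal may rise once.

    A transition is postponed (not explored from this state) when slow()
    reports that some other enabled transition always occurs first.
    """
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for idx, name in enumerate(names):
            if state[idx] != 0:          # only low signals are enabled to rise
                continue
            if slow(state, name):        # timing says another event fires first
                continue
            nxt = state[:idx] + (1,) + state[idx + 1:]
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


names = ("a", "b")
untimed = explore((0, 0), names, lambda s, n: False)
# Assumed timing fact: a always rises before b, so b is slow while a is still low.
timed = explore((0, 0), names, lambda s, n: n == "b" and s[0] == 0)
```

Here the untimed exploration reaches all four states, while the timed one never reaches (0, 1), in the same way that timing information removes states from the SCSI state graph.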
3.3. Derivation of a Circuit Implementation
Several methods exist which transform a state graph into a circuit implementation, such as those described in [2]-[4]. We present a method similar to guard strengthening described in [21], but derive the circuit implementation from a state graph. A guard is a conjunction of signals and their negations. When this conjunction evaluates to true, the transition it is guarding can occur. The method is called guard strengthening because it starts with weak guards (i.e., the guard may evaluate to true in states in which the transition it is guarding should not occur) to which signals are added to strengthen them.
1) Finding the Enabled State: The first step is to determine 
the enabled state for the transitions on each signal. The enabled 
state for a transition is the value of each signal in all states 
in which that transition is enabled to occur. This provides 
information on which signals are stable during a particular 
transition, and thus, can be used to strengthen the guard for 
that transition. This is defined more formally in Definition 3.4. 
Algorithm 3.4 shows how the enabled state for each transition 
can be found from the reduced state graph.
Definition 3.4 (Enabled State): For each transition $s_k\uparrow$, the enabled state is of the form $Q_{k\uparrow} = q_{k\uparrow,1}, \ldots, q_{k\uparrow,l}, \ldots, q_{k\uparrow,n}$, where $n$ is the number of signals in the specification. Each $q_{k\uparrow,l}$ is determined as follows: if in all states where $s_k = R$, VAL$[s_l] = 0$, then $q_{k\uparrow,l} = 0$; if in all states where $s_k = R$, VAL$[s_l] = 1$, then $q_{k\uparrow,l} = 1$; otherwise, $q_{k\uparrow,l} = X$. The enabled state for the transition $s_k\downarrow$ is similarly defined.
Algorithm 3.4: (Find Enabled State)
array FindES(ER system (E, R); set set_of_states) {
  For each signal s_k, Q_k↑ = Q_k↓ = undefined, ..., undefined;
  For each state S and each signal s_k in S
    If (s_k == R) then
      For each signal s_l
        If (q_k↑,l == undefined) then q_k↑,l = VAL[s_l];
        Else if (q_k↑,l ≠ VAL[s_l]) then q_k↑,l = X;
    Else if (s_k == F) then
      For each signal s_l
        If (q_k↓,l == undefined) then q_k↓,l = VAL[s_l];
        Else if (q_k↓,l ≠ VAL[s_l]) then q_k↓,l = X;
  Return(Q);
}
In the SCSI protocol controller, the enabled state for the transition req↑ is 0X000, since there are two states, 0FR00 and 00R00, in which the transition req↑ is enabled to occur. In this case, both the state graph and the reduced state graph give the same enabled state. However, for the transition rdy↑, if the state graph is used, the enabled state is X0001, but if the reduced state graph is used, the enabled state is 10001. Therefore, by using timing constraints, the enabled state can contain less uncertainty.
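Algorithm 3.4 can be sketched in Python with the same illustrative string encoding of states (one character per signal, drawn from 0, R, 1, F); the function name `enabled_state` is hypothetical. The usage line recomputes the enabled state of req↑ from the two states quoted above; treating req as the third bit is inferred from those states.

```python
from typing import Iterable, Optional


def val(bit: str) -> str:
    """VAL as a character: '0' for 0 or R, '1' for 1 or F."""
    return "0" if bit in ("0", "R") else "1"


def enabled_state(states: Iterable[str], k: int, rising: bool) -> Optional[str]:
    """Enabled state Q for s_k rising (s_k == 'R') or falling (s_k == 'F').

    Each position keeps VAL if it agrees across all states where the
    transition is enabled, and becomes 'X' otherwise. Returns None when
    the transition is never enabled.
    """
    marker = "R" if rising else "F"
    q = None
    for s in states:
        if s[k] != marker:
            continue
        vals = [val(b) for b in s]
        if q is None:
            q = vals                                   # first enabling state
        else:
            q = [a if a == b else "X" for a, b in zip(q, vals)]
    return None if q is None else "".join(q)


# States in which req (index 2) is enabled to rise in the SCSI example.
q_req_up = enabled_state(["0FR00", "00R00"], k=2, rising=True)   # "0X000"
```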
2) Detecting and Resolving Conflicts: The next step is to 
check for conflicts in each state. A conflict occurs when the 
weak guard evaluates to true in a particular state, but in that 
state the signal is enabled to change or has changed to the 
opposite value. This either results in interference, where a 
signal is being both pulled high and low at the same time, 
or it can result in a misfiring, where a transition occurs in 
a state in which the signal should remain stable. Both cases 
represent circuit hazards and must be prevented.
The non-redundant rules are used to construct the weak 
guards for each transition. To prevent a conflict, context signals 
are added to a weak guard to guarantee that the transition being 
guarded cannot occur in the particular problem state. A signal 
can be used as a context signal if it is stable in the enabled state 
for the transition, and its value in the enabled state is different
from that in the problem state. For each transition, a table is 
constructed where the columns are the conflict problem states, 
and the rows are the signals which can be chosen to solve 
each problem. An outline of the basic procedure is described 
in Algorithm 3.5. The function Problem determines if a set of 
rules is sufficient to prevent a given transition from occurring 
in a particular state. The function Solution checks if a signal or 
its negation can be used to prevent a transition from occurring 
in a given state.
Algorithm 3.5: (Find Conflicts)
array FindConf(ER system (E, R_NR); set set_of_states; array Q) {
  For each state S and each signal s_k in S
    If ((s_k == F or s_k == 0) and
        (Problem(S, s_k↑, {rules in R_NR of the form (e, s_k↑, i, ε, τ)}))) then
      For each signal s_l
        If (Solution(S, q_k↑,l)) then C_k↑[s_l, S] = TRUE;
    Else If ((s_k == R or s_k == 1) and
        (Problem(S, s_k↓, {rules in R_NR of the form (e, s_k↓, i, ε, τ)}))) then
      For each signal s_l
        If (Solution(S, q_k↓,l)) then C_k↓[s_l, S] = TRUE;
  Return(C);
}
In the SCSI protocol controller, the transition req↑ has the weak guard ¬ack ∧ ¬rdy. Using this guard and the state graph shown in Fig. 3(a), there is a conflict with the state 000R1. This problem state is compared with the enabled state for req↑, 0X000, as determined earlier. The only signal which can be chosen to solve this problem is ¬q. Using the reduced state graph in Fig. 3(b), the state 000R1 is not reachable, so there is no conflict. Thus, the guard is not strengthened with ¬q, and the timing constraints have again helped reduce the complexity of the implementation.
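The two checks used in this example can be sketched as follows. A weak guard is modeled as a mapping from signal position to required value (an illustrative encoding). A state conflicts with a rising transition on $s_k$ when the guard holds although $s_k$ is F or 0, mirroring the first branch of Algorithm 3.5. A signal qualifies as a context signal when it is stable in the enabled state and has the opposite value in the problem state. The positions used below (0 for ack, 2 for req, 3 for rdy, 4 for q) are inferred from the states quoted in the text and are an assumption.

```python
from typing import Dict, List, Tuple


def val(bit: str) -> int:
    """VAL: 0 for '0' or 'R', 1 for '1' or 'F'."""
    return 0 if bit in ("0", "R") else 1


def guard_holds(guard: Dict[int, int], state: str) -> bool:
    """A guard is a conjunction: every listed signal must have the given value."""
    return all(val(state[i]) == v for i, v in guard.items())


def conflicts_rising(guard: Dict[int, int], state: str, k: int) -> bool:
    """Conflict for s_k rising: the guard holds although s_k is falling or low."""
    return state[k] in ("F", "0") and guard_holds(guard, state)


def context_candidates(enabled: str, problem: str) -> List[Tuple[int, int]]:
    """Signals stable in the enabled state whose value differs in the problem state.

    Returns (position, required value) pairs; value 0 means the negated literal.
    """
    out = []
    for i, (q, bit) in enumerate(zip(enabled, problem)):
        if q != "X" and int(q) != val(bit):
            out.append((i, int(q)))
    return out


weak_guard_req_up = {0: 0, 3: 0}      # not ack and not rdy
problem_state = "000R1"
```

With these inputs, 000R1 is a conflict for req↑, and the only candidate returned is position 4 with value 0, that is, ¬q.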
3) Finding an Optimal Cover: Determining which context signals to use to optimally solve all conflicts constitutes a covering problem, which is solved by treating the table of conflict problems and possible solutions as a prime implicant table [22]. Thus, for each transition, a prime implicant table is solved using the procedure outlined in Algorithm 3.6. The function Choose_essential_rows determines if a problem has only one possible solution. If so, the signal associated with that solution is added to the guard, and all problems solved by this signal are removed from the table. The function Rm_dominating_columns detects if solving one problem implies that another will be solved, and if so, removes the second problem. The function Rm_dominated_rows checks if one solution solves all the same problems that another solution does and more, and if so, removes the second solution. If there is only one remaining problem to solve, the function Solve_essential_columns solves it and, if possible, does so by selecting a signal which provides symmetry between the guards for the rising and falling transitions.
This procedure is repeated until all problems are solved, or the number of unsolved problems is no longer decreasing. At the end of the procedure, not all problems may be resolved if the table is cyclic, in which case the remaining problems can be solved by inspection or by a branching method [22] implemented in the function Solve_remaining_problems.
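The covering step can be sketched as a small prime-implicant-style table solver. The sketch below, with hypothetical names, repeatedly applies essential-row selection and row dominance, then falls back to an exhaustive search for a smallest covering set in place of the branching method. Column dominance and the rising/falling symmetry heuristic of Solve_essential_columns are omitted for brevity, so this is a simplification of Algorithm 3.6 rather than a transcription of it.

```python
from itertools import combinations
from typing import Dict, Set


def find_cover(table: Dict[str, Set[str]]) -> Set[str]:
    """Pick context signals (rows) so that every conflict problem (column) is solved.

    table maps each candidate signal to the set of problem states it solves.
    """
    rows = {sig: set(probs) for sig, probs in table.items()}
    unsolved: Set[str] = set().union(*rows.values())
    chosen: Set[str] = set()
    while unsolved:
        # Essential rows: a problem that exactly one remaining signal can solve.
        essential = set()
        for prob in unsolved:
            solvers = [sig for sig, probs in rows.items() if prob in probs]
            if len(solvers) == 1:
                essential.add(solvers[0])
        if essential:
            chosen |= essential
            for sig in essential:
                unsolved -= rows[sig]
            rows = {sig: probs & unsolved for sig, probs in rows.items()
                    if sig not in chosen}
            continue
        # Row dominance: drop a signal that solves a subset of another's problems
        # (for identical rows, keep only the alphabetically first one).
        dominated = {a for a in rows for b in rows
                     if a != b and rows[a] <= rows[b]
                     and (rows[a] < rows[b] or a > b)}
        if dominated:
            rows = {sig: probs for sig, probs in rows.items() if sig not in dominated}
            continue
        # Cyclic table: exhaustive search for a smallest covering subset.
        for size in range(1, len(rows) + 1):
            for combo in combinations(sorted(rows), size):
                if set().union(*(rows[sig] for sig in combo)) >= unsolved:
                    return chosen | set(combo)
        # Defensive: not reached when every problem has at least one solver.
        raise ValueError("some problem has no candidate context signal")
    return chosen
```

Exhaustive search is exponential in the number of remaining rows, but in this setting it runs only on the small cyclic core that survives the essential-row and dominance reductions.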
TABLE I
Context Signal Table for the Transition rdy↑ from the SCSI Protocol Controller
[Columns (problem states): 010F0, FF000, 0FR00, F0000, 00R00. Rows: candidate context signals, with a check mark where a signal solves a problem state.]
Algorithm 3.6: (Find Optimal Cover)
set FindCover(ER system (E, R_NR); array C) {
  R_C = ∅;
  For each transition t {
    While ((NumProb(C_t) > 0) and (NumProb(C_t) is decreasing)) do {
      R_C = R_C + Choose_essential_rows(C_t);
      Rm_dominating_columns(C_t);
      Rm_dominated_rows(C_t);
      R_C = R_C + Solve_essential_columns((E, R_NR + R_C), C_t);
    }
    If (NumProb(C_t) > 0) then
      R_C = R_C + Solve_remaining_problems((E, R_NR + R_C), C_t);
  }
  Return(R_C);
}
Returning to our example, in the reduced state graph there are still conflicts associated with the transition rdy↑. A table of problems and possible solutions is shown in Table I. In this table, there is an essential row, since the fifth column can only be solved by choosing q. Strengthening the guard with this signal solves all the problems.
4) A Complex Gate Implementation: For each output signal s, the trigger signals (i.e., those given in the rules) and the context signals (i.e., those added to solve conflicts) for s↑ are implemented in series in a pullup network, and similarly, the signals needed for s↓ are implemented in series in a pulldown network. The resulting circuit is a state-holding element called a generalized C-element [1]. The complex gate
implementations for both the speed-independent and the timed 
versions of the signal req from the SCSI protocol controller are 
shown in Fig. 4, with the guards that are being implemented. 
If a signal appears only in the pullup, but not in the pulldown, 
then it is annotated with a If a signal appears only in 
the pulldown then it is annotated with a Otherwise, the 
signal has no annotation. A static CMOS implementation for each element is also shown in Fig. 4. Since the signal ¬q was not needed in the guard for req↑ in the timed implementation, the resulting circuitry needs two fewer transistors. Similarly, since the rule (q↓, rdy↓, i, 0, [0, 5]) is found to be redundant, the signal ¬q is not used in the guard for the transition rdy↓, and two transistors are saved there as well.
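The behavior of the resulting state-holding element can be summarized by a short logic-level model: the output is driven high when the whole series pullup conjunction is true, driven low when the pulldown conjunction is true, and otherwise it holds its value. The sketch below is an abstraction with hypothetical literal lists, not a transistor-level model. It flags interference if both networks conduct at once, which the conflict-resolution step is meant to rule out.

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, bool]     # (signal name, required value)


def conducts(network: List[Literal], values: Dict[str, bool]) -> bool:
    """A series network conducts when every literal in it is satisfied."""
    return all(values[name] == want for name, want in network)


def gc_next(prev: bool, pullup: List[Literal], pulldown: List[Literal],
            values: Dict[str, bool]) -> bool:
    """Next output of a generalized C-element built from series networks."""
    up, down = conducts(pullup, values), conducts(pulldown, values)
    if up and down:
        raise RuntimeError("interference: pullup and pulldown both conduct")
    if up:
        return True
    if down:
        return False
    return prev                # neither network conducts: hold the old value


# Hypothetical gate: rise on (not a) and (not b), fall on a and b.
pullup = [("a", False), ("b", False)]
pulldown = [("a", True), ("b", True)]
```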
3.4. Exceptions
Throughout the synthesis procedure, there are various exception conditions which can occur if the procedure finds that it has a specification for which it cannot derive an implementation. Each is briefly described here with suggestions on how to modify the specification to solve the problem, but a general solution for timed circuits is still an open area of research.
Fig. 4. (a) Speed-independent implementation of req. (b) Timed implementation of req.
1) Complete State Coding Violation: A timed specification 
violates the complete state coding property if in the reduced 
state graph, two states have the same binary value, but different 
transitions on non-input signals are enabled in each state (see 
Definition 3.5). To solve this problem, state variables are 
usually added to the specification.
Definition 3.5 (Complete State Coding Property): A reduced state graph has the complete state coding property if for any two states $S$ and $S'$, either there exists a signal $s_k$ such that VAL$[s_k] \ne$ VAL$[s'_k]$, or for all non-input signals $s_k$, $s_k = s'_k$.
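Definition 3.5 can be checked directly over the states of a reduced state graph. In the sketch below, states use the illustrative 0/R/1/F string encoding, and `non_inputs` lists the positions of non-input signals. Both encodings, and the function name, are assumptions made for illustration.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def val(bit: str) -> str:
    """VAL as a character: '0' for 0 or R, '1' for 1 or F."""
    return "0" if bit in ("0", "R") else "1"


def csc_violations(states: Iterable[str],
                   non_inputs: List[int]) -> List[Tuple[str, str]]:
    """Pairs of states with equal binary codes but different non-input enablings."""
    bad = []
    for s, t in combinations(sorted(set(states)), 2):
        same_code = all(val(a) == val(b) for a, b in zip(s, t))
        same_outputs = all(s[k] == t[k] for k in non_inputs)
        if same_code and not same_outputs:
            bad.append((s, t))
    return bad
```

For instance, the states 00 and 0R share the binary code 00 but differ in whether the non-input signal at position 1 is enabled, so they violate complete state coding.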
2) Persistency Violation: After the enabled state is found, 
the synthesis procedure verifies that the timed specification 
is persistent [2], [3] as defined below. While in general 
the persistence property is not a necessary requirement for 
synthesis [4], it is required to use the enabled state approach. 
Persistence problems can be solved by either adding state 
variables or persistence rules [2], [3].
Definition 3.6 (Persistence): For each rule of the form $(e, f, i, \varepsilon, \tau)$ in the set of non-redundant rules $R_{NR}$, if event $e$ is a rising transition on the signal $s_k$ and the enabled state $Q_f$ of event $f$ has $q_{f,k} = 1$, then event $e$ is persistent. If event $e$ is a falling transition on the signal $s_k$ and the enabled state $Q_f$ has $q_{f,k} = 0$, then event $e$ is persistent.
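Definition 3.6 amounts to one lookup per non-redundant rule. The sketch assumes rules are given as (enabling signal position, rising flag, enabled event) triples and that enabled states are strings over 0, 1, and X, as produced by the enabled-state computation. These encodings, and the event name used in the usage line, are hypothetical.

```python
from typing import Dict, List, Tuple

# (position k of the enabling signal, True if e is rising, enabled event f)
RuleRef = Tuple[int, bool, str]


def non_persistent(rules: List[RuleRef],
                   enabled_states: Dict[str, str]) -> List[RuleRef]:
    """Rules whose enabling event e is not persistent per Definition 3.6."""
    bad = []
    for k, rising, f in rules:
        # e must remain completed (1 after a rise, 0 after a fall) while f is enabled.
        required = "1" if rising else "0"
        if enabled_states[f][k] != required:
            bad.append((k, rising, f))
    return bad


# Hypothetical rules into an event "f+" whose enabled state is 0X000.
flagged = non_persistent([(1, True, "f+"), (0, False, "f+")], {"f+": "0X000"})
```

The first rule is flagged because position 1 is X in the enabled state, so its enabling rise is not guaranteed to remain stable while f+ is enabled.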
3) Unresolvable Conflicts: Finally, it is possible that there 
may be no available context signal to resolve a conflict. This 
problem may be caused by a potential context signal which is 
non-persistent [4]. To solve this problem, state variables are 
again added.
3.5. Putting It All Together
The entire synthesis procedure neglecting exceptions can 
be given as follows:
Algorithm 3.7 (Automated Timed Asynchronous Circuit Synthesis)
circuit ATACS(ER system (E, R)) {
  R_NR = FindRed((E, R));
  RSG = FindRSG((E, R));
  Q = FindES((E, R), RSG);
  C = FindConf((E, R_NR), RSG, Q);
  R_C = FindCover((E, R_NR), C);
  Circuit = FindCircuit((E, R_NR + R_C));
  Return(Circuit);
}
Fig. 5. Block diagram for part of the MMU controller.
IV. Examples
This section describes two practical examples: a memory 
management unit (MMU) and a DRAM controller. The MMU 
is derived from a CSP specification, and it is used to illustrate 
the complexity reduction of timed circuits compared to speed- 
independent circuits. The DRAM controller is derived from a 
BSM specification, and it is used to demonstrate how timed 
circuits can be used in a synchronous environment.
4.1. Memory Management Unit
The first example is an MMU designed for use with a 16-bit 
asynchronous microprocessor [11]. The original implementa­
tion was derived using Martin’s synthesis method [12]. The 
basic operation of the MMU is to convert a 16-bit memory 
address to a 24-bit real address. There are six possible cycles 
that the MMU controller can enter, depending on data from 
the environment. For simplicity, the design of only one cycle 
is discussed: memory data load. A simplified block diagram is 
shown in Fig. 5 in which only signals involved in this cycle 
are depicted.
1) From CSP to a Timed Specification: The high-level CSP specification for the memory data load cycle is *[MDl → (RA | B); MSl; MDl] (see [12]). This specification is initially transformed into the following handshaking expansion:
*[[mdli ∧ ¬rai]; rao↑; [¬bi]; bo↑; [rai ∧ bi ∧ ¬msli]; mslo↑;
[msli]; mdlo↑; rao↓; bo↓; [¬mdli]; mslo↓; mdlo↓],
which is then converted to the constraint graph shown in Fig. 6.
The transformation from CSP to a handshaking expansion 
is not unique. A more concurrent constraint graph shown in 
Fig. 7 also satisfies the high-level CSP specification. This 
specification is simply a reshuffling [1] of the earlier one. 
This reshuffling is not considered in [12] because it results 
in a complete state coding violation [2]. This means that the 
more concurrent specification cannot be implemented without 
adding state variables. Adding state variables not only changes 
the specification, but can also add extra circuitry and/or delay to the implementation. This cost often outweighs the benefit of the higher degree of concurrency. This particular problem can also be solved by adding persistence rules, but this can reduce the concurrency in the specification. If conservative timing constraints are also added, the reduced state graph of the more concurrent specification shown in Fig. 7 does not have a complete state coding violation, and thus it can be implemented without adding state variables or persistence rules. To make the specification in Fig. 7 persistent, three arcs are added to the constraint graph as shown in Fig. 8; the specification can now be implemented speed-independently. As shown later, the speed-independent implementation is still more complex than the original implementation derived from the specification in Fig. 6.
Fig. 6. The cyclic constraint graph specification for the unoptimized MMU.
Fig. 7. The cyclic constraint graph specification for the optimized MMU.
Fig. 8. The cyclic constraint graph specification for the persistent MMU.
TABLE II
Comparison of the Guards for the Speed-Independent and Timed Implementations of the MMU
[Columns: Speed-Independent Guards; Simplified Timed Guards.]
2) Speed-Independent versus Timed Implementation: A speed-independent and a timed implementation of the specification shown in Fig. 8 are compared. For the timed implementation, the timing constraints used are depicted in Fig. 8. The
lower bound of the timing constraint on mdli↑ states that the processor does not issue memory requests faster than every 30 ns. The lower bound of the timing constraint on msli↑ states that the DRAM access takes at least 30 ns. Both of their upper bounds are infinite, since the processor could choose never to do a load, or the interface could choose never to process the request. The resetting of the acknowledgement (i.e., mdli↓ and msli↓) is assumed to be somewhat faster, and must occur within 5 to 30 ns of the reset of the request. The other numbers were obtained from SPICE simulations of the datapath circuitry for a 0.8 μm CMOS process. The comparator, denoted bi, has a delay of between 2.5 and 13 ns, and the registers, denoted rai, have a delay of between 2 and 9 ns, depending on temperature, voltage, and processing variations. All output signals have a delay of 0 to 1 ns, where 1 ns was found to be the maximum delay of the gates in the library used.
In the MMU specification, there are five events with multiple rules enabling them: rao↑, bo↑, mslo↑, mslo↓, and mdlo↓. Timing analysis determines that at least one rule
associated with each event is redundant. In all, 6  of the 15 rules 
on output signals in the original specification are redundant. 
This includes the 3 persistence rules. To determine which 
context signals must be added, the first step is to determine 
the reduced state graph and the enabled state for each signal 
using the timing constraints. A state graph generated without 
any timing constraints results in 92 states while the reduced 
state graph only has 22 states. Using the reduced state graph, 
the timed implementation needs 5 context signals as opposed 
to 7 needed for the speed-independent implementation.
After adding context signals to our original specification, 22 
literals (note that we define a literal to be a signal in a guard) 
are required for a speed-independent implementation as shown 
in Table II. The timing constraints reduce the circuit to only 
10 literals. Thus, our circuit complexity is reduced by over 
50 percent using conservative timing constraints. A complex 
gate implementation for both is shown in Fig. 9. Note that this 
reduction is possible not only because of removing redundant 
literals, but also because the gate needed for implementing rao 
and bo can be shared after the optimizations.
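The reductions quoted above can be checked directly from the reported counts; the helper name `reduction` is illustrative.

```python
def reduction(before: int, after: int) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


literal_cut = reduction(22, 10)   # speed-independent vs. timed MMU literals
state_cut = reduction(92, 22)     # state graph vs. reduced state graph
print(f"literals: {literal_cut:.1%}, states: {state_cut:.1%}")
# prints literals: 54.5%, states: 76.1%
```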
4.2. DRAM Controller
Our next example is a DRAM controller which is an 
interface between a microprocessor and a DRAM array. This 
example is interesting for two reasons. It is an asynchronous 
design in a synchronous environment, and it is an example 
which includes non-deterministic behavior (i.e., input choice) that can be synthesized by transforming it into a deterministic specification. The DRAM controller has three possible modes of operation: read, write, and refresh. A block diagram for the entire DRAM controller is shown in Fig. 10. The design of the refresh cycle is discussed in detail in the next subsection to illustrate how synchronous inputs can be incorporated into an asynchronous design. The three cycles are then combined to illustrate synthesis of a specification with non-determinism and multiple occurrences of events in a single cycle.
Fig. 9. (a) Speed-independent implementation of the MMU controller. (b) Timed implementation of the MMU controller.
Fig. 10. Block diagram of the DRAM controller (courtesy of [13]).
Fig. 11. Burst-mode specification for the DRAM controller (courtesy of [13]).
1) From Burst Mode to Timed Specification: Our specifica­
tion is derived from a burst-mode specification shown in 
Fig. 11 [13]. The specification of the refresh cycle is converted 
to the constraint graph shown in Fig. 12. Notice that this 
constraint graph is not well-formed (i.e., it is not strongly 
connected), so our timing analysis procedure cannot be applied 
directly. To solve this problem, the dashed arcs in Fig. 13 are 
added to the constraint graph. For this example, these new 
ordering rules are chosen to make the specification satisfy 
the fundamental mode assumption (i.e., outputs must occur 
before inputs can change). For example, the transition rfip↑ must occur before c↓, so a rule is added between them. The timing constraints for these rules are [0, 0], which means that only the ordering of the two events is important and not the time difference between them.
Fig. 12. Constraint graph specification for the refresh cycle of the DRAM controller.
Fig. 13. Well-formed constraint graph specification for the refresh cycle of the DRAM controller. Unmarked solid rules have timing constraint [0, 2]. Dashed rules have timing constraint [0, 0].
Fig. 14. Removal of data dependence from the DRAM controller specification.
In general, an ordering rule can be added between two events if the enabling event is guaranteed by the timing constraints to always precede the enabled event by more than the upper bound of the ordering rule. In other words, the rule must be declared redundant by the timing analysis, since it is not actually enforced with circuitry. If this is the case, the implementation synthesized is
valid; otherwise different ordering rules need to be chosen. 
If no ordering rules can be found to make the graph well 
formed, then our procedure cannot derive an implementation 
which can satisfy the given timing constraints. Currently, these 
ordering rules must be added before the synthesis procedure 
can be applied, but future research will incorporate finding 
appropriate ordering rules into the procedure.
2) Burst Mode versus Timed Implementation: The imple­
mentation of a timed version of the DRAM controller is 
compared with implementations from two burst-mode design 
styles [7], [13]. For our timed implementation, the timing constraints used for the refresh cycle are depicted in Fig. 13 [23]. These timing constraints are derived assuming that the environment is as depicted in Fig. 10 and that the controller is being used with a 68020/30 running at 16-20 MHz.
The implementation of the refresh cycle is considered first. As before, the first step is to determine which rules in the specification are redundant. All 7 of the ordering rules added to make the graph well-formed are found to be redundant. In addition, the rules from a↓ to ras↑ and rfip↓ are also redundant. Next, the synthesis procedure derives a reduced state graph with 16 states. Using this reduced state graph and the non-redundant rules, one context signal is needed for the implementation. In all, 7 literals are needed for the implementation of the two output signals in the refresh cycle, rfip and ras.
Fig. 15. Timed implementation of the DRAM controller.
The implementation of the complete DRAM controller is 
non-deterministic; i.e., the environment can choose to do a 
refresh cycle, a write cycle, or a read cycle. Our timing analysis 
algorithm cannot analyze specifications with non-determinism 
directly. To solve this problem, the specification is converted 
to a long cycle going through a refresh, a write, and a read 
cycle sequentially as illustrated in Fig. 14. In this example, 
since each cycle always returns to the same state before the 
next cycle is chosen, all possible behaviors are modeled.
The resulting cyclic constraint graph has multiple occurrences of the same event in a cycle. For example, the transition ras↑ now occurs three times in a single cycle. Each event which occurs multiple times is given a unique name for each occurrence, and these events are noted to be on the same signal. For example, the three occurrences of ras↑ are replaced with ras1↑, ras2↑, and ras3↑. These events will be
treated separately during timing analysis, but together during 
synthesis.
The same procedure described earlier is used to find redundant 
rules and the reduced state graph. When determining 
the enabled state, the multiple occurrences of an event are 
considered together. For example, when determining the enabled 
state for the transition on we, there is a state where 
dtack1 = F and dtack2 = 1, and another state where 
dtack1 = 0 and dtack2 = 1. Therefore, in the enabled state for 
this transition, both dtack1 and dtack2 are set to X. To find 
conflicts, the individual occurrences of the same event are used, 
but to determine context signals, only the merged value is available. 
For example, ras can be used as a context signal only if ras1, 
ras2, and ras3 all qualify as context signals.
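One reading of this merging rule, sketched under the assumption that each reachable state is a map from occurrence names to values such as "0", "1", or "F": a signal keeps its occurrence values only if they are identical in every state where the transition is enabled; otherwise all of its occurrences become X. The helper names `enabled_state` and `usable_as_context` are hypothetical, and conflict detection (which uses the individual occurrences) is not modeled.

```python
def enabled_state(states, occurrences):
    """Merge the reachable states in which a transition is enabled.

    states: list of dicts mapping occurrence name -> value (e.g. "0", "1", "F").
    occurrences: dict mapping each signal -> list of its occurrence names.
    A signal keeps its values only if the tuple of values of all its occurrences
    is identical in every state; otherwise every occurrence becomes "X"."""
    merged = {}
    for signal, occs in occurrences.items():
        patterns = {tuple(state[o] for o in occs) for state in states}
        if len(patterns) == 1:
            merged.update(zip(occs, patterns.pop()))
        else:
            merged.update((o, "X") for o in occs)
    return merged


def usable_as_context(signal, occurrences, qualifies):
    """A multiply occurring signal is a context signal only if every occurrence qualifies."""
    return all(qualifies[o] for o in occurrences[signal])


# The dtack example from the text: the two states disagree, so both occurrences become X.
states = [{"dtack1": "F", "dtack2": "1"}, {"dtack1": "0", "dtack2": "1"}]
print(enabled_state(states, {"dtack": ["dtack1", "dtack2"]}))
# {'dtack1': 'X', 'dtack2': 'X'}
```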
Fig. 16. Complex-gate implementation of the DRAM controller.
TABLE III
Comparisons between Speed-Independent (SI) and Timed Implementations of Several Examples. 
"SV" Under Context Rules Indicates that a State Variable is Needed for Synthesis
Example | SI Rules | SI States | SI Context Rules | SI Literals | Timed Rules | Timed States | Timed Context Rules | Timed Literals
SCSI Protocol Controller [20] | 10 | 20 | 2 | 12 | 9 | 16 | 1 | 10
Pipeline Handshake [3], [18] | 8 | 16 | 0 | 8 | 6 | 12 | 0 | 6
MMU unoptimized | 13 | 56 | 3 | 16 | 9 | 33 | 3 | 10
MMU optimized | 12 | 174 | SV | n/a | 7 | 22 | 3 | 10
MMU persistent | 15 | 92 | 7 | 22 | 7 | 22 | 3 | 35
DRAM controller [13] | n/a | n/a | n/a | n/a | 24 | 96 | 11 | 35
Microprocessor [11], [14]:
  Fetch | 8 | 32 | SV | n/a | 7 | 14 | 3 | 10
  PCAdd | 10 | 49 | SV | n/a | 7 | 18 | 3 | 10
  Exec | 42 | 354 | SV | n/a | 17 | 76 | SV | n/a
  ALU | 26 | 123 | SV | n/a | 17 | 44 | 3 | 20
This procedure leads to the implementation of the DRAM 
controller shown in Fig. 15. Note that although some of the 
gates are drawn with multiple levels, they are all actually 
implemented as single complex gates. For example, a dynamic 
gate implementing cas is shown in Fig. 16. Our final 
implementation has 35 literals (41 literals if the gates for dtack 
and selca are not shared). A locally clocked implementation 
reported in [13] used 62 literals and 1 state variable, and a 3-D 
implementation reported in [7] used 46 literals and 1 state 
variable. Our implementation did not need a state variable.
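The literal comparison can be restated as relative savings. The sketch reuses only the counts quoted in the paragraph above (35, or 41 unshared, versus 62 and 46); `literal_savings` is an illustrative helper, not part of the authors' CAD tool.

```python
def literal_savings(baseline: int, ours: int) -> float:
    """Fraction of literals saved relative to a baseline implementation."""
    return 1 - ours / baseline


# Literal counts for the DRAM controller as quoted in the text.
baselines = {"locally clocked [13]": 62, "3-D [7]": 46}
for label, count in baselines.items():
    shared = literal_savings(count, 35)    # dtack/selca gate shared
    unshared = literal_savings(count, 41)  # dtack and selca gates kept separate
    print(f"vs {label}: {shared:.1%} fewer literals ({unshared:.1%} unshared)")
```

With these counts, the timed implementation uses about 43.5% fewer literals than the locally clocked design and about 23.9% fewer than the 3-D design (33.9% and 10.9%, respectively, if the dtack and selca gates are not shared), in addition to needing no state variable.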
4.3. Other Results
The synthesis procedure described in this paper has been 
fully automated in a CAD tool that transforms a well-formed 
ER system specification into a complex-gate implementation. 
All results reported in this paper were generated using this 
program, and they are tabulated in Table III.
Additional examples in this table are parts of an asynchronous 
microprocessor described in [11], and their specifications 
are taken from [14]. All of the microprocessor 
specifications need state variables for a speed-independent 
implementation; however, three of the four can be implemented 
without state variables if conservative timing constraints are 
added.
The timed implementations of the MMU controller and of the 
refresh cycle of the DRAM controller have been verified with 
Burch's timed circuit verifier [24], [25] to be hazard-free 
under the given timing constraints. Here, hazard freedom means 
that no transition, once enabled, can be disabled without 
occurring.
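This hazard-freedom condition can be checked on an explicit state graph. The sketch below is a simplified persistency check, not Burch's trace-theory verifier: it assumes the graph already contains only the states reachable under the timing constraints, and the `enabled`/`edges` encoding and toy states are hypothetical.

```python
def persistency_violations(enabled, edges):
    """Return (state, fired, disabled) triples where firing `fired` in `state`
    leads to a successor in which the still-pending transition `disabled` is
    no longer enabled, i.e. it was disabled without occurring.

    enabled: dict mapping state -> set of transitions enabled in that state.
    edges:   dict mapping state -> list of (transition, successor_state)."""
    violations = []
    for state, outgoing in edges.items():
        for fired, successor in outgoing:
            for pending in sorted(enabled[state]):
                if pending != fired and pending not in enabled[successor]:
                    violations.append((state, fired, pending))
    return violations


# Toy graph: in s0 both a+ and b+ are enabled, but firing b+ disables a+.
enabled = {"s0": {"a+", "b+"}, "s1": {"b+"}, "s2": set(), "s3": set()}
edges = {"s0": [("a+", "s1"), ("b+", "s2")], "s1": [("b+", "s3")], "s2": [], "s3": []}
print(persistency_violations(enabled, edges))  # [('s0', 'b+', 'a+')]
```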
V. Conclusions and Future Research
We have proposed a new methodology for the specification 
of timed asynchronous circuits, the event-rule (ER) system, 
and developed a timing analysis algorithm to deduce timing 
information sufficient for the synthesis of timed circuits. A 
synthesis procedure based on our timing analysis algorithm 
has been constructed to detect and remove redundancy in the 
specification and to produce a reduced state graph. From the 
reduced state graph, our procedure systematically derives a 
complex gate implementation. Our results indicate that by 
using conservative timing constraints, our synthesis procedure 
can significantly reduce a circuit’s complexity. While reducing 
circuit area, we also increase circuit performance, not only 
because smaller circuits switch faster but also because we 
are able to synthesize more concurrent specifications than 
can often be considered practical using other design styles. 
Finally, we have applied our technique to the synthesis of 
asynchronous circuits in a mixed synchronous/asynchronous 
environment.
At present, our synthesis procedure requires a well-formed, 
deterministic ER system specification. While we have shown 
through an example how these restrictions can be relaxed, 
a systematic method has not yet been incorporated into our 
synthesis procedure. In the future, we plan to incorporate 
transformations to make a specification well-formed into the 
synthesis procedure. We also plan to generalize our timing 
analysis algorithm to handle non-deterministic behavior. The 
third direction for future work is to develop a procedure for 
adding state variables to a timed specification to resolve 
exceptions: complete state coding violations, persistency violations, 
and unresolvable conflicts. This problem is not as straightforward 
as it sounds because adding state variables changes the 
specification, and thus may invalidate earlier timing analysis. 
Therefore, techniques used for adding state variables in other
methodologies may not be directly applicable. Also, while we 
are able to verify that our designs are hazard-free, we have 
not yet verified that they satisfy their specifications; this 
will be addressed in future work. Finally, we intend to 
apply our technique to larger examples and to design ICs 
of interesting timed circuits to better assess the area 
and performance gains.
Acknowledgment
The authors would especially like to thank Prof. David Dill 
of Stanford University, who dedicated a considerable amount 
of time to assisting us in formalizing our work. We would 
also like to thank Peter Beerel of Stanford University for his 
invaluable comments on numerous versions of this manuscript. 
Thanks also go to Dr. Jerry Burch of Stanford University 
for verifying several of our designs. The authors would also 
like to thank Profs. Steve Burns and Gaetano Borriello of the 
University of Washington and their students for many valuable 
discussions on timing analysis and the synthesis of timed 
circuits. Finally, we would like to express our appreciation 
of the work done in Professor Alain Martin's group at the 
California Institute of Technology for their insight into the 
design of asynchronous circuits; their assistance in deriving 
the specification and speed-independent implementation of the 
MMU example is also gratefully acknowledged.
References
[1] Alain J. Martin, “Programming in VLSI: From communicating processes 
to delay-insensitive VLSI circuits,” in UT Year of Programming Institute 
on Concurrent Programming, C.A.R. Hoare, Ed. Reading, MA: 
Addison-Wesley, 1990.
[2] Tam-Anh Chu, “Synthesis of self-timed VLSI circuits from graph-theoretic 
specifications,” Ph.D. dissertation, Massachusetts Inst. of Technology, 
Cambridge, 1987.
[3] Teresa H.-Y. Meng, Robert W. Brodersen, and David G. Messerschmitt, 
“Automatic synthesis of asynchronous circuits from high-level 
specifications,” IEEE Trans. Computer-Aided Design, vol. 8, pp. 1185-1205, 
Nov. 1989.
[4] Peter A. Beerel and Teresa H.-Y. Meng, “Automatic gate-level synthesis 
of speed-independent circuits,” in Proc. IEEE 1992 ICCAD Dig. Papers, 
pp. 581-586, 1992.
[5] S. H. Unger, Asynchronous Sequential Switching Circuits. New York: 
Wiley-Interscience, 1969. (re-issued by Robert E. Krieger, Malabar, 
1983).
[6] Steven M. Nowick and David L. Dill, “Synthesis of asynchronous state 
machines using a local clock,” presented at the IEEE Int. Conf. on 
Computer Design, ICCD-1991, 1991.
[7] K. Y. Yun, D. L. Dill, and S. M. Nowick, “Synthesis of 3D asynchronous 
state machines,” presented at the IEEE Int. Conf. on Computer Design, 
ICCD-1992, 1992.
[8] Al Davis, Bill Coates, and Ken Stevens, “The Post Office Experience: 
Designing a large asynchronous chip,” in Proc. Twenty-Sixth Ann. 
Hawaii Int. Conf. on System Sciences, pp. 409-418. IEEE Computer 
Society Press, 1993.
[9] Gaetano Borriello and Randy H. Katz, “Synthesis and Optimization of 
Interface Transducer Logic,” in Proc. IEEE 1987 ICCAD Dig. Papers, 
pp. 274-277, 1987.
[10] L. Lavagno, K. Keutzer, and A. Sangiovanni-Vincentelli, “Algorithms 
for synthesis of hazard-free asynchronous circuits,” in Proc. 28th 
ACM/IEEE Design Automation Conf., 1991.
[11] A. J. Martin, S. M. Burns, T. K. Lee, D. Borkovic, and P. J. Hazewindus, 
“The design of an asynchronous microprocessor,” in Decennial Caltech 
Conf. on VLSI, pp. 226-234, 1989.
[12] Chris J. Myers and Alain J. Martin, “The design of an asynchronous 
memory management unit,” Tech. Rep. CS-TR-92-25, California Inst. 
of Technology, 1992.
[13] S. M. Nowick, K. Y. Yun, and D. L. Dill, “Practical asynchronous 
controller design,” presented at the IEEE Int. Conf. on Computer Design, 
ICCD-1992, 1992.
[14] Steve Burns, “Performance analysis and optimization of asynchronous 
circuits,” Ph.D. dissertation, California Inst. of Technology, Pasadena, 
1991.
[15] Kenneth McMillan and David L. Dill, “Algorithms for interface timing 
verification,” presented at the IEEE Int. Conf. on Computer 
Design, ICCD-1992, 1992.
[16] P. Vanbekbergen, G. Goossens, and H. De Man, “Specification and 
analysis of timing constraints in signal transition graphs,” in Proc. 
European Design Automation Conf., 1992.
[17] T. Amon, H. Hulgaard, G. Borriello, and S. Burns, “Timing analysis 
of concurrent systems,” Tech. Rep. UW-CS-TR-92-11-01, Univ. of 
Washington, Seattle, 1992.
[18] Chris Myers and Teresa H.-Y. Meng, “Synthesis of timed asynchronous 
circuits,” presented at the IEEE Int. Conf. on Computer Design, ICCD-
1992, 1992.
[19] Glynn Winskel, “An introduction to event structures,” in Linear Time, 
Branching Time and Partial Order in Logics and Models for Concurrency, 
Noordwijkerhout, The Netherlands, 1988.
[20] Tam-Anh Chu, Private communication, July 1991.
[21] Alain J. Martin, “Formal program transformations for VLSI circuit 
synthesis,” in UT Year of Programming Institute on Formal Developments 
of Programs and Proofs, E. W. Dijkstra, Ed. Reading, MA: 
Addison-Wesley, 1989.
[22] Edward J. McCluskey, Logic Design Principles with Emphasis on 
Testable Semicustom Circuits. Englewood Cliffs, NJ: Prentice-Hall, 
1986.
[23] Ken Yun, Private communication, 1992.
[24] Jerry R. Burch, “Modeling timing assumptions with trace theory,” in 
1989 Int. Conf. on Computer Design: VLSI in Computers and Processors, 
pp. 208-211, 1989.
[25] Jerry R. Burch, “Trace algebra for automatic verification of real-time 
concurrent systems,” Ph.D. dissertation, Carnegie Mellon Univ., 
Pittsburgh, PA, 1992.
Chris J. Myers received the B.S. degree in electrical 
engineering and Chinese history in 1991 and 
the M.S.E.E. degree in 1993 from the California 
Institute of Technology, Pasadena. Currently he is 
working toward the Ph.D. degree at the same university.
His current research interests include asynchronous logic 
synthesis, algorithms for the analysis of real-time concurrent 
systems, CAD for VLSI systems, and computer 
architecture.
Mr. Myers has received the Caltech Carnation 
Merit Award in 1990, the Rodman W. Paul History Prize in 1991, and a 
National Science Foundation Fellowship in 1991.

Teresa H.-Y. Meng received the B.S. degree from 
National Taiwan University in 1983, and the M.S. 
and Ph.D. degrees from the University of California 
at Berkeley, in 1984 and 1988, respectively.
Since 1988 she has been with the Department 
of Electrical Engineering, Stanford University, as an 
assistant professor. Her current research activities 
include wireless data communication, asynchronous 
logic synthesis, and low-power design of real-time 
DSP applications.
Dr. Meng has received the IEEE Signal Processing 
Society’s Best Paper Award in 1988, the 1989 NSF Presidential Young 
Investigator Award, the 1989 ONR Young Investigator Award, a 1989 IBM 
Faculty Development Award, and the 1988 Eli Jury Award at U.C. Berkeley 
for recognition of excellence in systems research.
