Moving from Weakly Endochronous Systems to Delay-Insensitive Circuits  by Dasgupta, S. et al.
Moving from Weakly Endochronous Systems
to Delay-Insensitive Circuits
S. Dasgupta(1), D. Potop-Butucaru(2), B.Caillaud(3), A.Yakovlev(1)
(1) School of EECE, University of Newcastle upon Tyne, UK
(2) Verimag, Grenoble, France
(3) IRISA, Rennes, France
Abstract
We consider the problem of synthesizing the asynchronous wrappers and glue logic needed for the
correct GALS implementation of a modular synchronous system. Our approach is based on the
weakly endochronous synchronous model, which deﬁnes high-level, implementation-independent
conditions guaranteeing correct desynchronization at the level of the abstract synchronous model.
We can therefore factor the synthesis problem into (1) a high-level, implementation-independent
phase insuring the weak endochrony of each synchronous module and (2) the actual wrapper
synthesis phase, highly simpliﬁed by the high-level assumptions, yet ﬂexible enough to produce
various, eﬃcient implementations.
We focus here on the synthesis of delay-insensitive asynchronous wrappers from weakly en-
dochronous synchronous modules, and show how this can be done for a simple DLX processor
model.
Keywords: GALS, delay-insensitivity, Petri net
1 Introduction
1.1 Synchrony, asynchrony, and GALS
Dealing with concurrency, time, and causality in the design of complex digital
circuits has become increasingly diﬃcult as the complexity of the designs grew.
The synchronous paradigm [8] dominated the design automation of digi-
tal circuits ever since its beginnings. Regardless of technology changes, the
synchronous model remained at the base of the ever more eﬃcient tools and
Electronic Notes in Theoretical Computer Science 146 (2006) 81–103
1571-0661  © 2006 Elsevier B.V. 
www.elsevier.com/locate/entcs
doi:10.1016/j.entcs.2005.05.037
Open access under CC BY-NC-ND license.
design ﬂows that allowed the exponential increase in speed and complexity for
more than 30 years. The reason for this is twofold:
(i) The synchronous abstraction 1 facilitates the speciﬁcation and the anal-
ysis of complex systems. Provided that a few high-level constraints in-
sure compliance with the synchrony hypothesis, the designer can forget
about timing and communication issues and concentrate on function-
ality. The synchronous model features deterministic concurrency and
simple composition mechanisms facilitating the incremental development
of large systems. Also, synchronous models are usually easier to ana-
lyze/verify/optimize compared to asynchronous counterparts, often be-
cause the state-transition representations are smaller.
(ii) Until recently, the fundamental ingredients of a synchronous implementa-
tion (global clock distributed on the whole chip with small skew, circuit-
wide communication within a single clock cycle) were easily mapped onto
the various silicon technologies.
However, the increase in speed and complexity, and the decrease in feature
size made technology mapping ever more diﬃcult. As a result, an important
research eﬀort has been directed to ﬁelds such as clock distribution, skew con-
trol, and on-chip interconnect design. The problem of the resulting techniques
is that they are often global and rely on a stronger integration of various design
phases (such as logic synthesis and placement). This increases the interdepen-
dency between functionality and communication and is contrary to the current
trends aiming at modular development based on oﬀ-the-shelf IPs.
One solution to what seems to be an evolutionary dead-end may come
from asynchronous circuit design [6]. Indeed, modularity and component-based
implementation ﬁgure among the potential beneﬁts of asynchronous design
methodologies (along with increased eﬃciency, lower power consumption and
lower electromagnetic interference). The weak point of asynchronous design
methods is complexity. Unlike synchronous circuits, asynchronous circuits
cannot dissociate (in the most general case) between combinational behavior,
sequential behavior, and timing aspects. The state explosion occurs very fast,
so that only small circuits can be handled. Moreover, regardless of eﬃciency
considerations, a radical paradigm shift towards asynchronous design is un-
likely to occur in the near future, given that most CAD tools are fundamentally
synchronous, and that few engineers have adequate training.
1 Cyclic, clock-driven execution. During each clock cycle, the behavioral propagation is
causal, so that the status of every wire is deﬁned prior to being used in computations.
Note that the last requirement empowers the conceptual abstraction that computations
and communications are inﬁnitely fast (“zero-time”) and take place at discrete points in
time, with no duration.
Gathering advantages of both the synchronous and asynchronous approaches,
the Globally Asynchronous, Locally Synchronous (GALS) systems are emerg-
ing as an architecture of choice for implementing complex Systems-on-Chips.
In a GALS system, locally-clocked synchronous components are connected
through asynchronous communication lines. Thus, unlike for a purely asyn-
chronous design, the existing synchronous tools can be used for most of the
development process, while the implementation can exploit the good modu-
larity properties of asynchronous communication.
1.2 The problem. Contribution
This paper addresses the problem of correctly and eﬃciently implementing a
modular synchronous speciﬁcation as a GALS circuit where inter-component
communication is done through asynchronous lines (in our case, FIFOs). This
operation, also called desynchronization [14] or GALSiﬁcation [10], involves
the construction of asynchronous wrappers that control input, output, and
clock generation for each synchronous module. The wrappers have two func-
tions:
(i) reconstruct, for each synchronous module, the input synchronization fronts
from asynchronous events
(ii) preserve, in a certain sense, the semantics of the synchronous speciﬁcation
in the GALS implementation (which may involve a form of distributed
control to insure the needed global synchronization properties).
The exact problem we consider is that of synthesizing the asynchronous wrap-
pers starting from the speciﬁcation of the synchronous modules. Our approach
is based on the weakly endochronous synchronous model, detailed in Section 2,
which deﬁnes high-level, implementation-independent conditions guaranteeing
correct desynchronization at the level of the abstract synchronous model. The
synthesis problem is factored into a high-level, implementation-independent
phase insuring the weak endochrony of each synchronous module (not cov-
ered in this paper) and the actual wrapper synthesis phase, simpliﬁed by the
high-level assumptions.
We focus here on the synthesis of delay-insensitive asynchronous wrap-
pers from weakly endochronous synchronous modules. The choice of delay-
insensitive logic as implementation domain is determined by its excellent mod-
ularity properties, its ability to support concurrency (and thus more eﬃcient
implementations), and by the existence of state-of-the-art tools allowing the
speciﬁcation and synthesis of delay-insensitive circuits.
Our main contribution is the introduction of a clear formal framework
that allows us to guarantee the correctness of GALS implementation models
involving synchronous and asynchronous formalisms used in digital circuit
design (synchronous Mealy machines and Petri Nets).
1.3 Previous work
The distributed, asynchronous, or GALS implementation of synchronous spec-
iﬁcations is a subject that draws more and more attention. Although stated
in a purely synchronous framework, the latency-insensitive design of Carloni
et al. [2] has been a major source of inspiration in our work. There, the
goal is to modify the modules of a synchronous system in such a way as to
guarantee the preservation of the semantics when the implementation-level
communication lines have arbitrary latencies. Like in our case, producing a
latency-insensitive implementation consists in synthesizing for each module a
synchronous wrapper that controls input, output, and clock generation. The
approach guarantees the correctness of the resulting system, but it is ineﬃ-
cient, as the wrappers eﬀectively simulate a unique global clock which runs
as slow as the slowest computation or communication in the system. This
approach is only applicable to single-clock SoCs and cannot be extended to
multi-clocked systems. Another disadvantage of this scheme is that the module
waits for all the incoming data on its input channels before it generates
its output on each output channel. The designer has no control over
which inputs gate the clock. Therefore, data not required for a particular
computation, or any output channel not ready to accept data, can unnecessarily
stall the synchronous module by gating the clock. Hence, this kind of
communication scheme is undesirable.
Several papers, like that of Singh and Theobald [16], extend Carloni’s approach
by allowing latency-insensitive circuits to support execution modes and
concurrency, and thus allow multi-rate, on-demand execution. These approaches
allow an improvement in efficiency, in terms of power consumption or
speed. However, the new approaches do not guarantee the correctness of the
implementation w.r.t. the initial synchronous specification. High-level criteria
covering the correctness aspects of such implementations have been defined by
Potop, Caillaud, and Benveniste [14,13], but the current paper gives the first
hardware implementation of the new concepts.
Leaving the purely synchronous model, we ﬁrst mention pausible clocking
by Yun and Donohue [18]. The goal is here to insure correct synchronization
in the transmission of data between synchronous modules using asynchronous
FIFOs. The approach focuses on the elimination of metastability problems
for single FIFOs. It cannot deal with synchronization constraints involving
several FIFOs, so that (1) the correctness of the implementation must be
insured by other means and (2) the synchronous module has a free running
clock, which increases consumption.
On the asynchronous side, we mention the work of Cortadella et al. [1]
on the fully asynchronous implementation of synchronous speciﬁcations. The
problem with this approach is that the method is global and intrusive, leaving
little place to modular design based on oﬀ-the-shelf IPs. Our approach directly
aims at modular development, by relying on high-level conditions guaranteeing
that a synchronous IP can be embedded in any environment.
More generally, we mention the large number of attempts to combine ad-
vantages of synchrony and asynchrony. Among them, we mention the pio-
neering work of Seitz [15] on systems with several clock domains, and the
thesis of Chapiro [3] which coined the term GALS. More recently, the burst-
mode circuits of Yun and Dill [17] represent an interesting intermediate model,
but their “fundamental” execution mode, which lacks concurrency, makes the
deﬁnition of modular design methodologies diﬃcult.
The remainder of the paper is organized as follows: Section 2 presents the
formal framework supporting our approach. It deﬁnes the microstep model
allowing us to represent both synchronous and asynchronous systems, and then
introduces weak endochrony (and the related correct implementation results).
Section 3 gives a background on the modeling framework used and the theory
of Regions. Section 4 presents the delay insensitive architecture and explains
how the new method can be applied to a very simple DLX-like processor.
Section 5 sums up the steps in the synthesis process and elaborates on the
translation of weakly endochronous ﬁnite state machines into Petri net models.
Section 6 outlines the veriﬁcation process of the Petri net model developed.
Section 7 gives a short conclusion. It also gives the directions we currently
follow to complete and extend our work.
2 Weakly endochronous synchronous systems
This section summarizes the results of Potop and Caillaud [13]. It first
defines the microstep model that allows us to reason in a unified framework
about synchronous and GALS systems. Then, it introduces weak endochrony
and the related semantics preservation results. Small examples show how
simple synchronous/GALS systems are modelled.
2.1 Microstep transition systems
We start the presentation of the microstep formalism with a small example
– a synchronous system with two input channels (b and d) and two output
channels (a and c). Channels a and d carry no data, being used only for
synchronization (alternatively, data is uninterpreted). The system emits a
message on channel a and then waits for one message from either channel b
or d (e.g. for whichever comes ﬁrst). If signal b arrives with value 0, then the
system awaits the next clock cycle, where it emits c with value 42 (then, it
does nothing forever). If b is received with value 1 or d is received, then the
system does nothing forever. The behavior of the system is not speciﬁed for
b diﬀerent from 0 or 1. The “clock variable” of the system, denoted with τ ,
functions as a separator between synchronous cycles.
[Figure: microstep transition diagram of Σ1, with states s0 to s6 and transitions labelled !a, ?b=0, ?b=1, ?d, !c=42, and τ]
In a more classical macrostep framework, like that of [14], this system would
be represented by:
[Figure: macrostep transition diagram of Σ1, with states s0, s2, s4, s5, s6 and transitions labelled (a, b=0), (a, b=1), (a, d), and c=42]
However, this compact form hides both I/O and computation causality which
are essential aspects of any asynchronous implementation. Hence the need
for a new formalism that would bridge between the abstract, macro-step syn-
chronous models and asynchronous formalisms.
By analogy with the macrostep model, the initial state and the destinations
of clock transitions will be called synchronizing states, and the sequences of
microsteps ending with clock transitions shall be called reactions.
2.1.1 General deﬁnitions
We model every system, component, and communication line using finite state
machines of the form $\Sigma = (S, \hat{s}, V, \to_\Sigma)$, where $S$ is a finite set of states,
$\hat{s} \in S$ is the initial state, $V$ is a finite set of variables, and $\to_\Sigma$ is the
transition relation. The label $l$ of a transition $s \xrightarrow{l} s'$ is a partial valuation
of the variables in $V$. Formally, if $D_v$ denotes the domain of a variable $v$,
and if $\bot$ is a special symbol denoting the absence of a value, then the set
of all possible labels over $V$ is $L_V = \prod_{v \in V} (D_v \cup \{\bot\})$. We denote with
$\mathrm{supp}(l) = \{v \in V \mid l(v) \neq \bot\}$ the support of a label $l$, and we denote with $\bot_V$
the transition of empty support. Our state machines are composed by classical
synchronized product. If $\Sigma_i = (S_i, \hat{s}_i, V_i, \to_i)$, $i = 1, 2$, then
$\Sigma_1 \otimes \Sigma_2 = (S_1 \times S_2, (\hat{s}_1, \hat{s}_2), V_1 \cup V_2, \to)$, where
$(s_1, s_2) \xrightarrow{l} (s'_1, s'_2)$ iff $s_i \xrightarrow{l|_{V_i}}_{\Sigma_i} s'_i$, $i = 1, 2$,
where $l|_{V_i}$ is the restriction of $l$ over $V_i$.
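The definitions of labels, supports, restrictions, and the synchronized product can be illustrated with a minimal Python sketch. It assumes that a label is stored as a dictionary holding only the present variables (absence is encoded by a missing key), and that the empty label is explicitly listed as a transition wherever a component may stay idle. The names `product_step`, `SYS_A`, and `SYS_B` are hypothetical and only serve the illustration.

```python
from itertools import product

def supp(label):
    """Support of a label: the variables that carry a value."""
    return set(label)

def restrict(label, variables):
    """Restriction of a label to a set of variables."""
    return {v: x for v, x in label.items() if v in variables}

def freeze(label):
    """Hashable form of a label, so that labels can be compared and stored."""
    return frozenset(label.items())

def product_step(sys1, sys2, state, label):
    """Successors of state (s1, s2) under `label` in the synchronized product.

    Each system is a pair (variables, transitions); a transition is a
    triple (source, frozen_label, target).
    """
    (vars1, trans1), (vars2, trans2) = sys1, sys2
    s1, s2 = state
    l1 = freeze(restrict(label, vars1))
    l2 = freeze(restrict(label, vars2))
    succ1 = {t for (s, l, t) in trans1 if s == s1 and l == l1}
    succ2 = {t for (s, l, t) in trans2 if s == s2 and l == l2}
    return set(product(succ1, succ2))

# Hypothetical two-component example; the empty label models "do nothing".
EMPTY = freeze({})
SYS_A = ({"!a"}, {("p0", freeze({"!a": 1}), "p1"),
                  ("p0", EMPTY, "p0"), ("p1", EMPTY, "p1")})
SYS_B = ({"?b"}, {("q0", freeze({"?b": 0}), "q1"),
                  ("q0", EMPTY, "q0"), ("q1", EMPTY, "q1")})

only_a = product_step(SYS_A, SYS_B, ("p0", "q0"), {"!a": 1})
both = product_step(SYS_A, SYS_B, ("p0", "q0"), {"!a": 1, "?b": 0})
```

In this sketch a product transition exists only when each component accepts its own restriction of the label, which is why the idle component needs a transition labelled with the empty label.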
Our systems and components communicate with each other and with the
environment through directed FIFO channels, each channel being represented
as a pair of directed variables. We emit a value on channel c by assigning
the variable !c, and we receive a value by reading variable ?c. Note that the
variables !c and ?c must have the same domain Dc.
To represent synchronous and GALS systems, we shall represent the clock
signals using special clock variables that carry no data (their domain contains
a single value, denoted  – the clock tick). In our small example, τ is such
a clock variable.
We denote with TracesΣ(s) the set of traces of the transition system Σ
starting in state s. Two traces ϕi, i = 1, 2 are asynchronously equivalent,
denoted ϕ1 ∼ ϕ2 if their projections ϕi |{?c,!c} on every communication channel
c coincide. If, for every channel c, ϕ1 |{?c,!c} is a prefix of ϕ2 |{?c,!c}, then we write
ϕ1 ≤ ϕ2. We say that ϕ1 and ϕ2 are non-contradictory if,
for all c, one of the projections is a prefix of the other.
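The three relations on traces can be sketched in Python as follows. The sketch assumes that a trace is a list of atomic events `(variable, value)`, that channel variables are written `"!c"` and `"?c"`, and that clock events use a name that starts with neither `!` nor `?`. The function names are hypothetical.

```python
def project(trace, channel):
    """Projection of a trace on the two variables ?c and !c of a channel."""
    wanted = ("?" + channel, "!" + channel)
    return [(v, x) for (v, x) in trace if v in wanted]

def channels(*traces):
    """Channels mentioned in the given traces (clock events are ignored)."""
    return {v[1:] for tr in traces for (v, _) in tr if v[0] in "?!"}

def is_prefix(p, q):
    return p == q[:len(p)]

def async_equivalent(t1, t2):
    """Projections coincide on every channel."""
    return all(project(t1, c) == project(t2, c) for c in channels(t1, t2))

def leq(t1, t2):
    """On every channel, the projection of t1 is a prefix of that of t2."""
    return all(is_prefix(project(t1, c), project(t2, c))
               for c in channels(t1, t2))

def non_contradictory(t1, t2):
    """On every channel, one of the two projections is a prefix of the other."""
    return all(is_prefix(project(t1, c), project(t2, c))
               or is_prefix(project(t2, c), project(t1, c))
               for c in channels(t1, t2))

# Hypothetical traces: same events per channel, different interleaving.
T1 = [("!a", 1), ("tau", 1), ("?b", 0)]
T2 = [("?b", 0), ("!a", 1)]
```

Here `T1` and `T2` are asynchronously equivalent because the clock event and the interleaving between channels `a` and `b` are not observed by the projections.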
2.1.2 Microstep synchronous transition systems
To represent synchronous systems, we shall use ﬁnite state machines having
exactly one clock variable (the system clock), having only directed and clock
variables, and satisfying a number of axioms, which include the synchronous
hypothesis. Formally, if τ is a clock variable and D is a set of directed vari-
ables, then the transition system Σ = (S, sˆ, V = D ∪ {τ}, ◦→ ), is a microstep
synchronous transition system (µSTS) if it satisﬁes:
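STS1 (clock transitions): if $s \xrightarrow{l} s'$ and $l(\tau) \neq \bot$, then $l|_D = \bot_D$ (a clock transition assigns no directed variable).
STS2 (synchrony hypothesis): two assignments of a same variable must
be separated by a clock transition. More exactly, if $s_0 \xrightarrow{l_1} s_1 \xrightarrow{l_2} \cdots \xrightarrow{l_n} s_n$
and $\forall i : l_i \neq \tau$, then $\mathrm{supp}(l_1), \ldots, \mathrm{supp}(l_n)$ are mutually disjoint.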
The ﬁrst axiom identiﬁes the clock transitions which separate the synchronous
reactions/clock cycles. The clock transitions are the only ones where the clock
variable is present. The second axiom is the actual synchronous hypothesis,
which states that during a clock cycle (i.e. between two clock transitions) a
communication variable can be assigned at most once.
In addition to these two fundamental axioms, we shall require that our
systems satisfy 3 more conditions. The ﬁrst two simply facilitate the deﬁnition
of our formal framework:
STS3 (void transition): $\forall s \in S : s \xrightarrow{\bot_V} s$.
STS4 (prefix closure): if $s \xrightarrow{l} s'$ and $l' \le l$, then there exists $s'' \in S$ such
that $s \xrightarrow{l'} s''$ and $s'' \xrightarrow{l \setminus l'} s'$.
Axiom STS3 facilitates the definition of the composition by synchronized product,
by allowing different systems to evolve independently (do nothing while
the others advance) when no synchronization is needed. We shall assume these
void transitions present in all the examples of this paper, but we shall not
graphically represent them. Axiom STS4 tells us that all transitions can be
decomposed into atomic transitions assigning exactly one variable. In our
framework, transitions assigning several variables are used to express in a
static fashion the concurrency between the composing atoms.
The last assumption is more important, as it constrains the class of rep-
resentable systems to stuttering-invariant ones, meaning that between two
synchronous reactions the system can spend any number of clock cycles doing
nothing. This hypothesis departs from the classical synchronous model, but
we see stuttering-invariance as a prerequisite for the eﬃcient multi-rate GALS
deployment.
STS5 (stuttering-invariance): $\hat{s} \xrightarrow{\tau} \hat{s}$ and ($s \xrightarrow{\tau} s' \Rightarrow s' \xrightarrow{\tau} s'$).
Note that the previous example is not stuttering-invariant. Here are two
simple stuttering-invariant systems:
[Figure: transition diagrams of Σ2 (states s0 to s3, clock τ1, transitions labelled !a, ?b, ?r) and Σ3 (states t0 to t3, clock τ2, transitions labelled ?a, !b)]
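Axioms STS1 and STS5 are local conditions on the transition relation, so they can be checked by a direct scan of an explicit transition list. The following Python sketch assumes that labels are dictionaries of present variables and that the clock variable is named `"tau"`. The two small systems are hypothetical and are not the Σ2 and Σ3 of the text.

```python
TAU = "tau"

def clock_transitions(transitions):
    """Transitions whose label carries the clock variable."""
    return [(s, label, t) for (s, label, t) in transitions if TAU in label]

def satisfies_sts1(transitions):
    """STS1: a clock transition carries the clock variable only."""
    return all(set(label) == {TAU}
               for (_, label, _) in clock_transitions(transitions))

def satisfies_sts5(initial, transitions):
    """STS5: the initial state and every target of a clock transition
    have a clock self-loop."""
    pairs = {(s, t) for (s, _, t) in clock_transitions(transitions)}
    return (initial, initial) in pairs and all((t, t) in pairs
                                               for (_, t) in pairs)

# Hypothetical system that may idle before and after reading ?a.
STUTTERING = [("u0", {TAU: 1}, "u0"), ("u0", {"?a": 1}, "u1"),
              ("u1", {TAU: 1}, "u2"), ("u2", {TAU: 1}, "u2")]

# Hypothetical system whose initial state has no clock self-loop.
NOT_STUTTERING = [("v0", {"!a": 1}, "v1"), ("v1", {TAU: 1}, "v2")]
```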

2.1.3 Synchronous and asynchronous composition
Both our modular synchronous systems and GALS implementations are built
from microstep synchronous automata using two diﬀerent composition mech-
anisms. In both cases, we simplify the model by only allowing point-to-
point communication through lossless FIFOs. We use FIFO models, which
are transition systems themselves, to represent communication through such
synchronous and asynchronous channels.
To represent synchronous communication, we use 1-place synchronous FI-
FOs (which are µSTSs). The FIFO model associated with a channel c is:
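$\mathrm{SFIFO}(c, \tau) = (\{c_0, c_1\} \cup \bigcup_{x \in D_c} \{c_x\},\ c_0,\ \{\tau\} \cup \bigcup_{x \in D_c} \{!c = x,\ ?c = x\},\ \to_S)$
where the transition relation is defined by:
$c_0 \xrightarrow{\tau} c_0, \quad c_0 \xrightarrow{!c=x} c_x \xrightarrow{?c=x} c_1 \xrightarrow{\tau} c_0, \quad x \in D_c.$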
Asynchronous communication involves inﬁnite asynchronous FIFO models
(which are not µSTSs):
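$\mathrm{AFIFO}(c) = (D_c^*,\ \varepsilon,\ \bigcup_{x \in D_c} \{!c = x,\ ?c = x\},\ \to_A)$
where $\varepsilon$ is the empty word and the transition relation contains all the transitions of the form:
$x_1 \ldots x_n \xrightarrow{!c = x_{n+1}} x_1 \ldots x_n x_{n+1} \xrightarrow{?c = x_1} x_2 \ldots x_{n+1}$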
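The two FIFO models can be mimicked by small Python classes. This is only a sketch under stated assumptions: the class and method names are hypothetical, the one-place FIFO is assumed to return to its empty state on a clock tick once its value has been read, and a clock tick is refused while a written value is still unread.

```python
from collections import deque

class AFifo:
    """Unbounded asynchronous FIFO: the state is the word of values in transit."""

    def __init__(self):
        self.word = deque()

    def write(self, x):
        """Transition labelled !c = x: append x at the end of the word."""
        self.word.append(x)

    def read(self):
        """Transition labelled ?c = x1: remove and return the oldest value."""
        if not self.word:
            raise RuntimeError("nothing to read")
        return self.word.popleft()

class SFifo:
    """One-place synchronous FIFO with states c0 (empty), cx (holds x), c1 (read)."""

    def __init__(self):
        self.state = "c0"
        self.value = None

    def write(self, x):
        if self.state != "c0":
            raise RuntimeError("at most one write per clock cycle")
        self.state, self.value = "cx", x

    def read(self):
        if self.state != "cx":
            raise RuntimeError("nothing to read in this cycle")
        self.state = "c1"
        return self.value

    def tick(self):
        """Clock transition: refused while a written value is still unread."""
        if self.state == "cx":
            raise RuntimeError("blocked: value written but not read")
        self.state, self.value = "c0", None

afifo = AFifo()
afifo.write(1)
afifo.write(2)
first = afifo.read()

sfifo = SFifo()
sfifo.write(42)
try:
    sfifo.tick()
    blocked = False
except RuntimeError:
    blocked = True
```

The refused tick is the mechanism by which a synchronous composition can block when one component writes a value that its partner never reads within the same cycle.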
Definition 2.1 [synchronous composition of µSTSs] Let
$\Sigma_i = (S_i, \hat{s}_i, V_i = D_i \cup \{\tau_i\}, \to_{\Sigma_i})$, $i = 1, 2$ be composable µSTSs and let $\tau$ be
a clock variable. Then, the synchronous composition of $\Sigma_1$ and $\Sigma_2$ over the
base clock $\tau$ is:
$\Sigma_1 \mid^{\tau} \Sigma_2 = \Sigma_1[\tau_1/\tau] \otimes \Sigma_2[\tau_2/\tau] \otimes \bigotimes_{c \in C(V_1) \cap C(V_2)} \mathrm{SFIFO}(c, \tau)$
where $\Sigma[\tau/\tau']$ represents the system $\Sigma$ where the clock variable has been renamed
from $\tau$ to $\tau'$, and $C(V) = \{c \mid {?c} \in V \lor {!c} \in V\}$ is the set of channels
associated with a variable set $V$.
The synchronous composition of the µSTSs Σ1 and Σ2 over the base clock
τ is a µSTS of clock τ . The result of the synchronous composition is unique
up to a renaming of the clock variable (so that we can discard τ from the
notation). Moreover, the operator | is associative and commutative (again,
modulo clock renaming).
Note that (1) the local clocks are synchronized/renamed in the product
over the new global clock and (2) the synchronizing states of $|_{i=1}^{n} \Sigma_i$ have void
communication lines (all synchronous FIFO models SFIFO(c, τ) are in their
unique synchronizing state c0).
Definition 2.2 [asynchronous composition] Let $\Sigma_i = (S_i, \hat{s}_i, V_i, \to_{\Sigma_i})$, $i = 1, 2$
be transition systems. Then, the asynchronous composition of $\Sigma_1$ and $\Sigma_2$ is:
$\Sigma_1 \parallel \Sigma_2 = \Sigma_1 \otimes \Sigma_2 \otimes \bigotimes_{c \in C(V_1) \cap C(V_2)} \mathrm{AFIFO}(c)$
The || operator is associative and commutative. The asynchronous composition
of two µSTSs is not a µSTS (because it has two clock variables).
2.1.4 Example
Using the small µSTSs Σ2 and Σ3, we illustrate our deﬁnitions and give the
intuition behind our criteria for correct GALS deployment of synchronous
speciﬁcations.
The result of the synchronous composition of Σ2 and Σ3 is:
[Figure: transition diagram of Σ2 | Σ3, with states (s0,t0), (s1,t0), (s1,t1), (s3,t0), (s3,t1), (s3,t2), (s3,t3) and transitions labelled τ, !a, ?a, ?r, ?a?r, !b]
Note that we simpliﬁed the notation by not representing the state of the two
FIFOs SFIFO(a, τ) and SFIFO(b, τ) (the initial state (s0, t0) having void
FIFOs, the status of the FIFOs is fully determined in each state). Also note
that the composed system is blocked in state (s3, t3) because SFIFO(b, τ)
cannot take a clock transition (data has been written on it, but not read).
The system Σ2 | Σ3 can deadlock.
The asynchronous composition of Σ2 and Σ3 is:
[Figure: transition diagram of Σ2 || Σ3, with states (s0,t0), (s1,t0), (s1,t1), (s1,t2), (s1,t3), (s2,t3), (s3,t0), (s3,t1), (s3,t2), (s3,t3) and transitions labelled τ1, τ2, τ1τ2, !a, ?a, ?r, ?a?r, τ2?r, !b, !b?r, ?b]
It is essential to note that Σ2 || Σ3 has traces, like !a; ?a; τ2; !b; ?b, that are not
asynchronously equivalent to any of the synchronous traces of Σ2 | Σ3. Such
traces are not covered by the veriﬁcation done on the synchronous model,
meaning that the GALS implementation does not preserve the semantics of
the speciﬁcation. It is also important to note that requiring a one-to-one
correspondence between synchronous and asynchronous traces is not a good
idea, because for large classes of systems it can be highly ineﬃcient (exploiting
the concurrency between diﬀerent computations to allow the systems to evolve
at diﬀerent rates is a desirable feature because it minimizes communication
and power consumption).
Indeed, the right correctness criterion for desynchronization is the preservation
of the asynchronous traces. Formally, the GALS implementation is
correct if any of its traces (executions) can be extended with a ﬁnite number of
transitions to a trace that is asynchronously equivalent to a synchronous trace.
Unfortunately, this criterion is undecidable even for ﬁnite systems, but in the
next section we shall give suﬃcient conditions which are decidable.
2.2 Weak endochrony
Microstep weak endochrony (or, simply, weak endochrony) is the property
guaranteeing that a synchronous component (µSTS) reads its inputs in a
fashion that remains predictable even in an asynchronous environment. Weak
endochrony requires that every internal choice of the component is visible as a
choice over the value (and not presence/absence status) of a directed variable
(either input or output). Thus, the behavior of the system becomes predictable
in any asynchronous environment, because choices can be determined or ob-
served. With this requirement, the implementation space delimited by weak
endochrony is nonetheless very large: Concurrent behaviors are not aﬀected
by the previous rule, so that independent system parts can evolve at diﬀer-
ent speeds. Weak endochrony does not require I/O determinism. Instead,
a weakly endochronous component must inform the environment about non-
deterministic decisions (the variable used to do so behaves like an oracle that
is visible from outside).
Formally, we say that the µSTS Σ = (S, sˆ, V = D ∪ {τ}, ◦→ ) is weakly
endochronous if it satisﬁes the following four axioms:
WE1 (determinism): $s \xrightarrow{l} s_i$, $i = 1, 2$ $\Rightarrow$ $s_1 = s_2$. From now on, we shall
denote with $s.\varphi$ the unique state of $\Sigma$ having the property $s \xrightarrow{\varphi} s.\varphi$, and
the notation is extended to traces.
WE2 (independence): In a given state, transitions with disjoint labels
commute. Formally, if $l_1$ and $l_2$ are disjoint and if $l_1, l_2 \neq \tau$, then:
$(s_0 \xrightarrow{l_1} s_1 \ \wedge\ s_0 \xrightarrow{l_2} s_2) \ \Rightarrow\ \exists s_3 : (s_1 \xrightarrow{l_2} s_3 \ \wedge\ s_2 \xrightarrow{l_1} s_3 \ \wedge\ s_0 \xrightarrow{l_1 \sqcup l_2} s_3)$
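WE3 (clock properties): Non-contradictory reactions in a given state can
be united to form a composed transition. Moreover, a strong confluence
property holds. Formally, if $s_0 \xrightarrow{\tau} s_1$ and $\varphi \in \mathrm{Traces}_\Sigma(s_0)$ with $\tau \notin \mathrm{supp}(\varphi)$, then:
(i) $\varphi \in \mathrm{Traces}_\Sigma(s_1)$
(ii) if $\varphi; \tau \in \mathrm{Traces}_\Sigma(s_0)$, then $\varphi; \tau \in \mathrm{Traces}_\Sigma(s_1)$ and $s_0.(\varphi; \tau) = s_1.(\varphi; \tau)$
(iii) if $\varphi; \psi; \tau \in \mathrm{Traces}_\Sigma(s_1)$, then there exists $\psi' \le \psi$ such that $\varphi; \psi'; \tau \in \mathrm{Traces}_\Sigma(s_0)$.
(iv) if $\varphi; \tau$, $\theta; \tau \in \mathrm{Traces}_\Sigma(s_0)$ and $\varphi$ and $\theta$ are non-contradictory, then there exists $\rho$ such that
$\varphi\rho \sim \theta$ and $\varphi; \rho; \tau \in \mathrm{Traces}_\Sigma(s_0)$
WE4 (choice): The same choices must be available on non-contradictory
paths starting in a given state. Formally, if $\varphi_i; v = x_i \in \mathrm{Traces}_\Sigma(s)$, $i = 1, 2$,
and $\varphi_1$ and $\varphi_2$ are non-contradictory, then $\varphi_1; v = x_2 \in \mathrm{Traces}_\Sigma(s)$.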
While their form may seem complex, the axioms of weak endochrony simply
require confluence, both inside a reaction and at the level of general traces, in
the case where no choice has been made over the value (not presence/absence
status) of a communication variable.
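The two local axioms, WE1 and WE2, lend themselves to a direct check on an explicit transition list. The Python sketch below assumes that a label is a frozenset of `(variable, value)` pairs, that the clock variable is named `"tau"`, and that void transitions are left out of the list. The two example fragments are hypothetical: in the first one the two reads commute, in the second one they exclude each other.

```python
TAU = "tau"

def support(label):
    return {v for (v, _) in label}

def successors(transitions, state, label):
    return {t for (s, l, t) in transitions if s == state and l == label}

def is_deterministic(transitions):
    """WE1: a state and a label determine at most one target state."""
    seen = {}
    for (s, l, t) in transitions:
        if seen.setdefault((s, l), t) != t:
            return False
    return True

def is_independent(transitions):
    """WE2: non-clock transitions with disjoint supports commute and can
    also be taken together as one composed transition."""
    for (s0, l1, s1) in transitions:
        for (s0b, l2, s2) in transitions:
            if s0b != s0 or l1 == l2:
                continue
            if TAU in support(l1) or TAU in support(l2):
                continue
            if support(l1) & support(l2):
                continue
            common = (successors(transitions, s0, l1 | l2)
                      & successors(transitions, s1, l2)
                      & successors(transitions, s2, l1))
            if not common:
                return False
    return True

def ev(*names):
    """Label assigning the (uninterpreted) value 1 to each named variable."""
    return frozenset((n, 1) for n in names)

# Hypothetical fragment where ?b and ?r are concurrent.
CONCURRENT = [("s1", ev("?b"), "s2"), ("s1", ev("?r"), "s3"),
              ("s1", ev("?b", "?r"), "s4"),
              ("s2", ev("?r"), "s4"), ("s3", ev("?b"), "s4")]

# Hypothetical fragment where reading ?b excludes reading ?r.
EXCLUSIVE = [("s1", ev("?b"), "s2"), ("s1", ev("?r"), "s3")]
```

The exclusive fragment fails the check because the choice between the two reads is made on the presence of an input rather than on its value, which is the situation weak endochrony rules out.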
Weak endochrony covers a large class of systems, which is closed under synchronous
composition (thus, incremental design is facilitated):
Theorem 2.3 (compositionality) Let $\Sigma_i$, $i = 1, \ldots, n$ be composable weakly
endochronous µSTSs. Then, $|_{i=1}^{n} \Sigma_i$ is weakly endochronous.
2.2.1 Correctness results
While example Σ3 is weakly endochronous, the same is not true for Σ2. There,
the choice between reading b and reading r in state s1 is not visible from the
exterior. If the environment provides both b and r, input reading is
non-deterministic. On the other hand, if ?b and ?r were concurrent, then the
system would be weakly endochronous:
[Figure: transition diagram of Σ4, with states s0 to s4, clock τ1, and transitions labelled !a, ?b, ?r, ?b?r]
Moreover, the GALS implementation model Σ4 || Σ3 preserves the semantics
of Σ4 | Σ3:
[Figure: transition diagram of Σ4 | Σ3, with states (s0,t0), (s1,t0), (s1,t1), (s3,t0), (s3,t1), (s3,t2), (s3,t3), (s4,t3) and transitions labelled τ, !a, ?a, ?r, ?a?r, !b, ?b]
[Figure: transition diagram of Σ4 || Σ3, with states (s0,t0), (s1,t0), (s1,t1), (s1,t2), (s1,t3), (s2,t3), (s3,t0), (s3,t1), (s3,t2), (s3,t3), (s4,t3) and transitions labelled τ1, τ2, τ1τ2, !a, ?a, ?r, ?a?r, τ2?r, !b, !b?r, ?b, ?b?r]
As expected, the asynchronous composition binds tighter than the synchronous
one, but for any trace of Σ4 || Σ3 going from (s0, t0) to (s4, t3) we can find an
asynchronously equivalent trace in Σ4 | Σ3. Such a GALS implementation is
obviously correct, because it does not introduce new behaviors.
In fact, a stronger relation exists between weak endochrony and correct
GALS implementation. The weak endochrony of the components and the
global correctness of the synchronous speciﬁcation (absence of deadlocks) im-
ply that the GALS implementation is semantics-preserving (i.e. correct).
Theorem 2.4 (correctness) Let $\Sigma_i$, $i = 1, \ldots, n$ be composable weakly endochronous
µSTSs. If $|_{i=1}^{n} \Sigma_i$ is non-blocking, then $\|_{i=1}^{n} \Sigma_i$ is correct w.r.t. the
synchronous specification $|_{i=1}^{n} \Sigma_i$.
This theorem gives the basis for the synthesis method proposed in the next
section. Indeed, if the components of a deadlock-free synchronous speciﬁcation
are weakly endochronous, then the synthesis of the GALS wrappers can be
done locally for each module, without knowledge about the global system.
Then, the implementation can be derived by connecting the resulting modules
with asynchronous FIFOs of arbitrary length.
3 Theory of Event Models and Regions
This section gives background on the Petri net model used in this
paper. It also introduces the theory of Regions.
3.1 Petri nets
A Petri net is a model used to represent systems with concurrency. It is
a quadruple N = (P, T, F, M0), where P is a set of places, T is a set of
transitions, F ⊆ (P × T) ∪ (T × P) is the flow relation, and M0 is
the initial marking. A transition is enabled when all its predecessor places have
a token. The enabled transition can then fire, removing one token from each of its
predecessor places and adding one token to each successor place. A labelled
PN is a PN with a labelling function λ : T → A associating each transition of
the net with a name. A labelled Petri net can have a combination of implicit
places, where the input and output transitions are named using symbols from
the alphabets, connected by arcs and transitions which are labelled with signal
transitions (a+, a−).
Important properties of a Petri net
(i) Liveness: a net is live if every transition can be fired infinitely often from any reachable
marking. Liveness ensures complete deadlock freedom.
(ii) Safeness: a net is safe if no marking reachable from M0 assigns more than one
token to any place.
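The firing rule and the safeness property can be made concrete with a short Python sketch. It assumes a marking is a dictionary from places to token counts and that the flow relation is given as two dictionaries, `pre` and `post`, listing the predecessor and successor places of each transition. The reachability search is a plain exploration with a hypothetical `limit` guard, since the reachable set of an unbounded net is infinite. Both example nets are hypothetical.

```python
def enabled(marking, pre, t):
    """A transition is enabled when every predecessor place holds a token."""
    return all(marking.get(p, 0) >= 1 for p in pre[t])

def fire(marking, pre, post, t):
    """Remove one token per predecessor place, add one per successor place."""
    if not enabled(marking, pre, t):
        raise ValueError(f"transition {t!r} is not enabled")
    new = dict(marking)
    for p in pre[t]:
        new[p] -= 1
    for p in post[t]:
        new[p] = new.get(p, 0) + 1
    return new

def reachable_markings(m0, pre, post, limit=10_000):
    """Set of reachable markings, each stored as a frozenset of (place, count)."""
    def key(m):
        return frozenset((p, n) for p, n in m.items() if n > 0)
    seen = {key(m0)}
    frontier = [m0]
    while frontier:
        m = frontier.pop()
        for t in pre:
            if enabled(m, pre, t):
                nxt = fire(m, pre, post, t)
                if key(nxt) not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("too many markings; net may be unbounded")
                    seen.add(key(nxt))
                    frontier.append(nxt)
    return seen

def is_safe(m0, pre, post):
    """Safeness: no reachable marking puts more than one token in a place."""
    return all(n in (0, 1)
               for m in reachable_markings(m0, pre, post) for (_, n) in m)

# Hypothetical two-place net alternating between clk+ and clk-.
PRE = {"clk+": ["idle"], "clk-": ["busy"]}
POST = {"clk+": ["busy"], "clk-": ["idle"]}
M0 = {"idle": 1}

# Hypothetical net where two transitions feed the same place q.
PRE2 = {"t1": ["p"], "t2": ["r"]}
POST2 = {"t1": ["q"], "t2": ["q"]}
M0_2 = {"p": 1, "r": 1}
```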
3.2 Theory of Regions
The theory of regions for elementary transition systems was developed by
Nielsen et al. [12]. It was subsequently adapted by Cortadella et al. [4] to
give a practically useful synthesis procedure for 1-safe nets, implemented in
the tool Petrify. Regions are subsets of states of a transition system (TS)
that correspond to places of a Petri net. Let S1 be a subset of the states S
of a TS. A transition s1 → s2 enters S1 if s1 ∉ S1 and s2 ∈ S1. A transition
s1 → s2 exits S1 if s1 ∈ S1 and s2 ∉ S1. If neither condition holds, the
transition does not cross S1; such a transition is internal to S1 when both
s1 and s2 belong to S1. A subset r is a region if, for each event a, one of
the following conditions holds: all transitions labelled a
(i) enter r,
(ii) exit r, or
(iii) do not cross r.
If r1 and r2 are regions of a TS such that r2 ⊂ r1, then r2 is a subregion of
r1. A region is minimal if it has no proper non-empty subregion. A region r
is a pre-region of event a if the transitions labelled a exit r. A region r
is a post-region of event a if the transitions labelled a enter r.
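The definitions above can be checked mechanically. The following Python sketch classifies transitions with respect to a set of states and tests the region condition. The four-state transition system and all function names are assumptions made for illustration; this is not the transition system of the paper.

```python
# Hypothetical transition system (not the one of Fig. 3): events a and b
# are concurrent, and c returns to the initial state.
TS = [
    ("s0", "a", "s1"), ("s0", "b", "s2"),
    ("s1", "b", "s3"), ("s2", "a", "s3"),
    ("s3", "c", "s0"),
]


def crossing(s1, s2, r):
    """Classify the transition s1 -> s2 with respect to the state set r."""
    if s1 not in r and s2 in r:
        return "enter"
    if s1 in r and s2 not in r:
        return "exit"
    return "no-cross"


def kinds_by_event(ts, r):
    """Map each event to the set of crossing kinds of its transitions."""
    kinds = {}
    for s1, event, s2 in ts:
        kinds.setdefault(event, set()).add(crossing(s1, s2, r))
    return kinds


def is_region(ts, r):
    """r is a region if all transitions of each event cross r in the same way."""
    return all(len(k) == 1 for k in kinds_by_event(ts, r).values())


def pre_events(ts, r):
    """Events for which r is a pre-region (all their transitions exit r)."""
    return {e for e, k in kinds_by_event(ts, r).items() if k == {"exit"}}


def post_events(ts, r):
    """Events for which r is a post-region (all their transitions enter r)."""
    return {e for e, k in kinds_by_event(ts, r).items() if k == {"enter"}}
```

In this example {s0, s2} is a region that is a pre-region of a and a post-region of c, whereas {s0} is not a region, because one transition labelled a exits it while the other does not cross it.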
4 Proposed Latency Insensitive Architecture with Intermittent Clock Triggering
This approach is primarily based on an asynchronous handshake protocol. As
shown in Fig. 1, the locally synchronous system is encapsulated by an
asynchronous wrapper, which consists of communication channels and a clock
generator.
The communication channels consist of a set of input and output FIFOs
(shown in Fig. 4). We assume that each signal is transmitted from one
synchronous island to another through a dedicated FIFO, so there are as many
FIFOs as there are signals in the system. When data is available at the input
FIFOs, it is read by the synchronous module. The clock generator then triggers
the local clock (clk+) so that the module can compute and generate its
outputs. Once generated, the outputs are written to the output FIFOs. The
clock generator then releases the clock (clk−), and the synchronous module is
ready to read its next set of inputs. The clk+ and clk− transitions thus mark
the start and the end of a synchronous computation, respectively.
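The read, clk+, compute, write, clk− cycle described above can be illustrated with a small behavioural sketch in Python. The class Wrapper, the method try_cycle, the function decode and the signal names used in the example are hypothetical; the sketch models only the ordering of events and says nothing about the circuit-level implementation.

```python
from collections import deque


class Wrapper:
    """Toy model of the asynchronous wrapper: one FIFO per signal."""

    def __init__(self, input_names, output_names):
        self.inputs = {name: deque() for name in input_names}
        self.outputs = {name: deque() for name in output_names}
        self.trace = []  # records the clk+ / clk- events in order

    def try_cycle(self, required, compute):
        """Run one synchronous computation if all required inputs are present."""
        if any(not self.inputs[name] for name in required):
            return False  # the clock stays paused
        values = {name: self.inputs[name].popleft() for name in required}
        self.trace.append("clk+")  # clock triggered once the inputs are read
        results = compute(values)
        for name, value in results.items():
            self.outputs[name].append(value)
        self.trace.append("clk-")  # clock released after the outputs are written
        return True


def decode(values):
    """Hypothetical computation: emit a write flag for the instruction."""
    return {"WF": 1 if values["Inst"] == "IStore" else 0}


w = Wrapper(["Inst", "DData"], ["WF", "MData"])
idle = w.try_cycle(["Inst"], decode)  # no instruction yet: nothing happens
w.inputs["Inst"].append("IStore")
ran = w.try_cycle(["Inst"], decode)   # DData is not required, so it may be empty
```

The list of required inputs is passed per computation, so an empty FIFO that is not needed for the current computation does not hold back the clock.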
This scheme gives rise to two advantages over the previously mentioned
communication schemes.
(i) In contrast to the prevalent clock pausing schemes, we do not have a
free running clock. The clock is triggered when the data required for a
particular computation has been read and is waiting to be processed, and
it is released after the completion of the computation. This leads to a
significant reduction in power consumption.
(ii) In contrast to the prevalent clock gating schemes, the synchronous module
is not unnecessarily stalled by the unavailability of an input that is not
required for a particular computation. This leads to increased efficiency.
Fig. 1. Latency Insensitive communication (blocks: Synchronous Module, Clock
generator, Input FIFOs with Req1–Req3 and Ack1–Ack3, Output FIFOs with
ReqO1–ReqO3 and AckO1–AckO3)
4.1 DLX architecture
In this paper we desynchronize a DLX-like datapath architecture to exemplify
the proposed transition from weakly endochronous systems to latency
insensitive circuits. We consider a simple unpipelined DLX-like architecture;
our approach can be directly extended to a pipelined DLX architecture.
Fig. 2 shows a simplified and abstract view of the overall architecture.
The globally synchronous system is partitioned into five main synchronous
islands: Instruction Fetch (IF), Instruction Decode (ID), Execution (EX),
Memory (MEM) and Write Back (WB). These islands operate at different clock
speeds. The vertical dotted lines separate the different clock domains.
The dashed lines group two synchronous islands, namely Instruction Decode (ID)
and Memory (MEM). In this paper we concentrate on the ID block. This block
receives instructions from the Instruction Fetch (IF) block and exchanges
data with the MEM block. An instruction is decoded into one of the following
types: Load (ILoad), Store (IStore), ALU or Move (IMov). Since we deal only
with the IF-ID and ID-MEM interfaces, we ignore the last two instruction
types. The transition system specification is shown in Fig. 3. The ID block
receives an instruction from the IF block, decodes it and emits the memory
write flag (MWF = 0 or MWF = 1). If the flag is 1, it emits the signal MData,
which controls the writing of data to the memory block. If the flag is 0, it
reads the signal ?DData, which controls the loading of data from the memory
to the ID block.
Fig. 2. Partitioned-DLX Architecture (blocks: IF, ID, EX, MEM, WB and
Environment; Clock Domains 1 to 5; signals: ?Inst, !WF[0:1], !MData, ?DData)
The following sections describe the translation procedure.
5 Synthesis Methodology
The following steps sum up the synthesis process.
(i) Identify the modules of the synchronous system that, once partitioned
from the main system, would improve performance if their speed were
increased independently.
(ii) Build a weakly endochronous FSM for each module. An example of a
transition system for such an FSM is shown in Fig. 3.
(iii) Identify the transitions of the FSM that will become actual transitions
of the circuit. This can be done at the level of the control automaton
by identifying, in each synchronizing state (the destination of a "T"),
the shortest sequences of transitions that end with a "T". Divide these
shortest sequences into greatest sequences in which emissions take place
after receptions. These greatest sequences will be the reactions of the
actual circuit.
(iv) Modify the automaton by inserting "hardware clock transitions" and I/O
signalling in the middle of the greatest sequences, and by removing the
"synchronous clock" transitions as well as the transitions corresponding
to concurrent execution of greatest sequences.
(v) Translate the FSM to a PN using Petrify [5].
(vi) Extend the translated Petri net to handle intermittent clock transitions.
(vii) Choose an efficient asynchronous inter-domain communication scheme
(e.g. asynchronous FIFOs).
(viii) Implement the controls.
Step (i) is illustrated in Fig. 2 and described in Section 4.1, and step (ii)
has already been discussed in Section 3. In this section we illustrate steps
(iii) to (viii).
The Req signals in Fig. 1 represent the incoming requests from another
module in the system. In Fig. 3, the signals ?ILoad and ?IStore of the
synchronous automaton correspond to these Req signals. The T signals represent
the clock transitions that lead to a synchronizing state, discussed in Section
2.1. Similarly, the Ack signals in Fig. 1 correspond to the Ack signals in
Fig. 3. The clock transitions T1 and T2 of the synchronous FSM, which lead to
the initial state, are replaced by the asynchronous handshake signals Ack1 and
Ack2, which return the Petri net model to its initial state.
Fig. 3 shows the FSM of the ID module illustrated in Fig. 2. S0 is the
synchronizing state identified in step (iii), since it is the destination of
T1 and T2. The shortest sequence is therefore the sequence of transitions
$S0 \xrightarrow{?ILoad} S2 \xrightarrow{!WF=0} S4 \xrightarrow{?DData} S5 \xrightarrow{ack} S0$.
After applying step (iii) we get the sequence
$S0 \xrightarrow{?ILoad} S2 \xrightarrow{!WF=0} S4$,
which is the greatest sequence where the emission of !WF = 0 takes place after
the reception of ?ILoad. The clock transitions are inserted between these
transitions in step (iv), in such a way that the clock is triggered (clk+)
only when clken is asserted, after all the input signals required for a
particular computation have been received. Our current approach assumes that
the computation completes in one clock cycle. After the emission of the output
signals, the clock is released and the clken signal is de-asserted; the clock
is not triggered again until another input or set of inputs has been read.
The following steps translate the transition system (TS) into a PN:
(i) For each event a in the TS, a corresponding transition labelled a is
generated in the PN.
(ii) For each minimal region ri, a place pi is generated.
(iii) Place pi contains a token in the initial marking M0 iff the initial
state of the TS is an element of the set of states ri.
(iv) The flow relation is constructed as follows:
a ∈ pi• iff ri is a pre-region of a, and a ∈ •pi iff ri is a post-region of a.
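The four translation steps can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the algorithm implemented in Petrify: it enumerates every non-empty proper subset of states, which is only feasible for very small transition systems, and it does not check the conditions under which the resulting net reproduces the behaviour of the TS. The transition system and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical transition system (not the one of Fig. 3).
STATES = {"s0", "s1", "s2", "s3"}
TS = [
    ("s0", "a", "s1"), ("s0", "b", "s2"),
    ("s1", "b", "s3"), ("s2", "a", "s3"),
    ("s3", "c", "s0"),
]
INITIAL = "s0"


def crossing(s1, s2, r):
    """Classify the transition s1 -> s2 with respect to the state set r."""
    if s1 not in r and s2 in r:
        return "enter"
    if s1 in r and s2 not in r:
        return "exit"
    return "no-cross"


def event_kinds(ts, r, event):
    """Set of crossing kinds of the transitions labelled with event."""
    return {crossing(s1, s2, r) for (s1, e, s2) in ts if e == event}


def is_region(ts, r):
    events = {e for (_, e, _) in ts}
    return all(len(event_kinds(ts, r, e)) == 1 for e in events)


def minimal_regions(states, ts):
    """Brute force: keep the regions that contain no other region."""
    ordered = sorted(states)
    regions = [frozenset(c)
               for k in range(1, len(ordered))  # non-empty proper subsets
               for c in combinations(ordered, k)
               if is_region(ts, frozenset(c))]
    return [r for r in regions if not any(q < r for q in regions)]


def ts_to_pn(states, ts, initial):
    regions = minimal_regions(states, ts)
    place_of = {r: f"p{i}" for i, r in enumerate(regions, start=1)}  # step (ii)
    events = sorted({e for (_, e, _) in ts})                         # step (i)
    flow = set()
    for r, p in place_of.items():                                    # step (iv)
        for e in events:
            kinds = event_kinds(ts, r, e)
            if kinds == {"exit"}:     # r is a pre-region of e
                flow.add((p, e))
            elif kinds == {"enter"}:  # r is a post-region of e
                flow.add((e, p))
    marking = {p for r, p in place_of.items() if initial in r}       # step (iii)
    return place_of, events, flow, marking


place_of, events, flow, marking = ts_to_pn(STATES, TS, INITIAL)
```

For this example the sketch yields four places; the two places whose regions contain s0 are initially marked, and the concurrent events a and b each consume one of these tokens before c joins them again.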
Fig. 3. TS-to-PN Translation (panels: Synchronous Automaton, Petri net Model,
Clock Generation; states S0–S7; regions r1–r4; places p1–p3; events ?ILoad,
?IStore, !WF=0, !WF=1, ?DData, !MData, T, Ack1, Ack2, Dum1, Dum2, CD1, CD2,
clken+, clken−)
The process defined above is fully automated by the tool Petrify, which takes
a textual description of the synchronous automaton as its input. The results
obtained are depicted in Fig. 3, which illustrates the translation of the ID
module, interfacing with the IF and MEM modules, from a weakly endochronous
FSM to a PN. The set of states r1 = {S1, S3} is a region, since all transitions
labelled !MData exit r1 and all transitions labelled ?IStore enter r1.
Similarly, r3 is a region, since all transitions labelled !MData enter r3 and
the transition labelled T2 exits r3. In contrast, the set of states
r2 = {S0, S1} is not a region: although T1 and T2 enter r2, the transition
S1 → S7 labelled !MData exits this set, whereas the transition S3 → S6 with
the same label does not.
The regions r1, r3 and r4 are minimal regions. Hence, region r1 can be
mapped to place p1, r3 to p2, r4 to p3, and so on. The regions r1 and r3
form a pre-region and a post-region of event !MData, respectively. Hence, p1
is the predecessor place and p2 is the successor place of the transition
labelled !MData. Similarly, place p2 leads to transition Ack2, with T2
replaced by Ack2, as mentioned above.
For the sake of clarity, we have omitted the clock transitions from the
synchronous FSM model. Step (vi) extends the Petri net obtained from Petrify
to handle the intermittent clock. The clken transitions, shown in Fig. 3,
control the actual circuit clock transitions. The theory of regions applied in
step (v) cannot be directly extended to handle these transitions, because it
does not identify the semantic significance of the clock transitions and
treats them like any other input, output or internal signal. Hence the model
had to be extended by hand to meet the semantic requirements of the clock
in the locally synchronous modules. This is done by identifying the inputs
required for a particular computation. When these inputs have been received
on their respective channels, the clock enable signal clken is asserted, and
the circuit clock is triggered on the assertion of clken. On completion of
the computation, detected by the newly introduced completion detection signals
CD1 and CD2, the clock enable is de-asserted, preventing further clock ticks.
This extension is illustrated in the dotted box named "clock generation" in
Fig. 3. For the sake of clarity, the circuit clock is not shown in the figure.
The above task is a direct outcome of step (iii) of the synthesis methodology,
imposed by the weakly endochronous correctness criterion: the clock is
triggered after all the inputs are read, and it is released after all the
outputs are emitted. The clock remains paused otherwise.
Fig. 4. FIFO Model and Implementation (panels (a) and (c): model; panels (b)
and (d): implementation)
We have used the modeling tool PEP [11] to extend the Petri net model
obtained from Petrify. The dummy signals Dum1 and Dum2 are introduced,
at this stage, for synchronizing the inputs to trigger the clock enable (clken)
signal.
Step (vii) concerns the choice of the inter-domain communication scheme.
In our design we have chosen asynchronous FIFOs to connect two clocked
domains working at different speeds. Several papers have presented different
types of FIFOs, addressing clock skew handling, robust interfaces for
mixed-timing systems, and the reduction of penalties for long interconnect.
Any of these can be applied to our design, depending on the requirements
of the system.
Name of the Module    |s|    |t|    |B|    |E|
ID-module              15     13     26     15

|s| = number of states, |t| = number of transitions,
|B| = number of conditions, |E| = number of events
Table 1. Verification Results
In the model we use a very straightforward design of a standard FIFO,
which is a basic requirement of the system. The model and implementation
of such a FIFO are shown in Fig. 4. The signal flags are communicated via
dual-rail FIFOs, or FIFOs using another encoding, e.g. 1-of-4. For other
control signals, a single-rail FIFO is used. Fig. 4(a) and (c) show the PN
models of the dual-rail and single-rail FIFOs, respectively, and Fig. 4(b)
and (d) show the corresponding implementations using C-elements.
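For readers unfamiliar with the C-element used in these implementations, the following Python sketch models its standard behaviour: the output rises when all inputs are 1, falls when all inputs are 0, and otherwise holds its previous value. The class name CElement and the usage sequence are illustrative assumptions; the sketch does not reproduce the circuits of Fig. 4.

```python
class CElement:
    """Behavioural model of a Muller C-element with any number of inputs."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        """Apply new input values and return the resulting output."""
        if all(v == 1 for v in inputs):
            self.out = 1  # all inputs high: output rises
        elif all(v == 0 for v in inputs):
            self.out = 0  # all inputs low: output falls
        # otherwise the inputs disagree and the previous output is held
        return self.out


c = CElement()
history = [
    c.update(1, 0),  # inputs disagree: output holds its initial value
    c.update(1, 1),  # both inputs high: output rises
    c.update(0, 1),  # inputs disagree: output holds
    c.update(0, 0),  # both inputs low: output falls
]
```

The hold behaviour is what lets a C-element wait for several handshake signals to agree before the next event is produced.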
In contrast to the basic latency insensitive approach, which assumes a
point-to-point network topology, our approach can be extended to any simple
topology. These topologies include ring architectures [7] and simple forks
and joins [19], thus increasing the efficiency of our approach.
6 Veriﬁcation of the PN model
As mentioned in Section 5, the tool Petrify was used to translate the weakly
endochronous transition system to a Petri net for logic synthesis. Since a
part of the model was developed by hand and glued to the model generated by
Petrify, the final PN was verified to ensure that it satisfies the overall
system specification. Two main properties, as defined in Section 3.1, were
verified: safeness and liveness.
PEP was used to verify the safeness property of the original net. We used
the in-house tools PUNF [9] and CLP [9] for reachability analysis and
verification. PUNF was used to obtain a finite and complete prefix of the
Petri net's unfolding. The output of PUNF was fed to CLP to further verify the
functional properties of the net. The choice of the verification tools was
based on their expressiveness and analysis power. The Petri net satisfied the
safeness and liveness properties. The net statistics (|s| and |t|) and
unfolding statistics (|B| and |E|) are shown in Table 1. A Signal Transition
Graph (STG) is obtained from the Petri net model. The STG specification is fed
to Petrify for logic synthesis, leading to the circuit implementation.
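To make the two verified properties concrete, the sketch below checks safeness and liveness of a small net by explicit enumeration of its reachable markings. This is a deliberately naive substitute, used here only for illustration: the verification reported above relies on unfolding prefixes (PUNF) and CLP, not on explicit state exploration. The example nets and all function names are hypothetical.

```python
from collections import deque


def explore(places, pre, post, m0):
    """Build the reachability graph; stop expanding at unsafe markings."""
    order = sorted(places)
    index = {p: i for i, p in enumerate(order)}
    start = tuple(m0.get(p, 0) for p in order)
    graph, safe = {}, True
    queue = deque([start])
    while queue:
        m = queue.popleft()
        if m in graph:
            continue
        graph[m] = {}
        if any(count > 1 for count in m):
            safe = False  # unsafe marking found; do not expand it further
            continue
        for t in pre:
            if all(m[index[p]] >= 1 for p in pre[t]):
                n = list(m)
                for p in pre[t]:
                    n[index[p]] -= 1
                for p in post[t]:
                    n[index[p]] += 1
                graph[m][t] = tuple(n)
                queue.append(tuple(n))
    return graph, safe


def is_live(graph, transitions):
    """Live: from every reachable marking, every transition can still fire.

    Only meaningful when explore() reported the net as safe, since the
    graph is truncated at unsafe markings.
    """
    for start in graph:
        seen, stack, fired = {start}, [start], set()
        while stack:
            m = stack.pop()
            for t, n in graph[m].items():
                fired.add(t)
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        if fired != set(transitions):
            return False
    return True


# Hypothetical net: a and b fire concurrently, c joins them and restarts.
PLACES = {"p1", "p2", "p3", "p4"}
PRE = {"a": {"p2"}, "b": {"p1"}, "c": {"p3", "p4"}}
POST = {"a": {"p3"}, "b": {"p4"}, "c": {"p1", "p2"}}
graph, safe = explore(PLACES, PRE, POST, {"p1": 1, "p2": 1})
live = is_live(graph, PRE)

# Hypothetical net that deadlocks after a single firing of t.
dead_graph, dead_safe = explore({"q1", "q2"}, {"t": {"q1"}}, {"t": {"q2"}}, {"q1": 1})
dead_live = is_live(dead_graph, ["t"])
```

Explicit enumeration grows exponentially with the amount of concurrency in the net, which is one reason unfolding-based tools are preferred for larger models.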
7 Conclusion
This paper sets the guidelines for a new methodology for the synthesis of the
delay-insensitive asynchronous wrappers needed for the correct-by-construction
GALS implementation of a modular synchronous system.
The approach is based on the recent results of Potop and Caillaud [13],
which deﬁne high-level, decidable criteria for the correct GALS implementa-
tion of modular synchronous speciﬁcation, namely the weak endochrony of the
modules and the absence of deadlocks in the global synchronous speciﬁcation.
The synthesis problem is thus reduced to that of synthesizing the asynchronous
wrappers for weakly endochronous synchronous modules. This problem can be
solved on a local basis, without knowledge about the properties of the global
system.
We used an example – a simple model of a DLX-like processor – to intu-
itively present and give implementation hints on the diﬀerent phases of the
proposed methodology.
7.1 Future work
A formally defined algorithm for the proposed synthesis methodology will
be developed. This methodology will include the extension of the theory of
regions to handle intermittent clock transitions. The extension will be
incorporated into an automatic synthesis tool, such as Petrify, to enable the
translation of weakly endochronous synchronous automata into synthesizable
Petri net models.
We also intend to extend the underlying theory in order to simplify the
generated logic by taking into consideration:
• closed-system assumptions, for instance in the form of sequential
care-sets;
• the fact that synchronous specifications are often meant to run in
asynchronous environments, under specific input arrival hypotheses (e.g. one
event per clock cycle).
References
[1] Ivan Blunno, Jordi Cortadella, Alex Kondratyev, Luciano Lavagno, Kelvin Lwin, and
Christos P. Sotiriou. Handshake protocols for de-synchronization. In Proceedings Async04,
pages 149–158, Crete, Greece, 2004.
[2] C. Carloni, K. McMillan, and A. Sangiovanni-Vincentelli. The theory of latency-insensitive
design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
20(9):1059–1076, 9 2001.
[3] D. Chapiro. Globally Asynchronous Locally-Synchronous Systems. PhD thesis, Stanford
University, 1984.
[4] J. Cortadella, M. Kishinevsky, A. Kondratiev, L. Lavagno, and A. Yakovlev. Deriving Petri
nets from ﬁnite transition systems. In IEEE Transactions on Computers, Vol. 47, Aug. 1998.
[5] J. Cortadella, M. Kishinevsky, A. Kondratiev, L. Lavagno, and A. Yakovlev. Petrify: a tool
for manipulating concurrent specifications and synthesis of asynchronous controllers. In IEICE
Transactions on Information and Systems, Vol. E80-D, No. 3, March 1997, pages 315–325.
[6] J. Cortadella, M. Kishinevsky, A. Kondratiev, L. Lavagno, and A. Yakovlev. Logic Synthesis
of Asynchronous Controllers and Interfaces. Springer, 2002.
[7] S. Dasgupta, A. Yakovlev. Modeling and Veriﬁcation of Globally Asynchronous and Locally
Synchronous Ring Architectures. In Proceedings DATE 2005, Munich, Germany, 2005.
[8] Nicolas Halbwachs. Synchronous programming of reactive systems. Kluwer Academic
Publishers, 1993.
[9] V. Khomenko. Model checking based on preﬁxes of petri net unfoldings. PhD thesis, University
of Newcastle, 2003.
[10] M. Krstic, E. Grass. GALSification of IEEE 802.11a Baseband Processor. In Proceedings
PATMOS 2004, LNCS 3254, 2004.
[11] T. Margaria, B. Steﬀen. Tools and Algorithms for the Construction and Analysis of Systems.
Lecture Notes in Computer Science. Vol. 1055, Second Int. Workshop, TACAS’96, Passau,
Germany, pages 397-401. Springer-Verlag, March 1996.
[12] M. Nielsen, G. Rozenberg, and P. Thiagarajan. Elementary transitions systems. Theoretical
Computer Science, 96:-33, 1992.
[13] D. Potop-Butucaru and B. Caillaud. Correct-by-construction asynchronous implementation of
modular synchronous specifications. To appear in Proceedings ACSD 2005, Mont Saint-Michel,
France, 2005.
[14] D. Potop-Butucaru, B. Caillaud, and A. Benveniste. Concurrency in synchronous systems. In
Proceedings ACSD 2004, Hamilton, Canada, 2004.
[15] C. Seitz. System Timing. Chapter 7 of Introduction to VLSI Systems by C. Mead and L.
Conway. 1980.
[16] M. Singh and M. Theobald. Generalized latency insensitive systems for GALS architectures.
In Proceedings FMGALS2003, Pisa, Italy, 2003.
[17] K. Yun and D. Dill. Automatic synthesis of extended burst-mode circuits. IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, 18(2):101–132, Feb. 1999.
[18] K. Yun and R. Donohue. Pausible clocking: A ﬁrst step toward heterogenous systems. In
Proceedings ICCD 1996, 1996.
[19] S. Zhuang, W. Lee, J. Carlsson, K. Palmkvist and L. Wanhammar. Asynchronous data
communication with low power for GALS system. In IEEE International Conference on
Electronics, Circuits and Systems, 2002.
