Practical advances in asynchronous design and in asynchronous/synchronous interfaces by Brunvand, Erik L. & Nowick, Steven
7.1
Practical Advances in Asynchronous Design and in Asynchronous/Synchronous
Interfaces
Erik Brunvand 
Dept. of Computer Science 
University of Utah 
SLC, Utah 84112
Steven Nowick 
Dept. of Computer Science 
Columbia University 
New York, NY 10027
Kenneth Yun 
Department of ECE 
University of California 
San Diego, CA 92093
Abstract
Asynchronous systems are being viewed as an increas­
ingly viable alternative to purely synchronous systems. This 
paper gives an overview of the current state of the art in 
practical asynchronous circuit and system design in four ar­
eas: controllers, datapaths, processors, and the design of 
asynchronous/synchronous interfaces.
1 Asynchronous Control
Classical asynchronous controllers were typically imple­
mented as Huffman machines [67]. These machines do not 
use clocked latches or flip-flops: the state is simply stored 
on feedback loops. Typically a fundamental mode assumption 
is required to ensure correct operation: once an input 
change occurs, no new inputs may arrive until the machine 
has stabilized. Much of the basic theory on asynchronous 
state machines was developed by Huffman, Unger, and 
McCluskey (see [67]).
Hazards, or the potential for glitches, are an important 
consideration in any asynchronous design [67]. In synchronous 
systems, the global clock usually filters out the effect of 
glitches. In asynchronous systems, there is no global clock, 
so any glitch may be interpreted as a valid signal change, 
and cause a malfunction. A number of techniques to eliminate 
combinational hazards, as well as critical races and 
essential hazards, have been proposed [67, 22, 5].
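As a minimal illustration of a combinational hazard (a sketch, not one of the cited methods), the following Python code checks a two-level sum-of-products cover for static-1 logic hazards under single-input changes. It relies on the classical condition that every pair of adjacent input vectors on which the function is 1 must be covered by a single product term. The cube encoding and the function names are assumptions made for this example.

```python
from itertools import product

# A product term (cube) maps a variable index to its required value (0 or 1).
# Variables absent from the dict are don't-cares. This encoding is illustrative.

def covers(cube, point):
    """True if the product term evaluates to 1 at the input vector."""
    return all(point[i] == v for i, v in cube.items())

def evaluate(cover, point):
    """Value of the sum-of-products cover at the input vector."""
    return any(covers(c, point) for c in cover)

def static1_hazards(cover, n):
    """List single-input-change transitions (p, q) with f(p) = f(q) = 1
    for which no single product term covers both endpoints."""
    hazards = []
    for p in product((0, 1), repeat=n):
        for i in range(n):
            if p[i] == 1:
                continue  # visit each adjacent pair only once
            q = p[:i] + (1,) + p[i + 1:]
            if evaluate(cover, p) and evaluate(cover, q):
                if not any(covers(c, p) and covers(c, q) for c in cover):
                    hazards.append((p, q))
    return hazards

# f = a*b + a'*c over variables (a, b, c) = indices (0, 1, 2)
hazardous = [{0: 1, 1: 1}, {0: 0, 2: 1}]
# Adding the consensus term b*c removes the hazard.
hazard_free = hazardous + [{1: 1, 2: 1}]

print(static1_hazards(hazardous, 3))    # [((0, 1, 1), (1, 1, 1))]
print(static1_hazards(hazard_free, 3))  # []
```

With b = c = 1, a change on a moves the output from one product term to the other, so the output may glitch; the redundant term b*c holds the output at 1 throughout the transition.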
While this early work laid the foundations of asynchronous 
controller synthesis, the design methods had major limita­
tions: (i) lack of ability to handle highly-concurrent envi­
ronments; (ii) poor performance; (iii) problems in hazard 
elimination (in some methods); and (iv) lack of CAD opti­
mization algorithms and tools.
Since the early and mid 1980’s, several controller synthesis 
methods have been developed to address these limitations. 
These methods fall into three general categories: (i) state 
machines; (ii) Petri-net and graph-based methods; and (iii) 
translation methods.
Permission to make digital or hard copies of all or part of this work for personal or 
classroom use is granted without fee provided that copies are not made or distrib­
uted for profit or commercial advantage and that copies bear this notice and the full 
citation on the first page. To copy otherwise, to republish, to post on servers or to 
redistribute to lists, requires prior specific permission and/or a fee.
DAC 99, New Orleans, Louisiana
©1999 ACM 1-58113-092-9/99/0006..$5.00
1.1 Asynchronous State Machines
Much of the recent work on asynchronous state machine 
design is centered on burst-mode machines [50, 52].
Burst-mode specifications grew out of earlier informal 
specifications by Davis et al. [18, 17]. Davis proposed ma­
chines which would wait for a collection of input changes (an 
“input burst”), and then respond with a collection of out­
put changes (an “output burst”). The key contribution is 
that, unlike classical asynchronous machines, inputs within 
a burst could be uncorrelated: arriving in any order and 
at any time. Therefore, these machines could operate more 
flexibly in a concurrent environment. Unfortunately, their 
synthesis methods did not ensure hazard-free designs.
Nowick and Dill modified and formalized these 
specifications into the final form called burst-mode (BM) 
[52, 50]. They also proposed a new self-synchronized design 
style called a locally-clocked state machine, which was the 
first burst-mode synthesis method to guarantee a hazard- 
free implementation [52, 50]. The method has been applied 
to large-scale designs such as a cache controller [51]. They 
also developed the first exact hazard-free 2-level logic mini­
mization algorithm [53].
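The input-burst/output-burst behavior described above can be sketched as a small behavioral interpreter. This is only an illustration under simplifying assumptions: the two-state specification, the signal names, and the restriction to one outgoing arc per state are hypothetical and are not taken from the cited papers, and the sketch says nothing about hazard-free synthesis.

```python
# Hypothetical burst-mode specification: each state has a single arc
# (input burst, output burst, next state). A burst is the set of signals
# that must toggle.
SPEC = {
    "S0": (frozenset({"req_a", "req_b"}), frozenset({"ack"}), "S1"),
    "S1": (frozenset({"req_a", "req_b"}), frozenset({"ack"}), "S0"),
}

def run(spec, start, events):
    """Consume single-signal input transitions; return the output bursts fired.

    Inputs within a burst may arrive in any order. The output burst is
    generated only once the complete input burst has been observed.
    """
    state, seen, fired = start, set(), []
    for signal in events:
        in_burst, out_burst, nxt = spec[state]
        if signal not in in_burst or signal in seen:
            raise ValueError(f"unexpected transition on {signal} in {state}")
        seen.add(signal)
        if seen == in_burst:  # input burst complete
            fired.append((state, sorted(out_burst)))
            state, seen = nxt, set()
    return fired

# The two requests arrive in a different order in each burst.
print(run(SPEC, "S0", ["req_b", "req_a", "req_a", "req_b"]))
# [('S0', ['ack']), ('S1', ['ack'])]
```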
Yun and Dill proposed an alternative implementation 
method, called 3D [79, 83]. The specifications were also 
generalized into extended burst-mode (XBM), to allow 
greater concurrency and practicality [80, 82]. XBM 
specifications can be used to synthesize controllers for 
synchronous/asynchronous interfaces, where the global clock is 
treated as one of the controller’s inputs.
A number of optimization algorithms and CAD tools 
have been developed, for sequential and combinational logic 
synthesis [23, 53, 66, 36, 39], technology mapping [61], 
timing analysis [10], and synthesis for testability [54]. 
Burst-mode CAD tools have been applied to several industrial 
designs, including an experimental routing chip [17] and 
a low-power infrared communications chip [40] at HP 
Laboratories, an experimental SCSI controller at AMD [81], and 
a high-performance experimental instruction-length decoder 
at Intel [60].
1.2 Petri-Net and Graph-Based Methods
Petri nets and state graphs can also be used to specify 
asynchronous circuits. A Petri net is a directed bipartite 
graph which can describe both concurrency and choice. The 
net consists of two kinds of vertices: places and transitions. 
An assignment of tokens to the places is called a marking, 
which captures the state of the concurrent system.
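The firing rule behind these definitions can be sketched in a few lines of Python. The fork/join net below is a hypothetical example chosen to show concurrency (two transitions enabled at once); the dictionary representation and function names are assumptions made for this sketch.

```python
# A transition is a pair (input places, output places). This hypothetical
# net forks into two concurrent branches and joins them again.
NET = {
    "fork": (("p0",), ("p1", "p2")),
    "a":    (("p1",), ("p3",)),
    "b":    (("p2",), ("p4",)),
    "join": (("p3", "p4"), ("p0",)),
}

def enabled(marking, transition):
    """A transition is enabled when every input place holds a token."""
    inputs, _ = transition
    return all(marking.get(p, 0) >= 1 for p in inputs)

def fire(marking, transition):
    """Return the new marking: take one token from each input place and
    put one token on each output place."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    return new

def enabled_transitions(net, marking):
    return sorted(name for name, t in net.items() if enabled(marking, t))

m = fire({"p0": 1}, NET["fork"])
print(enabled_transitions(NET, m))  # ['a', 'b']
```

After the fork fires, both a and b are enabled and may fire in either order, which is how the marking captures concurrency. Choice would instead be modeled by a single place feeding two competing transitions.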
Several synthesis methods use restricted Petri nets, called 
marked graphs, which model concurrency, but not choice. 
More general Petri nets called Signal Transition Graphs (STGs), 
as well as state graphs, which specify interleaved 
concurrency, are now commonly used [12, 46, 72, 2, 35].
A number of synthesis algorithms have been developed, 
for state minimization and assignment [37, 16] and hazard- 
free logic decomposition [8] (see also [72, 2, 35]). Full-scale 
CAD packages are now available, including one incorporated 
into the Berkeley SIS package [38], as well as Petrify [16]. 
Another synthesis method, called ATACS, focuses on timed 
circuits [48].
1.3 Translation Methods
Translation methods specify an asynchronous system using 
a high-level concurrent programming language. Common 
languages include variants of Hoare’s CSP, occam, and 
trace theory. The program is then transformed, stepwise, 
into a low-level program which maps directly onto a circuit. 
These methods can be used to synthesize both datapath 
and control. A few methods use formal algebraic deriva­
tions [21, 33]. More commonly, though, compiler-oriented 
techniques are used.
At Caltech, Martin et al. [42] specify an asynchronous 
system using a CSP-like parallel language, augmented with 
sequential constructs based on Dijkstra’s guarded commands. 
The specification describes a set of concurrent processes 
which communicate on channels. The specification is then 
automatically compiled into a collection of gates and com­
ponents which communicate on wires [9]. An alternative 
approach was developed by Brunvand and Sproull based on 
occam specifications [7].
At Philips Research and Eindhoven University, van Berkel 
et al. [68, 69] have developed an industrial synthesis 
package, based on their Tangram language. The tool has been 
applied to both commercial and experimental designs, in­
cluding a DCC error corrector and an 80C51 microcontroller 
(discussed in Section 3).
2 Datapath
This section describes some of the recent advances in self­
timed datapath design, concentrating on performance issues 
only.
A datapath can be classified as pipelined or non-pipelined. 
There has been a tremendous amount of work on asynchronous 
pipelines, starting with the classical micropipeline work by 
Sutherland [63]. Pipeline control can be implemented using either 
a two-phase protocol [24, 76, 1] or a four-phase protocol 
[19, 28, 25, 27].
All of the asynchronous datapath designs strive to obtain 
higher average-case speed than the worst-case speed 
of comparable synchronous circuits. For non-pipelined 
datapaths, the performance advantage of asynchronous 
circuits is much clearer. The latency, the only relevant 
metric in non-pipelined datapaths, is simply the sum 
of all datapath element delays in the critical path. Thus 
the average-case latency of an asynchronous datapath, 
determined roughly by the sum of the average-case delays of the 
individual elements, is in general much lower than that of its 
synchronous counterpart. Some examples of non-pipelined datapaths 
are Williams’s divider ring [75], van Berkel et al’s DCC er­
ror corrector [4], Yun et al’s differential equation solver [77], 
and Benes et al’s Huffman decoder [3].
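The latency argument above can be written compactly. The notation is introduced here for illustration and is not from the original: $d_i$ is the data-dependent delay of element $i$ on the critical path $P$, $d_i^{\max}$ is its worst-case delay, and the synchronous design is assumed to be timed for the worst-case path.

```latex
\bar{T}_{\mathrm{async}} \;\approx\; \sum_{i \in P} \mathrm{E}[d_i]
\;\le\; \sum_{i \in P} d_i^{\max} \;\le\; T_{\mathrm{sync}}
```

The gap between the two sums is the potential average-case gain; it is large when the typical delay of each element is well below its worst case.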
For pipelined datapaths, tradeoffs are more complex. Work 
at Sun Labs [15] shows that asynchronous pipelines, if de­
signed properly, can approach the speed of synchronous shift 
registers. However, it is unclear if asynchronous pipelines, 
except in some special cases [60], can ever out-perform 
synchronous counterparts. A goal is therefore to aim for 
performance comparable to that of a synchronous pipeline, but with the 
added benefits of “elasticity” (variable rate operation).
Our conjecture is that the average-case throughput (taking 
into account only data dependency, not operating 
conditions) of a deeply-pipelined asynchronous circuit would 
be close to the worst-case throughput. Shorter pipelines, 
though, tend to exhibit much better average-case behavior.
We describe below some recently introduced techniques 
to improve the average-case performance of self-timed dat­
apaths.
2.1 Adders
In order to exploit variable data-dependent delays, self-timed 
datapath elements incorporate some form of completion 
detection mechanism. The most common form is based 
on dual-rail logic [43]. However, as the datapath becomes 
wider, the overhead for completion detection becomes 
significant. Yun et al. [77] observed that one way to tackle this 
problem is to parallelize the computation and completion 
detection as much as possible. Their techniques resulted in 
a 2.8 ns average-case delay for a 32-bit carry bypass adder 
fabricated in a 0.6 µm CMOS process, with only 20% completion 
sensing overhead on average. Another way to deal with wide 
datapaths is to perform bitwise completion detection. 
Martin et al. [44] showed an impressive throughput gain (at the 
expense of sacrificing latency) using this technique.
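To make dual-rail completion detection concrete, here is a minimal behavioral sketch. It assumes the usual dual-rail code, in which each bit is carried on a (true rail, false rail) pair: (0, 0) is the empty spacer, (1, 0) is logic 1, and (0, 1) is logic 0. The function names are hypothetical, and the model ignores circuit-level detail such as the gate tree that combines the per-bit signals.

```python
SPACER = (0, 0)  # neither rail asserted: no data present yet

def encode(value, width):
    """Dual-rail encode an integer, least-significant bit first."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(width)]

def is_complete(word):
    """Completion is signalled when every bit has exactly one rail high."""
    return all(t ^ f for t, f in word)

def is_empty(word):
    """True when every bit has returned to the spacer."""
    return all(pair == SPACER for pair in word)

def decode(word):
    """Recover the integer once the word is complete."""
    if not is_complete(word):
        raise ValueError("word is not yet complete")
    return sum(t << i for i, (t, _) in enumerate(word))

word = encode(5, 4)
partial = word[:2] + [SPACER, SPACER]   # upper bits have not arrived yet
print(is_complete(partial), is_complete(word), decode(word))  # False True 5
```

Because the completion check combines one signal per bit, its cost grows with the datapath width, which is the overhead the techniques above try to hide or reduce.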
A somewhat different twist to completion detection is 
called speculative completion. This technique assumes 
the circuit normally finishes computation significantly faster 
than the worst case. If the circuit cannot complete the 
computation in time, it aborts reporting completion. This 
technique requires a special auxiliary circuit called an “abort 
detection circuit”, which operates in parallel with the datapath 
element itself. Nowick et al. [55] applied this technique to 
a 32-bit Brent-Kung adder, achieving a simulated 
average-case delay of less than 2 ns in a 0.6 µm CMOS 
process.
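The idea can be sketched behaviorally as follows. This is not the circuit from [55]: the abort condition used here (a run of at least k consecutive carry-propagate bits), the threshold k, and the two delay values are all assumptions chosen for illustration.

```python
def longest_propagate_run(a, b, width):
    """Length of the longest run of bit positions where a_i XOR b_i = 1,
    i.e. positions that would pass an incoming carry along."""
    p = (a ^ b) & ((1 << width) - 1)
    best = run = 0
    for i in range(width):
        run = run + 1 if (p >> i) & 1 else 0
        best = max(best, run)
    return best

def completion_delay(a, b, width=32, k=8, early=1.0, late=2.0):
    """Model of speculative completion for an adder.

    The early completion signal is used unless abort detection, which
    inspects the operands in parallel with the addition, finds that a
    long carry chain is possible. In that case the late signal is used.
    """
    abort = longest_propagate_run(a, b, width) >= k
    return late if abort else early

print(completion_delay(0x0000000F, 0x00000001))  # 1.0 (short carry chain)
print(completion_delay(0xFFFFFFFF, 0x00000000))  # 2.0 (abort: long chain)
```

The abort condition is deliberately conservative: it may delay completion for operands that would in fact finish early, but it never reports completion before the result is ready in this model.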
2.2 Iterative structures
The development of the zero-overhead self-timed ring 
technique by Williams [74] is clearly the most significant 
breakthrough in self-timed iterative structures. Williams showed 
that a self-timed ring can be designed in dual-rail domino 
logic with essentially zero overhead. He applied this tech­
nique to a self-timed 160ns 54-bit mantissa divider [75] as 
a part of a floating-point divider. This design was incorpo­
rated in a commercial microprocessor design [73].
It can be shown that this technique is generally appli­
cable to any iterative structure in which the latency needs 
to be optimized. Consequently, this technique has been 
applied to other academic and industrial designs, such as a 
division and square root unit design by Matsubara and Ide [45], 
a self-timed packet switch design by Yun et al. [78], and 
a Huffman decoder design by Benes et al. [3]. There have 
been other iterative structure designs that achieve high 
performance with data-dependent computation times, such as 
a bundled data multiplier design by Kearney and Bergmann [34].
2.3 Large scale examples
In certain applications in which there is a large variation 
in processing delays between common and rare cases, asyn­
chronous designs tend to fare much better than synchronous 
designs. A research group at Intel demonstrated this with 
their asynchronous instruction length decoder design called 
RAPPID (“Revolving Asynchronous Pentium Processor® 
Instruction Decoder”) [60]. RAPPID’s length decoding 
outperforms, by a factor of 3, the same function inside 
a 400MHz Pentium II fabricated in the identical 0.25 µm 
CMOS process. This speedup is primarily attributed to 
optimizations for common, short-length instructions and the 
self-timed techniques that enable these optimizations.
In another application, an asynchronous Huffman decoder 
design by Benes et al. [3], by exploiting the large 
data-dependent variation in decoding time, achieves 
average-case performance similar to the worst-case performance 
of comparable synchronous designs, but with 5-10 times 
smaller area.
So far, we have only discussed techniques to exploit variable 
data-dependent delays. However, if the operating 
conditions are taken into account, we can obtain much more 
significant performance benefits from asynchronous circuits. This 
speedup is essentially due to inherent margins that must be 
built into synchronous systems to accommodate worst-case 
timing behavior but are not required for asynchronous 
systems. Dean [20] proposed a self-timed processor architecture 
called STRiP based on this idea. Yun et al. [77] 
demonstrated a high-performance asynchronous differential 
equation solver chip, whose average-case speed (tested at 22°C 
and 3.3V) is 48% faster than comparable synchronous 
designs (designed to operate at 100°C and 3V for the slow 
process corner).
3 Asynchronous Processors
This is an exciting time for asynchronous processors. 
Recently, at Philips Semiconductors, pagers with asynchronous 
chips have been released commercially (see below). 
In addition to the current academic interest in 
asynchronous systems, several companies such as Intel, Sharp, 
Sun, and HP have shown interest. The asynchronous circuits 
these companies have developed are showing some promise 
of making their way into products.
Processors are, in many ways, the most demanding ap­
plication for asynchronous techniques. In addition to being 
extremely complex systems, processors are often the target 
of the most aggressive optimization that the circuit design­
ers can bring to bear. The optimization criterion may be 
raw speed, low power, noise and EMC (Electro-Magnetic 
Compatibility) properties, or some combination of these, 
but it is in a processor where such requirements are the 
most critical. It is also the case that the organization of 
most modern high-performance microprocessors uses a syn­
chronous pipelined approach, and alternative architectures 
may be required to achieve comparable results with asyn­
chronous processors. But, it is the potential benefits of the 
asynchronous approach that are compelling in this world of 
highly-optimized systems. In terms of raw speed, lowered 
power, and improved EMC properties, asynchronous tech­
niques may have much to offer.
Until recently, there have been relatively few asynchronous 
processors reported in the literature. Early work in 
asynchronous computer architecture includes the Macromodule 
project during the early 70’s at Washington University [13, 14] 
and the self-timed dataflow machines called DDM-1 and 
DDM-2 (Data Driven Machine) built at the University of 
Utah in the late 70’s [18].
More recent academic projects include the Caltech 
Asynchronous Microprocessor [41], which was the first asynchronous 
microprocessor of the VLSI era; the NSR [6], fully decoupled 
and built from FPGAs; and the Rotary Pipeline Processor [47], 
which takes a circular ring approach to the pipeline. 
In addition, at Sun Labs, a new counterflow architecture 
has been proposed, with a fully asynchronous 
implementation [62]. Some recent asynchronous processors are 
highlighted below.
Philips Asynchronous 80C51. At Philips Labs, an asyn­
chronous version of the venerable 80C51 controller has been 
developed that exhibits nearly four times lower power than 
a power-optimized synchronous version. It also has signif­
icantly reduced EM emissions. These properties have con­
vinced Philips to develop a family of these asynchronous 
controllers for pagers, and commercial pagers using these 
chips are now on the market [70, 71].
Sharp DDMP Signal Processor. Sharp Corporation has 
developed an experimental self-timed data driven multi-media 
processor aimed at digital television receivers and other ap­
plications. The fabricated processor exhibits impressive per­
formance and power consumption, operating at a speed of 
8600 Million Operations per Second and with power con­
sumption less than 1 watt. The processor consists of 8 pro­
grammable, data-driven processing elements connected by 
an elastic router [65].
The Amulet. A group at the University of Manchester has 
built a number of versions of a self-timed micropipelined 
VLSI implementation of the ARM processor [26] which is an 
extremely power-efficient commercial microprocessor. The 
first-generation Amulet design is within a factor of two of 
the commercial ARM of the same time [56]. The second- 
generation Amulet 2e was targeted at embedded applica­
tions and demonstrated a modest improvement in power per 
MIPS over the commercial synchronous version [27, 29], as 
well as nearly immediate restart from full standby mode. 
The third-generation Amulet 3 promises further 
improvements in both performance and low power [30].
The Fred Architecture. Fred is a self-timed, decoupled, 
concurrent, pipelined computer architecture [59, 58]. It dy­
namically reorders instructions to issue out of order using 
an instruction window to organize the reordering, and al­
lows out-of-order instruction completion. It handles excep­
tions and interrupts, and includes a novel functionally pre­
cise exception model that works well in the asynchronous, 
decoupled, out of order environment [57].
Caltech Asynchronous MIPS R3000. Subsequent to the 
success of their first small asynchronous processor, the asyn­
chronous group at Caltech has built an asynchronous ver­
sion of the MIPS R3000 processor. Their processor uses 
deep, fine-grained pipelining which is exploited naturally by 
the underlying asynchronous circuits. The asynchronous 
R3000 exhibits significantly improved MIPS/watt perfor­
mance over the synchronous version when scaled to account 
for different processes and voltages [44].
TITAC. A group at Tokyo Institute of Technology and 
Tokyo University has fabricated several versions of a new 
architecture they call TITAC [49]. The most recent version 
is a full-featured 32-bit architecture that uses delay-scaling 
techniques to improve performance by taking real circuit de­
lays into account, rather than conservatively assuming un­
bounded gate delays [64].
4 Asynchronous/Synchronous Interfaces
It is clear that there are interesting applications that can 
take advantage of asynchronous techniques. However, a vast 
majority of systems are and will continue to be synchronous. 
The question then is how to utilize some of the proven ben­
efits of asynchronous circuits in a largely synchronous envi­
ronment.
Some have suggested that communication between mod­
ules should be asynchronous (although the modules them­
selves are synchronous) because the cost of global synchrony 
is prohibitively high in large-scale VLSI systems. Chapiro 
first suggested the idea of a globally-asynchronous 
locally-synchronous (GALS) system in [11]. Yun and 
Donohue demonstrated a prototype GALS system with a 
mixture of asynchronous and synchronous modules in [84]. 
In this chip, synchronous modules were equipped with 
pausible clocking control to prevent synchronization failures.
Yet others have argued that maintaining a precise frequency 
reference in a globally synchronous environment is not too 
difficult. The real problem is the uncertainty in clock phases. 
Ginosar and Kol [31] suggested an adaptive synchronization 
scheme to remedy this problem. Furthermore, some syn­
chronous systems [32, 85] are moving closer to asynchronous 
by allowing significant time borrowing to overcome clock 
skew and jitter problems.
References
[1] S.S. Appleton, S.V. Morton, and M.J. Liebelt. Two-phase 
asynchronous pipeline control. In IEEE Int. Symp. on Advanced 
Research in Asynchronous Circuits and Systems, 
April 1997.
[2] P.A. Beerel and T. Meng. Automatic gate-level synthesis of 
speed-independent circuits. In ICCAD, pages 581-586. IEEE 
Computer Society Press, November 1992.
[3] M. Benes, S.M. Nowick, and A. Wolfe. A fast asynchronous 
Huffman decoder for compressed-code embedded processors. 
In IEEE Int. Symp. on Advanced Research in Asynchronous 
Circuits and Systems, pages 43-56, 1998.
[4] K. van Berkel, R. Burgess, J. Kessels, A. Peeters, M. Roncken, 
and F. Schalij. A fully-asynchronous low-power error 
corrector for the DCC player. IEEE JSSC, 29(12):1429-1439, 
December 1994.
[5] J.G. Bredeson and P.T. Hulina. Elimination of static and 
dynamic hazards for multiple input changes in combinational 
switching circuits. Information and Control, 20:114-224, 
1972.
[6] E. Brunvand. The NSR processor. In Proceedings of the 26th 
International Conference on System Sciences, Jan 1993.
[7] E. Brunvand and R.F. Sproull. Translating concurrent 
programs into delay-insensitive circuits. In ICCAD, pages 262-265. 
IEEE Computer Society Press, November 1989.
[8] S.M. Burns. General condition for the decomposition of 
state holding elements. In Int. Symp. on Advanced Research 
in Asynchronous Circuits and Systems, pages 48-57. IEEE 
Computer Society Press, November 1996.
[9] S.M. Burns and A.J. Martin. Syntax-directed translation 
of concurrent programs into self-timed circuits. In Advanced 
Research in VLSI, pages 35-50. MIT Press, Cambridge, MA, 
1988.
[10] S. Chakraborty, D.L. Dill, and K.Y. Yun. Min-max timing 
analysis and its application to asynchronous circuits. 
Proceedings of the IEEE, 87(2), Feb 1999.
[11] D.M. Chapiro. Globally-Asynchronous Locally-Synchronous 
Systems. PhD thesis, Stanford University, October 1984.
[12] T.-A. Chu. Synthesis of self-timed VLSI circuits from 
graph-theoretic specifications. Technical Report MIT-LCS-TR-393, 
MIT, 1987. Ph.D. Thesis.
[13] W.A. Clark. Macromodular computer systems. In Spring 
Joint Computer Conference. AFIPS, April 1967.
[14] W.A. Clark and C.E. Molnar. Macromodular system design. 
Technical Report 23, Computer Systems Laboratory, 
Washington University, April 1973.
[15] W.S. Coates, J.K. Lexau, I.W. Jones, S.M. Fairbanks, and 
I.E. Sutherland. A FIFO data switch design experiment. In 
IEEE Int. Symp. on Advanced Research in Asynchronous 
Circuits and Systems, pages 4-17, 1998.
[16] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, 
and A. Yakovlev. Methodology and tools for state encoding 
in asynchronous circuit synthesis. In DAC, June 1996.
[17] A. Davis, B. Coates, and K. Stevens. Automatic synthesis 
of fast compact self-timed control circuits. In IFIP Working 
Conference on Asynchronous Design Methodologies, 1993.
[18] A.L. Davis. The architecture and system method for DDM1: 
A recursively structured data-driven machine. In 5th Annual 
Symp. on Computer Architecture, April 1978.
[19] P. Day and J.V. Woods. Investigation into micropipeline 
latch design styles. IEEE TVLSI, 3(2):264-272, June 1995.
[20] M.E. Dean. STRiP: A Self-Timed RISC Processor 
Architecture. PhD thesis, Stanford University, 1992.
[21] J.C. Ebergen. A formal approach to designing 
delay-insensitive circuits. Distributed Computing, 5(3):107-119, 
1991.
[22] E.B. Eichelberger. Hazard detection in combinational and 
sequential switching circuits. IBM Journal of Research and 
Development, 9(2):90-99, 1965.
[23] R.M. Fuhrer, B. Lin, and S.M. Nowick. Symbolic 
hazard-free minimization and encoding of asynchronous finite state 
machines. In ICCAD, pages 604-611, November 1995.
[24] S. Furber. Computing without clocks: Micropipelining the 
ARM processor. In Graham Birtwistle and Al Davis, 
editors, Asynchronous Digital Circuit Design, Workshops in 
Computing, pages 211-262. Springer-Verlag, 1995.
[25] S.B. Furber and P. Day. Four-phase micropipeline latch 
control circuits. IEEE TVLSI, 4(2):247-253, June 1996.
[26] S.B. Furber, P. Day, J.D. Garside, N.C. Paver, and J.V. 
Woods. A micropipelined ARM. In Proceedings of VLSI93, 
Grenoble, France, 1993.
[27] S.B. Furber, J.D. Garside, S. Temple, J. Liu, P. Day, and 
N.C. Paver. AMULET2e: An asynchronous embedded 
controller. In IEEE Int. Symp. on Advanced Research in 
Asynchronous Circuits and Systems, April 1997.
[28] S.B. Furber and J. Liu. Dynamic logic in four-phase 
micropipelines. In IEEE Int. Symp. on Advanced Research in 
Asynchronous Circuits and Systems, March 1996.
[29] J.D. Garside, S. Temple, and R. Mehra. The AMULET2e 
cache system. In IEEE Int. Symp. on Advanced Research in 
Asynchronous Circuits and Systems, March 1996.
[30] J.D. Garside, S.B. Furber, and S.-H. Chung. AMULET3 
revealed. In IEEE Int. Symp. on Advanced Research in 
Asynchronous Circuits and Systems, April 1999.
[31] R. Ginosar and R. Kol. Adaptive synchronization. In ICCD, 
[32] D. Harris and M.A. Horowitz. Skew-tolerant domino circuits. 
IEEE JSSC, 32(11):1702-1711, November 1997.
[33] M.B. Josephs and J.T. Udding. An overview of D-I algebra. 
In HICSS, volume I, pages 329-338. IEEE Computer Society 
Press, January 1993.
[34] D. Kearney and N.W. Bergmann. Bundled data 
asynchronous multipliers with data dependant computation 
times. In IEEE Int. Symp. on Advanced Research in 
Asynchronous Circuits and Systems, April 1997.
[35] A. Kondratyev, M. Kishinevsky, B. Lin, P. Vanbekbergen, 
and A. Yakovlev. Basic gate implementation of 
speed-independent circuits. In DAC, pages 56-62. ACM, June 1994.
[36] D.S. Kung. Hazard-non-increasing gate-level optimization 
algorithms. In ICCAD, pages 631-634, November 1992.
[37] L. Lavagno, C.W. Moon, R.K. Brayton, and A. 
Sangiovanni-Vincentelli. Solving the state assignment problem for signal 
transition graphs. In DAC, pages 568-572, June 1992.
[38] L. Lavagno and A. Sangiovanni-Vincentelli. Algorithms for 
synthesis and testing of asynchronous circuits. Kluwer 
Academic, 1993.
[39] B. Lin and S. Devadas. Synthesis of hazard-free multi-level 
logic under multiple-input changes from binary decision 
diagrams. In ICCAD, pages 542-549, Nov. 1994.
[40] A. Marshall, B. Coates, and P. Siegel. The design of an 
asynchronous communications chip. IEEE Design and Test, 
11(2):8-21, Summer 1994.
[41] A. Martin, S. Burns, T.K. Lee, D. Borkovic, and P. 
Hazewindus. The design of an asynchronous microprocessor. In Proc. 
Cal Tech Conference on VLSI, 1989.
[42] A.J. Martin. Programming in VLSI: From communicating 
processes to delay-insensitive circuits. In C.A.R. Hoare, 
editor, Developments in Concurrency and Communication, 
pages 1-64. Addison-Wesley, Reading, MA, 1990.
[43] A.J. Martin. Asynchronous datapaths and the design of 
an asynchronous adder. Formal Methods in System Design, 
1(1):119-137, July 1992.
[44] A.J. Martin, A. Lines, R. Manohar, M. Nystroem, P. Penzes, 
R. Southworth, and U. Cummings. The design of an 
asynchronous MIPS R3000 microprocessor. In Advanced 
Research in VLSI, September 1997.
[45] G. Matsubara and N. Ide. A low power zero-overhead 
self-timed division and square root unit combining a single-rail 
static circuit with a dual-rail dynamic circuit. In IEEE Int. 
Symp. on Advanced Research in Asynchronous Circuits and 
Systems, April 1997.
[46] T.H.-Y. Meng, R.W. Brodersen, and D.G. Messerschmitt. 
Automatic synthesis of asynchronous circuits from high-level 
specifications. IEEE TCAD, 8(11):1185-1205, November 
1989.
[47] S. Moore, P. Robinson, and S. Wilcox. Rotary pipeline 
processors. IEE Proceedings, Computers and Digital 
Techniques, 143(5), September 1996.
[48] C. Myers and T. Meng. Synthesis of Timed Asynchronous 
Circuits. IEEE TVLSI, 1(2):106-119, June 1993.
[49] T. Nanya, Y. Ueno, H. Kagotani, M. Kuwako, and A. 
Takamura. TITAC: Design of a quasi-delay-insensitive 
microprocessor. IEEE Design & Test of Computers, 11(2):50-63, 
1994.
[50] S.M. Nowick. Automatic synthesis of burst-mode asyn­
chronous controllers. Technical report, Stanford University, 
March 1993. Ph.D. Thesis (available as Stanford Univ. Cptr. 
Sys. Lab. tech report, CSL-TR-95-686, Dec. 95).
[51] S.M. Nowick, M.E. Dean, D.L. Dill, and M. Horowitz. The 
design of a high-performance cache controller: a case study in 
asynchronous synthesis. INTEGRATION, the VLSI journal, 
15(3):241-262, October 1993.
[52] S.M. Nowick and D.L. Dill. Synthesis of asynchronous state 
machines using a local clock. In ICCD, pages 192-197. IEEE 
Computer Society Press, October 1991.
[53] S.M. Nowick and D.L. Dill. Exact two-level minimization of 
hazard-free logic with multiple-input changes. IEEE TCAD, 
14(8):986-997, August 1995.
[54] S.M. Nowick, N.K. Jha, and F.-C. Cheng. Synthesis of 
asynchronous circuits for stuck-at and robust path delay fault 
testability. IEEE TCAD, 16(12):1514-1521, December 1997.
[55] S.M. Nowick, K.Y. Yun, and P.A. Beerel. Speculative 
completion for the design of high-performance asynchronous 
dynamic adders. In IEEE Int. Symp. on Advanced Research in 
Asynchronous Circuits and Systems, April 1997.
[56] N.C. Paver. The Design and Implementation of an 
Asynchronous Microprocessor. PhD thesis, University of 
Manchester, 1994.
[57] W.F. Richardson and E. Brunvand. Precise exception 
handling for a self-timed processor. In ICCD, pages 32-37, Los 
Alamitos, CA, October 1995. IEEE Computer Society Press.
[58] W.F. Richardson and E. Brunvand. Architectural 
considerations for a self-timed decoupled processor. IEE Proceedings, 
Computers and Digital Techniques, 143(5), September 1996.
[59] W.F. Richardson and E. Brunvand. Fred: An architecture 
for a self-timed decoupled computer. In IEEE Int. Symp. on 
Advanced Research in Asynchronous Circuits and Systems, 
1996.
[60] S. Rotem, K. Stevens, R. Ginosar, P. Beerel, C. Myers, 
K. Yun, R. Kol, C. Dike, M. Roncken, and B. Agapiev. 
RAPPID: an asynchronous instruction length decoder. In IEEE 
Int. Symp. on Advanced Research in Asynchronous Circuits 
and Systems, 1999.
[61] P. Siegel, G. De Micheli, and D. Dill. Technology mapping 
for generalized fundamental-mode asynchronous designs. In 
DAC, pages 61-67. ACM, June 1993.
[62] R.F. Sproull, I.E. Sutherland, and C.E. Molnar. The 
counterflow pipeline processor architecture. IEEE Design & Test 
of Computers, 11(3):48-59, Fall 1994.
[63] I.E. Sutherland. Micropipelines. CACM, 32(6):720-738, 
June 1989.
[64] A. Takamura, M. Kuwako, M. Imai, T. Fujii, M. Ozawa,
I. Fukasaku, U. Ueno, and T. Nanya. TITAC-2: an 
asynchronous 32-bit microprocessor based on scalable-delay- 
insensitive model. In ICCD, pages 288-294, October 1997.
[65] H. Terada, S. Miyata, and M. Iwata. DDMPs: Self-timed 
super-pipelined data-driven multimedia processors. 
Proceedings of the IEEE, 87(2), Feb 1999.
[66] M. Theobald, S.M. Nowick, and T. Wu. Espresso-HF: a 
heuristic hazard-free minimizer for two-level logic. In DAC, 
pages 71-76, June 1996.
[67] S. H. Unger. Asynchronous Sequential Switching Circuits. 
Wiley-Interscience, John Wiley & Sons, Inc., New York, 
1969.
[68] C.H. van Berkel and R.W.J.J. Saeijs. Compilation of 
communicating processes into delay-insensitive circuits. In ICCD, 
pages 157-162. IEEE Computer Society Press, 1988.
[69] K. van Berkel, R. Burgess, J. Kessels, A. Peeters, M. Roncken, 
and F. Schalij. Asynchronous Circuits for Low Power: 
a DCC Error Corrector. IEEE Design & Test, 11(2):22-32, 
June 1994.
[70] H. van Gageldonk. An asynchronous low-power 80C51 micro­
controller. Technical report, Eindhoven University of Tech­
nology, Sept 1998. Ph.D. Thesis.
[71] H. van Gageldonk, K. van Berkel, A. Peeters, D. Baumann, 
D. Gloor, and G. Stegmann. An asynchronous low-power 
80C51 microcontroller. In IEEE Int. Symp. on Advanced 
Research in Asynchronous Circuits and Systems, April 1998.
[72] V.I. Varshavsky, M.A. Kishinevsky, V.B. Marakhovsky, V.A. 
Peschansky, L.Y. Rosenblum, A.R. Taubin, and B.S. Tzirlin. 
Self-timed Control of Concurrent Processes. Kluwer 
Academic Publishers, 1990. Russian edition: 1986.
[73] T. Williams, N. Patkar, and G. Shen. SPARC64: A 64-b 
64-active-instruction out-of-order-execution MCM processor. 
IEEE JSSC, 30(11):1215-1226, November 1995.
[74] T.E. Williams. Self-Timed Rings and their Application to 
Division. PhD thesis, Stanford University, June 1991.
[75] T.E. Williams and M.A. Horowitz. A zero-overhead self­
timed 160ns 54b CMOS divider. IEEE JSSC, 26(11):1651- 
1661, November 1991.
[76] K.Y. Yun, P.A. Beerel, and J. Arceo. High-performance two- 
phase micropipeline building blocks: double edge-triggered 
latches and burst-mode select and toggle circuits. IEE Pro­
ceedings, Circuits, Devices and Systems, 143(5):282-288, 
October 1996.
[77] K.Y. Yun, P.A. Beerel, V. Vakilotojar, A.E. Dooply, 
and J. Arceo. The design and verification of a high- 
performance low-control-overhead asynchronous differential 
equation solver. IEEE TVLSI, 6(4):643-655, December
1998.
[78] K.Y. Yun, S. Chakraborty, K.W. James, R. Fairlie- 
Cuninghame, and R.L. Cruz. A self-timed real-time sorting 
network. In ICCD, pages 427-434, October 1998.
[79] K.Y. Yun and D.L. Dill. Automatic synthesis of 3D 
asynchronous finite-state machines. In ICCAD, Nov. 1992.
[80] K.Y. Yun and D.L. Dill. Unifying synchronous/asynchronous 
state machine synthesis. In ICCAD, pages 255-260. IEEE 
Computer Society Press, November 1993.
[81] K.Y. Yun and D.L. Dill. A high-performance asynchronous 
SCSI controller. In ICCD, pages 44-49, Oct. 1995.
[82] K.Y. Yun and D.L. Dill. Automatic synthesis of extended 
burst-mode circuits: part I (specification and hazard-free 
implementations). IEEE TCAD, 18(2):101-117, February
1999.
[83] K.Y. Yun and D.L. Dill. Automatic synthesis of extended 
burst-mode circuits: part II (automatic synthesis). IEEE 
TCAD, 18(2):118-132, February 1999.
[84] K.Y. Yun and R.P. Donohue. Pausible clocking: A first step 
toward heterogeneous systems. In ICCD, pages 118-123, 
October 1996.
[85] K.Y. Yun and A.E. Dooply. Optimal evaluation clocking of 
self-resetting domino pipelines. In Proc. of Asia and South 
Pacific Design Automation Conference, pages 121-124, 
January 1999.
