A 100-MIPS GaAs asynchronous microprocessor by Tierno, José A. et al.
A 100-MIPS GaAs Asynchronous Microprocessor
WE HAVE DEVELOPED a design 
method for asynchronous VLSI
circuits that is, to a large extent,
technology independent.1 Thus, it
makes porting a design from one 
technology to another straight- 
forward. Also, since the circuits 
designed by this method are quasi- 
delay-insensitive, they are more ro- 
bust with respect to variations in 
physical parameters. Hence, the 
method facilitates designing in a 
demanding technology such as 
GaAs (gallium arsenide), in which 
fabrication process parameters, 
particularly threshold voltages, are 
difficult to control. Finally, since 
asynchronous circuits do not use a 
clock, we avoid the complexities of 
high-speed clocking schemes. 
Adapting our method to GaAs design
is an excellent demonstration
of the method’s advantages. Thus, 
we decided to port the asynchro- 
nous microprocessor we designed 
in CMOS in 1989 to GaAs.2 We would 
demonstrate the method’s portability
across vastly different technologies, as
well as its efficiency and robustness.

JOSÉ A. TIERNO
ALAIN J. MARTIN
DRAZEN BORKOVIC
TAK KWAN LEE
California Institute of Technology

The authors describe how they ported an asynchronous microprocessor previously

SUMMER 1994
0740-7475/94/$04.00 © 1994 IEEE

With an electron mobility about six
times that of silicon at room temperature,
and with a lower parasitic capacitance
due to its semi-insulating substrate, GaAs is
potentially faster than silicon. Until recently,
however, GaAs was not available
to the VLSI community at large because
of inherent fabrication difficulties. These
difficulties have largely been overcome.
Several foundries now offer
GaAs fabrication lines under conditions
similar to CMOS fabrication,
with chip size limited to about 
100,000 transistors. In particular, 
Vitesse Semiconductors offers fab- 
rication through MOSIS (Metal 
Oxide Semiconductor Implemen- 
tation System) to the United States 
academic community. 
Currently, the transistor of choice 
for GaAs digital VLSI circuits is the 
MESFET (metal semiconductor field- 
effect transistor). With no oxide to 
insulate a MESFET gate from source
and drain, the logic families avail- 
able in GaAs are much less attractive 
than in CMOS or even NMOS. DCFL 
(direct coupled field-effect transistor
logic) has been adapted to GaAs,
but it has greatly reduced noise 
margins and restricted fan-in and 
fan-out. With no complementary 
transistor available, the logic is ra- 
tioed. As a result, GaAs loses a con- 
siderable fraction of its speed 
advantage due to the complexity of the 
available logic families. To bypass these 
limitations, we designed several new cir- 
cuits and adapted a number of old cir- 
cuits for use in the microprocessor. 
GaAs MICROPROCESSOR
IMEM  ≡ *[ ID!imem[pc] ]
FETCH ≡ *[ PCI1; ID?i; PCI2; E1!i; … ]
PCADD ≡ ( *[ [ P̅C̅I̅1̅ → PCI1; y := pc + 1; PCI2; pc := y
             [] P̅C̅A̅1̅ → PCA1; y := pc + offset; PCA2; pc := y
             [] X̅p̅c̅ → X!pc • Xpc
             [] Y̅p̅c̅ → Y?pc • Ypc
             ] ]
        ∥ *[ [ X̅o̅f̅ → X!offset • Xof ] ]
        )
EXEC  ≡ *[ E1?j;
           [ alu(j.op)   → E2; Xs • Ys • AC!j.op • ZAs • P
           [] ld(j.op)   → E2; ZMs • Ys • MCL
           [] st(j.op)   → E2; Xs • Ys • MCS
           [] adi(j.op)  → OF; E2; Xof • Ys • AC!add • ZAs • P
           [] stpc(j.op) → Xpc • Ys • AC!add • ZAs • P; E2
           [] jmp(j.op)  → Ypc • Ys; E2
           [] brch(j.op) → OF; F?f;
                           [ cond(f, j.cc)  → PCA1; PCA2
                           [] ¬cond(f, j.cc) → skip
                           ]; E2
           ]
         ]
Figure 1. CHP description of the microprocessor.
The microprocessor 
The microprocessor, a 16-bit, pipe- 
lined RISC (reduced instruction-set 
computer), is a modified version of the 
1989 CMOS design. Instructions issue in 
order but may complete out of order. 
The microprocessor has 16 general-purpose
registers with four buses, two
for read and two for write. Registers
have individual locks to solve read-after- 
write and write-after-write conflicts. 
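
The locking discipline can be pictured with a small software model. The sketch below is an illustration only, not the authors' circuit: the class and method names are hypothetical, and each lock is modelled as one Boolean per register that is set when a result is reserved for the register and cleared when the write arrives.

```python
class LockedRegisterFile:
    """Toy model of a register file with one lock bit per register (hypothetical API)."""

    def __init__(self, n_regs=16):
        self.values = [0] * n_regs
        self.locked = [False] * n_regs  # one lock bit per register

    def can_read(self, k):
        # A read of a locked register must wait (read-after-write conflict).
        return not self.locked[k]

    def reserve_write(self, k):
        # A second reservation of a locked register must wait (write-after-write).
        if self.locked[k]:
            return False
        self.locked[k] = True
        return True

    def complete_write(self, k, value):
        # The result arrives, possibly out of order with respect to other units.
        assert self.locked[k], "write completed without a reservation"
        self.values[k] = value
        self.locked[k] = False

    def read(self, k):
        if not self.can_read(k):
            raise RuntimeError("register %d is locked" % k)
        return self.values[k]
```

In this model an instruction that names register 3 as its destination calls `reserve_write(3)` when it issues; any later instruction that reads or writes register 3 is held back until `complete_write(3, value)` releases the lock.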
In addition to the ALU, the program 
counter (PC) unit contains one adder 
for relative branching and increment- 
ing the PC register. For simplicity, we 
omitted the CMOS microprocessor’s 
memory address adder, thus reducing 
the number of required buses from five 
to four and making the data path small- 
er. Other modifications include a re- 
vised pipelining of the ALU unit and a 
revised sequencing circuit to equalize 
delays in all pipeline stages. 
We initially specify the microprocessor
as a set of concurrent processes. We
later transform the text of these processes,
shown in Figure 1, into a signal transition
language, or handshaking expansion,
and then we compile it into
production rules (the gate netlist) for
the circuit.

ALU    ≡ ( *[ [ A̅C̅ → AC?op • X?x • Y?y;
                     (z,f) := aluf(x,y,op,f); B
              [] F̅ → F!f
              ] ]
         ∥ *[ P; V ]
         ∥ *[ B; ZA!z • V ]
         )
MU     ≡ *[ [ M̅C̅L̅ → Y?ma • MCL; MDL?w; ZM!w
            [] M̅C̅S̅ → X?w • Y?ma • MCS; MDS!w
            ] ]
DMEM   ≡ *[ [ M̅D̅L̅ → MDL!dmem[ma]
            [] M̅D̅S̅ → MDS?dmem[ma]
            ] ]
REG[k] ≡ ( *[ [ ¬bk ∧ k = j.x ∧ X̅s̅ → X!r • Xs ] ]
         ∥ *[ [ ¬bk ∧ k = j.y ∧ Y̅s̅ → Y!r • Ys ] ]
         ∥ *[ [ ¬bk ∧ k = j.z ∧ Z̅A̅s̅ → bk↑; ZAs; ZA?r; bk↓ ] ]
         ∥ *[ [ ¬bk ∧ k = j.z ∧ Z̅M̅s̅ → bk↑; ZMs; ZM?r; bk↓ ] ]
         )
Figure 1 (continued).

The high-level specification of Figure 
1 shows in detail how the different units 
interact. (The language used, Com- 
municating Hardware Processes, or 
CHP, is similar to Hoare’s Commun- 
icating Sequential Processes, or CSP.3,4) 
Process FETCH fetches instructions 
from the instruction memory and trans- 
mits them to process EXEC, which de- 
codes them. Process PCADD updates 
the address of the next instruction con- 
currently with the instruction fetch and 
controls the offset register. The execu- 
tion of an ALU instruction by process 
ALU can overlap the execution of a 
memory instruction by process MU. 
EXEC executes the jump and branch in- 
structions. The ALU executes stpc as the 
instruction “add the contents of register 
pc to register y and store it in register z.”
The process array REG[k] implements
the register file, each register having a 
lock. PCADD contains its own adder. 
Processes IMEM and DMEM describe 
the instruction memory and the data 
memory, respectively. 
These concurrent processes syn- 
chronize by means of communication 
commands on channels. A restricted 
form of shared variables is allowed. 
Figure 2 shows the structure of process- 
es and channels. 
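
As a rough illustration of what a handshaking expansion looks like, the sketch below replaces each communication on one channel by a four-phase request/acknowledge sequence. The four-phase protocol and the wire names `req` and `ack` are assumptions chosen for illustration; the article does not state which expansion is used for each channel.

```python
def four_phase_handshake(n_transfers):
    """Yield (wire, new_value) transitions for n transfers on one channel."""
    for _ in range(n_transfers):
        yield ("req", 1)  # active side starts the communication
        yield ("ack", 1)  # passive side acknowledges
        yield ("req", 0)  # active side returns to neutral
        yield ("ack", 0)  # passive side returns to neutral


def replay(transitions):
    """Apply transitions to the two wires and record the (req, ack) state after each."""
    state = {"req": 0, "ack": 0}
    trace = []
    for wire, value in transitions:
        # Every transition must actually change the wire it names.
        assert state[wire] != value, "each transition must change the wire"
        state[wire] = value
        trace.append((state["req"], state["ack"]))
    return trace
```

Each transfer visits the four states (1,0), (1,1), (0,1), (0,0) in that order, so both wires are back at their neutral value before the next communication starts.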
GaAs and the MESFET 
The MESFET is the best-suited tran- 
sistor for GaAs VLSI applications. It is the 
easiest to manufacture, provides the 
highest density (about 100,000 transis- 
tors on a chip), and, for clock frequen- 
cies of more than 200 MHz, has a better 
power-delay product than CMOS. Com- 
pared with ECL (emitter-coupled logic),
GaAs is slightly faster, uses far less pow- 
er, and has higher circuit density, for a 
similar cost. An important application 
of GaAs is to replace ECL parts, such as 
fast RAMs, and other LSI circuits.
IEEE DESIGN & TEST OF COMPUTERS
Figure 2. Process and channel structure (blocks labeled in the figure include PCADD, Registers, and ALU).

MESFETs are junction FETs. The met- 
al gate forms a Schottky junction with 
the transistor channel. With no insula- 
tion between gate and channel, the 
Schottky junction creates a diode from 
gate to source and from gate to drain. 
This diode imposes severe limitations 
on the type of gates that digital circuits 
can employ. At around 0.6V voltage differential
between gate and drain (or
gate and source), the gate-to-drain (or
gate-to-source) diode starts conducting
a significant amount of current. In a configuration
like a DCFL inverter, if the input
voltage goes higher than 0.7V, the
gate-to-drain voltage (that is, the gate-to-output
voltage) becomes larger than
0.6V. Then current can flow from the input
to the output, causing the characteristic
DCFL transfer curve shown in
Figure 3. Observe that the output voltage
starts to increase once the input voltage
goes beyond 0.6V.
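
The sharp onset of gate conduction can be illustrated with the ideal (Shockley) diode equation. This is only a sketch: the saturation current, ideality factor, and thermal voltage below are illustrative assumptions, not parameters of the Vitesse process, and a real Schottky gate diode also has series resistance that this model ignores.

```python
import math


def diode_current(v, i_s=1e-12, n=1.2, v_t=0.0259):
    """Ideal diode current in amperes for a forward voltage v in volts.

    i_s (saturation current), n (ideality factor), and v_t (thermal voltage)
    are assumed values chosen only to show the exponential shape.
    """
    return i_s * (math.exp(v / (n * v_t)) - 1.0)


# Print the modelled gate current over the DCFL input range.
for v in (0.1, 0.3, 0.5, 0.6, 0.7):
    print("V = %.1f V  ->  I = %.3e A" % (v, diode_current(v)))
```

Because the current grows exponentially with voltage, the diode is negligible at the logic-low level and dominant a few hundred millivolts higher, which is why the input of a DCFL gate is effectively clamped.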
Hole mobility is low in GaAs (10 
times less than electron mobility5), mak- 
ing p-type FETs too slow relative to n- 
type. Therefore, complementary logic 
is not practical. As in NMOS, n-type tran- 
sistors come in two flavors: enhance- 
ment mode and depletion mode. 
E-mode transistors have a positive 
threshold voltage. D-mode transistors 
have a negative threshold voltage; that 
is, they require a negative gate-to-source
voltage to be cut off.
Direct coupled FET logic. DCFL, 
the most widely used logic family in 
GaAs VLSI, is analogous to its NMOS 
counterpart. It is simple, uses little power,
and has the highest density of all.
Figure 4a shows a DCFL NOR gate. 
Signals have a restricted voltage 
swing because of the input-to-ground 
diode at the input of DCFL gates. Logic-low
is about 0.1V, while logic-high is
about 0.6V; this drastically reduces the 
noise margins of DCFL gates. In DCFL 
NAND gates, noise margins become 
critical. The gate-to-source voltage in 
the top transistor of the pull-down chain 
is even closer to the transistor's threshold
voltage, making the noise margin so
small that many designers avoid using
NAND gates altogether.

Figure 3. Input-output characteristic of a DCFL inverter driving a DCFL inverter (simulated).

As in NMOS, signals often must be 
buffered. A superbuffer configuration, 
shown in Figures 4b and 4c, increases 
the noise margins by lowering the logic- 
low voltage, since the output stage is 
not ratioed. 
Power consumption. In DCFL 
GaAs, most of the power dissipation 
comes from static currents. DCFL gates 
take current continuously, with only a 
small part used to charge and discharge 
capacitors. We obtain a good power- 
delay product in GaAs because transis- 
tors are faster and gate capacitances are 
lower than in CMOS, and the reduced 
delay makes up for the extra power. 
To optimize the power-delay prod- 
uct, we design circuits with little re- 
dundancy, so that most of the circuit is 
active at any time. For example, we prefer
a pipelined ALU to two nonpipelined
ALUs in parallel because the energy
cost of the first is half that of the
second, virtually independent of usage.

Figure 4. DCFL NOR gate (a); superbuffered NAND gate (b); squeeze buffer (c).

Figure 5. Multi-input C-element: transistor schematic (a); logical diagram (b).

It is possible to build dynamic GaAs 
circuits with very low power consumption.6,7
However, the practical circuits
demonstrated so far have very low in- 
tegration levels. 
GaAs technology mapping 
Although mapping the microprocessor
design into DCFL gates is tempting,
the specific requirements of asynchro- 
nous circuits make that choice imprac- 
tical. DCFL has low noise margins, and 
the gates one can use in practice are re- 
duced to NOR gates with a relatively 
small number of inputs. Asynchronous 
operation requires monotone signal 
transitions and is sensitive to noise and 
charge sharing. Also, the compilation 
process sometimes generates complex 
gates with a large number of inputs. 
Delay-insensitive decomposition of 
these gates into smaller gates is a deli- 
cate task. 
We need circuits that allow a direct 
synthesis of complex gates up to a rea- 
sonable size. To solve the problem of 
large operators, we have investigated 
several alternative logic configurations. 
We chose different solutions for the 
data path and the control circuits. 
Data path. The data path includes 
three different units: the ALU, the PC 
unit for manipulations of the program 
counter, and the memory unit for exe- 
cution of load and store operations. 
These circuits consist of combinatorial 
cells with no feedback loops, replicated 
a number of times. Data path delay is 
determined primarily by carry-chain,
control signal, and bus delays.8 With no
feedback loops to amplify noise, noise 
problems are less severe. 
We must optimize the data path for 
size and power. Most signals in the data 
path are local, with the exception of 
control lines and buses, and delays and 
power depend directly on the data 
path’s physical dimensions. In this design,
70% of all power is spent in the data
path, 15% in the register file, and 15% in 
the control logic (pad driver power is 
excluded from this computation). 
To satisfy size and power constraints, 
all data path gates are DCFL, except 
NAND gates, buffers, and completion 
detection circuits. DCFL gates are small 
and power efficient. The data path con- 
tains only the simple gates available in 
DCFL. 
Figure 4b shows the implementation 
of NAND gates.5 The superbuffer stage
allows the output low voltage to be low 
enough for other DCFL gates because 
the pull-down does not have to fight a 
passive pull-up. Therefore, noise mar- 
gins increase considerably, with a small 
penalty in area and power. 
We use superbuffers to buffer bus 
and control signals. To improve perfor- 
mance and noise margin characteris- 
tics, we added a feedback transistor, 
creating a squeeze buffer (see Figure 
4c). Squeeze buffers (developed by 
Richard B. Brown of the University of 
Michigan) allow the use of a stronger 
pull-up transistor; the feedback transis- 
tor limits the output high voltage. 
Completion detection takes place in 
sequence with the calculations per- 
formed by the data path and affects 
performance directly. Generating the 
completion signal efficiently is critical. 
To generate completion signals from the 
data path, we use C-elements with a 
large number of inputs. They can be 
built from smaller C-elements connected
in a tree,4 or, as in our microprocessor,
as a single logic gate (see Figure 5).9
Though we could implement a com- 
pletion tree with DCFL NOR gates, it 
would be significantly slower and big- 
ger than that of Figure 5a. 
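
The behavior that a multi-input C-element contributes to completion detection can be stated in a few lines. The following is a behavioral sketch only (the class name and interface are hypothetical); it says nothing about the transistor-level circuit of Figure 5.

```python
class CElement:
    """Behavioral model of an n-input Muller C-element."""

    def __init__(self, n_inputs, initial=0):
        self.n_inputs = n_inputs
        self.out = initial

    def update(self, inputs):
        assert len(inputs) == self.n_inputs
        if all(inputs):
            self.out = 1  # every input high: output rises
        elif not any(inputs):
            self.out = 0  # every input low: output falls
        # Otherwise the output holds its previous value.
        return self.out
```

The output therefore rises only after the last input has risen and falls only after the last input has fallen, which is the property a completion signal needs.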
Control logic. Control logic takes 
care of the sequencing of actions in the 
microprocessor. Compilation assigns 
each control signal a set of production 
rules of the form 
G → z↑
H → z↓
where G and H are Boolean expres- 
sions in terms of the other signals. G 
and H do not have to be complemen- 
tary. In fact, most operators in the control
are state holding; that is, G ∨ H does
not hold. There are different direct implementations
of these production
rules. One, called source follower FET 
logic, is described in Tierno’s paper, 
which presents a systematic way of 
generating any operator described by 
production rules.9 We applied this
method in the design of our first GaAs 
microprocessor. However, it resulted 
in a circuit with a large power con- 
sumption (4W) and modest perfor- 
mance (70 MIPS). 
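
To make the meaning of a production-rule pair G → z↑, H → z↓ concrete, the sketch below evaluates one step of such an operator in software. The function names are hypothetical, and the two-input C-element is used only as a familiar example of a state-holding operator.

```python
def next_value(z, g, h):
    """Next value of z under the rules  g -> z up,  h -> z down."""
    # The two guards must never hold together, or the output would be
    # driven in both directions at once.
    assert not (g and h), "interfering production rules"
    if g:
        return 1
    if h:
        return 0
    return z  # neither guard holds: the operator keeps its state


def c_element_step(z, a, b):
    # Rules of a two-input C-element:  a AND b -> z up,  NOT a AND NOT b -> z down
    return next_value(z, a and b, (not a) and (not b))
```

When the inputs disagree, neither guard of the C-element holds and the previous output is kept, which is exactly the case where G ∨ H is false.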
For our second GaAs microprocessor,
we used a different approach. Each
signal has a dual-rail encoding, always
generating both a positive and a negative
sense. An advantage of this approach
is that no inverters are necessary
to generate the Boolean expressions for
the production rules. A drawback is that
combinational gates require extra circuitry
to generate dual-rail outputs, as
do signals coming in from the data path.

Figure 6. Dual-rail implementation of control signals: transistor schematic (a); logical diagram (b); production rules (c).

Figure 7. Example of a NOR-NOR PLA implemented with source followers.

Figure 8. Register cell.

Figure 6 shows how a specific set of 
production rules is implemented in 
dual-rail. Note the feedback transistor 
on the outputs of the individual NOR 
gates. These transistors have the same 
function as the one on the squeeze 
buffer, allowing the use of much 
stronger pull-up transistors. 
Programmable logic arrays. The 
microprocessor uses PLAs to evaluate 
condition codes for the branch in- 
structions and kill-propagate-generate 
codes for the ALU. The microproces- 
sor's power consumption is relatively 
high (2W), and the chips run hot 
(around 100°C). At this temperature, 
subthreshold currents of the pull-down 
transistors may be strong enough to 
overpower the pull-ups. NOR gates with 
more than six inputs are impractical because
the off current of six transistors is
of the same order of magnitude as the 
on current of one transistor. Therefore, 
we cannot use static DCFL PLAs. 
We implemented the NOR planes with 
source followers, which can be turned
off more effectively than the corresponding
DCFL structure (see Figure 7).
We pay a penalty in speed and power, 
but min-terms with up to 10 inputs are realizable.
The internal signals in the NOR
plane can switch rail to rail, giving much
improved noise margins; the diodes act
as level shifters and help cut off the input
transistors. Also, the width ratio between
the pull-up and pull-down transistors in 
the source follower is close to 1, and the 
subthreshold current of the pull-down 
can better balance the pull-up. 
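
The logic function of a NOR-NOR PLA, independent of how the planes are built electrically, can be sketched as follows. The data layout (lists of literals and row indices) and the XOR example are assumptions made for illustration; they are not the condition-code or kill-propagate-generate tables of the microprocessor.

```python
def nor(bits):
    """Output of a NOR gate: 1 only when every input is 0."""
    return 0 if any(bits) else 1


def nor_nor_pla(inputs, first_plane, second_plane):
    """Evaluate a two-level NOR-NOR array.

    inputs:       dict mapping signal name -> 0 or 1
    first_plane:  list of rows; each row is a list of (name, inverted) literals
    second_plane: list of outputs; each is a list of first-plane row indices
    """
    terms = [nor([inputs[name] ^ inv for name, inv in row]) for row in first_plane]
    return [nor([terms[i] for i in row]) for row in second_plane]


# Example: XOR of a and b as NOR( NOR(a, b), NOR(not a, not b) ).
XOR_FIRST = [[("a", 0), ("b", 0)], [("a", 1), ("b", 1)]]
XOR_SECOND = [[0, 1]]
```

The complemented literals in the first plane come for free when, as in the control logic described above, every signal is available in both senses.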
Register file. The register file has 16 
registers, each 16 bits wide, and four
ports, two for reading and two for writing
into the registers. Register 0 is hardwired
to be always read as zero. Each
register bit has a total of 12 transistors, 
four for the flip-flop and two extra for 
each port (see Figure 8). All ports are 
dual-rail; that is, they generate data and 
inverted data. Between reads and writes, 
the buses are precharged to a neutral 
state to reset the completion circuits and 
to prepare for the next transfer. 
Read ports use a dual-ended sense
amplifier (see Figure 9). This
sense amp detects a small difference between
the true and false buses and drives
them strongly in opposite directions,
using transistors T5, T4, and T1. To work
properly, the circuitry must select the
register being read some time prior to
applying the sense signal. To this effect,
we derive the sense signal from an
OR of all select signals for the given port.
When the sense signal is low, transistors T8 and T3
(T7 and T2) precharge the true and false buses
to a value determined by the ratio between
T8 and T3 (T7 and T2). Transistor
T6 further ensures the circuit’s symmetry.
The buses are buffered before going
into the data path, to isolate the
sense amplifier and to restore the logic
levels of the data signals.

Figure 9. Sense amplifier and precharge circuit for register file.

The write ports generate a comple- 
tion signal indicating that the write is 
done or that the buses have been pre- 
charged. We assume that the write is fin- 
ished when all bits in the bus have valid 
data and one of the registers is selected. 
Likewise, the precharge is finished when 
all bits in the bus have been precharged 
and no register is selected. To generate 
the completion signal, we use a 17-input
C-element, designed as in Figure 5.
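
The two completion conditions above can be modelled in software as follows. One plausible reading of the 17 inputs is 16 per-bit validity signals plus one "a register is selected" signal; that breakdown is an assumption made here, as are the class name and the rail encoding (a bit is valid when exactly one of its two rails is high, and neutral when both are low).

```python
class WriteCompletion:
    """Behavioral model of completion detection on a dual-rail write bus."""

    def __init__(self, width=16):
        self.width = width
        self.done = 0

    def update(self, bus, any_selected):
        """bus is a list of (true_rail, false_rail) pairs, one per bit."""
        assert len(bus) == self.width
        # One flag per bit (valid data present) plus one for the select signal.
        flags = [int(t != f) for t, f in bus] + [int(any_selected)]
        if all(flags):
            self.done = 1  # all bits valid and a register selected: write done
        elif not any(flags):
            self.done = 0  # all bits neutral and nothing selected: precharge done
        # Any mixed situation leaves the completion signal unchanged.
        return self.done
```

As with any C-element, the signal does not change while the bus is only partly valid or partly precharged, so it cannot announce completion early.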
Results
Using these techniques, we have de- 
signed several circuits and fabricated 
and tested them on the HGaAs II and
HGaAs III processes offered by Vitesse
Semiconductors. Among them are two
small RAMs, a register file, and two
microprocessors. 
Asynchronous static memories. 
We designed and fabricated two types 
of static memories in GaAs. The first is a 
16-word × 8-bit, dual-ported memory. Its
purpose is to provide a small amount of
fast memory so that the microprocessor
can run test programs at full speed. Of 
30 bonded devices, 29 were functional. 
The chip’s access time is 5 ns, and it dis- 
sipates 500 mW at 2.2V. 
The second SRAM has 64 words of 4 
bits. We designed it in collaboration 
with H.P. Hofstee10 as an intermediate
step toward a larger memory to serve as 
a cache for the microprocessor. All 30 
bonded devices received were func- 
tional. The access time is 3 ns, includ- 
ing pad delays, and the chip dissipates 
700 mW at 2.3V. 
We designed the 64x4 memory after 
we designed the first microprocessor, 
and we incorporated several improve- 
ments derived from our experience 
with the earlier design. We also care- 
fully optimized the circuits for high 
speed and low power consumption. 
The performance we obtained indi- 
cates that the improvements we envi- 
sion for the next microprocessor 
generation should be attainable. 
Microprocessors. The first micro- 
processor design uses the circuits de- 
scribed by Tierno.9 Our main concern
was to get around parameter variation 
problems and noise margin considera- 
tions. We overcame these problems, 
but at the expense of power and per- 
formance. At 70 MIPS, the micropro- 
cessor consumes 4W. 
The second microprocessor design 
incorporated the new ideas and circuits 
presented in this article. All of those 
circuits were fabricated and tested 
successfully. 
We simulated the second design ex- 
tensively with Hspice. The expected 
performance was about 200 MIPS with 
a dissipation of 2W. Power and speed 
predictions, using the Hspice models, 
have been very accurate so far. How- 
ever, the measured performance is only 
100 MIPS. The causes are still under in- 
vestigation. We have found some evi- 
dence of fabrication problems and 
underestimation of the parasitic ca- 
pacitances as extracted by the Magic 
layout program. 
Another factor affecting performance 
is pad delay. So far, we have used ECL 
levels on the outside of the chip, to en- 
able it to interface to standard parts and 
simplify prototyping. Pad delays are on 
the order of 1 ns, mostly spent in level 
conversion. The pad frame also uses a 
considerable amount of power, close
to 2W under worst-case conditions for 
the microprocessor. Matched pad dri- 
vers and receivers in a system com- 
posed exclusively of GaAs parts would 
greatly reduce delay and power use. 
This would certainly be a requirement 
in the interface with cache memory. 
IN THE COURSE OF THIS PROJECT, we 
designed several different GaAs cir- 
cuits. To overcome the limitations of 
GaAs, most of them had very strict re- 
quirements. We found no general so- 
lution to the problem of synthesizing 
all logic circuits in a design as big as a 
microprocessor. Instead, we devel- 
oped specific solutions for implement- 
ing completion trees, control circuits, 
PLAs, registers, and so on. 
The first GaAs microprocessor, 
though disappointing in performance, 
gave us invaluable experience in veri- 
fying and testing GaAs circuits. Together 
with the RAM designs, it helped us de- 
sign a second microprocessor with con- 
siderable performance improvements. 
Porting the design of the original 
CMOS microprocessor was almost as 
easy as we expected. A few changes of 
the original CMOS design were neces- 
sary because of the complexity of reg- 
ister cells in the first GaAs version. We 
carried over these changes to the sec- 
ond microprocessor to speed up the re- 
design. Overall, the performance of the 
new microprocessor is satisfactory. At 
50 MIPS/W, it offers remarkable speed 
for the power consumption. 
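
The efficiency figure follows directly from the measured numbers quoted earlier (100 MIPS at 2W for the second design, 70 MIPS at 4W for the first). The short calculation below reproduces it and also expresses it as energy per instruction; the function names are hypothetical, and pad-frame power is left out, on the assumption that the 2W and 4W figures refer to the core alone.

```python
def mips_per_watt(mips, watts):
    """Throughput per unit power, in MIPS/W."""
    return mips / watts


def energy_per_instruction_nj(mips, watts):
    """Average energy per instruction in nanojoules.

    watts / (mips * 1e6 instructions per second) is in joules;
    multiplying by 1e9 converts to nJ, which simplifies to the factor 1000.
    """
    return watts / mips * 1000.0


for name, mips, watts in (("first design", 70, 4), ("second design", 100, 2)):
    print("%s: %.1f MIPS/W, %.1f nJ per instruction"
          % (name, mips_per_watt(mips, watts), energy_per_instruction_nj(mips, watts)))
```

On these figures the second design is nearly three times as efficient as the first: 50 MIPS/W against 17.5 MIPS/W.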
In general, we are satisfied with our 
solutions to the problem of designing 
asynchronous circuits in GaAs, al- 
though we were expecting better per- 
formance. The MESFET is far from an 
ideal switch; as a result, gates with good 
gain, noise, delay, and power charac- 
teristics are quite complicated. To make 
those gates more reliable, we must pay 
an overhead in delay and power, off- 
setting the raw performance gain of the 
GaAs MESFET over the silicon MOSFET. 
More importantly, integration levels 
in GaAs are very limited compared to 
CMOS. An important source of perfor- 
mance in digital circuits is parallelism, 
which can more than make up for the 
difference in raw speed. Reduced integration
levels greatly penalize circuits
that can make efficient use of paral- 
lelism, such as microprocessors with 
wide data paths, or memories. 
Of course, some problems are inher- 
ently sequential, such as serial com- 
munication channels. GaAs is an ideal 
technology for a router for a high-speed
fiber optic network, or for a digital 
signal-processing chip for microwave 
circuits. For such problems, we can apply
the tools of asynchronous design
and carry over to GaAs all the advan- 
tages of asynchronous circuits. 
Acknowledgments 
We are indebted to Marcel van der Goot, 
Steve Burns, Pieter Hazewindus, H. Peter 
Hofstee, Ray Milano, and Cindy Hibbert. 
The research described in this article was 
sponsored by the Advanced Research 
Projects Agency, under ARPA order 6202, 
and monitored by the Office of Naval 
Research, under contract N00014-87-K-0745. 
References 
1. A.J. Martin, “Compiling Communicating Processes into Delay-Insensitive VLSI Circuits,” Distributed Computing, Vol. 1, No. 4, 1986, pp. 226-234.
2. A.J. Martin et al., “The Design of an Asynchronous Microprocessor,” in Advanced Research in VLSI: Proc. Decennial Caltech Conf. VLSI, C.L. Seitz, ed., MIT Press, Cambridge, Mass., 1989, pp. 351-373.
3. C.A.R. Hoare, “Communicating Sequential Processes,” Comm. ACM, Vol. 21, No. 8, 1978, pp. 666-677.
4. A.J. Martin, “Synthesis of Asynchronous VLSI Circuits,” in Formal Methods for VLSI Design, J. Staunstrup, ed., North-Holland, Amsterdam, 1990, pp. 237-283.
5. S.I. Long and S.E. Butner, Gallium Arsenide Digital Integrated Circuit Design, McGraw-Hill, New York, 1990.
6. K.R. Nary and S.I. Long, “GaAs 2-Phase Dynamic FET Logic: A Low-Power Logic Family for VLSI,” IEEE J. Solid-State Circuits, Vol. 27, No. 10, 1992, pp. 1364-1371.
7. P.S. Lassen and S.I. Long, “Ultralow-Power GaAs-MESFET MSI Circuits Using 2-Phase Dynamic FET Logic,” IEEE J. Solid-State Circuits, Vol. 28, No. 10, 1993, pp. 1038-1045.
8. A.J. Martin, “Asynchronous Data Paths and the Design of an Asynchronous Adder,” Formal Methods in System Design, Vol. 1, No. 1, 1992, pp. 117-137.
9. J.A. Tierno, Designing Asynchronous Circuits in Gallium Arsenide, master’s thesis, CSTR-92-19, California Institute of Technology, Pasadena, Calif., 1992.
10. H.P. Hofstee, “Deriving Some Asynchronous Memories,” 1991, unpublished, available from the author at California Institute of Technology, Dept. of Computer Science, Pasadena, CA 91125; or HPH@vlsi.cs.caltech.edu.

José A. Tierno received the engineering degree
in electrical engineering from the Universidad
de la Republica, Montevideo,
Uruguay, and an MSc degree in electrical en- 
gineering from the California Institute of 
Technology, where he is currently working 
toward a PhD in computer science. His research
concentrates on low-energy asynchronous
design. Other interests include
computer architecture and GaAs VLSI design.
Alain J. Martin is a professor of computer 
science at the California Institute of Tech- 
nology. He graduated from the Institut 
National Polytechnique de Grenoble. His 
research interests include concurrent and 
distributed programming and its application
to the design of VLSI circuits and highly
concurrent computing systems.
Drazen Borkovic received the MSc degree 
in electrical engineering from the California 
Institute of Technology. 
Tak Kwan (Tony) Lee received the BS de- 
gree in computer engineering and the BA 
degree in mathematics from UC San Diego. 
He received the MS degree in computer science
from the California Institute of Technology,
where he is working on his doctoral
thesis on analyzing the performance of de- 
lay-insensitive circuits. His other research 
interests include CAD tools and asynchro- 
nous arithmetic units. 
Send correspondence to José A. Tierno,
Caltech, Dept. of Computer Science, Pasadena,
CA 91125; or jat@vlsi.cs.caltech.edu.
