Critical design issues for gallium arsenide VLSI circuits. by Bushehri, Ebrahim
Middlesex University Research Repository
An open access repository of
Middlesex University research
http://eprints.mdx.ac.uk
Bushehri, Ebrahim (1992) Critical design issues for gallium arsenide 
VLSI circuits. PhD thesis, Middlesex Polytechnic. 
Submitted Version
Available from Middlesex University’s Research Repository at 
http://eprints.mdx.ac.uk/6146/
Copyright:
Middlesex University Research Repository makes the University’s research available electronically.
Copyright and moral rights to this thesis/research project are retained by the author and/or 
other copyright owners. The work is supplied on the understanding that any use for 
commercial gain is strictly forbidden. A copy may be downloaded for personal, non-
commercial, research or study without prior permission and without charge. Any use of the 
thesis/research project for private study or research must be properly acknowledged with 
reference to the work’s full bibliographic details.
This thesis/research project may not be reproduced in any format or medium, or extensive 
quotations taken from it, or its content changed in any way, without first obtaining permission 
in writing from the copyright holder(s).
If you believe that any material held in the repository infringes copyright law, please contact 
the Repository Team at Middlesex University via the following email address:
eprints@mdx.ac.uk
The item will be removed from the repository while any claim is being investigated.

CRITICAL DESIGN ISSUES FOR 
GALLIUM ARSENIDE VLSI CIRCUITS 
A thesis submitted to the Council for National Academic Awards
by 
Ebrahim Bushehri 
in partial fulfilment of the requirements for the degree of 
Doctor of Philosophy 
April 1992 
Microelectronics Centre, Middlesex Polytechnic 
INDEX
ABSTRACT
ACKNOWLEDGEMENTS
GLOSSARY
CHAPTER 1 Introduction
1.1 Review of Silicon Technology
1.2 Limitations of Silicon Technology for High Speed Applications
1.3 Gallium Arsenide as an Alternative Substrate
1.4 Current Developments and Future Trends
1.5 Scope of this Thesis
CHAPTER 2 GaAs Device Fabrication and Modelling
2.1 Suitable Devices for VLSI Implementation
2.2 GaAs MESFET Structure
2.3 Planar Processing Steps for GaAs MESFETs
2.4 Self-Aligned Gate Process Technology
2.5 GaAs MESFET Design Rules and Layer Representation
2.6 An Appropriate Device Model for GaAs VLSI
2.7 Important Effects Included in the Device Model
2.8 Interconnect Modelling
2.9 Effect of Process Variations
CHAPTER 3 MESFET Logic Families for GaAs VLSI Circuits
3.1 Types of MESFET Logic Gate
3.2 Normally-ON Logic Gates
3.3 Normally-OFF Logic Gates
3.4 Suitable Logic Gates for GaAs VLSI
3.5 First Order Design of DCFL and SDCFL Gates
3.6 Definition of Design Parameters
3.7 Detailed Analysis of DCFL and SDCFL Gates
3.8 Design of Buffering Schemes for GaAs VLSI Circuits
CHAPTER 4 Analysis of Adder Circuits for GaAs VLSI Implementation
4.1 Adder Design Approach
4.2 Types of Adder
4.3 Evaluation of Adder Circuits for GaAs VLSI
4.4 Summary of Important Points
CHAPTER 5 A High Speed GaAs Multiplier
5.1 A Suitable Multiplier for GaAs Implementation
5.2 The Algorithm
5.3 The Overall Architecture
5.4 Implementation Issues
5.5 Performance Evaluation
CHAPTER 6 A Novel Design and Layout Approach for GaAs VLSI Circuits
6.1 Architectural Decomposition of GaAs VLSI Circuits
6.2 Ring Notation for the Layout of GaAs VLSI Circuits
6.3 Important Issues in Ring Notation Layouts
6.4 Design of BLC Adders using the Ring Notation
6.5 Evaluation of the Ring Notation Adders and Multiplier Circuits
CHAPTER 7 Conclusions
7.1 Summary and Conclusions
7.2 Recommendations
Appendices
A Layer Representation and Design Rules
B Derivation of Gate Delay Formula
C Brief Description of the Design Tool
D CPS and CPW Models for the Estimation of the Inductances in the Supply Rails
E Logic and Ring Notation Diagrams of the DCFL and SDCFL BLC Adders
References
Critical Design Issues for Gallium 
Arsenide VLSI Circuits 
Abstract 
The aim of this research was to design and evaluate various Gallium Arsenide circuit elements, such as logic gates, adders and multipliers, suitable for high speed VLSI circuits. The issues addressed are logic gate design and optimisation, the evaluation of various buffering schemes and the impact of the algorithm on adder and multiplier performance for digital signal processing applications. This has led to the development of a design approach to produce high speed, low power Gallium Arsenide VLSI circuits. This is achieved by:
Evaluating the well established Direct Coupled FET Logic (DCFL) gates and proposing an alternative gate, namely the Source Follower DCFL (SDCFL), to improve the noise margin and speed.
Suggesting various buffering schemes to maintain high speed in areas where the fanout loading is high (e.g. clock drivers).
Comparing various adder types in terms of delay-power and delay-area products to arrive at a suitable architecture for Gallium Arsenide implementation and to determine the influence of the algorithm and layout approach on circuit performance. To investigate this further, a multiplier was also designed to assess the performance at higher levels of integration.
Applying a new layout approach, called the 'ring notation', to the adder and multiplier circuits in order to improve their delay-area product.
Finally, the critical factors influencing the performance of the circuits are reviewed and a number of suggestions are given to maintain reliable operation at high speed.
Acknowledgements 
I would like to express my gratitude to the following people at Middlesex 
Polytechnic. 
Professor John Butcher, my director of studies, for his valuable guidance 
and support throughout this research programme. His much needed 
comments and constructive criticisms on the draft of the key chapters are 
also greatly appreciated. 
My supervisors Mr Richard Bayford and Dr Robert Paul Camp for their 
help, advice and technical input to the project. 
Mr Paul Burn, managing director of the Integrated Circuit Design Centre 
(ICDC), for his support and encouragement throughout the project. 
My colleagues Mr Divya Pujara and Mr Majid Saber with whom I had 
many useful discussions on various aspects of the project. 
I would like to acknowledge the support and help of the following people 
at the University of Adelaide, South Australia. 
Dr Kamran Eshraghian, Head of the Centre for Gallium Arsenide VLSI
Technology, for his direct influence on many areas of the project. His 
novel idea of 'ring notation layout' methodology has formed the basis of 
the results presented in chapter 6 of this thesis. 
Mr Derek Abbott, research officer, for his guidance, especially in the 
initial stages of the project. 
Mr Andrew Beaumont-Smith for providing much of the software support 
for the design tools and the design rules for the particular GaAs process 
used in this research.
This work has been partially supported by the Sir Keith and Sir Ross
Smith Foundation of the Australian Council for Research. 
GLOSSARY 
τd Gate delay (ps)
Leff Effective channel length (µm)
W Width of the FET channel (µm)
ρ Resistivity (Ωcm)
µn, µp Electron and hole mobilities (cm²/Vs)
Vbi Schottky barrier height (V)
Vt Threshold voltage (V)
Vp Pinch-off voltage (V)
λ Channel length modulation parameter (1/V)
β Transconductance parameter (A/V²)
N Effective channel doping density (atoms/cm³)
ε = ε0·εr, where ε0 is the permittivity of free space (F/cm) and εr is the relative permittivity of GaAs (13.1)
Cgo Zero bias gate capacitance (F)
Cgd, Cgs Gate-drain and gate-source capacitances (F)
Rd, Rs Drain and source resistances (ohm)
a Effective channel implant depth (Å)
q Electron charge (C)
α Hyperbolic tangent drain multiplier (1/V)
Fc Average clocking frequency
Fi Fanin
Fo Fanout
BFL Buffered FET Logic 
SDFL Schottky Diode FET Logic 
CCFL Capacitor-Coupled FET Logic 
QFL Quasi-FET Logic 
DCFL Direct Coupled FET Logic 
SDCFL Source Follower DCFL 
RDCFL Ring notation DCFL 
RSDCFL Ring notation SDCFL 
CHAPTER 1 
INTRODUCTION 
1.1 Review of Silicon Technology
Silicon is the most widely used semiconductor material for integrated circuits. The main reasons for this choice are the ease of purification and of forming single crystals, together with device considerations such as the ease of epitaxial growth and the growth of high integrity oxide [1]. As a result, many device types have been proposed in silicon for integrated circuits. Initially the main workhorse of the IC industry was bipolar technology and, more recently, MOS technology.
MOS integrated circuit technology has progressed tremendously because of the huge demand for digital electronics applications. As shown in Table 1.1, it is now possible to fabricate integrated circuits containing 1 million or more transistors [2]. This trend (Moore's law) is likely to continue, such that by the end of the 1990s the level of complexity will probably exceed 10 million transistors per chip.
The advantages of this increased level of integration are reflected in the lower cost, higher reliability, higher speed and lower power dissipation of systems which are also extremely small and lightweight. To achieve these results there has been a systematic approach to improving the process technology, and major efforts have been directed towards solving the problems of device scaling. Apart from giving higher packing densities, smaller devices also operate at higher frequencies, fulfilling the speed requirements of state-of-the-art computer systems [3] [4] [5].
Year  Technology           No. of Trans. per Chip  Typical Products
1950  Discrete Components  1                       Junction transistors and diodes
1961  SSI                  10                      Logic gates, flip-flops
1966  MSI                  100-1000                Counters, adders, multiplexers
1971  LSI                  1000-20,000             8 bit microprocessors, ROM, RAM
1980  VLSI                 20,000-500,000          16 and 32 bit microprocessors
1985  ULSI                 > 500,000               Special processors, real time image processors
1990  GSI                  > 10,000,000            WSI
Table 1.1 Microelectronics Evolution.
1.2 Limitations of Silicon for High Speed Applications 
Super fast computers with sub-nanosecond cycle times, and multi-gigabit 
per second telecommunication and instrumentation systems are the 
driving forces behind the development of high speed VLSI circuits. The 
emphasis is on increasing the level of integration and the speed of these 
circuits to achieve the computational power required by the application 
areas mentioned above [6]. 
The principal requirements of high speed VLSI circuits are small feature size, high process yield and, most important of all, extremely low dynamic switching energy [7] [8] [9].
The origins of the first two requirements are obvious. Clearly, large numbers of gates cannot be placed on a reasonably sized chip unless the gate areas are small. For instance, if a 1 cm² chip is to accommodate 100,000 transistors, each individual transistor must occupy less than 1000 µm². The process yield must also be sufficient to produce such complex parts economically.
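The area bound above is a simple division, sketched below in Python. The sketch assumes the whole chip area is shared equally among the devices and ignores interconnect, pads and spacing, so the real per-device budget is smaller still; the function and constant names are illustrative only.

```python
# Average area available to each transistor on a chip, ignoring
# interconnect, pads and spacing (which reduce the real budget further).
UM2_PER_CM2 = 1e8  # 1 cm = 1e4 um, so 1 cm^2 = 1e8 um^2


def area_per_device_um2(chip_area_cm2: float, n_devices: int) -> float:
    """Chip area divided evenly among n_devices, in um^2."""
    return chip_area_cm2 * UM2_PER_CM2 / n_devices


print(f"{area_per_device_um2(1.0, 100_000):.0f} um^2")    # 1000 um^2
print(f"{area_per_device_um2(1.0, 1_000_000):.0f} um^2")  # 100 um^2
```

The second line shows that a tenfold increase in device count on the same 1 cm² die cuts the budget to 100 µm² per transistor.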
The dynamic switching energy, or power-delay product Pd × τd, is the minimum energy that a gate can dissipate during a clock cycle. The power dissipation of a chip with Ng gates switching at an average clocking frequency Fc will therefore be:

$$P = N_g \, F_c \, P_d \, \tau_d$$

This relation is illustrated in Figure 1.1 for a typically 'large' total input power of 2 Watts [10].
Figure 1.1 Switching energy as a function of the number of gates per chip for a practical power of 2 Watts. (Plot: dynamic switching energy (fJ) against number of gates per chip, up to 1E9.)
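The relation P = Ng Fc Pd τd can be turned around to ask how many gates a given power budget allows. The Python sketch below does this; the function names and the 1 GHz average clocking frequency are illustrative assumptions, not values taken from the thesis.

```python
# Total dynamic power of a chip, P = Ng * Fc * (Pd * tau_d), and its
# rearrangement to find the largest gate count allowed by a power budget.


def chip_power_w(n_gates: float, f_clock_hz: float, energy_j: float) -> float:
    """Total power (W) when n_gates each dissipate energy_j per cycle at f_clock_hz."""
    return n_gates * f_clock_hz * energy_j


def max_gates(power_budget_w: float, f_clock_hz: float, energy_j: float) -> float:
    """Largest number of gates that keeps total power within power_budget_w."""
    return power_budget_w / (f_clock_hz * energy_j)


# 2 W budget as in Figure 1.1; the 1 GHz clocking frequency is an assumed value.
for energy_fj in (1000.0, 100.0, 10.0):
    n = max_gates(2.0, 1e9, energy_fj * 1e-15)
    print(f"{energy_fj:6.0f} fJ per gate -> {n:,.0f} gates")
```

Under these assumptions the loop prints 2,000, 20,000 and 200,000 gates respectively: at 0.1 pJ (100 fJ) per gate a 2 W chip clocked at 1 GHz holds only about 20,000 gates, which illustrates why switching energies well below 0.1 pJ are needed.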
The requirement on dynamic switching energy for high speed VLSI is quite severe. Even allowing for the fact that the power dissipation of large chips could safely be somewhat higher than 2 Watts, dynamic switching energies of much less than 0.1 pJ appear essential for achieving practical very high speed VLSI [11]. It is therefore of critical importance to evaluate the existing technologies and choose the one with the lowest speed-power product, in order to combine high levels of integration with high speed performance.
As mentioned in section 1.1, MOS is by far the most often used technology 
for VLSI circuits and will continue to fill this role. In order to obtain high 
speed and high density MOS ICs, the device geometries need to be 
continuously scaled to smaller sizes [12]. This means that the theoretical 
and practical limits associated with the scaling of MOS circuits must be 
investigated to find the limitations of existing technologies. 
Figure 1.2 shows the gate propagation delay and power dissipation against the channel length of fabricated CMOS inverters [13] [14]. At 0.5 µm (the state-of-the-art commercial device size) and the standard power supply of 5 V, the delay is about 120 ps with a power dissipation of 1.1 mW. The speed-power product of the gate is therefore about 0.1 pJ, enabling the realisation of high speed, medium scale integrated circuits. The expected circuit performance with scaling for different technologies has also been investigated by P.A.H. Hart et al. [15]. They considered a range of devices such as ECL, I²L and MOS. Scaling benefits MOS technology the most, giving speeds higher than those of ECL and a speed-power product even lower than that of I²L. Below 1 µm gate width, a delay time of 100 ps and a power-delay product of 0.02 pJ should theoretically be possible. However, as device miniaturisation continues, second order effects on device characteristics become so significant that simple scaling of the technology ceases to be a viable approach beyond a certain geometry [16]. For example, the encroachment of the field oxide (the so-called bird's beak created during the local oxidation stage of the normal silicon process) makes the effective channel width smaller than the design size and degrades the drain current significantly. In addition, hot carriers generated by the high electric field across the channel and the drain pinch-off region cause unacceptable device instabilities unless the power supply voltage is scaled down along with the channel length. Scaling down the supply voltage, however, erodes the distinction between the logic 'low' and logic 'high' levels. For example, scaling a 2 µm technology to 0.2 µm would require the supply voltage to be lowered from 5 to 0.5 V, with a consequently narrow noise margin and high sensitivity to variations in the supply voltage.
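The 0.1 pJ figure quoted for the 0.5 µm inverter is the product of its delay and power, rounded. The Python sketch below makes the unit bookkeeping explicit and also backs out the gate power implied by the projected 100 ps, 0.02 pJ device; the function names are illustrative.

```python
# Power-delay (speed-power) product of a single gate.
# 1 ps * 1 mW = 1e-12 s * 1e-3 W = 1e-15 J = 1e-3 pJ.


def power_delay_product_pj(delay_ps: float, power_mw: float) -> float:
    """Power-delay product in pJ from a delay in ps and a power in mW."""
    return delay_ps * power_mw * 1e-3


def implied_power_mw(pdp_pj: float, delay_ps: float) -> float:
    """Gate power (mW) implied by a given power-delay product and delay."""
    return pdp_pj / delay_ps * 1e3


# 0.5 um CMOS inverter at 5 V, values quoted from Figure 1.2
print(f"{power_delay_product_pj(120, 1.1):.3f} pJ")  # 0.132 pJ
# Projected sub-micron MOS gate (100 ps, 0.02 pJ) from Hart et al.
print(f"{implied_power_mw(0.02, 100):.2f} mW")  # 0.20 mW
```

The projected device therefore needs to run at roughly a fifth of the power of the 0.5 µm inverter while also being slightly faster.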
Figure 1.2 Delay and power dissipation of scaled inverters for power supplies of 3 and 5 volts. (Plot: propagation delay and power dissipation against effective channel length Leff, 0 to 2.0 µm, for VDD = 3 V and 5 V.)
Another problem encountered in CMOS is latch-up susceptibility, which becomes a serious drawback in sub-micron geometries.
Therefore, as device geometries shrink, we are quickly reaching the limits of silicon technology for ultra high speed VLSI circuits. We are hence prompted to seek other technologies providing faster devices, which will be a prerequisite for even more sophisticated system design capabilities.
1.3 Gallium Arsenide as an Alternative Substrate 
Before assessing the suitability of GaAs as a substrate for VLSI circuits 
it is important to note that our concern is only with ultra-high speed 
applications. Then, in order to explore the potential of the technology, it 
is necessary to make a direct comparison between GaAs and silicon. First 
we concentrate on the two materials and their electrical properties, a 
summary of which is given in Table 1.2 [17]. 
Properties                                       GaAs      Silicon
Electron mobility (cm²/Vs)                       5000      800
Maximum electron drift velocity (cm/s)           2×10⁷     1×10⁷
Hole mobility (cm²/Vs)                           250       350
Energy gap (eV)                                  1.43      1.12
Type of gap                                      Direct    Indirect
Density of states in conduction band (cm⁻³)      5×10¹⁷    3×10¹⁹
Maximum resistivity (Ωcm)                        10⁹       10⁵
Minority carrier lifetime (s)                    10⁻⁸      10⁻³
Breakdown field (V/cm)                           4×10⁵     3×10⁵
Schottky barrier height (V)                      0.7-0.8   0.4-0.6
Table 1.2 Properties of GaAs and silicon at 300 K
The advantages of GaAs over silicon as a base material for ICs are [18] [19] [20]:
a) At normal doping levels the saturated drift velocities of GaAs and silicon are almost equal, at 1.4×10⁷ and 1×10⁷ cm/s respectively. However, the saturation velocity in GaAs is reached at electric fields about four times lower than in silicon.
b) Electron mobility in GaAs is six to seven times higher than in silicon. Therefore, transit times as short as 10-15 ps, corresponding to current gain-bandwidth products in the range 15-25 GHz, can be obtained for GaAs transistors with typical gate lengths of 0.5-1 µm (a three to five times improvement over silicon devices).
c) The semi-insulating property of GaAs (resistivity in the range 10⁷-10⁹ Ωcm at room temperature) is another advantage for high performance devices. It not only minimises parasitic capacitances but also allows easy electrical isolation of multiple devices on a single substrate.
d) Schottky barriers can be realised on GaAs with a large variety of metals (e.g. aluminium, platinum, titanium), leading to high quality Schottky junctions with excellent ideality factors (n less than 1.1) and fairly low reverse current densities (Js < 1 µA/cm²).
e) GaAs is more radiation resistant than silicon, due to the absence of gate oxide, and can operate over a wider temperature range (-200 to 200 °C) because of its larger band gap; and finally:
f) The direct band gap of GaAs allows efficient radiative recombination of electrons and holes, meaning that forward-biased pn junctions can be used as light emitters. Thus, efficient integration of electrical and optical functions is possible.
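Items a) and b) can be cross-checked against Table 1.2 with a first-order estimate: if the velocity-field relation were linear up to saturation, electrons would saturate at roughly E_sat ≈ v_sat/µ. The Python sketch below applies this to the mobilities of Table 1.2 and the saturated velocities quoted in item a). The linear model is an assumption made here for illustration only; real GaAs shows a velocity peak before saturation, so the numbers are indicative.

```python
# First-order estimate of the electric field at which electrons reach their
# saturated drift velocity, assuming a linear velocity-field relation:
#     E_sat ~ v_sat / mu
MATERIALS = {
    # name: (electron mobility in cm^2/Vs, saturated drift velocity in cm/s)
    "GaAs": (5000.0, 1.4e7),
    "Si": (800.0, 1.0e7),
}


def saturation_field_v_per_cm(mobility: float, v_sat: float) -> float:
    """Field (V/cm) at which a linear v = mu * E would reach v_sat."""
    return v_sat / mobility


fields = {
    name: saturation_field_v_per_cm(mu, v_sat)
    for name, (mu, v_sat) in MATERIALS.items()
}
for name, e_sat in fields.items():
    print(f"{name}: E_sat ~ {e_sat:,.0f} V/cm")

mobility_ratio = MATERIALS["GaAs"][0] / MATERIALS["Si"][0]
field_ratio = fields["Si"] / fields["GaAs"]
print(f"mobility ratio GaAs/Si: {mobility_ratio:.2f}")  # 6.25
print(f"field ratio Si/GaAs:    {field_ratio:.1f}")     # 4.5
```

The estimated field ratio of about 4.5 is consistent with the "about four times lower" of item a), and the mobility ratio of 6.25 with the "six to seven times" of item b).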
The expected higher performance of GaAs compared with silicon should be assessed not only on the basis of material properties but also in terms of the actual logic gates and integrated circuits implemented in each technology. As explained in section 1.2, the most important figure of merit for logic gates in high-speed VLSI applications is the dynamic switching energy. Figure 1.3 shows the calculated dynamic switching energy versus propagation delay for GaAs and silicon MESFETs (W = 10 µm, L = 1 µm) with a load capacitance of 30 fF [21].
Figure 1.3 Optimised switching performances of silicon and GaAs MESFETs with a load capacitance of 30fF. (Plot: dynamic switching energy (fJ) against propagation delay τd (ps), 20 to 2000 ps, for GaAs and silicon MESFETs.)
It is evident that the logic switching speed and speed-power product of the FET gate are dramatically improved in GaAs. For the same logic voltage swing, a GaAs MESFET (L = 1 µm) gives about 4-6 times higher switching speed than its silicon counterpart. With a logic voltage swing of 3.5 V, the silicon MESFET should achieve a propagation delay of 183 ps. With the same gate length, a GaAs MESFET should achieve the same delay with only a 300 mV logic swing. This is reflected in the dynamic switching energies of the gates: for the GaAs MESFET it is only about 3 fJ, whereas for the silicon MESFET it is about 150 times higher (0.45 pJ), which restricts the level of integration.
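The size of this gap follows largely from the logic swing. As a rough cross-check, assume (an approximation introduced here, not stated in the source) that the energy per switching cycle is about C_L·ΔV², neglecting the device capacitances. The Python sketch below applies this to the 30 fF load and the two swings quoted above.

```python
# Rough cross-check of the switching energies read from Figure 1.3,
# assuming the energy per switching cycle is about C_L * dV^2
# (device capacitances are neglected).
C_LOAD_F = 30e-15  # 30 fF load capacitance used in Figure 1.3


def switching_energy_j(c_load_f: float, swing_v: float) -> float:
    """Approximate energy (J) per switching cycle, taken here as C * dV^2."""
    return c_load_f * swing_v ** 2


e_gaas = switching_energy_j(C_LOAD_F, 0.3)  # 300 mV logic swing
e_si = switching_energy_j(C_LOAD_F, 3.5)    # 3.5 V logic swing
print(f"GaAs: {e_gaas * 1e15:.1f} fJ")        # 2.7 fJ
print(f"Si:   {e_si * 1e12:.2f} pJ")          # 0.37 pJ
print(f"ratio Si/GaAs: {e_si / e_gaas:.0f}")  # 136
```

The estimates (2.7 fJ and 0.37 pJ, a ratio of about 136) are of the same order as the 3 fJ, 0.45 pJ and factor of 150 read from Figure 1.3, which suggests that the quadratic dependence on logic swing accounts for most of the difference.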
Having discussed the superior performance potential of GaAs material and logic gates compared with silicon, we must also consider the performance of GaAs integrated circuits of reasonable complexity and compare them with their silicon counterparts. Tables 1.3 to 1.5 list some GaAs and Si multipliers, memories and gate arrays [22] [23] [24]. The trade-off between speed and power is evident within each technology, as is the effect of design rules. For the same device dimensions, GaAs devices perform better in terms of either power dissipation or propagation delay. The results suggest that GaAs IC technology will have a significant impact on the performance of digital signal processing systems, with system clock frequencies projected to be 2 to 5 times higher than those of present systems.
Technology              Size   Delay (ns)  Power (mW)  Comments
Si NMOS (TRW)           8x8    45          1000        2 µm design rule
Si ECL (NEC)            8x8    5           1400        2x6 µm emitter
Si NMOS (BELL)          16x16  20          1000        1.5 µm design rule
Si SOS (TOSHIBA)        16x16  27          150
Si CMOS (NEC)           16x16  45          100
GaAs DCFL (FUJITSU)     16x16  10.5        952         2 µm gate length
GaAs DCFL (TOSHIBA)     8x8    12          160
GaAs SDCFL (ROCKWELL)   8x8    5.25        2200
Table 1.3 IC technologies comparison (for multiplier circuit).
Technology                     Size (bits)  Access time (ns)  Power (mW per 1K)  Comments
Si ECL (FUJITSU)               4K           3.2               750
Si ECL (NEC)                   4K           2.3               400
Si NMOS (BELL)                 4K           5.0               100                1 µm design rule
Si CMOS (NIPPON)               1K           25.0              low                1.5 µm design rule
GaAs DCFL (FUJITSU)            1K           1.3               300                2 µm gate length
                               4K           3.0               175
GaAs DCFL (NIPPON)             1K           2.0               459                1 µm gate length
                                            6.0               38
GaAs DCFL (McDONNELL DOUGLAS)  256          5.0               35
GaAs HEMT (FUJITSU)            1K           3.4               290                JFET technology
                                            0.9               360
Table 1.4 IC technologies comparison (for memory circuit).
Technology               Size (gates)  Gate delay (ps)   Power (mW/gate)
Si ECL (NIPPON)          5000          500 (average)     1.0
Si BIPOLAR (IBM)         10000         1700 loaded       0.34
                                       1400 loaded       0.57
Si SOS (TOSHIBA)         8000          870 loaded        0.45
Si ECL (COMMERCIAL)      170-1500      3500-1500         29-0.85
GaAs DCFL (TOSHIBA)      1000          300 loaded        0.2
GaAs DCFL (TEKTRONIX)    1224          100 (fo=1)*       0.25
                                       200-250 (fo=3)
GaAs SDCFL (HONEYWELL)   432           250 (r.o)+        3.0
GaAs SDCFL (LOCKHEED)    320           184 (r.o)         > 1.0 (Est.)
* (fo=N) denotes a gate with a fanout of N.
+ (r.o) denotes results obtained from ring oscillators.
Table 1.5 IC technologies comparison (for gate array).
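Table 1.5 lists delay and power separately, which makes rows hard to compare directly. Multiplying the two gives each gate's power-delay product, the figure of merit used in section 1.2. The Python sketch below does this for four rows; the choice of rows and the names are ours, and since the delays were measured under different conditions (loaded, fanout of 1, average) the comparison is only indicative.

```python
# Power-delay product per gate for selected rows of Table 1.5.
# 1 ps * 1 mW = 1e-15 J = 1 fJ, so delay_ps * power_mw is already in fJ.
GATE_ARRAYS = {
    "Si ECL (NIPPON)": (500.0, 1.0),
    "Si SOS (TOSHIBA)": (870.0, 0.45),
    "GaAs DCFL (TOSHIBA)": (300.0, 0.2),
    "GaAs DCFL (TEKTRONIX), fo=1": (100.0, 0.25),
}


def pdp_fj(delay_ps: float, power_mw: float) -> float:
    """Power-delay product in fJ."""
    return delay_ps * power_mw


ranked = sorted(GATE_ARRAYS.items(), key=lambda item: pdp_fj(*item[1]))
for name, (delay_ps, power_mw) in ranked:
    print(f"{name:30s} {pdp_fj(delay_ps, power_mw):7.1f} fJ")
```

The two GaAs DCFL entries come out at roughly 25 and 60 fJ, below the 0.1 pJ (100 fJ) level identified in section 1.2, whereas the two silicon entries are several hundred fJ.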
1.4 Current Developments and Future Trends 
The maturity of GaAs processing technology for digital integrated circuits in 1991 is equivalent to that of silicon technology in the mid-1970s. However, GaAs processing technology is improving at three times the rate seen in silicon processing during the 1970s and early 1980s [25]. The turning point came in 1986 with the development of a new method of manufacturing digital GaAs ICs. The process employs the usual metal-semiconductor field-effect transistors (MESFETs), except that a refractory metal replaces gold in the self-aligned MESFET gates [26]. This innovation not only eases manufacture but also permits the use of a logic family which trades off some of gallium arsenide's high speed for lower power consumption. The result is a high yield, relatively low cost solution to the needs of very high speed digital integrated circuits.
The market for digital GaAs ICs is growing very fast. Figure 1.4 shows the perceived European GaAs IC market in 1984, 1989 and 1994 [27]. It shows that the leading sector until the late 1980s was analogue MMICs, but that both digital and optoelectronic ICs will be employed increasingly in systems. By the end of 1994, most of the European market is expected to be devoted to GaAs digital applications. The same progress is happening world-wide, with most of the newly available VLSI products in GaAs being application specific integrated circuits (ASICs).
The most dramatic impact on the computer market will occur when GaAs microprocessors begin to appear. These chips will bring the power of today's supercomputers to the desktop workstation. Because of their relatively low power dissipation, clock frequencies in excess of 250 MHz could be accommodated in an office environment enclosure containing only a fan for cooling [28]. In sharp contrast, today's supercomputers require exotic liquid or refrigerated-air cooling.
Figure 1.4 Market sectors for GaAs ICs in Europe from 1984 to 1994. (Total market: $3M in 1984, $140M in 1989 and $1.876B in 1994; opto-electronics shown as a separate sector.)
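The phrase "growing very fast" can be quantified from the market totals in Figure 1.4 as a compound annual growth rate. The Python sketch below computes it for each five-year interval; the function name is illustrative, and the 1994 total was itself a forecast at the time of writing.

```python
# Compound annual growth rate implied by the market totals in Figure 1.4.
MARKET_USD = {1984: 3e6, 1989: 140e6, 1994: 1.876e9}


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Average yearly growth rate, e.g. 0.5 means +50 % per year."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


years = sorted(MARKET_USD)
for y0, y1 in zip(years, years[1:]):
    rate = cagr(MARKET_USD[y0], MARKET_USD[y1], y1 - y0)
    print(f"{y0}-{y1}: {rate:.0%} per year")
```

The implied growth is a little over 100 % per year for 1984 to 1989, easing to roughly 70 % per year for the forecast period to 1994.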
1.5 Scope of this Thesis 
This chapter has shown the superior performance of digital GaAs circuits 
in terms of speed and power dissipation and has predicted an ever 
growing use of this technology for high speed digital applications. 
The ultimate success of GaAs as a base for digital integrated circuits 
depends on various factors, the most important of which are the process 
and design issues. 
The process maturity of GaAs is reaching the stage where the implementation of true VLSI circuits (more than 20,000 transistors) is possible. This has been brought about by constant improvement in the preparation of defect-free crystals as well as in the production of devices with very small parameter variations. At such levels of integration, a design approach must be developed to ensure reliable operation whilst maintaining the high speed and low power dissipation offered by the technology.
The aim of this thesis is to identify the critical design issues, ranging from the optimisation of basic gates to the impact of the algorithms and overall architecture on the performance of GaAs VLSI circuits. This is achieved by designing a range of test circuits, such as logic gates, buffers, storage elements, adders and multipliers, based on existing design ideas, to identify potential problem areas. The data from this design exercise are then used to develop novel techniques to improve the performance of GaAs circuits at high levels of integration. Although the designs are primarily targeted at image processing applications, in principle they could have much wider application.
In chapter 2 various GaAs devices are introduced and their suitability for 
VLSI applications is assessed. The manufacturing sequence of the devices 
is then explained to provide a better understanding of their structures. 
The layers and their associated layout rules are subsequently defined in 
order to be able to identify them on the circuit layouts and to show the 
minimum feature sizes for the GaAs process used. Also, in this chapter, 
the device models and process parameters are discussed in some detail. 
These are important issues as they directly determine the validity of the 
simulation results. 
The GaAs MESFET logic families are discussed in chapter 3. A detailed 
comparison between the logic gates is presented to select the most 
appropriate one for GaAs VLSI applications, namely the Direct Coupled 
FET Logic (DCFL) gate. An alternative gate configuration called the 
Source follower DCFL (SDCFL) is also proposed in an attempt to improve 
the noise margin and speed of GaAs circuits. A number of buffering schemes are then suggested to improve the speed where the fanout loading is high. This is particularly important for the clock drivers required in any synchronous VLSI circuit.
The fourth chapter reviews various adder circuits. These adders are designed, laid out and simulated to find the best adder architecture for GaAs implementation. The effects of the algorithm and design technique on the performance of the adder circuits are demonstrated. The effects of various interconnect technologies on the overall delay are also investigated to suggest adder architectures which are least sensitive to interconnect. The design and evaluation of a GaAs multiplier circuit is presented in chapter 5. This is a natural progression towards the implementation of a VLSI circuit for digital signal processing applications. The multiplier circuit is used further to demonstrate the effectiveness, and to identify the limitations, of conventional circuit design approaches for GaAs digital circuits.
A hierarchical design procedure and a novel layout method are proposed 
in chapter 6 to minimise the delay and area of circuits. This technique is applied to the circuit examples of chapters 4 and 5, which are then re-evaluated. A comparison between the results obtained
from the circuits in this chapter and those achieved by using the 
conventional design techniques is given to show the improvements in 
performance. 
Finally, the overall objectives and the work carried out during the course 
of the project are summarised in chapter 7. The outcomes together with 
the conclusions drawn from the research are also presented. 
CHAPTER 2 
GaAs DEVICE FABRICATION AND MODELLING 
2.1 Suitable Devices for VLSI Implementation 
A number of different devices have been developed for GaAs. They fall 
into two categories, the first and second generation devices [30]. First 
generation devices are the Depletion-mode MESFET (DFET), 
Enhancement-mode MESFET (EFET), Enhancement-mode Junction FET 
(EJFET) and Complementary EJFET (CE-JFET). The second generation 
devices include the High Electron Mobility Transistor (HEMT) and 
Heterojunction Bipolar Transistor (HBT). Second generation devices are 
faster than the first generation devices because they exploit the 
properties of GaAs more fully. For example, the operating frequency of 
DFETs is, in general, between 20 and 80 GHz, whereas for HEMTs it can 
vary from 70 to 100 GHz [31]. 
There are also more exotic devices being invented in research laboratories 
which attempt to reach the ultimate performance of GaAs. However, for 
high speed VLSI circuits the most important factor, apart from a high 
operating frequency, is the maturity of the process. At present the first 
generation MESFETs are the most widely used devices for VLSI 
applications. Even at the sub-micron level they can still be easily 
manufactured and provide high operating frequencies. 
The designs and analyses of the circuits presented in this thesis are based 
on MESFETs. Therefore the results and the final conclusions are specific 
to MESFETs, although the fundamental design and implementation 
issues are believed to be applicable to circuits using other GaAs devices. 
The following section presents a detailed description of MESFETs, their 
fabrication process and design rules as well as the equivalent circuit 
models used in all the simulations. 
2.2 GaAs MESFET Structure 
Figure 2.1 shows the basic structure of a GaAs MESFET. It consists of a 
chromium doped, semi-insulating substrate into which source, drain and 
channel are made by n-type dopant implantation [32]. 
[Figure 2.1 labels: gate (Schottky contact), drain (ohmic contact), conductive channel, semi-insulating GaAs substrate.]
Figure 2.1 Cross section of an ion-implanted MESFET. 
The gate is formed when a metal such as aluminium is deposited over the 
channel. Conduction in the channel is confined to the region between the 
gate depletion-edge and the substrate and may be modulated by the gate 
voltage. 
GaAs MESFETs are somewhat similar to silicon MOSFETs. The major 
difference is the presence of a Schottky diode at the gate-channel 
interface. The detailed device operation also differs: in GaAs the 
electron velocity saturates at an electric field roughly ten times lower 
than in silicon. Thus, drain current saturation in GaAs MESFETs is caused 
by carrier-velocity saturation, whereas in silicon it is caused by channel 
pinch-off [33]. 
The threshold voltage of the GaAs MESFET can be adjusted by varying 
the channel thickness and the concentration of the implanted impurity. 
The normally 'ON' DFET is characterised by its thick and highly doped 
channel exhibiting a negative threshold voltage. By reducing the channel 
thickness a normally 'OFF' EFET with positive threshold voltage can be 
fabricated. For the DFETs the channel thickness is in the range of 1000 
to 2000 Å, whereas for the EFETs it ranges from 500 to 1000 Å. 
There are many ways of fabricating MESFETs and the process can be 
adapted to the application for which they are intended. For high 
performance GaAs VLSI circuits the most dominant approaches in device 
fabrication are the planar and self-aligned gate processes [34]. 
2.3 Planar Processing Steps for GaAs MESFETs 
Figure 2.2 shows a generalised manufacturing sequence for a discrete 
planar GaAs DFET process. It is presented here to show the steps in 
transistor fabrication without the complications of simultaneous 
fabrication of other components (the same process applies to EFETs). 
As shown in Figure 2.2a, initially the GaAs substrate is coated with the 
first level of insulator, which is a thin layer of silicon nitride (Si3N4). This 
thin film of insulator remains on the wafer throughout the processing 
steps that are to follow. A photoresist is then applied and selectively 
removed to define a shallow high resistivity n- channel layer. The channel 
is formed by direct implantation of silicon ions through the silicon nitride 
layer, into the GaAs substrate. 
Figure 2.2b shows the formation of the deep and heavily doped n + layer 
for the source and drain regions, after a second application of photoresist 
and the selective removal process. The resultant channel resistance is in 
the range of 1000 to 2500 Ω/square, which is too high for source and drain 
contacts. Therefore the surface concentration of the n + is kept relatively 
high to minimise the resistance seen by the ohmic metal contacts. 
In the next step, namely the cap and anneal process (Figure 2.2c), the 
wafer is capped with a suitable material such as silicon dioxide (SiO2) by 
chemical vapour deposition. This layer of silicon dioxide is particularly 
important because it prevents arsenic out-diffusion during the anneal 
step; the out-diffusion is brought about by the high arsenic vapour 
pressure of GaAs at temperatures in excess of about 600 °C. The anneal 
step is performed in a hydrogen ambient to activate the implanted regions 
electrically. 
The ohmic contact metallisation step, in which contact areas for the source 
and drain are formed, uses a process known as the lift-off technique 
(Figure 2.2d). 
In the lift-off process the deposited metal adheres to the underlying 
material where there is no cap layer, while the remaining metal on the 
cap layer is removed when the layer is stripped. This allows precise metal 
definition without an etch-back process. The metals used in the ohmic 
metallisation are gold-germanium-nickel or gold-germanium-platinum 
alloys. 
An important point to note is that the semi-insulating nature of the GaAs 
substrate cannot be relied on alone to provide good isolation between 
devices (back-gating) [35] [36]. In fact, it is usual to implant H+ ions into 
the field areas to reduce the effect of the parasitic interactions between 
nearby devices. 
One of the most critical steps in the fabrication process is the gate 
metallisation. Schottky gates, together with the first level interconnect, 
are formed by multi-layer thin films of gold and refractory metals, such as 
titanium/platinum/gold, deposited by electron beam evaporation 
(Figure 2.2e). Second and higher level metals are not in contact with the 
GaAs substrate; therefore platinum, which is used to prevent the 
interaction of gold with the GaAs surface, can sometimes be omitted from 
these upper metal layers. 
The final step of the process is the passivation step, which is used to 
protect against moisture and contamination (Figure 2.2f). This entails a 
thick layer of silicon nitride being deposited on the gate, source and drain 
metallisation, using a low temperature plasma enhanced chemical vapour 
deposition process. 
[Figure 2.2 panels: (a) Deposition of the first level insulator (Si3N4); implantation of silicon ions. (b) Formation of the source and drain regions (n+ implant). (c) Deposition of insulator (level 2); annealing of the implant. (d) Deposition of ohmic contacts (AuGe/Ni or AuGe/Pt). (e) Deposition of Ti/Pt/Au gate and first level metal. (f) Deposition of second level metal (via contact).]
Figure 2.2 A typical planar manufacturing process for a GaAs MESFET. 
2.4 Self-Aligned Gate Process Technology 
To improve the fabrication technology, the self-aligned gate method 
was borrowed from the silicon NMOS process. In this method, the Schottky 
gate is used as a mask for implanting the source and drain regions of the 
devices. The n + source and drain layers are embedded close to the gates. 
Therefore the parasitic source resistance of the FETs is greatly reduced 
and as a consequence the transconductance of the device is increased. In 
addition the process offers improved pinch-off voltage uniformity, which 
is of crucial importance for the manufacture of VLSI circuits based on 
normally-off EFETs. 
The fabrication steps for a self-aligned gate process are shown in Figure 
2.3. Just as for the planar process the first step is to form the channel 
area by selective implantation of silicon ions into the GaAs substrate 
(Figure 2.3a). Next, a high temperature stable material such as Tungsten 
Nitride is deposited over the substrate and is patterned by an etching 
process to define the gate area (Figure 2.3b). The gate acts as a mask for 
the next step in the process which is the formation of source and drain by 
the high dose implantation of ions (Figure 2.3c). This step is followed by 
capping of the substrate with silicon dioxide so that the sample can be 
annealed without any arsenic out-diffusion due to the high vapour 
pressure. 
It is important to note that the gate material must withstand the high 
temperatures (about 800 °C) during the annealing process. Tungsten 
Nitride has been found to be satisfactory as a gate material. It has a 
typical film resistivity of 70 μΩ·cm and a Schottky barrier height of 0.8 V 
to n-type GaAs. 
After the annealing (Figure 2.3d), the final stage of the process is the 
ohmic metallisation of the source and drain regions (Figure 2.3e). As in 
the case of the planar process, the metals used in the ohmic metallisation 
are gold-germanium-nickel alloy or gold-germanium-platinum. 
[Figure 2.3 panels: (a) Channel implantation of silicon ions. (b) Formation of the Schottky gate. (c) Implantation of n+ ions. (d) Anneal cycle to activate the dopants. (e) Formation of ohmic contacts.]
Figure 2.3 A typical manufacturing process for a self-aligned GaAs MESFET. 
The formation of the second and higher level metals together with the 
final passivation stage is similar to that of the planar process, described 
in the previous section. 
2.5 GaAs MESFET Design Rules and Layer Representation 
The layout and design rules are intended to ensure reliable circuits with 
optimum yield and size. They are set by the designer and the process 
engineer to provide the best compromise between yield and performance. 
The layout rules must define: a) the geometry of the features that can be 
reproduced by the mask and lithography process and, b) the interaction 
between different layers. There are two main approaches to achieve this: 
the lambda-based and micron-based rules. In lambda-based rules, every 
feature is expressed in terms of the parameter lambda. The micron rules, 
on the other hand, are given as a list of minimum feature sizes and 
spacings, according to the capabilities of the process technology. 
The lambda-based rules are simple and somewhat relaxed to ensure high 
yield circuits. This, however, results in performance degradation due to 
the increase in area. For high speed GaAs VLSI circuits, micron-based 
rules must be used to achieve optimum performance [37]. 
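The practical difference between the two rule styles can be illustrated with a small Python sketch that expresses a metal width/spacing pair both ways and compares the resulting track pitch. Every rule name and value here, including the 0.5 μm lambda, is a hypothetical placeholder; none of it is taken from appendix A.

```python
# Hypothetical design-rule values for illustration only (not the appendix A rules).
LAMBDA_UM = 0.5  # assumed value of lambda, in microns

# Lambda-based rules: every feature is an integer multiple of lambda.
LAMBDA_RULES = {"metal1_width": 3, "metal1_spacing": 3}

# Micron-based rules: absolute minimum sizes set by the process capability.
MICRON_RULES = {"metal1_width": 1.2, "metal1_spacing": 1.0}


def lambda_to_microns(rules, lam_um):
    """Convert lambda multiples into absolute dimensions in microns."""
    return {name: multiple * lam_um for name, multiple in rules.items()}


def metal_pitch_um(rules):
    """Minimum track pitch = minimum width + minimum spacing (microns)."""
    return rules["metal1_width"] + rules["metal1_spacing"]


if __name__ == "__main__":
    lam_rules = lambda_to_microns(LAMBDA_RULES, LAMBDA_UM)
    print("lambda-based pitch:", metal_pitch_um(lam_rules), "um")  # 3.0 um
    print("micron-based pitch:", metal_pitch_um(MICRON_RULES), "um")
```

Under these assumed numbers the relaxed lambda rules give a wider pitch (3.0 μm against 2.2 μm), which is the kind of area penalty the paragraph above refers to.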
The layout rule set used throughout the work presented in this thesis is 
given in appendix A, so that it can be used for further circuit design and 
implementation work, if required. The set includes the width and spacing 
rules for different layers together with some special rules for MESFETs. 
The colour coding of the layers together with the layer patterns are also 
provided so that each layer in the circuit can easily be identified [38] [39]. 
2.6 An Appropriate Device Model for GaAs VLSI 
In the following chapters a considerable amount of computer simulation 
is described, in order to present a novel design approach for GaAs 
MESFETs. The validity of the results and final conclusions depends entirely 
on: a) the accuracy of the model for the individual devices and b) the 
accuracy of the parameters extracted for the model [40]. To provide 
reliable results, the choice of a particular model must be decided by 
comparing the simulated results with measured data. 
For VLSI circuit simulation, another important factor in choosing a 
particular model is that it should be CPU time efficient. Clearly, complex 
models cannot be used for circuits with many thousands of MESFETs. 
On the other hand, MESFETs are internally complex and simple equations 
cannot describe their behaviour under all possible conditions. 
The most commonly used MESFET model is based on the JFET model, 
consisting of a parallel diode and capacitor between gate and source (Dgs, Cgs) 
and between gate and drain (Dgd, Cgd), plus a controlled current source (Ids) 
between drain and source. For anything other than the most approximate 
simulations it is necessary to add resistors Rd, Rs and Rg in series with the 
drain, source and gate respectively, and to add a drain-source resistor (Rds) 
and a drain-source capacitor (Cds). The complete equivalent circuit model 
is shown in Figure 2.4 [41] [42]. 
Figure 2.4 The equivalent circuit model for GaAs MESFETs. 
The problem is to define a formula for the Ids current. The simplest 
formula is given by the Shichman and Hodges model [43], which is 
implemented in most versions of SPICE. 
The model has a number of inadequacies when it comes to modelling 
short channel MESFETs (which is the case for most MESFETs) [44]. 
These are as follows. 
a) The square-law relationship of Ids to Vgs is often significantly different 
from the behaviour of the actual device. 
b) The approximately linear dependence of output conductance on Ids is 
often not observed (the two are more often independent). 
c) The saturation of Ids is assumed to occur at Vds = Vgs - Vt, whereas the 
actual device exhibits early saturation, at a significantly lower voltage 
than the formula suggests. 
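For reference, the Shichman and Hodges formulation, in the form used by SPICE-style JFET models, can be sketched in Python as below. The sketch makes inadequacy c) explicit: the saturation branch is entered exactly at Vds = Vgs - Vt. The EFET values are read from Table 2.1 (taking β = 3.63×10^-4 A/V², Vt = 0.15 V, λ = 0.1 1/V); the bias point Vgs = 0.7 V is an arbitrary illustrative choice.

```python
def shichman_hodges_ids(vgs, vds, beta, vt, lam):
    """Drain current (A) of the Shichman-Hodges model, for vds >= 0."""
    vgt = vgs - vt                      # gate overdrive
    if vgt <= 0.0:
        return 0.0                      # cut-off region
    if vds < vgt:                       # linear (triode) region
        return beta * vds * (2.0 * vgt - vds) * (1.0 + lam * vds)
    # Saturation region, entered at vds = vgs - vt (inadequacy c in the text)
    return beta * vgt ** 2 * (1.0 + lam * vds)


if __name__ == "__main__":
    beta, vt, lam = 3.63e-4, 0.15, 0.1  # EFET values read from Table 2.1
    vgs = 0.7
    vdsat = vgs - vt
    print(f"saturation assumed at Vds = {vdsat:.2f} V")
    for vds in (0.2, 0.4, vdsat, 1.0):
        ids = shichman_hodges_ids(vgs, vds, beta, vt, lam)
        print(f"Vds = {vds:.2f} V -> Ids = {ids * 1e6:.1f} uA")
```

The two branches meet continuously at Vds = Vgs - Vt, so the kink in slope at that point, rather than any discontinuity, is what fails to match short channel devices.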
A simple, more accurate, model was proposed by W.R. Curtice in 1980 
[45], which incorporates a tanh function in the formula. It allows the 
linear and saturation regions to be modelled by the same equation. This 
model is used for all the simulations presented in this thesis and apart 
from the accuracy and simplicity, having access to the foundry measured 
parameters for this model was the main reason for choosing it. 
The drain-to-source current (Ids) described by the Curtice equation is as 
follows: 
$$I_{ds} = \beta\,(V_{gs} - V_t)^2\,(1 + \lambda V_{ds})\,\tanh(\alpha V_{ds}) \tag{2.1}$$
where β is the transconductance parameter, Vgs is the gate-source 
voltage, Vt is the threshold voltage, λ is the channel length modulation 
parameter, α is the hyperbolic tangent drain voltage multiplier and Vds 
is the drain-source voltage. 
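Equation 2.1 translates directly into code. The Python sketch below evaluates it with the EFET parameters of Table 2.1 (reading β as 3.63×10^-4 A/V²); the bias values are arbitrary illustrative choices. Because tanh(αVds) ≈ αVds for small Vds and tends to 1 for large Vds, one expression covers both the linear and the saturation regions.

```python
import math


def curtice_ids(vgs, vds, beta, vt, lam, alpha):
    """Drain current (A) from the Curtice model, equation 2.1 (vds >= 0)."""
    vgt = vgs - vt
    if vgt <= 0.0:
        return 0.0  # device off below threshold
    return beta * vgt ** 2 * (1.0 + lam * vds) * math.tanh(alpha * vds)


if __name__ == "__main__":
    efet = dict(beta=3.63e-4, vt=0.15, lam=0.1, alpha=2.0)  # Table 2.1, EFET
    for vds in (0.05, 0.2, 0.5, 1.0, 1.5):
        ids = curtice_ids(0.7, vds, **efet)
        print(f"Vds = {vds:.2f} V -> Ids = {ids * 1e6:.1f} uA")
```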
DC characteristics are defined by the model parameters Vt and β (which 
determine how the drain current varies with gate voltage), by λ (which 
determines the output conductance) and by the saturation current of the 
two gate junctions. 
The following equations describe the threshold voltage and 
transconductance parameters [46]: 
$$V_t = V_{bi} - \frac{q N a^2}{2\varepsilon} \tag{2.2}$$
$$\beta = \frac{\mu_n \varepsilon W}{2 a L} \tag{2.3}$$
where Vbi is the built-in potential, N is the effective channel doping 
density, q is the electron charge, a is the effective channel implant depth, 
ε is the permittivity, μn is the electron mobility, W is the gate width and 
L is the channel length. 
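As a rough check of equations 2.2 and 2.3, the Python sketch below evaluates them with the doping, implant depth, built-in voltage and permittivity values listed in Table 2.1 (taking ε = 1.16×10^-12 F/cm). Being a first-order formula, it should not be expected to reproduce the extracted threshold voltages of the table exactly, but it does reproduce the trend described in section 2.2: the thin channel gives a positive (enhancement) threshold and the thick channel a negative (depletion) one. The mobility and gate dimensions used for β are hypothetical.

```python
Q = 1.602e-19          # electron charge (C)
ANGSTROM_CM = 1e-8     # 1 angstrom in cm
EPS = 1.16e-12         # permittivity (F/cm), Table 2.1


def threshold_voltage(vbi, n_cm3, a_cm, eps_f_per_cm):
    """Equation 2.2: Vt = Vbi - q N a^2 / (2 eps)."""
    return vbi - Q * n_cm3 * a_cm ** 2 / (2.0 * eps_f_per_cm)


def transconductance_parameter(mu_cm2_per_vs, eps_f_per_cm, w_cm, a_cm, l_cm):
    """Equation 2.3: beta = mu eps W / (2 a L), in A/V^2."""
    return mu_cm2_per_vs * eps_f_per_cm * w_cm / (2.0 * a_cm * l_cm)


if __name__ == "__main__":
    vt_e = threshold_voltage(0.6, 1e17, 700 * ANGSTROM_CM, EPS)
    vt_d = threshold_voltage(0.6, 1e17, 1500 * ANGSTROM_CM, EPS)
    print(f"EFET (a = 700 A):  Vt = {vt_e:+.2f} V")
    print(f"DFET (a = 1500 A): Vt = {vt_d:+.2f} V")
    # Hypothetical mobility and geometry: 4000 cm^2/Vs, W = 10 um, L = 1 um
    beta_e = transconductance_parameter(4000.0, EPS, 10e-4, 700 * ANGSTROM_CM, 1e-4)
    print(f"EFET beta (hypothetical geometry) = {beta_e:.2e} A/V^2")
```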
Charge storage is modelled by non-linear capacitances, defined by the 
parameters Cgs and Cgd. They are treated as Schottky-barrier diodes 
and modelled as: 
$$C_{gs} = \frac{C_{g0}}{\sqrt{1 - V_{gs}/V_{bi}}} \tag{2.4a}$$
$$C_{gd} = \frac{C_{g0}}{\sqrt{1 - V_{gd}/V_{bi}}} \tag{2.4b}$$
where Vgd and Vgs are the gate-drain and gate-source voltages, and Cg0 is 
the zero-bias capacitance. 
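Equations 2.4a and 2.4b share the same form, so one helper function covers both. In the Python sketch below the zero-bias capacitance of 1.5 fF is a hypothetical value, while the built-in voltage of 0.6 V is taken from Table 2.1. The formula diverges as the junction voltage approaches Vbi, so the sketch simply rejects forward biases at or above Vbi; circuit simulators usually substitute a linear extrapolation near that point.

```python
import math


def junction_capacitance(v, cg0, vbi=0.6):
    """Equations 2.4a/2.4b: C = Cg0 / sqrt(1 - V/Vbi), for V < Vbi.

    v   : gate-source (2.4a) or gate-drain (2.4b) voltage in volts
    cg0 : zero-bias capacitance in farads
    """
    if v >= vbi:
        raise ValueError("Schottky junction model is singular for V >= Vbi")
    return cg0 / math.sqrt(1.0 - v / vbi)


if __name__ == "__main__":
    cg0 = 1.5e-15  # hypothetical zero-bias capacitance (F)
    for v in (-1.0, -0.5, 0.0, 0.3, 0.5):
        print(f"V = {v:+.1f} V -> C = {junction_capacitance(v, cg0) * 1e15:.2f} fF")
```

Reverse bias (negative V) widens the depletion region and lowers the capacitance, while forward bias towards Vbi raises it sharply.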
The parameter values used in the model are given in Table 2.1 [47]. They 
are derived from an n-channel self-aligned GaAs MESFET process. 
Parameter used in the model        Symbol   Unit    EFET          DFET
Threshold voltage                  Vt       V       0.15          -0.5
Transconductance parameter         β        A/V²    3.63×10^-4    2.13×10^-4
Channel length modulation          λ        1/V     0.1           0.13
Drain voltage multiplier           α        -       2             2
Built-in voltage                   Vbi      V       0.6           0.6
Effective channel doping density   N        cm^-3   10^17         10^17
Implant depth                      a        Å       700           1500
Dielectric permittivity            ε        F/cm    1.16×10^-12   1.16×10^-12
Gate-source capacitance            Cgs      F       1.5×10^-15    1.3×10^-15
Gate-drain capacitance             Cgd      F       7.5×10^-16    6.5×10^-16
Source and drain resistances       Rs, Rd   Ω       1500          1150
Table 2.1 Parameter values used in the MESFET model. 
2.7 Important Effects Included in the Device Model 
Having introduced the equations for the Ids current and the gate 
capacitances, there are two important effects which have to be modelled. 
a) Transit-time effects 
Transit time is the finite delay between a change in the gate voltage 
and the resulting change in Ids. It arises because charge transport 
occurs at a maximum velocity of about 10^7 cm/s. Therefore, for a 
1 μm channel length, it takes about 10 ps (10^-4 cm divided by 10^7 cm/s) 
for the current to change when the gate voltage is altered. This time 
delay is very important in the delay calculation of GaAs circuits and can 
be included in the model by substituting Vgs(t - τ) for Vgs, where τ is 
the time delay. 
b) Dispersion effects [48] [49] 
There are a number of undesirable effects in GaAs MESFETs which 
may be significant in the performance of the overall circuits. One of the 
most dominant effects is the transconductance dispersion which is 
brought about by the non-ideal semi-insulating substrate and surface. 
This results in higher output conductance (order of 2-3 times) in 
saturation for high frequency signals than would be predicted from 
curve tracer or parameter analyser measurements. 
One of the easiest ways to model this effect is simply to increase the 
value of λ in the Curtice model from the value extracted from low-
frequency measurements to its high-frequency value. Typically the high-
frequency value is two to three times the low-frequency value, consistent 
with the increase in output conductance noted above (in saturation the 
output conductance given by equation 2.1 is approximately proportional 
to λ). Although this simple model ignores the overshoot and phase shift 
caused by dispersion, it is adequate for the performance evaluation of 
the digital circuits presented in this thesis. 
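Both effects described above can be sketched in Python. For a), the transit time follows from τ = L/v, and the delay is applied by evaluating the gate waveform at t - τ; representing Vgs as a function of time is an illustrative assumption. For b), in saturation tanh(αVds) ≈ 1 in equation 2.1, so the output conductance is roughly β(Vgs - Vt)²λ and therefore scales with λ; the factor of 2.5 applied to λ below is a hypothetical value chosen within the 2-3 times range quoted above. The EFET parameters follow Table 2.1, reading β as 3.63×10^-4 A/V².

```python
import math

V_MAX_CM_PER_S = 1.0e7  # maximum carrier velocity quoted in the text


def transit_time_s(channel_length_um):
    """a) Transit time tau = L / v, with L converted from microns to cm."""
    return channel_length_um * 1e-4 / V_MAX_CM_PER_S


def delayed(vgs_of_t, tau):
    """a) Waveform evaluated at (t - tau): substitutes Vgs(t - tau) for Vgs."""
    return lambda t: vgs_of_t(t - tau)


def gate_step(t):
    """Illustrative gate waveform: 0 V before t = 0, 0.7 V afterwards."""
    return 0.7 if t >= 0.0 else 0.0


def curtice_ids(vgs, vds, beta=3.63e-4, vt=0.15, lam=0.1, alpha=2.0):
    """Equation 2.1 drain current (A); defaults are the Table 2.1 EFET values."""
    vgt = max(vgs - vt, 0.0)
    return beta * vgt ** 2 * (1.0 + lam * vds) * math.tanh(alpha * vds)


def output_conductance(vgs, vds, lam, h=1e-4):
    """b) Central-difference estimate of gds = dIds/dVds (siemens)."""
    up = curtice_ids(vgs, vds + h, lam=lam)
    down = curtice_ids(vgs, vds - h, lam=lam)
    return (up - down) / (2.0 * h)


if __name__ == "__main__":
    tau = transit_time_s(1.0)
    print(f"transit time for a 1 um channel: {tau * 1e12:.1f} ps")
    vgs_seen = delayed(gate_step, tau)
    for t_ps in (5.0, 15.0):
        print(f"t = {t_ps:.0f} ps: Vgs seen by Ids = {vgs_seen(t_ps * 1e-12):.1f} V")

    g_low = output_conductance(0.7, 1.5, lam=0.1)     # low-frequency lambda
    g_high = output_conductance(0.7, 1.5, lam=0.25)   # hypothetical 2.5x value
    print(f"gds ratio (high f / low f) = {g_high / g_low:.2f}")
```

The conductance ratio comes out somewhat below 2.5 because, at Vds = 1.5 V, the tanh term still has a small slope that does not scale with λ.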
2.8 Interconnect Modelling 
The switching speed of MESFET circuits depends on both the device and 
interconnect lines. The propagation of a signal along an interconnect line 
is dependent on a number of factors. They include the distributed line 
resistance, capacitance and inductance, the impedance of the driving 
source and the cross-talk between the lines [50]. 
The interconnect for digital GaAs circuits can still be treated as purely 
capacitive provided the effective ON resistance of the driver gate is larger 
than that of the line by at least 2 orders of magnitude [51]. This is the 
case with the MESFET gates used in our circuits (see chapter three for 
design and analysis of logic gates). 
The capacitance of the lines can be derived using the parallel plate model 
[52], but this simple model ignores the influence of cross-talk 
(coupling), which can severely degrade the speed of GaAs VLSI circuits. 
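The two interconnect checks described above, the parallel plate estimate of line capacitance and the rule that the driver ON resistance should exceed the line resistance by at least two orders of magnitude, can be sketched in Python. All geometry, resistivity and driver-resistance values are hypothetical (a gold line over a 100 μm GaAs substrate with εr = 12.9 to a back-side ground plane is assumed), and fringing and coupling are ignored, exactly as in the simple parallel plate model.

```python
EPS0_F_PER_UM = 8.854e-18  # vacuum permittivity, F/um


def parallel_plate_capacitance(width_um, length_um, height_um, eps_r):
    """C = eps0 * eps_r * W * L / h; fringing and coupling are ignored."""
    return EPS0_F_PER_UM * eps_r * width_um * length_um / height_um


def line_resistance(rho_ohm_um, width_um, length_um, thickness_um):
    """R = rho * L / (W * t) for a rectangular conductor."""
    return rho_ohm_um * length_um / (width_um * thickness_um)


def treat_as_capacitive(r_driver_ohm, r_line_ohm, margin=100.0):
    """Criterion from the text: driver ON resistance >= 100 x line resistance."""
    return r_driver_ohm >= margin * r_line_ohm


if __name__ == "__main__":
    # Hypothetical 1 mm long, 2 um wide, 0.5 um thick gold line (rho = 2.44e-2 ohm.um)
    c_line = parallel_plate_capacitance(2.0, 1000.0, 100.0, 12.9)
    r_line = line_resistance(2.44e-2, 2.0, 1000.0, 0.5)
    r_driver = 5000.0  # hypothetical driver ON resistance (ohms)
    print(f"C_line = {c_line * 1e15:.2f} fF, R_line = {r_line:.1f} ohm")
    print("purely capacitive model OK:", treat_as_capacitive(r_driver, r_line))
```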
There are several methods to reduce the effect of cross-talk. For example, 
using a thick interlayer with a low dielectric constant between the lines and 
the GaAs substrate can reduce the cross-talk by as much as 13%. A 
further 8% reduction can be achieved by using an air bridge technology 
where the interconnect lines are suspended in the air [53]. 
Figure 2.5 Line capacitance calculation. 
In order to predict the performance of the overall GaAs circuits 
accurately, the effect of coupling must be included in the computer 
simulation. One effective method is to use Green's function to provide an 
electrode capacitance matrix for the self and mutual capacitances of the 
lines by determining their total electron charge. This method provides 
accurate values for the capacitance of both the device and interconnect 
lines [54]. However, as the number of conductors increases, the 
capacitance matrix keeps growing, resulting in excessive CPU time and 
memory to compute the capacitances and store the final values. 
Therefore, in the computer simulation of the circuits presented in the 
following chapters, the parasitic capacitances due to coupling are 
manually added to the capacitance of the lines in the critical paths, based 
on the calculated results given in Figure 2.5. This gives a crude but 
sufficiently accurate estimate without any sacrifice in CPU time or 
memory [55]. 
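The manual procedure described above amounts to adding a coupling term to each critical-path net. A minimal Python sketch, assuming a per-unit-length self capacitance and a per-unit-length coupling capacitance for each adjacent parallel line; both numbers and the net names are hypothetical placeholders, not values read from Figure 2.5.

```python
# Hypothetical per-unit-length capacitances (F/um); NOT values from Figure 2.5.
C_SELF_PER_UM = 0.08e-15
C_COUPLE_PER_UM = 0.03e-15


def annotate_critical_nets(nets, c_self=C_SELF_PER_UM, c_couple=C_COUPLE_PER_UM):
    """Return the total line capacitance (F) for each critical-path net.

    nets maps a net name to (length_um, number_of_adjacent_parallel_lines).
    """
    totals = {}
    for name, (length_um, neighbours) in nets.items():
        totals[name] = length_um * (c_self + neighbours * c_couple)
    return totals


if __name__ == "__main__":
    # Hypothetical critical-path nets
    critical = {"carry_3": (400.0, 2), "sum_7": (250.0, 1), "clk": (1200.0, 0)}
    for net, c in annotate_critical_nets(critical).items():
        print(f"{net:8s} {c * 1e15:6.1f} fF")
```

Only the nets on critical paths are annotated, which keeps the cost linear in the number of such nets instead of growing with a full capacitance matrix.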
2.9 Effect of Process Variations 
Another important issue is the effect of process variation on circuit 
performance. The simulations performed in this research are all based on 
parameters for a commercial GaAs process. The parameters were also 
varied by as much as 50% to ensure that the results were valid for a large 
change in parameters. Therefore the proposed design approaches are 
believed to show a good tolerance to process spread. A detailed analysis 
of the process parameter spread is beyond the scope of this thesis and is 
not presented. 
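A corner-style sweep of the kind described above can be sketched in Python: β and Vt of the EFET are each moved by ±50% around the Table 2.1 values (reading β as 3.63×10^-4 A/V²), and the resulting spread in drain current is computed with equation 2.1. The bias point, and the choice to vary only these two parameters, are assumptions for illustration, since exactly which parameters were varied is not specified above.

```python
import itertools
import math


def curtice_ids(vgs, vds, beta, vt, lam=0.1, alpha=2.0):
    """Equation 2.1 (Curtice model) drain current in amperes."""
    vgt = max(vgs - vt, 0.0)
    return beta * vgt ** 2 * (1.0 + lam * vds) * math.tanh(alpha * vds)


def corner_sweep(beta0, vt0, spread=0.5, vgs=0.7, vds=1.0):
    """Evaluate Ids at every (beta, Vt) corner within +/- spread of nominal.

    Keys are (beta factor, Vt factor) pairs; values are currents in amperes.
    """
    factors = (1.0 - spread, 1.0, 1.0 + spread)
    results = {}
    for fb, fv in itertools.product(factors, factors):
        results[(fb, fv)] = curtice_ids(vgs, vds, beta0 * fb, vt0 * fv)
    return results


if __name__ == "__main__":
    res = corner_sweep(3.63e-4, 0.15)
    nominal = res[(1.0, 1.0)]
    print(f"nominal Ids = {nominal * 1e6:.1f} uA")
    print(f"range       = {min(res.values()) * 1e6:.1f} .. {max(res.values()) * 1e6:.1f} uA")
```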
CHAPTER 3 
MESFET LOGIC FAMILIES FOR GaAs VLSI 
CIRCUITS 
3.1 Types of MESFET Logic Gate [56] [57] 
There are two main approaches to the design of MESFET logic gates. 
They are categorised as either Normally-ON or Normally-OFF logic gates. 
The Normally-ON logic gates consist of DFETs and were the first 
generation of logic gates developed for GaAs digital circuits. The main 
reason for the development of this class of logic was the process maturity 
of DFETs. Later, when the yield and threshold voltage uniformity of EFETs 
were improved, the Normally-OFF logic gates were introduced. They 
consist of both types of device (DFETs and EFETs) and possess 
characteristics essential for the implementation of VLSI circuits on GaAs 
(e.g. small area and low power dissipation). 
Gate configurations based on these logic classes are described in this 
chapter. They are intended to show the trends and developments in GaAs 
logic design and to aid the choice of the gate configuration best suited to 
VLSI implementation. 
3.2 Normally-ON Logic Gates 
A number of approaches have been proposed for the design of this class 
of logic. They are: the Buffered FET Logic (BFL), Schottky Diode FET 
Logic (SDFL) and Capacitor-Coupled FET Logic (CCFL). 
a) Buffered FET Logic (BFL) [58] [59] [60] 
The basic structure for the BFL gate is shown in Figure 3.1a. It consists 
of two sections, the logic input and the driver/level shifter output. 
Different logic functions are implemented by modifying the logic input. 
The output driver is used to ensure logic level compatibility between the 
inputs and outputs of the gates. Also, in order to be able to turn off the 
DFET logic switch (TS) of the driven gate, a negative supply voltage (VSS) 
is required, which adds to the complexity of the gate. 
This type of gate is considered to be one of the fastest, but is expensive 
in terms of power and area. Most of the power is dissipated in the driver 
section; therefore, to reduce the power, it is possible to remove the load 
driver DFET (TD) in the output stage of the BFL gate, as shown in Figure 
3.1b. This new configuration is called Unbuffered FET Logic (UFL) 
and is more suitable for LSI applications. The absence of TD, however, 
reduces the speed and fanout capability of the gate. 
[Figure 3.1 schematics: logic section followed by driver/level shifter, supply VDD, outputs OUT1 to OUTn; panels (a) and (b). Key: TL = active load; TD = load driver (source follower); TS = logic switches; TCS = current sink.]
Figure 3.1 (a) BFL gate with the load driver, (b) UFL gate without the 
load driver. 
b) Schottky Diode-FET Logic (SDFL) [61] [62] 
In this logic approach Schottky diodes are used to perform the logic 
operations. They are followed by a Schottky diode for level shifting and 
a buffer stage. A possible configuration of the gate is shown in Figure 
3.2a. The power consumption and area of this type of gate are less than 
the BFL gate but with lower speed and drive capability. 
It is possible to increase the drive capability of the gate without excessive 
increase in power dissipation by adding a push-pull source follower at the 
output, as shown in Figure 3.2b. To improve the noise immunity of the 
gate, the power supply for the logic is normally isolated from the source 
follower. 
[Figure 3.2 schematics: (a) logic section, supply VDD, outputs OUT1 to OUTn; (b) the same gate with a source follower output stage.]
Figure 3.2 (a) The basic SDFL gate, (b) SDFL gate with a source follower 
output stage. 
c) Capacitor-Coupled FET Logic (CCFL) [63] [64] 
In order to overcome the problem of level shifting in the Normally-ON 
gates the natural choice is to use a capacitor to couple the input and 
output stages. Figure 3.3a shows a typical CCFL gate, where a 
reverse-biased diode is used as the capacitor (DCAP). 
The gate has a very simple structure and requires only one supply rail. 
In addition the power dissipation of the gate is low compared with BFL 
and SDFL gates. This is due to the fact that there is no power consumed 
in the capacitors. As soon as they are charged, the action thereafter is to 
transfer the charge between successive stages. Also, as the capacitor is 
placed in series with the DFET gate (TPD), the capacitive loading is 
reduced and hence the speed of the gate is improved. 
The use of a capacitor implies a minimum operational frequency of the 
circuit. This frequency is determined by the leakage currents and relative 
sizes of the coupling capacitor and reverse biased gate-source junction of 
the TPD. For applications where the low frequency cutoff point is not 
acceptable, a combination of reverse and forward biased diodes is used to 
provide both the level shifting and capacitive coupling between the stages 
[65]. Figure 3.3b shows the basic structure of such a gate, called 
Capacitor-Diode FET Logic (CDFL). The gate area is increased as a result 
of adding the level shifting diodes but the low power dissipation is still 
maintained since the current through them can be made very small. 
[Figure 3.3 schematics: logic and driver sections, level shifter, supplies VDD and VSS, outputs OUT1 to OUTn; panels (a) and (b). Key: DCAP = capacitor diode.]
Figure 3.3 (a) CCFL gate configuration, (b) CDFL gate configuration. 
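The two quantities that set the behaviour of a capacitor-coupled stage, namely the fraction of the input swing that reaches the TPD gate and the time for which a level can be held before leakage erodes it, can be estimated with a first-order Python sketch. It treats the coupling diode and the reverse-biased gate-source junction as a simple capacitive divider, and assumes a constant leakage current and a square-wave input that must hold each level for half a period; all component values are hypothetical.

```python
def coupled_fraction(c_cap, c_gs):
    """Fraction of the input swing transferred by the series capacitive divider."""
    return c_cap / (c_cap + c_gs)


def min_operating_frequency(c_cap, c_gs, i_leak, max_droop_v):
    """First-order lower frequency limit for a square wave with 50% duty cycle.

    The coupled node (total capacitance c_cap + c_gs) may droop by at most
    max_droop_v while a level is held, i.e. for half a period.
    """
    t_hold = (c_cap + c_gs) * max_droop_v / i_leak   # seconds
    return 1.0 / (2.0 * t_hold)


if __name__ == "__main__":
    c_cap, c_gs = 40e-15, 10e-15  # hypothetical capacitances (F)
    print(f"coupled fraction = {coupled_fraction(c_cap, c_gs):.2f}")
    f_min = min_operating_frequency(c_cap, c_gs, i_leak=1e-9, max_droop_v=0.1)
    print(f"f_min ~ {f_min / 1e3:.0f} kHz")
```

Both results improve as the coupling capacitor grows relative to the gate-source junction, at the cost of area.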
3.3 Normally-OFF Logic Gates 
Normally-OFF logic includes Quasi-FET Logic (QFL) and the Direct-
Coupled FET Logic (DCFL). These utilise EFETs as switching devices and 
have become increasingly popular as their yield is constantly being 
improved. 
a) Quasi-FET Logic (QFL) [66] 
The development of the Normally-OFF logic gates was hampered by the 
lack of maturity of GaAs processing in the 1970s and early 1980s. The major 
obstacle was the variation in threshold voltage across the wafer. The QFL 
gate was invented to allow a wider spread in threshold voltage (-0.4 
to 0.1 V) with little effect on the noise margin of the gate. The gate 
consists of a logic circuit and a level shift circuit, as shown in Figure 3.4. 
The insensitivity of the gate performance to process variation is due to 
the level shift circuit. However, the circuit is operated in strong 
overdrive, with the supply voltage set at 2.5 V, resulting in increased 
power dissipation. Unlike the Normally-ON logic gates (with the exception 
of the CCFL gate), the QFL gate requires only one supply rail, yet it 
achieves comparable dynamic performance. 
[Figure 3.4 schematic: logic section, output OUT1, supply VSS.]
Figure 3.4 QFL gate configuration. 
b) Direct-Coupled FET Logic (DCFL) [67] [68] [69] [70] 
Figure 3.5a shows the basic structure of a DCFL gate. It consists of a 
DFET load (pull-up, TL) and an EFET switch (pull-down, TS), and closely 
resembles an nMOS gate. DCFL is much simpler than the other gates 
mentioned so far, which leads to a higher packing density. DCFL gates 
with faster switching (gate delays of about 15 ps) than any other GaAs 
logic gate have been fabricated. These results, however, were obtained 
with a large supply voltage of 4 V, which causes the pull-down FET to be 
heavily forward biased, reducing the reliability of the gate. At more 
realistic supply voltages, between 1 and 2 V, DCFL gate delays are 
slightly greater than those of the BFL gate. The main drawback of this 
type of gate is that the allowable output voltage swing is about 0.8 V, 
equal to the barrier height of the Schottky gate diode of the driven EFET. 
Therefore, only a small voltage swing can be expected from DCFL circuits, 
resulting in small noise margins. Also, DCFL gates have a poor load drive 
capability, which could severely limit the performance of large circuits 
with high fanout and long interconnect lines. 
A possible solution to low noise margin and poor fanout capability is to 
use a super-buffer configuration, as shown in Figure 3.5b. The output 
stage consists of a load driver (TD, connected as a source follower) and a 
pull-down (TPD) EFET. They can be appropriately sized to drive a given 
capacitive load. The problem with the super-buffer configuration is that 
when the output switches from logic 'high' to logic 'low', both the TD 
and TPD transistors are hard ON for a short period of time. Therefore a 
current spike appears, with a momentary voltage drop in the supply line 
[71]. With many of these gates in a VLSI circuit switching at the same 
time, large voltage drops could be observed in the supply rail, giving rise 
to incorrect logic operation. The use of the super-buffer configuration 
therefore necessitates careful design of the supply lines. 
Another approach to improving the noise margin and fanout of the DCFL 
gate is to use the Source follower DCFL (SDCFL) gate [72]. Figure 3.5c 
shows the SDCFL gate configuration. The source follower stage can be 
sized to drive a given load, and due to the action of TD high values of 
noise margin can be obtained. 
[Figure 3.5 schematics: (a) DCFL gate with inputs IN1 to INn and outputs OUT1 to OUTn; (b) super-buffer inverter with supply VDD; (c) SDCFL inverter with logic and source follower stages.]
Figure 3.5 (a) DCFL gate configuration, (b) Super-buffer inverter, (c) 
SDCFL inverter. 
3.4 Suitable Logic Gates for GaAs VLSI 
The logic gate requirements for high speed VLSI circuits are explained in 
chapter one. They are, apart from high speed, low power dissipation and 
small area. The prospects of such gates for VLSI implementation are 
summarised by K. Lehovec et al. [46]. Taking the area of the logic gates 
into consideration, BFL and CCFL (> 1000 μm²) are limited to MSI 
complexity and the SDFL (> 500 μm²) gate can be used only for LSI 
structures. In other words, Normally-ON logic gates are not suitable for 
VLSI on the basis of area alone. 
Even with a larger chip area, these gates cannot satisfy the power 
requirements for VLSI. The high power dissipation of the BFL gate 
(40mW) limits the integration level to MSI. CCFL and SDFL gates, with 
power dissipations of 2.5mW and 3.5mW respectively, can achieve only 
LSI complexity. According to H.C. Josephs [73] the power restriction for 
a high speed VLSI circuit would require logic swings of less than 1.8V. 
Further increase in the level of integration to Ultra Large Scale would 
require a voltage swing of 0.8V or less. 
Therefore the DCFL gate, with its small area (≈200µm²), low power
dissipation (0.1-0.2mW) and low supply voltage (1-2V), as well as its
circuit simplicity, is by far the strongest contender for GaAs VLSI
implementation. The SDCFL gate, of comparable delay and power
dissipation, can also be used in conjunction with the DCFL gate to improve
the fanout and interconnect drive capability. To show this, a detailed
analysis of the SDCFL and DCFL gates is presented in section 3.7. These
gates form the basis of the designs presented in the following chapters.
3.5 First Order Design of DCFL and SDCFL Gates 
The design of logic gates involves the determination of optimum transistor 
sizes. This stage is very important in the design process as the 
performance of the overall circuit is directly determined by the 
performance of the logic gates. 
We begin by using the device model to give a first order approximation
and an insight into the parameters influencing the choice of transistor
sizes for DCFL and SDCFL gates. This is followed by detailed computer
simulations for various input/output conditions, supply voltages, etc., to
find the optimum transistor ratios.
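Figure 3.6a shows two basic DCFL inverters with their typical
interconnections. The current equations for the load DFET (I_L) and the
switch EFET (I_s) are as follows [74]:
$$I_L = \beta_L (-V_{tL})^2 \tanh(\alpha [V_{DD} - V_o]) \qquad (3.1a)$$
$$I_s = \beta_s (V_{in} - V_{ts})^2 \tanh(\alpha V_o) \qquad (3.1b)$$
Equating the two currents and using equation 2.3, we obtain:
$$\frac{W_S}{W_L} = \frac{a_S (-V_{tL})^2 \tanh(\alpha [V_{DD} - V_o])}{a_L (V_{in} - V_{ts})^2 \tanh(\alpha V_o)} \qquad (3.2)$$
For V_in = V_o = 0.4V with V_DD = 0.8V, so that V_DD - V_o = V_o and the
two tanh terms cancel, equation 3.2 reduces to the form:
$$\frac{W_S}{W_L} = \frac{a_S (-V_{tL})^2}{a_L (0.4 - V_{ts})^2} \qquad (3.3)$$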
From equation 3.3 the ratio of the transistor widths can be determined for
various values of load and switch threshold voltages, as shown in Figure
3.7. For an implant depth ratio (a_L/a_S) of 2:1, the transistor width ratio
is reduced by a factor of about three when the switch threshold voltage is
varied from 250 to 150mV. The same effect is observed when the magnitude
of the load threshold voltage is reduced from 900 to 500mV. The smaller
device ratio results in smaller logic gates and ultimately a smaller overall
circuit. This justifies the choice of the threshold voltages given in
Table 2.1.
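As a quick numerical check of equation 3.3, the Python sketch below evaluates W_S/W_L for the threshold changes quoted above. The function name, the fixed thresholds used in the example (-0.7V for the load, 0.2V for the switch) and the default implant depth ratio a_L/a_S = 2 are illustrative assumptions; voltages are in volts, and 0.4V is the bias point used to derive equation 3.3.

```python
def width_ratio(v_ts, v_tl, depth_ratio_l_to_s=2.0, v_bias=0.4):
    """Switch-to-load gate width ratio W_S/W_L from equation 3.3.

    v_ts: switch (EFET) threshold voltage in volts (positive)
    v_tl: load (DFET) threshold voltage in volts (negative)
    depth_ratio_l_to_s: implant depth ratio a_L/a_S (2:1 assumed here)
    v_bias: node voltage assumed in equation 3.3 (0.4 V)
    """
    a_s_over_a_l = 1.0 / depth_ratio_l_to_s
    return a_s_over_a_l * (-v_tl) ** 2 / (v_bias - v_ts) ** 2


# Switch threshold lowered from 250 mV to 150 mV, load threshold fixed at -0.7 V
print(width_ratio(0.25, -0.7) / width_ratio(0.15, -0.7))   # about 2.78
# Load threshold magnitude lowered from 900 mV to 500 mV, switch threshold fixed at 0.2 V
print(width_ratio(0.2, -0.9) / width_ratio(0.2, -0.5))     # about 3.24
```

Both ratios come out close to the factor of three quoted above, independently of the fixed threshold chosen for the other device.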
[Schematics: (a) two DCFL inverters, each with a load DFET and a switch EFET; (b) two SDCFL inverters, each with a logic stage followed by a source-follower driver and a current-sink DFET.]
Figure 3.6 (a) Two DCFL inverters with their typical interconnections.
(b) Two SDCFL inverters with their typical interconnections.
[Plot: gate width ratio versus V_ts, |V_tL| (mV); a second horizontal axis shows the supply voltage (V).]
Figure 3.7 The gate width ratio (W_S/W_L) as a function of V_t. The solid
lines are for the implant depth ratio of 2:1 and the dashed lines are for
a ratio of 4:1.
The effect of the supply voltage, derived from equation 3.2, is also shown
in Figure 3.7 (dash-dotted line). Above the gate built-in potential (0.8V),
the effect of the supply voltage is minimal, so the supply voltage could be
set at 0.8V. In practice, however, it is set to a higher value (1-2V) to
allow for supply voltage variations.
Figure 3.6b shows two SDCFL inverters with their typical
interconnections. The logic part is the same as in the DCFL gate, and
equation 3.2 can be used to determine the ratio of the active load (T_D) to
the logic switch (T_S). The driver is added to improve the noise margin and
the speed of the gate, and the size of this stage is determined by the
output drive requirement. The input transistor sizing is therefore
independent of the output drive requirements. However, the size ratio of
the input switch to that of the driver load influences the gate intrinsic
delay: the smaller the ratio, the longer the gate intrinsic delay.
3.6 Definition of Design Parameters 
In the following section the gates are evaluated in terms of noise margin, 
propagation delay and power dissipation. There are various definitions for 
these parameters. In order to avoid confusion, the definitions used in our 
analysis are given below. 
a) Noise margin 
In the evaluation of the gates, we are interested in the worst case noise
margin. Therefore only the static noise margin is considered, which is
found graphically using the 'mirror-and-maximum-square' method [75]
[76]. In this approach, noise of equal and opposite amplitude is applied to
the inputs of a flip-flop and the noise margin is measured as shown in
Figure 3.8.
[Plot of V_OUT (V); axis ticks at 0.2, 0.4, 0.6 and 0.8V.]
Figure 3.8 Noise margin calculation.
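The maximum-square construction lends itself to a simple numerical procedure. The Python sketch below is illustrative only: it assumes a smooth tanh-shaped transfer curve (make_vtc, with hypothetical output levels of 0.1V and 0.7V) in place of a SPICE-simulated DCFL or SDCFL characteristic and, because the two inverters of the flip-flop are taken as identical, it searches only one lobe of the butterfly plot. For each bottom-left corner placed on the mirrored curve, the square is grown by bisection until its top-right corner reaches the transfer curve; the largest side found is the noise margin.

```python
import math


def make_vtc(v_ol=0.1, v_oh=0.7, v_m=0.35, gain=20.0):
    """Hypothetical inverter transfer curve V_out = f(V_in), decreasing from v_oh to v_ol."""
    def f(v_in):
        return v_ol + (v_oh - v_ol) * 0.5 * (1.0 - math.tanh(gain * (v_in - v_m)))
    return f


def max_square_noise_margin(f, v_max, steps=2000, iterations=60):
    """Side of the largest square in one lobe of the butterfly plot formed by
    the curve y = f(x) and its mirror image x = f(y).
    v_max should be at least the high output level of f."""
    best = 0.0
    for k in range(steps + 1):
        y = v_max * k / steps
        x = f(y)                      # bottom-left corner (x, y) lies on the mirror curve
        if f(x) - y <= 0.0:
            continue                  # this corner is outside the lobe
        lo, hi = 0.0, v_max           # f(x + s) - (y + s) is strictly decreasing in s
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if f(x + mid) - (y + mid) > 0.0:
                lo = mid              # top-right corner is still below the transfer curve
            else:
                hi = mid
        best = max(best, lo)
    return best


soft = max_square_noise_margin(make_vtc(gain=10.0), v_max=0.7)
sharp = max_square_noise_margin(make_vtc(gain=40.0), v_max=0.7)
print(f"noise margin: {soft * 1e3:.0f} mV (gain 10), {sharp * 1e3:.0f} mV (gain 40)")
```

A steeper transition gives a larger square, which is the behaviour expected when comparing gates with sharper transfer characteristics.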
There are several other definitions of noise margin which can give results
slightly conflicting with the above method [77] [78]. In our analysis,
however, a detailed comparison of the gates is presented and only the
relative values of the noise margins are of interest. Therefore, irrespective
of the method used, the final conclusions should be the same. The
absolute values should, however, be confirmed by measurements on real
devices.
b) Propagation delay 
The propagation delay is defined as the average of t_r and t_f,
$$t_{pd} = \frac{t_r + t_f}{2}$$
where t_r and t_f are shown graphically in Figure 3.9 [79].
[Plot of voltage waveforms.]
Figure 3.9 Delay time calculation.
c) Power dissipation 
The power dissipation consists of static and dynamic components. For 
high speed circuits, the dynamic component of the power dissipation is 
significant and must be included in the calculations [80]. 
A general formula for the power dissipation of a DCFL gate is:
$$P_{DCFL} = V_{DD} \times I_D + C_l \times (V_{bi})^2 \times f \qquad (3.4)$$
where V_DD is the supply voltage, I_D is the DC current supplied by V_DD,
C_l is the load capacitance, V_bi is the output voltage swing and f is the
operating frequency.
For the SDCFL gate, the power dissipation of the source follower
stage must be added to the above expression:
$$P_{SDCFL} = V_{DD} \times (I_{D1} + I_{D2}) + f \times (V_{bi})^2 \times (4C_1 + C_2) \qquad (3.5)$$
where I_D1, C_1 and I_D2, C_2 are the currents and load capacitances of the
logic and the source follower stages, respectively. The above equation is
derived under the assumption that the voltage swing at the output of the
logic stage is twice the built-in voltage, so that its dynamic term is
C_1 × (2V_bi)² × f = 4C_1 × (V_bi)² × f.
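Equations 3.4 and 3.5 are straightforward to evaluate. The Python sketch below codes them directly in SI units; the function names are hypothetical, and the operating points in the example (1V supply, 100µA, 50fF load, 0.8V swing, 1GHz) are chosen only to show the relative size of the static and dynamic terms, not taken from this work.

```python
def p_dcfl(v_dd, i_d, c_l, v_bi, f):
    """Equation 3.4: static power V_DD*I_D plus dynamic power C_l*V_bi^2*f (SI units)."""
    return v_dd * i_d + c_l * v_bi ** 2 * f


def p_sdcfl(v_dd, i_d1, i_d2, c_1, c_2, v_bi, f):
    """Equation 3.5: the logic stage swings 2*V_bi (hence 4*C_1), the follower swings V_bi."""
    return v_dd * (i_d1 + i_d2) + f * v_bi ** 2 * (4 * c_1 + c_2)


# Hypothetical DCFL operating point: 1 V, 100 uA, 50 fF, 0.8 V swing, 1 GHz
print(p_dcfl(1.0, 100e-6, 50e-15, 0.8, 1e9))                     # about 0.13 mW
# Hypothetical SDCFL operating point: 1.4 V, 100 uA + 200 uA, 20 fF and 50 fF loads
print(p_sdcfl(1.4, 100e-6, 200e-6, 20e-15, 50e-15, 0.8, 1e9))   # about 0.50 mW
```

At the DCFL operating point the static term (0.1mW) still dominates the dynamic term (0.032mW), but the dynamic term grows linearly with frequency.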
The term average power dissipation, as used in the following chapters, is
derived by averaging the instantaneous power dissipation over one clock
period; it therefore includes both the static and dynamic components of
the power dissipation.
3.7 Detailed Analysis of DCFL and SDCFL Gates 
Having introduced the terms used in the analysis of the logic gates, the 
following gives the results of detailed SPICE simulations performed to 
evaluate the suitability of DCFL and SDCFL gates for VLSI. 
a) Effect of device width ratio on gate performance 
Figure 3.10 shows the effect of the load-to-switch gate width ratio of the
DCFL gate, and of the driver-load to logic-switch gate width ratio of the
SDCFL gate, on noise margin and propagation delay. An increase in device
width ratio degrades the noise margin and improves the speed of both the
DCFL and SDCFL gates. For the entire range of device ratios, the noise
margin of the SDCFL gate is at least twice that of the DCFL gate. For the
same propagation delay of about 60ps, the SDCFL gate shows a fourfold
improvement in noise margin over the DCFL gate.
[Plot: noise margin and propagation delay versus SDCFL driver-load to logic-switch gate width ratio (W_D/W_S, top axis) and DCFL load-to-switch gate width ratio (W_L/W_S, bottom axis, 0.2 to 0.6).]
Figure 3.10 Noise margin and propagation delay of the DCFL (solid
lines) and SDCFL (dashed lines) gates as a function of the gate width
ratios.
The most important criteria in the design and evaluation of the gates are
the noise margin and the propagation delay. The former guarantees the
correct functionality of the circuit and the latter determines the
dynamic performance of the overall circuit. Power dissipation is given
a lower priority, since its value for DCFL and SDCFL gates is very low
compared with other logic families.
For optimum gate performance in terms of noise margin and delay, the
width ratio of the driver-load (T_D) to logic-switch (T_S) of the SDCFL gate
is set to 8:10. In order to optimise the area, the logic-load (T_L) and
current-sink (T_CS) gate widths are set to minimum geometry. By the
same criteria, the load (T_L) to switch (T_S) ratio of the DCFL gate is set
to 4:16, with a minimum geometry load gate width. The absolute values of
the transistor sizes are given in Figures 3.6a and 3.6b.
b) Effect of supply voltage on the gate performance
The relationship between the propagation delay and power dissipation of
the gates is given in Figure 3.11.
[Plot: propagation delay (ps) versus DCFL power dissipation (µW, bottom axis) and SDCFL power dissipation (µW, top axis), with points labelled by V_DD from 0.6 to 2V.]
Figure 3.11 The propagation delay of DCFL and SDCFL gates versus
their power dissipation for different values of the supply voltage.
Since the output voltage swing is limited by the Schottky barrier height
of the driven FET, high values of the supply voltage result in higher
power dissipation without any useful increase in speed. The same is
observed for the noise margin of the gates. As shown in Figure 3.12, the
noise margin of the DCFL gate remains constant for supply voltages
above 1V. For the SDCFL gate, the noise margin is improved by 30mV for
an increase in supply voltage from 1.4 to 2V. This, however, doubles the
power dissipation for only a 15ps reduction in delay.
[Plot: noise margin versus DCFL supply voltage (0.4 to 1.4V, bottom axis) and SDCFL supply voltage (1.0 to 2.0V, top axis).]
Figure 3.12 The noise margin of the DCFL and SDCFL gates as a
function of the supply voltage.
In order to maintain a constant current supplied by the pull-up FETs
(the load in DCFL, and the logic-load and driver-load in SDCFL), the
supply rail voltages for DCFL and SDCFL gates are set to a minimum of
1 and 1.4V, respectively. This is to account for any voltage variations in
the supply rail.
c) Fanout and fanin sensitivity of the gates 
The drive capability of the gates is important in large circuits, since the
fanout loading increases with circuit complexity. As the number of
driven gates is increased, the current into the gates of the switch FETs
is further subdivided. There is therefore less voltage across them,
resulting in a degradation of the logic high level, which in turn limits
the fanout of the gate. The effect of fanout on the noise margin and delay
of the gates is shown in Figure 3.13.
[Plot: noise margin and propagation delay versus fanout.]
Figure 3.13 Noise margin and propagation delay of the DCFL (solid
lines) and SDCFL (dashed lines) gates as a function of fanout.
The SDCFL gate maintains a noise margin which is at least twice that of
the DCFL gate for a fanout range of 1 to 5. Table 3.1 shows that, under
fanout loading, the delay and noise margin of the SDCFL gate can be
further improved by increasing the widths of the FETs in the driver stage
while maintaining the nominal ratio of 2:1. This will, however, increase
the area and power dissipation of the gate and should only be considered
for heavy fanout loading.
Driver ratio   Noise margin (mV)      Delay (ps)
               FO=1  FO=3  FO=5       FO=1  FO=3  FO=5
8/4            127   105   91         72    185   290
12/6           140   110   101        75    120   205
Table 3.1 Effect of varying the widths of the FETs in the driver stage of
the SDCFL gate (while maintaining the same ratio).
Both gates are very sensitive to fanin loading. This is due to the low OFF
resistance of the MESFETs, which results in a leakage current through
the pull-down FETs, degrading the noise margin of the gates. The delay
also increases with fanin as a result of the added stray capacitances.
The effect of fanin on the delay of the gates is given in Table 3.2. In order
to avoid overall performance degradation, the fanin is set to a maximum
of 3.
Type of gate   Delay (ps)
               FI=1   FI=3
DCFL           100    133
SDCFL          72     128
Table 3.2 Effect of fanin on the delay of the SDCFL and DCFL gates.
The analyses show that the DCFL gate should be used for the basic logic
elements within a GaAs VLSI circuit. Small area and low power
dissipation are the main reasons for this choice. As demonstrated in
Figure 3.13, however, the gate is very sensitive to fanout loading. In fact,
the maximum tolerable fanout is 5, beyond which the noise margin
becomes too small for reliable circuit operation.
On the other hand, the SDCFL gate shows superior performance to the
DCFL gate in terms of noise margin and speed, but it consumes more
power and area. A noise margin improvement of better than fourfold is
possible with a power dissipation of three to five times that of the DCFL
gate. The use of the SDCFL gate is therefore particularly advantageous
where the fanout loading is high. Both gates should be utilised to
complement each other in high speed, low power and reliable GaAs VLSI
circuits.
3.8 Design of Buffering Schemes for GaAs VLSI Circuits
Having introduced the basic gates for GaAs VLSI, the next step is to 
design appropriate buffering schemes for driving large loads. This is 
particularly important for the clock drivers required in any synchronous 
VLSI circuit. There are two important issues which must be addressed, 
namely the effect of wiring and high fanout count. 
Wiring accounts for up to 50% of the total delay in large GaAs
circuits [55] [81] [82]. As the length of the interconnect lines increases
with circuit complexity, the RC time delay of the lines can seriously
degrade the performance. For 'sufficiently small' wire lengths, RC delays
can be ignored. The lines can then be treated as one electrical node and
modelled as simple capacitive loads. This assumption holds if either of the
following inequalities is true [83]:
$$\tau_w < \tau_g \qquad (3.6)$$
or:
$$R_{int} < 2.3 \times R_{on} \qquad (3.7)$$
where τ_w is the delay through the wire, τ_g is the gate delay, R_on is the
ON resistance of the driver FET and R_int is the resistance of the
interconnect line.
The interconnect delay can be estimated by:
$$\tau_w \approx \frac{r \times c \times l^2}{2} \qquad (3.8)$$
where r is the resistance per unit length, c is the capacitance per unit
length and l is the length of the wire.
Substituting equation 3.8 into equation 3.6 gives:
$$l < \sqrt{\frac{2 \times \tau_g}{r \times c}} \qquad (3.9)$$
Substituting the typical values for r (≈ 0.023Ω/µm) and c (≈ 0.05fF/µm)
and an average gate delay of 100ps gives a maximum line length of about
13mm. As a conservative design guide, the maximum length of a line with
capacitive behaviour should be set to 4mm. The same order of magnitude
for the line length can be obtained using equation 3.7. Typical values
for R_on are in the range of 40 to 400Ω, depending on the bias voltages
and the frequency of operation. For MESFETs with a 10GHz operating
frequency and dimensions of W = 10µm and L = 1µm, and with the typical
bias conditions required in DCFL and SDCFL gates, the value of R_on is
about 250Ω. Using equation 3.6 and the previous value of r, the maximum
length of a capacitive line would be of the order of 4.3mm.
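The arithmetic behind the 13mm estimate can be reproduced directly. The Python sketch below (function names hypothetical) evaluates equation 3.9 with r ≈ 0.023Ω/µm, c ≈ 0.05fF/µm and τ_g = 100ps, and uses equation 3.8 to estimate the RC delay of a line at the conservative 4mm limit. Lengths are in µm, r in Ω/µm and c in F/µm, so delays come out in seconds.

```python
import math


def wire_delay(r, c, length):
    """Equation 3.8: distributed RC delay r*c*l^2/2 (r in ohm/um, c in F/um, l in um)."""
    return r * c * length ** 2 / 2.0


def max_capacitive_length(r, c, tau_g):
    """Equation 3.9: longest line whose RC delay stays below the gate delay tau_g (s)."""
    return math.sqrt(2.0 * tau_g / (r * c))


r = 0.023        # ohm per um
c = 0.05e-15     # F per um (0.05 fF/um)
tau_g = 100e-12  # average gate delay, 100 ps

print(f"l_max = {max_capacitive_length(r, c, tau_g) / 1e3:.1f} mm")      # about 13.2 mm
print(f"delay of a 4 mm line = {wire_delay(r, c, 4000) * 1e12:.1f} ps")  # about 9.2 ps
```

A 4mm line contributes only about 9ps of RC delay, roughly a tenth of the gate delay, which is consistent with treating it as a lumped capacitive load.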
For VLSI applications the length of interconnections can often be longer
than 4mm, and the effect of the RC time delays of the lines must therefore
be considered in delay calculations. As demonstrated by H.B. Bakoglu [84],
this effect can seriously degrade the performance of large circuits and
should be avoided in practice. The solution is to break up these long lines
into segments and add buffers at every stage so as to transform the lines
into capacitive loads. These buffers are commonly termed repeaters and
can be sized for optimum speed performance. The size of these buffers
must be carefully adjusted to drive other gates as well as the interconnect
lines. The following section attempts to define a buffering scheme suitable
for GaAs VLSI implementation.
a) Some useful concepts [85] [86] 
The conventional unit of drive capability is that produced by an inverter.
One method of increasing the drive capability is to WIRE-OR unit
inverters in parallel. For example, the drive strength of the buffer in
Figure 3.14 is 3.
Figure 3.14 Three inverters WIRE-ORed to form a buffer with drive 
strength of 3. 
More inverters can be added to the chain to achieve the required signal
rise and fall times. This, however, loads the previous stage, which
decreases its operating speed. Therefore the drive strength of all the
previous stages must also be increased. The number of inverters in each
stage must be determined to achieve optimum speed. This can be done by
defining a relative fanout for the overall buffer, given by:
$$\text{relative fanout} = \frac{\text{absolute fanout}}{\text{drive strength}}$$
where the absolute fanout is defined as the sum of loads imposed by the
driven gates and the drive strength is the number of gates which are
WIRE-ORed.
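The relative fanout definition can be turned into a simple sizing procedure. The Python sketch below is a hypothetical helper, not the procedure used to generate Figure 3.17: it assumes that every stage is given the same relative fanout and that the chain starts from a single unit inverter, and it works backwards from the load, dividing by the relative fanout at each stage. For the 64-gate load of Figure 3.17 it gives stage drive strengths of [1, 8], [1, 4, 16] and [1, 2, 4, 8, 16, 32] for relative fanouts of 8, 4 and 2.

```python
import math


def buffer_stages(absolute_fanout, relative_fanout):
    """Drive strengths (number of WIRE-ORed unit inverters) of each buffer stage,
    from the input stage to the output stage, so that every stage sees the
    same relative fanout = absolute fanout / drive strength."""
    if relative_fanout < 2 or int(relative_fanout) != relative_fanout:
        raise ValueError("relative fanout must be an integer >= 2")
    strengths = []
    load = absolute_fanout
    while True:
        strength = max(1, math.ceil(load / relative_fanout))
        strengths.append(strength)
        if strength == 1:
            break
        load = strength  # each inverter of this stage is one unit load on the stage before it
    return strengths[::-1]


for rf in (8, 4, 2):
    stages = buffer_stages(64, rf)
    print(rf, stages, "unit inverters:", sum(stages))
```

The total number of unit inverters (9, 21 and 63 respectively) is only a crude proxy for buffer area, but it falls as the relative fanout increases, in the same direction as the area results discussed below.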
b) An optimum relative fanout for GaAs buffers 
The basic gate configurations used to arrive at an optimum value for the 
relative fanout of GaAs buffers are the DCFL, super-buffer (SU) and 
SDCFL gates (see Figure 3.5). 
Three ring oscillators based on the above gates were simulated in SPICE.
The oscillation periods were made equal by adjusting the dimensions of
the FETs, with the delay of each gate set at about 100ps. The gates were
then evaluated in terms of noise margin, power dissipation and area. The
results for the noise margin are given in Figure 3.15.
[Plot: noise margin versus fanout, 0 to 24.]
Figure 3.15 Noise margin of the SDCFL, SU and DCFL buffers with
fanout loading.
It is evident that the DCFL, SU and SDCFL gates should be used in low, 
medium and high fanout situations respectively, to ensure adequate noise 
margins. 
The results for power dissipation and area of the gates are given in Table
3.3. The power dissipation of the SU gate is about one third of that of the
SDCFL gate; the SU gate can therefore be used as a logic element within a
VLSI circuit to provide buffering for high fanout and long interconnect
lines.
The SU gate is also less sensitive to the capacitance of the
high-impedance node (the output of the logic stage in Figures 3.5b and
3.5c). As shown in Figure 3.16, the delay of the SU gate is about 150ps,
whereas the delay of the SDCFL gate is about 200ps for a high-impedance
node capacitance of 40fF. In other words, in terms of delay, it is more
advantageous to implement logic functions with medium to high fanout
load in SU gates.
Gate    Power dissipation (mW)   Area (µm²)
DCFL    0.06                     480
SU      0.5                      1404
SDCFL   1.4                      1560
Table 3.3 Comparison of power dissipation and area of the DCFL, SU
and SDCFL gates.
To find an optimum value for the relative fanout of the above gates, the 
buffers in Figure 3.17 were simulated in SPICE and evaluated in terms 
of delay, area and power dissipation. 
Figure 3.18 shows the delay of the buffers as a function of relative fanout.
In terms of delay, the optimum relative fanout of the DCFL buffer is 4,
for which the delay is about 850ps. Beyond this point the delay increases
due to the high sensitivity of DCFL gates to fanout loading. For the SU
and SDCFL buffers, an increase in relative fanout from 4 to 8 reduces the
delay from 725 to 700ps and from 580 to 535ps, respectively. However,
this improvement is insignificant compared with the sharp reduction in
delay obtained when the relative fanout is increased from 2 to 4 (320ps
for the SU buffer and 350ps for the SDCFL buffer).
[Plot: delay (ps) versus capacitance of the high-impedance node (0 to 40fF) for the SU and SDCFL buffers.]
Figure 3.16 Delay sensitivity of the SU and SDCFL buffers to the
capacitance of the high impedance node.
[Schematics: three buffer chains driven from V_in, each driving a load of 64 DCFL gates.]
Figure 3.17 Three buffering schemes with relative fanouts of 8, 4 and 2.
A very important issue in the design of the buffers (especially for clock
drivers) is to ensure equal signal rise and fall times at the output of the
buffers. The differences between the rise and fall times (skew) for all
three types of buffer are given in Figure 3.19. Minimum skew is achieved
with a relative fanout of 4, for which the skews of the DCFL, SU and
SDCFL buffers are 110, 90 and 12ps, respectively.
[Plot: delay (ps) versus relative fanout (0 to 8) for the DCFL, SU and SDCFL buffers.]
Figure 3.18 Delay versus relative fanout for different buffering schemes.
[Plot: skew (ps) versus relative fanout for the DCFL, SU and SDCFL buffers.]
Figure 3.19 Skew versus relative fanout for different buffering schemes.
The area of the buffers decreases as the relative fanout increases.
As shown in Figure 3.20, there is a sharp decrease in area for a change
of relative fanout from 2 to 4. However, the reduction in area is very small
for relative fanouts greater than 4. At a relative fanout of 4, the areas
of the DCFL, SU and SDCFL buffers are 14×10³, 39×10³ and 45×10³µm²,
respectively.
The buffers were also evaluated in terms of power dissipation, and the
results are shown in Figure 3.21. The power dissipation of the DCFL
buffer is almost constant. The power dissipation of the SDCFL buffer is
the most affected by the change in relative fanout, and is reduced from
31 to 13mW for an increase in relative fanout from 2 to 4.
Based on the above, the optimum relative fanout of all three buffers is 4.
A relative fanout of 8 shows slight improvements in the delay, area and
power dissipation of the SU and SDCFL buffers, whereas only the area
of the DCFL buffer is improved. Once the important issue of equal rise
and fall times is considered (Figure 3.19), a relative fanout of 4 is
the best compromise. Finally, if the buffers are used as clock drivers, the
lines to the driven gates are usually long and their lengths may vary
significantly. If the buffers are sensitive to this variation, the well-known
problem of clock skew may occur. Figure 3.22 shows the sensitivity of the
buffers to this loading. For a large increase in load capacitance from 0.5
to 2pF, the delays of the DCFL, SU and SDCFL buffers increase by 150, 32
and 48ps, respectively.
Based on the results obtained in this chapter, the design of the large 
circuits presented hereafter is based on DCFL gates. Where a clear 
advantage in using the SDCFL gate is expected, the circuits are also 
implemented in SDCFL and their performance is compared to that of the 
DCFL counterpart. Super-buffers are also used as an extension to DCFL 
elements to improve the speed and noise margin of the overall circuit. The 
clock drivers are implemented in SDCFL, with a relative fanout of four 
to drive a particular fanout and interconnect load. 
[Plot: area versus relative fanout (0 to 8).]
Figure 3.20 Area versus relative fanout of different buffering schemes.
[Plot: power versus relative fanout (0 to 8).]
Figure 3.21 Power versus relative fanout of different buffering schemes.
[Plot: delay (ps) versus interconnect capacitance (0.5 to 2.0pF) for the DCFL, SU and SDCFL buffers.]
Figure 3.22 Delay versus interconnect capacitance (relative fanout=4).
CHAPTER 4 
Analysis of Adder Circuits for GaAs VLSI 
Implementation 
4.1 Adder Design Approach [87]
Addition is an essential element in computer arithmetic and is considered 
the workhorse in most digital signal processing systems. At a VLSI level 
of complexity, adder cells are required to be physically small, operate at 
high speed and dissipate minimum power. 
The purpose of this chapter is to evaluate various adder configurations for 
GaAs VLSI implementation. The circuits are based on DCFL gates and 
are fully optimised in terms of speed for a given area and power 
allocation. 
A one-bit full adder adds two binary digits a_i and b_i and a carry
input c_i to produce a sum output s_i and a carry output c_{i+1}. The
outputs are related to the inputs by the following Boolean equations:
$$s_i = a_i \oplus b_i \oplus c_i \qquad (4.1)$$
$$c_{i+1} = a_i b_i + b_i c_i + c_i a_i \qquad (4.2)$$
To implement the one bit adder in GaAs DCFL, the above logical 
expressions must be represented in the equivalent NOR functions : 
(4.3) 
These equations can be mapped directly into DCFL using NOR gates. As
discussed in the previous chapter, the high sensitivity of DCFL gates
to fanin and fanout loading can severely degrade the performance. To
show this effect, two design techniques have been employed. The first
approach is to design for a minimum number of gates, with high fanin and
fanout counts, in order to optimise the area. The only limit imposed on the
design is a maximum fanout of 6, so as to achieve a positive noise margin
under the worst case conditions. This design is called the unbuffered
adder. In the second approach, the fanin and fanout limits are reduced to
achieve optimum speed performance; this is termed the buffered adder.
Figures 4.1a and 4.1b show the circuit diagrams of the unbuffered and
buffered one-bit adders, respectively. The former is the direct
implementation of equations 4.3 and 4.4, while the latter modifies the
equations to accommodate a maximum fanin and fanout of 3.
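To illustrate what a NOR-only realisation involves, the Python sketch below shows one possible mapping of equations 4.1 and 4.2 onto NOR gates with a fanin of 3 or less. It is an illustrative assumption, not necessarily the mapping of equations 4.3 and 4.4 or of Figure 4.1, and the function names are hypothetical. The carry uses two NOR levels (the majority function is self-dual), and the sum is built from (a + b + c)·NOT(carry) + a·b·c; the result is checked exhaustively against the Boolean definitions.

```python
from itertools import product


def nor(*inputs):
    """DCFL-style NOR gate on 0/1 inputs (a single input gives an inverter)."""
    return int(not any(inputs))


def full_adder_nor(a, b, c):
    """One-bit full adder using NOR gates only, each with a fanin of at most 3."""
    carry = nor(nor(a, b), nor(b, c), nor(c, a))   # majority function, equation 4.2
    t1 = nor(nor(a, b, c), carry)                   # (a + b + c) . NOT(carry)
    t2 = nor(nor(a), nor(b), nor(c))                # a . b . c
    s = nor(nor(t1, t2))                            # t1 + t2 = a XOR b XOR c
    return s, carry


# Exhaustive check against equations 4.1 and 4.2
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder_nor(a, b, c)
    assert s == a ^ b ^ c
    assert carry == (a & b) | (b & c) | (c & a)
print("NOR-only full adder matches equations 4.1 and 4.2")
```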
The delay through the carry chain, τ_C, is given by:
$$\tau_{C(unbuffered)} = \tau_{G_1(1,5)} + \tau_{G_2(F_i,F_o)} + \tau_{G_3(F_i,F_o)} \qquad (4.5)$$
$$\tau_{C(buffered)} = \tau_{G_1(F_i,F_o)} + \tau_{G_2(F_i,F_o)} + \tau_{G_3(F_i,F_o)} \qquad (4.6)$$
where τ_{Gn(F_i,F_o)} is the delay through the nth gate of the carry path,
with a fanin of F_i and a fanout of F_o (as in Figures 4.1a and 4.1b).
A general formula was derived (see Appendix B) for the delay of DCFL
gates [88]:
$$\tau_{G(F_i,F_o)} = 40 \times [1 + 0.28 \times F_i + 1.2 \times F_o] + 1840 \times C_l \qquad (4.7)$$
where C_l is the loading capacitance of the gate in femtofarads.
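Equation 4.7 makes the cost of fanout explicit: with the coefficients as given, each additional fanout adds 40 × 1.2 = 48ps, whereas each additional fanin input adds only 40 × 0.28 ≈ 11ps. The Python sketch below tabulates the intrinsic delay (C_l = 0) for fanins of 1 to 3 and fanouts of 1 to 6, assuming that equation 4.7 gives the delay in picoseconds; dcfl_gate_delay is a hypothetical name, and the capacitance term is included only for completeness.

```python
def dcfl_gate_delay(fan_in, fan_out, c_l=0.0):
    """Equation 4.7: DCFL gate delay (taken to be in ps) for a given fanin and fanout.
    c_l is the load capacitance in the units used by equation 4.7; only the
    intrinsic case c_l = 0 is exercised below."""
    return 40.0 * (1.0 + 0.28 * fan_in + 1.2 * fan_out) + 1840.0 * c_l


for fi in (1, 2, 3):
    row = [round(dcfl_gate_delay(fi, fo), 1) for fo in range(1, 7)]
    print(f"fanin {fi}:", row)
```

For a fanin and fanout of 1 the formula gives about 99ps, in line with the 100ps DCFL delay listed in Table 3.2.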
Substituting equation 4.7 into equations 4.5 and 4.6 gives carry chain
delays of 536 and 435ps for the unbuffered and buffered adders,
respectively. Clearly, if the one-bit adder is to be cascaded to form a long
ripple carry chain, the buffered adder should be used for optimum speed.
Both designs should also be evaluated in terms of power dissipation, area
and sensitivity to interconnect to achieve the best compromise. For
example, in the case of the ripple-carry adder, a fanout limit should be
imposed on the carry block to improve the speed, while the unbuffered
sum block in Figure 4.1a may be used to reduce the overall area and
power dissipation.
Figure 4.1 Logic diagram of the one-bit RC adder a) unbuffered b) 
buffered. 
This design technique is used in the implementation of the adders 
discussed in this chapter, and forms a basis for selecting a particular type 
of adder suitable for GaAs VLSI. 
4.2 Types of Adder [89] [90]
Adder circuit configurations are presented in this section. They range 
from the simple and slow versions like the Ripple-Carry adders to the 
high speed and more complex implementations such as the Carry-Look-
ahead adders. Furthermore, the buffered and unbuffered versions of each 
adder type are given to show the trade-offs in speed, power and area. 
a) Ripple-Carry adder 
The block diagram of a Ripple-Carry (RC) adder is shown in Figure 4.2. 
The logic diagrams for the Sum and Carry generator blocks of the 
unbuffered RC adder are given in Figure 4.1a. The buffered version is 
realised by a fanout reduction on the Carry generator block as shown in 
Figure 4.1b. 
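The linear growth of the RC adder delay with word length can be seen from a behavioural model. The Python sketch below (ripple_carry_add is a hypothetical name) applies equations 4.1 and 4.2 bit by bit and also records how far a carry travels along the chain for a given pair of operands; in the worst case the carry ripples through every stage, so the critical path grows with the number of bits.

```python
def ripple_carry_add(a, b, c0=0, n_bits=8):
    """n-bit ripple-carry addition using equations 4.1 and 4.2 bit by bit.
    Also returns the longest distance (in stages) that a carry rippled,
    which sets the critical path of the RC adder for these operands."""
    carry, total, run, longest = c0, 0, 0, 0
    for i in range(n_bits):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i                    # sum bit, equation 4.1
        if ai & bi:
            run = 1                                        # carry generated in this stage
        elif (ai ^ bi) and carry:
            run += 1                                       # incoming carry propagated
        else:
            run = 0                                        # carry killed or absent
        longest = max(longest, run)
        carry = (ai & bi) | (bi & carry) | (carry & ai)    # carry out, equation 4.2
    return total, carry, longest


print(ripple_carry_add(0b11111111, 0b00000001))   # carry ripples through all 8 stages
```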
[Block diagram: a chain of carry blocks with inputs (a_0, b_0) to (a_n, b_n) and carries C_0 to C_n, feeding a sum block with outputs S_0 to S_n.]
Figure 4.2 Block diagram of the RC adder.
b) Carry-Look-ahead adder [91] [92] 
The speed of the RC adder can be improved by calculating the carries to 
each stage in parallel. In other words, the carries are generated 
simultaneously resulting in a constant addition time irrespective of the 
number of bits. 
The circuitry required to generate the parallel carries is derived using
the following equations:
$$S_i = P_i \oplus C_{i-1} \qquad (4.8)$$
$$C_i = G_i + P_i C_{i-1} \qquad (4.9)$$
where:
$$G_i = a_i \cdot b_i \qquad (4.10)$$
$$P_i = a_i \oplus b_i \qquad (4.11)$$
G_i and P_i are called the carry generate and propagate functions, and
they are derived directly from the inputs a_i and b_i. The recursive
equation 4.9 can be applied repeatedly to obtain the required set of carry
signals. The equations for an n-bit Carry-Look-ahead (CL) adder are as
follows:
$$\begin{aligned}
C_1 &= G_1 + C_0 P_1\\
C_2 &= G_2 + G_1 P_2 + C_0 P_2 P_1\\
&\;\;\vdots\\
C_k &= G_k + G_{k-1} P_k + G_{k-2} P_{k-1} P_k + \dots + G_1 P_2 \cdots P_k + C_0 P_1 P_2 \cdots P_k\\
&\;\;\vdots\\
C_n &= G_n + G_{n-1} P_n + \dots + C_0 P_1 P_2 \cdots P_n
\end{aligned} \qquad (4.12)$$
These equations should be transformed into their equivalent NOR form
for GaAs DCFL implementation. The logic diagram of a 4-bit CL
generator is given in Figure 4.3a. As the size of the CL generator is
expanded, the fanin and fanout limitations of the DCFL gates are quickly
reached. The number of carry-look-ahead bits should therefore be limited
to 2, 4 or 8, depending on the speed requirement. For GaAs DCFL
implementation, this limit is set to 4 (section 4.3). The 4-bit CL blocks are
then abutted, as illustrated in Figure 4.3b, to form an n-bit adder.
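Equation 4.12 can be checked in software against ordinary integer addition. The Python sketch below (cla_add and to_bits are hypothetical names) forms every carry C_k directly from the G's, P's and C_0 using the unrolled sum of products, so no carry depends on the one below it, and then forms the sum bits with equation 4.8. Bits are numbered 1 to n as in equation 4.12 and stored least significant first.

```python
def cla_add(a_bits, b_bits, c0=0):
    """Carry-look-ahead addition following equations 4.8 to 4.12.

    a_bits[k-1], b_bits[k-1] hold a_k, b_k for k = 1..n (least significant first).
    Each carry C_k is computed from the G's, P's and C_0 alone (equation 4.12)."""
    n = len(a_bits)
    G = [None] + [a & b for a, b in zip(a_bits, b_bits)]   # G_k = a_k . b_k    (4.10)
    P = [None] + [a ^ b for a, b in zip(a_bits, b_bits)]   # P_k = a_k XOR b_k  (4.11)
    C = [c0]
    for k in range(1, n + 1):
        c_k, product = G[k], P[k]
        for j in range(k - 1, 0, -1):
            c_k |= G[j] & product       # term G_j . P_(j+1) ... P_k
            product &= P[j]
        C.append(c_k | (c0 & product))  # final term C_0 . P_1 ... P_k
    S = [P[k] ^ C[k - 1] for k in range(1, n + 1)]          # S_k = P_k XOR C_(k-1)  (4.8)
    return S, C[n]


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


s, c_out = cla_add(to_bits(0b1011, 4), to_bits(0b0110, 4))
print(s, c_out)   # [1, 0, 0, 0] 1  (11 + 6 = 17)
```

In hardware the inner loop corresponds to the wide AND-OR terms of equation 4.12, which is why the fanin and fanout of the DCFL gates limit the block to 4 bits.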
[Diagrams: (a) 4-bit carry-look-ahead generator formed from the G_i and P_i signals; (b) n-bit adder built from 4-bit carry-look-ahead generators and sum blocks.]
Figure 4.3 a) Logic diagram of a 4-bit CL generator, b) An n-bit adder
constructed using the 4-bit CL generators.
c) Carry Select adder [93] [94] 
Another approach to speeding up the addition cycle is to use the Carry
Select (CS) scheme. The basic structure of a CS adder is shown in Figure
4.4. Two n-bit ripple-carry adders are built, one with a zero and the other
with a one carry input. The carry from the previous stage is used to select
the output of the appropriate adder using a multiplexer. The carry output
to the next stage is determined from the previous carry and the carry
outputs of the two n-bit adders. The value of n was set to 4, so that the
adder can easily be expanded from 4 to 32 bits. The buffered CS adder is
also implemented by applying a fanout reduction to the 4-bit adders.
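The selection mechanism can be sketched behaviourally. The Python model below is not gate level: ripple_block and carry_select_add are hypothetical names, the block size of 4 follows the text, and the two precomputed blocks, which operate concurrently in hardware, are simply evaluated one after the other. The carry passed to the next block is formed from the incoming carry and the two block carries as c0 + carry·c1, which is one way of realising the rule described above (c1 is always 1 whenever c0 is).

```python
def ripple_block(a, b, carry_in, width=4):
    """width-bit ripple-carry block; returns (sum, carry_out)."""
    total, carry = 0, carry_in
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def carry_select_add(a, b, carry_in=0, width=32, block=4):
    """Carry Select addition built from block-bit ripple-carry adders."""
    mask = (1 << block) - 1
    result, carry = 0, carry_in
    for k in range(0, width, block):
        a_blk, b_blk = (a >> k) & mask, (b >> k) & mask
        if k == 0:
            s, carry = ripple_block(a_blk, b_blk, carry, block)  # first block ripples normally
        else:
            s0, c0 = ripple_block(a_blk, b_blk, 0, block)         # precomputed with carry-in 0
            s1, c1 = ripple_block(a_blk, b_blk, 1, block)         # precomputed with carry-in 1
            s = s1 if carry else s0                               # 4-bit multiplexer
            carry = c0 | (carry & c1)                             # carry to the next block
        result |= s << k
    return result, carry


print(carry_select_add(0xFFFFFFFF, 1))   # 32-bit overflow: sum 0, carry out 1
```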
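[Schematic: a 4-bit ripple-carry block adds a_0 to a_3 and b_0 to b_3 with carry-in C_0, giving S_0 to S_3 and C_4; two further 4-bit ripple-carry blocks add a_4 to a_7 and b_4 to b_7 with carry inputs of 0 and 1; a 4-bit multiplexer controlled by C_4 selects (S_4 to S_7), and the carry output C_8 is formed.]
Figure 4.4 Schematic diagram of a Carry Select adder.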
d) Binary Look-ahead Carry adder [95] [96] [97]
The Binary Look-ahead Carry (BLC) adder, like the CL adder, is based on
the parallel computation of the carries. It uses an associative operator
'o', which computes the carry signals in a binary tree structure. The
function of the 'o' operator is as follows:
$$(g, p) \circ (g', p') = (g + p \cdot g',\; p \cdot p') \qquad (4.13)$$
where g, p, g' and p' are boolean variables.
The carry signals can be computed as follows:
$$C_i = G_i \qquad (4.14)$$
where
$$(G_i, P_i) = \begin{cases} (g_0, p_0) & i = 0 \\ (g_i, p_i) \circ (G_{i-1}, P_{i-1}) = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_0, p_0) & 0 < i < n \end{cases} \qquad (4.15)$$
where n is the number of bits.
Therefore the G_i's and P_i's of each consecutive stage are computed
using the same function. In other words, identical circuit elements
arranged in a binary tree structure can be used to implement the carry
bits.
For example, consider the equations for an 8-bit carry generator:
$$\begin{aligned}
C_0 &= g_0\\
C_1 &= g_1 + p_1 g_0\\
C_2 &= g_2 + p_2 C_1\\
C_3 &= g_3 + p_3 g_2 + p_3 p_2 C_1\\
C_4 &= g_4 + p_4 C_3\\
C_5 &= g_5 + p_5 g_4 + p_5 p_4 C_3\\
C_6 &= g_6 + p_6 C_5
\end{aligned} \qquad (4.16)$$
The 8-bit BLC adder can now be constructed. The complete structure
is illustrated in Figure 4.5. The similarity in the equations results in a
simple carry generator block consisting of only three cells: the 'black',
'half-black' and 'white' processors. The black processors perform the 'o'
operation defined in equation 4.13 and the white cells transmit the data.
The function performed by each of the processors is also shown in Figure
4.5. The variables g'_i and p'_i are the g_i's and p_i's from the previous
stages. The 'precondition' cells provide the inputs to the carry generator
block, and the sum cells perform the XOR function on the carries (C_i)
and the propagate signals (p_i) from the precondition cells to generate
the sum output.
66 
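One way to arrange the 'o' cells in a tree is the Brent-Kung scheme, whose combinations match those of equations (4.16): pairs of bits are merged first, then groups of 4 and 8, and the remaining carries are filled in on the way back. The Python sketch below assumes that arrangement and a word length that is a power of two; `brent_kung_carries` is a hypothetical name. It counts the rows of cells, which comes to five for 8 bits, and it computes P in every black cell, so half-black cells are not modelled separately.

```python
def brent_kung_carries(g: list[int], p: list[int]) -> tuple[list[int], int]:
    """Carries of a Brent-Kung style tree of 'o' cells.

    g and p are the generate/propagate bits, bit 0 first; len(g) must be a
    power of two. Returns the carries C_0..C_{n-1} and the number of cell rows.
    """
    n = len(g)
    G, P = list(g), list(p)

    def black(i: int, j: int) -> None:
        # Black cell: (G_i, P_i) <- (G_i, P_i) o (G_j, P_j), with j < i
        G[i], P[i] = G[i] | (P[i] & G[j]), P[i] & P[j]

    rows = 0
    d = 1
    while d < n:                       # forward tree: groups of 2, 4, 8, ...
        for i in range(2 * d - 1, n, 2 * d):
            black(i, i - d)
        rows += 1
        d *= 2
    d = n // 4
    while d >= 1:                      # reverse tree: fills in the other carries
        for i in range(3 * d - 1, n, 2 * d):
            black(i, i - d)
        rows += 1
        d //= 2
    # Positions left untouched in a row correspond to white (pass-through) cells.
    return G, rows


a, b = 0b10110101, 0b01101110
g = [(a >> i) & (b >> i) & 1 for i in range(8)]
p = [((a >> i) ^ (b >> i)) & 1 for i in range(8)]
carries, rows = brent_kung_carries(g, p)
# Sum bit i uses the carry into bit i, i.e. C_{i-1}; C_7 is the carry out
sum_bits = [p[i] ^ (carries[i - 1] if i else 0) for i in range(8)]
total = sum(bit << i for i, bit in enumerate(sum_bits)) + (carries[7] << 8)
print(rows)           # 5
print(total == a + b)  # True
```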
[Figure: precondition cells (outputs P0,G0 to P7,G7) feed a carry generator block of black, half-black and white cells arranged in rows T=1 to T=5, followed by sum cells producing S0 to S7. Cell functions: precondition, $p_i = a_i \oplus b_i$, $g_i = a_i \cdot b_i$; black, $g_o = g_i + p_i \cdot g'_i$, $p_o = p_i \cdot p'_i$; half-black, $g_o = g_i + p_i \cdot g'_i$; white, $g_o = g_i$, $p_o = p_i$; sum, XOR of $p_i$ and the incoming carry.]
Figure 4.5 Structure and data flow diagram of an 8-bit BLC adder.
The logic diagrams of the cells within the 8-bit adder are shown in Figure 4.6. They are the NOR equivalents of the equations given in Figure 4.5 and can be directly implemented in GaAs DCFL. As for the other adders, buffers must also be included in the carry block to exploit the speed performance of GaAs. With this objective in mind, the buffers are placed in the critical path of the carry block. For the 8-bit adder, minimum-geometry inverters are added at positions (T2, C1) and (T4, C3) to reduce the fanout loading (Figure 4.5). The positions of the buffers for 8, 16 and 32-bit buffered BLC adders were calculated to minimise the delay through the critical path, bearing in mind the unique timing characteristics of DCFL gates (Table 4.1).
Figure 4.6 Logic diagrams of the cells in a BLC adder. 
8 bits      16 bits     32 bits
(T2, C1)    (T2, C?)    (T2, ...)
(T4, C3)    (T3, C?)    (T?, C3)
            (T?, C7)    (T4, C7)
            (T?, C?)    (T?, C16)
                        (T7, C23)
Table 4.1 Location of the buffers for 8, 16 and 32-bit BLC adders.
4.3 Evaluation of Adder Circuits for GaAs VLSI
The adders were implemented using a full-custom approach, in order to 
optimise the area of the circuits. The layouts of all the adders were 
handcrafted using the Phasel layout tool (Plan, Appendix C). From the 
layouts, a set of SPICE input files was generated using the Phasel net 
list extractor (GaAsnet, Appendix C). They include the transistor models, 
the nodal capacitances and transistor connectivity. From the SPICE 
simulation results, the delay and power dissipation of the adders were 
accurately determined. The area of the adders can be extracted directly 
from the layout. Also the customised buffering schemes proposed in the 
previous section were evaluated for each type of adder. Comparison of the 
adders in terms of delay, power dissipation and area forms the basis for 
selecting a particular adder type for GaAs VLSI. 
In section 4.1b, it was mentioned that the number of carry-look-ahead bits in the CL adder is limited to 4. Due to the high fanin and fanout sensitivity of DCFL gates (demonstrated in Chapter 3), the expected speed improvements will not be achieved if the number of carry-look-ahead bits is expanded beyond 4. This can be shown by implementing a 32-bit CL adder with carry-look-ahead blocks of 2, 4 and 8 bits. The SPICE simulation results are shown in Figure 4.7. The delay of the adder with 2-bit carry-look-ahead blocks is 13.5ns. Increasing the carry-look-ahead block size from 2 to 8 bits reduces the delay by 5.3ns, i.e. a speed improvement of only 39%. However, the area increases from 0.9mm² to 3.9mm². This rather unexpected increase in area is the result of having to add extra gates to fulfil the fanin and fanout requirements of the DCFL gates. The best compromise is to use 4-bit carry-look-ahead blocks, with a delay and area of 10.28ns and 1.9mm² respectively. In this section, the adders referred to as CL adders consist of 4-bit carry-look-ahead blocks.
Figure 4.7 The delay and area of a 32-bit adder with different carry-
look-ahead limits. 
The following is the evaluation of the buffered and unbuffered versions of 
RC, CL, CS (using 4-bit ripple-carry blocks) and BLC adders introduced 
in the previous section. 
Figure 4.8 shows the delay of the unbuffered adders as a function of the 
number of bits (dashed lines). For the 2 and 4-bit adders, there is no clear 
advantage in using the carry speed-up techniques and the simple RC 
adder can be used since the 4-bit CL, CS and BLC adders give the same 
performance in terms of delay (about 2.6ns). As the number of bits is 
increased the adder delays begin to diverge. The delays for 32-bit RC, CL, 
CS and BLC adders are 17.16, 10.28, 6.91 and 5.92ns respectively. 
Therefore in terms of delay, there is a clear advantage in using the BLC 
or CS adders for a high number of bits (i.e. 24-32 bits). 
The solid lines in Figure 4.8 show the delay of the buffered adders. The benefit of including the buffers as proposed in the previous section is evident from the graph. In the case of the 32-bit BLC and CS adders, the delays are reduced from 5.92 down to 4.61ns and from 6.91 down to 5.40ns respectively (a 22% improvement in each case).
Figure 4.8 Buffered (solid lines) and unbuffered (dashed lines) adder delays for different numbers of bits. [x-axis: Number of Bits, 0 to 32.]
The delays of 8 to 32-bit RC and CL buffered adders are only 5% less than 
their unbuffered counterparts. This is due to the relatively low fanout
loading in the critical paths of the RC and CL adders in comparison with 
the CS and BLC versions. Also, the interconnects in the carry chain of the 
RC and CL adders were short in comparison. As a result the capacitance 
loading due to the lines was not significant. 
The area of the adder circuits is another important issue for VLSI 
application. Figure 4.9 shows a comparison of the areas of the buffered 
adders. The unbuffered adders are not included in the graph since their 
area is almost equal to that of the buffered versions. In fact, the extra gates required to implement the buffered adders result in an area increase of less than 5%.
The RC, CL, BLC and CS adders occupy almost the same area as the number of bits is varied from 2 to 4 (about 0.2mm² for 4-bit adders). They begin to differ significantly as the number of bits exceeds 16. In fact, a 16-bit CS adder with an area of 1.3mm² is almost twice the size of its RC counterpart. At 32 bits, the areas of the RC, CL, BLC and CS adders are 1.50, 1.98, 2.73 and 2.84mm² respectively. Therefore, in terms of size, the RC and CL adders are the most suitable for GaAs VLSI, especially where the number of bits is more than 16. However, a generally accepted measure of performance for VLSI is the delay-area product: the circuit with the lowest delay-area product is the optimal design.
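As a rough check of this criterion, the 32-bit figures quoted above can be combined directly. The Python sketch below assumes the unbuffered delays given for Figure 4.8 and the buffered areas given for Figure 4.9; since the buffers add less than 5% to the area, the mix is only an approximation. The names `ADDERS_32` and `delay_area_products` are illustrative.

```python
# 32-bit figures quoted in this section: unbuffered delay (ns), buffered area (mm^2).
# Buffered and unbuffered areas differ by less than 5%, so this is an approximation.
ADDERS_32 = {
    "RC": (17.16, 1.50),
    "CL": (10.28, 1.98),
    "BLC": (5.92, 2.73),
    "CS": (6.91, 2.84),
}


def delay_area_products(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Delay-area product in ns*mm^2 for each adder, rounded to 2 decimals."""
    return {name: round(delay * area, 2) for name, (delay, area) in table.items()}


products = delay_area_products(ADDERS_32)
print(products)                              # {'RC': 25.74, 'CL': 20.35, 'BLC': 16.16, 'CS': 19.62}
print(sorted(products, key=products.get))    # ['BLC', 'CS', 'CL', 'RC']
```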
For up to 8 bits, the performance of the adders is closely matched and any 
one of the above adders can be selected. It could be argued that since the 
RC adder is the easiest to implement, given its simple structure, it can be 
used for a low number of bits. For a high number of bits, the time-area 
optimal circuit is the BLC adder, closely followed by the CS adder. To 
further justify this claim, the area of a CL adder with 8-bit carry-look-
ahead blocks (CL8) is also included in the graph of Figure 4.9. 
The delay of this adder is comparable with the delay of the BLC adder. 
At 32 bits the delays are equal, but the area of the CL8 adder is 1.5 times 
that of the BLC adder. 
Figure 4.9 Adder area for different numbers of bits. [Curves: RC, CL, BLC and CS adders, and the adder with a carry-look-ahead limit of 8; x-axis: Number of Bits.]
Although delay and area are normally used to evaluate a particular 
circuit configuration for VLSI, in high speed applications power 
dissipation of circuits is another criterion which must be considered before 
selecting a particular design style. In fact one of the limiting factors in 
increasing the level of integration for high speed circuits is power 
dissipation. The average power dissipation of the buffered adders against 
the number of bits is shown in Figure 4.10. Again, the results for the 
unbuffered adders are not shown as the excess power due to the buffers 
is less than 2% of the total power dissipation. 
Up to 8 bits, the power dissipations of the adders are comparable. For a 
higher number of bits, the CS and CL adders dissipate the most power, 
about 56mW for 32-bit addition. This is due to the fact that a relatively 
large number of gates is required to implement the CL and CS adders, 
especially in the case of the CS adder, where blocks of 4-bit RC adders are 
duplicated to generate the carry into the next stage. The power dissipation of the BLC adder is as low as that of the RC adder: at 32 bits, the average power dissipation of both the RC and BLC adders is about 40mW.
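The same kind of estimate can be made for the delay-power product at 32 bits. This Python sketch takes the approximate power figures quoted above (about 56mW for the CL and CS adders, about 40mW for the RC and BLC adders) together with the unbuffered 32-bit delays; the buffers add less than 2% to the power, so the mix is again approximate. Because ns × mW = pJ, the product has units of energy. The variable names are illustrative.

```python
# Unbuffered 32-bit delays (ns) and approximate average power (mW) quoted in the text
DELAY_NS = {"RC": 17.16, "CL": 10.28, "CS": 6.91, "BLC": 5.92}
POWER_MW = {"RC": 40.0, "CL": 56.0, "CS": 56.0, "BLC": 40.0}

# ns * mW = pJ, so the delay-power product is expressed in picojoules
energy_pj = {name: round(DELAY_NS[name] * POWER_MW[name], 1) for name in DELAY_NS}
print(energy_pj)                              # {'RC': 686.4, 'CL': 575.7, 'CS': 387.0, 'BLC': 236.8}
print(sorted(energy_pj, key=energy_pj.get))   # ['BLC', 'CS', 'CL', 'RC']
```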
The average power dissipation has static and dynamic components. The 
static power dissipation is proportional to the total number of transistors 
in a circuit. The dynamic power dissipation, however, is directly related to the number of gates switching at a given time. The BLC adder exhibits
a comparatively low average power dissipation because it has a 
particularly low dynamic dissipation. This is due to the fact that only one 
row of the carry block is activated at a given time. Since each processor 
consists of only a few basic gates, the total number of switching devices 
is low. Furthermore, the interconnect lines are short and the fanout 
loading is kept low. 
The final issue to consider is the effect of interconnect on the delay of the 
overall circuits. There has been a major effort to improve the existing 
interconnect technology. This has led to the development of low 
impedance lines such as second and higher level metallisation and more 
recently the air bridge technology. This, however, adds to the cost and 
reduces the yield. 
Figure 4.10 Power dissipation of various adders. [x-axis: Number of Bits, 0 to 32.]
Figure 4.11 Delay sensitivity of 32-bit adders to interconnect. [Curves include the RC, CL and BLC adders; x-axis: Capacitance (fF/µm²), 0.00 to 0.10.]
For a given delay, power and area, a design which is less sensitive to interconnect should be considered a better candidate for GaAs VLSI implementation. Figure 4.11 shows the delay of 32-bit adders with increasing line capacitance. It attempts to show the effect of different interconnect technologies on the delay of the adders. The low capacitance values (<0.02fF/µm²) correspond to the air bridge technology; second and higher level metals are given a line capacitance of 0.06 down to 0.02fF/µm². The capacitance values higher than 0.06fF/µm² are used to show the performance of the adders implemented using the first level metal only.
As shown in Figure 4.11, the advantage of using the air bridge and/or a high level metal (second or third) is quite evident. For instance, the delay of the 32-bit unbuffered CS adder roughly doubles, from 6 to 12.5ns, as the line capacitance is raised from 0.02 to 0.1fF/µm². The graph also shows the effect of buffers in reducing the adder sensitivity to interconnect. For example, with a line capacitance of 0.1fF/µm², the unbuffered CS adder has a delay of 12.5ns, whereas the delay of the buffered version is about 8.9ns.
Another important point is the effect of interconnect on design styles. The BLC adder is less sensitive to interconnect than the other designs: the worst-case delay of the buffered BLC adder is about 7.3ns. It is followed by the buffered CS adder, with a worst-case delay of 8.9ns.
4.4 Summary of Important Points
In this chapter various adder circuits have been evaluated for GaAs VLSI 
implementation. The following points can be derived from the analysis. 
a) For a low number of bits (up to 8), the traditionally slow RC adder may 
well be adequate for high speed GaAs applications. However as the 
number of bits is increased, the BLC adder followed by the CS adder 
show far superior performance to that of the RC and CL adders. This 
performance is measured by delay-power and delay-area products which 
are lowest for the BLC and CS adders (Figures 4.12 and 4.13).
Figure 4.12 The delay-power product of adders for different numbers of bits. [Curves: RC, CL, BLC and CS adders; x-axis: Number of Bits.]
b) The proposed buffering scheme is an effective method of speeding up logic elements (e.g. adders). The buffers improve the speed by as much as 30%, but occupy less than 5% of the total area and result in less than a 2% increase in power dissipation. This is achieved by reducing the fanout and breaking up the interconnect lines into smaller segments. Therefore the designs are more tolerant to interconnect loading and crosstalk.
Figure 4.13 Delay-area product of the adders for different numbers of bits. [x-axis: Number of Bits.]
c) The effect of the original algorithm and overall architecture on the 
performance of the final design should not be overlooked. For example, 
the binary tree structure of the BLC adder, resulting from the 
associative property of the algorithm, produces a regular layout of 
processing elements, connected over short interconnect lines. This is 
particularly useful for GaAs DCFL implementation as the fanout and 
interconnect loading are reduced. 
Having introduced a practical approach to the design of optimal GaAs 
adders, a more complex circuit example is required to show the 
effectiveness or limitations of our design approach. A natural progression 
is to implement a multiplier which makes extensive use of the optimal 
BLC adder. In the next chapter, a modified Booth's multiplier is designed 
and implemented to be used as a vehicle for the evaluation of a new 
design and layout technique for GaAs. 
