Design of a sixteen bit pipelined adder using CMOS Bulk P-Well technology. by Reid, William R.
Calhoun: The NPS Institutional Archive
Theses and Dissertations Thesis Collection
1984














SIXTEEN BIT PIPELINED ADDER




Thesis Advisor: D. E. Kirk
Approved for public release; distribution unlimited

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
REPORT DOCUMENTATION PAGE READ INSTRUCTIONS BEFORE COMPLETING FORM
1. REPORT NUMBER 2. GOVT ACCESSION NO. 3. RECIPIENT'S CATALOG NUMBER
4. TITLE (and Subtitle)
Design of a Sixteen Bit Pipelined
Adder Using CMOS Bulk P-Well Technology
5. TYPE OF REPORT & PERIOD COVERED
Master's Thesis;
December 1984
6. PERFORMING ORG. REPORT NUMBER
7. AUTHOR(s)
William R. Reid
8. CONTRACT OR GRANT NUMBER(s)
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Naval Postgraduate School
Monterey, California 93943
10. PROGRAM ELEMENT, PROJECT, TASK
AREA & WORK UNIT NUMBERS





13. NUMBER OF PAGES
116




16. DISTRIBUTION STATEMENT (of this Report)
Approved for public release; distribution unlimited
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
VLSI Design, CMOS, CMOS-PW, Pipelined Adder, Carry Look
Ahead Addition, CAD Tools
20. ABSTRACT (Continue on reverse side if necessary and identify by block number)
The design of a sixteen-bit pipelined adder CMOS integrated
circuit is presented. The adder is designed to maximize
throughput and to provide for testability. Tutorial material
on CMOS design is also presented.
Approved for public release; distribution is unlimited.
Design of a
Sixteen Bit Pipelined Adder
Using CMOS Bulk P-Well Technology
by
William R. Reid
Lieutenant Commander, United States Navy
B.S., Purdue University, 1975
Submitted in partial fulfillment of the
requirements for the degree of





The design of a sixteen-bit pipelined adder CMOS integrated
circuit is presented. The adder is designed to
maximize throughput and to provide for testability.
Tutorial material on CMOS design is also presented.
TABLE OF CONTENTS
I. INTRODUCTION
II. CMOS CIRCUITS
A. COMPARISON WITH NMOS
1. The Inverter
2. The NOR Gate and Transmission Gate
B. CMOS DESIGN METHODOLOGIES
C. CMOS IMPLEMENTATION TECHNOLOGIES
1. CMOS-SOS
2. CMOS-Bulk
3. Twin-tub CMOS
D. CMOS TECHNOLOGY SELECTION






IV. DESIGN OF THE ADDER
A. LOGICAL DESIGN
1. Zero Level CLA Logic
2. First Level CLA Logic
3. Second Level CLA Logic
B. DESIGN FOR TESTABILITY
C. LAYOUT DESIGN
V. TEST PLAN
A. INPUTS AND OUTPUTS
B. TESTING FOR CORRECT OPERATION
1. Intermediate Results
C. TESTING FOR SPEED OF OPERATION
VI. CONCLUSIONS
A. THE CMOS TECHNOLOGIES
B. CMOS CAD TOOLS
C. DESIGN OF THE ADDER
APPENDIX A: SPICE MODEL CARDS FOR 3-MICRON CMOS-PW DEVICES
APPENDIX B: UNIX MANUAL ENTRY FOR RULEC
APPENDIX C: PRESIM USER'S GUIDE
APPENDIX D: ADDER SIMULATION
APPENDIX E: LAYOUTS
APPENDIX F: TEST VECTORS
LIST OF REFERENCES
BIBLIOGRAPHY
INITIAL DISTRIBUTION LIST
LIST OF TABLES
1. Lyra Error Abbreviations
2. First Level CLA Logic for a 16-bit Sum
3. Register Serial Outputs
4. PLA Evaluation Sequences
LIST OF FIGURES
2.1 CMOS Transistor Symbols
2.2 (a) NMOS Inverter (b) CMOS Inverter
2.3 Minimum Dimension Inverters
2.4 2-input NOR Gate
2.5 CMOS Transmission Gate
2.6 NMOS-like CMOS Static Gate [Ref. 6]
2.7 Dynamic NAND Gates [Ref. 6]
2.8 Domino CMOS Structure [Ref. 6]
2.9 Circuit Difficult to Implement in Domino CMOS
2.10 P-Well Process, Top View [Ref. 6]
2.11 P-Well Process, Side View [Ref. 9]
2.12 Bipolar Transistors in CMOS-Bulk [Ref. 6]
2.13 The Latchup Circuit [Ref. 6]
2.14 Grounding of the P-Well
3.1 CMOS Exclusive OR [Ref. 6]
3.2 CMOS Latch Design [Ref. 6]
4.1 CMOS Output Loading Model
4.2 Preliminary Chip Floorplan
4.3 Dual Mode Latch
4.4 AND Gate
4.5 OR Gate
4.6 Exclusive OR Gate
4.7 PLA Structure
4.8 Final Layout
5.1 Charge Sharing in a PLA
I. INTRODUCTION
For several years the ability of systems engineers to
design custom digital integrated circuits has been growing.
The Mead and Conway design methodology described in
Introduction to VLSI Systems [Ref. 1] permits the systems
engineer to be his own logic circuit designer. A proliferation
of computer-aided design (CAD) systems such as the
MacPitts silicon compiler [Ref. 2], the chip layout language
(CLL) [Ref. 3], the graphics editor Caesar [Ref. 4], and the
Burlap hierarchical layout language [Ref. 5] makes it
possible for the engineer to rapidly carry the Mead and
Conway design methodology through to a final design. This
includes iterative simulation and redesign to provide
justifiable confidence in the final design submitted for
fabrication.
Many of the techniques utilized in the Mead and Conway
methodology and most of the CAD tools are based on having
the final design implemented in a technology that uses only
one type of doping for the semiconductor material in the
active region of the transistors. Because of their higher
switching speed, negatively doped metal oxide semiconductor
(NMOS) transistor technologies are generally used.
Selection of an NMOS implementation technology does
provide the systems engineer with a complete and proven
methodology for the design of a very large scale integrated
(VLSI) circuit and allows the use of many extensively tested
CAD tools. Like any other design decision, selection of
NMOS implementation brings with it some limitations. There
are two primary problems associated with NMOS digital
circuits.
The first is the ultimate switching speed limitation.
Though many NMOS VLSI circuits operate at clock rates in the
8 to 10 MHz range, there are many applications requiring
higher clock rates. The second problem is the dissipation
of the relatively large amount of power consumed by NMOS
digital circuits. State of the art, commercially available
NMOS VLSI circuits commonly have power consumptions in the
vicinity of 3 to 5 watts. Considerable design effort is
required to insure that the dissipation of this much energy
by a chip measuring approximately 5 millimeters on a side
does not alter the performance of the micron sized features
on the chip.
One group of technologies that offers both increased
switching speed and greatly reduced power consumption is
complementary metal oxide semiconductors (CMOS). CMOS
circuits also offer the benefits of greater radiation hard-
ening and increased noise margin. In this thesis investiga-
tion, much of the Mead and Conway methodology was utilized
in the design of a CMOS circuit. A general purpose color
graphics CAD tool called Caesar that has been frequently
used in the design of NMOS circuits was employed. In
carrying out the design of the 16 bit pipelined high speed
adder in CMOS two separate goals were pursued. The first,
of course, is speed and the second is verifiability. A high
speed adder implies not only a high clock rate of operation
but also a small latency between input of operands and
output of the sum.
A discussion of CMOS technologies and the implementation
of logic circuits in those technologies follows in Chapter
2. Chapter 3 presents a description of the CAD tools used
to construct and simulate the layout for the adder. The
logic and layout design of the adder is covered in Chapter 4
and is followed by a test plan for the fabricated chip in
Chapter 5.
II. CMOS CIRCUITS
Before the design of CMOS digital circuits can be
attempted, an understanding of how to best implement logic
functions in CMOS is necessary. It is also important to be
aware of the advantages and disadvantages of the different
CMOS implementation technologies. In this chapter the
operation of CMOS digital circuits is explained using similar
NMOS circuits as a benchmark for comparison. The different
methodologies for assembling the CMOS pieces to produce the
desired logical results are reviewed and the selection of
the CMOS-Bulk p-well implementation technology is explained.
A. COMPARISON WITH NMOS
In NMOS digital circuits there is only one type of
switching device, namely the n-channel enhancement mode
metal oxide semiconductor (MOS) transistor. The other prin-
cipal device utilized in NMOS circuits is the depletion mode
n-channel MOS device which acts as a load resistor. In CMOS
there are both n-channel and p-channel enhancement mode
transistors available. As in NMOS, the n-channel device can
be considered on when Vdd (typically +5 Volts DC), a logical
1, is present on its gate. The p-channel device can be
considered on when ground (GND), a logical 0, is present on
its gate. Figure 2.1 shows the symbols that will be used
for the n-channel and p-channel transistors in this thesis.
The basic differences between NMOS and CMOS technologies








Figure 2.1 CMOS Transistor Symbols.
1. The Inverter
Figure 2.2 (a) shows an NMOS inverter. Whenever
there is a logical 1 on the input, the voltage drop across
the load resistor is approximately Vdd and the output is a
logical 0. This results in steady state power consumption.
When the input switches to a logical 0, before the output
can assume a logical 1, the load capacitance (CL) on the
output must be charged to Vdd through the load resistor with
a resistance of several kilohms. This results in a much
longer transition from 0 to 1, where the load capacitance is
charged through the load resistor, than from 1 to 0, where
the load capacitance is discharged through the switched on
NMOS enhancement transistor. The reason for this asymmetry
is that the pull-down transistor's on resistance is typically
only one fourth or less that of the on resistance of
the pull-up load depletion mode transistor. The technique
of precharging circuits, where all outputs are set to
logical 1 during one clock cycle and then selectively forced
to 0 on the opposite (evaluation) clock cycle, has proven
helpful in gaining control over the unsymmetric switching
times. This longer switching time from 0 to 1 must still be
accounted for, however, and represents the primary limitation
to the speed of NMOS circuits.
Figure 2.2 (a) NMOS Inverter (b) CMOS Inverter.
In the CMOS inverter of Figure 2.2 (b) the input is
applied to the gates of both devices. An input of logical 1
causes the n-channel device to switch on and the p-channel
device to switch off, resulting in an output of logical 0.
Similarly, an input of 0 results in an output of 1. In both
cases, one device is fully off, representing a resistance on
the order of gigaohms. Thus, the steady state power
consumption is essentially zero. In operation the only
power consumption of consequence occurs during the
transition, when neither transistor is fully on or off.
Additionally, since the output load capacitance is both
charged and discharged through a turned on transistor, the 1
to 0 and 0 to 1 switching delays are theoretically the same.
Actually the switching delays depend on many parameters.
The n-channel and p-channel device dimensions are
frequently not the same, and the mobility of the electrons in
the n-channel is greater than the mobility of the holes in
the p-channel. Also, the capacitive load seen by the
p-channel device in CMOS p-well (CMOS-pw) is greater than
the load seen by the n-channel device because of the highly
doped p-well. Typically, the result in CMOS-pw is a
slightly longer transition time for the 0 to 1 output
transition. Some designers attempt to compensate for this by
consistently making the p-channel transistors wider than the
n-channel transistors.
Unlike NMOS, the output of a CMOS digital circuit
makes a full excursion between Vdd and GND. This makes CMOS
circuits less sensitive to noise than NMOS circuits. CMOS
should also benefit more from future reductions in feature
size. NMOS is more restricted in ultimate feature size
because the power dissipation requirements of the depletion
mode devices will create more problems as feature sizes
shrink. In Figure 2.3 the relative sizes of minimum dimen-
sion inverters implemented in currently available 3 micron
feature size CMOS-PW and NMOS technologies are shown.
2. The NOR Gate and Transmission Gate
Figure 2.4 shows the circuit diagrams and layouts of
a two-input NOR gate implemented in both CMOS-PW and NMOS.
From Figures 2.3 and 2.4 it is evident that static 1 CMOS
gates are more complex and area consuming than their NMOS
counterparts. In these fully complementary circuits a
redundancy in the structures is evident. The pull-up only
or pull-down only would be sufficient to implement the
logic. In the CMOS circuits of Figures 2.3 and 2.4 the
inputs must perform two tasks. A logical 1 on an input
causes both a connection between the output and ground and a
1 Static logic circuits continuously evaluate their
inputs and produce their specified logic output. Dynamic
circuits perform logical evaluation of the inputs only when

















































Figure 2.3 Minimum Dimension Inverters.
disconnection between the output and Vdd. Logically these
two actions are equivalent, therefore only one action should
be necessary to implement the logic. Design methodologies
to accomplish this are described in section B of this
chapter. The parallelism of the CMOS transmission gate of
Figure 2.5 and the NMOS pass transistor is evident. The
major difference lies in the bilateral nature of the CMOS
transmission gate. It is made up of both n-channel and
p-channel devices and requires both polarities of the
control signal for operation. The reason for this bilateral
requirement is that the n-channel device does not transmit
























Figure 2.4 2-input NOR Gate.
high voltages well. The resulting unpredictable voltage
drops make it necessary to utilize both types of transis-
tors. This increase in complexity over its NMOS counterpart
is partially offset by the absence of the level restoring
circuitry NMOS requires following a pass transistor. 2
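Why both device types are needed in the transmission gate can be sketched with a simple behavioral model. The Python fragment below assumes an idealized threshold-drop model in which a lone n-channel pass device cannot raise its output above Vdd minus one threshold, and a lone p-channel pass device cannot lower its output below one threshold. The function names and the 1 volt threshold are hypothetical.

```python
VDD = 5.0   # supply voltage in volts, the typical value quoted for Vdd
VT = 1.0    # hypothetical threshold magnitude, assumed equal for both devices

def n_pass(v_in):
    """Voltage delivered by an n-channel pass device with its gate at Vdd."""
    return min(v_in, VDD - VT)

def p_pass(v_in):
    """Voltage delivered by a p-channel pass device with its gate at GND."""
    return max(v_in, VT)

def transmission_gate(v_in):
    """Both devices in parallel: the one that conducts fully sets the output."""
    if v_in <= VDD - VT:
        return n_pass(v_in)   # n-channel device passes this level undegraded
    return p_pass(v_in)       # p-channel device passes high levels undegraded

for v in (0.0, 2.5, 5.0):
    print(v, n_pass(v), p_pass(v), transmission_gate(v))
```

Under these assumptions each single device degrades one logic level, while the parallel pair passes the full range from GND to Vdd, which is why no level restoring stage is needed after the gate.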
2 In NMOS digital circuits the length to width ratio of
the depletion mode transistor load is usually four times that
of the pull-down transistor. This ratio is required to
insure sufficient excursion of the output voltage. However,
after a pass transistor is used, a ratio of 8:1 rather than
4:1 must be used to restore the MOS threshold voltage drop





Figure 2.5 CMOS Transmission Gate.
In general CMOS technologies are ratioless. The use
of "improper" ratios will not affect the logical operation
of most CMOS gates; it will only affect the speed of operation
of the gates.
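The redundancy noted earlier for the static two-input NOR gate can be checked with a small switch-level model. The sketch below is illustrative only: it treats each transistor as an ideal switch, and the function names are hypothetical. It confirms that the pull-up and pull-down networks are logical complements, so exactly one of them conducts for every input combination.

```python
from itertools import product

def pull_down_conducts(a, b):
    """Two n-channel devices in parallel: either high input ties output to GND."""
    return bool(a) or bool(b)

def pull_up_conducts(a, b):
    """Two p-channel devices in series: both inputs low tie output to Vdd."""
    return (not a) and (not b)

def static_cmos_nor(a, b):
    """Output of the fully complementary gate, with a check on the redundancy."""
    down = pull_down_conducts(a, b)
    up = pull_up_conducts(a, b)
    if up == down:
        raise AssertionError("networks must never both conduct or both float")
    return 1 if up else 0

truth_table = {(a, b): static_cmos_nor(a, b) for a, b in product((0, 1), repeat=2)}
for inputs, output in truth_table.items():
    print(inputs, output)
```

Because either network alone determines the output, one of them could in principle be replaced by a simpler structure, which is the idea behind the design methodologies of the next section.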
B. CMOS DESIGN METHODOLOGIES
Static gate CMOS circuits have three serious deficien-
cies when compared to static NMOS gates. First, they are
more area consuming. Second, they can be slower. Though
the individual gates can be faster in CMOS, the p-channel
and n-channel gates are in parallel; thus, the fanout 3 and
the output load capacitance of each circuit are doubled.
Third, a CMOS static gate is redundant, duplicating its
functionality in both the pull-up and pull-down section.
One approach to remedy these deficiencies is to use a
static NMOS-like style of design as in Figure 2.6. Here the
p-channel device is always on and the pull-up to pull-down
dimension ratio is relied upon to produce the proper output
voltage. This introduces power consumption problems and
takes away the full excursion on the output. Another
3 Fanout represents the number of transistors that the
output of a logic gate must drive.
Figure 2.6 NMOS-like CMOS Static Gate [Ref. 6].
approach is to make extensive use of transmission gates to
build up logic functions. Using transmission gates means
both polarities of all control signals are required. The
resulting large number of wires required to route these
control signals can become very area consuming, especially
if only one metal layer is available.
A third and more effective solution is to use dynamic
logic. Figure 2.7 contains three different implementations
of a dynamic three-input NAND gate. In each, the output is
meaningful (i.e. represents the value of the complement of
the boolean expression in1 · in2 · in3) only when clk is high
and its complement is low. The
circuits of Figure 2.7 (a) and (b) depend on the pull-up to
pull-down ratio to produce the proper output. As with the
NMOS-like style of design, full excursion on the output is
lost and there is steady state power consumption during the
evaluation cycle. The circuit in Figure 2.7 (c) is
precharged when clk is low and evaluation of the inputs takes
place when clk is high. This configuration allows only one
change of the output from 1 to 0, so the inputs must be
stable at the time clk goes high. A change of one of the
inputs from 1 to 0 after clk has gone high cannot cause the
output to return to 1.
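The precharge and evaluate behavior described for the circuit of Figure 2.7 (c) can be sketched as a small behavioral model. The class below is a hypothetical illustration, not a circuit simulation: it ignores leakage and charge sharing and simply tracks whether the precharged output node has been discharged.

```python
class DynamicNand3:
    """Behavioral model of a precharged dynamic three-input NAND gate."""

    def __init__(self):
        self.out = 1  # assume the output node starts precharged

    def step(self, clk, in1, in2, in3):
        if clk == 0:
            self.out = 1          # precharge phase: output pulled to 1
        elif in1 and in2 and in3:
            self.out = 0          # evaluation: series n-channel path discharges
        # Otherwise the node holds its charge (leakage is ignored here).
        return self.out

gate = DynamicNand3()
trace = [
    gate.step(0, 1, 1, 1),  # precharge
    gate.step(1, 1, 1, 1),  # evaluate with all inputs high
    gate.step(1, 0, 1, 1),  # an input falls after clk has gone high
    gate.step(0, 0, 1, 1),  # precharge again
    gate.step(1, 0, 1, 1),  # evaluate with stable inputs
]
print(trace)
```

The third step shows why the inputs must be stable when clk goes high: once the node has been discharged, a falling input cannot restore the output until the next precharge.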
In general dynamic CMOS eliminates the redundancy of
static CMOS by applying all inputs to one type of device and



























Figure 2.7 Dynamic NAND Gates [Ref. 6].
a control signal to the other type of device. The most
popular dynamic CMOS logic design technique is domino CMOS
[Ref. 7], illustrated in Figure 2.8. Here the output is the
logical AND of the boolean function (in1 · in2 + in3) to be
implemented and a control (clock) signal. When the clock is








Figure 2.8 Domino CMOS Structure [Ref. 6].
evaluation occurs. With a common clock shared by all the
domino gates on a chip, during the evaluation cycle the
signals ripple through the chip as though the logic were
purely static. The follow on inverter insures that the
output of each gate is low when evaluation begins. This
prevents the outputs of all gates from changing unless
driven low by the inputs. Domino CMOS is not always the
answer though. If the logic of Figure 2.9 were implemented
in domino CMOS it would be more area consuming than the same
circuit implemented in static CMOS. Dynamic CMOS is more
area consuming in this case because these are simple gates
with only a few inputs. Each NOR gate if implemented
statically would need two n-channel devices and two p-channel
devices. If implemented dynamically, each NOR gate requires
three transistors of one type (one for each input and one
for the control signal) and one transistor of the other type
(for the control signal again). The number of transistors
needed remains the same but the dynamic logic requires the
designer to keep three inputs electrically isolated instead
of just two. And if the dynamic design technique is domino,
six additional inverters will be needed. As can be seen in
Figure 2.4, in CMOS a NOR gate can be constructed from just
one stage. Adding the follow-on inverter of the domino
design results in an OR gate. Thus a second inverter is
required to return the logic to that of a NOR gate.
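The transistor counts in the preceding comparison can be tallied with a short sketch. The formulas below follow the counts given in the text for two-input NOR gates; the generalization to other input counts and the assumption that the circuit contains three NOR gates (inferred from the six additional inverters mentioned above) are illustrative only.

```python
INVERTER = 2  # one n-channel and one p-channel device

def static_count(n_inputs):
    """Fully complementary gate: one device of each type per input."""
    return 2 * n_inputs

def dynamic_count(n_inputs):
    """One device per input plus one clocked device of each type."""
    return (n_inputs + 1) + 1

def domino_nor_count(n_inputs):
    """Dynamic stage, follow-on inverter (OR), and a second inverter (NOR)."""
    return dynamic_count(n_inputs) + 2 * INVERTER

GATES = 3  # assumed number of NOR gates in the example circuit

for name, count in (("static", static_count),
                    ("dynamic", dynamic_count),
                    ("domino", domino_nor_count)):
    per_gate = count(2)
    print(f"{name}: {per_gate} per gate, {GATES * per_gate} total")

extra_inverters = GATES * (domino_nor_count(2) - dynamic_count(2)) // INVERTER
print("additional inverters for domino:", extra_inverters)
```

With these formulas the dynamic style only begins to save devices as the number of inputs grows; for example, static_count(8) is 16 while dynamic_count(8) is 10.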
Figure 2.9 Circuit Difficult to Implement in Domino CMOS.
C. CMOS IMPLEMENTATION TECHNOLOGIES
One of the principal issues in the design of a process
to implement CMOS digital circuits in silicon is how to
isolate the two types of devices. This can be accomplished




The only process currently offered by Metal-Oxide
Semiconductor Implementation Service (MOSIS) which uses an
electrically insulating substrate is Silicon on Sapphire
(SOS). In this technology the n-channel and p-channel
transistors are formed on silicon islands left after etching an
epitaxial layer of silicon on a sapphire (Al2O3) substrate.
2. CMOS-Bulk
The other CMOS processes offered by MOSIS all use
CMOS-Bulk p-well technology. The p-well processes differ in
the number of layers of metal interconnections (1 or 2) and
the presence or absence of capacitors. In CMOS-Bulk p-well
(n-well) the substrate is n-doped (p-doped) and the
p-channel (n-channel) devices are in this substrate. To
isolate the n-channel (p-channel) devices from the substrate
a heavily doped p-well (n-well) is first placed to act as
the back gate. The heavy doping of the p-well (n-well)
degrades the performance of the n-channel (p-channel) device
while the p-channel (n-channel) device is optimized. In
p-well CMOS, though the mobility of electrons in the
n-channel device still exceeds that of the holes in the
p-channel device, the performance difference of the
transistors is minimized. The more uniform performance of the two
transistor types makes the p-well process appropriate for
CMOS random logic.
Figures 2.10 and 2.11 represent the top and side
views of the steps of the CMOS-pw process for the production
of an inverter. These steps are: (1) starting with an
n-type substrate the p-well is patterned, (2) the active
areas in the p-well and on the substrate are established,
(3) the polysilicon is patterned, (4) the two ion implant
masks are placed (the N+ mask is simply the photographic
negative of the P+ mask), (5) contact cuts are made, and
(6) the metal is placed.
a. Latchup in CMOS-pw
One of the main problems associated with
CMOS-Bulk, both p-well and n-well, is latchup. Basically
latchup involves generation of a short circuit between Vdd
and GND, and can result in the complete destruction of a
chip. Many researchers have tried to formally define the
conditions [Ref. 8] that cause latchup to occur. This task
is extremely complex because the phenomenon is so dependent
on layout, which is unique to each chip design. Though a
fully quantitative analysis of latchup is still not avail-
able, a qualitative analysis will show what happens on the
chip when latchup occurs.
Looking at the side view of an inverter in
Figure 2.12, parasitic bipolar transistors can be seen. The
base of the npn transistor is the p-well and the base of the
pnp transistor is the n-doped substrate. These parasitic
transistors are connected as shown in Figure 2.13. If the
output of the gates goes below GND by a value equal to the
threshold of the npn transistor, its emitter starts to
inject current (electrons) into the base (p-well) and the
resultant collector current flows to the Vdd node. If the
resistance between the Vdd node and the source of the
pull-up p-channel MOS transistor, R1, is large enough, the
voltage drop across R1 will exceed the threshold of the pnp
transistor. The collector current (holes) of the pnp device
flows to the GND node. If the resistance between the GND
node and the source of the pull-down n-channel MOS
transistor, R2, is great enough, the resultant voltage drop
across R2 will increase the base current in the npn tran-



























































Figure 2.11 P-Well Process, Side View [Ref. 9].
The only way to stop this destructive process once it has
started is to disconnect Vdd or GND. Prevention of latchup















Figure 2.12 Bipolar Transistors in CMOS-Bulk [Ref. 6].
Figure 2.13 The Latchup Circuit [Ref. 6].
The MOSIS CMOS-Bulk p-well design rules include
features for the specific purpose of reducing the
probability of latchup. The minimum separation rules for
p-wells and P+ doped active areas exist for this purpose.
Their aim is to reduce the gain of the parasitic bipolar
transistors, thus requiring a larger noise spike of longer
duration to start the latchup sequence. A frequently used
technique is the grounding of the p-well as illustrated in
Figure 2.14. Here the effect of the P+ doped area covering
half of the contact cut for the ground bus is to reduce the
resistance R2 in Figure 2.13. Another practice is to place
a small capacitor across the Vdd and GND pins of CMOS-Bulk
chips. To provide capacitive filtering of noise spikes on
the chip, Vdd and GND busses are frequently run close
together. Also, Vdd input pads are designed to provide




















Figure 2.14 Grounding of the P-Well.
3. Twin-tub CMOS
This process, also called twin-well, uses both
n-wells and p-wells on a high resistivity N- or P-
substrate, or in an epitaxial layer of silicon on a P+ or N+
wafer. Since the well doping does not have to overcome the
substrate doping, both the n-channel transistors in the
p-well and the p-channel transistors in the n-well can be
optimized. Domino CMOS is enhanced by the use of this
process since the optimized n-channel devices can speed up
the complex boolean expression evaluation and the optimized
p-channel devices can speed up the signal drive between
stages (thereby reducing the effect of a given fanout).
D. CMOS TECHNOLOGY SELECTION
The CMOS implementation technologies available from
MOSIS are CMOS-Bulk p-well with one metal layer, CMOS-Bulk
p-well with two metal layers, CMOS-Bulk p-well with two
metal layers and capacitors (for analog circuits) and
CMOS-SOS.
The advantages of CMOS-Bulk are: (1) very good noise
margin, (2) faster than NMOS, and (3) a proven reliable
fabrication process. Its disadvantages are: (1) latchup
susceptibility, (2) use of p-well guard rings is needed if
radiation hardening is desired, (3) lower circuit density
than NMOS or CMOS-SOS, and (4) more complex design rules
than either NMOS or CMOS-SOS.
The advantages of CMOS-SOS are: (1) faster than NMOS or
CMOS-Bulk, (2) very good noise margin, (3) intrinsically
radiation hardened, and (4) no latchup. Its disadvantages
are: (1) expensive fabrication process due to the sapphire,
(2) sapphire variability reduces the reliability of the
fabrication process, (3) thermal mismatch between the
sapphire and silicon limits the carrier mobility, and (4) it
is not a viable technology for dynamic memory due to back
channel leakage.
CMOS-Bulk p-well was selected as the implementation
process for the adder for the following reasons. First,
technology files for this process were available at the
Naval Postgraduate School (NPS) enabling the use of extant
computer aided design (CAD) tools. Second, since this would
be the first CMOS VLSI design at NPS, utilizing the most
reliable process is prudent to prevent design problems from
being clouded by implementation process problems.
III. DESIGN TOOLS
To employ the Mead-Conway design methodology on a large
scale design, three computer aided design (CAD) tools are
needed. A layout design editor for viewing the circuits as
they are created is the first tool required. Next, a design
rule checker is necessary to confirm that all the design
rules for the specified technology have been adhered to.
Though not a complex task, the large number of checks that
must be made for even a modest design makes manual design
rule checking highly error prone. Finally, a circuit
simulator is needed to verify that the circuit as designed
provides the proper logical output. In the design of the
sixteen-bit pipelined adder, the Caesar layout editor
[Ref. 4], the Lyra design rule checker [Ref. 10], and C.
Terman's RNL circuit simulator [Ref. 11] were employed.
A. CAESAR
Caesar is a generic layout editor. It is not designed
for any particular VLSI implementation technology. It is
not even limited to designing integrated circuits. Caesar
is a graphics layout editor for the creation and manipula-
tion of rectangles where the user specifies the color, size,
and placement. It is through the user specified technology
file that the rectangles of color take on meaning. At the
Naval Postgraduate School (NPS) there are two technology
files available for use with Caesar. One is for N-doped
metal oxide semiconductors (NMOS) and the other is for
complementary metal oxide semiconductors utilizing a P-doped
well (CMOS-pw).
Caesar works with files of its own special format.
These files are indicated by an appended file type of ca (i.e.
xxxx.ca). On command Caesar will generate a Caltech
Intermediate Format (CIF) file of the same layout. Again it
is the technology file which tells Caesar which CIF layer
labels to attach to the colored rectangles.
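The role of the technology file in CIF generation can be illustrated with a minimal sketch. The fragment below is not Caesar's implementation: the paint names, the layer labels in the table, and the function name are hypothetical placeholders. It relies only on the CIF conventions that a box is written as length, width, and center coordinates, and that distances are expressed in hundredths of a micron.

```python
# Hypothetical technology-file table: Caesar paint name -> CIF layer label.
# These labels are placeholders, not the actual CMOS-pw layer names.
CIF_LAYERS = {"metal": "CM", "poly": "CP"}

def rect_to_cif(paint, x1, y1, x2, y2, unit_centimicrons):
    """Return CIF text for one rectangle whose corners are in Caesar units.

    The length of a Caesar unit is supplied at conversion time, in
    hundredths of a micron, because the units have no fixed length.
    """
    if unit_centimicrons % 2:
        raise ValueError("use an even unit so box centers stay integral")
    length = (x2 - x1) * unit_centimicrons
    width = (y2 - y1) * unit_centimicrons
    center_x = (x1 + x2) * unit_centimicrons // 2
    center_y = (y1 + y2) * unit_centimicrons // 2
    return f"L {CIF_LAYERS[paint]};\nB {length} {width} {center_x} {center_y};"

# A 4 by 2 unit metal rectangle, assuming one Caesar unit is 1.5 microns.
cif_text = rect_to_cif("metal", 0, 0, 4, 2, unit_centimicrons=150)
print(cif_text)
```

Changing only the table or the unit length retargets the same rectangles to a different process, which is the sense in which the editor itself is technology independent.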
At NPS, Caesar is set up to take commands from any
terminal where the execution of the Caesar program is initi-
ated (usually the ADM-3a console adjacent to the color
graphics display unit) and from a four-button puck on a
graphics tablet attached to the color display device.
Caesar displays its graphics results on an AED 767 color
monitor and displays its menus, messages, and prompts on the
command console. Detailed information on the installation
and operation of Caesar at NPS can be found in Reference 4
and Reference 2.
Caesar is an interactive CAD tool. The results of any
command are rapidly displayed on the AED 767. The results
of a command may be undone (u) or repeated (.) with a single
stroke of the specified key on the command console. While
running Caesar, a user may also call upon the design rule
checker, Lyra, to check the area inside and within three
Caesar units* of the current box for design rule violations.
This interactive use of the layout graphics display and the
design rule checker helps to insure that there will not be
any design rule forced changes late in the design cycle when
changes are much more time consuming. With Caesar's level
of interaction with the designer, the design loop consisting
of (1) issue commands to perturb existing circuit, (2)
visual inspection to verify command's generation of desired
results, and (3) design rule checking of new circuit, can be
rapidly completed.
* A Caesar design is laid out on a grid of Caesar units.
These units do not represent any specific length. When
creating a CIF file from a Caesar file the desired length of
a Caesar unit is specified.
Caesar is a hierarchical design tool. With Caesar,
circuits can be created by piecing together cells (other
files of type .ca) which in turn may be made up of other
sub-cells. Theoretically, there is no limit to the number
of levels in the hierarchy. Not only can cells (sub-cells,
etc.) be called upon to fill locations in a circuit, if they
need to be modified to function properly, Caesar provides a
subedit mode to facilitate editing of layouts one level
below the current editing level. Care must be taken when
this subedit feature is used since the changes made to the
cell are global. Everywhere the given cell is used on the
chip, the newly edited version will appear.
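The global effect of editing a shared cell can be sketched with a small data structure. The class below is a hypothetical model of a cell hierarchy, not Caesar's file format: each parent stores references to its sub-cells together with placement offsets, so a change to a sub-cell appears at every place it is used.

```python
class Cell:
    """A layout cell: its own rectangles plus placed instances of sub-cells."""

    def __init__(self, name):
        self.name = name
        self.rects = []   # (paint, x1, y1, x2, y2) in Caesar units
        self.uses = []    # (sub_cell, dx, dy) placements

    def place(self, sub_cell, dx, dy):
        # Store a reference, not a copy, so later edits to sub_cell show up here.
        self.uses.append((sub_cell, dx, dy))

    def flatten(self, dx=0, dy=0):
        """Return every rectangle in the hierarchy in top-level coordinates."""
        flat = [(p, x1 + dx, y1 + dy, x2 + dx, y2 + dy)
                for p, x1, y1, x2, y2 in self.rects]
        for sub_cell, sx, sy in self.uses:
            flat.extend(sub_cell.flatten(dx + sx, dy + sy))
        return flat

contact = Cell("contact")
contact.rects.append(("cut", 0, 0, 2, 2))

chip = Cell("chip")
chip.place(contact, 0, 0)
chip.place(contact, 10, 0)

before = chip.flatten()
contact.rects[0] = ("cut", 0, 0, 2, 4)   # edit the shared cell once
after = chip.flatten()
print(before)
print(after)
```

Both placements of the contact cell change after the single edit, which is the behavior that calls for care when the subedit feature is used.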
B. LYRA
Like Caesar, Lyra is a generic design rule checker.
When Lyra is invoked from within Caesar, the actual program
executed to check for design rule errors depends on the
technology file indicated in the header of the Caesar file
being edited. After running, Lyra sends a message to the
command console indicating the number of errors found. On
the graphics display Lyra paints the exact location of each
error and labels each error with the design rule violated.
The error label consists of abbreviations for the layers
involved, followed by an underscore, followed by an abbrevi-
ation for the type of violation detected. Table 1 lists the
abbreviations used by Lyra for CMOS-pw.
The winter 1983 distribution of the University of
California at Berkeley (UCB) CAD tools included two versions
of Lyra: one for the Mead-Conway NMOS design rules and the
other for the Jet Propulsion Laboratory's (JPL) five-micron















supports fabrication of the JPL CMOS-pw process, design
rules for the MOSIS supported three-micron CMOS-pw process
were obtained. Professor Marco Annatarone at
Carnegie-Mellon University (CMU) generated the listing of the
three-micron CMOS-pw design rules compatible with Lyra and
has provided NPS with a copy. To generate executable code
from the prototype Lyra program and imbed the specific
process design rules, the program rulec (see Appendix B) is
run with the design rule list file as its argument.
Now, when Lyra is invoked from Caesar while editing a
CMOS-pw technology circuit, the three-micron minimum feature
size CMOS-pw design rules are applied. This version of Lyra
does not check for exceeding any maximum dimensions. The
only maximum size design rule in this technology is for
contact cuts, which may not exceed 3 microns by 8 microns.
Avoidance of improper contact cuts can be accomplished by
utilizing Caesar's hierarchical nature. Contact cuts of all
needed sizes and types are generated once and saved to be
inserted as cells wherever needed.
C. SIMULATION
Once a circuit layout has completed this initial design
loop, it matches the designer's conception of how it should
appear and is free of design rule violations. The perform-
ance of the given circuit, though, remains uncertain. To
simulate the performance of the design, programs such as
SPICE [Ref. 11] and RNL [Ref. 11] are used.
1. SPICE
SPICE is an important simulation tool in the design
of high speed CMOS digital and analog circuits. With its
detailed device modeling, SPICE can provide accurate
predictions of performance once the device parameters of the
implementation technology are known. SPICE provides the
logical output of a circuit based upon the inputs and
describes the transient behavior of the circuit as it
changes to the new logical output. Thus SPICE enables a
designer to optimize transistor dimensions for speed.
Unfortunately, the version of SPICE currently available
on both the Vax 11-780 and the IBM 3033 at NPS (version
2G6) fails when the parameters of the devices fabricated by
the MOSIS three-micron CMOS-pw process are used. With these
parameters the transient behavior solutions do not converge.
Engineers at CMU, UCB, and the University of
Washington (UW) are currently employing an experimental
version of SPICE (version 2X.x, developed at UCB) which is
successful in simulating with the three-micron CMOS-pw device
parameters. This version, however, has other bugs and is
therefore not available for general distribution. The
changes to SPICE 2G6 that enable SPICE 2X.x to simulate the
three-micron CMOS-pw devices will be incorporated into the
next distribution of SPICE (version 2G7). The Naval
Postgraduate School is in the queue of institutions to
receive SPICE 2G7 once it is ready.
In order to run a SPICE simulation of a CMOS circuit
designed using Caesar, the following steps should be
executed. First, the labeling feature of Caesar is used to
place labels on the electrical nodes of interest in the
circuit (Vdd, GND, input, output, etc.). Second, the Caesar
command
: cif 100 -p
is issued to generate the basename.cif file. The parameter
100 indicates a scale of 100 centimicrons per Caesar unit
(see footnote 5) and must be specified unless the default
value of 200 centimicrons per Caesar unit is desired. The -p
option causes entries to be made in the basename.cif file
for the labels assigned. Third, after exiting Caesar and
returning to Unix, the circuit extractor Mextra [Ref. 10] is
invoked
using the command
% mextra basename
to create the file basename.sim. To modify the basename.sim
file to a SPICE file (basename.spice), the program sim2spice
[Ref. 11] is used. The basename.spice file contains a list
of transistors and capacitors in the circuit in a SPICE
compatible format.
The basename.spice file must be edited to add the
model parameters for the transistors, to specify the
waveforms of the input(s), to specify the type of analysis
to be performed (usually transient analysis) and to specify
the output to be produced (tables, graphs, etc.). The SPICE
User's Manual [Ref. 11] contains the formats of these
additions to basename.spice. Best case and worst case device
model parameters for the MOSIS three-micron CMOS-pw process
as compiled by Dr. M. Annaratone of CMU and Dr. L. Glasser
of MIT are found in Appendix A.
5 Since the minimum dimensions for the 3-micron CMOS-pw
process are specified in microns instead of lambda, CMOS-pw
circuits are usually designed on Caesar using one micron per
Caesar unit.
2. RNL
RNL is a timing and logic simulator for digital MOS
circuits. It is an event driven simulator which uses a
resistance-capacitance model of a circuit to estimate node
transition times and to estimate the effects of charge
sharing (see footnote 6). After input values have been
assigned by the user,
RNL calculates the effects of those inputs by repeating the
following operations until there are no further node value
changes: (1) when a node is added to the network due to a
transistor being turned on, the charge sharing implications
of the new node's capacitance and logic state on each of its
electrical neighbors are computed, (2) for each node that
might be affected, Vthev and Rthev (the parameters of the
Thevenin equivalent circuit) are calculated and the new
logic state is determined from Vthev (0.0Vdd to 0.3Vdd =
logic 0, 0.8Vdd to 1.0Vdd = logic 1, logic X otherwise), (3)
if the node has changed state, the transition time is
calculated using the node's capacitance, and (4) any changes
are propagated to other nodes. Details of the computation
methods used by RNL can be found in the RNL Version 4.2 (UW)
User's Guide [Ref. 11]. More important to the user is an
understanding of what information RNL keeps, what it
discards, and how it decides what to do next.
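The logic state decision in step (2) can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted above, not RNL's own code; the function name logic_state and the 5 volt default for Vdd are assumptions made for the example.

```python
def logic_state(vthev: float, vdd: float = 5.0) -> str:
    """Classify a Thevenin voltage with the thresholds quoted in the text.

    vdd defaults to 5.0 V, which is an assumption for this sketch.
    """
    ratio = vthev / vdd
    if 0.0 <= ratio <= 0.3:
        return "0"          # 0.0Vdd to 0.3Vdd is logic 0
    if 0.8 <= ratio <= 1.0:
        return "1"          # 0.8Vdd to 1.0Vdd is logic 1
    return "X"              # anything else is undefined


for volts in (1.0, 2.5, 4.5):
    print(volts, logic_state(volts))
```

A node sitting at mid-rail (2.5 V here) is reported as X, which is the state the two problem circuits discussed later become stuck in.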
Basic to the operation of RNL is the idea of an
event. The three elements of an RNL event are: (1) a node
in the network, (2) a new logic state for the node, and (3)
the time when the node value changes to the new logic state.
RNL maintains a list of events, sorted by time, that tells
what processing remains to be done. When the user changes
an input, an event is added to the list. RNL sequentially
processes the next event on the list, stopping when (1) the
list is empty, (2) a node the user is tracing changes value,
or (3) when the specified simulation time interval has
elapsed. To process an event, RNL removes it from the list,
changes the node's state to reflect its new value, and then
calculates any new events resulting from the node's new
value.
6 Charge sharing refers to the capacitive effects that
happen when two or more previously unconnected nodes, each
having some charge and capacitance, become connected by a
resistor (transistor turning on).
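The event list described above behaves like a priority queue ordered by time. The following Python sketch shows the three stopping conditions; it is not RNL's implementation. The names run_events and react, and the one-unit inverter delay in the usage example, are hypothetical.

```python
import heapq


def run_events(events, state, traced=(), t_stop=float("inf"), react=None):
    """Process (time, node, value) events in time order.

    Stops when the list is empty, when a traced node changes value,
    or when the next event lies beyond t_stop.
    react is a hypothetical callback that returns new events.
    """
    heapq.heapify(events)
    while events:
        if events[0][0] > t_stop:
            return "time limit"
        time, node, value = heapq.heappop(events)   # remove from the list
        changed = state.get(node) != value
        state[node] = value                         # apply the new state
        if react is not None:
            for new_event in react(time, node, value, state):
                heapq.heappush(events, new_event)   # schedule consequences
        if changed and node in traced:
            return "traced node changed"
    return "list empty"


def inverter(time, node, value, state):
    """Toy network: node abar follows the inverse of node a after 1 unit."""
    if node == "a":
        return [(time + 1.0, "abar", 1 - value)]
    return []


state = {"a": 0, "abar": 1}
reason = run_events([(0.0, "a", 1)], state, react=inverter)
print(reason, state)
```

Changing the input a produces one follow-on event for abar, after which the list is empty and processing stops.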
In calculating new events, first all nodes that
might be affected by the change are found and marked. This
includes the source and drain of all transistors for which
the current node is the gate and all nodes connected to
these nodes through turned on transistors. The search
through the network stops when a non-conducting transistor
or an input is reached. For each marked node, two calcula-
tions are made. First, a charge sharing calculation is
performed to model changes of state due to the charging and
discharging of node capacitances. Second, a final value
calculation is done to determine the node's ultimate logical
state.
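The marking step can be pictured as a breadth-first search over the transistor network. The sketch below is an interpretation of the paragraph above, not RNL source. Transistors are represented as (gate, source, drain) tuples, which is an assumed data layout, and input nodes are left out of the result on the assumption that their values are fixed by the user.

```python
from collections import deque


def affected_nodes(changed, transistors, conducting, inputs):
    """Return the nodes that might be affected when `changed` switches.

    transistors: list of (gate, source, drain) tuples (assumed layout)
    conducting:  set of indices of transistors that are turned on
    inputs:      set of input node names, where the search stops
    """
    marked = set()
    queue = deque()

    def mark(node):
        # Inputs end the search and need no new events of their own.
        if node not in marked and node not in inputs:
            marked.add(node)
            queue.append(node)

    # Source and drain of every transistor gated by the changed node.
    for gate, source, drain in transistors:
        if gate == changed:
            mark(source)
            mark(drain)

    # Spread through turned-on transistors only.
    while queue:
        node = queue.popleft()
        for index, (gate, source, drain) in enumerate(transistors):
            if index not in conducting:
                continue        # a non-conducting transistor stops the search
            if node == source:
                mark(drain)
            elif node == drain:
                mark(source)
    return marked


network = [
    ("a", "gnd", "abar"),     # 0: gated by the changed node
    ("abar", "out", "n1"),    # 1: gated by abar, not by a
    ("en", "abar", "n2"),     # 2: conducting pass transistor
    ("en2", "n2", "n3"),      # 3: turned off
]
print(sorted(affected_nodes("a", network, {2}, {"gnd", "a", "en", "en2"})))
```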
A given node can have only two events pending: (1) a
charge sharing event describing an immediate change in the
node's state due to charge redistribution among the nodes on
the connection list, and (2) a final value event describing
the final, driven state of the node. RNL observes the
following rules for processing events: (1) when a new charge
sharing event is scheduled, throw away all previously
pending events for the node, and (2) when a new final value
event is calculated, it will be ignored if (a) there is a
pending final event for the same value which is scheduled to
occur sooner, (b) there is a pending charge sharing event
for the same value as the new final event, or (c) there is
no charge sharing event and the new final value event is the
same as the node's current value. These rules are based on
the assumption that the event that was last calculated
reflects the latest configuration of the network and there-
fore should override events calculated earlier. Charge
sharing events discard any pending final value events
because any charge sharing calculation is immediately
followed by a new final value calculation.
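The two scheduling rules can be written down compactly. The following Python sketch follows the wording of the rules above; the data layout (a dictionary holding at most one "charge" and one "final" event per node) and the choice to let an accepted final value event replace an older pending one are assumptions, not documented RNL behavior.

```python
def schedule_charge_sharing(pending, node, time, value):
    """Rule 1: a new charge sharing event discards everything pending."""
    pending[node] = {"charge": (time, value)}


def schedule_final_value(pending, current, node, time, value):
    """Rule 2: return True if the new final value event is kept."""
    slots = pending.setdefault(node, {})
    final = slots.get("final")
    charge = slots.get("charge")
    if final is not None and final[1] == value and final[0] < time:
        return False    # (a) same value already pending, and sooner
    if charge is not None and charge[1] == value:
        return False    # (b) a charge sharing event already gives this value
    if charge is None and current.get(node) == value:
        return False    # (c) nothing pending and the node already has the value
    slots["final"] = (time, value)   # assumption: replaces any older final event
    return True


pending, current = {}, {"out": 1}
print(schedule_final_value(pending, current, "out", 5.0, 1))   # ignored by (c)
print(schedule_final_value(pending, current, "out", 5.0, 0))   # kept
schedule_charge_sharing(pending, "out", 6.0, "X")              # drops the final event
print(pending["out"])
```

The last call shows how a charge sharing event from one input can wipe out a pending final value event, which is the mechanism behind the incorrect results discussed next.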
These event rules, however, sometimes lead RNL to
generate incorrect results. This is especially true of
signal driven circuits (circuits where inputs are applied to
the source and drain of a transistor as well as its gate)
and circuits that depend on the analog properties of the
devices to predict the behavior of the circuit. For























Figure 3.1 CMOS Exclusive OR [Ref. 6].
pipelined adder in Figure 3.1. This design has proven to
function correctly at CMU; however, the RNL simulation shows
this circuit failing.
Starting in a state where A=0, B=1, and out=1,
assume that the input A then transitions to 1. Initially
Q1, Q3, Q4, and Q6 are on. When input A goes high, Q3 is
turned off (no events generated) and Q2 is turned on,
generating a charge sharing event and a final value event for
Abar resulting in Abar going low. When Abar goes low, the
still turned on Q6 is now trying to drive the output node
low and the still turned on Q4 (RNL recognizes that it takes
a finite amount of time for Q4 to turn off but does not
recognize that n-channel transistors do not conduct high
voltages well) is still trying to drive the output node
high. The result is an output of X, the undefined state.
Next, Q4 is turned off. Since turning off Q4 adds no new
nodes to the network, the event list is empty and the output
remains at X. The primary difficulty RNL has with this
circuit centers around the fact that the output node is
controlled by two nodes that can change at different times.
As a result, a charge sharing event due to one input can
eliminate a final value event of the other, with that final
value event being the force which determines the circuit's
actual behavior.
The circuit of Figure 3.2 is a proven latch design
which also fails in RNL simulation. In Figure 3.2 the
fractions next to the transistors represent the length to
width ratios of the devices. This circuit is dependent on
these ratios for proper operation. These ratios insure that
the gain of the input signal on the gates of Q5 and Q6 is
greater than the gain of the feedback signal to the same
gates. RNL does not recognize the difference in these gains
to be sufficient to cause the gates of Q5 and Q6 to be at
either logical 1 or 0 when the input signal is the opposite
of the feedback signal. As a result, the circuit becomes
locked up at X. Because of RNL's difficulty with these two
circuits, other designs were employed in the final adder
(see chapter 5) to facilitate testing of the overall design.
Figure 3.2 CMOS Latch Design [Ref. 6].
To use RNL as installed at NPS, the following steps
should be followed. First label the circuit and generate
basename.cif as before. Again the program Mextra is used to
extract the circuit, this time with the -o option (mextra
basename -o). The -o option causes Mextra not to compute
capacitances. A follow on program in this sequence, Presim,
performs this computation with greater accuracy. It should
be noted that there are three different circuit extraction
programs, each named Mextra. There is the MIT version, the
UCB version and the UW modified UCB version. The next tool
to be used in the sequence, Presim, can accept the output
format of the MIT version and the UW modified UCB version.
At NPS, the UCB version is installed and was used. The MIT
and UW modified UCB versions differ in the order of the
parameters in a transistor specification. Professor
Annaratone at CMU developed a program, cformat, to change a
.sim file generated by the UCB version to the MIT format.
However, cformat does not work if the -o option is used with
Mextra. To avoid a loss of accuracy, the .sim file can
manually be changed to the UW modified UCB format. The
first step in this format change is to use the vi text
editor to add "format: UCB" to the header line of
basename.sim. The other change that needs to be made is to
change the labels for the n-channel transistors from "n" to
"e". Using the ex editor, the following steps accomplish
this:
% ex basename.sim - invokes the editor
:g/^n/s//e/g - make global change:
for all n as first char
in a line, change to e
:w - write back edited file
:q - exit editor
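For readers who prefer a script to an interactive editor session, the same relabeling can be sketched in Python. The function name relabel_n_channel and the sample lines are made up for illustration and do not reproduce the actual .sim record layout; only the rule "an n in the first column becomes an e" is taken from the text.

```python
import re


def relabel_n_channel(sim_text: str) -> str:
    """Change an 'n' in the first column of every line to 'e'."""
    return re.sub(r"^n", "e", sim_text, flags=re.MULTILINE)


# Hypothetical sample lines, not real Mextra output.
sample = "n in out gnd\np in vdd out\nn clk a b\n"
print(relabel_n_channel(sample))
```

Like the editor command, this touches only the first character of a line, so an "n" appearing later in a line is left alone.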
The next step is to create a binary file for RNL
from basename.sim using Presim. This is done by issuing the
command:
% presim basename.sim basename config
Basename.sim is the edited .sim file and basename is the
file into which presim writes its binary output. Config is
the calibration file used to select other than default
values for the circuit element capacitance and resistance.
A copy of the presim user's guide from the UW/NW VLSI
Consortium release 2.0 and the calibration file used in
simulating the adder are contained in Appendix C. The
values used in the calibration file are taken from the MOSIS
supplied electrical parameters.
The final step is to run RNL itself. This is done
by entering one of the following two Unix commands:
% rnl or
% rnl cmdfile
where cmdfile is the name of a file containing a sequence of
RNL commands. Entering the first Unix command will cause
RNL to take its commands directly from the console
interactively. If the second Unix command is used,
specifying a command file, RNL first executes all the
commands in cmdfile and upon completion, starts taking
commands from the





(read-network "basename")
where basename is the file generated by presim. The first
two commands load RNL with several macros which simplify
user interfacing with RNL.
The user interface with RNL is a LISP interpreter.
The interpreter continuously executes the loop: (1) read a
command, (2) evaluate the command and perform the specified
actions, and (3) print the result. There are two formats
for specifying commands to this loop. The first is:
(function argument argument ... argument)
Here the parentheses delimit the command and spaces separate
the elements. The interpreter reads the entire command, up
to the closing parenthesis, then the first element is inter-
preted as a function and all the others as arguments. The
arguments may be of the same command form, (function arg arg
... arg). If the following command were issued to RNL,
(* 12 (+ 2 2) (/ 14 7))
RNL would respond by typing 96 (12*4*2). The other format
for commands to RNL is
(function '(argument argument ... argument))
where the "'" indicates the quote special form which keeps
its argument from being evaluated. For example, (+ 2 3)
evaluates to 5, but '(+ 2 3) is a string of three elements.
When this second RNL command format is not used to represent
an argument of another command (i.e. is not contained within
the parentheses of another command), it may be written in
the more natural form:
function argument argument .... <newline>
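The first command format, with nested commands as arguments, can be illustrated with a very small prefix evaluator in Python. This is a teaching sketch and not the RNL interpreter: it handles only integers and the four arithmetic operators, and it assumes that / performs integer division.

```python
from functools import reduce
import operator

# Assumption: "/" is integer division, as the 14 / 7 example suggests.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def tokenize(text):
    """Split a command into parentheses, operators, and numbers."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Read one element: a number, a symbol, or a parenthesized command."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)               # discard the closing parenthesis
        return items
    return int(token) if token.lstrip("-").isdigit() else token


def evaluate(expr):
    """First element is the function, the rest are evaluated arguments."""
    if isinstance(expr, list):
        func, *args = expr
        return reduce(OPS[func], [evaluate(arg) for arg in args])
    return expr


print(evaluate(parse(tokenize("(* 12 (+ 2 2) (/ 14 7))"))))  # prints 96
```

The inner commands (+ 2 2) and (/ 14 7) are evaluated first, giving 4 and 2, and the outer multiplication then yields 12*4*2.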
Tutorials on RNL are contained in the University of
Washington/Northwest VLSI Consortium's VLSI Design Tools
Reference Manual [Ref. 11]. There are two points concerning
the Mextra, Presim, RNL simulation cycle a user should be
aware of that are not brought out in the documentation. The
first concerns the use of vectors in RNL commands. As
evidenced in the tutorials of Reference 11 and the adder
simulation results in Appendix D, vectors can be used to
make the input and output of RNL less cumbersome and
verbose. After the vector has been defined, a user will
then want to assign values to it. The documentation shows
the format of the vector value assignment command to be:
(invec '(vecname values))
However, the "values" field has its own specific format.
The first character should be a 0 or a 1 indicating positive
and negative numbers, respectively. The LISP interpreter
will work with negative numbers but RNL will not accept
negative numbers as logical inputs. The second character is
a letter specifying the number base of the input vector (b
for binary, h for hexadecimal). For example, to assign the
binary value +101010 to the vector vectone, the RNL command
would be:
(invec '(vectone 0b101010))
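When many input vectors have to be generated, for example from a test script, the command text can be assembled mechanically. The helper below is a hypothetical convenience written in Python, not part of RNL; it simply applies the value format described above (a leading 0 for a positive number, then the base letter, then the digits). The use of lowercase hexadecimal digits is an assumption.

```python
def invec_command(vecname: str, value: int, digits: int, base: str = "b") -> str:
    """Build the text of an invec command for a non-negative value.

    digits is the number of digits to emit in the chosen base.
    """
    if value < 0:
        raise ValueError("RNL does not accept negative numbers as logical inputs")
    if base not in ("b", "h"):
        raise ValueError("base must be 'b' (binary) or 'h' (hexadecimal)")
    spec = "b" if base == "b" else "x"      # Python format codes
    body = format(value, f"0{digits}{spec}")
    return f"(invec '({vecname} 0{base}{body}))"


print(invec_command("vectone", 0b101010, 6))
```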
The other point concerns the location of input
labels on the input pads. When the entire chip is being
simulated, the input labels are normally placed on the metal
pads where the off chip leads are attached. Before an input
signal from a bonding pad reaches the interior circuits of a
chip it must pass through a resistor in an overvoltage
protection circuit. In the extraction and simulation
process this resistor is viewed as an open circuit.
Therefore, on input pads, the input label must be placed
after the resistor in the signal path.
With Caesar, Lyra, and RNL, a designer at NPS has
the requisite CAD tools for the complete logical circuit
design loop. With these tools circuits that are free of
design rule errors and produce the desired logical results
can be designed. The lack of SPICE somewhat restricts the
designer's ability to optimize speed, but there are several
design techniques that can be employed to design chips that
run fast. These will be covered in the next chapter.
IV. DESIGN OF THE ADDER
As stated in the introduction, the primary goals of the
adder design are to maximize throughput and to provide for
testability. The adder is to be a pipelined adder. Every
clock cycle it should accept as inputs two 16-bit addends
(A1, the least significant bit, through A16 and B1, the
least significant bit, through B16) and one carry-in (Cin)
bit. It is desired to produce the 16-bit sum (S1, the least
significant bit, through S16) and the carry-out (Cout) bit
as quickly as possible. Both the number of clock cycles
from input of the addends to the output of the sum and the
duration of each clock cycle are to be minimized. A secon-
dary consideration in the design is expandability. An
expandable design is one that can easily be extended to
produce a 32-bit or 64-bit sum utilizing the same circuit
structures. In this chapter the logical design and layout
design of the 16-bit adder will be presented. The equations
presented in this chapter are taken or derived from
equations found in chapters three through six of The Logic
of Computer Arithmetic by Flores [Ref. 12]. In these
equations concatenation implies the logical AND, the symbol
+ implies the logical OR, and the symbol ⊕ implies the
logical XOR.
A. LOGICAL DESIGN
In considering the speed spectrum of adders from a
logical standpoint, at the fast end there is the table
look-up. With 33 binary inputs and 17 outputs, this would
require an address space of 2^33 17-bit words. With current
technology this is not feasible. At the other end of the
spectrum is the serial adder. On clock cycle 1 it uses A1,
B1, and Cin to produce S1 and C1out (carry out of bit one
into bit 2). On clock cycle 2 it uses A2, B2, and C1out to
generate S2 and C2out. Here 16 clock cycles elapse before
the sum is available. An adder can also be implemented as a
ripple carry adder where the duration of each clock pulse is
sufficient to allow a carry into the sum to propagate all
the way through to a carry out. In the case of the 16-bit
adder, this would require a clock duration at least sixteen
times the length of the gate delay of the one bit adder.
The middle ground belongs to the carry look-ahead adder
[Ref. 3]. In carry look-ahead (CLA) addition the carry into
each bit position, C(i), is generated from the propagate,
P(i), and generate, G(i), primitives.
$P_{(i)} = A_{(i)} \oplus B_{(i)}$ (eqn 4.1)
$G_{(i)} = A_{(i)} B_{(i)}$ (eqn 4.2)
P(i)=1 implies that a carry into bit(i) will be propagated
through to bit(i+1). G(i)=1 implies that A(i) and B(i) will
provide a carry into bit(i+1) of the sum, regardless of the
contents of the less significant bits of A and B.
$C_{(i)} = G_{(i-1)} + G_{(i-2)} P_{(i-1)} + \cdots + C_{in} P_{(i-1)} \cdots P_{(2)} P_{(1)}$ (eqn 4.3)
$S_{(i)} = C_{(i)} \oplus P_{(i)}$ (eqn 4.4)
The algorithm for the CLA sum generation is as follows. The
first event is the evaluation of equations 4.1 and 4.2 to
generate the P(i) and G(i) primitives. The second event uses
the P(i) and G(i) primitives as inputs to equation 4.3 to
generate the C(i)'s. The final event is the computation of
the S(i)'s from equation 4.4.
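The three events of the CLA algorithm can be checked with a short bit-level model in Python. This is a behavioral sketch only: it evaluates the propagate and generate primitives, forms every carry as a flat sum of products in the manner of equation 4.3, and then forms the sum bits. The function name cla_add and the list-based bit numbering (index 1 is the least significant bit) are choices made for the example.

```python
def cla_add(a, b, cin=0, width=16):
    """Add two width-bit numbers with flat carry look-ahead logic."""
    A = [None] + [(a >> k) & 1 for k in range(width)]
    B = [None] + [(b >> k) & 1 for k in range(width)]

    # Event 1: propagate and generate primitives.
    P = [None] + [A[i] ^ B[i] for i in range(1, width + 1)]
    G = [None] + [A[i] & B[i] for i in range(1, width + 1)]

    # Event 2: carry into each bit position; C[width + 1] is the carry out.
    C = [None]
    for i in range(1, width + 2):
        carry = cin
        for k in range(1, i):
            carry &= P[k]               # Cin P(i-1) ... P(1)
        for j in range(1, i):
            term = G[j]
            for k in range(j + 1, i):
                term &= P[k]            # G(j) P(j+1) ... P(i-1)
            carry |= term
        C.append(carry)

    # Event 3: sum bits.
    total = 0
    for i in range(1, width + 1):
        total |= (C[i] ^ P[i]) << (i - 1)
    return total, C[width + 1]


print(cla_add(0xFFFF, 0x0001))      # (0, 1)
```

Each carry is computed directly from the primitives without waiting for a neighboring carry, which is what distinguishes this scheme from the ripple carry adder.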
As pointed out by Flores [Ref. 12] and by Conradi and
Hauenstein [Ref. 3], there are several logical
implementations of carry look ahead addition. A principal
task of
this thesis investigation was to select a fast logical
design. Without the circuit simulator Spice, the analysis
of each design considered was more qualitative than quanti-
tative. In this qualitative analysis, a turned on tran-
sistor is considered as a resistor with its resistance
proportional to its length and inversely proportional to its
width. All gates driven by such a turned on transistor are
considered to be capacitive loads with capacitance propor-
tional to the area of the gate. The interconnect wiring is
considered to add both parallel capacitive loading and













Figure 4.1 CMOS Output Loading Model.
From this model it is obvious that the amount of inter-
connect wiring and the number of gates driven (fanout)
should be minimized to minimize the output transition time
when the positions of switches S1 and S2 of Figure 4.1 are
reversed. This led to the following guidelines in the
design of the adder:
1) The internal logic of each stage should be accomplished
with minimum dimension transistors, 3 microns x 4 microns
(length x width). This leads to more compact circuits with
shorter interconnections and reduces the capacitive load on
the preceding stage.
2) Significantly wider transistors (3-micron x 9-micron)
should be used at the output of each stage where the
fanout and interconnect loading is greater.
3) The fanout of any transistor should be kept to less
than five.
This requires a more complete definition of fanout
because the capacitive loading of a gate depends on its
area. A 3-micron x 4-micron transistor driving six other
3-micron x 4-micron transistors has a fanout of six. A
3-micron x 8-micron transistor driving the same load is
considered to have a fanout of three. Though this implies
that a high fanout problem can be solved by merely
increasing the width of the driving transistor, it neglects
the effects of the interconnect wiring. As gates are added
to the load of a transistor, each subsequent addition must
be more remote from the driving transistor. Since the
resistance of the wiring is proportional to its length and
inversely proportional to its width, the resistance of the
wiring will increase unless the width is also increased.
However, since the capacitance of the wiring is proportional
to its area, most of the gain achieved by widening the wire
to reduce resistance is offset by the increase in capaci-
tance. As a result, in the design of the adder, increasing
the width of the driving transistor was not viewed as a
complete fix for a fanout problem.
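The area-based definition of fanout used above reduces to a ratio of gate widths when all gates have the same length. A minimal Python sketch follows; the function name fanout is hypothetical, and the model deliberately ignores interconnect wiring, which the text identifies as the reason widening the driver is not a complete fix.

```python
def fanout(driver_width_um, load_widths_um):
    """Fanout as total load gate width divided by driver width.

    Assumes every gate has the same length, so area scales with width.
    Interconnect loading is not modeled.
    """
    return sum(load_widths_um) / driver_width_um


six_minimum_gates = [4.0] * 6                 # six 3-micron x 4-micron loads
print(fanout(4.0, six_minimum_gates))         # 6.0
print(fanout(8.0, six_minimum_gates))         # 3.0
```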
For the comparison of the different approaches to CLA
addition, the term logical event needs to be defined. The
most basic definition is a combinational logic circuit
accepting a set of inputs, performing its specified opera-
tions on those inputs and generating a set of outputs.
Therefore, the input of the addends, followed by the compu-
tation and output of the sum can be considered as a logical
event. However, a primary design consideration for the
adder is to provide for testability and a key element of
this provision is the availability of intermediate results
(see section B of this chapter). This implies breaking up
the sum generation into several separate events. The first
event takes the addends as inputs, performs some logic
operation(s) on them and stores the results in a register.
The
next event takes its inputs from that register and stores
its results in another register. This chain continues until
the last event deposits the sum on the output pads of the
chip. To provide the tester with easily interpreted inter-
mediate results, the equations presented in this chapter
were taken as boundaries for each logical event. The terms
on the right side of the equation determine the inputs and
the left side terms determine the output of a logical event.
Once all the inputs for an equation are generated by
previous events, the logic of the equation becomes part of
the current event.
1. Zero Level CLA Logic
This logic requires three events to generate the
sum. First, equations 4.1 and 4.2 are used to generate the
P(i)'s and G(i)'s. Second, from equation 4.3 the C(i)'s are
generated. Finally, the sum is derived from equation 4.4.
The principal problem with this approach for a sixteen-bit
adder lies in the application of equation 4.3. Here, the
input P(1) has a fanout of 15, which makes this approach
unsatisfactory.
2. First Level CLA Logic
Noting that a four-bit sum generated using zero
level CLA logic is within the design guidelines suggests
cascading 4-bit slices of the same logic as indicated in
Table 2.
TABLE 2
Here the sum is available after six events and the
fanout is reduced by a factor of four. The event cycle time
reduction would more than make up for the event count
increase since cycle time grows faster than linearly with
fanout. The only drawback with this design lies in the cost
of extending it to generate 32-bit or 64-bit sums. For
every 4-bit slice added, another event is required. Thus, a
64-bit add would require 12 events.
3. Second Level CLA Logic
Again the data is divided into 4-bit slices called
blocks. But rather than let the carries ripple through the
blocks, two new primitive functions are introduced. They
are the block propagate, BP(i), and block generate, BG(i),
functions. BP(i)=1 implies that a carry into block(i) will
be propagated through to block(i+1). BG(i)=1 implies that
block(i) will generate a carry into block(i+1). For a 4-bit
block where bit(1) is the least significant bit, the BP and
BG primitives are generated by equations 4.5 and 4.6
respectively, with the P(i)'s and G(i)'s computed as before.
$BP_{(i)} = P_{(4)} P_{(3)} P_{(2)} P_{(1)}$ (eqn 4.5)
$BG_{(i)} = G_{(4)} + G_{(3)} P_{(4)} + G_{(2)} P_{(4)} P_{(3)} + G_{(1)} P_{(4)} P_{(3)} P_{(2)}$ (eqn 4.6)
Next, the block carry, BC(i), which represents the carry
from block(i) into block(i+1), is computed using equation
4.7.
$BC_{(i)} = BG_{(i)} + BG_{(i-1)} BP_{(i)} + \cdots + C_{in} BP_{(i)} \cdots BP_{(2)} BP_{(1)}$ (eqn 4.7)
So far, after three events, the P(i)'s, G(i)'s,
BP(i)'s, BG(i)'s, and BC(i)'s have been generated. If the
same method of generating the final sum as used in zero
level CLA were to be used, two additional events would be
required. The first again applies the logic of equation 4.3
to each 4-bit block to generate the carry into each bit.
Here the Cin for block(i) is given by BC(i-1). The second
cycle is used to generate the sum from the C(i)'s and
P(i)'s. One of these events can be eliminated if, while the
BC(i)'s and their predecessors are being computed, an
estimated sum of the 4-bit block is also computed. One
method is to compute two estimated sums for each block, one
assuming a carry into the block of 0 and the other assuming
a carry in of 1. When the correct carry in for block(i) is
generated, it is used to multiplex the correct sum for the
block to the output. This assumed carry method was rejected
because of the large amount of area consumed by the
registers needed to hold two possible answers. The second
method is to compute the estimated sum of the block assuming
a carry-in of 0 and then to correct the estimated sum once
the actual carry-in to each block is known.
Since the estimated sum, ES(i), is not needed until
after the third event and computing it as one event again
leads to fanout problems, ES(4), the most significant bit,
through ES(1) are computed in two events as follows. First,
an intermediate estimated sum, IES(i), is computed using
two-bit slices, each assuming a carry in of 0 (see equations
4.8 through 4.11). At the same time, a carry from bit(2)
into bit(3) (IC23) is computed using equation 4.12. On the
next event, ES(i) is computed from the IES(i)'s and IC23
using equations 4.13 through 4.16.




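$IES_{(1)} = P_{(1)}$ (eqn 4.8)
$IES_{(2)} = G_{(1)} \oplus P_{(2)}$ (eqn 4.9)
$IES_{(3)} = P_{(3)}$ (eqn 4.10)
$IES_{(4)} = G_{(3)} \oplus P_{(4)}$ (eqn 4.11)
$IC23 = G_{(2)} + G_{(1)} P_{(2)}$ (eqn 4.12)
$ES_{(1)} = IES_{(1)}$ (eqn 4.13)
$ES_{(2)} = IES_{(2)}$ (eqn 4.14)
$ES_{(3)} = IC23 \oplus IES_{(3)}$ (eqn 4.15)
$ES_{(4)} = [IC23 \, IES_{(3)}] \oplus IES_{(4)}$ (eqn 4.16)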









Now, after three events, estimated sums for each
4-bit block and the actual carry into each block (Cinb) are
available. From these the sum can be computed using equa-
tions 4.17 through 4.20 .
$S_{(1)} = C_{inb} \oplus ES_{(1)}$ (eqn 4.17)
$S_{(2)} = [C_{inb} ES_{(1)}] \oplus ES_{(2)}$ (eqn 4.18)
$S_{(3)} = [C_{inb} ES_{(1)} ES_{(2)}] \oplus ES_{(3)}$ (eqn 4.19)
$S_{(4)} = [C_{inb} ES_{(1)} ES_{(2)} ES_{(3)}] \oplus ES_{(4)}$ (eqn 4.20)
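The second level scheme can be exercised end to end with a behavioral model in Python. The sketch below is not the thesis circuit. It follows the event structure described in the text (block primitives and intermediate estimated sums, then block carries and estimated sums, then the corrected sum), but the exact equation forms used for the block carry and for the estimated sums are reconstructed from the definitions in the text and should be treated as assumptions. All function names are hypothetical.

```python
def bits4(x):
    """Return [None, bit1, bit2, bit3, bit4]; bit(1) is least significant."""
    return [None] + [(x >> k) & 1 for k in range(4)]


def block_primitives(a4, b4):
    """Return (BP, BG, ES) for one 4-bit block, ES assuming a carry in of 0."""
    A, B = bits4(a4), bits4(b4)
    P = [None] + [A[i] ^ B[i] for i in range(1, 5)]
    G = [None] + [A[i] & B[i] for i in range(1, 5)]
    BP = P[4] & P[3] & P[2] & P[1]
    BG = (G[4] | (G[3] & P[4]) | (G[2] & P[4] & P[3])
          | (G[1] & P[4] & P[3] & P[2]))
    # Intermediate estimated sums from two-bit slices (assumed forms).
    IES = [None, P[1], G[1] ^ P[2], P[3], G[3] ^ P[4]]
    IC23 = G[2] | (G[1] & P[2])          # carry from bit 2 into bit 3
    ES = [None, IES[1], IES[2], IC23 ^ IES[3], (IC23 & IES[3]) ^ IES[4]]
    return BP, BG, ES


def block_sum(ES, cinb):
    """Correct the estimated sum with the actual carry into the block."""
    s1 = cinb ^ ES[1]
    s2 = (cinb & ES[1]) ^ ES[2]
    s3 = (cinb & ES[1] & ES[2]) ^ ES[3]
    s4 = (cinb & ES[1] & ES[2] & ES[3]) ^ ES[4]
    return s1 | (s2 << 1) | (s3 << 2) | (s4 << 3)


def add16(a, b, cin=0):
    """16-bit add built from four 4-bit blocks; returns (sum, carry out)."""
    prims = [None] + [block_primitives((a >> 4 * n) & 0xF, (b >> 4 * n) & 0xF)
                      for n in range(4)]
    BC = [cin]                           # BC[0] stands for the carry into block 1
    for i in range(1, 5):
        carry = cin
        for k in range(1, i + 1):
            carry &= prims[k][0]         # Cin BP(i) ... BP(1)
        for j in range(1, i + 1):
            term = prims[j][1]           # BG(j)
            for k in range(j + 1, i + 1):
                term &= prims[k][0]      # BP(j+1) ... BP(i)
            carry |= term
        BC.append(carry)
    total = 0
    for n in range(1, 5):
        total |= block_sum(prims[n][2], BC[n - 1]) << (4 * (n - 1))
    return total, BC[4]


print(add16(0xFFFF, 0x0001))    # (0, 1)
```

Comparing add16 against ordinary integer addition over a sample of operands is a quick way to confirm that the estimated sum correction behaves as an increment by the block carry.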
Using second level CLA logic, the 16-bit sum is
generated in only four events. Additionally, this design
can easily be extended to the generation of 64-bit sums.
The logic of equations 4.5 and 4.6 which produced the second
level primitives BP and BG can be used again to generate
third level primitives, B3P and B3G. These third level
primitives represent the carry propagate and carry generate
properties of 16-bit slices. The carry into each 16-bit
block is provided by implementing equation 4.7. Thus,
adding one event will provide the carry into each of four
16-bit blocks of a 64-bit sum. The logic of equation 4.3 is
then used to generate the carry into each 4-bit block of the
sum and the final sum is computed as before. The final
result is that by adding two events, for a total of six, and
using the same logic as before (i.e. no new circuits need to
be designed), the 16-bit adder can be extended to a 64-bit
adder.
B. DESIGN FOR TESTABILITY
Another primary objective of the adder design was to
provide for testability, that is, the ability to logically
detect fabrication errors or circuit malfunctions rather
than visually searching for faults with a microscope.
As the complexity of integrated circuits has grown, the
ability to logically detect faults using only the normally
available inputs and outputs has decreased markedly. As
complexity increases, the number of likely faults to be
tested for and the number of input vectors required to
isolate a specific fault grow rapidly. Unless a design
technique is used which allows the tester to examine the
interior logic of a chip, the order of magnitude of the
number of input vectors required to perform useful logical
testing is prohibitive. Thus, if logical testability is
desired, a design technique that provides for it must be
used.
One such design technique is level sensitive scan design
(LSSD) [Ref. 13]. Level sensitive implies that the output
of any logic element is dependent only on the levels of its
inputs. No logic elements are allowed to depend on a tran-
sition such as in an edge triggered flip flop. Scan design
implies that all memory elements in the design are to have
an auxiliary function where their contents are serially fed
to an output pad for examination. This gives a tester the
ability to examine intermediate results. In applying the
LSSD technique to the adder design, the following steps were
taken.
First, all circuits were designed to respond to the
level of their inputs and not to require a transition to
trigger their operation. Second, to insure that each logic
event worked only with stable, non-fluctuating input levels,
the inputs to each event were gated. The input gates were
opened only after the inputs were stable (i.e. the outputs
of the previous event were stable) and closed before the
input gates of the previous event were opened. Third, a
dual mode latch was used to store the output of each logic
event. In the normal mode of operation, the register
latches the outputs of one logic event in parallel and
stores them to be used as inputs for the next logic event.
In its secondary mode of operation, the register stops
taking its parallel inputs and starts to run as a shift
register, shifting its contents onto an output pad.
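The two modes of such a register can be modeled behaviorally in Python. This sketch ignores clock phases and circuit detail. The class name, the convention that a control value of 0 selects the normal parallel mode and 1 selects shift mode, and the choice to shift zeros in at the far end are all assumptions made for illustration.

```python
class DualModeRegister:
    """Behavioral model of a register with parallel-load and shift modes."""

    def __init__(self, width):
        self.bits = [0] * width

    def clock(self, parallel_in, control):
        """Advance one cycle.

        control == 0: latch parallel_in (normal mode), returns None.
        control == 1: shift one place toward the output pad and
                      return the bit that appears there.
        """
        if control == 0:
            self.bits = list(parallel_in)
            return None
        out = self.bits[-1]
        self.bits = [0] + self.bits[:-1]   # assumed: zeros are shifted in
        return out


reg = DualModeRegister(4)
reg.clock([1, 0, 1, 1], control=0)                 # capture an intermediate result
scanned = [reg.clock(None, control=1) for _ in range(4)]
print(scanned)                                     # [1, 1, 0, 1]
```

Scanning out a register in this way gives the tester the intermediate result of one logic event without needing a dedicated pad for every bit.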
One of the consequences of using the LSSD technique is
the large amount of area consumed by the dual mode regis-
ters. In high speed operation, an inverter pair would be
sufficient to store inter-event results. But to permit low
speed testing where the capacitance of a gate may discharge
during one clock phase, and provide the dual mode feature, a
pair of clocked latches with control circuits is required.
C. LAYOUT DESIGN
With the logic decided upon, the next step was to create
the layout of the adder. The logic consisted of four events
to produce the sum. Another event was needed to latch the
input data onto the chip. A two-phase clock was needed to
insure that two adjacent events did not run simultaneously
(insuring stable inputs to each event). To make the output
of the adder compatible with the input to another adder, a
one event delay was added. This insures that the output of
one adder does not change while a second adder is using the
sum from the first as an input. With two 16-bit addend
inputs, one carry-in input, one power supply (Ydd) input,
one reference (GND) input, a 16-bit sum output, one carry-
out output, and two clock inputs, ten pads were left from a
standard 64-pin chip for register mode control input and
register (shift mode) output. Since the design called for
54
five registers, one for each logic event and one for
latching the input data, five pads were used for input of
the register mode control signals and five were used for the
registers to serially output their contents. With the
required inputs and output identified, the preliminary floor
plan shown in Figure 4.2 was created.
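The pad budget described above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the dictionary keys are descriptive labels chosen for illustration, not signal names from the layout.

```python
# Pad budget for a 64-pin package, using the counts stated in the text.
TOTAL_PADS = 64

signal_pads = {
    "addend A": 16,
    "addend B": 16,
    "carry-in": 1,
    "Vdd": 1,
    "GND": 1,
    "sum": 16,
    "carry-out": 1,
    "clock phases": 2,
}

used = sum(signal_pads.values())
spare = TOTAL_PADS - used

registers = 5              # one per logic event plus the input latch
test_pads = 2 * registers  # one mode-control input and one serial output each

print(used, spare, test_pads)
```

The spare pads exactly cover one control input and one serial output for each of the five registers.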
[Figure 4.2 labels: Input B1 - B16; Event 2: compute P, G (phi2); Event 3: compute BP, BG, IES, IC23 (phi1); Event 4: compute BC, ES (phi2); Event 5: compute sum (phi1 & 2) and delay until phi2]
Figure 4.2 Preliminary Chip Floorplan.
The first circuit designed was the dual mode latch of
Figure 4.3. Here the circuit is designed to latch the IN







































Figure 4.3 Dual Mode Latch.
(phi1 is low). When phi1 goes low, a copy of the input is
also stored in the second latch and becomes available at
shift-out, which is connected to shift-in of the next latch.
When control goes high, the IN signal is blocked and the
latch takes its input from the register to the left. The
shift-in of the leftmost latch in a register is tied to
ground. Versatec plots of the actual layouts of this dual
mode latch and the other circuits described in this section
are given in Appendix E.
The AND gate used was constructed from a NAND gate
followed by an inverter as shown in Figure 4.4. Similarly,
the OR gate was constructed from a NOR gate followed by an
inverter (see Figure 4.5). Although logic implemented using
these AND and OR gates is more area consuming than the same
logic implemented in NAND and NOR gates only, the penalty is
not severe because they were used infrequently in the final
design.















Figure 4.5 OR Gate.
The exclusive OR gate (XOR) was constructed from two
inverters and three NAND gates as shown in Figure 4.6.
Though this design is considerably more area consuming than
the XOR gate of Figure 3.1, it was selected because the RNL
circuit simulator could correctly model its operation.
Figure 4.6 Exclusive OR Gate.
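An exclusive OR built from two inverters and three NAND gates can be checked exhaustively. The sketch below uses one common arrangement; since Figure 4.6 is not reproduced here, the exact wiring is an assumption, and the function names are illustrative only.

```python
from itertools import product

def inv(x):
    return 1 - x

def nand(x, y):
    return 1 - (x & y)

def xor_from_nands(a, b):
    """XOR built from two inverters and three NAND gates."""
    upper = nand(a, inv(b))   # low only when a=1, b=0
    lower = nand(inv(a), b)   # low only when a=0, b=1
    return nand(upper, lower)

truth_table = {(a, b): xor_from_nands(a, b) for a, b in product((0, 1), repeat=2)}
```

The output is high exactly when one of the two first-level NAND gates is low, that is, when the inputs differ.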
More complex logic functions were implemented using
programmed logic arrays (PLA) where the outputs are the
logical sum (OR) of the products (AND) of inputs. A single
phase design was needed. A PLA designed to compute when
phi1 is high, between the time the preceding event had
produced stable outputs (phi2 going low) and the time phi1
goes low, had to produce the proper sum-of-products results.
To hold down fanout, a dynamic structure was needed so that
inputs could be applied to a single type of transistor. To
prevent steady state power consumption, a precharged dynamic
structure was needed. Because of charge sharing, the
precharging must take place while the inputs are present on the
transistor gates of the PLA (see Chapter 5, Section C, for a
complete explanation of the charge sharing problem in this
PLA structure). Thus, two distinct events must occur during
this time period. First, the inputs must be applied and
precharging must take place. Then evaluation must occur.
To cause these two events to occur during a single phase of
the clock, the inter-phase time when both phi1 and phi2 are
low must be utilized for precharging. The basic structure
of the resulting PLA is shown in Figure 4.7.
Figure 4.7 PLA Structure.
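The sum-of-products behavior of a PLA can be sketched functionally as an AND plane feeding an OR plane. The Python below models only the logic, not the precharge and evaluate timing; the data layout and the two-input example are hypothetical and are not taken from any PLA in the adder.

```python
def evaluate_pla(inputs, and_plane, or_plane):
    """Evaluate a PLA as a sum of products.

    inputs    : dict mapping input name -> 0 or 1
    and_plane : list of product terms; each term maps an input name to the
                value (0 or 1) that input must have for the term to be true
    or_plane  : dict mapping output name -> list of product-term indices
    """
    products = [
        int(all(inputs[name] == want for name, want in term.items()))
        for term in and_plane
    ]
    return {
        out: int(any(products[i] for i in terms))
        for out, terms in or_plane.items()
    }

# Hypothetical 2-input example: out = in1 XOR in2 as two product terms.
and_plane = [{"in1": 1, "in2": 0}, {"in1": 0, "in2": 1}]
or_plane = {"out": [0, 1]}
```

Each output is true when any of its listed product terms is true, which is the "logical sum of the products" described above.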
Referring back to the floorplan in Figure 4.2, the
layouts of the circuits which perform the logic of each event
are presented in Appendix E. The names assigned to the
layouts are given below. Event 1 consists of a 33-bit
dual-mode latch. Event 2, which computes the P and G primitives
for each bit, is made up of 16 AND gates, 16 XOR gates, and
another 33-bit latch. Event 3, which computes the BP and BG
primitives, the IES(i)'s and the IC23 for each 4-bit block,
is made up of four instances of PLA82 and a 29-bit latch.
The circuit PLA82 is made up of an 8-input, 5-product,
2-output PLA, two XOR gates, one AND gate, and one OR gate.
Event 4, which computes the ES(i)'s and BC for each 4-bit
block, uses four instances of PLA84 to compute the ES(i)'s,
one instance of PLA915 to compute the BC(i)'s, and a
21-bit latch. The circuit PLA915 is a 9-input, 15-product,
5-output PLA and the circuit PLA84 is an 8-input, 7-product,
4-output PLA. Event 5 uses four instances of PLA104 to
compute the S(i)'s and a 17-bit latch to store results and
provide the added delay (by taking the output from the shift
out position, the extra clock cycle of delay is generated).
The circuit PLA104 is a 10-input, 14-product, 4-output PLA.
With this design, the input to output latency is three full
cycles of a two-phase non-overlapping clock; three cycles of
the clock elapse between the time the addends are presented
to the chip and the time the sum becomes available at the
output. In the first three registers the odd number of bits
is due to the need to store the carry-in value until event
4. In the last two registers the odd number of bits is due
to the need to store the computed value of carry-out.
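The arithmetic carried through these events can be illustrated with a functional model of carry look-ahead addition over four 4-bit blocks. This is a sketch under stated assumptions, not the PLA logic of the chip: it uses the textbook definitions of bit propagate and generate and of block propagate and generate, it does not model the IES, IC23 or ES intermediate terms, and the function name `cla_add16` is hypothetical. The block carries are computed in a loop for brevity, whereas look-ahead hardware would expand them into two-level logic.

```python
def cla_add16(a, b, cin):
    """16-bit carry look-ahead addition organized as four 4-bit blocks."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]   # bit propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(16)]   # bit generate

    # Block propagate / generate for each 4-bit block (block 0 holds the LSBs).
    bp, bg = [], []
    for k in range(0, 16, 4):
        bp.append(p[k] & p[k + 1] & p[k + 2] & p[k + 3])
        bg.append(g[k + 3]
                  | (p[k + 3] & g[k + 2])
                  | (p[k + 3] & p[k + 2] & g[k + 1])
                  | (p[k + 3] & p[k + 2] & p[k + 1] & g[k]))

    # Carry into each block, followed by the carry out of the last block.
    bc = [cin]
    for blk in range(4):
        bc.append(bg[blk] | (bp[blk] & bc[blk]))

    # Sum bits: propagate the carry inside each block from its block carry.
    total = 0
    for blk in range(4):
        c = bc[blk]
        for i in range(4 * blk, 4 * blk + 4):
            total |= (p[i] ^ c) << i
            c = g[i] | (p[i] & c)
    return total, bc[4]
```

For example, `cla_add16(0xFFFF, 0, 1)` propagates the carry-in through all four blocks and returns a zero sum with carry-out set.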
The resulting final layout of Figure 4.8 shows the
actual on-chip layout locations of each event's logic. In
addition to the logic circuits for each event, the circuits
AMP and AMP5 are also seen. These are driver circuits for
the high fanout control and clock signals. Each takes as
its input a control signal and produces as outputs the
control signal and its inverse, both driven by 3-micron x
160-micron transistors. This amplifier is the same design
used by the output pads to drive off-chip loads.
This final layout represents one implementation of a
pipelined CLA adder designed for testability. The relative
merits of this design and others that may have been
implemented can, as yet, only be qualitatively discussed. The
addition of SPICE 2G7 to the CAD toolbag will provide future
CMOS designers with the quantitative analysis necessary to
make decisions involving tradeoffs among primary design
objectives.
Figure 4.8 Final Layout.
This final design, when simulated using RNL, functioned
properly at clock speeds up to 14 megahertz. Testing of the
actual chips produced by MOSIS should give an indication of
the accuracy of RNL's predictions. The following chapter
presents a test plan to check for proper operation of the




After several iterations of the design-simulate-redesign
loop, a final layout was achieved for the 16-bit pipelined
adder. These iterations provide considerable confidence in
the logical correctness of the layout. Appendix D contains
RNL simulation results for the full adder. In reading these
results it should be kept in mind that the adder requires
three cycles of the two-phase clock to produce the sum. In
the first part of the simulation, the inputs were kept
constant for three clock cycles to make the results easier
to read. With these steady inputs, simulations were
run to verify the generation of correct sums, concentrating
on those addends that would produce carry propagates and
carry generates across the boundaries of the 4-bit blocks.
The last part of the simulation utilized different inputs
each clock cycle. This was done to test the pipelining
feature of the design, insuring no dependence on repeated
inputs of the addends to produce the proper sum.
After fabrication of the chip, application of similar
inputs to make the same determinations for the actual
circuits will form the initial portion of the test plan. In
this chapter a test plan for the verification of computa-
tional correctness and speed will be presented.
A. INPUTS AND OUTPUTS
The first step in testing the chip will be to connect it
to the required input and output circuitry. To accomplish
this, the identity of the inputs and outputs on each pin
must be determined. Microscopic examination of the chip
will reveal the logo "16-bit Add", located between the GND
and Vdd buses for the pads in the northeast corner (see
Figure 4.8 which is repeated below for convenience). Using
this landmark, the signals on the pads can be labeled as
follows.
Figure 4.8 (repeated) Final Layout
The western edge has sixteen input pads for the addend
A, with the least significant bit, A(1), located at the
northern end. The northern edge of the chip also has
sixteen input pads for the addend B, with the least
significant bit, B(1), located at the eastern end. The southern
edge has fourteen output pads and two input pads. At its
western end is the GND input pad followed by fourteen output
pads for S(16), the most significant bit of the sum, through
S(3). Following S(3), at the eastern end, is the input pad
for Vdd. The eastern edge of the chip has eight input pads
and eight output pads. Starting at the northern end, there
are input pads for phi1, phi2, Cin, CON1 (control signal for
the dual mode register of event 1), CON2, CON3, CON4, and
CON5. They are followed by output pads for SREG1 (serial
output from dual mode register of event 1), SREG2, SREG3,
SREG4, SREG5, Cout, S(2), and S(1) at the southern end.
To supply power to the chip, +5 volts DC should be
applied to the Vdd pad and 0 volts to the GND pad. All
logical inputs including clocks and control signals should
be either GND for a logical 0 or Vdd for a logical 1.
Simulation with RNL revealed some restrictions on the clock
signals. For proper operation, each clock should remain
high for a minimum of 20 nanoseconds and the clock
interphase time, when both phi1 and phi2 are low, must be at
least 10 nanoseconds in duration. For initial testing, to
insure that charge sharing problems caused by too short an
interphase time, and fanout problems caused by too short a
clock phase duration, are not interpreted as fabrication
errors, the clock speed should be adjusted so that both of the
above clock parameters are exceeded by one order of
magnitude.
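The clock restrictions translate directly into a cycle time. The sketch below assumes a symmetric two-phase clock in which both phases have the same high time and both interphase gaps are equal; the constant and function names are illustrative only.

```python
MIN_PHASE_HIGH_NS = 20.0   # minimum time each clock phase stays high
MIN_INTERPHASE_NS = 10.0   # minimum time both phases are low

def clock_period_ns(phase_high_ns, interphase_ns):
    """One full cycle: phi1 high, gap, phi2 high, gap."""
    return 2.0 * (phase_high_ns + interphase_ns)

def frequency_mhz(period_ns):
    return 1000.0 / period_ns

# Cycle time at the simulated limits and at ten times those limits.
fastest = clock_period_ns(MIN_PHASE_HIGH_NS, MIN_INTERPHASE_NS)
initial = clock_period_ns(10 * MIN_PHASE_HIGH_NS, 10 * MIN_INTERPHASE_NS)
```

Under these assumptions the limits correspond to a 60 ns cycle, and the order-of-magnitude margin recommended for initial testing corresponds to a 600 ns cycle, or roughly 1.7 MHz.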
The outputs, like the inputs, are at Vdd to represent a
logical 1 and at GND to represent a logical 0. The circuits
used to measure the outputs should have high input
impedance, on the order of one megohm. The output pads of
the adder are not designed to handle the current source and
sink requirements of transistor-transistor logic integrated
circuits. The output measurement circuits should be
constructed using NMOS or CMOS devices that are designed to
operate between +5 volts DC and ground.
B. TESTING FOR CORRECT OPERATION
After connecting the adder to a test harness, the next
step is to verify the generation of correct sums by the
adder. There are several inputs that should be included in
the testing to verify the correct operation of individual
circuits. These are contained in Appendix F. In addition
to the test vectors of Appendix F, several randomly selected
input vectors should be tested. If the adder should fail to
generate correct sums, the LSSD features can be employed to
examine intermediate results.
1. Intermediate Results
With the LSSD design, a tester can leave input
levels constant for a long period of time and use the shift
mode of the internal registers to examine the internal state
of the chip. The rightmost bit of each register is always
available at the output pad for that register. To obtain
the contents of the other bits, the control signal for the
given register is set to and held at logical 1 while the
clock continues to run. For registers 1, 3, and 5 the
serial output will be meaningful and stable while phi2 is
high. The serial output of registers 2 and 4 will be stable
when phi1 is high. Table 3 lists in order the intermediate






Cycle  SREG1  SREG2  SREG3   SREG4  SREG5
  0    B1     P1     BP1     Cin    S1
  1    B2     P2     IES3    BC2    S3
  2    B3     P3     IES4    Cout   S5
  3    B4     P4     BG2     ES2    S7
  4    B5     P5     IES5    ES4    S9
  5    B6     P6     IES6    ES6    S11
  6    B7     P7     IC67    ES8    S13
  7    B8     P8     BP3     ES10   S15
  8    B9     P9     IES11   ES12
  9    B10    P10    IES12   ES14   Cout
 10    B11    P11    BG4     ES16   S2
 11    B12    P12    IES13   BC1    S4
 12    B13    P13    IES14   BC3    S6
 13    B14    P14    IC1415  ES1    S8
 14    B15    P15    BG1     ES3    S10
 15    B16    P16    IES1    ES5    S12
 16    A1     G1     IES2    ES7    S14
 17    A2     G2     IC23    ES9    S16
 18    A3     G3     BP2     ES11
 19    A4     G4     IES7    ES13
 20    A5     G5     IES8    ES15
 21    A6     G6     BG3
 22    A7     G7     IES9
 23    A8     G8     IES10
 24    A9     G9     IC1011
 25    A10    G10    BP4
 26    A11    G11    IES15
 27    A12    G12    IES16







C. TESTING FOR SPEED OF OPERATION
Once the chips containing fabrication errors have been
culled from the chip set returned by MOSIS, the task
remaining is to determine just how fast the adder can run.
Rather than simply increasing the clock rate until the adder
fails, the duration of the time that phi1 and phi2 are each
high, and the interphase time, should be reduced separately.
RNL simulation indicates that the circuit which generates S4
within PLA104 is the limiting circuit for clock phase
duration (i.e., it requires the longest time to correctly
evaluate its inputs). RNL simulation also indicates that
the circuits in PLA104 which generate S1 and S4 are the
limiting circuits for the clock interphase duration.
Since the PLA is constructed of precharged dynamic
circuits, the evaluation clock phase must be long enough to
allow the inputs to drive the outputs to their proper
values, even if the inputs are the same as those of the
previous evaluation cycle. This allows the tester to use a
constant input as the duration of each clock phase is
reduced until the adder produces incorrect results.
Figure 5.1 Charge Sharing in a PLA.
Determination of the clock interphase duration limit is
more difficult. This is because the inputs to a PLA must be
changing to cause charge sharing problems to occur. For
example, in Figure 5.1 assume that the first set of inputs
is in1=1, in2=0, and that this is correctly evaluated to
produce out=0 when phi1 is high. Now assume that the next
input is in1=0 and in2=1, which should also evaluate to
out=0. However, if the precharge time (when the inputs are
present on the gates of Q2 and Q3 and phi1 is still low) is
insufficient, C2 will not be charged to Vdd when precharging
ends (C2 was discharged to zero volts during the previous
evaluation when in1 was high and phi1 was high). Now, when
evaluation begins (phi1 going high) the low voltage across
C2 causes Q5 and Q6 to interpret their input as a logical 0.
As a result the output of the Q5-Q6 inverter pair goes high,
causing Q8 to turn on, discharging C4 and resulting in an
output of logical 1, which is incorrect. Table 4 lists the
proper evaluation sequence when precharge time is sufficient
and the improper sequence due to insufficient precharge
time. In this table, for the inputs, output, and capacitor
voltages a 1 indicates Vdd, 0 indicates GND, and X indicates
somewhere in between. For the transistors, a 1 indicates





Proper evaluation sequence:
phi  in  C     Q           out
12   12  1234  1234567890
10   10  0011  1100010001  0
00   10  0011  0101011001  0
01   01  0011  0101011001  0
00   01  0111  0011011001  0
10   01  0111  1010010001  0
Improper evaluation sequence:
phi  in  C     Q           out
12   12  1234  1234567890
10   10  0011  1100010001  0
00   10  0011  0101011001  0
01   01  0011  0101011001  0
00   01  0X11  0011011001  0
10   01  0XX0  1010XX0X10  1
nor fully off. Subsequent inputs of in1=0 and in2=1 may
produce correct results since with constant inputs, each
precharge time will add more charge to C2 until there is
sufficient charge to allow the output of the Q5-Q6 inverter
to remain low.
Thus, to check for charge sharing problems in the
circuit of Figure 5.1, the inputs must alternate. Likewise,
in PLA104 to check for charge sharing errors in output S1,
its inputs must alternate between ES1=0, BC=0 and ES1=1,
BC=1 as the interphase time is reduced. This can be
accomplished for all four instances of PLA104 simultaneously by
alternating inputs of
A = 0001 1001 1001 1001
B = 0000 1000 1000 1000
Cin = 1
and
A = 0000 0000 0000 0000
B = 0000 0000 0000 0000
Cin = 0
To check for charge sharing errors in S4, the inputs to
PLA104 must cycle between BC=1, S4=0, S3=S2=1, S1=0 and
BC=0, S4=0, S3=S2=S1=1. This may be accomplished for all four
instances of PLA104 simultaneously by alternating inputs of
A = 0110 1110 1110 1110
B = 0000 1000 1000 1000
Cin = 1
and
A = 0111 0111 0111 0111
B = 0000 0000 0000 0000
Cin = 0
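The effect of these vectors on each 4-bit block can be checked with a short calculation. The sketch below assumes that the ES bits of a block behave like the block's sum bits computed with no carry entering the block, and that BC is the carry actually entering the block; both are interpretations made for illustration, and the function name is hypothetical.

```python
def pla104_block_inputs(a, b, cin, width=16, block=4):
    """Return (es, bc) for each block, least significant block first.

    es : block sum bits computed with a zero carry into the block
    bc : carry actually entering the block
    """
    mask = (1 << block) - 1
    carry = cin
    result = []
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        result.append(((a_blk + b_blk) & mask, carry))
        carry = (a_blk + b_blk + carry) >> block
    return result

s1_high = pla104_block_inputs(0b0001_1001_1001_1001, 0b0000_1000_1000_1000, 1)
s1_low = pla104_block_inputs(0, 0, 0)
s4_first = pla104_block_inputs(0b0110_1110_1110_1110, 0b0000_1000_1000_1000, 1)
s4_second = pla104_block_inputs(0b0111_0111_0111_0111, 0, 0)
```

Under these assumptions every block sees the same pair of values for a given vector, which is why one pair of alternating inputs exercises all four instances at once.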
This maximum speed testing assumes that RNL has correctly
identified the slowest circuits on the chip. RNL
simulations have indicated that the next slowest circuit
(PLA915) is at least 20% faster than PLA104 (16.0 nsec for
PLA915 vs. 20.1 nsec for PLA104). Also, all other PLA's
functioned properly with a 5 nsec interphase time.
Should PLA104 prove to be the speed limiting circuit for
the chip, the actual failure speeds of the chip can serve as




The experience gained in the design of the adder coupled
with the clarity of hindsight leads to the following conclu-
sions and recommendations.
A. THE CMOS TECHNOLOGIES
The CMOS technologies will play a role of steadily
increasing importance in the VLSI designs of the future.
MOSIS is already offering, on an experimental basis, CMOS
Bulk p-well fabrication with a one-micron minimum feature
size. A scalable set of design rules, to allow initial
fabrication in 3-micron CMOS for design verification before
the far more expensive 1-micron process is used, is being
developed.
In the private sector there is considerable research
aimed at finding an insulating substrate material that does
not have the variability and thermal problems of sapphire.
Progress in this area will remove the drawback caused by
latchup tendencies in CMOS Bulk.
B. CMOS CAD TOOLS
Though the design tools currently available at NPS
constitute a complete set for the design of CMOS Bulk p-well
circuits, the recent CAD tool set released by the
University of Washington/Northwest VLSI Consortium, Release
2.0 [Ref. 11], coupled with University of California at
Berkeley Winter 1983 CAD tools, represents a more complete
and cohesive set for CMOS design. When sufficient disk
space on the VAX 11/780 becomes available to load
Release 2.0, implementation of the Release 2.0 CAD package
is highly recommended. An added benefit of installing the
Release 2.0 package is the cell library provided. The
library contains several basic standard cells with known
performance characteristics. The library also contains the
standard pad frames used by MOSIS. Though MOSIS does not
require the use of standard pad frames on designs submitted,
their use does speed up fabrication.
As mentioned earlier, as soon as SPICE 2G7 is available,
its addition to the CAD toolbag would be most advantageous
to a CMOS designer.
C. DESIGN OF THE ADDER
If the design of the adder were to be undertaken again,
a different approach to generating the sum would probably
be used, especially if the new CAD tools mentioned
above were available. The logic approach to the computation
would still involve CLA addition, but it would be
accomplished using combinational logic and library cells rather
than PLA's. Testability would probably suffer greatly, but
effort would be made to reduce the sum generation to two
logical events. Though the level of testability provided by
the current design should provide considerable insight into
CMOS Bulk p-well performance and CAD tool accuracy, there
would be no need to repeat the investigation.
APPENDIX A
SPICE MODEL CARDS FOR 3-MICRON CMOS-PW DEVICES
CMO* models for MOSIS 3-micron CMOS Bulk p-well devices:
Fast Models
.model n nmos vto=0.4 tox=0.7e-7 lambda=1e-7 ld=1e-6
+xj=1.1e-6 gamma=.3 uo=500 cbd=5e-4 cbs=5e-4
.model p pmos vto=-.4 tox=0.7e-7 lambda=1e-7 ld=1e-6
+xj=1.1e-6 gamma=.3 uo=300 cbd=3.5e-4 cbs=3.5e-4
Slow Models
.model n nmos vto=1.0 tox=0.8e-7 lambda=1e-7 ld=.5e-6
+xj=0.6e-6 gamma=1.3 uo=400 cbd=6e-4 cbs=6e-4
.model p pmos vto=-1.0 tox=0.8e-7 lambda=1e-7 ld=.5e-6
+xj=0.6e-6 gamma=.9 uo=200 cbd=4.1e-4 cbs=4.1e-4
MIT Models for MOSIS 3-micron CMOS Bulk p-well devices:
Slow - Slow
.model nss nmos level=2 rsh=20 tox=650e-10 ld=.25e-6
+xj=.35e-6 cj=6e-4 cjsw=4e-10 uo=475 vto=1.2
+cgso=1.3e-10 cgdo=1.3e-10 nsub=1.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25
.model pss pmos level=2 rsh=80 tox=650e-10 ld=.25e-6
+xj=.35e-6 cj=4.1e-4 cjsw=2.5e-10 uo=190 vto=-1.2
+cgso=1.3e-10 cgdo=1.3e-10 nsub=5e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.15
Fast p-type Slow n-type
.model nfs nmos level=2 rsh=30 tox=600e-10 ld=.25e-6
+xj=.35e-6 cj=6.0e-4 cjsw=4.0e-10 uo=475 vto=1.2
+cgso=1.9e-10 cgdo=1.9e-10 nsub=1.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25
.model pfs pmos level=2 rsh=20 tox=600e-10 ld=.40e-6
+xj=.60e-6 cj=2.0e-4 cjsw=1.0e-10 uo=270 vto=-0.6
+cgso=2.0e-10 cgdo=2.0e-10 nsub=0.3e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.0 ucrit=8e4 uexp=.15
Fast p-type Fast n-type
.model nff nmos level=2 rsh=10 tox=550e-10 ld=.40e-6
+xj=.60e-6 cj=3.0e-4 cjsw=2.0e-10 uo=675 vto=0.6
+cgso=2.5e-10 cgdo=2.5e-10 nsub=0.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25
.model pff pmos level=2 rsh=20 tox=550e-10 ld=.40e-6
+xj=.60e-6 cj=2.0e-4 cjsw=1.0e-10 uo=270 vto=-0.6
+cgso=2.5e-10 cgdo=2.5e-10 nsub=0.3e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.0 ucrit=8e4 uexp=.15
Slow p-type Fast n-type
.model nsf nmos level=2 rsh=10 tox=600e-10 ld=.40e-6
+xj=.60e-6 cj=3.0e-4 cjsw=2.0e-10 uo=675 vto=0.6
+cgso=2.0e-10 cgdo=2.0e-10 nsub=0.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25
.model psf pmos level=2 rsh=80 tox=600e-10 ld=.25e-6
+xj=.35e-6 cj=4.1e-4 cjsw=2.5e-10 uo=190 vto=-1.2
+cgso=1.2e-10 cgdo=1.2e-10 nsub=5.0e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.0 ucrit=8e4 uexp=.15
APPENDIX B
UNIX MANUAL ENTRY FOR RULEC
RULEC (CAD) CAD Toolbox User's Manual RULEC (CAD)
NAME




Rulec is a shell script with the following processing steps:
i) The actual Lyra rule compiler is invoked to translate the symbolic rule
description, rules.r, to lisp code, rules.l.
ii) The lisp compiler, Liszt, is invoked to compile rules.l to rules.o.
iii) rules.o is loaded into Lyra.proto to generate an executable lisp Lyra,
rules.
iv) The intermediate files rules.l and rules.o are deleted.
The following options are supported:
-l (load only) No compilation is done. Previously compiled rules, rules.o,
are loaded into Lyra.proto to generate an executable Lyra rules. This
option is useful mainly at Berkeley, where Lyra.proto changes frequently.
-o (save object) Name.o is not removed. Enables "rulec -l rules" in the
future.
FILES
~cad/bin/rulec - rulec shell script.
~cad/lib/lyra/Rulec.l - lisp rule compiler
~cad/lib/lyra/Lyra.proto - Lyra sans compiled rules code.
~cad/lib/lyra/*.r - standard rulesets.

























Department of Computer Science
University of Washington
Seattle, WA 98195
(This document is based on portions of the document "User's Guide to NET, PRESIM and
RNL/NL," by Christopher J. Terman, Laboratory for Computer Science, M.I.T., Cambridge, MA
02139.)
One must first convert the sim file to a network file suitable for use by RNL or NL - to do this
we run PRESIM:
presim foo.sim foo [config] options...
which converts the file foo.sim into a binary file for RNL/NL called foo.
The -f option:
Suppresses the sum-of-products formation. This may be desired if you think
sum-of-products is formed wrong; otherwise the advantages of the transistor and
node reduction make this option unattractive.
The -c option:
•cfile^ninvalue
writes a list of node names and capacitances to the specified file. Only capacitances larger than
minvalue will be included.
The -t option:
•tfllejninvalue
writes a list of transistors and RC values to the specified file - there are two entries for each
transistor. The R's come from the size of the transistor, C's from the source/drain capacitance. Only RC
values larger than minvalue will be included.
The -p option:
-presist .voltage
provides a worst-case estimate of the circuit power consumption by assuming that all the pullups
(DEP or LOWP devices with drain-VDD) are all on simultaneously. "Voltage" specifies the supply
voltage, for example *-pi* specifies a VDD of 5 volts. The result is printed after PRESIM completes its
other processing. When figuring the resistance of a pullup device the "power" characteristic resistance
as set in the config file is used.
UW/NW VLSI Consortium PRESIM User's Guide
The optional third file (config) specifies various electrical parameters. The internal values (the
defaults) are a generic set. They do not reflect any particular fabrication process. (UW/NW VLSI
NOTE: A configuration file is provided in the source code that duplicates the internal settings as an
example of how this file could be used. In addition we note that the resistor values are stored first
sorted by width, then by length, not by the ratio. Values not explicitly provided in the configuration
file are estimated by linear interpolation.) The format of this file is lines of the form
parameter value comments...
Lines beginning with ';' are treated as all comment. The parameter names and their default values
are:













2nd metal capacitance - area, pf/sq-micron
2nd metal capacitance - perimeter, pf/micron
1st metal capacitance - area, pf/sq-micron
1st metal capacitance - perimeter, pf/micron
poly capacitance - area, pf/sq-micron
poly capacitance - perimeter, pf/micron
n-diffusion capacitance - area, pf/sq-micron
n-diffusion capacitance - perimeter, pf/micron
p-diffusion capacitance - area, pf/sq-micron
p-diffusion capacitance - perimeter, pf/micron
gate capacitance - area, pf/sq-micron
microns/lambda (conversion from .sim file units
to units used in cap parameters)
lowthresh 0.3 ; logic low threshold as a normalized voltage
highthresh 0.8 ; logic high threshold as a normalized voltage
cntpullup ; <>0 means that the capacitor formed by gate of
; pullup should be included in capacitance of output
; node
diffperim ; <>0 means do not include diffusion perimeters
; that border on transistor gates when figuring
; sidewall capacitance (*)
subparea ; <>0 means that poly over transistor region will not
; be counted as part of the poly-bulk capacitor (*)
diffext diffusion extension for each transistor, i.e., each
transistor is assumed to have a rectangular source
and drain diffusion extending diffext units wide and
transistor-width units high. The effect of the
diffusion extension is to add some capacitance to
the source and drain node of each transistor -
useful when processing the output of NET to improve
the capacitive loading approximations without adding
explicit load capacitors. diffext is specified in
lambda (it will be converted using the lambda factor
above).
resistance channel context width length resist
this command specifies the equivalent resistance for a transistor
of type channel with the specified width and length. Transistors
matching this entry will have the specified resistance; linear
interpolation is done if the width and/or length is not matched
exactly.
channel is one of "enh", "dep", "intrinsic", "low-power",
"pullup", or "p-chan"
context is one of "static", "dynamic-high", "dynamic-low", or "power"
width is given in lambda
length is given in lambda
resist is given in ohms
(*) These parameters should be 1 only when processing the output of
the node extractor. They cause various corrections to be made
to the interconnect component of a node's capacitance - usually
only extracted sim files have information regarding interconnect
capacitance.
PRESIM uses these parameters in calculating the capacitance for each electrical node and the
resistance for each transistor channel.




The following two listings are: (1) the RNL command file
for the entire chip and (2) the results of running that
command file. In addition to this overall testing, all the
layouts of Appendix G were simulated individually. A nice
feature of RNL is the indication of when a watched node
changes state. Thus, by making all the outputs of a circuit
watched nodes, RNL will provide the minimum time duration
for a clock cycle to produce the outputs (the longest time
indicated by the simulation). This can be confirmed by
running the simulation with a faster clock, resulting in
outputs of X (neither 1 nor 0) where insufficient time has
been allowed.
RNL simulation to determine the minimum time for
precharging the PLA circuits is only slightly more involved.
For each product term in the PLA, alternating inputs are
selected that will result in the maximum amount of N+ diffusion
needing to be charged from 0 volts to Vdd. Then as these
inputs are alternated, the PLA precharge time is reduced
until the circuit fails to produce correct results. For the
PLA's in the adder, visual inspection for the product term
with the longest precharge requirement was done by looking
for the longest N+ diffusion line which must be charged
through the maximum number of transistors. The visual
inspection results were confirmed by RNL simulations.




























































a a o a
" 1
" )
8? a 'i 65 ae a 7 aR a9 alC all al2 al3
3 c 4 b S be b 7 c f b S o l b 1 1 bl2 bl3 b 1
4
S <• s7 s li s 9 S 1 S 1 1 S 1 2 Sl3 S 1 4 S 1 5 s 1 6
conl con2 con3 con4 crn5))
bi5
a7 ao a c















S rhll C h i 2 ) )
alb al5 aW
as* ai- a7 a6
Dbbfc n ) 6 b 1 5 n 1
4
b 9 PS o 7 be
out S16 sibsur






















































al3 al2 all all)




b5 bi b3 t2 nil)
sH S13 sl2 si 1 S10
Sh s7 sf- s5 s* s3 s2 Si))
! is
1 si- / t b << ^ s ;;
now!" (vec c l o c < s ) clr.
b*a* ) ne* line








U b (• o u 1 1 1 1 1 1 1 l) )
b 1 1 1 1 '"> 1 1 1 10 ) )
p 1 C Hi (J 1 ) )
OMllllMUlllllin)
o o r. o o r. o o c o u o u o o o l- ]
)
Get 2° 13:^° 19*<< eric.CiTc Face 2
1 cin























Dec 6 15:23 1984 chip.log Page 1
Loading uwsim.l
Done loading uwsim.l

























































































Step begins @ 10 ns.
phi1=1 @ 0
Step begins @ 35 ns.
phi1=0 @ 0














































Step begins @ 70 ns.
phi2=0 @ 0
Step begins @ 80 ns.
phi1=1 @ 0
Step begins @ 105 ns.
phi1=0 @ 0










Step begins @ 140 ns.
phi2=0 @ 0
Step begins @ 150 ns.
phi1=1 @ 0
Step begins @ 175 ns.
phi1=0 @ 0








sum=0b00000000000000000
Step begins @ 210 ns.
phi2=0 @ 0
Step begins @ 220 ns.
phi1=1 @ 0
Step begins @ 245 ns.
phi1=0 @ 0








Step begins @ 280 ns.
phi2=0 @ 0
Step begins @ 290 ns.
phi1=1 @ 0
Step begins @ 315 ns.
phi1=0 @ 0







sum=0b00000000000000000



















Step begins @ 360 ns.
phi1=1 @ 0
Step begins @ 385 ns.
phi1=0 @ 0




clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b00000000000000000
Step begins @ 420 ns.
phi2=0 @ 0
Step begins @ 430 ns.
phi1=1 @ 0
Step begins @ 455 ns.
phi1=0 @ 0








Step begins @ 490 ns.
phi2=0 @ 0
Step begins @ 500 ns.
phi1=1 @ 0
Step begins @ 525 ns.
phi1=0 @ 0
















S8 = l a 16 ."
s6=l e lfc .8
S4=l p 16 .<?
s2=i a 17
Sl = l 3 19,.1
state is now:
Current time = 560
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000























Step begins @ 570 ns.
phi1=1 @ 0
Step begins @ 595 ns.
phi1=0 @ 0




clocks=Cb01 c 1 p = n c u t =
aaaa=ObOOOC11110000llll
bbbbsObOOOOOOOlOO'iOOOOi
SumsObOl 1 11 1 1 1 1 1 1 1 1 11 11
Step beains a 630 ns.
nni2=0 a
Step beolns a 6^0 ns.
phll=l a
Stec beclns a 665 ns.
ohil=0 a
Step beclns a 675 ns.
phi2=i a
state Is now:
Current tin>e = 700
clocks=0b0l cin=o coutsO
a«aa = OtCO0Oi 111 00001 in
pbbb=ObOooooon 100000001
sum=0b01111111111111111
Step begins @ 700 ns.
phi2=0 @ 0
Step begins @ 710 ns.
phi1=1 @ 0
Step begins @ 735 ns.
phi1=0 @ 0






















Step begins @ 770 ns.
cin=1 @ 0
phi2=0 @ 0
Step begins @ 780 ns.
phi1=1 @ 0
Step begins @ 805 ns.
phi1=0 @ 0








Step begins @ 840 ns.
phi2=0 @ 0
Step begins @ 850 ns.
phi1=1 @ 0
Step begins @ 875 ns.
phi1=0 @ 0








Step begins @ 910 ns.
phi2=0 @ 0
Step begins @ 920 ns.
phi1=1 @ 0
Step begins @ 945 ns.
phi1=0 @ 0










Step begins @ 980 ns.
a16=1 @ 0
a15=1 @ 0
a14=1 @ 0









Step begins @ 990 ns.
phi1=1 @ 0
Step begins @ 1015 ns.
phi1=0 @ 0
Step begins @ 1025 ns.
phi2=1 @ 0












Step begins @ 1085 ns.
phi1=0 @ 0





clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b00001000000010001
Step begins @ 1120 ns.
phi2=0 @ 0
Step begins @ 1130 ns.
phi1=1 @ 0
Step begins @ 1155 ns.
phi1=0 @ 0

















Current time = 1190
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111
Step begins @ 1190 ns.
cin=1 @ 0
phi2=0 @ 0
Step begins @ 1200 ns.
phi1=1 @ 0
Step begins @ 1225 ns.
phi1=0 @ 0
Step begins @ 1235 ns.
phi2=1 @ 0
state is now:
Current time = 1260
clocks=0b01 cin=1 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111
Step begins @ 1260 ns.
phi2=0 @ 0
Step begins @ 1270 ns.
phi1=1 @ 0
Step begins @ 1295 ns.
phi1=0 @ 0










Step begins @ 1330 ns.
phi2=0 @ 0
Step begins @ 1340 ns.
phi1=1 @ 0

































Step begins @ 1400 ns.
Step begins @ 1410 ns.
phi1=1 @ 0
Step begins @ 1435 ns.
phi1=0 @ 0








Step begins @ 1470 ns.
phi2=0 @ 0
Step begins @ 1480 ns.
phi1=1 @ 0
Step begins @ 1505 ns.
phi1=0 @ 0




clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000
Step begins @ 1540 ns.
phi2=0 @ 0
Step begins @ 1550 ns.
phi1=1 @ 0
Step begins @ 1575 ns.
phi1=0 @ 0








Step begins @ 1610 ns.
b1=0 @ 0
phi2=0 @ 0
Step begins @ 1620 ns.
phi1=1 @ 0
Step begins @ 1645 ns.
phi1=0 @ 0
Step begins @ 1655 ns.
phi2=1 @ 0
state is now:
Current time = 1680














a7=0 @ 0
a6=0 @ 0
a5=0 @ 0
a4=0 @ 0




Step begins @ 1690 ns.
phi1=1 @ 0
Step begins @ 1715 ns.
phi1=0 @ 0




clocks=0b01 cin=0 cout=1
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b10000000000000000
Step begins @ 1750 ns.
b16=1 @ 0
b15=1 @ 0
b14=1 @ 0














Step begins @ 1760 ns.
phi1=1 @ 0











































cout=0 @ 22.9
state is now:
Current time = 1820
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b1111111111111111
sum=0b01111111111111111
Step begins @ 1820 ns.
a12=1 @ 0
phi2=0 @ 0
Step begins @ 1830 ns.
phi1=1 @ 0
Step begins @ 1855 ns.
phi1=0 @ 0
Step begins @ 1865 ns.
phi2=1 @ 0
s16=0 @ 14.2






















Step begins @ 1890 ns.
b12=0 @ 0
phi2=0 @ 0
Step begins @ 1900 ns.
phi1=1 @ 0
Step begins @ 1925 ns.
phi1=0 @ 0
















































bbbb=0b1111011111111111
sum=0b01111111111111111
Step begins @ 1960 ns.
cin=1 @ 0
phi2=0 @ 0
Step begins @ 1970 ns.





































a12=0 @ 0
phi2=0 @ 0
Step begins @ 2040 ns.
phi1=1 @ 0
Step begins @ 2065 ns.
phi1=0 @ 0
Step begins @ 2075 ns.
phi2=1 @ 0













s15=1 @
s14=1 @
s12=1 @
Step begins @ 2100 ns.
cin=0 @ 0
phi2=0 @ 0
Step begins @ 2110 ns.
phi1=1 @ 0
Step begins @ 2135 ns.
phi1=0 @ 0
Step begins @ 2145 ns.
phi2=1 @ 0
s16=0 @ 14.2













s4=0 @ 16.5









Step begins @ 2170 ns.
phi2=0 @ 0
Step begins @ 2180 ns.
phi1=1 @ 0
Step begins @ 2205 ns.
phi1=0 @ 0









Step begins @ 2240 ns.
phi2=0 @ 0
Step begins @ 2250 ns.
phi1=1 @ 0
Step begins @ 2275 ns.
phi1=0 @ 0









Step begins @ 2310 ns.
phi2=0 @ 0
Step begins @ 2320 ns.
phi1=1 @ 0
Step begins @ 2345 ns.
phi1=0 @ 0














































































[Figure not reproduced; only the GND and Vdd labels are legible.]













msb- - - - - lsb
Addend B Cin
msb- - - - - lsb
Sum

















test for proper IC23
0101010101010101
0010001000100010









































































1. Mead, C. and Conway, L., Introduction to VLSI Systems,
Addison-Wesley, 1980.
2. Carlson, D. J., Application of a Silicon Compiler to
VLSI Design of Digital Pipelined Multipliers, MSEE
Thesis, Naval Postgraduate School, Monterey, Ca., June
1984.
3. Conradi, J. R. and Hauenstein, B. E., VLSI Design of a
16 Bit Very Fast Pipelined Carry Look Ahead Adder,
MSEE Thesis, Naval Postgraduate School, Monterey, Ca.,
September 1983.
4. Ousterhout, J., Editing VLSI Circuits with Caesar,
Computer Science Division, Department of Electrical
Engineering and Computer Sciences, University of
California, Berkeley, pp. 1-22, March 22, 1983.
5. Tsai, L. L. and Achugbue, J. O., "BURLAP: A
Hierarchical VLSI Design System," VLSI Design, pp.
21-26, July/August 1983.
6. Carnegie-Mellon University Computer Science Department
Report CMU-CS-84-101, Let's Design CMOS Circuits! Part
One, by M. Annaratone, April 3, 1983.
7. Krambeck, R. H., Lee, C. M. and Law, H. S., "High
Speed Compact Circuits with CMOS," IEEE Journal of
Solid State Circuits, Vol. SC-17, No. 3, pp. 614-619,
June 1982.
8. Fang, R. C. and Moll, J. L., "Latchup Model for the
Parasitic p-n-p-n Path in Bulk CMOS," IEEE
Transactions on Electron Devices, Vol. ED-31, No. 1,
pp. 113-120, January 1984.
9. Massachusetts Institute of Technology VLSI Memo No.
82-117, Introductory CMOS Techniques, by L. A. Glasser
and W. S. Song, February 1983.
10. Computer Science Division (EECS), University of
California, Berkeley, Report No. UCB/CSD/83/115, 1983
VLSI Tools, edited by R. M. Mayo, J. K. Ousterhout,
and W. S. Scott, March 1983.
11. University of Washington/Northwest VLSI Consortium,




' PrStfSa-Hiii, 1?B§. -23i£ ~ £2a^iSE Arithmetic,
13
- EaiSSs.?. Iik"S!IML f^t^fi^J : Bh"' s "e
BIBLIOGRAPHY
Mercer, M. R. and Agarwal, V. D., "A Novel Clocking
Technique for VLSI Circuit Testability," IEEE Journal of
Solid State Circuits, Vol. SC-19, No. 2, pp. 207-212, April,
Tosuntikool, N. and Saxe, C. L., "Rapid Design of Functional
Cells," VLSI Design, pp. 73-77, July/August 1983.
Williams, M. J. Y. and Angell, J. B., "Enhancing Testability
of Large-Scale Integrated Circuits via Test Points and Added
Logic," IEEE Transactions on Computers, Vol. C-22, No. 1,





Attn: Library, Code 0142
Naval Postgraduate School
Monterey, California 93943












5. Defense Technical Information Center 2
Cameron Station
Alexandria, Virginia 22314


















