VLSI design with the MacPitts silicon compiler. by Larrabee, Robert C.
Calhoun: The NPS Institutional Archive
Theses and Dissertations Thesis Collection
1985

















Thesis Advisor: D. E. Kirk
Approved for public release; distribution is unlimited

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
REPORT DOCUMENTATION PAGE READ INSTRUCTIONS BEFORE COMPLETING FORM
1. REPORT NUMBER 2. GOVT ACCESSION NO 3. RECIPIENT'S CATALOG NUMBER
4. TITLE (and Subtitle)
VLSI Design With The MacPitts
Silicon Compiler
5. TYPE OF REPORT & PERIOD COVERED
Master's Thesis;
September 1985
6. PERFORMING ORG. REPORT NUMBER
7. AUTHORS
Robert C. Larrabee
8. CONTRACT OR GRANT NUMBER(s)
9. PERFORMING ORGANIZATION NAME AND ADDRESS
Naval Postgraduate School
Monterey, California 93943-5100
10. PROGRAM ELEMENT, PROJECT, TASK
AREA & WORK UNIT NUMBERS





13. NUMBER OF PAGES
265




16. DISTRIBUTION STATEMENT (of this Report)
Approved for public release; distribution is unlimited
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
VLSI, Silicon Compiler, MacPitts, VLSI Design
20. ABSTRACT (Continue on reverse side if necessary and identify by block number)
An analysis of the MacPitts silicon compiler is presented.
The emphasis of the analysis is on the interrelationship
between algorithmic syntax and resulting circuit structure.
Errors inherent to the silicon compiler are investigated,
and corrections to the errors are proposed.
DD 1 JAN 73 1473 EDITION OF 1 NOV 65 IS OBSOLETE
S/N 0102-LF-014-6601 SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
Approved for public release; distribution is unlimited
VLSI Design With The MacPitts Silicon Compiler
by
Robert C. Larrabee
Lieutenant, United States Navy
B.S., University of Texas at Austin, 1978
Submitted in partial fulfillment of the
requirements for the degree of





An analysis of the MacPitts silicon compiler is
presented. The emphasis of the analysis is on the
interrelationship between algorithmic syntax and resulting
circuit structure. Errors inherent to the silicon compiler
are investigated, and corrections to the errors are proposed.






I. INTRODUCTION
II. COMBINATIONAL LOGIC STRUCTURES IN THE MACPITTS SILICON COMPILER
   A. COMBINATIONAL LOGIC CIRCUITS IN THE DATA PATH
      1. The Basic Chip Frame
      2. A Data Path Inverter
      3. A Data Path OR Gate
      4. A Data Path NOR Gate
      5. A Four Input NOR Structure In The Data Path
      6. A Data Path AND Gate
      7. A Three Input AND Structure In The Data Path
      8. Data Path Basic Organelles
      9. Bit Slice Combinational Logic
      10. Two Data Path Chips: Counters
   B. COMBINATIONAL LOGIC STRUCTURES IN THE CONTROL PATH
      1. Control Path Combinational Logic
      2. A Control Path AND Gate And Control Path Syntax
      3. A Control Path OR Gate
      4. A Four Input OR Gate In The Control Path
      5. A Four Input AND Gate In The Control Path
      6. A 15 Input OR Gate In The Control Path
      7. Control Path Semantics
      8. Five Input AND Gates In The Control Path
      9. A Better 15 Input Control Path OR Gate
      10. Two Considerations In MacPitts Programming
   C. SUMMARY
III. A SPEED-POWER COMPARISON BETWEEN A DATA PATH AND CONTROL PATH EQUIVALENT CIRCUIT
   A. DATA PATH FIVE INPUT AND GATE
   B. CONTROL PATH FIVE INPUT AND GATE
   C. SPEED-POWER COMPARISON
   D. ALTERNATE POSSIBILITIES FOR FIVE INPUT AND GATES
IV. SEQUENTIAL LOGIC IN MACPITTS
   A. AN OVERVIEW
   B. GREY CODE-TO-BINARY CONVERTER
      1. Algorithm Design
      2. Functional Constituents Of The Chip
      3. Alternate Design
   C. BLACKJACK GAME
      1. The Algorithm
      2. The Chip
   D. THE MEAD-CONWAY TRAFFIC LIGHT CONTROLLER
      1. The Algorithm
      2. The Chip
   E. SUMMARY
V. MACPITTS VERSUS HANDCRAFTING: A COMPARISON
   A. HANDCRAFTED TRAFFIC LIGHT CONTROLLER
      1. Design
      2. Optimization and Analysis
   B. COMPARISON WITH MACPITTS DESIGN
VI. A DESIGN EXAMPLE: HAMMING ERROR DETECTOR/CORRECTOR
   A. ERROR DETECTOR
      1. Design Considerations
      2. Prototype Error Detector
      3. Expanded Prototype
      4. Error Detector
   B. HAMMING METHOD 15/4 ERROR CORRECTOR
      1. Design Considerations
      2. Prototype Designs
      3. The 15/4 Error Corrector





APPENDIX A CHAPTER III LISTINGS
APPENDIX B CHAPTER IV LISTINGS
APPENDIX C CHAPTER V LISTINGS
APPENDIX D CHAPTER VI LISTINGS
LIST OF REFERENCES
BIBLIOGRAPHY
INITIAL DISTRIBUTION LIST
I. INTRODUCTION
The purpose of silicon compilation is to allow faster
design of integrated circuits. Silicon compilation frees the
designer from the basic layout, routing, and circuitry
concerns inherent to integrated circuit design. The MacPitts
silicon compiler does this by designing an integrated
circuit chip from a behavioral specification input.
Previous work at the Naval Postgraduate School
investigated applications of the MacPitts silicon compiler
to design of pipelined digital adders [Ref. 1] and
multipliers [Ref. 2]. Work by Froede [Ref. 3] showed the
limitations of MacPitts, in its inability to produce fast
VLSI chips. This deficiency is due primarily to the layout
scheme (circuit structure) which MacPitts uses.
This thesis investigates the interrelationship between
MacPitts algorithmic syntax and resulting circuit structure.
MacPitts partitions the chip functionally as shown in Figure
1.1. The data path is at the top, and performs numerical
operations and combinational logic tests. The control path
is at the bottom, and performs decisions which direct data
path operations.
Chapter II considers combinational logic in both the
data path and control path. The effects of syntax on














Figure 1.1 MacPitts Chip Functional Block Diagram
qualitatively, and inefficiencies and limitations of
implementation are noted. The basic data path organelles
(fundamental combinational logic structures) are also
investigated.
Chapter III is a quantitative treatment of functionally
equivalent circuits in the data path and control path. A
five-input AND gate is created in both the data path and
the control path, and a comparative analysis is performed.
The results are extended to similar data path combinational
logic structures.
Chapter IV investigates MacPitts sequential logic. A
Gray code-to-binary serial decoder is designed, and a
functional analysis is performed. The relationship between
syntax and circuit structure is emphasized, with an
alternate solution considered. A blackjack game chip is
presented as a more elaborate MacPitts finite state machine
(FSM), and its structure is contrasted to that of the Gray
code decoder. The Mead-Conway highway-farmroad traffic light
controller [Ref. 4: p. 81] problem is solved with a
MacPitts design, and an alternate solution is offered.
Chapter V is a quantitative comparison of a MacPitts
design with a handcrafted equivalent. The Mead-Conway
traffic light controller design from Chapter IV is compared
to a computer-aided engineering (CAE)-designed variant,
which has a programmable logic array (PLA) FSM. The designs
are compared for speed, size, and power consumption.
Chapter VI is a design example. A design cycle for
MacPitts is developed, and illustrated with the Hamming 15/4
error detector/corrector [Ref. 5]. The prototype (first
model) and archetype (chief model) algorithms and chip
layouts are provided. An analysis of the alternate designs
is given, and a basis for choosing the archetype is
proposed. The Hamming 15/4 error detector/corrector is then
designed based on the archetype, and analyzed with available
CAD tools.
Chapter VII is a summary of errors detected in the
MacPitts silicon compiler and suggestions for enhancement.
The errors and suggestions are cross-referenced to MacPitts
source code where possible.
II. COMBINATIONAL LOGIC STRUCTURES IN THE MACPITTS
SILICON COMPILER
Inasmuch as the MacPitts algorithm creates combinational
logic functions, it would be helpful to know how it does
this. Does there exist an explicit directive to the LISP
object file which calls and implements the logical functions
requested, or are they implicitly specified? If the latter
is true, it would suggest simpler source algorithms could be
written to specify the circuit function. If the former case
is true, then more lengthy algorithms are required, but the
circuit designer has more latitude for direct control and
optimization of layout.
A. COMBINATIONAL LOGIC CIRCUITS IN THE DATA PATH
Combinational logic structure instantiation in the data
path of a MacPitts generated chip is directed by the
data-path.lisp file in the MacPitts source code. The
data-path.lisp file calls specific functional units called
organelles from the organelles.lisp file to implement the
desired logic. These LISP files are compiled under the Liszt
compiler and linked to the rest of the compiled MacPitts
files by the available Makefile routine. The resulting 1.6
megabyte binary image constitutes the integrated MacPitts
silicon compiler.
1. The Basic Chip Frame
The initial investigation consisted of the
MacPitts-generated design frame called wire.mac. The
algorithm to create this structure is shown in Figure 2.1.
;WIRE.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF NO
;FUNCTION BY MACPITTS SILICON COMPILER
(program wire 1
(def 1 ground)
(def ain port input (2))




(def 7 power)
(always
(setq res ain)))
Figure 2.1 Wire.mac
The extension .mac refers to a MacPitts algorithm. MacPitts
is taken to refer to the silicon compiler, the pseudo-LISP
language which it uses, and the LISP source routines which
constitute the silicon compiler. To avoid confusion, the
MacPitts driver routines written by the chip designer will
be referred to as algorithms. Other meanings of the term
MacPitts will be clarified by context.
MacPitts produces a seven pad chip, routing the
input directly to the output without clocking. The three
phase clocking is not required for this circuit, so the
clock runs all terminate within the chip frame without
connections as shown in Figure 2.2. The three phase clock
must be specified in the algorithm, however, and the clock
traces are produced whether they are used or not. Note that














and the clock pads are also placed in the order specified in
the driver algorithm (Figure 2.1). Furthermore, neither the
clock traces nor the signal lines take a direct route to
their destinations. Even though these lines are all metal,
the excess length adds capacitance and so reduces the
maximum chip speed. This topic will be treated in a later
chapter. The data path Vdd-ground comb does not connect with
the Vdd rail at bottom left on the stipple plot. This is
common with very small data path chips, and the error can be
corrected in Caesar or a similar VLSI graphics editor.
2. A Data Path Inverter
The next program, macnot.mac shown in Figure 2.3,
specified a logical NOT function. As expected, MacPitts used
a single inverter of 4:1 ratio in the data path. The input,
which is on the top left diffusion line in Figure 2.4, runs
to the gate of the NMOS inverter via a metal and diffusion
routing, and the inverted output comes out on a polysilicon
line from the far right of the circuit. It was also noted
that the logical integer specification is required for NOT,
i.e., one must use [word-not] rather than [not]. The reason
for this is given in Southard [Ref. 6: pp. 47-48], which
indicates that integer logical operators must be used on
word elements (ports and registers), and Boolean logical
operators on control elements (flags and signals). The
logical Boolean specification [not] is used on flags, input
signals, and internal signals but it is not used for input
ports or register contents. In either Boolean or integer
data types, the NOT function takes a single value, as would
be expected.
The syntax of the driver algorithm (the .mac file)
is data-type sensitive, in much the same way that Fortran is
sensitive to the integer and floating point data types. The
two data types (from the programming perspective) are
Boolean and integer. Each data type is treated differently
by the MacPitts compiler, and each requires a different












The fundamental difference in data types is
argument length. Boolean data are of single bit length,
whereas integer data are of word length (one bit or
greater). Integer type data operations all occur in the
data path of a MacPitts design, and Boolean operations all
occur in the control path.
In Figure 2.3, the data type is declared in the
DEF statement, the form of which is
(def <name> <function> <input, output, or internal>
<pin number(s)>)
;MACNOT.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<not> FUNCTION BY MACPITTS SILICON COMPILER
(program macnot 1
(def 1 ground)
(def a port input (2))
;ain=input//res=output
(def b port output (3))
(def 4 phia)
(def 5 phib)
(def 6 phic)
;must show 3-phs clk, even if not used
(def 7 power)
(always
(setq b (word-not a))))
Figure 2.3 Macnot.mac
Figure 2.4 Data Path Inverter
where the name is any ASCII character string, and the function
can be port, signal, register, or flag. The next
field determines where the data is applied, and for most
circuits is either input or output. The pin number is
required for all input and output data. The data type is
determined by the function field. Signals and flags are
Boolean data; ports and registers are integer (word length)
data. The subsequent MacPitts forms in the driver algorithm
must agree in type with the DEF declarations.
If an incorrect data type specification is used,
MacPitts generates an appropriate error diagnostic at
compilation time. For instance, if one were to define the
inputs hot and cold as Boolean type and attempt integer
operations on them as follows
(def hot signal input 5)
(def cold signal input 6)
(setq warm (word-nor hot cold))
the following diagnostic would result at compilation time:
Error: logical coercion to integer not implemented yet
Similarly, if Boolean operations are attempted on integer
data, the following diagnostic results at compilation time:
Error: Boolean conversion not implemented yet
MacPitts error diagnostics can be quite confusing
to the inexperienced user. It is suggested that one peruse
the lincoln.lisp, hl.grep, and compmesg.lisp files of the
MacPitts source code to gain insight into the cause of
specific diagnostic messages. This can be easily done on-line
under the BSD Unix operating system, using the grep utility
(pattern search and recognition). The general
command format is
grep <search pattern> <file to search>
For example, if one attempted Boolean operations on a
register (an integer-valued data type) in MacPitts, the
second diagnostic given above would result. To locate the
source of this message, change directory to the residence of
MacPitts source code and issue the Unix command
grep boolean *.*
to locate all occurrences of the word boolean. Caution is
advised in issuing the grep command. If a very common word
is searched for, the search may take quite a long while, and
the results may not be very helpful. The search capability
of the grep command is limited though, as explained in the
BSD Unix manual.
3. A Data Path OR Gate
Next a MacPitts routine was written to generate a
two input OR gate in the data path. Again, the integer data
specification is required (see Figure 2.5).
;MACOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program macor 1
(def 1 ground)
(def a port input (2))
;a,b=inputs//c=output
(def b port input (3))









(setq c (word-or a b))))
Figure 2.5 Macor.mac
The resulting circuit extracted from the chip is depicted
in Figure 2.6. The OR function is implemented as a NOR gate
followed by an inverter. Figure 2.7 shows the gate
equivalent of a two input data path OR structure. The two
inputs to the NOR gate come in on the left top of the
circuit, the output is then inverted to yield a logical OR
function, and the output of the inverter is routed from the
left back out on the poly line below and parallel to the
input tracks. This routing scheme (river routing) is
determined by the MacPitts source code, and the chip
designer has no control over it. All chip inputs and outputs
are routed inside the main ground bus, with little regard to
minimizing trace length (see Figure 2.2). So an OR gate in
the data path of MacPitts is constructed from a two input
NOR gate with an inverter on the output, and the inputs and
outputs all connect to the data path from the left side.
Figure 2.6 Data Path OR Gate
Figure 2.7 Gate Equivalent of Figure 2.6
4. A Data Path NOR Gate
A two input data path NOR function is shown in
Figure 2.8. The resulting circuit in Figure 2.9 shows
instantiation as a two input 8:1 NOR gate, with the inputs
A, B, at top left and the result, C, at bottom left. If two
inputs are permissible, are more? Does MacPitts know to
adjust the transistor k values for multiple input gates? A
two input NOR chip was specified in the algorithm, and
MacPitts created a two input NOR gate. So explicit circuit
specification has been realized so far in the MacPitts chip
data path. When the algorithm specifies a NOR function, a
NOR gate is instantiated. As will be discussed later, this
is not the case in the control path.
;MACNOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<nor> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program macnor 1
(def 1 ground)
(def a port input (2))
;a,b=inputs//c=output
(def b port input (3))
(def c port output (4))
(def 5 phia)
(def 6 phib)
(def 7 phic)
;must show 3-phs clk, even if not used
(def 8 power)
(always
(setq c (word-nor a b))))
Figure 2.8 Macnor.mac
Figure 2.9 Data Path NOR Gate
5. A Four Input NOR Structure In The Data Path
Figure 2.10 shows the MacPitts algorithm to
generate a four input NOR structure (not the functional
equivalent of a four input NOR gate) in the data path. The
MacPitts form used was
(setq out (word-nor a (word-nor b (word-nor c d))))
where setq is the LISP assignment operator, out is the
output port, a, b, c, and d are the inputs, and all data is of
integer (word) type. The prefix-operator nature of LISP
syntax [Ref. 6: p. 47] indicates the logical operation which
this gate will perform. Figure 2.11 shows the layout of the
circuit MacPitts produces from this algorithm, and Figure
2.12 depicts the gate-level equivalent.
Note the topology: two inputs go to the first NOR
gate, its output and another input go to the next NOR gate,
and the pattern repeats to the third level. The output comes
from the last (rightmost) NOR gate.
This structure will not be the functional
equivalent of a four input NOR gate. As the LISP-like syntax
suggests, the NOR of four inputs is not equivalent to the
cascading of two input NORs.
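The non-equivalence can be checked by enumerating the truth table. The following is a minimal Python sketch, not MacPitts code; the function names are hypothetical, and single-bit inputs are assumed for simplicity.

```python
from itertools import product

def nor(*bits):
    """NOR of any number of single-bit (0/1) inputs."""
    return 0 if any(bits) else 1

def cascaded_nor(a, b, c, d):
    """Single-bit mirror of (word-nor a (word-nor b (word-nor c d)))."""
    return nor(a, nor(b, nor(c, d)))

def mismatches():
    """Input combinations where the cascade differs from a true 4-input NOR."""
    return [bits for bits in product((0, 1), repeat=4)
            if cascaded_nor(*bits) != nor(*bits)]

if __name__ == "__main__":
    for bits in mismatches():
        print(bits, "cascade:", cascaded_nor(*bits), "4-input NOR:", nor(*bits))
```

Under these assumptions the two functions disagree whenever a is 0 and b is 1, because the inner NOR output is inverted again at each level of the cascade.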
;FOURNOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION
;<nor> STRUCTURE BY MACPITTS SILICON
(program fivnor 1
(def 1 ground)
(def a port input (2))
(def b port input (3))
(def c port input (4))
(def d port input (5))
(def e port input (6))
(def outr port output (7))
(def 8 phia)
(def 9 phib)
(def 10 phic)
(def 11 power)





a (word-nor b (word-nor c d))))))
Figure 2.10 Fournor.mac









Figure 2.12 Gate Equivalent of Fournor Circuitry
6. A Data Path AND Gate
These observations raise the question of how a two
input data path AND gate would be constructed by MacPitts.
The (word-and x y) integer expression is required to
implement this circuit algorithmically, and a reasonably
compact circuit is expected. Figure 2.13 shows the MacPitts
algorithm to create the two input bit AND function in the
data path.
;MACAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<AND> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program macand 1
(def 1 ground)
(def a port input (2))
(def b port input (3))




(def 8 power)
(always
(setq c (word-and a b))))
Figure 2.13 Macand.mac
The AND chip is implemented as a two input 4:1 NAND
gate, the output of which drives a 4:1 inverter. The
stipple plot of this circuit is shown in Figure 2.14, and
its gate level equivalent is shown in Figure 2.15. In
Figure 2.14 note the input similarities to the previous
circuits. The two inputs enter the organelle at top left,
the signal is routed to the gate, and the output exits the
organelle on the bottom polysilicon line at the left. Also
note the difference among layouts of the MacPitts NAND gate
and the MacPitts NOR gate, and the corresponding Mead-Conway
cells [Ref. 4: p. 17].
Figure 2.15 Gate Equivalent of Data Path AND Gate
7. A Three Input AND Structure In The Data Path
The three input AND was expected to produce gates
similar to those of the two input AND, a series of cascaded
NAND gates each followed by an inverter. Figure 2.16 shows
the algorithm for the three input AND circuit, and Figure
2.17 depicts the resulting layout. The circuit is
functionally equivalent to a three input AND gate, because
AND is associative.
8. Data Path Basic Organelles
When a MacPitts source algorithm is invoked by the
linked binary MacPitts image by issuing the command
macpitts <filename> <options>
LISP object code is generated (unless the noobj option is
specified, in which case MacPitts searches for a previously
created object file named <filename>.obj). In the filename.obj
file it is observed that the data path logical operations
are all derived from NOT, NAND, and NOR LISP operations.
This is due to the fundamental hardware building blocks of
MacPitts data path combinational logic being two input NAND
and NOR gates, and NOT gates (inverters). Knowing this, the
reason for the two-input gate implementation as depicted in
the previous figures becomes clear.
Any data path logic organelle is composed of these
primitives. The OR organelle is a NOR gate with an inverter
on its output. The AND organelle is a NAND gate with an
inverter on its output. In the data path, these organelles
are assembled into macros in the organelles.lisp file of the
MacPitts source code. The process of silicon compilation is
thereby shortened, since some of the constituent parts are
already put together.
A two input data path NAND gate chip is implemented
exactly as it is specified. A three input NAND structure is
implemented as expected, by cascading two NAND organelles
(the three input NAND structure is not functionally
equivalent to a three input NAND gate). The output, again,
is what the LISP parenthesized notation would lead one to
expect.
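The composition of organelles from the three primitives can be modeled in a few lines. This is a hedged Python sketch with hypothetical function names, assuming single-bit inputs and one particular nesting order for the cascaded structures; it is not taken from the MacPitts source.

```python
from itertools import product

# The three data path primitives named in the text, modeled on single bits.
def inv(a):
    return 1 - a

def nand2(a, b):
    return 1 - (a & b)

def nor2(a, b):
    return 1 - (a | b)

# Organelles built as described: a primitive gate followed by an inverter.
def and_organelle(a, b):
    return inv(nand2(a, b))

def or_organelle(a, b):
    return inv(nor2(a, b))

# Cascaded three input structures (the nesting order is an assumption).
def and3_cascade(a, b, c):
    return and_organelle(and_organelle(a, b), c)

def nand3_cascade(a, b, c):
    return nand2(nand2(a, b), c)

def nand3_gate(a, b, c):
    """A true three input NAND gate, for comparison."""
    return 1 - (a & b & c)

if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, and3_cascade(a, b, c),
              nand3_cascade(a, b, c), nand3_gate(a, b, c))
```

In this model the cascaded AND matches a three input AND for every input, while the cascaded NAND differs from the true three input NAND whenever the last input is 1.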
9. Bit Slice Combinational Logic
So far, all examples given have used inputs having
one bit, but the data type specification for data path
combinational logic is integer. Word size data inputs are
treated in the expected way. Figure 2.18 illustrates a
routine which performs the logical AND on two input vectors
each four bits wide. Notice the similarity of this MacPitts
program to those already given. The only differences between
this routine and the AND of two bits are the PORT
statements, which make logical and connective assignments
between i/o ports and inter-chip hardware blocks.
;3AND.MAC
;SOURCE CODE FOR 3 INPUT DATA PATH <AND> GATE
(program 3and 1
(def 1 ground)
(def a port input (2))
(def b port input (3))
(def c port input (4))
(def d port output (5))
(def 6 phia)
(def 7 phib)
(def 8 phic)
(def 9 power)
(always
(setq d (word-and (word-and a b) c))))
Figure 2.16 3and.mac
Figure 2.17 Circuitry from 3and.cif
Figure 2.19 illustrates the data path circuitry
which implements this logic. It is evident that the logic is
performed by replications of the fundamental MacPitts AND
organelle, a NAND gate with inverted output. In comparing
this circuit to Figure 2.14 the similarity becomes clear.
The word-and integer operation as specified in the source
algorithm translates to a data path AND organelle in the
LISP object file. This organelle is replicated,
instantiated, and connected to inputs and outputs to create
the circuit (cifplot) shown in Figure 2.19. This data path
word operation capability would not usually be applied to
bit-width combinational logic, as the previous discussions
might suggest, but rather to bit-slice operations such as
word masking, parity checks, arithmetic operations, and so
on.
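The replication of one AND organelle per bit position can be illustrated with a short Python sketch. The helper names are hypothetical and the four bit width is taken from the example in the text; this models behavior only, not layout.

```python
def and_organelle(a, b):
    """AND organelle as described: a NAND gate followed by an inverter."""
    nand = 1 - (a & b)
    return 1 - nand

def to_bits(value, width):
    """Split an integer into a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Reassemble an integer from bits, least significant first."""
    return sum(bit << i for i, bit in enumerate(bits))

def word_and(x, y, width=4):
    """Model of word-and: one AND organelle replicated per bit slice."""
    slices = zip(to_bits(x, width), to_bits(y, width))
    return from_bits([and_organelle(a, b) for a, b in slices])

if __name__ == "__main__":
    print(format(word_and(0b1100, 0b1010), "04b"))
```

Each bit slice is independent of its neighbors, which is why the same organelle can simply be stacked to whatever word width the algorithm declares.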
10. Two Data Path Chips: Counters
A four bit resettable up-counter chip was designed
by MacPitts using an algorithm given in the MacPitts
documentation. Figure 2.20 shows the algorithm to specify
the counter's behavior, and Figure 2.21 shows the resulting
chip layout diagram. This example gives an indication of the
implicative nature of MacPitts, which is actually a function
of the LISP object code. There is a bank of three vertical
drivers below the data path block in Figure 2.21. These are
clock drivers, which drive the three phase clock.
;MULTIAND.MAC
;SOURCE CODE FOR ALGORITHMIC
;<AND> FUNCTION BY MACPITTS
(program multiand 4
(def 1 ground)
(def a port input (2 3 4 5))
(def b port input (6 7 8 9))





















( word-and ) ) ) )
Figure 2.18 Multiand.mac




Figure 2.19 Logic Circuitry From Multiand.cif
;Example of MACPITTS algorithm to create a 4 bit counter
;Illustrates use of "always" and "cond" commands
;title: count4.mac
(program count4 4
(def 16 power)
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def rst signal input 5)
(def count register)
(def cnt_up signal input 6)
(def ld_zero signal input 7)
(def out port output (12 13 14 15))
(always
(cond
(ld_zero
(setq count 0))
(cnt_up
(setq count (1+ count))))
(setq out count)))
Figure 2.20 Count4.mac
They connect to the clock lines on the bottom and to the
count registers at the top.
There is a small Weinberger array beneath the clock
drivers. A Weinberger array [Ref. 8] is used by MacPitts to
control data path operations. It can be inferred from the
size comparison between the data path block and the control
block that this is a data intensive chip. The MacPitts
algorithm reflects this, with many data operations such as
SETQ and (1+ count), the increment statement, and few control
operations such as






where each conditional requires a decision. This
decision making is perhaps more obvious in the generated
object file, where each COND statement is translated to an
IF statement. MacPitts implements the decisions more along
the lines of a Pascal CASE construction than as an IF
construction (the compiled LISP code reflects the IF logical
testing, but it is set within a parallelizing command).
The SETQ form has operated on just ports so far. In
count4.mac, the SETQ form operates on a register (COUNT, the
current counter value). The last line in the algorithm,
(setq out count), sets the output port to the current count
register value. From the hardware perspective, this can be
viewed as a latching or storage of the register contents,
and clocking the contents to an output port. This is
necessary in MacPitts since ports cannot store data. Only
registers can store data in the data path, and MacPitts
implements registers as master-slave flip flops.
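The behavior described for count4.mac can be approximated with a small software model. This Python sketch is illustrative only: the class name is hypothetical, and it assumes that the register update takes effect at the end of the clock cycle and that the increment wraps at four bits, neither of which is stated in the text.

```python
class Count4Model:
    """Behavioral sketch of the count4 algorithm (not MacPitts code)."""

    WIDTH = 4  # four bit counter

    def __init__(self):
        self.count = 0  # models the 'count' register

    def clock(self, ld_zero=False, cnt_up=False):
        """Advance one clock cycle and return the value seen on 'out'."""
        # (setq out count): the port only shows what the register stores.
        out = self.count
        # cond: the first true guard is executed, later guards are skipped.
        if ld_zero:
            self.count = 0
        elif cnt_up:
            # Assumption: the increment wraps at the word length.
            self.count = (self.count + 1) % (1 << self.WIDTH)
        return out

if __name__ == "__main__":
    counter = Count4Model()
    print([counter.clock(cnt_up=True) for _ in range(5)])
```

The guard ordering mirrors the cond form in Figure 2.20: when ld_zero and cnt_up are both asserted, only the clear is performed.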
The chips considered so far, with the exception of
count4.mac, have been pure data path chips. In almost all
useful chips, there will be a data path which is controlled
by a Weinberger array control path. It is difficult to guess
the relative sizes of the data path and control path from
just the MacPitts driver algorithm. Nevertheless, if few
conditional decisions are to be made and many arithmetic or
logical operations are to be performed, the data path is
likely to be the larger.
Figure 2.22 shows the algorithm (the .mac file)
for count16ud.mac, the MacPitts driver for a 16 bit up/down
counter. The signal and register names are self explanatory.
The previous four bit up-counter was the prototype for this
16 bit up/down counter. The differences are in word length,
the addition of a new input signal (count_down), the
conditional test of count_down, and the decrement operation
(1- count) if count_down is asserted true. It is usually a
good idea to model a desired algorithm with a simpler
prototype (functionally similar but having fewer inputs and
outputs), and to test the prototype in the MacPitts command
interpreter. For example, designing a four bit up counter is
a good preliminary step when a 16 bit up/down counter is
desired.
It can be inferred that the ratio of data path to
control path size will be greater for this chip than for
count4.mac. Figure 2.23 shows the resulting cifplot of
count16ud.mac, and the 16 bit wide data path is indeed much
larger than the control path, and as expected, much larger
than the four bit counter data path also.
;Example of MACPITTS algorithm to create a 16 bit up/down counter
;copiously commented for clarity's sake
;title: count16ud.mac
(program count16ud 16
;note that the 16 opposite the title determines # of outputs
;doc. says data paths; actually equates to output pads (NOT paths)






;the counter will require a 16 bit width storage register (McP = m/s FF)
;... a count up enable signal,
;... a count down enable signal,
;... and a reset signal. These are described syntactically below:
(def rst signal input 5)
;this declares a bank of 16 clocked m/s FFs (see stippleplot)
(def count register)
(def cnt_up signal input 6)
(def cnt_dn signal input 7)
(def ld_zero signal input 8)
;the 16 output pads are specified:
(def out port output (9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24))
;always command means to execute what follows every clock cycle
(always
;the cond (-ition) statement means to check the following guard
;conditions, and execute ONLY that one which is .true.
;execution of one guard precludes execution of any subsequent guards.
(cond
;there are three guards to check: is ld_zero .true.?
;If not, is cnt_up .true.?
;If not, is cnt_dn .true.?
;If none is .true. then exit the loop
(ld_zero
;If ld_zero is asserted (high), then make count=0 (i.e., clr FFs)
(setq count 0))
(cnt_up
;If cnt_up is asserted (high), then increment the count FF bank
(setq count (1+ count)))
;If cnt_dn is asserted (high), then decrement the count FF bank
(cnt_dn
(setq count (1- count))))
;regardless of which (if any) operation is done, the FF contents
;are assigned to the output with the setq command.
(setq out count)))
Figure 2.22 Count16ud.mac
Figure 2.23 Count16ud.cif
B. COMBINATIONAL LOGIC STRUCTURES IN THE CONTROL PATH
The implementation of combinational logic in the control
path of a MacPitts design is fundamentally different from
its implementation in the data path.
In the data path, all combinational logic is constructed
from basic two input NOR, NAND, and NOT cells, as described
in the MacPitts source code file data-path.lisp. Any
logical implementation, however complicated, is constructed
from these three organelles (other organelles do exist in
the organelles.l file, but they are all built either
from these basic cells or from permutations of these cells).
Furthermore, the specifications required by MacPitts in
the data path are oriented more towards structure than
behavior. For instance, when the programmer/designer writes
the following algorithmic fragment
(word-and a (word-and b c))
what is being explicitly specified is a two-level gate
structure. The innermost level comprises a two-input AND
gate, the output of which is fed to one input of the second
level AND gate, alongside the third input. Note that
a single gate with more than two inputs is not permitted in
the data path. The syntax constraints of the MacPitts
compiled object code determine this structure. Again, this
apparent limitation is not really a limitation at all,
because MacPitts is so constructed as to force decisions to
be made in the control path. Consequently, Boolean algebraic
reduction of data path combinational logic is unlikely to be
necessary.
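The two-level structure implied by the nested word-and form can be modelled in a few lines of Python. This is only a behavioral sketch, not MacPitts code: the names word_and and cascade_and and the default word width of 8 bits are assumptions made for illustration.

```python
from functools import reduce

def word_and(x: int, y: int, width: int = 8) -> int:
    """Two-input bitwise AND on width-bit words (models one data path gate level)."""
    mask = (1 << width) - 1
    return (x & y) & mask

def cascade_and(operands, width: int = 8) -> int:
    """AND any number of words using only two-input gates, innermost pair first."""
    *rest, last = operands
    # Fold from the right so that (a b c) becomes word_and(a, word_and(b, c)).
    return reduce(lambda acc, x: word_and(x, acc, width), reversed(rest), last)

a, b, c = 0b1110, 0b0111, 0b0110
result = cascade_and([a, b, c])
```

Each additional operand adds one more gate level, which is why a three-operand AND in the data path is written as two nested two-input forms.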
1. Control Path Combinational Logic
The control path implementation of combinational
logic is simpler than the data path implementation in two
ways. First, it is behavior oriented rather than structure
oriented. The MacPitts designer needs only to specify the
LISP-like behavior of the structure, and the
MacPitts environment produces a realization of it. This
requires little (if any) of the Boolean reduction which might be
required for complicated data path logical structures.
Second, the control path combinational logic is
simpler structurally, in that it is always implemented in a
highly regular Weinberger array. A tradeoff between
simplicity of layout and maximum circuit speed exists,
however, and this topic will be considered in Chapters IV
and V. Although a Weinberger array is geometrically simpler
than a Programmable Logic Array (PLA), it is not as fast or
as small.
The selection of which path is to perform the
combinational logic is inherent in the syntax of MacPitts
(the language). If the logical operator is a Boolean form
and its antecedents are signals or flags, the control path
will do the logic. If the logical operator is an integer
form and its antecedents are ports or registers, then the
combinational logic will take place in the data path. Thus,
the syntax drives the selection of where the combinational
logic occurs.
The initial MacPitts documentation offered some
insight into these distinctions. A variety of tests were
devised in the current investigation to explore the
combinational logic implementation differences between the
data and control paths. The experiments designed to arrive
at the above conclusions for the control path logic are
presented in the following sections.
2. A Control Path AND Gate, And Control Path Syntax
Casand.mac (cascaded AND gates; Figure 2.24) was
the algorithm used to create the initial structure for exploring
combinational logic implementation in the control path. The
control path implementation of combinational logic requires
a different kind of input declaration than does the data
path. In the control path, the inputs must be declared as
<name> signal input <pin number>
This has the effect of coercion to Boolean (true or false,
as opposed to one and zero) in the MacPitts environment.
Consequently, a different type of logical operator
is required in the SETQ argument forms. In the data path,
using defined integer ports as inputs, the integer logic
SETQ forms are used (word-or, word-nand, etc.). In the
control path, however, Boolean SETQ forms are required (or,
nand, etc.). The data path integer SETQ forms are limited to
two logical arguments, whereas the control path SETQ forms
are effectively unlimited as to number of logical arguments.
This seemingly arbitrary constraint becomes understandable
in view of the structural implementation in the respective
paths. In the data path, all logic must be implemented by
cascades of two input gates. In the control path, all logic
is implemented by a Weinberger array, which has no practical
limit (except speed, pin count, and chip size) on the number
of inputs.
Furthermore, the data path combinational logic
restrictions are less strict (structurally speaking) than
those of the control path. For instance, in
the data path all combinational logic structures are derived
from NAND, NOR, and NOT gates, and implemented as macro
organelles. In the control path, however, all logic
structures are constrained to be NOR gates. The basename.obj
file that results from a basename.mac file shows all
control path combinational logic implemented as NOR
operations. Figure 2.25, casand.obj, shows the NOR function
used to perform the AND function in the control path. All
control path combinational logic operations are implemented
in this fashion, as in the more common PLA.
;CASAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<and> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program casand 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)








(setq c (and a b) ) )
(b
(setq c (and a b) ) ) ) ) )
Figure 2.24 Casand.mac








signal a input 5)
signal b input 6)









signal-output c) (nor ((primitive (gate 4)))))
(gate 4) (nor ((primitive (gate 3)) (primitive (gate 2)))))
(gate 3)
(nor
((primitive (gate 1)) (primitive (gate 0)) (primitive (signal-input a)))
(gate 2) (nor ((primitive (gate 1)) (primitive (gate 0)))))
(gate 1) (nor ((primitive (signal-input a)))))
(gate 0) (nor ((primitive ( s
1




1 (ground))
8 (power))
6 (input b (signal-input b)))
5 (input a (signal-input a)))
7 (outputs c (signal-output c)))))
Figure 2.25 Casand.obj
The AND plane in an NMOS PLA is actually composed of NOR
gates; its function is logical AND, but its constituent
circuits are NOR gates. The NOR structure which the control
path uses is also different topologically from that used in
the data path.
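How NOR gates alone can realize AND and OR may be illustrated with a short Python model. It is a logical sketch only, under the assumption of ideal gates; the function names are hypothetical and the code says nothing about the layout MacPitts actually generates.

```python
def nor(*inputs: bool) -> bool:
    """NOR gate with any fan-in: true only when every input is false."""
    return not any(inputs)

def and_from_nor(a: bool, b: bool) -> bool:
    # Invert each input with a one-input NOR, then NOR the inverted inputs.
    return nor(nor(a), nor(b))

def or_from_nor(a: bool, b: bool) -> bool:
    # NOR the inputs, then invert the result with a one-input NOR.
    return nor(nor(a, b))

truth_table = [(a, b, and_from_nor(a, b), or_from_nor(a, b))
               for a in (False, True) for b in (False, True)]
```

Each row of truth_table holds the two inputs followed by the AND and OR results, so the model can be checked against the ordinary Boolean operators.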
A concise review of the data path and control path
variable types illustrates the usage differences:
DATA TYPE              CONTROL PATH (Boolean)      DATA PATH (integer)
STORAGE ELEMENT        flag                        register
NON-STORAGE ELEMENT    signal (input, internal)    port (all types)
All storage elements are implemented as master-slave
flip-flops. They retain their value until a new value
is clocked into them. The flags are one bit wide, and are
two-state devices, either true or false. The registers have
a capacity of the data path width as declared in the
initial PROGRAM statement in the MacPitts source program
written by the chip designer.
Non-storage elements are used primarily for data
communication within a clock cycle, where clock cycle here
is taken to refer to the command interpreter clock cycle,
and not one of the three off-chip clock phases which a
MacPitts design requires. The determination of the value of
these non-storage elements is germane to pipelined digital
machines. When they are used in any application, care must be
taken so that their value is the one necessary for subsequent
stages of logic. A thorough understanding of the
counter-intuitive parallelism inherent in MacPitts (the language) is
necessary to avoid mistakes here. MacPitts is not like the
standard sequentially executed higher level languages. There
are at least three levels of implicit parallelism possible
in a MacPitts algorithm, and an understanding of parallel
operations is necessary to avoid functional errors. This
consideration is germane to MacPitts programming, and will
be considered in detail later in this Chapter and in
Chapters III and IV.
The next-to-last line in Figure 2.24 illustrates a
conditional. The (b ... statement is a checked <condition>
argument of the beginning COND (do upon condition)
statement, as is (a ... . If condition a is false, and
condition b is false, then no output is SETQ'd. Intuition
would suggest that the output would then either remain at
its last value or transition to tri-state, neither of which
is correct. The output is pulled low by the Weinberger array
circuitry. This is evident in Figure 2.27, the Weinberger
array from casand.mac, and in Figure 2.28, the logic gate
equivalent. The (cond (t ... form can be used to set a
desired output, but it is usually better suited as a default





Figure 2.27 Casand.cif Weinberger Array
MacPitts does not view this algorithm as a usual
high level sequential test, however, but rather as a
parallel test of a and b. The non-intuitive parallelism of
MacPitts was mentioned in the previous paragraph, as was the
similarity of the MacPitts COND statement to the Pascal CASE
statement. Some elaboration will serve to clarify this
necessary concept. MacPitts evaluates all of the forms
within the scope of a COND statement in parallel, in a
mutually exclusive fashion. With regard to mutual
exclusivity, it is then similar to the CASE statement; each
condition under the scope of a COND can be modelled as a
flow-of-control switch, either turning on the evaluation of
its constituent forms or else skipping over their
evaluation. The analogy does not hold further than this,
however, because MacPitts evaluates all of the conditions
under a COND in parallel. The object code created from a
MacPitts source file illustrates this well. An example is
(cond
(hot (setq fan_on t))
(cold (setq fan_on f))
(ok (setq fan_on f)))
where hot, cold, and ok are Boolean variables
(signals or flags), and fan_on is in this case a Boolean signal
output which is to be turned on (t) or off (f) depending on
an input temperature signal. COND forces parallel evaluation
of the three conditions under its scope: hot, cold, and ok.
The last parenthesis in this fragment closes the beginning
parenthesis prior to the COND, bringing the three conditions
under its scope. Since these conditions are evaluated in
parallel, a better code fragment would be
(cond
(hot (setq fan_on t))
(cold (setq fan_on f))
(t (setq fan_on f)))
where the last line indicates TRUE, i.e., it is always true.
Since COND evaluates in parallel with mutual exclusion based
upon order, if either of the first two conditions is true,
then the actions under the remaining conditions are not
executed. If neither of the first two conditions is true,
however, then the fan will be turned off. This code fragment
requires one less signal input (or one less flag) on the chip,
and use of the TRUE t condition should always be considered.
Its use is not necessary, as indicated by the first code fragment.
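The ordering rule can be made concrete with a small Python model of a COND that drives non-storage signals. This is a behavioral sketch under stated assumptions, not the compiler's algorithm: evaluate_cond and its argument layout are hypothetical, and an undriven signal is simply modelled as false, following the observation that the Weinberger array pulls such an output low.

```python
def evaluate_cond(clauses, inputs, outputs):
    """Model one evaluation of a COND whose actions drive non-storage signals.

    clauses: ordered list of (condition_name, {signal: value}) pairs; the name
             "t" stands for the always-true condition.
    inputs:  dict of Boolean condition values.
    outputs: names of the signals the COND may drive.
    """
    # Every condition is examined at once (parallel test).
    tested = [(name == "t") or inputs[name] for name, _ in clauses]
    # Undriven non-storage signals read as false (pulled low).
    result = {name: False for name in outputs}
    # Mutual exclusion by written order: only the first true clause acts.
    for is_true, (_, actions) in zip(tested, clauses):
        if is_true:
            result.update(actions)
            break
    return result

three_guards = [("hot", {"fan_on": True}),
                ("cold", {"fan_on": False}),
                ("ok", {"fan_on": False})]
with_default = [("hot", {"fan_on": True}),
                ("cold", {"fan_on": False}),
                ("t", {"fan_on": False})]
```

Under this model the two clause lists produce the same fan_on value for every input combination, while with_default never needs to read an ok input.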
MacPitts produces accompanying object code which
structurally resembles the following fragment
(if




(t ... ) ) )
where the COND translates to an IF, and the parallelism of
MacPitts is evident in the PAR (parallelize) embracing the
three constituent conditions under the COND. Parentheses are
as important in MacPitts as they are in LISP. In the last
line above, there are three closing parentheses. The
innermost closes the TRUE condition, the middle parenthesis
closes the PAR (parallelization of condition checking), and
the outermost closes the IF (cond) statement.
The LISP object file of casand.mac in Figure 2.25
indicates the LISP equivalent of the MacPitts (language)
algorithm, and shows how LISP views the NOR gate inputs as
primitives. MacPitts is also able to compile a chip layout
directly from LISP object code. This is an option for the
designer who is fluent in LISP, in that customizing of the
code and hence of the chip's structure is possible. RVLSI-3
[Ref. 6:p. 4] describes how to create a chip design from an
existing LISP object file.
Figure 2.26 shows the chip resulting from
casand.obj. The pads are all placed clockwise around the
periphery of the chip in the order specified in the .mac
file (Figure 2.24). This built-in function of MacPitts
lends itself to both errors and possibilities of
improvement. It is easy to identify pad function if the
MacPitts algorithmic source file (written by the designer)
is at hand.
Figure 2.27 also shows the topological difference
between the data path and the control path. In previous data
path circuits, all combinational logic was implemented with
recognizable NMOS logic gates. In the control path, the
Weinberger array is made up of many vertical metal columns
with perpendicular polysilicon lines cutting across them.




Figure 2.28 Gate Equivalent of Casand Weinberger Array
In the chip of Figure 2.26 as generated, the Vdd input rail
did not connect with the main Vdd bus (it has been corrected in
Figure 2.26). It passed through the polysilicon vias and stopped
abruptly. The reason for this is the expectation of minimum
chip size which MacPitts harbors. For any but the simplest
of chips, the Vdd comb will extend out to the input Vdd
rail. If it does not, the Vdd pad can be placed almost at
will by modifying its position in the basename.mac file.
RVLSI-3 discusses this [Ref. 6:pp. 11-13]. The designer
can exercise a fair amount of latitude in pad placement, and
MacPitts will accommodate most of the time. The suggestion
in RVLSI-3 that GND be placed near the beginning and Vdd be
placed near the end is a good one. The main problem here
would arise if GND were placed on the right side of the chip
so that it contacted the Vdd comb (which it will do if care
is not exercised). MacPitts places pads exactly in the order
specified, and does no pad functional error-checking.
Similarly, if a pad is dual-defined, MacPitts permits it
with no diagnostics. This extends to the same pad being used
for both Vdd and an input signal. So care is important in
both pad specification and positioning.
There exists the possibility for some improvement
in chip speed by designer intervention in specifying the pad
location. By moving pads five, six, and seven (input and
output signals in casand.mac) closer to the Weinberger
array, the metal run lengths can be reduced, and with them the
metal to substrate capacitance. This results in a somewhat
faster chip, all other factors being equal.
Figure 2.27 is a blowup of the Weinberger array
generated by casand.mac, and Figure 2.28 is its logic gate
equivalent. The Weinberger array is a versatile PLA-like
structure generally used to implement sequential logic. In
this chip, as an unclocked circuit, it implements a
combinational function. Weinberger array gate
instantiation errors were first detected here (circled).
Note the two half lambda gaps in the NOR gate inputs. By
Caesar editing, unexpanding of affected cells, and
grep-searching the .cif files, it was discovered that these errors
occur whenever certain NOR gate inputs are invoked. The
errors themselves were suspected to reside in the
control.lisp file of the MacPitts source code. Two specific
cells appear to generate these errors: partial-gate-input
('ground-right), and partial-gate-input ('ground-left). Each
is one-half lambda too short. Chapter VI will treat the
solution of this problem. The MacPitts command interpreter
does not detect this type of error, since it only exercises
the algorithm. Lyra or a similar design rule checker will
detect this error. The designer would do well to visually
note MacPitts' inherent errors and correct them prior to
submission to a design rule checker (drc).
3. A Control Path OR Gate
Figure 2.29 illustrates the MacPitts algorithm to
create a two input OR realization in the control path. The
OR function is realized by a selective SETQ choosing
process, in a similar fashion to the previous AND
realization.
Figure 2.30 is the Weinberger array logical unit of
casor.cif. The inputs are brought in on either side, and the
output comes out from the middle of the structure. The same
instantiation errors as in the previous chip were generated.
Partial-gate-input (gnd left) is depicted in the upper left
of the stipple plot, and partial-gate-input (gnd right) is
depicted in the lower right of the plot (circled) in Figure
2.30.
The logical operation of the Weinberger array could
stand some clarification. Figure 2.31 depicts a gate-level
functional representation of Figure 2.30, the control path
;CASOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program casor 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)








(setq c t) )
(b
(setq c t) ) ) ) )
Figure 2.29 Casor.mac
implementation of a two input COND-test OR structure.
The function will be explained with reference to Figures 2.30,
2.31, and 2.28. Figure 2.30 has four depletion mode
transistors (control columns to MacPitts). The leftmost
transistor is the first inverter in Figure 2.31. The next
column in Figure 2.30 serves as the top NOR gate in Figure
2.31. Moving right in Figure 2.30, the next column is the
output inverter, and the rightmost column is the lower NOR
gate in Figure 2.31. When viewed as a gate
level equivalent, it can be seen that the Weinberger array
is both larger and slower than its data path equivalent (cf.
Figure 2.6). In the control path, the signal requires
approximately four gate delays to propagate from input to
output. This slowness has been somewhat mitigated by
Figure 2.30 Casor Weinberger Array
Figure 2.31 Gate Equivalent of Casor Logic
the large aspect ratio of the pullup transistors (bottom,
Figure 2.30). The comparable logic gate in the data path
requires only approximately two gate delays, one for the NOR
gate and one for its subsequent inverter (Figure 2.7).
This simple COND-driven control path OR gate serves
as an indication of how MacPitts constructs similar yet more
complicated Weinberger array structures. The decision logic
is quite unlike that of a PLA. In a standard NMOS AND plane-OR
plane PLA, a signal may experience at most four gate
delays (considering input and output inverters both active,
and pass transistors inducing a very small time delay). For
this simple OR circuit, a delay of approximately four gate
delays is realized. The cascading of NOR gates and inverters
induces even more delay for more complicated Weinberger array
circuitry.
4. A Four Input OR Gate In The Control Path
A quad-input OR structure is specified
algorithmically in Figure 2.32. The OR logic which is
implicit in MacPitts specifications is perhaps clearer here
than in the two input OR structure. The COND statement
forces a Boolean test of each input, and selects the
appropriate output. To reiterate, the COND statement and its
attendant forms can be viewed as the if-then-else construct
of many higher level languages. The difference is that
MacPitts tests the condition forms in parallel, and not in a
serial fashion as most higher level software compilers
would. The mutual exclusiveness of the <conditions> is
determined by serial order, however, even though the testing
of the conditionals is done in one clock cycle (or in
parallel).
;QUADOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or> FUNCTION BY MACPITTS SILICON COMPILER//4 input gate//
(program quador 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal output 9)
(def 2 phia)
(def 3 phib)




(setq e t) )
(b
(setq e t) )
(c
(setq e t) )
(d
(setq e t) )
) ) ) )
Figure 2.32 Quador.mac
This is reflected in the resulting structures.
Figure 2.33 shows the labelled Weinberger array resulting
from quador.mac, and Figure 2.34 is its logic gate
equivalent. A strength of MacPitts is that it forces the
designer to consider both behavior and structure while in
the process of writing the driver algorithm. This is
considered to be advantageous, inasmuch as the abstractness
factor is minimal. There are two broad categories of
silicon compilers: behavior oriented (e.g., MacPitts) and
structure oriented (e.g., Bristle Blocks). In Bristle Blocks
and most other register transfer logic (RTL) silicon
compilers, a structure is the fundamental building block.
The structures (register, adder, ALU, gate) must be
connected appropriately to implement the desired behavior.
In MacPitts, the desired behavior of the chip is the input
to the silicon compiler and the chip which implements this
behavior is the output. The experienced designer is aware of
the structure that results from a given behavioral
specification, and has the latitude to optimize the
algorithm accordingly. This has been mentioned previously,
regarding pad placement and COND. Optimization will be
treated further later in this thesis.
5. A Four Input AND Gate In The Control Path
Figure 2.35 shows the algorithm to create a four
input AND gate in the control path, and Figure 2.36 shows












Figure 2.34 Gate Equivalent of Quador Logic
Note the errors generated in this simple four input, one
output circuit (circled, Figure 2.36). There are seven gate
gap errors (all partial-gate-inputs), and three alignment
errors. The alignment errors are actually derived from
mistranslation of the Weinberger array interface cell by
MacPitts (the program). The interface cell is created with
the proper pitch, set aside in the VAX 11/780's memory, then
invoked and its image translated to the proper position in
the upper left of the Weinberger array. By convention, upper
left on the MacPitts chips refers to the nominal position of
the GND pad, position one.
;QUADAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<and> FUNCTION BY MACPITTS SILICON COMPILER//4 input gate//
(program quadand 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)





(def 10 power)






(setq e (and a b c d)
(setq e (and a b c d)
(setq e (and a b c d)
I
(setq e (and a b c d)
(setq e f
Figure 2.35 Quadand.mac
So what appears to be three separate alignment
errors is actually just one cell translation error. This
error should be repairable in the macro-instantiation
portion of MacPitts, although further investigation will
also consider the possibility of an error in program
installation.
6. A 16 Input OR Gate In The Control Path
It was stated previously that MacPitts will permit
no more than five-deep cascading of the same gate organelle
in the data path. This is not the case, however, in the
control path. Figure 2.37 shows a MacPitts algorithm to
create a 16 input OR circuit. Note again how natural the
specification is, and the intuition it gives into both
behavior and structure. To reiterate: in the data path, one
specifies structure explicitly and the implicit behavior
results. In the control path, one specifies behavior
explicitly, and the implied structure (always a Weinberger
array) results (cf. Figure 2.13, data path AND; Figure 2.25,
control path AND). The suggestion is to specify as much
combinational logic as possible in the control path (this
decision fortunately never arises because MacPitts is not
primarily a combinational logic design tool).
In program multior.mac the data path width is still
one. The data path width actually refers to the number of
outputs from the chip (in the absence of a data path), not
to what its name would lead one to believe. So with one output,
the data path width is one, even though there are 16 inputs.
Figure 2.36 Quadand Weinberger Array
The format for data path width specification is
(program <program name> <data path width>
Figure 2.38 shows the chip structure of multior.cif. It is
seen that the chip is composed of a small unclocked control
path unit alone, in the middle of the Weinberger Vdd/GND
comb. There are no data path organelles. As previous
experience would suggest, this control path has several
instantiation gap errors and cell translation errors (see
Figure 2.25). The large number of depletion pullup
transistors inherent to the Weinberger array is also
apparent. Combinational logic implementation in the control
path typically requires more depletion pullups than would be
required for the equivalent structure in the data path,
because all control path logic is done with NOR gates. Since
the pullups are always turned on, a MacPitts chip is not
expected to be very conservative of power. In the four input
OR gate, there were eight pullups in the Weinberger array,
and seven instantiation gap errors. In the 16 input OR
circuit, there are 30 pullup transistors, and approximately
40 gap errors. These errors are caused by instantiation of
the partial-gate-input cells (specifically,
partial-gate-input-ground-left and partial-gate-input-ground-right), and










































ALGORITHMIC CREATION OF LOGICAL




hia)(def 2 phib)(def 3 phic)
ignal input 5)
ignal input 6)




ignal input 11)
ignal input 12)
ignal input 13)
ignal input 14)
ignal input 15)
ignal input 16)
ignal input 17)
ignal input 18)
ignal input 19)
ignal input 20)
ignal output 21)
power)
input gate//



























































MacPitts is limited in the data path as to how many
combinational logic cascades may be made. Since the control
path is designed to make decisions, the combinational logic
cascading constraint is absent for most practical chips.
Nevertheless, an error was detected in the multior.cif file,
Figure 2.38. From multior.mac in Figure 2.37, one would
expect the chip to have 22 pads: 16 input pads, one output
pad, three clock pads, one ground pad, and one Vdd pad. The
cifplot shows only 21 pads. This error does not show in the
command interpreter. The 16 input OR function works as
expected there. The error apparently lies elsewhere than in
the .mac file. The chip does function nevertheless, but as a
15 input OR gate instead of as a 16 input OR gate. The pad
deletion error (one fewer pad instantiated than specified
in the .mac file) occurs whenever an OR gate having more
than five inputs is specified in the .mac file. This is an
unexpected error, though not very serious. The control path
is rarely called on to do this sort of logic. If a special
function of this type is required of a MacPitts chip, the
designer can circumvent this problem by specifying an extra
input pad in the .mac file. The chip will compile to cif,
but the extra pad will not be instantiated, nor will any of
the attendant combinational logic or wires.
7. Control Path Semantics
The syntax (algorithm rules) for combinational
logic in the control path has been illustrated in the
previous sections. To gain an understanding of MacPitts, the
semantics (what the algorithm means) is more important than
the syntax (how to say what it means).
The parallelism possible in MacPitts has been
previously referred to in the discussion of parallel testing
of conditions under a COND statement. This is not the only
place where MacPitts forces parallelism. Parallelism is also
forced upon all <actions> within a true condition under a
COND. The general form of a COND statement is
(cond (<condition> <actions> <transition>))
The <condition> is a Boolean variable upon which the
true/false test is made, the <actions> are SETQs, and the
<transition> is one of GO, CALL, or RETURN (to be discussed
in Chapter IV). In the previous example, both hot and cold
were Boolean conditional variables which would be tested in
parallel. The <actions> under the COND refer to a set of
SETQ assignment operators, and the SETQs under a COND are
all done in parallel, or simultaneously. The <transition>
form indicates a state transition to be made if <condition>
is evaluated as true. This state transition occurs in
parallel (same clock cycle) with the SETQs of the associated
<actions>. The state transition mechanism of MacPitts is very
straightforward and natural to a designer familiar with
Mealy type finite state machines. This topic will be
considered in depth in Chapters IV and V.
Note the difference between the parallelism implied
within the COND and the parallelism implied in condition
evaluation. The conditions are all examined in parallel, and
for the first one that evaluates to logical TRUE, all forms
within its scope are executed in parallel. This high degree
of implicit parallelism makes MacPitts ideally suited for
pipelined architectures. Consider the following code in
which three Boolean conditionals determine the outputs. The
destinations of the SETQs are also Boolean, and in this case
are non-storage elements (signals). The outputs are declared
signals instead of flags (which are storage devices) so that




(setq windows_open t)
(setq doors_open t)
(setq heater_on f)
)
(cold






(setq doors_open t) ) )
This algorithm models a simple digital home
temperature controller where f refers to an inactive or
closed device, t refers to an active or open device, and a
comfortable temperature deadband exists between heating and
cooling requirements. All three Boolean conditions (hot,
cold, and true) are tested in parallel. The order of mutual
exclusion is the order in which the conditions are written
(if both cold and t are true simultaneously, only the
actions under cold will be executed). The conditional (t ...
is the MacPitts equivalent of a reserved word, and indicates
the always true conditional. It is used in this algorithm as
the default state of the system, where the temperature is
comfortable enough to leave both the doors and windows open.
Even though (t ... is always true, the evaluation order of
the conditionals prevents the forms under its scope from
being set unless both the preceding conditionals are false.
The actions under each true condition are also performed in
parallel, or in the same clock cycle. So the testing of all
three conditions and the resultant SETQ <actions> occur in
only one clock cycle, due to the implicit parallelism of
MacPitts. It is not necessary for the MacPitts programmer to
explicitly parallelize the forms under a COND; the MacPitts
compiler does this every time it encounters a COND. The
(setq <output> f) statements under the hot and cold CONDs
are not required for this system. As explained previously,
the Weinberger array will set the output false if it is not
explicitly driven true for non-storage Boolean variables.
The (setq <output> f) statements have the advantage of added
clarity in the MacPitts driver algorithm at the expense of
increased size of the Weinberger array (more decisions are
required).
The following code fragment produces the same





(setq windows_open (not cold))
(setq doors_open (not cold))))
In this example, no conditional testing is
necessary although the results are equivalent to the
previous example. On every clock cycle, all of the forms
embraced by PAR are executed. On each clock cycle, the fan,
heater, windows, and doors are set to the correct state. The
resulting hardware is simpler, since fewer decisions are
required. This is the preferred format when conditional
testing can be explicitly done with Boolean logic in the
Weinberger array. But this code fragment lacks the ability
to branch. When transfer of control is required, then it is
necessary to use the full generalized COND statement
(cond (<condition> <actions> <transition>))
form instead of the truncated version
(cond (<condition> <actions>))
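The claimed equivalence of the COND and COND-less controllers can be checked with a small Python model. Parts of both listings are not legible in this copy, so the assignments fan_on = hot and heater_on = cold, and the outputs under the cold condition, are assumptions made for this sketch rather than the thesis code. The function names are hypothetical, and the serial if/elif chain models only the mutual exclusion by written order, not the parallel hardware evaluation.

```python
from itertools import product

SIGNALS = ("fan_on", "heater_on", "windows_open", "doors_open")

def controller_cond(hot: bool, cold: bool) -> dict:
    """COND style: the first true condition, in written order, drives the outputs."""
    out = dict.fromkeys(SIGNALS, False)   # undriven signals read false
    if hot:
        out.update(fan_on=True, windows_open=True, doors_open=True)
    elif cold:
        out.update(heater_on=True)
    else:                                  # the always-true (t ...) default
        out.update(windows_open=True, doors_open=True)
    return out

def controller_par(hot: bool, cold: bool) -> dict:
    """COND-less style: every output is a Boolean function of the inputs."""
    return {
        "fan_on": hot,
        "heater_on": cold,
        "windows_open": not cold,
        "doors_open": not cold,
    }

# hot and cold are assumed never to be asserted together.
consistent_inputs = [(h, c) for h, c in product((False, True), repeat=2)
                     if not (h and c)]
agree = all(controller_cond(h, c) == controller_par(h, c)
            for h, c in consistent_inputs)
```

In this model the two styles agree only while hot and cold are never asserted together; if both were true, the COND version would follow the hot clause while the COND-less version would also switch the heater on. The ordering built into COND therefore carries information that a COND-less rewrite must either reproduce in its Boolean expressions or rule out by assumption.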
8. Five Input AND Gates In The Control Path
The savings of area in the Weinberger array can be
substantial when Boolean decisions are made without a
precedent COND statement. Figure 2.39 shows the MacPitts
code used to generate a five input AND gate using COND for
each output, and Figure 2.40 shows the resulting Weinberger
array. Figure 2.41 is the logic gate equivalent of the five
input COND driven AND gate. Contrast this with Figure 2.42,
illustrating the code for generation
;FIVEAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<and> FUNCTION BY MACPITTS SILICON COMPILER//5 input gate//
(program fiveand 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal input 9)











(setq z (and a b c d e)




(setq z (and a b c d e)
I
(setq z (and a b c d e)
i
(setq z (and a b c d e)
(setq z f
Figure 2.39 Fiveand.mac
of a five input AND gate in the Weinberger array without
CONDs, Figure 2.43, the resulting Weinberger array logic
generated by MacPitts, and Figure 2.44, the logic gate









Figure 2.41 Gate Equivalent of Fiveand Logic
equivalent of a five input AND gate without CONDs. The
second structure is far simpler topologically, having only
six pullup transistors. The Weinberger array which achieves
the same results with CONDs, Figure 2.39, requires twelve
pullups by comparison. Since fewer explicit decisions need
to be specified, even the code of the COND-less chip is more
terse than its COND decision counterpart. In comparing the
logic gate circuit equivalents, the five input AND gate
created with CONDs requires six inverters and six NOR gates,
and the NOR gates have fan-ins of five, six, seven, eight,
and nine. There are four levels to this structure. The five
input AND gate created without CONDs has only five inverters
and one NOR gate with a fan-in of five, and there are two
levels of gates. The circuit created without CONDs is
smaller, simpler, and faster.
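The two-level structure described above (five inverters feeding a single five-input NOR) realizes AND by De Morgan's law. The following is a minimal Python sketch of that gate network, offered only as an illustration; the function names are hypothetical and are not part of MacPitts or its object code.

```python
from itertools import product


def inverter(x: bool) -> bool:
    """Single-input gate: logical NOT."""
    return not x


def nor(*inputs: bool) -> bool:
    """NOR gate: true only when every input is false."""
    return not any(inputs)


def and5_from_nor(a: bool, b: bool, c: bool, d: bool, e: bool) -> bool:
    """Level 1: five inverters. Level 2: one NOR with a fan-in of five."""
    return nor(*(inverter(x) for x in (a, b, c, d, e)))


def matches_and_everywhere() -> bool:
    """Exhaustively compare the NOR network with a plain five-input AND."""
    return all(
        and5_from_nor(*bits) == all(bits)
        for bits in product((False, True), repeat=5)
    )
```

Checking all 32 input combinations is cheap here, which is why the sketch compares exhaustively rather than sampling.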
;SIMPL5AND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<and> FUNCTION BY MACPITTS SILICON COMPILER//5 input gate//
(program simpl5and 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal input 9)
(def z signal output 10)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 11 power)
(always
(setq z (and a b c d e))))
Figure 2.42 Simpl5and.mac






Figure 2.44 Gate Equivalent of Simpl5and Logic
The economics of using COND-less algorithms do not always
justify their use. Silicon compilation is intended to free
the engineer from the micro-design aspects of creating a
chip, and Boolean minimization (see the home temperature
controller example) is a step away from this goal.
Typically, the control path is not used to implement
combinational logic functions, but rather to provide
controlling inputs to data path operations. The decision to
signal on five simultaneous TRUE inputs would always be done
as shown in Figure 2.42, and not as in Figure 2.39, but this
decision would usually have a COND embracing (around)
itself. The COND in MacPitts is used for decision. Attempts
to minimize CONDs will lead to a loss of clarity in the
algorithm (see the simplified home temperature controller
example). Nevertheless, if the Weinberger array becomes too
large and slow, Boolean reduction techniques such as
Quine-McCluskey or Karnaugh maps should be considered.
9. A Better 16 Input Control Path OR Gate
A remarkable power savings in the Weinberger array
can be expected where this alternate algorithm (explicit
specification of outputs without use of COND testing) is
feasible. Figure 2.45 depicts another method of
algorithmically specifying a sixteen input logical OR
selector in the control path (compare with Figure 2.37).
Figure 2.38 shows the resulting layout from the algorithm
using multiple CONDs for selection, and Figure 2.46 shows
the Weinberger array layout resulting from the algorithm
using just Boolean logic specification. Figure 2.47 shows
the logic gate equivalent of Figure 2.46.
;SMPLMLTR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or> FUNCTION BY MACPITTS SILICON COMPILER//16 input gate//
; a simplified structure resulting from elimination of "cond"
(program smplmltr 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal input 9)
(def f signal input 10)
(def g signal input 11)
(def h signal input 12)
(def i signal input 13)
(def j signal input 14)
(def k signal input 15)
(def l signal input 16)
(def m signal input 17)
(def n signal input 18)
(def o signal input 19)
(def p signal input 20)
(def q signal output 21)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 22 power)
(always
(setq q (or a b c d e f g h i j k l m n o p))))
Figure 2.45 Smplmltr.mac
Note in particular the difference in number of pullup
transistors between the two circuits (Figures 2.38 and
2.46). There are thirty pullups in the circuit created
using COND testing, and only two pullups in the circuit
created from the COND-less algorithm. The pullup transistors
are always turned on, and as a consequence consume
proportionally more power than transistors which are
































Figure 2.47 Gate Equivalent of Smplmltr Logic
savings can be realized by appropriate COND-less decision
specification. But note that this is not
always possible, nor is the COND-less algorithm always as
clearly understood as the algorithm using COND for testing
and branching.
These logic decisions would all occur electrically
in the Weinberger array (equivalently, occurring
algorithmically in the compiled LISP object code), since the
decision stipulations are Boolean and not integer. The forms
for Boolean combinational logic and integer (word)
combinational logic are syntactically different, and it is
necessary that the MacPitts programmer understand this
syntax difference in addition to the logical implementation
difference described previously.
10. Two Considerations In MacPitts Programming
MacPitts is both a programming language and a
method of designing digital circuits. As such, the
programmer must consider the consequences of syntax used in
the driver algorithm (the .mac file). It is not always
apparent beforehand whether a given function should be done
in the control path or in the data path. The choice is
determined by the syntax used by the designer.
Suppose a four input AND gate is to be designed in
both the data path (word type) and in the control path
(Boolean type) , where a, b, c, and d are inputs and z is the
output. The statement which relegates the decision to the
data path is
(setq z (word-and a (word-and b (word-and c d))))
where a, b, c, d, and z must all be either ports or
registers (integer valued). The corresponding statement for
the control path is
(setq z (and a b c d))
which requires that a, b, c, d, and z all be either signals
or flags (Boolean valued).
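The distinction between the two statements can be sketched in Python as a bitwise operation on words versus a logical operation on single bits. This is only an analogy under stated assumptions: the function names and the four-bit word width are hypothetical, and the code models the semantics of the two forms, not the circuits MacPitts generates.

```python
WORD_WIDTH = 4  # assumed word length, for illustration only


def word_and2(x: int, y: int, width: int = WORD_WIDTH) -> int:
    """Two-input data path AND: bitwise AND on width-bit integer words."""
    mask = (1 << width) - 1
    return (x & y) & mask


def bool_and(*signals: bool) -> bool:
    """Control path AND: one Boolean result from any number of signals."""
    return all(signals)


def data_path_and4(a: int, b: int, c: int, d: int) -> int:
    """Nested two-input form, mirroring (word-and a (word-and b (word-and c d)))."""
    return word_and2(a, word_and2(b, word_and2(c, d)))


def control_path_and4(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Single n-input form, mirroring (and a b c d)."""
    return bool_and(a, b, c, d)
```

The nesting in `data_path_and4` reflects the two-input organelles of the data path, while the control path form accepts all four operands at once.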
In complicated architectures and most sequential
machines, this choice does not have to be made a priori, but
rather will be made by syntax in writing the MacPitts
algorithm. In simpler architectures, like a Hamming error
detector or a Grey code decoder, this decision should be
made beforehand. The choice can be regarded as one between
individual treatment of the data bits (usually done in the
control path logic), or treating the data as n-bit words
(done exclusively in the data path). Examples of algorithms
to do Grey code decoding and Hamming error detection and
correction are given in Chapters IV and VI.
The MacPitts programmer/designer must also consider
the hardware ramifications of syntax. The algorithm chosen
to implement a function in MacPitts drives the circuit
implementation to achieve that function.
It has been mentioned previously that COND forces
conditionals to be tested in parallel, and their antecedent
actions to be SETQ'd in parallel. This equates to a silicon
area/speed tradeoff on the chip. If multiple operations of
the same type are to be done under a COND, MacPitts will
instantiate copies of the required organelle, and perform
the operations in parallel. Conversely, if the same
operations are not put under a COND, MacPitts will
instantiate only one copy of the organelle, and perform the
operations serially. For instance, there are two ways to
perform a set of three data path logical two-bit ANDs on six
inputs. The first method does the operations in parallel, at
the cost of silicon area.
(cond (t
(setq x (word-and a b))
(setq y (word-and c d))
(setq z (word-and e f))))
This algorithm fragment would execute in one clock cycle,
but MacPitts would implement it with three data path AND
gate organelles, each gate having two inputs. The slower
algorithm would be
(setq x (word-and a b))
(setq y (word-and c d))
(setq z (word-and e f))
The second example would require three clock cycles to
execute, but only one data path AND organelle would be
instantiated. Similarly, PAR forces all forms within its
scope to be executed in parallel. The best way to verify
this is to create a short FSM algorithm, and manually clock
it while in the interpreter. (This is also an excellent
method to optimize algorithms for throughput by paralleling
operations where possible and testing for execution in the
interpreter. The results may not be what is expected.)
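The area/speed tradeoff described above can be summarized in a small model. The Python sketch below simply encodes the behavior as the text states it (one organelle per operation and one clock cycle when the operations are grouped in parallel, one shared organelle and one cycle per operation otherwise); the `Schedule` type and `schedule` function are hypothetical, and the model has not been checked against the compiler itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    organelles: int    # copies of the organelle instantiated
    clock_cycles: int  # cycles needed to finish all operations


def schedule(num_ops: int, parallel: bool) -> Schedule:
    """Model of the tradeoff for num_ops operations of the same type."""
    if num_ops < 1:
        raise ValueError("num_ops must be at least 1")
    if parallel:
        # One organelle per operation, all executed in the same cycle.
        return Schedule(organelles=num_ops, clock_cycles=1)
    # One shared organelle, operations executed one per cycle.
    return Schedule(organelles=1, clock_cycles=num_ops)
```

For the three word-and operations of the example, `schedule(3, parallel=True)` gives three organelles in one cycle and `schedule(3, parallel=False)` gives one organelle over three cycles.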
C. SUMMARY
This chapter discussed the differences between MacPitts'
implementation of combinational logic in the control path
and data path. The fundamental difference is one of
structure, which is driven by syntax.
When the data type is defined Boolean, and the correct
operations are applied to the bits, the combinational logic
occurs in the control path. Control path logic is always
done by a Weinberger array, an array of NOR gates. When the
data type is defined as integer, and the correct operations
are applied to the words, the combinational logic occurs in
the data path. The fundamental units of the data path are
two-input organelles, which are structural mappings of the
syntactical statements NOT, AND, NAND, OR, NOR, XOR,
increment/decrement, and add/subtract. The data path
performs the arithmetic functions and also generates signals
to control for decisions. Combinational logic syntax (and
hence structure) in the data path obeys the fundamental laws
of Boolean algebra, such as associativity and commutativity.
The designer must consider these laws in writing the
MacPitts algorithm if correct function is desired.
The LISP-like COND form produces parallelism in
MacPitts. The COND form is a statement which (structurally)
implements decisions in the Weinberger array and
(algorithmically) drives control flow in both the .mac file
and the .obj file. Control path structures may be reduced in
size (where possible) by not using the COND form to specify
output conditional setting. The alternative is the PAR
(parallelize) form, which parallels all the forms under its
scope. The forms embraced by PAR must be the functional
equivalents of those under COND, which requires designer
intervention and possibly Boolean algebraic reduction. The
result of this alternative is unconditional explicit
assignment of outputs. This is feasible in simpler chips,
and should always be considered on the basis of an
engineering tradeoff between design time and chip speed.
The COND statement, with multiple selections of
conditionals, can be viewed as an implicit AND-OR structure
realized in NORS in the Weinberger array. An alternate
syntactical viewpoint of COND is the CASE statement.
The gates created in this chapter are rather artificial,
in that they were made to show just the structures desired.
In practice, the combinational logic structures used are
likely to differ slightly.
III. A SPEED-POWER COMPARISON BETWEEN A DATA PATH
AND CONTROL PATH EQUIVALENT CIRCUIT
A behavior-oriented silicon compiler requires a high
level algorithmic description of the chip's desired function
as its input. The output is a machine readable low level
geometric description of the resulting digital circuit,
usually CIF (Caltech Intermediate Form), a language
describing rectangles from which the various process masks
and their relative locations are registered. When a CIF file
is processed by MOSIS (Metal Oxide Semiconductor Implementation
Service), the desired chip results.
Chapter II considered the qualitative effects of
algorithmic syntax on some circuit structures in the data
and control paths. It is also desired to do a quantitative
investigation on functionally equivalent circuits in each
path, and to compare the results. The circuits chosen are
the five input AND gates in both their control path and
data path configurations. Handcrafted versions of the five
input AND gate are contrasted to the MacPitts five input AND
gates.
A. DATA PATH FIVE INPUT AND GATE
Figure 3.1 shows the algorithm used to create a five
input AND gate in the data path. Figure 3.2 shows the
labelled cifplot of the four cascaded NAND organelles and
four inverters, and Figure 3.3 is the logic gate equivalent
of the cifplot. The LISP object file is included in Appendix
A to show how MacPitts implements the data path AND function
;FIVAND.MAC, data path
(program fivand 1
(def 1 ground)
(def a port input (2))
(def b port input (3))
(def c port input (4))
(def d port input (5))
(def e port input (6))





(def 11 power)
(always
(setq z
(word-and a (word-and b (word-and c (word-and d e)))))))
Figure 3.1 Data Path Five Input AND Gate .mac File
by invoking the organelle AND four times. As discussed in
Chapter II, the MacPitts algorithm produces the LISP object
file, from which MacPitts (the silicon compiler) produces
the layout. At run time, the MacPitts (silicon compiler)
script file shown in Appendix A is created. The best way to
create a script file of a MacPitts terminal session is to
issue the command
macpitts basename herald > basename.script &
where the option herald directs MacPitts to send compiler
messages (see compmesg.* files in MacPitts source code) to
the designated output device, ">" is the BSD Unix redirect,
Figure 3.2 Stipple Plot of Data Path Five Input AND Gate
basename.script is the file into which the terminal session
is to be recorded, and "&" is the Unix command to put a
process into the background. If the algorithm is not fully
debugged, then issue instead
macpitts basename herald
so MacPitts diagnostics and Liszt diagnostics both will come
to the screen, and no hardcopy recording will occur. It is
possible to both monitor and simultaneously record the




Figure 3.3 Gate Equivalent of Figure 3.2
script basename.script (starts script recording)
to which Unix will respond with
"script started, filename is basename.script"
Figure 3.4 Stipple Plot Showing Critical Nodes
then issue the full path command (a Unix bug requires this)
/vlsi/macpit/bin/macpitts basename herald
and when compilation is done type control-d to terminate the
script recording. The script capability is useful for
following the MacPitts compilation process, gives insight
into how MacPitts works, and assists in debugging the driver
algorithm. Tracing of MacPitts' compilation of an algorithm
can then be done with a grep search on the compmesg.* files
for the statistics and the h 1 . 1 i sp files for the herald
messages. If the algorithm halts execution, the script file
indicates where in the compilation process the error was
detected. That part of the algorithm can then be checked for
errors.
The script of a MacPitts session also has informative
material (statistics) on the chip size, components, maximum
power used, and host computer effort expended to compile the
chip. Carlson [Ref. 2: p. 43] describes the script file
produced by a MacPitts compilation session.
After the basename. cif file is produced by MacPitts, it
is necessary to comment out the beginning user extension
zero lines with the vi screen editor. This is done by
invoking vi on the cif file
vi basename.cif
and placing parentheses around these lines. Carlson
[Ref. 2: p. 70] explains why this is necessary. The Caesar
file must next be created so labelling of nodes can be done
for Mextra (Manhattan Circuit Extractor). The command to
convert a .cif file to a .ca file is
cif2ca -o <offset> basename.cif
where the offset is a number added to the Caesar symbol xx.ca
files to distinguish them from previously created symbol
files which might have the same number (xx).
The procedure described above results in a MacPitts end
product, the basename.cif file, and a version of that file
amenable to editing in the VLSI graphics editor Caesar, the
basename.ca file. For quantitative analysis of a MacPitts
design, further steps are required.
To begin this analysis, the nodes are labelled (in
Caesar) for Mextra and Crystal (a timing analyzer). Work by
Froede [Ref. 3: pp. 63-80] addresses Crystal analysis of
MacPitts circuits. After the input, output, GND, and Vdd




:cif -p
in Caesar to save the new labelled .ca file and to create a
.cif file with nodes at points (-p) for Mextra. Figure 3.2
is the point-labelled cifplot of the data path five input
AND gate. Next Mextra is invoked on the labelled file by the
command
mextra -o basename
where the -o switch causes more accurate capacitance
calculation (than is done without -o). Mextra produces the
basename.nodes file, which can be checked for connectivity
and to see that all labelled nodes are included. Appendix A
shows the .nodes file for the data path AND gate. The
basename.sim file is also produced, and can be used for
switch level simulation with Esim, SPICE simulation, Crystal
timing analysis, and power estimation with Powest. The
berk85 version of Crystal is the more useful (compared to
the berk83 Crystal) version. To record a Crystal session,
start the script recording, and then call Crystal with its
full path designator
/vlsi/berk85/bin/crystal basename.sim
Crystal has many options and commands. The 1985 version of
the Crystal manual which describes them is available on the
Naval Postgraduate School VAX 11/780 in the file
/vlsi/berk85/doc/crystal/crystal.tblms
Appendix A shows the script recording of a Crystal analysis
of the data path AND gate. After the input and output nodes
are assigned and the delay is given, the command
critical -g filename.dummy
is issued, then Crystal is stopped with
quit
and then script is terminated with control-d. The critical
command determines the time-critical (i.e., slowest) signal
path, and the -g (graphical results) switch in conjunction
with it creates a Caesar-compatible file of the critical
node locations as shown in Appendix A. This file can then be
added to the basename.ca file by the sequence of commands
caesar basename (Caesar edit labelled file)
:source filename (add critical nodes to screen)
Since the Crystal nodes displayed in Caesar are not
reproduced in cif, the nodes must be edited in Caesar if
an annotated stipple plot is desired. One technique is to
erase the Crystal-sourced (created by the :source command)
nodes, and replace them with implant layer squares (implant
for visibility and contrast) and then to relabel the delay
times with Caesar's :label command. The revised Caesar file
can then be saved and converted to cif for stipple plotting
with the series of commands
:save
and then
:cif -p
Figure 3.4 shows the cifplot of the circuit with the
critical nodes marked. The critical nodes lie along what
Crystal considers the critical (slowest) path. The largest
delay shown is the circuit cumulative delay, and each marked
node indicates a cumulative delay. This makes it simple to
determine the delay between critical nodes as the difference
between their successive cumulative delays. The stipple plot
can be difficult to interpret if it is desired to determine
what structure causes the delays. A gate equivalent of the
cifplot can be helpful in the analysis. The gate level
equivalent of this circuit with marked cumulative delays is
shown in Figure 3.5. The data path AND gate spreads the
delay out evenly, with approximately 10 ns per gate, as is
expected from the transistor aspect ratios shown in Figure
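Because Crystal annotates each critical node with a cumulative delay, the delay contributed by each stage is the difference between successive annotations. A minimal Python sketch of that subtraction follows; the function name and the sample values are illustrative assumptions and are not taken from the Crystal output in Appendix A.

```python
def stage_delays(cumulative_ns):
    """Convert cumulative node delays (ns) into per-stage delays (ns).

    The first stage is measured from time zero. Results are rounded
    to two decimals to suppress floating point noise.
    """
    delays = []
    previous = 0.0
    for t in cumulative_ns:
        if t < previous:
            raise ValueError("cumulative delays must be non-decreasing")
        delays.append(round(t - previous, 2))
        previous = t
    return delays


# Illustrative (hypothetical) cumulative delays along a critical path.
example = stage_delays([9.4, 20.8, 30.2, 41.6])
```

With the illustrative input above, the stages alternate between 9.4 ns and 11.4 ns, which is the kind of even spread the text describes.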
The maximum power consumed by the circuit can be
determined in either of two ways. The MacPitts script
session (of the compilation process) records it, or Powest
(Power ESTimator) can be used on the basename.sim file
produced by Mextra. Powest computes the power based on only
the number of depletion transistors, assuming that they are
on all the time (for the maximum power figure) or on half
the time (for the average power figure). MacPitts considers
both the number of depletion transistors and the power
consumed by the circuit wires, so the MacPitts power should
be the more accurate of the two. The command to use Powest
on the .sim file is
powest -p < basename.sim
where the -p switch directs Powest to print out informative
data about the circuit, and the < is the Unix backwards
redirect, which directs the .sim file to Powest. Appendix A
shows the result of a Powest analysis of the five input data
path AND gate. Checking the Powest result can also serve as
a check on the accuracy of Mextra's nodal extraction. For
example, from Figure 3.2, the cifplot, there are eight
depletion pullup transistors and no enhancement pullups or
special transistors. The Powest analysis in Appendix A
confirms this count. This transistor count verification is
important in a MacPitts data path design analysis. It has
been observed that the Vdd bus (top metal trace, Figure 3.2)
does not always connect with the vertical lines to the
pullup transistors. The gap is so small that it is not
usually evident in Caesar, although a design rule checker
such as Lyra will detect it.
B. CONTROL PATH FIVE INPUT AND GATE
Chapter II discussed the two different types of
control path five input AND gates possible. The COND driven
AND gate was structurally more complicated (Figure 2.40),
while the "COND-less" AND gate was comparatively simple
(Figure 2.43). The COND driven AND gate is more likely to
occur in practice (since the purpose of the Weinberger array
is decision making, or conditional control), so that circuit
is analysed in this section.
Figure 3.6 is the MacPitts driver which creates the
control path to implement this logic. Figure 3.7 is the
resulting Weinberger array, which has had the
odd_partial_gate input gap errors repaired in Caesar (so
Lyra and hence Mextra will work, and produce a valid .sim
file). Figure 2.41 is the logic gate equivalent of the
Weinberger array. Appendix A contains the object file for
this chip. The NOR character of the Weinberger array logic
was discussed in Chapter II, and in the LISP object file all
logic is done with NORs. Appendix A also contains the LISP
object file for the equivalent data path function, and in
Figure 3.2 all logic is implemented in AND organelles. The
Weinberger array is composed of inverters also, but an NMOS
technology inverter is just a degenerate (single input) NOR
gate. The difference in implementation from a software
(language) perspective is that the data path function is
done in organelles, and the control path function is done
exclusively in NORs. The data path organelles are already
compiled in the organelles.lisp files, so MacPitts has to
work harder to create the equivalent function in the control
path. Both the basename.obj file and the cifplot of the
Weinberger array show the NOR logic implicit in control path
combinational logic. The MacPitts script file is shown in
Appendix A, and its data path counterpart is also for
Figure 3.5 Gate Equivalent of D.P. AND Showing Delays
(node delay annotations recovered from the figure: 3.07 ns, 5.77 ns,
8.75 ns, 11.47 ns, 14.45 ns, 17.17 ns, 20.15 ns, 21.97 ns)
;FIVEAND.MAC, control path gate
(program fiveand 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal input 9)
(def z signal output 10)
(def 2 phia)
(def 3 phib)
(def 4 phic)











(setq z (and a b c d e)
(setq z (and a b c d e)
(setq z (and a b c d e)
(setq z (and a b c d e)
(setq z (and a b c d e)
(setq z f
Figure 3.6 Control Path Five Input AND Gate .mac File
comparison. These files contain information which will be
compared in the next section.
The same CAD tools were used on this circuit as were
used on the data path circuit, in the same order. Mextra
produces the .nodes file (Appendix A). The control path
logic also differs from the data path logic in the number of
nodes produced to model the equivalent circuit. The
Weinberger array node list is approximately 25% larger than
the equivalent data path node list. Appendix A contains the
Crystal analysis of the circuit, and the critical path file
for source input to the Caesar file. Figure 3.8 depicts the
Weinberger array with the critical nodes marked, and Figure
3.9 is the gate level equivalent of Figure 3.8 with delay
node values and gate equivalent fan-ins marked. Appendix A
contains the Powest analysis of the control path AND gate,
and this information is incorporated into the following
table for comparison.
C. SPEED-POWER COMPARISON
Table 3.1 compares functionally equivalent MacPitts five
input AND gates in both their control and data path
configurations.
Figure 3.7 Weinberger Array From C.P. Five Input AND Gate






















FIVE INPUT AND GATE
                                    Control path    Data path
average, [W]                        .00182          .00094
maximum, [W]                        .00245          .00188
Maximum delay
  Crystal, [ns]                     81.15           85.98
Length x width of logic circuit
  [lambda]                          209 x 173       386 x 113
Number pullups
  (less pads)                       12              8
Compile time
  [CPU min]                         2.106           1.535
CPU peak memory demand
  [kb]                              349
So all other things being equal, the data path circuit
is superior to the control path circuit in terms of power
consumption, size, and compile time in MacPitts, and
slightly inferior in terms of maximum speed attainable.
The data path power advantage is understandable when the
number of depletion pullups there is compared to the number
in the control path. A power consumption ratio of 0.67 is
expected, and the calculated ratio is close to that. The
difference is explained by the long horizontal polysilicon
runs in the Weinberger array, which have a comparatively
high specific resistance (ohms/square), and therefore
consume more power. The first row in the table above,
MacPitts computed power, is calculated on the whole chip and
not on just the logic circuitry. This value shows a similar
power consumption relationship, but the poly runs connecting
the Weinberger array to the rest of the circuit consume
additional power (the rest of the analysis in the table
above is done on just the logic circuits, and not on the
whole chip).
The speed of the two circuits is approximately the same.
Figures 3.4 and 3.8 show the Crystal-generated delay data on
the data path and control path circuits. The results are
perhaps clearer in Figures 3.5 and 3.9, the logic gate
equivalents of the cifplots. In the data path (Figure 3.4),
the signal experiences approximately 21 ns delay per
organelle. The organelle comprises a NAND gate and an
inverter (Figure 2.14). From the gate equivalent, and the
Crystal script (Figure 3.7), each NAND gate induces a delay
of 9.4 ns, and each inverter induces a delay of 11.4 ns. The
circuit shown in the gate equivalent is expected to produce
a delay equal to the product of the number of organelles and
the delay per organelle. The expected delay is then 4 x 20.8
= 83.2 ns. The cifplot (Figure 3.2) reveals where the added
three ns delay arises. The river routing routine in MacPitts
runs the input and output lines in polysilicon, and in this
case the output comes from across the circuit. The specific
resistance and capacitance of polysilicon and the poly input
and output line lengths constitute this added delay. Froede
[Ref. 3: pp. 72-76] has validated Crystal's timing
calculations and compared them for accuracy with the theory
presented in Mead and Conway [Ref. 4: pp. 3-14].
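The expected-delay arithmetic above can be restated as a short worked calculation. The Python sketch below uses the per-gate delays quoted in the text and the Crystal maximum delay for the data path circuit as it appears in Table 3.1; the constant and function names are hypothetical.

```python
NAND_DELAY_NS = 9.4        # per NAND gate, from the text
INVERTER_DELAY_NS = 11.4   # per inverter, from the text
ORGANELLE_COUNT = 4        # cascaded AND organelles in the data path
CRYSTAL_TOTAL_NS = 85.98   # Crystal maximum delay, data path, Table 3.1


def organelle_delay_ns() -> float:
    """Delay of one AND organelle (NAND followed by inverter)."""
    return round(NAND_DELAY_NS + INVERTER_DELAY_NS, 2)


def expected_delay_ns() -> float:
    """Delay predicted from the gates alone, ignoring routing."""
    return round(ORGANELLE_COUNT * organelle_delay_ns(), 2)


def routing_delay_ns() -> float:
    """Remainder attributed in the text to the polysilicon I/O runs."""
    return round(CRYSTAL_TOTAL_NS - expected_delay_ns(), 2)
```

Under these figures the gates account for 83.2 ns and the remainder, a little under 3 ns, is the portion the text attributes to the polysilicon input and output lines.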
Figure 3.8 is the corresponding control path cifplot with
Crystal delay annotation for the Weinberger array. The
structure of the Weinberger array is, at first glance,
intimidating. Two observations on function assist in
understanding the structure. (1) Any GND track that connects
a Vdd track with only one diffusion gate is an inverter, and
(2) any GND track that connects a Vdd track with multiple
diffusion gates is a multiple input NOR gate. The transverse
poly runs turn on and turn off the NOR gates and inverters.
This cifplot shows six inverters and six NOR gates.
Furthermore, multiple input/single output Weinberger arrays
appear to always exhibit the four level structure shown in
Figure 3.9, a bank of inverters followed by a bank of
multiple input NORs followed by a single multiple input NOR
followed by an output inverter. Figure 3.9 is the gate level
equivalent of the Weinberger array in Figure 3.8, with delay
annotation and fan-in (shown inside the bodies). The
critical path is from input A to the second level nine-input
NOR through the output NOR through the output inverter. The
Weinberger array total delay is then 81.15 ns, not much
different from the data path circuit delay. This delay
calculation only considered the Weinberger array, however,
and not the connections to it which MacPitts creates in
polysilicon. If these additional connections were
considered, the Weinberger array would certainly be slower
than the equivalent structure in the data path. Figure 3.8
shows the critical path (annotated with cumulative delay
times), and it is evident that the longest delay path occurs
along the wires which must charge the largest capacitances.
The data path block is connected to the rest of the chip
with metal lines (in most cases) , so this added delay from
polysilicon runs would not apply to it.
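The four level structure noted above (inverters, a bank of NORs, a single NOR, and an output inverter) behaves as an AND-OR network built only from NOR gates. The Python sketch below is a simplified model under stated assumptions: every second level NOR receives only inverted inputs, and the function names and the example product terms are hypothetical rather than taken from the compiled array.

```python
from itertools import product


def nor(*inputs: bool) -> bool:
    """NOR gate; with a single input it acts as an inverter."""
    return not any(inputs)


def four_level_nor(inputs, product_terms):
    """Evaluate a four level NOR network.

    inputs        -- sequence of Boolean input values
    product_terms -- sequence of tuples of input indices; each tuple
                     is one AND term of the AND-OR function
    """
    inverted = [nor(x) for x in inputs]                  # level 1: inverters
    terms = [nor(*(inverted[i] for i in term))           # level 2: NOR of
             for term in product_terms]                  # complements = AND
    collected = nor(*terms)                              # level 3: single NOR
    return nor(collected)                                # level 4: output inverter


def and_or_reference(inputs, product_terms):
    """Direct AND-OR evaluation used to check the NOR network."""
    return any(all(inputs[i] for i in term) for term in product_terms)


def network_matches_reference(n_inputs, product_terms) -> bool:
    """Exhaustively compare the NOR network with the AND-OR reference."""
    return all(
        four_level_nor(bits, product_terms) == and_or_reference(bits, product_terms)
        for bits in product((False, True), repeat=n_inputs)
    )
```

For example, `network_matches_reference(3, [(0, 1), (2,)])` checks the function (a AND b) OR c over all eight input combinations.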
The relative sizes of the data path and control path
circuitry are as expected from the object code respective
descriptions. The object code for the data path
instantiation is approximately half the size of the code for
the control path. From a theoretical viewpoint, the cascaded
AND organelle circuit is more conservative of both silicon
and power than is the Weinberger array. This principle
applies to most combinational logic in MacPitts, since the
Weinberger array builds functions only from NOR gates,
whereas in the data path the choice of building blocks is
larger (NAND, NOR, and inverter). The MacPitts chip size
comparison is given in the table above, but the circuit
dimensions are more informative. The data path circuitry has
an area of .090 square mm, and the Weinberger array covers
.109 square mm, about 120% of the area of the data path
functional equivalent.
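As a quick check of the area comparison, the ratio of the two areas quoted in the text can be computed directly. This short Python sketch uses only the two stated areas in square millimeters; the names are hypothetical.

```python
DATA_PATH_AREA_MM2 = 0.090     # data path circuitry, from the text
CONTROL_PATH_AREA_MM2 = 0.109  # Weinberger array, from the text


def area_ratio() -> float:
    """Weinberger array area as a multiple of the data path area."""
    return round(CONTROL_PATH_AREA_MM2 / DATA_PATH_AREA_MM2, 2)


def area_percent() -> int:
    """The same ratio expressed as a whole percentage."""
    return round(100 * CONTROL_PATH_AREA_MM2 / DATA_PATH_AREA_MM2)
```

The ratio comes to roughly 1.21, that is, the Weinberger array occupies about 121% of the data path area, in line with the approximate 120% figure.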
The compile time for the control path chip is
approximately 25% greater than for the data path chip. This
is understandable in light of the gate instantiation process
for each path. From the cifplots in Figure 3.2 (data path)
and Figure 3.7 (control path), the circuits are not even
remotely similar structurally. The data path circuit is made
from quadruple instantiation of the MacPitts library AND
organelle (see Appendix A, the object code). This organelle
is accessed four times, its location calculated, and then
it is instantiated. The control path Weinberger array
(Figure 3.7) requires time consuming decisions and
construction from more primitive units, NOR gate inputs (see
the object code, Appendix A). The poly cross-runs must then
be laid down. All of these processes are computationally
intensive, and this is why large control-heavy Weinberger
array architectures take a long time to compile. Chapter VI
describes the design of a control path chip and how long it
required for compilation.
D. ALTERNATE POSSIBILITIES FOR FIVE INPUT AND GATES
The five input AND gate, as implemented by MacPitts in
both its data path and control path configurations, has been
examined above. Each configuration can be improved in the
areas of speed and circuit density. While the goal of
silicon compilation is to free the designer from excessive
preoccupation with detail, perhaps the combinational logic
generation by MacPitts can be improved. The following
section presents two hand-designed variants of the five
input AND gate for comparison with the MacPitts designs.
The first design is patterned after the Mead-Conway
cells as illustrated throughout [Ref. 4]. The layout is
similar to that generated by MacPitts for the five input
data path AND gate, a linear cascade of NANDs and inverters.
Figure 3.10 shows the hand-crafted circuit. It is noticeably
different from the MacPitts design in two ways. The pulldown
transistors on the NAND gates are four lambda wide. This
allows a shorter data path, while preserving the 4:1 aspect
ratios of the transistors. Also, the characteristic MacPitts
pullup diffusion "dogleg" is absent. This is accomplished by
joining the pullup diffusion and polysilicon layers with an
in-line buried contact. The circuit is also less wide than
the MacPitts equivalent. MacPitts uses NAND organelles, and
interconnects them with metal/poly/diffusion wires. This
wastes a lot of space. In the hand-designed five input AND
gate, the output is taken from the pullup on a polysilicon
wire, and routed directly to the input of the next
transistor. This saves (at a minimum) two contact cuts in
the transistor interconnections. As expected, this
configuration is also considerably faster than the MacPitts
equivalent. The MacPitts data path five input AND gate
requires 86 ns for signal propagation, and the handcrafted
design requires 22 ns. Figure 3.11 shows the gate equivalent
of the hand design, with propagation times marked above the
respective gates.
This configuration is amenable to silicon compilation if
the NAND-NOT pairs as shown are incorporated into the
MacPitts organelle library as an AND organelle. Similar
speed and area enhancements are expected for other data path
logic gates.
Figure 3.10 Mead-Conway Style Five Input Linear AND
If the multiple input AND gate can be improved so much
using the basic MacPitts data path cascading scheme, does a
better method exist using another approach? The drawback to
the cascading scheme is the linear pileup of transistors.
This requires more silicon, and consequently more current to
charge the gates of later stages. A better design would use
only one gate for the five input AND function, as shown in
Figure 3.12. This is a true five input AND gate, as opposed
to the previous circuits which only emulate the five input
AND function. The circuit is much smaller than the previous
five input AND gates, and is much faster. Figure 3.13 is the
gate equivalent with marked delays. This circuit is also
patterned after those circuits illustrated in [Ref. 4].
The wide (10 lambda) pulldown region permits a comparatively
short transistor (i.e., the pullup aspect ratio is not very
large). The multiple input NAND and NOR derivatives
patterned after this gate should be simple to incorporate
into the silicon compiler. The only decisions required are
how many inputs (set by the designer), spacing of the input
wires (set by the design rules), and pulldown diffusion
column width (must be calculated as a function of number of











Figure 3.11 Gate Equivalent of Figure 3.10 With Delays
which produces fast, compact combinational logic circuitry,
this method should be considered. Table 3.2 compares the
data path AND gate (DP), the control path AND gate (CP), the
hand-crafted linear cascaded AND gate (LC), and the








Figure 3.12 Compact Five Input AND Gate








Figure 3.13 Gate Equivalent of Optimal Geometry Five
Input AND Gate Showing Delays
TABLE 3.2


























IV. SEQUENTIAL LOGIC IN MACPITTS
Based on previous analysis, combinational logic in
MacPitts is done better (i.e., more efficiently, when a
choice exists) in the data path than in the control path.
Does the possibility of improving MacPitts' sequential logic
performance exist also? A study of this question presents
interesting problems.
A. AN OVERVIEW
Chapter II discussed two different ways of increasing
throughput, the PAR form and the COND form. There exists
also a method of global parallelism available to the
MacPitts programmer, the PROCESS form. The PROCESS form has
the syntax
(process <process name> <stack depth> ... )
where the process name is an arbitrary ASCII character
string (if the name is made short, then the VT-100/ADM-3A
interpreter screen can display them all). The stack depth
refers to the depth of subroutine calls for which this
process must push return addresses onto its program
counter LIFO stack. MacPitts syntax requires the designer
to determine this stack depth a priori, and to explicitly
state it to MacPitts (the silicon compiler). The stack
depth is a required field in the PROCESS statement, and
may be any integer including zero. Each process has its
own stack, and all processes are executed in parallel.
This parallelism provides a high throughput on a properly
designed algorithm.
An extension of the digital home temperature controller
of Chapter II might also control other aspects of the home
environment. For instance, it would be desirable to turn the
security lights on and off by a photoelectric cell signal,
to start the coffee brewing and the microwave oven cooking
dinner at a timer signal, and to keep the lawn appropriately
watered by turning the sprinkler on upon a moisture detector
signal. The following MacPitts program outline would
accomplish these tasks. All logic is done on Boolean
variables, flags for storage and signals for sensor inputs.
(program house <word size>
<port, signal, register, and flag assignments>
(process lite









(setq put_dinner_in t))
(five_pm
(setq microwaveon t))
(five30_pm

















(setq doors_open t)) )
(process grass
(setq sprinkler_on (not lawn_moist)) )
(process clock 1
(par (call mod60) (setq time counter_out)))
mod60
<a modulo sixty up counter algorithm> (return)
)
All of these processes are done in parallel. All of the
processes have a stack depth of zero except for the clock
process, which has a stack depth of one. This is necessary
because the clock process calls a subroutine, the modulo
sixty up counter. The call of the counter and the following
SETQ are paralleled with the PAR construct. This PAR
paralleling appears to work well for cases where the output
depends on the called routine, like the example above. If
the dependency is reversed (for instance, paralleling SETQs
of inputs to a slow multiplier subroutine with the CALL to
that multiplier), some unpredictable results can arise. A
good practice is to emulate all time-dependent algorithms
alone in the interpreter prior to their incorporation into
the MacPitts algorithm. In so doing, syntax errors may be
found and fixed, and the algorithm may be optimized for
number of cycles required to execute.
For fast architectures, some additional speed can be
gained by paralleling the subroutine outputs with the
RETURN from the subroutine. For instance, the mod60
counter-timer in the previous example is called as a
subroutine.
mod60
(par (setq counter_put count) (return))
There exists no time-dependency between the final result
(counter_put) and the RETURN to the main program, so no
data latency results from this paralleling.
To re-emphasize, all of the PROCESSes under the PROGRAM
statement execute in parallel. So while the <house> chip is
monitoring temperature and time, it is simultaneously
monitoring lawn moisture, setting the house clock, and
checking the outside light level. PROCESSes execute
independently, in parallel. Each PROCESS has its own
independent stack, and processes do not communicate
internally with each other. From the hardware standpoint,
each process is an independent MacPitts entity sharing data
storage elements and signal wires.
In this somewhat artificial example, there is no strict
requirement for speed. If the lawn is watered 50
microseconds late, the grass will still grow. But the
principle of global process parallelism applies to more
complicated digital systems where intricate timing
interrelationships exist. It is also evident that MacPitts
is a very versatile silicon compiler. A chip constructed
from a similar multi-process algorithm could be used to
control many off-chip processes simultaneously. The
intrinsic nature of the PROCESS form lends itself well to
applications such as industrial digital control. In
situations where the PROCESS statement is used to force
parallelism but the parallelism is not needed (for instance,
the <house> algorithm), MacPitts creates a large layout.
Silicon area is traded off for speed.
This algorithmic outline illustrates using PROCESSes in
a combinational logic machine. PROCESSes are required around
any invocation of a subroutine, but aside from this
consideration, the <house> chip could be specified just as
well without PROCESSes.
PROCESSes are required, however, to describe a
sequential logic machine in MacPitts. The FSM architecture
is explicitly specified by the PROCESS form. The PROCESS
statement implicitly specifies creation of sequencers (a data
path hardware organelle, which steps the FSM through its
states) and their instantiation in the data path.
B. GRAY CODE TO BINARY DECODER
The following section illustrates the MacPitts design of
a simple sequential logic system. The Gray code [Ref. 5:
p. 97] finds many diverse uses in electrical engineering and
computer science. Whenever a single bit change in successive
data words is desired (disk sector addressing, radar
antenna positioning), the Gray code should be considered. In
finite automata theory, the Gray code decoder can be
regarded as a sequence detector. The desired sequential
machine complements the input on having received an odd
number of earlier 1's, and does not complement the input on
an even number of 1's. An example sequence is
input: 11110 1 1 1 10 1 ...
output: - 1 1 1 1 1 ...
The Gray code decoder can be implemented in MacPitts as
a Mealy FSM to detect this sequence, and set the appropriate
outputs. The automaton for the Gray code decoder is shown in
Figure 4.1. The node label MSBS indicates most significant
bits, COMPL means complement the present bit, and NEXTBIT
means consider the next bit.
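The decoding rule stated above (complement the input bit whenever an odd number of 1's has already been received) can be illustrated with a short Python model. This is only a sketch for checking the rule by hand, not MacPitts code; the function name gray_to_binary_serial and the sample word are assumptions made for the illustration, and the bits are assumed to arrive most significant bit first.

```python
def gray_to_binary_serial(gray_bits):
    """Decode a Gray-coded bit stream, most significant bit first.

    Each output bit is the input bit, complemented if an odd number
    of 1's has been received earlier in the stream.
    """
    odd_ones_seen = False  # parity of the 1's received so far
    binary_bits = []
    for g in gray_bits:
        out = (g ^ 1) if odd_ones_seen else g
        binary_bits.append(out)
        if g == 1:
            odd_ones_seen = not odd_ones_seen
    return binary_bits


# Hypothetical 4-bit Gray word 1101, which decodes to binary 1001.
print(gray_to_binary_serial([1, 1, 0, 1]))
```

Only one bit of storage (the parity of earlier 1's) is needed by this model, which is why the decoder can be treated as a small sequence detector.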
1. Algorithm Design
The next consideration is algorithm design. Previous
experience inclines the designer toward a data path
architecture (faster, smaller, less power consumption).
Furthermore, a data path chip would probably have a greater
throughput, since the operations could be done on words, and
not individual bits (e.g., a parallel Gray code decoder,
which decodes on a word basis rather than a bit-by-bit
basis). The problem with this approach is that MacPitts
permits no explicit, succinct method of setting the
individual bits in a word. The bits can be tested with the
BIT expression, but not set. So a control path (implying
Boolean type data and Weinberger array combinational logic)
architecture is probably a better choice.
Figure 4.1 Gray Code Decoder State Transition Diagram
A control path FSM can be designed with MacPitts
(even though no explicit data path is used). The reason is
the way in which MacPitts implements FSM state
transitioning with the sequencer organelles. The sequencer
can be thought of as a bank of n sequencer organelles, where
n is the data path width specified in the PROGRAM statement.
The sequencer organelles are physically adjoined to the data
path organelles in the MacPitts chip. The sequencer stores
FSM state, much in the same way as flip-flops store state in
a discrete-chip FSM design. And just as two raised to the
power (number of flip-flops) limits the states in a discrete
digital system, so two raised to (number of sequencers)
limits the states possible in a MacPitts sequential machine.
The number of sequencers is always equal to n, the data path
width. This has ramifications for MacPitts designers
considering a system of many states with a narrow data path.
The possible number of states is limited to 2**n.
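The 2**n limit can be checked with a few lines of Python. This is a worked-arithmetic sketch only; the helper names max_states and min_word_width are hypothetical and do not correspond to anything in the MacPitts source.

```python
def max_states(word_width):
    """Number of FSM states addressable by word_width sequencers."""
    return 2 ** word_width


def min_word_width(num_states):
    """Smallest data path width n such that 2**n >= num_states."""
    width = 0
    while 2 ** width < num_states:
        width += 1
    return width


# A three-state machine does not fit in a one-bit word
# (2**1 = 2 states), so a width of two is the minimum.
print(max_states(1), min_word_width(3))
```

Under this model a designer with many states and a narrow data path must widen the word in the PROGRAM statement until max_states covers the state count.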
One solution to the Gray code problem is to use a
data path architecture, to declare the data path width as
two, and to specify an extra (unused) bit in the input and
output port declaration statements. The most significant
bit of the input port is obviously extraneous, but the data
path width of two is necessary to address the three states
required (Figure 4.1). When the Gray code chip is used,
these extra pins must be tied to ground. If a data path
width of one is specified (and PORTS are used for inputs)
instead, MacPitts gives the following diagnostic
Error-Word length too small to store the state for
this process
If the data path width is left as two, but the input and
output ports are left only one bit wide (another attempt
to circumvent this problem), MacPitts responds with
Error-Invalid port definition
which means that the data path width was declared as two,
but the port is only one bit wide (MacPitts has helpful
diagnostics). The MacPitts source code file, extract.lisp
(under the def get-sequencer-from-process macro) shows why
this constraint exists. The sequencer width is explicitly
set to the data path width.
Figure 4.2 shows the MacPitts driver code to do the
Gray code to binary conversion serially. The MacPitts
algorithm shown in Figure 4.2 has the lines numbered for
reference, but the numbers are not part of the allowed
MacPitts syntax. Line 1 is the title, using a semicolon as
the reserved word comment designator. Line 2 is the PROGRAM
statement; the program name is gc (Gray code) and the data
1  ;Grey Code to binary conversion algorithm
   ;This code illustrates the Data Path (i.e.,
   ;Integer) solution to the problem. It is but one
   ;Variant of many possible solutions.
   ;Define the data path width as 2 (state transitioning)
2  (program gc 2
3  (def 1 ground)
4  (def 2 phia)
5  (def 3 phib)
6  (def 4 phic)
   ;All FSMs must have a RESET input (for initialization)
7  (def reset signal input 5)
   ;Use INTEGER (port) input & output, 2 bits wide
8  (def inp port input (6 7))
9  (def bin port output (8 9))
10 (def 10 power)
   ;Specify FSM architecture
11 (process grycod
12 msbs ; (Most Significant Bits)
13 (cond ((=0 inp) (setq bin 0) (go msbs))
14       ((= 1 inp) (setq bin 1) (go compl)))
15 compl ; (COMPLement bits)
16 (cond ((=0 inp) (setq bin 1) (go compl))
17       ((= 1 inp) (setq bin 0) (go nextbit)))
18 nextbit ; (NEXTBIT in string)
19 (cond ((=0 inp) (setq bin 0) (go nextbit))
20       ((= 1 inp) (setq bin 1) (go compl))) ) )
Figure 4.2 Gc.mac
path width is two. Lines 3, 4, 5, 6, and 10 are standard,
and required by MacPitts conventions. Line 7 is required for
all FSMs, and when it is raised high (positive logic
arbitrarily chosen here), the FSM/PROCESS is reset to its
initial state. Line 8 defines the input port, inp, and
declares it integer two bits wide. Line 9 does the same for
the output port, bin (binary value). Line 11 specifies FSM
architecture with the PROCESS statement, for which the stack
depth is zero (no calls to subroutines). Line 12 is a node
label, msbs (most significant bits), and represents the top
node in Figure 4.1. Line 13 is the first check in this
state, and says that if the input equals zero, then set the
output to zero and go to node msbs. If the input does not
equal zero, then go to the next line of code. Line 14 checks
whether the input equals one. If the input is equal to one,
the output is set to one, and the program transitions to
the complement (compl) state. Line 15 implements the second
node in Figure 4.1, complementing the input. Line 16 checks
the input, and if it equals zero it complements and keeps
complementing as long as the input equals zero, and if not,
it proceeds to the next line. Line 17 checks for the
sequence of an even number of ones, and if true, sequences
to the next node after complementing the input. Line 18 is
the label corresponding to the last node in Figure 4.1,
nextbit. Line 19 checks the input bits, sets the output to
the input value, and returns to this node as long as the
input is zero. Line 20 also sets the output to the input
value, but jumps back to the bit complement node when the
input is one. The conditional in line 17 is unnecessary, but
is included for clarity (if the non-storage port, bin, is
not explicitly set to one, it will become zero at the next
state transition. Line 17 can be eliminated, and the
algorithm will work correctly anyway).
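The line-by-line description above can be summarized as a Mealy transition table and stepped through in Python. This is a behavioral sketch only, transcribed from lines 12-20 of Figure 4.2; the names GC_FSM and run_gc are hypothetical, one input bit is assumed per state transition, and the unused most significant bit of the two-bit port and the clock phases are ignored.

```python
# (present state, input bit) -> (output bit, next state),
# transcribed from lines 12-20 of Figure 4.2.
GC_FSM = {
    ("msbs", 0): (0, "msbs"),
    ("msbs", 1): (1, "compl"),
    ("compl", 0): (1, "compl"),
    ("compl", 1): (0, "nextbit"),
    ("nextbit", 0): (0, "nextbit"),
    ("nextbit", 1): (1, "compl"),
}


def run_gc(bits, state="msbs"):
    """Step the table over a bit stream; return (outputs, final state)."""
    outputs = []
    for b in bits:
        out, state = GC_FSM[(state, b)]
        outputs.append(out)
    return outputs, state


# Hypothetical Gray word 1101 (MSB first) decodes to binary 1001.
print(run_gc([1, 1, 0, 1]))
```

In this model the output depends on both the present state and the present input, which is the Mealy property; the msbs and nextbit rows are identical except for their own names.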
The next step is to test and debug the algorithm in
the interpreter prior to full compilation. The Gray code
algorithm was debugged in the interpreter, and compiled with
the <herald> option. Appendix B shows the script recording
of the compilation process, and indicates a data path of
seven different organelles (to be discussed in the next
section) and a moderate-sized (31 columns) Weinberger array.
Figure 4.3 shows the chip resulting from the
compilation of gc.mac. The functional constituents of this
layout will be treated qualitatively in the next section.
2. Functional Constituents Of The Chip
The layout scheme of MacPitts places general
functional blocks in specific relative locations on the
chip. Figure 4.4 indicates where these relative locations
lie on the cifplot. The block sizes shown in Figure 4.4 are
arbitrary, since the actual sizes depend on a combination
of algorithm and MacPitts (the source code). In comparing
Figure 4.4 to Figure 4.3, it is seen that this chip has no
flags, which is expected since none are defined in the
source algorithm. The rest of the blocks shown in Figure 4.4
are instantiated in gc.cif (Figure 4.3).
The data path arithmetic block is shown in Figure

















































Figure 4.5 Data Path Arithmetic Block From GC.cif
so that the desired outputs result. The inputs enter the
arithmetic block and the outputs exit as shown in Figure
4.5. Between input and output, the data is subject to
switching and various logic operations. The data path and
the control path must also communicate with each other over
the interconnecting traces. The leftmost top poly line, D9,
is an input to the Weinberger array, where it turns on five
NOR gates. Similarly, the other nine lines also connect to
the control path. Lines D8, D7, D5, D4, D3 (reset), D2, D1,
and D0 are outputs from the control path and inputs to the
data path. Line D6 is the other output from the data path to
the control path. The inputs to the data path can be
understood as relay controls, or switches. The outputs from
the data path to the Weinberger array are Boolean values to
cause decisions about what to do next.
From Figure 4.5, the arithmetic path of this chip is
seen to be two bits wide (the two horizontal parallel
organelle chains). In Chapter II it was shown that syntax
implicitly controls instantiation. Line 13 in the Gray code
algorithm specifies two data path operations
(cond( (=0 inp) (setq bin 0)
where the (=0 inp) is a logical comparison integer test, and
(setq bin 0) is an integer form by definition of bin in the
def statement and the source for bin being an integer, zero.
The leftmost set of cascaded OR gates makes the (=0 inp)
test, and signals the control path on line D9. Figure 4.6
shows the logic diagram for this stipple plot, and the
results for a zero input.
Proceeding right on the arithmetic block stipple
plot, the next block is a set of paralleled NOR gates. The
inputs are the inp bits, inp0 and inp1, and Vdd and GND. The
output is a signal to the control path from D8 which
determines the chip output, bin (BINary equivalent of the
Gray code bit stream). This circuit does not directly make
the output assignment, (setq bin 0), but rather does it
through combinational logic in the Weinberger array. Figure
4.7 is the logic diagram of the setq operation circuitry.
The circuit is annotated to show a zero bit input on inp, in
which case a TRUE is sent to the control path on line D8.
Proceeding right in the data path, the next two
blocks in Figure 4.5 show pass transistor units. The
leftmost pass transistor unit has inputs from bin0, bin1,
and control on D7. The output is a signal to control on D6.
This section of the data path is where the output bin is
set, although the logic for setting bin is determined in the
preceding two data path units and the control path. To the
right of this unit is another pass transistor block which
takes inputs from the previous pass transistor unit, from
the clock drivers, from control on lines D5, D4, and D3, and
from the sequencer. The function of this unit is state












Figure 4.7 SETQ Operation (Signal to Control)
state, and this unit drives the state registers which signal
next state to the sequencer tail, at far right. The input D3
is the reset signal, which implements the MacPitts function
of returning the FSM to its initial state when raised high.
Figure 4.8 shows the state registers, a set of
parallel 2-T memory cells, in which the current state is
held. The inputs to the state registers are the outputs of
the previous pass transistor block, signalling next state
transition, and the three clock lines from the clock driver.
The outputs are the two state bits (S0 and S1) to the
control path (on lines marked C1 and C0, Figure 4.10). The
Mealy FSM methodology is evident in MacPitts from both the
algorithmic and hardware viewpoints. The output is a
function of both input (inp0, inp1) and present state (S0,
S1).
Below the state registers in Figure 4.3 are the
clock drivers. Figure 4.9 is a blowup of the driver
organelles, used for buffering the clock signals and
generating the five overlapping clock signals. The drivers
are turned on by a signal from the Weinberger array, C5.
Carlson describes the clocking scheme and the reasons behind
its choice (Ref. 2: p. 26).
Figure 4.8 MacPitts State Registers
Figure 4.9 Clock Drivers and Five Segment Generator
The rightmost block in the data path is the
sequencer. Figure 4.10 is the cifplot of the sequencer
combinational logic, and Figure 4.11 is its gate equivalent.
The sequencer has as its inputs the current state (S0, S1)
and produces as its outputs the next state (S0+1, S1+1).
The gate diagram of the sequencer answers the question asked
in the initial design of the Gray code decoder, why three
states are not allowed in a control path (i.e., a data path
width of zero) architecture. The answer lies in the implied
data path structure, as explained previously and as
graphically shown in Figure 4.10 and Figure 4.11. The data
path width as specified in the PROGRAM statement sets the
number of sequencers to be instantiated, and the number of
sequencers limits the number of states possible. If fewer
FSM states are required than the sequencer depth can
transition to, the sequencers are nevertheless instantiated,
but their outputs are not connected to the control path (C0
and C1 in this example). For example, this would occur for a
wide data path which had few states. If a data path FSM chip
were designed with a word length of five, and only four
FSM states were needed, MacPitts would instantiate all five
of the sequencer organelles. Only the top two would be
connected to the Weinberger array. Figure 4.12 is a block
diagram of the MacPitts sequencer organelles, and shows how
the Mealy FSM is implemented. The multiplexers on each side
of the state registers determine that the next state is a
function of both present state and present input. The
Weinberger array controls the gating in the multiplexers to
allow the appropriate signals to pass to the state
registers.
Figure 4.10 Sequencer Tail
(Figure labels: S1' and S0', to pass transistor for next state input MUX; Vdd)













































































































Figure 4.12 Mealy FSM Implementation in MacPitts
The Weinberger array is the control path in a
MacPitts chip, for reasons explained in Chapter II. The
Weinberger array is shown in Figure 4.13, and its labelled
gate equivalent is Figure 4.14. In the cifplot, all input
and output columns have been labelled (A-Z) for comparison.
The output lines have also been labelled (Cn) for reference
to the other functional blocks of the chip. There are major
differences between this multiple function Weinberger array
and the single function Weinberger arrays considered
previously.
The Weinberger array for single output functions
always has a four level structure, inverter-NOR-NOR-
inverter. This is not the case for multiple output
Weinberger arrays. This circuit has 11 inverters and 15 NOR
gates. The maximum fan in on any NOR gate is six. In the
previous Weinberger arrays, the maximum delay was
approximately four gate delays. In this Weinberger array,
the longest path is shown in Figure 4.14 as J-U-T-L-F-G-D,
or Q-W-T-L-F-G-D. Each path induces approximately seven gate
delays. The MacPitts script session (included in Appendix B)
lists the control depth (NOR gate nesting) incorrectly as
four. Furthermore, the polysilicon runs cover proportionally
more area in this Weinberger array than in the previous
single function ones. From Chapter III, the polysilicon to
substrate capacitance is a strong factor in limiting chip
speed. The multiple function Weinberger arrays are expected
Figure 4.13 Weinberger Array from Gc.cif
Figure 4.14 Gate Equivalent of Gray Code Weinberger Array
to be slow. This Weinberger array has nine outputs (C11,
C10, C8, C7, C6, C5, C4, C3, and C2) and five inputs (C13,
C12, C9, C1, and C0). C13 is a check on the input signal
values, and comes from D9 (D indicates a signal to or from
the data path, C indicates a signal to or from the control
path). C11 is an output to D8 in the data path, the function
of which is not clear (data path output connecting control
path output). C10 is an output to D7, and the signal
controls pass transistor gating in the left pass transistor
unit, which determines the value of the output (bin0, bin1).
C9 is an input to the Weinberger array, and comes from D6.
This input is not set within the data path, and it is likely
that it results from MacPitts' expectations of a more
complicated structure. The sequencer organelles exhibit this
vestigial structure property also, as previously mentioned.
C8, C7, and C6 are outputs which control the second pass
transistor block (state register multiplexer) in the data
path. They connect to D5, D4, and D3, respectively, and
control the sequencer's next-state transitioning. C6 is
connected to pin five by a polysilicon run and C13, so C6
(D3) is the reset signal. C5 is an output which turns on the
clock drivers. C4, C3, and C2 are outputs connecting the
data path at D2, D1, and D0, where they control pass
transistor gating for the sequencers and state register. C1
and C0 are inputs from the state register which represent
the current state. Figure 4.15 shows the data path-control
path interconnections. The interconnections are summarized
in the diagram below.
      (inp) (bin) PT    ves   PT: state               PT: seq and reg
rst   D9    D8    D7    D6    D5    D4    D3    clk   D2    D1    D0    S1    S0
C13   C12   C11   C10   C9    C8    C7    C6    C5    C4    C3    C2    C1    C0
                                          rst
In this diagram, rst means reset, PT is a pass
transistor unit, ves is a vestigial (non-functional) unit,
seq is sequencer, and reg is the state register.
3. Alternate Designs
The gc.mac algorithm used explicit value assignment
in the output setq forms.
(setq bin <value>)
In this case, it is possible to explicitly set the output
to a value (one or zero). This is not possible, however,
for all algorithms, and is not even desirable in the
general case. Usually the output is a function of the
input(s), and not a specific value which is known
beforehand. With this in mind, an alternate algorithm was
written to implement the Gray code to binary conversion.
Figure 4.16 shows the algorithm, gc2.mac. This code
follows the state diagram given for gc.mac (Figure 4.2),
and the states all have the same names. The algorithms are
equivalent functionally and semantically (they both do and
say the same thing). The only difference is in the binary
output (bin) setq forms. In the previous algorithm,
gc.mac, the output bin is explicitly set to either one or
zero. In this algorithm, gc2.mac, the output is set to a
data path function of the input. The code in gc2.mac
represents the more general case.
Figure 4.15 Data Path/Control Path Interconnection
The chip created by gc2.mac is expected to be larger
than the one created by gc.mac, since additional data path
decisions are required in the setq forms. The script file of
the gc2 MacPitts session (Appendix B) verifies this, and
Figure 4.17 shows the resulting layout. In comparing the two
script files, it is seen that gc2 requires more data path
units, data path transistors, and control path transistors.
This is reflected in the comparative complexities of the
data paths in Figure 4.3 and Figure 4.17. The chip produced
by gc2 would also consume slightly more power and be
slightly larger than the chip produced by gc. The conclusion
is that by explicitly specifying the setq destination
values, the designer can save area and power consumption. A
reasonable expectation would also be a faster chip.
Explicit assignment of outputs is therefore
desirable, though not always feasible. In many control path
architectures, where the output is treated as individual
bits, explicit assignment is possible (though not always the
optimal solution, see Chapter VI on Hamming error-
correctors, where there are many outputs possible). In data
path or hybrid architectures where there are only a few
numerical outputs possible, explicit assignment of output
values should also be considered (see the blackjack
algorithm, following). A general rule is to choose the
method that results in the shorter algorithm, whether (1)
explicit assignment of outputs, or (2) assignment of outputs
as a function of either inputs or intermediate values. The
significance of this is that the designer can influence the
design by the MacPitts program written, even though the
silicon compilation process is automatic.
The two previous algorithms assume serial decoding.
If it is desired to do the decoding faster, parallel
decoding should be considered. MacPitts has a mechanism for
this implicit in the integer data types (which look at a
data word in parallel), and the multiple PROCESS algorithm,
which performs independent functions in parallel. Parallel
data processing will be considered in Chapter VI.
The alternate solution (control path logic) to the
Gray code decoder is shown in Appendix B for comparison to
gc.mac and gc2.mac. The script and cif files are included
also for comparison.






(def reset signal input 5)
(def inp port input (6 7))
(def bin port output (8 9))




(cond ((=0 inp) (setq bin inp) (go msbs))
      ((= 1 inp) (setq bin inp) (go compl)))
compl
(cond ((=0 inp) (setq bin (word-not inp)) (go compl))
      ((= 1 inp) (setq bin (word-not inp)) (go nextbit)))
nextbit
(cond ((=0 inp) (setq bin inp) (go nextbit))
      ((= 1 inp) (setq bin inp) (go compl))) ) )
Figure 4.16 Gc2.mac
Figure 4.17 Gc2.cif
C. A BLACKJACK GAME
The previous section discussed MacPitts sequential
logic implementation as a function of algorithmic syntax.
A simple finite state machine was developed, and the
structural ramifications of the source algorithm were
investigated. This section will discuss development of a
more complex algorithm, and its consequent structure.
1. The Algorithm
The blackjack game algorithm was developed based
on the following rules. The rules are expressed as FSM
states, since the transition to MacPitts syntax is easier
that way. The capitalized words correspond to node names
and MacPitts variables.
s0: START, initialize
s1: ACCEPT card (?) (F, go s0), add FACE value to SCORE
s2: if ace and no prior ace valued as 11, SCORE=SCORE+10
s3: if SCORE<=16, HIT, go s1
s4: if SCORE>21 and previous ace valued as 11,
    SCORE=SCORE-10, go s5
s5: if SCORE>21 and no previous ace valued as 11,
    BROKE, go s1
s6: if 17<=SCORE<=21, STAND, go s1
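The scoring rules above can be exercised with a small Python model before committing them to MacPitts syntax. This is an illustrative sketch under stated assumptions, not the thesis algorithm: the function name play_card is hypothetical, an ace is assumed to arrive with a FACE value of 1, the ace flag is assumed to be cleared when the ace is devalued, and the devaluation of s4 is applied before the hit check of s3 so that a devalued score of 16 or less yields HIT (the listed rules do not say what happens in that case).

```python
def play_card(score, ace_as_11, face):
    """Apply rules s1-s6 to one accepted card.

    Returns (new score, new ace flag, outcome), where outcome is
    "HIT", "BROKE", or "STAND".
    """
    score += face                      # s1: add FACE value to SCORE
    if face == 1 and not ace_as_11:    # s2: value the first ace as 11
        score += 10
        ace_as_11 = True
    if score > 21 and ace_as_11:       # s4: devalue the ace from 11 to 1
        score -= 10
        ace_as_11 = False              # assumption: flag is cleared here
    if score <= 16:                    # s3
        return score, ace_as_11, "HIT"
    if score > 21:                     # s5: no ace left to devalue
        return score, ace_as_11, "BROKE"
    return score, ace_as_11, "STAND"   # s6: 17 <= SCORE <= 21


# An ace as the first card is valued as 11 and the player hits.
print(play_card(0, False, 1))
```

A model of this kind makes it easy to enumerate hands and confirm that every accepted card leads to exactly one of the three outcomes before the state diagram is drawn.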
The next step is to create a state transition
diagram, and then to translate the game rules into the
appropriate MacPitts entities (ports, registers, signals,
and flags). This is usually done from an English
description, and then the number of states is minimized by
standard techniques. Figure 4.18 shows the transition
diagram, which is not minimized -for the sake of clarity.
There are seven nodes in the diagram. The top node is
start, the initial state and the state to which the FSM
reverts when the reset signal is brought high. The next
node is draw, where the player draws a card (simulated by
an off-chip random number generator). The third node is
labelled ace, and represents decisions made if an ace is
drawn. The next node, htchk, checks for a hit condition
(draw another card). Following htchk is devalu, which
decrements the rscore contents when appropriate. Then the
broke (lose game) condition is tested in the brkchk (broke
check) state. Finally, the stand check node, stchk, tests
if the stand (win) condition exists, and the program
returns to the initial state for either replay or
termination. The state transitions follow from the
preceding rules. The MacPitts driver algorithm is written
on the basis of the state transition diagram. The driver
is shown in Figure 4.19.
Storage elements are required for state
transition decisions under the CONDs, so these variables
must be flags (aceflg and acptflg). Line 11 in the source
code reflects this. The arithmetic comparisons are made on
integer values, and these must likewise be storage
elements, so this variable is defined as a register
(rscore, line 10). Since the FSM progresses asynchronously
with the output (no new output with each clock cycle),
Figure 4.18 Blackjack Game State Transitions
1 ;B5.MAC BLACKJACK MACHINE
2 (program blackjack 5
3 (def 1 ground)(def 2 phia)(def 3 phib)(def 4 phic)
4 (def face port input (5 6 7 8 9))
5 (def hit signal output 10)(def stand signal output 11)
6 (def broke signal output 12)
7 (def score port output (13 14 15 16 17))
8 (def accept_card signal input 18)
9 (def reset signal input 19)
10 (def 20 power)(def rscore register)
11 (def aceflg flag)(def acptflg flag)






15 (cond (acptflg (setq rscore 0)(setq aceflg f)))
16 draw
17 (cond (acptflg (setq rscore (+ rscore face))
18 (setq score rscore) (go acenode))
19 (t (go start)))
20 acenode
21 (cond ((and (= face 1) (not aceflg))
22 (setq rscore (+ rscore 10))
23 (setq score rscore)
24 (setq aceflg t)))
25 htchk
26 (cond ((unsigned-<= rscore 16)(setq hit t) (go draw)))
27 devalu
28 (cond ((and aceflg (unsigned-> rscore 21))
29 (setq rscore (- rscore 10))
30 (setq aceflg f)
31 (setq score rscore)
32 (go htchk)))
33 brkchk
34 (cond ((and (unsigned-> rscore 21) (not aceflg))
35 (setq broke t) (go start)))
36 stchk
37 (cond ((and (unsigned-<= rscore 21)
38 (unsigned->= rscore 17))




there must also be a port (score, line 7) to clock the
register value to. Similarly, a port (face, line 4) is
defined as the input (face value) of the card. Whenever an
output is produced asynchronously with the clock, the
latching operation
(setq <register> integer_value)
must be made. One method of clocking the register contents
to the output port is to use the ALWAYS statement under
the PROGRAM statement.
(program <name> <data path width>
(always (setq (output_port register_contents)))
(process <name> <stack depth>
This will ensure accurate current output values. In the
blackjack algorithm, this procedure will not work. If the
statement
(always (setq score rscore))
is used, the algorithm would appear to work in the command
interpreter. Upon compilation, however, the following LISP
compiler (Liszt) diagnostic results,
Error: Non-number to minus nil
<1>
where the first line of the diagnostic indicates an
attempted arithmetic operation on an empty LISP atom or
list, and the second line is the LISP debugger prompt
CRef. lisp. 11-13.
The reason why this does not work (for this
algorithm) is that rscore has not been initialized (as in
Fortran, for example) at execution of the ALWAYS
statement. The LISP primitive representing rscore is at
this time a nil, or empty, atom. The solution is to clock
the register (rscore) to the output port (score) whenever
it changes value. Lines 18, 21, and 23 show this other
method of register transfer to ports.
There are some new forms in b5.mac which also
require discussion. The integer test which returns a
Boolean value to control is
(<signed><inequality type> integer1 integer2)
where the field <signed> is required, and is either blank
or the string "unsigned-" for the less than, less than or
equal, greater than, or greater than or equal tests. The
comparison is made with the <inequality type> between
integer1 and integer2.
For instance, if temp is an integer variable set
equal to 72, hot is an integer variable set to 88, and cold
is an integer variable set to 60, the following forms would
produce the signals to control shown.
FORM                              SIGNAL TO CONTROL
(cond ((= hot 88)))               T
(cond ((unsigned-< hot 99)))      T
(cond ((unsigned-<= hot 89)))     T
(cond ((= temp hot)))             F
(cond ((unsigned-> temp hot)))    F
(cond ((unsigned->= 70 temp)))    F
The result of the integer comparison test is a Boolean
value, and as such is used as a conditional under COND, as
shown in Figure 4.19.
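The expected signals for the six comparison forms can be cross-checked with a short Python sketch. It is only an arithmetic check under the stated values of temp, hot, and cold; Python integers are unbounded, so it assumes all operands fit within the declared data path width, and the helper name `to_signal` is hypothetical.

```python
# Values given in the text.
temp, hot, cold = 72, 88, 60   # cold is defined in the text but not used by these forms


def to_signal(value):
    """Render a Python Boolean the way the table does (T or F)."""
    return "T" if value else "F"


# Each MacPitts test paired with the equivalent Python comparison.
forms = [
    ("(= hot 88)", hot == 88),
    ("(unsigned-< hot 99)", hot < 99),
    ("(unsigned-<= hot 89)", hot <= 89),
    ("(= temp hot)", temp == hot),
    ("(unsigned-> temp hot)", temp > hot),
    ("(unsigned->= 70 temp)", 70 >= temp),
]

signals = [to_signal(value) for _, value in forms]

if __name__ == "__main__":
    for (text, _), signal in zip(forms, signals):
        print(f"{text:24s} {signal}")
```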
The remaining forms in the algorithm have been
previously explained. The algorithm b5.mac (which required
five tries to obtain a successful compilation) follows the
FSM state transition diagram with the syntax given. The
algorithm has been exhaustively tested (only possible with
simple FSMs) in the command interpreter.
2. The Chip
Figure 4.20 shows the cifplot resulting from
b5.mac. The appearance is similar to the Gray code decoder
layout, with the exception of an added functional block at
the top right. This is the flag block, resulting from line
11 in b5.mac. The flag block is both a source and a
destination for control signals, as the driver syntax
suggests.
The data path is organized in five parallel
units, as expected from line 2 in b5.mac. There are
seven states in the FSM, so only three of the five
instantiated sequencer tails are connected to control (the
other two are vestigial: instantiated, yet not used).
Since four integer values were used in the comparisons,
the data path is required to generate the comparison
integers. This must be considered in designing an
algorithm, in assigning the data word length under the
PROGRAM statement. The maximum score possible for the
blackjack game is 27, so the minimum word width is five.
Another reason for the lengthened data path is the number
of arithmetic tests made. The integer values for hit,
stand, broke, and devalu are made within the data path,
since syntax specifies structure in MacPitts. In the Gray
code decoder, the comparison tests generate combinational
logic in the data path which sends a signal to control. As
more data path tests are required, a longer data path will
result.
Figure 4.20 B5.cif
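The word-width figure for the blackjack machine can be checked with a small Python sketch. It assumes the worst case arises when a hit is taken at the limit of 16 and the card drawn is an ace counted as 11; that derivation is an inference from the game rules, not a statement from the compilation session, and the names used are hypothetical.

```python
def bits_for_unsigned(max_value):
    """Smallest number of bits that can hold the unsigned value max_value."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    return max(1, max_value.bit_length())


HIT_LIMIT = 16   # a hit is taken while the score is 16 or less
ACE_HIGH = 11    # assumed worst-case card: an ace counted as 11

worst_case_score = HIT_LIMIT + ACE_HIGH
width = bits_for_unsigned(worst_case_score)

if __name__ == "__main__":
    print(worst_case_score, width)
```

Under these assumptions the sketch agrees with the five-bit data path declared in line 2 of b5.mac.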
The Weinberger array of the blackjack chip shows
a multi-level structure similar to the Weinberger array
for the Gray code decoder. As the Weinberger array grows
in complexity, it becomes increasingly difficult to
understand its function in terms of a gate level
equivalent. The correct by construction property of
MacPitts is intended to assure correct operation of large
control path circuits nonetheless. The compilation session
recording in Appendix B shows the MacPitts instantiation
process for the blackjack machine, which follows the same
general scheme as for the Gray code decoder.
D. MEAD-CONWAY TRAFFIC LIGHT CONTROLLER
The functional description of the Mead-Conway traffic
light controller is taken from [Ref. 4: p. 85]. The chip
controls a traffic light at a highway-farm road
intersection.
1. The Algorithm
Design of the algorithm follows principles stated
previously. After the desired function is understood, an
automaton (state diagram) is drawn. From this, the
algorithm is written. The placement of the logic is
determined by syntax, and the selection of storage
entities (flags or registers) follows.
The light controller controls the three-light
traffic signals at the intersection of a busy highway and





and TS (short timeout). The
outputs are ST (start timer), FL0 and FL1 (encode the
color of the farmroad light), and HL0 and HL1 (encode the
highway light color). An FSM is appropriate to represent
the sequential nature of the traffic light cycling. Figure
4.21 shows the state transition diagram, with labels
corresponding to the MacPitts states in the algorithm.
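The four-state cycle can be sketched as a Mealy machine in Python. This is a behavioral model only, following the transition conditions of the Mead-Conway controller (C is taken to be the farmroad car sensor and TL the long timeout); the function name `step` and the `LIGHTS` table are hypothetical, and light colors are given by name rather than by the HL/FL bit encoding.

```python
# Light colors per state as (highway, farmroad).
LIGHTS = {
    "HG": ("green", "red"),
    "HY": ("yellow", "red"),
    "FG": ("red", "green"),
    "FY": ("red", "yellow"),
}


def step(state, c, tl, ts):
    """Return (next_state, start_timer) for the present state and inputs."""
    if state == "HG":                       # highway green
        return ("HY", True) if (c and tl) else ("HG", False)
    if state == "HY":                       # highway yellow
        return ("FG", True) if ts else ("HY", False)
    if state == "FG":                       # farmroad green
        return ("FY", True) if (tl or not c) else ("FG", False)
    if state == "FY":                       # farmroad yellow
        return ("HG", True) if ts else ("FY", False)
    raise ValueError(f"unknown state: {state}")


if __name__ == "__main__":
    state = "HG"
    for inputs in [(True, False, False), (True, True, False),
                   (True, False, True), (False, False, False),
                   (False, False, True)]:
        state, st = step(state, *inputs)
        print(inputs, state, LIGHTS[state], "ST" if st else "")
```

Because the start-timer output depends on both the present state and the present inputs, the model is a Mealy machine, matching the output=f(PS & PI) convention noted in the driver comments.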
Next, the algorithm is written. A control path
architecture is chosen for ease in setting the output bits
(initially, the output bits are set individually). Storage
elements (flags) are not needed for this example, since
the outputs are synchronously produced, and constant
throughout a given state. In control path circuits using
Boolean variables, the value goes to FALSE at the next
state transition unless it is explicitly set to TRUE. So
storage of the output values would be required if they
were to be output within a different state from that in
which they are determined. For example, if the light
control signals for the highway yellow (HY) state were
produced in the previous state (HG), then they would
require latching so the correct values would remain after
the state transition. If the chip were to be produced,
however, the outputs would require latching as explained
in the previous section, since the chip clock is many
times faster than the light timer clock.
Figure 4.21 Light Controller State Transition Diagram
The output bits which control the farmroad and
highway light colors must be encoded. The following table
is used
HL0 HL1 FL0 FL1
o GREEN
1 o 1 YELLOW
1 1 1 1 RED
and the output bits are explicitly set to Boolean values
in the SETQ forms.
Figure 4.22 is the MacPitts algorithm to create
the traffic light controller. The format is similar to the
previous FSM drivers, except for the absence of
data path combinational logic. The data path width must be
;MEAD-CONWAY LIGHT CONTROLLER
;Set the D.P. width to 2 (4 nodes in FSM dgm)
(program lc2 2






;The following 3 SIGNALS are control inputs:
(def c signal input 5)
(def tl signal input 6)
(def ts signal input 7)
;The RESET signal is required for all FSMs:
(def reset signal input 14)
;Define 5 output SIGNALS (=> C.P.) to
;control the TIMER & HW/FR traffic light:
(def st signal output 8)
(def hl0 signal output 9)
(def hl1 signal output 10)
(def fl0 signal output 11)
(def fl1 signal output 12)
;The PROCESS statement implies FSM sequencing,
;The stack depth is zero:
(process light_controller 0
;The HIGHWAY GREEN state; output=f(PS & PI)
;where <hg>=PS, and <C,TL,TS>=PI:
hg
(cond ((not (and c tl))
(setq hl0 f)
(setq hl1 f)
(setq fl0 t)
(setq fl1 f)
(setq st f)
(go hg))
(t (setq hl0 f)
(setq hl1 f)
(setq fl0 t)
(setq fl1 f)
(setq st t)
(go hy)))
;The HIGHWAY YELLOW state and associated
;outputs & state transitions (<go >)
;(see text for output encoding table and
;explanation of state transition syntax):
hy








(setq fl1 f)
(setq st f)
(go
(setq hl0 f)
(setq hl1 t)
(setq fl0 t)
(setq fl1 f)




;The FARMROAD GREEN state and associated
;outputs & state transitions:
(cond ((not (or tl (not c
(t
;The FARMROAD YELLOW state:
fy








































































hg))))))
)
Figure 4.22 Lc2.mac (continued)
nevertheless declared with the PROGRAM statement. The data
path width is two, to permit instantiation of two
sequencers to cycle through the four states of the FSM.
The initial attempt at lc2 erroneously used a data path
width of five, and the algorithm compiled to cif. The
resulting cifplot had a data path width of five bits,
only two of which were connected to the sequencer tails
to remember and address the states. The other three data
path units took up chip space, but performed no function.
2. The Chip
The cifplot resulting from lc2.mac is shown in
Figure 4.23, and the script of the compilation session is
in Appendix B. The cifplot resembles the previous two
FSM cifplots, but lacks flags and data path logic. The
only registers shown are those which receive and store
state information from the sequencer tail. As usual, they
lie in the data path above the clock drivers. Other than
that, the cifplot for lc2 has no data path. This is
expected in view of the driver algorithm, and the script
file of the compilation shows only six data path
organelles but 43 columns in control. A handcrafted
version of this chip could be produced with just a data
path, if a two phase clock is used. This will be






This chapter has considered three examples of MacPitts
sequential logic: the Gray code decoder, the blackjack
game, and the Mead-Conway light controller. In each case,
the Mealy FSM convention of MacPitts led to an easy
transition from state diagram to algorithmic description.
The Mealy architecture is evident in both the MacPitts
algorithm and the resulting chip layout.
In the algorithm, each state is given a name (e.g.,
HIGHWAY GREEN, HIGHWAY YELLOW) and within each state the
outputs are determined with the COND form and set
accordingly. The output is a function of both present state




The same Mealy logic is evident in the circuit layout
(cifplot). The sequencer stores the present state, and
multiplexers driven by the Weinberger array and present
inputs determine the next-state transitioning by
controlling the inputs to the bank of present-state
registers.
Sequential logic in MacPitts can be influenced by the
designer in the same way as combinational logic can, by
explicitly specifying the desired outputs. The alternative
is to specify the outputs as an implicit function of
either inputs (ports, input signals) or intermediate
results (internal signals, flags, registers). In general,
when the explicit specification of outputs is used
(setq score 19)
rather than the functional specification of outputs
(setq score (+ rscore face))
a smaller and faster circuit will result. The explicit
specification of outputs is therefore the preferred
method, though not always possible. If there are many
possible outputs, it may even be better to use the
functional specification of outputs rather than attempting
to specify each one explicitly.
The data path width for a MacPitts sequential machine,
as specified in the PROGRAM statement, must be large
enough to address the number of states. That is, the data
path width must be greater than or equal to log (base 2)
of the number of states in the state transition diagram.
If this condition is not met, MacPitts (the silicon
compiler) will not successfully compile the source
algorithm. The reason for this requirement is the manner
in which MacPitts lays out the sequencer and data path.
The sequencer and data path are laid out contiguously, in
a linear bit-slice configuration. The width of both is the
width of the data path as specified in the PROGRAM
statement (this number is also the number of present-state
registers instantiated). Since there must be the same
number of i/o ports as the data path width, and since all
of these ports may not be used for data i/o, one solution
to the problem of extra ports is to ground them in the
circuit in which the chip is to be used (as suggested for
the Gray code decoder, where only one port was necessary,
but two ports had to be specified to allow enough state
transitions). The alternate solution for the Gray code to
binary conversion routine is to treat the data as a serial
stream, one bit wide. This suggests using SIGNALs (instead
of PORTs) as inputs, and processing the Gray code as Boolean
data instead of integer data. This algorithm is included for
completeness in Appendix B, with the resulting cifplot and
script of the compilation process.
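The lower bound on data path width implied by the state count can be computed as in the following Python sketch. It covers only the state-addressing requirement; the function name `min_sequencer_bits` is hypothetical, and a design may still need a wider data path for its integer data, as the blackjack machine does.

```python
def min_sequencer_bits(num_states):
    """Smallest width w such that 2**w >= num_states (at least one bit)."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using exact
    # integer arithmetic rather than floating-point logarithms.
    return max(1, (num_states - 1).bit_length())


if __name__ == "__main__":
    for name, states in [("Gray code decoder", 3),
                         ("light controller", 4),
                         ("blackjack machine", 7)]:
        print(name, states, min_sequencer_bits(states))
```

The three-state Gray code decoder and the four-state light controller both require two bits, and the seven-state blackjack machine requires three, consistent with the number of sequencer tails reported as connected in each layout.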
MacPitts provides a convenient method to compare both
Boolean and integer values, which is particularly useful
in the decision-making under a COND. The Boolean
comparisons (Figure 4.22) are used to test the value of a
flag or a signal, and the integer comparisons (Figure
4.19) are used to compare numerical values in ports or
registers. In each case, the result is a Boolean signal to
control which affects subsequent state transitioning or
setting of outputs.
Algorithm design for MacPitts FSMs begins with the
decision of how much data it is desired to process
simultaneously, and in what form that data presents itself
to the chip. For instance, if a serial FSM chip is desired
(e.g., a serial Gray code decoder), the data word is one
bit wide. The inclination is therefore to treat the data as
Boolean type, which is feasible for FSM architectures for
reasons explained previously. The designer is not
constrained to integer data types in this case (although the
examples presented in Figure 4.2 and Figure 4.16 used
integer data types). If the data comes to the chip for
parallel processing in an n-bit word, however, the
inclination is to treat the data as integer type (for
example, the blackjack algorithm). This is not always
possible, for reasons to be explained in connection with
Hamming error correction in Chapter VI (MacPitts does not
permit implicit setting of bits within a data word).
Algorithm design may be viewed as the designer's
influencing of the chip layout. Since circuit structure is a
function of syntax (on a lower level), it is reasonable to
assume that chip layout is a function of algorithm structure
(on a higher level). That is, syntax determines not only the
individual circuit elements (NANDs, ORs, XORs, ports, flags,
registers, etc.) of the chip, but also determines how the
individual elements work in concert. The source algorithm
lc2.mac shown in Figure 4.22 used Boolean control signals as
inputs (C, TL, TS). The resulting cifplot in Figure 4.23
shows a Weinberger array at the bottom, and no data path
except for a bank of two sequencer organelles at the top of
the chip. This chip can be viewed as a control path chip. An
alternate design would use a five-bit word (representing the
signals HL0, HL1, FL0, FL1, and ST) as the output, and
retain the three control signals as inputs. Appendix B shows
dplc2.mac (the data path equivalent of Figure 4.22,
lc2.mac), and the resulting cifplot. The output bits are set
explicitly by setting the output word values in the .mac
file. This results in a larger data path, as expected, since
the output decisions result in data path operations instead
of control path operations. The control path is smaller than
in lc2.cif, since the Weinberger array has fewer decisions
to make. Appendix B also contains the script file of the
compilation of dplc2.mac.
Yet another version of the light controller would assign
the input values to a three bit word (representing C, TS,
and TL), and make the conditional checks on the input
control word with the BIT statement. This solution would
result in a still larger data path and a smaller control
path than the two previous light controller chips. Just as
in any high-level language, there exist many ways of
solving a given problem with MacPitts. The best way to solve
the problem must consider not only the algorithm, but the
structural (layout) consequences of algorithmic syntax. The
"best" solution is arrived at by experience in MacPitts
programming, knowledge of the consequences of syntax, and
finally, iteration toward a better solution (trial and
error).
V. A COMPARISON OF A MACPITTS DESIGN
WITH A HANDCRAFTED EQUIVALENT
Previous chapters illustrated some inefficiencies
inherent in the MacPitts layout scheme. The Weinberger array
and the data path both use transverse polysilicon wires for
cross-communication, and poly has the highest specific
resistance of the three possible NMOS wire materials. The
one dimensional river routing method used is not optimal,
because the input, output, and data/control lines required
are long. The sequencer organelles are instantiated
according to the data path width, and not according to the
number of states necessary. The Weinberger array generates
multiple cascaded gates to implement multiple output,
combinational logic functions, causing long signal delays in
comparison to a PLA. A handcrafted version of a functionally
equivalent chip is compared to a MacPitts design to
investigate these differences both quantitatively and
qualitatively.
A. THE HANDCRAFTED TRAFFIC LIGHT CONTROLLER
The standard for this comparison is a handcrafted (CAD)
version of the Mead-Conway traffic light controller which is
compared to the MacPitts generated version in terms of speed
and power consumption. Qualitative observations are also
described.
The custom-made traffic light controller was constructed
on the Caesar VLSI graphics editor with the aid of various
VLSI CAD tools.
1. Design
The MacPitts-produced traffic light controller was
described in Chapter IV. MacPitts design is just a
matter of generating a prototype MacPitts driver program,
and refining it until an acceptable archetype algorithm is
achieved. This is done in both the command interpreter
(algorithmic optimization), and in Caesar (structural
optimization). Caesar allows the designer to see the
structure and analyze it with power estimators (Powest) and
timing estimators (Crystal, SPICE). Moving pads and deleting
vestigial structures are examples of possible structural
optimizations using Caesar (this procedure should be
considered if the MacPitts chip is to be fabricated).
The standard VLSI design scheme is similar to
MacPitts design in that structure is considered as a
function of behavior. The behavior is not constrained to
follow a given algorithmic syntax, though, as it is in
MacPitts. So custom design is more flexible than silicon
compiler designs are, since the designer can choose any
desired structure to implement the behavior called for.
The standard NMOS PLA is used for the hand-crafted
traffic light controller. Mead and Conway [Ref. 4: pp. 80-88]
develop the state transition table for the light controller,
and provide a sticks diagram of the clocked PLA FSM. The
following PLA is based on the Mead-Conway development.
Ousterhout [Ref. 9] illustrates use of Eqntott and
Reference 10 illustrates use of Tpla to generate this PLA.
Eqntott is a VLSI CAD program which takes logic equations as
the input and produces a PLA truth table as the output. This
truth table is the input to Tpla (Technology independent
Programmed Logic Array), and Tpla further allows the
designer to geometrically modify the PLA. The result of Tpla
processing the truth table is a Caesar representation of the
desired PLA. Figure 5.1 shows the input logic equations for
Eqntott, and Figure 5.2 shows the resulting truth table from
Eqntott.
The best method to design a PLA is to create the
logic equations as in Figure 5.1, and then use the Unix
pipeline to send the result of Eqntott to Tpla
eqntott [options] infilename | tpla [options] outfilename
The result is a Caesar file of the PLA layout, which must be
converted to cif in Caesar as previously described. Figure
5.3 shows the -trans PLA (inputs and outputs on opposite
sides) generated from the command
eqntott -l -f -R stoplt | tpla -s Btrans -I -O -o stoplt.ca


























































































































[Figure 5.1: input logic equations for Eqntott; listing not recoverable from the scan]

























































Figure 5.2 Truth Table Input for Tpla
In the command above, the eqntott switch -l means list the
truth table, -f means to connect the feedback paths in the
PLA, and -R directs eqntott to minimize the truth table.
The tpla switch -s selects the PLA type (-trans
shown), and -I and -O indicate clocked inputs and outputs.
This command string creates an NMOS FSM Caesar file. It was
determined later that a -cis PLA (input and output on the
same side of the PLA) would fit the chip frame better. The
change is simple. The same command string as above was
issued, except Bcis was substituted for Btrans.
The PLA is a fast structure. Appendix A shows the
interactive Crystal session for the timing analysis of
just the PLA. The delays are determined to be 26.93 ns for
phia and 32.06 ns for phib. For symmetric phia and phib
durations, with each having the duration of the slowest
critical path, or 32.06 ns, the maximum clock rate is 15.6
MHz. The maximum clock rate is calculated as the inverse of
twice the slowest critical path time. The use of Crystal on
non-overlapping, two-phase clocking schemes is described in
[Ref. 3: pp. 80-93].
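The clock-rate arithmetic can be reproduced with a short Python sketch. The symmetric case follows the rule stated in the text. The asymmetric case (each phase only as long as its own critical path) is an assumption made here for comparison; it happens to reproduce the 16.95 MHz figure quoted later for the PLA alone. Function names are hypothetical.

```python
def symmetric_clock_mhz(phase_delays_ns):
    """Both phases last as long as the slowest critical path."""
    return 1e3 / (2 * max(phase_delays_ns))


def asymmetric_clock_mhz(phase_delays_ns):
    """Assumed variant: each phase lasts only as long as its own path."""
    return 1e3 / sum(phase_delays_ns)


# Crystal delays for the PLA alone, from the text (ns).
pla_delays_ns = (26.93, 32.06)   # (phia, phib)

if __name__ == "__main__":
    print(f"symmetric:  {symmetric_clock_mhz(pla_delays_ns):.2f} MHz")
    print(f"asymmetric: {asymmetric_clock_mhz(pla_delays_ns):.2f} MHz")
```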
The sequential logic for the light controller chip
is made with the University of Washington/Northwest
Consortium CAD tools as described above. All that is lacking
is the power and ground connections and the pads. Usually
the power and ground busses are laid out by hand (Caesar) or
specified in cartesian coordinates (CLL, Chip Layout
Language, a method of specifying mask polygons, their
dimensions, and the fabrication process required), and the
pads are then invoked from an existing library of VLSI macro
cells. MacPitts can shorten the design time by doing most of
this work for the designer. Figure 5.4 shows the algorithm
stoplt_frm.mac used to create the frame for the PLA FSM. The
frame is created like wire.mac (Figure 2.1), in that it is
just wires from input to output. The wires are deleted in
Caesar, and the PLA is placed in the center of the chip
frame. Figure 5.5 shows the resulting chip. The clocked
-cis PLA is in the center of the chip, connected to
appropriate inputs and outputs (tpla makes this connection
easy, since it labels all inputs and outputs). The third
clock pad (phic) is deleted in Caesar. This chip still has
long indirect metal runs and lots of white space.
Figure 5.3 -Trans PLA Resulting from Eqntott and Tpla
2. Optimization and Analysis
Figure 5.6 shows a condensed version of the chip,
stoplt_minc.cif. The area of the chip shown in Figure 5.6 is
40% smaller than the chip in Figure 5.5, and still more
reduction is possible. Since there are 12 pads, it would be
better to place three per edge on the chip. The signal wires
could also be shortened by judicious choice of pad placement
in the .mac algorithm. And finally, all sides could be
brought closer together. There exists a synergistic
relationship between the existing CAD tools and MacPitts
that bears further study.
;stoplt_frm.mac
;This pgm creates a design frame for the stoplight
;controller [cf. Mead & Conway, p. 81, 2nd printing]
;hand-crafting will be required to merge the PLA
;FSM created by eqntott | tpla into this frame. CAESAR
;is used to do this.
(program stoplt_frm.mac 5






;inputs to light controller PLA FSM
(def c signal input 5)
(def tl signal input 6)
(def ts signal input 7)
;outputs from light controller PLA FSM
(def st signal output 8)
(def hl0 signal output 9)
(def hl1 signal output 10)
(def fl0 signal output 11)
(def fl1 signal output 12)
(always
;
;here we setq 5 simple dummy paths. These are chosen with a
;view towards later simple editing in CAESAR
;
(setq st c)
(setq hl0 tl)
(setq hl1 ts)
(setq fl0 c)
(setq fl1 tl)))
Figure 5.4 Stoplt_frm.mac











Figure 5.6 Stoplt_minc.cif
Intervention by the designer, however, is antithetical to
the goal of silicon compilation. The silicon compiler has a
ruleset which (in theory) guarantees the property of
"correct by construction". This property states that the
chip design will always be functionally correct; it cannot
be wrong. Circuit density is not the primary goal, nor is
speed.
The MacPitts designer has no control over circuit
density, other than Boolean optimization of the algorithmic
forms as explained in Chapters II and III. The designer does
have some control over chip speed. There are two ways of
optimizing throughput in a MacPitts design. The first method
is explained at the beginning of Chapter III, and can be
thought of as algorithmic optimization. The objective is to
write an algorithm which executes in a minimum number of
clock cycles. The verification is done in the command
interpreter. PAR, COND, and PROCESS are used wherever
possible to parallelize operations.
The second method of controlling chip speed is
through circuit optimization (this too is a function of
syntax in MacPitts). The designer chooses either the data
path or the control path or a hybrid of both, and with
Crystal designs a chip which has a maximum speed per clock
cycle. The throughput is then the product of the inverse of
the number of clock cycles required for a valid result and
the cycle rate (results/cycle x Hz = results/sec).
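The throughput relation can be written out as a one-line calculation. The Python sketch below is only a restatement of results/cycle x Hz = results/sec; the function name `throughput` and the sample figures are hypothetical and do not come from any compilation or Crystal session in this thesis.

```python
def throughput(clock_hz, cycles_per_result):
    """Results per second = (results per cycle) x (cycles per second)."""
    if cycles_per_result <= 0:
        raise ValueError("cycles_per_result must be positive")
    return (1.0 / cycles_per_result) * clock_hz


if __name__ == "__main__":
    # Hypothetical example: a 2 MHz clock and 4 cycles per valid result.
    print(throughput(2.0e6, 4))
```

Both factors are under the designer's influence: the cycle count through algorithmic optimization, and the clock rate through circuit optimization.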
Furthermore, the circuit speed can be increased by
judicious placement of pads in the .mac file. It is not
always apparent where the routing will go beforehand, so the
recommended method is to create a prototype cifplot, and
then modify the pad numbering in the .mac file to decrease
signal path lengths from the pads to the logic elements. For
example, in stoplt_minc.cif (Figure 5.6), the phia pad would
be moved to center left on the chip frame, phib to center
right, ground to top right, and C, TL, and TS would be moved
to the lower left corner region to decrease metal run
lengths. All of these suggested moves are not possible due
to the way MacPitts places pads, so Caesar editing is
required to optimize the MacPitts design if minimal length
runs are desired.
Appendix C contains the Crystal analysis of the PLA
traffic light controller. The chip speed is limited to the
inverse of the sum of the critical propagation times, or
6.85 MHz. This is less than half the speed of just the PLA
(16.95 MHz). Appendix C also contains the Powest analysis of
the PLA traffic light controller.
B. COMPARISON WITH MACPITTS DESIGN
Appendix C contains the Crystal command file for the
MacPitts traffic light controller timing analysis. Froede
[Ref. 3: pp. 80-85] explains the analysis of a MacPitts
design with Crystal. The Crystal command file in Appendix C
shows just the commands issued to Crystal, and in
parentheses to the right, the time delay values returned
(representing an actual Crystal session).
Figure 4.23 shows the chip on which this Crystal
analysis was done. The critical path is from phic to the
clock drivers to the state registers. The clock drivers
induce a cumulative delay of 23.9 ns, and the state
registers a cumulative delay of 114.2 ns, so the transition
induces a delay of 90.3 ns. The Weinberger array induces
another 178 ns, and the slowest path is from there to the ST
pad. The total delay is 363.52 ns, for a maximum speed of
2.75 MHz. This speed is 40% of the maximum speed of the PLA
light controller.
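The figures in this comparison can be checked with a few lines of Python. The sketch uses only delay values quoted in the text and the 6.85 MHz figure for the PLA chip; the variable and function names are hypothetical.

```python
def max_clock_mhz(total_delay_ns):
    """Maximum clock rate in MHz for a total critical delay in ns."""
    return 1e3 / total_delay_ns


# Values quoted from the Crystal analysis (ns).
clock_driver_ns = 23.9        # cumulative delay at the clock drivers
state_register_ns = 114.2     # cumulative delay at the state registers
macpitts_total_ns = 363.52    # total delay to the ST pad

pla_chip_mhz = 6.85           # handcrafted PLA chip, from the text

transition_ns = state_register_ns - clock_driver_ns
macpitts_mhz = max_clock_mhz(macpitts_total_ns)
speed_ratio = macpitts_mhz / pla_chip_mhz

if __name__ == "__main__":
    print(f"register transition: {transition_ns:.1f} ns")
    print(f"MacPitts max clock:  {macpitts_mhz:.2f} MHz")
    print(f"ratio to PLA chip:   {speed_ratio:.0%}")
```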
Figures 5.7 and 5.8 show the floorplans of each version
of the traffic light controller. Figure 5.7, the PLA FSM, is
comparatively simple. The FSM is a small clocked PLA with
feedback. The connections to the pads are all metal (not
shown). Figure 5.8 is the MacPitts version, and is far more
complicated. The control path is large, and induces the
largest part of the delay. The present state (PS in Figure
5.8)-next state mechanism is much more complex than the
simple PLA feedback generated by eqntott and tpla. The wires
between the data and control paths are poly, as are the PS
feedback lines in Figure 5.8. These wires contribute to the
slowness of the MacPitts chip. The wires to the pads also
take a more circuitous route, inducing still more delay.


























Figure 5.8 MacPitts Stoplight Chip Floorplan
Table 5.1 compares the MacPitts traffic light controller and








max. clock freq. [MHz] 6.85
pullup transistors 35
avg. DC power [W] .042
max. DC power [W] .085
control path dimensions [mm] .49 x .29
data path dimensions [mm] .178










.547 x .185
.256 x .240
8.9
1 . 1 64
VI. DESIGN EXAMPLE: HAMMING ERROR DETECTOR/CORRECTOR
This chapter describes one method of design with
MacPitts. The procedure is to first define the problem, then
to write an initial algorithmic description of the solution
in MacPitts (the language). The initial algorithm is either
a simplified version, or a piece of the larger problem. The
simplified algorithm is tested for execution in the
interpreter, and then compiled to cif. Alternate solutions
are considered next, and simplified alternate solutions are
likewise tested. The best of these algorithms is then
chosen, based on speed, power dissipation, and size. The
chosen solution is then expanded to solve the larger
problem.
The problem is to design a parallel Hamming method error
detector/corrector which will correct single bit errors in a
15-bit encoded message.
A. THE ERROR DETECTOR
The theory behind Hamming error detection and correction
is found in most texts on coding and information theory
[Ref. 5: pp. 39-49]. A subset of this problem is error
detection, which the prototype algorithm solves.
The prototype algorithm looks at a three bit encoded
message in parallel, and by the Hamming method determines
the bit error location. The algorithm is written to
demonstrate correct operation for three-bit messages. It can
later be expanded to cover longer word lengths.
The Hamming method scans the encoded word, and by a
series of parity checks determines the bit error position.
The single error detection method assigns the result of each
parity check to a bit of data. The word formed from the
resulting bits comprises the syndrome. The value of the
syndrome is the bit error position in the received message.
The parity checking is done in a specific order. If the
codeword is a string of n bits with the lsb leading
1 2 3 4 5 6 7 8 ... n
then the syndrome bits are determined by parity checks
across the message bits as shown below.
syndrome bit   message bit positions for parity check
0              0 2 4 6 8 10 12 14 16 18 20 ...
1              1 2 5 6 9 10 13 14 17 18 21 22 ...
2              3 4 5 6 11 12 13 14 19 20 21 22 ...
3              7 8 9 10 11 12 13 14 23 24 25 26 27 ...
where the syndrome word is read from msb to lsb and points
to the message bit which needs correcting.
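The rows of the parity check table follow a regular rule, sketched below in Python as an illustration. The function name is hypothetical. The sketch assumes zero-based message positions, and that position p takes part in check k when bit k of p + 1 is set, which reproduces the positions listed for syndrome bits 0 through 3.

```python
def check_positions(syndrome_bit: int, n_bits: int) -> list[int]:
    """Zero-based message positions covered by one parity check."""
    return [p for p in range(n_bits) if ((p + 1) >> syndrome_bit) & 1]


# Print the table rows for a 15 bit codeword.
for k in range(4):
    print(k, check_positions(k, 15))
```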
For instance, for an encoded seven bit message, there
are three check bits (represented by "c"), and four bits of
information (represented by "i") in the positions indicated
below
c c i c i i i
The first bit of the syndrome (lsb) is determined by parity
checks over positions 0, 2, 4, and 6. The next bit of the
syndrome considers positions 1, 2, 5, and 6. The last bit of
the syndrome (msb) is determined from message positions 3,
4, 5, and 6. The three-bit syndrome indicates the error
position in the message string. If the received message is
0100011, the syndrome generated is 011. The syndrome
indicates an error in the third bit from the left (the lsb
leads). The correct message is 0110011. The Hamming method
corrects (complements) the third symbol.
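The worked example can be reproduced with a small Python sketch. It is not MacPitts code; the function names are hypothetical, the word is written as a string with position 0 (the lsb) first, and the word length is assumed to be of the form 2^r - 1 (3, 7, or 15 bits).

```python
def syndrome(word: str) -> int:
    """Syndrome of a received word written with position 0 (lsb) first."""
    value = 0
    k = 0
    while (1 << k) <= len(word):
        # Check k covers each zero-based position p where bit k of p + 1 is set.
        ones = sum(int(word[p]) for p in range(len(word)) if ((p + 1) >> k) & 1)
        value |= (ones % 2) << k
        k += 1
    return value


def correct(word: str) -> str:
    """Complement the bit the syndrome points at (no change if it is zero)."""
    s = syndrome(word)
    if s == 0:
        return word
    p = s - 1  # a syndrome value of 3 points at the third bit, index 2
    flipped = "1" if word[p] == "0" else "0"
    return word[:p] + flipped + word[p + 1:]


received = "0100011"
print(format(syndrome(received), "03b"))  # 011
print(correct(received))                  # 0110011
```

A corrected word always yields a zero syndrome, which gives a quick self-check of the routine.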
1. Design Considerations
Previously in this research it was noted that
MacPitts syntax does not permit explicit bit manipulation in
the data path. To do this algorithm in the data path may be
desirable, in view of the speed of simple data path
functions. Since this is not possible, perhaps a hybrid data
path-control path algorithm should be considered. A review
of the Gray code decoder chip (Figure 4.2) will show why
this is not a good approach. The Gray code decoder is a
mixed structure, having both a data path and a control path.
The interconnections are all poly, which slows the chip
down. The multiple unPARalleled CONDs have a more
detrimental effect on speed, since each requires a clock
cycle to execute if its antecedent is true. So the target
architecture will be Boolean (control path).
The parity checks can be done by a variety of
methods in MacPitts. The simplest way is with the built in
library function PARITY, which has the format
parity (boolean boolean ...)
PARITY performs modulo two addition, and returns Boolean
TRUE to control if the argument is an odd number of TRUEs,
or Boolean FALSE if the argument is an even number of TRUEs.
So the parity checks can be done directly on the bits of the
message, in parallel, with the PARITY statement.
MacPitts also has a method of checking specific
bits in a data word. The BIT statement looks at a bit in the
integer-valued word, and returns a TRUE to control if the
bit is one, or a FALSE to control if the bit is zero. The
form of the BIT statement is
(bit <bit_position> <integer_expression>)
Figure 6.1 is the algorithm tst.mac, used to test the BIT
statement. It is similar functionally to wire.mac, in that
it sets an output bit to an input bit. The difference is
that BIT permits a bit-by-bit conversion from integer value
to Boolean value. In Figure 6.1, the input word mesg is
integer valued. The output bits are Boolean signals (outx),
and they are setq'd to the respective bit position values of
mesg (the corrupted input word).
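The behaviour described for BIT and PARITY can be modelled in a few lines of Python. This is only a behavioural sketch for checking expected values by hand; the Python functions are hypothetical stand-ins, not the MacPitts library, and tst() mirrors what tst.mac is described as doing.

```python
def bit(position: int, word: int) -> bool:
    """Model of (bit <bit_position> <integer_expression>)."""
    return bool((word >> position) & 1)


def parity(*signals: bool) -> bool:
    """Model of PARITY: TRUE for an odd number of TRUE arguments."""
    return sum(signals) % 2 == 1


def tst(mesg: int) -> tuple[bool, bool, bool]:
    """Behaviour of tst.mac: copy each bit of mesg to an output signal."""
    return bit(0, mesg), bit(1, mesg), bit(2, mesg)


print(tst(0b101))                            # (True, False, True)
print(parity(bit(0, 0b101), bit(2, 0b101)))  # False
```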
2. Prototype Error Detector
Knowing Hamming error detection theory and the
PARITY and BIT statement syntax, an error detector algorithm
;TST.MAC
;A MacPitts algorithm for bit-setting of output ports
;The BIT form is used to select a specific bit of the
;input data word, and an output signal is set to
;the value of the bit selected.






;Use a 3-bit INTEGER as input PORT:
(def mesg port input (5 6 7))
;Use 3 BOOLEAN SIGNALS as outputs:
(def out0 signal output 8)
(def out1 signal output 9)
(def out2 signal output 10)
(def 11 power)
;Perform bit-setting on each clock cycle:
(always
;Select which bit of the input word is to
;be SETQ'd to the output signal pads:
(setq out0 (bit 0 mesg))
(setq out1 (bit 1 mesg))
(setq out2 (bit 2 mesg))))
Figure 6.1 Tst.mac
can be written. The encoded message input (mesg) is
word-valued, three bits wide. The output syndrome (syndx) is
two Boolean signals. The algorithm is shown in Figure 6.2.
The semantics of the MacPitts algorithm follow the English
description of the problem statement. The appropriate bit
patterns of the message are checked, and the syndrome bits
are set based on the results of the parity checks. This
algorithm was exhaustively tested in the command
interpreter, and serves as the prototype for the error
;HAM3.MAC
;A MacPitts algorithm for single-error detection
;using the Hamming method.
(program ham1 3 ;note width of data path (=width of msg)




;mesg is the input data word of 3 bits width with possible errors
(def mesg port input (5 6 7))
(def synd1 signal output 8)
(def synd2 signal output 9)
(def 10 power)
(always
;For a 3 bit word, two parity checks are required. The
;result of these parity checks is a 2 bit syndrome, which
;indicates the bit position of the error in the 3 bit word.
;This cond sets or clears the lsb of the syndrome.
(cond
((parity (bit 0 mesg) (bit 2 mesg))
(setq synd1 t))
(t
(setq synd1 f)))
;This cond sets or clears the msb of the syndrome.
(cond ((parity (bit 1 mesg) (bit 2 mesg))
(setq synd2 t))
(t
(setq synd2 f)))))
Figure 6.2 Ham3.mac
detector. The algorithm compiled to cif, and Figure 6.3
shows a logic structure completely in the control path. The
parallel lines at center left are the input (mesg) bits, and
result from the BIT statement. They go to the right side of






The three bit Hamming error detector is the trivial
case. The decision is in favor of the winning bits ("two out
of three"), so the syndrome is not really necessary unless
the check bits are wrong (a possibility for which the
Hamming code allows).
The Hamming code is uniform in its protection,
however; once encoded there is no difference between the
message bits (i) and the check bits (c). This is important
in checking longer words for errors. A seven bit message is
checked as in the example given above. Elaborating on the
prototype, Figure 6.4 shows the algorithm to generate the
syndrome for a seven bit parallel error detector. This error
detector requires a three bit syndrome to point at one of
the possible seven error bits in the message. Section A
above illustrates the syndrome generation process, and how
the syndrome word points at the erroneous message bit. The
resulting cifplot is shown in Figure 6.5, and the structure
is similar to the Weinberger array for the three-bit error
detector.
It is good practice to expand the algorithm in
steps, instead of going directly from the prototype to the
final design. Unexpected results can be dealt with better if
this approach is followed.
;HAM7.MAC
;A MacPitts algorithm to implement a 7 bit message error
;correction chip. The Hamming method is used. Four of the





(def 4 phic)
(def msg port input (5 6 7 8 9 10 11))
(def synd1 signal output 12)
(def synd2 signal output 13)
(def synd3 signal output 14)
(def 15 power)
;The Hamming method uses parity checks over bit positions
;1,3,5,and 7 to set the lsb of the syndrome,
;checks over positions 2,3,6,and 7 to set the middle synd bit,
;and checks over positions 4,5,6,and 7 to set the msb of the
;syndrome. The value of the syndrome indicates the bit error
;position in the 7 bit message.
(always
;set lsb of syndrome:
(cond
((parity (bit 0 msg) (bit 2 msg) (bit 4 msg) (bit 6 msg))
(setq synd1 t))
(t
(setq synd1 f)))
;set middle bit of syndrome:
(cond ((parity (bit 1 msg) (bit 2 msg) (bit 5 msg) (bit 6 msg))
(setq synd2 t))
(t
(setq synd2 f)))
;set msb of syndrome:
(cond ((parity (bit 3 msg) (bit 4 msg) (bit 5 msg) (bit 6 msg))
(setq synd3 t))
(t
















The desired algorithm is to uniformly detect errors
in a 15 bit message. Remembering the surprising inability of
MacPitts to compile a six input/one output gate in the data
path, a test algorithm was written for the larger message.
Figure 6.6 is the algorithm to detect errors in a 15 bit
encoded message. The syndrome bits are determined from the
parity checks as follows.
syndrome   message bit check positions
synd1      0 2 4 6 8 10 12 14
synd2      1 2 5 6 9 10 13 14
synd3      3 4 5 6 11 12 13 14
synd4      7 8 9 10 11 12 13 14
The single error detection scheme requires four
bits to select the message bit for correction, thus the four
bit syndrome. Synd1 is the lsb and synd4 is the msb of the
Boolean syndrome word. Figure 6.7 shows the cifplot
resulting from ham15.mac. The structure is predictably
similar to ham7.cif and ham3.cif (Figure 6.3, Figure 6.5).
This algorithm serves as the archetype (chief model, as
opposed to prototype, first model) for the error detector.
The error detector is half of the solution; the other half
is correction of the errors. The detection is feasible, as
proven by this algorithm.

















































algorithm to implement an 11 bit message error
The Hamming method is used. 11 of the




























































f)))
syndrome:
t 1 msg) (bit
t 9 msg) (bit
t))
2 msg) (bit 4 msg) (bit 6
10 msg) (bit 12 msg) (bit
2 msg) (bit 5 msg) (bit 6
10 msg) (bit 13 msg) (bit
msg)
14 msg))
msg)
14 msg)
)
f)))
syndrome
t 3 msg) (bit 4 msg) (bit 5
t 11 msg) (bit 12 msg) (bit
t))
msg) (bit 6 msg
)








msg) (bit 8 msg) (bit 9

14 msg))








                         HAM3     HAM7     HAM15
Chip area [mm**2]        3.473    4.812    11.113
Control path area
  [mm**2]                2.75     1.918    8.025
Number pullups
  [in control]           9        31       71
Number pads              10       15       24
MacPitts pwr. [W]        .03194   .06094   .12265
Powest pwr. (avg) [W]    .02170   .03808   .06191
Powest pwr. (max) [W]    .04341   .07379   .11746
296.23
3.34 1.73











So this method of parallel error detection appears
feasible for word lengths less than 16 bits. The speed is
fast due to the chosen single-state MacPitts architecture
(ALWAYS = one PROCESS with zero stack depth, or for this
purpose, a single-state FSM). These chips are unclocked
circuits. The throughput is not a function of the clock
rate, but depends on the signal propagation time from input
to output. The propagation time sets the upper limit on
throughput, and the capacitive leakage from the Weinberger
array gates sets the lower limit on throughput. If the error
detectors are used in a slow system, the outputs must
therefore be latched to maintain valid logic levels. This is
easily done with MacPitts, by SETQing the results to flags,
and subsequently clocking the flags to output signal ports.
B. HAMMING METHOD 15/4 ERROR CORRECTOR
The previous section is only part of the story. Having
located the error bit in the message, it must now be
corrected. The decision of how to implement the error
detector was a simple one, constrained by syntax. The error
detector/corrector invites other methods of implementation.
1. Design Considerations
The message bit error is pointed at by the syndrome
bits (the syndrome indicates the erroneous bit position).
The error bit needs to be complemented, and the correct
message results. The corrected message is then fed to the
output ports. In this application, the extraneous check bits
are discarded. The check bits (c) are used to encode the
original message, and after reception and decoding they
serve no purpose.
The message error detection and correction
procedure can be reduced to three steps:
1. locate the error
2. complement the error bit
3. set the corrected output word bits
The first step is done with the error detection
part of the algorithm. The second step is straightforward in
MacPitts. Either the output bit is the input message bit
(the correct message bit case), or else the output bit is
the complement of the corresponding message bit (the
incorrect message bit case). The checking is done with the
COND form in MacPitts.
The third step involves discarding the check bits,
setting the correct output bits to the corresponding input
bit values, and sending the complement of the erroneous
input bit to the corresponding output bit position.
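The three steps can be sketched in Python for the seven bit case. This is a behavioural illustration under stated assumptions, not the MacPitts implementation: the function name is hypothetical, the message is a list of seven 0/1 values with index 0 as position 0, and only the four information bits (positions 2, 4, 5, and 6) are returned.

```python
def correct_7bit(msg: list[int]) -> dict[int, int]:
    """Return the corrected data bits, keyed by zero-based message position."""
    # Step 1: locate the error with the three parity checks.
    s0 = (msg[0] + msg[2] + msg[4] + msg[6]) % 2
    s1 = (msg[1] + msg[2] + msg[5] + msg[6]) % 2
    s2 = (msg[3] + msg[4] + msg[5] + msg[6]) % 2
    error_pos = ((s2 << 2) | (s1 << 1) | s0) - 1  # -1 means no error

    # Steps 2 and 3: complement the erroneous bit, pass the others through,
    # and discard the check bits at positions 0, 1, and 3.
    out = {}
    for p in (2, 4, 5, 6):
        out[p] = (msg[p] ^ 1) if p == error_pos else msg[p]
    return out


print(correct_7bit([0, 1, 0, 0, 0, 1, 1]))  # {2: 1, 4: 0, 5: 1, 6: 1}
```

An error in a check bit leaves the four data bits untouched, which is the "no correction needed" branch of each output decision.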
2. Prototype Designs
Bit manipulations require Boolean data types, so
flags and signals are used. The flags store the computed
syndrome bits, and the signals are used for input and
output. Figure 6.8 shows the MacPitts driver, ham3c.mac.
There are three COND statements in ham3c.mac. The
first two determine the results of the message parity
checks, as in the error detection algorithms. The last COND
sets the single message bit according to the result of the
parity checks. If fs1 (flag, synd1) is FALSE and fs0 is
TRUE, then the message bit is incorrect. The output is then
set to the complement of the input bit value. If the form
under the last COND is FALSE, then either there is no error
in the message, or one of the two check bits is
incorrect. In either case, the input message data bit is
correct, so the output data bit (out0) is set to the lsb of
the input message (msg0).
The format of the input is three symbols, two of
which are check bits and one data (information) bit.
bit position 0 1 2
bit function c c i
Only the last bit is returned from the error
correction routine; the two check bits (inserted in the
encoding of the message) are useless at this point. The last
bit is the result of the error correction process, and is
also the output of the prototype design. The algorithm
(ham3c.mac) has the syndrome bits declared as output
signals. This is considered good programming form (MacPitts
being both a language and a silicon compiler), and allows
troubleshooting the algorithm at run time. The syndrome
outputs are unnecessary for the error corrector chip, and
are deleted after verification of the algorithm in the
command interpreter.
The resulting cifplot is Figure 6.9. The BIT
organelles are absent, but two data path organelles
corresponding to the flags fs1 and fs0 are instantiated.
These are the storage elements for the computed syndrome
;HAM3C.MAC
;MacPitts algorithm for single-error detection & correction.
;This algorithm serves as a paradigm for the Hamming single
;error detection and correction problem.
(program ham1 3
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
;msg(n) : the input datum and 2 parity check bits
;out0   : the corrected datum
;synd(n): the bit-checked Hamming error syndromes
;fs(n)  : integer storage flags for the syndrome states
(def msg2 signal input 5)
(def msg1 signal input 6)
(def msg0 signal input 7)
(def out0 signal output 8)
(def synd1 signal output 9)
(def synd0 signal output 10)
(def fs0 flag)
(def fs1 flag)
(def 11 power)
(always ;a 1 state FSM
(cond ;set the lsb of the error-bit syndrome:
((parity msg0 msg2)
(setq synd0 t) (setq fs0 t))
(t
(setq synd0 f) (setq fs0 f)))
(cond ;set the msb of the error-bit syndrome:
((parity msg1 msg2)
(setq synd1 t) (setq fs1 t))
(t
(setq synd1 f) (setq fs1 f)))
(cond ;the fs(n) flag states determine whether
;the output datum requires correction.
((and (not fs1) fs0)













Figure 6.9 Ham3c.cif
values. The Weinberger array writes to and reads from these
flags, as the algorithm suggests. An implication of this
hybrid (data path and control path) structure is slower
speed. This does not necessarily denote slower throughput,
but slower signal speed across the logic circuitry.
To the right of the two flags is a bank of three
dual cascaded vertical inverters. This structure performs a
function analogous to what the clock drivers do for data
path registers (superbuffering and sequencing of the three
phases).
Just as the error detector was tested for the three
bit, seven bit, and 15 bit cases, so is the error corrector
tested next for the case of a seven bit message (the error
corrector incorporates the error detector in its logic).
This section suggests a method whereby the designer
can optimize the MacPitts chip. Three solutions to the error
detection/correction problem are considered. Each is
investigated, and the best solution is chosen as the
archetype for the final 15 bit error corrector chip. The
archetype is chosen on a seven bit basis instead of the
simpler three bit chip. The seven bit error
detector/correctors require more time to design and analyze,
but their performance is more representative of the desired
chip's than is the three bit detector/corrector.
The first method is an elaboration on ham3c.mac.
The algorithm is shown in Figure 6.10, and the cifplot is
Figure 6.11. This algorithm uses three flags (fs0, fs1, and
fs2) to store the individual syndrome bits. The syndrome
bits are subsequently tested in the Weinberger array, and
used to selectively set the four output bits of the
corrected message (out6, out5, out4, and out2). This
solution has the advantage of clarity, and the disadvantage
of slowness due to the hybrid structure and poly run
lengths. In comparing this algorithm to Figure 6.8
(ham3c.mac), it can be inferred that the number of COND
statements in the error detection part of the algorithm is
always the same as the number of parity checks needed.
Similarly, the number of CONDs in the error correction part
equals the number of output data bits.
This version of the chip requires two clock cycles
to produce an output (write the error syndromes to the
flags, then read the flags to determine the correct output).
The throughput is 316,180 results/sec. A result is taken to
be a corrected data word, in this case, a four-bit word.
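The relation between delay, clock cycles per result, and throughput can be sketched in Python as follows. The function name is hypothetical; the delay and cycle values are those listed for the three seven bit correctors in Table 6.2, and the sketch assumes one result is produced every delay x cycles nanoseconds.

```python
def throughput_per_sec(delay_ns: float, cycles_per_result: int) -> float:
    """Results per second for a chip with the given worst-case delay."""
    return 1e9 / (delay_ns * cycles_per_result)


# (name, Crystal delay in ns, clock cycles per result) from Table 6.2
chips = [("ham7cf", 1581.37, 2), ("ham7cs", 491.64, 1), ("ham7cr", 452.94, 1)]

for name, delay_ns, cycles in chips:
    print(name, round(throughput_per_sec(delay_ns, cycles)))
```

The flag-based chip comes out near 316,000 results/sec, and the two signal-based chips at a little over 2 million words/sec.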
Figure 6.12 shows an alternate solution,
ham7cs.mac. This algorithm replaces the three flags with
internal signals, is0, is1, and is2. Internal signals in
MacPitts have the advantage of not requiring time-consuming
storage operations. This architecture reduces the error
corrector to a combinational logic structure, implemented in
the control path due to syntax (all Boolean forms). The
algorithm has a similar structure to the previous one which
used flags to store the syndromes (Figure 6.10). There are
three CONDs to set the syndrome, and four CONDs to set the
output word. The question of internal timing arises: will
MacPitts have the syndrome ready in time for the output word
setting? The answer is yes, because the algorithm executes
sequentially in the order written in the absence of
parallelising forms (COND, PAR, PROCESS).
This algorithm is also faster than the previous
one. The throughput is 2,034,000 words/sec, more than six
times as fast as the chip using flags to store the syndrome.
Another solution considers the PAR form for
paralleling the CONDs. An increase in speed results if the
three CONDs which set the syndrome are paralleled, and then
the four CONDs which set the output are paralleled with PAR.
The throughput of this chip is 2,208,000 words/sec, slightly
faster than the chip without PARs around the CONDs. This
translates into larger structure (Table 6.2). Figure 6.14 is
the MacPitts driver, ham7cr.mac, and Figure 6.15 is the
cifplot.
This version of the error detector/corrector is the
archetype (chief example) for the 15 bit error
detector/corrector. It was developed based on the three bit
prototype (Figure 6.8), refined, tested with the MacPitts
interpreter and Crystal, and is considered the optimal
MacPitts parallel-architecture solution for the seven bit
correction problem. It serves as the model for building the
HAM7Cfth.MAC







































; set msb of s



















t message error corrector, FLAGS for syndromes
cfth 1
)(def 2 phia)(def 3 phib)(def 4 phic)
nal input 5)(def msg1 signal input 6)
input 7)(def msg3 signal input 8)
input 9)(def msg5 signal input 10)
input 11)
output 12)(def out5 signal output 13)
output 14)(def out2 signal output 15)











msg0 msg2 msg4 msg6)
t))
f)))
it of syndrome:
msg1 msg2 msg5 msg6)
1 t))
1 f)))
yndrome:
msg3 msg4 msg5 msg6)
2 t))
2 f)))
s MESSAGE bits are corrected

it 4 (msg bit 5):
(not fs1) fs0















data bit 5 (msg bit 6)
fs1 (not fs0)
(not msg5))
msg5))
it 6 (msg bit 7):
fs1 fs0
(not msg6))

































































;Hamming 7 bit message error corrector, SIGNALS
(program ham7cs 1
(def 1 ground)(def 2 phia)(def 3 phib)(def 4 phic)
(def msg0 signal input 5)(def msg1 signal input 6)
(def msg2 signal input 7)(def msg3 signal input 8)
(def msg4 signal input 9)(def msg5 signal input 10)
(def msg6 signal input 11)
(def out6 signal output 12)(def out5 signal output
(def out4 signal output 14)(def out2 signal output










(cond ((parity msg0 msg2 msg4 msg6)
(setq is0 t))
(t
(setq is0 f)))
;set middle bit of syndrome:
(cond ((parity msg1 msg2 msg5 msg6)
(setq is1 t))
(t
(setq is1 f)))
;set msb of syndrome:
(cond ((parity msg3 msg4 msg5 msg6)
(setq is2 t))
(t
(setq is2 f)))
;data bit 2 (msg bit 3):
(cond ((and (not is2) is1 is0)
(setq out2 (not msg2)))
(t (setq out2 msg2)))
;data bit 4 (msg bit 5):
(cond ((and is2 (not is1) is0)
(setq out4 (not msg4)))
(t (setq out4 msg4)))
;data bit 5 (msg bit 6):
(cond ((and is2 is1 (not is0))
(setq out5 (not msg5)))
(t (setq out5 msg5)))
;data bit 6 (msg bit 7):
(cond ((and is2 is1 is0)
(setq out6 (not msg6)))
;Use SIGNALS instead of FLAGS:




























(def is2 signal internal)
(def is1 signal internal)
(def is0 signal internal)
(def 17 power)
(always

















(setq is
;set msb of s
(cond ((parity






























t message error corrector, using PAR
cr 1
)(def 2 phia)(def 3 phib)(def 4 phic)
nal input 5)(def msg1 signal input 6)
input 7)(def msg3 signal input 8)
input 9)(def msg5 signal input 10)
input 11)
output 12)(def out5 signal output
output 14)(def out2 signal output







al internal)
al internal)
















f)))
it of syndrome:
msg1 msg2 msg5 msg6)
1 t))
1 f)))
yndrome:
msg3 msg4 msg5 msg6)
2 t))
2 f))))




it 4 (msg bit 5):
(not is1) is0
(not msg4))
msg4))
it 5 (msg bit 6):
is1 (not is0)
(not msg5))
msg5))
it 6 (msg bit 7)
is1 is0















15 bit machine (the seven bit model is easier to analyze in
the interpreter, and with Crystal and Esim).
It is impractical to do the preceding design
process beginning with a 15 bit machine. The 15 bit message
cannot be tested in the interpreter (all the inputs and
outputs will not fit on the VT-100 screen), and Caesar and
Crystal analysis is far more complicated with large
structures. It is better to optimize with a smaller model,
and then extend the results to achieve the desired chip.
Table 6.2 is a parametric comparison of the three
Hamming error detector/corrector chips.





                          ham7cf    ham7cs    ham7cr
Area [mm**2]              7.003     6.305     6.187
Power [W]                 .102      .0931     .0931
Delay [ns]                1581.37   491.64    452.94
Speed [MHz]               .6324     2.034     2.208
Cycles/res.               2         1         1
Throughput [res/s]        .316M     2.034M    2.208M
Speed/area [MHz/mm**2]    .0903     .3226     .3579
Density [tran/mm**2]      53.6      45.7      46.6
The reason for the choice of ham7cr as the model is
seen in Table 6.2. The chip (ham7cr) is smaller and faster
than its predecessors. It has the highest throughput of all
the seven bit correctors. The result of using the PAR form
is seen by comparing the speed/area ratios of ham7cs and
ham7cr. PAR translates into more decisions done
simultaneously, and the decisions are done faster
(speed/area is greater). The result of storing the syndrome
bits in flags (ham7cf) is shown in its comparatively low
throughput and low speed/area figures.
A functional summary of the three prototype
candidate algorithms (flowcharts and resulting floorplans)
is given in Figures 6.16 - 6.21.
4. Hamming 15/4 Error Corrector
The 15 bit error corrector is designed after the
PARalleled COND version of the ham7 algorithm, ham7cr.mac
(Figure 6.14). As explained above, the number of CONDs
expected is the sum of the number of syndrome bits and the
number of corrected data bits out. There are four syndrome
bits for the 15/4 code, and 11 corrected data bits out, for
a total of 15 CONDs in the algorithm. Figure 6.22 shows
ham15dc.mac. The algorithm structure is similar to ham7,
except for the pin naming which has been shortened to make
it easier to enter the data for analysis (Crystal, Caesar
labels, esim). There are four parity checks across the bits
as described in the paragraph on error detection. The parity
parity check on input MSG
set syndrome lsb to flag FS0
\/
parity check on input MSG
set syndrome middle bit to flag FS1
\/
parity check on input MSG
set syndrome msb to flag FS2
\/
set output bit (OUT2)
as f (flag states & MSG2)
\/
set output bit (OUT4)
as f (flag states & MSG4)
\/
set output bit (OUT5)
as f (flag states & MSG5)
\/
set output bit (OUT6)








Figure 6.16 Ham7cf Flowchart
GND phia phib phic MSG0 MSG1












Vdd OUT6 OUT5 OUT4
Figure 6.17 Ham7cf Floorplan
parity check on input MSG
set syndrome lsb to signal IS0
\/
parity check on input MSG
set syndrome middle bit to signal IS1
\/
parity check on input MSG
set syndrome msb to signal IS2
\/
set output bit (OUT2)
as f (signals & MSG2)
\/
set output bit (OUT4)
as f (signals & MSG4)
\/
set output bit (OUT5)
as f (signals & MSG5)
\/
set output bit (OUT6)







Figure 6.18 Ham7cs Flowchart









Vdd OUT3 OUT2 OUT1


























Figure 6.20 Ham7cr Flowchart










Vdd OUT3 OUT2 OUT1
Figure 6.21 Ham7cr Floorplan
checks result in four syndrome internal signals. The
internal signals translate to feedback within the Weinberger
array. After the bit error is identified by the syndrome
pattern, it is corrected. There are 11 CONDs which
accomplish the bit-wise correction of the output word, one
for each bit which is not an encoding bit (positions 0, 1,
3, and 7).
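The COND count can be checked with a short Python sketch. It is an illustration only; the function name is hypothetical, and it assumes zero-based positions with the check (encoding) bits at positions p where p + 1 is a power of two, which gives positions 0, 1, 3, and 7 for the 15 bit word.

```python
def cond_counts(n_bits: int) -> tuple[int, int, int]:
    """Return (syndrome CONDs, data-bit CONDs, total CONDs) for n_bits."""
    # p + 1 is a power of two exactly when (p + 1) & p is zero.
    check_positions = [p for p in range(n_bits) if ((p + 1) & p) == 0]
    syndrome_bits = len(check_positions)
    data_bits = n_bits - syndrome_bits
    return syndrome_bits, data_bits, syndrome_bits + data_bits


print(cond_counts(7))   # (3, 4, 7)
print(cond_counts(15))  # (4, 11, 15)
```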
The algorithm compiled to cif, as expected. The
size of the Weinberger array (155 columns) required a long
time for compilation, approximately 3.5 hours (at night) on
the VAX 11/780 at Naval Postgraduate School. The resulting
labelled cifplot is shown in Figure 6.23. The circuit is an
expansion of the seven bit Hamming error correctors, but
larger. The seven bit chip has seven CONDs, the 15 bit chip
has 15. The result of COND in the algorithm is NOR gates in
the Weinberger array. The chip measures 5.1371 mm by 4.005
mm, for an area of 20.57 sq. mm. There are 238 pullup
transistors, so the Powest-calculated power dissipation of
0.1229 W (average) is no surprise (MacPitts estimates the
power consumption as 0.16086 W). The Powest estimated
maximum dc power is 0.2321 W. Crystal timing analysis
predicts a maximum delay of 1222.94 ns, for a maximum data
rate of 818 kHz and therefore a maximum throughput of
818,000 results/sec (8,998,000 bits/sec). The circuit
density is sparse, as seen in the cifplot, and the average



































































































































































< def 36 signal







nternal)
(
is1 signal i
is3 signal i
nternal)












((parity m0 m2 m4 m6 m8 m10 m12 m14)(
(t (
middle bit of syndrome:
((parity m1 m2 m5 m6 m9 m10 m13 m14)(
(t (
next bit syndrome:








is0 f)))
is1 t))









((parity m7 m8 m9 m10 m11 m12 m13 m14)(setq
(t (setq











((and (not is3) (not
(setq s2 (not m2)))
(t (setq s2 m2)))
bit 4 (m5)
((and (not is3) is2
(setq s4 (not m4)))
(t (setq s4 m4)))
bit 5 (m6)
((and (not is3) is2
(setq s5 (not m5)))
































((and (not is3) is2 is1 is0)
(setq s6 (not m6)))
(t (setq s6 m6)))
bit 8 (m9)
((and is3 (not is2)
(setq s8 (not m8)))
(t (setq s8 m8)))
bit 9 (m10)
((and is3 (not is2)
(setq s9 (not m9)))
(t (setq s9 m9)))
bit 10 (m11)
((and is3 (not is2) is1
(setq s10 (not m10)))
(t (setq s10 m10)))
bit 11 (m12)
((and is3 is2 (not is1)
(setq s11 (not m11)))
(t (setq s11 m11)))
bit 12 (m13)
((and is3 is2 (not is1)
(setq s12 (not m12)))
(t (setq s12 m12)))






((and is3 is2 is1 (not
(setq s13 (not m13)))
(t (setq s13 m13)))
bit 14 (m15)
((and is3 is2 is1 is0)
(setq s14 (not m14)))
(t (setq s14 m14)))))))
is0))
Figure 6.22 Ham15dc.mac (continued)
Figure 6.23 Ham15dc.cif
is due in part to the absence of a data path. If just the
Weinberger array is considered, however, the circuit density
is approximately 100 transistors/sq. mm. Appendix D contains
the script recording of the compilation of ham15dc.mac.
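The headline figures quoted for the 15 bit chip follow from simple arithmetic, sketched here in Python as a cross-check. The variable names are hypothetical; the dimensions, the Crystal delay, and the 11 data bits per result are the values stated in the text.

```python
width_mm, height_mm = 5.1371, 4.005
delay_ns = 1222.94
data_bits_per_result = 11

area_sq_mm = width_mm * height_mm
rate_khz = 1e6 / delay_ns           # one result per worst-case delay
results_per_sec = round(rate_khz) * 1000
bits_per_sec = results_per_sec * data_bits_per_result

print(round(area_sq_mm, 2))  # 20.57
print(round(rate_khz))       # 818
print(bits_per_sec)          # 8998000
```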
The transistor densities given in Table 6.2 are
derived from MacPitts chips. A comparison with standard
library cell densities derived from Newkirk and Mathews












[Ref. 11: p. 20]
ADDER









So the MacPitts chips are far less dense than even
the library macro cells. The Newkirk-Mathews cells only
consider the cell itself, and not the chip, which was the
basis on which the MacPitts densities were calculated.
Nevertheless, a density factor of 10 is a considerable
difference (the MacPitts chips in this chapter are
approximately 50% circuitry, and 50% white space, so a
density factor of five is still significant).
VII. CONCLUSION
A. SUMMARY
This thesis has considered the effects of syntax on
circuit structure in the MacPitts silicon compiler. The
combinational logic structure is explicitly specified by
syntax in the data path, and the appropriate behavior
results. The circuit behavior is explicitly specified in the
control path, and the combinational logic structure (a
Weinberger array) results.
Combinational logic structures in the data path comprise
adjoined MacPitts macros (organelles). Combinational logic
structure in the control path, however, is always done in a
Weinberger array. The poly runs internal and external to the
Weinberger array make combinational logic operate slower
there than in the equivalent circuit in the data path.
Parallelism of logical functions is possible in MacPitts by
using the COND and PAR forms. These paralleling forms
usually equate to a speed/area tradeoff on the chip.
Sequential logic in MacPitts is implemented as a Mealy-
type FSM. The state registers store the present state, and
receive present input information from both the control path
and the sequencer tail organelle. The data path width, as
declared in the PROGRAM statement, determines the number of
states possible for the FSM. This must be determined by the
designer a priori, and explicitly stated in the PROGRAM
statement. The long poly runs between the data path and
control path make the MacPitts FSM slow compared with
the handcrafted equivalent. The 8:1 ratioed
superbuffered input pads add to this slowness, because of
the number of NOR gates one pad may have to drive in the
Weinberger array.
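The relation between the declared width and the number of available states can be sketched as follows. This is a minimal illustration, assuming (the text does not say so explicitly) that states are binary-encoded in a state register as wide as the declared data path. Both function names are hypothetical.

```python
def max_states(width_bits):
    """Number of distinct states a binary-encoded register of this width can hold."""
    if width_bits < 1:
        raise ValueError("width must be at least one bit")
    return 2 ** width_bits


def min_width(num_states):
    """Smallest register width (in bits) able to encode num_states states."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 2; one bit is the floor.
    return max(1, (num_states - 1).bit_length())


print(max_states(2))   # a 2-bit width allows up to 4 states
print(min_width(5))    # a 5-state machine needs 3 bits
```

Under this assumption the designer would compute `min_width` for the intended state count before writing the PROGRAM statement, and widen it further if the data itself needs more bits.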
The FSM architecture and its attendant Mealy sequencer
organelles are implicitly specified by the PROCESS
statement. Each process is an independent entity in
MacPitts, with its own organelles and wires. Processes do
not communicate internally with each other. The PROCESS form
is another method of parallelism possible in MacPitts. All
PROCESSES embraced by PROGRAM execute in parallel, at the
speed of the slowest-executing process. This capability
makes MacPitts well-suited for design of controller-oriented
chips.
The chip design process with MacPitts can be understood
initially as algorithmic optimization. The test algorithm is
written, tested in the interpreter, and compiled to cif.
Then an expanded version of the test algorithm is written
and tested in the interpreter. The expanded version is
compiled to cif, a circuit extraction is made, and the
electrical characteristics and speed of the chip are
determined. Alternate solutions are then considered, and
tested in the same fashion. The best of these is chosen as
the archetype for the desired chip. The archetype must have
sufficiently few signals, ports, registers, and flags to
permit testing in the interpreter (a maximum of 36). The
algorithm is then expanded again to cover the desired chip
function. The final algorithm is compiled to cif, a circuit
extraction is made, and then the chip is tested
electrically. If there are too many variables to permit
command interpreter display, the algorithm is tested with a
switch-level simulator (this exercises both the algorithm
and the circuit). Further analyses with a power estimator
and a timing analyzer are done to see that the chip operates
within specifications. If the chip operates too slowly,
parallelism should be applied to the algorithm where
possible, in an attempt to trade silicon area for speed.
B. RECOMMENDATIONS
This thesis also investigated a number of MacPitts
errors and shortcomings. The following recommendations
should be considered:
1. Have the light controller chips fabricated by
MOSIS for testing at the Naval Postgraduate School, and
compare with the results from Crystal.
2. The Weinberger array errors as depicted in
Chapter II are thought to result from incorrect
installation of MacPitts under Unix 4.2. It would
be fruitful to search for a Unix-dependent roundoff
error in the instantiation of
partial-gate-input-ground-right and
partial-gate-input-ground-left.
The poly interconnections between data and control
also suffer a lateral displacement/gap error, and
the solution to the partial gate problem is likely
to solve this one also. Similar errors were also
noted in the data path, usually between vertical
metal lines and horizontal Vdd/GND busses.
3. New Mead-Conway organelles (cf. Chapter III) should
be tried as replacements for the MacPitts data path
organelles. This will require comparison between
similar structures with Powest and Crystal, and
selection of the better circuit. MacPitts will
connect the new organelles properly if the pitch is
preserved.
4. The error of shorted flag traces occurs almost
every time a flag is declared. The vertical flag
lines intersect the horizontal clock traces at a
via cut, which shorts the flag signal and does not
permit it to pass to control. This
error is best solved by a conditional test in the
routing algorithm. If the flag traces run close to
the Vdd/ground comb, then the traces must be moved
in towards the center of the chip.
5. The possibility of replacing the slow Weinberger
array with a PLA should be considered. This
solution will entail a complete rewrite of the
control.lisp source file, and major modification to
other files which depend on or interact with
control.lisp. A study of plague and plagen (or
eqntott and tpla) is the best place to start, with
a view towards replacing the Weinberger array with
a compact PLA. The difficulty will lie in the
interface between the PLA logic equation
specification (in plague or eqntott) and the
MacPitts algorithmic language.
6. The problem of vestigial instantiation (sequencers,
unconnected vertical poly runs from the data path)
could be solved with a simple test using list
processing primitives. If the organelles or wires
are not needed, then skip the instantiation
process.
7. The problem of the unconnected Vdd bus only occurs
in very small chips, but should be simple to
correct. A metal routing up and to the left, to
connect to the Vdd comb, is required. The simple
solution is to explicitly specify a connecting wire
in the CLL-like language used in the MacPitts
source code. The more instructive solution is to
write the Franz LISP code to decide if a jumper
wire is needed, and if so, to create one.
8. A menu invoking Crystal, Esim, Powest, and Mextra
would speed up the design cycle. The menu could be
incorporated in MacPitts, but would probably be
just as good external to MacPitts. A timing
analysis step is needed as part of chip
compilation, however. Had one existed during the Hamming
15/4 error corrector example (Chapter VI), the
choice of an archetype chip would have been
simpler.
9. The VT-100 terminal screen is too small to display
the interpreter session of all the signals, flags,
registers, and ports which occur on even a
moderate-sized MacPitts chip. A windowing
capability is needed. The source file
interpret.lisp contains the command interpreter
logic. The interpreter is functionally a dynamic
debugger, similar to those in CP/M or VMS (but
without the ability to change the source code). The
interpreter has a very slow response time to
terminal inputs for all but the simplest chip
algorithms, and it would be useful to speed it up
also if other modifications are planned.
10. SPICE would be a valuable addition to timing
analysis. Currently, SPICE 2g6 is not installed on
the VAX-11/780 at the Naval Postgraduate School. A plot
of the SPICE output is also desirable, but is not
available under the currently installed version of
Unix 4.2.
11. The capability to scale the MacPitts designs to
sizes other than multiples of 200 or 250
centimicrons is needed for future applications. The
ability to scale in multiples of 25 centimicrons is
suggested, where the designer chooses the option at
compile time in the MacPitts '(options) field.
12. MacPitts currently places pads on only three sides
of the chip frame. A better design would permit
pads to be placed on all four sides of the chip.
This would also allow faster chips, due to
shortened inter-chip wires.
13. The capability of automatic test vector generation
and evaluation is lacking. The command interpreter
should be able to access an existing file for
testing and write the results of the tests to
another file.
14. The ability to display transistor density as one of
the compiler statistics should be incorporated.
This would be a simple task, since MacPitts already
computes the chip dimensions and the number of
transistors, and writes each of these values to the
statistics output file.
15. A serial implementation of the Hamming 15/4 error
detector/corrector should be attempted using
primitive polynomials [Ref. 13], [Ref. 5:pp. 200].
The throughput should be compared to the parallel
15/4 error corrector. The interesting problem is to
solve the differing bandwidths at the input and
output of the shift register. MacPitts may not be
able to cope with this requirement, and will likely
be slower than the parallel architecture (in the
throughput sense) regardless.
16. A MacPitts prototype FIR or IIR digital filter
should be attempted. The first model should be an
FIR four-bit prototype, and this algorithm can then
be expanded to the floating point version of larger
word length. An excellent reference for the
designer is [Ref. 14:pp. 541], where the
algorithmic aspects of digital filter design are
explained.
17. Faster graphics are required for the VLSI graphics
terminal (Caesar). A better (i.e., quicker)
terminal should be considered.
18. The Backus-Naur form (BNF) file included with the
MacPitts source code specifies allowed algorithmic
syntax. The macro and lambda forms should be
investigated with a view to incorporating macros
into the algorithms.
19. It would speed up the design time and confer added
versatility on MacPitts if the input port width
could be specified as a variable. The word lengths
would then be assigned according to another single
statement in the MacPitts algorithm. For instance
(def face port input (*))
(def data word width 16)
would assign a 16-bit width to the variable <face>,






























































r 11 > )
nel 1 e and
ne 1 1 e and
ne 1 1 e and








-1 ( ( (port-f nput d)
-2 ((( por t-
1
nput c)
-3 ( ( (port- 1 nput b )
-4 (<( port- 1 nput a)
( ( ( Internal 4 ) ) ) ) )
(port-fnput e)M)
( Interna 1 1 ) ) ) )
{ Internal 2 ) ) ) )
( 1 nterna 1 3 > ) ) >
phlc ) )
hlb > )















(I 0) (port-Input a 0))
(I b 0) (port-Input b 0>)
(I c 0) (port-Input c 0>)
(I d 0) (port-Input d 0))
(I (e 0) (port-Input e 0))
(outputs (z 0) (port-output z 0)))))




















































































c - for project fivand

























- Maximum
- Number
Reading source file - fivand.mac





















c - Data-path has 5 Units
1383, 901 - Outputing .obj file
1413, 501 - Extruding gates
c - Control has columns
1516, 997 - Extruding straps
c - Circuit has 98 transistors
c - Control has tracks
c - Power consumption is 0.038114 Watts
1679, 1095 - Laying out data-path
1815, 1192 - Organelle unit# 1 bit
2014, 1290 - Organelle unit# 2 bit
2168, 1391 - Organelle unit# 3 bit
2332, 1498 - Organelle unit# 4 bit
2385, 1498 - Organelle unit# 5 bit
:







2539, 1600 - Laying
2542, 1600 - Laying
2543, 1600 - Laying
2545, 1600 - Laying
2547, 1600 - Laying
2683, 1699 - Laying
c - Dimensions are 1
5299, 3105 - Outputing
c - Memory used - 357K
c - Compilation took 1.534722
c - Garbage collection took













Script of Compilation of Data Path Five Input AND Gate
94 GND 41500 71700;
94 Vdd 52000 76800;
94 Vdd 57700 76800;
94 Vdd 70000 76800;
94 Vdd 75700 76800;
94 Vdd 88000 76800;
94 Vdd 93700 76800;
94 Vdd 106000 76800;
94 Vdd 111700 76800;
94 b 43200 76900;
94 c 61200 76900;
94 d 79200 76900;
94 e 97200 76900;
94 z 113500 76900;
94 b 43200 76900;
94 GND 48000 76900;
94 c 61200 76900;
94 GND 66000 76900;
94 d 79200 76900;
94 GND 84000 76900;
94 e 97200 76900;
94 GND 102000 76900;
94 z 113500 76900;
94 b 46300 77100;
94 c 64300 77100;
94 d 82300 77100;
94 e 100300 77100;
94 b 45000 77100;
94 c 63000 77100;
94 d 81000 77100;
94 e 99000 77100;
94 b 46300 77800;
94 c 64300 77800;
94 d 82300 77800;
94 e 100300 77800;
94 GND 53700 78100;
94 GND 71700 78100;
94 GND 89700 78100;
94 GND 107700 78100;
94 a 41500 78600;
94 41 59500 78600;
94 42 77500 78600;
94 43 95500 78600;
94 a 41500 78600;
94 45 48000 78600;
94 41 59500 78600;
94 47 66000 78600;
94 42 77500 78600;
94 49 84000 78600;
94 43 95500 78600;
94 51 102000 78600;
94 z 116500 78900;
94 z 116500 78900;
94 54 53200 79300;
94 55 71200 79300;
94 56 89200 79300;
94 57 107200 79300;
94 a 46200 79400;
94 41 64200 79400;
94 42 82200 79400;
94 43 100200 79400;
94 a 46300 79600;
94 41 64200 79600;
94 42 82300 79600;
94 43 100300 79600;
94 54 48000 79900;
94 41 54200 79900;
94 55 66000 79900;
94 42 72200 79900;
94 56 84000 79900;
94 43 90200 79900;
94 57 102000 79900;
94 z 108200 79900;
94 54 49800 80400;
94 41 55500 80400;
94 55 67800 80400;
94 42 73500 80400;
94 56 85800 80400;
94 43 91500 80400;
94 57 103800 80400;
94 z 109500 80400;
94 a 46300 80400;
94 41 64300 80400;
94 42 82300 80400;
94 43 100300 80400;
94 Vdd 52000 80600;
94 Vdd 57700 80600;
94 Vdd 70000 80600;
94 Vdd 75700 80600;
94 Vdd 88000 80600;
94 Vdd 93700 80600;
94 Vdd 106000 80600;
94 Vdd 111700 80600;
94 54 49800 81600;
94 41 55500 81600;
94 55 67800 81600;
94 42 73500 81600;
94 56 85800 81600;
94 43 91500 81600;
94 57 103800 81600;
94 z 109500 81600;
94 54 49800 82400;
94 41 55500 82400;
94 55 67800 82400;
94 42 73500 82400;
94 56 85800 82400;
94 43 91500 82400;
94 57 103800 82400;
94 z 109500 82400;
94 z 116500 83600;
94 e 97200 84900;
94 z 109400 84900;
94 z 113500 84900;
94 d 79200 86100;
94 43 91400 86100;
94 43 95500 86100;
94 c 61200 87400;
94 42 73400 87400;
94 42 77500 87400;
94 b 43200 88600;
94 41 55400 88600;
94 41 59500 88600;
94 a 41500 89900;









































u 1 00 . 2 s
























Is dr 1 ven
. . through
Is dr 1 ven
. . through
Is dr 1 ven
. . through
Is dr I ven
. . through










s dr 1 ven
U : 00 . 1
S
ca 1 -g 5a


















































































: critic l .dum
[ : 00 . 1
: quit
[0:00. 4u 0:00.4s 36k] Crystal done,
% "D





































]41 .2ns, fal 1
405 2 2





19 . 4ns .rise
Data Path Five Input AND Critical Nodes
% powest -p < a5andcr.sim
gamma=0.4V**.5, tox=9e-08m, u0=0.08m**2/V-s
vdd=5V, vtd=-3.5V, vte=0.8V, vsb=2V

















































































































































































gate 2 ) )
gate 1 )
gate 0)
s 1 gna 1 - I nput
s 1 gna 1 - 1 nput
s I gna 1 - 1 nput
s 1 gna 1 - 1 nput
gate 4 )
gate 3 ) )
gate 2 > )
gate 1 ) )
gate > )
















d) ) ) ) )
a) )
b > )
c > ) ) ) )




























pr 1 m 1
pr t m i t









pr 1 m f t
pr Imlt
pr 1 m 1
te 4 )
te 3>
ive ( s Ignal- 1 nput a))













































ph Ic ) )















0) ) ) ) )
(primitive
( pr 1m ft Ive
( pr 1m 1 1 1 ve
( pr I m 1 1 1 ve






( s Igna 1 - 1 nput a )
)












( s 1 gna 1 - 1 nput e ) )
(signal-Input d>)
( s 1 gna 1 - 1 nput c ) )
( s 1 gna 1 - 1 nput b ) )
(signal-Input a))
z ( s 1 gna
1
-output 2)))))
Control Path Five Input AND .obj File (continued)
Script started on Mon Apr











































stlc - for project fiveand












opt-c stat obj cif nologo)
55 - Reading source file - fiveand.mac
55 - Reading library from - /vlsi/macpit/library
604 - Processing definitions
604 - Evaluating evals
604 - Expanding macros
Extracting sources
destinations
labels
sequencers











stlc - Maximum control depth is 4
stlc - Number of gates is 12










d - 1946, 1286 - Outputing .obj file
d - 2002, 1286 - Extruding gates
stlc - Control
d - 4031, 2417
stlc - Circuit
stlc - Control
stlc - Power consumption is
d - 4183, 2517 - Laying out
stlc - Data-path

























- Dimensions are 1.772500 mm by 1.905000 mm
d - 7361, 4042 - Outputing .cif file
stlc - Memory used - 349K
stlc - Compilation took 2.106111 CPU minutes
stlc - Garbage collection took 1.153889 CPU minutes
stlc - For a total of 41 garbage collections
done on Mon Apr 15 22:34:42 1985
Control Path Five Input AND Gate Script File
94 Vdd 41000 46000;
94 Vdd 45200 47700;
94 Vdd 48200 47700;
94 Vdd 50400 47700;
94 Vdd 53400 47700;
94 Vdd 55700 47700;
94 Vdd 58700 47700;
94 Vdd 62700 47700;
94 Vdd 65700 47700;
94 Vdd 67900 47700;
94 Vdd 73200 47700;
94 Vdd 78400 47700;
94 Vdd 81400 47700;
94 14 45000 48900;
94 15 48000 48900;
94 16 50200 48900;
94 2 53200 48900;
94 18 55500 48900;
94 19 58500 48900;
94 20 62500 48900;
94 21 65500 48900;
94 22 67700 48900;
94 23 73000 48900;
94 24 78200 48900;
94 25 81200 48900;
94 14 45500 51200;
94 15 48500 51200;
94 16 50700 51200;
94 2 53700 51200;
94 18 56000 51200;
94 19 59000 51200;
94 20 63000 51200;
94 21 66000 51200;
94 22 68200 51200;
94 23 73500 51200;
94 24 78700 51200;
94 25 81700 51200;
94 14 45000 52900;
94 15 48500 52900;
94 16 50200 52900;
94 2 53700 52900;
94 18 55500 52900;
94 19 59000 52900;
94 20 62500 52900;
94 21 66000 52900;
94 22 67700 52900;
94 23 73000 52900;
94 24 78200 52900;
94 25 81700 52900;
94 d 71200 53900;
94 22 67700 53900;
94 d 71200 53900;
94 GND 48500 54700;
94 GND 78700 54700;
94 GND 81700 54700;
94 GND 46700 54900;
94 GND 52000 54900;
94 GND 57200 54900;
94 GND 64200 54900;
94 GND 69500 54900;
94 GND 80000 54900;
94 GND 46700 54900;
94 GND 52000 54900;
94 GND 57200 54900;
94 GND 64200 54900;
94 GND 69500 54900;
94 GND 80000 54900;
94 15 48500 55900;
94 25 81700 55900;
94 2 53700 56700;
94 18 56000 56700;
94 19 59000 56700;
94 20 63000 56700;
94 22 68200 56700;
94 24 78700 56700;
94 16 50200 57900;
94 GND 56000 58700;
94 GND 59000 58700;
94 GND 63000 58700;
94 GND 68200 58700;
94 GND 78700 58700;
94 GND 52000 58900;
94 GND 57200 58900;
94 GND 64200 58900;
94 GND 69500 58900;
94 GND 80000 58900;
94 a 41500 59900;
94 a 41500 59900;
94 23 73000 59900;
94 16 50700 60700;
94 18 56000 60700;
94 19 59000 60700;
94 20 63000 60700;
94 22 68200 60700;
94 24 78700 60700;
94 14 45000 61900;
94 GND 56000 62700;
94 GND 59000 62700;
94 GND 63000 62700;
94 GND 68200 62700;
94 GND 78700 62700;
94 GND 46700 62900;
94 GND 57200 62900;
94 GND 64200 62900;
94 GND 69500 62900;
94 GND 80000 62900;
94 b 43200 63900;
94 b 43200 63900;
94 14 45500 64700;
94 18 56000 64700;
94 20 63000 64700;
94 22 68200 64700;
94 24 78700 64700;
94 19 59000 65000;
94 21 66000 65900;
94 GND 56000 66700;
94 GND 59000 66700;
94 GND 63000 66700;
94 GND 68200 66700;
94 GND 78700 66700;
94 GND 74700 66900;
94 GND 46700 66900;
94 GND 57200 66900;
94 GND 64200 66900;
94 GND 69500 66900;
94 GND 74700 66900;
94 GND 80000 66900;
94 e 76500 67900;
94 19 59000 67900;
94 e 76500 67900;
94 15 48500 68700;
94 20 63000 68700;
94 22 68200 68700;
94 23 73500 68700;
94 24 78700 68700;
94 21 66000 69000;
94 c 60700 69900;
94 18 55500 69300;
94 c 60700 69900;
94 GND 48500 70700;
94 GND 63000 70700;
94 GND 66000 70700;
94 GND 78700 70700;
94 GND 46700 70900;
94 GND 64200 70300;
94 GND 80000 70900;
94 24 78200 71900;
94 15 48500 72700;
94 20 62500 73900;
94 GND 48500 74700;
94 GND 46700 74900;
94 a 41500 75900;
94 b 43200 75900;
94 2 53700 75900;
94 c 60700 75900;
94 d 71200 75900;
94 e 76500 75900;









































































































c - for project gc
c - options: (herald opt-d opt-c stat obj
64, 57 - Reading source file - gc.mac
















































































path has 7 Units
22 - Outputing .obj file
22 - Extruding gates
ol has 31 columns
85 - Extruding straps
it has 280 transistors

















































































































ge collection took 2,
total of 79 garbage
CPU minutes
328889 CPU minutes











































































































c - for project gc2
c - options: (herald opt
61, 54 - Reading source
64, 54 - Reading librar
882, 596 - Processing d
884, 596 - Evaluating e
967, 596 - Expanding ma
986, 596 - Extracting s
1084, 692 - Extracting
1086, 692 - Extracting
1087, 692 - Extracting
1090, 692 - Extracting
- Maximum control dept














- Extr ud 1 ng
has 288 tran
has 13 track

































































































-d opt-c stat obj cif nologo)
file - gc2.mac
y from - /vlsi/macpit/library






















untt# 1 bit 1















c - Dimensions
17205, 8788 - Outputing
c - Memory used - 408K
c - Compilation took 4.8
c - Garbage collection t























500 mm by 1
.elf f 1 1e
982500 mm
23334 CPU minutes
























































































c - for project stop
c - options: (herald opt-d opt-c stat obj cif nologo)
Reading source file - stop.mac





1 - Extracting sources
1 - Extracting destinations
1 - Extracting labels
1 - Extracting sequencers
1 - Extracting flags, data-path
um control depth is 5






























































































268 transistors
14 tracks
ption is 0.054698 Watts
aying out data-path
rganelle unit# 1 bit 1































skeleton
pins
re 2.107500 mm by 2.207500 mm
Outputing .cif file
- 403K
took 6.877223 CPU minutes
ection took 3.587222 CPU minutes


















































































































































































































































































































































t-c stat obj cif nologo)
- b4.mac








, data-path, control, and pins
4
file
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
le un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un I
le un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
1 e un 1
le un 1
1 e un 1
le un 1
1 e un 1











































































































































































































































































































































;GRAY CODE to BINARY conversion algorithm
(program gc2s 2





4 phic)
reset signal input 5)
inp signal input 6)












(cond ((not
(inp
compl




(cond ((not
(inp
inp) (setq
(setq
inp) (setq
(setq
inp) (setq
(setq
bin (not inp)) (go msbs))
bin inp) (go compl)))
bin inp) (go compl))
bin (not inp)) (go nextbit)))
bin (not inp)) (go nextbit))
bin inp) (go compl)))))
THIS ALGORITHM EXHIBITS THE GRAY CODE
DECODING SCHEME DONE IN THE CONTROL PATH.
THE ONLY DATA PATH ORGANELLES INSTANTIATED
ARE THOSE ASSOCIATED WITH THE SEQUENCER. THE
WIDTH OF THE SEQUENCER (2 BITS) IS DEFINED
EXPLICITLY IN THE PROGRAM STATEMENT, EVEN
THOUGH NO ACTUAL DATA PATH (AS SUCH) EXISTS.
THE IMPLICATION IS THAT FSMs CAN BE CREATED
WITHOUT AN "ACTUAL DATA PATH".
Gcs.mac
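The decoding rule that a serial Gray code converter of this kind implements can be modelled in software. The Python sketch below is not MacPitts code and is not taken from the thesis. It assumes the Gray-coded word arrives one bit per clock, most significant bit first, and that each binary output bit equals the input bit when the previous output was 0 and its complement when the previous output was 1 (the standard Gray-to-binary rule, which the `compl` and `nextbit` state labels appear to reflect). The function names are hypothetical.

```python
def gray_to_binary_serial(gray_bits):
    """Convert a Gray-coded bit sequence (MSB first) to binary, one bit per step."""
    binary_bits = []
    previous_output = 0  # before the MSB arrives there is no prior output
    for g in gray_bits:
        if g not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        # Pass the input through after a 0 output, complement it after a 1.
        b = g ^ previous_output
        binary_bits.append(b)
        previous_output = b
    return binary_bits


def binary_to_gray(binary_bits):
    """Inverse mapping, used here only to cross-check the decoder."""
    gray_bits = []
    previous_input = 0
    for b in binary_bits:
        gray_bits.append(b ^ previous_input)
        previous_input = b
    return gray_bits


print(gray_to_binary_serial([1, 1, 0, 1]))
```

Because the only memory needed is the previous output bit, a software model like this needs one bit of state plus the sequencing, which is consistent with a design that has no data path beyond the sequencer.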















































































































































































ns: (herald opt-d opt-c stat obj cif nologo)
Reading source file - gcs.mac
Reading library from - /vlsi/macpit/library
- Processing definitions




2 - Extracting destinations
2 - Extracting labels
2 - Extracting sequencers
2 - Extracting flags, data-path, control, and pins
um control depth is 4
r of gates is 25
path has 4 Units
78 - Outputing .obj file
78 - Extruding gates
ol has 29 columns
32 - Extruding straps
it has 215 transistors
ol has 13 tracks




































Organelle unit# 1 bit 1
Organelle unit# 1 bit
Organelle unit# 2 bit 1
Organelle unit# 2 bit
Organelle unit# 3 bit 1
Organelle unit# 3 bit
Organelle unit# 4 bit 1
Organelle unit# 4 bit
















- Laying
s are 1.742500 mm by 1
- Outputing .cif file
ed - 377K
on took 4.008611
ollection took 2










(program dplc2 5 ;there are 5 outputs
(def 13 power)
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def c signal input 5) ;note use of Boolean inputs
(def tl signal input 6)
(def ts signal input 7)
(def reset signal input 14)
(def lc port output (8 9 10 11 12)) ;and integer outputs
(process light_controller ;stipulates FSM architecture
hg ;HIGHWAY GREEN state




(t (setq lc 5)
(go hy)))
hy ;HIGHWAY YELLOW state





(t (setq lc 13)
(go fg)))
fg ;FARMROAD GREEN state
(cond ((not (or tl (not c)))
(setq lc 16)
(go fg))
(t (setq lc 17)
(go fy)))
fy ;FARMROAD YELLOW state




(t (setq lc 19)





























































stlc - for project dplc2
stlc - options: (herald opt-d opt-c stat obj cif nologo)
d - 62, 55 - Reading source file - dplc2.mac
d - 68, 55 - Reading library from - /vlsi/macpit/library
d - 905, 604 - Processing definitions
d - 906, 504 - Evaluating evals
604 - Expanding macros
702 - Extracting sources
destinations
labels
sequencers














d - 2277, 1498
d - 2410, 1498
stf.c - Control
d - 8931, 4725






- Maximum control depth is 5
- Number of gates is 34
- Data-path has 4 Units







stlc - Power consumption is 0.056716 Watts
e unit# 1 bit 4
d - 9580, 5048 -
d - 9922, 5267 -
d - 10156, 5379 -
d - 10207, 5379 -
d - 10375, 5498 -
d - 10533, 5607 •
d - 10859, 5718 •
d - 1 1242. 5928 -
d - 1 1266, 5928 -
d - 1 1291 , 5928 •
d - 11316, 5928 -
d - 11552, 6042
d - 11590, 6042
d - 1 1722, 6148
d - 1 1748, 6148
d - 11777, 6148 •
d - 12052, 6272
d - 12068, 6272
d - 12080, 6272 •
d - 12204, 6383
d - 12216, 6383
stlc - Data-path
d - 12313, 6383 -
d - 14457, 7438 -
d - 14461 , 7438 -
d - 14506, 7438 -
d - 14521 , 7438 -
d - 14578, 7438 -
stlc - Dimensions
d - 18275, 9184 -
stlc - Memory used
stlc - Compilation took 5.164444
stlc - Garbage collection took 2.
































are 2.160000 mm by 2



































































[0:00. 5 u 0i































» c 1 ear
[0:00.0u 0:
: del ay ph
1








































































































at (154, -155) to Vdd after
at 23.99ns
at ( 158, -106) to 93
at (156, -59) to GND after
h at 18.05ns
at (5, -61 ) to Vdd after
at 9.33ns
at (69, -113) to GND after
at 6.31ns
at (75, -124) to Vdd after
at 1.95ns
at (76, -153) to 4








ns I stor f 1 ow. .
.
to 1 . . .
to 0. .
.
exam 1 ned .
)
00.0s 52k]
driven high at 32.06ns
rough fet at (154, -155) to Vdd after
riven low at 29.11ns
rough fet at (158, -106) to 93
rough fet at (156, -59) to GND after
driven high at 23.17ns
rough fet at (5, -61) to Vdd after
riven low at 14.46ns
rough fet at (69, -113) to GND after
riven high at 11.43ns
rough fet at (75, -124) to Vdd after
riven low at 6.97ns
rough fet at (76, -153) to 4











...through fet at (119, -126) to GND after
Is driven high at 2.67ns
...through fet at (118, -106) to 88
...through fet at (117, 11) to Vdd after
1b Is driven high at 0.00ns
lu 0:00. Is 52k]
leal -g sp laph lb
lu 0:00.1s 52k3
7u 0:00.5s 52k] Crystal done.
done on Sat Jun 15 15:16:58 1985



























































































Sat Jun 15 15:18:00 1985
n/crysta 1 1 t . s 1m
39k]
b c tl ts
48k]
f 11 hl0 hi 1
48k]




n low at 10.1Sns
fet at (569, 453)
fet at (568, 570)
fet at (456, 538)
fet at (480, 537)
high at 4.92ns
fet at (416, 930)
low at 0.75ns
fet at (365, 942)






















Is dr 1 ve
. through

















Is dr 1 v
























































































Crystal Analysis of PLA Light Controller Chip
Script started on Thu Jun 13 23:30:02 1985
% powest -p < lt.sim
gamma=0.4V**.5, tox=9e-08m, u0=0.08m**2/V-s
vdd=5V, vtd=-3.5V, vte=0.8V, vsb=2V
#devs  Pdc_avg (W)  Pdc_max (W)  type
    0     0.000000     0.000000  enhancement pullups
   20     0.011980     0.023959  depletion pullups
   15     0.030536     0.061072  special depletion pullups
   35     0.042516     0.085032  TOTAL
% ^D
script done on Thu Jun 13 23:31:12 1985
Powest Analysis of PLA Light Controller Chip
% /vlsi/berk85/bin/crystal stop.sim
:inputs c tl ts rst
:outputs st hl0 hl1 fl0 fl1




:clear
:set 1 phia
:delay phib -1
:delay phic -1
:critical (5S.67ns)
:clear
:set phib phic
:delay phia -1
:critical (17.55ns)
:clear
:set phib phic
:delay phia -1
:critical (54.S3ns)
:clear













































stlc - for project ham15.4
stlc - options: (herald opt-d opt-c stat obj cif nologo)
d - 59, 52 - Reading source file - ham15.4.mac
d - 78, 52 - Reading library from - /vlsi/macpit/library
d - 890, 591 - Processing definitions
d - 894, 591 - Evaluating evals
d - 980, 591 - Expanding macros
d - 2822, 1405 - Extracting sources
d - 2982, 1511 - Extracting destinations
d - 3015, 1511 - Extracting labels
d - 3015, 1627 - Extracting sequencers
d - 3131, 1627 - Extracting flags, data-path, control, and pins
stlc - Maximum control depth is 7
stlc - Number of gates is 140
stlc - Data-path has Units
d - 9964, 4968 - Outputing .obj file
d - 10373, 4968 - Extruding gates
stlc - Control has 155 columns
d - 586415, 233036 - Extruding straps
stlc - Circuit has 715 transistors
stlc - Control has 42 tracks
stlc - Power consumption is 0.160860 Watts
d - 589965, 234452 - Laying out data-path
stlc - Data-path internal bus uses tracks
d - 589967, 234452 - Laying out control
d - 599196, 239812 - Laying out flags
d - 599197, 239812 - Laying out river
d - 599206, 239812 - Laying out wing
d - 599281, 239812 - Laying out skeleton
d - 599325, 239812 - Laying out pins
stlc - Dimensions are 5.137500 mm by 4.005000 mm
d - 606259, 242522 - Outputing .cif file
stlc - Memory used - 529K
stlc - Compilation took 168.593902 CPU minutes
stlc - Garbage collection took 67.456947 CPU minutes
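Recommendation 14 of Chapter VII proposes reporting transistor density, since the compiler already writes the transistor count and chip dimensions to its statistics. The Python sketch below shows that calculation using the figures from the compilation statistics above. It is an illustration only, not part of MacPitts: the function name is hypothetical, and density is taken here as transistors per square millimetre of total chip area, white space included.

```python
def transistor_density(transistors, width_mm, height_mm):
    """Transistors per square millimetre of total chip area."""
    if transistors < 0 or width_mm <= 0 or height_mm <= 0:
        raise ValueError("count must be non-negative and dimensions positive")
    return transistors / (width_mm * height_mm)


# Figures copied from the statistics lines of the compilation above.
density = transistor_density(715, 5.137500, 4.005000)
print(f"{density:.1f} transistors per square mm")
```

The same function could be applied to the statistics of any of the other compilations in this appendix whose transistor count and both dimensions are legible.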




1. Conradi, J. R. and Hauenstein, B. R., VLSI Design
of a Very Fast Pipeline Carry Look Ahead Adder,
Master's Thesis, Naval Postgraduate School,
Monterey, California, September 1983.
2. Carlson, D. J., Application of a Silicon Compiler
to VLSI Design of Digital Pipelined Multipliers,
Master's Thesis, Naval Postgraduate School,
Monterey, California, June 1984.
3. Froede, A. O., Silicon Compiler Design of
Combinational and Pipeline Adder Integrated
Circuits, Master's Thesis, Naval Postgraduate
School, Monterey, California, June 1985.
4. Mead, C., and Conway, L., Introduction to VLSI
Systems, Addison-Wesley, 1980.
5. Hamming, R., Coding and Information Theory,
Prentice-Hall, 1980.
6. Lincoln Laboratory, Massachusetts Institute of
Technology Project Report RVLSI-3, An Introduction
to MacPitts, by J. R. Southard, 10 February 1983.
8. Weinberger, A., "Large Scale Integration of MOS
Complex Logic: A Layout Method", IEEE Journal of
Solid State Circuits, v. SC-2, pp. 182-190,
December 1967.
9. Computer Science Division, Department of
Electrical Engineering and Computer Sciences,
University of California, Berkeley, EQNTOTT,
by J. Ousterhout, pp. 1-6, 1981.
10. Computer Science Division, Department of
Electrical Engineering and Computer Sciences,
University of California, Berkeley, TPLA,
by R. N. Mayo, pp. 1-5, 1983.
11. Newkirk, J., and Mathews, R., The VLSI Designer's
Library, Addison-Wesley, 1983.
12. University of California Regents, The Franz LISP
Manual, by J. K. Foderaro, p. 11-1, 1980.
13. McEliece, R. J., Encyclopedia of Mathematics and
Its Applications, vol. 3, (Probability, The Theory
of Information and Coding), pp. 142-146, Addison-Wesley,
1977.
14. Rabiner, L. R., and Gold, B., Theory and




Electronics Research Laboratory, College of
Engineering, University of California, Berkeley, Memo.
No. ERL-M520, SPICE2: A Computer Program to Simulate
Semiconductor Circuits, by L. Nagel, 9 May 1975.
Electronics Research Laboratory, College of
Engineering, University of California, Berkeley, Memo.
No. ERL-M592, Program Reference for SPICE2, by Ellis
Cohen, 14 June 1976.
Electronics Research Laboratory, College of
Engineering, University of California, Berkeley, Memo.
No. UCB/ERL M80/7, The Simulation of MOS Integrated
Circuits Using SPICE2, February 1980 (Revised October
1980).
Mavor, J., Denyer, P. B., and Jack, M. A.,
Introduction to MOS LSI Design, Addison-Wesley, 1983.
Muroga, S., VLSI System Design, Wiley-Interscience,
1982.




Martin, D., and Prata, S., UNIX Primer Plus,
H. Sams Co., 1983.





1. Library, Code 0142
Naval Postgraduate School
Monterey, California 93943-5100












5. Chairman, ECE Department, Code 62RR
Naval Postgraduate School
Monterey, California 93943-5100
6. Mr. Antun Domic, B-347








8. Robert C. Larrabee
5313 Angus Dr.
Virginia Beach, Virginia 23464
9. Mr. R. N. Larrabee
P.O. Box 96
Hollywood, Maryland 20636































