Proceedings of the Second Caltech Conference on Very Large Scale Integration, held at the California Institute of Technology 19-21 January, 1981 ; Organized by the Caltech Computer Science Department and the Caltech Industrial Associates Office, and sponsored by Caltech Industrial Associates and the National Science Foundation by Seitz, Charles L.
PROCEEDINGS 
of the Second Caltech Conference on 
VERY LARGE SCALE INTEGRATION 
held at the 
California Institute of Technology 
19-21 January, 1981 
Organized by the Caltech Computer Science Department 
and the Caltech Industrial Associates Office 
and sponsored by 
Caltech Industrial Associates 
and the National Science Foundation 
Editor: Charles L. Seitz 
© Copyright, 1981, Caltech Computer Science Department, all rights reserved 

TABLE OF CONTENTS 

FOREWORD iv 

INVITED SPEAKERS SESSION 
Chairperson: Charles L. Seitz 
The MPC Adventures 
Lynn A. Conway 5 
MOSIS - The ARPA Silicon Broker 
Danny Cohen, George Lewicki 29 
Fast Turnaround Fabrication for Custom VLSI 
Gunnar A. Wetlesen 45 
Longer Term Directions for Semi-Custom VLSI 
Gordon B. Hoffman 55 

FABRICATION SESSIONS 
Chairperson: James D. Meindl 63 
Trends in Silicon Processing 
V. Leo Rideout 65 
Electron Beam Testing and Restructuring of Integrated Circuits 
D. C. Shaver 111 
Two Timing Samplers 
Edward H. Frank, Robert F. Sproull 127 
The Role of Test Chips in Coordinating Logic and Circuit Design 
and Layout Aids for VLSI 
Martin G. Buehler, Loren W. Linholm 135 

INNOVATIVE LSI DESIGNS SESSION 
Chairperson: Gerald J. Sussman 153 
Bit Serial Inner Product Processors in VLSI 
Misha R. Buric, Carver A. Mead 155 
A Smart Memory Array Processor for Two Layer Path Finding 
Christopher R. Carroll 165 
Special Purpose Hardware for Design Rule Checking 
Larry Seiler 197 
A VLSI Tactile Sensing Array Computer 
John E. Tanner, Marc H. Raibert, Raymond Eskenazi 217 
CALTECH CONFERENCE ON VLSI, January 1981 
COMPUTER-AIDED DESIGN SESSION 
Chairperson: Martin Newell 235 
Algorithmic Layout of Gate Macros 
Daniel D. Gajski, Avinoam Bilgory, Joseph Luhukay 237 
SLIM: A Language for Microcode Description and Simulation in VLSI 
John Hennessy 253 
Signal Delay in RC Tree Networks 
Paul Penfield, Jr., Jorge Rubinstein 269 
Functional Verification in an Interactive Symbolic IC Design 
Environment 
Bryan Ackland, Neil Weste 285 
A Methodology for Improved Verification of VLSI Designs Without 
Loss of Area 
Louis K. Scheffer 299 

INNOVATIVE CIRCUIT DESIGNS SESSION 
Chairperson: Thomas F. Knight, Jr. 311 
Considerations for an Analog Four Quadrant SC Multiplier 
Phillip E. Allen, William H. Cantrell 313 
A One Transistor RAM for MPC Projects 
James J. Cherry, Gerald L. Roylance 329 
PLA Design in NAND Structure 
Chong Ming Lin 343 
A Multiproject Chip Approach to the Teaching of Analog MOS 
LSI and VLSI 
Yannis P. Tsividis, Dimitri A. Antoniadis 355 

DESIGN DISCIPLINES SESSION 
Chairperson: Martin Rem 373 
Towards a Formal Treatment of VLSI Arrays 
Lennart Johnsson, Danny Cohen, Uri Weiser, Alan L. Davis 375 
A Notation for Designing Restoring Logic Circuitry in CMOS 
Martin Rem, Carver Mead 399 
A Structured Approach to VLSI Layout Design 
M. S. Krishnan 413 
Minimum Propagation Delays in VLSI 
Carver Mead, Martin Rem 433 
Towards More Realistic Models of Computations for VLSI 
B. M. Chazelle, L. M. Monier 441 
A Logic Design Theory for VLSI 
John P. Hayes 455 

ARCHITECTURE SESSION 
Chairperson: Alan L. Davis 477 
A Restructurable Integrated Circuit for Implementing Programmable 
Digital Systems 
Rob Budzinski, John Linn, Satish Thatte 481 
Communication in a Tree Machine 
Sally A. Browning, Charles L. Seitz 509 
The Torus: An Exercise in Constructing a Processing Surface 
Alain J. Martin 527 
Architecture for VLSI Design of Reed-Solomon Encoders 
K. Y. Liu 539 
Communications for Next Generation Single Chip Computers 
David R. Smith, Douglas Chan 555 

FOREWORD 
As Lynn Conway pointed out in her invited talk (page 5), the two-year 
period between that first VLSI conference held at Caltech in January 1979 
and this Second Caltech Conference on VLSI "... has been one of tremendous 
activity in VLSI, a time of real discovery and rapid progress." 
Let me mention two of the important milestones reached in this period. 
Regular and reliable channels for those of us in universities to fabricate 
our designs were established, and several existing and new companies are 
organizing to provide such services commercially. The building of clean 
interfaces between design and fabrication, and the possible restructuring 
and broadening of the design base of the microcircuit industry along this 
pattern, is the theme of the invited speakers session that opened the 
conference. 
Introduction to VLSI Systems by Carver Mead and Lynn Conway was published 
in the Fall of 1980, and, with some stimulus also in the form of 
"teacher's courses," the VLSI design courses and project laboratories 
pioneered in a few universities and innovative companies seem since to be 
spreading exponentially. 
That Lynn Conway and Carver Mead were the central figures in both of these 
accomplishments, and that their energies have been directed at these two 
projects that have made the VLSI research community more cohesive and 
cooperative, is surely a testimony to their insight and character. 
The technical sessions were organized to provide a broad view -- including 
fabrication, innovative designs, design tools, design disciplines, and 
architecture -- of research efforts underway in industry, government, and 
universities. 
The 28 papers presented were selected by the organizing committee from 
nearly five times as many submitted. We received many more excellent 
papers than we could accept for presentation and publication. 
We at Caltech are very pleased with the alternation established with the 
MIT Conference on Advanced Research in Integrated Circuits in January 
1980 and January 1982, and recommend to the interested reader the 
proceedings of the January 1982 conference and of other conferences held 
at the University of Edinburgh (August 1981) and Carnegie-Mellon 
University (October 1981). Proceedings from these conferences were 
published by Academic Press (VLSI 81, Edinburgh), by Computer Science 
Press (VLSI Systems and Computations, Carnegie-Mellon), and by Artech House, 
Dedham, MA (Proceedings, Conference on Advanced Research in VLSI, MIT, 
January 1982). Proceedings of the Caltech conferences, January 1979 and 
January 1981, will continue to be available through the Computer Science 
Librarian, Caltech 256-80, Pasadena, CA 91125. 
Alas, the commercial publishers we approached in the fall of 1980, prior 
to the publication of the Mead & Conway text, were not yet ready to 
publish VLSI Proceedings, and we had to undertake this job internally. 
Your editor greatly regrets the extraordinary delays we have experienced 
in preparing this 600-page document, and apologizes to the authors and to 
those who had to wait for their orders to be filled. Very special thanks 
go to my secretary, Vivian Davies, for her care and persistence in 
assembling the document, making arrangements with printers, and sorting 
out the orders after a turnover in our staff, as well as for reminding the 
editor frequently of his duties. 
This conference was organized jointly by the Caltech Computer Science 
Department and the Caltech Industrial Associates Office, and was sponsored 
by the Industrial Associates and by the National Science Foundation. My 
thanks to Bernie Chern of NSF for his support and assistance in expanding 
the representation at the conference to many more universities. 
Finally, let me express my thanks and appreciation to the technical 
program committee, consisting of Forest Baskett, Xerox PARC and Stanford 
University; Alan L. Davis, University of Utah; Lee Hollaar, University of 
Utah; Paul Hudak, University of Utah; Lennart Johnsson, Caltech; Thomas F. 
Knight, Jr., Massachusetts Institute of Technology; James D. Meindl, 
Stanford University; Glenn Miranker, IBM Corporation; Martin Newell, Xerox 
PARC; Martin Rem, Technical University, Eindhoven, and Caltech; James A. 
Rowson, Caltech; Dick Sites, Digital Equipment Corporation; Harold Stone, 
University of Massachusetts; and Gerald J. Sussman, Massachusetts 
Institute of Technology, for classifying and refereeing the very large 
number of papers over a brief period before Christmas. 
Charles L. Seitz 
Conference Chairperson and Proceedings Editor 
INVITED SPEAKERS SESSION 
Chairperson: Charles L. Seitz 
Associate Professor of Computer Science 
California Institute of Technology 
Where the invited speakers session of our 1979 VLSI conference was devoted 
to a survey of the evolving research areas, for this conference we tried 
to select a single topic of current interest to people from industry, 
government, and universities: the "silicon foundry" and "implementation 
system." 
There are many definitions of the "silicon foundry," from simply a factory 
that fabricates chips "designed elsewhere," to Gordon Hoffman's definition 
of a semi-custom integrated circuit (page 61) as one that appears custom 
to the user but standard to the manufacturer. This departure from the 
situation most often found today, in which designers work for the same 
company that fabricates their chips, indeed usually in the same building 
as the fabrication line, is reminiscent of the period in the 1950's when 
users of computers started in earnest to supplement and specialize the 
programs provided by computer manufacturers. 
The forces behind this possible restructuring of the microcircuit industry 
are similar: the need to broaden the design capability of the industry, 
and the differences in business organization that encourage and reward the 
designer and the foundry (see Carver Mead's article on "VLSI and 
Technological Innovation" in the 1979 Proceedings). 
The technical problems and solutions also may well be similar, and can be 
described in terms of "clean interfaces" that must be established between 
design and fabrication. These interfaces may be at many different levels. 
One expects the inevitable tradeoffs between low-level representations 
that take longer to design but which achieve very high density and 
performance, and high-level representations that reduce design time and do 
not squeeze as much onto a chip as possible. If the fabrication technology 
is outrunning the designers, reminiscent of course of the situation with 
computer hardware and software, perhaps we must learn to use this 
remarkable fabrication technology in ways that optimize returns rather 
than silicon. 
The five talks in this session -- unfortunately Professor Carver Mead's 
talk was not captured on tape due to a defective tape cassette --
represent a progression from the rationale for this approach, to the first 
experimental implementation system, to the first automated production 
implementation system MOSIS, to a commercial startup to provide 
fabrication services for custom designs addressing now additional issues 
in yield, testing, and customer education, and finally to systems in which 
the interface is elevated from the exchange of mask geometry to function, 
a two step interface that permits still more independence between design 
and fabrication. 
These papers were reconstructed from tape and/or the author's notes, with 
an attempt to retain the "first person" spontaneity of the talks, and 
where mistakes are found, they are certainly the fault of this editor and 
not of the author. Please let me convey here my appreciation to the 
authors for sharing their ideas in this conference. 
THE MPC ADVENTURES: 
Experiences with the Generation of 
VLSI Design and Implementation Methodologies 
Lynn A. Conway 
Research Fellow, and 
Manager, VLSI System Design Area, 
Palo Alto Research Center, Xerox Corporation 

1. Introduction 
It's great to be here with you today. I remember an equally sunny January day here in Pasadena 
when the first VLSI Conference was held at Caltech two years ago. That seems such a short while 
ago, but the period since has been one of tremendous activity in VLSI, a time of real discovery and 
rapid progress. I'm really looking forward to the Technical Sessions of the next two days, to 
hearing about some of the best recent work in this exciting field. 
My talk today is about "The MPC Adventures", namely the multi-university, MultiProject Chip 
escapades of the past two years. I'll describe these adventures, and the new VLSI implementation 
system that made possible the economical, fast-turnaround implementation of VLSI design projects 
on such a large scale. I'll also describe the experiences I've had with the processes involved in 
generating new cultural forms such as the "Mead-Conway" VLSI design and implementation 
methodologies. One of my objectives today is to help you visualize the role that the "MPC 
Adventures" played in the generation of the methodologies. 
I am particularly interested in developing effective research methodologies in the sciences of the 
artificial, especially in areas such as engineering design. The sort of question that really interests me 
is: How can we best organize to create, validate, and culturally integrate new design methods in 
new technologies? What are the research dynamics involved? Consider the following: 
When new design methods are introduced in any technology, especially in a new technology, a 
large-scale exploratory application of the methods by many designers is necessary in order to test 
and validate the methods. A lot of effort must be expended by a lot of people, struggling to create 
many different systems, in order to debug the primitives and composition rules of the methodology 
and their interaction with the underlying technology. A similar effort must also be expended to 
generate enough design examples to evaluate the architectural possibilities of the design methods 
and the technology. That is the first point: A lot of exploratory usage is necessary to debug and 
evaluate new design methods. The more explorers that are involved in this process, and the better 
they are able to communicate, the faster the process runs to any given degree of completion. 
Suppose some new design methods have been used and fairly well debugged by a community of 
exploratory designers, and have proven very useful. Now consider the following question: How can 
you take methods that are new, methods that are not in common use and therefore perhaps 
considered unsound methods, and turn them into sound methods? In other words, how can you 
cause the cultural integration of the new methods, so that the average designer feels comfortable 
using the methods, considers such usage to be part of their normal duties, and works hard to 
correctly use the methods? Such cultural integration requires a major shift in technical viewpoints 
by many, many individual designers. Changes in design practices usually require changes in the 
social organization in which the designer functions. These are difficult obstacles to overcome. We 
see that numbers are important again, leading us to the second point: A lot of usage is necessary to 
enable sufficient individual viewpoint shifts and social organization shifts to occur to effect the cultural 
integration of the methods. The more designers involved in using the new methods, and the better 
they are able to communicate with each other, the faster the process of cultural integration runs. 
When methods are new and are still considered unsound, it is usually impossible in traditional 
environments to recruit and organize the large numbers of participants required for rapid, thorough 
exploration and for cultural integration. Therefore, new design methods normally evolve via rather 
ad hoc, undirected processes of cultural diffusion through dispersed, loosely connected groups of 
practitioners, over relatively long periods of time. (Think, for example, of the effect of the 
vacuum-tube-to-transistor technology transition on the design practices of the electronic design community, 
or of the effect of the discrete-transistor-to-TTL technology transition.) When the underlying 
technology changes in some important way, new design methods exploiting the change compete for 
market share of designer mind-time, in an ad hoc process of diffusion. Bits and pieces of design 
lore, design examples, design artifacts, and news of successful market applications move through the 
interactions of individual designers, and through the trade and professional journals, conferences, 
and mass media. When a new design methodology has become widely integrated into practice in 
industry, we finally see textbooks published and university courses introduced on the subject. 
I believe we can discover powerful alternatives to that long, ad hoc, undirected process. Much of 
this talk concerns the application of methods of experimental computer science to the particular case 
of the rapid directed creation, validation, and cultural integration of the new VLSI design and VLSI 
implementation methods within a large computer-communication network community. 
First I will sketch the evolution of the new VLSI design methods, the new VLSI design courses, and 
the role that implementation played in validating the concepts as they evolved. Next I'll bring you 
up to date on the present status of the methods, the courses, and the implementation systems. 
Finally, I'll sketch some of the methods that were used to direct this evolutionary process. We'll reflect a 
bit on those methods, and look ahead to other areas where such methods might be applied. 
2. Evolution of the VLSI Design Courses; Role of the MPC Adventures 
In the early 1970's, Carver Mead began offering a pioneering series of courses in integrated circuit 
design here at Caltech. The students in these courses in MOS circuit design were presented the 
basics of industrial design practice at the time. Some of these students went on to do actual design 
projects, and Carver found that even those without backgrounds in device physics were able to 
complete rather ambitious projects after learning these basics. These experiences suggested that it 
might be feasible to create new and even simpler methods of integrated system design. 
In the mid 1970's, a collaboration was formed between my group at Xerox PARC and a group led 
by Carver here at Caltech, to search for improved methods for VLSI design. We undertook an 
effort to create, document, and debug a simple, complete, consistent method for digital system 
design in nMOS. We hoped to develop and document a method that could be quickly learned and 
applied by digital system designers, folks skilled in the problem domain (digital system architecture 
and design) but having limited backgrounds in the solution domain (circuit design and device 
physics). We hoped to generate a method that would enable the system designer to really exploit 
the architectural possibilities of planar silicon technology without giving up the order of magnitude 
or more in area-time-energy performance sacrificed when using the intermediate representation of 
logic gates as in, for example, traditional polycell or gate-array techniques. 
Our collaborative research on design methodology yielded important basic results during '76 and 
'77. We formulated some very simple rules for composing FET switches to do logic and make 
registers, so that system designers could easily visualize the mapping of synchronous digital systems 
into nMOS. We formulated a simple set of concepts for estimating system performance. We created 
a number of design examples that applied and illustrated the methods. 
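
As a rough illustration of what "composing FET switches to do logic" can look like, the sketch below models an nMOS gate as a pull-down network of series and parallel switches under a pull-up load: the output is low exactly when the network conducts to ground. This is a generic switch-level model, not the specific rules formulated in the collaboration, and the names `Switch`, `Series`, `Parallel`, and `gate_output` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Switch:
    """An enhancement-mode FET that conducts when its gate input is 1."""
    gate: str


@dataclass(frozen=True)
class Series:
    """Conducts only if every part conducts."""
    parts: Tuple["Network", ...]


@dataclass(frozen=True)
class Parallel:
    """Conducts if at least one part conducts."""
    parts: Tuple["Network", ...]


Network = Union[Switch, Series, Parallel]


def conducts(net: Network, inputs: Dict[str, int]) -> bool:
    """Return True if the switch network forms a closed path."""
    if isinstance(net, Switch):
        return inputs[net.gate] == 1
    if isinstance(net, Series):
        return all(conducts(p, inputs) for p in net.parts)
    return any(conducts(p, inputs) for p in net.parts)


def gate_output(pull_down: Network, inputs: Dict[str, int]) -> int:
    """The pull-up holds the output at 1 unless the pull-down path conducts."""
    return 0 if conducts(pull_down, inputs) else 1


# Two switches in series give NAND; two in parallel give NOR.
nand = Series((Switch("a"), Switch("b")))
nor = Parallel((Switch("a"), Switch("b")))
```

Under this model, series and parallel composition of switches is enough to express any inverting gate, which is the sense in which a system designer can map logic directly onto switches.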
The Mead-Conway Text 
Now, what could we do with this knowledge? Write papers? Just design chips? I was very aware 
of the difficulty of bringing forth a new system of knowledge by just publishing bits and pieces of it 
in among traditional work. 
I suggested the idea of writing a book, actually of evolving a book, in order to generate and 
integrate the methods, and in August 1977 Carver and I began work on the Mead-Conway text. 
We hoped to document a complete, but simple, system of design knowledge in the text, along with 
detailed design examples. We quickly wrote a preliminary draft of the first three chapters of this 
text, making use of the Alto personal computers, the network, and the electronic printing systems at 
PARC. In parallel with this, Carver stimulated work on an important design example here at 
Caltech, the work on the "OM2". Dave Johannsen carefully applied the new design methods, as 
they were being documented, refined and simplified, to the creation of this major design example. 
We then decided to experimentally debug the first three chapters of material by interjecting them 
into some university MOS design courses. An initial draft of the first three chapters (Ref. 1a) was used by 
Carlo Sequin at U.C. Berkeley, and by Carver Mead at Caltech in the fall of '77. During the fall 
and winter of '77-'78, Dave Johannsen finished and documented the new OM2 design. The OM2 
provided very detailed design examples that were incorporated into a draft of the first five 
chapters (Ref. 1b) of the text. We distributed that draft in February '78 into spring semester courses by 
Bob Sproull at CMU, and by Fred Rosenberger at Washington University, St. Louis. 
We were able to debug and improve the material in these early drafts by getting immediate 
feedback from the '77-'78 courses. We depended heavily on use of the ARPAnet for electronic 
message communications. Our work rapidly gained momentum. A number of people joined to 
collaborate with us during the spring of '78: Bob Sproull at CMU and Dick Lyon at PARC created 
the CIF 2.0 specification; Chuck Seitz prepared the draft of Chapter 7 on self-timed systems; H. T. 
Kung and several others contributed important material for Chapter 8 on Concurrent Processing. 
By the summer of '78 we completed a draft of the manuscript of the entire textbook (Ref. 1c). 
The MIT '78 VLSI Design Course 
During the summer of 1978, I prepared to visit M.I.T. to introduce the first VLSI system design 
course there. This was to be a major test of the full set of new methods and of a new intensive, 
project-oriented form of course. I also hoped to thoroughly debug the text prior to publication. I 
wondered: How could I really test the methods and test the course contents? The answer was to 
spend only half of the course on lectures on design methods; then in the second half, have the 
students do design projects. I'd then try to rapidly implement the projects and see if any of them 
worked (and if not, find out what the bugs were). That way I could discover bugs, or missing 
knowledge, or missing constraints in the design methods or in the course curriculum. 
I prepared a detailed outline for such a course, and printed up a bunch of the drafts of the text. 
Bob Hon and Carlo Sequin organized the preparation of a "Guide to LSI Implementation" (Ref. 2) that 
contained lots of practical information related to doing projects, including a simple library of cells 
for I/O pads, PLA's, etc. I then travelled to M.I.T. and began the course. It was a very exciting 
experience, and went very well. We spent seven weeks on design lectures, and then an intensive 
seven weeks on the projects. Shortly into the project phase it became clear that things were 
working out very well, and that some amazing projects would result from the course. 
While the students were finishing their design projects, I cast about for a way to get them 
implemented. I wanted to actually get chips made so we could see if the projects worked as 
intended. But more than that, I wanted to see if the whole course and the whole method worked, 
and if so, to have demonstrable evidence that it had. So I wanted to take the completed layout 
descriptions and very quickly turn them into chips, i.e. implement the designs. (We use the term 
"VLSI implementation" for the overall process of merging the designs into a starting frame, 
converting the data into patterning format, making masks, processing wafers, dicing the wafers into 
chips, and mounting and wire-bonding the chips into packages.) 
We were fortunate to be able to make arrangements for fast implementation of those student 
projects following the MIT course. I transmitted the design files over the ARPAnet from M.I.T. on 
the east coast to some folks in my group at PARC on the west coast. The layouts of all the student 
projects were merged together into one giant multiproject chip layout, a trick developed here at 
Caltech, so as to share the overhead of maskmaking and wafer fab over all of the designs. The 
project set was then hustled rapidly through the prearranged mask and fab services. Maskmaking 
was done by Micro-Mask, Inc., using their new electron-beam maskmaking system, and wafer 
fabrication was done by Pat Castro's Integrated Circuit Processing Lab (ICPL) at HP Research, in 
Palo Alto. We were able to get the chips back to the students about six weeks after the end of the 
course. A number of the M.I.T. '78 projects worked, and we were able to uncover what had gone 
wrong in the design of several of those that didn't. 
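
The economics of the multiproject chip can be sketched in a few lines: a fixed mask-and-fab cost is paid once per run and divided among the merged projects. The sketch below places project bounding boxes on a die with a naive shelf packer and splits one run's cost by area. The project names, dimensions, and cost figure are made-up placeholders, and the packing strategy is an assumption for illustration, not the merging procedure actually used at Caltech or PARC.

```python
from typing import Dict, List, Tuple

Project = Tuple[str, int, int]  # (name, width, height) in arbitrary layout units


def shelf_pack(projects: List[Project], die_width: int) -> Dict[str, Tuple[int, int]]:
    """Place bounding boxes left to right in rows ("shelves").

    Returns the lower-left corner assigned to each project.
    """
    placements: Dict[str, Tuple[int, int]] = {}
    x = y = shelf_height = 0
    # Tallest first keeps each shelf reasonably tight.
    for name, w, h in sorted(projects, key=lambda p: -p[2]):
        if w > die_width:
            raise ValueError(f"{name} is wider than the die")
        if x + w > die_width:  # row is full: start a new shelf above it
            x, y = 0, y + shelf_height
            shelf_height = 0
        placements[name] = (x, y)
        x += w
        shelf_height = max(shelf_height, h)
    return placements


def share_cost(projects: List[Project], run_cost: float) -> Dict[str, float]:
    """Split one run's fixed cost among projects in proportion to area."""
    total_area = sum(w * h for _, w, h in projects)
    return {name: run_cost * w * h / total_area for name, w, h in projects}


# Hypothetical projects and run cost, for illustration only.
projects = [("alu", 400, 300), ("ram", 600, 300), ("pla", 200, 100)]
placements = shelf_pack(projects, die_width=1000)
costs = share_cost(projects, run_cost=10000.0)
```

With these placeholder numbers the smallest project carries one sixteenth of the run cost, which is the point of the trick: a small student design pays only for the area it occupies.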
The M.I.T. course led to a very exciting group of projects, some of which have been described in 
later publications. I'll now show a map and some photos of the project chip (see Ref. 6). The 
project by Jim Cherry, a transformational memory system for mirroring and rotating bit map image 
data, is particularly interesting, and was one of those that worked completely correctly. Jim's project 
is described in detail in the second edition of the Hon and Sequin Guidebook (see Ref. 5). 
Another interesting project is the prototype LISP microprocessor designed by Guy Steele, that was 
later described in an M.I.T. AI Lab report (Ref. 3). 
As a result of this course and the project experiences, we uncovered a few more bugs in the design 
methods, found constraints that were not specified, topics that were not mentioned in the text, that 
sort of thing. You can see that the project implementation did far more than test student projects. 
It also tested the design methods, the text, and the course. 
During the spring of '79 we began preparing the final manuscript of the Mead-Conway text for 
publication by Addison-Wesley the following fall (Ref. 4). Hon and Sequin began preparing a major 
revision of the Implementation Guide (Ref. 5) that would contain important things like a CIF primer, new, 
improved library cells, and so forth. I began preparing an "Instructor's Guide", based on the 
experiences and information from the M.I.T. '78 VLSI design course (Ref. 6), containing a detailed course 
outline, a complete set of lecture notes, and homework assignments from that course. These 
materials would help transport the course to other environments. 
The MPC Adventures: MPC79 and MPC580 
I'll now describe the events surrounding the multiproject chip network adventures of the fall of 
1979 and spring of 1980. I remember thinking: "Well, ok, we've developed a text, and also a 
course curriculum that seems transportable. The question now is, can the course be transported to 
many new environments? Can it be transported without one of the principals running the course?" 
In reflecting on the early work on the text, done by communicating with our collaborators via the 
ARPAnet, and in thinking about which schools might be interested in offering courses, I got an 
idea: If we could find ways of starting project-oriented courses at several additional schools, and if 
we could also provide VLSI implementation for all the resulting student projects, we could conduct 
a really large test of our methods. The course might be successful in some schools, and not in 
others, and we could certainly learn a lot from those experiences. I began to ponder the many ways 
we could use the network to conduct such an adventure. 
We began to train instructors from a number of universities in the methods of teaching VLSI 
design. Doug Fairbairn and Dick Lyon ran an intensive short course for PARC researchers during 
the spring of '79, and a videotape (Ref. 7) was made of that entire course. During the summer of '79, we 
began using those tapes as the basis for short, intensive "instructor's courses" at PARC for 
university faculty members. Carver Mead and Ted Kehl also ran an instructor's course at the 
University of Washington, with the help of the PARC tapes, in the summer of '79. All "graduates" 
of the courses received copies of the Instructor's Guide, to use as a script at their schools. 
By early fall of '79, quite a few instructors were ready to offer courses. We at PARC gathered up 
our nerve, and then announced to this group of universities: "If you run courses, we will figure out 
some way so that at the end of your course, on a specified date, we will take in any designs that 
you transmit to us over the ARPAnet; we will implement those projects, and send back 
wire-bonded, packaged chips for all of your projects within a month of the end of your course!" This 
multi-university, multiproject chip implementation effort came to be known as "MPC79". 
About a dozen universities joined to participate in MPC79. As this large university community 
became involved, the project took on the characteristics of a great "network adventure", with many 
people simultaneously doing large projects to test out new ideas. Through the implementation 
effort, students hoped to validate their design projects, instructors would be able to validate their 
offering of the course, and we would be able to further validate and test the design methodology 
and the new implementation methods in development at PARC. 
We coordinated the MPC79 events by broadcasting a series of detailed "informational messages" 
out over the network to the project lab coordinators at each school. MSG#1 announced the service 
and the schedule; MSG#2 distributed the basic library cells, including I/O pads and PLA cells; 
MSG#3 described the "User's Guide" for interactions with the system; MSG#4 contained 
information about the use of CIF2.0; MSG#5 provided last-minute information just prior to the 
design deadline; MSG#6 was sent just after the implementation was completed, and contained 
news about the results of the entire effort. Figure 1 flowcharts the overall activity. 
FIGURE 1. MPC79 Flowchart 

[Flowchart; only the labels recoverable from the source are listed, from top to bottom.] 

USER COMMUNITY: about 100 designers at MIT, Caltech, Carnegie-Mellon Univ., Stanford, 
and other universities (remaining names illegible in the source). Project lab coordinators 
at each school use local electronic mail and file transfer facilities to interact with the 
designers, and use the ARPANET to interact with MPC79. 

Sample request message: TO: MPC79@PARC-MAXC, FROM: REB@MIT-XX, 
SUBJECT: IMPLEMENT PROJ.CIF, shown with a CIF2.0 design file fragment: 
L NM; B L4000 W1000 C 2000,-750; L NP; B L500 W4000 C 2500,-2000; DF; 

DATA COMM. FACILITY: ARPANET (MSG, FTP, TELNET), carrying messages and CIF2.0 
design files in both directions. 

IMPL. FACILITY: 
INFO. MGMT SYSTEM (Xerox PARC/SSL): checking, planning, merging of designs into 
starting frames; meeting of constraints, coordination, logistics. 
"The Foundry": 
MASKMAKING: Micro Mask, Inc. (MEBES electron-beam system) 
WAFER FABRICATION: H-P/ICPL, nMOS silicon gate, lambda = 2.5 microns 
PACKAGING 

Packaged chips, custom wire-bonded per project, along with 
plots, wire-bonding maps, and results of electrical testing, 
to send back to the designers for functional testing. 
The MPC Adventures: Experiences with the Generation of 
VLSI Design and Implementation Methodologies
During this period, Alan Bell pioneered the architecture and teamed up with Martin Newell to
develop a "VLSI Implementation System", which is something like a time-sharing operating system,
or information management system, for providing remote access to mask and fab services. This
system manages all user interactions, manages the data base of design files, handles the logistics, the
scheduling, enabling users all around the country to interact by electronic messages with (what they
perceive to be) an automatic system that implements their projects.
Figure 2 shows a simple block diagram of the basic modules of the system. It contains a user
message handler and an associated design file processing subsystem; these provide a means for
interacting with users to receive requests for service, transmit status and error messages, and build
the design-file data base. It also contains a die-layout planning and design-file merging subsystem
used to pack all of the participants' designs together into a mask specification following the design
deadline time. Finally, it contains a CIF to MEBES (electron beam maskmaking) format-conversion
subsystem to prepare the data files for hand off to the foundry.
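To make the die-layout planning step more concrete, here is a minimal sketch of one way a set of project bounding boxes could be packed into fixed-size die types. This is not the method used at PARC, which the talk does not describe; it is a simple first-fit "shelf" heuristic chosen for illustration. The names `Project`, `Die`, and `pack_projects`, the layout units, and the die dimensions are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    project_id: str
    width: int   # bounding-box width, arbitrary layout units (assumed)
    height: int  # bounding-box height


@dataclass
class Die:
    width: int
    height: int
    # Each shelf is [y_origin, shelf_height, x_used].
    shelves: list = field(default_factory=list)
    # project_id -> (x, y) of the lower-left corner on this die.
    placements: dict = field(default_factory=dict)

    def try_place(self, p: Project) -> bool:
        # First try to fit the project on an existing shelf.
        for shelf in self.shelves:
            y, shelf_height, x_used = shelf
            if p.height <= shelf_height and x_used + p.width <= self.width:
                self.placements[p.project_id] = (x_used, y)
                shelf[2] = x_used + p.width
                return True
        # Otherwise open a new shelf above the existing ones, if it fits.
        y_next = sum(s[1] for s in self.shelves)
        if p.width <= self.width and y_next + p.height <= self.height:
            self.shelves.append([y_next, p.height, p.width])
            self.placements[p.project_id] = (0, y_next)
            return True
        return False


def pack_projects(projects, die_width, die_height):
    """Pack projects into as few dies as this first-fit heuristic manages."""
    dies = []
    # Tallest projects first, so each shelf is opened by its tallest member.
    for p in sorted(projects, key=lambda q: q.height, reverse=True):
        if p.width > die_width or p.height > die_height:
            raise ValueError(f"project {p.project_id} is larger than a die")
        for die in dies:
            if die.try_place(p):
                break
        else:
            die = Die(die_width, die_height)
            die.try_place(p)
            dies.append(die)
    return dies


# Hypothetical project sizes, for illustration only.
projects = [
    Project("A", 60, 40),
    Project("B", 50, 40),
    Project("C", 40, 30),
    Project("D", 100, 100),
]
dies = pack_projects(projects, die_width=100, die_height=100)
for index, die in enumerate(dies):
    print(index, die.placements)
```

A real merging step would also have to handle scribe lines, pad placement, and the starting frames mentioned in Figure 1; the sketch only shows the packing idea.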
[Block diagram: the ARPANET (user messages, system/operator messages, system library files, user
design files) connects to the User Message and Design File Processing Subsystem. That subsystem,
the Die-Layout Planning and Design File Merging Subsystem, and the CIF to MEBES Conversion
Subsystem share a file storage system and exchange control info with an operator terminal. The
outputs are MEBES mask specification files; mask, fab, and packaging schedules and constraints;
and control info, bonding diagrams, and implementation documentation.]
Figure 2. Block Diagram of the VLSI Implementation System
Following is a photo (Fig. 3) of Alan Bell operating the implementation system at PARC during the
very final stages of project merging following the MPC79 design deadline. He's taken almost all of
the designs, as identified in a display menu listing the project ID's, and packed them into the 12
die-types of the project set.
Figure 3. Alan Bell using the Implementation System to merge the MPC79 projects
For MPC79, the implementation system produced MEBES mask specifications containing 82
projects from 124 participating designers, merged into 12 die-types that were distributed over two
mask sets. Thus there was a tremendous sharing of the overhead involved in the maskmaking and
wafer fab. For MPC79 the masks were again made by Micro-Mask, Inc., and wafer fabrication was
again done by HP-ICPL. Several chips of each project type were custom wire-bonded and prepared
for shipment back to the designers, along with "implementation documentation"8 containing pinout
information for the projects, electrical parameter measurements for the wafer lots, etc. Figure 4
provides a visualization of the many projects conveyed through one of the MPC79 wafer types, and
of the corresponding hierarchy of information associated with the project set.
Figure 4. Above: Photo of MPC79 type-A wafer, type-AE die, type AE-7 chip.
At Left: Corresponding hierarchy of informational material.
Just 29 days after the design deadline time at the end of the courses, packaged custom wire-bonded
chips were shipped back to all the MPC79 designers. Many of these worked as planned, and the
overall activity was a great success. I'll now project photos of several interesting MPC79 projects.
First is one of the multiproject chips produced by students and faculty researchers at Stanford
University (Fig. 5). Among these is the first prototype of the "Geometry Engine", a
high-performance computer graphics image-generation system, designed by Jim Clark. That project has
since evolved into a very interesting architectural exploration and development project.9
Figure 5. Photo of MPC79 Die-Type BK (containing projects from Stanford University)
Another project that turned up in MPC79 was a LISP microprocessor10 designed by Holloway,
Sussman, and Steele at MIT and Bell at PARC. This "Scheme-79" chip is a further step in the
evolution of LISP microprocessor architectures by the M.I.T. AI-Lab group. Their work is based on
the prototype LISP microprocessor3 Guy Steele designed for the 1978 MIT course.
The results of this design methodology experimentation and demonstration were very exciting, and
convinced us of the overall merits of the design methods, the courses, and the implementation
infrastructure. We first reported on the results at the M.I.T. VLSI conference in January 1980.11,12
At PARC we then began the transfer of the implementation system technology to an internal
operational group; the transfer was completed during the spring of 1980. That operational group
now has the responsibility of providing VLSI implementation service within Xerox. They ran the
implementation system for a very large group of schools in the spring of 1980, in order to provide
themselves with a full-scale test of the overall operation of the system, and to confirm the success of
the technology transfer. That effort, known as "MPC580"13, had about twice as many participants
as did MPC79. Over 250 designers were involved! They produced so many projects, including a
number of full-die sized projects, that 5 mask sets were required. Although MPC580 involved a lot
of maskmaking and wafer fabrication, the project set was turned around from design-cutoff to
packaged chips in about six weeks.
Some really interesting projects were created by the MPC580 designers. An example is the RSA
encryption chip14 designed by Ron Rivest at MIT. Ron, a computer science theoretician and
faculty member at M.I.T., had taken the VLSI design course the previous fall, and had done a small
project for MPC79. He and several other M.I.T. people then created the prototype RSA encryption
chip architecture and design during the spring of 1980, in time for the MPC580 cutoff.
I think you can now begin to see the role the provision of implementation plays in stimulating
architectural exploration, the offering of design courses, and the creation of design environments.
3. Present Status of the VLSI Design Courses and the VLSI Implementation Systems 
The design methodology introduced in the Mead-Conway text has now become well integrated into 
the university computer science culture and educational curriculum. During the '79-'80 school year, 
courses were offered at about 12 universities. During the present '80-'81 school year, courses are 
being offered at more than 80 universities. 
In addition, a number of industrial firms have begun to offer internal, intensive courses on the
design methodology. For example, courses are being offered at several locations within
Hewlett-Packard, under the leadership of Merrill Brooksby, Manager of Corporate Design Aids at HP. The
HP courses are project oriented, and provide students with fast-turnaround project implementation.
Brooksby believes that in addition to directly improving the skills of HP designers, the course plays
an important role by providing a common internal base of design knowledge through which
designers can communicate about work in other technologies (the "common culture effect").
Similar courses are being offered at DEC, in an effort led by Lee Williams. Many other industrial
firms have begun using an excellent videotape VLSI system design course produced recently by
VLSI Technology, Inc. (VTI).15
Design aid concepts and software are evolving rapidly in the university VLSI research community.
During the work on MPC79, we began to see very interesting new types of analysis aids originating
at MIT. I'm thinking of the work of Clark Baker, Chris Terman, and Randy Bryant, who began
creating circuit extractors, static checkers, and switch simulators of a sort appropriate for our design
methods.16,17 They began to provide access to such analysis aids over the network, aids that could
be easily and efficiently used to partially validate projects prior to implementation. These tools
were used to debug some large projects prior to submission to MPC79 (for example, the Scheme-79
chip). Some of these tools are now in routine use at a number of other universities. I believe we'll
soon see analysis aids embodying these new concepts placed into widespread use in industry.
A VLSI implementation system has been put into use by Xerox Corporate Research to support
exploratory VLSI system architecture and design within Xerox Corporation. Another
implementation system is being operated by USC/ISI for the Defense Advanced Research Projects
Agency's (DARPA) VLSI research community, a community consisting of several large research
universities (including M.I.T., CMU, Stanford, U.C. Berkeley, Caltech, etc.), and a number of
Defense Department research contractors. (Danny Cohen will describe that system in a later talk.)
The initial system architecture of the system used for MPC79, and the operational experiences 
during MPC79, provided the knowledge on which the new Xerox and ISI systems were based. One
of the major improvements contained in both these newer systems is the fully-automated handling 
of user electronic message interactions and management of the design file data base. During 
MPC79, Alan Bell interacted with the designers with some machine assistance in message handling 
(using a menu-based graphical interface that made message-processing and file management 
interactions easy and fast), but in fact he did actually look at all user messages. When we ran 
MPC79, we couldn't predict the bounds on the information that would have to be conveyed 
between designer and system. The generation of that knowledge was an important result of 
MPC79, making it possible to automate the message handling and data base management in later
systems. Our knowledge about the implementation system to foundry interface was also 
considerably expanded and refined during these experiences.18 
As I think back over the origins of the VLSI implementation system, it's clear that we didn't 
initially set out to create such a system. It was really a serendipitous result. We were extremely
motivated and driven to provide VLSI implementation to a large university community. I thought 
that it just might be possible to do that. I realized that pulling off VLSI implementation on such a 
vast scale would generate and propagate a lot of artifacts, and thus announce the presence of the 
new design culture, and help to culturally integrate our methods. So, we began working very hard
at PARC to create ideas to bring down the cost per project and the overall turnaround time, and to 
scale up capabilities for handling as many designers as possible. 
Somewhere along the line I began to use the metaphor that "we're creating something for mask and 
fab that was like the time-shared operating system was for computing systems". Our idea was to 
create a system that provided remote-entry, time and cost-sharing access to expensive capital 
equipment, and that also managed the logistics of providing such access to a large user community. 
At that time, and even now in most integrated circuit design environments, the maskmaking and
wafer fabrication required to implement prototypes for a design project cost about $15,000 to
$20,000, and with some luck take only three or four months getting through the various queues.
(Designers using internal company facilities may not see those costs, but I guarantee they're there;
on the other hand, all IC designers are familiar with those long turnaround times.) With that as
background, we were really amazed when we added up the costs in dollars and time to implement
the projects in MPC79. By using the implementation system to provide shared access for a large
community of users to what amounts to a "fast-turnaround silicon foundry" for rapid maskmaking
and wafer fabrication, we achieved a cost per project on the order of a few hundred dollars, and a
total turnaround time of only 29 days! (And remember, we weren't using internal mask and fab
facilities at PARC, but were instead going to outside foundry services.)
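The order of magnitude of the per-project cost can be checked with a rough calculation. The sketch below assumes, purely for illustration, that each of the two MPC79 mask sets (with its wafer run) cost about as much as a conventional prototype run, $15,000 to $20,000, and it ignores packaging, testing, and staff costs. The talk does not give the actual MPC79 cost breakdown, so the result is only a plausibility check; `cost_per_project` is a hypothetical helper.

```python
def cost_per_project(mask_sets, cost_per_mask_set, projects):
    """Shared mask-and-fab cost divided evenly over all merged projects."""
    return mask_sets * cost_per_mask_set / projects


# Figures from the talk: 82 projects on two mask sets.
# Assumption: each mask set plus wafer run cost $15,000 to $20,000.
low = cost_per_project(mask_sets=2, cost_per_mask_set=15_000, projects=82)
high = cost_per_project(mask_sets=2, cost_per_mask_set=20_000, projects=82)
print(f"about ${low:.0f} to ${high:.0f} per project")
```

Under these assumptions the shared cost comes out at a few hundred dollars per project, which is consistent with the figure quoted above.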
TABLE 1. Computing and Design Environments for 1980-81 VLSI Design Courses at Universities
that participated in MPC79/MPC580. [Reprinted with permission of LAMBDA, The Magazine of
VLSI Design19]
[Table body not legible in this copy. For each university it tabulates COURSE INFORMATION
(instructors, course number, semester or quarter, students per class), COMPUTING ENVIRONMENT
(CPU, operating system, programming language), DESIGN AID ENVIRONMENT (synthesis,
description, analysis, viewing, and testing aids), and PROJECT EXPERIENCE (projects and
designers in MPC79 and MPC580).]
SUMMARY OF DESIGN-AID CODES (as far as legible):
BD   B/W Display
CD   Color Display
CPP  Color Pen Plotter
CRP  Color Raster Plotter
CS   Circuit Simulator
CX   Circuit Extractor
DRC  Layout Design Rule Checker
IGL  Interactive Graphic Layout
IGS  Interactive Graphic Sticks
MG   Module Generator
MI   Module Interconnector
MTE  Minimal Test Environment
PLAG PLA Generator
SGC  Sticks-to-layout Generator/Compressor
SLL  Symbolic Layout Language
SS   Switch Simulators
SSL  Symbolic Sticks Language
Thus we had demonstrated that the time and cost to implement prototype VLSI designs were as
low as they would be using TTL for equivalent size designs. However, once you've successfully
prototyped a design in VLSI, you can take tremendous advantage of the low replication costs and
high performance of VLSI when competing against similar systems implemented in TTL.
Therefore, I believe that in addition to the many business opportunities in VLSI design aids and
chip designs, there must also be substantial business opportunities in the area of VLSI
implementation systems and services, foundry service brokerage, and foundry services.
Those of you who are interested in learning more about the present courses and design aid
environments in the universities might read my recent column19 in Lambda Magazine. I'll now
show a table (see Table 1) from that article that tabulates the courses, the computing and design-aid
environments (as of summer 1980), and the project experience at the key group of 12 universities
that collaborated with us at PARC during MPC79 and MPC580. You can see some interesting
patterns of diffusion and convergence in this table. You can see how new types of analysis aids are
being used this year at most schools to qualify projects for implementation, and how rapidly those
new concepts have swept through this university community, most of whom are on the ARPAnet.
4. Sketch of and Reflections on the Research Methods Used 
How was all of this done? Let's reflect on these events, focussing on the research methods used to
direct and help all of these different things jointly evolve. You'll notice a common idea running
through all of these events: Fast-turnaround implementation provides a means for testing concepts
and systems at many levels. It isn't just used for testing the project chips. It also tests the design
environments, the courses and instructional methods, the text materials, and the design methods.
I'll now describe a basic method of experimental computer science, and sketch how this method was
applied to the generation of the VLSI design and implementation methodologies. Later I'll describe
the resources required in order to direct this sort of large scale, experimental evolution of
engineering knowledge and design practices.
Experimental Method 
There is a basic experimental method that is used in experimental computer science when we are
exploring the space of what it is possible to create. The method is especially applicable when
creating computer languages, operating systems, and various kinds of computing environments, i.e.,
applications where we provide primitives that many other people will use to generate larger
constructs. Suppose that you've conceived of a new system concept, and want to try it out
experimentally. The method is simple: You build a prototype of a system embodying that
concept, run the system, and observe it in operation. You might immediately decide, "Hey, this is
just not feasible," and scrap the idea right there; or you may think, "Well, maybe we can improve
things," or, "Let's try something slightly different," make some revisions, and run the system again.
This simple, iterative procedure is sketched in Figure 6. After the experimentation has generated
sufficient knowledge (for example, has demonstrated the feasibility of the concept), you may make a
transition into some later phase in the evolution of the concept.
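The build, run, observe, and revise procedure just described can be written down as a small loop. The sketch below is only an illustration of that procedure: the `Verdict` names, the `run_experiments` function, the iteration limit, and the toy "defect count" prototype are all assumptions, not anything taken from the MPC work.

```python
from enum import Enum


class Verdict(Enum):
    SCRAP = "scrap"      # "this is just not feasible"
    REVISE = "revise"    # "maybe we can improve things"
    ADVANCE = "advance"  # enough knowledge gained; on to the next phase


def run_experiments(prototype, run, judge, revise, max_iterations=10):
    """Iterate on a prototype until it is scrapped or ready to advance.

    Returns (prototype_or_None, iterations_used).
    """
    for iteration in range(1, max_iterations + 1):
        observations = run(prototype)      # run the system and observe it
        verdict = judge(observations)      # decide what the results mean
        if verdict is Verdict.SCRAP:
            return None, iteration
        if verdict is Verdict.ADVANCE:
            return prototype, iteration
        prototype = revise(prototype, observations)  # make some revisions
    return None, max_iterations            # gave up without converging


def judge_defects(defects):
    # Toy judgement rule (assumed): hopeless above 100 defects, done at zero.
    if defects > 100:
        return Verdict.SCRAP
    if defects == 0:
        return Verdict.ADVANCE
    return Verdict.REVISE


# Toy prototype: just a count of known defects; each revision fixes one.
final, rounds = run_experiments(
    3,
    run=lambda p: p,
    judge=judge_defects,
    revise=lambda p, obs: p - 1,
)
print(final, rounds)
```

In practice the judgement step is a human one; the point of the sketch is only that each pass through the loop ends in one of three outcomes.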
What might such later phases be? Suppose you've successfully taken a new concept through a
feasibility test, perhaps experimenting with a quick implementation that you ran yourself. You may
think, "Well, let's build an improved prototype, and have some other user run it. I'll watch the
user use it, and see what happens." After going around that loop a few times, and making further
refinements, you may make the transition to building a prototype to be placed into extensive field
trials by many users. Thinking back, you can see how the design course was taken through a
succession of such phases, from feasibility to transfer to a few other "users" and on to full scale
"field trials". By obtaining feedback from users and observing results at each step, you move on
to the next phase (see Fig. 7) of refinement and integration of that particular system.
[Flowchart of the build, run, observe, and revise loop, with an "O.K." exit on to the next phase.]
Figure 6. An Experimental Method
Feasibility Test of Concept
|
First Prototype to be User Tested
|
Development Prototype for Extended Field Trials
|
Operational Version of System
Figure 7. Some Phases in the Evolution of a System
If we study the development of the VLSI design methodology, its validation, and its social
propagation, you'll notice that the following has happened: The evolution of the methodology
involved a multilevel cluster of systems that were being jointly evolved (see Fig. 8). Each system in
the cluster runs through the experimental loops, and passes through the various phases of its own
evolution. Entries at the higher levels, for example the methodology, or the text, or the documents
to support a course, might be more solid and in later phases of their evolution at any given time
than, for example, a course in a particular school, or the design environment for that course.
Student design projects play a key role in this process, supporting new refinements in the higher
level systems in the hierarchy every new school semester. Fast turnaround implementation of 
designs was used to close the experimental loop on all the systems in this hierarchy. 
Design Methodology
|
Text, Instructors' Guide, and other Documents
|
Courses
|
Design Environments
|
Student Design Projects
|
Implementation Methodology & Systems
|
Design Prototypes
Figure 8. The Joint Evolution of the Multi-Level Cluster of Systems
If we think back over the evolution of these systems, we can see how all these things were running
in parallel in a rapidly enlarging social enterprise. The early courses run here at Caltech
demonstrated that it might be feasible to create a simple design methodology. Following the period
of basic design methodology research, the preliminary courses run at Caltech, U.C. Berkeley, and
CMU helped debug the emerging text documenting the new design methods. The newly
documented methodology was then introduced into the M.I.T. '78 course, which became the
prototype for the new type of intensive, project-oriented courses. The results of that course
prepared the way for seeding similar courses in many other schools.
The text itself passed through drafts, became a manuscript, went on to become a published text.
Design environments evolved from primitive CIF editors and CIF plotting software on to include
all sorts of advanced symbolic layout generators and analysis aids. Some new architectural
paradigms have begun to similarly evolve. An example is the series of designs produced by the OM
project here at Caltech. At MIT there has been the work on evolving the LISP microprocessors.3,10
At Stanford, Jim Clark's prototype geometry engine, done as a project for MPC79, has gone on to
become the basis of a very powerful graphics processing system architecture,9 involving a later
iteration of his prototype plus new work by Mark Hannah on an image memory processor.20
While these things were evolving, Dick Lyon undertook the important work of developing,
debugging, and evolving a set of basic library cells (see refs. 2, 5) that would later be used in all of
the courses by all of the students in the MPC adventures. Again, in parallel with that, there was
the iterative evolution through a series of experiments, from the early multiproject chip sets to the
remote entry multiproject chip done at MIT, to the early implementation systems at PARC, and
now on to the automated implementation systems at PARC and USC-ISI.
One thing to remember about this is that such enterprises are organized at the meta-level of
research methodology and social organization: they are not planned in fully-instantiated detail using
some sort of PERT chart. The evolution of a system of knowledge has a certain dynamics. There
is a great deal that happens concurrently. There is the necessity for various activities to reach some
minimum sufficient stage of development in order to support activity at some other level. If things
are staged right, and people are in close contact with each other and are highly motivated by
effective leadership, then a lot of these things can move rapidly forward together. But remember,
there is always a strong element of chance when folks go off exploring. The unfolding of the events
depends upon what is discovered, and upon how well the opportunities presented by the discoveries
are seized upon and exploited by the overall community of explorers.
The Network Community 
Some key resources are required in order to organize such an enterprise. Perhaps the most
important capital resource that we drew upon was the computer-communications network, including
the communications facilities made available by the ARPAnet, and the computing facilities
connected to the ARPAnet at PARC and at various universities. Such a computer-communication
network is a really key resource for conducting rapid, large scale, interactive experimental studies.
The networks enable rapid diffusion of knowledge through a large community because of their high
branching ratios, short time-constants, and flexibility of social structuring; any participant can
broadcast a message to a large number of other people very quickly. It isn't like the phone, where
the more people you try to contact, the more time-overhead is added so that you start spending all
of your time trying to get your messages around instead of going on and doing something new.
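The effect of a high branching ratio can be illustrated with a toy diffusion model. This is an assumption-laden sketch, not a measurement: it supposes that in each round every informed person passes the message to `branching_ratio` people who have not yet heard it, with no overlap. The function name and the branching ratio of 12 are hypothetical; the community size of 124 is the number of MPC79 designers given earlier.

```python
def rounds_to_reach(community_size, branching_ratio):
    """Rounds needed for a message to reach everyone in the toy model."""
    if community_size < 1 or branching_ratio < 1:
        raise ValueError("community_size and branching_ratio must be >= 1")
    informed = 1  # the originator
    rounds = 0
    while informed < community_size:
        # Every informed person reaches `branching_ratio` new people.
        informed += informed * branching_ratio
        rounds += 1
    return rounds


# One-to-one contact (phone-like) versus a mailing-list-like broadcast.
phone_rounds = rounds_to_reach(community_size=124, branching_ratio=1)
network_rounds = rounds_to_reach(community_size=124, branching_ratio=12)
print(phone_rounds, network_rounds)
```

With these assumed numbers the one-to-one case needs 7 rounds and the broadcast case 2, and on a network each round is also much shorter, which is the "short time-constant" half of the argument.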
The high social branching ratios and short communications time constants of the networks also
make possible the interactive modifications of the systems, all of these systems, while they are
running under test. If someone running a course, or doing a design, or creating a design
environment has a problem, if they find a bug in the text or the design method, they can broadcast
a message to the folks who are leading that particular aspect of the adventure and say, "Hey! I've
found a problem." The leaders can then go off and think, "Well, my God! How are we going to
handle this?" When they've come up with some solution, they can broadcast it through the network
to the relevant people. Thus they can modify the operation of a large, experimental, multi-person,
social-technical system while it is under test. They don't have to run everything through to
completion, and then start all over again, in order to handle contingencies. This is a subtle but
tremendously important function performed by the network, and is similar to having an interactive
run-time environment when creating and debugging complex software systems.
There is another thing that happens in the network: it's relatively easy to get people to agree to
standards of various kinds, if the standards enable access to interesting servers and services. For
example, CIF became a de facto standard for design layout interchange because we at PARC said
"if you send a CIF file to us we will implement your project". Everybody put their designs in CIF!
We answered our own questions: "Is CIF documented well enough to be propagated around? Does
it really work anyway? Does it have the machine independence we've tried for?" That way we
debugged CIF and culturally integrated CIF.
Such networks enable a large, geographically dispersed group of people to function as a tightly-knit
research and development community. New forms of competitive-collaborative practices are
enabled by the networks. The network provides the opportunity for rapid accumulation of sharable
knowledge. Much of what goes on is captured electronically -- designs, library cells, records of
what has happened in the message traffic, design-aid software and knowledge -- all can be captured
in machine representable form, and can be easily propagated and shared.
One reason for the rapid design-environment development during '79-'80 was a high degree of
collaboration among the schools. Often, as useful new design aids were created, they were quickly
shared. Many of the schools had similar computing environments, and the useful new knowledge
diffused rapidly via the ARPAnet.
Another reason for rapid progress was keen competition among the schools and among individual
participants. The schools shared a common VLSI design culture; during '79-'80 all used the same
implementation system, and batches of projects from the schools were often implemented
simultaneously. Therefore, project creation, innovations in system architecture, and innovations in
design aids at each of the schools were quite visible to the others. Students and researchers at MIT,
Stanford, Caltech, CMU, U.C. Berkeley, etc., could visualize the state of the art of each other's
stuff. These factors stimulated competition, which led to many ambitious, innovative projects.
Successful completion of designs, and thus participation in such competition, depended strongly on 
the quality of the design environment in each school. Therefore, there was strong pressure in each
school to have the latest, most complete set of design aids. This pressure tended to counter any "not 
invented here" opposition to importing new ideas or standards. The forces for collaboration and for 
competition were thus coupled in a positive way, and there was "gain in the system". 
Now, think back to the question, "How do unsound methods become sound methods?" Remember,
you need large scale use of methods to validate them, and to produce the paradigm shifts so that
the methods will be culturally integrated. In industry, it's very difficult to take some new proposed
technique for doing things and put it in use on a large scale in any one place; a manager trying such
things would be accused of using unsound methods. However, in the universities, especially in
graduate courses in the major research universities, you have a chance to experiment in a way you
might not in industry, a way to get a lot of folks to try out your new methods.
A final note about our methods: The major human resources applied in all of these adventures
were faculty members, researchers, and students in the universities. The research of the VLSI
System Design Area has often involved the experimental introduction and debugging of new
technical and procedural techniques by using the networks to interact with these folks in the
universities. These resources and methods were applied on a very large scale in the MPC
adventures. There are risks associated with presenting undebugged technology and methods to a
large group of students. However, we have found the universities eager to run these risks with us.
It is exciting, and I believe that it is appropriate for university students to be at the forefront,
sharing in the adventure of creating and applying new knowledge. The student designers in the
MPC adventures not only had their projects implemented, but also had the satisfaction of being part
of a larger experimental effort that would impact industry-wide procedures.
These experiences suggest opportunities and provide a script for university-government-industry
collaboration in developing new design methodologies and new supporting infrastructure in many
areas of engineering design. The universities can provide the experimental and intellectual arena;
government can provide infrastructure and university research funding; industry can provide
knowledge about and access to modern, expensive, capital equipment that can implement
experimental designs created by university students and researchers. Modern
computer-communications networks, properly used, can tie all these activities together. The implementation
of designs closes all the experimental loops.
5. Looking Ahead 
I wonder where we might apply some of these methods next? Where might some of you apply
methods like these in order to aggressively explore new areas? Well, first of all, there certainly are
tremendous opportunities for further discoveries and evolutionary progress in VLSI design and
implementation methodology.
We are now seeing the beginnings of new architectural methodologies appropriate for VLSI in a
number of specialized areas of application. For example, you might study the work that Dick Lyon
is doing to create a new architectural set of "VLSI building blocks" for bit-serial digital signal
processing [21]. Wouldn't it be interesting if those techniques could now be tried in a few courses?
We'd find out if people can really learn about signal processing with VLSI, and then quickly
compose working systems, thus providing a reality test of Dick's ideas.
There are many other areas of digital system architecture ripe for the introduction of new
architectural methodologies appropriate for VLSI. There are areas like computer graphics for
providing high-bandwidth visual displays for interactive personal computing systems, and the
generation of computer images for electronic printing and plotting. There's image processing, taking
digitized input image data and processing it to recognize and detect things, with applications in
OCR systems, visual input systems for controlling robots, smart visual sensors for various defense
systems, that sort of thing. There are areas like data encryption and decryption. So there's a whole
world of specialized architectural areas that people can now explore, given that they have access to a
VLSI design environment and to quick turnaround implementation to try out their ideas. As
successes accumulate, the underlying knowledge and the detailed design files can be rapidly
propagated around the VLSI network community.
There are many opportunities for evolving new design and analysis aids appropriate for the new
design methodology. Progress has been rapid so far [19], but there is plenty more to do. Those
interested in creating and testing new design aids might ask yourselves "What can I create and then
introduce over the network that would be valuable to the VLSI community, that might integrate
with the overall activity?" That line of thinking, taking into account the current state of the
community, and the means of introducing new ideas into the community for testing and validation,
may increase your chances of successfully creating something that becomes culturally integrated.
For example, the early circuit extractor work done by Clark Baker [16] at MIT became very widely
known because Clark made access to the program available to a number of people in the network
community. From Clark's viewpoint, this further tested the program and validated the concepts
involved. But Clark's use of the network made many, many people aware of what the concept was
about. The extractor proved so useful that knowledge about it propagated very rapidly through the
community. (Another factor may have been the clever and often bizarre error-messages that Clark's
program generated when it found an error in a user's design!)
Another area of opportunity is in the evolution of standards. For example, we need a standard
"process test chip" for the back-end foundry interface, so that designers and foundry operators will
have a mechanism for deciding to shake hands and exchange dollars for wafers. Although some
strawman versions have been proposed, there is no standard now. Perhaps a standard process test
chip could be evolved by inserting strawman versions into wafers that are run for university
multiproject chip sets. The community could then gradually converge on a workable standard.
There are opportunities for further evolution of implementation systems. Also, similar design and
implementation methods could be mapped into technologies other than nMOS. Design primitives,
design rules, and design examples could be created, for example, for CMOS and then run through
the same kind of scenario as above to introduce those into a university community.
I myself have become interested in the prospects for bringing about a convergence of the work in
VLSI design methodology with work based in knowledge engineering [22,23]. There is the possibility
of creating knowledge-based expert systems to aid VLSI system designers. I can imagine directing
the evolution of such expert systems by using similar methods to those described above: trying out
ideas, prototyping them, evaluating them, and bringing them into large-scale use within a
computer-communication network community. But an added twist is possible here, that of making knowledge
about expert systems accessible to the larger CS community, a community now knowing about
VLSI. That way we could help to generate a common literacy about knowledge, a common
knowledge representation language, and knowledge about the methods of knowledge engineering.
You'll note that the experimental methods described in this talk aren't limited to application in the
exploration of microelectronic system design. I find it fascinating to think about applying these 
methods to the rapid exploration of other domains of engineering design that may be operating 
under new constraints, and thus be full of new opportunities. 
For example, it is becoming common in some industrial environments for folks to do mechanical
system design by using computers to specify the shape and dimensions of parts and to generate the
tapes for numerically controlled machine tools that can implement the parts. Consider the
opportunity here: What if we documented a simple design method for creating mechanical systems
under the assumption that the parts are to be remotely machined and assembled in some sort of
"magical automatic factory"? Then ask the question, "Well, how would you teach mechanical
design under the many new constraints imposed by the remote factory?" If you had access to such a 
factory, or if you could even emulate it using manual procedures where necessary, you could put in 
place the same sort of overall experimental environment to develop from very early crude principles 
some sort of new design methodology that would be appropriate for that environment. In that way 
one could evolve an entire design culture of methods, courses, design examples, design aids, etc., 
using the methods described above, and that culture could be rapidly spread out through the 
networks into a large university community. 
I am very interested in studying and experimenting further with techniques for creating, refining,
and culturally integrating new engineering design methodologies. If any of you folks engage in
similar work, especially within the university computer-communications network community, I'd be
very interested in learning of your experiences. I'd enjoy brainstorming with you on how to
improve the underlying methods, and how to spread knowledge about the results.
6. Acknowledgements and Conclusions
I am deeply indebted to many people for their contributions and help in creating the design
methods, the textbook, and the implementation methods and system, and also the university VLSI
design courses, design environments, and research programs. There are literally hundreds of people
who have played important roles in the overall activity. Students, researchers, and faculty members
in the universities, and a number of industrial researchers, industrial research managers, and
government research program managers have been actively involved in these events. I am at a loss
to acknowledge all of the individual participants.
However, I would like to individually acknowledge some folks at PARC who've worked on
this research since the early days. I am thinking of Doug Fairbairn, who was with us during the
key early years; Dick Lyon, who has contributed so much to the effort; Alan Bell and Martin
Newell for their innovations and efforts in the creation of VLSI implementation systems that have
supported so well the validation and spread of VLSI knowledge. I'd especially like to acknowledge
the support and encouragement that all of us in PARC have received over the years from the senior
research management of Xerox Corporation, in particular, from Bert Sutherland.
Let's look at the photo of Alan Bell again (Fig. 3), and think back to the MPC79 effort. I'm sure
you now sense that MPC79 was not just a technical effort, that there was a tremendous human
dimension to the project. So many folks were simultaneously creating and trying out things:
students and researchers trying out new designs that were very, very important to them; instructors
and project lab coordinators trying out the new courses and project lab facilities; at PARC the new
implementation system was coming into existence, under the pressure of trying to provide VLSI
implementation service to the many university designers. This built up into a tremendously exciting
experience for all participants, a giant network adventure that climaxed as the design-cutoff time
approached, and the final rush of design files flowed through the ARPAnet to PARC.
So when you see someone interacting with a personal computer connected to a network, rather than
jumping to the conclusion that you are observing a reclusive hacker running an obscure program, 
you might ask yourself "I wonder what adventures this person is involved in?" Remember, you 
may be observing a creatively behaving individual who is participating in, or perhaps even leading, 
some great adventure out in the network! 
These events are reminiscent of the pervasive effects of the telegraph and the railroads, as they
spread out everywhere during the nineteenth century, providing an infrastructure people could use
to go on adventures, to go exploring, and to send back news of what they had found. I think of
personal computers and the computer communication networks as a similar sort of infrastructure,
but here and now, as we explore the modern frontier -- the frontier of what we can create.
The new knowledge and products our VLSI design community is creating will have tremendous
social impact, by helping to rapidly spread and increase the power of the new personal computing
and computer-communication infrastructure.
Thus your work in computer science and VLSI system design is expanding the opportunities for all 
of us to go on all sorts of grand adventures in the future! 
REFERENCES 
1. C. Mead and L. Conway, Introduction to VLSI Systems, Limited printings of prepublication
drafts of a text in preparation, Xerox Palo Alto Research Center (PARC), Palo Alto, CA; (a)
Chapters 1-3, September 1977; (b) Chapters 1-5, February 1978; (c) Chapters 1-9, July 1978.
2. R. Hon and C. Sequin, A Guide to LSI Implementation, Limited Printing, Xerox PARC,
September 1978.
3. G. Steele, Jr. and G. Sussman, Design of LISP-Based Processors or, Scheme: A Dielectric LISP
or, Finite Memories Considered Harmful or, LAMBDA: The Ultimate Opcode, AI Memo No. 559,
Artificial Intelligence Laboratory, M.I.T., March 1979.
4. C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, Reading, MA, 1980.
5. R. Hon and C. Sequin, A Guide to LSI Implementation, 2nd Ed., Xerox PARC Technical Report
SSL-79-7, January 1980.
6. L. Conway, The MIT '78 VLSI System Design Course: A Guidebook for the Instructor of VLSI
System Design, Limited Printing, Xerox PARC, Palo Alto, CA, August 1979.
7. D. Fairbairn and R. Lyon, "The Xerox '79 VLSI Systems Design Course", Xerox PARC
Videotapes and Lecture Notes, Xerox PARC, Palo Alto, CA, February 1979.
8. L. Conway, A. Bell, M. Newell, R. Lyon, R. Pasco, Implementation Documentation for the
MPC79 Multi-University Multiproject Chip-Set, Xerox PARC Tech. Memorandum, 1 January 1980.
9. J. Clark, "A VLSI Geometry Processor for Graphics", Computer, Vol. 13, No. 7, July 1980.
10. J. Holloway, G. Steele, Jr., G. Sussman, A. Bell, The Scheme-79 Chip, AI Memo No. 559,
Artificial Intelligence Laboratory, M.I.T., January 1980.
11. L. Conway, A. Bell, M. Newell, "MPC79: The Demonstration-Operation of a Prototype
Remote-Entry, Fast-Turnaround, VLSI Implementation System", Conference on Advanced Research
in Integrated Circuits, M.I.T., January 28-30, 1980.
12. L. Conway, A. Bell, M. Newell, "MPC79: A Large-Scale Demonstration of a New Way to
Create Systems in Silicon", LAMBDA, the Magazine of VLSI Design, Second Quarter, 1980.
13. T. Strollo, et al., Documentation for Participants in the MPC580 Multiproject Chip-Set, Xerox
PARC Technical Memorandum, 7 July 1980.
14. R. Rivest, "A Description of a Single-Chip Implementation of the RSA Cipher", LAMBDA,
the Magazine of VLSI Design, Fourth Quarter, 1980.
15. D. Fairbairn, R. Mathews, J. Newkirk, et al., Videotape VLSI Design Course based on the
Mead-Conway text "Introduction to VLSI Systems", VLSI Technology, Inc. (VTI), Los Gatos, CA, 1980.
16. C. Baker and C. Terman, "Tools for Verifying Integrated Circuit Designs", LAMBDA, the
Magazine of VLSI Design, Fourth Quarter, 1980.
17. R. Bryant, "An Algorithm for MOS Logic Simulation", LAMBDA, the Magazine of VLSI
Design, Fourth Quarter, 1980.
18. A. Bell, "The Role of VLSI Implementation Systems in Interfacing the Designer and Fabricator
of VLSI Circuits", Proc. of the International Telecommunications Conference, Los Angeles, Nov. '80.
19. L. Conway, "University Scene", LAMBDA, the Magazine of VLSI Design, Fourth Quarter, 1980.
20. J. Clark and M. Hanna, "Distributed Processing in a High-Performance Smart Memory",
LAMBDA, the Magazine of VLSI Design, Fourth Quarter, 1980.
21. R. Lyon, "Signal Processing with VLSI", Limited printings of lecture notes for a "constantly
evolving talk", Xerox PARC, 1980.
22. E. Feigenbaum, "The art of artificial intelligence - Themes and case studies of knowledge
engineering," Proc. of the 1978 National Computer Conference, AFIPS Press, Montvale, N.J., 1978.
23. M. Stefik, et al., "The Architecture of Expert Systems: A Guide to the Organization of
Problem-Solving Programs," to appear as Chapter 3 in: F. Hayes-Roth, D. Waterman, D. Lenat, (Eds.),
Building Expert Systems, (a textbook in preparation).
SUGGESTED READING REFERENCES 
H. Simon, The Sciences of the Artificial, The M.I.T. Press, Cambridge, MA, 1969. (2nd Ed. in Press).
T. Kuhn, The Structure of Scientific Revolutions, 2nd Ed., Univ. of Chicago Press, Chicago, 1970.
L. Fleck, Genesis and Development of a Scientific Fact, F. Bradley and T. Trenn, Translators, T.
Trenn and R. Merton, Editors, University of Chicago, 1979. (Originally published as Entstehung und
Entwicklung einer wissenschaftlichen Tatsache: Einführung in die Lehre vom Denkstil und
Denkkollektiv, Benno Schwabe & Co., Basel, 1935.)
C. Lévi-Strauss, Mythologiques, Vols. I-IV, Plon, Paris, 1964, '66, '68, '71.
H. Garfinkel, Studies in Ethnomethodology, Prentice-Hall, Englewood Cliffs, N.J., 1969.
B. Latour and S. Woolgar, Laboratory Life: The Social Construction of Scientific Facts, Vol. 80,
Sage Library of Social Research, Sage Publications, Beverly Hills, 1979.
D. Crane, Invisible Colleges: Diffusion of Knowledge in Scientific Communities, Univ. of Chicago
Press, Chicago, 1972.
D. Engelbart, R. Watson, J. Norton, Advanced Intellect-Augmentation Techniques, Stanford Research
Institute, Menlo Park, CA, 1972.
J. Licklider and A. Vezza, "Applications of Information Networks", Proceedings of the IEEE, Vol.
66, No. 11, November 1978.
J. Lederberg, "Digital Communications and the Conduct of Science: The New Literacy",
Proceedings of the IEEE, Vol. 66, No. 11, November 1978.
MOSIS --THE ARPA SILICON BROKER 1 
Danny Cohen and George Lewicki 
USC/Information Sciences Institute 
This paper is actually an edited transcript of the talk presented
at the conference. Many references to visual accompaniments,
difficult to reproduce here, have been eliminated.
INTRODUCTION 
The idea of a silicon broker was conceived by Carver Mead some time ago as
a way to give a large community of chip designers access to fabrication
services and as a way to speed up the fabrication process. MOSIS is the ARPA
silicon broker that we have implemented at ISI. With MOSIS we are trying to
totally isolate the designer from all the trivia that fabrication requires.
The main objective of ISI's VLSI project is to support the fast turnaround
requirement of the ARPA VLSI community and of related programs. Another of
our objectives is to help expand the VLSI design community by supporting
research institutes and universities that are actively involved in VLSI. We
hope to help MIT, Caltech, Berkeley and other universities train as many VLSI
students as they can.
In addition, we'd like to encourage more vendors to offer custom VLSI
services. We were pleasantly surprised at the number of organizations already
in the business of offering those services.
For the time being, we are sorry to report we can serve only the ARPA VLSI
community. However, other government-sponsored users may gain access to MOSIS
by special arrangement with ARPA. If you are interested in our service and
your project is government sponsored, please contact us or ARPA, and we will
try to help you. Remember that NSF is part of the US Government, so people
sponsored by NSF will probably be able to participate.
1 This research is supported by the Defense Advanced Research Projects Agency
under Contract Nos. MDA 903 80 C 0523 and MDA 903 81 C 0335. Views and
conclusions contained in this paper are the authors' and should not be
interpreted as representing the official opinion or policy of DARPA, the U.S.
Government, or any person or agency connected with them.
MOSIS 
The MOSIS system developed from an idea demonstrated in recent years by
Caltech and by the MPC project at Xerox PARC. We'd like to acknowledge
everyone who helped us, but this is only a partial list. We would like to
mention Carver Mead--he was the first to put together many chips, and he
taught us how to do it; and Lynn Conway, whom you have just heard telling us
all about the MPC project at Xerox PARC. Thanks also to the fantastic crew
there: Alan, Martin, Dick, Ted, and many others--it's impossible to name
everyone here; please accept our apologies.
ISI's silicon broker works as follows: Users who have obtained access to
MOSIS communicate directly with the system via electronic mail. Most users
are on the other side of the ARPANET, whether across town or across the
country. MOSIS understands various types of information, such as, "this is a
description," "this is a pad," and "this is the technology we want," and it
knows that CIF files describe the geometry of a project. MOSIS accepts
several types of requests, for example, "please start a new project." All
requests are very formal, because machines, not people, read them. Questions
about sending requests to MOSIS can be sent to MOSIS@ISIF. The questions
should be stated in plain English, e.g., "Please tell me what to do." The
answers from us will probably be equally cryptic: "We couldn't understand your
message, but if you want to talk to us, do such and such and we will send you
the MOSIS User's Manual." Save time and trouble by reading the MOSIS User's
Manual--it explains everything a user needs to know.
All user-provided information flows through MOSIS to MrBill, our geometry
handler that checks CIF files, packs sets of projects onto a (smaller) set of
dies, translates each die into MEBES format, makes bonding diagrams, and more.
Ron Ayres wrote MrBill in ICL--beautiful language, beautiful system, works
magic, very efficient. For example, it can plot CIF files like Figure 1.
Figure 2 is a slightly more complicated plot. It's not clear exactly what
MrBill drank before he plotted it, but we were told, it's OK, it's a bubble
memory. MrBill's primary task is to produce tapes that the foundry uses to
make masks.
After MrBill does his work, the next step in the process is mask 
fabrication. Mask houses expect two types of things from us: tapes with MEBES 
files and job decks. MEBES files contain the information that the mask house 
uses to make bitmaps (which are made into masks). A job deck, about one 
percent of the size of a MEBES file, maybe less, contains the specifications 
for each MEBES file--parity, record size, etc. 
Fabrication itself is very simple because somebody else does it. Once the 
masks are made, all we have to do is drive three, four, or maybe ten miles in 
Silicon Valley with the masks to a wafer fabricator. (It is wise to drive 
slowly to make sure the masks don't break.) After that, if we're lucky--and 
typically we are--we end up with a couple of wafers. 
Once we have the wafers, we like to probe each of the chips, not just the
wafers as a whole, so that no one will tell us later, "Maybe only the north part of
the wafer is good; if you happen to have a southeast project on it, you
probably lose." We like to make sure that the wafer is uniformly good.
Figure 1: Simple plot produced by MrBill from a CIF file
Figure 2: A more complicated plot of a bubble memory
Then we break the wafers into individual chips, package them, and run some
more tests, if we can. Afterwards we distribute the chips to the users, and
each user examines his chips, points fingers at everyone, resubmits, etc.
STANDARDS FOR MOSIS 
We prefer continuous-spooling mode, which means that we don't like to
advertise deadlines. Whenever we have enough projects, we fabricate them.
The sooner you submit, the sooner your design will be fabricated.
For the time being, we support CIF 2.0+. If you know only about CIF 2.0,
fine; we support it. There are several other features we support, and perhaps
we'll eventually convince the entire community to use them.
At present, we support nMOS with the Mead-Conway design rules. More
processes will eventually be offered. This means that we do not now support
2-layer metal, buried contacts, etc., but we will later. Currently we use
lambda equals 2.5 microns. This is a feature size of five microns. We have
talked to several people about smaller feature size, and we are in various
stages of negotiations about smaller values.
Once given a file, we can change lambda a little bit, but not a lot. If a
big change of lambda is necessary, the designer has to make the changes. If
the design is for 2.5 and we have 2.0 available, we might change the size if
the designer allows us to.
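The lambda arithmetic in the two paragraphs above can be illustrated with a short sketch. The function names and the 25 percent limit on what counts as "a little bit" are assumptions made for illustration only; the talk gives no numeric limit.

```python
# Illustrative sketch of lambda-based scaling.
# The 25% rescaling limit is an assumed placeholder, not a MOSIS rule.

def to_microns(length_in_lambda, lambda_microns):
    """Convert a length expressed in lambda units to microns."""
    return length_in_lambda * lambda_microns


def feature_size(lambda_microns):
    """Feature size quoted in the talk: two lambda (2.5 microns -> 5 microns)."""
    return 2 * lambda_microns


def can_rescale(design_lambda, available_lambda, max_relative_change=0.25):
    """Return True if the lambda change is small enough to apply to the file directly."""
    change = abs(available_lambda - design_lambda) / design_lambda
    return change <= max_relative_change


print(feature_size(2.5))       # feature size in microns at lambda = 2.5
print(to_microns(3, 2.0))      # a 3-lambda dimension after rescaling to lambda = 2.0
print(can_rescale(2.5, 2.0))   # the 2.5 -> 2.0 case mentioned in the talk
```

Because the design is expressed in lambda units, a small change of lambda rescales every dimension uniformly; a large change would need the designer's own review, as the text says.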
Our standard packages for bonding projects are 40-pin and 64-pin. If we can find
packages for 89 bond projects, we might be able to bond them, but it might
take more time.
We try to provide fast turnaround by streamlining all the interfaces. We
have been told by industry that if we pay a premium, we can get faster
services. We are not sure this is what we want to do right now. With more
money we know we can get faster service. We are trying to see how fast
service is if we know about tape parity, registration marks, CDs, and all the
other details that are required for fabrication.
PROBLEMS 
We are constantly trying to improve the service from MOSIS. We like to try
new software for its added features. We like to try new mask houses so we
don't have to depend on one source. We like to try many fabrication lines.
We like to change the way we test wafers, packaging, etc.
We have problems in the process of qualifying any changes, that is, making
sure a change is really an advance. For example, the first problem we had was
deciding how to qualify a new set of software. The problem arises in
comparing two masks, one produced by the old and the other produced by the new
system. Are the patterns really the same? A microscope is supposed to help,
but it can't do a good job. We tried many ways and finally worked out a very
strange technique. Suppose you want to compare mask A and mask B. What we did
was to overprint A and B bar [the reverse of B], and A bar and B. In this way
we discovered all the changes. We did all the printing on one plate so we
wouldn't have to use a special microscope.
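In logical terms, the overprint comparison amounts to checking that both "A and not B" and "not A and B" come out empty. The following is a minimal sketch of that idea, assuming masks are small bitmaps held as nested lists (1 where the pattern is present). The photographic process and real MEBES data are not modeled, and all names are hypothetical.

```python
# Sketch of the A / B-bar comparison on tiny bitmaps (1 = pattern present).
# Nested lists stand in for mask plates; this is an illustration only.

def and_not(a, b):
    """Cells present in a but absent from b (a AND NOT b)."""
    return [[pa & (1 - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def differences(a, b):
    """Union of (a AND NOT b) and (b AND NOT a): every cell where the masks disagree."""
    a_only = and_not(a, b)
    b_only = and_not(b, a)
    return [[x | y for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(a_only, b_only)]


def is_blank(mask):
    """True when no cell is set, i.e. the two masks were identical."""
    return not any(any(row) for row in mask)


old_mask = [[1, 1, 0],
            [0, 1, 0]]
new_mask = [[1, 1, 0],
            [0, 1, 1]]

print(is_blank(differences(old_mask, old_mask)))  # identical masks leave a blank result
print(differences(old_mask, new_mask))            # the single changed cell shows up
```

A blank result corresponds to the "blank" squares the talk describes as the goal of the qualification test.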
We applied this test after we were sure there were no other problems. We
were shocked by what we discovered--one little bug that turned out to be many
bugs in the manuals, not a bug in the software. But it was just as bad, so we
had to fix everything until we finally got all those squares to be exactly the
way they were supposed to be, blank.
We have also had problems in the preparation of a job deck. A job deck is
a definition of a wafer. Figure 3 shows a wafer containing 18 different
die-types. Each die-type requires six files, so over 100 files are involved in
preparing this wafer. A lot of coordinating is therefore necessary, and we
would like to make sure everything happens right.
Some of the companies we've worked with (Xerox, Boeing) share horror
stories with us about the production of an accurate job deck. Our goal is to
generate job decks with computers. Figure 4, for example, is an input for a
program that generates a job deck (unfortunately, not for the wafer shown in
Figure 3. Sorry about that.). The input contains the name of the run, like
N11E, and the definition of each layer. The letters D and C determine the
dark or clear mask--very important, or else you get some odd results. The
name of the level and the name of the job have to be written correctly so the
fabricators do not make mistakes. The input also contains some coordinates,
and the map, which controls the position and the choice of over 100 files.
With all of these variables, there is high potential for something to go awry.
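To make the bookkeeping concrete, here is a small sketch of reading the kind of input described above. The line format (level number fused with a D or C letter, then a layer name and a description) and the sample text are assumptions modeled on the talk's description, not the actual MOSIS input grammar; the function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample input, modeled on the description in the talk.
SAMPLE_LAYERS = """\
01D ND DIFFUSION
02C NI IMPLANT
03D NP POLY
"""

SAMPLE_MAP = """\
  A
 BCB
  A
"""


def parse_layer(line):
    """Split a layer line into level number, mask tone (D or C), layer name, description."""
    level_and_tone, layer_name, description = line.split(maxsplit=2)
    level, tone = level_and_tone[:-1], level_and_tone[-1]
    if tone not in ("D", "C"):
        raise ValueError(f"tone must be D (dark) or C (clear), got {tone!r}")
    return {"level": int(level), "tone": tone,
            "layer": layer_name, "description": description}


def count_die_types(map_text):
    """Count how many wafer sites each die-type letter occupies in the map."""
    return Counter(ch for ch in map_text if ch.isalpha())


def files_needed(die_types, files_per_die_type=6):
    """Number of pattern files for a wafer, at six files per die-type."""
    return die_types * files_per_die_type


layers = [parse_layer(line) for line in SAMPLE_LAYERS.splitlines()]
print(layers[0])
print(count_die_types(SAMPLE_MAP))
print(files_needed(18))  # the 18 die-type wafer from the talk: over 100 files
```

Rejecting any tone other than D or C at parse time is one way to catch the dark/clear mistakes the talk warns about before a deck reaches the mask house.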
We have to screen new fabricators carefully. Ideally we'd like to give
them a form to fill in, and then we would continue from there, if their
qualifications were close to what we expected. In reality we ask the
fabricators, "What technology do you offer? nMOS? CMOS? What?"
We also ask, "What are your design rules?" Actually, we don't ask, "What
are your design rules?" We say, "Those are OUR design rules. Do you support
them?" And sometimes we get answers, "Yeah, we support them. Lambda equals,
say, two microns, for everything except ..." and we say, "Too bad. You don't
really support our design rules for this feature size...." We have to decide
at what feature size our design rules are supported.
Next we ask the potential fabricator for electrical parameters, everything
we need to know about masks, polarity, bloating, etc. Then we tell the
fabricator how we like to measure. As a matter of fact, selecting a
fabricator is not quality control--it should be a process control, insurance
that everything meets our standards, including turnaround time, and,
obviously, the expense.
Figure 3: Wafer containing 18 different die-types
Job Deck for Masks
M 11 E
01D ND DIFFUSION
02C NI IMPLANT(DEPL)
03D NP POLY
04C NC CUT(CONTACT)
05D NM METAL
06C NG OVERGLASS
24000 22400 6700 7100
    G
  CDEFH
 IJKLMNO
AOPOADOIA
BOCGBEPJB
LMHQCFNKM
 IJPHLGO
  KDEFP
    N
Figure 4: Example input for a program that generates a job deck
We specify five electrical parameters to the fabricators. We ask that the
enhancement devices' threshold voltage be about +0.8V. For depletion, it
should be about -3.5V. For k=4 and for k=8 minimum size inverters, Vinv
should be about +2.0V, and the poly sheet resistance should be less than 50
ohms/square.
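A check against these five parameters could be sketched as follows. The target values come from the talk; the 0.2V tolerance standing in for "about", the reading of the second inverter ratio as k = 8, and every name in the code are assumptions made for illustration.

```python
# Targets quoted in the talk; the tolerance is an assumed placeholder for "about".
VOLTAGE_TARGETS = {
    "enhancement_vt": 0.8,   # volts, enhancement threshold
    "depletion_vt": -3.5,    # volts, depletion threshold
    "vinv_k4": 2.0,          # volts, inverter threshold at k = 4
    "vinv_k8": 2.0,          # volts, inverter threshold at k = 8 (assumed reading)
}
POLY_SHEET_MAX = 50.0        # ohms per square, must be strictly below this


def check_wafer(measured, tolerance=0.2):
    """Return the names of the parameters that miss their targets."""
    failures = []
    for name, target in VOLTAGE_TARGETS.items():
        if abs(measured[name] - target) > tolerance:
            failures.append(name)
    if measured["poly_sheet_resistance"] >= POLY_SHEET_MAX:
        failures.append("poly_sheet_resistance")
    return failures


good_wafer = {"enhancement_vt": 0.85, "depletion_vt": -3.4,
              "vinv_k4": 2.1, "vinv_k8": 1.9, "poly_sheet_resistance": 30.0}
bad_wafer = dict(good_wafer, enhancement_vt=1.5, poly_sheet_resistance=60.0)

print(check_wafer(good_wafer))  # no failures expected
print(check_wafer(bad_wafer))   # flags the two out-of-range parameters
```

Passing this check is necessary but not sufficient: a wafer can meet all five criteria and still not work, which is why these numbers serve as process control rather than a guarantee.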
Obviously, this process control doesn't cover everything. Our expectations
are that, when we go to reputable lines, they know how to do the things we
have to have done in order to support our designers and they will produce good
chips all the time. We would not be surprised if one day we find wafers that
don't do anything right but that do meet these five criteria. When this
happens, we are not going to sue anyone--just say, "Sorry! We are not going
to use this line anymore." We are not interested in finding out exactly why
something went wrong.
TESTING 
Next, we have the issue of testing. We don't need tests that are designed
to calibrate fabrication lines because we don't care to calibrate fabrication
lines. There are already people whose job it is to calibrate the lines. The
tests that are important to us tell us something about what yield or what
performance users can expect with our design rules; those areas are our
concern. We would love to have standard industry wafer-acceptance procedures.
We'd like to be able to design one test chip into every wafer, accept the
produced wafer, and then test it. If it passes the test, we say it's a good
wafer, and we pay. If it does not, we don't take that wafer. We'd like to
establish a standard that will be accepted both by industry and, obviously, by
users.
Unfortunately, we have not reached that point yet. We are working on a
standard. JPL, Xerox PARC, NBS, and the Integrated Circuit Lab of HP are
participating in this effort. We have made some progress, but, again, we are
not there yet.
First of all we are trying to test the basic elements, like transistors.
Then we like to test the building blocks, like inverters, to see if they work
to our specs, and then even more complicated random fault structures. In
order to do that, we have our "standard" test patterns, which are designed for
probing, not for bonding.
That test vehicle is an interesting camel. It was supposed to be a lion.
The committee that designed it met too many times. It's very complicated. It
has had many tests, and we are trying to simplify it. Maybe we will be able
to turn it into a lion again. But we don't know yet. We will work on it.
Another issue is how to verify the completeness of the testing. We would
be uncomfortable in a situation where all the tests are passed with flying
colors, and no device works, or most devices don't work.
In the past, there have been two situations in which tests were perfect but
devices didn't work. When that happens you say, "Gee, that's too bad. Let's
change the test." We don't want to go into too many details, but let us
describe one of those cases to you.
On a certain run we had, among others, die D and die E. Unfortunately, an
error crept into the job deck when someone at the mask shop decided to retype
it manually. When the person retyped it he interchanged the diffusion level
of the two dies.
However, our test patterns on the two dies are identical! When we probed,
everything was perfect! So now we say, "Aha! If we put the die designation
on all layers..." (by the way, we thought about doing that before but never
did or the error would have been caught). Now we know how to find this error.
But we have no way of knowing how many other problems will pass all our tests
without being caught.
We use both small test patterns that are part of every die and a few bigger
drop-in tests. All the test patterns of all dies on all wafers are probed.
We actually probe every die. We compute the mean and standard deviations over
the sample of 46 to 50 dies. We are looking for some interesting patterns,
but we hope never to find them.
Two- and three-dimensional analyses of problem data do not reveal any
significant inter- or intra-wafer pattern. Two-dimensional analysis, for
example, indicates whether the north part of the wafer is better than the
south or whether the middle is much better than the edge. Three-dimensional
analysis shows up a difference between wafers. Maybe one wafer is OK, and
other wafers are not. We compute both by wafer and by position, and by many
other statistical means.
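The by-wafer and by-position statistics described above can be sketched roughly as follows. This is only an illustration, not the MOSIS analysis software: the readings, the (wafer, row, column) layout, and the function names are all hypothetical, and the measured quantity is assumed to be a threshold voltage in volts.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical probe readings: (wafer, row, col, threshold voltage in volts).
# Real runs would have 46 to 50 dies per wafer; four are used here for brevity.
readings = [
    (1, 0, 0, 0.79), (1, 0, 1, 0.81), (1, 1, 0, 0.80), (1, 1, 1, 0.82),
    (2, 0, 0, 0.78), (2, 0, 1, 0.83), (2, 1, 0, 0.80), (2, 1, 1, 0.81),
]

def group_by(data, key_fn):
    """Collect measured values into lists keyed by key_fn(wafer, row, col)."""
    groups = defaultdict(list)
    for wafer, row, col, value in data:
        groups[key_fn(wafer, row, col)].append(value)
    return groups

def summarize(groups):
    """Return {key: (mean, sample standard deviation)} for each group."""
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}

# "Three-dimensional" view: does one wafer differ from another?
by_wafer = summarize(group_by(readings, lambda w, r, c: w))

# "Two-dimensional" view: does one die position differ across wafers?
by_position = summarize(group_by(readings, lambda w, r, c: (r, c)))

for wafer, (m, s) in sorted(by_wafer.items()):
    print(f"wafer {wafer}: mean={m:.3f} V, stdev={s:.3f} V")
```

A significant pattern would show up as one wafer, or one position, whose mean sits well outside the spread of the others.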
We have been very delighted not to find significant patterns. If we found
significant patterns, for example, that the northeast corner is always the
best, we would be flooded with requests from users: "Please put my job on the
northeast," or "put mine on wafer number three," or something like that. We
believe that Monday wafers are not really as good as Tuesday wafers, but we
cannot prove it!
We have also experimented with several comprehensive structures that test
typical user devices. Years ago, there was a notion of a typical picture for
computer graphics. A thousand lines was considered a typical picture, and
everyone was supposed to support such a typical picture. What we need now is
a typical user device that we can put on every wafer, try it, and if it works,
then everything is OK. If it doesn't work, we have a problem. We are still
looking for such a device.
As a matter of fact, we are designing several canaries [2] just for this
purpose. One of our canaries is a 19-stage ring oscillator that Xerox's MPC
uses. We'd like to use it as many times as we can. One of the neatest things
about this ring oscillator is that it uses only three pins. This is very
important, because we can nearly always bond it in addition to other projects 
that don't require all these pins. When we bond projects we always try to 
bond the 19-stage ring oscillator and to test all of the oscillators to see 
that everything works. 
In addition, we are trying to get some yield information from the world's 
slowest 4K RAM. We try to come up with ratios such as 3 bits out of 4K didn't 
work on so many units to see if we can derive some yield statistics that are 
meaningful for users. Anyone who has entries or suggestions for this 
collection of canaries is most welcome to submit them for consideration. 
Thanks are due JPL and NBS for providing the following test structures. 
Figure 5 shows die F of run B; what is evident here are some random fault 
structures. There are several miles of metal over poly, etc. One of the 
interesting things to see here is that this is both a drop-in as well as a 
user device (at the lower right-hand corner) . We don't really make any 
differentiation between user projects and drop-ins. 
There were random fault structure dies with more miles of step coverage,
and their logarithmic connections were visible. With random fault structures
like this, there is always the hope that the small portions of the structure 
are small enough not to have any faults and the big portions are large enough 
to make it easy to find the faults. And the worst that can happen is that all 
of them fail or none of them fails. Then you know you are looking at the 
wrong range. 
We had another interesting drop-in from NBS. It is interesting because 
several test patterns are repeated with variable geometry. It is revealing to 
learn more about the geometry and compare it with claims made by 
manufacturers. 
PACKAGING 
Our standard packages, as mentioned earlier, are 40 and 64 pins. We might
bond several projects in the same package if they go to the same customer. It
often happens that several small user projects can be bonded together. We
always try to bond as many test structures as we can; for the time being this
includes only the 19-stage ring oscillator. If we have more later, we will
try to bond them too, but never at the expense of the paying passengers.
[2] Canaries used to be employed in underground mines as indicators of air
quality. If the air was bad, the canary would die, but the miner would have a
chance to return to good air.
Figure 5: Die F of run B 
Researchers at Lincoln are pursuing several techniques for bonding
standardization without imposing limitations on the designers. At present we
bond manually. We'd like to be able to go to automatic bonding, but the large
variation between projects prevents that, at least at reasonable cost. The
folks at Lincoln are trying to work out some techniques that will enable us to
bond automatically.
Our strategy for die distribution is as follows. We try to fabricate n
copies [3] for each project, such that the probability of giving a designer at
least two dies that are mask defectless and silicon defectless will be greater
than 90 percent. Now maybe 90 is not a high enough number, but 90 will have
to suffice. In the worst case we can always refabricate the die just a few
weeks later. The die should also be bonding defectless. If the bonding is
not 100 percent, there is no point in having a perfect project. From time to
time we have discovered some problems in bonding.
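One way to picture how the number of copies n might follow from the 90 percent target is a simple binomial model. The sketch below is an assumption, not the procedure MOSIS actually used: it treats each fabricated die as good independently with a single per-die probability p (a hypothetical figure that would fold mask, silicon, and bonding defects together), and the function names are invented for illustration.

```python
def prob_at_least_two_good(n, p):
    """Probability that at least two of n independent dies are good,
    when each die is good with probability p (binomial model)."""
    p_none = (1.0 - p) ** n
    p_exactly_one = n * p * (1.0 - p) ** (n - 1)
    return 1.0 - p_none - p_exactly_one

def copies_needed(p, target=0.90, max_n=1000):
    """Smallest n for which P(at least two good dies) exceeds the target."""
    for n in range(2, max_n + 1):
        if prob_at_least_two_good(n, p) > target:
            return n
    raise ValueError("target not reachable within max_n copies")

# Hypothetical per-die yields; larger active areas would mean lower p.
for p in (0.9, 0.5, 0.2):
    print(f"p = {p:.1f}: fabricate n = {copies_needed(p)} copies")
```

Under this model a project with a lower per-die yield needs noticeably more copies, which is consistent with the remark that n depends on the active area of the project.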
Using the available data, we were able to achieve our goal of 65 projects
on 18 die types with one wafer set. We showed this wafer set before (Figure
3). This die J (Figure 6) happens to have eight different projects on it.
Some of them could be bonded in the same package (utilizing unused pins), some
of them not; the arithmetic of how many of each can become very interesting.
One of the most interesting projects was done by a student of Chuck Seitz , 
Eric Barton. Eric was very impressed by Chapter 7 in Mead and Conway, and he 
decided to do a self-timed project. So he had his own clock wired into the 
project; it can be seen in Figure 7. When power is applied, the hands
actually move. We disconnected it at 8:00 this morning so it still registers 
that time. The clock was a great thing. We never knew about it until we 
looked through the microscope . First time we saw it under a microscope we 
checked--it was three minutes slow. 
CONCLUSION 
We want to push lambda but not at the expense of the design rules. We
would like to see if we can really reduce the design rules. It's nice to say
one micron, but obviously we do not have to stop there. When people from 
industry come to us with submicron processes, we will be delighted to check 
each of the processes. We're always ready to add more features to nMOS and 
always willing to use other technology . Though we're not sure in exactly 
which order ... 
By the end of 1981 we expect to support over a thousand designers and a 
thousand projects. We want to be very careful with this kind of prediction . 
We like to think that we are underestimating: we would like to see more users. 
Please feel free to try us; we hope that we will be able to accommodate most
of you. 
[3] Needless to say, this magic number n depends on the active area of the
project.
Figure 6: Die J with 8 different projects 
Figure 7: A self-timed project 
ACKNOWLEDGMENTS
Thanks to ARPA, which provides all the funds for this project; JPL and
NASA, which helped us a lot in the first run; NBS, PARC, and the HP Integrated
Circuit Lab, which have helped us a lot in the testing. Thanks also to the
crew at Caltech who provided the transcription of the talk and to Jim Melancon
at ISI for editing the transcript into a reasonable form for publication.
FAST TURNAROUND FABRICATION FOR CUSTOM VLSI
Gunnar A. Wetlesen
VLSI Technology, Inc., Santa Clara, California
Now, you are probably wondering how somebody on crutches is going to be
able to provide fast turnaround or anything. Well, excuse me while I grapple
with everything that I have to carry up here.
While we are in an informal mode, I would like to say that we made an
acronym of an acronym, and we go by VTI, standing for VLSI Technology Inc.
The so-called foundry concept as first proposed by Carver Mead is one of
the areas of business that VTI hopes to support. I am not sure if our coming
about was to make Carver's prognostications accurate, or whether it is indeed
a reflection of the support that we have gotten from a number of investors.
I would like to acknowledge them . There is a computer company with vision 
and venture capitalists that are represented in the audience today that made 
VTI come about. Previously, the idea of the foundry was only available 
internally in large corporations, such as Hewlett-Packard, who did the proto-
typing work for the MPC runs that Lynn Conway described earlier. We are 
going to be one of the factors in that area in the future. 
So, beginning with the more formal portion of this talk, I will be
addressing the whole area of custom VLSI, and addressing the MPC area just a 
little bit. 
The use of custom and semi-custom circuits is rapidly growing in many 
electronic equipment areas, particularly as the potential of VLSI is being 
realized. Helping create this intensity in custom activity are the following 
advantages available through customization seen in contrast to the use of 
standard products: 
1) A competitive advantage: product differentiation and/or
proprietary feature sets.
2) A performance advantage: reduced component count and associated
system overhead.
Realizing the potential of custom VLSI circuits is analogous to the 
emergence of the microprocessor a decade ago . In this case it is presently 
limited by the front-end design phase (both schedule and cost). Thanks to 
many of the people present here today, the design mechanism for translating 
systems into silicon is evolving rapidly . 
Regardless of the design mechanism or the driving force toward custom in 
terms of systems advantages, all custom circuits must subsequently navigate 
the same obstacle in development: the time-consuming and often iterative 
prototyping phase . 
Fast turnaround for custom VLSI fabrication will now be examined . This 
fabrication need will be translated into terms of the characteristics desired 
of the MOS/IC manufacturer . We believe there is a need in the semiconductor 
industry for new kinds of companies properly postured to service the 
requirements for the coming era of custom VLSI . 
Fast turnaround for custom VLSI fabrication as a goal defines three
general requirements which must be supported for the goal to be met. They
are:
1) VLSI level of fabrication technology.
2) Supporting custom designed circuits (as opposed to standard 
products). 
3) Providing fast turnaround fabrication. 
Let us consider now each of these in detail and the resulting "factory"
characteristics in terms of operating philosophy, organization, equipment,
and people.
VLSI FABRICATION TECHNOLOGY
There are several common definitions of what is VLSI. While best
defined in terms of system complexity, other definitions are often based on
feature size or device complexity. For example, the state-of-the-art 64K
dynamic memory has been referred to as a VLSI chip by virtue of its greater
than 100,000 transistor count (figure 1). Certainly, the domain of logic
circuits above 10 transistors can be associated with VLSI system complexity.
Figure 1: Elements per chip (logarithmic scale) versus year of introduction,
with points labeled 16-bit computer, 64K RAM, 32-bit computer, and 256K RAM
What kind of fabrication technology for VLSI? As a practical matter,
fabrication processes will evolve with time. Current technologies with VLSI
potential include scaled n-MOS (commonly called H-MOS) and oxide isolated
CMOS. One thing that is clear is that process technologies are becoming more
specialized depending on their application (figure 2). For example, in 
memory technology, double poly and diffused capacitors have been evolved to 
optimize the density of dynamic RAMs whereas static RAMs have incorporated 
high value polysilicon transistors and multiple transistor thresholds for 
both density and performance. I think it can be concluded that evolution of 
technology for VLSI custom circuits will be on a different branch of the 
technology tree compared to memory circuits. In particular, the use of 
multilevel metal as a solution to the interconnect and signal delay problem 
is overdue for logic circuits. Additional benefits would be gained by 
features improving ROM and PLA densities. 
SPECIALIZATION OF PROCESS TECHNOLOGY
PRODUCT TYPE        SPECIAL PROCESS FEATURES
DYNAMIC RAMS        • DOUBLE POLY
                    • ENHANCED CAPACITORS
STATIC RAMS         • DOUBLE POLY (INTERCONNECT)
                    • POLY LOAD RESISTORS
ROMS                • SELF-ALIGNED CONTACTS
                    • MULTIPLE THRESHOLDS
LOGIC CIRCUITS      • MULTILEVEL METAL
Figure 2
In terms of commercial state-of-the-art equipment, the current noncaptive
lines are still based on 1:1 projection aligners. Selective use of
direct wafer steppers on certain critical mask levels is being employed for
manufacture of volume standard products. Due to the logistics and the
potential for repeating defects associated with changing 10x reticles, the
use of 1:1 projection printers is favored for producing lower volume custom
and customizable circuits. With use of planar plasma etching and reactive
ion technologies, resulting average feature sizes of 2-3 microns appear
readily manufacturable with projection aligners in the near future (figure
3). Other aspects of equipment selection will be considered later.
MOS COMPLEXITY/DIMENSION FORECAST
        DEVICES       DIE SIZE      AREA (mils^2)    LINE
YEAR    PER CHIP      (mils)        PER DEVICE       WIDTH
1978    35,000        200 X 200     1.26             4 µ
1982    250,000       300 X 300     .36              2.1 µ
1985    1,000,000     400 X 400     .16              1.4 µ
Figure 3
SUPPORTING CUSTOM CIRCUITS 
Supporting custom circuit manufacture is markedly different from the
merchant IC industry supplying standard products. The first and most obvious
characteristic is that a custom circuit is by nature unique. This uniqueness
begins with the generation of the mask set, and involves both a significant
data transfer of the base description of the circuit layout, and also the
practical details of mask geometry, polarity, and skewing, and process monitor
chip insertion. The industry needs to develop clean, automated procedures
for easy customer interfacing.
The first step is to replace the personal phone call and the physical
transfer of data base tapes, which will be accomplished instead by what we
call "VTI Net." Access to VTI via a commercial computer network allows the
transfer of design files in CIF and other formats (figure 4). More importantly,
an inquiry service will be established to answer the basic questions
a designer in a remote location needs to ask in order to interface with our
fabrication service, including cost and scheduling information as well as
formats, etc.
NETWORK ACCESS TO VTI ("VTI NET")
• VIA COMMERCIAL NETWORK
• ACCEPTS DESIGNS IN CIF (AND OTHER FORMATS)
• ALLOWS FOR BASIC INQUIRY RESPONSE SERVICE
Figure 4
Central to such an arms-length service working are standardized interfaces
before and after fabrication (figure 5). In particular, the technology
and design rules must be common. nMOS based on lambda rules will be offered
plus specials on a "non-network" basis (figure 6). Mask generation will be
done using an electron beam system not only for dimensional and complexity
reasons but also because it greatly simplifies the starting frame task.
Alignment keys and critical dimension measurement points need not be inserted
on the circuit as those are included within the process control monitor
(PCM) chip inserted by VTI in the mask set.
STANDARDIZED PRE-FABRICATION INTERFACE
• BASED ON LAMBDA DESIGN RULES
• REQUIRES E-BEAM MASK GENERATION
• "STARTING FRAME" INSERTED BY VTI
Figure 5
CUSTOM INTERFACES
TOOLING • DATA BASE 
• MASKS 
FABRICATION • VERIFICATION 
• SPECIAL PROCESSING 
TESTING • WAFER LEVEL 
• PACKAGE LEVEL 
Figure 6 
This same PCM serves as the common denominator for evaluation of the
processed wafer lot (figure 7). It provides process parametric characterization
data automated in both measurement and data reduction. VTI intends to
take advantage of NBS work and to make this PCM and support documentation
widely available to provide a standard suitable for multiple sourcing of
fabrication services. The PCM will provide, however, only limited information
in terms of yield analysis. In addition, VTI will include a "canary"
circuit, for example a large shift register, whose functionality will be part
of the wafer acceptance criteria. In early prototyping phases statistically
significant quantities of these test circuits can be incorporated into the
mask set for correlation and projection of both yield and circuit
performance. The PCMs will be saved in die form when a packaged chip rather
than wafer level interface is employed, so that they can be used both for
later electrical characterization and for quality and reliability
verification for military programs. Similarly, correlation with performance
of both the prototype circuit and the "canary" will be even more valuable if
process variants or "tweaks" are employed either to give special features
(such as on-chip analog interfaces) or to give performance enhancements.
• PARAMETRIC CHARACTERIZATION VIA PROCESS
  CONTROL MONITOR (PCM)
• RESULTS OF "CANARY" CIRCUIT EVALUATION
  (OPTIONAL YIELD DATA)
• PROTOTYPE PACKAGING (FOR CHIPS MEETING
  VTI ASSEMBLY STANDARDS)
• ARCHIVAL OF PCMs (FOR OPTIONAL RELIABILITY AND
  OTHER EVALUATIONS)
Figure 7
In addition, the "VTI Net" will also be set up with multiproject chip
(MPC) capability in mind, similar to that described earlier by Lynn Conway
(figure 8). Initially, the MPC capability will be used internally to support
VTI's design courses. Based on a core of material available on video tape,
these courses are being given in remote locations for appropriately equipped
companies which will in turn enjoy MPC availability.
• INITIALLY INTERNAL FOR VTI COURSES 
• INTERIM AVAILABILITY ON A SELECTED CUSTOMER
BASIS
• NETWORK CAPABILITY IN FUTURE (LOGISTICAL, 
TECHNICAL AND PROPRIETARY ISSUES) 
Figure 8 
In the future, the courses will be given at VTI's planned design center.
When technical and proprietary issues are solved, the MPC capability will be
expanded and will become available for routine prototyping on an individual
basis via the network.
Let us now consider the workings of the factory which supports the
fabrication of custom VLSI. As a result of the percentage of prototype runs
and the larger number of unique mask sets being run to get suitable volumes
in a given manufacturing module, the support organizations of a custom circuit
factory must be substantially different from those of current merchant IC
suppliers. The greater logistics task begins with order entry, where a
technical transfer is implicit, and continues through the factory with production
control, quality assurance, and other organizations being matched accordingly.
Clearly, the task of performing on a build-to-suit basis at each step
from maskmaking through fabrication, test, and assembly, while maintaining a
line item orientation, is much greater than that for the inventory orientation
used in standard products. The operating philosophy is completely different,
perhaps best described as the need to be effective rather than
efficient, especially in the context of fast turnaround. The caliber and
awareness of people in these support organizations must be matched accordingly.
The mainstay of their equipment must be real time control systems for
scheduling and tracking programs throughout the factory cycle. Ultimately,
status information should become available as part of the inquiry service
over the network.
PROVIDING FAST TURNAROUND 
In traditional IC companies, factory production cycles of 16 weeks,
assuming mask availability, are not uncommon. Of this cycle typically 6-8
weeks is in wafer fabrication. Clearly, this cycle is untenable for circuit
prototyping--a year could be consumed in 2-3 iterations. As will be seen,
however, carryover of a fast turnaround philosophy into production also
brings tangible benefits.
The theoretical limit on fabrication cycle time of a typical MOS process
is on the order of 2-3 days. A practical goal for minimum lot size fabrication
of prototypes is one week if the factory is organized accordingly to
minimize waiting in queues before each operation. The preeminent requirement
in achieving fast turnaround is the operating philosophy (figure 9).
Instilling in the work force the concept that "minutes count" is a further
development of this theme. In addition to the scheduling and tracking
organizations operating as a "real time" function, the other organizational
strategy for providing fast turnaround of prototypes is the pilot line
concept (figure 10). Located within the manufacturing facilities to assure
ease and success of later production transfer, a prototype pilot line can, by
providing the necessary focus and priority, reduce fabrication cycle times
severalfold. In addition to providing an elite team of skilled people, it is
necessary to give last-in, first-out priority on shared equipment plus
provide dedicated equipment at steps where lot uniqueness is maintained, such
as photomasking. Furthermore, factory equipment must be more redundant to
provide capacity for "surge conditions" and be chosen to be not too
"state-of-the-art" so as to maintain high up-time.
COMPANY CHARACTERISTICS
COMPARISON              EXISTING IC CO'S        PROPOSED CO
GENERAL ORGANIZATION    BASED ON EFFICIENCY     BASED ON EFFECTIVENESS,
                                                PERFORMANCE TO SCHEDULE
PHILOSOPHY              INFLEXIBLE              FLEXIBLE (PROCESS,
                                                PRODUCT, ETC.)
                        COST BEFORE SERVICE     SERVICE ORIENTATION
                        PRODUCT ORIENTATION     CUSTOMER ORIENTATION
                        INVENTORIES COUNT       TIME COUNTS
Figure 9
I  PROTOTYPES
STEP            TYPICAL         POSSIBLE
MASK MAKING     4-6 WEEKS       3 DAYS
WAFER FAB       4-6 WEEKS       9 DAYS
PACKAGING       2-3 WEEKS       3 DAYS
TOTAL           10-15 WEEKS     3 WEEKS
II PRODUCTION (WITH EXISTING MASKS)
                TYPICAL         POSSIBLE
                12-18 WEEKS     8-8 WEEKS
Figure 10
Wafer fabrication is only one link in the turnaround chain. Mask making
and assembly are others. In-house control of an e-beam mask generation
system can keep this cycle time minimized. Typical writing and processing
times make one plate per hour realizable with additional time for inspection.
As a result, two- to three-day turnaround for a mask set is practical while
still providing priority for single layer redos or customized devices.
Similarly, an in-house prototype packaging line can assemble prototypes in a
couple of days.
Summing up these individual times it can be seen that less than a week 
total cycle time from receipt of tapes to custom prototypes could be routine 
if an IC manufacturing facility were postured as described. Moreover, the 
production cycle times could easily be halved compared to traditional IC 
suppliers in this same atmosphere . In terms of limiting exposure to upstream 
yield or reliability problems as well as responding to increased customer 
needs, shortening of the manufacturing pipeline is an equally attractive 
possibility . 
CONCLUSION
VTI has been funded to serve the need of a high technology for VLSI and 
a service-orientation to provide quick turnaround. 
LONGER TERM DIRECTIONS FOR SEMI-CUSTOM VLSI
Gordon B. Hoffman 
United Technologies Microelectronics Center, 
Colorado Springs, Colorado 
Through the convenience of jet travel I have been able to talk both in
Detroit and here in Pasadena today--although I admit the time zone change
helped, too. As I was cruising along I realized that next year we will
begin limited production of a set of semicustom CMOS gate array chips for
the jet engine fuel control on the Pratt & Whitney engine that has recently
received so much attention due to large orders from Delta and American
Airlines. A year or two after this production begins, my life, as well as
yours, might well depend on the performance of that controller. It's a
sobering thought that tends to bring home the reality of the technologies we
develop.
Our newly formed company, United Technologies Microelectronics Center,
or UTMC for short, is dedicated to the development and design automation of
semicustom circuits. Before UTMC was formed last year, all of the divisions
of United Technologies had to go outside for their custom IC needs. It was
getting increasingly difficult to get support from the merchant semiconductor
industry, particularly where low volumes of devices were required.
In 1979 United Technologies acquired Mostek, and there was hope that
Mostek would help alleviate this difficulty in obtaining custom integrated
circuit support. Now those who know Mostek realize that "custom" is not a
happy word at Mostek. "Custom" represents to Mostek the wrong use of scarce
design resources. Mostek does not disagree that there is a strong market
need, but for Mostek it has been a conscious business decision not to
participate. Mostek did not originally hold this view; in fact, the first
one-chip calculator circuit was made by Mostek on a custom contract. The
original customer is now bankrupt, however, which gives you an inkling of
why Mostek is leery of the custom business.
The custom needs of the UTC divisions range from five- to ten-chip
sets a month for esoteric military applications to high-volume chips for
automotive applications. Because of the mismatch with Mostek, a high-volume
MOS commodity IC supplier, a joint study was launched by UTC and Mostek to
find a solution. As a result of that study UTMC was created last year in
Colorado Springs and funded with $22M for Phase I. Our Phase I goal is the
development of a CMOS gate array design system by the end of 1982. Mostek
will benefit as well as the other UTC divisions, both as a user in its
system divisions and in its ability to offer semicustom services to the
merchant market using the tools we develop at UTMC.
Gate arrays were chosen as our initial thrust because automation of
that design style is possible in the time scale we had to work with, that
is, ASAP. Gate array design turnaround time is rapid, and production costs
are reasonable, even for the automotive volumes some of our divisions
require.
Since our goal from the outset was an automated design system, actual
design of the arrays and processing considerations are secondary to CAD
requirements. For example, if we can simplify our software or make it
easier to use by slightly increasing the wafer processing cost, then we'll
live with the extra processing cost. In fact, one of our first decisions
was to use two-level metal with our CMOS arrays to make the routing problem
easier and the corresponding software task smaller.
Preoccupation with chip area is another concern that needs examination
with semicustom circuits. Yields for gate arrays and other semicustom
approaches tend to be much higher than for custom circuits of corresponding
chip size. Several factors are involved, including a lower active density,
fewer design rule violations, particularly with automated design systems,
and the fact that a cumulative learning curve generally applies to
semicustom circuit cost independent of the actual customization. It's
really the number of good die per wafer that determines chip costs, which
is in turn determined by a combination of chip size and yield. A gate array
twice as large as an equivalent custom chip, but with twice the yield, has
the same chip cost. As it turns out, chip cost becomes less important as
system functions require more expensive, high-pinout packages, and as other
system integration cost savings are taken into account. I'll talk more
about this subject a little later.
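The die-cost argument above can be checked with a few lines of arithmetic.
The sketch below is illustrative only: the wafer cost, wafer area, die
areas, and yields are hypothetical numbers chosen for the example (they are
not UTMC or Mostek figures), and die lost at the wafer edge are ignored.

```python
def cost_per_good_die(wafer_cost, wafer_area_mm2, die_area_mm2, yield_fraction):
    """Cost of one good die: wafer cost divided by good die per wafer."""
    gross_die = wafer_area_mm2 / die_area_mm2   # simplification: no edge loss
    good_die = gross_die * yield_fraction
    return wafer_cost / good_die


# All values below are hypothetical, for illustration only.
WAFER_COST = 100.0       # dollars per processed wafer
WAFER_AREA_MM2 = 7800.0  # roughly the area of a 100 mm wafer

# A custom chip, and a gate array twice as large with twice the yield.
custom = cost_per_good_die(WAFER_COST, WAFER_AREA_MM2,
                           die_area_mm2=20.0, yield_fraction=0.25)
gate_array = cost_per_good_die(WAFER_COST, WAFER_AREA_MM2,
                               die_area_mm2=40.0, yield_fraction=0.50)

print(f"custom chip: ${custom:.2f} per good die")
print(f"gate array:  ${gate_array:.2f} per good die")
```

With these inputs both cases give the same number of good die per wafer, so
the cost per good die is identical, which is the point made in the text.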
We will have the capability to merge several different array
metallizations onto the same mask set and, therefore, on the same wafer.
This will allow us to economically process very small quantities of
devices, even breadboard quantities.
The complete cycle from the start of logic design to delivery of custom
chips should require 3 to 6 months vs 18 months or more for traditional
custom circuits. Logic design for the jet engine fuel control mentioned
earlier began last summer and the first complete chip set will be delivered
this February.
Despite manual placement and routing of these four chips, each with
about 6000 CMOS pairs, only seven months were required for the entire
design cycle, so I feel very confident about the 3 to 6 month projection
using design automation.
This overall system really represents a type of foundry operation,
connecting silicon to order. The concept can, of course, be extended to
other forms of semicustom ICs and I'll talk about that later.
Now many of you realize that semicustom concepts, and gate arrays in
particular, have been talked about for more than 10 years. Systems similar,
at least in concept, to the one I've just described have been tried but
without commercial success, that is, until recently. IBM has used gate
arrays quite successfully, as have DEC, Amdahl, and Storage Technology, to
name a few. I feel it's instructive to see why semicustom approaches are
experiencing a "renaissance" after a long period of "dark ages," which
should give an insight into future directions. Let me share with you some
of the thoughts along this line that led to our entry into the semicustom
arena.
The first observation was that of transparency. If we can make silicon
design transparent to logic designers then we have solved the shortage of
silicon designers that so limits the interest of merchant semiconductor
suppliers in custom design. Our design automation system is an attempt to
do just that. Not only is the initial design time reduced, but so is the
time for the inevitable design changes that always seem to occur in custom
designs.
When simulation, testability verification, and array routing are
complete, the divisions will send us, via DECNET, data that contain test
patterns, routing, and identification. We will derive the actual test
equipment programs from these data and forward mask-making and
identification information on to Mostek. It is interesting to note that our
customers need not divulge the chip's function, and it would take quite an
effort to derive its function from the data they supply us.
At Mostek three interconnection masks will be generated: first metal,
second metal, and the vias between the two. E-beam mask-making equipment
will be used, although we envision going to E-beam direct write on wafer by
1983. Preprocessed CMOS gate array wafers will be inventoried at Mostek
with the first level of aluminum already applied to the wafer. We expect
the turnaround time for mask-making and the completion of metalization to
take two weeks or less, rather than the 13 to 18 weeks normally required for
a complete set of CMOS masks and wafer processing from bare silicon.
When Mostek is finished with the metalization, they will deliver the
customized wafer to us and we will probe, assemble, final test, and
environmentally process the finished devices prior to shipment. Our initial
packaging efforts will utilize leadless chip carriers, or a leaded version
of these.
The next reason behind the semicustom renaissance is performance. The 
metal-gate PMOS arrays of the early 70's had a narrow application, since 
they were not fast enough to replace TTL in most digital systems. Today, 
with bipolar or CMOS arrays, gate delay times can be achieved such that TTL
replacement, including Schottky TTL, is quite practical, greatly expanding 
the size of the potential market. With CMOS we anticipate average on-chip 
gate delays under 3 nsec using double-level metal and 3.5 micron gates, and 
there is still a lot of room for further improvement. 
There is also the ability to insert technology improvements without
modifying the basic functional design. We expect, for example, to be able 
to process the same array metalization on differently processed arrays for 
radiation hardened applications or very high-temperature needs. 
The last and perhaps most important attraction to semicustom circuits 
is economics. While traditional logic design costs and associated bread-
boarding, documentation, and preparation for manufacturing costs continue to 
increase, computer time costs have dropped dramatically. Some of you 
remember the discretionary-wired LSI program that was much touted in the 
late 1960's. At that time I recall the computer run time costs alone to map
and route interconnections to the good die on a two-inch wafer amounted to
$2,000. Remember those are 1968 dollars, or about $4,700 in today's rhubarb
currency. Based on a recent study, using an IBM benchmark program, computer
costs have declined so much that what cost $2,000 in 1968 would cost only
$40 today, or $17 in 1968 dollars, and computer run costs are continuing to
decline. In fact, computer costs may be dropping as fast as MOS RAM prices
are, and you can't say that for many items in today's world.
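The dollar conversions quoted above can be reproduced with a short
calculation. This is only a sketch of the arithmetic: the inflation factor
is inferred from the talk's own figures ($2,000 in 1968 being about $4,700
today) rather than taken from any published price index.

```python
# Figures quoted in the talk.
COST_1968 = 2000.0                   # run cost in 1968, in 1968 dollars
COST_1968_IN_TODAYS_DOLLARS = 4700.0
COST_TODAY = 40.0                    # run cost today, in today's dollars

# Inflation factor implied by the two 1968 figures (an inference, not an index).
inflation_factor = COST_1968_IN_TODAYS_DOLLARS / COST_1968

# Today's cost expressed in 1968 dollars, and the real (inflation-adjusted) drop.
cost_today_in_1968_dollars = COST_TODAY / inflation_factor
real_cost_ratio = COST_1968 / cost_today_in_1968_dollars

print(f"implied inflation factor: {inflation_factor:.2f}")
print(f"today's cost in 1968 dollars: ${cost_today_in_1968_dollars:.0f}")
print(f"real cost reduction: about {real_cost_ratio:.0f}x")
```

The result agrees with the $17 figure in the text and corresponds to a real
cost reduction of a little over a hundredfold.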
The economic benefits of system integration onto silicon are well known 
and have been the driving force toward higher levels of integration. But 
access to higher levels of integration has been limited to high-volume 
system manufacturers that could justify traditional custom design costs, or 
to smaller users through microprocessors and related standard products. The 
immense volume of TTL and other forms of small and medium scale integrated 
circuits demonstrates that the transition to higher levels of integration is 
far from complete, and is an attractive area for penetration by semicustom 
circuitry.
Inflation has exacerbated the problem, not only by increasing the cost
of the small and medium scale integrated devices themselves, but also by
dramatically raising the cost of putting them into systems and maintaining
operation of those systems. A $100, 124-pin semicustom gate array, for
example, may be a real bargain if it replaces 60 TTL packages, eliminates a
PC board and edge connector, and saves on service and repair costs. Other
system advantages include the potential elimination of cooling fans,
smaller and less costly power supplies, lower system manufacturing cost,
and smaller cabinet volume requirements and therefore more cabinet styling
latitude. I could go on to list at least 10 more advantages of system
integration, but I'm sure you could come up with interesting lists of your
own.
Finally, under the topic of economics are "market window costs" to a
systems manufacturer. There is a measurable dollar value to getting a
product into the market early, or at least before competition gets there.
As experience shows again and again, the highest return on investment
belongs to the manufacturer who dominates market share, and it's hard to
get this domination if you are a year or two late with your product
introduction. With semicustom circuit design, times are short, production
ramps up quickly and, I believe, without the number of unexpected design
fixes often required with traditional custom circuits, or conventional
designs using TTL for that matter, and necessary changes are quickly
implemented.
Market window costs may be a big factor in keeping even high volume
manufacturers from initially going the traditional custom route, and I
wonder if the switch to custom would occur even if economics were
favorable. At the design automation conference last year I asked a European
manufacturer of gate arrays, who has fabricated over a thousand different
customizations, how many were eventually designed out and replaced by
smaller custom circuits. His answer was none, to his best recollection. It
seems the engineering and financial resources to make the conversion were
always put into new product designs, where the return on investment was
consistently higher. I expect this is not an isolated phenomenon.
Now I've talked about what we are doing with gate arrays, why the
semicustom approach is experiencing a revival, and the critical role design
automation will play. My perspective, admittedly, has been from the MOS
point of view, although there is quite an activity in bipolar technology as
you know. I now think it's time to make some predictions beyond where my
earlier comments leave off.
I don't see gate arrays being a temporary phenomenon. The quick
customization will be critical in many applications, despite chip size
implications, at least until automated fab operations can produce chips from
bare silicon in a really short time frame. There's a good deal of
architectural innovation on the horizon for gate arrays as well. We have a
big problem to solve with testability, but that problem is endemic with all
efforts toward higher levels of integration, and must and will be solved.
The next logical step is to use standard cells, where the design 
system is identical to gate arrays, except that the wafers are processed 
from bare silicon using compact, although dimensionally constrained, layouts 
for each cell type, and resulting in a smaller chip. This approach would be
a cost-effective transition from a gate array design for high volume 
applications once the system design has been stabilized. We intend to make 
standard cells a key part of our second-phase effort, which begins in 1982. 
Following the standard cells I can see a "macro cell" approach where
macro cells could be subsystems themselves, not constrained to particular
shapes or positions. In the cell library there might be a 32-bit processor,
speech synthesizer elements, memory, standard cell logic blocks, and so
forth. The design automation requirements are hardly trivial,
necessitating, for example, some sort of high-level machine description
language. After all, we can't keep dealing with simple logic design inputs
as the level of integration continues to rise.
And I couldn't finish this talk without a comment on foundries. It's
clear to me that with design automation, connecting silicon to order will
become a way of life. The order initiators will, in time, be predominately
the custom community rather than the semiconductor manufacturers themselves.
To make this practical, design automation is the key. Semiconductor wafer
fab and backend operations must evolve from an orientation toward making
huge volumes of relatively few circuit types to small volumes of many
circuit types. Our plans are to develop the systems I have described today
and work with Mostek to define and develop such a foundry operation in the
mid 1980's.
Let me leave you with one last thought by proposing a definition of
semicustom circuits, and then leaving you with a curious observation.
Semicustom integrated circuits: those ICs which appear custom to the user
but are standard to the manufacturer. If this definition is a fair one, and
I believe it is, then the distinction between semicustom and custom is
bound to disappear. Design automation at the device level is progressing
rapidly and transparent custom circuit design can't be far off. When it
does become a reality then the only distinction might be entry level, device
versus logic, or some other level, but I suspect that will merge as well.
I will leave the conclusions to you.
FABRICATION SESSION 
Chairperson: James D. Meindl 
Professor of Electrical Engineering
Stanford University 
VLSI is governed by a hierarchy of limits. Each of the five levels
of this hierarchy is constrained by all preceding levels. Briefly,
limits representing each level are
(1) A fundamental limit derived from thermodynamics requiring the energy
expenditure per switching transition in a digital system to be
$E_s \geq 4kT = 1.65 \times 10^{-20}$ joules.
(2) A material limit for silicon requiring the transit time of an
electron ($\Delta t$) through a potential drop $\Delta V$ to be
$\Delta t \geq \Delta V / (v_s \mathcal{E}_c) = 0.416$ ps
for $\Delta V = 1$ V, with $v_s$ the scattering-limited velocity and
$\mathcal{E}_c$ the critical field.
(3) A device limit for an N-channel MOS transistor to avoid punch-through
requiring a source-to-drain spacing
$L \geq 2[2\epsilon_{Si}(V_D + \phi_{bi})/qN_A]^{1/2} = 0.18\ \mu\mathrm{m}$
for substrate doping $N_A = 1.5 \times 10^{17}/\mathrm{cm}^3$ and drain
voltage $V_D = 0.5$ V.
(4) A circuit limit for a CMOS inverter requiring an average power drain
$P_{AVE} = C_L V_{dd}^2 f$, where $C_L$ is the load capacitance, $V_{dd}$
the drain supply voltage and $f$ the frequency of excitation.
(5) A system limit. This level consists of a number of sublevels which
include both software and applications constraints. Consequently,
system limits represent the most numerous and nebulous group of the
hierarchy albeit potentially the most profoundly important.
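The numerical values quoted for limits (1) and (2), and the form of limit
(4), can be checked with a short calculation. In this sketch the temperature
(300 K) and the values used for the scattering-limited velocity and the
critical field are assumptions, chosen because they reproduce the quoted
numbers; the text does not state them. The load capacitance, supply voltage,
and frequency in the last example are purely illustrative.

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def switching_energy_limit(temp_k=300.0):
    """Limit (1): minimum switching energy 4kT, in joules (T assumed 300 K)."""
    return 4.0 * K_BOLTZMANN * temp_k


def transit_time_limit(delta_v, v_s_cm_per_s=8.0e6, e_crit_v_per_cm=3.0e5):
    """Limit (2): delta_t = delta_V / (v_s * E_c), in seconds.

    The default v_s and E_c are assumed values, not taken from the text.
    """
    return delta_v / (v_s_cm_per_s * e_crit_v_per_cm)


def cmos_average_power(c_load_f, v_dd, freq_hz):
    """Limit (4): P_AVE = C_L * Vdd^2 * f, in watts."""
    return c_load_f * v_dd ** 2 * freq_hz


e_min = switching_energy_limit()
t_min = transit_time_limit(delta_v=1.0)
p_ave = cmos_average_power(c_load_f=1.0e-12, v_dd=5.0, freq_hz=1.0e6)  # illustrative

print(f"4kT at 300 K: {e_min:.3e} J")
print(f"transit time for 1 V: {t_min * 1e12:.3f} ps")
print(f"P_AVE for 1 pF, 5 V, 1 MHz: {p_ave * 1e6:.1f} microwatts")
```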
Any level of the hierarchy can be split into two generic metalevels.
At the circuit level the conceptual metalevel is intrinsic or independent
of the fabrication equipment or technology used to produce a chip, while
the practical metalevel depends, for example, on whether photolithography
or X-ray lithography is used.
The papers in this session on "fabrication" deal largely with
practical limits in VLSI at the device and circuit levels.
TRENDS IN SILICON PROCESSING
V. Leo Rideout
IBM Corporate Headquarters
Armonk, NY 10504
Abstract
The advent of very large scale integration will require
substantial progress in all aspects of silicon technology:
processing, lithography, modeling, design tools, chip
architecture, and applications. This paper will survey current
trends in silicon integrated circuit fabrication, focusing
on new developments and outstanding problems. Progress in
both bipolar and MOSFET technologies will be considered.
Silicon fabrication techniques will be described in terms of
the repetitious application of operations that are additive
(oxidation, doping, deposition), selective (lithography), and
subtractive (etching). The objective of these operations is
a reliable and predictable device structure. Device structures
will be described in terms of isolation areas, devices,
contacts (intraconnection vias), wiring (interconnection lines),
and passivation. Immediate problems in isolation size,
device performance, contact resistance, and wiring topography
will be identified. Future needs for improved structures
will be indicated. Promising new approaches such as
lightly-doped drain FETs and silicide-on-polysilicon (polycide)
wiring will be described. Throughout this discussion the
importance of process modeling will be emphasized.
1.0 INTRODUCTION 
The advent of very large scale integrated circuits (ICs) will
require substantial progress in all aspects of silicon
technology: processing, lithography, modeling, design tools, chip
architecture and applications. This paper will survey current
trends in silicon integrated circuit fabrication including
processing techniques, lithographic tools, and process
modeling. As illustrated in Figure 1, the level of integration in
mass manufactured IC chips has reached 160,000 components on
a 64 Kbit dynamic random-access memory (RAM) chip, 80,000 on
a 16 Kbit static RAM, and 40,000 on a 16 bit custom
microprocessor (1). By the end of this decade we can expect 512
Kbit dynamic RAMs and 128 Kbit static RAMs with one million
components per chip, and 64 bit microprocessors with 250,000
components. By the term component we mean an integrated
transistor, resistor, capacitor, or diode.
In 1979, the worldwide sale of semiconductors was $11.1 billion,
most of it in silicon ICs. In 1980, semiconductor manufacturers
are expected to invest well over $2.0 billion in new
fabrication plant and equipment. Including captive silicon
suppliers, the total investment will be in excess of $3.0
billion. Obviously silicon processing is a big business.
Presently, the most advanced bipolar and MOSFET circuits are
mass produced with photolithographic groundrules of 2.5 to
3.0 microns. This is expected to improve to 2.0 microns in
1982, and to 1.0 microns by about 1988 (see Figure 2). New
processing tools and techniques needed to maintain this
progress and novel device structures evolving at smaller
dimensions will be discussed in this paper.
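The projection above (2.0 microns in 1982, 1.0 micron by about 1988) is
consistent with a simple exponential model in which the minimum line width
halves every six years, the trend line marked on Figure 2. The sketch below
only illustrates that extrapolation; anchoring the curve at 2.0 microns in
1982 is a choice made here for the example, and the function name is
hypothetical.

```python
def projected_linewidth_um(year, ref_year=1982, ref_width_um=2.0,
                           halving_years=6.0):
    """Minimum line width in microns, assuming it halves every `halving_years`."""
    return ref_width_um * 2.0 ** (-(year - ref_year) / halving_years)


for year in (1982, 1985, 1988):
    print(f"{year}: {projected_linewidth_um(year):.2f} microns")
```

Any extrapolation beyond the years discussed in the text is an assumption of
the model, not a claim made by the paper.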
FIGURE 1: Yearly rate of improvement in components per chip.
FIGURE 2: Progress in lithographic linewidth. [Plot of minimum line
width for production ICs versus year, 1962 to 1990. Data: Dataquest Inc.
(1974), G. Moore (1975), V.L. Rideout (1978), J.C. van Vessem (1979).
Trend: line width halves every 6 years.]
2.0 ADDITIVE OPERATIONS 
Silicon IC fabrication may be thought of as the repetitious
application of materials processing techniques that are additive,
selective, and subtractive in nature. Typically, only one
lithographic masking operation or step is associated with each
loop of this repetitious manufacturing procedure. It will be
assumed that the reader is at least basically informed about
the fabrication process for silicon integrated circuits (2).
The ability to thermally grow a layer of silicon dioxide either
locally or globally on a silicon wafer is one of that
semiconductor's most important properties. The oxides are usually
grown in dry oxygen (thin layers/slow growth) or in the presence
of water vapor (thick layers/fast growth). Chlorine ions,
usually in the form of HCl vapor, can be added to reduce oxide
charge and improve capacitor characteristics, particularly in
MOSFETs. Oxidations are typically performed in the 900-1100°C
range for times of 10 to 100 minutes.
A major direction in oxidation techniques is toward lower 
temperatures (less than 1,000°C) and higher pressures (more 
than one atmosphere). The desirability of lower fabrication 
temperature is quite general because the drive for continually 
shrinking lateral dimensions (i.e., device scaling) has led to 
thinner vertical layers and reduced processing times. The
processing time reduction can become so severe (only a few 
minutes) that uniformity control is impaired. In order to 
maintain or even lengthen processing times for better control, 
lower processing temperatures are desired. Other advantages 
of lower processing temperature include reduction of out-
diffusion, grain growth, defect generation, and wafer 
warpage. 
2.1 OXIDATION
During oxidation, both the linear and parabolic rate constants
increase with the partial pressure of the gaseous oxidant (3)
(see Figure 3). This advantageously leads to a much faster
oxidation rate at a given temperature or the same oxidation
rate at a much lower temperature. For every increase of one
atmosphere, the oxidation rate doubles. Alternatively, if a
fixed rate is desired, every increase of one atmosphere allows
a reduction of 30°C in oxidation temperature. Potential
advantages include reduced thermal-induced damage (e.g.,
wafer warpage and oxidation-induced stacking faults), lower
surface-state density, and reduced boron depletion. Presently,
pressurized furnaces are commercially available for dry and
wet oxidations at pressures up to 25 atmospheres. This
technique can be expected to be incorporated into production in
the near future.
2.2 CHEMICAL VAPOR DEPOSITION 
Thin layers can also be chemically vapor deposited (CVD) from
gaseous sources in an RF induction-heated furnace to provide
epitaxial silicon (900-1300°C), polysilicon/SiO2/Si3N4
(600-1000°C), and passivation (600°C) layers (4). Dopants can be
incorporated into the chemical deposition process. A major
trend is to use low pressure CVD (0.1 to 40 Torr) which affords
improved thickness control (especially for polysilicon layers),
reduced auto-doping, and higher throughput. The deposition
temperatures listed above could potentially be reduced by 100
to 200°C. Probably the most difficult deposition process to
improve on is single crystal silicon-on-silicon epitaxy which
is essential to bipolar processing. Two interesting research
techniques that address this problem are molecular beam epitaxy
or MBE (5) and solid phase epitaxy or SPE (6) which utilize
substrate temperatures of 400 to 600°C.
FIGURE 3: Wet oxidation growth curves for one and ten
atmospheres (after Ref. 3). [Plot of oxide thickness versus oxidation
time (0 to 240 minutes) for <100> substrates of 3-10 ohm-cm resistivity;
curves shown for 1000°C at 10 atm and for 1150°C, 1100°C, and 1000°C at
1.0 atm.]
The chemical vapor deposition rate can be enhanced by the presence
of an RF plasma. The gaseous reactants (e.g., nitrogen,
ammonia, and silane) interact to form a solid film product and 
other gaseous by-products (7). The reaction is sustained by 
the RF plasma rather than by external hot-wall heating. Low 
deposition temperatures (200 to 400°C) and highly conformal 
films are the result. Plasma deposition of silicon nitride is 
now widely used for final passivation as a replacement for 
phosphorus-doped silicon dioxide. A related future activity 
is photo-excited CVD of silicon nitride, silicon dioxide and, 
possibly, epitaxial silicon. 
Molecular beam epitaxy (5) utilizes a vaporized beam in ultra
high vacuum. The MBE technique has exhibited excellent thin
film quality (one micrometer thickness) but is hindered by
throughput limitations and equipment cost. With solid phase
epitaxy, a doped amorphous silicon layer is deposited onto the
substrate, and an epitaxial film is produced by heating the
composite either locally or in a furnace. N-channel MOSFETs
have been fabricated in SPE-grown epitaxial layers with
channel mobilities of 360 to 480 cm^2/V-sec (6). In addition
to low temperature growth for improved layer thickness control
and reduced dopant redistribution, SPE offers selected-area
epitaxy which can be attractive for defining isolation regions.
This can be achieved by depositing onto a masked substrate,
or by local heating with a laser or electron beam.
2.3 BEAM HEATING 
In several processing operations it is necessary to subsequently
heat or anneal the wafer. One such example is the annealing of
regions doped by ion implantation to remove local stress and to
activate the dopant. Traditionally an RF heated furnace with an
inert gas ambient is used for annealing in the 800-1000°C range
for about 30 minutes. Significant dopant redistribution can
occur during the annealing step, but laser or electron beam
heating offers the potential for fast local heating that avoids
this. Obviously, the beams can be scanned to anneal the entire
wafer. Potential advantages are reduced processing time and
lower cost. Thus far beam heating techniques are not reliable
enough to anneal active semiconductor regions for production
devices. The future applications of beam annealing in order of
acceptance probably are:
• inducing backside damage gettering,
• forming silicide layers,
• activating doping in polysilicon layers,
• annealing contact regions,
• growing epitaxial layers from deposited amorphous films,
• annealing implanted device areas,
• relieving stress in silicon-on-sapphire or in local
  isolation regions.
2.4 DOPING 
Ion implantation is steadily replacing solid and gaseous 
diffusion as the primary means of doping silicon because ion 
implantation offers much better areal doping uniformity 
(better than 1%) as well as profile tailoring. The uses of 
ion implantation for backside gettering, channel threshold 
adjustment, source/drain or emitter/collector doping, and 
resistor fabrication, are well known. Machine capability 
is steadily being improved, particularly for higher currents 
up to 10 mA, and higher ion energies up to 400 keV (8). These
higher throughput machines give rise to concerns with heat 
dissipation in the wafer. 
One trend in ion implantation is toward very low energy (less
than 10 keV) implants for shallow distributions which will be
needed for threshold adjustment of micrometer-sized MOSFETs.
The uniformity of low dose/low energy implants is still a
significant problem. Another concern associated with such low
energy implants is anomalous channeling which degrades the
distribution. Yet another difficulty is grain boundary
channeling, particularly of As or P through polysilicon gate
electrodes (9). Another new implant application uses extremely
high energies (2-3 MeV) to implant deep buried layers for
alpha particle collection grids. Other novel applications
of ion implantation include enhanced etching of oxide and
silicon regions, contact via hole doping, silicide formation,
and double-diffused (DMOS) FETs.
Focused ion beams (10) offer one means for combining additive,
selective, and subtractive processing operations. Potentially,
this technique could selectively dope the substrate, expose
resist patterns, or sputter etch thin films. The goal is to
reduce processing steps and eliminate masking operations.
Narrow line widths and precise registration will be required,
however. To date, in the research laboratory, Ga ion beams
have been used for doping and machining selected regions.
Focused boron and arsenic beams have also been demonstrated.
The most likely initial application of focused ion beams is
in special applications requiring customized fabrication.
2.5 METALLIZATION
By far, the most popular material for metallic low resistance
interconnection lines in silicon integrated circuits is
aluminum. Aluminum is abundant, inexpensive, easy to evaporate and
to pattern, self-passivating, and adheres well to both silicon
and silicon dioxide. The most common metal deposition
techniques are:
• evaporation from an RF heated source,
• evaporation by electron-beam heating, and
• sputtering.
Of these, the RF heating approach offers the least radiation
or surface damage, particularly for FET fabrication. DC
magnetron sputtering also has low associated radiation. The
deposition of other metals for rectifying contacts (11) (e.g.,
Pt, W, Ta, Nb) often requires electron-beam heating with
higher risks of radiation damage.
One of the most important trends in metallization is the
development of two, three, or even four layers of metal wiring
paths. Most of the difficulty centers around the insulating
layers between metal levels (sputtered quartz, nitride,
oxide-nitride or polyimide) and the contact holes through these
layers. The insulating layers must be deposited at low
temperatures so as not to degrade the first metal (aluminum)
interconnections. As more layers are added, the topography
of the structure becomes less and less planar, and
correspondingly the linewidth control degrades. For example, in
a triple metal system, the third level metal lines may have
to be double the width of the first level lines. A goal
is to develop planarizing techniques and via refill steps
to improve planarity and line control. FETs with two metal
levels and bipolars with three are now commonplace.
Aluminum, gold, and silver do not form intermetallic compounds
with silicon, but many other metals such as Pt, Pd, Ti, Ta, Mo,
and W do. These intermetallic compounds, called silicides,
provide intimate metal-semiconductor contacts which have a
number of useful properties including high barrier heights,
high eutectic formation temperatures, and low resistance.
Applications of silicides include both rectifying and ohmic
contacts, and, potentially, interconnection lines on top of
polysilicon or diffused regions. A new area is the use of
ion or laser beams to form silicides.
Low formation temperature silicides like PtSi and Pd2Si have 
long been used as rectifying contacts and more recently 
studied as a means for reducing contact via resistance 
between Al lines and Si regions. Such techniques are applied 
late in the process after all high temperature steps have 
been completed. The development of silicide layers that can 
withstand high processing temperatures is now one of the most
active research and development areas for IC metallurgy. 
It has been proposed that a high formation temperature silicide 
such as tungsten, tantalum, or molybdenum silicide could be used
to reduce the sheet resistance of polysilicon or diffused
silicon regions in FETs (12). For process ground rules of over
three micrometers, diffusion depths and polysilicon thicknesses
are large enough that sheet resistances of 15 to 30 ohms/
square can be obtained. As dimensions are reduced, however,
sheet resistances rise, degrading performance. A major
objective is to cover the thin polysilicon layer with a low
resistance silicide layer, yielding a composite "polycide"
layer with the gate electrode properties of polysilicon but
with a sheet resistance of one to five ohms/square (13, 14).
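The effect of sheet resistance on an interconnection line can be illustrated with a short calculation. The following is a minimal sketch, assuming the standard relation that line resistance equals sheet resistance times the number of squares (length divided by width). The line length and width are illustrative assumptions; only the sheet resistance ranges come from the text.

```python
def line_resistance_ohms(sheet_resistance_ohms_per_sq, length_um, width_um):
    """Resistance of a uniform line: sheet resistance times number of squares."""
    if width_um <= 0:
        raise ValueError("width must be positive")
    squares = length_um / width_um
    return sheet_resistance_ohms_per_sq * squares


# Hypothetical line geometry, chosen only for illustration.
LINE_LENGTH_UM = 1000.0
LINE_WIDTH_UM = 2.0

# Sheet resistance ranges quoted in the text (ohms/square).
polysilicon_range = [
    line_resistance_ohms(rs, LINE_LENGTH_UM, LINE_WIDTH_UM) for rs in (15.0, 30.0)
]
polycide_range = [
    line_resistance_ohms(rs, LINE_LENGTH_UM, LINE_WIDTH_UM) for rs in (1.0, 5.0)
]

print("polysilicon line resistance (ohms):", polysilicon_range)
print("polycide line resistance (ohms):", polycide_range)
```

For this assumed geometry the line is 500 squares long, so each ohm/square of sheet resistance contributes 500 ohms of line resistance.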
The layered gate electrode/interconnection line material must 
withstand high temperature oxidation and annealing steps and 
be easy to pattern and etch selectively (13). The etching
step is particularly troublesome as polysilicon and metallic 
silicides have quite different chemical behavior. To date a 
simple preferred polycide technique has not been disclosed and 
a compromise between sheet resistance, etching behavior, and 
durability must be achieved. Polycides are also important for
advanced bipolar structures that use thin polysilicon layers
for wiring and as the source of shallow emitter doping (15).
2.6 PASSIVATION 
A passivation layer is required over the silicon chip to 
inhibit damage from mobile ions (Na, Cu, etc.) and water 
vapor. Historically, sputtered quartz, phosphorus-doped
glass, and oxide-nitride coatings have been used. More
recently a move to plasma-enhanced CVD nitride and to polyimide
has occurred (7). The primary attractions are lower cost and 
more conformal coatings. 
Polyimide is particularly attractive in cases where multilevel
metal wiring is employed. An interesting research development
is a photosensitive polyimide which could eliminate resist
masking and etching of the passivation layers. Organic 
coatings for absorption of package-generated alpha particles 
are now popular for dynamic memory chips. 
3.0 SELECTIVE OPERATIONS 
Selective operations involve the exposure and development of
lithographic patterns into a photosensitive layer (e.g., a
resist). Over the past 20 years, the lithographic improvement
traced in Figure 2 has progressed from proximity to contact
to projection printing, all with full wafer exposure. As
indicated in Figure 4, other developments in lithographic 
technique are expected in the future. Presently, the
manufacturing capability of full-field projection printing is
about 2.5 ± 1.0 micrometers. This lithographic groundrule
refers to the resist patterns, not the final fabricated
feature sizes on the silicon wafer, which are approximately
1.5 times larger (16) (e.g., about 3.5 to 4.0 micrometers).
FIGURE 4: Development of lithographic exposure techniques.
[Figure: timeline, 1970 to 1995, of noncontact, proximity/contact,
projection, optical DSW, X-ray proximity, X-ray DSW, and E-beam DSW
exposure techniques against groundrule in micrometers.]
A natural extension to shorter wavelengths (250 nm or "deep"
UV) is occurring with the necessary transition from glass to
quartz mask plates. When compared to standard UV (320 nm),
deep UV enables the proximity gap between mask and wafer
to be widened to reduce mask damage while maintaining system
resolution. Alternatively, operation with the same gap will
give resolution increased by the square root of the wavelength
ratio.
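The two options described above can be checked numerically. This is a minimal sketch, assuming the usual diffraction estimate for proximity printing in which the minimum resolvable feature varies as the square root of (wavelength times gap). The proportionality constant k and the 10 micrometer gap are illustrative assumptions; the two wavelengths are those quoted in the text.

```python
import math


def proximity_resolution_um(wavelength_nm, gap_um, k=1.0):
    """Estimated minimum feature, proportional to sqrt(wavelength * gap)."""
    wavelength_um = wavelength_nm / 1000.0
    return k * math.sqrt(wavelength_um * gap_um)


def gap_for_same_resolution_um(old_wavelength_nm, new_wavelength_nm, old_gap_um):
    """Gap at the new wavelength that keeps wavelength * gap unchanged."""
    return old_gap_um * old_wavelength_nm / new_wavelength_nm


STANDARD_UV_NM = 320.0  # from the text
DEEP_UV_NM = 250.0      # from the text
GAP_UM = 10.0           # assumed gap, for illustration only

res_standard = proximity_resolution_um(STANDARD_UV_NM, GAP_UM)
res_deep = proximity_resolution_um(DEEP_UV_NM, GAP_UM)

# Option 1: same gap, finer resolution by sqrt(320 / 250).
improvement = res_standard / res_deep

# Option 2: same resolution, wider gap by 320 / 250.
wider_gap = gap_for_same_resolution_um(STANDARD_UV_NM, DEEP_UV_NM, GAP_UM)

print(f"resolution improvement at fixed gap: {improvement:.3f}x")
print(f"gap allowed at fixed resolution: {wider_gap:.1f} micrometers")
```

Under this estimate the wavelength change buys either about a 13 percent finer resolution or a 28 percent wider gap, but not both at once.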
In order to progress to below 2 micrometers it appears that
limited field, step-and-repeat (i.e., direct-step-on-the-wafer
or DSW), projection optics with automatic alignment will be
required. Many new IC facilities now under construction are
strongly dependent on optical step-and-repeat lithography.
Linewidths of 1 to 1.25 micrometers and registration of ±0.4
to ±0.6 micrometers should be possible near the end of the
decade.
Beyond one micrometer, a strong competition is developing
between direct write electron-beam and projection X-ray.
Although the outcome will not be decided for at least 5 years,
both techniques still have serious deficiencies. Electron-beam
lithography is costly, complicated, and in need of more
robust and more sensitive resists. Electron-beam machines
have, however, become widely accepted as mask makers for
optical projection, and at IBM are also used in production
for the customizing of final level metal patterns in bipolar
logic arrays. X-ray lithography is less well developed and
needs more energetic sources, stable masks, more sensitive
resists, and an automatic alignment scheme. An interesting
proposal in the lithography field is to develop a relatively
small electron storage ring for the generation of intense
X-rays (17). Such a ring would service several (e.g., up to
10) X-ray lithographic stations.
In the resist area, an important activity concerns conformal 
multilayer masking techniques (18, 19). This is a means for
circumventing low resist sensitivity and depth of field 
constraints, and for improving resolution (14). As shown 
in Figure 5, a very high resolution pattern can be formed
in a thin layer of resist which would be too thin to be
used for etching. But this high resolution pattern can be
transmitted down into a thicker layer of working resist,
without much loss in resolution, by a blanket exposure. IBM
calls this portable conformal masking because the multilayer
resist structure is transported from one exposure station
to another and because the thicker resist conforms to the
topography below it.
FIGURE 5: Portable conformal masking technique (after Ref. 18).
[Figure: a thin (0.2 micrometer) AZ 1350J resist layer is patterned by
near-ultraviolet optical or electron-beam exposure; a deep-UV blanket
exposure then transfers the pattern into a 1 to 3 micrometer layer of
polymethyl methacrylate (PMMA).]
Over the past ten years, photolithographic linewidths have 
halved every 6 years (see Figure 2). The combined progress 
in lithographic machinery, resist materials, and etching 
techniques can double the chip density about every 4 years 
at best (see Figure 1). It is expected that improvements in 
optical projection lithography can sustain this rate of 
improvement for at least the next five years. 
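The linewidth trend quoted above can be written as a simple exponential. The following is a minimal sketch that extrapolates the stated halving period of six years from the 2.5 micrometer full-field capability mentioned earlier. Treating the trend as a smooth exponential and choosing that starting value are assumptions made here for illustration; this is not a forecast taken from the text.

```python
def projected_linewidth_um(start_um, years_elapsed, halving_period_years=6.0):
    """Linewidth after a number of years if it halves every halving period."""
    return start_um * 0.5 ** (years_elapsed / halving_period_years)


START_LINEWIDTH_UM = 2.5  # present full-field capability quoted in the text

# Project a few points along the assumed trend.
projection = {
    years: projected_linewidth_um(START_LINEWIDTH_UM, years)
    for years in (0, 3, 6, 9)
}

for years, width in projection.items():
    print(f"after {years} years: {width:.2f} micrometers")
```

Under these assumptions the projected linewidth passes through one micrometer between six and nine years out, which is roughly where the text places the limit of optical lithography.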
The transition to E-beam or X-ray lithography in production
should take place in the latter half of the decade as optical
wavelengths constrain photolithography to about one micrometer
dimensions (see Figure 4). This transition will be slowed
by the severe technical and economic difficulties inherent
in introducing any new lithographic technology, by the
problems associated with even larger chip and wafer sizes,
and by the staying power associated with an immense investment
in the highly utilitarian optical technology.
4.0 SUBTRACTIVE OPERATIONS 
The transfer into the substrate of the exposed and developed 
mask pattern in the photoresist layer is accomplished by a 
subtractive etching operation. Historically, such operations 
were carried out using wet chemical etchants such as hydro-
fluoric, sulphuric, or phosphoric acids. The attraction of 
wet etchants is that they are highly selective, generally 
attacking only one layer species. Unfortunately, they are 
frequently isotropic or non-directional in nature and hence
tend to etch under the masking layer. Wet etches are temperature
and concentration dependent and overetching is often
needed to ensure complete material removal.
Important improvements have been made in recent years with
etching in RF generated plasmas (20, 21, 22), which is sometimes
referred to as dry etching. Figure 6 illustrates the difference
between sputter, plasma, and reactive ion etching. In
sputter etching, a non-reactive gas such as Ar is used and
the sample is simply bombarded with directional, energetic ions.
The etching, however, is indiscriminate, i.e., non-selective.
Thus, there is no physical mechanism that stops the etching
process when a second layer is revealed.
With plasma etching (20, 22) a reactive gas species like CF4 or
CCl4 is used with the wafers at plasma or ground potential and
a pressure of 0.5 to 2 Torr. The high gas pressure leads to
a random incidence of etching species. These conditions and
the wafer positioning lead to an isotropic (i.e., non-directional)
but highly selective etching which is widely used
in the industry today, for example for ashing (stripping)
exposed resist layers and for etching thin Si3N4 layers. As
contrasted to the barrel assembly, the use of a parallel plate
reactor improves the directionality of plasma etching, but
with a loss in selectivity. Selectivity ratios of 10 or 20
to 1 are possible although 5 to 1 is more typical. The transition
from wet etching to dry (plasma) etching is required
for linewidth control for features in the 2 to 3 micrometer
regime (14).
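The practical meaning of the selectivity ratios can be shown with a short calculation. This is a minimal sketch, assuming the usual definition of selectivity as the etch rate of the layer being removed divided by the etch rate of the layer beneath it. The film thickness and overetch fraction are illustrative assumptions; the selectivity values of 5, 10, and 20 to 1 are those quoted in the text.

```python
def underlayer_loss_nm(film_thickness_nm, overetch_fraction, selectivity):
    """Thickness of the underlying layer consumed during the overetch.

    The overetch lasts long enough to remove an extra
    film_thickness * overetch_fraction of the top film; the underlayer
    etches `selectivity` times more slowly during that period.
    """
    if selectivity <= 0:
        raise ValueError("selectivity must be positive")
    overetch_equivalent_nm = film_thickness_nm * overetch_fraction
    return overetch_equivalent_nm / selectivity


FILM_THICKNESS_NM = 300.0  # assumed film thickness
OVERETCH_FRACTION = 0.20   # assumed 20 percent overetch

losses = {
    selectivity: underlayer_loss_nm(FILM_THICKNESS_NM, OVERETCH_FRACTION, selectivity)
    for selectivity in (5, 10, 20)
}

for selectivity, loss in losses.items():
    print(f"selectivity {selectivity}:1 -> underlayer loss {loss:.1f} nm")
```

With these assumed values, moving from 5 to 1 to 20 to 1 selectivity cuts the loss of the underlying layer by a factor of four, which matters most when that layer is a thin gate oxide.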
FIGURE 6: Comparison of sputter, plasma and reactive ion
etching (after Ref. 22).
[Figure: schematic panels of RF-driven etching systems; recoverable
labels include Ar, plasma etching (non-directional/selective),
CF4 plasma, and (directional/selective).]
With reactive ion etching (RIE), a reactive gas like CF4, CClF3,
C2F6, CCl4, etc., is used at lower pressures (0.020 to 0.040 Torr)
and with the wafers placed on the cathode (i.e., out of the
plasma discharge region) (23). A directional and selective
etching condition results. The etching behavior is sensitive
to various parameters such as RF power, gas pressure, and the
choice of etching gas and cathode material. A strong attraction 
of RIE is that the etching gas composition can be manipulated to 
obtain different high etching rates (24). Also, by increasing
the gas pressure or altering the sample position, combinations 
of reactive ion and plasma etching can be utilized (22). 
The drawbacks to reactive ion etching are primarily technical 
(batch sizes, etch rate uniformity, understanding of the 
etching chemistry) and hence RIE will become more widely 
accepted as techniques improve. The combination of selectivity
and directionality afforded by reactive ion etching is
essential to the development of one micrometer processes. A 
particular need is for robust photoresists that are highly 
resistant to plasma or reactive ion etching. Another need is 
for the ability to provide a controlled slope on the edge of 
an etched line to relieve line coverage problems. 
5.0 PROCESS MODELING 
Computer aided design (CAD) tools are of increasing benefit to 
chip and circuit design, for example, for wire routing, cell 
placement, timing analysis, and design rule checking. Device
modeling is also valuable in predicting electrical parameters,
cutoff frequencies, threshold voltages, and so on. Only
recently has process modeling begun to play an important role 
in integrated circuit fabrication (25). 
Integrated circuit modeling activities can be roughly subdivided 
into process, device, circuit, and chip architecture areas. 
Device modeling using Poisson's equation and the continuity
equations has been active for over twenty years; however, process
modeling dates back less than a decade. The primary purpose 
of process modeling is to provide a description of the device 
structure that can be utilized in a device analysis model. 
This structural description may include impurity distributions, 
insulator thicknesses, and device dimensions such as channel 
lengths and widths. This activity can be referred to as 
process profile modeling. The device analysis model utilizes 
this profile information in predicting the device parameters 
such as current-voltage or voltage-voltage relationships.
These electrical relationships can then be incorporated into 
a circuit simulation model to predict switching rates or 
signal propagation times. The result of such an analysis will 
be a description of the nominal circuit performance. Before 
attempting to fabricate a circuit however, one should also 
determine the statistical range of circuit performance.
Variations in the fabrication process are unavoidable. Some
of these are natural in origin such as uncertainties in sub-
strate resistivity, others are equipment related arising from 
and yet others are due to operator differences in etching 
times or other procedural variables. The cumulative result 
of many small statistical variations in fabrication can 
induce a significant resultant error in the final electrical 
parameters. It is here that process control modeling can be 
of value. Deviations from the base line process conditions 
can be deliberately introduced into the model and a sensitivity
analysis performed (26). This can help to determine if
one of the process steps is close to a critical point. In one
instance in the author's experience, process models showed
that the chosen implanter energy was on a steep sensitivity
slope. Reducing the energy and increasing the dose gave a
safer fabrication condition which had less effect on the range
of the resultant electrical device parameters.
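The kind of sensitivity analysis described above can be sketched in a few lines. This is a minimal illustration, assuming the process and device models are available as a single callable that maps process conditions to one electrical parameter. The function `toy_process_model` and its nominal conditions are hypothetical stand-ins with no physical meaning; they are not the author's models.

```python
def normalized_sensitivity(model, nominal, name, rel_step=0.01):
    """Estimate (dP/P) / (dx/x) for one input using central differences."""
    x0 = nominal[name]
    step = x0 * rel_step
    high = dict(nominal, **{name: x0 + step})
    low = dict(nominal, **{name: x0 - step})
    p0 = model(**nominal)
    slope = (model(**high) - model(**low)) / (2.0 * step)
    return slope * x0 / p0


def toy_process_model(energy_kev, dose_per_cm2):
    """Hypothetical model: output is steeper in energy than in dose."""
    return 1.0e-15 * dose_per_cm2 * energy_kev ** 2


# Assumed base line process conditions, for illustration only.
nominal_conditions = {"energy_kev": 50.0, "dose_per_cm2": 1.0e12}

sensitivities = {
    name: normalized_sensitivity(toy_process_model, nominal_conditions, name)
    for name in nominal_conditions
}

for name, value in sensitivities.items():
    print(f"{name}: {value:.2f} percent output change per percent input change")
```

An input whose normalized sensitivity is much larger than the others marks a step that sits on a steep slope, which is the situation the implanter energy example describes.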
Relative to CAD for circuit development, process modeling is
still in its infancy. The process work began as one-dimensional
profile analysis, by either analytical or numerical models, and
progressed to two and three-dimensional forms. The initial
work concerned primarily the additive materials operations.
More recently, the selective operations of resist exposure
and development have been addressed with the intent of defining
linewidth parameters (27). More work still remains to be done
in modeling of oxidation and diffusion which are now well
developed and suitable for computer modeling. Plasma and
reactive ion etching, however, are less well understood, which
complicates the description of their behavior. In the area
of profile modeling, two successful efforts concern descriptions
of channel stoppers for FET isolation (28), and
double-diffused regions for bipolar emitters (29).
Until very recently process modeling has been mostly an "after-the-fact"
activity. Typically, process modeling was brought
into play only after difficulties were detected in a new
process, or in a modified older process. The VLSI era promises
to be different. First, strictly from an economic point of
view, the cost of experimental pilot line facilities is so high
that brute force trial-and-error process development is too
expensive. Thus tightly coupled process and device modeling
activities will be required to optimize the process development
cycle. Second, more highly integrated circuits promise
to be more sensitive to statistical variations whether natural
in origin or introduced during fabrication. Statistical
fluctuations in doping, high densities of small defects, layer
thickness control, and a host of problems may plague VLSI
fabrication in the micrometer and submicrometer lithographic
regime. Consequently, two and even three dimensional process
control models will be required to help identify, understand,
and avoid costly fabrication problems. Third, novel process 
steps can now be investigated with a computer model. This 
gives the process engineer a new tool for innovation.
6.0 AUTOMATED PROCESSING 
Process automation is slowly and steadily being incorporated
into integrated circuit fabrication. Automation can be
applied in several ways (30). One area is robotics, or automated
wafer handling in which the wafers are moved on air
tracks (rather than by operators) and are mechanically
inserted into open mouthed diffusion and evaporation stations.
Two such facilities exist within IBM. A second area is on-line
inventory control, also used at IBM, in which the position of
various wafer lots in the line is determined and monitored by
computer. This is especially important for high throughput
manufacturing. A third area of automation is local process
control in which, for example, microprocessors are used to
ramp furnaces or insert wafers at a particular station.
An important adjunct to process automation is on-line
monitoring of gas purity, furnace temperature, etc. This
activity is highly transducer dependent and much progress
can still be made here. An interesting hypothesis is that
real time process information could be used to modify a step
further along in the process. For example, if the monitor
showed that the gate insulator thickness was above nominal,
the subsequent channel implantation dose could be reduced
accordingly. Such on-line process tailoring might be required
for the most sophisticated VLSI fabrication.
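The gate insulator example above can be sketched as a feed-forward correction. This is a minimal sketch under a strong simplifying assumption: the implanted dose is treated as a sheet of charge at the silicon surface, so the threshold shift it produces is q times dose times oxide thickness divided by oxide permittivity, and all other thickness-dependent threshold terms are ignored. The nominal thickness, nominal dose, and measured deviation are illustrative values, not data from the text.

```python
ELECTRON_CHARGE_C = 1.602e-19
OXIDE_PERMITTIVITY_F_PER_CM = 3.9 * 8.854e-14  # silicon dioxide


def implant_threshold_shift_v(dose_per_cm2, oxide_thickness_cm):
    """Threshold shift from a shallow implant modeled as interface sheet charge."""
    return (
        ELECTRON_CHARGE_C * dose_per_cm2 * oxide_thickness_cm
        / OXIDE_PERMITTIVITY_F_PER_CM
    )


def adjusted_dose_per_cm2(nominal_dose, nominal_thickness_cm, measured_thickness_cm):
    """Dose that keeps the implant threshold shift at its nominal value."""
    return nominal_dose * nominal_thickness_cm / measured_thickness_cm


NOMINAL_THICKNESS_CM = 250e-8   # 250 Angstroms, illustrative
NOMINAL_DOSE_PER_CM2 = 5.0e11   # assumed nominal channel implant dose

# Suppose the on-line monitor reports an insulator 10 percent above nominal.
measured_thickness_cm = NOMINAL_THICKNESS_CM * 1.10

new_dose = adjusted_dose_per_cm2(
    NOMINAL_DOSE_PER_CM2, NOMINAL_THICKNESS_CM, measured_thickness_cm
)
shift_nominal = implant_threshold_shift_v(NOMINAL_DOSE_PER_CM2, NOMINAL_THICKNESS_CM)
shift_corrected = implant_threshold_shift_v(new_dose, measured_thickness_cm)

print(f"adjusted dose: {new_dose:.3e} per cm^2")
print(f"implant shift, nominal vs corrected: {shift_nominal:.3f} V, {shift_corrected:.3f} V")
```

In this simplified picture a thicker insulator calls for a proportionally smaller dose, which matches the direction of the correction described in the text. A production scheme would need a fuller threshold model.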
The objective of process automation is improved process quality
or process throughput, or both. Probably the simplest and most 
direct way to increase productivity, however, is to increase 
the wafer size. Over the past decade wafer diameters have 
increased from one to five inches. By 1990, we can expect 
wafer diameters of seven to nine inches. Silicon strips or 
ribbons may also be developed. Each move to a larger wafer 
diameter is traumatic due not only to equipment changes, but
due to size related problems as well. As wafers increase in size,
thermally induced wafer stresses (warpage), stress-induced
dislocations, global and local distortions, and other problems
arise. Improved lithographic dimension is an alternative
productivity enhancer, and it is lithographic machinery development
that experiences the most demanding requirements with
each quantum jump in wafer diameter. The difficulties associated
with increasing wafer size tend to constrain progress
in lithographic dimension. Compared to full field exposure,
direct step-on-the-wafer, whether optical, E-beam, or X-ray,
is relatively less sensitive to increased wafer size due to
the limited field exposure area.
7.0 DEVICE STRUCTURES 
The intended result of new processing techniques is an integrated
transistor device structure that is smaller, cheaper,
faster, lower in power, and more reliable than its predecessor.
The basic structural elements of an integrated
circuit are electrical isolation (between devices), the
active device itself, contact vias (layer intraconnection
points), wiring (interconnection lines), and passivation.
7.1 MOSFET TECHNOLOGY
Figure 7 shows a high density N-channel MOSFET or NMOS structure
typical of the industry today (31). The structure is
characterized by semi-recessed oxide isolation, a polysilicon-gate
FET, an etched and rediffused contact area, polysilicon,
diffusion, and aluminum wiring paths, and phosphorus-doped
glass passivation.
FIGURE 7: Present MOSFET structure (after Ref. 31).
[Figure: device cross-section; recoverable labels include phosphorus
doped poly, zero overlap, phosphorus doped, and metal contact area.]
7.1.1 ISOLATION
In MOSFET technology, dielectric oxide isolation between
devices has progressed from planar (global thick oxide with
etched holes) to locally grown (semi or fully recessed).
Semi-recessed oxide is the mainstay of polysilicon-gate MOSFET
technology due to its simplicity of fabrication, higher density,
and self-aligned parasitic channel stopper. The locally grown
or recessed oxide techniques require an oxidation-resistant
silicon nitride layer. Oxide growth gives rise to a lateral
oxidation wedge under this layer shaped like a bird's beak
which reduces the active device area and complicates the
structure (32). Although lacking high surface planarity, a
simple approach is to use a common channel and field boron
doping thereby eliminating nitride from the isolation step (33).
This technique becomes more attractive at smaller dimensions
with thinner isolation layers. Ideally, one would eliminate 
silicon nitride from the process and yet provide a deep, narrow, 
oxide region flush with the substrate surface and possessing 
a doped channel stopper region. 
Complementary (CMOS) FET technology has a special problem in 
that the isolation must help prevent PNPN (silicon-controlled 
rectifier) latchup. A common approach is to widely separate
devices and use the inherent diffusion isolation of the
structure. Sapphire substrates provide complete dielectric
isolation, which greatly improves the density, but the cost
and processing difficulties of sapphire (e.g., outdiffusion,
epi control, etc.) have led to a declining interest in these
substrates. The idealized deep dielectric isolation discussed
above could greatly benefit the bulk CMOS technology. 
7.1.2 DEVICE STRUCTURES 
Two of the most interesting developments in MOSFET devices
are the double-diffused or DMOS device (34), and the lightly-doped
drain MOSFET (35). Figure 8A illustrates the DMOS
device. The idea is to mask the drain and to laterally diffuse
in a narrow p-type channel region on the source side of the
FET. By this method, for example, a 0.5 micrometer channel
length can be fabricated with 2.0 micrometer lithography.
The advantages are higher gain, faster switching, and better 
channel length control. The penalties are added cost (one 
additional masking operation) and unilateral device operation.
Static and dynamic RAMs and uncommitted logic arrays have 
been made using DMOS devices, but it is yet to be established 
that the performance improvement warrants the additional 
processing complexity. 
In the lightly-doped drain structure shown in Figure 8B,
the electric field at the drain is reduced by grading the
diffusion profile there (35). Consequently, the drain voltage
may be increased, thereby increasing the switching speed and/or
current carrying capability. Extra masking steps are not
required as the device fabrication is based on a controlled
oxide overhang.
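FIGURE 8: DMOS device (A) and lightly-doped drain FET (B)
(after Refs. 34 and 35 respectively).
[Figure: (A) polysilicon gate over N+, P-, and N+ regions, with a
0.5 micrometer dimension marked; (B) polysilicon gate over N+, N-, P,
N-, and N+ regions, with a 0.3 micrometer dimension marked.]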
7.1.3 CONTACT VIAS 
Contact resistance promises to be one of the first major prob-
lems to be encountered with higher integration because the 
resistance increases even more than linearly as contact area 
decreases (36). This antiscaling behavior is combatted, to 
first order, by deeply rediffusing the contact hole (37) (see 
Figure 7). Widely used in production, this technique is 
referred to as a borderless contact, but actually the diffused 
area is expanded under the isolation oxide by a rediffusion
step. Another approach to optimizing metal to diffusion or 
to polysilicon contact areas is the self-registering contact 
in which an oxide layer is locally grown up around a nitride 
protected contact area (38). 
Direct polysilicon to diffusion contacts buried under thick 
oxide (i.e., "buried contacts") become more difficult to 
fabricate as processes are scaled down because thermal drive 
times are reduced and a thinner polysilicon layer contains 
less N-type dopant. New approaches will be needed here. 
7.1.4 WIRING
The most important advances in MOSFET wiring are double level 
metal and silicide-on-polysilicon (polycide) interconnections.
N+ diffused regions can also be used for wiring; however, with
scaling the higher sheet resistance of shallower diffused
lines discourages it. Although it requires at least two extra
masks for contact via and wiring patterns, double level metal 
is finding acceptance in FET production. The advantages 
include density improvements in 4 Kbit quasi-static RAMs, 
lower resistance wiring in CMOS microprocessors, and increased 
yield with redundancy wiring for 64 Kbit dynamic RAMs. 
The refractory silicide on polysilicon technique offers 5 to
10 times lower line resistance without additional masking
operations (13, 14). The processing difficulties with incorporating
a reproducible W, Ta or Mo silicide layer into the
existing FET process are formidable; however, this technique
should be available in the near future. The reduced interconnection
line resistance will lead to higher speed operation,
especially in static RAMs and in microprocessors.
The use of an intermetallic silicide layer to reduce the sheet 
resistance of polysilicon or diffused region wiring represents 
one of the most important processing trends today. Although 
the polycide wiring approach has been demonstrated with both 
bipolar and FET test vehicles, thus far it has not been 
incorporated into mass manufacturing due to fabrication 
difficulties. When used on top of polysilicon, the silicide
layer must be essentially transparent to the process, that
is, it must be patterned and oxidized along with the underlying
polysilicon layer. This is not a trivial requirement.
Generally silicide layers are more resistant to plasma and
wet etches, and can become brittle when oxidized. Also, 
the silicide may fail to adhere to the polysilicon layer. 
The most popular approach is to use a silicide layer over a
polysilicon layer, the idea being to simultaneously retain the
favorable properties of the polysilicon either as a gate electrode
material in FETs or as a diffusion source and contact for
bipolar emitter or base regions, and incorporate with it the
low sheet resistance of the intermetallic silicide. A silicide
layer alone would be easier to pattern but has poor
oxidation properties and cannot serve as a controlled
diffusion source. The most popular silicides being investigated
are the high temperature ones like WSi2, TaSi2, MoSi2,
NbSi2, and TiSi2. The primary deposition techniques are
sputtering, or co-evaporation by electron beam. Following
the deposition, the silicide molecule must be established,
or formed, by heating the composite at an elevated temperature
(e.g., 900°C for 30 minutes). The patterning may be
done either before or after the forming step.
An interesting new technique is the use of ion implantation 
(39) or laser annealing (40) to form the silicide. In the 
former case the deposited silicide layer is bombarded by an 
arsenic beam, the energy dissipation of which causes the 
silicide layer to form, thus eliminating the high temperature
heating step. 
It is interesting to speculate that the ion beam annealing
technique for silicides might also be used to anneal, for
example, implanted source and drain regions, possibly with
an argon beam. Of course, lasers or electron beams can be
incorporated into ion implanters. Overall, there is a
constant desire to combine processing steps in situ,
although thus far little of this has occurred. There still
is a great deal of wafer handling involved in an IC process
which may require as many as 150 sequential operations.
7.1.5 MOSFET 
Figure 9 shows a hypothetical IC MOSFET of 1990. It is a
polysilicon-gate bulk CMOS structure with one layer of
polysilicon with silicide over it. Very shallow and lightly
doped source and drain regions are used with laterally
diffused regions for threshold control. Two layers of
metal wiring are employed. The topography of the structure
is highly planarized due to the fully recessed, deep dielectric,
field isolation and the planarizing passivation layers.
High conductivity silicide layers over the diffused and
polysilicon regions greatly improve the electrical conductivity.
Contact vias to connect metal lines to metal, diffused,
and polysilicon lines are refilled with conductive material.
The vias are self-registering to the lines they contact. The
channel length of the device is 1.0 ± 0.25 micrometers and the
threshold voltage is 0.5 volts. The gate insulator thickness
is 250 Angstroms.
FIGURE 9: Future FET structure.
[Figure: device cross-section; recoverable labels include oxide or
polyimide, COND., METAL1, oxide, and P-, N-, and P+ regions.]
7.2 BIPOLAR TECHNOLOGY
Recently, the major emphasis in MOSFET development has been
on dimensional reduction rather than on structural innovation.
In contrast, bipolar technology is going through a minor
renaissance in structural improvement. Just as the emergence
of polysilicon gate electrodes spurred advances in FETs, the
invention of integrated-injection logic has inspired innovation
in bipolar device structures. Additionally, fabrication
advances in FETs have influenced bipolar design, which now
includes, for example, self-aligned regions, polysilicon
doping, and polycide interconnections.
7.2.1 ISOLATION
Bipolar isolation progressed directly from diffusion isolation
to fully recessed oxide. Figure 10 shows a cross-section of
an IBM masterslice bipolar logic structure (41). Planarity is
a key requirement as the substrate must support three levels
of metal wiring for which the linewidth control is affected
by surface topography. The idealized deep, narrow dielectric
isolation described for MOSFETs would, of course, also benefit
bipolars. Historically, novel isolation schemes have
been more readily accepted into bipolar processes.
FIGURE 10: Present bipolar structure (after Ref. 41).
7.2.2 DEVICE STRUCTURE 
A high degree of novelty is being incorporated into new bipolar 
structures. Among the most important of these are collector 
regions doped from and contacted by polysilicon, self-aligned
collector-base contact edges, collector regions that abut the
isolation oxide areas, and metal-interconnected base regions
(42). A major thrust is toward 0.1 micrometer base widths
facilitated by limited outdiffusion. Another is toward 
reducing parasitic capacitances by reducing collector area 
(butted regions) and by moving the base contact closer to the 
active base region (self-alignment). 
7.2.3 CONTACT VIAS
To date, conventional etched contact vias have been used to 
emitter, subcollector, and extrinsic base regions. The use 
of polysilicon, or silicide-on-polysilicon, allows aluminum
lines to contact polysilicon regions over the isolation 
regions. This relieves aluminum spiking problems and reduces 
the overall device area. 
7.2.4 WIRING 
Bipolar logic structures today use three levels of metal wiring
(41) and this might increase to four or five levels in the
future. The use of polycide layers, however, may alter this
trend. The attraction of multilevel metal is high conductivity
and low temperature processing, the drawbacks being extra masking
steps and increasingly larger groundrules for the upper
levels. Conductive metal refill techniques for contact vias
between metal levels are needed, as are self-alignment techniques
to register holes to lines.
7.2.5 FUTURE STRUCTURE 
Figure 11 shows a hypothetical bipolar transistor of 1990. 
It is a T²L structure with silicide-on-polysilicon for the
emitter, base and collector doping. Device regions butt up
against the ideally deep dielectric isolation. Three layers
of metal wiring are employed. Contact vias between metal
layers are refilled with metal and self-registering contact
techniques are used. The base width of the transistor is 
0.1 micrometers. Like the future FET, the bipolar structure 
is highly planarized to relieve line coverage problems and 
reduce linewidth of upper metal layers. 
[FIGURE 11: Future bipolar structure. Labeled regions: oxide or polyimide;
metal 1, metal 2, and metal 3; cond. silicide; polysilicon; oxide isolation;
N+, N, P+, and P- regions.]
8.0 STRUCTURAL PROBLEMS 
8.1 ISOLATION AREA
Over the past decade, reduction in isolation area relative to
device area has led to significant density improvements. This
is exemplified by the transition from diffusion isolation to
fully-recessed oxide isolation in bipolars, and from thick
oxide with non-registered channel stoppers to semi-recessed
oxide with self-aligned channel stoppers in N-channel MOSFETs.
Nevertheless, today about 50% of an IC chip area is still
devoted to isolation. A major improvement in isolation is
needed for density enhancement. The idealized deep and
narrow dielectric isolation, which we hypothesized earlier,
might reduce isolation area to 25%. The isolation dielectric
could be oxide, nitride, polysilicon, or combinations of these
materials, and the refill should be planar with respect to
the substrate surface.
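The density gain implied by these percentages can be checked with a little arithmetic. The sketch below is illustrative only: it assumes (the paper does not say so) that the area occupied by the devices themselves stays fixed while the isolation fraction of the chip falls, and the function name is hypothetical.

```python
def density_gain(iso_before: float, iso_after: float) -> float:
    """Relative increase in devices per unit chip area.

    Assumption: device area is held constant, so
    chip area = device area / (1 - isolation fraction).
    """
    for frac in (iso_before, iso_after):
        if not 0.0 <= frac < 1.0:
            raise ValueError("isolation fraction must lie in [0, 1)")
    return (1.0 - iso_after) / (1.0 - iso_before)


if __name__ == "__main__":
    # About 50% isolation today versus the hypothesized 25%.
    print(f"density gain: {density_gain(0.50, 0.25):.2f}x")
```

Under that assumption, reducing isolation from 50% to 25% of chip area corresponds to a density improvement of one and a half times.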
8.2 CONTACT RESISTANCE
Contact resistance is one of the parameters that defies scaling
and promises to present a major difficulty for VLSI (1). As
contact areas are reduced, contact resistances increase
linearly or superlinearly with the reciprocal of the contact
diameter. One micrometer contact diameters with 10 to 100 ohms
contact resistance can be expected (36). Techniques to ensure
uniformity, such as rediffused (37) or laser annealed (40)
contact holes, help reduce contact resistance, which is
determined by the area, thickness, and resistivity of the
contacted region. In this regard, an aluminum to
silicide-on-polysilicon contact becomes particularly attractive.
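A first-order estimate shows why shrinking contacts is costly. The sketch below assumes the simple model R = rho_c / A for a circular contact of diameter d, where rho_c is a specific contact resistivity. Both the model and the value of rho_c are illustrative assumptions rather than figures from this paper, and the model ignores current crowding and the resistance of the contacted layer.

```python
import math


def contact_resistance_ohms(rho_c_ohm_cm2: float, diameter_um: float) -> float:
    """Resistance of a circular contact under the assumed model R = rho_c / A."""
    if rho_c_ohm_cm2 <= 0 or diameter_um <= 0:
        raise ValueError("inputs must be positive")
    diameter_cm = diameter_um * 1e-4              # 1 micrometer = 1e-4 cm
    area_cm2 = math.pi * diameter_cm ** 2 / 4.0   # area of a circle
    return rho_c_ohm_cm2 / area_cm2


if __name__ == "__main__":
    rho_c = 1e-7  # ohm*cm^2, hypothetical value chosen for illustration
    for d in (4.0, 2.0, 1.0, 0.5):
        r = contact_resistance_ohms(rho_c, d)
        print(f"d = {d:3.1f} um -> R = {r:7.2f} ohm")
```

In this model the resistance grows as the inverse square of the diameter, and the assumed rho_c places a one micrometer contact within the 10 to 100 ohm range quoted above.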
8.3 LINE COVERAGE 
The ability to cross one conductive line with another is
impaired if the edges of the lines are not sloped or if the
intervening insulating layer is not conformal. The problem
is manifested in present-day products where aluminum lines
cross dry etched polysilicon lines and a reflowed
phosphorus-doped glass insulator is not used. Line thinning
or breaks can occur at the crossing points. This will present
an additional burden to the plasma etching techniques, which
will have to provide controlled slopes during delineation.
8.4 RADIATION 
Radiation introduced during fabrication can detrimentally affect
integrated circuits. Deposition, etching, and lithographic
equipment is particularly suspect, especially for thin MOSFET
gate insulators (43). Although it appears that any radiation
damage that might be introduced by chemical vapor deposition
steps or plasma etching can be annealed out in subsequent
processing operations at over 800°C, metal deposition presents
a different situation. Aluminum, for example, is deposited
in vacuum by RF heating, by sputtering, or by electron-beam
heating. The latter technique involves considerable radiation
damage which cannot be annealed out at high temperatures,
since aluminum metallization on silicon cannot tolerate
temperatures much above 500°C (the aluminum-silicon eutectic
melts at 577°C). Electron-beam (or X-ray) lithography also
introduces radiation damage, which may be hard to remove if the
lithography is used for the final metal definition (44). Plasma
etching can also introduce radiation damage (45).
A deleterious effect of process radiation is to produce
damage states (traps) in the thin gate insulator which may
charge up during the operating life of the device. Thus a
device parameter (e.g., the threshold voltage) may slowly 
move out of specification during the life of the device.
One trend is to try to improve low temperature (below 500°C) 
annealing techniques by RF annealing or by thermal annealing 
in pure hydrogen. The area of process induced radiation 
damage is a new one that promises to be of much greater 
importance for VLSI fabrication. 
9.0 SUMMARY 
A variety of new fabrication techniques and structural elements 
have been reviewed thus far in this work. We have speculated 
as to how these trends in silicon processing might come together 
to produce the integrated circuits of the future. In particular 
we have tried to imagine what the silicon MOSFET and bipolar 
transistors will look like ten years hence and what techniques 
will be required to fabricate them.
9.1 IC FABRICATION OF THE FUTURE 
Based on present projections, by 1990, one micrometer lithography
with mask-to-mask alignment capability of ±0.25
micrometers will be practiced in mass production. Most
likely this will be achieved with optical direct-step-on-wafer
(DSW) projection systems. MOSFET memory chips with one million
components will be available yielding 128 Kbit static RAMs
and 256 Kbit dynamic RAMs in production with developmental
chips of twice that capacity just emerging. Furthermore, 64
bit FET microprocessors with 250,000 components will also be
available with the equivalent computing power of 50,000 logic
circuits. High performance (less than 10 nanoseconds) bipolar
cache memory chips of 32 Kbits will be available. Bipolar
masterslice logic will have 10,000 circuits (about 50,000
components) and denser bipolar circuitry (PLAs and semi-custom
chips) will reach 100,000 components. Chip power dissipation
will play a major role in determining that the leading MOSFET
technology will likely be polysilicon-gate bulk CMOS while the
bipolar technology will be low power T²L or I²L. Key improvements
in packaging technology will also occur.
Ten years from now, the IC fabrication process will use ion 
implantation for all doping steps and dry etching for all 
material removal steps. Polycide layers will be used for gate 
electrodes, wiring, contacts, and controlled doping of shallow 
regions. Combinations of plasma and reactive ion etching
will be used to achieve the required degree of selectivity,
directionality, and line shape. Multilayer resists will
be available to withstand the plasma etching. Seven inch wafer
diameters and chips as large as 100 mm² will be processed. Low
pressure and plasma assisted CVD will be commonplace, as will 
high pressure oxidation. Scanned laser beams will be used 
for various annealing and forming steps. Process control 
and monitoring will be highly automated so that operators 
will be employed primarily for maintaining and repairing 
equipment or moving containers of wafers from one station to 
another. Monitoring information will be processed in real 
time so that later process steps may be customized to accom-
modate variations in earlier steps. The process line of 
tomorrow will look and operate somewhat like the computer 
center of today. 
Technological prediction is an unreliable and highly imprecise
art. In 1970, device and process researchers could not have
predicted the HMOS polysilicon-gate FET or I²L bipolar structures
of today, and fabrication techniques like plasma etching
and laser annealing were then unknown. In a ten year timeframe,
lithographic dimensions decreased from over 6 to about 2.5 
micrometers, wafer diameters increased from 1 to 5 inches, 
and device structures underwent revolutionary changes. In 
the next decade groundrules will decrease from 2.5 to 1 micro-
meter, wafer diameters will increase from 5 to 7 inches, and 
device structures will undoubtedly again undergo revolutionary 
changes. About the only certain future characteristic of 
integrated circuit technology is its unpredictability. 
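The two decades quoted above can be compared by converting each change into a compound annual rate. This is only a rough sketch: it treats "over 6" micrometers as exactly 6, assumes each change is spread evenly over ten years, and uses a hypothetical helper name.

```python
def annual_rate(start: float, end: float, years: float) -> float:
    """Compound annual fractional change that takes `start` to `end`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("inputs must be positive")
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    trends = {
        "linewidth 1970-1980 (6 -> 2.5 um)": (6.0, 2.5),
        "linewidth 1980-1990 (2.5 -> 1 um)": (2.5, 1.0),
        "wafer diameter 1970-1980 (1 -> 5 in)": (1.0, 5.0),
        "wafer diameter 1980-1990 (5 -> 7 in)": (5.0, 7.0),
    }
    for label, (start, end) in trends.items():
        rate = annual_rate(start, end, 10)
        print(f"{label}: {100 * rate:+.1f}% per year")
```

Under these assumptions the projected linewidth shrink continues at roughly the historical pace, between 8 and 9 percent per year, while the projected growth in wafer diameter is much slower than in the preceding decade.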
9.2 THE TWO CULTURES
The advances in very large scale integrated circuit design will
be accompanied by substantial progress in microcircuit fabrication.
This paper has concentrated almost exclusively on
trends in silicon wafer fabrication techniques, only mildly
considering lithographic progress, and ignoring completely
requirements in packaging and in circuit and chip design.
Obviously progress on all fronts will be necessary.
A challenging aspect of VLSI is the range of its impact upon
the electronics industry. For example, digital technology
ranges from solid state physical effects, through processing,
devices, circuits, and chip architecture to system design.
Two camps or cultures may be identified: chip fabricators
that work in the semiconductor "foundry" handling silicon
wafers and system designers that work in the CAD "foundry"
handling terminal keyboards. The device and circuit designers
work in the intermediate region between these two extremes.
Clearly the coming of VLSI has required people with widely
differing skills to work together.
The interface between the two cultures is often a difficult
one. This is partly because the system tends to work from
the top down with the design ideas driving the fabrication
capability. Consider for example that a fabricator needs a
ten million dollar laboratory and fifty associates to do a one
micron feasibility study, while the designer can simulate an
entire chip right on his own computer terminal. Apparently
the role of the lone innovator has shifted from the laboratory
to the office. Certainly a very innovative and excited
atmosphere exists today in the design world.
In the future fabricators and designers will have to work
closely together and there is a tremendous range of technology
to span. One small way of improving the fabrication-design
interface and encouraging process innovation is to
promote process modeling and automation. VLSI brings the two
cultures closer together which in itself may be one of the
most important future trends in integrated circuits.
10.0 ACKNOWLEDGEMENTS
The author is particularly indebted to S. C. Su of the Hughes
Research Center for making available information on low
temperature processing. Discussions and helpful suggestions
were also received from the author's colleagues at IBM including:
J. M. Aitken, E. Bassous, J. M. Blum, L. M. Ephrath,
W. D. Grobman, R. D. Isaac, B. J. Lin, L. M. Terman, and
M. Y. Tsai.
11.0 REFERENCES
1. V. L. Rideout, "Limits to Improvements of Silicon Integrated
Circuits," Proceedings of Microcircuit Engineering 79,
pp. 144-152, Aachen (September 25-27, 1979).
2. A. B. Glaser and G. E. Subak-Sharpe, Integrated Circuit
Engineering, Addison-Wesley Pub., Reading, Mass. (1977).
3. Courtesy of S. C. Su, Hughes Research Center, Newport
Beach, Calif.
4. R. J. Robinson, "CVD Process Trends," Semiconductor
Internat., pp. 27-37 (March, 1979).
5. P. E. Luscher, W. S. Knodle, and Y. Chai, "Automated
Molecular Beams Grow Thin Semiconductor Films," Electronics,
pp. 160-168 (August 28, 1980).
6. Work reported by Hughes Research Laboratories, Malibu,
Calif.
7. P. S. Burggraaf, "Plasma Deposition Production Trends,"
Semiconductor Internat., pp. 23-34 (March, 1980).
8. R. J. Robinson, "Ion Implanters Overcoming Current
Barriers," Semiconductor Internat., pp. 45-53 (June,
1979).
9. Y. Wada, S. Nishimatsu, and N. Hashimoto, "Arsenic Ion
Channeling Through Single Crystal Silicon," J.
Electrochem. Soc., Vol. 127, pp. 206-210 (January, 1980).
10. R. L. Seliger and P. A. Sullivan, "Ion Beams Promise
Practical Systems for Submicrometer Wafer Lithography,"
Electronics, pp. 142-146 (March 27, 1980).
11. V. L. Rideout, "A Review of the Theory, Technology, and
Applications of Metal-Semiconductor Rectifiers," Thin
Solid Films, Vol. 48, pp. 261-291 (1978).
12. V. L. Rideout, "Reducing the Sheet Resistance of Polysilicon
Lines in Integrated Circuits," IBM Tech. Disc.
Bul., Vol. 17, p. 1831 (1974).
13. B. L. Crowder and S. Zirinsky, "1 µm MOSFET VLSI
Technology: Part VII -- Metal Silicide Interconnection
Technology -- A Future Perspective," IEEE Trans. Electron
Dev., Vol. ED-26, pp. 369-371 (April, 1979).
14. J. Lyman, "Scaling the Barriers to VLSI's Fine Lines,"
Electronics, pp. 115-126 (June 19, 1980).
15. T. Takahashi, S. Wakamatsu, and K. Kimura, "A High Speed
Multiplier Using Subnanosecond Bipolar VLSI Technologies,"
Eur. Solid-State Cir. Conf. Tech. Dig., pp. 110-112,
Southampton (September, 1979).
16. V. L. Rideout, "Development of One-Device Random Access
Memory Cells: A Tutorial," IEEE Trans. Electron Dev.,
Vol. ED-26, pp. 839-852 (June, 1979).
17. W. D. Grobman, "Synchrotron Radiation X-ray Lithography,"
to be published.
18. B. J. Lin, "Portable Conformable Mask -- A Hybrid Near-Ultraviolet
and Deep-Ultraviolet Patterning Technique,"
SPIE, Vol. 174, Develop. in Semicond. Microlitho. IV,
pp. 114-121 (1979).
19. B. J. Lin and T. H. P. Chang, "Hybrid E-beam/Deep UV
Exposure Using Portable Conformable Masking (PCM)
Technique," J. Vac. Sci. Technol., Vol. 16, pp. 1669-1671
(November/December, 1979).
20. P. S. Burggraaf, "Plasma Etching Technology," Semiconductor
Internat., pp. 49-58 (December, 1979).
21. C. J. Mogab and W. R. Harshbarger, "Plasma Processes Set
to Etch Finer Lines with Less Undercutting," Electronics,
pp. 117-121 (August 31, 1978).
22. L. M. Ephrath, "Dry Etching Review," invited paper to
be presented at Silicon Symposium, Electrochem. Soc.
Spring Meeting, Minneapolis (May, 1980).
23. L. M. Ephrath, "Reactive Ion Etching for VLSI," invited
paper to be presented at IEEE Internat. Electron Dev.
Meeting, Washington, D.C. (December 8-10, 1980).
24. L. M. Ephrath, "Selective Etching of Silicon Dioxide
Using Reactive Ion Etching with CF4-H2," J. Electrochem.
Soc., Vol. 126, pp. 1419-1421 (August, 1979).
25. J. Meindl, et al., Process Models for Integrated Circuits,
Stanford Univ., to be published.
26. D. A. Antoniadis and R. W. Dutton, "Models for Computer
Simulation of Complete IC Fabrication Process," IEEE
Trans. Electron Dev., Vol. ED-26, pp. 490-500 (April,
1979).
27. A. R. Neureuther, D. F. Kyser, and C. H. Ting, "Electron-beam
Resist Edge Profile Simulation," IEEE Trans. Electron
Dev., Vol. ED-26, pp. 686-692 (April, 1979).
28. V. L. Rideout, B. L. Crowder, and F. F. Morehead,
"Implanted Boron Channel Stoppers for MOSFET Integrated
Circuits," talk presented at IEEE Semicond. Interface
Specialists Conf., New Orleans (December, 1979).
29. R. Reif, R. W. Dutton, and D. A. Antoniadis, "Computer
Simulation in Silicon Epitaxy," J. Electrochem. Soc.,
Ext. Abstracts, Vol. 79-1, pp. 352-355 (May 6-11, 1979).
30. See special session on Process Monitoring and Automation
at the Electrochem. Soc. Meeting in St. Louis (May, 1981).
31. M. Eklund, "IC Technology in the Eighties," Semicond.
Internat., Vol. 3, pp. 29-38 (January, 1980).
32. E. Bassous, H. M. Yu, and V. Maniscalco, "Topology of
Silicon Structures with Recessed Silicon Dioxide,"
J. Electrochem. Soc., Vol. 123, pp. 1729-1737 (November,
1976).
33. R. H. Dennard and V. L. Rideout, "Method of Fabrication
for Field Effect Transistors Having a Common Channel
Stopper," U. S. Patent 4,090,289 (May 23, 1978).
34. Y. Tarui, et al., "Diffusion Self-aligned MOST -- A New
Approach for High Speed Devices," in Proc. First Conf.
Solid-State Devices (Suppl. to J. Japan Soc. Appl. Phys.,
Vol. 39, pp. 105-110, 1970).
35. S. Ogura, P. J. Tsang, W. W. Walker, D. L. Critchlow, and
J. F. Shepard, "Lightly Doped Drain MOSFET Structure to
Relieve Scaling Limitations," IEEE Workshop on Scaling and
Lithography, N. Y. City (April 22, 1980).
36. H. Nozawa, S. Nishimura, Y. Horiike, K. Okumura,
H. Iizuka, and S. Kohyama, "High Density CMOS Processing
for a 16 Kbit RAM," IEEE Internat. Electron Dev.
Meet. Tech. Digest, pp. 366-369, Washington, D.C.
(December, 1979).
37. W. G. Watrous, "MOSFET Transistor and Method of
Fabrication," U. S. Patent 3,986,903 (October 19, 1976).
38. V. L. Rideout, J. J. Walker, and A. Cramer, "A One-device
Cell Using a Single Layer of Polysilicon and a Self-Registering
Metal-to-Polysilicon Contact," IBM J. Res.
Develop., Vol. 24, pp. 339-347 (May, 1980).
39. M. Y. Tsai, C. S. Peterson, F. M. d'Heurle, and
V. Maniscalco, "Refractory Metal Silicide Formation
Induced by As+ Implantation," Appl. Phys. Lett., Vol.
37, pp. 295-298 (August, 1980).
40. C. J. Doherty, T. E. Seidel, H. J. Leamy, and G. K. Celler,
"Formation of p-n Junctions and Ohmic Contacts at Laser
Processed Pt-Si Surface Layers," J. Appl. Phys., Vol. 51,
pp. 2718-2721 (May, 1980).
41. H. W. Curtis, "Integrated Circuit Design, Production,
and Packaging for System/38," IBM S/38 Technology
Development Report, pub. by GSD Tech. Comm., Atlanta,
Georgia (1978).
42. D. D. Tang, T. N. Ning, R. D. Isaac, G. C. Feth, S. K.
Wiedmann, and H. N. Yu, "Sub-nanosecond Self-aligned
I²L/MTL Circuits," IEEE Internat. Electron Device
Meeting, Tech. Digest, pp. 201-203, Washington, D.C.
(December, 1979), also to be published in IEEE Trans.
Electron Dev. (August, 1980).
43. R. A. Gdula, "The Effects of Processing on Radiation
Damage in SiO2," IEEE Trans. Electron Dev., Vol. ED-26,
pp. 644-646 (April, 1979).
44. J. M. Aitken, "1 µm MOSFET VLSI Technology: Part VIII --
Radiation Effects," IEEE Trans. Electron Dev.,
Vol. ED-26, pp. 372-378 (April, 1979).
45. D. J. DiMaria, L. M. Ephrath, and D. R. Young, "Radiation
Damage in Silicon Dioxide Films Exposed to Reactive Ion
Etching," J. Appl. Phys., Vol. 50, pp. 4015-4021
(June, 1979).
ELECTRON BEAM TESTING AND RESTRUCTURING OF INTEGRATED CIRCUITS*
by
D. C. Shaver
Lincoln Laboratory, Massachusetts Institute of Technology
Lexington, Massachusetts 02173
Dramatic improvements in the cost, performance, and reliability of a
digital system can be obtained if the system is integrated on a single chip.
Many systems are sufficiently complex that the die size resulting from
integration would be very large with a low probability of producing a perfect,
functioning die. Since there is a real need for larger integrated systems
than can be fabricated free of defects, it is likely that techniques which can
locate and "wire-around" defects will be useful and will allow the die size to
increase, perhaps to full-wafer size.
A plausible scheme for fabricating a large system is:
(i) Design the large-scale system in a highly modular fashion.
Partitioning into subsystems should stress minimum interconnection
requirements between the subsystems, complete testability of each
subsystem, and minimum number of subsystem types. This last
requirement suggests that one should try to design and fabricate a
single subsystem type, and that each subsystem would be assigned
unique functions by a customizing operation. Finite state machines
containing PLAs or ROMs are examples of subsystems which could be
easily customized.
(ii) Customize and test the individual subsystems. If a particular
subsystem does not function properly, a spare one would be customized
and tested.
(iii) Interconnect working subsystems to form the large-scale system.
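The benefit of spare subsystems in the scheme above can be illustrated with a simple yield calculation. The sketch below is not from the paper: it assumes each subsystem works independently with the same probability, that any working subsystem can be customized to any role, and that the discretionary interconnect itself is defect free. The system size and subsystem yield are hypothetical.

```python
from math import comb


def prob_enough_working(needed: int, fabricated: int, p_good: float) -> float:
    """P(at least `needed` of `fabricated` subsystems work), binomial model."""
    if not 0 <= needed <= fabricated:
        raise ValueError("need 0 <= needed <= fabricated")
    if not 0.0 <= p_good <= 1.0:
        raise ValueError("p_good must be a probability")
    return sum(
        comb(fabricated, k) * p_good ** k * (1.0 - p_good) ** (fabricated - k)
        for k in range(needed, fabricated + 1)
    )


if __name__ == "__main__":
    needed, p_good = 16, 0.8  # hypothetical system size and subsystem yield
    for spares in (0, 2, 4, 8):
        p = prob_enough_working(needed, needed + spares, p_good)
        print(f"{spares} spares: P(system can be assembled) = {p:.3f}")
```

With no spares the result reduces to the probability that every subsystem is good, which is the monolithic case the opening paragraph describes as having a low probability of success.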
To construct such a system a flexible tool is required to allow subsystem
customization, testing, and interconnection. The objective of this paper is
to demonstrate that a scanned electron beam provides this flexible tool.
Specifically, the electron beam can be used for three essential functions:
(1) Input injection: the electron beam can be used to apply inputs and
to alter the logical state of a subsystem under test.
(2) Output sensing: the electron beam can be used to sense the presence
of a "zero" or "one" state at any one of a large number of test
points.
(3) Non-volatile restructuring: the electron beam can be used to open
or close switches in the subsystems in a non-volatile manner. The
opening and closing of switches is used to provide customization
within the individual subsystems as well as control of the
discretionary interconnect between working subsystems.
*This work was supported by the Department of the Air Force and the Defense
Advanced Research Projects Agency.
A variety of physical effects including charging and discharging of
surfaces¹, voltage contrast², electron-beam induced current (EBIC)³, melting
or vaporization⁴, eutectic formation⁵, threshold shifts in MOSFETs⁶, and
decomposition of organometallic vapors⁷ may occur when an electron beam
interacts with matter, and these effects could be exploited in an electron-beam
testing and restructuring tool. Earlier work on electron beam testing of
integrated circuits has centered on voltage contrast examination of chips for
failure analysis, or stroboscopic measurement of waveforms. Generally a
raster scan of the electron beam is generated and an image of the chip is
displayed or, alternatively, a single point is probed and a waveform is viewed
on a CRT. The emphasis of the work presented here is to develop methods for
computer-controlled electron beam testing of wafer-scale circuits, including
restructuring techniques. This emphasis has the following implications:
(1) Input injection, output sensing, and programming of non-volatile
switches must all be achievable in a single system.
(2) The test point to be probed or switching element to be programmed is
selected by a computer-controlled deflection of the beam to the
appropriate coordinate. The beam is unblanked only over selected
points, so the entire circuit is not exposed to the beam.
(3) Testing can be fully automatic since a mask level description
provides coordinate information for the electron-beam system.
Potentially, node extraction and switch level simulation⁸ can be
used to generate test sequences automatically from the mask
description.
(4) The objective is to perform only functional testing (i.e., checking
for logical ones or zeroes), not parametric measurements.
(5) The integrated circuits must be designed to be compatible with
electron beam testing. Specific structures incorporated in the
mask-level specification make the electron-beam testing and
restructuring possible. A wafer-scale power grid is used to supply power
during electron beam testing.
Figure 1 illustrates a possible layout for a wafer-scale electron-beam
testable system. The central portion of the wafer containing active subsystems
and discretionary interconnect would be probed only by the electron
beam. Some large test and power pads at the wafer perimeter could be
contacted by relatively large probes in the cassette which holds the wafer in
the electron beam system. Only a very limited number of these pads would be
required since the electron beam multiplexes use of the pads. A set of
wafer-level bonding pads would be used to connect the wafer-scale system inputs and
outputs in the final package. Figure 2 illustrates schematically a possible
arrangement within the active subsystem area.
[FIGURE 1: A possible layout for a wafer-scale system designed for electron
beam testing and restructuring. Labels: silicon wafer, active system area,
ground, large test and power pads, wafer-level bonding pads.]
[FIGURE 2: A schematic representation of a subsystem designed for electron
beam testing and restructuring. ESLs can be used to apply inputs
to the subsystem and sensing points are provided within and at the
outputs of the subsystem. Non-volatile switches provide subsystem
customization as well as flexible interconnect. Diagram labels: bus, inputs,
internal customization, internal test points, outputs. Legend: non-volatile
switch, output sensing test point, electron beam-switched latch.]
LOGICAL "OUTPUT" SENSING TECHNIQUES
Only specific techniques which are usable with n-channel MOS (NMOS)
technology and which require no significant modification of our commercially
available ETEC LEBES-D electron beam lithography system will be described.
Since the system is also routinely used for mask-making and direct-write
lithography, techniques were chosen to be compatible with operation of the machine
in a lithography mode. Specifically, the beam-defining aperture size and
secondary-electron detector placement were not optimum, and techniques such as
organometallic decomposition which would cause system contamination were
avoided. The most obvious technique for sensing the presence of a logical
"zero" or "one" is to utilize voltage contrast in the secondary electron
signal. In particular, if the incident electron beam is directed at a flat
aluminum test pad a few microns in diameter, and if the test pad is surrounded
by a grounded metal ring several microns wide, a strong modulation of the
secondary electron signal will be obtained as the potential at the test pad is
varied from zero volts to a +5 V logic level. With the test pad at +5 V,
the lowest energy secondaries will not escape from the vicinity of the pad,
and the secondary signal will be reduced. Under ideal conditions, a very
strong modulation of the secondary electron signal can be achieved and
virtually all secondary electrons leaving the pad can be collected. In this
ideal case, a high enough signal to noise ratio can be obtained in the
secondary signal (with modest incident beam currents) to allow reliable
discrimination between a "one" and "zero" logic state with a beam unblank time
of less than one microsecond. Thus, in principle, the beam could be deflected
to examine more than one million test points per second. Put another way, if
a chip were clocked at a 10 kHz rate, 100 selected internal test points could
be examined during each clock cycle.
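The figure of 100 test points per clock cycle follows directly from the two rates quoted. A minimal sketch of that arithmetic is given below; it assumes the probe rate is limited only by the unblank time (beam deflection and settling are ignored), and the function name is hypothetical.

```python
def points_per_clock(probe_rate_hz: int, clock_rate_hz: int) -> int:
    """Whole number of test points that can be probed in one clock period."""
    if probe_rate_hz <= 0 or clock_rate_hz <= 0:
        raise ValueError("rates must be positive")
    return probe_rate_hz // clock_rate_hz


if __name__ == "__main__":
    probe_rate = 1_000_000   # one probe per microsecond of unblank time
    clock_rate = 10_000      # chip clocked at 10 kHz
    print(points_per_clock(probe_rate, clock_rate), "test points per cycle")
```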
This voltage contrast functional probing should provide excellent
testability with minimal area overhead for test points. Unfortunately, results on
the LEBES system have been disappointing. Measurements indicate that the
secondary electron detector in this system is very poorly placed and receives
less than one electron for every ten thousand electrons incident on the
sample. For comparison, with a good detector arrangement one could receive
about 1 electron for every 10 incident electrons under comparable bombardment
conditions. In addition, most of the low-energy electrons collected by the
detector appear to be produced in the vicinity of the detector by high-energy
electrons backscattered from the specimen, which results in a very poor voltage
contrast modulation of the "secondary" signal. These two effects reduce the
logic level measurement rate to only about 100 Hz. It is anticipated that
detector improvements will improve this rate by several orders of magnitude.
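The size of the possible improvement can be gauged from the two collection efficiencies quoted above. The sketch below makes a strong simplifying assumption that is not in the paper: that the achievable measurement rate scales in direct proportion to the fraction of electrons collected. It also ignores the second effect, the poor voltage contrast caused by backscattered electrons.

```python
from fractions import Fraction


def projected_rate_hz(rate_hz: int, eff_now: Fraction, eff_new: Fraction) -> Fraction:
    """Scale a measurement rate by the ratio of collection efficiencies.

    Assumption: rate is directly proportional to collected signal.
    """
    if eff_now <= 0 or eff_new <= 0:
        raise ValueError("efficiencies must be positive")
    return rate_hz * eff_new / eff_now


if __name__ == "__main__":
    measured = Fraction(1, 10_000)  # about 1 collected per 10,000 incident
    good = Fraction(1, 10)          # about 1 collected per 10 incident
    print("collection improvement factor:", good / measured)
    print("projected rate (Hz):", projected_rate_hz(100, measured, good))
```

Under that assumption the detector placement alone accounts for about three orders of magnitude, which is consistent with the improvement anticipated in the text.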
A second logic-level sensing technique eliminates most of the diffi-
culties encountered with voltage contrast probing. This technique is capable 
of logic level sensing at rates of at least 1 MHz, and is free of contam-
ination, charging, and crosstalk effects encountered with voltage contrast 
techniques. The incident electron probe is pointed at a specially designed 
test point and is used to select that test point for examination. When an
electron beam penetrates a semiconductor, electron-hole pairs are generated.
These electrons or holes can be collected by a p-n junction, causing a current
to flow in the integrated circuit. Reasonable incident beam currents might be
as large as a few hundred nanoamperes, but such low currents will not
appreciably perturb static circuits. However, if an incident electron has a
5-20 keV energy it can generate about 1000-4000 electron-hole pairs. If these
are collected by a junction, currents of several hundred microamperes can be
induced in the integrated circuit, which is larger than the current
usually supplied by a depletion-mode pullup. In NMOS, since p-n junctions are
formed between all n+ diffused conductors and the substrate, the e-beam can be
used to pull down any diffused conductor to substrate potential, which is
usually ground. Thus, by designing so that selected diffused conductors are
accessible to the electron beam, it is possible to produce a low logic level
at a selected point merely by pointing the electron beam and unblanking it.
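The current gain described in this paragraph can be written as a one-line estimate: induced current = beam current x pairs generated per incident electron x fraction of pairs collected. The sketch below uses the pair counts quoted in the text; the 100 nA beam current and the ideal collection efficiency of 1.0 are illustrative assumptions.

```python
def induced_current_amps(beam_current_amps: float,
                         pairs_per_electron: float,
                         collection_efficiency: float = 1.0) -> float:
    """Junction current induced by the beam (simple multiplicative model)."""
    if beam_current_amps < 0 or pairs_per_electron < 0:
        raise ValueError("beam current and pair count must be non-negative")
    if not 0.0 <= collection_efficiency <= 1.0:
        raise ValueError("collection efficiency must lie in [0, 1]")
    return beam_current_amps * pairs_per_electron * collection_efficiency


if __name__ == "__main__":
    beam = 100e-9  # 100 nA, a hypothetical value below "a few hundred nA"
    for pairs in (1000, 4000):
        current = induced_current_amps(beam, pairs)
        print(f"{pairs} pairs per electron -> {current * 1e6:.0f} uA induced")
```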
This forms the basis for a number of possible sensing schemes including
the electron beam controlled multiplexor shown in Fig. 3. In this scheme each
of a number of test points (shown as A, B, C...) is connected to the gate of a
FET added specifically for test purposes. The drains of these FETs are
connected to a test bus which can be the power bus, or a special test bus can
be used to reduce the capacitive loading and improve sensing speed. The
source terminals of the FETs are left disconnected, and the source-substrate
diode provides the electron-beam probed point. If, for example, test point BT
is electron bombarded, the diode will be driven towards forward bias.
Depending on the state of test point B, the FET switch will be closed or open
and the bombarded diode will be able or unable, respectively, to draw current
from the test bus. Thus, the appearance of current on the test bus synchronous
with the bombardment of the test point will indicate the logical state.
A single wire provides access to many test points with the electron beam
providing test-point selection. A number of variations on this basic theme
are possible including tree-structured buffered busses. The operation of a
single test point will be demonstrated later in this paper.
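The selection logic of the multiplexor can be captured in a small behavioral model. The sketch below is purely logical and is not a circuit simulation: it assumes an n-channel test FET that conducts when its gate (the test point) is at logic one, and it treats bus current as simply present or absent. All names are hypothetical.

```python
from typing import Dict, Optional


def bus_current_flows(node_states: Dict[str, bool],
                      bombarded: Optional[str]) -> bool:
    """True if current appears on the shared test bus.

    node_states maps each test point name to its logic level; `bombarded`
    names the one source diode under the beam, or None if the beam is blanked.
    """
    if bombarded is None:
        return False                  # no diode is conducting
    return node_states[bombarded]     # FET passes current only if its gate is high


def scan(node_states: Dict[str, bool]) -> Dict[str, bool]:
    """Point the beam at each test point in turn and record the bus response."""
    return {name: bus_current_flows(node_states, name) for name in node_states}


if __name__ == "__main__":
    states = {"A": True, "B": False, "C": True}
    print(scan(states))
```

Because only one diode is bombarded at a time, a single bus wire is enough to recover the state of every test point.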
INPUT INJECTION
Hole-electron pair injection by electron beam provides the basis for
applying inputs to a powered-up device without actual input connections. One
device, the electron-beam switched latch (ESL), provides a means for applying
static inputs to a system under electron beam control. The electron beam is
used to control a set-reset flip-flop by bombarding either a set-to-one or a
set-to-zero control diode. ESLs can be used to provide stable inputs or
clocks, or as volatile control for programmable links during testing.
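The behavior of an ESL as seen by the rest of the system can be summarized by a very small state model. The sketch below is a behavioral abstraction only: the class, method, and diode names are hypothetical, and it says nothing about the currents or timing needed to flip a real latch.

```python
class ElectronBeamSwitchedLatch:
    """Behavioral model of a set-reset latch controlled by beam bombardment."""

    def __init__(self, output: bool = False) -> None:
        self.output = output

    def bombard(self, diode: str) -> None:
        """Pull down one side of the latch by bombarding its control diode."""
        if diode == "set_to_one":
            self.output = True
        elif diode == "set_to_zero":
            self.output = False
        else:
            raise ValueError(f"unknown control diode: {diode!r}")


if __name__ == "__main__":
    latch = ElectronBeamSwitchedLatch()
    # Alternating between the two diodes produces a beam-generated clock.
    for diode in ("set_to_one", "set_to_zero", "set_to_one"):
        latch.bombard(diode)
        print(diode, "->", int(latch.output))
```

Between bombardments the latch simply holds its last value, which is what allows a single beam to supply static inputs to many latches in turn.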
[FIGURE 3: A simple electron-beam controlled multiplexor for sensing a
selected test point. Diagram labels: +V, test bus, test points A, B, C,
beam-probed points AT and BT. Waveforms: beam current, VOUT.]
Figure 4 is a schematic representation of an electron-beam switched
latch. The latch is a pair of cross-coupled inverters with pulldown access
provided for the electron beam. Figure 5 shows an actual layout of an NMOS
ESL. Electrically-long pullups are used to reduce the required pulldown
current to less than 10 µA, and the e-beam pulldown points have been extended
well away from the active transistors. Uncovered contact cuts provide
electron beam access. This relatively large layout provided many advantages
during preliminary experiments, but the ESL could be made as small as two
minimum size inverters. Two additional structures are attached to the output
of the ESL in Fig. 5. One is an inverter which was added to make the power
supply current unequal for the two states of the ESL, which made it possible to
test ESL operation by simply monitoring supply current. The second structure
is a one-bit slice of the electron-beam controlled multiplexor described in
the previous section. Thus, this simple chip can provide a feasibility
demonstration of combined electron-beam input injection and output sensing.
A chip fabricated at Hewlett-Packard as part of the MPC-580 multiproject
chip run was wire-bonded and placed in our ETEC electron beam lithography
system. Only two wires, supplying power and ground, are attached to the chip.
After setting the beam parameters (5 kV, 7 nA) and registering the chip to the
electron-beam coordinate system, electron-beam control of the test chip began.
As shown in Fig. 6, successful electron-beam switching of the latch and
electron-beam probing were achieved. The lower trace in Fig. 6 shows the
electron beam x-deflection signal and indicates at which of three locations the
beam is positioned. An upwards deflection on this trace corresponds to a
leftward motion (in Fig. 5) of the beam. The beam cycles among three
positions. From highest to lowest on the trace, the positions are set-to-zero,
set-to-one, and sense-output. The top trace in Fig. 6 shows turn-on
(unblanking) of the electron beam as a small downward blip. The center trace
shows an AC-coupled record of the supply current monitored across a small
sense resistor. Moving from left to right across Fig. 6, the beam is
initially positioned at the set-to-zero location. When the beam is unblanked,
the latch switches to zero, causing the downward transient in supply current,
since the latch was designed to draw less current in the zero state. Then
the beam position shifts to the sense location and is again unblanked; no
sense pulse appears on the supply current, so the latch is zero. Then the
beam moves to the set-to-one position and unblanks. The immediate change in
supply current indicates the ESL has changed state. Finally, the beam is
moved to the sense position again and a small pulse appears in the supply
current synchronous with the unblanking of the beam. The output is a one.
Figure 6 demonstrates complete electron beam control and probing of a small
logic circuit, and the principle is extendable to complex systems. The ESL
described above could be switched with a 380 ns pulse.
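The set, sense, set, sense sequence traced through Fig. 6 can be replayed with a very small state model. This is a hedged sketch rather than a circuit simulation: the class `ESL`, the method `bombard` and the response strings are hypothetical names, and the supply-current behavior is reduced to three symbolic outcomes.

```python
class ESL:
    """Behavioral sketch of an electron-beam switched latch (ESL)."""

    def __init__(self, state=0):
        self.state = state

    def bombard(self, location):
        """Unblank the beam at one of the three beam locations.

        Returns a symbolic supply-current response:
        "transient" when the latch changes state, "pulse" when a
        sense pulse appears, and "none" otherwise.
        """
        if location == "set-to-zero":
            changed = self.state != 0
            self.state = 0
            return "transient" if changed else "none"
        if location == "set-to-one":
            changed = self.state != 1
            self.state = 1
            return "transient" if changed else "none"
        if location == "sense":
            # A sense pulse appears only when the latch holds a one.
            return "pulse" if self.state == 1 else "none"
        raise ValueError(f"unknown beam location: {location}")


latch = ESL(state=1)  # assumed starting state, so the first step switches it
sequence = ["set-to-zero", "sense", "set-to-one", "sense"]
responses = [latch.bombard(loc) for loc in sequence]
print(responses)
```

Starting the model in the one state mirrors the description in the text, where the first unblanking at the set-to-zero location produces a visible transient.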
The ESL provides electron-beam control of static logic levels. The electron
beam may be used to great advantage in dynamic circuits, as shown in
Fig. 7. Many NMOS integrated circuits are designed as two-phase clocked,
dynamic finite-state machines. A very large number of electron-beam
FIGURE 4: Schematic representation of an electron beam switched latch.
[Figure labels: +5 V; diodes to substrate; electrons; latch outputs 0 and 1;
ESL.]
FIGURE 5: NMOS layout of an electron-beam switched latch. Top pad is +5 V
and bottom pad is ground. The three dark contact cuts above the
ground pad correspond, from left to right, to "set-to-zero",
"set-to-one", and "sense" points for the electron beam.
FIGURE 6: An electron beam switched latch in operation. Downward pulses on
top trace show beam unblanking. Center trace is AC-coupled supply
current to ESL. Lower trace shows beam deflection.
FIGURE 7: A dynamic finite-state machine showing how the electron beam is
used to alter the internal state.
[Figure labels: COMBINATORIAL LOGIC; BOMBARD AFTER φ1 TO SET "ZERO";
BOMBARD AFTER φ2 TO SET "ONE".]
accessible test points are available, and the beam is used to selectively dump
stored charge after the clock-controlled pass transistors are cut off. The
ability to alter internal states in the finite-state machine sequentially and
to probe the consequences will allow an unprecedented degree of testability.
ELECTRON BEAM CONTROL OF NON-VOLATILE SWITCHES
The ability to open and close selected switches on the wafer is essential
if one wishes to customize subsystems and modify interconnects. Ideally these
switches should be easily flipped from on to off (and vice-versa) under e-beam
control and should exhibit high off resistance and low on resistance. Once
programmed by the beam they should retain their state indefinitely. Two types
of electron-beam alterable switches, field-oxide FETs and floating-gate FETs,
are described below.
The field oxide FETs are parasitic devices present in all NMOS processes.
When a polysilicon or metal conductor runs over thick field oxide, a parasitic
FET is formed between adjacent diffused conductors. Normally, this FET is off
since the combination of thick oxide and field implantation raises its
threshold to many volts. Electron-beam irradiation of oxides with a bias applied
across the oxide can result in a large buildup of positive charge in the
oxide, presumably due to hole-trapping. Specifically, if a positive voltage
is applied across thick oxide and the FET is irradiated, a thin layer of
positive charge will accumulate near the silicon surface, producing an effect
equivalent to applying the bias across a thin oxide of only about a hundred
angstroms thickness. Since this charge remains trapped in the oxide after the
irradiation and bias are removed, a large (> 50 V) negative threshold shift is
induced in the FET. For example, a polysilicon gate field-oxide FET formed in
the MPC-580 run exhibited a threshold of > 10 V. Thus, for normal logic
levels between 0 and 5 V, the gate cannot turn on the FET, and adjacent diffused
conductors are not connected. This FET was strongly turned on by applying
+3 V to the gate and irradiating the gate oxide. As shown in the top of
Fig. 8, the FET is strongly turned on after e-beam irradiation with a positive
gate-programming voltage. The four closely spaced characteristics correspond
to gate voltages of 0, 1, 2 and 3 V, indicating that the FET is strongly on
regardless of the gate voltage after programming. By biasing the gate at zero
volts and irradiating again, the FET can be turned off. The lower half of
Fig. 8 shows the drain-source characteristic of the FET after this operation.
For gate voltages varying from 0 to 5 V, the FET is off.
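The on/off behavior described above follows from simple threshold arithmetic, sketched here in Python. This is a deliberately crude model under stated assumptions: the FET is treated as an ideal switch that conducts when the gate voltage exceeds the threshold, the reported bounds (threshold above 10 V, shift larger than 50 V) are replaced by the stand-in values 10 V and 50 V, and the reset step is assumed to restore the initial threshold exactly. The function names are hypothetical.

```python
V_T_INITIAL = 10.0  # volts; stand-in for the reported "> 10 V" threshold
SHIFT = -50.0       # volts; stand-in for the reported "> 50 V" negative shift


def is_on(gate_v, threshold_v):
    """Ideal-switch model: the FET conducts when the gate exceeds threshold."""
    return gate_v > threshold_v


def program(threshold_v):
    """Irradiate with a positive gate bias: threshold moves strongly negative."""
    return threshold_v + SHIFT


def reset(_threshold_v):
    """Irradiate at zero gate bias.

    Assumption: this restores the initial threshold exactly.
    """
    return V_T_INITIAL


vt = V_T_INITIAL
print("before programming:", [is_on(vg, vt) for vg in range(0, 6)])
vt = program(vt)
print("after programming: ", [is_on(vg, vt) for vg in range(0, 4)])
vt = reset(vt)
print("after reset:       ", [is_on(vg, vt) for vg in range(0, 6)])
```

With these stand-in numbers the programmed threshold sits near -40 V, which is why logic-level gate voltages make no visible difference to the on state.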
E-beam programmed field-oxide FETs provide a simple, reprogrammable
element for restructuring and customization. The voltage applied to the
programming gates during electron beam bombardment can be generated locally
from the output of an ESL or can be supplied over a single programming wire
such as the power bus. The retention time of the on state is at least a few
weeks for devices operated at room temperature, but the retention
characteristics are likely to be poor if the devices are operated at 150°C for even
short periods. Nonetheless, this e-beam controlled switch should be very
FIGURE 8: A field-oxide FET non-volatile switch after e-beam programming.
Switch is initially off. Top picture shows FET characteristic
after e-beam programming to the "on" state. Note the low
sensitivity of the FET conductance as the gate voltage is varied from
zero to +3 volts. Bottom picture shows "off" state after e-beam
reset.
[Scale labels: 200 µA, 1 V.]
useful if the devices are operated at liquid nitrogen temperatures, and will
certainly prove to be a useful element in laboratory studies of large-scale
restructurable systems. Less than 1 µC/cm² of 20 kV bombardment is required
to produce the threshold shifts, and a programming time of less than 1 µs per
switch is possible.
Electron beam programmed floating gate FETs should provide a reliable,
truly non-volatile switch. Electron storage on floating gate FETs is the
basis for information retention in UV-erasable EPROMs, which have enjoyed wide
commercial acceptance and have excellent retention characteristics. FAMOS [9]
electrically-programmable ROMs are programmed by hot-electron injection
through the thin gate oxide, where the hot electrons are generated by avalanche
breakdown of the drain-substrate junction. Electron-beam programming of
floating gate devices is possible if an electron beam is directed at an
oxide-covered gate, causing electrons to penetrate the oxide and become trapped on
the gate. The trapped electrons will charge the gate to a negative
potential.
Some preliminary experiments have been performed on depletion-mode
floating gate devices fabricated as part of MPC-580. These devices were
normally on, and could be turned off by electron bombardment of the gate.
Figure 9 shows a drain-source characteristic for a FET before (top photo) and
after (bottom photo) electron-beam programming. At a 5 V drain-source bias,
the current is 100 times greater in the on state than in the off state. The
residual current of 5 µA in the off device was determined to be the result of
formation of a weak buried channel in the depletion mode device. Biasing the
substrate to -0.4 V increased the on:off ratio to better than 10000:1. A
slight modification of the depletion mode implant would greatly reduce the
magnitude of the buried channel. The large floating gate devices used for
these experiments required 370 µs to ensure complete turn off using a modest
2 nA beam. By using larger beam currents and smaller devices, programming
times of a few microseconds could be achieved.
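The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers from the text (2 nA, 370 µs, 5 µA, a 100:1 ratio) plus two labeled assumptions: that the charge needed for turn-off stays fixed when the beam current changes, and a hypothetical 200 nA beam used to illustrate the scaling. The delivered charge is the total beam charge, which is an upper bound on what is actually trapped on the gate.

```python
ELECTRON_CHARGE_C = 1.602e-19  # coulombs per electron

BEAM_CURRENT_A = 2e-9     # from the text: 2 nA beam
PROGRAM_TIME_S = 370e-6   # from the text: 370 microseconds to turn off
OFF_CURRENT_A = 5e-6      # from the text: residual off-state current
ON_OFF_RATIO = 100        # from the text: at 5 V drain-source bias

# Total charge delivered by the beam during programming (Q = I * t).
delivered_charge_c = BEAM_CURRENT_A * PROGRAM_TIME_S
delivered_electrons = delivered_charge_c / ELECTRON_CHARGE_C

# On-state current implied by the stated ratio and residual current.
on_current_a = OFF_CURRENT_A * ON_OFF_RATIO


def program_time_s(beam_current_a, charge_c=delivered_charge_c):
    """Programming time at another beam current.

    Assumption: the required charge is unchanged, so time scales as 1/I.
    """
    return charge_c / beam_current_a


print(f"delivered charge: {delivered_charge_c:.2e} C")
print(f"delivered electrons: {delivered_electrons:.2e}")
print(f"implied on current: {on_current_a:.1e} A")
print(f"time at a hypothetical 200 nA beam: {program_time_s(200e-9):.2e} s")
```

Under the fixed-charge assumption, a hundredfold increase in beam current alone brings the programming time into the few-microsecond range mentioned in the text; smaller devices would presumably reduce it further.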
CONCLUSION 
Control of the state of a simple NMOS integrated circuit using an electron
beam has been demonstrated. Techniques have been demonstrated for input
injection, output sensing, and programming of non-volatile switches using a
commercially-available electron beam lithography system and conventional NMOS
technology. Many of the techniques for input injection and output sensing
have also been demonstrated using light beams, but the experimental apparatus
was very limited. Though many approaches to "scanning-beam" probing and
restructuring are feasible, the state-of-the-art in electron beam deflection,
blanking, etc., is highly advanced, and electron beam probing and
restructuring could be a reality in the very short term. This can provide us with a
flexible tool to understand the problems in constructing wafer-scale systems.
FIGURE 9: A depletion-mode floating gate FET before (top photo) and after
(bottom photo) electron beam programming.
ACKNOWLEDGMENTS
The test chips used for these experiments were fabricated as part of the
DARPA sponsored MPC-580 multiproject chip. I would like to thank everyone at
Hewlett-Packard, Micro Mask, and Xerox who made this multiproject chip a
success. At Lincoln Laboratory, Dr. B. Burke provided test FETs used for
initial experiments, and P. Daniels and D. Klays provided assistance with
packaging and bonding.
REFERENCES 
1. P. E. Kudirka and C. K. Crawford, "Potential measurement and stabilization
   of an isolated target using electron beams", Solid State Electronics,
   15: 987-992 (1972).
2. Eckhard Wolfgang, Rudolf Lindner, Peter Fazekas, and Hans-Peter Feuerbaum,
   "Electron-beam testing of VLSI circuits", IEEE Trans. Electron Devices,
   ED26: 549-559, April 1979.
3. James R. Beall and Leon Hamiter, Jr., "EBIC - A valuable tool for
   semiconductor evaluation and failure analysis", p. 61-69, IEEE Reliability
   Physics Symposium, 1977.
4. J. Vine and P. A. Einstein, "Heating effect of an electron beam impinging
   on a solid surface, allowing for penetration", Proc. IEEE, 111: 921-930
   (1964).
5. C. G. Kirkpatrick, J. F. Norton, H. G. Parks, and G. E. Possin, "New
   concepts for electron-ion beam and electron-electron beam memories",
   J. Vac. Sci. Technol., 15: 841-844 (1978).
6. N. C. MacDonald and Thomas E. Everhart, "An electron beam activated switch
   and associated memory", Proc. IEEE, 56: 158-166, February 1968.
7. Allen G. Baker and William C. Morris, "Deposition of metallic films by
   electron impact decomposition of organometallic vapors", Rev. Sci.
   Instrum., 32: 458 (1961).
8. Clark M. Baker and Chris Terman, "Tools for verifying integrated circuit
   designs", Lambda, Volume 1, No. 3, p. 22 (1980); see also Randal
   E. Bryant, "An algorithm for MOS logic simulation", Lambda, Volume 1,
   No. 3, p. 46 (1980).
9. Dov Frohman-Bentchkowsky, "A fully-decoded 2048-bit electrically
   programmable FAMOS read-only memory", IEEE J. of Solid State Circuits,
   SC6: 301-306, October 1971.
Two Timing Samplers 
Edward H. Frank 
Robert F. Sproull
Department of Computer Science
Carnegie-Mellon University
Pittsburgh, PA 15213
Abstract
Testing VLSI chips presents a variety of problems, some of which can be solved by building on-chip testing
structures. On-chip testing structures can allow a designer to test aspects of a circuit which might be difficult to
test even with expensive test equipment and moreover can provide reasonable testing hardware to designers who
do not have access to sophisticated off-chip testing equipment.
In this paper we describe a type of on-chip test structure called a timing sampler which enables the designer to
accurately measure when on-chip signal transitions occur. The timing samplers we present are simple. They
have been fabricated as part of a multi-project chip and experimental results show that they are reasonably
accurate as well.
Copyright (C) 1981 Edward H. Frank and Robert F. Sproull
This research was sponsored by the Defense Advanced Research Projects Agency (DOD), ARPA Order No.
3597, monitored by the Air Force Avionics Laboratory under Contract F33615-78-C-1551.
The views and conclusions contained in this document are those of the authors and should not be interpreted as 
representing the official policies, either expressed or implied , of the Defense Advanced Research Projects 
Agency or the US Government. 
1 Introduction
As VLSI chips become increasingly complex, designers are discovering that testing and debugging must be
considered in the overall chip design. As a result, testing mechanisms such as scan-in/scan-out (often called
shift-register latch (SRL) or level-sensitive scan design (LSSD) [Eichelberger 78]) for accessing internal state and
on-chip signature analysis [Frohwerk 77] such as BILBO [Koenemann 79] are becoming more common. In this
paper we describe another type of on-chip testing mechanism which is useful for measuring some aspects of
circuit performance.
When assessing the performance of a circuit, it is often important to measure the time of occurrence of an
on-chip signal. For example, to measure the performance of an adder design, the designer wants to know when the
carry signal arrives at the end of the carry chain. The simple expedient of connecting the signal to an output pad
driver and measuring the arrival time off-chip is inadequate because a long unknown delay is introduced by the
pad driver.
An accurate measurement of the time behavior of a signal could be obtained if we could build an on-chip
digital oscilloscope, capable of measuring the signal value at all points in time. We present two circuits for
making such measurements in a limited way. The first is a simple latch, which records the value of the signal
when a timing signal arrives from off-chip. The second, a C-element, can determine the time at which a single
transition appears on the signal to be tested. An additional advantage of the second scheme is that it requires
exactly one pad to accomplish the test. Note that the time required to drive the input pad used by the timing
signal does not significantly affect the results.
2 Latch sampler 
A latch can be used to sample the value of a signal under the control of an externally-generated timing signal.
We fabricate a latch near the place where the signal to be tested is generated (Figure 1). Signal T is the one
being tested; L is a timing signal generated off-chip; and R communicates the result of the test for off-chip
analysis. This circuit samples the signal T, using the latching signal L to determine the time at which the
sample is to be taken. L is normally high; it is lowered at the instant we want to take a sample (Figure 2). The
latch is thereafter closed, with a feedback path ensuring static stability. A complete trace of the behavior of T is
obtained by repeating the experiment many times, varying the time at which L is lowered.
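The repeat-and-vary procedure described above amounts to equivalent-time sampling, which the following Python sketch illustrates. It is an idealized model and not the authors' test jig: the names `make_rising_edge`, `latch_sample` and `trace` are hypothetical, the signal under test is an assumed single rising edge at 37 ns, and metastability, aperture time and pad-driver delay are all ignored.

```python
def make_rising_edge(rise_ns):
    """Hypothetical signal under test: 0 before rise_ns, 1 from rise_ns on."""
    def signal(t_ns):
        return 1 if t_ns >= rise_ns else 0
    return signal


def latch_sample(signal, t_lower_ns):
    """One experiment: L is lowered at t_lower_ns and the latch keeps
    whatever value T had at that instant (idealized)."""
    return signal(t_lower_ns)


def trace(signal, start_ns, stop_ns, step_ns):
    """Repeat the experiment, moving the instant at which L is lowered."""
    return [(t, latch_sample(signal, t))
            for t in range(start_ns, stop_ns + 1, step_ns)]


T = make_rising_edge(37)
samples = trace(T, 30, 44, 2)
first_high = next(t for t, value in samples if value == 1)
print(samples)
print("first sample that reads 1:", first_high, "ns")
```

Each run contributes one point, so the resolution of the reconstructed trace is set by the step between runs and not by the speed of the pad driver.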
Figure 1: Latch Sampler.
Figure 2: Operating the latch sampler. τ represents the approximate time of the sample of T.
Proper operation of this circuit requires some attention to its design. When L is lowered, the pass transistor
opens, preventing further change of the charge on the gate of the first inverter of the latch. Note that the gate
may be neither fully charged to Vdd nor fully discharged to ground. We now allow the signal to propagate
through the latch inverters, and then close the feedback pass transistor. (For this reason, the pullup time of the
inverter that inverts L to drive the feedback pass transistor gate is arranged to be quite long, substantially longer
than the propagation delay through the latch. The pullup time is slowed with a large diffusion-to-substrate
capacitor.) After the feedback pass transistor closes, the latch may remain in a metastable state for some time.
After allowing sufficient time for metastable exit and delay through the pad driver, we measure the result of the
test by sensing pad R.
3 C-element sampler 
The second scheme for measuring timing uses a C-element¹ as shown in the circuit of Figure 3. The use of this
scheme is illustrated in Figure 4, which shows a test to determine the time at which the signal T rises. L is
initially low, to ensure that the C-element is set to zero. L is then raised for a while, and lowered at a known
time τ. If τ precedes the rise of T, the C-element will remain zero; if τ follows the rise of T, the C-element will
be set to one. The results of the experiment are observed on the output pad R. The experiment to detect a
falling transition on T is the symmetric opposite of the one described above (i.e., L is simply the complement of
the signal shown in Figure 4).
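The C-element test can be modeled in a few lines, using the usual C-element rule that the output goes to 1 only when all inputs are 1, goes to 0 only when all inputs are 0, and otherwise holds. The Python below is a hedged sketch with zero internal delay: `c_element`, `run_test` and `locate_rise` are hypothetical names, times are integer nanoseconds, `tau_ns` stands for the known time at which L is lowered, and the binary search is an illustrative strategy that the paper does not describe.

```python
def c_element(prev, inputs):
    """C-element rule: 1 when all inputs are 1, 0 when all are 0, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


def run_test(t_rise_ns, tau_ns, t_raise_ns=0):
    """One experiment: L is high on [t_raise_ns, tau_ns), T rises at t_rise_ns.

    Returns the C-element output after all events (ideal, zero delay).
    """
    out = 0  # L initially low with T low sets the C-element to zero
    for t in sorted({t_raise_ns, t_rise_ns, tau_ns}):
        T = 1 if t >= t_rise_ns else 0
        L = 1 if t_raise_ns <= t < tau_ns else 0
        out = c_element(out, (T, L))
    return out


def locate_rise(t_rise_ns, lo=0, hi=100):
    """Binary search for the smallest tau that sets the C-element.

    Assumption: run_test is 0 at `lo` and 1 at `hi`.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if run_test(t_rise_ns, mid) == 1:
            hi = mid
        else:
            lo = mid
    return hi


print(run_test(t_rise_ns=20, tau_ns=10))  # L lowered before T rises
print(run_test(t_rise_ns=20, tau_ns=30))  # L lowered after T rises
print(locate_rise(37))
```

In this ideal model the outcome flips exactly at the edge of T; the experiments reported later in the paper show that the real C-element adds a switching delay that shifts this point.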
Figure 3: C-element timing sampler.
Figure 4: Operating the C-element sampler.
A trick allows us to economize on pads, using a single pad LR to communicate both the L signal and the
results of the test (Figure 5). During the first phase of a test, LR is driven from off-chip with the signal shown
for L in Figure 4. Then the external drive is removed (e.g., by driving LR with a tristate driver), and we now
detect the results of the experiment by observing whether pad LR is high or low. The trick is that although LR
may change state when the drive is removed, the C-element will not change state. For example, if LR is low
¹A C-element "... is a bistable device that provides an action similar to hysteresis, in that its output becomes 1 only after all of its inputs
are 1, and becomes zero only after all of its inputs are zero." [Seitz 80]
and the C-element output is high, removing the drive will cause the pad to become high by charging through the
resistor, which leaves the C-element high.
Figure 5: Using a single pad LR for both L and R.
4 Experiments 
We have designed and had fabricated a project (as part of two different multi-project chips, MPC580 and
M08B) to test these circuits. The circuit diagrams and layouts are shown in Figures 6 and 7. In addition to
testing the ideas mentioned in this paper, the project also provides a conventional C-element for various uses
(inputs on T and LR; output on C). The circuit diagram of the jig for testing these projects is shown in Figure
8. Basically, we use the variable width pulse generator running at one-half the frequency of the free-running
oscillator to vary where the edge of T occurs relative to the edge of L or LR. The R and C outputs are used to
measure the results for the latch and C-element respectively.
As previously mentioned, the circuits were fabricated two separate times. Due to a minor mistake² the
MPC580 version of the chip had a non-functional latch sampler. This error was fixed in the M08B version. The
C-element sampler worked in both versions of the chip. We statically tested that the LR pin worked as both L
and R. To simplify the test jig, however, we used the C output in the dynamic tests.
Figure 9 and Table I summarize our results. Although we tested both versions of the C-element, the results
were almost identical and so only one set of figures is shown. As can be seen, the latch sampler performed
extremely well, with a jitter of at most two nanoseconds. The C-element was not nearly as satisfactory. While
the jitter was about the same as the latch, the internal switching delay of the flip-flop caused two problems.
First, because of an asymmetry in the C-element, the delay was much greater for the high-to-low C-element
transition than for the low-to-high transition. Second, because this delay was much greater than the precision
with which we would like to be able to measure signals and moreover because it will vary from fab run to fab
run, it makes the C-element as designed here not particularly useful to make accurate timing measurements.
While not shown here, experiments such as measuring the effects of supply voltage and temperature on the
timing sampler should also be performed. In addition, Puri [Puri 77] has used a similar form of timing sampler
for measuring the minimum pulse width required to disturb a latch.
5 Pad Considerations 
One objective of any on-chip testing scheme is to minimize the number of pads devoted exclusively to this
purpose. The latch scheme seems at first glance to require two pads for each signal, one for L and one for
R. However, a single latching signal L could be used to control several latches; this allows us to sample n
²Forgetting to wire the R pad to ground.
Figure 6: Circuit diagram and layout of the latch sampler. Transistor geometries are given as L/W in units of
lambda.
[Circuit annotations: 6/2:2/6, 4/2:2/4, 2/2, 16/2:2/2; pad driver; 12/12 diffusion-substrate cap; signals L, T, R;
Ground.]
signals using n+1 pads: n pads for the results R and a single pad for L. This number can be reduced still further
if the chip being tested has incorporated some general provision for scanning out internal state (e.g. shift-register
latch). In this case, only a single pad for L is required, because the contents of the various static latches
controlled by L can be determined by the scanning mechanism. Moreover, if properly designed, the clock of the
scan-out path itself can be used as the L signal.
The C-element scheme seems to require a single pad for each signal being tested. However, it too can take
advantage of provisions for scanning out internal state: each C-element output can be transmitted off-chip by the
scanning mechanism.
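The pad counts discussed in this section can be tabulated with a short helper. This is a bookkeeping sketch only: the function names are hypothetical, the counts cover pads devoted exclusively to timing tests (pads of an existing scan path are not counted), and the C-element case with a scan path is left out because the text does not state a count for it.

```python
def pads_latch(n, scan_path=False, scan_clock_as_l=False):
    """Extra pads needed to sample n signals with latch samplers."""
    if scan_path:
        # Results leave the chip through the scan path; only L may need a pad.
        return 0 if scan_clock_as_l else 1
    # One shared L pad plus one R pad per sampled signal.
    return n + 1


def pads_c_element(n):
    """Extra pads for n C-element samplers, each using a shared LR pad
    for both the timing input and the result."""
    return n


for n in (1, 4, 16):
    print(n, pads_latch(n), pads_latch(n, scan_path=True), pads_c_element(n))
```

For a single signal the C-element scheme saves a pad, while for many signals the shared L line makes the latch scheme cost only one pad more.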
6 Conclusions 
On-chip timing samplers can provide the designer with an effective tool for measuring performance without
using sophisticated test equipment. By incorporating these samplers into a scan-out path it is possible for the 
designer to provide both access to internal state and precise on-chip timing measurements without using many 
additional pads . 
Figure 7: Circuit diagram and layout of the C-element sampler. The diffusion resistor is not shown in the
layout.
[Circuit annotations: L/W = 2/4 unless otherwise marked; marked devices 8/2 and 455/2; VDD.]
Figure 8: Jig used to test latch and C-element samplers.
[Schematic labels: square wave gen.; pulse gen.; 74S04 inverters; 50 ohms; T.]
Figure 9: Results of timing experiments. Lower case letters indicate the edge of T that generates the
corresponding output on C or R. Numbers and dots are used as reference points for the measurements. The
horizontal axis is not to scale. Rise and fall times for T, L and LR are 4 ns. (A) Latch sampler used to detect
falling edge of T. (B) Latch sampler used to detect rising edge of T. (C) C-element sampler used to detect
falling edge of T. (D) C-element used to detect rising edge of T.
Latch used to detect falling edge of T (Figure 9A):
    T 2 to 1 to produce output a:       0 ns (min)
    T 1 to 3 to produce output b:       2 ns (min)
Latch used to detect rising edge of T (Figure 9B):
    T 5 to 4 to produce output c:       0 ns (min)
    T 4 to 6 to produce output d:       2 ns (min)
C-element used to detect falling edge of T (Figure 9C):
    T 8 to 7 to produce output e:      32 ns (min)
    T 9 to 7 to produce output f:      30 ns (max)
C-element used to detect rising edge of T (Figure 9D):
    T 11 to 10 to produce output g:    12 ns (min)
    T 12 to 10 to produce output h:    10 ns (max)
Table I: Timing measurements. All times are +/- 1 ns.
References
[Eichelberger 78] Eichelberger, E.B., and Williams, T.W.
A logic design structure for LSI testability.
Journal of Design Automation and Fault Tolerant Computing 2(2): 165-178, May, 1978.
[Frohwerk 77] Frohwerk, R.A.
Signature analysis: a new digital field service method.
Hewlett-Packard Journal :2-8, May, 1977.
[Koenemann 79] Koenemann, B., Mucha, J., and Zwiehoff, G.
Built-in logic block observation techniques.
In Test Conference Proceedings, pages 37-41. October, 1979.
[Puri 77] Puri, P.
The functional tester: an aid to the designer.
In Test Conference Proceedings, pages 64-80. 1977.
[Seitz 80] Seitz, C.
System Timing.
In Introduction to VLSI Systems, chapter 7, pages 218-254. Addison-Wesley, 1980.
THE ROLE OF TEST CHIPS IN COORDINATING LOGIC 
AND CIRCUIT DESIGN AND LAYOUT AIDS FOR VLSI 
Martin G. Buehler and Loren W. Linholm
National Bureau of Standards
Washington, DC 20234
ABSTRACT 
This paper emphasizes the need for multipurpose test chips and 
comprehensive procedures for use in supplying accurate input data 
to both logic and circuit simulators and chip layout aids . It is 
shown that the location of test structures within test chips is 
critical in obtaining representative data, because geometrical 
distortions introduced during the photomasking process can lead to 
significant intrachip parameter variations . In order to transfer 
test chip designs quickly , accurately, and economically, a commonly 
accepted portable chip layout notation and commonly accepted 
parametric tester language are needed. In order to measure test 
chips more accurately and more rapidly , parametric testers with 
improved architecture need to be developed in conjunction with
innovative test structures with on-chip signal conditioning. 
1. INTRODUCTION
The increasing complexity of VLSI circuits is forcing the development of 
coordinated aids for circuit design and layout, in particular aids that can 
be used to predict the performance of circuits prior to their fabrication. 
Key elements in the development of design and layout aids are microelectronic 
test chips, parametric testers, and data analysis procedures. Since test 
chips provide the input data for the logic and circuit simulators and the 
chip layout aids, it is essential that test chips be developed in concert 
with the simulators and layout aids . 
Microelectronic test chips* have been used by the integrated circuit industry 
for many years. Typically, the test chips are substituted at several 
selected sites for integrated circuits on production wafers. The points in 
an integrated circuit production sequence where test chips can provide 
valuable information are illustrated in figure 1. This sequence shows the 
logic design sequence on the left and the circuit layout and fabrication 
Contribution of the National Bureau of Standards, not subject to copyright. 
*In the authors' view, the term test chip is preferred over the term test 
pattern which can be confused with a sequence of electrical pulses used to 
test a circuit. The preferred term used to describe this latter effect is 
test vector. 
• Design Rules 
• Fault Parameters • Device Parameters 
• Process Parameters 
• Equipment Evaluation 
• Reliability Analysis 
Figure 1. A simplified integrated circuit production sequence illustrating
the points where test chips provide needed information.
sequence on the right. Before fabrication, cross checks are made to ensure
that the physical design correctly implements the logical specification [1].
Traditionally, test chips have been used to supply process and device 
parameters and subcircuit data. In recent years, test chips have been used 
to evaluate such processing equipment as photomask aligners [2] and ion 
implanters. Also, reliability information has been obtained from test chips 
containing MOS capacitors which are analyzed for mobile oxide charge 
contamination and interface state densities [3]. In this paper we wish to
highlight the role of the test chip as identified in figure 1 in circuit 
simulation, logic simulation, and chip layout. 
2. CIRCUIT SIMULATION 
The simulation of the dc and timing characteristics of a circuit is essential
in identifying circuit design flaws prior to the fabrication of complex VLSI 
circuits. In this section, the circuit simulator requirements are discussed 
in terms of test structures for (a) acquiring device parametric data, (b) 
verifying the dynamic circuit performance capability of the fabrication 
process, (c) measuring intrachip parameter variations, and (d) evaluating the 
onset of small geometry effects. 
Simulators such as SPICE [4] or MSINC [5] require device models, device
parametric data usually collected from test chips, and geometrical layout 
information for the circuit being simulated. The device engineer "adjusts" 
various parameters in the simulator code so that the device models faithfully 
replicate the observed device performance. From discussions with industrial 
scientists, we found that the dc response and timing simulation of MOS
digital logic circuits is judged as adequate for processes where both the 
design rules and fabrication procedures are stable. In such a stable 
environment, the devices can be modeled with the use of a mixture of 
empirical intuition and physical insight. 
The predictive capability of circuit simulators is greatly reduced when the 
design rules change (e.g., devices become smaller) or when the process is 
changed significantly. For these circumstances more sophisticated device 
phenomena (e.g., short-channel effects) must be taken into account. In the 
long term, industry will need circuit simulation codes and data collection 
methodologies (based on test chips) that are easy to use and upgrade and can 
predict circuit performance as design rules and processes change. 
A commonly accepted test circuit must be developed which can verify that the 
fabrication process is capable of producing circuits with correct dynamic (or 
timing) properties. Traditionally, ring counters which are characterized by 
their frequency of oscillation have been used for this purpose. However, the 
oscillation frequency often cannot be adequately simulated (especially in
MOS circuits) because many interdependent factors contribute to its 
magnitude. A candidate circuit should be easy to measure in chip form and 
its performance should be simple to evaluate. 
As feature sizes become smaller, the ability to fabricate circuits with
uniform features will become more difficult. As a result, the percentage 
variation in device parameters will increase. This variation will make 
circuit simulation more difficult. It also poses a problem in the placement 
of test devices on a test chip, for a device located in one part of the chip
can have significantly different characteristics from a supposedly identical
device in another part of the chip. Such effects were observed by Ham [6]
for the threshold of MOSFETs fabricated in silicon on sapphire. The 
variations can be either intrachip or interchip in nature. 
To illustrate the importance of such parameter variations, we have measured
the linewidth variations with the use of the cross-bridge test structure [7]
shown in figure 2.* Using the test structure shown in figure 2, a single
mask test chip was prepared on a 10X master reticle. The test chip was
composed of a 12 by 20 array of identical test structures with a design
linewidth of 6 µm. The final photomask was prepared from the master reticle
by a step-and-repeat process. The photomask was used in conjunction with a
contact printer and a photolithographic process to etch the test chip pattern
into an 800-nm thick aluminum layer. The aluminum had been electron-gun
evaporated and deposited on an oxide film thermally grown on a 2-in.
(50.8-mm) diameter silicon wafer.
The linewidth variations shown in figure 3 are from a single row of test
structures measured across the diameter of the wafer. The plot shown in
figure 3 indicates that the linewidth varies periodically with the chip
dimension of 250 mil (6.35 mm). This periodic or intrachip variation is
superimposed on a nonperiodic or interchip linewidth variation due to those
factors which affect the contact between the photomask and the
photoresist-covered wafer. The periodic or intrachip linewidth variation is
due to aberrations in the optics of the image repeater used to step and
repeat the 10X reticle. Similar results have been reported for a 15-µm
line [8].
The absolute magnitude of the variations shown in figure 3 is independent of 
the magnitude of the linewidth. Thus, the impact of such linewidth 
variations on device characteristics is quite dramatic especially for small 
devices. The linewidth variation for the lines shown in figure 3 is about 13
percent. For 1-µm lines, the variation would be 70 percent. These results
illustrate the importance of the location within the test chip where
"representative" device parameters are measured. Ultimately, the accuracy of
the data entered into circuit simulators will be limited by both intrachip
and interchip parameter variations.
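The scaling argument can be checked with a few lines of arithmetic. The
sketch below assumes an absolute linewidth spread of 0.7 µm, a value inferred
here from the 70 percent figure quoted for 1-µm lines, and a measured nominal
linewidth of roughly 5.3 µm, read approximately from the scale of figure 3.
Neither number is stated explicitly in the text, and the function name is
illustrative only.

```python
def percent_variation(absolute_spread_um, linewidth_um):
    """Linewidth spread expressed as a percentage of the nominal linewidth."""
    return 100.0 * absolute_spread_um / linewidth_um

# Assumed values (not given explicitly in the paper).
SPREAD_UM = 0.7            # absolute spread, taken as independent of linewidth
NOMINAL_WIDTHS_UM = (5.3, 1.0)

for width in NOMINAL_WIDTHS_UM:
    print(f"{width:4.1f} um line: {percent_variation(SPREAD_UM, width):5.1f} percent")
```

Because the spread is held fixed, the percentage grows in inverse proportion
to the linewidth, which is the point made in the text.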
The final item mentioned in this section concerns the development of test 
structures and evaluation techniques which tell when the models used in the 
circuit simulators are no longer valid due to the onset of small geometry 
effects. Of critical importance for future VLSI components are the 
*The linewidth is determined after measuring the sheet resistance $R_s$, which
is determined from $R_s = (\pi/\ln 2)(\Delta V/I)$, where $I$ is the current
forced between $I_1$ and $I_2$ and $\Delta V$ is the voltage difference between
$V_1$ and $V_2$. The linewidth $W$ is given by $W = R_s L I^*/\Delta V^*$, where
$L$ is the distance between the voltage taps shown in the figure, $I^*$ is the
current forced between $I_1^*$ and $I_2^*$, and $\Delta V^*$ is the voltage
difference measured between $V_1^*$ and $V_2^*$. A more detailed account of the
measurement is given in reference [7].
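The two footnote expressions can be sketched as a short calculation. The
function names and the numerical readings below are hypothetical and serve
only to show the order of the two steps: the sheet resistance is obtained
first from the cross (van der Pauw) portion, and the linewidth then follows
from the bridge portion.

```python
import math

def sheet_resistance(delta_v, current):
    """R_s = (pi / ln 2) * (dV / I), in ohms per square."""
    return (math.pi / math.log(2.0)) * (delta_v / current)

def linewidth(r_s, tap_spacing, bridge_current, bridge_delta_v):
    """W = R_s * L * I* / dV*; the result has the same length unit as L."""
    return r_s * tap_spacing * bridge_current / bridge_delta_v

# Hypothetical readings, chosen only for illustration.
r_s = sheet_resistance(delta_v=8.8e-5, current=1.0e-2)        # cross portion
w_um = linewidth(r_s, tap_spacing=100.0,                       # L in micrometers
                 bridge_current=1.0e-2, bridge_delta_v=8.0e-3)
print(f"R_s = {r_s:.4f} ohms/sq, W = {w_um:.2f} um")
```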
Figure 2. The cross-bridge test structure used to measure
the sheet resistance and linewidth of a conducting layer.
(Terminals labeled in the figure: I1, I2, V1, V2 and I1*, I2*, V1*, V2*;
the bridge has width W.)
Figure 3. The variation of the linewidth as determined from
an array of identical cross-bridge test structures as shown
in figure 2. The period of the linewidth variation corresponds
to the width of the test chip, 250 mil (6.35 mm). This
periodic linewidth is due to aberrations in the optics of
the image repeater used to step and repeat the 10X reticle.
(Horizontal axis: distance across wafer, mil.)
evaluation of MOSFET short-channel effects and capacitor fringing field 
effects. 
3. LOGIC SIMULATION
The logic simulation of the gate level performance of a circuit is essential 
in identifying logic design flaws prior to the fabrication of complex VLSI 
circuits. As shown in figure 1, the logic simulator can also be used to
develop the test vectors used to test the circuit after fabrication. An
accurate simulation depends on having proper fault models, correctly
identified faults and their density, and detailed layout information.
An example of a test structure which can provide detailed information for
fault simulation is shown in figure 4. This figure shows a MOSFET array
which is composed of 100 devices where the gate is connected to the drain.
This structure appears on test chip NBS-16 [9] which includes two p-channel
and two n-channel MOSFET arrays. On a 3-in. (76.2-mm) diameter wafer, 380
arrays containing 38,000 MOSFETs were tested. The results shown in table 1
are from 26,760 MOSFETs located in the interior portion of the wafer. Both
the fault location and the relative density of different fault types, both
clustered and nonclustered, can be determined from the electrical data. A
fault is considered to be clustered when two or more adjacent MOSFETs
containing the same fault type are detected in an array. As seen in the
table, the most frequent fault was #8, a combination of excessive leakage
current and low breakdown voltage.
A major limitation of present logic simulators is their inability to properly
model faults other than classical faults. The latter comprise those faults
where signal lines are either shorted to the power supply (stuck-at-one),
shorted to ground (stuck-at-zero), or simply open circuited. Sievers [10]
has recently shown how the classical faults and open and short faults can be
modeled for both nMOS and CMOS circuits. Any of the first five faults listed
in table 1 could lead to a classical, open, or short fault. But the total of
these five faults is only a small fraction of the total of the nonclassical
faults, #6 through #13. Future logic simulators must have the ability to
model such nonclassical faults in order to enable more realistic circuit
fault simulations.
When using a logic simulator, it is essential that there be a one-to-one
correspondence between the logic representation and the physical layout [11].
For example, the logic diagram shown in the upper part of figure 5 bears
little resemblance to the accompanying circuit schematic. To illustrate, the
wire D joining the two gates is not uniquely found in the circuit schematic.
To perform a fault simulation where a classical fault (stuck-at-one or
stuck-at-zero) is introduced on this wire is a meaningless exercise.
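For reference, the logic-level exercise in question can be sketched as
follows. The sketch assumes the two-gate reading of the logic diagram of
figure 5, with D = A AND B feeding an OR gate so that F = A·B + C; the
function names are hypothetical. It lists the input vectors that would
expose a stuck-at fault on wire D in the logic model, which says nothing
about the physical circuit if D has no unique counterpart there.

```python
from itertools import product

def circuit(a, b, c, d_stuck=None):
    """Gate-level model F = (A AND B) OR C; d_stuck forces wire D to 0 or 1."""
    d = a & b
    if d_stuck is not None:
        d = d_stuck
    return d | c

def detecting_vectors(stuck_value):
    """Input vectors (A, B, C) whose output differs from the fault-free output."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, d_stuck=stuck_value)]

print("D stuck-at-zero detected by:", detecting_vectors(0))
print("D stuck-at-one detected by: ", detecting_vectors(1))
```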
4. CHIP LAYOUT 
Chip layout is a very important part of a VLSI design system because the 
layout influences all other parts of the system. For example, the layout
data set can be used to derive the geometrical data needed by the circuit and 
logic simulators. In addition, the choice of layout notation can restrict 
Figure 4. MOSFET array test structure consisting of 100
individually addressable transistors shown above. A
schematic diagram is shown below.
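Table 1. MOSFET Array Test Results.

Fault     Fault                  Number of
Number    (VT  IL  VB)           Nonclustered Faults
   1      Poly void/break             4
   2      Epi void                    0
   3      Metal void/break            0
   4      Metal bridge                1
   5      Gate short                  4
   6      L                           5
   7      H                          25
   8      H L                        62
   9      L                           1
  10      L H                         0
  11      L H L                       0
  12      H                           2
  13      H H L                       0

For faults #6 through #13 the H and L entries fall under the VT, IL, and VB
columns; for #8 they are IL high and VB low.
Number of Clustered Faults (entries as they survive, row positions lost):
0, 1, 0, 0, 0, 0, 0, 0, 2

VT = threshold voltage; IL = leakage current; VB = breakdown voltage.
H = parameter too high; L = parameter too low.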
Figure 5. Lack of correspondence between the logic
diagram and the circuit schematic in that wire D does not
appear uniquely in the circuit diagram. (Logic function shown:
F = A·B + C, with inputs A, B, C and output F.)
the portability of the design. In the authors' view, the chip layout 
methodology must allow for design portability so that circuits can be 
fabricated in facilities with different design rules. The idea is to supply 
different fabrication facilities with a chip layout description that can be 
adjusted to meet the design rules for that particular facility. This idea 
requires that the manufacturer adjust the chip layout pattern as it appears 
on the photomask to allow for changes in feature sizes during fabrication so 
that features appear with specific dimensions on the finished circuit. One 
layout representation that appears to be a candidate for such portability is 
symbolic notation [12]. 
An additional need in chip layout is to develop test structures that can be 
used to establish layout design rules. Structures useful for this purpose 
are known as random fault test structures where a feature (e.g.,
metal-to-silicon contact) is repeated many times within an array [13]. Arrays with
different numbers of features are fabricated and tested for opens or shorts. 
The number of good features per fault characterizes the process. The test 
structures to be developed must be designed so that the intended fault is 
measured. Good design practice dictates that each probe pad used to contact 
an array accommodate two probes (Kelvin contacts) so that one can be assured 
that the probes are making contact [8]. In addition, the arrays must be
designed so that interferences between other parts of the array are
eliminated [14].
5. TEST CHIP
Despite the long history of test chip usage by the integrated circuit 
industry, there has been relatively little emphasis placed on the development 
of such chips. To make better use of test chips in the future, one must 
develop a coordinated metrological system including advanced parametric
testers, commonly accepted parametric tester language, improved microelectronic
test structures, and efficient information-handling procedures. 
The first multifunction parametric tester specifically designed to measure 
test chips was commercially available in 1978 and now there are at least 
three systems which can be purchased [15]. The availability of these systems 
greatly enhances the use of test chips in production wafer-probe 
environments. The commercially available systems typically have a system 
architecture that is based on the mechanical switch matrix as seen in figure 
6. The accuracy and precision of such systems is limited by noise introduced 
by the switch matrix and long leads. In addition, measurement times can be 
long when measuring low-level quantities. The authors feel that these 
limitations can be overcome by changing the system architecture to the pin-
electronics approach [16] as seen in figure 6. Here the stimulus/measure
(S/M) devices are physically located close to the wafer probes. In addition,
the number of wafer probes can be profitably increased (20 to 40 in the 
example) because test structures addressed by a wafer-probe array can be 
measured simultaneously. Overall test times for the pin-electronics approach 
should be significantly reduced [17]. 
A commonly accepted parametric tester language is needed to facilitate the
rapid, accurate, and economical transfer of test chip measurements. As is
Figure 6. System architecture of multifunction parametric
test equipment (existing and proposed [17]). Switch matrix (existing):
a stimulus/measure unit connected through a switch matrix to wafer probes
1 to 20. Pin electronics (proposed): a computer and multiplexer serving
stimulus/measure units S/M 1 to S/M 40 located at the wafer.
Figure 7. Schematic diagram of an integrated gated-diode
electrometer. The boxes represent probe pads. The off-chip
components are the two dc supplies and the resistor.
well known, software development is very expensive. There is an IEEE
standard test language named ATLAS (Abbreviated Test Language for All 
Systems) [18] that was originally developed by the commercial airlines 
industry for testing avionics packages. Recently, a software package was 
developed which translates ATLAS into a test language used in the testing of 
digital printed circuit boards [19]. Currently, there is no comparable 
standard test language for parametric chip testers. 
When developing new test structures, one must decide if there is a
measurement advantage to be gained by incorporating a portion of the tester
into the test structure. Such an advantage has been found with the
integrated gated-diode electrometer shown schematically in figure 7.
Low-level diode leakage currents can be determined by measuring the time
decay of the output voltage $V_0$ resulting from the momentary application of
reverse bias voltage $V_R$ to the gated diode GD through the MOSFET switch Q1.
The internal gated-diode current $I$ is determined from the expression [20]:

$I = (C/B)(-dV_0/dt)$,

where $C$ is the diode capacitance and $B$ is the incremental gain of MOSFET
Q2. The diode capacitance can be determined from $C = \epsilon A/W$, where
$\epsilon$ is the dielectric constant for silicon, $A$ is the area of the diode,
and $W$ is the width of the depletion region. For a one-sided step junction,
$W = [2\epsilon(V_i + V_b)/(qN)]^{1/2}$, where $V_i$ is the diode voltage, $V_b$
is the built-in voltage, $q$ is the electronic charge, and $N$ is the dopant
density. The dc gain of MOSFET Q2 is determined from
$B = \Delta V_0/\Delta V_i$, which is evaluated by closing the MOSFET switch Q1
($V_i = V_R$) and measuring $V_0$ at two different values of $V_R$. The
expression above assumes that the capacitance of the gated diode $C$ is large
compared to the gate-source capacitance of MOSFET Q2. The test structure
design shown in figure 8 obeys this restriction. The off-chip output resistor
$R_L$ shown in figure 7 is replaced by an on-chip load MOSFET Q3 with its gate
connected to the $V_0$ point so that it operates in the saturated mode.
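The chain of expressions for the electrometer can be sketched numerically.
Everything below beyond the formulas themselves is an assumption made for
illustration: the function names, the use of cgs-style units (cm, F/cm), the
silicon permittivity taken as 11.7 times the vacuum value, and all of the
sample readings.

```python
import math

Q_E = 1.602e-19              # electronic charge, C
EPS_SI = 11.7 * 8.854e-14    # assumed permittivity of silicon, F/cm

def depletion_width_cm(v_i, v_b, n_dopant_cm3):
    """W = [2*eps*(Vi + Vb)/(q*N)]**0.5 for a one-sided step junction."""
    return math.sqrt(2.0 * EPS_SI * (v_i + v_b) / (Q_E * n_dopant_cm3))

def diode_capacitance_f(area_cm2, v_i, v_b, n_dopant_cm3):
    """C = eps*A/W, in farads."""
    return EPS_SI * area_cm2 / depletion_width_cm(v_i, v_b, n_dopant_cm3)

def dc_gain(v0_first, v0_second, vr_first, vr_second):
    """B = dV0/dVi from two settings of VR with switch Q1 closed (Vi = VR)."""
    return (v0_second - v0_first) / (vr_second - vr_first)

def leakage_current_a(capacitance_f, gain, dv0_dt):
    """I = (C/B)*(-dV0/dt); dv0_dt is the measured slope of V0, in V/s."""
    return (capacitance_f / gain) * (-dv0_dt)

# Hypothetical device and readings.
c = diode_capacitance_f(area_cm2=1.0e-4, v_i=5.0, v_b=0.7, n_dopant_cm3=1.0e15)
b = dc_gain(v0_first=2.0, v0_second=2.8, vr_first=4.0, vr_second=5.0)
i_leak = leakage_current_a(c, b, dv0_dt=-1.0e-3)
print(f"C = {c:.3e} F, B = {b:.2f}, I = {i_leak:.3e} A")
```

A decaying output voltage gives a negative slope, so the sign in the
expression yields a positive leakage current.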
Another integrated test structure has recently been reported by Iwai and
Kohyama [21]. This structure is shown schematically in figure 9. Here, the
unknown capacitance is given by

$C_x = C_R [B (v_i/v_{oa}) - 1]$,

where $C_R$ is a known reference capacitor, $B$ is the ac gain of the output
MOSFET, $v_i$ is the rms value of the ac input signal, and $v_{oa}$ is the
output at $v_0$ for $v_i$ connected to point "a." To measure $C_x$, $v_i$ is
applied to point "a" and the MOSFET switch Q1 is opened by properly biasing
its gate. The ac gain $B$ is determined from $B = v_{ob}/v_i$, where $v_{ob}$ is
the output at $v_0$ for $v_i$ connected to point "b." In this measurement the
MOSFET switch Q1 is closed by properly biasing its gate. This structure is
useful in determining the value of the small capacitances typical of VLSI
device geometries by connecting many capacitors into an array.
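The two-step measurement can be sketched as below. The function names and
the sample voltages are hypothetical; only the two expressions for $B$ and
$C_x$ come from the text.

```python
def ac_gain(v_ob, v_i):
    """B = v_ob / v_i, measured with the input applied at point "b"."""
    return v_ob / v_i

def unknown_capacitance(c_r, v_i, v_oa, v_ob):
    """C_x = C_R * [B * (v_i / v_oa) - 1], with B taken from the "b" reading."""
    b = ac_gain(v_ob, v_i)
    return c_r * (b * v_i / v_oa - 1.0)

# Hypothetical readings: 1 pF reference, 100 mV rms input.
c_x = unknown_capacitance(c_r=1.0e-12, v_i=0.100, v_oa=0.040, v_ob=0.080)
print(f"C_x = {c_x:.3e} F")
```

Substituting $B = v_{ob}/v_i$ shows that the result depends only on the ratio
of the two output readings, $C_x = C_R (v_{ob}/v_{oa} - 1)$, so the input
amplitude cancels.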
Figure 8. The integrated gated-diode electrometer test
structure shown for a polysilicon-gate technology.
Figure 9. Schematic diagram of an integrated precision
capacitance meter. The off-chip components are $v_i$, $R_L$,
and the lock-in amplifier. (Layers labeled in the figure: Poly #1,
Poly #2, and substrate.)
Figure 10. Wafer maps of the metal-to-n+ contact resistance
(a) and the n+ sheet resistance (b). The high correlation
between these two parameters leads to the conclusion
that excessively high contact resistance was due
to the lack of adequate control of a phosphorus implant
process step. (Map scales: (a) resistance, ohms, 50.1 to 314.4;
(b) resistance, ohms/sq., 76.1 to 114.5.)
Once the information has been collected from test chips, the data must be
quickly analyzed to be of benefit. Wafer maps of a spectrum of parameters
are crucial in identifying problems [22]. For example, a contact resistance
problem was identified and correlated to variations in sheet resistance as
shown in figure 10 [9]. The relationship was identified by observing that
these two parameters, out of a host of other parameters, were highly
correlated [23]. The manufacturer of this wafer would not have identified
this problem because his procedures called for only two test chips on each
product wafer. Other information-handling techniques have been developed
over the years [24]. The industry needs to fully utilize existing techniques
and to search for more useful and efficient techniques.
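One simple way to screen pairs of wafer-map parameters for the kind of
relationship described above is a correlation coefficient computed site by
site. The sketch below uses the ordinary Pearson coefficient; the paper does
not say which statistic was used, and the site values are invented for
illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical per-site readings from the same test-chip locations.
contact_resistance_ohm = [60.0, 95.0, 150.0, 210.0, 300.0]
sheet_resistance_ohm_sq = [78.0, 84.0, 92.0, 101.0, 113.0]

r = pearson(contact_resistance_ohm, sheet_resistance_ohm_sq)
print(f"correlation coefficient = {r:.3f}")
```

A coefficient near 1 across many sites points to a common cause, which can
only be seen when enough test chips are measured on each wafer.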
6. CONCLUSION
Test chips have a role that goes beyond their traditional role involving
process or subcircuit evaluation. The additional role involves supplying the
data for the logic and circuit simulators and in setting the design rules for
chip layout methods. The development of design aids and test chips must be
coordinated to provide a well-integrated design system. In addition, new
parametric testers are needed that can quickly supply data of increased
accuracy. The development of such testers must be coordinated with test
structure development so as to take advantage of on-chip signal processing.
In addition, effective data-handling techniques (e.g., wafer mapping and data
management) need further development so as to rapidly reduce data to a useful
form. In order to transfer test chip designs quickly, accurately, and
economically, a commonly accepted portable chip layout technique and a
commonly accepted parametric tester language are needed.
7. ACKNOWLEDGMENTS
The authors are indebted to Charles Wilson, Tom Leedy, and others for their
insights into the needs for parametric data for circuit simulators. The
authors appreciate Gary Carver's contribution of the integrated gated-diode
electrometer design as shown in figure 8 and the critical reviews of Barry
Bell, Murray Bullis, and Christoph Witzgall.
REFERENCES 
1. E. I. Muehldorf, Test Pattern Generation as a Part of the Total Design
Process, 1978 Semiconductor Test Conference, pp. 4-7 (1978).
2. T. F. Hasan, D. S. Perloff, and C. L. Mallory, Test Vehicles for the
Measurement and Analysis of VLSI Lithographic and Etching Parameters,
Semiconductor Silicon 1981 (to be published).
3. K. H. Zaininger and F. P. Heiman, The C-V Technique as an Analytical
Tool, Solid State Technology 13, 49-56 (May 1970).
4. L. W. Nagel, SPICE2: A Computer Program to Simulate Semiconductor
Circuits, Memorandum No. ERL-M510, Electronics Research Laboratory,
University of California, Berkeley, California (May 9, 1975).
5. T. K. Young and R. W. Dutton, Mini MSINC - A Minicomputer Simulator for
MOS Circuits with Modular Built-In Model, J. Solid-State Circuits
SC-11, 730-732 (1976).
6. W. E. Ham, Intrachip and Spatial Parametric Integrity: An Important
Part of IC Process Characterization, 1977 IEDM, 406-409 (1977).
7. M. G. Buehler, S. D. Grant, and W. R. Thurber, Bridge and van der Pauw
Sheet Resistors for Characterizing the Linewidth of Conducting Layers,
J. Electrochem. Soc. 125, 650-654 (1978).
8. M. G. Buehler, The Use of Electrical Test Structure Arrays for
Integrated Circuit Process Evaluation, J. Electrochem. Soc. 127,
2284-2290 (1980).
9. L. W. Linholm, Semiconductor Measurement Technology: The Design,
Testing, and Analysis of a Comprehensive Test Pattern for Measuring
CMOS/SOS Process Performance and Control, NBS Spec. Publ. 400-66 (to be
published).
10. M. W. Sievers, Approaching Fault Tolerant VLSI, Government
Microcircuit Applications Conference, Technical Digest, 256-259
(1980).
11. H. S. DeMan, Computer-Aided Design for Integrated Circuits: Trying to
Bridge the Gap, J. Solid-State Circuits SC-14, 613-621 (1979).
12. R. P. Larsen, Symbolic Layout System Speeds Mask Design for ICs,
Electronics 51, 125-128 (July 20, 1978).
13. A. C. Ipri, Impact of Design Rule Reduction on Size, Yield, and Cost of
Integrated Circuits, Solid State Technology 22, 85-89 (February
1979).
14. M. A. Mitchell, private communication.
15. C. Chrones, Parametric Test Systems for Wafer Processing,
Semiconductor International 3, 113-122 (October 1980).
16. P. C. Jackson, Optimization of ATE Switching/Multiplexing, 1980 IEEE
AUTOTESTCON, 242-246 (November 1980).
17. M. G. Buehler and D. S. Perloff, Microelectronic Test Chips and
Associated Parametric Testers: Present and Future, Semiconductor
Silicon 1981 (to be published).
18. IEEE Guide to the Use of ATLAS (J. Wiley, New York, 1980).
19. T. Lapetina and D. Schneider, An ATLAS Implementation for a Commercial
Tester, 1980 IEEE AUTOTESTCON, 5-7 (November 1980).
20. G. P. Carver and M. G. Buehler, An Analytical Expression for the
Evaluation of Leakage Currents in Integrated Gated-Diode Electrometers,
IEEE Trans. Electron Devices ED-27, 2245-2252 (1980).
21. H. Iwai and S. Kohyama, Capacitance Measurement Technique in High
Density MOS Structures, 1980 IEDM, 235-238 (December 1980).
22. J. M. Charles and M. W. Lantz, Applications of High Speed Data
Acquisition for Semiconductor Device Yield Analysis, IEEE Trans.
Electron Devices ED-27, 2299-2303 (1980).
23. L. W. Linholm, R. L. Mattis, and R. C. Frisch, Characterizing and
Analyzing Critical Integrated Circuit Process Parameters,
Semiconductor Silicon 1981 (to be published).
24. T. Cave and D. Smith, Data Reduction and Display in Automated Test
Systems, Computer Design 17, 161-172 (May 1978).
INNOVATIVE LSI DESIGNS SESSION 
Chairperson: GERALD J. SUSSMAN
Associate Professor of Electrical Engineering
and Computer Science
Massachusetts Institute of Technology 
Bit-Serial Inner Product Processors in VLSI
Misha R. Buric
Bell Laboratories
Murray Hill, New Jersey 07974
Carver A. Mead
California Institute of Technology
Pasadena, California 91125
1. Introduction
Many problems in signal and image processing, pattern recognition, and feedback systems
involve models with vector variables. Besides vector addition and multiplication by a scalar, an
inner product of vectors is a basic arithmetic operation in these models. It is computationally
most demanding, so that there is considerable interest in finding ways to speed up its
implementation. Array configurations of simple processors for performing vector and matrix
operations have been extensively reported. A number of ideas can be found in [1], [2], [7], [8],
and their references.
In this paper we describe a bit-serial pipelined implementation of an inner product
processor, and related interconnections of a number of such processors on a single chip. We argue
that bit-serial computational models are particularly suited for VLSI, because of relatively
inexpensive communication links and arithmetic processing elements, in terms of the area occupied
on silicon. Sixteen inner product processors, described here, may be easily placed on a single
40-pin chip in today's NMOS technology with a 2 micron lambda. Similar arguments for
bit-serial arithmetic were used in [3], in a description of a design of a general purpose massively
parallel processor.
Generally, all multiprocessor schemes can be divided into two classes with respect to the 
interconnection patterns among processing elements. Static communication links characterize 
those array configurations in which a fixed algorithm is executed repeatedly and synchronously 
with the input data flow. These structures are especially useful if the input information is being 
continuously provided by sampling some real world variables, and the purpose of the processing 
is to provide a compressed version of the data, or to transform it into another representation. 
Many examples can be found in speech and image processing. This structure and its variations
are examined in this paper. Another, more general class of multiprocessor schemes involves
flexible communication interconnections among the processors through switched networks [7].
2. Basic Processors and their Interconnections 
An inner product of two n-dimensional vectors, x and y, is defined by

$z = x^T y = \sum_{i=1}^{n} x_i y_i$    (1)

where

$x^T = [\, x_1 \; x_2 \; \ldots \; x_n \,]$ and $y^T = [\, y_1 \; y_2 \; \ldots \; y_n \,]$.
We use a convention that all vectors are column vectors, and that T denotes a transpose opera-
tion. A transposed column vector is a row vector. It is assumed here that all the vector ele-
ments are integers. The equation (1) can be rewritten in an iterative form as follows: 
$z_i = x_i y_i + z_{i-1}, \quad i = 1, 2, \ldots, n$    (2)
$z_0 = 0, \quad z = z_n$
There are a number of ways to implement this equation, ranging from a single multiplier-
accumulator combination, to an array of n processors. 
In the first case, which may be called an iteration in time, the 2n operands are fetched two
at a time, they are multiplied together, and added to the previously accumulated value. This
requires n steps, assuming that the pipeline registers are used to enable overlapping of the
operations.
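Iteration in time is the familiar multiply-accumulate loop. The following
minimal sketch evaluates equation (2) one step at a time; the function name
is illustrative, and integer elements are assumed as in the text.

```python
def inner_product_iterative(x, y):
    """Evaluate z_i = x_i*y_i + z_{i-1} for i = 1..n, starting from z_0 = 0."""
    z = 0                          # z_0
    for x_i, y_i in zip(x, y):     # one multiply-accumulate step per element
        z = x_i * y_i + z
    return z                       # z = z_n

print(inner_product_iterative([1, 2, 3], [4, 5, 6]))
```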
On the other hand, an iteration in space is characterized by a pipelined array configuration
of n processors that operates on 2n operands simultaneously, and provides a new result every
cycle. The i-th processor is assigned to the i-th elements of the input vectors, and to the (i-1)-st
partial sum, and it produces the i-th partial sum. Due to pipelining, a processor may receive
new operands every cycle.
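The behavior of such a pipeline can be sketched at the word level. The model
below is an assumption made for illustration, not the authors' circuit: each
of the n stages holds one partial sum in a register, the k-th vector pair
reaches stage i at cycle k + i, and a stage reads the partial sum its left
neighbor produced on the previous cycle.

```python
def pipelined_inner_products(pairs):
    """Word-level model of iteration in space; pairs is a list of (x, y) vectors."""
    if not pairs:
        return []
    n = len(pairs[0][0])
    regs = [None] * n                  # output register of each processor
    results = []
    for cycle in range(len(pairs) + n - 1):
        new_regs = [None] * n
        for i in range(n):
            k = cycle - i              # index of the vector pair at stage i now
            if 0 <= k < len(pairs):
                x, y = pairs[k]
                z_prev = 0 if i == 0 else regs[i - 1]
                new_regs[i] = x[i] * y[i] + z_prev
        regs = new_regs
        if regs[n - 1] is not None:
            results.append(regs[n - 1])  # one finished inner product per cycle
    return results

print(pipelined_inner_products([([1, 2, 3], [4, 5, 6]), ([1, 0, 1], [2, 2, 2])]))
```

After an initial latency of n cycles, one result leaves the last stage on
every cycle, which is the throughput advantage described in the text.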
Iteration in space will be considered first, because it provides the necessary throughput for
large vector sizes. In a VLSI implementation of this scheme, the cost of arithmetic elements,
and the communication cost of providing 2n operands and interconnecting individual
processors, would be prohibitive if we were to use word-parallel arithmetic.
Bit-serial arithmetic and communication, however, offers a viable alternative for two
reasons. First, the arithmetic components and the communication lines occupy much smaller area
on silicon. Therefore, a larger number of basic processors may be integrated on a single chip.
Second, the nature of inner product computation requires more bits of precision for larger
vector sizes. A fixed-word parallel arithmetic is not suitable for a flexible precision control, since
the overflow conditions may occur for larger vectors, unless sufficiently large adders are
provided in advance. On the other hand, bit-serial addition does not suffer from this problem,
since the precision may be maintained arbitrarily high with the same hardware.
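The precision argument can be illustrated with a bit-serial adder model: a
single full adder and one carry flip-flop process the operands least
significant bit first, and the word length is set only by how many bit times
are clocked. The sketch below assumes unsigned operands for simplicity
(the processor in the paper uses two's complement), and the helper names
are hypothetical.

```python
def to_bits_lsb_first(value, width):
    """Unsigned integer to a list of bits, least significant bit first."""
    return [(value >> k) & 1 for k in range(width)]

def from_bits_lsb_first(bits):
    """Inverse of to_bits_lsb_first."""
    return sum(bit << k for k, bit in enumerate(bits))

def bit_serial_add(a_bits, b_bits):
    """One full adder plus a carry flip-flop, clocked once per bit time."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out

# The same adder handles any word length; only the number of bit times changes.
for width in (8, 40):
    total = bit_serial_add(to_bits_lsb_first(200, width),
                           to_bits_lsb_first(100, width))
    print(width, from_bits_lsb_first(total))
```

With 8 bit times the sum 300 wraps around, while with 40 bit times the same
hardware model produces the full result.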
In this text we use $b^{-1}$ as a bit-delay operator, analogously to $z^{-1}$ which we reserve for
signal processing applications. This notation is convenient for treating bit-serial systems,
because there may be various delays in a complex array of serial elements, so that a systematic
treatment of path delays is important.
Our implementation of a single inner product processing element is shown in Figure 1. 
Fig. 1. (Diagram of the inner product processing element; the signal labels
in the figure carry the delay $b^{-3}$.)
It consists of a modular bit-serial multiplier (two's complement, 16-bit), and a single-bit
carry-save adder. The input variables x, y, and z enter the processor with the least significant bit
first, and the result starts appearing three bit times later. The computation is synchronized by a
control bit that is applied simultaneously with the LS bits of the operands. The processor
provides 31 bits of the result. An additional 0 bit is inserted between the result and the LS bit of
the next product. The input operands are padded with 16 zeros to the left of the most
significant bit, so that new operands may be applied every 32 clock cycles.
Clocking is two-phase, such that phase one controls all data transfers, and all logic functions 
are evaluated in phase two. The delays $b^{-1}$ imply shift-register pipeline stages. 
There are some variations of the circuit in Figure 1 that are application dependent. The 
delayed input variables do not have to be provided at the output in some array configurations. 
Also, we will show later in the text that the adder may be connected to a pair of multipliers 
instead of as shown in Figure 1. 
The size of a single inner product processor was 2720 by 155 lambda with a linear layout, 
and 680 by 620 lambda in a rectangular configuration. There was no attempt made to minimize 
the size of the basic multiplier cell. Despite this, it is feasible to place sixteen such processors 
on a 40-pin chip, but the limitation in this and future implementations is not so much the area 
as the number of external connections. 
[Figure 2. A linear array of inner product processors.]
This brings up the main question: what is the best interconnection among processors on a chip 
that makes further combinations of such chips most flexible? 
A linear array, as in Figure 2, has many desirable properties for a number of applications. 
The number of pins for a 16 processor combination is less than 40, and the array may grow 
arbitrarily large. The expression for an inner product of two large vectors has the same iterative 
form in terms of inner products of their smaller parts, as in equation (2) . That is, if x and y 
are two vectors of size M, we can partition them into K groups of smaller vectors of size n, so 
that the inner product xT y may be evaluated as follows: 
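$$x^T = (c_1^T \; c_2^T \; \cdots \; c_K^T), \qquad y^T = (d_1^T \; d_2^T \; \cdots \; d_K^T)$$
$$z = x^T y = c_1^T d_1 + c_2^T d_2 + \cdots + c_K^T d_K$$
$$z_i = c_i^T d_i + z_{i-1}, \qquad i = 1, 2, \ldots, K,$$
$$z_0 = 0, \qquad z = z_K.$$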
If each partial inner product of size n is computed in a separate chip, then the complete result 
is obtained by a linear connection of the K chips. The throughput rate remains the same, one 
inner product per 32 clock cycles regardless of the vector sizes. Of course, extra cycles may be 
inserted between two products if there is a need for the overflow control in the adder stages. In 
Figure 2 the variables $x_i$, $y_i$ are integer elements of vectors x and y, which enter the processor 
bit serially. The expression $x_i b^{-j}$ means that the integer $x_i$ is delayed by j bit cycles. Notice 
that each element of the vectors is delayed by $b^{-3}$ with respect to the previous element. This 
skewing does not pose any conceptual difficulties, but for practical reasons it would be easier to 
apply all elements simultaneously, without any bit delays among the operands. A possible solu-
tion is to include an appropriate shift register delay at the input of each section, but this would 
increase the area of the chip. 
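The partitioned recurrence, in which each chip adds its partial inner product to the sum arriving from its predecessor, can be sketched in Python as follows. This is only a word-level illustration under assumed names (`partial_inner_product`, `chained_inner_product`); it ignores bit-serial timing and the skew between operands.

```python
def partial_inner_product(c, d, z_prev):
    """Model of one chip: z_i = c_i^T d_i + z_{i-1}."""
    return sum(a * b for a, b in zip(c, d)) + z_prev


def chained_inner_product(x, y, n):
    """Split x and y into K groups of size n and connect the groups linearly."""
    if len(x) != len(y) or len(x) % n != 0:
        raise ValueError("vectors must have equal length divisible by n")
    z = 0  # z_0 = 0
    for start in range(0, len(x), n):
        z = partial_inner_product(x[start:start + n], y[start:start + n], z)
    return z  # z = z_K


# Two 32-element vectors handled as K = 2 groups of n = 16.
x = list(range(1, 33))
y = [2] * 32
result = chained_inner_product(x, y, 16)
```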
A better form of a linear array of basic inner product processors, suggested in [8] for 
word-parallel arithmetic, is shown in Figure 3. This configuration does not require any delays 
among the vector elements, and produces the initial product with a delay of only $b^{-6}$ in a 
sixteen-processor chip. Here, the adders are organized in a tree structure, and an additional 
carry-save adder is added to the chip, with no internal connections. The purpose of this adder is 
to allow interconnection of an arbitrary number of chips for larger inner products. The pipeline 
registers are assumed to be included in each adder, and the corresponding delays are shown 
externally. 
[Figure 3. Sixteen inner product processors with a tree of adders.]
A connection of individual 16 element chips for operating on larger vectors is shown in 
Figure 4. Summation of the partial inner products is again done by a tree structure of the 
adders, which are obtained from a pool of free adders. 
[Figure 4. Connection of 16-element chips for larger vectors.]
In Figure 4 each $x_i$ and $y_i$ is a 16-dimensional vector. 
This will be the main scheme in further discussion. It is sufficient for applications such as 
FIR and IIR filtering, matrix multiplications, vector convolutions and others. 
Alternatively, the inner product processors may be connected together in a hexagonal 
array suggested in [1] and [2] for matrix operations. In this case, every processor has three 
inputs and three outputs, but a chip with sixteen processors, shown in Figure 5, has two pairs 
of four inputs for the vector variables, two pairs of four delayed outputs of the same variables, 
and seven inputs and outputs for the result variables. In this case the input variables would be 
provided at the outputs delayed by $b^{-3}$, as in Figure 1. Larger hexagonal arrays may easily be 
constructed out of these sixteen processor chips, if each is viewed as a new basic element in a 
hierarchy. Even though this approach addresses a very important issue of the data flow through 
the network simultaneously with the computation, it requires more complex synchronization of 
the input operands. 
3. Data Flow Control 
It was assumed so far that an inner product of two vectors was computed by iterating n 
processors in space. The 2n operands were supplied simultaneously bit-serially, and due to 
pipelining there was no delay between consecutive results of a series of inner products. There-
fore, the computational throughput was adjusted to match the input data rate at the expense of 
more processors. This approach can be extended to matrix products, because they consist of a 
number of inner products. First, multiplication of an n-dimensional vector by a matrix, (m by 
n), can be accomplished by m arrays of inner product processors, each consisting of n basic ele-
ments. One operand to all arrays is the vector, and the second operand is one row of the matrix 
for each array. Each array computes one component of the result simultaneously with others. 
Hence the throughput still remains the same as in the case of a single inner product. A similar 
structure can be used for a product of two matrices, (m by n) and (n by k), where mk linear 
arrays of dimension n compute mk results simultaneously. 
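As a word-level illustration of this decomposition, the following Python sketch treats each row-by-vector or row-by-column product as the work of one linear array. The function names are assumptions for this sketch, and the arrays that would run simultaneously in hardware are simply evaluated one after another here.

```python
def inner_product(u, v):
    """Work done by one linear array of n basic elements."""
    return sum(a * b for a, b in zip(u, v))


def matrix_vector(matrix, vector):
    """m arrays: each receives the shared vector and one row of the matrix."""
    return [inner_product(row, vector) for row in matrix]


def matrix_matrix(a, b):
    """mk arrays: one per (row of a, column of b) pair."""
    columns = list(zip(*b))
    return [[inner_product(row, col) for col in columns] for row in a]


A = [[1, 2, 3], [4, 5, 6]]        # m = 2, n = 3
v = [1, 0, -1]
B = [[1, 0], [0, 1], [1, 1]]      # n = 3, k = 2
Av = matrix_vector(A, v)
AB = matrix_matrix(A, B)
```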
In applications of inner product arrays outlined above there is a need for a data flow net-
work that connects the source of operands with the computational structure, and provides for 
internal data flow during the operation. This is also important for the purpose of matching the 
input data rate with the processing throughput. For example, in an FIR filter application a new 
data sample may be provided every T microseconds, and a filter has to perform one inner pro-
duct of length N on two vectors. One vector consists of N previous samples, and the other is 
composed of filter coefficients. The result of the inner product is output once per period T. 
Let xk be the input sample, and zk the output value at time k. Then: 
$$a^T = (a_0 \; \cdots \; a_{N-1}), \qquad x^T = (x_k \; x_{k-1} \; x_{k-2} \; \cdots \; x_{k-N+1})$$
$$z_k = a^T x = \sum_{i=0}^{N-1} a_i x_{k-i}$$
Therefore, both input and output are single integers, but the processor array operates on two 
vectors. In addition, the x vector has to be updated every sample time, in such a way that all 
the components change their position by one place: 
$$x_{k-i-1} \leftarrow x_{k-i}, \qquad i = 0, 1, \ldots, N-2,$$
with the new sample becoming $x_k$. This is a shifting operation, suggesting a set of shift 
registers for this application. Now, if our inner product processor generates a result in $\Delta t$ 
microseconds, $\Delta t$ being much smaller than the sampling period T, we can use a smaller processor, 
with only $N \Delta t / T$ basic processors, but there has to be a mechanism for accumulating partial 
results within one inner product and cycling through all partial vectors of this smaller size. 
This simple example illustrates a problem that combines communicational and computational 
complexity. A standard measure of computational complexity 
in this case would indicate that the problem is solvable in O(N) time, and since we can use an 
N processor array it becomes an O(1) problem. However, there are additional communication 
costs of providing 2N operands, and performing N data exchanges. Also, in a VLSI implementation 
of this example it would only be sensible to provide all data movements within the same 
chip that contains the inner product processor, so that there is a single external connection for 
input and output. 
This is typical for many algorithms with vector variables. Each operand interacts with a 
number of other operands before the computation is completed. Another example is a convolution 
of two vectors of size N, which requires 2N inner products of the same vectors, but one of 
them is shifted each time. If there is a reasonable restriction that the data be brought into a 
VLSI vector processor only once, which minimizes the number of interactions with the external 
world, then there has to be some data storage on the chip with a flexible data exchange scheme. 
This issue prompted a hexagonal array approach in [1] and [2], in which the data storage and 
flow takes place in each basic processing element, and the topology of a network is tailored for 
the problem. 
Here, we examine another alternative that seems to be well suited for bit-serial vector 
processing. Consider a shift-register element that has two inputs, horizontal and vertical, and 
one output, Figure 6. It can shift the data either horizontally or vertically, as determined by a 
shift control signal. Next consider a standard shift-register of length N, say 16, which consists 
of N-1 standard cells and one two-input cell, shown in the same figure. 
[Figure 6. A two-input shift-register cell and a shift register of length N.]
A set of such shift registers can now be connected in a storage array with a two-dimensional 
shift capability. To demonstrate an application of such an array with an inner product processor 
let us consider an FIR filter implementation. 
Suppose the filter is specified at 512 points, and the sampling period is 100 microseconds. 
A conservative estimate of the inner product performance is 3 microseconds per product. 
Therefore, a 16 processor array is sufficient for computing inner products of 512-dimensional 
vectors by iterating in time 32 times. A diagram of this configuration is shown in Figure 7, for 
a smaller array. In order to iterate in time, a bit-serial accumulator is provided. There are two 
sets of register arrays, data and filter registers. The data registers are connected vertically in 16 
circular groups, and horizontally into a linear array. In the case of a constant filter the filter 
registers may be connected in the same way as the data registers, even though there are applications 
in adaptive filtering where a different connection would be used. Each register group has a sin-
gle one-bit input and output, the output being used for expansion purposes. 
[Figure 7. FIR filter configuration with two-dimensional shift register arrays, shown for a smaller array.]
The filter coefficients are loaded into the registers one at a time by using horizontal shifting. 
During each sample period the computation consists of 32 partial product cycles, and one 
memory shift cycle. Each partial product cycle results in a 16 component inner product, during 
which the registers are being shifted vertically. This provides bit-serial operand streams to the 
inner product processor, and simultaneously prepares new operands for the next cycle. At the 
same time the accumulator adds the previous partial product to the one being computed. The last 
partial product cycle produces the result that is then shifted outside. In the next step a memory 
shift is done by shifting registers horizontally. The first register receives a new sample from the 
external source while it is transferring its content to the next neighbor to the right. At the same 
time the accumulator is cleared for the next round of partial product cycles. This sequence of 
steps is repeated for each new input sample. 
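The sequence of steps in one sample period can be sketched at the word level in Python. This is a simplified model under stated assumptions: the function name `fir_time_iterated` is hypothetical, a `deque` stands in for the two-dimensional shift register array, and the memory shift is performed at the start of each period rather than at the end. With 512 taps and 16 processors the inner loop runs 512 / 16 = 32 times per sample, matching the 32 partial product cycles described above.

```python
from collections import deque


def fir_time_iterated(samples, coeffs, processors=16):
    """FIR filter computed by a small processor array iterated in time."""
    n_taps = len(coeffs)
    if n_taps % processors != 0:
        raise ValueError("number of taps must be a multiple of the array size")
    cycles = n_taps // processors               # partial product cycles per sample
    data = deque([0] * n_taps, maxlen=n_taps)   # data registers, newest sample first
    outputs = []
    for sample in samples:
        data.appendleft(sample)   # memory shift: new sample enters, oldest drops out
        acc = 0                   # accumulator cleared for this sample period
        for cycle in range(cycles):
            lo = cycle * processors
            # one partial product cycle: a `processors`-component inner product
            acc += sum(coeffs[lo + j] * data[lo + j] for j in range(processors))
        outputs.append(acc)       # result shifted out once per sample period
    return outputs


# Small check: a unit impulse reproduces the coefficients (4 taps, 2 processors).
impulse_response = fir_time_iterated([1, 0, 0, 0, 5], [1, 2, 3, 4], processors=2)
```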
This example is indicative of tradeoffs that have to be made in a practical design of array 
schemes for vector processing. The goal is to minimize the silicon area, while matching the pro-
cessing speed with the available input/output data rates. Here, the area of two-dimensional shift 
register arrays was much smaller than an array of 512 inner product cells that could be used for 
the same computation. However, if the input sampling period were on the order of 3 microseconds 
and the filter were specified with the same number of points, then a large processor array would 
be used. In addition, if the input rate were even higher, two arrays would have to be used, each 
operating on alternate samples. 
A two-dimensional shift register array may be used in a similar way for other vector and 
matrix operations. An alternative to this approach is to use standard memory arrays with 
specialized memory access facilities. An example is given in [9]. 
4. The Multiplier 
There are a number of reported bit-serial multipliers in the literature [4], [5], [6]. Most of 
them preserve only the N most significant bits of the result. We have devised yet another 
configuration, which preserves all 2N-1 bits. This is important for inner product computations, 
where a large number of individual products are accumulated. 
If two integers are given in a binary representation, then they can be viewed as vectors 
whose components are in the set {0, 1}. Then, their product is a vector whose elements are 
obtained by a convolution of the two operand vectors. Alternatively, a polynomial representation 
with a delay variable $b^{-1}$ can be used for representing integers for bit-serial arithmetic. A 
convolution of two vectors is equivalent to a polynomial multiplication, if the vector elements 
are equated with the polynomial coefficients. Let x and y be two N-bit integers, represented as 
$$x = \sum_{i=0}^{N-1} x_i b^{-i}, \qquad y = \sum_{i=0}^{N-1} y_i b^{-i}.$$
The product polynomial is given by 
$$z = \sum_{k=0}^{2N-2} z_k b^{-k}.$$
It is interesting to note that each zk is an inner product of two binary vectors, so that a multi-
plier design becomes an exercise in configuring a regular array structure for inner product com-
putation. In order to derive such a structure, we rewrite the result polynomial as: 
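$$z = \sum_{k=0}^{2N-2} (z_k' + z_k'') \, b^{-k},$$
$$z_k' = \sum_{i=0}^{\lfloor k/2 \rfloor} x_i y_{k-i}, \qquad z_k'' = \sum_{i=0}^{\lceil k/2 \rceil - 1} y_i x_{k-i}.$$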
These two expressions can be written in an iterative way, such that they map into a linear 
pipelined array of N sections. The j-th section computes the following: 
$$z_j' = x_j y_k + z_{j+1}' b^{-1}, \qquad j \le k$$
$$z_j'' = y_j x_k + z_{j+1}'' b^{-1}, \qquad j < k$$
$$k = 0, 1, \ldots, N-1, \qquad j = 0, 1, \ldots, N-1, \qquad z_j = z_j' + z_j''$$
Section 0 provides the product polynomial, with $z_0 b^{-1}$ being the least significant bit. Notice 
that the additions in the above expressions are arithmetic, with a carry bit. 
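The split of each product coefficient into two partial sums can be checked numerically. The Python sketch below is an illustration under assumptions, not the authors' circuit: it handles unsigned operands only, takes bits above the most significant bit as zero (as with the zero padding of the operands), and assigns to the first partial sum the bit products whose x index does not exceed the y index, and to the second partial sum the remaining ones. The function names are hypothetical.

```python
def bits_lsb_first(value, width):
    """Return `value` as `width` bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def split_convolution(x, y, n):
    """Return the two partial-sum sequences for k = 0 .. 2n-2."""
    xb = bits_lsb_first(x, n) + [0] * n   # zero padding above the MS bit
    yb = bits_lsb_first(y, n) + [0] * n
    first, second = [], []
    for k in range(2 * n - 1):
        # terms x_i * y_(k-i) with i <= k - i
        first.append(sum(xb[i] * yb[k - i] for i in range(k // 2 + 1)))
        # terms y_i * x_(k-i) with i < k - i
        second.append(sum(yb[i] * xb[k - i] for i in range((k + 1) // 2)))
    return first, second


def product_from_coefficients(first, second):
    """Arithmetic (carry-propagating) sum of the weighted coefficients."""
    return sum((a + b) << k for k, (a, b) in enumerate(zip(first, second)))


first, second = split_convolution(13, 11, 4)
product = product_from_coefficients(first, second)
```

Because individual coefficients can exceed 1, the final summation must propagate carries, which is the role of the arithmetic additions noted above.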
A diagram of a single section of the multiplier, and the connection of the sections, are 
shown in Figure 8. The operands are applied on two single bit buses, x and y, one bit per 
cycle. The control bit is provided simultaneously with the LS bits, and it advances from the 0-th 
section to the remaining stages synchronously with the bit rate. Its purpose is to enable x and y 
latches in each section. In the i-th cycle, it deposits i-th x and y bits in the i-th latches. Each 
section computes two partial sums, $z_i'$ and $z_i''$. Carry-save adders add together three values, a 
product of two bit values, previous carry bit, and the delayed partial sum from the next section. 
The 0-th section also contains an adder for forming the final result. Finally, two's complement 
multiplication is obtained by applying a special "subtract" signal, simultaneously with the most 
significant bits of the operands, $x_{N-1}$ and $y_{N-1}$. This has the effect of converting all adders to 
borrow-save subtractors at this time (except in the last section). In this implementation, it 
takes 2N-1 steps to perform a multiplication of two N bit numbers. 
The floor-plan of a single multiplier section looks very much like the diagram in Figure 8. 
All signals were chosen to run horizontally, so that a multiplier with an arbitrary number of bits 
can be constructed in many ways, by abutting sections on two edges only. 
[Figure 8. A single section of the multiplier and the connection of the sections.]
5. Conclusion 
A bit-serial approach to computations of vector inner products offers many advantages 
over word arithmetic. The size of basic processing elements and communication links is much 
smaller, and the array configurations are easy to implement. The slower rate of operation of a 
single multiplier-adder combination is offset by a much higher throughput rate of a large 
number of processors. A single element has been designed and tested, and a sixteen processor 
combination with a tree of adders is under way. The approach is especially useful for real-time 
signal processing tasks. 
6. Acknowledgements 
We are grateful to David Hagelbarger for suggesting the structure of the multiplier. Also, 
we would like to thank Sandy Fraser and Mike Maul for making the chip production possible. 
7. References 
1. C. A. Mead and L. A. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980, 263-
330. 
2. H. T. Kung, The Structure of Parallel Algorithms, Carnegie-Mellon University, August 
1979. 
3. K. E. Batcher, "Design of a Massively Parallel Processor", IEEE Trans. Comput., vol. C-29, 
pp. 836-840, Sept. 1980. 
4. E. K. Cheng and C. A. Mead, "A Two's Complement Pipeline Multiplier", Proc. ICASSP, 
Apr. 1976. 
5. R. F. Lyon, "Two's Complement Pipeline Multipliers", IEEE Trans. Commun., pp. 418-425, 
Apr. 1976. 
6. L. B. Jackson et al., "An Approach to the Implementation of Digital Filters", IEEE Trans. 
Audio Electroacoust., vol. AU-16, pp. 413-421, Sept. 1968. 
7. Special Issue on Parallel Processing, IEEE Trans. Comput., vol. C-29, Sept. 1980. 
8. E. E. Swartzlander et al., "Inner Product Computers", IEEE Trans. Comput., vol. C-27, pp. 
21-31, Jan. 1978. 
9. K. E. Batcher, "The Multidimensional Access Memory in STARAN", IEEE Trans. Com-
put., vol. C-26, pp. 174-177, Feb. 1977. 
A SMART MEMORY ARRAY PROCESSOR FOR TWO LAYER PATH FINDING*
Christopher R. Carroll, Caltech 
This paper describes three examples of hardware implementations of path finding 
schemes based on the Lee-Moore maze solving algorithm. One is purely a 
demonstration circuit to show the technique. The other two are complete LSI 
implementations which should be usable in building large and useful path finding 
machines. One of these two LSI circuits, known as the MAZER, is designed to find 
shortest paths from one point to another on a plane, where there is only one layer of 
allowable routes to take. As its name suggests, this chip solves ordinary mazes, or on 
a more practical level, it can route wires on a one sided printed circuit board. The 
other LSI circuit, known as the PATHFINDER, is designed to handle the two sided 
printed circuit board case. It finds a least costly path from one point to another 
where there are two parallel planes on which routes are allowed. Crossing of the 
path from one plane to another can be either unrestricted, as in free via printed 
circuit boards, or permitted only in certain places, as in fixed via boards. The phrase 
"least costly" above can, for now, be read as "shortest", although in a later section a 
more general definition will be revealed. 
The remainder of this document is divided into three parts. The first section outlines 
the original Lee-Moore algorithm for path finding, on which the circuits described 
later are based. The second section details the one layer hardware, including both 
the demonstration circuit and the MAZER chip. Finally, the third section describes 
the PATHFINDER chip and the techniques used to conquer the problems encountered 
in two layer path finding. Documentation on the integrated circuits includes those 
results of testing and characterization which were available at the time of this 
writing. 
*The research described in this paper was sponsored by the Defense Advanced 
Research Projects Agency, ARPA Order number 3771, and monitored by the Office of 
Naval Research under contract number N00014-79-C-0697. 
1. THE LEE-MOORE ALGORITHM 
The Lee-Moore algorithm for path finding, proposed by E. Moore in 1959 (1) and 
extended by C. Lee in 1961 (2), is a scheme for finding the shortest route between 
two points in a plane, where the route is composed of some number of vertical and 
horizontal segments through a rectangular grid superimposed on the plane.** This has 
been a popular algorithm for people doing problems related to maze solving because it 
is easy to implement and because it guarantees that a path will be found if one 
exists. The drawbacks to the algorithm are that it is expensive computationally in 
both time and space. However, the use of the hardware described in this paper 
circumvents these difficulties. 
Suppose that the size of the grid, i.e. the pitch of the cells defined by the grid, is 
set to the minimum path width that is allowed. In the case of printed circuit board 
design this would be the minimum center to center spacing for adjacent wires. 
Suppose also that the grid is uniform and symmetric, forming an array of square cells, 
each a path width on a side. The path found by the algorithm from point A to point B 
will consist of a route beginning at the cell containing point A, continuing to a neighbor 
of that cell, and then to a neighbor of that second cell, and so on from a cell to one of 
its neighbors, until eventually the path ends in the cell containing point B. Some of the 
cells in the array may be blocked, preventing the path from running through these 
cells. These would be "barriers", or "walls" in a maze, or cells occupied by previously 
routed wires in the printed circuit board application. 
The algorithm finds the shortest path from A to B in two phases. One word of storage, 
which I will call the "label", is associated with each of the cells in the array. The 
first phase, called the Propagation Phase, stores information in the labels throughout 
the array. The second phase, called the Retrace Phase, then uses that information 
to find the required path. 
**Since our geometry here is based on this grid, the distances mentioned will be 
Manhattan distances, i.e. the distance from A to B would be the shortest distance 
covered by a taxi driver driving from A to B on the streets of Manhattan. 
The Propagation Phase, which distributes the information, executes the following 
program: 
    put label -1 in all cells which are blocked 
    put label 0 in all cells which are not blocked 
    N := 1 
    put label N in the cell containing point A 
    while cell containing point B is labelled 0 and more activity is possible do 
    begin 
        for every neighbor of every cell labelled N do 
            if that neighbor is labelled 0 then label it (N+1) else leave it alone 
        N := N+1 
    end 
This part of the algorithm is illustrated in Figure 1. The purpose of this phase is to 
distribute information to the cells which can then be used to find the direction back 
to point A. The information is spread out in a propagating wavefront centered on point 
A, much like waves propagating away from a stone dropped in a pond. It is 
interesting to note that the only activity that takes place occurs at the frontier of 
this expanding wavefront. Cells ahead of the frontier merely wait for the wave to 
arrive, keeping their label of 0. Cells behind the frontier have already received the 
information they need, and simply keep it stored in their label. 
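The Propagation Phase can be sketched in Python as follows. This is a software illustration under assumptions: the grid is a list of rows of booleans (True meaning blocked), points are (row, column) tuples, point A is assumed to lie in an unblocked cell, and the name `propagate` is hypothetical. Only the cells on the current frontier are visited in each step, reflecting the observation that all activity occurs at the wavefront.

```python
BLOCKED = -1
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def propagate(grid, a, b):
    """Label cells with wavefront numbers, starting with 1 at point A."""
    rows, cols = len(grid), len(grid[0])
    labels = [[BLOCKED if grid[r][c] else 0 for c in range(cols)]
              for r in range(rows)]
    n = 1
    labels[a[0]][a[1]] = n
    frontier = [a]                       # cells currently labelled n
    while labels[b[0]][b[1]] == 0 and frontier:
        next_frontier = []
        for r, c in frontier:
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                    labels[nr][nc] = n + 1
                    next_frontier.append((nr, nc))
        frontier = next_frontier
        n += 1
    return labels


# A 3 x 3 array with the centre cell blocked; A on the left edge, B on the right.
maze = [[False, False, False],
        [False, True,  False],
        [False, False, False]]
labels = propagate(maze, (1, 0), (1, 2))
```

If point B still carries label 0 when the function returns, no path exists.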
The Retrace Phase, using the information stored in the labels, executes the following 
program to find the path: 
    Start the path at the cell containing point B 
    N := label of cell containing point B 
    if N=0 then there is no path from point A to point B 
    else begin 
        While path has not yet reached cell containing point A do 
        begin 
            N := N-1 
            Continue the path to a neighbor of the current cell which contains 
            the label N 
        end 
    end 
This part of the algorithm is illustrated in Figure 2. Notice that there is nothing to 
specify which cell to choose when there are two or more possible choices. This 
merely means that there are multiple paths between A and B that have the same 
length. To first order, there is thus no preference of one path over another, so no 
selection mechanism need be used. In practice, some scheme is often employed when 
there is a choice of paths to take. A common selection scheme is to avoid changing 
directions in the path during the Retrace Phase when it is unnecessary. This tends 
to minimize the number of bends in the resulting path. When this phase is complete, 
the algorithm either has found the required path from A to B or has proven that no 
such path exists. 
[Figure 1. The Lee-Moore Propagation Phase. Panels: Just Starting, In Progress, Finished.]
[Figure 2. The Lee-Moore Retrace Phase. Panels: Just Starting, In Progress, Finished.]
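A minimal Python sketch of the Retrace Phase, including the selection scheme that keeps the current direction whenever possible, is given below. It assumes a label array of the kind produced by the Propagation Phase (-1 for blocked cells, 0 for untouched cells, 1 at point A); the function name `retrace` and the order in which neighbors are examined are assumptions of this sketch.

```python
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def retrace(labels, b):
    """Walk from point B back to the cell labelled 1, avoiding needless bends."""
    rows, cols = len(labels), len(labels[0])
    r, c = b
    n = labels[r][c]
    if n <= 0:
        return None                       # B was never reached: no path exists
    path = [(r, c)]
    heading = None
    while n > 1:                          # the cell containing A carries label 1
        n -= 1
        moves = [(dr, dc) for dr, dc in NEIGHBORS
                 if 0 <= r + dr < rows and 0 <= c + dc < cols
                 and labels[r + dr][c + dc] == n]
        # keep the current direction when it is among the valid choices
        step = heading if heading in moves else moves[0]
        r, c = r + step[0], c + step[1]
        path.append((r, c))
        heading = step
    return path


# Labels of an open 3 x 3 array with A in the top-left corner.
open_labels = [[1, 2, 3],
               [2, 3, 4],
               [3, 4, 5]]
path = retrace(open_labels, (2, 2))
```

In this example several shortest paths exist, and the direction preference yields one with a single bend.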
Before beginning the discussion of the hardware implementations of this algorithm, a 
couple of things should be noted. First, examine the time complexity of the programs 
above. The Retrace Phase merely traces the path from B back to A using information 
stored in the cells. The only cells accessed are those along the selected path and 
their immediate neighbors. The time complexity is thus linear with respect to the 
path length. However, in the Propagation Phase, the situation is worse. Information is 
propagated in all directions around point A. The number of cells accessed is 
approximately proportional to the square of the path length. Thus, the time complexity 
here is quadratic with respect to the path length, making the algorithm as a whole 
quadratic. This is unfortunate, since for maze solving to be interesting, a large maze 
must be involved. In the circuit board application, for example, a cell array containing 
1000 x 1000 cells would be common. The quadratic time aspect of the algorithm 
thus is a real handicap. Current software using this algorithm to route typical printed 
circuit boards can consume several hours of CPU time on a full-size computer. On top 
of that, the space requirement is also large. Circuit board routing requires 10-12 bits 
of storage for each cell, and a million 12 bit words is a lot of memory. So, both the 
space and time complexity of the algorithm need to be attacked in any successful 
hardware implementation. The next section shows how this was accomplished. 
2. HARDWARE FOR FINDING ONE-LAYER PATHS 
Implementing the Lee-Moore algorithm in hardware is a clean and natural thing to do. 
Because the problem is cellular and because information flows only between adjacent 
cells without using any long distance communication paths, the task is a natural one 
for an array processor structure with one processor per cell in the array. However, if 
there is to be any hope of building a large machine this way, there are two problems 
which must be overcome. First, the amount of storage per cell must be limited. In 
the original algorithm, in an array of unbounded size each cell would be required to 
contain an unbounded number of bits. Second, the global state required in the 
original algorithm, which was represented by "N" in the programs above, must be 
eliminated. Accomplishing these goals would result in a machine which could be 
extended to any size needed without undue complications. 
The first goal, that of limiting the amount of storage required in each cell, was 
attacked by S. Akers in 1967 (3). He showed that only two bits were required per 
cell to implement the algorithm proposed by Lee and Moore. Of the four states 
available from the two bits, one indicated that a cell was blocked and unavailable for 
new paths. Another was used to indicate a cell that was so far untouched by the 
propagation process, like label 0 in the above programs. Then, instead of using the 
ascending ordinal numbers to label successive wavefronts in the propagation, Akers 
used successive members from the sequence 
1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, ... 
These last two states stored the information necessary to get back to the point 
where the propagation started. See Figure 3. It was only required that the program 
remember whether it was on the first 1, second 1, first 2, or second 2 in the 
sequence when it stored a number in the goal cell containing point B. For example, if 
the program knew it was on the second 1 of a pair when it reached the goal, then in 
the Retrace Phase it looked for cells containing the above sequence in reverse order, 
starting with the first 1, then 2,2,1,1, etc., until it reached the starting cell. This was 
a big step forward, for only two bits of storage were needed for each cell, no matter 
how big the array of cells was made. Unfortunately this did nothing to solve the 
problem of having to distribute the next member of the sequence as a global variable 
to all the cells in the array. 
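A small Python sketch of routing with two-bit cell states in the manner described for Akers' modification follows. It is an illustration under assumptions, not Akers' published program: the state encoding, the name `akers_route`, and the use of a wave counter as the remembered position in the sequence are choices made for this sketch, and points A and B are assumed to lie in unblocked cells. The scheme works because the two labelled neighbors of a cell on wave w can only belong to waves w-1 and w+1, and members of the sequence two places apart always differ.

```python
EMPTY, ONE, TWO, BLOCKED = 0, 1, 2, 3      # four states fit in two bits
SEQUENCE = (ONE, ONE, TWO, TWO)            # 1, 1, 2, 2, repeated
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def akers_route(blocked, a, b):
    """Return a shortest path from B back to A as a list of cells, or None."""
    rows, cols = len(blocked), len(blocked[0])
    state = [[BLOCKED if blocked[r][c] else EMPTY for c in range(cols)]
             for r in range(rows)]
    wave = 0                               # position in the sequence (global state)
    state[a[0]][a[1]] = SEQUENCE[0]
    frontier = [a]
    while state[b[0]][b[1]] == EMPTY and frontier:
        wave += 1
        next_frontier = []
        for r, c in frontier:
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and state[nr][nc] == EMPTY:
                    state[nr][nc] = SEQUENCE[wave % 4]
                    next_frontier.append((nr, nc))
        frontier = next_frontier
    if state[b[0]][b[1]] == EMPTY:
        return None                        # the wave died out before reaching B
    path = [b]
    r, c = b
    while wave > 0:                        # follow the sequence in reverse order
        wave -= 1
        wanted = SEQUENCE[wave % 4]
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and state[nr][nc] == wanted:
                r, c = nr, nc
                break
        path.append((r, c))
    return path


maze = [[False, False, False],
        [False, True,  False],
        [False, False, False]]
route = akers_route(maze, (1, 0), (1, 2))
```

The variable `wave` plays the role of the global state that the cells themselves cannot hold, which is the limitation noted above.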
The solution to the dilemma of global state IS to us e a slightly different strategy in 
the algorithm. This idea was record e d In an internal document of the Coltech 
Computer Science Department by Ivan Sutherland (4 ). Rather than numberinq 
succ essive wavefronts with some sequence and then searching for the reverse of 
that s eque nce to find the path. s imply s tore In each cell arrows which point to the 
neighbor(s) from wh1ch the wavefront approaches as It passes over that cell. and 
then just let the arrows show the path back to the starting cell. This approach is 
shown in Figure 4. Since the wave front may reach a cell from more than one neighbor 
simultaneously, and since that fact Is Important when trying to select one of several 
equally good paths, an arrow for each neighbor Is needed. These arrows require one 
bit each, because the wavefront either came from that neighbor or It didn't. 
Additionally, one bit Is required to Identify a cell as being blocked. Five bits per cell 
Is more than Akers' two, but it Is still a small number, and more Importantly, It Is still a 
Figure 3. Akers' Modification (panels: In Progress, Propagation Complete, Path Found)
Figure 4. Modification for MAZER (panels: In Progress, Propagation Complete, Path Found)
bounded number that does not change as the array of cells grows. With these
modifications to the algorithm, the move to hardware is hardly more than just "wiring it
up".
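As a rough software analogue of the five-bit scheme, the sketch below keeps four arrow bits and one blocked bit per cell and advances the wavefront one ring per tick, with every cell updating from the same snapshot. The bit assignments, function names, and the convention that the start cell shows all four arrows are illustrative assumptions, not a description of any particular implementation.

```python
UP, DOWN, LEFT, RIGHT, BLOCKED = 1, 2, 4, 8, 16   # five bits per cell
STEP = {UP: (-1, 0), DOWN: (1, 0), LEFT: (0, -1), RIGHT: (0, 1)}
ARROWS = UP | DOWN | LEFT | RIGHT

def propagate(cells, start):
    """Advance the wavefront one ring per tick until nothing changes."""
    rows, cols = len(cells), len(cells[0])
    cells[start[0]][start[1]] |= ARROWS        # mark the starting cell
    changed = True
    while changed:
        changed = False
        snapshot = [row[:] for row in cells]   # all cells update together
        for r in range(rows):
            for c in range(cols):
                if snapshot[r][c]:             # blocked or already reached
                    continue
                for bit, (dr, dc) in STEP.items():
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and snapshot[nr][nc] & ARROWS):
                        cells[r][c] |= bit     # arrow toward that neighbor
                        changed = True
    return cells

def retrace(cells, start, goal):
    """Follow arrows from goal back to start (any set arrow will do)."""
    path, (r, c) = [goal], goal
    while (r, c) != start:
        bit = next(b for b in STEP if cells[r][c] & b)
        r, c = r + STEP[bit][0], c + STEP[bit][1]
        path.append((r, c))
    return path
```

No global sequence value is needed here: each cell decides what to store purely from the state of its four neighbors, and the storage per cell stays at five bits however large the array is.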
2.1 THE DEMONSTRATION CIRCUIT 
As a demonstration of the feasibility of this approach to path finding, I built a small
array of processors out of standard TTL parts. Figure 5 shows the circuit I used. As
can be seen from the figure, there is not much to the "processor". It consists of one
whole and two half standard TTL packages, a few resistors, and an LED display. The
circuit uses a 74161 as a four bit latch. The 74161 features the TC output, which is
the logical AND of the four latch outputs and the ET input. The four bits in the 74161
are the four arrow bits. The fifth bit is formed by one NAND and one NOR gate to
indicate the blocked condition. Global control signals are circled. The two signals
START and BLOCK are independent for each cell and are activated by momentarily
grounding that node with a probe tip. Communication with neighbor processors enters
this processor at the preset inputs of the 74161 and exits to the neighbors from the
NAND gate at the right.
In operation, the circuit is quite simple. Initially, the CLEAR signal is taken low to clear
all the block flip-flops. Then the maze walls are defined by selectively blocking some
processors by grounding their BLOCK inputs. The LED decimal point lights in the
blocked cells. Next, RESET is taken high for at least one clock cycle. This forces all
communication wires between neighbor processors high and parallel loads all ones into
the 74161, turning off the LED segments. This is a stable configuration, and will not
change as the clock ticks. All communication wires stay high, and the latches keep
parallel loading all ones because the TC outputs are high. Now suppose that
somehow the processor to the right of this one changes out of its all ones state.
Then its TC output goes low, causing the latch to stop parallel loading and causing
the communication wires leaving the processor to go low. One of those wires enters
this processor on the P0 input. At the next clock cycle, that low state is loaded into
the 0 bit of the latch, turns on the right LED segment indicating a right pointing arrow,
and prevents further parallel loading of the latch by forcing the TC output low. The
result, then, is an indication on the LED for this processor that something happened to
the right of it. Incidentally, when the TC output of this processor went low, so did the
outgoing communication wires, so on the next clock cycle, the other neighbors of this
processor will be activated just as described above. Now, how did all that get
Figure 5. Schematic for Demonstration Circuit (labels: 74161 with inputs PA, PB, PC, PD, CLK, EP, ET, PE and outputs QA, QB, QC, QD, TC; 330 (4 times); FROM TOP, LEFT, BOTTOM, and RIGHT NEIGHBOR; TO TOP, LEFT, RIGHT, and BOTTOM NEIGHBOR)
started? Well, the START input on one cell was momentarily grounded, causing the
latch outputs to go to all zero, turning on all four LED segments indicating that
propagation started there, and causing that cell's communication outputs to go low. It
is actually a very simple process that each processor in the array must execute.
There is no computation in the numerical sense involved. Each cell simply passes on
the propagating signal when it arrives, and records from which direction(s) it came.
When the propagation reaches the edges of the array, or can go no farther because
of blocked cells, the action stops. What is recorded by the LEDs is actually the
direction to go from each cell in the array to get back to the cell where it all started.
The hardware has found the shortest path from the starting point to any other point in
the array.
I designed this circuit purely for demonstration purposes. As such, tracing the path
back is a visual process done by looking at the LED displays. If automatic trace back
were desired, the five bits in each processor would be accessed as five bit words by
a general computer which would then consider the processor array as a block of
smart memory. It is an easy task for an ordinary computer to decipher the bits from
each processor to find the path desired.
Before proceeding, consider what has happened to the computational effort required
to reach this result. Time complexity of the algorithm has been dramatically improved.
Now, rather than having a single processor advance the wavefront by stepping
around the starting cell one cell at a time in an expanding spiral, the propagation
takes place by activation of successive rings of processors surrounding the starting
cell. At any given time, a number of processors directly proportional to the length of
the path are actively working, rather than just one processor. The time required for
the wavefront to expand out to the goal point is now directly proportional to the
length of the path, not to the square of the length. Thus, the time complexity of the
algorithm is now linear, not quadratic, with respect to the path length. This result is
expected -- a linear number of active processors can do in linear time what one
active processor can do in quadratic time.
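The linear versus quadratic comparison can be made concrete with a small count. This sketch assumes an open, unblocked grid large enough that the wavefront never reaches an edge, and it charges the single processor one step per cell; both simplifications, and the function name, are assumptions for illustration.

```python
def wavefront_work(path_length):
    """Work needed to expand a wavefront out to a given path length.

    Returns (ticks for the processor array, cell visits for one processor).
    """
    ticks = 0           # parallel array: one tick per ring
    cells_visited = 0   # single processor: one step per cell
    for ring in range(1, path_length + 1):
        ticks += 1
        cells_visited += 4 * ring   # cells at Manhattan distance `ring`
    return ticks, cells_visited

for length in (10, 20, 40):
    print(length, wavefront_work(length))
```

Doubling the path length doubles the tick count but roughly quadruples the number of cells a single processor must visit, since the ring at distance k holds 4k cells and the total out to distance L is 2L(L+1).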
The circuit described above is so simple that it seems natural to lay out several
copies of it on a silicon chip. That is just what was done for the design of the
MAZER chip.
2.2 THE MAZER CHIP
The first step in designing the MAZER chip was to develop an NMOS circuit which
performed the function of the demonstration circuit above. The stumbling block was
the clocking scheme. Edge triggered latches are not as easy to come by in MOS as
they are in TTL. Usually a set of multi-phase clocks is used to latch signals. This
seemed to unnecessarily complicate the circuit, and a way around the problem was
sought. The answer turned out to be easy. Just don't use any clocks!
On careful scrutiny of the operation of the demonstration circuit, one sees that
clocking is really unnecessary. The only operation performed by each processor
consists of waiting for the propagating wavefront to reach it, recording the
direction(s) from which it came, and passing it along to its neighbors. One could
imagine the array of processors as an array of mousetraps, each cocked and ready to
fire. Each mousetrap is designed to fire as soon as any of its neighbors fires. Each
mousetrap will store the direction from which its firing signal comes. At the end of
the outward propagation process, which might always be allowed to propagate to the
extremities of the array, the contents of each mousetrap cell's storage would then
be the required arrows pointing in the direction of the shortest path from that cell to
the start of the wave propagation process. This is a good visualization of the way in
which the MAZER works.
Figure 6 is a conceptual logic design of a simplified MAZER cell. Not shown are all
the mechanisms for accessing the cell, blocking the cell so that it becomes part of a
"wall" in the maze, causing that cell to be the starting point of wave propagation,
etc., but the mousetrap characteristic is illustrated. After the reset line has gone
high to make all the flip flop Q outputs low, all signals which cross the cell boundary
are low, and the system is stable in this state. Now, if for some reason one of the
incoming signals goes high, the corresponding flip flop will be set. This causes the
inputs to be disabled via the AND gates, and also causes the cell to generate a high
going signal to each of its neighbors, triggering them in the same way. The flip flops
remember from which direction the activation signal entered the cell, and reading
them out by an accessing mechanism not shown gives the direction the maze solution
takes as it passes through this cell.
Figure 6. The Mousetrap Concept (labels: S-R flip flops with inputs FROM TOP, FROM LEFT, FROM BOTTOM; outputs TO TOP, TO LEFT, TO BOTTOM, TO RIGHT)
Figure 7 is an actual schematic of a MAZER cell. Three of the AND gate/flip flop
combinations of Figure 6 are seen here as transistor groups Q1-Q6, Q7-Q12, and
Q13-Q18. The fourth direction is identified by the state where the cell has been
triggered and the other three flip flops are not set. The NOR gate and inverter are
formed by Q19-Q25. Four bits of information are provided, as open drain outputs wire
OR-ed with other cells on the chip. These bits are the three flip flop outputs plus a
signal which indicates if the cell mousetrap has been "sprung". Transistors Q33-Q36
form a flip flop to store the blocked condition. ROW and COLUMN are addressing
signals to select the cell for data readout, BLOCKing the cell to make it part of a
maze wall, or STARTing the propagation process with this cell. RESET re-cocks the
mousetraps, but does not destroy the blocked condition in the maze wall cells. CLEAR
unblocks all the cells in preparation for a new maze. The other signals are
communication paths to adjacent cells.
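A host computer reading the four data bits of a cell might decode them along the following lines. This is only a sketch: the text does not say which three directions have their own flip flops, so the choice of top, left, and bottom as the stored directions (with right as the implied fourth) is an assumption, as are the function and argument names.

```python
def decode_mazer_cell(from_top, from_left, from_bottom, sprung):
    """Return the set of directions the wavefront arrived from.

    Assumes (hypothetically) that top, left, and bottom are the three
    stored directions and that right is the implied fourth one.
    """
    if not sprung:
        return set()    # mousetrap still cocked: cell was never reached
    stored = (("top", from_top), ("left", from_left), ("bottom", from_bottom))
    arrows = {name for name, bit in stored if bit}
    # The cell fired but no stored flip flop is set: the fourth direction.
    return arrows or {"right"}
```

One consequence of this coding is that an arrival from the fourth direction is visible only when no other direction fired at the same time. The sketch also cannot tell a starting cell from a cell reached from the fourth direction, so a host would need to remember where it started propagation.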
The complete MAZER chip contains sixteen processors arranged in a four by four
array. Larger arrays can be assembled by arranging MAZER chips themselves in an
array. Four wires come off each edge of the chip for the purpose of communication to
adjacent chips. There are 15 additional wires which come off chip for data and
control. Four are for data outputs, four are for address inputs, two are for power, and
five are for the control signals BLOCK, RESET, CLEAR, START, and CHIP-ENABLE. A plot
of the MAZER chip is shown in Figure 8. The four by four array of processor cells
can be seen, surrounded by the 31 pads. Designed with Caltech's conservative
design rules, the chip measures 2241 microns square, or about 8100 square mils.
2.3 MAZER CHIP TEST AND CHARACTERIZATION 
The MAZER chip recently returned from fabrication and some preliminary results of
testing are in. I have three chips which apparently contain no processing faults. The
chips seemed to perform strangely, until I diagnosed the symptoms and found a basic
bug in the MAZER processor circuitry. With the bug uncovered, I was able to
circumvent the problem and continue to test the parts.
In order to understand the bug in the MAZER circuitry, examine Figure 7. The four
data outputs of each of the sixteen processors, which come from the drains of
transistors Q27 through Q30 in the figure, are bussed together onto four global data
wires on the chip, DATA1 through DATA4. These wires run to the chip output pad
circuitry, where there is a single pull up transistor on each of them. The important
Figure 7. Schematic of a MAZER cell
Figure 8. Plot of the MAZER chip
point to examine is the node in each processor which is common to the sources of the
four transistors, Q27 through Q30. This node, which I will call the "enable node", is
pulled low by the decoding transistors, Q30 and Q40, which are controlled by the
addressing signals ROW and COLUMN. Since Q30 and Q40 are both turned on in only
one of the sixteen processors, there should be a path from the enable node to ground
only in that addressed processor. This prevents data in the non-addressed
processors from pulling down on the data lines. However, things are not so simple.
Suppose one of the data output transistors, say Q27, is on in the addressed
processor, thus pulling down the DATA1 wire. Now, suppose that in another
processor, which is not being addressed, both Q27 and Q28 are on. Since DATA1 is
being held low by data in the addressed processor, Q27 in the second processor
provides a path to ground for the enable node in that second processor. As a result,
Q28 in that second processor can erroneously pull down the DATA2 wire. Since the
addressed processor should not pull down DATA2, the result is that incorrect data
appears at the output of the chip.
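The sneak path can be reproduced with a simple connectivity model. The sketch below treats every "on" output transistor as an ideal two-way switch between a processor's enable node and a data wire, and then asks which data wires end up connected to ground. It ignores timing and switching thresholds entirely, and the names and data layout are hypothetical.

```python
def lines_pulled_low(processors, addressed):
    """Return the data wires (1..4) that have some path to ground.

    processors: list of sets; each set holds the data wires whose output
                transistor is 'on' in that processor.
    addressed:  index of the one processor whose enable node is grounded.
    """
    edges = {("GND", ("enable", addressed))}
    for p, on in enumerate(processors):
        for k in on:
            edges.add((("enable", p), ("data", k)))
    adj = {}
    for a, b in edges:                      # switches conduct both ways
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    reached, stack = {"GND"}, ["GND"]
    while stack:
        for n in adj.get(stack.pop(), ()):
            if n not in reached:
                reached.add(n)
                stack.append(n)
    return sorted(n[1] for n in reached if n != "GND" and n[0] == "data")

# Addressed processor 0 drives only DATA1; processor 1 has two outputs on.
print(lines_pulled_low([{1}, {1, 2}], addressed=0))
```

In this example the intended result is that only DATA1 goes low, but the model also reports DATA2, because the shared DATA1 wire grounds the second processor's enable node. If the two processors share no active data wire, no sneak path forms.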
Fortunately, it is possible to retrieve correct data despite that problem. In the above
example, notice that DATA1 is pulled down first by the addressed processor in the
normal way. The bad data does not start to pull down on DATA2 until DATA1 is down,
and even then, the string of transistors doing the pulling on DATA2 is longer than
normal. The result is that good data shows up on the chip output pads about fifty
nano-seconds before it is ruined by bad data caused by the sneak paths. Latching
the good data in an external latch at the right time retrieves the correct state of the
internal bits.
The operations of STARTing propagation and BLOCKing the cell are also affected by
the unwanted paths between enable nodes and ground. Luckily, because of the high
pull-up to pull-down ratio in the Q24/Q25 inverter, resulting in a very low switching
threshold, STARTing can be performed normally. Apparently the sneak currents are
low enough to prevent the inverter from switching in the non-addressed processors.
However, so far I have been unable to individually BLOCK cells. The effect of blocked
cells can nevertheless be tested by using the random distribution of blocked cells
present after power-on. The CLEAR signal, which clears all blocked cells, operates
normally.
In spite of the difficulties mentioned above, the test results are encouraging. All the
arrows point correctly back to the cell where propagation starts. Some local
asymmetries sometimes are present, indicating that propagation proceeds a little
more quickly in some areas of the chip than in others, but this result is expected with
the asynchronous scheme used, and is negligible anyway. Access time, from chip
enable to data output, is around 100 nano-seconds, about normal for a chip of this
small size. A more detailed characterization of the chip remains to be completed.
3. HARDWARE FOR FINDING TWO-LAYER PATHS 
The fact that the MAZER is limited to single layer paths limits its usefulness. The
most immediate application for path finding hardware is in the area of printed circuit
board design. However, single sided circuit boards are not very exciting. The step
to two sided boards dramatically improves wireability and board density. Adding even
more layers to the board improves density still more, but the additional effort does
not buy nearly as much as the move from one to two sides. Thus, there was great
incentive to develop a two layer path finder, with the specific goal of producing a
routing machine for two sided printed circuit boards. This is the background for the
design of the other integrated circuit to be discussed, the PATHFINDER.
At first glance, it would seem that one need only construct a circuit that forms the
topology of two MAZER chips laid on top of one another, with an additional arrow bit
in each cell to indicate travel from one layer to the other. This strategy would work,
except that it lacks some properties which have been found very desirable in the
two layer environment. In what follows, terminology of the printed circuit board world
will be used, with the understanding that other applications, such as interconnect
wiring on integrated circuits, would have analogous features and terminology.
The first feature that would be missing in such a two-layer MAZER is the ability to
block travel from one side of the board to the other independently from blocking
travel through those cells without changing sides. Often it is desirable to prevent
these holes in the board, or vias, from occurring in certain areas of the circuit board.
Perhaps vias are to be allowed only on a tenth inch grid, for example. Furthermore, vias
sometimes can affect more than just the cell in which they occur. A via in one cell
may prohibit the placing of a via in an adjacent cell. For all these reasons, an
additional bit is required for via blocking in any proposed two layer path finding
system to make it useful.
The second missing feature is much more disturbing. Designers of two layer circuit
boards have long realized that it was advantageous to employ a tendency for wire
runs that were mostly vertical to end up on one side of the board, and runs that were
mostly horizontal to end up on the other side. This helps to avoid unnecessarily
blocking channels for future wires. The tendency of a wire to choose one side of the
board or the other depending on its orientation would be completely lacking in a
straightforward two-layer MAZER. Incorporating this preference into the basic path
finding algorithm was an interesting problem, and the methods developed to solve it in
both the traditional software implementations and the current hardware
implementation will now be examined.
A way to achieve the wire location preference is to use a system of costs associated
with travel from cell to cell through the array. One group to incorporate these costs
into a standard software wire routing system was a group at Burroughs Corporation
(5). Each cell stored one integer, but rather than storing ascending ordinal numbers
on successive wavefronts in the propagation phase, as in the original algorithm, each
cell stored the accumulated cost for reaching that cell from the starting cell.
Suppose C(a,b) is the cost for expanding the wavefront from cell a to its neighbor,
cell b. Then as the wavefront passes from cell a to cell b, the number stored in cell b
is the number stored in cell a plus C(a,b). Since different costs might be encountered
along different routes from the starting point to a given goal point, a number which
has been previously stored in a cell might be overwritten if the wavefront reaches
that cell from another direction with a lower cost than that achieved by the first
contact with the cell. This is shown in Figure 9. Notice that the wavefront expands
in exactly the same way as it did in the original algorithm, but now cells on the
frontier are not necessarily all equally "distant" in terms of costs, as they were in
the schemes described earlier. In the retrace phase of the algorithm, the numbers
stored in the cells are used in a way similar to that described earlier. However,
rather than searching neighbor cells for the next member in a reversed sequence,
each step of the retrace involves searching for a neighbor of the current cell with a
stored cost less than that of the current cell by the amount of the cost of
propagating from that neighbor to this cell during the propagation phase. This does
not necessarily result in the shortest path from point A to point B. What comes out
instead is the least costly path between those two points, based on the cost
function C(a,b). This is the meaning of the phrase "least costly" in the introductory
paragraph of this paper.
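The cost-accumulating propagation and its retrace can be sketched as follows. This is an illustrative reconstruction, not the Burroughs program: the function names are hypothetical, the cost function uses the values from the Figure 9 caption (1 for horizontal travel, 10 for vertical), and a simple queue stands in for whatever wavefront bookkeeping the original software used.

```python
from collections import deque

def step_cost(a, b):
    """Illustrative C(a,b): 1 for horizontal moves, 10 for vertical ones."""
    return 1 if a[0] == b[0] else 10

def neighbors(cell, rows, cols):
    r, c = cell
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield (r + dr, c + dc)

def propagate(blocked, start):
    """Store in each reachable cell its accumulated cost from start."""
    rows, cols = len(blocked), len(blocked[0])
    cost, queue = {start: 0}, deque([start])
    while queue:
        a = queue.popleft()
        for b in neighbors(a, rows, cols):
            if blocked[b[0]][b[1]]:
                continue
            new = cost[a] + step_cost(a, b)
            if b not in cost or new < cost[b]:   # overwrite a costlier visit
                cost[b] = new
                queue.append(b)
    return cost

def retrace(cost, start, goal, rows, cols):
    """Step to a neighbor whose cost is lower by exactly C(neighbor, here)."""
    path, cur = [goal], goal
    while cur != start:
        cur = next(n for n in neighbors(cur, rows, cols)
                   if n in cost and cost[n] + step_cost(n, cur) == cost[cur])
        path.append(cur)
    return path
```

Note that the stored numbers grow with the size of the array and the magnitude of the costs, which is exactly the unbounded-storage problem that the arrow scheme avoided.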
Figure 9. Path found by using a cost of 1 for travel in the horizontal direction and 10 for travel in the vertical direction. (Panels: Just Starting, A Little More, Still Further, Not Done Yet!, Finished, Path Found)
The simplest cost function normally used consists of only three distinct costs. One
cost is used for travelling in the "easy" directions, north-south on one side of the
board and east-west on the other side; a second, slightly higher, cost is used for
travelling in the "hard" directions, east-west on the first side and north-south on the
second side; and a third, even higher, cost is used for travel "through the board"
from one side to the other. Fancier schemes are possible. These involve reduced
costs for travel near to and parallel to the edges of the board to increase utilization
of that area, increased costs near component pins to prevent blocking future access
to those pins, etc. The task of developing a cost function tailor-made to a particular
circuit board can become quite an art.
Note, however, that using this cost function as a solution to the two layer situation
brought back the same problems the original algorithm had, namely an unbounded
number of bits of storage per cell, and global distribution of a numerical cost function.
If there was to be any hope of building a chip comparable to the MAZER for two layer
circuit boards, these problems had to be eliminated. To accomplish this, the MAZER
was re-examined.
The scheme of using arrows instead of numbers seemed to be the way to go to limit
the number of bits per cell. Using this method, however, required that during the
propagation phase the expanding frontier of the wavefront must include only cells
that were equally distant in terms of cost from the starting point, unlike the above
two layer approach. This seemed to be at odds with the uniform, diamond shaped
wavefront propagation described above.
The solution to the costs problem was to control the speed of wavefront propagation
from cell to cell, rather than let it go at gate delay speeds. Consider the simple three
cost system described earlier. If propagation could be allowed to proceed quickly in
the "easy" direction, more slowly in the "hard" direction, and even more slowly in the
"through the board" direction, the wavefront would meet the requirement that all cells
on the frontier would be at an equal "distance" in terms of cost from the starting cell.
Imagine a system where north-south propagation is easy on the top of the board and
hard on the bottom. On such a board, a wavefront propagating from point A to a point
B directly north of point A will reach point B on top of the board first, and will thus
store arrows indicating a path which travels on the top side of the board back to
point A. Similarly, if point B were to the east of point A, the wavefront, propagating
more quickly in the east direction on the bottom of the board than on the top, would
cause point B to store arrows indicating retrace along the bottom of the board back
to point A. No matter where point B was, the arrival of the wavefront would store
information describing the least costly path back to point A.
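The idea of replacing stored costs with propagation delays can be tried out in simulation. In the sketch below an event queue ordered by arrival time stands in for the physical delays, and each cell keeps only an arrow to whichever neighbor reached it first. The delay values, the cell coordinates (layer, row, column), and the choice to start the wave on both layers at point A are all assumptions made for illustration.

```python
import heapq

EASY, HARD, THROUGH = 1.0, 3.0, 10.0   # hypothetical relative delays
TOP, BOTTOM = 0, 1

def moves(cell, rows, cols):
    """Yield (neighbor, delay); north-south is easy on top, hard on bottom."""
    layer, r, c = cell
    ns, ew = (EASY, HARD) if layer == TOP else (HARD, EASY)
    for dr, dc, delay in ((-1, 0, ns), (1, 0, ns), (0, -1, ew), (0, 1, ew)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield (layer, r + dr, c + dc), delay
    yield (1 - layer, r, c), THROUGH

def propagate(rows, cols, starts):
    """Return (arrow, arrival): first-arrival source and time per cell."""
    arrow, arrival = {}, {}
    events = [(0.0, s, None) for s in starts]
    heapq.heapify(events)
    while events:
        t, cell, came_from = heapq.heappop(events)
        if cell in arrow:
            continue                  # already fired; later arrivals ignored
        arrow[cell], arrival[cell] = came_from, t
        for nxt, delay in moves(cell, rows, cols):
            if nxt not in arrow:
                heapq.heappush(events, (t + delay, nxt, cell))
    return arrow, arrival

def retrace(arrow, goal):
    path = [goal]
    while arrow[path[-1]] is not None:
        path.append(arrow[path[-1]])
    return path

A = (4, 4)
arrow, arrival = propagate(9, 9, [(TOP, *A), (BOTTOM, *A)])
north_b = min([(TOP, 0, 4), (BOTTOM, 0, 4)], key=arrival.get)
print(retrace(arrow, north_b))   # stays on the top layer
```

With these delays, a goal directly north of A is reached first on the top layer and a goal directly east is reached first on the bottom layer, so the arrows alone reproduce the layer preference without any stored numbers.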
During the time this solution evolved, some redundancy in the storage used in the
MAZER made itself known. If the propagation phase left an arrow in cell A pointing to
a neighbor cell B, indicating that retrace should proceed in that direction, that implied
that there would be no arrow in cell B which pointed to cell A. Since that particular
combination, adjacent cells pointing at each other, would never occur, there must
have been some redundant information stored there, implying that a reduction in
storage was possible. This was accomplished by moving the location of the arrow
bits from inside each cell to between cells. Since each arrow then served the two
cells between which it lay, the total storage required for the arrows was halved.
With these new modifications to the basic Lee-Moore algorithm, it was time to start
designing the two layer chip.
3.1 THE PATHFINDER CHIP 
The difficult obstacle in the design of the PATHFINDER chip was the method to use
in controlling propagation speeds. What was required was a way to vary the speed
over at least a ten to one range in each of three directions, the "easy" direction, the
"hard" direction, and the "through the board" direction. Also, the circuitry could not
be overly complex, nor could it involve many wires to the global environment.
However, the required speed settings were related to the cost function described
above. The cost function was something that was set by little more than educated
guessing and experimentation. There was nothing very critical about the exact
values of the costs. Only approximate settings were required. All of these
considerations led the design away from a digitally controlled speed system, and
towards an analog system.
The method employed relies heavily on the dynamic charge storage abilities of MOS
circuitry. Figure 10 shows the set up for a simplified, one layer cell with its
surrounding arrows, not showing the blocking or accessing circuitry. Each cell
contains a capacitor of about 5 pF or so. Before the start of the propagation phase,
the capacitors are all precharged by means of the precharge transistor. With all the
capacitors charged, all the arrow flip flops have both outputs held low. To start
Figure 10. Simplified PATHFINDER Cell (labels: Left Arrow, Right Arrow, Lower Arrow, Precharge)
propagation at a cell, that cell's capacitor is discharged. That action releases one
side of the arrow flip flops surrounding that cell, causing those arrows to "point" to
that cell with the discharged capacitor. The high outputs of the arrow flip flops then
enter the neighbor cells, and begin discharging the capacitors there at rates
determined by the voltages on the gates of Qa and Qb. When those capacitors are
completely drained, the arrows surrounding those cells flip to point to the newly
discharged capacitors, and the arrow outputs begin discharging capacitors in their
neighbors. As the wavefront of activity propagates out, cells behind the frontier
have completely discharged capacitors, cells ahead of the frontier have fully charged
capacitors, and cells on the frontier have capacitors which are in the process of
being discharged. The voltages on the gates of Qa and Qb are the crucial speed
setting values. They are set by current mirror arrangements, as shown in Figure 11.
There are three current mirror pads on the PATHFINDER chip, generating three
control voltages for discharging the capacitors at the three rates required for the
"easy", "hard", and "through" directions.
A feature included on the chip allows a small amount of local control over the cost
function, to modulate the overall three costs described above. This consists of an
additional pF or so of capacitance which can be switched on in parallel with the main
capacitor in each cell. The time for propagating through a cell, and hence its
propagation "costs", can be increased by connecting its extra capacitor before
precharge and leaving it connected through propagation. The cost can be decreased
by connecting the extra capacitor after precharge is over and disconnecting it again
before propagation starts. These capacitor connections are switched on a cell by
cell basis, controlled by a single bit in each cell. This makes it possible to increase
costs near component pins, or to decrease costs near the board edges, etc., to
reduce or increase the tendency for wires to end up in those areas. If circuitry had
been included to discharge the extra capacitor when it was disconnected from the
main one, additional levels of cost could be obtained by repeatedly connecting and
disconnecting the extra capacitor between precharge and the start of propagation to
remove more and more charge from the main capacitor, and thus reduce its discharge
time. However, that extra feature was not included.
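The effect of the extra capacitor on propagation time can be estimated with a first-order model. The sketch below assumes a constant discharge current (a simplification of what a current-mirror-biased transistor provides), an extra capacitor that is empty when it is connected after precharge, and illustrative component values (5 pF main, 1 pF extra, 5 V, 1 microamp); none of these values or names are stated specifications.

```python
def discharge_time(c_main, v, i, c_extra=0.0, mode="off", shares=1):
    """Time in seconds to drain a cell's capacitor at constant current i.

    mode "off":  extra capacitor never connected.
    mode "slow": extra capacitor connected before precharge and left on,
                 so both capacitors must be drained.
    mode "fast": extra capacitor (assumed empty) connected after precharge
                 and then removed, taking some charge with it. shares > 1
                 models the repeated-sharing feature that was NOT built.
    """
    if mode == "slow":
        return (c_main + c_extra) * v / i
    if mode == "fast":
        v = v * (c_main / (c_main + c_extra)) ** shares
    return c_main * v / i

C, CX, V, I = 5e-12, 1e-12, 5.0, 1e-6   # hypothetical values
base = discharge_time(C, V, I)
print(discharge_time(C, V, I, CX, "slow") / base)   # cost raised
print(discharge_time(C, V, I, CX, "fast") / base)   # cost lowered
print(base / discharge_time(C, V, 10 * I))          # ten to one speed range
```

Under these assumptions the "slow" setting lengthens the delay by the factor (C + Cx)/C and the "fast" setting shortens it by C/(C + Cx), while a ten to one change in discharge current gives the ten to one speed range mentioned earlier.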
Figure 12 is a schematic of a two layer PATHFINDER processor, containing circuitry
for the cells on both sides of the board as well as the arrow between them. The
upper arrow and right hand arrow for each cell are arbitrarily assigned as belonging to
that cell, while the lower and left hand arrows are considered to belong to the
Figure 11. Example of a current mirror pad used on the PATHFINDER chip. There are three of these, one for each cost ("easy", "hard", and "through"). (Labels: External resistor to set current level; voltage distributed to speed control transistors, Qa or Qb in Figure 10)
Figure 12. Schematic of PATHFINDER processor. Bit reading and writing mechanisms are not shown. (Labels: Upper Arrow, CLK, Top Layer Cell, Bottom Layer Cell, Through direction; top layer circuitry same as bottom layer except "easy" and "hard" voltages are reversed)
neighbor cells in those directions. The control storage bits are shown as boxes for
simplicity. Actually the five arrow bits and the four control bits make up a nine bit
word of what amounts to a standard static memory system, using the usual six
transistor cell. Not shown are the two transistors which selectively link the flip flops
to the word lines which run through all the bits, nor the select lines which control the
gates of those transistors to do the addressing. Instead, the storage bits are shown
located in reasonable places on the schematic to suggest their function in the circuit.
The circuit works just as described above for Figure 10, with the addition of the
blocking controls and the switch to two layer operation. Having two layers merely
means that three paths are present for discharging the capacitor, each controlled by
a transistor whose gate voltage is set by one of the cost-setting current mirrors.
The blocking control flip flops merely inhibit the appropriate discharge paths to
prevent the discharge of the capacitor under the conditions which are to be blocked.
The only remaining unexplained feature of the schematic is the signal labelled CLK.
Remembering that this circuit was described as clock-less, one might wonder what
the signal labelled CLK could be. In fact, it is a clock, but not in the usual sense.
The clock signal concerns a problem which has not been mentioned until now, having
to do with chip boundaries in an array of cells composed of many chips. Because of
the relatively large wiring capacitance associated with leaving one chip and entering
the next, propagation from a cell on the periphery of one chip to its neighbor which
happens to be on an adjacent chip will be much slower than propagation from cell to
cell within the same chip. This can result in paths which have the problem shown in
Figure 13, where the path takes every possible route to avoid crossing extra chip
boundaries. The clock signal is an attempt to avoid this problem, by allowing the
discharge process to occur only in short spurts. The clock is on for a few tens of
nanoseconds, and then off for a hundred nanoseconds or so, giving the signals
crossing from chip to chip time to settle. Letting the propagation process proceed
only a little bit at a time like this slows down the system somewhat, but should greatly
alleviate the chip boundary problem.

[Figure 13 labels: chip boundaries; preferred path; path found by MAZER.]
Figure 13. The problem posed by chip boundaries in large arrays

Figure 14 shows a plot of the metal layer of the PATHFINDER chip. The chip
contains a four by eight array of two layer processors. As with the MAZER, the large
processor arrays of several hundred processors on a side which are needed for
useful printed circuit board work are built up by assembling PATHFINDER chips
themselves in an array. Forty-eight of the seventy pads are devoted to chip to chip
communication within the large array. The remaining pads consist of nine address
pads, two power pads, two data I/O pads, and nine control pads, including the three
current mirror cost-setting pads. Chip size is 3750 by 4875 microns.

Figure 14. The PATHFINDER's metal layer

3.2 PATHFINDER CHIP TEST AND CHARACTERIZATION
The PATHFINDER chip was included on the MPC580 run managed by Xerox PARC in
the spring of 1980, and was also included on the M08B MOSIS run managed by the
Information Sciences Institute. From the two runs, I have received approximately
twenty five copies of the chip. Only a few of the chips have been tested so far,
however, because of difficulties in packaging the seventy pin circuit. Nevertheless, I
do have one chip which is completely functional, and several others which work well
enough to verify that the chip design is correct. Faults in the bad chips range from
stuck bits to malfunctioning address decoders to complete failure, perhaps in the
output buffers. Some of the chips have capacitors which fail to hold charge long
enough to be useful. The capacitors in the one good chip, though, hold their charge
long enough to keep the arrows "balanced" for about twenty seconds when carefully
shielded from light. This is at least two orders of magnitude longer than required for
successful path finding.
The chip has passed through several quantitative tests. Access time from chip
enable to data out is around a microsecond, which is acceptable, though not
noteworthy. An important item of interest is the operation of the cost-setting current
mirrors. These perform very well. The range of control is excellent, with the normal
external operating current between 100 and 1000 microamps. Characterizations of
other quantitative aspects of the chip are not yet complete.
I have assembled a small microcomputer system to test the PATHFINDER chips and
to operate them as a path finding system. With only one functional chip to use, no
tests of the chip-to-chip communication strategy have yet been possible. However, I
have written a true path finding program for the microprocessor which uses the one
PATHFINDER chip to find paths through a four by eight grid. The program allows the
user to set up any initial combination of blocked cells and blocked vias, start
propagation from any cell in the array, and trace a path back to that starting cell from
any of the other cells. The program displays the resulting path as well as the status
of the control bits in the PATHFINDER chip on a terminal screen. In this
implementation, three potentiometers, connected to the three current mirror pads on
the chip, are used to set the three global costs. The program can demonstrate the
effect of changing costs on the path found between two points in the grid. For
example, the user can cause the path either to skirt around barriers between the two
endpoints, or to form vias and go over or under the barriers, depending on the
settings of the three potentiometers. As more chips are tested and certified
functional, this system can be expanded by connecting the good chips in an array to
increase the size of the grid on which paths can be found.
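
The behavior of the test program can be imitated in software. The sketch below is only an
illustration, not the chip's mechanism: it replaces the analog discharge timing with a
Dijkstra-style priority queue, and the function name `find_path`, the `(layer, x, y)` cell
encoding, and the default cost values are all assumptions. It further assumes that the
"easy" direction is horizontal on layer 0 and vertical on layer 1.

```python
import heapq


def find_path(width, height, start, goal, blocked=frozenset(),
              blocked_vias=frozenset(), easy=1.0, hard=3.0, through=5.0):
    """Cheapest path on a two layer grid; cells are (layer, x, y) tuples."""

    def step_cost(layer, dx):
        # Assumption: layer 0 prefers horizontal moves, layer 1 vertical moves.
        horizontal = dx != 0
        return easy if horizontal == (layer == 0) else hard

    cost = {start: 0.0}
    arrow = {}                      # arrow[cell] = neighbor it was reached from
    heap = [(0.0, start)]
    while heap:
        c, cell = heapq.heappop(heap)
        if c > cost[cell]:
            continue                # stale queue entry
        layer, x, y = cell
        moves = [((layer, x + dx, y + dy), step_cost(layer, dx))
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        if (x, y) not in blocked_vias:
            moves.append(((1 - layer, x, y), through))   # via to the other layer
        for nxt, w in moves:
            _, nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height) or nxt in blocked:
                continue
            if c + w < cost.get(nxt, float("inf")):
                cost[nxt] = c + w
                arrow[nxt] = cell
                heapq.heappush(heap, (c + w, nxt))
    if goal not in cost:
        return None
    path = [goal]                   # trace the arrows back to the start
    while path[-1] != start:
        path.append(arrow[path[-1]])
    return path[::-1]
```

With a partial barrier on layer 0, raising or lowering `through` switches the result between
a path that skirts the barrier and one that forms vias to cross it on the other layer, which
is the effect the potentiometers have in the hardware demonstration.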
SUMMARY 
This paper has detailed the design of hardware which implements the computationally
expensive parts of the Lee-Moore path finding algorithm. The progression of designs,
leading to the PATHFINDER chip, shows that this is a natural application of array
processing. Applied to the problem of two sided printed circuit board wire routing, the
use of the chips described here can reduce computation time from several hours to
around a minute. However, this circuit is innovative not only because of its array
processing aspect, but also because of its unusual use of analog variables. These
cost-setting control voltages are not there simply to make the circuit work, as is the
substrate bias on the chip, for example. Instead, the values of the control voltages
are important parameters to the computation performed in the processor array.
Changing the values of the voltages changes the result of the computation. This
application of analog processing in a digital system may be just the beginning of a
new design discipline combining the advantages of the analog and digital design
worlds.
REFERENCES 
1. E. Moore, "Shortest Path Through a Maze," Annals of the Computation
Laboratory of Harvard University, Vol. 30. Cambridge, Mass.: Harvard
University Press, 1959, pp. 285-292.
2. C. Lee, "An Algorithm for Path Connections and its Applications,"
IEEE Trans. Electronic Computers, Vol. EC-10, pp. 346-365, September, 1961.
3. S. Akers, "A Modification of Lee's Path Connection Algorithms,"
IEEE Trans. Electronic Computers, (Short Notes), Vol. EC-16,
pp. 97-98, February, 1967.
4. I. Sutherland, "A Better Mousetrap," Computer Science Department
Display File N562, Caltech, March 8, 1977.
5. C. Slemaker, A. Mosteller, L. Leyking, A. Livitsanos,
"A Programmable Printed-Wiring Router," Burroughs Corporation,
Mission Viejo, California.
Special Purpose Hardware
for Design Rule Checking
Larry Seiler
Massachusetts Institute of Technology
Special purpose hardware can significantly increase the speed of integrated circuit design
rule checking. The architecture described in this paper uses four custom chips to
implement a raster scan DRC algorithm. It allows the use of 45° angles and can be
programmed to check a wide variety of design rules involving an arbitrary number of layers.
A shrink/expand operation allows the use of rasterization grids that are small relative to the
minimum feature size. Using the Mead/Conway NMOS design rules⁸ and assuming a grid
size of 1/2 λ, or 1/4 the minimum transistor width, this hardware can completely check a
3000λ x 3000λ layout in under a minute, if the input data can be provided quickly enough.
1. Introduction 
One of the most computationally difficult aspects of integrated circuit design is the
problem of checking for design rule violations. Design rules define the ways in which the
features on the various layout masks may be positioned with respect to each other. In
industrial applications, the problem is usually solved by running design rule checks as batch
jobs on large machines. In university applications, the most common approach is to simplify
the problem by using less complex design rules and disallowing nonorthogonal angles.
Neither approach is completely satisfactory. What is needed is a method of design rule
checking that allows the greater complexity of industrial design rules while retaining the
speed and simplicity of the university design rule checkers.
Special purpose hardware is one way of satisfying these conflicting requirements.
This hardware should be inexpensive enough that it can be included in individual color
graphic designer workstations. It should be programmable for a wide variety of design rules.
It should be extensible to large numbers of layers and should be applicable to hierarchical
design rule checking algorithms. Also, it should be able to handle 45° angles. Most
industrial designs include 45° angles because they can result in a significant reduction in the
area required by a layout. Allowing arbitrary angles provides only a small increase in
packing density. The only significant use of angles which are not a multiple of 45° is in
bipolar analog devices where circular wires are used to accurately control transistor ratios.
It has been claimed that octagonal wires would be a sufficiently close approximation.⁵
This research was supported in part by United States Air Force Contract AFOSR-F49620-80-C-0073 and the
Real Time Systems Group of the M.I.T. Laboratory for Computer Science.
This paper is organized as follows. First, an overview is given of the algorithm used
by the proposed design rule check hardware and the architecture that implements it. The
next section describes algorithms that perform width checking and feature shrinking
operations. A custom chip architecture that implements these algorithms is also described.
Next, the boundary check operation is introduced for checking errors that depend on edge
conditions. Another custom chip architecture implements this operation. A final section
summarizes the work and suggests areas for further research.
2. Design Rule Check Hardware Description
The two main categories of design rule checking algorithms differ according to the
types of objects they manipulate. Geometrical design rule checkers perform operations on
geometrical objects such as rectangles, wires and polygons. Raster scan design rule
checkers divide the layout into a grid of cells, each of which is empty or full on each layer.
The simplest raster scan algorithms use a fixed size rectangular grid of cells, although
variable sized cells and trapezoidal cells are also used.
The design rule check hardware described in this paper implements a fixed grid raster
scan algorithm. This algorithm is especially well suited to hardware implementation because
the data representation and the operations on the data are very simple. Also, the raster
scan format includes local connectivity information, making expensive intersection tests
unnecessary. This section starts by describing the basic algorithm, including its relation to
hierarchical design rule checking. Next, the hardware architecture that implements it is
described and the components of the architecture are discussed in greater detail. Finally,
an estimate is developed for the speed performance of the architecture.
2.1 Top Level Algorithm 
The basic structure of the algorithm is similar to one used in a software design rule
checker written by Clark Baker.¹ Since IC layouts are usually described by a hierarchical
structure of geometrical objects, the first step is to instantiate the hierarchy and create a
raster image of the layout. Design rule checks are performed by moving a small rectangular
window over each position in the rasterized layout, checking to see if the pattern in the
window is valid at each position. For example, a 3x3 window could be used to find all
places where a mask is only one unit wide. The pattern matching operations that are
performed at each position are called local area design rule checks. The final step in the
algorithm involves reporting the positions at which errors were found.
There are two main categories of local area design rule checks. The first is width and
spacing checks, which are the most common operations. Shrink and expand operations on
masks are also used. They are closely related to width checking. The other main category
is general boundary checks. These window operations check the relationship between
edges of features on two masks. For example, in the Mead/Conway design rules,⁸
polysilicon and diffusion are only permitted to have coincident edges where they form a
butting contact.
Local area design rule checks are not suitable for design rules that depend on mask
features far away from the position being checked. Rules for pad size or spacing would
require impossibly large windows. Rules that depend on the connectivity of the layout
cannot in general be checked using windows, because there is no upper limit on the
necessary window size. The algorithm will report all potential errors that depend on the
connectivity or intended functionality of the layout so that a postprocessor can filter out the
spurious errors from those that are genuine. Ideally, the postprocessor would include a
node analysis step which would not only filter out potential connectivity errors but would also
compare the intended circuit against the circuit that was actually implemented.⁷
What is commonly referred to as hierarchical design rule checking is not so much a
method of performing design rule checks as it is a method of reducing the work required by
a nonhierarchical algorithm. It is important to consider whether the basic algorithm defined
above can be used in a hierarchical design rule check algorithm. One such algorithm
operates by removing duplicate geometry from the layout.¹¹ This would not significantly
decrease the running time of any raster scan algorithm because the number of geometrical
objects in the layout is reduced rather than the area that the layout occupies. However, it is
possible to design a hierarchical algorithm that does reduce area. An example is an
algorithm that checks a primitive cell in the usual way and checks a nonprimitive cell by
checking its subcells and then checking the areas where its subcells overlap. In any case,
hierarchical design rule checking can be done using raster scan algorithms as well as
geometric algorithms.
2.2 Top Level Architecture 
The design rule check architecture described below implements the inner loop of the
above algorithm. Custom chips implement primitive operations involving rasterization and
design rule checking. Standard MSI and memory parts complete the system. The resulting
hardware will be able to fit on a single PC board and will be inexpensive enough to include
as a part of individual color graphic designer workstations. First, the basic architecture is
described along with the operations it performs. Next, the interface between the portions of
the algorithm implemented in hardware and software is discussed. The extensibility of this
architecture to larger layouts and more complex design rules is also considered.
Performing a design rule check with the proposed hardware requires three distinct
operations, as illustrated in figure 1. These three operations are performed for each scan
line. First the design must be converted into a raster image. This is done by feeding mask
features that intersect the current scan line into custom chips in the rasterization unit. The
resulting line of rasterized mask data is given to a unit which performs local area design rule
checks. This unit performs boolean operations on the rasterized input masks, buffers
previous raster lines of the derived masks, and feeds them into two kinds of custom chips
that perform primitive DRC operations. The output from the local area DRC unit consists of
parallel streams of error bits, where a one indicates an error at that position in the layout.
The error reporting hardware converts this into a sequence of error coordinates which are
read by the controlling processor. Assuming that the incidence of errors is low, this will
result in a significant compression of the error information.
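
The compression performed by the error reporting step can be sketched in a few lines of
software. This is only an illustration under assumed conventions: the function name
`error_coordinates` and the nested-list layout of the bit streams are hypothetical, and
the real hardware processes the streams one scan line at a time.

```python
def error_coordinates(lines):
    """Convert parallel error bit streams into a list of (x, y, kind) triples.

    lines[y][kind] is the list of error bits for stream `kind` on scan line y;
    a one marks an error at that position.
    """
    coords = []
    for y, streams in enumerate(lines):
        for kind, bits in enumerate(streams):
            for x, bit in enumerate(bits):
                if bit:
                    coords.append((x, y, kind))
    return coords
```

When nearly all bits are zero, the list of triples is far shorter than the raw streams,
which is the compression the paragraph above refers to.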
[Figure 1 blocks: Rasterization Hardware; Parallel Bit Streams; Local Area DRC Hardware;
Error Reporting Hardware; Processor Control; Error Output.]
Figure 1: Design Rule Check Hardware System
The hardware architecture in figure 1 cannot stand alone, but must be a part of a
design rule checking system. This system contains a controlling processor which includes
software to program the local area design rule check unit, convert the layout into the proper
format for the rasterization unit, and convert the error reports into human readable output.
The design rule check hardware simply speeds up the inner loop of a software algorithm.
The basis for choosing these interface points between hardware and software involves the
volume of data which is manipulated and the complexity of operations on that data.
Software routines can do general data manipulation operations but are relatively slow at
processing large amounts of data. On the other hand, data can be manipulated very quickly
using special purpose hardware, but the hardware cost increases rapidly as the complexity
of the operations increases. The input and output of the local area DRC unit are rasterized
data streams, which are easy to process using hardware but would be very time consuming
to process using software. The data manipulations required to instantiate the layout
preparatory to rasterization and to process the error positions for user output are very general
and involve much smaller quantities of information. With the hardware software interface
shown above, the hardware implements simple data manipulations on large volumes of data
and the software implements general operations on smaller volumes of data.
The above architecture can handle a wide range of layout sizes and design rule
complexities. The parameters that limit the size and complexity that can be handled are the
number of rasterization chips, the number of parallel mask data lines, the size of the line
buffers, and the number of custom chips in the local area design rule check unit. None of
these factors places a serious limitation on the usefulness of the hardware. A layout which is
too wide for the line buffers can be split into parallel strips that can be checked separately.
Adjacent strips must overlap by an amount equal to the largest design rule size. Statistical
studies permit good estimates to be made as to the number of mask features that will intersect
a scan line. If there are too many features for the rasterization chips to handle, a smaller
line size can be chosen. It is not necessary to check all of the design rules during one pass
through the layout, so the local area design rule check hardware need only include enough
custom chips to handle each design rule individually. Finally, only those masks that are
actually being used need to be sent to the local area check hardware during a given pass
through the design, so the number of parallel mask data lines does not need to be as large
as the total number of masks.
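
The strip splitting described above can be illustrated with a short sketch. The function
name `split_into_strips` and its parameters are hypothetical; all quantities are assumed
to be measured in grid cells, and the overlap is taken to be exactly the largest design
rule size, as the text requires.

```python
def split_into_strips(layout_width, buffer_width, max_rule):
    """Return (start, end) column ranges, end exclusive, that cover the layout.

    Each strip is at most buffer_width cells wide, and adjacent strips
    overlap by max_rule cells (the largest design rule size).
    """
    if buffer_width <= max_rule:
        raise ValueError("line buffer must be wider than the largest rule")
    strips = []
    start = 0
    while True:
        end = min(start + buffer_width, layout_width)
        strips.append((start, end))
        if end == layout_width:
            return strips
        start = end - max_rule      # step back to create the overlap
```

For example, a layout 10 cells wide checked with a 4 cell line buffer and a largest rule of
1 cell is covered by the strips (0, 4), (3, 7), and (6, 10).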
2.3 Detailed Architecture
This section describes the rasterization, error reporting, and local area design rule
check units in greater detail. The architecture of the local area design rule check unit is
described in the most detail, with sample bus sizes and chip counts.
The rasterization unit outputs parallel streams of rasterized mask data to the local area
DRC unit. Its input consists of the set of intervals on the current scan line that are covered
on each mask. For orthogonal rectangles, this requires that the controlling processor add
an interval to the set when the current scan line reaches the start of the box and remove it
when it reaches the end. Trapezoids with two horizontal sides and two sides at 45° angles
can be rasterized by adding an interval to the set when the current scan line reaches the
start of the trapezoid, and then incrementing or decrementing the ends of the interval before
each successive scan line until the end of the trapezoid is reached. Polygons and wires
must be decomposed into trapezoids. The interval rasterization operation is performed by a
custom chip, similar to a chip designed by Bart Locanthi.⁶
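
The interval update rule for trapezoids can be sketched as follows. This is a software
illustration only; the function name `rasterize_trapezoid` and its parameters are
hypothetical, interval ends are assumed to fall on whole cells, and the per-line slopes
are restricted to -1, 0, or +1 to model vertical and 45° sides.

```python
def rasterize_trapezoid(y_start, y_end, left, right, d_left, d_right, width):
    """Rasterize one trapezoid into {scan line: list of bits}.

    The interval [left, right) is emitted on scan line y_start, then its ends
    move by d_left and d_right cells before each successive scan line,
    up to but not including y_end.
    """
    rows = {}
    for y in range(y_start, y_end):
        lo, hi = max(left, 0), min(right, width)     # clip to the scan line
        rows[y] = [1 if lo <= x < hi else 0 for x in range(width)]
        left += d_left
        right += d_right
    return rows
```

An orthogonal rectangle is the special case where both slopes are zero, so the same
interval is emitted on every scan line between its start and its end.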
The error report unit is the interface between the local area DRC unit and the
controlling processor. It converts the parallel streams of error bits into a list of positions
where errors were discovered. The type of error at each position is also reported. Most of
the error bits will be zero, so this will significantly reduce the amount of information passed
to the controlling processor. Typically, the error positions will be saved on a disk file for
further processing after the design rule check is complete.
The local area design rule check unit accepts parallel streams of mask data bits and
produces parallel streams of error bits. It performs primitive DRC functions such as width
tests and boundary checks. It also implements mask shrink and expand operations and
boolean operations such as mask intersections, unions, negations, and differences. The
boolean operations are used to create derived masks such as transistor gate area, which is
defined to be the intersection of the polysilicon and diffusion masks. Since the width tests,
boundary checks, and shrink/expand operations require looking at the masks through
windows of size up to 4x4, buffers are used to save up to three previous lines of each mask.
Figure 2 illustrates the architecture of the local area design rule check unit with sample bus
sizes and numbers of chips.
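
The boolean operations on rasterized masks map directly onto bitwise operations. The
sketch below assumes, purely for illustration, that one scan line of each mask is packed
into a Python integer with one bit per grid cell; the function name `derive_masks` and
the names of the derived masks other than the gate area are hypothetical.

```python
def derive_masks(poly, diff, width):
    """Compute sample derived masks for one scan line of `width` cells.

    poly and diff are integers whose bits mark cells covered by the
    polysilicon and diffusion masks.
    """
    full = (1 << width) - 1             # all cells on the scan line
    gate = poly & diff                  # intersection: transistor gate area
    either = poly | diff                # union
    not_poly = ~poly & full             # negation, limited to the line width
    diff_only = diff & ~poly & full     # difference: diffusion not under poly
    return gate, either, not_poly, diff_only
```

With `poly = 0b0110` and `diff = 0b1100` on a four cell line, the gate area is `0b0100`,
the single cell where the two masks intersect.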
[Figure 2 blocks: Mask Bits In; Derived Mask Generator; Line Buffer; Width Check;
Boundary Check; Shrink Bits; Width Errors; Overlap Errors; Bound Errors.]
Figure 2: Local Area DRC Hardware
The mask bits input to the local area DRC unit are fed into the derived mask
generator, which uses a custom chip to perform boolean operations on them. The derived
masks are fed into the line buffer, which uses standard memory parts to buffer preceding
lines. Each custom chip in the width check unit receives a mask from the derived mask
generator along with the corresponding bits in the preceding three lines saved in the line
buffer. The width check chips each output two width error lines for the input mask and also
output the result of a shrink operation on the mask. This operation can also be used for
mask expansion. The shrink output is fed back around to the derived mask generator,
allowing multiple shrink operations to be done. The boundary check chips each require bits
from the current and preceding lines of two derived masks and produce two error outputs.
The final group of error lines are output directly from the derived mask generator. These are
useful for situations such as overlap tests, where errors are found by subtracting one mask
from another. The algorithms and architectures used for the width check and boundary
check chips are described in sections 3 and 4, respectively.
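
The role of the line buffer can be illustrated with a small generator that keeps the
preceding lines of a mask and presents square windows to a checking function. This is a
sketch under assumed conventions: the name `windows` is hypothetical, each scan line is a
list of bits, and a window is produced only once enough previous lines are buffered.

```python
from collections import deque


def windows(scan_lines, size=4):
    """Yield (x, y, window) for every size x size window of the mask.

    y is the index of the newest scan line in the window and x is the
    window's leftmost column; the buffer holds the size - 1 previous lines.
    """
    buffer = deque(maxlen=size - 1)
    for y, line in enumerate(scan_lines):
        if len(buffer) == size - 1:
            rows = list(buffer) + [line]
            for x in range(len(line) - size + 1):
                yield x, y, [row[x:x + size] for row in rows]
        buffer.append(line)
```

With `size=4` the buffer holds three previous lines, matching the three lines of each
mask that the text says are saved for the 4x4 windows.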
2.4 Timing Estimate
At this point, we have enough information to estimate the speed of the design rule
check hardware. The basic data rate is determined by the rate at which the rasterizer
sequences through the positions on a scan line. Assuming that data is buffered at
appropriate places, the most complex operation that must be done during a single cycle is a
memory read and write in the line buffers. It is reasonable to assume that this can be done
in 200ns. Using a grid size of 1/2 λ, and given the hardware configuration in figure 2, the
Mead/Conway design rules can be checked in no more than five passes through the design
rule checker.¹⁰ Assume that the chip being checked is 3000λ x 3000λ, which is 295 mils on a
side if λ equals 2.5 microns. Further assume that there is a 50% overhead resulting from
overlapping strips and delays between scan lines. The equation below gives the time
needed for a complete design rule check.
\[ \frac{(3000\lambda)^2 \cdot 200\,\mathrm{ns}}{(\tfrac{1}{2}\lambda)^2} \cdot 5 \cdot 150\% = 54\ \mathrm{seconds} \qquad (2.1) \]
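
The arithmetic behind this estimate can be checked with a short calculation. The function
name `drc_seconds` and its parameter names are hypothetical; the default values are the
ones stated in the text (a 3000λ side, a 1/2 λ grid, a 200 ns cycle, five passes, and a 50%
overhead).

```python
def drc_seconds(side_lambda=3000, grid_lambda=0.5, cycle_s=200e-9,
                passes=5, overhead=1.5):
    """Time for a complete check: one cycle per grid cell, per pass."""
    cells = (side_lambda / grid_lambda) ** 2     # 6000 x 6000 grid cells
    return cells * cycle_s * passes * overhead


def side_in_mils(side_lambda=3000, lambda_microns=2.5):
    """Chip side in mils (1 mil = 25.4 microns)."""
    return side_lambda * lambda_microns / 25.4
```

One pass over the 36 million grid cells takes 7.2 seconds, so five passes with the 50%
overhead give the 54 seconds quoted above.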
Of course, the controlling processor will not necessarily be able to provide data to the
rasterization unit that quickly. Clark Baker's raster scan DRC program takes 49 seconds
simply to read the instantiated rectangle file for a chip that size, which contains about
100,000 rectangles.² This would have to be done once for each pass through the design
rule checker. Experience indicates that it would be much faster to instantiate the chip on
the fly from a hierarchical description, rather than read it from a large disk file. Further
research must be done to find ways of quickly getting data into the rasterization unit. If
software instantiation algorithms are not fast enough, special purpose hardware could be
designed to speed that up as well.
3. Width Checking and Feature Shrinking
The most frequent operation in integrated circuit design rules is minimum width
checking. Polysilicon, diffusion, metal and contact cut masks all have their own minimum
feature size. Spacing checks are simply width checks performed on the complement of a
mask, so spacings between features on the same or different masks can be checked in the
same way that widths are checked.
This section starts by describing how to check for width errors in mask features which
are orthogonal and are small in comparison to the grid size. Then the method is expanded
to handle edges at 45° angles. A feature shrinking operation is introduced to allow checking
of greater widths. This operation can also be used to expand masks. Next, a custom chip is
described that implements these operations. Finally, a notation is developed for specifying
width checks and shrink or expand operations.
3.1 Orthogonal Width Checking
Width checks require looking at features in a window which is one greater in size than
the width that is being checked. So, a 3x3 window is needed to verify that a mask is at least
two units wide and a 4x4 window is needed to check for width three. Figure 3 illustrates the
set of patterns that implement these checks. A one indicates a position where the mask is
required to be present, a zero indicates a position where the mask is required to not be
present, and a dash represents a don't-care position. The mask passes the width 2 test if it
matches one of the four orthogonal rotations of one of the patterns in figure 3a. The mask
passes the width 3 test if it also matches one of the four orthogonal rotations of one of the
patterns in figure 3b.
Figure 3a: Valid 3x3 Windows    Figure 3b: Valid 4x4 Windows
The patterns in figure 3a are correct because a mask fails the width 2 check if and
only if it contains a feature of width 1. The patterns in figure 3a check whether the center
cell is part of a feature of width 1. If a mask matches the first pattern in figure 3a, then the
mask has width zero at this point. If it matches the second pattern, it is part of the corner of
an area which is at least 2 units wide. If it matches the third pattern, then it is part of the
center or edge of an area which is at least 2 units wide. Therefore, if it does not match any
of these patterns, it must be part of a feature which is only one unit wide.
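
The width 2 test can be sketched in software as a pattern match over 3x3 windows. The
three patterns below are an assumed reconstruction from the prose description (mask absent
at the center; corner of a 2x2 area with a zero at the opposite corner; edge or interior of
a 2x3 area) and are not copied from figure 3a. The function names and the convention that
cells outside the layout are empty are likewise assumptions.

```python
# Assumed patterns: '1' = mask required, '0' = mask forbidden, '-' = don't care.
PATTERNS = [
    ("---", "-0-", "---"),   # mask absent at the center cell
    ("11-", "11-", "--0"),   # center is the corner of an area >= 2 units wide
    ("111", "111", "---"),   # center is on the edge or interior of such an area
]


def rotate(p):
    """Rotate a square pattern by 90 degrees."""
    n = len(p)
    return tuple("".join(p[n - 1 - c][r] for c in range(n)) for r in range(n))


def all_rotations(p):
    out = []
    for _ in range(4):
        out.append(p)
        p = rotate(p)
    return out


def matches(window, pattern):
    return all(pc == "-" or int(pc) == wc
               for prow, wrow in zip(pattern, window)
               for pc, wc in zip(prow, wrow))


def width2_errors(mask):
    """Return (x, y) of every cell whose 3x3 window matches no valid pattern."""
    h, w = len(mask), len(mask[0])

    def cell(y, x):
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0

    valid = [r for p in PATTERNS for r in all_rotations(p)]
    errors = []
    for y in range(h):
        for x in range(w):
            win = [[cell(y + dy, x + dx) for dx in (-1, 0, 1)]
                   for dy in (-1, 0, 1)]
            if not any(matches(win, p) for p in valid):
                errors.append((x, y))
    return errors
```

Under these assumed patterns, every cell of a wire one unit wide is reported, a wire two
units wide passes, and a diagonal neck between two areas is reported at its center cell
because of the zero in the corner pattern.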
The justification for the patterns in figure 3b is similar. They check whether the center
2x2 box is part of a feature which is greater or less than 2 units wide. If there is a zero in
one of the four center cells, then the mask is less than two units wide at that position. If the
mask matches the second or third patterns, then the present position is the corner, edge, or
interior of an area which is at least three units wide. If none of these patterns are matched,
then there must be a feature of width 2 at this position. If a pattern in figure 3a and a
pattern in figure 3b are matched, then the mask must be either less than 1 or greater than 2
units wide. This is the test that is needed for the width 3 check.
Figure 4 gives examples of checking for width 2 and width 3. The edges and corners
of the rasterized rectangles must fall exactly on the rasterization grid for the width check to
work correctly. Rounding is not permitted. Each example contains an error which is marked
by an X. The 3x3 and 4x4 windows show the mask pattern around the error positions. It
should be noted that if the zero were not present in the corner patterns in figure 3, the
errors below would not be detected.
Figure 4a: Width 2 Example    Figure 4b: Width 3 Example
There is some question as to whether the pattern in figure 4b is actually an error. The
angular distance across the stricture is $2\sqrt{2}$, which will be referred to as 2 diagonal units
or 2 diags. This distance is approximately equal to 2.83 orthogonal units. This is only 6%
less than the required width of 3 units, and in some fabrication processes it could be
considered sufficiently close as to not be an error. To retain the greatest degree of
generality it must be possible to select whether or not this case represents a width 3 error.
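The arithmetic behind the 2.83 unit and 6% figures can be reproduced with a short sketch. The helper names `diags_to_units` and `shortfall` are illustrative and do not come from the paper; the only assumption is that one diag equals the diagonal of a unit grid cell.

```python
import math

# One "diag" is taken to be the diagonal of a unit raster cell (assumption
# consistent with 2 diags being about 2.83 orthogonal units).
DIAG = math.sqrt(2)

def diags_to_units(diags):
    """Convert a width measured in diags to orthogonal grid units."""
    return diags * DIAG

def shortfall(measured_units, required_units):
    """Fraction by which a measured width falls short of the required width."""
    return 1.0 - measured_units / required_units

width = diags_to_units(2)
print(round(width, 2))                   # 2.83
print(round(100 * shortfall(width, 3)))  # 6 (percent, rounded)
```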
3.2 Angled Width Checking
Figure 4 gave an example of measuring the width of a feature along a 45° angle rather
than orthogonally. Now we will consider how layouts that include 45° angles may be
checked for widths 2 and 3. To be checked correctly, all edges must fall on the grid
illustrated in figure 5a. Figures 5b and 5c show two more cases that must be allowed in
order to do width 2 and 3 checks on a mask with edges at 45° angles.
              1 0 0           1 1 0 0
              1 1 0           1 1 1 0
              1 0 0           1 1 1 0
                              1 1 0 0
Figure 5a: 45 Degree Grid    Figure 5b: 3x3 Angled Corner    Figure 5c: 4x4 Angled Corner
The rules for rasterizing 45° edges are very simple. When a width check is going to
be performed, all partially covered raster cells are filled in. When a spacing check is going
to be performed, all partially covered cells are marked empty. Figure 6 shows a width check
being performed on a wire which contains a 45° bend. When the wire is rasterized, the 45°
edges become stairsteps which touch actual edges of the wire at each step. The result is
that the rasterized wire is at every point greater than or equal to the correct width, allowing a
width check to be done without reporting spurious errors. Since the rasterized wire has the
correct width once at each stair step, any genuine errors will be discovered. The different
rasterization rule for spacing checks ensures that all spacings are greater than or equal to
the correct value. The X in figure 6 marks the position which is illustrated in the window at
the right. If the angled corners in figure 5 were not added to the set of valid patterns, this
would be reported as an error.
Figure 6: Angled Width 2 Example
Figure 7 gives an example of checking an angled wire of width 3. The width of the
angled portion of the wire is 2 diags, or 2.83 units. The upper left X and the upper left
window illustrate that this will be detected as a width 3 error unless the pattern in figure 4b
has been specified as valid. The lower right X and the lower right window show an example
of the 4x4 angled corner from figure 5c.
Figure 7: Angled Width 3 Example
Allowing the angled corner patterns in figure 5 causes a problem which is illustrated in
figure 8. These patterns cannot be distinguished from angled corners. As a result, these
errors are not detected if angled corners are allowed. A strong case can be made that these
errors are insignificant. However, it is possible to detect these errors and still allow angled
corners by doing two width checks. The first check allows angled corners and the second
does not. If an error occurs in the second width check that did not occur in the first, and
that position is not the corner or edge of an angled box or wire, then it must be one of the
errors in figure 8.
Figure 8: Undetected Errors
The inclusion of 45° edges can result in some undesirable degenerate cases. Figure
9 shows what happens when 45° edges are coincident or within 1/2 diag of each other.
Figure 9a illustrates a zero width angled wire which is rasterized for a width check and two
angled wires with coincident edges which are rasterized for a spacing check. In the first
case, a zero width line causes raster cells to be filled in. In the second case, unwanted
holes appear along touching edges. The remedy for these problems is to require the
rasterization algorithm to detect coincident 45° edges and either ignore the feature or fill in
the raster cell at that point, depending on which of the cases in figure 9a has been detected.
Figure 9a: Coincident Edge Degenerate Cases 
Figure 9b. Near Edge Degenerate Cases 
Figure 9b shows what happens when 45° edges fall 1/2 diag apart. In the first case,
a wire which is only 1/2 diag wide disappears completely upon a spacing check
rasterization. If two angled wires have edges that fall 1/2 diag apart, the gap disappears
during a width check rasterization. These problems can be solved by doing both a width 2
check and a spacing 2 check on each mask that may have 45° edges. Checking the
structure on the left, for example, will cause a width 2 error to be reported, even though the
spacing error that might exist will not be reported. The only complication comes with a
mask such as depletion mode implant, which does not have any minimum width or spacing
rules. One way to solve this problem is to impose width 2 and spacing 2 rules on this layer
and accept the spurious errors that might result. Most of the spurious errors may be filtered
out by having the rasterization algorithm report points where 45° edges are 1/2 diag apart.
Any legitimate error will be recognized as such by both checks. It should be noted that this
is only necessary if 45° edges are allowed in the implant mask. If 45° edges are used for
interconnect only and not for transistor or implant areas, then this case will not arise.
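The two rasterization rules and the degenerate cases described above can be illustrated with a minimal model. This is a sketch under stated assumptions, not the paper's rasterizer: an angled wire is modeled as the band a <= x - y <= b with integer a and b (so its width is (b - a)/2 diags), raster cell (x, y) is the unit square with lower left corner (x, y), and the names `rasterize_band` and `filled_diagonals` are hypothetical.

```python
def rasterize_band(a, b, size, mode):
    """Rasterize the 45-degree band a <= x - y <= b onto a size x size grid.

    Inside cell (x, y) the quantity x - y ranges over (d - 1, d + 1), d = x - y.
    mode "width":   every cell the band passes through is filled in.
    mode "spacing": only cells lying completely inside the band are filled in.
    """
    grid = []
    for y in range(size):
        row = []
        for x in range(size):
            d = x - y
            if mode == "width":
                filled = (d + 1 > a) and (d - 1 < b)
            elif mode == "spacing":
                filled = (d - 1 >= a) and (d + 1 <= b)
            else:
                raise ValueError("mode must be 'width' or 'spacing'")
            row.append(int(filled))
        grid.append(row)
    return grid

def filled_diagonals(grid):
    """Return the sorted x - y values of all filled cells."""
    return sorted({x - y
                   for y, row in enumerate(grid)
                   for x, bit in enumerate(row) if bit})

# A wire 1 diag wide keeps one staircase diagonal under the spacing rule
# and grows to three under the width rule.
print(filled_diagonals(rasterize_band(0, 2, 6, "width")))    # [0, 1, 2]
print(filled_diagonals(rasterize_band(0, 2, 6, "spacing")))  # [1]
# Degenerate cases: a 1/2 diag wire vanishes in a spacing rasterization,
# and a zero width wire still fills cells in a width rasterization.
print(filled_diagonals(rasterize_band(0, 1, 6, "spacing")))  # []
print(filled_diagonals(rasterize_band(0, 0, 6, "width")))    # [0]
```

In this model, two wires whose edges coincide at x - y = 0 leave diagonal 0 empty under the spacing rule, which corresponds to the unwanted holes along touching edges.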
Figure 9 also illustrates how the rasterization algorithm deals with acute angles. An
inside acute angle will always be detected as a width error and an outside acute angle (an
acute angle on the complement of the mask) will always be detected as a spacing error. It
would be possible to recognize the occurrence of acute angles and not flag them as errors,
but there is little point in doing this. It is not possible to accurately pattern an acute angle
onto silicon, so it is not very useful to include one in a layout.
3.3 Feature Shrinking
The algorithms developed above allow mask features to be checked for width 1 errors
or width 2 errors using 3x3 and 4x4 windows, respectively. Larger windows could be used
to check for greater widths. However, the complexity of the width checks goes up rapidly as
the size of the window increases. Also, there is no fixed limit to the widths that will need to
be checked. What is needed is a way of making the width check function modular. This
can be done by introducing feature shrinking.
The goal of the feature shrink operation is to reduce the size of the features in a mask
so that the width 2 or width 3 check can be used. Feature shrinking is done by passing a
3x3 window over the selected mask and producing an output which is one or zero
depending on whether the mask matches a specified pattern, as for the width 2 check. The
difference is that the output is used as another rasterized mask rather than as an error
indication. Figure 10 illustrates the four patterns that are used. The orthogonal shrink and
angled shrink patterns are discussed below. The shrink/expand and null shrink patterns are
discussed at the end of this section. Note that any of the four rotations of the
shrink/expand pattern constitute a match.
orthog shrink    angled shrink    shrink/expand    null shrink
Figure 10: Patterns for Feature Shrink
If the orthogonal shrink pattern is applied to a mask, the resulting output will be a one
whenever the center of the pattern is in the inside of a feature of width 3 or more. If it is on
the edge or corner of a mask feature or is outside, the output will be zero. The result of this
is a mask which is the same as the input mask except that all of its horizontal and vertical
edges have receded by one unit and all of its angled edges have receded by one diag. Any
parts of the mask that are less than three units wide will disappear completely. The angled
shrink is almost the same, except that it causes 45° edges to recede by only 1/2 diag. This
is illustrated in figure 11. The orthogonal portion of the wire starts out 4 units wide and the
45° portion starts out 3 diags wide. After either shrink, the orthogonal portion of the wire is
2 units wide. After the orthogonal shrink the angled portion is only 1 diag wide, which is a
width 2 error, but after the angled shrink it is 2 diags wide. The name of the angled shrink
pattern refers to the fact that it does not reduce angular widths as much as the orthogonal
shrink.
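The widths quoted for figure 11 can be reproduced with a small sketch. It rests on assumptions that the figure itself would have to confirm: the orthogonal shrink pattern is taken to be a full 3x3 block of ones, the angled shrink pattern is taken to be the plus-shaped five-cell neighborhood, cells outside the grid count as empty, and a wire of n staircase diagonals is read as (n - 1)/2 diags wide. All names are illustrative.

```python
# Assumed window patterns, given as (dy, dx) offsets from the center cell.
ORTHOG = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
ANGLED = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def shrink(mask, pattern):
    """Output 1 where every cell named by the pattern is 1 in the mask."""
    height, width = len(mask), len(mask[0])

    def cell(y, x):
        inside = 0 <= y < height and 0 <= x < width
        return mask[y][x] if inside else 0  # outside the grid counts as empty

    return [[int(all(cell(y + dy, x + dx) for dy, dx in pattern))
             for x in range(width)] for y in range(height)]

def diagonal_wire(low, high, size):
    """Rasterized 45-degree wire: cells whose x - y lies in [low, high]."""
    return [[int(low <= x - y <= high) for x in range(size)]
            for y in range(size)]

def diagonals(mask):
    """Sorted x - y values of the filled cells."""
    return sorted({x - y
                   for y, row in enumerate(mask)
                   for x, bit in enumerate(row) if bit})

# Orthogonal wire, 4 units wide (rows 2 to 5): either shrink leaves 2 rows.
horizontal = [[int(2 <= y <= 5)] * 10 for y in range(8)]
print([row[5] for row in shrink(horizontal, ORTHOG)])  # [0, 0, 0, 1, 1, 0, 0, 0]

# Angled wire, 3 diags wide (7 staircase diagonals).
angled = diagonal_wire(0, 6, 12)
print(diagonals(shrink(angled, ORTHOG)))  # [2, 3, 4]        -> 1 diag wide
print(diagonals(shrink(angled, ANGLED)))  # [1, 2, 3, 4, 5]  -> 2 diags wide
```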
Figure 11: Feature Shrink Example
The local area DRC architecture in figure 2 shows that the shrink outputs from the
width check chips are fed back into the derived mask generator for combination with other
masks. Therefore, a single mask may be shrunk or expanded as many times as there are
width check chips, reducing its width by 2 units each time. This makes it possible to check
arbitrarily large widths using only the width 2 and width 3 checks. Since mask features of
width less than 3 disappear during the shrink operation, a width 3 check must be performed
each time a mask is shrunk. The same patterns that are used to shrink masks can also be
used to expand masks, since the expansion of a mask is simply the complement of the
shrink of the complement of the mask.
The patterns in figure 12 describe double shrink operations. Shrinking a mask twice
using the orthogonal or angled shrink patterns is equivalent to shrinking it once using one of
the four double shrink patterns. The two center patterns demonstrate that the order in
which a sequence of orthogonal and angled shrinks are done is unimportant because the
two shrink operations are commutative. The patterns in figure 12 can also be interpreted as
the result of expanding a mask which had a one in the center of the window. Orthogonal
expansions produce squares, angled expansions produce angled squares, and a combination
of the two produces octagons. This shows that the degree of orthogonal and angular
shrinkage or expansion can be selected independently.
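The expansion rule (complement, shrink, complement) and the shapes produced by double expansion can be checked with a short sketch. It assumes, as an interpretation of figure 10, that the orthogonal pattern is the full 3x3 block and the angled pattern is the plus-shaped neighborhood; the function names and the `outside` parameter are illustrative.

```python
# Assumed window patterns, given as (dy, dx) offsets from the center cell.
ORTHOG = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
ANGLED = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def shrink(mask, pattern, outside=0):
    """Output 1 where every cell named by the pattern is 1 in the mask."""
    height, width = len(mask), len(mask[0])

    def cell(y, x):
        inside = 0 <= y < height and 0 <= x < width
        return mask[y][x] if inside else outside

    return [[int(all(cell(y + dy, x + dx) for dy, dx in pattern))
             for x in range(width)] for y in range(height)]

def complement(mask):
    return [[1 - bit for bit in row] for row in mask]

def expand(mask, pattern):
    """Expansion as the complement of the shrink of the complement."""
    # Cells beyond the grid are empty, so they are 1 in the complement.
    return complement(shrink(complement(mask), pattern, outside=1))

def area(mask):
    return sum(map(sum, mask))

# A single one in the center of a 7x7 window.
seed = [[int((y, x) == (3, 3)) for x in range(7)] for y in range(7)]

square = expand(expand(seed, ORTHOG), ORTHOG)    # 5x5 square
diamond = expand(expand(seed, ANGLED), ANGLED)   # angled square
octagon = expand(expand(seed, ORTHOG), ANGLED)   # mixed: octagon
print(area(square), area(diamond), area(octagon))  # 25 13 21

# The order of an orthogonal and an angled operation does not matter.
print(octagon == expand(expand(seed, ANGLED), ORTHOG))  # True
```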
orthog-orthog    orthog-angled    angled-orthog    angled-angled
Figure 12: Double Shrink/Expand Patterns
The shrink/expand pattern in figure 10 outputs a one if the cell in the center of the
window is part of a feature of size 2x2 or greater. This does not result in any shrinkage of
the mask. Instead, it removes features of width 1, as if the mask had been shrunk by half a
unit in each dimension and then expanded back again. Figure 13 illustrates using this
operation to find the active area of a transistor using the Mead/Conway design rules [8]. The
layout on the left depicts a depletion mode pullup transistor combined with a butting contact.
Only the polysilicon and diffusion masks are shown. The center figure illustrates the
intersection of these two masks, which includes the unwanted butting contact strip. The
shrink/expand operation results in the mask on the right, which is the real transistor gate
area. An orthogonal shrink followed by an orthogonal expand can be used to remove areas
of width 1 or 2. An angled shrink followed by an angled expand removes features of width 1
or 2 and also rounds off orthogonal corners.
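A minimal sketch of the shrink/expand operation follows, reading "part of a feature of size 2x2 or greater" as "the center cell belongs to at least one fully filled 2x2 block inside the 3x3 window" (the four rotations of the pattern). The function name and the example mask are illustrative; the mask loosely mimics a gate area with a one unit wide strip attached.

```python
def shrink_expand(mask):
    """Keep a cell only if it lies in some completely filled 2x2 block."""
    height, width = len(mask), len(mask[0])

    def cell(y, x):
        inside = 0 <= y < height and 0 <= x < width
        return mask[y][x] if inside else 0  # outside the grid counts as empty

    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # The four rotations: the 2x2 block may extend up/down, left/right.
            keep = any(cell(y, x) and cell(y + sy, x) and
                       cell(y, x + sx) and cell(y + sy, x + sx)
                       for sy in (-1, 1) for sx in (-1, 1))
            row.append(int(keep))
        result.append(row)
    return result

# A 2x2 area with a strip of width 1 attached on the right.
mask = [[0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]]
for row in shrink_expand(mask):
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 1, 1, 0, 0, 0]
# [0, 1, 1, 0, 0, 0]
# [0, 0, 0, 0, 0, 0]
```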
Figure 13: Shrink/Expand Example
Overlap tests can be done by subtracting a shrunk mask from an expanded mask,
using the feedback lines into the derived mask generator. However, a shrunk or expanded
mask cannot be combined with an unchanged mask. The reason is that the shrink operation
displaces the mask that it operates on, since the current mask position is in the corner of the
window and the shrink pattern works on the center. The bit entering the derived mask
generator from a feedback line is not at the same position on the layout as a bit coming
directly from the rasterizer. To combine an unchanged mask with a mask that has been
shrunk or expanded, it is necessary to displace it by the same amount as a shrink or expand
operation. The null shrink pattern in figure 10 does this. The only change it causes in the
input mask is to displace it by the same amount as a shrink or expand. In section 3.4, we
will see that there is another use for the null shrink pattern.
3.4 Width Check/Shrink Chip
The previous sections have defined the functionality that is necessary for the width
check/shrink chip. Figure 14 gives a structure that implements this functionality. All of the
window patterns are implemented in a clocked PLA which has 44 AND terms. Four
successive rows of mask data are input to the chip and the previous three values from each
row are saved so that a 4x4 window is input to the PLA. Five control lines are also input to
the PLA. Their exact functionality is described below. The output lines WIDTH 1 ERROR and
WIDTH 2 ERROR indicate when one or two unit wide mask features are found. SHRINK 1 OUT
and SHRINK 2 OUT are the result of applying the shrink operation selected by the control lines.
The first applies it to a 3x3 window which uses rows 1-3 and the second applies it to a 3x3
window which uses rows 2-4. This allows the width check chip to output two rows of shrunk
mask in parallel. SHRINK 1 OUT can be used as a feedback line to the derived mask
generator. A use for SHRINK 2 OUT will be described at the end of this section.
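The way the saved row values form a 4x4 window can be modeled in software. The sketch below is only a behavioral illustration, not a description of the chip's circuitry: the generator name `windows_4x4` is hypothetical, and the shift registers are assumed to start out holding zeros.

```python
def windows_4x4(rows):
    """Yield the 4x4 window seen by the PLA after each clock tick.

    rows holds four equal-length bit sequences (mask rows 1 to 4).  Each row
    keeps its previous three values, so the window for a row is those three
    saved bits followed by the bit arriving on the current tick.
    """
    history = [[0, 0, 0] for _ in rows]       # saved bits, assumed to start at 0
    for bits in zip(*rows):                   # one new bit per row per tick
        window = [saved + [bit] for saved, bit in zip(history, bits)]
        yield window
        history = [line[1:] for line in window]  # shift: drop the oldest bit

rows = [[1, 0, 1, 1, 0],
        [0, 1, 1, 0, 1],
        [1, 1, 0, 0, 1],
        [0, 0, 1, 1, 1]]
for window in windows_4x4(rows):
    print(window)
```

After the fourth tick the window holds the first four columns of the input rows; each later tick slides it one column to the right.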
[Block diagram: Row 1 through Row 4 each pass through a chain of shift register (SR) bits
into the PLA AND plane (44 terms), together with 5 control lines. Outputs: Width 1 Error,
Width 2 Error, Shrink 1 Out, Shrink 2 Out.]
Figure 14: Structure of Width Check/Shrink Chip
Three of the five control lines specify the kind of width check that is performed.
These inputs are called WIDTH SELECT, ANGLE OK, and CORNER OK. The WIDTH SELECT line
chooses between width 2 and width 3 checking. The WIDTH 2 ERROR output is only nonzero
when width 3 checking is selected. This way, the two width error outputs can be OR'ed
together externally to produce a single width error signal if the separate information is not
needed. The ANGLE OK line controls whether the angled width patterns in figure 4 are
treated as width errors. The CORNER OK line causes the angled corners in figure 5 to be
accepted. The remaining two control lines specify the pattern that is used to produce the
shrink outputs. These inputs are called SHRINK SELECT 1 and SHRINK SELECT 2. Table 1 shows
which shrink pattern is used for each combination of the select lines.
Select 1    Select 2    Shrink Type
true        true        orthog shrink
true        false       angled shrink
false       true        shrink/expand
false       false       null shrink
Table 1: Pattern Selection for Feature Shrink
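Table 1 amounts to a two-input decode of the select lines. As a small illustration, it can be written as a lookup table; the names `SHRINK_TYPE` and `shrink_type` are hypothetical and only restate the table.

```python
# Keys are (SHRINK SELECT 1, SHRINK SELECT 2), values are the Table 1 entries.
SHRINK_TYPE = {
    (True, True): "orthog shrink",
    (True, False): "angled shrink",
    (False, True): "shrink/expand",
    (False, False): "null shrink",
}

def shrink_type(select_1, select_2):
    """Return the shrink pattern chosen by the two select lines."""
    return SHRINK_TYPE[(bool(select_1), bool(select_2))]

print(shrink_type(True, False))   # angled shrink
print(shrink_type(False, False))  # null shrink
```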
For design rules that contain many large widths and spacings, it would be good to be
able to do a double shrink without using feedback lines. Figure 15 shows how several width
check/shrink chips could be combined to do double or even triple shrinks at once, reducing
the width of mask features by 4 or 6 units at a time. Note that the double shrink requires 6
input rows instead of 4 and the triple shrink requires 8 input rows. Also, a double or triple
shrink configuration can be set to do a single or double shrink by specifying a null shrink in
the first column of width check chips. All of the control lines in a single column should be
tied together so that the same type of shrink is done by all chips in the same column.
Figure 15: Single, Double, and Triple Width Check Configurations
3.5 Width Check Specification
Now that a custom chip has been defined that can do small width checks and several
kinds of feature shrinking, it is necessary to define how larger checks are done. This
section shows how to do arbitrarily large width check and feature shrink operations using
multiple width check/shrink chips and then defines a notation for specifying width checks
and shrink operations.
When a width check is done involving one or more width check/shrink chips, all but
the last chip should select width 3 checking. The value of the CORNER OK line should be the
same for all of them. ANGLE OK should be false for all except possibly the last one. The
number of chips which should have ORTHOG low to select angled shrinking and the state of
the ANGLE OK and SELECT lines for the last width check chip depend on the specific width
check that is being performed. Table 2 gives the values to use for doing width checks on
features up to size 6. A width check is determined by the required orthogonal and angular
widths. For example, the sixth line of the table tells how to check for features with a
minimum orthogonal width of 4 units and a minimum 45° width of 3.0 diags, or 4.24 units.
The table is sufficiently large that the pattern should be clear. The final width check is width
2 if the orthogonal width to be checked is even and width 3 otherwise. Specifying ANGLE OK
for the last chip reduces the required diagonal width by 1/2 diag. Changing an orthogonal
shrink into an angled shrink reduces the required angled width by 1 diag. This is because
the orthogonal shrink causes angled edges to recede by 1 diag while the angled shrink
causes them to recede by only 1/2 diag.
orthogonal    angular width      orthogonal    angled    final      final
width         diags    units     shrink        shrink    select     angle ok
----------------------------------------------------------------------------
2             1.5      2.12      0             0         width 2    no
2             1.0      1.41      0             0         width 2    yes
3             2.5      3.54      0             0         width 3    no
3             2.0      2.83      0             0         width 3    yes
4             3.5      4.95      1             0         width 2    no
4             3.0      4.24      1             0         width 2    yes
4             2.5      3.54      0             1         width 2    no
5             4.5      6.36      1             0         width 3    no
5             4.0      5.66      1             0         width 3    yes
5             3.5      4.95      0             1         width 3    no
6             5.5      7.78      2             0         width 2    no
6             5.0      7.07      2             0         width 2    yes
6             4.5      6.36      1             1         width 2    no
6             4.0      5.66      1             1         width 2    yes
Table 2: Control Line Values for Width Checking
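The pattern behind Table 2 can be written down as a formula. The sketch below is an inference from the rules stated in the text (final check of width 2 for even widths and width 3 otherwise, an orthogonal shrink costing 2 diags of angular width, an angled shrink costing 1 diag, and ANGLE OK relaxing the result by 1/2 diag); the base values of 1.5 and 2.5 diags for the final width 2 and width 3 checks are read off the first rows of the table. The function name and dictionary keys are illustrative.

```python
import math

def width_check_settings(orth_width, angled_shrinks=0, angle_ok=False):
    """Return shrink counts and required angular width for one width check."""
    if orth_width < 2:
        raise ValueError("the smallest check is width 2")
    final_select = 2 if orth_width % 2 == 0 else 3
    total_shrinks = (orth_width - final_select) // 2   # each shrink removes 2 units
    if not 0 <= angled_shrinks <= total_shrinks:
        raise ValueError("angled_shrinks must be between 0 and %d" % total_shrinks)
    orthogonal_shrinks = total_shrinks - angled_shrinks
    # Final check alone needs (final_select - 0.5) diags; every orthogonal
    # shrink adds 2 diags and every angled shrink adds 1 diag.
    diags = (final_select - 0.5) + 2 * orthogonal_shrinks + angled_shrinks
    if angle_ok:
        diags -= 0.5
    return {
        "orthogonal_shrinks": orthogonal_shrinks,
        "angled_shrinks": angled_shrinks,
        "final_select": final_select,
        "angle_ok": angle_ok,
        "diags": diags,
        "units": round(diags * math.sqrt(2), 2),
    }

# Sixth line of Table 2: orthogonal width 4, angular width 3.0 diags.
print(width_check_settings(4, angled_shrinks=0, angle_ok=True))
```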
Now it is possible to define a notation for specifying width checks and shrink
operations. W2(mask) represents a width 2 check, that is, errors are reported for features of
size 1. Subscripts are used to indicate the required diagonal width. For example,
W4_{3.5}(mask) indicates a check for features with an orthogonal width of 4 units or an angled
width of 3.5 diags. The fifth line of table 2 describes how this width check could be
achieved. Shrink and expand operations are expressed similarly. S1(mask) describes a
shrink operation which causes each edge to recede by one unit. S3_{2.0}(mask) denotes a
shrink of 3 orthogonal units and 2 diagonal units. This requires one orthogonal shrink and
two angled shrink operations. E1_{0.5}(mask) specifies an angled mask expansion and is
equivalent to ¬S1_{0.5}(¬mask). N1(mask) indicates a single null shrink, which displaces the
mask by the same amount as a shrink operation but otherwise leaves it unchanged.
SE(mask) indicates a shrink/expand operation, which removes one unit wide features.
Finally, a superscript on the mask name indicates the kind of rasterization to do. If NP
represents the polysilicon mask, then NP+ specifies rasterization for a width check and NP-
specifies rasterization for a spacing check.
4. Boundary Checking
So far we have seen how to shrink and expand masks and perform width checks on
them. Another type of design rule depends on the relative positions of the edges of two
masks. An example of this is the transistor extension rule, which requires polysilicon or
diffusion to extend beyond the edges of transistor gate areas. This section starts by giving
examples of boundary checks required by the Mead/Conway design rules [8]. Then an
architecture is described for a custom chip that implements these checks. Finally, a notation
is developed for specifying boundary checks.
4.1 Boundary Check Examples
In order to check design rules which involve boundary interactions, it is necessary to
use a 2x2 window to look at two masks simultaneously. Figure 16 shows three examples of
boundary errors involving the polysilicon and diffusion masks. For each case, one or more
2x2 windows are shown which identify the presence of the error. A capital P or D indicates
that the specified mask must be present at that position, a lowercase letter indicates that the
mask must be absent, and a dash represents a don't care position.
Figure 16: Three Boundary Check Examples
The first boundary error involves the polysilicon/diffusion spacing rule, which requires
a 1 λ separation except where they cross to form a transistor. This rule cannot be
completely checked by a width test on the union of the two masks because they might
touch, as shown in the lefthand entry in figure 16. This error occurs whenever a raster cell
which contains polysilicon but not diffusion is next to a cell that contains diffusion but not
polysilicon. The window that is shown is one of four rotations that must be checked. Note
that it is not necessary to check whether the two masks touch at a corner because that case
is detected by a width test on the union of the two masks.
The center entry in figure 16 shows a case where the poly mask fails to extend
beyond the edge of a transistor gate area. As the window illustrates, this error can be found
by looking for a raster cell that contains both polysilicon and diffusion next to a cell that
contains neither polysilicon nor diffusion. Again, all four rotations must be checked.
The righthand entry shows a more subtle transistor error. Here, the extension rule is
satisfied but the transistor is still incorrect. It is necessary to check that polysilicon and
diffusion actually cross the transistor gate area. This can be done by looking at the corners
of the transistor. The two righthand windows in figure 16 check for the errors found at the
upper left and lower right corners of the transistor gate area. It is necessary to check all
four rotations of each window.
4.2 Boundary Check Chip
The structure of the boundary check chip is similar to that of the width check chip in
that row inputs are fed through shift registers into a PLA which checks the input window for
errors. Unlike width checks and shrink operations, there are too many different possible
boundary checks to specify them all in advance. Since it is necessary to be able to program
the boundary check chip for the specific check that is desired, a dynamically programmable
logic array (DPLA) is used instead of the mask programmable logic array used in the width
check/shrink chip. The window patterns programmed into the DPLA are stored in dynamic
nodes and must be refreshed periodically.
Figure 17 gives the structure of the boundary check chip. Two rows each from two
separate masks are input to the chip. One bit of shift register is provided for each input row
so that a 2x2 window on each mask is input to the DPLA. The LOAD SELECT lines control
when data on the DATA BUS is used to program the DPLA, which has eight AND terms. The
first four terms are OR'ed together to produce ERROR 1 and the other four are OR'ed together
to produce ERROR 2. Each of the AND terms in the DPLA specifies a boundary error.
[Block diagram: row inputs (Row 1a, Row 1b, ...), Load Select lines, and a 16 bit Data Bus
feed the DPLA, which produces Error 1 and Error 2.]
Figure 17: Structure of Boundary Check Chip
Normally, the boundary check chip would input two masks directly from the derived
mask generator, but this is not necessary. Figure 18 illustrates an alternate way to use the
boundary check chip that allows the inputs to have a shrink operation applied to them before
being boundary checked. Note that the type of shrink can be selected separately for each
width check chip. This is especially useful when the grid size is small relative to the feature
size, since many masks will have to be shrunk or expanded before they can be compared.
Figure 18: Boundary Check Chip with Pre-shrink
4.3 Boundary Check Notation
A boundary check consists of a set of 2x2 window comparisons performed on two
masks. If one of the window patterns is matched, then a boundary error has been found.
The two masks are referred to as A and B; they are defined by equations involving shrinks,
expands, and boolean operations on masks input from the rasterization unit. Each window
is defined by a conjunction of subscripted terms, where each term is a dash or an upper or
lower case A or B. The subscript defines the position of each term in the 2x2 window and
the choice of character indicates whether mask A or B is required to be present or absent
at that position. For example, if A and B represent the polysilicon and diffusion masks
respectively, the leftmost window in figure 16 would be represented as $A_{11} b_{11} a_{21} B_{21}$.
When specifying a boundary check, it is usually necessary to check for several
rotations or reflections of the basic window pattern. In the examples in figure 16, each
window is one of four orthogonal rotations that must be checked. For clarity, a sequence of
windows that are rotations or reflections of each other can be specified as 4[window] to
indicate that the four rotations of the window should be checked or 8[window] to indicate
that all eight rotations and reflections should be checked. The equations below use this
notation to specify the three boundary checks in figure 16. Equation 4.1a is the fully
expanded form of equation 4.1.
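4.1   A = NP, B = ND: $4[A_{11} b_{11} a_{21} B_{21}]$;
      poly/diffusion spacing
4.1a  A = NP, B = ND: $A_{11} b_{11} a_{21} B_{21}$, $A_{12} b_{12} a_{11} B_{11}$, $A_{22} b_{22} a_{12} B_{12}$, $A_{21} b_{21} a_{22} B_{22}$;
      poly/diffusion spacing
4.2   [equation not legible in the source text]
      transistor edge extension
4.3   A = NP, B = ND: $4[a_{11} b_{11} A_{12} b_{12} A_{21} b_{21} A_{22} B_{22}]$, $4[A_{11} B_{11} a_{12} B_{12} a_{21} B_{21} a_{22} b_{22}]$;
      transistor corner extension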
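The window notation can be exercised in software with a short sketch. It is an illustration of the notation only, not of the DPLA hardware: a window is represented as a dictionary from a zero-based (row, column) position to a pair of requirements for masks A and B (True for present, False for absent, None for don't care), and all function names are hypothetical. The quarter-turn used here reproduces the four terms of equation 4.1a in order.

```python
def rotations(window):
    """Return the four orthogonal rotations of a 2x2 window."""
    result = []
    for _ in range(4):
        result.append(window)
        # Quarter turn: position (row, col) moves to (col, 1 - row).
        window = {(c, 1 - r): need for (r, c), need in window.items()}
    return result

def matches(window, mask_a, mask_b, y, x):
    """True if the window matches with its upper left cell at (y, x)."""
    for (r, c), (need_a, need_b) in window.items():
        for need, mask in ((need_a, mask_a), (need_b, mask_b)):
            if need is not None and bool(mask[y + r][x + c]) != need:
                return False
    return True

def boundary_errors(windows, mask_a, mask_b):
    """List the window positions at which any of the patterns matches."""
    height, width = len(mask_a), len(mask_a[0])
    return [(y, x)
            for y in range(height - 1) for x in range(width - 1)
            if any(matches(w, mask_a, mask_b, y, x) for w in windows)]

# Poly/diffusion spacing: A only at position 11, B only at position 21.
SPACING = rotations({(0, 0): (True, False), (1, 0): (False, True)})

# Polysilicon touching diffusion along an edge: reported as an error.
poly = [[1] * 4, [1] * 4, [0] * 4, [0] * 4]
diff = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]
print(boundary_errors(SPACING, poly, diff))   # [(1, 0), (1, 1), (1, 2)]

# Polysilicon crossing diffusion to form a transistor: no error.
poly_x = [[int(2 <= y <= 3)] * 6 for y in range(6)]
diff_x = [[int(2 <= x <= 3) for x in range(6)] for y in range(6)]
print(boundary_errors(SPACING, poly_x, diff_x))  # []
```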
5. Conclusions
This paper has presented a hardware architecture for a raster scan design rule
checking algorithm. The input to the hardware is the set of intervals that are covered on
each scan line for each mask. The output is a list of positions where each type of error was
found. The hardware can handle a wide variety of design rules, layouts of arbitrary size with
an arbitrary number of layers, and features with edges at 45° angles. The use of custom
chips makes it small and inexpensive enough to include in individual designer workstations.
Furthermore, the hardware is fast enough that the speed of the design rule check is limited
by the speed at which the controlling processor can provide input.
Such a significant speed increase can greatly change the way in which design rule
checking is used. At present, design rule checking is usually done, if at all, as a
postprocessing step after a layout has been mostly completed. In university designs, the
design rule check is often not performed at all. The availability of a really fast design rule
checker makes it easier to check a layout during all phases of the design effort and not just
near the end, when it is hardest to fix any errors that are found.
It should also be noted that the basic architecture of the DRC system is applicable to
a variety of other problems. The rasterization unit could be used to drive raster scan
printing devices or could be used to scan convert images for a display screen. With slight
changes, the architecture could be used to speed up the inner loop of a raster scan node
extraction algorithm [1]. By introducing a different set of primitive window operations, the
architecture could also be used in image processing applications.
Now that the architecture has been defined, an effort is under way to lay out the four
custom chips mentioned in this paper and build a prototype of the design rule check
hardware. A paper in preparation will define how several variations of the Mead/Conway
design rules could be checked using the notations developed in this paper [10].
Acknowledgements
Many thanks are due to Jon Allen, Clark Baker, Lance Glasser, Gary Kopec, Paul
Penfield, and Chris Terman for their ideas and encouragement. Jon Allen deserves special
thanks for his support, in all senses of the word.
Rcfc rcnces 
[1] Baker, C.M., Terman, C., "Tools for Verifying Integrated Circuit Designs," Lambda, the
Magazine of VLSI Design, Volume I, number 3, Fourth Quarter, 1980
[2] Baker, C.M., Massachusetts Institute of Technology, private communication,
December 1980
[3] Baird, H.S., "Fast Algorithms for LSI Artwork Analysis," Proceedings of the 14th Design
Automation Conference, pp. 303-311, June, 1977
[4] Bentley, J.L., Haken, D., Hon, R.W., Statistics on VLSI Designs, CMU-CS-80-111,
Department of Computer Science, Carnegie-Mellon University, April, 1980
[5] Dunn, W., General Instrument Corporation, private communication, November 1980
[6] Locanthi, B., "Object Oriented Raster Displays," Proceedings of the Caltech
Conference on Very Large Scale Integration, pp. 215-225, January, 1979
[7] McGrath, E.J., Whitney, T., "Design Integrity and Immunity Checking: A New Look at
Layout Verification and Design Rule Checking," Proceedings of the 17th Design
Automation Conference, Minneapolis, pp. 263-268, June, 1980
[8] Mead, C., Conway, L., Introduction to VLSI Systems, Addison-Wesley, Massachusetts,
1980, pp. 47-51
[9] Rosenberg, L.M., "The Evolution of Design Automation to Meet the Challenges of VLSI,"
Proceedings of the 17th Design Automation Conference, Minneapolis, pp. 3-11,
June, 1980
[10] Seiler, L., "Formal Definition of the Mead/Conway Design Rules," private
communication, MIT VLSI Memo Series (in preparation)
[11] Whitney, T., A Hierarchical Design Rule Checker, Master's Thesis, California Institute
of Technology (in preparation)
A VLSI TACTILE SENSING ARRAY COMPUTER¹
John E. Tanner
California Institute of Technology
Marc H. Raibert² and Raymond Eskenazi
Jet Propulsion Laboratory
ABSTRACT
Here we describe a device that is at once a special purpose parallel computer and a [...]
that gives a robot manipulation system information about contact between the manipulator
[...] sensors with a custom designed LSI device that handles transduction, computing, and
communication. Large metal electrodes on the surface of the device are placed in
contact with a conductive rubber. Deformations of this elastic material are sensed by
measuring changes in its local resistivity. The sensory architecture eases the problem of
connecting the transducer and computer by using an array of processors to filter and
reduce raw data before communication. Since large arrays covering an entire intact
wafer are planned, the design includes backup redundancy for the computing elements,
and mechanisms for automatic replacement of failed elements.
1. This paper presents the results of one phase of research performed at the Jet Propulsion Laboratory, California
Institute of Technology, sponsored by the National Aeronautics and Space Administration under contract
NAS7-100.
2. Current address is Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213.
INTRODUCTION

Versatile computer controlled manipulation depends heavily upon the availability of
perceptual data generated from interactions between the manipulator and its environment,
and upon computational elements that convert these data into useful control information.
The sense of touch is a potential source of such data, but existing tactile sensing
technology falls far short of our requirements. Furthermore, techniques for mating
transducer with computer -- techniques that are essential to the ultimate success of
[...] sensory devices -- have matured quite slowly.

Low resolution tactile sensors have been made using an array of surface electrodes on a
passive substrate covered with a pressure sensitive elastic material [Bejczy 78, Briot
79]. Large, dense arrays were not feasible with these methods, not because small
electrodes were hard to make, but because a separate wire connected each electrode
to the sensor electronics. The package of sensor electronics itself tended to be bulky,
making it hard to install near the business end of a manipulator. Furthermore, the
electronics only transformed and multiplexed raw data, leaving filtering, interpretation,
and computation to be accomplished elsewhere.

Here we describe a novel design that uses custom microelectronics to overcome these
problems. We are constructing a high resolution tactile sensing array by designing a
special VLSI circuit that performs three basic functions:
Transduction -   The exposed surface of the circuit contains sets of electrode
                 pads, much like bonding pads, that make direct contact with a
                 layer of pressure sensitive elastic material. The surface nature
                 of this interaction is key to the integration of sensor and
                 computer. It attempts to overcome the pin limitation problem
                 that usually limits parallel computing in integrated circuits by
                 using a two-dimensional array of inputs.
Computation -    Circuits in the device form an array of computational elements,
                 one associated with each set of surface electrodes. Each
                 element performs simple arithmetic operations and local
                 communication functions with neighbors, while together they
                 form a versatile parallel 'image' processor that performs
                 discrete two-dimensional (2D) convolutions.
Communication -  Use of a distributed shift register allows all outputs from the
                 device to be communicated over a single wire.
                 Operation of the shift register causes the entire state of the
                 sensor to be transmitted over this compact channel.
Our strategy in using tactile array data is to treat them as images, and to apply the
processing techniques that have proven useful in dealing with other types of images,
especially visual images.

Computer vision techniques rely on computing the 2D convolution between an image and sets of
pre-specified feature masks. Such a convolution is given by:
$$C(x,y) = \sum_{i=1}^{N} \sum_{j=1}^{N} P(x+i-1,\; y+j-1)\, M(i,j)$$
where P(x,y) is the value of pressure on a cell at (x,y), and M(i,j) is the corresponding
value in an N x N filtering mask. To perform this calculation it is necessary to index the
[...] masks permit identification and location of lines, edges, and other contours important to
[...]

An important feature of convolution is that it can be efficiently implemented by an array
of processors, each performing the same calculation as every other processor in the
array, all operating simultaneously, and each only requiring data from its own local
neighborhood of the image. Therefore we have implemented an array processor that
performs 2D convolution between programmable masks and tactile images.
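
As an illustration of why this calculation suits an array of identical processors, the following minimal Python sketch evaluates C(x,y) = sum over i,j of P(x+i-1, y+j-1) M(i,j) one cell at a time, using only data in the cell's own neighborhood. The function names and the treatment of cells outside the array as zero are assumptions made for the sketch; they are not taken from the device itself.

```python
def convolve_cell(P, M, x, y):
    """Return C(x, y) using 1-based indices for x, y, i and j.

    P[x-1][y-1] holds the pressure at cell (x, y); M[i-1][j-1] holds M(i, j).
    Cells outside the array are treated as zero (an assumption of this sketch).
    """
    n = len(M)
    total = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            px, py = x + i - 1, y + j - 1
            if 1 <= px <= len(P) and 1 <= py <= len(P[0]):
                total += P[px - 1][py - 1] * M[i - 1][j - 1]
    return total


def convolve(P, M):
    """Evaluate every cell independently, as an array of processors would."""
    return [[convolve_cell(P, M, x, y) for y in range(1, len(P[0]) + 1)]
            for x in range(1, len(P) + 1)]


if __name__ == "__main__":
    pressure = [[1, 1, 0],
                [1, 1, 0],
                [0, 0, 0]]
    mask = [[1, -1],
            [1, -1]]
    print(convolve(pressure, mask))
```

Each call to convolve_cell reads only the cell's own value and those of nearby cells, which is the locality property the array processor relies on.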
WHAT WE DID

Figure 1 illustrates the physical layout of the tactile sensor. An array of electrodes is
covered by a sheet of pressure sensitive conductive rubber. This elastic material has
the property that deformation causes its sheet resistivity to change in a predictable
fashion. By passing a small test current from a pair of surface electrodes through the
material, the magnitude of surface pressure can be measured locally.
[Figure 1 labels: computing elements; silicon substrate; pressure sensitive elastic material; surface metal electrodes]
Figure 1. Mechanical architecture of touch sensor. A layer of pressure
sensitive rubber is placed in contact with VLSI wafer. The surface of the
wafer is covered with large sensing electrodes that are connected to circuitry
that makes up an array processor within the wafer. Communication with the
device takes place over a few serial lines.
In previous implementations the electrodes shown in Figure 1 were placed on a
passive epoxy or ceramic substrate that served only for mechanical support [Bejczy 78]. In
our design, however, a surface layer of metal is placed on a silicon wafer to form these
electrodes, while conventional nMOS circuitry below a protective glass layer forms the
computing elements needed to implement transduction, filtering, data reduction, and
[...] silicon. This is shown in Figure 2.

Figure 3 shows the overall architecture of the tactile sensing computer. An array of cells
that transduce and compute are each connected to their nearest neighbors and to a
global control bus. This global bus is driven externally to provide power, clock,
instructions, and voltage reference to the elements of the array. The tactile cells either
sense pressure, compute or communicate, with the whole array performing in lockstep.

The computing units are simple but powerful processors that participate cooperatively to
implement the algorithms required for tactile image sensing and analysis. Figure 4 shows
a block diagram of a computing unit. Each unit contains its own set of transduction
[Figure 2 labels: A) side view, uncompressed: pressure sensitive rubber, R = 1000 Ω, over-glass, active circuitry, silicon substrate of wafer. B) side view, under force: pressure sensitive rubber, R = 100 Ω, over-glass, active circuitry, silicon substrate of wafer]
Figure 2. A) The sheet resistivity of the conductive elastic material is
measured locally by metal electrodes that make contact through cuts in the
overglass. B) When the material is compressed, carbon particles become more
densely packed, decreasing resistivity.
[Figure 3 labels: sense (one per cell); local data bus]
Figure 3. Block diagram of array processor. Each tactile cell has a sensing
part and a computing part. A global control bus provides all cells with power,
synchronization signals, and instructions. Cells are locally connected to
nearest neighbors.
electrodes, an analog to digital converter, a latch, a simple arithmetic logic unit and an
instruction register. The analog part converts the variable resistance of the conductive
rubber into a 1-bit digital value that corresponds to the pressure on the cell. An
adjustable global reference voltage allows the threshold of this analog-to-digital
conversion to be varied. The result is stored locally in a latch, and can, under external
program control, be made available to any of the four nearest neighbors. The latched
data can also be multiplied by a number obtained from the global control bus, with the
result accumulated in a 6-bit register using two's complement arithmetic. The contents
of this 6-bit accumulator can be shifted and rotated in various ways.
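
The datapath just described can be pictured with a small word-level model. The Python sketch below is only illustrative: the function names are hypothetical, the threshold is expressed as a resistance rather than a reference voltage, and the polarity (a compressed, low-resistance cell reads as 1) is an assumption. The 6-bit two's complement wraparound follows the accumulator width given in the text.

```python
def sense(resistance_ohms, threshold_ohms):
    """1-bit conversion: 1 when the rubber is compressed enough that its
    resistance falls below the threshold (polarity is an assumption)."""
    return 1 if resistance_ohms < threshold_ohms else 0


def wrap6(value):
    """Reduce an integer to 6-bit two's complement, range -32..31."""
    value &= 0x3F
    return value - 64 if value & 0x20 else value


def multiply_accumulate(accumulator, latch_bit, mask_value):
    """Add latch_bit * mask_value to the accumulator with 6-bit wraparound."""
    return wrap6(accumulator + latch_bit * mask_value)


if __name__ == "__main__":
    latch = sense(resistance_ohms=100, threshold_ohms=500)  # compressed cell
    acc = 0
    acc = multiply_accumulate(acc, latch, 5)
    acc = multiply_accumulate(acc, latch, -5)
    print(latch, acc)
```

Because the latch holds a single bit, each multiplication reduces to either adding the mask value or adding nothing, which is why a simple adder suffices in each cell.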
[Figure 4 labels: neighbor N; neighbor E; neighbor S; 1 bit latch; global signals]
Figure 4. Block diagram of computing element.
The instruction register controls all operations of the computing element. The eight bits
of the instruction are transmitted serially over a single global instruction line to the
instruction register of each cell in the array. Eight instruction lines, one for each bit in
the instruction register, control the various parts of the computing unit. Appendix A lists
the functions of these instruction lines and gives the mnemonics and coding for the
instruction set.
The computing resources just described allow the cell to sample the local pressure, to
store the value, to pass data to neighboring cells, and to perform computations on the
data that implement a 2D convolution with programmable mask. A simple slip detection
algorithm is also easily implemented using these computational abilities. Example
programs using the instruction set of Appendix A to do convolution and slip detection are
given in Appendix B.

EXTENSIONS TO LARGE ARRAYS

The ultimate target of this project is to construct a 30 by 60 mm array of 1 mm² tactile
elements. This involves a very large area of active silicon compared to conventional
designs, and therefore a much larger risk of fabricating defective circuitry. Normally,
integrated circuits are manufactured by building an array of identical circuits on a silicon
wafer which is subsequently diced into "chips" that are packaged and used separately.
Since each chip is used separately, defective chips can be discarded on an individual
basis in a straightforward manner. The overall yield can be kept high despite the
presence of defective circuitry on the wafer.

Using today's fabrication techniques, a fair number of defects must be expected on a
device that is 1800 mm². The present scheme, however, requires that all tactile sensing
elements function correctly for the complete tactile sensor to operate effectively.
Therefore, measures must be taken to eliminate the effects of these defects. Our
designs provide a spare computing element for each tactile sensing element -- all
sensing and computing circuits are replicated within the tactile cell (Figure 5). A
selector circuit chooses between the pair of redundant computing elements.

We have designed a very simple 12 transistor selector that replaces a failed computing
element with its backup spare. It is vitally important that this selector element be
simple, since no backup is provided for the selector itself. Making it simple reduces its
area and, therefore, the probability of it having a fabrication defect.

The selector is just a latch that changes state whenever its two inputs, both signals
from the primary computing element, are not the same during the selector's strobe pulse.
One input is the transducer output. The other is the computing element's output line. All
sections of the primary computing element can be checked for proper operation by
manipulating the analog reference voltage and performing appropriate computations. If a
failure is detected, the secondary computing element is selected. In this design,
selection takes place simultaneously and automatically for every tactile cell when the
test program is executed by the array. No human intervention or hand testing is
required.
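
The behavior of the selector can be sketched in a few lines of Python. This is a behavioral model only, not the 12 transistor circuit: the class and method names are hypothetical, and the choice to make the switch to the spare permanent once a mismatch has been seen is an assumption of the sketch.

```python
class Selector:
    """Behavioral model of the per-cell selector latch."""

    def __init__(self):
        # False selects the primary computing element, True the spare.
        self.use_spare = False

    def strobe(self, transducer_output, element_output):
        """During the strobe pulse, compare the two signals from the primary
        element; a mismatch switches the cell to its spare (assumed sticky)."""
        if transducer_output != element_output:
            self.use_spare = True

    def output(self, primary_value, spare_value):
        """Pass on the value from whichever element is currently selected."""
        return spare_value if self.use_spare else primary_value


if __name__ == "__main__":
    selector = Selector()
    selector.strobe(transducer_output=1, element_output=1)  # test step passes
    selector.strobe(transducer_output=1, element_output=0)  # test step fails
    print(selector.use_spare, selector.output(primary_value=0, spare_value=1))
```

Since every cell holds its own latch and the same test program is broadcast to all cells, one pass of the program configures the whole array at once.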
[Figure 5 labels: global bus; sensor and digital processor (primary); sensor and digital processor (spare); selector; data out; data to neighbors]
Figure 5. Block diagram of tactile cell with redundancy. Two computing
elements are found within each cell. A selector enables the backup element
when the primary fails a programmed function test.
Using a Poisson model of defect distribution we can estimate the yield of the array
sensor. This model assumes the equal probability of a defect at any point. For a circuit
with no redundancy, a defect anywhere within its area will cause it to fail. The
probability P that a tactile computing element works is then:

$$P = e^{-AD}$$

where A is the area of the circuit and D is the defect density of the fabrication process.
For an array with N tactile elements the array yield will be:

$$\text{Yield} = P^{N} = e^{-ADN}$$

If each tactile cell contains duplicate computing elements and a selector, the probability
$P_A$ that any one tactile cell is good becomes:

$$P_A = P_s - P_s (1-P)^2 + (1-P_s) P$$

where $P_s$ is the probability a selector is good. Here we have assumed that a bad
selector chooses one or the other of the computing elements it controls, rather than
choosing neither or both. Also, the reliability of the global data bus is not considered.
The yield of an array of N tactile cells, each with one backup computing element is then:

$$\text{Yield} = P_A^{N}$$

Figure 6 plots expected yield as a function of array size with defect density of
0.05/mm², selector area of 0.045 mm², and computing element area of 0.90 mm² for both
redundant and non-redundant cells. It can be seen from these plots that arrays
containing 1000 elements are achievable.
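
The yield model can be evaluated numerically. The sketch below is a minimal Python rendering that assumes the element and selector probabilities are both exp(-area x defect density) and that the redundant cell probability is P_s - P_s(1-P)^2 + (1-P_s)P; the function names are illustrative. The parameter values are the ones quoted for Figure 6.

```python
import math


def circuit_yield(area_mm2, defect_density_per_mm2):
    """Poisson model: probability that a circuit of the given area has no defect."""
    return math.exp(-area_mm2 * defect_density_per_mm2)


def cell_yield_with_spare(p_element, p_selector):
    """Probability that a cell with one spare element and a selector is good."""
    return (p_selector
            - p_selector * (1 - p_element) ** 2
            + (1 - p_selector) * p_element)


def array_yield(p_cell, n_cells):
    """Every cell must work for the array to work."""
    return p_cell ** n_cells


if __name__ == "__main__":
    D = 0.05                            # defects per mm^2
    p = circuit_yield(0.90, D)          # computing element
    p_s = circuit_yield(0.045, D)       # selector
    p_a = cell_yield_with_spare(p, p_s)
    for n in (1, 10, 100, 1000):
        print(n, array_yield(p, n), array_yield(p_a, n))
```

With these parameters the non-redundant yield at 1000 cells is negligible, while the redundant design gives a yield of roughly 13 percent under this model.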
[Figure 6 axes: yield (%), 0 to 100, versus number of cells, 1 to 10,000 on a logarithmic scale]
Figure 6. The expected yield of a tactile sensing array is calculated using a
Poisson model of defect distribution. It is plotted here as a function of N, the
number of cells in the array. The dotted curve is the yield with no redundancy.
The solid curve is the yield with one spare computing element per cell as
described in text. Defect density is 0.05/mm², selector area is 0.045 mm² and
computing element area is 0.9 mm².
Figure 7 is a floorplan and photograph of a chip that implements a 1x2 tactile sensing
array computer. In order to see the active circuitry the pressure sensitive material is
not present in this photograph. The chip includes two tactile cells, each of which has
two computing elements and a selector. Bonding wires are attached to conventional
input/output pads along the top and bottom. These are used to test this early prototype.
The large metal areas in the center of each tactile cell are the sensing electrodes. The
size of this chip is about 2 mm x 3 mm.

The first prototype was fabricated as part of MPC580, the Multi-Project Chip effort
coordinated by the Xerox Palo Alto Research Center [Conway 81]. A 3x3 array was
fabricated as part of MOSIS, a similar service coordinated by ISI [Cohen 81]. Both
versions are nMOS with lambda equal to 2.5 micron. Table 1 gives the measured
performance of the 3x3 tactile sensing array.
Pressure Sensitivity       50 grams
Clock Rate                 3 MHz
Instruction Time           3 µsec
6 Bit Add (Multiply)       18 µsec
3x2 Convolution            144 µsec
Power Requirements         5 V
                           10 mA/Cell
TABLE 1. PERFORMANCE OF TACTILE SENSING CHIP
PROBLEMS AND PLANS

An important problem with our present nMOS implementation is the excessive power
requirement for circuits with large active areas. For small arrays, the current of 10
mA/cell is reasonable, but an 1800 element array, our original target, would require 18
amps. We are designing a new version of the tactile sensor in CMOS. This fabrication
technology is expected to become available to us soon through the ISI Implementation
system. A CMOS design should greatly relieve the power supply and dissipation problems
inherent in nMOS.
[Figure 7 labels, repeated for taxel 1 and taxel 2: common electrode; spare processor; selector; sensing electrode; global control and power bus; primary processor]
Figure 7. Floorplan and photograph of a tactile sensing chip produced by
MPC580.
Several variations on the basic transduction scheme are being considered for future
implementations. One of these uses a layer of the fabrication process as a piezoresistive
transducer. This layer could be a custom layer added after the circuit has been
fabricated conventionally. It may be possible to utilize the polysilicon layer as the
pressure sensitive element, since it typically exhibits a slight piezoresistive effect. This
technique is used in some pressure transducers available commercially. Another possible
method of detecting pressure is to use a layer of material that changes its optical
properties with pressure. The circuitry below must then detect the optical difference.
The advantage of this technique is that it allows a greater isolation of the integrated
circuitry from the physical and chemical dangers of the environment.

We also hope to develop other methods of fault tolerance. The information content of an
image is degraded only slightly by the loss of one pixel. In the tactile sensing array if
one cell fails, all the data from other cells that are to be shifted through the bad cell are
lost. This results in an image with a stripe defect. We are developing a more complex
shifting pattern that routes each cell's information toward the edge of the array through
multiple pathways. This new design requires the ALU within each cell to reconstruct
correct values periodically in the shifting pattern by computing the majority function of
[...] Simulations show that this method of
redundant communication gives an array a 40 fold increase in immunity to stripe defects.
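
The majority function mentioned above can be illustrated with a short Python sketch. The number of redundant copies (three here) and the function name are assumptions made for the example; the text does not specify how many pathways the planned design uses.

```python
def majority(bits):
    """Return the bit value held by more than half of the redundant copies."""
    return 1 if sum(bits) * 2 > len(bits) else 0


if __name__ == "__main__":
    # One bit arriving over three hypothetical pathways, one of which
    # passed through a failed cell and delivers a wrong value.
    copies = [1, 1, 0]
    print(majority(copies))
```

With an odd number of copies a single bad pathway is always outvoted, so a failed cell no longer corrupts every value shifted through it.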
Combining transduction with electronic circuitry in an intimate way gives rise to a set of
problems unique to this application of VLSI. The transducer must be in contact with the
environment it is sensing, in this case, the gripping surface of a robot's hand. The
electronics, which in most applications are protected from the environment by a sturdy
sealed package, is here an integral part of the transducer. The silicon substrate must be
able to withstand any forces that the robot's hand is expected to encounter and to resist
any chemical contamination diffusing through the pressure sensitive elastic covering. In
addition, heat sinking must be provided to the substrate to remove the heat produced by
such a large area of active circuitry. Further testing will show how critical these
packaging problems are to producing a practical, useful tactile sensor.

So far we have not thoroughly characterized the sensor's analog response to forces and
deformations. This must be done. The several different types of available pressure
sensitive elastic materials will be tested with a special electrode chip -- one that has
six different electrode geometries. Sensitivity, range, and localization are the variables
[...]
SUMMARY

Combining transduction with computation in the same device is an effective way to use
the power of VLSI technology, and to overcome the traditional sensing problems of
interconnection, communication and computation. The two-dimensional nature of the
integrated circuit nicely matches the surface nature of the tactile sensing problem and
the architecture of array processors. The convolution algorithm used to refine raw image
data exploits the full concurrency of an array processor to achieve high performance.
Wafer scale integration is necessary to build working arrays of useful size, and fault
tolerant techniques are necessary to achieve wafer scale integration. These ideas have
been brought together to start a new generation of tactile sensors for robots.
ACKNOWLEDGEMENTS

The authors wish to thank Glen Okita and Dean Uehara for many early contributions to
this work.
REFERENCES

Bejczy, A. K., "Smart Sensors for Smart Hands," AIAA/NASA Conference on "Smart
Sensors," November 1978, Hampton, VA.
Briot, M., "The Utilization of an 'Artificial Skin' Sensor for the Identification of Solid
Objects," 9th International Symposium on Industrial Robots, Washington, D.C., March
1979.
Cohen, D., "MOSIS -- The ARPA Silicon Broker," Proceedings of the Caltech Conference
on VLSI, January 1981.
Conway, L. A., "The MPC Adventures: Experiences with VLSI Implementation Systems,"
Proceedings of the Caltech Conference on VLSI, January 1981.
Davis, L. S., "A Survey of Edge Detection Techniques," Computer Graphics and Image
Processing, 1975, 4, pp. 248-270.
Appendix A

The eight bits of the instruction word control different parts of the cell and can [...]

  [...]    Neighbor select
  [...]    Latch input select
  [...]    Data bus bit state
  I7       Accumulator MSB select
  I8       Accumulator clear

Listed below are some useful instructions and their mnemonics.

  Mnemonic   I1 I2 I3 I4 I5 I6 I7 I8   Function
  CLEAR      [...]                     Clear accumulator and carry latch
  CLRA       [...]                     Clear accumulator only
  CLRC       [...]                     Clear carry latch only
  SHIFTS     [...]                     Shift all latch values one cell south
  SHIFTW     [...]                     Shift all latch values one cell west
  SLADC      [...]                     Store the value from the A/D converter in the latch
  ADDM1      [...]                     Multiply latch value by 1 and add to accumulator LSB
  ADDM0      [...]                     Multiply latch value by 0 and add to accumulator LSB
  ROTAT      [...]                     Rotate the accumulator through the latch
  [...]      [...]                     Move the value in the latch into the accumulator MSB
  NOP        [...]                     No operation

To multiply the value in the latch by a mask value 5 and add the result to the six bit
accumulator requires this sequence.

  ADDM1
  ADDM0
  ADDM1
  ADDM0
  ADDM0
  ADDM0

  5 = 000101 (binary)
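
The relation between a mask value and its instruction sequence can be made concrete with a short Python sketch. It assumes, consistent with the example for the value 5, that one ADDM1 or ADDM0 instruction is issued per bit of the 6-bit two's complement mask value, least significant bit first. The function names are hypothetical.

```python
def addm_sequence(mask_value, bits=6):
    """Return the ADDM1/ADDM0 sequence for a mask value, LSB first."""
    pattern = mask_value & ((1 << bits) - 1)   # two's complement bit pattern
    return ["ADDM1" if (pattern >> k) & 1 else "ADDM0" for k in range(bits)]


def mask_from_sequence(sequence):
    """Recover the signed mask value encoded by an ADDM sequence."""
    bits = len(sequence)
    pattern = sum(1 << k for k, op in enumerate(sequence) if op == "ADDM1")
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


if __name__ == "__main__":
    print(addm_sequence(5))
    print(addm_sequence(-5))
    print(mask_from_sequence(addm_sequence(-5)))
```

For the value 5 this reproduces the sequence listed above; negative mask values follow from the same rule because the accumulator uses two's complement arithmetic.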
Appendix B

One possible mask for detecting vertical edges is shown in the example below. The
pressure data show that an object with a rectangular corner is pressing on the lower left
part of a 6 x 7 array of cells. The results of the convolution are 42 values, one
computed by each cell, as shown on the right. The non-zero values indicate a vertical
edge.

  Pressure         Mask      After filtering
  0 0 0 0 0 0                0  0  0  0  0  0
  0 0 0 0 0 0                0  0  0  0  0  0
  0 0 0 0 0 0      5 -5      0  0  0  0  0  0
  1 1 1 0 0 0      5 -5      0  0  5  0  0  0
  1 1 1 0 0 0      5 -5      0  0 10  0  0  0
  1 1 1 0 0 0                0  0 15  0  0  0
  1 1 1 0 0 0                0  0 15  0  0  0

Figure B1. Example of edge detection using convolution.
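
The edge detection example can be checked with a shift-and-accumulate sketch in Python, in which every cell adds one broadcast mask value times a neighbor's pressure bit at each step. Several things here are assumptions of the sketch rather than statements from the text: the pressure pattern (ones in the lower left, four rows by three columns), a mask two cells wide and three cells tall with 5 in the first column and -5 in the second, x increasing to the right, y increasing upward, and zero pressure outside the array.

```python
def edge_filter(pressure_rows, mask_rows):
    """Shift-and-accumulate convolution over rows listed top to bottom.

    mask_rows[j][i] is the weight applied to the neighbor i cells to the
    right and j cells above the computing cell.
    """
    height, width = len(pressure_rows), len(pressure_rows[0])
    P = pressure_rows[::-1]                    # P[y][x], y = 0 is the bottom row

    def at(x, y):
        return P[y][x] if 0 <= x < width and 0 <= y < height else 0

    C = [[0] * width for _ in range(height)]
    for j, mask_row in enumerate(mask_rows):
        for i, weight in enumerate(mask_row):
            # One lockstep step: all cells use the same weight and offset.
            for y in range(height):
                for x in range(width):
                    C[y][x] += weight * at(x + i, y + j)
    return C[::-1]                             # back to top-to-bottom order


if __name__ == "__main__":
    pressure = [[0, 0, 0, 0, 0, 0]] * 3 + [[1, 1, 1, 0, 0, 0]] * 4
    mask = [[5, -5]] * 3
    for row in edge_filter(pressure, mask):
        print(row)
```

Only the column where pressure changes from 1 to 0 produces non-zero results, and the result grows with the number of object rows covered by the mask.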
Listed below is a program written using the mnemonics of Appendix A to perform the edge
detection convolution of Figure B1.

  INSTRUCTION   COMMENTS
                NOTE: We are cell (1,1)
  CLEAR
  SLADC         Get pressure data P(1,1)
  ADDM1         M(1,1) = 5
  ADDM0
  ADDM1
  ADDM0
  ADDM0
  ADDM0
  CLRC
  SHIFTW        Get neighbor's data P(2,1)
  ADDM1         M(2,1) = -5
  ADDM1
  ADDM0
  ADDM1
  ADDM1
  ADDM1
  CLRC
  SHIFTS        Get neighbor's data P(2,2)
  ADDM1         M(2,2) = -5
  ADDM1
  ADDM0
                and so on for a total of 48 instructions

Figure B2. Program to perform 3x2 convolution.
Another computation that may prove useful is the determination of change in pressure
from one time moment to the next. This can indicate slip or be one step in a data compression
scheme. Each cell can detect a change of pressure by computing the inequality
function:
    S = 0  for P(T1) = P(T2)
    S = 1  for P(T1) ≠ P(T2)
where P(T1) is the pressure on the cell at time T1 and P(T2) is the pressure on the cell at
a later time T2. For single-bit pressure values, the function S can be computed by adding
P(T1) to P(T2). The least significant bit of the result is S. The program listed in Figure
B3 implements this algorithm.
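The single-bit case can be illustrated with a short Python sketch. It is not the cell's instruction sequence; the function name `slip_bit` is hypothetical and only shows that the least significant bit of the sum of two one-bit samples equals the inequality function S.

```python
def slip_bit(p_t1: int, p_t2: int) -> int:
    """Return S for two single-bit pressure samples (hypothetical helper)."""
    if p_t1 not in (0, 1) or p_t2 not in (0, 1):
        raise ValueError("samples must be single-bit values (0 or 1)")
    # Adding the two bits and keeping the least significant bit of the sum
    # is the same as an exclusive-OR, i.e. 1 exactly when the samples differ.
    return (p_t1 + p_t2) & 1


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"P(T1)={a} P(T2)={b} S={slip_bit(a, b)}")
```

For multi-bit pressure values this shortcut would no longer hold, which is presumably why the text restricts it to single-bit samples.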
INSTRUCTION   COMMENTS
CLEAR
SLADC         Get pressure data
ADDM1         Put it in accumulator LSB
ADDM0         (Multiply by 1)
ADDM0
ADDM0
ADDM0
ADDM0
SLADC         Get new pressure data
ADDM1         Add it to accumulator
ADDM0
ADDM0
ADDM0
ADDM0
ADDM0
ROTAT         LSB is 1 if slip occurred between samples
Figure B3. Program to calculate slip.
COMPUTER-AIDED DESIGN SESSION
Chairperson: MARTIN NEWELL
Member of Research Staff
Palo Alto Research Center
Xerox Corporation
Algorithmic Layout of Gate Macros*
Daniel D. Gajski
Avinoam Bilgory
Joseph Luhukay
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801
* This work was supported in part by the NSF under grant No. US NSF MCS80-01561.
The rapid advancement of VLSI technology necessitates new implementation
methodologies with design automation capabilities. Existing
implementation styles such as master slice, programmable logic arrays
and custom design with cell library do not achieve the best tradeoffs
between circuit density and chip development cycle time. The implementation
methodology based on register-transfer building blocks called
gate macros can be used to drastically cut down the design time. Furthermore,
the gate macros, which generally represent functional entities
like registers, adders, busses, logic units, etc., are amenable to
algorithmic or totally automatic layout [Verg80], [Joha79].
This paper describes the basic modules of a gate-to-silicon compiler
which accepts as its input a high level description of gate macros
and generates a layout that satisfies particular technology (NMOS, for
example) and environmental parameters (layout area or time delay, for
example). The input to the gate-to-silicon compiler is the set of
macros generated at the register transfer level. High-level language
constructs like DO loops and IF statements are allowed in the input
language. However, only Boolean scalars, vectors and strings are
allowed. For example, a 16-bit binary adder can be described as follows:
S1:  C(0) = CIN
     DO I = 1,16
S2:    C(I) = A(I)*B(I) + (A(I) + B(I))*C(I-1)
S3:    S(I) = A(I) ⊕ B(I) ⊕ C(I-1)
     END
S4:  COUT = C(16)
The above description can be used for a variety of implementation
styles. For example, if the delay time specified is relatively slow
with respect to the technology used, the 16-bit adder will be implemented as
a ripple-carry adder. If a faster version is required, the look-ahead-carry
adder will be used. For different delay times a different number of
bits will be looked ahead. Similarly, different layouts will be produced
for different time delays.
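To make the meaning of the description concrete, the following Python sketch evaluates statements S1 to S4 directly, bit by bit, which corresponds to the ripple-carry interpretation. The function name `add_by_description` and the integer packing of the bit vectors are assumptions made for illustration; they are not part of the input language.

```python
def add_by_description(a: int, b: int, cin: int = 0, n: int = 16):
    """Evaluate the S1..S4 adder description for n-bit operands.

    Bit I of the description (I = 1..n) is taken to be bit I-1 of the
    Python integers a and b (an assumption of this sketch).
    Returns (sum, cout).
    """
    A = [0] + [(a >> i) & 1 for i in range(n)]  # A[1..n]
    B = [0] + [(b >> i) & 1 for i in range(n)]  # B[1..n]
    C = [0] * (n + 1)
    S = [0] * (n + 1)

    C[0] = cin                                                 # S1
    for i in range(1, n + 1):
        C[i] = (A[i] & B[i]) | ((A[i] | B[i]) & C[i - 1])      # S2
        S[i] = A[i] ^ B[i] ^ C[i - 1]                          # S3
    cout = C[n]                                                # S4

    total = sum(S[i] << (i - 1) for i in range(1, n + 1))
    return total, cout


if __name__ == "__main__":
    print(add_by_description(1234, 4321))
    print(add_by_description(0xFFFF, 1))
```

A look-ahead implementation would compute the same C(I) values with a shorter critical path; only the structure changes, not the function.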
The compiler consists basically of four modules (Figure 1):
[Figure 1: block diagram with boxes labeled High Level Language description of gate macros; Boolean Analyzer; Cell Generator (Dependence Graph Refiner, Subcell Generator, Cell Binder); Parameters Evaluation; Cell Layout (Symbolic Placement, Layout Generator); Timing Evaluation; Structure Generator.]
Figure 1. Block diagram of a gate-to-silicon compiler.
1. Boolean Analyzer partitions the input description into blocks
with easily recognizable structure. For example, the statements S1 and
S2 will be recognized as a recurrence system while the statement S3 is
detected to be a vector operation. Statement S4 is detected as a scalar
operation. Furthermore, the Boolean Analyzer generates the dependence
graph with statements as vertices and dependences as edges. The dependence
graph represents the internal structure of the gate macro. It
indicates the critical time delay and cell structure of the future layout
alternatives.
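A dependence graph of this kind can be sketched in a few lines of Python. The representation below (read and write sets per statement, an edge whenever one statement reads a name another writes) is an assumption made for illustration; it ignores array subscripts and is not the Boolean Analyzer's actual algorithm.

```python
# Read/write sets for the adder statements S1..S4 (array names only;
# subscripts are ignored in this simplified sketch).
STATEMENTS = {
    "S1": {"writes": {"C"}, "reads": {"CIN"}},
    "S2": {"writes": {"C"}, "reads": {"A", "B", "C"}},
    "S3": {"writes": {"S"}, "reads": {"A", "B", "C"}},
    "S4": {"writes": {"COUT"}, "reads": {"C"}},
}


def dependence_graph(stmts):
    """Return the set of edges (src, dst) where dst reads what src writes."""
    edges = set()
    for src, s in stmts.items():
        for dst, d in stmts.items():
            if s["writes"] & d["reads"]:
                edges.add((src, dst))
    return edges


def self_dependent(edges):
    """Statements that depend on themselves, a simple sign of a recurrence."""
    return sorted({u for u, v in edges if u == v})


if __name__ == "__main__":
    graph = dependence_graph(STATEMENTS)
    for edge in sorted(graph):
        print(edge)
    print("self-dependent:", self_dependent(graph))
```

In this simplified view S2 is the only statement with a self-edge, which is consistent with the text treating S1 and S2 as a recurrence system and S3 and S4 as vector and scalar operations.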
2. Cell Generator modules consist of Dependence Graph Refiner,
Subcell Generator and Cell Binder.
The Dependence Graph Refiner tries to break each of the dependence
graph nodes into as many nodes as possible. The resulting dependence
graph is more detailed, which allows the Cell Binder more flexibility in
optimization. Since statements S1 and S4 are scalar operations without
operators, their layout area and time delay are O(ε), where ε is a small
value, so they are left untouched. Statement S2 is a recurrence with
maximum O(n log n) layout area and minimum O(log n) time delay, where n
is the recurrence length. Since the recurrence node will be broken into
three or more different types of subcells, its decomposition is left to
the Subcell Generator. Statement S3 has an O(n) layout area and O(1)
time delay. Since the EXCLUSIVE-OR operation is associative, statement
S3 can be dissolved into S3a and S3b. Using the above approximation the
original program is distributed as shown below.
S1:  C(0) = CIN
     DO I = 1,16
S2:    C(I) = A(I)*B(I) + (A(I) + B(I))*C(I-1)
     END
     DO I = 1,16
S3a:   T(I) = A(I) ⊕ B(I)
     END
     DO I = 1,16
S3b:   S(I) = T(I) ⊕ C(I-1)
     END
S4:  COUT = C(16)
The new dependence graph is shown in Figure 2.
[Figure 2: dependence graph.]
Figure 2. Dependence graph of distributed program.
The Subcell Generator consists of several submodules, each for one
type of block recognized by the Boolean Analyzer. Each submodule generates
the functional description of the basic subcells used to synthesize
the given block. The recurrence statements S1 and S2 generate
four types of subcells:
type 2.1 subcell: G = A*B
type 2.2 subcell: P = A + B
type 2.3 subcell: G = G1 + G2*P1, P = P1*P2
type 2.4 subcell: C = G + P*C0
A description of cell generation for recurrence structures is found in
[BiGa80]. Statements S3a and S3b generate one type of subcell each,
called type 3a and 3b subcells, respectively.
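The four recurrence subcells can be checked against the original recurrence with a small Python model. This is a sketch under stated assumptions: in the type 2.3 subcell, (G1, P1) is taken to be the more significant block and (G2, P2) the less significant one, and the subcells are chained linearly here rather than in the logarithmic-depth arrangement that a real look-ahead structure would use. All function names are hypothetical.

```python
from itertools import product


def type_2_1(a, b):            # G = A*B
    return a & b


def type_2_2(a, b):            # P = A + B
    return a | b


def type_2_3(g1, p1, g2, p2):  # G = G1 + G2*P1, P = P1*P2
    return g1 | (g2 & p1), p1 & p2


def type_2_4(g, p, c0):        # C = G + P*C0
    return g | (p & c0)


def lookahead_carries(a_bits, b_bits, c0):
    """Carries C(1..n) from group (G, P) signals; a_bits[0] is bit 1."""
    carries, group = [], None
    for a, b in zip(a_bits, b_bits):
        g, p = type_2_1(a, b), type_2_2(a, b)
        # The new bit is the more significant block of the merged group.
        group = (g, p) if group is None else type_2_3(g, p, *group)
        carries.append(type_2_4(group[0], group[1], c0))
    return carries


def ripple_carries(a_bits, b_bits, c0):
    """Carries from statement S2 evaluated directly."""
    carries, c = [], c0
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a | b) & c)
        carries.append(c)
    return carries


if __name__ == "__main__":
    ok = all(
        lookahead_carries(a, b, c0) == ripple_carries(a, b, c0)
        for a in product((0, 1), repeat=4)
        for b in product((0, 1), repeat=4)
        for c0 in (0, 1)
    )
    print("subcells agree with the recurrence:", ok)
```

The exhaustive 4-bit comparison only confirms that the subcell equations are functionally equivalent to S2; it says nothing about area or delay.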
The Cell Binder combines subcells to form larger cells. The subcells
to be combined are selected according to the constraints posed by
the dependence graph. Since type 2.1 and 2.2 subcells (generated for the
recurrence) perform vector operations, as does the type 3a subcell, the
three can be combined to form one cell called type 1 cell. Type 2.4 and
3b subcells can also be combined into one cell, but it was not done in
this example, so type 2.3, 2.4 and 3b subcells will each be assigned one
type of cell and renamed as type 2, 3 and 4 cells, respectively. The
layout occupies minimum area when all the cell types have similar
widths. So, if the Structure Generator finds, for example, type 1 cell
to be too large, a separate cell type may be dedicated to subcell 3a.
Since S3 is not on a critical path, this cell can be positioned almost
anywhere in the layout in that case.
3. Cell Layout modules consist of Symbolic Placement and Layout
Generator.
The Symbolic Placement module generates a two-dimensional array of
symbolic transistors and their connections. Compaction is done automatically
when this two-dimensional array is translated by the Layout Generator
into a complete mask description in compliance with layout design
rules of the chosen technology.
Each cell can be manually designed if so desired, leaving the
placement and routing to be automatically performed by the system. The
manual cell design presents one extreme of the provided layout design
spectrum [MeCo80]. However, the overall aim is to have an automatic layout
system, where a manual cell design or a cell library is replaced by the
library of algorithms in which one or more algorithms for automatic generation
of layout specifications are available for each cell model supplied
by the Cell Generator module. It then follows that the algorithmic
layout is the other extreme of the layout design spectrum.
For example, an obvious approach would be to implement each cell
with a small programmable logic array. The MOS and I²L technologies are
well adaptable to automatic synthesis as shown in [SOHT80] for one-dimensional
gate arrays. We have chosen a two-dimensional array
approach as described below.
The Symbolic Placement module is based upon a grid system of tracks 
- or channels - on different layers of the integrated circuit structure. 
Interaction among the layers is governed by the technology, and as a 
result, geometric relationship among the tracks is determined by the 
technology's layout design rules. Figure 3 shows a grid system which is 
used for silicon-gate MOS. 
[Figure 3: grid of horizontal and vertical tracks, with a legend distinguishing metal and polysilicon.]
Figure 3. Sample grid system for MOS technology.
Here the metal layer is more or less independent of the polysilicon and 
the diffusion layers, whereas polysilicon and diffusion interact 
strongly with each other. Hence polysilicon and diffusion tracks can be 
"hidden" underneath metal tracks . Using this grid as a base, synthesis 
procedures have been developed. For example, using a metal and polysili-
con grid like in Figure 3, two-dimensional arrays can be formed by mani-
pulating the diffusion to form the necessary devices, interconnected 
such as to build the required circuit . 
Figure 4 shows the processes implemented by the Cell Layout
modules. Input to the Symbolic Placement module consists of a functional
description of a cell (or a set of cells), in the form of a set of
AND-OR-INVERT Boolean equations. In addition to this, basic topological
information about the cell is also given, which comprises assignment of
topological attributes to the input-output nodes of the cell. For example,
the cell shown in Figure 5(b) was specified with G1, P1 and T
(ordered from left to right) as top-inputs coming in polysilicon, G2 and
P2 (ordered from top to bottom) as right-inputs coming in metal, G, P
and T (ordered from left to right) as bottom-outputs going out in
polysilicon, and G and P (ordered from top to bottom) as left-outputs
going out in metal. The functional description specified for the cell
is: G = G1*P1 + G1*G2; P = P1 + P2 and T = T.
If the ordering of the I/O nodes along the cell boundaries is fixed, such
as in our case, then the Symbolic Placement module will start by ordering
the product-terms within an AND-OR-INVERT function, and also
the drive-transistors within a product-term. Otherwise, the module will
first generate a symbolic placement of the functions themselves. The
goal of the ordering is to minimize the cell's height by reducing the number
of horizontal tracks needed to lay out the cell. In our example, we
need to place the product-terms of function G (G1*P1 and G1*G2), function
P (P1 and P2) and function T (T), such that G1, P1 and T, which
come in polysilicon, need not traverse any unnecessary vertical diffusion
tracks. This is done by identifying the polysilicon input variable
shared by both functions (here: P1) and ordering the product terms such
that metal crossovers for the polysilicon input variables (to get over
diffusion tracks) are minimized. The following table shows how this
process is done:
[Figure 4: flowchart. Inputs: functional description, basic topological description and transistor sizes. Symbolic Placement stage: a yes/no decision followed by gate, product-term and drive-transistor symbolic placement. Layout Generator stage: layout of diffusion product-term tracks, input nets and drive transistors, load structures, output nets and inverter structures. Output: mask description.]
Figure 4. Block diagram of the Cell Layout modules.
       Before ordering               After ordering
              G1  P1  T                     G1  P1  T
G:  G1*P1      1   1  0       G:  G1*G2      1   0  0
    G1*G2      1   0  0           G1*P1      1   1  0
P:  P1         0   1  0       P:  P1         0   1  0
    P2         0   0  0           P2         0   0  0
T:  T          0   0  1       T:  T          0   0  1
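The effect of the ordering step on this example can be reproduced with a simple heuristic, sketched below in Python. The heuristic is an assumption made for illustration and not the module's actual procedure: for each pair of vertically adjacent functions, product terms that use a shared polysilicon input are moved next to the boundary between the two functions.

```python
def order_product_terms(functions, poly_inputs):
    """Reorder product terms so shared polysilicon inputs become adjacent.

    functions: list of (name, [term, ...]) from top to bottom, where each
    term is a tuple of variable names. Returns a new list of the same shape.
    """
    ordered = [(name, list(terms)) for name, terms in functions]
    poly = set(poly_inputs)
    for k in range(len(ordered) - 1):
        upper, lower = ordered[k][1], ordered[k + 1][1]
        shared = set().union(*upper) & set().union(*lower) & poly
        if not shared:
            continue
        # Stable sorts: terms using a shared input sink to the bottom of the
        # upper function and rise to the top of the lower function.
        upper.sort(key=lambda term: bool(shared & set(term)))
        lower.sort(key=lambda term: not (shared & set(term)))
    return ordered


if __name__ == "__main__":
    before = [
        ("G", [("G1", "P1"), ("G1", "G2")]),
        ("P", [("P1",), ("P2",)]),
        ("T", [("T",)]),
    ]
    for name, terms in order_product_terms(before, ["G1", "P1", "T"]):
        print(name, ["*".join(t) for t in terms])
```

On the example this yields the "After ordering" arrangement, with the two terms that use P1 placed on adjacent rows. A cell with more functions or several shared inputs would need a real cost function for metal crossovers.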
The output of the Symbolic Placement module is a table denoting
relative placement of transistors on the reference grid system, and
net-lists for the inputs and outputs. For our example, the table will
be as follows:
where columns denote vertical diffusion tracks, and rows denote horizontal
polysilicon tracks.
The Layout Generator uses the symbolic placement data to generate
the masks, described in an intermediate form like the CIF [MeCo80]. It
generates the rectangles necessary to lay out the masks: diffusion
product-term "tracks", input nets and drive transistors, load structures,
output nets, and inverter structures. Figure 5 shows the simulated
layout of four types of cells used in our example.
Rather than predefining device parameters and then laying them out
using a placement and routing scheme, the circuits are first laid out in
an array-like structure with minimum device sizes. The electrical and
geometrical parameters are passed on to the next module. Iteration of
the process will produce the desired circuit with the device sizes
necessary to meet the design goals.
[Figure 5: four cell layouts with signal labels G, P, T, S, GND and VDD. Equations printed with the panels: (a) G = G1 + G2*P1, P = P1*P2, T = T; (b) G = G1*P1 + G1*G2, P = P1 + P2, T = T.]
Figure 5. Layout of adder's basic cells:
(a) Type 2a cell; (b) Type 2b cell;
(c) Type 3 cell; (d) Type 4 cell.
4. Structure Generator attempts to obtain the best possible structure
for the given functional description and environmental parameters.
It specifies the cell types, the position of each cell in the final layout
and the interconnections between the cells.
Figure 6 shows the structure of a 16-bit binary adder. Each cell
will be referred to as C[i,j], where i and j are the row and column where
the cell is located, respectively, and the top rightmost cell is C[1,1].
Data are flowing only from top to bottom and from right to left. The
four types of cells generated by the Cell Generator are located as follows:
type 1 cells in the first row, type 2 in the second and third
rows, type 3 in the fourth row and type 4 in the fifth row. The second,
third and fourth rows perform the carry-look-ahead.
The input carry C(0) is fed into cells C[4,1] through C[4,4] which,
together with the cells in the second and third rows above them, function
as the carry-look-ahead for carries C(1) through C(4). Then the
output of cell C[4,4] (which is C(4)) is fed into cells C[4,5] through
C[4,10] that similarly produce the carries C(5) through C(10). Lastly,
the output of cell C[4,10] is fed into the cells to its left, so C(11)
through C(16) are produced.
Let us assume that each type of cell produces its outputs in the
same time delay d after its inputs are stable. For this particular adder
example it was also given that the sum S(I) has to be available 7d and
the input carry C(0) is available 3d after the inputs A(I) and B(I) are
stable. Also, the fanout is limited: each cell can drive at most 7
other cells. The structure shown in Figure 6 meets these constraints
with a very important feature: it has the minimum number of rows, and
therefore it occupies minimum chip area (however, this structure is not
unique).
[Figure 6: array of cells forming the 16-bit adder, with the type 1, type 2, type 3 and type 4 cells drawn in AND-OR form.]
Figure 6. 16-bit adder structure and types 1, 2, 3 and 4 cells, in AND-OR form.
Several paths through the structure have the maximum specified
delay. They will be called critical paths (e.g. C[1,6] → C[3,6] → C[3,8]
→ C[3,10] → C[4,10] → C[4,12] → C[5,14]). The functions that define
each type of cell are evaluated by the Cell Generator in a sum of products
form. Since in MOS technology (where this example is implemented)
an AND-OR-INVERT logic is implemented more naturally than AND-OR, the
complemented outputs are produced by each cell rather than the true
ones. Inverting the outputs again is ruled out, since it almost doubles
the delay time of each cell. For type 1 and 4 cells the double inversion
problem is solved by modifying the functions to fit the complemented
outputs. However, for type 2 and 3 cells this solution does not work,
since these cells drive cells of the same type. Instead, two different
subtypes of type 2 cell are defined: type 2a, which produces complemented
outputs from its true inputs, and type 2b, which produces true
outputs from its complemented inputs. Now, cells along the critical
paths are chosen to be of types 2a and 2b alternately. For type 3
cells, inverting the left output of C[4,4] and C[4,10] (that drive other
type 3 cells) is unavoidable. Inverters must also be added to a few type
1 and 2 cells in order to adjust their outputs to the driven cells. For
these cells, only the outputs that drive the cells in the same column
are inverted again, while the outputs that drive cells to the left
remain unchanged. Since critical paths have already been taken care of,
the adder speed is not degraded by these inverters. In Figure 6, cells
that contain additional inverters have a bar added above their type
number.
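The alternation of type 2a and 2b cells rests on De Morgan's laws, which can be checked exhaustively in Python. The sketch below assumes that the type 2b equations of Figure 5(b), G1*P1 + G1*G2 and P1 + P2, are written over complemented inputs (the complement bars are not visible in this copy, so this reading is an assumption). The helper `aoi` and the function names are hypothetical.

```python
from itertools import product


def aoi(*terms):
    """AND-OR-INVERT: complement of the OR of the ANDed product terms."""
    return 1 - int(any(all(term) for term in terms))


def type_2a(g1, p1, g2, p2):
    """True inputs in, complemented (G, P) out."""
    return aoi((g1,), (g2, p1)), aoi((p1, p2))


def type_2b(ng1, np1, ng2, np2):
    """Complemented inputs in, true (G, P) out (assumed reading)."""
    return aoi((ng1, np1), (ng1, ng2)), aoi((np1,), (np2,))


def reference(g1, p1, g2, p2):
    """The type 2.3 subcell functions in true form."""
    return g1 | (g2 & p1), p1 & p2


if __name__ == "__main__":
    for g1, p1, g2, p2 in product((0, 1), repeat=4):
        g, p = reference(g1, p1, g2, p2)
        assert type_2a(g1, p1, g2, p2) == (1 - g, 1 - p)
        assert type_2b(1 - g1, 1 - p1, 1 - g2, 1 - p2) == (g, p)
    print("type 2a and 2b are consistent with the type 2.3 functions")
```

Under that assumption a 2a cell followed by a 2b cell computes the same group signals as two true-form cells, with one AND-OR-INVERT stage per cell and no extra inverters on the critical path.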
Conclusions 
We have described the basic ideas behind a gate-to-silicon compiler
by walking through a simple and well-known example. The compiler consists
of four modules, each of which performs one step of the translation
toward silicon level. The first translation is a crude approximation
of the final layout, and therefore one or more iterations are
needed to achieve a "near optimal" solution.
The novel approach in our compiler is based on (a) the set of synthesis
procedures for decomposition of gate macros into small atomic
cells and for optimization of obtained cellular structures with respect
to environmental and technological parameters, and (b) the set of algorithms
for automatic layout of different cell models obtained through
decomposition of gate macros.
References
[BiGa80] Bilgory, A. and Gajski, D. D., "Automatic Cell Generation for
Recurrence Structures," University of Illinois at Urbana-Champaign,
Department of Computer Science, Report UIUCDCS-R-80-1040, November 1980.
[Joha79] Johannsen, D., "Bristle Blocks: A Silicon Compiler," Proc.
16th Design Automation Conf., pp 310-313, 1979.
[MeCo80] Mead, C. A., Conway, L. A., Introduction to VLSI Systems,
Addison-Wesley, 1980.
[SOHT80] Shirakawa, I., Okuda, N., Harada, T., Tani, S. and Ozaki, H.,
"A Layout System for the Random Logic Portion of MOS LSI,"
Proc. 17th Design Automation Conf., pp 92-99, 1980.
[Verg80] Vergnieres, B., "Macro Generation Algorithms for LSI Custom
Chip Design," IBM J. Res. Develop., Vol. 24, pp 612-621, 1980.
SLIM: A Language for Microcode Description 
and Simulation in VLSI¹
John Hennessy 
Computer Systems Laboratory 
Stanford University 
Abstract 
SLIM (Stanford Language for Implementing Microcode) is a programming language based system for
specifying and simulating microcode in a VLSI chip. The language is oriented towards PLA
implementations of microcoded machines using either a microprogram counter or a finite state
machine. The system supports simulation of the microcode and will drive a PLA layout program to
automatically create the PLA.
1 Introduction 
VLSI chip design has rapidly become an area of great importance and interest. Mead and Conway
[6] have proposed a design methodology for VLSI systems that has been widely employed. Their
design methodology proposes a chip organization using: finite state control implemented with a PLA,
functional units controlled by the PLA, and a set of data paths. This design methodology has been
used in a number of large chip designs [5, 1, 2]. The finite state control can be thought of as
microcode. Within a design that follows the microcoded control approach, designing and debugging
the microcode appears to constitute a significant portion of the work involved in the design process
[1, 2]. This paper describes a language for synthesizing the control units of a chip from a high level
language description.
Presently few tools exist to assist the user in designing and debugging the microcoded control.
Programs to construct PLA's from boolean equations are widespread; however, the difficult
component of control unit design is to specify and debug the microcode. This difficulty arises in
generating the boolean equations that describe the finite state machine control. Transforming these
to a PLA layout is tedious and error-prone but mechanically straightforward. Some work has been
done on describing PLA's at a higher level [7] and on synthesizing PLA descriptions from low level
state machine descriptions in DOL [4].
¹ This research was partially supported by the Joint Services Electronics Program under contract # DAAG29-79-C-0047 and
the Defense Advanced Research Projects Agency under contract # MDA903-79-C-0680.
SLIM (Stanford Language for Implementing Microcode) is a programming language useful for the 
design of a microcoded system that will employ PLA implementation techniques. Unlike earlier work 
SLIM is functionally oriented. Control in SLIM is based on a finite state machine, but SLIM deals with 
objects that can be more abstract than the actual PLA inputs and outputs. The SLIM system supports 
both microcode simulation and automatic synthesis of the microcoded control function either in ROM 
or PLA. SLIM will also accommodate either finite state machine control or control with a program 
counter. 
Correct microprograms are both tedious and difficult to write for several reasons. First, the
programming language is extremely low level. Typically, the designer must deal with a primitive finite
state machine without the benefit of a human-engineered interface. Secondly, many of the
microprograms are large. This leads to a relatively complex program without a great deal of structure;
this is especially true if the finite state machine is coded as boolean equations. A boolean equation
approach makes it difficult to consider altering the microcode, even during the debugging process.
Another major difficulty is the significant level of detail that must be expressed. This leads to one of
two pitfalls: either the microcode description is very low level and cluttered with details, which makes
it impossible to understand; or the designer uses an ad hoc higher level description of the microcode.
An ad hoc description is unsuitable because the translation to the low level microcode must be done
by hand, and the description tends to be too informal and vague. Without a higher level standard
representation, microcode programs are difficult to write correctly and virtually impossible to
understand. The SLIM system is also able to translate the boolean equation representation of the
PLA into a layout.
We can summarize the goals of SLIM as 
• a symbolic higher level language suitable for designing and documenting the microprogram
and oriented towards implementation with PLA technology,
• simulation tools to debug the microcode,
• automatic layout of the PLA based on the microcode.
The SLIM design goals spawn a set of language and system requirements. The microcode
simulation requirement implies the ability to describe the subsystems that interact with the
microcontroller; we will refer to these subsystems as the environment. Describing the environment
can be easily done in a conventional programming language, if the interaction with the microcode
occurs in a restricted and well defined manner. Separating the micromachine description from the
environment description has two benefits. The separation increases comprehensibility of the
micromachine structure. A specialized language is also more appropriate for the microprogram
design; without the separation the translation process is difficult or impossible.
The environment of the finite state micromachine can be described in a conventional programming
language. The environment consists of data structures and variables which can be used to simulate
the structure of the subsystems. The environment/controller interface is based on a set of functions
and procedures. The functions, which must be type boolean, correspond to the inputs to the
microcode machine, while the procedures correspond to outputs. We have chosen Pascal to
represent the environment. The Pascal data structures provide additional support in describing the
functional components. The wide variety of data types coupled with strong type checking also
provides support for checking the microcode and making design restrictions explicit in the SLIM
program.
Since the end product of a SLIM program is a finite state machine implemented with PLA techniques,
a SLIM program must incorporate details about the implementation. This specification should
include: mappings between functions and procedures in the environment, actual PLA inputs and
outputs, and timing specifications that force outputs to occur earlier or later than they occur in the
program. Including these details separately allows a more functional orientation in the microcode
description. Lastly, details concerning the actual PLA layout are needed, e.g. the number of PLA's
and the positioning on the PLA of each signal.
2 Specifying the microcode 
A SLIM program consists of a finite state machine. Each state in the SLIM program contains a
series of conditional actions that may cause one or more outputs to be high, or may specify the next
state. The next state may be specified by default or explicitly. The outputs associated with a given
state are conditional on a set of product terms only. Although arbitrary boolean expressions could be
used, SLIM does not allow them because it requires a significant amount of processing to transform the
expressions to a PLA oriented sum of products form. In this process the number of product terms
added to the PLA may be substantial (up to 2^n terms for an expression of length n). The property that
the number of product terms in the PLA is approximately equal to the number of preconditions for
the outputs in a SLIM program has been useful in estimating the PLA size.
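The growth in product terms can be seen with a small Python sketch that multiplies out a product of sums. It only illustrates the exponential behaviour, under the assumption that no simplification is applied, and is not the bound's derivation; the function name `expand_product_of_sums` is hypothetical.

```python
from itertools import product


def expand_product_of_sums(clauses):
    """Multiply out (x1 + y1)(x2 + y2)... into a list of product terms.

    clauses: list of lists of literal names. Each resulting term picks one
    literal from every clause; no simplification is attempted.
    """
    return [tuple(choice) for choice in product(*clauses)]


if __name__ == "__main__":
    for n in range(1, 6):
        clauses = [[f"a{i}", f"b{i}"] for i in range(1, n + 1)]
        terms = expand_product_of_sums(clauses)
        print(f"{n} two-literal clauses -> {len(terms)} product terms")
```

An expression with n two-literal factors expands to 2^n product terms, each of which would occupy a row of the PLA, whereas a SLIM condition written directly as product terms costs one row per term.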
There are two major schemes for implementing the state component of a finite state machine. A
standard finite state implementation uses a fixed state assignment and includes an encoding of the
next-state function in the PLA. An alternative implementation uses a microprogram counter that is
incremented under external control. Each approach has benefits that depend on the microprogram
being implemented. The tradeoffs and the advantages of the two different VLSI control
implementations are discussed in [3]. SLIM supports both control implementations, provides default
next states for program counter implementations, and will support subroutines with call and return in
either case.
A SLIM microprogram consists of a set of states listed sequentially. Each state may optionally have
a label, which denotes the state name. The specification of the first state is preceded by a set of
specifications for outputs which are state independent. Figure 1 shows the format of the state
machine specification.
fsm
state-specification (for state independent outputs)
state-name (optional): [state-specification]
state-name (optional): [state-specification].
Figure 1: Specifying the state machine
A state specification is a list whose elements are either unconditional actions or conditional
commands. A conditional command consists of a condition and a list of actions. A condition consists
of a list of one or more product terms that are joined with or, and a product term is a series of
predicates joined with and. A predicate must be a call to a function in the environment; predicates
correspond to one or more PLA inputs. The interpretation of the command is: if the entire condition
evaluates to true, then the actions should be executed. If there are no predicates, the condition is
assumed to be true and the action is always executed in that state. The form of a state specification is
given in Figure 2.
if p1 and ... and pn or q1 ... or qm => action
Where the pi are function invocations and the qj are product terms,
like the first term.
Figure 2: State specifications
Each state may contain a list of such specifications and the entire state is bracketed. During
simulation, state specifications are evaluated and executed sequentially, but in the actual PLA
implementation these operations will occur in parallel. Therefore, side effects between procedures
that are outputs and functions that are inputs in the same state should be employed with great care.
2.1 Actions 
There are two types of actions allowed: outputs and state change operations. A list of actions can
be used as a single compound action by bracketing the list. Outputs are invocations of procedures in
the environment and correspond to PLA outputs. The state change directives dictate the next state.
All state change directives have effect only after the current state is completed; thus, all state
specifications with true conditions will be executed in a state. The state change directives are:
next state-name - makes state-name the next state.
call state-name - does a microcode call to the routine at state-name.
return - returns to the state sequentially following the calling state.
2.2 A short example 
Figure 3 shows the finite state machine controller for the traffic light example from [6]. (The entire
example is given in the appendix.) The state independent component is for simulation purposes. The
procedures Farmlight and Highlight alter the color (which is a parameter) of the traffic light at the
farmroad and the highway. Timeout looks for the timeout condition, which is either short or long as
dictated by the parameter. The function Cars corresponds to the test for a car.
Figure 3: Microcode specification for the Mead/Conway traffic controller
fsm
[ getinput; timer ] { state independent component }
highgrn: [ highlight(green); farmlight(red); {Highway green and farmroad red}
    if notcars or nottimeout(long) => next highgrn;
    if cars and timeout(long) => [ starttimer; next highyel ] ]
highyel: [ highlight(yellow); farmlight(red); {Highway yellow and farmroad red}
    if nottimeout(short) => next highyel;
    if timeout(short) => [ starttimer; next farmgrn ] ]
farmgrn: [ highlight(red); farmlight(green); {Highway red and farmroad green}
    if cars and nottimeout(long) => next farmgrn;
    if notcars or timeout(long) => [ starttimer; next farmyel ] ]
farmyel: [ highlight(red); farmlight(yellow); {Highway red and farmroad yellow}
    if nottimeout(short) => next farmyel;
    if timeout(short) => [ starttimer; next highgrn ] ].
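The behaviour of this controller can be mimicked in ordinary code. The Python sketch below is a hand-written illustration, not output of SLIM: the names step and simulate are hypothetical, the thresholds 2 and 4 are taken from the constants short and long in the appendix, and it assumes that the state independent component (the timer) runs before the conditions of each state are evaluated.

```python
SHORT, LONG = 2, 4  # timer thresholds, taken from the appendix constants


def step(state, time, cars):
    """Run one state of the controller; return (next_state, new_time)."""
    # State independent component: the timer saturates at LONG.
    if time < LONG:
        time += 1
    timeout_long = time >= LONG
    timeout_short = time >= SHORT
    if state == "highgrn":
        leave, target = cars and timeout_long, "highyel"
    elif state == "highyel":
        leave, target = timeout_short, "farmgrn"
    elif state == "farmgrn":
        leave, target = (not cars) or timeout_long, "farmyel"
    elif state == "farmyel":
        leave, target = timeout_short, "highgrn"
    else:
        raise ValueError(f"unknown state {state!r}")
    # starttimer resets the timer whenever the state is left.
    return (target, 0) if leave else (state, time)


def simulate(car_inputs, state="highgrn", time=0):
    """Return the list of states visited, one per sample of the cars input."""
    trace = []
    for cars in car_inputs:
        trace.append(state)
        state, time = step(state, time, cars)
    return trace
```

In every state exactly one of the two conditions holds, so each call to step selects a single next state, which is the property the PLA implementation relies on.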
John L. Hennessy 
3 Defining the Relationship to the PLA
The relationship between the microcode specification of the control program and the PLA is
defined by: declaring the input and output signals for the PLA and defining the mappings between
environment functions/procedures and input/output signals. The expressive power of this mapping
is one of the advantages of SLIM.
3.1 Defining input and output signals
PLA signals are defined by means of input and output signal declarations, which appear just before
the definition of the environment procedures. Signal declarations begin with the keyword inputs or
outputs, as appropriate. The general form of each declaration is then:
{name [ '[' bounds ']' ]} [ ':' parameters ] ';'
The names in the list are the input or output signals being declared. The optional bounds
designator indicates whether a particular signal is a single bit or a vector of bits. In the latter case the
line can be treated as an integer encoded number; the order of the bounds (low to high or high to
low) specifies the order of the lines in the signal vector. If any optional parameters appear they are
associated with all input/output names in the declaration. Table 1 defines the legal parameters.
Table 1: Signal parameters
Syntax          Meaning                              For input/output
pla (n)         Associate signal with PLA # n        both
top             Position signal on top of PLA        both
bottom          Position signal on bottom of PLA     both
renames (id)    Give the signal id another name      both
earlier (n)     Move the signal n states earlier     output
later (n)       Move signal n states later           output
A signal declaration specifies physical placement information using the directives top and bottom.
The order of the signals on the PLA is given by the order of their declaration. The state signals are
added by SLIM and appear last in the PLA inputs and first in the outputs; this facilitates
interconnection. When more than a single PLA is specified SLIM determines which outputs should
appear from which PLA's (by declaration or default to PLA 1). Only the necessary inputs are
generated for each PLA; these are based on the outputs that are specified in that PLA.
The optional pipelined directives, i.e. earlier and later, move an output signal forward or
backward in the state graph. This is very useful when a particular signal, which is logically associated
with a single operation, must occur earlier. A frequently occurring example of this is precharging or
enabling of alu's. Although the functional operation add appears to occur in a single state the alu
must be precharged/enabled one state earlier. The pipelined directives provide a convenient way to
express such relationships without adding needless details to the microcode description. If an output
signal x appears in a state s conditional on input c and x is pipelined earlier(i), then the output x will
appear, conditional on c, in all the states that precede s by i states. Although pipelining can be done
into both predecessor and successor states, by far the most common situation is pipelining into the
immediate successor state. SLIM finds all predecessor or successor states, including those that
occur when the state that is pipelined from is the target of a branch or call. Pipelining is not permitted
across a procedure return, i.e. in the state following a call. The renames directive gives a signal
another name, without associating the other characteristics (e.g. pipelining) of the renamed signal.
This is useful if a particular signal must be pipelined nearly all the time, but occasionally nonpipelined
generation of the signal is needed.
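The effect of earlier(i) on a state graph can be sketched as follows. This Python fragment is only an illustration under simplifying assumptions: the function names and the dictionary representation are hypothetical, the condition c attached to an output is ignored, and the special handling of calls and returns described above is not modelled.

```python
def predecessors_at_distance(successors, target, distance):
    """States from which `target` is reached in exactly `distance` transitions."""
    # Invert the successor relation once.
    preds = {}
    for state, nexts in successors.items():
        for nxt in nexts:
            preds.setdefault(nxt, set()).add(state)
    frontier = {target}
    for _ in range(distance):
        frontier = {p for s in frontier for p in preds.get(s, ())}
    return frontier


def pipeline_earlier(outputs, successors, signal, distance):
    """Move `signal` from each state that emits it to the states `distance` earlier."""
    moved = {state: set(sigs) - {signal} for state, sigs in outputs.items()}
    for state, sigs in outputs.items():
        if signal in sigs:
            for p in predecessors_at_distance(successors, state, distance):
                moved.setdefault(p, set()).add(signal)
    return moved
```

For a hypothetical graph in which both fetch and load lead to add, pipelining the alu enable signal with distance 1 removes it from add and places it in both fetch and load.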
3.2 Describing the relationship between environment and outputs
Since a procedure or function in the environment can logically correspond to one or more signals,
SLIM provides a method of defining the mapping between environment routines and signals. This
method allows the microcode description to be functionally oriented, and significantly decreases the
amount of code needed to describe the PLA implementation of the microcode.
The mapping between environment procedures and signals to be generated in the PLA is given in
the definition section of an environment procedure or function. The definition section starts with the
keyword definition and appears immediately after the function or procedure header. Procedures in
the environment without a definition section are presumed to be for simulation purposes only. The
definition section consists of a list of signal definitions which are separated by semicolons; the
definition section is terminated by end.
A signal definition has the form:
[ pattern-string : ] signal-expression
The optional pattern-string is used to specify different signal combinations based on the values of the
parameters to the environment procedure. The pattern-string consists of a list of string patterns
separated by commas and enclosed in parentheses. If the pattern list matches the list of actual
parameters in a call to this procedure, then the signals in the signal list are generated as outputs.
Each string pattern can either be an alphanumeric string or a "*". The latter is a wild card match,
indicating that any actual parameter value should generate a match for the corresponding parameter.
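The pattern matching rule can be illustrated with a short Python sketch. The representation below (a list of pattern/expression pairs, and the names matches and signals_for_call) is hypothetical, and the sketch assumes that every entry whose pattern list matches contributes its signal expression.

```python
def matches(patterns, actuals):
    """True if every pattern equals its actual parameter or is the wild card '*'."""
    return len(patterns) == len(actuals) and all(
        p == "*" or p == a for p, a in zip(patterns, actuals)
    )


def signals_for_call(definition, actuals):
    """Collect the signal expressions of every entry whose pattern list matches."""
    emitted = []
    for patterns, expression in definition:
        # An entry without a pattern list (None) applies to every call.
        if patterns is None or matches(patterns, actuals):
            emitted.append(expression)
    return emitted


# Definition section of the highlight procedure from the traffic light example.
HIGHLIGHT = [
    (("green",), "hl = 0"),
    (("yellow",), "hl = 1"),
    (("red",), "hl = 2"),
]
```

With this table, a call highlight(yellow) selects only the expression hl = 1.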
The signal expression specifies what signals to generate; it may also contain invocations of other
environment procedures. Before it is evaluated any identifiers in the signal-list that correspond to
formal parameters are replaced by the actual parameter values in the call for which signals are being
generated. The types of signal expressions are defined in Table 2.
Table 2: Signal expressions
Signal-expression                          Meaning
signal-name                                emit the signal
procedure-name(parameters)                 emit the signals for the named procedure
signal-expression and signal-expression    emit both sets of signal expressions
expr1 & expr2                              emit expr2 concatenated to expr1
signal-name = integer constant             emit encoded constant to the signal vector
not signal-expression                      emit inverse of a simple signal-expression
signal-name[constant]                      emit a single signal within a signal vector
If the signal identifier is an environment procedure and not a signal name, the definition section of
the referenced environment procedure is used for that signal. Naturally, the procedure name can be
followed by parameter strings. This facility allows multi-level environment procedures to produce
signals by composing the definition list in each procedure.
In Figure 4 some input/output declarations and two of the procedures from the Mead/Conway
traffic light example are given. The highway traffic light is encoded as a two-element vector; the input
testing for cars is a single bit. Note that PLA signals may have the same name as components of the
Pascal program.
Figure 4: An example from the Mead/Conway Traffic Controller
type colortype = (green,yellow,red);
inputs c: bottom;
outputs hl[1..0]: bottom;
procedure highlight(color: colortype);
definition
    (green): hl = 0;
    (yellow): hl = 1;
    (red): hl = 2;
begin hl := color end;
function cars: boolean;
definition c;
begin cars := (c = 1) end;
4 Using SLIM
A SLIM program can be used to drive a microcode simulation as well as generate a PLA layout. A
SLIM simulation requires a microcode description with all of the environment procedures and
functions. The simulation is presently done by creating a Pascal program which embodies the
semantics of the microcode. A SLIM simulation can be requested with state tracing.
PLA generation is a straightforward process, which is done in two parts. The first part analyzes the
microcode structure and creates product term lists for each output. The effect of signal definition and
pipelining is integrated before making these lists. The PLA layout is then done by a separate program
which inputs the signal descriptions and the product term lists. The intermediate form uses boolean
expressions; this allows the use of any PLA generator that accepts boolean equations as input and
the use of PLA optimizers prior to layout.
Another program in the SLIM system can be used to assist in choosing a state encoding (applicable
only for finite state implementations). The program accepts output from SLIM with the state
entries unencoded. It computes a matrix whose i,j entry is the saving in product term count that will
result if states i and j are encoded so that they can be uniquely distinguished from all other states with
a single product term.
4.1 Ensuring microcode correctness
There are several useful types of debugging and checking of microcode that can be done in the
process of simulation. Most important among these are detecting potential errors which arise
because the simulation does not exactly match the PLA implementation, or because the microcode
does not employ the environment in a manner that the hardware is designed to support. Another
class of errors may arise because the simulation may fail to test all possible combinations of inputs or
fail to test all states.
The major reason that the simulation and PLA implementation might behave differently is that
the simulation treats outputs, environment procedures, and the state as unique entities in a sequential
manner. In the PLA these objects are interrelated. Problems such as assigning two next states are
resolved into a single, well defined action in the simulation, but these actions result in a disaster in the
PLA implementation, since both sets of state bits are set high. Certain classes of these errors can be
caught by predefined, microcode independent methods, but others require a more general scheme,
which we can also employ to find errors concerning the use of the hardware environment by the
microcode. SLIM checks for common sorts of errors, such as failing to assign a next-state in a finite
state machine implementation, or attempting to assign more than one next-state.
Many of the hardware/microcode inconsistencies arise from situations where certain outputs are
being incorrectly used, perhaps with respect to timing, or the hardware is being instructed to perform
some task it is not physically able to undertake. Many of the latter types of errors can be caught using
a strictly type-checked environment specification. For example, suppose that the register file on
some microcoded processor is divided into two sections in such a way that two registers from the
same section cannot be gated to the alu (many hardware micromachines have this property).
Microcode errors that arise because two registers from the same section are being sent to the alu can
be detected by defining the machine structure with two different types for the registers and specifying
that the alu environment procedures have two parameters, one from each register section. This
class of simple errors is detected at compile-time.
A more complex class of errors cannot be detected with a straightforward compile-time scheme.
Some examples of this type of error are: attempts to use the bus for two different quantities in the
same time frame, overlapping use of environment hardware (such as an alu), and incorrect timing of
an output in a state. Many of these errors can be detected using a set of assertions, which are
checked during simulation. We divide these assertions into two groups: invariant
assertions and state dependent assertions. The invariant assertions specify conditions which must
hold regardless of the current state, e.g. if an alu output occurs in this state, the alu was precharged
in the previous state and was not doing any other operation. State dependent assertions specify
properties which should hold at a particular state, e.g. a certain part of the machine should have a
certain value.
In SLIM anywhere an action can occur, an assertion can be specified. Although the assertion
generates code for simulation purposes, no PLA entries are affected or generated. Assertions are
only used to ensure that certain properties hold. An assertion has the form assert invocation, where
invocation must be the invocation of a boolean function. Whenever execution reaches an assert
statement at simulation time, the simulation invokes the specified function. If the function returns
false the simulation is halted with an appropriate error message.
In using SLIM, we have found that the expressive power of SLIM's pipelining and signal definitions
is one of its major advantages. However, the mechanism can also lead to errors, since the
specifications are not reflected in the simulation. To assist in ensuring that the signal specifications in
a SLIM program are consistent and correct, two types of output-generation checking are supported.
Pipeline checking will cause a warning to be generated whenever a signal component both occurs in
a state and is pipelined into that state from another state. This appears to catch most errors in the use
of pipelining. Another powerful check is examining sets of mutually exclusive signals. A SLIM
program can specify one or more exclusive sets. SLIM will check that no two signals in the same
exclusive set can be generated in the same state.
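The exclusive set check can be pictured with the following Python sketch. It is an illustration only: the function name and data layout are hypothetical, and it treats each state as a flat set of signals, ignoring the input conditions under which the signals are generated.

```python
from itertools import combinations


def exclusive_violations(state_outputs, exclusive_sets):
    """Return (state, signal_a, signal_b) for each pair that breaks an exclusive set."""
    violations = []
    for state, signals in state_outputs.items():
        for group in exclusive_sets:
            # Signals of this state that belong to the same exclusive set.
            clashing = sorted(set(signals) & set(group))
            violations.extend((state, a, b) for a, b in combinations(clashing, 2))
    return violations
```

An empty result means that no state generates two signals from the same exclusive set.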
5 Current status and concluding remarks
This paper describes SLIM, a language and processing system for describing microcode whose
implementation orientation is PLA based. The purposes of this language are: to document the
microcode at a reasonable, logical level while providing a firm specification; to allow extensive
simulation, debugging, and error detection; and to automatically create the PLA layout necessary to
implement the microcode description.
SLIM has been working for approximately one year. It is coded in standard Pascal. To date,
experience with SLIM has been highly favorable. It has been used in the development of two large
chip designs [1, 2], both of which contain extensive microcoding. It has also been used in a number of
smaller projects with favorable results.
The most significant observation we have made in using SLIM is the enormous significance of the
control function and its design. For large projects, we have found that 60-75% of the design time is
spent in constructing and debugging the control as specified by SLIM. A large amount of this time is
spent in constructing an accurate functional specification of the data components as a SLIM
environment. In many instances, the construction of the SLIM environment has uncovered bugs in the
data components being described. The specification of the control program itself is also time
consuming, especially in the debugging process.
There are many interesting questions concerning the applicability of SLIM that have not been
investigated. It would be interesting to examine the use of SLIM for microcode machines whose
architecture is not strictly PLA based, but whose microcontrol is straightforward. We are also
interested in supporting a wide variety of PLA implementations and in PLA optimization.
Appendix 1. Annotated Syntax of SLIM
This is the syntax for the non-Pascal portion of SLIM. Nonterminal symbols appear to the left of =;
terminal symbols in the grammar are distinguished by being in quotes. The metasyntax [a] means
that the string a is optional, and {a} means that the string a may be repeated zero or more times.
Comments can appear at the end of a production and are started with --.
Program = 'program' '<id>' Programparms ';' Outerblock
Outerblock = Constpart Typedefpart Vardeclpart Iopart Procpart Fsm -- A Pascal program with a fsm body
Procheading = 'procedure' '<id>' Formalparms ';' Definitionpart -- Procedures contain definitions
Funcheading = 'function' '<id>' Formalparms ':' '<id>' ';' Definitionpart
Iopart = ['inputs' Spec {';' Spec}] ['outputs' Spec {';' Spec}] -- Input/output declarations
Spec = Vector {',' Vector} [':' Parameter {Parameter} ';'] -- An input/output vector
Vector = '<id>' ['[' '<int>' '..' '<int>' ']'] -- Vector has integer bounds
Parameter = 'pla' '(' '<int>' ')' -- PLA number
          = 'top' -- Top of PLA
          = 'bottom' -- Bottom of PLA
          = 'earlier' '(' '<int>' ')' -- Pipeline into earlier states
          = 'later' '(' '<int>' ')' -- Pipeline into later states
          = 'renames' '(' '<id>' ')' -- Rename a signal (without pipelining)
Definitionpart = ['definition' Definition {Definition}]
Definition = ['(' Patternlist ')' ':'] Output {'and' Output} ';' -- Definition is a series of pattern lists
Patternlist = Pattern {',' Pattern} -- each pattern list must match the parameter list
Pattern = '*' -- Wild card match
        = '<id>' -- Name match
Output = ['not'] Plainoutput -- Outputs can be inverted
Plainoutput = Invocation ['&' Output] -- Outputs can be composed by concatenation
            = '<id>' '=' Constant -- A vector can output an encoded integer
            = '<id>' '[' '<int>' ']' -- A single line from a vector can be made high
Fsm = 'fsm' Stateindpart {State} '.' -- The FSM contains a state independent part and a list of states
Stateindpart = '[' Statespecifiers ']'
State = ['<id>' ':'] '[' Statespecifiers ']' -- States are optionally labelled
Statespecifiers = Statespec {';' Statespec}
Statespec = ['if' Cond {'or' Cond} '=>'] Action -- A state is conditional on a sum of product terms
Cond = {Invocation 'and'} Invocation -- Form of a product term, the invocations are functions
Invocation = '<id>' ['(' Constant {',' Constant} ')'] -- Limited function invocation, constant can be a variable
Action = '[' Action {',' Action} ']' -- Composite action
       = 'assert' Invocation -- Assert action
       = Invocation -- Procedure invocation
       = 'next' '<id>' -- Goto specified state
       = 'call' '<id>' -- A microcode subroutine call
       = 'return' -- A microcode subroutine return
Appendix 2. More Examples
The Full Traffic Controller from Mead/Conway
program traffic(input,output);
const short = 2; long = 4;
type colortype = (green,yellow,red);
     signaltype = 0..1;
var time: integer; hl,fl: colortype; c: signaltype;
inputs c, tl, ts: bottom;
outputs st, hl[1..0], fl[1..0]: bottom;
procedure getinput; { for simulation purposes only }
begin write('cars? '); read(c); end;
procedure timer; { for simulation purposes only }
begin if time < long then time := time + 1 end;
procedure highlight(color: colortype);
definition
    (green): hl = 0;
    (yellow): hl = 1;
    (red): hl = 2;
begin hl := color end;
procedure farmlight(color: colortype);
definition
    (green): fl = 0;
    (yellow): fl = 1;
    (red): fl = 2;
begin fl := color end;
procedure starttimer;
definition st;
begin time := 0 end;
function cars: boolean;
definition c;
begin cars := (c = 1) end;
function notcars: boolean;
definition not c;
begin notcars := not cars end;
function timeout(length: integer): boolean;
definition
    (long): tl;
    (short): ts;
begin timeout := (time >= length) end;
function nottimeout(length: integer): boolean;
definition
    (long): not tl;
    (short): not ts;
begin nottimeout := not timeout(length) end;
fsm
[ getinput; timer ] { state independent component }
highgrn: [ highlight(green); farmlight(red);
    if notcars or nottimeout(long) => next highgrn;
    if cars and timeout(long) => [ starttimer; next highyel ] ]
highyel: [ highlight(yellow); farmlight(red);
    if nottimeout(short) => next highyel;
    if timeout(short) => [ starttimer; next farmgrn ] ]
farmgrn: [ highlight(red); farmlight(green);
    if cars and nottimeout(long) => next farmgrn;
    if notcars or timeout(long) => [ starttimer; next farmyel ] ]
farmyel: [ highlight(red); farmlight(yellow);
    if nottimeout(short) => next farmyel;
    if timeout(short) => [ starttimer; next highgrn ] ].
Example - Computing GCD
program test(input,output);
var x,y: integer;
inputs
    eq1, eq0, gtx, gty: bottom;
outputs
    aluop[1..2]: bottom;
    enablex, enabley: top earlier(1);
procedure init;
begin read(x); read(y); end;
procedure subt(var a,b: integer);
definition
    enable & a and enable & b and aluop = 1;
begin a := a-b end;
function greater(x,y: integer): boolean;
definition
    gt & x;
begin greater := x>y end;
function equal(x,y: integer): boolean;
definition eq & y;
begin equal := x = y; end;
function ne(x,y: integer): boolean;
definition not equal(x,y);
begin ne := not equal(x,y); end;
fsm
[ ]
one: [ init;
    assert ne(y,0);
    if equal(x,0) => next endstate ]
[ call two ]
[ next one ]
two: [ if greater(x,y) => [ subt(x,y); next two ];
    if greater(y,x) => [ subt(y,x); next two ] ]
three: [ assert equal(x,y);
    if equal(x,1) => [ writeln(1); return ];
    if ne(x,1) => [ writeln(y); return ] ]
endstate: [ halt ].
References 
1. Clark, J.H. "A VLSI Geometry Processor for Graphics." Computer 13, 7 (July 1980), 59-68.
2. Clark, J.H. and Hannah, M.R. "Distributed Processing in a High-Performance Smart Image
Memory." Lambda 1, 3 (1980), 40-45.
3. Clark, J.H., Hennessy, J.L., and Hannah, M.R. A comparison of two different VLSI control structures.
Computer Systems Laboratory, Stanford University, Dec. 1980.
4. Duley, J.R. and Dietmeyer, D.L. "Translation of DDL digital system specification to Boolean
equations." IEEE Trans. Computers C-18, 4 (Apr. 1969), 305-313.
5. Holloway, J., Steele, G., Sussman, G., and Bell, A. The Scheme-79 Chip. Tech. Rept. 599, Artificial
Intelligence Laboratory, MIT, Jan. 1980.
6. Mead, C. and Conway, L. Introduction to VLSI Systems. Addison-Wesley, Menlo Park, Ca., 1980.
7. Weber, H. High Level Design for Programmed Logic Arrays. Proceedings of Fourth Conf. on
Computer Hardware Description Languages, May 1979, pp. 96-101.
Signal Delay in RC Tree Networks* 
Paul Penfield, Jr.** 
Jorge Rubinstein*** 
ABSTRACT 
In MOS integrated circuits, signals may propagate between stages with 
fanout. The exact calculation of signal delay through such networks is 
difficult. However, upper and lower bounds for delay that are computationally 
simple are presented in this paper. The results can be used (1) to bound the 
delay, given the signal threshold; or (2) to bound the signal voltage, given a 
delay time; or (3) to certify that a circuit is "fast enough", given both the 
maximum delay and the voltage threshold. 
I. Introduction 
In MOS integrated circuits, a given inverter or logic node may drive 
several gates, some of them through long wires whose distributed resistance 
and capacitance may not be negligible. There does not seem to be reported in 
the literature any simple method for estimating signal propagation delay in 
such circuits, nor is there any general theory of the properties of RC trees, 
as distinct from RC lines. This paper presents a computationally simple 
technique for finding upper and lower bounds for the delay . The technique 
is of importance for VLSI designs in which the delay introduced by the 
interconnections may be comparable to or longer than active-device delay. 
This can be the case for wiring lengths as short as 1 mm, with 4-micron 
minimum feature size. The importance of this technique grows as the wiring 
lengths increase or the feature size decreases. 
*This work was supported in part by Digital Equipment Corporation, in 
part by the Advanced Research Projects Agency of the Department of Defense 
and monitored by the Office of Naval Research under Contract N00014-C-80-0622, 
and in part by the Air Force under Contract Number AFOSR 4-9620-80-0073. 
**Department of Electrical Engineering and Computer Science, Massachusetts
Institute of Technology, Room 36-575, Cambridge, MA 02139;
telephone (617) 253-2506.
***Digital Equipment Corporation, 75 Reed Road, Hudson, MA 01749; 
telephone (617) 568-4835. 
Consider the circuit of Figure 1. The slowest transition (and therefore
presumably the one of most interest) occurs when the driving inverter shuts
off and its output voltage rises from a small value to $V_{DD}$. During this
process the various parasitic capacitances on the output are charged through
the pullup transistor. Figure 2 shows a simple model of this circuit for
timing analysis. The pullup, which is nonlinear, is approximated by a linear
resistor, and the transition is represented by a voltage source going from 0
to $V_{DD}$ at time $t = 0$. (Later, for simplicity, a unit step will be considered
instead.) The polysilicon lines are represented by uniform RC lines.
The resistance of the metal line is neglected, but its parasitic capacitance 
remains. Capacitances associated with the pullup source diffusion, contact 
cuts, and the gates being driven are included. Any nonlinear capacitances are 
approximated by linear ones. 
In general, the circuit response cannot be calculated in closed form. 
The results of this paper can be used to calculate upper and lower bounds to 
the delay that are very tight in the case where most of the resistance is in 
the pullup. The theory as presented here does not explicitly deal with nonlinearities
and therefore does not apply to signal propagation through pass
transistors. A more complete discussion of this theory will appear elsewhere
[1].
II. Statement of the Problem 
An RC tree is defined as follows. Consider any resistor tree with no 
node at ground . From each node in this tree a capacitor to ground may be 
added, and any resistor may be replaced by a distributed RC line. Although 
nonuniform RC lines may appear in an RC tree, for simplicity, the examples in 
this paper involve only lumped resistors and capacitors and uniform RC lines. 
An RC tree has one input and any number of outputs. Side branches may or may 
not end in a node that is considered as an output; in fact, outputs may be 
taken anywhere in the tree. Nonuniform RC lines are special cases of RC 
trees, without any side branches. An important property of RC trees is that 
there is a unique path from any point in the tree to the input. 
The tree representing the signal path is driven at the input with a unit
step voltage. Gradually the voltages at all other nodes, and in particular
at all the outputs, rise from 0 to 1 volt. It is assumed that the output
voltages cannot be calculated easily. The problem is to find simple upper and
lower bounds for the output voltages, or, equivalently, to find upper and
lower bounds for the delay associated with each output.
III. Analytical Theory
Figure 1. Typical MOS signal-distribution network. The inverter is
shown driving three gates.
Figure 2. Linear-circuit model for the network of Figure 1. The
voltage source is a step at time $t = 0$.
Consider any output node e, and any lumped capacitor at node k with
capacitance $C_k$. For the moment consider only lumped capacitors; the theory
is similar if the distributed lines are considered also. One may think of
many-stage approximations for the distributed lines, or one may convert some
summations in the formulas below to a form including both summations over
lumped capacitors and integrals over distributed ones.
The resistance $R_{ke}$ is defined as the resistance of the portion of the
(unique) path between the input and e that is common with the (unique) path
between the input and node k. In particular, $R_{ee}$ is the resistance between
input and output e, and $R_{kk}$ is the resistance between the input and node
k. Thus $R_{ke} \le R_{kk}$ and $R_{ke} \le R_{ee}$. For an example, see Figure 3.
The sum (over all the capacitors in the network)
$$T_{De} = \sum_k R_{ke} C_k \tag{1}$$
has the dimensions of time and is in general different for each output. It is
equal to the first-order moment of the impulse response, which has been called
"delay" by Elmore [2]. This constant alone can be used to generate a lower
bound to the step response. Two other time constants and a set of tighter
upper and lower bounds using them will be given at the end of this section.
Figure 3. Illustration of resistance terms. For this network,
$R_{ke} = R_1 + R_2$, $R_{kk} = R_1 + R_2 + R_3$, and $R_{ee} = R_1 + R_2 + R_5$.
Let $v_k(t)$ and $v_e(t)$ be the voltages at the node k and output e
respectively, in response to a unit step excitation. The current $C_k\,dv_k/dt$
that feeds the capacitor $C_k$ contributes to the voltage drop between the
input and the output e by the amount $R_{ke} C_k\,dv_k/dt$ as it flows through the
resistance $R_{ke}$. The net voltage drop $1 - v_e(t)$ is obtained by adding the
contributions from all the currents in the tree:
$$1 - v_e(t) = \sum_k R_{ke} C_k \frac{dv_k}{dt}. \tag{2}$$
Integration of the right-hand side of (2) from 0 to $\infty$ yields $T_{De}$,
since the voltages are assumed to rise from 0 at $t = 0$ to 1 when $t$
approaches $\infty$, everywhere in the tree. Thus $T_{De}$ is equal to the area
above the unit step response $v_e(t)$ but below 1, as indicated in Figure 4.
Since $v_e(t)$ increases monotonically (a fact proven elsewhere [1]), no
rectangle with one corner on $v_e(t)$ and bounded by the lines $t = 0$ and
$v_e(t) = 1$ can have an area greater than $T_{De}$ (see Figure 4), i.e.,
$$t\,[1 - v_e(t)] \le T_{De}. \tag{3}$$
This expression yields a lower bound for $v_e(t)$:
$$v_e(t) \ge 1 - \frac{T_{De}}{t}. \tag{4}$$
This result illustrates how a suitably defined time constant $T_{De}$ can be
used in a bound for the step response. The computation of $T_{De}$ and the bound
are much simpler than the exact calculation of the response, especially for RC
trees with distributed lines.
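As a quick check of the rectangle argument, consider the simplest possible RC tree: a single lumped resistor R driving a single capacitor C. This case is chosen here only for illustration, because its exact response is known and the time constant defined by the sum over capacitors reduces to RC:

```latex
\begin{aligned}
T_{De} &= RC, \qquad v_e(t) = 1 - e^{-t/RC}, \\
t\,[1 - v_e(t)] &= t\,e^{-t/RC} \;\le\; \frac{RC}{e} \;<\; T_{De}.
\end{aligned}
```

The maximum of $t\,e^{-t/RC}$ occurs at $t = RC$, so the inscribed rectangle never covers more than the fraction $1/e$ of the area above the response, consistent with the bound.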
Figure 4. The shaded area is equal to $T_{De}$ and the rectangle
has smaller area.
More complete and tighter bounds require two additional time constants
$T_P$ and $T_{Re}$ to be defined:
$$T_P = \sum_k R_{kk} C_k \tag{5}$$
$$T_{Re} = \Big( \sum_k R_{ke}^2 C_k \Big) \Big/ R_{ee}. \tag{6}$$
Both summations extend over all the capacitors of the network. As with $T_{De}$,
these additional time constants can be computed easily, even in the presence
of distributed lines, and while $T_{Re}$ is in general different for different
output nodes, $T_P$ is the same for all outputs. It is easily seen that
$$T_{Re} \le T_{De} \le T_P. \tag{7}$$
For nonuniform RC lines (i.e., RC trees without side branches) $T_{De} = T_P$. For
a single uniform RC line, $T_P = T_{De} = RC/2$, and $T_{Re} = RC/3$. Lower bounds,
tighter than (4), and upper bounds can both be derived in terms of these three
characteristic times. A detailed derivation [1] leads to the upper bounds
$$v_e(t) \le 1 - \frac{T_{De} - t}{T_P} \tag{8}$$
$$v_e(t) \le 1 - \frac{T_{De}}{T_P}\, e^{-t/T_{Re}} \tag{9}$$
and lower bounds
$$v_e(t) \ge 0 \tag{10}$$
$$v_e(t) \ge 1 - \frac{T_{De}}{t + T_{Re}} \tag{11}$$
$$v_e(t) \ge 1 - \frac{T_{De}}{T_P}\, e^{(T_P - T_{Re})/T_P}\, e^{-t/T_P} \tag{12}$$
where (12) applies if $t \ge T_P - T_{Re}$. The tightest upper bounds are (8) for
small $t$ and (9) for large $t$. The tightest lower bounds are (10) for
$t \le T_{De} - T_{Re}$, (11) for $T_{De} - T_{Re} \le t \le T_P - T_{Re}$, and (12) for
$T_P - T_{Re} \le t$.
Bounds for the time, given the voltage, are possible because the voltage
is a monotonic function of time. Of course

$$t \ge 0 \tag{13}$$

and in addition, (8) and (9) can be inverted to yield

$$t \ge T_{De} - T_P\,[1 - v_e(t)] \tag{14}$$

$$t \ge T_{Re} \ln \frac{T_{De}}{T_P\,[1 - v_e(t)]} \tag{15}$$

and (11) and (12) yield

$$t \le \frac{T_{De}}{1 - v_e(t)} - T_{Re} \tag{16}$$

$$t \le T_P - T_{Re} + T_P \ln \frac{T_{De}}{T_P\,[1 - v_e(t)]} \tag{17}$$

where (17) only applies if $v_e(t) \ge 1 - T_{De}/T_P$. The general form of all these
bounds is illustrated in Figure 5.
These bounds, (8) to (12) for voltage, and (13) to (17) for time, constitute
the major result of this paper.
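The bounds can be written compactly in executable form. The following Python sketch is an illustration only, not the authors' program; the function names `v_bounds` and `t_bounds` and the argument names `tp`, `td`, `tr` (standing for $T_P$, $T_{De}$, $T_{Re}$) are assumptions made here. It assumes positive time constants and a threshold below 1.

```python
import math

def v_bounds(t, tp, td, tr):
    """Return (lower, upper) bounds on the unit step response at time t."""
    upper = min(1.0 - (td - t) / tp,                     # bound (8)
                1.0 - (td / tp) * math.exp(-t / tr))     # bound (9)
    lower = max(0.0, 1.0 - td / (t + tr))                # bounds (10), (11)
    if t >= tp - tr:                                     # (12) applies only here
        lower = max(lower,
                    1.0 - (td / tp) * math.exp((tp - tr - t) / tp))
    return lower, upper

def t_bounds(v, tp, td, tr):
    """Return (lower, upper) bounds on the time at which the response reaches v."""
    x = td / (tp * (1.0 - v))
    lower = max(0.0, td - tp * (1.0 - v), tr * math.log(x))  # (13), (14), (15)
    upper = td / (1.0 - v) - tr                              # bound (16)
    if v >= 1.0 - td / tp:                                   # (17) applies only here
        upper = min(upper, tp - tr + tp * math.log(x))
    return lower, upper

# Illustrative use: a single uniform line with R = C = 1,
# for which T_P = T_De = RC/2 and T_Re = RC/3.
t_low, t_high = t_bounds(0.5, 0.5, 0.5, 1.0 / 3.0)
v_low, v_high = v_bounds(t_high, 0.5, 0.5, 1.0 / 3.0)
```

Because (17) is the inverse of (12), evaluating the lower voltage bound at the upper time bound recovers the threshold that was supplied.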
IV. Practical Algorithms 
Figure 5. Form of the bounds, with the distances from
the exact solution exaggerated for clarity.

One way to use the inequalities of the previous section is to consider
the overall RC tree, and compute for each capacitor the appropriate $R_{ke}$ and
$R_{kk}$ so that $T_P$, $T_{De}$, and $T_{Re}$ for each output can be found. Of course
for distributed lines the sums are replaced by appropriate integrals. In this
approach, the calculations necessary for each output require time proportional
to the square of the number of elements.
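A minimal Python sketch of this direct approach follows, for lumped elements only. The tree representation (a `parent` map with node 0 as the input, and maps `r` and `c` giving each node's series resistance from its parent and its capacitance to ground) is an assumption made for illustration, as is the small example network.

```python
def path(node, parent):
    """Set of nodes on the path from the input (node 0) to `node`, excluding node 0."""
    nodes = set()
    while node != 0:
        nodes.add(node)
        node = parent[node]
    return nodes

def time_constants(parent, r, c, e):
    """T_P, T_De and T_Re for output node e, by direct summation over capacitors."""
    pe = path(e, parent)
    ree = sum(r[n] for n in pe)
    tp = td = tr_sum = 0.0
    for k in parent:                       # every node k carries a capacitor c[k]
        pk = path(k, parent)
        rkk = sum(r[n] for n in pk)        # resistance from input to node k
        rke = sum(r[n] for n in pk & pe)   # resistance shared with the path to e
        tp += rkk * c[k]
        td += rke * c[k]
        tr_sum += rke * rke * c[k]
    return tp, td, tr_sum / ree

# Hypothetical lumped tree: node 1 feeds a side branch (node 2) and the output (node 3).
parent = {1: 0, 2: 1, 3: 1}
r = {1: 15, 2: 8, 3: 3}
c = {1: 2, 2: 7, 3: 9}
tp, td, tr = time_constants(parent, r, c, e=3)
```

Each capacitor requires a walk back toward the input, which is where the quadratic cost for deep trees comes from.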
An alternate approach is to build up the network by construction, and
calculate independently for each of the partially constructed networks enough
information to permit the final calculation of $T_P$, $T_{De}$, and $T_{Re}$. A recursive
definition of RC trees is given below, and if the network is expressed
in these terms rather than in the form of a schematic diagram, the resulting
expression can be used as a guide for the calculations. The computation time
for each output is proportional to the number of elements, rather than the
square of the number. Programs that implement this approach appear below.
For simplicity, the tree is assumed to consist of lumped capacitors,
lumped resistors, and uniform (not nonuniform) RC lines. Only one output is
considered; a more general set of programs is described elsewhere [1]. Only
one primitive element, a uniform line (URC), is necessary. If either the
resistance of the line or the capacitance is zero, the line reduces to a
lumped capacitor or resistor. The line is denoted URC R,C where R,C is a
vector of length 2 consisting of the resistance and capacitance of the line,
in that order. A capacitor is written URC 0,C and a resistor URC R,0.
Figure 6 shows a way of converting a subtree into a side branch and a way of
cascading two subtrees. The topology of any RC tree can be denoted by an
expression using only these two functions, WB and WC.
Example: The network shown in Figure 7 is a tree with one side branch and
may be denoted

    (URC 15 0) WC (URC 0 2) WC (WB (URC 8 0) WC URC 0 7) WC (URC 3 4) WC URC 0 9.    (18)
Figure 6. Wiring functions for interconnecting elements or subtrees
(panels: WB A and A WC B). Here A and B are previously defined RC trees.

Figure 7. Example network. Parameter values are
in ohms and farads.

An expression such as (18) can be used as a guide for the calculations if 
each function shown corresponds to the calculation of partial results which 
are sufficient to allow further calculations . The following information is 
adequate at each stage in the construction of the network: 
(1) Total capacitance $C_T$.
(2) $T_P$ of the network as constructed so far.
(3) Considering port 2 as the output, $R_{22}$, $T_{D2}$, and $T_{R2}$. (For
convenience, the product $R_{22}T_{R2}$ is used in the programs below
instead of $T_{R2}$.)
Each of the quantities identified above pertains to the particular subnetwork
and can be calculated from a knowledge of that subnetwork alone, independent
of how the subnetwork may later be wired together with other subnetworks. As
an example of the use of these quantities during construction of the network,
consider the cascade operation WC. The objective is to find $C_T$, $T_P$, $R_{22}$,
$T_{D2}$, and $T_{R2}$ of the cascade A WC B from the corresponding quantities for
its two arguments, A and B. The formulas for calculating these are

$$C_T = C_{TA} + C_{TB} \tag{19}$$

$$T_P = T_{PA} + T_{PB} + R_{22A} C_{TB} \tag{20}$$

$$R_{22} = R_{22A} + R_{22B} \tag{21}$$

$$T_{D2} = T_{D2A} + T_{D2B} + R_{22A} C_{TB} \tag{22}$$

$$T_{R2} R_{22} = T_{R2A} R_{22A} + T_{R2B} R_{22B} + 2 R_{22A} T_{D2B} + R_{22A}^2 C_{TB} \tag{23}$$

The corresponding formulas for WB are even simpler:

$$C_T = C_{TA} \tag{24}$$

$$T_P = T_{PA} \tag{25}$$

$$R_{22} = 0 \tag{26}$$

$$T_{D2} = 0 \tag{27}$$

$$T_{R2} R_{22} = 0. \tag{28}$$

A set of APL functions which implement this approach appears in Figures 8
and 9. The necessary data is passed around in the form of vectors. A two-port
network is represented by the vector $C_T$, $T_P$, $R_{22}$, $T_{D2}$, $T_{R2}R_{22}$. The
listing of WC, for example, shows the calculation of the required output,
term by term, from the arguments. This function can be compared with (19) to
(23).
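For readers unfamiliar with APL, the element and wiring rules (19) to (28) can be transcribed into Python as a sketch. The lower-case names `urc`, `wb`, `wc` and the use of plain lists are choices made here for illustration; the left-to-right `reduce` is an assumption that relies on the cascade rule being associative, which follows from expanding (19) to (23) for three subtrees.

```python
from functools import reduce

def urc(r, c):
    """Uniform RC line as [C_T, T_P, R_22, T_D2, T_R2*R_22]."""
    return [c, r * c / 2, r, r * c / 2, r * r * c / 3]

def wb(a):
    """Convert subtree a into a side branch, per (24) to (28)."""
    return [a[0], a[1], 0, 0, 0]

def wc(a, b):
    """Cascade subtree a with subtree b, per (19) to (23)."""
    return [a[0] + b[0],
            a[1] + b[1] + a[2] * b[0],
            a[2] + b[2],
            a[3] + b[3] + a[2] * b[0],
            a[4] + b[4] + 2 * a[2] * b[3] + a[2] ** 2 * b[0]]

# The network of expression (18).
branch = wb(wc(urc(8, 0), urc(0, 7)))
net = reduce(wc, [urc(15, 0), urc(0, 2), branch, urc(3, 4), urc(0, 9)])
c_t, t_p, r_22, t_d2, t_r2_r22 = net
t_r2 = t_r2_r22 / r_22
```

For this network the sketch gives $T_P = 419$ and $T_{D2} = 363$, so that bound (14) at a threshold of 0.2 evaluates to 363 - 0.8 x 419 = 27.8, in agreement with the corresponding entry of Figure 10.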
Figure 9 shows five functions intended to calculate the bounds for any 
network . The two functions TMIN and TMAX calculate the lower and upper 
bounds for delay, and refer to a global variable named V which contains the 
threshold, a number (or array of numbers) between 0 and 1. The functions 
VMIN and VMAX calculate the lower and upper bounds for signal voltage and 
refer to a global variable T containing an array of delay times. The final 
function, OK, refers to both V and T and returns 1 if all is well,
that is, if TMAX ≤ T, or -1 if the network definitely will fail, that is
if T < TMIN, or 0 if the bounds are not tight enough to tell for sure,
that is if TMIN ≤ T < TMAX. An example of the use of these functions to test
the network in Figure 7 is shown in Figures 10 and 11. 
Because these functions were written for exposition, no protection is 
included against meaningless values of V or T. In addition, these fail for 
networks without any resistances or capacitances, and for V = 0 or T = 0. 
V. Application to PLA Speed Estimates 
These bounds are applied, as an example, to polysilicon lines driving the 
AND plane of a PLA, to determine whether or not the dominant delay occurs 
here . It is assumed that a strong superbuffer driver drives the line, and 
that every second minterm has a transistor present. The gates are assumed to 
be 4 microns square, separated by 24 microns of RC line. The poly resistance 
is assumed to be 30 ohms per square, the gate-oxide thickness 400 Angstroms, 
and the field-oxide thickness 3000 Angstroms. 
These numbers lead to a capacitance of 0.01 pF and resistance 180 ohms 
between gates, and a resistance of 30 ohms and capacitance of 0.013 pF for 
each gate. The network is driven by a source resistance of 380 ohms and the 
effective capacitance of the output of the driver is estimated as 0.04 pF . 
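The element values quoted above can be checked approximately with a parallel-plate estimate. In the Python sketch below, the relative permittivity of the oxide (3.9) and the assumption that the interconnecting line is 4 microns wide, like the gates, are not stated in the text and are introduced only for illustration.

```python
EPS0 = 8.854e-12   # F/m, permittivity of free space
EPS_R = 3.9        # assumed relative permittivity of silicon dioxide

def plate_cap_pf(width_um, length_um, t_ox_angstrom):
    """Parallel-plate capacitance in pF of a rectangle over oxide."""
    area_m2 = (width_um * 1e-6) * (length_um * 1e-6)
    return EPS0 * EPS_R * area_m2 / (t_ox_angstrom * 1e-10) * 1e12

def sheet_res(width_um, length_um, ohms_per_square):
    """Resistance in ohms of a rectangle of the given sheet resistance."""
    return ohms_per_square * length_um / width_um

line_r = sheet_res(4, 24, 30)        # poly between gates, over field oxide
gate_r = sheet_res(4, 4, 30)         # poly over one gate
line_c = plate_cap_pf(4, 24, 3000)   # field-oxide capacitance, pF
gate_c = plate_cap_pf(4, 4, 400)     # gate-oxide capacitance, pF
```

Under these assumptions the estimate gives 180 ohms and roughly 0.011 pF between gates, and 30 ohms and roughly 0.014 pF per gate, close to the values used in the text.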
    ∇ Z←URC X
[1] Z←X[2],(X[1]×X[2]÷2),X[1],(X[1]×X[2]÷2),X[1]×X[1]×X[2]÷3
    ∇
    ∇ Z←WB A
[1] Z←A[1 2],0 0 0
    ∇
    ∇ Z←A WC B
[1] Z←(A[1]+B[1]),(A[2]+B[2]+A[3]×B[1]),(A[3]+B[3]),A[4]+B[4]+A[3]×B[1]
[2] Z←Z,A[5]+B[5]+(2×A[3]×B[4])+A[3]×A[3]×B[1]
    ∇
Figure 8. APL functions for the element and wiring functions.

    ∇ Z←VMIN A
[1] Z←(T≥A[2]-A[5]÷A[3])×1-(*-(T-A[2]-A[5]÷A[3])÷A[2])×A[4]÷A[2]
[2] Z←0⌈Z⌈1-A[4]÷T+A[5]÷A[3]
    ∇
    ∇ Z←VMAX A
[1] Z←(1-(A[4]÷A[2])×*-T÷A[5]÷A[3])⌊(T+A[2]-A[4])÷A[2]
    ∇
    ∇ Z←TMIN A
[1] Z←-⍟A[2]×(1-V)÷A[4]
[2] Z←0⌈(Z×A[5]÷A[3])⌈A[4]-A[2]×1-V
    ∇
    ∇ Z←TMAX A
[1] Z←(A[4]÷1-V)-A[5]÷A[3]
[2] Z←Z⌊(A[2]-A[5]÷A[3])-0⌊A[2]×⍟A[2]×(1-V)÷A[4]
    ∇
    ∇ Z←OK A
[1] Z←(T≥TMAX A)-T<TMIN A
    ∇
Figure 9. Response functions.

      ⍝ AN EXAMPLE OF THE USE OF THE RC TREE DELAY CALCULATIONS.
      BRANCH ← WB (URC 8 0) WC URC 0 7
      NET ← (URC 15 0) WC (URC 0 2) WC BRANCH WC (URC 3 4) WC URC 0 9
      ⍝ NOW THE NETWORK IS DEFINED.
      V ← 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
      ⍝ NOW THE VECTOR OF THRESHOLD VOLTAGES IS DEFINED.
      ⍝ NEXT TO FIND THE MINIMUM AND MAXIMUM BOUNDS FOR DELAY:
      V,(TMIN NET),[1.5] TMAX NET
 0.1     0        68.167
 0.2    27.8     117.22
 0.3    71.46    173.17
 0.4   123.13    237.76
 0.5   184.23    314.15
 0.6   259.02    407.65
 0.7   355.45    528.18
 0.8   491.34    698.07
 0.9   723.66    988.5
      ⍝ NOW TO DEFINE A DELAY VECTOR AND GET THE VOLTAGE BOUNDS:
      T ← 20 40 60 80 100 200 300 400 500 1000 2000
      T,(VMIN NET),[1.5] VMAX NET
   20   0         0.18138
   40   0.03243   0.22912
   60   0.0814    0.27565
   80   0.12565   0.31761
  100   0.16644   0.35714
  200   0.34342   0.52297
  300   0.48283   0.64603
  400   0.59263   0.73734
  500   0.67913   0.8051
 1000   0.90271   0.95615
 2000   0.99105   0.99778
Figure 10. Example of the use of the fast calculation scheme to
find upper and lower bounds on delay and response voltage.

Figure 11. Upper and lower bounds as calculated in Figure 10. The exact
solution, found from circuit simulation, is shown also. (Axes: voltage v
against time t.)

A function which returns a network with N minterms is shown in Figure
12. The results of calculating the delay as a function of the number of
minterms are shown in Figure 13. The voltage threshold was taken to be 0.7
times $V_{DD}$. On this log-log plot the quadratic dependence of delay on number
of minterms (as a measure of the length of the line) is evident. Also evident
is the fact that even with as many as a hundred minterms, the delay is
guaranteed to be no worse than 10 nsec. This suggests that the dominant delay in a
PLA occurs elsewhere.

    ∇ Z←PLALINE N;A
[1] A←(URC 180 0.0107)WC URC 30 0.0134
[2] ⍝ A IS A SINGLE SECTION ACCOUNTING FOR TWO MINTERMS
[3] Z←(URC 378 0)WC URC 0 0.04
[4] ⍝ Z IS THE PULLUP R AND C FOR SUPERBUFFER DRIVER
[5] LOOP:→(N≤0)/0
[6] Z←Z WC A
[7] N←N-2
[8] →LOOP
    ∇
Figure 12. APL function which returns a model of a PLA line
with N minterms.

Figure 13. Upper and lower bounds on response time of the network of Figure
12, shown as a function of the number of minterms in the PLA. (Log-log plot;
horizontal axis: number of minterms.)

VI . Conclusions 
A computationally efficient method for calculating the signal delay 
through MOS interconnect lines with fanout has been described. Tight upper 
and lower bounds for the step response of RC trees have been presented, 
together with linear-time algorithms for these bounds from an algebraic 
description of the tree. Substantial computational simplicity is achieved 
even in the presence of RC distributed lines by representing the RC tree by a 
small set of suitably defined characteristic times, which can be calculated by 
inspection and used to generate the bounds. 
Although only the step response is considered here, the results can be
extended to upper and lower bounds for arbitrary excitation by use of the
superposition integral [1].
Extensions of the theory to RC trees with nonlinear elements (similar to
the work of Glasser [3] for nonlinear MOS inverters) would be desirable for
better modeling of MOS circuits. Investigations of RC trees with nonlinear
capacitors and resistors are now under way, along with attempts to unify the
modeling of gates and interconnects, and in particular to include pass
transistors in the interconnects. Tighter bounds are also being sought.
VII. Acknowledgements 
The authors are pleased to acknowledge discussions with Steven Greenberg, 
Llanda Richardson, and Lance Glasser, and help from Barbara Lory in manuscript 
preparation. 
References 
[1] J. Rubinstein and P. Penfield, Jr.; to be published.
[2] W. C. Elmore, "The Transient Response of Damped Linear Networks with
Particular Regard to Wide-Band Amplifiers," Journal of Applied Physics, vol.
19, no. 1, pp. 55-63; January, 1948.
[3] L. A. Glasser, "The Analog Behavior of Digital Integrated Circuits,"
private communication; December, 1980.
FUNCTIONAL VERIFICATION IN AN INTERACTIVE SYMBOLIC 
IC DESIGN ENVIRONMENT 
Bryan Ackland 
Neil Weste 
Bell Laboratories 
Holmdel , New Jersey 07733 
ABSTRACT 
This paper describes verification techniques that have been implemented as part of 
an interactive symbolic IC design system. Circuit analysis programs perform node
extraction and gate decomposition. They generate both transistor and gate level
circuit descriptions which are used as input to a transistor level digital MOS timing
simulator. The extraction programs make use of an intermediate circuit descrip-
tion language which captures both geometric placement and circuit connectivity. 
All programs are written in the C programming language and run under the UNIX 
operating system. An example is included to demonstrate the operation of these 
various techniques. 
1. INTRODUCTION 
Functional verification is an important and necessary step in the design of large scale 
integrated circuits. It is that part of the design cycle which eliminates most , preferably all, of 
the human errors introduced in the forward part of the design. It is generally a two stage pro-
cess consisting firstly of automatic circuit extraction, in which electrical circuit descriptions are 
generated from the physical layout, and secondly of functional simulation of the derived cir-
cuit. Symbolic design techniques simplify the circuit extraction task as they introduce struc-
tural or circuit information into the layout file and remove unnecessary geometrical data. 
Interactive design techniques, however, place additional constraints on verification in that
they demand fast response in order to avoid slowing the interactive design cycle.
This paper describes verification techniques that have been implemented on MULGA [1] -
a UNIX† based interactive symbolic layout system. They consist of two programs which perform
nodal circuit extraction and gate decomposition, and EMU - a transistor level MOS timing
simulator. All programs are written in the C programming language and run under the
UNIX operating system in a microcomputer based design station.
2. MULGA 
Symbolic layout provides a means of abstracting the detailed and often laborious task of
mask design. It offers the advantages of manual layout with regard to density, along with
reduced design time and reduced likelihood of manual error. MULGA is a UNIX based
interactive symbolic design system consisting of a suite of programs residing on a high
performance color display station. Figure 1 shows the various software components of MULGA and
the way in which they come together to effect a design.

Figure 1. MULGA design system

† UNIX is a trademark of Bell Laboratories

The system is based around a symbolic Intermediate Circuit Description Language 
(ICDL) which uses a derivation of the co-ordinode notation introduced by Buchanan and 
Gray [2]. It combines circuit topology with geometric placement on a coarse virtual grid. In 
this way, the language captures designer intent with respect to the circuit, rather than a collec-
tion of abstract geometric forms . 
The basic structure in ICDL is a cell which is a collection of elements placed on a virtual 
grid as shown in Figure 2. These elements may be devices, wires , contacts, pins, or other cell 
instances. Pins are named interconnection points that have no physical meaning in the final 
layout. They are a very important attribute of the language, however, and are used exten-
sively in cell placement procedures and circuit verification. Figure 2 shows a CMOS 2-input 
nand gate represented graphically alongside the textual ICDL description of the cell. Note
that each line of text corresponds to an actual circuit component rather than a geometrical
shape. Components are restricted to lie on grid intersection points as shown. Note, however,
that this grid is only a relative placement network which defines the topology of the layout. 
Actual physical dimensions are determined later by a compaction process. 
    begin nand2
    pin al 1 1 vss
    pin al 1 8 vdd
    pin poly 4 9 A
    pin poly 6 9 B
    pin al 7 5 Z
    dev n or=1 4 3
    dev n or=1 6 3
    dev p or=1 4 6
    dev p or=1 6 6
    wire al 1 1 8 1
    wire al 1 8 8 8
    wire poly 4 3 4 9
    wire poly 6 3 6 9
    wire al 3 6 7 6 7 3
    wire Ndif 3 1 3 3
    wire Pdif 5 6 5 8
    con cut 7 3
    con cut 7 6
    con cut 3 1
    con cut 3 6
    con cut 5 8
    end
Figure 2. ICDL description of 2-input CMOS nand gate

ICDL cell descriptions may be generated either via the interactive editor or else
procedurally using the C programming language. Once the designer is satisfied with the symbolic
description, the file is compacted. The compaction program examines each symbolic grid line
in the layout to determine how far it must be spaced from its neighbours in order to satisfy 
process design rules. Compaction information is stored in the design grid file . This file, along 
with the original ICDL description defines the minimum cell geometry assuming no other 
constraints in the design. 
A chip assembler program takes the design grid file along with a specified chip floor plan 
and generates a mask coordinate file describing the actual physical location of the symbolic 
grid lines in the final layout. The chip assembler frequently needs to expand previously com-
pacted cells in order to maintain global connectivity. The final step in the forward design path 
is the conversion and placement of cells into XYMASK data files. XYMASK is the geometric
mask definition language used by the Bell System. A second interactive editor provides a
means whereby the designer can view and evaluate his final design.
The verification phase consists of two circuit extraction programs and EMU - a transistor 
level MOS timing simulator . The first program performs node extraction and produces, in 
addition to the node list, a transistor level description of the circuit suitable as input to a cir-
cuit simulator such as SPICE. The second performs gate decomposition producing the higher 
level circuit description required by EMU . The following sections describe the operation of 
these three programs. 
3. NODE EXTRACTION 
The complexity of the circuit extraction process is heavily influenced by the nature of the
layout definition language. One of the advantages of ICDL is that it carries implicit circuit
connectivity information along with the physical topology. As shown in Figure 3, devices have
designated connection points which are related by simple geometric rules to the center of the
device. Wires serve to connect devices and external connections via interlayer contacts.
Electrical connectivity is established when two elements exist on the same layer at the same
virtual grid position. The pin construct aids the designer in naming specific nodes and
connection points. This implicit connectivity is used, in conjunction with a simple algorithm, to
arrive at a transistor node table description of the cell.

Figure 3. Implicit connectivity of ICDL components

The algorithm begins by first reading in a complete description of the ICDL cell. Following 
this, each pin, contact and transistor connection is assigned a different node number. Figure 4 
shows a hypothetical net of components labelled in this manner. If there were no wires in the 
circuit, all such initial node numbers would be unique. Wires serve to connect components 
and reduce the overall number of unique nodes in the circuit. Accordingly, each wire is 
examined in turn to determine which nodes are redundant. A list is made of all nodes belong-
ing to that wire. If the wire crosses another wire of the same type, connectivity is established 
by adding one node from the new wire to the old wire node list. 
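The merging step can be illustrated with a short Python sketch. It uses a union-find (disjoint-set) structure in place of the wire node lists described above, which is a substitution made here for brevity; the function name and the input format (one list of initial node numbers per wire) are likewise assumptions.

```python
def merge_nodes(num_nodes, wires):
    """Collapse initial node numbers that are joined by wires.

    `wires` holds one list per wire, giving the initial node numbers that
    lie on that wire.  Returns a representative node for each initial node.
    """
    parent = list(range(num_nodes))

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving keeps lookups short
            n = parent[n]
        return n

    for wire in wires:
        for n in wire[1:]:
            parent[find(n)] = find(wire[0])  # join n to the wire's first node
    return [find(n) for n in range(num_nodes)]

# Hypothetical example: six initial nodes and three wires.  The second and
# third wires share node 3, so nodes 2, 3 and 4 become a single node.
nodes = merge_nodes(6, [[0, 1], [2, 3], [3, 4]])
```

Two wires that share a node play the role of the crossing wires in the text: the shared node links the two lists into one electrical node.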
These wire node lists are used to eliminate redundant node numbers and generate a node
net list description of the circuit. Pin names are used, wherever possible, to identify named
nodes. Un-named nodes are given an internally generated name. Parasitic capacitance values
for each node are calculated using the topology contained in the ICDL description along with
absolute grid dimensions obtained from the mask coordinate file and specified process
parameters. At this stage, sufficient information has been gathered to produce a transistor level
circuit description. A simple filter converts this data into a SPICE simulation file. Figure 5
shows the simulation file generated from the 2-input nand gate described in Figure 2.

Figure 4. An example of initial node numbering (legend: X marks a contact;
the transistor symbol marks a device)

    .SUBCKT nand2 ( vss vdd A B Z )
    MN1 10 A vss vss N7A
    MN2 Z B 10 vss N7A
    MP1 Z A vdd vdd P7A
    MP2 Z B vdd vdd P7A
    CMvss vss 0 CMTOSH 42
    CNTvss vss 0 CN+H 12
    CMvdd vdd 0 CMTOSH 42
    CPTvdd vdd 0 CP+H 12
    CPA A 0 CPTOSH 36
    CPB B 0 CPTOSH 36
    CMZ Z 0 CMTOSH 42
    .FINIS
Figure 5. SPICE description of 2-input nand gate

4. SIMULATION
Analog circuit simulators such as SPICE give very accurate reliable feedback as to the 
functional operation of a circuit. They tend , however , to be very expensive in terms of com-
puter and engineer time. In an interactive design environment, ease of operation and fast tur-
naround are of paramount importance and some accuracy can often be sacrificed to achieve 
this . For these reasons, a UNIX based MOS timing simulator known as EMU was developed. 
An important feature of the simulator is the fact that it is a resident part of the design station
software and is therefore capable of giving the designer fast feedback concerning the opera-
tion of his circuit. 

Figure 6. EMU circuit model: (a) generalized model showing current sources
and voltage nodes; (b) bi-directional circuit element.

Timing simulators fall somewhere in between circuit simulators and logic simulators. They
model digital circuits as collections of idealized transistors which may be grouped in a defined
manner to form simple logic functions. Unlike logic simulators, they generate an analog
waveform and are able to deal with limited analog effects such as charge storage and
bidirectional circuit elements. Performance, however, is typically one to two orders of magnitude
faster than analog circuit simulators. EMU is a MOS timing simulator which, like MOTIS [3],
uses certain properties of the MOS transistor to greatly simplify the circuit model. These
properties are represented by the following approximations:
1. The input resistance of the gate terminal is infinite, i.e. the input impedance is purely
capacitive.
2. The leakage current of a MOS device is zero.
3. The channel may be represented as a voltage controlled d.c. current source.
4. In a self-aligned digital process, Miller effects are negligible.
5. The impedance to ground at any node is dominated by diffusion, gate and wiring
capacitances which are voltage independent.
These approximations lead to the model shown in Figure 6(a). The circuit consists of a
number of capacitive nodes interconnected by various voltage controlled current sources. Any
number of current sources may drive a single node. Bidirectional circuit elements are
represented by two sources as shown in Figure 6(b). Current sources may be transistors,
load devices or compound gate structures.
4.1 Compound Gates 
MOS gates typically consist of driver transistors connected in series/parallel combinations
as shown in Figure 7. Parallel branches pass current if any of the component elements
conduct. This is equivalent to an OR function. Similarly, series branches conduct only if all
component elements conduct - hence an AND function.

Figure 7. Typical CMOS gate construction

Although a logic gate can be simulated as a collection of individual transistors, much data
space and simulation time can be saved if it can be modelled as a single current source.
MOTIS uses the concept of a compound gate in which series/parallel combinations of driver
transistors are lumped together into a single data structure. It is based on the following
approximations:
1. The total current sourcing a node is the sum of the currents sourced by all parallel
branches connected to that node.
2. The current sourced by a branch containing several elements in series is the inverse sum
of the currents that would be sourced by each element if it were the only element in the
branch.
3. All transistors in the gate structure operate independently. Their only point of interaction 
is through the gate output node. 
EMU uses this same compound gate construction. Driver transistors are specified in a 
"reverse polish" manner as shown in Figure 8. Transistor position in the gate is defined by 
the operators push, parallel and series. Push places a transistor on to an imaginary stack. 
Parallel means that the named device is in parallel with the top element of the stack. The 
resulting parallel combination replaces the element on the stack. Series means that the named
device is in series with the top element of the stack. The resulting series combination replaces
the element on the stack. Parallel or series without an operand operates on the top two
elements of the stack and pops the stack one position.

    PUSH(A). PARL(B). PUSH(C). PARL(D). PARL(E). SERS.
    PUSH(F). SERS(G). PUSH(I). SERS(J). PARL(H). SERS. PARL
Figure 8. Reverse Polish gate specification
This data structure is simply interpreted by a "reverse polish" current calculator. The
calculator uses a real stack to store transistor conduction currents. The operator push causes a new
current to be pushed on to the stack. The operator parallel causes two currents to be added.
Similarly, the operator series causes two currents $I_a$ and $I_b$ to be combined according to:

$$\frac{1}{I} = \frac{1}{I_a} + \frac{1}{I_b}$$
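A small Python sketch of such a calculator follows. It is illustrative only: the tuple encoding of the gate program, the name `gate_current`, and the treatment of a non-conducting device as a zero current that blocks its series branch are assumptions made here, not details taken from EMU.

```python
def gate_current(program, currents):
    """Evaluate a compound-gate program on a stack of conduction currents.

    `program` is a list of (operator, name) pairs, with name None when the
    operator has no operand.  `currents` maps each device name to the
    current that device would source on its own.
    """
    def series(a, b):
        # Inverse sum; a device carrying no current blocks the branch.
        return 0.0 if a == 0.0 or b == 0.0 else 1.0 / (1.0 / a + 1.0 / b)

    stack = []
    for op, name in program:
        if op == "PUSH":
            stack.append(currents[name])
            continue
        # With an operand, combine it with the top of the stack;
        # without one, combine the top two stack entries.
        b = currents[name] if name is not None else stack.pop()
        a = stack.pop()
        stack.append(a + b if op == "PARL" else series(a, b))
    return stack.pop()

# The gate specification of Figure 8.
FIG8 = [("PUSH", "A"), ("PARL", "B"), ("PUSH", "C"), ("PARL", "D"),
        ("PARL", "E"), ("SERS", None), ("PUSH", "F"), ("SERS", "G"),
        ("PUSH", "I"), ("SERS", "J"), ("PARL", "H"), ("SERS", None),
        ("PARL", None)]
unit = {name: 1.0 for name in "ABCDEFGHIJ"}
total = gate_current(FIG8, unit)
```

With every device sourcing one unit of current, the two main branches contribute 1.2 and 0.375 units respectively.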
4.2 Gate Advancement 
Gate advancement is the technique by which node voltages are updated for each new time
step of the simulator. Referring to Figure 6, each node in the circuit model is dominated by a
capacitance to ground C. This means that for a sufficiently small time step, node voltages
within the circuit are essentially constant. Source currents, which are in turn functions of
circuit voltages, are therefore also constant. Suppose a node is driven by n current sources
$I_1 \ldots I_n$. The total current into the node is then $I = \sum_{k=1}^{n} I_k$. For a sufficiently small
time step $(t - t_0)$, the new node voltage $V(t)$ is given by:

$$V(t) = V(t_0) + \frac{I\,(t - t_0)}{C}$$

The accuracy of this simple forward integration scheme depends critically on the choice of
time step. If the time step is too large, circuit voltages and currents may change significantly
during one iteration and errors will be introduced. In addition, voltages tend to overshoot,
leading to numerical instability - especially with bidirectional circuit elements such as
transmission gates. On the other hand, if the time step is too small, much simulation time is
wasted iterating over unnecessarily small time intervals.
Rather than leaving this delicate choice of time step to the operator, EMU automatically
adjusts the time step to maintain simulation accuracy. It does this by monitoring the
maximum voltage step occurring from one time instant to another and adjusting the time step
accordingly. Simulation thus proceeds rapidly during periods of low circuit activity and then
slows down to critically examine those periods when changes are taking place.
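The idea can be sketched in Python for a single capacitive node. The control rule used here (halve the step when the voltage change exceeds a limit, double it when the change falls below a quarter of that limit) and all parameter names are assumptions chosen for illustration; the text does not give the rule EMU actually uses.

```python
import math

def simulate_node(current, c, v0, t_end, dv_max=0.05, dt0=1e-3):
    """Forward integration of dV/dt = I(V)/C with voltage-step control.

    `current` is a function returning the total source current at voltage v.
    Returns the list of (time, voltage) points visited.
    """
    t, v, dt = 0.0, v0, dt0
    trace = [(t, v)]
    while t < t_end:
        dt = min(dt, t_end - t)       # do not step past the end time
        dv = current(v) * dt / c      # V(t) = V(t0) + I (t - t0) / C
        if abs(dv) > dv_max:          # too much change: retry with a smaller step
            dt *= 0.5
            continue
        t, v = t + dt, v + dv
        trace.append((t, v))
        if abs(dv) < dv_max / 4:      # little activity: lengthen the next step
            dt *= 2.0
    return trace

# Hypothetical source that charges the node toward 1 volt, with C = 1.
trace = simulate_node(lambda v: 1.0 - v, 1.0, 0.0, 5.0)
exact_final = 1.0 - math.exp(-5.0)
```

The time steps start small while the node voltage is changing quickly and grow as the node settles, which is the behaviour described above.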
4.3 Device Model 
The basic Sah model is used to calculate MOS transistor channel current. The equations,
as applied to an N channel device, are:

Cutoff: $|V_{GS}| < V_t$

$$I_{DS} = 0$$

Non-saturation: $|V_{GS} - V_t| > |V_{DS}|$

$$I_{DS} = \beta \left( \frac{w}{l} \right) \left[ (V_{GS} - V_t)\,V_{DS} - \frac{V_{DS}^2}{2} \right]$$

Saturation: $|V_{GS} - V_t| \le |V_{DS}|$

$$I_{DS} = \beta \left( \frac{w}{l} \right) \frac{(V_{GS} - V_t)^2}{2}$$

where w and l are the channel width and length respectively.
Back-gate bias effects are taken into account using table lookup techniques to calculate
perturbations in threshold voltage.
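As an illustration, the three regions can be coded directly. The Python sketch below assumes an N channel device with non-negative drain-source voltage and ignores back-gate bias; the function and argument names are chosen here and are not taken from EMU.

```python
def ids_n(vgs, vds, vt, beta, w, l):
    """Channel current of an N channel device (vds >= 0 assumed)."""
    vov = vgs - vt                  # gate drive above threshold
    if vov <= 0.0:                  # cutoff
        return 0.0
    k = beta * w / l
    if vov > vds:                   # non-saturation
        return k * (vov * vds - vds * vds / 2.0)
    return k * vov * vov / 2.0      # saturation

# Illustrative values: beta = 1, w/l = 1, threshold 1 V.
i_sat = ids_n(5.0, 5.0, 1.0, 1.0, 1.0, 1.0)
i_lin = ids_n(5.0, 1.0, 1.0, 1.0, 1.0, 1.0)
i_off = ids_n(0.5, 5.0, 1.0, 1.0, 1.0, 1.0)
```

The two conducting expressions agree at the boundary between the regions, so the current is continuous in the drain-source voltage.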
4.4 Operation
The operation of EMU is characterized by four software states. Initially, EMU enters the
command state. This is the common state from which all others can be entered. It is used to
set simulation parameters, define clocks, display portions of the data base, initialize inputs and
format the output of results. The circuit state is used to create a circuit description in the data
base. It is used to define inputs and nodes, assign gates and set circuit capacitances. The
process state is used to set process parameters such as transistor threshold voltage and transistor
gain. The execute state represents the actual simulation. Following execution, the simulator
returns to the command state. Command, input and process states each have their own
input language which may be entered interactively or via a predefined input file.
4.5 Performance
EMU is written in C and will run on any UNIX based machine. In particular, it runs on
the LSI-11/23 - the host processor in the MULGA design station. It has also been
implemented, however, on a VAX-11/780 and a Motorola 68000. The 11/23 implementation
occupies approximately 25K bytes of code space leaving 30K available for circuit definitions. This
is sufficient to hold a circuit of about 2000 transistors. Table I shows some simulation run
times for a sample circuit - a 32 x 1 bit CMOS static RAM. This circuit contains 260 gates
which in turn contain some 770 transistors. Note that even on the 11/23 microcomputer, this
type of circuit can be simulated in a time which compares favorably with the time needed to
perform an off-line simulation on a larger machine.

    SIMULATION RUN TIMES (secs)
    LSI-11/23        1094
    VAX-11/780        129
    68000 (4 MHz)    1010
    68000 (8 MHz)     585
TABLE I. SAMPLE RUN TIMES
5. GATE DECOMPOSITION 
The nodal analysis program described in Section 3 produces a transistor net list which can
be used to generate a transistor level circuit description for EMU. The speed of the simulator,
however, is directly related to the number of active current sources in the circuit.
Accordingly, a gate extraction program has been written to process this transistor net list and convert
it, where possible, into the compound gate structures recognized by EMU.
As a first step, nodes are characterized as either inputs, outputs or internal nodes. Outputs
are defined to be those nodes which connect (in the case of a CMOS design) to the drains of
both an N and a P transistor. Inputs are those nodes which connect only to transistor gate
terminals within the cell. All remaining nodes are assumed to be internal. The algorithm then
examines each output node in turn, and searches for all N devices connected either directly
or indirectly (through an N channel) to that node. Any branches that pass through other
output nodes are assumed to contain transmission gates and are ignored. All such N branches
must eventually terminate at the negative supply rail if they are indeed part of a compound
gate structure. Branches that do not satisfy this condition are discarded. This is equivalent to
traversing the graph of all potential gate transistors connected to the output node. Figure
9(a) shows a number of devices connected to an output node Z. The N transistor graph that
is derived from this circuit is shown in Figure 9(b). Each device is represented by a simple
PUSH operator, consistent with the notation described in Section 4.1.
Figure 9. Example of gate device graph extraction from a circuit 
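The node classification step can be illustrated with a short Python sketch. The net list format used here, tuples of (kind, gate, source, drain), the rail names VDD and VSS, and the function name classify_nodes are all assumptions made for this example; they are not EMU's actual data structures. Source and drain are treated as interchangeable channel terminals.

```python
def classify_nodes(transistors, rails=("VDD", "VSS")):
    """Classify nodes of a CMOS net list as outputs, inputs or internal.

    transistors: list of (kind, gate, source, drain), kind is 'N' or 'P'.
    """
    gate_nodes, n_channel, p_channel = set(), set(), set()
    for kind, gate, source, drain in transistors:
        gate_nodes.add(gate)
        target = n_channel if kind == "N" else p_channel
        target.update((source, drain))
    channel = n_channel | p_channel
    all_nodes = (gate_nodes | channel) - set(rails)
    # Outputs touch the channel of both an N and a P device.
    outputs = (n_channel & p_channel) - set(rails)
    # Inputs touch only gate terminals.
    inputs = {n for n in all_nodes if n in gate_nodes and n not in channel}
    # Everything else is internal.
    internal = all_nodes - outputs - inputs
    return outputs, inputs, internal


# Hypothetical net list for a 2-input NAND gate with output Z.
nand2 = [
    ("P", "A", "VDD", "Z"),
    ("P", "B", "VDD", "Z"),
    ("N", "A", "Z", "X"),
    ("N", "B", "X", "VSS"),
]
```

For this net list the sketch reports Z as the only output, A and B as inputs, and the node X between the two series N devices as internal.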
The gate transistor graph is then reduced by successive parallel and series merging until a
single branch remains. At each merge, the appropriate operator (SERIES or PARALLEL) is
added to the branch description. The final result is the desired reverse polish specification of
the transistor graph. Figure 10 shows the four steps required to reduce the graph of Figure 9.
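A minimal Python sketch of series/parallel reduction is given here as an illustration. It assumes the device graph is supplied as a list of (node, node, device) edges between an output node and the supply rail, and it emits plain PUSH, SERIES and PARALLEL tokens; the exact operator encoding used by EMU is defined in Section 4.1 and may differ. The function name and data layout are hypothetical.

```python
def reduce_series_parallel(edges, top="Z", bottom="VSS"):
    """Reduce a two-terminal device graph to a reverse polish token list.

    edges: list of (node_u, node_v, device_name).
    Returns None if the graph cannot be reduced by series/parallel merging.
    """
    branches = [(frozenset((u, v)), ["PUSH " + dev]) for u, v, dev in edges]

    def merge(i, j, ends, op):
        # Replace branches i and j by one branch carrying both descriptions.
        rest = [b for k, b in enumerate(branches) if k not in (i, j)]
        return rest + [(ends, branches[i][1] + branches[j][1] + [op])]

    while len(branches) > 1:
        # Parallel merge: two branches joining the same pair of nodes.
        pair = next(((i, j) for i in range(len(branches))
                     for j in range(i + 1, len(branches))
                     if branches[i][0] == branches[j][0]), None)
        if pair:
            branches = merge(*pair, branches[pair[0]][0], "PARALLEL")
            continue
        # Series merge: an internal node touched by exactly two branches.
        internal = set().union(*(ends for ends, _ in branches)) - {top, bottom}
        for node in sorted(internal):
            touching = [k for k, (ends, _) in enumerate(branches) if node in ends]
            if len(touching) == 2:
                i, j = touching
                ends = (branches[i][0] | branches[j][0]) - {node}
                branches = merge(i, j, ends, "SERIES")
                break
        else:
            return None  # no merge applies, e.g. a bridge structure
    return branches[0][1]


# N chain of a 2-input NAND: devices A and B in series between Z and VSS.
nand_chain = [("Z", "X", "A"), ("X", "VSS", "B")]
```

Parallel merges are tried before series merges so that, when a series merge is attempted, the two branches meeting at a node are known to lead to different neighbours.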
Figure 10. Series/parallel reduction of device graph
This process is then repeated for the P transistor chain. The two expressions are then
combined to produce a complete gate description. Transistors not absorbed by this extraction
process are retained as single transistor transmission gates. Figure 11 shows the gate level
description that was extracted from the 2-input nand gate example of Figure 2. This
description was given to EMU and Figure 12 shows the actual simulation result.
nand2(A, B, Z)
EXTERN A, B;
EXTERN Z;
{
    GATE (Z, 5);
    PUSH (A, 1); SER (B, 1); PCH ();
    PUSH (B, 1); PAR (A, 1);
    INCAP (A, 3121);
    INCAP (B, 3121);
    OUTCAP (Z, 7121);
    ROUCAP (Z, 9);
    ROUCAP (A, 11);
    ROUCAP (B, 11);
}
Figure 11. Gate level description of 2-input nand gate 
Figure 12. Simulation plot of 2-input nand gate (horizontal axis: time in ns)
6. CONCLUSIONS 
In an interactive design environment, verification tools must be fast and readily accessible 
in order to encourage the designer to perform this vital task. This paper has described three 
tools which meet these criteria by actually being part of the resident design station software. 
In addition, they have the advantage of being written in C under UNIX, making them readily
transportable to other design systems. 
REFERENCES 
[1] WESTE, N., "MULGA - An Interactive Symbolic System for the Design of Integrated
Circuits", Bell Sys. Tech. Journal, to be published.
[2] BUCHANAN, I., "Modelling and Verification in Structured Integrated Circuit Design",
PhD Thesis, University of Edinburgh, Scotland, 1980.
[3] CHAWLA, B.R., GUMMEL, H.K., and KOZAK, P., "MOTIS - An MOS Timing
Simulator", IEEE Trans. on Circuits and Systems, Vol. 22, No. 12, Dec. 1975, pp. 901-910.
A METHODOLOGY FOR IMPROVED VERIFICATION 
OF VLSI DESIGNS WITHOUT LOSS OF AREA 
Louis K. Scheffer 
Departments of Electrical Engineering 
and Computer Science 
Stanford University 
Stanford, California 94305 
ABSTRACT 
This paper describes an IC layout methodology based on arbitrary outline
cells, prevention of overlap, and mixed programs and graphics. Advantages
are: no loss in area over hand packing; incremental checking of design
rules, component interconnection, and timing; reduction of visible
complexity; and easy implementation. Disadvantages are: possible
proliferation of cell types and poor handling of cells with contacts not on
the boundary. An implementation that uses and enforces this methodology is
discussed.
INTRODUCTION 
IC design methodologies in use today have a number of serious defects
with respect to design verification. They defer design rule checks (DRC) and
electrical continuity tests until the end of the design cycle. This is
necessary since the possibility of another item overlaying an item that has
already been checked cannot be ruled out. However, at the end of the design
cycle, errors that could have been fixed quite simply earlier may require
extensive revision. Multiple errors may require multiple analyses of the
entire chip to find all the errors. The classic example of this is the
power-to-ground short. If extracting a component list from the artwork reveals a
power-to-ground short, then the list will be useless for other purposes such
as simulation or error checking. It is necessary to fix the short, and rerun
the component extraction program on the entire chip. Furthermore, DRC and
component extraction require execution times ranging from N log N to N**(3/2),
making these tests time consuming and expensive as chips increase in size.
In this paper, a methodology is introduced that depends on not allowing
cells to overlap other cells or geometrical primitives, but allows cells to
have arbitrary shapes. The advantages and disadvantages of this methodology
are discussed, and comparisons made between this methodology and others that
might be adopted to solve the same problems. A discussion of how the
problems of DRC and component extraction are treated in this system follows,
along with a proof that shows that this methodology does not cost any area.
Next there is a discussion of the relationship between this methodology and
schematic drawing systems, and a section on an implementation of an IC layout
editor that uses and enforces this methodology. Finally, there is a section
on possible future developments using this methodology.
BASIC IDEA OF THE METHODOLOGY
The theme of this methodology is simply to avoid overlap. Towards this
end, cells may not overlap each other, and cells may not overlap geometrical
primitives, such as rectangles and polygons. The only objects that may
overlap are geometrical primitives within the same cell. Cells, however, may
be of arbitrary polygonal shape. To replace the capability lost when
overlapping cells are outlawed, a mechanism is supplied to mix layouts and
programs to define devices such as ROMs and PLAs. In addition there is a
requirement that no active component area touch the boundary of a cell.
An analogy exists between this methodology and that of structured
programming. In both cases, the intent is to reduce the interaction of
disparate parts of the system, so that each part may be designed, tested and
understood by itself, independent of the remainder of the design. Both
methodologies attempt to hide the details of the implementation; in
programming the essential information is contained in the calling sequence,
whereas in IC layout it is contained in the boundary of the cell. Not
allowing overlap is similar to not allowing GOTOs into or out of
procedures. Not allowing components to touch the boundary of a cell is
similar to not allowing statements to be split over procedure boundaries.
This analogy explains why verification is so much easier if overlap is
forbidden. Since there is no equivalent of global variables in IC design,
each piece may be considered on its own, just as a procedure that has no
global variables may be checked on its own. Furthermore, a designer (or a
program) may use a cell without knowing anything about it except the behavior
as expressed from the terminals.
One final analogy is that when GOTOs are eliminated, a richer variety of
control structures is required if efficiency is not to be sacrificed.
Similarly, in IC design, if overlap is not allowed then cells must be
allowed to assume arbitrary shapes.
ADVANTAGES AND DISADVANTAGES OF THIS METHODOLOGY 
There are several advantages to this choice of methodology. First, it
does not cost any area over a hand layout (proof follows). Second, it allows
a completely hierarchical design, where only two adjacent levels of the
hierarchy need to be examined at any time. This reduces the complexity as
seen by the user and any analysis programs. Third, many checks on the
validity of the data may be made incrementally, as each cell is defined and
used. These include DRC, component extraction, and timing verification (if
the technique of assertions is used). Many errors can therefore be caught at
an early stage, where correction is still easy, instead of at the end of the
design process where correction is quite difficult and costly. Fourth, most
processes that must be applied to the entire chip, such as the generation of
new layers through logical operations on the existing layers, may be done by
processing the chip on a cell by cell basis. This results in large savings
when cells are used more than once.
Another advantage of this methodology is that it makes minimal demands
on computer resources. This translates into low cost per station, either
with a large computer timeshared between several stations or with a small
computer per user. Since all processing is done one cell at a time, large
amounts of memory are not required in the processor that does the DRC and
component extraction. By using assertions, the work necessary to verify
performance can also be vastly reduced.
Naturally, there are disadvantages as well. One serious objection is
that designers are not used to designing with a total lack of overlaps. If
the advantages described above are not attractive enough, the designers will
be reluctant to use a system that takes away some of their former freedom.
Another objection is that this methodology is optimized for cells with their
connections on the edges. If cells have terminals in the middle then this
methodology will require a large number of nearly identical cells, with the
only differences being the wiring required to bring the internal terminals to
the edge. A third problem may appear if an operation such as oversizing or
undersizing a mask is attempted. In this case the modifications within a
cell may depend on the surroundings. Thus these modifications cannot be done
on a strictly cell by cell basis, and it may be necessary to accept a
proliferation of nearly identical cells in order to perform these operations
and still keep the results in the strictly hierarchical format. These
problems should not be serious for highly structured chips, but may present
real barriers to using this methodology in other cases. For example, it is
very ill-suited to masterslice type construction.
There are other problems that this methodology does not help, but does
not hurt either. These are primarily calculations of a global nature that 
cannot be completed until the entire circuit is known. The general problem 
of calculation of the resistance of interconnections and the resulting time 
delays is one of these problems. Since the problem is inherently global, 
unlike capacitance (which is the sum of the local capacitances), the entire 
net must be known before the resistance or time delay may be calculated. 
Therefore this cannot be done at cell definition time, but must wait until 
the cell containing the entire net is constructed. 
Penfield and Rubinstein [Pe81] have recently demonstrated a method which
bounds the time delay for tree structured networks. This case can be solved
on a local basis, and hence fits very well with the proposed methodology.
The only test that would still have to be made globally is to ensure that the
interconnections are indeed tree structured, since this cannot be determined
from local data.
COMPARISON WITH OTHER METHODOLOGIES 
There are many other methodologies that may be used to create a chip. 
How does this methodology compare? Three strategies that have been 
considered are: allow any overlaps and do all checking at the end of the 
design process, allow overlaps but account for them correctly, and do not 
allow overlaps and use rectangular bounding boxes. 
Currently, the strategy used by most artwork systems is to allow the
designer to create any overlap desired. Design rule checks, and tests for
logical correctness, are performed once the design has been completed and the
entire design expanded out. This approach gives the designer the maximum
amount of freedom, and makes editing very simple, since anything is allowed.
However, it requires that the checking programs must run on huge amounts of
data. In particular, repeated cells are checked as many times as they
occur. As chips get bigger, the volume of data will only get larger.
Furthermore, design errors are found at the end of the design cycle, when
they are the most difficult to fix. Errors in repeated cells show up many
times, making it hard to find errors that may only have occurred once. One
error, such as shorting power and ground, may make it impossible to find
others without repeating the entire cycle.
Another methodology, tried by Whitney [Wh80] for DRC, is to allow the
designer to overlap items wherever desired. The resulting design is checked
one cell at a time, but the design rule checking program detects and accounts
for these overlaps in the checking process. This approach also puts no
restrictions on the designer, and allows the traditional methods of
programming PLAs and ROMs. Redundant errors are also reduced, since each
cell is checked only once except where it overlaps others. The drawbacks are
that checking must still be done on the whole design at the end of the cycle,
and that accounting for the overlaps may take more computational effort than
simply checking the entire design as one piece. Furthermore, this approach
is not well suited to incremental checking, since there is no guarantee that
any portion of the design is complete until the entire design is finished.
This approach probably represents the best that can be done without imposing
any restrictions whatsoever on the designer.
McGrath and Whitney [McG80] propose a methodology that allows cells to
overlap, but requires that each cell be correct when checked by itself, and
all active devices to be represented by specially marked cells. The
requirement that each cell be correct by itself leads to many false errors in
the cases where a half width feature on a boundary is mated with a similar
feature on the other side of the boundary. The suggested solution is to
include full width features, and avoid the loss of area by overlapping the
two full width edges. This means that overlap must be permitted, which leads
to the problems in DRC discussed under the last methodology. Component
extraction must also take the overlap into account, at least for capacitance
calculations. Isolating active devices in cells of their own is almost a
necessity in bipolar technology, where a given geometry could be a resistor,
a transistor, or a zener depending on how it is biased, but is a quite
significant (and unnecessary) restriction on the designer in normal MOS
design. The transistors may be recognized quite easily, and unintentional
transistors are caught when the layout is compared to the logic schematic.
A fourth methodology is followed by many automated layout programs such
as \.!CLOP S [Pr79]. This is to allow no overlaps, and to require rectangular
cell boundaries. This allows design and analysis to proceed on a cell by
cell basis, but often wastes area since only rarely is it possible to combine
several different sized rectangles into a resultant rectangle without any
wasted area. Because of this obvious waste, human designers are reluctant to
use this methodology, especially for high volume products.
NO LOSS OF AREA 
It is straightforward to show that the proposed methodology results in no
loss of area in any technology with only one active layer. Given any design,
first expand out any hierarchy that may be present. At this point, the data
is merely a large number of polygons with no included cells, and hence it
meets all the requirements of the proposed methodology. Now iteratively
perform the following procedure: cover the chip with two non-overlapping
polygons in such a way that half of the devices are inside one polygon, and
the other half of the devices are in the other. The polygon boundaries may
not pass through any devices. Since the devices are topologically separate
in a technology with one active layer, this division into two polygons can
always be performed (although it may require using polygons with internal
holes in pathological cases). Call each polygon that has been created this
way a cell; then the chip is now represented as two cells, each of which
meets the methodology requirements. These cells may be in turn subdivided
further, until the entire chip is represented as a hierarchy, with each cell
containing at most two other cells, or two devices.
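The recursive halving used in this argument can be sketched in Python as follows. The sketch handles only the bookkeeping of the hierarchy: it assumes the devices arrive as a flat list ordered so that each half can be enclosed by a polygon that cuts through no device, and it does not construct the polygons themselves. The function names are hypothetical.

```python
def build_hierarchy(devices):
    """Nest a flat device list into cells holding at most two members each."""
    if len(devices) <= 2:
        return tuple(devices)
    half = len(devices) // 2
    # Each half becomes a cell of its own, which is subdivided in turn.
    return (build_hierarchy(devices[:half]), build_hierarchy(devices[half:]))


def leaves(cell):
    """Return the devices of a nested cell in their original order."""
    out = []
    for member in cell:
        out.extend(leaves(member) if isinstance(member, tuple) else [member])
    return out


chip = build_hierarchy(["d1", "d2", "d3", "d4", "d5"])
```

Every device appears exactly once in the result, so the nesting adds hierarchy without adding or moving any geometry, which is the point of the argument.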
It should be emphasized that although it is possible to automatically 
split a chip up this way for DRC purposes, it is far better to have people 
design hierarchically in the first place. If this is done, then no arbitrary 
splitting of any sort is necessary. Splitting a complete chip voids most of 
the benefits of the proposed methodology, and is only intended as a thought 
experiment to show that any design may be represented in this methodology
without loss of area.
PERFORMING OPERATIONS IN THE PROPOSED METHODOLOGY
Design rule checking is done as follows in this methodology. Let
MAXRULE be the maximum distance specified in any of the design rules. Define
the OUTLINE of a cell as everything within MAXRULE of the boundary. Then,
for each cell, substitute in the outlines of all the lower level cells, and
perform all the DRC tests. Ignore all errors within the outline that could
possibly be affected by whatever surrounds the cell, for it is not known
whether these are errors until the cell is used. An error of this type is
shown in figure 1, for the case of width violations. If the square C is a
minimum size feature, then the rectangles A and B represent potential width
violations. Rectangle A cannot be fixed by the addition of geometry outside
the cell boundary, and hence represents a real error that must be reported.
Rectangle B, however, could be fixed by the addition of another rectangle
when the cell is used, and therefore it is not reported. If the additional
geometry is not added to a cell in which this cell is used, then the error
will be reported at that time.
Figure 1: Real and potential errors 
Similarly, the design rule checking code must ignore all errors that
could be fixed by something inside the inner boundary of an included cell.
If the error is real, it would have been caught when the cell was checked.
This step is quite important, since the process of saving everything within
MAXRULE of the boundary may split internal items into pieces that do not meet
the design rules. Figure 2 shows the portion of a cell that is retained and
substituted in wherever the cell is used. Since only the geometry within
MAXRULE of the edges is retained, the polygon at A has been split, creating a
possible error. The design rule checker, however, should not report this
error, since it is against an inner boundary, and if it was a real error then
it would have been caught when the cell was checked.
Any remaining errors are valid, since nothing can be added within
MAXRULE of them, and should be displayed. The final step of the design rule
checking process saves the outline, which will be substituted for the cell
whenever the cell is to be involved in DRC checks.
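The saving of an outline can be illustrated with a simplified Python sketch. It assumes a rectangular cell and axis-aligned rectangles given as (x1, y1, x2, y2), whereas the methodology itself allows arbitrary polygonal cells; the function name outline and its parameters are hypothetical. Geometry lying within MAXRULE of the cell boundary is kept, and anything inside the inner boundary is discarded, which may split a shape into pieces.

```python
def outline(cell, shapes, maxrule):
    """Keep only the parts of each rectangle within maxrule of the cell edge."""
    cx1, cy1, cx2, cy2 = cell
    # Inner boundary: the cell shrunk by maxrule on every side.
    ix1, iy1, ix2, iy2 = cx1 + maxrule, cy1 + maxrule, cx2 - maxrule, cy2 - maxrule
    kept = []
    for x1, y1, x2, y2 in shapes:
        no_inner = ix1 >= ix2 or iy1 >= iy2
        misses = x2 <= ix1 or x1 >= ix2 or y2 <= iy1 or y1 >= iy2
        if no_inner or misses:
            kept.append((x1, y1, x2, y2))  # wholly inside the outline band
            continue
        if x1 < ix1:
            kept.append((x1, y1, ix1, y2))   # strip left of the inner boundary
        if x2 > ix2:
            kept.append((ix2, y1, x2, y2))   # strip right of the inner boundary
        mx1, mx2 = max(x1, ix1), min(x2, ix2)
        if y1 < iy1:
            kept.append((mx1, y1, mx2, iy1))  # strip below the inner boundary
        if y2 > iy2:
            kept.append((mx1, iy2, mx2, y2))  # strip above the inner boundary
    return kept


cell = (0, 0, 100, 100)
shapes = [(0, 40, 100, 50), (40, 40, 60, 60), (0, 0, 5, 5)]
saved = outline(cell, shapes, maxrule=10)
```

In this example the wire crossing the whole cell is split into two stubs at the inner boundary, the interior square disappears, and the small corner rectangle is kept whole. The two stubs are the kind of split geometry that the checker must not report as an error.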
This approach handles split contacts and half-width lines on the edges
of cells without special cases. Furthermore, all of the rules, including the
complicated anti-reflection rules, may be checked in this manner. The
designer therefore gets immediate feedback as each cell is created or
modified. This is important since errors are much harder to correct later
on. The ability to check all rules is in contrast to some interactive design
rule checkers that check for validity while the editing is in progress.
These give even more immediate feedback, but cannot check some of the more
difficult rules. The proposed methodology will never create false errors,
but errors in the outline of the cell may not be caught until the cell is
used, and may appear as many times as the cell is used.
Figure 2: Possible internal error
Component extraction is made simple by the fact that no active areas may
touch the boundary. This means each device is entirely contained within a
cell, and only interconnections touch the edges. Implied devices, such as
transistors formed where polysilicon overlaps diffusion, are not a problem
since we know nothing will overlay the cell under consideration. Thus each
cell can be extracted separately and reduced to a boundary that shows the
connection points. Capacitance (as a function of area) may be easily
calculated since the area of a node is the sum of the area in each of the
cells that use that node (since there are no overlaps). Sidewall capacitance
may also be computed, with the proviso that sidewall capacitance on the cell
boundary cannot be computed until the cell is used. Similarly, node to node
capacitance may be computed, at least for nodes lying within MAXRULE of each
other. (MAXRULE could be increased if this proves to be a restriction.)
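Because nothing overlaps, node area can be accumulated hierarchically by simple addition. The Python sketch below illustrates this under assumed data structures: each cell is a pair of (local area per node, list of included cells with a pin-to-node map). The names node_areas, cells and the example cells inv and pair are hypothetical.

```python
def node_areas(cell, cells):
    """Return {node: area} for a cell, adding in areas from included cells.

    cells: {name: (local_areas, instances)}, where instances is a list of
    (subcell_name, {subcell_pin: node_in_this_cell}).
    """
    local, instances = cells[cell]
    total = dict(local)
    for sub, pin_map in instances:
        for pin, area in node_areas(sub, cells).items():
            # Only nodes brought out to the subcell boundary connect outward.
            if pin in pin_map:
                node = pin_map[pin]
                total[node] = total.get(node, 0) + area
    return total


cells = {
    "inv": ({"in": 4, "out": 6}, []),
    "pair": ({"mid": 2}, [("inv", {"in": "a", "out": "mid"}),
                          ("inv", {"in": "mid", "out": "b"})]),
}
areas = node_areas("pair", cells)
```

Here node mid collects its own 2 units of wiring plus the output area of the first inverter and the input area of the second, giving 12. Multiplying by the capacitance per unit area from the process data would then give the area capacitance.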
Enforcement of this methodology is quite easy. Check to see that no
cells overlap, check to see that no primitives overlap any cells, and check
to see that no components extend past the boundary, if the boundary is
specified by the user.
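The first two enforcement checks can be sketched in Python as follows. To keep the example short, cells and primitives are assumed to be axis-aligned rectangles (x1, y1, x2, y2); real cells may be arbitrary polygons, which would need a polygon intersection test instead. The third check, for components extending past a user-specified boundary, is omitted. All names are hypothetical.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two rectangles share interior area (abutting edges are allowed)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def methodology_violations(cells, primitives):
    """List cell/cell and primitive/cell overlaps.

    cells: {name: rectangle}; primitives: list of rectangles drawn at the
    same level of the hierarchy as the cells.
    """
    errors = []
    for (n1, r1), (n2, r2) in combinations(cells.items(), 2):
        if overlaps(r1, r2):
            errors.append(("cell/cell", n1, n2))
    for index, prim in enumerate(primitives):
        for name, rect in cells.items():
            if overlaps(prim, rect):
                errors.append(("primitive/cell", index, name))
    return errors


cells = {"a": (0, 0, 10, 10), "b": (10, 0, 20, 10), "c": (5, 5, 15, 15)}
primitives = [(20, 0, 25, 5), (0, 0, 2, 2)]
errors = methodology_violations(cells, primitives)
```

Cells a and b abut along an edge and are accepted, while cell c overlaps both and the second primitive lies on top of cell a.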
Finally, some mechanism is needed to allow the operations that formerly
were done by overlapping cells, such as filling out the bits in a ROM or
PLA. One approach to this problem is to allow programs access to the
graphics primitives, so a program can write a bit pattern to be included in a
ROM or PLA. The program does not get any special privileges such as the
ability to overlap cells. The output is subjected to exactly the same tests
as if it had been entered by a human. This is useful for catching errors in
the programs.
RELATION TO SCHEMATIC DRAWING SYSTEMS
In many ways, the proposed methodology resembles the methodology used in
schematic drawing systems. This is not a coincidence, since every attempt
was made to incorporate the features that have made hierarchical schematic
drawing systems useful. In a schematic, the user never views more than two
levels of the hierarchy at a time. Any arbitrary structure can be
represented, although some fit more naturally into the hierarchical structure
than others. Individual pieces may be checked with the assurance that the
rest of the design will not affect the piece that is being checked, except
through the terminals of the device. These properties all carry over to
layout, if it is done with this methodology.
In particular, two forms of checking originally designed for schematics
are now possible for layouts. One is the timing check by means of
assertions, as used by McWilliams [McW80]. In this technique the designer
supplies statements about the signals which are believed to be true (the
assertions). For example, consider the case of a simple cell with one input
and one output. One assertion might state that the input must be true 50 ns
after the clock, and another assertion might state that the output will be
stable and correct by 90 ns after the clock. With the aid of these
assertions, a cell may be checked without looking at the contents of any
other cells.
In order to check a cell by this technique, the timing verifier assumes
that the input assertions are correct, and attempts to prove the output
assertions from the inputs and the properties of the devices contained in the
cell. If any other cells are included in the cell under consideration, the
verifier checks to be sure that their input assertions are met, and assumes
that their output assertions are true. By this means, an entire design may be
verified one cell at a time, and timing verification may be done on the
pieces long before the design is complete.
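The cell-at-a-time timing check can be illustrated with a small Python sketch. The data model is an assumption made for this example and is not the verifier of [McW80]: each cell lists input assertions (time in ns after the clock by which an input is valid), output assertions (time by which an output is stable), and parts in signal-flow order, where a part is either a primitive gate with a fixed delay or an included cell with pin maps. The 50 ns and 90 ns figures follow the example in the text; all other numbers and names are hypothetical.

```python
def verify_cell(name, cells):
    """Check one cell's output assertions using only assertions of included cells."""
    cell = cells[name]
    arrival = dict(cell["inputs"])  # assume this cell's input assertions hold
    errors = []
    for part in cell["parts"]:      # parts are assumed to be in signal-flow order
        if part[0] == "gate":
            _, ins, out, delay = part
            arrival[out] = max(arrival[s] for s in ins) + delay
        else:
            _, sub, in_map, out_map = part
            for pin, sig in in_map.items():
                need = cells[sub]["inputs"][pin]
                if arrival[sig] > need:  # included cell's input assertion not met
                    errors.append(f"{name}: {sub}.{pin} needs {sig} by {need} ns")
            for pin, sig in out_map.items():
                # Trust the included cell's output assertion; do not look inside.
                arrival[sig] = cells[sub]["outputs"][pin]
    for sig, limit in cell["outputs"].items():
        if arrival[sig] > limit:
            errors.append(f"{name}: output {sig} not proven by {limit} ns")
    return errors


cells = {
    "stage": {"inputs": {"d": 50}, "outputs": {"q": 90},
              "parts": [("gate", ["d"], "q", 30)]},
    "top": {"inputs": {"x": 10}, "outputs": {"y": 130},
            "parts": [("gate", ["x"], "n1", 20),
                      ("cell", "stage", {"d": "n1"}, {"q": "n2"}),
                      ("gate", ["n2"], "y", 25)]},
}
```

Checking top never examines the gate inside stage; it only confirms that n1 arrives by 50 ns and then takes n2 as stable at 90 ns. Checking stage separately proves that its 90 ns claim follows from its 50 ns input assumption.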
A second check that is easy to make using the proposed methodology is a
comparison between the logic schematic of a cell and its layout. In the
simplest form, the checking program could require that the hierarchy be
identical in the schematic and layout representations. Then checking is
quite simple. In more complicated forms, a theorem prover may be called into
play when the correspondence is not exact, to see if the two different
representations perform the same function. In either case, the check may be
performed on a cell by cell basis.
There are still fundamental differences between schematics and
layouts, however. In a schematic, the inside and outside views are
distinct. An op-amp, for example, is composed of a large number of
transistors, but the configuration of these transistors bears no relation to
the triangular symbol that represents the op-amp. Furthermore, the size and
shape of the symbol do not depend on the actual complexity of the op-amp, so
the designer can design and wire an active filter, for example, without
knowing the gain bandwidth products of the op-amps. These may be filled in
later without disturbing the design.
In a layout system, such freedom does not exist. The outside view of a
device must resemble the internal view in its size and the location of
connections. The designer cannot wire up an inverter, for example, without
knowing the load the inverter is to drive, and hence its size. If at some
later date a larger, more powerful inverter must be substituted, then the
wiring must be changed. This additional complexity does not plague the
designer who works with schematics.
Schematics also allow several shortcuts for indicating connectivity that
have no equivalent in the world of IC design. Global signals may be defined
whose values are known to all cells. Within a diagram, pins may be connected
together simply by giving them the same name. In a layout, on the other
hand, all connections of any type must be shown explicitly.
AN IMPLEMENTATION 
A program has been developed to test this methodology. Component
extraction, boundaries, assertions, methodology enforcement, and programmatic
access have been implemented, but DRC has not. The system is written in
PASCAL and runs on a DEC-20 with an HP-2648 graphics terminal and a
Summagraphics BitPad One graphics tablet. PASCAL is also used as the
imbedded language in which the user manipulates graphic constructs.
The program consists of two major sections, the editor and the
compiler. The editor is the user interface, and allows the user to create,
modify, and purge cells. Commands are similar to many other graphics 
editors, allowing the user to add components, interconnect them, move them 
and so forth. A minimal amount of checking is done at entry time for obvious 
errors, such as illegal signal names and lines at angles that the rest of the 
process cannot accept. 
The editor and the compiler are adapted to various processes by means of 
a "process file". This file contains data on all the layers to be used in 
the compilation process. It is similar to the file normally used to describe
DRC operations, with information about each layer, such as minimum spacing 
and width, capacitance per unit area, and the logical operations by which it 
is computed if it is not entered by the user. 
The compiler is called whenever the user writes a cell out to permanent
storage, thus making the cell available for use elsewhere. The compiler
takes the user specifications of the geometry, fills out lines to their
minimum width, and converts all input to polygons. The new layers are then
computed, if necessary, and a connectivity analysis made. The DRC test is
done at this point, so it can take advantage of the continuity information if
it so desires. Then the boundary is computed (or the user specified boundary
is checked) and every signal that touches the boundary is noted. The outline
is then saved away so that it may be used by other cells. The date when the
last change was made is recorded, so that when a cell is changed the
modifications will propagate correctly.
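One plausible use of the recorded dates is to find cells whose saved results predate a change in a cell they include. The Python sketch below is an assumption about how such propagation could work, not a description of the program's actual mechanism; the table layout, with a compile date and a list of included cells per cell, and the name stale_cells are hypothetical.

```python
def stale_cells(cells):
    """Return the cells that must be recompiled.

    cells: {name: (compiled_at, [names of included cells])}.
    A cell is stale if an included cell changed after it was compiled,
    or if an included cell is itself stale.
    """
    stale = set()

    def visit(name):
        compiled_at, children = cells[name]
        latest = compiled_at
        for child in children:
            child_time = visit(child)
            if child_time > compiled_at or child in stale:
                stale.add(name)
            latest = max(latest, child_time)
        return latest  # most recent change anywhere below this cell

    for name in cells:
        visit(name)
    return stale


cells = {
    "chip": (10, ["alu", "rom"]),
    "alu": (8, ["adder"]),
    "adder": (9, []),
    "rom": (4, []),
}
```

In this example adder was changed at time 9, after alu was compiled at time 8, so alu is stale. The chip was compiled later than both, but it includes a stale cell and so must be recompiled as well.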
The component extraction routines check for the obvious problems of
multiply named signals, and signals of the same name that are not connected
together. Running capacitance calculations are available for applications
where performance is crucial. The user can ask for continuity to be shown;
the system can take a signal by name or location and display on the screen
where the signal is connected. All of the masks that have been generated as
part of the DRC or component extraction process may be displayed.
Boundaries for cells may be specified in one of three ways. If nothing
else is specified, then the boundary is computed as the logical OR of all the
layers of the cell. Active areas are expanded by one minimum unit before the
OR to ensure that no active area hits the boundary. This gives the minimum
possible size for the cell, but the boundary is often far more complicated
than necessary, and presents a cluttered appearance. It is never wrong,
however, and is the default when data is transferred from another system that
does not use the same methodology.
Boundaries may also be the minimum bounding boxes. This is consistent
with several other artwork systems, and works well when the resulting cells
are to be fed to an auto-router (since current auto-routers cannot handle
arbitrary shape cells well). Finally, the user can specify the boundary by
entering it as a polygon on a reserved mask. This usually yields the best
results, but requires the most work.
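The first two boundary options can be illustrated with a short Python sketch. It is only an illustration under simplifying assumptions: every shape is an axis-aligned rectangle, the layer names and helper functions are hypothetical, and the OR of the layers is kept as a plain list of rectangles rather than merged into one polygon.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (x_min, y_min, x_max, y_max) in minimum units; rectangles only (assumption).
Rect = Tuple[int, int, int, int]


def expand(rect: Rect, margin: int) -> Rect:
    """Grow a rectangle by `margin` on every side."""
    x0, y0, x1, y1 = rect
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def default_boundary(layers: Dict[str, List[Rect]],
                     active: Set[str],
                     min_unit: int = 1) -> List[Rect]:
    """OR of all layers, with active-area shapes expanded by one minimum unit.

    The union is represented as an unmerged list of rectangles.
    """
    shapes: List[Rect] = []
    for name, rects in layers.items():
        margin = min_unit if name in active else 0
        shapes.extend(expand(r, margin) for r in rects)
    return shapes


def bounding_box(rects: Iterable[Rect]) -> Rect:
    """Minimum bounding box of a collection of rectangles."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


# Hypothetical two-layer cell; "diff" is treated as the active layer.
cell = {"metal": [(0, 0, 4, 2)], "diff": [(3, 1, 6, 5)]}
all_shapes = [r for rects in cell.values() for r in rects]
print(bounding_box(all_shapes))                 # (0, 0, 6, 5)
print(default_boundary(cell, active={"diff"}))  # [(0, 0, 4, 2), (2, 0, 7, 6)]
```

The bounding box is simple for an auto-router to handle, while the OR-based boundary follows the artwork more closely at the cost of a more complicated outline.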
Assertions appear to the user simply as notes with a reserved first
character. Like signal names, they attach to the closest object. The
compiler makes no attempt to process the assertions, but records them, along
with the signals they are attached to, in the dictionary of signal names.
This dictionary is available as a file, and any checking program that desires
to do so may check it, and use the assertions as it sees fit. This is
desirable since people will doubtless think up new ways to use assertions.
In MOS design, for example, it may be necessary to include assertions about
capacitances to be driven and voltage levels of various signals, as well as
assertions about the signal timing.
The user tells the compiler to include a program along with a diagram by
dividing the diagrams into FRAMES. Each frame has a name, and may contain
primitives or text representing a program. If frames are present, and one is
labelled PROGRAM, then the program in that frame is executed before any other
frames are evaluated. The program may direct the evaluation of other frames,
add graphics primitives to a diagram, and make decisions about the parameters
of diagrams.
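The frame mechanism can be pictured with a small Python sketch. The class and function names are hypothetical, a program frame is modelled as a Python callable, and the sketch assumes that when a PROGRAM frame exists only the frames it asks for are evaluated; the text does not say what happens to frames the program never mentions.

```python
class Diagram:
    """Toy model of a diagram divided into named frames (hypothetical)."""

    def __init__(self, frames):
        self.frames = frames      # name -> list of primitives, or a callable
        self.primitives = []      # graphics primitives collected so far

    def add_primitive(self, primitive):
        self.primitives.append(primitive)

    def evaluate(self, name):
        content = self.frames[name]
        if callable(content):
            content(self)                    # frame holds a program
        else:
            self.primitives.extend(content)  # frame holds plain primitives


def compile_diagram(frames):
    diagram = Diagram(frames)
    if "PROGRAM" in frames:
        # The program runs first and directs evaluation of other frames.
        diagram.evaluate("PROGRAM")
    else:
        for name in frames:
            diagram.evaluate(name)
    return diagram.primitives


def program(diagram):
    diagram.add_primitive("box")   # add a graphics primitive directly
    diagram.evaluate("BODY")       # direct the evaluation of another frame


print(compile_diagram({"PROGRAM": program, "BODY": ["wire"], "UNUSED": ["label"]}))
# ['box', 'wire']
```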
There are several restrictions imposed by this implementation. Although
component extraction has been implemented for MOS transistors and parasitic
capacitances, no DRC has been implemented yet. This should be a relatively
simple project, since it only involves running a nearly conventional DRC on
portions of the artwork that are selected by the compiler. Lines at angles
other than 0 and 90 degrees are not supported, but this should be a
straightforward extension. Although the simple logical operations such as
AND, OR, and ANDNOT are implemented, the more complicated operations of
OVERSIZE and UNDERSIZE are only implemented in a form that is not always
correct at the cell boundaries. If a cell is used in different surroundings
the user might prefer to create additional cells in order to do the
geometrical operations properly and keep the results in the strictly no
overlap format. At present this option is not available.
A POSSIBLE EXTENSION
An extension that allows more designer freedom at the cost of additional
complexity has been proposed by Horowitz [Ho80]. This is to store a boundary
for each layer instead of one boundary for the whole cell. This allows cells
with internal terminals to be routed, and routing to be run over cells that
were not specifically designed for this type of routing, if there is room.
However, care must be exercised or else unintentional devices may be formed.
Capacitance calculations and DRC checking also become more difficult, but the
basic hierarchical structure of the solutions remains unchanged.
ACKNOWLEDGEMENTS
I would like to thank the design aids group at Hewlett-Packard for
supporting this work, and my colleagues at Stanford University for many
fruitful discussions. Martin Newell suggested including the analogy to
structured programming.
REFERENCES
[Pr79] Preas, Bryan T., "Placement and Routing Algorithms for Hierarchical
Integrated Circuit Design", Technical Report #180, Computer Systems
Lab, Stanford University, Stanford, CA. (August 1979)
[McG80] McGrath, Edward J., and Whitney, Telle, "Design Integrity and Immunity
Checking", Proceedings of the Seventeenth Design Automation
Conference, Minneapolis, Minnesota, June 1980
[Wh80] Whitney, Telle, "Description of the Hierarchical Design Rule Filter",
SSP File #4027, Silicon Structures Project, California Institute of
Technology, Pasadena CA. (October 1980)
[McW80] McWilliams, Thomas M., "Verification of Timing Constraints on Large
Digital Systems", Proceedings of the Seventeenth Design Automation
Conference, Minneapolis, Minnesota, June 1980
[Ho80] Horowitz, Mark. Private communication
[Pe81] Penfield, Paul, and Rubinstein, Jorge, "Signal Delay in RC Networks",
Caltech Conference on Very Large Scale Integration, January 1981.
Chairperson: THOMAS F. KNIGHT, JR.
Graduate Student
Department of Electrical
Stanford University
INNOVATIVE CIRCUIT DESIGN SESSION 
ABSTRACT 
CONSIDERATIONS FOR AN ANALOG FOUR QUADRANT SC MULTIPLIER
by 
Phillip E. Allen 
and 
William H. Cantrell 
Department of Electrical Engineering 
Texas A&M University 
College Station, TX 77843 
This paper outlines the considerations and design of a four qua-
drant analog multiplier using switched capacitor (SC) techniques. The 
design algorithm for accomplishing the multiplication is described. 
Implementation of the algorithm is then presented. The predicted 
accuracy of the multiplier is given and compared to preliminary bread-
board measurements. The multiplier described is presently being fabri-
cated as an integrated circuit on a university multichip project using 
double-poly MOS technology. 
INTRODUCTION 
Practical analog signal processing using integrated circuit technology
has been made possible by the application of SC techniques [1].
The accuracy of analog signal processing systems can approach 0.1%
which is equivalent to approximately 10 bits of digital information [2].
The primary analog signal function which has been implemented using
SC techniques is the filter [3]. Some non-filter applications have also
been considered such as oscillators [4], diodes [5], and digital-to-analog
converters [6-9]. Some work has been done in the area of modulators [10],
but little if any consideration has been given to a true analog
multiplier.
The objective of this paper is to take the proven techniques of 
SC circuits and apply them to the development of a four-quadrant SC 
multiplier. The result is a very useful analog signal processing 
component which is compatible with MOS technology. The speed 
of such a multiplier would not be expected to match the integrated
bipolar analog four-quadrant multipliers presently available [11] because
of the use of sampled data techniques. Preliminary results show
that it is possible to eliminate many of the multiplier errors and to 
avoid the extensive fine tuning and external components that must 
accompany the use of bipolar analog four-quadrant multipliers. One 
useful technique that can be accomplished in SC circuits is to sample
the offsets and cancel their contribution during the next clock phase 
period. This technique is considered as a means to reduce the errors 
associated with the analog, SC multiplier. 
The paper will first consider the principles of operation by which 
the SC multiplier will be designed. These principles of operation will 
then be used to develop an implementation of the multiplier. An analy-
sis of this implementation will provide the predicted performance 
which will be compared to breadboard results. This will be followed 
by the present status of this development and the future steps that will 
be taken. Brief consideration will be given to the implementation of
this multiplier using MOS technology which is presently under construction.
PRINCIPLE OF OPERATION 
In seeking a multiplier compatible with MOS technology, one might 
ask why not simply replace the bipolar junction transistors in the 
bipolar multiplier with MOS transistors. In theory, this approach pro-
posed by the question should work. However, in practice there are 
several problems. The bipolar multiplier works on the principle of 
current ratioing using the transconductance of the bipolar junction 
transistor. Unfortunately, the MOS transistor has much lower trans-
conductance and larger offset voltages leading to a four-quadrant 
multiplier having large errors. While the transconductance of the MOS 
transistors could be increased by using very large devices, the offset 
voltages would create too much error. With the concepts of SC circuits 
in mind, a new approach was sought which would take advantage of the 
SC methods to provide improved performance. 
The operating principle chosen for the MOS multiplier can be 
explained by the block diagram in Fig. 1. This block diagram is used 
to represent the multiplier which will be designated as the operation-
al multiplier. The operational multiplier has three inputs and one 
output. Two of the inputs are designated as multiplicands and the 
third input is called the divisor. The principle of operation can be 
simply stated as follows. Operate simultaneously on one of the multi-
plicands and the divisor in such a manner that the divisor is equal 
to the remaining multiplicand times a constant. For example, suppose 
that the operated value of the divisor C is equal to the multiplicand 
B and is given as 
Operated value of C = KC = B (1)
Fig. 1 - Block diagram of an operational multiplier (inputs: multiplicand A, multiplicand B, divisor C; output = (AB)/C).
Fig. 2 - Counter implementation of the operational multiplier (accumulators iΔA and iΔC, i = 0, 1, 2, ..., n, with a stop control).
If the remaining multiplicand, A, is operated on in the same manner as
the divisor C then we can write
Operated value of A = KA (2)
From (1) we see that K = B/C. If the output is equal to the operated
value of A then the output can be expressed as
Output = Operated value of A = (AB)/C (3)
If the output of the operational multiplier is equal to the operated
value of A, then multiplication between the inputs A and B has been
achieved. We also note that division of the product (AB) by C has
also been achieved. Although the operations indicated above can be
more complex than multiplying by a constant, this choice has the
advantage of simplicity and is used in this paper.
IMPLEMENTATION 
The first implementation for analog multiplication to be considered
was the counter approach as shown in Fig. 2. This approach is an
obvious implementation of the operational multiplier. The counter
approach consists of two accumulators designated as iΔA and iΔC.
These accumulators continue to add ΔA and ΔC until iΔC is larger than
B.
Once the accumulation stops, it is seen that B is equal to nΔC,
or at least within one unit of Δ. Obviously one can see that in order
to achieve high accuracy the incremental constant, Δ, must be very
small. The problem arises in that as Δ decreases, the number of steps,
n, increases. If B is much greater than ΔC, then a long interval
will be necessary to obtain the output signal. The disadvantage of
this approach is that the operation can take too much time,
particularly if B is much greater than ΔC.
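The trade-off between accuracy and the number of steps in the counter approach can be seen in a short numerical sketch. This Python function is an illustration only: the name counter_multiply is hypothetical, the inputs are assumed positive (one quadrant), and ideal accumulators are assumed.

```python
def counter_multiply(a: float, b: float, c: float, delta: float):
    """Add delta*a and delta*c in lockstep until the C accumulator exceeds b.

    Returns (accumulated A value, number of steps). Assumes b >= 0, c > 0,
    and delta > 0 so that the loop terminates.
    """
    if b < 0 or c <= 0 or delta <= 0:
        raise ValueError("sketch handles positive inputs only")
    acc_a = 0.0
    acc_c = 0.0
    steps = 0
    while acc_c <= b:          # stop once i*delta*c is larger than b
        acc_a += delta * a
        acc_c += delta * c
        steps += 1
    return acc_a, steps


# Ideal result is a*b/c = 3*4/2 = 6; a smaller delta gives a closer
# answer but needs proportionally more steps.
for delta in (2**-3, 2**-7):
    out, n = counter_multiply(3.0, 4.0, 2.0, delta)
    print(delta, n, out)
```

Shrinking the increment by a factor of 16 in this example raises the step count from 17 to 257, which is the slowness the text describes.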
The second method selected is the successive approximation approach
as used in analog-to-digital converters, and is shown in Fig. 3. One
can readily see the advantages of this approach over the previous one.
The master accumulator successively approximates the value of V_B,
resulting after n steps in
    \sum_{i=0}^{n} \frac{b_i}{2^i} V_C \approx V_B    (4)
Rearranging terms:
    \sum_{i=0}^{n} \frac{b_i}{2^i} \approx \frac{V_B}{V_C}    (5)
But,
    V_{out} = \sum_{i=0}^{n} \frac{b_i}{2^i} V_A    (6)
Substituting,
    V_{out} \approx \frac{V_A V_B}{V_C}    (7)
Eq. (7) is approximate to within 2^{-n} times V_C. This difference between
eq. (3) and eq. (7) can be reduced by increasing the value of n.
One can see that the successive approximation approach converges to
the proper value much faster than the counter approach.
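A numerical sketch of the master and slave accumulators may help make the convergence concrete. It is written in Python under stated assumptions: the function name sa_multiply is hypothetical, the accumulators are ideal, each sample is half the previous one starting from V_C/2 (as in the circuit description of the halving step), and the sign of every step is chosen by comparing the master accumulator with V_B.

```python
def sa_multiply(va: float, vb: float, vc: float, n: int):
    """Successive approximation of va*vb/vc in n steps.

    The master accumulator builds K*vc toward vb; the slave applies the
    same sequence of signs to va. Returns (slave, master).
    """
    master = 0.0
    slave = 0.0
    step_c = vc
    step_a = va
    for _ in range(n):
        step_c /= 2.0                          # halve both samples together
        step_a /= 2.0
        sign = 1.0 if master < vb else -1.0    # comparator decision
        master += sign * step_c
        slave += sign * step_a
    return slave, master


out, master = sa_multiply(va=3.0, vb=4.0, vc=10.0, n=7)
print(out, master)   # 1.1953125 3.984375 (ideal product term: 1.2)
```

Seven steps already bring the master accumulator within V_C/2^7 of V_B, whereas the counter approach would need on the order of a hundred steps for comparable resolution.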
It turns out that this method is naturally adaptable to SC cir-
cuits and requires only three matched capacitors to implement the 
basic accumulator operation. Fig. 4 shows a circuit which resembles 
a SC integrator but has been modified for our purposes in implementing 
eq. (3). The circuit works as follows. At the beginning of the 
multiplying operation, the left-hand capacitor C is charged to the 
voltage VC. The right-hand capacitor C is completely uncharged during 
this time. The multiplication operation begins by switching across 
both capacitors by exactly one-half.
Fig. 3 - Successive approximation implementation of the operational multiplier (master accumulator on V_C, slave accumulator on V_A, each forming the sum of b_i/2^i terms for i = 0, 1, 2, ..., n).
Fig. 4 - Switched capacitor accumulator circuit (with reset and sign control).
The value of Vc/2 is then applied
either positively or negatively to the capacitor C connected around
the op amp. This capacitor serves the purpose of a memory or an
accumulator. The decision to accumulate in a positive or negative 
manner is made by comparing the output of the master accumulator with 
a reference voltage. If the accumulator is below this voltage, then 
the next sample is added to the accumulator. In this manner, the 
accumulator successively builds in value KVC which approaches the 
reference voltage . A second accumulator operates on VA in the same 
manner as the first accumulator resulting in the implementation of eq. 
(3). Since the accuracy of the accumulators is dependent upon how 
well the three capacitors designated as C can be matched, each sample 
of VA should be transferred to the accumulator with an error of less 
than 0.1 %. 
Fig. 5 shows the implementation of the four-quadrant multiplier 
using the successive approximation accumulator of Fig. 4. The 
accumulators use a set of switches indicated as "+" and "-" to deter-
mine the polarity of the accumulation. In the "+" position the 
accumulator has the advantage of operating in a stray-insensitive mode
which prevents capacitor and switch parasitics from affecting the
accumulation [12]. In the negative position the accumulator is unfortunately
susceptible to these parasitics which must be taken into account
when considering the accuracy of the circuit. Note that the sign 
of the four-quadrant multiplier is automatically determined in Fig. 5. 
This is possible because the accumulators are bidirectional. 
A shift register is used to control the operation and sequencing 
of the multiplier. The first output of the shift register is used to 
reset the accumulators by discharging the accumulation capacitance, 
and by charging the left-hand capacitor C to the value of VA (or VC). 
The accumulation process continues until the shift register reaches 
the point where the sample-and-hold circuits are triggered and C is 
reconnected to the voltage VA (or VC). The theoretical accuracy of 
the multiplier will be determined by the number of steps taken in 
the successive approximation sequence. Obviously, if no other
considerations are pertinent, the accuracy times speed of performing a
multiply operation would be a constant.
Fig. 5 - Block diagram of the successive approximation implementation (sample-and-hold stages on the VA, VB, and VC inputs; slave accumulator on VA; master accumulator on VC; comparator and sign control; output sample-and-hold producing VOUT).
Fig. 6 - Experimental static characteristics of the multiplier. VOUT versus VA with VB constant (axes in volts, -10 to +10; curves labelled VB = 10 volts and VB = 0 volts; VOUT = (VA)(VB)/VREF with VREF = 10 volts).
The static performance of a multiplier is typically characterized 
by offsets, feedthrough, and nonlinearities. Offset is due to an 
output voltage when both inputs are zero. Feedthrough is defined as 
an output when one of the inputs is zero and the other varies over 
its range of possible values. Nonlinearity has to do with the fact 
that the output of the multiplier may not be exactly equal to the pro-
duct of the two inputs. The static performance of the breadboard 
version of the SC multiplier is shown in Fig. 6. To obtain these 
plots, VC was connected to a +10 volt supply. VA was set to -10, 
zero, and +10 volts to produce the three lines shown, and VB was then 
swept over the range of -10 to +10 volts. As one can see, errors are
mainly due to a constant offset of about 156 millivolts. This is a
result of the number of successive approximations chosen for the
breadboard version. Eight approximations are performed for each
iteration. Since one approximation is discarded for resetting
accumulators, seven useful approximations (n = 7) remain. If in this case,
VC = 10 volts, then the useful range of VOUT is from -10 to +10 volts,
or 20 volts total. This means that the maximum error would then be
20 volts / 2^7, or about 156 millivolts. Because this error is constant
as shown, it
can easily be nulled out by a single adjustment to be mentioned later.
The results are similar when VA is held constant and VB is swept over 
its useful range . 
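The 156 millivolt figure follows directly from the output range and the number of useful approximations. As a quick check, a minimal Python sketch (the function name is hypothetical, and the error is taken to be the full output span divided by 2^n):

```python
def max_approximation_error(v_min: float, v_max: float, n: int) -> float:
    """Output span divided by 2**n, the resolution left after n useful steps."""
    return (v_max - v_min) / 2**n


# Breadboard values from the text: -10 to +10 volts, n = 7.
print(max_approximation_error(-10.0, 10.0, 7))   # 0.15625 volts, about 156 mV
```

Each additional useful approximation would halve this figure at the cost of one more clock step per multiplication.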
In addition to these static characteristics, the bandwidth of the 
multiplier is also of interest. The bandwidth can be defined in terms
of the magnitude response or the phase response. The best dynamic
definition is the frequency at which a 1% absolute error is introduced
in the multiplication operation. Figure 7 shows a plot of magnitude 
and phase versus frequency for a sine wave applied to the VB input. 
The inputs VA and VC are each held at 10 volts. Note that the magnitude
remains constant regardless of frequency, and that the phase
shift is a linear function of input frequency.
Fig. 7 - Frequency response of the SC multiplier. (a.) Magnitude response and (b.) phase response (horizontal axes from 0 to 1.0, labelled with f clock; ideal and experimental magnitude curves shown together; experimental phase shift plotted from 0 to -360 degrees).
In the MOS version the offset error in Fig. 5 will become more 
important because of the clock feedthrough of the switches (not to be
confused with the multiplier feedthrough). Because of the large 
clock signals and the possibility of high impedance states, small 
portions of the clock transitions will appear on the capacitors of 
Fig. 4. Although this feedthrough can become dependent upon the signal 
level, for purposes of this paper we shall assume that it is constant. 
Fortunately, in SC circuits we have the opportunity to sample the 
output offset and to introduce a cancelling component during the 
clock phases. To see how we can apply this idea to cancel the offset, 
let us consider the influence of the clock feedthrough. If the clock 
feedthrough introduces a constant component in the output, say ε,
then we can write the output of the two accumulators at the end of a
multiplication sequence as
    V_{out} = K V_A + \epsilon    (8)
and
    [equation not legible in source]    (9)
where K is expressed as
    K = \pm\frac{1}{2} \pm \frac{1}{4} \pm \frac{1}{8} \pm \cdots    (10)
where n is the number of steps in the successive approximation of VB 
of the multiplier. The approach taken to reduce this clock feedthrough 
is to build the dummy accumulator shown in Fig. 8. The dummy accumulator
is identical to the other accumulators except that it has no
input, and is not allowed to accumulate due to the discharge of the 
feedback capacitor C around the op amp during each clock cycle (this 
switch on the other two accumulators only operates once during the 
entire multiply sequence). In this manner a voltage will appear across
the two output capacitors which is caused by the clock feedthrough and
the op amp offsets. These two capacitors will be applied on the same 
clock phase to the two accumulators in such a manner as to cancel the 
offsets due to the clock feedthrough and op amp offsets of these 
accumulators. If the dummy accumulator is matched to the actual
accumulators, then the offsets should cancel. Furthermore, this system
has the ability to track changes in the offset due to different clock
amplitudes or temperature changes. 
The nature of the operation of this multiplier prevents serious 
problems in multiplier feedthrough. If the A input is zero, then 
only clock feedthrough and op amp offset can contribute to an output 
but the dummy accumulator scheme of Fig. 8 should remove this output to
minimize the B feedthrough. If the B input is zero, K should become
close to zero since the dummy accumulator is removing the ε value in
eq. (3).
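The cancellation idea can be sketched numerically. The Python model below is an illustration under assumptions that go beyond the text: the feedthrough term is a single constant added to each accumulator output, the dummy accumulator reproduces that constant up to a fractional mismatch, and all function names are hypothetical.

```python
def accumulator_output(k: float, v_in: float, epsilon: float) -> float:
    """Real accumulator: wanted term K*V plus a constant feedthrough term."""
    return k * v_in + epsilon


def dummy_output(epsilon: float, mismatch: float = 0.0) -> float:
    """Dummy accumulator: no input, so only the offset term appears."""
    return epsilon * (1.0 + mismatch)


def corrected_output(k: float, v_in: float, epsilon: float,
                     mismatch: float = 0.0) -> float:
    """Accumulator output after subtracting the dummy accumulator's output."""
    return accumulator_output(k, v_in, epsilon) - dummy_output(epsilon, mismatch)


# With perfect matching the offset vanishes; with 10% mismatch only 10% remains.
print(corrected_output(0.4, 3.0, 0.05))
print(corrected_output(0.4, 3.0, 0.05, mismatch=0.1))
```

Because the dummy accumulator sees the same clocks and temperature as the real ones, a change in ε moves both terms together, which is the tracking property claimed in the text.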
Since the error caused by mismatching of the capacitors is constant
and since we are assuming that the clock feedthrough is not a
function of the input amplitudes, then the nonlinearity of the
multiplier should be very small. The dynamic range of the multiplier
should be limited only by the power supplies and the ability of the
dummy accumulator scheme to cancel the offsets caused by clock
feedthrough and op amp offsets. Indeed, breadboard results of the SC
multiplier circuit show that non-linearities are almost negligible
with matched accumulators. More extensive measurements are being
conducted at present.
CONCLUSIONS
This paper considered the design of a SC multiplier which can be
implemented using MOS technology. This circuit is compatible with
other signal processing circuits designed by SC methods. The multiplier
has four quadrant capability and has the potential of requiring
no adjustments before application. The accuracy of the multiplier
appears to be very comparable to existing multipliers and the opportunity
to use offset cancelling techniques gives the promise of excellent
static characteristics. The circuit requires 6 op amps and 11 capacitors
in its present form including the dummy accumulator. At present,
portions of Fig. 5 are being implemented using MOS technology to 
further study the effects of clock feedthrough and other sources of 
error to the multiplier operation. The next step will be to integrate 
the entire circuit and to analyze the performance of the system and 
to study potential applications. 
Fig. 8 - Dummy accumulator used to cancel offset (two output capacitors C, one feeding Accumulator A and one feeding Accumulator B).
REFERENCES
1. D.A. Hodges, P.R. Gray, and R.W. Brodersen, "Potential of MOS
Technologies for Analog Integrated Circuits", IEEE J. of Solid-State
Circuits, Vol. SC-13, No. 3, June 1978, pp. 285-294.
2. J.L. McCreary and P.R. Gray, "All-MOS Charge Redistribution
Analog-to-Digital Conversion Techniques - Part I", IEEE J. of Solid-State
Circuits, Vol. SC-10, No. 6, pp. 371-379, Dec. 1975.
3. R.W. Brodersen, P.R. Gray, and D.A. Hodges, "MOS Switched-Capacitor
Filters", Proceedings of the IEEE, Vol. 67, No. 1, pp. 61-75, Jan.
1979.
4. T.R. Viswanathan, K. Singhal, and G. Metzker, "Application of
Switched Capacitor Resistors in RC Oscillators", Electronics
Letters, Sept. 28, 1978, Vol. 14, No. 20, pp. 659-660.
5. B.J. Hosticka and G.S. Moschytz, "Practical Design of
Switched-Capacitor Networks for Integrated Circuit Implementation",
Electronic Circuits and Systems, March 1979, Vol. 3, No. 2, pp.~88.
6. Y.S. Lee, L.M. Terman and L.G. Heller, "A Two-Stage Weighted
Capacitor Network for D/A - A/D Conversion", IEEE J. of Solid-State
Circuits, Vol. SC-14, No. 4, August 1979, pp. 778-781.
7. K.B. Ohri and M.J. Callahan, Jr., "Integrated PCM Coder", IEEE J.
of Solid-State Circuits, Vol. SC-14, No. 1, Feb. 1979, pp. 38-46.
8. A.R. Hamade, "A Single Chip All-MOS 8-Bit A/D Converter", IEEE J.
of Solid-State Circuits, Vol. SC-13, No. 6, Dec. 1978, pp. 785-791.
9. B. Fotouhi and D.A. Hodges, "High-Resolution A/D Conversion in
MOS/LSI", IEEE J. of Solid-State Circuits, Vol. SC-14, No. 6,
Dec. 1979, pp. 920-926.
10. J. Bingham, "SC Modulators", Switched-Capacitor Workshop, Feb.
10-12, 1980, Palo Alto, CA.
11. B.A. Gilbert, "A Precise Four-Quadrant Multiplier with
Subnanosecond Response", IEEE Journal of Solid-State Circuits, Vol. SC-3,
No. 4, Dec. 1968, pp. 365-373.
12. K. Martin and A.S. Sedra, "Stray-insensitive Switched-capacitor
Filters Based on Bilinear Z-Transform", Electronics Letters, Vol.
15, No. 13, June 21, 1979, pp. 365-366.
A One Transistor RAM for MPC Projects
by
James J. Cherry and Gerald L. Roylance
Massachusetts Institute of Technology
545 Technology Square, Cambridge, MA
Abstract: Many MPC projects, such as video frame buffers, need a large memory
subsystem. A one transistor per bit dynamic memory using Mead-Conway design rules is
being designed with this purpose in mind. The memory cell size is 16.5λ by 8λ (about the
same size as a 1975 4K RAM cell with λ = 2.5 microns).
While a complete high density memory subsystem has not been designed, two chips have
been designed to test its major components. One chip is a 1K memory array that tests the
sense amplifier, column decoder/driver, and read/write logic. This chip lacks a timing
generator and clock drivers. The second chip tests some low power bootstrapped clock
drivers. These test chips are currently being fabricated.
This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of
Technology. Support for the Laboratory's VLSI research is provided in part by the Advanced Research Projects
Agency of the Department of Defense under Office of Naval Research contract number N00014-80-C-0622 and in part
by the Advanced Research Projects Agency under Office of Naval Research contract N00014-75-C-0643.
1. Design Considerations for a One Transistor RAM
Many VLSI projects need a moderately large memory module. A one transistor
dynamic memory has been designed as a subsystem usable in the Multi-Project
Chip (MPC) designs being undertaken by several universities. Because the memory
is intended for MPC projects, it follows the conservative Mead-Conway design rules
and uses only a single layer of polysilicon; consequently, the achievable memory
density suffers. Furthermore, the memory design must tolerate wide process
variations because many different fabrication lines are used for MPC.
Throughout the RAM design we have decided in favor of simplicity in order to
give the RAM the best chance of working. When the choice was between speed
and density, we chose density because we feel density is more important to the
average project.
A block diagram of the memory is shown in the first figure. The major
components are the memory array, sense amplifiers, column address decoder, and
word line (column) driver. The bits of the memory are stored as a voltage (0 or 2
volts in our case) on the capacitors in the array. All of the memory cells in a
column are read at once. Addressing is done by the word line decoder; the decoder
takes a positive pulse from the word line driver and steers that pulse to the column
selected by the address lines. That word line pulse turns on all of the pass
transistors in the selected column thus connecting the memory cell capacitor to the
horizontal bit lines and to the sense amplifiers where the binary value held in the
cell is determined. The sense amplifier must be sensitive to signals on the order of
a hundred millivolts because the memory cell capacitance and the (much larger)
stray bit line capacitance form a voltage divider. In the present design this
attenuation is a factor of 15.
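The size of the signal reaching the sense amplifier can be estimated with a simple charge-sharing calculation. The Python sketch below is an illustration under assumptions not stated in the text: the bit line starts at a known reference voltage, the cell and bit line are ideal capacitors, and the function name is hypothetical. With a 15 to 1 capacitance ratio it gives a division by 16, close to the factor of 15 quoted above.

```python
def bit_line_signal(v_cell: float, c_bit_over_c_cell: float,
                    v_bit_initial: float = 0.0) -> float:
    """Voltage change on the bit line after sharing charge with one cell.

    Charge conservation: dV = (v_cell - v_bit_initial) * Cs / (Cs + Cb),
    written here in terms of the ratio Cb/Cs.
    """
    return (v_cell - v_bit_initial) / (1.0 + c_bit_over_c_cell)


# A stored 2 volt level with a bit line 15 times larger than the cell.
print(bit_line_signal(2.0, 15.0))   # 0.125 volts
```

Either way the sense amplifier sees roughly a hundred millivolts, which is why its sensitivity dominates the design.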
[Figure: block diagram of the memory, with labels for the word line, sense amps, decoder, and word line driver.]
There are several references on one transistor RAM design. Barnes [1] gives a
short description of several sense amplifier designs. We used the charge transfer
sense amplifier first reported by Heller [5, 6]. We did not use the more
sophisticated version of this amplifier described in Heller [7] because we felt there
would be a better chance of making the simpler version work. The charge transfer
sense amplifiers have more sensitivity than amplifiers that directly sense the voltage
difference on the bit lines and are also tolerant of bit line capacitance and transistor
threshold voltage variations.
The input-output (I/O) circuits (not shown) follow those of Gray [4]. The I/O is
done on one side of the array (rather than in the center of the array) so many
bits (for example, 32 bits) can be read or written at once, allowing higher memory
bandwidth than that available from commercial parts that are limited to reading
from 1 to 8 bits at a time.
The high voltage (7.5 volt) bootstrap driver used to precharge the bit lines is
based on one described by Chan [2]. The word line decoder is derived from one
given by Tzou in an article on CCD memories [11].
2. Memory Cell Dc~ign Considerations 
Several different memory cell layouts were tried in the search for highest
possible array density. The densest one we found (shown below) uses metal bit
lines and polysilicon word lines. The memory capacitor is actually an enhancement
mode transistor whose source and drain are tied together to make one terminal and
whose gate forms the other terminal; this latter terminal is connected to V_DD.
For a bit line capacitance to storage capacitance ratio of 15 to 1 (with 64 cells on
each bit line), the cell size is 16.5 λ by 8 λ. This cell is half the size (comparing
square λ's) of a three transistor RAM cell designed by Dick Lyon at Xerox PARC.
With λ = 2.5 μ, this cell is the same size as the INTEL 2104, a commercial dynamic
RAM that came out in 1975.
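The cell dimensions quoted above can be turned into absolute areas with a few lines of arithmetic. The sketch below is only an illustration: the function names are our own, and the array figure covers the bare cells only, with no allowance for sense amplifiers, decoders, or pads.

```python
# Cell dimensions quoted in the text, in units of lambda.
CELL_WIDTH_LAMBDA = 16.5
CELL_HEIGHT_LAMBDA = 8.0
LAMBDA_MICRONS = 2.5  # lambda value quoted in the text


def cell_area_lambda2(width=CELL_WIDTH_LAMBDA, height=CELL_HEIGHT_LAMBDA):
    """Cell area in square lambda."""
    return width * height


def cell_area_um2(lam=LAMBDA_MICRONS):
    """Cell area in square microns for a given lambda (in microns)."""
    return cell_area_lambda2() * lam ** 2


def array_area_mm2(n_cells, lam=LAMBDA_MICRONS):
    """Area of the bare cell array only (hypothetical helper, no periphery)."""
    return n_cells * cell_area_um2(lam) / 1e6


if __name__ == "__main__":
    print(cell_area_lambda2())        # 132.0 square lambda
    print(cell_area_um2())            # 825.0 square microns
    print(array_area_mm2(16 * 1024))  # bare 16K array, in square millimeters
```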
[Figure: Memory Cell Layout, showing the metal bit line, the polysilicon word line, diffusion, the contact cut, and the shared contact.]
Cell size is not the only size consideration. The sense amplifiers and the word
line decoders must also fit in the pitch determined by the cell size. While there was
little problem with the decoder pitch, the tightest pitch we could give the sense
amplifier was 11 λ (when we needed 8 λ). Although a bit line pitch of 11 λ could
be used, it would cause a significant increase in cell area. In a two level polysilicon
process (which we do not have), the area penalty can be avoided by using a folded
bit line [10] or a staggered bit line [8].
We were able to use the minimum area cell (with its 8 λ pitch) by placing two
amplifiers side by side and fitting the pair to a 16 λ pitch. This layout is not
symmetrical and some effort is needed to balance the stray coupling from control
lines onto the bit lines or the memory will not work. The noise injected onto the
bit lines by these control lines is one-half of the signal that the amplifier is trying to
sense!
3. Sense Amplifier
The heart of every sense amplifier design is a cross-coupled differential pair
(M3-M3' in the figure below). A small initial voltage difference across nodes 1 and 2
is amplified when node 3 is pulled down by M8. The main difference between the
many different sense amplifiers used in dynamic RAMs is how the cross coupled
latch is connected to the bit lines [1]. In order to sense a small differential voltage
quickly, the large capacitance of the bit lines must be isolated from the internal
nodes of the differential amplifier.
The next figure shows the basic charge transfer sense amplifier that we use and
that was first described by Heller [5, 6]. (Much of the peripheral circuitry that we
use comes from Gray [4]).
[Figure: Sense Amplifier. Labels: equilibration line, word line, dummy word line, bit line and its capacitance C_BL, memory cell capacitor M6 (20/12.5), dummy cell capacitor M6' (20/12.5), and pull-down devices M8A (small) and M8B (big).]
The memory cell capacitors are FETs M6 and M6'. φ1 is a high voltage pulse
(above V_DD) that precharges nodes 1, 2, and 3 to V_DD. V_R is a supply voltage less
than V_DD, so nodes 4 and 6 charge to V_R - V_T (V_T is the transistor threshold
voltage) as M4 and M4' reach cut-off. Since V_R - V_T is the highest the bit lines can
go, this voltage is used to store a logical one in the memory cells. A logical zero is
represented by storing zero volts in the memory cell capacitor. A voltage half way
between these two limits is stored on a dummy cell (M6') to provide a reference
voltage for the sense amplifier. Each bit line is populated with 64 memory cells and
one dummy cell. When reading a column of bits from one side of the array, the
dummy cell on the opposite side of the array is addressed. Thus, the figure above
depicts the situation encountered when reading a bit out of the left half of the
memory array.
When a read sequence starts, the precharge line (φ1) is turned off and the word
line and dummy word lines are brought high to connect a memory cell (M6) and
the dummy cell (M6') to opposite sides of the differential sense amplifier. Consider
the case in which a zero is stored in M6. The drop in voltage on the bit line will be:
$$\Delta V_4 = (V_R - V_T)\,\frac{C_S}{C_S + C_{BL}}$$
where C_S is the capacitance of the memory cell, and C_BL is the bit line capacitance.
This takes M4 out of cut-off and into saturation with a gate drive of ΔV_4. M4 will
remain on until the bit line is charged back to V_R - V_T. The charge necessary to
charge C_S + C_BL back to V_R - V_T must come from C_0, the parasitic capacitance on
node 1. Thus,
$$\Delta V_1 = \frac{(C_S + C_{BL})\,\Delta V_4}{C_0} = (V_R - V_T)\,\frac{C_S}{C_0}$$
The voltage change at node 3 will be one half of this. Note that the differential
voltage ΔV_2 - ΔV_1 is independent of the bit line capacitance if enough time has
elapsed [5, 6]. φ2A comes on and starts pulling node 3 toward ground, eventually
turning one of M3 or M3' on. Node 3 is pulled down slowly enough to prevent the
other latch transistor from turning on [9]. As the V_GS of the on transistor increases,
so does its g_m. φ2B is then turned on to quickly pull node 2 to ground. The bit that
was stored on M6 has now been refreshed, so the word line is turned off. φ2A and
φ2B are turned off. φ3 is turned on to short all of the bit lines together; since half of
the bit lines are at the logic zero level (0 volts) and the other half are at logic one
(V_R - V_T), the bit lines go to a voltage half way between the two levels. This voltage
is stored in the dummy cell by dropping the dummy word line and φ3.
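The charge transfer argument above can be checked numerically. The sketch below assumes simple charge sharing between the cell and the bit line, followed by complete recharging of the bit line from the sense node capacitance; it ignores device limits such as the sense node falling as low as the bit line. The function names and all component values are hypothetical and chosen only for illustration.

```python
import math


def bit_line_drop(v_one, c_s, c_bl):
    """Bit line drop when a cell holding 0 V is connected to a bit line
    precharged to v_one (the logic-one level, V_R - V_T)."""
    return v_one * c_s / (c_s + c_bl)


def sense_node_drop(v_one, c_s, c_bl, c_0):
    """Drop at the sense node after it has recharged the cell and bit line
    back to v_one through the charge transfer device."""
    restored_charge = (c_s + c_bl) * bit_line_drop(v_one, c_s, c_bl)
    return restored_charge / c_0


if __name__ == "__main__":
    V_ONE = 3.0      # hypothetical V_R - V_T, volts
    C_S = 0.05e-12   # hypothetical storage capacitance, farads
    C_0 = 0.1e-12    # hypothetical sense node capacitance, farads
    for ratio in (15, 30):  # bit line to storage capacitance ratio
        c_bl = ratio * C_S
        print(ratio,
              bit_line_drop(V_ONE, C_S, c_bl),
              sense_node_drop(V_ONE, C_S, c_bl, C_0))
```

Doubling the bit line capacitance halves the signal seen on the bit line itself but leaves the drop at the sense node unchanged, which is the property that makes this amplifier tolerant of bit line capacitance.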
A memory write cycle proceeds exactly like that of a read, but instead of letting
the contents of the memory cell determine the fate of the bit line, a circuit at the
end of the bit line either pulls it to ground or pushes it up with a capacitor
approximately the same size as a memory cell [4].
4. Word Line Decoder
It is impractical to use a separate high capacitance driver for each of the 128
word lines, so a single driver must be shared by many word lines. The word line
decoder does both the memory addressing and the multiplexing of the high
capacitance driver. The word line decoder also clamps the unselected word lines to
ground to minimize some problems related to subthreshold conduction of the
FETs that isolate the unselected memory capacitors from the bit lines [2, 4]. The
word line decoder circuit shown below is a modified version of the one described by
Tzou [11]. The basic idea is to bootstrap pass transistor M7 with its own
gate-source capacitance so that all of the voltage developed by the word line driver is
delivered onto the word line.
[Figure: Word Line Decoder Driver (three bits of decoder NOR gate shown). Labels: V_D, φP, AEN, A0, A1, A2, Word Line. Legible device sizes: M1 5/5, M2 15/5, M3-M5 10/5, M7 110/5.]
Operation of the decoder driver starts with φP precharging the gates of M8 (the
clamp transistor) and M7 (the pass transistor) while V_D (word line driver), φD, and
AEN are all low. When the address lines (A0-A6) have settled, φP goes low
followed by AEN going high. If all of the address line inputs of the NOR gate
decoder are low, then the word line is selected and the pass transistor will allow the
word line pulse to pass through. In that case, when φD comes on it will discharge
the gate of M8, allowing the output to rise when V_D comes along. As V_D rises,
isolation transistor M6 turns off allowing M7 to bootstrap.
If one of the address lines in the NOR decoder is high, then the word line is not
selected. When AEN goes high, the gate of M7 is discharged, turning it off and
isolating the word line from the driver. φD is prevented from discharging the gate
of M8, the clamp transistor, by M10.
Tzou's design does not include isolation transistor M6. Without this transistor,
much of the bootstrap charge on M7 is lost in charging the diffusion capacitance of
the decoder logic. Another addition to Tzou's design is M2, which provides a
simple means of disabling the address lines.
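The selection rule described above can be captured in a small behavioral model. This is only a logic-level sketch and says nothing about the bootstrapping itself. It assumes that each decoder is wired to the true or complemented address line for every bit, so that all of its NOR inputs are low exactly when its own address is presented; the function names are our own.

```python
N_ADDRESS_BITS = 7                   # A0-A6 in the text
N_WORD_LINES = 2 ** N_ADDRESS_BITS   # 128 word lines


def nor_inputs(row, address):
    """Levels seen by the NOR gate of decoder `row` (assumed wiring: an input
    is high wherever the presented address differs from the row's address)."""
    return [((address ^ row) >> bit) & 1 for bit in range(N_ADDRESS_BITS)]


def m7_gate_discharged(row, address, aen):
    """The precharged gate of pass transistor M7 is discharged when AEN is
    high and at least one NOR input is high."""
    return bool(aen) and any(nor_inputs(row, address))


def rows_passing_pulse(address):
    """Rows whose pass transistor still conducts once AEN has gone high."""
    return [row for row in range(N_WORD_LINES)
            if not m7_gate_discharged(row, address, aen=True)]


if __name__ == "__main__":
    print(rows_passing_pulse(5))  # [5]
```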
The capacitance associated with the drain of M7 is a surprisingly large load to
the word line driver. While the word line capacitance for a 16K bit version of the
RAM for a typical process would be only 2pF, the combined drain capacitances of
128 decoders is several times higher (about 8pF). Our first layout of M7 used a
straight gate and had a capacitance from the drain to ground of .1pF. Bending the
gate into a rectangle (a suggestion of Pat Bosshart) eliminated the sidewall
capacitance and reduced the capacitance to .068pF per drain. Another layout,
suggested by Tom Knight but not used by us, has an effective capacitance of
.056pF.
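The totals implied by these per-drain figures follow from a simple multiplication, sketched below. The per-drain values are taken as read from the text above; the dictionary keys are our own labels for the three layouts.

```python
N_DECODERS = 128          # one pass transistor drain per word line decoder
WORD_LINE_CAP_PF = 2.0    # word line capacitance quoted for a 16K version

# Drain-to-ground capacitance per decoder, in picofarads, as read from the text.
DRAIN_CAP_PF = {
    "straight gate": 0.1,
    "rectangular gate": 0.068,
    "alternative layout": 0.056,
}


def combined_drain_cap_pf(per_drain_pf, n_decoders=N_DECODERS):
    """Total drain capacitance presented to the shared word line driver."""
    return per_drain_pf * n_decoders


if __name__ == "__main__":
    for layout, cap in DRAIN_CAP_PF.items():
        total = combined_drain_cap_pf(cap)
        print(f"{layout}: {total:.2f} pF "
              f"({total / WORD_LINE_CAP_PF:.1f}x the word line itself)")
```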
5. Bootstrap Drivers
In a typical application the memory array is large and consequently the clock
signals are large capacitive loads which require driver circuits. We designed two
bootstrap drivers: one is a 5 volt driver for the address inputs of the word line
decoder and φ3 of the sense amplifier and the other is a 7.5 volt driver for
precharging the bit lines (φ1). A third driver, which is for the word lines (V_D), has not
been designed. The 5 volt circuit takes 5nS to switch a 1.3pF load (128 5μ x 5μ
gates); the 7.5 volt circuit takes 15nS to drive the same load.
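As a rough cross-check of these figures, the sketch below computes the average charging current each driver must supply (I = C ΔV / t) and the gate capacitance per unit area implied by the quoted load. It takes the load as 1.3 pF and treats the switching times as full-swing times, both of which are assumptions; the function names are our own.

```python
LOAD_FARADS = 1.3e-12   # quoted load: 128 gates of 5 micron by 5 micron
N_GATES = 128
GATE_W_UM = 5.0
GATE_L_UM = 5.0


def average_current(c_load, v_swing, t_switch):
    """Average current needed to move c_load through v_swing in t_switch."""
    return c_load * v_swing / t_switch


def gate_cap_per_um2(c_load=LOAD_FARADS, n_gates=N_GATES,
                     w_um=GATE_W_UM, l_um=GATE_L_UM):
    """Capacitance per square micron implied by the quoted load."""
    return c_load / (n_gates * w_um * l_um)


if __name__ == "__main__":
    print(average_current(LOAD_FARADS, 5.0, 5e-9))    # 5 volt driver, amperes
    print(average_current(LOAD_FARADS, 7.5, 15e-9))   # 7.5 volt driver, amperes
    print(gate_cap_per_um2())                         # farads per square micron
```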
Neither of the driver circuits is connected to the memory array so that the
drivers and the memory can be tested independently. Super buffers are used for
the address buffers in the current memory array, but all other clocks are simply
bonded out to pads. Normal depletion load inverters and super buffers are not
acceptable for the word line driver because their logic zero output is not 0.0 volts.
Super buffers consume more standby power than a dynamic bootstrap driver and so
super buffers are less desirable in a large memory.
The 5 volt bootstrap driver is modeled after one used in the INTEL 2118 16K
dynamic RAM (see figure). φCLR precharges the gates of M11 and M13 high,
turning them on. When φIN starts to rise, it charges capacitor M9 and starts to turn
M10 and M12 on. M6 isolates node 18, allowing that node to bootstrap and keep
M5 turned on hard. M1 and M4 form a comparator that notices when φIN has gone
above 2 threshold drops. When this happens, M4 turns on and pulls nodes 12 and
18 down to ground. M11 and M13, which had been holding down nodes 13 and
14, now turn off, letting those nodes rise. Capacitor M9 bootstraps node 16 (which
was isolated by M5 when M5's gate fell), turning M10 and M12 on hard. M12
pulls the output node voltage up. φIN can now fall without affecting the rest of the
circuit because M5 is off. φCLR turns M11 and M13 on and turns off M10 and
M12, forcing the output low and resetting the circuit. The bootstrap capacitor M9
is driven from node 13 and not from node 14 to get more gate drive on M12 which
significantly improves the output rise time.
[Figure: 5 Volt Bootstrap Driver, with inputs φIN and φCLR and the output node. Legible device sizes: M1 20/5, M4 40/5, M5 40/5, M8 10/5, M13 100/5.]
The high voltage bootstrap driver (after Chan [2]) is basically two of the 5 volt
drivers that have been merged together. M14 bootstraps in much the same way as
M9 did in the previous circuit. When M15 is bootstrapping, node 10 (which was
pushed high by φIN) connects node 8 to V_DD. As node 4 rises, the output (node 7)
also rises. M18 and M21 form another comparator that notices when the output
has exceeded two V_T; then M21 pulls down node 10, turning off M17 which had
been holding one terminal of capacitor M15 down at ground and also turning off
M12 which had been holding the other terminal at V_DD. M16 has been turned on
because node 4 has been bootstrapped high earlier. M15 now bootstraps node 8.
M11 is still on, so node 5 follows node 8 which pushes node 4 still higher. Node 4
makes M13 and M16 stay turned on; thus, the output follows node 8. The clever
feature of this circuit is that M15 does not charge share with the load capacitance
until the load capacitance has been charged up to two V_T (i.e., until the comparator
trips).
6. Conclusion 
We designed two project chips to test our designs. One chip is a 128 by 8 (1K)
array of memory cells with sense amplifiers, word line decoders (implemented with
super buffer address drivers), and multiplexed read/write logic. This array size
provides a reasonable feasibility test for building a 16K subsystem. No clock or
timing generation is included on the chip because we did not have enough time.
We expect access times of 150ns and cycle times of 250ns. A second, separate
project chip tests the 5 volt and the 7.5 volt bootstrap drivers. Additional circuitry is
included on that chip for measuring the capacitive loading on the drivers when a
low capacitance probe is connected to their outputs.
The original goal of this effort was to develop a high density memory subsystem
that could be treated as a "black box" by designers with little or no analog
background. We severely underestimated the magnitude of such a task. As it
stands, our results can only be viewed as a first cut towards that goal. We found
that many design challenges lie in the peripheral circuitry such as drivers and
decoders: there is much more to a one transistor RAM than sense amplifiers.
We thank Mark Johnson for telling us about laying out high frequency gate
oxide capacitors and for spotting the undesirable control line coupling in the sense
amplifier. We thank Tom Knight for general guidance and moral support. The
RAM was designed as a term project for an MOS analog circuit design course
taught by Prof. Yannis Tsividis of Columbia University while visiting MIT. Prof.
Tsividis has given both of us a better understanding of using MOSFETs in both
analog and digital design.
7. Bibliography 
1. J. Barnes, "A High Performance Sense Amplifier for a 5V Dynamic RAM", IEEE Journal of
Solid-State Circuits, Vol. SC-15, No. 5, October 1980, pp 831-839.
2. J. Y. Chan, et al., "A 100nS 5V Only 64Kx1 MOS Dynamic RAM", IEEE Journal of
Solid-State Circuits, Vol. SC-15, No. 5, October 1980, pp 839-846.
3. Electronic Design, "Circuit Techniques Tune Up for Production of 64-K RAMs", Electronic
Design, October 25, 1980, pp 31-32.
4. K. Gray, "Cross-Coupled Charge Transfer Sense Amplifier and Latch Sense Scheme for
High-Density FET Memories", IBM Journal of Research and Development, Vol. 24, No. 3, May 1980,
pp 283-290.
5. L. G. Heller, D. P. Spampinato, Y. L. Yao, "High-Sensitivity Charge-Transfer Sense
Amplifier", 1975 IEEE International Solid-State Circuits Conference, pp 112-113.
6. L. G. Heller, D. P. Spampinato, Y. L. Yao, "High Sensitivity Charge-Transfer Sense
Amplifier", IEEE Journal of Solid-State Circuits, Vol. SC-11, No. 5, October 1976, pp 596-601.
7. L. G. Heller, "Cross-Coupled Charge-Transfer Sense Amplifier", 1979 IEEE International
Solid-State Circuits Conference, Feb 1979, pp 20-21.
8. T. C. Lo, R. E. Scheuerlein, R. Tamlyn, "A 64K Dynamic Random Access Memory: Design
Considerations and Description", IBM Journal of Research and Development, Vol. 24, No. 3, May
1980, pp 318-327.
9. W. T. Lynch, H. J. Boll, "Optimization of the Latching Pulse for Dynamic Flip-Flop Sensors",
IEEE Journal of Solid-State Circuits, Vol. SC-9, No. 2, April 1974, pp 49-55.
10. F. J. Smith, R. T. Yu, I. Lee, S. S. Wong, M. P. Embrathiry, "A 64 kbit MOS Dynamic RAM
with Novel Memory Capacitor", IEEE Journal of Solid-State Circuits, Vol. SC-15, No. 2, April
1980, pp 184-189.
11. A. Tzou, et al., "A 256K-Bit Charge-Coupled Device Memory", IBM Journal of Research and
Development, Vol. 24, No. 3, May 1980, pp 328-338.
12. Y. S. Yee, L. M. Terman, L. G. Heller, "A 1 mV MOS Comparator", IEEE Journal of
Solid-State Circuits, Vol. SC-13, No. 3, June 1978, pp 294-297.
INNOVATIVE CIRCUIT DESIGN SESSION 
PLA Design in NAND Structure 
Chong Ming Lin 
Semiconductor Engineering Group 
Digital Equipment Corporation 
75 Reed Road 
Hudson, MA 01749 (617) 568-4888 
ABSTRACT--A NAND (serial gating) structure PLA of the MOS poly-silicon gate 
process has been developed for high density and medium fast speed VLSI 
application. Dynamic clocking is used for minimum power dissipation and 
elimination of the ratio problem associated with static NAND gate. 
Ion-implantation for memory cell programming and the elimination of contact in 
the memory area drastically reduces the cell size, and reliability is improved. 
A simple but effective self-timed clocking scheme is employed for better 
operating margins against process variations; the overhead chip area for the 
clock generation is sufficiently small. The advantages of allowing metal signal
and power lines to cross the PLA memory area are discussed. Some measured data
from a 3.5 μm NMOS Si-gate process with regard to gate height and transistor
sizes are also described.
INTRODUCTION 
In MOS circuit design, the NAND circuit, due to its inherent electrical
characteristics, has been restricted to applications with only a limited number of
inputs. In the past, ion implant programmed NAND structures had only been used
for very slow speed ROM applications [1] [2]. However, with the fast progress
in process technology, a properly designed NAND structure PLA is becoming more
attractive for some of the existing applications. A new product with a new
structure in a new process is always impressive for its performance, but its
cost effectiveness is not guaranteed at the production level. This question was
better stated by G. Moore in his lecture at the 1st Caltech VLSI Conference,
January 1980:
"... the semiconductor industry is not now process-technology limited for
non-memory product. How to best make use of the processing technology is
really what the problem is."
The experiment discussed in this report was done primarily in answer to a
request, in November 1979, for a high density and medium fast speed PLA design.
In order to make best use of a then newly developed 3.5 μm process and fully
utilize the given timing spec, a NAND structure PLA was proposed as a design
with better performance that was also cost effective for an existing
application and beyond.
In order to achieve overall low chip size and a high operating margin, a
dynamic, self-timed clocking scheme was proposed for most of the circuitry.
Knowledge from measured data is used to construct circuits with better
performance than the original approaches used on a circuit test chip [3].
Since the NAND and the NOR structures are the two basic building blocks and
complementary to each other in MOS circuit design, the author feels that the
study is also useful for understanding the general NOR type PLA design as an
expected by-product.
In the following sections, process background, NAND circuit model, cell
layout, reliability and design consideration will be described. Further
improvement will also be discussed.
PROCESS BACKGROUND
Although the process performance is crucial in the evaluation of a new
circuit structure, this information was not available in the previous papers [1]
[2]. In order to test out some of the assumptions and limitations of the NAND
structure PLA, a test chip was designed and manufactured in 1979 with a 3.5 μm
NMOS process, P400. This process uses ion implantation for source/drain, plasma
dry etching, and silicon doped aluminum, plus 4 types of devices, as shown in Table
I, which provide more flexibility in circuit design and a better
power × speed × density product than many of the processes in the previous
generation.
The performance of the process was measured through a ring-oscillator built
on the test chip. As shown in Fig. 1(c), the power × speed product curves
indicate that in a typical case, the performance of that ring-oscillator is
around 0.35 pJ at 5V VCC, -3V VBB, and room temperature. Those curves also
indicate that the process parameters are being optimized against those skewed
process corners. With this kind of performance, the NAND PLA does have better
potential for some applications which were not feasible with processes of the
previous generations.
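For readers unfamiliar with ring-oscillator measurements, the sketch below shows the usual way a per-stage delay is extracted from the oscillation period (each edge travels around the ring twice per period) and how a power × delay product relates delay to power per stage. The 19-stage count and the 0.35 pJ figure come from this paper; the oscillation period used in the example is purely hypothetical, and the function names are our own.

```python
N_STAGES = 19            # stages in the ring oscillator on the test chip
PDP_JOULES = 0.35e-12    # typical power x delay product quoted in the text


def stage_delay(period_s, n_stages=N_STAGES):
    """Average delay per stage: one period is two trips around the ring."""
    return period_s / (2 * n_stages)


def stage_power(delay_s, pdp_j=PDP_JOULES):
    """Power per stage implied by a given power x delay product."""
    return pdp_j / delay_s


if __name__ == "__main__":
    period = 38e-9                       # hypothetical period, for illustration
    delay = stage_delay(period)
    print(delay, stage_power(delay))     # seconds per stage, watts per stage
```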
TABLE I
Device threshold voltages and ion-implantation definition of the P400 process.

Device         Boron   Arsenic   Vth (typical)
Enhancement    X       --         0.7 V
Intrinsic      --      --         0.0 V
Depletion-1    X       --        -1.2 V
Depletion-2    --      X         -2.5 V

Fig. 1a - Photomicrograph of a ring-oscillator with 19 stages and fan-in and fan-out of 1.
Fig. 1b - The waveform of the ring-oscillator, measured at 23 C, 5V VCC, and -3V VBB (horizontal scale 10 ns/div).
Fig. 1c - Performance of the ring-oscillator at 23 C, 5V VCC.
[TABLE II - Electrical properties of the NOR and NAND circuits (logic threshold, delay time, pull-down device width).]
[Fig. 3 - NAND circuit with (a) static and (b) dynamic pull-up device.]
[Fig. 4 - Ion-implantation programming.]
[Fig. 5 - ROM/PLA cell layouts (poly word line, Al bit line).]
[Fig. 6 - Reliability study of the implant overlap.]
MODEL OF NAND CIRCUIT 
As shown in Table II, the major reason that the NAND circuit has not been
used as widely as the NOR structure is its inferior electrical
properties in comparison with the NOR circuit [4]. The NAND circuit's d.c.
characteristics, such as its logic threshold and pull-down device channel width,
shown in Fig. 3(a), can be improved by using a dynamic pull-up device, shown in
Fig. 3(b). However, the delay time of a NAND circuit is slow and proportional
to its number of transistors in series. Even so, a NAND circuit can use ion
implantation as a programming method for its memory bits, as shown in Fig. 4,
and this programming approach leads to the smallest PLA/ROM cell size that can
be achieved by the current MOS technology.
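The effect of implant programming on a series NAND chain can be expressed as a small logic model. This is a behavioral sketch only, under the assumption that a depletion-implanted transistor always conducts (a "don't care" position) while an enhancement transistor conducts only when its gate is high; the names and the string encoding of the program are our own.

```python
ENHANCEMENT = "E"  # conducts only when its gate is high (a selected bit)
DEPLETION = "D"    # always conducts (a don't-care bit)


def chain_conducts(program, inputs):
    """True when every transistor in the series chain conducts."""
    if len(program) != len(inputs):
        raise ValueError("program and inputs must have the same length")
    return all(device == DEPLETION or bool(level)
               for device, level in zip(program, inputs))


def nand_output(program, inputs):
    """Logic level left on the precharged node after evaluation:
    0 if the chain discharged it, 1 otherwise."""
    return 0 if chain_conducts(program, inputs) else 1


if __name__ == "__main__":
    # The middle position is a don't care, so only inputs 0 and 2 matter.
    print(nand_output("EDE", [1, 0, 1]))  # 0
    print(nand_output("EDE", [1, 1, 0]))  # 1
```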
CELL LAYOUT 
For a conventional NOR structure PLA/ROM cell, as shown in Fig. 5(a), the
size of a cell is defined and limited by its components: word line (poly), bit
line (aluminum), contact between drain (N+) and bit line, memory transistor gate
area, drain, and source (N+). Furthermore, due to process requirements, a minimum
area and spacing for each element must be used in the implementation of the memory
cell. Thus, within the limitations of the present process technology, selecting
and/or eliminating certain parts of the memory cell leads to different
structure variations and cell sizes in PLA/ROM cell design. Table III shows the
size ratio and the basic properties of the four major types of cells, which
are well known and use straightforward layout techniques. From Fig.
5, it is obvious that the NAND structured PLA/ROM can be made as small as a
quarter of the size of a 'contact programming' cell, and the elimination of the
contact also enhances the circuit reliability.
The size of the NAND structure PLA/ROM cell can be made smaller if the
process can provide smaller poly and N+ lines without causing electrical
problems [3]. Furthermore, the alignment and resolution of the ion implant
process set the final limit on the minimum size this approach can achieve.
When the NAND PLA/ROM cells are placed closer to each other, the enhancement
and depletion implants can overlap each other. Of the four possible
overlapping cases, one is fatal; the other three, although
not fatal, can all cause electrical problems as shown in Fig. 6.
One way to test out the process and equipment limitations is to use a
checkerboard test pattern, shown in Fig. 6(c). This test pattern, if properly
decoded, would be able to indicate the safety margins left in a process for
implementing the NAND structure PLA/ROM. On the other hand, a NAND structured
PLA/ROM is also a good test tool for process control monitoring, especially for
the definition control of ion implantation.
CIRCUIT DESIGN CONSIDERATIONS 
As shown in Table II, the NAND structure PLA with static pull-up would have
difficulties with ratio, pull-down device size, and slow speed in discharging
against the constantly conducting pull-up device. In dynamic operation, the ratio
problem is eliminated, pull-down device size is minimized, and discharging time
is reduced. However, the generation and implementation of the control clocks
will complicate the design and require extra silicon area. As a result, in
dynamic operation, the design effort and total implementation area for a PLA are
more than those of putting two ROM arrays together. Furthermore, in NOR structure
dynamic (or semi-dynamic) PLA design, an interface circuit is needed to allow
precharging of the two arrays at the same time during the early cycle time of
the operation. Consequently, a clock scheme of four phases generated from the
system is the most common approach, and the safest way is to use four
non-overlapping clock phases to execute the operation, at the expense of slow
throughput time and zero process tracking capability.
For the proposed NAND PLA, the elimination of the interface circuit between
the two arrays simplifies the layout work between the arrays and saves the area
otherwise needed for the interface circuit. Basically, a dynamic NAND circuit's
access time is limited by its precharge and discharge time. In this proposed NAND
PLA, precharge time is significantly reduced by precharging from both ends, because
the RC constant of the serial channel is halved. The precharging devices of the AND
array are driven by a bootstrapped voltage level, which gives the AND array the full
VCC level and helps to speed up the discharge time in the OR array. The
precharge operation can be further optimized by generating a longer pulse width
with the normal VCC level for the OR array, because the OR array will be enabled
only after the AND array has settled. A high beta ratio input inverter in the
output register, with amplified positive feedback through an intrinsic device, as
shown in Fig. 8(g), accepts a weak logic "1" output from the OR array at VCC-VTn
with no difficulty in the initial sensing and final stored level.
CLOCKING SCHEME AND GENERATION 
The proposed NAND PLA uses five clock phases. Their wave forms are shown in
Fig. 7(a), where Cin, Cpr, Cena, and Ceno are used for the control of the PLA
operation flow, while Cla is designed to latch the processed data
against precharge and the leakage problem of low frequency operation. Depending
upon each design's system spec and circuit structure, Cla may be omitted without
affecting the performance.
[TABLE III - ROM/PLA Cell Type and Structure. Legible entries: contact programming (relative cell size 1.0); diffusion programming (thin gate oxide: selected); ion-implant programming (enhancement implant: selected, depletion implant: don't care); two layers of poly-Si (yield is less than with a single poly process).]
[Fig. 7a - Clock Scheme and Wave Forms of the NAND PLA.]
[Fig. 8 - Circuits used in the NAND PLA.]
All these clocks are derived from the rising edge of the input system clock,
Tclk, and they are generated through dummy circuits which provide enough
tracking capability against process variations and power supply changes. In
order to achieve the best result, the dummy circuits should use the same circuits
or cells as the arrays they track; a proper number of depletion devices is
suggested for duplicating the capacitance loading, and the device width effect [3]
is another source of extra operation margin.
CIRCUIT OPERATION: 
1. Array precharge--
Cpr is a self-timed clock pulse triggered by the rising edge of the system
clock, Tclk. As shown in Fig. 9, both Cpr and Cin are generated through a dummy
circuit which uses a row of the OR array to track the loading in the OR array and
uses a column of the AND array to track the completion of the precharge action.
During this period, both Cena and Ceno are '0'. This allows both arrays to be
fully charged to their highest possible level against couplings and the
so-called "charge sharing" problem [3].
2. Data input--
New input data should be ready at the beginning of a new cycle. Those
input data are strobed into the input buffer through T1, see Fig. 8(a), and
temporarily stored at node 1 after the Cin pulse is gone. During the
precharge time, T4 is turned 'off' by Cin. This guarantees that the completion of
the precharge is independent of those input data: whether T3 is 'on'
or 'off', there is no d.c. path through T2 to ground, due to the exclusive wave
forms of Cpr and Cin. The input buffer consists of only 4 transistors.
Transistor T2 is an intrinsic device which gives a higher output voltage than an
enhancement device when Cpr is high, but it will not conduct much current, with
proper choice of channel length, when Cpr is low. This structure allows optimal
output level and minimum device sizes for the pull-down devices T3 and T4, as
well as for T2 itself, partly because the intrinsic device has the highest
mobility among the four types of devices available in this process. Also, due
to the compactness of the structure and the lack of an appreciable d.c. path in
this input buffer, the interface problem is eased and power dissipation is
greatly reduced.
3. Enabling of the arrays--
Once the arrays are fully charged, Cpr and then Cin go down to '0'. When
Cin is '1', T4 in the dummy input buffer, INdmbf, turns 'on', and the output of
the INdmbf, see Fig. 8(d), starts to change from a precharged '1' to '0' because
its input is VCC. With enough depletion type capacitors on the output and a
Schmitt trigger to sense the change, Cena is ensured to turn 'on' only after
all the inverted input data have been transferred to their
outputs, or word lines of the AND array.
Cena enables the AND array to decode its inputs through its pre-programmed 
memory bits. Cena also enables a dummy circuit in the AND array to discharge 
from its precharged level to '0' and thus generate the Ceno pulse, as shown in 
Fig. 8(e), by the same principle as the Cena generation. With the starting of 
the Ceno clock pulse, NAND PLA is ready to send out its decoded results to the 
output register and outside buses. 
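The ordering of the self-timed clocks described in steps 1 to 3 can be summarized as a simple event list. The following Python sketch is illustrative only: the function name and all delay values are hypothetical placeholders, not data from this paper, and it models only the order in which the dummy circuits release each clock.

```python
# Hypothetical event-ordering model of the self-timed clocks.
# All delays are arbitrary placeholder units, not measured values.
def clock_sequence(t_precharge=10.0, t_input_buffer=3.0,
                   t_and_array=8.0, t_or_array=8.0):
    """Return (time, event) pairs for one cycle, starting at the Tclk rising edge."""
    events = []
    t = 0.0
    events.append((t, "Tclk rises: Cpr and Cin pulses start, arrays precharge"))
    t += t_precharge      # dummy OR row / AND column report precharge complete
    events.append((t, "Cpr and then Cin end"))
    t += t_input_buffer   # dummy input buffer discharges, Schmitt trigger fires
    events.append((t, "Cena starts: AND array decodes its inputs"))
    t += t_and_array      # dummy circuit in the AND array discharges
    events.append((t, "Ceno starts: results go to the output register"))
    t += t_or_array       # dummy circuit in the OR array discharges
    events.append((t, "ORRDY: data is ready"))
    return events


if __name__ == "__main__":
    for time, event in clock_sequence():
        print(f"{time:6.1f}  {event}")
```

Because each clock is derived from the completion of the previous stage, changing one delay in this sketch shifts every later event, which is the tracking behaviour the dummy circuits are meant to provide.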
4. Output Register--
As shown in Fig. 8(g), the output buffer contains 1 transistors with 
static pull-ups used in the register to provide easy data storage through 
amplified positive feedback. The advantage of precharging the output data bus 
lines is incorporated into the circuit design to save power and size in the 
output section. Because of the precharge from T17, T15 can be made a small D1 
(light depletion) device that only has to sustain the level. This also speeds up the 
discharge on the bus line if bout is a '0'. 
Even though the output section only contains a minimum number of 
transistors, the layout work to interface the OR array cell pitch to the output 
buffer is not an easy task. Techniques such as combining two buffers together, 
bringing outputs out from both ends of the OR array, or constructing the buffers 
at a distance and then connecting the two parts through spread-out N+, poly, or 
metal lines are left to the designer's choice for the best match between the 
NAND PLA and the system requirements. 
5. Latch of the output register--
For a dynamic PLA without any sustaining pull-up device used in the 
arrays, maintaining Cena = Ceno = '1' to keep ORRDY at '0' is needed against 
noise and coupling. Furthermore, isolating the OR array precharged outputs from 
their output buffers is also crucial for operation at low frequency, where 
leakage eventually will change a precharged '1' to a '0'. In this proposed NAND 
PLA, a simple but effective design is used, as shown in Fig. 7(b). The dummy 
circuit in the OR array generates a delayed "data is ready" signal, ORRDY, which 
follows the completion of the OR array transition by means of narrower device 
width transistors and/or a Schmitt trigger. The Cla clock is also controlled by 
a system latch signal, SYSLA, which happens only after the transition is over, 
see Fig. 7(b). 
TEST RESULTS 
Since most of the circuits proposed for the NAND PLA in this paper operate 
dynamically, power dissipation is kept to a minimum. Thus, the 
major concerns left for this approach are the speed-related device width effect, 
gate height effect, circuit operating range, and effect of precharge methods 
[3]. These measured data are considered to be useful as a reference point in 
related applications with this kind of circuit structure. 
FURTHER IMPROVEMENT 
SPEED: If the negative '0' level can be generated from an external VBB 
power supply, a Depletion-1/Depletion-2 pair can be used for 
transistor programming. A D1 device, even with the lowest 
carrier mobility among the four types of devices, does conduct 
current more strongly at the same gate voltage than the enhancement 
type; thus the D1/D2 pair would be faster in transition time than 
the regular E/D2 pair. 
DENSITY: Since this structure does not need metal lines in the array, 
this PLA's memory area is free for metal lines of other 
functions on the chip, as shown in Fig. 10. If this PLA 
section is properly located on the chip, further area savings 
can be achieved through sharing the memory area with wide power 
lines or limited number of data/control lines. The problem of 
poly lines going across the N+ lines can be handled by using 
depletion implant at the cross section. Properly clocked poly 
line, buried contact, and a few more contact points to the 
power lines will further improve the topology and electric 
conditions in this special application. 
Fig. 7(b) The circuit structure of the NAND PLA. 
Fig. 9 Input buffer and the generation of the self-timed clocks Cin and Cpr. 
Fig. 10 PLA memory area is free for metal lines. A depletion ion implant is 
used to allow poly and N+ lines to cross. 
ACKNOWLEDGEMENTS 
The author would like to thank those whose contribution and help made this work 
possible. 
L. Nguyen -- His request for development of an advanced VLSI chip. 
D. Morgan, Tuan H.T. (BURROUGHS) -- Their technical evaluation and encouragement. 
J. Zeh -- His vision and commitment on the project. 
J. Schneider -- His decision, funding and continuous support. 
T. Northrup, K. Slater, and F. Zereski -- Their continuous encouragement and support. 
Layout -- H. Nguyen, J. McHood, H. Riley and A. Sella 
Process -- L. Y. Wu 
P400 Program -- R. Spencer 
Test -- R. Saul and D. Cote 
Review and preparation of the manuscripts, figures and typing--H. Forsyth, J. 
Blake, A. Flohr, J. Giles, R. Ryan, and M. Burton. 
REFERENCES 
1. H. Kawagoe and Nobuhiro Tsuji, "Minimum size ROM structure compatible with 
silicon-gate E/D MOS LSI," IEEE J. Solid-State Circuits, Vol. SC-11, No. 3, 
pp. 360-364, June 1976. 
2. Y. Kitano, S. Kohda, H. Kikuchi and S. Sakai, "A 4Mb full wafer ROM," in 
ISSCC Dig. Tech. Papers, Feb. 1980, pp. 150-151. 
3. Chong M. Lin, "A 4 µm NMOS NAND Structure PLA," IEEE J. Solid-State Circuits, 
April 1981. 
4. Carver Mead, Lynn Conway and Charles L. Seitz, Introduction to VLSI 
Systems, Addison-Wesley Publishing Co., 1980, pp. 15-16, and Ch. 7. 
A MULTIPROJECT CHIP APPROACH TO THE TEACHING OF 
ANALOG MOS LSI AND VLSI 
Yannis P. Tsividis* and Dimitri A. Antoniadis 
Department of Electrical Engineering and Computer Science 
Massachusetts Institute of Technology 
Cambridge, MA 02139 
Abstract 
Multiproject chip implementation has been used in teaching analog MOS 
circuit design. After having worked with computer simulation and layout aids 
in homework problems, students designed novel circuits including several high 
performance op amps, an A/D converter, a switched capacitor filter, a 1 K 
dynamic RAM, and a variety of less conventional MOS circuits such as a V/I 
converter, an AC/DC converter, an AM radio receiver, a digitally-controlled 
analog signal processor, and on-chip circuitry for measuring transistor 
capacitances. These circuits were laid out as part of an NMOS multiproject 
chip. Several of the designs exhibit a considerable degree of innovation; 
fabrication pending, computer simulation shows that some may be pushing the 
state of the art. Several designs are of interest to digital designers; in 
fact, the course has provided knowledge and technique needed for detailed 
digital circuit design at the gate level. 
1. INTRODUCTION 
During the last few years the development of MOS IC design has advanced 
on two fronts. On one hand, improvements in fabrication have made possible the 
implementation of LSI and VLSI digital systems. On the other hand, the 
introduction of analog MOS circuit techniques has made possible single chip 
integration of high performance analog and analog/digital circuits (1) such 
as A/D and D/A converters, PCM encoders and decoders and a variety of other 
telecommunication systems, switched capacitor filters, microcomputers with 
analog interfaces, several special purpose signal processors, and high 
performance operational amplifiers. 
*On leave from the Department of Electrical Engineering, Columbia University , 
during the Fall of 1980. 
Courses and research projects in digital LSI and VLSI have been 
initiated in many universities. A particularly successful teaching and 
research vehicle is the incorporation of a number of diverse design projects 
on a single chip. This economical and speedy realization of new circuits and 
the associated design methodology have come to be called the "multiproject 
chip" approach (2,3). This approach has proven its value in the classroom by 
allowing students to be exposed to all aspects of integrated circuit design, 
layout and experimental evaluation of their projects. This paper describes 
the use of the multiproject chip approach in the teaching of a one-semester 
course on analog MOS circuit design, offered at MIT during the fall of 1980, 
with very encouraging results. The course has evolved from a similar course, 
taught over the last few years at Columbia University, which included the 
design project but not the multiproject chip implementation. 
2. COURSE OUTLINE 
The course to be described is at the senior/graduate level. The assumed 
background is a one year junior electronics sequence (in the first offering 
at MIT it happened that most students in the class were graduate). No 
background in MOS devices and circuits is assumed. A list of the major topics 
covered follows: 
Semiempirical MOS transistor model 
Fabrication and computer aided layout 
Basic circuit building blocks 
Computer aided circuit analysis 
Operational amplifiers 
Large signal consideration (transient response and distortion) 
Noise 
Voltage reference sources 
Comparators 
A/D and D/A converters, PCM encoders and decoders 
Switched capacitor filters 
Detailed device physics and higher order models 
The topics have been selected so as to give working knowledge for the 
design of high performance analog MOS circuits; in fact the design and layout 
of such circuits is a required project in the course. The particular order in 
which the topics are presented is chosen in order to allow an early start of 
the design project, which has to meet specific deadlines associated with the 
multiproject chip implementation. This makes it necessary to begin the course 
with the presentation of a semiempirical device model which has to be taken 
temporarily for granted; enough plausibility arguments are presented so that 
the model "makes sense", and students are promised a detailed derivation from 
physical principles in the last part of the course. Judging from student 
responses to a questionnaire, this does not cause problems. Both NMOS and 
CMOS circuits are covered. 
The standard square law equations, used for strong inversion in hand 
analysis of digital circuits, are often inadequate for analog design, 
especially when the substrate doping is relatively high. A more accurate, yet 
simple, set of equations (4) used in the course appears in Fig. 1 (this model 
is modified for short channel devices to include channel length modulation). 
[Figure 1: transistor symbol and model equations, with K = K'(W/L).] 
FIGURE 1 Semiempirical DC model for long channel devices. 
K' is a process dependent parameter and δ is a parameter which depends on 
substrate doping and substrate bias. However, the quantity 1+δ is only weakly 
dependent on substrate bias and to first order can be considered constant for 
a given device. Both K' and 1+δ are empirically determined by fitting 
experimental I-V curves. The parameters VT0, γ and φB in the threshold 
voltage expression are also empirically determined. A small signal equivalent 
circuit is derived from the above set of DC equations; when intrinsic device 
capacitances and the drain small signal conductance are added to it, the 
circuit of Fig. 2 is obtained (5,6,7). Junction and overlap capacitances are 
easily added to this model. The above models represent good compromises 
between accuracy and simplicity for hand analysis. Students are made aware of 
higher order models, but a detailed discussion of these is postponed until 
the last part of the course. 
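To make the roles of K', δ, VT0, γ and φB concrete, the following Python sketch evaluates a long channel model of this general kind. The specific expressions are an assumption (a commonly quoted form in which 1+δ scales the quadratic term, with the usual body-effect threshold expression), not a transcription of Fig. 1, and all default parameter values are hypothetical.

```python
import math

# Assumed model form (not taken from Fig. 1):
#   VT = VT0 + gamma * (sqrt(phiB + VSB) - sqrt(phiB))
#   ID = K * [(VGS - VT) * VDS - (1 + delta) / 2 * VDS**2]   (non-saturation)
#   ID = K * (VGS - VT)**2 / (2 * (1 + delta))                (saturation)
# with K = K' * (W / L). Default values below are placeholders.

def threshold(v_sb, v_t0=1.0, gamma=0.5, phi_b=0.6):
    """Threshold voltage including substrate bias (body effect)."""
    return v_t0 + gamma * (math.sqrt(phi_b + v_sb) - math.sqrt(phi_b))


def drain_current(v_gs, v_ds, v_sb=0.0, k_prime=25e-6, w_over_l=4.0,
                  delta=0.2, v_t0=1.0, gamma=0.5, phi_b=0.6):
    """Drain current in amperes for an n-channel device in strong inversion."""
    k = k_prime * w_over_l
    v_ov = v_gs - threshold(v_sb, v_t0, gamma, phi_b)
    if v_ov <= 0.0:
        return 0.0                      # cutoff (weak inversion ignored)
    v_dsat = v_ov / (1.0 + delta)       # saturation voltage
    if v_ds < v_dsat:
        return k * (v_ov * v_ds - 0.5 * (1.0 + delta) * v_ds ** 2)
    return k * v_ov ** 2 / (2.0 * (1.0 + delta))
```

Setting `delta=0` in this sketch recovers the standard square law equations, which illustrates why 1+δ can be treated as a single fitted constant for a given device.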
FIGURE 2 Small-signal model. 
Both computer aided layout and circuit analysis are taught in 
conjunction with homework problems, in which students are exposed early to the 
use of computer aids; in our case the programs AIDS (8) and SPICE (9) were 
used on a DEC-20/60 computer. 
Basic circuit building blocks are presented at a great level of detail. 
Bias, low frequency small signal operation and high frequency considerations 
are discussed for inverters (CMOS, depletion load NMOS and enhancement mode 
NMOS), source followers, differential stages, cascade stages, current sources 
and level shifters. The knowledge thus gained is used to discuss more complex 
circuit blocks. The students are at this point ready for a detailed exposure 
to various operational amplifier configurations; bias and small signal 
calculations for working operational amplifiers are treated at length. High 
frequency considerations and frequency compensation are emphasized. At about 
this point students start their project work; details are given in Section 3. 
The lectures continue with the topics of transient response, distortion 
and noise. The approach used for noise analysis is that of Ref. (10). Voltage 
reference sources, comparators, A/D and D/A converters, and PCM codecs are 
then discussed and examples of working designs from the literature are 
analyzed. The emphasis in the treatment of switched capacitor circuits is on 
basic principles. Exact analysis is taught using the intuitively appealing 
concept of charge conservation within closed surfaces not crossed by 
conductors (11). No matrix analysis is used. The students are cautioned 
against carelessly using resistive equivalents of switched capacitors and 
illustrative examples of misuse of such equivalents are presented. Although 
several working filter designs are discussed, not much time is devoted to the 
synthesis of such circuits as it is felt that this topic is better left to a 
course on network synthesis. 
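The caution about resistive equivalents can be illustrated numerically. The Python sketch below is a hypothetical example, not one of the examples used in the course: a capacitor C1 is charged to the input voltage and then connected across a holding capacitor C2 once per clock period. The exact response follows from charge conservation; the approximate one replaces the switched capacitor by a resistor R = T/C1.

```python
import math

def exact_step_response(c1, c2, n_cycles, v_in=1.0):
    """Voltage on C2 after each cycle, from charge conservation.

    Each cycle C1 is charged to v_in and then connected across C2,
    so the total charge c1*v_in + c2*v2 is shared by c1 + c2.
    """
    v2 = 0.0
    out = []
    for _ in range(n_cycles):
        v2 = (c2 * v2 + c1 * v_in) / (c1 + c2)
        out.append(v2)
    return out


def resistive_equivalent(c1, c2, n_cycles, v_in=1.0):
    """Continuous RC response with R = T/C1, sampled at t = n*T."""
    return [v_in * (1.0 - math.exp(-n * c1 / c2))
            for n in range(1, n_cycles + 1)]


if __name__ == "__main__":
    for ratio in (0.01, 1.0):
        exact = exact_step_response(ratio, 1.0, 1)[0]
        approx = resistive_equivalent(ratio, 1.0, 1)[0]
        print(f"C1/C2 = {ratio}: exact {exact:.4f}, resistive {approx:.4f}")
```

In this sketch the two results agree closely when C1 is much smaller than C2, but after one cycle with C1 = C2 the exact value is 0.5 while the resistive equivalent gives about 0.63, so the equivalent is only trustworthy when little charge moves per clock period.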
The final topic discussed in the course is that of MOS transistor 
physics and models. Potentials within the semiconductor instead of energy 
bands are used in such a way that rigor is not compromised. Both the 
semiempirical model discussed above, and more accurate models are derived 
from first principles. Small geometry and high order effects are discussed 
and general capacitance and charge models are introduced. 
In the MIT offering weekly homework assignments were given during the 
first part of the course; later these were gradually phased out to allow more 
time for work on the design project. A total of 8 assignments were given 
during the semester. Student performance was judged from the design project, 
homework and personal interaction with the instructor. No midterm or final 
examinations were given. Previous offerings of this course at Columbia 
University have included such examinations. 
3. THE DESIGN PROJECT 
Independently of whether it is finally realized on silicon, the design 
project provides the student with an opportunity to pull together what he has 
learned on circuit design. Realization of the circuit as part of the 
multiproject chip offers the additional opportunity to go through the 
remaining steps typical in an industrial environment, these being layout and 
final evaluation in the lab; it also serves as an important booster to 
student motivation. Because it is impossible to have the chips fabricated 
before the end of the semester, the evaluation in the lab cannot be a 
required part of the course; however, experience with other courses involving 
a multiproject chip has shown that the majority of the students return the 
following semester, on their own initiative, to evaluate their circuits. 
The implementation of the MIT multiproject chip is managed by MOSIS at 
the University of Southern California; the chip is to be fabricated by the 
Integrated Circuits Laboratory, Hewlett Packard, Inc., Palo Alto. An NMOS 
enhancement-depletion single-level polysilicon process is to be used, with 
nominal substrate doping of 6 x 10^14 cm^-3. The nominal threshold voltages at 
0 substrate bias are +1 V and -4 V for the enhancement and depletion 
transistors, respectively. The layout rules followed are those in reference 
(2), with λ = 2.5 µm. Minimum channel dimension for the projects was set at 
3λ, as opposed to 2λ used in digital projects, to avoid modeling inadequacies 
at short and narrow channels. This was necessary because of lack of 
appropriate test transistors for detailed characterization. 
Polysilicon-to-depletion implant capacitors are used as the above process 
does not permit the implementation of higher quality structures. More 
appropriate processes for the purposes of this course are double-level 
polysilicon NMOS or CMOS, or at least a modified single-level polysilicon 
process that would allow the fabrication of reasonable value high quality 
capacitors between metal and polysilicon; although of lower performance, 
metal-gate processes can also be used. 
After the first third of the semester students were asked to submit a 
brief proposal outlining the design project they intended to work on. A high 
performance operational amplifier was suggested as a possible project by the 
instructor, and a set of state-of-the-art specifications that had to be met 
or exceeded was given. Minimization of power consumption was emphasized as 
one of the most important design goals. Recently designed high-performance 
operational amplifiers in the industry were claimed to have a power 
dissipation of only 0.75 mW, so this was set as a specification to be 
bettered. The students were asked to work in groups of two or more in order 
to reduce their work load, facilitate supervision and avoid overloading the 
computer facilities. All student designs were simulated using the program 
SPICE. The students were supplied with model parameters, which were derived 
from the information we had on the process to be used. Unfortunately, no 
appropriate test devices were available for detailed characterization, so we 
had to use instead devices integrated using a related process and then 
extrapolate the results. A further complication arose from the fact that we 
had no previous experience with the model in the particular version of the 
program SPICE we used. However, every effort was made to use as reasonable a 
set of model parameters as possible, and it is hoped that simulation results 
are a good indication of what will be seen in the laboratory when the 
fabricated chips are received. The design projects undertaken by the various 
groups are listed below: 
Low power enhancement/depletion operational amplifier (5 groups) 
Low power enhancement-only operational amplifier (3 groups) 
High speed operational amplifier 
Autozeroing operational amplifier 
A/D converter 
AC/DC converter 
Voltage to current converter 
Switched capacitor filter 
Digitally programmable analog filter 
1 Kbit dynamic RAM 
On-chip circuitry for MOS transistor capacitance measurement 
AM radio receiver 
Some of the student designs will be briefly described below, and 
representative computer simulation results will be quoted. As can be seen in 
the list given above, the most popular design project was that of a low-power 
enhancement-depletion operational amplifier. Five groups have designed such 
circuits for operation from ±5 V power supplies, with power dissipation 
ranging from 0.4 mW to 1 mW. Low frequency gains are between 60 dB and 74 dB, 
and unity gain frequencies after frequency compensation are between 0.65 MHz 
and 3 MHz. The 1% settling times are between 0.7 µs and 4 µs for a 10 pF 
load. An example of a student design is shown in Fig. 3. Another design uses 
an architecture drastically different from that of any NMOS operational 
amplifier presented to date; the students who designed the circuit have asked 
us not to present it because they plan to apply for a patent. 
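As a rough aid for reading such figures, the Python sketch below converts a gain in dB to a voltage ratio and estimates the linear 1% settling time of a unity-gain follower from its unity gain frequency. The single-pole assumption and the function names are ours; slewing, load capacitance and additional poles, which the simulated designs include, are ignored, so the estimate should not be expected to reproduce the quoted settling times.

```python
import math

def db_to_ratio(gain_db):
    """Convert a voltage gain in dB to a plain ratio (V/V)."""
    return 10.0 ** (gain_db / 20.0)


def single_pole_settling_time(f_unity_hz, tolerance=0.01):
    """Linear settling time of a unity-gain follower, single-pole assumption.

    The closed-loop time constant is taken as 1 / (2*pi*f_unity), and the
    output is within `tolerance` of its final value after ln(1/tolerance)
    time constants.
    """
    tau = 1.0 / (2.0 * math.pi * f_unity_hz)
    return math.log(1.0 / tolerance) * tau


if __name__ == "__main__":
    for gain_db in (60.0, 74.0):
        print(f"{gain_db} dB = {db_to_ratio(gain_db):.0f} V/V")
    for f_u in (0.65e6, 3e6):
        t_s = single_pole_settling_time(f_u)
        print(f"f_u = {f_u / 1e6} MHz -> about {t_s * 1e6:.2f} us to 1%")
```

Under these assumptions, 60 dB and 74 dB correspond to gains of 1000 and roughly 5000, and a 3 MHz unity gain frequency gives a linear settling estimate of roughly 0.24 µs.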
FIGURE 3 A low-power enhancement/depletion operational amplifier (M. 
Elbuluk and J. Harrison). 
One group has decided to meet the challenge of using only enhancement 
devices in their operational amplifier; they have come up with the circuit of 
Fig. 4, and a performance certainly impressive for an all-enhancement design: 
a power dissipation of 0.93 mW, a low-frequency gain of 61 dB, a unity gain 
frequency of 420 kHz, and a 1% settling time of 2.3 µs with a 10 pF load. The 
circuit is more process-insensitive and has much lower distortion than most 
enhancement-depletion operational amplifiers. 
FIGURE 4 A low power enhancement-only operational amplifier (C. C. 
Cederberg and B. V. Karlsson). 
Three groups have designed high speed operational amplifiers. The 
simplest and fastest design is shown in Fig. 5; it compromises gain, which is 
only 40 dB, for speed. The unity gain frequency is 126 MHz, and the 1% 
settling time is only 24 ns with a 5 pF load charged through a series device. 
Power dissipation is 15 mW. 
FIGURE 5 A high speed, low gain operational amplifier (C. Christensen and 
W. Shiley). 
The concept of the autozeroing operational amplifier is shown in Fig. 6; the 
switches are implemented with MOS transistors. The bottom amplifier is used 
to null alternately its own offset and that of the top amplifier. 
This avoids the problems of commutated auto-zero designs, where at every 
commutation the signal output must slew from the offset value to the signal 
value. 
FIGURE 6 An autozeroing operational amplifier (M. Coln). 
The A/D converter project employs charge redistribution using three 
capacitors; the analog part of the design is shown in Fig. 7. It is expected 
that it will perform an 8-bit conversion in 27 µs. 
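The successive approximation search itself can be sketched in a few lines of Python. This is an idealized illustration of the algorithm only: it compares the input against an ideal reference fraction and does not model the three-capacitor charge redistribution circuit of the student design. The function name and the voltage values used are hypothetical.

```python
def sar_convert(v_in, v_ref, n_bits=8):
    """Ideal successive approximation conversion.

    Bits are decided from the most significant to the least significant:
    each trial bit is kept if the input is at least as large as the
    voltage the trial code represents.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if v_in >= trial * v_ref / (1 << n_bits):
            code = trial
    return code


if __name__ == "__main__":
    for v in (0.0, 1.0, 2.5, 4.99):
        print(f"{v} V -> code {sar_convert(v, 5.0)}")
```

An 8-bit conversion takes eight such decisions, so if the 27 µs conversion time were divided evenly it would leave a little under 3.4 µs per bit.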
FIGURE 7 An 8-bit, successive approximation A/D converter (S. McCormick 
and A. Garcia). 
The AC/DC converter design (Fig. 8) is aimed at instrumentation 
applications. It uses a zero-crossing detector which activates a switch 
allowing only the negative half-cycles to pass; a polysilicon resistor is 
used as part of the following filter. A switched capacitor amplifier is used 
to scale the DC output. 
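The DC level that such a scheme produces before scaling can be checked numerically. The Python sketch below is a hypothetical idealization (perfect switch, ideal averaging filter, sinusoidal input), not a model of the circuit of Fig. 8: it passes only the negative half-cycles of a sine wave and averages the result over one period, which for amplitude A should approach -A/π.

```python
import math

def half_wave_average(amplitude, samples=100000):
    """Average over one period of a sine wave with positive half-cycles blocked."""
    total = 0.0
    for i in range(samples):
        # Sample at interval midpoints so no sample falls on a zero crossing.
        v = amplitude * math.sin(2.0 * math.pi * (i + 0.5) / samples)
        if v < 0.0:          # ideal zero-crossing detector closes the switch
            total += v
    return total / samples


if __name__ == "__main__":
    print(half_wave_average(1.0), -1.0 / math.pi)
```

The fixed ratio between the filtered output and the input amplitude is what the following amplifier stage has to scale.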
FIGURE 8 An AC/DC converter (D. K. Oka and S. Fiedler). 
Another project aimed at instrumentation applications is the 
voltage-to-current converter of Fig. 9. One of the uses of this circuit is in 
developing a temperature-insensitive, supply-insensitive reference current 
from an existing reference voltage. The output current is produced from an 
internal reference current through a mirror circuit. Internal feedback 
circuitry adjusts the reference current until it charges a capacitor to a 
voltage equal to the voltage applied externally, in a specific amount of time 
determined by an external clock. 
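At equilibrium the principle described above implies I = C·V/T, since a constant current I charges a capacitor C to V in time T. The Python sketch below illustrates this with a simple discrete feedback loop; the loop gain, the component values and the function name are hypothetical and are not taken from the student design.

```python
def settle_reference_current(v_in, c, t_clk, gain=0.5, i_start=0.0, cycles=50):
    """Adjust a reference current until it charges C to v_in in one clock period.

    Each cycle the capacitor is charged from zero by i_ref for t_clk seconds,
    and the error between v_in and the voltage reached corrects i_ref.
    """
    i_ref = i_start
    for _ in range(cycles):
        v_cap = i_ref * t_clk / c                    # voltage reached this cycle
        i_ref += gain * (v_in - v_cap) * c / t_clk   # proportional correction
    return i_ref


if __name__ == "__main__":
    # Hypothetical values: 10 pF, 1 us clock period, 2 V input.
    i = settle_reference_current(2.0, 10e-12, 1e-6)
    print(f"settled current: {i * 1e6:.3f} uA (C*V/T = {10e-12 * 2.0 / 1e-6 * 1e6:.3f} uA)")
```

In this idealization the settled current depends only on the capacitor, the clock period and the applied voltage, which is consistent with the insensitivity to supply and temperature that the project aims for.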
FIGURE 9 A voltage-to-current converter (M. M. Colavita and F. L. Terry, 
Jr.). 
One project dealt with the design of the switched capacitor biquadratic 
notch filter shown in Fig. 10; the topology is that reported in (12). This 
design has been adjusted for minimum capacitance spread and minimum total 
capacitance. A very good operational amplifier was part of the design; in 
fact many of the "non-op amp" projects actually contained good operational 
amplifiers. 
FIGURE 10 A switched capacitor biquad notch filter (M. H. DeSmith and ~. W. 
Duehren). 
The digitally programmable analog filter project undertaken by C. W. 
Mangelsdorf and A. L. Robinson uses pulse width control to adjust the 
transfer function coefficients; the value of these coefficients depends only 
on timing and is independent of element values or even element value ratios 
for some configurations (13). The circuit is laid out in such a way that both 
transversal and recursive filters can be implemented. The design includes 
many sample-and-hold circuits, which share two operational amplifiers using 
the technique described in (14). 
A good example of the extensive analog design involved in realizing 
digital circuits at the gate level is a dynamic RAM; one of the projects was 
the design of a one-transistor-cell, 1 Kbit dynamic RAM including sense 
amplifiers, word line driver and column decoder. Its density is comparable to 
that of INTEL 2104. According to the students, one of the most challenging 
parts was the high voltage driver, shown in Fig. 11. 
FIGURE 11 High voltage driver of a 1 Kbit dynamic RAM (J. J. Cherry and G. 
L. Roylance). 
One of the most unconventional projects undertaken was an inductorless, 
one-chip AM radio receiver using superheterodyne circuitry by S. L. 
Garverick, T. E. Haferd, and R. B. Iverson. Both the RF input stage and the 
local oscillator are voltage controlled. The IF stage consists of a cascade 
of active non-switched filters. The detector uses a scheme similar to the one 
described above for the AC/DC converter project. AGC circuitry is included. 
Although, in the authors' opinion, it is unlikely that the radio circuit will 
work as a whole, parts of it probably will; given the short amount of time in 
which the project had to be completed, this will still be satisfactory. 
Problems expected include the generation of spurious components in the input 
circuitry, and component mismatches in the IF stage. 
In their layout work students were careful to conserve chip area; an 
example of an operational amplifier layout is shown in Fig. 12. 
FIGURE 12 Layout of an operational amplifier (D. Goddeau and M. Johnson). 
Student time spent on project work was a large part of the total time 
spent on the course; answers to a questionnaire distributed in class show 
that the average time spent for the course per week was 15 hours. We are 
looking for ways to decrease this time in future offerings, without 
subtracting significantly from the value of the course. 
4. CONCLUSIONS 
A one-semester course on analog MOS design has been taught several 
times at Columbia University and recently at MIT. The course covers topics 
from device physics and models to detailed circuit and layout, and one of its 
important parts is a state-of-the-art design project in which students put 
together what they have learned in the lectures. In the last offering of the 
course (MIT), the emphasis on the project was increased; student designs will 
be implemented as part of a multiproject chip. Computer simulation results 
show that many designs are of excellent quality and are innovative; some may 
be pushing the state-of-the-art. Several designs are of interest to digital 
designers; in fact, the course has provided knowledge and technique needed for 
detailed digital circuit design at the gate level. 
5. ACKNOWLEDGEMENTS 
The authors would like to thank the many people at MIT who have 
contributed to the success of the course; among them Paul Penfield, Jr. for 
useful discussions and for giving the lecture on layout aids, Steve Senturia 
for useful discussions, and Zahid Ansari, Mark Johnson, John Paulos and Irfan 
Rahim for characterization of devices. Thanks also go to the many students, 
both at Columbia and MIT, who have helped shape the course with their helpful 
comments and questions. 
REFERENCES 
1. See special issues on analog circuits, IEEE Journal of Solid-State 
Circuits, December of each year starting with 1975. 
2. C. Mead and L. Conway, "Introduction to VLSI Systems," Addison-Wesley, 
1980. 
3. L. Conway, "University Scene," Lambda Magazine, vol. 1, no. 3, pp. 
65-69, fourth quarter 1980. 
4. G. Merckel, J. Borel and N. Z. Cupcea, "An Accurate Large-Signal MOS 
Transistor Model for Use in Computer-Aided Design," IEEE Trans. on 
Electron Devices, ED-19, p. 681, 1972. 
5. Y. P. Tsividis, "Design considerations in single-channel MOS analog 
integrated circuits - A tutorial," IEEE J. Solid-State Circuits, vol. 
SC-13, pp. 383-391, June 1978. 
6. Y. P. Tsividis, "Relation between incremental intrinsic capacitances 
and transconductances in MOS transistors," IEEE Transactions on 
Electron Devices, vol. ED-27, pp. 946-948, May 1980. 
7. J. E. Meyer, "MOS models and circuit simulation," RCA Review, vol. 32, 
pp. 42-63, 1971. 
8. P. Penfield, Jr., "AIDS-79 User's Manual," Integrated Circuit Memo No. 
80-14, Department of Electrical Engineering and Computer Science, MIT. 
9. L. W. Nagel, "SPICE 2: A computer program to simulate semiconductor 
circuits," ERL Report No. ERL-M520, University of California, Berkeley, 
1975. 
10. J. C. Bertails, "Low-frequency noise considerations for MOS amplifier 
design," IEEE J. Solid-State Circuits, vol. SC-14, pp. 773-776, August 
1979. 
11. Y. P. Tsividis, "Analysis of switched capacitive networks," IEEE 
Transactions on Circuits and Systems, vol. CAS-26, pp. 935-947, 
November 1979. 
12. P. E. Fleischer and K. R. Laker, "A family of active switched capacitor 
biquad building blocks," Bell System Technical Journal, vol. 58, pp. 
2235-2269, December 1979. 
13. Y. P. Tsividis, "Method for signal processing with transfer function 
coefficients dependent only on timing," Electronics Letters, vol. 
16, pp. 796-798, 9th October 1980. 
14. L. Bienstman and H. J. DeMan, "An eight-channel 8-bit microprocessor 
compatible NMOS converter with programmable scaling," IEEE J. 
Solid-State Circuits, vol. SC-15, pp. 1051-1059, December 1980. 
DESIGN DISCIPLINES SESSION
Chairperson: MARTIN REM
Professor of Mathematics
Technical University
Eindhoven, The Netherlands
Visiting Professor of Computer Science
Caltech
Martin Rem
DESIGN DISCIPLINES SESSION
The management of complexity is among the most important problems
associated with the design of computing structures, and with the design
of VLSI structures in particular.
The aim of establishing good design disciplines is the raising of
our confidence in the products we design. Such disciplines will of
necessity involve an abstraction from the physical properties of the
underlying VLSI medium. They involve the construction of mathematical
models in which we design our computations and that allow us to prove
properties of our computations. The intellectual mastering of VLSI
design requires the cooperation of mathematical methods, circuit design,
and programming methodology. In this session we get an impression of
what it is that is making this meeting ground so exciting.
There are two aspects any mathematical abstraction in VLSI design
must never neglect:
(1) The notation in which we express our designs, our
"programming language," should have a rigorous definition.
Only then will we be able to consider correctness proofs and
formal verification. In order not to be overwhelmed by the
inherent complexity of the VLSI medium the notation should
support (or: force) the hierarchical structuring of designs.
The first two papers address this topic. The first gives a
mathematical treatment of VLSI arrays. The second paper
discusses a hierarchical notation for computations that are to
be realized as CMOS circuits.
(2) The mathematical model, the universe in which we express
our designs and that we use to argue about our designs, should
be based on underlying physical properties. The other four
papers emphasize this aspect. The third paper of the session
addresses the mappability of topological structures into the
planar VLSI medium. The next two discuss the topic of signal
propagation delays. In the session's fourth paper it is shown
that under certain conditions the delay time may be assumed to
be a logarithmic function of the wire length. When these
conditions are not met one has to resort to linear delay
times. A complexity model under the linear time assumption is
the subject of the fifth paper. The last paper of the session
discusses a variant of switching theory that is more
appropriate to VLSI than the traditional theory.
Towards a Formal Treatment of VLSI Arrays
Lennart Johnsson
Computer Science
California Institute of Technology
Pasadena, CA 91125
and
Information Sciences Institute
Uri Weiser
Computer Science
University of Utah
Salt Lake City, UT 84112
Danny Cohen
Information Sciences Institute
Marina Del Rey, CA 90291
and
California Institute of Technology
Alan L. Davis
Computer Science
University of Utah
Salt Lake City, UT 84112
Abstract
This paper presents a formalism for describing the behavior of computational networks at the 
algorithmic level. It establishes a direct correspondence between the mathematical expressions 
defining a function and the computational networks which compute that function. By formally 
manipulating the symbolic expressions that define a function, it is possible to obtain different 
networks that compute the function. From this mathematical description of a network, one can 
directly determine certain important characteristics of computational networks, such as 
computational rate, performance and communication requirements. The use of this formalism for 
design and verification is demonstrated on computational networks for Finite Impulse Response (FIR) 
filters, matrix operations, and the Discrete Fourier Transform (DFT).
The progression of computations can often be modeled by wave fronts in an illuminating way. The 
formalism supports this model. A computational network can be viewed in an abstract form that can 
be represented as a graph. The duality between the graph representation and the mathematical 
expressions is briefly introduced. 
Introduction 
This paper addresses the problem of formally describing the behavior of computational networks at 
the algorithmic level. The focus is on the correspondence between equations defining a certain 
computation and networks performing that computation. In an equation there may be no concept of 
time. An equation expresses how a variable is related to a set of constants, other variables, and 
possibly itself. Time is, however, intrinsically associated with any computation performed by a 
physical device (Mead and Conway [10]). 
VLSI technology promises to offer substantial computational power. With submicron technology, on 
the order of a million to ten million transistors can be placed on a single chip. The complexity of 
designing such a chip is orders of magnitude greater than that typical today. The need for proper 
abstractions at all levels of design is apparent. These abstractions have to be consistent so that a 
higher level description can be expanded in a hierarchical manner to levels where a direct mapping to 
silicon will generate circuitry that performs the function of a high-level description. One hierarchical 
approach, Rowson [12], not only brings the complexity of VLSI circuit design within reach of humans 
but also enhances correctness. The approach is also a good basis for advanced design tools such as 
silicon compilers (Ayres [1] and Johannsen [5]). Several silicon compilers exist today. The first
silicon compiler, based on a special class of floorplans, has been followed by less restricted 
compilers. Currently available compilers accept an input at the register-transfer level. 
Chen and Mead [2] have proposed a notation for designing concurrent systems. Their notation 
supports a hierarchical approach towards system design and enhances proof of liveness and 
safeness of concurrent systems. In mapping a behavioral description to silicon, it is also necessary to
ensure that the constructs used have a correspondence in circuits with an electrical behavior
matching the description. Rem and Mead [11] have proposed a notation and composition rules that
ensure a correct correspondence between a syntactically correct behavioral description and the
behavior of circuitry that can be generated from the description.
Early in the design of a computational network, a decision must be made about how the data required 
by the computation will enter the network. In a real-time signal-processing environment, a variable is
typically observed (sampled) at discrete times, often at a constant frequency. In such applications it is 
natural to assume that the order of the data input to the network is the same as the order of the 
generation of the data. If the input data resides in a random access memory, then there are several 
ways for the data to be organized without requiring additional hardware or adversely affecting 
performance. The organization of the input data is, however, of prime importance for the size, 
structure, and performance of the computational networks that can be designed to compute the 
desired function. 
For simplicity, the derivations of the networks presented here are assumed to be synchronous. The 
results derived apply also to self-timed design (Seitz [13]). In the computation of most functions, not
all computations can be performed concurrently. Sequence requirements implied by the algorithms 
used to compute a function have to be satisfied in order to obtain a correct result. It will be seen that 
the computational networks described here are composed of a collection of functional modules that 
take a certain collection of inputs and produce a collection of outputs. We define the notion of a time 
step to be the time it takes for a module of a network to compute its results from its inputs. In this 
manner, a time step can be viewed as the time quantum separating sequential sets of inputs to 
modules and outputs from modules. 
The formalism we use to establish a correspondence between a mathematical expression and 
computational networks follows Cohen [3] by modeling a storage element (e.g., a flip-flop) with an
operator that can be interpreted as a delay when acting on a series of conceptually sequenced data. 
A data sequence can be viewed such that successive elements of the sequence are separated by a 
single time step. In a self-timed system, the delay may vary in terms of absolute time, but the 
sequential order is always preserved. The operator is well defined mathematically and can be readily 
introduced in an expression to be evaluated by a network once the input data has been ordered in 
time. Hence, the mathematical expression can be transformed in a straightforward manner into a 
form that maps directly to a computational network. There are a number of mathematical expressions 
that are all functionally equivalent but whose direct hardware interpretation results in different 
networks. These equivalent forms can be obtained by formal manipulation of the equations. 
Correctness is assured since these transformations are function preserving. 
Using this formalism it is possible to determine the essential properties of the networks directly from 
the equations defining them. Computational rate, performance, delay, modularity, and module count
are all easily determined from the equations. The interconnection scheme and communication 
characteristics can also readily be found from the equations for networks with a high degree of 
regularity. We will show how this approach can be used to derive and characterize computational 
networks for Finite Impulse Response (FIR) filters, matrix-vector products, and the Discrete Fourier 
Transform (DFT). The details of these networks may be found in Cohen [3], Cohen and Tyree [4],
Weiser and Davis [14] and Johnsson and Cohen [6] and [7]. Weiser and Davis [14] have also used
this approach to treat networks for the multiplication of band matrices and the solution of triangular 
linear equations. 
The mathematical approach pursued in this paper may also be used for verification. The modules 
used for the examples in this paper contain additions, multiplications, and delays. Their function can 
be described as a transfer function in the form of an operator that can be represented by a matrix. 
Since all of these networks are linear, the compositions of modules into arrays correspond in the 
functional domain to multiplication of matrices. 
This mathematical approach is demonstrated in this paper only for one-dimensional arrays; however,
this is not a limitation of the approach. See Weiser and Davis [14] for the application of this approach
to two-dimensional arrays.
The progression of a computation can be modeled by the concept of a wave front. Wave fronts are an 
intuitively appealing way to illustrate how the computations proceed. Wave fronts can be defined 
either graphically, in terms of the networks, or mathematically, in terms of equations (see Johnsson
and Cohen [6,7], and Weiser and Davis [14]). The use of wave fronts in the mathematical domain has
the additional utility of simplifying the notational complexity. S. Y. Kung [9] has also used wave fronts
in an informal way to describe the progression of computations in orthogonal arrays.
Notation 
Space and time are fundamental characteristics of computational networks and their behavior. 
Computations may be distributed in time, in space, or both. 
Let X= {x(k)} be a sequence of variables. The index k is associated with time such that x(k) precedes 
x(k + 1) by one time step. We refer to such a sequence as a data stream. 
Define the operator $Z$ by $Zx(k) = x(k-1)$ and define $Z^j = ZZ^{j-1}$. Then $Z^j x(k) = x(k-j)$.
The elements of the sequence X may be observed in two fundamentally different ways. The first way
is to view the elements over time as they pass by a particular point, which is fixed in space. The 
second way is to take a "snapshot" of the network, i.e., to view the elements of X as they are spread 
out in space at an instant of time. Such a snapshot is shown in Figure 1. 
Hence, the operator Z may be considered as a delay in time when applied to a data stream at a certain 
position, or as a "shift over space" when considered for an instant of time. 
Figure 1 shows the effect of $Z^5$ operating on a data stream.
[Figure: a chain of five Z elements; successive taps carry x(k), x(k-1), x(k-2), x(k-3), x(k-4), x(k-5).]
Figure 1: $Z^5$ operating on a data stream
From the definition of Z and Figure 1 it is natural to interpret Z as a delay when acting on a data
stream.
Define $Z^{-1}$, the inverse of the Z operator, by $Z^{-1}x(k) = x(k+1)$.
$Z^{-1}$ can be interpreted as a prediction when operating on data streams. It has the following property:
$ZZ^{-1} = Z^{-1}Z = Z^0 = I$, where I is the identity operator.
The operator Z is commutative with respect to time-independent functions. However, when 
commuting the Z operator and the function, the Z operator must be distributed over the entire
operand set of the function, e.g.,
ZF(X,Y) = F(ZX,ZY).
A graphical representation of this commutative-distributive property is shown in Figure 2.
Figure 2: ZF(X,Y) = F(ZX,ZY)
Constants do not change over time and the Z operator does not affect constants. Therefore, 
Z(CX) = (ZC)(ZX) = C(ZX).
Hence, as operators, Z and C commute: (ZC)X = (CZ)X.
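The properties of Z stated above can be checked mechanically. The following is a minimal Python sketch, not part of the original formalism: it assumes that a data stream is modeled as a function from the time index k to a value, and the names `Z`, `lift`, `x`, `y`, `F` and `C` are illustrative only.

```python
# Illustrative sketch (assumption: a data stream is a function k -> value).

def Z(stream, j=1):
    """Return the stream delayed by j steps: (Z^j x)(k) = x(k - j).
    A negative j gives the inverse ("prediction") operator."""
    return lambda k: stream(k - j)

def lift(F, *streams):
    """Apply a time-independent function F pointwise to one or more streams."""
    return lambda k: F(*(s(k) for s in streams))

# Hypothetical example streams, function, and constant.
x = lambda k: k * k
y = lambda k: 3 * k + 1
F = lambda u, v: u + 2 * v
C = 7

for k in range(-3, 10):
    assert Z(x, 5)(k) == x(k - 5)                          # Z^j x(k) = x(k-j)
    assert Z(Z(x, 1), -1)(k) == x(k)                       # Z^-1 Z = I
    assert Z(lift(F, x, y))(k) == lift(F, Z(x), Z(y))(k)   # ZF(X,Y) = F(ZX,ZY)
    assert Z(lift(lambda u: C * u, x))(k) == C * Z(x)(k)   # Z(CX) = C(ZX)
```

Modeling streams as functions avoids boundary effects at the ends of a finite list, so the identity $Z^{-1}Z = I$ holds exactly in the sketch.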
Equations in which the Z operator is used to express sequencing can be given a direct interpretation 
in terms of computational networks. Different expressions correspond to different networks. In a few 
examples we will show not only the correspondence between expressions using the Z operator and 
computational networks but also how some properties of the computational network (such as 
modularity, computational rate, performance, and fault propagation) can be determined directly from
the expressions. 
Finite Impulse Response Filters 
A Finite Impulse Response (FIR) filter can be defined as
$$y(k) = \sum_{i=0}^{N-1} a(i)\,x(k-i) \qquad (1)$$
where X is the input signal to the filter and a(i), i = 0,1,2,...,N-1 are the filter coefficients. The output of
the filter is Y. The indices of X and Y are naturally associated with time in a real-time environment.
The N multiplications required to compute each value of Y can be carried out in any order or 
concurrently, because equation (1) does not prescribe any order of evaluation. There exist several 
hardware realizations of equation (1). They may differ with respect to computational rate,
performance, amount of hardware, reliability, etc. 
For the FIR filter it is natural to assume that the data arrives sequentially and in order of increasing 
indices. Using the Z operator we will now discuss a few implementation alternatives. 
Introducing the Z operator into equation (1) gives
$$y(k) = \sum_{i=0}^{N-1} a(i)\,Z^i x(k) \qquad (2)$$
A direct hardware interpretation of equation (2) would contain N(N-1)/2 delays, N multipliers, and N-1
adders. Once the products have been computed, which can be done concurrently, the N terms have to be
added. If the N-1 additions are concurrent then the computational rate is limited by the carry
propagation.
However, equation (2) can be rewritten as 
$$y(k) = \bigl((\cdots((a(N-1)Z + a(N-2))Z + a(N-3))Z + \cdots)Z + a(0)\bigr)\,x(k) \qquad (3)$$
From equation (3) it is obvious that N-1 delays suffice to implement the filter defined by equation (1).
Equation (3) naturally corresponds to a linear array of modules, as indicated in Figure 3. The
computational rate of an implementation corresponding to equation (3) is, however, still limited by 
N-1 concurrent additions. 
Figure 3: The implementation of the FIR filter
The coefficients a(i), i = 0,1,2,...,N-1, and the Z operator commute because the coefficients are
constants. Note that the index k is associated with time and that the index i is associated with space
("module number"). Therefore, Z operates on the variable k, not on i. Using the property that a(i) and
Z commute, equation (2) can be rewritten as
$$y(k) = \bigl(a(0) + Z(a(1) + Z(\cdots(a(N-2) + Z(a(N-1)))\cdots))\bigr)\,x(k)$$
or
$$Y = \Bigl(\sum_{i=0}^{N-1} Z^i a(i)\Bigr) X \qquad (4)$$
Equation (4) corresponds to an implementation that has as many components as the implementation
of equation (3), but the modules and the array have different properties. X is broadcast to all
modules. For large values of N, broadcasting ("fanout") is undesirable. Broadcasting implies long
wires and large drivers; long wires are likely to be slow and may limit the computational rate. An
implementation corresponding to equation (4) is not, however, limited by N-1 concurrent additions, as
is the implementation of equation (3). In implementations corresponding to equation (4), the
summation is performed in a pipelined fashion. Figure 4 shows modules and a linear array
corresponding to equation (4).
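The difference between the arrays of equations (3) and (4) can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation: all registers start at zero, x(k) = 0 for k < 0, and the function names are hypothetical. `fir_tapped_delay_line` keeps N-1 delays on the x path and sums all N products in one step, as in equation (3); `fir_broadcast_pipelined` broadcasts x(k) and keeps N-1 delays on the partial sums, as in equation (4).

```python
def fir_direct(a, x):
    """Equation (1), with x(k) taken as 0 for k < 0."""
    N = len(a)
    return [sum(a[i] * x[k - i] for i in range(N) if k - i >= 0)
            for k in range(len(x))]

def fir_tapped_delay_line(a, x):
    """Equation (3): N-1 delays on the input stream; all N products are
    added within a single time step."""
    N = len(a)
    line = [0] * (N - 1)              # line[j] holds x(k-1-j)
    out = []
    for xk in x:
        taps = [xk] + line
        out.append(sum(ai * t for ai, t in zip(a, taps)))
        line = ([xk] + line)[:N - 1]  # shift the delay line by one step
    return out

def fir_broadcast_pipelined(a, x):
    """Equation (4): x(k) is broadcast to every module; each module adds
    one product to a delayed partial sum (one addition per step)."""
    N = len(a)
    regs = [0] * (N - 1)              # regs[j] holds the delayed partial sum
    out = []
    for xk in x:
        out.append(a[0] * xk + (regs[0] if regs else 0))
        regs = [a[j + 1] * xk + (regs[j + 1] if j + 1 < N - 1 else 0)
                for j in range(N - 1)]
    return out

a = [1, 2, 3]
x = [1, 0, 0, 2, 5]
print(fir_direct(a, x))
print(fir_tapped_delay_line(a, x) == fir_broadcast_pipelined(a, x))
```

Both arrays use the same number of delays and multipliers; only the placement of the delays differs, which is what moves the critical path from a chain of N-1 additions to a single addition per module.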
Figure 4: The implementation of the FIR filter
If the value of N is large enough for broadcasting to be a problem, then the computations of the FIR 
filter can be grouped so that broadcasting will not be a problem in each group. The groups are then 
connected via delays. The output of the FIR filter will be delayed a number of steps one less than the
number of groups. The broadcasting may also be made in a tree-like manner. One possible solution 
to this fanout problem is shown in Figure 5. 
[Figure: two groups of modules; x(k) is broadcast within the first group and passes through a Z element to supply x(k-1) to the second group; the output is y(k-1).]
Figure 5: The implementation of the FIR filter
In the implementations discussed above, it has been assumed that N is small enough to allow for all 
modules required by the equations as given to be implemented. The input consisted of one data 
stream. 
Matrix-Vector Multiplication
The product Y of a matrix A by a vector X is defined by
$$y(m) = \sum_{i=1}^{N} a(m,i)\,x(i) \quad\text{for } m = 1,2,\ldots,M \qquad (5)$$
Hence, each y(m) is the inner product of the mth row of A, {a(m,·)}, and the vector X.
The evaluation of each of these inner products is similar to the evaluation of the FIR filter shown
above, with the differences that here there is a set of {a(i,·)} associated with the ith unit, not just one
a(i) as before, and that the notation here starts at i = 1 whereas it starts at i = 0 in the FIR filter
computation.
M·N multiplications are required for the evaluation of equation (5). In the following discussion they
are distributed into M modules (distribution in space) each performing N multiplications (distribution
in time).
Obviously, these M inner products are not independent because they share the same input vector, X.
We first introduce another implementation scheme for inner products; then we show several different 
ways to interconnect them in order to achieve the matrix-vector multiplication. 
A straightforward use of the arrays discussed above for the FIR filter to compute a matrix-vector
product would require one array for each component of the product, i.e., O(N·M) modules. Since this
quantity may be prohibitively high, we use another approach that uses only M modules. This
reduction in the number of modules is obviously reflected in the rate at which the output is computed,
as seen below.
We follow approach B from Cohen and Tyree [4]. We pursue an implementation that is organized as
M modules, each corresponding to a certain y(m). The matrix coefficients are given one column at a
time such that each row, {a(m,·)}, is a data stream of coefficients given to the unit corresponding to
that y(m). The vector X is also given as such a data stream to all the units.
With this organization, the operation of the operator Z on the data is
$$Zx(k) = x(k-1) \quad\text{and}\quad Za(m,k) = a(m,k-1) \qquad (6)$$
Note that the above is a property of the data organization and not of the Z operator.
Define the partial sums involved in the computation of the products:
$$Y(m,k) = \sum_{i=1}^{k} a(m,i)\,x(i) \quad\text{for } k = 1,2,3,\ldots,N$$
Obviously, y(m) = Y(m,N). Also, Y(m,0) = 0.
The partial sums are recursively computed by Y(m,k) = Y(m,k-1) + a(m,k)x(k).
Hence, Y(m,k) = ZY(m,k) + a(m,k)x(k), which can be written as (1-Z)Y(m,k) = a(m,k)x(k).
Multiply both sides by $(1-Z)^{-1}$ and get
$$Y(m,k) = (1-Z)^{-1}[a(m,k)x(k)] = \sum_{i=0}^{\infty} Z^i[a(m,k)x(k)] = \sum_{i=0}^{\infty} a(m,k-i)\,x(k-i) \qquad (7)$$
A module for the implementation of equation (7) is shown in Figure 6.
However, it is apparent that a module corresponding to equation (7) has an infinite "response". The
boundary conditions of the problem imply a need to bound this infinite response. This is done by
using a modulo N counter to provide a reset mechanism, as in Cohen and Tyree [4].
With this additional control, designed for repetitive operation, the module can be redefined as
$$Y(m,k) = \begin{cases} ZY(m,k) + a(m,k)x(k) & \text{for } k \not\equiv 1 \pmod{N} \\ 0 + a(m,k)x(k) & \text{for } k \equiv 1 \pmod{N} \end{cases} \qquad (8)$$
A module for the implementation of equation (8) is shown in Figure 7. 
[Figure: module with inputs a(m,k) and x(k) and output Y(m,k).]
Figure 6: Infinite response module
[Figure: module with inputs a(m,k) and x(k) and output Y(m,k).]
Figure 7: A finite response module
The reset occurs at the same time that the partial sum, Y(m,k), is equal to y(m). At this time y(m) is
output, and the computation of the next y(m) begins. 
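The behavior of the finite response module of equation (8) can be sketched in a few lines of Python. This is an illustrative model only: the class and function names are hypothetical, the step counter k starts at 1 as in the text, and M such modules are driven in lockstep with the same x(k), so that the matrix is presented one column per step and repeated for each new input vector.

```python
class InnerProductModule:
    """Accumulator with a modulo-N reset, following equation (8)."""

    def __init__(self, N):
        self.N = N
        self.k = 0      # step counter; the first step is k = 1
        self.acc = 0    # holds the partial sum Y(m,k)

    def step(self, a, x):
        self.k += 1
        if self.k % self.N == 1 % self.N:   # k = 1 (mod N): restart the sum
            self.acc = 0 + a * x
        else:                               # otherwise accumulate
            self.acc = self.acc + a * x
        done = self.k % self.N == 0         # partial sum now equals y(m)
        return self.acc, done

def synchronized_matvec(A, x_stream):
    """Drive M modules in lockstep; A is presented column by column and
    reused for every block of N input values."""
    M, N = len(A), len(A[0])
    modules = [InnerProductModule(N) for _ in range(M)]
    results = []
    for k, xk in enumerate(x_stream):
        col = k % N
        outs = [mod.step(A[m][col], xk) for m, mod in enumerate(modules)]
        if outs[0][1]:                      # all modules finish together
            results.append([value for value, _ in outs])
    return results

A = [[1, 2], [3, 4], [5, 6]]
print(synchronized_matvec(A, [1, 1, 2, 0]))   # two products back to back
```

Feeding two input vectors in succession shows why the reset is needed: without it the second product would be added on top of the first.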
Note that here there is a need for control signaling, which was not needed in the FIR case. This is 
because of the distribution of the FIR computation in space, where the "size" of the inner product (N) 
is determined by the actual size of the array, which implicitly defines the value of N. In the latter case 
this computation is distributed in time, which necessitates the use of control signals for defining the 
value of N. 
The discussion above focused on the single module for inner products. There are several ways for 
using M such devices in the design of a network for matrix-vector multiplication.
One way is to synchronize M such devices in parallel and to supply the same x(k) to all of them at the
same time through any of many broadcasting techniques.¹ In such an arrangement a single reset
control signal is broadcast to every unit, and all the values of {y(m), m = 1,...,M} are available at the
same step. Such a network is shown in Figure 8.
[Figure: three modules in parallel with coefficient inputs a(1,k), a(2,k), a(3,k), a common input x(k), and outputs y(1,k), y(2,k), y(3,k).]
Figure 8: Synchronized matrix-vector multiplication
Another possible arrangement is based on the following relation:
$$Z^m y(m) = \sum_{k=1}^{N} Z^m[a(m,k)x(k)] = \sum_{k=1}^{N} [Z^m a(m,k)][Z^m x(k)] = \sum_{k=1}^{N} a'(m,k)[Z^m x(k)],$$
where a'(m,k) = a(m,k-m), namely the same sequence "shifted" in time by m steps.
¹ It is assumed that the broadcasting delivers the same value to all of its recipients "at the same time".
This network has one delay in X between successive modules. It also changes the "phase" of each 
unit, such that the mth unit has to be reset when k = m (mod N), unlike the condition k = 0 (mod N) as 
above. Hence, each unit is reset at a different time (rather than all of them at once) and the elements 
of Y are available sequentially. In this implementation the input stream is continuous, one x(k) per 
cycle, as is the output stream, one y(m) per cycle. 
The network for implementing this arrangement is shown in Figure 9. 
[Figure: three modules in a row with coefficient inputs a'(1,k), a'(2,k), a'(3,k); x(k) passes from module to module; the reset condition shown is k ≡ 0 mod N.]
Figure 9: Pipelined matrix-vector multiplication
According to Figure 9, an array of M modules is capable of concurrently computing sequences of 
matrix-vector products of indefinite length, N. 
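The skewed timing of the pipelined arrangement can be made concrete with a short simulation. The sketch below is an illustration under stated assumptions rather than the network of Figure 9 itself: indices are 0-based, so module m sees x(k) at time k + m and resets when its local index k is 0, and the function name and the `(time, module, value)` output format are hypothetical.

```python
def pipelined_matvec(A, x):
    """Simulate M accumulating modules with one delay in the x stream
    between successive modules. Returns (time, module, value) triples."""
    M, N = len(A), len(x)
    x_regs = [None] * M       # x_regs[m]: value of x seen by module m now
    acc = [0] * M
    outputs = []
    for t in range(N + M - 1):
        incoming = x[t] if t < N else None
        x_regs = [incoming] + x_regs[:-1]   # shift x by one module per step
        for m in range(M):
            if x_regs[m] is None:           # nothing has reached this module
                continue
            k = t - m                       # local index: x_regs[m] == x[k]
            if k == 0:
                acc[m] = 0                  # each module resets at its own phase
            acc[m] += A[m][k] * x_regs[m]   # coefficient stream shifted by m steps
            if k == N - 1:
                outputs.append((t, m, acc[m]))
    return outputs

A = [[1, 2], [3, 4], [5, 6]]
for t, m, value in pipelined_matvec(A, [1, 1]):
    print(f"time {t}: y({m}) = {value}")
```

The results leave the array one per time step, which is the sequential output stream described in the text.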
Another related problem is the multiplication of a band matrix by a vector. A matrix A is defined to be 
a band matrix of width r + s + 1 if a(i,j) = 0 for all j > i + r and for all i > j + s.
A FIR filter can be considered as a multiplication of a band matrix by a vector. The filter coefficients 
are the diagonals of the matrix. 
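The correspondence between a FIR filter and a band matrix can be checked numerically. The following Python sketch is illustrative only: it assumes 0-based indices and zero input before time 0, and the helper names are hypothetical. The matrix built from N coefficients has r = 0 and s = N-1, so its band width r + s + 1 equals N.

```python
def fir_band_matrix(a, length):
    """Square matrix T with T[k][j] = a(k-j) for 0 <= k-j < N, else 0.
    Each filter coefficient fills one diagonal."""
    N = len(a)
    return [[a[k - j] if 0 <= k - j < N else 0 for j in range(length)]
            for k in range(length)]

def matvec(A, x):
    return [sum(r * c for r, c in zip(row, x)) for row in A]

def is_band(A, r, s):
    """True if a(i,j) = 0 for all j > i + r and for all i > j + s."""
    return all(A[i][j] == 0
               for i in range(len(A)) for j in range(len(A[0]))
               if j > i + r or i > j + s)

a = [1, 2, 3]
x = [1, 0, 0, 2, 5]
T = fir_band_matrix(a, len(x))
print(matvec(T, x))            # same values as the FIR output y(0..4)
print(is_band(T, r=0, s=2))
```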
The structure of a band matrix suggests that the elements of the matrix are entered along its 
diagonals such that each diagonal is entered as a separate data stream.
In this arrangement the effect of Z on the matrix elements is Za(m,k) = a(m-1,k-1).
The difference between this expression and equation (6) is due to the different order of the input data.
The concept of a wave front is useful because it simplifies the mathematical notation and it can be 
used as a conceptual tool in understanding how data progresses through a computational network. 
Wave fronts in computational networks are analogous to wave fronts in fluid dynamics. They are of 
particular importance in investigating laminar flow but are less useful in the study of turbulent flows. 
Elements of a sequence of data values ordered in time can be viewed as a data stream. Elements of 
the same data stream are separated by a single time step and typically follow the same path through a 
computational network. 
A wave front in fluid dynamics consists of a set of points in space which changes with time according
to the propagation of the wave. Similarly, a wave front in a computational network consists of 
elements from different data streams. If we view a computational network abstractly as computing a 
result based on input operands, then it is possible to associate a set of data streams with a particular 
input operand. In the previous example of matrix-vector multiplication, the matrix operand consisted 
of a set of data streams, where each data stream corresponded to a row of the matrix. Using the wave 
front concept it is possible to change the view of an operand from a set of data streams to a set of 
wave fronts. These wave fronts are essentially a series of parallel cross sections of the set of data 
streams. More precisely, such a wave front contains exactly one element from each data stream. We 
are particularly interested in wave front sequences that contain all of the data elements present in the 
set of data streams comprising the operand. 
In the implementation of the matrix-vector multiplication, as shown in Figure 8, all of the 
multiplications corresponding to a column are performed concurrently, and the matrix elements enter 
the array column by column. The input data set applied at time k may be defined as a wave front 
WF(k). Applying the Z operator to every data stream results in the wave front WF(k-1).
In general, ZWF(k) = WF(k-1). By this definition a wave front corresponds to a column of the matrix
A. Figure 10 shows these wave fronts.
However, in the implementation shown in Figure 9, the wave fronts are skewed because of the relative
delay between successive modules. These wave fronts are shown in Figure 11.
Figure 10: Vertical wave fronts
Figure 11: Skewed wave fronts
It is possible to perform several types of transformations on wave fronts. For example, it is possible to 
transform a wave front representing a row in a band matrix to a wave front representing a column in 
that matrix, and vice versa. This is done by applying $Z^i$ to each element of the initial wave front, where
the value of i corresponds to the data element position in the initial wave front. This is, in effect, a
rotation of the initial wave front that results from applying the variable delay to its elements.
The Discrete Fourier Transform 
The Discrete Fourier Transform (DFT) is defined by
$$y(k) = \sum_{m=0}^{N-1} w^{mk}\,x(m) \quad\text{for } k = 0,1,2,\ldots,N-1$$
where
$$w = e^{-2\pi i/N}$$
The DFT can be considered as a special case of a matrix-vector product. The approach 
corresponding to equation (8) is directly applicable. The elements in a row or a column can be easily 
generated because the ratio between consecutive elements is a constant. The computational
networks we derive here exploit the fact that the ratio between consecutive elements in a column is a
constant.
Define $Y(m,k) = w^{mk}x(m)$ such that each Y(m,·) is a data stream. Therefore, ZY(m,k) = Y(m,k-1).
The sequence {Y(m,k), m = 0,1,2,...,N-1} may be generated by
$$Y(m,k) = w^m\,Y(m,k-1)$$
and Y(m,0) = x(m).
Obviously,
$$y(k) = \sum_{m=0}^{N-1} Y(m,k) \quad\text{for } k = 0,1,2,\ldots,N-1. \qquad (9)$$
A module generating the variables Y(m,·) is shown in Figure 12.
[Figure: module with input x(m), load condition k ≡ 0 (mod N), and output Y(m,k).]
Figure 12: The Y(m,·) module
An array corresponding to equation (9) suffers from the need to perform N-1 additions in one step. 
The modules in the array are initiated with different values, {x(m)}, which become available at 
successive steps. The input values therefore have to be stored for the initialization of the modules. 
The implementation corresponding to equation (9) can be improved by using pipelined addition. 
Let the {Y(m,•)} modules be interconnected into an array in a pipelined manner as shown in Figure 4. 
Let s(k) be the output of this array at time k; then
$$s(k) = Y(N-1,k) + Z(Y(N-2,k) + Z(\cdots Z(Y(0,k))\cdots)). \qquad (10)$$
The modules are initialized so that
$$Y(m,k) = \begin{cases} x(k) & \text{for } k \equiv m \pmod{N},\ m = 0,1,2,\ldots,N-1 \\ w^m\,Y(m,k-1) & \text{otherwise.} \end{cases} \qquad (11)$$
Equation (11) expresses a sampling mechanism. The values of the data stream X are multiplexed into 
the N modules in a cyclic manner. The output S is well defined from time N-1 and on. To study the 
output S we rewrite equation (10) as
$$s(k) = Y(N-1,k) + Y(N-2,k-1) + Y(N-3,k-2) + \cdots + Y(0,k-(N-1)).$$
Hence
$$\begin{aligned}
s(N-1) &= x(N-1) + x(N-2) + \cdots + x(0) = y(0) \\
s(N) &= w^{N-1}x(N-1) + w^{N-2}x(N-2) + \cdots + w^{0}x(0) = y(1) \\
&\;\;\vdots \\
s(2(N-1)) &= w^{(N-1)(N-1)}x(N-1) + w^{(N-2)(N-1)}x(N-2) + \cdots + w^{0(N-1)}x(0) = y(N-1) \\
s(2N-1) &= x(2N-1) + x(2N-2) + \cdots + x(N) = y(N)
\end{aligned}$$
An array corresponding to equations (10) and (11) is shown in Figure 13.
0 t----=~ : : 1---s(k) 
......................................... -.......................................................................................................... .. ............ .. 
Figure 1 3: OFT array of si:ze 4 
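The behavior described by equations (10) and (11) can be simulated step by step. The sketch below is only an illustration under stated assumptions: w = exp(-2πi/N), one register per module for Y(m,k), and one register per module for the pipelined partial sum. The names `dft_pipelined_array` and `dft_direct` are hypothetical.

```python
import cmath

def dft_direct(x):
    """Reference DFT: y(k) = sum over m of w^(mk) * x(m)."""
    N = len(x)
    w = cmath.exp(-2j * cmath.pi / N)  # assumed primitive N-th root of unity
    return [sum(w ** (m * k) * x[m] for m in range(N)) for k in range(N)]

def dft_pipelined_array(stream, N):
    """Simulate N modules with cyclic sampling (11) and pipelined addition (10)."""
    w = cmath.exp(-2j * cmath.pi / N)
    Y = [0j] * N        # Y[m] holds Y(m, k-1) when step k begins
    partial = [0j] * N  # partial[m] holds the adder output of module m from step k-1
    out = []
    for k, xk in enumerate(stream):
        # Equation (11): module m samples x(k) when k = m (mod N), else multiplies by w^m.
        new_Y = [xk if k % N == m else (w ** m) * Y[m] for m in range(N)]
        # Equation (10): each adder combines its own Y with the delayed sum of its neighbor.
        new_partial = [new_Y[m] + (partial[m - 1] if m > 0 else 0j) for m in range(N)]
        Y, partial = new_Y, new_partial
        out.append(partial[N - 1])   # s(k), meaningful from k = N-1 onward
    return out
```

Feeding two blocks of N samples followed by N-1 padding values lets both transforms drain from the array: outputs N-1 through 2N-2 hold the first DFT and outputs 2N-1 through 3N-2 hold the second.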
The set of multiplications being performed during a step of the computations lies on lines parallel to
the diagonal from the lower left-hand corner to the upper right-hand corner of the matrix defining the
DFT. These lines can be defined as wave fronts. The computations start at the upper left-hand corner
and proceed downward and to the right. The wave fronts are illustrated in Figure 14. The multiple
wave fronts shown in Figure 14 correspond to the case where a sequence of DFTs is computed by
the array. There is one wave front for each DFT. Hence, computations belonging to two DFTs are
typically performed concurrently.
Figure 14: Wave fronts for DFT
Multiplexing the elements of the stream into a set of N modules can also be used in the general 
matrix-vector multiplication case. The matrix elements cannot in general be generated within a 
module but have to be supplied to the modules. Each module should be supplied with the matrix 
elements in a column in row order . The data stream associated with a column should be delayed one 
time step with respect to the stream associated with the preceding column. The loop around the 
multiplier is replaced with a storage element into which the elements of the stream X are multiplexed. 
The output of the multipliers should be added in a pipelined manner. Figure 15 illustrates the data 
organization schematically with the matrix rows indicated by dashed lines. Wave fronts can be 
associated with the set of matrix elements that enters the array at any given time. The wave fronts 
defined in this way are indicated by solid lines in Figure 15 and correspond to the diagonals in 
Figure 14. 
The implementation corresponding to equations (10) and (11) contains O(N) modules and computes
the DFT in O(N) time. Following the FFT logic, it is possible to reduce the number of modules to
$O(\log_2 N)$ by further exploiting the properties of the coefficients.
Figure 15: Wave fronts for DFT, v2
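We assume $N = 2^n$.
The coefficient $\{w^{jk}\}$ has the property
$$w^{(j+N/2)k} = \begin{cases} w^{jk}, & k \text{ even}, \\ -w^{jk}, & k \text{ odd}, \end{cases} \qquad j = 0,1,2,\ldots,N/2-1.$$
Hence,
$$y(2j) = \sum_{k=0}^{N/2-1} w^{2jk}\bigl(x(k) + x(k+\tfrac{N}{2})\bigr) \quad \text{for } j = 0,1,2,\ldots,\tfrac{N}{2}-1$$
and
$$y(2j+1) = \sum_{k=0}^{N/2-1} w^{k(2j+1)}\bigl(x(k) - x(k+\tfrac{N}{2})\bigr) \quad \text{for } j = 0,1,2,\ldots,\tfrac{N}{2}-1$$
or
$$y(2j+1) = \sum_{k=0}^{N/2-1} w^{2jk}\Bigl(w^{k}\bigl(x(k) - x(k+\tfrac{N}{2})\bigr)\Bigr) \quad \text{for } j = 0,1,2,\ldots,\tfrac{N}{2}-1.$$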
Thus, the even and odd components of the DFT can be obtained as matrix-vector products by using
the same N/2 by N/2 matrix operating on different vectors.
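One level of this splitting can be written out directly. The sketch below assumes w = exp(-2πi/N) and an even N; the helper names are illustrative only. It builds the two half-length vectors and applies the same N/2 by N/2 matrix, with entries w^(2jk), to both.

```python
import cmath

def dft_direct(x):
    """Reference DFT: y(k) = sum over m of w^(mk) * x(m)."""
    N = len(x)
    w = cmath.exp(-2j * cmath.pi / N)  # assumed primitive N-th root of unity
    return [sum(w ** (m * k) * x[m] for m in range(N)) for k in range(N)]

def dft_split_once(x):
    """Compute the DFT from two half-size matrix-vector products (N must be even)."""
    N = len(x)
    half = N // 2
    w = cmath.exp(-2j * cmath.pi / N)
    sums = [x[k] + x[k + half] for k in range(half)]              # feeds y(2j)
    diffs = [w ** k * (x[k] - x[k + half]) for k in range(half)]  # feeds y(2j+1)

    def half_matrix_times(v):
        # The shared N/2 by N/2 matrix has entries w^(2jk).
        return [sum(w ** (2 * j * k) * v[k] for k in range(half)) for j in range(half)]

    y = [0j] * N
    y[0::2] = half_matrix_times(sums)
    y[1::2] = half_matrix_times(diffs)
    return y
```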
Define a new data stream V obtained from the stream X as follows:
Let
$$v(k) = (1 + Z^{-N/2})\,x(k) \quad \text{for } k = 0,1,2,\ldots,N/2-1$$
and
$$v(k) = w^{k}(1 - Z^{N/2})\,x(k) \quad \text{for } k = N/2, N/2+1, \ldots, N-1.$$
The former definition requires a negative power of Z, a prediction that cannot be implemented in
general. However, by multiplying both sides of this definition by $Z^{N/2}$ no prediction is needed. This
means that instead of computing v(k), the network actually computes $Z^{N/2}v(k)$, which is the desired
value delayed by N/2 steps. Since the first half of {v(k)} has to be delayed by N/2 steps we also delay
the second half by the same amount.
A module computing the sequence V is shown in Figure 16.
Figure 16: A modified butterfly module
Then
$$y(2j) = \sum_{k=0}^{N/2-1} w^{2jk}\,v(k) \quad \text{for } j = 0,1,2,\ldots,\tfrac{N}{2}-1$$
$$y(2j+1) = \sum_{k=N/2}^{N-1} w^{2jk}\,v(k) \quad \text{for } j = 0,1,2,\ldots,\tfrac{N}{2}-1$$
or in matrix form
$$\begin{pmatrix} y(0) \\ y(2) \\ \vdots \\ y(N-2) \\ y(1) \\ y(3) \\ \vdots \\ y(N-1) \end{pmatrix}
= \begin{pmatrix} W & 0 \\ 0 & W \end{pmatrix}
\begin{pmatrix} v(0) \\ v(1) \\ \vdots \\ v(N/2) \\ \vdots \\ v(N-1) \end{pmatrix},
\qquad
W = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & w^{2} & \cdots & w^{N-2} \\ \vdots & \vdots & & \vdots \\ 1 & w^{N-2} & \cdots & w^{(N-2)(N/2-1)} \end{pmatrix}.$$
The original matrix has been transformed into a block diagonal matrix with identical diagonal blocks.
Obviously, the observation used to obtain equation (12) can be applied to each of the diagonal blocks
recursively. Eventually, the block diagonal matrix becomes a diagonal matrix, and the computation of
the DFT is completed. Figure 17 shows an array of $\log_2 N$ modules computing the DFT.
The input has to be in normal order. The output appears in bit-reversed order. There is no
broadcasting, and no module has to contain any long wires. The computational rate is limited by
$\log_2 N$ adders in series. The computational rate can be improved by introducing delays in a
straightforward manner. The first component of the DFT will appear at the output of the array at the
time the last data item, x(N-1), arrives, and the last component will appear at the output N-1 steps
later.
Please note that this implementation performs FFT by using decimation in frequency. 
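The recursive application of the splitting step, with input in normal order and output in bit-reversed order, can be sketched in software as an in-place radix-2 decimation-in-frequency FFT. This is a sequential illustration, not a model of the hardware array of Figure 17; it assumes w = exp(-2πi/N), and the function names are hypothetical. Each pass of the outer loop plays the role of one of the log2 N stages.

```python
import cmath

def dft_direct(x):
    """Reference DFT: y(k) = sum over m of w^(mk) * x(m)."""
    N = len(x)
    w = cmath.exp(-2j * cmath.pi / N)  # assumed primitive N-th root of unity
    return [sum(w ** (m * k) * x[m] for m in range(N)) for k in range(N)]

def fft_dif_bit_reversed(x):
    """Radix-2 decimation-in-frequency FFT; the result is left in bit-reversed order."""
    a = [complex(v) for v in x]
    N = len(a)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    span = N
    while span >= 2:                       # one pass per stage, log2(N) passes in total
        half = span // 2
        w = cmath.exp(-2j * cmath.pi / span)
        for start in range(0, N, span):
            for k in range(half):
                p, q = a[start + k], a[start + k + half]
                a[start + k] = p + q                    # sum branch
                a[start + k + half] = (p - q) * w ** k  # difference branch with twiddle
        span = half
    return a

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r
```

Position i of the result holds the DFT component whose index is the bit reversal of i, so a consumer that needs normal order has to apply `bit_reverse` when reading.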
Figure 17: An FFT array
In the FIR filter, the matrix-vector multiplication cases, and the DFT network of N modules, the data
streams can be considered as laminar. The wave fronts can be considered as corresponding to points
of constant phase on waves. The network considered for the computation of the DFT by $\log_2 N$
modules does not preserve the laminar flow. The concept of wave fronts is not attractive in this case,
just as it is not in the case of turbulent flows.
Conclusions 
The simple model of storage used in this paper is given a precise mathematical meaning. Once the
organization of the input data is determined, the Z operator (i.e., the mathematical model of a storage
element) can be used to model the ordering of input data streams directly. The mathematical equation
that defines a function to be computed can be transformed from a form that contains no concept of
time to a form that contains information about time as well as space. This new form is typically an
expression containing the Z operator.
Equations containing information about the time and space required to compute a function can be 
manipulated formally, since the properties of the Z operator are well defined. It is possible to give 
expressions containing the Z operator a direct hardware interpretation. Properties such as 
computational rate, performance, delay, modularity, communication structure, and fault tolerance 
can be determined directly from an expression using the Z operator. 
The methodology suggested in this paper is useful in synthesis as well as analysis and verification of 
computational networks. It is possible to iterate between formal manipulations of expressions in the Z 
operator and graphs describing computational networks. Modeling the behavior of a network using 
the Z operator to describe a storage element makes it conceptually straightforward to verify whether 
or not the network actually computes the function it is supposed to compute. 
A formal approach as outlined in this paper is particularly useful in complex problems where designs 
based entirely on intuition may be incorrect or may have a performance lower than necessary for a 
given amount of hardware. For instance, using the methodology suggested here, Weiser and Davis
[14] have discovered designs of systolic arrays with two to three times higher performance than the
corresponding arrays by H. T. Kung and Leiserson [8]. Furthermore, the designs are proven to be
correct.
Explicit control can be modeled within the formalism . The expressions become more complex, but the 
formalism allows for the treatment of space-time tradeoffs. If at the first iteration of the design cycle 
the spatial requirements of a computational network for a function are too large, the hardware 
requirements can be reduced by mapping the computations to a network of reasonable size. In doing 
so, it is necessary to model the control explicitly. Eventually the control will become fairly complex, 
but it can still be included in the formalism. 
The formalism allows for a precise definition of wave fronts. The concept of wave fronts is useful both 
in designing computational networks and in finding suitable organizations of the input data. Wave 
fronts are particularly useful in problems where the data flow can be considered laminar. The 
formalism proposed is not, however, limited to networks with laminar flow, as the example describing
arrays for the FFT shows.
Acknowledgments 
The authors gratefully acknowledge the support for this research provided generously by the Defense 
Advanced Research Projects Agency, under contracts MDA-80-C-0523 with the USC/Information 
Sciences Institute and N00014-79-C-0597 with the California Institute of Technology, and by the 
Burroughs Corporation for the Data Driven Research Project at the University of Utah. 
Views and conclusions contained in this paper are the authors' and should not be interpreted as 
representing the official opinion or policy of DARPA, the U.S. Government, any person or agency 
connected with them, nor of the Burroughs Corporation. 
References 
1. Ayres, R., "IC design language," Proceedings of the 16th Design Automation Conference, June 25-27,
1979, pp. 307-309. IEEE No. 79CH1427-4, Library of Congress Card No. 76-150348.
2. Chen, M., and C. Mead, "A notation for designing concurrent systems," Internal Document (3927),
Caltech, Computer Science Department, August 1980.
3. Cohen, D., "Mathematical approach to iterative computational networks," Proceedings of the
Fourth Symposium on Computer Arithmetic, pp. 226-238, October 1978, also published as
USC/Information Sciences Institute RR-78-73, November 1978.
4. Cohen, D., and V. C. Tyree, "VLSI system for Synthetic Aperture Radar (SAR) processing,"
Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE), Vol. 186,
pp. 166-177, 1979.
5. Johannsen, D., "Bristle Blocks: A silicon compiler," Proceedings of the Caltech Conference on
VLSI, January 1979.
6. Johnsson, L., and D. Cohen, "Computational arrays for the Discrete Fourier Transform,"
COMPCON, February 1981.
7. Johnsson, L., and D. Cohen, A Mathematical Approach to Computational Networks for the Discrete
Fourier Transform, USC/Information Sciences Institute, RR-81-90, 1981 (forthcoming).
8. Kung, H. T., and C. E. Leiserson, "Algorithms for VLSI processor arrays," Section 8.3 in [10].
9. Kung, S. Y., "VLSI matrix computation array processor," The MIT Conference on Advanced
Research in Integrated Circuits, February 1980.
10. Mead, C., and L. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980.
11. Rem, M., and C. Mead, "A notation for designing restoring logic circuitry in CMOS," Second
Caltech Conference on VLSI, January 1981.
12. Rowson, J., Understanding Hierarchical Design, Caltech, Computer Science Department, Report
3710, April 1980.
13. Seitz, C., "System timing," Chapter 7 in [10].
14. Weiser, U., and A. Davis, Mathematical Representation for VLSI Arrays, University of Utah,
Computer Science Department, Report UUCS-80-111, September 1980.
A NOTATION FOR DESIGNING RESTORING LOGIC CIRCUITRY IN CMOS 
Martin Rem 
Eindhoven University of Technology 
and California Institute of Technology 
and 
Carver Mead
Professor of Computer Science, Electrical Engineering
and Applied Physics
California Institute of Technology
1. INTRODUCTION
As the underlying silicon fabrication technology has become 
capable of producing chips with transistor counts in excess of 
1,000,000, problems associated with correct design are assuming ever 
greater importance. Exhaustive checking of mask artwork for errors 
becomes prohibitive. Technologies and design styles which obviate large
classes of potential errors are enormously preferable to those that do 
not. 
A modular, hierarchical design style can, with proper
restriction, confine many types of checks to one level of the hierarchy 
within each module. A set of such restrictions is given in this paper, 
together with a mechanism for their enforcement. These restrictions 
capture a substantial fraction of the design style given in [1] . 
As feature sizes are scaled below one micron, ratio logic
processes like nMOS and I²L become progressively less attractive.
Straightforward scaling to smaller sizes results in a linear increase in
current per unit chip area. Technological tricks such as high
resistivity polysilicon pullup devices or very small injector current
can be used to decrease current drain, but the resulting devices become
increasingly vulnerable to "soft error" problems from alpha particles,
etc. Fully restored "static" logic using a complementary process is the
natural choice for systems with submicron components. Present bulk CMOS
processes have a number of very ugly analog rules associated with the
4-layer nature of the process. As a result, the designer must be aware
of details of the technology to an alarming degree. CMOS on an
insulating substrate is, on the other hand, a conceptually clean
process: it requires no analog rules whatsoever if proper timing
conventions are observed. There are recent signs that it may become
reliably producible as well.
We introduce a programming notation in which every syntactically 
correct program specifies a restoring logic component, i.e., a component
whose outputs are permanently connected, via "not too many" transistors,
to the power supply. It is shown how the specified components can be
translated into transistor diagrams for CMOS integrated circuits. As
these components are designed as strict hierarchies, it is hoped that 
the translation of the transistor diagrams into layouts for integrated 
circuits can be accomplished mechanically. 
In this paper we do not address the dynamic behavior of the 
logic components . The "proper timing conventions," alluded to above, are 
left for a subsequent paper. 
2. SWITCHES IN CMOS
The CMOS technology uses two types of transistors: the N-channel
enhancement transistor (1a) and the P-channel enhancement transistor (1b).
Fig. 1 (transistors 1a and 1b, each with its gate terminal)
Both of them act as switches but they are "on" and "off" for complementary
values on their gates. Denoting a high voltage by "1" and a low
voltage by "0", switch 1a is on if the gate is 1 and 1b is on if the
gate is 0. When the switches are on, however, they do not convey a 1
and a 0 on their paths (in Fig. 1 the horizontal connections) equally
well. Switch 1a conveys a 0 virtually perfectly, but it is not a
perfect switch for a 1. Switch 1b, conversely, is a good conveyor for a
1 only.
Using these CMOS transistors we want to make two types of
switches, a "normally-off" switch (2a) and a "normally-on" switch (2b).
Fig. 2 (switches 2a and 2b, each with end points e1 and e2 and a gate input)
If the gate is 0 switch 2a is off (nonconveying) and 2b is on
(conveying). Otherwise 2a is on and 2b is off. The points e1 and e2
are called the end points of the switch. We call the connection between
the end points its path. If nothing is known about the values conveyed
through its path, except that they are 0's and 1's, the realization of a
switch requires two transistors (the complement of g is denoted as g'):
Fig. 3 (the two switches of Fig. 2 and their two-transistor realizations)
These double transistors make our switches good conveyors for
both 0's and 1's, which allows the use of longer strings of switches.
These strings of switches, however, should not be too long: the distance
to the "power supply" must not be excessive, otherwise the signal will
become inaccurate and the circuit slow. To do justice to the nature of
restoring logic we disallow the driving of external outputs by long
strings of switches. This shall be reflected in the composition rules
to be formulated in Section 3.
The gate inputs are run in two-rail logic to accommodate both
the g and the g' signals. For switches that are known to convey always
the same value there are two instances in which they can be realized by
just one transistor:
Fig. 4 (two switches, each conveying a fixed value, and their one-transistor realizations)
In that case, the two-rail representation of the gate signal is not
necessary. It is assumed that the compiler can recognize instances in
which one transistor suffices. From now on we shall simply design in
terms of switches and apply the above knowledge only if we wish to count
the number of transistors a component requires.
3. RESTORING LOGIC COMPONENTS
A restoring logic component (RL) has external ports. The purpose
of an RL is to establish a relation between the values it communicates
via its external ports. We restrict ourselves to the values 0 and 1.
We design components in a hierarchical fashion. A typical RL is
shown in Fig. 5.
Fig. 5
It consists of subcomponents A, B, and C, which are also RL's,
and a pattern of connections between them. We restrict the possible
connection patterns to guarantee that the composite is again an RL.
Such restrictions are only useful if they can be formulated in terms of
the connection pattern, i.e., independent of the internal structures of
the subcomponents thus connected. Before we can formulate these
connection rules we have to give a few definitions. Each port is either
an input port or an output port. The connection pattern of an RL
specifies connections between its external ports and the external ports
of the subRL's. We call the external ports of a subRL internal ports of
the RL. An external output port of a subRL is an internal input port of
the RL. Conversely, every external input port of a subRL gives the RL an
internal output port. The rules on connection patterns will be stated
in terms of external and internal ports of the RL.
We assume that the distribution of power and ground to all
components is taken care of by the compiler. Johannsen [1] has outlined
a method for the distribution of power and ground over hierarchically
defined components. In our nomenclature: each RL has two constant
internal input ports, denoted by 0 and 1. These constants are the power
supply rails which must be present in every component.
In Section 2 we have introduced the term path for the connection
between the two end points of a switch. We now generalize that term.
We say that there is a path between two ports p1 and p2 if either they
are connected by a wire ("wire path") or there is a switch such that
there are paths between p1 and one end point of the switch and between
p2 and the other end point. In the latter case we say that the switch
is on the path. A path is called a conveying path if all switches on
the path are on. The values on the input ports (external or internal)
determine which switches are on and which are off, and hence between
which ports there are conveying paths. (Whenever we do not specify
whether a port is external or internal, that is done intentionally.)
Two input ports are said to be fighting if there exists any
assignment of values to all input ports such that there is a conveying
path between the two input ports.
We introduce three rules the connection pattern must satisfy:
Rule 1. [no fighting]: No two input ports are fighting.
Rule 2. [restored external outputs]: Every external output port
    (a) has a wire path to an internal port, or
    (b) has a conveying path to 0 or 1 for every assignment
        of values to all input ports.
Rule 3. [nonfloating internal outputs]: For every internal
    output port p and for every assignment of values to all
    input ports there is a conveying path between p and an
    input port.
Notice that Rule 1 includes 0 and 1 (the two constant internal input 
ports) . Remember that internal outputs are regarded as (external) inputs 
of the subcomponent and that the subcomponent's external outputs are 
internal inputs for the component . 
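The three rules can be checked by brute force on small connection patterns. The sketch below is one possible reading of the definitions, not the paper's method: it assumes every switch gate is driven directly by input ports (so a switch state is a function of the input assignment), treats the constants 0 and 1 as internal input ports, and enumerates all assignments. The names `components` and `check_rules` and the data layout are hypothetical.

```python
from itertools import product

def components(nodes, edges):
    """Map each node to a representative of its connected component (union-find)."""
    parent = {n: n for n in nodes}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    return {n: find(n) for n in nodes}

def check_rules(ext_in, ext_out, int_in, int_out, wires, switches):
    """Return the sorted list of violated rules; switches are (is_on, end1, end2)."""
    variables = list(ext_in) + list(int_in)   # freely assignable input ports
    inputs = variables + ["0", "1"]           # plus the two constant input ports
    internal = set(int_in) | set(int_out) | {"0", "1"}
    nodes = set(inputs) | set(ext_out) | set(int_out)
    for a, b in wires:
        nodes |= {a, b}
    for _, a, b in switches:
        nodes |= {a, b}
    wire_comp = components(nodes, wires)
    # Rule 2(a): external outputs that reach an internal port through wires only.
    wired = {p for p in ext_out
             if any(wire_comp[p] == wire_comp[q] for q in internal)}
    restored = {p: True for p in ext_out}
    violations = set()
    for values in product((0, 1), repeat=len(variables)):
        v = dict(zip(variables, values))
        on = [(a, b) for is_on, a, b in switches if is_on(v)]
        comp = components(nodes, list(wires) + on)   # conveying connectivity
        roots = [comp[p] for p in inputs]
        if len(set(roots)) < len(roots):             # two inputs share a conveying path
            violations.add("Rule 1")
        for p in ext_out:                            # Rule 2(b) must hold for every assignment
            if comp[p] not in (comp["0"], comp["1"]):
                restored[p] = False
        if any(comp[p] not in set(roots) for p in int_out):
            violations.add("Rule 3")
    if any(p not in wired and not restored[p] for p in ext_out):
        violations.add("Rule 2")
    return sorted(violations)
```

For example, a component whose output is tied to 1 through a switch that is on when its input is 0, and to 0 through a switch that is on when its input is 1, yields an empty list; dropping the first of these switches leaves the output floating for one assignment and reports "Rule 2".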
The justification of Rule 1 is obvious. The result of Rule 2 is
that all external outputs are driven by power or ground. They may be
driven via a number of switches, but such a string of switches is
confined to one component, viz. the component in which the actual
connection to 0 or 1 is made.
The rules for internal outputs, i.e., outputs to subcomponents,
are more liberal. We allow that inputs from subcomponents and inputs
from the environment are directed through switches before they are
output to subcomponents. For inputs from subcomponents this is
reasonable: they are restored by the subcomponents. With inputs from
the environment we have to be more careful. We have to allow that such
a signal from an external input port goes through a switch to an
internal output port. Otherwise we would be unable to make the flip-flop
to be shown in Example 3. But it does allow long strings of switches
"going into" the hierarchy, as sketched in Fig. 6.
We do not consider this a serious drawback. One may expect a subcomponent
to have (physically) shorter connections than the component
itself. Restoring in the "inward" direction, therefore, seems less
vital than in the "outward" direction. Still, if we wish to bound the
lengths of such inward strings of switches we could have the compiler
insert amplifiers into them to restore their signals.
Fig. 6 (the box symbol stands for a connection via one or more switches)
The consequence of allowing the switches in the outputs to subcomponents
is that Rule 2 has to be stronger than one might expect. In
Rule 2 we could not allow wire paths between external input ports and
external output ports. This may seem to disallow running through a
component a wire whose signals are not used by the component. In fact, it
does not. Such a wire is just not part of the component. (On the chip
a wire between two components may run through the "area" of another
component, but that is a matter of chip layout. It is a physical
property, not a functional one.) Allowing wire paths between external
input ports and external output ports would have given rise to the
possibility of ill-restored outputs. Fig. 7 sketches an RL that is
allowed by Rules 2 and 3. Now assume that each Si is just a wire path
from its input to its output, which would be allowed if we weakened Rule
2. The output of the RL is then not restored. Imagine now that each Si
actually has the same structure as the whole RL. It is clear that this
would violate our goal of having restored external outputs.
In one respect Rule 3 is stronger than necessary. It requires
that all subcomponents receive well-defined inputs, even a subcomponent
whose outputs are not used. We could have restricted the rule to
subcomponents whose outputs are actually used in the computation, but
that would have made both the rule and the checking whether it is obeyed
more complicated.
Fig. 7
4. THE PROGRAMMING NOTATION
In this section we introduce a programming notation in which
connection patterns can be specified that satisfy the three rules of the
preceding section. There are two properties a good notation should
enjoy. First, it should be relatively simple for the compiler to check
that a program is syntactically correct. If this mechanical check is
simple, it will probably be simple for programmers to convince
themselves that their designs satisfy the rules. We shall show how the
syntactic checking can be performed. Second, it should be possible to
give a formal definition of the semantics of our programs. We have not
yet achieved the second goal, but ultimately we must be able to prove
that a component performs a certain computation. That seems a much
better technique than a demonstration of its effect with an a posteriori
simulation. (Besides, how do we know that the simulation is correct if
we do not have a rigorous definition of the meaning of our statements?)
It will not be simple, but remember: a program of more than, say, 20
lines is probably too long; we then have not chosen the right
subcomponents.
For the formulation of connection patterns we introduce the term
node. Every port is a node, but the program may introduce additional
(interior) nodes. For each node n we shall introduce a connection
condition C(n) and a connected-to-constant condition CC(n). We shall,
furthermore, distinguish a directly driven set D, which is a subset of
the set of nodes. These concepts will be used in the syntax checking.
A formal definition of how they depend on the connection pattern
specified will be given later. Intuitively, C(n) will be the condition
on the input values under which node n is connected to an input, and
CC(n) will be the condition under which it is connected to a constant.
The C(n)'s will be used to enforce the no-fighting rule. The set D will
comprise all nodes that are connected by a wire path to an internal
input port.
The program consists of a sequence of statements. Each statement
introduces a number of connections and switches between nodes, and
thereby affects the C(n) and CC(n) of each node involved and the set D.
Initially, i.e., prior to the first statement, D is the set of all
internal input ports, C(n) is 1 for each input port and CC(n) is 1 for
the two constant internal input ports. The C(n) and CC(n) are 0 for all
other nodes. ("1" should be interpreted as "true" and "0" as "false.")
The program is complete if finally we have:
    for every external output port p:   p ∈ D ∨ CC(p) = 1
    for every internal output port p:   C(p) = 1
(These completeness conditions correspond to Rules 2 and 3. The observing
of Rule 1 is discussed below.)
EXAMPLE 1.
    comp inverter(in?, out!):
    begin in' → out = 1; in → out = 0 end
The above is a simple example of an RL; it does not have
subRL's. The first line specifies the name of the component and its
external ports. A question mark or an exclamation point indicates that
the port is an input port or an output port, respectively. In the
connection pattern two switches are specified, textually separated by a
semicolon. The first statement expresses that the output port out is
connected to the constant input port 1. The condition in front of the
arrow specifies under which circumstances the switch in the connection
should be on. In this case a normally-on switch whose gate is connected
to the input port in (or a normally-off switch with its gate connected
to in') is specified. The second statement specifies the second switch.
For the more pictorially inclined reader we observe the resemblance
of the program and the following diagram.
Fig. 8 (labels: in, out, 0)
Why is the program syntactically correct? In order to be able to show
that the only output port out satisfies
    out ∈ D ∨ CC(out) = 1
we have to be more precise as to how a statement affects C(n), CC(n) and
D.
In a program switches are introduced by statements
    BE → x = y
in which x and y are nodes, and BE is a boolean expression in terms of
nodes, more precisely: BE is a production of the grammar
    <boolean expression> ::= <term> { ∨ <term> }
    <term> ::= <factor> { ∧ <factor> }
    <factor> ::= <primary> | <primary>'
    <primary> ::= <node> | (<boolean expression>)
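The grammar maps directly onto a small recursive-descent parser. The sketch below is illustrative only: it assumes an ASCII spelling of the operators ("|" for disjunction, "&" for conjunction, a trailing apostrophe for complement), which is not prescribed by the paper, and the function names and tuple-based tree format are hypothetical.

```python
import re

# A token is a node name (letters, digits, "_" or "."), or one of ( ) | & '
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_.]*|[()|&'])")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(text):
    """Parse a boolean expression into nested tuples following the grammar."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expression():            # <term> { | <term> }
        node = term()
        while peek() == "|":
            take()
            node = ("or", node, term())
        return node

    def term():                  # <factor> { & <factor> }
        node = factor()
        while peek() == "&":
            take()
            node = ("and", node, factor())
        return node

    def factor():                # <primary> or <primary>'
        node = primary()
        if peek() == "'":
            take()
            node = ("not", node)
        return node

    def primary():               # <node> or ( <boolean expression> )
        tok = peek()
        if tok == "(":
            take()
            node = expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            take()
            return node
        if tok is None or tok in "()|&'":
            raise SyntaxError("expected a node or '('")
        return ("node", take())

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree

def evaluate(tree, values):
    """Evaluate a parsed expression for a mapping from node names to 0/1."""
    kind = tree[0]
    if kind == "node":
        return bool(values[tree[1]])
    if kind == "not":
        return not evaluate(tree[1], values)
    if kind == "and":
        return evaluate(tree[1], values) and evaluate(tree[2], values)
    return evaluate(tree[1], values) or evaluate(tree[2], values)
```

Because a term is a sequence of factors, conjunction binds more tightly than disjunction, so "a | b & c" parses as a disjunction whose second operand is the conjunction of b and c.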
Prior to the statement
    BE → x = y
we should have
    for all nodes n in BE: C(n) = 1, and
    (C(x) ∧ C(y) ∧ BE) = 0
The first requirement is introduced to permit the syntax checking
to be done incrementally at each statement of the program. A consequence,
however, is that not every order of the statements in the
program is permissible. It is still an open question whether this
serializability requirement is not too strong. If we succeed in designing
our components under this regime it will certainly enhance both the
readability and the checkability of our programs.
The second requirement guarantees the observance of the no-fighting
rule. The statement does not have an effect on the set D. The
effect on C(n) and CC(n) is
    Z(x) := (Z(x) ∨ (Z(y) ∧ BE))
    Z(y) := (Z(y) ∨ (Z(x) ∧ BE))
in which Z stands for C or CC.
The set D is affected only by a statement that specifies a
direct connection, i.e., one that does not go through a switch. We
obtain such a statement by dropping the conditional part "BE →":
    x = y
As for the effect on C(n) and CC(n) this statement is like a switch
specification with "1" as its boolean expression. Prior to the
statement the condition
    (C(x) ∧ C(y)) = 0
should hold, and its effect is that Z(x) and Z(y) both become Z(x) ∨
Z(y) (Z still standing for C or CC). The effect on the set D is that if
either node x or node y was a member of D then D is extended with the
other node.
In the example of the inverter we initially have out ∉ D. As the
program leaves the set D unchanged we have to show that it establishes
CC(out) = 1. The first statement is legitimate as we initially have
C(in) = 1 and
    C(out) ∧ C(1) ∧ in' = 0 ∧ 1 ∧ in' = 0
The effect is that both C(out) and CC(out) become in'. The second
statement is legitimate as well: C(in) is still 1 and
    C(out) ∧ C(0) ∧ in = in' ∧ 1 ∧ in = 0
It establishes CC(out) = in' ∨ in, which is 1. Hence, it is a complete
program.
Notice that both switches in the inverter are of the type that 
~an be implemented by one transistor . The inverter, consequently , 
reqnires only two transistors . We shall use this inverter as a sub-
component in our thirrl example . 
EXAMPLE 2 . 
~~m~ nor(a? , b? , out!): 
b~_9i_!1 a v b +out= 0 ; a'~ b' +out= 1 end 
In the first statement the boolean expression is a disjunction
of two nodes. This gives rise to a diagram in which two switches are
placed in parallel. The boolean expression of the second statement
specifies two switches that are placed in series. The whole component
requires four transistors. The following picture shows a diagram of the
component.
[Fig. 9: diagram of the nor component, with inputs a and b and output out]
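As a small illustration of why the two statements of the nor component never fight and never leave out undriven, the two guards can be enumerated over all input values. This is a Python sketch with hypothetical names, not part of the notation itself.

```python
from itertools import product


def nor_guards(a, b):
    """Return the two guards of the nor program for inputs a and b."""
    pull_down = a or b                 # a v b   -> out = 0
    pull_up = (not a) and (not b)      # a' ^ b' -> out = 1
    return pull_down, pull_up


def check_nor():
    """Build the truth table of out, checking that exactly one guard holds."""
    table = {}
    for a, b in product((False, True), repeat=2):
        pull_down, pull_up = nor_guards(a, b)
        # Both guards true would mean fighting; both false an undriven output.
        assert pull_down != pull_up
        table[(a, b)] = 1 if pull_up else 0
    return table
```

The resulting table is the usual nor function: out is 1 only when both a and b are 0.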
A new node is introduced by mentioning it in the right-hand side (in the 
part to the right of the arrow) of a statement. There is no example of 
this in the paper . 
EXAMPLE 3.
comp flip-flop(in?, ld?, q!, qbar!):
begin sub i1, i2: inverter;
i2.in = i1.out;
ld' → i1.in = i2.out; ld → i1.in = in;
q = i2.out; qbar = i1.out
end
The second line of the program specifies that the component
flip-flop has two subcomponents, named i1 and i2, of type inverter. As
each inverter has two external ports, this declaration provides the
component with four internal ports. An internal port that corresponds
to the external port p of a subcomponent S is denoted as S.p. As both
i1 and i2 have an external output port out, the component flip-flop has
the internal input ports i1.out and i2.out. Likewise, it has the
internal output ports i1.in and i2.in.
The reader is encouraged to check that the component satisfies
the rules by formally deriving that all statements are legitimate and
that the program establishes
$$q \in D, \quad qbar \in D, \quad C(i1.in) = 1, \quad C(i2.in) = 1$$
A possible diagram of the component is shown in Fig. 10.
[Fig. 10: diagram of the flip-flop component]
5. BUSES
If we want to design a random access memory out of inverters, we
must be able to connect their inputs and outputs via buses to the inputs
and outputs of the memory. We want to connect the outputs of many
subcomponents (inverters) to the same bus. Just connecting these
outputs (internal inputs to the memory) to the bus would violate the
no-fighting rule. We shall remedy this by putting switches in these
connections.
To indicate when the memory cell has to drive the bus
("reading") and when it has to receive a value from the bus ("writing"),
two inputs, r and w, go into the cell:
[Fig. 11: a memory cell with inputs r and w, attached to the bus]
We attach a number of cells to the same bus. Such a composition will
only be an RL if we guarantee that at most one of the cells can have
its r equal to 1. The signals r come from another subcomponent of the
memory, usually called the "decoder." The purpose of the decoder is to
assure that at most one r equals 1. Given that the outputs of the
decoder satisfy that requirement, we can show that the composition is
again an RL. This is a new phenomenon: a condition on the values output
by a subcomponent has to be taken into account to prove that a
connection pattern specifies an RL. We call such a check a semantic
check.
The following program is a 1-of-2 decoder.
comp 1-of-2 decoder(in?, out1!, out2!):
begin in → out1 = 1; in → out2 = 0;
in' → out1 = 0; in' → out2 = 1
end
By a syntactic check, as described in Section 4, we can show that this
is a legitimate RL. In this case it is also simple to check that the
output values satisfy $(out1 \land out2) = 0$, but that is a semantic check.
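The semantic check on the decoder is small enough to be done by enumeration. The Python sketch below is illustrative only (the function name and the representation of statements as guard/node/value triples are assumptions): it evaluates the four switch statements for a given input and confirms that each output is driven by exactly one closed switch.

```python
def decoder_1_of_2(inp):
    """Evaluate the four switch statements of the 1-of-2 decoder."""
    drivers = {"out1": [], "out2": []}
    statements = [
        (inp, "out1", 1),        # in  -> out1 = 1
        (inp, "out2", 0),        # in  -> out2 = 0
        (not inp, "out1", 0),    # in' -> out1 = 0
        (not inp, "out2", 1),    # in' -> out2 = 1
    ]
    for guard, node, value in statements:
        if guard:
            drivers[node].append(value)
    # Each output must be driven through exactly one closed switch.
    assert all(len(values) == 1 for values in drivers.values())
    return drivers["out1"][0], drivers["out2"][0]
```

For both input values the conjunction of the two outputs is 0, which is the condition the bus composition relies on.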
The moral is that we will design components that are only
"conditional RL's," i.e., they are RL's under the condition that the
output values of other components satisfy certain constraints. When
such components are put together we will have to see to it that such
semantic constraints are indeed satisfied.
6. A GLANCE INTO THE FUTURE OF COMPUTING
In this paper we have not addressed the dynamic behavior of
components, i.e., how they react to transitions on their inputs. That
is obviously the next step. By adopting proper timing and signaling
conventions (cf. Chapter 7 of [2]) one should be able to address the
dynamic behavior in an equally discrete fashion. The purpose of such
conventions is to generate "data valid" inputs that signal that the
input data are well-defined and may be inspected. Such a data valid
signal may come from a clock or it may be an asynchronous acknowledge
signal.
After that there are two roads we can follow . We can make a 
machine. That machine will accept programs and execute them . We then 
concentrate on the programs and if we wish to have a certain computation 
performed, we write a program for it. That is the traditional road . 
We are led to the other, more promising, road if we observe that
we are already designing programs, programs that can be compiled into
transistor diagrams for CMOS. We make components out of subcomponents.
Every time they will be more "powerful" or "sophisticated" than their
subcomponents. We can inspect how a component is implemented by looking
at its program text to see how it is composed out of subcomponents.
Every component is again an implementation of a "higher level" concept.
We can, e.g., introduce components that communicate other data types
than just 0's and 1's. If we look at the implementation of that
concept, we may notice that it is achieved by multiplexing or by the use
of multiple ports. In that way the components we introduce will give us
new modes of expression so that we can formulate our programs in terms
of concepts that are more appropriate to our computations. After a
while, we will have a mode of expression that one would customarily call
a "higher level programming language."
Throughout all the levels of the hierarchy we have maintained
that we program by composing components out of communicating
subcomponents. But by expressing a program in such a notation we have also
specified an implementation for it: we have actually specified for the
program a transistor diagram in CMOS. From there, the step to a
complete silicon compiler is a (nontrivial) matter of generating the
proper geometric representation of the transistor diagrams.
Of course, we do not have to translate all our programs into 
silicon to have them executed. We could also compile them into machine
code, e.g., into code for a machine designed by taking the other
aforementioned road. Our choice will depend on such external factors as 
the speed with which the computation has to be performed or the expected 
frequency of its use. It is also possible that we want to make a 
translation into machine code first in order to get some experience with 
the program and that we do not have it compiled into silicon until it is 
in a form that suits us. 
POSTSCRIPT 
Is this an article about machine design or about programming? 
The answer to that question is definitely "Yes!". 
ACKNOWLEDGEMENTS 
The research described in this paper was sponsored by the 
Defense Advanced Research Projects Agency, ARPA Order Number 3771, and 
monitored by the Office of Naval Research under contract number 
N00014-79-C-0597. 
REFERENCES 
[1] Johannsen, Dave, "Hierarchical Power Routing." Display file 2069, 
Computer Science Department, California Institute of Technology, 
Pasadena, CA, October 1978 
[2] Mead, Carver & Lynn Conway, "Introduction to VLSI Systems."
Addison-Wesley Publishing Company, Reading, MA, 1980
A STRUCTURED APPROACH TO VLSI LAYOUT DESIGN
M. S. KRISHNAN
XEROX Corporation
LSI Development A3-7 4
701 S. Aviation Blvd.
El Segundo, CA 90245
ABSTRACT
A new approach to the VLSI layout problem is proposed that produces a 
structured floor plan for an arbitrary network of interconnected processing 
elements. It is based on extracting a minimum spanning tree from a given 
representation of a computation network and using an efficient, structured layout 
scheme for this minimum spanning tree. Techniques to lay out trees as arrays of 
layout slices are presented. It is assumed that the nodes of a network are identical 
in their layout size and connectivity. This method is valid at any level of a VLSI 
design since these nodes may represent gates, cells or complex macros. An 
application of this approach to modified tree networks is described. Other useful 
applications of the method are mentioned . 
1. INTRODUCTION 
A significant portion of a VLSI chip of any reasonable complexity is consumed by 
the communication paths among the various macros comprising the chip. This 
situation is further aggravated by several factors: 
a) Decreasing feature sizes, e.g. transistors and wires, which cause
communication delays to decrease non-proportionately with gate delays.
b) Presence of random logic with the attendant interconnection that is also 
irregular. 
c) Lack of adequate design aids that guide a designer along a structured, 
hierarchical design sequence that is also streamlined to provide masks within 
a short time. 
Traditionally, the two major phases of digital system design, namely logic design 
and physical , or more appropriately geometric design, have been treated as 
separate, sequential operations. This methodology was generally adequate before 
the LSI revolution. With the enormous computing power and hence complexity 
present in a network of processors in one chip, the separation of these two tasks 
tends to overlook the effects of one on the other. It is essential for a VLSI designer 
to be able to evaluate the effects of his logic design on the layout and vice versa 
early enough to incorporate them into his design. Yet there is a lack of adequate 
design aids for evaluating the potentials of alternative chip designs carried 
through the logic design to the floor plan of the chip. 
The major goal of the work described in this paper is the development of design
aids that will produce structured chip layouts that are efficient in area and are
easy to generate. The problem of regular and/or area-efficient layouts for VLSI
has generated considerable interest recently [7]. The Bristle Blocks approach [6]
attempts to develop cells that have built-in stretching points so that neighboring
cells may be made to conform to the same pitch. Leiserson [2] describes a
divide-and-conquer approach to the layout problem wherein any planar graph that
satisfies the conditions of the separator theorem of Tarjan and Lipton may be
recursively bisected by removing edges until subgraphs realizable as rectangular
layouts with the desired aspect ratio are obtained. These are then recursively
connected by restoring the deleted edges. Brent and Kung [1] proposed a regular
layout for a carry-lookahead addition scheme.
The present work attacks the problem by creating regular "layout
slices" for a few commonly found computation structures (tree, cube, hexagonal
array, etc.) and treating a given computation network as a composition of instances
of these structures. Thus the tasks of logic design and geometric design are
absorbed into one, wherein the impact of one on the other can be handled
systematically.
Section 2 describes some regular layout schemes for trees proposed in the 
literature. Section 3 presents some new layout schemes that are implementable as 
arrays with useful properties. These schemes are applied to modified tree 
networks in Section 4. Section 5 describes an algorithm to generate the floor plan 
for an arbitrary computation network of interconnected processors. Some 
potential applications of this method are listed in Section 6. 
2. LAYOUT SCHEMES FOR TREES 
A structure that has a wide range of applications, from multiplexors, decoders etc.
to multiprocessors, is a tree network. A regular layout for the carry chain
computation in an adder has been proposed [1]. It is actually a set of trees with
common inputs and is illustrated in Fig. 1. Its area is $O(n \log n)$ where n is the
number of leaf nodes.
[Fig. 1. An O(n log n) layout for a carry lookahead adder tree. Carry processors
take the inputs G8,P8 through G1,P1 and produce the carries C8 through C1.]
An H-tree layout for binary trees has been proposed [9] that is more space-
efficient. The H-tree requires O(n) area and is shown in Fig. 2. 
Fig. 2. The H-Tree Layout for a Complete Binary Tree 
Several observations can be made on these layout schemes : 
a) The tree layout scheme of Fig . 1 has all the leaf nodes at one end and the 
output nodes at the other. Thus data travels in one direction only and a total 
distance proportional to log n. 
b) The H-tree scheme has leaf nodes spread throughout the interior as well as the
periphery of the square area. The data flow alternates between the two directions
from level to level. The total distance traversed by data from the leaf nodes to the
output, using the unit shown in Fig. 2, takes one form for $n = 2^{2k}$ leaf nodes
and another for $n = 2^{2k+1}$ leaf nodes.
In both cases this delay is proportional to $\sqrt{n}$.
c) Both schemes grow in both directions as the tree expands. In both, the number 
of nodes in either direction is not constant and varies across the layout. Therefore 
neither scheme is suitable for realization of a tree as a one-dimensional array. 
3. LAYOUT SLICE SCHEMES FOR TREES 
We develop, in this paper, a structured layout scheme as a one-dimensional array
of "layout slices". The new layout technique places the leaf nodes along the
edges for ease of routing and minimizes the data propagation time, so that the
latency of the tree as a segment in a pipeline is minimized. The ease of access to the
leaf nodes is critical in applications where not only the root but also the leaf nodes
of the tree communicate with other macros on the chip.
3.1 BINARY TREES 
An algorithm to generate the layout for a complete binary tree is given below. The 
numbering notation used in this paper for levels of nodes is shown in Fig. 3. 
[Fig. 3. A Logical Binary Tree with Eight Leaf Nodes. Levels are numbered from
Level 0 (the leaf nodes) up to Level 3 (the root).]
ALGORITHM 1 (Algorithm for the layout of a binary tree):
Let the binary tree have $n = 2^k$ leaf nodes for some integer k. The two main tasks
in the layout process are placement of the nodes and interconnections among
them.
1. PLACEMENT
a) Perform an in-order traversal of the tree.
b) Group the nodes in the traversal into pairs.
c) Assign every pair obtained above to a layout slice.
2. INTERCONNECTION
a) The connections to the leaf nodes (level 0) are straightforward since they
receive external inputs.
b) For nodes at higher levels in the tree, the level number of a node in a given
slice can be determined in a simple manner. For nodes at level 1, the inputs
are from within the same slice and the slice immediately to its right. For
nodes at level i, i > 1, the inputs are from the slices $2^{i-2}$ positions to the left
and $2^{i-2}$ positions to the right. □
The algorithm is illustrated for a tree with n = 8 in Fig. 4. Step 1 of the algorithm
yields (1,9) (2,13) (3,10) (4,15) (5,11) (6,14) (7,12) (8). Step 2 can be observed from
Fig. 4, which shows the realization of the tree.
[Fig. 4. Realization of a tree as an array of "layout slices"]
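The placement step of Algorithm 1 can be sketched in a few lines of Python. The sketch is illustrative only: it assumes the node numbering that the n = 8 example suggests (leaves 1..n from left to right, then the level-1 nodes, then level 2, and so on), and the function name `inorder_slices` is hypothetical.

```python
def inorder_slices(k):
    """Return the layout slices of a complete binary tree with 2**k leaves."""
    n = 2 ** k
    # Assumed numbering: leaves are 1..n, then each higher level in turn.
    levels = []
    next_id = 1
    for level in range(k + 1):
        count = n >> level
        levels.append(list(range(next_id, next_id + count)))
        next_id += count

    def inorder(level, index):
        """In-order traversal of the subtree rooted at (level, index)."""
        if level == 0:
            return [levels[0][index]]
        left = inorder(level - 1, 2 * index)
        right = inorder(level - 1, 2 * index + 1)
        return left + [levels[level][index]] + right

    order = inorder(k, 0)
    # Group consecutive nodes of the traversal into pairs, one pair per slice.
    return [tuple(order[i:i + 2]) for i in range(0, len(order), 2)]
```

For k = 3 this reproduces the pairs listed for Step 1, with the last slice holding only leaf 8.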
The following lemma determines the number of slices required for such a layout and
the bounds on the degree and eccentricity of a slice.
LEMMA 1:
A complete binary tree with n leaf nodes can be realized in area O(n) as an array
of n layout slices where each slice contains exactly two nodes with a) at most four
wires connected to it and b) at most $\log n - 2$ wires passing around it without
connection, where a wire is an interconnect between two nodes in the tree.
Proof:
We prove the first part of the lemma by induction on the number of leaf nodes. For
a complete tree, $n = 2^k$ for some integer k. For n = 2, we need two slices as
shown below:
[diagram of the two slices for n = 2]
Consider a tree with $n = 2^m$ leaf nodes.
The total number of nodes in the tree
$$= 2^m + 2^{m-1} + \dots + 2^1 + 2^0 = 2^{m+1} - 1 = 2(2^m) - 1 = 2n - 1$$
This shows that the nodes of the tree can be assigned to the $2^m$ slices, two nodes
to a slice, such that exactly one half of one of the slices is unused.
We now apply induction to a tree with $2^{m+1}$ leaf nodes. Let $T_{m+1}$
be a complete tree with $2^{m+1}$ leaves, where R is the root node and $T_L$ and $T_R$ are
the left and right subtrees of $T_{m+1}$ with $2^m$ leaf nodes each. By the induction
hypothesis, both $T_L$ and $T_R$ have structured layouts with $2^m$ slices each. But we
have shown above that the layout for $T_L$ has a slice, say S, that has an unused
node. Let R be assigned to this unused slot and connected to the roots of $T_L$ and
$T_R$. This results in a layout for $T_{m+1}$ as an array of $2^{m+1}$ slices with each slice
containing two nodes. An interesting property of the placement strategy is that
every slice has exactly one leaf node and one node from a higher level. This
follows from the in-order traversal of the tree. Thus the area of the layout, using
any choice of units, is O(n). The maximum propagation length from the leaf nodes
to the root is n/2. The maximum number of wires across any vertical cross
section of the layout is log n.
b) Let the levels of the tree be numbered with the leaf nodes at level 0 and let l(P)
denote the level of node P. As mentioned above, one of the two nodes in a slice is
from some level > 0. Let P be such a node. There can be no wires passing around
the slice containing P that connect to a node at level l(P) or less since, by
construction, none of the left sons of node P are to the right of P. The maximum
number of wires passing around a slice occurs for the case l(P) = 1, for which
there are $\log n - 1$ levels above it. However, the root node is not connected to any
other slice and hence there are at most $\log n - 2$ wires passing around a slice.
The number of wires connected to a slice is maximum when $l(P) \neq 1$ and can be
seen to be four. □
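The two bounds of Lemma 1 can also be checked by brute force on small trees. The Python sketch below is an illustration under stated assumptions: a wire is a child-to-parent edge, a wire "passes around" a slice when the slice lies strictly between the slices of its two endpoints, and a wire is counted as "connected" to every slice that holds one of its endpoints. The helper names are hypothetical.

```python
def build_layout(k):
    """In-order placement of a complete binary tree with 2**k leaves.

    Returns (slice_of, edges): slice_of maps a node (level, index) to its
    slice position, and edges is a list of (child, parent) pairs.
    """
    order, edges = [], []

    def visit(level, index):
        node = (level, index)
        if level == 0:
            order.append(node)
            return
        left = (level - 1, 2 * index)
        right = (level - 1, 2 * index + 1)
        visit(*left)
        order.append(node)
        visit(*right)
        edges.extend([(left, node), (right, node)])

    visit(k, 0)
    # Consecutive pairs of the traversal share a slice.
    slice_of = {node: pos // 2 for pos, node in enumerate(order)}
    return slice_of, edges


def wire_counts(k):
    """Return (max wires connected to a slice, max wires passing around one)."""
    slice_of, edges = build_layout(k)
    n_slices = 2 ** k
    connected = [0] * n_slices
    passing = [0] * n_slices
    for child, parent in edges:
        a, b = sorted((slice_of[child], slice_of[parent]))
        for s in {a, b}:
            connected[s] += 1
        for s in range(a + 1, b):
            passing[s] += 1
    return max(connected), max(passing)
```

With this way of counting, small trees stay within four connected wires per slice, and the largest number of wires passing around a slice equals log n - 2.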
The number of wires passing around a slice was treated specifically above to
bound the width of the routing channels, although these wires may be run through
the cells. Only one half of one slice is unused. Note in Fig. 4 that there are at most
log 8 - 1, i.e. 2, wires passing around a slice. The output from a node at level i
passes around $2^{i-1} - 1$ slices using this arrangement of nodes.
3.2 TREES WITH LARGER FANOUT 
The above scheme for binary trees can be generalized to any k-ary tree as stated 
in the following lemma. 
LEMMA 2 : 
Any k-ary tree where each node has k inputs and one output, with n leaf nodes,
can be realized in area O(n) as an array of n layout slices where each slice
contains at most two nodes with a) at most k + 2 wires connected to it and b) at
most $\log n - 2$ wires passing around it without connection.
Proof:
The construction of a k-ary tree layout is similar to that of a binary tree. At each
level i, a node is placed in the same slice as its $\lceil k/2 \rceil$th input for i = 1 and in the
slice $k^{i-2}$ positions to the right of its $\lceil k/2 \rceil$th input for i > 1, so as to place at most
two nodes in every slice. The properties of this layout follow from arguments
similar to those for Lemma 1. □
The placement criterion stated above is illustrated for a ternary tree with n = 9 in
Fig. 5. Note that this criterion is also true for a binary tree.
[Fig. 5. Ternary tree layout as array of slices with two nodes each]
The above layout schemes have two noteworthy properties:
a) There are two distinct types of slices, characterized by the I/O connections of
their nodes. These can be represented for the binary tree as
[diagram of the two slice types, $S_A$ and $S_B$]
Although $S_A$ can be obtained from $S_B$, we choose to distinguish them in the
following. These two slice types alternate in the array.
b) The number of nodes in a slice is not restricted to 2 as described in Lemmas 1 
and 2, but can be any power of 2. 
Some useful implications of these properties are stated in the following lemma. 
LEMMA 3:
A tree network can be realized as an array of two distinct types of slices, denoted
by $S_A$ and $S_B$, where $S_A$ consists of a left subtree and its parent node, say F, and
$S_B$ consists of the right subtree of F and an ancestor of F. The array realization is
an alternating sequence of these two slice types. There are $\log n + 1$ different
realizations of the tree, characterized by the number of nodes in a slice, where n is
the number of leaf nodes.
Proof:
The basic unit in a binary tree is a node along with its two children. Out of the
three ways of assigning these three nodes to two adjacent slices, the ones that
yield the minimum inter-slice communication place the parent node in the same
slice as one of its subtrees. We have arbitrarily chosen the
left subtree to be in the same slice as its parent. The sequence $S_A, S_B$, by this
definition, expands the left subtree contained in $S_A$ to the next higher level, while
the sequence $S_B, S_A$ inserts the right subtree of the parent node contained in $S_B$.
The result follows from an inductive argument on the sequence of slice types.
With $n = 2^m$ leaf nodes, there are $2^{m+1} - 1$ nodes in the tree. A slice may contain $2^i$
nodes, $1 \le i \le m + 1$. Each of these produces a distinct realization and hence
there are m + 1, or equivalently, $\log n + 1$ different realizations of the tree.
It should be pointed out that the layout slice arrangement is amenable to a gate 
array in which a "gate" may be a layout slice and the number of interconnecting 
channels may be bounded as above. 
3.3 AN ALTERNATIVE REALIZATION OF A BINARY TREE 
Another layout slice scheme for complete binary trees that uses a single slice type 
is briefly described below. Each slice contains a basic unit of the tree. 
LEMMA 4:
A complete binary tree with n leaf nodes can be realized as an array of $\lceil (2n-1)/3 \rceil$
identical layout slices where each slice contains a parent node and its two
children, with a) at most five wires connected to a slice and b) at most $\log n - 2$
wires passing around it without connection.
Proof:
a) As shown in Lemma 1, the total number of nodes in a tree with n leaf nodes is
$2n - 1$. Assigning a parent node and its two children to a slice results in $\lceil (2n-1)/3 \rceil$
slices and at most five wires connected to a slice.
b) The bound on the number of wires passing around a slice can be proved by
induction on the number of leaf nodes. □
This scheme also has the leaf nodes at the edges of the layout and is illustrated in
Fig. 6 for n = 4 and n = 8. It can be observed that the slices are fully used when m is
odd and there are exactly two unused slots when m is even, where $n = 2^m$.
[Fig. 6. An alternative array realization of the tree with one slice type.
Panels: $n = 2^2$ (m even) and $n = 2^3$ (m odd).]
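The slice count of Lemma 4 and the observation about unused slots follow from simple arithmetic, which the following Python sketch spells out (the function name is hypothetical).

```python
def single_type_slices(m):
    """Slices and unused slots for a complete binary tree with 2**m leaves.

    Each slice holds three nodes: a parent and its two children.
    """
    n = 2 ** m
    nodes = 2 * n - 1                 # total number of nodes (Lemma 1)
    slices = -(-nodes // 3)           # ceiling of (2n - 1) / 3
    unused = 3 * slices - nodes       # empty node positions in the array
    return slices, unused
```

For n = 4 this gives three slices with two unused slots, and for n = 8 five fully used slices, matching the two cases shown in Fig. 6.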
4. MODIFIED TREE NETWORKS
Let us consider the applicability of the array-realizable layout schemes to modified
tree networks. Such a network of considerable interest is a carry-save adder that
adds a set of n-bit numbers using standard full/half adder cells with the carry
propagation deferred to the very last stage. The number of levels in this tree is
determined by the type of basic adder cells used in reducing the given h n-bit
numbers to two rows of bits before performing a carry propagation. For instance,
using full/half adder cells for each node of the network one would require
approximately $\log_s h$ levels in the tree excluding carry propagation, where $s \approx 1.5$
[10]. It suffices to say that the number of levels in the tree for a given type of adder
cell can always be bounded.
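The origin of the factor s ≈ 1.5 can be seen by counting reduction levels directly. The Python sketch below is a simplified model, not the exact cell assignment of the paper: it assumes that at each level every group of three rows is reduced to a sum row and a carry row by full adders, and that leftover rows pass through unchanged. The function names are hypothetical.

```python
import math


def carry_save_levels(h):
    """Count the levels needed to reduce h rows of bits to two rows."""
    levels = 0
    rows = h
    while rows > 2:
        # Each group of three rows becomes two rows; leftovers pass through.
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels


def level_bound(h, s=1.5):
    """The bound ceil(log_s h) on the number of reduction levels."""
    return math.ceil(math.log(h) / math.log(s))
```

For six operands the model needs three reduction levels, so the carry propagation falls on the fourth level, in line with the example of Fig. 7.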
An assignment of full/half adder cells to add six 3-bit numbers is shown in Fig. 7. The
horizontal lines separate the successive levels in the tree. The numbers within
circles are the unit numbers of the adder cells and the numbers within the adder
cells (boxes) represent the outputs of other adder cells from previous levels. There
are four levels in the tree and the carry propagation is done at the fourth level.
[Fig. 7. Carry-save addition of six 3-bit numbers using full/half adders. The figure
shows the addends, the successive levels of adder cells, the carry propagation
stage and the 6-bit result.]
The tree network for this adder scheme is shown in Fig. 8. Let us assign the nodes 
in the tree to layout slices as follows: 
Define a slice for each leaf node. Assign a non-leaf node P to the same slice as its
middle input if P is a full adder and to the same slice as its right input if P is a half
adder. The effect of this assignment strategy is that in any slice there is at most
one node from any given level. The placement of the carry propagation stage cells
is explained below. A more rigorous method for obtaining the layout slices for general
networks will be described in Section 5. We can now state the following for such an
adder tree implementation using layout slices:
[Fig. 8: tree network for the adder scheme of Fig. 7, with half adder cells marked]
LEMMA 5:
An adder tree network that adds h n-bit words using carry-save addition with
full/half adder cells can be realized in area $O(n \log_s h)$ as an array of $n + \lceil h/2 \rceil$
slices with each slice containing at most $\lceil \log_s h \rceil + 1$ nodes and at most
$5\lceil \log_s h \rceil + 4$ wires connected to it, where $s \approx 1.5$.
Proof:
As discussed above, the upper bound on the number of levels in the tree using full
adders is $\log_s h$. For the carry propagation, we need at most $(n - 1) + \lceil h/2 \rceil$ more
levels, where the term $\lceil h/2 \rceil$ accounts for the fact that the sum of a pair of n-bit
numbers will need an additional bit for overflow and so the sum of the h n-bit
numbers may have up to $n + \lceil h/2 \rceil$ bits. For simplicity, we allow these additional
$\lceil h/2 \rceil$ bits of the sum to have separate slices for the carry propagation stage.
Thus we need at most $\lceil \log_s h \rceil + 1$ nodes in each slice including the carry
propagation stage. Each of the nodes in a slice has two or three inputs and two
outputs. However, from the assignment of the nodes to slices, we are guaranteed
that at least one node in every slice has at least one input from within the same
slice. Thus there are at most $5(\lceil \log_s h \rceil + 1) - 1$, i.e. $5\lceil \log_s h \rceil + 4$, wires
connected to any slice. The area of the layout is therefore $O(n \log_s h)$. □
The dotted lines in Fig. 8 indicate the slices used in the realization of the tree. 
5. A FLOOR PLAN GENERATION APPROACH 
In this section an approach to develop the floor plan of an arbitrary computation 
network of interconnected processors is outlined. The task of generating a floor 
plan is to lay out the individual nodes and the interconnections among them in a 
rectangular area satisfying the specified design constraints like line length, width 
and number of crossovers. The availability of alternative layout schemes is bound 
to suggest alternative logic realizations at any level of a hierarchical design. The 
layout schemes described above are not limited to any particular logic level, e.g. 
transistors, gates etc. Thus various multiprocessor architectures as well as 
different multiplication schemes may be tried with the same abstraction. The 
elementary nodes themselves may in turn be laid out in detail at a lower level to 
the desired degree of optimization. 
5.1 DESCRIPTION OF THE METHOD 
The approach consists of the following steps: 
a) Obtain a weighted network of processing elements with appropriate weights 
assigned to the edges. These weights may represent the required degree of 
proximity of the two nodes connected by that edge. 
b) Extract a minimum spanning tree for the network, i.e. a tree that spans all nodes
of the network and has the minimum total weight among all such trees. The designer can
force critical interconnections into this tree by assigning appropriate weights to
these edges in the original network. The resulting minimum spanning tree will
contain the part of the original network that is critical in terms of topology
constraints.
c) Lay out the minimum spanning tree obtained using the regular tree layout
schemes described above.
d) Realize the original network by restoring the remaining edges. 
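Steps b) and d) can be sketched as follows. The paper does not say which minimum spanning tree algorithm it uses; Kruskal's algorithm is chosen here only as one well-known option, and the function name and the edge representation as (weight, u, v) triples are assumptions made for the illustration.

```python
def minimum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm on nodes 0..num_nodes-1.

    edges is an iterable of (weight, u, v) triples. Returns (tree, rest):
    the edges of a minimum spanning tree (step b) and the remaining edges,
    which are restored after the tree has been laid out (step d).
    """
    parent = list(range(num_nodes))

    def find(x):
        """Representative of the component containing x (path halving)."""
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, rest = [], []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v   # the edge joins two components: keep it
            tree.append((weight, u, v))
        else:
            rest.append((weight, u, v))
    return tree, rest
```

For example, with weight 0 marking the connections that must stay close, the call `minimum_spanning_tree(4, [(0, 0, 1), (1, 1, 2), (0, 2, 3), (1, 0, 3), (1, 0, 2)])` keeps both weight-0 edges in the tree and leaves two weight-1 edges to be restored in step d).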
The implementation of the above method is illustrated in Fig. 8. The leaf nodes are 
at one end and each leaf node defines a slice. The middle input to a full adder and 
the right input to a half adder have been assigned weight 0. The effect of this 
criterion on the assignment of nodes to slices is evident although the actual 
strategy for the assignment of weights to the edges is not relevant to the proposed 
floor plan method. 
The layout of a bigger version of a carry-save adder macro generated using the
above method is shown in the accompanying plate which shows the overall floor
plan of the macro and the internal layout of an adder cell. There are five 6-bit
operands and the sum is truncated to 6 bits. The leaf nodes are in the top row
and the output nodes in the bottom row. Note that the middle row consists of one 
type of adder cell while the top and bottom rows contain a smaller adder cell 
although the above method was developed primarily for identical cells. The inputs 
and outputs of a cell are on opposite faces of a cell. A cell layout or its mirror 
image may be used in a slice. 
5.2 INCOMPLETE TREES 
Although the layout schemes described above have assumed complete or nearly
complete trees, incomplete or unbalanced trees may still bring useful layout
structure to the original network. Fig. 9 illustrates this with incomplete trees laid
out as arrays. The interconnections among nodes are weighted, with 0 indicating a
closer required proximity than 1. External inputs are shown unweighted and
undirected. Note that in (b), node 3 is not a conventional leaf node.
[Plate: LAYOUT OF ADDER CELL; LAYOUT OF ADDER MACRO]
[Fig. 9. Examples of layouts for incomplete trees. Panels (a) and (b) each show a
weighted tree and its array realization; the slices in (b) are labeled $S_A$, $S_B$, $S_A$, $S_B$.]
5.3 PROPERTIES OF THE PROPOSED FLOOR PLAN APPROACH 
1) It is more general than the divide-and-conquer method underlying Leiserson's 
scheme in that any network, not necessarily planar, can be handled as a minimum 
spanning tree problem. 
2) It yields a regular, array-realizable layout with known bounds for the number of
processors in a slice and the number of wires between slices, which provide for
uniform spacing for routing purposes.
3) There are efficient, polynomial-time algorithms to extract minimum spanning
trees [12]. More importantly, alternative minimum spanning trees can be easily
obtained using cyclic interchange methods [11], making it possible to
systematically generate alternative floor plans. A useful implication of this is that 
the design hierarchy may be reevaluated in the light of the floor plans generated, 
resulting in a modified computation network. Thus the processes of logic design 
and physical design can be integrated to simplify the time-consuming and often 
error-prone task of a detailed layout for VLSI chips. 
4) An apparent disadvantage of this approach is that for an unbalanced tree the 
slices are not utilized efficiently. However, as mentioned in the previous section,
regularity in layout can still be imposed on unbalanced trees. Also, at the mask 
generation stage, the unused portion of a slice may be eliminated so that the 
unused part of a slice does not consume any power. 
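Property 3 rests on extracting a minimum spanning tree from the weighted network. A minimal sketch of that step follows, using Kruskal's algorithm with a union-find structure. This is a standard textbook method and is not claimed to be the algorithm of [12]; the four-node network and its node numbering are hypothetical, with weight 0 standing for a closer required proximity than weight 1.

```python
def minimum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm.

    edges is an iterable of (weight, u, v) tuples with nodes numbered
    0 .. num_nodes - 1. Returns the list of edges kept in the tree.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Hypothetical network (assumption, not taken from the paper).
example_edges = [(0, 0, 1), (1, 0, 2), (0, 1, 2), (1, 2, 3), (1, 1, 3)]
example_tree = minimum_spanning_tree(4, example_edges)
```

Ties between equal-weight edges are broken by the sort order, so reordering or relabelling the nodes is one simple way to obtain an alternative minimum spanning tree and hence an alternative floor plan.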
6. CONCLUSIONS 
Some layout schemes for tree networks and a possible solution to the floor plan 
generation problem using such schemes were proposed above. Possible 
directions for pursuing this approach are mentioned in this section: 
1) Efficient layout slices for other commonly used structures, e.g. cube, hexagonal
array etc., may be developed such that different types of layout slices are
compatible in terms of number of wires and/or number of processing elements in
a slice. The goal here is similar to that of the Bristle Blocks project [6]. Thus a 
given network may be decomposed as a set of these structures which can then be 
laid out individually using efficient layout slices for each of these structures and 
interconnected. This compares favorably with the arbitrary division approach used 
in the divide-and-conquer method. For instance, the slice concept applied to a
cube network is demonstrated in Fig. 10.
Fig. 10. Cube Interconnection realized as a regular array of slices 
Note that each slice for the cube also contains two nodes similar to a tree slice and 
there are two slice types. 
2) There are no efficient methods to extract minimum spanning trees with special 
constraints such as a bound on the degree of a node etc. This problem can be
viewed differently: Are there useful computation networks that are modifications
of a balanced tree and are realizable as arrays of slices within the bounds 
discussed above? The adder trees of Section 4 are examples of such networks. 
3) A natural extension of the layout schemes described above would be to tree 
structures where the nodes are not identical in their sizes or connectivities. The 
selection of a basic slice or slice types would be critical to an array realization. 
4) Since the processing elements within a slice may be connected internally, 
specific optimizations both in layout and in logic design are possible. For example,
for a layout slice where a leaf node communicates its carry signal to the node 
within the slice, e.g. slice SA above, it is possible to use a complemented carry 
signal thereby eliminating two inverters and saving their area and power. However, 
such optimization is meaningful only in situations where it does not cause a 
proliferation of layout slice types. This may be treated as a problem of 
characterizing the interconnection pattern among the slices. 
ACKNOWLEDGEMENT 
The author is thankful to the Xerox Corporation for their support during this
research.
REFERENCES 
1) Brent, R.P. and Kung, H.T., "A Regular Layout for Parallel Adders", Tech.
Report CMU-CS-79-131, Dept. of Computer Science, Carnegie-Mellon
University, June 1979.
2) Leiserson, C.E., "Area-Efficient Layouts for VLSI", Tech. Report, Dept. of
Computer Science, Carnegie-Mellon University, August 1979.
3) Bentley, J.L., "Multidimensional Divide-and-Conquer", Comm. of the ACM, Vol.
23, No. 4, April 1980, pp 214-229.
4) Rowson, J.A., "Understanding Hierarchical Design", Ph.D. thesis, Dept. of
Computer Science, Caltech, April 1980.
5) Browning, S.A., "A Tree Machine", Lambda, Second Quarter, 1980, pp 32-36.
6) Johannsen, D., "Bristle Blocks: A Silicon Compiler", Caltech Conf. on VLSI, Jan
1979, pp 303-310.
7) Marshall, M., "VLSI pushes super-CAD techniques", Electronics, July 31, 1980,
pp 73-80.
8) Mead, C.A. and Conway, L.A., "Introduction to VLSI Systems", Addison-Wesley,
Mass., 1980.
9) Rem, M., "Mathematical Aspects of VLSI Design", Caltech Conf. on VLSI, Jan
1979, pp 55-63.
10) Stenzel, W.J. et al., "A Compact High-Speed Multiplication Scheme", IEEE TC,
Vol. C-26, No. 10, Oct 1977, pp 948-957.
11) Deo, N., "Graph Theory with Applications to Engineering and Computer
Science", Prentice-Hall, Englewood Cliffs, N.J., 1974.
12) Cheriton, D. and Tarjan, R.E., "Finding Minimum Spanning Trees", SIAM J. of
Computing, Vol. 5, No. 4, Dec 1976, pp 724-742.
MINIMUM PROPAGATION DELAYS IN VLSI 
Carver Mead 
Professor of Computer Science, Electrical Engineering and Applied Physics 
California Institute of Technology 
and 
Martin Rem 
Eindhoven University of Technology and California Institute of Technology 
1. INTRODUCTION 
With feature sizes decreasing and chip area increasing it becomes more
and more time consuming to transport signals over long distances across the
chip [5]. Designers are already introducing more levels of metal connections,
using wider and thicker paths for longer distances. Another recent development 
is the introduction of an additional level of connections between the chip and the 
pc-board, multilayer ceramic chip carriers. The trend is undoubtedly towards 
even more connecting levels. 
In this paper we demonstrate that it is possible to achieve propagation 
delays that are logarithmic in the lengths of the wires, provided the connection 
pattern is designed to meet rather strong constraints. These constraints are, in
effect, satisfied only by connection patterns that exhibit a hierarchical structure.
We also show that, even at the ultimate physical limits of the technology, the
propagation for reasonably sized VLSI chips is dominated by these considerations,
rather than by the speed of light.
2. PROPAGATION DELAY
We compute the time it takes a minimum sized transistor to drive a wire
of length $\ell$ with width and thickness $s$. We assume the wire to have a distance
$s$ to its neighboring wires and layers. Let $s_0$ be the minimal width of a wire on
the chip, so that a minimal transistor has area $s_0^2$.
The following equation is an excellent approximation to the total time $T$
required to drive the wire:
$$T = (R_t + R_w)\,C_w \tag{1}$$
$R_t$ is the resistance of the minimal transistor, $R_w$ the resistance of the wire and
$C_w$ its capacitance. The resistance of a wire is proportional to its length and
inversely proportional to its cross section:
$$R_w = \rho\,\frac{\ell}{s^2} \tag{2}$$
The capacitance of a wire is inversely proportional to the distance of its neighboring
wires and layers, and it is proportional to the area of the side facing
that neighboring wire or layer:
$$C_w = \varepsilon\,\frac{s \times \ell}{s} = \varepsilon\,\ell \tag{3}$$
We notice that the product of $R_w$ and $C_w$ is already quadratic in $\ell$.
Thus the time it takes to drive a wire is at least quadratic in the wire length.
However, things are not as bad as they look: $R_t$, the resistance of a minimal
transistor, is the dominant term in (1). We can decrease that term by fitting
a larger driver to the wire. But that driver must then in its turn be charged
by the minimal transistor and it seems that we have hardly gained anything.
That, however, is not true, for we can use a sequence of drivers instead of
just one. The first one is the minimal transistor, the next one is bigger by a
factor $\alpha$. It drives another driver that is again bigger by a factor $\alpha$, etc.,
until we finally reach a driver that is large enough to drive the whole wire in
a sufficiently short time.
There exists a simple rule to determine the time required to have a
driver charge another driver [2]. Let $\tau$ be the time it takes a minimal transistor
to charge the gate of another minimal transistor. The rule is then that the time
required to have a driver with capacitance $C_1$ drive another driver with capacitance
$C_2$ ($C_2 > C_1$) is
$$\tau\,\frac{C_2}{C_1} \tag{4}$$
Let $C_t$ be the capacitance of a minimal transistor. We have it drive a
driver with capacitance $\alpha C_t$, this second one drives a driver with capacitance
$\alpha^2 C_t$, etc., until the last driver has a gate capacitance of about $C_w/\alpha$. The
number of drivers (including the initial transistor) required is
$$\log_\alpha \frac{C_w}{C_t} \tag{5}$$
The capacitance $C_t$ of a minimal transistor is equal to $(\varepsilon s_0^2)/d$, in which
$d$ is the thickness of the gate insulator. The number of drivers is then $\log_\alpha(\ell d/s_0^2)$
and we get for the time $T_d$ spent in driving a zero resistance wire through the
sequence of drivers:
$$T_d = \alpha\tau\,\log_\alpha\frac{\ell d}{s_0^2} \tag{6}$$
We may replace formula (1) by
$$T = T_d + R_w C_w \tag{7}$$
From (2), (3), (6), and (7) we conclude
$$T \approx \alpha\tau\,\log_\alpha\frac{\ell d}{s_0^2} + \rho\varepsilon\,\frac{\ell^2}{s^2} \tag{8}$$
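The logarithmic term can be explored numerically. The sketch below evaluates the stage count of (5) and the driver-chain delay of (6) for a few scale factors. The capacitance ratio of 1000 and the choice tau = 1 are hypothetical values picked only for illustration.

```python
import math


def driver_chain(c_wire, c_transistor, alpha, tau):
    """Stage count and total delay of an exponentially sized driver chain.

    Each stage drives a load alpha times its own gate capacitance, which
    costs alpha * tau per stage; the number of stages is
    log_alpha(c_wire / c_transistor).
    """
    stages = math.log(c_wire / c_transistor) / math.log(alpha)
    delay = alpha * tau * stages
    return stages, delay


# Hypothetical load: the wire is 1000 times a minimal gate capacitance.
RATIO, TAU = 1000.0, 1.0
stages_2, delay_2 = driver_chain(RATIO, 1.0, 2.0, TAU)
stages_e, delay_e = driver_chain(RATIO, 1.0, math.e, TAU)
stages_10, delay_10 = driver_chain(RATIO, 1.0, 10.0, TAU)
```

Under these assumptions the delay is proportional to alpha / ln(alpha), which is smallest at alpha = e; larger scale factors use fewer stages at the price of a longer total delay.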
We now have a formula for the propagation delay with both a logarithmic
and a quadratic term. One can see why a longer wire requires a larger $s$: that
decreases the quadratic term. Actually, we wish to restrict the lengths of wires
to values of $\ell$ that are sufficiently small to assure that the quadratic term does
not dominate. We restrict ourselves to values of $\ell$ for which the quadratic term
grows at a slower rate than the logarithmic one. Therefore, we determine the
value of $\ell$ for which the derivatives with respect to $\ell$ of the two terms are equal:
$$\frac{\alpha\tau}{\ell\,\ln\alpha} \tag{9}$$
$$\frac{2\rho\varepsilon\,\ell}{s^2} \tag{10}$$
If a signal has to go distance $\ell$ we choose a path with width and thickness
$s$ for which (9) and (10) are equal:
$$s = \ell\,\sqrt{\frac{2\rho\varepsilon\ln\alpha}{\alpha\tau}} \tag{11}$$
Substitution of (11) in (8) yields
$$T \approx \alpha\tau\,\log_\alpha\frac{\ell d}{s_0^2} + \frac{\alpha\tau}{2\ln\alpha} \tag{12}$$
Or, approximately,
$$T \approx \tau\alpha\,\log_\alpha \ell \tag{13}$$
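A quick numerical check of this substitution: with the width chosen according to (11), the quadratic term of (8) no longer depends on the wire length. The sketch below assumes arbitrary, hypothetical values for rho, epsilon, alpha and tau in consistent units.

```python
import math


def optimal_width(length, rho, eps, alpha, tau):
    """Width and thickness s given by formula (11)."""
    return length * math.sqrt(2.0 * rho * eps * math.log(alpha) / (alpha * tau))


def quadratic_term(length, s, rho, eps):
    """Wire term of formula (8): R_w * C_w = rho * eps * length**2 / s**2."""
    return rho * eps * length ** 2 / s ** 2


# Hypothetical parameter values (assumptions, arbitrary units).
rho, eps, alpha, tau = 2.0, 0.5, math.e, 1.0
expected = alpha * tau / (2.0 * math.log(alpha))
wire_terms = [
    quadratic_term(length, optimal_width(length, rho, eps, alpha, tau), rho, eps)
    for length in (10.0, 100.0, 1000.0)
]
```

Every entry of `wire_terms` equals alpha * tau / (2 ln alpha) up to rounding, so only the logarithmic term still grows with the length.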
We have assumed that the values of $s$ could be chosen from a continuous
range. Although this is a good conceptualization of the increasing number of
different connection layers, in practice we will have to choose $s$ from a discrete
set. The connecting wires will be placed at different levels. The widths of the
paths at the next level will be some factor $\beta$ times the widths at the preceding
level. Given a distance $\ell$ the signal has to travel, formula (11) gives us the
ideal $s$ and we choose a level at which the widths of the wires are closest to $s$.
This leads to an interesting observation, the "magnifying glass phenomenon":
not only will the widths of the wires at any given level be the same but their
lengths will also be about equal. The patterns at different levels are similar;
at the next level the features are just magnified by a factor $\beta$.
2.1 Velocity of Light 
Asymptotically, no signal can travel faster than the velocity of light.
We must ask under what conditions the above considerations will set a limit which
is more stringent, i.e., when the velocity of light limit is not attainable. In (13)
we can substitute $\tau = s_0/v$ where $v$ is the limiting velocity of electrons in the
channel (a few $10^6$ cm/sec in silicon):
$$T \approx \frac{\alpha s_0}{v}\,\log_\alpha \ell \tag{14}$$
The maximum "velocity" with which signals can propagate is given by $1/(dT/d\ell)$:
$$\frac{dT}{d\ell} = \frac{\alpha s_0}{v\,\ell\,\ln\alpha} \tag{15}$$
The domain of validity of the above results is "velocity" $< c$:
$$\ell < \frac{c}{v}\,\frac{\alpha s_0}{\ln\alpha} \tag{16}$$
For typical technology today, $s_0 = 4$ microns, $\alpha/\ln\alpha$ is about 6, and $\ell$ should be less
than about a foot. Hence the velocity of light cannot be reached using the best
MOS technology in the most optimal way within a typical small card bay, but will
be important at larger dimensions. Even for the ultimate technology ($s_0 = 0.25$
microns), the results given above will dominate over speed-of-light considerations
for chips up to about an inch across.
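The two length estimates can be reproduced from (16). The sketch below assumes v = 3 x 10^6 cm/sec as one reading of "a few 10^6 cm/sec" and takes alpha / ln(alpha) = 6 from the text; both the specific value of v and the function name are assumptions made for illustration.

```python
C_LIGHT_CM_PER_S = 3.0e10        # speed of light
V_ELECTRON_CM_PER_S = 3.0e6      # assumed value for "a few 10^6 cm/sec"
ALPHA_OVER_LN_ALPHA = 6.0        # "about 6" in the text


def max_length_cm(s0_cm):
    """Upper limit on wire length from formula (16), in centimetres."""
    return (C_LIGHT_CM_PER_S / V_ELECTRON_CM_PER_S) * ALPHA_OVER_LN_ALPHA * s0_cm


limit_today_cm = max_length_cm(4.0e-4)       # s0 = 4 microns
limit_ultimate_cm = max_length_cm(0.25e-4)   # s0 = 0.25 microns
```

With these inputs the limits come out near 24 cm and 1.5 cm, which agrees in order of magnitude with "about a foot" and "about an inch".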
3. AREA 
The arrangements outlined in the preceding section, allowing us to treat
propagation delays as being logarithmic, will only work if we can allot enough
area at the lowest level for the drivers and at the higher levels for the wires.
A minimal transistor has area $s_0^2$. The next driver in the sequence
requires an area $\alpha s_0^2$, the third one $\alpha^2 s_0^2$, etc. The total area $A$ of the drivers
thus becomes
$$A = s_0^2\,(1 + \alpha + \alpha^2 + \cdots) \quad (\log_\alpha \ell \text{ terms}) \tag{17}$$
$$A = s_0^2\,\frac{\ell - 1}{\alpha - 1} \tag{18}$$
Or, approximately,
$$A \approx s_0^2\,\frac{\ell}{\alpha - 1} \tag{19}$$
Notice that we can trade area for time. By increasing $\alpha$ the area of the drivers
decreases, cf. (19), but the propagation delay increases, cf. (13).
A transistor that has to drive a wire of length $\ell$ requires area $s_0^2\,\ell/(\alpha-1)$
at the lowest level. This area is proportional to the length of the wire. That is
fortunate: if we double both the length and the width of a chip we also double
the lengths of the longest (cross chip) wires and the areas of their drivers. But
the total area of the chip will quadruple and we will thus be able to double the
number of wires as well.
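The area-for-time trade can be made concrete by summing the driver areas directly. In the sketch below the area is measured in units of a minimal transistor and the wire length is 1024 length units; both choices, and the function names, are hypothetical.

```python
def driver_area(alpha, stages, s0_squared=1):
    """Total area of the driver chain: s0^2 * (1 + alpha + alpha^2 + ...)."""
    return s0_squared * sum(alpha ** k for k in range(stages))


def driver_area_closed_form(alpha, stages, s0_squared=1):
    """Geometric-series closed form of the same sum (integer alpha)."""
    return s0_squared * (alpha ** stages - 1) // (alpha - 1)


# A wire of length 1024: ten stages for alpha = 2, five stages for alpha = 4.
area_alpha_2 = driver_area(2, 10)
area_alpha_4 = driver_area(4, 5)
```

Doubling the scale factor from 2 to 4 shrinks the driver area from 1023 to 341 units for the same wire, in line with the length / (alpha - 1) estimate, while the delay factor alpha / ln(alpha) stays the same for these two particular values and increases for any larger alpha.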
The longer wires come on higher levels on which the wires are wider,
thereby consuming more area. Each level, however, has the same area. As a
result, we can accommodate the wires at the higher levels only if we do not have
too many of them. Assume again that at the next level the wires are $\beta$ times
thicker, longer, and wider. Call the lowest level number 0 and let $N_i$ be the
number of wires at level $i$ ($i > 0$), then we must have
$$N_i \le N_0\,\beta^{-2i} \tag{20}$$
The number of wires as a function of their lengths must decrease
exponentially fast. This is a strong restriction. It suggests that efficient chips
must have a tree-like structure. It is again a reason to design hierarchical
chips [2], [4]. If a design does not meet this exponential rule the best we can
do is to make the propagation delay linear in the wire length by inserting
repeaters at equidistant positions along the wires. The consequences of linear
wire delays are discussed in [1].
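To see how quickly the wire budget shrinks, the sketch below tabulates the bound N_0 * beta^(-2i) level by level. It reads (20) as an upper bound on the wire count; the starting count of 4096 wires and the factor beta = 2 are hypothetical.

```python
def wire_budget(n0, beta, levels):
    """Largest wire count allowed at each level under N_i <= N_0 * beta**(-2*i)."""
    return [n0 // beta ** (2 * i) for i in range(levels)]


# Hypothetical chip: 4096 wires at level 0, features doubling per level.
budget = wire_budget(4096, 2, 5)
```

Each level up divides the budget by beta squared, because a wire that is beta times longer and beta times wider occupies beta squared times the area on a level of unchanged size.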
One may also see complexity computations that assume that wires have no
delay. Thompson, e.g., writes in [6]:
"The propagation time can be made independent of the length of 
the wire, by fitting larger drivers to longer wires. Larger 
drivers of course occupy more area, but need not take more than 
10% of the area of the wire they drive. By fudging $\lambda$ upwards by
5%, the area of the driver is thus absorbed into the area of its wire."
We have seen that the area of the driver is indeed proportional to the wire
length, but Thompson neglects the fact that charging the gate of the larger
driver will also take time. Our choice of the sequences of exponentially growing
drivers allowed us to do this in a time that is logarithmic in the wire length,
a technique that can work only if we have very few long wires. Thompson's
model also neglects that the drivers have to be at the lowest level, in polysilicon
and diffusion, independent of the level of the wire.
ACKNOWLEDGEMENTS 
The research described in this paper was sponsored by the Office of Naval 
Research Contract No. N00014-76-C-0367 and by the Defense Advanced Research
Agency, ARPA Order number 3771, and monitored by the Office of Naval Research
under Contract number N00014-79-C-0597.
4. REFERENCES 
[1] Chazelle, B. M. and L. M. Monier, "Towards More Realistic Models of
Computation for VLSI," these Proceedings.
[2] Mead, Carver and Lynn Conway, Introduction to VLSI Systems,
Addison-Wesley, Reading MA, 1980.
[3] Mead, Carver and Martin Rem, "Cost and Performance of VLSI Computing
Structures," IEEE J. Solid State Circuits 14, No. 2,
April 1979, pp. 455-462.
[4] Rem, Martin, "Mathematical Aspects of VLSI Design," Proceedings,
Caltech Conference on VLSI, (ed. C. L. Seitz), Computer
Science Department, California Institute of Technology,
Pasadena CA, January 1979, pp. 55-64.
[5] Seitz, Charles L., "Self-Timed VLSI Systems," Proceedings, Caltech
Conference on VLSI, (ed. C. L. Seitz), Computer Science
Department, California Institute of Technology, Pasadena CA,
January 1979, pp. 345-355.
[6] Thompson, C. D., "Area-Time Complexity for VLSI," Proceedings, 11th
Annual ACM Symposium on the Theory of Computing, ACM Special
Interest Group on Automata and Computing Theory with
IEEE Computer Society Technical Committee, Atlanta GA,
May 1979, pp. 81-88.
Towards More Realistic Models 
of Computation for VLSI 
B.M. Chazelle L.M. Monier 
Department of Computer Science
Carnegie-Mellon University 
Pittsburgh, Pennsylvania 15213 
Abstract 
We propose two new models of computation for VLSI which take into consideration the physical nature of
information, the properties of wires, and the geometrical structure of the circuit. Both are refinements of the
Kung-Thompson model, and make the main additional assumption that the propagation time of information
is at best linear in the distance. The first is the more general and applies for any planar technology. It is in a
sense the minimal physical model. The second, more restrictive, is specially tailored for electrical
technologies. Our approach is motivated by the failure of previous models to allow for realistic asymptotic
analysis. For each model, we are able to show new lower bounds and trade-offs for many well-known
problems.
1. Introduction 
The importance of having general models of computation for VLSI is apparent for various reasons. Among 
the chief ones, we must include the need for evaluating and comparing circuit performances, showing lower
bounds and trade-offs on area, time, and energy, and more generally building a complexity theory of VLSI
computation.
While these models must be simple and general enough to allow for mathematical analysis, they must also
reflect reality independently of the size of the circuit. We justify the latter claim by observing that if 1980's
circuits are still relatively small, the use of high-level languages for designing chips, combined with the
possibility of larger integration and bigger chips, will make asymptotic analysis necessary in the near future.
Yet as circuits are pushed to their physical limits, constraints which could be ignored before become major
problems and must be accounted for in the models. In particular, certain physical phenomena specific to
electrical technologies enforce the density of current at any point of a conductor to be bounded. We can show
that this invalidates the assumptions made in previous models, whereby long wires can be driven in constant
time and an f-branch fanout takes O(log f) time [MC80, TH79].
Generally speaking, one major flaw in those previous models is to regard a circuit as a topological 
interconnection of nodes where transmission delays between adjacent nodes can be ignored. Instead, we 
propose to take into account the geometry of the circuit by assuming a propagation delay linear in the 
distance. We can justify this approach by considering parameters such as length and width of wires, and 
associating resistance and capacitance with each part of the circuit. We will define a first model which does
not make further assumptions, and we will review the complexity of some well-known circuits in this model.
However, observing that in NMOS technology, the power can be supplied only from the outside boundary of
the circuit, we can include this requirement and define a second model, which may be more realistic for
electrical planar technologies.
Also, besides presenting new models of computation for VLSI, the purpose of this paper is to present a 
general technique for deriving lower bounds and space-time trade-offs for many problems, e.g., addition and 
transitive functions. 
2. The Models 
2.1. The basic assumptions 
Our models are for the most part refined versions of the current planar models found in the literature
[TH79, BK80, VU80]. A circuit consists of nodes and wires connected in a network, and it is defined by a
geometrical layout of this network. We distinguish I/O nodes where input and output values are available,
the logical nodes (gates) which compute boolean functions, and the connection nodes which simply connect
wires. The circuit is laid out within a convex region with all the I/O nodes lying on its boundary. This is the case
today, and will remain true because of the greater ease in connecting and packaging such chips. In addition,
we make the following set of assumptions, which define our first model (MOD1).
1. Wires have width and spacing between them greater than $\lambda$ (today $\lambda \approx 1\,\mu$m). This requirement
will always be valid for any physical device.
2. A circuit is laid out on a finite number of layers, and wires crossing through different layers are
allowed. Thus there is at most a constant number of cross-overs at any point.
3. The density of current at any point of a wire is bounded by a maximum value $\delta_{max}$, which is
equivalent to saying that the power dissipated per unit volume is also bounded. The major
consequence of this assumption is to make propagation delays at least linear in the distance.
4. To switch a gate requires a minimum energy dissipated as heat [MC80, Ch. 9]; this energy must be
supplied to the gate by a source other than the input signals.
To take into account the limitations in driving power enforced by NMOS and to a lesser extent CMOS
technology, we introduce a second model (MOD2), which in addition to MOD1 includes the following
assumptions.
1. All the energy supplied to the circuit comes from outside the circuit, and its transmission is
performed only through wires. From 3, it follows that the maximum power provided to the circuit
is at most proportional to the perimeter of the circuit.
2. Storing a bit of information requires a minimum energy per unit of time.
Note that since this model is more restrictive than the previous one, all the lower bounds obtained for MOD1
are still valid in MOD2.
2.2. Coding information 
The information at a point is given by the value of an electrical parameter at this point, which we define as
the potential of a capacitor. While electrical computations are essentially analog processes, the coding of
information is made digital by assuming a 0 for a potential less than $V_0$ and a 1 for a potential greater than
$V_1$ ($V_1 > V_0$).
2.3. Wires 
A wire is a rectangular parallelepiped made of conducting material, oriented by the direction of the current.
It is characterized by its length L, its width W, its thickness H, and its distance D from a plane of reference
(the substrate). Its resistance R and its capacitance C are given by the (idealized) relations
$$R = \rho \times L/(W \times H) \qquad C = \varepsilon \times L \times W/D$$
where $\rho$ and $\varepsilon$ are technology-dependent coefficients.
Minimum values for L, W and H are set by the technology (as well as by the laws of physics), and we
require D to be constant for any wire. Moreover it is legitimate always to assume bounded thickness. Indeed
a current density $\delta$ causes a heat power loss in the wire proportional to $L \times W \times H \times \delta^2$, but the dissipated
power is proportional to $L \times W$, since the circuit is planar. For allowing this heat to be dissipated, the thickness
H must remain within constant bounds. Thus we can assume that the resistance is simply proportional to
L/W and the capacitance to $L \times W$.
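One consequence of these idealized relations is worth checking numerically: the product R x C grows with the square of the length and does not depend on the width. The sketch below uses unit coefficients and hypothetical dimensions; the function names are illustrative.

```python
def wire_resistance(length, width, thickness, rho=1.0):
    """Idealized resistance R = rho * L / (W * H)."""
    return rho * length / (width * thickness)


def wire_capacitance(length, width, distance, eps=1.0):
    """Idealized capacitance C = eps * L * W / D."""
    return eps * length * width / distance


def rc_product(length, width, thickness=1.0, distance=1.0):
    """R * C for a wire with bounded thickness H and constant distance D."""
    return (wire_resistance(length, width, thickness)
            * wire_capacitance(length, width, distance))
```

For example, `rc_product(2.0, 1.0)` is four times `rc_product(1.0, 1.0)`, and widening the wire leaves the product unchanged, so a wider wire alone does not shorten the R x C time of a long wire.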
2.4. Nodes 
We distinguish three kinds of nodes, each of which uses up a minimum constant area. 
• Connection nodes: Their purpose is to provide electrical contacts between a bounded number of
wires. These contacts may either connect wires on a same layer, or they may be "vertical contacts"
between different layers. Of course, they introduce no delay and do not dissipate any energy.
• I/O ports: They ensure the exchange of information between the circuit and the outside world.
The locations and the order in which input (resp. output) bits are to be written (resp. read) are
fixed and independent of the values of these bits. We restrict each input bit to be available on the
input port only once. This implies that the repeated use of the same input bit necessitates its
storage within the circuit. The transmission of an information signal through an I/O port
introduces a constant delay.
• Gates: Conceptually, a gate is the device used to compute a logical function of one or two inputs
and one output. Since it can be shown that there is no interest in having gates of arbitrary size, we
assume that all gates have the same size. Physically we must associate a gate capacitance with each
input. An input is valid as soon as the corresponding gate capacitor has been set above or below a
certain threshold potential. The value of the function is given by the potential of the output
device of the gate. Once the output is available, it cannot be destroyed before a constant lapse of
time, whatever input changes occurred in the meantime.
2.5. Current density 
Proposition 1: The density of current is bounded at any point of a conductor by a maximum
value $\delta_{max}$.
One major flaw in previous models is to suppose that a wire of constant width can drive a current of 
arbitrary intensity. We can list at least three reasons in present-day technologies which justify Proposition 1. 
1. Any conductor with non-zero resistance produces a power per unit volume proportional to the 
square of the current density. Since this power can be dissipated only through the boundary of the 
conductor, the heat dissipation is at most proportional to the area of the conductor, which implies 
a bounded density. 
2. An electrical phenomenon called metal migration [MC80, CL80] causes a current to destroy the
conductor all the more quickly as the density is high. For this reason, a maximum admissible
density of current can be assigned to any conducting material.
3. The voltage drop per unit length is proportional to the density of the current. Since we must
ensure that the logical value of the signal provided by power wires is the same at any point of the
circuit, this voltage drop must remain small, and thus the density must be bounded.
For example, the aluminium currently used in NMOS technology has a maximum density imposed by metal
migration of about $10^9$ A/m$^2$, or only 1 mA/$\mu$m$^2$. For this density the voltage drop is 30 V/m with a resistivity
of about $3 \times 10^{-8}\ \Omega\cdot$m. Note that the voltage drop on a 3 mm wire is 0.1 V, and is far from negligible. Also, the
power induced in the wire by such a density of current is about 3 W/cm$^2$, if the thickness is 1 $\mu$m.
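These figures follow from Ohm's law in its local form: the field along the wire is the resistivity times the current density, and the power per unit volume is the resistivity times the square of the current density. The short sketch below redoes the arithmetic in SI units; the variable and function names are illustrative.

```python
MAX_CURRENT_DENSITY = 1e9   # A/m^2, metal-migration limit quoted in the text
RESISTIVITY = 3e-8          # ohm * m, value quoted in the text


def voltage_drop_per_metre(current_density, resistivity):
    """Electric field along the wire, E = rho * J, in V/m."""
    return resistivity * current_density


def surface_power_density(current_density, resistivity, thickness_m):
    """Heat generated per unit of wire surface, rho * J^2 * H, in W/m^2."""
    return resistivity * current_density ** 2 * thickness_m


drop_v_per_m = voltage_drop_per_metre(MAX_CURRENT_DENSITY, RESISTIVITY)
drop_over_3mm_v = drop_v_per_m * 3e-3
power_w_per_cm2 = surface_power_density(MAX_CURRENT_DENSITY, RESISTIVITY, 1e-6) / 1e4
```

The computed values are 30 V/m, 0.09 V over 3 mm (the "0.1 V" of the text after rounding) and 3 W/cm^2 for a thickness of 1 micron.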
3. Transmitting Information 
We turn to the problem of transmitting an information bit from a point A to a point B at a (Euclidean)
distance L apart. We will assume that this information will be carried through an arbitrary path from A to B
consisting of nodes and wires.
We first consider the case where the path consists of a wire followed by a gate. Let $S = H \times W$ be the section
of the wire. In order to transmit a bit of information, we must raise the wire to the required voltage. The
charge Q on the wire is therefore proportional to its capacitance, that is, to $L \times W$. Since in a time T a density of
current crossing a section S can provide at most a charge $\delta_{max} \times S \times T$, the assumption that H is bounded yields
the relation $T = \Omega(L)$.
We next investigate the case where two paths of the previous type are cascaded. Since the first gate cannot 
be switched before the signal becomes available on the first wire, the total delay will amount to the added 
delays of the two paths augmented by the switching time of the first gate. This also results in an $\Omega(L)$ delay.
The last case to examine involves two wires linked by a connection node. We can apply the reasoning used in 
the case of a single wire, with W now being the maximum of the two wire widths. The same result follows 
directly. In the general case, we can decompose an arbitrary path from A to B into components of the form 
previously examined. Putting the above results together permits us to find the claimed lower bound on the 
time. 
In addition, we should notice that some energy is dissipated along the wire during the propagation of 
information since the wire has a non-zero resistance. This energy is proportional to the charge involved,
which is $\Omega(L)$ in any configuration. Observe that this energy is independent of the time T.
Both results permit us to state the following.
Theorem 2: Transmitting a signal between two points at a distance L apart requires $\Omega(L)$ time
and $\Omega(L)$ energy.
Note that this lower bound cannot be achieved with a simple wire: because of the diffusion law
[MC80, SE79], the actual delay is in fact proportional to $R \times C$, that is, to $L^2$. However, we can reduce this delay to
O(L) by using O(L) wires of constant length connected by O(L) gates (e.g., inverters or amplifiers). If the
wires have minimum width, the lower bound $\Omega(L)$ on the energy is also achieved. Note that a simple
speed-of-light argument yields the same result for any technology. This is precisely what makes MOD1 a minimal
planar model for all physical computations.
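The effect of inserting gates along a wire can be illustrated with a toy delay model. The sketch below assumes a wire segment of length x costs k_wire * x^2 and each gate adds a fixed gate_delay; both constants and the unit segment length are hypothetical and serve only to show the change from quadratic to linear growth.

```python
def unbuffered_delay(length, k_wire=1.0):
    """Diffusion-limited delay of a single wire, proportional to L^2."""
    return k_wire * length ** 2


def repeated_delay(length, segment=1.0, k_wire=1.0, gate_delay=1.0):
    """Delay when the wire is cut into constant-length segments joined by gates."""
    segments = max(1, round(length / segment))
    per_segment = k_wire * (length / segments) ** 2 + gate_delay
    return segments * per_segment
```

With these constants, doubling the length from 10 to 20 raises the unbuffered delay from 100 to 400, while the delay with repeaters goes from 20 to 40.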
4. Distributing and Collecting Information 
Throughout this section, we will assume that the model is MOD1 or MOD2, indifferently. To fan-in or
fan-out information being two of the most common operations performed by circuits, we next turn to these
problems, from which we can best measure the significant departure of our models from previous ones. For
simplicity, we first prove a technical lemma.
Lemma 3: There is a constant c ($c = 1/(2\pi)$) such that for any convex polygon with a boundary of
length N and for any point M, there exists a vertex v such that $dist(v, M) \ge cN$.
We omit the proof, which is straightforward.
4.1. Fan-out 
A fan-out of degree N refers to the distribution of an information bit from a source to N points (gates or
ports) on the circuit. To study the complexity of this problem, we distinguish two cases: when the N points lie
on a convex boundary (e.g., on the boundary of the circuit), and when their location is left arbitrary. We
define T (resp. E) to be the minimum time (resp. energy) to perform a fan-out of N points. It is trivial to see
that $E = \Omega(N)$ in both cases, since to reach every node, the information must cross a wire of (at least) unit
length. As for the time T, we have two different results.
Theorem 4: If the N points lie on a convex boundary, $T = \Omega(N)$.
Proof: It follows from Lemma 3 that one of the N destinations is at least cN apart from the
source, and Theorem 2 permits us to conclude. □
Theorem 5: If the N points have arbitrary locations, $T = \Omega(N^{1/2})$.
Proof: A consequence of the fact that the maximum distance between N points and an arbitrary
point in the plane is at least $cN^{1/2}$, for some constant c. □
Note that all these lower bounds are tight, as shown in Figure 4-1.
Figure 4-1: Optimal fan-out, on the boundary and without constraints.
4.2. Fan-in 
The fan-in is essentially the reverse operation of the fan-out, since N information bits must converge from
N sources to one destination point. Yet it is a little more general, since the information may be subjected to
logical operations on its way to the destination. Typically, the problem is to compute a boolean function of N
inputs and one output. Since every gate is followed by a wire of unit length at least, the minimum energy
dissipated during the operation is $E = \Omega(N)$. If the N inputs are valid at the same time, the results are the
same as for the fan-out. In the more general case where pipelining is allowed and the inputs are valid at
arbitrary times, we can show the following.
Theorem 6: If T (resp. A) denotes the minimum time (resp. area) for computing a boolean
function of N inputs, we have $T = \Omega(N^{1/2})$ and $AT = \Omega(N)$.
Proof: Let p denote the total number of input ports actually used. It takes time at least
proportional to N/p to read all the inputs, and since the p ports lie on a convex boundary,
Lemma 3 and Theorem 2 show that $T = \Omega(p)$. Observing that $A = \Omega(p)$, the result is then
immediate. □
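The final step of this proof can be spelled out as follows. This is only a sketch: the constants hidden by $\Omega$ are dropped, and p is the number of input ports used, as in the proof.

```latex
\begin{align*}
T &= \Omega\!\left(\max\{N/p,\; p\}\right) = \Omega\!\left(N^{1/2}\right)
  && \text{since } \max\{N/p,\, p\} \ge \sqrt{(N/p)\cdot p} = N^{1/2},\\
AT &= \Omega(p)\cdot\Omega(N/p) = \Omega(N).
\end{align*}
```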
Note that these lower bounds are still valid for boolean functions with an arbitrary number of outputs, as long
as at least one output depends on all the input values. Addition, for example, falls in that category, since
the last carry depends upon all the operand bits. If the boolean function is a commutative, associative
operation on N variables, these lower bounds are tight, as shown in Figure 4-2.
Figure 4-2: Computing $Y = a_1 \;\mathrm{op}\; a_2 \ldots \mathrm{op}\; a_N$
takes $\Omega(N^{1/2})$ time and area.
5. New Lower Bounds for some Common Problems
5.1. Addition
Since our models relate the time of computation to the geometry rather than the topology of the circuit, we
can show that many schemes based on complete binary trees cease to have the logarithmic time complexity which
they enjoy in previous models. Notable examples include the fan-in and fan-out operations studied earlier, and
the addition of two N-bit integers, to which we next turn our attention. We study this problem in our two
models in turn. For simplicity, we start the analysis with the model MOD2, and present the basic arguments. 
5.1.1. Case MOD2 
Theorem 7: If T is the time required in MOD2 by any circuit to add two N-bit integers, and if A
is the area of the circuit, we have
1) $AT = \Omega(N)$
2) $T = \Omega(N^{2/3})$
Proof: For the sake of simplicity, we will assume in this proof that the sign "=" really means
"equals to within a constant factor". Relation 1) follows directly from the fact that adding two N-bit
integers involves a fan-in of degree N. To prove 2), let us call X one of the operands and Y the
result of the addition. Since we can always assume that low order bits are read first, we can rewrite
X as $X_p \ldots X_2 X_1$, where $X_i$ are the bits of X read at time $t_i$, with $t_1 < \ldots < t_p < T$. With $X_i$ denoting both the
chain of bits and its length, we have the relations
(1) $X_1 + \ldots + X_p = N$ (2) $T \geq p$.
Let X(t) be the total number of bits of X read so far at time t, and let Y(t) be the total number of
result bits output in this interval of time. Since at time t, the total number of possible values for
the remaining output bits is at least $2^{N-Y(t)}$, and only N-X(t) bits of X remain to be read, the circuit
must have at least X(t)-Y(t) active gates at time t. This requires a circuit perimeter $n \geq X(t)-Y(t)$
and a time n, hence the relations
(3) $X(t) - Y(t) \leq n$ (4) $T \geq n$.
Since low-order input bits are read first and a fan-in on k bits takes $\Omega(k)$ time, at least N-X(t)
output bits remain to be computed at time $t + X_i$. We can give a geometric interpretation of this
relation as shown in Figure 5-1. Relation (3) implies that the endpoints of the intersection of the
shaded area of Fig. 5-1 with a vertical line are at most n apart. It follows that the shaded area must
lie within the strip $(L_1, L_2)$, which in turn implies
$X_1^2 + \ldots + X_p^2 \leq Tn$.
Since $X_1^2 + \ldots + X_p^2$ is minimal when all the $X_i$'s are equal, we derive the relation $N^2/p \leq Tn$,
which combined with relations (2) and (4) yields
$T \geq N^2/(pn) + p + n$.
The minimum of the right-hand side is achieved for $p = n = N^{2/3}$, which concludes the proof. □
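The claim that the right-hand side is minimized at $p = n = N^{2/3}$ can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, constant factors are dropped as in the proof, and the search is restricted to integer values of p and n.

```python
def addition_time_bound(N, p, n):
    """Right-hand side N^2/(p*n) + p + n of the final inequality (constants dropped)."""
    return N * N / (p * n) + p + n


def best_parameters(N, limit):
    """Exhaustively search integer p, n in [1, limit] for the smallest bound."""
    best = None
    for p in range(1, limit + 1):
        for n in range(1, limit + 1):
            value = addition_time_bound(N, p, n)
            if best is None or value < best[0]:
                best = (value, p, n)
    return best


if __name__ == "__main__":
    N = 1000  # chosen so that N^(2/3) = 100 is an integer
    value, p, n = best_parameters(N, limit=300)
    print("p =", p, "n =", n, "bound =", value)
```

For N = 1000 the search settles on p = n = 100, where the bound equals 300, i.e. three times $N^{2/3}$.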
Note that the lower bound on AT is trivially tight, since there exist linear-time constant-area adders. We do
not believe that this is the case with the lower bound given for T. We conjecture that $T = \Omega(N)$ is the actual
lower bound in this model, which would make the simplest adder in the world asymptotically optimal.
5.1.2. Case MOD1
It is natural that lower bounds obtained in MOD1 should be weaker than in MOD2. However, MOD1 has
the merit of greater generality, and any lower bound in this model is thus very interesting.
Theorem 8: If T is the time required in MOD1 by any circuit to add two N-bit integers, and if A
is the area of the circuit, we have
$T = \Omega(N^{1/2})$, $AT = \Omega(N)$, $AT^2 = \Omega(N^2)$.
Proof: The first two relations result from the fact that adding two N-bit integers involves a fan-in
of degree N. Indeed the last carry is a fan-in of all the input bits. We can prove the last relation
with the same technique used above. Keeping the same notation, we find that $X(t) - Y(t) \leq A$, since
at any time t the number of bits stored in the circuit is at least X(t)-Y(t). On the other hand, Y(t)
always lies below the shaded area of Fig. 5-1. It then follows that the total area of the shaded region
Figure 5-1: The $\Omega(N^{2/3})$ time lower bound on integer addition. (Axis labels: t up to T; N.)
cannot exceed the area of the parallel strip $(L_1, L_2)$, hence
$X_1^2 + \ldots + X_p^2 \leq AT$.
The minimum is achieved for $X_i = N/p$, and since the time for reading the data is proportional to
p, we find $AT^2 = \Omega(N^2)$, which completes the proof. □
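The last two steps of this argument can be written out as the following chain of inequalities. This is a sketch with constant factors dropped; it uses only the relations stated in the proof.

```latex
\begin{align*}
N^{2} = \Bigl(\sum_{i=1}^{p} X_i\Bigr)^{2} &\le p \sum_{i=1}^{p} X_i^{2} \le p\,AT
  && \text{(Cauchy--Schwarz, then the strip bound)},\\
AT^{2} &\ge \frac{N^{2}}{p}\,T \ge N^{2}
  && \text{(since } T \ge p\text{)}.
\end{align*}
```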
5.1.3. Optimal adders in model MOD1
A fortunate feature of addition in model MOD1 is that all the lower bounds derived above can be matched.
We will describe a class of adders which satisfy these properties.
Serial Adder: The simplest adder requires constant area, operates in linear time, and thus matches the lower
bound for the measures AT and $AT^2$. The scheme of this adder is represented in Fig. 5-2.
CLA Adder: Assuming w.l.o.g. that N is a power of two, we implement the CLA scheme on a complete binary
tree with N leaves. The operand bits are read in parallel at the leaves, and the time of computation is at least
the time for propagating a signal across the longest path in the tree. It follows that the layout of Fig. 5-3
requires O(N) time and O(N log N) area.
Figure 5-2: The Serial Adder. 
Figure 5-3: The CLA Adder. 
If the technology allows the packing of k information bits on a square of area O(k) (this excludes NMOS,
for example), an alternate layout may use the H-embedding of a binary tree, as shown in Fig. 5-4. The operands
may be driven from the input ports to the leaves of the tree in about $N^{1/2}$ waves of $2N^{1/2}$ bits. Unfortunately,
each wave consists of a complicated (but fixed) sequence of input bits. If we do not account for the task which
arranges the input bits in the proper order, and if we use inverters to avoid long wires (see Section 3), adding
two N-bit integers simply takes $O(N^{1/2})$ time and linear area, which matches the lower bounds obtained for T
and $AT^2$.
Mixed CLA Adder: In some applications the size of the operands greatly exceeds that of the circuit, and
only, say, $N^a$ input ports are available. In this case, we can divide the operands into roughly $N^{1-2a}$ groups of
$N^{2a}$ bits, and compute the addition for each group with a CLA adder of area $N^{2a}$, transmitting the carry for
the next addition every time around. The total time of computation will thus be $O(N^{1-a})$, with a circuit of
area $O(N^{2a})$. Note that the lower bound $AT^2 = \Omega(N^2)$ is still matched with this scheme. Also, we observe that for
a = 1/2 we have the CLA adder, whereas setting a to zero reduces to the serial adder.
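The bookkeeping behind the mixed CLA adder can be checked with exact exponent arithmetic. The sketch below is illustrative: the function name is hypothetical, constant factors are ignored, and the per-group time $N^a$ is taken from the square-root-time layout of the CLA adder described above, applied to $N^{2a}$ bits.

```python
from fractions import Fraction


def mixed_cla_exponents(a):
    """Exponents of N for the mixed CLA adder with N**a input ports.

    Returns (area_exp, time_exp, at2_exp), meaning area = N**area_exp,
    time = N**time_exp and area * time**2 = N**at2_exp, up to constants.
    """
    a = Fraction(a)
    if not 0 <= a <= Fraction(1, 2):
        raise ValueError("a must lie between 0 (serial adder) and 1/2 (CLA adder)")
    groups_exp = 1 - 2 * a          # number of groups: N**(1 - 2a)
    area_exp = 2 * a                # one CLA adder on N**(2a) bits
    per_group_time_exp = a          # assumed: that adder runs in time N**a
    time_exp = groups_exp + per_group_time_exp   # total time: N**(1 - a)
    at2_exp = area_exp + 2 * time_exp            # exponent of A * T**2
    return area_exp, time_exp, at2_exp


if __name__ == "__main__":
    for a in (0, Fraction(1, 4), Fraction(1, 2)):
        print(a, mixed_cla_exponents(a))
```

Whatever value of a is chosen in the allowed range, the exponent of $AT^2$ comes out as 2, which is the sense in which the scheme matches the lower bound.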
Figure 5-4: Optimal layout of the CLA adder. (Figure label: "Inputs and Outputs".)
5.2. Transitive Functions 
In a recent paper [VU80], J. Vuillemin has shown that the transitivity of a function has heavy consequences
on its complexity in a VLSI model. Roughly speaking, a function is said to be transitive of degree N if it
computes a transitive group of permutations acting on N elements. This implies that the function can map
any input bit onto any output bit for an appropriate value of the other inputs. Such functions include cyclic
shifts, integer products, convolutions, linear transforms, and some matrix products.
5.2.1. Case MOD1
Even in our more general model, we can show a significant difference with previous results [PV79, VU80].
Theorem 9: Computing a transitive function of degree N takes time $T = \Omega(N^{1/2})$.
Proof: Let p be the number of output ports actually used. Since an input bit can be mapped
onto any output port, Lemma 10 shows that for some value of the inputs, the computation will
take time at least proportional to p. On the other hand, observing that it takes time at least
proportional to N/p to output the result completes the proof. □
It is worthwhile to notice the serious gap existing between this model and the previous ones, which allowed
for logarithmic time for computing transitive functions (e.g. the CCC scheme [PV79]).
5.2.2. Case MOD2 
It comes as no surprise that since our second model adds physical constraints to the one in which Vuillemin
derived his lower bounds, we can significantly improve upon his results. Before proceeding, we will establish a
preliminary result.
Lemma 10: If N gates in a circuit are switched at the same time, their convex hull has a
perimeter $\Omega(N)$.
Proof: Since all the power comes from outside the circuit and is transmitted through wires, the
power inside any convex region of the circuit is at most proportional to its perimeter. Since switching a
gate requires a minimum energy, the result is straightforward. □
We can now prove our main result. 
Theorem 11: Any circuit of area A which computes a transitive function of degree N in time T
satisfies $A = \Omega(N)$, $T = \Omega(N)$.
Proof: It has been shown in [VU80] that the circuit must have the capability of memorizing N
bits. Therefore Lemma 10 implies that the circuit must have two active gates $G_1$ and $G_2$ at a
distance $\Omega(N)$ apart, hence $A = \Omega(N)$. We can always assume that for some values of the inputs,
information will be transmitted from $G_1$ to an output port $P_1$ (same with $G_2$ and an output port
$P_2$). Consider now an arbitrary input port R. Since the function is transitive, there exists a path in
the circuit from R to $P_1$ and from R to $P_2$. Among all possible computations, the four paths $G_1$-$P_1$,
$G_2$-$P_2$, R-$P_1$, and R-$P_2$ will be used at least once. From Theorem 2, it then follows that T is at
least proportional to $\max\{G_1P_1, G_2P_2, RP_1, RP_2\}$. The sum of these four lengths is greater than
$G_1G_2 = \Omega(N)$ (see Fig. 5-5), which concludes the proof. □
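The closing step of this proof is an application of the triangle inequality along the path from $G_1$ to $G_2$ through $P_1$, R and $P_2$. A sketch, writing each pair of points for the distance between them:

```latex
\begin{align*}
\Omega(N) = G_1G_2 &\le G_1P_1 + P_1R + RP_2 + P_2G_2\\
                   &\le 4\,\max\{G_1P_1,\; G_2P_2,\; RP_1,\; RP_2\},
\end{align*}
```

so the largest of the four lengths is itself $\Omega(N)$, and Theorem 2 then gives $T = \Omega(N)$.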
Figure 5-5: Computing a transitive function requires linear time.
Remark: In MOD2, these lower bounds are tight for some problems; for example, optimal circuits for
performing integer multiplication, based on the Shift&Add scheme, can be found.
6. Conclusions 
The major contribution of this paper has been to show how previous models fail to allow for asymptotic
analysis. We have proposed two models of computation which are more realistic yet fairly simple. Since our
models are essentially geared towards asymptotic analysis, previous models may turn out to be more accurate
for circuits of small size. For example, the carry-look-ahead scheme for adding two N-bit integers actually
requires at least $\Omega(N^{1/2})$ time in our models instead of the well-known logarithmic time, but it may still be
superior to any naive circuit for small integers.
Further refinements of these models should be valid independently of size considerations, and should allow
for à la Knuth analyses of VLSI circuits. It is still difficult to think of a technology-independent model at the
present time. But it may be a prerequisite for building a complexity theory which faithfully reflects reality.
Acknowledgments 
We wish to thank Jean Vuillemin for suggesting this research and Mike Foster for many fruitful
discussions. Our thanks also go to H.T. Kung and Gerard Baudet, who shared our interest in this work.
References 
[BK80] R.P. Brent and H.T. Kung, The Chip Complexity of Binary Arithmetic, Proc. 12th Annual ACM
Symposium on Theory of Computing, ACM, pp. 190-200, May 1980.
[CL80] W.A. Clark, From Electron Mobility to Logical Structure: A View of Integrated Circuits, Computing
Surveys, Vol. 12, No. 3, September 1980.
[MC80] C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980.
[PV79] F.P. Preparata and J. Vuillemin, The Cube-Connected-Cycles: A Versatile Network for Parallel
Computation, Proc. 20th Annual Symposium on Foundations of Computer Science, Oct. 1979.
[SE79] C.L. Seitz, Self-timed VLSI Systems, Proc. of Caltech Conf. on VLSI, 1979.
[TH79] C.D. Thompson, Area-Time Complexity for VLSI, Proc. 11th Annual ACM Symposium on Theory
of Computing, ACM, pp. 81-88, May 1979.
[VU80] J. Vuillemin, A Combinatorial Limit to the Computing Power of V.L.S.I. Circuits, Proc. 21st Annual
Symposium on Foundations of Computer Science, Oct. 1980.
DESIGN DISCIPLINES SESSION 
A LOGIC DESIGN THEORY FOR VLSI* 
by 
John P. Hayes 
Digital Integrated Systems Center 
and 
Departments of Electrical Engineering and Computer Science 
University of Southern California 
Los Angeles, California 90007 
ABSTRACT 
Classical switching theory fails to account for some key structural and
logical properties of the transistor circuits used in VLSI design. This paper
proposes a new logic design methodology called CSA theory which is suitable
for VLSI. Three kinds of primitive logic devices are defined: connectors
(C), switches (S), and attenuators (A); the latter have the characteristics
of pullup/pulldown components. It is shown that four new logic values
are required, in addition to the usual Boolean 0 and 1 values. These values
introduce a concept of gain or drive capability into logic design; they also
account for the high-impedance state of tri-state devices. The elements of
CSA theory and its application to some basic VLSI design problems are described.
It is demonstrated that CSA theory provides a more powerful and more
rigorous replacement for the mixed logic/electronic methods currently used in
VLSI design.
This research was supported by the National Science Foundation under Grant 
No. MCS78-26153, and by the Naval Electronic Systems Command under Contract 
No. N00039-80-C-0641. 
1. INTRODUCTION
The development of very large-scale integrated (VLSI) circuits using the
philosophy espoused by Mead and Conway [1] and others involves a complex
interplay of various design techniques at the electronic, logical and systems
levels. These techniques are ad hoc for the most part, with the result that
the VLSI designer is mainly guided by experience rather than theory. It might
be expected that the large body of results in switching theory and logic design
that has accumulated over the past 40 years can readily be applied to
VLSI design, at least for analysis purposes if not for synthesis. This does
not appear to be the case, however. Several reasons may be cited for this.
(1) The basic component of VLSI circuits is the MOS transistor whose
logical behavior is that of a three-terminal digital switch. Neither of the
classical models from switching theory, branch-type networks (also called
contact networks) or gate-type networks [2], adequately captures the structure
or logical behavior of MOS transistor circuits. The primitive components of
gate-type circuits are logic gates which allow signal transmission in one
direction only. An MOS transistor, on the other hand, is inherently bidirectional.
The components of branch-type networks are (relay) contacts. A contact
is bidirectional, but unlike a transistor, it is basically a two-terminal
device.
(2) Classical switching theory hides some types of logic devices that
have a significant impact on integrated circuit design and layout. For example,
it does not recognize the important role played by connectors in logical
behavior. Connections to power and ground are omitted from standard logic
diagrams, yet they are the sources of the logical 0 and 1 values on which the
logical operation of all circuits depends. The selection and layout of connectors
is a central issue in VLSI design. Components like amplifiers and
pullup/pulldown loads, which are crucial to proper logical or digital operation,
are also invisible at the standard logic level. To see these devices
we must move to the more detailed electronic or analogue level.
(3) Only the two logical values 0 and 1 are recognized in standard
switching theory. However, in modern design practice extensive use is made of
at least one additional logic value, the high-impedance state Z. Indeed it
has been suggested that MOS technology is inherently a three-state technology [3].
At the present time, the usual remedy for the foregoing difficulties is
to combine design methods from switching theory and electronic circuit theory
heuristically. This results in "mixed" circuit diagrams which couple logic
gates, transistors, etc. in a manner that, strictly speaking, is meaningless.
It also causes important logic design techniques such as wired logic and
tri-state logic to be treated as anomalous special cases.
In this paper a new logic design theory is introduced that attempts to
overcome the difficulties cited above. The key components of this theory are
connectors (C), switches (S) and attenuators (A); we therefore refer to it as
CSA theory. A switch here is a three-terminal device that can accurately model
the digital operation of a PMOS or NMOS transistor. An attenuator models
a pullup or pulldown load; it has no counterpart in classical switching theory.
Central to our approach is a six-valued logic which, in addition to the usual
"strong" Boolean values 0 and 1, has "weak" versions of these values denoted
by $\tilde{0}$ and $\tilde{1}$. This weak/strong signal dichotomy allows the electronic concepts
of signal amplification and attenuation to be transferred to the logic level.
The high-impedance state Z is also treated as an explicit logic value. CSA
theory provides a uniform and consistent alternative to the mixed design approach
mentioned earlier. It can be used to analyze both branch- and gate-type
networks, as well as such nonclassical structures as wired logic and
tri-state logic.
Section 2 presents an informal development of the basic concepts of CSA
theory. In Sec. 3 these ideas are presented in more rigorous and complete
fashion. Finally in Sec. 4, CSA theory is applied to the design of a simple
but important class of circuits, namely inverters.
2. INFORMAL DEVELOPMENT
In this section the basic concepts of connector and switch are examined
in detail. We show that six logical values are needed for an adequate description
of their behavior, as well as a new logic device which we call an
attenuator.
Connectors 
In classical switching theory the only logical operation associated with
a connector or wire is the trivial identity function $v \to v$. At the CSA level
of complexity, connectors are seen as the fundamental devices for performing
nontrivial operations of the AND and OR type. To demonstrate this, we first
need a precise definition of a connector and its behavior. A terminal T is a
designated connection point in a network; it is denoted by a black dot in logic
diagrams as shown in Fig. 1a. A simple connector is a continuous conducting
path between two terminals. It may represent a metal, diffusion or polysilicon
conductor in an integrated circuit, and is represented by a line as in
Fig. 1b. A (complex) connector is a linked set of simple connectors; Fig. 1c
shows an example. Any point in a connector may be designated a terminal,
therefore a connector can be viewed as simply a sequence of contiguous terminals.
Fig. 1 (a) A terminal T. (b) A simple connector. (c) A general connector C.
Let V be the set of logic values or signals of interest. V contains the
usual Boolean constants 0 and 1; additional values will be added later. With
every connector C we associate a set of input values $v_{in}(C)$ taken from V. The
$v_{in}(C)$ values are typically derived from external signal sources that are connected
to C. Thus in the connector of Fig. 2a the external signal sources are
indicated by arrows, and the input signal set is $v_{in}(C) = \{v_1, v_2, v_3, v_4, v_5, v_6\}$.
Fig. 2 (a) Input-output signals of the connector C. (b) An equivalent terminal T.
While several different input values may be applied to a connector simultaneously,
we assume that the connector produces a unique output value v(C) at
all its terminals, where $v(C) \in V$. Thus if the physical signals associated
with C are voltages, then C has the equipotential property of a perfect electrical
conductor. It follows that a complex connector can always be replaced
by a single terminal as illustrated in Fig. 2b.
Suppose that the input values $v_1, v_2 \in V$ are applied to connector C. For
logical consistency and completeness, we require v(C) to be defined uniquely
for all possible combinations of $v_1$ and $v_2$. We can write
$v(C) = \#(v_1, v_2)$
where # denotes the connection function implemented by C. Let $v_1$ and $v_2$ assume
the values 0 and 1. If $v_1 = v_2$, then we expect the following equations to
hold:
$\#(0, 0) = 0$
$\#(1, 1) = 1$          (1)
If $v_1 \neq v_2$ is allowed (this is normally considered to be improper behavior in
binary switching circuits), then we need a third logic value which we denote
U. U (for unknown) has frequently been used in logic simulation programs to
model signal values during transitions between 0 and 1, and the values associated
with uninitialized states [4]. We use U in the sense of a conflict
value that results in the connector behavior defined by the following set
of equations:
$\#(0, 1) = \#(0, U) = \#(1, U) = \#(U, U) = U$          (2)
Next we consider the notion of a switch as a controlled connector, and show
that it requires the introduction of three additional logic values.
Switches 
A switch S is defined here as a three-terminal device with a "control"
terminal K and two symmetric "data" terminals D1 and D2. It is represented
by the circuit symbol of Fig. 3a. The set of values V(D) is assigned to D1
and D2. The set V(K) containing the two values ON and OFF is assigned to K.
Later we will equate V(K) and V(D). When v(K) = ON, D1 and D2 are joined by a
connector as in Fig. 3b. When v(K) = OFF, there is no connection between D1
and D2 via the switch. Examples of switches that can be easily made to conform
to this model are a manual on-off switch, a single-contact relay, and an
MOS transistor.
Fig. 3 (a) An isolated switch S, and (b) its behavior.
Suppose that an isolated switch S is to be used to control the signal
value appearing at one of its data terminals, say D2. Intuitively the following
type of behavior is expected:
v(K) = ON implies v(D2) = 1          (3)
v(K) = OFF implies v(D2) = 0          (4)
If v(K) = ON, then v(D1) = v(D2), so we can satisfy (3) by applying the constant
1 to D1 as shown in Fig. 3b. When v(K) = OFF, however, D2 becomes an isolated
terminal, and it "floats" to a value that is distinct from 0, 1 and U. We
therefore introduce a new logical value Z to denote v(C) when C is an isolated
connector. Z corresponds to the usual high-impedance state of tri-state logic.
It is a weak value in the sense that it can be overridden by each of the
logic values 0, 1 and U. This suggests that Z should satisfy the following
set of equations:
$\#(0, Z) = 0$
$\#(1, Z) = 1$
$\#(U, Z) = U$          (5)
$\#(Z, Z) = Z$
To satisfy (4) above, we can attempt to apply to D2 an external signal
v that forces v(D2) to 0. Thus when v(K) = OFF, we require
$\#(v, Z) = 0$          (6)
To satisfy (3) at the same time requires
$\#(v, 1) = 1$          (7)
assuming that 1 has been applied to the input data terminal D1. It is easily
seen that none of the values 0, 1, U, Z can satisfy equations (1), (2), (5),
(6) and (7) simultaneously. Thus we introduce a fifth logical value denoted
$\tilde{0}$ which, like 0, is an acceptable "0-like" value in the realization of Boolean
functions. If we replace (6) and (7) by
$\#(\tilde{0}, Z) = \tilde{0}$
and
$\#(\tilde{0}, 1) = 1$          (8)
respectively, no contradiction results. Now (8) implies that 1 overrides $\tilde{0}$
when both are applied to the same connector, hence $\tilde{0}$ is a weak 0-like value,
that is, a value with low (logical) drive capability. In a similar manner,
we define a weak 1-like value denoted $\tilde{1}$. The foregoing analysis suggests that
$\tilde{0}$ and $\tilde{1}$ should satisfy the following set of equations involving the connection
operator #:
$\#(\tilde{0}, Z) = \#(\tilde{0}, \tilde{0}) = \tilde{0}$
$\#(\tilde{1}, Z) = \#(\tilde{1}, \tilde{1}) = \tilde{1}$
$\#(\tilde{0}, 0) = \#(\tilde{1}, 0) = 0$          (9)
$\#(\tilde{0}, 1) = \#(\tilde{1}, 1) = 1$
$\#(\tilde{0}, \tilde{1}) = \#(\tilde{0}, U) = \#(\tilde{1}, U) = U$
We have just seen that i f a swi tch i s used to tran smit 0-like and 1-like 
,..., ,..., 
signal s , we need two new values 0 and 1 that can be applied externally to its 
data terminals. We now define a new logic element ca lled an attenuator whose 
,..., 
function i s to generate 0 and 1. An attenuator i s a unidirectional two-
terminal device whose output is 0 or 1 when 0 or 1 respectively are applied to 
its input terminal. Figure 4 shows the symbol used for an attenuator, as well 
as its typical use to force the output of a switch to have values from the set 
,..., ,..., 
{0,1,0,1]. The circuit of Fig. 4 i s thus a complete switching circuit that 
meets the original behavior specifications suggested by (3) and (4). 
It is apparent from Fig. 4 that an attenuator is a device that can pull 
an isolated connector up (from Z to l) or down (from Z to 0). It thus models 
Fig. 4 Typical application of an attenuator.
the behavior of a pullup or pulldown device in an electronic circuit. This
may be a resistor or, in the case of VLSI circuits, a load transistor. Since
it also converts "strong" to "weak" signals, an attenuator can be regarded
as a digital impedance.
The final primitive component we need is an amplifier that converts $\tilde{0}$
and $\tilde{1}$ to 0 and 1 respectively. Standard amplifying devices perform this function
satisfactorily, hence we denote our amplifiers by the standard triangle
symbol of Fig. 5a. Note that an attenuator is the inverse of an amplifier, a
fact that guided our choice of symbol for an attenuator. The attenuator symbol
contains a reversed amplifier, and also suggests an impedance or load element.
In accordance with normal logic design practice we insert a small circle
in a line to denote the following nonamplifying inversion operation:
$1 \to 0$, $0 \to 1$, $\tilde{1} \to \tilde{0}$, $\tilde{0} \to \tilde{1}$, $U \to U$, $Z \to Z$.
Thus an inverting amplifier can be represented as shown in Fig. 5b.
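The three unary devices described so far can be written as lookup tables. This is a sketch under stated assumptions: "0~" and "1~" encode the weak values, all names are illustrative, and the attenuator and amplifier tables list only the inputs this section specifies, so any other input raises an error instead of guessing a result. The inverting amplifier is modeled as an amplifier followed by the inversion, which is one reading of the two symbols in Fig. 5b.

```python
ATTENUATE = {"0": "0~", "1": "1~"}      # strong -> weak
AMPLIFY = {"0~": "0", "1~": "1"}        # weak -> strong
INVERT = {"1": "0", "0": "1", "1~": "0~", "0~": "1~", "U": "U", "Z": "Z"}


def device_output(table, value):
    """Look up a device's output; inputs not covered by the table raise KeyError."""
    return table[value]


def inverting_amplifier(value):
    """Amplify a weak value, then apply the non-amplifying inversion."""
    return device_output(INVERT, device_output(AMPLIFY, value))


if __name__ == "__main__":
    weak_zero = device_output(ATTENUATE, "0")
    print(weak_zero, "->", inverting_amplifier(weak_zero))
```

Keeping the tables partial makes it visible which behaviors are taken from the text and which would have to be assumed.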
Fig. 5 (a) A non-inverting amplifier. (b) Inverting amplifiers.
3. THEORY 
We now present a formal description of CSA theory. A CSA network is composed
of four basic component types: n-terminal connectors, three-terminal
switches, and two-terminal amplifiers and attenuators. Components are connected
via their terminals, where a terminal is the simplest connector. The behavior
of a network is determined by the output signal values of its terminals.
A set of six logical signal values is recognized: $V_6 = \{0, 1, \tilde{0}, \tilde{1}, U, Z\}$. The
behavior of all CSA component types is completely defined in terms of $V_6$.
Connection Function 
It is useful to introduce a concept of relative strength among the members
of $V_6$.
Definition 1: Let $v_1, v_2 \in V_6$. $v_1$ is (logically) stronger than
$v_2$, denoted $v_1 \geq v_2$, if $\#(v_1, v_2) = v_1$ where # is defined by Eqns.
(1), (2), (5) and (9).
The relation $\geq$ imposes a partial ordering on $V_6$ which is depicted graphically
in Fig. 6. Clearly U is stronger than all other members of $V_6$, while Z
is weaker than all other values. The values 0 and 1 are not related by $\geq$
Fig. 6 Relative strength of the logic values in $V_6$ (from strongest to weakest).
because $\#(0, 1) = U$; these values are said to be contradictory. Similarly $\tilde{0}$
and $\tilde{1}$ are contradictory. Using the foregoing notions, we can now generalize
Eqns. (1), (2), (5) and (9) to obtain the following concise definition of the
behavior of a connector.
Definition 2: (The k-place connection function #) Let C be a connector to
which the input signals $v_1, v_2, \ldots, v_k \in V_6$ are applied; cf. Fig. 2. C generates
a unique output signal $v(C) = \#(v_1, v_2, \ldots, v_k) \in V_6$ defined as follows. If
$v_1, v_2, \ldots, v_k$ contain no contradictory values, then
$\#(v_1, v_2, \ldots, v_k) = v_i$
where $v_i \geq v_j$ for all j = 1, 2, ..., k. If $v_1, v_2, \ldots, v_k$ contain contradictory
values, then
$\#(v_1, v_2, \ldots, v_k) = U$
The action of # on the members of $V_6$ determines the interpretation of
these logical quantities in practical digital circuits. Z is the logical value
of an isolated connector, and also corresponds precisely to the high-impedance
state used in tri-state circuits. 0 and 1 correspond to the usual
Boolean values 0 and 1. Here, however, they are seen as strong signals
that can override their weaker counterparts $\tilde{0}$ and $\tilde{1}$. Thus 0 and 1 denote signals
with high drive capability, such as power, ground, and amplifier output
signals. $\tilde{0}$ and $\tilde{1}$ represent weak signals that have relatively low drive capability;
such signals are typically produced by passive load devices. Note
that the $\tilde{0}$ and $\tilde{1}$ signals are easily mapped onto 0 and 1 respectively by passing
them through an amplifier. Thus from the viewpoint of implementing Boolean
functions we may choose either 0 or $\tilde{0}$ to represent Boolean zero, and either
1 or $\tilde{1}$ to represent Boolean one. U represents a conflict resulting from the
simultaneous application of contradictory Boolean values of equal strength to
a connector. U is not normally encountered in properly designed or "well-behaved"
circuits.
The standard Boolean operations AND and OR which do not require inversions
can be implemented by means of a connector alone. Suppose, for example,
that k devices have their output terminals $z_1, z_2, \ldots, z_k$ joined by a connector
C. Let the values $v_1, v_2, \ldots, v_k$ applied to C via the $z_i$ terminals be restricted
to the subset $\{\tilde{0}, 1\}$ of $V_6$. Then v(C), which is defined by Def. 2, implements
the OR function. This is because a (strong) 1 applied to any terminal
of C overrides a (weak) $\tilde{0}$ applied to any other terminal of C. Similarly, when
the $v_i$'s are confined to $\{0, \tilde{1}\}$, C implements the AND function. It is believed
that this type of "wired logic" underlies the behavior of all switching circuits,
including both branch and gate-type circuits.
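The wired OR and wired AND described above can be tried out with a small model of the k-input connection function. The sketch below follows one literal reading of Definition 2 (output U when a contradictory pair is present, otherwise the strongest input); the string encoding with "0~" and "1~" for the weak values, the strength table, and all function names are assumptions made for illustration.

```python
STRENGTH = {"Z": 0, "0~": 1, "1~": 1, "0": 2, "1": 2, "U": 3}
CONTRADICTORY = [{"0", "1"}, {"0~", "1~"}]


def connect_all(values):
    """k-input connection function, read literally from Definition 2."""
    present = set(values)
    if not present:
        return "Z"  # assumption: a connector with no inputs is isolated
    if any(pair <= present for pair in CONTRADICTORY):
        return "U"
    # No contradictory pair, so the strongest value present is unique.
    return max(present, key=STRENGTH.get)


def wired_or(bits):
    """Encode true as strong 1 and false as weak 0; the connector computes OR."""
    return connect_all("1" if b else "0~" for b in bits) == "1"


def wired_and(bits):
    """Encode false as strong 0 and true as weak 1; the connector computes AND."""
    return connect_all("1~" if b else "0" for b in bits) == "1~"


if __name__ == "__main__":
    for bits in [(0, 0, 0), (0, 1, 0), (1, 1, 1)]:
        print(bits, wired_or(bits), wired_and(bits))
```

The only difference between the two gates is which Boolean value is given the strong encoding, which is the point the paragraph makes.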
CSA Networks 
A switch, as indicated by Fig. 3, contains a control terminal K and two
symmetric data terminals D1 and D2. The switch is homogeneous if the values
that all three terminals can assume are identical, that is, V(K) = V(D). A
manual on-off switch is not homogeneous because v(K) and v(D) are defined in
incompatible mechanical and electrical domains, respectively. An MOS transistor
is homogeneous if the K input (the gate) and the D inputs (the source and
drain) all employ the same digital voltage levels. An inhomogeneous switch is
useful as a transducer between physically incompatible signal domains. Here
we will restrict our attention to homogeneous switches where all three terminals
may assume values from $V_6$.
Figure 7 defines the behavior of the most basic switch in terms of $V_6$.
The switch is turned on and off by the K values 1 and 0, respectively; there is





























































































































