Cost and performance of VLSI computing structures by Mead, Carver A. & Rem, Martin
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. SC-14, NO. 2, APRIL 1979 455
Cost and Performance of VLSI Computing Structures
CARVER A. MEAD AND MARTIN REM
Abstract-Using VLSI technology, it will soon be possible to imple-
ment entire computing systems on one monolithic silicon chip. Con-
ducting paths are required for communicating information throughout
any integrated system. The length and organization of these communi-
cation paths place a lower bound on the area and time required for
system operations. Optimal designs can be achieved in only a few of
the many alternative structures. Two illustrative systems are analyzed
in detail: a RAM-based system and an associative system. It is shown
that in each case an optimum design is possible using the area-time
product as a cost function.
I. INTRODUCTION
THE SILICON integrated-circuit technology is evolving continuously
toward smaller elementary devices and denser, more complex functions
on each single silicon chip. It appears that new processing and
lithographic techniques will make possible the fabrication of chips
containing \(10^7\) or \(10^8\) individual transistors. One such chip
will contain more function than today's largest computers. A large
amount of effort has been put into fabrication questions, and much
more effort will be required to reach the practical limits of device
compactness. However, there is at present essentially no theoretical
basis for optimizing the overall organization of systems implemented
in this technology.
Manuscript received September 18, 1978; revised January 10, 1979.
This work was supported in part by BMD under Contract DASG60-77-C-0097,
and the Office of Naval Research under Contract N00014-76-C-0367.
(California Institute of Technology, Computer Science Department
Contribution 1584.)
C. A. Mead is with the Department of Computer Science, California
Institute of Technology, Pasadena, CA 91125.
M. Rem is with the Department of Computer Science, California
Institute of Technology, Pasadena, CA 91125, on leave from the
Department of Mathematics, Eindhoven University of Technology,
Eindhoven, The Netherlands.
The conventional complexity theory is inadequate because
its measure of cost is the number of steps of a sequential ma-
chine. No account is taken of the size of the machine (and
hence the time required for each step). Possible concurrency
is ignored, thereby ruling out the most important potential
contribution of the silicon technology. The traditional switching
theory is also inadequate. While it provides a beautiful
formalism for describing elementary logic functions, its optimization
methods concern themselves with logical operations
rather than communication requirements. Even in current integrated
circuits, the wires required for communicating infor-
mation across the chip account for most of the area, and driv-
ing these wires accounts for most of the time delay. In very
large scale integrated systems, the situation becomes even
more extreme. In this paper, we describe a method by which
the conceptual organization of a large chip can be analyzed,
and a lower bound placed on its size and cycle time before a
detailed design is undertaken. The results of this analysis
suggest rather general guidelines for the organization of large
integrated systems.
II. METRICS OF SPACE AND TIME
A. Physical Properties
Devices used to construct monolithic silicon integrated circuits
are universally of the charge-controlled type. A charge \(Q\) placed
on the control electrode (gate, base, etc.) results in a current
\(I = Q/\tau\) flowing through the device. The transit time \(\tau\)
is the time required for charge carriers to move through the
active region of the device.
All times in an integrated system can be formulated as simple
multiples of \(\tau\). For one transistor to drive another identical to
it, a charge \(Q\) must flow through its active region, requiring
time \(\tau\). If the capacitance \(C_L\) of the load being driven is \(K\)
times the gate capacitance \(C_g\) of the driving transistor, a time
\(K\tau = (C_L/C_g)\tau\) is required.
0018-9200/79/0400-0455$00.75 © 1979 IEEE
B. Linear Versus Hierarchical Structures
In large integrated systems it is necessary to communicate
information throughout the entire system. As an example, a
bit of information stored on the gate of a minimum size tran-
sistor in a random-access memory must be communicated to
the memory bus of a CPU. Since there are many words of
data in the memory, there are many possible sources for each
wire in the memory bus. Fig. 1 illustrates two possible approaches
to organizing such a bus. In the first approach [Fig. 1(a)], a
transistor associated with each bit drives the bus wire directly.
If the bus wire has a capacitance \(C_W\), the time required to
drive the bus wire is \(t = \tau (C_W/C_g)\). In a typical computer
memory, \(C_W\) is many orders of magnitude larger than \(C_g\), and
the delay introduced by such a scheme is very long. Since \(C_W\)
is proportional to the length of the wire, it is also proportional
to \(S\), the number of driver transistors connected to the wire:
\[ t = \tau K S, \tag{1} \]
with \(K\) the constant of proportionality.
A second scheme is shown in Fig. 1(b). Here each transistor
drives a wire only long enough to reach its neighbor. Each
such wire is connected to the gate of a transistor twice as large
as the transistor driving it. The arrangement is repeated upward
until the top level, where all sources have a path to the
bus. In this scheme the delay in driving the lowest level wire
is \(2\tau\) (assuming the primary capacitance is due to the gate of
the larger transistor). The delay introduced by the wires at
each level is the same, since each driver transistor is twice as
large as those driving it. Hence the delay in driving the bus
line is \(2\tau N\), where \(N\) is the number of levels in the structure.
Since there are \(S = 2^N\) transistors at the lowest level, the delay
may be written
\[ t = 2\tau \log_2 S. \tag{2} \]
Comparing (2) and (1), we see that for large \(S\) the delay has
been made much shorter by using a hierarchical structure.
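As a rough numerical illustration of (1) and (2), the following minimal sketch evaluates both delays in units of \(\tau\). The function names and the default value K = 1 are assumptions made here for illustration; they do not come from the text.

```python
import math

def direct_bus_delay(s, tau=1.0, k=1.0):
    """Delay of eq. (1): all s drivers sit on one bus wire (k is an assumed constant)."""
    return tau * k * s

def tree_delay(s, tau=1.0):
    """Delay of eq. (2): binary driver tree with log2(s) levels, 2*tau per level."""
    return 2.0 * tau * math.log2(s)
```

For S = 1024 sources the tree needs 10 levels, so its delay is 20 transit times, against a delay proportional to 1024 for the directly driven bus.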
C. A Cost Criterion
A hierarchy such as that shown in Fig. 1(b) may be built
using any integral number \(\alpha\) of transistors driving each wire.
The driver transistors will in general be \(\alpha\) times the size
of those driving them. The delay for such a structure is
\(t = \alpha\tau \log_\alpha S = \tau(\alpha/\log\alpha)\log S\). All system
delays are thus proportional to \(\tau \log S\), with a penalty factor
\(\alpha/\log\alpha\) dependent upon
the branching ratio of the hierarchy. This delay is plotted in
Fig. 2, normalized to its minimum value, which is attained at
\(\alpha = e\).
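The penalty factor can be tabulated directly. The sketch below is an illustration only: the function names are chosen here, and natural logarithms are assumed so that the minimum falls at \(\alpha = e\).

```python
import math

def delay_penalty(alpha):
    """Penalty factor alpha / ln(alpha) of an alpha-ary driver hierarchy."""
    return alpha / math.log(alpha)

def relative_delay(alpha):
    """Penalty normalized to its minimum value, which occurs at alpha = e."""
    return delay_penalty(alpha) / delay_penalty(math.e)
```

Since \(4/\ln 4 = 2/\ln 2\), branching ratios of 2 and 4 incur the same delay penalty, which suggests how flat the curve is near its minimum.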
While dramatic improvements in the performance of integrated
structures can be achieved by a hierarchical organization,
a penalty is always paid in the area required for wires. In the
simple case shown, a bus requiring one wire when driven
directly requires \(\log_\alpha S\) wires when organized as a hierarchy.
For this reason it is not possible to optimize a design without
a cost function involving both area and time. In this paper we
will use the area-time product as our basic cost function. For
the above simple example, the cost function is
\[ \text{area} \cdot \text{time} = \tau (\log S)^2 \, \frac{\alpha}{(\log\alpha)^2}. \]
The cost is minimized for \(\alpha = e^2 \approx 7.4\).
Fig. 1. (a) A bus driven directly by memory cells. (b) A bus driver tree.
Fig. 2. Delay of a hierarchical structure as a function of \(\alpha\) (relative delay versus \(\alpha\)).
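The location of this minimum can be checked numerically. The following sketch assumes natural logarithms and searches integer branching ratios only; the function names and the search range are illustrative choices, not part of the text.

```python
import math

def area_time_penalty(alpha):
    """alpha-dependent factor alpha / (ln alpha)^2 of the area-time product."""
    return alpha / math.log(alpha) ** 2

def best_integer_alpha(candidates=range(2, 65)):
    """Integer branching ratio with the smallest area-time penalty."""
    return min(candidates, key=area_time_penalty)
```

The integer search returns 7, the integer nearest to \(e^2\); the penalty at 8 is larger by less than 0.1 percent.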
D. Hierarchical Computing Systems
The analysis given above suggests a very general structure for
computing systems. Lowest level cells are grouped together
into modules in such a way that \(\alpha\) cells drive their outputs
onto an output wire. Each output wire is connected to a driver
transistor which is \(\alpha\) times as large as those driving the wire.
Modules are grouped in such a way that \(\alpha\) of those module
drivers are connected to an intermodule communication wire.
This wire in turn is connected to a driver transistor \(\alpha^2\) times as
large as the lowest level transistors. This process is continued
until a system of the appropriate size has been realized.
III. RANDOM-ACCESS MEMORY
In this section we discuss the cost and performance of a
random-access memory (RAM) of S words of log S bits each.
As the unit of length we employ the minimum distance of two
conducting paths. For the unit of time we choose the time it
takes a basic element to charge a wire of unit length plus
another transistor like itself. One unit of time is thus slightly
larger than the transit time of a transistor.
A. Organization of the RAM
We organize the RAM in a hierarchical fashion. The elements
of level 0 are the bits themselves, each bit consisting of two
crossing wires: a select wire and a data wire. When the select
wire is signaled, the bit puts its contents on the data wire. We group
\(\alpha^2\) bits into an \(\alpha \times \alpha\) square to form a module of level 1. If
the width of an element (a bit) is \(b_0\), the elements have to
drive wires of length \(\alpha b_0\). A module on level 1 consists of an
array of crossing select and data wires, constituting the \(\alpha^2\) bits
of level 0, and some additional logic and wires at the side. We
again group \(\alpha^2\) of these modules into a square to form a module
of level 2, etc. Fig. 3 shows three levels of the hierarchy for
\(\alpha = 4\).
To study the memory in more detail we look at a module of
level \(i\) (Fig. 4). We describe how one extracts one of its \(\alpha^{2i}\)
bits. In order to select 1 bit of storage, \(2i \log\alpha\) address wires
are required. We run \(i \log\alpha\) of them, called the row address
wires, vertically along the side of the module and the other
\(i \log\alpha\), the column address wires, horizontally. Its \(\alpha^2\)
submodules are organized into \(\alpha\) rows of \(\alpha\) submodules each. When
the select wire of the module is asserted, \(\log\alpha\) of the row address
wires are used by the decoder to select one of the \(\alpha\) rows
of submodules; the select wire running through that row is
asserted. The other \((i-1)\log\alpha\) row address wires are run
horizontally into each of the \(\alpha\) rows of submodules, where
they serve as column address wires for the submodules. Of
the \(i \log\alpha\) column address wires, \((i-1)\log\alpha\) are run vertically
into each of the \(\alpha\) columns of submodules, where they serve
as row addresses. The other \(\log\alpha\) address wires are used by
the multiplexer to select one of the \(\alpha\) data wires coming out
of the columns of submodules. The signal on the selected
data wire is driven onto the data wire of the module itself.
If we wish to have a memory of \(S\) words with \(N+1\) levels
(level 0 through \(N\)), we choose \(N = \log S/(2\log\alpha)\), or \(S = \alpha^{2N}\).
This gives a hierarchical structure with \(S\) bits from which we
can extract 1 bit at a time. If we want the word length to be
\(\log S\), we employ \(\log S\) of these structures in parallel. To select
one word we select 1 bit in each of the \(\log S\) hierarchies.
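The bookkeeping of levels and address wires can be sketched as follows. This is an illustration under stated assumptions: base-2 logarithms, \(\alpha\) a power of two, and function names chosen here.

```python
import math

def ram_levels(s_words, alpha):
    """Number N of levels above level 0, from S = alpha**(2*N)."""
    n = math.log2(s_words) / (2 * math.log2(alpha))
    if abs(n - round(n)) > 1e-9:
        raise ValueError("S must equal alpha**(2*N) for an integer N")
    return round(n)

def address_wires(level, alpha):
    """(row, column) address wires of a level-i module: i*log2(alpha) each."""
    per_side = level * int(math.log2(alpha))
    return per_side, per_side
```

For S = 65536 and \(\alpha = 16\) this gives N = 2, with 8 row and 8 column address wires at the top level, that is, the 16 address bits needed to select one of 65536 bits.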
B. Area of the RAM
Fig. 4 allows us to compute the size of a RAM. Let \(L_i\) denote
the width of a module of level \(i\); then we have the following
recurrence relation:
\[ L_0 = b_0, \qquad L_i = i\log\alpha + 1 + \log\alpha + \alpha L_{i-1}. \]
The solution to the above relation is
\[ L_i = \alpha^i b_0 + \frac{\alpha^i - 1}{\alpha - 1} + \left( \frac{2\alpha^{i+1} - \alpha^i - \alpha}{(\alpha-1)^2} - \frac{i+1}{\alpha-1} \right) \log\alpha. \]
Rather than the width itself we are interested in the width per
bit. In one direction, horizontal or vertical, module \(i\) has \(\alpha^i\)
bits; therefore, we compute \(L_i/\alpha^i\):
\[ \frac{L_i}{\alpha^i} = b_0 + \frac{1}{\alpha-1} + \frac{2\alpha-1}{(\alpha-1)^2}\log\alpha - \frac{1}{\alpha^i}\left[ \left( \frac{i+1}{\alpha-1} + \frac{\alpha}{(\alpha-1)^2} \right)\log\alpha + \frac{1}{\alpha-1} \right]. \tag{3} \]
Fig. 3. Three levels of a memory hierarchy for \(\alpha = 4\).
An interesting property of the width per bit, as expressed by
(3), is that its limit for \(i \to \infty\) is finite:
\[ \lim_{i\to\infty} \frac{L_i}{\alpha^i} = b_0 + \frac{1}{\alpha-1} + \frac{2\alpha-1}{(\alpha-1)^2}\log\alpha. \tag{4} \]
This means that the width per bit \(L_i/\alpha^i\) is bounded from above
by (4), independent of the number of levels of a RAM. Expression
(3) converges in an exponential fashion towards its limit.
For small values of \(i\), (3) is already very close to (4). Therefore,
we use (4) as the width per bit for a RAM; its square is
then the area per bit. By dividing the area per bit by the bit
area \(b_0^2\) we obtain the ratio of total area per bit to bit area
for a RAM. Fig. 5 shows this quotient as a function of \(\alpha\) for
four different values of \(b_0\). It gives the overhead factor in the
area that is due to the wires. For a memory of 64K bits with \(N = 2\),
\(\alpha\) should be 16. Expression (4) is then equal to \(b_0 + 0.6\)
(taking logarithms to base 2). This
shows that in 2-level 64K dynamic MOS memories, for which
\(b_0\) lies between 1 and 2, roughly half of the area will be
occupied by wires.
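The convergence of the width per bit toward its limit can be checked by iterating the recurrence \(L_i = \alpha L_{i-1} + (i+1)\log\alpha + 1\) directly. The sketch below assumes base-2 logarithms, which reproduce the figure of 0.6 quoted above; the function names are illustrative.

```python
import math

def width_per_bit(level, alpha, b0):
    """Iterate L_i = alpha*L_{i-1} + (i+1)*log2(alpha) + 1; return L_i / alpha**i."""
    width = b0
    for i in range(1, level + 1):
        width = alpha * width + (i + 1) * math.log2(alpha) + 1
    return width / alpha ** level

def width_per_bit_limit(alpha, b0):
    """Limit (4) of the width per bit for an unbounded number of levels."""
    return b0 + 1 / (alpha - 1) + (2 * alpha - 1) / (alpha - 1) ** 2 * math.log2(alpha)
```

With \(\alpha = 16\) and \(b_0 = 1\), two levels already give 413/256, about 1.613, against a limit of about 1.618.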
One may wonder why we have not discussed the area that is
consumed by the wires for power and ground. The reason for
this is that these wires can be thought of as increasing only the
width \(b_0\) of each bit; they do this by an amount that is roughly
independent of \(\alpha\), as is shown in the following analysis.
For simplicity we assume that the wires for power and ground
run in opposite directions, say parallel to the data and select
wires. We compute how much one of them contributes to the
width of a module \(i\). The width of a power or ground wire is
proportional to the number of bits served by it. Let the width
at the highest level be \(u\); given \(S\) and the design of the lowest
level memory cell, this parameter is easy to compute. The
width of the wire in a module on level \(i\) is proportional to the
current it must supply and is hence \(u(\alpha^{2i}/\alpha^{2N})\). In one
direction, horizontal or vertical, there are \(\alpha^N/\alpha^i\) such modules.
The total contribution of all modules on level \(i\) is thus
\(u(\alpha^i/\alpha^N)\). Taking the sum of this expression for
\(i = 0, 1, \ldots, N\) yields
\[ \frac{u}{\alpha^N}\,\frac{\alpha^{N+1}-1}{\alpha-1} \approx u\,\frac{\alpha}{\alpha-1}. \]
There are \(\alpha^N\) bits in one direction; the increase of the bit
width, due to power and ground, therefore, is
\[ \frac{u}{\alpha^N}\,\frac{\alpha}{\alpha-1}, \]
which is roughly equal to \(u/\alpha^N\).
Fig. 4. A RAM module of level \(i\) (\(i > 0\)).
Fig. 5. Total area per bit of a RAM as a function of \(\alpha\) (ratio of total area to bit area, for \(\alpha\) from 2 to 128).
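The power-and-ground estimate above is a short geometric sum and can be evaluated directly. In this sketch the function names are chosen for illustration, and \(u\) is treated as a given number.

```python
def power_wire_total(u, alpha, n_levels):
    """Sum over levels i = 0..N of u * alpha**i / alpha**N."""
    return sum(u * alpha ** i / alpha ** n_levels for i in range(n_levels + 1))

def power_width_per_bit(u, alpha, n_levels):
    """Total wire width spread over the alpha**N bits in one direction."""
    return power_wire_total(u, alpha, n_levels) / alpha ** n_levels
```

For \(\alpha = 16\) and N = 2 the sum is 273/256 of \(u\), close to \(\alpha/(\alpha-1) = 16/15\), so the wires above the top level add only a few percent.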
We are interested in the optimal choice of \(\alpha\), but to make
that choice we will have to look at the access time, which also
depends on \(\alpha\).
C. Access Time of the RAM
Each element of level 0 drives a wire of length \(\alpha b_0\) to reach
the periphery of its module on level 1; this takes time \(\alpha b_0\).
Each module on level 1 drives, in the same amount of time, a
wire that is \(\alpha\) times longer to reach the periphery of its module
on level 2, etc. With \(N\) being the level of the highest module,
the time required to extract 1 bit of storage adds up to \(\alpha b_0 N\).
We use this figure as the access time. For a RAM of \(S\) words,
the access time is then \(\alpha b_0 \log S/(2\log\alpha)\).
D. The Cost of the RAM
We take the product of the area and the access time as the
cost function of the RAM. A RAM of \(S\) words of \(\log S\) bits
each has the following area-time product:
\[ \left( b_0 + \frac{1}{\alpha-1} + \frac{2\alpha-1}{(\alpha-1)^2}\log\alpha \right)^2 \frac{\alpha b_0}{2\log\alpha}\, S \log^2 S. \tag{5} \]
Fig. 6 shows (5), normalized with respect to \(S \log^2 S\), as a
function of \(\alpha\) for different values of \(b_0\). One notices that for
increasing bit sizes the branching ratio of the hierarchy should
decrease. Static memories, therefore, should have a smaller \(\alpha\)
than dynamic ones. For dynamic MOS memories the optimal
choice for \(\alpha\) lies between 8 and 16, for static MOS memories
(\(b_0 \approx 4\)) between 4 and 8. One may speculate that "smart
memories," structures in which part of the processing task is
distributed over the memory cells, will have small branching
ratios and hence relatively deep hierarchies.
Fig. 6. Area-time product of a RAM as a function of \(\alpha\).
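The trend described for the RAM can be reproduced with a small search over the normalized form of (5). This is a sketch under stated assumptions: base-2 logarithms, integer branching ratios from 2 to 128, and function names introduced here for illustration.

```python
import math

def ram_cost_per_bit(alpha, b0):
    """Expression (5) divided by S * log^2 S, with logarithms taken to base 2."""
    log_a = math.log2(alpha)
    width = b0 + 1 / (alpha - 1) + (2 * alpha - 1) / (alpha - 1) ** 2 * log_a  # eq. (4)
    return width ** 2 * alpha * b0 / (2 * log_a)

def best_alpha(b0, candidates=range(2, 129)):
    """Integer branching ratio that minimizes the normalized area-time product."""
    return min(candidates, key=lambda a: ram_cost_per_bit(a, b0))
```

Under these assumptions the minimum falls between 8 and 16 for \(b_0\) of 1 or 2 and between 4 and 8 for \(b_0 = 4\), in line with the ranges stated above.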
IV. CONTENT ADDRESSABLE MEMORY
The basic elements of the RAM were bits. The content ad-
dressable memory (CAM) is an example of a word organized
memory. We consider a “pure” CAM. It consists of words of
w bits each. We access a word by applying w bits of data to
the system. We assume that there is only one word in the
memory with those contents, and the address of that word is
produced by the memory.
A. Organization of the CAM
The basic elements are the bits, each of width \(b_1\). The bits
do not constitute the modules of level 0. The modules on level
0 of the hierarchy consist of \(\alpha w\) words of \(w\) bits each [see
Fig. 7(b)]. The \(w\) data bits are run via parallel wires vertically
through the module. Out of each word comes one horizontal
match wire going to the right. A word asserts its match wire if
each data bit received is equal to the corresponding bit stored.
There are \(\alpha w\) words in a module of level 0; the address of the
matching word leaves the module via the \(\log \alpha w\) address wires.
The above organization of a module of level 0 has one defect.
It would require the individual bits of storage to drive wires of
length \(w b_1\), which may be greater than the desired \(\alpha b_1\), to
reach the address wires. In Section II, we discussed that this
type of communication should be achieved by a hierarchy.
We, therefore, organize the driving of the match wire by the \(w\)
bits in a word in the same manner as shown in Section II.
Each word is chopped up into \(w/\alpha\) subwords of \(\alpha\) bits each
[Fig. 7(a)]. Each of the \(w/\alpha\) subwords sends a signal to a
"match tree" which has a branching ratio of \(\alpha\) and delivers,
via \(\log_\alpha w\) levels, the logical product of its inputs. The top
node of the match tree can drive a wire of length
\(b_1 \alpha^{\log_\alpha w} = b_1 w\), the length of a word in the memory.
Therefore, the
word itself can drive a wire of length \(b_1 \alpha w\), and we may group
together \(\alpha w\) words into module 0 [Fig. 7(b)]. Notice that the
module's length is roughly equal to \(\alpha\) times its width. This will
be true for modules on higher levels as well.
We now describe a module of level \(i\) (Fig. 8). It contains
\(w\alpha^{4i+1}\) words and consists of \(\alpha^4\) submodules of level \(i-1\),
grouped into \(\alpha^2\) rows of \(\alpha^2\) submodules each. Each such row
contains, besides the \(\alpha^2\) submodules, \(w\) data wires to transport
the data to each of the submodules and \(\log w\alpha^{4i-1}\) outgoing
address wires to transport to the right the address of
the matching word. Each submodule has \(w\alpha^{4i-3}\) words, and,
hence, one row contains \(w\alpha^{4i-1}\) words, which explains the
number of address wires. A module on level \(i\) has \(\alpha^2\) of these
rows and thus requires \(\log w\alpha^{4i+1}\) outgoing address wires;
they are placed to the right of the rows.
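The word counts quoted for the CAM hierarchy follow from one another, as the sketch below illustrates. It assumes that \(w\) and \(\alpha\) are powers of two, so that the address width is an integer; the function names are chosen here for illustration.

```python
def cam_words(level, alpha, w):
    """Words stored in a CAM module of the given level: w * alpha**(4*level + 1)."""
    return w * alpha ** (4 * level + 1)

def cam_row_words(level, alpha, w):
    """Words in one row of a level-i module (i > 0): alpha**2 submodules of level i-1."""
    return alpha ** 2 * cam_words(level - 1, alpha, w)

def cam_address_wires(level, alpha, w):
    """Address wires leaving a module: bits needed to number all of its words."""
    return (cam_words(level, alpha, w) - 1).bit_length()
```

With \(w = 32\) and \(\alpha = 4\), a level-0 module holds 128 words and a level-1 module holds 32768 words, which requires 15 address wires.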
In the CAM we have \(\alpha^4\) submodules per module, in the RAM
only \(\alpha^2\). This is only a seeming difference. In the CAM, for
simplicity, we have combined two steps in the hierarchy; we
have maintained, however, our multiplication factor \(\alpha\) for the
wire lengths. \(L_{i-1}\), the length of a module of level \(i-1\), is
roughly equal to \(\alpha\) times \(W_{i-1}\), the width of a module of level
\(i-1\). Therefore, module \(i-1\) can already drive wires of length
\(\alpha W_{i-1}\). As a consequence, we can put \(\alpha^2\) submodules into one
row, as this would only require the driving of wires of length
\(\alpha^2 W_{i-1}\) in each row. But then we can, and this is the second
step, combine \(\alpha^2\) rows, as this would require the driving of
wires of a length about \(\alpha^2 L_{i-1}\), which is roughly equal to
\(\alpha^3 W_{i-1}\).
B. Area of the CAM
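We compute the length and the width separately. For the
length \(L_i\) of a module on level \(i\), we have the relation [cf. Figs.
7(b) and 8]
\[ L_0 = \alpha w \left( b_1 + \frac{\log w}{\log\alpha} \right), \qquad L_i = \alpha^2 \left( w + L_{i-1} + \log w\alpha^{4i-1} \right). \]
The solution to this recurrence relation is
\[ L_i = \alpha^{2i+1} w \left( b_1 + \frac{\log w}{\log\alpha} \right) + (w + \log w)\,\frac{\alpha^{2i+2} - \alpha^2}{\alpha^2-1} + \left( \frac{4\alpha^{2i+2} - 4\alpha^2}{(\alpha^2-1)^2} + \frac{3\alpha^{2i+2} - 4i\alpha^2 - 3\alpha^2}{\alpha^2-1} \right) \log\alpha. \]
A module on level \(i\) has \(w\alpha^{2i+1}\) bits in the vertical direction.
The length per bit, therefore, is \(L_i/w\alpha^{2i+1}\). This has the
following limit for \(i \to \infty\):
\[ b_1 + \frac{\log w}{\log\alpha} + \frac{\alpha(w + \log w + 3\log\alpha)}{w(\alpha^2-1)} + \frac{4\alpha\log\alpha}{w(\alpha^2-1)^2}. \tag{6} \]
As in the case of the RAM, \(L_i/w\alpha^{2i+1}\) is already very close to
the limit for small values of \(i\); the rate of convergence is again
exponential. We use (6) as the length per bit of a CAM.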
We find for the width \(W_i\) of a module on level \(i\) the following
recurrence relation [cf. Figs. 7(b) and 8]:
\[ W_0 = \frac{w}{\alpha}(\alpha b_1 + 1) + \log \alpha w, \qquad W_i = \alpha^2 W_{i-1} + \log w\alpha^{4i+1}. \]
Its solution is
\[ W_i = \alpha^{2i} w \left( b_1 + \frac{1}{\alpha} \right) + \frac{\alpha^{2i+2} - 1}{\alpha^2 - 1} \log w + \left( \frac{4\alpha^{2i+2} - 4\alpha^2}{(\alpha^2-1)^2} + \frac{\alpha^{2i+2} - 4i - 1}{\alpha^2-1} \right) \log\alpha. \]
In the horizontal direction there are \(w\alpha^{2i}\) bits. The width per
bit \(W_i/w\alpha^{2i}\) has as its limit for \(i \to \infty\)
\[ b_1 + \frac{1}{\alpha} + \frac{\alpha^2 \log \alpha w}{w(\alpha^2-1)} + \frac{4\alpha^2\log\alpha}{w(\alpha^2-1)^2}. \tag{7} \]
We take the product of (6) and (7) as the area per bit.
By dividing the area per bit by the bit area \(b_1^2\) we obtain the
ratio of total area per bit to bit area for a CAM. Fig. 9 shows this
quotient for \(w = 32\) as a function of \(\alpha\) for different values of \(b_1\).
Fig. 7. (a) One word of storage in the CAM. (b) A CAM module of level zero.
Fig. 8. A CAM module of level \(i\) (\(i > 0\)).
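The limits (6) and (7) can be cross-checked by iterating the two CAM recurrences, \(L_i = \alpha^2(w + L_{i-1} + \log w\alpha^{4i-1})\) and \(W_i = \alpha^2 W_{i-1} + \log w\alpha^{4i+1}\), numerically. The sketch assumes base-2 logarithms; the function names and the sample values (\(\alpha = 4\), \(w = 32\), \(b_1 = 4\)) are illustrative choices.

```python
import math

def cam_length_per_bit(level, alpha, w, b1):
    """Iterate the length recurrence; divide by the w * alpha**(2i+1) bits per column."""
    log_a, log_w = math.log2(alpha), math.log2(w)
    length = alpha * w * (b1 + log_w / log_a)  # L_0
    for i in range(1, level + 1):
        length = alpha ** 2 * (w + length + log_w + (4 * i - 1) * log_a)
    return length / (w * alpha ** (2 * level + 1))

def cam_width_per_bit(level, alpha, w, b1):
    """Iterate the width recurrence; divide by the w * alpha**(2i) bits per row."""
    log_a, log_w = math.log2(alpha), math.log2(w)
    width = w / alpha * (alpha * b1 + 1) + log_a + log_w  # W_0
    for i in range(1, level + 1):
        width = alpha ** 2 * width + log_w + (4 * i + 1) * log_a
    return width / (w * alpha ** (2 * level))

def cam_limits(alpha, w, b1):
    """Limits (6) and (7) of the length and width per bit."""
    log_a, log_w = math.log2(alpha), math.log2(w)
    d = alpha ** 2 - 1
    length = (b1 + log_w / log_a + alpha * (w + log_w + 3 * log_a) / (w * d)
              + 4 * alpha * log_a / (w * d ** 2))
    width = (b1 + 1 / alpha + alpha ** 2 * (log_a + log_w) / (w * d)
             + 4 * alpha ** 2 * log_a / (w * d ** 2))
    return length, width
```

For the sample values the limits come out near 6.86 and 4.50, and six levels of the recurrences agree with them to within the tolerance tested.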
If we compare Figs. 5 and 9, we notice that for small values
of \(\alpha\) the wires in the CAM cause less overhead in area than
those in the RAM. For large values of \(\alpha\) it is the RAM that
enjoys a smaller overhead in area. For equal bit sizes, i.e.,
with \(b_0 = b_1\), the area overhead factors for the RAM and the
CAM are about equal at \(\alpha = 8\).
As in the RAM we can compute by how much we should
increase the bit width \(b_1\) if we wish to take power and ground
into account. Both power and ground give an increase of
\(u\,\alpha^2/(\alpha^2-1)\) to the length and the width of the CAM. This is
even closer to \(u\) than in the case of the RAM. If we wish to
amortize this amount over the bits, the bit width \(b_1\) should
be incremented by
\[ \frac{u}{\sqrt{wS}}\,\frac{\alpha^2}{\alpha^2-1} \]
for a CAM of \(S\) words of \(w\) bits each.
C. Access Time of the CAM
For the access time we take the time required to extract the
address of the matching word of data from a memory of \(S\)
words. With the highest level being level \(N\), we have
\(S = w\alpha^{4N+1}\), or
\[ N = \frac{\log S - \log w}{4\log\alpha} - \frac{1}{4}. \]
A word of storage has a response time of \((\log w/\log\alpha)\,\alpha b_1\);
for a module of level 0 this becomes \([(\log w/\log\alpha) + 1]\,\alpha b_1\).
Each new level of the hierarchy multiplies the wire lengths by
a factor \(\alpha^2\) and hence requires an additional time of \(2\alpha b_1\).
For \(N\) levels we find, hence,
\[ \text{access time} = \left( 2N + \frac{\log w}{\log\alpha} + 1 \right) \alpha b_1 = \left( \frac{\log S + \log w}{2\log\alpha} + \frac{1}{2} \right) \alpha b_1. \tag{8} \]
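The two forms of the CAM access time in (8) can be compared numerically. In the sketch below, base-2 logarithms are assumed and the function names are introduced for illustration only.

```python
import math

def cam_levels(s_words, alpha, w):
    """Top level N from S = w * alpha**(4N + 1)."""
    return (math.log2(s_words) - math.log2(w)) / (4 * math.log2(alpha)) - 0.25

def cam_access_time(s_words, alpha, w, b1):
    """Right-hand form of (8)."""
    log_a = math.log2(alpha)
    return ((math.log2(s_words) + math.log2(w)) / (2 * log_a) + 0.5) * alpha * b1

def cam_access_time_by_levels(s_words, alpha, w, b1):
    """Left-hand form of (8): (2N + log w / log alpha + 1) * alpha * b1."""
    n = cam_levels(s_words, alpha, w)
    return (2 * n + math.log2(w) / math.log2(alpha) + 1) * alpha * b1
```

For S = 65536, \(w = 32\), \(\alpha = 4\), and \(b_1 = 4\), both forms give 92 time units.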
Fig. 9. Total area per bit as a function of \(\alpha\) for a CAM with word length 32 (curves for \(b_1 = 1, 2, 4, 8\)).
Fig. 10. Area-time product for a CAM of 65K 32-bit words.
D. The Cost of the CAM
We again take the product of the area and the access time
as the cost function. For a CAM of \(S\) words of \(w\) bits each,
formulae (6), (7), and (8) yield the cost function
\[ \left( b_1 + \frac{\log w}{\log\alpha} + \frac{\alpha(w+\log w+3\log\alpha)}{w(\alpha^2-1)} + \frac{4\alpha\log\alpha}{w(\alpha^2-1)^2} \right) \left( b_1 + \frac{1}{\alpha} + \frac{\alpha^2\log\alpha w}{w(\alpha^2-1)} + \frac{4\alpha^2\log\alpha}{w(\alpha^2-1)^2} \right) \left( \frac{\log S + \log w}{2\log\alpha} + \frac{1}{2} \right) \alpha b_1 w S. \]
Fig. 10 shows the cost function as a function of \(\alpha\) for a CAM
of 65K words of 32 bits each. The curves are fairly independent
of the choice of \(w\) provided we choose \(w\) great enough,
say \(w > 16\). A change in \(S\) will basically move the curves only
up and down; it will not affect the positions of their minima.
We notice again that increasing the bit size will decrease the
optimal choice of \(\alpha\). Comparing Figs. 6 and 10 we see that
content addressable memories should have smaller branching
ratios than random-access memories. For \(b_1 = 4\), which seems
a reasonable figure, the optimal choice of \(\alpha\) is 4.
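The optimum quoted for the CAM can be reproduced by evaluating the product of (6), (7), (8), and the bit count \(wS\). This is a sketch under stated assumptions: base-2 logarithms, S = 65536 for "65K", and a candidate set restricted to the powers of two on the axis of Fig. 10. The function names are illustrative.

```python
import math

def cam_cost(alpha, w, s_words, b1):
    """Area-time product of a CAM: (6) * (7) * (8) * w * S, logarithms to base 2."""
    log_a, log_w, log_s = math.log2(alpha), math.log2(w), math.log2(s_words)
    d = alpha ** 2 - 1
    length = (b1 + log_w / log_a + alpha * (w + log_w + 3 * log_a) / (w * d)
              + 4 * alpha * log_a / (w * d ** 2))                # eq. (6)
    width = (b1 + 1 / alpha + alpha ** 2 * (log_a + log_w) / (w * d)
             + 4 * alpha ** 2 * log_a / (w * d ** 2))            # eq. (7)
    access = ((log_s + log_w) / (2 * log_a) + 0.5) * alpha * b1  # eq. (8)
    return length * width * access * w * s_words

def best_cam_alpha(w, s_words, b1, candidates=(2, 4, 8, 16, 32)):
    """Candidate branching ratio with the smallest area-time product."""
    return min(candidates, key=lambda a: cam_cost(a, w, s_words, b1))
```

With \(w = 32\), S = 65536, and \(b_1 = 4\), the search selects \(\alpha = 4\), in agreement with the value stated above.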
V. CONCLUSION
We have presented a general method for analyzing the cost
and performance of recursively defined VLSI structures.
Parameters of any such structure may be optimized with re-
spect to time, area, or some combination of the two. While
we have chosen the area-time product, it is clear that some
other choice may be appropriate for any given application.
The results of this study indicate that as more processing
is available in each module at level zero, the optimal value of
\(\alpha\) will decrease. A system with \(\alpha = 4\) would seem to be
appropriate for memories in which substantial processing is commingled
with storage.
Very general arguments were used to generate the basic recursive
structure. For that reason it appears that a very large
fraction of VLSI computing structures will be designed in
this way. We have discussed two examples, one in which the
basic elements were bits of storage, and one with words of
storage at the lowest level. They gave rise to rather different
recursive structures. The way in which their area and time
measures were established should make it clear how to apply
these techniques to other recursively defined computing
structures.
Carver A. Mead, for a biography and photograph, see this issue, p.
470.
Martin Rem was born in The Netherlands on
September 22, 1946. He received the B.S.
degree in mathematics and physics and the M.S.
degree in mathematics from the University
of Amsterdam, Holland, in 1968 and 1971,
respectively, and the Ph.D. degree in computer
science from Eindhoven University of Tech-
nology, Eindhoven, The Netherlands, in 1976.
He is an Associate Professor of Mathematics
in the Department of Mathematics, Eindhoven
University of Technology. He is presently a
Visiting Professor of Computer Science in the Department of Computer
Science, California Institute of Technology, Pasadena, CA. His major
research interests are in the area of programming of machines with in-
store processing, semantics of programming languages, correctness
proofs, and well-structured machine designs.
Dr. Rem is a member of the Association for Computing Machinery,
the Dutch Computer Society NGI, and the Dutch Mathematical
Society.
Delay-Time Optimization for Driving and Sensing of
Signals on High-Capacitance Paths of
VLSI Systems
AMR M. MOHSEN, MEMBER, IEEE, AND CARVER A. MEAD
Abstract-Transmission of signals on large capacitance paths in a
VLSI system may result in substantial degradation of the overall system
performance. In this paper minimization of the delay times associated
with driving and sensing signals from large capacitance paths
by optimizing the fan-out factor of the driver stages, the gain of the
input sensing stages, and the path voltage swing is examined. Examples
of driving signals on a high capacitance path with two driving
schemes (a push-pull depletion-load driver chain and a fixed driver)
and of sensing signals with two sensing schemes (a single-ended
depletion-load inverter input stage and a balanced regenerative strobed
latch) are presented. We conclude that minimum delay time is achieved
when the delay times of the successive stages of the driver chain, the
high capacitance path, and the input sensing stage are comparable.
In general, transmission time of signals in a system is minimized when
the delay times of the different stages of the system are comparable.
A. M. Mohsen is with Intel Corporation, Santa Clara, CA, and the
California Institute of Technology, Pasadena, CA 91125.
C. A. Mead is with the Department of Computer Science, California
Institute of Technology, Pasadena, CA 91125.
I. INTRODUCTION
THE OVERALL PERFORMANCE of VLSI systems may be seriously
degraded if signals need to be transmitted
from one part to other parts in the system across large capacitance
paths [1]. This large fan-out situation often occurs in
the case of control drivers that are required to drive a large
number of inputs to memory cells or logic-function blocks
across a chip, or in the case of sensing stored information from
small cells of large memory arrays. A similar and even more
serious problem is driving wires which go off the silicon chip
to other chips or input and output devices. In such cases, the
