The Future of data communications system design by Kellow, Kenneth
Rochester Institute of Technology 
RIT Scholar Works 
Theses 
2002 
The Future of data communications system design 
Kenneth Kellow 
Recommended Citation 
Kellow, Kenneth, "The Future of data communications system design" (2002). Thesis. Rochester Institute 
of Technology. Accessed from 




Project submitted in partial fulfillment of the requirements for the
degree of Master of Science in Information Technology
Department of Information Technology
B. Thomas Golisano College
of
Computing and Information Sciences
April 2002
The Future of Data Communications System Design
By
Kenneth J. Kellow
A Thesis submitted in partial fulfillment for the degree of
Master of Information Technology






The Information Technology universe and its data communications subset were
created largely due to advances in integrated circuit technology. Primarily
focused on layer 2 of the OSI model, silicon integration is a technology upon
which much of data communications is constructed. Over the past forty years
advances in integrated circuit technology have manifested themselves through
shrinking structural dimensions and increases in performance. This combination
of shrinking structural dimensions, also known as circuit density, and performance
increases, also known as circuit speed, has led to long term compound
productivity increases and a seemingly never ending menu of data
communications applications. Recent beneficiaries include 10/100 Ethernet
systems, gigabit Ethernet systems, asynchronous transfer mode systems (ATM),
synchronous optical network systems (SONET), and wireless systems. Some
have argued that these density and speed improvements have doubled every
eighteen months to two years over the past thirty to forty years and are directly
responsible for the information age revolution in which we exist today. Others
have projected that past integrated circuit productivity increases cannot continue
indefinitely and that we may be approaching an end to the integrated circuit
compound productivity increases that fuel the information age. But does the
purported end to compound silicon productivity increases mean an end to the
growth of data communications systems or the larger role of information
technology? Or are there other factors which may fill the role at the system level
by using silicon in a more productive way? The statement of this thesis is that
the end of compound silicon productivity will not have an important impact on data
communications over the next five years.
About the Author
Ken Kellow is an electrical engineer who has spent much of his 35 year career in the
Microelectronics Division of IBM Corporation. He joined IBM in 1968 as an
electronics technician and has held numerous engineering and management
positions at IBM primarily in the areas of microelectronics product development,
product engineering, applications engineering, and marketing engineering. Over
the years he has made many contributions to IBM, some of which are still
regarded as trade secrets. He was a member of IBM's patent review board
where he worked with engineers and patent attorneys to protect IBM's intellectual
property. Prior to early retirement in 1998, he was a marketing manager
responsible for worldwide microelectronics support of communications
companies such as: Nortel, Ericsson, Siemens, Cisco, 3COM, and Alcatel. Ken
is a 1968 graduate of RETS technical school where he earned an FCC first class
license with radar endorsement. He earned a BS in Business Administration and
a BS in Corporate Communication and Public Relations from Trinity College of
Vermont in 1997 and is currently a candidate for a Master's degree in Information
Technology from Rochester Institute of Technology. He is currently an
information technology training coordinator for the Lancaster, Pennsylvania
campus of Harrisburg Area Community College where he assists students in




TABLE OF CONTENTS
Transistor source to drain leakage
Transistor gate leakage
Silicon Germanium Background and Future
CHAPTER 3 - CHOICES IN THE DESIGN OF DATA COMMUNICATIONS SYSTEMS
Low cost systems
Benefits of avoiding OTS components
FPGA Chips
Improving low cost system competitive advantage
Cost - Performance Systems
CHAPTER 4 - CHALLENGES TO E-COMMERCE AND DATA COMMUNICATIONS SYSTEM DESIGN PART I
Custom Chip Design
CHAPTER 5 - CHALLENGES TO E-COMMERCE AND DATA COMMUNICATIONS SYSTEM DESIGN PART II
Partitioning
Rent's Rule
Chip Economics 101
Leading Edge vs Bleeding Edge
Second Level Assembly
Packaging
Final word on partitioning
CHAPTER 6 - CHALLENGES TO E-COMMERCE AND DATA COMMUNICATIONS SYSTEM DESIGN PART III
Low Complexity Gate Array
Low Complexity Gate Array Verification
Economics of Low Complexity Gate Arrays
High Complexity Gate Arrays
Silicon Compilation
Synthesis
Place and Route
Macros
Software and Hardware Synergy
CHAPTER 7 - CHALLENGES TO E-COMMERCE AND DATA COMMUNICATIONS SYSTEM DESIGN PART IV
Timing Verification and Test Engineering
Timing Analysis
Clock Tree












Chapter 1 - The Limits of Silicon Productivity
Fundamentally, all aspects of information technology owe their existence to
computer systems and, of equal importance, to the data communication systems
between computers that form the backbone of the value of information
technology to society. Silicon technology remains the basic building block for
both of these systems. Since the invention of the first transistor in 1947, the
commercialization of transistor technology has enabled rapid improvements in the
way transistor technology is used in society. Ideas like spread spectrum radio, a
key technology in today's wireless world, although understood over fifty years
ago, could not be deployed because of the cost and reliability of the technology
available to construct it at the time. With the development of the integrated
circuit in the early 1960's, the science of silicon technology was launched upon a
rate of contribution unmatched by any technology previously known. All aspects
of data communications and e-commerce are affected by silicon technology.
Increases in system capability enable the invention of new features and
applications almost daily. Yet most of us know little about silicon technology.
Like the air we breathe, the electricity and telephone service we receive, silicon
technology remains an unsung hero to most people involved in information
technology. Yet for the past 40 years the integrated circuit industry has enjoyed
productivity and performance increases that doubled every 18 months to 2 years.
This rate of productivity, called Moore's Law after Gordon Moore who first
observed it, forms the basis of the current information age. Figure 1 shows
Moore's Law as it progressed from 1970 with a few well known data points from
Intel Corporation for reference. The Y axis depicts the number of transistors on a



















Figure 1: Moore's Law and Intel Processors (x axis: year, 1970 to 2010)
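The doubling rate behind Figure 1 can be made concrete with a short calculation. The following is a minimal sketch in Python rather than part of the original analysis. The function name growth_factor is hypothetical, and the sketch assumes a perfectly constant doubling period, which real product generations only approximate.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles
    once every `doubling_period_years` (assumed constant)."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Compare the two doubling periods quoted in the text over 40 years.
    for period in (1.5, 2.0):
        factor = growth_factor(40, period)
        print(f"doubling every {period} years for 40 years: x{factor:.3g}")
```

Over 40 years a two year doubling period amounts to 20 doublings, roughly a millionfold increase, while an eighteen month period gives roughly a hundred-million-fold increase. Long range projections are therefore very sensitive to the assumed period.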
The Intel data points are not meant to minimize the contribution of semiconductor
memory (DRAM) which increased in productivity from 256 bit chips to 64 megabit
chips over approximately the same period, and also plays a critical role in
e-commerce and data communication systems.
It is well understood that nothing increases in productivity indefinitely. Recently,
some have predicted that the huge productivity gains of the past will end
within the next 10 to 15 years. If true, this could mean a significant change to the
economics that applies to information technology, including e-commerce and data
communications. The reason for this end is the fundamental limit identified by
John Von Neumann as "the computed thermodynamical minimum of energy per
elementary act of information from the formula $E_s = kT \log_e N$", where $E_s$
is the minimum signal energy, $N = 2$ for a binary act, $k$ is Boltzmann's
constant, and $T$ is absolute temperature in kelvins. Thus
$E_s(\min) = (\ln 2)\,kT$. While Von Neumann never explained how he came to this
equation, no one has yet found it in error. This fundamental limit is an
important factor for binary devices, upon which the digital world is built.
Transistor devices derive much of their usefulness from their ability as
switching entities. Meindl and Davis [1] calculated three values
supporting Von Neumann's contention. First, they calculated the minimum
supply voltage needed for a MOSFET device considering the need for binary
signal discrimination in the widely used CMOS technology. For an ideal
MOSFET, the value of the supply voltage $V_{dd}$ is 0.036 V at 300 K. Second, they
calculated the minimum signal energy transfer during a binary switching
transition as 0.0179 eV at 300 K. Third, they calculated the minimum switching
energy of an interconnect using Shannon's theorem for the maximum capacity of
a communications channel contaminated by a white noise source. This third
calculation, $E_{bit}(C/B \to 0)$, is equal to Von Neumann's equation
$E_s(\min) = (\ln 2)\,kT$. Thus Meindl and Davis claim to correctly project the
fundamental limit on signal energy transfer in terms very meaningful to
terascale integration. In combining the minimum switching energy and the
minimum energy per bit with the minimum value of gate oxide thickness needed to
retain the properties of bulk silicon dioxide (SiO2) of 1.5 nm, Meindl and Davis
show the minimum channel length of a MOSFET device ($L_{min}$) to be 13.9 nm.
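The 0.0179 eV figure can be checked directly from Von Neumann's formula. The following is a minimal sketch in Python, offered as an illustration only. The function name min_signal_energy_ev is hypothetical, and the constants are the standard SI values for Boltzmann's constant and the electron volt.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV per kelvin
JOULES_PER_EV = 1.602176634e-19      # one electron volt in joules


def min_signal_energy_ev(temperature_k: float, n_states: int = 2) -> float:
    """Evaluate E_s = k * T * ln(N) in electron volts.

    N = 2 corresponds to a binary act, as in the text.
    """
    if temperature_k <= 0 or n_states < 2:
        raise ValueError("temperature must be positive and N at least 2")
    return BOLTZMANN_EV_PER_K * temperature_k * math.log(n_states)


if __name__ == "__main__":
    energy_ev = min_signal_energy_ev(300.0)
    print(f"E_s(min) at 300 K: {energy_ev:.4f} eV "
          f"({energy_ev * JOULES_PER_EV:.3e} J)")
```

At 300 K this evaluates to roughly 0.0179 eV, in agreement with the second value attributed to Meindl and Davis.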
Von Neumann's equation projects that devices operating near the
thermodynamic limit would theoretically be unreliable. This means
semiconductor devices would not have discernible energy states and would
have equal probability of being in a binary "1" state or a binary "0" state.
Such a situation would make digital systems as we know them so unreliable as to be
worthless. Yet other limitations are likely to be reached before the Von Neumann limit.
These other limitations are more economic than physical and should take effect
well before the ten to fifteen years in which the Von Neumann limit is expected to
be reached. Figure 2 shows the aggressive improvement of transistor physical gate
length starting from the early 1990's and projecting to the year 2010. The
technology node is the definition by which the semiconductor industry identifies
the technology in marketing terms. The transistor physical gate length is the




















Figure 2: Transistor Physical Gate Length vs. Technology Node (x axis: year,
1990 to 2010). Source: Intel Corp.
Extending Moore's Law
Indeed the 13.9 nm goal may be within reach. According to Chau and Marcyk,
previous generations of technology relied on photolithography to travel along
Moore's roadmap [2]. Current vintage technology is relying on new materials to
continue Moore's legacy. Of all the future challenges to Moore's Law, it may be
transistor power that holds the key.
Transistor Power
Today's e-commerce and data communication technology is, for practical
purposes, digital. Binary states (ones and zeros) are much easier to engineer
than analog (continuously varying) states. Today's circuit technology is
dominated by complementary metal oxide semiconductor (CMOS) transistors.
CMOS transistors work in pairs, a pair consisting of an NMOS and a PMOS
transistor. A key point is that CMOS circuits only dissipate power when they
switch from one state to another. A properly operating CMOS circuit dissipates
no power when it is idle. Consider the Intel CMOS chips from Figure 1 and the
implications of chip power dissipation.
Circuit power may be approximated from the simple equation $P = V \times I$,
where V is the source to drain voltage and I is the amount of current flowing in
a CMOS circuit. The amount of current is quite small, in the range of
nanoamperes ($10^{-9}$ amperes) or less. Thus if we consider a CMOS transistor
operating at 3.5 volts and 1 nanoampere, we calculate a transistor power of
3.5 nanowatts, or $3.5 \times 10^{-9}$ watts. If we say it takes 4 CMOS
transistor pairs to construct a CMOS circuit, then the circuit will dissipate
14 nanowatts. Consider however that this chip is being operated (clocked) at
1 MHz and has 25,000 CMOS circuits on it (perhaps an Intel 286 processor). The
equation now becomes:
$(1 \times 10^{6}\ \text{hertz}) \times (25 \times 10^{3}\ \text{circuits}) \times (14 \times 10^{-9}\ \text{watts}) = 350$ watts.
Clearly the Intel 286 processor did not dissipate 350 watts of power. What is
missing is the fact that digital operations often result in no change from the
prior digital representation in the system, i.e., digital systems are not
uniform: not every circuit switches states on every clock cycle, and no power is
dissipated when the circuit does not change states. The non-uniformity implies
digital systems have a duty cycle component which must be taken into account
when estimating the total power dissipation of a chip.
Calculating duty cycles is an important part of a system design and requires
detailed analysis often assisted by computer aided design tools, but typical duty
cycles are in the range of 1 to 10 percent. If we factor in a duty cycle of 3%, our
equation now becomes:
$(1 \times 10^{6}\ \text{hertz}) \times (25 \times 10^{3}\ \text{circuits}) \times (14 \times 10^{-9}\ \text{watts}) \times 0.03 = 10.5$ watts,
or roughly 11 watts, a much more realistic number. If we consider a more
recent vintage Pentium 3 processor being clocked at 500 MHz with a 3% duty
cycle, operating at 2.5 volts, with a maximum power dissipation of 100 watts, we
can approximate the individual transistor current to be $2.67 \times 10^{-12}$
amperes.
Additional analysis of the physics involved could lead to the approximate
dimensions of the CMOS transistors used by Intel to construct the Pentium 3.
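The power estimate above can be reproduced in a few lines. The following is a minimal sketch in Python that follows the text's arithmetic exactly as written, including its simplification of multiplying per-circuit power by clock frequency. The function name chip_power_estimate and its parameter names are hypothetical, and the result is a rough approximation rather than a rigorous power model.

```python
def chip_power_estimate(clock_hz: float,
                        circuits: int,
                        volts: float,
                        amps_per_transistor: float,
                        pairs_per_circuit: int = 4,
                        duty_cycle: float = 1.0) -> float:
    """Rough chip power figure using the text's simplified formula:
    clock x circuits x (pairs x V x I) x duty cycle."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    circuit_power = pairs_per_circuit * volts * amps_per_transistor
    return clock_hz * circuits * circuit_power * duty_cycle


if __name__ == "__main__":
    # Values from the 1 MHz, 25,000 circuit example in the text.
    every_cycle = chip_power_estimate(1e6, 25_000, 3.5, 1e-9)
    with_duty = chip_power_estimate(1e6, 25_000, 3.5, 1e-9, duty_cycle=0.03)
    print(f"all circuits switching every cycle: {every_cycle:.1f}")
    print(f"with a 3% duty cycle:               {with_duty:.1f}")
```

Making the duty cycle an explicit parameter makes it easy to see how strongly the estimate depends on it across the 1 to 10 percent range quoted in the text.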
Scaling
As the ability to manufacture successively smaller transistors is realized, a
process called scaling is applied. Scaling is a relatively simple process of
reducing the dimensions of a transistor structure (somewhat uniformly) to that of
the current generation of manufacturing capability. Thus, for example, as
manufacturing ability improves from 1 micron to 0.7 micron, the 1 micron
transistors can be scaled by 30% to the 0.7 micron size. In other words, one can
take a transistor of dimension x and linearly reduce it by some amount y. This
reduction is typically in the range of 25 to 30 percent. A linear shrink of 30
percent in transistor size will theoretically result in roughly a 100% increase in the
number of transistors per unit area on a chip, because area scales with the
square of the linear dimension (0.7 squared is about 0.49, so each transistor
occupies about half its former area). Scaling is rarely fully linear but is
sufficient to achieve the smaller dimension in a manufacturing process with some






Figure 3: Shrink Example
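The relationship between a linear shrink and transistor density can be sketched as follows. This is a minimal illustration in Python that assumes an ideal, fully uniform shrink, which the text notes is rarely achieved in practice. The function name density_gain is hypothetical.

```python
def density_gain(linear_shrink: float) -> float:
    """Ratio of transistors per unit area after an ideal uniform shrink.

    `linear_shrink` is the fractional reduction of each linear dimension,
    e.g. 0.30 for a shrink from 1 micron to 0.7 micron.
    """
    if not 0.0 <= linear_shrink < 1.0:
        raise ValueError("linear shrink must be in the range [0, 1)")
    scale = 1.0 - linear_shrink
    # Area scales with the square of the linear dimension.
    return 1.0 / (scale * scale)


if __name__ == "__main__":
    for shrink in (0.25, 0.30):
        gain = density_gain(shrink)
        print(f"{shrink:.0%} linear shrink: {gain:.2f}x density "
              f"({gain - 1:.0%} increase)")
```

A 30 percent shrink yields slightly more than double the density, which is the basis for the "100% increase" rule of thumb, while a 25 percent shrink yields somewhat less than double.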
As we have seen, continuing to travel Moore's roadmap in the immediate future
will require improvements in transistor power consumption, which could possibly
be accomplished in three interrelated ways: 1) reducing transistor source to drain
leakage, 2) reducing transistor gate leakage, and 3) lowering transistor operating
voltage.
Transistor source to drain leakage
In a perfect transistor, current flows across the channel (the area directly beneath the
gate) from source to drain only when the transistor is turned "on". If current flow
occurs when the transistor is turned "off", a condition called sub-threshold
leakage is said to occur. This leakage causes power to be consumed in the
"off" state and requires a higher transistor operating voltage. The combination of
current flow and higher operating voltage causes large amounts of power to be
consumed and must be corrected if terascale integration is to be achieved. In
addition, as transistor size continues to shrink, gate oxide thickness declines.
Thinner gate oxide results in higher gate leakage and the need for higher chip
voltage. Chau and Marcyk have proposed constructing a depleted substrate
transistor (DST). This structure is a Silicon On Insulator (SOI) structure
supplemented with an epitaxially grown source and drain. The proposed structure
reduces the sub-threshold leakage by a factor of 100 over bulk silicon and prior
structures using SOI technology.
Transistor gate leakage
With oxide thickness continuing to be reduced in the gate region, current begins
to flow through the gate. Here again Chau and Marcyk have proposed a remedy: using a
proprietary gate dielectric material as a substitute for silicon dioxide (SiO2). This
dielectric is formed using atomic layer deposition in a step process until the gate
surface area becomes saturated with the dielectric material. This new dielectric
results in a thicker film without added capacitance. Chau and Marcyk claim gate
leakage is reduced by a factor of $10^{4}$ while gate capacitance remains the same
as with SiO2.
Lower chip voltage
The combination of less sub-threshold leakage and gate leakage allows the chip
to operate at a lower voltage. Chau and Marcyk claim their invention will operate
at 600 millivolts by the year 2010. This is a tremendous advantage (5X) when
compared to today's 3V technology and will help keep chip power to a minimum.
Indeed, according to Chau and Marcyk, with the number of transistors on a chip
approaching 1 billion, total transistor power at the chip level may approach that of
a "rocket nozzle" sometime during the next decade. From an engineering
viewpoint there are probably ways of handling the immense power, but no
proposals seem to do it economically. It is very possible that future desktop
computers will contain liquid cooled processors.
Carbon Nanotubes
As we approach the limits of silicon, other inventions have the opportunity to
push on. One such technology is the carbon nanotube technology recently
announced by IBM [3]. According to IBM, "Carbon nanotubes are tiny cylinders of
carbon atoms that measure about 10 atoms across, and are 500 times smaller
than today's silicon based transistors." It is expected that carbon nanotubes may
replace silicon transistors when silicon transistors cannot be made smaller,
sometime in the next 10 to 20 years. IBM's approach is to use carbon nanotubes
to replace the channel in MOSFET transistors. Several aspects of this work are
interesting [4]. IBM has been able to create carbon nanotube transistors at the
same scale as today's silicon transistors. This indicates the ability to continue
shrinking transistor geometries along the path predicted by Moore's law well into
the future. A key element of this report is the ability to create precise electrical
properties in carbon nanotubes, including any desired band gap.
Spin Transistors
Like the vertical structure being developed by Intel and the carbon nanotube
being developed by IBM, the spin transistor is under active investigation by
researchers [5]. Spin technology is based on quantum theory, which is
considered to be the basis of modern physics. The spin concept behind these
devices was formalized some seventy years ago by Paul Dirac, a graduate
engineering student in Cambridge, England who turned physicist. Dirac reconciled
equations for energy and momentum from quantum theory with those of Albert
Einstein and his theory of relativity and earned a Nobel Prize in the process.
Spin technology is based on the idea that the "exchange of energy at a subatomic
level is constrained to certain levels, or quantities." It relies on the
intrinsic angular momentum of a particle, which the particle cannot gain or
lose. The concept is difficult to
convey since electrons do not have dimensions, like a radius, in the classical
sense. Researchers have identified two types of spin technology: the Spin Field
Effect Transistor (SFET) and the recently proposed Resonant Tunneling
Transistor. According to Zorpette, the SFET is a traditional FET with
ferromagnetic metal added to the source and drain, which have the same alignment of
electron spins. Electrons are injected into the source, align their axes with the
source and drain, and move toward the drain at approximately 1% of the speed
of light. At this speed an applied electric field acts as a magnetic field. Thus a
voltage applied to the gate would flip the direction of the electrons' spin so
that they become polarized against the direction of the drain. Current flowing
toward the drain would stop. The advantage is that the flipping action takes
very little energy. In
addition, the polarity of the source and drain could be flipped independently,
offering possibilities not yet understood.
The second type of spin technology, the Resonant Tunneling Transistor (RTT),
also has real possibilities. The Resonant Tunneling Transistor derives its value
from a quantum phenomenon called resonant tunneling and is an extension of
the resonant tunneling diode. According to Zorpette, an RTT device depends on
an infinitesimal region called a quantum well where electrons are confined. At a
specific voltage, called the resonant voltage, electrons begin to move (tunnel)
from the well. The resonant voltage corresponds to the quantum energy of the
well. While the spin state of the electrons is irrelevant to tunneling, researchers
have shown they can control the energy levels to create different tunneling
pathways. Thus at one voltage, current would conduct in one direction, while at
another voltage, current would conduct in the other direction. There are several
proposals on how to control these phenomena.
Semiconductor Memory Technology
Since the early 1970's data communications systems have relied on
semiconductor memory. Figure 4 shows the progress of semiconductor memory


























Figure 4: Moore's Law Effect on DRAM Technology (x axis: cumulative shipments
in bits, logarithmic scale from 1.E+09 to 1.E+18)
The key difference between memory devices (DRAM) and logic devices





at the lowest possible cost. This storage ability is contained in the
storage capacitors on the chip. Logic devices try to minimize storage
capacitance (gate delay) to achieve maximum speed. Memory devices try to
optimize storage capacitance in the memory cell to achieve the highest
capacitance per unit of area to get the lowest cost per bit possible. Performance
is a secondary consideration. It is typical in the memory industry to show the
cumulative sales (in bits) on the x-axis and the cost per bit on the y-axis. Cost
per bit is typically high when the chip is introduced and quickly declines as
manufacturing learning drives the cost down. The legend in Figure 4 shows all
the memory generations starting from 1971. What is unique about memory is the
combination of innovation and production experience, which together produce
a double logarithmic trend whereby the price per bit decreases by approximately
30 percent each time the total number of bits shipped doubles. This will probably
continue for some time, as a 16 Gbit DRAM is expected to be announced at the
International Solid State Circuits Conference in February 2002 [7].
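The double logarithmic trend described above is a classic learning curve and can be sketched as follows. This is a minimal illustration in Python. The function name price_per_bit and the reference values used in the example are hypothetical and are not taken from Figure 4; only the 30 percent decline per doubling comes from the text.

```python
import math


def price_per_bit(cumulative_bits: float,
                  reference_bits: float,
                  reference_price: float,
                  drop_per_doubling: float = 0.30) -> float:
    """Price per bit after cumulative shipments grow from `reference_bits`
    to `cumulative_bits`, falling by `drop_per_doubling` at each doubling."""
    if cumulative_bits <= 0 or reference_bits <= 0:
        raise ValueError("shipment volumes must be positive")
    doublings = math.log2(cumulative_bits / reference_bits)
    return reference_price * (1.0 - drop_per_doubling) ** doublings


if __name__ == "__main__":
    # Hypothetical starting point: price normalized to 1.0 at 1e9 bits shipped.
    for bits in (1e9, 2e9, 4e9, 1e12):
        print(f"{bits:.0e} bits shipped: relative price "
              f"{price_per_bit(bits, 1e9, 1.0):.4f}")
```

Because the decline is tied to cumulative volume rather than to calendar time, the trend plots as a straight line when both axes are logarithmic, which is why the memory industry presents its data in that form.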
As with logic transistors, memory cannot continue its historic productivity
gains forever. New types of memory structures will need to be invented to
keep pace with Moore's Law. Spin transistors offer a distinct possibility.
We are constantly reminded that most physical laws do not directly apply to
e-commerce and data communications systems. At any point in time it is rather
trivial to use the past as a roadmap to the future. In a recent paper, Keyes [8]
discusses the overall complexity of creating integrated circuits. According to
Keyes, the way we actually store and retrieve information remains controlled by
physics within the hardware. With the quantitative properties of silicon well
understood, it is the economics that dictates commercial success for e-commerce
and data communications systems. Thus it is not the ability to place copious
amounts of transistors on a chip that brings success. It is in fact the ability to use
them effectively. Therefore, I find no reason to dispute Meindl and Davis's
conclusion that the limit of terascale integration is a gate length of 13.9 nm, nor
do I have any reason to think the proposals of Chau and Marcyk are without
merit. But I am convinced that the economic structure of e-commerce and data
communications will see a paradigm shift sometime before the year 2010.
Chapter 2 - Beyond Basic CMOS
While CMOS technology dominates the system design landscape, there are
supplements to CMOS technology that can make the design of e-commerce and
data communication systems more effective. These supplemental
technologies extend the benefits of CMOS technology in a focused way. As
measured by the total number of CMOS circuits shipped, neither of these
technologies is by itself a major player in system design. But both
contribute in important and unique ways. These technologies are bipolar technology
and silicon germanium technology. While bipolar and silicon germanium are
useful supplements to CMOS, they are more prevalent in wireless applications
than computer applications.
Bipolar Technology
Practical applications of bipolar technology have been around longer than FET
technology, although the first FET patent was filed in the mid 1920's. Indeed
bipolar technology dominated the high performance system design arena from
the beginning of the 1950's up until the late 1980's. Emitter coupled logic (ECL),
a circuit variation of bipolar technology invented by Hannon Yourke in 1956 [9], was
the preferred technology for high speed information technology systems for many
years. ECL was invented to overcome the problem of excess minority carriers in
the base region of a transistor, which caused high speed switching delays and
thus slower circuits. ECL was so successful that it was the technology preferred
by IBM for its most powerful computers until the early 1990's, when it was
replaced by CMOS. Bipolar technology generally, and ECL particularly, are
power hungry technologies and more complex to construct than CMOS
technology. From a system design viewpoint, bipolar technology required
elaborate cooling systems to achieve acceptable system reliability. Some
versions of pre CMOS IBM mainframes contained chip assemblies that dissipated
over a kilowatt of power and were liquid cooled. Despite its reputation as a
power hungry technology, bipolar technology remains useful in today's
e-commerce and data communications systems.
About 1990 a paradigm shift occurred, in both the type of technology used and
how it was manufactured, that may have forever changed the shape of system
design. Prior to 1990 high speed systems used ECL (some very high speed
applications used Gallium Arsenide [GaAs] technology), and slower systems
used CMOS. Slowly CMOS, thanks in large part to Moore's Law, began to
compete favorably with ECL and GaAs. System designers could count on higher
and higher CMOS circuit densities, which required far fewer high power
transistors, to achieve performance equivalent to ECL. The tens of thousands of
transistors available on chips during the latter half of the 1980's and into the early
1990's enabled system designers to create platforms upon which were built the
forerunners of today's data communications and e-commerce systems. So
successful was CMOS that CMOS chips containing tens of thousands of
transistors became commodity products. System designers began creating chips
and having them manufactured in facilities called "foundries." Prior to the 1990's
chips were exclusively the domain of the large corporate powers. Companies
like Hewlett Packard, IBM, Texas Instruments, Siemens, AT&T and others
invested in chip making facilities costing tens of millions of dollars. In addition,
these same companies invested hundreds of millions of dollars to develop
proprietary tools and fabrication processes. Suddenly a new type of business
appeared whose sole purpose was to provide prototype and manufacturing
capability for those engineering entities that could not afford the high cost of
owning their own chip fabricator. Now the world of innovative technology was no
longer the exclusive domain of the large high tech companies. Armed with some
venture capital, a good idea, and a few smart people, relatively small teams of
people could create products that impacted the data communications and
e-commerce landscape.
BiCMOS
Bipolar technology has been combined with CMOS technology in a process
named BiCMOS. BiCMOS technology exhibits the density of CMOS technology
and the current drive capability of bipolar technology. There are three situations
when BiCMOS seems appropriate: 1) when high performance analog capability
is needed, 2) when memory performance is critical, and 3) when system cost can
be reduced. BiCMOS has been used commercially, mostly for analog
applications, since the early 1970's [10]. At that time, the value of CMOS
technology was just becoming recognized, and combining CMOS with bipolar was
the path chosen by early adopters, especially high performance SRAM
manufacturers who recognized the value of BiCMOS and had customers willing
to pay for the highest bit density and performance available. As CMOS
applications grew, so did its manufacturing complexity. Initially requiring just
a few mask steps, CMOS eventually grew in complexity to over 11 mask steps.
Thus the major simplicity advantage of CMOS over bipolar was lost. But while
losing simplicity, it became viable to combine CMOS and bipolar technologies.
Santo reports that the complexity of BiCMOS can be formidable. Texas
Instruments has reported an SRAM which takes 18 mask steps to produce, yet this
appears simple when compared to Hitachi, which used 25 mask steps to produce
its SRAM. During the early 1990's data communications systems shed their analog
heritage and, along with new wireless technology, became almost totally digital.
System designers are concerned about overall system performance.
Performance issues directly affect the amount of bandwidth a system can utilize.
Figure 5 shows a BiCMOS structure containing a PMOS device, an









Figure 5: BiCMOS Pictorial
The ability to place tens of millions of transistors on a chip, and in the near future
a billion transistors on a chip, creates opportunity for a BiCMOS process. The
large number of CMOS transistors must be interconnected to perform useful
functions. These functions created in CMOS technology must then be connected
to other CMOS functions and so on until a system is created. The material that
interconnects these functions, usually aluminum or copper aluminum, introduces
resistance and capacitance between the functions being connected. This
resistance and capacitance reduces the overall performance of a chip and
subsequently the system the chips are used to create. With its ability to drive
relatively high current, a bipolar transistor can supplement the signal strength
created by a CMOS transistor and increase its power to "drive" the signal to the
next transistor. This results in much faster chip performance. A down side of
BiCMOS is its additional cost. A BiCMOS chip requires several more
manufacturing process steps than a simple CMOS chip. Thus the
system performance gained must be sufficient to offset the
increased system cost, which Einspruch estimates to be approximately 40% [11]. A
recent paper by Bouras et al. [12] shows the value of BiCMOS to sub half micron











Figure 6: CMOS Vs. BiPolar effect on risetime.
Figure 6 shows the decrease in rise time between a BiCMOS transistor and a
CMOS transistor for capacitive loads of 0.5 pF to approximately 10 pF. With small
loads, there is little advantage to BiCMOS; however, when larger loads are
considered, there is a definite advantage to BiCMOS. This particular model
reports an advantage of approximately 400 picoseconds
($400 \times 10^{-12}$ seconds).
When we consider the total number of functions on a chip containing hundreds of
thousands of functions, the value of BiCMOS technology becomes apparent.
These loads are typical for CMOS technology wire lines. Figure 7 shows delay
time as a function of capacitive load. Again at higher loads the bipolar transistor
is capable of less delay time. At 10 pF, this improvement is approximately 500











Figure 7: Delay improvement of BiPolar over CMOS
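The dependence of rise time on capacitive load discussed above can be illustrated with a first-order model in which a driver sourcing a constant current charges the load, so that t = C x V / I. The sketch below is only an illustration: the voltage swing and both drive currents are assumed values, not data taken from Figures 6 and 7 or from the cited papers, and the model ignores the intrinsic delay of the bipolar stage that limits its benefit at small loads.

```python
# First-order model: a driver that sources a constant current I charges a
# capacitive load C through a voltage swing V in time t = C * V / I.
# All numeric values below are assumed for illustration only; none of them
# come from Figures 6 or 7 or from the cited papers.

SWING_V = 3.3            # assumed logic swing in volts
CMOS_DRIVE_A = 20e-3     # assumed CMOS driver current (20 mA)
BICMOS_DRIVE_A = 60e-3   # assumed BiCMOS driver current (60 mA)


def rise_time_s(load_f, drive_a, swing_v=SWING_V):
    """Return the time in seconds to charge load_f farads at drive_a amperes."""
    return load_f * swing_v / drive_a


def bicmos_advantage_s(load_f):
    """Return how much sooner the assumed BiCMOS driver finishes, in seconds."""
    return rise_time_s(load_f, CMOS_DRIVE_A) - rise_time_s(load_f, BICMOS_DRIVE_A)


if __name__ == "__main__":
    for load_pf in (0.5, 1.0, 5.0, 10.0):
        load_f = load_pf * 1e-12
        advantage_ps = bicmos_advantage_s(load_f) * 1e12
        print(f"{load_pf:5.1f} pF load: advantage {advantage_ps:8.1f} ps")
```

In this simplified model the absolute advantage grows in proportion to the load, which is consistent with the qualitative trend described for both figures.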
With the advantage of bipolar technology established on CMOS chips, the
question arises about when to use a bipolar transistor or function on a particular
design. Several applications make sense: 1) on chip clock
drivers, 2) on chip storage interface, 3) situations when you need to go off chip
(off chip drivers), and 4) analog applications. In all of these examples it is the ability




Clock drivers are a special class of transistor needed to keep a system
synchronized to a common electrical signal, the system clock. It is not necessary
for an entire chip to be synchronized to a single clock. Indeed there are many
applications in data communications where a piece of the total chip circuitry
operates asynchronously to interface to incoming data like the front end of a
modem, T1 line, Ethernet LAN, etc. Yet most chips operate on signals derived
from a single clock. The clock signals must connect throughout the chip and at
the same time must be precise. Providing accurate clock signals throughout the
chip is called clock distribution. Electrical signals on a chip operate in a window
of validity that can last for only a few nanoseconds or less. If the clock and data
signals are not adequately synchronized, the chip will not operate correctly.
When you think of the large number of transistors on a chip it may become
apparent that system designers need to obtain a window of validity as large as
possible to achieve the performance possible with today's technology. Bipolar
transistors, with their ability to significantly improve rise and fall delay time when
connected to large capacitive loads, are outstanding clock drivers. If used
relatively sparingly, so as to limit their tendency to consume large amounts of
power, bipolar transistors make a significant contribution to fulfilling the promise
of Moore's Law.
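The window of validity described above can be sketched as a simple timing budget: the clock period, less the clock skew across the chip, less the time the clock edge spends in transition. This is a deliberately simplified view (it ignores setup and hold requirements of individual latches), and every number in it is an assumed value chosen only to show why a faster clock edge widens the window.

```python
def valid_window_ns(period_ns, skew_ns, edge_time_ns):
    """Return the part of the clock period left for valid data, in nanoseconds.

    Simplified budget: period minus clock skew minus clock edge (rise/fall) time.
    The result is clamped at zero when the losses exceed the period.
    """
    return max(0.0, period_ns - skew_ns - edge_time_ns)


# Assumed values for illustration only.
PERIOD_NS = 5.0      # a 200 MHz clock
SKEW_NS = 0.5        # clock arrival difference across the chip
SLOW_EDGE_NS = 1.5   # edge time with a weaker clock driver
FAST_EDGE_NS = 0.5   # edge time with a stronger (e.g. bipolar) clock driver

if __name__ == "__main__":
    slow = valid_window_ns(PERIOD_NS, SKEW_NS, SLOW_EDGE_NS)
    fast = valid_window_ns(PERIOD_NS, SKEW_NS, FAST_EDGE_NS)
    print(f"window with slow edges: {slow} ns")
    print(f"window with fast edges: {fast} ns")
```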
On Chip Storage Interface
A second application for bipolar transistors on CMOS chips is to interface to
on-chip storage. It is rare for systems that support e-commerce and data
communications to be without on-chip storage. As higher levels of integration
were achieved, the amount of storage increased as well. The importance of the
ability to temporarily store information on a chip cannot be overstated. From
caches on processor chips, to content addressable memories in routers, to
on-chip storage for LAN interface functions, on-chip storage capability is a must.
Dynamic semiconductor memory consists largely of transistor arrays containing
storage elements (transistors and capacitors) while static semiconductor memory
usually lacks the capacitor element. Both dynamic and static memory contain
large transistor arrays with thousands of elements. These elements are just as
sensitive to the rise, delay and fall time of electrical signals as functional logic
elements, and must be provided with robust input. All memory systems have
tight electrical specifications and bipolar compatibility helps achieve these tight
specifications with a minimum amount of complexity. On-chip memory differs
somewhat from memory-only chips in that memory-only chips are designed with
signal distribution in mind (i.e., electrical signals that appear at the input of a
memory chip usually need to drive a single transistor, not the total array). The
memory chip designer takes responsibility for signal distribution after the signal
enters the chip. Yet it remains up to system designers to understand memory
chip requirements when designing cards and boards with many memory chips,
for example, dual in-line memory modules (DIMMs), which have many memory
chips on board, each of which depends on a robust signal input. Bipolar
transistors on CMOS chips seem a likely solution for memory applications.
Off Chip Drivers
Off chip drivers present another opportunity for BiCMOS to contribute to reduced
system cost. Until complete systems can be built on a single chip, the need to
send electrical signals off chip will remain. Going off the chip presents a
whole range of engineering challenges at the card or board level, where many
centimeters of relatively large wire must be driven to the next subsystem. Today,
as in the past, it is common for special chips, called line drivers, to perform the
task of driving large capacitive loads. While these line driver chips are not
usually expensive by themselves, system designers need to account for their
materials cost, their cost of assembly and test, and, increasingly, the amount of
physical space they use. Good examples are systems with cost and/or physical
size constraints such as cell phones, airliner avionics, laptop computers, and low
end routers, switches, and hubs. Integrating bipolar transistors into the I/O of a
CMOS chip eliminates the need for stand alone line drivers and receivers, making
overall system design, assembly and test less expensive.
Analog Applications
BiCMOS technology has become a critical element in many high performance
analog applications. Chief among them are wireless applications. Szmyd et al.
describe an RF wireless technology [13] capable of high gain and a wide range of
applications such as low noise amplifiers (LNAs), integrated voltage controlled
oscillators, mixers, A/D converters, and power amplifiers (PAs). Szmyd claims
the 250 nanometer (0.25 micron) technology is capable of meeting the
requirements of all currently known wireless applications. This BiCMOS process
is important to system designers because it supplements an existing CMOS
process already in manufacturing and allows the use of existing CMOS logic
functions. This implies little or no additional cost to use this technology in future
system designs. Szmyd offers proof of the robustness of this technology through
the use of Gummel characteristics, which plot transistor data over many decades
of current and are capable of identifying non-ideal transistor behavior. Figures 8
and 9 show Gummel plots of Szmyd's technology for both NPN and lateral PNP
transistors.
Figure 10 [14]












Figure 8: NPN Transistor Linearity (horizontal axis: Vbe, 0.2 V to 1.2 V) and
Figure 9: Lateral PNP Transistor Linearity (horizontal axis: Veb, 0.2 V to 1.2 V)
Silicon Germanium
A second key BiCMOS technology for future system designs is silicon
germanium (SiGe) technology. Silicon germanium is still emerging but is
positioned to contribute to future wireless and optical systems. It has two highly
important properties: its ability to work at very high frequencies and its improved
power efficiency as compared to silicon, the latter contributing to increased
battery life in portable applications, which are expected to become ubiquitous in
the future. Some examples of silicon germanium applications are digital set top
boxes, automobile collision avoidance radar systems, personal digital assistants,
global positioning systems, optical network devices, switches, and routers.
Figure 10: Block Diagram of an RF Front End.
Some have projected applications such as single chip, watch sized wireless
phones with global positioning receivers and internet access in a case the size of
today's cell phones. Applications previously relegated to gallium arsenide
technology will be implemented using silicon germanium. Some announced products
include a high bandwidth 10 Gbit/sec SONET optics based communications
system (Alcatel) [15], an 11 Mbit/sec 802.11 wireless network solution [16], a 40




and spread spectrum cell phones.
Silicon Germanium Background and Future
According to Harame and Meyerson [20], silicon germanium became an attractive
technology after several difficult physics, engineering and manufacturing
challenges were solved. The result was a heterojunction bipolar transistor
(HBT) that required no changes to the BiCMOS process being used to
manufacture products at IBM. As previously mentioned, IBM discontinued the
use of ECL technology in 1992 and now uses CMOS technology for future
mainframe systems (now called Enterprise Servers). SiGe technology was
reapplied to BiCMOS mixed signal applications, which are ideal for RF wireless
data communications. Analysis shows why SiGe is such a formidable performer:
its improved base sheet resistance over silicon (Figure 11).
Figure 11: Improvement in Base Sheet Resistance of Silicon Germanium over Silicon
The base sheet resistance is significantly lower than that of silicon. IBM reports a
10 times improvement in collector current over silicon devices built in the same
process. IBM further reports transistor fT greater than 100 GHz using a more
recent process.
One unlikely application of SiGe technology was in hard disk drives [21].
E-commerce and data communications systems often depend on large amounts of
fast hard drive storage, mostly for database applications. SiGe technology has
been used to create a chip that translates analog signals from a disk drive read
head into digital words using a PRML (partial response, maximum likelihood)
signal processing algorithm. The chip was clocked at 75 MB/sec, which is reported
to be the fastest PRML chip speed to date. The literature on SiGe applications
continues to grow. One may conclude that the combination of a base CMOS
technology, supplemented with bipolar technology to create BiCMOS and further
improved with SiGe, bodes well for future e-commerce and data communications
systems. But what will we do with all of those transistors?
Chapter 3 - Choices in the Design of Data
Communications Systems
While the world of marketing tries (very often successfully) to artificially segment
the market for its own purposes, there remain fundamentally two types of
systems from a generic classification view. The two extremes are low cost
systems, often called commodity systems, and high performance systems, often
called cost - performance systems. Specific system design points are positioned




Figure 12: Concept of Cost - Performance systems
Figure 12 pictorially describes the concept. Low cost systems are positioned on
the far left of Figure 12 while cost - performance systems are positioned
anywhere on the line. According to Porter [22], "Competitive advantage grows
fundamentally out of a firm's ability to create value for its buyers that exceeds the
firm's cost of creating it. Value is what buyers are willing to pay for, and superior
value stems from offering lower prices than competitors for equivalent benefits or
providing unique benefits that more than offset a higher price." The concept of
cost - performance systems fits well with Porter's statement. Of course this does
not mean that low cost systems equate to low technology. Many low cost
systems are also high technology. For example, cell phones, smart cards, global
positioning receivers, interactive video games, "Easy Pass" toll booth technology,
small office switching systems, LAN switches, etc., are all high technology
systems, but they are low cost systems too. They are, of course, enabled
fundamentally by the productivity predicted by Moore's Law. One may note that the
examples are consumer oriented and in most cases require an enormous data
communications and e-commerce infrastructure behind them to create value.
Low cost systems
System designers must always be cognizant of system manufacturing cost in
addition to the cost to develop a system. The approach to a specific low cost
system design could take a path using commodity "off the shelf" (OTS)
components, or some combination of OTS components and custom
design [a]. Commodity components, some of which are very powerful, have the
advantage of being immediately available in small quantities. Thus, system
designers can begin to create prototypes almost immediately. In addition, OTS
manufacturers offer application notes, reference designs, and in some cases
debug assistance to bring prototype systems to life and give those working on
software development early access to working prototypes. Generally, low cost
systems are more sensitive to market windows of opportunity because low cost
systems can often be substituted for other low cost systems. Thus, fast time to
market is often critical to success. Yet in some cases, it makes sense to custom
design more of the critical parts of the system. Table 1 shows some reasons
why OTS components may not be the best choice to achieve future system
goals.











Benefits of avoiding OTS components
[a] Author's note: It is highly unlikely that a designer would consider attempting to design and
fabricate the complete bill of materials of a system. My purpose here is limited to those
semiconductor components which system designers have the skill and ability to use to create
value.
In the first row of Table 1 [23], existing products which benefited by avoiding OTS
components during the original design are generally looking for a functional
improvement and desire to, at minimum, maintain performance while avoiding
product discounting by offering an increase in function at the same price. In the
second row, existing products expect to achieve a manufacturing cost reduction
while maintaining performance and function. This is generally achieved by
convincing the supplier to reduce the price of the current design point or by
redesigning the function in a less costly technology, for example, moving the
function from a 0.5 micron technology to a 0.25 micron technology. If no additional
function is added, the new chip will be significantly smaller and garner the
benefits thereof. In the third row, old design points are no longer competitive and
cannot be refreshed. The new design is expected to achieve a competitive point
through increased performance and function while decreasing (or at least
maintaining) price. It may be possible to refresh a design point using OTS
components, but the design schedule is often influenced by the OTS component
manufacturer, who may not be motivated to replace the current OTS product on
the schedule needed.
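The size benefit of moving a function from 0.5 micron to 0.25 micron technology can be estimated with a short calculation. The sketch below assumes an ideal linear shrink, in which every dimension scales with the feature size so that area scales with its square; real redesigns rarely shrink this cleanly, and the 100 mm^2 starting die is a hypothetical value chosen only for the example.

```python
def shrunk_area_mm2(area_mm2, old_feature_um, new_feature_um):
    """Return the die area after an ideal linear shrink (area scales as the square)."""
    scale = new_feature_um / old_feature_um
    return area_mm2 * scale ** 2


def candidate_die_gain(old_feature_um, new_feature_um):
    """Return how many times more die sites fit in the same silicon area."""
    return (old_feature_um / new_feature_um) ** 2


if __name__ == "__main__":
    original_area = 100.0  # hypothetical die area in mm^2
    new_area = shrunk_area_mm2(original_area, 0.5, 0.25)
    print(f"{original_area} mm^2 at 0.5 micron becomes {new_area} mm^2 at 0.25 micron")
    print(f"die sites per unit of silicon area: x{candidate_die_gain(0.5, 0.25)}")
```

Under this assumption the same function occupies one quarter of the original area, which is the source of the manufacturing cost reduction described for the second row.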
FPGA Chips
Out of the tremendous growth in silicon integration came a class of OTS chip
called a field programmable gate array (FPGA). FPGA chips can take on logic
functions once considered the domain of custom chip designers. FPGA chips do
not require the up front investment in complex modeling and simulation tools that
traditional chip designs do. They also avoid the prototype delay process, which
some have estimated to be as long as six months, and which is often critical to
narrow market windows. Some types of FPGA chips have their logic
configuration set at system boot time by downloading their personality from some
type of off-chip storage, while others are set during the system manufacturing
process by incorporating read only technology or "blowing fuses". A very
attractive feature of FPGA chips is the difficulty they present to those who try to
"reverse engineer" a system. Those who implement a system using OTS
components are exposed to the risk of easy reverse engineering. While FPGA
devices are not as difficult to reverse engineer as custom chips, they do reduce
the exposure to reverse engineering. In addition, because FPGA devices can be
reprogrammed again and again, system designers can quickly correct design
errors during the development process. An additional FPGA capability is the
ability to download new function or implement engineering changes to systems
already in the field. One example of this feature is the Dish Network(TM) satellite
set top box, where new features, design fixes, or system upgrades can be
downloaded from the satellite with no action required by a customer. This is
attractive to a customer as it tends to protect the customer's investment
from technology obsolescence. Over the next few years, FPGA chips are expected to
exceed 300,000 gates [b] on a chip, giving them unprecedented capability in
this class of chip. While FPGAs are certainly a solution for future systems, they
are not yet capable of analog functions. This puts them at a disadvantage when
competing in the high growth wireless system market.
In addition, as long as cost is a prime factor, FPGAs will always be subject to
replacement by custom chips, which can pack much more logic per square
millimeter than an FPGA. As the number of transistors on a chip continues to
grow, FPGA products are positioned very well to contribute to low cost systems
with narrow market windows. An important factor not mentioned in Table 1 is the
possibility of long term competitive advantage in using the intellectual property
contained in a firm's own design. Such a condition is described by Abrial et al. [24].
[b] A gate is a measure of the amount of logic on a chip. A single gate is representative of a 2 input
NAND gate. Chip designers often state the size of a chip in terms of gates, RAM, and I/O.























Figure 13: Block Diagram of a programmable Smart Card
Abrial describes the design of a contactless RF smart card chip that integrates an
on-chip antenna with a power reception system and contains an on-chip 8 bit
processor for programmable intelligence. Such a design shows what the future
holds for low cost systems and is an excellent example of custom design for a
low cost system. Figures 13 and 14 show the chip architecture and RF front end.
Such a design point, which operates in compliance with ISO standard 14443 for
compliant radio frequency emitter/receivers, is likely to have long term value add.
The chip described by Abrial is aimed at the smart card market. The chip derives
its power from its proximity to a smart card reader and thus requires no on board
power source. Because of the high number of smart cards expected to be used
in the future, it makes sense to invest in a custom chip. In the future, Abrial and







Figure 14: Proprietary custom design suitable for future reuse.
[c] This technology is presently used in many places across Europe and is being tested in North
America. This technology is similar to that used under the trademark EZ PASS for toll booth and
commuter services in the northeast U.S. The cost is sufficiently low that it is being incorporated
into disposable watches for commuting consumers.
Cost - Performance Systems
The second general type of system is the cost performance system. The cost
performance system differentiates itself from low cost systems by the capital
outlay needed to install it and the implied durability and reliability of the
system. If a consumer grade product fails, a small number of people will be
inconvenienced. If a large switch or router fails, hundreds of thousands of
people, or more, could be seriously affected through disruptions in e-commerce,
emergency services and even commercial aviation. Cost performance systems
are also differentiated by the product features that are offered, the skill and
experience of the employees who operate the system, and of course the fact that
a buyer may be "locked in" to a particular type of system, so that substitution is
not a viable solution. An example of a cost-performance system is a LAN switch (as
contrasted to a LAN repeater or network interface card [NIC]). A LAN switch is
on the low cost side of the cost performance diagram in Figure 12. It is typically
made from OTS components with FPGA logic. Some LAN switch manufacturers
with large installed bases may use their own proprietary design to discourage
reverse engineering, guarantee homologation, and achieve more control over
profit margins. LAN switches do relatively simple tasks and do not need
substantial on board software. System engineers connect many prototype LAN
switches during the development process and perform network tests to identify
problems prior to deploying the system in the field and to create a customer set
up experience similar to that of a television set. Basic LAN switches, for
example 10 Mbit Ethernet switches, are sold via catalog and are typically
plugged in and turned on by the buyer. Since there is little complexity in a basic
LAN switch, these switches can operate for years with few problems.
Figure 15: Routing Chip
Some LAN switches offer more function, which requires additional logic to
implement and is of more value to the buyer; for example, a 10/100 Mbit LAN
switch. Adding 10/100 Mbit capability costs very little to implement on a LAN
switch but is critical to most installations, since the original installed base was
likely 10 Mbit Ethernet and is migrating to 100 Mbit Ethernet at the desktop. Such
a dual speed capability is valuable in the market and buyers will pay more for it.
In addition, some LAN switches offer some level of diagnostics that enable
system administrators to manage the network from remote locations. This often
requires the addition of diagnostic software and a processor to support it
somewhere in the switch. In most system designs, custom processors (CPUs)
are rarely implemented. The cost of developing the processor and the tools to
support it is prohibitive. OTS processor components, both real
time and general purpose, are widely available and well supported. Examples
are MIPS, SPARC, IBM 4xx series, ARM, Pentium, and others. So far we have
considered low cost consumer technology and two examples of
cost-performance technology. Now we can consider the opposite end of the
cost-performance chart.
As the capital outlay of a system increases so does the need to invest in
performance, reliability and diagnostics. Examples of high cost/high performance
systems might be the 5ESS switches from AT&T, the DMS 100 class of switch
from Northern Telecom, or the Cisco 7500 class of router. Highly skilled people
develop, install and maintain this class of system. Many companies invest in
research groups to stay competitive in high performance systems. The very best
of these companies file patents to protect their intellectual property. IBM, for
instance, files over 3000 patents every year to stay competitive. In addition,
patent licensing is a significant source of revenue for the leading high tech
companies worldwide.
Choosing a design point for high performance systems is not so much choosing
OTS components over custom components (all the key semiconductor
components are custom designed in high performance data communications
systems), as it is choosing an appropriate architecture to be implemented in
custom chips. The literature has many examples of high performance circuits
and chips that support data communications systems. Figures 15 and 16 show
examples of high performance data communications chips.
Figure 16: OC-48 SONET Transceiver
Figure 15 shows a six port nonblocking router chip with a
sustainable I/O bandwidth exceeding 30 GB/second [26]. The chip contains three
separate clock domains which allow lower speed subsystems to attach to the
chip. The chip uses 180 nanometer technology and contains 6.6 million
transistors within a 100 mm^2 area. The chip consumes 21 watts at a supply
voltage of 1.75 volts [d]. The chip was designed to support high bandwidth
multiprocessors which are expected to be critical to future e-commerce systems.
Furthermore, the chip is constructed as a macro cell, a concept which will be
considered in a following chapter. Figure 16 shows a block diagram of a 2.488 Gb/s
OC-48 SONET transceiver designed in standard CMOS [27]. The chip is capable of
sustained throughput of 8 x 622 Mbit/sec channels and shows how standard
CMOS capability has been extended. Some had predicted that BiCMOS technology
would be needed to achieve this level of performance. The chip dissipates a
total of 500 mW of power using a supply voltage of 1.8 V. Chip size is 12.2 mm^2 [e].
[d] Note: This chip dissipates 21 watts using 6.6 million CMOS transistors. Consider the
implication for future system designers, who will have close to a billion transistors on a chip
in the next few years.
[e] The transceiver chip of Figure 16 is 8.2 times smaller and has 12.5 times less bandwidth than
the router chip of Figure 15. Considerably less power is expected to be dissipated.
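The implication raised in the note on the 21 watt router chip can be made concrete with a rough calculation. The sketch below uses only the figures quoted above (21 watts, 6.6 million transistors, 1.75 volts, 100 mm^2) and then extrapolates linearly to a billion transistors. The linear extrapolation is an assumption made purely for illustration; it ignores supply voltage scaling, switching activity, and design technique, so it should be read as an upper-bound thought experiment rather than a prediction.

```python
# Figures quoted in the text for the router chip of Figure 15.
ROUTER_POWER_W = 21.0
ROUTER_TRANSISTORS = 6.6e6
ROUTER_SUPPLY_V = 1.75
ROUTER_AREA_MM2 = 100.0


def watts_per_transistor(power_w, transistors):
    """Return the average power drawn per transistor, in watts."""
    return power_w / transistors


def naive_scaled_power_w(target_transistors):
    """Extrapolate chip power linearly with transistor count (an assumption)."""
    per_device = watts_per_transistor(ROUTER_POWER_W, ROUTER_TRANSISTORS)
    return per_device * target_transistors


def supply_current_a(power_w, supply_v):
    """Return the supply current implied by P = V * I, in amperes."""
    return power_w / supply_v


if __name__ == "__main__":
    per_device_uw = watts_per_transistor(ROUTER_POWER_W, ROUTER_TRANSISTORS) * 1e6
    print(f"average power per transistor: {per_device_uw:.2f} microwatts")
    print(f"supply current: {supply_current_a(ROUTER_POWER_W, ROUTER_SUPPLY_V):.1f} A")
    print(f"power density: {ROUTER_POWER_W / ROUTER_AREA_MM2:.2f} W/mm^2")
    print(f"naive power at one billion transistors: {naive_scaled_power_w(1e9):.0f} W")
```

Even allowing for large errors in the assumption, the result is measured in kilowatts, which is why power dissipation appears among the packaging concerns discussed later.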
Chapter 4 - Challenges to E-Commerce and Data
Communications System Design Part I
In August 2001 the IBM Personal Computer reached its twentieth birthday [28].
A mere twenty years ago the IBM personal computer contained an Intel 8088
microprocessor, with an 8 bit external data bus, operating at 4.77 MHz. Memory
consisted of 16 kilobytes, expandable to a maximum of 64 kilobytes, with a 5.25
inch 160 kilobyte floppy disk. The display was only 11.5 inches and monochrome.
The operating system was MS DOS and the only significant software application
was VisiCalc, an early form of spreadsheet. The IBM system sold for $2,665.00.
A typical PC system sold during December 2001 was a Dell Dimension 8100. The
Dell operated at 1.3 GHz. Memory consisted of 128 megabytes with a maximum of
512 megabytes. Additional peripherals consisted of a 56 kilobit modem, a 20
gigabyte hard drive, a 3.5 inch floppy drive, a rewritable CD drive, external
stereo speakers and a 104 key keyboard. Software consisted of a 32 bit
multitasking operating system, MS Works, a digital music jukebox, digital imaging
software and an antivirus software package. The Dell system sold for $999.00.
If one assumes that prices double every ten years, the Dell system should have
cost $10,660 (the IBM price of $2,665 doubled twice). The contribution of silicon
integration to e-commerce and data communication cannot easily be overstated.
Today's notebook "Wintel" system, whose processing unit operates at over 1 GHz,
which contains DRAM storage often exceeding 512 megabytes and tens of
gigabytes of hard drive storage, which is capable of being networked to other
computers anywhere in the world, and which weighs less than 3 pounds at a price
of less than $1300, is nothing short of a modern industrial miracle. Indeed, if
automobiles had declined in price as computers have, we would be driving
Ferraris and throwing them away every two years. E-commerce and data
communications system designers must continue to make use of the capability
provided by silicon integration.
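The price comparison above rests on simple compounding, which can be checked with a few lines of code. The sketch uses the prices, clock speeds, and memory sizes quoted in the text; the ten year doubling period is the text's own assumption, and the function and variable names are introduced here only for the example.

```python
def expected_price(base_price, years, doubling_period_years=10):
    """Return the price after compounding, assuming it doubles every period."""
    return base_price * 2 ** (years / doubling_period_years)


# Figures quoted in the text.
IBM_PC_PRICE = 2665.00      # 1981 IBM Personal Computer
DELL_PRICE = 999.00         # December 2001 Dell Dimension 8100
IBM_PC_CLOCK_MHZ = 4.77
DELL_CLOCK_MHZ = 1300.0
IBM_PC_MEMORY_KB = 16
DELL_MEMORY_KB = 128 * 1024

if __name__ == "__main__":
    projected = expected_price(IBM_PC_PRICE, years=20)
    print(f"projected price after 20 years: ${projected:,.2f}")
    print(f"projected / actual price: {projected / DELL_PRICE:.1f}x")
    print(f"clock speed ratio: {DELL_CLOCK_MHZ / IBM_PC_CLOCK_MHZ:.0f}x")
    print(f"base memory ratio: {DELL_MEMORY_KB // IBM_PC_MEMORY_KB}x")
```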
The productivity provided by Moore's Law is much more than placing transistors
on a slice of silicon. There must be significant technical infrastructure to make
use of silicon integration and create systems. As system designers we must
understand a portion of this infrastructure to make decisions which will affect
future system products. The International Technology Roadmap for
Semiconductors [29] has defined twelve specific areas which must be addressed by
the semiconductor industry to keep pace with the future needs of the
e-commerce and data communications industry, among others. Table 2 identifies
the twelve areas.
Design                        Test              Process Integration, Devices & Structures   Front End Processes
Lithography                   Interconnect      Factory Integration                         Assembly and Packaging
Environment, Safety & Health  Defect Reduction  Metrology                                   Modeling and Simulation
Table 2: Key areas affecting future system design
For the most part, unless one is associated with systems directly involved with
the development and manufacture of computer chip technology, the table has
limited use. Yet e-commerce and data communications system designers should
be cognizant of the factors in the table and more specifically the design
possibilities and test requirements that affect future systems if they expect to
successfully apply semiconductor technology. Table 3 contains a brief
description of the elements in Table 2 to add context and perspective to those
designing future e-commerce and data communications systems. Of the
elements listed in Table 2, design, chip testing, and assembly and packaging are
most closely associated with e-commerce and data communications system
design. The other elements are more closely associated with the development







Design: Continuous improvements in transistor density, heterogeneous
technologies (BiCMOS, silicon germanium, etc.), faster clock speeds, mixed signal
applications, trends toward single chip systems, and time to market pressure
combine to fuel design complexity and make chip design a formidable challenge.

Test: Closely driven by design challenges, chip testing must deal with
electrical verification of the design as well as chip reliability and failure rates,
and with added complexity due to mixed signal applications and embedded
systems like processors, memory, etc. All of these problems must be solved
within acceptable test cost. A formidable challenge indeed.

Metrology: The ability to take accurate and repeatable measurements all
along the manufacturing process is becoming increasingly difficult. Some have
projected a measurement barrier at the 70 nm level. Without acceptable
metrology, manufacturability of gigascale integration may hit a plateau.

Defect Reduction: Closely related to metrology is the ability to find and remove
defects in the manufacturing process. The current method of finding defects
using ultraviolet light is reported to be unreliable at 130 nm. This represents
another exposure to chip manufacturing.

Modeling and Simulation: Gigascale integration will require improvements in the way models
new gate materials gain acceptance, the ability to include their
characteristics in the models and simulators is critical to
achieving the next generation of integration. As transistors
continue to get smaller, 2nd and 3rd order effects become more important.

Interconnect: Critical to successful gigascale integration is the ability to
distribute clock, data, and control signals around a chip. With continued
microminiaturization, the effects of cross talk (i.e., electrical signals being
induced on an unintended signal line) have been observed. The problem is
exacerbated by the number of wiring levels used by current technology.

Factory Integration: The ability of the production line to implement
improvements that keep gigascale integration cost effective, the ability to keep
production cycle time competitive, and the ability to implement yield learning in
the face of almost continuous change.

Assembly and Packaging: Chip packaging is another critical aspect of gigascale
integration. With over a billion transistors on a chip, chip connection beyond its
borders is a major concern, along with power dissipation and reliability. Several
solutions are possible but may not solve the problem, just move it to another
part of the system.

Environment, Safety & Health: The gigascale integration industry must
accurately assess the impact of new chemicals and compounds on the
environment, safety, and health of all stakeholders prior to implementing large
scale production. In some areas natural resource and energy conservation are
taking an important role (e.g., Silicon Valley, California, where job loss is a real
possibility).

Lithography: The ability to implement lithographies using extreme ultraviolet
(EUV), electron projection (EPL), electron beam direct write (EBDW), and
proximity x-ray (PXL) technologies needs to be understood in order to project
manufacturing costs, foundry investment, and payback time.

Front End Processes: Front end processing has to do with the oxides, silicates,
and new gate materials, the uniformity of these materials, etching selectivity,
and many other aspects of fabricating chips. Present materials are reaching
physical limits, and unless innovations are found gigascale integration may reach
a plateau.

Process Integration, Devices & Structures: This aspect is tightly coupled with
front end processing. With future chips projected to contain over a billion
transistors, and because DRAMs quadruple in capacity at roughly two year
intervals (i.e., 64 Mbit, 256 Mbit, 1 Gbit, 4 Gbit, etc.), DRAMs are expected to
require new types of storage cells before other types of chips. New cell
structures will be needed to complement the low k dielectrics and copper
multi-layer interconnect and to maintain adequate capacitance for the DRAM
storage cell.
Table 3 - Summary of the Technology Node Challenges as defined
by the International Technology Roadmap for Semiconductors
1999. Source: Semiconductor Industry Assn.
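The DRAM capacity progression cited in Table 3 (64 Mbit, 256 Mbit, 1 Gbit, 4 Gbit) follows from quadrupling at each generation. A minimal sketch of that progression is given below; the two year interval is the figure stated in the table, while the function name and its parameters are introduced here only for illustration.

```python
def dram_roadmap(start_mbit, generations, years_per_generation=2):
    """Return (years from start, capacity in Mbit) pairs, quadrupling each generation."""
    return [
        (generation * years_per_generation, start_mbit * 4 ** generation)
        for generation in range(generations)
    ]


if __name__ == "__main__":
    for years, capacity_mbit in dram_roadmap(64, 4):
        print(f"year +{years}: {capacity_mbit} Mbit")
```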
Custom Chip Design
In the previous chapter it was stated that no matter what the design point, the
need to create a custom chip was highly likely. Whether the system is a low cost
consumer product or a high cost, high performance system, only custom
logic is capable of providing the highest performance and the lowest cost. We
mentioned that there are OTS components which in some cases make sense for
specific design points, but custom logic is king. When it comes to kings, the
Application Specific Integrated Circuit (ASIC) is the king of kings.
Application Specific Integrated Circuits
We know that the silicon productivity predicted by Moore's Law manifests itself in
the capability to place many millions of transistors on a chip. The problem then
becomes how we use all of these transistors. From the early days of transistor
logic, companies like Fairchild Semiconductor, Texas Instruments, Motorola and
a few others created a set of chips called transistor-transistor logic (TTL). Each
chip performed a logic function, (for example a nand gate or a decoder), and
systems were created by combining hundreds, if not thousands, of TTL chips.
Companies, like IBM, who were undergoing tremendous growth in the general
purpose computer market, were concerned that the supply of TTL chips could be
disrupted or might suffer serious reliability problems. Thus some companies, like
IBM, created a corporate division to develop and manufacture the needed TTL
parts internally. This internal manufacturing capability enabled tight control over
the manufacturing schedule, the chip failure rate, and perhaps most important
the future shape of the technology. As Moore's Law evolved, it became clear
that the TTL logic functions could be placed on chips with much higher levels of
integration. At this time (1962), a chip containing four 2 input nand gates was
placed in a 14 lead plastic package. After the beginning of the ASIC revolution
(1972), logic functions containing at first hundreds of nand gates, and then
thousands of nand gates were placed on single chips. By the 1990s chips
containing 5 million equivalent nand gates were possible. To help with the
explosion of available gates on a chip, macrocells were created. A macrocell is a
larger logic function created from smaller logic functions. For example, rather
than create a 32 bit exclusive or ("xor") gate from basic nand gates, designers would
simply choose a 32 bit "xor" macrocell from a design library. The macrocell was
designed, tested and known to be correct prior to its addition to the library.
Designers were then able to manipulate macrocells rather than fundamental
nand gates. Along with increasing circuit (nand gate) density, the close (on chip)
proximity of the transistors to each other and the decreasing need to go off the
chip to reach the next gate resulted in performance increases to the range of 70
to 100 MHz. Faced with an explosion of power dissipation as chips were packed
with transistors, CMOS transistor technology evolved. While CMOS technology
requires more transistors to do a specific function, it dissipates little power and
was first favored by watch manufacturers who relied on small watch batteries to
survive. With such increases in transistor density it quickly became apparent that
some level of computer design automation was needed.
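The macrocell idea described above can be illustrated with a minimal sketch. It assumes a one bit "xor" built from four 2 input nand gates and a 32 bit macrocell made of 32 copies of that cell; the function names are hypothetical and do not come from any real design library.

```python
def nand(a: int, b: int) -> int:
    """2 input nand gate operating on single bits (0 or 1)."""
    return 1 - (a & b)


def xor_from_nand(a: int, b: int) -> int:
    """One bit xor built from four nand gates."""
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


def xor32_macrocell(x: int, y: int) -> int:
    """Hypothetical 32 bit xor macrocell: 32 copies of the one bit cell."""
    result = 0
    for bit in range(32):
        a = (x >> bit) & 1
        b = (y >> bit) & 1
        result |= xor_from_nand(a, b) << bit
    return result


# The designer instantiates one verified macrocell instead of wiring
# 128 individual nand gates by hand.
print(hex(xor32_macrocell(0xFFFF0000, 0x0F0F0F0F)))
```

The point of the sketch is the level of abstraction: once `xor32_macrocell` has been verified, a designer only needs to check that it is connected correctly.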
Design Automation
In the beginning, designing a function was accomplished with large blueprint like
diagrams (schematics) constructed by hand. This was quite suitable for designs
no larger than 100,000 gates. But the typical ASIC was quickly approaching
250,000 gates and some invention was needed. The invention was computer
aided design or as some call it electronic design automation (EDA). Early EDA
tools were simple schematic capture tools which transformed engineer readable
schematics into design information suitable for computer manipulation. As higher
levels of integration were achieved, schematic capture was no longer viable and
a second generation tool which depends on a higher layer of abstraction was
created. This was the macrocell mentioned above. Beyond this first level of
abstraction came VHDL, the VHSIC Hardware Description Language. VHDL
enabled designers to use a textual language rather than macrocells. It also
allows designers to focus more on the architecture of a design than on its
gate level implementation, and to simulate the system before it is cast in
silicon. Concurrent with VHDL is logic synthesis. Logic synthesis is
equivalent to a software compiler: it translates a high level design
language into gate level logic. Implied in the use of HDL tools is the need to
do a "top down" design. Prior to the advent of HDL tools, custom chips were
designed from the gate level up. Engineering skill levels have increased
proportionately as HDL became the design method of choice.
Types of ASICs
ASICs are classified into three general types: the gate array, the standard cell, and
the full custom chip.
Gate Array
The gate array is a chip which contains many thousands, in some cases millions
of transistor cells. The chips are stocked approximately 75% through the
manufacturing process and ready to be personalized with logic design data.
Manufacturers stock a menu of chip sizes. Of course, the larger the chip the
more gates available. A manufacturer takes the design data, creates masks, and
applies the masks to the chips. The masks create the transistor interconnects
that form the total logic function of the chip. In a gate array the logic function is
defined by the wiring layers. Gate arrays typically exhibit fast manufacturing
time since the chip is 75% manufactured. Gate arrays can also be constructed
in two ways: channeled and channelless.
A channeled gate array is architected so that room is reserved for gate to gate
wiring (the channel). Because the wiring channel is reserved, contact holes
enabling connection of the transistors can be opened prior to being placed in
stock. Thus, the chips are stocked closer to completion of the manufacturing
process. This can result in the fastest time to market. The second type is the
channelless gate array, the Sea Of Gates (SOG). The SOG architecture fills the
channel with transistors. No space is left for wiring. Wiring is accomplished by
connecting over the top of the transistors. Thus the transistor contact holes
remain closed when the chip is placed in stock. This requires a few more
manufacturing steps before the chip can be delivered to the design team.
Generally SOG technology is preferred because more transistors are available to
be personalized and it results in a smaller chip.
Standard Cell
Standard cells are differentiated by the fact that all mask layers must be applied
by the manufacturer after design data is received. This results in longer delivery
times than a gate array. A second differentiator is the chip core which is not
constructed of pre-placed and predefined logic as in a gate array. Standard cell
chips are tailored to achieve maximum density and performance. Standard cell
chips typically offer twice the density and 1.5 times the performance of a gate
array. In addition, because each logic cell is optimized, a standard cell chip does
not need to be larger than absolutely needed. There is one exception that affects
chip size: the I/O bound condition. I/O bound is a condition whereby the chip size is defined
not by the internal logic but by the number of receivers and drivers needed to
enter or exit the chip. It is common for chips to have relatively little logic but a
large number of I/O cells. Thus manufacturers do not "stock" a menu of standard
cell chip sizes containing transistors or macro cells. They stock the ability to
manufacture specific images. Each standard cell is manufactured to designer
specifications.
Full Custom
A full custom chip is the third type of ASIC. Full custom chips are 100% custom.
Each gate and cell on the chip is fine tuned for the function it is performing. The
physical configuration of a transistor is modified for maximum performance and
the smallest overall size needed to do its assigned task. Placement of higher
function logic elements (SRAM, registers, buffers, etc.) is fine tuned for
maximum performance, minimum size or both. Full custom chips have much
longer design times than either gate arrays or standard cells. Some designers of
well known custom chips have designed and built their own custom library and
fine tuned it to a specific need. Full custom chips are generally developed to
meet a large specific market. A good example is the Pentium processor from
Intel.
ASIC Libraries
It is important that system designers be somewhat familiar with ASIC library
development. We know that ASIC products are formed from libraries of gate
level components. But how are these components created? Figure 17 shows the
design flow of an ASIC library30.
Step 1 is the development of the photolithographic and chemical processes
needed to fabricate the chip. Sufficient development work must be done to
assure the process is controllable and repeatable and the processing tools are

Figure 17: Technology Development flowchart from Process Development to Certification.
(Flowchart stages recoverable from the figure: Transistor Model Extraction and
Verification; Library Design and Verification; Timing Model Extraction; CAD Tool
Implementation; Library Certification and Documentation.)
Step 2 is establishing a process electrical baseline of all the transistor
parameters as well as their effects on each other and their reliability.
Step 3 requires building many chips and testing the parameters measured in
step 2. Step 3 creates the database from which a statistical analysis can be
applied to correlate physical to electrical parameters.
Step 4 is transistor model extraction and verification whereby many parameters
are selected from the statistical database and entered into a transistor level
simulator. One popular simulator is called SPICE. SPICE is capable of
simulating transistors to a high degree of accuracy. SPICE models typically

Figure 18: ASIC development details related to system market need.
Step 5 Library elements are designed and verified using the data from previous
steps and test cells designed to check the accuracy of the library elements.
Libraries can consist of dozens of logic elements and can extend into the low two
hundreds if the I/O models are included.
Step 6 is timing model extraction whereby the propagation delays of both rising
and falling pulse edges are analyzed and provided to the timing tool delay
equations so chips can be accurately timed.
Step 7 CAD tool implementation is a process whereby behavioral language
models are created for every element in the library. The CAD tools then have the
ability to simulate designs for logical accuracy and performance. All possible
logic states must be available i.e., true, false or don't care.
Step 8, Library Certification and Documentation, is the point at which the
library is certified and documented and becomes ready for customer use. Library
verification is usually done under the watchful eye of a quality assurance
representative. Figure 18 shows a complete pictorial flow that a system designer
should be aware of up to the prototype manufacturing interface. Past the
prototype manufacturing interface, the manufacturer uses the design data to
create masks and extract data to test the chip. The level of simulation,
verification, and checking cannot be overstated. Depending on the
manufacturing volumes, system designers may need to pay several hundred
thousand dollars for a few prototype parts. Manufacturers will negotiate
prototype charges when the charges can be amortized over the thousands or
millions of parts expected to be manufactured. System designers never seem to
do enough simulation. The huge cost of investing in a high performance ASIC
demands that every possible amount of performance be extracted from a design.
Thus designers are constantly requested to add more function and keep on
schedule. A similar environment exists in large software projects. As hardware
description languages evolve and advances in design automation occur with the
resulting improvement in levels of abstraction, future chip designs may look more
like a software design, code and test project than a chip design project.
Design Productivity
A key concern for future system designers must be the productivity of a design
team itself. Examples in this thesis show the enormous complexity in a high
performance ASIC chip. The chip design tools must take into account the micro
miniature wires that connect the transistors, where the transistor functions are
placed on a chip in relation to other functions, the speed at which the system is
operating, and the method by which the chip will be electrically tested in the
manufacturing facility. Design complexity is increasing exponentially. Some
have even used the term "super exponential" to describe the rate of complexity
increases over the next few years.31 It appears that design productivity is not
keeping pace with Moore's Law. This may cause Moore's Law to plateau, since

Figure 19: Complexity & Productivity (staff cost @ $150K / Staff Yr., in 1997 dollars)
It may also cause system designers to implement more chips rather than fewer
chips to avoid serious schedule slips due to unmanageable complexity. Figure 19
shows the overall impact of designer productivity over the next few years. Staff
cost on Figure 19 should be increased by 15% to 20% to adjust for the 1997
estimates. The chart shows a 58% annual compound increase in complexity
beginning in 1981 and projected to continue to the end of the decade. Contrast
Moore's Law which shows productivity increasing at a 21% annual compound





There are no magic solutions to the design issues facing system designers;
however, there is some hope that macrocells can keep design schedules
manageable. Macrocells are high function logic elements (building blocks) that
can be connected together to create a much larger function. Macrocells are
intended to be reused again and again. They are intended to be ported to the
next generation chip technology in like manner to the primitive logic elements in
the library. Macrocells are generally part of a library and are verified and
qualified in a manner similar to lesser functional elements. With macrocells you
are working with whole subsystems at a high level of abstraction and need not
concern yourself about verifying the macrocell. As a system designer, you are
now concerned that you have properly connected the macrocell logic rather than
the internal working of the macrocell logic. Some examples of macrocells are:
memories, processors, phase locked loops, and the whole switch fabric chip
identified in Chapter 3.
Macrocells are generally available in three levels of abstraction: soft, firm and
hard. Soft macrocells consist of the highest level of abstraction - a high level
design language. They have no physical attributes and will not take a physical
form until they are converted into shapes for mask building. Firm macrocells
have a physical attribute, yet retain some flexibility toward their ultimate
implementation. Firm macrocells are placed in a symbolic library. Hard macros
are completely implemented in a particular chip processing technology. They
have a physical design component and are placed on the chip during the layout
design phase. Macrocells are used to best advantage when the intellectual
property of the design team is needed for more important aspects of the system
design. For example, the design team should not be focused on a processor
design if the most important element in the design is a switch fabric. The design
team should be using a processor core and stay focused on the switch fabric.
Macrocells offer the best hope of improving design productivity over the next few
years. But even with macrocells, productivity will continue to decline.32
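The widening gap behind this conclusion can be sketched with simple compound growth arithmetic. The 58% and 21% annual rates are the ones quoted in the Design Productivity discussion; the function name and the choice of time horizons are illustrative assumptions only.

```python
def compound(rate: float, years: int) -> float:
    """Growth factor after `years` at a fixed annual compound `rate`."""
    return (1.0 + rate) ** years


COMPLEXITY_RATE = 0.58    # annual compound growth in complexity (from the text)
PRODUCTIVITY_RATE = 0.21  # annual compound growth in productivity (from the text)

for years in (1, 5, 10):
    gap = compound(COMPLEXITY_RATE, years) / compound(PRODUCTIVITY_RATE, years)
    print(f"after {years:2d} years: complexity / productivity = {gap:.1f}x")
```

Because the two rates compound separately, the ratio between them itself grows every year, which is why reuse through macrocells can slow the decline but not reverse it.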
Chapter 5 - Challenges to E-Commerce and Data
Communications System Design Part II
Arguably the three most challenging aspects of system design are what to do
(partitioning), how to do it (methodology), and how to know it was done correctly
(verification). All three aspects are inter-related. No single aspect can be
successfully achieved without consideration of the other two. It should be
apparent that choosing a partition that cannot be designed and verified is
problematic. In addition, the design must also be economically viable. It is of no
benefit to propose a design point which fails to achieve the system goals, no
matter how advanced the proposed technology may be. Table 1 in chapter 3
identified future system design goals which need to be considered for a
successful partition. Closely joined to these three aspects are the tools which a
design team relies upon for its success. Managing a data communications
system design project is quite similar to managing a large software development
project. The design automation tools enable a high level of abstraction, much as
high level software compilers have removed the need to invest in machine and/or
assembly level software coding. Yet there are aspects of a design that need
special attention, since in many designs every nanosecond counts.
Partitioning
Partitioning is the process by which we decide what to build and what
components should be adopted. The focus is on a list of components comprising
the entire subsystem being proposed. This list is usually called a "bill of
materials" and contains every component in the subsystem including the cost of
software, packaging, assembly, and test. Table 4 shows a medium scale bill of
materials for a data communications feature card which we will use to focus our





frame relay, and DSL. Figure 20 is a pictorial view of
the card. I am assuming this design is for a cost-performance product suitable
for investing in integration and not a commodity product. The table represents
the first pass attempt to define a set of components needed to achieve the cost
reduction design specification.
Table 4: Typical Bill of Materials

Component      Quantity  Gates   Package        Technology
CPU            1         750K    304 PFP        1.5u
ASIC 1         3         350K    256 PFP        2.0u
256K ROM       2         500K    64 PFP         ?
Connectors     4         0       N/A            N/A
BAT            1         0       N/A            N/A
FPGA           10        100K    256 PFP        2.0u
ASIC 2         6         50K     144 PFP        1.5u
512K SRAM      2         150K    64 PFP         ?
DRAM           8         ?       DIMM           1.0u
Board          1         0       4 layer        ?
Miscellaneous  1         0       Miscellaneous  N/A
Totals         39        3650K   N/A            N/A

* BAT is an acronym for Build, Assembly and Test.
Table 4 shows we have an assembly consisting of 39 components and
approximately 1900K logic gates. The technology of the assembly ranges from 1
micron to 2 microns. Since this is a cost reduction effort, we need to see if
migrating the design point to a new technology makes sense. We know from
current semiconductor trends that deep submicron technology is available so we
assume we can take advantage of Moore's Law and get many more circuits on a
chip at the same price and perhaps even a lower price. We hope to reduce the
number of components on the board which may allow us to reduce the size of the
board and reduce cost even more. Table 5 tells us that a large number of
interconnects are needed on the board to connect all of the components.






















Table 5 also shows the total number of interconnects (4752), excluding the DIMM
assembly and the miscellaneous connections for connectors, resistors and
capacitors.
Table 5: I/O count for the current assembly

FPGA x 10      2560
ASIC #2 x 6     864
SRAM x 2        128
Total I/O      4752
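The I/O total can be cross-checked against Table 4 with a short sketch. It assumes that each component's I/O count is its quantity multiplied by the pin count of its package, and that the CPU, ASIC 1 and ROM entries (which are not legible in Table 5 as reproduced here) follow the same rule; the names below are illustrative.

```python
# (component, quantity, pins per package) taken from Table 4.
# Connectors, BAT, board, DIMM and miscellaneous items are excluded,
# as the text states for the 4752 total.
PACKAGES = [
    ("CPU", 1, 304),
    ("ASIC 1", 3, 256),
    ("ROM", 2, 64),
    ("FPGA", 10, 256),
    ("ASIC 2", 6, 144),
    ("SRAM", 2, 64),
]


def io_per_component(packages):
    """Return {component: quantity * pins} for each packaged part."""
    return {name: qty * pins for name, qty, pins in packages}


io_counts = io_per_component(PACKAGES)
total_io = sum(io_counts.values())
for name, count in io_counts.items():
    print(f"{name:8s}{count:6d}")
print(f"{'Total':8s}{total_io:6d}")
```

Under these assumptions the FPGA, ASIC 2 and SRAM rows match Table 5, and the six packaged component types together account for the stated total.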
There are three questions at this point of the partition. 1) Is it possible to
sufficiently integrate existing components on the board into one or more chips at
less cost? 2) Is it possible to find a package that can contain the number of I/Os
at the proposed design point? 3) Does the proposed design point cause
engineering problems at the board level that negate the silicon integration
advantages, i.e., did we solve a problem or merely move it?
Rent's Rule
Over 40 years ago E. F. Rent published internal memoranda at IBM that
established Rent's Rule33. Rent's Rule is given by the equation
$N_p = K_p N_g^{\beta}$, where $N_p$ is the number of pins, $N_g$ is the
number of gates, $K_p$ is a proportionality constant, and $\beta$ is the Rent
exponent. Rent's Rule establishes the relationship between the
number of logic gates on a chip and the number of pins needed on the package
to support it. Rent's Rule was derived empirically and is very specific to a
particular architecture, i.e., it is not suitable for general predictions unless the
future system maintains the same architecture as the previous system, which is
the case in our example. System designers must be aware that high speed
requirements tend to drive designers toward parallelism while lower speed
requirements drive toward serialism. For example, DRAMs, which operate in the
100 nanosecond range, are highly cost sensitive, and as a result DRAMs tend to
multiplex address and data pins to achieve the smallest die and least expensive
package possible (a form of serialism). SRAMs, on the other hand, which
operate in the sub 10 nanosecond range, are used to achieve high speed
functionality (a form of parallelism). It is also common for 32 bit CPUs to have
multiplexed 16 bit interfaces to minimize die and package size when high
performance is not critical to the design. Applying Rent's Rule to the example
shows the ratio of log pins to log gates ($\beta$) for each component. It is interesting to
note the difference between the $\beta$ of the CPU and ASICs and that of the ROM and SRAM.
The table shows that while the number of bits on a chip has increased
tremendously over the past 40 years, the need for larger packages, as measured
by the number of pins, has not increased significantly, as expected. On the other
hand, the number of gates on a chip has increased and so has the pin count.
Thus the ratios have remained somewhat constant over the past 40 years.
Table 6: Rent's Rule Analysis of the Proposed Partition

Component     Log of Gates  Log of Pins  $\beta$  Rent's $\beta$
CPU           5.87          2.48         .42      .45
ASIC 1        5.54          2.41         .43      .50
ROM           5.70          1.81         .32      .12
FPGA          5.00          2.41         .48      .5
ASIC 2        4.69          2.16         .46      .5
SRAM          5.17          1.81         .35      .12
Board / card  N/A           N/A          N/A      .25

Rent's $\beta$ is derived from a 1985 vintage 3081 IBM mainframe.
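The exponent column of Table 6 can be reproduced with a short sketch. It assumes the power law form of Rent's Rule, pins = Kp * gates^beta, with Kp taken as 1 so that beta is simply log(pins) / log(gates); the gate and pin counts are the per-chip values from Table 4, and the function name is illustrative.

```python
import math

# Per-chip gate counts and package pin counts from Table 4.
PARTS = {
    "CPU": (750_000, 304),
    "ASIC 1": (350_000, 256),
    "ROM": (500_000, 64),
    "FPGA": (100_000, 256),
    "ASIC 2": (50_000, 144),
    "SRAM": (150_000, 64),
}


def rent_exponent(gates: int, pins: int, kp: float = 1.0) -> float:
    """Solve pins = kp * gates**beta for beta (kp = 1 is an assumption)."""
    return math.log10(pins / kp) / math.log10(gates)


for name, (gates, pins) in PARTS.items():
    print(f"{name:7s} beta = {rent_exponent(gates, pins):.2f}")
```

The logic chips cluster in the .4 to .5 range while the memories fall well below it, which is the contrast the analysis draws between logic and storage.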
Rent's Rule is effective in projecting the ratio of gates and bits to signal pins and
is also effective at projecting large scale integration, i.e., what happens when the
total function becomes contained within a larger chip. Rent did this by projecting
a $\beta$ for board and system level designs. Rent's projection of .25 shows a large
efficiency can be obtained at the board and system level. There are many more
interconnects within a chip than within a card and within a card than within a
system. As we know, what were several boards became a single board. What
was a board became many chips. What were many chips became a single chip.
It thus seems perhaps obvious that what we want to do with our partition is to
"mop up" as much function into as few chips as possible to do the best cost
reduction. This will lower the cost of BAT, minimize component inventory, and
improve overall reliability since fewer parts mean fewer failure opportunities.
Choosing a partition requires that we look at all of the components and determine
the optimum design point. The CPU, for example, may be available as a macro
cell. Once the CPU macro cell (or any macro cell) is integrated into a chip we no
longer need concern ourselves with future CPU supply issues. If a vendor
removes it from the market, we will not be forced into another redesign or
perhaps much more if our software suddenly needs to be ported to another CPU
architecture. The other components are also candidates for "mopping up",
especially the FPGA components and the chips which used 1.5u to 2.0u
technology. Moving from a 2.0u technology to a .18u technology is an 11x
improvement. With current foundry chip production using deep submicron
technology (.18u), Moore's Law may help us immensely. But we still need to
address the aforementioned challenges before us: a) Can we partition the logic
such that we can use a suitable chip package? b) Do we want to use a few large
(and dense) chips or smaller (and dense) chips but more of them? c) If we
use a package capable of a large number of pins, will it affect the board cost by
adding more interconnect layers to the extent we seriously impact our overall
cost reduction?
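The size of the technology step mentioned above can be checked with a small calculation. The linear figure follows directly from the two feature sizes in the text; the area figure assumes, as an idealization that is not claimed in the text, that circuit area scales with the square of the feature size. The function names are illustrative.

```python
def linear_shrink(old_um: float, new_um: float) -> float:
    """Ratio of old to new minimum feature size."""
    return old_um / new_um


def ideal_area_gain(old_um: float, new_um: float) -> float:
    """Density gain if area scaled with the square of feature size (idealized)."""
    return linear_shrink(old_um, new_um) ** 2


print(f"linear shrink: {linear_shrink(2.0, 0.18):.1f}x")
print(f"ideal area density gain: {ideal_area_gain(2.0, 0.18):.0f}x")
```

Real density gains depend on wiring, I/O cells and design rules, so the squared figure should be read as an upper bound on what the shrink alone could provide.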
Not by coincidence in our example, there are 3 ASICs used for the OC3
interface, 6 ASICs used for the T1 interface, and 5 FPGA components used
for each of the Frame Relay and DSL interfaces. Each of these components may be
suitable for a macrocell. One of the key advantages of a macrocell is we get to
evaluate serialism and parallelism within the new design point. Perhaps we can
operate some portion of the new chip at a much higher speed allowing us to
share some function and reduce the number of gates on the new chip design
point. Perhaps we can identify redundant logic in the old chip design point and
remove it giving us even more savings. We can also save pins on the package
since presumably each macro cell need not individually connect to the data bus.
Perhaps with some on chip storage we can create sufficient buffer to avoid over




Chips are fabricated on silicon wafers. These wafers have increased in size over
the years. Current industry tooling is capable of handling wafers in the range of
200mm (8 inches) to 300mm (12 inches). It should not be a surprise that where
sub micron dimensions are the rule, the tiniest of particles are problematic. It is
somewhat obvious that larger chips are individually more likely to contain a
defect and smaller chips are less likely to contain a defect. While chip testing
has not yet been introduced, we should say at this point it takes time and money
to test chips. Test tooling is such that larger chips take longer to test than
smaller chips. It is also true that many more small chips than large chips will
fit on a given wafer. So we have a dilemma. If we use larger chips it could
cost us more for the chip. If we use smaller chips, the cost is likely to be less but
we will need more chips and incur a higher board level BAT. Larger chips may
take longer to design and verify individually but we will have fewer of them. What
should we do? We work with our ASIC vendor to identify the best possible chip
and package combination for our design.
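The relationship between chip size and defects can be made concrete with a simple yield sketch. It uses a Poisson yield model, which is a common textbook approximation and not a model taken from this thesis, and the defect density is an arbitrary illustrative value rather than vendor data.

```python
import math


def poisson_yield(chip_edge_mm: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free square chips under a simple Poisson model."""
    area_cm2 = (chip_edge_mm / 10.0) ** 2
    return math.exp(-area_cm2 * defects_per_cm2)


DEFECT_DENSITY = 0.5  # defects per square cm; an illustrative value only

for edge in (5, 8, 12, 20):
    print(f"{edge:2d}mm chip: yield ~ {poisson_yield(edge, DEFECT_DENSITY):.0%}")
```

Under this model yield falls off exponentially with chip area, which is one reason an ASIC vendor's pricing tends to favor a particular range of die sizes.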
ASIC vendors generally have a "sweet spot" in the product menu that reflects
their best analysis of their fabricator's defect density, tooling capability and test
methodology. These factors are rarely provided to customers but are reflected in
the price. By looking at the various chip and package combinations we can find
this "sweet spot" and get the best value possible. We should also be alert not to
confuse leading edge with bleeding edge.
Leading Edge vs Bleeding Edge
Using the very latest chip technology must be weighed carefully. Committing a
large chip, or worse yet several chips to a new technology is highly risky. While
basic logic gates and the libraries that contain them are somewhat easily
qualified, the overall design methodology including macro cell and primitive logic
functionality, testing and verification, must be equally robust. High performance
macro cells are much more difficult to qualify than primitive logic gates. Some
macro cells may operate at tens or hundreds of megahertz and it is critical that all
of the macro cells used in a design be qualified prior to being used in a
production chip. The problem is many ASIC vendors cannot possibly qualify the
macro cells through in-house qualification methods. Migrating macro cell
libraries from one technology to another (for example .25u to .18u) is complex and costly.
ASIC vendors do not qualify macro cells until a customer needs them. If not
qualified by the ASIC vendor or a previous customer, macro cell qualification
becomes a part of the negotiation of the price of the prototype hardware. Of
course this could cause the project to be delayed or even cancelled should
problems be found in any of the macro cells. It is best not to go to the bleeding
edge of technology without strong motivation. It also does not hurt to have more
than one vendor competing for the production volumes. Table 7 shows some
possible wafer and chip sizes and the number of chips one can expect on a
wafer. I have not considered edge effects, that is, the inability to place whole chips at
the edge of a wafer.
Table 7: Wafer Productivity for various chip sizes

Wafer size   Chip size
             5mm    8mm    12mm   20mm
100mm        314    122    54     19
200mm        1256   490    218    78
300mm        2827   1104   490    176
The table is calculated by taking the area of a wafer (pi times the radius
squared) divided by the area of a chip. One can see that as chip sizes became larger,
wafer sizes needed to increase also to maintain some semblance of productivity.
For example, had the industry not invested in 200mm and 300mm wafers, large
chips (mostly processors and DRAMs) would yield a mere 19 chips per wafer.
At the microeconomic level, this would have caused far more factories to be built
to meet the demand for chip technology we have experienced over the past ten
years. This is especially important for high volume chips like DRAMs and Intel-like
processor chips. If we now consider that the number of transistors on a chip
is doubling every 18 months to two years, one can get another view of the impact
of silicon on information technology systems.
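The wafer productivity calculation can be written out as a short sketch. It assumes square chips, a circular wafer, and no allowance for edge effects or saw lanes, consistent with the simplifications stated for Table 7; the function name is illustrative.

```python
import math


def chips_per_wafer(wafer_diameter_mm: float, chip_edge_mm: float) -> int:
    """Wafer area divided by square chip area, ignoring edge effects."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    return int(wafer_area // chip_edge_mm ** 2)


for wafer in (100, 200, 300):
    row = [chips_per_wafer(wafer, chip) for chip in (5, 8, 12, 20)]
    print(f"{wafer}mm wafer: {row}")
```

For example, a 100mm wafer gives 314 chips at 5mm and 19 chips at 20mm, and under this formula a 300mm wafer with 20mm chips evaluates to 176.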
At this point let us say we have partitioned the logic along the prior architecture
as shown in Figure 19.














Figure 21 shows a redesigned system. The system uses deep submicron chip
technology - perhaps .25u or .18u. With much effort, previous functional
elements were redesigned as reusable macro cells and retained as unique
intellectual property of the design team. FPGA chips were recast as ASIC
standard cells and will exhibit a much lower cost per circuit than the prior design
point. The Central Electronics Core (CEC), consisting of the SRAM, ROM and
CPU, was integrated into a single chip using macro cells from an ASIC vendor's
cell library. Assuming the design team used the same CPU macro cell as the
prior OTS CPU, porting software to the new design point will be less headcount
intensive. A key savings is the reduction in components from 39 to 19.
Table 8: Post Partition Bill of Materials

Component   Quantity  Gates   Package   Technology
CEC         1         1500K   304 PFP   .18u
ASIC 1      1         1000K   625 BGA   .25u
ASIC 2      1         300K    256 PFP   .25u
ASIC 3      1         500K    256 PFP   .25u
ASIC 4      1         500K    304 PFP
Connectors  4         _       _
BAT         1
Board       1
DRAM        8         _       _
Totals      19
Design teams must take into account the full cost of implementation when doing
a project of this type. If planned incorrectly, the software development cost could
easily negate the overall benefit of the cost reduction. Typically, the hardware
view seen by the software is rigorously preserved. Bit positions in interface
registers are not changed unless there is overwhelming need to do so.
Preserving as much of the system as possible leads to the fastest rate of
implementation and integration into manufacturing. If done exceptionally well,
there is no need to maintain an inventory of the previous design point for spare
parts.
Table 9, from LSI Logic34, shows a typical menu of chip and package
combinations which are likely to support the partition choice. Of particular
interest is the number of connection points available in ball grid array (BGA)
package technology. Recall from Table 5 that there were 4752 pins needed in
the original design point. Note that a single chip in a die size between 15.8mm
and 16.8mm in BGA package technology can support 1600 I/O. Conceivably, if
we ignore the "sweet spot" we could partition our design point to use only three
(expensive) chips.
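The "only three chips" figure follows from a simple ceiling division, sketched below. It assumes, pessimistically, that integration removes none of the original 4752 connections; in practice on chip connections between merged functions would reduce the external pin requirement. The function name is illustrative.

```python
import math

TOTAL_IO = 4752          # total I/O of the original design point (Table 5)
IO_PER_PACKAGE = 1600    # I/O supported by the largest BGA entry cited in the text


def min_packages(total_io: int, io_per_package: int) -> int:
    """Smallest number of packages whose combined I/O covers total_io."""
    return math.ceil(total_io / io_per_package)


print(min_packages(TOTAL_IO, IO_PER_PACKAGE))
```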










I/O Vdd Vss Vdd
Core
896 31 30x30 8.8 - 13.3 608 116 140 32
1152 35 34x34 9.3 - 13.8 768 156 196 32
1517 40 39x39 11.3-15.8 1024 212 248 33
1932 45 44x44 13.3-16.3 1280 290 314 48
2397 50 49x49 15.8-16.8 1600 348 396 53
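
The pin arithmetic behind the three-chip remark can be checked with a short sketch. The rows are transcribed from Table 9. Reading the first column as the total ball count is an assumption, although each row's I/O, Vdd, Vss and Vdd Core entries do sum to it. The function names are illustrative, and all 4752 pins are treated as signal I/O, as the text does.

```python
import math

# Rows transcribed from Table 9: (total balls, signal I/O, Vdd, Vss, Vdd core).
TABLE_9 = [
    (896, 608, 116, 140, 32),
    (1152, 768, 156, 196, 32),
    (1517, 1024, 212, 248, 33),
    (1932, 1280, 290, 314, 48),
    (2397, 1600, 348, 396, 53),
]

def rows_are_consistent(rows):
    """Check that signal, power and ground balls add up to the ball total."""
    return all(total == io + vdd + vss + core
               for total, io, vdd, vss, core in rows)

def chips_needed(total_pins, io_per_chip):
    """Lower bound on chip count if signal I/O were the only constraint."""
    return math.ceil(total_pins / io_per_chip)

largest_io = max(row[1] for row in TABLE_9)
print(rows_are_consistent(TABLE_9))    # True
print(chips_needed(4752, largest_io))  # 3
```

This is only a lower bound: it ignores gate counts, board congestion and the cost "sweet spot" discussed earlier.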
Second Level Assembly
We discovered from Rent's Rule that integration can result in fewer connection
points at the board and system level. Yet we must be concerned that we are not
merely moving the connection problem, in this case from the chip to the board. If
we consider our board to be a typical 4 layer board, we need to be concerned
that our package choice does not cause us to spend our system savings on a
board with many more layers. While multilayer boards of up to 6 layers are
relatively inexpensive, going beyond 6 layers is a concern. The board factor can
be critical if we have a large board of many layers and a board form factor which
causes unnecessary waste at the raw board manufacturing level. Our cost
reduction uses plastic flat pack (PFP) packages which are well understood and
mostly compatible with low density board form factors. However, we are also
using some BGA packages. BGA packages are high density form factors, and
when they are placed on a board it is likely we will find congestion when making
the electrical connections from the BGA package to another package. While the
congestion is typically isolated to the area in the immediate vicinity of the chip
package, the effects are felt in the cost of the board. High density chip
packages, like the 1600 I/O package above, could cause the board to jump from
the 4 to 6 layer form factor to a 10 to 12 layer board form factor. Going beyond 6
layers is generally not acceptable except in the design of the highest
performance systems where space is at a premium, like some military projects,
satellite projects, and perhaps some commercial aircraft. The point is to take the
entire assembly into consideration when choosing a partition.
Note: Boards up to 20 layers are possible, and IBM's TCM ceramic form factor
contained over 50 layers and supported over 100 chips. The TCM was capable of
over 100 layers.

Packaging
While space does not permit an in-depth look at chip packaging technology, it is
not appropriate to ignore it. The broad semiconductor industry could not have
advanced without strides in semiconductor packaging. At the head of the
contribution list are ball grid array packages (BGA)35. BGA packages bring
unprecedented packaging density to the semiconductor industry. But perhaps
more important is the change in chip physical architecture BGA packages
provide. BGA packages are a packaging extension of Controlled Collapse Chip
Connection (C4) technology, invented by IBM during the early 1960s to deal with
the reliability of connecting chips to packages and packages to boards. C4 is
best described as "A solder joint connecting a substrate directly to an IC in a flip
chip configuration. In this packaging scheme, a solder ball is formed on the IC,
the IC is placed active circuitry down onto a substrate, and the solder is reflowed.
As the solder melts, the solder balls collapse into a shape controlled by the
surface tension of the liquid solder while supporting the weight of the IC." IBM
offers a concise description of the technology: "Originally developed for use with
ceramic carriers in connection with the Solid Logic Technology (SLT) introduced
by IBM in the early 1960s, C4 is a process that uses 97/3% PbSn solder balls
with diameters ranging from 100 to 125 microns as a chip-to-carrier interconnect.
An array of these balls or bumps is arranged around the surface of a chip, either
in an area or peripheral configuration. The chip is placed face down on a carrier
that has been prepared with corresponding metallized pads that have been
flashed with gold to prevent corrosion. When heat is applied, the solder reflows
to the pads."

Figure 22: Controlled Collapse Chip Connection Technology.




In Slide 1, the chip is shown prior to depositing the ball. The ball is deposited by
evaporating lead and tin upon the surface of the wafer through a metal mask.
This allows all the C4 balls to be applied simultaneously on all of the chips on the
wafer, consistent with typical chip masking operations. The Cr-Cu-Au
(chrome-copper-gold) step is needed to assure robust attachment of the ball to the
chip. Without this step the ball would fall off the surface of the chip. The ball
location on the chip could be an I/O connection, a power connection, a ground
connection or no connection at all. The no-connection balls are used for
mechanical strength only. Slide 2 shows the ball as it appears after the
evaporation process. Note the column-like appearance and the 97% lead and
3% tin composition of the evaporated metal, as contrasted with the 60% tin and
40% lead content of typical electrical solder. Slide 3 shows the ball after
completing the reflow step. The reflow step consists of passing the wafer
through a furnace at sufficient temperature to liquefy the solder ball. Surface
tension causes the ball itself to form. Slide 4 shows the chip removed from the
wafer and placed on a ceramic chip carrier (package). Note the chip is placed on
the carrier "face down". C4 technology is sometimes called "flip chip"
technology.
Figure 23, obtained from IBM, shows three differing chip architectures enabled
by C4 technology. The left slide shows a peripheral pad layout somewhat similar
to that generally used in the industry by wirebond techniques. Power, ground
and I/O are placed around the periphery of the chip. As we will see, C4 allows
many more chip connections than wirebond as well as some important electrical
improvements. The center slide shows a combination of peripheral and
distributed power and ground. In this slide, any of the balls could be power,
ground or I/O. In the right slide, an area array of balls is shown.
Figure 23: Three types of C4 chip images
Area array balls give the maximum number of chip connections. As mentioned
before, not all of the balls need make electrical connection. Many are used for
mechanical strength reasons only. In addition, the hundreds or thousands of balls
make an ideal heat sink which helps solve the expected "rocket nozzle"
temperatures of future gigascale, gigahertz chips. According to IBM, placing the
I/O near where it's required improves delays and reduces on-chip wiring. A
major issue is simultaneous switching output noise (SSO). As device speeds
increase, bus widths widen and noise tolerance is reduced by low voltage levels,
noise becomes of utmost importance. The lower inductance-per-connection (as
much as 40X less for C4 over wirebond) and higher power-to-signal ratio enable a
significant reduction in SSO. Figure 24, from IBM, shows chip terminal count as
a function of die size for peripheral and area-array connections.
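
To see why connection inductance matters for SSO, a first-order estimate can be sketched. The relation V = N x L x di/dt is the usual textbook approximation for simultaneous switching noise and is not taken from the IBM material. The driver count, current slew and wirebond inductance below are assumed values chosen only for illustration; the 40X ratio is the figure quoted above.

```python
def sso_noise_volts(n_drivers, inductance_h, di_dt_a_per_s):
    """First-order simultaneous switching noise: V = N * L * di/dt."""
    return n_drivers * inductance_h * di_dt_a_per_s

# Illustrative values only (assumptions, not measurements).
N_DRIVERS = 16              # outputs switching together through one return path
DI_DT = 10e-3 / 1e-9        # 10 mA in 1 ns per driver, in amperes per second
L_WIREBOND = 4e-9           # assumed wirebond inductance, 4 nH
L_C4 = L_WIREBOND / 40      # "as much as 40X less" for C4

v_wb = sso_noise_volts(N_DRIVERS, L_WIREBOND, DI_DT)
v_c4 = sso_noise_volts(N_DRIVERS, L_C4, DI_DT)
print(f"wirebond: {v_wb:.3f} V, C4: {v_c4:.3f} V")
```

Because the estimate is linear in inductance, the noise falls by the same factor as the inductance, whatever values are assumed for the other terms.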
Figure 24: Advantages of C4 technology for various chip sizes.
According to IBM, "For the highest lead-count chips, area array can provide
much smaller dice compared to the pad-limited peripheral termination style, in
cases where high terminal counts and low gate counts are required. This
reduced die size for high-terminal count chips can translate to substantial saving
in terms of increased die-per-wafer."
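
The trend in Figure 24 can be illustrated with a simplified geometric sketch: peripheral pads grow with the die edge, while an area array grows with the die area. The pitches below (100 micron wirebond pads, 250 micron C4 bumps) are assumptions for illustration, not vendor data, and the model ignores keep-out regions and power distribution.

```python
def peripheral_terminals(die_side_um, pad_pitch_um):
    """Single row of pads along the four edges, corner pads counted once."""
    per_edge = die_side_um // pad_pitch_um
    return max(4 * per_edge - 4, 0)

def area_array_terminals(die_side_um, bump_pitch_um):
    """Full grid of bumps over the die surface."""
    per_edge = die_side_um // bump_pitch_um
    return per_edge * per_edge

for side_um in (5000, 10000, 15000):
    print(side_um,
          peripheral_terminals(side_um, 100),
          area_array_terminals(side_um, 250))
```

Even with a much coarser bump pitch, the area array overtakes the peripheral layout as the die grows, which is the point made in the IBM quotation.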
Figure 25: Typical high density BGA package from LSI Logic Corp.
Figure 25 shows a typical high density BGA package from LSI Logic Corporation.
Note the thousands of C4 balls on the package. The chip is in the center of the
package. One may be able to imagine the problem encountered at the board level
trying to avoid wiring congestion if high numbers of these balls had signals which
needed to be connected to other chips. C4 technology allows for greater
flexibility in placing chip connections. This flexibility will be critical to future chip
designs where it is not possible to place I/Os near the periphery of a chip.
Final word on partitioning
System design is no longer strictly the domain of large companies locked in a
vertically integrated corporate system. Small firms have made major impacts on
the data communications landscape. The past fifteen years have unveiled a new
entrepreneurial concept in system design consisting of three interrelated
capabilities: the fabless design firm, the semiconductor fabricator, and the
design tool firm. The fabless design firms relied on a semiconductor factory,
dubbed a foundry, which provided access to prototype parts and future
manufacturing volume. The fabless firms and the foundry also relied on design
automation firms who created software design tools. All three contributing
factors rely heavily on intellectual property as the raw material for success.
Entrepreneurs need no longer be experts in chip design to bring leading edge
products to market. There are many chip design firms located at all points
across the globe ready to assist in the design of sophisticated chips. Since the
design of electronic systems is mostly one of intellectual property and computer
based tools, design firms on the subcontinent of India, for example, are just as
likely to contribute to new electronic products as those in Silicon Valley. In the
next chapter we will discuss design methodology and verification.
Chapter 6 - Challenges to E-Commerce and Data
Communications System Design Part III
In the past chapter we discussed partitioning a design and the challenges and
rewards of doing a good design partition. We discussed the need to avoid the
very latest technology without substantial reason. We also discussed BGA chip
packaging, a package which enables higher performance chips in a very small
form factor. At this point we should begin to discuss design methodology and
then design verification.
Design Methodology
Design methodology encompasses the creation of a chip from the designer's
perspective as a whole. As expected, design methodology increases in

































Figure 27: Complex Chip Design Methodology (High Complexity Macro Methodology)
Figures 26 and 27 show the design methodologies for a small gate array and a
high complexity macro cell design.36 We will discuss the test portion of the
methodology in a later chapter.
Background
Chip design tools evolved from early computer assisted design tools. Engineers
and technicians would create chip images containing transistors, capacitors, and
resistors. These chip images were very low density by today's measurements.
The earliest image I can recall contained a mere sixteen NPN transistors, four
lateral PNP transistors, no capacitors and many banks of resistor bars.
Component placement on the chip was fixed. Large drawings of the chip image
were reproduced on thick plastic sheets of vellum and provided to circuit
engineers who created the digital building blocks needed for the system. Each
chip image was hand drawn by layout technicians and process engineers with
large amounts of help from physicists. Design failures were typically caused by
incorrect drawings or placement of shapes, from which masks were produced.
Such an error was called a design rule error. Thus, one of the first tasks was to
use a mainframe computer to check all of the shapes for design rule violations.
This checking process is called design rule checking (DRC) and remains in use
today. Master image wafers were produced and stocked awaiting chip
personalities from circuit engineers who were creating small scale and medium
scale digital functions as well as some analog functions as needed. The circuit
engineers "personalized" the chips by interconnecting the transistors and
resistors to form circuits. Of course, the circuit engineers could also violate
design rules, and DRC helped fix that problem also. As Moore's Law started to
take effect, design libraries were created which allowed designers to work with
logic symbols of circuits (nand, nor, adder, etc.) rather than physical shapes.
Thus schematic capture was born. This is a first level of abstraction. Higher
layers of abstraction will come later.
Low Complexity Gate Array
Developing a low complexity gate array is a relatively simple process of capturing
the schematic of a design and converting the captured design into a netlist.
Figure 28 shows an example of a symbol used in schematic capture. A netlist is
a list of functions (nand, nor, etc), and how the designer intends to have them
connected. When the designer is satisfied the design is correct the designer
pushes a button and the schematic capture tool creates a netlist which is sent to
the ASIC factory for fabrication. Before accepting the design, the fabricator
usually requires the netlist to be screened (DRC) to identify possible design
problems which the designer may have deemed "don't care" but which are a
concern to the fabricator. This is rarely a problem with low complexity gate
arrays. The factory then creates a data file from which masks will be made.
Creating this mask file is the first time actual micron sized wires can be
individually assigned to the various "nets" on the netlist. Once the length and
width of the wires are known, resistance and capacitance can be calculated and
provided as a file (back annotation file) to the designers, who can use it as part of
their verification process to identify any nets with too many transistors (loads)
attached to meet the speed required by the ASIC. Small changes to the design
can usually be accommodated for small gate arrays. Late design changes to
large gate arrays and macro cell chips, even if they are relatively small changes,
are not easily accommodated by the foundry and are a source of some of the
additional steps needed when designing a more complex gate array.
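
As a rough illustration of what a netlist holds and how a loading check might work, consider the sketch below. The gate names, net names and the load limit are invented for the example and do not correspond to any vendor format.

```python
from collections import defaultdict

# A tiny netlist: each gate names its type, its input nets and its output net.
# All names here are hypothetical.
NETLIST = [
    {"name": "U1", "type": "nand", "inputs": ["a", "b"], "output": "n1"},
    {"name": "U2", "type": "nor", "inputs": ["n1", "c"], "output": "n2"},
    {"name": "U3", "type": "nand", "inputs": ["n1", "n2"], "output": "y"},
    {"name": "U4", "type": "nor", "inputs": ["n1", "d"], "output": "z"},
]

def loads_per_net(netlist):
    """Count how many gate inputs (loads) hang on each net."""
    loads = defaultdict(int)
    for gate in netlist:
        for net in gate["inputs"]:
            loads[net] += 1
    return dict(loads)

def overloaded_nets(netlist, max_loads):
    """Return the nets that drive more loads than the assumed limit allows."""
    counts = loads_per_net(netlist)
    return sorted(net for net, n in counts.items() if n > max_loads)

print(overloaded_nets(NETLIST, max_loads=2))  # ['n1']
```

A real check would use the resistance and capacitance from the back annotation file rather than a simple load count.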
Low Complexity Gate Array Verification
Verification consists of simulating the netlist and testing the prototype ASIC. The
table in Figure 28 shows an example of event level verification. Simulation of low
complexity gate arrays, including FPGA products, is accomplished using event
simulation. A computer simulation tool "simulates" the effect of inputs upon the
outputs of the ASIC as a function of time. For every set of inputs (stimulus or
"stims") there is a set of outputs ("expects"). The simulation process can be
explained as a table of events which contains the inputs and the outputs delayed
by some period of time. Given a set of inputs, designers expect some change at
the output within some period of time. It is common that not all outputs will
change when the input changes. This is only a problem when the designers
expect a change and it does not occur. Designers typically allow a number of
microseconds or nanoseconds to expire before checking the output. The table
in Figure 28 for example shows that there are delays associated both with the
logic blocks and the connecting wires. While all nets must be checked, it is the
nets with the longest delay that determine system speed. Prior to the back
annotation file being received from the ASIC foundry, designers are forced to use
delay estimates. If the net delay estimates are not close to net delay actuals a
redesign may be

















Figure 28: Example of Schematic Capture using Symbols and Timing Table
The amount of time allowed is the expected propagation delay of the chip. The
chip designer of a low complexity gate array may be responsible for both the
design and the simulation. But, like a good software project, it is generally not a
good idea to have the coders do the testing. The best design teams separate the
designers and the verifiers. While this can result in some stressful meetings, it
also uncovers design flaws which otherwise may go undiscovered. The table in
Figure 28 shows the longest path in this design to be 27 ns, consisting of a 12 ns
delay in the logic block, a 6 ns delay in wire net 1009 and a 9 ns delay in the last
logic block. A simulator would require that a check of the output of the logic
block not occur until 27 ns after the start of the event.
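
The longest-path arithmetic can be written out as a small sketch. The three delays are those quoted from Figure 28; the stage labels and function names are illustrative.

```python
# Delays along the longest path described for Figure 28, in nanoseconds.
# Only net 1009 is named in the text; the other labels are descriptive.
LONGEST_PATH = [
    ("first logic block", 12),
    ("wire net 1009", 6),
    ("last logic block", 9),
]

def path_delay_ns(stages):
    """Total propagation delay of a path made of logic and wire stages."""
    return sum(delay for _, delay in stages)

def output_is_ready(check_time_ns, stages):
    """True if the simulator may compare the output against its expect."""
    return check_time_ns >= path_delay_ns(stages)

print(path_delay_ns(LONGEST_PATH))        # 27
print(output_is_ready(20, LONGEST_PATH))  # False
print(output_is_ready(27, LONGEST_PATH))  # True
```

Checking the output at 20 ns would report a failure on a design that is in fact correct, which is why the check time must follow the longest path.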
Economics of Low Complexity Gate Arrays
One of the challenges with low complexity gate arrays is that foundries are more
and more unwilling to build them. Eventually old technology, like the typewriter,
passes into history. Foundries cannot keep five and ten micron technology
around forever. One of the more effective methods used by foundries to
motivate designers to migrate to new chip technologies is to begin to raise prices
on the old technology while touting the benefits of the new technology. If that
fails, an end of production notice is announced. An end of production notice
typically gives a year or more notice to allow ample time for a change over.
When possible, some design teams simply move the design to another fabricator
which specializes in older technologies at an acceptable price point. But it is
common for cost reductions to be motivated by "old technology" penalties. Yet
moving a low complexity gate array to a current .25u or .18u technology usually
results in chips with little logic on them. It is common for chips to be I/O bound
even when using current chip technology. I/O bound simply means the chip does
not have sufficient I/O cells for the design partition. When I/O bound, chip
designers typically select a larger chip from the menu to solve the problem. The
larger chip has more I/Os and usually solves the problem. But for low
complexity gate arrays the problem is quite different. The cost per gate is
extremely high. If we assume we have a small chip image which contains 100
I/O cells and is capable of 150K gates, and we have only 25K gates and 40 I/O to
place on the chip, we can see we are paying for 150K gates and 100 I/O and
using only 17% of the gates and 40% of the I/O. The cost per gate, one of the
economic measurements often cited in a bill of materials, will seem out of line.
There are two things which can be done. One is to "mop up" the small gate array
into a larger one, as we did in the example in Chapter 5, or we could choose to
use an FPGA product. Since FPGA products are more costly per circuit than high
density gate arrays we may not solve our circuit cost problem, but we can get the
design completed and into manufacturing. FPGA products also avoid the NRE
(non-recurring engineering) cost imposed by the foundry on gate array and
cell based ASICs. When low complexity gate arrays are not avoidable, FPGA
technology is an alternative.
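
The utilization figures in the example above can be reproduced with a few lines. The gate and I/O counts come from the text; the function names and the "penalty" measure (how much more each used gate costs compared with a fully used chip) are illustrative.

```python
def utilization(used, available):
    """Fraction of an available resource that the design actually uses."""
    return used / available

def cost_penalty(used, available):
    """How many times more each used gate costs than on a fully used chip."""
    return available / used

gates_used, gates_available = 25_000, 150_000
io_used, io_available = 40, 100

print(f"gate utilization: {utilization(gates_used, gates_available):.0%}")  # 17%
print(f"I/O utilization:  {utilization(io_used, io_available):.0%}")        # 40%
print(f"cost per used gate penalty: {cost_penalty(gates_used, gates_available):.0f}x")
```

Whatever the chip price, each gate actually used carries six times the cost it would on a fully utilized image.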
High Complexity Gate Arrays
With high complexity gate arrays the design methodology is more complex.
Silicon compilers and synthesizers are introduced at this point. In addition,
timing verification and floorplanning become critical. High complexity gate arrays
often contain read only memory cells (ROM) and/or Static Random Access
Memory (SRAM) cells. Dynamic Random Access Memory (DRAM) cells, which
are technically feasible on a gate array chip, remain elusive because the cost per
bit of DRAM is in constant decline and doesn't warrant the engineering effort
needed to embed it on a gate array chip.
Silicon Compilation
Silicon compilers convert a high level concise specification into custom logic.
Silicon compilers are somewhat "expert system" tools. Silicon compilers are
excellent at creating repeatable logic on a chip. The compiler developer creates
the program and design rules specific to a particular ASIC technology, i.e., the
designer takes into account the transistor characteristics, wire attributes
(resistance, capacitance), circuit timing constraints, number of wiring levels (five
or more layers of wiring on a chip is common), crosstalk, switching noise, and
porosity. Porosity is the property of a silicon object to allow wiring connections to
pass through it. Perhaps the best examples of silicon compilation are the
compilers used to create ROM and SRAM macros in most foundry ASIC libraries.
The designer typically needs to specify the width and depth of the macro and the
compiler builds it. It is common for foundries to provide estimation tools to
designers so they may consider the design factors of the macro during the
partitioning phase. A typical SRAM compiler at IBM requires the designer to
specify the number of words, the number of bits in each word, and select a
porosity level. With these three inputs, a total SRAM is built for a design team.
The compiler returns all the data describing the macro cell for verification and
floorplanning tools. We will discuss more about porosity in the floorplanning
section.





Nwords   Nbits   Porosity   Delay (ns)   Width (mm)   Length (mm)
32       32      40         9            1.9          2.3
Table 10: Silicon Compiler Input and Output
The compiler is provided with the Number of words (Nwords), the Number of bits
(Nbits) and the porosity required. The compiler then provides the details of the
compiled SRAM showing the speed (9ns) and the length (2.3mm) and width
(1.9mm) of the macro. The compiler also supplies all of the files needed to build
the macro on a chip.
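
The exchange summarized in Table 10 can be pictured as a request record and a result record. The sketch below does not model a silicon compiler; it only captures the three inputs and the reported outputs from the table, with class and field names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SramRequest:
    """The three inputs the designer supplies (names are illustrative)."""
    n_words: int
    n_bits: int
    porosity: int

@dataclass(frozen=True)
class SramMacro:
    """The figures the compiler reports back for the compiled macro."""
    delay_ns: float
    width_mm: float
    length_mm: float

    @property
    def area_mm2(self):
        return self.width_mm * self.length_mm

# Values taken from Table 10.
request = SramRequest(n_words=32, n_bits=32, porosity=40)
macro = SramMacro(delay_ns=9, width_mm=1.9, length_mm=2.3)

capacity_bits = request.n_words * request.n_bits
print(capacity_bits)                 # 1024
print(f"{macro.area_mm2:.2f} mm^2")  # 4.37 mm^2
```

Figures such as the macro area are what a design team would feed into partitioning and floorplanning estimates.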
Synthesis
Synthesis is the process of designing ASICs using equations to define the logic
rather than trying to enter logic symbols of primitive gates. Synthesis tools are
quite complex and are capable of taking many parameters into account as the
synthesis tool generates the logic embodiment of the abstracted design.
Synthesis tools have the ability to contract logic blocks into smaller blocks using
a simplification process, to expand the number of logic blocks, or to do nothing.
Logic expansion usually results in faster logic; contracted blocks usually result
in slower logic. By using equations to design logic, designers concentrate on
choosing the embodiment of the logic by forcing constraints on the synthesis tool
rather than trying to optimize logic paths by hand. One might be able to see how
valuable logic synthesis is when implementing designs containing thousands of
gates and future designs using millions of gates. Without synthesis, designers
would have to count gates and delay, and choose what logic primitives to employ
in the design. If an acceptable outcome was not achieved, the design would
need to start over. It is unlikely we would be able to use the number of gates
produced by Moore's Law without tools like compilers and synthesizers and others
we will discuss shortly. Logic equations can take the form of a simulation
language (Verilog, VHDL), truth tables, bubble diagrams, and state transition
languages. Some common constraints are the number of gates to be used, the
clock speed, and the maximum delay between input and outputs. One important
synthesis capability is the tool's ability to take a design built with a library from
vendor "A" and convert it to a design to be built with a library from vendor "B".
While much of the verification process must be redone, moving designs from one
vendor to another is much simpler. Many old technology designs have been
preserved using this capability.
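
Contraction and expansion both rely on the fact that different gate structures can implement the same equation. The sketch below, which is not how a synthesis tool works internally, checks by exhaustive truth table that a factored (contracted) form and a sum-of-products (expanded) form are equivalent. The example function is invented, and the relative speed of the two forms depends on the cell library and is not modeled.

```python
from itertools import product

def contracted(a, b, c):
    """Factored form: one OR feeding one AND (2 gates)."""
    return a and (b or c)

def expanded(a, b, c):
    """Sum-of-products form: two ANDs feeding one OR (3 gates)."""
    return (a and b) or (a and c)

def equivalent(f, g, n_inputs):
    """Compare two Boolean functions over every input combination."""
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=n_inputs))

print(equivalent(contracted, expanded, 3))  # True
```

A synthesis tool chooses between such equivalent embodiments according to the gate count and timing constraints the designer imposes.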
Hardware Description Language
Perhaps by now one can begin to see the level of complexity contained in a
system design.37 It would surely be impossible to create large ASIC chips
without some form of abstraction. While there are several proprietary hardware
description languages, Very High Speed Integrated Circuit (VHSIC) Hardware
Description Language (VHDL) is probably the best known. As opposed to
performing a schematic capture, VHDL requires a designer to make declarations
about the design primitives. Of course, the primitives may be obtained from a
library of previous designs and logic need not be limited to primitives. Much
larger blocks of logic may be defined but they are done using behaviorals.
Figure 29 is a schematic of a latch used in the following discussions. Figure 30








entity latch is
  port (s, r : in bit;
        q, nq : out bit);
end latch;





Figure 30: VHDL declaration
The entity is a latch with a single port, one bit wide. The inputs to the latch are
labeled s and r and the outputs are q and nq (not q). From this basic description
we have formed a latch. The function of the latch is described by the architecture
which says data flows from nq and r to q (i.e., q is active when r and nq are
inactive). It follows for a latch that nq is active when s and q are not active. The
described architecture is called a functional simulation because the latch
operation is described without timing. To add the time component the following
description is added (Figure 32).
q <= r nor nq after 1 ns;

[Timing diagram; time axis marked 1ns, 2ns, 3ns, 4ns, 5ns, 6ns]
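
The effect of the 1 ns delay can be followed with a small simulation written in Python rather than VHDL. It steps the cross-coupled NOR latch in 1 ns ticks. The assignment for nq (nq <= s nor q after 1 ns) is assumed by symmetry with the statement for q, and the stimulus is invented for the example.

```python
def nor(a, b):
    """Two-input NOR on 0/1 values."""
    return int(not (a or b))

def simulate_latch(stimulus, q=0, nq=1):
    """Step a cross-coupled NOR latch in 1 ns ticks.

    stimulus is a list of (s, r) pairs, one per nanosecond. Each gate output
    takes effect one tick after its inputs, mirroring "after 1 ns".
    """
    trace = [(q, nq)]
    for s, r in stimulus:
        # Both outputs are computed from the values of the previous tick.
        q, nq = nor(r, nq), nor(s, q)
        trace.append((q, nq))
    return trace

# Set for 2 ns, hold for 2 ns, then reset for 2 ns.
stimulus = [(1, 0), (1, 0), (0, 0), (0, 0), (0, 1), (0, 1)]
for t_ns, (q, nq) in enumerate(simulate_latch(stimulus)):
    print(f"{t_ns} ns: q={q} nq={nq}")
```

In this model the latch needs two gate delays (2 ns) to settle after a set or reset, which is the kind of behavior the timed description makes visible and the functional one hides.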
VHDL Behaviorals
VHDL behaviorals differ from previous methods in that design
implementation is a "black box". This sounds a little like synthesis. The black box
analogy is an important concept for large ASICs because large macros, like
microprocessors, switching fabrics, wireless radios, etc., which are typically
contained in large ASICs, can be modeled with relative ease. With VHDL
behaviorals the internal operation of the attached component is irrelevant; only
the inputs and outputs are important. Relieved of the burden of modeling the
whole structure, VHDL abstraction speeds the design process considerably.
VHDL is a complex design language. A detailed description of behaviorals is
beyond this paper but an example is appropriate. Typical program control
statements are used in VHDL such as: if, then, else, begin, etc. The following
represents a VHDL behavioral.
signal x : bit_vector (7 downto 0);
process (x)
variable p : bit;
begin
p := '0';





Verification takes on much more complexity when building large gate arrays.
The verification process must take into account silicon compilation and synthesis
processes, the use of a large macro, and even schematic capture for those
occasions when a little "glue" logic is needed. Achieving a good timing
verification becomes more critical and is complicated by the increasing timing
constraints introduced as a result of much higher performance and the
associated clock speeds. Verification of complex gate arrays always employs
higher levels of abstraction in the simulators (for example, VHDL). Gate (event)
level simulators are almost never used in complex gate arrays. They are much too
cumbersome.
As mentioned in a previous paragraph, timing components are contained within the
HDL model. These timing components are accessed by the logic that connects
to the model. The model, of course, is now simpler since only the inputs and
outputs are pertinent. The developers of the behaviorals for the macros may
have used both low and high levels of abstraction, or even no abstraction at
all. The important point is to create a behavioral that describes the macro to a
high degree of accuracy in both function and time. The accuracy of the time
component is variable under most circumstances for reasons we will describe
later, so "tight timing" cannot be assured without floorplanning, which we will
discuss later. With a set of behaviorals, designers then begin to run test cases
that verify the correct operation of the logic block. Because of the high level of
abstraction possible with HDLs, many test cases can be applied to the
behavioral model in a short amount of time. Thus the simulator is a very efficient
method of verifying complex chips. However, the simulator does not excel at
timing verification since timing verification must be done at the netlist level.
Timing Verification
Timing verification is the process of evaluating every net in the chip to be sure
signals are valid as intended, i.e., signals arrive in time and remain valid for the
time required and are compatible with the behavioral models of all the macro
components on the chip.38 Millions of nets are involved in large gate arrays, thus
one can appreciate that timing analysis is a critical component of a successful
design. As performance increases, distance between components becomes
more critical. In most cases, high performance circuits and the wires connecting
them operate in the low nanosecond range. Sometimes logic functionality is
acceptable or not acceptable due to a few tenths of a nanosecond. For that
reason, floorplanning is an important part of timing verification.
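
The two conditions named above, arriving in time and remaining valid long enough, correspond to what are commonly called setup and hold checks. The sketch below applies them to a few nets. All names and numbers are invented, times are in picoseconds to keep the arithmetic exact, and a real timing verifier works from the extracted netlist rather than a hand-written table.

```python
def setup_slack_ps(period_ps, arrival_ps, setup_ps):
    """Positive when data arrives early enough before the capturing edge."""
    return period_ps - setup_ps - arrival_ps

def hold_slack_ps(earliest_change_ps, hold_ps):
    """Positive when data stays valid long enough after the capturing edge."""
    return earliest_change_ps - hold_ps

def failing_nets(nets, period_ps):
    """Return the names of nets that violate either check."""
    failures = []
    for name, t in nets.items():
        setup_ok = setup_slack_ps(period_ps, t["arrival"], t["setup"]) >= 0
        hold_ok = hold_slack_ps(t["earliest_change"], t["hold"]) >= 0
        if not (setup_ok and hold_ok):
            failures.append(name)
    return failures

# Hypothetical nets on a 5 ns (5000 ps) clock.
NETS = {
    "n7": {"arrival": 4700, "earliest_change": 300, "setup": 200, "hold": 150},
    "n9": {"arrival": 4900, "earliest_change": 300, "setup": 200, "hold": 150},
    "n12": {"arrival": 2500, "earliest_change": 100, "setup": 200, "hold": 150},
}

print(failing_nets(NETS, period_ps=5000))  # ['n9', 'n12']
```

Net n9 fails by only 100 ps, a tenth of a nanosecond, which is the margin the text warns about.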
Floorplanning
With large gate arrays, floorplanning is an important design consideration.
Floorplanning is the process of defining where to place the various elements on a
chip to achieve the best functionality.39 Floorplanning is done in conjunction with
the design, simulation, and timing verification process. The floorplanning tool is
capable of estimating delays in the interconnect paths and providing that
information to a timing verification tool. If a particular floorplan indicates timing
problems, the designers can try a different placement or perhaps place
constraints on another tool to obtain faster logic to correct the problem.
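
A floorplanning tool's delay estimate can be pictured, very roughly, as wire length multiplied by a delay per unit length. The sketch below compares two candidate placements this way. The block names, coordinates and the 100 ps per millimetre figure are assumptions made for illustration, and Manhattan distance stands in for real routing.

```python
PS_PER_MM = 100  # assumed interconnect delay per millimetre of wire

def manhattan_mm(a, b):
    """Wire length estimate between two block positions given in millimetres."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def estimated_net_delays(placement, nets):
    """Estimated delay in picoseconds for each net under a given placement."""
    return {name: manhattan_mm(placement[src], placement[dst]) * PS_PER_MM
            for name, (src, dst) in nets.items()}

def worst_delay_ps(placement, nets):
    return max(estimated_net_delays(placement, nets).values())

# Hypothetical blocks and nets.
NETS = {"cpu_to_sram": ("cpu", "sram"), "cpu_to_rom": ("cpu", "rom")}
plan_a = {"cpu": (0, 0), "sram": (8, 6), "rom": (2, 1)}
plan_b = {"cpu": (4, 3), "sram": (6, 4), "rom": (2, 1)}

print(worst_delay_ps(plan_a, NETS), worst_delay_ps(plan_b, NETS))  # 1400 400
```

Moving the CPU toward the SRAM shortens the critical net considerably at the cost of a slightly longer ROM net, the kind of trade a designer evaluates before asking another tool for faster logic.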




Place and Route is a part of the design phase called physical design. It means
that the various circuit elements such as gates, SRAM, ROM, and macros are
placed on a chip and connected (routed) to form the total functional entity of the
chip. Up to this point the design itself exists as an abstraction, a model, nothing
more than data in a computer. At the physical design stage, the design models
are converted into physical elements from the library of the ASIC vendor. The
process of converting to physical elements is called "instantiation". Once
instantiation has occurred, the designers provide the instantiated data file to the
place and route tool along with the floorplanning data. The place and route tool
then begins the process of placing all of the instantiated components on the chip
using the floorplanning data as a guide. The place and route tool, which is
dealing with real gates and not abstractions, is more efficient. It does not have to
start from the beginning. The place and route tool can be constrained as was the
floorplanning tool to consider critical parts of the chip where "tight timing" is
anticipated. At the conclusion of place and route, a timing delay file is provided
to the design team to assist with any final design changes prior to making masks.
The timing delay file, called the extraction file, contains the actual wire length
delays of the nets on the chip. This is the first time actual physical quantities are
known. Everything prior to this point involving timing has been estimated. The
design is rechecked comparing the data in the extraction file to the prior
estimates. Differences deemed critical to the operation of the chip are
investigated and resolved. The physical design stage is critical to the success of
the project. Physical design occurs fairly late in the project schedule. It is rare
for sufficient time to be built into the schedule to recover from a major problem at
this late stage. Serious issues, for example timing problems, a chip image that
cannot contain the amount of logic intended for it, the inability to distribute the
clock tree to all portions of the chip, or the inability to wire all macros on the chip
(at the required performance level), can cause major problems. Like all phases of a
design, the place and route phase is planned. Place and route is often done
by vendor ASIC experts who will permit a small number of non-performing nets to
exist after place and route is completed. The non-performing nets can often be
manually manipulated to achieve success in a process called embedding. Place
and route engineers are reluctant to embed many nets. They often have multiple
designs in process simultaneously. Being near the end of the design process
they are under pressure to keep the fabricator filled and a large number of
manual embeds can take days or weeks. This is a second point where stressful
meetings can and do occur. At this point a DRC is performed, final timing is




Throughout this discussion we have mentioned macros often. Macros
themselves are subsystems and are treated as such. Therefore everything we
have discussed involving abstraction applies to macros. Today, most ASIC
vendors call their macros "cores". Core libraries consist of just about everything
one can imagine. Table 11 shows a typical core library.
Table 11
Component Class Component Type Component Class Component Typ
Processors PowerPC 401 Serial Interface l2C
PowerPC 405 IrDA Controller
PowerPC 440 Smart Card Interface
ARM7TDMI Serial Communication
IBM C54XDSP Port
ZSP400 DSP UART 16550
UART 16750
USB 1.1 Full Speed
Device Controller
USB 1.1 OHCI Host
Controller
Processor Peripherals PowerPC Peripherals Bus Interface AGP4X






Local and Wide Area Networks    Ethernet 10/100 MAC
                                Ethernet Gigabit MAC
                                Ethernet 10/100 TX PHY
                                Ethernet Management Information Base (MIB)
Data Compression                ALDC (Adaptive Lossless Data Compression)
                                ELDC (Embedded Lossless Data








Fiber Channel 2 Gigabit PHY
HDLC 32 Channels 32 Ports










DASL (Data Aligned Serial Link, 625 Mbps)













































Cores, like chips, can be created to exploit different design intentions. Slow
performing cores may be synthesized without regard for performance or number
of gates. High performance cores may be hand crafted to squeeze every bit of
performance into the smallest size possible. A core may even combine a little of
both approaches. Thus cores are said to be hard, firm or soft. Hard cores strive
for the highest possible performance. Firm cores contain portions that are hard,
but much of the core is soft. Soft cores are fully synthesizable and are
instantiated every time the core is used. As with most engineering tasks, there
are consequences to choosing a core. Hard cores are small and fast but are not
porous. Soft cores have the most flexibility. Thus, floorplanning takes on an
added dimension when working with cores. Large cores, with little or no porosity,
can prevent a chip from being wired. It is then said that the core cannot be
placed. In addition, a core placed near the edge may prevent engineers from using
the peripheral I/Os in the immediate region. Hard cores also have strict physical
height and width dimensions.














Thus they also have an aspect ratio, and their physical dimensions can also
restrict their placement and wiring. Soft cores, at the other extreme, are the most
flexible and offer the least problem in the physical design phase but usually have
the least performance. Figure 33 shows a high level diagram of a complete design
process. Note that cores may include the analog and mixed signal designs that are
so important to future wireless technology. It should also be noted that the
instantiation process can, of course, include both CMOS and BIFET technology.
We have covered much in this chapter but we are now at the point where the
chip is ready for fabrication. Yet we have ignored another important aspect of
design and that is how we know if it actually works. How do we know if it is being
manufactured correctly?
Software and Hardware Synergy
Up to this point we have seen how design abstraction has enabled great
productivity increases that directly affect the ability to create new information
technology systems including data communications and telecommunications
systems. Abstraction itself arguably owes its existence to software development
over the past 50 years. Table 12 shows the parallel development of ASIC
technology, computer hardware and software development. Data
communications and telecommunications systems typically lag computer
hardware and software development since data communications and












ASIC: None - single transistor design or vacuum tubes; Schematic Capture; Synthesis and Compilation; Cores and Macros; Single chip systems
Computer: Relay memories, fixed point arithmetic, single user; Core (magnetic) memories, I/O processors, Floating point; SSI and MSI chips, Microcode, Pipelining, cache memory; LSI/VLSI circuits, DRAM and SRAM, Vector processors; ULSI/VHSIC processors, memory, switches, parallel











Table 12: Synergy of ASIC, computer and software
development.
We need to look at test generation strategies which we will do in Chapter 7.
Chapter 7 - Challenges to E-Commerce and Data Communications System Design Part IV
At this point we have discussed the ASIC design process to the point where an
ASIC has been submitted to a fabricator for prototype hardware. We also have
an understanding of what the fabricator needs to do and some of the challenges
unique to building chips. Next we need to discuss timing verification of large
ASICs, which I delayed until this chapter.
In the ASIC world, custom chips have the highest complexity level. Design
system based ASIC chips (gate arrays and standard cells) are considered to
rank lower in the pecking order of the most challenging silicon jobs. Yet
the actual design differences are minor in my opinion. Large custom chips, like
the Intel Pentium IV, justify large and expensive development teams who are
dedicated to performing all aspects of chip development, including developing
custom tools, compilers, synthesizers, HDL compilers and custom circuits, and who
have the resources to demand process 'tweaks' from the foundries they own to
achieve their company goals in the largest sense. More than one chip has been
announced by the marketing department that never reached actual production.
The term ASIC includes custom chips, but unlike custom chip designers, ASIC
designers rely on industry design tools from companies like Cadence, Synopsys and others.









including custom chips. Table 13 shows some performance differences between
full custom chips and ASIC chips [40].
Factors Contributing to Custom Better Than ASICs    vs. poor    vs. best practice
Micro-architecture: pipelining; logic design
Process variation and accessibility
Dynamic logic on critical paths
Timing overhead: clock tree distribution; latch/flip-flop design
Floorplanning and placement
Sizing of transistors and wires
Table 13: Maximum differences between custom and ASIC. A
factor of x 1.00 indicates no difference.
Timing Verification and Test Engineering
Arguably the most challenging aspect of chip development is timing verification
and test engineering. The tools and design methodologies are capable of
creating many millions of transistors on a single chip. Each of those transistors is
comprised of shapes whose size is approaching atomic levels. Thus we can say
there are billions of shapes that need to be reproduced for every chip. Any one
of those shapes can cause a problem. In addition, the fabrication process of
most complex chip technologies may comprise between 100 and 200 process




chips, (G/B x 100) also called yield, is low, even Moore's Law
cannot sustain the chip industry. Thus it is critical to identify defective chips
before they are placed on the market. Since chips operate at high speeds, timing
verification is a good place to start.
In Chapter 6 Figure 28, a small logic block was introduced. One can see that it is
quite simple to measure the performance of a small logic block. One simply
provides a stimulus, waits 27 nanoseconds and samples the output. If we want
to measure the robustness of the logic block we can lower the operating voltage
or perhaps lower the chip temperature and again measure the time it takes from
launch to capture. By making measurements at several points we can determine
the robustness of the logic block. But things change drastically when we are
faced with millions of logic blocks, cores, and unique I/O cells. We discussed in
a previous chapter that HDL supports time attributes which designers use for
preliminary timing verification. We learned that prior to place and route, the
timing data consists of estimates, not actual values. We also learned that a post
place and route extraction file containing timing data is provided so designers may
make minor adjustments prior to fabricating the chip. While much effort is expended to
minimize the separation between the estimated delays and the actual delays,
with hundreds of thousands and even millions of nets, problem nets can and do
arise. The timing analyzer tool can identify problem nets from the simulation runs
which use the behavioral models. The tool identifies early and late mode timing
problems. Early mode problems result from data arriving too early relative to the
capturing clock pulse (a hold time violation), so the value the latch should have
captured is overwritten and lost. Late mode problems result from data arriving too
late for the clock pulse (a setup time violation). Again the data is lost. Problem nets are
identified and modeled in much more detail using a circuit modeling tool. One
such tool used frequently is SPICE [41]. A SPICE model of the net can take into
account projected variations of the circuit environment such as temperature,
process variations, and voltage variations. In addition, the SPICE model can
accurately predict circuit action beyond normal operating limits, giving the design
team a good handle on the robustness of a design point. SPICE modeling work
is very detailed. A large number of non-conforming nets may result in a chip
architecture change, since it is almost always impractical to model many nets
using SPICE. Fortunately, a technique known as static timing analysis has been
developed.
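The early and late mode checks can be illustrated with a small sketch. It is not taken from any particular timing tool. It assumes the common convention in which a late mode problem means the slowest data arrives too late for the capturing clock edge (setup) and an early mode problem means the fastest data arrives too early (hold). The class name, function names, path names and all delay values are hypothetical, and clock skew between launch and capture latches is assumed to be zero.

```python
from dataclasses import dataclass


@dataclass
class LatchPath:
    """One latch-to-latch path; names and numbers are illustrative only."""
    name: str
    min_delay_ns: float  # fastest data propagation, launch latch to capture latch
    max_delay_ns: float  # slowest data propagation over the same path


def timing_slack(path, clock_period_ns, setup_ns, hold_ns):
    """Return (early_slack, late_slack) in ns; a negative value is a violation.

    Assumes launch and capture latches see the same clock edge (zero skew).
    """
    # Late mode: the slowest data must arrive a setup time before the next edge.
    late_slack = (clock_period_ns - setup_ns) - path.max_delay_ns
    # Early mode: the fastest data must not arrive until the hold time has passed.
    early_slack = path.min_delay_ns - hold_ns
    return early_slack, late_slack


def classify(path, clock_period_ns, setup_ns, hold_ns):
    """List which kinds of timing problem, if any, the path exhibits."""
    early, late = timing_slack(path, clock_period_ns, setup_ns, hold_ns)
    problems = []
    if early < 0:
        problems.append("early mode")
    if late < 0:
        problems.append("late mode")
    return problems


paths = [
    LatchPath("ok_path", min_delay_ns=1.0, max_delay_ns=8.0),
    LatchPath("slow_path", min_delay_ns=2.0, max_delay_ns=10.5),
    LatchPath("fast_path", min_delay_ns=0.1, max_delay_ns=3.0),
]
for p in paths:
    print(p.name, classify(p, clock_period_ns=10.0, setup_ns=0.5, hold_ns=0.3))
```

Only the paths that come back with a non-empty list would be candidates for the more detailed circuit-level modeling described above.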
Timing Analysis
Timing analysis is predominantly applying metrics to synchronous digital
systems. The metrics are the amount of time needed for digital signals to
perform mostly primitive functions. Thus we have a set or multiple sets of
combinatorial logic made up of mostly primitive logic being synchronized by
medium scale logic made up of the classic flip-flop or its more powerful version
the latch. Latches are almost always used throughout synchronous digital
systems. The synchronization is provided by a single heartbeat, the clock.
Timing verification therefore is mostly measuring the time of flight of digital




SPICE is an acronym meaning Simulation Program
with Integrated Circuit Emphasis.
predominant type of timing analysis used today. Static timing analysis predicts
arrival time and transition times from the primary logic input to the primary logic
output for each gate in the logic block under observation. The same calculations
are made backwards from the primary outputs to the primary inputs. Delay
errors are noted for any path where the timing range fails to overlap. Minimum
and maximum values are calculated for both rising and falling edges, and the
overlap is expected to be within the minimum and maximum range of the delay
path. According to Gattiker, static timing analysis is the only viable method for
signing off an ASIC design today.
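The forward and backward passes described above can be sketched on a toy logic block. This is a simplified illustration, not the algorithm of any commercial timing analyzer: it tracks only the latest (maximum) arrival time, uses a single fixed delay per gate, and ignores rising versus falling edges. The gate names, delays and the required time at the output are all hypothetical.

```python
def topo_order(fanin):
    """Order gates so that every gate appears after the gates that drive it."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for pred in fanin.get(node, []):
            visit(pred)
        order.append(node)

    for node in fanin:
        visit(node)
    return order


def static_timing(delays, fanin, required_at_outputs):
    """Return (arrival, required, slack) dictionaries keyed by gate name."""
    order = topo_order(fanin)

    # Forward pass: latest arrival time at the output of each gate.
    arrival = {}
    for node in order:
        latest_input = max((arrival[p] for p in fanin.get(node, [])), default=0.0)
        arrival[node] = latest_input + delays[node]

    # Build the fanout lists needed for the backward pass.
    fanout = {node: [] for node in order}
    for node in order:
        for pred in fanin.get(node, []):
            fanout[pred].append(node)

    # Backward pass: latest time each gate output may switch and still meet timing.
    required = {}
    for node in reversed(order):
        if not fanout[node]:
            required[node] = required_at_outputs
        else:
            required[node] = min(required[s] - delays[s] for s in fanout[node])

    slack = {node: required[node] - arrival[node] for node in order}
    return arrival, required, slack


# Hypothetical block: gates a and b feed gate c, which feeds the output gate.
delays = {"a": 1.0, "b": 2.0, "c": 3.0, "out": 1.0}
fanin = {"a": [], "b": [], "c": ["a", "b"], "out": ["c"]}

arrival, required, slack = static_timing(delays, fanin, required_at_outputs=7.0)
print(slack)
```

A gate with negative slack lies on a path that cannot meet the required time. In this toy case, tightening the required time at the output from 7.0 to 5.0 drives the slack on the path through b, c and out below zero.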
Clock Tree
The clock is distributed throughout the chip through a mechanism called the
clock tree. The mechanism is called the clock tree because it looks similar to a
tree with branches. Since the clock tree must branch to every portion of the chip,
the number of transistors attached to every branch of the tree is carefully
watched. It is critical to successful timing analysis that the tree be as balanced
as possible. Balance is achieved when all branches of the tree achieve equal
rise and fall times of the clock pulse. If a clock pulse's rising and falling edges are
substantially different between two branches of the tree, the slower branch will
exhibit slower logic operation. Details of rising and falling clock edges, setup
and hold time, and latch orthogonality are beyond the scope of this paper, but I
should mention that much effort goes into design and implementation of a chip
clock tree. In addition, hard and firm cores often have clock trees embedded
within them to assure proper full speed operation when they are placed in the
chip. Soft cores depend on the chip clock tree along with the remainder of the
logic not contained in cores or compiled memories.
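As a rough illustration of what checking clock tree balance might look like, the sketch below compares clock edge arrival times at the ends of several branches. Using arrival time as a stand-in for balance is an assumption of this example, as are the branch names, the arrival values and the skew budget.

```python
def clock_skew_report(arrival_ns, budget_ns):
    """Return (worst skew, branches that lag the earliest branch by more than the budget)."""
    earliest = min(arrival_ns.values())
    skew = max(arrival_ns.values()) - earliest
    late_branches = sorted(
        branch for branch, t in arrival_ns.items() if t - earliest > budget_ns
    )
    return skew, late_branches


# Hypothetical clock arrival times (ns) at the end of four branches of the tree.
arrivals = {"north": 1.20, "south": 1.25, "east": 1.22, "west": 1.48}
skew, late_branches = clock_skew_report(arrivals, budget_ns=0.10)
print(f"worst skew {skew:.2f} ns, branches out of balance: {late_branches}")
```

A branch flagged this way would be a candidate for rebalancing, for example by adjusting the load attached to it.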
Figure 34: Timing analysis overview (latch utilization for timing analysis, showing primary input latches and primary output latches)
Figure 34 shows a block of combinatorial logic surrounded by latches. The
latches comprise the primary inputs and outputs of the logic. Timing analysis
measures the time of flight both from the primary inputs to the primary outputs
and vice versa [43]. This method keeps the amount of logic being examined to a
manageable size since combinatorial logic is partitioned throughout the design in
this fashion. Simulators can focus on logic blocks between the latches, reducing
the complexity many fold. We now have a cursory understanding of how logic
blocks are timing verified. One can see that the latches enable designers to
focus on specific parts of a chip rather than try to verify a design as one block.
Indeed, trying to verify a block of combinatorial logic several million or more
gates in size is unimaginable.
Scan Design
Suppose now we modify the latches so we can preload data into them and
control the stimulus being applied to the combinatorial logic. We could probably
scan data into the primary input latches, provide a single clock pulse to the chip
and capture the result of the combinatorial logic in the output latches. We then
might be able to scan the data out of the output latches and compare the output
with any expected output we may have. This is exactly what IBM did in the late
1960's before Moore's Law was a factor. IBM called this method Level Sensitive









until today. When coupled with static timing analysis, it is a highly successful
chip verification method. A slight variation of this technique has been adopted by
the industry as boundary scan testing and standardized as IEEE Std 1149.1.
Figure 35: LSSD Pictorial
A slightly different version of Figure 34 is shown in Figure 35. LSSD latches are
enhanced to allow scan data to flow from latch to latch. Latches are chained
together serially, forming "scan chains" often several thousand latches in length.
Standard 1149.1 also implements a control block for controlling all of the scan
chains along with a multiplexing function. By using the multiplexing function,
several hundred scan chains, each several thousand latches in length, can be
controlled using just a few pins of the chip. A critical aspect of scan design is
that no logic feedback may occur unless the feedback loop is broken by a scan
chain. One of the drawbacks of scan design is that scan latches
require more gates to build and thus are larger than non-scan latches. While the
size difference is just a few gates, when you multiply those few gates by the
number of latches on a chip, the total is formidable. Yet given the alternative of
being unable to verify or debug a design, the criticism of scan design has fallen by
the wayside.
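The scan sequence described in this section (scan a pattern in, apply one clock pulse, scan the result out and compare it with the expected output) can be modeled in a few lines. This is a behavioral toy, not an LSSD or IEEE 1149.1 implementation. The ScanChain class, the three-input logic block and the injected stuck-at fault are all hypothetical.

```python
class ScanChain:
    """A serial chain of latches; index 0 is nearest the scan-in pin."""

    def __init__(self, length):
        self.latches = [0] * length

    def shift(self, bits_in):
        """Shift bits in one at a time and return the bits that fall out the far end."""
        bits_out = []
        for bit in bits_in:
            bits_out.append(self.latches[-1])
            self.latches = [bit] + self.latches[:-1]
        return bits_out


def run_scan_test(logic, pattern, n_outputs):
    """Scan a pattern in, pulse the clock once, and scan the captured response out."""
    input_chain = ScanChain(len(pattern))
    output_chain = ScanChain(n_outputs)
    # Scan in: the last bit shifted ends up nearest scan-in, so shift in reverse.
    input_chain.shift(reversed(pattern))
    # Single clock pulse: the output latches capture the combinatorial result.
    output_chain.latches = list(logic(*input_chain.latches))
    # Scan out: bits emerge far end first, so reverse to recover latch order.
    return list(reversed(output_chain.shift([0] * n_outputs)))


def good_logic(a, b, c):
    return [a & b, b ^ c]


def faulty_logic(a, b, c):
    # Same block with the AND gate output stuck at 0.
    return [0, b ^ c]


pattern = [1, 1, 0]
expected = run_scan_test(good_logic, pattern, n_outputs=2)
observed = run_scan_test(faulty_logic, pattern, n_outputs=2)
print("expected", expected, "observed", observed, "pass" if expected == observed else "FAIL")
```

The pattern was chosen so that the stuck-at fault changes the captured response. A pattern with a or b equal to 0 would not expose it, which is one reason many patterns are needed.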
Chip Testing
As in virtually all manufacturing processes, the earlier you can identify
non-conforming product the better, and chip fabrication is no exception. We have
already discussed chips exceeding 2000 I/O and operating at high clock speeds.
The question is how to test them. It becomes even more interesting if we wish to
test a chip before removing the die from the wafer and placing it in a package, since
the connecting wires between the tester and the chip are several feet in length.
Production test systems are expensive and would be prohibitively expensive
were it not for scan test. Scan test offers some ability to test chips while still
attached to the wafer, albeit at relatively low speeds. In addition, with IEEE
1149.1 embedded on the chip the number of functional test pins can be reduced
to a dozen or so. Functional defects are easily found and, using diagnostic ability,
special test patterns can be generated to analyze specific gate level failures.
Test systems with extremely deep memories contain the test data and apply the
data to a chip, usually at a rate of 1 MHz. According to the Semiconductor
Industry Association (SIA), the total number of test vectors to be applied to a high
function ASIC is estimated to be 32 million as of 1999 and could climb to 100
million by 2005. Even at 1 MHz the amount of time needed to test a chip can be
a minute or more. Chip test systems can cost several million dollars each and
test time is an ongoing challenge to the industry. It has been projected by the
SIA that by 2014 it will cost more to test a transistor than to manufacture it [44].
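The test time figures follow from simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes one vector is applied per tester cycle with no allowance for set-up, scan shifting or handling time.

```python
def vector_apply_time_s(num_vectors, vector_rate_hz):
    """Time to apply all vectors, assuming one vector per tester cycle."""
    return num_vectors / vector_rate_hz


for year, vectors in [(1999, 32_000_000), (2005, 100_000_000)]:
    seconds = vector_apply_time_s(vectors, vector_rate_hz=1_000_000)
    print(f"{year}: {vectors:,} vectors at 1 MHz take {seconds:.0f} seconds per chip")
```

Under these assumptions the 1999 vector count takes 32 seconds and the projected 2005 count takes 100 seconds, which is consistent with the "minute or more" figure above.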
Yet, much can be accomplished [45]. Three key circuit limited yield detractors can
be uncovered using scan test and timing verification. According to Gattiker,
design independent die to die variations caused by process fluctuations,
environmental variations caused by temperature and voltage, and physical
variations such as line width and dielectric thickness can all be uncovered. The
technique is to closely measure the latch to latch delay path on the chip.
Variations in performance can be attributed empirically to one of the three circuit
yield detractors. The test system simply loads a test pattern into the scan-in
latches, provides a single clock pulse to the chip, and measures how long it takes
for the launched data to arrive at the output latches. Unfortunately, despite the
advantages of scan design, the total amount of chip test coverage remains
woefully low.
Test Coverage
Test coverage is simply the number of faults tested divided by the number of
potential chip faults. Manufacturers would like to get the fault coverage as high as
possible. Yet the ever present problem with test time, and the potential for huge
increases in test cost driven by the need to buy more test systems, remains an
obstacle. Test coverage of 99.9% may seem quite good until one realizes we
are dealing with gigascale integration, and 99.9% leaves 1 x 10^6 gates untested.
The problem is exacerbated by the fact that cores containing DRAM, SRAM,
FPGA, and mixed signal (analog) functions are possible. Built-In Self-Test
(BIST) is considered a solution for these problems.
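The 99.9% figure can be checked with a short calculation. The sketch takes coverage to be items tested divided by total potential items and, like the text, counts gates rather than individual faults. The function names and the one-billion-gate chip are illustrative assumptions.

```python
def coverage(tested, total):
    """Fraction of the total that the test set exercises."""
    return tested / total


def untested(tested, total):
    """Number of items the test set never exercises."""
    return total - tested


total_gates = 10**9          # a hypothetical gigascale chip
tested_gates = 999 * 10**6   # 99.9% of those gates

print(f"coverage {coverage(tested_gates, total_gates):.1%}")
print(f"untested gates {untested(tested_gates, total_gates):,}")
```

Even one further decimal place of coverage, 99.99%, would still leave 100,000 gates untested on a chip of this size.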
BIST
BIST is simply using gates on a chip to form an internal chip test system. First
announced by IBM in 1983 [46], BIST offers a potential solution to mitigate the
problem of testing ASICs. BIST extends scan design by adding a method of
creating test vectors on the chip, applying those vectors to the chip logic, and
evaluating the results. We mentioned scan chains in a previous section and that
there can be many scan chains on a chip. This architecture is not changed for
BIST. Each of the scan chains can be used to test logic concurrently reducing





Figure 36: BIST block diagram.
Using BIST requires the introduction of a Linear Feedback Shift Register (LFSR) at
the input of the scan chain and a signature register at the output of the scan chain.
Pseudo random patterns of bits are applied to the scan chain by the LFSR. The logic is
clocked and its response is captured as previously described. The captured
response is collected in the signature register. One can see that this is cost
effective compared with a multimillion dollar test system approach. Unfortunately some
logic blocks are resistant to random patterns and their faults evade detection. One
method of improving BIST is to add additional bits to the pseudo random vectors,
thus giving the test vector greater "weight" (a propensity to find defects in a
particular logic block). Another improvement may be to add a way of "fixing" the
weighted vectors, which improves the effectiveness of the chip generated vectors.
Such a system is described in Figure 37.
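The basic arrangement of a pattern-generating LFSR and a signature register can be sketched as follows. This is a minimal model under several assumptions and does not represent the IBM design or the modified scheme of Figure 37. It uses a 4-bit register with feedback polynomial x^4 + x^3 + 1, applies the LFSR state directly as a parallel pattern instead of shifting it through a scan chain, and compresses a single response bit per pattern. The logic block and the stuck-at fault are hypothetical.

```python
def lfsr_step(state, in_bit=0):
    """One shift of a 4-bit register with feedback polynomial x^4 + x^3 + 1.

    With in_bit left at 0 this is a pattern generator; feeding a response bit
    into in_bit turns the same structure into a serial signature register.
    """
    feedback = ((state >> 3) ^ (state >> 2) ^ in_bit) & 1
    return ((state << 1) | feedback) & 0xF


def lfsr_period(seed=0b0001):
    """Count the shifts needed before the generator returns to its seed."""
    state, steps = lfsr_step(seed), 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


def run_bist(logic, seed=0b0001, num_patterns=15):
    """Apply pseudo random patterns to the logic and return the final signature."""
    pattern, signature = seed, 0
    for _ in range(num_patterns):
        a, b, c, d = [(pattern >> i) & 1 for i in (3, 2, 1, 0)]
        response = logic(a, b, c, d)
        signature = lfsr_step(signature, response)  # compress the response bit
        pattern = lfsr_step(pattern)                # generate the next pattern
    return signature


def good_logic(a, b, c, d):
    return (a & b) | (c ^ d)


def faulty_logic(a, b, c, d):
    # Same block with the AND gate output stuck at 0.
    return c ^ d


golden = run_bist(good_logic)
observed = run_bist(faulty_logic)
print("patterns before repeat:", lfsr_period())
print("fault detected" if observed != golden else "fault escaped")
```

The chip only has to compare the final signature with the expected "golden" value, so no external tester memory is needed for the patterns or the responses. A fault can still escape if the faulty response stream happens to compress to the same signature.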
Figure 37: Modified BIST (showing the Logic Under Test block)
The bit sequence generator is designed to alter the pseudorandom sequence of
bits that is shifted into the scan chain [47]. According to Touba, experimental
results show the bit sequence generator is capable of complete test coverage. If
the experimental results are sustained, this could be a major improvement in chip
testing.
At this point we have now completed our look through some of the background
and present practices of silicon integration. It is now time to look at the near term
future and a summary and conclusion.
Chapter 8 - Future System Design
The future of data communications and e-commerce systems seems very bright
indeed. For the foreseeable future, Moore's Law will continue to provide the
roadmap to future chip density. Now more than ever, high levels of integration
are almost capable of providing system on a chip function. Attaining true
System-On-Chip (SOC) function is certainly complicated by the fact that technology is
in constant change. New applications are announced almost daily. The Internet,
cellular technology, the information age boosted by unprecedented mobility, and
future ubiquitous wireless services are all contributing to astounding rates of
growth. The constant improvement in chip circuit density will continue to drive
the cost of silicon function down. Even if we cannot use a billion gates on a chip,
we can always use the productivity of gigascale integration. System-on-a-Chip
(SOC) is a likely candidate for doing so.
SOC
Single chip systems remain a highly likely candidate. The cores (macros)
identified in Table 11 represent but a few of the total available in the industry.
These cores represent the basis for future single chip systems. It has rarely
been possible for any single company to create all the technology needed for any
system, and chip integration is no different. The intellectual property invested in
silicon cores enables many different applications to be supported by silicon
function. Practically unlimited combinations of cores can be used to form new
functions, limited only by human creativity. Yes, the design challenges identified
in previous chapters must be solved, but it gets a little easier each time it is done.
New tools will be developed to solve problems. Root causes will be identified
and solutions implemented. Single chip systems will be able to make use of the
large BGA packages and use many of the gigascale gates available in the future.
Multi-chip packages
Multi-chip packages offer another means of implementing gigascale technology.
These packages, formed using such common materials as FR4, ceramic and
plastic, offer a means of connecting several gigascale chips into a system.
Indeed, IBM created multi-chip packages for ECL vintage mainframe systems.
Multi-chip packages offer the ability to accurately model chip interconnects while
at the same time minimizing the distance between chips. These factors have a
positive effect on chip power dissipation, delay time, signal integrity, and
verification. Because of the small form factor, it may be feasible to economically
cool gigascale chips using well known techniques or perhaps even miniature
refrigeration systems. According to the SIA, such techniques may increase chip
reliability 3-5 times while increasing performance by 15%. It is also feasible
that gigascale integration may make it possible for on-chip test systems to be
added to the multi-chip scheme. Multi-chip packages also enable chip reuse in a
multitude of ways. For example, one of the chips could be a Bluetooth radio
function that applies to several products. Reuse tends to lessen development
costs and is very desirable.
Virtual Hardware
Another promising technology is virtual hardware, also known as multi-context
hardware [48]. We have already mentioned a technology called an FPGA (field
programmable gate array). While FPGA technology is relatively expensive on a
per gate scale, it may be quite acceptable within the context of logarithmic rises
in complex and custom ASIC development costs and the realization of gigascale
technology. One can envision the ability to create massive amounts of gates on
a chip while being unable to use them. Indeed Figure 19 indicates such a
situation is expected. Virtual hardware may be a solution. Virtual hardware
consists of large amounts of FPGA circuits on a chip along with other popular
system cores such as DRAM and SRAM. Since the FPGA can morph into a
different function by simple reprogramming, a single virtual hardware chip could
take on many different applications.
Summary
The health and advancement of the data communications and e-commerce
universe are tightly coupled to advancements in silicon technology. We started
this thesis identifying Moore's Law and the positive impact it has had since its
inception. Moore's Law has, for decades, projected the growth in semiconductor
integration. The concern that Moore's Law is quickly coming to an end is not
supported by the data from Meindl and Davis who project the limits to be the
minimum channel length of a MOSFET device (Lmin) of 13.9 nanometers.
Additional work by Chau and Marcyk shows promising new materials that may
achieve such a small channel length. 13.9 nanometers is a reduction of 10X
from today's leading edge technology. It is expected that 13.9 nanometer
technologies will not be available until sometime in the next decade. Such tiny
entities, bordering on atomic dimensions, will enable the semiconductor industry
to create gigascale level chips well before reaching the 13.9 nanometer
dimension. Yet, research into new types of transistors, such as carbon nanotubes
for use in MOSFET channels and spin transistors, shows promise for the long term
extension of Moore's Law. But the problem may not be our inability to create
transistors as Moore predicted. The problem may be how we use the transistors
we can make. Macros and cores are one element trying to solve the problem.
But even cores and macros do not seem capable of keeping up with Moore's
Law. The ability to create very fast and highly compacted chips creates a
problem with power dissipation and creates economic pressure on low cost chip
packages. Development costs are expected to increase substantially, so we may
be forced to develop smaller chips if we cannot improve designer productivity.
Chip testing also remains a difficult challenge. If chips cannot be verified and
properly tested, they are unlikely to populate future systems. This probably
means single chip systems will not be realized in the next few years. However,
multi-chip packages and virtual hardware may fill the void until productivity
solutions are found.
Conclusion
We have looked at many aspects of silicon integration as it affects data
communications and e-commerce system design. Moore's Law has accurately
predicted huge advances in silicon integration and for the next few years it
appears Moore's Law will continue unabated. Yet we understand that despite
advances in computer aided design tools, increasing degrees of design
abstraction, and advances in packaging technology, it is likely we will be able to
create more gates on a chip than we can use. If we project future increases in
circuit density due to spin transistor technology and carbon nanotubes, circuit
technology is likely to reach densities we can only imagine, for spin transistors
and carbon nanotubes dissipate little power, and power is arguably the key limiting
factor in gigascale integration. Formidable problems exist in the ability to verify,
place and route, and test gigascale level function. Yet, this inability opens doors
for other technologies to fill the void. With such huge numbers of circuits available
to future chips, placing a "test system" on a chip may not be such an outrageous
proposal. Such a test system could conceivably diagnose failed logic
to the transistor level, where it could be observed and verified with a powerful
microscope.
With the exception of processor and switching systems, the need to create
gigascale chips is not a pressing matter. Indeed, it is unlikely that 64 bit
processors, a critical gigascale driver, will emerge to the desktop in the next five
years. The desktop environment, which has driven Moore's Law for the past
twenty years, has not emerged with an application that needs a 64 bit processor.
Unless some "golden" application emerges, the need to create production level
gigascale chips, including memory chips, is suspect. Switching systems may be
able to make use of gigascale integration, but the number of circuits used in
switching systems worldwide is dwarfed by the desktop environment. Indeed,
compute power is aimed more at entertainment than business, science,
communications, or mathematics applications. The end of compound silicon
productivity, as predicted by Moore's Law, will not occur in the next five years but
the need for the circuit density predicted by Moore's Law will plateau until
desktop applications spur the chip industry into gigascale integration. Despite




Meindl, J.D. and Davis, J.A. (2000). The Fundamental Limit on Binary Switching Energy for
Terascale Integration (TSI). IEEE Journal of Solid State Circuits, Vol. 35, NO. 10, October 2000
Chau, Robert, and Marcyk, Gerald. New Transistors for 2005 and Beyond. IEEE International
Electron Device Meeting 2001.
3
IBM Research News. April 27, 2001.
www.research.ibm.com/resources/news/20010425_Carbon_Nanotubes.shtml
Avouris, Phaedon et. al. Engineering Carbon Nanotubes and Nanotube Circuits Using Electrical
Breakdown. Science, Vol. 292, issue 5517, April 27, 2001.





Solid State Circuits, IEEE Solid State Circuits Society Quarterly Newsletter Vol. 7 NO 1 January
2002. PL
Keyes, Robert W., (2001). Fundamental Limits of Silicon Technology. Proceedings of the IEEE
Vol. 89, NO. 3 March 2001.
9
Bashe, C.J., Johnson, L.R., Palmer, J.H., & Pugh, E.W. (1986). IBM's Early Computers. Boston,
Massachusetts: MIT Press p 396.
10
Santo, B., BiCMOS circuitry: the best of both worlds. IEEE Spectrum Vol. 26 No 5, May 1989. P 50 - 53.
Einspruch, N.G., and Hilbert, J.L., (1991). Application Specific Integrated Circuit Technology. San Diego,
California: Academic Press p 21.
12
Bouras, I., Papadas C, Moreau, JP., Katsafouros., A Novel Driver Architecture Capable of Driving High
Capacitive Loads for Sub-Half Micron Technologies.
Szmyd, P., et al. QUBiC4: A Silicon RF-BiCMOS Technology for Wireless Communication IC's. IEEE
BCTM 3.3 P 60-63.
Sevenhans, J. et al. Wireless Telecom Silicon Integration: Analog Design for Radio, Baseband and Speech
Spectrum. Wireless Networks 4 (1998) p 71 - 77.
Alcatel Announces Revolutionary New 10 GBPS Transmission System based on IBM SiGe Technology,
Available http://www.alcatel.com/press/current/1998/09_29.htm
16
Intersil's Newest PRISM WLAN Chipset Now Available for high performance, high speed wireless
networking - half the chips, half the power, 5X the speed, Available
http://www.intersil.com/whatsnew/prismPCS99_2.asp
Wedding, B., et al. 40 Gbit/sec quaternary dispersion supported transmission over 31 km standard single
mode fiber without optical dispersion compensation. Proceedings 1998 European Conference Optical
Communications, 1998 p 523.
18
Delaney, M., and Sunderland, D., Private Communications at Hughes Space and Communications.
19
RFR3100 IF Receiver Device, Available http://www.qualcom.com/ProdTech/asic/products/rtr3100.html,
Feb 8, 1999.
20
Harame, D. L., and Meyerson, B. S., The Early History of IBM SiGe Mixed Signal Technology. IEEE
Transactions on Electron Devices, Vol. 48 No. 11, November 2001.
21
Harame, D.L., et al. Current Status and Future Trends of SiGe BiCMOS Technology. IEEE Transactions
on Electron Devices. Vol. 48, No. 11 November 2001
22
Porter, Michael E., Competitive Advantage - Creating and Sustaining Superior Performance. The Free
Press. New York, New York. 1985
23
Einspruch, N.G., and Hilbert, J. L. (ed). Application Specific Integrated Circuit Technology. Academic
Press, Inc. San Diego, CA. 1991
24
Abrial, Andre, et. al., A New Contactless Smart Card IC Using an On-Chip Antenna and an Asynchronous





Wilson, H and Haycock, Matthew. A Six Port 30 GB/s Nonblocking Router Component Using Point to
Point Simultaneous Bidirectional Signaling for High Bandwidth Interconnects. IEEE Journal of Solid State
Circuits. Vol. 36, NO. 12, December 2001.
27
Momtaz, J.C., et. al., A Fully Integrated SONET OC48 Transceiver in Standard CMOS. IEEE Journal of
Solid State Circuits, Vol. 36 NO. 12, December 2001.
29
Happy Birthday PC. Cox News Service, Lancaster (Pennsylvania) New Era, August 12, 2001 p: D-3.
30
The International Technology Roadmap for Semiconductors 1999. Semiconductor Industry Association.
Einspruch, N.G., and Hilbert J.L., ed. Application Specific Integrated Circuit Technology. Academic Press,
Inc. New York p 108.
31
The International Technology Roadmap for Semiconductors: 1999 p 35.
Meguerdichian, S., et. al. Metacores: Design and Optimization Techniques. Proceedings of the IEEE






VHDL Tutorial. Green Mountain Computing Systems, http://www.gmvhdl.com
Krishnamachary, Arun, et. al. Timing Verification and Delay Test Generation for Hierarchical Design.
Fourteenth Annual Conference on VLSI Design. Jan. 3 - 7, 2001 P 157 - 162
Ranjan, Abhishek. et. al., Fast Floorplanning for Effective Prediction and Construction. IEEE Transactions
on Very Large Scale Integration (VLSI) Systems, Vol. 9, NO. 2, April 2001
40
Chinnery, D.G., et. al., Achieving 550 MHz in an ASIC Methodology. Proceedings of the IEEE Design




Chen, Liang-Chu, et. al., A New Model for Simultaneous Switching and Its Applications. Proceedings of
the IEEE Design Automation Conference 2001. p 289 - 294.
Krishnamachary, A., et. al, Timing Verification and Delay Test Generation for Hierarchical Designs.
Proceeding of the IEEE Design Automation Conference 2000. p 157 - 162.
International Technology Roadmap for Semiconductors: 1999 p 62.
45
Gattiker, A., et al., Timing Yield Estimation from Static Timing Analysis. IEEE International Symposium
on Quality Electronic Design 2001. p 437 - 442.
Eichelberger, E.B., & Lindbloom, E., Random pattern coverage enhancement and diagnosis for LSSD logic
self-test. IBM Journal of Research and Development, Vol. 27, no 3, p 265-272, 1983
47
Touba, N. A., and McCluskey, E.J., Bit Fixing in Pseudorandom Sequences for Scan BIST. IEEE
Transactions on Computer Aided Design of Integrated Circuits and Systems, Vol. 20, No. 4, April 2001.
48
Kawakami, D., et. al., A Prototype Chip of Multi-context FPGA with DRAM for Virtual Hardware.
Proceedings of the IEEE ASP Design Automation Conference 2001. p 17 - 18.
