Reliability in CMOS IC processing by Wang, J. et al.
N94-71099
2nd NASA SERC Symposium on VLSI Design 1990 1.2.1
Reliability in CMOS IC Processing
R. Shreeve, S. Ferrier, D. Hall and J. Wang
Hewlett Packard
Circuit Technology Group
Corvallis, Oregon 97330
Abstract - Critical CMOS IC processing reliability monitors are defined in this
paper. These monitors are divided into three categories: process qualifications,
ongoing production workcell monitors, and ongoing reliability monitors. The
key measures in each of these categories are identified and prioritized based
on their importance.
1 Introduction
IC process reliability starts with a clear description of the entire IC process from IC design
through final shipment (Figure 1). In this flow the original development of the IC process,
packaging process, test process, and shipping process are shown to the left of the main
manufacturing process flow. New processes are expected to meet minimum reliability
requirements. These requirements are typically referred to as new process qualification
requirements.
Figure 1: IC process flow
Each of the manufacturing processes, such as wafer fabrication, can be broken down into
several smaller processing steps. These steps will be referred to as workcells throughout
this paper. Figure 2 shows a general process description of the workcell. The workcell is
composed of three major parts: the process operation, internal workcell process control,
and external feedback to the workcell. This paper carefully reviews both internal and
external process control feedback for key CMOS IC workcells. This feedback is critical to
the continuous improvement of the workcell process. These improvements directly affect
the material consistency.
Figure 2: Workcell Description (diagram labels: Incoming Material; Process Operation; Operational Specifications; Operator Training; Equipment Maintenance Specifications; Process Improvement Training; Equipment Maintenance Training; Outgoing Material)
Ongoing reliability strife testing plays a key role in continuously improving the reliability
of current and future IC processes. Strife testing, unlike qualification testing, is designed
to produce IC failures through excessive environmental stress. Ongoing reliability strife
testing is performed at least quarterly on material from released manufacturing processes.
This testing has two purposes. First, it is intended to provide feedback so that the overall
IC strength can be improved. Second, this testing provides a larger statistical basis for
evaluating the consistency of the process (cumulative sample sizes for a single strife test
are typically in the thousands).
This paper reviews each of these three areas in much greater detail. A commitment
to continuous process improvement is assumed to be a basic operational methodology.
IC reliability relies on two key components: material strength and material consistency.
Material strength refers to the IC's capability to resist degradation over time. Typically
this degradation results from temperature, humidity, current flow, high voltage gradients,
or mechanical stress. Material consistency refers to the ability to make each part exactly
the same. IC consistency varies because of normal process variations or unexpected process
exceptions (particles typically fall into this category).
Figure 3: Process Qualification Tests (diagram labels: Qualification Tests; Integrated Tests; Intrinsic Tests; Silicon Tests; Package Tests; High Temperature Life; Thermal Cycling; Conductive Films; Insulating Films; Electromigration; Oxide Integrity)
2 Qualifications
Qualifications are designed to set minimum expectations on the initial IC process. As a
result, inherently weak processes are prevented from moving into manufacturing by these
qualification standards. Qualification tests can be classified as either intrinsic or integrated.
Intrinsic tests directly measure the intrinsic strength of specific films on the silicon die.
Integrated tests measure the reliability of the entire packaged IC. Integrated tests typically
verify acceptable interactions between different materials. Figure 3 shows the purpose of
different qualification tests.
Integrated testing is usually given the greatest importance since it evaluates the reliability
of the entire IC rather than one or two elements. However, intrinsic testing is
required to supplement integrated testing because of limitations on the stress which can
be applied to an integrated system.
High temperature operating life is an example of an integrated qualification test. High
temperature operating life testing operates the part at junction temperatures significantly
above maximum operating temperatures for extended periods (6 weeks). These higher
temperatures will cause defective parts to fail in one tenth to one hundredth the time
required under normal operating conditions.
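The ten-fold to hundred-fold acceleration quoted above can be illustrated with the Arrhenius model, which is a common way to relate failure rate to temperature but is not named in this paper. The sketch below is only an illustration under stated assumptions: the activation energies (0.4 and 0.7 eV) and the 70 C use-condition junction temperature are hypothetical values, and the 150 C stress temperature is the typical life test junction temperature given later in this paper.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of a stress temperature relative to a use temperature.

    Assumes a single thermally activated failure mechanism (Arrhenius model).
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


# Assumed activation energies and an assumed 70 C use-condition junction temperature.
for ea in (0.4, 0.7):
    af = arrhenius_acceleration(ea, t_use_c=70.0, t_stress_c=150.0)
    print(f"Ea = {ea} eV: acceleration factor of about {af:.0f}")
```

With these assumed inputs both acceleration factors fall between 10 and 100, which is consistent with the range stated in the text. Real mechanisms have their own activation energies, so the numbers should not be read as a qualification requirement.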
Four key factors define the potential value of integrated qualification tests. These
factors are listed in order of importance below:
1. Electrical Testing
2. IC Vehicle
3. Sample Sizes
4. Environmental Conditions
Specifications of qualification tests normally focus on the environmental conditions.
However, our experience has shown that this factor is actually less significant than the other
three factors. In other words, it is typically easy to pass harsh environmental conditions if
the test program is compromised, the IC vehicle is incompletely designed, or sample sizes of
10 to 50 parts are used.
2.1 Electrical Testing
Electrical testing is designed to detect when the part stops functioning. Functional testing
has been used for many years to detect these failures. The creation of functional vectors
is typically based on a stuck-at-fault model. This model assumes that a failure is characterized
by a node stuck at either 0 volts or 5 volts on the IC. In practice this type of
failure mechanism is extremely rare. Typical failure mechanisms exhibit leakage current
to the supplies or to an adjacent line. In both cases huge leakage currents are required to
actually induce a stuck-at-fault failure. Long before the leakage induces a stuck-at-fault
failure it will create reliability degradation, which is the real failure mechanism in the field.
Hence, it is extremely desirable to detect these low level leakages directly on the IC. This
is accomplished by implementing a static current test. This test places all nodes on the
IC at alternating states (0 volts, 5 volts) and then measures the leakage currents on the
supply lines. All nodes on a CMOS IC are connected to either ground or Vdd through
a transistor that is turned on. Hence, the leakage current between adjacent nodes can
be directly detected at the supply. Useful static current measurements can be made at
any level below 10 µA. Typical measurements should be in the 1 µA range to produce the
appropriate sensitivity to defects. Measurements in the 100 nA range are limited not by
CMOS process capabilities but rather by electrical test hardware capability. For assessing
silicon process reliability this single measurement is far more important than the other
qualification criteria discussed.
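The static current screen described above can be sketched as a simple pass/fail check on supply current readings. This is a minimal illustration, not HP's test program: the two-pattern structure, the field names, and the 1 µA limit are assumptions chosen to match the "1 µA range" mentioned in the text.

```python
from dataclasses import dataclass

# Assumed pass/fail limit in amps; the text only says measurements should be
# in the 1 uA range and are useful anywhere below 10 uA.
STATIC_CURRENT_LIMIT_A = 1e-6


@dataclass
class StaticCurrentReading:
    part_id: str
    pattern_a_amps: float  # supply current with nodes set to one 0 V / 5 V pattern
    pattern_b_amps: float  # supply current with the complementary pattern (assumed)


def fails_static_current(reading, limit_a=STATIC_CURRENT_LIMIT_A):
    """Return True if either node pattern draws more supply current than the limit."""
    return max(reading.pattern_a_amps, reading.pattern_b_amps) > limit_a


readings = [
    StaticCurrentReading("part-001", 80e-9, 95e-9),    # low leakage in both patterns
    StaticCurrentReading("part-002", 120e-9, 4.2e-6),  # leakage visible in one pattern only
]
for r in readings:
    status = "FAIL" if fails_static_current(r) else "pass"
    print(r.part_id, status)
```

The second hypothetical part would still pass a stuck-at functional test, since a few microamps of leakage does not force a node to the wrong logic level, yet the static current check flags it.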
Packaging process qualifications rely heavily on detecting defects in the silicon to pack-
age connection. As a result, I/O leakage current testing is a key measurement for detecting
shorts. In addition, I/O conductivity measurements should be made to detect discontinu-
ities at the silicon-package interface.
2.2 IC Vehicle
The IC vehicle is the part on which qualification testing is performed. Usually two different
IC vehicles are required to qualify a new process. One vehicle is designed to optimize the
sensitivity of the part to silicon degradation, while the other vehicle is designed to optimize
the package to silicon thermal mechanical mismatch.
The silicon vehicle is intended to optimize the sensitivity of the IC layout to processing
deviations (variations & exceptions). As a result, this device should be as dense as possible
using minimal spacings between devices. The vehicle as a whole should dissipate extremely
low standby currents so that the maximum defect sensitivity can be achieved. The design
of the I/O pads should possess good ESD immunity to prevent unrealistic defects caused
by electrical noise in the environmental system. Finally, it is critical that all nodes on
the vehicle can be tested, exercised by the environmental test system, and failures can be
mapped to the physical site of the defect. At HP we typically use an SRAM part for this
vehicle.
The silicon vehicle is used for high temperature operating life tests and moisture re-
sistance testing. Life testing is designed to accelerate silicon defects. Moisture resistance
testing accelerates both silicon and package corrosion. Silicon corrosion is directly related
to the layout of the die. Layouts with tight geometries create the most difficult topologies
for passivation coverage. Particulates on the die further complicate
passivation coverage. Hence, this is the most sensitive vehicle for these two tests.
The package test vehicle is intended to optimize the mechanical stresses induced as the
temperature is increased. This is the key vehicle used to qualify new packaging processes.
The mechanical stresses result from differences in the coefficient of expansion between
packaging and silicon materials. These stresses are optimized by creating large die and
placing them in the largest package within a packaging family (e.g. PDIPs or
PLCCs). The package test vehicle should also be designed to incorporate direct measurements
of the mechanical stress. This increases the sensitivity of the part to failure induced
by stress. Corrosion structures should be incorporated to detect failures induced by the
combination of moisture and ionic contaminants. Finally, the IC should be designed with
a large number of I/O pads. Both the number of bonds on a part and spacing between
bonds can be optimized to provide the worst case environment for reliability testing.
The package test vehicle is used for temperature cycling tests and pressure pot testing.
Thermal shock, temperature cycling, and solder resistance tests are examples of different
temperature cycling tests. The large size of the package test vehicle and the built in
mechanical stress monitors optimize the sensitivity of this part to temperature cycling
induced failures. The corrosion structures and die size optimize its sensitivity to corrosion
during pressure pot testing.
2.3 Sample Sizes
Significant sample sizes are critical to detecting failures during integrated qualification
tests. Parts with the same defect can degrade at varying rates. In addition, much of the
electrical testing is designed to identify parts that have failed but not all of the parts that
have degraded. As a result, in a few cases not all of the defective parts result in failure.
This drives the need for significant sample sizes. At the very least these tests should start
with sample sizes of more than a hundred parts for each environmental test.
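The effect of sample size can be made concrete with a simple binomial calculation. The sketch below assumes that each part independently has the same probability of failing during the test; the 1% value is purely illustrative and does not come from this paper.

```python
def detection_probability(fail_fraction, sample_size):
    """Probability that at least one part in the sample fails during the test.

    Assumes independent parts, each failing with probability fail_fraction.
    """
    return 1.0 - (1.0 - fail_fraction) ** sample_size


# Assumed 1% of parts fail under stress; compare small and large samples.
for n in (10, 50, 105):
    p = detection_probability(0.01, n)
    print(f"sample size {n:3d}: chance of seeing at least one failure = {p:.2f}")
```

Under this assumption a sample of 10 parts will usually show no failures at all, while a sample of about a hundred parts gives a better than even chance of exposing the problem. Because not every defective part fails during the test, the effective failing fraction is lower than the defect fraction, which pushes the required sample size higher still.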
2.4 Environmental Conditions
Most environmental tests can be classified into one of three categories:
• Life tests
• Moisture tests
• Thermal cycling tests
Figure 4: CMOS Inverters (diagram labels: Vdd, In, Out)
Each of these tests is characterized by the environmental conditions and the operational
state of the part. These parameters are consistently less important than the other
issues already discussed.
High temperature and low temperature life tests are intended to primarily accelerate
silicon defects. The acceleration in high temperature life testing is determined by the
junction temperature of the device. Higher temperatures produce higher stresses on the
part. Typical CMOS junction temperatures during high temperature life testing are 150 C.
During life testing the part should be exercised with a set of vectors that simulate normal
IC operation. This simulation should satisfy two objectives:
1. Exercise all areas of the IC
2. Exercise the most sensitive portion of the circuit 90% of the time
Figure 5: CMOS inverters equivalent circuit (diagram labels: Vdd, Ron, In, Cload, Out)
Exercising all areas of the IC optimizes the chances of detecting unexpected problems
on the IC. Figure 4 shows a typical CMOS inverter driving another inverter. Figure 5 shows
the idealized equivalent of this circuit. It is clear from Figure 5 that current flows only
when the inverter switches from one state to another. In many cases the cumulative current
flow is what causes defects to become failures. The implication for reliability testing is
that a circuit must be continuously exercised to identify reliability weaknesses. This is
very difficult if the only objective is to exercise the entire IC. As a result, it is necessary
to select one small area of the IC that can be continuously exercised a high proportion of
the time.
Test Name     Electrical Bias  Pressure (Atm)  Humidity (%RH)  Temperature (C)  Time (Hours)  Sample Size
65/90         Yes              1               <90%            65               240           105
85/85         Yes              1               85%             85               1000          105
Pressure Pot  -                2               95%             125              240           50
HAST          Yes              2               95%             125              168           105
Table 1: Summary of Moisture Resistance Tests
Several different types of moisture resistance tests are summarized in Table 1. Each of
these tests uses a combination of moisture and temperature to activate reliability failures.
65/90 is the only test designed to cycle the temperature and relative humidity. The purpose
of this cycling was to drive moisture into cavity packages. Our experience with this test has
demonstrated that it is unlikely to accelerate defects to failure. As a result, HP prefers to
use 85/85 testing. Relative humidity is typically used to describe these tests. However, a
much better measure is the partial pressure of water. This measure directly describes how
much moisture is present in the chamber. Figure 6 shows a graph of partial pressure (for
water) at various temperatures when the air is fully saturated (100% RH). Reviewing the
65/90 and 85/85 operational points on this curve illustrates that relative humidity values
are quite misleading. The 85/85 test contains twice as much moisture as 65/90 even though
the relative humidity is lower.
Figure 6: Partial pressure of water in fully saturated air (100% RH) versus temperature (C), plotted from 30 C to 120 C
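The comparison between the 65/90 and 85/85 operating points can be checked numerically. The sketch below uses the Antoine equation for water with commonly tabulated constants (valid from roughly 1 C to 100 C); the equation and constants are not from this paper, and the 65/90 point is evaluated at its 90% RH upper bound.

```python
def saturation_pressure_mmhg(temp_c):
    """Saturation vapor pressure of water from the Antoine equation.

    Constants are the commonly tabulated set for about 1 C to 100 C (assumption).
    """
    return 10.0 ** (8.07131 - 1730.63 / (233.426 + temp_c))


def water_partial_pressure_atm(temp_c, rh_percent):
    """Partial pressure of water vapor at a given temperature and relative humidity."""
    return (rh_percent / 100.0) * saturation_pressure_mmhg(temp_c) / 760.0


p_65_90 = water_partial_pressure_atm(65.0, 90.0)
p_85_85 = water_partial_pressure_atm(85.0, 85.0)
print(f"65/90: {p_65_90:.2f} atm, 85/85: {p_85_85:.2f} atm, ratio {p_85_85 / p_65_90:.1f}")
```

Under these assumptions the 85/85 condition carries a little more than twice the water vapor partial pressure of the 65/90 condition, in line with the statement in the text.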
Thermal cycling tests are designed to detect mechanical stress induced failures. The
mechanical stress is a direct result of the differences in the coefficient of expansion for
different materials. Table 2 summarizes several different types of thermal cycling tests.
All of these tests should be performed using the largest possible die and package size,
since this produces the worst case internal stress. HP generally considers thermal shock
to be the worst case test because the fluorocarbon liquid forces the most rapid temperature
change in the part. The soldering process tests were originally designed to confirm
part compatibility with the board assembly processes. However, solderability tests offer a
different temperature profile which can result in different failure mechanisms.
Name                 Phase of Medium  Low Temperature (deg C)  High Temperature (deg C)  Number of Cycles  Dwell Time
Thermal Shock        Liquid           -55                      +125                      200               5 Min.
Temperature Cycling  Air              -55                      150                       500               5 Min.
Wave Solder          Air              25                       260                       1                 10 Min.
Vapor Wave Solder    Air              25                       215                       4                 1 Min.
IR Soldering         Air              25                       215                       4                 1 Min.
Table 2: Summary of Thermal Cycling Test
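The link between temperature range and mechanical stress can be illustrated with a first-order mismatch strain estimate, the difference in expansion coefficients multiplied by the temperature swing. The sketch below is only a rough illustration: the expansion coefficients for silicon and molding compound are assumed round numbers, not values from this paper, and the temperature ranges are taken from Table 2.

```python
# Assumed coefficients of thermal expansion in ppm per deg C (illustrative only).
CTE_SILICON_PPM = 2.6
CTE_MOLD_COMPOUND_PPM = 20.0


def mismatch_strain(cte_a_ppm, cte_b_ppm, t_low_c, t_high_c):
    """Unconstrained thermal mismatch strain between two bonded materials."""
    return abs(cte_a_ppm - cte_b_ppm) * 1e-6 * (t_high_c - t_low_c)


# Temperature extremes from Table 2.
tests = {
    "Thermal Shock": (-55.0, 125.0),
    "Temperature Cycling": (-55.0, 150.0),
    "Wave Solder": (25.0, 260.0),
}
for name, (t_low, t_high) in tests.items():
    strain = mismatch_strain(CTE_SILICON_PPM, CTE_MOLD_COMPOUND_PPM, t_low, t_high)
    print(f"{name}: mismatch strain of about {strain * 100:.2f}%")
```

This estimate depends only on the temperature swing. It does not capture the rate of temperature change, which the text identifies as the reason thermal shock is considered the worst case, nor the number of cycles applied.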
2.5 Intrinsic
Intrinsic tests are designed to measure the intrinsic film strengths on the silicon die. Two
types of films exist on a silicon wafer:
• Conductive
• Insulating
Metal layers are typical conductive films. Gate oxides, intermetal dielectrics, and pas-
sivation films are common insulators. Two special intrinsic tests are performed on these
films to assure that they will continue to operate throughout the rated lifetime of the
product:
• Electromigration
• Oxide Integrity
The verification of these film properties is rarely possible during integrated testing
because sufficient stresses cannot be applied to the part. In electromigration testing the
stress is increased by applying high current densities. Oxide integrity increases the stress
on nonconductive films by rapidly ramping the voltage across the film.
Electromigration testing is designed to accelerate metal migration failures. However,
it can also be a useful test for accelerating failures from stress migration. In addition,
this test can be used to accelerate the failure of intermetal connections (vias) or metal to
substrate connections (contacts).
Oxide integrity tests are designed to measure the potential breakdown voltage of oxide
films. These breakdown voltages are directly affected by weak film quality or particle contamination
of the films. In addition to normal gate oxide testing, the intermetal dielectrics
also need to be tested.
3 Workcell Processes
Individual workcells are the building blocks of the production process. Figure 2 shows a
general description of a workcell. This section of the paper focuses on the process control
monitors for each of the major workcells. This control is necessary to assure consistency
from lot to lot. The process operation consists of operational personnel, equipment, and
inflow and outflow material. This paper assumes that operational specifications, operator
training, equipment maintenance specifications, equipment maintenance rate tracking,
and process change training are parts of the process operation in each workcell.
Figure 1 shows the overall IC process flow. This flow consists of five production processes:
1. Layout
2. Silicon Fabrication
3. IC Packaging
4. Electrical Testing
5. Shipping
This section reviews the major workcells for each of these production processes.
3.1 Layout
The layout of the IC will ultimately determine the reliability of a specific IC design. Three
key reliability issues must be considered in the layout:
1. Compliance to process layout rules
2. Electromigration
3. I/O pad protection
Layout rules should be clearly documented. In many cases layout programs and libraries
already exist that prevent potential violations of these rules. In addition, a design rule
check program is usually run to verify that the physical layout does meet the design rule
requirements.
Electromigration is typically handled by providing margin in the initial electromigra-
tion rules. Libraries of verified designs provide additional protection from electromigration
violations. Some electromigration checkers do currently exist but most are still in the development
phase. As a result, it is crucial that the designer identify where potential electromigration
violations could exist. This is usually a short list because of the availability
of verified libraries and autorouters.
The I/O pads interface to the external world. As a result, they need to protect the IC
from transient high voltage or current abuse. This is accomplished by incorporating ESD
protection circuitry and appropriate guardrings for latchup. All pads should be reviewed
before the IC artwork is released for mask production. HP also performs ESD and latchup
testing on the first parts fabricated in a new design to verify their protection capabilities.
Different ESD test equipment can produce significant differences in the measured ESD
protection of an I/O pad. This is especially true with machines that conform to older
versions of MIL-STD-883C [1].
3.2 Silicon Fabrication
The following silicon fabrication workcells are reviewed in this section:
• Diffusions
• Gate oxides
• Metalization
• Intermetal dielectrics
• Passivation
Table 3 summarizes the key reliability process monitors for each of these process steps.
Process Step           Process Controls
Diffusions             CDs, dose, temperature, resistivity
Metalization           Thickness, width, grain structure, sheet resistivity
Gate Oxide             Mobile ion concentration, surface charge density, thickness
Intermetal Dielectric  Leakage currents, thickness, topography, alignment
Passivation            Coverage
Table 3: Silicon process controls for reliability
Island and well diffusion are required to produce the correct dopant concentrations and
depth profiles. These issues are typically monitored by measuring the dimensional accuracy
of the diffusion openings, monitoring the dose, controlling the drive in temperature, and
measuring the final diffusion resistivity.
Metalized films are required to conduct electrical current from one node to another
without variations in resistivity over their life. Resistivity variations could result from
differences in metal width, thickness, grain structure, or sheet resistivity. As a result, it is
desirable to have each of these parameters monitored as part of the metalization workcell.
Gate oxides should block current flow from the gate to the channel without degrading
the electrical field. Mobile ion concentration, surface charge density, and thickness need to
be monitored to assure consistency from part to part and over the lifetime of the product.
Intermetal dielectrics prevent leakage paths between adjacent metal conductors (both
vertical and horizontal). Leakage currents, and thickness are two key monitors of the insu-
lating characteristics of these materials. In addition, the film topography, and alignment
to underlying layers must be monitored.
Passivation is usually the final layer deposited onto the silicon wafer. It is designed
to protect the metalized layers from chemical corrosion and the transistors from chemical
contamination. To provide this protection the passivation must uniformly cover all structures
on the IC. Variations in the thickness of the film need to be monitored to assure film
reliability.
3.3 IC Packaging
Table 4 summarizes the process steps and process controls for IC packaging. These controls
are designed to assure IC packaging consistency.
Process Step  Process Controls
Leadframe     Plating uniformity, frame quality
Die Attach    Backside coverage, excess material
Bonding       Temperature, pressure, tip wear, wire quality
Molding       Temperature, pressure, injection rate
Lead forming  Tool accuracy, tool wear
Table 4: Packaging process controls for reliability
The leadframe is usually considered an incoming material. The leadframe is plated
so that a wire can be bonded to the inner lead fingers. This plating must have uniform
thickness and consistent purity. The leadframe is stamped or etched to create separate
leads. Each of these leads needs to be correctly formed and free of contamination.
The die attach step is designed to hold the silicon die to the metal leadframe for the
entire life of the product. The attachment method should provide complete attachment
across the entire back surface of the die. Any excessive attachment material should be
carefully controlled so that it does not come in contact with the top side of the die.
Bonding is designed to make electrical connection between the die and the leadframe.
Several different bonding processes exist. The bonding temperature, bonding pressure,
bonding tip quality, and wire quality must be carefully controlled to assure consistency.
Wire quality is defined by the wire diameter and the wire purity.
Molding is designed to protect the part from chemical and mechanical degradation.
To accomplish these objectives the molding material must be uniform and dimensionally
accurate. The molding temperature, pressure, and injection rate must be carefully con-
trolled.
The final step in the packaging process is lead forming. In surface mount applications
the lead alignment and planarity are critical for creating a good electrical connection to
the board. The forming tool determines the results of this operation. The tool needs to
meet original specification and stay within specified wear requirements.
3.4 Electrical Testing
Electrical testing is performed at both the raw wafer level and at the final packaged part
level. The wafer level testing is usually critical to assuring overall wafer reliability, while
package testing is designed to detect packaging induced defects.
Three different types of tests are performed at the wafer level. The first test verifies
the appropriate device characteristics by measuring structures like individual transistors,
contact strings, or diffusion resistances. These structures are located in the center of the
scribe lines on the wafer. A few wafers are tested from each lot. If a wafer does not meet
the device specifications then the entire lot is rejected.
The second wafer test is a full functional test. HP typically breaks this test into several
parts:
• Open/Short I/O pin testing
• Functional testing
• I/O leakage current testing
• Supply current testing
Functional testing is usually performed at full device operational frequencies with the
voltage levels adjusted to compensate for maximum temperature sensitivities. Typical
functional vector sets are expected to meet at least 95% fault coverage. A static current
test is one of the supply current tests. This test is a critical element in assuring that the
part is free of defects. The Qualification section of this paper explains in detail why this
test is critical. One additional screen for wafer testing is called below ship limit wafer
scrapping. In cases where the number of good die is less than 25% of the standard wafer
yield the entire wafer should be scrapped. This prevents potentially marginal die from
being shipped to the field.
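The below ship limit screen described above reduces to a single comparison per wafer. The sketch below is a minimal illustration; the function name and the example die counts are hypothetical, and only the 25% threshold comes from the text.

```python
def should_scrap_wafer(good_die, standard_good_die, ship_limit=0.25):
    """Return True when a wafer yields fewer good die than the ship limit allows.

    standard_good_die is the good-die count of a wafer at standard yield;
    ship_limit is the fraction of that count below which the wafer is scrapped.
    """
    return good_die < ship_limit * standard_good_die


# Hypothetical example: a standard wafer yields 200 good die.
for good in (180, 60, 40):
    action = "scrap" if should_scrap_wafer(good, standard_good_die=200) else "ship good die"
    print(f"{good} good die: {action}")
```

The rationale in the text is that die which pass test on a very low yielding wafer are more likely to be marginal, so the whole wafer is treated as suspect rather than judged die by die.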
The third wafer test is a visual inspection of the wafer. This screen is designed to
detect gross visual defects that would affect metal or passivation quality.
Package testing consists of a full functional test of each part. In addition, lead planarity
and alignment testing is performed on surface mount devices.
3.5 Shipping
The shipping process packages parts so that they will not be damaged during shipment.
Three potential reliability problems must be prevented during this process:
• Electrostatic discharge damage (ESD)
• Pin planarity or alignment damage
• Excessive moisture absorption.
ESD damage can be prevented by using appropriate ESD grounding procedures while
packaging and preparing parts for shipment. In addition, the internal protection structures
on the part also provide protection. Regular ESD audits on the shipping area assure that
the intent of ESD prevention procedures is understood and that the procedures are
followed on a regular basis.
Pin planarity or alignment damage can result from poor part handling techniques. This
damage can be prevented by correctly selecting shipping containers for the parts, and using
appropriate part handling techniques. Workcell process control and therefore consistency
are typically supported by both audits of the actual procedures used and sampling audits
of the outgoing material.
Parts that absorb excessive moisture before shipment may suffer internal cracking or
delaminations (popcorning) during soldering to pc boards. The best way to control this
mechanism is to place parts into inventory in moisture tight packaging. Placing a moisture
absorption card in each package is another step that can be taken to guarantee that material
soldered onto boards does not popcorn.
4 Ongoing Strife Testing
One purpose of this testing is to identify the weakest element in the IC. Once this element
is identified process improvements are developed to further increase the strength of the
process. Ideally strife testing should not be a static set of tests but rather a set of tests
that continuously evolve to create higher and higher part stresses. Another purpose of this
testing is to increase the statistical data available on the process reliability. Running these
tests on a regular basis assures that sufficient data is available to make realistic assessments
of the actual product reliability. New processes should be tested on a more frequent basis
than old processes to develop the statistical data base. At HP we perform strife testing
every month for newer processes. After a couple years this testing is reduced to a quarterly
basis.
As in qualification testing three major types of integrated tests exist:
• Operating life tests
• Moisture resistance tests
• Temperature cycling tests
To increase the part stress these tests are usually performed sequentially. The stress
in operating life tests is increased by using higher supply voltages. Moisture resistance
testing is almost always performed using HAST conditions (summarized in Table 1).
Thermal shock is used to induce thermal mechanical stress. These stresses are increased
by cycling the parts for at least 1000 cycles.
5 Conclusions
CMOS IC reliability is determined by a combination of material strength and product
consistency. Weak materials clearly result in a weak product. Inconsistent product quality
will result in inherent product weaknesses that cause field failures. Qualification testing
is designed to set minimum intrinsic and integrated material strength standards. Process
controls are designed to assure product consistency. Products that do not meet these con-
sistency requirements must be scrapped because they contain defects that will cause early
life failures. Finally, both material strength and product consistency should improve over
the process life. Ongoing strife testing is designed to identify which materials possess the
lowest strength and any variations in process consistency. Based on this data appropriate
process improvements can be developed and implemented.
References
[1] MIL-STD-883, "Test Methods and Procedures for Microelectronics".
[2] HP General Silicon Specification, A-5951-7600-1, Hewlett Packard, Palo Alto, 1988.
[3] J.M. Pimbley, M. Ghezzo, H.G. Parks, and D.M. Brown, "Advanced CMOS Process
Technology," Academic Press, San Diego, CA, 1989.
[4] S. Wolf, and R.N. Tauber, "Silicon Processing for the VLSI Era Volume 1: Process
Technology," Lattice Press, Sunset Beach, CA, 1986.
[5] M.J. Howes, and D.V. Morgan, "Reliability and Degradation," Wiley, New York, 1984.
