Development of reliability testing automation for microwave radios by Taipale, Juho
Helsinki University of Technology
Faculty of Electronics, Communications and Automation
Juho Taipale
DEVELOPMENT OF RELIABILITY TESTING AUTOMATION FOR
MICROWAVE RADIOS
Thesis submitted for examination for the degree of Master of Science in
Technology
Espoo 16.09.2009
Thesis supervisor:
Prof. Antti Räisänen
Thesis instructor:
M.Sc.(Tech.) Ville Hämäläinen
Helsinki University of Technology: Abstract of the master’s thesis
Author: Juho Taipale
Title: Development of reliability testing automation for microwave radios
Date: 16.09.2009 Language: English Number of pages: 56
Faculty: Faculty of Electronics, Communications and Automation
Professorship: Radio Engineering Code: S-26
Supervisor: Prof. Antti Räisänen
Instructor: M.Sc.(Tech.) Ville Hämäläinen
In this master’s thesis I study how automation makes reliability testing of a microwave radio outdoor unit more efficient. Two new test cases, which have so far been performed manually, are added to existing automation software. The goal is to speed up reliability testing and reduce the resources it requires.
Microwave radios used in radio links are divided into two physical parts: an indoor unit and an outdoor unit. The latter has to endure outdoor conditions.
The performance of the microwave radio outdoor unit can be investigated with several test cases. The purpose of these test cases is to obtain detailed information about the radio, which helps in locating a malfunctioning unit or component in the hardware.
In one test run these test cases are repeated several times with different outdoor unit transmission settings, which requires a considerable amount of manual work. Including the new test cases in the existing automation software reduces the time spent on testing and the amount of work needed for it. Further benefits of automation are good reproducibility of measurements and the absence of human errors.
Keywords: Reliability testing, automation
Helsinki University of Technology: Abstract of the master’s thesis (translated from the Finnish abstract)
Author: Juho Taipale
Title: Development of reliability testing automation for microwave radios (Finnish title: Mikroaaltoradioiden luotettavuustestauksessa käytettävän automaation kehittäminen)
Date: 16.09.2009 Language: English Number of pages: 56
Faculty: Faculty of Electronics, Communications and Automation
Professorship: Radio Engineering Code: S-26
Supervisor: Prof. Antti Räisänen
Instructor: M.Sc.(Tech.) Ville Hämäläinen
In my master’s thesis I study how automation makes reliability testing of a microwave radio outdoor unit more efficient. Two new measurements are to be added to the existing automation, so that they no longer need to be done separately by hand. The aim is to speed up testing and reduce the resources it requires.
Microwave radios used in radio links are divided into two parts, one called the indoor unit and the other the outdoor unit. The outdoor unit must withstand varying outdoor conditions.
The performance of the microwave radio outdoor unit can be studied with various measurements. The purpose of these measurements is to examine separately the factors that affect the performance of the device, which helps in locating, at hardware level, the causes of observed performance problems. The measurements are repetitive in nature and quite laborious if done by hand. Adding the new measurements to the automation shortens the total time spent on testing the microwave radio and reduces the amount of work. In addition, the benefits of automated measurements are good repeatability and few human errors.
Keywords: Reliability testing, automation
Preface
This master’s thesis was done for Nokia Siemens Networks during 2008 and 2009.
First of all, I am very grateful to M.Sc. Ville Hämäläinen, whose effort made it possible for me to do this master’s thesis. I would also like to thank him for his inspiring guidance and for his insights on the topic of this thesis. Secondly, I would like to thank Professor Antti Räisänen at Helsinki University of Technology for his valuable feedback.
The work started with getting familiar with the software development environment of the test automation. I am very grateful to Antero Tyrväinen, who was of great help with his expertise in the automation software. His guidance was crucial, especially in the beginning of the project. Thanks to Petteri Aleksejev for teaching me how reliability testing is done in practice. I want to thank all the people at NSN who have contributed to this work. I also want to thank my family and friends for their support.
Otaniemi, 16.09.2009
Juho Taipale
Contents
Abstract
Abstract (in Finnish)
Preface
Contents
Symbols, operators and abbreviations
1 Introduction
2 Reliability
2.1 Reliability block diagram
3 Strength versus stress load
4 Fatigue damage
5 Failure mechanisms
5.1 Solder joint fatigue
5.2 Time dependent dielectric breakdown
5.2.1 Early models for dielectric breakdown
5.2.2 Models for Ultra-Thin Dielectric Breakdown
5.3 Hot carrier injection
5.4 Electromigration
5.5 Soft errors due to memory alpha particles
5.6 IC design related failure mechanisms
5.7 Manufacture related failure mechanisms
6 Accelerated stress testing
6.1 Highly accelerated life testing
6.2 Step-stress testing
6.3 Value of stress testing
6.4 Stress screening
6.5 Accelerated life testing
6.6 Temperature cycle testing
6.6.1 Temperature shock
6.6.2 Stress effects
6.7 Vibration testing
6.8 Mechanical shock testing
7 Microwave radio link
7.1 Outdoor unit
7.1.1 NSN Flexi Hopper Plus design
8 Reliability testing of microwave radio outdoor units
8.1 Pre- and post-stress cycle measurements
8.1.1 RX BER compared to input signal power level
8.1.2 Spectrum of TX signal
8.1.3 Accuracy of TX signal power
8.1.4 Accuracy of TX carrier frequency
8.1.5 Spurious emissions
8.1.6 Accuracy of RX signal power measurement
8.2 Analyzing test results
9 Problem definition
9.1 Work assignment
10 Test automation architecture
10.1 Teststand - test sequence editor
10.2 Visual engineering environment
10.2.1 Communication with measurement equipment
11 Implementation of the automation software
12 Results
13 Conclusion
References
Appendix A: Typical examples of accelerated stress tests
Appendix B: Frequently used acceleration models
Appendix C: The relation among failure modes, mechanisms and factors
Appendix D: The relation among failure modes, mechanisms and factors continued
Symbols, operators and abbreviations
Symbols
Ea activation energy
F cumulative failure probability
k Boltzmann’s constant
Operators
α′ first derivative of α
Abbreviations
ADE application development environment
AF acceleration factor
AGC automatic gain control
ALT accelerated life testing
API application programming interface
ASD acceleration spectral density
ASIC application-specific integrated circuit
AST accelerated stress test
C programming language
CERT combined environmental reliability testing
CVI C for virtual instrumentation
DLL dynamic link library
DQPSK differential quadrature phase shift keying
EM electromigration
EMD early mode dielectric breakdown
ESD electrostatic discharge
FDD frequency division duplex
FEC forward error correction
FMECA failure modes, effects and criticality analysis
GPIB general purpose interface bus
GPIO general purpose I/O
HALT highly accelerated life testing
HAST highly accelerated stress testing
IC integrated circuit
IF intermediate frequency
IFU intermediate frequency unit
KSI thousands of psi
LAN local area network
LNA low noise amplifier
LO local oscillator
MOSFET metal oxide semiconductor field effect transistor
MTBF mean time between failures
MTTF mean time to failure
MWU microwave unit
PCB printed circuit board
PSU power supply unit
QFD quality function deployment
RF radio frequency
REF reference unit
RX receive
TCM trellis coded modulation
TDD time division duplex
TDDB time dependent dielectric breakdown
TX transmit
UUT unit under test
VEE visual engineering environment
VMEbus VersaModular Eurocard bus
VXI VMEbus extensions for instrumentation
1 Introduction
Reliability testing is carried out at different stages of the microwave radio life cycle, from design to ongoing testing during production. The most important stage is design qualification, because its benefits are realized over the whole life of the product. Production sampling or stress screening is performed regularly in order to monitor the production process for variations in manufacturing and component quality.
In reliability testing of a microwave radio outdoor unit, a stress such as temperature cycling or vibration is used to precipitate degradation. This may result in failures during testing or in decreased performance after the stress test. The performance of the unit under test can be measured with several tests.
The reliability testing process consists of calibrating the test system, carrying out test cases before and after the stress cycle, running the stress cycle itself, and reviewing the results. Comparing the test case results obtained before and after the stress cycle shows how much the unit under test deteriorated during the stress cycle. The test cases are run with different transmission settings of the microwave radio and at several temperatures. Changing settings, taking measurements, drawing graphs, and checking and storing results by hand is arduous and time consuming. Therefore automation software should handle the test case execution. The program can also be used to demonstrate performance to customers or simply to check that the tested unit meets regulatory standards.
Two new test cases were implemented in existing reliability testing automation software as thesis work at Nokia Siemens Networks.
The theory part of this thesis begins with a definition of reliability in Chapter 2. Chapter 3 introduces a concept in which the strength of the equipment and the stress level are presented as probability density functions. This illustrates how the probability of failure is affected by the strength of the equipment and the stress level when random failures are not taken into account. Chapter 4 shows how the damage inflicted by stress degrades the equipment over time. Chapter 5 presents the most common failure mechanisms and gives examples of design and manufacturing errors that are known to have caused early failures. Chapter 6 gives a practical view of accelerated stress testing and presents methods for doing it. Chapter 7 introduces the test subject, the FlexiHopper Plus outdoor unit, for which the reliability testing automation is designed. The test cases performed on the outdoor unit are presented in Chapter 8. Chapter 9 defines the problem and presents the work assignment. Chapter 10 shows the software architecture of the test automation and gives short introductions to the main programming tools, VEE Pro and Teststand. Chapter 11 describes the implementation of the software. Results are presented in Chapter 12 and the conclusion in Chapter 13.
2 Reliability
There are several ways to define reliability. A commonly used definition, and the one used in this thesis, is:
“Reliability is the probability that an item operating under stated conditions will survive for a stated period of time.”
This definition has its roots in the military standard MIL-STD-721C [1] and is valid for non-repairable hardware items. The “item” may be a component, a sub-system or a system. If the item is software instead of hardware, the definition is somewhat different [2].
2.1 Reliability block diagram
A system can be divided into sub-systems. Each sub-system has a reliability that is independent of the reliabilities of the other sub-systems. In Figure 1 a) the sub-systems are in series and in Figure 1 b) they are in parallel. The former system works only if both blocks are working, and the latter works if at least one of the blocks is working.
Figure 1: Reliability block diagram of a: a) system in series, b) parallel system
The reliability of the systems in Figure 1 can be calculated if the reliabilities $R_1$ and $R_2$ of the units are known. For the series system in Figure 1 a) the reliability is
$$R_s = R_1 R_2 \tag{1}$$
For the parallel system in Figure 1 b) the reliability is
$$R_p = 1 - (1 - R_1)(1 - R_2) \tag{2}$$
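Equations (1) and (2) generalize directly to more than two independent blocks. The following minimal Python sketch illustrates this; the function names are chosen here for illustration and the numerical values are arbitrary examples, not measured reliabilities.

```python
def series_reliability(*reliabilities):
    """Reliability of independent blocks in series: all blocks must work."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel_reliability(*reliabilities):
    """Reliability of independent blocks in parallel: at least one must work."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= (1.0 - r)  # probability that every block has failed
    return 1.0 - all_fail


# Arbitrary example values for R1 and R2.
r_series = series_reliability(0.9, 0.8)
r_parallel = parallel_reliability(0.9, 0.8)
```

With these example values the series system is less reliable than either block alone, while the parallel system is more reliable than either block.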
3 Strength versus stress load
The strength of a product is defined as the maximum stress load the product can withstand without a failure. Every product differs from the others due to component variation and limited reproducibility in manufacturing, and so the strengths of products vary. Strength can therefore be presented as a probability density distribution. [3, pp. 115]
Figure 2: Strength distribution.
The larger the variation in strength, the more weak units there are. Varying component values and inaccuracy in manufacturing do not normally enhance the durability of the products. [3, pp. 116]
Figure 3: Strength distribution with many weak units.
’Stress’ may refer to a mechanical stress due to temperature cycling, vibration, humidity and so on. It can be presented as a probability density distribution, because the environmental conditions differ from one product to another. [3, pp. 114]
Figure 4: Stress and strength distributions.
Stress and strength distributions can be placed in the same graph. If the distributions do not overlap, as in Figure 4, there will be no failures. In practice this means that every unit can withstand the current stress, because the lowest strength level is higher than the highest applied stress level. Random failures are not taken into account.
Over time, the strength of the units decreases due to degradation, while the stress load distribution remains unchanged. When the two distributions overlap, failures among the weakest units become probable. [3, pp. 115]
Figure 5: Strength deterioration over time.
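The effect of overlapping distributions can be illustrated numerically. The sketch below assumes that both strength and stress are normally distributed and independent; this is an assumption made here for illustration and is not stated in the source. Under that assumption the failure probability is the probability that stress exceeds strength. All numerical values are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def failure_probability(strength_mean, strength_sd, stress_mean, stress_sd):
    """P(stress > strength) for independent, normally distributed variables."""
    margin_mean = strength_mean - stress_mean
    margin_sd = math.hypot(strength_sd, stress_sd)  # sd of (strength - stress)
    return normal_cdf(-margin_mean / margin_sd)


# Hypothetical values in arbitrary units: a new unit and a degraded unit
# whose mean strength has dropped while the stress distribution is unchanged.
p_new = failure_probability(100.0, 10.0, 60.0, 8.0)
p_aged = failure_probability(80.0, 10.0, 60.0, 8.0)
```

As the mean strength moves towards the stress distribution, the computed failure probability rises, which mirrors the deterioration sketched in Figure 5.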
The pattern of failures that dominate in the field can be divided into three phases according to when in the life span of a product the failures occur.
1. Early life failures are due to defects in products. These failures are also known as ’infant mortalities’.
2. The failures caused by defects decrease rapidly, and externally induced failures, which occur at a roughly constant rate at all times, become dominant. At this point the failure rate is at its minimum.
3. When the strength of the products deteriorates due to stress-induced fatigue, wear-out becomes the dominant failure mechanism. [5, pp. 3-4]
When all of these failure types are superimposed, the result is the so-called bathtub curve. An example is shown in Figure 6.
Figure 6: Bathtub curve.
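The superposition of the three failure types can be sketched as a sum of three failure rates. The sketch below models early failures and wear-out with Weibull hazard functions and random failures with a constant rate; the choice of Weibull models and all parameter values are assumptions made here for illustration only.

```python
def weibull_hazard(t, shape, scale):
    """Weibull failure rate: decreasing for shape < 1, increasing for shape > 1."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def bathtub_hazard(t):
    """Total failure rate (per hour) at time t > 0, hypothetical parameters."""
    infant_mortality = weibull_hazard(t, shape=0.5, scale=1000.0)
    random_failures = 1.0e-4  # constant rate of externally induced failures
    wear_out = weibull_hazard(t, shape=4.0, scale=10000.0)
    return infant_mortality + random_failures + wear_out


# Failure rate early in life, in mid-life and late in life.
early, middle, late = (bathtub_hazard(t) for t in (10.0, 2000.0, 20000.0))
```

With these parameters the total failure rate is high early in life, lowest in mid-life and rises again late in life, which gives the bathtub shape.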
4 Fatigue damage
Many failures in electronic equipment are mechanical in nature. One common failure
mode is mechanical fatigue damage due to cyclic stresses caused by temperature,
temperature cycling, vibration or some combination of them. [5, pp. 15-16]
Miner’s criterion is one of the least complex ways to model fatigue damage. It states that fatigue damage is cumulative, non-reversible and accumulates on a simple linear basis. The damage accumulated under each stress condition, taken as a fraction of the total life expended, can be summed over all stress conditions. When the sum reaches unity, the end of fatigue life has arrived and failure occurs. Fatigue damage based on Miner’s criterion is [6]
$$D \approx N S^{\beta}, \tag{3}$$
where $D$ is the Miner’s criterion fatigue damage accumulation, $N$ is the number of cycles of stress, $S$ is the mechanical stress in force per unit area and $\beta$ is a material property.
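The linear summation in Miner’s criterion can be sketched as follows. Each load block contributes the fraction of life it consumes, that is, the applied cycles divided by the cycles to failure at that stress level. The function name and the numbers are hypothetical and serve only as an illustration.

```python
def miner_damage(load_blocks):
    """Sum of life fractions; load_blocks holds (cycles_applied, cycles_to_failure)."""
    return sum(applied / to_failure for applied, to_failure in load_blocks)


# Hypothetical history: 500 000 cycles at a level that fails at 2 000 000 cycles,
# followed by 1 000 cycles at a level that fails at 2 000 cycles.
damage = miner_damage([(500_000, 2_000_000), (1_000, 2_000)])
has_failed = damage >= 1.0  # failure is predicted when the sum reaches unity
```

In this example the first block consumes a quarter of the fatigue life and the second block half of it, so the accumulated damage stays below unity.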
A diagram of stress versus number of cycles to failure (S-N diagram) by Steinberg [7] is presented in Figure 7.
Figure 7: S-N diagram for 7075 aluminium [7].
The graph is derived from tensile fatigue tests on specimens. It illustrates that the relationship between the number of cycles to fail and tensile stress is exponential in nature, which is consistent with equation (3). The material property β is derived from the slope of the curve and, for most materials, ranges between 8 and 12 in high cycle fatigue. S-N diagrams for other materials are similar in nature. [5, pp. 133]
To demonstrate the diagram in Figure 7, which is for 7075-T6 aluminium, the following tensile stress levels and corresponding numbers of cycles to fail are chosen:
1. At 40 KSI (thousands of psi) it takes 2 million cycles to fail.
2. At 80 KSI it takes 2000 cycles to fail.
Increasing the stress by a factor of 2 thus decreases the life by a factor of 1000. This acceleration factor is very typical of mechanically induced fatigue and of many other failure modes as well. A part that has imperfections may have higher stress concentrations. Even a small imperfection may lead to a stress two or even three times as high as in a part without the imperfection; in other words, the stress concentration factor is two or three. For example, if the stress concentration factor is 2 and β is 10, the life is reduced by a factor of 2^10, or about 1000. [5, pp. 133]
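The exponent β and the resulting life reduction can be checked with a short calculation. The sketch below assumes the power-law form of equation (3), so that the number of cycles to failure scales as the stress raised to the power −β; the function names are illustrative.

```python
import math


def fatigue_exponent(stress_1, cycles_1, stress_2, cycles_2):
    """Estimate beta from two points on the S-N curve, assuming N ~ S**(-beta)."""
    return math.log(cycles_1 / cycles_2) / math.log(stress_2 / stress_1)


def life_reduction_factor(stress_ratio, beta):
    """Factor by which fatigue life shrinks when stress grows by stress_ratio."""
    return stress_ratio ** beta


# The two 7075-T6 points quoted in the text: 40 KSI -> 2e6 cycles, 80 KSI -> 2000 cycles.
beta = fatigue_exponent(40.0, 2.0e6, 80.0, 2000.0)
# A stress concentration factor of 2 with beta = 10.
reduction = life_reduction_factor(2.0, 10.0)
```

The two quoted points give β slightly below 10, which falls within the range of 8 to 12 stated for most materials, and doubling the stress with β = 10 reduces the life by a factor of 1024.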
The fatigue damage done by stress cycles is non-reversible and cumulative. A small number of high-stress cycles during testing does as much fatigue damage as a much larger number of low-stress cycles in the field environment. This is what makes it possible to use accelerated stress testing to find weaknesses in products and to validate the reliability of a product in a fraction of its expected life time.
Using S-N curves as a tool for design against fatigue has obvious limitations. One
severe limitation is that they do not distinguish between crack initiation and crack
propagation. Particularly in the low-stress regions, a large fraction of a component’s
life may be spent in crack propagation, thus allowing for crack tolerance over a large
portion of the life. Engineering structures often contain flaws or crack-like defects
which may altogether eliminate the crack-initiation step. Therefore a method that
quantitatively describes crack growth as a function of the stress loads is of great
value in design and in assessing the remaining lives of components. [8]
5 Failure mechanisms
5.1 Solder joint fatigue
During temperature cycling, solder joints experience a complex stress and strain history. The stress and strain in the solder joint result from the mismatch of the coefficient of thermal expansion (CTE) between the package and the substrate, and from the total CTE mismatch between the solder and the copper pads/leads. This mismatch, which is sketched in Figure 8, causes fatigue failure of solder joints. [9, pp. 152]
Figure 8: Thermal effects in soldered joint connecting a chip carrier (CC) to a
substrate (PWB). (a) Initially unstressed solder joint. (b) Solder joint is heated to
Tmax, expanding PWB relative to CC. (c) Solder joint is cooled to Tmin, contracting
PWB relative to CC.
The process of this failure mechanism is:
1. crack nucleation
2. crack growth
3. ultimate ductile failure.
The thermal fatigue of solder joints depends on a number of parameters related to
materials, pad surface finishes, geometry and the manufacturing processes. Besides
thermal-mechanical fatigue failures, the impact of solder joint failures is a major
issue for the electronics industry because of the ever increasing popularity of portable
electronics and the transition to lead-free solders. [9, pp. 152]
5.2 Time dependent dielectric breakdown
One of the most important failure mechanisms in semiconductor technology is time
dependent dielectric breakdown (TDDB). TDDB refers to the wear out process in
gate oxide of the transistors. Other main contributors to semiconductor device wear
out are electromigration and hot carrier effect. The exact physical mechanisms
leading to TDDB are still unknown [10] [11].
Currently, the accepted model for oxide breakdown is the percolation model, in
which breakdown occurs when a conduction path is formed by randomly distributed
defects generated during electrical stress. It is believed that when this conduction
path is formed, soft breakdown happens. When a sudden increase in conductance
occurs, as well as a power dissipation rate higher than a specific threshold, hard
breakdown occurs. The whole process is very complicated, but can be explained by
the physics of breakdown [12].
In general, the electric field in a MOSFET causes dielectric degradation and conductive path formation in the dielectric material, which somehow connects the anode and cathode. Then, the continuous stress of the electric field on the gate oxide leads to thermal runaway through the breakdown path (soft breakdown) or energy dissipation (hard breakdown). The oxide breakdown leads to an increase in the gate current. The whole process can be modeled physically and mathematically. The models provide the lifetime function of the device. [9, pp. 89]
Further studies show that, while there is neither microscopic information about the
breakdown defect and the interaction with stress voltage and temperature, nor a
complete explanation of temperature dependence of dielectric breakdown of ultra-
thin dielectric layers, there are empirical relations between voltage and temperature
stresses. [9, pp. 92]
5.2.1 Early models for dielectric breakdown
During electrical stress, the time to dielectric breakdown depends on the electric
stress parameters such as the electric field, temperature and the total area of the
dielectric film. From the 1970s until 2000, several models were suggested for the
time-to-breakdown in dielectric layers.
The intrinsic failure, which occurs in defect-free oxide, is modeled in four forms:
- Bandgap ionization occurs in thick oxides, when the electron energy reaches
the oxide bandgap and causes electron-hole pairs.
- Anode hole injection (the 1/E model) occurs when electrons injected from
the cathode gate get enough energy to ionize the atoms and create hot holes.
Some of the holes tunnel back to the cathode and either create traps in the
oxide, or increase the cathode field, leading to sudden oxide breakdown. The
time to breakdown (tBD) has a reciprocal electric field dependence from the
Fowler-Nordheim electron tunneling current:
$$t_{BD} = t_0 \exp\left(\frac{G}{E_{OX}}\right) \exp\left(\frac{E_a}{kT}\right), \tag{4}$$
where $E_{OX}$ is the electric field across the oxide and $G$ and $t_0$ are constants.
- The thermo-chemical (E) model relates defect generation to the electrical field.
The applied field interacts with the dipoles and causes oxygen vacancies and,
hence, oxide breakdown. The lifetime function has the following form:
$$t_{BD} = t_0 \exp\left(\gamma E_{OX}\right) \exp\left(\frac{E_a}{kT}\right), \tag{5}$$
where $\gamma$ and $t_0$ are constants.
- Anode-hydrogen release happens when electrons at the anode release hydrogen,
which subsequently diffuses through the oxide and generates electron traps. [9,
pp. 90]
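The lifetime functions of the 1/E model (4) and the E model (5) can be compared with a short sketch. All parameter values used with it are hypothetical and chosen only to show the trends; they are not taken from the source. With equation (5) in the form given above, a lifetime that shortens with increasing field corresponds to a negative γ, which is the sign assumed here.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K


def tbd_inverse_e_model(e_ox, temp_k, t0, g, ea_ev):
    """Time to breakdown from the 1/E model, equation (4)."""
    return t0 * math.exp(g / e_ox) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


def tbd_e_model(e_ox, temp_k, t0, gamma, ea_ev):
    """Time to breakdown from the E model, equation (5)."""
    return t0 * math.exp(gamma * e_ox) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


# Hypothetical constants; fields in MV/cm, temperature in kelvins.
low_field = tbd_e_model(8.0, 400.0, t0=1.0, gamma=-2.5, ea_ev=0.3)
high_field = tbd_e_model(10.0, 400.0, t0=1.0, gamma=-2.5, ea_ev=0.3)
field_acceleration = low_field / high_field  # t0 and the temperature term cancel
```

Taking the ratio of two lifetimes removes the unknown prefactor, so only the field and temperature dependence of each model has to be assumed.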
For thick dielectrics, the dielectric field is an important parameter controlling the breakdown process, while the temperature dependence of dielectric breakdown is another key point. The E and 1/E models can each fit only part of the electric field range, as shown in Figure 11. Some articles in the literature have tried to unify the two models [13] [14] [15]. However, neither model is applicable to ultra-thin oxide layers. For ultra-thin oxide layers (between 2 and 5 nm), other models are used.
5.2.2 Models for Ultra-Thin Dielectric Breakdown
The dielectric thickness of modern semiconductor devices has been steadily decreasing. The time dependent breakdown of a thin dielectric takes the following form [9, pp. 91]:
$$t_{BD} = A \left(\frac{1}{WL}\right)^{1/\beta} F^{1/\beta} \, V_{\mathrm{GateToSource}}^{\,a' + b'T} \exp\left(\frac{a(V)}{T} + \frac{b(V)}{T^2}\right), \tag{6}$$
where $W$ and $L$ are the width and length of the channel respectively, $F$ is the cumulative failure probability, $A$ is the acceleration factor, and $\beta$, $a$, $b$, $a'$ and $b'$ can be derived from experimental data [16].
The $b(V)/T^2$ term in equation 6 represents the non-Arrhenius behavior of temperature [17] [18].
5.3 Hot carrier injection
Hot carriers in a semiconductor device are the cause of a distinct wear-out mechanism, hot carrier injection (HCI). Hot carriers are produced when the carriers of the source-drain current flowing through the channel attain an energy well above that corresponding to the lattice temperature. Some of these hot carriers gain enough energy, higher than the Si-SiO2 energy barrier of about 3.7 eV, to be injected into the gate oxide, resulting in charge trapping and interface state generation. The latter may lead to shifts in the performance characteristics of the device:
- threshold voltage
- transconductance
- saturation current. [9, pp. 95]
The hot carrier effect is accelerated by low temperature, mainly because low temperature reduces charge de-trapping. [19]
A simple acceleration model for hot carrier effects is as follows:
$$AF = \frac{t_{50(2)}}{t_{50(1)}}, \tag{7}$$
$$AF = \exp\left(\frac{E_a}{k}\left(\frac{1}{T_1} - \frac{1}{T_2}\right) + C(V_2 - V_1)\right), \tag{8}$$
where $AF$ is the acceleration factor of the mechanism, and $t_{50(1)}$ and $t_{50(2)}$ are the rates at which the hot carrier effects occur under conditions $V_1$, $T_1$ and $V_2$, $T_2$, respectively. $T_1$ and $T_2$ are the applied temperatures in kelvins, $k$ is Boltzmann’s constant, $E_a$ is the activation energy, in the range of -0.2 eV to -0.06 eV, and $C$ is a constant. [20]
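The acceleration model can be evaluated numerically as in the sketch below. It assumes that the temperature term has the usual Arrhenius form, with the activation energy divided by Boltzmann’s constant and multiplied by the difference of the inverse temperatures. The function name and the example values, including the constant C, are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K


def hot_carrier_af(t1_k, t2_k, v1, v2, ea_ev, c):
    """Acceleration factor of condition (V2, T2) relative to (V1, T1)."""
    thermal = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t1_k - 1.0 / t2_k)
    voltage = c * (v2 - v1)
    return math.exp(thermal + voltage)


# Hypothetical case: same voltage, temperature lowered from 300 K to 250 K,
# activation energy -0.1 eV (inside the range quoted in the text).
af_cold = hot_carrier_af(300.0, 250.0, 3.0, 3.0, ea_ev=-0.1, c=1.0)
```

Because the activation energy is negative, lowering the temperature at constant voltage gives an acceleration factor greater than one, in line with the statement that hot carrier effects are accelerated by low temperature.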
5.4 Electromigration
Electrons passing through a conductor transfer some of their momentum to its atoms. At sufficiently high electron current densities, greater than $10^5$ A/cm$^2$ [21], atoms may shift towards the anode side. The material depletion at the cathode side causes circuit damage due to decreased electrical conductance and the eventual formation of open circuit conditions. This is caused by voids and micro-cracks, which increase the conductor resistance as the cross-sectional area is reduced. The mechanism is shown in Figure 9. Increased resistance alone may result in device failure, but the resulting increase in local current density and temperature may also lead to thermal runaway and failure, such as an open circuit. Alternatively, short circuit conditions may develop due to excess material buildup at the anode. Hillocks form where there is excess material, breaking the oxide layer and allowing the conductor to come in contact with other device features. Other types of damage include whiskers, thinning, localized heating, and cracking of the passivation and inter-level dielectrics [22].
Figure 9: EM failure mechanism [9, pp. 116].
This diffusive process, known as electromigration, is still a major reliability concern
despite vast scientific research, as well as electrical and materials engineering efforts.
In particular, the areas of greatest concern are the thin-film metallic interconnects
between device features, contacts and vias [22].
The favored method for predicting time to failure is an approximate statistical one given by Black’s equation, which calculates the MTTF as
$$\mathrm{MTTF} = A J^{-n} \exp\left(\frac{E_a}{kT}\right), \tag{9}$$
where $A$ is a constant based on the cross-sectional area of the interconnect, $J$ is the current density, $n$ is a scaling factor, $E_a$ is the EM activation energy, $k$ is Boltzmann’s constant, and $T$ is the temperature in kelvins [9, pp. 117].
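Black’s equation is often used as a ratio between a stress condition and a use condition, in which case the constant A cancels. The sketch below shows this; the function names and all numerical values (current densities, temperatures, n and the activation energy) are hypothetical examples rather than values from the source.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K


def black_mttf(a, j, n, ea_ev, temp_k):
    """Mean time to failure from Black's equation (9)."""
    return a * j ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


def em_acceleration_factor(j_use, j_stress, temp_use_k, temp_stress_k, n, ea_ev):
    """Ratio of MTTF at use conditions to MTTF at stress conditions."""
    mttf_use = black_mttf(1.0, j_use, n, ea_ev, temp_use_k)
    mttf_stress = black_mttf(1.0, j_stress, n, ea_ev, temp_stress_k)
    return mttf_use / mttf_stress


# Hypothetical test: current density doubled and temperature raised from 358 K to 398 K.
af = em_acceleration_factor(1.0e5, 2.0e5, 358.0, 398.0, n=2.0, ea_ev=0.7)
```

Both the higher current density and the higher temperature shorten the predicted MTTF, so the acceleration factor of the stress condition is greater than one.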
5.5 Soft errors due to memory alpha particles
One of the problems that hinders the development of larger memory sizes or the miniaturization of memory cells is the occurrence of soft errors due to alpha particles. Uranium (U) and thorium (Th) are contained in very low concentrations in package materials and emit alpha particles that enter the memory chip and generate a large concentration of electron-hole pairs in the silicon substrate. This changes the electric potential distribution of the memory device, amounting to electrical noise which, in turn, can change the stored information. Inversion of memory information is shown in Figure 10. The generated holes are pulled towards the substrate with its applied negative potential. Conversely, electrons are pulled to the data storage node with its applied positive potential. A dynamic memory cell filled with charge has a data value of zero. An empty or discharged cell has a value of one. Therefore, a data change of 1-0 occurs when electrons collect in the data storage node. Such a malfunction is called the memory cell model of a soft error. [9, pp. 119]
Figure 10: Memory cell model of soft error [9, pp. 119].
The bit line model reflects a change of the bit line electric potential. The electric potential of the bit line varies with the data in the memory cell during readout, and is compared with the reference potential, resulting in a data value of 1 or 0. A sense amplifier is used to amplify the minute amount of change. If α-particles penetrate the area near the bit line during the short time between memory readout and sense amplification, the bit line potential changes. An information 1-0 operation error results when the bit line potential falls below the reference potential. Conversely, if the reference potential drops, an information 0-1 operation error results. The memory cell model applies only to information 1-0 reversal, while the bit line model covers both information 1-0 and 0-1 reversals. The generation rate of the memory cell model is independent of the memory cycle time because memory cell data turns over. Since the bit line model describes problems that occur only when the bit line is floating after data readout, an increased frequency of data readout increases the potential for soft errors, and so the occurrence rate of the bit line model is inversely proportional to the cycle time. In products, the combined model describes the combination of the memory cell and bit line models. [9, pp. 120]
5.6 IC design related failure mechanisms
IC chip edge cracks may be due to too small a tolerance in the distance between the chip edges and the lid of a hermetically sealed component. The edges may crack during the sealing operation. Edge cracks may also develop due to improper chip bonding. Reference [23] reports that poor soldering mostly results in edge cracks. The sensitivity to cracks decreases for thinner chips, down to a certain limit, and increases with larger surface area. The best stress type for precipitating this kind of weakness is thermal shock. [24, pp. 25]
Moisture in an IC package can cause it to crack. When the temperature is increased, the moisture evaporates and creates pressure, which may crack the IC package. This can happen during soldering if the IC package has at some point absorbed moisture, for example during storage. [4, pp. 463]
Moisture condenses onto surfaces when the temperature drops to the dew point. Liquid
water can cause chemical corrosion and, if contamination is also present,
electrolytic corrosion and short-circuiting of electrical systems. [25, pp. 31-32]
5.7 Manufacture related failure mechanisms
Manufacturing and assembly processes and procedures must be carefully controlled
to ensure the reliability of the electronic equipment. Many early field failures in
military electronic equipment have been traced back to manufacturing methods
that are not normally evaluated in the preliminary design and analysis evaluations.
Some of these documented failures are described below.
Conformal coatings are used to protect the electronics from moisture, salt and dirt.
The coating can fill in under chip components and large fine-pitch surface mounted
components. Chip resistors and capacitors and solder joints on the fine-pitch com-
ponents may crack, due to thermal expansion in thermal cycling conditions. [7, pp.
340]
Large components will generate large relative displacement to the PCB as the PCB
bends during vibration. This increases strain and stress in the component lead wires.
[7, pp. 339]
The wave soldering operation is known to lift some components up from the PCB. This
produces a tilted axial-leaded component that has one long and one short lead. Solder
wicking then structurally short-circuits the strain relief of the shorter lead wire. This
increases the forces and stresses in the solder joint, which reduces the operating life.
[7, pp. 337]
As through-hole components have been largely replaced by surface mount compo-
nents, wave soldering has been supplanted by reflow soldering methods in many
large-scale electronics applications. However, there is still significant wave soldering
where SMT is not suitable (e.g. large power devices and high pin count connectors),
or where simple through-hole technology prevails. [26]
Lead forming dies are used to bend the component lead wires to fit the solder pads.
If dies are worn or they are not aligned properly, the lead wires may have sharp-
bend radii, deep cuts or scratches. High stress concentrations are developed at these
defects, which result in reduced operating life. [7, pp. 337]
Some through-hole components, such as transformers, must not be flush mounted
on PCBs. When flush mounted, there is no air gap between the transformer and the
PCB and it is difficult to clean out the solder paste and flux under the component.
High humidity and a small electric current can promote dendritic growth under the
component. This growth is a transparent semiconductor, similar in appearance to
lacquer or shellac, with a high electrical impedance, so high-impedance circuits can be
affected by it. Another reason not to flush mount is that the component may
prevent the venting of the hot gases escaping from the plated through-hole
during the wave soldering operation. This builds up the air pressure in the plated
through-hole, which reduces the wicking action of the solder. The plated through-holes
are then not filled with solder, which causes premature solder joint failures. [7, pp. 337]
Components with a flat bottom, like transformers, should not be mounted tightly on
the PCB. Thermal expansion of the transformer body in the direction perpendicular to
the plane of the PCB can break the lead wires and solder joints in thermal cycling
conditions. [7, pp. 340]
The copper layers must be distributed uniformly through multilayer PCBs to prevent
them from warping during the lamination process and during the soldering process.
Special fixtures may have to be fabricated to prevent the PCBs from warping. High
stresses can develop in the component lead wires and solder joints if the PCBs
are allowed to warp, which can lead to premature field failures. [7, pp. 338]
Components and the PCB have different thermal coefficients of expansion (TCE). In
addition, a temperature difference arises when the component is turned on, because it
heats up faster than the PCB. As a result, shear stresses are imparted to the solder
joints. The bigger the component, the higher the stress in the solder joints at the
edges and especially at the corners of the component. This can cause deformation and
fatigue failures in the form of cracks in the solder joints. [4, pp. 430]
6 Accelerated stress testing
Stress testing can be implemented at different stages of a product life cycle, from
the design qualification phase to on-going testing during production.
The most important stage is design qualification, because the benefits
are realized over the whole life of the product. After the design stage, manufacturing
qualification stress testing is done on a representative sample of the product in order to
identify deficiencies in component quality and in the manufacturing process, so that these
deficiencies are fixed before production volumes become large. Production sampling
or stress screening is performed regularly in order to monitor the production process
for manufacturing and component quality variations. [27, pp. 9]
Accelerated stress testing (AST) is a reliability testing method that tries to induce
failures in order to find weaknesses and fix those weaknesses to enhance the reliabi-
lity. This testing differs from the classical reliability demonstration and mean time
between failures (MTBF) types of tests that measure the expected life time of the
UUT.
In AST the stress is increased beyond the normal operating conditions of the equipment.
The reason for using higher stresses than in the actual service environment is to
induce failures quickly. The failures that occur during testing might occur in
normal use as well. Investigating the physical and chemical background of the
failures found during testing can show whether they are relevant
and how the durability of the equipment can be improved. Wrong conclusions
about the nature of a failure lead to unnecessary improvements of the product and
increased production costs.
The first thing in performing an AST is to determine, as far as practicable, what
failures might occur in service. This should have been performed during design
analysis and review, particularly during the quality function deployment (QFD) and
failure modes, effects and criticality analysis (FMECA). After that the application
and environmental stresses that could cause failures should be listed. Finally a
plan is made on how to use stresses to stimulate foreseeable and unforeseen failures
effectively and how the test set up is built, operated and monitored. [3, pp. 331-332]
The main environmental factors affecting the reliability of most electronic products
are:
- temperature
- vibration
- shock
- humidity
- power input and output
- dirt
- people
- electromagnetic effects
- electrostatic discharge (ESD).
Different stress types can be applied simultaneously or separately to the equipment
under test. The former is the better way because the total stress is higher and more
failure mechanisms may be excited than when a single stress is applied. There are
chambers that can apply temperature cycling, humidity and vibration at the same
time. The downside of these multi-stress chambers is their high cost. [3, pp. 330-331]
Electronic systems have been subjected to accelerated temperature and vibration
stresses at levels of the order of 20-50 percent above specifications, both in
development and for production units. [3, pp. 330] Accelerated tests with stresses
above the use environment are used and justified because the causes and probabilities
of failures that will occur in the future are often very uncertain. It is possible that a
weakness is exposed during testing by a different stress than the one that
would make the weakness show up in the field. [5, pp. 140-142]
6.1 Highly accelerated life testing
Highly accelerated life testing (HALT) was developed by Dr. Gregg Hobbs and the
principle of this concept is described fully in reference [5].
In HALT, as well as in ALT, the goal is to simulate the life time stresses a product
is going to experience. These two concepts differ from each other in two ways.
Firstly, HALT uses higher stress levels than ALT in order to gain more life time
compression. Secondly, HALT is not for measuring the expected life of the product
but, like AST, for inducing failures to find weaknesses. Different types of testing,
like temperature cycling, humidity, vibration and product-specific stresses like power
cycling and varying input line voltage, can be part of a HALT program. For example,
the testing program for a printer can include tests where shafts or bearings
are misaligned or where paper is used whose dimensions or friction are out of
specification. [5, pp. 49]
6.2 Step-stress testing
Step-stress testing has been used since the early days of the space program. Because
of its nature, this test is categorized as a highly accelerated stress test (HAST).
Step-stress testing starts from a known stress level, which is then increased
in controlled steps. At each stress level tests are run in order to monitor the
functionality and performance of the equipment under test. This process is repeated
until the equipment under test fails or a satisfactory stress level is achieved. [27, pp. 11]
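The procedure described above can be sketched as a simple loop. This is only a minimal illustration: `unit_passes` is a hypothetical callback standing for the functional and performance tests that are run at each stress level, and the stress values in the usage note are invented.

```python
from typing import Callable, Optional, Tuple


def run_step_stress(
    start_level: float,
    step: float,
    target_level: float,
    unit_passes: Callable[[float], bool],
) -> Tuple[Optional[float], Optional[float]]:
    """Raise the stress in controlled steps until failure or the target level.

    Returns (highest level passed, level at which the unit failed or None).
    """
    level = start_level
    highest_passed = None
    while level <= target_level:
        # Hypothetical hook: functional and performance tests at this level.
        if not unit_passes(level):
            return highest_passed, level
        highest_passed = level
        level += step
    return highest_passed, None
```

For example, for a unit that fails at 90 (in arbitrary stress units), `run_step_stress(60, 10, 120, lambda s: s < 90)` returns `(80, 90)`.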
Figure 11 illustrates the process of step-stress testing. The stress is increased
beyond the level the product is designed to withstand, and failure at some
point is expected. The reason for the failure is analysed. The relevance of the failure
must be considered because it is useless to change the design if a failure of this type
is very unlikely in normal use. If irrelevant failures begin to appear, step-stress
testing is no longer useful.
Figure 11: Step-stress testing process model [5, pp. 55].
6.3 Value of stress testing
It has been found in various papers from Hewlett-Packard that most of the weak-
nesses found in HALT and not addressed resulted in costs to the company in the
neighbourhood of 10 million US$ to address later, when failure costs were included.
[28]
Time spent on testing is expensive, so the more quickly possible causes of failure
are found the better. Finding causes of failure during development and preventing
recurrence is far less expensive than finding new failure causes in use.
Direct costs of the failure of a product that is already on the market are customer
claims, alteration of the production line for the replacement component or unit,
replacing the stock of failure-sensitive components or units with new ones, testing the
product with the new component or unit, and the delay caused by these measures.
An indirect cost is decreased customer confidence in the market. Buyers want
to have reliable equipment in order to keep the maintenance costs as low as possible.
Reliability of the product is an essential marketing asset.
6.4 Stress screening
Stress screen testing removes the units not qualified for sale and prevents early
failures in use. Defective components or mistakes during manufacturing may result
in very weak units. Even a small imperfection like a void in a solder joint may cause
the stress concentration in that particular location to increase by a factor of
two to three compared to a flawless solder joint. [5, pp. 133-134] Units with
imperfections will fail quickly in use conditions. This is also known as 'infant mortality'.
Fatigue damage is cumulative, as was mentioned earlier in this thesis, and so stress
screening removes some life from the product. However, stress screen testing does not
have much effect on healthy products. Assume that there is a flaw which causes
the stress concentration to increase by a factor of two. According to Miner's
equation (3), with the material property β assumed to be about 10, the fatigue damage
would accumulate about 2^10 ≈ 1000 times as fast in the area with the flaw as it would in
a non-flawed location having the same nominal stress level. This means that the
flawed area can break while the non-flawed areas still have 99.9 % of their life left. [5,
pp. 16]
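The arithmetic behind this estimate can be checked with a short sketch. It assumes, as the text implies, that Equation (3) makes the fatigue damage rate proportional to the stress raised to the power β; the function names are illustrative and do not come from reference [5].

```python
def damage_rate_ratio(stress_factor: float, beta: float) -> float:
    """Ratio of damage accumulation rates, assuming damage ~ stress ** beta."""
    return stress_factor ** beta


def life_left_in_sound_area(stress_factor: float, beta: float) -> float:
    """Fraction of life left in a non-flawed area when the flawed area fails."""
    return 1.0 - 1.0 / damage_rate_ratio(stress_factor, beta)


ratio = damage_rate_ratio(2.0, 10.0)            # 2 ** 10 = 1024, about 1000
remaining = life_left_in_sound_area(2.0, 10.0)  # 1 - 1/1024
print(f"damage accumulates {ratio:.0f} times as fast at the flaw")
print(f"life left in the non-flawed area: {remaining:.1%}")
```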
The amount of life in a product after the screen test compared to the initial state
of the product can be measured by performing the screen testing many times for a
couple of products. If a product can handle, let’s say 20 screen tests, as in Figure
12 without an end-of-life failure then after one screen test there is still at least 95 %
of the life time left in the product. Also performing many screen tests ensures that
the test itself doesn’t initiate a defect like a crack which would make the product
deteriorate much faster than without testing it in the first place. [5, pp. 104-107]
Figure 12: Validating stress screen test which is to be used in production [5, pp.
106].
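The 95 % figure follows from simple proportional reasoning, which can be written out as below. The sketch assumes that every screen removes the same share of life (linear damage accumulation); the function name is illustrative.

```python
def min_life_left_after_one_screen(screens_survived: int) -> float:
    """Lower bound for the life fraction left after a single screen.

    Assumes each screen consumes an equal share of life, so a product that
    survives N screens loses at most 1/N of its life per screen.
    """
    if screens_survived < 1:
        raise ValueError("at least one survived screen is required")
    return 1.0 - 1.0 / screens_survived


# 20 survived screens -> at least 95 % of the life is left after one screen.
print(f"{min_life_left_after_one_screen(20):.0%}")
```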
6.5 Accelerated life testing
Accelerated life testing (ALT) measures the expected life of the equipment by
simulating a lifetime of the actual stresses the product will experience. This testing
method is used in product reliability validation to ensure an acceptable lifetime in
use. The concept of ALT is largely the same as that of AST, but in addition
there is an acceleration factor (AF) that relates the mean time to failure (MTTF)
of equipment in normal use to that of equipment under test. The acceleration factor,
given in Equation (10), is the equipment MTTF in the normal use environment
divided by the MTTF under test conditions.
\[ AF = \frac{MTTF_{\mathrm{in\ use}}}{MTTF_{\mathrm{under\ test}}} \qquad (10) \]
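As a small worked example of Equation (10), the sketch below computes the acceleration factor and uses it to scale a test result back to use conditions. The numeric values are invented for illustration and are not measurement data.

```python
def acceleration_factor(mttf_in_use_h: float, mttf_under_test_h: float) -> float:
    """Equation (10): MTTF in normal use divided by MTTF under test."""
    if mttf_in_use_h <= 0 or mttf_under_test_h <= 0:
        raise ValueError("MTTF values must be positive")
    return mttf_in_use_h / mttf_under_test_h


def mttf_in_use(af: float, mttf_under_test_h: float) -> float:
    """Scale an MTTF measured under test to use conditions with a known AF."""
    return af * mttf_under_test_h


# Hypothetical example: 50 000 h in use versus 1 000 h under test.
af = acceleration_factor(50_000.0, 1_000.0)
print(af)                       # 50.0
print(mttf_in_use(af, 1_000.0))  # 50000.0
```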
In general, accelerated life testing techniques provide a shortcut to investigate the
reliability of electronic devices with respect to certain dominant failure mechanisms
occurring under normal operating conditions. Accelerated tests are usually planned
based on the assumption that there is a single dominant failure mechanism for a
given device. However, the failure mechanisms that are dormant under normal use
conditions may start contributing to device failure under accelerated conditions,
and the life test data obtained from the accelerated test would, therefore, not be
representative of the actual situation. Moreover, an accelerated stress accelerates
various failure mechanisms simultaneously and the dominant failure mechanism is
the one which gives the shortest predicted life. [29]
6.6 Temperature cycle testing
Temperature cycle is one of the most common ways of doing AST. Temperature
cycling can be done in thermal chambers. Key parameters are not only high and
low temperature but also temperature change rate and dwell time.
Figure 13: Temperature cycle with UUT.
The temperatures of a climatic chamber and an operating UUT are sketched in Figure 13.
The temperature inside the UUT is higher than the ambient temperature because the
UUT is operating and produces some heat. The temperature change rate of the UUT is
lower than that of the climatic chamber and reaches its maximum when the climatic
chamber reaches its maximum or minimum temperature. The temperature maximum
and minimum are chosen according to the use environment of the equipment and
the testing method used.
Dwell time ensures that the temperature inside the UUT reaches a stable state after
the chamber has reached the maximum or minimum temperature of the cycle.
Dwell time depends on the structure of the equipment. The required dwell time can
be measured by placing a temperature sensor inside the UUT and recording the time
at which the temperature change is near zero. If the equipment is sealed, the dwell
time is longer than when it has ventilation holes and possibly fans.
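The dwell time measurement described above could be evaluated from logged sensor data roughly as follows. This is a sketch under assumptions: the sample format, the function name and the stability threshold of 0.1 °C per minute are hypothetical choices, not values from the thesis.

```python
from typing import List, Optional, Tuple


def required_dwell_time_s(
    samples: List[Tuple[float, float]],
    chamber_reached_s: float,
    max_rate_c_per_min: float = 0.1,  # assumed threshold for "near zero"
) -> Optional[float]:
    """Time from the chamber reaching its set point until the UUT is stable.

    samples: (time_s, temperature_c) pairs from the sensor inside the UUT,
    with strictly increasing time stamps. Returns None if the temperature
    never settles within the logged data.
    """
    for (t0, temp0), (t1, temp1) in zip(samples, samples[1:]):
        if t0 < chamber_reached_s:
            continue  # the chamber is still ramping
        rate = abs(temp1 - temp0) / (t1 - t0) * 60.0
        if rate <= max_rate_c_per_min:
            return t1 - chamber_reached_s
    return None
```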
6.6.1 Temperature shock
Temperature shock can be arranged by switching high power equipment on and off or
by using two climatic chambers, one with a high temperature and the other with a low
temperature, and quickly moving the UUT from one chamber to the other.
6.6.2 Stress effects
High temperature causes:
- softening and weakening (metals, some plastics)
- melting (metals, some plastics)
- charring (plastics, organic materials)
- other chemical changes
- reduced viscosity or loss of lubricants
- interaction effects, such as temperature-accelerated corrosion.
Low temperature effects are:
- embrittlement of plastics
- increased viscosity of lubricants
- condensation
- freezing of condensation or coolants.
Most of these temperature effects are deterministic, not cumulative, so time and
number of cycles do not directly affect reliability. However, secondary effects might
be cumulative, for example the effects of lubricant viscosity on rate of wear.
Possible stress screen types for precipitating solder joint cracks are vibration and
temperature cycling. [24, pp. 39]
6.7 Vibration testing
A single-frequency sinusoidal test can be used to test the equipment's compatibility
with some source of constant-frequency vibration, like a motor rotating at constant
speed. Single-frequency vibration can also be used to excite the resonance frequencies
of the equipment one at a time and to check whether there is some resonance frequency
to which it is vulnerable. This test type is used specifically in the development phase.
A sinusoidal vibration sweep over a frequency range can reveal resonance frequencies in
the system. Key parameters are acceleration, frequency range, number of sweeps
and sweep rate, which is usually expressed in octaves per minute.
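Because the sweep rate is given in octaves per minute, the duration of a sweep follows from the number of octaves in the frequency range. The sketch below assumes a logarithmic sweep at a constant octave rate; the function names and example frequencies are illustrative.

```python
import math


def octave_count(f_start_hz: float, f_stop_hz: float) -> float:
    """Number of octaves between two frequencies (one octave doubles f)."""
    return abs(math.log2(f_stop_hz / f_start_hz))


def sweep_duration_min(
    f_start_hz: float, f_stop_hz: float, rate_oct_per_min: float, sweeps: int = 1
) -> float:
    """Total duration of the given number of one-way sweeps, in minutes."""
    return sweeps * octave_count(f_start_hz, f_stop_hz) / rate_oct_per_min


# 10 Hz to 640 Hz is six octaves; at 1 octave/min two sweeps take 12 minutes.
print(sweep_duration_min(10.0, 640.0, 1.0, sweeps=2))
```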
Random frequency vibration testing is used to simulate the use environment. This
test type is used in qualification, reliability and acceptance testing. The amount
of vibration is usually expressed as acceleration spectral density (ASD). Other pa-
rameters are random vibration frequency bandwidth and attenuation outside the
bandwidth, which is usually expressed as attenuation in decibels per octave from
the boundary frequency.
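For a flat ASD profile the overall vibration level can be estimated as the square root of the area under the ASD curve. The sketch below uses this standard relation and deliberately ignores the roll-off outside the bandwidth, so it slightly underestimates the true level; the example values are invented.

```python
import math


def grms_flat_profile(asd_g2_per_hz: float, f_low_hz: float, f_high_hz: float) -> float:
    """Overall RMS acceleration (g) of a flat ASD between two frequencies.

    Simplification: energy in the attenuated region outside the band
    is neglected.
    """
    if f_high_hz <= f_low_hz or asd_g2_per_hz < 0:
        raise ValueError("invalid profile")
    return math.sqrt(asd_g2_per_hz * (f_high_hz - f_low_hz))


# Hypothetical profile: 0.04 g^2/Hz from 20 Hz to 2000 Hz.
print(round(grms_flat_profile(0.04, 20.0, 2000.0), 2))
```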
Test systems can usually vibrate along one axis in both directions, so all three
axes of the UUT have to be tested separately. Multi-axis vibration test
systems are more expensive than single-axis ones.
Dr. Gregg Hobbs, who introduced HALT, claims that he first used a single-axis
vibration machine to severely ruggedize a device. Then, in production, a multi-axis
vibration machine was used and three design weaknesses which had not been found
on the single-axis system were exposed almost immediately. [5, pp. 18]
With high acceleration levels relays chatter and crystal oscillator signals may be
distorted when their internal resonances are excited. These are soft failures if the
UUT works normally after the vibration is stopped. Typical hard failures include
cracked solder joints, broken wires, cracked circuit traces, cracked plated
through-holes, broken connector pins, broken screws, cracked components, cracked
hermetic seals, and cracked silicon chips. [7, pp. 189]
6.8 Mechanical shock testing
Shock is defined as a rapid transfer of energy to a mechanical system, which results
in a significant increase in the stress, velocity, acceleration or displacement within
the system. [7, pp. 248]
A common way of doing shock testing is dropping the equipment. The test subject
is attached to a platform, which is then dropped onto rubber pads or a spring that
bounces the platform, so that its velocity goes abruptly to zero. Key parameters for
this test are the drop height and the rubber pad elasticity or spring rebound coefficient.
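The severity of a drop test can be related to its key parameters with basic free-fall mechanics. The sketch below is an illustration only: it neglects air resistance and interprets the rebound coefficient as a coefficient of restitution (ratio of rebound speed to impact speed), which is an assumption not stated in the source.

```python
import math

G = 9.80665  # standard gravity, m/s^2


def impact_velocity_m_s(drop_height_m: float) -> float:
    """Speed at impact after a free fall from the given height."""
    return math.sqrt(2.0 * G * drop_height_m)


def velocity_change_m_s(drop_height_m: float, rebound_coefficient: float) -> float:
    """Total velocity change during impact: v * (1 + e).

    rebound_coefficient = 0 means the platform stops dead,
    1 means a perfectly elastic rebound.
    """
    return impact_velocity_m_s(drop_height_m) * (1.0 + rebound_coefficient)


print(round(impact_velocity_m_s(1.0), 2))       # about 4.43 m/s from 1 m
print(round(velocity_change_m_s(1.0, 0.5), 2))  # about 6.64 m/s
```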
Shock and vibration testing failures are very similar. Shocks can produce four basic
kinds of failures in electronic systems:
- high stresses, which can cause fractures or permanent deformations in the
structure
- high acceleration levels, which can cause relays to chatter, signal distortion in
crystal oscillators, potentiometers to slip and bolts to loosen
- high displacements, which cause impact between circuit board and adjacent
board or shell causing electrical malfunction such as short-circuit or cracking
components and solder joints. [7, pp. 248]
7 Microwave radio link
A microwave radio link is mainly used in macrocellular sites. It can also be used in
the microcell layer when there is a need for higher capacities or longer radio hops.
The main parts of a microwave radio link are the indoor unit, the outdoor unit and the
antenna. The indoor unit is connected to the outdoor unit with a coaxial cable, which
transfers data in full duplex mode and feeds DC power from the indoor unit to the
outdoor unit. Several outdoor units can be attached to a single indoor unit. Indoor
units can also be attached to another indoor unit with the same type of cable. Figure 14
shows an example configuration of a branching station with two indoor units and four
outdoor units directly attached to the antennas.
Figure 14: Radio link configuration [30].
7.1 Outdoor unit
The microwave radio is the part that transmits and receives the signal. Bi-directional
communication can be realized with two techniques:
- Time division duplex (TDD), which means that transmission and reception
are done at different times.
- Frequency division duplex (FDD), which means that the received and
transmitted signals are on different frequency bands. The frequency difference
between the received and transmitted signals is called the duplex frequency.
There are different ways to divide the functionalities between outdoor and indoor
unit. In Figure 15 the modem is located in the outdoor unit. It may also be in the
indoor unit and the data flows through I/Q-channel between outdoor and indoor
units.
7.1.1 NSN Flexi Hopper Plus design
The block diagram of the Flexi Hopper Plus microwave radio outdoor unit is illustrated
in Figure 15.
Figure 15: Block diagram of the outdoor unit [30].
The outdoor unit includes five functional units:
- a power supply unit (PSU)
- a modem board
- an intermediate frequency unit (IFU)
- a microwave unit (MWU)
- a duplex filter.
Moving the modem to the outdoor unit decreases the reliability of the outdoor unit.
The upside is that there is less attenuation between the modem and the IF unit,
resulting in increased hop length of the radio link.
The main component of the modem board is the custom design ASIC (application-
specific integrated circuit). The ASIC contains a digital modulator and demodulator
with Reed-Solomon forward error correction (FEC). The interface between the mo-
dem board and the RF part is analog I and Q signals. The modem board also
includes an embedded microprocessor system, which is used to control all units in-
side the outdoor unit as well as to communicate with the indoor unit and the far-end
unit when needed. The RF functions are divided between two units: the IF unit
and the microwave unit. The MWU includes all microwave circuits, most of which
are MMICs, while the IFU includes required intermediate frequency circuits. The
waveguide duplex filter separates the transmitter and the receiver and provides at
the same time low loss connection to the antenna port. [30]
On the transmitter side, a direct conversion architecture has been implemented to
enable the use of a single microwave local oscillator. Since the I/Q up-converter operates
at the end frequency, a digital feedback loop is required to correct the amplitude
and phase errors of the modulator. After the up-conversion the signal is amplified
enough in order to obtain the required maximum output power level. A temperature
compensated power detector is used to monitor the power level after the high power
amplifier (HPA), and thus, to drive the voltage variable attenuator (VVA) in order
to obtain the required output power level. [30]
On the receiver side, a single IF conversion architecture is used. After the low-noise
amplifier (LNA) the received signal is down-converted to the IF. The automatic gain
control (AGC) with a dynamic range of about 100 dB is used to obtain a constant
rms-power level for the I/Q-demodulator. The outdoor unit contains two separate
phase-locked oscillator circuits. In the MWU, the fundamental oscillator frequency
is multiplied in order to obtain the low phase noise VCO signal for the transmitter
(Tx) and the receiver (Rx) up- and down-converters. Due to the common VCO
frequency at Tx and Rx, the IF frequency is always equal to the duplex spacing.
[30]
The transmitter uses either 4-state modulation (π/4-DQPSK, differential quadrature
phase shift keying) or optional 16-state modulation (32 TCM, Trellis coded
modulation), which have the advantages of a narrow spectrum and a good output
power efficiency. The optional 16-state modulation is available for 8x2 and 16x2
Mbit/s capacities. With 16-state modulation the channel bandwidth is half of the
bandwidth required for the 4-state modulation with the same capacity. 4-state
modulation has 2x2, 4x2, 8x2 and 16x2 Mbit/s capacities.
8 Reliability testing of microwave radio outdoor units
Reliability testing process consists of calibration of the test system, measurements
before and after the stress cycle, stress cycle and overview of the results. Performance
measurements before the stress cycle ensure that the equipment under test meets
its specifications. Comparing measurement results that were obtained before and
after the stress cycle with one another gives information about the deterioration of
the equipment during stress cycle.
8.1 Pre- and post- stress cycle measurements
The main goal of these pre- and post-measurements is to measure the performance
of the outdoor unit and locate malfunctioning blocks. This is achieved by doing
several tests. In one test run these test cases are repeated several times with
different outdoor unit transmission settings and at different temperatures to ensure
that the UUT can operate in the specified temperature range. Test cases for the
microwave radio outdoor unit are:
- BER compared to input signal power level
- spectrum of TX signal
- accuracy of TX signal power
- accuracy of TX carrier frequency
- spurious emissions
- accuracy of RX signal power measurement.
The test setup consists of waveguides, couplers, variable attenuators and measurement
equipment. A sketch of the test setup is shown in Figure 16. The following equipment
is needed to perform the test cases listed above:
- power meter and two power sensors
- spectrum analyzer
- attenuator control processor and two variable attenuators
- BER meter (Another BER meter is required if TX bit errors are measured
during stress cycle)
- climatic chamber.
Figure 16: Test setup for outdoor unit pre- and post-measurements.
Measurements are usually done at two or more temperatures. If the stress is applied
to the unit under test by temperature cycling, the pre- and post-measurements are
done at the maximum and the minimum temperatures of the cycle. In addition, the
measurements may be done at room temperature. The outdoor unit has several
capacities and modulations. Each combination of these capacities and modulations
has to be tested separately at every testing temperature.
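The size of the resulting test matrix can be illustrated by enumerating the combinations. The capacities per modulation are those listed for the Flexi Hopper Plus in Section 7.1.1; the temperatures in the usage line are hypothetical placeholders, and the data structure is only an illustration, not the format used by the test automation.

```python
from typing import List, Tuple

# Capacities (Mbit/s) per modulation, as listed in Section 7.1.1.
CAPACITIES_BY_MODULATION = {
    "4-state": ["2x2", "4x2", "8x2", "16x2"],
    "16-state": ["8x2", "16x2"],
}


def build_test_matrix(temperatures_c: List[float]) -> List[Tuple[float, str, str]]:
    """List every (temperature, modulation, capacity) combination to test."""
    return [
        (temp, modulation, capacity)
        for temp in temperatures_c
        for modulation, capacities in CAPACITIES_BY_MODULATION.items()
        for capacity in capacities
    ]


# Hypothetical temperatures: cycle minimum, room temperature, cycle maximum.
matrix = build_test_matrix([-40.0, 25.0, 55.0])
print(len(matrix))  # 3 temperatures x 6 settings = 18 runs of each test case
```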
8.1.1 RX BER compared to input signal power level
This test measures the receiver sensitivity of the UUT. Sensitivity is the smallest
RX signal power level the UUT can receive without bit errors. The lower the
sensitivity level is, the more attenuation is acceptable between two microwave radios.
A transmission analyzer and two power meters are required for this test case. The RX
signal of the UUT is attenuated until, at some point, the BER increases rapidly.
The attenuation is adjusted to find the RX signal levels at which the BER is 10^-6 and
10^-3. This test can also be performed by increasing the attenuation step by step. On a
decibel scale, the RX signal level is calculated by subtracting the total attenuation
from the TX signal power of the reference unit. [31]
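The step-by-step variant of this measurement can be sketched as follows. The callback `measure_ber` is a hypothetical stand-in for setting the variable attenuator and reading the BER meter; it does not represent the interface of the actual test automation.

```python
from typing import Callable, Optional


def rx_level_dbm(tx_power_dbm: float, total_attenuation_db: float) -> float:
    """RX level on a decibel scale: reference TX power minus attenuation."""
    return tx_power_dbm - total_attenuation_db


def find_sensitivity_dbm(
    measure_ber: Callable[[float], float],
    tx_power_dbm: float,
    ber_limit: float,
    start_db: float,
    stop_db: float,
    step_db: float = 1.0,
) -> Optional[float]:
    """Increase attenuation stepwise; return the lowest RX level (dBm) at
    which the measured BER still does not exceed ber_limit."""
    lowest_ok = None
    attenuation = start_db
    while attenuation <= stop_db:
        if measure_ber(attenuation) > ber_limit:
            break
        lowest_ok = rx_level_dbm(tx_power_dbm, attenuation)
        attenuation += step_db
    return lowest_ok
```

Running the search once with `ber_limit=1e-6` and once with `ber_limit=1e-3` gives the two RX signal levels named in the text.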
8.1.2 Spectrum of TX signal
This test measures the TX signal spectrum. The main concerns are usually too high
sidebands, IF harmonics and too high noise floor.
For this test case a spectrum analyzer is needed. TX signal from the UUT is led to
the spectrum analyzer which measures its spectrum over the signal bandwidth. The
spectrum is compared to a mask which sets the limits it must not exceed. [31]
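The mask comparison can be sketched as a point-by-point check against a piecewise-linear limit line. The mask breakpoints in the usage note are invented for illustration and do not represent any real spectrum mask; the sketch also assumes that the mask is symmetric around the carrier.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (frequency offset from carrier in MHz, level in dB)


def mask_limit_db(offset_mhz: float, mask: List[Point]) -> float:
    """Linearly interpolated mask limit at a given offset (symmetric mask).

    mask: breakpoints sorted by strictly increasing offset, starting at 0.
    """
    x = abs(offset_mhz)
    if x <= mask[0][0]:
        return mask[0][1]
    for (x0, y0), (x1, y1) in zip(mask, mask[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return mask[-1][1]


def mask_violations(spectrum: List[Point], mask: List[Point]) -> List[Point]:
    """Return the measured points that exceed the mask."""
    return [(f, p) for f, p in spectrum if p > mask_limit_db(f, mask)]
```

With the hypothetical mask `[(0.0, 0.0), (4.0, 0.0), (8.0, -40.0)]`, a point at a 6 MHz offset is limited to -20 dB, so a measured level of -15 dB at that offset would be reported as a violation.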
8.1.3 Accuracy of TX signal power
TX signal power level can be chosen manually or controlled automatically according
to the radio link attenuation. It is adjusted through a loop-back to match the
nominal value.
The TX signal power of the UUT is measured with a power meter. Different combinations
of nominal output power levels and TX frequencies are used, and with every
combination the actual output power of the UUT is measured. These actual output powers
are then compared with the nominal values. [31]
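The comparison over all combinations can be sketched as below. The dictionary layout, the function name and the 2 dB tolerance in the usage note are assumptions made for illustration; the real limits come from the product specification.

```python
from typing import Dict, List, Tuple


def tx_power_report(
    measured_dbm: Dict[Tuple[float, float], float], tolerance_db: float
) -> List[Tuple[float, float, float, bool]]:
    """Compare measured TX powers with the nominal values.

    measured_dbm maps (tx_frequency_mhz, nominal_power_dbm) to the power
    meter reading. Returns (frequency, nominal, error_db, passed) rows.
    """
    report = []
    for (freq_mhz, nominal), measured in sorted(measured_dbm.items()):
        error_db = measured - nominal
        report.append((freq_mhz, nominal, error_db, abs(error_db) <= tolerance_db))
    return report
```

For instance, `tx_power_report({(38000, 10): 10.5, (38000, 0): -2.5}, tolerance_db=2.0)` marks the 10 dBm setting as passed (error 0.5 dB) and the 0 dBm setting as failed (error -2.5 dB).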
8.1.4 Accuracy of TX carrier frequency
TX carrier frequency is mixed with the IF signal. Carrier frequency determines the
transmission frequency over a hop.
The microwave radio transmits only the carrier frequency, if the modulation of the
radio is turned off. The carrier frequency can then be measured with a frequency
counter or spectrum analyzer. The result is then compared to the nominal value of
the TX carrier frequency. [31]
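The comparison with the nominal value is often expressed as a relative error in parts per million. The sketch below shows this calculation; the tolerance is a parameter because the actual limit depends on the specification, and the example frequencies are hypothetical.

```python
def frequency_error_ppm(measured_hz: float, nominal_hz: float) -> float:
    """Relative carrier frequency error in parts per million."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


def carrier_frequency_ok(
    measured_hz: float, nominal_hz: float, tolerance_ppm: float
) -> bool:
    """True if the measured carrier is within the allowed relative error."""
    return abs(frequency_error_ppm(measured_hz, nominal_hz)) <= tolerance_ppm


# Hypothetical 38 GHz carrier measured 190 kHz too high: about 5 ppm.
print(round(frequency_error_ppm(38_000_190_000, 38_000_000_000), 3))
```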
8.1.5 Spurious emissions
Spurious emissions are unwanted transmitted signals outside the transmission band-
width. These can be harmonic signals due to mixer LO leakages and improper
filtering.
For this test case a spectrum analyzer is needed. The TX signal from the UUT is fed
to the spectrum analyzer. Several frequency sweeps are taken with the spectrum
analyzer so that the whole measurement band gets swept with the right spectrum
analyzer settings. The frequencies included in these sweeps range from the waveguide
cutoff frequency up to the second harmonic of the TX carrier frequency. The bandwidth
used in the measurement of the TX signal spectrum mask is excluded from this
measurement. The spectrum analyzer may also limit the upper boundary of the frequency
sweep when high-frequency microwave radios are tested. [31]
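The frequency ranges to be swept can be derived from these rules as sketched below. The sketch assumes that the excluded spectrum mask bandwidth is centred on the TX carrier; the function name and the example frequencies are illustrative only.

```python
from typing import List, Tuple


def spurious_sweep_ranges(
    cutoff_hz: float,
    carrier_hz: float,
    mask_bandwidth_hz: float,
    analyzer_max_hz: float,
) -> List[Tuple[float, float]]:
    """Ranges to sweep: waveguide cutoff up to the second harmonic of the
    carrier (or the analyzer limit), excluding the spectrum mask band."""
    upper = min(2 * carrier_hz, analyzer_max_hz)
    excluded_low = carrier_hz - mask_bandwidth_hz / 2
    excluded_high = carrier_hz + mask_bandwidth_hz / 2

    ranges = []
    below_stop = min(excluded_low, upper)
    if cutoff_hz < below_stop:
        ranges.append((cutoff_hz, below_stop))
    above_start = max(excluded_high, cutoff_hz)
    if above_start < upper:
        ranges.append((above_start, upper))
    return ranges


# Hypothetical 38 GHz radio, 21 GHz cutoff, 100 MHz mask band, 50 GHz analyzer:
# the second harmonic (76 GHz) is beyond the analyzer, so the sweep stops at 50 GHz.
print(spurious_sweep_ranges(21e9, 38e9, 100e6, 50e9))
```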
8.1.6 Accuracy of RX signal power measurement
The RX signal power has to be adjusted to a certain level before it can be fed to the
demodulator. The adjustment is done with an AGC amplifier in the IF unit.
The reference unit is set to transmit a signal, which is fed through the waveguides
to the UUT. Outdoor units have a built-in power meter to detect the incoming signal
power level. This signal power is compared to the actual RX signal power, which is
calculated from the power meter value and calibration data. [31]
8.2 Analyzing test results
The same set of measurements is done before and after the stress cycle. Test results
are compared to specifications and also with each other. The comparison has to be
thorough. Usually even small differences between the pre- and post-measurement results
indicate fast deterioration of some component in the UUT. [31]
Each test case has specifications that the measurement results have to fulfill;
otherwise the test fails. The reasons to mark a test as failed and some examples of
possible failure reasons for each test case are listed in Table 1.
Failed test cases and differences between pre- and post-measurements might also
originate from a malfunction of the measurement equipment. To ensure that the test
result is genuine, the test or part of it can be done manually or the test case can be
rerun. The possibility of human error also has to be considered.
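The two comparisons described above (results against specifications, and pre-measurements against post-measurements) can be sketched as follows. The result names, limits and drift thresholds are hypothetical inputs; the thesis does not define this data format.

```python
from typing import Dict, Tuple


def compare_pre_post(
    pre: Dict[str, float],
    post: Dict[str, float],
    spec_limits: Dict[str, Tuple[float, float]],
    max_drift: Dict[str, float],
) -> Dict[str, dict]:
    """Check each result against its specification and against its own
    pre-stress value. A large drift is flagged even if both values are
    still within specification."""
    report = {}
    for name, pre_value in pre.items():
        low, high = spec_limits[name]
        post_value = post[name]
        drift = post_value - pre_value
        report[name] = {
            "in_spec": low <= pre_value <= high and low <= post_value <= high,
            "drift": drift,
            "drift_ok": abs(drift) <= max_drift[name],
        }
    return report
```

A result such as a TX power that moves from 10.0 dBm to 9.0 dBm could still be within a hypothetical 8 to 12 dBm specification but would be flagged if the allowed drift were 0.5 dB, which is the kind of small difference the text says deserves attention.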
Table 1: Failure reasons for test cases.
| Test case | Reason to fail the test | Example of failure reason |
|---|---|---|
| RX BER | The RX signal power is higher than in specifications for BER of 10^-6 or 10^-3. | Something is wrong with the RX signal path (e.g. in LNA) that causes either noise floor to rise or the RX signal to attenuate too much. |
| Accuracy of RX signal power | The signal power measured by UUT differs too much from the real measurement value. | The detector diode may be broken or calibration coefficients, that convert the detector values to power values, are inaccurate. |
| Spurious emissions | UUT transmits signals outside the TX bandwidth that have higher power than specification approves. | The filtering is not good enough or there are some LO-leakages that cause harmonic signals. |
| Spectrum of TX signal | The spectrum exceeds the mask. | In mixer the signal can degrade which may lead to symbol frequency harmonic signals. |
| Accuracy of TX signal power | TX signal power differs too much from the nominal power. | Something is wrong with the power adjustment loop or detector diode. |
| Accuracy of TX carrier frequency | Unmodulated carrier frequency differs too much from the nominal TX frequency. | Crystal oscillator tuning has changed. Also aging causes change in crystal oscillator frequency. |
9 Problem definition
Reliability testing is a time-consuming process and is practically the last step a product
has to pass before it can be sold. Telecommunication is a big business, and as
technology advances, new products and new versions are constantly being developed.
The time from drawing board to market is an essential factor when the profitability of
a new project is considered and the decision whether to launch it is made.
Starting reliability testing in a new production facility may be a slow process.
Resources are needed for new equipment and for hiring and training new personnel.
Once things are up and running, it takes some time before the personnel gain enough
experience for testing to run smoothly. Human errors also decrease over time, but
the more complex the testing process is, the more prone it is to errors. Undetected
errors, whether in measurements or in calculations, lead to incorrect test results.
If these errors also go undetected in the result review, the worst thing that can happen
is that the design of the tested product is changed on the basis of these false results.
9.1 Work assignment
The work assignment is to develop the reliability testing automation for the
FlexiHopper Plus outdoor unit further by adding two new test cases to the existing
test automation and by enhancing its functionality. The main objective is to reduce
the amount of manual work and to speed up reliability testing of the outdoor units.
The existing test automation provides the system architecture and the following test
cases:
- BER compared to input signal power level
- spectrum of TX signal
- accuracy of TX signal power
- accuracy of RX signal power measurement.
The following test cases are to be added to the test automation:
- accuracy of TX carrier frequency
- spurious emissions.
The following ideas were presented concerning the new test automation:
- The test automation should be able to control the measurement equipment and the
climatic chamber.
- The test automation handles the measurements done before and after the stress cycle.
- The test engineer only needs to attach the radios to a calibrated test setup, select
the correct calibration data, select a test plan and press "START".
- Results are stored in a database from which they can be extracted as reports
for the review.
- It should be possible to copy the test automation to a new location.
- The test engineer does not need to be an "expert" in reliability testing.
- Modularity of the test automation is important so that it can be used
for testing different types of radios. [32]
10 Test automation architecture
The main parts of the test automation are:
- Teststand, which reads configuration information from files and user input,
controls the automation process, calls VEE programs and stores test results in a
database.
- Agilent VEE Pro, which works as a run-time environment for VEE programs.
VEE programs send commands to the measurement equipment and the climatic
chamber and read their output.
- A database where the results are stored.
- A reporting tool that extracts the results from the database into a printable form.
Figure 17: Test automation architecture.
10.1 Teststand - test sequence editor
Teststand is software that controls the execution of tests. The Teststand engine,
as shown in Figure 18, plays a pivotal role in the Teststand architecture. The
Teststand engine can run sequences. Sequences contain steps that can call external
code modules. By using module adapters that have a standard adapter interface, the
Teststand engine can load and execute different types of code modules. Teststand
sequences can call subsequences through the same adapter interface. Teststand uses
a special type of sequence called a process model to direct the high-level sequence
flow. The Teststand engine exports an ActiveX Automation API that the Teststand
sequence editor and run-time operator interfaces use. [33]
Figure 18: Teststand system architecture.
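The idea of an engine that runs steps through a common adapter interface can be sketched in a few lines of Python. This is a conceptual illustration only and does not use the Teststand API; the adapter names, the step format and the example modules are all assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# (step name, adapter key, code module, arguments) - a hypothetical step format
Step = Tuple[str, str, object, tuple]


def callable_adapter(module, *args):
    """Adapter for code modules that are plain Python callables."""
    return module(*args)


def sequence_adapter(module, *args):
    """Adapter that runs another sequence as a subsequence."""
    return run_sequence(module)


# The engine only knows adapters by key; each adapter knows how to call its modules.
ADAPTERS: Dict[str, Callable] = {
    "callable": callable_adapter,
    "sequence": sequence_adapter,
}


def run_sequence(steps: List[Step]) -> Dict[str, object]:
    """Execute the steps in order and collect each step's result by name."""
    results: Dict[str, object] = {}
    for name, adapter_key, module, args in steps:
        results[name] = ADAPTERS[adapter_key](module, *args)
    return results


# Made-up sequences: "main" calls "warm_up" as a subsequence, then one code module.
warm_up = [("settle", "callable", lambda: "stable", ())]
main = [
    ("prepare", "sequence", warm_up, ()),
    ("measure", "callable", lambda a, b: a + b, (1, 2)),
]
print(run_sequence(main))
```

Because subsequences go through the same dictionary of adapters as code modules, supporting a new kind of module in this sketch only requires registering one more adapter function.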
The Teststand sequence editor is an application in which sequences can be
created, modified, executed and debugged. The Teststand run-time operator interfaces
are simpler than the editor and do not allow the sequence file to be modified. There
are four different run-time operator interfaces, developed in LabVIEW,
LabWindows/CVI, Visual Basic and Delphi.
Most steps in a Teststand sequence invoke code in another sequence or in a code
module. When invoking code in a code module, Teststand must know the type of
code module, how to call it and how to pass parameters to it. Teststand uses module
adapters to obtain this knowledge. Teststand currently provides module adapters for
the following programming environments:
- LabWindows/CVI
- Visual C/C++
- .NET
- C DLLs
- Java classes
- HTBasic
- TCL
- PERL
- VEE. [33]
The test management system must also perform a series of operations before and
after the test sequence executes to handle common test system tasks. These
operations define the testing process, and the set of operations together with their flow
of execution is called a process model. With process models, different test sequences
can be written without repeating standard testing operations in each sequence. The
process model can be modified to vary the testing process to suit the unique needs of
a production line. Figure 19 contains a flowchart of the major operations in the default
process model. [33]
Figure 19: Teststand default process model.
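The role of a process model can be illustrated with the following Python sketch, in which the common operations are written once and the test sequence is passed in as a parameter. It is a simplified assumption of how such a wrapper might look, not the Teststand default process model itself; the function names and the made-up verdicts are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def run_process_model(
    uut_ids: Iterable[str],
    main_sequence: Callable[[str], Dict[str, bool]],
) -> List[str]:
    """Wrap a test sequence with operations shared by all test sequences.

    Returns a trace of the operations in the order they were performed.
    """
    trace: List[str] = ["setup"]  # common work done once before testing
    for uut in uut_ids:
        results = main_sequence(uut)  # the only part specific to one sequence
        verdict = "passed" if all(results.values()) else "failed"
        trace.append(f"log {uut}: {verdict}")  # common result handling
    trace.append("teardown")  # common work done once after testing
    return trace


def example_sequence(uut: str) -> Dict[str, bool]:
    """Hypothetical test sequence returning made-up verdicts per test case."""
    return {"tx_power": True, "tx_frequency": uut != "radio-2"}


print(run_process_model(["radio-1", "radio-2"], example_sequence))
```

A different test sequence can be run by passing another function to run_process_model, without repeating the setup, logging and teardown operations.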
10.2 Visual engineering environment
Agilent Visual Engineering Environment (VEE) is a visual programming language
environment. VEE supports standard ties to ActiveX Automation and Controls,
and DLLs. Therefore it can be used from Teststand through the ActiveX Automation
adapter.
VEE programs are created by selecting objects from menus and connecting them
together. The result resembles a data flow diagram, which makes the code easier to
understand. Figure 20 shows an example of a simple VEE program.
Figure 20: ”Random” program in VEE.
In VEE, data moves from one object to the next object in a consistent way: data
input on the left, data output on the right, and operational sequence pins, which
control the execution order of the objects, on the top and bottom.
10.2.1 Communication with measurement equipment
Professional measurement equipment, such as spectrum analyzers, power meters and
bit-error-ratio meters, can be controlled remotely via various physical interfaces.
The most common interface buses are the General Purpose Interface Bus (GPIB) and
the serial communication bus. An RS-232 serial interface port is common in personal
computers and climatic chambers. Climatic chambers can usually be programmed to
perform a temperature cycle.
The Agilent IO Libraries Suite software must be installed if Agilent VEE is to
control measurement instruments. Agilent VEE supports GPIB, VXI, serial, GPIO,
and LAN interfaces for communicating with measurement instruments.
11 Implementation of the automation software
The master’s thesis work assignment was to implement two new test cases, accuracy
of TX carrier frequency and spurious emissions, in an existing automation software
for reliability testing of the FlexiHopper Plus microwave radio outdoor unit.
I was familiar with the test cases because I had worked the previous summer
as an intern doing reliability testing. I had never used the automation development
tools, Teststand and VEE Pro, before. I had some programming background, and
with help from a colleague I learned the basics of these tools fairly quickly.
The development process was not as straightforward as I had first expected.
Controlling the measurement equipment and the climatic chamber through the direct I/O
interface, which specifies the communication transaction in detail, turned out to be
the most difficult part of the project. Although the old test automation had been in
use for some time without any significant errors, run-time errors occurred when I
ran it in my development setup. A run-time error was raised whenever a query was
sent to the climatic chamber or the spectrum analyzer and no answer was received.
At first I thought that the problem was a mismatch between VEE Pro versions: the
old automation had used version 6.5, whereas my development setup had the newer
version 8.5 installed. In order to locate the cause of the problem I made a test code
module. As the problem persisted even with a code module made purely with
the newer version, it was obvious that the problem lay elsewhere. My colleague
suggested that the root cause might be that the computer I used was newer and
faster than the ones used with the old test automation. After sending a query to the
climatic chamber or the spectrum analyzer, the computer waits for an answer. The
length of this wait depended on the clock rate of the CPU and was shorter on a faster
machine, so the new computer did not wait long enough and read an empty answer
from its receive buffer. The problem was solved by adding a delay between sending a
query and reading the answer.
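The timing problem and its remedy can be reproduced with a small Python sketch. The SlowInstrument class is a stand-in invented for this example, not a real instrument driver, and the command "TEMP?", the answer "23.5" and the retry loop are assumptions; the actual fix in the VEE programs was only the added delay.

```python
import time


class SlowInstrument:
    """Stand-in for a chamber or analyzer that needs time to answer."""

    def __init__(self, response_time: float) -> None:
        self._response_time = response_time
        self._ready_at = 0.0
        self._answer = ""

    def write(self, command: str) -> None:
        # The answer becomes available only after the response time has passed.
        self._ready_at = time.monotonic() + self._response_time
        self._answer = "23.5" if command == "TEMP?" else "ERR"

    def read(self) -> str:
        # An empty string models reading the receive buffer too early.
        if time.monotonic() < self._ready_at:
            return ""
        return self._answer


def query(instrument: SlowInstrument, command: str, delay: float, retries: int = 3) -> str:
    """Send a command, wait for `delay` seconds, then read the answer."""
    instrument.write(command)
    for _ in range(retries):
        time.sleep(delay)
        answer = instrument.read()
        if answer:
            return answer
    raise TimeoutError(f"no answer to {command!r}")


chamber = SlowInstrument(response_time=0.05)
chamber.write("TEMP?")
print(repr(chamber.read()))                  # read immediately: buffer is still empty
print(query(chamber, "TEMP?", delay=0.1))    # read after a delay: answer is available
```

A fixed delay is simple but has to be chosen for the slowest instrument; the retry loop in the sketch is one way to tolerate an occasional slow answer without lengthening every query.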
Altogether it took almost six months to complete the project. The two new test cases,
spurious emissions and TX frequency accuracy, are now included in the reliability
test automation. The functionality of these new test cases and of the whole
test automation was checked in several trials. The trials were performed with
outdoor units of two different frequency bands, 8 GHz and 15 GHz. The test plan during
the trials consisted of all the test cases and three different temperatures. The test
automation was started, left running overnight, and the results were extracted from
the database on the following day. The automation worked as planned.
Alongside the implementation of the test cases I studied reliability testing, which
contributed to the theory part of my master’s thesis.
12 Results
Running through every test case once with one capacity and one modulation takes
about 55 minutes. Table 2 lists the execution times of the test cases. These times
are approximate.
Table 2: Time taken by the execution of the test cases with one capacity and one modulation.
Test case | Manual (min) | Automation (min)
TX frequency accuracy | 2 | 0.5
Spurious emissions | 10 | 1
All test cases | N/A | 55
The two new test cases are executed quickly when automation is used; manual
execution is considerably slower. Usually every combination of capacity and
modulation is tested separately, so the difference between the execution times is
multiplied by the number of these combinations. There are six such combinations in
an FHP radio. The test automation can also change the temperature of the climatic
chamber. It takes up to an hour and a half for the temperature inside the outdoor unit
under test to stabilise after the climatic chamber has reached its target temperature.
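The multiplication mentioned above can be worked through with the figures of Table 2. The short Python sketch below is only a rough estimate, since the times in Table 2 are approximate; it covers one pass over the six combinations at one temperature, and the variable names are chosen for this example.

```python
# Execution times in minutes for the two new test cases, taken from Table 2.
manual = {"TX frequency accuracy": 2.0, "Spurious emissions": 10.0}
automated = {"TX frequency accuracy": 0.5, "Spurious emissions": 1.0}

# Number of capacity and modulation combinations in an FHP radio.
combinations = 6

# Time saved on the two new test cases for one combination.
saved_per_combination = sum(manual[name] - automated[name] for name in manual)

# Time saved when all combinations are tested once.
saved_total = saved_per_combination * combinations

print(saved_per_combination)  # minutes saved per combination
print(saved_total)            # minutes saved over all six combinations
```

With these figures the saving is 10.5 minutes per combination and 63 minutes for one pass over all six combinations, before counting the repetitions at different temperatures and before and after the stress cycle.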
13 Conclusion
In this section the benefit of including the two test cases in the test automation
is described and a future prospect is proposed.
The benefit comes mainly from reduced manual work and execution time, because
the test engineer does not have to be present while the automation executes the test
cases, and the whole test plan can be carried out overnight or over a weekend. Other
benefits are:
- Automated test execution ensures that measurements have good reproducibility.
This helps in comparing the radios with one another.
- Manual execution is prone to human errors, which may result in unnecessary
re-testing or wrong conclusions.
- All the test case results are now stored in the same database, which makes it
easy to access them and extract reports.
Some future prospects came to my mind while doing this thesis work. This
automation could be implemented without the Teststand sequence editor by replacing
it with VEE code modules. This would make the test automation architecture shown
in Figure 17 simpler and easier to adapt to new requirements. The usefulness of
Teststand lies in its clear way of presenting the execution process in steps. Teststand
also handles database logging very well. Version 9.0 of VEE Pro has built-in database
handling, so it could be possible to replicate the Teststand steps in VEE.
References
[1] Military Standard MIL-STD-721C, Definitions for Terms for Reliability and
Maintainability, June 12, 1981.
[2] International Standard, Information technology - Software product quality, Part
1: Quality model, ISO/IEC FDIS 9126-1, ISO/IEC 2000.
[3] Patrick D. T. O’Connor, David Newton and Richard Bromley, Practical Relia-
bility Engineering, Chichester, Wiley, 2002.
[4] Milton Ohring, Reliability and failure of electronic materials and devices, Aca-
demic Press, San Diego (CA), 1998.
[5] Gregg K. Hobbs, Accelerated Reliability Engineering, Chichester, Wiley, 2000.
[6] M. A. Miner, Cumulative damage in fatigue, Journal of Applied Mechanics,
Vol. 12, pp. 159-164, 1945.
[7] D. Steinberg, Vibration Analysis for Electronic Equipment, John Wiley & Sons,
New York, 1988.
[8] R. Viswanathan, Damage mechanisms and life assessment of high-temperature
components, Metals Park (OH), ASM International, pp. 158, 1989.
[9] Shahrzad Salemi, Liyu Yang, Jun Dai, Jin Qin and Joseph B. Bernstein,
Physics-of-Failure Based Handbook of Microelectronic Systems, The Reliability
Information Analysis Center, March 2008.
[10] J. S. Suehle, Ultra-thin gate oxide reliability: physical models, statistics, and
characterization, IEEE Transactions on Electron Devices, Vol. 49, No. 6, pp.
958-971, June 2002.
[11] Jorg D. Walter and Joseph B. Bernstein, Semiconductor Device Lifetime En-
hancement by Performance Reduction, tech. rep., University of Maryland,
ENRE, 2003.
[12] Kin P. Cheung, Soft breakdown in thin gate oxide - a measurement artifact, 41st
IEEE Annual International Reliability Symposium, Dallas Texas, pp. 432-436,
2003.
[13] Y.C. Yeo, MOSFET gate oxide reliability: anode hole injection model and its
application, International Journal of High Speed Electronics and Systems, Vol.
11, No. 3. pp. 849-886, 2001.
[14] Cheming Hu, A unified gate oxide reliability model, IEEE Annual International
Reliability Physics Symposium, IRPS99, pp. 47-51, 1999.
[15] W. W. Abadeer, A. Bagramian, D. W. Conkle, C. W. Griffin, E. Langlois, B. F.
Lloyd, R. P. Mallette, J. E. Massucco, J. M. Mckenna, S. W. Mittl, P. H. Noel,
Key measurements of ultra-thin gate dielectric reliability and in-line monitoring,
IBM J. Res. Develop., Vol. 43, No. 3, pp. 407-416, May 1999.
[16] Xiaojun Li, Jin Qin, Bing Huang, Xiaohu Zhang, and Joseph B. Bernstein,
SRAM circuit-failure modeling and reliability simulation with SPICE, Trans-
actions on Device and Materials Reliability, Vol. 6, No. 2, pp. 235-246, June
2006.
[17] E. Y. Wu, CMOS scaling beyond 100-nm node with silicon-dioxide-based gate
dielectric, IBM J. Res. & Dev., Vol. 46 No. 2/3, pp. 287-298, March/May 2002.
[18] E. Wu, Interplay of voltage and temperature acceleration of oxide breakdown
for ultra-thin gate oxides, Solid-State Electronics, Vol. 46, pp. 1787-1798, 2002.
[19] Alexander Acovic, Giuseppe La Rosa and Yuan-Chen Sun, A review of hot-
carrier degradation mechanisms in MOSFETs, Microelectron. Reliab., Vol. 36,
No. 7/8, pp. 845-869, 1996.
[20] Eiji Takeda, Hitoshi Kume, Toru Toyabe and Shorijo Asai, Submicrometer
MOSFET structure for minimizing hot-carrier generation, IEEE Journal of
Solid State Circuit, Vol. SC-17, No. 2, pp. 241-248, April 1982.
[21] S. Mahapatra, Investigation and modeling of interface and bulk trap generation
during negative bias temperature instability of p-MOSFETs, IEEE Transac-
tions on Electron Devices, Vol. 51, No. 9, pp. 1371-1379, 2004.
[22] D. Young and A. Christou, Failure mechanism models for electromigration,
IEEE Transactions on Reliability, Vol. 43, pp. 186-192, 1994.
[23] Steve S. Chiang and Rama K. Shukla, Failure mechanism of die cracking due
to imperfect die attachement, Proceeding of IEEE Electronic Components and
Technology Conference, pp. 195-202, 1984.
[24] Arne Börjesson and Valter Loll, Background Information to Reliability Stress
Screening of Components, Borås, 1994.
[25] Patrick D. T. O’Connor, Test Engineering: A Concise Guide to Cost-Effective
Design, Development and Manufacture, Wiley, 2001.
[26] http://en.wikipedia.org/wiki/Wave_soldering (31.03.2009)
[27] Thermotron Industries Ltd. Company, Fundamentals of accelerated stress test-
ing, Company brochure, 1998.
[28] C. Seusy, Achieving phenomenal reliability growth, ASM Conference on Relia-
bility - Key to Industrial Success, Los Angeles, CA, pp. 24-26, March 1987.
[29] H. Caruso and A. Dasgupta, Fundamental overview of accelerated-testing an-
alytic model, Proceedings Annual Reliability and Maintainability Symposium,
19-22 Jan. 1998, pp. 389-393.
[30] Nokia Siemens Networks, Product description for Nokia Flexihopper (Plus) 2.7,
2007.
[31] Nokia Siemens Networks, Reliability Test Plan for FlexiHopper XC and Flexi-
Hopper Plus products, 2007.
[32] V. Hämäläinen, P. Aleksejev, A. Tyrväinen and J. Taipale, meeting room,
Espoo, Karaportti 2, 02610 NSN, Meeting 23.10.2008.
[33] National Instruments, Teststand user manual, 2001.
Appendix A: Typical examples of accelerated stress tests
Applied stress method | Accelerated test | Main stressor | Failure mechanism
Constant | High-temperature | Temperature | Junction degradation, impurities deposit, ohmic contact, intermetallic chemical compounds
Constant | Operating life test | Temperature, Voltage, Current | Surface contamination, junction degradation, mobile ions, EMD
Constant | High temperature, high humidity storage | Temperature, Humidity | Corrosion, surface contamination, pinhole
Constant | High temperature, high humidity bias | Temperature, Humidity, Voltage | Corrosion, surface contamination, junction degradation, mobile ions
Cyclic | Temperature cycle | Temperature difference, Duty cycle | Cracks, thermal fatigue, broken wires and metallization
Cyclic | Power cycle | Temperature difference, Duty cycle | Insufficient adhesive strength of ohmic contact
Cyclic | Temperature-Humidity cycle | Temperature difference, Humidity difference | Corrosion, pinhole, surface contamination
Step stress | Operating test | Temperature, Voltage, Current | Surface contamination, junction degradation, mobile ions, EMD
Step stress | High temperature reverse bias | Temperature, Voltage | Surface contamination, junction degradation, mobile ions, TDDB
Appendix B: Frequently used acceleration models

Arrhenius acceleration model
Description: Life as a function of temperature or chemical aging
Application: Electrical insulation and dielectrics, solid state and semiconductors, intermetallic diffusion, battery cells, lubricants & greases, plastics, incandescent lamp filaments
Model equation:
$$\mathrm{Life} = A_0 \exp\left(-\frac{E_a}{kT}\right)$$
where
Life = median life of population
$A_0$ = scale factor determined by experiment
$E_a$ = activation energy of the failure mechanism
$k$ = Boltzmann’s constant
$T$ = temperature (K)

Norris-Landzberg a.k.a. modified Coffin-Manson
Description: Fatigue life of metals (due to thermal cycling and/or thermal shock)
Application: Solder joints and other connections
Model equation:
$$AF = \left(\frac{\Delta T_l}{\Delta T_f}\right)^{1.9} \left(\frac{f_f}{f_l}\right)^{1/3} \exp\left[1414\left(\frac{1}{T_{\max f}} - \frac{1}{T_{\max l}}\right)\right]$$
where
$AF$ = acceleration factor (cycle basis)
$\Delta T$ = package/board temperature difference between $T_{on}$ and $T_{off}$ (K)
$T_{\max}$ = maximum solder joint temperature (K)
$f$ = cyclic frequency (cycles per 24h). Minimum number of six.
$f$, $l$ = subscripts to denote field and lab conditions respectively

Black
Description: Life as a function of temperature and current density
Application: Capacitors, electromigration in aluminium conductors
Model equation:
$$\mathrm{MTTF} = A J^{-n} \exp\left(\frac{E_a}{kT}\right)$$
where
$A$ = constant based on cross-sectional area of the interconnect
$J$ = current density
$n$ = scaling factor
$E_a$ = activation energy of the failure mechanism
$k$ = Boltzmann’s constant
$T$ = temperature (K)
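
As a worked illustration of the Norris-Landzberg model, the following Python sketch evaluates the acceleration factor from the lab and field conditions. The function name and the example temperatures and cycling frequencies are assumptions chosen only for illustration; they are not values used in this work.

```python
import math


def norris_landzberg_af(
    dt_lab: float,
    dt_field: float,
    f_lab: float,
    f_field: float,
    tmax_lab: float,
    tmax_field: float,
) -> float:
    """Acceleration factor of the Norris-Landzberg model.

    AF = (dT_l / dT_f)^1.9 * (f_f / f_l)^(1/3) * exp[1414 * (1/Tmax_f - 1/Tmax_l)]
    Temperature differences and maximum temperatures are in kelvin,
    cyclic frequencies in cycles per 24 h.
    """
    temperature_swing = (dt_lab / dt_field) ** 1.9
    frequency = (f_field / f_lab) ** (1.0 / 3.0)
    max_temperature = math.exp(1414.0 * (1.0 / tmax_field - 1.0 / tmax_lab))
    return temperature_swing * frequency * max_temperature


# Made-up example: a lab cycle with a larger temperature swing than in the field.
af = norris_landzberg_af(
    dt_lab=100.0, dt_field=20.0,       # temperature difference (K)
    f_lab=24.0, f_field=6.0,           # cycles per 24 h
    tmax_lab=373.0, tmax_field=313.0,  # maximum temperature (K)
)
print(af)
```

An acceleration factor greater than one means that one lab cycle corresponds to more than one field cycle; with identical lab and field conditions the factor is one.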
Appendix C: The relation among failure modes, mechanisms and factors
Failure Factors | Failure Modes | Failure Mechanisms
Diffusion, junction (substrate; diffused junction; isolation) | Crystal defect; impurity precipitation; photoresist mask misalignment; surface contamination | Decreased breakdown voltage; short circuit; increased leakage current
Oxide film (gate oxide film; field oxide film) | Mobile ion; pinhole; interface state; TDDB; hot carrier | Decreased breakdown voltage; short circuit; increased leakage current; hfe and/or Vth drift
Metallization (interconnection; contact hole; via hole) | Scratch or void damage; mechanical damage; non-ohmic contact; step coverage; weak adhesion strength; improper thickness; corrosion; electromigration; stress migration | Open circuit; short circuit; increased resistance
Passivation (surface protection film; interlayer dielectric film) | Pinhole or crack; thickness variation; contamination; surface inversion | Decreased breakdown voltage; short circuit; increased leakage current; hfe and/or Vth drift; noise deterioration
Die bonding (chip-frame connection) | Die attachment; die crack | Open circuit; short circuit; unstable/intermittent operation; increased thermal resistance
Wire bonding (wire bonding connection; wire lead) | Wire bonding deviation; off-center wire bonding; damage under wire bonding contact; disconnection; loose wire; contact between wires | Open circuit; short circuit; increased resistance
Appendix D: The relation among failure modes, mechanisms and factors continued
Failure Factors | Failure Modes | Failure Mechanisms
Sealing (resin; sealing gas) | Void; no sealing; water penetration; peeling; surface contamination; insufficient airtightness; impure sealing gas; particles | Open circuit; short circuit; increased leakage current
Input/output pin (static electricity; surge; over voltage; over current) | Diffusion junction breakdown; oxide film damage; metallization defect/destruction | Open circuit; short circuit; increased leakage current
Others (alpha particles; high electric field; noise) | Electron-hole pair generation; surface inversion | Soft error; increased leakage current
