Integration of Physical Reliability Knowledge into the Design of VLSI-Circuits by Geest, D.C.L. van et al.
INTEGRATION OF PHYSICAL RELIABILITY KNOWLEDGE INTO THE DESIGN OF VLSI-CIRCUITS
D.C.L. van Geest*, R.H. Hoeksma*, A.C. Brombacher°, O.E. Herrmann*
*) Twente University, Faculty of Electrical Engineering, Laboratory for Network theory, 
P.O. Box 217, 7500 AE Enschede, The Netherlands; Phone +31-53892813; Fax +31-53340045
°) Philips Consumer Electronics, Quality Engineering Dept., Building SK6,
P.O. Box 80002, 5600 JB Eindhoven, The Netherlands; Phone +314734188; Fax +31-40733966
Abstract 
This paper presents a method to optimise the reliability of a circuit 
in its application using a CAD system that simulates circuit 
behaviour including tolerances and locates critical parts of the
circuit. As an example, a circuit susceptible to electromigration has
been optimised towards both reliability and functionality.
Building-in reliability requires a lot of co-operation between
different disciplines [1]. To make a VLSI-circuit robust by design,
all devices have to be robust against their actual user conditions. 
This implies that not only detailed knowledge about the devices is 
required but also a lot of information about the circuit behaviour in 
its application is necessary. 
At this moment the link between devices and circuit is based on 
design rules that describe under what conditions devices may be 
used. This is especially the case for digital circuit design. However, 
these design rules are normally not detailed enough to perform a 
proper optimisation of circuit reliability. For example, 
electromigration design rules may consist of a maximum peak 
current and a maximum average current, but the actual dynamic
waveform is not a parameter, although it influences
electromigration. Therefore device failure models are needed
at circuit level. Such models must be detailed enough to
detect conditions under which failures occur, but do not necessarily 
need to describe the whole failure mechanism in detail. 
As a circuit usually contains many devices each having their own 
failure mechanisms influenced by several stress-factors, a 
systematic approach is needed. This approach must be flexible 
enough to describe all failure mechanisms properly, but general 
enough to make it usable for a circuit designer, who cannot be expected to be an
expert in the physical aspects of all failure mechanisms.
This paper presents a systematic approach to model failure 
mechanisms on circuit level and to use these models for 
optimisation of both reliability and functionality. Optimisation is
done using a CAD system. This way it is possible to carry out such
an optimisation at a very early stage of the design process. In this
system the stress-factors of the failure mechanisms are calculated 
using a circuit simulator. The effect of internal and external 
tolerances is incorporated in the simulation, as reliability problems 
often do not occur in nominal circuits, but in extreme circuits and 
under extreme user conditions. From the simulation results the 
sensitivity of failure behaviour to so-called designable parameters
on circuit level is determined. This information is used to optimise 
the design towards minimum occurrence of failures. For functional 
demands the same methodology is used. Experience from earlier 
circuits about critical devices or topologies can be stored in a 
knowledge-base. This knowledge-base helps the designer to locate
critical parts of the circuit even before circuit simulation has been
carried out. It is important to have this possibility of focusing on
problem areas, as VLSI circuits are too large to simulate in their entirety.
The method used for modelling of failure behaviour is called 
"stressor / susceptibility interaction" [2]. It is based on the 
susceptibility of physical failure mechanisms to external influence
factors, called stressors. Stressors are physical entities, like
currents, voltages, temperatures or powers. Most failure
mechanisms depend on more than one stressor. The combination of
all non-redundant factors influencing a single failure mechanism is
called the stressor set of that failure mechanism. 
The probability of activation of a certain failure mechanism is not 
only determined by the stressor set. Two comparable devices, 
subjected to the same stressor set, can have a different failure 
probability due to a different susceptibility for this stressor set. 
Mathematically, susceptibility is defined as the probability that a
device, under a given stressor set, will fail within a certain interval 
of time. 
Figure 1 shows an example of stressor and susceptibility probability 
density functions. For simplicity, the susceptibility in the figure
only depends on one stressor. The stressor probability density 
function represents the probability that a certain stressor value 
occurs. The fact that the stressor value is not fixed but has a certain 
distribution, is due to variation in material (tolerances) and user 
conditions. The susceptibility probability density function represents 
for each stressor value the probability that the failure mechanism is 
activated. The numerical value of the failure probability is the 
convolution of stressor and susceptibility. However, for 
optimisation, the goal is not to obtain a quantitative failure
probability but to minimise the occurrence of failures. For optimisation,
overlap of stressor and susceptibility must be avoided or minimised.
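A minimal numerical sketch of this overlap, under assumptions that are not from the paper: the stressor is normally distributed, and the susceptibility is the cumulative distribution of a normally distributed device strength, so that the conditional failure probability rises smoothly from 0 to 1 with the stressor value. The failure probability is then the integral of the stressor density times the susceptibility, which for this Gaussian case can be checked against the closed form Φ((μs − μu)/√(σs² + σu²)). Moving the susceptibility away from the stressor (less overlap) lowers the result, which is what the optimisation aims for.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def failure_probability(mu_s, sigma_s, mu_u, sigma_u, n=20000):
    """Overlap integral of stressor density and susceptibility (midpoint rule).

    Stressor density: N(mu_s, sigma_s)                       (assumed)
    Susceptibility P(fail | s): CDF of strength N(mu_u, sigma_u) (assumed)
    """
    lo, hi = mu_s - 8.0 * sigma_s, mu_s + 8.0 * sigma_s
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        s = lo + (k + 0.5) * h
        total += normal_pdf(s, mu_s, sigma_s) * normal_cdf(s, mu_u, sigma_u)
    return total * h


def closed_form(mu_s, sigma_s, mu_u, sigma_u):
    """P(stressor > strength) for two independent normal variables."""
    return normal_cdf(mu_s - mu_u, 0.0, math.hypot(sigma_s, sigma_u))


# Illustrative numbers in arbitrary units: stressor 1.0 +/- 0.1, strength 1.4 +/- 0.15.
p_num = failure_probability(1.0, 0.1, 1.4, 0.15)
p_ref = closed_form(1.0, 0.1, 1.4, 0.15)
print(f"numerical {p_num:.5f}, closed form {p_ref:.5f}")
```

For design purposes the absolute value matters less than its trend: increasing the margin between the two distributions, or narrowing either one, reduces the overlap.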
The above example is suitable for short-term failure mechanisms 
where activation of a failure mechanism immediately causes a 
failure [3]. In case of long-term failure mechanisms, for example
electromigration, the susceptibility is not constant but changes
during lifetime (see figure 2).
Figure 1: An example of stressor/susceptibility
CH3194-8/93/0000-0081$01.00 © 1993 IEEE/IRPS
Figure 2: Long-term stressor and susceptibility in case of constant stressor distribution and degradation behaviour
Figure 2 is only valid in case of a stressor distribution that is 
constant in time and degradation behaviour that does not vary over a 
batch of circuits. In practice, however, this is not necessarily true. 
Due to the degradation, the electrical behaviour of a device may
change, which influences circuit behaviour. This results in a stressor
distribution that changes during lifetime. Also, degradation
behaviour usually shows a very large variation. Due to tolerances in
the physical structure and material properties of a device, there is a
large amount of time between the first failing device and the
average failing device.
To incorporate these effects, it is necessary to introduce an extra 
dimension representing the degradation level of a device (see 
figure 3). The degradation level can be seen as the parameter that 
describes the momentary physical state of the device. The 
degradation level is determined by stressor values in the past. 
Stressor and susceptibility distributions are now functions of the
stressor values and the degradation level. At t0 the degradation
level of all circuits equals zero, so the distribution is located
along the stressor axis. After that, the degradation of each individual
circuit is determined by its individual stressor value. This results in
the distribution at t1, where the degradation level varies over a batch of
circuits. This process continues until overlap between stressor and 
susceptibility arises. In this figure overlap arises at t2, so at that 
time a certain probability exists that the device fails due to this 
failure mechanism. 
When degradation occurs the physical state of a device changes, 
which influences its electrical behaviour. Electrical parameters of 
the device are therefore a function of the degradation level. 
To avoid simulations over very long time intervals, the degradation 
process can be calculated using a Markov approach. Based on the 
stressor and degradation distribution at t1, the degradation level at t2
is calculated. The calculated degradation level at t2 determines the
electrical behaviour of the circuit at t2 and thus the stressor
distribution.
Figure 3: Long-term stressor and susceptibility with degradation as an extra dimension
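One way to read the Markov approach is as a discrete-time update in which the degradation level of every circuit at the next time point is computed from its state at the current time point only, with the stressor re-evaluated from the degraded circuit. The sketch below assumes that reading; the stand-in `simulate_stressor`, the 10% tolerances on drive and degradation rate, and the limit `P_LIMIT` are hypothetical, and a real flow would call the circuit simulator at each step instead.

```python
import random

P_LIMIT = 1.0  # assumed susceptibility limit p on the degradation level


def simulate_stressor(degradation, drive):
    """Hypothetical stand-in for one circuit simulation.

    Resistance grows linearly with degradation; with a fixed drive this
    lowers the current, so the stressor (current squared) depends on the
    degradation level reached so far.
    """
    resistance = 1.0 + 0.1 * degradation  # normalised so that R0 = 1
    current = drive / resistance
    return current ** 2


def evolve_batch(n_circuits=100, n_steps=50, dt=1.0, seed=1):
    """Step a batch of circuits from t0 through n_steps lifetime intervals."""
    rng = random.Random(seed)
    # Assumed per-circuit tolerances: 10% on drive and on degradation rate.
    drive = [rng.gauss(1.0, 0.1) for _ in range(n_circuits)]
    rate = [max(0.0, rng.gauss(0.02, 0.002)) for _ in range(n_circuits)]
    degradation = [0.0] * n_circuits  # every circuit is undegraded at t0
    history = [degradation]
    for _ in range(n_steps):
        # The state at t_(k+1) is computed from the state at t_k only.
        degradation = [
            d + k * simulate_stressor(d, v) * dt
            for d, k, v in zip(degradation, rate, drive)
        ]
        history.append(degradation)
    return history


history = evolve_batch()
failed = sum(d >= P_LIMIT for d in history[-1])
print(f"{failed} of {len(history[-1])} circuits exceed p at end of life")
```

Because each step only needs the current degradation levels, the lifetime can be covered in a few large steps rather than one simulation over the full time interval.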
The possibility to take into account variation in both electrical and 
degradation behaviour is an important advantage of this method. 
Another advantage is the possibility to incorporate changing 
electrical parameters during lifetime. This is much more flexible
than only calculating the time-to-failure (TTF) based on an extrapolation at t0. As
functional demands are taken into account as well, circuits in which
slightly changed electrical parameters disturb the proper operation
of the circuit are also detected as unwanted circuits (although no
hard failure has occurred yet).
Using this concept it is possible to describe very different failure 
mechanisms in one single concept. The only requirement is a model, 
describing the susceptibility of a failure mechanism in relation to
the associated stressors. For long-term failure mechanisms, in
addition to the susceptibility model, a model is necessary describing
the degradation as a function of the stressors.
It is possible to use this concept as an integrated part of the design
process. As all failure mechanisms are described using the same 
methodology, this can be implemented into a circuit design CAD 
package. This will be discussed in more detail later in this paper. 
To show how susceptibility and degradation models can be made, an
electromigration model has been developed and used to optimise a
clock generator IC in its application. 
Electromigration strongly depends on current (density) and 
temperature. During lifetime the resistance changes due to 
degradation. The exact resistance curve strongly depends on the
material used in combination with underlayers, passivation layers,
etc. Most experiments and models in the literature determine the time-to-failure
(TTF) as a function of current density and temperature. In
these experiments a failure is defined as a resistance that deviates 
more than a certain percentage (for example 10%) from its starting 
value. Lifetime experiments are usually done under DC stress 
conditions, although some results have been reported for AC, 
alternate and pulsed stress conditions. 
In practice, however, currents are not of these idealised types. 
Electromigration occurs as a function of the dynamic current-density 
in a circuit. The result of a circuit simulation will not be a
DC current, but a dynamic one. Therefore the susceptibility and
degradation model needed here must be able to use the dynamic 
current as a stressor. 
This has been done using the integral of the squared current over a
certain time interval as a stressor (Y_EM). Contrary to normal
stressors like voltage or current, the stressor now not only depends 
on the momentary value, but also on the wave form of the current. 
The time-interval must be large enough to contain all occurring 
currents (the whole wave form) but small enough to have negligible 
degradation during the interval. The second stressor in the model is
the temperature of the conductor. Equations 1 and 2 give
the exact definitions of the stressors. The degradation level equals 0
at t0. The increase of degradation level between t0 and t1 depends
linearly on Y_EM and has an exponential temperature dependency
(equation 3). The susceptibility is a simplified function
(equation 4): as long as the degradation level is less than p, the failure
probability is zero; when the degradation level exceeds p, the failure
probability equals one. The electrical behaviour of a conductor at
circuit level is represented by its resistance, which is assumed
to increase linearly with the degradation level (equation 5).
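A minimal Python sketch of the circuit-level interface described here: a current waveform and a temperature go in; a degradation increment, a failure probability and a resistance come out. Equations 1 to 5 are not reproduced in this text, so the Arrhenius form of the increment, the prefactor `a`, the activation energy `ea_ev`, the limit `p` and the slope `beta` are assumptions chosen for illustration. Only the structure follows the description: Y_EM as the integral of i², linear dependence on Y_EM, exponential temperature dependence, a step susceptibility and a linear resistance increase.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def y_em(times, currents):
    """Stressor Y_EM: integral of i(t)^2 over the interval (trapezoidal rule)."""
    total = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        total += 0.5 * (currents[k] ** 2 + currents[k - 1] ** 2) * dt
    return total


def degradation_increment(y, temperature_k, a=1.0, ea_ev=0.7):
    """Assumed stand-in for equation 3: linear in Y_EM, Arrhenius in temperature.

    The prefactor a and activation energy ea_ev are illustrative values only.
    """
    return a * y * math.exp(-ea_ev / (BOLTZMANN_EV * temperature_k))


def susceptibility(degradation, p=1.0):
    """Step susceptibility as described for equation 4: 0 below p, 1 from p on."""
    return 1.0 if degradation >= p else 0.0


def resistance(degradation, r0, beta=0.1):
    """Assumed stand-in for equation 5: resistance linear in degradation."""
    return r0 * (1.0 + beta * degradation)


# One 37 ns clock period sampled every 0.1 ns, with an illustrative 2 ns, 50 mA pulse.
times = [k * 0.1e-9 for k in range(371)]
currents = [50e-3 if k < 20 else 0.0 for k in range(371)]
y = y_em(times, currents)
print(y, degradation_increment(y, 373.0))
```

Because the designer only sees these inputs and outputs, a more accurate degradation law can replace `degradation_increment` without touching the rest of the flow.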
For DC-stressed conductors, the time-to-failure in this model is
equivalent to Black's model [4]. At circuit level, the model has
dynamic current and temperature as inputs and resistance,
degradation level and failure probability as outputs. A circuit
designer does not need to know more about the model than
this. This makes it very easy to implement more accurate models
when they become available, as nothing changes at circuit level.
Locating critical parts
Failure mechanism models should be added at all critical parts of 
the circuit. This implies that a method is needed to locate these
critical parts. For example, electromigration can occur in all metal
lines, but it is often impossible to perform an analogue simulation of
an entire VLSI circuit extended with a large number of failure models.
Therefore critical parts must be found even before analogue
simulation.
This is done using knowledge about structures in the circuit that 
may become critical. From previous designs a lot of knowledge is 
available about critical devices or topologies. It is possible to store 
this information in a knowledge base. With this knowledge base the 
CAD-system finds parts of the circuit that may be critical. When 
such a critical part is found, a failure model is added into the circuit 
together with an analysis profile that describes the simulations that 
are necessary to investigate the failure behaviour in detail. 
An operational knowledge base should contain for each failure 
mechanism the failure model (including stressor-set and 
susceptibility) and "rules" to locate critical parts. Every time a
critical part is found the failure model is added at that point. The 
susceptibility is used as a specification on the stressors, which is 
used for optimisation. 
Conductors that may be critical for electromigration are conductors 
with high currents with respect to the line width. Rules to locate
these parts have to take into account the line width and devices 
connected to the conductor that conduct high currents. 
In this paper two rules have been used to identify critical parts: 
• All output parts can be critical, as actual user conditions are
often not the same as specified conditions.
• Lines that supply large transistors with respect to their line
width can be critical, as the transistors may drive heavy loads.
In the future, more sophisticated rules can be stored in the knowledge
base. Because new knowledge can always be added, the knowledge base
is a powerful tool for designing robust circuits.
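The two rules can be sketched as predicates over conductors. Everything below is hypothetical: the `Conductor` fields, the width-ratio threshold and the example netlist are illustrative, and a knowledge base such as VERA would hold its rules in its own format. Extending the rule set means appending another function to the list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Conductor:
    name: str
    width_um: float               # metal line width
    is_output: bool = False       # line connects to an output pad
    driven_width_um: float = 0.0  # total width of the transistors it supplies


@dataclass
class CriticalPart:
    conductor: str
    reasons: List[str] = field(default_factory=list)


Rule = Callable[[Conductor], Optional[str]]


def output_rule(c: Conductor) -> Optional[str]:
    """Rule 1: every output can be critical (user conditions may differ)."""
    return "output line" if c.is_output else None


def large_driver_rule(max_ratio: float) -> Rule:
    """Rule 2: lines supplying transistors that are large w.r.t. the line width."""
    def rule(c: Conductor) -> Optional[str]:
        ratio = c.driven_width_um / c.width_um
        return f"driven/line width ratio {ratio:.0f}" if ratio > max_ratio else None
    return rule


def locate_critical_parts(conductors: List[Conductor], rules: List[Rule]) -> List[CriticalPart]:
    """Apply every rule to every conductor; keep those that trigger any rule."""
    found = []
    for c in conductors:
        reasons = [r for r in (rule(c) for rule in rules) if r is not None]
        if reasons:
            found.append(CriticalPart(c.name, reasons))
    return found


netlist = [  # hypothetical conductors
    Conductor("out1", width_um=20.0, is_output=True),
    Conductor("out2", width_um=20.0, is_output=True),
    Conductor("vdd_int", width_um=10.0, driven_width_um=2000.0),
    Conductor("bias", width_um=2.0, driven_width_um=20.0),
]
for part in locate_critical_parts(netlist, [output_rule, large_driver_rule(100.0)]):
    print(part.conductor, part.reasons)
```

Each returned part is where a failure model and its analysis profile would be attached before simulation.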
Figure 4 shows a block diagram of the CAD system that has been 
used. The design package MINNIE [5] (or MOUSE) contains a 
user-interface and several optimisation algorithms using design- 
centering methods. Circuit simulation is done by the circuit 
simulator PSTAR [6] .  Rules for reliability optimisation are taken 
from the knowledge base VERA [7]. All software packages run on 
HP/Apollo workstations. 
The circuit simulation is performed by PSTAR. Circuit topology, 
parameters, tolerances and analysis profiles are defined in a netlist 
file that is created by MINNIE. Electrical parameters of IC devices,
including tolerances and correlations, are defined in a PSTAR
library. The output of the simulator is a set of voltages, currents,
powers, etc. 
In MINNIE the circuit is drawn and parameters and tolerances are 
defined. The toleranced parameters are divided into two groups, viz. 
designable and non-designable parameters. Designable parameters 
are parameters that the designer is free to change in order to 
improve his design (e.g. transistor dimensions, resistance values). 
Non-designable parameters are parameters that the designer can not 
change (e.g. "process parameters" of ICs or parameters of 
components from external suppliers). Functional demands are taken 
into account by the optimisation algorithms by defining 
specifications on the simulation results. 
Figure 4: Block diagram of the CAD system 
Figure 5: Simplified circuit diagram of the clock generator IC and its application (panels: clock generator IC; clock lines on printed circuit board)
The optimisation algorithms make use of the results of a Monte- 
Carlo circuit simulation in which all tolerances are included. In a 
Monte-Carlo simulation a circuit is simulated several times, each 
time with a new set of parameters that are determined by a random 
generator. This way a batch of circuits is simulated that is
comparable with a batch coming from a production line. For all 
designable parameters the sensitivity of both functional and 
reliability behaviour is determined. The algorithm used in this paper 
is called "centre of gravity" [8]. This algorithm divides the circuits 
of the batch into two groups, one passing all specifications and one 
failing one or more specifications. The circuit is then optimised by 
moving the nominal values from the failing circuits towards the 
passing circuits. 
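A minimal sketch of a centre-of-gravity iteration, assuming uniform ±30% parameter spreads and a made-up two-parameter specification: a Monte-Carlo batch is split into passing and failing circuits, and the nominal point is moved along the vector from the failing centroid to the passing centroid. The step size, the spread and the `spec` function are assumptions; [8] is the paper's reference for the method itself.

```python
import random


def monte_carlo_batch(nominal, rel_tol, n, rng):
    """n circuits; every parameter varies uniformly within +/- rel_tol (assumed)."""
    return [[x * (1.0 + rng.uniform(-rel_tol, rel_tol)) for x in nominal] for _ in range(n)]


def centroid(points):
    """Mean point of a list of parameter vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def estimate_yield(nominal, spec, rel_tol=0.3, n=100, seed=0):
    """Fraction of a Monte-Carlo batch that meets the specification."""
    batch = monte_carlo_batch(nominal, rel_tol, n, random.Random(seed))
    return sum(spec(x) for x in batch) / n


def centre_of_gravity_step(nominal, spec, rel_tol=0.3, n=100, step=1.0, seed=0):
    """Move the nominal along (passing centroid - failing centroid)."""
    batch = monte_carlo_batch(nominal, rel_tol, n, random.Random(seed))
    passing = [x for x in batch if spec(x)]
    failing = [x for x in batch if not spec(x)]
    if not passing or not failing:
        return list(nominal)  # no direction can be derived from this batch
    g_pass, g_fail = centroid(passing), centroid(failing)
    return [x + step * (p - f) for x, p, f in zip(nominal, g_pass, g_fail)]


def spec(x):
    """Hypothetical demand on two designable parameters."""
    return x[0] * x[1] >= 1.1


nominal = [1.0, 1.0]
for it in range(5):
    nominal = centre_of_gravity_step(nominal, spec, seed=it)
    print(it, [round(v, 3) for v in nominal], estimate_yield(nominal, spec, seed=99))
```

In the paper's setting, `spec` would combine the functional specifications with the specifications on the stressors, so that one pass/fail split covers both function and reliability.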
The knowledge base VERA contains rules to locate structures in
the circuit that may be critical. Whenever a critical structure is 
found, a failure model is added at that point to the PSTAR netlist 
file. The analysis profile is extended with the associated stressors 
and the susceptibility is taken into account using specifications on 
the stressors in the same way as for functional demands. 
The interface between MINNIE and PSTAR is fully operational at 
this moment. To start a simulation MINNIE activates PSTAR with 
the analysis profile defined by the designer. Optimisation is done 
using specifications on the functional behaviour of the circuit and on 
stressors. 
The interface between MINNIE and VERA is not fully operational 
yet. Therefore in this paper the translation from rules for critical 
parts towards reliability specifications and a proper analysis profile 
has been done by hand. This has been done using the same 
methodology as an automated system would use.
A clock generator IC that may be susceptible to electromigration has 
been simulated and optimised towards reliability and functionality.
Figure 5 shows the simplified circuit diagram of the clock IC and 
the design of the board (PCB). Clock inputs of other ICs are 
represented by capacitances. The clock frequency is 27MHz and the 
rise time of the signal is about 4nsec. Due to the high spectral 
frequencies it is necessary to take into account second order effects, 
like bond inductance and propagation on board. To avoid reflection 
in the clock lines special attention must be paid to the load of the 
lines. This load should be as close as possible to the characteristic 
impedance of the PCB-line. In the original design the above effects 
have been taken into account. The nominal design meets all 
functional and reliability demands. However, due to tolerances, 
some circuits in a large batch have either reliability or functional 
problems. 
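Two standard estimates make these board-level concerns concrete. The rule of thumb BW ≈ 0.35/t_r relates the 4 ns rise time to the signal bandwidth, and the reflection coefficient Γ = (Z_L − Z_0)/(Z_L + Z_0) shows why the load should be close to the characteristic impedance of the line. The 50 Ω line impedance below is an assumed example value; the paper does not state it.

```python
def knee_bandwidth_hz(rise_time_s):
    """Rule of thumb BW ~ 0.35 / t_r (10-90% rise time, single-pole response)."""
    return 0.35 / rise_time_s


def reflection_coefficient(z_load, z0):
    """Voltage reflection coefficient at a line terminated in z_load."""
    return (z_load - z0) / (z_load + z0)


bw = knee_bandwidth_hz(4e-9)
print(f"bandwidth ~ {bw / 1e6:.1f} MHz versus a 27 MHz clock")
for z_load in (25.0, 50.0, 100.0):  # assumed loads on an assumed 50 ohm line
    print(z_load, round(reflection_coefficient(z_load, 50.0), 3))
```

The bandwidth comes out at several times the clock frequency, which is why bond inductance and board propagation cannot be neglected, and any mismatch between load and line gives a non-zero reflection.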
To illustrate that it is not always necessary to increase the line
width of the critical conductors, an attempt is made to improve the design
with the given line widths. This is also useful because increasing line
widths probably increases chip area (and thus cost), while other changes
may not influence chip area. The resistances Rp and Rvdd and the
capacitance Cvdd were chosen as designable parameters inside the
IC. The discrete components Rclk, Rafsl and C.f.1 were also chosen 
as designable parameters. All other parameters can not be changed 
for circuit optimisation. 
Decreasing the load will decrease the stress of the clock generator 
IC but will also have a negative effect on the shape of the clock 
signal due to reflections. Therefore a compromise has to be found
that gives the best combination of reliability and functionality.
In the simulation tolerances of the IC, discrete components and 
board are included. IC tolerances are modelled using a so-called
process block which gives a relation between process parameters 
and electrical parameters. This way correlations due to the process 
are included. The discrete components are considered to have 30% 
tolerances. This high value has been chosen to be sure that the 
discrete components are not critical for correct operation of the 
circuit. Capacitance of the clock inputs is considered to have 10% 
tolerance. 
Following the rules for electromigration, three points are found
where electromigration might be a problem. These points are the
lines to the two output pads and the internal supply line between
decoupling and output buffers. At these points electromigration
degradation models were added. Like the component tolerances,
the degradation behaviour also has a tolerance, which is
considered to be 10%. The analysis needed to find the related
stressor distributions is a Monte-Carlo transient analysis of one total 
clock period with the degradation levels of the conductors as output 
signals. In this example the Monte-Carlo simulation consists of 100 
samples. 
Yield          Before    After
Function        61%      100%
Reliability     77%      100%
Total           49%      100%
Figure 6: Simulated current at out1 before and after optimisation (panels: before, after; axes: Iout1 [mA] versus t [nsec])
Figure 6 shows the nominal waveform of the current at out1 before
and after optimisation. Note that the peak current
has not been changed by the optimisation. Apparently the degradation level is
determined not by the peak current but by the whole waveform.
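With Y_EM defined as the integral of i² over a period, two waveforms with the same peak can carry very different stress. The rectangular pulses below are illustrative assumptions, not the simulated waveforms of figure 6; the 0.1 ns sampling step is also assumed.

```python
def y_em(samples, dt):
    """Rectangle-rule estimate of the integral of i(t)^2 over one clock period."""
    return sum(i * i for i in samples) * dt


dt = 0.1e-9                          # 0.1 ns sampling step (assumed)
peak = 60e-3                         # same 60 mA peak for both waveforms (illustrative)
narrow = [peak] * 20 + [0.0] * 350   # 2 ns pulse in a 37 ns (about 27 MHz) period
wide = [peak] * 60 + [0.0] * 310     # 6 ns pulse with the same peak

ratio = y_em(wide, dt) / y_em(narrow, dt)
print(max(narrow) == max(wide), ratio)
```

A peak-current design rule would treat both waveforms identically, while the integral-based stressor distinguishes them by a factor equal to the ratio of pulse durations.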
The optimisation algorithm divides the batch of circuits into two 
groups. The first group contains circuits that meet all functional
demands and reach the demanded lifetime (passing circuits). The second
group contains circuits that either fail one of the functional demands
or do not reach the demanded lifetime (failing circuits). Optimisation is
done by moving the nominal design from the failing circuits towards 
the passing circuits. This way the number of passing circuits (Yield) 
is maximised. 
It is also possible to optimise a circuit without using the automatic 
optimisation algorithms. Figure 7 shows a so-called pass / fail 
diagram. The horizontal axis is the value of a toleranced designable 
parameter. The black histogram represents the number of failing 
circuits and the dashed histogram represents the number of passing 
circuits. The pass/fail diagram in figure 7 indicates that most
passing circuits have a relatively high value of parameter Rp.
Therefore a higher nominal value of Rp may improve the yield.
Figure 7: An example of a pass/fail diagram (before optimisation); horizontal axis: Rp [Ω]
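A pass/fail diagram is a pair of histograms over one designable parameter. A minimal sketch with made-up Rp samples and outcomes, printed as text: `#` marks failing circuits (the black bars) and `-` passing circuits (the dashed bars). Bins in which passing circuits dominate suggest the direction in which to move the nominal value.

```python
def pass_fail_diagram(values, passed, lo, hi, n_bins):
    """Per-bin counts of passing and failing circuits for one designable parameter."""
    width = (hi - lo) / n_bins
    n_pass = [0] * n_bins
    n_fail = [0] * n_bins
    for v, ok in zip(values, passed):
        if not lo <= v <= hi:
            continue  # outside the plotted range
        k = min(int((v - lo) / width), n_bins - 1)  # value hi goes into the last bin
        if ok:
            n_pass[k] += 1
        else:
            n_fail[k] += 1
    return n_pass, n_fail


def print_diagram(n_pass, n_fail, lo, hi):
    width = (hi - lo) / len(n_pass)
    for k, (p, f) in enumerate(zip(n_pass, n_fail)):
        print(f"{lo + k * width:6.1f}  {'#' * f}{'-' * p}")


# Hypothetical Monte-Carlo values of Rp [ohm] and their pass/fail outcome.
rp = [30.5, 31.2, 32.8, 33.1, 34.9, 35.5, 36.2, 37.8, 38.4, 39.6]
ok = [False, False, False, True, False, True, True, True, True, True]
n_pass, n_fail = pass_fail_diagram(rp, ok, 30.0, 40.0, 5)
print_diagram(n_pass, n_fail, 30.0, 40.0)
```

In this made-up batch, failures cluster at low Rp, which would point towards a higher nominal Rp in the same way as the reading of figure 7.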
Usually optimisation is done interactively. The circuit designer uses 
his experience together with the pass / fail diagrams and 
optimisation algorithms. In this example optimisation has been done 
fully automatically. 
Table 1 shows the results of optimisation. Before optimisation 49% 
of the circuits are passing circuits meeting both functional and 
reliability demands. After optimisation this fraction has increased to 
100%. This value does not mean that the circuit after optimisation is
perfect; it means that none of the simulated circuits fails within the demanded
lifetime. A next step could be to continue optimisation with a longer
demanded lifetime or with more severe functional demands. This
may be useful to give the design a certain margin in case there are
unexpected factors influencing function or reliability.
Designable parameter    Before    After
Rdrv                    10 Ω      17 Ω
Table 1: Yield and designable parameters before and after optimisation
Figure 8: Distribution of the degradation level at demanded lifetime before and after optimisation (panels: before, after; axes: number of circuits versus degradation level [%])
Figure 8 shows histograms of the degradation level before and after 
optimisation. Before optimisation there is a large fraction that fails 
before the end of demanded lifetime. After optimisation all circuits 
have a degradation level below the susceptibility limit at demanded 
lifetime. Figure 9 shows histograms of the voltage on a clock input 
pin at t=7.5nsec. For proper operation this voltage must be above 
2.0V. Before optimisation a small fraction has a voltage that is too 
low. After optimisation all circuits have a correct value at t=7.5nsec. 
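Because the yields are estimated from only 100 Monte-Carlo samples, they carry sampling uncertainty. Standard binomial statistics give a rough sense of it: with 0 failures in 100 samples, the failure fraction is below about 3% at 95% confidence (the familiar "rule of three", 3/n), and a 49% yield estimate has a standard error of about 5 percentage points. This is consistent with the remark that a 100% yield does not mean a perfect circuit.

```python
import math


def zero_failure_upper_bound(n, confidence=0.95):
    """Exact binomial upper bound on the failure fraction when 0 of n samples fail."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def yield_standard_error(y, n):
    """Binomial standard error of a yield estimated from n samples."""
    return math.sqrt(y * (1.0 - y) / n)


n = 100
print(f"0/{n} failing: failure fraction below {zero_failure_upper_bound(n):.1%} (95%)")
print(f"49% yield from {n} samples: +/- {yield_standard_error(0.49, n):.1%} (1 sigma)")
```

Tightening these bounds requires more samples: the zero-failure bound shrinks roughly as 3/n.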
Conclusions
• It is possible to describe physical failure mechanisms, like
electromigration, in terms of stressors, susceptibility and
degradation.
• Stressor / susceptibility models are usable for reliability
analysis and optimisation at circuit level.
• By means of rules, critical parts in the circuit can be found
before analogue simulation.
Figure 9: Distribution of the voltage on a clock input pin at t=7.5nsec before and after optimisation (panels: before, after; axes: number of circuits versus voltage [V])
References 
[1] R.W. Thomas, H.A. Thomas, "Building-in reliability: Making it work", presented at the ESREF conference, Bordeaux (F), October 1991.
[2] A.C. Brombacher, Reliability by Design, Chichester (UK): John Wiley & Sons, 1992.
[3] D.C.L. van Geest, A.C. Brombacher, O.E. Herrmann, "Robust design of circuits susceptible to electromigration", presented at the ESREF conference, Schwäbisch Gmünd (D), October 1992.
[4] J.R. Black, "Electromigration: A brief survey and some recent results", IEEE Trans. Electron Devices, vol. ED-16, no. 4, 1969.
[5] Analogue Electronic Design System MINNIE, Interactive Solutions Ltd., 275-281 King Street, Hammersmith, London W6 9LZ, UK, 1990.
[6] PSTAR user manual, Philips Electronic Design & Tools, Eindhoven (NL), 1992.
[7] VERA user manual, Philips Electronic Design & Tools, Eindhoven (NL), 1990.
[8] R. Spence, R.S. Soin, Tolerance Design of Electronic Circuits, Wokingham (UK): Addison-Wesley, 1988, ch. 7 and ch. 9.
