INTRODUCTION
The military and aerospace electronics industries are experiencing an ever-increasing demand for plastic encapsulated microcircuits and semiconductors (PEMs). Using PEMs without specific attention to the environment in which they will operate introduces a number of technical risks in military and aerospace equipment applications that are not associated with hermetically packaged devices.
The G-12 Solid State Device Committee of the Government Electronics & Information Technology Association (GEIA) developed guidelines for assessing the suitability of plastic encapsulated microcircuits and semiconductors for use in military, aerospace and other rugged applications. EIA Engineering Bulletin SSB-1, Guidelines for Using Plastic Encapsulated Microcircuits and Semiconductors in Military, Aerospace and Other Rugged Applications, provides:
• Methods for selecting the most suitable device for the application from both an equipment performance and economic perspective
• Means to emulate commercial buying practices by drawing upon qualification and reliability evaluation methods applied by the microelectronics design and manufacturing industry
SSB-1 presently includes four annexes that describe the reliability assessment method, including supporting technical rationale:
• SSB-1.001 Qualification and Reliability Monitors recommends minimum qualification and monitoring testing of plastic encapsulated microcircuits and discrete semiconductors.
• SSB-1.002 Environmental Tests and Associated Failure Mechanisms provides more detailed information concerning the environmental stresses associated with qualification and reliability monitor tests and the specific failures induced by these environmental stresses.
• SSB-1.003 Acceleration Factors provides reference information concerning acceleration factors commonly used by device manufacturers to model failure rates in conjunction with statistical reliability monitoring.
• SSB-1.004 Failure Rate Estimating provides reference information concerning methods commonly used by the semiconductor industry to estimate failure rates from accelerated test results.
This paper presents the reliability assessment methodology described in SSB-1.
FAILURE-MECHANISM-DRIVEN RELIABILITY MONITORING
Failure-Mechanism-Driven Reliability Monitoring draws upon the concepts and implementation of line controls, process stability and effective monitoring programs, in lieu of qualifying a product based solely on a fixed list of tests. A supplier must identify those failure mechanisms that may be activated by a given product or process change, and must design and implement reliability tests adequate to assess the impact of those failure mechanisms on system-level reliability. For this to be effective, the supplier must thoroughly understand these failure mechanisms and link them to its reliability monitoring program. Statistical Reliability Monitoring (SRM) is a statistically based methodology for monitoring and improving reliability. It involves identifying and classifying failure mechanisms, developing and using monitors, and investigating failure kinetics so that the failure rate at use conditions can be predicted. Failure kinetics are the characteristics of failure for a given physical failure mechanism, such as the acceleration factor, derating curve, activation energy, median life, standard deviation, characteristic life and instantaneous failure rate.
The failure rate of semiconductor devices is inherently low. As a result, the semiconductor industry uses a technique called acceleration testing to assess device reliability. Elevated stresses are used to produce the same failure mechanisms as would be observed under normal use conditions, but in a shorter time period. Acceleration factors are used by device manufacturers to estimate failure rates based on the results of accelerated testing. The objective of this testing is to identify these failure mechanisms and eliminate them as a cause of failure during the useful life of the product.
ACCELERATION TESTING AND FAILURE MECHANISMS
The following describes tests frequently used in statistical reliability monitoring (SRM) activities for plastic encapsulated microcircuits and semiconductors, and identifies the potential failure mechanisms monitored by these tests. This discussion does not include all of the tests typically included in device qualification and reliability monitoring. Instead, it focuses on those tests specifically designed to apply to (or having unique implications for) plastic encapsulated microcircuits and semiconductors. EIA JESD-47, Stress-Test-Driven Qualification of Integrated Circuits, includes a complete set of reliability stress tests used by the semiconductor industry for qualifying new or changed products.
Preconditioning of Surface Mount Devices (EIA JESD-22-A113)
The advent of surface mount devices (SMDs) introduced a new class of quality and reliability concerns regarding package cracks and delamination. Moisture from atmospheric humidity enters permeable packaging materials by diffusion and preferentially collects at the interfaces between dissimilar materials. The assembly processes used to solder SMDs to printed circuit boards (PCBs) expose the entire package body to temperatures higher than 200°C. During solder reflow, the combination of rapid moisture expansion and materials mismatch can result in package cracking and/or delamination of critical interfaces within the package. The solder reflow processes of concern are convection, convection/IR, infrared (IR), vapor phase (VPR) and hot air rework tools. Assembly processes that immerse the component body in molten solder are not recommended for most SMD components.
IPC/JEDEC J-STD-033, Standard for Handling, Packing, Shipping and Use of Moisture/Reflow Sensitive Surface Mount Devices, describes the standardized levels of floor life exposure for moisture/reflow-sensitive SMDs. This standard also includes the handling, packing and shipping requirements necessary to avoid moisture/reflow-related failures. These methods are provided to avoid damage from moisture absorption and exposure to solder reflow temperatures, which can degrade yield and reliability. By using these procedures, safe and damage-free reflow can be achieved. The dry packing process provides a minimum shelf life of 12 months from the seal date for devices in sealed dry bags.
JESD22-A113, Preconditioning of Nonhermetic Surface Mount Devices Prior to Reliability Testing, is an industry standard preconditioning flow for nonhermetic SMDs that is representative of a typical industry multiple solder reflow operation. The semiconductor manufacturer should subject these SMDs to the appropriate preconditioning sequence of this test method prior to specific in-house qualification and reliability monitoring. This makes it possible to evaluate long-term reliability that might be affected by solder reflow.
Bias Life Test (EIA JESD-22-A108)
This test is performed to determine the effects of bias conditions and temperature on solid state devices over an extended period of time. A device is defined as a failure if the parametric limits are exceeded or if functionality cannot be demonstrated under nominal and worst-case conditions.
Temperature Cycling (EIA JESD-22-A104)
Temperature cycling tests the durability of a package subjected to extreme temperature variations over a given period of time. Temperature is usually varied about a mean value with a constant ramp rate, followed by a dwell period. This test exposes the package to mechanical stress and accelerates failure modes associated with the differing coefficients of thermal expansion of the die and encapsulant materials. The dwell period is important because it allows the part to reach thermal equilibrium and allows stress relaxation to occur. Conducting a temperature cycling test requires a temperature-controlled environmental chamber with heating and cryogenic cooling units capable of meeting the ramp rate specifications. At the end of the test, the package is tested electrically and examined visually to identify areas of failure.
Failure mechanisms targeted by this test include die cracking, shorts and opens on die, passivation cracks/fracture, voids in die attach, plastic package fracture/cracks, wirebond pad cratering, excessive intermetallics in wirebonds and poor solder joints.
Autoclave (EIA JESD-22-A102)
Autoclave is an environmental test that measures device resistance to moisture penetration and the resultant effects of galvanic corrosion. It is a highly accelerated and destructive test. Conditions employed during the test include 121°C, 100% relative humidity, and 15 psig. Minimum test duration is typically 96 hours. Failure mechanisms targeted by this test include metallization corrosion, moisture ingress and delamination.
A disadvantage of autoclave testing is that contaminants in the chamber can induce failures that are not representative of device reliability.
Temperature Humidity Bias (EIA JESD-22-A101)
The Temperature Humidity Bias Life (THB) test is used to test for moisture induced failures. Compared to Highly Accelerated Stress Test (HAST) or autoclave, it requires less severe levels of temperature and relative humidity. The test requires the devices to undergo a constant temperature, elevated relative humidity, and electrical bias (constant or intermittent, based on device type). Once moisture reaches the die surface, the electric potential helps transform the device into an electrolytic cell. This in turn accelerates the corrosion failure mechanism. Electrical tests are performed after the THB stressing to detect parametric drifts associated with corrosion of susceptible parts. Failure mechanisms targeted by this test include electrolytic/galvanic corrosion, delamination, and crack propagation. Common failure sites include interfaces between lead fingers and the encapsulant, wirebonds, bondpads, and die metallization.
THB has become less useful for microcircuits in recent years because packaging quality has improved. THB tests can now run for thousands of hours before producing useful results.
Highly Accelerated Stress Test (EIA JESD-22-A110)
The Highly Accelerated Stress Test (HAST) is performed to evaluate the non-hermetic packaging of solid state devices in humid environments. This test uses high temperature (usually 130°C) and high relative humidity (about 85%) under elevated pressure (up to 3 atm). These conditions accelerate the penetration of moisture through the external protective material or at the seals around the chip leads. Once moisture reaches the die surface (as described for THB), the electric potential helps transform the device into an electrolytic cell, which in turn accelerates the corrosion failure mechanism. This test is intended to precipitate failure mechanisms associated with metallization corrosion, delamination at material interfaces, wirebond failures and reduced insulation resistance. One should exercise caution when evaluating results of HAST tests performed at temperatures higher than 130°C. Such tests can precipitate different failure mechanisms that would not be seen during normal device operation.
HAST was developed specifically for plastic encapsulated solid state devices after it became evident that autoclave and THB tests were no longer generating failures among certain robust PEMs. HAST detects failure mechanisms similar to those detected by THB, but at a greatly accelerated rate. Some device manufacturers substitute HAST for THB testing. They base this substitution on comparisons between lots with known moisture sensitivity, after verifying that the failures were due to the same failure mode. Acceleration factors are then applied to derive equivalent THB failure results from HAST test results.
ACCELERATION FACTORS
The following discussion addresses acceleration factors commonly used by device manufacturers to model failure rates in conjunction with statistical reliability monitoring (SRM). OEMs frequently use these acceleration factors together with physics-of-failure reliability analysis to assess the suitability of plastic encapsulated microcircuits and semiconductors for specific end-use applications.
Thermal Effects (Arrhenius Relationship)
$$A_f = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_u} - \frac{1}{T_t}\right)\right]$$
where
$A_f$ = acceleration factor
$E_a$ = activation energy (eV), a typical value for a given failure mechanism or derived from empirical data
$k$ = Boltzmann's constant ($8.617 \times 10^{-5}$ eV/K)
$T_u$ = use environment junction temperature (in K)
$T_t$ = test environment junction temperature (in K)
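The thermal acceleration factor is simple to evaluate numerically. The sketch below assumes junction temperatures are supplied in °C and converted to kelvin. The function name and the example activation energy and temperatures are hypothetical illustrations, not values taken from EIA/JEP122.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant, eV/K


def celsius_to_kelvin(t_c: float) -> float:
    return t_c + 273.15


def arrhenius_af(ea_ev: float, t_use_c: float, t_test_c: float) -> float:
    """Thermal acceleration factor A_f = exp[(Ea/k)(1/T_u - 1/T_t)].

    Junction temperatures are given in degrees C and converted to kelvin.
    """
    t_u = celsius_to_kelvin(t_use_c)
    t_t = celsius_to_kelvin(t_test_c)
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_u - 1.0 / t_t))


if __name__ == "__main__":
    # Hypothetical case: Ea = 0.7 eV, 125 C test junction vs. 55 C use junction
    af = arrhenius_af(0.7, t_use_c=55.0, t_test_c=125.0)
    print(f"A_f = {af:.1f}")
```

Because $E_a$ appears in the exponent, a modest error in the assumed activation energy changes $A_f$ substantially. This is why the choice of activation energy discussed below matters.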
The Arrhenius Life-Temperature Relationship [1] is widely used to model product life as a function of temperature. This relationship is used to express both a single failure mechanism's sensitivity to temperature and a product's thermal acceleration factor. When used to estimate the reliability of a product, the form above expresses that product's reliability with respect to temperature and as a function of time. The following table, from EIA/JEP122, is a first-order listing of thermal activation energies assigned to general classifications of failure mechanisms applicable to microcircuits. Suppose one has only superficial knowledge of the physical processing employed and no other way of obtaining the characteristics of the failure mechanism, but knows that the failure falls under one of the categories in this table. In that case, selecting the typical value for thermal activation energy provides the basis for a reasonable estimate of that failure mechanism's effect on the microcircuit failure rate. If one has more knowledge of the specific process and materials used, EIA/JEP122 provides more detail on some of the specific materials and processes listed here.
First Order Activation Energies

Non-Volatile Memory Data Retention
One should exercise caution when the Arrhenius Life-Temperature Relationship is used to derive acceleration factors for data retention time-to-failure. The work of DeSalvo et al. [2] indicates that the Arrhenius relationship does not correctly describe data retention life versus temperature. The Arrhenius relationship generally defines the rate of diffusion as a function of temperature. Since many failure mechanisms in semiconductor devices are attributed to the effect of mobile ions, the Arrhenius relationship provides a good model for calculating how these effects accelerate with increased temperature. Conversely, it can also relate failure rates observed at high temperatures to expected lifetimes at lower temperatures.
DeSalvo et al. argue that the Arrhenius relationship does not properly model data retention in floating-gate non-volatile memory devices, because the data loss is due to charge loss, which obeys Fowler-Nordheim transport. Their analysis of historical data demonstrates how the newly proposed "T-Model" fits existing data, whereas the Arrhenius model is shown to require different activation energies to fit the data at different test temperatures. Choosing the wrong activation energy for a given temperature can drastically exaggerate results.
The data retention time-to-failure using the "T-Model" is calculated by the equation:
Temperature-Humidity Effects (Peck's Relationship)
$$A_f = \left(\frac{RH_u}{RH_t}\right)^{n} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_u} - \frac{1}{T_t}\right)\right]$$
where $RH_u$ and $RH_t$ are the relative humidities at use and test conditions, $n$ is the (negative) humidity exponent, and $E_a$, $k$, $T_u$ and $T_t$ are as defined for the Arrhenius relationship.
This equation is often used to estimate acceleration factors for temperature-humidity and bias effects when applied to HAST test results. It is also used for temperature-humidity effects when applied to (unbiased) autoclave results. This model is also used for HAST testing performed without bias, a condition some users prefer as an approximation of dormant storage under a variety of long-term storage conditions. Peck [4] described a relationship between temperature, humidity and life for electrolytic corrosion of aluminum metallization. Peck concluded that this relationship allows very short tests to be established in place of 1000-hour Temperature Humidity Bias (THB) testing, and suggested using it to extrapolate autoclave test results. This relationship has the following form:
$$t_t = A \cdot (RH)^{n} \exp\left(\frac{E_a}{kT}\right)$$
where $t_t$ is time-to-failure, $RH$ is relative humidity, $T$ is temperature (in K), $n = -2.66$, $E_a = 0.79$ eV, and $A$ is a constant (the temperature-humidity failure rate in reference conditions). Subsequent to this study, Hallberg and Peck [5] found that data taken from several publications optimally fit this equation with $n = -3.0$ and $E_a = 0.90$ eV. Recent studies indicate that some devices have higher activation energies associated with temperature-humidity effects (Tam [6]).
One should exercise caution when evaluating results of HAST tests performed at very high temperatures. Such tests can precipitate different failure mechanisms that would not be seen during normal device operation. Sinnadurai [7] advocated an upper limit of 130°C for the validity of HAST testing of PEMs. In an extreme example, Sinnadurai argues that at 140°C and 100% RH the polymer of a plastic package would progressively de-bond and the exterior terminations of the package would suffer electrolytic damage. The JEDEC standard test method, JESD22-A110, includes test conditions of 110°C at 85% RH and 130°C at 85% RH. Further, JESD22-A110 cautions that moisture reduces the effective glass transition temperature of the molding compound. Stress temperatures above the effective glass transition temperature may therefore lead to failure mechanisms unrelated to standard 85°C/85% RH stress.
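Taking the ratio of Peck's time-to-failure at use and test conditions gives the acceleration factor:

$$A_f = \left(\frac{RH_u}{RH_t}\right)^{n} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_u} - \frac{1}{T_t}\right)\right]$$

The sketch below evaluates this factor for the two parameter sets quoted above. It assumes temperatures in °C and RH in percent. The function and constant names are hypothetical. The example compares 130°C/85% RH HAST with 85°C/85% RH THB only to show how much the choice of parameters matters.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # eV/K


def peck_af(n: float, ea_ev: float, rh_use: float, t_use_c: float,
            rh_test: float, t_test_c: float) -> float:
    """Temperature-humidity acceleration factor from Peck's relationship.

    With t = A * RH**n * exp(Ea / kT), the ratio t_use / t_test is
    (RH_use / RH_test)**n * exp[(Ea/k)(1/T_use - 1/T_test)].
    n is negative; RH is in percent; temperatures are in C.
    """
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    humidity_term = (rh_use / rh_test) ** n
    thermal_term = math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_test))
    return humidity_term * thermal_term


# (n, Ea in eV) parameter sets quoted in the text
PECK_ORIGINAL = (-2.66, 0.79)
HALLBERG_PECK = (-3.0, 0.90)

if __name__ == "__main__":
    # How many 85 C / 85% RH (THB) hours one 130 C / 85% RH (HAST) hour represents
    for name, (n, ea) in (("Peck", PECK_ORIGINAL), ("Hallberg-Peck", HALLBERG_PECK)):
        af = peck_af(n, ea, rh_use=85.0, t_use_c=85.0, rh_test=85.0, t_test_c=130.0)
        print(f"{name}: 1 h HAST ~ {af:.1f} h THB")
```

At equal humidity only the thermal term remains, so the difference between the two parameter sets here comes entirely from $E_a$. The humidity exponent starts to matter once use and test RH differ.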
Brizoux et al. [8] of Thomson-CSF derived a model for temperature and humidity effects. Though the use of this model is not widely reported, it is presented here for completeness. This model is based on Peck's law and the Thomson-CSF functional failure model, which assumes that temperature and power supply voltage conditions activate functional failures. In contrast to Hallberg and Peck, Thomson-CSF found that temperature-humidity acceleration can be represented by Peck's law with n = -2.66 and Ea = 0.7 eV. Using the reference conditions of 55°C junction temperature, 50% relative humidity and voltage at nominal + 10%, the Thomson-CSF model is expressed in the following form:
where $T_j = T_a + \theta_{ja} P$ ($T_a$ is ambient temperature, $\theta_{ja}$ is junction-to-ambient thermal resistance, $P$ is dissipated power). Bias effects are incorporated in this model. When there is no bias, or when the test voltage equals the nominal voltage for the device, the bias term drops out. This model also addresses the fact that, under operating conditions, the relative humidity at the die surface ($RH_j$) is lower than the ambient relative humidity because of junction heating. As the difference between junction and ambient temperature increases, the die dries and the rate of acceleration decreases. Thomson-CSF models this corrective term using the following psychrometric law:
where $RH_a$ is ambient relative humidity, $T_j$ is junction temperature (in K) and $T_a$ is ambient temperature (in K).
Thermo-mechanical Effects (Coffin-Manson Relationship)
The Coffin-Manson Relationship [9] is an effective method to model the effects of low-cycle fatigue induced by thermal stressing on microcircuit and semiconductor package reliability. This relationship is based on the inverse power law [10], which was originally used to model fatigue of metals subjected to thermal cycling. It has since been applied to mechanical and electronic components, solder and other connections, and metal fatigue life. The typical number of cycles to failure ($N$) as a function of the temperature range ($\Delta T$) of the thermal cycle is expressed as
$$N = A\,(\Delta T)^{-1/B}$$
where $A$ is the number of cycles to failure in reference conditions and $B$ is characteristic of the specific metal and the test method.
The acceleration factor for the Coffin-Manson Relationship is the ratio of the temperature swing under accelerated conditions to the temperature swing under service conditions, raised to the power of a Coffin-Manson exponent ($m = 1/B$) specific to each failure mechanism:
$$A_f = \left(\frac{\Delta T_t}{\Delta T_u}\right)^{m}$$
Dunn and McPherson [11] used this equation to analyze accelerated conditions for fractured-intermetallic bond and chip-out bond failures ("cratering") and derived Coffin-Manson exponents for these failure mechanisms. Blish and Vaney [12] subsequently applied this approach to thin-film cracking, i.e., failures caused by passivation film cracks induced by thermal stress. Blish [13] observed from several studies that Coffin-Manson exponents for integrated circuit failure mechanisms tend to lie in one of three relatively narrow ranges:
Failure Mechanism m
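The Coffin-Manson acceleration factor, $A_f = (\Delta T_t / \Delta T_u)^m$, can be sketched in a few lines. The exponents in the example are illustrative values chosen to span a range; they are not taken from the table above. The 40°C field swing and the function name are hypothetical.

```python
def coffin_manson_af(delta_t_test: float, delta_t_use: float, m: float) -> float:
    """Coffin-Manson acceleration factor A_f = (dT_test / dT_use) ** m, with m = 1/B."""
    if delta_t_test <= 0 or delta_t_use <= 0:
        raise ValueError("temperature swings must be positive")
    return (delta_t_test / delta_t_use) ** m


if __name__ == "__main__":
    # Hypothetical: a 180 C test swing (-55 C to 125 C) versus a 40 C field swing,
    # evaluated for a few illustrative exponents
    for m in (2.0, 4.0, 6.0):
        print(f"m = {m:.0f}: one test cycle ~ {coffin_manson_af(180.0, 40.0, m):.0f} field cycles")
```

The strong dependence on $m$ is the practical point. The same test profile can represent tens or thousands of field cycles, depending on which failure mechanism the exponent describes.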
USE CONDITION BASED RELIABILITY EVALUATION
The SEMATECH Reliability Technology Advisory Board (RTAB) developed a reliability evaluation methodology based on the use conditions a component is expected to encounter in its market applications [14]. One of the most critical steps in the process is defining environmental, lifetime and manufacturing use conditions, since this provides the basis for all follow-on activities that lead to establishing baseline performance. Determining the target market segment for a product establishes the use environment and lifetime appropriate for the technology.
It is important to note that semiconductor manufacturers derive baseline performance estimates for use conditions associated with their predominant market segment(s). The table, prepared by the SEMATECH RTAB, encompasses the majority of specific conditions within each major market segment. When assessing the suitability of a device for a specific application, it is essential to account for differences between the use environment and the environment the manufacturer used for reliability evaluation.
To illustrate this point, here is a specific example comparing reliability assessment results for a benign use environment versus results for a more stressful environment such as those encountered in many military, aerospace, and other rugged applications.
Upon reviewing a device manufacturer's product reliability report, we note that this manufacturer extrapolates HAST test results for temperature-humidity-bias induced failure mechanisms assuming use conditions of 70°C junction temperature and 17.6% relative humidity. For one product technology, the manufacturer publishes a failure rate estimate of 5 Failures-In-Time (FITs), or Mean-Time-To-Failure (MTTF) ≈ 22,500 years.
If, however, we recalculate the failure rate estimate for a use environment of 85°C junction temperature and 90% relative humidity (with all other elements of the failure rate calculation remaining equal), the result becomes 2431 FITs, or MTTF ≈ 47 years.
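The recalculation can be approximated by scaling the published FIT value with Peck's relationship between the two use environments. It also uses the constant-failure-rate conversion MTTF = $10^9$/FIT hours. The manufacturer's exact model is not stated, so this sketch assumes the Hallberg-Peck parameters ($n = -3.0$, $E_a = 0.90$ eV) and 8760 hours per year. The helper names are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # eV/K
HOURS_PER_YEAR = 8760.0


def fit_to_mttf_years(fit: float) -> float:
    """Constant failure rate: MTTF = 1/lambda, with lambda = FIT * 1e-9 per hour."""
    return 1e9 / fit / HOURS_PER_YEAR


def peck_af(n, ea_ev, rh_1, t1_c, rh_2, t2_c):
    """Ratio t_1 / t_2 of Peck times-to-failure, t = A * RH**n * exp(Ea / kT)."""
    t1, t2 = t1_c + 273.15, t2_c + 273.15
    return (rh_1 / rh_2) ** n * math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1 / t1 - 1 / t2))


def rescale_fit(fit_ref, ref, new, n=-3.0, ea_ev=0.90):
    """Move a FIT value from reference use conditions to new use conditions.

    ref and new are (junction temperature in C, relative humidity in %).
    Because lambda is proportional to 1/t, lambda_new = lambda_ref * t_ref / t_new.
    n and Ea default to the Hallberg-Peck values (an assumption).
    """
    (t_ref, rh_ref), (t_new, rh_new) = ref, new
    return fit_ref * peck_af(n, ea_ev, rh_ref, t_ref, rh_new, t_new)


if __name__ == "__main__":
    benign, rugged = (70.0, 17.6), (85.0, 90.0)
    # Rounded published value vs. the value implied by MTTF ~ 22,500 years
    for fit_ref in (5.0, 1e9 / (22_500 * HOURS_PER_YEAR)):
        fit_new = rescale_fit(fit_ref, benign, rugged)
        print(f"{fit_ref:.2f} FIT -> {fit_new:.0f} FIT, MTTF ~ {fit_to_mttf_years(fit_new):.0f} years")
```

Under these assumptions the acceleration factor between the two environments is roughly 480. Starting from exactly 5 FIT gives about 2390 FIT. Starting from the roughly 5.07 FIT implied by the quoted 22,500-year MTTF gives about 2430 FIT, close to the figure above. The remaining difference is consistent with rounding of the published value.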
FAILURE RATE ESTIMATING METHODOLOGY
The most frequently used reliability measure for semiconductor devices is the failure rate (λ). For a constant failure rate, the failure rate is the number of failures divided by the product of the number of devices on test and the test interval in hours (i.e., λ = number of failures / (number of devices × number of test hours)). The standard method for reporting long-term failure rates for semiconductor devices is to express the failure rate in Failures-In-Time (FITs), i.e., the number of failures per billion ($10^9$) device-hours.
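To project from a sample to the population in general, one must establish confidence intervals. A confidence interval states how "confident" one is that the sample failure rate approximates that of the population. To obtain failure rates at different confidence levels, it is necessary to make use of specific probability distributions. The chi-square distribution ($\chi^2$), which relates observed and expected frequencies of an event, is frequently used to establish confidence intervals. The relationship between failure rate and the chi-square distribution is as follows:
$$\lambda = \frac{\chi^2_{(CL,\,2r+2)}}{2 \cdot D \cdot A_f} \times 10^9 \ \text{FIT}$$
where $\chi^2_{(CL,\,2r+2)}$ is the value of the chi-square distribution with $2r+2$ degrees of freedom at cumulative probability $CL$ (the confidence level, e.g. 60% or 90%). Here $r$ is the number of failures, $D$ is the number of device-hours at test conditions (devices × test hours), and $A_f$ is the acceleration factor from test to use conditions.
To derive the overall failure rate for a product, the failure rates of potential failure mechanisms are estimated separately and then added together. This is known as the Sum-of-the-Failure-Rates method:
$$\lambda_{Total} = \sum_i \lambda_i$$
where $\lambda_{Total}$ represents the overall failure rate and $\lambda_i$ represents the failure rate for each failure mechanism.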
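The chi-square failure rate, $\lambda = \chi^2_{(CL,\,2r+2)} / (2 \cdot D \cdot A_f) \times 10^9$ FIT, followed by the sum over mechanisms, can be sketched with the standard library only. The sketch assumes a time-terminated test and a constant failure rate. The chi-square quantile is obtained by bisection on the Poisson-sum form of the chi-square CDF, which is valid because $2r+2$ is always even. All function names and example numbers are hypothetical.

```python
import math


def chi2_cdf_even_dof(x: float, dof: int) -> float:
    """CDF of the chi-square distribution for an even number of degrees of freedom.

    For dof = 2k, P(X <= x) = 1 - exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, dof // 2):
        term *= half / i  # (x/2)**i / i!, built incrementally
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even_dof(p: float, dof: int) -> float:
    """Invert chi2_cdf_even_dof by bisection; p must lie in (0, 1)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_dof(hi, dof) < p:  # bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even_dof(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def failure_rate_fit(failures: int, device_hours: float,
                     accel_factor: float = 1.0, confidence: float = 0.60) -> float:
    """Upper-bound failure rate in FIT: chi2(CL, 2r+2) / (2 * D * A_f) * 1e9."""
    chi2 = chi2_quantile_even_dof(confidence, 2 * failures + 2)
    return chi2 / (2.0 * device_hours * accel_factor) * 1e9


if __name__ == "__main__":
    # Hypothetical per-mechanism results, each extrapolated to use conditions
    rates = {
        "mechanism A": failure_rate_fit(0, 77_000, accel_factor=78.0),
        "mechanism B": failure_rate_fit(1, 231_000, accel_factor=500.0),
    }
    total = sum(rates.values())  # Sum-of-the-Failure-Rates
    for name, fit in rates.items():
        print(f"{name}: {fit:.1f} FIT")
    print(f"lambda_total = {total:.1f} FIT at 60% confidence")
```

With zero failures and 60% confidence, the chi-square term reduces to $-2\ln(0.4) \approx 1.83$. A test with no failures therefore still yields a non-zero upper-bound failure rate.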
Example
In this example we illustrate the failure rate calculation for temperature-humidity-bias effects by extrapolating HAST test results. Upon reviewing a device manufacturer's product reliability report, we note the following for HAST test extrapolation of temperature-humidity-bias induced failure mechanisms:
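The calculation chains the Peck acceleration factor with the chi-square failure rate. The sketch below uses hypothetical HAST data, not values from any manufacturer's report: 231 devices for 96 hours at 130°C/85% RH with zero failures. It extrapolates to the 70°C/17.6% RH use conditions mentioned earlier. It also assumes the Hallberg-Peck parameters, a 60% confidence level, and that the HAST junction temperature equals the chamber temperature (self-heating ignored).

```python
import math

K_EV = 8.617e-5  # Boltzmann's constant, eV/K


def peck_af(n, ea, rh_use, t_use_c, rh_test, t_test_c):
    """A_f = (RH_u/RH_t)**n * exp[(Ea/k)(1/T_u - 1/T_t)], temperatures in C."""
    t_u, t_t = t_use_c + 273.15, t_test_c + 273.15
    return (rh_use / rh_test) ** n * math.exp(ea / K_EV * (1 / t_u - 1 / t_t))


def chi2_quantile_even(p, dof):
    """Chi-square quantile for even dof via bisection on the Poisson-sum CDF."""
    def cdf(x):
        half, term, total = x / 2, 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return 1 - math.exp(-half) * total

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


# Hypothetical HAST data and use conditions
HAST = {"temp_c": 130.0, "rh": 85.0, "hours": 96.0, "devices": 231, "failures": 0}
USE = {"temp_c": 70.0, "rh": 17.6}
N, EA = -3.0, 0.90  # assumed Hallberg-Peck parameters
CONFIDENCE = 0.60

af = peck_af(N, EA, USE["rh"], USE["temp_c"], HAST["rh"], HAST["temp_c"])
device_hours = HAST["devices"] * HAST["hours"]
chi2 = chi2_quantile_even(CONFIDENCE, 2 * HAST["failures"] + 2)
fit = chi2 / (2 * device_hours * af) * 1e9
print(f"A_f = {af:.0f}, equivalent use device-hours = {device_hours * af:.3g}, lambda = {fit:.1f} FIT")
```

Most of the acceleration here comes from the humidity term, because use RH (17.6%) is far below the test RH. This is why the assumed use humidity dominates the reported FIT value, as the earlier benign-versus-rugged comparison showed.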
Deriving Acceleration Test Parameters from Use Condition Parameters and Sub-System Failure Rate Requirements
One can reverse this methodology to derive test parameters necessary to achieve a specific failure rate allocation for a particular end use environment. In this example, cyclic thermal stress conditions over the anticipated product life are shown in the table.
We plan to perform a temperature cycling test from -55°C to 125°C ($\Delta T_t$ = 180°C) to qualify the device for thermo-mechanically induced defects.
Using the Coffin-Manson Relationship with a conservative value for the exponent (m = 3), we estimate the number of failure-free test cycles associated with each condition.
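Inverting the Coffin-Manson acceleration factor gives the test cycles equivalent to each field condition: $N_t = N_u (\Delta T_u / \Delta T_t)^m$. Summing over conditions then gives the required failure-free cycle count. The field profile below is hypothetical and stands in for the use-condition table. The sketch assumes m = 3 and the 180°C test swing; the function names are hypothetical.

```python
import math


def equivalent_test_cycles(field_cycles: float, delta_t_use: float,
                           delta_t_test: float, m: float = 3.0) -> float:
    """Test cycles equivalent to field_cycles at a smaller temperature swing.

    From N_use / N_test = (dT_test / dT_use) ** m it follows that
    N_test = N_use * (dT_use / dT_test) ** m.
    """
    return field_cycles * (delta_t_use / delta_t_test) ** m


# Hypothetical field profile: (description, swing in C, cycles over product life)
FIELD_PROFILE = [
    ("power on/off", 40.0, 7300),
    ("day/night ambient", 20.0, 3650),
    ("seasonal", 60.0, 20),
]
DELTA_T_TEST = 180.0  # -55 C to 125 C


def required_test_cycles(profile, delta_t_test=DELTA_T_TEST, m=3.0) -> int:
    # Round up: a fractional cycle cannot be run
    total = sum(equivalent_test_cycles(n, dt, delta_t_test, m) for _, dt, n in profile)
    return math.ceil(total)


if __name__ == "__main__":
    for name, dt, n in FIELD_PROFILE:
        print(f"{name}: {equivalent_test_cycles(n, dt, DELTA_T_TEST):.1f} test cycles")
    print("failure-free test cycles required:", required_test_cycles(FIELD_PROFILE))
```

With m = 3, the many small-swing field cycles collapse into relatively few large-swing test cycles. A larger exponent would shrink the required count further, which is why the conservative choice matters.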
Voltage Derating for Discrete Semiconductor Devices
When estimating overall failure rates for discrete semiconductor devices, the amount of derating must be considered. For example, a rectifier in a 175 V, 0.5 A application could use either a 200 V, 1 A part or a 900 V, 1 A part. In both cases, the junction temperature ($T_j$) is determined by the current (0.5 A), the junction-to-ambient thermal resistance, the nominal ambient temperature and the forward voltage of the device. In terms of voltage, however, there is a marked difference in stress between 25 V of derating and 825 V of derating. Other application considerations can significantly affect the amount of stress applied to a device. For example, a transient voltage suppressor (TVS) does not operate during normal operation of the assembly. For these reasons, stress factors should be used to determine the effect of derating on overall failure rate.
In the case of a 200 V rated device used in a 175 V application, the electrical stress ratio ($V_S$) is 175/200 = 0.875. The table shown here provides electrical stress factors for low frequency diodes [15]. Using the table, the corresponding stress factor ($\Pi_S$) is $V_S^{2.43} = 0.875^{2.43} \approx 0.723$.
In the case of a 900 V rated device used in a 175 V application, the electrical stress ratio ($V_S$) is 175/900 ≈ 0.194. Using the table, the corresponding stress factor ($\Pi_S$) is 0.05.
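The two stress-factor look-ups above can be reproduced with a piecewise function. This sketch assumes the low-frequency diode stress factor form of MIL-HDBK-217F: $\Pi_S = 0.054$ for $V_S \le 0.3$ and $\Pi_S = V_S^{2.43}$ for $0.3 < V_S \le 1$. That form matches the values quoted here (0.723 and, rounded, 0.05), but the table in [15] is not reproduced, so treat the breakpoints as an assumption. The function name is hypothetical.

```python
def diode_stress_factor(v_applied: float, v_rated: float) -> float:
    """Electrical stress factor Pi_S for a low-frequency diode (assumed MIL-HDBK-217F form).

    V_S = applied / rated voltage; Pi_S = 0.054 for V_S <= 0.3,
    Pi_S = V_S ** 2.43 for 0.3 < V_S <= 1.
    """
    v_s = v_applied / v_rated
    if not 0.0 < v_s <= 1.0:
        raise ValueError("stress ratio must lie in (0, 1]")
    return 0.054 if v_s <= 0.3 else v_s ** 2.43


if __name__ == "__main__":
    for rating in (200.0, 900.0):
        pi_s = diode_stress_factor(175.0, rating)
        print(f"{rating:.0f} V part at 175 V: V_S = {175.0 / rating:.3f}, Pi_S = {pi_s:.3f}")
    # Relative voltage-stress contribution of the two part choices
    print("ratio:", diode_stress_factor(175.0, 200.0) / diode_stress_factor(175.0, 900.0))
```

Under this form, the heavily derated 900 V part carries roughly one-thirteenth of the voltage-stress multiplier of the 200 V part, even though both parts run at the same junction temperature.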
The overall failure rate becomes: 
Sub-System Level Analysis
One can derive a sub-system level failure rate estimate from the cumulative failure rates of all devices in the sub-system. A complete sub-system level failure rate estimate would, of course, include other factors (e.g. derating, assembly manufacturing process) in addition to the cumulative failure rates of all components. For the purpose of this paper, however, we confine our discussion to the device-level failure rate calculations discussed earlier.
Upon reviewing the device manufacturers' product reliability reports for each device used in the sub-system, we note the test conditions, device-hours and number of failures from the applicable acceleration tests. We use sub-system level (e.g. circuit board) thermal analysis results to establish use-condition junction temperatures ($T_u$) for each device. Using the methods described earlier, we calculate the failure rate associated with each environmental effect and then derive an overall failure rate for each device. Finally, we sum the failure rates of all the devices to estimate the sub-system level failure rate. The table illustrates a sub-system level failure rate estimate.
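The roll-up applies the Sum-of-the-Failure-Rates method twice: across mechanisms for each device, then across devices on the board. The sketch below assumes every entry is already a FIT value at use conditions. The reference designators and numbers are hypothetical, not the contents of the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeviceEntry:
    """One line of a hypothetical sub-system failure rate table."""
    ref_des: str
    fit_by_mechanism: Dict[str, float] = field(default_factory=dict)  # FIT at use conditions
    quantity: int = 1

    def device_fit(self) -> float:
        # Sum-of-the-failure-rates across mechanisms for a single device
        return sum(self.fit_by_mechanism.values())


def subsystem_fit(entries: List[DeviceEntry]) -> float:
    # Sum over every device fitted on the board
    return sum(e.device_fit() * e.quantity for e in entries)


# Hypothetical values for illustration only
BOARD = [
    DeviceEntry("U1", {"THB": 4.0, "temperature cycling": 1.5, "bias life": 10.0}),
    DeviceEntry("U2-U5", {"THB": 2.0, "bias life": 3.0}, quantity=4),
    DeviceEntry("CR1", {"bias life": 2.0}),
]

if __name__ == "__main__":
    total = subsystem_fit(BOARD)
    print(f"sub-system: {total:.1f} FIT, MTTF ~ {1e9 / total:.3g} h")
```

Keeping the per-mechanism breakdown, rather than storing only a single FIT per device, makes it easy to see which environmental effect dominates the board-level estimate.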
