I. INTRODUCTION
Reliability predictions could play a role in preventing, detecting, and correcting failures associated with the design, manufacture, and operation of a product. A reliability prediction could be used to measure relative worth, which, combined with other design considerations, aids in selecting the best available option; to study the impact of design changes on reliability; and to guide the design to meet the application conditions.
In the past, electronics reliability prediction has focused solely on models, curve-fit to field return data, that generally assume a constant failure rate for electronic-device times to failure [Pecht, Nash, and Lory, 1994]. The constant failure-rate values for each device are then summed to obtain a constant failure-rate value for the entire product. However, reliability predictions are usually distorted by including field failure data obtained from improper socketing, calibration, or interconnection of components; faulty handling, installation, or storage; incorrect use of the component; and invalid statistical data. Predictions that exclude such data estimate only the "intrinsic" reliability, which may be much higher than the actual reliability.
Reliability modeling that incorporates these factors is likely to yield widely divergent failure rates. It is clearly impossible to predict field reliability where any kind of mishandling can cause the product to fail.
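The traditional summation of constant failure rates described above can be illustrated with a short sketch. The part names and rate values are invented purely for illustration and are not taken from any handbook or from this paper.

```python
# Illustrative sketch of the traditional constant-failure-rate summation.
# All part names and failure rates are hypothetical placeholder values.

def product_failure_rate(device_rates):
    """Sum constant device failure rates (failures per hour) into a product rate."""
    return sum(device_rates.values())


def mean_time_between_failures(failure_rate):
    """For a constant failure rate, the MTBF is the reciprocal of the rate."""
    return 1.0 / failure_rate


device_rates = {"die": 2.0e-7, "wirebond": 1.0e-7, "capacitor": 2.0e-7}
total_rate = product_failure_rate(device_rates)
mtbf_hours = mean_time_between_failures(total_rate)
print(f"product failure rate = {total_rate:.1e} per hour, MTBF = {mtbf_hours:.0f} h")
```

Note that nothing in this calculation refers to the failure mechanism, the geometry, or the loads, which is the limitation the physics-of-failure approach is meant to address.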
Fortunately, the reliability prediction paradigm is changing, especially as it pertains to up-front design. A criterion for judging failure models, their applicability, utility and design implications is being established, and consistent definitions of failure, failure mechanisms, modes and confidence are being applied. This is done by employing physics-of-failure concepts. Physics-of-failure is an approach to reliability modeling, design, and assessment that utilizes knowledge of failure mechanisms to prevent product failures through robust design and manufacturing practices. The approach is based on the identification of potential failure mechanisms and failure sites for the product. Each failure mechanism is described by the relations between the stresses and variabilities at each potential failure site. The stress at each failure site is obtained as a function of environmental and usage conditions, as well as of product geometry and material properties. Thus, the approach proactively incorporates reliability in the design process by establishing a scientific basis for evaluating new materials, structures, and technologies, using well-designed tests, screens, safety factors, and acceleration transforms.
Traditional reliability assessment techniques heavily penalize new materials, structures, and technologies because of insufficient failure data. This approach, based on fear of the unknown rather than on any science-based analysis, discourages change and hinders reliability enhancement. The physics-of-failure approach, on the other hand, is based on generic failure models that are as effective for new materials and structures as they are for existing designs. The approach encourages innovative design through the use of realistic reliability assessment. This paper discusses the physics-of-failure approach to design, reliability modeling, testing, and screening of ICs, hybrids, and MCMs that has been implemented in CADMP Software at the CALCE Electronic Packaging Research Center at the University of Maryland.
APPROACH
High-reliability requirements set by many component suppliers necessitate addressing reliability up front, in the design phase.
The physics-of-failure (PoF) approach addresses the limitations of statistical, probabilistic reliability approaches, which cannot address or identify reliability concerns during the design process. This approach involves:
• identifying potential failure mechanisms (chemical, electrical, physical, mechanical, structural, or thermal processes leading to failure); failure sites; and failure modes (which result from the activation of failure mechanisms, and are usually precipitated as shorts, opens, or electrical deviations beyond specifications);
• identifying the appropriate failure models and their input parameters, including those associated with material characteristics, damage properties, relevant geometry at failure sites, manufacturing flaws and defects, and environmental and operating loads;
• expressing these parameters, when possible, as distribution functions.
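As a minimal sketch of what treating model inputs as distribution functions can look like, the code below samples the inputs of a failure model and reports percentiles of the resulting time to failure. The inverse-power-law model, the choice of distributions, and every numeric value are assumptions made for illustration; none of them come from CADMP-II.

```python
import random
import statistics


def time_to_failure(stress, strength_coeff, exponent):
    """Hypothetical inverse-power-law failure model (an assumption, not from the paper)."""
    return strength_coeff * stress ** (-exponent)


def sample_times_to_failure(n_samples, seed=0):
    """Draw model inputs from assumed distributions and evaluate the model."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        stress = max(rng.gauss(50.0, 5.0), 1.0)   # load at the failure site, arbitrary units
        coeff = rng.lognormvariate(20.0, 0.2)     # material/damage property, arbitrary units
        samples.append(time_to_failure(stress, coeff, exponent=2.0))
    return samples


ttf = sample_times_to_failure(2000)
ttf_sorted = sorted(ttf)
median_ttf = statistics.median(ttf)
p10_ttf = ttf_sorted[len(ttf_sorted) // 10]       # roughly the 10th percentile
print(f"median TTF = {median_ttf:.3g}, 10th-percentile TTF = {p10_ttf:.3g}")
```

Reporting a low percentile alongside the median makes the spread caused by input variability visible, which a single point estimate would hide.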
This approach proactively incorporates reliability in the design process by establishing a scientific basis for evaluating new materials, structures, and electronic technologies. In addition, it provides information to plan tests and screens, and to determine electrical and thermal-mechanical stress margins. For details on applying the PoF approach, the interested reader is referred to the books by Pecht [1994] and Pecht, Dasgupta, Evans, and Evans [1994].
ROLE OF THE CADMP-II SOFTWARE
Computer-aided Design of Microelectronic Packages (CADMP-II) is a set of integrated software programs that can be used to design and assess single-chip packages and MCMs. CADMP-II is based on the PoF approach and assesses reliability by investigating potential failure mechanisms, failure sites, and root causes of failures. The CADMP-II capabilities may be used in several ways (Figure 1):
• CADMP-II will assist in rapidly assessing alternative microelectronic package design solutions during the design phase. Reliability assessment for each design alternative consists of a Pareto ranking of the dominant failure mechanisms and their times to failure. The ability to satisfy mission life requirements, and avoid failure due to any of the dominant failure mechanisms during mission life, will provide a measure of relative worth and aid in selecting the best of the available options.
• Once a design is selected, CADMP-II may be used as a guide to design improvement by identifying the drivers for the dominant failure mechanisms. The drivers may include package geometry, architecture, material properties, and application environment. The merit of various design trade-offs can then be evaluated by determining the sensitivity of the dominant degradation mechanisms to failure mechanism drivers.
• A three-dimensional thermal stress analysis can be used to simulate the application environment and identify hot spots or areas of thermal overstress. The stress analysis results may be used to evaluate the need for environmental control systems or a design change.
• The impact of proposed design changes can be evaluated by comparing the reliability assessments of the existing and proposed designs.
• The ability of a proposed design to resist damage during environmental stress screening may be assessed using CADMP-II. This information can be used to modify the design or to derive appropriate screens that leave minimal residual damage in good components.
• Finally, CADMP-II is useful for various other engineering analyses, for example, evaluating new materials, structures, and technologies; assessing packages designed by other software programs or manufacturers; and helping maintenance strategy planners use the potential failure sites to minimize downtime.
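The sensitivity of a failure mechanism to its drivers, mentioned in the list above, can be estimated numerically. The sketch below uses a central difference on a made-up closed-form life model; the model, the driver names (`wire_diameter_um`, `delta_T_C`), and the nominal values are hypothetical and only stand in for whatever failure model is actually in use.

```python
def time_to_failure(drivers):
    """Hypothetical placeholder model: life rises with bond-wire diameter and
    falls with the square of the temperature swing."""
    return 1.0e9 * drivers["wire_diameter_um"] / drivers["delta_T_C"] ** 2


def relative_sensitivity(model, drivers, name, step=0.01):
    """Percent change in life per percent change in one driver (central difference)."""
    up = dict(drivers)
    up[name] = drivers[name] * (1 + step)
    down = dict(drivers)
    down[name] = drivers[name] * (1 - step)
    return (model(up) - model(down)) / (2 * step * model(drivers))


nominal = {"wire_diameter_um": 25.0, "delta_T_C": 80.0}
sensitivities = {k: relative_sensitivity(time_to_failure, nominal, k) for k in nominal}
# Drivers ordered from most to least influential on life
ranked_drivers = sorted(sensitivities, key=lambda k: abs(sensitivities[k]), reverse=True)
print(sensitivities, ranked_drivers)
```

Normalizing the sensitivity to percent-per-percent lets drivers with different units be compared on one scale.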
Each of the design steps is described in more detail in what follows. The design process for both single-chip packages and MCMs begins with the definition of constraints.
Constraints
Package design constraints constitute the fixed information given to the design team; design goals, on the other hand, are not necessarily fixed. Major design constraints include parameters defined by the passive and active elements that need to be packaged and the platform on which the package will be mounted.
Life-cycle profile constraint information includes data on package manufacturing, assembly, storage, transportation, usage, and repair environments, including the expected history of temperature extremes, temperature ranges during temperature cycling, temperature gradients, vibrational and shock loads, chemically aggressive or inert environments, electromagnetic isolation, and radiation shielding. In other words, the life-cycle profile contains the necessary stress information to design for reliability.
Manufacturing limitations and capabilities can constrain the design, based upon the availability and control of manufacturing processes. For example, the package may need to be built at a particular manufacturing site with specific manufacturing, assembly, inspection, and rework capabilities. The manufacturing process can also place constraints on material selection and package architecture, as well as on production volume, yield, cost, and schedule.
Information Base
The information base for design, reliability assessment, testing, screening, and derating is provided by databases that include package element materials, environments, tests, screens, components, and failure mechanism models.
Materials. The materials library includes properties for various package elements including die or chip; first-level interconnects (wire bond interconnects, tape-automated bonds, flip-chip, flip-TAB and HDI); and device packaging materials (die attach, substrate, substrate attach, case, lid, lid seal, lead seal, and lead). These properties provide a basis to select materials for different package elements, and also form inputs for the physics-of-failure models used in reliability assessment.
Environments. The environments library includes complete descriptions of common electronic environments in terms of characteristic load parameters: temperature (minimum, maximum, and average); magnitude and number of temperature cycles per year; relative humidity (RH) (minimum, maximum, and average); RH cycle; vibration load (acceleration power, spectral density, frequency, waveform, and vibration mode); maximum acceleration load; weather and environmental conditions; and radiation.
Tests and Screens. The test and screen library includes a complete description of common tests and screens in terms of their characteristic stresses, which include temperature, relative humidity, pressure, electrostatic discharge, radiation, and refined forces (e.g., bond pull or die shear). The user can select the stresses that will be part of the test or screen requirements. Each stress is characterized by a set of parameters, which include maximum, minimum, dwell at maximum, dwell at minimum, ramp time maximum-to-minimum, ramp time minimum-to-maximum (for temperature, relative humidity, and pressure); pulse time, current, and voltage (for electrostatic discharge); mode, maximum acceleration, and excitation frequency (for vibration); and energy, dose, dose rate, and pulse width (for radiation).
Components. The component library provides information on active and passive circuit elements within a package, including die or chip, resistors, and capacitors. Each component is characterized in terms of such parameters as physical dimensions and electrical specifications. Components selected from the library can be placed on the substrate to represent the packaging configuration quickly and to obtain required information for subsequent reliability assessment.
Failure Mechanisms. The failure-model library provides a systemized approach to editing and to accessing information on potential electronic failure mechanisms. The failure-model library contains algorithms for electronic failure assessment. 
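One way to picture the information base is as a set of keyed libraries that the assessment steps look up by name. The sketch below is only an assumed data layout; the class names, field names, and example entries are hypothetical and do not describe the actual CADMP-II database schema.

```python
from dataclasses import dataclass, field


@dataclass
class Material:
    name: str
    properties: dict          # property name -> value, units carried in the key


@dataclass
class Environment:
    name: str
    temp_min_C: float
    temp_max_C: float
    temp_cycles_per_year: int
    rh_avg_percent: float


@dataclass
class InformationBase:
    materials: dict = field(default_factory=dict)
    environments: dict = field(default_factory=dict)

    def add_material(self, material):
        self.materials[material.name] = material

    def add_environment(self, environment):
        self.environments[environment.name] = environment


# Placeholder entries; the numbers are illustrative, not library data.
info = InformationBase()
info.add_material(Material("example die attach", {"cte_ppm_per_C": 50.0}))
info.add_environment(Environment("example ground benign", 0.0, 40.0, 365, 50.0))
print(sorted(info.materials), sorted(info.environments))
```

Keeping each library keyed by name lets a package description refer to a material or environment without copying its properties.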
Mission Profile
Mission profile refers to the magnitude and duration of all the loads the package is subjected to during its life-cycle. The mission profile allows the user to specify the environment, test, and screen durations to which the package will be exposed during mission life (Figure 2 ). Mission profile information is used by the failure-model library to evaluate the dominant failure mechanisms in the device architecture under the specified loads. 
Figure 2: The mission profile tool allows the user to specify the environment, test, and screen durations to which the package will be exposed during the mission life.
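A mission profile of this kind can be represented as an ordered list of load segments, each with a type and a duration. The following sketch assumes that representation; the `LoadSegment` class, the segment names, and the durations are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadSegment:
    name: str
    kind: str            # "environment", "test", or "screen"
    duration_h: float    # exposure time in hours


def duration_by_kind(profile):
    """Total exposure time for each kind of segment in the profile."""
    totals = {}
    for segment in profile:
        totals[segment.kind] = totals.get(segment.kind, 0.0) + segment.duration_h
    return totals


# Placeholder mission profile; durations are illustrative only.
mission_profile = [
    LoadSegment("burn-in screen", "screen", 48.0),
    LoadSegment("temperature-cycling test", "test", 200.0),
    LoadSegment("storage", "environment", 8760.0),
    LoadSegment("field operation", "environment", 43800.0),
]
totals = duration_by_kind(mission_profile)
print(totals)
```

A failure model can then be evaluated segment by segment, so that damage accumulated during tests and screens is counted together with damage accumulated in the field.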
Package Architecture
The package architecture provides the ability to specify the geometry and materials of a package. Package architecture also allows the user to start with a new design or modify a previously designed package from the parts library. Design aids are provided to help the user select inner interconnect (die-to-package) technology; substrate; package-to-board mounting technology; and package type, for example, dual in-line package (DIP), quad flat-pack (QFP), pin-grid array (PGA), or multi-in-line package (MIP). The user can specify the electrical parameters and physical dimensions of the active and passive elements to be packaged in the multichip module. Finally, the package is completely described by specifying the dimensions and materials for each package element, including the element attach, substrate, substrate attach, lead, lead seal, lid, lid seal, case, and interconnects (Figure 3).
Stress Analysis
Stress analysis allows the user to evaluate the package architecture in terms of stress distributions, concentrations, intensity factors, temperature distribution, and hot spots. For the given package configuration (that is, geometry, materials, and power dissipation of circuit elements), a finite-difference-based thermal stress analysis of the package indicates the temperature distribution in the package (Figure 4). This distribution is used to calculate the stress concentration and distribution inside the package. The boundary conditions of the thermal analysis are specified in terms of temperatures along the sides, bottom, and top of the package. The results are displayed as a map of the temperature distribution in various package elements.
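The idea of a finite-difference thermal analysis with fixed boundary temperatures can be shown on a much smaller problem than the one CADMP-II solves. The sketch below is a two-dimensional, uniform-conductivity, steady-state example solved by Gauss-Seidel iteration; the grid size, boundary temperatures, and source strength are assumptions, and the subsequent stress calculation is not included.

```python
def solve_temperature(nx, ny, t_top, t_bottom, t_side, source=None, sweeps=2000):
    """Steady-state temperatures on a uniform nx-by-ny grid with fixed boundaries.

    source[i][j] is the heat-generation term already scaled to degrees C
    (q * h**2 / k for grid spacing h and conductivity k).
    """
    temps = [[t_side] * nx for _ in range(ny)]
    temps[0] = [t_top] * nx
    temps[-1] = [t_bottom] * nx
    for _ in range(sweeps):
        for i in range(1, ny - 1):
            for j in range(1, nx - 1):
                heat = source[i][j] if source else 0.0
                temps[i][j] = 0.25 * (temps[i - 1][j] + temps[i + 1][j]
                                      + temps[i][j - 1] + temps[i][j + 1] + heat)
    return temps


# Hypothetical case: 9x9 grid, all boundaries at 25 C, one dissipating cell.
nx = ny = 9
source = [[0.0] * nx for _ in range(ny)]
source[4][4] = 40.0
grid = solve_temperature(nx, ny, 25.0, 25.0, 25.0, source)

# Locate the hot spot as the grid cell with the highest temperature.
hot_value, hot_i, hot_j = max((grid[i][j], i, j) for i in range(ny) for j in range(nx))
print(f"hot spot at row {hot_i}, column {hot_j}: {hot_value:.1f} C")
```

A production analysis would use three dimensions, element-specific conductivities, and a convergence test instead of a fixed sweep count.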
Reliability Assessment
The reliability analysis calculates the time to failure for dominant failure mechanisms in a multichip module. This is accomplished by inputting the package attributes and the environmental parameters into the failure model equations.
Figure 4: Thermal analysis results from CADMP-II.
The results are then ranked in terms of their time to failure. The failure models used to evaluate each mechanism can be selected from the failure-mechanism library or can be user-specified, depending on the application. Different loads can also interact to cause failure. For example, a thermal load can trigger mechanical failure because of a thermal expansion mismatch. Other interactive failures include stress-assisted corrosion, stress-corrosion cracking, field-induced metal migration, and temperature-induced acceleration of chemical reactions.
Reliability information can be used to assess whether the package will survive for the designed-for life. If the time to failure for the mechanism with the lowest time is less than the desired mission life, then the sensitivity of the failure mechanism to design parameters can be iteratively evaluated until system reliability goals are met (Figure 5).
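The rank-then-iterate loop described above can be sketched as follows. The three mechanisms, their placeholder life values, and the single adjustable driver are hypothetical; a real assessment would call the selected failure models instead of the `evaluate` stub.

```python
def rank_mechanisms(ttf_by_mechanism):
    """Return (mechanism, time_to_failure) pairs, shortest life first."""
    return sorted(ttf_by_mechanism.items(), key=lambda item: item[1])


def evaluate(design):
    """Hypothetical stand-in for the failure models (hours to failure)."""
    return {
        "wirebond fatigue": 2.0e4 * design["wire_diameter_um"] / 25.0,
        "die-attach fatigue": 9.0e4,
        "corrosion": 1.5e5,
    }


def iterate_design(design, mission_life_h, step=1.1, max_iter=20):
    """Adjust one driver until the dominant mechanism outlasts the mission life."""
    for _ in range(max_iter):
        ranking = rank_mechanisms(evaluate(design))
        dominant, ttf = ranking[0]
        if ttf >= mission_life_h:
            return design, ranking
        if dominant != "wirebond fatigue":
            break  # this sketch only knows how to adjust one driver
        design = dict(design, wire_diameter_um=design["wire_diameter_um"] * step)
    return design, rank_mechanisms(evaluate(design))


initial = {"wire_diameter_um": 25.0}
initial_ranking = rank_mechanisms(evaluate(initial))
final_design, final_ranking = iterate_design(initial, mission_life_h=5.0e4)
print(initial_ranking[0], final_ranking[0], final_design)
```

Re-ranking on every pass matters because improving one mechanism can promote a different one to the dominant position.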
Screening, Qualification, and Derating
Traditionally, blanket levels of stresses are used to precipitate defects. This screening methodology may not address the dominant failure mechanisms in the package architecture because the stress dependencies and defect thresholds of the failure mechanisms are not considered. Physics-of-failure models provide information on which loads can accelerate a particular failure mechanism and the effect on the time to failure of increasing the stress above the normal operating load. The idea is to address use-condition failure mechanisms during testing and screening. Thus, physics-of-failure models can be used to determine the type, magnitude, and duration of accelerated test loads, and to extrapolate test results to life under normal operating conditions.
A common derating condition is thermal derating, based on the belief that reliable electronics can be achieved by lowering temperature. The problem is that there has been no scientific way to evaluate the temperature sensitivity with respect to device design, and thus the need for lowering temperature; the value of the lower temperature as a function of device architecture; the sensitivity of the maximum operating temperature with design; and the effect of manufacturing defect magnitudes on thermal reliability. Also, a method of changing any other form of stress to achieve the desired mission life has not yet been presented. The penalties of not having this knowledge include added cost and weight.
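As a sketch of how a failure model supports accelerated testing, the code below computes an acceleration factor as the ratio of predicted life at the use load to predicted life at the test load, then uses it to translate test exposure into equivalent use exposure. The inverse-power-law model in temperature swing, its coefficient, and its exponent are assumptions chosen for illustration only.

```python
def cycles_to_failure(delta_t_c, coeff=1.0e7, exponent=2.0):
    """Hypothetical fatigue model: N = coeff * delta_T ** (-exponent)."""
    return coeff * delta_t_c ** (-exponent)


def acceleration_factor(delta_t_use, delta_t_test, model=cycles_to_failure):
    """Ratio of predicted life at the use load to predicted life at the test load."""
    return model(delta_t_use) / model(delta_t_test)


# Placeholder loads: 40 C swings in use, 120 C swings in the accelerated test.
af = acceleration_factor(delta_t_use=40.0, delta_t_test=120.0)
test_cycles_survived = 500
equivalent_use_cycles = af * test_cycles_survived
print(f"acceleration factor = {af:.2f}, equivalent use cycles = {equivalent_use_cycles:.0f}")
```

The extrapolation is only meaningful if the test load activates the same failure mechanism as the use load, which is the point of selecting test stresses from the failure models.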
Existing thermal derating procedures give the design team a false sense of security about achieving increased reliability at lower temperatures. Lower temperatures may not necessarily increase reliability, since some of the failure mechanisms are inversely dependent on temperature; for example, device technologies with hot electrons as the dominant mechanism may have lower reliability at lower temperatures. Even in microelectronic technologies where temperature is a dominant failure accelerator, steady-state temperature thresholds, affecting device sensitivity to various forms of temperature stress, are a function of device architecture, materials, manufacturing defects, and other non-temperature-related operational stresses. Assigning a generic value of lower temperature to a technology, based on the assumption that all devices operating at that lower value of temperature will be reliable, is thus arbitrary [Kopanski 1991, Witzman et al. 1991].
Cumulative device derating curves. To evaluate the sensitivity of device life to temperature and non-temperature stresses, the user can plot device life versus percentage change from a nominal stress value (Figures 6, 7). This menu allows the user to identify stress derating thresholds below which lowering stress magnitudes will produce no additional benefit in terms of added life. These thresholds can be identified as values of stresses for which the projected time to failure is well beyond the specified mission life of the module.
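A derating curve and its threshold can be sketched by sweeping the stress around its nominal value and checking where the projected life first exceeds the mission life by a chosen margin. The life model, the nominal stress, and the factor of 10 used to represent "well beyond" are all assumptions for illustration.

```python
def life_at(stress, coeff=4.0e10, exponent=3.0):
    """Hypothetical model: life in hours falls as stress ** (-exponent)."""
    return coeff * stress ** (-exponent)


def derating_curve(nominal, percents, model=life_at):
    """(percent change from nominal, projected life) pairs."""
    return [(p, model(nominal * (1 + p / 100.0))) for p in percents]


def derating_threshold(curve, mission_life_h, margin=10.0):
    """Largest percent change whose life still exceeds margin * mission life."""
    acceptable = [p for p, life in curve if life >= margin * mission_life_h]
    return max(acceptable) if acceptable else None


curve = derating_curve(nominal=100.0, percents=range(-50, 51, 10))
threshold = derating_threshold(curve, mission_life_h=5000.0)
print(f"derate to {threshold}% of nominal change; further derating adds no needed life")
```

Derating below the returned threshold still lengthens the projected life, but that extra life lies beyond what the mission requires.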
Threshold values for temperature and non-temperature stresses that cause the critical parameters of the device technology to exceed acceptable ranges are calculated based on closed-form models relating device architecture to critical parameters. The allowable stress limits for the device are

\[
\max(S_d(i,j)) = \min\left[\max(S_p(i,j)),\ \max(S_r(i,j))\right], \qquad
\min(S_d(i,j)) = \max\left[\min(S_p(i,j)),\ \min(S_r(i,j))\right]
\]

where $\max(S_r(i,j))$ is the maximum allowable stress value derived from reliability considerations, $\max(S_p(i,j))$ is the maximum allowable stress value derived from performance considerations, and $\max(S_d(i,j))$ is the maximum allowable stress value for the device. Min represents the minimum values of the individual stresses. Maximum values are specified for stresses directly proportional to detrimental effects on package reliability and performance. Minimum values are specified for stresses with an inverse influence on package performance or reliability.
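Assuming the device-level limit is taken as the more restrictive of the performance-derived and reliability-derived limits, the allowable stress window can be computed as in the sketch below. The function name and the example temperature limits are hypothetical.

```python
def allowable_limits(perf_limits, rel_limits):
    """Combine (min, max) stress limits from performance and reliability considerations."""
    p_min, p_max = perf_limits
    r_min, r_max = rel_limits
    lower = max(p_min, r_min)   # most restrictive lower bound
    upper = min(p_max, r_max)   # most restrictive upper bound
    if lower > upper:
        raise ValueError("no stress value satisfies both sets of limits")
    return lower, upper


# Placeholder temperature limits in degrees C, for illustration only.
device_window = allowable_limits(perf_limits=(-40.0, 125.0), rel_limits=(-20.0, 110.0))
print(device_window)
```

An empty window signals that the performance and reliability requirements conflict for that stress, so the design itself has to change.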
Physics-of-failure concepts have been used to relate allowable operating stresses to design strengths through quantitative models for failure mechanisms. Failure models have been used to assess the impact of derating on the effective reliability of the component for a given load. The quantitative correlations outlined between derating and reliability will enable designers and users to tailor the margin of safety more effectively to the level of criticality of the component, leading to better and more cost-effective utilization of the functional capacity of the component.
IMPACT
The benefits of using this software include scientific consideration of reliability during the design phase; evaluation of new materials, structures, and technologies; assessment of packages designed by different manufacturers; development of science-based tests, screens, and derating methods; and cost-effective product development through investigating trade-offs.
This project has been funded by a grant from the NSF SIUCRC program and the members of CALCE EPRC. CADMP software has been developed jointly by a team of researchers at the CALCE Electronic Packaging Research Center, University of Maryland.
