ABSTRACT Power electronics are widely used in energy conversion systems because of their high efficiency. Finding and replacing defective power electronic modules in time by monitoring the ageing state of the devices can greatly improve the safety of power converters and reduce the losses caused by device failure. Packaging-related fatigue has been identified as one of the main causes of failure of power electronic modules. This paper proposes a method to monitor the fatigue inside a module by identifying the increase of the internal on-state resistance, under given operating conditions, that results from packaging-related fatigue. Multiphysics model results show that the junction temperature increases with the thermal resistance, which in turn increases the on-state resistance. Therefore, the health state of power modules can be diagnosed by comparing the on-state resistance before and after ageing at the same case temperature, which can be measured directly. Experiments and simulations are conducted to demonstrate the concept and verify the method.
I. INTRODUCTION
Power electronic modules, which are widely used in harsh environments, are the core components of most power electronic systems and play a main role in energy generation, transmission and consumption. A converter failure results in high maintenance costs. System reliability, risk and maintenance costs can be greatly improved by replacing semiconductor devices when incipient failures are detected. Therefore, building an accurate online condition monitoring model is an increasingly important issue. However, the condition monitoring of power modules remains a challenge.
Condition monitoring methods are built on failure-related parameters, which are nowadays driven mainly by packaging-related fatigue. Reference [1] proposes a method that uses the case-above-ambient temperature rise to evaluate solder fatigue. This method needs an accurate calculation of the power dissipation, which is difficult to achieve in practice, and it cannot identify whether the fatigue originates in the power module or in the heat sink. In [2], solder fatigue in a voltage-source inverter is monitored by detecting output harmonics on the basis of thermal and power loss models, because the low-order harmonics caused by non-ideal switching characteristics are affected by the rise in junction temperature that solder fatigue produces. However, this method relies on only a small change in the harmonic current to detect solder ageing. In [3], the condition is monitored by identifying dynamic changes of the gate current, on the principle that parasitic elements inside the module are affected by local damage induced by ageing. It is hard to measure the changes of the gate current in an actual converter, because the detection must be completed within nanoseconds, and the method cannot evaluate the ageing state of a module when only some of the bond wires have lifted off. A real-time health monitoring method based on 2-D case temperatures is proposed in [4]; it can detect bond-wire ageing, metallization ageing and substrate solder ageing using only two temperature sensors. In [5] and [6], a method is proposed to measure the on-state collector-emitter voltage during converter operation to monitor device fatigue, which may play a key role in assessing the reliability of power converters, but the junction temperature is difficult to measure in an actual converter because of the module housing.
In [7] and [8], a method is presented for detecting the health condition of the bond wires in a module based on the short-circuit current of the power module. This method has low sensitivity to the ageing state, and the driving voltage must be fixed at a constant inflexion value. Although such methods continue to be developed, inherent shortcomings still limit their application. This paper proposes a new method for device fatigue monitoring that measures the case temperature, on-state voltage and drain current. Changes of the on-state resistance are monitored to indicate the health condition of the internal packaging.
The remainder of this paper is organized as follows. In Section II, the mechanism of the module ageing and the effects of solder fatigue are investigated. Meanwhile, an FE (Finite Element) model is established to obtain the characteristic parameters of the module fatigue. Section III presents a condition monitoring method based on detecting the change of the on-state resistance at an electrical operating point and case temperature. In Section IV, experimental results are provided for validation. Section V concludes the paper.
II. FAILURE MECHANISM AND FE MODELING
A. FAILURE MECHANISM OF POWER ELECTRONIC DEVICES
In an actual converter, the junction temperature fluctuates as the load condition varies. Mismatched coefficients of thermal expansion between adjacent layers, together with internal temperature gradients, cause cyclic thermo-mechanical stresses that lead to fatigue damage. Part of the structure of a MOSFET module (IXFK80N60P3, 600 V/80 A) is shown in Fig. 1. The main failure areas are the interfaces joined to the chip, because the power semiconductor chip produces a large amount of heat during switching and conduction. The most commonly observed packaging-related failure modes are bond wire lift-off [9], [10] and solder delamination [11]-[13]. MOSFET modules continually bear temperature swings, which lead to the accumulation of plastic strain. Macroscopic cracks are initiated at the corner of the die-attach interface because of stress concentration. Once macroscopic cracks exist, they propagate rapidly towards the center and lead to final fatigue failure. Normally, thermal resistance is used to indicate the health of the solder layer. The thermal resistance is calculated from the junction temperature and the power loss. However, the junction temperature is difficult to measure directly, and the power loss is hard to calculate accurately. Normalized on-state resistance is usually used to indicate the health state of the bond wires of MOSFET modules at a given junction temperature, but both the die-attach solder and the bond wires contribute to the on-state resistance, as shown in (1):

$$R_{ds-on} = R_{baseplate} + R_{solder} + R_{die} + R_{bond-wires} \tag{1}$$

where $R_{baseplate}$ is the resistance of the baseplate, $R_{solder}$ is the resistance of the die-attach solder, $R_{die}$ is the chip resistance and $R_{bond-wires}$ is the resistance of the paralleled wires.
References [14]-[16] show that die-attach solder fatigue is usually dominant, whereas bond wire joints start to degrade when the junction temperature is above 473 K. Therefore, a condition monitoring method based on solder fatigue could be more effective than one based on bond wire fatigue. To further discuss the effect of solder fatigue on the parameters, the interaction process is shown in Fig. 2. Once solder fatigue occurs, the junction-to-case thermal resistance $R_{th}$ increases gradually, and the junction temperature rises for a given case temperature. The on-state resistance then increases because of its positive temperature coefficient. In Fig. 2, $\Delta R_{th}$ represents the actual increase of the thermal resistance. Experimental results in [1], where a thermal pad is used to emulate solder fatigue, show a 1.5 K increase of the case temperature while the junction temperature increases by 9.1 K. This indicates that the junction temperature rises more than the case temperature. However, the junction temperature is difficult to measure because of the package housing. The increase of the on-state resistance, by contrast, may be clearly observable because of the positive temperature coefficient. Therefore, the on-state resistance can be selected as the health indicator of MOSFET modules.
B. FE MODEL OF POWER DEVICES
Following the packaging structure of the MOSFET module IXFK80N60P3, an electrical-thermal coupled analysis model is established in COMSOL Multiphysics. The geometry and material properties of each layer are supplied by the power module manufacturer, as shown in Table 1.
All materials of this power device except the solder layer are considered elastic. The solder layer is modeled with Anand's visco-plastic material model, which is widely used for solder layers subject to strain and temperature effects and assumes that plastic flow occurs at all nonzero stress values. This model accounts for strain-rate sensitivity, strain hardening or softening, crystalline texture and its evolution, and does not require an explicit yield condition [17]. The material properties of the solder layer are shown in Table 2, where $s_0$ (MPa) is the initial deformation resistance, $Q/R$ (K) is the activation energy divided by Boltzmann's constant, $A$ (s$^{-1}$) is the pre-exponential factor, $\xi$ is the stress multiplier, $m_0$ and $\eta$ are the strain-rate sensitivity of stress and the strain-rate sensitivity of the saturation value respectively, $h_0$ (MPa) is the hardening/softening constant, $\hat{s}$ (MPa) is the coefficient for the saturation value of deformation resistance, and $a$ is the strain-rate sensitivity of hardening/softening.
Multiphysics coupling between the electric and heat transfer fields is used in this model. When the current source is considered in the thermal analysis, the differential equations for the temperature calculation of the elements in each layer are those given in [18]. The coupling is that the on-state resistance increases with the junction temperature, and a larger on-state resistance produces a larger power loss, which causes a further increase of the junction temperature.
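For orientation, the coupled problem can be sketched in the standard current-conservation and heat-conduction form below. This is an assumed textbook form chosen to match the symbols defined in the next sentence, not necessarily the exact expressions of [18]; the electric field $E$, the electric potential $V$ and the Joule-heating expression for $Q_v$ are introduced here only for illustration.

$$\nabla \cdot J = Q_j, \qquad J = \gamma E = -\gamma \nabla V$$

$$\rho c \frac{\partial T}{\partial t} = \nabla \cdot (k \nabla T) + Q_v, \qquad Q_v = J \cdot E$$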
where $J$ and $\gamma$ are the current density and conductivity respectively, $Q_j$ is the boundary current source, $T$ is the temperature, $Q_v$ is the heat source per unit volume, $k$ is the thermal conductivity, $\rho$ is the density and $c$ is the specific heat.
To obtain accurate thermal and electrical characteristics of MOSFET modules under different loading conditions, some materials are assigned temperature-dependent thermal conductivity in the model [19]. This matters especially for the chip (silicon), which behaves differently at different temperatures according to the output characteristic curves in the datasheet of the IXFK80N60P3 [20]. However, the datasheet gives on-state resistance characteristics only at 298 K and 398 K for a driving voltage of 10 V. In the aerospace application, the ambient temperature during operation of this module ranges from 218 K to 343 K. Therefore, several tests are designed covering junction temperatures from 218 K to 423 K and driving voltages from 10 V to 20 V, in which the on-state resistance is measured using an Agilent B1505 and a temperature chamber, as shown in Fig. 3. It can be seen that the on-state resistance depends on the drain current, junction temperature and driving voltage, and that this dependence is non-linear. Therefore, the operating conditions must be taken into account when the on-state resistance is used to indicate the health status of the MOSFET module. The on-state resistance under different conditions is modeled using look-up tables (LUTs), which are set as the material parameter of the chip, i.e., $R_{ds-on} = f(I_D, T_j, V_{GS})$, as shown in Fig. 3(c).
To obtain accurate results while saving simulation time, a multilevel meshing process is used in the FE model. Multilevel meshing means that the critical layers (bond wires and solder layers) are meshed more finely than other layers such as the baseplate and plastic housing. The boundary conditions of the FE model are set as follows: the current load is applied to the drain terminal D; the electric potential at the cross section of the source terminal S is set to zero; to simplify the model, a convective heat transfer coefficient of 2000 W/(m²·°C) is applied on the bottom surface to emulate the heat dissipation of forced air cooling; the package housing is set as acrylic plastic, and the open surfaces are set to natural convection with a coefficient of 10 W/(m²·°C). The ambient temperature is set to the measured value of 298 K. The resulting FE model is shown in Fig. 4; it has 18075 nodes and 125118 body elements.
C. FE MODEL STEADY-STATE RESULTS
A constant current of 53 A is uniformly applied to the drain terminal, and the steady-state electric potential distribution of the MOSFET module is shown in Fig. 5. Normally the MOSFET module is mounted directly on the heat sink using thermal grease to ensure contact and improve physical integrity and heat transfer; the thermal grease is modeled as a uniform layer [21] with a thickness of 40 µm. In the model, a metallization layer (aluminium) of 14.24 mm by 10.48 mm with a thickness of 10 µm is used to reduce the contact resistance. The result shows that the main forward voltage drop is in the chip: the baseplate accounts for 0.0001%, the solder for 0.05%, the chip for 98.25% and the bond wires for 1.70%.
The steady-state temperature distribution of the FE model is shown in Fig. 6. The high-temperature region is concentrated at the chip, die-attach and baseplate. The simulated result is almost consistent with the datasheet, with an error of only about 0.52%. Fig. 7 shows the on-state resistance as a function of drain current. The on-state resistance increases nonlinearly with temperature and drain current, which agrees with the test results above. To verify the accuracy of the FE model, its results are compared with the manufacturer's datasheet at different electrical operating points and junction temperatures. Good agreement is observed over the range of drain current and junction temperature rise, with an error of less than 1 mΩ. Therefore, the FE model can be used to extract the characteristic parameters of module fatigue, and its accuracy meets the design requirement.
D. CHARACTERISTIC PARAMETERS OF THE MODULE FATIGUE
Further FE models are built to obtain the characteristic parameters used to establish the condition monitoring model under different simulated conditions. The module is fed with a 40 A current, with and without packaging-related failures. The effect of solder fatigue alone on the parameters is studied by reducing the area of the solder layer, while bond wire fatigue is emulated by reducing the number of connected bond wires. References [22] and [23] propose that a 5% increase of the on-state voltage indicates bond wire damage for IGBTs.
Reference [24] shows that a 20% increase of the thermal resistance indicates a solder fatigue-related failure. The on-state resistance is normally used as the health indicator of MOSFET modules. The simulated results for bond wire fatigue are shown in Fig. 8. Fig. 8(a) shows the change of the junction temperature, thermal resistance and on-state resistance for different numbers of lifted-off bond wires when $I_D$ = 40 A. It indicates an increase of about 6% in the on-state resistance when five bond wires are lifted off, with only a slight effect on the junction temperature, because the power loss of the whole chip is almost unchanged and the heat flux path is the same as in the initial state. The slight increase of the junction temperature is caused by the change of the bond wire temperature. $R_{th}$ is temperature dependent, as shown in (5), where $T_j$ is the junction temperature, $T_{case}$ is the case temperature, and $P_{loss}$ is the power loss of the module. Fig. 8(b) shows that the temperature of the remaining bond wires increases when four bond wires are lifted off, compared with Fig. 6(a), because the current in the remaining bond wires increases. This leads to a slight increase in the junction temperature.
$$R_{th} = \frac{T_j - T_{case}}{P_{loss}} \tag{5}$$

Fig. 9 shows that the junction temperature and the on-state resistance increase with the thermal resistance when $T_{case}$ = 323.15 K and $I_D$ = 40 A. Meanwhile, it shows a 54% increase of the on-state resistance corresponding to a 20% increase of $R_{th}$, which implies that the on-state resistance has higher sensitivity than $R_{th}$. What is more, the thermal resistance will increase once solder fatigue occurs, which will finally lead to the increase of the junction temperature and on-state resistance.
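The direction of this trend follows from the feedback between conduction loss and temperature: at a fixed case temperature, (5) gives $T_j = T_{case} + R_{th} P_{loss}$ with $P_{loss} = I_D^2 R_{ds-on}(T_j)$. The Python sketch below solves this by fixed-point iteration. It is a lumped illustration only, assuming a linear temperature coefficient `alpha` and placeholder values for `r_ref` and `r_th` that are taken neither from the paper nor from the datasheet; it is not expected to reproduce the magnitudes in Fig. 9, which come from the FE model with measured LUTs.

```python
def junction_temperature(t_case, i_d, r_th, r_ref,
                         t_ref=298.0, alpha=0.008,
                         tol=1e-9, max_iter=200):
    """Solve T_j = T_case + R_th * I_D**2 * R_ds_on(T_j) by fixed-point iteration.

    t_case, t_ref : case and reference temperatures in K
    i_d           : drain current in A
    r_th          : junction-to-case thermal resistance in K/W
    r_ref         : on-state resistance at t_ref in ohm
    alpha         : assumed linear temperature coefficient in 1/K
    Returns (T_j in K, R_ds_on at T_j in ohm).
    """
    def r_ds_on(t_j):
        # Assumed linear positive temperature coefficient.
        return r_ref * (1.0 + alpha * (t_j - t_ref))

    t_j = t_case  # start from the measurable case temperature
    for _ in range(max_iter):
        t_next = t_case + r_th * i_d ** 2 * r_ds_on(t_j)
        if abs(t_next - t_j) < tol:
            return t_next, r_ds_on(t_next)
        t_j = t_next
    raise RuntimeError("iteration did not converge for these parameters")


if __name__ == "__main__":
    # Placeholder parameters: healthy module versus R_th increased by 20%.
    healthy = junction_temperature(323.15, 40.0, 0.30, 0.07)
    aged = junction_temperature(323.15, 40.0, 0.36, 0.07)
    print("healthy:", healthy)
    print("aged:   ", aged)
```

With these placeholder numbers the aged case settles at a higher junction temperature and a higher on-state resistance than the healthy case, although the case temperature and drain current are identical.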
Therefore, the simulated results agree with the failure mechanism of power electronic devices described in Part A of Section II, and the on-state resistance can be used to indicate the health condition of modules at a given electrical operating point and case temperature, because those parameters can be measured directly and this health indicator has higher sensitivity than the traditional one.
III. CONDITION MONITORING MODEL BASED ON ON-STATE RESISTANCE
Traditional condition monitoring approaches cannot be used in real-time applications because it is still difficult to obtain an accurate power loss from the varying inverter voltages and currents. Moreover, most power devices do not have integrated temperature sensors positioned where they are needed to measure the junction temperature. Based on the results of Section II, this paper proposes to monitor the condition of the MOSFET module by capturing the change of the on-state resistance at a given electrical operating point and case temperature. Because heat is transferred quickly inside the module, there is a direct relationship between the case and junction temperatures: when the module is healthy, an increase of the case temperature by 10 K implies an increase of the junction temperature by approximately the same amount. The model uses the case temperature rather than the junction temperature because the case temperature can be measured directly with a thermocouple embedded in the surface of the heat sink. The junction temperature exceeds the case temperature by an amount that depends on the internal thermal resistance, which increases with solder fatigue. Note that the junction temperature is not used directly as a health indicator, because it is hard to measure without removing the package housing; it is used only to show the temperature dependence.
At the same electrical operating point and case temperature, the change of the on-state resistance indicates the degree of solder fatigue. The relationship among the on-state resistance, drain current and case temperature at a given driving voltage, $R_{ds-on-m0} = f(I_D, T_c)$, is first obtained for a healthy module over a range of operating points, because in actual applications converters do not always work under constant load stress and the case temperature fluctuates with the load conditions and the ambient temperature. Then, the on-state resistance and the corresponding case temperature after the module has worn out are recorded at the same electrical operating points and defined as the failure criteria.
This condition monitoring test can be conducted during operation by capturing electrical operating points that were covered in the calibration test. The degree of fatigue is quantified by the damage $D$ defined in (6), where $R_{ds-on-max}$ is the maximum allowable change of the on-state resistance due to module fatigue, defined as the change when $R_{th}$ has increased by 20% from the initial value of a new module; $R_{ds-on-m0}$ is the original on-state resistance and $R_{ds-on-m}$ is the present on-state resistance, calculated by (7) from the measured on-state voltage $V_{ds-on}$ and drain current $I_D$.

$$D = \frac{R_{ds-on-m} - R_{ds-on-m0}}{R_{ds-on-max}} \tag{6}$$

$$R_{ds-on-m} = \frac{V_{ds-on}}{I_D} \tag{7}$$

Once $D$ reaches 100%, an alarm signal can be raised. When $D$ is below 100%, it still provides useful information for condition-based maintenance of the inverter and for lifetime estimation.
The procedure of the proposed condition monitoring method is illustrated in the flowchart in Fig. 10. Given the case temperature and drain current, the on-state resistance $R_{ds-on-m0}$ of the healthy module is looked up from the reference table. The on-state resistance $R_{ds-on-m}$ is calculated from the measured parameters according to (7). $D$ is then calculated as the output. The reference look-up table relating the on-state resistance, drain current and case temperature is extracted using the FE model under different conditions, as shown in Fig. 11. It shows that the difference between the aged and healthy on-state resistance is at least 32% when $R_{th}$ is increased by 20%, and that the failure criterion varies with the operating conditions. The maximum difference is a factor of 1.72, at $I_D$ = 80 A and $T_{case}$ = 218.28 K. The difference for the aged module is large when the drain current and case temperature are large. The on-state resistance has higher sensitivity than $R_{th}$, which is consistent with the previous results.
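The last two steps of this procedure amount to a few lines of arithmetic. The Python sketch below is a minimal illustration, assuming the healthy value and the maximum allowable change have already been looked up from the reference table for the present drain current and case temperature. The function names and the `alarm_level` parameter are hypothetical; the example numbers are the aged-module values reported in Section IV.

```python
def on_state_resistance(v_ds_on_mv, i_d_a):
    """On-state resistance in mOhm from on-state voltage (mV) and drain current (A)."""
    if i_d_a <= 0.0:
        raise ValueError("drain current must be positive while the device conducts")
    return v_ds_on_mv / i_d_a


def damage(r_m, r_m0, r_max_change):
    """Damage D: measured change of on-state resistance over the allowable change."""
    if r_max_change <= 0.0:
        raise ValueError("maximum allowable change must be positive")
    return (r_m - r_m0) / r_max_change


def monitor(v_ds_on_mv, i_d_a, r_m0, r_max_change, alarm_level=1.0):
    """Return (R_ds-on-m in mOhm, D, alarm flag) for one captured operating point.

    r_m0 and r_max_change are the healthy on-state resistance and the maximum
    allowable change (both in mOhm) taken from the reference table.
    """
    r_m = on_state_resistance(v_ds_on_mv, i_d_a)
    d = damage(r_m, r_m0, r_max_change)
    return r_m, d, d >= alarm_level


if __name__ == "__main__":
    # Aged-module operating point reported in Section IV.
    r_m, d, alarm = monitor(1200.0, 10.0, 96.15, 35.05)
    print(f"R_ds-on-m = {r_m:.2f} mOhm, D = {d:.4f}, alarm = {alarm}")
```

Keeping the table look-up outside these functions separates the arithmetic from the calibration data, so the same code can be reused when the reference tables are regenerated for another module type.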
A multi-stage simulation is conducted to emulate changing environmental conditions when $I_D$ = 25 A. A change of the ambient temperature $T_a$ changes the junction temperature, and the on-state resistance changes accordingly. The results of the condition monitoring model with and without solder fatigue are shown in Fig. 12(a). They indicate that the condition monitoring model tracks the LUT value effectively at different ambient temperatures. To further assess the effectiveness of the condition monitoring model, the damage $D$ is calculated for an unaged module and for a module with 1.17 $R_{th0}$, as shown in Fig. 12(b). The damage $D$ is equal to zero for the unaged module, as shown by the red line in Fig. 12(b), which means that the module is healthy. However, $D$ increases once the module starts to degrade. As shown by the black line in Fig. 12(b), $D$ keeps its value even though the ambient temperature changes. Consequently, the proposed model eliminates the effect of changing environmental conditions on the result.
IV. EXPERIMENTAL VALIDATION
Experimental work is carried out to validate the condition monitoring method presented in this paper using IXYS MOSFET modules (IXFK80N60P3). First, a DC-DC converter is designed. Second, the on-state resistance and case temperature are obtained, where the on-state resistance is calculated from the on-state voltage and drain current. Last, the proposed method is applied to verify its practicability. The buck converter is one of the simplest but most useful power converters. It is a step-down converter that converts an unregulated DC input voltage to a regulated DC output at a lower voltage. Fig. 13 depicts the basic circuit configuration of the buck converter. It consists of two MOSFET modules Q3 and Q4, a diode D18, an inductor L4, and three output capacitors C16, C18 and C19. The two power modules are connected in parallel to deliver a high output power. The inductor L4 acts as an energy storage element that keeps the current flowing, while the diode provides a freewheeling path for the inductor current during the OFF time of the MOSFET modules. A capacitor filter is added at the output of the converter to reduce the output voltage ripple. The input voltage is 100 V, the output voltage is 28 V, and the switching frequency of the power modules is 50 kHz. Fig. 14 shows the schematic of the measurement circuit for the on-state voltage $V_{ds}$, which is described in [6] and [25]. $V_{ds}$ can be represented as (8). A thermal pad is used to emulate a module failure through an increase in thermal resistance. The experimental prototype of the DC-DC converter is shown in Fig. 15. An electronic load is used as the load resistance.
The measured $V_{ds-on}$ is 950 mV, $I_D$ is 10 A, obtained from the electronic load (63205A), and $T_c$ is 331.48 K, measured using a temperature data logger (RDXL6SD). Note that $I_D$ is only half the current of the electronic load, because the two power modules are connected in parallel. $R_{ds-on-m}$ calculated by (7) is then 95 mΩ. $R_{ds-on-m0}$ obtained from the reference table (Fig. 11) is 94.43 mΩ, and $R_{ds-on-max}$ obtained from the same table is 34.73 mΩ. The relative error of the on-state resistance is about 0.006, and the damage $D$ calculated by (6) is 0.016. The measured result is therefore in good agreement with the simulated result. To emulate device fatigue within the converter, a thermal pad is cut to fit the baseplate (20.4 mm × 18.6 mm) and inserted between the module baseplate and the heat sink.
The thermal pad, Bergquist GapPad1500 [26], is a layer of material usually used to wrap electronic components for rigidity and electrical isolation without significantly impeding heat conduction. In the experiment, the pad is treated as part of the aged module, i.e., as emulated solder fatigue. The measured $V_{ds-on}$ is 1200 mV, $I_D$ is 10 A and $T_c$ is 334.46 K. $R_{ds-on-m}$ is then 120 mΩ, $R_{ds-on-m0}$ is 96.15 mΩ and $R_{ds-on-max}$ is 35.05 mΩ. The calculated damage $D$ is 0.6805. The on-state resistance thus increases significantly when the thermal pad is inserted. The model structure is expected to be applicable to all power modules, but the reference tables should be updated for each type of design, because designs can differ in die size, die structure, packaging structure and material, all of which affect the characteristic parameters of the power modules.
V. CONCLUSION
In this paper, a novel condition monitoring method is proposed for power electronics. MOSFET modules (IXFK80N60P3, 600 V/80 A) are selected as the research object. The increase of the internal on-state resistance at a given case temperature and load current is used as the fatigue index for condition monitoring. LUTs for the normal state and the wear-out state are established for health assessment. The method, which is fast and easy to implement, is then demonstrated on a DC-DC converter. The model can be used under different environmental conditions and load levels. It is expected that this study will be useful for engineers developing practical condition monitoring solutions for power electronic converters in order to improve their operational reliability.
Using the proposed method, the health level of a MOSFET module can be evaluated in real time in an operating power converter, and defective MOSFET modules can be replaced in time, which can greatly reduce maintenance cost and time and improve the safety of the power converter. The condition monitoring method proposed in this paper could also be applied to IGBTs, which have a packaging structure similar to that of MOSFETs. Further studies on applying this method to those devices are necessary.
