Abstract-This paper proposes an in situ diagnostic and prognostic (D&P) technology to monitor the health condition of insulated gate bipolar transistors (IGBTs) used in EVs, with a focus on the IGBTs' solder layer fatigue. The IGBTs' thermal impedance and junction temperature can be used as health indicators for through-life condition monitoring (CM), where the terminal characteristics are measured and the devices' internal temperature-sensitive parameters are employed as temperature sensors to estimate the junction temperature. An auxiliary power supply unit, which can be converted from the battery's 12-V dc supply, provides power to the in situ test circuits, and CM data can be stored in the on-board data-logger for further offline analysis. The proposed method is experimentally validated on the developed test circuitry and also compared with finite-element thermoelectrical simulation. The test results from thermal cycling are also compared with acoustic microscope and thermal images. The developed circuitry is shown to be effective in detecting solder fatigue, while each IGBT in the converter can be examined sequentially during red-light stops or servicing. The D&P circuitry can utilize existing on-board hardware and be embedded in the IGBT's gate drive unit.
I. INTRODUCTION

INSULATED gate bipolar transistor (IGBT) power modules have been widely used in high-voltage and high-power applications such as electric vehicles (EVs) [1], [2], ships [3], aircraft [4], wind turbines [5], smart grids [6], and industrial drives [7], primarily due to their superior performance in terms of power density, switching frequency, energy efficiency, and cost effectiveness. EVs represent a mass market which is both safety-critical and cost-sensitive. IGBTs are a weak link in the EVs' traction drive, so their failures could lead to a sudden breakdown of the power converter or even a catastrophic accident involving human lives [8]-[10]. In general, the lifespan of IGBT power switches is much shorter than that of the drive system, so these devices are considered to be "consumables" which need to be renewed several times within the lifetime of the system [11] or before the end of the vehicle's service life (600 000 km or 15 years) [12]. Thus, the capital and maintenance costs of power converters can be very high [9] and predominantly affect the market acceptance of EVs. Over the last 20 years, this issue has been addressed by improving the device design at component level (i.e., semiconductor design and packaging) and system level (i.e., overengineering design, soft-switching, snubber circuits, advanced cooling and modulation schemes). The former has led to the improved reliability of conventional silicon-based power modules by application-oriented design [13], [14], as well as to the development of high-performance silicon carbide (SiC) [7], [15] and gallium nitride (GaN) devices [16], which offer much promise for future EVs. The latter focuses on redundancy design, diagnosis, postfault protection, and advanced control algorithms [1], [17]-[20]. In industry, traditional reliability prediction methods such as MIL-HDBK-217 [21], PRISM [22], and Telcordia [23] are based on statistical and empirical data. They employ the failure rate as a reliability index and generate a so-called bathtub curve (e.g., Fig. 1). The curve consists of three periods with different failure rates: the early failure period, the random failure period, and the wear-out failure period. In general, early failures are associated with flawed designs or manufacturing defects, so such products fail prematurely and the failure rate tends to fall as the operational time elapses. Random failures are mainly due to intermittent excessive stress (electrical, thermal, mechanical, etc.) over the maximum rating of the devices. The device failure rate can be very low after the early period. Wear-out failures occur toward the end of the device lifetime, caused by the operational loading and environmental stresses. During this period, the failure rate increases rapidly, and effective condition monitoring (CM) can significantly lower the operation and maintenance cost and improve the system availability.
With regard to IGBTs, the major wear-out failures are bondwire lift-off and baseplate solder fatigue [24], [25]. The chip solder fatigue can also be an issue [26] when the device is subject to active power cycling. In this study, the IGBT devices are tested by thermal cycling within a thermal chamber, generating severe fatigue in the direct copper-bonded (DCB) solder but little impact on the chip solder. The coefficient of thermal expansion (CTE) mismatch between silicon and DCB is lower than that between the base plate and DCB. The chip solder also has a much smaller dimension than the DCB solder layer, so its fatigue is commonly ignored under passive thermal cycling [26]. In order to operate in adverse environments and improve reliability, wirebond-less technologies (e.g., planar packaging [27] and solderable front metallization [28]) are currently being developed to remove wire bonding. Nonetheless, solder layer fatigue cannot be eliminated and it remains a challenging failure mode. IGBT bondwire faults are extensively studied in a companion article [1], whereas this paper is devoted specifically to solder fatigue.
II. SOLDER FATIGUE IN IGBT POWER MODULES
IGBT power modules operating at variable load conditions in EVs experience repetitive thermomechanical stress due to power cycles and thermal cycles [29], [30].
A. Failure Mechanisms
A simplified cross-sectional view of a standard IGBT power module is shown in Fig. 2. Typically, a DCB substrate, which is made of ceramic and metallized copper films, is attached to the copper base plate by the solder layer. The IGBT chip is soldered onto the DCB substrate and the chip surface is connected to copper tracks via aluminum wire bonds. The assembly is then housed in a plastic case and encapsulated with silicone gel. Furthermore, a thermal interface material (e.g., thermal grease) is inserted between the base plate and heat sink to improve physical integrity and thermal transfer. Presently, some IGBT modules for EV applications use integrated heat sinks [31], [32] to eliminate the need for the baseplate. However, the temperature difference across the power chip increases and its thermal overloading capability is reduced.
Because adjacent layers in the multilayered module have different CTEs, excessive thermomechanical stress is present in the solder layer. Consequently, solder fatigue appears and accumulates between the base plate and the DCB substrate in the form of creep [33], voids [34], cracks or delamination [35]. These increase the heat flux density in the remaining solder layer and retard the heat dissipation. If left untreated, the fault can grow in size and lead to an ultimate failure. In this process, the thermal impedance along the heat transfer path increases, and so does the power loss, which raises the chip temperature. In turn, the escalated deterioration may trigger other failure modes such as hot-spots, latch-up, and burn-out, leading to bond wire or even chip failures. As a consequence, the majority of in-service IGBTs can only operate for a fraction of their life expectancy.
B. Technologies for Detecting Solder Fatigue
Generally, existing CM technologies can be divided into two methods: model-based and data-driven. The model-based method utilizes a parameterized physics of failure (PoF) model [36], [37] to determine the degradation based on life-cycle loads, material properties, and packaging factors [14]. However, PoF models are limited to only one failure mechanism, whereas in reality multiple degradation factors can collectively contribute to solder fatigue. In order to derive a PoF model, the coefficients of a device need to be determined experimentally, which introduces errors from device inhomogeneity, filtering, and curve fitting [36]. Furthermore, a high-fidelity model is computationally costly and is often impractical for in situ measurements.
In contrast, data-driven methods take advantage of the information from available measurements but usually involve pattern recognition and machine learning to extract the diagnostic and prognostic (D&P) signatures from the monitored parameters. The information can be used to correlate with the damage growth. Data-driven methods are practical, computationally efficient, and can discriminate between different failure mechanisms in a complex multivariate system.
For monitoring the solder fatigue, nondestructive testing techniques such as scanning acoustic tomography [38] and active thermography [39], [40] have been reported. Nonetheless, these methods rely on sophisticated, expensive measurement equipment and are only applicable to offline intrusive measurements. Alternatively, the junction-to-case thermal impedance has been widely used as a precursor of solder failures [34], [41], [42], which provides an insight into multiple-layer devices with a spatial resolution [43]. On this basis, this paper develops a new in situ D&P method to estimate the changes in thermal impedance that result from faults or thermal aging.
III. IMPLEMENTATION OF THE PROPOSED METHOD
In the vehicle market, nearly all new models are equipped with some fault diagnostic functions in the gate drivers, such as overvoltage, overcurrent, and overtemperature detection, which are interfaced with the vehicle's electronic control unit (ECU). However, these are postfault measures and can hardly protect the switching devices from short circuits [44]. In EVs, gate drivers need to interface with the master controller or ECU when faults are present, causing a delayed response and fault propagation. In this paper, the health condition of IGBTs is evaluated by characterizing the transient thermal impedance (TTI) curve along with IGBTs' lifelong aging models established from analytical and data-driven methods. The technology can provide both D&P functions, and the in situ measurement circuitry can be integrated into current gate drivers to interface with ECUs.
A. Junction Temperature Measurement
Direct measurement of the semiconductor device junction temperature is impractical [45]. A common alternative is to measure the temperature sensitive electrical parameters (TSEPs) [46]-[48]. This work utilizes the on-state voltage drop (V_CE) as a TSEP and measures it after injecting a low dc current (I_l) into the device. Additionally, a high dc current (I_h) is used for the TTI characterization [49]. As a result, a train of pulses is generated for both measurements, as illustrated in Fig. 3.
A TSEP calibration is carried out at low currents to obtain a TSEP calibration curve over a wide temperature range (... °C in this case). In this test, the ohmic voltage drop at bondwires and terminals is negligible [1] because a very low current is used for calibration. The terminal voltage at high current pulses (V_CE(h)) is directly measured to find power losses. T_j is derived from the V_CE(l) measurement after the current is switched from high to low. Three measurement points are taken successively and averaged to estimate T_j with the aid of the predetermined TSEP calibration curves. Since some disturbances may be present at the switching transients, an initial delay time is inserted to attain valid temperature information. Therefore, it is required to trace back to zero time in heating tests by an extrapolation method.
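The calibration and back-extrapolation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the calibration curve is taken to be a straight line, and the extrapolation to zero time assumes that T_j varies linearly with the square root of time just after the current is switched, which is a common choice but is not specified in this paper. All function names and data values are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def calibrate_tsep(temps_c, vce_l):
    """Fit V_CE(l) = a*T + b from calibration points taken at the low current.
    Assumption: the TSEP curve is linear over the calibrated range."""
    return fit_line(temps_c, vce_l)

def vce_to_tj(vce, cal):
    """Invert the calibration line to turn a V_CE(l) reading into a temperature."""
    a, b = cal
    return (vce - b) / a

def average_triplets(values):
    """Average successive groups of three readings (incomplete groups are dropped)."""
    return [sum(values[i:i + 3]) / 3 for i in range(0, len(values) - 2, 3)]

def tj_at_switch_off(times_s, vce_readings, cal):
    """Estimate T_j at t = 0 from V_CE(l) samples taken after the initial delay.
    Assumption: T_j is linear in sqrt(t) over the sampled interval."""
    temps = [vce_to_tj(v, cal) for v in vce_readings]
    roots = [math.sqrt(t) for t in times_s]
    _slope, intercept = fit_line(roots, temps)
    return intercept
```

In use, `calibrate_tsep` would be run once per device in the thermal chamber, and `tj_at_switch_off` after every heating pulse, with the delayed samples first smoothed by `average_triplets` if required.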
B. Determination of Reference Temperature
The device case temperature is taken as reference and four temperature sensors (k-type thermocouples) are installed at four locations to obtain an average temperature (see Fig. 4) according to the industrial practice [50]-[52]. T_r1 and T_r2 are glued on the case back surface directly below the chip center of the freewheeling diode and IGBT, respectively. T_r3 is placed in a drill hole, 3 mm away from the chip edge and 2 mm from the top surface of the heat sink. T_r4 is at the edge of the base plate. In this experiment, the noises in junction temperature and case temperature are 0.6 and 0.3 °C (peak-to-peak), respectively. These correspond to a resolution of ±0.003 °C/W for the thermal impedance measurement with a sampling time of 4 ms.
C. Experimental Setup
Fig. 5 shows the experimental setup. In Fig. 5(a), the IGBT is placed in a thermal chamber to maintain a required testing environment and tested with the proposed D&P circuit. In Fig. 5(b), T1-T6 are six identical IGBTs and D1-D6 are six freewheeling diodes. The measurement circuitry consists of an auxiliary power supply unit (PSU); a gate drive and protection circuit; a measurement circuit with digital isolation; and several selector relays.
IV. SIMULATION AND EXPERIMENTAL RESULTS
A. Finite Element Method (FEM) Simulation Results
An FEM model in the COMSOL Multiphysics environment is used to evaluate the IGBT thermal characteristics under different solder fatigue conditions. The ambient temperature is set to 20 °C and a defined heat transfer coefficient is applied to the bottom of the heat sink to simulate the forced air cooling. A dissipative power step is applied to the active chip volume homogeneously (i.e., up to 100 μm from the chip top surface). The heat flow in each layer is calculated by

$$\rho C_p \frac{\partial T}{\partial t} - \nabla \cdot (k \nabla T) = Q + q_s T$$

where T is the temperature and ρ, C_p, k, Q, and q_s are the density, heat capacity, thermal conductivity, injected heat power, and absorption/production coefficient, respectively. The material and the dimension of each layer are given in Tables I and II. The solder fatigue is modeled by creating a 3-μm-thick vacuum "delamination" layer (with infinite thermal resistance) in the solder layer between the DCB and the baseplate. Simulation results are presented in Fig. 6 for comparison. It can be seen that, as the number of thermal cycles increases, the remaining solder area reduces and the delamination spreads from the edge to the center. In addition, excessive heat is generated above the "delamination" as a result of the increased thermal impedance (i.e., aging).
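To make the role of each term in the heat equation concrete, the following sketch advances a one-dimensional temperature profile by one explicit finite-difference time step. It is not the FEM model used in this paper: it is a much simpler scheme, it assumes q_s = 0, uniform material properties, and fixed-temperature boundaries, and the function name and any numerical values used with it are hypothetical.

```python
def step_heat_1d(temps, dt, dx, rho, cp, k, q, t_boundary):
    """Advance rho*cp*dT/dt = k*d2T/dx2 + q by one explicit Euler step.

    temps      : list of node temperatures along the heat path (deg C)
    dt, dx     : time step (s) and node spacing (m)
    rho, cp, k : density, heat capacity, thermal conductivity (SI units)
    q          : volumetric heat source (W/m^3); q_s is assumed to be zero
    t_boundary : fixed temperature imposed at both end nodes (deg C)
    """
    alpha = k / (rho * cp)  # thermal diffusivity, m^2/s
    if alpha * dt / dx ** 2 > 0.5:
        # Standard stability limit of the explicit (FTCS) scheme.
        raise ValueError("time step too large for the explicit scheme")
    new = list(temps)
    for i in range(1, len(temps) - 1):
        laplacian = (temps[i - 1] - 2.0 * temps[i] + temps[i + 1]) / dx ** 2
        new[i] = temps[i] + dt * (alpha * laplacian + q / (rho * cp))
    new[0] = t_boundary
    new[-1] = t_boundary
    return new
```

Calling the function repeatedly gives a transient response. A delaminated region could be approximated in such a sketch by a node with very low conductivity, although that extension is not shown here.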
B. Power Loss Measurements
A 60-A heating current is injected during the IGBT's forward conduction state while the ambient temperature is controlled within the thermal chamber. The current and voltage are measured to compute the instantaneous power input. V_CE(on) consists of the voltage drop across the silicon chip (V_chip) and across the stray impedance (V_stray) accounting for terminal leads and DCB films. V_CE(on) gradually increases with the rising junction temperature during the heating period. The observed power dissipation is shown in Fig. 7, where the power is segmentally averaged for TTI measurements. Although a fixed dc current is used for injection, the injected power is not constant. To simplify the calculation of the heat power, it is decomposed into a series of power pulses. Within a pulse, the power is considered to be constant. This power waveform can be represented approximately by m sequential pulses with averaged amplitudes of P_1, P_2, ..., P_m. The amplitude of each pulse with N samples is calculated by

$$P_i = \frac{1}{N} \sum_{n=1}^{N} V_{CE(on)}(n)\, I(n), \quad i = 1, 2, \ldots, m$$

where V_CE(on)(n) and I(n) are the n-th voltage and current samples within the i-th pulse.
In this paper, heat is assumed to conduct along a single dominant path from the IGBT junction to the environment. This offers a spatial inspection of the fault location in the multilayered power module. By controlling the duration of the applied heating pulses, the depth of the heat diffusion from IGBT chips can be regulated so that the heat flux through the solder layer and the layers beneath it (i.e., baseplate, thermal grease, and heat sink) can be developed sequentially. This enables a fast condition estimation and a focus on the solder layer. Alternatively, a higher heating current increases the heat loss (i.e., a higher temperature gradient) for a given thermal resistance. Thus, a 60-A heating current is chosen to give a sufficient increase in ΔT_jr (circa 100 °C) while still limiting the junction temperature to <150 °C, even in the worst-case scenario.
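The segmental averaging of the heating power described in this subsection can be illustrated with a short Python sketch. It assumes equally spaced samples and pulses of equal length, and the function and argument names are hypothetical rather than taken from the developed circuitry.

```python
def segment_power(v_ce, i_c, n_per_pulse):
    """Approximate a varying power waveform by constant-power pulses.

    v_ce, i_c   : equally spaced voltage (V) and current (A) samples
    n_per_pulse : number of samples N averaged into each pulse
    Returns the list [P_1, ..., P_m]; trailing samples that do not fill
    a complete pulse are discarded.
    """
    if len(v_ce) != len(i_c):
        raise ValueError("voltage and current records differ in length")
    if n_per_pulse <= 0:
        raise ValueError("n_per_pulse must be positive")
    power = [v * i for v, i in zip(v_ce, i_c)]  # instantaneous power
    pulses = []
    for start in range(0, len(power) - n_per_pulse + 1, n_per_pulse):
        chunk = power[start:start + n_per_pulse]
        pulses.append(sum(chunk) / n_per_pulse)
    return pulses
```

A shorter pulse length tracks the rise in V_CE(on) more closely at the cost of more noise in each averaged amplitude, so the choice of N is a trade-off that depends on the measurement noise.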
C. Thermal Response at Different Ambient Temperatures
The thermal impedance is measured at the ambient temperature of 0, 20, and 40 °C, as shown in Fig. 8. The heat flux distribution is altered due to the combined effect of silicon thermal conductivity and power dissipation variation. V_CE(on) varies with V_chip while the change in V_stray caused by the copper thermal conductivity is negligible.
The average thermal impedance values at t = 1 s and 2 s are plotted against the ambient temperature in Fig. 9. It can be observed that the thermal impedance has a nearly linear characteristic against the ambient temperature. Therefore, the thermal impedance can be derived by linear curve fitting and a 3-D look-up table at various ambient temperatures can be created for the IGBT's healthy baselines. Any noticeable rise in measurement results above the baseline will indicate thermal path degradation. In Fig. 9, the dotted lines also show a 0.01 °C/W drift from the initial thermal impedance at t = 1 s and 2 s. Clearly, the thermal deterioration can be easily identified.
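A healthy baseline of this kind, together with a drift check, might be implemented along the following lines. The sketch assumes a straight-line baseline at a single time instant and uses the 0.01 °C/W drift from Fig. 9 as the default alarm threshold. The function names are hypothetical, and any numbers passed in are for illustration only.

```python
def fit_baseline(ambients_c, zth_healthy):
    """Least-squares line Zth = a*T_amb + b through healthy measurements."""
    n = len(ambients_c)
    mx = sum(ambients_c) / n
    my = sum(zth_healthy) / n
    sxx = sum((x - mx) ** 2 for x in ambients_c)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ambients_c, zth_healthy))
    a = sxy / sxx
    return a, my - a * mx

def baseline_zth(ambient_c, baseline):
    """Healthy thermal impedance expected at the given ambient temperature."""
    a, b = baseline
    return a * ambient_c + b

def is_degraded(zth_measured, ambient_c, baseline, threshold=0.01):
    """Flag degradation when the measurement exceeds the baseline by more
    than `threshold` (degC/W)."""
    return zth_measured - baseline_zth(ambient_c, baseline) > threshold
```

One baseline per time instant (e.g., t = 1 s and t = 2 s) would be stored, so that a measurement taken at any ambient temperature can be compared against the healthy value expected at that temperature.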
D. Thermal Impedance With Different Injection Currents
TTI curves are obtained to study the impact of injection currents (20, 40, and 60 A), as shown in Fig. 10. Test results do not match perfectly at different injection currents. This is because, at a higher injection current, more heat dissipates through the terminal leads and chip surface rather than along the downward conduction route as assumed. This effect needs to be factored in when evaluating the test results.
Since the thermal resistance of the DCB solder is only a fraction of the total junction-to-case thermal resistance, a high power is preferred to produce a large temperature gradient for heating tests. By doing so, the signal-to-noise ratio is improved and so is the measurement accuracy.
E. Effects of Bondwire Failures on Solder Fatigue
In practice, bond wires age at the same time as solder layers. Because of this coupling effect, wire failures and the resulting additional heat can speed up the solder layer fatigue. A series of TTI tests are performed with six, five, four, and three healthy bondwires. The bondwire breakage is realized by cutting off the wires one after another. As shown in Fig. 11, the power dissipation and junction temperature in the remaining wires increase as each wire is disconnected. However, the variation of thermal impedance is negligible in spite of the broken wire faults. This is because the majority of the diverted heat passes downward into the heat sink. This confirms that the TTI method remains effective for monitoring solder fatigue in the presence of bond wire lift-off failures.
F. Thermal Cycling Tests
A thermal aging test is conducted in a two-chamber thermal shocker with the temperature varying between −50 and 160 °C (see Fig. 12). The transition time is 2 min. The dwell time is set to 10 min, allowing the assembly to reach the maximum and minimum temperatures. The thermal cycling is interrupted at 800 and 1300 cycles for inspection.
The degradation of the solder layer between the DCB and the base plate is also examined by a C-mode scanning acoustic microscope (SAM). By detecting the acoustic signal reflected after an injection, the delamination patterns can be visually examined. Fig. 13 presents three SAM images of two 70 A 600 V IGBTs in one module when new (a), after 800 cycles (b), and after 1300 cycles (c). The remaining DCB solder area (given in %) is calculated by a grayscale image processing program in MATLAB. These images clearly show that the damage initiates around the circumferential area and propagates inwards to the center. In addition, it is also interesting to see the differences in the damage between the two IGBTs: less than 1% when new and 12% at both 800 and 1300 cycles. This is not uncommon in previous observations, suggesting a necessity for improving quality control in the packaging process of manufacturing power IGBT devices.
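The area calculation from a grayscale SAM image can be illustrated by simple thresholding. The sketch below is written in Python rather than MATLAB and is not the program used in this work. It assumes that delaminated regions appear brighter than intact solder and that a single fixed threshold separates the two, and the function name and threshold value are hypothetical.

```python
def remaining_solder_percent(image, threshold=128):
    """Percentage of pixels classified as intact solder.

    image     : 2-D list of grayscale values (0-255) covering the solder area
    threshold : pixels at or above this value are treated as delaminated
                (assumption: delamination shows up as bright reflections)
    """
    total = 0
    intact = 0
    for row in image:
        for pixel in row:
            total += 1
            if pixel < threshold:
                intact += 1
    if total == 0:
        raise ValueError("image contains no pixels")
    return 100.0 * intact / total
```

In practice the image would first be cropped to the DCB outline, and the threshold would be chosen from the image histogram rather than fixed in advance.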
Impedance results are illustrated in Fig. 14 , with a comparison of the thermal image results. A direct, positive correlation between the two methodologies can be clearly seen at 0, 800, and 1300 cycles since the two observe the same fatigue mechanism. As the delamination propagates deeper, the heat dissipation area shrinks and thus the thermal impedance increases. These changes are sufficiently large to be identified in this case. This may favor the data driven methods which do not rely on accurate lifelong thermal models. As for the die-attach solder layer, its degradation can also be detected with the developed D&P system but the magnitude and duration of the heating pulse should be optimized.
Moreover, the IGBT chip surface temperature under healthy (0 cycle) and aged conditions (1300 cycles) is also captured by a thermal camera for further comparison. As shown in Fig. 15(a), the maximum chip temperature rises from 117 °C in the initial state (healthy) to 129 °C in an aged state when the DCB solder changes from a healthy to a degraded state after 1300 thermal cycles. The temperature gradient over the IGBT chip surface varies from less than 50 °C in the initial state to more than 55 °C in the aged state, as demonstrated in Fig. 15(b).
V. CONCLUSION
IGBTs have become the most vulnerable component in the electrical drivetrain of EVs, and their failure would bring about safety and economic issues. This paper has addressed these by developing a D&P circuitry for IGBT solder fatigue detection.
Test results from simulation, experiments, scanning acoustic and thermal images have clearly shown that, with the gradual solder layer degradation, the junction-to-case thermal impedance and chip surface temperature will increase. High accuracy and sensitivity of IGBT terminal parameter and temperature measurements are necessary to evaluate thermal impedance so as to reveal such degradation.
The effectiveness of the developed technology is validated in terms of in situ monitoring the IGBT solder fatigue. The D&P functions can be embedded in the IGBT's gate drive unit (GDU) to improve system reliability and fault predictability. In order to minimize modifications to the system architecture and control algorithm, the measurement (junction temperature, voltage and/or current), protection (e.g., dc-bus voltage and voltage transients) and data postprocessing functions can be integrated into the GDU. The GDU can also control the selector relays to enable the sequential tests of the IGBTs, which can be conducted during stop-and-go traffic conditions or routine services. Although the paper has focused on automotive applications, the developed technology can be applied to many other applications such as wind turbines, smart grids, and industrial drives.
Presently, some intelligent power modules are equipped with on-chip temperature sensors for overtemperature protection. It is also possible to utilize these sensors for measuring junction temperature in the integrated GDU. Another tendency in IGBT packaging is to develop baseplate-free power modules which can eliminate the need for baseplates. However, in absence of baseplates, the temperature difference across the power chip increases and their thermal overloading capability is reduced. This may be improved by the latest sintering technology but the consequent manufacturing costs are high. Furthermore, these devices are only available from few selected manufacturers while the proposed technology can be applied to all standard IGBT modules available on the marketplace.
