### Reliability-Oriented Strategies for Multichip Module Based Mission Critical Industry Applications

Dissertation

for obtaining the academic degree of Doctor of Engineering (Dr.-Ing.) at the Faculty of Engineering of Kiel University and the Graduate Program in Electrical Engineering of the Federal University of Minas Gerais, presented by

M. Sc. Victor de Nazareth Ferreira

*Kiel* 2021

### Declaration

I declare in lieu of oath that I have completed the dissertation on the topic:

#### Reliability-Oriented Strategies for Multichip Module Based Mission Critical Industry Applications

apart from the supervision of Prof. Marco Liserre and Prof. Braz Cardoso, I have prepared this dissertation independently and without assistance, and I have not yet submitted it to any other body within the framework of an examination procedure, published it, or submitted it for publication, either in whole or in part. Furthermore, I hereby affirm that I have prepared the present dissertation in accordance with the rules of good scientific practice of the German Research Foundation (Deutsche Forschungsgemeinschaft) and that all passages taken verbatim from other authors, as well as explanations in my work that closely follow the reasoning of other authors, are specially marked and their sources indicated.

Kiel, 08. Dec 2020

Reviewer: Prof. Marco Liserre, Ph.D.
Reviewer: Prof. Braz Cardoso, Ph.D.

Date of the oral examination: 14.09.2021

## Acknowledgements

To my family. At the end of the day, it's all for you. Kiel in Dec 2020

## Contents

| Α  | bstra | ict     |                                                             | xii               |
|----|-------|---------|-------------------------------------------------------------|-------------------|
| R  | esum  | 0       |                                                             | xiv               |
| K  | urzfa | ssung   |                                                             | xvi               |
| Li | st of | Table   | S                                                           | xix               |
| Li | st of | Figur   | es                                                          | xxx               |
| A  | bbre  | viation | ı List                                                      | xxxi              |
| 1  | Intr  | oducti  | ion                                                         | 1                 |
|    | 1.1   | Missic  | on Critical Industry Applications                           | . 3               |
|    | 1.2   | Thern   | nal Mismatches in MCM-based Power Converters                | . 5               |
|    | 1.3   | Resear  | rch Proposal                                                | . 6               |
|    |       | 1.3.1   | Target 1 - Thermal Balancing in MCM-based Mission Critical  |                   |
|    |       |         | Power Converters                                            | . 6               |
|    |       | 1.3.2   | Target 2- Reliability-Oriented Design for MCM-based Mission |                   |
|    |       |         | Critical Applications                                       | . 7               |
|    | 1.4   | Thesis  | Structure                                                   | . 7               |
|    |       | 1.4.1   | Publications During the Doctoral Project                    | . 8               |
| 2  | The   | ermal I | Distribution and Reliability Impacts in Multichip Module    | <mark>s</mark> 11 |
|    | 2.1   | Introd  | uction                                                      | . 12              |

|   | 2.2                                                                                      | Therm                                                                                                                  | al Mismatches in Multichip Power Modules                                                                                                                                                                                                                                                                                                                                                                                                                                         | 13                                                                                                                                                         |
|---|------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
|   |                                                                                          | 2.2.1                                                                                                                  | Current Unbalance                                                                                                                                                                                                                                                                                                                                                                                                                                                                | 14                                                                                                                                                         |
|   |                                                                                          | 2.2.2                                                                                                                  | Thermal Cross-Coupling                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 17                                                                                                                                                         |
|   |                                                                                          | 2.2.3                                                                                                                  | Cooling Challenges                                                                                                                                                                                                                                                                                                                                                                                                                                                               | 18                                                                                                                                                         |
|   | 2.3                                                                                      | Therm                                                                                                                  | al Modeling of Multichip Modules                                                                                                                                                                                                                                                                                                                                                                                                                                                 | 19                                                                                                                                                         |
|   |                                                                                          | 2.3.1                                                                                                                  | Equivalent Thermal Networks                                                                                                                                                                                                                                                                                                                                                                                                                                                      | 19                                                                                                                                                         |
|   |                                                                                          | 2.3.2                                                                                                                  | FEM-based Thermal Modeling                                                                                                                                                                                                                                                                                                                                                                                                                                                       | 21                                                                                                                                                         |
|   |                                                                                          | 2.3.3                                                                                                                  | Thermal Validation                                                                                                                                                                                                                                                                                                                                                                                                                                                               | 25                                                                                                                                                         |
|   | 2.4                                                                                      | Therm                                                                                                                  | ally-Related Failures in Power Modules                                                                                                                                                                                                                                                                                                                                                                                                                                           | 27                                                                                                                                                         |
|   |                                                                                          | 2.4.1                                                                                                                  | Instabilities                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | 27                                                                                                                                                         |
|   |                                                                                          | 2.4.2                                                                                                                  | Severe Overloads                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | 28                                                                                                                                                         |
|   |                                                                                          | 2.4.3                                                                                                                  | Wear-out failures in power modules                                                                                                                                                                                                                                                                                                                                                                                                                                               | 29                                                                                                                                                         |
|   |                                                                                          | 2.4.4                                                                                                                  | Accumulated Degradation Unevenness in MCMs                                                                                                                                                                                                                                                                                                                                                                                                                                       | 30                                                                                                                                                         |
|   | 2.5                                                                                      | Propos                                                                                                                 | sed Solutions to Mitigate Thermal Mismatches in MCMs                                                                                                                                                                                                                                                                                                                                                                                                                             | 31                                                                                                                                                         |
|   |                                                                                          |                                                                                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                            |
|   | 2.6                                                                                      | Short                                                                                                                  | Summary of the Chapter                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 32                                                                                                                                                         |
| 3 | 2.6<br>Die                                                                               | Short<br>-Level                                                                                                        | Summary of the Chapter                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 32<br><b>33</b>                                                                                                                                            |
| 3 | 2.6<br>Die<br>3.1                                                                        | Short<br>-Level<br>Introd                                                                                              | Summary of the Chapter          Design for Reliability of MCM-based Converters         uction                                                                                                                                                                                                                                                                                                                                                                                    | 32<br><b>33</b><br>35                                                                                                                                      |
| 3 | 2.6<br>Die<br>3.1<br>3.2                                                                 | Short<br>-Level<br>Introd<br>Design                                                                                    | Summary of the Chapter       Output         Design for Reliability of MCM-based Converters         uction       Output         a for Reliability of Power Semiconductor Devices       Output                                                                                                                                                                                                                                                                                     | 32<br><b>33</b><br>35<br>36                                                                                                                                |
| 3 | <ul><li>2.6</li><li>Die</li><li>3.1</li><li>3.2</li></ul>                                | Short<br>-Level<br>Introd<br>Design<br>3.2.1                                                                           | Summary of the Chapter       Oesign for Reliability of MCM-based Converters         uction       of Network         a for Reliability of Power Semiconductor Devices       Oesign Semiconductor Devices         Mission Profile       Oesign Semiconductor Devices                                                                                                                                                                                                               | 32<br><b>33</b><br>35<br>36<br>36                                                                                                                          |
| 3 | <ul><li>2.6</li><li>Die</li><li>3.1</li><li>3.2</li></ul>                                | Short<br>-Level<br>Introd<br>Design<br>3.2.1<br>3.2.2                                                                  | Summary of the Chapter                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 32<br><b>33</b><br>35<br>36<br>36<br>37                                                                                                                    |
| 3 | <ul><li>2.6</li><li>Die</li><li>3.1</li><li>3.2</li></ul>                                | Short<br>-Level<br>Introd<br>Design<br>3.2.1<br>3.2.2<br>3.2.3                                                         | Summary of the Chapter                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 32<br><b>33</b><br>35<br>36<br>36<br>37<br>38                                                                                                              |
| 3 | <ul><li>2.6</li><li>Die</li><li>3.1</li><li>3.2</li></ul>                                | Short<br>-Level<br>Introd<br>Design<br>3.2.1<br>3.2.2<br>3.2.3<br>3.2.4                                                | Summary of the Chapter                                                                                                                                                                                                                                                                                                                                                                                                                                                           | <ul> <li>32</li> <li>33</li> <li>35</li> <li>36</li> <li>36</li> <li>37</li> <li>38</li> <li>40</li> </ul>                                                 |
| 3 | 2.6<br>Die<br>3.1<br>3.2                                                                 | Short<br>-Level<br>Introd<br>Design<br>3.2.1<br>3.2.2<br>3.2.3<br>3.2.4<br>3.2.5                                       | Summary of the Chapter                                                                                                                                                                                                                                                                                                                                                                                                                                                           | <ul> <li>32</li> <li>33</li> <li>35</li> <li>36</li> <li>36</li> <li>37</li> <li>38</li> <li>40</li> <li>42</li> </ul>                                     |
| 3 | 2.6<br>Die<br>3.1<br>3.2                                                                 | Short<br>-Level<br>Introd<br>Design<br>3.2.1<br>3.2.2<br>3.2.3<br>3.2.4<br>3.2.5<br>3.2.6                              | Summary of the Chapter                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 32<br><b>33</b><br>35<br>36<br>36<br>37<br>38<br>40<br>42<br>43                                                                                            |
| 3 | 2.6<br><b>Die</b><br>3.1<br>3.2                                                          | Short<br>-Level<br>Introd<br>Design<br>3.2.1<br>3.2.2<br>3.2.3<br>3.2.4<br>3.2.5<br>3.2.6<br>Die-Le                    | Summary of the Chapter                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 32<br><b>33</b><br>35<br>36<br>36<br>37<br>38<br>40<br>42<br>43<br>44                                                                                      |
| 3 | <ul> <li>2.6</li> <li>Die</li> <li>3.1</li> <li>3.2</li> <li>3.3</li> <li>3.4</li> </ul> | Short<br>-Level<br>Introd<br>Design<br>3.2.1<br>3.2.2<br>3.2.3<br>3.2.4<br>3.2.5<br>3.2.6<br>Die-Le<br>Design          | Summary of the Chapter                                                                                                                                                                                                                                                                                                                                                                                                                                                           | <ul> <li>32</li> <li>33</li> <li>35</li> <li>36</li> <li>36</li> <li>37</li> <li>38</li> <li>40</li> <li>42</li> <li>43</li> <li>44</li> </ul>             |
| 3 | <ul> <li>2.6</li> <li>Die</li> <li>3.1</li> <li>3.2</li> <li>3.3</li> <li>3.4</li> </ul> | Short<br>-Level<br>Introd<br>Design<br>3.2.1<br>3.2.2<br>3.2.3<br>3.2.4<br>3.2.5<br>3.2.6<br>Die-Le<br>Design<br>study | Summary of the Chapter         Design for Reliability of MCM-based Converters         uction         uction         for Reliability of Power Semiconductor Devices         Mission Profile         Converter Analysis         Thermal Analysis         Lifetime Analysis         Statistical Analysis for Reliability Prediction         System-Level Reliability         evel Design for Reliability         for Reliability of Mission Critical Industry Applications - A case | <ul> <li>32</li> <li>33</li> <li>35</li> <li>36</li> <li>36</li> <li>37</li> <li>38</li> <li>40</li> <li>42</li> <li>43</li> <li>44</li> <li>46</li> </ul> |

|   |     | 3.4.2   | Converter Analysis                                                                          | 49 |
|---|-----|---------|---------------------------------------------------------------------------------------------|----|
|   |     | 3.4.3   | Thermal Analysis                                                                            | 50 |
|   |     | 3.4.4   | Lifetime, Statistical and Reliability Analysis                                              | 50 |
|   | 3.5 | Short   | Summary of the Chapter                                                                      | 53 |
| 4 | The | ermal l | Balancing in Power Converters                                                               | 54 |
|   | 4.1 | Introd  | luction                                                                                     | 56 |
|   | 4.2 | Thern   | nal Control in Power Electronics Devices                                                    | 56 |
|   |     | 4.2.1   | Finite Control Set Model Predictive Control                                                 | 57 |
|   |     | 4.2.2   | Adaptive Dc-Link Voltage Control                                                            | 59 |
|   |     | 4.2.3   | Power Routing                                                                               | 59 |
|   | 4.3 | Power   | Routing in Multiphase Drives                                                                | 60 |
|   |     | 4.3.1   | Soft-Unbalance Operation of Multiphase Machines                                             | 61 |
|   |     | 4.3.2   | Power Routing in Multiphase Drives                                                          | 64 |
|   |     | 4.3.3   | Thermal Validation                                                                          | 65 |
|   |     | 4.3.4   | Reliability Analysis of the Power Routing in MCIAs                                          | 68 |
|   | 4.4 | Thern   | nal and Aging Monitoring of Power Semiconductor Devices                                     | 70 |
|   |     | 4.4.1   | Sensor-based Temperature Monitoring                                                         | 70 |
|   |     | 4.4.2   | Gate Voltage Monitoring                                                                     | 71 |
|   |     | 4.4.3   | Gate Resistance Monitoring                                                                  | 73 |
|   |     | 4.4.4   | Transient Monitoring                                                                        | 74 |
|   |     | 4.4.5   | Device Current Monitoring                                                                   | 76 |
|   |     | 4.4.6   | On-State Voltage Monitoring                                                                 | 77 |
|   |     | 4.4.7   | Comparison of Thermal and Aging Monitoring Strategies                                       | 87 |
|   | 4.5 | Valida  | ation of a $V_{on}$ -based Thermal Balancing Strategy                                       | 89 |
|   |     | 4.5.1   | $V_{on}$ -based $T_j$ Sensing                                                               | 90 |
|   |     | 4.5.2   | $V_{on}$ -based Thermal Balancing $\ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots$ | 91 |
|   | 4.6 | Short   | Summary of the Chapter                                                                      | 93 |

| 5 | Die        | Level Thermal Balancing in Multichip Modules                              | 95  |
|---|------------|---------------------------------------------------------------------------|-----|
|   | 5.1        | Introduction                                                              | 97  |
|   | 5.2        | The Multi-gate Multichip Structure                                        | 97  |
|   | 5.3        | Die-Level Thermal Balancing                                               | 98  |
|   |            | 5.3.1 Pulse-Shadowing Based Thermal Balancing                             | 99  |
|   |            | 5.3.2 Turn-off Losses Manipulation for Thermal Balancing                  | 108 |
|   |            | 5.3.3 Comparison of the Thermal Balancing Solutions                       | 115 |
|   | 5.4        | Indirect Thermal Balancing for Diodes                                     | 118 |
|   | 5.5        | Selective $T_j$ Sensing and the Pre-Programmed Thermal Balancing $\ldots$ | 120 |
|   | 5.6        | Reliability and Efficiency Analysis of the Die-Level Thermal Balancing in MCIA | 122 |
|   | 5.7        | Technical Analysis of the Proposed Solution                               | 126 |
|   |            | 5.7.1 Multi-Gate Driver Considerations                                    | 126 |
|   |            | 5.7.2 Power Level Margins                                                 | 127 |
|   | 5.8        | Short Summary of the Chapter                                              | 128 |
|   |            | 5.8.1 Evaluation System with Equivalent Multi-gate Multichip Module       | 129 |
| 6 |            | Conclusions and Future Research | 132 |
|   | 6.1        | Summary | 134 |
|   | 6.2        | Future Research I - Die-Level Thermal Balancing in Wide-Bandgap Devices | 136 |
|   | 6.3        | Future Research II - Degradation Control Through SOH of Power Semiconductor Devices | 136 |
| A |            | $V_{on}$-based Sensing Circuit Design | 137 |
|   | A.1        | Design of a $V_{on}$-based Sensing Circuit | 137 |
|   |            | A.1.1 Power Supply                                                        | 137 |
|   |            | A.1.2 Operational Amplifier                                               | 138 |
|   |            | A.1.3 Isolation                                                           | 138 |
|   |            | A.1.4 Transient Performance                                               | 139 |

|   |     | A.1.5   | Steady-State Performance                         | 140 |
|---|-----|---------|--------------------------------------------------|-----|
|   |     | A.1.6   | Moving Average Digital Filter for High Precision | 140 |
| B |     |         | Design and Control of the Adopted Electrical Drives | 142 |
|   | B.1 |         | Electrical Drive Design and Control | 142 |
|   |     | B.1.1   | Indirect Field Orientation Control               | 144 |
|   |     | B.1.2   | Control Tuning                                   | 145 |
|   |     | B.1.3   | Time Domain Response                             | 147 |

## Abstract

Availability is defined as the portion of time during which a system remains operational and able to serve its purpose. In mission critical applications (MCA), the availability of power converters is decisive for ensuring continuous productivity and avoiding financial losses. Multichip modules (MCM) are widely adopted in such applications because of their high power density and reduced price; however, the large number of dies inside a compact package results in critical thermal deviations among them. Moreover, uneven power flow, inhomogeneous cooling and accumulated degradation can cause thermal deviations among modules, which further increase the temperature differences and raise the temperature of specific subsets of devices. High temperatures influence multiple failure mechanisms in power modules, especially under highly dynamic load profiles. As a result, the higher failure probability of the hottest dies drastically reduces the reliability of mission critical power converters. This work therefore investigates reliability-oriented solutions for the design and thermal management of MCM-based power converters in mission critical applications. The first contribution is the integration of a die-level thermal and probabilistic analysis into the design for reliability (DFR) procedure, whereby the temperature and failure probability of each die are taken into account during reliability modeling. It is demonstrated that the die-level analysis yields a more realistic system-level reliability estimate for MCM-based power converters. Thereafter, three novel die-level thermal balancing strategies, based on a modified MCM with additional gate-emitter connections, are proposed and investigated. It is shown that the temperatures inside the MCM can be equalized and the maximum temperature reduced by up to 8%. Moreover, a power routing strategy for multiphase drives is presented for the first time, whereby thermal deviations of up to $10\,^{\circ}C$ can be considerably reduced without degrading the electromagnetic performance of the machine. A finite element model of a 24-die MCM is developed to validate the MCM thermal distribution as well as the proposed balancing strategies. As a complementary contribution, a very low noise on-state voltage ($V_{on}$) sensing circuit with a precision of $0.3\,mV$ is implemented, and its capability to close the loop of the thermal control strategies is demonstrated. The proposed solutions are also validated on an experimental setup comprising an equivalent three-chip multi-gate MCM, $V_{on}$-based junction temperature sensing and a high-resolution thermal camera. In addition, to evaluate the impact of the proposed strategies on the reliability and availability of MCA power converters, a case study based on a real high-power mission critical application is conducted. It is demonstrated that the power converter lifetime can be increased by up to 50% when the proposed thermal balancing strategies are adopted.

## Resumo

A disponibilidade é definida como a porção de tempo que o sistema permanece operacional para servir ao seu propósito. Em aplicações de missão crítica (MCA), a disponibilidade de conversores de energia é determinante para garantir a continuidade da produtividade e evitar perdas financeiras. Os Módulos Multichip (MCM) são amplamente adotados em tais aplicações devido à alta densidade de potência e preço reduzido; entretanto, o grande número de chips dentro de um pacote compacto resulta em desvios térmicos críticos entre eles. Além disso, o fluxo de energia desigual, o resfriamento não homogêneo e a degradação acumulada podem resultar em desvio térmico entre os módulos, aumentando assim as diferenças de temperatura e resultando em temperatura extra em subconjuntos específicos de dispositivos. As altas temperaturas influenciam vários mecanismos de falha nos módulos de potência, especialmente em perfis de carga altamente dinâmicos. Portanto, a probabilidade de falha mais alta dos chips mais quentes reduz drasticamente a confiabilidade dos conversores de energia de missão crítica. Desta maneira, este trabalho investiga soluções orientadas para confiabilidade para o projeto e gerenciamento térmico de conversores de energia baseados em MCM em aplicações de missão crítica. A primeira contribuição é a integração de uma análise térmica e probabilística no nível de chip no procedimento de projeto para confiabilidade (DFR), em que a temperatura e a probabilidade de falha de cada chip são levadas em consideração durante a modelagem da confiabilidade. É demonstrado que a análise em nível de chip pode obter uma confiabilidade em nível de sistema mais realista para conversores de energia baseados em MCM. Posteriormente, três novas estratégias de balanceamento térmico em nível de chip, com base em um MCM modificado - com mais conexões gate-emissor - são propostas e investigadas. É comprovado que as temperaturas dentro do MCM podem ser equalizadas e a temperatura máxima reduzida em até 8%. Além disso, uma estratégia de roteamento de energia para drives multifásicos é apresentada pela primeira vez, em que desvios térmicos de até $10\,^{\circ}C$ podem ser reduzidos consideravelmente sem degradar o desempenho eletromagnético da máquina. Um modelo de elementos finitos de um MCM de 24 chips é desenvolvido para validar a distribuição térmica do MCM, bem como as estratégias de balanceamento propostas. Como contribuição complementar, um circuito de medição da tensão de condução ($V_{on}$) de ruído muito baixo, com precisão de $0.3\,mV$, é implementado e sua capacidade de fechar a malha das estratégias de controle térmico é demonstrada. As soluções propostas também são validadas por um setup experimental que conta com um MCM equivalente de três chips com múltiplos gates, medição da temperatura de junção baseada em $V_{on}$ e uma câmera térmica de alta resolução. Além disso, para avaliar o impacto das estratégias propostas na confiabilidade e disponibilidade dos conversores de energia de MCA, é realizado um estudo de caso baseado em uma aplicação real de missão crítica de alta potência. É demonstrado que a vida útil do conversor de potência pode aumentar em até 50% quando as estratégias de balanceamento térmico propostas são adotadas.

## Deutsche Kurzfassung der Arbeit

Die Verfügbarkeit ist definiert als der Anteil der Zeit, in dem das System betriebsbereit bleibt, um seinen Zweck zu erfüllen. In missionskritischen Anwendungen (MCA) ist die Verfügbarkeit von Stromrichtern entscheidend, um die Produktivität zu gewährleisten und finanzielle Verluste zu vermeiden. Multichip-Module (MCM) werden in diesen Anwendungen aufgrund der hohen Leistungsdichte und des reduzierten Preises häufig eingesetzt. Die hohe Anzahl von Halbleitern in einem kompakten Gehäuse führt jedoch zu kritischen Abweichungen ihrer Temperaturen. Darüber hinaus führen ungleichmäßiger Leistungsfluss, inhomogene Kühlung und Alterung potenziell zu Abweichungen im thermischen Verhalten der Module, wodurch sich die Temperaturunterschiede erhöhen und in Teilen der Module erhöhte Temperaturen auftreten. Hohe Temperaturen beeinflussen dabei mehrere Ausfallmechanismen in Leistungshalbleitermodulen, insbesondere bei hochdynamischen Lastprofilen. Dadurch steigt die Ausfallwahrscheinlichkeit der heißesten Halbleiter und die Zuverlässigkeit von missionskritischen Leistungswandlern sinkt drastisch. Aus diesem Grund untersucht diese Arbeit zuverlässigkeitsorientierte Lösungen für den Entwurf und das thermische Management von MCM-basierten Leistungswandlern, die in missionskritischen Anwendungen eingesetzt werden. Der erste Beitrag ist die Integration einer thermischen und probabilistischen Analyse auf Chipebene in das DFR-Verfahren (Design for Reliability), wobei die Temperatur und die Ausfallwahrscheinlichkeit jedes Chips bei der Zuverlässigkeitsmodellierung berücksichtigt werden. Es wird gezeigt, dass mit der Analyse auf Chipebene eine realistischere Zuverlässigkeitsabschätzung von MCM-basierten Leistungswandlern auf Systemebene erreicht werden kann. Danach werden drei neuartige thermische Ausgleichsstrategien auf Chipebene vorgeschlagen und untersucht, die auf einem modifizierten MCM mit zusätzlichen Gate-Emitter-Verbindungen basieren. Es wird demonstriert, dass die Temperaturen innerhalb des MCM ausgeglichen und die maximale Temperatur um bis zu 8 % reduziert werden können. Darüber hinaus wird erstmals eine Power-Routing-Strategie für mehrphasige Antriebe vorgestellt, mit der thermische Abweichungen von bis zu $10\,^{\circ}C$ erheblich reduziert werden können, ohne den Maschinenbetrieb zu beeinflussen. Zur Validierung der thermischen Verteilung des MCM sowie der vorgeschlagenen Ausgleichsstrategien wird ein Finite-Elemente-Modell eines MCM mit 24 Chips entwickelt. Als zusätzlicher Beitrag wird ein sehr rauscharmer Schaltkreis zur Erfassung der Spannung im eingeschalteten Zustand ($V_{on}$) mit einer Genauigkeit von 0,3 mV implementiert und seine Fähigkeit zum Schließen des Regelkreises der thermischen Regelungsstrategien demonstriert. Die vorgeschlagenen Lösungen werden auch durch einen Versuchsaufbau validiert, der ein äquivalentes Drei-Chip-Multi-Gate-MCM, eine auf $V_{on}$ basierende Sperrschichttemperaturerfassung und eine hochauflösende Wärmebildkamera umfasst. Um die Auswirkungen der vorgeschlagenen Strategien auf die Zuverlässigkeit und Verfügbarkeit von MCA-Leistungswandlern zu evaluieren, wird außerdem eine Untersuchung auf der Grundlage einer realen missionskritischen Hochleistungsanwendung durchgeführt. Es wird gezeigt, dass die Lebensdauer von Leistungswandlern um bis zu 50 % erhöht werden kann, wenn die vorgeschlagenen thermischen Ausgleichsstrategien angewendet werden.

## List of Tables

| 2.1 | Simulation parameters for the validation of the thermal distribution | 25 |
|-----|-----------------------------------------------------------------------------|-----|
| 2.2 | Material characteristics of a power module | 29 |
| 3.1 | Fixed parameters in Bayerer's equation. | 41 |
| 3.2 | Fixed parameters of wire bond lifetime model | 41 |
| 3.3 | Simulation parameters | 49 |
| 4.1 | Rated and equivalent circuit parameters of the $690\,V/1.4\,MW$ nine-phase induction machine | 66 |
| 4.2 | Comparative analysis for the presented cases of the multiphase drive in the mine hoist system, showing the temperature and $B_{10}$ lifetime | 70 |
| 4.3 | Comparison of Thermal Sensitive Parameters. | 87 |
| 4.4 | Comparison among CM techniques | 88 |
| 4.5 | Validation parameters for the $V_{on}$-based thermal control | 92 |
| 5.1 | Die-Level proportional control parameters. | 103 |
| 5.2 | $E_{off}$ comparison of multiple devices considering a single device with 1 pu of current and two with 0.5 pu each | 112 |
| 5.3 | Comparative analysis for the presented thermal strategies, showing the highest thermal deviation and temperature among the dies | 117 |
| 5.4 | Efficiency comparison for the MCM operating with 0.33 pu, 0.66 pu and 1 pu of power, considering four operating conditions: without thermal balancing, pulse-shadowing, turn-on losses manipulation and turn-off losses manipulation. | 117 |
| 5.5 | Comparative analysis for the presented thermal strategies in the mine hoist system, showing the highest temperature, the $B_{10}$ lifetime, the energy consumed by losses alone in 15 years of operation ($E_{15}$), and their differences compared with the system with thermal deviations. | 126 |
| 5.6 | Validation parameters.                                                             | 130 |
| B.1 | Rated and equivalent circuit parameters of the $690 V / 1 MW$ three-               |     |


## List of Figures

| 1.1  | Steel industry plants (a) Rolling Mills (b) Wire Rods | 3  |
|------|------------------------------------------------------------------------------|----|
| 1.2  | Photo of the gold ore mine hoist plant | 4  |
| 1.3  | Mission profile of critical industry applications (a) Mine hoist systems (b) Steel rolling mills system. | 4  |
| 1.4  | Temperature distribution in a 24-die MCM obtained from FEM analysis for the same power and different cooling flows (a) $h = 4440\,W/(m^{2}K)$ (b) $h = 3960\,W/(m^{2}K)$. | 5  |
| 1.5  | Thesis organization and related publications | 9  |
| 2.1  | Power module structure. | 13 |
| 2.2  | Multichip power module: single-switch, $1.7\,kV/1600\,A$, 16 IGBTs and 8 diodes in a $140 \times 130\,mm$ structure | 14 |
| 2.3  | Equivalent circuit of single-switch MCM with N parallel dies | 15 |
| 2.4  | Thermal impacts on the transient time of Si IGBTs (a) Turn-on (b) Turn-off. | 16 |
| 2.5  | Impact of the temperature on the conductivity of Si IGBTs | 17 |
| 2.6  | Thermal cross-coupling effects: a heat source is generated in one device and the temperature spreads to its neighbors | 18 |
| 2.7  | Dimensions of a multichip power module with 36 dies | 18 |
| 2.8  | Dynamic thermal model of power modules with heatsink | 19 |
| 2.9  | Cauer thermal model | 20 |
| 2.10 | Foster thermal model. | 20 |

| 2.11 | Impedance curves used to obtain the power module equivalent Foster network | 20 |
|------|-------------------------------------------------------------------------------|----|
| 2.12 | Finite Elements Model of a 1700 V/1600 A single-switch MCM with 24 dies. | 22 |
| 2.13 | Superposition methodology to obtain the equivalent MCM thermal network, whereby the transient curves of all dies are obtained by heating a single die at a time. The process is done for each die, but specific ones are shown as examples: (a) T1 (b) T7 (c) D1 (d) D3. | 23 |
| 2.14 | Second-order exponential fitting of the transient temperature obtained through FEM analysis, $f(t) = a \cdot e^{bt} + c \cdot e^{dt}$. | 24 |
| 2.15 | Simulation process, where the thermal network is exported from the FEM software and embedded in the numeric model. The losses are extracted from the electrothermal model and fed back to the FEM software. The process works iteratively for losses and temperature analysis | 25 |
| 2.16 | FEM analysis of the 24-die MCM, showing a thermal deviation among the dies of $17.4\,^{\circ}C$ (a) Steady-state analysis (b) Transient analysis | 26 |
| 2.17 | Temperature on the 24-die MCM obtained from FEM analysis (a) Standard cooling system ($h = 4440\,W/(m^{2}K)$) (b) Defective cooling system ($h = 3960\,W/(m^{2}K)$) | 26 |
| 2.18 | Classification of failures in power modules based on root cause | 27 |
| 2.19 | Aging failures in power modules (a) Bond wire metallurgic damage, heel crack, fracture and liftoff. (b) Solder damages | 30 |
| 3.1  | Design for reliability of power semiconductor devices of a converter applied to mission critical applications | 37 |
| 3.2  | Switching losses extracted from the datasheet of a power semiconductor device: (a) IGBT Turn-on (b) IGBT Turn-off (c) Diode reverse-recovery | 30 |
| 3.3  | Conduction losses extracted from the datasheet of a power semiconductor device: (a) IGBT (b) Diode | 30 |
| 3.4  | Normal distribution with a confidence of $99.73\%$ | 42 |

| 3.5  | Unreliability curve with: $B_x$ - time at which the device reaches $x\%$ failure probability; $U_x$ - failure probability at a specific operation time | 44 |
|------|------------------------------------------------------------------------------------------|----|
| 3.6  | The reliability series association, which is adopted when a failure of a single device results in a system interruption | 44 |
| 3.7  | Die-Level design for reliability flowchart, including (in light green) the thermal deviation and the system-level reliability modeling of the MCM dies | 45 |
| 3.8  | Mine hoist system (a) General diagram (b) Photo of the electrical system of the mine hoist | 46 |
| 3.9  | Electrical measurements of the mine hoist during one trip, measured in the real field environment | 48 |
| 3.10 | Mission profile of the mine hoist system driven by an induction machine with FOC | 49 |
| 3.11 | DC-link current profile of the designed converter feeding the induction machine which drives the mine hoist system. | 50 |
| 3.12 | Junction temperature profile of one MCM in the 2-level inverter with (a) Foster thermal network (b) Matrix with cross-coupling impedance extracted from finite elements analysis | 51 |
| 3.13 | Monte-Carlo analysis to account for the failure distribution over time, considering parametric deviations in $V_{ce}$, without thermal deviation (purple, Foster thermal network) and with thermal deviation (black, FEM-based thermal network with cross-coupling) | 51 |
| 3.14 | Series-connected reliability modeling, whereby the failure of a single component means system interruption, for the procedures: (a) System-Level Design for Reliability (b) Die-Level Design for Reliability | 52 |
| 3.15 | Unreliability analysis of the multichip modules composing the mine hoist power converter, for system-level reliability (purple) and die-level reliability (black) approaches. A $B_{10}$ lifetime difference of $100\%$ is observed | 52 |

| 4.1  | Temperature related control variables and strategies for thermal control and power routing in power electronics devices. | 57 |
|------|--------------------------------------------------------------------------|----|
| 4.2  | Module and system layout: (a) One phase ANPC open module. (b) Simplified system model scheme of ANPC converter using model predictive control. | 58 |
| 4.3  |  | 59 |
| 4.4  | Adaptive minimum dc-link control to increase PV lifetime | 59 |
| 4.5  | Power Routing concept: the power is reduced in the most damaged cell to equalize the remaining useful lifetime. | 60 |
| 4.6  | (a) Current diagram of the nine-phase machine in balanced condition (b) Basic construction diagram of the 9PIM with two poles | 61 |
| 4.7  | Current diagram of a nine-phase machine in balanced condition, considering a reduction in phase $A_1$ - current $I_1$ | 62 |
| 4.8  | Magnetomotive force for the balanced and the re-calculated currents for the soft-unbalanced operation, showing a circular MMF in both conditions | 63 |
| 4.9  | Flowchart of the power routing for thermal balancing in nine-phase induction machines | 64 |
| 4.10 | Power routing diagram of a nine-phase machine, whereby the soft-unbalance currents are calculated to balance the temperatures and are imposed via a Z-subspace current control. The speed IFOC control is implemented in the dq synchronous frame in the same way as for an ordinary three-phase induction machine. The T9 matrix generates the voltage references for the 9PH modulator. | 65 |
| 4.11 | FEM-based thermal simulation schematic of the nine-phase machine. | 66 |

| 4.12 | Transient results of the power routing (triggered at $t = 15\,s$) in a 9PIM fed by a nine-phase inverter with 24-die MCMs and a defective heatsink in phase $A_1$ (a) Current in all phases, whereby the current in phase $A_1$ is reduced to relieve its thermal stress. (b) Temperature of the hottest dies in the MCMs of phases $A_1$ and $A_2$, showing the balancing when the power routing is activated. | 67 |
|------|-------------------------------------------------------------------------------|----|
| 4.13 | FEM analysis of the 24-die MCM in phases $A_1$ and $A_2$ of the nine-phase induction machine (a) Without power routing (b) With power routing. | 67 |
| 4.14 | Temperature profile for the die T7 of the 24-die MCM at the top of phase $A_1$ of the nine-phase converter applied to the mine hoist profile, considering three cases: standard heatsink (black), defective heatsink (red) and defective heatsink with power routing (blue). | 68 |
| 4.15 | Die-level reliability analysis for the mine hoist system driven by a multiphase machine, for the standard system (black), defective heatsink in phase $A_1$ (red) and power routing (blue) (a) Statistical Analysis (b) Unreliability Analysis. | 69 |
| 4.16 | Classification of the thermal sensitive and aging parameters. | 71 |
| 4.17 | Regulation of the collector current for $T_j$ estimation using $V_{ge,i}$ | 72 |
| 4.18 | The $T_j$ estimation by the internal gate resistance through a predefined dc current injection. | 73 |
| 4.19 | High power IGBT module parasitic-based equivalent circuit. | 75 |
| 4.20 | Gate impedance assist circuit. | 75 |
| 4.21 | The $T_j$ estimation by the saturation current through a regulated $V_{gs}$. | 76 |
| 4.22 | The $T_j$ estimation by the short-circuit current | 77 |
| 4.23 | The timeline of the proposed on-state voltage sensing circuits | 79 |
| 4.24 | The $V_{on}$ sensing circuit with clamping Zener diodes: (a) 0-600 V (b) 0-2000 V | 80 |

| 4.25 | The $V_{on}$ sensing circuit with series connection of a Zener and a diode to clamp the voltage. | 80 |
|------|------------------------------------------------------------------------------------------|----|
| 4.26 | The $V_{on}$ sensing circuit with series connection of diode to clamp the voltage and cascode current source. | 80 |
| 4.27 | The $V_{on}$ sensing circuit with series connection of high voltage diodes and single current source. | 81 |
| 4.28 | The $V_{on}$ sensing circuit based on a sensing current and a series connection of high voltage diodes and Zener | 81 |
| 4.29 | The $V_{on}$ sensing circuit based on a $V_{dc}$ clamping | 82 |
| 4.30 | The $V_{on}$ sensing circuit with active voltage clamping | 83 |
| 4.31 | The real-time $V_{on}$ sensing circuit with active voltage clamping | 83 |
| 4.32 | The off-line $V_{on}$ sensing circuit with relay-based voltage isolation | 84 |
| 4.33 | Low current $T_j$ calibration curve experimentally obtained from a 25 A IGBT (DP25H1200T01667) | 85 |
| 4.34 | Operating region of an IGBT depending on the collector current. The temperature does not impact $V_{ce}$ at one specific current, i.e. at the inflection point. | 86 |
| 4.35 | The $V_{on}$ sensing circuit with series connection of high voltage diodes, single current source, low noise dc power supply and isolation (a) Schematic (b) Board | 89 |
| 4.36 | $V_{ce}$-based $T_j$ under fixed current $I_c = 1\,A$ and temperature $T_j = 52\,^{\circ}C$ (a) Junction temperature measured via fiber optic sensors (b) $V_{avg}$ obtained from a fifty-sample moving average filter using a 1 MSPS ADC in burst mode (c) Fitting curve obtained with a fixed low current of $1\,A$. | 91 |
| 4.37 | Validation setup consisting of two buck converters in parallel with $V_{on}$ sensing capability, whereby one is placed over a heatsink to obtain a specific thermal deviation. A fiber optic system is used to measure the temperature on the chips and validate the strategy (a) Photo (b) Schematic. | 92 |

| Figure | Caption |
|--------|---------|
| 4.38 | $V_{on}$-based on-off thermal balancing strategy, whereby the junction temperatures are sensed and the device with the highest one has its PWM command blocked. |
| 4.39 | Experimental validation of the $V_{on}$-based thermal control. The junction temperature starts with a deviation of $9\,^{\circ}C$, and it is equalized after the thermal control is triggered at $t = 3\,s$. (a) $V_{on}$ of each switch (b) Measured temperature of each switch. |
| 5.1 | Multichip Modules (a) Standard Structure (b) Multi-gate Structure. |
| 5.2 | Proposed packaging with four equivalent switching groups. (a) Open package with switching groups (b) Package structure with four gate-emitter connections. |
| 5.3 | Full diagram of the thermal balancing in multichip multi-gate modules, with $V_{on}$-based $T_j$ sensing. |
| 5.4 | The pulse-processing approaches for die-level thermal balancing: pulse-shadowing, turn-on losses manipulation and turn-off losses manipulation. |
| 5.5 | The pulse-shadowing based thermal balancing pulse pattern. |
| 5.6 | Die-level proportional control based on average losses. |
| 5.7 | Frequency response for the equivalent thermal model. |
| 5.8 | Frequency response for the proportional die-level thermal control system (a) Dynamic stiffness (b) Magnitude gain and phase margin of the open-loop control system. |
| 5.9 | Transient thermal results of the pulse-shadowing strategy, showing the junction temperature of one die per switching group. The thermal balancing is triggered at $t = 20\,s$, and the temperatures are balanced. |
| 5.10 | Transient thermal results of the pulse-shadowing strategy considering 10% of variation on the $R_{th}$ of SW1, showing the junction temperature of one die per switching group. The thermal balancing is triggered at $t = 20\,s$, and the temperatures are balanced, with deviations below $1\,^{\circ}C$. |
| 5.11 | Control diagram of the on-off based die-level thermal control strategy. |
| 5.12 | Transient thermal results of the on-off control with pulse-shadowing strategy, showing the junction temperature of one die per switching group. The thermal balancing is triggered at $t = 20\,s$, and the temperatures are perfectly balanced. |
| 5.13 | Electrical results of the pulse-shadowing strategy, showing the gate commands $(S_x)$, one IGBT/diode current $(I_{s1})$, output voltage $(V_{out})$ and current $(I_{out})$. |
| 5.14 | Steady-state thermal analysis of the pulse-shadowing strategy: (a) Without balancing. (b) With balancing. |
| 5.15 | Turn-off process of an IGBT under inductive load. |
| 5.16 | Physical structure of an IGBT. |
| 5.17 | Schematic and flowchart of the two double pulse tests applied to the DUT to demonstrate the effects of $I_c$ (1 and 0.5 pu) on the turn-off time of IGBTs. |
| 5.18 | Experimental results for the turn-off of a $1.2\,kV/25\,A$ IGBT for different currents. (a) $25\,A$ (b) $12.5\,A$. |
| 5.19 | Pulse pattern of the turn-off losses manipulation strategy, whereby the hottest device (or SW) has its duty time reduced and is turned off before the others, in soft-switching. |
| 5.20 | Implementation schematic of the on-off control based on turn-off losses manipulation for thermal balancing, with online $t_{off-red}$ calculation. |
| 5.21 | Transient thermal results of the on-off control with pulse-shadowing strategy, showing the junction temperature of one die per switching group. The thermal balancing is triggered at $t = 20\,s$, and the temperatures are perfectly balanced. |
| 5.22 | Electrical results of the turn-off losses manipulation, showing the gate commands $(S_x)$, one IGBT/diode current $(I_{s1})$, output voltage $(V_{out})$ and current $(I_{out})$. |
| 5.23 | Steady-state thermal analysis of the turn-off losses manipulation strategy: (a) Without balancing. (b) With balancing. |
| 5.24 | Finite element analysis comparison for the presented thermal balancing strategies: (a) Without Balancing (b) Pulse-Shadowing (c) Turn-on Losses (d) Turn-off Losses. |
| 5.25 | Flowchart of the thermal stress reduction in MCM diodes by acting on the thermal spread. |
| 5.26 | Finite element analysis of the thermal balancing for MCMs in reverse power flow applications: (a) Without thermal balancing (b) With thermal balancing. |
| 5.27 | Flowchart of the pre-programmed thermal balancing for load cycling applications, with selective $T_j$ sensing process under reduced load. |
| 5.28 | Validation of the die-level $V_{ce}$ sensing, pre-programmed thermal balancing and transient performance of the turn-off losses manipulation strategy. The pulse pattern is defined with $7.5\,A$ by sensing the $T_j$, and it is repeated for $15\,A$ and $23\,A$, keeping the temperatures balanced during the whole power cycling. (a) Current profile. (b) Junction temperature of each device. |
| 5.29 | (a) Dc-link current of the mine hoist system mission profile under reduced load, whereby the pulse patterns of the thermal balancing strategies are obtained with a full $T_j$ sensing capability, as shown in green. The adopted thermal balancing strategies, according to the power flow, are also highlighted in red. (b) Junction temperature of each die inside the MCM for the mission profile with reduced load, whereby the temperatures are sensed, balanced and the pulse pattern is defined. |
| 5.30 | FEM-based temperature analysis for the mine hoist mission profile considering the following thermal balancing strategies: (a) None (b) Pulse-Shadowing in direct power flow and diode stress reduction in reverse power flow (c) Turn-on Losses in direct power flow and diode stress reduction in reverse power flow (d) Turn-off Losses in direct power flow and diode stress reduction in reverse power flow. |
| 5.31 | Die-level reliability analysis for the mine hoist system considering the presented thermal balancing strategies (a) Statistical Analysis (b) Unreliability Analysis. |
| 5.32 | Hardware schematic of the multi-gate driver structure for control and monitoring of multi-gate MCMs. |
| 5.33 | Validation setup consisting of an equivalent multi-gate multichip power module on two heatsinks, driver board, $V_{ce}$ sensing, variable load and infrared camera: (a) Wide view of the open three-die module, $V_{ce}$ sensing and drivers. (b) Schematic. |
| 5.34 | Validation system with equivalent multi-gate multichip modules: (a) Output waveforms (b) Thermal distribution on the equivalent multichip module. |
| A.1 | Transient performance of the $V_{\rm cm}$ sensing circuit, using a square wave generator with duty cycle of 0.5 and switching frequencies of: (a) $5\,kHz$ (b) $10\,kHz$ (c) $15\,kHz$ (d) $20\,kHz$. |
| A.2 | Steady-state validation of the sensing circuit showing an output voltage for $V_{on}$ (pink) and $V_{filter}$ (green) with a fluctuation of around $5\,mV$ (a) Full view (b) Zoom. |
| B.1 | Induction machine equivalent magnetic circuit. |
| B.2 | Simulation of the induction machine with equivalent magnetic circuit parameters under nominal load (a) Line voltages (b) Stator currents. |
| B.3 | Photo of the electrical system of the mine hoist. |
| B.4 | Indirect field orientation control for a three-phase induction machine. |
| B.5 | Frequency response of the system, for the current loop. |
| B.6 | Frequency response of the system, for the speed loop. |
| B.7 | Time domain response of the IFOC applied in the selected induction machine: (a) Reference and measured speed (b) Electrical and load torque. |

# Symbol and Abbreviation List

| Abbreviation | Meaning                                     |
|--------------|---------------------------------------------|
| MCM          | Multichip Modules;                          |
| MCIA         | Mission Critical Industry Applications;     |
| DFR          | Design for Reliability;                     |
| CM           | Condition Monitoring;                       |
| TB           | Thermal Balancing;                          |
| ATC          | Active Thermal Control;                     |
| DL           | Die-Level;                                  |
| ac           | Alternating Current;                        |
| dc           | Direct Current;                             |
| MMF          | Magnetomotive Force;                        |
| FEM          | Finite Element Model;                       |
| EOL          | End of Life;                                |
| PDF          | Probability Density Function;               |
| CDF          | Cumulative Distribution Function;           |
| TSEP         | Temperature-Sensitive Electrical Parameters; |
| APP          | Aging Precursor Parameter;                  |
| DUT          | Device Under Test;                          |
| ADC          | Analog-to-Digital Converter;                |
| SW           | Switching Group;                            |
| IGBT         | Insulated Gate Bipolar Transistor;          |
| LC           | Lifetime Consumption;                       |
| RUL          | Remaining Useful Lifetime;                  |
| PWM          | Pulse-Width Modulation;                     |

| Symbol          | Meaning                              |
|-----------------|--------------------------------------|
| $T_j$           | Junction Temperature;                |
| $T_{jm}$        | Mean Junction Temperature;           |
| $\Delta T_j$    | Thermal Cycle Amplitude;             |
| $V_{on}$        | On-State Voltage;                    |
| $V_{ge-th}$     | Gate-Emitter Threshold Voltage;      |
| $V_{ge}$        | Gate-Emitter Voltage;                |
| $V_{ce}$        | Collector-Emitter Voltage;           |
| $V_{dc}$        | Dc-link Voltage;                     |
| $V_{ds}$        | Drain-Source Voltage;                |
| $I_c$           | Collector Current;                   |
| $R_{g-int}$     | Internal Gate Resistance;            |
| $R_{g-ext}$     | External Gate Resistance;            |
| $t_{d-off}$     | Turn-off Time Delay;                 |
| $L_{kE}$        | Kelvin-Emitter Parasitic Inductance; |
| $I_{css}$       | Saturation Current;                  |
| $I_{sc}$        | Short-Circuit Current;               |
| $V_f$           | Diode Forward Voltage Drop;          |
| $V_{iso}$       | Isolated On-State Voltage;           |
| $V_{fil}$       | Filtered On-State Voltage;           |
| $C_{gc}$        | Gate-Collector Capacitance;          |
| $C_{ox}$        | Oxide Capacitance;                   |
| $V_{gp}$        | Miller-Plateau Voltage;              |
| $\Delta I_{ch}$ | MOS-Channel Current;                 |
| $W_{cd}$        | Depletion Layer;                     |
| $T_L$           | Load Torque;                         |
| n               | Machine Speed;                       |
| η               | Efficiency;                          |
| $I_L$           | Load Current;                        |
| $F_{System}$    | Component Unreliability;             |
| $F_{System}$    | System-Level Unreliability;          |
| FIT             | Failures in Time;                    |
| 9PIM            | Nine-phase Induction Machine;        |

Chapter 1

# Introduction

# Contents

| Section | Title                                                                              |
|---------|------------------------------------------------------------------------------------|
| 1.1     | Mission Critical Industry Applications                                             |
| 1.2     | Thermal Mismatches in MCM-based Power Converters                                   |
| 1.3     | Research Proposal                                                                  |
| 1.3.1   | Target 1 - Thermal Balancing in MCM-based Mission Critical Power Converters        |
| 1.3.2   | Target 2 - Reliability-Oriented Design for MCM-based Mission Critical Applications |
| 1.4     | Thesis Structure                                                                   |
| 1.4.1   | Publications During the Doctoral Project                                           |

## 1.1 Mission Critical Industry Applications

In industry, a single failure may lead to high financial losses due to long production pauses, parts replacement, travel for maintenance and penalty charges [1]. However, the high power demand and critical load variation of many deep mining extraction and steel industry processes challenge the design of their power electronics systems. Steel is nowadays the world's most important material, with an annual production of over 1.5 billion tons [2]. The rolling mill, shown in Fig. 1.1 (a), is a widely used process, whereby the mill is responsible for controlling the strip speed within precise limits to provide high production and quality of the final product [2]. Wire rod plants, shown in Fig. 1.1 (b), are composed of 30 stands responsible for reducing the cross-section of a metal bloom, and produce steel for automobile components, barbed wires, wire ropes and hardware manufacturing [3]. In such applications, vibration in the finishing blocks reduces the quality of the coil formation. Therefore, a ripple-free, robust torque control is required for the best wire-rod winding formation [3].

Deep gold ore extraction is carried out over several levels, whereby the ore is drilled and transported to its first crushing stage. After that, it is transported to the surface, and according to the characteristics of the ore body, there are guidelines indicating the proper method of transportation. In gold deposits of more than 500 meters in depth, for example, the electrically driven mine hoist, shown in Fig. 1.2, is the preferred solution [4]. The mine hoist is responsible for transporting the entire gold ore production and the mining workers; therefore, its reliability is of utmost importance to ensure continuity of service and preserve human lives.

Figure 1.1: Steel industry plants (a) Rolling Mills (b) Wire Rods.

Figure 1.2: Photo of the gold ore mine hoist plant.

In addition to the high reliability requirements, the vertical hoist acceleration and the bidirectional rolling process with material thickness variation result in very critical mission profiles, as shown in Fig. 1.3. The torque $(T_L)$ and velocity $(w^*)$ dynamics, together with occasional overload conditions, result in highly dynamic power variation and, ultimately, critical thermal cycling for the power semiconductor devices.



Figure 1.3: Mission profile of critical industry applications (a) Mine hoist systems (b) Steel rolling mills system.

## 1.2 Thermal Mismatches in MCM-based Power Converters

Multichip power modules (MCM) are widely adopted in mission critical industry applications due to their high power density, ease of maintenance and installation [5, 6]. The MCM is an attractive solution that contains a plurality of chips inside the same package, shortening the interconnection time while decreasing weight and size [7]. Consequently, the MCM has been widely adopted in high power density applications above hundreds of amperes [8]. Conversely, the incessant desire for miniaturization has been shrinking the package area without reducing (or even while increasing) the number of chips, thereby resulting in parametric deviations of circuit and device, thermal cross-coupling, inhomogeneous thermal resistances and, ultimately, thermal mismatches among the dies and modules, as shown in Fig. 1.4 [7, 9, 10, 11, 12, 13, 14]. Indeed, high temperatures affect multiple failure mechanisms of power modules and have been reported as the root cause of most failures in industry [1, 15, 16, 17, 18, 19, 20, 21]. Consequently, extra temperature in a subset of devices facing the critical thermal cycling of mission critical industry applications reduces the reliability of their power electronics converters [1, 20, 21].



Figure 1.4: Temperature distribution in a 24-die MCM obtained from FEM analysis for the same power and different cooling flows (a) $h = 4440\,W/(m^{2}\,K)$ (b) $h = 3960\,W/(m^{2}\,K)$.

## 1.3 Research Proposal

This work proposes and investigates reliability-oriented solutions to mitigate the effects of thermal mismatches and improve the reliability of MCM-based mission critical industry applications (MCIA). To this end, four novel strategies to balance the power among dies and modules of non-modular high power MCM-based converters are presented and investigated in depth. Furthermore, the addition of a die-level thermal and failure probability analysis is proposed to improve the design for reliability of MCM-based power converters.

### 1.3.1 Target 1 - Thermal Balancing in MCM-based Mission Critical Power Converters

In the last 30 years, many solutions have been proposed to solve thermal unbalance in MCMs, such as thermal optimization design, modified layouts and optimized water cooling systems [7, 9, 22, 23, 24, 25, 26, 27]. Nevertheless, the increasing power density demand and the advent of new technologies still affect the thermal balancing in multichip modules, and critical thermal deviations have been recently reported [28, 29]. Besides the thermal mismatches among the dies, inhomogeneous air cooling, the aging process and power unbalance among phases can potentially result in thermal deviation among the modules of high power converters, as shown in Fig. 1.4. In modular systems, active thermal control and power routing strategies have been proposed to redistribute the power among their modules and relieve the most thermally stressed devices [30, 31, 32]. However, the standard structure of MCMs and the non-modularity of three-phase power converters limit the application of such strategies.

Therefore, the first research target is to propose and investigate solutions to balance the temperature among dies and modules in MCM-based power converters. Thereby, a die-level thermal balancing based on a more flexible MCM structure is introduced, and its capability to increase the reliability of MCM-based power converters is investigated. Moreover, a power routing strategy for multiphase drives is proposed, whereby the power is distributed to overcome thermal mismatches among different MCMs without affecting the electromagnetic machine performance.

### 1.3.2 Target 2 - Reliability-Oriented Design for MCM-based Mission Critical Applications

The design for reliability (DFR) has been widely applied to power converters in mission critical applications, whereby power modules and cooling systems are designed to achieve a predefined lifetime [33, 34, 35, 36]. The existing methods, however, neither consider the thermal mismatches nor account for the high number of devices prone to failure, which potentially results in less realistic reliability levels in MCM-based power converters.

Therefore, the second research target is to propose a die-level thermal and probabilistic analysis in the design for reliability of MCM-based power converters. In this proposal, the temperature of each die is obtained through finite element analysis, rather than using simplistic thermal models. Moreover, the failure probability of each die is calculated, and the individual probabilities are combined to obtain the system-level reliability of the power converter.
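As a minimal sketch of the combination step, the die-level failure probabilities can be merged under two assumptions made here purely for illustration: the converter behaves as a series system (any die failure causes a converter failure) and the die failures are statistically independent. The function name `system_unreliability` and the numerical values are hypothetical and not taken from this thesis.

```python
from math import prod


def system_unreliability(die_unreliabilities):
    """Combine die-level failure probabilities F_i into a system-level value.

    Assumption (illustrative): series system with independent die failures,
    so that F_sys = 1 - prod(1 - F_i).
    """
    for f in die_unreliabilities:
        if not 0.0 <= f <= 1.0:
            raise ValueError("each failure probability must lie in [0, 1]")
    # The system survives only if every die survives.
    return 1.0 - prod(1.0 - f for f in die_unreliabilities)


# Illustrative values only: 24 dies, each with a 1 % failure probability.
f_sys = system_unreliability([0.01] * 24)
print(f"System-level unreliability: {f_sys:.4f}")
```

Under these assumptions, the system-level unreliability is always at least as large as that of the most stressed die, which is why neglecting the number of dies tends to give optimistic results.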

#### **Complementary Contributions**

As a complementary contribution, the capability of an on-state voltage $(V_{on})$ based junction temperature $(T_j)$ sensing circuit to provide feedback for thermal balancing strategies is experimentally demonstrated.

## 1.4 Thesis Structure

This thesis is organized in six chapters, as shown in Fig. 1.5. In Chap. 2, the influences of temperature on the failure mechanisms of power semiconductor devices are explained. The first contribution is presented in Chap. 3, whereby the impact of a die-level thermal and reliability analysis on the design for reliability procedure is demonstrated. Thereafter, in Chap. 4, the state of the art of the methods to monitor and control the temperature in power devices is described, and a power routing strategy for multiphase drives is proposed. Furthermore, a $V_{on}$-based $T_j$ sensing circuit is implemented, and its capability to provide feedback for thermal balancing strategies is demonstrated. Based on the presented thermal control solutions, pulse processing based thermal balancing strategies are presented in Chap. 5. In addition, the impact of the thermal balancing on the lifetime and efficiency of mission critical industry applications is evaluated. Finally, the conclusions are stated and future research topics based on this thesis are suggested in Chap. 6.

### 1.4.1 Publications During the Doctoral Project

#### Conference

The conference publications correlated to the doctoral research project are summarized as follows:

- [K1] Power Routing to Enhance the Lifetime of Multiphase Drives (ECCE 2019)
- [K2] Active Redundancy in the Low Voltage Stage of Smart Transformers (ECCE 2018)

#### Journal

The journal publications correlated to the doctoral research topic are summarized as follows:

- [J1] Design and Selection of High Reliability Converters for Mission Critical Industrial Applications: A Rolling Mill Case Study (Transactions on Industry Applications)
- [J2] Mission Critical Analysis and Design of IGBT-based Power Converters Applied to Mine Hoist Systems (Transactions on Industry Applications)
- [J3] Soft-Unbalance Operation for Power Routing in Multiphase Drives (Transactions on Industry Applications)
- [J4] Pulse-Shadowing based Thermal Balancing in Multichip Modules (Transactions on Industry Applications)
- [J5] Selective Soft-Switching in Multichip Systems (Journal of Emerging and Selected Topics in Power Electronics)

Figure 1.5: Thesis organization and related publications.

Chapter 2

# Thermal Distribution and Reliability Impacts in Multichip Modules

# Contents

| Section | Title                                                    |
|---------|----------------------------------------------------------|
| 2.1     | Introduction                                             |
| 2.2     | Thermal Mismatches in Multichip Power Modules            |
| 2.2.1   | Current Unbalance                                        |
| 2.2.2   | Thermal Cross-Coupling                                   |
| 2.2.3   | Cooling Challenges                                       |
| 2.3     | Thermal Modeling of Multichip Modules                    |
| 2.3.1   | Equivalent Thermal Networks                              |
| 2.3.2   | FEM-based Thermal Modeling                               |
| 2.3.3   | Thermal Validation                                       |
| 2.4     | Thermally-Related Failures in Power Modules              |
| 2.4.1   | Instabilities                                            |
| 2.4.2   | Severe Overloads                                         |
| 2.4.3   | Wear-out Failures in Power Modules                       |
| 2.4.4   | Accumulated Degradation Unevenness in MCMs               |
| 2.5     | Proposed Solutions to Mitigate Thermal Mismatches in MCMs |
| 2.6     | Short Summary of the Chapter                             |

## 2.1 Introduction

Multichip power modules (MCMs) are widely adopted in high power converters, and they are expected to remain the standard solution for the foreseeable future [37, 38]. MCMs are indeed an attractive solution with small size and short commutation loops; however, the high number of dies in a limited space results in thermal deviations and, consequently, extra temperature in specific dies [8, 17, 39, 40, 41]. Indeed, high temperatures have a strong influence on multiple failure mechanisms of power devices, and have been reported as the root cause of most failure events in industry [1, 15, 16, 17, 18, 19, 20, 21]. This chapter presents the causes and reliability impacts of extra temperature in specific dies of a multichip module. To demonstrate the uneven thermal distribution, a finite element model of a 24-die MCM is constructed.

## 2.2 Thermal Mismatches in Multichip Power Modules

For high power semiconductor devices, two options are available in the market: press-pack technology and power modules. Although the former has high power density, low failure rate and small thermal resistances, the multichip power module technology has been widely adopted, mainly because of its lower price and ease of maintenance and installation [42]. As shown in Fig. 2.1, the power module is composed of several layers. A single-die power module, for example, is composed of a silicon (Si) chip soldered onto a direct copper-bonded (DCB) substrate and contacted on the top side by aluminum (Al) bond wires. The DCB insulates the chip from the base plate and conducts the dissipated heat to the cooling system [43].



Figure 2.1: Power module structure.



Figure 2.2: Multichip power module: single-switch, $1.7 \, kV / 1600 \, A$, 16 IGBTs and 8 diodes in a $140 \times 130 \, mm$ structure.

To achieve high current capability, a multichip module structure is adopted, whereby several semiconductors are connected in parallel inside the same package. The MCM is an attractive solution that reduces size and shortens the interconnections of parallel devices to obtain very compact high current switches of up to 3600 A [7]. Fig. 2.2 shows a 1.7 kV / 1600 A single-switch module, whereby 16 IGBTs and 8 diodes are embedded in a 140 x 130 mm structure. Despite these advantages, the design of an MCM is quite challenging, and multiple factors contribute to thermal mismatches among the dies, as detailed below.

### 2.2.1 Current Unbalance

Transient and static current unbalances directly impact the performance of MCMs, thereby generating thermal mismatches among the devices. Many factors influence the current distribution among the parallel devices, such as circuit mismatches, device parametric variations and temperature influences, as detailed below.

#### **Circuit Mismatches**

The equivalent circuit of parallel devices is shown in Fig. 2.3. The switching loop ($L_C$) and common emitter ($L_E$) stray inductances are the main causes of current unbalance in this structure [44]. $L_E$ affects the switching characteristics, whereby the device with the larger $L_E$ turns on and off more slowly, taking less current during turn-on and more current during turn-off. Conversely, $L_C$ affects the on-state voltage ($V_{on}$) during the switching transient, thereby affecting the on-state current balance under inductive load current [44].



Figure 2.3: Equivalent circuit of single-switch MCM with N parallel dies.

A comprehensive analysis of the circuit influence on transient currents is conducted in [40], whereby the stray inductances are measured for a six-chip structure at five different frequencies. At 10 kHz, the stray inductance inside the module can vary between 34 nH and 86 nH, resulting in peak current deviations of up to 200 A during the turn-on process. Consequently, the devices with lower equivalent series inductance can reach up to twice the switching losses of the others.

The influence of the Kelvin emitter on current sharing is also studied, showing that a higher parasitic asymmetry leads to a higher current unbalance and to a loss increase of up to 20 % for a three-chip structure [14].

#### **Temperature Influences**

The gate-emitter threshold voltage $V_{ge-th}$ is a crucial parameter influencing the current distribution among parallel devices [45, 46]. This parameter is related to the Fermi energy $\phi_{FB}$, which in turn depends on the junction temperature as shown in (2.1) [47]:

$$\phi_{FB} = \frac{kT_j}{q} ln \frac{N_{Amax}}{n_i} \tag{2.1}$$

Therefore, the correlation of $V_{ge-th}$ with $\phi_{FB}$ shown in (2.2) makes the gate-emitter threshold voltage also dependent on the temperature, with a variation defined in (2.3) [46]:

$$V_{ge(th)} = -V_{ms} - \frac{Q_{ss}}{C_{ox}} + 2\phi_{FB} + \frac{\sqrt{2\epsilon_o\epsilon_{si}qN_{Amax}(2\phi_{FB})}}{C_{ox}} \tag{2.2}$$

$$\frac{dV_{ge(th)}}{dT_j} = \left[\frac{\phi_{FB}}{T_j} - \frac{k}{q}\left(\frac{E_g}{2kT_j} + 1.5\right)\right]\left(2 + \frac{\sqrt{2\epsilon_o\epsilon_{si}qN_{Amax}(2\phi_{FB})}}{2\phi_{FB}C_{ox}}\right) \tag{2.3}$$

Therefore, from (2.3), it can be concluded that the gate threshold voltage decreases as $T_j$ rises. $V_{ge-th}$, in turn, influences the turn-on delay time $t_{d-on}$, as shown in (2.4):

$$t_{d-on} = -R_G(C_{GE} + C_{GC}) \cdot \ln\left(1 - \frac{V_{ge-th}}{V_{ge}}\right) \tag{2.4}$$
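As an illustration of how (2.4) translates a threshold mismatch between parallel dies into a delay mismatch, the following minimal sketch evaluates the expression for a few threshold values. The gate resistance, capacitances, drive voltage and thresholds are hypothetical numbers chosen only for illustration, and the function name `turn_on_delay` is not from the original work.

```python
import math


def turn_on_delay(r_g, c_ge, c_gc, v_ge_th, v_ge):
    """Turn-on delay from (2.4): t = -R_G (C_GE + C_GC) ln(1 - V_th / V_ge)."""
    if not 0 < v_ge_th < v_ge:
        raise ValueError("threshold must lie between 0 and the gate drive voltage")
    return -r_g * (c_ge + c_gc) * math.log(1.0 - v_ge_th / v_ge)


# Hypothetical values, chosen only for illustration.
R_G = 2.0      # gate resistance in ohm
C_GE = 100e-9  # gate-emitter capacitance in F
C_GC = 5e-9    # gate-collector capacitance in F
V_GE = 15.0    # gate drive voltage in V

# Three parallel dies with slightly different thresholds.
for v_th in (5.5, 6.0, 6.5):
    t = turn_on_delay(R_G, C_GE, C_GC, v_th, V_GE)
    print(f"V_ge-th = {v_th:.1f} V -> t_d-on = {t * 1e9:.1f} ns")
```

With the RC product held fixed, the die with the lowest threshold reaches it first and therefore starts conducting earlier than its neighbors.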

From the variation of $V_{ge-th}$ with $T_j$ given in (2.3), the variation of $t_{d-on}$ with $T_j$ can be obtained as shown in (2.5) [46]:

$$\frac{dt_{d-on}}{dT_j} = -\tau_1 \left(\frac{V_{ge(th)}}{V_{ge} - V_{ge(th)}}\right) \cdot \frac{dV_{ge(th)}}{dT_j} > 0 \tag{2.5}$$

The derivation of the turn-off delay time is similar to that of $t_{d-on}$ and, according to (2.5), both are increasing functions of the junction temperature. Fig. 2.4 shows the thermal impact on the turn-on and turn-off times of an IGBT, obtained from reference [46]. When a parallel IGBT has a shorter $t_{d-on}$ or a longer $t_{d-off}$, it takes more current than the others, thereby influencing the transient current during the switching process of parallel devices [46].

Figure 2.4: Thermal impacts on the transient time of Si IGBTs (a) Turn-on (b) Turn-off.

Considering the carrier mobility as a decreasing function of $T_j$, $V_{on}$ also increases with the temperature, which directly influences the static current balancing, as shown in (2.6) [46]. At high load conditions, the on-state conductivity has a negative temperature coefficient; therefore, the temperature acts to reduce the thermal mismatch, since hotter devices have lower conductivity and carry less current [48]. The temperature impact on a Si IGBT device is shown in Fig. 2.5, which is obtained from reference [48].



Figure 2.5: Impact of the temperature on the conductivity of Si IGBTs.

#### 2.2.2 Thermal Cross-Coupling

Thermal cross-coupling is defined as the impact of the heat spreading of one device on its neighbors, which mainly depends on the power level and chip positioning [9]. Thereby, for a specific power and chip distance, one device induces a proportional thermal stress on the others. Consequently, several chips processing high power at very close distances result in critical cross-coupling effects [9]. In multichip modules, which are designed for high power density, the devices are placed relatively close to each other, as shown in Fig. 2.6. As a result, the heat spread of one device has a strong influence on the temperature of its neighbors.

Figure 2.6: Thermal cross-coupling effects: heat is generated in one device and the temperature spreads to its neighbors.

#### 2.2.3 Cooling Challenges

In MCMs, the power dissipated per unit is higher, and removing the heat from the structure is more challenging [13]. In addition, some chips can be up to 18 cm apart, as shown in Fig. 2.7, which makes it hard to obtain homogeneous cooling across the whole structure [13]. In air cooling systems, for example, the chips near the fan have a lower thermal resistance than those at the opposite end.



Figure 2.7: Dimensions of a multichip power module with 36 dies.

### 2.3 Thermal Modeling of Multichip Modules

The thermal model of a power module is composed of an equivalent circuit based on the thermal impedance ($Z_{th}$) of its different layers. $Z_{th(x,y)}$ is defined as the temperature difference measured between points x and y, divided by a step change of power dissipation [49]. As shown in Fig. 2.8, the impedances $Z_{th(j-c)}$, $Z_{th(c-h)}$ and $Z_{th(h-a)}$ refer to the heat flow from the die to the module case, from the case to the heatsink, and from the heatsink to the environment, respectively [50].
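Written out, the definition of the thermal impedance between two points reads as follows. This is only a restatement of the sentence above; $T_x(t)$ and $T_y(t)$ denote the temperatures at points x and y, and $P_{step}$ is a symbol introduced here for the amplitude of the power step.

$$Z_{th(x,y)}(t) = \frac{T_x(t) - T_y(t)}{P_{step}}$$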



Figure 2.8: Dynamic thermal model of power modules with heatsink.

#### 2.3.1 Equivalent Thermal Networks

The equivalent thermal model, in turn, is usually represented by a resistance-capacitance (RC) thermal network, whereby thermal resistances $R_{th}$ and capacitances $C_{th}$ are connected in cascade to represent the thermal behavior of the materials [51]. There are two traditional ways to obtain the thermal model of a power module, as described below.

#### Cauer Model

The first one is the Cauer model, which is based on the geometry and material properties of each layer, giving a physical meaning to each thermal resistance and capacitance of the power module. As shown in Fig. 2.9, the number of RC elements is defined by the number of layers within the IGBT [51].



Figure 2.9: Cauer thermal model.
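To make the link between layer geometry and the Cauer elements concrete, the sketch below computes one RC pair per layer. It assumes purely one-dimensional heat flow without heat spreading, so that $R_{th} = d/(\lambda A)$ and $C_{th} = s \cdot d \cdot A$, where $s$ is the volumetric heat capacity. The conductivity and heat storage values follow Tab. 2.2 (reading the heat storage column as kJ/(m^3 K)), while the thicknesses, the area and the function name `cauer_layer` are hypothetical.

```python
def cauer_layer(thickness_m, area_m2, conductivity, heat_storage):
    """Return (R_th in K/W, C_th in J/K) of one layer, assuming 1-D heat flow."""
    r_th = thickness_m / (conductivity * area_m2)
    c_th = heat_storage * thickness_m * area_m2
    return r_th, c_th


# (name, thickness [m], conductivity [W/(m K)], heat storage [J/(m^3 K)])
# Material data as listed in Tab. 2.2; thicknesses are hypothetical.
LAYERS = [
    ("Si chip", 200e-6, 148.0, 1650e3),
    ("solder", 100e-6, 70.0, 1670e3),
    ("Cu (DCB top)", 300e-6, 394.0, 3400e3),
]
AREA = 1e-4  # 1 cm^2 footprint, hypothetical

for name, d, lam, s in LAYERS:
    r, c = cauer_layer(d, AREA, lam, s)
    print(f"{name:14s} R_th = {r * 1e3:7.3f} mK/W   C_th = {c * 1e3:7.3f} mJ/K")
```

In a real module the heat spreads laterally as it travels through the stack, so the effective area grows from layer to layer; the constant-area assumption here is a deliberate simplification.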

#### Foster Model

The Foster model shown in Fig. 2.10, however, is a mathematical approximation without any physical meaning [51]. This model is obtained by fitting parameters to transient thermal tests, such as the one shown in Fig. 2.11. The analytical formula that relates the thermal impedance to the thermal resistances is given in (2.7), whose parameters are in general provided in datasheets:

$$Z_{th(j-c)}(t) = \sum_{i=1}^{n} R_{i}\left(1 - e^{-t/\tau_{i}}\right) \qquad ; \qquad \tau_{i} = R_{i} C_{i} \tag{2.7}$$
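The Foster expression in (2.7) is straightforward to evaluate numerically, with the time $t$ in the exponent. The sketch below does so for a hypothetical four-term network; the $(R_i, \tau_i)$ pairs are invented for illustration and are not taken from any datasheet, and the name `foster_zth` is an assumption.

```python
import math


def foster_zth(t, pairs):
    """Z_th(t) of a Foster network per (2.7); pairs = [(R_i [K/W], tau_i [s]), ...]."""
    return sum(r * (1.0 - math.exp(-t / tau)) for r, tau in pairs)


# Hypothetical four-term network (not taken from any datasheet).
PAIRS = [(0.002, 0.001), (0.005, 0.02), (0.008, 0.1), (0.003, 1.0)]

for t in (1e-3, 1e-2, 1e-1, 1.0, 10.0):
    print(f"t = {t:6.3f} s  Z_th = {foster_zth(t, PAIRS) * 1e3:.3f} mK/W")

# For t much larger than every tau_i the impedance approaches sum(R_i).
print("steady state:", sum(r for r, _ in PAIRS))
```

The curve starts at zero and saturates at the sum of the resistances, which corresponds to the steady-state thermal resistance.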



Figure 2.10: Foster thermal model.



Figure 2.11: Impedance curves used to obtain the power module equivalent foster network.

#### Heatsink Thermal Modeling

As previously shown in Fig. 2.1, the power module is placed over a heatsink to increase its heat dissipation capability. The heatsink, in turn, is commonly aided by a cooling system to increase its heat transfer capability and achieve high power density. Forced air is the standard solution for applications with low cooling demand; however, to achieve a high thermal performance, fluid cooled systems are required [50]. Thereby, the selection of the cooling system, with its specific flow rate, pressure, concentration and fluid temperature, directly affects the steady-state thermal resistance ($R_{th}$) and the thermal time constant ($\tau$). Although many factors influence the dynamics of a heatsink system, its thermal model is commonly obtained with the aforementioned Cauer and Foster approaches.

#### 2.3.2 FEM-based Thermal Modeling

Although equivalent thermal networks have been adopted for the design of power converters, they cannot reflect the effects of thermal mismatches inside a multichip module. In addition, a die-level thermal analysis is not possible, since only an overall junction temperature can be calculated. Therefore, to obtain a more accurate thermal behavior of multichip modules and validate the thermal distribution, a finite element method (FEM) based model is implemented in this work. Hence, the 24-die MCM structure previously shown in Fig. 2.2 is modeled in the FEM software Ansys Icepak. Fig. 2.12 shows a cross section of the model, as well as the dimensions and parameters, where d is the device thickness, $\lambda$ the thermal conductivity and h the homogeneous heat transfer coefficient [52].

#### Equivalent Thermal Network Extraction

Even though the FEM-based analysis is quite precise, it is not realistic for real-time simulations due to the high computational effort and very long run times. Nevertheless, it is possible to obtain an equivalent thermal network with self and cross-coupling thermal impedances from the FEM model by using superposition [22, 52]. The resulting system of equations can then be embedded in an electrothermal simulation software, enabling a faster FEM-based co-simulation procedure.

Figure 2.12: Finite element model of a 1700 V / 1600 A single-switch with 24 dies.

#### The Superposition Methodology

The thermal performance of power modules is defined by the thermal impedance $Z_{th(j-a)}$ from the device to the environment. This information is provided by manufacturers in documents such as datasheets, which in the case of MCMs give the $Z_{th(j-a)}$ of the whole parallel string. Although effective for single-die devices, this parameter is quite limited for MCMs, because multiple dies dissipate power inside the same package [53].

One way to obtain an effective thermal network is the superposition approach, which accounts for the multiple heat sources inside the package to obtain a matrix of thermal impedances. The principle of superposition to generate the steady-state thermal resistance matrix $R_{th(j-a)}$ was introduced in references [22, 23, 54, 55], and extended in [52] to include the transient behavior $Z_{th(j-a)}$.

In this approach, a known power is applied - in a controlled environment - to one specific die at a time, and the temperature rise of the whole group is measured at every step. The matrix representing the self and coupling impedances ($Z_{th-x-y}$) of each die can thus be obtained, where x-y represents the effect of die y on the impedance of die x. This matrix, in turn, can be used to calculate the junction temperature of each single die ($T_{jx}$) for a respective dissipated power ($P_x$) and ambient temperature ($T_a$), as shown in the equation system below:

Figure 2.13: Superposition methodology to obtain the equivalent MCM thermal network, whereby the transient curves of all dies are obtained by heating a single die at a time. The process is done for each die, but specific ones are shown as examples: (a) T1 (b) T7 (c) D1 (d) D3.

$$\begin{bmatrix} T_{j1} \\ T_{j2} \\ \vdots \\ T_{j24} \end{bmatrix} = T_a + \begin{bmatrix} Z_{th-1,1} & Z_{th-1,2} & \cdots & Z_{th-1,24} \\ Z_{th-2,1} & Z_{th-2,2} & \cdots & Z_{th-2,24} \\ \vdots & \vdots & \ddots & \vdots \\ Z_{th-24,1} & Z_{th-24,2} & \cdots & Z_{th-24,24} \end{bmatrix} \cdot \begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_{24} \end{bmatrix} \tag{2.8}$$
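The structure of (2.8) can be illustrated with a reduced example. The sketch below evaluates the steady-state form of the system for three dies placed in a row, instead of 24; all coupling values, losses and the ambient temperature are hypothetical, and the function name `junction_temperatures` is an assumption.

```python
def junction_temperatures(z_matrix, losses, t_ambient):
    """T_j[x] = T_a + sum_y Z[x][y] * P[y], the row-wise form of (2.8)."""
    if any(len(row) != len(losses) for row in z_matrix):
        raise ValueError("each matrix row needs one entry per die")
    return [t_ambient + sum(z * p for z, p in zip(row, losses)) for row in z_matrix]


# Hypothetical steady-state values for three dies in a row (K/W).
# Diagonal terms are self-impedances, off-diagonal terms are cross-coupling.
Z = [
    [0.10, 0.03, 0.01],
    [0.03, 0.10, 0.03],
    [0.01, 0.03, 0.10],
]
P = [200.0, 200.0, 200.0]  # losses per die in W
T_A = 40.0                 # ambient temperature in deg C

for die, tj in enumerate(junction_temperatures(Z, P, T_A), start=1):
    print(f"T_j{die} = {tj:.1f} C")
```

Even with identical losses, the middle die ends up hotter than the outer ones because it receives coupling from two close neighbors, which is the effect the full 24 x 24 matrix captures.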

Then, the superposition methodology is applied to the FEM model to extract the impedance matrix of the 24-die multichip module. A power loss of 200 W is injected into one die of the FEM model at a time, and the transient curves of every device are obtained at each step. Fig. 2.13 shows the transient response of all dies, whereby four dies (T1, T7, D1 and D3) are adopted as examples and fed with 200 W, one at a time. The 24 heating curves obtained - one per single die - are then loaded into numerical software and fitted to obtain the impedance matrix terms [52]. As shown in Fig. 2.14, each impedance term is represented by a second-order exponential equation.



Figure 2.14: Second-order exponential fitting of the transient temperature obtained through FEM analysis, $f(t) = a \cdot e^{bt} + c \cdot e^{dt}$.

#### **FEM-based Electrothermal Simulation**

The developed FEM-based simulation structure is shown in Fig. 2.15, with a three-phase converter containing six equivalent MCMs designed in an electrothermal software. To represent the second-order exponential thermal behavior of each die, an equivalent Foster circuit is implemented using the terms obtained from the fitting process shown in Fig. 2.14. The simulation works iteratively: the losses are obtained from the electrothermal software and used to feed the equation system in (2.8) to calculate the junction temperatures $T_j$. The junction temperatures, in turn, are fed back to the model so that the desired outputs, losses and temperatures, are computed correctly.



Figure 2.15: Simulation process, where the thermal network is exported from FEM software and embedded in numeric model. The losses are extracted from the electrothermal model and fed back to FEM software. The process works iteratively for losses and temperature analysis.
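The iterative exchange between losses and temperatures can be sketched as a fixed-point loop. The example below is only a minimal illustration under strong assumptions: it uses steady-state coupling terms for three dies, and a hypothetical linear loss model $P = P_{ref}(1 + k(T_j - T_{ref}))$ stands in for the electrothermal software. The function names and all numerical values are invented for this sketch.

```python
def junction_temps(z_matrix, losses, t_ambient):
    """Row-wise evaluation of (2.8) for steady-state coupling terms."""
    return [t_ambient + sum(z * p for z, p in zip(row, losses)) for row in z_matrix]


def solve_electrothermal(z_matrix, p_ref, t_ambient, k_temp=0.003,
                         t_ref=25.0, tol=1e-6, max_iter=100):
    """Iterate losses -> temperatures -> losses until T_j stops changing.

    Losses follow a hypothetical linear model
    P = P_ref * (1 + k_temp * (T_j - T_ref)).
    """
    t_j = [t_ambient] * len(p_ref)
    for _ in range(max_iter):
        losses = [p * (1.0 + k_temp * (t - t_ref)) for p, t in zip(p_ref, t_j)]
        t_new = junction_temps(z_matrix, losses, t_ambient)
        if max(abs(a - b) for a, b in zip(t_new, t_j)) < tol:
            return t_new, losses
        t_j = t_new
    raise RuntimeError("electrothermal iteration did not converge")


# Hypothetical three-die example (K/W, W, deg C).
Z = [
    [0.10, 0.03, 0.01],
    [0.03, 0.10, 0.03],
    [0.01, 0.03, 0.10],
]
P_REF = [200.0, 200.0, 200.0]
T_A = 40.0

temps, losses = solve_electrothermal(Z, P_REF, T_A)
for die, (t, p) in enumerate(zip(temps, losses), start=1):
    print(f"die {die}: T_j = {t:.2f} C, P = {p:.2f} W")
```

Because the hotter middle die also dissipates more under this loss model, the converged temperature spread is slightly larger than the one obtained with temperature-independent losses.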

#### 2.3.3 Thermal Validation

To demonstrate the thermal distribution in a 24-die MCM, the FEM-based electrothermal simulation is carried out with the parameters listed in Tab. 2.1. Fig. 2.16 shows the steady-state and transient response obtained in the FEM software, with the losses calculated in the electrothermal software. The power losses of all dies are then applied simultaneously in the FEM software as a step. As can be seen, there is a thermal deviation of $17.4 \,^{\circ}C$ between the middle and the edge transistors ($T_x$). Moreover, the middle diodes ($D_x$) show a higher temperature than the edge transistors even with reduced losses - around 70% - due to the strong cross-coupling effects at their critical position in the MCM.

Table 2.1: Simulation parameters for the validation of the thermal distribution.

| Parameter   | Value  |
|-------------|--------|
| $V_{dc}$    | 980 V  |
| $V_{out}$   | 690 V  |
| $I_{out}$   | 1000 A |
| $F_{sw}$    | 5 kHz  |
| $cos(\phi)$ | 1      |

#### Thermal Deviation Among Modules

Figure 2.16: FEM analysis of the 24-die MCM, showing a thermal deviation among the dies of $17.4 \,^{\circ}C$ (a) Steady-state analysis (b) Transient analysis.

As mentioned and demonstrated above, multiple factors such as thermal cross-coupling, current deviation, inhomogeneous cooling and uneven aging result in thermal mismatches among the dies and, ultimately, reduced lifetime. Moreover, the same factors can also cause thermal mismatches among different modules, which are not necessarily in parallel. To demonstrate this effect, a reduction of roughly 10% in the heat transfer coefficient, representing a defect in the water cooling system, is simulated in the FEM model with the same procedure and parameters described above. Fig. 2.17 (a) shows the thermal results obtained in the FEM software with the standard cooling ($h = 4440 \, W/(m^2 K)$) and Fig. 2.17 (b) the results for the defective cooling ($h = 3960 \, W/(m^2 K)$). As can be seen, there are temperature differences of around $6 \,^{\circ}C$, thereby resulting in a total thermal deviation, from the hottest to the coldest die of the two modules, of $22.4 \,^{\circ}C$.

Figure 2.17: Temperature on the 24-die MCM obtained from FEM analysis (a) Standard cooling system ($h = 4440 \, W/(m^2 K)$) (b) Defective cooling system ($h = 3960 \, W/(m^2 K)$).
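A first-order feeling for the effect of the reduced heat transfer coefficient can be obtained from the convective resistance of a uniformly cooled surface, $R = 1/(hA)$. The sketch below applies this relation to the two values of $h$ used in the text; the cooled area and the function name `convective_resistance` are hypothetical, and the relation ignores heat spreading inside the base plate, which the FEM model does capture.

```python
def convective_resistance(h, area_m2):
    """Thermal resistance in K/W of a uniformly cooled surface: R = 1 / (h * A)."""
    if h <= 0 or area_m2 <= 0:
        raise ValueError("h and area must be positive")
    return 1.0 / (h * area_m2)


AREA = 0.02          # cooled area in m^2, hypothetical
H_STANDARD = 4440.0  # W/(m^2 K), standard cooling (value from the text)
H_DEFECT = 3960.0    # W/(m^2 K), defective cooling (value from the text)

r_std = convective_resistance(H_STANDARD, AREA)
r_def = convective_resistance(H_DEFECT, AREA)
print(f"standard : {r_std * 1e3:.3f} mK/W")
print(f"defective: {r_def * 1e3:.3f} mK/W")
print(f"increase : {(r_def / r_std - 1.0) * 100:.1f} %")
```

The relative increase in resistance does not depend on the assumed area, since the area cancels in the ratio of the two resistances.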

### 2.4 Thermally-Related Failures in Power Modules

As validated in the previous section, the thermal mismatches in MCMs result in extra temperature in a subset of devices. Indeed, the temperature has an influence on multiple failure mechanisms of power semiconductor devices, and has been reported as the root cause of most failures in industry [1, 20, 21]. This section examines the correlations between temperature and the failure mechanisms in power semiconductor devices. In power modules, failures can be classified as wear-out or catastrophic, as shown in the flowchart of Fig. 2.18. Looking at the root causes, aging failures are basically due to the wear-out of electronic parts, mostly at the package level, whereas catastrophic failures basically come from severe overloads and instabilities [15].



Figure 2.18: Classification of failures in power modules based on root cause [15].

#### 2.4.1 Instabilities

Instabilities are strongly related to internal aspects and are characterized by a loss of control that leads to destruction, i.e., their evolution does not depend on the external circuit but only on regenerative phenomena internal to the device [15]. Instabilities are very dangerous phenomena that must be deeply understood before using power devices. It is worth noting that neither protection circuits nor control strategies can help in avoiding instabilities, because there is no clear external evidence when they occur [15]. However, understanding the root causes may help to avoid potential events.

High temperatures have been reported as a root cause of instabilities in the off-state of fast switching diodes [56]. It is demonstrated that the increasing leakage current, due to a higher temperature level, can provoke thermal runaway and damage a small region of the junction peripheral surface, thereby resulting in a catastrophic short-circuit event. In power MOSFETs, high temperatures can be decisive for a thermal runaway when the gate voltages are lower than the stability boundary [57]. The contribution of temperature to instabilities of IGBTs under unclamped inductive switching (UIS) has also been presented. Experimental results have shown a negative resistance under high temperatures, allowing current unbalance in a few cells and inducing hot spots in specific regions of the die [58, 59].

#### 2.4.2 Severe Overloads

The temperature can also influence failures triggered by severe overloads. When the device exceeds its blocking voltage limit, for example, the high electric field accelerates mobile carriers and generates new ones through impact ionization, thereby creating a significant current flow in the depletion region [47]. As a result, an avalanche breakdown occurs, increasing the current and disabling the voltage blocking capability of the power device. High temperatures contribute to the generation of carriers and can potentially trigger a dynamic avalanche breakdown at a reduced electric field, i.e., at a lower voltage level [17, 60]. This phenomenon poses a higher risk in multichip modules, where thermal deviations can result in a dynamic avalanche breakdown of the hotter devices [17].

The tendency for current filaments during an IGBT turn-off with a current significantly higher than the nominal one is investigated in [19]. That work demonstrates that the heating during the turn-off is the main reason for a non-extinguishing current filament and the probable cause of device destruction under overcurrent. Furthermore, the temperature can also initiate a thermal activation process according to the Arrhenius law, triggering a degradation process in the dies [47]. In fact, the devices have a temperature limit for safe operation, which in general varies between $125 \,^{\circ}C$ and $175 \,^{\circ}C$.
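The Arrhenius law mentioned above is often used in its acceleration-factor form, $AF = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)\right]$, which compares the rate of a thermally activated process at two absolute temperatures. The sketch below evaluates it for a die running 17.4 K hotter than a reference die. The activation energy of 0.7 eV is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_low_c, t_high_c, e_a_ev):
    """Ratio of thermally activated rates at t_high_c versus t_low_c (deg C)."""
    t_low = t_low_c + 273.15
    t_high = t_high_c + 273.15
    return math.exp(e_a_ev / K_B_EV * (1.0 / t_low - 1.0 / t_high))


E_A = 0.7       # activation energy in eV, hypothetical
T_REF = 125.0   # reference die temperature in deg C
T_HOT = 142.4   # die running 17.4 K hotter

af = arrhenius_acceleration(T_REF, T_HOT, E_A)
print(f"acceleration factor for +{T_HOT - T_REF:.1f} K: {af:.2f}")
```

The factor is greater than one whenever the second temperature is higher, so the hotter dies of an MCM degrade faster under such a mechanism.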

#### 2.4.3 Wear-out Failures in Power Modules

The power module is based on several layers consisting of different materials with different coefficients of thermal expansion, heat conductivity and heat storage, as summarized in Tab. 2.2 [50]. The previously shown Fig. 2.1 displays the structure of a single-die power module, which is composed of a silicon (Si) chip soldered onto a direct copper-bonded (DCB) substrate and contacted on the top side by aluminum (Al) bond wires. The DCB insulates the chip from the base plate and conducts the dissipated heat to the cooling system, which in turn is composed of a heatsink and commonly aided by a forced air or water cooling system [43].

| Material        | Heat conductivity $[W/(m \cdot K)]$ | Heat storage $[kJ/(m^3 \cdot K)]$ | Coefficient of thermal expansion $[10^{-6}/K]$ |
|-----------------|-------------------------------------|-----------------------------------|------------------------------------------------|
| Silicon         | 148                                 | 1650                              | 4.1                                            |
| Copper          | 394                                 | 3400                              | 17.5                                           |
| Aluminum        | 230                                 | 2480                              | 17.5                                           |
| Silver          | 407                                 | 2450                              | 19                                             |
| Molybdenum      | 145                                 | 2575                              | 5                                              |
| Solders         | 70                                  | 1670                              | 15-30                                          |
| $Al_2O_3$ DBC   | 24                                  | 3025                              | 8.3                                            |
| AlN DBC         | 180                                 | 2435                              | 15-30                                          |
| AlSiC (75% SiC) | 180                                 | 2223                              | 7                                              |

Table 2.2: Material characteristics of a power module [50].

The temperature is the main factor influencing the aging of power semiconductor devices, whereby repetitive thermal cycling causes expansion and shrinkage at different rates in the power module materials. As a result, crack growth at the bond wire/chip interface and the propagation of cracks or voids between the module substrates ultimately result in failure by bond wire liftoff or solder fatigue, as shown in Fig. 2.19 [16, 61].
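The different expansion rates can be quantified, to first order, by the unconstrained mismatch strain between two bonded materials, $\Delta\varepsilon = |\alpha_1 - \alpha_2| \cdot \Delta T$. The sketch below applies this to two material pairs using the expansion coefficients listed in Tab. 2.2. The temperature swing is hypothetical, the function name is an assumption, and the estimate ignores stiffness, geometry and plasticity, so it only indicates which interfaces are loaded the most.

```python
def mismatch_strain(alpha_1, alpha_2, delta_t):
    """First-order CTE mismatch strain between two bonded materials."""
    return abs(alpha_1 - alpha_2) * delta_t


# Coefficients of thermal expansion in 1/K, as listed in Tab. 2.2.
CTE = {"Si": 4.1e-6, "Cu": 17.5e-6, "Al2O3 DBC": 8.3e-6}
DELTA_T = 60.0  # temperature swing in K, hypothetical

for a, b in (("Si", "Cu"), ("Al2O3 DBC", "Cu")):
    strain = mismatch_strain(CTE[a], CTE[b], DELTA_T)
    print(f"{a} / {b}: strain = {strain * 1e6:.0f} ppm per cycle")
```

Since the strain scales linearly with the temperature swing, the hotter dies of an MCM accumulate more deformation per cycle at the same interfaces.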



Figure 2.19: Aging failures in power modules (a) Bond wire metallurgic damage, heel crack, fracture and liftoff. (b) Solder damages [16].

Thermal cycling can also result in reconstruction of the aluminum metallization, due to its different CTE compared to the Si chip. Reconstruction is usually more evident in the middle of the dies, where the temperature reaches its maximum [62]. Indeed, surface reconstruction in the middle of a chip operating above $110 \,^{\circ}C$ has been demonstrated. Conversely, no noticeable effect has been observed on its periphery, which operates at a lower temperature [62].

#### 2.4.4 Accumulated Degradation Unevenness in MCMs

As mentioned above, the power module wear-out is directly related to the temperature; therefore, the many factors influencing thermal deviations ultimately result in degradation unevenness inside MCMs [40, 63, 64]. In long-term cycling this problem becomes worse, because aging can increase the device parametric and circuit mismatches, thereby resulting in a cumulative degradation process [44]. Imperfections caused by oxide traps at the atomic level can change $V_{ge-th}$, thereby decreasing the device turn-on time, increasing losses and temperature, and accelerating the aging. Delamination, in turn, can influence $R_{on}$, changing the conduction losses and the static current distribution among the devices [65]. The liftoff of wire bonds can also impact the current distribution in MCMs by increasing the ohmic resistance of specific devices [66]. Changes in the case temperature ($T_c$) distribution after aging are also reported to impact the thermal unevenness inside MCMs [67]. Considering a 9-chip MCM, where temperature sensors are placed near each chip, the non-uniformity is more significant after aging, and an extra deviation of $6.37 \,^{\circ}C$ is measured among the dies. Therefore, case imperfections ultimately contribute to the degradation unevenness inside MCMs, and this problem becomes worse over time [67].

### 2.5 Proposed Solutions to Mitigate Thermal Mismatches in MCMs

The thermal mismatch has been addressed in the design of power modules to improve the thermal and current distribution in MCMs. To reduce static and transient current unbalance, a modified split-output DCB layout is proposed, whereby the common stray inductance and the di/dt mismatches are reduced [68]. To overcome thermal cross-coupling effects, an optimum design based on the position of each chip is proposed in [9]. Indeed, the thermal cross-coupling effects can be eliminated by respecting a predefined spacing among the chips during the design process. Nevertheless, the demand for increasing power density makes it challenging to keep this distance among the dies. To solve inhomogeneous cooling problems, optimized direct water cooling systems appear as a solution. However, their higher cost, size and reduced reliability contradict the philosophy of safe power electronics miniaturization [69]. As a result, the reliability of MCMs is still an open point even after 30 years of multidisciplinary research.

### 2.6 Short Summary of the Chapter

This chapter has presented the thermal deviations in MCMs and their reliability impacts on MCM-based systems. Circuit and parametric deviations have been reported as root causes of current unbalance in parallel devices, which ultimately contributes to the thermal deviation. A finite element model of a 24-die MCM has shown a thermal deviation of around $17 \,^{\circ}C$. Moreover, the influence of temperature on aging and catastrophic failure events has been presented, explaining why the temperature is responsible for most failures in industry. The approaches proposed so far to solve the related issues indicate that the problem is still open for alternative solutions.

Chapter 3

# Die-Level Design for Reliability of MCM-based Converters

# Contents

| Section | Title                                                                            |
|---------|----------------------------------------------------------------------------------|
| 3.1     | Introduction                                                                     |
| 3.2     | Design for Reliability of Power Semiconductor Devices                            |
| 3.2.1   | Mission Profile                                                                  |
| 3.2.2   | Converter Analysis                                                               |
| 3.2.3   | Thermal Analysis                                                                 |
| 3.2.4   | Lifetime Analysis                                                                |
| 3.2.5   | Statistical Analysis for Reliability Prediction                                  |
| 3.2.6   | System-Level Reliability Analysis                                                |
| 3.3     | Die-Level Design for Reliability                                                 |
| 3.4     | Design for Reliability of Mission Critical Industry Applications - A case study  |
| 3.4.1   | Mission Profile                                                                  |
| 3.4.2   | Converter Analysis                                                               |
| 3.4.3   | Thermal Analysis                                                                 |
| 3.4.4   | Lifetime, Statistical and Reliability Analysis                                   |
| 3.5     | Short Summary of the Chapter                                                     |

### 3.1 Introduction

To reduce the temperature impacts during the design stage of power converters, reliability-oriented solutions have been applied for many years in industry and traction applications. Power converters have therefore been designed to respect specific thermal cycling limits in order to avoid wear-out failures during the lifetime of their power modules [70, 71, 72, 73]. The design for reliability (DFR) is based on the same concept, whereby physics of failure (PoF) analyses are conducted to define the thermal cycling limits [74]. For power electronics devices, this approach calculates the number of cycles to failure - which can be translated into lifetime - based on mathematical models obtained from accelerated lifetime tests [75, 76]. The mathematical models, however, depend on the device parameters, which can vary within a large group of dies. Therefore, statistical tools have been adopted to take the variations into account and calculate the failure probabilities within a specific confidence interval [77]. Furthermore, the failure probabilities - or unreliabilities - of the individual devices are combined for the system-level reliability evaluation [78, 79].

In the DFR of MCM-based power converters, however, the thermal deviations and the high number of dies are not considered in conventional procedures. The power module lifetime is thus calculated based on a lower temperature and reduced failure probabilities, which results in an overestimated system-level reliability. Therefore, a die-level analysis is proposed to improve the DFR procedure of MCM-based systems. For that, the temperature of each die is obtained separately via finite element analysis, and the die-level failure probabilities are combined to obtain a more realistic system-level reliability. To evaluate the impacts of a die-level analysis on the DFR of MCM-based power converters, a case study based on a real mine hoist system is carried out.
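One common way to combine die-level failure probabilities is a series reliability model, in which the module is considered failed as soon as any die fails and die failures are treated as independent. Whether this matches the combination rule used later in this work is an assumption of the sketch below; the probabilities and the function name `series_unreliability` are hypothetical.

```python
import math


def series_unreliability(failure_probs):
    """System failure probability when any single failure is fatal (series model)."""
    if any(not 0.0 <= f <= 1.0 for f in failure_probs):
        raise ValueError("failure probabilities must lie in [0, 1]")
    return 1.0 - math.prod(1.0 - f for f in failure_probs)


# Hypothetical die-level failure probabilities at the target lifetime.
uniform = [0.01] * 24               # every die assumed at the same temperature
uneven = [0.01] * 20 + [0.03] * 4   # four hotter dies with higher probability

print(f"uniform dies: F_sys = {series_unreliability(uniform):.4f}")
print(f"uneven dies : F_sys = {series_unreliability(uneven):.4f}")
```

In this model the system-level unreliability grows with the number of dies and is pulled up by the hottest ones, which is why a single averaged temperature tends to give an optimistic result.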

### 3.2 Design for Reliability of Power Semiconductor Devices

Based on physics of failure, the DFR has been introduced in power electronics with the aim of understanding and fixing reliability problems during the design stage, to ensure that a system achieves a predefined target lifetime - or reliability level - within a given environment [5, 6, 34, 36, 74, 77, 78]. The main characteristic of the DFR is its application focus, whereby the target lifetime depends on each case; according to the European industry, it varies mostly between 10 and 30 years [20]. This procedure is, in general, based on seven steps, as shown in Fig. 3.1 and described in the following subsections.

#### 3.2.1 Mission Profile

The first step is to obtain the mission profile of the considered application. Its fidelity is of utmost importance: it must be as close as possible to the real operating conditions, otherwise it can compromise the analysis [80].

Mission profiles in railway applications have been extensively studied by the LESIT (1994-1996) and RASPSDRA (1996-1998) programs, leading to a definition of power cycling and life cycle prediction methods. For trains with a known load cycle, a possible approximation is to consider the total charge cycle on a given service line, assuming maximum load conditions, and then to repeat this short-term period to build long-period profiles [72].

In applications with more complex mission profiles and random variations, such as wind and solar energy systems, however, this approximation can deviate significantly from reality [81]. The correct characterization of the mission profile in wind energy systems is indeed a very hard task, firstly because of the different factors that directly influence the thermal cycling of the devices, such as wind speed, variations in the environment, the behavior of the mechanical and electrical parts of the system, and grid conditions. These conditions may involve multidisciplinary models with different time constants, so further scrutiny of the mission profile is required. Therefore, alternative approaches that separate the mission profile time constants into long, average and short periods have been proposed [81].

Figure 3.1: Design for reliability of power semiconductor devices of a converter applied to mission critical applications.

#### 3.2.2 Converter Analysis

Knowing the mission profile, the power converter can be modeled in an electrothermal software to obtain the power, dc-link voltage and currents. Thereafter, the devices can be selected respecting the primary design criteria: blocking voltage and conduction current. This first design serves as a basis, yet the procedure has to be repeated until the system achieves the predefined reliability level [35].

#### Power Losses Modeling

After modeling the power converter in the electrothermal software, the next step is to obtain the power losses of the semiconductor devices, which are extracted from the datasheet. The modeling of the switching losses is based on a three-dimensional lookup table with the following inputs: blocking voltage, current, and junction temperature. As demonstrated in Fig. 3.2, the losses increase with each of these factors; in other words, higher voltage, current and temperature result in higher losses.

Similarly, the same procedure is adopted to model the conduction losses of the power modules, which are in turn represented by a two-dimensional lookup table with the following inputs: junction temperature and conduction current. As shown in Fig. 3.3, the output data for the IGBTs and diodes are the collector-emitter voltage $(V_{ce})$ and the forward voltage $(V_f)$, respectively. Then, the obtained lookup tables are loaded into the electrothermal software to calculate the losses of each device under the respective mission profile.

#### 3.2.3 Thermal Analysis

For the thermal analysis, the power modules are modeled as equivalent circuits, as demonstrated in Sec. 2.3.1. Then, the model is fed with the power loss profile and the junction temperatures are obtained. After obtaining the temperatures of the power devices, the next step is to characterize the cycle profile. Commonly, rainflow cycle-counting algorithms are used to detect peaks and valleys and to calculate the mean temperature $(T_m)$, the temperature variation $(\Delta T_j)$ and the duty time $(t_{on})$ of each cycle [82, 83].



Figure 3.2: Switching losses extracted from the datasheet of a power semiconductor device: (a) IGBT turn-on; (b) IGBT turn-off; (c) diode reverse recovery.



Figure 3.3: Conduction losses extracted from the datasheet of a power semiconductor device: (a) IGBT; (b) diode.

#### 3.2.4 Lifetime Analysis

Thereafter, lifetime models of power devices are used to obtain the expected number of cycles to failure $(N_f)$ for the specific thermal cycling mission profile. To obtain such models, in short, a power profile with heavy load variation is applied to a group of devices until they reach a previously established end-of-life (EOL) criterion [84]. Then, the obtained samples are fitted to mathematical models, whose parameters vary with the characteristics of each device, as demonstrated in the following examples.

#### Coffin-Manson-Arrhenius Model

This model is based on the simple Coffin-Manson model and takes into account the mean temperature $(T_m)$ and the temperature fluctuation $(\Delta T_j)$, as shown in equation 3.1.

$$N_f = \alpha \cdot (\Delta T_j)^{-n} \cdot e^{E_a/(k \cdot T_m)} \tag{3.1}$$

where $k$ is the Boltzmann constant, $E_a$ is the activation energy, and $\alpha$ and $n$ are fitting parameters.
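As a minimal sketch of how equation 3.1 can be evaluated, the snippet below computes $N_f$ for two temperature swings at the same mean temperature. The fitting parameters `ALPHA`, `N_EXP` and `EA_EV` are hypothetical placeholders chosen only for illustration; they are not taken from this work or from any datasheet.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def cycles_to_failure(delta_tj_k, tm_k, alpha, n, ea_ev):
    """Coffin-Manson-Arrhenius model: N_f = alpha * dTj^(-n) * exp(Ea / (k * Tm)).

    delta_tj_k: junction temperature swing in K
    tm_k:       mean junction temperature in K (absolute)
    """
    if delta_tj_k <= 0 or tm_k <= 0:
        raise ValueError("temperature swing and mean temperature must be positive")
    return alpha * delta_tj_k ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * tm_k))


# Hypothetical fitting parameters (assumptions, for illustration only).
ALPHA, N_EXP, EA_EV = 3.0e5, 5.0, 0.1

nf_small_swing = cycles_to_failure(40.0, 350.0, ALPHA, N_EXP, EA_EV)
nf_large_swing = cycles_to_failure(80.0, 350.0, ALPHA, N_EXP, EA_EV)

# Doubling the swing at constant T_m divides N_f by 2**n.
ratio = nf_small_swing / nf_large_swing
```

With these assumed parameters, the ratio between both results depends only on the exponent $n$, which illustrates why the temperature swing dominates the lifetime estimate.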

#### Bayerer's Model

Reference [85] proposed the most analytically comprehensible model, which considers the power module features and variation in different thermal cycling parameters.

$$N_f = A \cdot (\Delta T_j)^{\beta_1} \cdot e^{\beta_2 / (T_{j,min} + 273\,\mathrm{K})} \cdot t_{on}^{\beta_3} \cdot I^{\beta_4} \cdot V^{\beta_5} \cdot D^{\beta_6} \tag{3.2}$$

where $\Delta T_j$ is the temperature variation, $T_{j,min}$ the minimum junction temperature, $t_{on}$ the duty time, $V$ the blocking voltage, $D$ the bond-wire diameter and $I$ the current per wire. $A$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\beta_5$ and $\beta_6$ are the adjustable parameters, listed in Table 3.1.

Table 3.1: Fixed parameters in Bayerer's equation.

| Parameter | Value               |
|-----------|---------------------|
| $A$       | 9.34e14             |
| $\beta_1$ | 4.416               |
| $\beta_2$ | 1285                |
| $\beta_3$ | -0.463              |
| $\beta_4$ | -0.761              |
| $\beta_5$ | -0.5                |
| $\beta_6$ | -0.5                |
| $V$       | 1700                |
| $D$       | $75\,\mu\mathrm{m}$ |

#### Wire-bond lifetime model

Reference [86] points out a limitation of such lifetime models: the difficulty of dissociating the impacts of wire-bond deterioration and solder fatigue. As a solution, a wire-bond-based lifetime model is proposed. For that, pressure-contact modules without baseplate (SKIM63) are selected for 97 power cycling tests, carried out with a wide parameter variation over a period of 5 years [87]. To include the impact of design parameters, different bond wires are adopted during the tests. The obtained lifetime model is shown in 3.3, and its fixed parameters are stated in Tab. 3.2.

$$N_f = A \cdot \Delta T_j^{\alpha} \cdot ar^{\beta_1 \cdot \Delta T_j + \beta_0} \cdot \left(\frac{C + t_{on}^{\gamma}}{C + 1}\right) \cdot \exp\left(\frac{E_a}{k_B \cdot T_{jm}}\right) \cdot f_{diode} \tag{3.3}$$

Table 3.2: Fixed parameters of wire bond lifetime model.

| Parameter   | Value     | Experimental data range                                          |
|-------------|-----------|------------------------------------------------------------------|
| $A$         | 3.4368e14 |                                                                  |
| $\alpha$    | -4.923    | $64\,\mathrm{K} \le \Delta T_j \le 113\,\mathrm{K}$              |
| $\beta_1$   | -9.012e-3 | $0.19 \le ar \le 0.42$                                           |
| $\beta_0$   | 1.942     | $0.19 \le ar \le 0.42$                                           |
| $C$         | 1.434     | $0.07\,\mathrm{s} \le t_{on} \le 63\,\mathrm{s}$                 |
| $\gamma$    | 1.208     | $0.07\,\mathrm{s} \le t_{on} \le 63\,\mathrm{s}$                 |
| $E_a$ [eV]  | 0.06606   | $32.5\,^{\circ}\mathrm{C} \le T_{jm} \le 122\,^{\circ}\mathrm{C}$ |
| $f_{diode}$ | 0.6204    |                                                                  |

#### Accumulated Damage

As shown above, the lifetime models give the number of cycles to failure of a device under a specific thermal profile. To obtain the expected end-of-life (EOL) of the devices in years, however, it is first necessary to calculate the accumulated damage caused by each thermal cycle [88]. For that, the damage of each cycle is accounted for by the Palmgren-Miner rule, thereby obtaining the lifetime consumption shown in 3.4 [89].

$$LC = \sum_{i=1}^{k} \frac{n_i}{N_{fi}} \tag{3.4}$$

where $n_i$ is the number of cycles of the $i$-th thermal cycle type and $N_{fi}$ the corresponding number of cycles to failure. Hence, considering the LC accumulated over one year, it is possible to estimate the remaining useful lifetime and the expected year in which the device reaches its EOL - in other words, its lifetime [88].
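The accumulation in 3.4 can be sketched as follows, assuming that the cycles counted over one year have already been grouped into classes with a known $N_f$. The cycle counts and $N_f$ values are hypothetical; only the conversion from yearly lifetime consumption to lifetime ($1/LC$) follows from the linear damage assumption of the Palmgren-Miner rule.

```python
def lifetime_consumption(cycle_counts, cycles_to_failure):
    """Palmgren-Miner rule: LC = sum(n_i / N_f,i) over all cycle classes."""
    if len(cycle_counts) != len(cycles_to_failure):
        raise ValueError("both lists must have one entry per cycle class")
    return sum(n / nf for n, nf in zip(cycle_counts, cycles_to_failure))


def lifetime_years(lc_per_year):
    """Years until LC reaches 1 (EOL), assuming the same damage every year."""
    if lc_per_year <= 0:
        raise ValueError("yearly lifetime consumption must be positive")
    return 1.0 / lc_per_year


# Hypothetical yearly cycle classes (assumptions, for illustration only):
# one large cycle per trip and two smaller cycles per trip.
counts_per_year = [11800, 23600]
nf_per_class = [1.0e6, 5.0e6]

lc_year = lifetime_consumption(counts_per_year, nf_per_class)
expected_lifetime = lifetime_years(lc_year)
```

The sketch treats every year as identical; a mission profile that changes over the years would require accumulating LC year by year instead.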

#### 3.2.5 Statistical Analysis for Reliability Prediction

Deviations in the lifetime model and device parameters may result in different lifetime estimations [77]. The on-state voltage drop of an IGBT ($V_{ce}$), for example, varies even within the same manufacturing batch, which ultimately impacts $T_{jm}$ and $\Delta T_j$ [44, 71]. Therefore, statistical analyses are adopted to improve the confidence level of lifetime predictions, whereby stochastic parameters are converted into equivalent deterministic values [77]. For that, a normal distribution with a mean value ($\mu$) and a standard deviation ($\sigma$) is defined to obtain a specific confidence interval. Fig. 3.4, for example, shows a distribution with a confidence interval of 99.73 %, i.e. a variation of $\pm 3\sigma$ around the mean.



Figure 3.4: Normal distribution with a confidence of 99.73 %.

For the implementation, a Monte Carlo analysis is adopted, whereby the parameter variation obtained from normal distributions is applied to a specific number of samples. Thereby, the EOL of each sample, which has specific parameters defined by the normal distribution, is calculated as detailed in Sec. 3.2.4. Although the number of samples is not limited, 10000 samples are commonly adopted in such analyses, since this number ensures a good confidence level with low computational effort [77, 90]. After calculating the lifetime of each sample, the obtained distribution - which represents the number of samples per expected end-of-life year - is fitted to the Weibull probability density function (PDF) $f(x)$ by using 3.5 [90].

$$f(x) = \frac{\beta}{\eta^{\beta}} \cdot x^{\beta-1} \cdot \exp\left[-\left(\frac{x}{\eta}\right)^{\beta}\right] \tag{3.5}$$

where $\beta$, $\eta$ and $x$ are the shape parameter, the scale parameter and the operating time, respectively. Finally, the cumulative distribution function (CDF) is calculated by integrating the PDF, as shown in 3.6. As a result, the probability that a sample fails over time is given by the CDF curve $F(x)$ [78]. From that, the statistical $B_x$ lifetime, which is the time at which x% of the samples are expected to have failed, can be obtained and used as a reliability metric for the design of systems. Alternatively, the failure probability - or unreliability - at a specific time can be represented by the $U_x$ factor introduced in [35]. Fig. 3.5 shows the unreliability curve - or CDF - highlighting the $B_x$ and $U_x$ factors, which are the time for a specific failure probability and the failure probability for a defined operation time, respectively.

$$F(x) = \int_0^x f(t)\,dt \tag{3.6}$$
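For the two-parameter Weibull PDF in 3.5, the integral in 3.6 has the closed form $F(x) = 1 - \exp[-(x/\eta)^{\beta}]$, so $B_x$ and $U_x$ can be computed directly. The sketch below illustrates this; the shape and scale values are hypothetical and are not the fitted values of this work.

```python
import math


def weibull_pdf(x, beta, eta):
    """Weibull PDF of Eq. (3.5), valid for x > 0."""
    return (beta / eta ** beta) * x ** (beta - 1.0) * math.exp(-((x / eta) ** beta))


def weibull_cdf(x, beta, eta):
    """Closed-form integral of the Weibull PDF, i.e. Eq. (3.6)."""
    return 1.0 - math.exp(-((x / eta) ** beta))


def bx_lifetime(x_percent, beta, eta):
    """Time at which x % of the population is expected to have failed."""
    if not 0.0 < x_percent < 100.0:
        raise ValueError("x_percent must lie strictly between 0 and 100")
    return eta * (-math.log(1.0 - x_percent / 100.0)) ** (1.0 / beta)


def ux_unreliability(time, beta, eta):
    """Failure probability after a given operation time (U_x factor)."""
    return weibull_cdf(time, beta, eta)


# Hypothetical fitted parameters (assumptions): shape 3, scale 30 years.
BETA, ETA = 3.0, 30.0

b10 = bx_lifetime(10.0, BETA, ETA)       # years until 10 % have failed
u20 = ux_unreliability(20.0, BETA, ETA)  # failure probability after 20 years
```

Both metrics read the same curve in opposite directions: `bx_lifetime` inverts the CDF for a given probability, whereas `ux_unreliability` evaluates it at a given time.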

#### 3.2.6 System-Level Reliability Analysis

Once the unreliability curve of each component is obtained, the next step is to combine them to obtain the reliability of the entire system via a system-level block diagram approach. The reliability modeling is based on the system interruption; therefore, the blocks, which represent the unreliability of each component, are connected according to the consequence of their failure [78]. For example, if the failure of a single device means a system interruption, the unreliabilities - or reliabilities - are combined in a series association, as shown in Fig. 3.6 [78].

Figure 3.5: Unreliability curve over operation time with: $B_x$ - the time at which the device reaches x% of failure probability; $U_x$ - the failure probability at a specific operation time.



Figure 3.6: The reliability series association, which is adopted when a failure of a single device results in a system interruption.

Therefore, the system-level unreliability, i.e. the probability of an interruption in such systems due to a single failure, is one minus the product of the reliabilities $(1 - F_{Comp-i}(x))$ of the series-connected components, as modeled in 3.7 [78]:

$$F_{System}(x) = 1 - \prod_{i=1}^{n} \left(1 - F_{Comp-i}(x)\right) \tag{3.7}$$
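A short sketch of the series association in 3.7 is given below. The component failure probabilities are hypothetical; the example only illustrates how quickly the system unreliability grows with the number of series-connected blocks.

```python
import math


def series_unreliability(component_unreliabilities):
    """System unreliability of a series reliability model, Eq. (3.7).

    The system survives only if every component survives, so the system
    reliability is the product of the component reliabilities (1 - F_i).
    """
    for f in component_unreliabilities:
        if not 0.0 <= f <= 1.0:
            raise ValueError("each failure probability must lie in [0, 1]")
    system_reliability = math.prod(1.0 - f for f in component_unreliabilities)
    return 1.0 - system_reliability


# Hypothetical example (assumption): every block has a 1 % failure
# probability at the considered operation time.
one_block = series_unreliability([0.01])
twenty_four_blocks = series_unreliability([0.01] * 24)
```

Because the reliabilities multiply, the system value is always at least as high as the worst individual unreliability, which is why the number of blocks included in the model matters.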

## 3.3 Die-Level Design for Reliability

The failure-in-time (FIT) is a statistical value that represents a failure rate as the number of failures per $10^9$ device-hours of operation [91]. For reliability modeling based on FIT, the failure probabilities are calculated for a specific power module and scaled to others with similar chips - same blocking voltage - yet with a different current level [91]. In other words, the result is obtained for a specific number of dies and modeled as a series-connected reliability system to calculate the failure rate of MCMs with a different number of dies. Although the design for reliability has been applied for the proper selection of power devices in very complex systems, the specificities of multichip modules have not been considered. As demonstrated in Sec. 2.3.2, the critical thermal mismatches result in a considerably higher temperature in specific dies, which is not estimated in ordinary designs. Moreover, to obtain the system reliability of modular or multiphase systems, the reliabilities of the individual devices are combined in a product, as shown in Sec. 3.2.6. Nevertheless, the fact that multiple dies inside the MCM are prone to fail is neglected, and the MCM failure probability is calculated as for a single-die solution.



Figure 3.7: Die-level design for reliability flowchart, including (in light green) the thermal deviation and the system-level reliability modeling of the MCM dies.

Therefore, this work proposes a die-level reliability design for MCM-based power converters in mission critical applications. This structure has two basic differences: the thermal modeling via the superposition methodology to obtain the matrix with cross-coupling impedances, as described in Sec. 2.3.2, and the reliability modeling of the MCM as a system composed of multiple dies with different failure probabilities, as highlighted in light green in Fig. 3.7. To evaluate the impact of the thermal deviation and of a die-level failure probability analysis on the reliability of mission critical applications, a case study based on the aforementioned mine hoist system is presented next.

## 3.4 Design for Reliability of Mission Critical Industry Applications - A case study

The conducted case study is based on a real gold ore mine hoist system located in southeastern Brazil, which operates with a maximum speed of 12 m/s and a payload of 4.2 tons. This system, which dates back to the late eighties, is currently driven by a 1 MW / 900 V dc motor fed by a fully controlled thyristor rectifier, as shown in the electrical layout of Fig. 3.8. For this study, however, a retrofit is carried out, whereby the new system is composed of an induction machine fed by a two-level inverter. The new system is driven by a 1 MW / 690 V induction machine, with a current of 1070 A at full load $(I_L)$ and 560 A at no load $(I_{NL})$, whilst the locked rotor current $(I_i/I_n)$ reaches up to 6520 A and the breakdown torque $(T_{bd})$ is 200 %. This machine has a frequency of 50 Hz and six poles, thereby achieving a speed of 990 rpm at full load, with a total slip of 1 %. An indirect field oriented control (IFOC) strategy is adopted, whereby the machine starts with a controlled velocity ramp. The induction machine parameters, the estimation and validation of the equivalent magnetic circuit, as well as the control tuning information are included in Appendix B.

Figure 3.8: Mine hoist system: (a) general diagram; (b) photo of the electrical system of the mine hoist.

#### 3.4.1 Mission Profile

The mission profile of the mine hoist measured in the field, including the dc drive current, the speed (commanded and measured) and the skip position, is shown in Fig. 3.9. The mission profile starts with the reference skip stopped at the top of the tower, from where it is accelerated downwards for 40 s, thereby reaching the maximum velocity of 12 m/s. Once the skip reaches nominal speed, the speed remains constant for about 50 s. Thereafter, the skip starts braking (regenerative braking), which is represented by the negative armature current. Finally, before the skip comes to a complete stop, there is a low constant speed region. The mission profile finishes with the reference skip at the bottom of the mine, after a total time of 120 s. In the respective production plant, around 11800 similar trips are expected during a year of operation.

As can be seen, the skip is accelerated via small speed steps, and the current reaches up to 2560 A, i.e. around 150 % of overload. To reduce the thermal cycling stress, the skip is accelerated with a constant ramp, as shown in 3.11. The static load torque, in turn, results from the force exerted by the skip load and varies with the skip distance from the bottom, as shown in 3.8

$$F_{skip} = g \cdot (D_{wd} - 2 \cdot D_{Bt}) \cdot R_w + P_L \tag{3.8}$$

where $g$ is the gravitational acceleration, $D_{wd}$ the winding distance, $R_w$ the rope weight, $P_L$ the payload and $D_{Bt}$ the skip distance from the bottom. Hence, the static torque on the drum $(T_{DS})$ is obtained by:

$$T_{DS} = F_{skip} \cdot \frac{D_{iaD}}{2} \cdot (1+F_C) \tag{3.9}$$

where $D_{iaD}$ is the drum diameter and $F_C$ the friction torque. The torque on the motor $(T_{MS})$ is then obtained by converting $T_{DS}$ with the gearbox ratio $(G_R)$ and efficiency $(G_{EF})$, as shown in 3.10.

$$T_{MS} = \frac{T_{DS}}{G_R \cdot G_{EF}} \tag{3.10}$$

Figure 3.9: Electrical measurements of the mine hoist during one trip, measured in the real field environment.

Therefore, with the mission profile based on the trip from the top to the bottom of the mine, the measured torque profile starts from its maximum and varies with a negative slope, as shown in 3.11. For the respective system, the load torque in the worst case - i.e. with the skip at the top of the mine, 860 m - is $T_{MS-1pu} = 10015\,\mathrm{Nm}$. The total inertia on the motor side, accounting for the mechanical parts of the system, is also calculated in the real mine hoist project, totaling $J_{MS} = 650\,\mathrm{kg\,m^2}$. The mechanical parameters of this project are confidential and not relevant for the conducted case study, which requires only the mentioned $T_{MS}$ and $J_{MS}$.

Figure 3.10: Mission profile of the mine hoist system driven by an induction machine with FOC.

#### 3.4.2 Converter Analysis

To select the proper devices for the designed system, the mechanical load parameters and the mission profile are applied in the simulation software, which contains an induction machine fed by a two-level inverter with the parameters shown in Tab. 3.3. Since the dc-link voltage $V_{dc} = 1150\,\mathrm{V}$ is known, only the current is necessary for the first selection of the devices. Fig. 3.11 shows the current profile of the converter, which reaches up to $I_{dc} = 2200\,\mathrm{A}$ and reverses the power flow during the braking process. Therefore, to ensure a reasonable temperature without compromising the lifetime [5, 6], two 24-die MCM devices are adopted in parallel, totaling twelve MCMs.

Table 3.3: Simulation parameters.

| Parameter | Value  |
|-----------|--------|
| $V_{dc}$  | 1150 V |
| $V_{out}$ | 690 V  |
| $F_{sw}$  | 5 kHz  |



Figure 3.11: Dc-link current profile of the designed converter feeding the induction machine that drives the mine hoist system.

#### 3.4.3 Thermal Analysis

For the thermal analysis, two thermal circuits are adopted: one based on an ordinary Foster thermal network and the other obtained through FEM analysis. In the first analysis, shown in Fig. 3.12(a), the MCM has only one temperature for the IGBTs and another for the diodes. In the second analysis, however, there are 24 temperatures - one for each die inside the MCM - including the self-heating and thermal cross-coupling effects, which results in a maximum junction temperature that is 7.64 °C higher.

#### 3.4.4 Lifetime, Statistical and Reliability Analysis

With the temperature profiles, the expected lifetimes of both approaches are calculated through the model described in Sec. 3.2.4 [86]. To represent the uncertainties, a Monte Carlo analysis with a population of 10000 samples is carried out, considering variations in $V_{ce}$ obtained from die-level measurements [92]. Fig. 3.13 shows the failure distribution over time for the two-level inverter feeding the mine hoist induction machine, for the $T_j$ profiles obtained by a single Foster network (purple) and by the equivalent thermal network with cross-coupling thermal impedance (black). As can be seen, when the thermal distribution inside the MCM and a die-level lifetime calculation are considered - as proposed in the die-level design for reliability (DL-DFR) procedure - the failed samples are more scattered over time, so that some samples fail in the first years of operation.

Figure 3.12: Junction temperature profile of one MCM in the 2-level inverter with (a) Foster thermal network; (b) matrix with cross-coupling impedance extracted from finite element analysis.

Figure 3.13: Monte Carlo analysis of the failure distribution over time, considering parametric deviations in $V_{ce}$, without thermal deviation (purple) - Foster thermal network - and with thermal deviation (black) - FEM-based thermal network with cross-coupling.

Thereafter, the failure probabilities over time are obtained through the cumulative distribution functions (CDF) of the distributions shown in Fig. 3.13. For the first case, the system-level design for reliability approach is used, whereby the failure probabilities per module - one IGBT and one diode - are combined in a series-reliability block, as shown in Fig. 3.14 (a) and described in Sec. 3.2.6. Conversely, for the second analysis the CDF of each die is obtained as proposed in the DL-DFR approach. The failure probabilities of all dies are then combined in a series-reliability block, and the system-level reliability is calculated, as shown in Fig. 3.14 (b).

Figure 3.14: Series-connected reliability modeling, whereby the failure of a single component means system interruption, for the procedures: (a) System-Level Design for Reliability; (b) Die-Level Design for Reliability.

Fig. 3.15 shows the unreliability analysis for the mine hoist power converter with system-level and die-level reliability modeling. With the system-level approach, the obtained $B_{10}$ lifetime, i.e. the time at which 10 % of the samples have failed, is $B_{10} = 28.8$ years. Nevertheless, considering the thermal deviations inside the MCMs and a die-level reliability modeling, the resulting lifetime is $B_{10} = 14.36$ years.



Figure 3.15: Unreliability analysis of the multichip modules composing the mine hoist power converter, for system-level reliability (purple) and die-level reliability (black) approaches. A  $B_{10}$  lifetime difference of 100 % is observed.

## 3.5 Short Summary of the Chapter

A power routing strategy to overcome thermal deviation among the modules of a multiphase drive is proposed. It is demonstrated that the proposed strategy can overcome thermal deviations of $10\,^{\circ}\mathrm{C}$ and enhance the power converter lifetime by up to $23\,\%$, without degrading the electromagnetic performance of the machine.

The design for reliability has been presented as a solution to mitigate thermal stress, whereby the power converter is designed to achieve a predefined lifetime considering the mission profile of a specific application. Although the DFR has been increasing the reliability of systems, system-level reliability approaches may result in wrong lifetime predictions. Therefore, this chapter has presented an alternative DFR approach, whereby a FEM-based thermal model and a die-level reliability modeling are adopted. As a result, the procedure is capable of accounting for the failure probability of each die at its specific temperature, thereby predicting a system-level lifetime that can differ by up to 100 % from that of the traditional procedure. Moreover, the results point out that the unequal thermal distribution inside the MCM results in a reduced system lifetime.

Chapter 4

# Thermal Balancing in Power Converters

# Contents

| Section | Title                                                      |
|---------|------------------------------------------------------------|
| 4.1     | Introduction                                               |
| 4.2     | Thermal Control in Power Electronics Devices               |
| 4.2.1   | Finite Control Set Model Predictive Control                |
| 4.2.2   | Adaptive Dc-Link Voltage Control                           |
| 4.2.3   | Power Routing                                              |
| 4.3     | Power Routing in Multiphase Drives                         |
| 4.3.1   | Soft-Unbalance Operation of Multiphase Machines            |
| 4.3.2   | Power Routing in Multiphase Drives                         |
| 4.3.3   | Thermal Validation                                         |
| 4.3.4   | Reliability Analysis of the Power Routing in MCIAs         |
| 4.4     | Thermal and Aging Monitoring of Power Semiconductor Devices |
| 4.4.1   | Sensor-based Temperature Monitoring                        |
| 4.4.2   | Gate Voltage Monitoring                                    |
| 4.4.3   | Gate Resistance Monitoring                                 |
| 4.4.4   | Transient Monitoring                                       |
| 4.4.5   | Device Current Monitoring                                  |
| 4.4.6   | On-State Voltage Monitoring                                |
| 4.4.7   | Comparison of Thermal and Aging Monitoring Strategies      |
| 4.5     | Validation of a $V_{on}$-based Thermal Balancing Strategy  |
| 4.5.1   | $V_{on}$-based $T_j$ Sensing                               |
| 4.5.2   | $V_{on}$-based Thermal Balancing                           |
| 4.6     | Short Summary of the Chapter                               |

## 4.1 Introduction

Thermal control has been proposed to mitigate thermal stress during the operating stage of the power converter by influencing the losses of the semiconductors, ultimately reducing the thermal stress on the devices [30, 93, 94, 95, 96]. Power routing is an alternative solution for modular converters, whereby the power is distributed to balance the temperature and degradation among their building blocks [31, 32, 97, 98, 99]. Although multiphase drives are commonly fed by non-modular converters, their higher number of phases allows a soft-unbalance operation without affecting the magnetic performance. Therefore, this chapter presents a power routing strategy for multiphase drives to balance the temperature and even out the degradation among the phases.

Even though several thermal balancing strategies have been proposed, their implementation with practical temperature sensing has been scarcely explored. Indeed, the temperature sensing of power semiconductor devices is still an open research topic due to multiple factors, such as high resolution and isolation requirements. Therefore, in this chapter a $V_{on}$-based sensing circuit is also implemented, and its capability to provide feedback for thermal balancing strategies is experimentally validated.

## 4.2 Thermal Control in Power Electronics Devices

Fig. 4.1 shows a summary of the temperature-related control variables adopted for thermal control and stress mitigation of power semiconductor devices. A simple approach is to adjust the switching frequency to directly influence the losses, which can be realized without noticeable effects on the operating point if adjusted within system constraints [30]. Current limiting under heavy load conditions has also been proposed to reduce the thermal stress of electric vehicle (EV) converters operating under variable load cycling [100]. Another possibility is to act on the gate driver to control the semiconductor losses, either by varying the gate voltage or by manipulating the gate resistance [101, 102]. The first one, for example, has shown good performance in reducing the temperature without affecting the power device mission profile. Alternative modulation strategies have also been applied for thermal balancing in modular converters [93, 94, 95, 96]. Moreover, finite control set model predictive control, dc-link voltage control strategies and power routing are also adopted, as reviewed in the following.

Figure 4.1: Temperature-related control variables and strategies for thermal control and power routing in power electronics devices.

#### 4.2.1 Finite Control Set Model Predictive Control

Active thermal control can also benefit from the non-linear control structure of model predictive control, which applies a particular space vector directly to the inverter without a modulator. In [103], a finite control set MPC (FCS-MPC) algorithm is proposed to minimize the fatigue of the devices by reducing the thermal stress in electrical drive applications. In this proposal, the load current and the junction temperature are predicted for all possible vectors of the next sampling instant and used to derive the terms of the FCS-MPC cost function. These terms include the error from the current reference, the temperature difference among the dies and the total power losses. Thereafter, the terms are weighted and the vector with the lowest cost is directly applied to the power converter. As a result, the temperature of the dies inside a six-pack power module can be better balanced. The FCS-MPC is also proposed to better distribute the thermal stress among the devices of a three-level NPC converter [31, 32, 104]. In this method, the vectors whose devices would switch higher currents are avoided, and the temperatures of the inner and outer devices are balanced.

The Active NPC (ANPC) is an alternative solution to even out the thermal stress among the phase devices by acting on the active clamping ones [106]. The Hybrid ANPC (H-ANPC) converter has been recently proposed to increase the efficiency at a reduced additional cost. As shown in Fig. 4.2 (a), the inner switches are replaced by SiC MOSFET devices [107]. Therefore, modified modulation strategies can be adopted in which the inner devices operate under fast switching, whereas the outer devices switch at line frequency. As a result, the converter efficiency is optimized due to the lower switching losses of the inner SiC MOSFETs [108]. Nevertheless, this alternative modulation is not focused on thermal balancing, potentially resulting in thermal deviation even in an ANPC structure. Hence, an FCS-MPC for optimized operation of an H-ANPC converter, which improves the thermal balancing whilst keeping the dc-link voltages balanced, is proposed [105]. As shown in Fig. 4.2 (b), the current and the dc-bus voltages are sensed, and the vector with the lowest cost is applied in the respective sampling time.

Figure 4.2: Module and system layout: (a) One-phase ANPC open module. (b) Simplified system model scheme of the ANPC converter using model predictive control [105].
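The vector selection step described for the FCS-MPC in [103] can be illustrated with the following sketch. It is not the implementation of [103]: the prediction values, the weights and the quadratic form of the current error term are assumptions made for illustration, and all names (`Prediction`, `select_vector`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Predicted quantities for one candidate switching vector."""
    vector: int           # index of the candidate vector
    current_error: float  # predicted error from the current reference, A
    temp_spread: float    # predicted temperature difference among dies, K
    losses: float         # predicted total power losses, W


def cost(p, w_current, w_temp, w_losses):
    """Weighted sum of the three cost terms (quadratic current term assumed)."""
    return (w_current * p.current_error ** 2
            + w_temp * p.temp_spread
            + w_losses * p.losses)


def select_vector(predictions, w_current=1.0, w_temp=0.5, w_losses=0.01):
    """Return the index of the candidate vector with the lowest cost."""
    if not predictions:
        raise ValueError("at least one candidate vector is required")
    best = min(predictions, key=lambda p: cost(p, w_current, w_temp, w_losses))
    return best.vector


# Hypothetical predictions for three candidate vectors (assumed values).
candidates = [
    Prediction(vector=0, current_error=2.0, temp_spread=1.0, losses=100.0),
    Prediction(vector=1, current_error=1.0, temp_spread=4.0, losses=120.0),
    Prediction(vector=2, current_error=1.5, temp_spread=0.5, losses=110.0),
]

with_thermal_term = select_vector(candidates)
without_thermal_term = select_vector(candidates, w_temp=0.0)
```

In this example, removing the temperature term changes the selected vector, which illustrates how the weighting trades current tracking against thermal balancing.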

#### 4.2.2 Adaptive Dc-Link Voltage Control

According to IEC 61727, photovoltaic systems must remain connected while the grid voltage varies between 0.85 and $1.1\,pu$; therefore, manufacturers design the dc-link voltage to meet the requirements in the worst case, $1.1\,pu$. However, a fixed high dc-link voltage results in high switching losses and, consequently, higher temperatures and reduced reliability. Therefore, an adaptive control that operates the dc-link at the minimum voltage required to remain connected and inject power into the grid is proposed [109]. As shown in Fig. 4.4, the minimum dc voltage is selected considering the instantaneous grid voltage, and a linear overmodulation strategy is used in case of fast grid transients. As a result, the switching losses are reduced and the inverter lifetime increases by up to 75 %, without impacting the MPP tracking or requiring additional measurements.



Figure 4.3

Figure 4.4: Adaptive minimum dc-link control to increase PV lifetime. [109]

## 4.2.3 Power Routing

The power routing concept is based on uneven loading of building blocks in modular power converters to control the thermal stress and equalize the remaining useful lifetime (RUL) of the power devices. Fig. 4.5 shows the power routing concept in two parallel inverters [97]. In addition to parallel inverters, the power routing is furthermore proposed for the lifetime control of Cascaded H-Bridge (CHB) [98], Quadruple Active Bridge (QAB) [110], MMC [99] and overall modular systems such as smart transformers (ST) and more electric aircraft (MEA) [31, 32]. Moreover, the power routing has also been proposed to improve the maintenance schedule of modular systems, whereby the number of maintenance events can be reduced whilst the time between aging failures is increased [111].

Figure 4.5: Power Routing concept, the power is reduced in the most damaged cell to equalize the remaining useful lifetime.

## 4.3 Power Routing in Multiphase Drives

As demonstrated in Sec. 2.3.3, deviations among the modules can potentially cause uneven MCM degradation and reduce even further the reliability of high power converters. Therefore, the power routing has been proposed as a potential solution, whereby the power is directed to the less stressed devices to increase the entire system lifetime [31, 32]. Based on this concept, this section presents a power routing strategy which takes advantage of the capability of multiphase drives to operate under soft-unbalanced condition [112]. Due to the higher number of phases, the temperature of the hottest phases can be reduced by redistributing the power among the other ones, thereby balancing the temperature without magnetic performance degradation. To validate the capability of the multiphase machine to operate under soft-unbalance condition and balance the temperatures, finite element analyses are conducted. Furthermore, to evaluate the impact of such a strategy on the reliability of mission critical applications, a case study based on the mine hoist system described in Sec. 3 - with higher power - is carried out.

## 4.3.1 Soft-Unbalance Operation of Multiphase Machines

The nine-phase induction machine (9PIM) can be seen as three three-phase systems whose windings are star-connected with connected neutrals, yet displaced by 40° from each other [113]. Fig. 4.6 shows the construction and current diagrams of a symmetrical two-pole 9PIM, which has three times the number of phases of a conventional three-phase machine. Therefore, the 9PIM can produce a circular magnetomotive force (MMF) even under severely unbalanced conditions. For example, it can still produce a circular MMF after losing one phase, operating with only 16% of extra current in the remaining ones without affecting its magnetic performance [114, 115, 116]. As a result, multiphase machines have been proposed to increase the reliability of mission critical drives, due to their inherent fault tolerance capability [3, 117, 118].



Figure 4.6: (a) Current diagram of the nine-phase machine in balanced condition (b) Basic construction diagram of the 9PIM with two poles. [115]

The multiphase machine can furthermore operate under soft-unbalance conditions, whereby the current is slightly reduced in specific phases and redistributed among the others. To operate under soft-unbalance conditions, however, the currents - magnitudes and phases - must be recalculated to keep a circular MMF, as shown in Fig. 4.7. For that, it is necessary to solve two equations, one for the imaginary and the other for the real part of the MMF [115]. To obtain the system of equations and calculate the currents for the soft-unbalance operation, the balanced MMF for the 9PIM has to be defined. According to [115], the MMF generated by nine balanced stator currents is given by (4.1).



Figure 4.7: Current diagram of a nine-phase machine in balanced condition, considering a reduction in phase $A_1$ - current $I_1$. [112]

$$MMF_{Balanced} = \left(\frac{9}{2}\right) N\hat{I}e^{j\theta} \tag{4.1}$$

where $\hat{I}$ is the peak of the phase current and $\theta = \omega t$ is the time-varying angle. Thereafter, it is necessary to separate the imaginary from the real part of $MMF_{Balanced}$, thereby obtaining (4.2) and (4.3), respectively [115].

$$\frac{9}{2}N\hat{I}\sin(\theta) = N[(I_2 - I_9)\sin(40^\circ) + (I_3 - I_8)\sin(80^\circ) + (I_4 - I_7)\sin(60^\circ) + (I_5 - I_6)\sin(20^\circ)] \tag{4.2}$$

$$\frac{9}{2}N\hat{I}\cos(\theta) = N[I_1 + (I_2 + I_9)\cos(40^\circ) + (I_3 + I_8)\cos(80^\circ) - (I_4 + I_7)\cos(60^\circ) - (I_5 + I_6)\cos(20^\circ)] \tag{4.3}$$
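As a plausibility check of (4.1) to (4.3), the following sketch evaluates the right-hand sides of (4.2) and (4.3) for a balanced set of currents and compares them with the left-hand sides. The current waveform $I_k = \hat{I}\cos(\theta - (k-1)\,40^\circ)$ and the function names are assumptions made for illustration; they are consistent with the 40° displacement described above but are not taken from [115].

```python
import math

ALPHA_DEG = 40.0  # displacement between adjacent phases

def balanced_currents(theta, i_peak=1.0):
    """Instantaneous currents I_1..I_9 of a balanced nine-phase set (assumed form)."""
    return [i_peak * math.cos(theta - math.radians(k * ALPHA_DEG)) for k in range(9)]

def mmf_components(currents, n_turns=1.0):
    """Right-hand sides of (4.3) and (4.2): real and imaginary MMF parts."""
    i1, i2, i3, i4, i5, i6, i7, i8, i9 = currents
    sin = lambda deg: math.sin(math.radians(deg))
    cos = lambda deg: math.cos(math.radians(deg))
    real = n_turns * (i1 + (i2 + i9) * cos(40) + (i3 + i8) * cos(80)
                      - (i4 + i7) * cos(60) - (i5 + i6) * cos(20))
    imag = n_turns * ((i2 - i9) * sin(40) + (i3 - i8) * sin(80)
                      + (i4 - i7) * sin(60) + (i5 - i6) * sin(20))
    return real, imag

theta = 0.3  # arbitrary electrical angle in rad
real, imag = mmf_components(balanced_currents(theta))
print(real, 4.5 * math.cos(theta))  # both values should agree
print(imag, 4.5 * math.sin(theta))  # both values should agree
```

For any angle $\theta$, the two components trace a circle of radius $\frac{9}{2}N\hat{I}$, which is the balanced MMF of (4.1).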

Considering a reduction in the current magnitude of phase $A_1$, it is necessary to recalculate the current magnitude and phase displacement of the other phases, as shown in Fig. 4.7. As can be seen in (4.4), the currents from $I_2$ to $I_9$ have the same magnitude and some of them cancel each other [112].

$$\begin{cases}
I_2 = -I_6 \\
I_3 = -I_8 \\
I_5 = -I_9 \\
|I_2| = |I_3| = \dots = |I_9| = I_p \hat{I}
\end{cases} \tag{4.4}$$

To avoid neutral current circulation, the sum of the nine currents must be zero. Therefore, $I_4$ and $I_7$ are required to cancel $I_1$, and the relation shown in (4.5) has to be respected [112].

$$I_1 = -(I_4 + I_7) \tag{4.5}$$

Thereby, the problem is reduced to three equations - (4.2), (4.3) and (4.5) - with three unknowns: $I_p$, the magnitude of the currents other than $I_1$, and $x$ and $y$, the phase shifts related to the reduction of $|I_1|$, as shown in Fig. 4.7 [112]. As shown in Fig. 4.8, a circular magnetomotive force can be obtained in soft-unbalance condition when the currents are recalculated by following the mentioned procedure [112]. As a result, the multiphase machine can operate under soft-unbalanced condition without affecting its magnetic performance, as validated in reference [112].
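The existence of such soft-unbalanced current sets can be illustrated numerically. The sketch below does not reproduce the equal-magnitude solution of [112]; instead, it uses an alternative, assumed formulation in which the phasors of phases 2 to 9 receive the minimum-norm correction that keeps the forward MMF unchanged, the backward MMF at zero and the neutral current at zero after $|I_1|$ is reduced. All function names and the phasor convention $i_k = \mathrm{Re}(I_k e^{j\theta})$ are hypothetical choices for this example.

```python
import numpy as np

ALPHA = np.deg2rad(40.0) * np.arange(9)  # magnetic axes of phases 1..9

def soft_unbalance_phasors(reduction, i_peak=1.0):
    """Current phasors with |I_1| reduced by `reduction` (e.g. 0.08 for 8 %).

    Minimum-norm correction of phases 2..9 (an assumed formulation, not the
    equal-magnitude solution of the text).
    """
    balanced = i_peak * np.exp(-1j * ALPHA)
    # Rows: forward-MMF, backward-MMF and neutral-current constraints.
    a = np.vstack([np.exp(1j * ALPHA), np.exp(-1j * ALPHA), np.ones(9)])
    delta_1 = -reduction * i_peak                # imposed change of I_1
    rhs = -a[:, 0] * delta_1                     # part phases 2..9 must compensate
    delta_rest = np.linalg.pinv(a[:, 1:]) @ rhs  # minimum-norm solution
    phasors = balanced.copy()
    phasors[0] += delta_1
    phasors[1:] += delta_rest
    return phasors

def mmf(phasors, theta, n_turns=1.0):
    """Instantaneous MMF space vector for i_k = Re(I_k * exp(j*theta))."""
    currents = np.real(phasors * np.exp(1j * theta))
    return n_turns * np.sum(currents * np.exp(1j * ALPHA))

phasors = soft_unbalance_phasors(0.08)
print(np.round(np.abs(phasors), 4))  # per-unit magnitudes of I_1..I_9
```

In this formulation the magnitudes of phases 2 to 9 are not forced to be equal, so the result differs from (4.4); it only shows that the MMF remains circular with amplitude $\frac{9}{2}N\hat{I}$ while one phase is unloaded.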



Figure 4.8: Magnetomotive force for the balanced and the re-calculated currents for the soft-unbalanced operation, showing a circular MMF in both conditions. [112]

## 4.3.2 Power Routing in Multiphase Drives

The high number of devices in a nine-phase drive increases the probability of thermal mismatches among its power modules. Therefore, this work proposes a power routing strategy, whereby the phase power of the hotter devices is reduced to balance the temperature among the MCMs. The power routing procedure, with each step, is described in the flowchart of Fig. 4.9. At the beginning, balanced currents are applied to the stator of the 9PIM. Thereafter, the temperatures are sensed; if the temperatures are balanced, the machine keeps operating under balanced conditions. Conversely, if a thermal deviation is detected, the current of the hottest phase is reduced; the other currents are then recalculated to keep a circular MMF - as described in Sec. 4.3.1 - and the obtained unbalanced currents are applied to the phases. This process works iteratively until thermal balancing is achieved [112].
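The iterative procedure described above can be sketched as a simple loop. The example is a toy illustration only: the quadratic thermal model, the thermal gains `r_th`, the step size and the equal redistribution among the remaining phases are hypothetical placeholders. In particular, the equal redistribution stands in for the MMF-preserving recalculation of Sec. 4.3.1 and does not by itself guarantee a circular MMF.

```python
def route_power(r_th, t_amb=40.0, tol=1.0, step=0.01, max_iter=200):
    """Iteratively unload the hottest phase until the temperature spread <= tol.

    r_th: hypothetical per-phase thermal gains in K per (pu current)^2.
    Returns the per-unit current scaling of each phase and the temperatures.
    """
    n = len(r_th)
    scale = [1.0] * n                        # start from balanced currents
    temps = []
    for _ in range(max_iter):
        # "Sense" the temperatures with a toy steady-state model.
        temps = [t_amb + r * s ** 2 for r, s in zip(r_th, scale)]
        if max(temps) - min(temps) <= tol:   # balanced: keep present currents
            break
        hot = temps.index(max(temps))
        scale[hot] -= step                   # reduce the hottest phase
        for k in range(n):                   # placeholder for the recalculation
            if k != hot:                     # of the remaining currents
                scale[k] += step / (n - 1)
    return scale, temps

# Phase A1 with a degraded thermal path, eight healthy phases (assumed values).
scale, temps = route_power([70.0] + [60.0] * 8)
print([round(s, 4) for s in scale])
print([round(t, 2) for t in temps])
```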

Figure 4.9: Flowchart of the power routing for thermal balancing in nine-phase induction machines.

Figure 4.10: Power routing diagram of a nine-phase machine, whereby the soft-unbalance currents are calculated to balance the temperatures and imposed via a Z-subspace current control. The speed IFOC control is implemented in the dq synchronous frame in the same way as for an ordinary three-phase induction machine. The T9 matrix generates the voltage references for the 9PH modulator.

To implement the power routing in a nine-phase machine, the control schematic shown in Fig. 4.10 is used. The speed control of the 9PIM is realized in the dq synchronous frame in the same way as for an ordinary three-phase induction machine - as demonstrated in Sec. B.1. However, a T9 matrix is required to transform the nine-phase currents to the $\alpha\beta$ and the Z-subspace currents [119]. To realize the power routing, the current references are calculated to obtain thermal balance whilst keeping a circular magnetomotive force. Thereafter, the outputs of the Z-subspace and speed controls - in $\alpha\beta$ - are applied to the inverse T9 matrix, which in turn calculates the nine-phase voltages as a reference to the modulator. To synthesize the voltages, a nine-phase space-vector pulse-width modulator is adopted.
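The role of such a decoupling transformation can be illustrated with a generic vector-space-decomposition matrix for a symmetrical nine-phase winding. The matrix below is an assumed textbook form with power-invariant scaling; the exact T9 matrix of [119], its row ordering and its scaling may differ. In this sketch, the rows beyond the $\alpha\beta$ plane stand in for what the text calls the Z-subspace.

```python
import numpy as np

def vsd_matrix_9ph():
    """Power-invariant decoupling matrix for a symmetrical nine-phase winding.

    Rows 0-1 map to the alpha-beta plane, rows 2-7 to three auxiliary planes
    (harmonic orders 2, 3 and 4) and row 8 to the zero-sequence component.
    Assumed generic form, not necessarily the T9 matrix of the thesis.
    """
    alpha = np.deg2rad(40.0) * np.arange(9)
    rows = []
    for h in (1, 2, 3, 4):
        rows.append(np.cos(h * alpha))
        rows.append(np.sin(h * alpha))
    rows.append(np.full(9, 1.0 / np.sqrt(2.0)))
    return np.sqrt(2.0 / 9.0) * np.array(rows)

T9 = vsd_matrix_9ph()
theta = 0.7  # arbitrary electrical angle in rad
i_balanced = np.cos(theta - np.deg2rad(40.0) * np.arange(9))
print(np.round(T9 @ i_balanced, 6))  # only the first two entries are non-zero
```

Since the matrix is orthonormal, its inverse is simply its transpose, which is convenient when the controller outputs have to be mapped back to nine phase-voltage references.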

## 4.3.3 Thermal Validation

To validate the power routing in a 9PIM, an FEM-based thermal simulation similar to the previous one, yet with a nine-phase converter and an MCM with a defective heatsink, is developed - as shown in Fig. 4.11. For that, the superposition procedure is applied to the 24-dies module with reduced heat transfer coefficient (h = 3600), and its impedance matrix is also loaded into the numeric software. For the machine simulation, the dynamic model of a nine-phase squirrel-cage machine developed in [115] is used. For this thermal validation, a real 690 V/1.4 MW three-phase induction machine is taken as a basis, and its parameters are adapted to obtain the equivalent nine-phase machine ones. The rated and equivalent circuit parameters are shown in Tab. 4.1. The indirect vector control and its tuning process are implemented in a manner similar to that of an equivalent three-phase machine, as detailed in Appendix B.1.



Figure 4.11: FEM-based thermal simulation schematic of the nine-phase machine.

Table 4.1: Rated and equivalent circuit parameters of the  $690\,V/\,1.4\,MW$  nine-phase induction machine.

| Parameter   | Value           |
|-------------|-----------------|
| Power       | 1.4 MW          |
| $V_{line}$  | 690 V           |
| $I_L$       | 526 A           |
| $T_{bd}$    | 200 %           |
| Frequency   | 50 Hz           |
| Poles       | 6               |
| Speed       | 990 rpm         |
| $cos(\phi)$ | 0.83            |
| $\eta$      | 94.9 %          |
| Slip        | 1 %             |
| J           | $42.46\,kgm^2$  |

| Parameter | Value          |
|-----------|----------------|
| $r_r$     | $8.4\,m\Omega$ |
| $r_s$     | $5.8\,m\Omega$ |
| $L_r$     | 6.1 mH         |
| $L_s$     | 6.1 mH         |
| $L_m$     | 5.7 mH         |

To validate the performance of the proposed power routing strategy, a simulation considering the equivalent nine-phase machine and the 24-dies MCMs in the FEM-based system shown in Fig. 4.11 is carried out. For that, the machine starts at no-load condition, and a load step of 180 % of the nominal torque is applied after the machine achieves nominal speed. As can be seen in Fig. 4.12 (a), the machine operates in balanced condition at the beginning, whereby the currents have the same magnitude with a phase displacement of 40°. The power routing is applied at t = 15 s, and the current of phase $A_1$ is reduced by 8%, whilst the other ones increase by only 2.5%, as highlighted in Fig. 4.12 (a). Moreover, the resulting phase displacement to keep a circular magnetomotive force during the soft-unbalance condition - as demonstrated in Sec. 4.3.1 - is observed. As a result, the temperatures are balanced after triggering the power routing, whereby the temperature of the hottest die ($T_7$) of phase $A_1$ (red) - which has a defective heatsink - is reduced by 9°C, whereas the temperature of the hottest die ($T_7$) of phase $A_2$ (blue) increases by only 1°C, as shown in Fig. 4.12 (b). Fig. 4.13 (a) and (b) show the FEM results for the temperature of each die for the modules of phases $A_1$ and $A_2$ without and with power routing, respectively. As can be seen, there is a temperature reduction - of up to 9°C - in all dies of the module with the defective heatsink.

Figure 4.12: Transient results of the power routing - triggered at t = 15 s - in a 9PIM fed by a nine-phase inverter with 24-dies MCMs and a defective heatsink in phase $A_1$ (a) Current in all phases, whereby the current in phase $A_1$ is reduced to smooth its thermal stress. (b) Temperature of the hottest dies in the MCMs of phases $A_1$ and $A_2$, showing a balancing when the power routing is activated.

Figure 4.13: FEM analysis of the 24-dies MCM in phases $A_1$ and $A_2$ of the nine-phase induction machine (a) Without power routing (b) With power routing.

## 4.3.4 Reliability Analysis of the Power Routing in MCIAs

To evaluate the impact of power routing in a 9PIM, a case study similar to the one based on the mine hoist presented in Sec. 3.4 is carried out. For that, a 690 V/1.4 MW 9PIM is adopted, thereby increasing the hoist payload capacity to 7.4 tons. Therefore, for the new payload, the obtained values for load inertia and static torque on the motor side - firstly calculated in Sec. 3.4 - are now $J_{MS} = 1391\,kgm^2$ and $T_{SM-1pu} = 14250\,Nm$, respectively. Then, one load cycle profile of this new 9PIM-based system is applied to the multiphase converter with the FEM-based thermal models shown in Fig. 4.11. Fig. 4.14 shows the resulting junction temperature of the hottest die - $T_7$ - of the upper module of phase $A_1$ - with defective heatsink - before (red) and after power routing (blue), thereby showing temperature reductions of up to 9 °C. For the sake of comparison, the system is simulated again replacing the defective heatsink of phase $A_1$ with a healthy one, and its $T_j$ is also shown in Fig. 4.14. As can be seen, the temperature difference between the system with only healthy heatsinks and the one with power routing (blue) is only 1 °C.



Figure 4.14: Temperature profile for the die $T_7$ of the 24-dies MCM at the top of phase $A_1$ of the nine-phase converter applied to the mine hoist profile, considering three cases: standard heatsink (black), defective heatsink (red) and defective heatsink with power routing (blue).

From the temperature profiles, it is possible to account for the impact of the power routing on the reliability of the power devices composing the selected multiphase hoist system. For that, the same procedure described in Sec. 3.4.4 is adopted; thereby, a Monte Carlo analysis considering 10000 samples and the same variations is conducted. Fig. 4.15 (a) shows the number of failed samples over time for all the dies of the nine-phase converter for three scenarios: the first one without any deviation (black), the second one with a thermal deviation of $10^{\circ}C$ - shown in Fig. 4.14 - and the third one considering the power routing. As can be seen, the samples start to fail earlier in phase $A_1$ with the defective heatsink, due to the extra induced temperature. The system-level unreliability of the three cases is shown in Fig. 4.15 (b), whereby the failure probabilities of each die - and module - are combined, as demonstrated in Sec. 3.4.4. As can be seen, the $B_{10}$ lifetime of the nine-phase power converter is increased from 13 to 16 years - 22 % -, which is only 3 % lower than the system without any thermal deviation among the modules. The temperature and obtained reliability results are summarized in Tab. 4.2.
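The two statistical steps mentioned above, namely extracting a $B_{10}$ lifetime from Monte Carlo samples and combining die-level failure probabilities into a system-level unreliability, can be sketched as follows. The lognormal lifetime distribution, its parameters and the three-die example are purely hypothetical; the series-system assumption (the system fails when its first die fails) is a common simplification and may not match every detail of the procedure in Sec. 3.4.4.

```python
import numpy as np

def b10_lifetime(samples):
    """Time at which 10 % of the simulated population has failed."""
    return np.percentile(samples, 10)

def system_unreliability(component_unreliabilities):
    """Series system: it fails as soon as any single component fails."""
    f = np.asarray(component_unreliabilities, dtype=float)
    return 1.0 - np.prod(1.0 - f, axis=0)

rng = np.random.default_rng(seed=0)
# Hypothetical lifetimes (years) of 3 dies, 10000 Monte Carlo samples each.
die_lifetimes = rng.lognormal(mean=np.log(40.0), sigma=0.3, size=(3, 10000))
system_lifetimes = die_lifetimes.min(axis=0)  # first die failure ends the system

print("B10 per die :", [round(float(b10_lifetime(d)), 2) for d in die_lifetimes])
print("B10 system  :", round(float(b10_lifetime(system_lifetimes)), 2))
```

The system-level $B_{10}$ is always at or below the lowest die-level value, which is why a single overheated module can dominate the converter lifetime.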



Figure 4.15: Die-level reliability analysis for the mine hoist system driven by a multiphase machine, for the standard system (black), defective heatsink in phase $A_1$ (red) and power routing (blue) (a) Statistical analysis (b) Unreliability analysis.

| Case                              | Standard HS | Defective $HS_{A1}$ | Power Routing |
|-----------------------------------|-------------|--------------------|---------------|
| $T_{j-max}(^{\circ}C)$            | 103.5       | 113.5              | 104.5         |
| $T_{j-diff}(^{\circ}C)$           | -           | 10                 | 1             |
| $B_{10} \left( years \right)$     | 22.13       | 16.8               | 21.58         |
| $\Delta B_{10} \left( pu \right)$ | 1           | 0.75               | 0.97          |

Table 4.2: Comparative analysis for the presented cases of the multiphase drive in the mine hoist system, showing the temperature and  $B_{10}$  lifetime.

## 4.4 Thermal and Aging Monitoring of Power Semiconductor Devices

The demonstrated possibility of increasing the reliability during operation through active thermal balancing motivates the monitoring of the junction temperature and state-of-health (SOH) of the power semiconductor devices. In fact, a recent European industry survey has reported such a procedure as the most promising solution to increase the reliability of power electronics systems in the near future [20]. Therefore, much research effort has been devoted to this topic in recent years, whereby academia and industry have been looking for a suitable solution to meet all the challenging requirements [66, 120, 121]. This section presents the proposed methods for temperature and aging monitoring of power devices, demonstrating the variety of solutions and comparing the specificity of each one.

## 4.4.1 Sensor-based Temperature Monitoring

For the junction temperature sensing, it is possible to directly integrate sensors into the chip. *Mitsubishi*, for example, has developed a commercial solution based on a string of diodes, whereby the junction temperature is obtained through its linear relation with the diode forward voltage [122]. *Fuji*, in turn, has embedded sensors located above the dies, which are integrated into the pre-driver control circuit for protection of its intelligent power modules (IPM) [123]. Aiming at detecting temperature variations inside the same device, a solution with several sensors on its surface has also been proposed [124].

The sensor-based solutions, however, are quite complex and invasive, thereby motivating the monitoring of temperature-sensitive electrical parameters (TSEPs) as an indirect $T_j$ sensing approach [121, 125, 126]. In addition to the non-invasive and simple implementation, $T_j$ and some TSEPs are also aging precursor parameters (APPs), as shown in Fig. 4.16 [66, 127, 128]. The characteristics and implementation challenges of the TSEPs and APPs are detailed in the following.



Figure 4.16: Classification of the thermal sensitive and aging parameters.

## 4.4.2 Gate Voltage Monitoring

The gate-emitter voltage under a specific collector-current value $(V_{ge,i})$ has been proposed as a TSEP, with a sensitivity of $-6.5\,mV/^{\circ}C$ for lower temperatures and $-8\,mV/^{\circ}C$ for higher temperatures [129]. Fig. 4.17 shows the schematic of the proposed methodology, whereby $I_c$ is measured by a shunt and regulated through the gate voltage. In this proposal, the calibration procedure is made using pulsed currents to avoid self-heating, which can generate errors during the measurement [130]. In addition, $V_{ge,i}$ can vary drastically between devices, generating errors of around $10^{\circ}C$ inside the same module, thereby requiring an individual calibration process in case of multichip systems. Furthermore, an increase in the $V_{ge,i}$ rate has also been presented as a potential parameter to detect aging of power semiconductor devices [131].



Figure 4.17: Regulation of the collector current for  $T_j$  estimation using  $V_{ge,i}$ . [130]

The gate-emitter threshold voltage ($V_{ge-th}$) is also proposed for junction temperature estimation of MOSFETs and IGBTs [132, 133, 134, 135]. In this case, the calibration step is made with very low current regulation acting on the gate-emitter voltage. The $V_{ge-th}$ sensitivity varies between $-2\,mV/^{\circ}C$ and $-10\,mV/^{\circ}C$ depending on the semiconductor device. The $V_{ge-th}$ is also a precursor parameter to detect aging in the gate oxide [136]. When operating above 100 °C, traps accumulate in the IGBT gate oxide, building up a leakage path and decreasing the oxide area and gate capacitance, thereby increasing the $V_{ge-th}$ during the device lifetime.
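For a TSEP with an approximately linear characteristic, the junction temperature follows from a single calibration point and the sensitivity. The sketch below shows this inversion; the calibration point (5.80 V at 25 °C) and the measured voltage are hypothetical, and the sensitivity of $-8\,mV/^{\circ}C$ is merely one value inside the range quoted above.

```python
def junction_temperature(v_meas, v_cal, t_cal, sensitivity):
    """Invert a linear TSEP characteristic V(T) = v_cal + sensitivity * (T - t_cal).

    v_meas and v_cal in volts, t_cal in degC, sensitivity in V/degC.
    """
    return t_cal + (v_meas - v_cal) / sensitivity

# Hypothetical calibration point: 5.80 V at 25 degC, sensitivity -8 mV/degC.
t_j = junction_temperature(v_meas=5.40, v_cal=5.80, t_cal=25.0, sensitivity=-8e-3)
print(f"estimated T_j = {t_j:.1f} degC")
```

The same relation shows why device-to-device spread matters: at $-8\,mV/^{\circ}C$, an uncalibrated offset of 80 mV between two chips already corresponds to an error of 10 °C.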

The gate-plateau voltage has been presented as a TSEP, showing a linear sensitivity for a fixed current and varying between 1.5 and $7\,mV/^{\circ}C$ over the entire operating range [137]. Moreover, an in-situ monitoring of the Miller plateau during the IGBT turn-on is also proposed as an aging precursor parameter, since the Miller-plateau duration is sensitive to gate oxide degradation and bond wire fatigue [138]. It is demonstrated that bond wire lift-off and aluminum reconstruction overlap doped regions with the surface of the gate polysilicon, thereby decreasing the oxide capacitance and ultimately shortening the plateau duration [138].

## 4.4.3 Gate Resistance Monitoring

The variation of the internal gate resistance $(R_{g-int})$ with temperature is also investigated as a TSEP in references [139, 140, 141]. In this case, a differential amplifier with an integrator is applied to detect the peak voltage $(V_{peak})$ - which is directly proportional to the peak current $(I_{peak})$ - over the external gate resistance $(R_{g-ext})$ [139]. Then, $I_{peak}$ is - indirectly - measured during the turn-on charging cycle, as well as the negative $(V_{g-neg})$ and positive $(V_{g-pos})$ voltages of the gate driver. Hence, $R_{g-int}$ can be estimated as shown below:

$$R_{g-int} = \frac{V_{g-pos} - V_{g-neg}}{I_{peak}} - R_{g-ext} \tag{4.6}$$
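As a numerical illustration of (4.6), the short sketch below evaluates the internal gate resistance for one set of gate-driver values. The supply rails, the peak current and the external resistance are hypothetical numbers chosen only to exercise the formula.

```python
def internal_gate_resistance(v_g_pos, v_g_neg, i_peak, r_g_ext):
    """Equation (4.6): total gate-loop resistance minus the external resistor."""
    return (v_g_pos - v_g_neg) / i_peak - r_g_ext

# Hypothetical gate-driver values: +15 V / -8 V rails, 2 A peak, 10 ohm external.
r_g_int = internal_gate_resistance(v_g_pos=15.0, v_g_neg=-8.0, i_peak=2.0, r_g_ext=10.0)
print(f"R_g-int = {r_g_int:.2f} ohm")
```

Because $R_{g-int}$ is obtained as a small difference between two larger quantities, the accuracy of the $I_{peak}$ measurement directly limits the accuracy of the temperature estimate.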

A method to estimate $R_{g-int}$ with a simple modification in the gate driver is presented in [140]. In this approach, a dc current is injected into the gate, and $V_{ge}$ is sensed at two different time instants, as shown in Fig. 4.18. Since the injected current is already known, $R_{g-int}$ and consequently $T_j$ can be estimated, with an accuracy of roughly 1°C and a standard deviation of 0.4°C. This method can be applied independently of the device state - on or off -, but if applied in the off-state, special attention has to be paid to avoid an undesired turn-on [140]. It is furthermore proven that these strategies keep providing good accuracy without considerable deviations even after long operating times [139, 140].



Figure 4.18: The  $T_j$  estimation by the internal gate resistance through a predefined dc current injection. [140]

The sensing process, however, is quite complex because $R_{g-int}$ is tied to the gate capacitor and the momentary disconnection of the external gate driver is required. The addition of a Kelvin resistance as a part of the entire $R_{g-int}$ is proposed in [142]. In this case, the resistor is added to the die at the Kelvin-emitter position, thereby enabling its sensing without the influence of the gate capacitance. The proof-of-concept of this proposal has shown good precision, and further results with the Kelvin-emitter resistor inside the chip are promised in future publications.

## 4.4.4 Transient Monitoring

The switching behavior of Si IGBTs [125, 143, 144] and SiC MOSFETs [145] is also influenced by the temperature. The turn-on time delay, for example, is proposed as a TSEP, whereby a very fast detection circuit counts the time between the detection of the $V_{ge}$ rise and the start of the $I_c$ rise during turn-on [125, 146]. The turn-on delay has presented a linear increase with temperature, current independence, and a sensitivity close to $2\,ns/^{\circ}C$. The time span, however, is very short, and the use of a high external gate resistance to increase the sensing time is proposed in [147]. Moreover, the $dV_{ce}/dt$ during turn-off is also investigated as a TSEP, due to its negative temperature dependence, i.e. the lower the temperature, the higher the $dV/dt$. This TSEP also depends on $I_c$ and $V_{dc}$, and presents high linearity and a fixed sensitivity of around $6.7\,V/(\mu s\,^{\circ}C)$.

The turn-off time $(t_{d-off})$ of Si IGBTs has also been studied as a solution for junction temperature sensing [143]. The $t_{d-off}$ - the time between the turn-off command and $V_{ce} = V_{dc}$ - is obtained by measuring the induced voltage on the Kelvin-emitter parasitic inductor $(L_{kE})$ of a high power module, as shown in Fig. 4.19. Even though the turn-off time depends on $I_c$ and $V_{dc}$, thus requiring calibration, it has shown a fixed sensitivity of $4\,ns/^{\circ}C$ and high linearity [143].

Figure 4.19: High power IGBT module parasitic-based equivalent circuit. [143]

The implementation of a smart gate driver with online monitoring of the $t_{d-off}$ of SiC MOSFETs is presented in [145]. Due to the fast switching and high $dV_{ds}/dt$ of SiC MOSFETs, the sensitivity has to be enhanced to achieve online $T_j$ sensing with enough accuracy. One possibility is to insert $R_{g-ext}$, as suggested in [147]. Nevertheless, the required external resistance is in the order of 150-300 $\Omega$, which would dominate the total gate loop impedance and increase the switching time and losses. Therefore, a gate impedance assist circuit is proposed, whereby the auxiliary gate resistor ($R_{g-aux}$) is connected in series with the gate loop impedance only during the sensing process, as shown in Fig. 4.20. As a result, the sensitivity is improved by a factor of 60, resulting in 760 $ps/^{\circ}C$. Moreover, a unit base for real-time sensing with a resolution below 104 ps is also developed [145].
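The figures quoted above can be put into perspective with a short calculation of the temperature step that corresponds to one timing step. The sketch only rearranges the numbers reported for [145]; the function name is hypothetical, and the native sensitivity is inferred here from the quoted improvement factor rather than stated in the source.

```python
def temperature_resolution(timing_resolution_ps, sensitivity_ps_per_degc):
    """Smallest temperature step resolvable by the timing measurement, in degC."""
    return timing_resolution_ps / sensitivity_ps_per_degc

ENHANCED_SENSITIVITY = 760.0  # ps/degC, with the gate impedance assist circuit
TIMING_RESOLUTION = 104.0     # ps, upper bound quoted for the sensing unit
IMPROVEMENT_FACTOR = 60.0     # sensitivity gain quoted for the assist circuit

step_degc = temperature_resolution(TIMING_RESOLUTION, ENHANCED_SENSITIVITY)
native_sensitivity = ENHANCED_SENSITIVITY / IMPROVEMENT_FACTOR  # ps/degC without assist
print(f"{step_degc:.3f} degC per timing step, {native_sensitivity:.1f} ps/degC native")
```

With these numbers, one timing step corresponds to roughly 0.14 °C, whereas the inferred native sensitivity of about 12.7 ps/°C would require a timing resolution in the range of a few picoseconds for a comparable result.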



Figure 4.20: Gate impedance assist circuit. [145]

Most of the switching properties of power devices are also affected by aging, and their potential to detect degradation in power devices has been investigated [66]. The $t_{d-off}$ and the $dV_{ce}/dt$ during turn-off have shown potential to detect solder and bond wire degradation [148, 149, 150]. The thermal path degradation increases the die temperature, which ultimately impacts the turn-off time [148], whereas the bond wire aging varies the stray inductance, thereby changing the voltage transient [144].

Indeed, a method to estimate $t_{d-off}$ based on very fast sampling - hundreds of MSPS - of the $V_{ce}$ rise during turn-off is proposed for aging detection [151]. For that, $V_{ce}$ is sensed in multiple switching transitions when its value is between 20% and 80% of the dc-bus voltage, and the Standard Error of the Mean (SEM) is used to calculate the expected $t_{d-off}$ error with statistical precision.
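The Standard Error of the Mean mentioned above is the sample standard deviation divided by the square root of the number of samples. The sketch below applies this definition to a handful of $t_{d-off}$ estimates; the sample values are hypothetical and are not taken from [151].

```python
import math
import statistics

def standard_error_of_mean(samples):
    """SEM = sample standard deviation / sqrt(number of samples)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Hypothetical t_d-off estimates (ns) from repeated switching transitions.
t_d_off_ns = [498.0, 502.0, 500.0, 499.0, 501.0]
sem = standard_error_of_mean(t_d_off_ns)
print(f"mean = {statistics.mean(t_d_off_ns):.1f} ns, SEM = {sem:.3f} ns")
```

Since the SEM decreases with the square root of the number of transitions, averaging over many switching events trades response time for precision, which is acceptable for slowly evolving aging indicators.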

## 4.4.5 Device Current Monitoring

In IGBTs and MOSFETs, the saturation current $I_{css}$ has been proposed as a TSEP [130, 152, 153]. During the measurement, a low $V_{gs}/V_{ge}$ - slightly higher than $V_{gs-th}/V_{ge-th}$ - is regulated through a controller (Fig. 4.21), and the temperature is estimated through the temperature dependence of the electron mobility, which in turn impacts $I_{css}$ [130, 153]. Even though gate-emitter voltage pulses are applied to reduce the self-heating, an interpolation process is necessary to limit the temperature error. Moreover, this TSEP is not linear, its sensitivity is not fixed, and it presents very poor precision at low temperatures [130].



Figure 4.21: The  $T_j$  estimation by the saturation current through a regulated  $V_{gs}$ .

The short-circuit current $(I_{sc})$ of an IGBT is also a function of the temperature, and has been investigated for $T_j$ sensing purposes [154]. As shown in Fig. 4.22, the temperature of an IGBT can be sensed by applying a short-circuit current pulse to the device under test (DUT). To eliminate the effect of the auxiliary IGBT, a bypass device with higher current capability is added in parallel. During the sensing process, the bypass device is turned on, and the dc-bus voltage is blocked by the DUT. In sequence, the DUT is turned on in short-circuit, $I_{sc}$ rapidly reaches its peak and the DUT is turned off. For that, the current peak shall be defined so as not to exceed the device short-circuit capability [154]. Although the method has presented adequate sensitivity - around $0.17\,\%/^{\circ}C$ in magnitude - and linearity, the very high thermal dissipation during the test can compromise the reliability of the devices due to the cumulative degradation effect resulting from repetitive short-circuits.



Figure 4.22: The  $T_j$  estimation by the short-circuit current.

## 4.4.6 On-State Voltage Monitoring

The on-state voltage $(V_{on})$ has been recognized as an effective indicator to predict aging, whereby variations from 5% to 20% compared with its initial value might indicate a wear-out failure by bond wire lift-off or solder joint degradation [120, 155, 156, 157, 158, 159]. Moreover, $V_{on}$ has been presented as a promising TSEP due to its linear sensitivity [160, 161, 162, 163]. However, the requirements of mV-level accuracy, kV-level isolation and noise rejection have challenged its development; therefore, multiple on-state voltage sensing circuits for $T_j$ sensing and condition monitoring have been proposed in recent years [65, 75, 161, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179].

Fig. 4.23 shows a timeline of the structures proposed in the last 20 years of research. The on-state voltage sensing circuits are designed considering the purpose and application requirements. Considering online junction temperature sensing and device characterization, for example, the circuit has to be immune to switching noise and operate under variable electrical conditions [171]. In addition, the circuits need a fast response, which can be in the order of a few $\mu s$ considering wide bandgap semiconductor devices [179]. Therefore, fast passive - Zener or diodes - or active - transistor - voltage clamping during the off-state is required for online sensing [164, 171, 176]. Conversely, the condition monitoring can be realized *in situ* - off-line -, thereby removing the need for fast response and enabling the use of slow devices such as mechanical switches, as also shown in Fig. 4.23 [169].

### Online Passive Clamping

The on-state voltage of power devices was first adopted to detect zero-voltage crossing in soft-switching devices at the beginning of the 1990s [180]. The introduction of the on-state voltage as a TSEP, however, was presented only in 1998, with the purpose of optimizing the heat management of IGBTs, with two solutions: one with a limit of 600 V (Fig. 4.24 (a)) and the other reaching up to 2000 V (Fig. 4.24 (b)) [164]. In such structures, the diodes are responsible for blocking the high voltage during the off-state and the Zener diodes clamp the output voltage at a reduced level, enabling analog-to-digital conversion (ADC).

An evolution of this proposal is presented in [169], whereby one Zener is replaced by a diode, aiming at reducing the effective stray capacitance, as shown in Fig. 4.25. In addition, two resistors are added, one to limit the current of the Zener and the other to balance the input impedance network to minimize the common-mode error. The clamping circuit, however, has an inherent leakage current during the off-state, which can be critical at high dc-link voltages [169].

Figure 4.23: The timeline of the proposed on-state voltage sensing circuits.

Figure 4.24: The $V_{on}$ sensing circuit with clamping Zener diodes: (a) 0-600 V (b) 0-2000 V

Figure 4.25: The $V_{on}$ sensing circuit with series connection of Zener and a diode to clamp the voltage.

Another variation was proposed in 2015, whereby the Zeners are completely removed and replaced by series-connected diodes [172]. Therefore, a cascode current mirror is added to provide equal currents flowing through the high voltage diodes during the on-state, as shown in Fig. 4.26. Hence, the potential at the drain - or collector - is the voltage drop $V_d$ in the upper diode plus $V_{on}$, whilst $V_d$ is the potential at the source - or emitter. As a result, $V_{on}$ is measured at the output, considering the same voltage drop in both diodes. During the off-state, on the other hand, the current of the upper source flows through the clamping diodes, and the output voltage is clamped by the voltage drop of the series association of the devices.



Figure 4.26: The $V_{on}$ sensing circuit with series connection of diodes to clamp the voltage and cascode current source.

Based on the same concept of current injection, a single-source solution was presented in [171]. As shown in Fig. 4.27, when the DUT is blocking voltage, the diode is reverse-biased and isolates the measurement circuit. During the conduction phase, however, the source provides current through the two diodes and, considering the same $V_d$, the diode voltage drops cancel each other and only $V_{on}$ is seen at the amplifier output, as shown in (4.7).

$$V_{on} = 2(V_{on} + V_{d1}) - (V_{on} + V_{d1} + V_{d2})$$
(4.7)
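The cancellation in (4.7) can be checked numerically. The sketch below is only illustrative: the function name and the voltage values are assumptions, not taken from [171]. It evaluates the amplifier output for matched and mismatched diode drops, showing that any difference $V_{d1} - V_{d2}$ appears directly as an offset on the sensed $V_{on}$.

```python
def sensed_output(v_on, v_d1, v_d2):
    """Amplifier output following (4.7): 2*(V_on + V_d1) - (V_on + V_d1 + V_d2)."""
    node_one_diode = v_on + v_d1            # potential above the first diode
    node_two_diodes = v_on + v_d1 + v_d2    # potential above both diodes
    return 2.0 * node_one_diode - node_two_diodes


# Hypothetical values (assumptions for illustration only), in volts.
V_ON = 1.5

matched = sensed_output(V_ON, 0.70, 0.70)      # identical diode drops
mismatched = sensed_output(V_ON, 0.70, 0.69)   # 10 mV diode mismatch

print("matched output:", matched)
print("offset with mismatch:", mismatched - V_ON)
```

With matched diodes the output equals $V_{on}$; with mismatched diodes the output is $V_{on} + V_{d1} - V_{d2}$, which is the offset mechanism discussed for these circuits.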

Figure 4.27: The  $V_{on}$  sensing circuit with series connection of high voltage diodes and single current source.

A variation of the previous circuit is proposed in [178], whereby similar functionality is obtained with an alternative structure. As shown in Fig. 4.28, during the on-state the current is provided by the voltage source connected in series with the resistor. Due to the Zener, the D2 voltage drop can be measured and used to 'correct' the offset introduced by D1 [178]. Moreover, the mentioned work presents a design for high switching frequencies, achieving bandwidths higher than 50 MHz and response times below 50 ns - without isolation.



Figure 4.28: The  $V_{on}$  sensing circuit based on a sensing current and a series connection of high voltage diodes and Zener.

Even with an identical current provided for both devices by the cascode sources [172] or the series connection of the diodes [171, 178], parametric and thermal deviations result in different $V_d$ and, ultimately, in a voltage offset at the sensing circuit output [179].

To remove the possibility of an offset, a new structure with a voltage source $V_{cc}$ is proposed in [179]. As shown in Fig. 4.29, $V_{cc}$ is selected to be higher than the maximum $V_{on}$. Thereby, by selecting a diode with very low leakage current and an operational amplifier with low input bias current, it is possible to ensure a low voltage across the resistor. In addition, a proportional amplifier is added to reduce the common-mode voltage.

In the off-state, the diode is forward biased and the output measures its voltage drop plus $V_{dc}$. During the on-state, $V_{on}$ can be measured at the output without $V_d$ and Zener leakage current in the path, thereby eliminating any relevant offsets. As a result, the circuit provides good accuracy, with errors below 0.13 % over a wide $T_j$ range and a response time below 19 $\mu s$.



Figure 4.29: The  $V_{on}$  sensing circuit based on a  $V_{dc}$  clamping.

# **Online Active Clamping**

The active voltage clamping circuit was first introduced in the patent 2008/0309355, whereby a MOSFET is included between the terminals of the DUT, as shown in Fig. 4.30 [181]. In short, during the off-state the auxiliary transistor is turned off by the voltage rise generated by the current flowing through the resistor. As a result, the output voltage is clamped to $V_{clamp} = V_g - V_{th}$. In the on-state, however, the auxiliary transistor starts to conduct, and $V_{on}$ is measured at the output [181]. This solution presents voltage peaks several times higher than the clamping voltage due to the parasitic capacitance of the transistor, which can lead to the destruction of the device. In addition, a large dv/dt is present across the DUT capacitance during the transition to the off-state, causing a large current to flow through the voltage supply and potentially leading it to fail [172].

Despite these drawbacks, a real-time $V_{on}$ sensing circuit based on active clamping is proposed in [176], whereby an N-channel depletion-mode MOSFET is adopted.



Figure 4.30: The  $V_{on}$  sensing circuit with active voltage clamping.

During the on-state of the DUT, no current flows through the MOSFET, which keeps it turned on and enables the $V_{on}$ sensing at the output, as shown in Fig. 4.31. Conversely, the current flowing in the clamping circuit during the off-state turns off the MOSFET through the resistor voltage drop, and the clamping voltage is seen at the output.

# **Offline Isolation**

The offline isolation based $V_{on}$ sensing circuits are generally applied to monitor the degradation of the devices in situ. The measurement can be carried out, for example, during a start-stop of an electric vehicle or during programmed pauses in industry [6, 161, 169]. Fig. 4.32 shows a relay-based isolation for $V_{on}$ sensing in applications with high dc-link voltage, whereby the main goal is to reduce the impact of leakage current and semiconductor parameter mismatches on the sensing process. In short, the relays are turned on - offline - only during the sensing process, and a specific current is applied to the DUT without disturbances. As a result, $V_{on}$ is sensed with very high precision, presenting deviations lower than 0.05% [169].



Figure 4.31: The real-time  $V_{on}$  sensing circuit with active voltage clamping.



Figure 4.32: The off-line  $V_{on}$  sensing circuit with relay-based voltage isolation.

### Methods for $T_j$ Estimation through $V_{on}$

### - Low current $T_j$ Sensing

$V_{on}$ sensing under low current was first introduced in [182] and is nowadays the most common method to obtain the chip temperature of a bipolar transistor [183]. In this method, a low sensing current is applied to the device at different temperatures and a 2-D fitting curve $V_{on}(T_j)$ is obtained [182]. As a result, the calibration process is simple and is not affected by the self-heating of the dies.

Fig. 4.33 shows an experimentally obtained fitting curve, whereby a 1 A dc current is applied to a single IGBT device at different fixed temperatures. In this process, the IGBT module DP25H1200T01667 (1200 V/25 A) is adopted, which is placed over a heatplate to apply the temperature variation. As can be seen, the curve is linear over a wide temperature range, presenting a negative thermal coefficient (NTC) with a slope magnitude of $1.8 mV/^{\circ}C$. To estimate the temperature, the same current is applied to the device during operation, $V_{on}$ is sensed and the fitting curve is interpolated to obtain $T_j$ [163].
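The calibration and look-up procedure can be sketched as a linear least-squares fit followed by its inversion. This is a minimal illustration under stated assumptions: the calibration points below are synthetic, generated only to reproduce a slope of $-1.8 \, mV/^{\circ}C$, and the function names are hypothetical. They are not the measured data of Fig. 4.33.

```python
def fit_line(temps_c, v_on_mv):
    """Least-squares fit of V_on [mV] = slope * T_j [degC] + intercept."""
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_v = sum(v_on_mv) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(temps_c, v_on_mv))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def estimate_tj(v_on_mv, slope, intercept):
    """Invert the fitted line to obtain T_j [degC] from a sensed V_on [mV]."""
    return (v_on_mv - intercept) / slope


# Synthetic calibration points (assumption): NTC line with -1.8 mV/degC.
CAL_TEMPS_C = [25.0, 50.0, 75.0, 100.0, 125.0]
CAL_VON_MV = [931.1, 886.1, 841.1, 796.1, 751.1]

slope, intercept = fit_line(CAL_TEMPS_C, CAL_VON_MV)
print("slope [mV/degC]:", round(slope, 3))
print("T_j for 882.5 mV [degC]:", round(estimate_tj(882.5, slope, intercept), 2))
```

Because the curve is linear, a two-parameter fit is sufficient; a denser look-up table with interpolation would follow the same inversion step.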

In general, an auxiliary circuit is required to apply the sensing current, and stopping the system may change the temperature even within a fraction of a second [184]. An alternative solution for $V_{on}$ sensing under low current without stopping the system is proposed in [183]. In this strategy, the load current is modified intermittently for very short times - around 100 $\mu s$ - and $V_{on}$ is sensed when the current crosses the respective $I_{sense}$. This method, however, impacts the output current quality and generates electromagnetic interference. Furthermore, it is limited to converters with low



Figure 4.33: Low current  $T_j$  calibration curve experimentally obtained from a 25 A IGBT (DP25H1200T01667).

inductive load, whereby specific current values can be applied in a short time.

### - High current $T_j$ Sensing

$T_j$ can also be sensed through $V_{on}$ under high and variable current [61, 163, 163]. The calibration procedure, however, requires monitoring the temperature to take the self-heating into account. In [164], it is proposed to sense the device temperature during the calibration with an infrared camera, which is of limited practicality in most applications. A more common method based on monitoring the baseplate temperature is proposed in [169]. Nevertheless, the voltage drop of the electrical connections and its temperature sensitivity result in considerable sensing errors [169]. Sensing $V_{on}$ through the Kelvin emitter to avoid the connection voltage drop has not proven effective in overcoming this problem, and an alternative solution based on an assumption of the resistivity and temperature of the connections is proposed in [61]. Another calibration procedure adopting a controlled cooling system is demonstrated in [185], which furthermore proposes a mathematical model to embed the variable sensitivity by fitting results obtained at different currents, as shown in (4.8).

$$T_j = T_{j0} + \frac{V_{on}(I_c, T_j) - V_{on}(I_c, T_{j0})}{\alpha I_c + \beta}.$$
(4.8)

where $\alpha$ and $\beta$ are the fitting parameters that define the shape of $V_{on}(I_c)$ at a given calibration temperature, so that $\alpha I_c + \beta$ acts as the current-dependent temperature sensitivity of $V_{on}$.
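As an illustration of (4.8), the sketch below evaluates $T_j$ from a sensed $V_{on}$ and its reference value at the calibration temperature. All numerical values of $\alpha$, $\beta$, current and voltage are hypothetical, chosen only to make the arithmetic easy to follow, and the guard against a vanishing sensitivity is an added assumption rather than part of the model in [185].

```python
def estimate_tj_high_current(v_on, v_on_ref, i_c, t_j0, alpha, beta):
    """Evaluate (4.8): T_j = T_j0 + (V_on - V_on_ref) / (alpha * I_c + beta).

    v_on      sensed on-state voltage at current i_c [V]
    v_on_ref  calibrated on-state voltage at i_c and temperature t_j0 [V]
    alpha     fitting parameter [V/(degC*A)]
    beta      fitting parameter [V/degC]
    """
    sensitivity = alpha * i_c + beta  # current-dependent sensitivity [V/degC]
    if abs(sensitivity) < 1e-9:
        # Near-zero sensitivity: V_on carries no temperature information here.
        raise ValueError("sensitivity is approximately zero at this current")
    return t_j0 + (v_on - v_on_ref) / sensitivity


# Hypothetical fitting parameters and operating point (assumptions).
ALPHA = 2.0e-4   # V/(degC*A)
BETA = -1.8e-3   # V/degC

t_j = estimate_tj_high_current(v_on=1.848, v_on_ref=1.800, i_c=15.0,
                               t_j0=25.0, alpha=ALPHA, beta=BETA)
print("estimated T_j [degC]:", round(t_j, 2))
```

With these assumed parameters the sensitivity changes sign as the current increases, so the estimate becomes ill-conditioned around the current where $\alpha I_c + \beta \approx 0$.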

Sensing $T_j$ only at the rated load current has been proposed to reduce the calibration effort. In this strategy, the $V_{on}$ sensing is triggered only when the current crosses the calibrated value. This strategy has demonstrated good sensing capability, yet with low bandwidth in low-frequency sinusoidal applications (50 Hz), and it is not suitable for variable-load applications [183].

#### Aging Impacts and Compensation

The aging of the power module structure, such as chip delamination, solder degradation and wire bond lift-off, can impact the $V_{on}$ sensing, ultimately generating errors in the $T_j$ sensing in the long term [185, 186, 187]. Therefore, some strategies to sense and compensate the degradation effects have been proposed. In short, $V_{on}$ is sensed at the inflection point, where the thermal coefficient is zero (ZTC), so that only the aging-related variation is obtained - as shown in Fig. 4.34 [176, 188, 189].



Figure 4.34: Operating region of an IGBT depending on the collector current. The temperature does not impact the  $V_{ce}$  with a specific current - i.e. at the inflection point.

There are also challenges in detecting aging through $V_{on}$, since different failure mechanisms occur concurrently and can change the trend of the on-state voltage variation [190]. As a solution, decoupling these influences by sensing at different points and using auxiliary APPs - e.g. $V_{ge-th}$ - has been proposed [191, 192].

# 4.4.7 Comparison of Thermal and Aging Monitoring Strategies

Due to the multiple strategies proposed in recent years, a considerable number of papers comparing condition monitoring [66, 120, 193, 194] and junction temperature sensing strategies [121, 130, 160] have been published. This section presents a comparative analysis of the characteristics, pros and cons reported in the mentioned references.

#### **Thermal Sensitive Parameters**

In Tab. 4.3, a comparison among the proposed thermal sensitive parameters is presented. The short-circuit current $(I_{sc})$ has shown good linearity and performance, yet it is not safe to apply during operation [154]. Even though the saturation current presents good linearity, it has a variable sensitivity which can limit its applicability [130]. The on-state voltage $(V_{ce})$ has good linearity, low calibration requirements and simple implementation compared with the other solutions. Conversely, its low resolution challenges the online implementation, and the heating of other elements can impact its sensing under high current [182]. Therefore, sensing it only at a fixed and low current shows up as a solution, yet with limited applicability, as detailed in section 4.4.6 [183]. The transient characteristics of the devices, such as the turn-on/turn-off times and the dV/dt, have also been proposed as TSEPs, showing good linearity but requiring very fast sensing circuits, especially for SiC MOSFETs [125, 143, 144].

| Method                | Device | Dependents  | Linearity   | Resolution          | Considerations             |
|-----------------------|--------|-------------|-------------|---------------------|----------------------------|
| $I_{sc}$              | IGBT   | T           | Good        | $150 mA/^{\circ}C$  | Low safety                 |
| $V_{sat}$             | MOS    | T, V        | Exponential | Varies              | Unknown frequency          |
| $V_{on}$ Low Current  | All    | T, I        | Good        | $mV/^{\circ}C$      | Only fixed current         |
| $V_{on}$ High Current | All    | T, I        | Good        | $mV/^{\circ}C$      | Affected by elements (wb)  |
| $R_{g-int}$           | MOS    | T           | Good        | $m\Omega/^{\circ}C$ | Current injection required |
| $T_{off}$             | IGBT   | T, I, V, Rg | Good        | $ns/^{\circ}C$      | Affected by harmonics      |
| $T_{on}$ delay        | MOS    | T, Rg       | Good        | $ns/^{\circ}C$      | Rg changes required        |
| dV/dt                 | MOS    | T, I, V, Rg | Good        | $ns/^{\circ}C$      | Rg changes required        |

Table 4.3: Comparison of Thermal Sensitive Parameters [121].

# **Condition Monitoring**

Tab. 4.4 compares the existing CM techniques for power modules, listing the degraded parts they can detect, their advantages and their shortcomings. As can be seen, $V_{on}$ is a mature technique which can be successfully applied offline and in situ to detect three different failure mechanisms [84, 169]. Even though the threshold gate-emitter voltage ($V_{ge-th}$) and the gate current ($I_g$) have reached the same maturity, they are limited to detecting gate oxide degradation [65]. The thermal resistance has shown the capability to detect thermal path deterioration in controlled conditions, yet it performs poorly under variable losses and environmental conditions. Another possibility is a thermal sensor solution, which is capable of detecting wire bond degradation in any condition, but presents high cost due to the need for multiple sensors and has limitations in multichip systems.

| Parameters   | Detection                                | Advantages                   | Shortcomings                                                     |
|--------------|------------------------------------------|------------------------------|------------------------------------------------------------------|
| $V_{on}$     | Metalization<br>Wire bonds<br>Gate oxide | Mature<br>In-Situ<br>Offline | Affected by multiple mechanisms<br>Hard to sense online          |
| $V_{ge-th}$  | Gate oxide                               | Mature<br>Low Fsw            | Hard to sense under working conditions                           |
| $I_g$        | Gate oxide                               | Mature<br>Easy in Lab        | Hard to sense under working conditions                           |
| $R_{th}$     | Thermal Path                             | In-Situ<br>Offline           | Hard to sense under variable conditions                          |
| $dV_{on}/dt$ | Wire bond                                | Integrable                   | Influenced by gate oxide and GD circuit<br>Multidamage indicator |
| Sensors      | Wire bond                                | Easy to measure              | High Cost<br>Not applicable in MCM                               |

Table 4.4: Comparison among CM techniques.

# 4.5 Validation of a Von-based Thermal Balancing Strategy

Although many thermal balancing strategies have been presented, their implementation with a practical junction temperature sensing structure is scarcely explored. Therefore, a thermal balancing strategy with $T_j$ sensing based on a thermal sensitive parameter (TSEP) is implemented and validated in this section. Based on the analysis conducted in the previous section, a $V_{on}$ sensing circuit is selected due to its capability to detect aging and sense the junction temperature with a relatively simple structure; therefore, the circuit shown in Fig. 4.27 is implemented [171]. This structure is the basis of recently proposed solutions [178, 179] and stands out due to its low number of components, which furthermore are only passive devices.



Figure 4.35: The $V_{on}$ sensing circuit with series connection of high voltage diodes, single current source, low noise dc power supply and isolation (a) Schematic (b) Board

The first working condition of the sensing circuit is when the device is blocking voltage, whereby D1 is reverse biased to isolate the measurement circuit from the high voltage. During the conduction phase, however, the current source, composed of the power supply and a series resistor, injects a current of $10 \ mA$ into the two diodes and the device under test (DUT). To extract only the on-state voltage, an operational amplifier is added and configured to cancel the voltage drops $(V_d)$ in D1 and D2, as demonstrated in (4.9). Thereby, assuming the same $V_d$ for both diodes, only $V_{on}$ is sensed at the output [171]. Nevertheless, parametric deviations and thermal mismatches between the diodes can result in different $V_d$ and ultimately in a $V_{on}$ offset at the output, as discussed earlier for the $V_{on}$ sensing circuits. To mitigate the impact of thermal deviations, SiC Schottky diodes (C3D1P7060Q) are selected due to their negligible reverse recovery. Moreover, the injection current is selected to make the diodes operate in the ZTC region - where the temperature does not affect the voltage drop - and the diodes are placed very close to each other for better thermal coupling, reducing potential thermal deviations, as shown in Fig. 4.35 (b). In addition, very low noise devices with high bandwidth and digital filters are implemented to achieve high resolution; further details are given in Appendix A.

$$V_{on} = 2(V_{on} + V_{D1}) - (V_{on} + V_{D1} + V_{D2})$$
(4.9)

# 4.5.1 $V_{on}$ -based $T_j$ Sensing

To validate the capability of the proposed $V_{on}$-based circuit to sense $T_j$, the on-state voltage of an IGBT - DP-25F1200T-101666 - conducting 1 A dc at a fixed temperature is measured. For that, the power module is placed over a heatplate with the temperature controlled at 52 °C, as shown in Fig. 4.36 (a). As can be seen in Fig. 4.36 (b), a $V_{on} = 882.5 \, mV$ with a precision of $0.3 \, mV$ is obtained after applying a 50-sample moving average filter, as described in Appendix A. Thereby, by interpolating the fitting curve experimentally obtained for the device conducting the same dc current - detailed in Sec. 4.4.6 - the capability of the $V_{on}$ circuit to sense the temperature is demonstrated. As shown in Fig. 4.36 (c), the measured $V_{on} = 882.5 \, mV$ matches the fitted $T_j = 52 \, ^\circ C$ with a very small difference.
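The filtering step can be illustrated with a simple running-sum moving average. This is a minimal sketch: the class name and the synthetic sample sequence are assumptions for illustration, and the actual filter implementation is the one described in Appendix A.

```python
from collections import deque


class MovingAverage:
    """Fixed-window moving average using a running sum."""

    def __init__(self, window=50):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._buf = deque(maxlen=window)
        self._sum = 0.0

    def update(self, sample):
        """Add one sample and return the average over the current window."""
        if len(self._buf) == self._buf.maxlen:
            self._sum -= self._buf[0]  # drop the oldest sample from the sum
        self._buf.append(sample)       # deque discards the oldest entry itself
        self._sum += sample
        return self._sum / len(self._buf)


# Synthetic V_on samples in mV (assumption): 882.5 mV with +/-2 mV ripple.
samples_mv = [882.5 + (2.0 if k % 2 == 0 else -2.0) for k in range(200)]

v_on_filter = MovingAverage(window=50)
v_avg = None
for sample in samples_mv:
    v_avg = v_on_filter.update(sample)
print("filtered V_on [mV]:", v_avg)
```

The running sum keeps the cost per sample constant, which matters when the ADC delivers bursts at a high sampling rate.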



Figure 4.36: $V_{ce}$-based $T_j$ under fixed current $I_c = 1 A$ and temperature $T_j = 52 \,^{\circ}C$ (a) Junction temperature measured via fiber optic sensors (b) $V_{avg}$ obtained from a fifty-sample moving average filter using a 1 MSPS ADC in burst mode (c) Fitting curve obtained with fixed low current of 1 A.

# 4.5.2 Von-based Thermal Balancing

After demonstrating the capability of $V_{on}$ to sense $T_j$, the next step is to validate its effectiveness as feedback for thermal balancing strategies. As shown in Fig. 4.37, the validation setup is composed of two buck converters connected in parallel to the same dc-bus, each one with a dedicated $V_{on}$ sensing circuit and an open module without gel. To obtain a homogeneous emissivity, the modules are also painted black. Moreover, one converter is placed over a heatsink to generate a specific thermal mismatch, and a fiber optic system is adopted to measure the device temperatures. The validation parameters are stated in Tab. 4.5.



Figure 4.37: Validation setup consisting of two buck converters in parallel with  $V_{on}$  sensing capability, whereby one is placed over a heatsink to obtain a specific thermal deviation. A fiber optic system is used to measure the temperature on the chips and validate the strategy (a) Photo (b) Schematic.

| Parameter        | Value              |
|------------------|--------------------|
| Power Module     | DP-25F1200T-101666 |
| Power Capability | 2 x 25 A / 1200 V  |
| $V_{dc}$         | 250 V              |
| $I_{out-peak}$   | 15 A               |
| $F_{sw}$         | 1 kHz              |
| Duty Cycle       | 0.5                |

Table 4.5: Validation parameters for the $V_{on}$-based thermal control.

For the thermal balancing, an on-off control strategy is implemented, whereby the hotter device is turned off during specific periods. As shown in Fig. 4.38, the $V_{ce}$ of each device is sensed and $T_j$ is estimated through the lookup table. Then, the IGBT with the highest temperature has its PWM blocked by the comparator output connected to an



Figure 4.38:  $V_{on}$ -based on-off thermal balancing strategy, whereby the junction temperatures are sensed and the device with highest one has its PWM command blocked.

AND logic gate. In this case, only one device is conducting at a time and the on-state voltage can be used to sense its temperature during the balancing process. As shown in Fig. 4.39, a deviation of $9 \,^{\circ}C$ is generated by the heatplate at the beginning of the operation. At $t = 3 \, s$, the thermal balancing is triggered, the $V_{ce}$ values are sensed and the device with the higher temperature is deactivated in the next period, thereby transferring the total power to the colder one. As a result, the $T_j$ values are equalized by indirectly sensing $T_j$ through the $V_{on}$ of the devices, as shown in Fig. 4.39.
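The on-off logic described above can be sketched as a comparator followed by an AND gate per device. The function names are hypothetical, and keeping both devices enabled when the estimated temperatures are equal is an assumption of this sketch, not a detail stated for the experimental setup.

```python
def comparator_enables(t_j_a, t_j_b):
    """Return (enable_a, enable_b): the hotter device has its PWM blocked.

    Assumption of this sketch: with equal temperatures both stay enabled.
    """
    if t_j_a > t_j_b:
        return (False, True)
    if t_j_b > t_j_a:
        return (True, False)
    return (True, True)


def gate_command(pwm, enable):
    """AND gate: the PWM pulse reaches the gate driver only if enabled."""
    return pwm and enable


# Hypothetical junction temperature estimates in degC (9 degC deviation).
enable_a, enable_b = comparator_enables(61.0, 52.0)
pwm = True  # modulator output during the on-time of the period
print("gate A:", gate_command(pwm, enable_a))
print("gate B:", gate_command(pwm, enable_b))
```

In this example device A is the hotter one, so its pulse is blocked and device B carries the load for that period.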



Figure 4.39: Experimental validation of the  $V_{on}$ -based thermal control. The junction temperature starts with a deviation of 9°C, and it is equalized after the thermal control is triggered at t = 3 s. (a)  $V_{on}$  of each switch (b) Measured temperature of each switch.

# 4.6 Short Summary of the Chapter

In this chapter, thermal control is presented as a solution to balance uneven temperatures and alleviate the thermal stress of devices by manipulating their losses. Moreover, a state-of-the-art review of junction temperature sensing and aging detection for power semiconductor devices is given, whereby the proposed solutions are presented and compared. The $V_{on}$-based monitoring has presented some advantages in comparison with the others, such as linearity, reasonable resolution, capability to detect multiple imminent failures and relative implementation simplicity. Therefore, a $V_{on}$ sensing circuit with a very low noise design is implemented and validated, showing a resolution of $0.3 \, mV$. Moreover, its capability to sense $T_j$ and close the loop of the thermal control is experimentally validated.

Chapter 5

# **Die-Level Thermal Balancing in Multichip Modules**

# Contents

| 5.1 | Introduction                                                                   |
|-----|--------------------------------------------------------------------------------|
| 5.2 | The Multi-gate Multichip Structure                                             |
| 5.3 | Die-Level Thermal Balancing                                                    |
|     | 5.3.1 Pulse-Shadowing Based Thermal Balancing                                  |
|     | 5.3.2 Turn-off Losses Manipulation for Thermal Balancing                       |
|     | 5.3.3 Comparison of the Thermal Balancing Solutions                            |
| 5.4 | Indirect Thermal Balancing for Diodes                                          |
| 5.5 | Selective $T_j$ Sensing and the Pre-Programmed Thermal Balancing               |
| 5.6 | Reliability and Efficiency Analysis of the Die-Level Thermal Balancing in MCIA |
| 5.7 | Technical Analysis of the Proposed Solution                                    |
|     | 5.7.1 Multi-Gate Driver Considerations                                         |
|     | 5.7.2 Power Level Margins                                                      |
| 5.8 | Short Summary of the Chapter                                                   |
|     | 5.8.1 Evaluation System with Equivalent Multi-gate Multichip Module            |

# 5.1 Introduction

Even after 30 years of research, multichip modules still present high thermal deviations and extra heat in a subset of devices, as detailed in Sec. 2.3.2. As a result, MCM-based power converters have been operating with reduced reliability, especially in mission critical applications - Chap. 3. As validated in Chap. 4, thermal balancing strategies are able to overcome thermal mismatches and increase the lifetime of power converters. This chapter presents and investigates a die-level solution to overcome thermal mismatches among the dies of a more flexible MCM structure. Thereby, three novel balancing strategies to reduce the thermal stress in transistors and diodes of an IGBT-based MCM are proposed. To evaluate the impact of the proposed solutions in mission critical applications, a case study based on the aforementioned mine hoist system is conducted. To validate, evaluate and compare the presented solutions, FEM-based thermal analysis and experimental tests are conducted.

# 5.2 The Multi-gate Multichip Structure

As shown in Fig. 5.1 (a), a standard high power multichip module is composed of several devices with a single gate-emitter connection. Therefore, a high number of devices are controlled by a single gate driver and, consequently, by the same pulse pattern. The multi-gate multichip module concept, in contrast, is based on the traditional MCM structure, yet with multiple gate-emitter connections, as shown in Fig. 5.1 (b).



Figure 5.1: Multichip Modules (a) Standard Structure (b) Multi-gate Structure

The purpose of the multi-gate MCM structure is to add flexibility, enabling the control of specific groups of dies by independent gate drivers. To select the groups, the temperature distribution inside the MCM - Fig. 2.16 - is analyzed and the devices with similar thermal stresses are grouped in equivalent switches (SWx). Considering the presented 24-die MCM, the sixteen IGBT dies can be divided into four equivalent switches with respective gate-emitter connections, as shown in Fig. 5.2. Even though the internal structure of the MCM is modified, the parallel devices are connected through the same collector/emitter - or drain/source - busbar structure, as shown in Fig. 5.1. Therefore, this structure has the same collector-emitter stray inductance, and the gate-emitter loop can even be reduced compared to an ordinary MCM [195].



Figure 5.2: Proposed packaging with four equivalent switching groups. (a) Open package with switching groups (b) Package structure with four gate-emitter connections.

# 5.3 Die-Level Thermal Balancing

Based on the more flexible multi-gate structure, this work proposes die-level thermal balancing to overcome thermal mismatches in MCMs. As shown in Fig. 5.3, the junction temperatures of the devices - or switching groups (Fig. 5.2 (a)) - are sensed and the pulses are processed to manipulate the losses among them. This control is realized at the gate-driver level, whereby the pulses come from the modulator and are manipulated without modifying the converter control structure. Therefore, it is possible to act on the turn-on [28, 39], on the turn-off, or even to shadow the pulses during a complete period [63], as shown in Fig. 5.4 and detailed in the following.



Figure 5.3: Full diagram of the thermal balancing in multichip multi-gate modules, with  $V_{on}$ -based  $T_j$  sensing



Figure 5.4: The pulse-processing approaches for die-level thermal balancing: pulse-shadowing, turn-on losses manipulation and turn-off losses manipulation.

# 5.3.1 Pulse-Shadowing Based Thermal Balancing

One solution to achieve thermal balancing in MCMs is to shadow pulses of the devices - or SW [29, 63]. Therefore, this work proposes and investigates the pulse-shadowing strategy for thermal balancing in MCMs. As shown in Fig. 5.5, the temperatures are sensed and the hotter devices remain turned off during specific switching periods. Thereby, the shadowed device dissipates lower mean losses - switching and conduction - and the temperatures are balanced by alternating the process among the devices over time.
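The pulse-shadowing decision for one switching period can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the choice of shadowing a single group per period and the example temperatures are hypothetical, and the redistribution of current among the remaining groups is not modeled.

```python
def shadow_hottest(pwm, temps_c, n_shadowed=1):
    """Gate commands for one switching period.

    The n_shadowed hottest switching groups remain off; all other groups
    follow the modulator pulse unchanged.
    """
    order = sorted(range(len(temps_c)), key=lambda i: temps_c[i], reverse=True)
    shadowed = set(order[:n_shadowed])
    return [pwm and (i not in shadowed) for i in range(len(temps_c))]


def active_pulse_ratio(shadowed_periods, total_periods):
    """Fraction of switching periods in which a group still receives pulses."""
    return 1.0 - shadowed_periods / total_periods


# Hypothetical temperatures of four switching groups (SW1..SW4) in degC.
group_temps = [70.0, 65.0, 80.0, 66.0]
print("gate commands:", shadow_hottest(True, group_temps))
print("active ratio for 2 of 10 periods shadowed:", active_pulse_ratio(2, 10))
```

Re-evaluating the temperatures every few periods rotates the shadowed group, which is how the mean losses are shifted among the groups over time.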



Figure 5.5: The pulse-shadowing based thermal balancing pulse pattern.

## **Proportional Control**

To realize the die-level thermal control, a linear strategy based on average losses and a proportional controller is proposed. In this strategy, the average losses of the devices are manipulated inside a limited period defined by a specific number of pulses. As shown in Fig. 5.6, a mean temperature is defined as the set-point $(T_{j-avg})$ and compared to the measured junction temperatures. This set-point is obtained online as the average of the measured junction temperatures, as shown in (5.1).

$$T_{j-avg}^{*} = \sum_{i=1}^{N_{dev}} \frac{T_{ji}}{N_{dev}}$$
(5.1)



Figure 5.6: Die-Level proportional control based on average losses.

Based on the temperature differences, the proportional controller (|P|) acts on the average losses of the dies - represented by the vector $\overrightarrow{P_{loss}}$ - aiming at equalizing the $T_j$ values to the reference one. As detailed in Sec. 5.2, the dies of the presented MCM are divided into four groups, each one with a single gate-emitter connection. Therefore, the temperature of one device per group is compared to the set-point, which results in four errors $(\overrightarrow{e_{tj}})$. Hence, the control variable vector $(\overrightarrow{\Delta P_{self}})$ is composed of four elements, and the proportional controller can be represented by a 4x4 matrix, as shown in (5.2). The proportional losses variation is given to the pulse processing block, which is then responsible for applying the specific extra - or reduced - losses to each group of devices.

$$\overrightarrow{\Delta P_{self}} = \overrightarrow{e_{tj}} \cdot \begin{bmatrix} P & 0 & 0 & 0 \\ 0 & P & 0 & 0 \\ 0 & 0 & P & 0 \\ 0 & 0 & 0 & P \end{bmatrix}$$
(5.2)
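Equations (5.1) and (5.2) can be illustrated together. The sketch below computes the average set-point, the per-group errors and the proportional losses variation; since the gain matrix is diagonal with equal entries, the matrix product reduces to an element-wise scaling. The gain value, the temperatures and the sign convention of the error (set-point minus measurement) are assumptions for illustration.

```python
def average_setpoint(t_j):
    """Set-point of (5.1): mean of the measured junction temperatures."""
    return sum(t_j) / len(t_j)


def proportional_losses_variation(t_j, k_p):
    """Control vector of (5.2) for the diagonal gain matrix P = k_p * I.

    Assumed sign convention: error = set-point - measurement, so a positive
    element requests extra losses for a group that is colder than average.
    """
    t_avg = average_setpoint(t_j)
    errors = [t_avg - t for t in t_j]
    return [k_p * e for e in errors]


# Hypothetical temperatures of the four switching groups in degC.
t_j_groups = [80.0, 70.0, 70.0, 60.0]
delta_p_self = proportional_losses_variation(t_j_groups, k_p=2.0)
print("set-point [degC]:", average_setpoint(t_j_groups))
print("delta P_self [W]:", delta_p_self)
```

Because the set-point is the mean of the measurements, the elements of the control vector sum to zero: losses are redistributed among the groups rather than added to the module as a whole.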

Once the thermal network is fully known, it is possible to decouple the effects of the thermal cross-coupling on the temperatures by applying a decoupling vector $(\overrightarrow{\Delta P_{cross}})$. The vector $\overrightarrow{\Delta P_{cross}}$ calculates online the proportional power generated by the neighboring devices after a variation of $\overrightarrow{\Delta P_{loss}}$ from one control period to the next. The decoupling vector is shown in (5.3), whereby $\overrightarrow{P_{Loss}}$ are the online losses of each die, $|Z_{th}|$ is the steady-state thermal model and $|I_{24}|$ is an identity matrix.

$$\overrightarrow{\Delta P_{cross}} = \overrightarrow{\Delta P_{act}} \cdot \overrightarrow{P_{Loss}} \cdot \frac{|Z_{th}|}{Z_{th-x,x}} - \overrightarrow{\Delta P_{act}} \cdot \overrightarrow{P_{Loss}} \cdot \frac{|Z_{th}|}{Z_{th-x,x}} \cdot |I_{24}| \tag{5.3}$$

Moreover, considering the FEM model, it is possible to obtain the power losses vector which is capable of compensating the effects of thermal cross-coupling and insert it as a feed-forward action in the closed-loop control $(\overrightarrow{\Delta P_{ff}})$. Therefore, the compensator (|P|) becomes responsible for removing only additional temperature errors, caused by power deviation, uneven cooling, aging or parametric deviations. Consequently, if $\overrightarrow{\Delta P_{ff}}$ is perfectly estimated, the decoupled power $\overrightarrow{\Delta P_{dec}} = 0$ and the closed-loop control takes no action on the system. Finally, $\overrightarrow{\Delta P_{ff}}$ is added to $\overrightarrow{\Delta P_{dec}}$, and the proportional power difference $(\overrightarrow{\Delta P_{act}})$ is delivered to the pulse processing block. As previously shown in Fig. 2.14, the thermal impedance of the dies can be fitted by a second order exponential function. Therefore, the proportional open and closed-loop control structures can be represented by (5.4) and (5.5), respectively.



Figure 5.7: Frequency response for the equivalent thermal model.

$$\frac{T_j}{T_{j-avg}^*} = \frac{K_p \cdot K}{(T_1s+1)(T_2s+1)}$$
(5.4)

$$\frac{T_j}{T_{j-avg}^*} = \frac{K_p \cdot K}{(T_1s+1)(T_2s+1) + K_p \cdot K}$$
(5.5)

To obtain the controller proportional gain, a strategy based on dynamic stiffness is adopted. This concept is based on the system capability to reject a specific disturbance magnitude [196]. In other words, it measures the amount of disturbance magnitude necessary to generate a unit deviation on the system output [196]. As shown in Fig. 5.6, the dynamic stiffness of the presented control system is designed based on the effect of thermal cross-coupling power ( $\Delta P_{cross}$ ) variation on the junction temperature of a die ( $T_j$ ). Therefore, the dynamic stiffness transfer function can be represented as in 5.6.

$$\left|\frac{\Delta P_{cross}}{T_j}\right| = \frac{(T_1 \cdot T_2) \cdot s^2 + (T_1 + T_2) \cdot s + 1 + (K_p \cdot K)}{K}$$
(5.6)

Since the main goal is to reject cross-coupling disturbances, the proportional controller is designed based on the cross-coupling impedance response. Fig. 5.7 shows the frequency response of one self impedance (red) and the equivalent cross-coupling impedance of a middle device (blue). As can be seen, the cutoff frequency of the cross-coupling impedance is  $\omega_c = 0.142$ . Therefore, the dynamic stiffness zeros, i.e. the control system poles, are allocated to obtain a response five times faster than the cross-coupling impedance, which gives  $\omega_{cross} = 5 \cdot \omega_c = 0.71$ . The proportional gain can then be obtained as shown in 5.7.

$$K_p = \frac{(T_1 + T_2) \cdot \omega_{cross} - 1}{K}$$
(5.7)

Fig. 5.8 (a) shows the frequency response of the dynamic stiffness transfer function considering the calculated proportional gain and the parameters shown in Tab. 5.1. As can be seen, a  $\Delta P_{cross}$  variation of 20 W is required to generate a deviation of 1 °C for low-frequency disturbances. Moreover, the dynamic stiffness increases for frequencies above  $\omega_{cross}$ , where high-frequency disturbances are rejected by the system inertia. To demonstrate the stability of the control system, the Bode plot of the open-loop plant is obtained (Fig. 5.8 (b)), where a phase margin of 69.5° is observed.

Table 5.1: Die-Level proportional control parameters.

| Parameter | Value  |
|-----------|--------|
| $T_1$     | 1.406  |
| $T_2$     | 6.155  |
| $K$       | 0.2475 |
| $K_p$     | 17.64  |
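
As a quick numerical cross-check, the gain in 5.7 can be recomputed from the parameters of Tab. 5.1, together with the low-frequency value of 5.6 and the phase margin of the open loop in 5.4. The sketch below is only illustrative: the function names are hypothetical, and the crossover frequency is found by solving the unity-magnitude condition of 5.4 in closed form.

```python
import math

# Parameters from Tab. 5.1 and the design frequency quoted in the text.
T1, T2, K = 1.406, 6.155, 0.2475
OMEGA_CROSS = 0.71  # five times the cross-coupling cutoff of 0.142


def proportional_gain(t1, t2, k, omega_cross):
    """Proportional gain according to 5.7."""
    return ((t1 + t2) * omega_cross - 1.0) / k


def low_frequency_stiffness(kp, k):
    """Magnitude of 5.6 evaluated at s = 0, in W/K."""
    return (1.0 + kp * k) / k


def phase_margin_deg(kp, k, t1, t2):
    """Phase margin of the open-loop transfer function 5.4."""
    gain = kp * k
    # |L(jw)|^2 = 1 gives a*x^2 + b*x + c = 0 with x = w^2.
    a = (t1 * t2) ** 2
    b = t1 ** 2 + t2 ** 2
    c = 1.0 - gain ** 2
    x = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    w = math.sqrt(x)
    phase = -math.atan(t1 * w) - math.atan(t2 * w)
    return 180.0 + math.degrees(phase)


kp = proportional_gain(T1, T2, K, OMEGA_CROSS)
stiffness = low_frequency_stiffness(kp, K)
margin = phase_margin_deg(kp, K, T1, T2)
print(f"Kp = {kp:.2f}, stiffness = {stiffness:.2f} W/K, PM = {margin:.1f} deg")
```

With these inputs the gain evaluates to about 17.65 (Tab. 5.1 lists 17.64), the low-frequency stiffness to about 21.7 W/K, which is the same order as the roughly 20 W per degree read from the frequency response, and the phase margin to roughly 69.5°, in line with the value reported in the text.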



Figure 5.8: Frequency response of the proportional die-level thermal control system: (a) Dynamic stiffness. (b) Magnitude gain and phase margin of the open-loop control system.

As a second dynamic stiffness criterion, a maximum tolerance of 10% of the maximum power deviation per degree is desired. Therefore, considering the presented 24-die MCM, the minimum low-frequency dynamic stiffness is calculated from the equivalent cross-coupling thermal disturbances on a specific die  $(T_x)$ , as shown in 5.8. For the pulse-shadowing process, the power loss  $P_{Loss(T_x)}$  is calculated considering the shadowing of one SW under nominal load, i.e. only twelve IGBTs carrying the nominal current. Considering the devices with the highest cross-coupling effects (SW1), with  $R_{th(T_{x-n})} = 0.69 K/W$  and  $R_{th(T_{x,x})} = 0.30 K/W$ , and a pulse-shadowed power of 145.48 W, the minimum dynamic stiffness results in 18.79 W/K. Consequently, considering the dynamic stiffness obtained in Fig. 5.8 (a), the system is capable of keeping thermal deviations below 1 °C for disturbance variations  $(\Delta P_{cross})$  of up to 10%.

$$\left|\frac{\Delta P_{cross}}{T_j}\right| > P_{Loss(T_x)} \cdot \frac{R_{th(T_{x-n})} - R_{th(T_{x,x})}}{R_{th(T_{x,x})}} \cdot \frac{1}{10}$$
(5.8)
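
The bound in 5.8 can be evaluated directly with the values quoted in the text. The following is a minimal sketch; the function and argument names are hypothetical, and the 10% tolerance is passed as a fraction.

```python
def min_dynamic_stiffness(p_loss_w, rth_cross_k_per_w, rth_self_k_per_w, tolerance=0.10):
    """Right-hand side of 5.8, in W/K."""
    ratio = (rth_cross_k_per_w - rth_self_k_per_w) / rth_self_k_per_w
    return p_loss_w * ratio * tolerance


# Values quoted in the text for SW1 and the pulse-shadowed power.
required = min_dynamic_stiffness(145.48, 0.69, 0.30)
print(f"minimum dynamic stiffness: {required:.2f} W/K")
```

With the rounded resistances given in the text this evaluates to about 18.9 W/K, close to the reported 18.79 W/K; the small gap presumably stems from rounding of the inputs.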

For the pulse-shadowing implementation, a range of ten periods is selected to manipulate the mean power losses. The temperatures of the devices are sensed, and the number of shadowed pulses is selected based on the mean power losses over every ten switching periods. As a result, the hotter dies remain off for more periods, thereby directing a portion of the mean losses to the colder ones. To validate the control system, the FEM-based electrothermal simulation system previously described in Sec. 2.3.2, with the respective parameters, is adopted. Since the FEM model is symmetric, only one device per switching group - SW and DG - is simulated for this transient analysis to reduce the computational effort. At the beginning of the thermal simulation, the MCM operates with thermal deviations of up to  $17.2 \,^{\circ}C$ , as shown in Fig. 5.9. When the pulse-shadowing based thermal balancing is triggered at  $t = 20 \, s$ , the temperatures are balanced - with a small temperature difference - and the highest temperature is reduced by  $6.8 \,^{\circ}C$ .

Figure 5.9: Transient thermal results of the pulse-shadowing strategy, showing the junction temperature of one die per switching group. The thermal balancing is triggered at t = 20 s, and the temperatures are balanced.

To validate the dynamic stiffness, the system is simulated again considering a thermal resistance of one device of group SW1 10% higher than the estimated value. As shown in Fig. 5.10, the thermal balancing is activated at t = 20 s, and the temperatures are still balanced with thermal deviations below 1 °C. Moreover, a thermal reduction of 11 °C can be observed, due to the higher thermal deviation caused by the different thermal resistance.



Figure 5.10: Transient thermal results of the pulse-shadowing strategy considering 10% of variation on the  $R_{th}$  of SW1, showing the junction temperature of one die per switching group. The thermal balancing is triggered at t = 20 s, and the temperatures are balanced, with deviations below 1°C.

#### **On-Off Control**

Although the proportional controller has several advantages, it requires a parametric evaluation of the thermal network. Alternatively, an on-off control structure can be adopted, in which the temperatures of the devices are manipulated in each switching period. In this approach, the temperatures are used as feedback and compared through logic elements. For the pulse-shadowing approach, the temperature of each switching group  $(T_{j-SWx})$  is sensed and compared with all the others  $(T_{j-SW-All})$ , and the SW with the highest temperature has its PWM signal blocked  $(Lock_{SWx})$ , as shown in Fig. 5.11. Since only logic elements are used and no control tuning is required, this strategy does not depend on the thermal network parameters and can be directly applied to any multi-gate multichip structure.



Figure 5.11: Control diagram of the on-off based die-level thermal control strategy.
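
The comparison logic described above can be sketched in a few lines. This is a minimal illustration, not the implemented controller: the function name and the temperature values are hypothetical, and resolving ties in favor of the first group is an assumption.

```python
def on_off_lock(tj_sw):
    """Return one lock flag per switching group; True blocks its PWM signal."""
    # The group with the highest sensed temperature is shadowed in this period.
    hottest = max(range(len(tj_sw)), key=lambda i: tj_sw[i])
    return [i == hottest for i in range(len(tj_sw))]


# Illustrative junction temperatures of SW1..SW4 in degrees Celsius.
locks = on_off_lock([118.2, 112.5, 111.9, 113.0])
print(locks)
```

Because the decision is taken again in every switching period, the blocked group changes as soon as another group becomes the hottest one, which is what drives the temperatures together without any tuned gain.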

The on-off control strategy is also validated on the FEM-based electrothermal simulation system. As shown in Fig. 5.12, the pulse-shadowing based thermal balancing is triggered at t = 20 s, the temperatures are balanced and the highest temperature is reduced by  $6.9 \,^{\circ}C$ . Looking at the zoom of the balanced temperatures, it can be seen that the steady-state error is even lower than that of the proportional controller, shown in Fig. 5.9.

Figure 5.12: Transient thermal results of the on-off control with pulse-shadowing strategy, showing the junction temperature of one die per switching group. The thermal balancing is triggered at t = 20 s, and the temperatures are perfectly balanced.

#### **Experimental Validation**

For the experimental validation, the on-off control strategy is implemented on the evaluation system with three devices with independent gate commands, described at the end of this chapter in Sec. 5.8.1. The electrical results of the on-off pulse-shadowing strategy, with the gate commands  $(S_x)$ , IGBT/diode current  $(I_{s1})$ , output voltage  $(V_{out})$  and current  $(I_{out})$ , are shown in Fig. 5.13. For the thermal balancing validation, two thermal images are collected, one without a thermal balancing strategy and the other with pulse-shadowing. As shown in Fig. 5.14 (a), there is a maximum thermal deviation of  $6.43 \,^{\circ}C$  without balancing. With pulse-shadowing, the temperatures are balanced, resulting in a temperature reduction of  $0.43 \,^{\circ}C$ , as displayed in Fig. 5.14 (b).



Figure 5.13: Electrical results of the pulse-shadowing strategy, showing the gate commands  $(S_x)$ , one IGBT/diode current  $(I_{s1})$ , output voltage  $(V_{out})$  and current  $(I_{out})$ 



Figure 5.14: Steady-state thermal analysis of the pulse-shadowing strategy: (a) Without balancing. (b) With balancing.

### 5.3.2 Turn-off Losses Manipulation for Thermal Balancing

As an alternative to complete pulse shadowing, the turn-off losses of the devices can be manipulated separately to obtain thermal balancing. Moreover, the turn-off manipulation has the potential to reduce the total losses of the power devices, due to the dependence of  $t_{off}$  on  $I_c$ . Therefore, this work proposes and investigates turn-off losses based thermal balancing for multichip modules. A physics-based analysis of the method and its thermal validation are presented in sequence.

### IGBT Turn-off - A Physics-Based Analysis

The turn-off process of an IGBT is defined by the sweeping away of the charges remaining from the conduction period. As shown in Fig. 5.15, the first turn-off stage is defined by the time between the turn-off command and the gate voltage reaching the Miller plateau ( $V_{gp}$ ). In this stage, the voltage decay is defined by the gate ( $C_{gc}$ ) and oxide ( $C_{ox}$ ) capacitances, which are constant during the process. However, the difference between  $V_{ge}$  and  $V_{gp}$  depends on  $I_c$ , since the plateau voltage in 5.9 is  $V_{th} + I_c/g_m$ . Therefore, a higher collector current shortens the duration of the first stage, as shown in 5.9 [143].

$$t_1 = R_g (C_{ge} + C_{gc}) \ln \left( \frac{V_{gg(on)} - V_{gg(off)}}{(V_{th} + \frac{I_c}{g_m}) - V_{gg(off)}} \right)$$
(5.9)
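
To make the current dependence in 5.9 concrete, the expression can be evaluated for two collector currents. All device parameters below are illustrative assumptions and are not taken from this work or from any datasheet; only the structure of the formula follows 5.9.

```python
import math


def t1_delay(i_c, r_g, c_ge, c_gc, v_on, v_off, v_th, g_m):
    """Duration of the first turn-off stage according to 5.9, in seconds."""
    v_plateau = v_th + i_c / g_m  # Miller plateau voltage
    return r_g * (c_ge + c_gc) * math.log((v_on - v_off) / (v_plateau - v_off))


# Illustrative values only (assumed, not measured).
params = dict(r_g=10.0, c_ge=2.5e-9, c_gc=0.5e-9,
              v_on=15.0, v_off=-8.0, v_th=6.0, g_m=10.0)
t1_full = t1_delay(25.0, **params)
t1_half = t1_delay(12.5, **params)
print(f"t1 at 25 A: {t1_full * 1e9:.1f} ns, at 12.5 A: {t1_half * 1e9:.1f} ns")
```

For any parameter set in which the plateau voltage stays between the off-state and on-state gate voltages, the higher current gives the shorter first stage, because the gate has less voltage to discharge before reaching the plateau.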

Figure 5.15: Turn-off process of an IGBT under inductive load.

During the second stage, the charges are extracted from the  $N^-$  drift region to maintain a constant  $I_c$  and compensate for the decrease of the MOS channel electron current. As a result, the charges in the accumulation layer under the gate region, shown in Fig. 5.16, are swept out; the depletion region starts to widen and  $V_{on}$  rises slowly [144]. The evolution of the depletion layer located under the gate region  $(W_{cd})$  is proportional to the MOS current decrease  $(\Delta I_{ch})$  and, being ultimately related to the collector current, defines the duration of this stage  $(t_2)$  [143]. Therefore,  $I_c$  impacts the evolution of the depletion layer, and a higher  $I_c$  shortens the second stage  $(t_2)$ , as shown in 5.10.

$$t_2 = \frac{L_M \cdot q \cdot \eta_{ac}}{2 \cdot \Delta V_{ge} \cdot \sqrt{(1 - \alpha_{pnp}) \cdot K_c \cdot I_c}}$$
(5.10)

where  $\Delta V_{ge}$  is the variation of the gate-emitter voltage,  $L_M$  the length of the depletion region, q the unit charge,  $\eta_{ac}$  the carrier distribution,  $\alpha_{pnp}$  the current gain and  $K_c$  an electron mobility coefficient [143].

The third stage starts when the accumulation layer disappears from the gate region; the Miller capacitance decreases suddenly and the evolution of the space-charge layer  $(W_n)$  drives the voltage rise [144]. In this step, the concentration of the drift region, which is related to the collector current, dictates the evolution of  $W_n$ . Therefore,  $I_c$  also affects the third stage, as shown in 5.11 [47].



Figure 5.16: Physical structure of an IGBT.

$$t_3 = \frac{\varepsilon_S \cdot \eta^+_{wnb} \cdot V_{dc} \cdot A_c}{W_n \cdot (N_D + \eta_{sc}) \cdot I_c}$$
(5.11)

Here,  $N_D$  and  $W_n$  are the doping concentration and the width of the  $N^-$  drift region, respectively,  $\eta_{sc}$  is the hole concentration in the space-charge region and  $\eta^+_{wnb}$  refers to the drift region at the interface with the N-buffer layer.  $\varepsilon_S$  is the permittivity of the semiconductor,  $I_c$  the on-state current,  $A_c$  the active IGBT area and  $V_{dc}$  the total dc-bus voltage. The duration  $t_3$  also depends on the collector current through  $\eta_{sc}$ , since a higher  $I_c$  increases the hole concentration in the depletion layer. As a result, a high  $I_c$  accelerates the evolution of the depletion layer and, consequently, reduces  $t_3$ .

The collector current decay is the process of the fourth stage, which starts after the voltage rise. The space-charge layer, in most cases, extends through the entire N-base, whereby the recombination of trapped holes in the N-buffer layer drives this process. Therefore, the  $I_c$  decay and the current fall time depend only on the buffer layer lifetime ( $\tau_{p0,NB}$ ), as modeled in 5.12 [47].

$$t_4 = \tau_{p0,NB} \cdot \ln(10) \tag{5.12}$$



Figure 5.17: Schematic and flowchart of the two double pulse tests applied to the DUT to demonstrate the effects of  $I_c$  - 1 and 0.5 pu - on the turn-off time of IGBTs.

Notably, the collector current influences the first three turn-off stages, with a higher  $I_c$  resulting in a lower turn-off time  $(t_{off})$  in all of them. Indeed, variations of the dV/dt during the third stage of around  $20 V/\mu s/A$  in 1.7 kV/50 A devices have been demonstrated in reference [144]. Furthermore, a reduction of  $1\mu s$  has been demonstrated with a 25% current increase in medium-voltage high-current devices (3.3 kV/1500 A) [143].

Therefore, to validate the aforementioned physical behavior, a double-pulse test, which removes the effects of temperature, is carried out on a 1.2 kV/25 A device (DP25F1200T). In this experiment, the nominal and half currents are applied, as shown in the diagram and schematic of Fig. 5.17. As a result, the  $t_{off}$  of the second and third stages is reduced by 41 ns at full load, as shown in Fig. 5.18 (a).

### Losses Analysis

Figure 5.18: Experimental results for the turn-off of a 1.2 kV/25 A IGBT for different currents. (a) 25 A (b) 12.5 A.

The energy  $(E_{off})$  dissipated during the turn-off is related to its duration, whereby a shorter  $t_{off}$  results in lower losses, as deduced in 5.13. As detailed above, the turn-off time decreases as  $I_c$  increases. Therefore, comparing the turn-off losses of one device conducting a given current with those of two devices in parallel sharing the same current, and supposing the same temperature, the single device is more efficient. In other words, the turn-off losses of one device under nominal load are lower than those of two devices with half load each. To illustrate this in a practical scenario, the turn-off energies of multiple devices from different manufacturers are summarized in Tab. 5.2. As expected, the  $E_{off}$  of one device with the total current ( $1 - E_{off}$ ) is, in general, lower than the energy of two devices with half current each ( $2 - E_{off}$ ).

$$E_{off} = \int_0^{t_{off}} V_{ce}(t) \cdot I_c(t) dt$$
(5.13)

Table 5.2:  $E_{off}$  comparison of multiple devices considering a single device with 1 pu of current and two with 0.5 pu each.

| Device                 | Technology   | Characteristics | $1 - E_{off}$ | $2 - E_{off}$ |
|------------------------|--------------|-----------------|---------------|---------------|
| IKW40N120C6S (Inf)     | Trenchstop 6 | 1200 V/40 A     | 2.9 mJ        | 3.33 mJ       |
| IKFW50N65DH5 (Inf)     | Trenchstop 5 | 650 V/50 A      | 3.67 mJ       | 4.20 mJ       |
| FD100R12W2T7 (Inf)     | IGBT 7 - T7  | 1200 V/100 A    | 10.02 mJ      | 11.30 mJ      |
| IRG7PH35UD (IR)        | Generation 7 | 1200 V/20 A     | 1.16 mJ       | 1.42 mJ       |
| 5SNG 0150Q170300 (ABB) | 62Pak        | 1200 V/150 A    | 35.83 mJ      | 41.22 mJ      |
| SK100GH12T4T (Sem)     | 4-Trench     | 1200 V/100 A    | 10.13 mJ      | 11.36 mJ      |
| SKM75GB12V (Sem)       | V-IGBT       | 1200 V/75 A     | 7.1 mJ        | 8.2 mJ        |
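
The penalty of splitting the current between two devices can be quantified from the tabulated energies. The short sketch below only post-processes the values of Tab. 5.2; the helper name is hypothetical.

```python
# (1 - E_off, 2 - E_off) in mJ, copied from Tab. 5.2.
E_OFF_MJ = {
    "IKW40N120C6S": (2.9, 3.33),
    "IKFW50N65DH5": (3.67, 4.20),
    "FD100R12W2T7": (10.02, 11.30),
    "IRG7PH35UD": (1.16, 1.42),
    "5SNG 0150Q170300": (35.83, 41.22),
    "SK100GH12T4T": (10.13, 11.36),
    "SKM75GB12V": (7.1, 8.2),
}


def parallel_penalty_percent(single_mj, paired_mj):
    """Extra turn-off energy of two half-current devices versus one device."""
    return 100.0 * (paired_mj - single_mj) / single_mj


penalties = {name: parallel_penalty_percent(*e) for name, e in E_OFF_MJ.items()}
for name, pct in penalties.items():
    print(f"{name}: +{pct:.1f} %")
```

For the listed devices the extra turn-off energy lies between roughly 12% and 22%, which is the margin that the turn-off losses manipulation can draw on.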

#### **Turn-off Losses Manipulation**

The turn-off losses manipulation strategy is based on turning off the hotter device - or SW - before the others, in soft-switching, as shown in Fig. 5.19. Thereby, the  $E_{off}$  of the hotter dies is reduced and the current is redistributed among the colder ones for their turn-off process. Consequently, the remaining devices turn off under a higher current, which balances the temperatures and reduces the overall losses due to the shorter turn-off time. Fig. 5.20 shows the adopted on-off control structure considering the turn-off losses manipulation for the multi-gate MCM. As can be seen, the junction temperatures are sensed and compared, and the equivalent switch with the highest temperature has its duty  $(T_{duty})$  reduced by a specific time  $(t_{off-red})$ . Thereby, this device is turned off in soft-switching to reduce its  $E_{off}$ . To manipulate only the turn-off losses without affecting the conduction losses,  $t_{off-red}$  is calculated online to ensure thermal balancing at its lowest possible value. As shown in Fig. 5.20,  $t_{off-red}$  starts with an initial value, the temperatures with the highest difference are compared and  $t_{off-red}$  is decremented by a specific step. Thereby, the optimum turn-off reduction value is obtained automatically, when the minimum value that ensures balancing is reached.



Figure 5.19: Pulse pattern of the turn-off losses manipulation strategy, whereby the hottest device - or SW - has its duty time reduced and turned-off before the others, in soft-switching.



Figure 5.20: Implementation schematic of the on-off control based on turn-off losses manipulation for thermal balancing, with online  $t_{off-red}$  calculation.
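
The online search for the smallest  $t_{off-red}$  can be sketched as a simple decrement loop. This is a minimal illustration under stated assumptions: the function name, the nanosecond units and the `is_balanced` callback, which stands in for the temperature comparison, are all hypothetical, and the initial value is assumed to already balance the temperatures.

```python
def minimum_toff_reduction(is_balanced, t_initial_ns, step_ns):
    """Decrement t_off-red until one more step would break the balancing."""
    t_red = t_initial_ns
    while t_red - step_ns >= 0 and is_balanced(t_red - step_ns):
        t_red -= step_ns
    return t_red


# Hypothetical stand-in for the temperature comparison:
# here balancing is assumed to hold from 350 ns upward.
t_opt = minimum_toff_reduction(lambda t_ns: t_ns >= 350, t_initial_ns=1000, step_ns=100)
print(t_opt)
```

In a real controller each call to the callback would span enough switching periods for the temperatures to settle, so the step size trades convergence time against how close the result gets to the true minimum.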

#### Thermal Analysis

The proposed turn-off losses manipulation, with its additional capability to slightly reduce the switching losses, is first validated using the previously presented FEM-based simulation process. As shown in Fig. 5.21, the temperatures are equalized when the thermal balancing strategy is activated at t = 20 s. Moreover, there is a total thermal reduction of  $8.7 \,^{\circ}C$  when  $t_{off-red}$  is shortened to its minimum possible level, thereby slightly reducing the overall MCM losses.



Figure 5.21: Transient thermal results of the on-off control with pulse-shadowing strategy, showing the junction temperature of one die per switching group. The thermal balancing is triggered at t = 20 s, and the temperatures are perfectly balanced.

#### **Experimental Validation**

The turn-off losses manipulation is also applied in the validation system described in Sec. 5.8.1. Fig. 5.22 shows the electrical results, whereby the device current increases by 33% at the end of the period, when another device is switched off earlier according to the calculated turn-off time. The thermal validation is shown in Fig. 5.23, whereby the temperatures are balanced by manipulating the turn-off losses, with a reduction of  $2.2 \,^{\circ}C$  in the hottest device.



Figure 5.22: Electrical results of the turn-off losses manipulation, showing the gate commands  $(S_x)$ , one IGBT/diode current  $(I_{s1})$ , output voltage  $(V_{out})$  and current  $(I_{out})$ 



Figure 5.23: Steady-state thermal analysis of the turn-off losses manipulation strategy: (a) Without balancing. (b) With Balancing.

### 5.3.3 Comparison of the Thermal Balancing Solutions

For a better visualization of the thermal distribution and of the impact of the presented strategies on the temperatures of all 24 dies, the obtained losses are loaded into the finite element software. Moreover, the turn-on losses manipulation proposed by different authors is also implemented for the sake of comparison [28, 39, 195]. In this strategy, a group of parallel devices is turned on displaced in time to manipulate its turn-on switching energy  $(E_{on})$ . As a result, the temperatures among the devices are balanced by alternating the turn-on sequence over time. As shown in Fig. 5.24 (a), without any strategy there is a critically uneven thermal distribution among the dies, whereby the MCM operates with a maximum deviation of  $16.5 \,^{\circ}C$ . As shown in Fig. 5.24 (b), the temperature in the hottest die is reduced from  $123.0 \,^{\circ}C$  to  $116.7 \,^{\circ}C$  when the pulse-shadowing strategy is applied. The results of the turn-on and turn-off losses manipulation are shown in Figs. 5.24 (c) and 5.24 (d), where the maximum temperatures are reduced even further, to  $115.5 \,^{\circ}C$  and  $114.5 \,^{\circ}C$ , respectively. The cross-coupling effects are also displayed in the FEM analysis, where a reduced impact on the middle dies is observed with thermal balancing. Consequently, the temperatures of the diodes  $(D_x)$  are reduced by up to  $2.1 \,^{\circ}C$ ,  $3.0 \,^{\circ}C$  and  $3.3 \,^{\circ}C$  for the pulse-shadowing, turn-on losses and turn-off losses manipulation, respectively. The comparative analysis, with the thermal deviation  $(T_{j-dev})$ , maximum temperature  $(T_{j-max})$  and obtained reduction on the transistors  $(T_{j-Red-T})$  and diodes  $(T_{j-Red-D})$ , is summarized in Tab. 5.3.

Figure 5.24: Finite element analysis comparison for the presented thermal balancing strategies: (a) Without Balancing (b) Pulse-Shadowing (c) Turn-on Losses (d) Turn-off Losses.

Table 5.3: Comparative analysis for the presented thermal strategies, showing the highest thermal deviation and temperature among the dies.

| Strategy        | $T_{j-max}$      | $T_{j-Red-T}$  | $T_{j-Red-D}$  |
|-----------------|------------------|----------------|----------------|
| None            | $123^{\circ}C$   | -              | -              |
| Pulse-Shadowing | $116.7^{\circ}C$ | $6.3^{\circ}C$ | $2.2^{\circ}C$ |
| Turn-on         | $115.5^{\circ}C$ | $7.5^{\circ}C$ | $3 ^{\circ}C$  |
| Turn-off        | $114.5^{\circ}C$ | $8.5^{\circ}C$ | $3.3^{\circ}C$ |

### **Experimental Efficiency Analysis**

To validate the impact of the presented thermal balancing strategies on the losses, their efficiencies are measured at different power levels. For that, a power analyzer is connected at the input and output of the buck converter, the power is varied through the resistors, the thermal balancing strategy is triggered in the software, and the efficiency is calculated from the measured input and output power. The obtained results are shown in Tab. 5.4: the loss differences are very small, with a slightly higher efficiency for the turn-off losses manipulation at all operating points.

Table 5.4: Efficiency comparison for the MCM operating with 0.33 pu, 0.66 pu and 1 pu of power, considering four operating conditions: without thermal balancing, pulse-shadowing, turn-on losses and turn-off losses manipulation.

| Strategy        | 0.33 pu  | 0.66 pu  | 1 pu     |
|-----------------|----------|----------|----------|
| None            | 76.126 % | 86.638 % | 90.172 % |
| Pulse-Shadowing | 76.013 % | 86.557 % | 90.048 % |
| Turn-on         | 76.124 % | 86.621 % | 90.165 % |
| Turn-off        | 76.128 % | 86.647 % | 90.184 % |
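
To see how small these differences are, the efficiencies of Tab. 5.4 can be expressed as changes relative to operation without balancing. The sketch below only post-processes the tabulated values; the helper name is hypothetical.

```python
# Efficiencies in per cent, copied from Tab. 5.4 for 0.33, 0.66 and 1 pu.
EFFICIENCY = {
    "None": (76.126, 86.638, 90.172),
    "Pulse-Shadowing": (76.013, 86.557, 90.048),
    "Turn-on": (76.124, 86.621, 90.165),
    "Turn-off": (76.128, 86.647, 90.184),
}


def efficiency_delta(strategy):
    """Efficiency change in percentage points relative to no balancing."""
    base = EFFICIENCY["None"]
    return tuple(round(e - b, 3) for e, b in zip(EFFICIENCY[strategy], base))


for name in ("Pulse-Shadowing", "Turn-on", "Turn-off"):
    print(name, efficiency_delta(name))
```

All deltas stay within about 0.13 percentage points; only the turn-off losses manipulation is positive at every operating point, while pulse-shadowing and turn-on manipulation are slightly negative.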

### 5.4 Indirect Thermal Balancing for Diodes

The thermal balancing strategies have demonstrated their effectiveness in reducing the temperature and degradation of the active switches inside MCMs. Nevertheless, the diodes are passive devices and the manipulation of losses among them is not possible. Looking at Fig. 5.24, it can be noticed that the diodes are in the middle of the MCM and therefore suffer from strong thermal cross-coupling. As a result, high thermal deviations and high temperatures can be observed, especially in the innermost diodes: D2, D3, D6 and D7. The thermal deviation among diodes is even worse, because the diodes have a positive on-resistance coefficient, whereby temperature mismatches result in higher current deviations and, consequently, a cascaded effect.

The diode thermal stress, however, is not critical in most applications, since a positive power factor results in considerably higher thermal stress on the IGBTs. Nevertheless, in specific applications, such as active front ends (AFE) [197] and reactive power processing [198], the negative power factor results in higher stress on the diodes. Therefore, the thermal mismatch and extra heating due to cross-coupling effects can result in high thermal stress on the diodes, which have fewer dies and a higher thermal resistance [198]. Hence, an alternative strategy to reduce the thermal stress and ensure high reliability in such applications is highly relevant.

Although the hotter diodes are not controllable, it is possible to act on the active devices in their neighborhood to enhance the thermal spread in the middle of the MCM. Therefore, this work proposes an indirect thermal balancing for MCM diodes in applications where the diodes suffer from higher thermal stress. The main focus is to reduce the thermal cross-coupling effects on the diodes by manipulating the losses of the IGBTs close to them. As can be seen in Fig. 5.24, the immediate neighbors of the middle diodes - T2, T7, T10 and T11 - are grouped in the same equivalent switch (SW1). Therefore, it is possible to reduce the temperature of the diodes by manipulating the losses of SW1. Considering the pulse-shadowing strategy, the pulses of SW1 are shadowed until the temperature of the other IGBTs reaches the  $T_j$  of the diodes, as shown in the flowchart of Fig. 5.25.

Figure 5.25: Flowchart of the thermal stress reduction in MCM diodes by acting on the thermal spread.

#### Thermal Analysis

The thermal balancing for MCMs in reverse power flow applications is validated considering the same parameters as in the previous analysis, yet with a power factor of  $cos(\phi) = -1$ . In this strategy, the temperatures of the middle dies (SW1) and of the diodes are sensed and compared; if SW1 is hotter than the diodes, its pulse is shadowed in the next period. As shown in Fig. 5.26 (a), the diodes located in the middle of the MCM have the highest thermal stress, operating at  $124 \,^{\circ}C$  in ordinary conditions. With the thermal balancing, however, the thermal stress of the middle IGBTs - SW1 - is directed to the edge ones - SW2, SW3 and SW4 -, thereby reducing the temperature of SW1 and changing the thermal spread in the MCM, as clearly shown in Fig. 5.26 (b). Therefore, the thermal cross-coupling on the middle dies - DG1 - is drastically reduced, thereby lowering their temperatures by up to 9.9  $^{\circ}C$ , as also displayed in Fig. 5.26 (b).



Figure 5.26: Finite element analysis of the thermal balancing for MCMs in reverse power flow applications: (a) Without thermal balancing (b) With thermal balancing.
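
The per-period decision of the indirect strategy reduces to a single comparison. The sketch below is illustrative only: the function name is hypothetical, and comparing SW1 against the hottest sensed diode is an assumption, since the text only states that SW1 is compared with the diodes.

```python
def shadow_sw1_next_period(tj_sw1, tj_diodes):
    """Shadow the SW1 pulse when SW1 is hotter than the hottest sensed diode."""
    return tj_sw1 > max(tj_diodes)


# Illustrative temperatures in degrees Celsius.
print(shadow_sw1_next_period(121.0, [118.5, 119.2]))
print(shadow_sw1_next_period(112.0, [118.5, 119.2]))
```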

### 5.5 Selective $T_j$ Sensing and the Pre-Programmed Thermal Balancing

Considering  $V_{on}$  as a TSEP for die-level thermal balancing strategies, the single collector-emitter connection makes it difficult to sense specific devices inside the parallel group. By sensing  $V_{ce}$  of all dies at the same time, only a mean temperature is obtained, which hides the die-level temperatures. It is possible, however, to apply a selective  $T_j$  sensing, whereby the  $V_{ce}$  of each device - or SW - is sensed one at a time, only at currents below its limit. Therefore, this strategy can be applied in partial load conditions or while the current crosses lower levels, as discussed in Section 4.4.6. Considering the presented 24-die module, for example, the sensing process can be applied in its four groups with currents up to 400 A. Since this work is focused on the reliability of power devices, which is more critical in load cycling applications, as detailed in Chap. 3, the sensing process can be safely applied in many steps of the mission profile. Therefore, a pre-programmed thermal balancing strategy is proposed, whereby the  $T_j$ s are sensed only at low load through a  $V_{on}$  sensing circuit. In this strategy, the closed-loop thermal balancing is applied only under low load conditions, and the obtained pulse pattern is replicated at other operating points, as shown in the flowchart of Fig. 5.27. As a result, the temperature per device - or SW - can be safely sensed with the required precision, and the temperatures are balanced over the entire load range, due to the relation between losses and thermal deviation.



Figure 5.27: Flowchart of the pre-programmed thermal balancing for load cycling applications, with selective  $T_j$  sensing process under reduced load.
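
The pre-programmed scheme can be sketched as a small state holder that refreshes the pulse pattern only while sensing is allowed and replays it otherwise. Everything below is a hypothetical illustration: the class, its arguments and the stand-in balancing rule are assumptions, and only the 400 A sensing limit is taken from the text.

```python
class PreProgrammedBalancer:
    """Replays a pulse pattern learned at low load when sensing is unavailable."""

    def __init__(self, sensing_limit_a, update_pattern):
        self.sensing_limit_a = sensing_limit_a
        self.update_pattern = update_pattern  # closed-loop balancing step (hypothetical)
        self.pattern = None

    def step(self, load_current_a, sense_tj):
        if abs(load_current_a) <= self.sensing_limit_a:
            # Low load: sense each device and refresh the stored pattern.
            self.pattern = self.update_pattern(sense_tj(), self.pattern)
        # Above the limit nothing is sensed; the last stored pattern is returned.
        return self.pattern


# Stand-in balancing rule: flag the hottest group (illustrative only).
balancer = PreProgrammedBalancer(400.0, lambda tj, old: [t == max(tj) for t in tj])
low = balancer.step(300.0, lambda: [95.0, 90.0, 91.0, 89.0])
high = balancer.step(900.0, lambda: [0.0, 0.0, 0.0, 0.0])
print(low, high)
```

In the second call the sensing callback is never invoked, so the pattern obtained at 300 A is reused unchanged at 900 A, which mirrors the replication of the low-load pattern at higher operating points.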

#### **Experimental Validation**

The pre-programmed thermal balancing is also validated in the experimental setup (Fig. 5.33), by applying the turn-off losses manipulation strategy with the implementation process shown in Fig. 5.27. As shown in Fig. 5.28 (a), the system starts operating without thermal balancing and with  $0.33 \, pu$  of power - 15 A peak. At  $t = 30 \, s$  the thermal balancing is triggered, and the die-level  $V_{ce}$  sensing strategy is activated. The temperature of each device is sensed one at a time, and the pulse pattern is updated until the temperatures are balanced, as shown in Fig. 5.28 (a). In this experiment, a sensing process is applied after every ten soft-switching periods. Thereafter, a current step is applied at  $t = 60 \, s$  by switching on the dc breaker and adding one more resistor in parallel, making the system operate with  $0.66 \, pu$  of power. In this stage, the die-level  $V_{ce}$  sensing is deactivated, and the pulse pattern responsible for balancing the temperatures under  $0.33 \, pu$  of load is replicated. In sequence, another resistor is added to the parallel group with a second dc breaker -  $t = 60 \, s$  -, and the system starts to operate with 1 pu of power for a specific time. Finally, the two resistors are removed by switching off the dc breakers at the same time, taking the system back to 0.33 pu of power. As shown in Fig. 5.28 (b), the temperatures are balanced during the whole cycling process, with the same pulse pattern pre-programmed at the low load condition.

Figure 5.28: Validation of the die-level  $V_{ce}$  sensing, pre-programmed thermal balancing and transient performance of the turn-off losses manipulation strategy. The pulse pattern is defined with 7.5 A by sensing the  $T_j$ , and it is repeated for 15 A and 23 A keeping the temperatures balanced during the whole power cycling. (a) Current profile. (b) Junction temperature of each device.

### 5.6 Reliability and Efficiency Analysis of the Die-Level Thermal Balancing in MCIA

To evaluate the performance the pre-programmed die-level thermal balancing and its impact on mission critical application, a reliability and efficiency analysis of the presented strategies is conducted. In this study, the mine hoist system presented in Sec. 3.4 is adopted. In this application, the skip is accelerated until the nominal speed, and is braked when approaching the final destination - the top or bottom of the mine. Therefore, the converter experience direct power flow during acceleration and a reverse power flow during the regenerative breaking process. Thereby, the thermal balancing strategies are switched to reduce the thermal stress on the IGBTs or diodes depending on the current direction, as highlighted in 5.29 (a). In the figure, the junction temperature sensing region of the aforementioned pre-programmed strategy can be also observed, whereby the sensing per group can be applied in up to 800 Adue to the adoption of two parallel MCMs. In gold ore production, the mine hoist operates with a fixed fixed trajectory and variable payload. As shown in Fig. 5.29 (a), the pulse pattern can be obtained in trips with reduced load, and then reproduced in the full load ones - as previously demonstrated in Fig. 5.28. Fig. 5.29 (b) shows the junction temperature of each device with turn-of losses manipulation for thermal balancing, considering one trip with half of the load. In this case, the temperatures are sensed during the whole profile, the pulse pattern is stored and thereafter used for the full load trips. Fig. 5.30 (d), shows the MCM junction temperatures for the mine hoist operating with full load, yet applying the pulse pattern obtained with the



Figure 5.29: (a) Dc-link current of the mine hoist system mission profile under reduced load, whereby the pulse pattern of the thermal balancing strategies is obtained with full $T_j$ sensing capability, as shown in green. The thermal balancing strategies adopted according to the power flow are highlighted in red. (b) Junction temperature of each die inside the MCM for the mission profile with reduced load, whereby the temperatures are sensed and balanced and the pulse pattern is defined.

turn-off losses manipulation under low load. As can be seen, the temperatures are balanced in a similar way.

Thereafter, the procedure is repeated for all presented strategies, whereby the pulse pattern is obtained under low load and repeated for higher ones. Fig. 5.30 (b) shows the results for the pulse-shadowing strategy, which achieves a temperature reduction of up to $T_{j-max-dir} = 5 \,^{\circ}C$ compared to Fig. 5.30 (a). For the turn-on losses manipulation strategy, the reduction in direct power flow is up to $T_{j-max-dir} = 6.8 \,^{\circ}C$, as shown in Fig. 5.30 (c). Due to its slightly better efficiency, the junction temperature reduction of the turn-off losses manipulation strategy reaches up to $T_{j-max-dir} = 8.3 \,^{\circ}C$, as demonstrated in Fig. 5.30 (d). In reverse power flow, the indirect thermal balancing for diodes is adopted for the three cases, whereby a temperature reduction of $T_{j-max-rev} = 4.9 \,^{\circ}C$ in the maximum highlighted thermal cycling can be observed.

Figure 5.30: FEM-based temperature analysis for the mine hoist mission profile considering the following thermal balancing strategies: (a) None. (b) Pulse-shadowing in direct power flow and diode stress reduction in reverse power flow. (c) Turn-on losses manipulation in direct power flow and diode stress reduction in reverse power flow. (d) Turn-off losses manipulation in direct power flow and diode stress reduction in reverse power flow.

To evaluate the impact of the thermal balancing strategies on the lifetime of the studied mine hoist system, the die-level reliability procedure proposed in Sec. 3.7 is applied. Thereby, the statistical and reliability analyses are carried out considering the temperature and failure probability of each die. Hence, the Monte Carlo analysis is conducted for the four cases presented in Fig. 5.30: without balancing (None) and with each of the presented thermal balancing strategies. As can be seen in Fig. 5.31 (a), without balancing the failure probabilities are more scattered, which results in a higher failure percentage in the first operating years. The thermal balancing strategies, however, shift the failures in time, whereby the samples start to fail later depending on the thermal stress reduction provided by the respective technique. The system-level unreliability results, which consider the failure probability of each die composing the system, are shown in Fig. 5.31 (b). As can be seen, the $B_{10}$ lifetime is increased by 22%, 36% and 51% when the pulse-shadowing, turn-on and turn-off losses manipulation strategies are adopted, respectively, each combined with the indirect strategy in reverse power flow. The temperature and lifetime differences are summarized in Tab. 5.5.
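
The step from per-die failure probabilities to a system-level $B_{10}$ can be illustrated with a minimal sketch. It is not the procedure of Sec. 3.7: it assumes a series system (the module fails when its first die fails), Weibull-distributed die lifetimes, and hypothetical `(eta, beta)` parameters, and it replaces the Monte Carlo sampling by a closed-form evaluation for brevity.

```python
import math

def weibull_cdf(t_years, eta, beta):
    """Failure probability of a single die at time t for a Weibull(eta, beta) lifetime."""
    return 1.0 - math.exp(-((t_years / eta) ** beta))

def system_unreliability(t_years, dies):
    """Series system: it fails as soon as any die fails, so F_sys = 1 - prod(1 - F_i)."""
    survival = 1.0
    for eta, beta in dies:
        survival *= 1.0 - weibull_cdf(t_years, eta, beta)
    return 1.0 - survival

def b_lifetime(dies, fraction=0.10, t_max=200.0, tol=1e-6):
    """Time at which the system unreliability reaches `fraction` (B10 for 0.10), by bisection."""
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if system_unreliability(mid, dies) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical (eta, beta) pairs per die: hotter dies get a shorter characteristic life.
unbalanced = [(40.0, 3.0), (55.0, 3.0), (70.0, 3.0), (90.0, 3.0)]
balanced = [(60.0, 3.0)] * 4

print(f"B10 unbalanced: {b_lifetime(unbalanced):.1f} years")
print(f"B10 balanced:   {b_lifetime(balanced):.1f} years")
```

In such a series model the hottest die dominates the early failures, which is why reducing its stress moves the system-level $B_{10}$ even if the other dies become slightly hotter.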



Figure 5.31: Die-level reliability analysis for the mine hoist system considering the presented thermal balancing strategies: (a) Statistical analysis. (b) Unreliability analysis.

The last analysis evaluates the impact of the thermal balancing strategies on the energy consumed by losses in the system during an expected operating time of 15 years $(E_{15})$. As can be seen in Tab. 5.5, the turn-off losses manipulation shows a slightly reduced energy consumption by losses.

Table 5.5: Comparative analysis of the presented thermal balancing strategies in the mine hoist system, showing the highest temperature, the $B_{10}$ lifetime, the energy consumed only by losses in 15 years of operation ($E_{15}$), and their differences compared with the system with thermal deviations.

| Strategy                          | None   | Pulse-Shadowing | Turn-on | Turn-off |
|-----------------------------------|--------|-----------------|---------|----------|
| $T_{j-max}(^{\circ}C)$            | 110.4  | 105.4           | 103.6   | 102.1    |
| $T_{j-red} (^{\circ}C)$           | -      | 5               | 6.8     | 8.3      |
| $B_{10} (years)$                  | 15.9   | 19.43           | 21.7    | 24.03    |
| $\Delta B_{10} \left( pu \right)$ | 1      | 1.22            | 1.36    | 1.51     |
| $E_{15}\left(GWh\right)$          | 205.36 | 204.40          | 204.64  | 204.97   |
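
The derived rows of Tab. 5.5 follow directly from the first and third rows. The short sketch below recomputes them from the tabulated values as a consistency check; the function and variable names are illustrative only.

```python
# Values copied from Tab. 5.5: strategy -> (highest junction temperature in degC, B10 in years).
table = {
    "None":            (110.4, 15.90),
    "Pulse-Shadowing": (105.4, 19.43),
    "Turn-on":         (103.6, 21.70),
    "Turn-off":        (102.1, 24.03),
}

def derived_rows(table, baseline="None"):
    """Return {strategy: (T_j reduction in degC, B10 gain in pu)} relative to the baseline."""
    t_ref, b10_ref = table[baseline]
    return {
        name: (round(t_ref - t_max, 1), round(b10 / b10_ref, 2))
        for name, (t_max, b10) in table.items()
    }

rows = derived_rows(table)
for name, (t_red, gain) in rows.items():
    print(f"{name:16s} T_j-red = {t_red:4.1f} degC, delta B10 = {gain:.2f} pu")
```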

### 5.7 Technical Analysis of the Proposed Solution

Although the die-level thermal balancing has proven effective in increasing the reliability of MCM-based power converters, its technical implementation requires specific considerations, which are discussed in this section.

### 5.7.1 Multi-Gate Driver Considerations

To control this structure with multiple SWs, a multi-gate driver structure is required. As shown in Fig. 5.32, the multi-port structure contains a digital logic processor which receives the PWM signal from the main controller, manipulates the pulses and generates the gate-emitter voltages for the parallel devices. Even though additional components are required, the potential and the gate charge are preserved, and the same power supply of the ordinary solution can be maintained. As a result, the additional cost is small and the reliability is not considerably modified, because the power supply is the only component in the gate driver with a failure rate high enough to impact the system-level reliability of a high-power electronics system [199]. Furthermore, the



Figure 5.32: Hardware schematic of the Multi-gate driver structure for control and monitoring of multi-gate MCMs.

multi-gate driver structure allows the condition monitoring of a reduced group of dies and wire-bonds. Therefore, a degradation can be detected earlier, which for ordinary multichip modules is possible only after several wire-bond liftoffs [161].

Considering intelligent power modules (IPM) as a future trend in power electronics, pre-driver units can be locally embedded inside the MCM to reduce the commutation loops even further [200, 201]. As demonstrated in [195], the adoption of a selective gate driving approach decouples the commutation path of one die from the others, thereby resulting in a turn-on switching energy reduction of up to 30%.

### 5.7.2 Power Level Margins

Another important consideration is the power level margin for the application of the die-level thermal balancing. The limiting margins during the design and selection of the power devices are defined by voltage, current, temperature and transient power (current x voltage). The voltage is not a concern for the presented thermal strategies, due to the parallel connection among the dies. Even though the devices carry higher current during the pulse processing, their temperatures are reduced, and any strategy can be safely applied if the natural distribution, i.e. without thermal balancing, does not exceed the device limits. The factor limiting the current during thermal balancing is the transient turn-off process, whereby the voltage and current cannot exceed a predefined safe operating area (SOA). Since the voltage is fixed, the equivalent switches shall be selected by respecting a limit on the number of dies and, in case of applications with severe overload conditions, a specific operating point. Otherwise, the devices have a higher probability of failing by dynamic avalanche breakdown or due to an inextinguishable current filament [17, 19]. Considering the proposed 24-die 1.6 kA MCM, however, all the strategies can be applied at nominal load and moderate overloads, since each device carries only 33% of extra current in the absence of one equivalent switch, and the SOA limit of this specific device is twice the rated current [202].
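
The current margin stated above can be checked with a few lines. The sketch below assumes four equivalent switches (the four gate-emitter commands of the considered structure) and ideal, equal current sharing among the conducting switches; it covers only the current-versus-SOA check, not the thermal limits, and the function name is illustrative.

```python
SOA_LIMIT_PU = 2.0  # SOA limit of twice the rated current, as stated in the text

def per_switch_current_pu(load_pu, n_switches, n_shadowed):
    """Current per conducting equivalent switch, in pu of its rated share.

    Assumes ideal, equal current sharing among the conducting switches.
    """
    active = n_switches - n_shadowed
    if active <= 0:
        raise ValueError("at least one equivalent switch must conduct")
    return load_pu * n_switches / active

for load in (1.0, 1.25, 1.5, 1.6):
    i_pu = per_switch_current_pu(load, n_switches=4, n_shadowed=1)
    print(f"load {load:.2f} pu -> {i_pu:.2f} pu per switch, within SOA: {i_pu <= SOA_LIMIT_PU}")
```

With one of four switches shadowed, the remaining three carry 4/3 of their rated share at nominal load, which is the 33% of extra current mentioned in the text.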

### 5.8 Short Summary of the Chapter

In this chapter, thermal balancing strategies are presented to overcome thermal mismatches among the dies of a modified MCM structure. For that, two different strategies for direct power flow and one to reduce the thermal stress on the diodes in reverse power flow are proposed. It is shown that the temperatures can be balanced and the highest thermal stress reduced by up to $9.9 \,^{\circ}C$. A pre-programmed thermal balancing strategy is presented, eliminating the necessity of continuous $T_j$ sensing. The possibility of applying a pre-programmed balancing, as well as the impact of each presented thermal balancing strategy on a mine hoist system, is demonstrated. The thermal balancing strategies are capable of increasing the mine hoist converter lifetime by $22 \,\%$, $36 \,\%$ and $51 \,\%$, considering the pulse-shadowing, the turn-on and the turn-off losses manipulation, respectively. Moreover, a technical analysis is conducted, explaining the considerations and limiting margins for a real field application of the presented strategies. The presented strategies are validated by FEM-based thermal simulations and experimental results.

### 5.8.1 Evaluation System with Equivalent Multi-gate Multichip Module

A multi-gate MCM with multiple dies was not available during the development of this thesis; therefore, an equivalent multi-gate multichip module structure is constructed from a six-pack power module (DP-25F1200T-101666). Hence, the phase devices are placed in parallel via a small bus bar directly connected to the collectors of the upper devices, as shown in Fig. 5.33 (a). In addition to the three parallel devices, the bottom diodes are also attached to the circuit, thereby resulting in a buck converter, as shown in Fig. 5.33 (b). To enable thermal analysis, the silicone gel of the power module is removed and the module is painted black to ensure homogeneous emissivity. To obtain a considerable thermal mismatch, the power module is coupled to two different heatsinks, whereby S1 and S2 are over heatsink H1 whilst S3 is over heatsink H2, as detailed in Fig. 5.33 (a). Therefore, turning on the cooler of H2 the



Figure 5.33: Validation setup consisting of an equivalent multi-gate multichip power module on two heatsinks, driver board, $V_{ce}$ sensing, variable load and infrared camera: (a) Wide view of the open three-die module, $V_{ce}$ sensing and drivers. (b) Schematic.

temperature of S3 is reduced, thereby increasing the thermal mismatch with S1 and S2, which are in turn naturally hotter due to the thermal cross-coupling between them. To drive the parallel group and sense the temperature, a driver board with the $V_{ce}$ sensing circuit developed in Sec. 4.5 is adopted. The driver board is interfaced with a dSPACE MicroLabBox, which has an embedded logic processor and a very fast analog-to-digital converter (ADC), as previously mentioned, and is used for the modulation and pulse processing of the proposed strategies. For the thermal validation of the balancing strategies, a high-resolution infrared (IR) camera is used, as shown in Fig. 5.33 (b). In addition, three resistors are placed in parallel, two of which are connected through circuit breakers, enabling online current variations, as also shown in Fig. 5.33 (b). The adopted operating parameters are stated in Tab. 5.6.



Figure 5.34: Validation system with equivalent multi-gate multichip modules: (a) Output waveforms. (b) Thermal distribution on the equivalent multichip module.

Fig. 5.34 (a) shows the electrical results of the three-die buck converter, including the output voltage, the output current, the current in one device (S1) and the gate commands.

| Parameter        | Value                        |
|------------------|------------------------------|
| Power Module     | DP-25F1200T-101666           |
| Power Capability | $3 \times 25 \ A / 1200 \ V$ |
| $V_{dc}$         | 250 V                        |
| $I_{out}$ (1 pu) | 23 A                         |
| $F_{sw}$         | 5 kHz                        |
| Duty Cycle       | 0.5                          |

Table 5.6: Validation parameters.
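
The operating point implied by Tab. 5.6 can be sketched as follows. The calculation assumes an ideal buck converter in continuous conduction and three identical load resistors, neither of which is stated explicitly in the text; the resistor value is therefore an inferred, illustrative quantity.

```python
V_DC = 250.0      # dc-link voltage (Tab. 5.6)
DUTY = 0.5        # duty cycle (Tab. 5.6)
I_OUT_1PU = 23.0  # output current at 1 pu (Tab. 5.6)
N_RESISTORS = 3   # parallel load resistors, assumed identical

v_out = DUTY * V_DC              # ideal buck converter in continuous conduction
r_total = v_out / I_OUT_1PU      # equivalent load resistance at 1 pu
r_each = r_total * N_RESISTORS   # value of each resistor if all three are equal

def load_current(n_connected):
    """Output current with n_connected of the identical resistors in circuit."""
    return v_out / (r_each / n_connected)

for n in (1, 2, 3):
    i = load_current(n)
    print(f"{n} resistor(s): {i:5.2f} A = {i / I_OUT_1PU:.2f} pu")
```

Under these assumptions, a single connected resistor corresponds to one third of the rated current, consistent with the 0.33 pu low-load condition used in the cycling test.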

Fig. 5.34 (b) shows the obtained thermal distribution among the three devices, with a maximum deviation of $6.43 \,^{\circ}C$ due to the cross-coupling effects between the closer devices (S1, S2) and the lower thermal resistance of the third one (S3), which is influenced by the turned-on air cooling system.

Chapter 6

# **Conclusions and Future Research**

# Contents

| 6.1 | Summary                                                                              |
|-----|--------------------------------------------------------------------------------------|
| 6.2 | Future Research I - Die-Level Thermal Balancing in Wide-Bandgap Devices              |
| 6.3 | Future Research II - Degradation Control Through SOH of Power Semiconductor Devices  |

### 6.1 Summary

Multichip power modules (MCM) will remain the standard solution for high-power mission critical applications for the foreseeable future. This structure contains a plurality of chips inside the same package, thereby achieving high power density and lower cost. However, its structural limitations result in critical thermal mismatches, which ultimately induce substantial thermal stress in a specific subset of devices. Although many strategies to mitigate this effect have been proposed in the last 30 years, MCM-based converters still operate with reduced reliability. Therefore, this work proposed reliability-oriented solutions to reduce the impact of uneven thermal stress and improve the reliability of MCM-based mission critical applications.

The first section has discussed the causes and consequences of uneven thermal distribution in MCMs. A FEM model of a 24-die MCM is developed to demonstrate the thermal deviation among the MCM dies. It is concluded that the thermal cross-coupling affects the thermal unevenness by up to $17.4^{\circ}C$. The consequences of the extra thermal stress in specific dies are presented in sequence, demonstrating that the temperature affects multiple failure mechanisms of power semiconductor devices, potentially provoking either catastrophic or aging failures. At the end, the solutions already proposed to mitigate such effects and avoid premature failures are presented; however, it is concluded that further research is necessary, because MCMs still experience critical thermal deviations nowadays.

The next section has evaluated the impact of the degradation unevenness on the lifetime of MCMs and presents one research target: the addition of a die-level thermal and reliability analysis to the design for reliability (DFR) procedure of MCM-based systems. In this methodology, the temperature and failure probability of each die are taken into account. It is furthermore demonstrated that the common DFR processes may result in considerably erroneous lifetime predictions in MCM-based mission critical applications.

Thereafter, a review of the proposed thermal control and aging monitoring strategies is presented. For the monitoring, the $V_{on}$ has stood out due to its relative simplicity, reasonable resolution requirements and capability to sense temperature and detect degradation. Therefore, a high-resolution circuit is implemented and its capability to close the loop of thermal control strategies is experimentally validated. A resolution of $0.3 \, mV$ and the capability to sense online temperatures in applications of up to $15 \, kHz$ are demonstrated. Moreover, another contribution is presented in this section, whereby a power routing strategy for multiphase machines is proposed. This strategy is based on the capability of a multiphase machine to work under soft-unbalanced conditions without degrading its magnetic performance. Thereby, the strategy proposes to direct the power from the hottest MCM to other phases to alleviate its thermal stress. It is demonstrated that the temperature of one phase can be reduced by $9^{\circ}C$, adding only $1^{\circ}C$ to the remaining ones. As a result, the converter lifetime can be increased by up to 22%, thereby overcoming the thermal deviation effects caused by a defective heatsink.

Based on this review, the next section proposes a die-level thermal balancing to overcome the critical mismatches among the dies in multichip modules. For that, a hypothetical packaging structure with four gate-emitter commands is considered. Thereby, two novel strategies are proposed to balance the temperature among the IGBT dies in direct power flow, one shadowing pulses and the other softly turning off the hottest devices. As a result, a thermal reduction of up to $8.5^{\circ}C$ in the hottest devices is demonstrated. Moreover, a strategy is proposed to reduce the thermal stress among the diodes in negative power factor applications by shadowing specific devices and reducing the cross-coupling effects, thereby achieving a thermal reduction of up to $9.9^{\circ}C$. A reliability and losses analysis has demonstrated better performance for the turn-off losses manipulation in direct power flow, due to its additional capability to reduce the overall MCM losses. This strategy, combined with the diode thermal stress reduction in reverse power flow, can increase the lifetime of an MCM-based mine hoist converter by up to 51%.

### 6.2 Future Research I - Die-Level Thermal Balancing in Wide-Bandgap Devices

The thermal mismatches in parallel wide-bandgap devices are even more critical, due to their not yet consolidated fabrication process. In very high switching frequency applications, for example, thermal deviations of up to $50^{\circ}C$ have been reported [38, 44, 45]. A statistical analysis of the impact of parametric deviations on the transient current distribution and the resulting thermal mismatches of a parallel association of multiple samples of SiC devices is conducted in reference [45]. It is demonstrated that thermal deviations of up to 10, 18 and 32 °C occur for parallel devices switching at 50, 100 and 200 kHz, respectively, even when limiting the deviations to $\sigma < 0.5$ by selecting similar devices. Moreover, wear-out resulting from thermal cycling has become more evident and even more critical in the latest technologies, such as wide-bandgap based power modules [203]. Therefore, investigating the effects and implementation concerns of a die-level thermal balancing in wide-bandgap based MCMs is highly relevant for the future of power electronics.

### 6.3 Future Research II - Degradation Control Through SOH of Power Semiconductor Devices

Even though many methods for online $T_j$ sensing of power devices have been proposed, their implementation in industrial environments is still challenging, since a noisy environment can disturb the accuracy of the sensing circuits. An alternative is to use a data-driven methodology to detect the aging rather than the temperature of the devices. In this way, the devices can be sensed in situ and the pulse patterns updated seasonally, thereby avoiding online sensing disturbances. The $V_{ce}$ sensing circuit implemented in this work has a huge potential to be the aging detector parameter.

## $V_{on}$ -based Sensing Circuit Design

### A.1 Design of a Von-based Sensing Circuit

This appendix gives details about the technical implementation of the $V_{on}$ sensing circuit presented in Sec. 4.5.

### A.1.1 Power Supply

The biggest challenge of the online $V_{on}$ sensing is the precision required to obtain the $T_j$ sensing resolution while ensuring isolation capability. The required isolated power supplies, which operate at a few kHz, can generate noise ranging up to 20 MHz, which may result in interference via the coupling capacitance. Moreover, the pulsating energy flow can cause output voltage ripple and a reflected input ripple current. The ripple is generated in each switching cycle, in which the voltage rises and falls across the output capacitors, which supply the load by themselves between the power transfer cycles. In addition, the switching of the isolated dc-dc converter induces high-frequency and common mode noise on the output voltage and the input current ripple, respectively. Although the input and output ripple can be reduced with additional output capacitance, this has no effect on the common mode noise [204].

Therefore, three separate filters are necessary to reduce ripple and noise to very low levels, whereby each one handles a specific part of the interference spectrum [205]. The step-by-step implementation of a very low noise filter, considering the adopted dc-dc source (R1ZX-0505), which has an inherent 30 mVp-p ripple, is presented in reference [205]. The first step is to add 2 nF capacitors across $V_{out}$ and $V_{in}$, which provide a much lower impedance return path than the 100 pF transformer coupling capacitance. Thereafter, two 10 $\mu F$ multilayer ceramic capacitors are connected in parallel to reduce the ESR, which considerably reduces the input and output ripple. However, adding capacitance to the LC filters is not effective for filtering the high-frequency common mode noise. Therefore, 50 $\mu H$ and 10 $\mu H$ chokes forming common mode Pi-filters are added to the input and output, respectively [205]. The resulting circuit, which is implemented in the developed $V_{on}$ sensing board, is shown in Fig. 4.35 (a).
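
The benefit of the first filter step can be illustrated by comparing the impedance magnitudes of the two capacitances. The sketch below treats both as ideal capacitors and evaluates them at two illustrative frequencies; only the 2 nF, 100 pF and 20 MHz figures come from the text, and the 1 MHz point is an arbitrary example.

```python
import math

C_BYPASS = 2e-9        # capacitor added across V_out and V_in (from the text)
C_COUPLING = 100e-12   # transformer coupling capacitance (from the text)

def capacitor_impedance(c_farad, f_hz):
    """Magnitude of an ideal capacitor's impedance, |Z| = 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farad)

for f in (1e6, 20e6):  # 20 MHz is the upper noise bound quoted in Sec. A.1.1
    z_bypass = capacitor_impedance(C_BYPASS, f)
    z_coupling = capacitor_impedance(C_COUPLING, f)
    print(f"{f / 1e6:4.0f} MHz: bypass {z_bypass:7.2f} ohm, "
          f"coupling {z_coupling:8.2f} ohm, ratio {z_coupling / z_bypass:.0f}")
```

For ideal capacitors the ratio equals the capacitance ratio at any frequency, so the added path is about 20 times lower in impedance than the coupling path through the transformer.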

### A.1.2 Operational Amplifier

The precision of the output sensing is also the main criterion for the selection of the operational amplifier. Therefore, a high-precision, ultra-low noise operational amplifier (ADA4528-1) is selected. In this work, the goal is to sense only the junction temperature of the transistor $(V_{on})$; therefore, a single-supply device is adopted and fed with +5 V referenced to the digital ground. Moreover, the selected amplifier has high common mode ($CMRR = 135 \, dB$) and power supply ($PSRR = 135 \, dB$) rejection ratios, which further reduce the impact of common mode and power supply interference.

#### A.1.3 Isolation

The isolation is also a critical part of the sensing circuit, which is connected to a high-voltage reference. For that, a cost-effective precision amplifier with capacitive isolation is selected (ISO124) [206]. This component has a $50 \, kHz$ signal bandwidth and a $2 \, pF$ differential capacitive barrier rated for up to $1500 \, Vrms$. This isolator, however, introduces a $20 \, mVp-p$ ripple to the signal, which has to be filtered to achieve the required precision. Therefore, a Sallen-Key filter, shown in Fig. 4.35, is added at the output to eliminate the ripple while keeping the full bandwidth [207].



Figure A.1: Transient performance of the  $V_{on}$  sensing circuit, using a square wave generator with duty cycle of 0.5 and switching frequencies of: (a) 5 kHz (b) 10 kHz (c) 15 kHz (d) 20 kHz

### A.1.4 Transient Performance

The transient performance of the $V_{on}$ sensing at each stage of the developed circuit is demonstrated in Fig. A.1. For that, square waves with 5, 10, 15 and 20 kHz and a duty cycle of 0.5 are applied, whereby a signal generator is connected to the board terminals to emulate the $V_{on}$ of a transistor. The generated signal is represented in orange, the op amp output $(V_{on})$ in pink, the isolated signal $(V_{iso})$ in blue and the filtered signal $(V_{filter})$ in green. As can be seen in Fig. A.1, $V_{on}$ has a fast transient response and low disturbance in steady state. $V_{iso}$, however, has an additional ripple and an additional response delay. Moreover, the capability of the filter to mitigate the signal ripple and provide a cleaner output signal is also observed. As shown in Fig. A.1 (a) and (b), the circuit has a satisfactory response at 5 and 10 kHz, despite the additional delays introduced by each circuit stage. Nevertheless, at 15 and 20 kHz, the output filter cannot reach steady state, thereby compromising the $V_{on}$ sensing. In this case, the Sallen-Key filter has to be bypassed and alternative methods shall be adopted.



Figure A.2: Steady-state validation of the sensing circuit showing an output voltage for  $V_{on}$  (pink) and  $V_{filter}$  (green) with a fluctuation of around  $5 \, mV$  (a) Full view (b) Zoom.

#### A.1.5 Steady-State Performance

To validate the steady-state performance, the sensing board is connected to the collector-emitter terminals of an IGBT in a buck converter switching at 1 kHz, operating with 40 V and a current of 1 A. Fig. A.2 shows the $V_{ce}$ sensing during the conduction stage of the device, with an output voltage varying between 850 and 855 mV. As clearly seen in Fig. A.2, $V_{filter}$ has a higher fluctuation than $V_{on}$ due to the additional ripple generated by the isolation circuit, which is mitigated but not eliminated by the output filter.

### A.1.6 Moving Average Digital Filter for High Precision

Although the proposed hardware can effectively sense $V_{on}$ with a reasonable precision (5 mV), a higher resolution is required to achieve $T_j$ sensing capability. Therefore, a digital filter is implemented in the dSPACE MicroLabBox system. For that, the analog-to-digital converter (ADC) channels, which offer 1 MSPS (million samples per second), 16 bits and burst mode operation, are used to take multiple sequential samples at an interval of $1 \mu s$. Then, 50 samples are acquired per period during the steady-state operating region of the circuit and used to implement a moving average filter, as shown in (A.1). Since the ADC operates in burst mode, the multiple samples are taken inside the same period, without affecting the sensing bandwidth.

$$V_{on-avg} = \frac{V_{on}[0] + V_{on}[1] + \dots + V_{on}[49]}{50} \tag{A.1}$$
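
The averaging in (A.1) can be sketched on synthetic data as follows. The on-state voltage and the noise level are assumed values chosen only for illustration, the noise is modeled as uncorrelated Gaussian noise, and the code runs offline rather than on the real-time platform.

```python
import random
import statistics

N_SAMPLES = 50  # samples per switching period, as in (A.1)

def burst_average(samples):
    """Average one burst of V_on samples according to (A.1)."""
    if len(samples) != N_SAMPLES:
        raise ValueError(f"expected {N_SAMPLES} samples, got {len(samples)}")
    return sum(samples) / N_SAMPLES

# Synthetic data: a constant 0.850 V on-state voltage plus Gaussian noise (assumed values).
rng = random.Random(0)
V_TRUE, NOISE_STD = 0.850, 2e-3
bursts = [[rng.gauss(V_TRUE, NOISE_STD) for _ in range(N_SAMPLES)] for _ in range(200)]

raw_std = statistics.pstdev([s for burst in bursts for s in burst])
avg_std = statistics.pstdev([burst_average(b) for b in bursts])
print(f"raw sample std: {raw_std * 1e3:.2f} mV, averaged std: {avg_std * 1e3:.2f} mV")
```

For uncorrelated noise, averaging 50 samples reduces the standard deviation by roughly $\sqrt{50} \approx 7$, which indicates how a hardware precision in the millivolt range can be brought towards the sub-millivolt range.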

# Design and Control of the Adopted Electrical Drives

### **B.1** Electrical Drive Design and Control

The electrical drive design for the mine hoist system study is presented in this appendix. Firstly, a machine is selected from the e-catalog of the company WEG, and its rated characteristics are stated in Tab. B.1. As can be seen, the 1 MW induction machine has a nominal voltage of 690 V and a current of 1070 A at full load $(I_L)$. The current at no load is 560 A $(I_{NL})$, whilst the locked rotor current $(I_i/I_n)$ reaches up to 6520 A and the breakdown torque $(T_{bd})$ is 200 %. This machine has a frequency of 50 Hz and six poles, thereby achieving a speed of 990 rpm at full load, with a slip of 1%. Although the machine rated characteristics are available in the catalog, the parameters of its equivalent magnetic circuit, shown in Fig. B.1, are necessary for simulation purposes and for the realization of the field orientation control. These parameters are derived from the rated characteristics provided by the manufacturer and are also shown in Tab. B.1, where $r_r$, $r_s$ and $r_m$ are the rotor, stator and magnetizing resistances, whereas $X_r$, $X_s$ and $X_m$ are the respective reactances.

To validate the equivalent circuit parameters, the values are loaded in an induction machine model of the electrothermal software. Thereafter, the machine is fed by a three phase voltage source with  $V_{line} = 690$  V, thereby achieving the nominal current at full load, as shown in Fig. B.2. As can be seen in Fig. B.3, a nominal load step is applied



Figure B.1: Induction machine equivalent magnetic circuit.

Table B.1: Rated and equivalent circuit parameters of the  $690\,V/\,1\,MW$  three-phase induction machine.

| Parameter   | Value             |
|-------------|-------------------|
| Power       | 1 MW              |
| $V_{line}$  | 690 V             |
| $I_L$       | 1070 A            |
| $I_{NL}$    | 560 A             |
| $I_i/I_n$   | 6.1               |
| $T_{bd}$    | 200 %             |
| Frequency   | 50 Hz             |
| Poles       | 6                 |
| Speed       | 990 rpm           |
| $cos(\phi)$ | 0.83              |
| $\eta$      | 94.9 %            |
| Slip        | 1 %               |
| J           | $29.46 \ kg\,m^2$ |

| Value     |
|-----------|
| $m\Omega$ |
| $m\Omega$ |
| $5 \ \Omega$ |
| 2 mH      |
| 3 mH      |
| 2 mH      |

at t = 4 s and the machine rotates with nominal speed, showing the rated slip of 1 %.
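
The rated slip follows from the catalog data in Tab. B.1. A minimal check, using the standard synchronous speed relation and illustrative function names:

```python
def synchronous_speed_rpm(frequency_hz, poles):
    """Synchronous speed of an ac machine: n_s = 120 * f / poles."""
    return 120.0 * frequency_hz / poles

def slip(sync_rpm, rotor_rpm):
    """Per-unit slip between the synchronous and the rotor speed."""
    return (sync_rpm - rotor_rpm) / sync_rpm

n_sync = synchronous_speed_rpm(50.0, 6)  # frequency and poles from Tab. B.1
s = slip(n_sync, 990.0)                  # full-load speed from Tab. B.1
print(f"n_s = {n_sync:.0f} rpm, slip = {s * 100:.1f} %")
```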



Figure B.2: Simulation of the induction machine with equivalent magnetic circuit parameters under nominal load (a) Line voltages (b) Stator Currents.



Figure B.3: Photo of the electrical system of the mine hoist.

### **B.1.1** Indirect Field Orientation Control

To drive the mine hoist, the speed of the induction machine has to be controlled, and the indirect field orientation control (IFOC) is adopted. In this strategy, the angle of the rotor flux is calculated by the slip relation (B.1). As can be seen, it includes the lag in flux response ( $\hat{\tau}_r$ ) in a nonlinear calculation and is entirely based on current commands ( $I_{ds}^*$ ,  $I_{qs}^*$ ) [208]. The slip and angle calculator is represented in green in the general diagram of the IFOC, shown in Fig. B.4.

$$S\omega_e^* = \frac{\frac{1}{\hat{\tau}_r} \cdot I_{qs}^*}{\frac{1}{1+\rho\hat{\tau}_r} \cdot I_{ds}^*} \tag{B.1}$$

In the field oriented control, the currents in the d and q axes are decoupled, whereby $I_{ds}^*$ represents the flux command whilst $I_{qs}^*$ is the torque command. $I_{ds}^*$ is applied to magnetize the machine; therefore, it has a fixed value, which is calculated based on the estimated magnetizing inductance $(\hat{L}_m)$ and the rotor flux $(\lambda_{dr}^*)$, as presented in (B.2). Since the skip velocity is controlled in this application, the torque reference $(T^*)$ is provided by the output of the speed controller, and $I_{qs}^*$ is calculated from its relation with the estimated rotor inductance $(\hat{L}_r)$, $\hat{L}_m$ and $\lambda_{dr}^*$, as shown in (B.3).

$$I_{ds}^* = \frac{1}{\hat{L}_m} \cdot (1 + \hat{\tau}_r \rho) \cdot \lambda_{dr}^* \tag{B.2}$$



Figure B.4: Indirect field orientation control for a three-phase induction machine.

$$I_{qs}^* = T^* \cdot \frac{4 \cdot \hat{L}_r}{3 \cdot P \cdot \hat{L}_m \cdot \lambda_{dr}^*} \tag{B.3}$$
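
The three relations (B.1) to (B.3) can be combined into a small reference calculator. The sketch below assumes that $\rho$ denotes the derivative operator, so that it vanishes in steady state, and it uses hypothetical machine parameters that are not taken from Tab. B.1.

```python
from dataclasses import dataclass

@dataclass
class MachineEstimates:
    """Estimated machine parameters used by the IFOC (all values are hypothetical)."""
    l_m: float    # magnetizing inductance, H
    l_r: float    # rotor inductance, H
    tau_r: float  # rotor time constant, s
    poles: int    # number of poles P

def ifoc_references(torque_ref, flux_ref, m):
    """Steady-state current and slip commands from (B.1)-(B.3) with rho set to zero."""
    i_ds = flux_ref / m.l_m                                               # (B.2) with rho = 0
    i_qs = torque_ref * 4.0 * m.l_r / (3.0 * m.poles * m.l_m * flux_ref)  # (B.3)
    slip_freq = i_qs / (m.tau_r * i_ds)                                   # (B.1) with rho = 0
    return i_ds, i_qs, slip_freq

machine = MachineEstimates(l_m=5e-3, l_r=5.2e-3, tau_r=0.5, poles=6)
i_ds, i_qs, w_slip = ifoc_references(torque_ref=1000.0, flux_ref=1.5, m=machine)
print(f"I_ds* = {i_ds:.1f} A, I_qs* = {i_qs:.1f} A, slip frequency = {w_slip:.2f} rad/s")
```

Because the slip command depends only on the two current commands and the estimated rotor time constant, an error in $\hat{\tau}_r$ directly detunes the field orientation, which is why the estimate appears explicitly in (B.1).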

### B.1.2 Control Tuning

As shown in Fig. B.4, the inner current loop has a proportional controller, and the closed loop is modeled by the transfer function shown in (B.4).

$$\frac{i_s}{i_s^*} = \frac{G}{\tau_s s + 1} \tag{B.4}$$

where $G = \frac{R_a}{R_a + r_s}$, $\tau_s = \frac{L'_s}{R_a + r_s}$ and $L'_s = L_s - \frac{L_m^2}{L_r}$. Therefore, considering a cutoff frequency of 500 Hz, one decade below the switching frequency, the proportional gain of the current loop is $R_a = 1.02 \ \Omega$. To verify the capacity of the control loop to reject the disturbance caused by the back electromotive force $(E_s)$ and follow the trajectory, its dynamic roughness is modeled. As shown in (B.5), it correlates the obtained proportional gain with the machine parameters.

$$\left|\frac{E_s}{i_s}\right| = sL'_s + (R_a + r_s) \tag{B.5}$$

Fig. B.5 shows the frequency response of the control loop and its dynamic roughness. As can be seen, the transfer function has the selected cutoff frequency and a small attenuation of G = 0.99, due to the coupling of $r_s$. The dynamic roughness is $R_a + r_s = 1.03 \ \Omega$ at low frequencies and increases with frequency following the slope of $L'_s$.
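
The quoted current-loop figures are mutually consistent, as the following sketch shows. Only $R_a$, $R_a + r_s$ and the cutoff frequency are taken from the text; the stator resistance and the transient inductance computed here are implied values obtained under the first-order model of (B.4), not parameters reported in Tab. B.1.

```python
import math

F_CUTOFF = 500.0  # current-loop cutoff frequency in Hz (from the text)
R_A = 1.02        # proportional gain in ohm (from the text)
R_TOTAL = 1.03    # R_a + r_s in ohm (from the text)

r_s = R_TOTAL - R_A                        # implied stator resistance
gain = R_A / R_TOTAL                       # low-frequency gain G of (B.4)
tau_s = 1.0 / (2.0 * math.pi * F_CUTOFF)   # time constant placing the pole at the cutoff
l_s_transient = tau_s * R_TOTAL            # implied transient inductance L'_s

def dynamic_roughness(f_hz):
    """Magnitude of (B.5), |j*w*L'_s + (R_a + r_s)|, at frequency f_hz."""
    return abs(complex(R_TOTAL, 2.0 * math.pi * f_hz * l_s_transient))

print(f"G = {gain:.2f}, implied L'_s = {l_s_transient * 1e3:.3f} mH")
for f in (1.0, 500.0, 5000.0):
    print(f"|E_s/i_s| at {f:6.0f} Hz: {dynamic_roughness(f):.2f} ohm")
```

Below the cutoff the magnitude stays close to $R_a + r_s$, and above it the inductive term dominates, which matches the behavior described for Fig. B.5.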



Figure B.5: Frequency response of the system, for the current loop.

The outer speed control loop, however, has an additional integral gain and is modeled as shown in (B.6).

$$\frac{w}{w^*} = \frac{sGb_a + GK_a}{J\tau_s s^3 + s^2(J + b\tau_s + G_dG) + s(b + b_aG) + K_aG} \tag{B.6}$$

where $G_d$ is a differential gain to reduce the effects of load variation on the motor speed, $b_a$ the proportional gain, $K_a$ the integral gain, J the moment of inertia and b the viscous friction. The cutoff frequencies are selected to be one and two decades below that of the inner control loop, i.e. 50 Hz and 5 Hz. As a result, considering $G_d = 0$, the proportional and integral gains are $b_a = 1.081 \times 10^4$ and $K_a = 2.9 \times 10^5$, respectively. The dynamic roughness of the speed control loop is based on its capacity to follow the trajectory and reject the disturbance caused by the load torque, as shown in (B.7).

$$\left|\frac{T_L}{w}\right| = \frac{J\tau_s s^3 + s^2 (J + b\tau_s + G_d G) + s(b + b_a G) + K_a G}{s^2 \tau_s + s} \tag{B.7}$$

The frequency response of the transfer function and the dynamic roughness of the speed loop are shown in Fig. B.6. The gain of the speed control loop rolls off by a factor of 10 per decade (20 dB/decade) between 50 and 500 Hz, and by a factor of 100 per decade (40 dB/decade) above 500 Hz, due to the cutoff frequency of the current loop. Unlike the current loop, the gain of $w/w^*$ is unity even at low frequencies, due to the integral action of the speed loop. The dynamic roughness $T_L/w$ is infinite for constant disturbances, i.e. constant load torque. For frequencies between 5 and 50 Hz, the dynamic roughness reaches its lowest values, with a minimum around $b_a G = 1.0 \times 10^4$. For frequencies higher than 50 Hz, the dynamic roughness increases again with a slope defined by $J$. Therefore, high-frequency disturbances are rejected only by the inertia of the system.



Figure B.6: Frequency response of the system, for the speed loop.
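The trends described for Fig. B.6 can be reproduced qualitatively by evaluating (B.6) and (B.7) at $s = j2\pi f$. The sketch below is only illustrative: $b_a$, $K_a$, $G_d = 0$, G = 0.99 and the 500 Hz current-loop cutoff come from the text, whereas the inertia $J = 34$ kg·m² and the friction coefficient $b = 0.1$ N·m·s/rad are hypothetical placeholders, not the parameters of the selected machine.

```python
import math

PARAMS = {
    "G": 0.99,                                # current-loop static gain (text)
    "tau_s": 1.0 / (2.0 * math.pi * 500.0),   # from the 500 Hz current-loop cutoff
    "b_a": 1.081e4,                           # proportional gain (text)
    "K_a": 2.9e5,                             # integral gain (text)
    "G_d": 0.0,                               # differential gain (text)
    "J": 34.0,                                # hypothetical inertia, kg*m^2
    "b": 0.1,                                 # hypothetical viscous friction
}


def _denominator(s, p):
    """Common polynomial of (B.6) and (B.7)."""
    return (p["J"] * p["tau_s"] * s**3
            + (p["J"] + p["b"] * p["tau_s"] + p["G_d"] * p["G"]) * s**2
            + (p["b"] + p["b_a"] * p["G"]) * s
            + p["K_a"] * p["G"])


def speed_tracking_gain(f_hz, p=PARAMS):
    """|w / w*| from (B.6)."""
    s = 1j * 2.0 * math.pi * f_hz
    return abs((s * p["G"] * p["b_a"] + p["G"] * p["K_a"]) / _denominator(s, p))


def speed_loop_roughness(f_hz, p=PARAMS):
    """|T_L / w| from (B.7); requires f_hz > 0 because of the pole at s = 0."""
    s = 1j * 2.0 * math.pi * f_hz
    return abs(_denominator(s, p) / (p["tau_s"] * s**2 + s))


if __name__ == "__main__":
    for f in (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0):
        print(f"f = {f:8.2f} Hz  |w/w*| = {speed_tracking_gain(f):.4f}  "
              f"|T_L/w| = {speed_loop_roughness(f):.3e}")
```

With these placeholder values the tracking gain tends to unity at low frequency, and the roughness grows without bound as the frequency approaches zero and stays close to $b_a G$ around 10 Hz. Other values of $J$ and $b$ shift the corner frequencies without changing this overall shape.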

### B.1.3 Time Domain Response

To evaluate the time domain response of the control system, the induction machine is started with a ramp of 100 rpm/s, and a nominal load step is applied when the machine reaches the nominal speed, as shown in Fig. B.7. In Fig. B.7 (b), an electrical torque $(T_e)$ is observed during start-up, and $T_e$ follows the reference when the load step is applied at t = 15 s. The speed control performance in the time domain is demonstrated in Fig. B.7 (a), where the measured speed follows the reference with a small overshoot, even after the nominal load step.


Figure B.7: Time domain response of the IFOC applied in the selected induction machine: (a) Reference and measured speed (b) Electrical and load torque.

## Bibliography

- [1] S. Yang, A. Bryant, P. Mawby, D. Xiang, L. Ran, and P. Tavner, "An industry-based survey of reliability in power electronic converters," *IEEE Transactions on Industry Applications*, vol. 47, no. 3, pp. 1441–1451, May 2011.
- [2] T. J. Hesla, "Electrification of a major steel mill part 5: Scherbius and kraemer drives [history]," *IEEE Industry Applications Magazine*, vol. 13, no. 4, pp. 8–11, July 2007.
- [3] R. R. Bastos, T. S. de Souza, M. M. de Carvalho, L. A. R. Silva, and B. J. C. Filho, "Assessment of a nine-phase induction motor drive for metal industry applications," in 2019 IEEE Industry Applications Society Annual Meeting, Sep. 2019, pp. 1–9.
- [4] D. Vergne, Hard Rock Miners Handbook, M. Engineering, Ed. McIntosh Engineering, 2003.
- [5] V. N. Ferreira, G. A. Mendonca, A. V. Rocha, R. S. Resende, and B. J. C. Filho, "Mission critical analysis and design of igbt-based power converters applied to mine hoist systems," *IEEE Transactions on Industry Applications*, vol. PP, no. 99, pp. 1–1, 2017.
- [6] V. N. Ferreira, A. F. Cupertino, H. A. Pereira, A. V. Rocha, S. I. Seleme, and B. Cardoso, "Design and selection of high reliability converters for mission critical industrial applications: A rolling mill case study," *IEEE Transactions on Industry Applications*, pp. 1–1, 2018.

- [7] J. Zhang and D. Zhang, "Study of response surface methodology in thermal optimization design of multichip modules," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 3, no. 12, pp. 2075–2080, Dec 2013.
- [8] B. Sarlioglu, C. T. Morris, D. Han, and S. Li, "Driving toward accessibility: A review of technological improvements for electric machines, power electronics, and batteries for electric and hybrid vehicles," *IEEE Industry Applications Magazine*, vol. 23, no. 1, pp. 14–25, Jan 2017.
- [9] H. Cheng, I. Chung, and W. Chen, "Response surface based optimization approach for thermal placement design of chips in multiple-chip modules," *IEEE Transactions on Components and Packaging Technologies*, vol. 32, no. 3, pp. 531–541, Sep. 2009.
- [10] H. Li, W. Zhou, X. Wang, S. Munk-Nielsen, D. Li, Y. Wang, and X. Dai, "Influence of paralleling dies and paralleling half-bridges on transient current distribution in multichip power modules," *IEEE Transactions on Power Electronics*, vol. 33, no. 8, pp. 6483–6487, Aug 2018.
- [11] T. Ohi, T. Horiguchi, T. Okuda, T. Kikunaga, and H. Matsumoto, "Analysis and measurement of chip current imbalances caused by the structure of bus bars in an igbt module," in *Conference Record of the 1999 IEEE Industry Applications Conference. Thirty-Forth IAS Annual Meeting (Cat. No.99CH36370)*, vol. 3, Oct 1999, pp. 1775–1779 vol.3.
- [12] H. Li, S. Munk-Nielsen, X. Wang, R. Maheshwari, S. Bęczkowski, C. Uhrenfeldt, and W.-T. Franke, "Influences of device and circuit mismatches on paralleling silicon carbide mosfets," *IEEE Transactions on Power Electronics*, vol. 31, no. 1, pp. 621–634, Jan 2016.
- [13] R. Sullhan, M. Fredholm, T. Monaghan, A. Agarwal, and B. Kozarek, "Thermal modeling and analysis of pin grid arrays and multichip modules," in 1991 Proceedings, Seventh IEEE Semiconductor Thermal Measurement and Management Symposium, Feb 1991, pp. 110–116.

- [14] Z. Zeng, X. Li, X. Zhang, and L. Cao, "Comparative evaluation of kelvin connection for current sharing of multi-chip power modules," in 2018 IEEE Energy Conversion Congress and Exposition (ECCE), Sep. 2018, pp. 4664–4670.
- [15] F. Iannuzzo, C. Abbate, and G. Busatto, "Instabilities in silicon power devices: A review of failure mechanisms in modern power devices," *IEEE Industrial Electronics Magazine*, vol. 8, no. 3, pp. 28–39, Sept 2014.
- [16] V. Smet, F. Forest, J. J. Huselstein, F. Richardeau, Z. Khatir, S. Lefebvre, and M. Berkani, "Ageing and failure modes of igbt modules in high-temperature power cycling," *IEEE Transactions on Industrial Electronics*, vol. 58, no. 10, pp. 4931–4941, Oct 2011.
- [17] X. Perpina, J. Serviere, J. Urresti-Ibanez, I. Cortes, X. Jorda, S. Hidalgo, J. Rebollo, and M. Mermet-Guyennet, "Analysis of clamped inductive turnoff failure in railway traction igbt power modules under overload conditions," *IEEE Transactions on Industrial Electronics*, vol. 58, no. 7, pp. 2706–2714, July 2011.
- [18] H. Du, P. D. Reigosa, F. Iannuzzo, and L. Ceccarelli, "Impact of the case temperature on the reliability of sic mosfets under repetitive short circuit tests," in 2019 IEEE Applied Power Electronics Conference and Exposition (APEC), March 2019, pp. 332–337.
- [19] H. Schulze, F. Niedernostheide, F. Pfirsch, and R. Baburske, "Limiting factors of the safe operating area for power devices," *IEEE Transactions on Electron Devices*, vol. 60, no. 2, pp. 551–562, Feb 2013.
- [20] J. Falck, C. Felgemacher, A. Rojko, M. Liserre, and P. Zacharias, "Reliability of power electronic systems: An industry perspective," *IEEE Industrial Electronics Magazine*, vol. 12, no. 2, pp. 24–35, June 2018.

- [21] E. Wolfgang, "Examples for failures in power electronics systems," in presented at ECPE Tutorial Reliability Power Electronics Systems, Nuremberg, Germany, 2007.
- [22] A. Bar-Cohen, "Thermal management of air- and liquid-cooled multichip modules," *IEEE Transactions on Components, Hybrids, and Manufacturing Technology*, vol. 10, no. 2, pp. 159–175, June 1987.
- [23] M. Aghazadeh and P. Jain, "Thermal performance of multi-chip modules," 1987.
- [24] R. Sullhan, M. Fredholm, T. Monaghan, A. Agarwal, and B. Kozarek, "Thermal modeling and analysis of pin grid arrays and multichip modules," in 1991 Proceedings, Seventh IEEE Semiconductor Thermal Measurement and Management Symposium, Feb 1991, pp. 110–116.
- [25] K. Lampaert, G. Gielen, and W. Sansen, "Thermally constrained placement of smart-power ic's and multi-chip modules," in *Thirteenth Annual IEEE Semiconductor Thermal Measurement and Management Symposium*, Jan 1997, pp. 106–111.
- [26] Y. Huang, S. Fu, and Y. Chang, "Fuzzy logic based thermal design for mcm placement," in 2nd 1998 IEMT/IMC Symposium (IEEE Cat. No.98EX225), April 1998, pp. 179–184.
- [27] S. Li, L. M. Tolbert, F. Wang, and F. Z. Peng, "Stray inductance reduction of commutation loop in the p-cell and n-cell-based igbt phase leg module," *IEEE Transactions on Power Electronics*, vol. 29, no. 7, pp. 3616–3624, July 2014.
- [28] J. Ewanchuk, J. Brandelero, and S. Mollov, "A gate driver based approach to improving the current density in a power module by equalizing the individual die temperatures," in 2018 IEEE Energy Conversion Congress and Exposition (ECCE), Sept 2018, pp. 4652–4658.

- [29] V. N. Ferreira, M. Andresen, B. Cardoso, and M. Liserre, "Pulse-shadowing based thermal balancing in multichip modules," *IEEE Transactions on Industry Applications*, pp. 1–1, 2020.
- [30] D. A. Murdock, J. E. R. Torres, J. J. Connors, and R. D. Lorenz, "Active thermal control of power electronic modules," *IEEE Transactions on Industry Applications*, vol. 42, no. 2, pp. 552–558, March 2006.
- [31] V. Raveendran, M. Andresen, G. Buticchi, and M. G. Liserre, "Thermal stress based power routing of smart transformer with chb and dab converters," *IEEE Transactions on Power Electronics*, pp. 1–1, 2019.
- [32] V. Raveendran, M. Andresen, and M. Liserre, "Improving onboard converter reliability for more electric aircraft with lifetime-based control," *IEEE Transactions on Industrial Electronics*, vol. 66, no. 7, pp. 5787–5796, July 2019.
- [33] H. Wang, M. Liserre, F. Blaabjerg, P. de Place Rimmen, J. B. Jacobsen, T. Kvisgaard, and J. Landkildehus, "Transitioning to physics-of-failure as a reliability driver in power electronics," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, vol. 2, no. 1, pp. 97–114, March 2014.
- [34] H. Wang, M. Liserre, and F. Blaabjerg, "Toward reliable power electronics: Challenges, design tools, and opportunities," *IEEE Industrial Electronics Magazine*, vol. 7, no. 2, pp. 17–26, June 2013.
- [35] J. V. M. Farias, A. F. Cupertino, V. De Nazareth Ferreira, H. A. Pereira, S. I. Seleme Junior, and R. Teodorescu, "Reliability-oriented design of modular multilevel converters for medium-voltage statcom," *IEEE Transactions on Industrial Electronics*, pp. 1–1, 2019.

- [36] S. Peyghami, Z. Wang, and F. Blaabjerg, "A guideline for reliability prediction in power electronic converters," *IEEE Transactions on Power Electronics*, pp. 1–1, 2020.
- [37] H. Wang, M. Su, and K. Sheng, "Theoretical performance limit of the igbt," *IEEE Transactions on Electron Devices*, vol. 64, no. 10, pp. 4184–4192, Oct 2017.
- [38] D. Peftitsis, R. Baburske, J. Rabkowski, J. Lutz, G. Tolstoy, and H. Nee, "Challenges regarding parallel connection of sic jfets," *IEEE Transactions on Power Electronics*, vol. 28, no. 3, pp. 1449–1463, March 2013.
- [39] J. Ewanchuk, J. Brandelero, and S. Mollov, "Lifetime extension of a multi-die sic power module using selective gate driving with temperature feedforward compensation," in 2017 IEEE Energy Conversion Congress and Exposition (ECCE), Oct 2017, pp. 2520–2526.
- [40] R. Wu, L. Smirnova, H. Wang, F. Iannuzzo, and F. Blaabjerg, "Comprehensive investigation on current imbalance among parallel chips inside mw-scale igbt power modules," in 2015 9th International Conference on Power Electronics and ECCE Asia (ICPE-ECCE Asia), June 2015, pp. 850–856.
- [41] P. M. Fabis, D. Shum, and H. Windischmann, "Thermal modeling of diamond-based power electronics packaging," in *Fifteenth Annual IEEE Semiconductor Thermal Measurement and Management Symposium (Cat. No.99CH36306)*, March 1999, pp. 98–104.
- [42] V. N. Ferreira, G. A. Mendonça, A. V. Rocha, R. S. Resende, and B. J. C. Filho, "Proactive fault-tolerant igbt-based power converters for mission critical applications in mw range," in *Applied Power Electronics Conference*, 2017.
- [43] A. Volke and M. Hornkamp, *IGBT Module*, I. Technologies, Ed. Infineon Technologies, 2011.

- [44] H. Li, "Parallel connection of silicon carbide mosfets for multichip power modules," Ph.D. dissertation, Aalborg University, 2015.
- [45] A. Borghese, M. Riccio, A. Fayyaz, A. Castellazzi, L. Maresca, G. Breglio, and A. Irace, "Statistical analysis of the electrothermal imbalances of mismatched parallel sic power mosfets," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, vol. 7, no. 3, pp. 1527–1538, Sep. 2019.
- [46] X. Wang, Z. Zhao, and L. Yuan, "Current sharing of igbt modules in parallel with thermal imbalance," in 2010 IEEE Energy Conversion Congress and Exposition, Sep. 2010, pp. 2101–2108.
- [47] B. Baliga, Fundamentals of Power Semiconductor Devices, Springer, Ed. Springer, 2008.
- [48] M. Akbari, A. S. Bahman, P. D. Reigosa, F. Iannuzzo, and M. T. Bina, "Thermal modeling of wire-bonded power modules considering non-uniform temperature and electric current interactions," *Microelectronics Reliability*, 2018.
- [49] U. S. Marion Kind, Estimation of Liquid Cooled Heat Sink Performance at Different Operation Conditions., Semikron, 2015.
- [50] A. Wintrich, U. Nicolai, and W. Tursky, Application Manual Power Semiconductors, Semikron International GmbH, 2011.
- [51] M. Marz and P. Nance, Thermal modeling of Power electronic System, Infineon Technologies, 2012.
- [52] U. Drofenik, D. Cottet, A. Müsing, J.-M. Meyer, and J. Kolar, "Modelling the thermal coupling between internal power semiconductor dies of a water-cooled 3300v/1200a hipak igbt module," in Proc. Intl. Conference on Power Electronics Intelligent Motion Power Quality (PCIM2007), 2007.

- [53] B. A. Zahn, "Steady state thermal characterization of multiple output devices using linear superposition theory and a non-linear matrix multiplier," in Fourteenth Annual IEEE Semiconductor Thermal Measurement and Management Symposium (Cat. No.98CH36195), March 1998, pp. 39–46.
- [54] B. S. Lall, B. M. Guenin, and R. J. Molnar, "Methodology for thermal evaluation of multichip modules," in *Proceedings of 1995 IEEE/CPMT 11th Semiconductor Thermal Measurement and Management Symposium (SEMI-THERM)*, Feb 1995, pp. 72–79.
- [55] D. J. W. Sofia, Electrical Thermal Resistance Measurements for Hybrids and Multi-Chip Packages, Analysis Technology, Inc., Wakefield, MA, 1990.
- [56] V. V. N. Obreja, C. Codreanu, and K. I. Nuttall, "Reverse leakage current instability of power fast switching diodes operating at high junction temperature," in 2005 IEEE 36th Power Electronics Specialists Conference, June 2005, pp. 537–540.
- [57] P. Spirito, G. Breglio, V. d'Alessandro, and N. Rinaldi, "Analytical model for thermal instability of low voltage power mos and soa in pulse operation," in *Proceedings of the 14th International Symposium on Power Semiconductor Devices and Ics*, 2002, pp. 269–272.
- [58] G. Breglio, A. Irace, E. Napoli, M. Riccio, and P. Spirito, "Experimental detection and numerical validation of different failure mechanisms in igbts during unclamped inductive switching," *IEEE Transactions on Electron Devices*, vol. 60, no. 2, pp. 563–570, Feb 2013.
- [59] Y. Mizuno, R. Tagami, and K. Nishiwaki, "Investigations of inhomogeneous operation of igbt under unclamped inductive switching condition," in 2010 22nd International Symposium on Power Semiconductor Devices IC's (ISPSD), June 2010, pp. 137–140.

- [60] J. Lutz and M. Domeij, "Dynamic avalanche and reliability of high voltage diodes," *Microelectronics Reliability*, vol. 43, pp. 529–536, 2003.
- [61] U. Choi, F. Iannuzzo, and F. Blaabjerg, "Junction temperature estimation method for a 600v, 30a igbt module during converter operation," *Microelectronics Reliability*, 2015.
- [62] M. Ciappa, "Selected failure mechanisms of modern power modules," *Microelectronics Reliability*, vol. 42, pp. 653–667, 2002.
- [63] V. Ferreira, M. Andresen, B. Cardoso, and M. Liserre, "Active redundancy in the low voltage stage of smart transformers," in 2018 IEEE Energy Conversion Congress and Exposition (ECCE), Sept 2018, pp. 471–477.
- [64] J. Ewanchuk, J. Brandelero, and S. Mollov, "Improving the die utilization and lifetime in a multi-die sic power module by means of integrated per-die gate buffers," in 2017 29th International Symposium on Power Semiconductor Devices and IC's (ISPSD), May 2017, pp. 439–442.
- [65] S. H. Ali, X. Li, A. S. Kamath, and B. Akin, "A simple plug-in circuit for igbt gate drivers to monitor device aging: Toward smart gate drivers," *IEEE Power Electronics Magazine*, vol. 5, no. 3, pp. 45–55, Sep. 2018.
- [66] Y. Avenas, L. Dupont, N. Baker, H. Zara, and F. Barruel, "Condition monitoring: A decade of proposed techniques," *IEEE Industrial Electronics Magazine*, vol. 9, no. 4, pp. 22–36, Dec 2015.
- [67] B. Tian, Z. Wang, and W. Qiao, "Study on case temperature distribution for condition monitoring of multidie igbt modules," in 2014 IEEE Applied Power Electronics Conference and Exposition - APEC 2014, March 2014, pp. 2564–2568.
- [68] H. Li, S. Munk-Nielsen, S. Bęczkowski, and X. Wang, "A novel dbc layout for current imbalance mitigation in sic mosfet multichip power modules," *IEEE Transactions on Power Electronics*, vol. 31, no. 12, pp. 8042–8045, Dec 2016.

- [69] A. S. Bahman and F. Blaabjerg, "Optimization tool for direct water cooling system of high power igbt modules," in 2016 18th European Conference on Power Electronics and Applications (EPE'16 ECCE Europe), Sept 2016, pp. 1–10.
- [70] A. V. Rocha, G. J. Franca, M. E. dos Santos, H. de Paula, and B. J. C. Filho, "Increasing long-belt-conveyor availability by using fault-resilient medium-voltage ac drives," *IEEE Transactions on Industry Applications*, vol. 48, no. 5, pp. 1708–1716, Sep. 2012.
- [71] M. Held, P. Jacob, G. Nicoletti, P. Scacco, and M. Poech, "Fast power cycling test for igbt modules in traction application," in *Proc. Int. Conf. Power Electron. Drive Syst.*, 1997, pp. 425–430.
- [72] M. Guyennet and M. Piton, "Railway traction reliability," in CIPS 2010 Nuremberg/Germany, 2010.
- [73] B. Farokhzad, P. Turkes, E. Wolfgang, and K. Goser, "Reliability indicators for lift-off of bond wires in igbt power-modules," in *Proceedings of the 7th European Symposium on Reliability of Electron Devices, Failure Physics and Analysis*, Oct 1996, pp. 1863–1866.
- [74] H. Wang, M. Liserre, F. Blaabjerg, P. Rimmen, J. Jacobsen, T. Kvisgaard, and J. Landkildehus, "Transitioning to physics-of-failure as a reliability driver in power electronics," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, vol. 2, no. 1, March 2014.
- [75] D. Astigarraga, F. M. Ibanez, A. Galarza, J. M. Echeverria, I. Unanue, P. Baraldi, and E. Zio, "Analysis of the results of accelerated aging tests in insulated gate bipolar transistors," *IEEE Transactions on Power Electronics*, vol. 31, no. 11, pp. 7953–7962, Nov 2016.

- [76] R. Bayerer, T. Herrmann, T. Licht, J. Lutz, and M. Feller, "Model for power cycling lifetime of igbt modules - various factors influencing lifetime," in 5th International Conference on Integrated Power Electronics Systems, March 2008, pp. 1–6.
- [77] P. D. Reigosa, H. Wang, Y. Yang, and F. Blaabjerg, "Prediction of bond wire fatigue of igbts in a pv inverter under a long-term operation," *IEEE Transactions on Power Electronics*, vol. 31, no. 10, pp. 7171–7182, Oct 2016.
- [78] A. Sangwongwanich, Y. Yang, D. Sera, F. Blaabjerg, and D. Zhou, "On the impacts of pv array sizing on the inverter reliability and lifetime," *IEEE Transactions on Industry Applications*, vol. 54, no. 4, pp. 3656–3667, July 2018.
- [79] D. Zhou, H. Wang, and F. Blaabjerg, "Mission profile based system-level reliability analysis of dc/dc converters for a backup power application," *IEEE Transactions on Power Electronics*, vol. 33, no. 9, pp. 8030–8039, Sept 2018.
- [80] A. F. Cupertino, J. M. Lenz, E. M. Brito, H. A. Pereira, J. R. Pinheiro, and S. I. Seleme, "Impact of the mission profile length on lifetime prediction of pv inverters," *Microelectronics Reliability*, vol. 100-101, p. 113427, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0026271419305438
- [81] K. Ma, M. Liserre, F. Blaabjerg, and T. Kerekes, "Thermal loading and lifetime estimation for power device considering mission profiles in wind power converter," *Thermal Loading and Lifetime Estimation for Power Device Considering Mission Profiles in Wind Power Converter*, 2015.
- [82] S. Downing and D. Socie, "Simple rainflow counting algorithms," Int. J. Fatigue, vol. 4, no. 1, pp. 31–40, 1982.
- [83] M. Musallam and C. M. Johnson, "An efficient implementation of the rainflow counting algorithm for life consumption estimation," *IEEE Transactions on Reliability*, vol. 61, no. 4, pp. 978–986, Dec 2012.

- [84] U. Choi, F. Blaabjerg, and S. Jørgensen, "Power cycling test methods for reliability assessment of power device modules in respect to temperature stress," *IEEE Transactions on Power Electronics*, vol. 33, no. 3, pp. 2531–2551, March 2018.
- [85] R. Bayerer, T. Herrmann, T. Licht, J. Lutz, and M. Feller, "Model for power cycling lifetime of igbt modules - various factors influencing lifetime," in *Proc. CIPS*, 2008, pp. 1–6.
- [86] U. Scheuermann and M. Junghaenel, "Limitation of power module lifetime derived from active power cycling tests," in CIPS 2018; 10th International Conference on Integrated Power Electronics Systems, March 2018, pp. 1–10.
- [87] U. Scheuermann and R. Schmidt, "A new lifetime model for advanced power modules with sintered chips and optimized al wire bonds," 2013.
- [88] M. Musallam, C. Yin, C. Bailey, and M. Johnson, "Mission profile-based reliability design and real-time life consumption estimation in power electronics," *IEEE Transactions on Power Electronics*, vol. 30, no. 5, pp. 2601–2613, 2015.
- [89] M. Miner, "Cumulative damage in fatigue," J. Appl. Mech, vol. 67, pp. A159–A164, 1945.
- [90] A. Sangwongwanich, Y. Yang, D. Sera, and F. Blaabjerg, "Lifetime evaluation of grid-connected pv inverters considering panel degradation rates and installation sites," *IEEE Transactions on Power Electronics*, vol. 33, no. 2, pp. 1225–1236, Feb 2018.
- [91] N. Kaminski, Failure Rates of HiPak Modules due to Cosmic Rays, ABB Switzerland Ltd, Semiconductors, 2004.

- [92] STMicroelectronics, STG40M120F3D7, https://www.st.com/resource/en/datasheet/stg40m120
  2017.
- [93] Y. Ko, V. Raveendran, M. Andresen, and M. Liserre, "Thermally compensated discontinuous modulation for mvac/lvdc building blocks of modular smart transformers," *IEEE Transactions on Power Electronics*, vol. 35, no. 1, pp. 220–231, Jan 2020.
- [94] Y. Ko, V. Raveendran, M. Andresen, and M. G. Liserre, "Advanced discontinuous modulation for thermally compensated modular smart transformers," *IEEE Transactions on Power Electronics*, pp. 1–1, 2019.
- [95] V. G. Monopoli, A. Marquez, J. I. Leon, Y. Ko, G. Buticchi, and M. Liserre, "Improved harmonic performance of cascaded h-bridge converters with thermal control," *IEEE Transactions on Industrial Electronics*, vol. 66, no. 7, pp. 4982–4991, July 2019.
- [96] Y. Ko, M. Andresen, G. Buticchi, and M. Liserre, "Discontinuous-modulation-based active thermal control of power electronic modules in wind farms," *IEEE Transactions on Power Electronics*, vol. 34, no. 1, pp. 301–310, Jan 2019.
- [97] M. Liserre, M. Andresen, L. Costa, and G. Buticchi, "Power routing in modular smart transformers: Active thermal control through uneven loading of cells," *IEEE Industrial Electronics Magazine*, vol. 10, no. 3, pp. 43–53, Sept 2016.
- [98] Y. Ko, M. Andresen, G. Buticchi, and M. Liserre, "Power routing for cascaded h-bridge converters," *IEEE Transactions on Power Electronics*, vol. 32, no. 12, pp. 9435–9446, Dec 2017.
- [99] F. Hahn, M. Andresen, G. Buticchi, and M. Liserre, "Thermal analysis and balancing for modular multilevel converters in hvdc applications," *IEEE Transactions on Power Electronics*, vol. 33, no. 3, pp. 1985–1996, March 2018.

- [100] D. Kaczorowski, B. Michalak, and A. Mertens, "A novel thermal management algorithm for improved lifetime and overload capabilities of traction converters," in 2015 17th European Conference on Power Electronics and Applications (EPE'15 ECCE-Europe), Sept 2015, pp. 1–10.
- [101] C. H. van der Broeck, L. A. Ruppert, R. D. Lorenz, and R. W. D. Doncker, "Active thermal cycle reduction of power modules via gate resistance manipulation," in 2018 IEEE Applied Power Electronics Conference and Exposition (APEC), March 2018, pp. 3074–3082.
- [102] P. K. Prasobhu, V. Raveendran, G. Buticchi, and M. Liserre, "Active thermal control of a dc/dc gan-based converter," in 2017 IEEE Applied Power Electronics Conference and Exposition (APEC), March 2017, pp. 1146–1152.
- [103] J. Falck, G. Buticchi, and M. Liserre, "Thermal stress based model predictive control of electric drives," *IEEE Transactions on Industry Applications*, vol. 54, no. 2, pp. 1513–1522, March 2018.
- [104] M. Novak, T. Dragicevic, and F. Blaabjerg, "Finite set mpc algorithm for achieving thermal redistribution in a neutral-point-clamped converter," in *IECON 2018* - 44th Annual Conference of the IEEE Industrial Electronics Society, Oct 2018, pp. 5290–5296.
- [105] M. Novak, V. N. Ferreira, M. Andresen, T. Dragicevic, F. Blaabjerg, and M. Liserre, "Fs-mpc algorithm for optimized operation of a hybrid active neutral point clamped converter," in 2019 IEEE Energy Conversion Congress and Exposition (ECCE), Sep. 2019, pp. 1447–1453.
- [106] T. Bruckner, S. Bernet, and P. K. Steimer, "Feedforward loss control of three-level active npc converters," *IEEE Transactions on Industry Applications*, vol. 43, no. 6, pp. 1588–1596, Nov 2007.

- [107] Q. Guan, C. Li, Y. Zhang, S. Wang, D. D. Xu, W. Li, and H. Ma, "An extremely high efficient three-level active neutral-point-clamped converter comprising sic and si hybrid power stages," *IEEE Transactions on Power Electronics*, vol. 33, no. 10, pp. 8341–8352, Oct 2018.
- [108] M. Novak, V. Šunde, N. Čobanov, and Ž. Jakopović, "Semiconductor loss distribution evaluation for three level anpc converter using different modulation strategies," in 2017 19th International Conference on Electrical Drives and Power Electronics (EDPE), Oct 2017, pp. 170–177.
- [109] J. Callegari, A. Cupertino, V. Ferreira, E. Brito, V. Mendes, and H. Pereira, "Adaptive dc-link voltage control strategy to increase pv inverter lifetime," *Microelectronics Reliability*, vol. 100-101, p. 113439, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0026271419305426
- [110] G. Buticchi, M. Andresen, M. Wutti, and M. Liserre, "Lifetime-based power routing of a quadruple active bridge dc/dc converter," *IEEE Transactions on Power Electronics*, vol. 32, no. 11, pp. 8892–8903, Nov 2017.
- [111] V. Raveendran, M. Andresen, and M. Liserre, "Lifetime control of modular smart transformers considering the maintenance schedule," in 2018 IEEE Energy Conversion Congress and Exposition (ECCE), Sept 2018, pp. 60–66.
- [112] V. N. Ferreira, R. R. Bastos, T. S. de Souza, M. Liserre, and B. J. Cardoso Filho, "Power routing to enhance the lifetime of multiphase drives," in 2019 IEEE Energy Conversion Congress and Exposition (ECCE), Sep. 2019, pp. 3215–3222.
- [113] R. R. Bastos, R. M. Valle, S. L. Nau, and B. J. C. Filho, "Modelling and analysis of a nine-phase induction motor with third harmonic current injection," in 2015 9th International Conference on Power Electronics and ECCE Asia (ICPE-ECCE Asia), June 2015, pp. 688–694.

- [114] J.-R. Fu and T. A. Lipo, "Disturbance-free operation of a multiphase current-regulated motor drive with an opened phase," *IEEE Transactions on Industry Applications*, vol. 30, no. 5, pp. 1267–1274, Sep. 1994.
- [115] T. S. de Souza, R. R. Bastos, and B. J. C. Filho, "Modeling and control of a nine-phase induction machine with open phases," *IEEE Transactions on Industry Applications*, vol. 54, no. 6, pp. 6576–6585, Nov 2018.
- [116] A. S. Abdel-Khalik, M. S. Hamad, A. M. Massoud, and S. Ahmed, "Postfault operation of a nine-phase six-terminal induction machine under single open-line fault," *IEEE Transactions on Industrial Electronics*, vol. 65, no. 2, pp. 1084–1096, Feb 2018.
- [117] F. Barrero and M. J. Duran, "Recent advances in the design, modeling, and control of multiphase machines - part i," *IEEE Transactions on Industrial Electronics*, vol. 63, no. 1, pp. 449–458, Jan 2016.
- [118] M. J. Duran and F. Barrero, "Recent advances in the design, modeling, and control of multiphase machines - part ii," *IEEE Transactions on Industrial Electronics*, vol. 63, no. 1, pp. 459–468, Jan 2016.
- [119] T. S. de Souza, R. R. Bastos, and B. J. Cardoso Filho, "Synchronous-frame modeling and dq current control of an unbalanced nine-phase induction motor due to open phases," *IEEE Transactions on Industry Applications*, vol. 56, no. 2, pp. 2097–2106, 2020.
- [120] H. Oh, B. Han, P. McCluskey, C. Han, and B. D. Youn, "Physics-of-failure, condition monitoring, and prognostics of insulated gate bipolar transistor modules: A review," *IEEE Transactions on Power Electronics*, vol. 30, no. 5, pp. 2413–2426, May 2015.
- [121] N. Baker, M. Liserre, L. Dupont, and Y. Avenas, "Improved reliability of power modules: A review of online junction temperature measurement methods," *IEEE Industrial Electronics Magazine*, vol. 8, no. 3, pp. 17–27, Sept 2014.

- [122] Mitsubishi Electric, Intelligent power modules. [Online]. Available: http://www.mitsubishielectric.com/semiconductors/products/powermod/intelligentpmod/index.html, 2014.
- [123] Fuji Electric, 6MBP25VDA120-50, www.fujielectriceurope.com, 2014.
- [124] E. R. Motto and J. F. Donlon, "Igbt module with user accessible on-chip current and temperature sensors," in 2012 Twenty-Seventh Annual IEEE Applied Power Electronics Conference and Exposition (APEC), Feb 2012, pp. 176–181.
- [125] H. Kuhn and A. Mertens, "On-line junction temperature measurement of igbts based on temperature sensitive electrical parameters," in 2009 13th European Conference on Power Electronics and Applications, Sept 2009, pp. 1–10.
- [126] J. O. Gonzalez, O. Alatise, J. Hu, L. Ran, and P. A. Mawby, "An investigation of temperature-sensitive electrical parameters for sic power mosfets," *IEEE Transactions on Power Electronics*, vol. 32, no. 10, pp. 7954–7966, Oct 2017.
- [127] Z. Hu, M. Du, and K. Wei, "Online calculation of the increase in thermal resistance caused by solder fatigue for igbt modules," *IEEE Transactions on Device and Materials Reliability*, vol. 17, no. 4, pp. 785–794, Dec 2017.
- [128] J. Zhang, X. Du, Y. Yu, S. Zheng, P. Sun, and H. Tai, "Thermal parameter monitoring of igbt module using junction temperature cooling curves," *IEEE Transactions on Industrial Electronics*, vol. 66, no. 10, pp. 8148–8160, Oct 2019.
- [129] D. Berning, J. Reichl, A. Hefner, M. Hernandez, C. Ellenwood, and J.-S. Lai, "High speed igbt module transient thermal response measurements for model validation," in 38th IAS Annual Meeting on Conference Record of the Industry Applications Conference, 2003., vol. 3, Oct 2003, pp. 1826–1832 vol.3.

- [130] L. Dupont, Y. Avenas, and P. O. Jeannin, "Comparison of junction temperature evaluations in a power igbt module using an ir camera and three thermosensitive electrical parameters," *IEEE Transactions on Industry Applications*, vol. 49, no. 4, pp. 1599–1608, July 2013.
- [131] L. Zhou, S. Zhou, and D. Xu, "Investigation of gate voltage oscillations in an igbt module after partial bond wires lift-off," *Microelectron. Rel.*, vol. 53, pp. 282–287, Feb. 2013.
- [132] Z. Jakopovic, Z. Bencic, and F. Kolonic, "Important properties of transient thermal impedance for mos-gated power semiconductors," in *Industrial Electronics*, 1999. ISIE '99. Proceedings of the IEEE International Symposium on, vol. 2, 1999, pp. 574–578 vol.2.
- [133] H. Chen, V. Pickert, D. J. Atkinson, and L. S. Pritchard, "On-line monitoring of the mosfet device junction temperature by computation of the threshold voltage," in *The 3rd IET International Conference on Power Electronics, Machines and Drives, 2006. PEMD 2006*, April 2006, pp. 440–444.
- [134] Y. Huang, C. Lü, X. Xie, Y. Fan, J. Zhang, and X. Meng, "A study of test system for thermal resistance of igbt," in 2010 Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia), Sept 2010, pp. 312–315.
- [135] X. Cao, T. Wang, K. D. T. Ngo, and G. Q. Lu, "Characterization of lead-free solder and sintered nano-silver die-attach layers using thermal impedance," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 1, no. 4, pp. 495–501, April 2011.
- [136] S. Dusmez, S. H. Ali, M. Heydarzadeh, A. S. Kamath, H. Duran, and B. Akin, "Aging precursor identification and lifetime estimation for thermally aged discrete package silicon power switches," *IEEE Transactions on Industry Applications*, vol. 53, no. 1, pp. 251–260, Jan 2017.

- [137] C. H. van der Broeck, A. Gospodinov, and R. W. De Doncker, "Igbt junction temperature estimation via gate voltage plateau sensing," *IEEE Transactions on Industry Applications*, vol. 54, no. 5, pp. 4752–4763, Sep. 2018.
- [138] J. Liu, G. Zhang, Q. Chen, L. Qi, Y. Geng, and J. Wang, "In situ condition monitoring of igbts based on the miller plateau duration," *IEEE Transactions on Power Electronics*, vol. 34, no. 1, pp. 769–782, Jan 2019.
- [139] N. Baker, L. Dupont, S. Munk-Nielsen, F. Iannuzzo, and M. Liserre, "Ir camera validation of igbt junction temperature measurement via peak gate current," *IEEE Transactions on Power Electronics*, vol. 32, no. 4, pp. 3099–3111, April 2017.
- [140] J. Brandelero, J. Ewanchuk, and S. Mollov, "On-line virtual junction temperature measurement via dc gate current injection," in CIPS 2018; 10th International Conference on Integrated Power Electronics Systems, March 2018, pp. 1–7.
- [141] M. Denk and M. Bakran, "An igbt driver concept with integrated real-time junction temperature measurement," in *PCIM Europe 2014; International Exhibition and Conference for Power Electronics, Intelligent Motion, Renewable Energy and Energy Management*, May 2014, pp. 1–8.
- [142] N. Baker, F. Iannuzzo, S. Beczkowski, and P. K. Kristensen, "Proof-of-concept for a kelvin-emitter on-chip temperature sensor for power semiconductors," in 2019 21st European Conference on Power Electronics and Applications (EPE '19 ECCE Europe), Sep. 2019, pp. P.1–P.8.
- [143] H. Luo, Y. Chen, P. Sun, W. Li, and X. He, "Junction temperature extraction approach with turn-off delay time for high-voltage high-power igbt modules," *IEEE Transactions on Power Electronics*, vol. 31, no. 7, pp. 5122–5132, July 2016.

- [144] A. Bryant, S. Yang, P. Mawby, D. Xiang, L. Ran, P. Tavner, and P. R. Palmer, "Investigation into igbt dv/dt during turn-off and its temperature dependence," *IEEE Transactions on Power Electronics*, vol. 26, no. 10, pp. 3019–3031, Oct 2011.
- [145] Z. Zhang, J. Dyer, X. Wu, F. Wang, D. Costinett, L. M. Tolbert, and B. J. Blalock, "Online junction temperature monitoring using intelligent gate drive for sic power devices," *IEEE Transactions on Power Electronics*, vol. 34, no. 8, pp. 7922–7932, Aug 2019.
- [146] J. Lutz, *Semiconductor Power Devices*. Springer.
- [147] H. Chen, V. Pickert, D. J. Atkinson, and L. S. Pritchard, "On-line monitoring of the mosfet device junction temperature by computation of the threshold voltage," in 2006 3rd IET International Conference on Power Electronics, Machines and Drives - PEMD 2006, April 2006, pp. 440–444.
- [148] D. W. Brown, M. Abbas, A. Ginart, I. N. Ali, P. W. Kalgren, and G. J. Vachtsevanos, "Turn-off time as an early indicator of insulated gate bipolar transistor latch-up," *IEEE Transactions on Power Electronics*, vol. 27, no. 2, pp. 479–489, Feb 2012.
- [149] M. Tounsi, A. Oukaour, B. Tala-Ighil, H. Gualous, B. Boudart, and D. Aissani, "Characterization of high-voltage igbt module degradations under pwm power cycling test at high ambient temperature," *Microelectron. Rel.*, vol. 50, pp. 1810–1814, Sep./Nov. 2010.
- [150] Du Mingxing, Wei Kexin, Li Jian, and Xie Linlin, "Condition monitoring igbt module bond wire lift-off using measurable signals," in *Proceedings of The 7th International Power Electronics and Motion Control Conference*, vol. 2, June 2012, pp. 1492–1496.

- [151] B. Rannestad, S. Munk-Nielsen, K. Gadgaard, and C. Uhrenfeldt, "Statistical method of estimating semiconductor switching transition time enabling condition monitoring of mega watt converters," *IEEE Transactions on Instrumentation and Measurement*, pp. 1–1, 2019.
- [152] A. Ammous, B. Allard, and H. Morel, "Transient temperature measurements and modeling of igbt's under short circuit," *IEEE Transactions on Power Electronics*, vol. 13, no. 1, pp. 12–25, Jan 1998.
- [153] D. Bergogne, B. Allard, and H. Morel, "An estimation method of the channel temperature of power mos devices," in 2000 IEEE 31st Annual Power Electronics Specialists Conference. Conference Proceedings (Cat. No.00CH37018), vol. 3, 2000, pp. 1594–1599 vol.3.
- [154] Z. Xu, F. Xu, and F. Wang, "Junction temperature measurement of igbts using short-circuit current as a temperature-sensitive electrical parameter for converter prototype evaluation," *IEEE Transactions on Industrial Electronics*, vol. 62, no. 6, pp. 3419–3429, June 2015.
- [155] Y. Xiong, X. Cheng, Z. J. Shen, C. Mi, H. Wu, and V. K. Garg, "Prognostic and warning system for power-electronic modules in electric, hybrid electric, and fuel-cell vehicles," *IEEE Transactions on Industrial Electronics*, vol. 55, no. 6, pp. 2268–2276, June 2008.
- [156] M. Ciappa and W. Fichtner, "Lifetime prediction of igbt modules for traction applications," in 2000 IEEE International Reliability Physics Symposium Proceedings. 38th Annual (Cat. No.00CH37059), April 2000, pp. 210–216.
- [157] G. Coquery and R. Lallemand, "Failure criteria for long term accelerated power cycling test linked to electrical turn off soa on igbt module. a 4000 hours test on 1200a3300v module with alsic base plate," *Microelectronics Reliability*, vol. 40, no. 8, pp. 1665–1670, 2000, Reliability of Electron Devices, Failure Physics and Analysis. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0026271400001918

- [158] M. Tounsi, A. Oukaour, B. Tala-Ighil, H. Gualous, B. Boudart, and D. Aissani, "Characterization of high-voltage igbt module degradations under pwm power cycling test at high ambient temperature," *Microelectronics Reliability*, vol. 50, no. 9, pp. 1810–1814, 2010, 21st European Symposium on the Reliability of Electron Devices, Failure Physics and Analysis. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S002627141000332X
- [159] N. Patil, D. Das, and M. Pecht, "A prognostic approach for non-punch through and field stop igbts," *Microelectronics Reliability*, vol. 52, no. 3, pp. 482–488, 2012, special section on International Seminar on Power Semiconductors 2010. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0026271411004823
- [160] H. Niu and R. D. Lorenz, "Evaluating different implementations of online junction temperature sensing for switching power semiconductors," *IEEE Transactions on Industry Applications*, vol. 53, no. 1, pp. 391–401, Jan 2017.
- [161] P. Asimakopoulos, K. D. Papastergiou, T. Thiringer, M. Bongiorno, and G. L. Godec, "On vce method: in-situ temperature estimation and aging detection of high-current igbt modules used in magnet power supplies for particle accelerators," *IEEE Transactions on Industrial Electronics*, pp. 1–1, 2018.
- [162] A. Rashed, F. Forest, J. . Huselstein, T. Martiré, and P. Enrici, "On-line [tj, vce] monitoring of igbts stressed by fast power cycling tests," in 2013 15th European Conference on Power Electronics and Applications (EPE), Sep. 2013, pp. 1–9.
- [163] A. Amoiridis, A. Anurag, P. Ghimire, S. Munk-Nielsen, and N. Baker, "Vce-based chip temperature estimation methods for high power igbt modules during power cycling; a comparison," in 2015 17th European Conference on Power Electronics and Applications (EPE'15 ECCE-Europe), Sept 2015, pp. 1–9.

- [164] Y.-S. Kim and S.-K. Sul, "On-line estimation of igbt junction temperature using on-state voltage drop," in *Conference Record of 1998 IEEE Industry Applications Conference. Thirty-Third IAS Annual Meeting (Cat. No.98CH36242)*, vol. 2, Oct 1998, pp. 853–859 vol.2.
- [165] A. Koenig, T. Plum, P. Fidler, and R. W. D. Doncker, "On-line junction temperature measurement of coolmos devices," in 2007 7th International Conference on Power Electronics and Drive Systems, Nov 2007, pp. 90–95.
- [166] A. Pokryvailo and C. Carp, "Accurate measurement of on-state losses of power semiconductors," in 2008 IEEE International Power Modulators and High-Voltage Conference, May 2008, pp. 374–377.
- [167] B. Lu, T. Palacios, D. Risbud, S. Bahl, and D. I. Anderson, "Extraction of dynamic on-resistance in gan transistors: Under soft- and hard-switching conditions," in 2011 IEEE Compound Semiconductor Integrated Circuit Symposium (CSICS), Oct 2011, pp. 1–4.
- [168] V. Smet, F. Forest, J. Huselstein, A. Rashed, and F. Richardeau, "Evaluation of v<sub>ce</sub> monitoring as a real-time method to estimate aging of bond wire-igbt modules stressed by power cycling," *IEEE Transactions on Industrial Electronics*, vol. 60, no. 7, pp. 2760–2770, July 2013.
- [169] B. Ji, V. Pickert, W. Cao, and B. Zahawi, "In situ diagnostics and prognostics of wire bonding faults in igbt modules for electric vehicle drives," *IEEE Transactions on Power Electronics*, vol. 28, no. 12, pp. 5568–5577, Dec 2013.
- [170] R. . Nielsen, J. Due, and S. Munk-Nielsen, "Innovative measuring system for wear-out indication of high power igbt modules," in 2011 IEEE Energy Conversion Congress and Exposition, Sep. 2011, pp. 1785–1790.
- [171] S. Beczkowski, P. Ghimire, A. R. de Vega, S. Munk-Nielsen, B. Rannestad, and P. Thogersen, "Online vce measurement method for wear-out monitoring of high power igbt modules," in 2013 15th European Conference on Power Electronics and Applications (EPE), Sep. 2013, pp. 1–7.

- [172] R. Gelagaev, P. Jacqmaer, and J. Driesen, "A fast voltage clamp circuit for the accurate measurement of the dynamic on-resistance of power transistors," *IEEE Transactions on Industrial Electronics*, vol. 62, no. 2, pp. 1241–1250, Feb 2015.
- [173] M. A. Eleffendi and C. M. Johnson, "Application of kalman filter to estimate junction temperature in igbt power modules," *IEEE Transactions on Power Electronics*, vol. 31, no. 2, pp. 1576–1587, Feb 2016.
- [174] M. Hoeer, F. Weiss, and S. Bernet, "Online collector-emitter saturation voltage measurement for the in-situ temperature estimation of a high-power 4.5 kv igbt module," in 2017 19th European Conference on Power Electronics and Applications (EPE'17 ECCE Europe), Sep. 2017, pp. P.1–P.9.
- [175] F. Stella, G. Pellegrino, E. Armando, and D. Daprà, "On-line temperature estimation of sic power mosfet modules through on-state resistance mapping," in 2017 IEEE Energy Conversion Congress and Exposition (ECCE), Oct 2017, pp. 5907–5914.
- [176] U. Choi, F. Blaabjerg, S. Jørgensen, S. Munk-Nielsen, and B. Rannestad, "Reliability improvement of power converters by means of condition monitoring of igbt modules," *IEEE Transactions on Power Electronics*, vol. 32, no. 10, pp. 7990–7997, Oct 2017.
- [177] M. Denk, F. Lautner, and M. Bakran, "Accuracy analysis of uce(on)-based measurement of the inverter output current at higher motor speeds," in 2017 19th European Conference on Power Electronics and Applications (EPE'17 ECCE Europe), Sep. 2017, pp. P.1–P.10.
- [178] M. Guacci, D. Bortis, and J. W. Kolar, "On-state voltage measurement of fast switching power semiconductors," *CPSS Transactions on Power Electronics and Applications*, vol. 3, no. 2, pp. 163–176, June 2018.

- [179] B. Yu, L. Wang, and D. Ahmed, "Drain-source voltage clamp circuit for online accurate on-state resistance measurement of sic mosfets in dc solid state power controller (sspc)," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, pp. 1–1, 2019.
- [180] U.S. Patent 5,166,549, 1992.
- [181] Netherlands Patent EP2 564 220B1, 2011.
- [182] U. Scheuermann, "Investigations on the vce(t)-method to determine the junction temperature by using the chip itself as sensor," in *PCIM 2009*, 2009.
- [183] N. Degrenne and S. Mollov, "Robust on-line junction temperature estimation of igbt power modules based on von during pwm power cycling," in 2019 IEEE International Workshop on Integrated Power Packaging (IWIPP), April 2019, pp. 107–116.
- [184] J. Brandelero, J. Ewanchuk, and S. Mollov, "Online junction temperature measurements for power cycling power modules with high switching frequencies," in 2016 28th International Symposium on Power Semiconductor Devices and ICs (ISPSD), June 2016, pp. 191–194.
- [185] F. Gonzalez-Hernando, J. San-Sebastian, A. Garcia-Bediaga, M. Arias, and A. Rujas, "Junction temperature model and degradation effect in igbt multichip power modules," in 2019 IEEE Energy Conversion Congress and Exposition (ECCE), Sep. 2019, pp. 2957–2962.
- [186] L. Dupont and Y. Avenas, "Evaluation of thermo-sensitive electrical parameters based on the forward voltage for on-line chip temperature measurements of igbt devices," in 2014 IEEE Energy Conversion Congress and Exposition (ECCE), Sep. 2014, pp. 4028–4035.

- [187] N. Degrenne and S. Mollov, "Experimentally-validated models of on-state voltage for remaining useful life estimation and design for reliability of power modules," in CIPS 2018; 10th International Conference on Integrated Power Electronics Systems, March 2018, pp. 1–6.
- [188] A. Singh, A. Anurag, and S. Anand, "Evaluation of vce at inflection point for monitoring bond wire degradation in discrete packaged igbts," *IEEE Transactions on Power Electronics*, vol. 32, no. 4, pp. 2481–2484, April 2017.
- [189] P. Asimakopoulos, K. Papastergiou, T. Thiringer, M. Bongiorno, and G. Le Godec, "Igbt power stack integrity assessment method for high-power magnet supplies," *IEEE Transactions on Power Electronics*, vol. 34, no. 11, pp. 11228–11240, Nov 2019.
- [190] S. H. Ali, S. Dusmez, and B. Akin, "Investigation of collector emitter voltage characteristics in thermally stressed discrete igbt devices," in 2016 IEEE Energy Conversion Congress and Exposition (ECCE), Sep. 2016, pp. 1–6.
- [191] M. Du, Q. Kong, Z. Ouyang, K. Wei, and W. G. Hurley, "Strategy for diagnosing the aging of an igbt module by on-state voltage separation," *IEEE Transactions on Electron Devices*, vol. 66, no. 11, pp. 4858–4864, Nov 2019.
- [192] U. Choi and F. Blaabjerg, "Real-time condition monitoring of igbt modules in pv inverter systems," in CIPS 2018; 10th International Conference on Integrated Power Electronics Systems, March 2018, pp. 1–5.
- [193] S. Yang, D. Xiang, A. Bryant, P. Mawby, L. Ran, and P. Tavner, "Condition monitoring for device reliability in power electronic converters: A review," *IEEE Transactions on Power Electronics*, vol. 25, no. 11, pp. 2734–2752, Nov 2010.
- [194] S. S. Manohar, A. Sahoo, A. Subramaniam, and S. K. Panda, "Condition monitoring of power electronic converters in power plants - a review," in 2017 20th International Conference on Electrical Machines and Systems (ICEMS), Aug 2017, pp. 1–5.

- [195] J. Brandelero, J. Ewanchuk, and S. Mollov, "Selective gate driving in intelligent power modules," *IEEE Transactions on Power Electronics*, pp. 1–1, 2020.
- [196] M. J. Ryan and R. D. Lorenz, "A high performance sine wave inverter controller with capacitor current feedback and 'back-emf' decoupling," in *Proceedings of PESC '95 - Power Electronics Specialist Conference*, vol. 1, 1995, pp. 507–513 vol.1.
- [197] A. V. Rocha, H. de Paula, M. E. dos Santos, and B. J. Cardoso Filho, "A thermal management approach to fault-resilient design of three-level igct-based npc converters," *IEEE Transactions on Industry Applications*, vol. 49, no. 6, pp. 2684–2691, Nov 2013.
- [198] D. I. Brandao, F. E. G. Mendes, R. V. Ferreira, S. M. Silva, and I. A. Pires, "Active and reactive power injection strategies for three-phase four-wire inverters during symmetrical/asymmetrical voltage sags," *IEEE Transactions on Industry Applications*, vol. 55, no. 3, pp. 2347–2355, May 2019.
- [199] R. . Klug and A. Mertens, "Reliability of megawatt drive concepts," in *IEEE International Conference on Industrial Technology, 2003*, vol. 2, Dec 2003, pp. 636–641 Vol.2.
- [200] E. R. Motto, J. F. Donlon, M. Honsberg, and F. Tametani, "A new intelligent power module with enhanced diagnostics and protection," in 2013 Twenty-Eighth Annual IEEE Applied Power Electronics Conference and Exposition (APEC), March 2013, pp. 2398–2401.
- [201] J. D. van Wyk and F. C. Lee, "On a future for power electronics," *IEEE Journal of Emerging and Selected Topics in Power Electronics*, vol. 1, no. 2, pp. 59–72, June 2013.

- [202] ABB, Datasheet ABB HiPak 5SNA 1600N170100, http://new.abb.com/semiconductors/igbt-and-diode-modules, 2014.
- [203] B. Hu, J. Ortiz Gonzalez, L. Ran, H. Ren, Z. Zeng, W. Lai, B. Gao, O. Alatise, H. Lu, C. Bailey, and P. Mawby, "Failure and reliability analysis of a sic power module based on stress comparison to a si device," *IEEE Transactions on Device and Materials Reliability*, vol. 17, no. 4, pp. 727–737, Dec 2017.
- [204] S. Roberts, *DC/DC Book of Knowledge*. RECOM, 2017.
- [205] S. Roberts and W. Wolfsgruber, "Very low noise filter for isolated dc/dc converters," Recom, https://recom-power.com/ru/rec-n-very-low-noise-filter-forisolated-dc!sdc-converters-46.html?0, Tech. Rep., 2019.
- [206] Texas Instruments, ISO124 ±10 V Input, Precision Isolation Amplifier, http://www.ti.com/lit/ds/symlink/iso124.pdf, 2018.
- [207] M. Stitt, Simple Output Filter Eliminates ISO AMP Output Ripple and Keeps Full Bandwidth, http://www.ti.com/lit/an/sboa012/sboa012.pdf.
- [208] D. Novotny and T. Lipo, Vector Control and Dynamics of AC Drives. Oxford Science Publications, 2000.