# Fast Reliability Assessment of Neutral-Point-Clamped Topologies through Markov Models 

Sergio Busquets-Monge, Senior Member, IEEE, Roya Rafiezadeh, Salvador Alepuz, Senior Member, IEEE, Alber Filba-Martinez, and Joan Nicolas-Apruzzese


#### Abstract

This paper presents detailed Markov models for the reliability assessment of multilevel neutral-point-clamped (NPC) converter leg topologies, incorporating their inherent fault-tolerance under open-circuit switch faults. The Markov models are generated and discussed in detail for the three-level and four-level active NPC (ANPC) cases, while the presented methodology can be applied to easily generate the models for a higher number of levels and for other topology variants. In addition, this paper also proposes an extremely fast calculation method to obtain the precise value of the system mean time to failure from any given formulated system Markov model. This method is then applied to quantitatively compare the reliability of two-level, three-level, and four-level ANPC legs under switch open-circuit-guaranteed faults and varying degrees of device paralleling. The comparison reveals that multilevel ANPC leg topologies inherently present a potential for a higher reliability than the conventional two-level leg, questioning the suitability of the traditional search for topologies with the minimum number of devices in order to improve reliability. Experimental results are presented to validate the fault-tolerance assumptions upon which the presented reliability models for the three-level and four-level ANPC legs are based. This paper is accompanied by supplementary MATLAB scripts.


Index Terms- Neutral-point-clamped, Markov model, mean time to failure, multilevel, reliability.

## NOMENCLATURE

| Symbol | Description |
| :--- | :--- |
| $R(t)$ | Reliability at time $t$. |
| $N_{\mathrm{S}}(t)$ | Number of systems still operating at time $t$. |
| MTTF | Mean time to failure. |
| $\lambda(t)$ | Failure rate at time $t$. |
| $\lambda_{k_{\mathrm{i}}, k_{\mathrm{f}}}$ | Transition rate between state $k_{\mathrm{i}}$ and state $k_{\mathrm{f}}$. |
| $\lambda_{x y}$ | Failure rate of switch located in row $x$ and column $y$. |
| $\lambda_{q}$ | Failure rate of one switch when $q$ switches are operating in parallel. |
| $\lambda_{\text {sys }}$ | System failure rate. |
| $\lambda_{\text {sys }, p r}$ | Failure rate of one system formed by $p r$ switches in parallel. |
| $p_{k}(t)$ | Probability of being in state $k$ at time $t$. |
| $P_{k}$ | Steady-state value of $p_{k}(t)$. |
| $\mathbf{p}$ | Vector of state probabilities. |
| $v_{\mathrm{ac}}(t)$ | Voltage of the leg ac terminal at time $t$. |
| $i_{\mathrm{ac}}(t)$ | Leg ac terminal current at time $t$. |

## I. Introduction

Reliability of power electronics systems has become of primary importance to fully leverage the advantages that this technology offers [1]-[4]. In many applications, the power electronics subsystem is one of the weakest links from the reliability point of view and an unexpected sudden full system shutdown is not acceptable.

Reliability research has traditionally focused on two main areas: modeling and methods to improve reliability.

On the one hand, a significant effort has been devoted to the development of reliability models [5]-[6]. At the component level, two types of models can be highlighted: empirical models, such as the Arrhenius-Coffin-Manson model [7]-[8] or the Palmgren-Miner linear cumulative model [8], and physics-of-failure models. At the system level, part-count models, combinatorial models (fault trees, success trees, and reliability block diagrams) [9], and state-space models (Markov models) [10]-[11] have been proposed. Artificial neural network models have also been employed to ease the introduction of the reliability metrics into the design of power electronics systems [12].

On the other hand, several methods have been proposed to enhance the reliability of systems, which can be broadly categorized as: 1) using more suitable materials, shapes, dimensions, and processes in the component and system implementation [13]; 2) methods based on the system operation management, such as active thermal management [14] and preventive maintenance supported by condition monitoring [15] and fault prognosis; and 3) methods based on increasing the redundancy of systems [9], at both the component and system level, tied to fault diagnosis and resulting in fault-tolerant systems.

One approach to increase the redundancy of a power converter is to employ multilevel topologies, since many of them present inherent redundancy. The reliability of several multilevel topologies has been studied in [16]-[26], including their reliability modeling and the strategies to operate them under faults. In particular, in [16], the reliability of some multilevel converters is modeled, assuming power device short-circuit faults, through reliability block diagrams, from which the system reliability can be obtained as a somewhat complex function of time. It is concluded that multilevel converters can present a higher reliability than a conventional two-level converter over an initial period of time. In [17], Markov models are used to analyze and compare multilevel inverters. Nevertheless, they do not consider the inherent topology fault tolerance; e.g., that neutral-point-clamped (NPC) converters can continue operating under multiple open-circuit switch faults [18]. Therefore, they are oversimplified models considering only two system states. In summary, the literature lacks detailed Markov models accounting for multilevel converter inherent fault-tolerance, despite being the most powerful models at the system level. Generating such Markov models is recognized to be a challenge [1].

## IEEE TRANSACTIONS ON POWER ELECTRONICS

To help fill this gap, this paper derives Markov models to characterize the reliability of multilevel NPC topologies under open-circuit faults and, from these models, proposes a fast method to compute the mean time to failure (MTTF) of these topologies. This enables the use of the MTTF as a figure-of-merit to quickly characterize the reliability of multiple topology options in optimization processes.

The paper is organized as follows. Section II reviews the basics of reliability and Markov models. Section III proposes a fast method to compute the MTTF from a system Markov model. Section IV presents the Markov models of two-level, three-level, and four-level active NPC (ANPC) legs with a variable number of parallel switches per position, and performs an MTTF comparison of these topologies under several degrees of paralleling and simple conditions to explore their inherent reliability features. Section V presents experimental results to illustrate the behavior of multilevel ANPC legs under several concurrent open-circuit switch faults. Finally, Section VI outlines the conclusions.

## II. Basics of Reliability

The reliability of a system at time $t$, $R(t)$, is defined as the probability that the system is still operating at time $t$. If a relatively large set of identical systems is tested in parallel over time, it can be calculated as

$$
\begin{equation*}
R(t)=\frac{N_{\mathrm{S}}(t)}{N_{\mathrm{S}}(0)} \tag{1}
\end{equation*}
$$

where $N_{\mathrm{S}}(t)$ is the number of systems still operating at time $t$ (i.e., $N_{\mathrm{S}}(0)-N_{\mathrm{S}}(t)$ systems have already failed at time $t$).

A convenient figure of merit to assess and compare the reliability of different systems is the mean time to failure, defined as

$$
\begin{equation*}
M T T F=\int_{0}^{\infty} R(t) \cdot \mathrm{d} t \tag{2}
\end{equation*}
$$

Another important parameter is the failure rate, defined as

$$
\begin{equation*}
\lambda(t)=-\frac{1}{R(t)} \cdot \frac{\mathrm{d} R(t)}{\mathrm{d} t} \tag{3}
\end{equation*}
$$

The failure rate at time $t$ indicates the probability per unit of time that a system still operating at time $t$ fails. It is usually expressed in FIT (failures in time), where $1 \mathrm{FIT}=10^{-9} / \mathrm{h}$. The failure rate usually depends on the operating conditions.

If $\lambda(t)$ is constant over time and equal to $\lambda$, then

$$
\begin{equation*}
R(t)=e^{-\lambda t} \tag{4}
\end{equation*}
$$

and

$$
\begin{equation*}
M T T F=\frac{1}{\lambda} \tag{5}
\end{equation*}
$$
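As a small numerical illustration of (4) and (5), the following Python sketch converts a constant failure rate given in FIT into an MTTF in hours and evaluates $R(t)$. The function names and the 100 FIT example value are hypothetical and are not taken from the supplementary scripts.

```python
import math

FIT_PER_HOUR = 1e-9  # 1 FIT = 1e-9 failures per hour


def reliability(t_hours, lam_fit):
    """R(t) from (4) for a constant failure rate given in FIT."""
    return math.exp(-lam_fit * FIT_PER_HOUR * t_hours)


def mttf_hours(lam_fit):
    """MTTF from (5), in hours, for a constant failure rate given in FIT."""
    return 1e9 / lam_fit


# Hypothetical example: a component with a constant failure rate of 100 FIT.
example_mttf = mttf_hours(100.0)
example_reliability_at_mttf = reliability(example_mttf, 100.0)
```

Under a constant failure rate, the reliability evaluated at $t = MTTF$ is $e^{-1}$, so only about 37% of the systems are expected to still be operating at that time.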

A very convenient tool to study the reliability of a complex system is its Markov model, represented by a diagram known as a Markov chain. In a Markov chain, the different relevant system states are represented as nodes, and the possible transitions between states are represented by arrows with an associated transition rate. For example, Fig. 1 presents the simplest Markov chain, with only two states: state 1, in green, representing the system in its original state, with no internal failed devices, and state 0, in red, representing the system under failure; i.e., the state of the system once it has stopped operating because of the failure of one or more internal devices. The simple diagram of Fig. 1 is appropriate when any device failure within the system leads to a full system failure. However, when certain device failures within a system lead to states where the system can still operate, although under limited conditions, additional states have to be included in the Markov chain. This is the case of the example system represented in Fig. 2, with two intermediate states shown in yellow.

The probability of being in state $k$ at time $t$ is denoted as $p_{k}(t)$. The probability of being in each state evolves over time according to

$$
\frac{\mathrm{d}}{\mathrm{~d} t}\left[\begin{array}{l}
p_{1}  \tag{6}\\
p_{0}
\end{array}\right]=\left[\begin{array}{cc}
-\lambda_{1,0} & 0 \\
\lambda_{1,0} & 0
\end{array}\right] \cdot\left[\begin{array}{l}
p_{1} \\
p_{0}
\end{array}\right]
$$

in Fig. 1, and according to


Fig. 1. Markov chain diagram of a simple system where any device failure leads to a system failure.


Fig. 2. Markov chain diagram of an example system with intermediate states before the system failure.


Fig. 3. Markov chain diagram of the example system of Fig. 2 with an infinite repair rate.

$$
\frac{\mathrm{d}}{\mathrm{~d} t}\left[\begin{array}{l}
p_{1}  \tag{7}\\
p_{2} \\
p_{3} \\
p_{0}
\end{array}\right]=\left[\begin{array}{cccc}
-\lambda_{1,2}-\lambda_{1,3} & 0 & 0 & 0 \\
\lambda_{1,2} & -\lambda_{2,0} & 0 & 0 \\
\lambda_{1,3} & 0 & -\lambda_{3,0} & 0 \\
0 & \lambda_{2,0} & \lambda_{3,0} & 0
\end{array}\right] \cdot\left[\begin{array}{l}
p_{1} \\
p_{2} \\
p_{3} \\
p_{0}
\end{array}\right]
$$

in Fig. 2. In a general case, this equation can be formulated as

$$
\begin{equation*}
\frac{\mathrm{d}}{\mathrm{~d} t} \mathbf{p}(t)=\mathbf{M} \cdot \mathbf{p}(t) \tag{8}
\end{equation*}
$$

where $\mathbf{p}(t)$ is the vector of state probabilities and $\mathbf{M}$ is the matrix of transition rates. Assuming that all transition rates among states are constant, the vector of probabilities can be obtained as

$$
\begin{equation*}
\mathbf{p}(t)=e^{\mathbf{M} t} \cdot \mathbf{p}(0)=\left[\sum_{k=0}^{\infty} \frac{(\mathbf{M} t)^{k}}{k!}\right] \cdot \mathbf{p}(0) \tag{9}
\end{equation*}
$$

The system reliability can then be computed as

$$
\begin{equation*}
R(t)=1-p_{0}(t) \tag{10}
\end{equation*}
$$

and (2) can then be used to calculate the system MTTF. The system failure rate can be calculated as

$$
\begin{equation*}
\lambda_{\mathrm{sys}}(t)=\sum_{k} \lambda_{k, 0} \cdot p_{k}(t) . \tag{11}
\end{equation*}
$$
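The conventional procedure of this section can be sketched numerically for the example system of Fig. 2. The Python sketch below is only an illustration and makes several assumptions: it integrates (8) with a fixed-step fourth-order Runge-Kutta scheme instead of evaluating the matrix exponential in (9), it approximates the integral in (2) with the trapezoidal rule over a finite horizon, and it uses the transition rates listed in the footnote of Table I. The step size, horizon, and function names are hypothetical choices.

```python
FIT = 1e-9  # 1 FIT = 1e-9 failures per hour

# Transition rates of the Fig. 2 example (values from the Table I footnote).
l12, l13, l20, l30 = 20000 * FIT, 30000 * FIT, 40000 * FIT, 50000 * FIT

# Matrix M of (7); the state order is [p1, p2, p3, p0].
M = [
    [-(l12 + l13), 0.0, 0.0, 0.0],
    [l12, -l20, 0.0, 0.0],
    [l13, 0.0, -l30, 0.0],
    [0.0, l20, l30, 0.0],
]


def deriv(p):
    """Right-hand side of (8): dp/dt = M * p."""
    return [sum(M[i][j] * p[j] for j in range(4)) for i in range(4)]


def rk4_step(p, h):
    """One fourth-order Runge-Kutta step of size h (hours)."""
    k1 = deriv(p)
    k2 = deriv([p[i] + 0.5 * h * k1[i] for i in range(4)])
    k3 = deriv([p[i] + 0.5 * h * k2[i] for i in range(4)])
    k4 = deriv([p[i] + h * k3[i] for i in range(4)])
    return [p[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(4)]


def mttf_conventional(t_end=1.0e6, h=100.0):
    """Integrate R(t) = 1 - p0(t), see (10), from 0 to t_end as in (2)."""
    p = [1.0, 0.0, 0.0, 0.0]  # the system starts in the failure-free state 1
    r_prev = 1.0 - p[3]
    mttf = 0.0
    for _ in range(int(round(t_end / h))):
        p = rk4_step(p, h)
        r = 1.0 - p[3]
        mttf += 0.5 * h * (r_prev + r)  # trapezoidal rule
        r_prev = r
    return mttf


mttf_numeric = mttf_conventional()  # close to the 42000 h quoted in Table I
```

The whole trajectory of $\mathbf{p}(t)$ has to be computed even though only a single number, the MTTF, is needed at the end.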

## III. Proposed Fast Method to Compute MTTF

The calculation of the system reliability presented in the previous section, based on the system Markov model, is aimed at obtaining the value of the system reliability at each point in time, and involves a certain computational complexity. However, to assess the reliability of a system, an average-type figure-of-merit such as the MTTF is often enough. It is therefore useful to compute the MTTF without the need to compute $R(t)$. This can be done as follows. The procedure will be illustrated with the example system of Fig. 2. Let us assume that anytime the red system failure state is reached, the system is immediately fully repaired and returned to the green initial state; i.e., an infinite repair rate is considered. The new Markov chain diagram is illustrated in Fig. 3. In this situation, the probability of being in each state will reach a constant steady-state value ($P_{0}$, $P_{1}$, $P_{2}$, and $P_{3}$), with $P_{0}=0$ and $P_{1}+P_{2}+P_{3}=1$, because the system is immediately repaired when it fails and must therefore always be in state 1, state 2, or state 3. In addition, in this steady state, the rate of transitions into each yellow intermediate state must equal the rate of transitions departing from the same intermediate state. All the above can be formulated as

$$
\left[\begin{array}{ccc}
\lambda_{1,2} & -\lambda_{2,0} & 0  \tag{12}\\
\lambda_{1,3} & 0 & -\lambda_{3,0} \\
1 & 1 & 1
\end{array}\right] \cdot\left[\begin{array}{l}
P_{1} \\
P_{2} \\
P_{3}
\end{array}\right]=\left[\begin{array}{l}
0 \\
0 \\
1
\end{array}\right] .
$$

From (12), the vector of probabilities can be easily isolated

$$
\left[\begin{array}{l}
P_{1}  \tag{13}\\
P_{2} \\
P_{3}
\end{array}\right]=\left[\begin{array}{ccc}
\lambda_{1,2} & -\lambda_{2,0} & 0 \\
\lambda_{1,3} & 0 & -\lambda_{3,0} \\
1 & 1 & 1
\end{array}\right]^{-1} \cdot\left[\begin{array}{l}
0 \\
0 \\
1
\end{array}\right]
$$

The system failure rate can then be computed as

$$
\begin{equation*}
\lambda_{\text {sys }}=\lambda_{2,0} \cdot P_{2}+\lambda_{3,0} \cdot P_{3} . \tag{14}
\end{equation*}
$$

Finally, since $\lambda_{\text {sys }}$ is constant over time, the MTTF can be easily calculated as

$$
\begin{equation*}
M T T F=\frac{1}{\lambda_{\mathrm{sys}}} \tag{15}
\end{equation*}
$$

The MTTF value obtained through this simple procedure is exactly the same as the one obtained through the cumbersome procedure presented in Section II. Thus, equations (12)-(15) greatly simplify the calculation of the system MTTF. Table I shows that the computation time of the MTTF in this simple example can be reduced by a factor of more than 500,000 using the proposed procedure (see supplementary MATLAB script).

In a general case, the equations describing the proposed computation procedure can be formulated as

$$
\begin{gather*}
\mathbf{A} \cdot \mathbf{P}=\left[\begin{array}{c}
0 \\
0 \\
\vdots \\
1
\end{array}\right] \\
\mathbf{P}=\mathbf{A}^{-1} \cdot\left[\begin{array}{c}
0 \\
0 \\
\vdots \\
1
\end{array}\right]  \tag{16}\\
\lambda_{\text {sys }}=\sum_{k} \lambda_{k, 0} \cdot P_{k} \\
M T T F=\frac{1}{\lambda_{\text {sys }}}
\end{gather*}
$$

where $\mathbf{P}=\left[\begin{array}{llll}P_{1} & P_{2} & \ldots & P_{m}\end{array}\right]^{\mathrm{T}}$ and the last row of matrix $\mathbf{A}$ contains all ones.
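The general procedure in (16) can be sketched in a few lines of Python. This is a minimal illustration rather than the supplementary MATLAB implementation: the function names, the dictionary-based description of the Markov chain, and the use of exact rational arithmetic are assumptions made here. The matrix $\mathbf{A}$ is built with one balance row per intermediate state plus a final row of ones, as in (12), and the dictionary must not contain transitions leaving the failure state.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a * x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)]
           for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick a row with a nonzero pivot (raises StopIteration if singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc
                          for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def fast_mttf(rates, initial=1, failed=0):
    """Apply (16). rates maps (k_i, k_f) to the transition rate lambda_{k_i,k_f}.

    Returns (MTTF, {state: steady-state probability P_k}).
    """
    states = sorted({s for pair in rates for s in pair if s != failed})
    index = {s: i for i, s in enumerate(states)}
    m = len(states)
    a, b = [], []
    for k in states:
        if k == initial:
            continue  # the initial-state balance is replaced by the row of ones
        row = [Fraction(0)] * m
        for (ki, kf), lam in rates.items():
            if kf == k:
                row[index[ki]] += Fraction(lam)  # transitions into state k
            if ki == k:
                row[index[k]] -= Fraction(lam)   # transitions out of state k
        a.append(row)
        b.append(0)
    a.append([Fraction(1)] * m)  # P_1 + P_2 + ... + P_m = 1
    b.append(1)
    p = solve_linear(a, b)
    lam_sys = sum(Fraction(lam) * p[index[ki]]
                  for (ki, kf), lam in rates.items() if kf == failed)
    return 1 / lam_sys, dict(zip(states, p))


# Example system of Fig. 2 with the rates of the Table I footnote, in FIT.
rates_fit = {(1, 2): 20000, (1, 3): 30000, (2, 0): 40000, (3, 0): 50000}
mttf_per_fit, steady = fast_mttf(rates_fit)
mttf_hours = mttf_per_fit * 10**9  # 1 FIT = 1e-9 / h
```

For this example the sketch returns exactly 42000 h, the value quoted in the footnote of Table I, with $P_{1}=10/21$, $P_{2}=5/21$, and $P_{3}=2/7$.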

## IV. Application to Multilevel NPC Topologies

In this section, the previously conceived method will be applied to compute the MTTF of a conventional two-level leg and its extension into multilevel ANPC legs. The resulting MTTF values will be compared under different scenarios.

In the aforementioned study, it will be assumed that all power semiconductor devices always fail in open circuit because this is the most favorable situation from the reliability point of view. In practice, although power semiconductor devices may fail in both short circuit and open circuit, by adding some auxiliary circuitry acting as an electronic fuse in series with the power semiconductor, it can be guaranteed that the compound device formed by this series association ends up failing in open circuit. This compound device, designated here as switching cell (SC), is conceptually illustrated in Fig. 4, where $\mathrm{S}_{\mathrm{m}}$ represents the main switch and $\mathrm{S}_{\mathrm{a}}$ represents an auxiliary switch connected in series to perform the electronic fuse function. $\mathrm{S}_{\mathrm{a}}$ should be a very reliable low-conduction-loss switch which is always ON; whenever a failure of $\mathrm{S}_{\mathrm{m}}$ is detected, $\mathrm{S}_{\mathrm{a}}$ turns permanently OFF. Thus, the overall SC can be regarded as a single switch which always fails in open circuit. The good performance of such a configuration has been verified and discussed in detail in [27].

TABLE I
Comparison of the MTTF Computation Time with the Conventional Procedure and the Proposed Procedure*

|  | Conventional, $(7)+(9)+(10)+(2)$ | Proposed, $(12)-(15)$ |
| :---: | :---: | :---: |
| Computation time | 36.95 s | $68 \mu \mathrm{~s}$ |

${ }^{*}$ Conditions: $\lambda_{1,2}=20000$ FIT, $\lambda_{1,3}=30000$ FIT, $\lambda_{2,0}=40000$ FIT, $\lambda_{3,0}=50000$ FIT, $M T T F=42000 \mathrm{~h}$, 300k points of $R(t)$ calculated in the conventional procedure, MATLAB R2019a, Intel Core i5 processor, and 16 GB RAM.

Fig. 4. Switching cell configuration to guarantee that the failures of the main switch $\mathrm{S}_{\mathrm{m}}$ always lead to an open circuit, by opening the auxiliary switch $\mathrm{S}_{\mathrm{a}}$, which performs the function of an electronic fuse [27].

## A. Two-Level Leg

Fig. 5(a) shows the topology of the conventional two-level leg. It contains two switches, labelled with a two-digit code indicating the row and column where the switch is located. The two switching states allowing the connection of the leg ac terminal to the two dc-link terminals are represented in the first row of Fig. 6.

In the conventional two-level leg, the open-circuit failure of any switch leads to a full system failure, because once a switch fails, the leg can no longer switch between the two dc-link points; thus, the leg loses its essential functionality: being capable of connecting the ac terminal to more than one dc-link point. This is indicated in the Markov chain diagram of Fig. 7, where $\lambda_{11}$ and $\lambda_{21}$ represent the failure rate of switches 11 and 21, respectively.
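As a short worked step, and assuming constant failure rates and a leg that starts in the failure-free state ($p_{1}(0)=1$), applying (6), (10), and (2) with $\lambda_{1,0}=\lambda_{11}+\lambda_{21}$ gives

$$
\begin{gathered}
\frac{\mathrm{d} p_{1}}{\mathrm{~d} t}=-\left(\lambda_{11}+\lambda_{21}\right) \cdot p_{1} \quad \Rightarrow \quad R(t)=p_{1}(t)=e^{-\left(\lambda_{11}+\lambda_{21}\right) t} \\
M T T F=\int_{0}^{\infty} R(t) \cdot \mathrm{d} t=\frac{1}{\lambda_{11}+\lambda_{21}}
\end{gathered}
$$

With equal normalized rates $\lambda_{11}=\lambda_{21}=1$, this yields $M T T F=0.5$, which agrees with the first row of Table III.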

Fig. 5. Multilevel ANPC topologies. (a) Conventional two-level leg. (b) Three-level leg. (c) Four-level leg.


Fig. 6. Leg switching states to connect the ac terminal to: (a) $\mathrm{dc}_{1}$. (b) $\mathrm{dc}_{2}$. (c) $\mathrm{dc}_{3}$. (d) $\mathrm{dc}_{4}$. In each switching state, ON switches are represented with a solid line and OFF switches are represented with no line. The line representing an ON switch is red when the switch connects the ac terminal with the intended dc terminal and carries a portion of the ac terminal current, and it is green when the ON switch simply clamps the blocking voltage of OFF-state switches to the elementary value $v_{\mathrm{dc}} /(n-1)$, where $n$ is the number of levels, and carries no current.


Fig. 7. Markov chain diagram of a two-level leg.

## B. Three-Level ANPC Leg

Fig. 5(b) shows the topology of the three-level ANPC leg. It contains six switches, labelled with a two-digit code indicating the row and column where the switch is located. The three switching states considered for the connection of the leg ac terminal to the three dc-link terminals are represented in the second row of Fig. 6 [28].

Through a systematic search and analysis of all possible combinations of failed devices, it is possible to establish the Markov chain diagram for the three-level ANPC leg. This systematic search and analysis is performed in one of the supplementary MATLAB scripts provided with this manuscript. Fig. 8 presents the resulting Markov chain diagram, with 17 relevant states. Each state contains a diagram indicating the state of each switch of Fig. 5(b): a green dot indicates that the switch is operating correctly and a red cross indicates that the switch has failed in open circuit. The number of available levels is also indicated for each state. It can be observed that the failure of only one switch does not lead in any case to the full system failure state, because in all these cases the leg ac terminal can still be connected to more than one dc-link point. It is interesting to note that in states 3 and 4, the three levels are still available. The concurrent failure of two switches may also not lead to a full system failure. The system reaches the system failure state (state 0) only when two or three concurrent switch failures leave one or zero levels available.

## C. Four-Level ANPC Leg

Fig. 5(c) shows the topology of the four-level ANPC leg. It contains twelve switches, labelled with a two-digit code indicating the row and column where the switch is located. The four switching states considered for the connection of the leg ac terminal to the four dc-link terminals are represented in the third row of Fig. 6 [28].

Through a systematic search and analysis of all possible combinations of failed devices, it is possible to establish the Markov chain diagram for the four-level ANPC leg. This systematic search and analysis is performed in one of the supplementary MATLAB scripts provided with this manuscript. The Markov chain contains 1,118 relevant states. Note that the complexity of the Markov chain rises exponentially with the number of switches in the leg. From the analysis of these states, it can be concluded that the leg can continue operating with up to eight concurrent open-circuit switch failures; i.e., in four different states featuring eight concurrent failed switches the leg is still able to connect the ac terminal to two different dc-link points. On the other hand, the full leg failure state can be reached if both switches 33 and 43 fail; i.e., with only two concurrent switch failures. This means that these two positions should be occupied by switches featuring a low failure rate.

Fig. 8. Markov chain diagram of a three-level ANPC leg.

## D. Devices in Parallel

One way to improve the reliability of systems involving switches that fail in open circuit is to introduce additional switches connected in parallel with the preexisting ones. By introducing additional switches in parallel with a given switch, the current is distributed among the paralleled devices, reducing the current stress and therefore reducing the failure rate of each individual switch. In addition, when one of the paralleled switches fails, the remaining parallel switches can continue operating, possibly with a higher current stress and a higher failure rate. Ultimately, the overall failure rate of the set of paralleled devices ends up being lower than the failure rate of a single device. The reduction is especially noticeable for a moderate number of paralleled devices [19].

Let us analyze the reliability of an isolated system composed of $p r$ parallel devices, as depicted in Fig. 9. Fig. 10 illustrates the Markov chain of this system, with $p r$ varying from 1 to 5. State $k$ corresponds to the set with $k-1$ failed devices, except for state 0, which corresponds to the state with all devices failed. Variable $\lambda_{q}$ indicates the failure rate of one device when $q$ devices are operating in parallel. From Fig. 10 and applying the method presented in Section III, the expressions in (17) of the overall system failure rate $\lambda_{\text {sys }, p r}$ are quickly obtained.
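The expressions in (17) share a common pattern that can be checked with a short Python sketch. It assumes that the state with $q$ operating devices is left at a rate $q \cdot \lambda_{q}$, which reproduces (17) term by term because the MTTF of the chain is then the sum of the mean times $1 /\left(q \cdot \lambda_{q}\right)$ spent in each state. The function name and the list-based argument are hypothetical.

```python
from fractions import Fraction


def parallel_system_rate(lam):
    """Equivalent failure rate lambda_sys,pr of pr paralleled devices.

    lam[q - 1] is lambda_q, the failure rate of one device while q devices
    are operating in parallel. Assumes that the state with q operating
    devices is left at rate q * lambda_q.
    """
    mean_times = [1 / (q * Fraction(rate))
                  for q, rate in enumerate(lam, start=1)]
    return 1 / sum(mean_times)


# Check with every lambda_q equal to 1 p.u., for pr = 1 to 5.
equal_rates = [parallel_system_rate([1] * pr) for pr in range(1, 6)]
# -> 1, 2/3, 6/11, 12/25, 60/137
```

With all $\lambda_{q}$ equal to 1 p.u., the sketch returns 1, 2/3, 6/11, 12/25, and 60/137, the values listed under Assumption 1 in Table II.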

The paralleling of switches can be used to improve the reliability of ANPC legs. Their MTTF will especially improve if paralleling is applied to the most critical switch positions. If paralleling is used in a given position of the leg, the new system reliability can be estimated, in a first approximation, using the same Markov chain diagrams derived in Sections IV.A, IV.B, and IV.C, and setting the failure rate of the corresponding switch position equal to the value computed with (17).


## E. MTTF Comparison Study and Discussion

The models presented in the previous sections and the proposed fast method to compute the MTTF can be very useful to characterize and compare the reliability of different systems. Thanks to the low MTTF computation time, it can be incorporated in optimization procedures requiring the evaluation of many alternative designs.

In this section, a comparative study of multilevel ANPC topologies is presented to illustrate the potential of this modeling approach, while providing a deeper insight into the reliability of these systems. A comparison of the MTTF of ANPC topologies with two, three, and four levels and the same full dc bus voltage is performed by employing the reliability models and fast MTTF calculation method presented in previous sections. The comparison also considers the paralleling of SCs ($p r=1$ to $5$) per position. For the sake of simplicity, the failure rate of the SCs will be roughly estimated with a simple normalized value, where only the influence of the SC voltage rating has been considered. In a more accurate modeling approach, a converter leg thermoelectrical model should be employed in each possible state to compute each SC temperature, and then, conventional reliability model expressions, based among others on the Arrhenius law, would determine the failure rate of the SC from the value of the SC temperature, the SC blocking voltage, and other predetermined parameters, as discussed in Section II.C of [16]. However, this is not deemed convenient in this preliminary study, whose only aim is to explore basic comparative features and trends.

It is worth highlighting that, in the usual case, any device failure in a leg topology leads to a global leg failure. For this reason, it is commonly accepted that a topology with a larger number of devices presents worse reliability, which has motivated the search for multilevel leg topologies with a reduced number of devices. However, the previous argument is not invariably true if the topology has inherent redundancy that enables a fault-tolerant operation where the failure of one device does not lead to a global leg failure. The results of the study in this section confirm this latter reasoning.

Fig. 9. A set of $p r$ switches connected in parallel.

$$
\begin{gather*}
\lambda_{\text {sys }, 1}=\lambda_{1} ; \quad \lambda_{\text {sys }, 2}=\frac{2 \lambda_{1} \lambda_{2}}{2 \lambda_{2}+\lambda_{1}} ; \quad \lambda_{\text {sys }, 3}=\frac{6 \lambda_{1} \lambda_{2} \lambda_{3}}{2 \lambda_{1} \lambda_{2}+3 \lambda_{1} \lambda_{3}+6 \lambda_{2} \lambda_{3}} ;  \tag{17}\\
\lambda_{\text {sys }, 4}=\frac{12 \lambda_{1} \lambda_{2} \lambda_{3} \lambda_{4}}{3 \lambda_{1} \lambda_{2} \lambda_{3}+4 \lambda_{1} \lambda_{2} \lambda_{4}+6 \lambda_{1} \lambda_{3} \lambda_{4}+12 \lambda_{2} \lambda_{3} \lambda_{4}} ; \quad \lambda_{\text {sys }, 5}=\frac{60 \lambda_{1} \lambda_{2} \lambda_{3} \lambda_{4} \lambda_{5}}{12 \lambda_{1} \lambda_{2} \lambda_{3} \lambda_{4}+15 \lambda_{1} \lambda_{2} \lambda_{3} \lambda_{5}+20 \lambda_{1} \lambda_{2} \lambda_{4} \lambda_{5}+30 \lambda_{1} \lambda_{3} \lambda_{4} \lambda_{5}+60 \lambda_{2} \lambda_{3} \lambda_{4} \lambda_{5}}
\end{gather*}
$$


Fig. 10. Markov chain diagram of a system integrated by $p r$ parallel devices. (a) $p r=1$. (b) $p r=2$. (c) $p r=3$. (d) $p r=4$. (e) $p r=5$.


## 1) Assumption 1: SC failure rate independent of voltage rating

The first comparison assumes a normalized value of the failure rate $\lambda=1$ for all SCs over their whole life, regardless of their operating conditions and voltage rating; i.e., it is assumed that all SCs within a leg suffer the same stress in all operating conditions over the leg lifetime and that SCs with different voltage ratings have the same reliability. Obviously, this is not realistic, but it allows us to easily obtain a first approximation of the leg MTTF that reflects its inherent reliability and thus allows the different possible leg configurations to be ranked from the reliability point of view.

In these conditions, the failure rate of a set of $p r$ SCs in parallel is shown in the left column of Table II.

Leg configurations with two, three, and four levels, and a degree of paralleling from $p r=1$ to $p r=5$ in each leg switch position have been analyzed. The MTTF value for each configuration is shown in Table III and plotted in Fig. 11, where $n$ is the leg number of levels and \#SC is the total number of SCs within the leg.
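The two-level entries of Table III can be reproduced with a few lines of Python. The sketch below is a hedged illustration: it combines the first-approximation rule of Section IV.D (each switch position replaced by its equivalent rate from (17)) with the two-state chain of Fig. 7, and it assumes a unit failure rate for every SC as in Assumption 1. The function names are hypothetical.

```python
from fractions import Fraction


def parallel_set_rate(pr, lam=Fraction(1)):
    """lambda_sys,pr from (17) when every lambda_q equals lam."""
    return 1 / sum(1 / (q * Fraction(lam)) for q in range(1, pr + 1))


def two_level_mttf(pr, lam=Fraction(1)):
    """MTTF of a two-level leg whose two positions each hold pr SCs."""
    position_rate = parallel_set_rate(pr, lam)
    # Fig. 7: any position failure is a leg failure, so the rates add up.
    return 1 / (2 * position_rate)


# MTTF in p.u. for pr = 1 to 5, rounded to two decimals as in Table III.
table_iii_n2 = {pr: round(float(two_level_mttf(pr)), 2) for pr in range(1, 6)}
```

The exact values are 1/2, 3/4, 11/12, 25/24, and 137/120, which round to the 0.50, 0.75, 0.92, 1.04, and 1.14 listed for $n=2$ in Table III.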

It can be observed that, without paralleling $(p r=1)$, the MTTF increases with the number of levels, in spite of increasing the number of SCs with the number of levels. This also occurs for the same level of paralleling ( $p r=2,3,4$, and 5). This is due to the fact that as the number of levels increases, the topology has an inherent higher degree of redundancy: redundancy in the dc bus points to which the ac terminal can be connected and redundancy in the paths that allow the connection of the ac terminal to these dc bus points.

It can also be observed that, if the use of paralleling is considered and topologies at different levels but with the same number of SCs are compared, then, as the number of levels increases, the MTTF decreases. For instance, the MTTF of $n=2$ with $p r=3$ is higher than the MTTF of $n=3$ with $p r=1$. However, this is not a fair comparison, because the SCs used in a leg with a higher number of levels should be simpler and more reliable because they have a lower voltage rating. Let us take this aspect into account with a second assumption, which should be fairer.

## 2) Assumption 2: SC failure rate proportional to the voltage rating

The same calculation is performed but now the SC failure rate is assumed to be proportional to the SC voltage rating, with a normalized value $\lambda=1$ for the cells used in a two-level leg. In these conditions, the failure rate of a set of $p r$ SCs in parallel for an $n$-level leg is shown in the right column of Table II.


Fig. 11. MTTF of different leg configurations under assumption 1.

The new MTTF values for the different analyzed leg configurations are presented in Table III and Fig. 12.

It can now be observed that the MTTF increases as the number of levels increases, even if the comparison is made for the same number of SCs.
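The relation between the two MTTF columns of Table III can be sketched as follows, assuming that Assumption 2 scales every transition rate of a given $n$-level Markov chain by the same factor $1 /(n-1)$, as Table II indicates. The subscripts 1 and 2 are introduced here only to distinguish the two assumptions. Scaling all entries of $\mathbf{M}$ in (8) by a constant only rescales time, so

$$
\begin{gathered}
\mathbf{M}_{2}=\frac{1}{n-1} \cdot \mathbf{M}_{1} \quad \Rightarrow \quad R_{2}(t)=R_{1}\left(\frac{t}{n-1}\right) \\
M T T F_{2}=\int_{0}^{\infty} R_{1}\left(\frac{t}{n-1}\right) \cdot \mathrm{d} t=(n-1) \cdot M T T F_{1}
\end{gathered}
$$

For example, for $n=3$ and $p r=1$, $2 \cdot 0.55=1.10$. The small deviations in other rows (e.g., $3 \cdot 0.56=1.68$ against the tabulated 1.67) appear consistent with the rounding of the tabulated values to two decimals.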


Fig. 12. MTTF of different leg configurations under assumption 2.

TABLE II
Failure Rate of a Set of $p r$ Cells in Parallel

| $p r$ | $\lambda$ [p.u.], Assumption 1 | $\lambda$ [p.u.], Assumption 2 |
| :---: | :---: | :---: |
| 1 | 1 | $1 /(n-1)$ |
| 2 | $2 / 3=0.67$ | $0.67 /(n-1)$ |
| 3 | $6 / 11=0.55$ | $0.55 /(n-1)$ |
| 4 | $12 / 25=0.48$ | $0.48 /(n-1)$ |
| 5 | $60 / 137=0.44$ | $0.44 /(n-1)$ |

TABLE III
MTTF of Different Leg Configurations

| $n$ | $p r$ | \#SC | MTTF [p.u.], Assumption 1 | MTTF [p.u.], Assumption 2 |
| :---: | :---: | :---: | :---: | :---: |
| 2 | 1 | 2 | 0.50 | 0.50 |
| 2 | 2 | 4 | 0.75 | 0.75 |
| 2 | 3 | 6 | 0.92 | 0.92 |
| 2 | 4 | 8 | 1.04 | 1.04 |
| 2 | 5 | 10 | 1.14 | 1.14 |
| 3 | 1 | 6 | 0.55 | 1.10 |
| 3 | 2 | 12 | 0.83 | 1.65 |
| 3 | 3 | 18 | 1.01 | 2.02 |
| 3 | 4 | 24 | 1.15 | 2.29 |
| 3 | 5 | 30 | 1.26 | 2.51 |
| 4 | 1 | 12 | 0.56 | 1.67 |
| 4 | 2 | 24 | 0.84 | 2.51 |
| 4 | 3 | 36 | 1.02 | 3.06 |
| 4 | 4 | 48 | 1.16 | 3.48 |
| 4 | 5 | 60 | 1.27 | 3.81 |


## V. Experimental Results

In this section, experimental results are presented to illustrate the behavior of multilevel ANPC legs under certain open-circuit switch faults, and most specifically, to validate the assumption that multilevel ANPC legs can continue operating under the concurrent open-circuit fault of some switches; i.e., the fault-tolerant operation.

A three-level ANPC leg (Fig. 5(b)) and a four-level ANPC leg (Fig. 5(c)) have been implemented with $100-\mathrm{V}$ metal-oxide-semiconductor field-effect transistors and then tested with 50 V dc power supplies across adjacent dc-link terminals, a series $33 \Omega-3 \mathrm{mH}$ load connected between the ac and $\mathrm{dc}_{1}$ terminals, the same duty-ratio of connection to all available dc-link points, and a switching frequency of 10 kHz . The switch control signals are generated with the aid of a dSPACE control platform equipped with a DS5101 digital waveform output board. Fig. 13 shows the pictures of the experimental setups, where the three-level and four-level ANPC legs have been implemented employing 6 and 12 cells, respectively, of a $6 \times 3$ switching-cell array [29].

Fig. 13. Experimental setup. (a) Three levels. (b) Four levels.

Fig. 14 shows the voltage of the ac terminal with reference to node $\mathrm{dc}_{1}$ ($v_{\mathrm{ac}}$), the current through the ac terminal ($i_{\mathrm{ac}}$), and the binary switch control signals (high level: switch ON, low level: switch OFF) in the three-level leg case under several fault states. In the first and second switching cycles, the leg is in a failure-free state and produces a $v_{\mathrm{ac}}$ with three levels, as expected. At the beginning of the third switching cycle, switch 21 fails but $v_{\mathrm{ac}}$ still presents three levels, because the failure of switch 21 only eliminates one of the two redundant paths to connect to $\mathrm{dc}_{2}$, according to Fig. 6. At the beginning of the fourth switching cycle, switch 41 fails, leading to a $v_{\mathrm{ac}}$ with only two levels, since the failure of switch 41 eliminates the only path to connect to $\mathrm{dc}_{3}$ (see Fig. 6(c)). Finally, at the beginning of the fifth switching cycle, switch 32 fails, eliminating the remaining path to connect to $\mathrm{dc}_{2}$, and thus leading to a full leg failure state since $v_{\mathrm{ac}}$ can no longer present more than one level.

Fig. 15 shows $v_{\mathrm{ac}}$, $i_{\mathrm{ac}}$, and relevant switch control signals in the four-level leg case under several fault states. In the first and second switching cycles, the leg is in a failure-free state and produces a $v_{\mathrm{ac}}$ with four levels, as expected. A sequence of switch failures (21, 41, 32, 61, 52, and 43) occurs during the next switching cycles, as indicated in Fig. 15. According to Fig. 6, the first three failures (switches 21, 41, and 32) only eliminate part of the redundant paths to connect to the dc points, and thus $v_{\mathrm{ac}}$ preserves four levels. The failure of switch 61 eliminates the only path to connect to $\mathrm{dc}_{4}$, and $v_{\mathrm{ac}}$ presents three levels. The failure of switch 52 eliminates the remaining path to connect to $\mathrm{dc}_{3}$, and $v_{\mathrm{ac}}$ presents two levels. Finally, the failure of switch 43 eliminates the remaining path to connect to $\mathrm{dc}_{2}$, leading to a full leg failure state since $v_{\mathrm{ac}}$ can only present one level.


Fig. 14. Experimental three-level ANPC leg ac-terminal voltage under a sequence of switch open-circuit failures.


Fig. 15. Experimental four-level ANPC leg ac-terminal voltage under a sequence of switch open-circuit failures.

Fig. 16. Extended experimental results under different combinations of failed devices and under no load (left column), under a step-down in the load resistance from $66\ \Omega$ to $10\ \Omega$ (center column), and under a step-up in the load resistance from $10\ \Omega$ to $66\ \Omega$ (right column). (a) Three-level ANPC leg with switch 21 in failure. (b) Three-level ANPC leg with switches 21 and 41 in failure. (c) Four-level ANPC leg with switches 21, 41, and 32 in failure. (d) Four-level ANPC leg with switches 21, 41, 32, and 61 in failure. (e) Four-level ANPC leg with switches 21, 41, 32, 61, and 52 in failure.

Finally, Fig. 16 presents extended experimental results under different cases of concurrent switch faults, both under no load and under load-step transients, to further validate the fault-tolerant operation of three-level and four-level ANPC legs.

## VI. Conclusion

A very fast method to compute the MTTF of power electronics systems based on their Markov model has been presented. Assuming the use of SCs designed to always fail in open circuit, the Markov chain diagrams of two-level, three-level, and four-level ANPC converter legs have been derived, considering for the first time all possible intermediate states, and the proposed MTTF computation method has been employed to conveniently compare their MTTF with different numbers of parallel devices in each switch position. The comparison reveals that topologies with a larger number of devices may feature higher reliability if they present enough inherent redundancy, proving that the common belief that the higher the number of devices, the lower the reliability cannot be generalized to all cases.

The proposed Markov models of multilevel ANPC legs and the proposed fast MTTF computation method enable the search through optimization algorithms of the most reliable leg configuration under specific operating conditions, where a large number of different configurations have to be evaluated.
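To illustrate the kind of computation involved in obtaining the MTTF from a Markov model, the sketch below uses the generic textbook formulation for an absorbing continuous-time Markov chain: the expected times to absorption $m$ from the transient states satisfy $Q_{\mathrm{T}}\, m = -\mathbf{1}$, where $Q_{\mathrm{T}}$ is the generator restricted to the transient states. This is not necessarily the fast method proposed in this paper. The function names and the example chain (one switch position with $pr$ independently failing parallel devices and a unit device failure rate) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def mttf_from_generator(q_transient, start=0):
    """Mean time to absorption when the chain starts in state `start`.

    q_transient[i][j] is the transition rate from transient state i to
    transient state j (i != j); q_transient[i][i] is minus the total exit
    rate of state i, including exits to the absorbing failure state.
    """
    times = solve_linear(q_transient, [-1] * len(q_transient))
    return times[start]


def parallel_position_generator(pr, rate=1):
    """Hypothetical example chain: pr parallel devices failing independently."""
    q = [[Fraction(0)] * pr for _ in range(pr)]
    for i in range(pr):
        alive = pr - i  # devices still working in state i
        q[i][i] = -Fraction(alive * rate)
        if i + 1 < pr:
            q[i][i + 1] = Fraction(alive * rate)
    return q


print(mttf_from_generator(parallel_position_generator(2)))  # 3/2
```

For two parallel devices, the result 3/2 is the reciprocal of the 2/3 entry listed under Assumption 1 in the table preceding Table III. Since each candidate configuration requires only one linear solve over its transient states, an evaluation of this kind can be repeated for many configurations inside an optimization loop.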

## REFERENCES

[1] Y. Song and B. Wang, "Survey on reliability of power electronic systems," IEEE Trans. Power Electron., vol. 28, no. 1, pp. 591-604, Jan. 2013.
[2] H. Wang, M. Liserre, and F. Blaabjerg, "Toward reliable power electronics: challenges, design tools, and opportunities," IEEE Ind. Electron. Mag., vol. 7, no. 2, pp. 17-26, Jun. 2013.
[3] J. Falck, C. Felgemacher, A. Rojko, M. Liserre, and P. Zacharias, "Reliability of power electronics systems: an industry perspective," IEEE Ind. Electron. Mag., vol. 12, no. 2, pp. 24-35, Jun. 2018.
[4] S. Peyghami, Z. Wang, and F. Blaabjerg, "A guideline for reliability prediction in power electronic converters," IEEE Trans. Power Electron., vol. 35, no. 10, pp. 10958-10968, Oct. 2020.
[5] S. Peyghami, F. Blaabjerg, and P. Palensky, "Incorporating power electronic converters reliability into modern power system reliability analysis," IEEE J. Emerg. Sel. Top. Power Electron., pp. 1-1, 2020.
[6] Z. Ni, X. Lyu, O. P. Yadav, B. N. Singh, S. Zheng, and D. Cao, "Overview of real-time lifetime prediction and extension for SiC power converters," IEEE Trans. Power Electron., vol. 35, no. 8, pp. 7765-7794, 2020.
[7] A. Wintrich, U. Nicolai, W. Tursky, and T. Reimann, Application Manual Power Semiconductors, 2nd ed. Semikron: Nuremberg, Germany, 2015.
[8] H. Huang and P. A. Mawby, "A lifetime estimation technique for voltage source inverters," IEEE Trans. Power Electron., vol. 28, no. 8, pp. 4113-4119, Aug. 2013.
[9] P. Tu, S. Yang, and P. Wang, "Reliability- and cost-based redundancy design for modular multilevel converter," IEEE Trans. Ind. Electron., vol. 66, no. 3, pp. 2333-2342, 2019.
[10] H. Chen, S. Xu, and S. Cui, "Reliability evaluation for power converter of SRM on fault-tolerance capability and thermal stress," IEEE Trans. Ind. Electron., vol. 68, no. 2, pp. 1749-1758, 2021.
[11] H. Tarzamni, F. P. Esmaeelnia, M. Fotuhi-Firuzabad, F. Tahami, S. Tohidi, and P. Dehghanian, "Comprehensive analytics for reliability evaluation of conventional isolated multiswitch PWM DC-DC converters," IEEE Trans. Power Electron., vol. 35, no. 5, pp. 5254-5266, 2020.
[12] T. Dragičević, P. Wheeler, and F. Blaabjerg, "Artificial intelligence aided automated design for reliability of power electronic systems," IEEE Trans. Power Electron., vol. 34, no. 8, pp. 7161-7171, Aug. 2019.
[13] R. Elakkiya, G. Kavithaa, V. Samavatian, K. Alhaifi, A. Kokabi, and H. Moayedi, "Reliability enhancement of a power semiconductor with optimized solder layer thickness," IEEE Trans. Power Electron., vol. 35, no. 6, pp. 6397-6404, 2020.
[14] V. Raveendran, M. Andresen, and M. Liserre, "Improving onboard converter reliability for more electric aircraft with lifetime-based control," IEEE Trans. Ind. Electron., vol. 66, no. 7, pp. 5787-5796, 2019.
[15] S. H. Ali, X. Li, A. S. Kamath, and B. Akin, "A simple plug-in circuit for IGBT gate drivers to monitor device aging: toward smart gate drivers," IEEE Power Electron. Mag., vol. 5, no. 3, pp. 45-55, Sept. 2018.
[16] F. Richardeau and T. T. L. Pham, "Reliability calculation of multilevel converters: theory and applications," IEEE Trans. Ind. Electron., vol. 60, no. 10, pp. 4225-4233, Oct. 2013.
[17] H. K. Jahan, M. Naseri, M. M. Haji-Esmaeili, M. Abapour, and K. Zare, "Low component merged cells cascaded-transformer multilevel inverter featuring an enhanced reliability," IET Power Electron., vol. 10, no. 8, pp. 855-862, 2017.
[18] P. Azer, S. Ouni, and M. Narimani, "A novel fault-tolerant technique for active-neutral-point-clamped inverter using carrier-based PWM," IEEE Trans. Ind. Electron., vol. 67, no. 3, pp. 1792-1803, Mar. 2020.
[19] R. Grinberg, G. Riedel, A. Korn, P. Steimer, and E. Bjornstad, "On reliability of medium voltage multilevel converters," in Proc. IEEE Energy Conversion Congress and Exposition, 2013, pp. 4047-4052.
[20] Z. Wang, H. Wang, Y. Zhang, and F. Blaabjerg, "A viable mission profile emulator for power modules in modular multilevel converters," IEEE Trans. Power Electron., vol. 34, no. 12, pp. 11580-11593, Dec. 2019.
[21] P. Lezana, J. Pou, T. A. Meynard, J. Rodriguez, S. Ceballos, and F. Richardeau, "Survey on fault operation on multilevel inverters," IEEE Trans. Ind. Electron., vol. 57, no. 7, pp. 2207-2218, July 2010.
[22] S. Li and L. Xu, "Strategies of fault tolerant operation for three-level PWM inverters," IEEE Trans. Power Electron., vol. 21, no. 4, pp. 933-940, Jul. 2006.
[23] A. Chen, L. Hu, L. Chen, Y. Deng, and X. He, "A multilevel converter topology with fault-tolerant ability," IEEE Trans. Power Electron., vol. 20, no. 2, pp. 405-415, Mar. 2005.
[24] J. Nicolás-Apruzzese, S. Busquets-Monge, J. Bordonau, S. Alepuz, and A. Calle-Prado, "Analysis of the fault-tolerance capacity of the multilevel active-clamped converter," IEEE Trans. Ind. Electron., vol. 60, pp. 4773-4783, Nov. 2013.
[25] Y. Zhang, H. Wang, Z. Wang, F. Blaabjerg, and M. Saeedifard, "Mission profile-based system-level reliability prediction method for modular multilevel converters," IEEE Trans. Power Electron., vol. 35, no. 7, pp. 6916-6930, 2020.
[26] J. V. M. Farias, A. F. Cupertino, V. D. N. Ferreira, H. A. Pereira, S. I. Seleme, and R. Teodorescu, "Reliability-oriented design of modular multilevel converters for medium-voltage STATCOM," IEEE Trans. Ind. Electron., vol. 67, no. 8, pp. 6206-6214, 2020.
[27] A. Filba-Martinez, S. Alepuz, S. Busquets-Monge, A. Luque, and J. Bordonau, "An intelligent electronic fuse (iFuse) to enable short-circuit fault-tolerant operation of power electronic converters," IEEE TechRxiv preprint, Nov. 2020.
[28] S. Busquets-Monge and J. Nicolas-Apruzzese, "A multilevel active-clamped converter topology - Operating principle," IEEE Trans. Ind. Electron., vol. 58, no. 9, pp. 3868-3878, Sept. 2011.
[29] S. Busquets-Monge and L. Caballero, "Switching-cell arrays - An alternative design approach in power conversion," IEEE Trans. Ind. Electron., vol. 66, no. 1, pp. 25-36, Jan. 2019.


Sergio Busquets-Monge (SM'11) was born in Barcelona, Spain. He received the M.S. degree in electrical engineering and the Ph.D. degree in electronic engineering from the Universitat Politècnica de Catalunya (UPC), Barcelona, in 1999 and 2006, respectively, and the M.S. degree in electrical engineering from Virginia Polytechnic Institute and State University, Blacksburg, VA, USA, in 2001.
From 2001 to 2002, he was with Crown Audio, Inc. Since 2007, he has been an Associate Professor with the Electronic Engineering Department, UPC. His current research interests include standardized/modular power converter design and multilevel conversion.


Roya Rafiezadeh was born in Yazd, Iran. She received the M.S. degree in electrical engineering from the Sharif University of Technology, International Campus, Kish, Iran, in 2010. She is currently working toward the Ph.D. degree in electronic engineering at Universitat Politècnica de Catalunya (UPC), Barcelona, Spain.
Since 2019, she has been a Researcher with the Power Electronics Research Group, UPC. Her current research interests include the standardized and optimized design of power converters.


Salvador Alepuz (SM'12) was born in Barcelona, Spain. He received the M.S. and Ph.D. degrees in electrical and electronic engineering from the Universitat Politècnica de Catalunya (UPC), Barcelona, in 1993 and 2004, respectively.
Since 1994, he has been an Associate Professor at the Escola Superior Politècnica, Universitat Pompeu Fabra, Tecnocampus Mataró-Maresme, Mataró, Barcelona. He has been a visiting researcher at the Departamento de Electrónica, Universidad Técnica Federico Santa María, Chile, and the Electrical and Computer Engineering Department, Ryerson University, Toronto, Canada. His fields of interest are multilevel converters, ac power conversion, and predictive control applied to renewable energy systems.


Alber Filba-Martinez (M'19) was born in Mataró, Spain. He received the M.S. and Ph.D. degrees in electronic engineering from the Universitat Politècnica de Catalunya (UPC), Barcelona, Spain, in 2011 and 2017, respectively.
Since 2020, he has held a postdoctoral tenure-track position at the Catalonia Institute for Energy Research, Barcelona. From 2011 to 2020, he was a researcher in the Power Electronics Research Group, UPC, where he still collaborates on some of the group's research lines. His research interests include multilevel converters for dc-dc conversion applications, power converters in electric vehicles, and fault-tolerant power conversion.


Joan Nicolas-Apruzzese (M'14) was born in Maracaibo, Venezuela. He received the M.S. degree in electrical engineering and the Ph.D. degree in electronic engineering from the Universitat Politècnica de Catalunya (UPC), Barcelona, Spain, in 2008 and 2013, respectively.
Since 2008, he has been a researcher in the Power Electronics Research Group, UPC. His main research interests include multilevel power converters applied to electric vehicles and photovoltaic- and wind-energy systems.


[^0]:    Manuscript received August 28, 2020; revised January 25, 2021 and April 23, 2021; accepted May 29, 2021. This work was supported by the Ministerio de Economía, Industria y Competitividad, Spain, under Grant DPI2017-89153-P (AEI/FEDER, UE). (Corresponding author: Sergio Busquets-Monge.)
    S. Busquets-Monge, R. Rafiezadeh, A. Filba-Martinez, and J. Nicolas-Apruzzese are with the Electronic Engineering Department, Universitat Politècnica de Catalunya, 08028-Barcelona, Spain (e-mail: sergio.busquets@upc.edu; roya.rafiezadeh@upc.edu; alber.filba@upc.edu; joan.nicolas@upc.edu).
    S. Alepuz is with the Escola Superior Politècnica, Tecnocampus Mataró-Maresme, Universitat Pompeu Fabra, 08302-Mataró, Spain (e-mail: dr.salvador.alepuz@ieee.org).

