Abstract: This paper presents a fault detection and isolation (FDI) method for open-circuit faults of power semiconductor devices in a modular multilevel converter (MMC).
The proposed FDI method is simple, with only one sliding-mode observer (SMO) equation, and requires no additional transducers. The method is based on an SMO for the circulating current in an MMC. An open-circuit fault of a power semiconductor device is detected when the observed circulating current diverges from the measured one. A fault is located by employing an assumption-verification process. To improve the robustness of the proposed FDI method, a new technique based on the observer injection term is introduced to estimate the value of the uncertainties and disturbances; this estimated value can be used to compensate for the uncertainties and disturbances. As a result, the proposed FDI scheme can detect and locate an open-circuit fault in a power semiconductor device while ignoring parameter uncertainties, measurement error, and other bounded disturbances. The FDI scheme has been implemented in a field-programmable gate array using fixed-point arithmetic and tested on a single-phase MMC prototype. Experimental results under different load conditions show that an open-circuit faulty power semiconductor device in an MMC can be detected and located in less than 50 ms.
Index Terms: Fault detection and isolation (FDI), modular multilevel converter (MMC), sliding-mode observer (SMO).

I. INTRODUCTION
The modular multilevel converter (MMC) is the state of the art in multilevel converters and is receiving great interest from both academia and industry. It has a number of desirable features such as modular configuration, low harmonic distortion, low voltage stress on the semiconductor devices, high-voltage and high-power capability, and simple realization of redundancy [1]. In addition, the cells of an MMC are fed by capacitors and no multiphase transformers are required. A comprehensive introduction to the operation of the MMC is given in [2]. The review paper [3] summarizes the latest achievements regarding the MMC in terms of modeling, control, modulation, applications, and future trends. Power semiconductor switches are among the most failure-prone components in a power converter and each of these devices is a potential failure point [4]. With large numbers of semiconductor devices, the probability of a fault occurring in an MMC is much larger than for normal two-level voltage-source converters (VSCs). Faults in power semiconductor devices cause a power converter to operate far from its set point, and this abnormal operation cannot be overcome by a feedback controller. If the faulty operation is allowed to continue, other devices may be damaged and a shutdown of the plant may follow. Therefore, it is vital to detect and isolate these faults immediately after their occurrence.
Fault detection and isolation (FDI) deals with detecting anomalous situations (fault detection, FD) and identifying their causes (fault isolation, FI) [5]. An FDI scheme can be implemented by either a hardware method or an analytical (software) method [5], [6]. Hardware FDI employs redundant components or additional sensors, and a fault is detected if the behavior of the process components differs from that of the redundant ones, or if the additional sensors detect anomalous signals. It is straightforward and reliable but increases the cost, size, and hardware complexity of the plant. The basic idea of analytical FDI is to check the consistency between the actual system behavior and its estimated behavior [7]. The estimated behavior can be obtained either from a mathematical model of the system (for example, using observers) or from an analysis of historical data (for example, using data mining or neural networks). Although the algorithm is more sophisticated, the cost and hardware complexity of the analytical method are lower than those of the hardware method. The application of analytical FDI methods has been boosted by the great advances in computer technology in recent decades [6].
There are two types of faults seen in a fully controlled power semiconductor device: a short-circuit fault (the device remains ON regardless of the gate signal) and an open-circuit fault (the device remains OFF regardless of the gate signal). Any short-circuit fault needs to be detected within 10 μs to save the semiconductor devices from destruction and to avoid a shoot-through fault with the complementary device [8]. A short circuit in an insulated-gate bipolar transistor (IGBT) is usually detected using a hardware circuit, often with additional sensors and associated circuits. These sensors and circuits are usually integrated in a gate driver to form an active/smart gate driver [9], [10]. The additional sensors and circuits add extra cost and size to the system. Furthermore, these active gate drivers can themselves fail due to their complexity and hence decrease the reliability of the power converter. The behavior of an MMC under an open-circuit fault is illustrated in Fig. 1, where the parameters are the same as those of an industrial 24-MW MMC [11] and an open-circuit fault occurs at 0.1 s. Only one of the phases is considered. It can be seen that an open-circuit fault is not immediately fatal to an MMC; however, the fault needs to be detected and removed within 0.1 s to avoid secondary damage to other devices. An open-circuit fault can have various causes: lifting or fusing of bonding wires, a driver failure, or a communication problem between the controller and the driver. The gate driver is recognized as the third most failure-prone component according to an industry-based survey [12]. The simplest detection method is to use an active gate driver as mentioned previously. Analytical redundancy can be used to detect an open-circuit fault, as this type of fault is not immediately fatal and can be tolerated by the power converter for some time [13]. Several analytical FDI methods based on the analysis of the output voltage waveform have been reported.
In [14], a faulty cell in a flying capacitor converter is detected and localized by analyzing the switching frequency of the output phase voltage. This technique has also been applied to a cascaded H-bridge [15], where an open- or short-circuit fault can be detected. In [16], the characteristics of the output phase voltage are analyzed in the time domain, and the occurrence of a fault is detected by the degradation of the output voltage, while the fault is located by comparing the output phase voltage with all the possible phase fault voltages. In [17], an artificial intelligence FDI algorithm is proposed, where the historical data of the output phase voltages in both normal and faulty conditions are used to train a neural network. Survey [18] presents a comprehensive review of the reliability of power electronics systems, including methodologies for assessing reliability, methods to detect and locate faults, and fault-tolerant operation. Survey [19] summarizes recent fault tolerance techniques for three-phase VSCs.
A sliding-mode observer (SMO)-based FDI technique for an MMC was proposed in [20] and [21], where a faulty power semiconductor device can be detected and located within 100 ms. The work presented in this paper is an improved method. This method is simpler, using only one SMO equation, and can detect and locate an open-circuit fault in less than 50 ms. Furthermore, a technique is proposed to compensate for any parameter uncertainties, measurement errors, and other bounded disturbances. The resultant FDI scheme can detect an open-circuit faulty power semiconductor device while rejecting any uncertainties and disturbances. The practical implementation of the SMO-based FDI scheme in a field-programmable gate array (FPGA) is also discussed in this paper, and the experimental results at different load conditions are presented.
II. SLIDING-MODE OBSERVER
A. Introduction
An observer is a mathematical replica of a system to estimate its internal states, driven by the input of the system and a signal representing the discrepancy between the estimated and actual states [22] . In the earliest observers such as the Luenberger observer, the differences between the estimated outputs and the actual outputs of the plant are fed back to the observer linearly, and the estimated states cannot converge to the measured states in the presence of a disturbance [22] , [23] . The SMO employs a high-gain switching function of the discrepancy between the estimated and actual outputs to force the estimated states to the actual states asymptotically.
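A first-order system (1) is used in this paper

$$\dot{x} = a x + b u \tag{1}$$

An SMO for (1) is introduced as

$$\dot{\hat{x}} = a\hat{x} + b u + L\,\mathrm{sgn}(x - \hat{x}) \tag{2}$$

$$\mathrm{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases} \tag{3}$$

where $\hat{x}$ denotes the estimated/observed state of $x$ and $L$ denotes the observer gain designed to drive $\hat{x} \to x$ in finite time. Subtracting (2) from (1) yields the dynamic error between the observed and measured states

$$\dot{\tilde{x}} = a\tilde{x} - L\,\mathrm{sgn}(\tilde{x}) \tag{4}$$

where $\tilde{x} = x - \hat{x}$. Choosing $L > |a\tilde{x}|$, we obtain

$$\tilde{x}\dot{\tilde{x}} = \tilde{x}\left(a\tilde{x} - L\,\mathrm{sgn}(\tilde{x})\right) \le |\tilde{x}|\left(|a\tilde{x}| - L\right) < 0 \tag{5}$$

which will force $\tilde{x}$ and $\dot{\tilde{x}}$ to zero and keep them at zero thereafter. This motion along a line is the so-called sliding mode [24].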
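The convergence argument can be checked numerically. The following is a minimal sketch, assuming the plant has the form $\dot{x} = ax + bu$ and the observer the form $\dot{\hat{x}} = a\hat{x} + bu + L\,\mathrm{sgn}(x - \hat{x})$. The values of `a`, `b`, `u`, `L`, the initial states, and the forward-Euler step are illustrative assumptions and are not taken from this paper.

```python
def sgn(v):
    """Signum function used in the observer injection term."""
    return (v > 0) - (v < 0)


def simulate_smo(a=-1.0, b=1.0, u=1.0, L=5.0, x0=2.0, xhat0=0.0,
                 dt=1e-4, steps=20000):
    """Forward-Euler simulation of a first-order plant and its SMO.

    Returns the list of observation errors x - xhat, one per step.
    All default values are hypothetical and chosen so that L > |a * error|.
    """
    x, xhat = x0, xhat0
    errors = []
    for _ in range(steps):
        err = x - xhat
        errors.append(err)
        dx = a * x + b * u                        # plant
        dxhat = a * xhat + b * u + L * sgn(err)   # observer with switching injection
        x += dt * dx
        xhat += dt * dxhat
    return errors


if __name__ == "__main__":
    errs = simulate_smo()
    print("initial error:", errs[0])
    print("final error:  ", errs[-1])
```

With these assumed values the error reaches zero in finite time and then chatters within a band of roughly `L * dt`, which is the discrete-time counterpart of the sliding mode.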
B. SMO for an MMC
An SMO can be built for an MMC based on (2). In this paper a single-phase eight-cell MMC is considered; nevertheless, the method is versatile and can be used for an MMC with hundreds of cells.
The circuit diagram and parameters of the MMC used for the analysis and simulation are presented in Fig. 2 and Table I, respectively. $T_1$ and $T_2$ in Fig. 2 represent the upper and lower power semiconductor devices in a cell.
According to Kirchhoff's voltage law, we obtain the following equation for the MMC (see Fig. 2):

$$l\frac{di_p}{dt} + l\frac{di_n}{dt} = E_p + E_n - \sum_{i=1}^{8} S_i v_{ci} \tag{6}$$

where $i_p$ and $i_n$ are the upper and lower arm currents, $l$ is the inductance of the arm inductors, $E_p$ and $E_n$ are the dc voltages, and $v_{ci}$ and $S_i$ are the capacitor voltage and switching state of cell $i$, respectively. $S_i$ is defined in Table II, where $g_1$ and $g_2$ are the gate signals for the upper and lower switches in a cell. Since the circulating current of the MMC is $i_z = (i_p + i_n)/2$ [25], (6) can be rewritten as

$$\frac{di_z}{dt} = \frac{1}{2l}\left(E_p + E_n - \sum_{i=1}^{8} S_i v_{ci}\right) \tag{7}$$

Based on (2) and (7), an SMO can be obtained for the MMC

$$\frac{d\hat{i}_z}{dt} = \frac{1}{2l}\left(E_p + E_n - \sum_{i=1}^{8} S_i v_{ci}\right) + L\,\mathrm{sat}(i_z - \hat{i}_z) \tag{8}$$

It is noted that a saturation function $\mathrm{sat}(x)$ (9) is utilized instead of $\mathrm{sgn}(x)$ to reduce chattering of the observed states, according to [26]

$$\mathrm{sat}(x) = \begin{cases} 1, & x > h \\ x/h, & |x| \le h \\ -1, & x < -h \end{cases} \tag{9}$$

where $h$ is a constant. A simulation has been carried out in SIMULINK/PLECS to verify the SMO (8). The parameters of the MMC are listed in Table I, the observer gain $L$ is $6 \times 10^4$, and $h = 1$. Fig. 3 shows the simulation results, where it can be seen that $\hat{i}_z$ follows $i_z$ closely.
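One update of the circulating-current observer can be sketched as follows. This is only an illustration under stated assumptions: it takes the observer to be $d\hat{i}_z/dt = (E_p + E_n - \sum_i S_i v_{ci})/(2l) + L\,\mathrm{sat}(i_z - \hat{i}_z)$ and integrates it with forward Euler. The function names, the step size, and the numbers in the usage example are hypothetical and are not the Table I parameters.

```python
def sat(x, h=1.0):
    """Saturation function: linear inside [-h, h], clipped to +/-1 outside."""
    if x > h:
        return 1.0
    if x < -h:
        return -1.0
    return x / h


def smo_step(iz_hat, iz_meas, e_p, e_n, v_c, s, l, L, h, dt):
    """One forward-Euler step of the assumed circulating-current observer.

    v_c and s are sequences of cell capacitor voltages and switching states.
    """
    cell_sum = sum(si * vi for si, vi in zip(s, v_c))
    d_iz_hat = (e_p + e_n - cell_sum) / (2.0 * l) + L * sat(iz_meas - iz_hat, h)
    return iz_hat + dt * d_iz_hat


if __name__ == "__main__":
    # Hypothetical balanced operating point: inserted cell voltages equal the dc voltage.
    v_c = [50.0] * 8
    s = [1, 1, 0, 0, 1, 1, 0, 0]
    iz_hat = 0.0
    for _ in range(1000):
        iz_hat = smo_step(iz_hat, 5.0, 100.0, 100.0, v_c, s,
                          l=1e-3, L=6e4, h=1.0, dt=1e-6)
    print(iz_hat)  # approaches the measured value 5.0
```

Inside the band $|i_z - \hat{i}_z| \le h$ the injection is proportional to the error, so the estimate settles smoothly instead of chattering; that is the practical reason for preferring `sat` over `sgn` here.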
III. FAULT DETECTION AND ISOLATION USING SMO
A. Mathematical Basis
FD is considered first, and a fault is added to the first-order system

$$\dot{x} = a x + b u + k f \tag{10}$$

where $f$ represents the value of the fault and $k$ the corresponding coefficient. It is noted that $f$ is often a very large value and cannot be overcome by the feedback control.
The difference between the observed and measured states can be obtained by subtracting (2) from (10)

$$\dot{\tilde{x}} = a\tilde{x} + k f - L\,\mathrm{sgn}(\tilde{x}) \tag{11}$$

If we choose

$$L < |k f| - |a\tilde{x}| \tag{12}$$

then in the faulty condition $\tilde{x}\dot{\tilde{x}} > 0$, the observer cannot enter the sliding mode, and $\hat{x}$ will diverge from $x$ significantly. For an open-circuit fault at cell $i$ in the MMC, $f = v_{ci}/(2l)$, $k_i = 1$, and therefore, $L$ needs to satisfy the following condition to detect an open-circuit faulty switch:

$$L < \frac{v_{ci}}{2l} \tag{13}$$

The occurrence of a fault can be detected by comparing $|\hat{x} - x|$ with a given threshold value.
For the FI, an assumption-verification method was proposed [20], [21]. The procedure is to assume a location for the fault, modify the observer equation accordingly, and again compare the observed states with the measured states. $\hat{x}$ will converge to $x$ if the assumption is correct. In this case, $kf$ is included in the observer as well

$$\dot{\hat{x}} = a\hat{x} + b u + k f + L\,\mathrm{sgn}(x - \hat{x}) \tag{14}$$

Subtracting (14) from (10) yields the dynamical error

$$\dot{\tilde{x}} = a\tilde{x} - L\,\mathrm{sgn}(\tilde{x}) \tag{15}$$

which is the same as (4), where the sliding condition $\tilde{x}\dot{\tilde{x}} < 0$ is satisfied and $\hat{x} \to x$ in finite time. On the other hand, if the assumed fault location is incorrect, $\hat{x}$ will keep diverging from $x$. In this way, the fault can be located.
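The assumption-verification idea can be illustrated on the scalar system. The sketch below is a toy under explicit assumptions: each candidate fault location is represented by a distinct constant fault term `kf` (in the MMC the candidates instead differ in which switching state is modified), the observer starts with a nonzero error as it would after detection, and all numerical values, names, and the acceptance threshold are hypothetical.

```python
def sgn(v):
    """Signum function used in the observer injection term."""
    return (v > 0) - (v < 0)


def residual_after(assumed_kf, true_kf, a=-1.0, b=1.0, u=1.0, L=10.0,
                   e0=0.5, dt=1e-4, steps=10000):
    """Run the faulty plant and an observer built on an assumed fault term.

    Returns |x - xhat| at the end of the run (1 s with the default values).
    """
    x = 1.0
    xhat = x - e0                                  # nonzero error at the start of FI
    for _ in range(steps):
        err = x - xhat
        dx = a * x + b * u + true_kf               # plant with the real fault
        dxhat = a * xhat + b * u + assumed_kf + L * sgn(err)
        x += dt * dx
        xhat += dt * dxhat
    return abs(x - xhat)


def locate(candidates, true_kf, threshold=1.0):
    """Return the candidate names whose observer converges to the plant."""
    return [name for name, kf in candidates.items()
            if residual_after(kf, true_kf) < threshold]


if __name__ == "__main__":
    candidates = {"cell 1": 50.0, "cell 2": 80.0, "cell 3": 120.0}
    print(locate(candidates, true_kf=80.0))
```

Only the hypothesis whose fault term matches the plant restores the sliding condition; for the others the mismatch exceeds the observer gain and the residual stays large.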
B. Flowchart
The flowchart of this algorithm is shown in Fig. 4 . There are two modes in this algorithm: FD mode and FI mode.
FD mode: This mode monitors whether a fault occurs. If the difference between the observed and measured circulating current $|i_z - \hat{i}_z|$ is larger than a threshold value $I_{th1}$ and this condition persists for 0.4 ms, then an open-circuit fault is deemed to have occurred and the FDI scheme enters FI mode; otherwise, the FDI scheme stays in FD mode.
FI mode: This mode locates the open-circuit fault. The assumption-verification process is employed. Device $T_j$ of cell $i$ is assumed to be the faulty device, and the switching state $S_i$ in SMO (8) is modified according to Table II in [20]. If $T_j$ of cell $i$ is the actual faulty device, $\hat{i}_z$ converges to $i_z$; otherwise, $\hat{i}_z$ diverges from $i_z$. It is important to note that at some points during the faulty period, the current of the faulty arm can be clamped to zero because of the fault, and the converter is unobservable at these moments. Therefore, $\hat{i}_z$ is set to $\hat{i}_z = i_z$ when the current of the assumed faulty arm is 0, as shown in Fig. 4.
It is noted that the threshold values $I_{th1}$ and $I_{th2}$ are load dependent. In the case of a faulty power semiconductor device, $\hat{i}_z$ diverges from $i_z$ more slowly under light load than under heavy load. The divergence rate between $\hat{i}_z$ and $i_z$ is also related to the observer gain $L$ according to (11). There are many choices for $I_{th1}$ and $I_{th2}$; one possible choice is given in (16), where $L_o$ denotes the observer gain under full load, $I_z$ the circulating current, and $I_{zo}$ the circulating current under full load. As shown in (16), it is recommended that $L$, $I_{th1}$, and $I_{th2}$ be larger than certain values to reject the parameter uncertainties and measurement noise.
Simulations have been carried out to verify the proposed algorithm with the parameters listed in Table I. $L$ needs to satisfy $L < V_c/(2l) = 2.5 \times 10^5$ according to (12), and $L$ is set to $6 \times 10^4$ so that an open-circuit fault can be detected and located within 50 ms.
In Figs. 5-7, an open-circuit fault occurs at cell 1, $T_1$ at 0.1 s. In Fig. 5, no FDI scheme is applied and $\hat{i}_z$ diverges from $i_z$ at a very high rate after the occurrence of the fault. In Figs. 6 and 7, the FDI algorithm enters FI mode once $|i_z - \hat{i}_z| > I_{th1}$ persists for 0.4 ms. The FI mode is indicated with a gray background. In Fig. 6, the assumed faulty switch is the actual one and $\hat{i}_z$ converges to $i_z$ in FI mode; in Fig. 7, the assumed faulty switch is cell 2, $T_1$, which is not the actual faulty device; $\hat{i}_z$ diverges from $i_z$ in FI mode and $|\hat{i}_z - i_z| > I_{th2}$ within 50 ms.
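The two-mode logic described in this section can be sketched in software as follows. This is a simplified illustration, not the FPGA implementation: the class and function names, the sample period `dt`, and the way the persistence time is counted in samples are assumptions. Only the 0.4 ms persistence time and the roles of the thresholds $I_{th1}$ and $I_{th2}$ come from the text.

```python
class FaultDetector:
    """FD mode: enter FI mode when |i_z - i_z_hat| > I_th1 persists for persist_s."""

    def __init__(self, i_th1, persist_s=0.4e-3, dt=1e-5):
        self.i_th1 = i_th1
        # Persistence expressed as a sample count (dt is a hypothetical sample period).
        self.persist_samples = round(persist_s / dt)
        self.count = 0
        self.mode = "FD"

    def update(self, i_z, i_z_hat):
        """Process one sample and return the current mode ("FD" or "FI")."""
        if self.mode == "FD":
            if abs(i_z - i_z_hat) > self.i_th1:
                self.count += 1
                if self.count >= self.persist_samples:
                    self.mode = "FI"
            else:
                self.count = 0          # the condition must persist without interruption
        return self.mode


def assumption_rejected(residuals, i_th2):
    """FI mode: an assumed faulty device is rejected once |i_z_hat - i_z| exceeds I_th2."""
    return any(abs(r) > i_th2 for r in residuals)


if __name__ == "__main__":
    det = FaultDetector(i_th1=1.0)
    for k in range(45):
        mode = det.update(3.0, 0.0)     # residual of 3 A held above the threshold
    print(mode)
```

Resetting the counter whenever the residual drops below the threshold is one way to keep short transients from triggering FI mode; other debouncing schemes would serve the same purpose.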
IV. ROBUSTNESS ANALYSIS AND DISTURBANCE COMPENSATION
In any analytical FDI scheme, certain assumptions including accurate physical parameters, precise measurements, and linear time-invariant operation are made when modeling a plant [5] . However, these assumptions may not be accurate. The parameters may contain uncertainties, for example, the parasitic resistance of an inductor, and may degrade over time. Measurements usually have errors superimposed on the signals. These errors can include electronic white noise and incorrect scaling factors between the measured and actual variable. Furthermore, all dynamical plants are nonlinear, but behave almost linearly. These uncertainties and disturbances may lead to divergence between the actual system behavior and its estimated behavior, giving false alarms. The robustness of an FDI scheme is the degree to which the system can maximize the sensitivity of the detection of actual malfunctions while discriminating between apparent faults and disturbances due to measurement noise, parameter uncertainty, or transients [5] .
The desirable features of this FDI method are as follows: 1) white noise in the measurement does not affect the observed states, so it does not affect the FDI; 2) the value of the parameter uncertainties, scaling errors in the measurements, and other bounded disturbances is estimated using the observer injection term; this estimated value is used to compensate for the uncertainties and disturbances. In summary, the proposed method is able to detect and locate an open-circuit fault of a power semiconductor device while ignoring parameter uncertainties, measurement noise, or other bounded disturbances. These features are discussed in this section.
A. Mathematical Basis
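The first-order system (1) and its SMO (2) are considered to demonstrate the features described above. By adding the uncertainties and disturbances to (2), we obtain

$$\dot{\hat{x}} = (a + \Delta a)\hat{x} + (b + \Delta b)(u + \Delta u) + L\,\mathrm{sgn}(x - \hat{x}) \tag{17}$$

where $\Delta a$ and $\Delta b$ denote the values of parameter uncertainties, and $\Delta u$ the value of the measurement noise consisting of white noise $\Delta r$ and a scaling error between the measured and actual variable $\Delta s$. It is assumed that the values of these uncertainties and disturbances are bounded and are smaller than the value of a fault. Subtracting (17) from (1), we obtain the error between the measured and observed states

$$\dot{\tilde{x}} = a\tilde{x} + D - L\,\mathrm{sgn}(\tilde{x}) \tag{18}$$

where $D = -\Delta a\,\hat{x} - \Delta b\,u - (b + \Delta b)\Delta u$ lumps together the uncertainties and disturbances. If we choose $L$ satisfying

$$L > |a\tilde{x}| + |D| \tag{19}$$

then $\tilde{x}\dot{\tilde{x}} \le |\tilde{x}|\left(|a\tilde{x}| + |D| - L\right) < 0$; the sliding mode in (18) occurs and $\tilde{x} \to 0$ (namely $\hat{x} \to x$) in finite time. $\hat{x}$ is not affected by the uncertainties or the disturbances. Based on (12) and (19), the observer gain needs to satisfy the following condition to discriminate an open-circuit fault from uncertainties and disturbances:

$$|a\tilde{x}| + |D| < L < |k f| - |a\tilde{x}| \tag{20}$$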
Two simulations have been carried out to verify the above analysis. In these simulations the parameter uncertainties and measurement noise are added to the observer; all other conditions are the same as for Figs. 6 and 7. An open-circuit fault in cell 1, $T_1$ occurred at 0.1 s and in FI mode the assumed faulty switch is the actual one. In the first simulation (see Fig. 8), 5% white noise is added to all the measurements as shown in (21). In the second simulation (see Fig. 9), parameter uncertainties and scaling errors in the measurements are added as shown in (22),
where the subscript mes denotes measured variables; $r_1$, $r_2$, and $r_3$ are random numbers ranging from −1 to 1 that change at every calculation cycle; $\hat{l}$ denotes the inductance used in the observer; and $R_l$ denotes the parasitic resistance of the arm inductors.
In the fault-free condition, it can be seen in Figs. 8 and 9 that $\hat{i}_z$ converges to $i_z$ and is not affected by the uncertainties and disturbances. It can also be seen in Fig. 8 that white noise in the measurements does not affect the FI, which is indicated with a gray background. Since the average value of the white noise is zero, its effect on the observer is self-canceling, and therefore, the observer and FDI scheme are not affected. However, parameter uncertainties and scaling errors in the measurements will lead to incorrect FI. As shown in Fig. 9, there is a noticeable difference between $\hat{i}_z$ and $i_z$. A larger observer gain and larger threshold values can be used to alleviate the incorrect FI, but more time will be needed to detect and locate a fault.
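The admissible range of the observer gain can be checked with a few lines of arithmetic. The sketch below assumes the gain condition takes the form $|a\tilde{x}| + |D| < L < |kf| - |a\tilde{x}|$ and, for the MMC, treats the circulating-current model as having no state-dependent term ($a = 0$); both are assumptions of this sketch. The numbers in the usage example are the ones quoted in this paper: the fault term $V_c/(2l) = 2.5 \times 10^5$, the gain $L = 6 \times 10^4$, and a disturbance of about 20 000 A/s as estimated in Section IV-B. The function names are hypothetical.

```python
def gain_window(fault_rate, disturbance_bound, a=0.0, err_bound=0.0):
    """Return (lower, upper) bounds on the observer gain L.

    lower: L must dominate |a * error| + |D| so that disturbances are rejected.
    upper: L must stay below |k f| - |a * error| so that a fault breaks the sliding mode.
    """
    lower = abs(a) * err_bound + abs(disturbance_bound)
    upper = abs(fault_rate) - abs(a) * err_bound
    return lower, upper


def gain_is_valid(L, fault_rate, disturbance_bound, a=0.0, err_bound=0.0):
    """True if L lies strictly inside the admissible window."""
    lower, upper = gain_window(fault_rate, disturbance_bound, a, err_bound)
    return lower < L < upper


if __name__ == "__main__":
    print(gain_window(2.5e5, 2.0e4))
    print(gain_is_valid(6e4, 2.5e5, 2.0e4))
```

A gain close to the lower bound leaves little margin against disturbances, while a gain close to the upper bound slows the divergence after a fault, which matches the trade-off between robustness and detection time noted above.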
B. Compensation of Uncertainties and Disturbances
In this section, the value of parameter uncertainties, scaling errors in the measurements, and other bounded disturbances are estimated, and this estimated value is used to compensate the observer to achieve robust FDI.
Once (18) enters the sliding mode, $\tilde{x} \to 0$ and $\dot{\tilde{x}} \to 0$, and it can be obtained that

$$D - L\,\mathrm{sgn}(\tilde{x}) = 0 \tag{23}$$

When the MMC is fault free (0 to 0.1 s in Figs. 8 and 9), the uncertainties and disturbances $D$ are counterbalanced by the observer injection term $-L\,\mathrm{sgn}(\tilde{x})$ according to (23). Therefore, the value of $D$ can be extracted from the injection term. Since the injection term is a high-frequency switching term, a low-pass filter is applied to obtain the estimated value of $D$

$$\hat{D} = \frac{1}{\tau s + 1}\,L\,\mathrm{sgn}(\tilde{x}) \tag{24}$$

where $\hat{D}$ denotes the estimated value of the uncertainties and disturbances, and $\tau$ denotes the time constant of the low-pass filter. A simulation has been undertaken with the white noise (21), scaling errors, and parameter uncertainties (22), and the simulation results are shown in Fig. 10. The value of $\hat{D}$ is about 20 000 A/s and is caused by the parameter uncertainties and scaling errors in the measurements (the effect of the white noise is self-canceling). Because of the uncertainties and disturbances, the observer injection term $L\,\mathrm{sgn}(\tilde{i}_z)$ operates at a biased condition with an offset of 20 000 A/s; as a result, the observer becomes sensitive to noise and incorrect FI is caused. In order to achieve robust FDI, $\hat{D}$ is added to the SMO to compensate for the uncertainties and disturbances

$$\frac{d\hat{i}_z}{dt} = \frac{1}{2l}\left(E_p + E_n - \sum_{i=1}^{8} S_i v_{ci}\right) + \hat{D} + L\,\mathrm{sat}(i_z - \hat{i}_z) \tag{25}$$

It is noted that $\hat{D}$ only updates when the system is fault free.
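The estimation and compensation steps can be tried on the scalar system. The following is a minimal sketch under stated assumptions: the only uncertainty is an error `db` in the input coefficient, so that the lumped disturbance seen by the error dynamics is $D = -\Delta b\,u$; the estimate is obtained by passing the injection term $L\,\mathrm{sgn}(x - \hat{x})$ through a discrete first-order low-pass filter; and the estimate is held constant when it is fed back. All numerical values and names are hypothetical.

```python
def sgn(v):
    """Signum function used in the observer injection term."""
    return (v > 0) - (v < 0)


def filtered_injection(d_comp=0.0, a=-1.0, b=1.0, u=1.0, db=0.2, L=5.0,
                       tau=0.1, dt=1e-4, steps=20000):
    """Simulate a fault-free plant and an observer with a wrong input coefficient.

    d_comp is a fixed compensation term added to the observer.
    Returns the low-pass-filtered injection term at the end of the run.
    """
    x = xhat = 1.0
    d_hat = 0.0
    for _ in range(steps):
        inj = L * sgn(x - xhat)
        dx = a * x + b * u
        dxhat = a * xhat + (b + db) * u + d_comp + inj   # observer uses b + db
        x += dt * dx
        xhat += dt * dxhat
        d_hat += (dt / tau) * (inj - d_hat)              # first-order low-pass filter
    return d_hat


if __name__ == "__main__":
    d_est = filtered_injection()               # uncompensated: estimate of D = -db * u
    residual = filtered_injection(d_comp=d_est)
    print("estimated disturbance:", d_est)
    print("injection bias after compensation:", residual)
```

Without compensation the filtered injection term settles near $-\Delta b\,u$; once that estimate is added to the observer, the injection term averages close to zero, so it no longer operates at a biased condition.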
V. EXPERIMENTAL VALIDATION OF THE FDI METHOD
An MMC experimental rig has been built to validate the FDI method. The method is implemented in an FPGA using fixed-point arithmetic. The implementation procedures and experimental results are presented in this section.
A. Experimental Rig
The diagram and a photograph of the laboratory setup are shown in Figs. 13 and 14 , respectively. The assembled power module with gate driver and heat sink is shown in Fig. 15 . The power module is soldered to a module interface board and attached to a heat sink. The cell capacitances are selected such that the ripple of the capacitor voltages is less than 10% [27] and arm inductances are chosen such that the switching harmonic is less than 60% of the nominal circulating current. The parameters of the experimental rig are listed in Table III .
The control scheme of the MMC experimental rig is shown in Fig. 16. The subscripts p and n denote the upper and lower arms, respectively.
B. FPGA Implementation of the SMO
The SMO is implemented in the FPGA to obtain the quasi-analog behavior of the observed states. The observer is implemented using fixed-point arithmetic, as there is no floating-point unit in the A3P1000 FPGA. The implementation includes three steps.
Step 1: Convert the analog observer into discrete form. Using
Step 2 
where $m_I$, $m_V$, and $m_E$ are the scaling factors. Substituting (27) into (26), we obtain
Step 3: Convert the parameters from floating point to fixed point and implement the observer in the FPGA using Verilog. The observer equations are broken down into three parts as shown in (28). The block diagram of the FPGA program is illustrated in Fig. 18. The subtraction is performed by adding the complement of the subtrahend, and the multiplication is carried out by shifting.
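The effect of the fixed-point conversion can be explored in software before writing any hardware description. The sketch below is not the Verilog implementation described above; it is a Python model under several assumptions: the observer is discretized with forward Euler, all quantities share one hypothetical Q16 format instead of the separate scaling factors $m_I$, $m_V$, and $m_E$, and the two coefficients (the step size divided by $2l$, and the step size times $L$) are taken to be powers of two so that each multiplication reduces to a shift. The shift amounts `sh_v` and `sh_l` are illustrative; $h = 0.25$ matches the experimental value and is realized as a left shift by two bits.

```python
FRAC_BITS = 16                 # hypothetical Q16 format: real value = integer / 2**16
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a real number to its Q16 integer representation."""
    return int(round(x * ONE))


def to_float(q):
    """Convert a Q16 integer back to a real number."""
    return q / ONE


def sat_fixed(err_q, h_shift=2):
    """sat() with h = 2**-h_shift: divide by h with a left shift, then clip to +/-1."""
    scaled = err_q << h_shift
    return max(-ONE, min(ONE, scaled))


def smo_step_fixed(i_hat_q, i_meas_q, v_sum_q, sh_v=10, sh_l=3):
    """Integer observer update; v_sum_q stands for E_p + E_n - sum(S_i * v_ci).

    Both coefficient multiplications are arithmetic right shifts.
    """
    inj_q = sat_fixed(i_meas_q - i_hat_q) >> sh_l
    return i_hat_q + (v_sum_q >> sh_v) + inj_q


def smo_step_float(i_hat, i_meas, v_sum, sh_v=10, sh_l=3, h=0.25):
    """Floating-point reference using the same power-of-two coefficients."""
    s = max(-1.0, min(1.0, (i_meas - i_hat) / h))
    return i_hat + v_sum / 2 ** sh_v + s / 2 ** sh_l


if __name__ == "__main__":
    q = smo_step_fixed(to_fixed(1.0), to_fixed(1.1), to_fixed(3.5))
    f = smo_step_float(1.0, 1.1, 3.5)
    print(to_float(q), f)
```

Comparing the integer update against the floating-point reference gives a quick estimate of the quantization error for a chosen word length, which can guide the choice of scaling factors before the design is committed to the FPGA.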
C. Experimental Results
In the experimental tests, to create the open-circuit fault condition on a power semiconductor device, the gate drive signal of the device is set to low. The experimental results are taken using a C6713 host-port interface daughtercard and the waveforms are shown in Figs 4 is chosen for $L$ and $h = 0.25$. In these experimental tests, parameter uncertainties and measurement noise are considered: 10% error in the inductance $l$, 0.11-Ω parasitic resistance in the arm inductors, and 5% scaling error in the measurement of $e_p$. A low-pass filter with a time constant of 0.1 s is used to filter the switching frequency of $-L\,\mathrm{sgn}(\tilde{x})$ as shown in (24). This filter is implemented in the DSP. The estimated value of the uncertainties and disturbances is about −2400 A/s, as shown in Fig. 19. This estimated value is put into the observer to compensate for the uncertainties and disturbances. In the experimental results in Figs. 20-26, this compensation has been added. (16). $I_{th2} = 5.2$ A is indicated using a black dashed line. In Fig. 21, the assumed faulty switch is the actual one (cell 6, $T_2$), and $\hat{i}_z$ converges to $i_z$; in Fig. 22, the assumed faulty switch is cell 7, $T_2$, and $\hat{i}_z$ diverges from $i_z$.
In Figs 
D. Discussion on the Detection Time
The choice of threshold value in an FD system such as the one we have described is always a compromise between the time for detection and the certainty of a correct detection. In the simulation and experimental results above, we have used a very conservative value for the threshold which yields a detection time of 50 ms. During this time, the capacitor voltage of the faulty cell in the 24-MW MMC rises to approximately 2300 V according to Fig. 1. While this is unlikely to be an issue for the semiconductors (usually rated at 3.3 kV), it might be unacceptable in terms of the headroom on capacitor voltage rating. In addition, careful coordination would be required with any local overvoltage protection. The detection time can be reduced by selecting a less conservative threshold as indicated in the results of Fig. 26 for the experimental rig, where an open-circuit fault occurs at cell 5, $T_1$ at 0.05 s and is automatically detected and removed once located. Here, we have selected a threshold of $I_{th2} = 2$ A (indicated in Figs. 21-24), which still gives good certainty of FD and yields a detection time of 20 ms, reducing the impact on the capacitor voltages considerably. Clearly, the exact situation in a practical converter will differ from that in our laboratory prototype and selection of an appropriate threshold will be an important consideration.
VI. CONCLUSION
This paper has presented an SMO-based FDI technique applied to an MMC. The technique can detect and locate an open-circuit fault of a power semiconductor device or a gate driver failure in less than 50 ms. This method is simple, with only one SMO equation, and requires no additional transducers or circuits.
However, this method is not suitable for the detection and isolation of a device with a short-circuit fault because of the very fast response required (10 μs). It is suggested that the proposed method work together with hardware detection methods (for short-circuit faults) to achieve a more reliable MMC.
To improve the robustness of the FDI method, a technique is proposed to estimate parameter uncertainties, measurement errors, and other bounded disturbances, and the estimated value is used to compensate for the influence of the uncertainties and disturbances. As a result, the proposed technique can detect and locate an open-circuit faulty power semiconductor device while ignoring the parameter uncertainties, measurement noise, or other disturbances.
The FDI algorithm has been implemented in an FPGA using fixed-point arithmetic and has been tested on an experimental scaled-down, single-phase, eight-cell MMC. Experimental results have verified the analysis and simulation results. According to the experimental results, it is possible to use a smaller threshold value to detect and locate an open-circuit fault in less than 20 ms.
This FDI method can be applied to other converters with modular topologies employing similar analysis and principles. Furthermore, it is possible to apply this method for the detection and isolation of multiple open-circuit faults in an MMC, although it will take longer to find the faults as there are many possible fault scenarios to be assumed.
