Common approaches for direct model predictive control (MPC) for current reference tracking in power electronics suffer from the high computational complexity encountered when solving integer optimal control problems over long prediction horizons. We propose an efficient alternative method based on approximate dynamic programming, greatly reducing the computational burden and enabling sampling times below 25 µs. Our approach is based on the offline estimation of an infinite horizon value function which is then utilized as the tail cost of an MPC problem. This allows us to reduce the controller horizon to a very small number of stages while simultaneously improving the overall controller performance. Our proposed algorithm was implemented on a small size FPGA and validated on a variable speed drive system with a three-level voltage source converter. Time measurements showed that our algorithm requires only 5.76 µs for horizon N = 1 and 17.27 µs for N = 2, in both cases outperforming state of the art approaches with much longer horizons in terms of currents distortion and switching frequency. To the authors' knowledge, this is the first time direct MPC for current control has been implemented on an FPGA solving the integer optimization problem in real-time and achieving comparable performance to formulations with long prediction horizons.
I. INTRODUCTION
A MONG the control strategies adopted in power electronics, model predictive control (MPC) [1] has recently gained popularity due to its various advantages [2] . MPC has been shown to outperform traditional control methods mainly because of its ease in handling time-domain constraint specifications that can be imposed by formulating the control problem as a constrained optimization problem. Due to its structure, MPC can be applied to a variety of power electronics topologies and operating conditions providing a higher degree of flexibility than traditional approaches.
With the recent advances in convex optimization techniques [3] , it has been possible to apply MPC to very fast constrained linear systems with continuous inputs by solving convex quadratic optimization problems within microseconds [4] . However, when dealing with nonlinear systems or with integer inputs, the optimal control problems are no longer convex and it is harder to find optimal solutions. Sequential quadratic programming (SQP) [3] has gained popularity because of its ease in iteratively approximating nonconvex continuous control problems as convex quadratic programs, which can be solved efficiently. Moreover, integrated perturbation analysis (IPA) has been recently combined with SQP methods (IPA-SQP) [5] and applied to power electronics [6] by solving the approximated quadratic optimization problem at a given time instant using a perturbed version of the problem at the previous time instant, thereby reducing the number of iterations required at each time step. However, there are still two orders of magnitude difference in achievable computation time compared to results obtained for linear systems [4] and further advances are required to apply these methods to very fast dynamical systems.
In power electronics, many conventional control strategies applied in industry are based on proportionalplus-integral (PI) controllers providing continuous input signals to a modulator that manages conversion to discrete switch positions. Direct MPC [7] instead combines the current control and the modulation problem into a single computational problem, providing a powerful alternative to conventional PI controllers. With direct MPC, the manipulated variables are the switch positions, which lie in a discrete and finite set, giving rise to a switched system. Therefore, this approach does not require a modulator and is often referred to as finite control set MPC.
Since the manipulated variables are restricted to be integer-valued, the optimization problem underlying direct MPC is provably N P-hard [8] . In power electronics these optimization problems are often solved by complete enumeration of all the possible solutions, which grow exponentially with the prediction horizon [9] . Since long horizons are required to ensure stability and good closed-loop performance [10] , direct MPC quickly becomes intractable for real-time applications. As a consequence, in cases when reference tracking of the converter currents is considered, the controller horizon is often restricted to one [2] . Recently, the IPA-SQP method has been applied to a finite control set MPC [11] to efficiently approximate the optimization problem in the case of nonlinear systems. However, no particular attention is paid to reducing the number of integer combinations that must be evaluated, which is at the source of the most significant computational issues. Despite attempts to overcome the computational burden of integer programs in power electronics [12] , the problem of implementing these algorithms on embedded systems remains open.
A recent technique introduced in [13] and benchmarked in [14] reduces the computational burden of direct MPC when increasing the prediction horizon. In that work the optimization problem was formulated as an integer least-squares (ILS) problem and solved using a tailored branch-and-bound algorithm, described as sphere decoding [15] , generating the optimal switching sequence at each time step. Although this approach appears promising relative to previous work, the computation time required to perform the sphere decoding algorithm for long horizons (i.e. N = 10), is still far slower than the sampling time typically required, i.e. T s = 25 µs. In the very recent literature, some approaches have been studied to improve the computational efficiency of the sphere decoding algorithm. In particular, in [16] a method based on a lattice reduction algorithm decreased the average computational burden of the sphere decoding. However, the worst case complexity of this new reformulation is still exponential in the horizon length. In [17] , heuristic search strategies for the sphere decoding algorithm are studied at the expense of suboptimal control performance. Even though a floating point complexity analysis of the algorithms is presented in these works, no execution times and no details about fixedpoint implementation are provided. Furthermore, there currently exists no embedded implementation of a direct MPC algorithm for current control achieving comparable performance to formulations with long prediction horizons. This paper introduces a different method to deal with the direct MPC problem. In contrast to common formulations [18] where the switching frequency is controlled indirectly via penalization of the input switches over the controller horizon, in this work the system dynamics are augmented to directly estimate the switching frequency. Our approach allows the designer to set the desired switching frequency a priori by penalizing its deviations from this estimate. Thus, the cost function tuning can be performed more easily than with the approach in [14] and [13] , where a tuning parameter spans the whole frequency range with no intuitive connection to the desired frequency. To address the computational issues of long prediction horizons, we formulate the tracking problem as a regulation one by augmenting the state dynamics and cast it in the framework of approximate dynamic programming (ADP) [19] . The infinite horizon value function is approximated using the approach in [20] and [21] by solving a semidefinite program (SDP) [22] offline. This enables us to shorten the controller horizon by applying the estimated tail cost to the last stage to maintain good control performance. In [23] the authors applied a similar approach to stochastic systems with continuous inputs, denoting the control law as the "iterated greedy policy".
As a case study, our proposed approach is applied to a variable-speed drive system consisting of a three-level neutral point clamped voltage source inverter connected to a medium-voltage induction machine. The plant is modelled as a linear system with a switched three-phase input with equal switching steps for all phases.
Closed loop simulations in MATLAB in steady state operation showed that with our method even very short prediction horizons give better performance than the approach in [14] and [13] with much longer planning horizons.
We have implemented our algorithm on a small size Xilinx Zynq FPGA (xc7z020) in fixed-point arithmetic and verified its performance with hardware in the loop (HIL) tests of both steady-state and transient performance. The results achieve almost identical performance to closed-loop simulations and very fast computation times, allowing us to comfortably run our controller within the 25 µs sampling time.
The remainder of the paper is organized as follows. In Section II we describe the drive system case study and derive the physical model. In Section III the direct MPC problem is derived by augmenting the state dynamics and approximating the infinite horizon tail cost using ADP. Section IV describes all of the physical parameters of the model used to verify our approach. In Section V we present closedloop simulation results on the derived model in steady state operation to characterize the achievable performance of our method. In Section VI we describe the hardware setup, the algorithm and all the FPGA implementation details. In Section VII HIL tests are performed in steady-state and transient operation. Finally, we provide conclusions in Section VIII.
In this work we use normalized quantities by adopting the per unit system (pu). The time scale t is also normalized using the base angular velocity ω b that in this case is 2π50 rad /s, i.e. one time unit in the per unit system corresponds to 1/ω b s. Variables in the three-phase system (abc) are denoted ξ abc = ξ a ξ b ξ c . It is common practice to transform phase variables to ξ αβ in the stationary orthogonal αβ coordinates by ξ αβ = P ξ abc . The inverse operation can be performed as ξ abc = P † ξ αβ . The matrices P and P † are the Clarke transform and its pseudoinverse respectively, i.e.
II. DRIVE SYSTEM CASE STUDY
In this work we consider a variable-speed drive system as shown in Figure 1 , consisting of a threelevel neutral point clamped (NPC) voltage source inverter driving a medium-voltage (MV) induction machine. The total dc-link voltage V dc is assumed constant and the neutral point potential N fixed.
In most modern approaches to control variable-speed drive systems, the control is split into two cascaded loops. The outer loop controls the machine speed by manipulating the torque reference. The inner loop controls the torque and the fluxes by manipulating the voltages applied to the stator windings of the machine. Our approach focuses on the inner loop. The reference torque is converted into stator currents references that must be tracked and the controller manipulates the stator voltages by applying different inverter switch positions.
al one. Enumeration is sometimes this is a misconception since enuo MPC problems featuring a limited ces. Exhaustive enumeration is not housands of sequences, which arise ith prediction horizons of four or ions made previously, this paper exhorizons longer than one for direct g. To address computational issues, etrical structure of the underlying resents an efficient optimization alelements of sphere decoding [35] to quences, requiring only little comnables the use of long prediction s applications. nal approach is derived for a linear e-phase input with equal switching ally, the present work focuses on a , consisting of a three-level neutral e inverter driving a medium-voltage lts in the analysis part [34] show that rger than one does, in fact, provide efits. In particular, at steady-state ions and/or the switching frequency y with respect to direct MPC with some cases, a steady-state perforis similar to the one of optimized tion of this paper and its analysis tantiating the following statements. with reference tracking and long olved in a computationally efficient oding and tailoring it to the problem ons provide at steady state a better on one case. Third, long horizons pact on the transient performance. e can be further reduced by using The latter gives suboptimal results, when the switching effort is very r is organized as follows. Section II ase study used throughout the two e model predictive current control h can be cast as an integer QP, as pting the notion of sphere decoding, solved efficiently, as described in ions are provided in Section VI. use normalized quantities and adopt xtending this to the time scale t, to 1/ω b s, where ω b is the base lly, we use ξ(t), t ∈ R, to denote nd ξ(k), k ∈ N, to denote discretempling interval T s . All variables hree-phase system (abc) are transin the stationary orthogonal αβ co- ordinates through ξ αβ = P ξ abc , where
II. DRIVE SYSTEM CASE STUDY While the ideas of this study can be applied to general ac-dc, dc-dc, dc-ac, and ac-ac topologies with linear loads, including active front ends, inverters with RL loads and inverters with ac machines, we focus our exposition on the setup described in the sequel.
A. Physical Model of the Inverter
As an illustrative example of a medium-voltage power electronic system, consider a variable speed drive consisting of a three-level neutral point clamped (NPC) voltage source inverter (VSI) driving an induction machine (IM), as depicted in Fig. 1 . The total dc-link voltage V dc is assumed to be constant and the neutral point potential N is fixed.
Let the integer variables u a , u b , u c ∈ U denote the switch positions in the three-phase legs, where for a three-level inverter the constraint set is given by
In each phase, the values −1, 0, 1 correspond to the phase voltages −
2 , respectively. Thus, the voltage applied to the machine terminals in orthogonal coordinates is
The voltage vectors are shown in Fig. 2 .
B. Physical Model of the Machine
The state-space model of a squirrel-cage induction machine in the stationary αβ reference frame is summarized hereafter. For the current control problem at hand, it is convenient to choose the stator currents i sα and i sβ as state variables. The 
A. Physical Model of the Inverter
The switch positions in the three phase legs can be described by the integer input variables u a , u b , u c ∈ {−1, 0, 1}, leading to phase voltages {− 
where
B. Physical Model of the Machine
Hereafter we derive the state-space model of the squirrel-cage induction machine in the αβ plane.
Since we are considering a current control problem, it is convenient to use the stator current i s,αβ and the rotor flux ψ r,αβ as state variables. The model input is the stator voltage v s,αβ which is equal to the inverter output voltage in (1) . The model parameters are: the stator and rotor resistances R s and R r ; the mutual, stator and rotor reactances X m , X ls and X lr , respectively; the inertia J; and the mechanical load torque T l . Given these quantities, the continuous-time state equations [24] , [25] are
where D := X s X r − X 2 m with X s := X ls + X m and X r := X lr + X m , and I represents the 2 × 2 identity matrix. To simplify the notation, we have dropped the subscripts αβ from all vectors in (2) . Moreover, τ s := X r D/ R s X 2 r + R r X 2 m and τ r := X r /R r are the transient stator and the rotor time constants respectively. The electromagnetic torque is given by
The rotor speed ω r is assumed to be constant within the prediction horizon. For prediction horizons in the order of a few milliseconds this is a mild assumption.
C. Complete Model of the Physical System
Given the models of the drive and of the induction motor in (1) and (2) respectively, the state-space model in the continuous time domain can be described as
where the state vector x ph = i s,α i s,β ψ r,α ψ r,β includes the stator current and rotor flux in the αβ reference frame. The output vector is taken as the stator current, i.e. y ph = i s,αβ . The matrices D, E and F are defined in Appendix A.
The state-space model of the drive can be converted into the discrete-time domain using exact Euler discretization. By integrating (4a) from t = kT s to t = (k + 1)T s and keeping u sw (t) constant during each interval and equal to u sw (k), the discrete-time model becomes
with matrices A ph := e DTs , B ph :
I is an identity matrix of appropriate dimensions. Although the sampling time is T s = 25 µs, we use the discretization intervalT s = T s ω b for consistency with our per unit system, where ω b is the base frequency.
III. MODEL PREDICTIVE CURRENT CONTROL

A. Problem Description
Our control scheme must address two conflicting objectives simultaneously. On the one hand, the distortion of the stator currents i s cause iron and copper losses in the machine leading to thermal losses. Because of the limited cooling capability of the electrical machine, the stator current distortions have to be kept as low as possible. On the other hand, high frequency switching of the inputs u sw produces high power losses and stress on the semiconductor devices. Owing to the limited cooling capability in the inverter, we therefore should minimize the switching frequency of the integer inputs.
Note that the effect of the inverter switchings on the torque ripples can be improved during the machine design. In particular, increasing the time constants of the stator and the rotor τ s and τ r can reduce the amplitude of the torque ripples by decreasing the derivative of the currents i s and fluxes ϕ r . This is achieved naturally when dealing with machines with higher power. Thanks to the flexibility of model based controller designs such as MPC, different machine dynamics influencing the torque ripples are automatically taken into account by the controller, which adapts the optimal inputs computation depending on the plant parameters. Thus, any improvements during the machine design can be optimally exploited by adapting the internal model dynamics in the controller. Another similar approach is to include LCL filters between the inverter and the motor to decrease the high frequency components of the currents; see [26] . These approaches allow operation at lower switching frequencies with low THD at the same time. However, it is sometimes impossible to change the machine's physical configuration and it is necessary to operate the inverter at high switching frequencies to satisfy high performance requirements in terms of stator currents distortion. For all these reasons there is an unavoidable tradeoff between these two criteria.
The controller sampling time plays an important role in the distortion and switching frequency tradeoff. Depending on the precision required in defining the inverter switching times, the controller is discretized with higher (e.g. 125 µs) or lower (e.g. 25 µs) sampling times. Higher sampling times define a more coarse discretization grid leading to less precise definition of the switching instants, but more available time to perform the computations during the closed-loop cycles. Lower sampling times, on the other hand, lead to improved controller accuracy while reducing the allowed computing time. However, for the same switching frequency, longer sampling times produce higher distortions. Ideally, the sampling time should be chosen as low as possible to have the highest possible accuracy.
In contrast to the common approaches in direct MPC where the switching frequency is minimized, in this work we penalize its difference from the desired frequency which is denoted by f * sw . This is motivated by the fact that inverters are usually designed to operate at a specific nominal switching frequency.
The current distortion is measured via the total harmonic distortion (THD). Given an infinitely long time-domain current signal i and its fundamental component i * of constant magnitude, the THD is proportional to the root mean square (RMS) value of the difference i − i * . Hence, we can write for one phase current
with M ∈ N. For the three-phase current i abc and its reference i * abc the THD is proportional to the mean value of (6) over the phases. It is of course not possible to calculate the THD in real time within our controller computations because of finite storage constraints.
The switching frequency of the inverter can be identified by computing the average frequency of each active semiconductor device. As displayed in Figure 1 , the total number of switches in all three phases is 12, and for each switching transition by one step up or down in a phase one semiconductor device is turned on. Thus, the number of on transitions occurring between time step k − 1 and k is given by the 1-norm of the difference of the inputs vectors:
Given a time interval centered at the current time step k from k − M to k + M , it is possible to estimate the switching frequency by counting the number of on transitions over the time interval and dividing the sum by the interval's length 2M T s . We then can average over all the semiconductor switches by dividing the computed fraction by 12. At time k, the switching frequency estimate can be written as
which corresponds to a non-causal finite impulse response (FIR) filter of order 2M . The true average switching frequency is the limit of this quantity as the window length goes to infinity
and does not depend on time k.
The f sw computation brings similar issues as the THD. In addition to finite storage constraints, the part of the sum regarding the future signals produces a non-causal filter that is impossible to implement in a real-time control scheme.
These issues in computing THD and f sw are addressed in the following two sections via augmentation of our state space model to include suitable approximation schemes for both quantities.
B. Total Harmonic Distortion
According to (6) , the THD in the three-phase current is proportional to the mean value of
As shown in [13] , the THD is also proportional to the stator current ripple in the αβ coordinate system, i.e.
T HD ∼ lim
where we have introduced the error signal e i (k) :
It is straightforward to show [27] that the stator current reference during steady-state operation at rated frequency is given by
Hence, in order to minimize the THD, we minimize the squared 2-norm of vector e i over all future time steps. We also introduce a discount factor γ ∈ (0, 1) to normalize the summation preventing it from going to infinity due to persistent tracking errors. The cost function related to THD minimization is therefore
In order to construct a regulation problem, we include the oscillating currents from (10) as two additional uncontrollable states x osc = i * s,αβ within our model of the system dynamics. The ripple signal e i (k) is then modeled as an output defined by the difference between two pairs of system states.
C. Switching Frequency
To overcome the difficulty of dealing with the filter in (7), we consider only the past input sequence, with negative time shift giving a causal FIR filter estimating f sw . This filter is approximated with an infinite impulse response (IIR) one whose dynamics can be modeled as a linear time invariant (LTI) system. Note that future input sequences in (7) are taken into account inside the controller prediction.
Let us define three binary phase inputs denoting whether each phase switching position changed at time k or not, i.e.
It is straightforward to show that the following second order IIR filter will approximate the one-sided version of the FIR filter in (7) [28] :
wheref sw (k) is the estimated switching frequency and x f lt (k) is the filter state. The two poles at a 1 = 1 − 1/r 1 and a 2 = 1 − 1/r 2 with r 1 , r 2 >> 0 can be tuned to shape the behavior of the filter. Increasing a 1 , a 2 make the estimate smoother, while decreasing a 1 , a 2 gives a faster estimation with more noisy values.
We denote the difference between the approximationf sw (k) and the target frequency f * sw by e sw (k) := f sw (k) − f * (k). Therefore, the quantity to be minimized in order to bring the switching frequency estimate as close to the target as possible is
where δ ∈ R + is a design parameter included to reflect the importance of this part of the cost relative to the THD component.
Finally, we can augment the state space to include the filter dynamics and the target frequency by adding the states x f lt f * sw so that the control inputs try to drive the difference between two states to zero. Since the physical states are expressed in the per unit (pu) system with values around 1, in order to have these augmented states within the same order of magnitude we will normalize them by the desired frequency f * sw defining x sw = (1/f * sw )x f lt 1 and the matrices A sw = blkdiag(A sw , 1),
D. MPC Problem Formulation
Let us define the complete augmented state as
with x(k) ∈ R 9 ×{−1, 0, 1} 3 and total state dimension n x = 12. The vector x ph represents the physical system from Section II-C, x osc defines the oscillating states of the sinusoids to track introduced in Section III-B, u sw (k − 1) are additional states used to keep track of the physical switch positions at the previous time step, and x sw are the states related to the switching filter from Section III-C.
The system inputs are defined as
where u sw are the physical switch positions and p are the three binary inputs entering in the frequency filter from Section III-C. The input dimension is n u = 6. To simplify the notation let us define the matrices G and T to obtain u sw (k) and p(k) from u(k) respectively, i.e. such that u sw (k) = Gu(k) and p(k) = T u(k). Similarly, to obtain u sw (k − 1) from x(k) we define a matrix W so that u sw (k − 1) = W x(k).
The MPC problem with horizon N ∈ N can be written as
where the stage cost is defined combining the THD and the switching frequency penalties in (11) and (16) respectively as N ) ) is an approximation of the infinite horizon tail that we will compute in the next section using approximate dynamic programming (ADP). The matrices A, B and C define the extended system dynamics and the output vector; they can be derived directly from the physical model (5) and from the considerations in Sections III-C and III-B.
The input constraint set is defined as where constraint (19b) defines the relationship between u sw and p from (12) and (13) . Constraint (19a) together with (19b) defines the switching constraints u sw (k) − u sw (k − 1) ∞ ≤ 1 imposed to avoid a shoot-through in the inverter positions that could damage the components. Finally, (19c) enforces integrality of the switching positions.
It is straightforward to confirm that the number of switching sequence combinations grows exponentially with the horizon length N , i.e. 3 3N = 27 N . The problem therefore becomes extremely difficult to solve for even modest horizon lengths.
Observe that the controller tuning parameters are δ, which defines the relative importance of the THD and f sw components of the cost function, and r 1 , r 2 , which shape the switching frequency estimator.
E. Control Loop
The complete block diagram is shown in Figure 2 . The desired torque T * determines the currents x osc by setting the initial states of the oscillator OSC. The motor speed ω r and the stator currents i s are measured directly from the machine and used by the observer OBS providing the physical states of the motor x ph . The auxiliary inputs p are fed into the filter FLT estimating the switching frequency in x sw . The switch positions u sw go through a one step delay and are exploited again by the MPC formulation.
Following a receding horizon control strategy, at each stage k the problem (18) is solved, obtaining the optimal sequence {u (k)} N −1 k=0 from which only u (0) is applied to the switches. At the next stage k + 1, given new vectors x osc (k), x ph (k), u sw (k − 1) and x sw (k) as in Figure 2 a new optimization problem is then solved providing an updated optimal switching sequence, and so on. The whole control algorithm, appearing within the dotted line, runs within 25 µs.
F. Approximate Dynamic Programming
The goal of this section is to compute a value function approximation V adp for an infinite horizon version of (18) . The function V adp is used as a tail cost in (18) .
Let V * (z) be the value function evaluated in z, i.e. the optimal value of the objective of our control problem starting at state z
subject to the system dynamics (18b). For notational convenience, we will drop the time index k from the vectors in this section. The main idea behind dynamic programming is that the function V * is the unique solution to the equation
known as the Bellman equation. The right-hand side can be written as monotonic operator T on V * , usually referred to as the Bellman operator: V * = T V * . Once V * is known, the optimal control policy for our problem starting at state z can be found as
{l(z, u) + γV * (Az + Bu)} , subject to constraints (18b).
Unfortunately, solutions to the Bellman equation can only be solved analytically in a limited number of special cases; e.g. when the state and inputs have small dimensions or when the system is linear, unconstrained and the cost function is quadratic [29] . For more complicated problems, dynamic programming is limited by the so-called curse of dimensionality; storage and computation requirements tend to grow exponentially with the problem dimensions. Because of the integer switches in the power converter analyzed in this work, it is intractable to compute the optimal infinite horizon cost and policy and, hence, systematic methods for approximating the optimal value function offline are needed.
Approximate dynamic programming [19] consists of various techniques for estimating V * using knowledge from the system dynamics, fitted data through machine learning or iterative learning through simulations.
Approximation via Iterated Bellman Inequalities: the approach developed in [20] and [21] relaxes the Bellman equation into an inequality
or, equivalently, using the Bellman operator: V adp ≤ T V adp .
The set of functions V adp that satisfy the Bellman inequality are underestimators of the optimal value function V * . This happens because, if V adp satisfies V adp ≤ T V adp , then by the monotonicity of the operator T and value iteration convergence, we can write
The Bellman inequality is therefore a sufficient condition for underestimation of V * . In [21] the authors show that by iterating inequality (20) , the conservatism of the approximation can be reduced. The iterated Bellman inequality is defined as:
where M > 1 is an integer defining the number of iterations. This inequality is equivalent to the existence of functions V adp i such that
By defining V adp 0 = V adp M = V adp , we can rewrite the iterated inequality as
where V adp i are the iterates of the value function.
To make the problem tractable, we will restrict the iterates to the finite-dimensional subspace spanned by the basis functions V (j) defined in [20] , [21] :
The coefficients α i will be computed by solving a Semidefinite Program (SDP) [22] .
The rewritten iterated Bellman inequality in (21) suggests the following optimization problem for finding the best underestimator for the value function V * :
where c(·) is a non-negative measure over the state space. On the chosen subspace (22) , the inequality (23b) is convex in the coefficients α ij . To see this, note that the left-hand side is affine in α ij . Moreover, for a fixed u the argument of min on the right-hand side is affine in α ij while the min of affine functions is concave.
The solution to (23) is the function spanned by the chosen basis that maximizes the c-weighted 1-norm defined in the cost function while satisfying the iterated Bellman inequality [20] . Hence, c(·) can be regarded as a distribution giving more importance to regions of the state space where we would like a better approximation.
Following the approach in [21] , we make use of quadratic candidate functions of the form
where P i ∈ S nx , q i ∈ R nx , r i ∈ R, i = 0, . . . , M .
If we denote µ c ∈ R nx and Σ c ∈ S nx + as the mean and the covariance matrix of measure c(·) respectively, by using candidate functions as in (24) the cost function of problem (23) becomes
We now focus on rewriting the constraint (23b) as a Linear Matrix Inequality (LMI) [30] . We first remove the min on the right-hand side by imposing the constraint for every admissible u ∈ U(x 0 ) and obtain
From [21] , we can rewrite (25) as a quadratic form
is a symmetric matrix. The matrices S i−1 , L and G i (u) are defined in Appendix B.
By noting that the state vector z includes two parts which can take only a finite set of values -the normalized desired frequency fixed to 1 and the previous physical input u sw (k−1) ∈ {−1, 0, 1} -we can explicitly enumerate part of the state-space and rewrite the quadratic form (26) more compactly as
wherez is the state vector without the desired frequency and u sw (k−1). Moreover, m := (u sw , u sw,pr ) ∈ M are all the possible combinations of current and previous switch positions satisfying the switching and integrality constraints (19) . The detailed derivation ofM (m) ∈ S 9 can be found in Appendix B.
Using the non-negativity condition of quadratic forms [22] , it is easy to see that (28) 
which can be solved efficiently using a standard SDP solver, e.g. [31] . Once we obtain the solution to (29), we can define the infinite horizon tail cost to be used in problem (18) as
G. Optimization Problem in Vector Form
Since we consider short horizons, we adopt a condensed MPC formulation of problem (18) with only input variables, producing a purely integer program. In this way all the possible discrete input combinations can be evaluated directly. With a sparse formulation including the continuous states within the variables, it would be necessary to solve a mixed-integer program requiring more complex computations.
Let us define the input sequence over the horizon N starting at time 0 as
where we have dropped the time index from U to simplify the notation. With straightforward algebraic manipulations outlined in Appendix C, it is possible to rewrite problem (18) as a parametric integer quadratic program in the initial state x 0 :
IV. FRAMEWORK FOR PERFORMANCE EVALUATION
To benchmark our algorithm we consider a neutral point clamped voltage source inverter connected to a medium-voltage induction machine and a constant mechanical load. We consider the same model as in [13] : a 3.3 kV and 50 Hz squirrel-cage induction machine rated at 2 MVA with a total leakage inductance of 0.25 pu. On the inverter side, we assume the dc-link voltage V dc = 5.2 kV to be constant and the potential of the neutral point to be fixed. The base quantities of the per unit (pu) system are the following: V b = 2/3V rat = 2694 V, I b = √ 2I rat = 503.5 A and f b = f rat = 50 Hz. Quantities V rat , I rat and f rat refer to the rated voltage, current and frequency respectively. The detailed parameters are provided in Table I . The switching frequency is typically in the range between 200 and 350 Hz for medium-voltage inverters [13] . If not otherwise stated, all simulations were done at rated torque, nominal speed and fundamental frequency of 50 Hz. We consider an idealized model with the semiconductors switching instantaneously. As such, we neglect second-order effects like deadtimes, controller delays, measurement noise, observer errors, saturation of the machine's magnetic material, variations of the parameters and so on. This is motivated by the fact that, using a similar model, previous simulations [32] showed a very close match with the experimental results in [33] . All the steady-state simulations in the following sections were also performed with model mismatch of ±1 % in all the parameters of Table I showing negligible variations in the THD. However, we omit these benchmarks since an exhaustive sensitivity analysis is out of the scope of this paper.
V. ACHIEVABLE PERFORMANCE IN STEADY STATE
We performed closed loop simulations in steady-state operation in MATLAB to benchmark the achievable performance in terms of THD and switching frequency. The system was simulated for 4 periods before recording to ensure it reaches steady-state operation. The THD and switching frequency were computed over simulations of 20 periods. The discount factor was chosen as γ = 0.95 and the switching frequency filter parameters as r 1 = r 2 = 800 in order to get a smooth estimate. The weighting δ was chosen such that the switching frequency is around 300 Hz. The infinite horizon estimation SDP (29) is formulated using YALMIP [34] with M = 50 Bellman iterations and solved offline using MOSEK [31] . Note that in case of a change in the systems parameters, e.g. the dc-link voltage or the rotor speed, the tail cost has to be recomputed. However, it possible to precompute offline and store several quadratic tail costs for different possible parameters and evaluate the desired one online without significant increase complexity.
For comparison, we simulated the drive system also with the direct MPC controller described in [13] (denoted DMPC) tuned in order to have the same switching frequency by adjusting the weighting parameter λ u .
The integer optimization problems were solved using Gurobi Optimizer [35] . Numerical results with both approaches are presented in Table II . Note that the choice of the solver does not influence the THD or the switching frequency and we would have obtained the same results with another optimization software.
Our method, with a horizon of N = 1 provided both a THD improvement over the DMPC formulation in [13] with N = 10 and a drastically better numerical speed. This showed how choosing a meaningful cost function can provide good control performance without recourse to long horizons. Moreover, we also performed a comparison with longer horizons N = 2, N = 3 and N = 10. Our method, with horizon N = 10 would give an even greater reduction in THD to 4.80 %. We implemented the control algorithm on a Xilinx Zynq (xc7z020) [36] , a low cost FPGA, running at approximately 150 MHz mounted on the Zedboard evaluation module [37] ; see Figure 3 . The control algorithm was coded in C++ using the PROTOIP Toolbox [38] . The FPGA vendors HLS tool Xilinx Vivado HLS [39] was used to convert the written code to VHDL defining the Programmable Logic connections.
B. Algorithm Description
We now present a detailed description of how the controller within the dotted lines in Figure 2 was implemented on the FPGA.
The updates in OSC and FLT were implemented as simple matrix multiplications. The solver for the integer problem (32) was implemented with a simple exhaustive search algorithm for three reasons:
first, the tail cost approximation provides good performance with very few horizon steps while considering a relatively small number of input combinations; second, the structure of the problem allows us to evaluate both the inequalities and cost function for multiple input sequences in parallel; third, the FPGA logic is particularly suited for highly pipelined and/or parallelized operations, which are at the core of exhaustive search.
To exploit the FPGA architecture, we implemented our algorithm in fixed-point arithmetic using custom data types defined in Vivado HLS [39] . In particular, we used 4 integer and 0 fractional bits to describe the integer inputs and 2 integer and 22 fractional bits to describe the states and the cost function values. This choice is given by the minimum number of bits necessary to describe these quantities from floating-point simulations in Section V. Note that the exhaustive search algorithm does not suffer from any accumulation of rounding error because it consists entirely of independent function evaluations, in contrast to iterative optimization algorithms [3] .
We provide pseudo-code for our method in Algorithm 1. From Figure 2 , the controller receives the required torque T * (k) and the motor states x ph (k) and returns the switch positions u sw (k). From line 2 to line 8 the oscillator OSC and the filter FLT are updated to compute the new initial state x 0 for the optimization algorithm. Note that if there is a change in the required torque then the oscillator states x osc (k) are reset to match the new T * (k). Line 9 and 10 precompute the vectors in problem (32) depending on x 0 .
The main loop iterating over all input combinations is split into two subloops: Loop 1, which is completely decoupled and can be parallelized; and Loop 2, which can only be pipelined. . For every loop cycle, sequence i is saved into variable u. Then, in line 13, the value of p(k) is updated inside u with u sw (k − 1) according to (12) and (13) . If u satisfies the constraint A ineq u ≤ b ineq (x 0 ), then the cost function is stored in J (i) (line 15). Otherwise J (i) is set to a high value J ub . Note that, even though it would bring considerable speed improvements, we do not precompute offline the quadratic part u Qu of the cost and the left side of the inequality A ineq u since it would also require enumeration over inputs at the previous control cycle used in line 13.
Each iteration of this loop is independent from the others and can therefore be parallelized efficiently.
Loop 2 from line 20 to 26 is a simple loop iterating over the computed cost function values to find the minimum and save it into J min . Every iteration depends sequentially on J min which is accessed and can be modified at every i. Thus, in this form it is not possible to parallelize this loop, although it can be pipelined.
C. Circuit Generation
In Vivado HLS [39] it is possible to specify directives to optimize the circuit synthesis according to the resources available on the target board. Loop 1 and Loop 2 were pipelined and the preprocessing operations from line 2 to 10 parallelized. We generated the circuit for the algorithm 1 with horizons N = 1 and N = 2 at frequency 150 MHz (clock cycle of 7 ns). The resources usage and the timing estimates are displayed in Table III . Since timing constraints were met, there was no need to parallelize Loop 1 to reduce computation time.
Note that for N = 2 we are using already 91 % of the DSP multipliers. This is due to the limited amount of resources available on our chosen low-cost hardware.
VII. HARDWARE IN THE LOOP TESTS
We performed hardware in the loop (HIL) experiments using the controller FPGA fixed-point implementation developed in Section VI and the machine model described in Section IV.
N , J min ∈ R and i min ∈ N Execute Filter and Oscillator to Obtain Initial State:
if change in T (k) then 3:
x osc (k) ← Reset according to (3) 4:
else 5:
end if
7:
8:
Precompute Vectors:
f (x 0 ) ← Compute from (40) 10: (41), (42) Loop 1 -Compute Cost Function Values:
11:
u ← U seq (:,i)
13: J min ← J ub , i min ← 1
21:
for i = 1, . . . , 27 N do 22: if J (i) ≤ J min then 23:
24:
i min ← i
25:
end if 26: end for
Return Results
27:
u sw (k) ← U The control loop was operated using the PROTOIP toolbox [38] : the plant model was simulated on a Macbook Pro 2.8 GHz Intel Core i7 with 16GB of RAM while the control algorithm was entirely executed on the Zedboard development board described in Section VI-A.
A. Steady State
The controller was benchmarked in HIL in steady-state operation to compare its performance to the achievable performance results obtained in Table II . We chose the same controller parameters as in Section V.
The HIL tests for horizon N = 1 are shown in Figure 4 in the per unit system. The three-phase stator currents are displayed over a fundamental period in Figure 4a , the three spectra are shown in Figure 4b with THD of 5.23 % and the input sequences are plotted in Figure 4c .
From the experimental benchmarks with horizon N = 1 and N = 2 we obtained THD = 5.23 % and THD = 5.14 % respectively. As expected, these results are very close to the simulated ones in Table II . The slight difference (∼ 0.01 %) comes from the fixed-point implementation of the oscillator OSC and the filter FLT in Figure 2 .
B. Transients
One of the main advantages of direct MPC is the fast transient response [13] . We tested torque transients in HIL with the same tuning parameters as in the steady state benchmarks. At nominal speed, reference torque steps were imposed; see Figure 6b . These steps were translated into different current references to track, as shown in Figure 6a , while the computed inputs are shown in Figure 6c .
The torque step from 1 to 0 in the per unit system presented an extremely short settling time of 0.35 ms similar to deadbeat control approaches [40] . This was achieved by inverting the voltage applied to the load. Since we prohibited switchings between −1 and 1 in (19b) and (19a), the voltage inversion was performed in 2T s via an intermediate zero switching position.
Switching from 0 to 1 torque produced much slower response time of approximately 3.5 ms. This was due to the limited available voltage in the three-phase admissible switching positions. As shown in Figure 6c , during the second step at time 20 ms, the phases b and c saturated at the values +1 and −1 respectively for the majority of the transient providing the maximum available voltage that could steer the currents to the desired values.
These results match the simulations of the DMPC formulation in [13] in terms of settling time showing that our method possesses the fast dynamical behavior during transients typical of direct current MPC.
As noted in [13] , having a longer horizon or a better predictive behavior does not significantly improve the settling times. This is because the benefit of longer prediction obtainable by extending the horizon or adopting a powerful final stage costs is reduced by the saturation of the inputs during the transients. DMPC sphere decoding algorithm execution times, we compared our results to the time needed to solve the DMPC formulation in [14] for the same horizon lengths on a Macbook Pro with Intel Core i7 2.8 GHz and 16GB of RAM using the commercial integer program solver Gurobi Optimizer [35] which implements an efficient Branch-and-Bound algorithm. The results are shown in Figure 5 .
The FPGA execution times were 5.76 µs and 17.27 µs for horizon N = 1 and N = 2 respectively. Note that they presented a slight overhead of approximately 3.5 µs compared to the estimates in Table III since the measured times included the time needed to exchange the input-output data from the FPGA to the ARM processor through the RAM memory. Without the overhead, the estimated FPGA computing times obtained by the circuit generation are exact; see [39] .
Note that the time needed by the FPGA to compute the control algorithm is deterministic with zero variance. This makes our HIL implementation particularly suited for real-time applications. Furthermore, it is important to underline that the method we propose is the only method available that can produce integer optimal solutions to this problem achieving this performance in 25 µs sampling time.
The execution times needed by Gurobi optimizer were 621.2 ± 119.98 µs and 750.40 ± 216.15 µs for horizons N = 1 and N = 2 respectively. The non-negligible standard deviation appeared because of the branch-and-bound algorithm implemented in Gurobi. However, since we are considering real-time applications, we are interested in the worst case number of visited nodes which is always the whole tree of combinations, i.e. 27 N . Note that the DMPC formulation was solved in [13] using a different branch-and-bound algorithm based on the sphere decoding algorithm [15] , but the worst case number of visited nodes cannot be easily reduced because of the N P−hardness of the problem.
VIII. CONCLUSION AND FUTURE WORK
This work proposes a new computationally efficient direct model predictive control (MPC) scheme for current reference tracking in power converters. We extended the problem formulation in [13] and [14] in order to include a switching frequency estimator in the system state and rewrite the optimal control problem as a regulation one. To reduce the horizon length and decrease the computational burden while preserving good control performances, we estimated the infinite horizon tail cost of the MPC problem formulation using approximate dynamic programming (ADP).
Steady-state simulation results show that with our method requiring short horizons, it is possible to obtain better performance than the direct MPC formulation in [13] with long horizons. This is due to the predictive behavior of the tail cost function obtained with ADP.
The control algorithm was implemented in fixed-point arithmetic on the low size Xilinx Zynq FPGA (xc7z020) for horizons N = 1 and N = 2. Hardware in the loop (HIL) tests during steady-state operation showed an almost identical performance to the simulation results. We also performed transient simulations where our proposed approach exhibited the same very fast dynamic response as the direct MPC described in [13] . Moreover, we showed that our algorithm can run within the sampling time of 25 µs by measuring the execution time on the FPGA. Results showed that only 5.76 µs and 17.27 µs are required to run our controller for horizons N = 1 and N = 2 respectively.
Direct MPC can also be applied to more complex schemes such as modular multilevel converters (MMC) [41] . While it is possible to derive a complete MMC model that could be used in an MPC approach, the number of switching levels per horizon stage exponentially increases with the number of converter levels. As stated in [14] , long-horizon predictive power is expected to be even more beneficial with MMCs. We believe that our method, making use of short prediction horizons and long predictions using an approximate value function could be applied effectively to MMCs with more levels because it is still possible to evaluate on commercially available FPGAs the multilevel feasible switching combinations over very short horizons within the required sampling time.
There are several future directions to be investigated. Given the system design there are several symmetries in the model that could be exploited to increase the controller horizon without requiring more computational power. Regarding the frequency estimation, other filters with different orders could be implemented and their parameters chosen optimally by solving an optimization problem instead of performing manual tuning. Moreover, it would be interesting to benchmark other ADP tail cost functions (e.g. higher order polynomials) to understand which ones best approximate the infinite horizon tail cost and produce the best overall control performance.
ACKNOWLEDGMENT
The authors would like to thank Andrea Suardi and Bulat Khusainov for their advice regarding FPGA implementation, in particular the PROTOIP toolbox. The quadratic form decomposition can be derived as follows. For every m = (u sw , u sw,pr ) ∈ M, we can define the input u m = u sw u sw − u sw,pr 1 , X = x(0) x(1) . . . x(N ) , the system dynamics (18b) with initial state constraint (18c) can be written as 
where the last equality is obtained by plugging in (34) and the term const(x 0 ) is a constant depending on the initial state. Matrix H is defined as
In order derive the tail cost, let us write the last stage as
where B end is the last row of B used to compute the last state. Using equations (37) and (30), the tail cost can be rewritten as 
By combining (35) and (38) 
where in the term on the right we substituted (34) .
Similarly, constraint (19a) with k = 0, . . . , N − 1 can be written as We can now merge (41) and (42) into a single inequality A ineq U ≤ b ineq (x 0 ) and rewrite (18) neglecting the constant terms in the cost function obtaining the result in (32) .
