Abstract: This paper presents an efficient real-time implementation of embedded model predictive control, adopted in the context of active vibration control with the objective of minimizing the tip deflection of lightly damped cantilever beams. In particular, we focus on memory- and time-efficient explicit solutions of the associated constrained optimal control problems that are easily implementable on low-end embedded hardware. To this end, we exploit the concept of convex lifting and show how it can be used to devise low-complexity, regionless piecewise affine controllers without any loss of optimality and performance. The efficiency of this constructive procedure is quantified via an extensive complexity analysis, evidenced by a successful practical deployment and optimal vibration control performance using a family of 32-bit ARM Cortex-M-based microcontroller platforms.
I. INTRODUCTION

MODEL predictive control (MPC) has brought a tremendous improvement to the quality of many industrial applications [1]. Since its early conception, MPC was adopted in petrochemical plants for its inherent ability to handle process constraints and its increased control performance. The relatively slow processes in the chemical industry, as well as the lack of need to miniaturize and aggressively cost-optimize computing hardware, initially concealed the main drawback of the predictive control methodology: its demand for computing resources. The utilization of advanced optimal control schemes was, therefore, at first limited to systems with slow dynamics or powerful computing implementations, or both.
Since then, the evolution of control theory and applied automation has not stopped; besides the constantly dropping price of hardware and increased performance, new computationally efficient MPC methods have been devised. Thanks to the combination of these factors, new application areas are emerging. Applications involving fast dynamics, like mechanical systems with active vibration control (AVC), which is the main interest of this paper, are now within the realm of practical implementation.
There has been a plethora of academic works using various adaptations of the predictive control algorithm for vibration attenuation. However, with a few exceptions [2]-[4], these works are restricted to computing hardware that is feasible only in a laboratory setting due to its size and weight [5]. The possibility of using MPC for fast systems like AVC in consumer-level products depends on an often overlooked but essential practical factor: hardware price.
Embedded single-chip systems, also known as microcontroller units (MCUs), can offer only a small fraction of the execution performance or memory of their laboratory or industrial counterparts, but they may be mass-produced and built into miniaturized devices for very little money. Microcontroller families and architectures that were once part of bulky personal computers and were slowly being forgotten are now making a comeback in miniaturized versions, partly due to their low price [6]. This is where the efficiency-boosting achievements of control theory in the field of MPC come to the foreground: these developments allow one to implement better control methods with fewer resources. Because of the improvements in algorithm efficiency, model predictive control can now be implemented on embedded hardware such as MCUs [7], [8], programmable logic controllers (PLCs) [9]-[11], or field programmable gate arrays (FPGAs) [4], [12], [13]. Efficiency improvements in nominal or deterministic MPC can be divided into two main categories [14]. To the first belong online MPC algorithms (sometimes referred to as implicit MPC) that attempt to minimize the real-time computational requirements by a context-oriented reformulation of the optimization problem, or by the use of advanced, often hardware-targeted optimization solvers [15]-[17]. The other group consists of a family of methods known as explicit MPC (EMPC), which essentially turn the real-time optimization problem into a simple table-lookup procedure by precomputing its optimal solution [18], [19].
Computationally efficient developments in implicit MPC attempt to minimize the online execution requirements, so that simpler control hardware is sufficient. However, these methods, while usually not memory-intensive, still require considerable computing power. The explicit approach is quite different; it buys execution speed at the cost of memory footprint. EMPC may be executed with very little computing effort, but it is known for its memory complexity that grows exponentially with the prediction horizon [20], which may quickly make the hardware deployment or the online evaluation intractable.
In this paper, we resort to the explicit approach to solving the MPC problem, since it inherently comes with several convenient features from the perspective of real-time applications. In particular, unlike its implicit counterpart, the structure and properties of the explicit solution allow for a straightforward implementation in a division-free fashion and, moreover, provide a tool for exact worst-case certification of the implementation and closed-loop analysis [21]. We look at a particular possibility to dramatically decrease both the memory and runtime complexity of the EMPC code, so that it can be implemented in low-cost embedded microcontrollers. This paper is focused on a particular class of systems with fast dynamics: the aforementioned problem of AVC. Nowadays, experimental AVC systems are starting to leave laboratories and appear in consumer products, like active car suspensions [22] or even miniaturized medical devices that reduce hand tremor in sufferers of Parkinson's disease [23]. Vibration control systems are in this way getting smaller, and thus require tiny and cheap microcontrollers that have limited power consumption. There is indeed no good reason why these devices and products should not benefit from advanced control algorithms, but are current embedded devices up to the task?
Of course, embedded MPC has been applied to AVC systems before. Possibly the first work of its kind used implicit approaches and machine-level code optimization on digital signal processors (DSPs) [3]. While the sampling speeds achieved therein, measured in tens of kilohertz, were remarkable, high-end DSP hardware still remains prohibitively expensive for low-cost mass-produced applications. Many have suggested EMPC as the ideal formulation for cheap embedded hardware [9], [24]. Former trials with EMPC for vibration attenuation evaluated memory needs and algorithm timing, albeit on powerful hardware that is suitable only for laboratory tests [5], [25]. EMPC without efficiency modifications was recently applied to AVC via embedded hardware, showing the limits of hardware and software on high-end microcontrollers [26]. Thanks to the underlying formulation of EMPC, execution speed does not create any prohibitive issues in embedded MCUs, but the available volatile and nonvolatile memory of a microcontroller do affect its class, and therefore its cost. In an ideal case, microcontrollers closer to what is today considered average should be capable of running model predictive vibration control. Nevertheless, contemporary midrange MCUs are simply not sufficient for vibration attenuation based on EMPC [26].
To address this, we introduce an efficient methodology to transform the nominal, memory-intensive EMPC into an equivalent formulation that meets the strict requirements imposed by the technical specifications of low-end embedded control hardware, with no loss of optimality or closed-loop stability. In particular, we exploit the concept of convex lifting, recently adopted in [27]-[29] in the context of control design, and show that it can be used to devise regionless yet fully optimal EMPC controllers implementable in embedded AVC applications with fast sampling speeds. Compared with the former study [26], we aim at EMPC solutions with guaranteed stability that render much smaller domains of attraction, and hence need longer prediction horizons [30]. The resulting algorithm will be implemented in a range of 32-bit embedded microcontrollers to provide an overview of its memory footprint and execution timing. This embedded AVC system will be used to minimize the tip deflections of an aluminum cantilever beam by supplying the input decisions of the proposed control algorithm to the piezoceramic actuators in the form of a driving voltage, while obtaining its feedback from position measurements. The purpose of the active cantilever beam featured here is to emulate, via release tests, the dynamic behavior of a class of flexible mechanical structures under transient external disturbances [31].
Note that there are several well-known control algorithms routinely used in AVC, the standard one being the positive position feedback (PPF) controller [32] , which is, however, not in the scope of this paper. The use of PPF along with nominal EMPC for AVC can be found in our previous work [26] .
The rest of this paper is structured as follows. After introducing the system model, Section II recalls the concept of EMPC and presents two convex lifting-based methods for efficient embedded hardware implementation of MPC in AVC, followed by a detailed memory and runtime complexity analysis. Section III describes the laboratory setup and the experimental deployment of the autogenerated code on a class of 32-bit ARM Cortex-M microcontroller units. Finally, vibration damping performance, memory, and timing properties of the proposed algorithms are discussed in Section IV.
Notation: We denote by $\mathbb{R}$, $\mathbb{R}^n$, $\mathbb{R}^{n \times m}$, $\mathbb{N}$, and $\mathbb{N}_+$ the sets of real numbers, $n$-dimensional real vectors, $n \times m$ dimensional real matrices, nonnegative integers, and positive integers, respectively. For a vector-valued function $f : \mathbb{R}^n \to \mathbb{R}^m$, $\mathrm{dom}(f)$ denotes its domain. Given an arbitrary set $\mathcal{S}$, $\mathrm{conv}(\mathcal{S})$ and $\dim(\mathcal{S})$ denote its convex hull and the dimension of its affine hull. Moreover, if $\mathcal{S}$ is full-dimensional, $\mathrm{int}(\mathcal{S})$ denotes its interior. Given a set $\mathcal{S} \subseteq \mathbb{R}^n$ and a subspace $\mathbb{S}$ of $\mathbb{R}^n$, $\mathrm{Proj}_{\mathbb{S}}\,\mathcal{S}$ denotes the orthogonal projection of $\mathcal{S}$ onto the space $\mathbb{S}$. Given two sets $\mathcal{S}_1$ and $\mathcal{S}_2$, we define the following set:
In addition, a finite index set of $N \in \mathbb{N}_+$ elements will be denoted as $\mathcal{I}_N := \{1, \dots, N\}$, and its cardinality by $|\mathcal{I}_N|$.
II. CONTROLLER DESIGN
A. Modeling
The mechanical behavior of a given physical object mainly depends on its modal properties and energy dissipation, known as damping. There are infinitely many resonance frequencies and modes for every real-life object; however, it is sufficient to include the principal ones in a mathematical representation to achieve a good match with the measured behavior [22] , [33] .
The dynamic response of a flexible cantilever beam is clearly dominated by its first resonant frequency and its corresponding mode of vibration [5] , [34] . Thus, in order to enable the realtime tractability of MPC on an embedded system, we chose to represent a beam driven by piezoceramics by a simplified nominal dynamic model-an equivalent single degree of freedom (SDOF) linearly driven mass-spring-damper unit.
By assuming a viscous damping model, the system equivalent in its response can be described by a second-order linear differential equation as $\ddot{q}(t) + 2\zeta\omega\dot{q}(t) + \omega^2 q(t) = c\,u(t)$, where $q(t)$ [m] is the position of the free end of the beam, $\omega$ [rad s$^{-1}$] is the first natural resonance frequency, and $\zeta$ [-] is the unitless damping ratio [33]. The conversion of the voltage $u(t)$ to driving force is linear in the piezoceramic actuators, and the constant $c$ [N V$^{-1}$ kg$^{-1}$] represents this conversion ratio. Choosing position and velocity for the state vector $x(t) = [q(t)\ \ \dot{q}(t)]^T$, we may express the continuous-time state-space representation of the beam as
$$\dot{x}(t) = A x(t) + B u(t), \qquad A = \begin{bmatrix} 0 & 1 \\ -\omega^2 & -2\zeta\omega \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ c \end{bmatrix} \tag{1}$$
with the state-transition and input matrix, $A \in \mathbb{R}^{2 \times 2}$ and $B \in \mathbb{R}^{2 \times 1}$, given as above, while assuming that only the position is measured gives the output matrix as $C = [1\ \ 0]$. The driven beam was identified using a pseudorandom binary excitation signal, supplied to the actuating elements for a period of 100 s. Using a gray-box prediction error method system identification procedure, the unknown parameters for the laboratory system that is further described in Section III were determined as $\omega = 50.89$ rad s$^{-1}$ (8.10 Hz), $\zeta = 0.005$, and $c = 5.91$ N kV$^{-1}$ kg$^{-1}$. We remark that the assumption that the beam dynamics may be represented by an SDOF model is, apart from providing satisfactory results, an essential premise to this paper, as it is unlikely that complex prediction models are able to preserve real-time implementation feasibility on simple microcontrollers [5]. Although by using embedded computing devices with generous memory capacities one may be able to utilize EMPC for the vibration control of up to two to three resonant modes or multi-input multi-output systems, it is unreasonable to expect a complex electromechanical model derived, e.g., by a finite-element analysis to be viable on current hardware.
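As a numerical illustration of the SDOF model, the following minimal sketch builds the continuous-time matrices from the identified parameters and discretizes them under a zero-order hold. The sampling period of 20 ms is the one quoted later for the controller in Fig. 1; the helper names (`continuous_model`, `expm`, `discretize_zoh`) and the scaling-and-squaring routine are illustrative assumptions, not part of the toolchain used in this paper.

```python
import math

OMEGA = 50.89   # first natural frequency [rad/s], value quoted in the text
ZETA = 0.005    # damping ratio [-], value quoted in the text
C_GAIN = 5.91   # input conversion constant, value quoted in the text
TS = 0.02       # sampling period [s]; the 20 ms quoted for the controller in Fig. 1


def continuous_model(omega, zeta, c):
    """Return (A, B, C) of the SDOF model with state x = [position, velocity]."""
    A = [[0.0, 1.0], [-omega ** 2, -2.0 * zeta * omega]]
    B = [[0.0], [c]]
    C = [[1.0, 0.0]]
    return A, B, C


def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def expm(M, taylor_terms=20):
    """Matrix exponential: scaling and squaring with a truncated Taylor series."""
    n = len(M)
    norm = max(sum(abs(v) for v in row) for row in M)
    squarings = math.ceil(math.log2(norm / 0.5)) if norm > 0.5 else 0
    scaled = [[v / 2 ** squarings for v in row] for row in M]
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, taylor_terms + 1):
        # term_k = term_{k-1} * scaled / k
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def discretize_zoh(A, B, ts):
    """Zero-order-hold discretization via exp([[A, B], [0, 0]] * ts)."""
    n, m = len(A), len(B[0])
    aug = [[A[i][j] * ts for j in range(n)] + [B[i][j] * ts for j in range(m)]
           for i in range(n)]
    aug += [[0.0] * (n + m) for _ in range(m)]
    phi = expm(aug)
    Ad = [row[:n] for row in phi[:n]]
    Bd = [row[n:] for row in phi[:n]]
    return Ad, Bd


A, B, C = continuous_model(OMEGA, ZETA, C_GAIN)
Ad, Bd = discretize_zoh(A, B, TS)
freq_hz = OMEGA / (2.0 * math.pi)  # natural frequency in Hz
```

The frequency conversion reproduces the 8.10 Hz quoted above, and the discrete pair `(Ad, Bd)` is the kind of model a discrete-time predictive controller would use.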
Before proceeding, let us emphasize that the following theoretical developments are valid for the class of linear systems; however, in the sequel, we will specifically focus on the AVC problem that is central to this paper.
B. Explicit Model Predictive Control
Let us assume the control of linear discrete-time systems in the state-space form, given as
$$x(t+1) = A_d x(t) + B_d u(t) \tag{2}$$
where $t$ denotes the multiples of the sampling period $T_s$, and the pair $(A_d, B_d)$ is stabilizable. In the presence of input (and/or state) constraints, we may formulate the following constrained finite-time optimal control problem:
where $x_k$ and $u_k$ denote, respectively, state and control input predictions over a finite horizon $N \in \mathbb{N}$ at time instant $t + k$, initialized by the current state, i.e., $x_0 = x(t)$, and subject to polytopic constraints given by $\mathcal{X} \subseteq \mathbb{R}^{n_x}$ and $\mathcal{U} \subseteq \mathbb{R}^{n_u}$. Within the quadratic objective (3a), the stage costs are weighted with $Q \succeq 0$, $R \succ 0$, while the terminal penalty uses $P \succeq 0$, usually determined as a solution of the discrete-time algebraic Riccati equation (DARE) for the unconstrained problem (3a). Reformulating (3) into a quadratic program (QP) and solving it for a feasible initial state $x_0$ yields a sequence of optimal control moves $U^\star = [u_0^{\star T}, \dots, u_{N-1}^{\star T}]^T$.
The receding horizon MPC feedback thus becomes $u_0^\star$, which is actually applied to the controlled system, and the procedure is repeated at the next sampling instant for a new value of the state. In addition, persistent feasibility and stability may be guaranteed by employing a maximal control invariant set $\mathcal{C}_\infty$ [35] as the terminal constraint set in (3d). Note also that for the ease of presentation and without loss of generality, we will in further developments assume single-input systems, such as the vibration system (1).
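For orientation, problem (3) can be sketched in the following standard form, which is consistent with the description above. Only the roles of (3a) as the objective and (3d) as the terminal constraint are stated in the text; the placement of the dynamics and of the state and input constraints under the labels (3b) and (3c) is an assumption made here for illustration.

```latex
\begin{align}
\min_{u_0,\dots,u_{N-1}} \quad
  & x_N^{T} P x_N + \sum_{k=0}^{N-1} \left( x_k^{T} Q x_k + u_k^{T} R u_k \right) \tag{3a} \\
\text{s.t.} \quad
  & x_{k+1} = A_d x_k + B_d u_k, \quad k = 0,\dots,N-1, \tag{3b} \\
  & x_k \in \mathcal{X}, \quad u_k \in \mathcal{U}, \quad k = 0,\dots,N-1, \tag{3c} \\
  & x_N \in \mathcal{C}_\infty. \tag{3d}
\end{align}
```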
In view of practical implementation aspects discussed in Section I, let us focus on the explicit representation of the optimizer to the MPC problem (3), $u_0^\star$, rather than its considerably more expensive computation in the implicit fashion outlined above. As shown in [18], this can be achieved by recasting and solving (3) as a parametric QP (pQP) using the technique of parametric programming, which allows us to precompute the MPC control law $U^\star(x)$ for all feasible values of the parameter $x$, explicitly, as a continuous and piecewise affine (PWA) function. In the receding horizon implementation, the closed-loop EMPC feedback has the following form:
$$\kappa(x) = \begin{cases} F_1 x + g_1 & \text{if } x \in \mathcal{R}_1 \\ \quad \vdots \\ F_R x + g_R & \text{if } x \in \mathcal{R}_R \end{cases} \tag{4}$$
A polyhedron denotes the intersection of a finite set of closed halfspaces. The collection of the polyhedral regions, $\{\mathcal{R}_i\}_{i \in \mathcal{I}_R}$, is referred to as a partition of the set of feasible parameters; a formal definition is given in Section II-C. The online implementation effort thus reduces to a simple function evaluation, as per (4), where the most time is spent on the point location, i.e., determining which polyhedral region $\mathcal{R}_i$ the current state resides in, by checking its defining inequalities, i.e., halfspaces. A straightforward way of searching the state-space partition is the direct sequential region traversal (see Algorithm 1), with runtime complexity linear in the number of regions. The other crucial EMPC implementation factor is the amount of memory needed to store the regions $\mathcal{R}_i$ and the optimal PWA feedback $\kappa(x)$. Both of the aforementioned complexity indicators are of great practical importance, namely, in the case of deployment on embedded control hardware, which has given rise to numerous techniques over the last decade focused on the reduction of both memory and runtime complexity of EMPC (refer to [36] for an overview of recent lower complexity implementations and [37] for faster point location algorithms).
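The direct sequential search mentioned above can be sketched as follows. This is a minimal illustration for a single-input controller, assuming each region is stored as a half-space description $H x \le h$ together with its affine law $(F, g)$; the data layout, the function name `sequential_search`, and the one-dimensional toy partition are hypothetical and are not the controller of Fig. 1.

```python
def sequential_search(regions, x, tol=1e-12):
    """Return F_i x + g_i for the first region containing x, or None if x is infeasible.

    regions: list of (H, h, F, g), where H is a list of rows, h a list of offsets,
    F a row vector (single input) and g a scalar.
    """
    for H, h, F, g in regions:
        # Point location: x lies in the region if every half-space inequality holds.
        inside = all(
            sum(row[k] * x[k] for k in range(len(x))) <= bound + tol
            for row, bound in zip(H, h)
        )
        if inside:
            # Evaluate the affine control law attached to this region.
            return sum(F[k] * x[k] for k in range(len(x))) + g
    return None


def interval(lower, upper):
    """Half-space description of the 1-D interval [lower, upper]."""
    return [[1.0], [-1.0]], [upper, -lower]


# Hypothetical 1-D toy partition with two saturated and two unsaturated regions.
TOY_REGIONS = [
    (*interval(-3.0, -2.0), [0.0], 1.0),    # saturated at the maximum
    (*interval(-2.0, 0.0), [-0.5], 0.0),    # unsaturated
    (*interval(0.0, 1.0), [-1.0], 0.0),     # unsaturated
    (*interval(1.0, 3.0), [0.0], -1.0),     # saturated at the minimum
]

u = sequential_search(TOY_REGIONS, [-1.0])
```

In the worst case every half-space of every region is checked before the last region is found, which is the source of the runtime that grows linearly with the number of regions.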
The EMPC vibration controller for the experimental system (to be introduced in Section III) described by (1) is illustrated in Fig. 1. The PWA feedback law was obtained assuming the sampling period of 20 ms and the prediction horizon of 50 steps. The underlying state-space partition $\{\mathcal{R}_i\}_{i \in \mathcal{I}_R}$ consists of 5207 regions and is omitted for clarity. More details on the computation are given in Section III.
C. Efficient Embedded EMPC via Convex Lifting
This section aims to introduce the main algorithmic developments toward memory and time-efficient embedded vibration EMPC implementations using the geometric concept of convex lifting.
With respect to the scope of this paper, we restrict ourselves to the essential theoretical concepts and focus on the controller synthesis and its practical deployment on embedded control hardware. A comprehensive overview of theoretical and structural properties of the convex lifting can be found in [38].
To put forward the algorithmic implementation, let us first recall some formal and necessary definitions.
In addition, a cell complex of a polyhedron $\mathcal{R}$ is defined as a polyhedral partition for which the facet-to-facet property holds, meaning that any pair of neighboring regions shares a common facet [39]. In the context of the MPC problem, $\mathcal{R}$ constitutes its feasible set [20]. The definition of a convex lifting is given next.
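Definition 2: Given a polyhedral (polytopic) state-space partition $\{\mathcal{R}_i\}_{i \in \mathcal{I}_R}$ of $\mathcal{R}$, a PWA function $z(x) = a_i^T x + b_i$ for $x \in \mathcal{R}_i$, where $a_i \in \mathbb{R}^{n_x}$, $b_i \in \mathbb{R}$, $\forall i \in \mathcal{I}_R$, is called a convex PWA lifting (for brevity, henceforth, referred to as convex lifting) if the following conditions hold: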
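As commonly stated in the convex lifting literature cited here, and written with the symbols $z$, $a_i$, $b_i$, and $\mathcal{R}_i$ used in this definition, the two conditions can be sketched as below. The exact wording is an assumption on our part and should be checked against [28] and [38].

```latex
\begin{align*}
&\text{1) } z(x) \text{ is continuous over } \mathcal{R}; \\
&\text{2) for each } i \in \mathcal{I}_R:\quad
  z(x) > a_j^{T} x + b_j \quad
  \forall x \in \mathcal{R}_i \setminus \mathcal{R}_j,\ \forall j \neq i,\ j \in \mathcal{I}_R.
\end{align*}
```

Under these two conditions, $z(x) = \max_{j \in \mathcal{I}_R} \left( a_j^{T} x + b_j \right)$ for every $x \in \mathcal{R}$, so the index of the active affine piece identifies the region containing $x$.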
We remark that a partition that admits a convex lifting is a cell complex [28]. The cell complex characterization is indeed a necessary, but not a sufficient, condition for the existence of a convex lifting, and it holds for nondegenerate MPC problems. In the following, we will suppose the polyhedral/polytopic partition to have the properties of a cell complex. Note that even a partition that is not convexly liftable may be easily modified into a liftable one by an appropriate hyperplane arrangement [38], [40].
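Let us append some useful definitions to be used next [41].
Definition 3: Let $\overline{\kappa}$ and $\underline{\kappa}$ denote, respectively, the maximum and minimum values that the PWA function $\kappa(x)$ in (4) attains over its domain $\mathcal{R}$. Denote by $\mathcal{I}_{\max}$ ($\mathcal{I}_{\min}$) the index set of regions where $\kappa(x)$ is saturated at the maximum (minimum), i.e., $\kappa(x) = \overline{\kappa}$ ($\kappa(x) = \underline{\kappa}$) for all $x \in \mathcal{R}_i$, $i \in \mathcal{I}_{\max}$ ($i \in \mathcal{I}_{\min}$), and let $\mathcal{I}_{\mathrm{sat}} = \mathcal{I}_{\max} \cup \mathcal{I}_{\min}$. A region $\mathcal{R}_i$ with $i \in \mathcal{I}_{\mathrm{sat}}$ is therefore called a saturated region, either at the minimum or at the maximum. Otherwise, $\mathcal{R}_i$ is called unsaturated, with the index set denoted by $\mathcal{I}_{\mathrm{uns}} = \mathcal{I}_R \setminus \mathcal{I}_{\mathrm{sat}}$.
Definition 4: Given a continuous PWA function $\kappa(x)$, defined over a parameter-space partition $\{\mathcal{R}_i\}_{i \in \mathcal{I}_R}$, we call the PWA function $\tilde{\kappa}(x) := \tilde{f}_j^T x + \tilde{g}_j$ a suitable augmentation of $\kappa(x)$ if the following properties hold.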
Herein, $\mathcal{R}_{\mathcal{I}}$ denotes the subset of regions $\{\mathcal{R}_i\}_{i \in \mathcal{I}_R}$ for some index set $\mathcal{I} \subseteq \{1, \dots, R\}$.
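The index sets of Definition 3 can be illustrated with a short sketch. It assumes a single-input controller whose affine laws are stored as pairs $(F_i, g_i)$ and treats a region as saturated when its gain vanishes and its offset equals one of the extreme values; the function name `classify_regions`, the tolerance, and the toy data are hypothetical.

```python
def classify_regions(laws, k_min, k_max, tol=1e-9):
    """Split 1-based region indices into (I_max, I_min, I_uns).

    laws: list of (F, g) with F a row vector and g a scalar (single input).
    k_min, k_max: minimum and maximum values attained by the PWA law.
    """
    i_max, i_min, i_uns = [], [], []
    for i, (F, g) in enumerate(laws, start=1):
        # A saturated region carries a constant law equal to one of the extremes.
        constant = all(abs(f) <= tol for f in F)
        if constant and abs(g - k_max) <= tol:
            i_max.append(i)
        elif constant and abs(g - k_min) <= tol:
            i_min.append(i)
        else:
            i_uns.append(i)
    return i_max, i_min, i_uns


# Hypothetical toy controller with four regions.
TOY_LAWS = [([0.0], 1.0), ([-0.5], 0.0), ([-1.0], 0.0), ([0.0], -1.0)]
I_MAX, I_MIN, I_UNS = classify_regions(TOY_LAWS, k_min=-1.0, k_max=1.0)
I_SAT = sorted(I_MAX + I_MIN)
```

Only the regions indexed by `I_UNS` need a lifting and a stored control law in the developments that follow.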
Note that the premise of existence of saturated regions does not hold in general; however, $|\mathcal{I}_{\mathrm{uns}}| \ll R$ is a common case in most practical MPC setups [36] (see the explicit vibration controller in Fig. 1 with $|\mathcal{I}_{\mathrm{uns}}| = 679$), and we show that it can be efficiently exploited in the proposed controller implementation. Denoting the vertex set of a polytope $\mathcal{P}$ by $\mathcal{V}(\mathcal{P})$, let us recall the algorithmic procedure for the construction of a convex lifting for a given state-space partition [28], herein aiming specifically at its unsaturated regions. It is summarized in Algorithm 2, and its outcome is the gains of a convex lifting defined over the unsaturated regions.
Algorithm 2 Construction of a Convex Lifting for Unsaturated Regions of a Given Polytopic Partition
We remark that the feasibility of the optimization problem (8) can serve as a necessary and sufficient condition for the existence of a convex lifting of a given partition. Since there exist in fact infinitely many candidates for $z(x)$ belonging to the epigraph of $z(x)$, let us exclusively denote the convex lifting $z(x)$ obtained as per Algorithm 2 by $\ell(x)$, in this case $\ell_{\mathrm{uns}}(x)$, which is to be exploited by the two efficient EMPC implementation techniques presented next. Before proceeding, let us, therefore, denote the quantities resulting from Sections II-C1 and II-C2 by superscripts I and II, respectively.
1) Inverse Parametric Optimization-Based Implementation:
In the following, we recall a recent technique that exploits the concept of inverse parametric convex programming (pCP) via convex lifting [28] , [42] , [43] . It will be briefly described here, since it had drawn attention to the interesting properties of convex lifting that ultimately led to the main developments presented in Section II-C2, and was also shown to be relevant in robust and EMPC design [27] , [29] .
Inverse pCP (IpCP) is defined as an inverse optimality problem of pCP, which aims to build an alternative optimization problem characterized by an appropriate constraint set and a cost function such that its optimal solution coincides with the one of an original problem. In particular, the goal of inverse parametric linear programming/inverse parametric quadratic programming (IpLP/IpQP) is to construct a linear constraint set and a linear/quadratic cost function such that the optimal solution of this newly formulated problem is equivalent to a given PWA function defined over a given polyhedral partition. Construction of such an optimization problem based on convex lifting was proposed in [28] and provided a novel perspective on the structural link between linear MPC design and pCP that can be stated as follows: every continuous PWA control law can be recovered via a linear optimal control problem with control horizon at most equal to two prediction steps.
Considering, e.g., the IpLP problem, a given continuous PWA function $\kappa(x)$ defined over a polyhedral partition $\{\mathcal{R}_i\}_{i \in \mathcal{I}_R}$ is the image via the orthogonal projection onto $\mathbb{R}^{n_u}$ ($= \mathbb{R}$ in this case) of the optimal solution to the parametric LP below [28]:
where $z$ represents a 1-D auxiliary "lifting" variable, and the constraint set $\Pi_{[x^T\ z\ u]^T}$ is obtained as follows:
and can be equivalently expressed as
is simply composed of the partition vertices $v \in \bigcup_{i \in \mathcal{I}_R} \mathcal{V}(\mathcal{R}_i)$ appended by the corresponding values of the convex lifting $\ell(v)$ and the control action $\kappa(v)$ in the augmented space $\mathbb{R}^{n_x + 1 + n_u}$, in the case of system (1), $\mathbb{R}^4$.
Algorithm 3 Extended IpLP With Clipping [29]
Since the solution of the "horizon 2" IpLP problem (9) exactly recovers the original "horizon N" pLP/pQP-based EMPC solution, its structural complexity and hence also online implementation effort remain the same. Let us, therefore, recall the extension proposed recently in [29], aiming at practical EMPC problems featuring active input constraints, i.e., a large number of saturated regions, which typically inhibit direct convex liftability. The so-called extended IpLP procedure is summarized in Algorithm 3. It starts by finding the lifting $\ell_{\mathrm{uns}}(x)$ to be used for the construction of a polyhedron $\Pi_{\mathrm{uns}\,[x^T\ z\ u]^T}$ in the augmented space. This is then exploited by keeping only the constraints (facets of $\Pi_{\mathrm{uns}\,[x^T\ z\ u]^T}$) that practically contribute to the optimal solution [29], yielding a constraint set $\tilde{\Pi}$ of the extended IpLP (12). Its solution consists of $\tilde{\ell}^{I}(x)$, $\tilde{\kappa}^{I}(x)$ defined over $\{\tilde{\mathcal{R}}^{I}_j\}_{j \in \mathcal{I}_{\tilde{R}^{I}}}$, a rearranged partition of $\mathcal{R}$. The PWA function $\tilde{\kappa}^{I}(x)$ (13) obtained from the extended IpLP (12) is by Definition 4 a suitable augmentation of $\kappa(x)$ (4), i.e., only the portion defined over $\mathcal{R}_{\mathcal{I}_{\mathrm{uns}}}$ is retrieved exactly. To establish the equivalence between the two over the entire feasible domain $\mathcal{R} = \mathrm{dom}(\kappa(x))$, and thus the optimality, we employ a clipping filter $\phi(\cdot)$ adopted from [41] as follows:
$$\phi(\tilde{\kappa}(x)) = \max\{\underline{\kappa}, \min\{\overline{\kappa}, \tilde{\kappa}(x)\}\} \tag{14}$$
This way, $\tilde{\kappa}(x)$ inherits all the performance, closed-loop stability, and feasibility guarantees of the original EMPC feedback.
We remark that if the EMPC problem yields a partition with $|\mathcal{I}_{\mathrm{uns}}| \ll R$, then $|\mathcal{I}_{\mathrm{uns}}| \le \tilde{R}^{I} \ll R$ tends to hold as well [29]. While in [29] we proposed the traditional EMPC implementation via point location, yet using the lower complexity problem data given by $\tilde{\kappa}^{I}(x)$, $\{\tilde{\mathcal{R}}^{I}_j\}_{j \in \mathcal{I}_{\tilde{R}^{I}}}$, in the following we show that it may be performed in a substantially more efficient regionless fashion, employing the lifting $\ell_{\mathrm{uns}}(x)$ instead.
2) "Pure" Convex Lifting-Based Implementation: Recall that the output of Algorithm 2 is the gains $(a_i, b_i)$, $\forall i \in \mathcal{I}_{\mathrm{uns}}$, of a convex lifting $\ell_{\mathrm{uns}}(x)$, defined over $\mathcal{R}_{\mathcal{I}_{\mathrm{uns}}}$. Now, let us define the following convex lifting:
Since $\tilde{\ell}^{II}(x) = \ell_{\mathrm{uns}}(x)$ for all $x \in \bigcup_{i \in \mathcal{I}_{\mathrm{uns}}} \mathcal{R}_i$, let us, for the ease of presentation, denote the polytopic partition of $\mathcal{R}$ associated with $\tilde{\ell}^{II}(x)$ (15) by $\{\tilde{\mathcal{R}}^{II}_j\}_{j \in \mathcal{I}_{\tilde{R}^{II}}}$, $\tilde{R}^{II} = |\mathcal{I}_{\mathrm{uns}}|$. The corresponding PWA control law has the following property [38]:
it may be used within online implementation in the following fashion:
This property makes it easy to identify the index $j$ of the $j$th affine control law to be evaluated at a given $x_0 \in \mathcal{R}$, without the need to search for $\tilde{\mathcal{R}}^{II}_j$ by traversing $\{\tilde{\mathcal{R}}^{II}_j\}_{j \in \mathcal{I}_{\tilde{R}^{II}}}$, yielding a regionless EMPC controller. As in the case of $\tilde{\kappa}^{I}(x)$, optimality of the PWA feedback $\tilde{\kappa}^{II}(x)$ (possibly discontinuous over $\mathcal{R}_{\mathcal{I}_{\mathrm{sat}}}$) for any $x \in \mathcal{R}$ can be achieved by taking $\phi(\tilde{\kappa}^{II}(x))$.
Algorithm 4 Efficient Regionless Online EMPC Implementation Using Convex Lifting and Clipping
We remark that an EMPC implementation exploiting property (17) was proposed in [44] , yet employing the optimal cost function, which for the explicit solutions based on pLP (only if the optimizer is unique) represents nothing else than a convex lifting from the geometrical viewpoint. However, in the pQP case, such an approach is no longer applicable.
The online (convex) lifting-based EMPC (LEMPC) implementation is summarized in Algorithm 4. Note that executing Step 2 practically amounts to merely searching for the maximum among the list $\{\tilde{a}_j^T x_0 + \tilde{b}_j\}_{j=1}^{\tilde{R}^{II}}$ by sequentially comparing its components evaluated at $x_0$. A more illustrative, pseudocode implementation can be found in the Appendix.
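The regionless evaluation described above can be sketched in a few lines. The sketch assumes a single-input controller, lifting gains stored as pairs $(\tilde{a}_j, \tilde{b}_j)$, control laws stored as pairs $(\tilde{F}_j, \tilde{g}_j)$ in the same order, and saturation limits used by the clipping step; the function name `lempc_evaluate` and the one-dimensional toy data are hypothetical and do not reproduce the pseudocode of the Appendix.

```python
def lempc_evaluate(lifting, laws, k_min, k_max, x):
    """Regionless evaluation: maximize the lifting, evaluate that law, then clip.

    lifting: list of (a_j, b_j) gains of the convex lifting.
    laws:    list of (F_j, g_j) affine control laws, in the same order.
    """
    best_j, best_val = 0, float("-inf")
    for j, (a, b) in enumerate(lifting):
        # Sequentially compare the affine lifting terms evaluated at x.
        val = sum(ak * xk for ak, xk in zip(a, x)) + b
        if val > best_val:
            best_j, best_val = j, val
    F, g = laws[best_j]
    u = sum(fk * xk for fk, xk in zip(F, x)) + g
    # Clipping restores the saturated values outside the unsaturated regions.
    return min(k_max, max(k_min, u))


# Hypothetical toy data: unsaturated regions [-2, 0] and [0, 1] of a 1-D controller
# saturated at +1 on [-3, -2] and at -1 on [1, 3]; lifting l(x) = max(-x, x).
TOY_LIFTING = [([-1.0], 0.0), ([1.0], 0.0)]
TOY_UNS_LAWS = [([-0.5], 0.0), ([-1.0], 0.0)]

u0 = lempc_evaluate(TOY_LIFTING, TOY_UNS_LAWS, -1.0, 1.0, [-2.5])
```

No half-space description is stored or checked: in this toy case, states in the saturated regions are mapped to an unsaturated law whose value exceeds the limits, and the clipping step returns the saturated input.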
The interested reader is referred to [28] and [38] for a comprehensive theoretical background on the convex lifting concept. It is also shown therein that both the constructive and the implementation procedure may be easily generalized for systems with multiple inputs. An alternative construction of $\ell(x)$ based on the half-space representation of $\{\mathcal{R}_i\}_{i \in \mathcal{I}_R}$ is shown in [38] and should be preferred in the case of higher dimensional systems, since it avoids the vertex enumeration. Scalability of both the vertex-based and the half-space-based construction is also shown there.
Note that a component of the explicit solution of the extended IpLP (12) in Section II-C1, the convex lifting $\tilde{\ell}^{I}(x)$, can also be employed to accelerate the online EMPC implementation using Algorithm 4. However, in contrast to the latter, simplified procedure relying only on the construction of the convex lifting $\ell_{\mathrm{uns}}(x)$ itself (Algorithm 2), obtaining $\tilde{\ell}^{I}(x)$ per Algorithm 3 requires an extra off-line effort spent on constructing the constraint set $\tilde{\Pi}$ for the LP (12) and its subsequent explicit solution, i.e., steps 2-5, which renders this variant clearly redundant from the computational perspective.
3) Complexity Analysis: In terms of complexity, let us first assess the memory footprint of the particular EMPC implementations described above. Taking the original EMPC controller as a reference, the total memory consumed by the PWA feedback $\kappa(x)$ and its underlying partition $\{\mathcal{R}_i\}_{i \in \mathcal{I}_R}$ amounts to $\sum_{i=1}^{R} n_h^i (n_x + 1) + R n_u (n_x + 1)$ real numbers, where $n_h^i$ is the number of half-spaces defining the $i$th region. On the other hand, the memory footprint of the regionless LEMPC controllers (either I or II), given by $\{\tilde{F}_j, \tilde{g}_j, \tilde{a}_j, \tilde{b}_j\}$, amounts to $\tilde{R}(1 + n_u)(n_x + 1) + 2 n_u$ real numbers, where the second, negligible term represents the memory needed to encode $\{\overline{\kappa}, \underline{\kappa}\}$ used by the clipping function $\phi$. Recall that in the case of variant II (Section II-C2) $\tilde{R}^{II} = |\mathcal{I}_{\mathrm{uns}}|$, while $\tilde{R}^{I} \ge |\mathcal{I}_{\mathrm{uns}}|$ in general.
Second, let us quantify the necessary online computational effort. The critical time-consuming task in the standard EMPC implementation is the point location, i.e., finding the index $i$ of the region $\mathcal{R}_i$ that contains $x_0$, followed by a simple evaluation of the associated $i$th affine control law. Performing the direct sequential search per Algorithm 1 amounts in the worst case to $\sum_{i=1}^{R} n_h^i (2 n_x + 1) + 2 n_u n_x$ floating point operations (FLOPs). The regionless LEMPC implementation can solve this problem in an efficient way exploiting clipping (Algorithm 4), avoiding the expensive region-based point location. In total, it requires a constant number of $2 \tilde{R} n_x + 2 n_u n_x + 2 n_u$ FLOPs ($\tilde{R}^{II} = |\mathcal{I}_{\mathrm{uns}}|$), which implies a significant reduction in runtime complexity even if $|\mathcal{I}_{\mathrm{uns}}| = R$ were the case, while typically $|\mathcal{I}_{\mathrm{uns}}| \ll \sum_{i=1}^{R} n_h^i$. Note that none of the terms above consider storing or evaluating the full optimizer $U^\star$ with $N n_u$ elements, since only its first element $u_0^\star$ is required for implementation in a receding horizon fashion. The off-line computation times needed to construct $\tilde{\ell}(x)$ and $\tilde{\kappa}(x)$ are reported in Section III-A for completeness.
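The counts above are easy to evaluate for the vibration controller. The sketch below codes the LEMPC memory and FLOP expressions and the worst-case FLOP count of the sequential search as stated in the text, using $n_x = 2$, $n_u = 1$, $R = 5207$, $|\mathcal{I}_{\mathrm{uns}}| = 679$, and $\tilde{R}^{I} = 1160$. The number of half-spaces per region is not reported in this section, so the value of three per region used for the EMPC figure is an assumption that gives only a lower bound for bounded planar regions; the function names are hypothetical.

```python
def lempc_memory(r_tilde, n_x, n_u):
    """Real numbers stored by the regionless controller, including clipping limits."""
    return r_tilde * (1 + n_u) * (n_x + 1) + 2 * n_u


def lempc_flops(r_tilde, n_x, n_u):
    """Constant FLOP count of the regionless evaluation with clipping."""
    return 2 * r_tilde * n_x + 2 * n_u * n_x + 2 * n_u


def empc_flops_worst_case(halfspaces_per_region, n_x, n_u):
    """Worst-case FLOP count of the direct sequential search over all regions."""
    return sum(halfspaces_per_region) * (2 * n_x + 1) + 2 * n_u * n_x


N_X, N_U = 2, 1                     # SDOF model with a single input
R_ORIGINAL = 5207                   # regions of the original EMPC partition
R_VARIANT_I, R_VARIANT_II = 1160, 679

memory_ii = lempc_memory(R_VARIANT_II, N_X, N_U)
flops_ii = lempc_flops(R_VARIANT_II, N_X, N_U)
memory_i = lempc_memory(R_VARIANT_I, N_X, N_U)
flops_i = lempc_flops(R_VARIANT_I, N_X, N_U)

# Assumption: at least three half-spaces per bounded region in the plane.
flops_empc_lower_bound = empc_flops_worst_case([3] * R_ORIGINAL, N_X, N_U)
```

With these inputs, variant II stores 4076 real numbers and needs 2722 FLOPs per evaluation, variant I stores 6962 real numbers and needs 4646 FLOPs, and the assumed lower bound for the sequential search is 78109 FLOPs.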
One may also compare the above LEMPC techniques with other complexity reduction approaches in EMPC, e.g., with the clipping-based one in [41], as they both exploit the concept of clipping. The latter, however, relies on replacing the saturated regions with extensions of the unsaturated ones, for which the achievable reduction may range from none ($R_{[41]} = R$) to the case when the new partition of $\mathcal{R}$ has $R_{[41]} = |\mathcal{I}_{\mathrm{uns}}|$ regions. Another approach in [36] requires storing only the unsaturated regions by devising a separating function. Clearly, the family of these methods still requires storing a modified state-space partition, and hence performing the traditional point location. On the other hand, techniques aiming at faster online evaluation in EMPC usually come at the price of a larger memory footprint and/or preprocessing time [37], [45]. Alternatively, a memory-efficient regionless EMPC implementation was proposed recently in [46], combining the approaches in [47] and [48]. Its nature renders it applicable to problems with even a moderately large parametric space (however, only with a very short prediction horizon) and does not imply a reduction in the evaluation effort. LEMPC, as indicated above, enables a significant reduction in both storage and online computation requirements with a relatively low preprocessing load. For completeness, we remark that study [49] recently presented a linear machine concept, which is similar to the convex lifting presented in this paper. However, necessary and sufficient conditions for the existence of a convex lifting are not available therein, while these are stated in our previous studies such as [28] and [38].
The concepts presented in Sections II-C1 and II-C2, adopted for the experimental vibrating system described by (1), are shown in Figs. 2 and 3, respectively. In particular, Fig. 3(a) visualizes the PWA convex lifting $\tilde{\ell}^{II}(x)$ (15) for all $x \in \mathcal{R}$. We recall that the associated state-space partition $\{\tilde{\mathcal{R}}^{II}_j\}_{j \in \mathcal{I}_{\tilde{R}^{II}}}$ need not be constructed, and is depicted for illustration only. Fig. 3(b) shows in blue the associated control law $\tilde{\kappa}^{II}(x)$ [see (16)]. From the geometrical point of view, these can be interpreted as continuous extensions of $\ell_{\mathrm{uns}}(x)$ and $\kappa(x)$, $\forall x \in \bigcup_{i \in \mathcal{I}_{\mathrm{uns}}} \mathcal{R}_i$, respectively, over the entire feasible domain $\mathcal{R}$. The number of local control laws and convex lifting terms of such a regionless EMPC implementation is $\tilde{R}^{II} = |\mathcal{I}_{\mathrm{uns}}| = 679$. On the other hand, Fig. 2(a) and (b) shows the convex lifting $\tilde{\ell}^{I}(x)$ and the PWA feedback $\tilde{\kappa}^{I}(x)$, both obtained as the optimal solution of the "horizon 2" extended IpLP (12). These can be geometrically interpreted as orthogonal projections of $\tilde{\Pi}$ onto the respective subspaces. The number of the affine lifting as well as feedback terms in this case is $\tilde{R}^{I} = 1160$. Recall that in the case of the original MPC problem (3) with $N = 50$ solved as a pQP, the number of regions was $R = 5207$. Finally, the result of clipping per (14), which maintains the equivalence between $\tilde{\kappa}^{II}(x)$ ($\tilde{\kappa}^{I}(x)$) and $\kappa(x)$, is visualized in Fig. 3(a) (Fig. 2(a)) in orange color (cf. Fig. 1).
III. EXPERIMENTAL IMPLEMENTATION
A. Active Beam Test
The active cantilever beam first introduced in Section II-A is depicted in Fig. 5, where the fixed-free aluminum beam is shown mounted to a sturdy base. The deflection of the free end of the beam is measured by means of a Keyence LK-G80 spot-type laser triangulation head, providing its signal to a Keyence LK-G3000 series central processing unit. The beam is actuated by a pair of cross-coupled MIDÉ QP16N piezoceramic actuators at its clamped end, receiving the same amplified input signal with inverted polarity.
The configuration of the experimental setup is summarized in the simplified scheme shown in Fig. 4. The LEMPC algorithms introduced in Sections II-C1 and II-C2 were deployed in stand-alone mode to an MCU. The inputs generated by the controller are fed to an operational amplifier (Texas Instruments TLC2272CP), then to a signal processor (Advantech ADAM 3014) that shifts the signal levels to the bipolar configuration needed by the final capacitive amplifier (MIDÉ EL-1224) feeding the piezoceramic actuators. The measured deflection is read directly by the microcontroller in the form of a single analog signal from the laser triangulation system.
The cantilever beam was tested in a scenario frequently referred to as a release test. Release tests are commonly used to emulate, in a laboratory environment, the transient mechanical behavior encountered in aerospace applications [31]. In a release test, the free end of the beam is deflected by a force to an initial position and then released to reach its equilibrium with or without feedback control. The repeated release tests were generated by a stinger mechanism (not shown in Fig. 5) that pushes the end of the beam away from its zero position upon receiving a digital signal. The experiment itself is launched, controlled, and logged using an external computer, which sends the engage signals to the stinger mechanism and captures input, output, and timing data. This external computer contains a laboratory measurement card and runs a simple experiment control and data logging program under the Simulink real-time algorithm prototyping suite.
The microcontroller used in the experimental testing was an STMicroelectronics (STM) 32-bit (STM32) F051x-series MCU, specifically the STM32 F051R8T6. This MCU features 64 kB of nonvolatile read-only memory (ROM) and 8 kB of volatile random-access memory (RAM). The MCU considered for the laboratory test and the rest of the investigated devices belong to a family of microcontrollers that use the very popular ARM Cortex-M core design. The ARM Cortex-M family considered in this paper represents modern 32-bit embedded devices that are used in a wide variety of applications, including control systems technology. The F051x-series MCU in the laboratory test incorporates the basic M0 core architecture and is marketed as a low-cost entry-level device with a unit cost of approximately $1.50 for large quantities. Its maximum clock speed of 48 MHz exceeds the typical 20 MHz of budget 8-bit devices; however, it is still a true embedded device without the power and possibilities of its more expensive counterparts.
This microcontroller was tested using an STM32 F0 Discovery prototyping and evaluation board that integrates the MCU with a programmer and debugger and provides easy access to the physical pins of the MCU. The prototyping board employed in the beam tests is shown in Fig. 6. The control input is generated and the measured output is acquired via the integrated digital-to-analog and analog-to-digital converters of the MCU. The task execution timing (TET) is sent as an output to the data logging computer in the form of a simple digital signal that is logical true when the algorithm runs and false when it idles. The board outputs two TET signals: one for the LEMPC algorithm only, and the other for the entire feedback control algorithm, including input and output data processing and state estimation.
The peripheral initialization code for the processor was developed using the STM32CubeMX initialization code generator. The rest of the program was then written in C and compiled for the target using the IAR Embedded Workbench for ARM (EWARM). The feedback LEMPC algorithms proposed in Section II-C received state estimates from a time-varying linear Kalman filter employing the model from Section II-A. Input and output measurements were rescaled linearly according to the input and output measurement chains.
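One estimator update of the kind mentioned above can be illustrated as follows. This is a minimal sketch of a textbook linear Kalman filter for a two-state, single-input, single-output model, assuming that "time-varying" means the error covariance is propagated and the gain recomputed at every sample. The type `kf_t`, the function `kf_step`, and all matrix values a caller would supply are hypothetical and are not taken from the experimental code.

```c
#define KF_NX 2 /* number of states */

/* Filter data; names are hypothetical. The model is
 * x(k+1) = A x(k) + B u(k), y(k) = C x(k), with process noise
 * covariance Q and scalar measurement noise variance R. */
typedef struct {
    float A[KF_NX][KF_NX];
    float B[KF_NX];
    float C[KF_NX];
    float Q[KF_NX][KF_NX];
    float R;
    float x[KF_NX];        /* state estimate */
    float P[KF_NX][KF_NX]; /* estimate error covariance */
} kf_t;

/* One predict/update cycle using the applied input u and measurement y. */
void kf_step(kf_t *kf, float u, float y)
{
    float xp[KF_NX], AP[KF_NX][KF_NX], Pp[KF_NX][KF_NX];
    float PCt[KF_NX], CP[KF_NX], K[KF_NX];
    int i, j, k;

    /* Prediction: xp = A x + B u, Pp = A P A' + Q. */
    for (i = 0; i < KF_NX; ++i) {
        xp[i] = kf->B[i] * u;
        for (j = 0; j < KF_NX; ++j) {
            xp[i] += kf->A[i][j] * kf->x[j];
            AP[i][j] = 0.0f;
            for (k = 0; k < KF_NX; ++k) {
                AP[i][j] += kf->A[i][k] * kf->P[k][j];
            }
        }
    }
    for (i = 0; i < KF_NX; ++i) {
        for (j = 0; j < KF_NX; ++j) {
            Pp[i][j] = kf->Q[i][j];
            for (k = 0; k < KF_NX; ++k) {
                Pp[i][j] += AP[i][k] * kf->A[j][k];
            }
        }
    }

    /* Innovation and its variance S = C Pp C' + R. */
    float innov = y;
    float S = kf->R;
    for (i = 0; i < KF_NX; ++i) {
        innov -= kf->C[i] * xp[i];
        PCt[i] = 0.0f;
        CP[i] = 0.0f;
        for (j = 0; j < KF_NX; ++j) {
            PCt[i] += Pp[i][j] * kf->C[j]; /* column Pp C' */
            CP[i] += kf->C[j] * Pp[j][i];  /* row C Pp */
        }
    }
    for (i = 0; i < KF_NX; ++i) {
        S += kf->C[i] * PCt[i];
    }

    /* Correction: K = Pp C' / S, x = xp + K innov, P = Pp - K C Pp. */
    for (i = 0; i < KF_NX; ++i) {
        K[i] = PCt[i] / S;
        kf->x[i] = xp[i] + K[i] * innov;
        for (j = 0; j < KF_NX; ++j) {
            kf->P[i][j] = Pp[i][j] - K[i] * CP[j];
        }
    }
}
```

With a scalar output the innovation variance is a scalar, so the update needs a single division and no matrix inversion, which suits a core without hardware floating-point support.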
Finally, let us shed more light on the essential computation of the controllers discussed so far and used in the experiments. As described in Section II-A, the beam dynamics (1) were experimentally identified and then discretized with $T_s = 0.02$ s. The sampling time was chosen to enable implementation on low-cost hardware; it is also related to the dominant resonant frequency of the beam, such that the closed-loop system is sampled more than six times per oscillation period. The MPC problem (3) was formulated with a control horizon of $N = 50$ steps, which amounts to a prediction over 1 s. This horizon is motivated by the stability guarantees assumed in the problem formulation and allows the expected range of positions and velocities to be covered by the controller's domain of attraction [25]. In addition, the state and input penalties were chosen as $Q = I$ and $R = 1$, respectively, $P_N$ was set to the solution of the DARE, and $C_\infty$ was designed as the maximal control invariant set. This tuning attenuates the beam tip vibration efficiently without behaving like an overly aggressive "bang-bang"-like controller. The controllers used symmetric ±100-V input bounds to prevent depolarization of the piezoceramics. We remark that although state constraints are also allowed by the formulation, it is not practical to impose them on this process, as they would unnecessarily restrict the vibration damping performance and their feasibility cannot be guaranteed in the presence of transient excitation. The resulting MPC problem was solved parametrically using the Multi-Parametric Toolbox (MPT) [50]. The EMPC feedback in Fig. 1 was obtained in 10 min on a 2.2-GHz Core i7 CPU with 8 GB of RAM running MATLAB 8.5, MPT3, and YALMIP [51]. The respective data were used to construct the LEMPC controllers per Sections II-C1 and II-C2, visualized in Figs. 2 and 3.
In particular, the "legacy" convex lifting $\tilde{\ell}_{\mathrm{I}}(x)$ was obtained in 250 s (via Algorithm 3), whereas finding the convex lifting $\tilde{\ell}_{\mathrm{II}}(x)$ took a mere 3 s [Algorithm 2 and (15)], which represents a negligible postprocessing effort compared with the time spent on computing the nominal EMPC controller. The obtained LEMPC controllers were then independently passed through a custom code generation routine to provide efficient C code tailored for the embedded target and integrated into the main code described above.
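The figures quoted above can be cross-checked with a few lines of arithmetic. The symbols $T_{\mathrm{osc}}$ and $f_{\mathrm{res}}$ for the dominant oscillation period and resonant frequency are introduced here for illustration only. The bound on $f_{\mathrm{res}}$ is merely what sampling more than six times per oscillation period implies, not an identified value.

```latex
\begin{align*}
N T_s &= 50 \times 0.02\,\mathrm{s} = 1\,\mathrm{s}, \\
T_{\mathrm{osc}} &> 6\,T_s = 0.12\,\mathrm{s}
  \quad\Longrightarrow\quad
  f_{\mathrm{res}} = \frac{1}{T_{\mathrm{osc}}} < \frac{1}{0.12\,\mathrm{s}} \approx 8.33\,\mathrm{Hz}, \\
\frac{3\,\mathrm{s}}{10\,\mathrm{min}} &= \frac{3\,\mathrm{s}}{600\,\mathrm{s}} = 0.5\,\%.
\end{align*}
```

The last line puts the postprocessing effort for the second lifting in proportion to the time needed to compute the nominal EMPC controller.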
B. Cross-Platform Comparison
The cross-platform microcontroller comparison tests were performed similarly to the experimental investigation with the active cantilever beam, but on the processor only, without the physical beam assembly. The goal of these tests is not to evaluate the vibration damping performance of the controllers, as this should be the same in all cases, but to provide a common foundation for comparing the memory needs and the execution timing of the algorithms. The critical question is then whether a given class of microcontrollers has enough memory and computation power for similar applications employing the LEMPC methodology proposed here.
Therefore, in these trials, only the C code for the LEMPC algorithms, without state estimation or input-output pre- and postprocessing, was implemented on the various microcontrollers. Each controller implementation was tested in a double- and a single-precision version. The former represents the baseline controller version in finite-precision digital computing, but in certain cases, one may limit the memory footprint and computation needs by using only single-precision data without significantly affecting the quality of control [52]. This in turn allows the use of better performing controllers on the same hardware or a further reduction of hardware cost by using less expensive MCUs. The numeric precision was varied only in the feedback control algorithms, not in the MCU initialization code; however, the chip, clock, and peripheral initialization routines are unlikely to contain floating-point arithmetic anyway.
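One common way to build both precision variants from a single source file is a compile-time type switch, sketched below. This is an illustration only: the macro `LEMPC_SINGLE`, the type `lempc_real_t`, and the helper `lempc_table_bytes` are hypothetical names, and the paper does not state how its code generation routine handles precision.

```c
#include <stddef.h>

/* Define LEMPC_SINGLE at compile time to build the single-precision
 * variant; without it, double precision is used. Names are hypothetical. */
#ifdef LEMPC_SINGLE
typedef float lempc_real_t;
#else
typedef double lempc_real_t;
#endif

/* Size in bytes of a static coefficient table holding n_terms terms
 * with n_coef coefficients each. */
size_t lempc_table_bytes(size_t n_terms, size_t n_coef)
{
    return n_terms * n_coef * sizeof(lempc_real_t);
}
```

In a single-precision build, floating-point constants also need the `f` suffix (or an explicit cast). Otherwise C promotes the expression to `double`, and the slower double-precision routines are used anyway.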
In order to make the timing and memory data meaningful for comparison, all controllers were created using the same model and identical parameters. In every test case, a common, fixed initial state of $x(0) = [-5\ \mathrm{mm} \quad 400\ \mathrm{mm\,s^{-1}}]^T$ was considered. This state simulates an initial condition in which the beam is deflected to a position of $-5$ mm and has an initial velocity of $400\ \mathrm{mm\,s^{-1}}$. This initial state is far outside the terminal set of the controllers and emulates a state for which the constraints are guaranteed to be active.
Just as in the experiments with the active beam, the MCUs were initialized using the code generated automatically by STM32CubeMX, and the program was then compiled and deployed using IAR EWARM. In order to minimize the code overhead added to the LEMPC algorithms, all peripherals were disabled in software except for a single digital output pin and the single-wire debugging capability of the chip. This means that, although the memory footprint listed here includes a small overhead for initializing the MCU, this overhead is kept at an absolute minimum. The digital signal from the remaining pin was used to measure the TET using an oscilloscope.
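The pin-based timing measurement described above can be sketched as follows. So that the example runs on a host computer, a plain variable stands in for the GPIO output register; on a real MCU the two helpers would write to the memory-mapped register of the chosen pin. All names (`fake_gpio_odr`, `TET_PIN_MASK`, `timed_control_step`, `demo_controller`) are hypothetical, and the placeholder controller only exists to make the sketch self-contained.

```c
#include <stdint.h>

/* Stand-in for a GPIO output data register (hypothetical). */
static volatile uint32_t fake_gpio_odr = 0u;
#define TET_PIN_MASK (1u << 0)

static inline void tet_pin_high(void) { fake_gpio_odr |= TET_PIN_MASK; }
static inline void tet_pin_low(void)  { fake_gpio_odr &= ~TET_PIN_MASK; }

/* Records the pin level seen inside the controller call (test aid only). */
static uint32_t pin_during_call = 0u;

/* Placeholder controller; a real build would call the LEMPC routine. */
static float demo_controller(const float *x)
{
    pin_during_call = fake_gpio_odr & TET_PIN_MASK;
    return -x[0];
}

/* Wrap one controller call between a rising and a falling pin edge, so
 * that the pulse width seen on an oscilloscope corresponds to the task
 * execution time of that call. */
float timed_control_step(float (*controller)(const float *x), const float *x)
{
    tet_pin_high();
    float u = controller(x);
    tet_pin_low();
    return u;
}
```

Measuring the pulse externally keeps timers and their interrupt handlers out of the program, which is in line with the aim of minimizing code overhead.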
The range of MCUs tested in this paper represents a good cross section of the devices typically available on the market at the time of writing. The microcontrollers considered in this paper are photographed and compared to scale in Fig. 7. All MCUs were tested using electronic prototyping boards by STM (Discovery-series boards) for easy physical pin access and integrated programming and debugging features. The MCUs evaluated here range from devices that are considered to be entry level (F051x), extremely cost-efficient (F030x), or low power (L100x) up to chips created for high-performance embedded applications (e.g., F407x and F746x). Obviously, the price point of the microcontrollers depends on many factors not directly related to computing performance or memory, such as peripheral selection or purchase quantity. The unit cost of the devices considered here ranges from approximately U.S. $1.50 to U.S. $10 for large quantities.
The exact types and basic specifications of the MCUs considered here will be introduced along with the experimental results in Section IV. The nonvolatile memory used to store the program on the microcontroller ranged from 64 up to 2048 kB, while the volatile data memory ranged from 8 up to 340 kB. All microcontrollers were clocked at their respective maximum clock speeds, ranging from 24 MHz up to a considerable 216 MHz, except for the STM32 F303x, which was tested at 64 MHz instead of its 72-MHz maximum due to a hardware limitation of the prototyping board.
Besides the raw clock speed, the real performance of a microcontroller depends on many other factors, including the core design. The MCUs employed here incorporate the ARM Cortex-M0, M3, M4, and M7 architectures. Generally speaking, the architecture may affect both the memory footprint and the execution speed of the compiled program, even when using exactly the same high-level code and accounting for clock speed differences. Some of the controllers tested here also featured a floating-point unit (FPU), which may in theory affect the computational performance of predictive controllers and, in fact, of any task with floating-point arithmetic. Although a previous work did not suggest a considerable performance increase with MCUs using their FPU [52], and while any EMPC implementation is an unlikely candidate for FPU-boosted efficiency, we have turned these components on for all tests featured in this paper, where available in the core.
IV. RESULTS AND DISCUSSION
A. Active Beam Test
First, let us review the results from the laboratory experiments with the active beam. A typical release test is shown in Fig. 8, with output (position), input (voltage), and execution timing depicted from top to bottom. The open-loop response is shown in the background and is compared with feedback control using the LEMPC algorithm. As expected, the settling time of the transients is reduced considerably (∼20 times) by the active control. Also, although the input constraints are active for a while, they are inherently respected by MPC. None of these facts should be surprising, as the LEMPC formulation proposed here creates exactly the same outputs as any other optimal control algorithm using dual-mode infinite-horizon predictive control with guaranteed stability.
In other words, neither theoretically nor practically is there any difference between the inputs generated by the proposed LEMPC controllers and other known MPC algorithms with the same configuration. This, of course, also means that the vibration damping performance, or more generally the control performance, remains unchanged.
Therefore, the important feature of the LEMPC framework is not an increased control performance but its memory and timing efficiency, which may enable the deployment of fast sampling applications even on simple hardware. Fig. 8 (bottom) shows the execution timing for both versions of LEMPC introduced in Section II-C in both double- and single-precision implementations. The chosen baseline hardware executes the controller from Section II-C1 with a total TET approaching the 20-ms sampling period, while the controller from Section II-C2, central to this paper, is executed in less than 11 ms. By employing the single-precision floating-point format, these figures can be reduced further by nearly a factor of two, without any noticeable deterioration in the damping performance. Note that the overhead (state estimation and data processing) amounts to roughly 1 ms. The timing chart shown here indicates that the online execution timing of LEMPC does not depend on the state estimates, just as suggested in Section II-C3. This is unlike the execution time of nominal EMPC employing sequential search, which depends on the location of the state within the polyhedral partition as well as on the complexity of its regions [26]. The controller from Section II-C1 occupies most of the available nonvolatile memory of the F051x MCU, taking up 58 kB of the available 64 kB. Approximately 22 kB of this memory is used by the program itself and 36 kB by the data, meaning the static variables. The even more efficient controller of Section II-C2 requires a mere 36 kB of nonvolatile memory for the entire feedback control algorithm implementation, including state estimation and data processing. Volatile memory needs for dynamic variables are minimal in all cases, well under 1 kB. We remark that we also attempted to run the nominal EMPC controller on the F051x MCU.
However, a formulation with a prediction horizon of only 14 steps was the largest that could fit into the available memory of the MCU. Unfortunately, such a controller did not yield a feasible set large enough to allow for experimental testing. More details on the embedded implementation of explicit MPC vibration control, yet without stability guarantees, can be found in our previous study [26].
To appreciate the mere fact that both LEMPC versions were deployable in the fairly limited memory of the F051x MCU and ran in real time in under 20 ms, one has to recollect the typical hardware requirements for dual-mode infinite-horizon MPC with stability guarantees. It is extremely unlikely that an online QP strategy would be able to solve the same problem within 20 ms using the available 48-MHz clock speed without resorting to a suboptimal solution, although it could possibly fit into the available 64-kB ROM. On the other hand, the clock speed and architecture of this low-end MCU could be sufficient to evaluate a nominal EMPC formulation online even with sequential search, but it would be out of the question to fit its memory footprint under 64 kB. To put this in perspective: an equivalent MPC formulation with guaranteed stability is near-impossible to deploy to the same hardware by other currently known means.
B. Cross-Platform Comparison
Continuing in this spirit, in the following paragraphs we focus only on the timing and memory footprint of the two LEMPC algorithms in single- and double-precision formats. The results of the STM cross-platform comparison are summarized in Table I. The left side of the table lists the type designation of the microcontrollers, their architecture family, clock speed, the maximum available nonvolatile ROM for the program and data, and the volatile RAM for variable manipulation. Note that the maximum nonvolatile memory refers to that available on the chip itself, except for the F429, for which the external memory of the prototyping board is also listed. Table I is further divided into the two variants of the LEMPC algorithm, each of which was tested in double and single numeric precision, giving a total of four test scenarios. Each of these lists the nonvolatile ROM footprint for the complete algorithm, including the overhead, split into program (CODE) and static data (DATA) as reported by the compiler. The RAM contains the dynamic variables, though this footprint is minimal for the proposed controllers and stays the same size across hardware versions.
Timing is characterized by the task execution time as measured for the single initial state $x(0)$ without overhead; however, this timing metric is indicative of any other state as well. The absolute execution timing is, of course, a function of the computational performance of the microcontroller. The final piece of information presented in Table I is the normalized TET, which attempts to level out the architectural differences between the microcontrollers shown here. The Dhrystone million instructions per second (DMIPS) metric indicates the relative processor efficiency, but does not account for floating-point operations. The DMIPS metric is readily available for the processors tested here and accounts for some of the differences in architecture. A typical 8-bit microcontroller achieves 1.0 DMIPS for each MHz of its clock speed, essentially performing a single elementary operation in each clock cycle. The relationship is more complicated in modern 32-bit MCUs, as the devices tested here range between 0.8 and 2.1 DMIPS/MHz. Rescaling this to the actual clock speed and then dividing the TET by it yields the time necessary to evaluate the LEMPC algorithm for one DMIPS, which we chose to call the normalized TET in Table I.
Let us first look at the differences and similarities for a given experimental configuration, that is, for a given LEMPC variant and numeric precision. Upon inspecting the table, one may see that the volatile memory remains virtually unchanged across platforms for the same type of controller. This is also true for the different variants, as the RAM footprint remains well under 1 kB in all cases. This size is not prohibitive even for the most basic types of modern MCUs. The nonvolatile memory footprint varies only slightly for a given test case; this can be caused by minor differences in architecture or by the size of the overhead, which must include the initialization code for the given MCU. Timing varies greatly among the tested platforms for a given test case, as it depends on clock speed and architecture.
On average, switching from double to single precision yields a ∼1.9 times smaller footprint in ROM for both LEMPC implementations. Changing the formulation from the one in Section II-C1 to the one in Section II-C2 further reduces the nonvolatile memory footprint by a factor of ∼1.6 for both numerical precision cases. As for the effect of numerical precision on computational effort, the TET is reduced by a factor of anywhere from ∼1.5 to 7.7 (averaging around 4), depending strongly on architecture and clock speed. This suggests that limiting the numeric precision, or even converting to fixed point, can broaden the horizons for applying predictive control on cheap hardware, provided the reduced precision does not affect algorithm convergence or control quality. We remark that the possible performance loss due to lower precision may be further investigated from a theoretical point of view via a fragility analysis [53], based on disturbances of the explicit PWA control law.
Yet again, the relative differences between the two proposed LEMPC techniques and between the numerical precisions are unimportant compared with the fact that the traditional implementation of EMPC or most implicit quadratic programming solvers could not have solved the same problem on the majority of the investigated MCUs, either due to their large memory needs or their computational requirements. We may consider the memory footprint and online evaluation effort of the proposed controllers from two viewpoints. By using efficient MPC algorithms, one can either opt for much cheaper control hardware for the same application or use more complex models or higher sampling rates than before. Table I shows both of these critical aspects. By implementing the proposed algorithms, a cheap entry-level MCU such as the F100x can compute the same optimal constrained MPC moves with guaranteed stability and even long horizons that would otherwise need powerful hardware [25]. Conversely, high-end microcontrollers such as the F407x can execute the benchmark vibration control problem in the microsecond range, suggesting that even faster applications or better mathematical models are within the realm of possibility.
V. CONCLUSION
This paper presented a methodology to synthesize memory- and time-efficient MPC solutions, herein utilized to suppress lateral vibrations of an active cantilever beam. The key feature of the proposed convex lifting-based implementation is that the online algorithm is extremely simple and has only a tiny footprint, which makes it deployable even on low-end embedded control hardware. In addition, the optimal explicit solution to a given MPC problem can be evaluated at runtime with predefined guarantees on the online computation time, allowing for an easy verification of the correctness of the implementation in rapid prototyping design and real-world applications. The proposed framework was experimentally tested to provide optimal vibration control performance, accompanied by a comprehensive cross-platform comparison study using a family of 32-bit microcontrollers, suggesting a step toward feasible embedded implementations in memory- and time-critical optimal control applications.
The proposed framework will likely allow the use of more complex structural vibration models that cover further resonant frequencies and modes, possibly including rotational degrees of freedom. Further research will explore this possibility and address the theoretical and implementation challenges posed by the need for more complex observer algorithms.
APPENDIX
See Algorithm 5 (Pseudocode Implementation of Algorithm 4).
