Predictive Control of a Boeing 747 Aircraft using an FPGA by Hartley, Edward et al.
Predictive Control of a Boeing 747 Aircraft
using an FPGA ?
Edward N. Hartley ∗ Juan L. Jerez ∗∗ Andrea Suardi ∗∗
Jan M. Maciejowski ∗ Eric C. Kerrigan ∗∗∗
George A. Constantinides ∗∗
∗ Engineering Department, Cambridge University, CB2 1PZ, United
Kingdom (e-mail: {enh20,jmm}@eng.cam.ac.uk).
∗∗Dept. of Electrical and Electronic Engineering, Imperial College
London, SW7 2AZ, United Kingdom (e-mail:
{jlj05,a.suardi,gac1}@imperial.ac.uk)
∗∗∗Dept. of Electrical and Electronic Engineering and Department of
Aeronautics, Imperial College London, SW7 2AZ, United Kingdom
(e-mail: e.kerrigan@imperial.ac.uk)
Abstract: New embedded predictive control applications call for more efficient ways of solving
quadratic programs (QPs) in order to meet demanding real-time, power and cost requirements.
A single precision QP-on-a-chip controller is proposed, implemented in a field-programmable
gate array (FPGA) with an iterative linear solver at its core. A novel oﬄine scaling procedure is
introduced to aid the convergence of the reduced precision solver. The feasibility of the proposed
approach is demonstrated with a real-time hardware-in-the-loop (HIL) experimental setup where
an ML605 FPGA board controls a nonlinear model of a Boeing 747 aircraft running on a desktop
PC through an Ethernet link. Simulations show that the quality of the closed-loop control and
accuracy of individual solutions is competitive with a conventional double precision controller
solving linear systems using a Riccati recursion.
1. INTRODUCTION
For large predictive control problems, the computation and
storage of the explicit solution (Bemporad et al., 2002)
is not a feasible option and the QP has to be solved
online at every sampling instant. For a wide range of
control applications that could benefit from employing
predictive control, this approach is still either too compu-
tationally demanding to meet real-time requirements with
conventional computing platforms, or the cost and power
requirements of the necessary hardware are unfavourable.
FPGAs are reconfigurable chips that can be customised
for a specific application and have the potential to address
all of these problems. For this class of devices, transistor
scaling is still improving the performance and reducing
the cost in every new technology generation, meaning that
it has become possible to implement complex floating-
point algorithms exploiting wide parallelisation. For pre-
dictive control applications, FPGAs are easily embedded
as a system component, and offer cycle-accurate timing
guarantees. The low clock frequencies compared to high-
performance microprocessors translate to lower power
consumption, whereas the performance can be regained
through parallelisation and customisability.
Resource usage depends on computation precision. More
operations can be performed in parallel by using fewer bits
? This work was supported by the EPSRC (Grants EP/G031576/1
and EP/I012036/1) and the EU FP7 Project EMBOCON, as well
as industrial support from Xilinx, the Mathworks, and the European
Space Agency.
to represent each number, hence numerical issues are of
crucial importance for an efficient FPGA implementation.
In this paper a novel procedure for scaling the QP matrices
allows implementation of an interior-point solver in single
precision floating-point, where the solution of systems of
linear equations, which is the computational bottleneck
of the algorithm, is achieved with a parallel implementa-
tion of an un-preconditioned iterative linear solver. The
resulting solution accuracy is competitive with a double
precision implementation of a factorisation-based method,
even at the later iterations of the interior-point method,
when the equations to be solved become ill-conditioned.
There have been several prior FPGA implementations of
predictive controllers. In Ling et al. (2008), a high-level
synthesis tool that converts a C-like description into a
hardware description was used to show that it was feasible
to achieve speeds comparable to Matlab implementations
for an aircraft example with four states. Vouzis et al.
(2009) proposed a mixed software/hardware implemen-
tation where the core matrix computations are carried
out in parallel custom hardware, whereas the remaining
operations are implemented in a general purpose micro-
processor. The performance was evaluated on two state
systems. Recently, Wills et al. (2012) proposed a (very) re-
duced precision interior-point implementation that solved
a condensed QP with impressive computation times and
demonstrated its feasibility on a experimental setup with
a real two state plant.
For this implementation, the control of a nonlinear model
of a Boeing 747-200 is considered. The resulting QP,
posed in a non-condensed form, considering 12 states and
17 constrained inputs, is substantially larger than any
prior FPGA demonstration. In addition, controlling this
plant is significantly more numerically challenging than
in any related prior work. A faster-than-real-time setup
where the QP solver is implemented in a Xilinx Virtex
6 chip on an ML605 Evaluation board, controlling the
nonlinear model of the plant running in Simulink on a
desktop computer, communicating using UDP/IP over a
100 MBit/s Ethernet link is used to demonstrate that the
real-time requirements for the application can be met with
current FPGA technology.
The remainder of the paper is organised as follows: Sec-
tion 2 presents the linearised model of the Boeing 747
aircraft and defines the optimal control problem; Section 3
gives the key characteristics of the interior-point algorithm
and the hardware architecture; the scaling procedure is
described in Section 4; Section 5 describes the details of
the experimental setup; and results and conclusions are
presented in Section 6.
2. BOEING 747 MODEL AND CONTROL PROBLEM
The FPGA-based controller is demonstrated through the
control of a nonlinear model of a Boeing 747-200, modified
to enable all control surfaces to be individually manipu-
lated (Edwards et al., 2010; van der Linden et al., 2011).
The predictive controller is used to track state and in-
put reference signals from an external target calculator,
enabling roll, pitch and airspeed reference set points to
be tracked with no steady state offset. These references
are provided by a higher level controller—in this case an
outer PID loop, which tracks a piecewise-linear altitude
and yaw-angle reference trajectory.
A linear prediction model is obtained by linearising the
nonlinear model around an equilibrium trim point for
straight and level flight at an altitude of 600 m and
an airspeed of 133 ms−1 (corresponding to a point soon
after take-off), and discretising this at a sampling period
Ts = 0.2 s. The actual state and input vectors for the
nonlinear plant at time step k will be denoted x(k) and
u(k) respectively. The state and input at the equilibrium
trim point will be denoted x, and u.
Defining δx(k) , (x(k)−x) and δu(k) , (u(k)−u) as the
deviations from the trim point, and letting w(k) denote
an unmeasured disturbance, the prediction model takes
the form
δx(k + 1) = Aδx(k) +Bδu(k) +Bdw(k). (1)
Twelve states are considered in the prediction model: roll
rate (rad/s), pitch rate (rad/s), yaw rate (rad/s), airspeed
(m/s), angle of attack (rad), sideslip angle (rad), roll (rad),
pitch (rad), and four engine states. A further two states,
yaw angle (rad), and altitude (m) will be considered by
the observer and outer-loop controller. Seventeen bound
constrained inputs are controlled (see Table 1).
2.1 Control system architecture
The scenario considered is the tracking of a piecewise linear
trajectory consisting of a period of straight and level flight,
Table 1. Input constraints
Input Feasible region Units
1,2 Right, left inboard aileron [−20, 20] deg
3,4 Right, left outboard aileron [−25, 15] deg
5,6 Right, left spoiler panel array [0, 45] deg
7,8 Right, left inboard elevator [−23, 17] deg
9,10 Right, left outboard elevator [−23, 17] deg
11 Stabiliser [−12, 3] deg
12,13 Upper, lower rudder [−25, 25] deg
14–17 Engines 1–4 [0.94, 1.62] –
followed by a 90◦ change in heading followed by a 200 m
descent, followed by a 10 ms−1 deceleration.
An observer is used to obtain filtered state esimates, δxˆ(k)
and obtain an estimate wˆ(k) of a further 10 modelled
integrating disturbance states. Time varying yaw, altitude
and airspeed setpoints are provided to outer PID control
loops (implemented on the PC), which provide roll, pitch
and airspeed set points. A separate target calculator is
used to obtain a target equilibrium pair (δx∞, δu∞) for the
required roll, pitch and airspeed setpoint. Letting umin and
umax be the vectors corresponding to the lower and upper
bounds of the manipulated inputs, δumin , umin − u and
δumax , umax−u, Q ≥ 0 and R > 0, the target calculator
solves
min
δu∞,δx∞
δxT∞Qδx∞ + δu
T
∞Rδu∞ (2a)
subject to [
(A− I) B
Hr 0
] [
δx∞
δu∞
]
=
[−Bdwˆ
r
]
(2b)
δumin ≤ δu∞ ≤ δumax. (2c)
For this study, it is assumed that an equilibrium pair
compatible with these constraints exists. This is used as a
setpoint for the predictive controller which is implemented
on the FPGA. The linearisation point remains unchanged.
2.2 MPC Regulation Problem
Define the notation δxi and δui to denote predicted state
and input deviations from the trim point at a time instant
i steps in the future and let θ =
[
δxT0 δu
T
0 . . . δx
T
N
]T
, and
‖x‖2M , xTMx. Letting Q ≥ 0 and R > 0 be weighting
matrices, and P > 0 be the solution to the discrete
time algebraic Riccati equation (Franklin et al., 1990),
the optimal control problem (OCP), of horizon length N ,
solved at each time step is
min
θ
‖ (δxN − δx∞) ‖2P
+
N−1∑
k=0
(‖ (δxk − δx∞) ‖2Q + ‖ (δuk − δu∞) ‖2R) (3a)
subject to
δx0 = δxˆ(k) (from observer) (3b)
δxk+1 = Aδxk +Bδuk +Bdwˆ(k), k ∈ 0, . . . , N − 1 (3c)
δumin ≤ δuk ≤ δumax k ∈ 0, . . . , N − 1. (3d)
For simplicity, state constraints are neglected, and with
only input constraints considered, no extra measure is
needed to guarantee that a solution of the QP exists.
2.3 Finite-horizon optimal control problem as a quadratic
program
Letting ⊗ be the Kronecker product, ⊕ be the matrix
direct sum, Ii ∈ Ri×i be an identity matrix, and 1i ∈ Ri×1
be a vector of ones, let
H , (IN ⊗ (Q⊕R))⊕ P (4a)
G ,
[
IN ⊗
[
0m×n Im
0m×n −Im
]
, 02Nm×n
]
(4b)
F ,

−In
A B −In
...
. . .
−In
 (4c)
h ,
[−1TN ⊗ [xT∞Q uT∞R] −xT∞P ]T (4d)
g , 1N ⊗
[
δuTmax −δuTmin
]T
(4e)
f ,
[−xˆT (k) −(1TN ⊗ wˆT (k)BTd )]T . (4f)
The OCP (3) is posed as a quadratic program:
min
θ
1
2
θTHθ + hT θ subject to Gθ ≤ g, Fθ = f. (5)
3. HARDWARE IMPLEMENTATION
In this section the main characteristics of the hardware
architecture are described. For more details, see Jerez et al.
(2011). At each time step, the quadratic program (5) is
solved using Algorithm 1 (Wright, 1993) assuming general
state and input constraints and dense cost and dynamics
matrices.
Algorithm 1. primal-dual interior-point (PDIP) algorithm
1. Initialization θ0 := 0.05, ν0 := 0.3, λ0 := 1.5,
s0 := 1.5, σ := 0.35, IIP = 18.
for k = 0 to IIP − 1
2. Linearization Ak = Aˆ+
[
Φk 0
0 0
]
bk :=
[
rθk
rνk
]
where
Aˆ ,
[
H FT
F 0
]
, Φk , GTW−1G, W−1 , ΛkS−1k
rθk , −Φkθk − h− FT νk −GT (λk − ΛkS−1k g + σµks−1k ),
rνk , −Fθk + f,
µk ,
λTk sk
Nl + 2p
.
3. Solve Akzk = bk for zk =:
[
∆θk
∆νk
]
4. ∆λk , ΛkS−1k (G(θk + ∆θk)− g) + σµks−1k
∆sk , −sk − (G(θk + ∆θk)− g)
5. Line Search αk , max(0,1] α :
[
λk + α∆λk
sk + α∆sk
]
> 0.
6. (θk+1, νk+1, λk+1, sk+1) ,
(θk, νk, λk, sk) + αk(∆θk,∆νk,∆λk,∆sk)
end
Table 2. Resource usage of the FPGA mounted
on the Xilinx ML605 Evaluation Board. An
FPGA consists of look-up tables (LUTs), flip-
flops (FFs), embedded RAM blocks and mul-
tiplier blocks (DSP48s).
MicroBlaze MINRES solver Sequential stage
LUTs 7125(4.7%) 61341(40.7%) 1960(1.3%)
FFs 6930(2.3%) 88217(29.3%) 3039(1.0%)
BRAMs 20(4.8%) 76(18.3%) 14(3.4%)
DSP48s 3(0.4%) 205(26.7%) 2(0.3%)
The hardware implementation is split into two distinct
blocks. One block accelerates the computational bottle-
neck of the algorithm (solving the linear equations in step 3
in Algorithm 1) implementing a parallel minimum resid-
ual (MINRES) solver. MINRES is an iterative method,
where the main operation is performing a matrix-vector
multiplication (banded in this case) at each iteration. In
the present implementation, this operation is accelerated
by computing dot-products in parallel by performing all
the multiplications concurrently and passing the results
through an adder reduction tree. The other block imple-
ments the remaining operations in Algorithm 1. It consists
of a sequential machine with custom instructions that
reduce instruction storage requirements and is close to
100% efficient, i.e. it produces one result every clock cycle.
There are several reasons why an iterative method can
be preferable in this context. Firstly, the main operation
is easy to parallelise and consists of multiply-accumulate
operations which map efficiently onto hardware in terms
of resources. Secondly, these methods allow one to trade
accuracy for computation time by varying the number
of iterations (see Section 6). Finally, it is possible to
exploit the finer structure of the matrix to reduce memory
requirements since there is no filling and most of the
elements do not change and are repeated throughout the
matrix. This last point is quite important because being
able to store all problem data using on-chip memories
is essential for accelerating an iterative application like
MINRES. If one could not store the matrix on-chip, then
the amount of I/O and computation would be of the same
order as each iteration, hence the speed of the application
will be determined by how fast one could load the data in
and out of the chip regardless of the parallelism employed.
Table 2 shows the proportion of the FPGA resources used
for the components of the implemented controller. The
controller easily fits in a mid-range FPGA with plenty of
resources to spare, and the linear solver accounts for more
than 96% of the resources used.
Implementing the controller in an FPGA gives the possi-
bility of providing cycle accurate computation time guar-
antees. For the current controller, computation time is
IIPPZ(IMR + 66)
fc
seconds, (6)
where Z = N(2n+m)+2n = 229 andM = 2n+m = 41 are
the size and halfband of Ak, IIP and IMR are the number
of interior-point and MINRES iterations, respectively, and
P :=
⌈
2Z +M + 12dlog2(2M − 1)e+ 230
Z
⌉
= 4. (7)
The clock frequency is denoted by fc, which is 250MHz in
this implementation.
4. OFFLINE PRECONDITIONING
For each PDIP iteration, the convergence of the MINRES
algorithm used to solve the indefinite linear systemAkzk =
bk, and the worst-case accuracy of the final estimate of
zk are influenced by the eigenvalue distribution of Ak. If
no scaling is performed on the prediction model and cost
matrices for this application, inaccuracy in the estimates of
zk leads the PDIP algorithm to not converge to a satisfac-
tory solution. Merely increasing the number of MINRES
iterations by an order of magnitude fails to improve the
solution, yet radically increases the computational burden.
Conventionally, preconditioning would be applied online in
order to accelerate convergence, and reduce the worst-case
solution error of zk. In order to minimise design complexity
and hardware resource demands, in the present paper this
is performed oﬄine through a systematic scaling of the
optimal control problem.
Matrix Ak is not constant. However, W
−1
k is diagonal, and
since there are only upper and lower bound constraints on
inputs, the varying component of Ak, Φk also only has di-
agonal elements. Moreover, as k → IIP−1, the diagonal el-
ements of W−1k corresponding to inactive constraints tend
towards zero. Therefore, despite those diagonal elements of
W−1k corresponding to active constraints becoming large,
as long as only a handful of these exist at any instant,
the perturbation to Aˆ is of low rank, and has a relatively
minor effect on the convergence of MINRES. It follows that
rescaling the control problem to improve the conditioning
of Aˆ should also improve the conditioning of each Ak and
accuracy of the corresponding zk.
Prior to scaling, cond(Aˆ) = 2.3916× 109. The objective of
the exercise that follows is to obtain diagonal matrices
TQ > 0 and TR > 0 to scale the linear state space
prediction model and quadratic cost weighting matrices
to improve the conditioning of Aˆ using the following
substitution: A ← TQAT−1Q , B ← TQBT−1R , Bd ← TQBd,
Q ← T−1Q QT−1Q , R ← T−1R RT−1R , δumin ← TRδumin,
δumax ← TRδumax. This substitution is equivalant to
Aˆ ← MˆAˆMˆ (8)
where
Mˆ =
((
IN ⊗
(
T−1Q ⊕ T−1R
))
⊕ T−1Q
)
⊕ (IN+1 ⊗ TQ) .
(9)
By constraining TQ = diag(tQ) and TR = diag(tR), the
diagonal structure of Φk is retained. The transformation
(8) is a function of both TQ and its inverse, and both of
these appear quadratically, so it is therefore likely that
minimisation of any particular function of MˆAˆMˆ is not
(in general) going to be a convex or particularly well
conditioned.
In Betts (2010) some guidelines are provided for desirable
scaling properties. In particular, it is desirable to normalise
the rows and columns of Aˆ so that they are all of similar
magnitude. In Jerez et al. (2012) a diagonal preconditioner
M ,
{
M : M ij =
{(∑
s |Ak,{is}|
)−1/2
if i = j
0 if i 6= j
}
is proposed for matrix Ak that guarantees λi(MAkM) ∈
[−1, 1]. This must be re-calculated at every iteration of
the PDIP method and requires additional hardware for
division and square root calculation to be instantiated
in the FPGA. It can be observed that if the method of
Jerez et al. (2012) were to be applied repetitively to a
general square matrix of full rank, the 1-norm of each
of the rows converges asymptotically to unity. Whilst
iterative application of this preconditioner (i.e. repeatedly
re-conditioning the same matrix) was not the intended
application of Jerez et al. (2012), the oﬄine method
proposed subsequently exploits this observation.
The method proposed here for normalising Aˆ follows
naturally but with the further caveat that the structure
of Mˆ is imposed to be of the form (9). Consequently, it is
not (in general) possible to scale Aˆ such that all row norms
are equal to an arbitrary value. Instead, the objective is to
reduce the variation in row (and column) norms. Empirical
testing suggests that normalising the 2-norm of the rows of
Aˆ (subject to (9)) gives the most accurate solutions from
the PDIP method for the present application.
Noting that Aˆ has a very particular structure, define the
following vectors:
sx ,
{
sx ∈ Rn : sx,{i} =
(∑n
j=1Q
2
ij +
∑n
j=1A
2
ji + 1
)1/2}
su ,
{
su ∈ Rm : su,{i} =
(∑m
j=1R
2
ij +
∑n
j=1B
2
ji + 1
)1/2}
sN ,
{
sN ∈ Rn : sN,{i} =
(∑n
j=1 P
2
ij + 1
)1/2}
sλ ,
{
sλ ∈ Rn : sλ,{i} =
(∑n
j=1A
2
ij +
∑m
j=1B
2
ij + 1
)1/2}
.
Define elementwise,
l1 ,
√
su/µ (10a)
l2 , {l2 ∈ Rn > 0 : l42 = ((Nsx + sN )/(1 +Nsλ))}
(10b)
where
µ =
(N
∑
sx+
∑
sN+N
∑
sλ+n)
(2(N+1)n) . (10c)
Algorithm 2. (Oﬄine Preconditioning).
Data: A, B, Q, R, P , 
1. Let tQ ← 1n, and tR ← 1m.
Repeat:
2. Calculate l1, l2 as functions of current data, and define
L1 , diag(l1), L2 , diag(l2).
3. Update:
tQ ← L2tQ, tR ← L1tR
A← L2AL−12 , B ← L2BL−11
Q← L−12 QL−12 , P ← L−12 PL−12 , R← L−11 RL−11 .
Until: (‖l2 − 1‖ < ) ∩ (‖l1 − 1‖ < )
Output:
4. TQ = diag(tQ), TR = diag(tR).
Table 3. Effects of oﬄine preconditioning
Scaling cond(Aˆ) std ‖Aˆ{i,:}‖1 std ‖Aˆ{i,:}‖2 std |λi(Aˆ)|
Original 2.39× 109 2.92× 103 2.42× 103 2.42× 103
Scaled 21.08 0.8087 0.1487 0.5538
Table 3 shows properties of Aˆ that are likely to influence
solution quality, before and after application of Algo-
rithm 2 with  = 10−7. These are the condition number of
Aˆ, the standard deviation of the row 1– and 2–norms, and
the standard deviation of the magnitude of the eigenvalues.
5. TEST BENCH
The proposed hardware-in-the-loop experimental setup ar-
chitecture has two goals: providing a reliable real-time
closed-loop simulation framework for controller design ver-
ification; and demonstrating a hardware coded controller
that could be directly plugged into a real world plant.
Figure 1 shows the experimental setup. The QP solver,
running on a Xilinx FPGA ML605 evaluation board,
controls the nonlinear model of the B747 aircraft running
on a desktop PC. Data transfer between controller and PC
is performed through a 100 Mbit/s Ethernet link using
UDP/IP. The communication latency of this interface is
small compared to the computation time (see Table 4).
The desktop PC simulates the nonlinear plant model,
the observer and target calculator faster-than-real-time
using Simulink. At every sampling instant k, the observer
estimates the next state δxˆ(k + 1|k) and disturbance
wˆ(k + 1|k), and the target calculator provides the refer-
ence (δx∞(k), δu∞(k)) for the controller, represented as a
sequence of single-precision floating point numbers in the
payload of a UDP packet via an S -function. The FPGA
returns the control action in another UDP packet, which
is applied to the plant model at the next sampling instant.
On the controller side, the transferred data is captured by
the Physical Layer Device (PHY) on an ML605 evaluation
board, implementing the physical layer of the Ethernet
stack, and then transmitted to the FPGA chip. The
data link layer is implemented in hardware with a Media
Access Control (Ethernet lite MAC) provided by the
FPGA manufacturer. The transport and network layers
of the UDP stack are also provided by the FPGA vendor
and run on an embedded soft processor IP-core (Xilinx
MicroBlaze). The decoded UDP packet is routed to a
mixed software-hardware application layer.
The hardware application task consists of a custom hard-
ware accelerator implementing the QP solver. The software
application, executed on the same soft processor, bridges
the communication between the Ethernet interface and the
hardware accelerator. This architecture provides more sys-
tem flexibility than a dedicated custom designed interface,
with a small increase in FPGA resource usage (Table 2),
power consumption and communication delay (Table 4),
and allows easy portability to other standard interfaces,
e.g. SpaceWire, CAN bus, etc., as well as an option for
direct monitoring of controller behaviour.
Simulink
- Plant 
- Observer
- References
UDP/IP
software
application
layer
transport, 
network,
Desktop PC
mixed 
software -
hardware
application
layer
transport, 
network 
layers
Xilinx FPGA ML605 evaluation board
QP solver
bridge
Soft Processor
light-weight UDP/IP stack
Ethernet
100Mbits
data link, 
physical
layers
Ethernet lite MAC
Physical Layer 
Device (PHY)
100Mbits
physical
layers
data link
layers
Fig. 1. Hardware-in-the-loop experimental setup. The
computed control action by the QP solver is encapsu-
lated into a UDP packet and sent through an Ethernet
link to a desktop PC, which decodes the data packet,
applies the control action to the plant and returns
new state, disturbance and trajectory estimates.
6. RESULTS AND CONCLUSIONS
The MINRES-based PDIP MPC controller, implemented
using single precision floating point arithmetic, and only
off-line preconditioning of Ak, and using a prediction
horizon N = 5, has been used to successfully control the
nonlinear model of a Boeing 747-200. Figure 2 shows the
closed-loop state and control surface trajectories for the
simulation of the manœuvre described in Section 2.1, with
IMR = 66.
To demonstrate the claim in Section 3 that solution accu-
racy can be traded for computation time, Table 4 presents
timing and solution accuracy data for the MINRES-based
controller over the course of a 600 s simulation, allowing
229, 115, 88 and 66 MINRES iterations for each PDIP
iteration. The MINRES-based controller is compared with
the algorithm of Rao et al. (1998), the dense active set
method of Goldfarb and Idnani (1983) and an imple-
mentation of the un-preconditioned MINRES-based solver
with the same pre-scaled prediction model, running on
the PC. This is also compared with the algorithm of Rao
et al. (1998) running directly on a 150 MHz MicroBlaze
(the fastest allowed by the synthesis software). For the
first two PC-based implementations, the algorithms are
coded in Matlab and converted to C-MEX functions
with Matlab Coder (R2011a), with responsiveness and
memory integrity checks disabled, and no use of BLAS.
The MINRES-based PC implementation is coded directly
in C. Functions are compiled using gcc version 4.2.1 with
-O3 optimisations.
Timing for PC-based algorithms is obtained using the tic
and toc commands on a Macbook Pro with a 2.4GHz
Intel i5, over a sequence of data obtained from a prior
simulation. For FPGA-based controllers, time is measured
from the point before the UDP packet is sent until the
point after which the UDP packet reply is received. It
can be seen that the FPGA-based implementation requires
substantially fewer clock cycles to achieve competitive
control accuracy, and that for the present application,
the clock frequency could even be reduced (for example,
to reduce power consumption or reduce heat dissipation
requirements) whilst still achieving real-time control.
Table 4. Predictive controller performance comparisons over duration of simulation (N = 5)
Implementation Numerical accuracy Cost Max solution time (ms)
Location Algorithm Precision IMR emax eµ summation Total QP Comm Clk cycles
FPGA MINRES float32 229 2.2× 10−3 3.3× 10−5 4.428× 103 20.99 20.38 1.11 50× 105
FPGA MINRES float32 115 1.0× 10−2 3.8× 10−5 4.420× 103 13.52 12.41 1.11 31× 105
FPGA MINRES float32 86 1.9× 10−2 4.5× 10−5 4.412× 103 11.81 10.70 1.11 27× 105
FPGA MINRES float32 66 3.7× 10−2 6.0× 10−5 4.401× 103 9.89 8.78 1.11 22× 105
PC MINRES float32 229 3.1× 10−3 4.5× 10−4 4.428× 103 865 865 N/A 21× 108
PC RWR1998 float64 N/A – – 4.428× 103 15.2 15.2 N/A 36× 106
PC GI1983 float64 N/A 1.3× 10−4 1.3× 10−5 4.428× 103 6.4 6.4 N/A 15× 106
MicroBlaze RWR1998 float32 N/A 2.5× 10−3 3.8× 10−4 4.428× 103 526.3 525.2 1.11 79× 106
The control accuracy metrics presented are
emax = maxi,k
∣∣∣uF{i}(k)− u∗{i}(k)∣∣∣ / (δumax,{i} − δumin,{i})
eµ = meani,k
∣∣∣uF{i}(k)− u∗{i}(k)∣∣∣ / (δumax,{i} − δumin,{i})
where •{i}, •F and •∗ denote the ith element of a vector,
solution from the FPGA, and a “true” (assumed) optimal
solution found using double precision arithmetic conven-
tional methods respectively. Even with a reduced number
of MINRES iterations, and a fixed number of 18 PDIP
iterations, the error between the obtained control action
and the ideal solution is, on average, small. The sum of
stage costs over the simulation is similar in all cases.
0 100 200 300 400 500 600
−5
0
5
10
15
20
25
Time /s
R
ol
l /
de
g
 
 
Target, x
∞
Measured, x
0 100 200 300 400 500 600
4
4.5
5
5.5
6
6.5
7
Time /s
Pi
tc
h 
/d
eg
 
 
Target, x
∞
Measured, x
0 100 200 300 400 500 600
−50
0
50
100
Ya
w
 /d
eg
 
 
Reference, r
Measured, x
0 100 200 300 400 500 600
300
400
500
600
700
Al
tit
ud
e 
/m
 
 
Reference, r
Measured, x
0 100 200 300 400 500 600
120
125
130
135
Time /s
Ai
rs
pe
ed
 /m
s−
1
 
 
Reference, r
Target, x
∞
Measured, x
(a) Selected state trajectories
0 100 200 300 400 500 600
−20
−10
0
10
20
Ai
le
ro
ns
 /d
eg
 
 
R/I Aileron
R/O Aileron
0 100 200 300 400 500 600
−20
−10
0
10
20
Ai
le
ro
ns
 /d
eg
 
 
L/I Aileron
L/O Aileron
0 100 200 300 400 500 600
0
10
20
30
40
R
 S
po
ile
rs
 /d
eg
Time /s
0 100 200 300 400 500 600
0
10
20
30
40
Time /s
L 
Sp
oi
le
rs
 /d
eg
0 100 200 300 400 500 600
0
2
4
6
Time /s
El
ev
. /
de
g
 
 
R/I Elev.
L/I Elev.
R/O Elev.
R/L Elev.
0 100 200 300 400 500 600
−1
−0.5
0
0.5
1
Time /s
R
ud
de
r /
de
g
 
 
U Rudder
L Rudder
(b) Selected input trajectories
Fig. 2. Closed loop simulation trajectories (IMR = 66)
REFERENCES
Bemporad, A., Morari, M., Dua, V., and Pistikopoulos,
E.N. (2002). The explicit linear quadratic regulator for
constrained systems. Automatica, 38(1), 3–20.
Betts, J.T. (2010). Practical Methods for Optimal Control
and Estimation using Nonlinear Programming. SIAM,
second edition.
Edwards, C., Lombaerts, T., and Smaili, H. (eds.) (2010).
Fault Tolerant Flight Control: A Benchmark Challenge.
Lecture Notes in Control and Information Sciences.
Springer.
Franklin, G.F., Powell, J.D., and Workman, M.L. (1990).
Digital control of dynamic systems. Addison-Wesley,
Reading, Mass., 2nd ed edition.
Goldfarb, D. and Idnani, A. (1983). A numerically stable
dual method for solving strictly convex quadratic pro-
grams. Math. Prog., 27(1), 1–33.
Jerez, J.L., Constantinides, G.A., and Kerrigan, E.C.
(2011). An FPGA implementation of a sparse quadratic
programming solver for constrained predictive control.
In Proc. ACM Symp. Field Programmable Gate Arrays.
Monterey, CA, USA.
Jerez, J.L., Constantinides, G.A., and Kerrigan, E.C.
(2012). Towards a fixed point QP solver for predictive
control. In Proc. IEEE Conf. on Decision and Control
(Submitted).
Ling, K.V., Wu, B.F., and Maciejowski, J.M. (2008).
Embedded model predictive control (MPC) using a
FPGA. In Proc. 17th IFAC World Congress, 15250–
15255. Seoul, Korea.
Rao, C.V., Wright, S.J., and Rawlings, J.B. (1998). Ap-
plication of interior-point methods to model predictive
control. J. Optim. Theory Appl, 99(3), 723–757.
van der Linden, C., Smaili, H., Marcos, A., Balas, G.,
Breeds, D., Runham, S., Edwards, C., Alwi, H., Lom-
baerts, T., Groeneweg, J., Verhoeven, R., and Breeman,
J. (2011). GARTEUR RECOVER benchmark. URL
http://www.faulttolerantcontrol.nl.
Vouzis, P.D., Bleris, L.G., Arnold, M.G., and Kothare,
M.V. (2009). A system-on-a-chip implementation for
embedded real-time model predictive control. IEEE
Trans. Control. Syst. Technol., 17(5), 1006–1017.
Wills, A.G., Knagge, G., and Ninness, B. (2012). Fast
linear model predictive control via custom integrated
circuit architecture. IEEE Trans. Control. Syst. Tech-
nol., 20(1), 59–71.
Wright, S.J. (1993). Interior-point method for optimal
control of discrete-time systems. J. Optim. Theory
Appl., 77, 161–187.
