Abstract-A field programmable gate array (FPGA)-based predictive controller for a spacecraft rendezvous manoeuvre is presented. A linear time varying prediction model is used to accommodate elliptical orbits, and a variable prediction horizon is used to facilitate finite time completion of manoeuvres. The resulting constrained optimisation problems are solved using a primal dual interior point algorithm. The majority of the computational demand is in solving a set of linear equations at each iteration of this algorithm. To accelerate this operation, a custom circuit is implemented, using a combination of Mathworks HDL Coder and Xilinx System Generator for DSP, and used as a peripheral to a MicroBlaze soft core processor. The system is demonstrated in closed loop by linking the FPGA with a simulation of the plant dynamics running in Simulink on a PC, using Ethernet.
I. INTRODUCTION

I
MPLICIT generation of a control law through solution of a constrained optimisation problem makes model predictive control (MPC) particularly suitable for control scenarios where reconfiguration is necessary. The prediction model, cost function and constraints can be updated online to reflect changes in system parameters or control objectives. In [1] - [4] , linear time-varying (LTV) prediction models are used to perform spacecraft manoeuvres in elliptical orbits. Simulations show that fuel consumption using MPC is favourable in comparison to other methods. Variable horizon predictive control (VH-MPC) is a variant of MPC, suited to "completion" type control problems, such as aerial vehicle manoeuvres [5] and spacecraft rendezvous [4] , [6] , [7] due to its finite-time completion properties. Exploiting this, [4] presents a family of VH-LTV-MPC controllers to perform a spacecraft rendezvous operation, with a controlled chaser spacecraft performing rendezvous with a passive target in an orbit around Mars [8] .
VH-LTV-MPC has a high computational burden, and a contemporary guidance, navigation and control computer would require a dedicated co-processor to support the MPC. Instead of using an extra high-performance general purpose microprocessor, an alternative would be to employ a custom peripheral circuit, designed to carry out all or part of the predictive control function, trading high clock frequencies (problematic in space applications due to radiation exposure) for parallelism, and offering opportunities to reduce power consumption.
This work was supported by the EPSRC (Grant EP/G030308/1) as well as industrial support from Xilinx, Mathworks and the European Space Agency.
The authors are with the University of Cambridge (email: {enh20,jmm}@eng.cam.ac.uk).
A. Background: FPGA based predictive control
Field programmable gate arrays (FPGAs) are configurable silicon devices that enable a system designer to implement custom circuits without the need to produce custom chips.
Recently there has been interest in using FPGAs to exploit opportunities for parallelism in the numerical algorithms used to implement MPC, with a variety of approaches proposed. A hybrid software-hardware design is advocated by [9] - [11] . The first uses a primal barrier method and the custom accelerator is used to calculate the gradient vector and Hessian matrix of the augmented cost function at each iteration, whilst the latter two use an active set method, with a custom circuit accelerating matrix/vectorvector multiplication. Full hardware implementations of MPC controllers are documented in [12] (active set), [13] - [16] (interior point), although [16] uses a soft-core microprocessor to bridge between the MPC and the outside world. At the other end of the spectrum, [17] implements MPC using a custom soft-core processor with non-standard numerical representations to minimise the FPGA resource (and power) requirements. With the exception of [15] , [16] , systems with one or two inputs are considered, and no prior FPGA-based predictive controllers consider variable prediction horizons and time-varying prediction models.
B. Summary
This paper presents an FPGA-based implementation of a variable horizon predictive controller, for a phase of the spacecraft rendezvous application in an elliptical orbit, based on a simplified version of one modes of operation from [4] . The design is implemented and tested on a Xilinx ML605 Evaluation Board [18] , using Ethernet to communicate with a simulation of the spacecraft dynamics running in Simulink on a PC. Resource usage and power consumption are analysed.
An uncondensed MPC form is used. This scales favourably in terms of prediction horizon [19] and is advantageous for model reconfiguration, since once the time varying state space matrices are computed, the remaining effort in constructing the optimisation problem is data transfer rather than the multiplication of large, dense matrices.
A primal-dual interior point (PD-IP) algorithm [20] is implemented to solve the constrained optimisation problems. Following the hybrid software-hardware paradigm, the algorithm is co-ordinated by a Xilinx MicroBlaze soft-core processor, and a custom peripheral circuit is designed to assist in the construction and to perform the solution of the set of linear equations that arise at each iteration of the PD-IP algorithm. This is the bottleneck in optimisation problems of the scale considered. In contrast to prior designs, where register transfer languages (RTL) such as Verilog [10] , VHDL [12] , [14] - [16] , or high level C-like languages (System-C, [13] ) have been used, the custom peripheral presented here is designed using a combination of Mathworks HDL Coder 2012a and Xilinx System Generator for DSP 14.2. This gives almost the same fine-grained clock-cycle accurate control of the design that an RTL would offer, but, being based upon Simulink, enables high-level visualisation of connectivity between components and rapid simulation.
The remainder of the paper is organised as follows: Section II develops the control problem; Section III describes the FPGA-based implementation; Section IV analyses the FPGA resource usage and power consumption; Section V describes the test-bench setup and presents closed-loop performance results; and Section VI concludes.
II. PREDICTIVE CONTROL PROBLEM
The scenario considered here is the medium-short range phase of the rendezvous in the Mars Sample Return mission [4] , [8] , starting at the point where the controlled spacecraft (chaser) has been brought into approximately the same orbit as the spacecraft with which it must rendezvous (target), but with a significant separation in true anomaly (i.e. an in-track separation of a few tens of kilometres). The control objective is to guide the chaser towards the vicinity of the target via a sequence of pre-determined "holding points" at which it must remain until given clearance. The time spent at holding points is unknown a priori, so at a given instant the goal is to enter the next holding point. The scenario terminates once the final holding point has been reached.
The Yamanaka-Ankersen (Y-A) equations [21] describe the dynamics of the relative position and velocity of the chaser with respect to the target in a local-vertical local-horizontal (LVLH) reference frame centred on the target, with the zaxis pointing towards the central body, the y-axis normal to the orbital plane, and the x-axis completing a right-handed set. Unlike the LTI Clohessey-Wiltshire (C-W) equations (e.g. [22] ), these apply to elliptical orbits, but are parameter varying in terms of the true anomaly, ν of the target. In the MSR capture scenario, the target is passive, so the Y-A equations are LTV, since ν is only a function of time.
As in [4] , [6] , a 1-norm cost is used to reflect propellant consumption being directly proportional to delivered thrust. A variable control horizon is used to enable a compromise between completion time and propellant usage. Letting x i ∈ R n x and u i ∈ R n u denote the predicted state and control input i time steps into the future, x T ∈ R n x denote a target setpoint, x(k) denote the actual measured (estimated) state at time k ∈ Z + , γ > 0 ∈ R be a weighting parameter, and θ (k) (x 0 , u 0 , . . . , x N ), the optimisation problem posed is
where A i denotes the Y-A state transition matrix corresponding to a prediction from time step i to i + 1. The plant input is modelled as an impulsive change in velocity, partitioned into positive and negative components aligned with the three axes, so B i = A i 0 0 (1) can be posed as a sequence of N max linear programs (LPs), each with fixed N. An LP is QP with a zero Hessian matrix, and the PD-IP algorithm for QP does not require a strictly positive definite Hessian matrix. Therefore, to facilitate future component reuse (for example for trajectory tracking during the final few metres of the rendezvous, for which a quadratic cost is more appropriate), a QP solver is implemented. The QP representation of (1), with N fixed is
subject to Fθ = f and Gθ ≤ g where, letting ⊕ denote the direct matrix sum, ⊗ denote the Kronecker product, I p denote the p-dimensional identity matrix, 0 q×r denote a q × r matrix of zeros and 1 q×r denote a q × r ma-
(1), with fixed N, every
III. MPC CONTROLLER ARCHITECTURE
Since the cost function (1a) in problem (1) is not convex with respect to the N, each of the convex constrained optimisation problems corresponding to horizons N = {1, . . . , N max } is solved in sequence according to Algorithm 1. A Xilinx MicroBlaze soft-core processor (with floating point unit) is used to calculate the LTV prediction matrices, and provide data to a custom peripheral core designed to solve the set of linear equations that occur at each iteration of the PD-IP algorithm, which solves the optimisation problem occuring at step 3 of Algorithm 1. Using the annotation (M) to denote a process performed on the MicroBlaze and (F) to indicate that the process happens in the custom logic implemented in the FPGA fabric, Algorithm 2 shows the software-hardware Algorithm 1: Variable horizon MPC controller Initialisation:
partition. The notation n c (= 2Nn u ) is used to denote the total number of inequality constraints.
Solve:
Line search:
end for
The number of iterations is fixed, since in a real time setting, the time to obtain a solution (even if not optimal) should be bounded. Some of the optimisation problems might be infeasible for shorter horizons N. Three heuristics are used to attempt to identify infeasibility and/or a poor solution after I IP iterations: the solution is assumed infeasible if for any j, ( b j ∞ + Nn c µ j ) > 10 4 min i ( b i + Nn c µ i ), [23] and the solution is also rejected as likely infeasible if µ I IP −1 > 10 −3 or b I IP −1 ∞ > 0.01 b 0 ∞ (i.e. the result has not converged sufficiently).
As in [15] , [16] , the system of linear equations is solved using the minimum residual (MINRES) algorithm. In the present context, this iterative method can be considered preferable to factorisation-based methods, being dominated by matrix-vector multiplications, which are very easily parallelised, and the number of iterations (i.e. the solution time) can be traded for solution accuracy.
At a high level, the custom peripheral core (PCORE) is split into two major parts. The first component performs steps (5-7) of the Algorithm 2, building the linear system, whilst the second, more complex component performs step 8. The PCORE is connected to the MicroBlaze using the AXI4-lite bus, which presents data and status registers as shared memory locations (Figure 1) .
A. Linear system builder
The linear system builder is designed using Xilinx System Generator for DSP, and consists of four main modes of operation. In the first mode, the vectorised matrices A i , B i , G i , sequentially and can be repeated (e.g. for LTI cases) without the overhead of re-writing the time invariant matrices to the PCORE. The second mode triggers a state machine which calculates G T diag(w k )G + H and stores this row-wise in a further RAM of size N max (n x + n u ) 2 + n 2
x . This calculation requires N max (n x + n u ) 2 dot products per PD-IP iteration. To achieve a throughput of one scalar addition per clock cycle within each dot product (after an initial latency), elementwise multiplication is performed in single precision floating point using a pipelined IP-core, then converted to 64-bit signed fixed point with a 32-bit fractional part [11] . The accumulated result is transformed back to single precision floating point.
The third mode triggers a second state machine which addresses the RAMs to construct a sequence of numbers comprising the rows of a compact block representation of
Whilst less generic than interleaving the primal, dual and slack variables corresponding to each stage of the prediction horizon forming a banded matrix [19] , to perform dot products of the rows of (3) with appropriately sized vectors with a throughput of one dot product per clock cycle requires a bank of (3n x + n u ) multipliers in parallel, in constrast to (2n u + 4n x − 1) if a band structure were exploited. The fourth mode triggers the second state machine again with an additional configuration flag enabled, and in combination with a second multiply-accumulate device and a subtractor to calculate b k as in step 7, and presents this sequence as a separate output.
B. MINRES solver
The majority of the computation in the MINRES algorithm comprises the Lanczos process. This is implemented using fixed-point arithmetic, enabled by the preconditioner proposed by [24] , [25] , which ensures that key variables are in the range [−1, 1]. Using the notation Z k,mn to denote the nth element of the mth row of matrix Z at iteration k, by letting
The preconditioner is calculated from the row-wise sequence of the elements of (3) and stored in a dualport RAM. The sequence (3) is also simultaneously stored temporarily in a single-port RAM. After the preconditioner is calculated, the contents of the RAM are read in sequence and multiplied by the corresponding two elements of the preconditioner (determined by a state machine addressing the dual port RAM containing the preconditioner), converted to a fixed point representation and written to a bank of block RAMs, each of which will contain a single column of the preconditioned version of (3). The key operation within the Lanczos process is a matrixvector multiplication, (M k A k M k )v performed at each iteration. This is implemented as a bank of (3n x + n u ) multipliers, a bank of (3n x + n u ) single port RAMs comprising the columns of the preconditioned fixed point copy of (3) and a bank of 2n x dual port RAMs and n u single port RAMs, containing appropriately ordered elements of v. Simulink HDL coder transparently handles element-wise vector operations and automatically creates the tree-reduction structure for adding the products together, and perform register balancing between tree stages. However, to efficiently use the 25 × 18-bit hardware multiplier units (DSP48E) on the FPGA and to avoid timing issues at the implementation stage, the operands are split and long multiplication is performed in stages, with pipeline registers (delay blocks) inserted in between. Using the notation (u/s)Fixxx_yy to denote an (un)signed xx-bit number with yy-bit fractional part, a data type of sFix25_23 is used for the matrix and sFix35_33 for the elements of v. This uses 2 DSP48Es per vector element.
The Lanczos process also requires a reciprocal square root operation to be performed. This is implemented as an integer square root operation followed by calculation of the reciprocal of the result. Due to the word lengths involved (sFix52_50), the Simulink built-in blocks generate a design with many layers of logic between registers even if automatic register balancing is used in the synthesis tools. These operations need only be performed on scalars and therefore do not need to be pipelined, so to circumvent this limitation, a standard integer division algorithm and an integer square root algorithm are implemented as state machines using M-code. The result of this operation is stored as type sFix64_32.
Outside of the Lanczos process, the remainder of the algorithm is implemented using floating point arithmetic since the bounds on the magnitudes are not known a priori. This is designed using Xilinx System Generator for DSP to instantiate pipelined IP-cores performing floating point operations and type conversions, with the HDL description of the Lanczos process imported as a "black box". A single precision (32-bit) floating point data type is used in preference to the more common double precision (64-bit) representation, this decision being motivated by FPGA resource usage. Table I shows the resource usage of the MicroBlaze and custom circuit on the FPGA on the ML605 Evaluation Board. The resources used specifically in the Lanczos process are also shown in italic. An estimate of utilisation for a selection of smaller FPGAs is also presented. Using Xilinx Power Analyzer's default operating conditions, power consumption of the system is estimated. This shows that the majority of power consumption is due to "leakage" (i.e. due to quiescent currents) and the custom PCORE. The power consumption of the PCORE and leakage are estimated for a selection of smaller FPGAs, using the Timing and Power Analysis workflow in System Generator for DSP and presented in Table II . (Totals in brackets exclude the MicroBlaze). The estimated consumption of the PCORE is consistent over the FPGA models considered, so to gain a further reductions in power consumption, improving the sharing of resources between tasks would be worthwhile. V. CLOSED-LOOP SIMULATION The design is used to control a simulation of the plant dynamics on a PC connected to the FPGA using UDP/IP over 100 Mbit ethernet. The target orbital parameters, the chaser relative position and velocity and parameters configuring the number of algorithm iterations are transmitted to the FPGA, which returns the optimal cost, optimal prediction horizon and the full predicted state and input trajectory.
IV. RESOURCE AND POWER USAGE
A. Simulator and scenario
Continuous-time target orbit dynamics and chaser relative dynamics are simulated in Simulink. The target is modelled as being in a Keplerian orbit [22] around Mars, with semimajor axis a = 4643 km and eccentricity e = 0.2044. The orbital radius at perigee is r p = 3.6939 × 10 6 m. The the open-loop chaser relative dynamics are modelled using the continuous time-varying equations from which the Y-A equations [21] derive. Navigation uncertainty is modelled as zero-mean additive Gaussian white noise on each channel. With r denoting the Euclidean distance from the target, the standard deviation is 0.35/1000r + 0.0028 (m) for position, and 0.0115/1000r + 0.003 (m/s) for velocity measurements [4] . The impulsive thrust input request from the predictive controller delivered as a 1 s pulse of acceleration, 1 s after the control instruction is received from the FPGA (this delay is unmodelled in the prediction).
In elliptical scenarios, points on the x-axis in the relative reference frame, with zero instantaneous velocity, are not unforced equilibria. The terminal state x T (k + N) is chosen as a point on a periodic holding trajectory. Letting h p be the average distance the target, ν N = ν(k + N), and ρ N = (1 + e cos ν N ), then
T , where
, and µ is the gravita- [4] , the sampling period is chosen to be T s = 300 s, and the weighting parameter γ = 35. The impulsive control input bound u max = 1 m/s. Table III shows metrics for the quality of the control performance, with respect to variation in the number of interior point iterations I IP , and the number of MINRES iterations, I MR = η (N(2n x + n u ) + n x + n T ) . These include the completion time of the whole scenario, the solution time for the full VH-LTV-MPC problem at each sampling instant, the cumulative impulsive velocity change ∆V applied, and the maximum and mean 2-norm of the difference between the control action from the FPGA-based strategy and an "ideal" solution solved on the PC using GLPK [26] . (For context, the MicroBlaze runs at 100 MHz, and the peripheral MINRES accelerator at 200 MHz). Figure 3 shows an example closed loop trajectory with I IP = 22 and η = 1.1. The apportionment of computation time for a fixed horizon N = 20 is presented in Table IV , obtained empirically using a 100 MHz Timer Counter peripheral connected to the MicroBlaze. In comparison, for a fixed horizon N = 20, an LP solver generated using CVXGEN [27] , configured with prediction matrices starting from ν = π/2 with x 0 on the x axis at 10000 m, and scheduled holding point at 5000 m takes 314 ms to run 23 iterations directly on the MicroBlaze using single precision arithmetic. The FPGA-based solver offers a 4× speedup (or 2.5× cycle-per-cycle due to the different clock rates). The FPGA-based solver can also handle dense quadratic stage and input costs with no modifications. A QP solver for such costs from CVXGEN takes 1.79 s with single precision arithmetic (14× more cycle-per-cycle). 
B. Control results
VI. CONCLUSIONS AND FUTURE DEVELOPMENTS
This paper has presented the design and evaluation of a predictive controller for a phase of spacecraft rendezvous on an FPGA using a mixed hardware-software approach, with the custom hardware component accelerating the critical path of the algorithm. The controller supports a LTV prediction model and a variable horizon, and the custom hardware components are implemented using Simulink HDL Coder and Xilinx System Generator for DSP. Further work will investigate the application to additional rendezvous phases, and improving the efficiency of the custom hardware component.
