INTRODUCTION

Motivation
Advances in communication, control, and computer engineering have enabled the design and implementation of large-scale systems, such as smart infrastructure, with remote monitoring and control, which is often desired due to the geographical spread of the system and requirements for flexibility of design (to accommodate future expansions). These positive features however come at the cost of security threats and privacy invasions [14, 44, 53, 58] .
Security threats can be decomposed into multiple categories based on resources available to adversaries [52]. A basic security attack that requires relatively few resources is eavesdropping, in which an adversary monitors communication links to extract valuable information about the underlying system. Eavesdropping is often a starting point for more sophisticated attacks [56]. These attacks have resulted in the use of encryption [42, 55]. Figure 1 shows a diagram of a typical secure cyber-physical system with encryption. The actuator, system, and sensor (sometimes together referred to as the plant) form the physical system that must be remotely monitored and controlled. The physical system can be the electricity grid, transportation network, or a building, for example. Note that, although a single node is used in Figure 1 (b) to denote the sensor, in general it can comprise a collection of spatially distributed sensors. That is, the sensors can be spread geographically within the underlying physical system to measure appropriate states in different locations, e.g., voltages and frequencies at various locations in an electricity grid. The same also goes for the actuator. The addition of the encryption and decryption units in Figure 1 (a) protects the overall system against eavesdroppers on the communication network; however, it does not provide any protection if the eavesdropper infiltrates the controller or if the controller itself is the eavesdropper (in industrial espionage). This is because sensitive information is decrypted prior to entering the controller and is thus readily available there. This motivates the use of a system, depicted in Figure 1 (b), with homomorphic encryption enabling controller computations to be performed on encrypted numbers. In practice, the (physical) system in Figure 1 (b) is a continuous-time dynamical system.
To control the system, the sensors sample the outputs of the system at regular intervals and transmit these measurements to the controller through communication networks (e.g., WiFi or Bluetooth for short ranges or the Internet for longer ranges). The controller computes the necessary commands based on the received measurements and forwards the commands to the actuators for implementation. The actuators then apply and hold the received control signal for a fixed duration. This methodology for digital control of physical systems is often, unsurprisingly, referred to as sample and hold [18]. Before each new sample can be processed by the controller, it must process the previous one, compute the control inputs, and transmit the control inputs to the actuators. Therefore, the sampling rate of the sensors cannot be faster than the inverse of the worst-case delay/latency caused by the required computations and communications. On the other hand, in order to guarantee stability and performance of the overall closed-loop system, we must ensure that the sampling occurs regularly and faster than a certain level (related to how fast the controlled system's dynamics need to be) [18].
In [16] , a general purpose microprocessor based system, specifically a Raspberry Pi, was used to control a differential-wheeled robot in real-time using an encrypted controller. Controlling such a robot is not a complicated task as the underlying system is stable and, if the control signal is not updated with regular timing, the system would not violate safety constraints so long as it is restricted to move very slowly. Further, slowing down the sampling rate in this robot only degrades the performance by making it slower, not resulting in undesirable behaviours. In safety critical applications, however, the timing of the control loop is crucial; if we cannot ensure that the controller is able to provide the correct actuation signal within the sampling time of the system, then safe operation of the system cannot be guaranteed. Having tight control on the timing is unfortunately not always possible on general purpose microprocessor based systems with operating systems because computations and their timings are subject to the operating system scheduling. Even without an operating system, the time sequential nature of software implementations for execution on a general purpose processor can be limiting from the perspective of achievable sampling rate. This motivates the design of a custom digital engine, amenable to realization on Field-Programmable Gate Arrays (FPGAs), for performing the necessary computations. This is the focus of the developments presented below.
Contributions
In this paper, we use homomorphic encryption, specifically the Paillier encryption scheme [41], to implement linear control laws. This includes many popular control laws, such as static gain [21], proportional-integral-derivative (PID) control [3], and linear quadratic regulators (LQR) [21]. Linear control laws, such as PID controllers, have been heavily used within the industry for regulating nonlinear physical systems and are of practical relevance [2]. Although the paper presents the digital system implementation within the context of Paillier encryption, the underlying methodology is applicable, in principle, to other homomorphic encryption methods that rely on the exponentiation of large integer numbers, such as RSA and ElGamal encryption [15, 46]. After the quantization and transformation of the controllers for implementation on ciphertexts, modular multipliers and exponentiators are implemented using Montgomery multiplication [26, 38]. These modules can be used in parallel for encryption, controller computations, and decryption. We analyze the timing of an FPGA realization of such an implementation of a feedback controller, and present experimental results for the control of an unstable system, namely, an inverted pendulum.
Related Studies
The study of homomorphic encryption, a form of encryption that enables computations to be carried out on the encrypted data, dates back to the pioneering result of [45] after observing semihomomorphic properties in RSA [46] . Semi-homomorphic encryption only allows for a smaller number of operations to be performed on the encrypted data in contrast with fully homomorphic encryption. For example, in the case of RSA and ElGamal encryption [15] multiplication of plaintext data corresponds to multiplication of encrypted data, and in the case of Paillier encryption [41] summation of plaintext data corresponds to multiplication of encrypted data. The Gentry encryption scheme [19] is the first fully-homomorphic encryption scheme that allows both multiplication and summation of plain data through appropriate arithmetic operations on encrypted data. Subsequently, other fully homomorphic encryption methods have been proposed, e.g., [9, 54] . The computational burden of fully-homomorphic encryption methods is often much greater than that of semi-homomorphic encryption methods.
Homomorphic encryption has been used previously for third-party cloud-computing services [1, 17, 20, 32, 35, 60]. More recent studies [16, 25, 27, 28, 50] have considered challenges associated with the use of homomorphic encryption in closed-loop control of physical systems, such as maintaining stability and performance, albeit without considering timing concerns (by not getting into the computational time of encryption, computation, and decryption and assuming all underlying computations are instantaneous). None of these studies consider dynamic control laws; they are all restricted to static control laws without any form of memory. This is because, in dynamical control laws with an encrypted memory, the number of bits required for representing the state of the controller can grow linearly with the number of iterations. This renders the memory of such control laws useless after a certain number of iterations due to an overflow or an underflow 1 . We borrow theoretical results from [39, 40, 43, 51] to propose a finite-memory implementation of dynamic controllers over ciphertexts.
An alternative to homomorphic encryption is secure multi-party computation based on secret sharing or other forms of encryption (possibly non-homomorphic encryption methodologies). A well-known method for secure multi-party computation is the Yao protocol, which was originally developed for secure two-party computations [59] . The protocol provides a method for evaluating a Boolean function without any party being able to observe the bits that flow through the circuit during the evaluation. This has been proved to be secure [34] and efficiently implementable for Boolean functions [33] . However, when dealing with more general mappings, i.e., non-Boolean functions, the efficiency of the protocol is limited as the problem of finding the most efficient Boolean representation of a function, in terms of the efficiency of implementing the Yao protocol [29] , is not trivial [30] . Another approach is to utilize secret sharing in which a secret is divided into multiple shares and each party receives one share, which appears random to the receiving party. Then, appropriate computations on the secret shares can be performed to evaluate the outcome [10, 23] . Application of secret sharing to general problems is difficult and the digital design becomes problem specific to the application.
Finally, note that the Paillier encryption scheme has been recently implemented on FPGAs in [47] ; however, that paper considered the problem of privacy-preserving data mining, which has different requirements in comparison to real-time encrypted control. This difference in requirements resulted in the consideration of a different implementation architecture in this paper. In particular, the binomial expansion for the specific choice of the exponential base is exploited to achieve fast encryption in this paper. Further, there are differences between the operations required for data mining and controller computation.
Paper Outline
The rest of the paper is organized as follows. In Section 2, the building blocks of the networked control systems in Figure 1 (b) are presented and we describe the implementation of the control laws over ciphertexts. In Section 3, the digital design for FPGA realization is described. We present the experimental results for the control of an inverted pendulum in Section 4. Finally, we conclude the paper and present avenues for future research in Section 5.
SECURE FEEDBACK CONTROL
In this section, we discuss encryption, decryption, and controller blocks of the networked control systems in Figure 1 (b).
Feedback Controller
In this paper, we consider dynamic controllers of the following form:
where $x[k] \in \mathbb{R}^{n_x}$ is the controller state, $u[k] \in \mathbb{R}^{n_u}$ is the vector of control inputs to the physical system, $y[k] \in \mathbb{R}^{n_y}$ is the vector of plant outputs, and $T$ is the number of time steps between controller state resets. Conditions for selecting $T$ with stability and performance guarantees are presented in [40]. The class of controllers in (1) covers static, reset integral, and reset lead and lag controllers. For instance, in the case of static controllers, $A = 0$, $B = I$, and $C$ is the static gain of the controller. Note that there is a delay of one sampling time between measurement and actuation, modelling computation and communication time associated with the networked controller. For static controllers, since the controller's state is not accumulative and only acts as a delay, we can set $T = \infty$ without concerns about state overflow or underflow. For reset proportional-integral (PI) controllers, $A = \mathrm{diag}(1, 0)$ with $\mathrm{diag}(a)$ denoting a diagonal matrix whose main diagonal is equal to $a$, $B = [\Delta t \;\; 1]^\top$ with $\Delta t > 0$ denoting the sampling time of the control system, and
\[ C = [K_I \;\; K_p] \]
with $K_I$ and $K_p$ denoting, respectively, the integral and proportional gains. Note that PI control laws have been heavily used within the industry for regulating/controlling nonlinear physical systems [2] and, therefore, the choice of linear dynamic controllers is of practical relevance. In this paper, we consider resetting dynamic control laws because implementing encrypted controllers over an infinite horizon is impossible due to memory issues (through repeated multiplication of fixed point numbers in the plaintext domain, the number of bits required for representing the fractional and integer parts of plaintext numbers grows continuously, and there is no simple way to truncate with small error when working in the encrypted domain). Resetting controllers have been previously studied in [39, 40, 43, 51].
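As an illustration of the reset PI structure described above, the following plaintext sketch simulates a scalar controller with state update driven by $A = \mathrm{diag}(1, 0)$ and $B = [\Delta t \;\; 1]^\top$, and output formed from the integral and proportional gains. Two points are assumptions made only for this sketch: the state is reset to zero, and the reset replaces the regular update whenever $(k + 1) \bmod T = 0$. The function name `run_reset_pi` is hypothetical.

```python
def run_reset_pi(outputs, k_i, k_p, dt, reset_period):
    """Simulate a scalar reset PI controller on a sequence of plant outputs.

    State x = (integral, delayed): 'integral' accumulates dt * y and
    'delayed' stores the previous plant output (one-sample delay).
    ASSUMPTION: the state resets to zero every 'reset_period' steps.
    """
    integral, delayed = 0.0, 0.0
    inputs = []
    for k, y in enumerate(outputs):
        # u[k] = K_I * x_1[k] + K_p * x_2[k]
        inputs.append(k_i * integral + k_p * delayed)
        if (k + 1) % reset_period == 0:
            integral, delayed = 0.0, 0.0                  # controller reset
        else:
            integral, delayed = integral + dt * y, y      # x[k+1] = A x[k] + B y[k]
    return inputs


# Constant plant output of 1 with a reset every 3 steps.
u = run_reset_pi([1.0, 1.0, 1.0, 1.0], k_i=2.0, k_p=3.0, dt=0.5, reset_period=3)
print(u)  # [0.0, 4.0, 5.0, 0.0]
```

The last entry returns to zero because the state was reset at the end of the third step, which is what bounds the growth of the controller memory.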
Homomorphic Encryption
A public key encryption scheme can be described by the tuple (P, C, K, E, D), where P is the set of plaintexts, C is the set of ciphertexts, K is the set of keys, E is the encryption algorithm, and D is the decryption algorithm. As such encryption schemes are asymmetric, each key κ = (κ p , κ s ) ∈ K is composed of a public key κ p (which is shared with everyone and is used to encrypt plaintexts), and a private key κ s (which is kept secret and is used to decrypt ciphertexts). The algorithms E and D are publicly known, and use the keys as parameters, which are generated for each new use-case. It is required that
Definition 2.1 (Homomorphism in Cryptography). A public key encryption scheme (P, C, K, E, D) is homomorphic if there exist operators • and ⋄ such that (P, •) and (C, ⋄) are algebraic groups and
Typically, the sets $P$ and $C$ are finite rings of integers $\mathbb{Z}_{n_P}$ and $\mathbb{Z}_{n_C}$, respectively. Then, the modular addition operation ($x_1 \bullet x_2 = (x_1 + x_2) \bmod n_P$) and the modular multiplication operation ($x_1 \bullet x_2 = x_1 x_2 \bmod n_P$) both form groups with $P$. If there exists an operator $\diamond$ that satisfies the definition of a homomorphic encryption scheme when $\bullet$ is defined as modular addition, we call the encryption scheme additively homomorphic. Likewise, if there exists an operator $\diamond$ that satisfies the definition of a homomorphic encryption scheme when $\bullet$ is defined as modular multiplication, we call the encryption scheme multiplicatively homomorphic. If both these properties hold, the encryption scheme is called fully-homomorphic; if only one description applies, it is semi-homomorphic. Importantly, the properties of fully-homomorphic and semi-homomorphic encryption schemes allow additions and multiplications of plaintexts to be performed through the generation of a ciphertext from other ciphertexts, without any intermediate decryptions and encryptions.
Encryption schemes, such as Paillier [41], RSA [46], and ElGamal [15], are examples of semi-homomorphic encryption. The Paillier encryption scheme is additively homomorphic, while the RSA and ElGamal encryption schemes are multiplicatively homomorphic. These homomorphic encryption schemes have been used in the literature to ensure privacy and security when various computational tasks, such as computing set intersections, data mining, executing arbitrary programs, and controlling dynamical systems, are performed by untrusted parties; see, e.g., [1, 16, 20, 32, 35, 40, 60] and references therein for examples. The above-mentioned homomorphic encryption schemes involve calculating modular exponentiations (i.e., $b^a \bmod M$ for positive integers $a$, $b$, and $M$), which is a computationally expensive operation. The time required to perform encryption, decryption, and homomorphic operations on ciphertexts depends largely on the speed with which modular exponentiation can be achieved. This can potentially limit the usability of homomorphic encryption schemes for real-time control of physical systems.
Definition 2.2 (Indistinguishability under Chosen Plaintext).
Consider a scenario in which a polynomial-time-bounded adversary provides two plaintexts. One of these plaintexts is randomly chosen and encrypted. An encryption scheme is said to be indistinguishable under chosen plaintext attack if the adversary has a negligible advantage 2 over guessing which of the two plaintexts was encrypted, using any information apart from the private key.
Indistinguishability under chosen plaintext is a desirable property because an adversary is unable to determine the decryption of a ciphertext, by trialling encryption of likely plaintexts. The RSA encryption scheme does not have this property unless modified to OAEP-RSA [5] . The Paillier and ElGamal encryption schemes have this property, as they introduce a large random number during encryption, allowing a single plaintext to encrypt non-deterministically to many possible ciphertexts, which removes any significant advantage in trialling encryption of likely plaintexts [15, 41] .
In what follows, we use the Paillier encryption scheme as it is additively homomorphic and satisfies indistinguishability under chosen plaintext attack. Note that the ideas of this paper can be readily used for other homomorphic encryption schemes relying on modular exponentiation. Paillier encryption works as follows. First, two large prime numbers $p$ and $q$ are randomly chosen to generate keys. The public key is $\kappa_p = N = pq$ and the private key is $\kappa_s = (\lambda, \mu) = (\mathrm{lcm}(p - 1, q - 1), \lambda^{-1} \bmod N)$, where $\mathrm{lcm}(a, b)$ denotes the least common multiple of integers $a$ and $b$. Note that $\lambda^{-1} \bmod N$ is the unique integer $\mu$ in $\mathbb{Z}_N$ such that $\lambda\mu \bmod N = 1$. In the Paillier encryption scheme, the sets of plaintexts and ciphertexts are, respectively, $P = \mathbb{Z}_N$ and $C = \mathbb{Z}_{N^2}$. Encrypting a plaintext $t$ is done by calculating $E(t) = (N + 1)^t r^N \bmod N^2$, where $r \in \{x \in \mathbb{Z}_N \mid \gcd(x, N) = 1\}$ is randomly chosen. Note that, because $N + 1$ is used as the exponentiation base in the encryption algorithm, it can be rewritten as $E(t) = (Nt + 1) r^N \bmod N^2$. This property follows from the use of binomial expansion because
\[ (N + 1)^t = \sum_{k=0}^{t} \binom{t}{k} N^k = 1 + tN + N^2 \sum_{k=2}^{t} \binom{t}{k} N^{k-2} \equiv 1 + Nt \pmod{N^2}. \]
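Using this property makes our implementation of the encryption considerably faster than [47]. Decryption of a ciphertext $c$ is done by calculating
\[ D(c) = L(c^{\lambda} \bmod N^2)\, \mu \bmod N, \qquad L(x) = (x - 1)/N. \]
The additive homomorphic property follows from
\[ E(t_1) E(t_2) \bmod N^2 = (N + 1)^{t_1 + t_2} (r_1 r_2)^N \bmod N^2, \]
which is a valid encryption of $(t_1 + t_2) \bmod N$, where $r_1$ and $r_2$ are the random numbers used to encrypt $t_1$ and $t_2$. Similarly, $E(t_1)^{t_2} \bmod N^2 = (N + 1)^{t_1 t_2} (r_1^{t_2})^N \bmod N^2$ is a valid encryption of $t_1 t_2 \bmod N$.
Note that this is not a true multiplicative homomorphic property, as $t_2$ is not encrypted; the encrypted result is formed from one ciphertext and one plaintext, rather than two ciphertexts. In the remainder of this paper, we use $\oplus$ to denote the additive homomorphic operator on ciphertexts and $\otimes$ to denote the pseudo-multiplicative homomorphic operator, i.e.,
\[ E(t_1) \oplus E(t_2) := E(t_1) E(t_2) \bmod N^2, \qquad t_2 \otimes E(t_1) := E(t_1)^{t_2} \bmod N^2. \qquad (2) \]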
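The Paillier operations described in this subsection can be checked numerically with a short sketch. It uses toy primes (1009 and 1013) that are far too small for any real use, and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`, `mul_by_plain`) are hypothetical. Encryption uses the rewritten form $(Nt + 1) r^N \bmod N^2$; decryption uses the standard Paillier rule $L(c^{\lambda} \bmod N^2)\,\mu \bmod N$ with $L(x) = (x - 1)/N$.

```python
import math
import secrets


def keygen(p, q):
    """Return the public key N and the private key (lambda, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (needs Python 3.8+)
    return n, (lam, mu)


def encrypt(n, t):
    """E(t) = (N t + 1) r^N mod N^2 with a fresh random r coprime to N."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # uniform in 1, ..., N - 1
        if math.gcd(r, n) == 1:
            break
    return ((n * t + 1) * pow(r, n, n2)) % n2


def decrypt(n, key, c):
    """D(c) = L(c^lambda mod N^2) * mu mod N, with L(x) = (x - 1) / N."""
    lam, mu = key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Additive operator: multiply two ciphertexts modulo N^2."""
    return (c1 * c2) % (n * n)


def mul_by_plain(n, t, c):
    """Pseudo-multiplicative operator: raise a ciphertext to a plaintext power."""
    return pow(c, t, n * n)


n, key = keygen(1009, 1013)  # toy primes, for illustration only
c1, c2 = encrypt(n, 123), encrypt(n, 456)
print(decrypt(n, key, add_encrypted(n, c1, c2)))  # 579
print(decrypt(n, key, mul_by_plain(n, 7, c1)))    # 861
```

Only `mul_by_plain` and the exponentiation of `r` require modular exponentiation; the term $(N + 1)^t$ never has to be exponentiated.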
Secure Controller Implementation
The computations required to implement the controller in (1) are additions and multiplications. We restrict the controller input to fixed-point numbers and use the mapping from fixed point numbers to the integers from [16]. This allows the equivalent operations of addition and multiplication to be effectively applied to fixed point numbers and integers over the ciphertext. The effect of the quantization error can be made arbitrarily small by increasing the number of bits used to represent the underlying numbers (specifically the number of fractional bits), at the expense of increased computational cost [16, 40], given bounds on the size of disturbances that can act on the system. Quantizing also introduces saturation, which can be quite problematic. However, the negative effects of saturation may also be managed by increasing the number of bits (specifically the number of integer bits) used to represent the underlying numbers [16, 40].
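The role of fractional bits, integer bits, and the integer mapping can be illustrated with a small sketch. The exact definitions from [16] are not reproduced here, so the code rests on assumptions: a signed fixed-point format with `n` bits in total and `m` fractional bits, saturation at the ends of the representable range, and a two's-complement style map into the integers modulo $2^{n'}$. The names `quantize`, `encode`, and `decode` are hypothetical.

```python
def quantize(z, n, m):
    """Nearest fixed-point value with n bits (m fractional), as a scaled integer.

    The returned integer k represents the number k / 2**m. Values outside the
    representable range saturate (ASSUMED behaviour for this sketch).
    """
    scaled = round(z * 2 ** m)
    lo, hi = -(2 ** (n - 1)), 2 ** (n - 1) - 1
    return max(lo, min(hi, scaled))


def encode(z, n, m, n_prime):
    """Map a real number to an integer modulo 2**n_prime (negatives wrap)."""
    return quantize(z, n, m) % (2 ** n_prime)


def decode(v, frac_bits, n_prime):
    """Invert the map: the upper half of the range represents negatives."""
    if v >= 2 ** (n_prime - 1):
        v -= 2 ** n_prime
    return v / 2 ** frac_bits


a = encode(1.5, 8, 4, 16)     # 24
b = encode(-1.25, 8, 4, 16)   # 65516, i.e. -20 modulo 2**16
product = (a * b) % 2 ** 16   # integer product carries 4 + 4 fractional bits
print(decode(product, 8, 16))               # -1.875
print(decode(quantize(100.0, 8, 4), 4, 16))  # 7.9375 (saturated)
```

The product example shows why the number of fractional bits grows with every multiplication, which is the memory issue that motivates resetting the controller state.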
To provide more detail about the quantization process and its effect on the control law, we introduce the set of fractional numbers
The quantization operator $Q : \mathbb{R} \to \mathbb{Q}(n, m)$ is defined as $Q(z) := \arg\min_{z' \in \mathbb{Q}(n,m)} |z - z'|$. With slight abuse of notation, we use $Q(A)$ and $Q(x)$ to denote the entry-wise quantization of any $A \in \mathbb{R}^{n \times m}$ and $x \in \mathbb{R}^{n}$, respectively. The quantized controller is then given by
We use the bar, e.g., $\bar{x}$, to denote the quantized version of any variable, e.g., $x$. The map from fixed point numbers to the integers $\mathbb{Z}_{2^{n'}}$ is borrowed from [16] to define
where n′ = (n x + 1)T + n u + n(T + 2) to prevent overflows. Here, for simplicity, we assume that all scalar components of vectors use the same n and m, but these values can differ for various parts of the controller in general [16]. The quantized controller can then be rewritten to operate on ciphertexts as
where ⊕, ⊗ are defined in (2) and the tilde is used to denote the encrypted integers; i.e.,
Finally, the control signal at the actuator is computed by
where 1 p is equal to one if statement p holds and is equal to zero otherwise.
DIGITAL DESIGN
Timing is an important issue when implementing controllers in real-time. While the maximum computation to be performed by the controller is effectively the same in every iteration, implementations on a general purpose microprocessor based system are subject to variable timing performance dependent on operating system scheduling. Even without an operating system, the time sequential nature of software implementations for execution on a general purpose processor can be limiting from the perspective of achievable sampling rate. Such implementations are therefore not acceptable for systems with strict deadlines. This motivates the development of custom digital engines for performing the computations. Hardware implementation of homomorphic encryption based secure feedback control can result in faster sampling rates than software implementations, thereby broadening the applicability of encryption based methods for securing feedback control systems. The speedup of a digital design in hardware over a software design can come from several sources. Hardware designs are able to take advantage of full parallelism, while software designs typically run sequentially on a few parallel threads, and are thus limited in their parallelism. Hardware designs can also introduce pipelining into data paths, where the computation is divided into a pipeline of sequential stages, with stages all running at the same time, and each stage passing its result to the next stage [57]. This can be used to increase achievable data throughput compared to sequential software designs, as new data can be passed through the first stage of the pipeline while there is still data to be processed in the subsequent stages.

Algorithm 1 (excerpt, lines 11 to 19):
11: for i = 1, ..., l do
12:     if E mod 2 = 1 then
13:         [...]
14:     end if
15:     E ← ⌊E/2⌋
16:     [...]
17: end for
18: return P
19: end function

Figure 2 illustrates the schematic diagram of the custom digital system for encrypted control discussed in this section. There are three major parts: the encryption and decryption units in the plant interface, and the physical system controller unit, which is accessed over a network. Each of these units includes a digital engine controller, which orchestrates data flow through the components of these systems, according to a corresponding algorithmic state machine. The activity of each major part is triggered by external events. Encryption is periodically triggered by the generation of samples of the plant output. Physical system controller and decryption unit activity is triggered by the arrival of data over the network.
In this section, we describe plant interface (encryption and decryption) and physical system controller blocks in Figure 2 . Modular multiplication and modular exponentiation are important recurring elements in all of these blocks. Therefore, we start by describing these elemental building blocks in Subsection 3.1. We then describe the controller in Subsection 3.2 and the plant interface in Subsection 3.3.
Modular Multiplication and Exponentiation
In many homomorphic encryption schemes, including Paillier encryption, efficient implementation of modular exponentiation is essential for fast encryption, decryption, and homomorphic operations; see Subsection 2.2. Within the context of secure feedback control implementation, the time it takes to perform encryption, decryption, and homomorphic operations on ciphertexts is a lower bound on the control loop sample period; reducing this period typically leads to improved performance for systems with fast dynamics (e.g., an unstable inverted pendulum). Note that, in principle, it is possible to decrease the time required for computations by decreasing the encryption key length; however, this would reduce the security of the system, which is not desirable.

Algorithm 2 (excerpt):
for i = 1, ..., w + 1 do
    Z ← X (Y mod 2^16)
    Y ← ⌊Y / 2^16⌋
    m ← ((T mod 2^16) + (Z mod 2^16)) M′ mod 2^16
    [...]
end for
return T
end function
We utilize the right-to-left binary method for calculating modular exponentiation, which is summarized in Algorithm 1. The algorithm is particularly useful for our application as it allows for the parallelization of the two modular multiplications in each iteration. This gives a speedup of up to two times, and results in a constant latency as the modular multiplication in line 13 in Algorithm 1 is performed in parallel to the modular multiplication that must be always performed in each iteration in line 16 in Algorithm 1. The right-to-left binary method for exponentiation involves calculating many sequential modular multiplications. The algorithm best suited for this purpose is Montgomery multiplication [38] . It removes the need to perform a trial division by the modulus which is an expensive operation in hardware, and instead only involves additions, multiplications, and right shifts; e.g., see Algorithm 2. However, for it to be useful for implementing modular multiplications, its operands must be converted to Montgomery form, and the result must be converted back from Montgomery form. These conversions can be done using additional Montgomery multiplications. The Montgomery form of an integer a when using a modulus of M is (aR) mod M, where the Montgomery radix R is typically a power of 2, larger than M. In the right-to-left binary implementation of modular exponentiation, subsequently, referred to as Montgomery exponentiation, the conversions to and from the Montgomery form only occur before and after the exponentiation, as the intermediate (theoretical) conversions between the sequential multiplications within the exponentiation cancel out [38] . The block diagram for a realization of the Montgomery exponentiator is illustrated in Figure 3 .
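The structure of the right-to-left binary method can be sketched in software as follows. This is a plain-integer illustration rather than the hardware design: the two multiplications are written sequentially and use ordinary modular reduction instead of Montgomery multiplication. The function name and the argument `num_bits` (the fixed number of iterations) are assumptions of this sketch.

```python
def modexp_right_to_left(base, exponent, modulus, num_bits):
    """Compute base**exponent mod modulus by scanning exponent bits from the LSB.

    The loop always runs num_bits times, so the iteration count does not
    depend on the exponent value (num_bits must cover all exponent bits).
    """
    result = 1 % modulus
    power = base % modulus
    for _ in range(num_bits):
        if exponent % 2 == 1:
            # Conditional multiply: only needed when the current bit is set.
            result = (result * power) % modulus
        exponent //= 2
        # Unconditional squaring: independent of the multiply above, so a
        # hardware design can evaluate both in the same time slot.
        power = (power * power) % modulus
    return result


print(modexp_right_to_left(5, 117, 1009, 8) == pow(5, 117, 1009))  # True
```

Both multiplications inside one iteration read the same `power` value and write to different variables, which is what makes the parallel evaluation described above possible.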
Many hardware designs for computing Montgomery multiplications exist. A design involving the Karatsuba multiplication algorithm can be used to evaluate very large multiplications [11]. While this proved to be computationally effective in [11], such a method may not be suitable for some applications due to the prohibitive hardware resources required for evaluating Montgomery multiplications even with relatively small operands. Another method for implementing Montgomery multiplication involves using the Coarsely Integrated Operand Scanning (CIOS) variant [26] with a word size of a single bit. Implementations of this algorithm are described in [12, 31]. The bitwise approach greatly simplifies the architecture of the Montgomery multiplier, as it is only required to perform additions and right shifts. However, the bitwise design cannot make use of the multi-bit word embedded multipliers available on most modern FPGA devices.
A blockwise implementation of the CIOS method of Montgomery multiplication is ideal for the purposes of this paper as it is amenable to the use of embedded multipliers in FPGAs to perform smaller multiplications. Some implementations of this algorithm are discussed in [6, 37, 48] . These implementations range from using a constant number of embedded multipliers to the case where the number of embedded multipliers scales linearly with the number of bits in the operands to perform large parallel multiplications. Therefore, based on the amount of the available hardware resources, an appropriate implementation of the blockwise CIOS-based Montgomery multiplier can be designed to ensure the resources are utilized effectively.
In Algorithm 2, we borrow the modified CIOS method [22] with a word size of 16 bits. The modified CIOS method removes the conditional final subtraction in typical Montgomery multiplication implementations to reduce hardware resource consumption. Algorithm 2 also differs from the conventional Montgomery multiplication in that it produces outputs that possibly have the modulus M added to it, rather than an output in Z M . Such an output is acceptable as long as an explicit conversion from this modified Montgomery form, through Montgomery multiplication by 1, is used to produce the final result [22] .
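A word-serial Montgomery multiplication without the final subtraction can be sketched as below. The sketch follows the standard word-by-word construction with 16-bit words and $w + 1$ iterations; it is not claimed to match Algorithm 2 line by line. Assumptions of the sketch: `w` is chosen per modulus (the smallest value with modulus $< 2^{16w}$), the modulus is odd, and all function names are hypothetical. With $R = 2^{16(w+1)} > 4M$, inputs below $2M$ give outputs below $2M$, so results can be chained without reduction.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def montgomery_setup(modulus):
    """Return (w, m_prime, radix) for an odd modulus."""
    w = 1
    while modulus >= 1 << (WORD_BITS * w):
        w += 1
    # m_prime = -modulus^{-1} mod 2^16 (modular inverse needs Python 3.8+)
    m_prime = (-pow(modulus, -1, 1 << WORD_BITS)) & WORD_MASK
    radix = 1 << (WORD_BITS * (w + 1))
    return w, m_prime, radix


def mont_mul(x, y, modulus, w, m_prime):
    """Return t with t = x * y * radix^{-1} (mod modulus) and t < 2 * modulus."""
    t = 0
    for _ in range(w + 1):
        z = x * (y & WORD_MASK)          # partial product with the low word of y
        y >>= WORD_BITS
        m = (((t & WORD_MASK) + (z & WORD_MASK)) * m_prime) & WORD_MASK
        t = (t + z + m * modulus) >> WORD_BITS   # exact division by 2^16
    return t


def to_mont(a, modulus, w, m_prime, radix):
    """Convert to Montgomery form by multiplying with radix^2 mod modulus."""
    return mont_mul(a, (radix * radix) % modulus, modulus, w, m_prime)


def from_mont(a, modulus, w, m_prime):
    """Convert back by a Montgomery multiplication with 1."""
    return mont_mul(a, 1, modulus, w, m_prime)


M = 1022117 ** 2                      # toy odd modulus
w, m_prime, R = montgomery_setup(M)
a_m = to_mont(123456789, M, w, m_prime, R)
b_m = to_mont(987654321, M, w, m_prime, R)
prod = from_mont(mont_mul(a_m, b_m, M, w, m_prime), M, w, m_prime)
print(prod % M == (123456789 * 987654321) % M)  # True
```

Only additions, word-sized multiplications, and right shifts appear inside the loop, which is the property that makes the method attractive for embedded multipliers.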
Across all Montgomery multipliers, we use the same value of the Montgomery radix $R = 2^{16(w+1)}$, where $w$ is the smallest integer such that $N^2 + 2 < 2^{16w}$; note that $N^2 + 2$ is the largest modulus used in the system. Throughout the encrypted control system, there are only three different values used as modulus, so these values can be coded into the Montgomery multipliers required, with an input allowing for the selection of the modulus. In the Paillier encryption scheme, all modular exponentiations have modulus $M = N^2$, where $N$ is the public key.

Fig. 4. Block diagram of control computation using the Montgomery multiplication and exponentiation. These parallel blocks sit within the controller block in Figure 2.
In what follows, using the custom digital implementations of the Montgomery multipliers and the Montgomery exponentiators as the underlying arithmetic blocks, we design plant interface and physical system controller modules for an encrypted control system secured with the Paillier encryption scheme. As shown in Figure 1 , the plant interface performs encryptions of system outputs and decryptions of control inputs, and the controller evaluates the control law securely over encrypted data. The ciphertexts transmitted between the plant interface and the controller are in the Montgomery form.
Parallelization is possible within the building blocks of the Montgomery multiplier and the Montgomery exponentiator, and also in the designs of the plant interface and controller. Adding parallelization increases the resource consumption of the hardware design, which is a limiting factor. To offset this, resources are reused whenever possible. In particular, the Montgomery multipliers used to implement the Montgomery exponentiators can also be used whenever single modular multiplications are required, rather than instantiating separate Montgomery multipliers for this purpose.
Controller Module Design
Consider the dynamic controllers in (5). There are computations for incorporating the state of the controller into the generated control inputs, described in Algorithm 3, and for updating the state of the controller, described in Algorithm 4. The update of the controller state can be performed independently of the generation of the control inputs. Figure 4 illustrates the block diagram for a possible realization of Algorithms 3 and 4. Because the computations of the individual control inputs are independent of each other, they can all be performed in parallel using n_u copies of the multiplier and exponentiator. These parallelizations allow physical systems with more inputs and outputs to be controlled without increasing the time required to perform the encryptions and decryptions. However, as a trade-off, more hardware resources are required, so in resource-limited scenarios these computations can be performed sequentially if a longer sampling period is acceptable. If the computations are all performed sequentially, the controller requires only one Montgomery exponentiator module. The matrix multiplications for updating the state of the controller can also be parallelized for each row by utilizing n_x copies of the multiplier and exponentiator. The modular exponentiations can also be performed in parallel, and the results are multiplied together afterwards in a binary tree structure with a latency of ⌈log_2(n_x + n_y)⌉ times the latency of the Montgomery multiplication.
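The logarithmic latency of the binary tree can be checked with a short software model. This is a sketch under the assumption that every multiplication within one tree level runs concurrently, so the latency is counted in levels; `tree_product` is a hypothetical name, and ordinary modular multiplication stands in for the Montgomery multiplier.

```python
def tree_product(values, modulus):
    """Multiply values pairwise, level by level; return (product, number of levels)."""
    if not values:
        raise ValueError("need at least one value")
    level = [v % modulus for v in values]
    depth = 0
    while len(level) > 1:
        # All pairs in one level are independent and could run in parallel.
        nxt = [(level[i] * level[i + 1]) % modulus
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])      # odd element passes through to the next level
        level = nxt
        depth += 1
    return level[0], depth
```

For five operands the tree has three levels, matching ⌈log_2 5⌉ = 3, whereas a sequential chain would need four multiplications one after another.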
Plant Interface Module Design
The plant interface's role in the encrypted control system is to encrypt the plant outputs and decrypt the control inputs. There is no requirement for a single plant interface that performs both encryptions and decryptions, as these functionalities can be separated into distinct modules if the actuators and the sensors are physically apart. However, a single plant interface module allows for the reuse of hardware resources for both encryption and decryption, reducing the hardware cost of the system.
The Paillier encryption algorithm in Algorithm 5 requires values of r^N mod N^2 as inputs, which are independent of the plaintext being encrypted. The steps required for generating r^N mod N^2 are described in Algorithm 6. A block diagram similar to Figure 4 can be employed for a parallel realization of the steps in Algorithm 6. Note that it is possible to generate the value of r^N needed to encrypt the next system output sample in parallel with the controller computations involving the encryption of the current sample. This parallelization between the plant interface and controller decreases the time required for completing the necessary tasks within a sampling period without utilizing extra resources.
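The separation between the plaintext-independent factor r^N mod N^2 and the rest of the encryption can be sketched as follows. This is an illustration only: it assumes the common Paillier generator g = N + 1 and the textbook decryption with λ, which may differ in detail from Algorithms 5 and 7, and the function names are hypothetical.

```python
def precompute_rn(r, n):
    """Plaintext-independent part: r^N mod N^2 (can be prepared ahead of time)."""
    return pow(r, n, n * n)


def encrypt(message, rn, n):
    """Paillier encryption with g = N + 1, so g^m = 1 + m*N (mod N^2)."""
    n2 = n * n
    return ((1 + (message % n) * n) % n2) * rn % n2


def decrypt(ciphertext, n, lam):
    """Textbook Paillier decryption: L(c^lambda mod N^2) * lambda^(-1) mod N."""
    n2 = n * n
    u = pow(ciphertext, lam, n2)
    l_value = (u - 1) // n          # L(u) = (u - 1) / N, an exact division
    mu = pow(lam, -1, n)            # valid because g = N + 1 is assumed
    return (l_value * mu) % n
```

Because `precompute_rn` does not depend on the message, it can run while the previous sample is still being processed, which is the overlap described above. Multiplying two ciphertexts modulo N^2 yields an encryption of the sum of the plaintexts, which is the property the controller relies on.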
There are various approaches for generating cryptographically secure random or pseudo-random values for r. Random methods involve sampling a noise source, such as oscillator jitter; examples can be found in [4, 36, 49]. Pseudo-random methods are algorithms that generate numbers from an initial seed, which should be generated from a random method; examples can be found in [7, 8]. Depending on the method used, the generator can be implemented on the FPGA or external to it. The generated random numbers are used as the input to Algorithm 6, which first converts them to the Montgomery form in order to compute r^N. Note that, for larger encryption key lengths, checking [...]
[Algorithm listing residue. Recoverable parameter definitions: R, Montgomery radix; n′, number of bits in the mapping from fixed-point numbers Q(n, m) to integers Z_{2^{n′}}; T, controller reset period. The controller state is reset when (k + 1) mod T = 0.]
The tasks performed by the plant interface are described in Algorithms 5, 6, and 7, expressed as collections of Montgomery exponentiations and Montgomery multiplications. The inputs to all of these Montgomery operations are either constants (as the algorithm parameters do not change within any given implementation), algorithm inputs, or the results of previous operations. Every loop in Algorithms 5, 6, and 7 can be parallelized, as the iterations are independent of each other. For example, the encryptions of plaintexts are independent of each other, so individual encryptions can all be performed in parallel. The same applies to the calculation of values for r^N mod N^2 and to the decryptions of the ciphertexts. If, on the other hand, the plant interface is fully parallelized, then it requires max(n_y, n_u) Montgomery exponentiators, as the maximum number of encryptions or decryptions to be performed in parallel depends on whether there are more system outputs to encrypt or more control inputs to decrypt.
EXPERIMENT
To demonstrate the system, we have implemented encrypted balance control of an inverted pendulum using our plant interface and controller digital designs on an FPGA. Inverted pendulum systems are unstable and require a dynamic controller to be robustly stabilized. We use the Quanser QUBE-Servo 2 as the plant and the Terasic C5P Development Board (equipped with the Cyclone V GX 5CGXFC9D6F27C7 FPGA) to implement the plant interface and the encrypted controller. The setup is shown in Figure 5. We use the following dynamic controller with a control sampling frequency of 500 Hz to stabilize the inverted pendulum: [...] is the measured pendulum angle, all in encoder counts (with 2048 encoder counts measured per revolution), 0_{n×m} is a matrix of zeros with n rows and m columns, and I_{n×n} is an identity matrix of size n. The resulting control input u is a number between −999 and 999, representing a duty cycle and direction. We implement this controller using n′ = 32 bits, m = 7 bits, and an encryption key length of 256 bits. In Section 2, as there were no assumptions on the integer or fractional nature of the parameters, all parameters were multiplied by 2^m to generate equivalent integer numbers. However, in this experiment, the sensor measurements and the C matrix are already integers, so we use the following substitutions in our encrypted system:
Since there is no state evolution (i.e., the state is a simple two-step delay used to calculate velocities from position measurements by first-order difference), resetting the controller state is not required. Rounding and clamping of the generated control input are performed externally to the plant interface and controller.
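The fixed-point scaling used in this experiment (n′ = 32 bits, m = 7 fractional bits) can be sketched as below. The sketch assumes that negative numbers wrap around modulo 2^{n′} in a two's-complement style, which is one common way to realize a mapping into Z_{2^{n′}} and may not match the paper's mapping exactly; the function and constant names are hypothetical.

```python
N_PRIME_BITS = 32   # n' in the text
FRAC_BITS = 7       # m in the text


def to_integer(value, frac_bits=FRAC_BITS, total_bits=N_PRIME_BITS):
    """Scale by 2^m, round, and wrap negatives into [0, 2^n') (assumed mapping)."""
    scaled = round(value * (1 << frac_bits))
    half = 1 << (total_bits - 1)
    if not -half <= scaled < half:
        raise OverflowError("value does not fit in the chosen word length")
    return scaled % (1 << total_bits)


def from_integer(z, frac_bits=FRAC_BITS, total_bits=N_PRIME_BITS):
    """Undo the wrap-around and remove one factor of 2^m."""
    half = 1 << (total_bits - 1)
    signed = z - (1 << total_bits) if z >= half else z
    return signed / (1 << frac_bits)
```

With this convention, a gain of −1.5 is stored as 2^32 − 192, and multiplying it by an integer measurement of 100 encoder counts modulo 2^32 decodes to −150.0. Only one factor of 2^m has to be removed because the measurement is already an integer, which is the situation described for this experiment.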
We utilize the Montgomery multiplier design in Algorithm 2, which has an embedded multiplier usage that scales linearly with encryption key length. We run two Montgomery multipliers in parallel in each Montgomery exponentiator, and run a single Montgomery exponentiator in each of the plant interface and controller modules. We neglect the generation of random numbers, but still calculate a number to the power N in each control sampling period. We also neglect instantiating a separate module to encrypt setpoints, and instead encrypt the setpoint in the controller, without the use of random numbers. Neither of these simplifications affects the synthesis or timing of the digital design, as the random number generation can be done outside of the digital engine using commercially available integrated circuits for random number generation, and the encryption of setpoints with random numbers can occur in parallel with the encryption of system outputs, thus not extending the minimum control sampling period. Importantly, on the FPGA we have distinct plant interface and controller modules and use an abstracted network to communicate encrypted data between them. Figure 9 shows the hardware resource usage of the plant interface module as the encryption key length increases, for our implementation. Figure 8 shows the minimum control sampling period as the encryption key length increases from 64 bits to 512 bits, which affects the speed with which physical systems can be controlled. For the key length of 512 bits, the sampling time of the system is 10 ms. Implementations using other Montgomery multiplier architectures can potentially result in completely different hardware resource usages and speeds. Such issues are the topic of future work. Figure 6 shows the system behaviour converging to its setpoint. Figure 7 shows the system behaviour when disturbances are introduced at the tip of the pendulum.
Evidently, the controller successfully attenuates large disturbances (of peak magnitude of twenty degrees).
In the experiments, we found that the latency of the plant interface determines the maximum control sampling frequency. This is due to the Montgomery exponentiations with the large exponents N and λ, which require more Montgomery multiplications than the Montgomery exponentiations in the controller, where the exponents are shorter. If a higher control sampling frequency is required, then the plant interface digital design could make use of the Chinese Remainder Theorem [13] to reduce the size of the modulus in Montgomery exponentiations, speeding up each calculation.
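The Chinese Remainder Theorem speed-up mentioned here can be sketched as follows. The sketch assumes that the plant interface holds the prime factors p and q of N, and it replaces one exponentiation modulo N^2 with two exponentiations modulo the half-size moduli p^2 and q^2, followed by a recombination step. The name `crt_pow` is hypothetical, and Python's built-in `pow` stands in for the Montgomery exponentiator.

```python
def crt_pow(base, exponent, p, q):
    """Compute base**exponent mod (p*q)**2 from two half-size exponentiations."""
    p2, q2 = p * p, q * q
    xp = pow(base, exponent, p2)      # residue modulo p^2
    xq = pow(base, exponent, q2)      # residue modulo q^2
    # Garner-style recombination: find the unique x < p2*q2 with
    # x = xp (mod p2) and x = xq (mod q2). Requires distinct primes p and q.
    q2_inv = pow(q2, -1, p2)
    h = ((xp - xq) * q2_inv) % p2
    return xq + h * q2
```

The two half-size exponentiations are independent of each other, so in hardware they could run on separate exponentiators or be interleaved on one; which option pays off depends on the available resources.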
The hardware description language (HDL) code used for synthesizing the encryption, controller, and decryption in the experiment can be found at https://github.com/availn/EncryptedControl. A video of the experiment can also be found at https://youtu.be/ATM0tcecst0.
CONCLUSIONS AND FUTURE WORK
We presented an experimental setup to demonstrate a powerful framework for encrypted dynamic control of unstable systems using digital designs on FPGAs with deterministic latency. The framework is scalable and can be applied to large-scale cyber-physical systems. Future work includes investigation of methods for speeding up the computations and studying the effect of uncertain communication systems on the performance of the system.
ACKNOWLEDGMENTS
