Optimizing relinearization in circuits for homomorphic encryption by Chen, Hao
arXiv:1711.06319v1 [cs.DS] 25 Oct 2017
Optimizing relinearization in circuits for homomorphic encryption
Hao Chen
Microsoft Research
haoche@microsoft.com
Abstract
Fully homomorphic encryption (FHE) allows an untrusted party to evaluate arithmetic cir-
cuits, i.e., perform additions and multiplications on encrypted data, without having the decryp-
tion key.
One of the most efficient classes of FHE schemes comprises the BGV/FV schemes, which are based
on the hardness of the RLWE problem. They share some common features: ciphertext sizes
grow after each homomorphic multiplication; multiplication is much more costly than addition;
and the cost of homomorphic multiplication scales linearly with the input ciphertext sizes.
Furthermore, there is a special relinearization operation that reduces the size of a ciphertext,
and the cost of relinearization is on the same order of magnitude as homomorphic multiplication.
This motivates us to define a discrete optimization problem, which is to decide where (and how
much) in a given circuit to relinearize, in order to minimize the total computational cost.
In this paper, we formally define the relinearize problem. We prove that the problem is
NP-hard. In addition, in the special case where each vertex has at most one outgoing edge, we
give a polynomial-time algorithm.
1 Introduction
Fully homomorphic encryption (FHE) is an encryption technique which allows any untrusted party
to evaluate functions on encrypted data without the decryption key. As a typical application, FHE
allows a client to outsource computation to an untrusted cloud. It has generated interest in fields
such as health and finance, due to the need to analyze sensitive data without having access to the
data itself. Since Gentry introduced the first FHE scheme in 2009, there has been a line of work that
proposed new FHE schemes with improved efficiency, among which two of the most widely used
schemes are [BGV14] and its scale-invariant counterpart [FV12]. Implementations of these schemes
include [HS14], [CLP], and [AMBG+16]. There have been numerous works that design applications
based on these schemes. Some of them ([GBDL+16, BCIV17]) evaluate machine learning models
on encrypted data. Others use FHE to design secure protocols such as private information retrieval
[MBFK16] and private set intersection [CLR17].
Unfortunately, in these schemes homomorphic operations are still several orders of magnitude
slower than performing the same operation on plaintexts. Therefore, any optimization of the
computation time is of great interest. In order to use FHE to evaluate a function, one first needs to
express the function as an arithmetic circuit. The circuit is represented as a directed acyclic graph
with each vertex being either an input, an output, or an arithmetic operation such as multiplication
and addition. In both schemes mentioned above, a fresh ciphertext is a pair of polynomials. When
a homomorphic multiplication is performed, the length of the output ciphertext grows. More
precisely, if we denote the length of a ciphertext c by l(c), then l(c1 ⊗ c2) = l(c1) + l(c2)− 1. The
length of the result of a homomorphic addition is the maximum length of the two operands, i.e.,
l(c1 ⊕ c2) = max{l(c1), l(c2)}.
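The two length rules above can be sketched in a few lines of Python. The function names and the constant `FRESH_LENGTH` are illustrative choices, not part of either scheme's API; the sketch only tracks lengths and performs no encryption.

```python
FRESH_LENGTH = 2  # a fresh ciphertext is a pair of polynomials


def mul_length(l1: int, l2: int) -> int:
    """Length of c1 (x) c2 before any relinearization: l(c1) + l(c2) - 1."""
    return l1 + l2 - 1


def add_length(l1: int, l2: int) -> int:
    """Length of c1 (+) c2: the maximum of the two operand lengths."""
    return max(l1, l2)


def product_length(m: int) -> int:
    """Length of a product of m fresh ciphertexts, multiplied one at a time
    with no relinearization in between."""
    length = FRESH_LENGTH
    for _ in range(m - 1):
        length = mul_length(length, FRESH_LENGTH)
    return length
```

Under these rules a product of m fresh ciphertexts has length m + 1, which illustrates how quickly lengths grow when no relinearization is applied.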
Roughly speaking, the computational cost to perform a homomorphic multiplication scales
linearly with its input lengths. In both schemes, we can model the amount of work it takes to
perform a homomorphic multiplication between two ciphertexts c1 and c2 by
km(l(c1) + l(c2)),
where km is some scheme-dependent constant. In FHE, there is also a squaring operation, which
takes as input an encryption of x and returns an encryption of x^2. It has the same cost (see footnote 1) and
length effect as multiplication, but only takes one input.
Homomorphic additions, on the other hand, take much less time to perform than
multiplications. Hence in this work we will assume that additions are “free”. For the same reason,
we adopt the common convention from the FHE literature and define the depth of a circuit to be
the largest number of multiplication vertices contained in a path.
Note that it is undesirable to let the ciphertext sizes grow, since it will increase both the
computational cost and the storage burden. To control the ciphertext sizes, both schemes support
a special operation called Relinearization. Effectively, relinearizing a ciphertext means reducing its
length, while keeping the underlying message the same. We can use this operation to reduce the
length of a ciphertext to any integer between two and its original length. The cost of relinearization
scales linearly with the reduction in ciphertext length. In other words, there exists a constant kr
such that reducing the ciphertext length by i takes i · kr units of work.
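As a minimal sketch of this cost model, the following Python snippet combines the multiplication cost km(l(c1) + l(c2)) with the relinearization cost i · kr. The class `CostModel` and the helper `multiply_then_relinearize` are hypothetical names introduced here for illustration, and the constants passed in are placeholders rather than measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    k_m: int  # scheme-dependent multiplication constant (placeholder value)
    k_r: int  # scheme-dependent relinearization constant (placeholder value)

    def mul_cost(self, l1: int, l2: int) -> int:
        """Work for one homomorphic multiplication: k_m * (l(c1) + l(c2))."""
        return self.k_m * (l1 + l2)

    def relin_cost(self, reduction: int) -> int:
        """Work for reducing a ciphertext length by `reduction`."""
        if reduction < 0:
            raise ValueError("relinearization cannot increase the length")
        return self.k_r * reduction


def multiply_then_relinearize(model: CostModel, l1: int, l2: int, target: int):
    """Multiply two ciphertexts, then relinearize the product down to `target`.

    Returns (resulting length, total work). The target must lie between
    two and the natural product length l1 + l2 - 1.
    """
    natural = l1 + l2 - 1
    if not 2 <= target <= natural:
        raise ValueError("target length out of range")
    work = model.mul_cost(l1, l2) + model.relin_cost(natural - target)
    return target, work
```

For example, with `CostModel(k_m=1, k_r=3)`, multiplying two fresh ciphertexts and relinearizing back to length 2 costs 4 + 3 units, while skipping the relinearization costs 4 units and leaves a ciphertext of length 3.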
Suppose we are given an arithmetic circuit to perform on encrypted inputs. It is now an
optimization problem to decide where and how much to relinearize, in order to minimize the
total amount of work, consisting of multiplication cost and relinearization cost. Previous works
employ the simple strategy of relinearizing after every multiplication/squaring. In this way, the
multiplication costs are kept minimal. However, this strategy is not always optimal, as we will
demonstrate in Section 2.1.
1.1 Roadmap
In Section 2, we will formally describe the problem and show why this simple strategy can be
sub-optimal. In Section 3, we prove that the relinearize problem is NP-hard by reducing from the
knapsack problem. Finally, in Section 4, we restrict to the special case where each vertex in the
circuit has at most one outgoing edge, and give a polynomial time algorithm.
1.2 Related work
The work [CDS15] is an effort to find a good circuit representation of a function, in order to
minimize the total computation time.
Bootstrapping is an operation that refreshes the so-called noise in FHE ciphertexts. It is an
essential yet expensive operation. The two papers [LP13] and [BLMZ17] aim at minimizing the
total number of bootstrapping operations in a circuit, while keeping the noise from overflowing
in order to ensure the final result is correct. In their work, the authors implicitly assume that
relinearization is done after every multiplication. Similarly, we will make a simplifying assumption
that the bootstrapping time is a constant, so that it does not factor into our optimization problem.
It will be interesting to combine these works in order to achieve an overall optimization that targets
both operations.
Footnote 1: Actually, the cost of squaring x is a constant factor smaller than multiplying x with itself. For simplicity, we will
assume that the costs are equal. This simplification does not invalidate the results.
Acknowledgement. The author thanks Rebecca Hoberg and Mohit Singh for helpful discussions
in preparing this work.
2 Problem Description
To formally describe our problem, we need to properly define circuits used in FHE applications.
We modify the usual definition of arithmetic circuits to include the squaring operations in FHE.
Definition 1. A (squaring-enabled) arithmetic circuit is a directed acyclic graph G = (V,E), where
there are four kinds of vertices: input vertices have indegree 0 and outdegree 1; output vertices have
indegree ∈ {1, 2} and outdegree 0; add/multiply operation vertices have indegree 2 and outdegree
1; finally, square operation vertices have indegree and outdegree both equal to 1.
We will define the relinearize problem as an integer programming problem on arithmetic cir-
cuits. For every vertex i, we maintain an integer variable lnew(i) (the final length of vertex i during
homomorphic evaluation of G), and an integer variable xi, which indicates the amount of relin-
earization at i. We will denote the two parents of a vertex i by p1(i) and p2(i). If i is a squaring
vertex, then we set p1(i) = p2(i). We denote addition vertices by ⊕ and multiplication/square
vertices by ⊗. To resolve ambiguity, we make the convention that if a ⊗ vertex has two distinct
parents, then it is understood as a multiplication; otherwise it is a squaring.
Then the relinearize problem on G is
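\[
\begin{aligned}
\text{minimize} \quad & k_r \sum_{i \in V} x_i + \sum_{i = \otimes} k_m \bigl( l^{\mathrm{new}}(i) + x_i \bigr) \\
\text{s.t.} \quad & l^{\mathrm{new}}(i) \ge 2 && \text{for all } i, \\
& l^{\mathrm{new}}(i) = l^{\mathrm{new}}(p_1(i)) + l^{\mathrm{new}}(p_2(i)) - 1 - x_i && \text{if } i = \otimes, \\
& l^{\mathrm{new}}(i) \ge l^{\mathrm{new}}(p_1(i)) - x_i && \text{if } i = \oplus, \\
& l^{\mathrm{new}}(i) \ge l^{\mathrm{new}}(p_2(i)) - x_i && \text{if } i = \oplus, \\
& x_i,\ l^{\mathrm{new}}(i) \in \mathbb{Z}_{\ge 0} && \text{for all } i.
\end{aligned}
\]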
2.1 An example
To demonstrate the non-triviality of the relinearize problem, we consider the following circuit:
[Figure: the example circuit, with six input vertices, ⊗ and ⊕ operation vertices, and a vertex labeled u.]
First, we apply the simple strategy and relinearize at every multiplication vertex. Then the
total cost is equal to 12km + 3kr. Alternatively, we can choose to only relinearize the vertex u.
Then the multiplication cost increases to 14km, while the relinearization cost is kr, so the total cost
is 14km + kr. Comparing this with the previous cost, we see that as long as kr > km, the simple
strategy is not optimal.
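The comparison can be spelled out from the two totals stated above. Subtracting the cost of the alternative strategy from the cost of the simple strategy gives:

```latex
\[
(12 k_m + 3 k_r) - (14 k_m + k_r) = 2 (k_r - k_m).
\]
```

The difference is positive exactly when k_r > k_m, in which case relinearizing only at u is strictly cheaper; when k_r = k_m the two strategies cost the same.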
3 NP-hardness of the Relinearize Problem
We give a polynomial-time reduction from the knapsack problem to the relinearize problem, which
establishes that the latter problem is NP-hard. First we recall the definition of the knapsack problem.
Definition 2. Given positive integers v1, . . . , vn, w1, . . . , wn and W, the (0-1) knapsack problem
is:
\[
\text{maximize} \ \sum_{i=1}^{n} v_i x_i \quad \text{subject to } x_i \in \{0, 1\} \text{ and } \sum_{i=1}^{n} w_i x_i \le W.
\]
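For concreteness, here is a small Python sketch of the (0-1) knapsack problem solved by the textbook dynamic program over capacities. The function name `knapsack` and the sample instance are illustrative only. The running time is O(nW), which is pseudo-polynomial (polynomial in the value of W rather than in its bit length), so it does not conflict with the hardness of the problem.

```python
def knapsack(values, weights, capacity):
    """Return (best total value, 0/1 choice vector x) for a 0-1 knapsack instance."""
    n = len(values)
    # best[i][c] = maximum value achievable with the first i items and capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip item i
            if w <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)  # take item i

    # Walk the table backwards to recover one optimal choice vector.
    x = [0] * n
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            x[i - 1] = 1
            c -= weights[i - 1]
    return best[n][capacity], x
```

With values (60, 100, 120), weights (10, 20, 30) and W = 50, the sketch selects the second and third items.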
For our convenience, we make some modifications to the setting of the relinearize problem. We
change the input lengths from two to one, and we modify the equation l(c1 ⊗ c2) = l(c1) + l(c2) − 1
to l(c1 ⊗ c2) = l(c1) + l(c2). One can check that under this modification, the length of every vertex
is smaller by one. Hence the modified problem is equivalent to the original problem.
To prepare for the main theorem, we make some convenient definitions.
Definition 3. A circuit is of type L(k) if it consists of one input vertex, one output vertex, and
multiplication/squaring vertices, such that if the first non-input vertex length is reduced from 2 to
1, then the length of the output vertex reduces by k.
Figure 1 is an example of L(7).
[Figure 1: an example of L(7).]
Lemma 1. For all integers k ≥ 1, there exists a circuit of type L(k) which has at most 2⌈log k⌉
vertices. Moreover, the cost to evaluate this circuit is bounded above by 4kmk⌈log(k)⌉.
[Figure 2: Example of G1 ⊞ G2.]
Proof. If k is a power of 2, we can realize L(k) by a circuit that does log(k)+1 consecutive squarings.
The total cost of executing the circuit is km · (2 + 4 + · · · + 2k) < 4kmk. In general, we can start
by building the circuit L(2^⌊log(k)⌋). Then for every nonzero bit in the binary representation of k, we
need to add a multiplication vertex. Since there are at most log(k) bits, we know the number of
vertices is at most 2 log(k).
As for the evaluation cost, note that each vertex in the circuit has length bounded above by
2k, hence evaluating it has cost bounded by 2kkm. The claim follows because there are at most
2⌈log(k)⌉ vertices.
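The power-of-two case of the proof can be checked numerically. The sketch below assumes the modified convention of this section (input length one, and l(c1 ⊗ c2) = l(c1) + l(c2)); the function `squaring_chain` and its `first_length` parameter are hypothetical names introduced for illustration. It counts multiplication cost only and does not include the cost kr of the relinearization itself.

```python
def squaring_chain(k: int, first_length: int = 2, k_m: int = 1):
    """Simulate log2(k) + 1 consecutive squarings of one input of length 1.

    first_length is the length of the first non-input vertex after an
    optional relinearization: 2 means none, 1 means reduced by one.
    Returns (output length, total multiplication cost).
    """
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    steps = k.bit_length()        # equals log2(k) + 1 for a power of two
    cost = k_m * (1 + 1)          # first squaring: both operands have length 1
    length = first_length
    for _ in range(steps - 1):
        cost += k_m * (length + length)  # squaring a ciphertext of this length
        length = length + length         # modified rule: lengths add up
    return length, cost
```

For k = 8 the chain has output length 16 and cost 2 + 4 + 8 + 16 = 30 < 4k. Reducing the first non-input vertex from 2 to 1 lowers the output length to 8, a reduction of exactly k, as required for type L(k).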
Next we describe some simple ways to construct new circuits from old ones.
Definition 4. (1) The addition/multiplication of two circuits. Take two circuits G1 and G2 with
unique output vertices v1 and v2. Then G1 ⊞G2 (resp. G1 ⊠G2) is the circuit that is the union of
G1 and G2, plus an extra addition (resp. multiplication) vertex that has v1 and v2 as parents. See
Figure 2 for an example.
(2) The concatenation of two circuits. Let G1, G2 be two circuits such that the number of
output vertices of G1 is equal to the number of inputs of G2. Then we simply “feed” the outputs
of G1 to inputs of G2. We denote the resulting circuit by G1 y G2. See Figure 3 for an example.
(3) The K-repeat of a circuit along a subset of vertices. Let G be a circuit and let S =
{s1, . . . , sk} be vertices of G. Let K be a positive integer. Then we keep the vertices si and all
their ancestors, and copy the rest of the circuit K times. The resulting circuit is denoted by G^(K)_S.
See Figure 4 for an example.
(4) The gluing of two circuits along a subset of vertices. Let G1 and G2 be two circuits and S1, S2
be subsets of their vertices, such that the subgraph of G1 consisting of ancestors of S1 (including
vertices in S1) is isomorphic to the corresponding subgraph in G2. Then the gluing of G1 and G2
along S1, S2 is the circuit that contains the common subgraph and the disjoint union of the rest
of the two graphs. We denote the new circuit by G1 ⋆S1 G2 when S2 and the isomorphism are clear
from context. See Figure 5 for an example. Note that (3) is a special case of (4).
Now we are ready to state our main theorem. Consider a knapsack problem with parameters
vi(1 ≤ i ≤ n), wi(1 ≤ i ≤ n) and W .
Theorem 1. There exists a circuit G = G(vi, wi,W ), and integers km, kr such that
(1) G has O(polylog(vi, wi, W) · poly(n)) vertices.
(2) km, kr = O(poly(vi, wi,W, n)).
[Figure 3: Example of G1 y G2.]
[Figure 4: Example of G^(K)_S for K = 2 and S = {s1, s2}.]
[Figure 5: Example of G1 ⋆s1 G2.]
(3) There exists a set of n vertices s1, . . . , sn in G such that, if l*_new(i) denotes the length
of si in an optimal solution to the relinearize problem on G, then l*_new(i) (1 ≤ i ≤ n) is an optimal
solution to
\[
\max \sum_{i=1}^{n} v_i l_i, \quad \text{s.t. } l_i \in \{1, 2\}, \ \sum_{i=1}^{n} w_i l_i \le W + \sum_{i=1}^{n} w_i.
\]
Hence l*_new(i) − 1 (1 ≤ i ≤ n) is an optimal solution to the original knapsack problem.
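The last step follows from the substitution l_i = x_i + 1 with x_i ∈ {0, 1}, which can be written out as:

```latex
\[
\sum_{i=1}^{n} w_i l_i \le W + \sum_{i=1}^{n} w_i
\iff \sum_{i=1}^{n} w_i x_i \le W,
\qquad
\sum_{i=1}^{n} v_i l_i = \sum_{i=1}^{n} v_i x_i + \sum_{i=1}^{n} v_i .
\]
```

The constraint is therefore exactly the knapsack constraint, and the two objectives differ only by the constant ∑ v_i, so the optimal solutions correspond.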
Since our proof is long, we will break it into several parts. First, let K, T be positive integers
whose values will be determined later. We define a circuit
\[
G^0 := \bigl\{ \bigl( (L(w_1) \boxtimes L(w_2) \boxtimes \cdots \boxtimes L(w_n)) \boxplus L(W') \bigr) \mathbin{y} L(T) \bigr\}^{(K)}_{S}
\]
Here W′ = W + ∑_i wi, and S = {s1, . . . , sn}, where si is the first non-input vertex in the circuit
L(wi). In particular, with no relinearization the length of si is equal to 2. Consider the relinearize
problem on the circuit G^0 and let li be the new lengths of si. Without loss of generality, we assume
that wi ≤ W for all i (if wi > W, then any optimal solution of the knapsack problem always has
xi = 0, and we can reduce the dimension of the problem by one).
Lemma 2. Suppose
\[
k_r > 4 k_m (T \log T + W' \log W'),
\]
and km = 1. Then for any optimal solution to the relinearize problem on G^0, the only vertices that
could have nonzero relinearization are the si.
Proof. By Lemma 1, the total cost of evaluating a circuit of type L(T) is bounded by 4kmT log T,
hence relinearizing any single vertex in this circuit has benefit bounded by 4kmT log T. The situation
is similar for L(W′). Note that relinearizing vertices in L(W′) could reduce the length
of vertices in L(T), but the benefit is still bounded above by 4km(T log T + W′ log W′). For the
same reason, the benefit of relinearizing any vertex in any of the K copies of L(wi) is bounded by
4km(T log T + wi log wi). Since wi ≤ W′, this completes the proof.
Lemma 3. Suppose kmKT > kr. Then for any optimal solution to the relinearization problem on
G^0 we must have
\[
\sum_{i=1}^{n} l_i w_i \le W' := W + \sum_{i=1}^{n} w_i.
\]
Here again we recall that li ∈ {1, 2} denotes the length of si in an optimal solution.
Proof. Suppose the claim is false. Then there exists i such that li = 2. We relinearize the vertex
si, which reduces the length of the final output in each copy of L(wi) by wi, and the length of the
output vertex of
(L(w1) ⊠ L(w2) ⊠ · · · ⊠ L(wn))
is reduced by wi. Since ∑ li wi > W′, the length of the input vertex in each L(T) is reduced by
at least one, and the cost reduction from each L(T) is at least kmT. Hence the benefit we collect
from relinearizing si is at least kmKT, whereas the cost is kr. Since we assumed kmKT > kr, we
know relinearizing the vertex si reduces the total cost. This is a contradiction, since we started
with an optimal solution.
Now we can start proving Theorem 1.
Proof. (of Theorem 1) Let M = W + ∑_i wi + ∑_i vi. We take T = ⌈5M log M⌉, kr = 25⌈M log M log(M log M)⌉,
K = 6⌈log(M log M)⌉ and km = 1. It is easy to see that K, T, kr are of size polynomial in W, wi, vi.
One can verify that kmKT > kr and kr > 4km(T log T + W′ log W′). Thus, by Lemma 3, we have
∑ li wi ≤ W′ if li are the new lengths of si in any optimal solution to the relinearization problem
on G^0. This means we have the correct constraint. However, the costs are wrong: the total cost of
evaluating the circuit G^0 is given by
\[
K \Bigl( \sum_{i} r_i l_i \Bigr) + \sum_{i} k_r (2 - l_i) + C,
\]
where, as we proved in Lemma 1, ri ≤ 4wi log(wi). Here C is the cost of evaluating all the L(T)
circuits plus all the L(W′) circuits. The fact that C is a constant follows from Lemma 3.
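Note that the coefficient before li is equal to Kri − kr, and we want to modify this coefficient
to −vi. First, note that
\[
\begin{aligned}
K r_i - k_r &\le 4 K w_i \log w_i - k_r \\
&\le 4 K \lceil M \log M \rceil - k_r \\
&\le (24 - 25) \lceil M \log M \log(M \log M) \rceil \\
&\le -M \\
&\le -v_i, \quad \forall i.
\end{aligned}
\]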
Let λi = kr − Kri − vi ∈ Z≥0. We claim that there exists a circuit L′(λi) such that relinearizing its
first non-input vertex reduces the total multiplication cost by λi. We omit the details of the construction
of L′ since it is similar to that of L. In particular, L′(λi) can be constructed with at most
2 log(λi) vertices. We then let
\[
G^1 = G^0 \star_{s_1} L'(\lambda_1), \ \ldots, \ G^i = G^{i-1} \star_{s_i} L'(\lambda_i), \ \ldots, \ G^n = G^{n-1} \star_{s_n} L'(\lambda_n)
\]
and set G = G^n. Since λi < kr, one can see that in any optimal solution of the relinearize
problem on G, the vertices in L′(λi) have zero relinearization. Thus, the relinearize problem on G
is equivalent to
\[
\min \sum_{i=1}^{n} -v_i l_i + C', \quad \text{s.t. } l_i \in \{1, 2\} \text{ and } \sum_{i=1}^{n} w_i l_i \le W + \sum_{i=1}^{n} w_i,
\]
which is equivalent to
\[
\max \sum_{i=1}^{n} v_i l_i, \quad \text{s.t. } l_i \in \{1, 2\} \text{ and } \sum_{i=1}^{n} w_i l_i \le W + \sum_{i=1}^{n} w_i.
\]
This proves part (3) of Theorem 1. Part (1) is clear since the number of vertices in G is bounded
by 2K(log(T) + log(W′) + ∑_{i=1}^{n} log(wi)) + 2 ∑_{i=1}^{n} log(λi). Hence it is logarithmic in the parameters
vi, wi, W and linear in the number of variables n. For (2), note that we set km = 1, so it suffices to
prove it for kr. By construction, kr is also bounded by a polynomial in vi, wi, W. This completes
the proof.
Corollary 1. The relinearize problem is NP-hard.
4 A Simple Case
Assume we are in the situation where each non-input vertex in the circuit has two inputs and at
most one output. In this case, we have a polynomial time algorithm for the relinearize problem.
For a vertex i, define M(i, ℓ) to be the minimal cost to compute the circuit up to vertex i, so that
the new length of i is ℓ.
Recall that p1(i) and p2(i) denote the parents of i. If i is a multiplicative vertex, we have
\[
M(i, \ell) = \min_{\ell_1, \ell_2} \bigl\{ M(p_1(i), \ell_1) + M(p_2(i), \ell_2) + k_r (\ell_1 + \ell_2 - \ell) + k_m (\ell_1 + \ell_2) \bigr\}.
\]
If i is an addition vertex, we have
\[
M(i, \ell) = \min_{\ell_1, \ell_2} \bigl\{ M(p_1(i), \ell_1) + M(p_2(i), \ell_2) + k_r (\max\{\ell_1, \ell_2\} - \ell) \bigr\}.
\]
Here it is important that the vertices all only have a single output, since otherwise p1(i) and
p2(i) might have a common ancestor, in which case relinearizing this ancestor might benefit both
of them.
Claim 1. Suppose N = |V | ≥ 2. Then in the above formulae, it suffices to take the minimum over
range 2 ≤ ℓ1, ℓ2 ≤ N .
Proof. For the input vertices, the length is at most 2. For any non-input vertex v, we prove
inductively that its length cannot exceed its number of ancestors. The length is at most l(p1(v)) +
l(p2(v)) − 1, and by the inductive hypothesis, both l(p1(v)) and l(p2(v)) are at most their number of
ancestors (or plus one if it happens to be an input vertex). That is, l(p1(v)) + l(p2(v)) − 1 ≤
n1 + n2 + 1 = n. Here n1, n2, n denote the number of ancestors for p1(v), p2(v), v, respectively.
Now our algorithm proceeds as follows. We traverse the N vertices. At each vertex, we compute
M(i, l) for O(N) values of l, and each computation requires O(N^2) operations. Thus the total
running time is O(N^4). Finally, the optimal cost is given by min_{2≤l≤N} M(v, l), where v is the
output node of the graph G.
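A minimal Python sketch of this dynamic program is given below, under several stated assumptions. The circuit encoding (a dictionary mapping a vertex name to `("input",)`, `("mul", p1, p2)` or `("add", p1, p2)`) and the function name `min_relin_cost` are hypothetical. The circuit is assumed to be a tree with distinct parents at every operation vertex, so squarings are not handled. For a multiplication vertex the sketch takes the unrelinearized length to be ℓ1 + ℓ2 − 1, following the length rule of Section 1 and the constraint in Section 2, so the relinearization amount is ℓ1 + ℓ2 − 1 − ℓ; this is an assumption of the sketch.

```python
from math import inf


def min_relin_cost(circuit, output, k_m, k_r):
    """Minimum total cost (multiplications plus relinearizations) to evaluate
    a tree-shaped circuit up to the vertex `output`."""
    tables = {}  # vertex -> {final length l: minimal cost M(vertex, l)}

    def table(v):
        if v in tables:
            return tables[v]
        node = circuit[v]
        if node[0] == "input":
            result = {2: 0}  # a fresh ciphertext has length 2 and costs nothing
        else:
            op, p1, p2 = node
            t1, t2 = table(p1), table(p2)
            result = {}
            for l1, c1 in t1.items():
                for l2, c2 in t2.items():
                    if op == "mul":
                        natural = l1 + l2 - 1              # length before relinearizing
                        base = c1 + c2 + k_m * (l1 + l2)   # multiplication work
                    else:
                        natural = max(l1, l2)              # additions are free
                        base = c1 + c2
                    # Try every admissible final length l for this vertex.
                    for l in range(2, natural + 1):
                        cost = base + k_r * (natural - l)
                        if cost < result.get(l, inf):
                            result[l] = cost
        tables[v] = result
        return result

    return min(table(output).values())
```

As a usage example, consider the chain (((a ⊗ b) ⊗ c) ⊗ d) ⊗ e on five inputs. With k_m = 1 and k_r = 1 the sketch returns 19, obtained by relinearizing after the first two multiplications; with k_m = 1 and k_r = 10 it returns 22, the cost of never relinearizing.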
5 Conclusion and Future Work
Fully homomorphic encryption evaluates boolean circuits, and relinearization is a standard tech-
nique to reduce the ciphertext sizes after evaluation. In this paper, we consider the goal of optimiz-
ing where and how much to perform the relinearization operation in any given circuit, in order to
minimize the total computational cost. We formalized it as a discrete optimization problem, and
proved that the problem is NP-hard. In the special case where every node has at most one output
node, we give a polynomial time algorithm.
For future directions, it is of interest to design fast approximate algorithms for the relineariza-
tion problem. Also, one can aim at optimizing specific circuits that appear in the literature for
applications of FHE. Examples include components of the AES encryption/decryption circuit and
machine learning models such as logistic regression or neural networks.
References
[AMBG+16] Carlos Aguilar-Melchor, Joris Barrier, Serge Guelton, Adrien Guinet, Marc-Olivier
Killijian, and Tancrède Lepoint. NFLlib: NTT-based fast lattice library. In Cryptographers’
Track at the RSA Conference, pages 341–356. Springer, 2016.
[BCIV17] Joppe W Bos, Wouter Castryck, Ilia Iliashenko, and Frederik Vercauteren. Privacy-
friendly forecasting for the smart grid using homomorphic encryption and the group
method of data handling. In International Conference on Cryptology in Africa, pages
184–201. Springer, 2017.
[BGV14] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. (Leveled) fully homomor-
phic encryption without bootstrapping. ACM Transactions on Computation Theory
(TOCT), 6(3):13, 2014.
[BLMZ17] Fabrice Benhamouda, Tancrède Lepoint, Claire Mathieu, and Hang Zhou. Optimization
of bootstrapping in circuits. In Proceedings of the Twenty-Eighth Annual
ACM-SIAM Symposium on Discrete Algorithms, pages 2423–2433. SIAM, 2017.
[CDS15] Sergiu Carpov, Paul Dubrulle, and Renaud Sirdey. Armadillo: a compilation chain
for privacy preserving applications. In Proceedings of the 3rd International Workshop
on Security in Cloud Computing, pages 13–19. ACM, 2015.
[CLP] Hao Chen, Kim Laine, and Rachel Player. Simple encrypted arithmetic library-SEAL
v2.
[CLR17] Hao Chen, Kim Laine, and Peter Rindal. Fast private set intersection from homo-
morphic encryption. IACR Cryptology ePrint Archive, 2017:299, 2017.
[FV12] Junfeng Fan and Frederik Vercauteren. Somewhat practical fully homomorphic en-
cryption. IACR Cryptology ePrint Archive, 2012:144, 2012.
[GBDL+16] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig,
and John Wernsing. Cryptonets: Applying neural networks to encrypted data with
high throughput and accuracy. In International Conference on Machine Learning,
pages 201–210, 2016.
[HS14] Shai Halevi and Victor Shoup. Algorithms in HElib. In International Cryptology
Conference, pages 554–571. Springer, 2014.
[LP13] Tancrède Lepoint and Pascal Paillier. On the minimal number of bootstrappings in
homomorphic circuits. In International Conference on Financial Cryptography and
Data Security, pages 189–200. Springer, 2013.
[MBFK16] Carlos Aguilar Melchor, Joris Barrier, Laurent Fousse, and Marc-Olivier Killijian.
Xpir: Private information retrieval for everyone. Proceedings on Privacy Enhancing
Technologies, 2016:155–174, 2016.
