Side-Channel Attacks on Threshold Implementations using a Glitch Algebra by Vaudenay, Serge
Serge Vaudenay
EPFL
CH-1015 Lausanne, Switzerland
http://lasec.ep.ch
Abstract. Threshold implementations make it possible to implement circuits
using secret sharing in a way that thwarts side-channel attacks based on
probing or power analysis. They were proven to resist attacks based on
glitches as well. In this report, we show the limitations of these results.
Concretely, this approach proves security against attacks which use the
average power consumption of an isolated circuit. But no security is
provided against attacks using a non-linear function of the power traces
(such as the mean of squares or the majority of a threshold function),
and no security is provided for cascades of circuits, even with the
power mean. We take as an example the threshold implementation of the
AND function by Nikova, Rechberger, and Rijmen with 3 and 4 shares.
We further consider a proposal for higher-order threshold implementations
by Bilgin et al.
1 Introduction
Since the late 1990s, many side-channel attacks based on either power analysis
or probing have been presented. We consider essentially two types of attacks.
In differential power attacks (DPA), the adversary collects many samples of the
sum of the power used by all gates of the circuit, with noise. In probing attacks,
the adversary gets a few intermediate values of the computation by probing the
circuit. All measurements are subject to noise and can be modeled [2]. Duc et al.
have shown that these two attacks are essentially equivalent [4].
One devastating type of attack is based on "glitches". It takes into account
that an electric signal is not necessarily a classical 0/1 signal but a real
function over a clock period which is not constant. For instance, the signal can
be intermediate between 0 and 1, or switch several times between 0 and 1 during
the clock period, or show a very short switching peak, etc. CMOS technology
uses very little power when signals are stable: essentially, only signal switches
use power. So, a glitch induces an abnormal power consumption which is visible
during a clock period [5].
To avoid these attacks, masking is a common method. Essentially, instead
of running the computation on inputs $x$ and $y$ to obtain a result $z$, we
first use a secret sharing for $x$ and $y$ to split them into $n$ random shares
$(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$, and we run the computation on the
shares to obtain a sharing $(z_1, \ldots, z_n)$ of $z$. Usually, the secret sharing
is the simple $(n, n)$-scheme in which $x = x_1 \oplus \cdots \oplus x_n$,
$y = y_1 \oplus \cdots \oplus y_n$, and $z = z_1 \oplus \cdots \oplus z_n$.
Trichina, Korkishko, and Lee [11] proposed an implementation of an AND gate
with $n = 2$.
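As an illustration of the (n, n)-scheme, the following minimal Python sketch
splits a bit into n XOR shares and recombines them. The function names `share`
and `unshare` are ours and purely illustrative; they are not taken from [11]
or [7].

```python
import secrets
from functools import reduce
from operator import xor

def share(bit, n):
    """Split `bit` into n shares whose XOR equals `bit`."""
    shares = [secrets.randbits(1) for _ in range(n - 1)]  # n - 1 uniform shares
    shares.append(reduce(xor, shares, bit))               # last share fixes the XOR
    return shares

def unshare(shares):
    """Recombine shares by XOR-ing them all together."""
    return reduce(xor, shares, 0)

print(unshare(share(1, 3)))
```

Any n - 1 shares are uniformly distributed and independent of the shared bit,
which is the property the constructions discussed in this paper rely on.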
In [7], Nikova, Rechberger, and Rijmen proposed the threshold implementation,
which transforms a gate (such as an AND gate) into a circuit which resists
probing attacks with a single probe, or DPA based on the average of the power
consumption. One construction uses $n = 3$, and another one with $n = 4$ has the
property that output shares are always balanced. In [1], Bilgin et al. extend this
method to higher orders, to make circuits resist 2 probes or DPA based
on a 2nd order moment of the power consumption. They propose an implementation
of an AND gate with $n = 5$, but this implementation requires internal
flip-flop registers, and thus induces latencies, just to have a secure AND
circuit. These constructions were recently consolidated in [9].
Our results. As the glitch propagation model highly depends on concrete
implementations, in this paper we consider several models to account for glitches
obtained by the XOR of two glitched signals. We do not advertise any model
as better, but rather show how little influence the model has on the security
results. In a first model, the "double glitch" simply counts as twice a normal
glitch. In this model, the mean power for the construction with $n = 2$ does not
leak. In a second model, the double glitch counts as a normal one. In a third
model, the two glitches cancel each other and do not count. In the two latter
models, the construction with $n = 2$ leaks from the mean power.
In the mentioned constructions using $n > 2$, we show that two probes leak,
that some non-linear functions of the power (such as the mean of squares or
the majority of a threshold function) leak, and that by composing two circuits
implementing two AND gates, one probe leaks.
Finally, we show that in the three models, the AND construction using $n = 5$
(the one resisting 2nd order attacks) does not resist an attack with two probes
when we do not add internal flip-flop registers.
The security claims coming with these implementations from the literature
are of the form "if [conditions] then we have security". We do not contradict any
of these results. In this paper, we complement them by showing that when the
conditions are not met, we clearly have insecurity. So, these conditions are not
only sufficient: they are also necessary.
2 The Theory
2.1 The Glitch Algebra
Algebra is "the part of mathematics in which letters and other general symbols
are used to represent numbers and quantities in formulae and equations". Herein,
we propose to represent glitches as well, and to do operations on glitches.
In what follows, we use the following conventions: a "signal" is a function
from a clock cycle $[0, \tau]$ to $\mathbb{R}$; we consider real numbers as
constant signals; we consider bits as real numbers in $\{0, 1\}$; $+$ and
$\times$ denote the addition and multiplication of reals; $\oplus$, $\vee$, and
$\wedge$ denote the XOR, OR, and AND of signals.
A signal "represents" a bit. To avoid confusion, from now on we denote
a signal with a regular letter and the bit it is supposed to represent with a
bar. We say that a signal $x$ has no glitch if it is constant and equal
to the bit $\bar{x}$ it represents. The functions $\oplus$, $\vee$, and $\wedge$
are defined by the gates implementing them. We only know that they match what
we know about bits: $a \oplus b = a + b \bmod 2$, $a \vee b = \max(a, b)$, and
$a \wedge b = ab$ when $a$ and $b$ have no glitch. Furthermore, we define a
function glitch giving the "number of glitches" in a signal and a function power
giving the power consumption of a gate. We assume that $\mathrm{glitch}(x) = 0$
if $x$ has no glitch. The function glitch applies to a signal but the function
power applies to a gate. Concretely, a gate $g = \mathrm{op}(a, b)$ with
output signal $c$ corresponds to
$\mathrm{power}(g) = \mathrm{glitch}(c)\, p_{\mathrm{op}}$ where
$p_{\mathrm{op}}$ is a constant. So, $\mathrm{power}(g) = 0$ if
$\mathrm{op}(a, b)$ has no glitch. Actually, this is an approximation.
Essentially, it is assumed that a stable signal uses very little power while a
glitch induces a high power consumption, like in CMOS technology [5]. The
assumption on the influence of glitches on the power consumption may be a bit
arbitrary. In the sequel, we take for granted that when $y$ has no glitch,
$x \oplus y$ has the same glitch as $x$. When $y$ has no glitch and
$\bar{y} = 0$, we assume that $x \wedge y$ has no glitch either (due to the AND
with 0). When $y$ has no glitch and $\bar{y} = 1$, we assume that $x \wedge y$
has the same glitch as $x$. So,
\[
\mathrm{glitch}(x \wedge y) =
\begin{cases}
0 & \text{if } \mathrm{glitch}(y) = 0 \text{ and } \bar{y} = 0 \\
\mathrm{glitch}(x) & \text{if } \mathrm{glitch}(y) = 0 \text{ and } \bar{y} = 1
\end{cases}
\]
\[
\mathrm{glitch}(x \oplus y) = \mathrm{glitch}(x) \quad \text{if } \mathrm{glitch}(y) = 0
\]
We further define power as the sum of $\mathrm{power}(g)$ for all gates $g$ in a
circuit.
It is not quite clear how to define $\mathrm{glitch}(x \wedge y)$ for two
glitched signals $x$ and $y$ in general. Even for $\mathrm{glitch}(x \oplus y)$,
we may take one of the following assumptions:
\[ \mathrm{glitch}(x \oplus y) = \mathrm{glitch}(x) + \mathrm{glitch}(y) \tag{1} \]
\[ \mathrm{glitch}(x \oplus y) = \max(\mathrm{glitch}(x), \mathrm{glitch}(y)) \tag{2} \]
\[ \mathrm{glitch}(x \oplus y) = \mathrm{glitch}(x) \oplus \mathrm{glitch}(y) \tag{3} \]
These assumptions are quite reasonable in theory. (1) accounts for glitches which
accumulate, for instance because they occur at different times in a clock period.
(2) assumes that a glitch can be hidden by another one. (3) comes from saying that
two perfectly identical glitches should cancel each other in a XOR. However,
reality is more complex and probably a mixture of these three models:
\[ \mathrm{glitch}(x \oplus y) = F(\mathrm{glitch}(x), \mathrm{glitch}(y)) \]
for some symmetric function $F$. For simplicity, we will study these simple
assumptions. We will see that nearly all assumptions give the same results. Each
defines some kind of "glitch algebra" on which we can do computations.
In this report, we consider two types of side-channel attacks based on glitches.
- Power analysis: the adversary can see power with noise.
- Probing attack: for a gate g, the adversary can get glitch(g) with some noise.
Duc et al. have shown that these two attacks are equivalent [4].
2.2 Side-Channel Attack with Noise
In a side-channel attack, we measure a quantity $S$ in a discrete domain $D$,
but the measurement comes with noise, so we obtain $Z = S + \mathrm{noise}$. We
assume that $S$ follows a distribution $P^S_b$ depending on a secret bit $b$. We
want to make a guess $X$ for $b$. An algorithm taking some random input and
giving $X$ as output is a distinguisher. The Type I error is
$\alpha = \Pr[X = 1 \mid b = 0]$. The Type II error is
$\beta = \Pr[X = 0 \mid b = 1]$. The error probability is
$P_e = \Pr[X \neq b] = \alpha \Pr[b = 0] + \beta \Pr[b = 1]$, so it depends on
the distribution of $b$. The advantage of the distinguisher is
$\mathrm{Adv} = |\Pr[X = 1 \mid b = 0] - \Pr[X = 1 \mid b = 1]| = |\alpha + \beta - 1|$.
Let $P^Z_b$ denote the obtained distribution for $Z$. We know that the largest
advantage using one single sample $Z$ is $\mathrm{Adv} = d(P^Z_0, P^Z_1)$, the
statistical distance between $P^Z_0$ and $P^Z_1$:
\[
d(P^Z_0, P^Z_1) = \frac{1}{2} \sum_z \left| \Pr[Z = z \mid b = 0] - \Pr[Z = z \mid b = 1] \right|
\]
Theorem 1 (Precision amplification). Given an elementary distinguisher
computing $X$ from $Z$, with Type I error probability $\alpha \le \frac{1}{2}$
and Type II error probability $\beta \le \frac{1}{2}$, for any $N$ we can
construct a distinguisher such that from i.i.d. samples $Z_1, \ldots, Z_N$ we
compute $X$ with error probability
\[
P'_e \le e^{-N \left( \frac{1}{2} - \max(\alpha, \beta) \right)^2} \tag{4}
\]
Taking $N = 2 \left( \frac{1}{2} - \max(\alpha, \beta) \right)^{-2}$, we obtain
$P'_e \le e^{-2} \approx 13\%$.
Proof. We use the elementary distinguisher to compute the $X_1, \ldots, X_N$
corresponding to $Z_1, \ldots, Z_N$. Then, we compute
$X = \mathrm{majority}(X_1, \ldots, X_N)$.
Using the Chernoff bound (Lemma 2 below), we obtain a new distinguisher
with errors $\alpha_N$ and $\beta_N$ such that
$\alpha_N \le e^{-N(\frac{1}{2} - \alpha)^2}$ and
$\beta_N \le e^{-N(\frac{1}{2} - \beta)^2}$. So, the error probability
$P'_e = \alpha_N \Pr[b = 0] + \beta_N \Pr[b = 1]$ obtained by taking the
majority vote decreases exponentially fast with $N$. As
$\alpha, \beta \le \max(\alpha, \beta) \le \frac{1}{2}$, both $\alpha_N$ and
$\beta_N$ are bounded by the right-hand side of (4) and we obtain the result. □
Lemma 2 (Chernoff [3]). Let $X_1, X_2, \ldots, X_N$ be $N$ independent boolean
variables such that $E(X_i) = p$ for all $i$. We define
$X = \mathrm{majority}(X_1, \ldots, X_N)$.
For all $p < \frac{1}{2}$, we have
\[ \Pr[X \neq 0] \le e^{-N(\frac{1}{2} - p)^2} \]
For all $p > \frac{1}{2}$, we have
\[ \Pr[X \neq 1] \le e^{-N(\frac{1}{2} - p)^2} \]
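As a sanity check of Lemma 2, the following Python sketch computes the exact
error of a majority vote over N independent votes, each wrong with probability
p < 1/2, and compares it with the bound. The helper names are ours, and
counting ties as errors is an assumption made here to keep the check
conservative.

```python
from math import comb, exp

def majority_error(p, n):
    """Exact probability that at least half of n independent votes are wrong,
    each vote being wrong with probability p (ties counted as errors)."""
    first_bad = (n + 1) // 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(first_bad, n + 1))

def chernoff_bound(p, n):
    """Bound of Lemma 2: exp(-n * (1/2 - p)^2)."""
    return exp(-n * (0.5 - p) ** 2)

for p, n in [(0.4, 50), (0.4, 200), (0.45, 400)]:
    print(p, n, majority_error(p, n), chernoff_bound(p, n))
```

In these examples the exact error sits well below the bound, so the sample
counts derived from (4) should be read as sufficient rather than tight.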
In what follows, we assume that the noise is Gaussian, centered, independent
from $S$, and that the ratio of the standard deviation of the noise to that of
$S$ is a given value $\sigma$. So, noise has a variance of $\sigma^2 V(S)$. Hence,
\[
\Pr[\mathrm{noise} \le -x] = \frac{1}{2} \operatorname{erfc}\left( \frac{x}{\sqrt{2 \sigma^2 V(S)}} \right)
\]
Threshold distinguisher. We consider the distinguisher computing
$X = 1_{Z \le \tau}$. In the Gaussian noise model, the Type I error is
\[
\alpha = \Pr[X = 1 \mid b = 0] = \sum_s \Pr[S = s \mid b = 0] \Pr[\mathrm{noise} \le \tau - s]
\]
By symmetry of the noise distribution, the Type II error is
\[
\beta = \Pr[X = 0 \mid b = 1] = \sum_s \Pr[S = s \mid b = 1] \Pr[\mathrm{noise} \le s - \tau]
\]
By adjusting $\tau$ so that $\alpha = \beta = P_e$, we obtain that
$N = 2 \left( \frac{1}{2} - P_e \right)^{-2}$ is enough to reach
$P'_e \le e^{-2} \approx 13\%$.
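Case study for $S = 1 - b$ and $b$ uniform. If $S = 1 - b$ and $b$ is uniform,
we have $V(S) = \frac{1}{4}$. We adjust $\tau = \frac{1}{2}$ and obtain
$\alpha = \beta = \frac{1}{2} \operatorname{erfc}\left( \frac{0.5}{\sigma / \sqrt{2}} \right)$.
We obtain from Th. 1 that
\[
N = 2 \left( \frac{1}{2} - \frac{1}{2} \operatorname{erfc}\left( \frac{0.5}{\sigma / \sqrt{2}} \right) \right)^{-2}
\]
For instance, with $\sigma = 1$, we have $P_e \approx 16\%$ and $N = 17$. With
$\sigma = 2$, we have $P_e \approx 31\%$ and $N = 55$. We obtain that
$N \sim 4 \pi \sigma^2$ for $\sigma \to +\infty$, using
$\operatorname{erfc}(t) = 1 - \frac{2t}{\sqrt{\pi}} + o(t)$ for $t \to 0$. So,
we see that $N = O(\sigma^2)$ is enough to guess $b$ with error limited to a
constant. This is a quite favorable attack as we can measure $b$ directly.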
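The numbers of this case study can be reproduced with a few lines of Python.
This is only a sketch under the Gaussian noise model of this section; the
function name `case_study` is ours.

```python
from math import erfc, pi, sqrt

def case_study(sigma):
    """Single-sample error Pe and sample count N for S = 1 - b with threshold 1/2."""
    pe = 0.5 * erfc(0.5 / (sigma / sqrt(2)))   # alpha = beta = Pe
    n = 2 / (0.5 - pe) ** 2                    # N from Theorem 1
    return pe, n

for sigma in (1, 2, 100):
    pe, n = case_study(sigma)
    # The last column compares N with its asymptotic equivalent 4 * pi * sigma^2.
    print(sigma, round(pe, 4), round(n), round(n / (4 * pi * sigma**2), 4))
```

For large noise the ratio in the last column approaches 1, in line with the
quadratic growth of N stated above.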
Attack for $n = 1$. As an example, given an AND gate $z = x \wedge y$ (with no
threshold protection, or equivalently $n = 1$), assuming that $y$ is stable and
equal to some secret $\bar{y}$ and that $\mathrm{glitch}(x) = 1$, we have
$\mathrm{glitch}(z) = \bar{y}$. So, an attack measuring $S = \mathrm{glitch}(z)$
deduces $\bar{y}$ trivially. We are in the case where
$S = \mathrm{glitch}(z) = \bar{y}$ is binary and balanced. So, the above equation
governs the complexity $N$ of recovering $\bar{y}$ using no threshold
implementation and noise characterized by $\sigma$.
Pushing to higher order measures. We can wonder what happens if, for some
reason, $S$ does not leak but $S^2$ leaks. Then, we should look at $Z^2$ instead
of $Z$. But $Z^2 = S^2 + \mathrm{noise}'$ with
$\mathrm{noise}' = 2 S\, \mathrm{noise} + \mathrm{noise}^2$. By neglecting the
quadratic noise, we have
$V(\mathrm{noise}') \approx 4 V(S) V(\mathrm{noise}) = 4 \sigma^2 V(S)^2$.
Assuming $V(S^2) \approx V(S)^2$, we can see that the effect of moving from $S$
to $S^2$ is only in doubling the value of $\sigma$. As we will see, a motivation
of threshold cryptography is to prevent leaks at a lower order $S$, to make the
adversary look at a higher order. This actually penalizes the adversary a bit.
3 Implementation with n = 2
Trichina et al. [11] proposed an implementation of the AND gate to compute
$z = x \wedge y$ by using $n = 2$:
1. (secret sharing for $x$) pick $a \in_U \mathbb{Z}_2$ and compute $\tilde{x} = a \oplus x$;
2. (secret sharing for $y$) pick $b \in_U \mathbb{Z}_2$ and compute $\tilde{y} = b \oplus y$;
3. (secret sharing for $z$) pick $c \in_U \mathbb{Z}_2$;
4. compute
\[
\tilde{z} = (((c \oplus (a \wedge b)) \oplus (a \wedge \tilde{y})) \oplus (b \wedge \tilde{x})) \oplus (\tilde{x} \wedge \tilde{y})
\]
   by respecting the order of the parentheses;
5. the output $(\tilde{z}, c)$ shares $z = c \oplus \tilde{z}$.
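The five steps above can be checked exhaustively. The following Python sketch
evaluates the masked AND on bits (ignoring glitches altogether) and verifies
that the output sharing always recombines to x AND y. The function name
`trichina_and` is ours.

```python
from itertools import product

def trichina_and(x, y, a, b, c):
    """Return the output sharing (z_tilde, c) of x AND y for masks a, b, c."""
    x_t, y_t = a ^ x, b ^ y                       # steps 1 and 2
    # Step 4, respecting the order of the parentheses.
    z_t = (((c ^ (a & b)) ^ (a & y_t)) ^ (b & x_t)) ^ (x_t & y_t)
    return z_t, c                                 # step 5

for x, y, a, b, c in product((0, 1), repeat=5):
    z_t, c_out = trichina_and(x, y, a, b, c)
    assert z_t ^ c_out == x & y
print("all 32 cases recombine to x AND y")
```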
In [7], Nikova, Rechberger, and Rijmen observe that if the input signal $x$ has
a glitch and $y$ is a secret input, then by analyzing the power consumption of the
above gate we can easily deduce $y$. Indeed, assuming that an AND or XOR gate
has an abnormal power consumption proportional to the number of glitches on its
result, the number of gates with an abnormal power consumption depends on $y$.
So, we assume that $\mathrm{glitch}(x) = 1$ and that $a$, $b$, $c$, and $y$ have
no glitch, i.e. $\mathrm{glitch}(\{a, b, c, y\}) = 0$.
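We have
\[
\begin{aligned}
\mathrm{glitch}(\tilde{x}) &= 1 &\qquad \mathrm{glitch}(\tilde{x} \wedge \tilde{y}) &= \tilde{y} = b \oplus y \\
\mathrm{glitch}(\tilde{y}) &= 0 &\qquad \mathrm{glitch}((c \oplus (a \wedge b) \oplus (a \wedge \tilde{y})) \oplus (b \wedge \tilde{x})) &= b \\
\mathrm{glitch}(b \wedge \tilde{x}) &= b
\end{aligned}
\]
so
\[
\mathrm{glitch}(z) =
\begin{cases}
b + \tilde{y} & \text{with Assumption (1)} \\
b \vee \tilde{y} & \text{with Assumption (2)} \\
y & \text{with Assumption (3)}
\end{cases}
\]
\[
\mathrm{power} = (b + \tilde{y})\, p_{\mathrm{AND}} +
\left\{
\begin{array}{ll}
2b + \tilde{y} & \text{with Assumption (1)} \\
b + (b \vee \tilde{y}) & \text{with Assumption (2)} \\
b + y & \text{with Assumption (3)}
\end{array}
\right\} p_{\mathrm{XOR}}
\]
If $y = 0$, we have $\tilde{y} = b$ so
\[
\mathrm{power} = 2b\, p_{\mathrm{AND}} +
\left\{
\begin{array}{ll}
3b & \text{with Assumption (1)} \\
2b & \text{with Assumption (2)} \\
b & \text{with Assumption (3)}
\end{array}
\right\} p_{\mathrm{XOR}}
\]
For $y = 1$, this is $\mathrm{power} = p_{\mathrm{AND}} + (1 + b)\, p_{\mathrm{XOR}}$
for Assumptions (1,2,3). So,
\[
E(\mathrm{power} \mid y = 1) - E(\mathrm{power} \mid y = 0) =
\left\{
\begin{array}{ll}
0 & \text{with Assumption (1)} \\
\frac{1}{2} & \text{with Assumption (2)} \\
1 & \text{with Assumption (3)}
\end{array}
\right\} p_{\mathrm{XOR}}
\]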
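The gap between the two conditional means can be recomputed by enumerating the
mask b. The Python sketch below is a minimal illustration under the hypotheses
of this section (only x carries a glitch, and it counts as one glitch); the
names `XOR_RULE`, `power_n2`, and `mean_gap` are ours, and unit weights are
assumed so that the gap is expressed in units of the XOR power constant.

```python
from fractions import Fraction

# glitch(x XOR y) under Assumptions (1), (2), and (3).
XOR_RULE = {1: lambda g, h: g + h, 2: max, 3: lambda g, h: g ^ h}

def power_n2(y, b, rule, p_and=1, p_xor=1):
    """Glitch-induced power of the n = 2 AND gate when only x carries a glitch."""
    y_t = b ^ y
    g_bx, g_xy = b, y_t            # glitch(b AND x~) and glitch(x~ AND y~)
    f = XOR_RULE[rule]
    g3 = f(0, g_bx)                # third XOR: glitch-free value XOR (b AND x~)
    g4 = f(g3, g_xy)               # last XOR, whose output is the probed signal
    # The first two XOR gates and the AND gates on a, b, y~ are glitch-free.
    return (g_bx + g_xy) * p_and + (g3 + g4) * p_xor

def mean_gap(rule):
    """E(power | y = 1) - E(power | y = 0) with unit weights, b uniform."""
    mean = lambda y: Fraction(sum(power_n2(y, b, rule) for b in (0, 1)), 2)
    return mean(1) - mean(0)

for rule in (1, 2, 3):
    print("Assumption", rule, "gap:", mean_gap(rule))
```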
It is explicitly said in [8, p. 297] that
  "The power consumption caused by the glitch is related to the number
  of gates that see the glitch. It is clear [...] that the energy consumption
  depends on the values of [$b$ and $\tilde{y}$]. Since the mean power
  consumption is different for $y = 0$ and $y = 1$, the power consumption leaks
  information on the value $y$."
The computation done in [8] to analyze the leakage was based on Assumption (1),
as we can easily check from [8, Table 1]. So, we contradict this claim for
Assumption (1): $E(\mathrm{power})$ is independent from $y$ in this case.
However, it is true that $E(\mathrm{power})$ leaks $y$ for Assumption (2) and
Assumption (3). For this implementation, the choice of the "glitch algebra"
gives different conclusions.
Similarly, in attacks based on probing $z$, we can see that
$E(\mathrm{glitch}(z)) = 1$, which is independent from $y$, in Assumption (1).
For Assumption (2), we have $\mathrm{glitch}(z) = b \vee \tilde{y}$, so
$E(\mathrm{glitch}(z)) = \frac{1}{2}$ for $y = 0$ (where $\tilde{y} = b$) and
$E(\mathrm{glitch}(z)) = 1$ for $y = 1$ (where $\tilde{y} = 1 - b$). For
Assumption (3), we have $\mathrm{glitch}(z) = y$. In the two latter cases, we can
see that $E(\mathrm{glitch}(z))$ leaks $y$, so noisy samples for
$\mathrm{glitch}(z)$ leak $y$ using the amplification technique of Eq. (4).
This made [7] propose a "threshold implementation" of an AND gate using
$n = 3$ or $n = 4$ shares, the above being an example using $n = 2$
shares. They prove that, contrary to this example, probing a single gate in the
computation leaks no information on any of the inputs $x$ and $y$, on average.
They deduce that their implementations resist the above attacks based on
glitches. We will show the limitations of this result with effective attacks.
4 Implementation with n = 3
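Assuming that $(x_1, x_2, x_3)$ shares $x$, $(y_1, y_2, y_3)$ shares $y$, and
$(z_1, z_2, z_3)$ shares $z$, Nikova, Rechberger, and Rijmen [7] propose
\[
\begin{aligned}
z_1 &= (x_2 \wedge y_2) \oplus ((x_2 \wedge y_3) \oplus (x_3 \wedge y_2)) \\
z_2 &= (x_3 \wedge y_3) \oplus ((x_1 \wedge y_3) \oplus (x_3 \wedge y_1)) \\
z_3 &= (x_1 \wedge y_1) \oplus ((x_1 \wedge y_2) \oplus (x_2 \wedge y_1))
\end{aligned}
\]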
This construction satisfies the conditions from Nikova et al. [7]. We quote [7]:
  "Theorem 3. [...] the mean power consumption of a circuit implementing
  realization [above] is independent of [$x$, $y$], even in the presence of
  glitches or the delayed arrival of some inputs."
Although we do not contradict the independence of the mean from the input
values, we show that a probing attack can leak $y$ easily. We further show that a
cascade of this construction also leaks through the mean of the power consumption.
In the attacks, we will assume that none of the $y_i$ variables has a glitch, and
that they are independent from the glitches in the $x_i$ variables.
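As a quick sanity check of this construction, the following Python sketch
verifies exhaustively, on bits and without glitches, that the three output
shares recombine to x AND y and that each output share ignores one index of the
input shares. The function name `and3` is ours.

```python
from itertools import product

def and3(x, y):
    """Threshold AND with 3 shares; x and y are 3-tuples of share bits."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    z1 = (x2 & y2) ^ ((x2 & y3) ^ (x3 & y2))   # never touches x1, y1
    z2 = (x3 & y3) ^ ((x1 & y3) ^ (x3 & y1))   # never touches x2, y2
    z3 = (x1 & y1) ^ ((x1 & y2) ^ (x2 & y1))   # never touches x3, y3
    return z1, z2, z3

for x in product((0, 1), repeat=3):
    for y in product((0, 1), repeat=3):
        z1, z2, z3 = and3(x, y)
        assert z1 ^ z2 ^ z3 == (x[0] ^ x[1] ^ x[2]) & (y[0] ^ y[1] ^ y[2])
print("the 3-share AND is correct on all 64 inputs")
```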
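With Assumption (1), we have
\[
\begin{aligned}
\mathrm{glitch}(x_i \wedge y_j) &= \mathrm{glitch}(x_i)\, y_j \\
\mathrm{glitch}((x_i \wedge y_j) \oplus (x_j \wedge y_i)) &= \mathrm{glitch}(x_i)\, y_j + \mathrm{glitch}(x_j)\, y_i \\
\mathrm{glitch}(z_1) &= \mathrm{glitch}(x_2)(y_2 + y_3) + \mathrm{glitch}(x_3)\, y_2 \\
\mathrm{glitch}(z_2) &= \mathrm{glitch}(x_3)(y_3 + y_1) + \mathrm{glitch}(x_1)\, y_3 \\
\mathrm{glitch}(z_3) &= \mathrm{glitch}(x_1)(y_1 + y_2) + \mathrm{glitch}(x_2)\, y_1
\end{aligned}
\]
so
\[
\mathrm{power} = \sum_{i,j} \mathrm{glitch}(x_i)\, y_j\, p_{\mathrm{AND}}
+ \sum_i \mathrm{glitch}(x_i)(2 y_1 + 2 y_2 + 2 y_3 - y_i)\, p_{\mathrm{XOR}}
\]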
In the glitch value of each gate, we can see that at least one variable $y_i$ is
not present (indeed, the construction was made for that). Since the $y_i$ are
uniformly distributed conditioned on $y = y_1 \oplus y_2 \oplus y_3$, no matter
the value of $y$, the two present $y_i$ variables are uniformly distributed. So,
the distribution of any glitch value is independent from $y$. Consequently, this
is also the case for their mean value. Since power is a linear combination of
these values, due to the linearity of the mean operator, this is also the case
for the mean of power.
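With Assumption (2), by writing $\max(y_i, y_j) = y_i \vee y_j$, we have
\[
\begin{aligned}
\mathrm{glitch}(x_i \wedge y_j) &= \mathrm{glitch}(x_i)\, y_j \\
\mathrm{glitch}((x_i \wedge y_j) \oplus (x_j \wedge y_i)) &= \max(\mathrm{glitch}(x_i)\, y_j, \mathrm{glitch}(x_j)\, y_i) \\
\mathrm{glitch}(z_1) &= \max(\mathrm{glitch}(x_2)(y_2 \vee y_3), \mathrm{glitch}(x_3)\, y_2) \\
\mathrm{glitch}(z_2) &= \max(\mathrm{glitch}(x_3)(y_3 \vee y_1), \mathrm{glitch}(x_1)\, y_3) \\
\mathrm{glitch}(z_3) &= \max(\mathrm{glitch}(x_1)(y_1 \vee y_2), \mathrm{glitch}(x_2)\, y_1)
\end{aligned}
\]
Like above, the mean value of any of these expressions is independent from $y$.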
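With Assumption (3), we have
\[
\begin{aligned}
\mathrm{glitch}(x_i \wedge y_j) &= \mathrm{glitch}(x_i)\, y_j \\
\mathrm{glitch}((x_i \wedge y_j) \oplus (x_j \wedge y_i)) &= \mathrm{glitch}(x_i)\, y_j \oplus \mathrm{glitch}(x_j)\, y_i \\
\mathrm{glitch}(z_1) &= \mathrm{glitch}(x_2)(y_2 \oplus y_3) \oplus \mathrm{glitch}(x_3)\, y_2 \\
\mathrm{glitch}(z_2) &= \mathrm{glitch}(x_3)(y_3 \oplus y_1) \oplus \mathrm{glitch}(x_1)\, y_3 \\
\mathrm{glitch}(z_3) &= \mathrm{glitch}(x_1)(y_1 \oplus y_2) \oplus \mathrm{glitch}(x_2)\, y_1
\end{aligned}
\]
so, with glitch values in $\{0, 1\}$, indices taken cyclically in $\{1, 2, 3\}$,
and using $a \oplus b = a + b - 2ab$ on bits,
\[
\mathrm{power} = \sum_{i,j} \mathrm{glitch}(x_i)\, y_j\, p_{\mathrm{AND}}
+ \left( \sum_i \mathrm{glitch}(x_i)\big((y_i \oplus y_{i+1}) + y_{i+1} + 2 y_{i-1}\big)
- 2 \sum_i \mathrm{glitch}(x_i)\, \mathrm{glitch}(x_{i+1})\, y_i \right) p_{\mathrm{XOR}}
\]
Like above, the mean value of any of these expressions is independent from $y$.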
4.1 Power Analysis not Based on the Mean Value (All Assumptions)
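We have already seen that no glitch value has a distribution which depends
on $y$. So let us focus on the distribution of power. With
$\mathrm{glitch}(x_1) = 1$ and $\mathrm{glitch}(x_2) = \mathrm{glitch}(x_3) = 0$,
we obtain with Assumption (1) that
\[
\mathrm{power} = (y_1 + y_2 + y_3)(p_{\mathrm{AND}} + 2 p_{\mathrm{XOR}}) - y_1\, p_{\mathrm{XOR}}
\]
With Assumption (2), our previous computations simplify to
\[
\mathrm{glitch}(x_i \wedge y_j) =
\begin{cases}
0 & \text{if } i \neq 1 \\
y_j & \text{if } i = 1
\end{cases}
\]
\[
\begin{aligned}
\mathrm{glitch}((x_2 \wedge y_3) \oplus (x_3 \wedge y_2)) &= 0 &\qquad \mathrm{glitch}(z_1) &= 0 \\
\mathrm{glitch}((x_3 \wedge y_1) \oplus (x_1 \wedge y_3)) &= y_3 &\qquad \mathrm{glitch}(z_2) &= y_3 \\
\mathrm{glitch}((x_1 \wedge y_2) \oplus (x_2 \wedge y_1)) &= y_2 &\qquad \mathrm{glitch}(z_3) &= y_1 \vee y_2
\end{aligned}
\]
so
\[
\mathrm{power} = (y_1 + y_2 + y_3)\, p_{\mathrm{AND}} + ((y_1 \vee y_2) + y_2 + 2 y_3)\, p_{\mathrm{XOR}}
\]
With Assumption (3), we have the same results except
$\mathrm{glitch}(z_3) = y_1 \oplus y_2$. So
\[
\mathrm{power} = (y_1 + y_2 + y_3)\, p_{\mathrm{AND}} + ((y_1 \oplus y_2) + y_2 + 2 y_3)\, p_{\mathrm{XOR}}
\]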
We count the number of gates with a glitched output following the three
assumptions. We also indicate power assuming that $p_{\mathrm{AND}} = 1$ and
$p_{\mathrm{XOR}} = 4$ (see footnote 1). The results are in Table 1.
Table 1. Distributions for a glitch in x1 in the threshold implementation

                    Assumption (1)         Assumption (2)         Assumption (3)
 y   y1 y2 y3     #AND  #XOR   power     #AND  #XOR    power    #AND  #XOR   power
 0    0  0  0      0     0      0         0     0       0        0     0      0
 0    0  1  1      2     4      18        2     4       18       2     4      18
 0    1  0  1      2     3      14        2     3       14       2     3      14
 0    1  1  0      2     3      14        2     2       10       2     1      6
 mean              1.5   2.5    11.5      1.5   2.25    10.5     1.5   2      9.5
 variance          0.75  2.25   46.75     0.75  2.1875  44.75    0.75  2.5    48.75
 1    0  0  1      1     2      9         1     2       9        1     2      9
 1    0  1  0      1     2      9         1     2       9        1     2      9
 1    1  0  0      1     1      5         1     1       5        1     1      5
 1    1  1  1      3     5      23        3     4       19       3     3      15
 mean              1.5   2.5    11.5      1.5   2.25    10.5     1.5   2      9.5
 variance          0.75  2.25   46.75     0.75  1.1875  26.75    0.75  0.5    12.75
 stat. dist.       1     1      1         1     0.5     1        1     0.5    1
Clearly, the distributions of $\mathrm{power} \mid y = 0$ and
$\mathrm{power} \mid y = 1$ are very different. For instance, the parity of #AND
is always equal to $y$. The supports of the distributions
$\#\mathrm{XOR} \mid y = 0$ and $\#\mathrm{XOR} \mid y = 1$ are disjoint with
Assumption (1). So, we can distinguish them with one sample with advantage 1.
With Assumption (2) or Assumption (3), the statistical distance between the
distributions $\#\mathrm{XOR} \mid y = 0$ and $\#\mathrm{XOR} \mid y = 1$ is
$\frac{1}{2}$. So, a trivial statistic with a couple of samples would recover
$y$, assuming no noise.
We consider several types of distinguishers based on measuring #XOR. As the
impact of the glitched XORs on power is bigger, we can assume we measure it
this way. We could also consider other side-channel attacks which can separate
the XORs from the ANDs.
Footnote 1. We took $p_{\mathrm{XOR}} = 4 p_{\mathrm{AND}}$ as an example, which
can be justified by assuming that we use 4 NAND gates to make a XOR gate. But
this must only be taken as an example. Note that an AND requires two NAND gates,
but the second one, which is used as a NOT gate, can often cancel with
subsequent gates.
Best distinguisher. With Assumption (1) and #XOR, the best distinguisher
returns 0 if $\#\mathrm{XOR} \in \{0, 3, 4\}$ and it returns 1 if
$\#\mathrm{XOR} \in \{1, 2, 5\}$. A statistical distance of 1 means that we can
guess $y$ with an error probability of 0. A statistical distance of 0.5 means
that we can guess $y$ with an error probability of $\frac{1}{4}$.
In practice, measuring #XOR may give a noisy value, making it hard to
implement this distinguisher. I.e., 0 and 1 may be too close to be distinguishable,
as well as 2 and 3, and 4 and 5.
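Threshold distinguisher. We consider the distinguisher giving
$1_{\#\mathrm{XOR} + \mathrm{noise} \le \tau}$, i.e. 1 if #XOR (rather its noisy
value from a side channel) is below a given threshold $\tau$. Assuming that noise
follows an independent normal distribution with mean 0 (w.l.o.g. by adjusting
$\tau$) and variance $\sigma^2 V(\#\mathrm{XOR})$, we have
\[
\Pr[\mathrm{noise} \le -x] = \frac{1}{2} \operatorname{erfc}\left( \frac{x}{\sqrt{2 \sigma^2 V(\#\mathrm{XOR})}} \right)
\]
so the Type I error in guessing $y$ is
\[
\alpha = \frac{1}{2} \sum_i \operatorname{erfc}\left( \frac{i - \tau}{\sqrt{2 \sigma^2 V(\#\mathrm{XOR})}} \right) \Pr[\#\mathrm{XOR} = i \mid y = 0]
\]
The Type II error is
\[
\beta = 1 - \frac{1}{2} \sum_i \operatorname{erfc}\left( \frac{i - \tau}{\sqrt{2 \sigma^2 V(\#\mathrm{XOR})}} \right) \Pr[\#\mathrm{XOR} = i \mid y = 1]
\]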
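For Assumption (1), we have $V(\#\mathrm{XOR}) = \frac{9}{4}$ and
\[
\begin{aligned}
\alpha &= \frac{1}{2} \sum_{i=0}^{5} \operatorname{erfc}\left( \frac{i - \tau}{\frac{3}{2} \sigma \sqrt{2}} \right) \Pr[\#\mathrm{XOR} = i \mid y = 0] \\
&= \frac{1}{8} \left( \operatorname{erfc}\left( \frac{-\tau}{\frac{3}{2} \sigma \sqrt{2}} \right)
+ 2 \operatorname{erfc}\left( \frac{3 - \tau}{\frac{3}{2} \sigma \sqrt{2}} \right)
+ \operatorname{erfc}\left( \frac{4 - \tau}{\frac{3}{2} \sigma \sqrt{2}} \right) \right)
\end{aligned}
\]
For $\tau = 2.5$, using $\operatorname{erfc}(-x) = 2 - \operatorname{erfc}(x)$,
we obtain
\[
\alpha = \frac{1}{4} + \frac{1}{8} \left( -\operatorname{erfc}\left( \frac{2.5}{\frac{3}{2} \sigma \sqrt{2}} \right)
+ 2 \operatorname{erfc}\left( \frac{0.5}{\frac{3}{2} \sigma \sqrt{2}} \right)
+ \operatorname{erfc}\left( \frac{1.5}{\frac{3}{2} \sigma \sqrt{2}} \right) \right)
\]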
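Similarly, we have $\beta = \alpha$. So, we have $P_e = \alpha = \beta$ and
\[
\begin{aligned}
P_e &= \frac{1}{4} + \frac{1}{8} \left( -\operatorname{erfc}\left( \frac{2.5}{\frac{3}{2} \sigma \sqrt{2}} \right)
+ \operatorname{erfc}\left( \frac{1.5}{\frac{3}{2} \sigma \sqrt{2}} \right)
+ 2 \operatorname{erfc}\left( \frac{0.5}{\frac{3}{2} \sigma \sqrt{2}} \right) \right) \\
&= \frac{1}{2} - \Theta(\sigma^{-3})
\end{aligned}
\]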
As $\sigma$ goes from 0 to infinity, $P_e$ grows from $\frac{1}{4}$ to
$\frac{1}{2}$. For instance, for $\sigma = \frac{1}{2}$, we have
$P_e \approx 38\%$. For $\sigma = 1$, we have $P_e \approx 46\%$. For
$\sigma = 2$, we have $P_e \approx 49.35\%$.
So, even with a big noise, we can recover $y$ with an interesting advantage with
only one sample.
Of course, we can amplify this advantage by using several samples. Since we
have $\alpha = \beta = P_e$, by using (4) we obtain a new error probability of
$P'_e \le e^{-2} \approx 13\%$ with
$N = 2 \left( \frac{1}{2} - P_e \right)^{-2} = O(\sigma^6)$. We obtain the
following table:

 sigma:  0.5   1      1.5    2       2.5      3        3.5        4
 N:      143   1 417  9 979  46 765  163 627  465 879  1 141 284  2 495 478

So, measuring the number of XORs (a number between 0 and 5) with a big noise
of standard deviation twice that of what we want to measure still allows us to
deduce $y$ with less than 50 000 samples with Assumption (1).
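The single-sample error and the sample counts above follow directly from the
erfc expressions. The Python sketch below recomputes them for Assumption (1)
with threshold 2.5; the names `DIST_Y0`, `single_sample_error`, and
`samples_needed` are ours, and the rounding used to print N is an assumption
about how the table was produced.

```python
from math import erfc, sqrt

DIST_Y0 = {0: 0.25, 3: 0.5, 4: 0.25}   # Pr[#XOR = i | y = 0] under Assumption (1)
VAR_XOR = 9 / 4                        # V(#XOR) under Assumption (1)

def single_sample_error(sigma, tau=2.5):
    """Type I error alpha, which equals beta and Pe for tau = 2.5."""
    scale = sqrt(2 * sigma**2 * VAR_XOR)
    return sum(p * 0.5 * erfc((i - tau) / scale) for i, p in DIST_Y0.items())

def samples_needed(sigma):
    """N = 2 (1/2 - Pe)^(-2), as in Theorem 1."""
    return 2 / (0.5 - single_sample_error(sigma)) ** 2

for sigma in (0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4):
    print(sigma, round(single_sample_error(sigma), 4), round(samples_needed(sigma)))
```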
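Moment distinguisher. Instead of computing the average of power, we compute
the moment $E((\mathrm{power})^d)$ of order $d$, just like in Moradi [6,10].
With Assumption (1), we have
\[
\begin{aligned}
E(\mathrm{power} \mid y = 0) = E(\mathrm{power} \mid y = 1) &= \frac{3}{2} p_{\mathrm{AND}} + \frac{5}{2} p_{\mathrm{XOR}} \\
V(\mathrm{power} \mid y = 0) = V(\mathrm{power} \mid y = 1) &= \frac{3}{4} p_{\mathrm{AND}}^2 + \frac{5}{2} p_{\mathrm{AND}} p_{\mathrm{XOR}} + \frac{9}{4} p_{\mathrm{XOR}}^2
\end{aligned}
\]
but the moments of order $d = 3$ differ. So $(\mathrm{power})^3$ leaks $y$.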
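With Assumption (2), we have
\[
\begin{aligned}
E(\mathrm{power} \mid y = 0) = E(\mathrm{power} \mid y = 1) &= \frac{3}{2} p_{\mathrm{AND}} + \frac{9}{4} p_{\mathrm{XOR}} \\
V(\mathrm{power} \mid y = 0) &= \frac{3}{4} p_{\mathrm{AND}}^2 + \frac{9}{4} p_{\mathrm{AND}} p_{\mathrm{XOR}} + \frac{35}{16} p_{\mathrm{XOR}}^2 \\
V(\mathrm{power} \mid y = 1) &= \frac{3}{4} p_{\mathrm{AND}}^2 + \frac{7}{4} p_{\mathrm{AND}} p_{\mathrm{XOR}} + \frac{19}{16} p_{\mathrm{XOR}}^2 \\
V(\mathrm{power} \mid y = 0) - V(\mathrm{power} \mid y = 1) &= \frac{1}{2} p_{\mathrm{AND}} p_{\mathrm{XOR}} + p_{\mathrm{XOR}}^2
\end{aligned}
\]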
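With Assumption (3), we have
\[
\begin{aligned}
V(\mathrm{power} \mid y = 0) &= \frac{3}{4} p_{\mathrm{AND}}^2 + 2 p_{\mathrm{AND}} p_{\mathrm{XOR}} + \frac{5}{2} p_{\mathrm{XOR}}^2 \\
V(\mathrm{power} \mid y = 1) &= \frac{3}{4} p_{\mathrm{AND}}^2 + p_{\mathrm{AND}} p_{\mathrm{XOR}} + \frac{1}{2} p_{\mathrm{XOR}}^2 \\
V(\mathrm{power} \mid y = 0) - V(\mathrm{power} \mid y = 1) &= p_{\mathrm{AND}} p_{\mathrm{XOR}} + 2 p_{\mathrm{XOR}}^2
\end{aligned}
\]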
Clearly, the mean of (power)2 (i.e., d = 2) leaks y with Assumption (2) and
Assumption (3). For Assumption (1), the same holds with (power)3.
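The smallest leaking order can be found mechanically by comparing raw moments of
power for y = 0 and y = 1. The following Python sketch does this for a single
glitch on x1, with the example weights 1 (AND) and 4 (XOR); the function names
are ours and the search bound `max_d` is an arbitrary choice.

```python
from itertools import product

def powers(y, rule, p_and=1, p_xor=4):
    """Power values over all sharings of y when only x1 carries a glitch."""
    vals = []
    for y1, y2, y3 in product((0, 1), repeat=3):
        if y1 ^ y2 ^ y3 != y:
            continue
        g_z3 = {1: y1 + y2, 2: y1 | y2, 3: y1 ^ y2}[rule]
        vals.append((y1 + y2 + y3) * p_and + (2 * y3 + y2 + g_z3) * p_xor)
    return vals

def moment(values, d):
    """Raw moment E(power^d) under the uniform distribution on `values`."""
    return sum(v**d for v in values) / len(values)

def first_leaking_order(rule, max_d=4):
    """Smallest d <= max_d such that E(power^d | y=0) != E(power^d | y=1)."""
    for d in range(1, max_d + 1):
        if moment(powers(0, rule), d) != moment(powers(1, rule), d):
            return d
    return None

for rule in (1, 2, 3):
    print("Assumption", rule, "first leaking order:", first_leaking_order(rule))
```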
If E(power) is known and we measure Z = S + noise with S = power  
E(power) and a centered Gaussian noise of variance 2V (power), we can
compute Z 0 = S2 + noise0 with noise0 = 2Snoise + noise2   2V (power). So,
11
it is as if we measured S2 with noise noise0. The variance of noise0 is roughly
42V (S2), so the attack works as if we just doubled . For instance, our previous
computation shows that by doubling  we roughly multiply N by 50. With this
approach, threshold implementation penalizes the precision of the measurement.
4.2 Probing Attack with Two Probes Based on the Mean Value
(All Assumptions)
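We can further see what probing can yield.
With Assumption (1), we have $\mathrm{glitch}(z_2) = y_3$ and
$\mathrm{glitch}(z_3) = y_1 + y_2$. With Assumption (3), we have
$\mathrm{glitch}(z_2) = y_3$ and $\mathrm{glitch}(z_3) = y_1 \oplus y_2$. In
both cases, the parity of $\mathrm{glitch}(z_2) + \mathrm{glitch}(z_3)$ is $y$.
So, probing both $z_2$ and $z_3$ is enough to recover $y$.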
With Assumption (2), this leaks $(y_3, y_1 \vee y_2)$. For $y = 0$, the
distribution of this couple is $\Pr[(0, 0)] = \frac{1}{4}$,
$\Pr[(0, 1)] = \frac{1}{4}$, $\Pr[(1, 1)] = \frac{1}{2}$. For $y = 1$, the
distribution of this couple is $\Pr[(0, 1)] = \frac{1}{2}$,
$\Pr[(1, 0)] = \frac{1}{4}$, $\Pr[(1, 1)] = \frac{1}{4}$. So, the statistical
distance is $\frac{1}{2}$ and the probability of error for guessing $y$ is
$\frac{1}{4}$.
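The two distributions and their statistical distance can be checked by
enumeration. The Python sketch below lists the observed pair under
Assumption (2) for every sharing of y; the helper names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def probe_pairs(y):
    """(glitch(z2), glitch(z3)) under Assumption (2), over all sharings of y."""
    return [(y3, y1 | y2)
            for y1, y2, y3 in product((0, 1), repeat=3)
            if y1 ^ y2 ^ y3 == y]

def statistical_distance(a, b):
    """Statistical distance between the uniform distributions on lists a and b."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(Fraction(ca[v], len(a)) - Fraction(cb[v], len(b)))
               for v in set(a) | set(b)) / 2

d = statistical_distance(probe_pairs(0), probe_pairs(1))
# Distance, and error probability (1 - d) / 2 of the best single-sample guess
# when y is uniform.
print(d, (1 - d) / 2)
```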
As an example, with Assumption (1) we compute
$S = \mathrm{glitch}(z_2) + \mathrm{glitch}(z_3) - \mathrm{glitch}(z_2)\, \mathrm{glitch}(z_3)$.
Assuming that both $\mathrm{glitch}(z_2)$ and $\mathrm{glitch}(z_3)$ are subject
to some noise with the same parameter $\sigma$, the value we obtain for $S$ is
similar to a noisy value with the parameter $\sigma$ multiplied by a constant
factor less than 3. In our table, this results in a complexity $N$ multiplied by
a factor 300.
We recall that [7] claims no security when probing two values.
4.3 Power Analysis and Probing Attack on Two ANDs Based on
the Mean Value (Assumption (2) or (3))
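We use two consecutive threshold AND gates to compute the AND between $z$
and another shared bit $u$ to obtain $v = x \wedge y \wedge u$. We assume no
glitch on the $y$ and $u$ variables. We assume that only $x_1$ has a glitch. We
have
\[
\begin{aligned}
z_1 &= (x_2 \wedge y_2) \oplus ((x_2 \wedge y_3) \oplus (x_3 \wedge y_2)) &\qquad v_1 &= (z_2 \wedge u_2) \oplus ((z_2 \wedge u_3) \oplus (z_3 \wedge u_2)) \\
z_2 &= (x_3 \wedge y_3) \oplus ((x_1 \wedge y_3) \oplus (x_3 \wedge y_1)) &\qquad v_2 &= (z_3 \wedge u_3) \oplus ((z_1 \wedge u_3) \oplus (z_3 \wedge u_1)) \\
z_3 &= (x_1 \wedge y_1) \oplus ((x_1 \wedge y_2) \oplus (x_2 \wedge y_1)) &\qquad v_3 &= (z_1 \wedge u_1) \oplus ((z_1 \wedge u_2) \oplus (z_2 \wedge u_1))
\end{aligned}
\]
With Assumption (1), the linearity of the equations makes sure that the expected
values of the glitch variables are independent from $y$.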
Now, under Assumption (2), we have
\[
\begin{aligned}
\mathrm{glitch}(z_1) &= 0 \qquad \mathrm{glitch}(z_2) = y_3 \qquad \mathrm{glitch}(z_3) = y_1 \vee y_2 \\
\mathrm{glitch}(v_1) &= \max(\mathrm{glitch}(z_2)(u_2 \vee u_3), \mathrm{glitch}(z_3)\, u_2) \\
&= \max(y_3 (u_2 \vee u_3), (y_1 \vee y_2)\, u_2)
\end{aligned}
\]
so we can now try to probe $v_1$. We have the 3 following cases:
- $u_2 = u_3 = 0$ (probability $\frac{1}{4}$): we have $v_1 = 0$, no glitch, and
  nothing leaks.
- $u_2 = 0$ and $u_3 = 1$ (probability $\frac{1}{4}$): $v_1$ has a glitch if and
  only if $y_3 = 1$.
- $u_2 = 1$ (probability $\frac{1}{2}$): $v_1$ has a glitch when
  $y_1 \vee y_2 \vee y_3 = 1$. For $y = 1$, there is always a glitch. For
  $y = 0$, there is a glitch with probability $\frac{3}{4}$.
So, if $y = 0$, we observe a glitch in $v_1$ with probability
$\frac{1}{4} \times 0 + \frac{1}{4} \times \frac{1}{2} + \frac{1}{2} \times \frac{3}{4} = \frac{1}{2}$.
If $y = 1$, the probability becomes
$\frac{1}{4} \times 0 + \frac{1}{4} \times \frac{1}{2} + \frac{1}{2} \times 1 = \frac{5}{8}$.
Hence, the mean value reveals $y$. A single sample gives an error probability of
$\frac{7}{16}$.
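Now, under Assumption (3), we have
\[
\begin{aligned}
\mathrm{glitch}(z_1) &= 0 &\qquad \mathrm{glitch}(z_3) &= y_1 \oplus y_2 \\
\mathrm{glitch}(z_2) &= y_3 &\qquad \mathrm{glitch}(v_1) &= y_3 (u_2 \oplus u_3) \oplus (y_1 \oplus y_2)\, u_2
\end{aligned}
\]
so we can try to probe $v_1$ again. With probability $\frac{1}{4}$, we have
$u_2 \oplus u_3 = u_2 = 1$ so $\mathrm{glitch}(v_1) = y$. With probability
$\frac{1}{4}$, we have $u_2 = u_3 = 0$ and $v_1$ has no glitch. Otherwise,
$\mathrm{glitch}(v_1)$ is uniformly distributed. So, for $y = 0$,
$E(\mathrm{glitch}(v_1)) = \frac{1}{4}$ and for $y = 1$,
$E(\mathrm{glitch}(v_1)) = \frac{1}{2}$. Again, $y$ leaks from the mean value. A
single sample gives an error probability of $\frac{3}{8}$.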
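The conditional means of glitch(v1) can be recomputed by enumerating the shares
of y together with (u2, u3). The Python sketch below is a minimal illustration
for a single glitch on x1; the function names are ours, and (u2, u3) is assumed
uniform, which holds for any two shares of a uniform 3-sharing of u.

```python
from fractions import Fraction
from itertools import product

def glitch_v1(y1, y2, y3, u2, u3, rule):
    """glitch(v1) when only x1 is glitched, for Assumption `rule` in {2, 3}."""
    g_z2 = y3
    if rule == 2:
        g_z3 = y1 | y2
        return max(g_z2 * (u2 | u3), g_z3 * u2)
    g_z3 = y1 ^ y2
    return (g_z2 * (u2 ^ u3)) ^ (g_z3 * u2)

def mean_glitch(y, rule):
    """E(glitch(v1) | y) over uniform sharings of y and uniform (u2, u3)."""
    vals = [glitch_v1(y1, y2, y3, u2, u3, rule)
            for y1, y2, y3, u2, u3 in product((0, 1), repeat=5)
            if y1 ^ y2 ^ y3 == y]
    return Fraction(sum(vals), len(vals))

for rule in (2, 3):
    gap = mean_glitch(1, rule) - mean_glitch(0, rule)
    print("Assumption", rule, "gap between conditional means:", gap)
```

A nonzero gap is what makes a single probe on v1 sufficient: averaging noisy
samples of glitch(v1) separates the two values of y.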
The attack with noisy values is hardly more complicated than for $n = 1$.
Note that [7] does not claim any security on the composition of two AND
gates. But this attack clearly shows the limitation of this approach.
5 Implementation with n = 4
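Assuming that $(x_1, x_2, x_3, x_4)$ shares $x$, $(y_1, y_2, y_3, y_4)$ shares
$y$, and $(z_1, z_2, z_3, z_4)$ shares $z$, Nikova, Rechberger, and Rijmen [7]
propose
\[
\begin{aligned}
z_1 &= ((x_3 \oplus x_4) \wedge (y_2 \oplus y_3)) \oplus y_2 \oplus y_3 \oplus y_4 \oplus x_2 \oplus x_3 \oplus x_4 \\
z_2 &= ((x_1 \oplus x_3) \wedge (y_1 \oplus y_4)) \oplus y_1 \oplus y_3 \oplus y_4 \oplus x_1 \oplus x_3 \oplus x_4 \\
z_3 &= ((x_2 \oplus x_4) \wedge (y_1 \oplus y_4)) \oplus y_2 \oplus x_2 \\
z_4 &= ((x_1 \oplus x_2) \wedge (y_2 \oplus y_3)) \oplus y_1 \oplus x_1
\end{aligned}
\]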
It was proposed as an improvement to the $n = 3$ scheme as it makes all $z_i$
shares balanced. This property is called uniformity in [8]. It was used to address
composition. So, we look again at the composition of two AND circuits.
Again, we assume $\mathrm{glitch}(x_1) = 1$ and
$\mathrm{glitch}(x_2) = \mathrm{glitch}(x_3) = \mathrm{glitch}(x_4) = \mathrm{glitch}(y_i) = 0$
for $i = 1, \ldots, 4$. So, $\mathrm{glitch}(z_1) = 0$,
$\mathrm{glitch}(z_2) = (y_1 \oplus y_4) + 1$, $\mathrm{glitch}(z_3) = 0$, and
$\mathrm{glitch}(z_4) = (y_2 \oplus y_3) + 1$ with Assumption (1).
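We compute $v = z \wedge u = (x \wedge y) \wedge u$ using the threshold
implementation with
\[
\begin{aligned}
v_1 &= ((z_3 \oplus z_4) \wedge (u_2 \oplus u_3)) \oplus u_2 \oplus u_3 \oplus u_4 \oplus z_2 \oplus z_3 \oplus z_4 \\
v_2 &= ((z_1 \oplus z_3) \wedge (u_1 \oplus u_4)) \oplus u_1 \oplus u_3 \oplus u_4 \oplus z_1 \oplus z_3 \oplus z_4 \\
v_3 &= ((z_2 \oplus z_4) \wedge (u_1 \oplus u_4)) \oplus u_2 \oplus z_2 \\
v_4 &= ((z_1 \oplus z_2) \wedge (u_2 \oplus u_3)) \oplus u_1 \oplus z_1
\end{aligned}
\]
So, we have
\[
\begin{aligned}
\mathrm{glitch}(v_1) &= ((y_2 \oplus y_3) + 1)(u_2 \oplus u_3) + (y_1 \oplus y_4) + (y_2 \oplus y_3) + 2 \\
\mathrm{glitch}(v_2) &= (y_2 \oplus y_3) + 1 \\
\mathrm{glitch}(v_3) &= ((y_1 \oplus y_4) + (y_2 \oplus y_3) + 2)(u_1 \oplus u_4) + (y_1 \oplus y_4) + 1 \\
\mathrm{glitch}(v_4) &= ((y_1 \oplus y_4) + 1)(u_2 \oplus u_3)
\end{aligned}
\]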
Hence, we can just probe $v_1$ and look at its glitch. With probability
$\frac{1}{2}$, we have $u_2 \neq u_3$ so
$\mathrm{glitch}(v_1) = 2 (y_2 \oplus y_3) + (y_1 \oplus y_4) + 3$. In other
cases, we have $\mathrm{glitch}(v_1) = (y_1 \oplus y_4) + (y_2 \oplus y_3) + 2$,
which has the same parity as $y$. So, by repeating enough times, the
distribution of $\mathrm{glitch}(v_1)$ reveals $y$ with high probability.
The attack with noisy values is hardly more complicated than for $n = 1$.
Computations with Assumptions (2) or (3) are similar.
Note that [7] does not claim any security on the composition of two AND
gates. However, the $n = 4$ implementation was made to produce a balanced
sharing of the output to address composability through pipelining, meaning by
adding a layer of registers between the circuits we want to compose. Here, we
consider the composition of two AND gates without pipelining. Indeed, we
certainly do not want to add registers in between two single gates! But our
attack shows that the entire layer of circuits that we want to compose through
pipelining must be analyzed as a whole, since single gates clearly do not
compose well.
6 Higher-Order Threshold Implementation with n = 5
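In [1], Bilgin et al. propose an example of higher-order threshold implementation.
Equation (1) in [1] implements $y = 1 \oplus a \oplus bc$. To obtain the
implementation of an AND gate, we just remove the 1 and the $a$ terms and obtain
\[
\begin{aligned}
y_1 &= (b_2 \wedge c_2) \oplus (b_1 \wedge c_2) \oplus (b_2 \wedge c_1) &\qquad y_6 &= (b_2 \wedge c_4) \oplus (b_4 \wedge c_2) \\
y_2 &= (b_3 \wedge c_3) \oplus (b_1 \wedge c_3) \oplus (b_3 \wedge c_1) &\qquad y_7 &= (b_5 \wedge c_5) \oplus (b_2 \wedge c_5) \oplus (b_5 \wedge c_2) \\
y_3 &= (b_4 \wedge c_4) \oplus (b_1 \wedge c_4) \oplus (b_4 \wedge c_1) &\qquad y_8 &= (b_3 \wedge c_4) \oplus (b_4 \wedge c_3) \\
y_4 &= (b_1 \wedge c_1) \oplus (b_1 \wedge c_5) \oplus (b_5 \wedge c_1) &\qquad y_9 &= (b_3 \wedge c_5) \oplus (b_5 \wedge c_3) \\
y_5 &= (b_2 \wedge c_3) \oplus (b_3 \wedge c_2) &\qquad y_{10} &= (b_4 \wedge c_5) \oplus (b_5 \wedge c_4)
\end{aligned}
\]
Then, Equation (2) in [1] decreases the number of shares to 5 by
\[
\begin{aligned}
z_1 &= (b_2 \wedge c_2) \oplus (b_1 \wedge c_2) \oplus (b_2 \wedge c_1) \\
z_2 &= (b_3 \wedge c_3) \oplus (b_1 \wedge c_3) \oplus (b_3 \wedge c_1) \\
z_3 &= (b_4 \wedge c_4) \oplus (b_1 \wedge c_4) \oplus (b_4 \wedge c_1) \\
z_4 &= (b_1 \wedge c_1) \oplus (b_1 \wedge c_5) \oplus (b_5 \wedge c_1) \\
z_5 &= (b_2 \wedge c_3) \oplus (b_3 \wedge c_2) \oplus (b_2 \wedge c_4) \oplus (b_4 \wedge c_2) \oplus (b_5 \wedge c_5) \oplus (b_2 \wedge c_5) \oplus (b_5 \wedge c_2) \\
&\quad \oplus (b_3 \wedge c_4) \oplus (b_4 \wedge c_3) \oplus (b_3 \wedge c_5) \oplus (b_5 \wedge c_3) \oplus (b_4 \wedge c_5) \oplus (b_5 \wedge c_4)
\end{aligned}
\]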
This 2nd order implementation is supposed to resist probing attacks with two
probes. Normally, the transform of $(y_1, \ldots, y_{10})$ to
$(z_1, \ldots, z_5)$ by $z_i = y_i$ for $i < 5$ and
$z_5 = y_5 \oplus \cdots \oplus y_{10}$ must be done with intermediate registers
to avoid the propagation of glitches. We wonder what happens without these
registers.
Let us consider an attack probing $z_4$ and $z_5$. If there is a glitch in
$b_5$ and in no other input share, we have $\mathrm{glitch}(z_4) = c_1$ and
\[
\mathrm{glitch}(z_5) = \mathrm{glitch}((b_5 \wedge c_2) \oplus (b_5 \wedge c_3) \oplus (b_5 \wedge c_4) \oplus (b_5 \wedge c_5))
\]
With Assumption (1), this is $\mathrm{glitch}(z_5) = c_2 + c_3 + c_4 + c_5$.
With Assumption (2), this is $\mathrm{glitch}(z_5) = \max(c_2, c_3, c_4, c_5)$.
With Assumption (3), this is
$\mathrm{glitch}(z_5) = c_2 \oplus c_3 \oplus c_4 \oplus c_5$. So, we obtain
the distributions of $(\mathrm{glitch}(z_4), \mathrm{glitch}(z_5))$ which are
given in Table 2.

Table 2. Distribution of (glitch(z4), glitch(z5)) for a glitch in b5 in the
2nd order threshold implementation

c   c1 c2 c3 c4 c5   A. (1)      A. (2)          A. (3)
0   0  0  0  0  0    (0, 0)      (0, 0)          (0, 0)
0   0  0  0  1  1    (0, 2)      (0, 1)          (0, 0)
0   0  0  1  0  1    (0, 2)      (0, 1)          (0, 0)
0   0  1  0  0  1    (0, 2)      (0, 1)          (0, 0)
0   0  0  1  1  0    (0, 2)      (0, 1)          (0, 0)
0   0  1  0  1  0    (0, 2)      (0, 1)          (0, 0)
0   0  1  1  0  0    (0, 2)      (0, 1)          (0, 0)
0   0  1  1  1  1    (0, 4)      (0, 1)          (0, 0)
0   1  0  0  0  1    (1, 1)      (1, 1)          (1, 1)
0   1  0  0  1  0    (1, 1)      (1, 1)          (1, 1)
0   1  0  1  0  0    (1, 1)      (1, 1)          (1, 1)
0   1  1  0  0  0    (1, 1)      (1, 1)          (1, 1)
0   1  0  1  1  1    (1, 3)      (1, 1)          (1, 1)
0   1  1  0  1  1    (1, 3)      (1, 1)          (1, 1)
0   1  1  1  0  1    (1, 3)      (1, 1)          (1, 1)
0   1  1  1  1  0    (1, 3)      (1, 1)          (1, 1)
    mean             (1/2, 2)    (1/2, 15/16)    (1/2, 1/2)
    variance         (1/4, 1)    (1/4, 15/256)   (1/4, 1/4)

c   c1 c2 c3 c4 c5   A. (1)      A. (2)          A. (3)
1   0  0  0  0  1    (0, 1)      (0, 1)          (0, 1)
1   0  0  0  1  0    (0, 1)      (0, 1)          (0, 1)
1   0  0  1  0  0    (0, 1)      (0, 1)          (0, 1)
1   0  1  0  0  0    (0, 1)      (0, 1)          (0, 1)
1   0  0  1  1  1    (0, 3)      (0, 1)          (0, 1)
1   0  1  0  1  1    (0, 3)      (0, 1)          (0, 1)
1   0  1  1  0  1    (0, 3)      (0, 1)          (0, 1)
1   0  1  1  1  0    (0, 3)      (0, 1)          (0, 1)
1   1  0  0  0  0    (1, 0)      (1, 0)          (1, 0)
1   1  0  0  1  1    (1, 2)      (1, 1)          (1, 0)
1   1  0  1  0  1    (1, 2)      (1, 1)          (1, 0)
1   1  1  0  0  1    (1, 2)      (1, 1)          (1, 0)
1   1  0  1  1  0    (1, 2)      (1, 1)          (1, 0)
1   1  1  0  1  0    (1, 2)      (1, 1)          (1, 0)
1   1  1  1  0  0    (1, 2)      (1, 1)          (1, 0)
1   1  1  1  1  1    (1, 4)      (1, 1)          (1, 0)
    mean             (1/2, 2)    (1/2, 15/16)    (1/2, 1/2)
    variance         (1/4, 1)    (1/4, 15/256)   (1/4, 1/4)

As we can see, the mean and the variance do not leak (as intended).
However, the distributions are quite far apart.
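The mean and variance rows of Table 2 can be recomputed by exhaustive
enumeration. The sketch below is only an illustration: it takes
glitch(z4) = c1 and the three expressions for glitch(z5) directly from the
text, and the function names glitch_pair and conditional_stats are
hypothetical helpers, not part of the paper.

```python
from fractions import Fraction
from itertools import product


def glitch_pair(c_shares, assumption):
    """(glitch(z4), glitch(z5)) for a glitch in b5 only, as stated in the text."""
    c1, c2, c3, c4, c5 = c_shares
    g4 = c1
    if assumption == 1:
        g5 = c2 + c3 + c4 + c5       # Assumption (1): glitches add up
    elif assumption == 2:
        g5 = max(c2, c3, c4, c5)     # Assumption (2): glitches saturate
    elif assumption == 3:
        g5 = c2 ^ c3 ^ c4 ^ c5       # Assumption (3): glitches cancel in pairs
    else:
        raise ValueError("assumption must be 1, 2 or 3")
    return g4, g5


def conditional_stats(c, assumption):
    """Exact mean and variance of both probes over all sharings of the bit c."""
    samples = [glitch_pair(s, assumption)
               for s in product((0, 1), repeat=5)
               if sum(s) % 2 == c]
    n = len(samples)
    means = tuple(Fraction(sum(p[k] for p in samples), n) for k in (0, 1))
    variances = tuple(
        Fraction(sum(p[k] ** 2 for p in samples), n) - means[k] ** 2
        for k in (0, 1)
    )
    return means, variances


if __name__ == "__main__":
    for assumption in (1, 2, 3):
        for c in (0, 1):
            print(assumption, c, conditional_stats(c, assumption))
```

For each assumption, the statistics returned for c = 0 and c = 1 coincide,
which is the sense in which the first two moments of each probe do not leak.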
Indeed, for Assumption (3), we have
$c = \mathrm{glitch}(z_4) \oplus \mathrm{glitch}(z_5)$, so it is clear that
$c$ leaks. For Assumption (1), we have
$c = \mathrm{glitch}(z_4) \oplus (\mathrm{glitch}(z_5) \bmod 2)$, so it is
clear that $c$ leaks as well. For Assumption (2), the distributions are

(glitch(z4), glitch(z5))   (0, 0)   (0, 1)   (1, 0)   (1, 1)
given c = 0                1/16     7/16     0/16     8/16
given c = 1                0/16     8/16     1/16     7/16

so the statistical distance is $\frac{1}{8}$. This means that from a single
value we can deduce $c$ with an error probability of
$P_e = \frac{1}{2} - \frac{1}{16}$. Of course, this amplifies like in (4)
using more samples. Hence, two probes leak quite a lot. So, we clearly see
that omitting the extra registers, which are needed to keep the number of
shares from inflating, makes the implementation from [1] insecure.
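The figures above can be checked with a short enumeration. This sketch
assumes a uniform prior on c and the best single-sample distinguisher, whose
error probability is (1 - d)/2 for statistical distance d; the helper names
distribution, statistical_distance and error_probability are illustrative
only.

```python
from collections import Counter
from fractions import Fraction
from functools import reduce
from itertools import product
from operator import xor


def xor_bits(bits):
    """XOR of a non-empty sequence of bits (Assumption (3))."""
    return reduce(xor, bits)


def distribution(c, combine):
    """Distribution of (glitch(z4), glitch(z5)) given c, for a glitch in b5.

    `combine` maps (c2, c3, c4, c5) to glitch(z5): use `sum` for
    Assumption (1), `max` for Assumption (2), `xor_bits` for Assumption (3).
    """
    counts = Counter()
    for shares in product((0, 1), repeat=5):
        if sum(shares) % 2 == c:
            counts[(shares[0], combine(shares[1:]))] += 1
    total = sum(counts.values())
    return {pair: Fraction(n, total) for pair, n in counts.items()}


def statistical_distance(p, q):
    """Half the L1 distance between two finite distributions."""
    support = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in support) / 2


def error_probability(p, q):
    """Error of the best one-sample guess of c under a uniform prior."""
    return (1 - statistical_distance(p, q)) / 2


if __name__ == "__main__":
    p0, p1 = distribution(0, max), distribution(1, max)
    print(statistical_distance(p0, p1))   # Assumption (2)
    print(error_probability(p0, p1))
```

Running the same functions with `sum` or `xor_bits` in place of `max` gives a
statistical distance of 1, in line with the observation that c is recovered
exactly from a single pair under Assumptions (1) and (3).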
7 Conclusion
We have shown that the threshold implementations are quite weak against many
simple attacks: distinguishers based on non-linear functions on the power traces
(as simple as a threshold function or a power function), multiple probes, and
linear distinguishers for a cascade of circuits. Although they do not contradict the
results by their authors, these attacks show severe limitations on this approach.
We have seen that, compared to the attack on the AND gate with no protection,
the threshold implementation proposals only have the effect of amplifying the
noise of the side-channel attack by a constant factor. Therefore, we believe
that there is no satisfactory protection against attacks based on glitches.
References
1. B. Bilgin, B. Gierlichs, S. Nikova, V. Nikov, V. Rijmen. Higher-Order Threshold
Implementations. In Advances in Cryptology ASIACRYPT'14, Kaohsiung, Taiwan,
Lecture Notes in Computer Science 8873–8874, pp. 326–343 vol. 2, Springer-Verlag,
2014.
2. S. Chari, C.S. Jutla, J.R. Rao, P. Rohatgi. Towards Sound Approaches to
Counteract Power-Analysis Attacks. In Advances in Cryptology CRYPTO'99, Santa
Barbara, California, U.S.A., Lecture Notes in Computer Science 1666,
pp. 398–412, Springer-Verlag, 1999.
3. H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based
on the Sum of Observations. Annals of Mathematical Statistics, vol. 23 (4),
pp. 493–507, 1952.
4. A. Duc, S. Dziembowski, S. Faust. Unifying Leakage Models: From Probing Attacks
to Noisy Leakage. In Advances in Cryptology EUROCRYPT'14, Copenhagen,
Denmark, Lecture Notes in Computer Science 8441, pp. 423–440, Springer-Verlag,
2014.
5. S. Mangard, T. Popp, B.M. Gammel. Side-Channel Leakage of Masked CMOS
Gates. In Topics in Cryptology CT-RSA'05, San Francisco CA, USA, Lecture Notes
in Computer Science 3376, pp. 351–365, Springer-Verlag, 2005.
6. A. Moradi. Statistical Tools Flavor Side-Channel Collision Attacks. In Advances in
Cryptology EUROCRYPT'12, Cambridge, UK, Lecture Notes in Computer Science
7237, pp. 428–445, Springer-Verlag, 2012.
7. S. Nikova, C. Rechberger, V. Rijmen. Threshold Implementations Against
Side-Channel Attacks and Glitches. In Information and Communication Security
ICICS'06, Raleigh NC, USA, Lecture Notes in Computer Science 4307,
pp. 529–545, Springer-Verlag, 2006.
8. S. Nikova, V. Rijmen, M. Schläffer. Secure Hardware Implementation of Nonlinear
Functions in the Presence of Glitches. Journal of Cryptology, vol. 24, pp. 292–321,
2011.
9. O. Reparaz, B. Bilgin, S. Nikova, B. Gierlichs, I. Verbauwhede. Consolidating
Masking Schemes. In Advances in Cryptology CRYPTO'15, Santa Barbara,
California, U.S.A., Lecture Notes in Computer Science 9215–9216, pp. 764–783,
Springer-Verlag, 2015.
10. F.-X. Standaert, N. Veyrat-Charvillon, E. Oswald, B. Gierlichs, M. Medwed,
M. Kasper, S. Mangard. The World Is Not Enough: Another Look on Second-Order
DPA. In Advances in Cryptology ASIACRYPT'10, Singapore, Lecture Notes
in Computer Science 6477, pp. 112–129, Springer-Verlag, 2010.
11. E. Trichina, T. Korkishko, K.H. Lee. Small Size, Low Power, Side Channel-Immune
AES Coprocessor: Design and Synthesis Results. In Advanced Encryption Standard
AES'04, Bonn, Germany, Lecture Notes in Computer Science 3373, pp. 113–127,
Springer-Verlag, 2005.
