Abstract. Threshold implementations allow circuits to be implemented using secret sharing in a way that thwarts side-channel attacks based on probing or power analysis. They were also proven to resist attacks based on glitches. In this report, we show the limitations of these results. Concretely, this approach proves security against attacks which use the average power consumption of an isolated circuit. But no security is provided against attacks using a non-linear function of the power traces (such as the mean of squares or the majority of a threshold function), and no security is provided for cascades of circuits, even with the power mean. We take as an example the threshold implementation of the AND function by Nikova, Rechberger, and Rijmen with 3 and 4 shares. We further consider a higher-order proposal by Bilgin et al.
Introduction
Since the late 1990s, many side-channel attacks based on either power analysis or probing have been presented. We consider essentially two types of attacks. In differential power attacks (DPA), the adversary collects many noisy samples of the total power used by all gates of the circuit. In probing attacks, the adversary gets a few intermediate values of the computation by probing the circuit. All measurements are subject to noise and can be modeled [2]. Duc et al. have shown that these two attacks are essentially equivalent [4].
One devastating type of attack is based on "glitches". It takes into account that an electric signal is not necessarily a classical 0/1 signal but a real-valued function over a clock period, which need not be constant. For instance, the signal can be intermediate between 0 and 1, switch several times between 0 and 1 during the clock period, or show a very short switching peak. CMOS technology uses very little power for stable signals: essentially, only signal switches use power. So, a glitch induces an abnormal power consumption which is visible during a clock period [5].
To avoid these attacks, masking is a common method. Essentially, instead of running the computations based on inputs x and y to obtain a result z, we first use a secret sharing for x and y to split them into n random shares (x_1, . . . , x_n) and (y_1, . . . , y_n) and run the computation on the shares to obtain a sharing (z_1, . . . , z_n) of z. Usually, the secret sharing is the simple (n, n)-scheme in which x = x_1 ⊕ · · · ⊕ x_n, y = y_1 ⊕ · · · ⊕ y_n, and z = z_1 ⊕ · · · ⊕ z_n. Trichina, Korkishko, and Lee [11] proposed an implementation of an AND gate with n = 2.
In [7], Nikova, Rechberger, and Rijmen proposed the threshold implementation, which transforms a gate (such as an AND gate) into a circuit which resists probing attacks with a single probe or DPA based on the average of the power consumption. One construction uses n = 3, and another one with n = 4 has the property that output shares are always balanced. In [1], Bilgin et al. extend this method to higher orders, to make circuits resist 2 probes or DPA based on a 2nd order moment of the power consumption. They propose an implementation of an AND gate with n = 5, but this implementation requires internal flip-flop registers and thus induces latency, just to have a secure AND circuit. These constructions were recently consolidated in [9].
Our results. As the glitch propagation model highly depends on the concrete implementation, in this paper we consider several models for counting the glitches obtained by the XOR of two glitched signals. We do not advocate any model as better but rather show how little influence the model has on the security results. In a first model, the "double-glitch" simply counts as twice a normal glitch. In this model, the mean power for the construction with n = 2 does not leak. In a second model, the double-glitch counts as a normal one. In a third model, the two glitches cancel each other and do not count. In the two latter models, the construction with n = 2 leaks from the mean power.
In the mentioned constructions using n > 2, we show that two probes leak, that some non-linear function of the power (such as the mean of squares or the majority of a threshold function) leaks, and that by composing two circuits implementing two AND gates, one probe leaks.
Finally, we show that in the three models, the AND construction using n = 5 (the one resisting 2nd order attacks) does not resist an attack with two probes when we do not add internal flip-flop registers.
The security claims coming with these implementations from the literature are of the form "if [conditions] then we have security". We do not contradict any of these results. In this paper, we complement them by showing that when the conditions are not met, we clearly have insecurity. So, these conditions are not only sufficient: they are also necessary.
The Theory

The Glitch Algebra
Algebra is "the part of mathematics in which letters and other general symbols are used to represent numbers and quantities in formulae and equations". Herein, we propose to represent glitches as well and to do operations on glitches.
In what follows we use the following conventions: a "signal" is a function from a clock cycle [0, τ ] to R; we consider real numbers as constant signals, we consider bits as real numbers in {0, 1}; + and × denote the addition and multiplication of reals; ⊕, ∨, and ∧ denote the XOR, OR, and AND of signals.
A signal "represents" a bit. To avoid confusion, from now on we denote a signal with a regular letter and the bit it is supposed to represent with a bar. We say that a signal x has no glitch if it is constant and equal to the bit x̄ it represents. The functions ⊕, ∨, and ∧ are defined by the gates implementing these functions. We only know that they match what we know about bits: a ⊕ b = a + b mod 2, a ∨ b = max(a, b), and a ∧ b = ab when a and b have no glitch. Furthermore, we define a function glitch giving the "number of glitches" in a signal and a function power giving the power consumption of a gate. We assume that glitch(x) = 0 if x has no glitch. The function glitch applies to a signal but the function power applies to a gate. Concretely, a gate g = op(a, b) with output signal c corresponds to power(g) = glitch(c) × p_op, where p_op is a constant. So, power(g) = 0 if op(a, b) has no glitch. This is an approximation: it is assumed that a stable signal uses very little power while a glitch induces a high power consumption, as in CMOS technology [5]. The assumption on the influence of glitches on the power consumption may be a bit arbitrary. In the sequel, we take for granted that when y has no glitch, x ⊕ y has the same glitch as x. When y has no glitch and ȳ = 0, we assume that x ∧ y has no glitch either (due to the AND with 0). When y has no glitch and ȳ = 1, we assume that x ∧ y has the same glitch as x. So, when y has no glitch,
\[ \mathrm{glitch}(x \oplus y) = \mathrm{glitch}(x), \qquad \mathrm{glitch}(x \wedge y) = \bar{y} \times \mathrm{glitch}(x). \]
We further define Σpower as the sum of power(g) for all gates g in a circuit.
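It is not quite clear how to define glitch(x ∧ y) for two glitched signals x and y in general. Even for glitch(x ⊕ y), we may take one of the following assumptions:
(1) glitch(x ⊕ y) = glitch(x) + glitch(y);
(2) glitch(x ⊕ y) = max(glitch(x), glitch(y));
(3) glitch(x ⊕ y) = glitch(x) ⊕ glitch(y).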
These assumptions are quite reasonable in theory. (1) accounts for glitches which accumulate, for instance because they occur at different times in a clock period. (2) assumes that a glitch can be hidden by another one. (3) comes from saying that two perfectly identical glitches should cancel each other in a XOR. However, reality is more complex and probably a mixture of these three models:
\[ \mathrm{glitch}(x \oplus y) = F(\mathrm{glitch}(x), \mathrm{glitch}(y)) \]
for some symmetric function F. For simplicity, we will study these simple assumptions. We will see that nearly all assumptions give the same results. Each defines some kind of "glitch algebra" on which we can do computations.
In this report, we consider two types of side-channel attacks based on glitches.
- Power analysis: the adversary can see Σpower with noise.
- Probing attack: for a gate g, the adversary can get glitch(g) with some noise.
Duc et al. have shown that these two attacks are equivalent [4] .
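The glitch algebra can be made concrete with a minimal sketch. It assumes the three XOR rules are read as "glitches add up", "one glitch hides the other", and "identical glitches cancel", and that the AND rule is the one stated for a stable second operand. The names `XOR_RULES`, `glitch_and`, `glitch_xor`, and `example` are hypothetical helpers, not notation from the text.

```python
# Sketch of the glitch algebra. The rule numbering follows Assumptions (1)-(3);
# all function names are illustrative only.
XOR_RULES = {
    1: lambda g, h: g + h,      # Assumption (1): glitches accumulate
    2: lambda g, h: max(g, h),  # Assumption (2): one glitch hides the other
    3: lambda g, h: g ^ h,      # Assumption (3): identical glitches cancel
}


def glitch_and(glitch_x: int, y_bit: int) -> int:
    """glitch(x AND y) when y is a stable signal representing y_bit."""
    return glitch_x * y_bit


def glitch_xor(g: int, h: int, rule: int) -> int:
    """glitch(x XOR y) from the glitch counts g, h of the operands."""
    return XOR_RULES[rule](g, h)


def example(y1: int, y2: int, rule: int) -> int:
    """Glitch count of (x AND y1) XOR (x AND y2) for a glitched x (glitch(x) = 1)."""
    return glitch_xor(glitch_and(1, y1), glitch_and(1, y2), rule)


for rule in (1, 2, 3):
    print(rule, [example(y1, y2, rule) for y1 in (0, 1) for y2 in (0, 1)])
```

Under these rules, the same two-gate expression yields the sum, the OR, or the XOR of the two stable bits, which is why the choice of rule matters as soon as two glitched signals meet.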
Side-Channel Attack with Noise
In a side-channel attack, we measure a quantity S in a discrete domain D, but the measurement comes with noise, so we obtain Z = S + noise. We assume that S follows a distribution P 
\[ d(P_{Z_0}, P_{Z_1}) = \frac{1}{2} \sum_z \bigl| \Pr[Z = z \mid b = 0] - \Pr[Z = z \mid b = 1] \bigr| \]
Theorem 1 (Precision amplification).
Given an elementary distinguisher computing X from Z, with Type I error probability α ≤ 
In what follows, we assume that the noise is Gaussian, centered, independent of S, and that the ratio between the standard deviation of the noise and that of S is a given value σ. So, noise has a variance of σ^2 V(S). Hence,
Threshold distinguisher. We consider the distinguisher computing X = 1_{Z ≤ τ}.
In the Gaussian noise model, the Type I error is
by symmetry of the noise distribution, the Type II error is
By adjusting τ so that α = β = P e , we obtain that N = 2
Case study for
. We obtain from Th. 1 that N = 2
For instance, with σ = 1, we have P_e ≈ 16% and N = 17. With σ = 2, we have P_e ≈ 31% and N = 55. We obtain that N ∼ 4πσ^2 as σ → +∞, using erfc(
is enough to guess b with error limited to a constant. This is a quite favorable attack as we can measure b directly.
Attack for n = 1. As an example, given an AND gate z = x ∧ y (with no threshold protection, or equivalently n = 1), assuming that y is stable and equal to some secret ȳ and that glitch(x) = 1, we have glitch(z) = ȳ. So, an attack measuring S = glitch(z) deduces ȳ trivially. We are in the case where S = glitch(z) = ȳ is binary and balanced. So, the above equation governs the complexity N of recovering ȳ using no threshold implementation and noise characterized by σ.
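The figures quoted for the binary balanced case can be checked numerically. The sketch below assumes S ∈ {0, 1} is balanced (so V(S) = 1/4 and the noise has standard deviation σ/2), a threshold τ = 1/2, and an amplification cost of N = 2/(1/2 − P_e)^2. That last formula is an assumption inferred from the quoted values N = 17 and N = 55, not a statement of Theorem 1; the function names are hypothetical.

```python
from math import erfc, pi, sqrt


def error_probability(sigma: float) -> float:
    """P_e for a balanced bit S with Gaussian noise of std sigma/2 and threshold 1/2."""
    # Pr[noise > 1/2] = Phi(-1/sigma) = erfc(1 / (sigma * sqrt(2))) / 2
    return 0.5 * erfc(1.0 / (sigma * sqrt(2.0)))


def samples_needed(p_e: float) -> float:
    """Assumed sample count N = 2 / (1/2 - P_e)^2 (inferred, not taken from Theorem 1)."""
    return 2.0 / (0.5 - p_e) ** 2


for sigma in (1.0, 2.0, 10.0):
    p_e = error_probability(sigma)
    n = samples_needed(p_e)
    print(f"sigma={sigma}: P_e={p_e:.4f}, N={n:.1f}, 4*pi*sigma^2={4 * pi * sigma**2:.1f}")
```

With this reading, the ratio N / (4πσ^2) tends to 1 for large σ, in line with the asymptotic estimate given in the text.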
we can see that the effect of moving from S to S^2 is only in doubling the value of σ. As we will see, a motivation of threshold implementations is to prevent leaks at a lower order S so that the adversary has to look at a higher order. This actually penalizes the adversary a bit.
Implementation with n = 2
Trichina et al. [11] proposed an implementation of the AND gate to compute z = x ∧ y by using n = 2:
1. (secret sharing for x) pick a ∈_U Z_2 and compute x̃ = a ⊕ x;
2. (secret sharing for y) pick b ∈_U Z_2 and compute ỹ = b ⊕ y;
3. (secret sharing for z) pick c ∈_U Z_2;
4. compute z̃ =
by respecting the order of the parentheses;
5. the output (z̃, c) shares z = c ⊕ z̃.
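The functional correctness of such a two-share AND gate can be checked exhaustively. The sketch below assumes the usual expansion x ∧ y = (x̃ ∧ ỹ) ⊕ (x̃ ∧ b) ⊕ (a ∧ ỹ) ⊕ (a ∧ b); the parenthesization used in `masked_and` is an arbitrary choice and is not claimed to be the order prescribed in step 4. The order matters for glitch analysis but not for the computed bit.

```python
from itertools import product


def masked_and(x: int, y: int, a: int, b: int, c: int) -> tuple[int, int]:
    """Two-share AND: returns (z_tilde, c) with z_tilde XOR c == x AND y.

    a, b, c are the random mask bits; the evaluation order below is an
    assumption made for illustration only.
    """
    x_t = a ^ x  # masked share of x
    y_t = b ^ y  # masked share of y
    z_t = ((((x_t & y_t) ^ (x_t & b)) ^ (a & y_t)) ^ (a & b)) ^ c
    return z_t, c


def all_cases_correct() -> bool:
    """Check every input and mask combination."""
    return all(
        (masked_and(x, y, a, b, c)[0] ^ c) == (x & y)
        for x, y, a, b, c in product((0, 1), repeat=5)
    )


print(all_cases_correct())
```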
In [7], Nikova, Rechberger, and Rijmen observe that if the input signal x has a glitch and y is a secret input, then by analyzing the power consumption of the above gate we can easily deduce y. Indeed, assuming that an AND or XOR gate uses an abnormal power scheme proportional to the number of glitches on its result, the number of gates using an abnormal power scheme depends on y. So, we assume that glitch(x) = 1 and glitch({a, b, c, y}) = 0.
We have
The computation done in [8] to analyze the leakage was based on Assumption (1), as we can easily check from [8, Table 1]. So, we contradict this claim for Assumption (1): E(Σpower) is independent of ȳ in this case. However, it is true that E(Σpower) leaks ȳ for Assumption (2) and Assumption (3). For this implementation, the choice of the "glitch algebra" gives different conclusions. Similarly, in attacks based on probing z̃, we can see that E(glitch(z̃)) = 1, which is independent of ȳ, in Assumption (1). For Assumption (2), we have E(glitch(z̃)) = 1/2, which is also independent of ȳ. For Assumption (3), we have glitch(z̃) = ȳ. In the latter case, we can see that E(glitch(z̃)) leaks ȳ, so noisy samples of glitch(z̃) leak ȳ using the amplification technique of Eq. (4).
This made [7] propose a "threshold implementation" of an AND gate using n = 3 or n = 4 shares, the above example being an example using n = 2 shares. They prove that, contrary to this example, probing a single gate in the computation leaks no information on any of the inputs x and y, on average. They deduce that their implementations resist the above attacks based on glitches. We will show the limitations of this result with effective attacks.
Implementation with n = 3
Assuming that (x_1, x_2, x_3) shares x, (y_1, y_2, y_3) shares y, and (z_1, z_2, z_3) shares z, Nikova, Rechberger, and Rijmen [7] propose
This construction satisfies the conditions from Nikova et al. [7]. We quote [7]:
Although we do not contradict the independence of the mean from the input values, we show that a probing attack can leak ȳ easily. We further show that a cascade of this construction also leaks with the mean of power consumption.
In the attacks, we will assume that none of the y_i variables have a glitch, and that they are independent from the glitches in the x_i variables.
With Assumption (1), we have
In the glitch value of each gate, we can see that at least one variable ȳ_i is not present (indeed, the construction was made for that). Since the ȳ_i are uniformly distributed conditioned on ȳ = ȳ_1 ⊕ ȳ_2 ⊕ ȳ_3, no matter the value of ȳ, the two ȳ_i variables that are present are uniformly distributed. So, the distribution of any glitch value is independent of ȳ. Consequently, this is also the case for their mean value. Since Σpower is a linear combination of these values, by linearity of the mean operator, E(Σpower) is independent of ȳ as well. With Assumption (2), by writing max(ȳ_i, ȳ_j) = ȳ_i ∨ ȳ_j, we have
Like above, the mean value of any of these expressions is independent of ȳ. With Assumption (3), we have
Like above, the mean value of any of these expressions is independent of ȳ.
Power Analysis not Based on the Mean Value (All Assumptions)
We have already seen that no glitch value has a distribution which depends on ȳ. So let us focus on the distribution of Σpower. With glitch(x_1) = 1 and glitch(x_2) = glitch(x_3) = 0, we obtain with Assumption (1) that
\[ \Sigma\mathrm{power} = (\bar{y}_1 + \bar{y}_2 + \bar{y}_3)(p_{\mathrm{AND}} + 2 p_{\mathrm{XOR}}) - \bar{y}_1 p_{\mathrm{XOR}} \]
With Assumption (2), our previous computations simplify to
We count the number of gates with a glitched output following the two assumptions. We also indicate Σpower assuming that p_AND = 1 and p_XOR = 4.
The results are in Table 1.
Table 1. Distributions for a glitch in x_1 in the threshold implementation, under Assumption (1), Assumption (2), and Assumption (3).
Clearly, the distributions of Σpower|ȳ = 0 and Σpower|ȳ = 1 are very different. For instance, the parity of #AND is always equal to ȳ. The supports of the distributions #XOR|ȳ = 0 and #XOR|ȳ = 1 are disjoint with Assumption (1). So, we can distinguish them with one sample with advantage 1. With Assumption (2) or Assumption (3), the statistical distance between the distributions #XOR|ȳ = 0 and #XOR|ȳ = 1 is 1/2. So, a trivial statistic with a couple of samples would recover ȳ assuming no noise.
We consider several types of distinguishers based on measuring #XOR. As the impact of the glitched XORs on Σpower is bigger, we can assume we measure it this way. We could also consider other side-channel attacks which can separate the XORs from the ANDs.
Best distinguisher. With Assumption (1) and #XOR, the best distinguisher returns 0 if #XOR ∈ {0, 3, 4} and it returns 1 if #XOR ∈ {1, 2, 5}. A statistical distance of 1 means that we can guess ȳ with an error probability 0. A statistical distance of 0.5 means that we can guess ȳ with an error probability 1/4. In practice, measuring #XOR may give a noisy value, making it hard to implement this distinguisher. I.e., 0 and 1 may be too close to be distinguishable, as well as 2 and 3, and 4 and 5.
Threshold distinguisher. We consider the distinguisher giving 1_{#XOR+noise ≤ τ}, i.e. 1 if #XOR (or rather its noisy value from a side channel) is below a given threshold τ. Assuming that noise follows an independent normal distribution with mean 0 (w.l.o.g. by adjusting τ) and variance σ^2 V(#XOR), we have
) so the Type I error in guessing ȳ is
The Type II error is
For Assumption (1), we have V(#XOR) = 9/4 and
For τ = 2.5, using erfc(−x) = 2 − erfc(x), we obtain
Similarly, we have β = α. So, we have P_e = α = β and
As σ goes from 0 to infinity, P_e grows from 1/4 to 1/2. For σ = 1/2, we have P_e ≈ 38%. For σ = 1, we have P_e ≈ 46%. For σ = 2, we have P_e ≈ 49.35%.
So, even with a big noise, we can recoverȳ with an interesting advantage with only one sample.
Of course, we can amplify this advantage by using several samples. Since we have α = β = P_e, by using (4) we obtain a new error probability of P'_e = e^{-2} ≈ 13% with N = 2
. We obtain the following table:
So, measuring the number of XORs (a number between 0 and 5) with a big noise, whose standard deviation is twice that of the quantity we want to measure, still allows us to deduce ȳ with fewer than 50 000 samples with Assumption (1).
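The error probabilities of the threshold distinguisher on #XOR can be reproduced with a short sketch. It takes #XOR = 2(ȳ_1 + ȳ_2 + ȳ_3) − ȳ_1 from the Σpower expression under Assumption (1), τ = 2.5, and noise of standard deviation σ·sqrt(V(#XOR)). The sample count N = 2/(1/2 − P_e)^2 is an assumption inferred from the figures quoted for the binary case; the function names are hypothetical.

```python
from itertools import product
from math import erfc, sqrt

TAU = 2.5
VARIANCE = 9 / 4  # V(#XOR) under Assumption (1)

# #XOR values for the four equally likely sharings of each secret bit.
VALUES = {0: [], 1: []}
for y1, y2, y3 in product((0, 1), repeat=3):
    VALUES[y1 ^ y2 ^ y3].append(2 * (y1 + y2 + y3) - y1)


def normal_cdf(t: float) -> float:
    return 0.5 * erfc(-t / sqrt(2.0))


def type_one_error(sigma: float) -> float:
    """Pr[#XOR + noise <= tau | secret bit 0]: the distinguisher wrongly outputs 1."""
    std = sigma * sqrt(VARIANCE)
    return sum(normal_cdf((TAU - s) / std) for s in VALUES[0]) / len(VALUES[0])


def type_two_error(sigma: float) -> float:
    """Pr[#XOR + noise > tau | secret bit 1]: the distinguisher wrongly outputs 0."""
    std = sigma * sqrt(VARIANCE)
    return sum(1.0 - normal_cdf((TAU - s) / std) for s in VALUES[1]) / len(VALUES[1])


def samples_needed(p_e: float) -> float:
    """Assumed sample count N = 2 / (1/2 - P_e)^2 (inferred, not taken from Theorem 1)."""
    return 2.0 / (0.5 - p_e) ** 2


for sigma in (0.5, 1.0, 2.0):
    p_e = type_one_error(sigma)
    print(f"sigma={sigma}: P_e={p_e:.4f}, N={samples_needed(p_e):.0f}")
```

The symmetry of the two conditional distributions around 2.5 is what makes the Type I and Type II errors coincide for this threshold.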
Moment distinguisher. Instead of computing the average of Σpower, we compute the moment E((Σpower)^d) of order d, just like in Moradi [6, 10]. With Assumption (1), we have
it is as if we measured S^2 with noise noise′. The variance of noise′ is roughly 4σ^2 V(S^2), so the attack works as if we just doubled σ. For instance, our previous computation shows that by doubling σ we roughly multiply N by 50. With this approach, threshold implementation penalizes the precision of the measurement.
Probing Attack with Two Probes Based on the Mean Value (All Assumptions)
We can further see what probing can yield. With Assumptions (1) and (3), we have glitch(z_2) = ȳ_3 and glitch(z_3) = ȳ_1 ⊕ ȳ_2. So, probing both z_2 and z_3 is enough to recover ȳ.
With Assumption (2), this leaks (ȳ_3, ȳ_1 ∨ ȳ_2). For ȳ = 0, the distribution of this couple is Pr[(0, 0)] =
As an example, with Assumption (1) we compute S = glitch(z_2) + glitch(z_3) − glitch(z_2)glitch(z_3). Assuming that both glitch(z_2) and glitch(z_3) are subject to some noise with the same parameter σ, the value we obtain for S is similar to a noisy value with the parameter σ multiplied by a constant factor less than 3. In our table, this results in a complexity N multiplied by a factor 300.
We recall that [7] claims no security when probing two values.
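The two-probe leakage can be enumerated directly from the glitch values stated in this section. The sketch below takes glitch(z_2) = ȳ_3 and glitch(z_3) = ȳ_1 ⊕ ȳ_2 for Assumptions (1) and (3), and the pair (ȳ_3, ȳ_1 ∨ ȳ_2) for Assumption (2), exactly as given in the text; the helper names are hypothetical and noise is ignored.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def probe_pair(y, rule):
    """Noise-free (glitch(z_2), glitch(z_3)) for stable shares y = (y1, y2, y3)."""
    y1, y2, y3 = y
    g_z3 = (y1 | y2) if rule == 2 else (y1 ^ y2)  # values as stated in the text
    return y3, g_z3


def pair_distribution(rule):
    """Distribution of the probed pair conditioned on the shared bit."""
    dist = {0: Counter(), 1: Counter()}
    for y in product((0, 1), repeat=3):
        dist[y[0] ^ y[1] ^ y[2]][probe_pair(y, rule)] += Fraction(1, 4)
    return dist


def statistical_distance(p, q):
    return sum(abs(p[k] - q[k]) for k in set(p) | set(q)) / 2


# Assumptions (1) and (3): the XOR of the two probed values is the secret bit.
for rule in (1, 3):
    for y in product((0, 1), repeat=3):
        g2, g3 = probe_pair(y, rule)
        assert g2 ^ g3 == y[0] ^ y[1] ^ y[2]

d2 = pair_distribution(2)
print(dict(d2[0]), dict(d2[1]), statistical_distance(d2[0], d2[1]))
```

Under Assumption (2) the pair no longer determines the secret bit with certainty, but the two conditional distributions remain clearly distinguishable.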
Power Analysis and Probing Attack on Two ANDs Based on the Mean Value (Assumption (2) or (3))
We use two consecutive threshold AND gates to compute the AND between z and another shared bit u to obtain v = x ∧ y ∧ u. We assume no glitch on the y and u variables. We assume that only x_1 has a glitch. We have
With Assumption (1), the linearity of the equations ensures that the expected values of the glitch variables are independent of ȳ. Now, under Assumption (2), we have
so we can now try to probe v_1. We have the 3 following cases:
- ū_2 = ū_3 = 0 (probability
The attack with noisy values is hardly more complicated than for n = 1. Note that [7] does not claim any security for the composition of two AND gates. But this attack clearly shows the limitation of this approach.
Implementation with n = 4
Assuming that (x_1, x_2, x_3, x_4) shares x, (y_1, y_2, y_3, y_4) shares y, and (z_1, z_2, z_3, z_4) shares z, Nikova, Rechberger, and Rijmen [7] propose
It was proposed as an improvement to the n = 3 scheme as it makes all z i shares balanced. This property is called uniformity in [8] . It was used to address composition. So, we look again at the composition of two AND circuits.
Again, we assume glitch(x_1) = 1 and glitch(x_2) = glitch(x_3) = glitch(x_4) = glitch(y_i) = 0 for i = 1, . . . , 4. So, glitch(z_1) = 0, glitch(z_2) = (ȳ_1 ⊕ ȳ_4) + 1, glitch(z_3) = 0, and glitch(z_4) = (ȳ_2 ⊕ ȳ_3) + 1 with Assumption (1).
We compute v = z ∧ u = (x ∧ y) ∧ u using the threshold implementation with
So, we have
Hence, we can just probe v_1 and see if it has a glitch. With probability 1/2, we have ū_2 = ū_3, so glitch(v_1) = 2(ȳ_2 ⊕ ȳ_3) + (ȳ_1 ⊕ ȳ_4) + 3. In other cases, we have glitch(v_1) = (ȳ_1 ⊕ ȳ_4) + (ȳ_2 ⊕ ȳ_3) + 2, which is uniformly distributed. So, by repeating enough times, the majority of glitch(v_1) is ȳ with high probability.
The attack with noisy values is hardly more complicated than for n = 1. Computations with Assumptions (2) or (3) are similar. Note that [7] does not claim any security for the composition of two AND gates. However, the n = 4 implementation was made to produce a balanced sharing of the output to address composability through pipelining, meaning by adding a layer of registers between the circuits we want to compose. Here, we consider the composition of two AND gates without pipelining. Indeed, we certainly do not want to add registers in between two single gates! But our attack shows that the entire layer of circuit that we want to compose through pipelining must be analyzed as a whole, since single gates clearly do not compose well.
linear distinguishers for a cascade of circuits. Although they do not contradict the results by their authors, these attacks show severe limitations of this approach. We have seen that, compared to the attack on the AND gate with no protection, the threshold implementation proposals only have the effect of amplifying the noise of the side-channel attack by a constant factor. Therefore, we believe that there is no satisfactory protection against attacks based on glitches.
