Error Detection for Borrow-Save Adders Dedicated to ECC Unit by Francq, Julien et al.
Error Detection for Borrow-Save Adders Dedicated to
ECC Unit
Julien Francq, Jean-Baptiste Rigaud, Pascal Manet, Assia Tria, Arnaud
Tisserand
To cite this version:
Julien Francq, Jean-Baptiste Rigaud, Pascal Manet, Assia Tria, Arnaud Tisserand. Error
Detection for Borrow-Save Adders Dedicated to ECC Unit. FDTC’08: 5th Workshop on Fault
Diagnosis and Tolerance in Cryptography, 2008, Washington, DC, United States. IEEE, pp.77-
86, 2008, <10.1109/FDTC.2008.17>. <lirmm-00316796>
HAL Id: lirmm-00316796
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00316796
Submitted on 16 Jun 2009
Error Detection for Borrow-Save Adders Dedicated to ECC Unit
Julien Francq1,3, Jean-Baptiste Rigaud1,
Pascal Manet2, Assia Tria2
1École des Mines de Saint-Étienne, 2CEA-LETI
1,2Centre Microélectronique de Provence G. Charpak
Laboratoire SESAM
880 Avenue de Mimet, 13120 Gardanne, France
1{name@emse.fr}, 2{surname.name@cea.fr}
Arnaud Tisserand3
3LIRMM, CNRS–Univ. Montpellier 2
161 rue Ada,
34392 Montpellier Cedex 05, France
3{surname.name@lirmm.fr}
Abstract
Differential Fault Analysis (DFA) is a real threat to elliptic curve cryptosystems. This paper describes an elliptic curve cryptoprocessor unit that resists fault injection. The resistance comes from the use of parity-preserving logic gates in the operating structure of the ECC unit, which is based on borrow-save adders. The proposed countermeasure provides high fault-detection coverage at an acceptable area overhead (+38 %).
1. Introduction
Since it offers the highest strength per bit of any public-key cryptography system known today, Elliptic Curve Cryptography (ECC) is a good alternative to RSA cryptosystems for ensuring secret exchange [22]. ECC usually brings many benefits: faster computations, lower power consumption, limited storage, and smaller keys and certificates. As a consequence, ECC is a good candidate for smart cards and embedded systems.
ECC is considered mathematically secure since it is based on the Elliptic Curve Discrete Logarithm Problem (ECDLP). Nevertheless, the secret key processed by an elliptic curve cryptosystem can be retrieved through physical attacks if the cryptosystem is implemented without caution. Among physical attacks, Simple Power Analysis (SPA [15]) can be an efficient method to extract the key. In such attacks, information about the secret key is deduced directly from the power trace of a single secret key computation. Implementations of elliptic curve point multiplication algorithms are particularly vulnerable because the usual formulas for the two main operations of point multiplication, addition and doubling, are quite different. Consequently, their power traces can be distinguished. Efficient countermeasures protecting elliptic curve cryptosystems against SPA have already been proposed in [7], [4] and [5]. A second type of attack, called “fault attacks”, consists in forcing the device to perform erroneous computations by changing some bits of a parameter or of an intermediate result [2], [6], [3]. Faults can be induced by various means, such as temperature variations, electromagnetic perturbations, X-ray and ion beam injection, glitches on the supply voltage or the external clock, or light illumination [1]. To thwart this powerful kind of attack, standard hardware countermeasures such as detectors (temperature, supply voltage, frequency, light) can be implemented. If an embedded error detection scheme detects an error, an alarm can be raised and/or a random result can be sent to the output of the cryptosystem. Another obvious countermeasure is to check the output using an additional computation step. Unfortunately, for curve-based cryptography, and contrary to RSA, checking the result implies performing the whole computation twice or duplicating the hardware (space redundancy), which is very costly. As a consequence, partial checking methods working in parallel with the main computation are preferred.
This paper presents the incorporation of a fault detection method based on parity-preserving logic gates in some parts of an elliptic curve unit. The feasibility of this approach was already demonstrated theoretically in [18], but no synthesis results for parity-preserving circuits were reported. The main contribution of this paper is to demonstrate, through implementation results, that this method is acceptable in practice. A specific part of the unit, the borrow-save adders, becomes a high-level fault-tolerant structure.
This paper is organized as follows. Section 2 recalls some notations and mathematical background. Section 3 presents known fault attacks on ECC and previous countermeasures. The proposed ECC unit is described in Section 4, and the parity-preserving logic gates used to make it fault tolerant are detailed in Section 5. The performance of this unit, the impact of the implemented countermeasure and the fault coverage are presented in Section 6.
2008 5th Workshop on Fault Diagnosis and Tolerance in Cryptography
978-0-7695-3314-8/08 $25.00 © 2008 IEEE
DOI 10.1109/FDTC.2008.17
2. Mathematical Preliminaries
2.1 Elliptic Curves over Fp
Elliptic curve cryptography was proposed independently by Koblitz [14] and Miller [16] in 1985. See [11] for a complete introduction to ECC. Many protocols use ECC for digital signature (ECDSA), encryption (ECIES), key exchange (ECDH) or authenticated key agreement (ECMQV).
An elliptic curve over a finite field $\mathbb{F}_p$ ($p$ is a large prime number) is the set of points $(x, y)$ which satisfy the Weierstrass equation $E : y^2 = x^3 + ax + b$, where $x$, $y$, $a$ and $b$ are in $\mathbb{F}_p$ and $4a^3 + 27b^2 \not\equiv 0 \pmod{p}$. The point at infinity $\infty$ is added to $E$. The set $E$ forms an additive group with the following properties ($P_1$ and $P_2$ are in $E$):
1. $\infty$ is the neutral element: $P_1 \oplus \infty = \infty \oplus P_1 = P_1$,
2. point opposite (or negation) is inexpensive on elliptic curves: $-P_1 = (x_1, -y_1)$,
3. in affine coordinates, the addition of the points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ gives the point $(x_3, y_3)$ such that:
\[ x_3 = \lambda^2 - x_1 - x_2, \qquad y_3 = \lambda (x_1 - x_3) - y_1, \]
\[ \text{where } \lambda = \begin{cases} \dfrac{y_2 - y_1}{x_2 - x_1}, & \text{if } P_1 \neq \pm P_2 \text{ (addition)} \\[2mm] \dfrac{3x_1^2 + a}{2y_1}, & \text{if } P_1 = P_2 \text{ (doubling)} \end{cases} \]
The main ECC primitive is the point scalar multiplication. This one-way function is defined as follows:
\[ E \times \mathbb{Z} \to E, \quad (P, k) \mapsto Q = [k]P = \underbrace{P \oplus P \oplus \cdots \oplus P}_{k \text{ times}} \]
One basic method to compute the point scalar multiplication is the double-and-add algorithm (Algorithm 1).
2.2 Borrow-Save Addition
Borrow-Save (BS) is a radix-2 signed-digit redundant representation [10]. The integer $X$ is represented by $(x_{l-1} \cdots x_1 x_0)_{BS}$ where the $l$ digits $x_i$ are in $\{-1, 0, 1\}$ and are coded using 2 bits $x_i^+$ and $x_i^-$ such that $x_i = x_i^+ - x_i^-$ and
\[ X = \sum_{i=0}^{l-1} x_i 2^i = \sum_{i=0}^{l-1} \left( x_i^+ - x_i^- \right) 2^i \]
Algorithm 1 Double-and-add
Input: $P$, $k = (k_{l-1} \ldots k_1 k_0)_2$
Output: $Q = [k]P$
1. $Q \leftarrow \infty$
2. for $i = l - 1$ downto $0$ do
3.     $Q \leftarrow [2]Q$
4.     if $k_i = 1$ then
5.         $Q \leftarrow Q \oplus P$
6.     end if
7. end for
8. return $Q$
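As an illustration of Algorithm 1 together with the affine formulas of subsection 2.1, the following minimal Python sketch runs double-and-add on a toy curve. The curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$, the base point $(0, 10)$ and all function names are assumptions chosen for readability; they are not parameters of the proposed unit and offer no security.

```python
# Toy parameters (assumption, for illustration only): y^2 = x^3 + 2x + 3 over F_97.
P_MOD, A, B = 97, 2, 3
INF = None  # representation chosen here for the point at infinity


def ec_add(p1, p2):
    """Affine addition or doubling with the lambda formulas of subsection 2.1."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # P1 = -P2 (also covers doubling a point with y = 0)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def double_and_add(k, point):
    """Left-to-right double-and-add (Algorithm 1): returns [k]point."""
    q = INF
    for i in range(k.bit_length() - 1, -1, -1):
        q = ec_add(q, q)            # line 3: Q <- [2]Q
        if (k >> i) & 1:
            q = ec_add(q, point)    # line 5: Q <- Q + P
    return q


G = (0, 10)  # on the toy curve, since 10^2 = 100 = 3 (mod 97)
Q = double_and_add(29, G)
```

The conditional addition in line 5 is what makes the plain algorithm distinguishable by SPA, since the sequence of doublings and additions follows the key bits.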
Borrow-Save Addition (BSA) can be performed using the constant-time algorithm presented in Algorithm 2 and illustrated in Figure 1. Due to the signed-digit representation, $-X$ is obtained by swapping $x_i^+$ and $x_i^-$ for all ranks $i$.
Algorithm 2 Borrow-Save Addition
Input: $X = (x_{l-1} \ldots x_1 x_0)_{BS}$ and $Y = (y_{l-1} \ldots y_1 y_0)_{BS}$
Output: $S = (s_l \ldots s_1 s_0)_{BS} = X + Y$
1. $c_0^+ \leftarrow 0$, $s_0^- \leftarrow 0$
2. for $i = 0$ to $l - 1$ do  ▷ parallel loop
3.     $2c_{i+1}^+ - c_i^- \xleftarrow{\text{PPM}} x_i^+ + y_i^+ - x_i^-$
4. end for
5. for $i = 0$ to $l - 1$ do  ▷ parallel loop
6.     $2s_{i+1}^- - s_i^+ \xleftarrow{\text{PPM}} y_i^- + c_i^- - c_i^+$
7. end for
8. $s_l^+ \leftarrow c_l^+$
9. return $S$
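The BSA operator depicted in Figure 1 uses 2 rows of PPM cells [10]. A PPM is very close to a full-adder (just 1 extra inverter) and computes $2c^{\pm} - s^{\mp} = x^{\pm} + y^{\pm} - x^{\mp}$. A logical implementation can be $s = x^{\pm} \oplus y^{\pm} \oplus x^{\mp}$ and $c = x^{\pm} y^{\pm} + x^{\pm}\,\overline{x^{\mp}} + y^{\pm}\,\overline{x^{\mp}}$, where the overline denotes the complement produced by the extra inverter. Figure 1 clearly shows that the BSA computation time does not depend on the operand size $l$ and is $T(\mathrm{BSA}(l)) = 2 \cdot T(\mathrm{PPM})$.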
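The two rows of PPM cells of Algorithm 2 can be sketched in Python as follows. This is a bit-level model only, not the hardware description. It assumes that the carry output of a PPM uses the complement of the subtracted input, which is what the relation $2c - s = x + y - z$ requires. The names `ppm`, `bsa` and `bs_value` are hypothetical.

```python
def ppm(x, y, z):
    """One PPM cell on single bits: returns (c, s) with 2*c - s = x + y - z."""
    s = x ^ y ^ z
    nz = 1 - z                      # the extra inverter on the subtracted input
    c = (x & y) | (x & nz) | (y & nz)
    return c, s


def bsa(xp, xm, yp, ym):
    """Borrow-save addition (Algorithm 2).

    Each argument is a list of l bits, least significant first:
    (xp, xm) are the + and - bits of X, (yp, ym) those of Y.
    Returns (sp, sm), two lists of l + 1 bits.
    """
    l = len(xp)
    cp = [0] * (l + 1)              # c+_0 = 0
    cm = [0] * l
    for i in range(l):              # first row, all cells independent
        cp[i + 1], cm[i] = ppm(xp[i], yp[i], xm[i])
    sp = [0] * (l + 1)
    sm = [0] * (l + 1)              # s-_0 = 0
    for i in range(l):              # second row, all cells independent
        sm[i + 1], sp[i] = ppm(ym[i], cm[i], cp[i])
    sp[l] = cp[l]
    return sp, sm


def bs_value(plus, minus):
    """Integer represented by a borrow-save pair of bit lists."""
    return sum((p - m) << i for i, (p, m) in enumerate(zip(plus, minus)))
```

Each loop body depends only on bits of rank $i$ or on outputs of the previous row, which is why the delay is two PPM cells whatever the operand size.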
Figure 1. 4-Digit Borrow-Save Adder
| Known attacks on the ECDLP | Security constraints on parameters |
|---|---|
| Exhaustive search | $n$ sufficiently large (e.g., $n \geq 2^{80}$) |
| Pohlig-Hellman and Pollard's rho attack | For maximum resistance: $\#E(\mathbb{F}_p) = hn$ with $h \in \{1, 2, 3, 4\}$ and $n$ prime ($n > 2^{160}$) |
| Attack on prime-field-anomalous curves | $\#E(\mathbb{F}_p) \neq p$ |
| Weil and Tate pairing attack | $n \nmid (p^k - 1)$ for all $1 \leq k \leq C$, where $C$ is large enough $^1$ |
Table 1. Consequences of known attacks on the choice of the curve parameters
3 Fault Attacks on ECC
3.1 Known Fault Analysis on ECC
The security of elliptic curve cryptography schemes relies on the ECDLP: given points $P$ and $Q$, no sub-exponential algorithm is known to find $k$ such that $Q = [k]P$ (in the general case). For security reasons, the ECDLP should be intractable. As a consequence, the elliptic curve parameters (particularly $p$, $a$, $b$, $P$, and $n = \mathrm{ord}_E(P)$) should be carefully chosen in order to resist all the known attacks on the ECDLP. If this is the case, the elliptic curve is considered to be a cryptographically “strong” curve. Table 1 lists some attacks on the ECDLP and their consequences on the choice of the parameters of a strong curve.
One way for an attacker to apply DFA to ECC is to disturb the representation of $P$ ($P$ becomes $\hat{P}$), so that the cryptosystem applies its point multiplication algorithm to a value which is not a point on the given (or selected) curve but on another curve, expected to be cryptographically less secure (a “weak” curve). The result of this computation is a point $\hat{Q}$ on this new weak curve, which can be exploited to compute the secret key $k$. This idea was first described by Biehl et al. in [2].
Because the parameter $b$ is not used in point addition or doubling, an elliptic curve can be completely defined as:
\[ E(a, b) = E(a, y^2 - x^3 - ax). \tag{1} \]
If a “point” $\hat{P} = (\hat{x}, \hat{y}) \in \mathbb{F}_p \times \mathbb{F}_p$ but $\hat{P} \notin E$, then the computation of $\hat{Q} = [k]\hat{P}$ will take place on the curve:
\[ \hat{E}(a, \hat{b}) = \hat{E}(a, \hat{y}^2 - \hat{x}^3 - a\hat{x}) \tag{2} \]
$^1$ $C$ must be large enough so that the DLP in $\mathbb{F}^*_{p^C}$ is considered intractable (if $n > 2^{160}$, $C = 20$ suffices).
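| | $x_1 \to \hat{x}_1$ | $p \to \hat{p}$ | $a \to \hat{a}$ |
|---|---|---|---|
| $P \to$ | $\hat{P} = (\hat{x}_1, y_1)$ | $\hat{P} = (\hat{x}_1, \hat{y}_1)$ | unchanged |
| Device outputs $\hat{Q} = (\hat{x}_Q, \hat{y}_Q)$ | $\hat{Q} = [k]\hat{P} = [k](\hat{x}_1, y_1)$ | $\hat{Q} = [k]\hat{P} = [k](\hat{x}_1, \hat{y}_1)$ | $\hat{Q} = [k]P = [k](x_1, y_1)$ |
| $\hat{Q} \in$ | $\hat{E}(a, \hat{b})$ | $\hat{E}(a, \hat{b})$ | $\hat{E}(\hat{a}, \hat{b})$ |
| Unknown values | $\hat{x}_1$ | $\hat{p}$ | $\hat{a}$, $\hat{b}$ |
| Useful relations | $\hat{x}_1^3 + a\hat{x}_1 + \hat{b} - y_1^2 = 0 \pmod{p}$ $^2$ | $\hat{p} \mid (b_Q - b_1)$ $^3$ | $y_1^2 = x_1^3 + \hat{a}x_1 + \hat{b}$, $\hat{y}_Q^2 = \hat{x}_Q^3 + \hat{a}\hat{x}_Q + \hat{b}$ |
Table 2. Attacks proposed by Ciet et al. in [6]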
Using this fact, an attacker can carefully choose a “point” $\hat{P}_i = (\hat{x}_i, \hat{y}_i) \in \hat{E}_i$, with $\hat{b}_i = \hat{y}_i^2 - \hat{x}_i^3 - a\hat{x}_i$, such that $\mathrm{ord}_{\hat{E}_i}(\hat{P}_i) = n_i$ is small. Then, the cryptosystem will compute $\hat{Q}_i = [k]\hat{P}_i$. Because $\hat{P}_i$ and $\hat{Q}_i$ are known and $n_i$ is small, the attacker can compute the discrete logarithm to retrieve $k \pmod{n_i}$. The attacker can iterate this procedure with other input points and, using the Chinese Remainder Theorem, finally retrieve the correct value of $k$.
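The final recombination step can be sketched in Python. The scalar and the small orders $n_i$ below are hypothetical values chosen so that the product of the moduli exceeds $k$. The residues are obtained here by a plain reduction, which stands in for the discrete logarithms an attacker would compute on the weak curves.

```python
from math import prod


def crt(residues, moduli):
    """Combine k = r_i (mod n_i), for pairwise coprime n_i, into k mod prod(n_i)."""
    n_total = prod(moduli)
    k = 0
    for r, n in zip(residues, moduli):
        m = n_total // n
        k += r * m * pow(m, -1, n)   # m * (m^-1 mod n) is 1 mod n and 0 mod the others
    return k % n_total


secret_k = 1234567                       # hypothetical secret scalar
small_orders = [5, 7, 11, 13, 17, 19]    # hypothetical orders n_i of the weak points
leaks = [secret_k % n for n in small_orders]   # stand-in for the small discrete logs
recovered = crt(leaks, small_orders)
```

The scalar is fully determined as soon as the product of the collected orders exceeds the range of $k$, which is why many weak points with small coprime orders are enough.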
A simple countermeasure is to check whether the points $P$ and $Q$ are on the strong curve $E$. As a consequence, the attack described above may not be easy to mount in practice. Biehl et al. also proposed two other attacks, assuming that only a few bit errors (or exactly one) are inserted into the base point $P$. These attacks are based on a rather idealized fault model.
Ciet et al. [6] refine the ideas of Biehl et al. by relaxing their fault model. They show how truly random errors (hence practical computation faults) injected in the coordinates of $P$, in the field representation, or in the curve parameters can allow the key $k$ to be retrieved. In a cryptographic device, the system parameters are stored in non-volatile memory (e.g. EEPROM) and are transferred into working memory (e.g. RAM) for the computations. In a first scenario, Ciet et al. assume a permanent fault at an unknown position in any system parameter defining the elliptic curve. In a second scenario, they analyze the consequences of faults occurring during the transfer of the system parameters. Table 2 lists the attacks proposed in [6].
Blömer et al. showed in [3] how sign changes of points can be used to recover the value of $k$. While the previous attacks force the device to output points that are not on the original elliptic curve, the following Sign Change Fault Attack (SCFA) uses a faulty point on the original curve.
Only the SCFA on the standard double-and-add algorithm (Algorithm 1) is shown in this paper. Note that this attack can also be mounted against the Non-Adjacent-Form-based binary method and against the Montgomery ladder [4] when the $y$ coordinate is used.
$^2$ with $\hat{b} = \hat{y}_Q^2 - \hat{x}_Q^3 - a\hat{x}_Q \pmod{p}$
$^3$ with $b_Q = \hat{y}_Q^2 - \hat{x}_Q^3 - a\hat{x}_Q \pmod{p}$ and $b_1 = y_1^2 - x_1^3 - ax_1 \pmod{p}$
The goal of the adversary is to change the sign of the $y$ coordinate of the variable $Q$ in line 5 of Algorithm 1 during some unknown loop iteration $0 \leq i \leq \ell - 1$, such that $Q \oplus P$ becomes $-Q \oplus P$. The adversary must be able to mount $c = (\ell/m) \log(2\ell)$ attacks on the same input ($P$, $k$, $E(a, b)$) to recover $k$ with probability at least 1/2. Moreover, a correct result $Q$ must be known. The secret key bits of $k$ are retrieved in pieces of $r$ bits, such that $1 \leq r \leq m$. The faulty value $\hat{Q}$ that results from an SCFA as defined above can be written:
\[ \hat{Q} = -Q + 2L_i(k), \quad \text{with } L_i(k) = \sum_{j=0}^{i} k_j 2^j P \tag{3} \]
$L_i(k)$ is the only unknown part in (3): it is a multiple of $P$. If only a small number of bits of $k$ are unknown, they can be guessed and verified using equation (3).
The complete attack is divided into three phases. In the first phase, $c$ outputs $\hat{Q}$ of Algorithm 1 are collected by inducing a sign-change fault in $Q$ for random values of $i$. In the second phase, parts of $k$ are guessed (and stored in a variable $x$) and a test candidate $T_x$ is computed. In the third phase, $T_x$ is tested against all the faulty results $\hat{Q}$ obtained in the first phase.
Blömer et al. proved that their algorithm succeeds in recovering $k$ with complexity $O(c\,\ell\,M\,3^m)$ ($M$ is the maximal cost of a full scalar multiplication or of a scalar multiplication including the induction of a sign-change fault) with probability at least 1/2.
The fault analyses presented in [2] and [6] recover the key through the mathematical analysis of faulty results only, contrary to the attack described in [3], where both correct and faulty executions are needed.
Another fault attack, called Safe-Error Attack (SEA), only checks whether the computation is performed correctly or not. The adversary therefore does not need to know the faulty decryption value, but only whether the attack succeeded or not. There are two types of SEAs: the CSEA and the MSEA. The CSEA consists in inducing any temporary random computational fault inside the ECC unit [24]. It can be applied to recover key bits in a double-and-add-always scheme with a dummy addition: after a fault is injected during the ECC unit computation, if the final result is correct, the addition was a dummy one [12]. An MSEA requires inducing a temporary memory fault inside a register or memory location [23]. The MSEA has stronger requirements than the CSEA in terms of controllability of fault location and timing. Thus, this attack appears rather hypothetical.
3.2 Previous Countermeasures
In order to thwart the fault attacks described in subsection 3.1, some countermeasures have already been proposed.
Check if points are on the initial curve. The attacks proposed in [2] and the attack on the base point $P$ in [6] can be counteracted when the device checks whether $P$ is on the original curve. This can be done thanks to the curve parameter $b$, which can serve as an integrity check:
\[ \text{for } P = (x, y), \quad b = y^2 - x^3 - ax \]
Note that the attack presented in [3] uses a faulty point on the original curve. As a consequence, this countermeasure is ineffective against that attack.
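A minimal Python sketch of this integrity check is given below. The toy curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ and the three test points are assumptions made for illustration; `on_curve` is a hypothetical helper name.

```python
def on_curve(point, a, b, p):
    """Integrity check: recompute b from (x, y) and compare with the stored b."""
    x, y = point
    return (y * y - x * x * x - a * x) % p == b % p


# Toy parameters (assumption): y^2 = x^3 + 2x + 3 over F_97.
a, b, p = 2, 3, 97
good = (0, 10)                 # valid point
faulty = (1, 10)               # x coordinate disturbed: the point leaves the curve
sign_changed = (0, (-10) % p)  # y -> -y: the point stays on the curve

checks = [on_curve(pt, a, b, p) for pt in (good, faulty, sign_changed)]
```

The sign-changed point passes the check, which illustrates why this test alone does not stop a sign change fault attack.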
CRC checksums. In order to protect the curve parameters (particularly $a$ and $p$) from the attacks in [6], a CRC can be computed on them. Before each use of these parameters, a new CRC is computed and compared with the stored one.
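The check can be sketched in Python as follows. The choice of CRC-32 (through the standard `zlib.crc32` function), the 32-byte big-endian encoding and the helper names are assumptions; the text does not specify which CRC polynomial or parameter encoding would be used.

```python
import zlib


def params_crc(a, p, nbytes=32):
    """CRC-32 over a fixed-width big-endian encoding of (a, p)."""
    data = a.to_bytes(nbytes, "big") + p.to_bytes(nbytes, "big")
    return zlib.crc32(data)


# Reference value computed once, when the parameters are known to be intact.
a_ref, p_ref = 2, 97                 # toy values (assumption)
reference_crc = params_crc(a_ref, p_ref)


def check_before_use(a_now, p_now):
    """Recompute the CRC before each use and compare it with the stored one."""
    return params_crc(a_now, p_now) == reference_crc


intact = check_before_use(a_ref, p_ref)
after_bit_flip = check_before_use(a_ref ^ 1, p_ref)   # one faulty bit in a
```

In such a scheme the stored reference value itself would also need protection, since a fault on it would raise false alarms.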
Randomization. The scalar $k$ can be randomized by splitting it with a random value $r$. Different splitting methods can be implemented, from the simplest ($[k]P = [k - r]P + [r]P$) to the most complex ($[k]P = [k \bmod r]P + [\lfloor k/r \rfloor]([r]P)$).
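Both splittings rely only on the group law, so they can be checked on any additive group. The Python sketch below uses the integers modulo $N$ as a stand-in for the elliptic curve group. This substitution, the value of $N$ and the function names are assumptions made to keep the example short.

```python
import secrets

N = 2**61 - 1   # order of the stand-in additive group (Z_N, +), an assumption


def smul(k, point):
    """[k]P in the stand-in group: plain modular multiplication."""
    return (k * point) % N


def additive_split(k, point):
    """[k]P = [k - r]P + [r]P for a fresh random r with 1 <= r < k (needs k >= 2)."""
    r = secrets.randbelow(k - 1) + 1
    return (smul(k - r, point) + smul(r, point)) % N


def euclidean_split(k, point):
    """[k]P = [k mod r]P + [floor(k / r)]([r]P) for a fresh random r, 1 <= r < k."""
    r = secrets.randbelow(k - 1) + 1
    return (smul(k % r, point) + smul(k // r, smul(r, point))) % N
```

Each call draws a new $r$, so the intermediate scalars processed by the device differ from one execution to the next while the result stays $[k]P$.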
Use of a combined curve. A countermeasure that generalizes Shamir's idea for RSA [19] to ECC is proposed in [3]. The modulus is extended in a first computation ($p$ becomes $N = p_0 p$) and then reduced (modulo $p_0$) in a second one. Instead of computing $[k]P$ directly, the device computes
\[ Q_N = [k]P_N \quad \text{and} \quad Q_{p_0} = [k]P_{p_0}, \]
where $P_N = P \pmod{N}$ and $P_{p_0} = P \pmod{p_0}$.
At the end, it checks whether $Q_{p_0} = Q_N \pmod{p_0}$. If this is not the case, some errors have been induced.
Montgomery Ladder without using y coordinates. Since the $y$ coordinate is not used in this version of the Montgomery ladder, it cannot be successfully attacked by an SCFA [3].
4 Proposed ECC Unit
We present in this section the algorithms chosen for computing modular operations and the global architecture of the proposed ECC unit.
4.1 Algorithms for Modular Operations
This ECC unit is built to compute the unified addition formulae described by Déchène et al. in [8]. As a consequence, this architecture is also SPA resistant.
Required Modular Operations. Three different types of modular operations are needed to compute the formulae of Déchène et al.: additions, multiplications and inversions (to express the final point in affine coordinates at the end of the point multiplication). All these arithmetic operations are done in borrow-save representation (see subsection 2.2). Modular additions are computed with an algorithm given in [21], and Modular Montgomery Multiplication (MMM) is used to compute modular multiplications [17]. Modular inversions are implemented using Fermat's theorem.
Chosen Modulo through Point Multiplication. The initial version of the Montgomery multiplication described in [17] takes as inputs $X < p$ and $Y < p$, and computes $X \cdot Y \cdot R^{-1} \pmod{p}$, with $R = 2^l$ (see Algorithm 3). After the loop of Algorithm 3 (step 5), $0 \leq S < 2p$, so a final subtraction is needed to ensure $S < p$.
Algorithm 3 MMM with final subtraction
Input: $p$ ($l$ bits), $X < p$ ($l$ bits), $Y < p$ ($l$ bits), $\gcd(p, 2) = 1$, $R = 2^l$
Output: $S = X \cdot Y \cdot R^{-1} \pmod{p}$
1. $S \leftarrow 0$
2. for $i = 0$ to $l - 1$ do
3.     $m_i \leftarrow s_0 \oplus x_i \cdot y_0$
4.     $S \leftarrow (S + x_i \cdot Y + m_i \cdot p)/2$
5. end for
6. if $S \geq p$ then
7.     $S \leftarrow S - p$
8. end if
9. return $S$
Unfortunately, this final subtraction is a time- and area-consuming process. Moreover, it can be exploited by an attack, especially when unified addition formulae are chosen to implement scalar multiplication [20]. As a consequence, a new MMM algorithm without final subtraction must be written (see Algorithm 4). Its result $S$ is given modulo $2p$: this is the modulus chosen for all the calculations.
Algorithm 4 MMM without final subtraction
Input: $p$ ($l$ bits), $X < 2p$ ($l + 1$ bits), $Y < 2p$ ($l + 1$ bits), $\gcd(p, 2) = 1$
Output: $S = \mathrm{MMM}(X, Y, 2p) = X \cdot Y \cdot 2^{-(l+2)} \pmod{2p}$
1. $S \leftarrow 0$
2. for $i = 0$ to $l + 1$ do
3.     $m_i \leftarrow s_0 \oplus x_i \cdot y_0$
4.     $S \leftarrow (S + x_i \cdot Y + m_i \cdot p)/2$
5. end for
6. return $S$
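Algorithm 4 can be modeled on plain integers with the short Python sketch below. It interprets "modulo $2p$" as "the result lies in $[0, 2p)$ and is congruent to $X \cdot Y \cdot 2^{-(l+2)}$ modulo $p$", which is an assumption about the intended meaning. The function name is hypothetical, and the borrow-save version (Algorithm 5) is not modeled.

```python
def mmm_no_final_sub(x, y, p):
    """Bit-serial Montgomery multiplication without final subtraction.

    Assumes p odd and 0 <= x, y < 2p. With l = p.bit_length(), returns S with
    0 <= S < 2p and S = x * y * 2^-(l+2) (mod p).
    """
    l = p.bit_length()
    s = 0
    for i in range(l + 2):                 # i = 0 .. l + 1
        xi = (x >> i) & 1
        mi = (s & 1) ^ (xi & (y & 1))      # m_i = s_0 xor x_i.y_0
        s = (s + xi * y + mi * p) >> 1     # the sum is even by construction of m_i
    return s
```

Because the output stays below $2p$, it can be fed back as an operand of the next multiplication without any reduction step, which is what removes the final subtraction.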
Algorithm for Modular Multiplication. Algorithm 4 must be modified in order to perform the computations in borrow-save representation (see Algorithm 5).
Algorithm 5 MMM without final subtraction in borrow-save representation
Input: $p$ ($l$ bits), $-2p < X < 2p$, $-2p < Y < 2p$, $\gcd(p, 2) = 1$
Output: $(S^+, S^-) = X \cdot Y \cdot 2^{-(l+2)} \pmod{2p}$
1. $(S^+, S^-) \leftarrow (0, 0)$
2. for $i = 0$ to $l + 1$ do
3.     $m_i \leftarrow (s_0^+ \oplus s_0^-) \oplus (x_i^+ \oplus x_i^-) \cdot (y_0^+ \oplus y_0^-)$
4.     if $(x_i^+, x_i^-, m_i) = (0, 0, 0)$ or $(1, 1, 0)$ then
5.         $(S^+, S^-) \leftarrow \mathrm{BSA}[(S^+, S^-), (0, 0)]$
6.     else if $(x_i^+, x_i^-, m_i) = (1, 0, 0)$ then
7.         $(S^+, S^-) \leftarrow \mathrm{BSA}[(S^+, S^-), (Y^+, Y^-)]$
8.     else if $(x_i^+, x_i^-, m_i) = (0, 1, 0)$ then
9.         $(S^+, S^-) \leftarrow \mathrm{BSA}[(S^+, S^-), (Y^-, Y^+)]$
10.     else if $(x_i^+, x_i^-, m_i) = (0, 0, 1)$ or $(1, 1, 1)$ then
11.         $(S^+, S^-) \leftarrow \mathrm{BSA}[(S^+, S^-), (p, 0)]$
12.     else if $(x_i^+, x_i^-, m_i) = (1, 0, 1)$ then
13.         $(S^+, S^-) \leftarrow \mathrm{BSA}[(S^+, S^-), ((Y + p)^+, (Y + p)^-)]$
14.     else if $(x_i^+, x_i^-, m_i) = (0, 1, 1)$ then
15.         $(S^+, S^-) \leftarrow \mathrm{BSA}[(S^+, S^-), ((Y + p)^-, (Y + p)^+)]$
16.     end if
17.     $(S^+, S^-) \leftarrow (S^+/2, S^-/2)$
18. end for
19. return $(S^+, S^-)$
Algorithm for Modular Addition. The algorithm initially described in [21] computes $X + Y \pmod{p}$. It must be modified to compute $X + Y \pmod{2p}$.
Algorithm 6 Modular Addition
Input: $2^l \leq 2p < 2^{l+1}$, $-2p < X < 2p$, $-2p < Y < 2p$
Output: $S = X + Y \pmod{2p}$, with $S = S^+ - S^-$
1. $(T^+, T^-) \leftarrow \mathrm{BSA}[(X^+, X^-), (Y^+, Y^-)]$
2. if $t_v = 4t_{l+2} + 2t_{l+1} + t_l < 0$ then
3.     $(S^+, S^-) \leftarrow \mathrm{BSA}[(T^+, T^-), (2p, 0)]$
4. else if $t_v > 0$ then
5.     $(S^+, S^-) \leftarrow \mathrm{BSA}[(T^+, T^-), (-2p, 0)]$
6. else if $t_v = 0$ then
7.     $(S^+, S^-) \leftarrow \mathrm{BSA}[(T^+, T^-), (0, 0)]$
8. end if
9. return $S$
Algorithm for Modular Inversion. Using Fermat's theorem, the inverse of a value $X$ modulo $p$ can be computed by a modular exponentiation of $X$ by $p - 2$: $X^{-1} = X^{p-2} \pmod{p}$, if $\gcd(X, p) = 1$. This algorithm can be used in this case because $p$ is prime. A standard algorithm for computing modular exponentiation is the square-and-multiply algorithm.
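A minimal Python sketch of this inversion is shown below, using a left-to-right square-and-multiply on plain integers. In the proposed unit each product would be a Montgomery multiplication in borrow-save representation; that aspect is not modeled here, and the function names are hypothetical.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation: returns base^exponent mod modulus."""
    result = 1
    for i in range(exponent.bit_length() - 1, -1, -1):
        result = (result * result) % modulus      # square at every bit
        if (exponent >> i) & 1:
            result = (result * base) % modulus    # multiply when the bit is 1
    return result


def fermat_inverse(x, p):
    """Inverse of x modulo a prime p, computed as x^(p-2) mod p."""
    return square_and_multiply(x % p, p - 2, p)
```

Since the exponent $p - 2$ is public, the data-dependent multiply step does not leak secret information here.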
4.2 Architecture
The architecture of our ECC unit is depicted in Figure 2.
Modular Addition. When a modular addition is performed ($X + Y \pmod{2p}$, with $X \neq Y$ or $X = Y$), MUX1 and MUX2 respectively select the couples $(X^+, X^-)$ and $(Y^+, Y^-)$, and a first addition is computed by BSA1. MUX4 selects $(T^+, T^-)$ and MUX3 chooses the value which must be added to $(T^+, T^-)$ according to the value of $4t_{l+2} + 2t_{l+1} + t_l$. BSA2 adds these two values, and DEMUX sends the result to its output (DEMUX+, DEMUX−). SUBTRACTER modifies the operands before BSA1 in order to perform a subtraction:
• $[(X^+, X^-), (Y^+, Y^-)] \leftarrow [(X^+, X^-), (Y^-, Y^+)]$ for $X - Y \pmod{2p}$
• $[(X^+, X^-), (Y^+, Y^-)] \leftarrow [(X^-, X^+), (Y^-, Y^+)]$ for $-X - Y \pmod{2p}$
Modular Multiplication. SHIFT1 and SHIFT2 are only used to implement the division by 2 in the MMM. At the beginning of an MMM, the value of $(Y + P)$ (denoted Y P in Figure 2) is computed: MUX1 and MUX2 respectively select the couples $(Y^+, Y^-)$ and $(P^+, P^-)$, BSA1 performs the addition, MUX3 and MUX4 respectively select the couples $(T^+, T^-)$ and $(0, 0)$, BSA2 adds $(T^+, T^-)$ and $(0, 0)$, and DEMUX sends the result to its output named (YN s+, YN s−). If $X$ is denoted $X = (x_{l+1} x_l x_{l-1} \cdots x_2 x_1 x_0)_2$, SELECTOR1 (resp. SELECTOR2) computes mi1 (resp. mi2) by processing the even (resp. odd) indexes of $X$. These values control MUX2 and MUX4 in order to choose the value which must be added to $(S^+, S^-)$. Thus, SELECTOR1, BSA1 and SHIFT1 compute the result $(T^+, T^-)$, which is then processed by SELECTOR2, BSA2 and SHIFT2. In this mode, MUX1, MUX3 and DEMUX respectively select the couples $(S^+, S^-)$, $(T^+, T^-)$ and (S s+, S s−). At the last step of the MMM, DEMUX sends the result to its output (DEMUX+, DEMUX−).
Modular Inversion. Modular inversion can be imple-
mented as a series of modular multiplications.
5 Parity-Preserving Logic Gates
5.1 General Properties
Parhami introduced in [18] a class of logic gates for which the parity of the outputs matches that of the inputs. For example, a parity-preserving logic gate (PPLG) which has 2 input bits $(a, b)$ and 2 output bits $(p, q)$ must satisfy the property:
\[ a \oplus b = p \oplus q \]
Figure 2. Original ECC unit
These PPLGs are also reversible in the sense that they allow the circuit inputs to be reproduced from the observed outputs. In this paper, only the parity-preserving property of these gates is used. In [18], the author proved that the only 2-input $(a, b)$, 2-output $(p, q)$ reversible gate which is also a PPLG is the one that complements both inputs unconditionally. This is not a sufficiently interesting result for building complex circuits. Consequently, the author searched for and found the only two 3-input $(a, b, c)$, 3-output $(p, q, r)$ reversible logic gates which are also PPLGs (with the condition $p = a$ $^4$): the Fredkin gate (denoted FRG, [9]) and the Feynman double-gate (F2G).
If designers want to optimize the performance of the circuit using fanouts or feedbacks, care must be taken that the parity of the global circuit is maintained. Consequently, making a reversible circuit robust or fault tolerant is much more difficult than doing so for a conventional logic circuit.
$^4$ It must be noted that all the reversible logic gates already proposed in the literature satisfy this relation.
Figure 3. Feynman double-gate (F2G): inputs $(a, b, c)$, outputs $p = a$, $q = a \oplus b$, $r = a \oplus c$
Figure 4. Fredkin gate (FRG): inputs $(a, b, c)$, outputs $p = a$, $q = \overline{a} b \oplus a c$, $r = \overline{a} c \oplus a b$
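The two gates can be modeled and checked exhaustively with the Python sketch below. It assumes the usual controlled-swap definition of the Fredkin gate ($q = \overline{a} b \oplus a c$, $r = \overline{a} c \oplus a b$). The helper names are hypothetical.

```python
from itertools import product


def f2g(a, b, c):
    """Feynman double-gate: (p, q, r) = (a, a xor b, a xor c)."""
    return a, a ^ b, a ^ c


def frg(a, b, c):
    """Fredkin gate (controlled swap): b and c are swapped when a = 1."""
    na = 1 - a
    return a, (na & b) ^ (a & c), (na & c) ^ (a & b)


def parity(bits):
    """XOR of all the bits."""
    acc = 0
    for bit in bits:
        acc ^= bit
    return acc


def is_parity_preserving(gate):
    """True if input parity equals output parity for all 8 input vectors."""
    return all(parity(v) == parity(gate(*v)) for v in product((0, 1), repeat=3))


def is_reversible(gate):
    """True if the 8 input vectors map to 8 distinct output vectors."""
    return len({gate(*v) for v in product((0, 1), repeat=3)}) == 8
```

Feeding an F2G with $(x, 0, 0)$ returns $(x, x, x)$, which is the parity-preserving way of duplicating a signal instead of using a fanout.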
5.2 Our approach
This subsection details a procedure for implementing parity-preserving circuits in the general case. As an illustration, this paper protects only the borrow-save adders (BSA1 and BSA2), which are the operating components of the initial ECC unit (see Figure 2). Future work will consist in protecting other components of the ECC unit, such as the control logic.
Choose the protected part of the circuit. Designers should investigate the tradeoff between the degree of fault tolerance of the resulting circuit (i.e. how many BSA output bits are protected, and at what protection level) and its performance (area and speed). Moreover, besides checking the validity of outputs, designers can add parity checks on intermediate variables, provided that these additional checks imply a reasonable penalty on the circuit performance.
This paper chooses to check the validity of two BSA output bits at a time (called $s_{i+1}^-$ and $s_i^+$). Consequently, five BSA input bits are potentially concerned: $a_i^+$, $b_i^+$, $a_i^-$, $b_i^-$, and $c_i^+$.
Get the corresponding logic equations. The chosen protected part of the circuit should be sufficiently simple to get its resulting logic equations easily.
In our case, the intermediate result called $c_i^-$ (respectively $c_i^+$) computed by the PPM cell which processes the input bits $a_i^+$, $b_i^+$, $a_i^-$ (respectively $a_{i-1}^+$, $b_{i-1}^+$, $a_{i-1}^-$) can be written (the overline denotes the complemented bit):
\[ \begin{cases} c_i^- = a_i^+ \oplus b_i^+ \oplus a_i^- \\ c_i^+ = a_{i-1}^+ \cdot b_{i-1}^+ + a_{i-1}^+ \cdot \overline{a_{i-1}^-} + b_{i-1}^+ \cdot \overline{a_{i-1}^-} \end{cases} \]
Finally, the results $s_{i+1}^-$ and $s_i^+$ computed by the PPM cell which processes the input bits $c_i^-$, $b_i^-$, $c_i^+$ can be written:
\[ \begin{cases} s_{i+1}^- = c_i^- \cdot b_i^- + c_i^- \cdot \overline{c_i^+} + b_i^- \cdot \overline{c_i^+} \\ s_i^+ = c_i^- \oplus b_i^- \oplus c_i^+ \end{cases} \]
Transform the logic equations in Galois field. F2G and FRG can compute the “not” logic function, and FRG can implement the “and” logic function. On the other hand, none of these gates can implement the “or” logic function. Thus, logic expressions must be expressed in the Galois field thanks to the following property:
\[ x + y = x \oplus y \oplus x \cdot y \]
In our application, the variables $c_i^+$ and $s_{i+1}^-$ can finally be written:
\[ \begin{cases} c_i^+ = a_{i-1}^+ \cdot b_{i-1}^+ \oplus a_{i-1}^+ \cdot \overline{a_{i-1}^-} \oplus b_{i-1}^+ \cdot \overline{a_{i-1}^-} \\ s_{i+1}^- = c_i^- \cdot b_i^- \oplus c_i^- \cdot \overline{c_i^+} \oplus b_i^- \cdot \overline{c_i^+} \end{cases} \]
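Proof. (for $s_{i+1}^-$)
\begin{align*}
s_{i+1}^- &= c_i^- \cdot b_i^- + c_i^- \cdot \overline{c_i^+} + b_i^- \cdot \overline{c_i^+} \\
&= (c_i^- \cdot b_i^- \oplus c_i^- \cdot \overline{c_i^+} \oplus c_i^- \cdot b_i^- \cdot \overline{c_i^+}) + b_i^- \cdot \overline{c_i^+} \\
&= c_i^- \cdot b_i^- \oplus c_i^- \cdot \overline{c_i^+} \oplus c_i^- \cdot b_i^- \cdot \overline{c_i^+} \oplus b_i^- \cdot \overline{c_i^+} \oplus c_i^- \cdot b_i^- \cdot \overline{c_i^+} \oplus c_i^- \cdot b_i^- \cdot \overline{c_i^+} \oplus c_i^- \cdot b_i^- \cdot \overline{c_i^+} \\
&= c_i^- \cdot b_i^- \oplus c_i^- \cdot \overline{c_i^+} \oplus b_i^- \cdot \overline{c_i^+}
\end{align*}
This proof also applies to the variable $c_i^+$.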
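The identity used in this proof can also be checked exhaustively. The Python sketch below compares the "or" form and the "xor" form of the three-term expression over all input combinations. It assumes that the third input enters each product complemented, as the PPM relation $2c - s = x + y - z$ requires; the identity itself holds for any three Boolean variables. The function names are hypothetical.

```python
from itertools import product


def carry_or_form(x, y, z):
    """x.y + x.not(z) + y.not(z), written with logical OR."""
    nz = 1 - z
    return (x & y) | (x & nz) | (y & nz)


def carry_xor_form(x, y, z):
    """Same three products combined with XOR, as needed for PPLG synthesis."""
    nz = 1 - z
    return (x & y) ^ (x & nz) ^ (y & nz)


# Two-variable property: x + y = x xor y xor x.y
or_property_holds = all(
    (x | y) == (x ^ y ^ (x & y)) for x, y in product((0, 1), repeat=2)
)

# Three-term identity proved above, for every input combination
identity_holds = all(
    carry_or_form(*v) == carry_xor_form(*v) for v in product((0, 1), repeat=3)
)
```

The XOR form works because at most one or all three of the products can be 1 at the same time, never exactly two.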
Implement these equations thanks to PPLGs. The protected circuit consists only of FRGs (white boxes in Figures 5 and 6) and F2Gs (grey boxes in Figures 5 and 6).
Note that fanout is not advised. If a signal is needed by more than one cell, it is preferable to duplicate it with a parity-preserving logic gate. For example, to duplicate a signal $x$, an F2G cell can be used with the inputs $(a = x, b = 0, c = 0)$: its outputs will be equal to $(p = x, q = x, r = x)$.
If a PPLG output signal is not used, it must be sent to the output of the component to maintain its global parity. If two PPLG outputs have the same value and are not used by other PPLGs, they can be simplified or left unconnected, provided that the reversibility property of these PPLGs is not used: these bits are called “garbage bits”. For example, if an F2G has the outputs $(p = x, q = x, r = x)$, and if $q$ and $r$ are not used by other PPLGs, they can be left unconnected instead of being sent to the component output. PPM1 has eight garbage bits (see Figure 5), PPM2 six (see Figure 6).
The elementary cell of our fault-tolerant BSA (called FTBSA in the following) is depicted in Figure 7. It is made of two cells, called PPM1 (see Figure 5) and PPM2 (see Figure 6). PPM1 computes $c_{i+1}^+$, $c_i^-$, and $p_1$, the remaining output bits of PPM1. PPM2 finally outputs the results $s_{i+1}^-$ and $s_i^+$; it also computes $p_2$, the remaining output bits of PPM2. The implemented parity checker computes:
\begin{align*}
par_1 &= a_i^+ \oplus b_i^+ \oplus a_i^- \oplus c_i^- \\
par_2 &= a_i^+ \oplus b_i^+ \oplus a_i^- \oplus p_1 \oplus c_{i+1}^+ \\
par_3 &= c_i^+ \oplus b_i^- \oplus c_i^- \oplus s_{i+1}^- \oplus p_2 \oplus s_i^+
\end{align*}
The signal $po$ is then calculated:
\[ po = par_1 + par_2 + par_3 \]
Thus, if $po = 1$, one or more errors are detected.
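The behaviour of the checker can be illustrated on $par_1$ alone, since the bits grouped in $p_1$ and $p_2$ are not listed individually in the text. The Python sketch below is therefore a partial model under that restriction. The function names and the single injected fault (a bit flip on $c_i^-$) are assumptions.

```python
from itertools import product


def par1(a_plus, b_plus, a_minus, c_minus):
    """First parity signal: 0 when c- has the parity of the three PPM1 inputs."""
    return a_plus ^ b_plus ^ a_minus ^ c_minus


def error_flag(*pars):
    """po: logical OR of the parity signals; 1 means at least one check failed."""
    flag = 0
    for par in pars:
        flag |= par
    return flag


def simulate(a_plus, b_plus, a_minus, flip_c_minus=0):
    """Compute c- fault-free, optionally flip it, and return the resulting po."""
    c_minus = a_plus ^ b_plus ^ a_minus      # fault-free value of c-
    c_minus ^= flip_c_minus                  # injected single-bit fault
    return error_flag(par1(a_plus, b_plus, a_minus, c_minus))


fault_free = [simulate(*v) for v in product((0, 1), repeat=3)]
faulty = [simulate(*v, flip_c_minus=1) for v in product((0, 1), repeat=3)]
```

In this restricted model every fault-free case yields $po = 0$ and every single flip of $c_i^-$ yields $po = 1$; a double flip that restores the parity would go unnoticed.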
Figure 5. PPM1
Figure 6. PPM2
6 Results
6.1 Cost of the proposed solution
All the architectures presented in this paper have been synthesized in C35 CORELIB technology using the Design Vision tool. Table 3 shows the impact of our countermeasure on the BSA alone. The overhead is large; the next step will be to implement the FTBSA with optimized PPLGs: for example, instead of implementing F2Gs with inputs $(a = x, b = y, c = 0)$, only $x \oplus y$ will be implemented.
Overhead. The impact of the countermeasure implemented in components BSA1 and BSA2 is given in Table 4. The area overhead is acceptable (+38 %), but the main drawback of this countermeasure is the induced latency. As in the BSA case, some optimizations are possible, particularly the implementation of optimized PPLGs or pipelining.
Figure 7. FTBSA (PPM1 and PPM2 connected to a parity checker that outputs po)
Architecture         Area (µm²)    Latency (ns)
BSA-160 w/o EDC      134,440       1.39
BSA-160 with EDC     698,157       5.69
Overhead             ×5.19         ×4.09
Table 3. Area and time evaluation of BSA-160, with and without
error detection capabilities (EDC)
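The overhead factors quoted in this subsection can be recomputed from the synthesis figures. The short sketch below simply copies the area and latency values from Tables 3 and 4; the dictionary layout and function name are illustrative choices, not part of the design flow.

```python
# Synthesis figures copied from Tables 3 and 4: (area in µm², latency in ns).
RESULTS = {
    "BSA-160": {"without_edc": (134_440, 1.39), "with_edc": (698_157, 5.69)},
    "ECC-160": {"without_edc": (3_096_103, 8.38), "with_edc": (4_270_313, 19.96)},
}

def overhead(design):
    """Return (area factor, latency factor) of the protected design."""
    base_area, base_latency = RESULTS[design]["without_edc"]
    edc_area, edc_latency = RESULTS[design]["with_edc"]
    return edc_area / base_area, edc_latency / base_latency

for name in RESULTS:
    area_x, latency_x = overhead(name)
    print(f"{name}: area x{area_x:.2f}, latency x{latency_x:.2f}")
```

The relative cost is much lower at the level of the complete ECC unit than for the adder alone, since the unprotected parts of the unit dominate its area.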
6.2 Fault Coverage
The error detection capabilities of the proposed parity
check were studied using simulated fault injections into the
BSA. In these simulations, two random 164-bit elements
were generated, the addition was started, and a fault was
injected during the calculation. The corrupted data was then
checked against the parity bits. The faults (one or two faulty
bits at once) could appear anywhere in the adder (164 × 35
bits).
Some faults do not affect the result: they are filtered
during the computation (this can happen through an FRG,
for example). Others go undetected even with a single
faulty wire: this happens when a wire is used several times
through the PPLGs and one of the resulting outputs is
filtered. The ratio of undetected faults rises from about 5%
for one fault to about 12% for two faults (see Table 5). This
ratio should decrease for three faults thanks to the parity
properties. We will evaluate this case in future work.
Architecture         Area (µm²)    Latency (ns)
ECC-160 w/o EDC      3,096,103     8.38
ECC-160 with EDC     4,270,313     19.96
Overhead             ×1.38         ×2.38
Table 4. Area and time evaluation of ECC-160, with and without
error detection capabilities (EDC)
Number of faulty bits       1 bit    2 bits
Size of the search space    5740     32941860
Detected faults             4592     28590818
Undetected faults           328      3668352
Non-faulty computation      820      682690
Table 5. Evaluation of the detection capabilities
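The counts in Table 5 can be turned into rates with a few lines of Python. The sketch below only copies the table entries; it also checks that the three outcome classes add up to the search space, and that the search-space sizes match 164 × 35 fault locations and 5740 × 5739 ordered pairs of distinct locations (the latter is an observation about the numbers, not a statement made in the text).

```python
# Fault-injection counts copied from Table 5.
TABLE5 = {
    1: {"space": 5_740, "detected": 4_592,
        "undetected": 328, "non_faulty": 820},
    2: {"space": 32_941_860, "detected": 28_590_818,
        "undetected": 3_668_352, "non_faulty": 682_690},
}

# Consistency checks on the search-space sizes.
assert 164 * 35 == TABLE5[1]["space"]
assert 5_740 * 5_739 == TABLE5[2]["space"]  # ordered pairs (observation)

def rates(n_faults):
    """Fraction of the search space falling into each outcome class."""
    row = TABLE5[n_faults]
    # The three outcome classes should partition the search space.
    assert (row["detected"] + row["undetected"] + row["non_faulty"]
            == row["space"])
    return {k: row[k] / row["space"]
            for k in ("detected", "undetected", "non_faulty")}

for n in TABLE5:
    r = rates(n)
    print(f"{n} faulty bit(s): "
          + ", ".join(f"{k} {v:.1%}" for k, v in r.items()))
```

With these figures the undetected share is about 5.7% for one faulty bit and about 11.1% for two, in line with the approximate ratios given in the text.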
6.3 Additional Remarks
At present, the proposed countermeasure only computes
parity bits. The handling of this information is critical, since
it must not add single points of failure to the circuit. For
example, decisional tests must be avoided, because the flag bit
that controls such a test can also be faulted. It is advisable
to use the infective computation principle: if a fault is
detected, the result is changed in a way that is unpredictable
to an adversary (for example, [13] uses this technique). Thus,
the following technique will not be implemented:
If parity(in) = parity(out), then return(out),
else return(error)
It is preferable to implement instead:
return((parity(in) ⊕ parity(out)).r ⊕ out)
where r is a random value.
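A software sketch of this infective return is given below. It is only an illustration of the formula above, under the assumption that the two parities are available as single bits and that the result is an integer of a known bit width; the function name, the width parameter and the optional r argument (used here to make the behaviour reproducible) are hypothetical.

```python
import secrets

def infective_return(parity_in, parity_out, out, width, r=None):
    """Compute ((parity_in XOR parity_out) * r) XOR out.

    No branch depends on the error flag: the flag multiplies the random
    mask, so a correct computation is returned unchanged and a faulty one
    is XORed with r.
    """
    flag = parity_in ^ parity_out        # 0 when the parities agree
    if r is None:
        r = secrets.randbits(width)      # fresh random value r
    return (flag * r) ^ out

# Parities agree: the result is returned unchanged.
clean = infective_return(1, 1, 0xABCD, 16)

# Parities differ: the result is masked by r (fixed here for illustration).
infected = infective_return(0, 1, 0xABCD, 16, r=0x00FF)
```

Note that a random r equal to zero would leave a faulty result unmasked; a practical design would have to take this case into account.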
The second remark concerns the use of the borrow-save
representation, which makes SCFA very difficult to achieve
(provided that the state machine is secure).
Finally, the countermeasure described in this paper also
aims at making SEA more difficult. This can be done by
checking the parity during the ECC unit computations.
7 Conclusion
This paper presents a fault-tolerant elliptic curve
cryptoprocessor unit. Its fault resistance is provided by the
use of parity-preserving logic gates in the critical part of the
ECC unit, which is the borrow-save adder. This architecture is
protected with a high coverage rate against computational
safe-error attacks, and the sign-change fault attack seems
particularly difficult to perform since the borrow-save
representation is used. Moreover, the standard countermeasures
shown in subsection 3.2 can be implemented in addition to our
gate-level approach in order to protect against other possible
fault attacks on ECC.
The proposed countermeasure provides a high level of
fault detection, but at the expense of a significant latency
overhead. This overhead can be decreased by using optimized
parity-preserving logic gates and other reversible gates, such
as Toffoli and Peres gates (see [18] for more details).
Moreover, some flip-flops will be inserted in the critical path
of the circuit in order to shorten it. This new architecture is
under development. Future work will also consist in
generalizing the parity-preserving principle to other ECC unit
components, such as the control logic.
References
[1] H. Bar-El, H. Choukri, D. Naccache, M. Tunstall, and
C. Whelan. The Sorcerer’s Apprentice Guide to Fault
Attacks. IEEE Special Issue on Cryptography and
Security, 96(2):370–382, 2006.
[2] I. Biehl, B. Meyer, and V. Müller. Differential Fault
Attacks on Elliptic Curve Cryptosystems. In Advances
in Cryptology − CRYPTO, LNCS, volume 1880, pages
131–146, 2000.
[3] J. Blömer, M. Otto, and J.-P. Seifert. Sign Change
Fault Attacks on Elliptic Curve Cryptosystems. In
Proc. Fault Diagnosis and Tolerance in Cryptography
− FDTC, LNCS, volume 4236, pages 36–52, 2006.
[4] É. Brier and M. Joye. Weierstrass Elliptic Curves and
Side-Channel Attacks. In Proc. Public Key
Cryptography − PKC, LNCS, volume 2274, pages 335–345,
2002.
[5] B. Chevallier-Mames, M. Ciet, and M. Joye. Low-Cost
Solutions for Preventing Simple Side-Channel
Analysis: Side-Channel Atomicity. IEEE Transactions
on Computers, 53(6):760–768, 2004.
[6] M. Ciet and M. Joye. Elliptic Curve Cryptosystems in
the Presence of Permanent and Transient Faults.
Designs, Codes and Cryptography, 36(1):33–43, 2005.
[7] J.-S. Coron. Resistance against Differential Power
Analysis for Elliptic Curve Cryptosystem. In Proc.
Cryptographic Hardware and Embedded Systems −
CHES, LNCS, volume 1717, pages 292–302, 1999.
[8] I. Déchêne, É. Brier, and M. Joye. Unified Point
Addition Formulae for Elliptic Curve Cryptosystems. In
Embedded Cryptographic Hardware: Methodologies
and Architectures − Nova Science Publishers, pages
247–256, 2004.
[9] E. Fredkin and T. Toffoli. Conservative Logic.
International Journal of Theoretical Physics,
21(3–4):219–253, 1982.
[10] A. Guyot, Y. Herreros, and J.-M. Muller. JANUS,
an On-line Multiplier/Divider for Manipulating Large
Numbers. In Proc. IEEE Symposium on Computer
Arithmetic, pages 106–111, 1989.
[11] D. Hankerson, A. Menezes, and S. Vanstone. Guide to
Elliptic Curve Cryptography. Springer, 2004.
[12] M. Joye. Elliptic Curve Cryptosystems in the Presence
of Faults. In Securing Cyberspace: Applications and
Foundations of Cryptography and Computer Security,
2006.
[13] M. Joye, P. Manet, and J.-B. Rigaud. Strengthening
Hardware AES Implementations against Fault Attacks.
IET Information Security, 1(3):106–110, 2007.
[14] N. Koblitz. Elliptic Curve Cryptosystems. Mathematics
of Computation, 48(177):203–209, 1987.
[15] P. Kocher, J. Jaffe, and B. Jun. Differential Power
Analysis. In Advances in Cryptology − CRYPTO,
LNCS, volume 1666, pages 388–397, 1999.
[16] V. Miller. Use of Elliptic Curves in Cryptography. In
Advances in Cryptology − CRYPTO, LNCS, volume
218, pages 417–426, 1986.
[17] P. L. Montgomery. Modular Multiplication without
Trial Division. Mathematics of Computation,
44(170):519–521, 1985.
[18] B. Parhami. Fault-Tolerant Reversible Circuits. In
Proc. IEEE Asilomar Conference on Signals, Systems
and Computers − ACSSC, pages 1726–1729, 2006.
[19] A. Shamir. Method and Apparatus for Protecting
Public Key Schemes from Timing and Fault Attack.
United States Patent, (5991415), 1999.
[20] D. Stebila and N. Thériault. Unified Point Addition
Formulae and Side-Channel Attacks. In Proc.
Cryptographic Hardware and Embedded Systems − CHES,
LNCS, volume 4249, pages 354–368, 2006.
[21] N. Takagi and S. Yajima. Modular Multiplication
Hardware Algorithms with a Redundant Representation
and Their Application to RSA Cryptosystem.
IEEE Transactions on Computers, 41(7):887–891,
1992.
[22] S. Vanstone. ECC Holds Key to Next-Gen Cryptography.
Technical report, Certicom Corporation, 2004.
[23] S.-M. Yen and M. Joye. Checking before Output
May not be Enough against Fault-Based Cryptanalysis.
IEEE Transactions on Computers, 49(9):967–970,
2000.
[24] S.-M. Yen, S. Kim, S. Lim, and S. Moon. RSA
Speedup with Chinese Remainder Theorem Immune
against Hardware Fault Cryptanalysis. IEEE
Transactions on Computers, 52(4):461–472, 2003.
