Abstract-This paper presents arithmetic-level protections for ECC processors against some side channel attacks. The proposed protection is based on random recodings of the secret key in the double base number system (DBNS). DBNS is a highly redundant and sparse number system. Here, the high redundancy level of DBNS is used to randomly modify, on-the-fly, the digits $k_i$ of the scalar during the scalar multiplication [k]P. The proposed solution randomizes the number and the order of curve-level operations (point addition, doubling and tripling) during the computation of [k]P. Our random recoding method provides a [k]P computation time comparable to that of the best w-NAF recoding methods. However, standard w-NAF recodings are deterministic, whereas our recoding is randomized.
I. INTRODUCTION
Implementations of embedded cryptosystems have to face two types of attacks: theoretical attacks and physical attacks. The security against theoretical attacks is provided by the robustness of the mathematical problem behind the cryptosystem: the elliptic curve discrete logarithm problem (ECDLP) for elliptic curve cryptography (ECC) [1]. The security of the cryptosystem against physical attacks, which are considered very strong threats in embedded security applications, should be provided by cryptosystem designers. Recently, very efficient side channel attacks (SCA) have been proposed, such as power analysis [2] or electromagnetic radiation analysis.
Many countermeasures have been introduced to protect cryptosystems against some SCAs. Some methods have been proposed at the arithmetic level. For instance, addition chains [3], [4] make it possible to perform only one type of operation, point addition, during the scalar multiplication [k]P. Hence, addition chains prevent some attacks based on simple power analysis (SPA).
In this paper, we propose on-the-fly random recodings of the scalar digits $k_i$ using the double base number system (DBNS). The very high redundancy level of DBNS makes it possible to choose randomly among several representations of the key digits $k_i$. The number and the order of operations at the curve level (point addition, doubling and tripling) are then randomized. This may prevent some power or electromagnetic radiation attacks.
Notations and background on ECC and DBNS are provided in Sections II and III respectively. The true random number generator used by the countermeasure is presented in Section IV. The proposed countermeasure is described in Section V. FPGA implementation details and experimental results are reported in Sections VI and VII respectively.
II. ELLIPTIC CURVE CRYPTOGRAPHY (ECC)
A complete presentation of ECC systems is given in [1]. In this work, we use elliptic curves defined over the prime finite field $\mathbb{F}_p$. Arithmetic operations in $\mathbb{F}_p$ are performed modulo $p$, a large (160-600 bits) prime number. Extension to ECC systems defined over the finite field $\mathbb{F}_{2^m}$ is simple, see [5]. An elliptic curve $E$ over $\mathbb{F}_p$ is defined by the equation
$$E: y^2 = x^3 + ax + b,$$
where $(x, y) \in \mathbb{F}_p^2$ are the coordinates of a point P on E, and the curve parameters a and b, in $\mathbb{F}_p$, are such that $4a^3 + 27b^2 \neq 0$. The addition (ADD) of two points $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ of E gives a third point $R = (x_3, y_3)$ also on E (the sum point R = P + Q). The computation of the coordinates of R involves arithmetic operations in $\mathbb{F}_p$. See [1, p. 80] for a geometrical illustration. The addition P + P is called point doubling 2P (DBL), and is computed using a slightly different algorithm. Point tripling 3P (TPL) can also be defined in order to speed up the scalar multiplication process.
In ECC protocols, the main operation is the scalar multiplication [k]P = P + P + · · · + P (with k − 1 additions) where k is a given integer (the secret/public key) and P a point on the curve E. The scalar k is a t-bit integer, $k = \sum_{i=0}^{t-1} k_i 2^i$ with $k_i \in \{0, 1\}$.
There are several algorithms for scalar multiplication (see [1, sec. 3.3]). Figure 1 presents the simplest [k]P algorithm.
The theoretical security of ECDLP comes from the fact that given two points P and Q such that Q = [k]P, finding the integer k is not feasible in practice (for well chosen curves), see [1, sec. 4.1] for details and theoretical attacks.
Side channel attacks make it possible to extract secret information from running devices by measuring and analyzing some physical parameters (power consumption, electromagnetic radiation, computation time). For instance, recording and analyzing the instantaneous power consumption of a circuit may lead to very efficient power attacks [2]. Simple power analysis (SPA, see [2, chap. 5]) directly uses the fact that basic implementations of point addition and point doubling operations have different power traces. In the algorithm presented in Fig. 1, the point addition operation Q ← Q + P is only performed for the bits of the scalar that are equal to 1. Thus, the recorded power traces directly show where the ADD operations occur in the sequence of DBL operations. Algorithms such as the double-and-add method are prone to this type of attack.
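The leakage described above can be illustrated with a minimal Python sketch. It assumes a left-to-right double-and-add loop (Fig. 1 is not reproduced here, so this ordering is an assumption) and uses plain integers as a stand-in for curve points, so that [k]P is simply k·P. The names `double_and_add` and `recover_scalar` are hypothetical.

```python
def double_and_add(k, P):
    """Left-to-right double-and-add; returns ([k]P, sequence of operations).

    Integers stand in for curve points (assumption for illustration only).
    """
    Q = 0                          # neutral element, stand-in for the point at infinity
    trace = []
    for bit in bin(k)[2:]:         # scan the bits of k, most significant first
        Q = Q + Q                  # DBL, executed for every bit
        trace.append("DBL")
        if bit == "1":
            Q = Q + P              # ADD, executed only for bits equal to 1
            trace.append("ADD")
    return Q, trace


def recover_scalar(trace):
    """Rebuild k from the DBL/ADD sequence alone, as an SPA observer could."""
    bits = []
    for op in trace:
        if op == "DBL":
            bits.append("0")       # each DBL starts a new bit
        else:
            bits[-1] = "1"         # an ADD marks the current bit as 1
    return int("".join(bits), 2)


if __name__ == "__main__":
    Q, trace = double_and_add(15679, 1)
    print(Q, recover_scalar(trace))
```

In this sketch the operation sequence determines the scalar completely, which is the property a randomized recoding is meant to break.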
III. DOUBLE BASE NUMBER SYSTEM (DBNS)
The double base number system (DBNS) simultaneously uses two bases for representing numbers [6]. In this work, we use bases 2 and 3. The integer x is represented by:
$$x = \sum_{i=0}^{n-1} 2^{a_i} 3^{b_i},$$
where the exponents $a_i$ and $b_i$ are non-negative integers. The products $2^{a_i} 3^{b_i}$ are called {2, 3}-terms and are the "digits" of the DBNS representation. DBNS is a sparse (i.e. the number of {2, 3}-terms is small) and very redundant (i.e. some numbers have several representations) number system. For instance, the value 127 has 783 different DBNS representations (from [7]).
Most arithmetic operations (addition, multiplication, division) are very complex and costly in DBNS. However, a few operations are very cheap: multiplication by 2 and multiplication by 3. Those operations are very useful in ECC.
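The redundancy of DBNS can be explored with a short Python sketch. It assumes that "different representations" means sums of distinct {2, 3}-terms with no negative digits; this reading is an assumption, and the function names are hypothetical. The count is obtained with a standard subset-sum dynamic program.

```python
def terms_up_to(n):
    """All {2,3}-terms 2^a * 3^b that are <= n, in increasing order."""
    out = []
    p3 = 1
    while p3 <= n:
        t = p3
        while t <= n:
            out.append(t)
            t *= 2
        p3 *= 3
    return sorted(out)


def count_dbns(n):
    """Number of ways to write n as a sum of distinct {2,3}-terms
    (unsigned digits only; an assumption about how representations are counted)."""
    ways = [1] + [0] * n           # ways[s]: subsets of the terms seen so far summing to s
    for t in terms_up_to(n):
        for s in range(n, t - 1, -1):   # descending s so each term is used at most once
            ways[s] += ways[s - t]
    return ways[n]


if __name__ == "__main__":
    print(count_dbns(10))          # 9+1, 8+2, 6+4, 6+3+1, 4+3+2+1
    print(count_dbns(127))         # can be compared with the figure quoted from [7]
```

For n = 10 the five representations are easy to list by hand, which gives a quick sanity check of the counting convention.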
As suggested in [8], we recode k using DBNS chains to perform the scalar multiplication [k]P. As stated in [8], the scalar multiplication is efficiently computed using DBNS representations with decreasing exponents: $a_0 \ge a_1 \ge \cdots \ge a_{n-1}$ and $b_0 \ge b_1 \ge \cdots \ge b_{n-1}$. Here $a_0$ is the largest exponent of the powers of 2 and $a_{n-1}$ is the smallest one (the same notation applies to the exponents $b_i$ of the powers of 3).
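Let $t_i = 2^{a_i} 3^{b_i}$. Using a decreasing sequence of exponents, the value $k = \sum_{i=0}^{n-1} t_i$ can be factorized by $t_{n-1}$:
$$k = t_{n-1} \left( 1 + \sum_{i=0}^{n-2} t'_i \right), \quad \text{where } t'_i = 2^{a_i - a_{n-1}} 3^{b_i - b_{n-1}}.$$
This recursive factorization is similar to the Horner scheme.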
For instance, k = 15679 can be recoded in DBNS with and without decreasing exponents:
Without decreasing exponents, the computation cost of [15679]P is 3 · ADD + 9 · DBL + 10 · TPL operations while with decreasing exponents this cost reduces to only 3 · ADD + 6 · DBL + 5 · TPL operations.
In [8], a scalar multiplication algorithm dedicated to DBNS representations with decreasing exponents is proposed. This algorithm is given in Fig. 2. The cost of this algorithm is $(n-1) \cdot \mathrm{ADD} + a_0 \cdot \mathrm{DBL} + b_0 \cdot \mathrm{TPL}$. If the last exponents of the powers of 2 and 3 (respectively $a_{n-1}$ and $b_{n-1}$) are equal to 0, then line 6 must not be executed.
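The Horner-like evaluation can be sketched in Python as follows. This is a hedged reconstruction, not the algorithm of Fig. 2 itself: integers stand in for curve points, terms are given as hypothetical triples (s, a, b) meaning s·2^a·3^b with s = ±1, and the function name is an assumption. The representation 15679 = 2^6·3^5 + 2^2·3^3 + 2·3^2 + 1 used below was derived for this sketch because it matches the cost of 3 ADD, 6 DBL and 5 TPL quoted above; it is not taken from the source.

```python
def dbns_scalar_mult(terms, P):
    """Evaluate [k]P for k = sum(s * 2**a * 3**b) over a list of (s, a, b)
    with non-increasing exponents; returns ([k]P, operation counts).

    Integers stand in for curve points (assumption for illustration only).
    """
    ops = {"ADD": 0, "DBL": 0, "TPL": 0}
    s, a, b = terms[0]
    Q = s * P                              # start from the term with the largest exponents
    for s_next, a_next, b_next in terms[1:]:
        for _ in range(b - b_next):        # triplings: difference of exponents of 3
            Q = 3 * Q
            ops["TPL"] += 1
        for _ in range(a - a_next):        # doublings: difference of exponents of 2
            Q = 2 * Q
            ops["DBL"] += 1
        Q = Q + s_next * P                 # one signed point addition per extra term
        ops["ADD"] += 1
        a, b = a_next, b_next
    for _ in range(b):                     # final scaling by the smallest term;
        Q = 3 * Q                          # nothing is done when a = b = 0
        ops["TPL"] += 1
    for _ in range(a):
        Q = 2 * Q
        ops["DBL"] += 1
    return Q, ops


if __name__ == "__main__":
    chain = [(1, 6, 5), (1, 2, 3), (1, 1, 2), (1, 0, 0)]   # 15679, derived for this sketch
    print(dbns_scalar_mult(chain, 1))
```

Because the exponents only decrease along the chain, the total number of doublings and triplings equals the largest exponents, whatever the number of terms.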
IV. TRUE RANDOM NUMBER GENERATOR (TRNG)
TRNGs use a physical noise source, such as radioactive decay, metastability, thermal noise or jitter variations in free running oscillators, to produce a random signal. Here, we use in-house TRNGs with on-line randomness monitoring to provide the random bits required for the selection of DBNS recoding rules. This kind of TRNG uses oscillator sampling as its noise source (random jitter produced by one or several free running oscillators). See [9], [10] for details.
V. PROPOSED ARITHMETIC COUNTERMEASURE
During the scalar multiplication [k]P, the point-level operations (i.e. ADD and DBL) are scheduled in a deterministic way based on the digits of k. Standard NAF or w-NAF recodings produce deterministic schedules (see [1, p. 98]).
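To make the deterministic behavior concrete, here is a minimal Python sketch of the textbook NAF recoding (w-NAF with w = 2). The function name `naf` is hypothetical. For a given k the digit sequence, and therefore the ADD/DBL schedule derived from it, is always the same.

```python
def naf(k):
    """Non-adjacent form of k >= 0, least significant digit first.

    Digits are in {-1, 0, 1} and no two consecutive digits are nonzero.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)        # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d                 # k becomes divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


if __name__ == "__main__":
    print(naf(7))                  # 7 = 8 - 1
    print(naf(15679) == naf(15679))
```

Calling `naf` twice on the same scalar returns identical digits, in contrast with a recoding that draws random bits at each run.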
In this work, we use random representations of k in DBNS. This is possible because the DBNS representation is extremely redundant. The conversion from the standard binary representation to DBNS is done on-the-fly. Then random recodings are applied using the following identities:
Expansions of DBNS terms may introduce more randomness in the recoded scalar k representation, and consequently in the power traces. There should be a trade-off between the computation time and the amount of randomness introduced by expansions.
Furthermore, the previous identities can be combined with the signed-digit version of the DBNS representation. Thus there are two recoding rules for each identity. Figure 3 illustrates some examples of DBNS recodings for the scalar k = 140400.
In some cases, some rules cannot be applied due to neighboring {2, 3}-terms. For instance, if two consecutive terms, $2^{a_{i-1}} 3^{b_{i-1}}$ and $2^{a_i} 3^{b_i}$, share the same exponent of the power of 2, i.e. $a_i = a_{i-1}$, then reductions based on rule $R_1$ cannot be applied.
Even for expansions, some problems may occur due to the fact that the sequence of exponents must be kept decreasing. In order to always ensure that the rules produce only decreasing sequences of exponents, three consecutive {2, 3}-terms must be read for each rule application.
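The following Python sketch illustrates the principle of a random expansion that keeps the exponents decreasing. It is only an illustration under stated assumptions: the two identities used (3 = 2 + 1 and 4 = 3 + 1) are chosen for the example and are not claimed to be the rules $R_1$ to $R_8$; the validity check scans the whole chain rather than three consecutive terms; Python's `random.Random` stands in for the TRNG; all function names are hypothetical. Terms are triples (s, a, b) meaning s·2^a·3^b.

```python
import random


def value(terms):
    """Integer represented by a list of signed {2,3}-terms (s, a, b)."""
    return sum(s * 2**a * 3**b for s, a, b in terms)


def is_chain(terms):
    """True if both exponent sequences are non-increasing."""
    return all(a0 >= a1 and b0 >= b1
               for (_, a0, b0), (_, a1, b1) in zip(terms, terms[1:]))


def expansions(term):
    """Two-term replacements of one term (illustrative identities, assumed):
       3 = 2 + 1 : 2^a 3^b -> 2^(a+1) 3^(b-1) + 2^a 3^(b-1)
       4 = 3 + 1 : 2^a 3^b -> 2^(a-2) 3^(b+1) + 2^(a-2) 3^b
    """
    s, a, b = term
    out = []
    if b >= 1:
        out.append([(s, a + 1, b - 1), (s, a, b - 1)])
    if a >= 2:
        out.append([(s, a - 2, b + 1), (s, a - 2, b)])
    return out


def random_expand(terms, rng):
    """Apply one randomly chosen expansion that keeps the exponents decreasing;
    the input is returned unchanged if no expansion is allowed."""
    candidates = []
    for i, term in enumerate(terms):
        for repl in expansions(term):
            new = terms[:i] + repl + terms[i + 1:]
            if is_chain(new):          # discard expansions that break the ordering
                candidates.append(new)
    return rng.choice(candidates) if candidates else terms


if __name__ == "__main__":
    # Initial representation of k = 140400 quoted in the text
    k0 = [(1, 7, 7), (-1, 7, 6), (-1, 7, 5), (-1, 6, 5), (1, 4, 3)]
    rng = random.Random()              # software stand-in for the TRNG
    k1 = random_expand(k0, rng)
    print(k1, value(k1))
```

Each call changes the number of terms and the exponent differences, hence the schedule of ADD, DBL and TPL operations, while the represented value stays equal to k.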
Below we provide a complete example of a situation where some rules should be discarded.
We use the value of k presented in the example of Fig. 3, starting with the initial DBNS representation $k = 140400 = 2^7 3^7 - 2^7 3^6 - 2^7 3^5 - 2^6 3^5 + 2^4 3^3$ (which is recoding 7 in Fig. 3).
VI. IMPLEMENTATION
Figure 4 presents the architecture of the implemented recoding unit. This unit recodes the scalar k on-the-fly using randomly chosen DBNS representations. The scalar k, represented in DBNS, is stored in a specific register as a sequence of {2, 3}-terms $(s_i, a_i, b_i)$. Two specific blocks check which rules, $R_1$ to $R_8$, can be applied. One block is dedicated to reductions and the other one to expansions. Both blocks take three consecutive {2, 3}-terms as inputs. Using 3 random bits produced by a TRNG, one rule is randomly selected among the set of allowed rules and then applied. Once a rule is selected, the block checks whether applying the rule produces a valid sequence of exponents.
In case of a reduction, one of the original two terms is modified according to the selected rule (addition/subtraction on the exponents and sign adjustment), while the other term is deleted from the DBNS representation. In case of an expansion, the block checks that inserting a new term still leads to a decreasing sequence of exponents.
The proposed recoding unit has been described in VHDL and implemented on an XC5VLX50T FPGA using the ISE version 11.4 design environment from Xilinx. The results reported in Table I have been obtained with standard efforts for both synthesis and place-and-route steps. The area required by the recoding unit is very small compared to the total area of the complete ECC processor. The recoding unit can work at a clock frequency greater than 200 MHz, which is faster than the clock frequency of our arithmetic units in $\mathbb{F}_p$.
VII. VALIDATION AND COMPARISON
In order to mathematically validate our method, we compared its results to software results provided by PARI/GP, a computer algebra system with many powerful number theory functions. We also used the software environment developed for the mathematical validation to experimentally evaluate the computation time of the proposed random recoding method. All obtained results have been compared to standard recoding methods (NAF, w-NAF, see [1, p. 98]).
The experiments reported below have been carried out using the curve P-224 provided by NIST (FIPS 186-2), see [1, appendix A.2.1] for details. The prime field $\mathbb{F}_p$ is defined with $p = p_{224} = 2^{224} - 2^{96} + 1$. There exist many ways to represent the points of an elliptic curve. We use Jacobian coordinates (denoted $\mathcal{J}$) in order to speed up the [k]P computation (see [1, sec. 3.2.1]). We also use mixed coordinates, combining Jacobian coordinates with affine ones (denoted $\mathcal{A}$). Table II summarizes the cost of the main elliptic curve operations (see [7] for details). $\mathrm{ADD}_{\mathcal{J}+\mathcal{A}}$ is the cost of point addition in mixed coordinates ($\mathcal{J} + \mathcal{A} \to \mathcal{J}$). $x\text{-}\mathrm{DBL}_{\mathcal{J}}$ and $x\text{-}\mathrm{TPL}_{\mathcal{J}}$ are respectively the costs of x doubling operations and x tripling operations when Jacobian coordinates are used.
Table II: curve operation costs (from [7]).
The experimental setup was as follows. 1000 scalars k have been randomly chosen in the range
For each scalar k, the binary representation of k is converted to DBNS using the algorithm provided in [11]. Then 10 000 random recodings are applied to k in DBNS. For each recoded DBNS sequence, the number of curve-level operations (ADD, DBL, TPL) is recorded. The number of times expansion and reduction rules can be applied is also recorded. Table III presents the statistics obtained for 1000 scalars k and 10 000 random recodings for each scalar. The average value and the standard deviation of the number of times each rule can be applied are reported for reductions and for expansions.
Rules R 7 and R 8 are the most often used rules both for reduction and expansion. Note that given three successive {2, 3}-terms, in 6.7 % of cases no rule can be applied for those three terms.
The average length of the obtained DBNS expansion is 65.5 {2, 3}-terms with a standard deviation equal to 4.1. The maximal exponents of the powers of 2 and 3 are 112 and 71 respectively. Then the average cost for computing [k]P using the random DBNS recoding is 65.5 · ADD + 112 · DBL + 71 · TPL.
We compared the [k]P computation cost using our random DBNS recoding to the results based on standard methods. The corresponding results are reported in Table IV, where M and S are respectively the cost of one multiplication and one squaring in $\mathbb{F}_p$. We use the standard cost approximation S ≈ 0.8·M. The reported results show that our solution provides a [k]P computation time similar to that of the state-of-the-art 4-NAF recoding method, with the advantage of a randomized behavior.
VIII. CONCLUSION
In this paper, preliminary results for arithmetic-level protections against some side channel attacks are reported. The proposed countermeasure uses random and on-the-fly recodings of the secret key k in DBNS during the scalar multiplication [k]P. Starting from a scalar k, our method randomly provides different DBNS representations of the recoded scalar k. The number and the order of point-level operations (addition, doubling and tripling) are then randomized during the scalar multiplication [k]P. An on-the-fly recoding unit in DBNS has been implemented in FPGA. The cost of this unit is very small compared to the total cost of a complete ECC processor, both in clock frequency and in silicon area. Our countermeasure provides randomized scalar multiplications at a speed comparable to that of the best unprotected algorithms.
In the future, we plan to integrate this protection scheme into an ECC processor under development and experimentally evaluate its robustness using power attacks. We also plan to work on other recoding identities and rules such as $2^2 + 2 = 3^2 - 3$.
