Koblitz curves are a special class of elliptic curves on which scalar multiplication in elliptic curve cryptography is faster because of the Frobenius endomorphism. The double-base number system approach to Frobenius expansion has improved the performance of single scalar multiplication. In this paper, we present a new algorithm that generates a sparse and joint τ-adic representation for a pair of scalars, and we apply it to double scalar multiplication. The new algorithm is inspired by the double-base number system. We achieve a 12% improvement in speed over the state-of-the-art τ-adic joint sparse form.
Introduction
Let the Koblitz curve be
$$E_a : y^2 + xy = x^3 + ax^2 + 1,$$
where $a \in \{0, 1\}$, $E_a(\mathbb{F}_{2^m})$ is the group of points on $E_a$ over some extension field $\mathbb{F}_{2^m}$ and $n$ is the group order of $E_a(\mathbb{F}_{2^m})$. Any point $P = (x, y) \in E_a(\mathbb{F}_{2^m})$ has the following properties:
$$[\tau]P = (x^2, y^2) \quad \text{and} \quad -P = (x, x + y),$$
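where $\tau$ is called the Frobenius map over $E_a(\mathbb{F}_{2^m})$. Further, there exists a point at infinity denoted by $\mathcal{O}$ [1]. The point at infinity satisfies the properties
$$[\tau]\mathcal{O} = \mathcal{O} \quad \text{and} \quad -\mathcal{O} = \mathcal{O}.$$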
The Frobenius map of a point is computed by squaring its coordinates. Squaring is cheap and fast in hardware with both polynomial and normal basis representations [2]. In the digital signature verification of an elliptic curve cryptosystem, the double scalar multiplication $[k]P + [l]Q$ consumes most of the computational power, where $P, Q \in E_a(\mathbb{F}_{2^m})$ and $k, l \in [1, n-1]$. The scalars $k$ and $l$ are represented as τ-adic expansions so that Frobenius maps replace point doublings.
Let $\mathbb{Z}[\tau]$ be the ring of polynomials of the form $\sum_{i=0}^{l-1} u_i \tau^i$, where $l$ is the length of the polynomial, $u_i \in \{0, \pm 1\}$ for all $0 \le i < l-1$ and $u_{l-1} = \pm 1$. First, both scalars are reduced in $\mathbb{Z}[\tau]$ to complex numbers such that $l$ is minimal. The reduction in $\mathbb{Z}[\tau]$ is defined as $\rho \equiv k \bmod \delta$, where $k$ is an integer in $[1, n-1]$ and $\delta = (\tau^m - 1)/(\tau - 1)$. Next, the τ-adic non-adjacent form (τNAF) $\sum_{i=0}^{l-1} u_i \tau^i$, with an average Hamming weight of $l/3$, is computed [3]. Then the two τNAFs are used as inputs to generate the τ-adic joint sparse form (τJSF) of both scalars, with an average Hamming weight of $l/2$ [4].
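The τNAF step can be illustrated with a short sketch. It applies the usual digit rule for τ-adic NAF recoding to an element $r_0 + r_1\tau$ of $\mathbb{Z}[\tau]$, using the relation $\tau^2 = \mu\tau - 2$ with $\mu = (-1)^{1-a}$ for the division by τ. The names `tnaf` and `evaluate` are illustrative, and the reduction modulo δ is deliberately omitted, so this is a sketch under those assumptions and not the recoding circuit of [3].

```python
def tnaf(r0, r1, mu=1):
    """Return the tau-adic NAF digits of r0 + r1*tau, least significant first.

    mu = (-1)**(1 - a) selects the curve (mu = 1 for a = 1, mu = -1 for a = 0).
    The reduction modulo delta is deliberately left out of this sketch.
    """
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2 != 0:
            # Choose u in {1, -1} so that the next digit is forced to be zero.
            u = 2 - ((r0 - 2 * r1) % 4)
            r0 -= u
        else:
            u = 0
        digits.append(u)
        # Exact division by tau, using tau^2 = mu*tau - 2.
        half = r0 // 2
        r0, r1 = r1 + mu * half, -half
    return digits


def evaluate(digits, mu=1):
    """Evaluate sum(u_i * tau**i) by Horner's rule; returns (a, b) for a + b*tau."""
    a, b = 0, 0
    for u in reversed(digits):
        # (a + b*tau) * tau + u = (-2*b + u) + (a + mu*b)*tau
        a, b = -2 * b + u, a + mu * b
    return a, b
```

Calling `evaluate(tnaf(k, 0))` should return `(k, 0)` for any positive integer `k`, which gives a quick consistency check of the recoding.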
Dimitrov et al. introduced the two dimensional Frobenius expansion (TDFE)
$$k = \sum_{i=0}^{l-1} \sum_{j \ge 0} u_{i,j}\, \tau^i (\tau - 1)^j,$$
where $l$ is the length of the τ-adic expansion, $u_{i,j} \in \{0, \pm 1\}$ and $k$ is an integer, to compute single scalar multiplication [5]. Note that a TDFE can be reduced to a polynomial in $\mathbb{Z}[\tau]$ [5]. Our approach to double scalar representation is based on the TDFE. Our algorithm delivers a joint and sparse two dimensional representation that can be reduced to $\mathbb{Z}[\tau]$. It is used with Straus' idea [6] to compute the double scalar multiplication with a minimum number of point additions. Our new algorithm, the joint two dimensional Frobenius expansion (JTDFE), achieves a 15% improvement in speed over τJSF when implemented on a field programmable gate array (FPGA).
This paper is arranged as follows. The two dimensional Frobenius expansion is discussed in Section 2. The construction of the new algorithm, JTDFE, is discussed in Section 3. Section 4 explains the hardware implementation. We conclude the paper in Section 5.
Two Dimensional Frobenius Expansion
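The Frobenius map τ can be identified with the complex number $(\mu + \sqrt{-7})/2$, where $\mu = (-1)^{1-a}$. A complex number of the form $a + \tau b$, where $a, b \in \mathbb{Z}$, is called a Kleinian integer [7]. Next, we define the $\{\tau, \tau - 1\}$-Kleinian integer: it is a Kleinian integer of the form $\pm\tau^x(\tau - 1)^y$ with $x, y \in \mathbb{Z}^*$.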
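Arithmetic on Kleinian integers follows from the relation $\tau^2 = \mu\tau - 2$, which the value of τ above satisfies. The sketch below is a minimal illustration; the class name `Kleinian` and the helper `tau_term` are hypothetical names introduced here, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kleinian:
    """The Kleinian integer a + b*tau, with tau**2 = mu*tau - 2."""
    a: int
    b: int
    mu: int = 1  # mu = (-1)**(1 - a_curve): 1 for curve a = 1, -1 for curve a = 0

    def __add__(self, other):
        return Kleinian(self.a + other.a, self.b + other.b, self.mu)

    def __neg__(self):
        return Kleinian(-self.a, -self.b, self.mu)

    def __mul__(self, other):
        a, b, c, d = self.a, self.b, other.a, other.b
        # (a + b*tau)(c + d*tau) = ac + (ad + bc)*tau + bd*tau^2
        return Kleinian(a * c - 2 * b * d, a * d + b * c + self.mu * b * d, self.mu)

    def norm(self):
        """N(a + b*tau) = a^2 + mu*a*b + 2*b^2."""
        return self.a ** 2 + self.mu * self.a * self.b + 2 * self.b ** 2


def tau_term(sign, x, y, mu=1):
    """Return sign * tau**x * (tau - 1)**y as a Kleinian integer."""
    result = Kleinian(sign, 0, mu)
    for _ in range(x):
        result = result * Kleinian(0, 1, mu)
    for _ in range(y):
        result = result * Kleinian(-1, 1, mu)
    return result
```

Representing every $\{\tau, \tau - 1\}$-Kleinian integer as a pair of ordinary integers keeps the later search and evaluation steps in exact integer arithmetic.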
The two dimensional Frobenius expansion of an integer $k$ can be represented as in the following equation:
$$k = \sum_{i=1}^{d} s_i\, \tau^{a_i} (\tau - 1)^{b_i}, \qquad (4)$$
where $d$ is the length of the expansion, $s_i = \pm 1$ and $a_i, b_i \in \mathbb{Z}^*$. We rearrange (4) as follows:
$$k = \sum_{l=0}^{\max(b_i)} (\tau - 1)^l \sum_{i=1}^{\max(a_{i,l})} s_{i,l}\, \tau^{a_{i,l}}, \qquad (5)$$
where $\max(a_{i,l})$ is the maximum power of τ that is multiplied by $(\tau - 1)^l$ in (5). Algorithm 1 illustrates the routine to compute the single scalar multiplication $[k]P$ when the $\{\tau, \tau - 1\}$-expansion of $k$ is given. In order to simplify, we denote the terms corresponding to (τ − 1)
The multiplication [τ − 1]P costs one Frobenius mapping and a point addition.
Algorithm 1 Scalar Multiplication using Two Dimensional Frobenius Expansion
Require: Two dimensional Frobenius expansion of $k \in \mathbb{N}$ and a point $P \in E(\mathbb{F}_{2^m})$.
6: Q ← Q + S
7: end for
8: return (Q).
Therefore, $\max(b_i)$ should be limited when the two dimensional Frobenius expansion is computed.
An algorithm that returns a fairly short representation of $k$ as a sum of $\{\tau, \tau - 1\}$-Kleinian integers is essential. The greedy algorithm given in [5] is used to obtain such a representation. The greedy algorithm does not always return the canonical $\{\tau, \tau - 1\}$-expansion. Note that the complexity of the greedy algorithm depends crucially on the time spent finding the closest $\{\tau, \tau - 1\}$-Kleinian integer to the current Kleinian integer.
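The grouping by powers of $(\tau - 1)$ can be sketched on Kleinian integers instead of curve points. In the sketch below, `base` stands in for the point $(\tau - 1)^l P$, each multiplication by τ stands in for a Frobenius map, and the counter `updates` shows that the number of $[\tau - 1]$ multiplications equals $\max(b_i)$. The loop structure, the function names and the choice $\mu = 1$ are assumptions made for illustration.

```python
from collections import defaultdict

MU = 1  # assumed curve parameter mu = (-1)**(1 - a); MU = 1 corresponds to a = 1


def mul(p, q):
    """Multiply a + b*tau by c + d*tau using tau**2 = MU*tau - 2."""
    (a, b), (c, d) = p, q
    return (a * c - 2 * b * d, a * d + b * c + MU * b * d)


def add(p, q):
    return (p[0] + q[0], p[1] + q[1])


def evaluate_grouped(terms):
    """Evaluate sum(s * tau**a * (tau - 1)**b) grouped by the power b of (tau - 1).

    `terms` is a list of (s, a, b) triples. Returns the value as (x, y) = x + y*tau
    together with the number of multiplications by (tau - 1) that were needed.
    """
    groups = defaultdict(list)
    for s, a, b in terms:
        groups[b].append((s, a))
    base, total, updates = (1, 0), (0, 0), 0
    max_b = max(groups, default=0)
    for l in range(max_b + 1):
        for s, a in groups[l]:
            term = (s * base[0], s * base[1])
            for _ in range(a):           # a applications of the Frobenius map
                term = mul(term, (0, 1))
            total = add(total, term)     # stands in for one point addition
        if l < max_b:
            base = mul(base, (-1, 1))    # [tau - 1]: one Frobenius map and one addition
            updates += 1
    return total, updates
```

For the two terms $\tau^3 + \tau(\tau - 1)^2$, `evaluate_grouped([(1, 3, 0), (1, 1, 2)])` needs two updates of `base`, one for each power of $(\tau - 1)$ up to the maximum.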
In practice, the closest Kleinian integer in an intermediate step of the greedy algorithm is found by precomputing all Kleinian integers $\pm\tau^x(\tau - 1)^y$ for $x, y \in \mathbb{Z}^*$ below certain bounds and using an exhaustive search. Using the divide-and-conquer principle, Dimitrov et al. devised an effective method to generate an efficient two dimensional Frobenius expansion for computing single scalar multiplication [5]. Further, they conjectured the following:
Conjecture 1 (Length of Two Dimensional Frobenius Expansion) Every Kleinian integer $\xi = a + b\tau$ can be represented as the sum of at most $O(\log N(\xi) / \log\log N(\xi))$ $\{\tau, \tau - 1\}$-Kleinian integers, where $N(\xi) = (a + b\tau)(a + b\bar{\tau})$ is the norm of $\xi$.
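For reference, the norm can be expanded explicitly. This worked step assumes $\tau = (\mu + \sqrt{-7})/2$ with complex conjugate $\bar{\tau}$, so that $\tau + \bar{\tau} = \mu$ and $\tau\bar{\tau} = 2$:
$$N(\xi) = (a + b\tau)(a + b\bar{\tau}) = a^2 + ab(\tau + \bar{\tau}) + b^2\,\tau\bar{\tau} = a^2 + \mu ab + 2b^2.$$
For example, $N(\tau) = 2$ and, for $\mu = 1$, $N(\tau - 1) = 1 - 1 + 2 = 2$.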
Dimitrov et al. highlighted that the use of two complex bases increases the theoretical difficulty of proving Conjecture 1. Nevertheless, it led to the more important practical blocking algorithm given in [5].
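The greedy strategy described in this section can be sketched as follows. The sketch precomputes all $\pm\tau^x(\tau - 1)^y$ within assumed bounds `max_x` and `max_y` and repeatedly subtracts the candidate that leaves the remainder with the smallest norm. Measuring closeness by the norm of the difference, the default bounds and all function names are assumptions made for illustration; the procedure in [5] may differ in detail.

```python
MU = 1  # assumed curve parameter mu = (-1)**(1 - a)


def mul(p, q):
    """Multiply a + b*tau by c + d*tau using tau**2 = MU*tau - 2."""
    (a, b), (c, d) = p, q
    return (a * c - 2 * b * d, a * d + b * c + MU * b * d)


def norm(p):
    a, b = p
    return a * a + MU * a * b + 2 * b * b


def build_candidates(max_x, max_y):
    """Precompute every +-tau**x * (tau - 1)**y with 0 <= x <= max_x, 0 <= y <= max_y."""
    candidates = []
    y_part = (1, 0)
    for y in range(max_y + 1):
        value = y_part
        for x in range(max_x + 1):
            candidates.append((1, x, y, value))
            candidates.append((-1, x, y, (-value[0], -value[1])))
            value = mul(value, (0, 1))
        y_part = mul(y_part, (-1, 1))
    return candidates


def greedy_expansion(xi, max_x=16, max_y=4):
    """Return (sign, x, y) terms whose sum is xi = (a, b), meaning a + b*tau."""
    candidates = build_candidates(max_x, max_y)
    terms, rem = [], xi
    while rem != (0, 0):
        # Exhaustive search for the candidate closest to the current remainder.
        s, x, y, value = min(
            candidates, key=lambda c: norm((rem[0] - c[3][0], rem[1] - c[3][1]))
        )
        terms.append((s, x, y))
        rem = (rem[0] - value[0], rem[1] - value[1])
    return terms
```

Because the candidates include $\pm 1$ and $\pm\tau$, the norm of a nonzero remainder strictly decreases at every step, so the loop terminates. The result is a valid expansion but, as noted above, not necessarily the canonical one.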
Joint Blocking Algorithm
In this section, we present the construction of our new algorithm, which returns a joint and sparse representation for a pair of Kleinian integers $\eta_0, \eta_1 \in \mathbb{Z}[\tau]$. Algorithm 2 illustrates the procedure to compute a joint two dimensional Frobenius expansion in $\mathbb{Z}[\tau]$ for a pair of Kleinian integers.
A window size w is fixed prior to running the algorithm. Then the optimal joint two dimensional Frobenius expansions for all possible pairs of w-bit τ -adic representations are precomputed and given as another input.
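First, the two τ-adic expansions $\sum_{i=0}^{l-1} u_i \tau^i$ in $\mathbb{Z}[\tau]$ are computed, where $u_i \in \{0, 1\}$ and $l$ is the length of the longer expansion. Next, both τ-adic expansions are arranged as in (6) to generate joint columns.
$$\begin{pmatrix} \eta_{0,l-1} & \cdots & \eta_{0,1} & \eta_{0,0} \\ \eta_{1,l-1} & \cdots & \eta_{1,1} & \eta_{1,0} \end{pmatrix} \qquad (6)$$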
3: compute τ-adic expansion $\eta_i = \sum_{j=0}^{l-1} \eta_{i,j} \tau^j$, where $\eta_{i,j} \in \{0, 1\}$
4: end for
5: for i = 0 to ⌊l/w⌋ do
6: find optimal joint two dimensional Frobenius expansion of the pair of i-th blocks of $\eta_0$ and $\eta_1$
7: multiply each term by $\tau^{iw}$ and add to L0 or L1
8: i ← i + 1
9: end for
10: return (L0, L1).
The $i$-th joint column in (6) has two elements $\eta_{0,i}, \eta_{1,i} \in \{0, 1\}$ for all $i$ satisfying $0 \le i < l$. If one τ-adic expansion is shorter than the other, the coefficients of the higher powers of τ in the shorter expansion are set to zero.
The two τ-adic expansions are split into ⌈l/w⌉ blocks of $w$ bits each. The least significant $w$ bits of a τ-adic expansion are labelled block 0, while the most significant bits are labelled block ⌊l/w⌋.
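The blocking and shifting performed by Algorithm 2 can be sketched as follows. The function `trivial_lookup` is a hypothetical stand-in for the precomputed table of optimal joint expansions: it simply returns each block's own bits as terms, so the output is correct but not sparse. Terms are written as `(sign, x, y)`, meaning $\mathrm{sign} \cdot \tau^x(\tau - 1)^y$; all names are illustrative assumptions.

```python
def split_blocks(bits0, bits1, w):
    """Zero-pad two LSB-first digit lists to equal length and cut them into w-bit blocks."""
    length = max(len(bits0), len(bits1))
    bits0 = list(bits0) + [0] * (length - len(bits0))
    bits1 = list(bits1) + [0] * (length - len(bits1))
    return [(tuple(bits0[i:i + w]), tuple(bits1[i:i + w])) for i in range(0, length, w)]


def trivial_lookup(block0, block1):
    """Hypothetical stand-in for the precomputed table: each set bit becomes tau**x."""
    terms0 = [(1, x, 0) for x, bit in enumerate(block0) if bit]
    terms1 = [(1, x, 0) for x, bit in enumerate(block1) if bit]
    return terms0, terms1


def joint_blocking(bits0, bits1, w, lookup=trivial_lookup):
    """Return lists L0, L1 of (sign, x, y) terms meaning sign * tau**x * (tau - 1)**y."""
    L0, L1 = [], []
    for i, (block0, block1) in enumerate(split_blocks(bits0, bits1, w)):
        terms0, terms1 = lookup(block0, block1)
        # Multiplying a term by tau**(i*w) only raises its tau exponent.
        L0 += [(s, x + i * w, y) for s, x, y in terms0]
        L1 += [(s, x + i * w, y) for s, x, y in terms1]
    return L0, L1
```

Passing a real table as `lookup` changes only the terms returned per block; the shift by $\tau^{iw}$ stays the same because it touches only the exponent of τ.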
At step 6 of Algorithm 2, the $i$-th blocks of the $\eta_0$ and $\eta_1$ representations are used to find the $i$-th block of the optimal joint two dimensional Frobenius expansion. This is achieved with a look-up table. Once the $i$-th block of the optimal joint two dimensional Frobenius expansion is obtained, all of its elements are multiplied by $\tau^{iw}$ and appended to the relevant lists. We repeat this step ⌈l/w⌉ times to obtain the complete joint two dimensional Frobenius expansion. Example 3 illustrates the execution of Algorithm 2.
4 . Most significant bits pair have $\tau^3 + \tau(\tau - 1)^2$ and
of Algorithm 2). We multiply the last pair by $\tau^5$ to obtain the final results (Step 7 of Algorithm 2). The optimal joint two dimensional Frobenius expansion is given by:
The main advantage of a joint representation in double scalar multiplication in elliptic curve cryptography is that Straus' method can be applied with some precomputations to improve the efficiency [6] . Considering Example 3, if A = P + Q and S = P − Q are precomputed, we can compute [−5 − 18τ ]P + [−21 + 5τ ]Q with four point additions. We do not consider any additions due to (τ − 1) terms in the total cost.
Then we can apply Algorithm 1 to compute the final point with (7). A point negation on a Koblitz curve is only a field addition, and its cost can be neglected compared with that of a field multiplication. Fig. 1 illustrates the joint two dimensional Frobenius expansion of $\eta_0 = -5 - 18\tau$ and $\eta_1 = -21 + 5\tau$. The precomputed optimal joint representations for all possible pairs of Kleinian integers for a given window $w$ are generated by an exhaustive search. This computation needs to be done only once per curve and window size.
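An exhaustive search of this kind can be sketched for very small parameters. The sketch assumes that "optimal" means the fewest joint positions $(x, y)$, where each position carries a sign pair $(s_0, s_1) \ne (0, 0)$ and would cost one addition of $\pm P$, $\pm Q$, $\pm(P + Q)$ or $\pm(P - Q)$. That cost model, the bounds and all names are assumptions for illustration; the brute-force enumeration is not practical for the window sizes used in hardware.

```python
from itertools import combinations, product

MU = 1  # assumed curve parameter mu = (-1)**(1 - a)


def mul(p, q):
    """Multiply a + b*tau by c + d*tau using tau**2 = MU*tau - 2."""
    (a, b), (c, d) = p, q
    return (a * c - 2 * b * d, a * d + b * c + MU * b * d)


def power_grid(max_x, max_y):
    """Map (x, y) to tau**x * (tau - 1)**y for 0 <= x <= max_x, 0 <= y <= max_y."""
    grid, y_part = {}, (1, 0)
    for y in range(max_y + 1):
        value = y_part
        for x in range(max_x + 1):
            grid[(x, y)] = value
            value = mul(value, (0, 1))
        y_part = mul(y_part, (-1, 1))
    return grid


def block_value(bits):
    """Value of an LSB-first block of tau-adic digits as (a, b) = a + b*tau."""
    total, tau_i = (0, 0), (1, 0)
    for bit in bits:
        total = (total[0] + bit * tau_i[0], total[1] + bit * tau_i[1])
        tau_i = mul(tau_i, (0, 1))
    return total


def build_table(w, max_x, max_y, max_positions):
    """For every pair of w-bit blocks, find a joint expansion using the fewest positions."""
    grid = power_grid(max_x, max_y)
    sign_pairs = [s for s in product((-1, 0, 1), repeat=2) if s != (0, 0)]
    best = {}
    for count in range(max_positions + 1):  # fewest positions first
        for positions in combinations(grid, count):
            for signs in product(sign_pairs, repeat=count):
                v0, v1 = (0, 0), (0, 0)
                for pos, (s0, s1) in zip(positions, signs):
                    g = grid[pos]
                    v0 = (v0[0] + s0 * g[0], v0[1] + s0 * g[1])
                    v1 = (v1[0] + s1 * g[0], v1[1] + s1 * g[1])
                # Keep only the first (hence smallest) expansion found for each pair.
                best.setdefault((v0, v1), list(zip(positions, signs)))
    blocks = list(product((0, 1), repeat=w))
    return {(b0, b1): best[(block_value(b0), block_value(b1))]
            for b0 in blocks for b1 in blocks}
```

Each table entry is a list of `((x, y), (s0, s1))` pairs. A call such as `build_table(2, 2, 1, 2)` finishes quickly, but the number of combinations grows very fast with the window size and the exponent bounds, and `max_positions` must be at least `w` for every pair of blocks to be covered.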
Hardware Implementation and Results
The double scalar multiplication over $\mathbb{F}_{2^{163}}$ with the joint two dimensional Frobenius expansion is implemented in VHDL and placed and routed on a Xilinx XC4VLX200 FPGA using Xilinx Integrated Software Environment (ISE™) version 9.2i. The window size is set to $w = 5$ and the maximum exponent of $\tau - 1$ is limited to four. We describe the hardware architecture of our circuit in this section. The top-level design components and architecture of the circuit are illustrated in Fig. 2.
The circuit is partitioned into four high-level components: the main controller (MC), the binary arithmetic processor (BAP), the integer-to-τ converter (ITC) and the registers. In our implementation, the databus width is set to 163 bits. Other than u0 and u1, the inputs and outputs of the main controller are handshaking signals between the MC and the other units.
The binary arithmetic processor performs the four basic arithmetic operations needed for point multiplication: addition, squaring, multiplication and inversion. All arithmetic operations are performed in the normal basis representation. Addition and squaring each execute in a single clock cycle. Addition is an exclusive OR (XOR) operation and squaring is a cyclic shift operation in the normal basis representation [9]. Multiplication is a direct implementation of the Massey-Omura multiplier that computes four bits per clock cycle [10]. Therefore, a multiplication needs only ⌈163/4⌉ = 41 clock cycles. The inversion is performed with the Itoh-Tsujii architecture [11]. It needs nine multiplications to calculate the inverse of an element in $\mathbb{F}_{2^{163}}$. Once a multiplication or inversion has been performed, the binary arithmetic processor signals job completion by setting DONE of BAP to high.
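The operation counts quoted for the binary arithmetic processor can be checked with a few lines of arithmetic. The sketch below assumes the standard multiplication count for Itoh-Tsujii inversion, $\lfloor \log_2(m - 1) \rfloor + HW(m - 1) - 1$, and models a normal basis element as an $m$-bit integer. The function names and the left-rotation convention are assumptions for illustration, not a description of the VHDL design.

```python
import math


def multiplier_cycles(m, bits_per_cycle):
    """Clock cycles for a multiplier that produces bits_per_cycle result bits per cycle."""
    return math.ceil(m / bits_per_cycle)


def itoh_tsujii_multiplications(m):
    """Field multiplications in Itoh-Tsujii inversion: floor(log2(m-1)) + HW(m-1) - 1."""
    e = m - 1
    return (e.bit_length() - 1) + bin(e).count("1") - 1


def nb_add(x, y):
    """Addition in F_2^m: bitwise XOR of the coefficient vectors."""
    return x ^ y


def nb_square(x, m):
    """Squaring in a normal basis: cyclic shift of the m-bit coefficient vector.

    The shift direction depends on the bit-ordering convention; a left rotation
    is assumed here.
    """
    mask = (1 << m) - 1
    return ((x << 1) | (x >> (m - 1))) & mask
```

With $m = 163$, `multiplier_cycles(163, 4)` gives 41 and `itoh_tsujii_multiplications(163)` gives 9, which agrees with the figures quoted above.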
The primary job of the integer-to-τ converter is to compute the joint two dimensional Frobenius expansion from a pair of integers. Our implementation comprises two integer-to-τ converters with the lazy reduction introduced in [8], because it is faster and needs less area in hardware. The converter is slightly modified to generate nonnegative digits for the τ-adic expansion, whereas the circuit proposed in [8] generates the τNAF. Then a precomputed look-up-table is used to compute
The registers store point coordinates and intermediate values during point additions. Further, some registers can perform a cyclic shift operation to facilitate the Frobenius map on the points $P$, $Q$, $P + Q$ and $P - Q$.
The main controller is designed as a finite state machine that performs the double scalar multiplication together with the other three components. When INIT of MC is set to high, the main controller begins loading the integers k0, k1 and the coordinates xP, yP, xQ, yQ of the points P, Q into the registers. Then k0 and k1 are loaded into the integer-to-τ converter simultaneously. Once DONE of ITC is high, the joint two dimensional Frobenius expansion is read into the main controller and the double scalar multiplication starts. The main controller detects the end of the computation when TOP of ITC is high. The final results are stored in the registers and DONE of MC is set to high.
Affine coordinates are used in the precomputations. Mixed coordinates are used for the point additions involving $P$ and $Q$; each of these needs 8 field multiplications and 5 field squarings. For the other point additions, i.e. those involving $P \pm Q$, we use López-Dahab projective coordinates in this implementation [12]. These point additions require 13 field multiplications and 4 field squarings.
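A rough cycle estimate follows from these operation counts, assuming 41 clock cycles per field multiplication and one per squaring as stated for the binary arithmetic processor, and ignoring field additions, register transfers and control overhead:
$$C_{\text{mixed}} \approx 8 \cdot 41 + 5 \cdot 1 = 333, \qquad C_{\text{LD}} \approx 13 \cdot 41 + 4 \cdot 1 = 537.$$
These are back-of-the-envelope estimates under the stated assumptions, not measured values.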
The hardware implementations are carried out for double scalar multiplication based on both the τ-adic joint sparse form and the joint two dimensional Frobenius expansion. A window size $w = 5$ and a maximum $\tau - 1$ exponent $\max(b_i) = 4$ are selected for the joint two dimensional Frobenius expansion implementation. The curve $y^2 + xy = x^3 + x^2 + 1$ is considered over the binary field $\mathbb{F}_{2^{163}}$; the curve parameters and field are those specified by NIST. We implemented both circuits on a Xilinx XC4VLX200 FPGA and tested them with 10,000 pairs of integers $k, l$ and pairs of points $P, Q$. A summary of the experimental results is given in Table 1.
The results collected in Table 1 are based on the synthesis goals set for speed maximization. Time is read for each algorithm when the circuit is operating at its maximum frequency.
Note: Timings given for single scalar multiplication in [5] 
Conclusions
According to the experimental results presented in Table 1, the joint two dimensional Frobenius expansion outperforms the state-of-the-art τ-adic joint sparse form in speed for double scalar multiplication over Koblitz curves. The area of the new architecture is about 45% larger than that of the τJSF architecture. With larger window sizes and larger maximum $\tau - 1$ exponents, the speed of the double scalar multiplication can be improved further. However, when the window size is increased, the size of the look-up-table in the integer-to-τ conversion grows exponentially. We will investigate different combinations of $w$ and the maximum $\tau - 1$ exponent as future work.
