In this work, we present a new structure for multiplication in finite fields. This structure is based on a digit-level LFSR (Linear Feedback Shift Register) multiplier, in which the area of the digit-multipliers is reduced using the Karatsuba method. We compare our results for $\mathbb{F}_{3^{97}}$ with other works in the literature. Furthermore, we propose new formulas for multiplication in $\mathbb{F}_{3^{6\cdot 97}}$. These new formulas reduce the number of $\mathbb{F}_{3^{97}}$-multiplications from 18 to 15. The finite fields $\mathbb{F}_{3^{97}}$ and $\mathbb{F}_{3^{6\cdot 97}}$ are important fields for pairing-based cryptography.
INTRODUCTION
Efficient multiplication in finite fields is a central task in the implementation of most public-key cryptosystems. A great amount of work has been devoted to this topic (see [1] or [2] for a comprehensive list). The two types of finite fields most commonly used in cryptographic standards are binary fields of type $\mathbb{F}_{2^m}$ and prime fields of type $\mathbb{F}_p$, where $p$ is a prime (cf. [3]). Efforts to fit finite field arithmetic efficiently into commercial processors led to applications of medium-characteristic finite fields such as those reported in [4] and [5]. Medium-characteristic finite fields are fields of type $\mathbb{F}_{p^m}$, where $p$ is a prime that is slightly smaller than the word size of the processor and has a special form that simplifies the modular reduction. Mersenne primes are an example of primes used in this context. The security parameter is given by the length of the binary representation of the field elements, and the extension degree $m$ is selected accordingly. Due to security considerations, the extension degree for fields of characteristic 2 or medium characteristic is usually chosen to be prime.
With the introduction of the method of Duursma and Lee for the computation of the Tate pairing (cf. [6]), fields of type $\mathbb{F}_{3^m}$ for $m$ prime have attracted special attention. Computing the Tate pairing on elliptic curves defined over $\mathbb{F}_{3^m}$ requires computations both in $\mathbb{F}_{3^m}$ and in $\mathbb{F}_{3^{6m}}$. In [7] calculations are implemented using the tower of extensions
$$\mathbb{F}_{3^{97}} \subset \mathbb{F}_{3^{2\cdot 97}} \subset \mathbb{F}_{3^{6\cdot 97}},$$
and the inherent parallelism of multiplication in extension fields is used to accelerate the operations. Hardware designs, and especially FPGA-based ones, are suitable platforms for parallel implementation of algorithms. In that work multiplications in the first and the second field extensions are computed via 3 and 6 multiplications in the respective ground fields, requiring 18 multiplications in $\mathbb{F}_{3^{97}}$ in total.
In our current work, which is mostly based on [7], we use asymptotically fast methods to improve the performance of multiplication in $\mathbb{F}_{3^{97}}$ on the one hand, and propose new multiplication formulas to speed up multiplication in $\mathbb{F}_{3^{6\cdot 97}}$ on the other. Using the new formulas, multiplication in $\mathbb{F}_{3^{6\cdot 97}}$ is done with only 15 multiplications instead of 18. We use the same extension tower, with 3 multiplications in $\mathbb{F}_{3^{97}}$ to multiply elements of $\mathbb{F}_{3^{2\cdot 97}}$, but only 5 multiplications in $\mathbb{F}_{3^{2\cdot 97}}$ to multiply elements of $\mathbb{F}_{3^{6\cdot 97}}$. Our proposed method requires slightly more additions than the Karatsuba method. Notice, however, that a multiplication in $\mathbb{F}_{3^{97}}$ requires many more resources than an addition, so the overall resource consumption is reduced.
A considerable amount of work has been done on hardware-based multiplication in finite fields, especially those of characteristic 3. The authors of [8] propose a least significant digit-element (LSDE) multiplier for $\mathbb{F}_{3^m}$. This multiplier divides the input polynomials into digits of length $D$. Whereas the digits of one input polynomial are processed in parallel, the digits of the other input polynomial are handled serially. Then the result is reduced modulo the irreducible polynomial. The same structure has also been used in [7] for multiplication in $\mathbb{F}_{3^{97}}$. Our multiplier, on the other hand, is based on the digit-serial implementation of the LFSR (Linear Feedback Shift Register) multiplier, which is widely used in the literature (see [9] or [10]) and performs the modular reduction during the multiplication. The first contribution of our current work is the application of the Karatsuba method inside the digit-multipliers, which results in a smaller area for these multipliers. Our results demonstrate the efficiency of this design compared to other works. The second contribution is the application of a method using only 5 multiplications in $\mathbb{F}_{3^{2\cdot 97}}$ for multiplication in $\mathbb{F}_{3^{6\cdot 97}}$. This results in an area saving of almost 17 % compared to the Karatsuba method, which is used in [7].
Our work is organized as follows. Section 2 is devoted to the general structure of our multiplier for $\mathbb{F}_{3^{97}}$. In Section 3 we describe some improvements on the traditional LFSR multiplier and compare our results with other works from the literature. In Section 4 the new formulas for $\mathbb{F}_{3^{6\cdot 97}}$ together with suggestions for a new multiplier are presented, and Section 5 concludes the paper.
MULTIPLICATION IN $\mathbb{F}_{3^{97}}$
The finite field $\mathbb{F}_{3^{97}}$ can be represented as a vector space over $\mathbb{F}_3$. In this representation, elements of $\mathbb{F}_{3^{97}}$ are vectors of length 97 over $\mathbb{F}_3$. Addition of elements is computed by adding the corresponding vectors. Multiplication is more complicated and depends on the basis selected for $\mathbb{F}_{3^{97}}$. Two bases are commonly used in the literature, namely polynomial and normal bases. A polynomial basis is generally more suitable for multiplication, hence we choose this basis in our work.
In the polynomial basis, elements of $\mathbb{F}_{3^{97}}$ are represented as polynomials of degree at most 96 over $\mathbb{F}_3$. Two elements are added by adding the corresponding polynomials. Multiplication is based on polynomial multiplication followed by reduction modulo the irreducible polynomial that generates the polynomial basis. In our case the irreducible polynomial, which we denote by $f(x)$, is
In the next sections we show the details of polynomial arithmetic in our designs.
Arithmetic in $\mathbb{F}_3$
The element $a \in \mathbb{F}_3$ is represented using the vector $(a_1, a_0)$ of two bits such that the elements 0, 1, and 2 are $(0, 0)$, $(0, 1)$, and $(1, 0)$, respectively. In this representation the operations addition, multiplication, and negation (multiplication by 2) are done, as shown in [11], using Equations 2, 3, and 4, respectively.
The implementation of Equations 2 and 3 is done using 2 LUTs in the FPGA, whereas (4) is only a permutation of bits.
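The two-bit encoding described above can be illustrated with a short software sketch. The Boolean expressions below are one possible realization that we derived from the truth tables of $\mathbb{F}_3$; they are an assumption for illustration and are not claimed to coincide with Equations 2 and 3 of the paper. The function names are hypothetical.

```python
# Encoding taken from the text, written as (a1, a0):
# 0 -> (0, 0), 1 -> (0, 1), 2 -> (1, 0). The pattern (1, 1) is unused.
ENCODE = {0: (0, 0), 1: (0, 1), 2: (1, 0)}
DECODE = {bits: value for value, bits in ENCODE.items()}

def f3_add(a, b):
    """Add two encoded F_3 elements with bitwise operations only."""
    a1, a0 = a
    b1, b0 = b
    t = (a1 | b0) ^ (a0 | b1)
    c1 = (a0 | b0) ^ t
    c0 = (a1 | b1) ^ t
    return (c1, c0)

def f3_mul(a, b):
    """Multiply two encoded F_3 elements with bitwise operations only."""
    a1, a0 = a
    b1, b0 = b
    c1 = (a0 & b1) | (a1 & b0)  # result is 2 for 1*2 and 2*1
    c0 = (a0 & b0) | (a1 & b1)  # result is 1 for 1*1 and 2*2
    return (c1, c0)

def f3_neg(a):
    """Negate an encoded F_3 element: swapping the bits exchanges 1 and 2."""
    a1, a0 = a
    return (a0, a1)
```

Each output bit of `f3_add` and `f3_mul` depends on four input bits, which is consistent with the remark that one output bit fits a single 4-input LUT, while `f3_neg` only permutes the bits.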
Structure of the multiplier for $\mathbb{F}_{3^{97}}$
The structure of a digit-level LFSR multiplier is shown in Figure 1. In this figure, the two input polynomials $a(x)$ and $b(x)$ are loaded into registers A and B, respectively, and divided into digits of length $D$. In each clock cycle the most significant digit of B is multiplied by the words of A, through digit-multipliers denoted by M, and added to the content of the register in the feedback circuit. Inputs to the digit-multipliers are two polynomials of degree $D-1$, so each output is a polynomial of degree $2D-2$. The powers $x^{D}$ to $x^{2D-2}$ of each multiplier must be added to the powers $x^{0}$ to $x^{D-2}$ of the next multiplier. This is done by the overlap circuit. In each clock cycle the register B and the LFSR are shifted by $D$ bits to the right. Shifting the LFSR to the right is equivalent to multiplication by $x^{D}$, which generates the powers $x^{97}$ to $x^{96+D}$. These powers are reduced modulo $f(x)$ of (1) using the feedback circuit. The name Linear Feedback Shift Register derives from these feedback structures. For more information about the digit-level LFSR multiplier and its costs for classical methods see [10]. In the next section we discuss our improvements to the traditional LFSR multiplier.
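The data flow of the digit-level multiplier can be mimicked in software. The following is a minimal behavioural sketch, not a description of the hardware: in each iteration the accumulator is multiplied by $x^D$, the product of $a(x)$ with the current most significant digit of $b(x)$ is added, and the result is reduced modulo $f(x)$. Polynomials are coefficient lists over $\mathbb{F}_3$ with the lowest degree first; all function names are hypothetical, and any monic modulus can be supplied.

```python
P = 3  # characteristic of the coefficient field F_3

def reduce_mod(c, f):
    """Reduce coefficient list c modulo the monic polynomial f."""
    m = len(f) - 1
    c = list(c) + [0] * max(0, m - len(c))
    for k in range(len(c) - 1, m - 1, -1):
        lead = c[k]
        if lead:
            for i in range(m + 1):
                c[k - m + i] = (c[k - m + i] - lead * f[i]) % P
    return c[:m]

def poly_mul(a, b):
    """Schoolbook product of two coefficient lists over F_3."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % P
    return c

def schoolbook_mulmod(a, b, f):
    """Reference result: full product followed by one reduction."""
    return reduce_mod(poly_mul(a, b), f)

def digit_serial_mulmod(a, b, f, D):
    """Most-significant-digit-first multiplication with interleaved reduction."""
    m = len(f) - 1
    num_digits = -(-m // D)                          # ceil(m / D)
    a = list(a) + [0] * (m - len(a))                 # pad a to length m
    b = list(b) + [0] * (num_digits * D - len(b))    # pad b with leading zeros
    acc = [0] * m
    for d in range(num_digits - 1, -1, -1):
        digit = b[d * D:(d + 1) * D]
        shifted = [0] * D + acc                      # acc * x^D, length m + D
        partial = poly_mul(a, digit) + [0]           # a * digit, length m + D
        total = [(s + t) % P for s, t in zip(shifted, partial)]
        acc = reduce_mod(total, f)                   # role of the feedback circuit
    return acc
```

For example, with the placeholder modulus `f = [1, 2, 0, 0, 0, 0, 0, 1]` (that is, $x^7 + 2x + 1$, chosen arbitrarily for illustration and unrelated to the polynomial $f(x)$ of degree 97 used in the paper), `digit_serial_mulmod(a, b, f, D)` agrees with `schoolbook_mulmod(a, b, f)` for every digit size `D`.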
THE KARATSUBA METHOD
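In this section we use asymptotically fast methods to reduce the size of the digit-multipliers. We use an approach similar to [12] and combine the classical and the Karatsuba methods to build small digit-multipliers. Two linear polynomials $a_1 x + a_0$ and $b_1 x + b_0$ are multiplied classically using the formula
$$(a_1 x + a_0)(b_1 x + b_0) = a_1 b_1 x^2 + (a_1 b_0 + a_0 b_1)\,x + a_0 b_0 \tag{5}$$
with 4 multiplications and 1 addition. The same product can also be computed via
$$(a_1 x + a_0)(b_1 x + b_0) = a_1 b_1 x^2 + \bigl((a_1 + a_0)(b_1 + b_0) - a_1 b_1 - a_0 b_0\bigr)\,x + a_0 b_0. \tag{6}$$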
The new formula is called the Karatsuba method (see [13]). It requires 7 operations instead of 5, but only 3 multiplications, and uses fewer resources when the coefficients $a_0, a_1, b_0, b_1$ are replaced by polynomials. The classical method for multiplication of two polynomials of degree $n-1$ requires $O(n^2)$ operations. Recursive application of the Karatsuba method reduces the cost of a multiplication to $O(n^{1.59})$ operations. We denote the classical multiplication of two polynomials of degree $n-1$ by $C_n$ and the method of (6) by $K$. The methods $C_n$ for $n \in \mathbb{N}$, and $K$ constitute a set of polynomial multiplication methods. We call this set $T$. Using the elements of $T$, we define the set of recursive multiplication methods $T^*$, which contains the elements of $T$ and all recursive combinations of elements of $T^*$. The recursive combination of the two methods $M$ and $N$, for polynomials of lengths $m$ and $n$, respectively, is the multiplication method $MN$ for polynomials of length $mn$. Let
$$a(x) = a_{mn-1} x^{mn-1} + \cdots + a_0 \quad\text{and}\quad b(x) = b_{mn-1} x^{mn-1} + \cdots + b_0$$
be given polynomials. In order to apply $MN$, we write these polynomials as
$$a(x) = \sum_{i=0}^{m-1} A_i(x)\, x^{ni}, \qquad b(x) = \sum_{i=0}^{m-1} B_i(x)\, x^{ni},$$
where the $A_i$ and $B_i$ are polynomials of degree $n-1$. If the polynomials $A_i$ and $B_i$ were coefficients, the two polynomials $a(x)$ and $b(x)$ would be multiplied using $M$. The product using the method $MN$ consists of several multiplications of the polynomials $A_i$ and $B_i$, which are performed using $N$. We implement the digit-multipliers using the elements of $T^*$ to reduce their size. Our approach is similar to [12].
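The recursive combination of methods can be sketched in software as follows. This is an illustrative model over $\mathbb{F}_3$ with hypothetical function names, not the gate-level construction used for the digit-multipliers: `karatsuba_step` plays the role of $K$ applied to two halves, and the `inner` argument is the method used for the three half-size products, so that `kc4` corresponds to $KC_4$ and `kkc2` to $KKC_2$, both for polynomials of length 8.

```python
P = 3  # characteristic of the coefficient field F_3

def classical(a, b):
    """C_n: schoolbook product of two coefficient lists (lowest degree first)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % P
    return c

def karatsuba_step(a, b, inner):
    """K combined with `inner`: inputs of equal even length 2n are split into
    halves, and the three half-size products are computed with `inner`."""
    n = len(a) // 2
    a0, a1 = a[:n], a[n:]
    b0, b1 = b[:n], b[n:]
    low = inner(a0, b0)                               # A0 * B0
    high = inner(a1, b1)                              # A1 * B1
    mid = inner([(x + y) % P for x, y in zip(a0, a1)],
                [(x + y) % P for x, y in zip(b0, b1)])
    mid = [(t - l - h) % P for t, l, h in zip(mid, low, high)]
    c = [0] * (4 * n - 1)
    for i in range(2 * n - 1):
        c[i] = (c[i] + low[i]) % P
        c[i + n] = (c[i + n] + mid[i]) % P
        c[i + 2 * n] = (c[i + 2 * n] + high[i]) % P
    return c

def kc4(a, b):
    """K C_4: one Karatsuba step over classical length-4 products."""
    return karatsuba_step(a, b, classical)

def kkc2(a, b):
    """K K C_2: two Karatsuba levels over classical length-2 products."""
    return karatsuba_step(a, b, lambda x, y: karatsuba_step(x, y, classical))
```

In this model `kc4` uses $3 \cdot 16 = 48$ coefficient multiplications and `kkc2` uses $9 \cdot 4 = 36$, compared with 64 for `classical` on inputs of length 8.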
In Table 1 we show the results of implementing $\mathbb{F}_{3^{97}}$ multipliers on a XC2VP20-6FF896 FPGA. In this table the first column is the digit-size $D$. In a digit-level multiplier with digit-size $D$, inputs are preceded by enough zeros so that their length becomes a multiple of $D$. Hence, it is natural to choose a value of $D$ such that the difference $\lceil m/D \rceil - m/D$ is as small as possible. Our values for $D$ are selected using this criterion and hence differ from the standard values of other works, e.g., multiples of 4 (see [8] and [7]). The second column shows the recursive combination of the Karatsuba and the classical methods that is applied. It is important to notice that the method $KC_4$, which we used for polynomials of digit-size 7, applies to polynomials of length 8. Therefore, we add a zero in front of the polynomial and then remove all the gates containing an operation with the coefficients that are known to be zero. Hence, this multiplier requires fewer resources than a complete $KC_4$. This point distinguishes our approach from that in [12]. The third, fourth, and fifth columns contain the number of slices, the maximum working frequency of the multiplier, and the required clock cycles for our designs. A comparison of our results with those in [7] is shown in Figure 2. Different digit-sizes result in different circuits, which we compare with respect to both time and area. Area is the number of slices, whereas time is the product of the number of clock cycles and the minimum period. Both designs are on the same technology, but the speed grade of the FPGA in [7] is not available. As shown, our designs have better area-time performance. These improvements result, on the one hand, from using asymptotically faster methods and, on the other hand, from integrating the modular reduction stage into the LFSR. When a small digit-serial multiplier is used, even the small size of a modular reduction must be taken into account.
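The digit-size criterion can be made concrete with a small calculation. The sketch below uses a hypothetical helper name and computes, for $m = 97$, the number of digits $\lceil m/D \rceil$ and the number of padding zeros $D\lceil m/D \rceil - m$; dividing the padding by $D$ gives the difference $\lceil m/D \rceil - m/D$ mentioned in the text. Apart from $D = 7$, the digit-sizes used in the comments are illustrative and are not taken from Table 1.

```python
def digit_overhead(m, D):
    """Return (number of digits, number of padding zeros) for digit-size D."""
    digits = -(-m // D)          # ceil(m / D) in integer arithmetic
    padding = digits * D - m     # zeros prepended to reach a multiple of D
    return digits, padding

# D = 7 wastes a single coefficient, whereas the multiples of 4 below waste more.
for D in (4, 7, 8):
    digits, padding = digit_overhead(97, D)
    print(f"D={D}: {digits} digits, {padding} padding zeros")
```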
MULTIPLICATION IN $\mathbb{F}_{3^{6\cdot 97}}$
Multiplication in $\mathbb{F}_{3^{6\cdot 97}}$ is done in the same way as in [7], by using a tower of extensions of degrees 2 and 3, i.e.,
$$\mathbb{F}_{3^{97}} \subset \mathbb{F}_{3^{2\cdot 97}} \subset \mathbb{F}_{3^{6\cdot 97}}.$$
The elements of $\mathbb{F}_{3^{2\cdot 97}}$ are polynomials of degree at most 1 in $s$ over $\mathbb{F}_{3^{97}}$, for $s$ a root of $y^2 + 1$ in $\mathbb{F}_{3^{2\cdot 97}}$. The polynomials are multiplied by applying (6) and then reduced modulo $s^2 + 1$. The elements of $\mathbb{F}_{3^{6\cdot 97}}$ are polynomials of degree at most 2 in $r$ over $\mathbb{F}_{3^{2\cdot 97}}$, for $r$ a root of $z^3 - z - 1$ in $\mathbb{F}_{3^{6\cdot 97}}$. They are multiplied using the formulas (7) and then reduced modulo $r^3 - r - 1$.

Fig. 2. Time vs. area comparisons of our multipliers with those in [7].
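The multiplication in the quadratic extension described above, that is, the Karatsuba formula followed by reduction modulo $s^2 + 1$, can be written out as a short sketch. To keep the example self-contained, the ground field is assumed to be $\mathbb{F}_3$ itself (integers modulo 3) as a stand-in for $\mathbb{F}_{3^{97}}$; the polynomial $y^2 + 1$ is irreducible over both. The function names are hypothetical.

```python
P = 3  # stand-in ground field F_3 replacing F_{3^97} in this sketch

def mul_quadratic(a, b):
    """Multiply a0 + a1*s and b0 + b1*s with s^2 = -1 using 3 ground-field products."""
    a0, a1 = a
    b0, b1 = b
    p0 = (a0 * b0) % P
    p1 = (a1 * b1) % P
    p2 = ((a0 + a1) * (b0 + b1)) % P
    c0 = (p0 - p1) % P           # constant term: a0*b0 + a1*b1*s^2, with s^2 = -1
    c1 = (p2 - p0 - p1) % P      # coefficient of s: a0*b1 + a1*b0
    return (c0, c1)

def mul_quadratic_classical(a, b):
    """Reference version with 4 ground-field products."""
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 - a1 * b1) % P, (a0 * b1 + a1 * b0) % P)
```

As a quick check, `mul_quadratic((0, 1), (0, 1))` returns `(2, 0)`, which encodes $s^2 = -1$.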
Combining (6) and (7), we obtain the following theorem.
Theorem 1 Let α, β ∈ F 3 6·97 be given as: 
All of the $\mathbb{F}_{3^{97}}$-multiplications can be done in parallel. This property allows designers to implement as many of these multipliers as their time-area constraints permit. On the other hand, these multipliers are also used for other computations, such as point addition and doubling on elliptic curves for pairing-based cryptography. Reading and writing intermediate values from and to register files in such applications is time-consuming. To solve this problem we propose a new multiplier, which is shown in Figure 3. The new multiplier consists of three pipeline stages, namely input, multiplication, and output. During each multiplication in $\mathbb{F}_{3^{97}}$, the input stage loads the coefficients $a_i$ and $b_i$ from memory for the next multiplication and computes the linear combinations in (8) that are needed for the products $P_i$. At the same time the output stage adds the last computed product $P_i$ to memory variables according to (9). In this structure the hatched multiplexers can select either one of their inputs or the sum of the inputs. In this way all possible multiples of the input polynomials can be selected and added to the accumulators.
CONCLUSION
In this paper, we proposed a new structure for multiplication in $\mathbb{F}_{3^{97}}$. This structure is based on digit-level LFSR multipliers, where the area of the digit-multipliers is reduced using the Karatsuba method. Another advantage of this approach is that the modular reduction is performed during the multiplication. Our synthesis results showed a performance improvement compared to other designs in the literature. We have also presented new formulas for multiplication in $\mathbb{F}_{3^{6\cdot 97}}$ using only 15 multiplications in $\mathbb{F}_{3^{97}}$. When the Karatsuba method is applied, 18 multiplications are required. Furthermore, we have introduced a feasible hardware structure for realizing our proposed formulas. Our formulas are for the case that $\mathbb{F}_{3^{6\cdot 97}}$ is constructed from $\mathbb{F}_{3^{2\cdot 97}}$ using the irreducible polynomial $z^3 - z - 1$. If the finite field is constructed using $z^3 - z + 1$ instead, the formulas require slight modifications.
A. MULTIPLICATION FORMULAS FOR $\mathbb{F}_{3^{6\cdot 97}}$
Let $\alpha, \beta \in \mathbb{F}_{3^{6\cdot 97}}$ be given as:
$$\begin{aligned}
\alpha &= a_0 + a_1 s + a_2 r + a_3 r s + a_4 r^2 + a_5 r^2 s,\\
\beta &= b_0 + b_1 s + b_2 r + b_3 r s + b_4 r^2 + b_5 r^2 s,
\end{aligned}$$
where $a_0, \cdots, b_5 \in \mathbb{F}_{3^{97}}$ and $s \in \mathbb{F}_{3^{2\cdot 97}}$, $r \in \mathbb{F}_{3^{6\cdot 97}}$ are roots of $y^2 + 1$ and $z^3 - z - 1$, respectively. Let their product $\gamma = \alpha\beta \in \mathbb{F}_{3^{6\cdot 97}}$ be
$$\gamma = c_0 + c_1 s + c_2 r + c_3 r s + c_4 r^2 + c_5 r^2 s.$$
Then the coefficients $c_0, \cdots, c_5 \in \mathbb{F}_{3^{97}}$ of the product can be computed using the following formulas.
$$\begin{aligned}
P_0 &= (a_0 + a_2 + a_4)(b_0 + b_2 + b_4)\\
P_1 &= (a_0 + a_1 + a_2 + a_3 + a_4 + a_5)(b_0 + b_1 + b_2 + b_3 + b_4 + b_5)\\
P_2 &= (a_1 + a_3 + a_5)(b_1 + b_3 + b_5)\\
P_3 &= (a_0 + s a_2 - a_4)(b_0 + s b_2 - b_4)\\
P_4 &= (a_0 + a_1 + s a_2 + s a_3 - a_4 - a_5)(b_0 + b_1 + s b_2 + s b_3 - b_4 - b_5)\\
P_5 &= (a_1 + s a_3 - a_5)(b_1 + s b_3 - b_5)\\
P_6 &= (a_0 - a_2 + a_4)(b_0 - b_2 + b_4)\\
P_7 &= (a_0 + a_1 - a_2 - a_3 + a_4 + a_5)(b_0 + b_1 - b_2 - b_3 + b_4 + b_5)\\
P_8 &= (a_1 - a_3 + a_5)(b_1 - b_3 + b_5)\\
P_9 &= (a_0 - s a_2 - a_4)(b_0 - s b_2 - b_4)\\
P_{10} &= (a_0 + a_1 - s a_2 - s a_3 - a_4 - a_5)(b_0 + b_1 - s b_2 - s b_3 - b_4 - b_5)\\
P_{11} &= (a_1 - s a_3 - a_5)(b_1 - s b_3 - b_5)\\
P_{12} &= a_4 b_4\\
P_{13} &= (a_4 + a_5)(b_4 + b_5)\\
P_{14} &= a_5 b_5
\end{aligned}$$
