Abstract-In this paper, an efficient high-speed architecture of a Gaussian normal basis multiplier over the binary finite field GF($2^m$) is presented. The structure is built from regular modules that compute exponentiation by powers of 2 and low-cost blocks that perform multiplication by a normal element of the binary field. Since the exponents are powers of 2, these modules are implemented by simple cyclic shifts in the normal basis representation. As a result, the multiplier has a simple structure with a low critical path delay. The efficiency of the proposed structure is studied in terms of area and time complexity through its implementation on the Virtex-4 FPGA family and its ASIC design in 180nm CMOS technology. Comparison with other structures of the Gaussian normal basis multiplier shows that the proposed architecture performs better in terms of speed and hardware utilization.
Introduction
Finite fields are used in a variety of applications such as cryptography. Efficient implementations of finite field arithmetic are important in public key cryptosystems such as the elliptic curve cryptosystem (ECC). In such cryptosystems, the multiplier is a key operator in the group law and in point multiplication [1]. Furthermore, the time and hardware complexity of multiplication are important factors in evaluating the efficiency of the related cryptosystems. The binary finite fields of order $2^m$, denoted by GF($2^m$), are attractive for the implementation of ECC. In these fields, addition is implemented by a simple bit-wise XOR. Moreover, the choice of basis for a binary field is a critical factor in hardware implementation. Two popular bases are the polynomial basis (PB) and the normal basis (NB). In the normal basis representation, squaring and every exponentiation by a power of 2 are implemented only by cyclic shifts. This feature is useful in the design of field operations such as the multiplier. Therefore, hardware implementation using the normal basis representation is a notable issue in cryptographic applications.
Several architectures of normal basis and Gaussian normal basis (GNB) multipliers have been presented in recent years [2]-[24]. For example, in [4] a novel scalable multiplication algorithm is presented for the Gaussian normal basis using the Hankel matrix-vector representation. In [5] a modified digit-level GNB multiplier over GF($2^m$) is proposed; for GNBs of type greater than 2, a complexity reduction algorithm is also proposed that reduces the number of XOR gates without increasing the gate delay of the digit-level multiplier. In [7] three structures for the GNB multiplier are presented: the first is a low-complexity digit-level serial-input parallel-output (SIPO) GNB multiplier, the second is an improved digit-level parallel-input serial-output (PISO) multiplier architecture, and the third is a new hybrid architecture obtained by connecting the output of the digit-level PISO multiplier to the input of the digit-level SIPO multiplier. In [8] a new normal basis multiplication algorithm based on divide-and-conquer and a uniform shift method is used to implement an efficient multiplexer-based architecture. A bit-parallel GNB multiplier using one pipelined XOR tree is designed in [15]. A novel algorithm for GNB multiplication in binary finite fields using the Toeplitz matrix-vector representation is proposed in [19], where it is also shown that GNB multiplication can be realized through block Toeplitz matrix-vector products. Multipliers with systolic and semi-systolic architectures are presented in [3], [12], [14], [16], [17], [18] and [20]. The main drawbacks of systolic structures are their very high hardware consumption and large number of clock cycles; see, for example, [20], where the number of clock cycles is reduced. In [24] the Dickson polynomial representation is proposed as an alternative way to represent the GNB of characteristic 2, and a novel recursive Dickson-Karatsuba decomposition is presented to achieve a parallel GNB multiplier with subquadratic space complexity.
The aim of the present paper is to design a high-speed and efficient hardware architecture of the digit-serial Gaussian normal basis multiplier for binary finite fields. To that end, by reviewing the multiplication operation in the normal basis, we present a highly regular structure with low critical path delay and low hardware resources. The digit-serial multiplier offers a suitable area and speed trade-off in cryptographic applications such as ECC. The proposed digit-serial multiplier is based on exponentiation by powers of 2 and multiplication by a normal element of the binary finite field. Moreover, the proposed architecture is very regular and simple, and is well suited to hardware implementation. The FPGA and ASIC implementation results show that the proposed structure has acceptable area and time consumption.
The rest of this paper is organized as follows. In Section 2, we briefly recall the notion of the Gaussian normal basis for binary finite fields and propose the structure of the digit-serial GNB multiplier. In Section 3, we compare this work with previous related works. Finally, we conclude the paper in Section 4.
2. Proposed structure of the digit-serial Gaussian normal basis multiplier over GF($2^m$)
A binary finite field of order $2^m$, denoted by GF($2^m$), is isomorphic to a vector space of dimension $m$ over GF(2). So, the elements of GF($2^m$) can be represented with respect to a basis. Two important types of representation in finite field arithmetic are the polynomial basis (PB) and the normal basis (NB). For an efficient hardware implementation, the normal basis representation is a suitable choice. Let $\{\beta, \beta^{2}, \beta^{2^2}, \ldots, \beta^{2^{m-1}}\}$ be a normal basis of GF($2^m$). An element $A$ in GF($2^m$) is represented as $A = \sum_{i=0}^{m-1} a_i \beta^{2^i}$ with $a_i \in$ GF(2), or in vector form $A = [a_{m-1}, a_{m-2}, \ldots, a_1, a_0]$. So, the addition of elements of GF($2^m$) is performed using bit-wise XOR gates. The square of $A$ is $A^2 = \sum_{i=0}^{m-1} a_i \beta^{2^{i+1}} = [a_{m-2}, \ldots, a_1, a_0, a_{m-1}]$, since $\beta^{2^m} = \beta$. So, one important property of the normal basis representation is that squaring is performed very efficiently by a simple one-bit rotation to the left. Applying this operation repeatedly gives exponentiation by any power of two. Thus, for a positive integer $k$, $A^{2^k}$ is computed via a $k$-bit cyclic shift to the left, i.e.,
$$A^{2^k} = [a_{m-k-1}, \ldots, a_0, a_{m-1}, \ldots, a_{m-k}].$$
Similarly, $A^{2^{-k}}$ is computed by a $k$-bit cyclic shift to the right, as we have
$$A^{2^{-k}} = [a_{k-1}, \ldots, a_0, a_{m-1}, \ldots, a_k].$$
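The two shift rules can be checked with a minimal Python sketch. It assumes, purely for illustration, that an element is stored as a list `a` with `a[i]` the coefficient of $\beta^{2^i}$; the helper names `pow2k`, `descending` and `rotate_left` are hypothetical. Displaying the list in the descending order $[a_{m-1}, \ldots, a_0]$ used above turns $A^{2^k}$ into a left rotation and $A^{2^{-k}}$ into a right rotation.

```python
from typing import List


def pow2k(a: List[int], k: int) -> List[int]:
    """Normal-basis coordinates of A^(2^k); k may be negative.

    Assumed convention: a[i] is the coefficient of beta^(2^i). Raising to 2^k
    maps beta^(2^i) to beta^(2^(i+k)), and beta^(2^m) = beta, so coefficient j
    of the result is a[(j - k) mod m]. This is an index permutation only.
    """
    m = len(a)
    return [a[(j - k) % m] for j in range(m)]


def descending(a: List[int]) -> List[int]:
    """The vector in the text's display order [a_{m-1}, ..., a_1, a_0]."""
    return a[::-1]


def rotate_left(v: List[int], k: int) -> List[int]:
    """Cyclic shift by k positions to the left (negative k shifts right)."""
    k %= len(v)
    return v[k:] + v[:k]


if __name__ == "__main__":
    a = [1, 0, 1, 1, 0, 0, 0]  # m = 7 with a_0 = a_2 = a_3 = 1
    for k in range(7):
        # A^(2^k) is a k-bit left rotation, A^(2^-k) a k-bit right rotation
        assert descending(pow2k(a, k)) == rotate_left(descending(a), k)
        assert descending(pow2k(a, -k)) == rotate_left(descending(a), -k)
    print(descending(a), "->", descending(pow2k(a, 1)))
```

Because both directions are pure index permutations, exponentiation by any power of 2 costs no logic gates in hardware, only wiring.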
The multiplication of elements $A$ and $B$ in GF($2^m$) is written as
$$C = AB = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} a_i b_j \beta^{2^i+2^j}.$$
The element $\beta^{2^i+2^j}$ is represented by $\beta^{2^i+2^j} = \sum_{k=0}^{m-1} \lambda_{ij}^{(k)} \beta^{2^k}$ with $\lambda_{ij}^{(k)} \in$ GF(2), so that $c_k = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} a_i \lambda_{ij}^{(k)} b_j$. The complexity of a hardware implementation of normal basis multiplication depends on the number of nonzero entries of the matrices $\lambda^{(k)} = [\lambda_{ij}^{(k)}]$, which is a crucial parameter for the speed of the system. The Gaussian normal basis, GNB for short, is a special class of normal basis for which multiplication is simpler and more efficient [25], [26]. The complexity of multiplication in a GNB is measured by its type $T$, a positive integer related to the number of nonzero entries of the multiplication matrix. The time and area complexity of multiplication over GF($2^m$) depend on the type of the normal basis, so a smaller type yields a more efficient multiplier. The optimal normal basis (ONB) is a GNB of type 1 or 2 and provides the most efficient multiplication among all normal bases. The GNB is included in several standards such as IEEE P1363 [26] and NIST [27]. For example, the even types $T = 4, 2, 6, 4$ and $10$ for the fields GF($2^{163}$), GF($2^{233}$), GF($2^{283}$), GF($2^{409}$) and GF($2^{571}$), respectively, are recommended by these two standards. For each binary finite field GF($2^m$) with $m$ not divisible by 8, a GNB of some type exists, and for each positive integer $T$ at most one GNB of type $T$ exists. More precisely, for given positive integers $m$ and $T$, let $p = Tm + 1$ be a prime number such that $\gcd(Tm/\operatorname{ord}_p(2), m) = 1$, where $\operatorname{ord}_p(2)$ is the multiplicative order of 2 modulo $p$. Then a GNB of type $T$ over GF($2^m$) exists. In this work, we consider GNBs with odd values of $m$, which are applicable to cryptographic applications; odd $m$ implies that $T$ is even, since otherwise $p = Tm + 1$ would be even. The GNB multiplication can also be computed via the following approach; see, e.g., [26]. Let GF($2^m$) have a Gaussian normal basis of type $T$, where $p = Tm + 1$ is a prime number, and let $u$ be an integer of order $T$ modulo $p$. Then the set
$$\{2^i u^j \bmod p : 0 \le i \le m-1,\ 0 \le j \le T-1\}$$
is a reduced residue system modulo $p$, i.e., each positive integer $n < p$ can be uniquely represented as $n = 2^i u^j \bmod p$. Let $F$ be the function given by
$$F(2^i u^j \bmod p) = i, \qquad 0 \le i \le m-1,\ 0 \le j \le T-1.$$
For even type $T$, the first coordinate $c_0$ of the product $C = AB$ is computed by
$$c_0 = \sum_{k=1}^{p-2} a_{F(k+1)}\, b_{F(p-k)}.$$
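The other coordinates $c_i$, $1 \le i \le m-1$, are computed with the same expression after cyclically shifting both inputs: since $c_i$ is the first coordinate of $C^{2^{-i}} = A^{2^{-i}} B^{2^{-i}}$, each successive coordinate requires one more one-bit right cyclic shift of the inputs.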
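The existence condition and the $F$-based product formula can be tried out with a short Python sketch. It is written for clarity rather than speed, assumes an even type $T$ (as in this work), and stores an element as a list with `a[i]` the coefficient of $\beta^{2^i}$. All function names (`gnb_exists`, `smallest_gnb_type`, `f_table`, `gnb_mul`, and so on) are hypothetical helpers, not part of any library or of the original design.

```python
from math import gcd
from typing import List


def is_prime(n: int) -> bool:
    """Trial division; adequate for p = T*m + 1 at cryptographic sizes of m."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True


def order_of_2(p: int) -> int:
    """Multiplicative order of 2 modulo an odd prime p."""
    x, k = 2, 1
    while x != 1:
        x = 2 * x % p
        k += 1
    return k


def gnb_exists(m: int, t: int) -> bool:
    """Existence condition from the text: p = t*m + 1 prime, gcd(t*m/ord_p(2), m) = 1."""
    p = t * m + 1
    if p == 2 or not is_prime(p):
        return False
    return gcd(t * m // order_of_2(p), m) == 1


def smallest_gnb_type(m: int, t_max: int = 1000) -> int:
    """Smallest t with a type-t GNB (no GNB exists when 8 divides m)."""
    for t in range(1, t_max + 1):
        if gnb_exists(m, t):
            return t
    raise ValueError("no GNB type found up to t_max")


def pow2k(a: List[int], k: int) -> List[int]:
    """Coordinates of A^(2^k): a cyclic shift (k may be negative)."""
    m = len(a)
    return [a[(j - k) % m] for j in range(m)]


def f_table(m: int, t: int) -> List[int]:
    """F[n] = i for n = 2^i * u^j mod p, where u has order t modulo p (F[0] unused)."""
    p = t * m + 1
    u = next(x for x in range(1, p)
             if pow(x, t, p) == 1 and all(pow(x, s, p) != 1 for s in range(1, t)))
    F = [-1] * p
    for i in range(m):
        for j in range(t):
            F[pow(2, i, p) * pow(u, j, p) % p] = i
    assert -1 not in F[1:], "not a reduced residue system"
    return F


def gnb_mul(a: List[int], b: List[int], t: int) -> List[int]:
    """C = A*B for a GNB of even type t.

    c_0 = XOR over k = 1..p-2 of a_F(k+1) AND b_F(p-k); c_i is the same
    expression applied to A^(2^-i), B^(2^-i) (right shifts in the text's notation).
    """
    assert t % 2 == 0, "the c_0 formula is stated for even type"
    m = len(a)
    p = t * m + 1
    F = f_table(m, t)
    c = []
    for i in range(m):
        ai, bi = pow2k(a, -i), pow2k(b, -i)
        s = 0
        for k in range(1, p - 1):
            s ^= ai[F[k + 1]] & bi[F[p - k]]
        c.append(s)
    return c


if __name__ == "__main__":
    for m, t in [(163, 4), (233, 2), (283, 6), (409, 4), (571, 10)]:
        assert smallest_gnb_type(m) == t      # the types quoted in the text
    A = [1, 0, 1, 1, 0, 0, 1]                 # GF(2^7) has a type-4 GNB (p = 29)
    assert gnb_mul(A, [1] * 7, 4) == A        # all-ones vector acts as 1 here
    assert gnb_mul(A, A, 4) == pow2k(A, 1)    # squaring is a rotation
```

The usage lines double as sanity checks: the smallest types found for the five NIST field sizes match the values listed earlier, and squaring through the general formula agrees with the one-bit rotation.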
A bit-serial implementation of the normal basis multiplication of two elements $A$ and $B$ is given in [21] as follows.
The digit-serial GNB multiplication is implemented as below, where $B$ is divided into $n$ words of $d$ bits with $n = \lceil m/d \rceil$.
In this method, multiplications by several powers of $\beta$ are required, which lead to a complex and irregular hardware structure [21]. To obtain a low-complexity and regular architecture for multiplication by $\beta^{2^{(d-i)}}$ for some integer $i$, the computation $X = A\beta^{2^{(d-i)}}$ can be performed in three steps: first the input is raised to the power $2^{-(d-i)}$, then the result is multiplied by $\beta$, and finally the product is raised to the power $2^{(d-i)}$. These operations give
$$X = A\beta^{2^{(d-i)}} = \left(A^{2^{-(d-i)}}\beta\right)^{2^{(d-i)}}.$$
In the proposed method, the two exponentiations by $2^{-(d-i)}$ and $2^{(d-i)}$ are free in hardware, since they are implemented only by cyclic shifts; therefore, multiplication by $\beta$ is the only part that must be implemented with logic. Here, we explain the structure of the proposed digit-serial multiplier. Let $A$, $B$ be two elements in GF($2^m$) and let $B = [b_{m-1}, b_{m-2}, \ldots, b_2, b_1, b_0]$. We consider $B$ as the following $d \times n$ array and divide it into its columns. In other words, we have
where, for $j = 1, \ldots, n$,
To compute the exponentiation by $2^{-(d-i)}$ before the multiplication-by-$\beta$ block in a regular form, we write
So, the exponentiation by $2^{-(d-1)}$ is computed first, and then, for $i = 2, 3, \ldots, d$, the exponentiations by $2^{-(d-i)}$ are generated by a chain of $d-1$ exponentiations by 2. Fig. 2 shows the proposed structure of the digit-serial GNB multiplier over GF($2^m$). As seen in Fig. 2, the resulting architecture is regular and well suited to hardware implementation. In the proposed structure, the different exponentiation-by-power-of-2 blocks are implemented by wired cyclic shifts in the normal basis, which is an important factor in the efficiency of the structure. For the implementation of multiplication by $\beta$ in GF($2^m$) with a GNB of type $T$, the method presented in [6] is employed. In this method, the entries of the multiplication matrix are encoded into an $(m-1) \times T$ matrix $W$ with entries $w_{i,j} \in \{0, 1, 2, \ldots, m-1\}$, where $i = 0, 1, \ldots, m-2$ and $j = 0, 1, \ldots, T-1$. Notice that the multiplication matrix is symmetric and, for a GNB of even type $T$, we have $\lambda_{i,0}^{(j)} = \lambda_{i-j,-j}^{(0)}$, where $i, j \in \{0, 1, 2, \ldots, m-1\}$ and subscripts are taken modulo $m$ (see [6]). So, the $m \times m$ matrix $M$ with entries $\lambda_{i,0}^{(j)}$ is obtained from $\lambda^{(0)}$ alone, and the $i$th row of $M$ is the $m$-bit representation of $\beta\beta^{2^i}$. Thus, the number of ones in the first row is one, and in each of the other rows it is even and at most $T$. Now, the entries of the $i$th row of $W$ are identified from the column numbers of the ones in the $(i+1)$th row of $M$. If row $i+1$ of $M$ contains $T$ ones, all entries of row $i$ of $W$ are specified. Otherwise, the remaining entries of that row of $W$, whose number is even, are initialized with a constant value. There is also another method to determine $W$, which is based on the function $F$ defined above.
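The construction of $M$ and $W$ can be sketched for the GF($2^7$), $T = 4$ example. The following are our assumptions for illustration: $\lambda^{(0)}$ is read off the $c_0$ formula as the parity of the number of pairs $(F(k+1), F(p-k))$ equal to each index pair, the rows of $M$ are then filled with $\lambda_{i,0}^{(j)} = \lambda_{i-j,-j}^{(0)}$, and the unspecified entries of $W$ are padded with the constant 0, which is harmless because an even number of copies of the same input cancels under XOR. All helper names are hypothetical.

```python
from typing import List


def f_table(m: int, t: int) -> List[int]:
    """F[n] = i for n = 2^i * u^j mod p, u of order t modulo p (F[0] unused)."""
    p = t * m + 1
    u = next(x for x in range(1, p)
             if pow(x, t, p) == 1 and all(pow(x, s, p) != 1 for s in range(1, t)))
    F = [-1] * p
    for i in range(m):
        for j in range(t):
            F[pow(2, i, p) * pow(u, j, p) % p] = i
    return F


def lambda0(m: int, t: int) -> List[List[int]]:
    """lambda^(0)[x][y]: parity of #{k in 1..p-2 : (F(k+1), F(p-k)) = (x, y)}."""
    p = t * m + 1
    F = f_table(m, t)
    lam = [[0] * m for _ in range(m)]
    for k in range(1, p - 1):
        lam[F[k + 1]][F[p - k]] ^= 1
    return lam


def matrix_m(m: int, t: int) -> List[List[int]]:
    """Row i holds the coordinates of beta * beta^(2^i), via lambda_{i-j,-j}^(0)."""
    lam = lambda0(m, t)
    return [[lam[(i - j) % m][(-j) % m] for j in range(m)] for i in range(m)]


def matrix_w(mat: List[List[int]], t: int, pad: int = 0) -> List[List[int]]:
    """Row i of W: columns of the ones in row i+1 of M, padded to t entries."""
    rows = []
    for row in mat[1:]:
        cols = [j for j, v in enumerate(row) if v]
        assert len(cols) <= t and (t - len(cols)) % 2 == 0
        rows.append(cols + [pad] * (t - len(cols)))
    return rows


def times_beta_w(a: List[int], mat: List[List[int]], w: List[List[int]]) -> List[int]:
    """beta*A = a_0 * (row 0 of M) + sum over i >= 1 of a_i * sum_t beta^(2^W[i-1][t])."""
    z = [a[0] & v for v in mat[0]]
    for i, cols in enumerate(w, start=1):
        for j in cols:
            z[j] ^= a[i]  # padded duplicates toggle z[pad] an even number of times
    return z


if __name__ == "__main__":
    M = matrix_m(7, 4)
    W = matrix_w(M, 4)
    print("ones per row of M:", [sum(r) for r in M])
    for row in W:
        print(row)
```

Padding with a repeated constant keeps every row of $W$ at exactly $T$ entries, so each output of the multiplication-by-$\beta$ block can be built from the same regular XOR structure.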
Based on the second method, the values of $F(n)$ are calculated first, as shown in Table 2.
Then the pairs $(F(k+1), F(p-k))$ are listed as shown in Table 3, and the matrix $W$ can be constructed from them.
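For the GF($2^7$), $T = 4$ example ($p = 29$), the pairs of Table 3 and the resulting multiplication by $\beta$ can be generated with a short sketch. Coordinate $j$ of $\beta A$ is the XOR of $a_{x+j}$ over all pairs $(x, y)$ with $y \equiv -j \pmod m$, which follows from applying the $c_0$ formula to $A^{2^{-j}}$ and $\beta^{2^{-j}}$. The hand-written function `times_beta_gf27_shared` is a hypothetical gate-level version that computes the common terms $a_5 \oplus a_6$ and $a_2 \oplus a_6$ once; by our count it needs 12 two-input XOR gates instead of the 14 needed without sharing, a figure specific to this small example.

```python
from typing import List, Tuple

M_DEG, TYPE = 7, 4
P = TYPE * M_DEG + 1  # 29


def f_table() -> List[int]:
    """F[n] = i for n = 2^i * u^j mod P, with u of order TYPE modulo P."""
    u = next(x for x in range(1, P)
             if pow(x, TYPE, P) == 1 and all(pow(x, s, P) != 1 for s in range(1, TYPE)))
    F = [-1] * P
    for i in range(M_DEG):
        for j in range(TYPE):
            F[pow(2, i, P) * pow(u, j, P) % P] = i
    return F


def pairs() -> List[Tuple[int, int]]:
    """(F(k+1), F(p-k)) for k = 1, ..., p-2."""
    F = f_table()
    return [(F[k + 1], F[P - k]) for k in range(1, P - 1)]


def times_beta_from_pairs(a: List[int]) -> List[int]:
    """beta*A: coordinate j is the XOR of a_{x+j} over pairs (x, y) with y = -j mod m."""
    z = [0] * M_DEG
    for x, y in pairs():
        j = (-y) % M_DEG
        z[j] ^= a[(x + j) % M_DEG]
    return z


def times_beta_gf27_shared(a: List[int]) -> List[int]:
    """Hypothetical XOR network for beta*A over GF(2^7), T = 4, with shared terms."""
    a0, a1, a2, a3, a4, a5, a6 = a
    s1 = a5 ^ a6  # shared by coordinates 1 and 6
    s2 = a2 ^ a6  # shared by coordinates 4 and 5
    return [
        a1,
        s1 ^ a0 ^ a2,
        a1 ^ a3 ^ a4 ^ a5,
        a2 ^ a5,
        s2,
        s2 ^ a1 ^ a3,
        s1 ^ a1 ^ a4,
    ]


if __name__ == "__main__":
    print(pairs())
    for v in range(2 ** M_DEG):  # exhaustive check over all 128 inputs
        a = [(v >> i) & 1 for i in range(M_DEG)]
        assert times_beta_from_pairs(a) == times_beta_gf27_shared(a)
```

The exhaustive loop confirms that the shared-term network computes exactly the same seven outputs as the pair-based definition.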
Table 3: Pairs $(F(k+1), F(p-k))$ for $T = 4$ over GF($2^7$), where $p = 29$
k = 1..9      k = 10..18    k = 19..27
(1,0)         (4,2)         (3,2)
(5,1)         (0,4)         (3,3)
(2,5)         (4,0)         (5,3)
(1,2)         (6,4)         (6,5)
(6,1)         (6,6)         (1,6)
(5,6)         (4,6)         (2,1)
(3,5)         (0,4)         (5,2)
(3,3)         (4,0)         (1,5)
(2,3)         (2,4)         (0,1)

As can be seen, there are some common XOR terms in the resulting expressions. For example, coordinates 1 and 6 of $\beta A$ share the common term $a_6 \oplus a_5$, and coordinates 4 and 5 share the common term $a_2 \oplus a_6$. Taking these common terms into account, the hardware implementation of multiplication by $\beta$ over GF($2^7$) is shown in Fig. 3. The type of the Gaussian normal basis in the field GF($2^{163}$) is $T = 4$; in this field, sharing the common XOR terms reduces the number of XOR gates in the implementation by 24%. In the following, the proposed structure of the digit-serial GNB multiplier is presented for two cases. In the first case, $d = 3$ and $n = \lceil m/d \rceil = 3$, so the padding bits $b_8$ and $b_7$ are set to zero, and the product of the Gaussian normal basis multiplication is obtained from $C_1$, $C_2$ and $C_3$ as $C = ((C_1^{2^d} + C_2)^{2^d} + C_3)$. Table 3 shows the required exponentiation operations in the proposed digit-serial GNB multiplier over GF($2^7$) for $d = 3$ and $n = 3$, and Fig. 5 shows the proposed structure of the digit-serial GNB multiplier over GF($2^7$) with $T = 4$ for this case. The exponentiation operations required in the second case, $d = 2$ and $n = 4$, are shown in Table 4, and the corresponding structure over GF($2^7$) is shown in Fig. 6. The critical path delay of the proposed digit-serial GNB multiplier over GF($2^m$) with type $T$ is $T_A + (\lceil \log_2 T \rceil + \lceil \log_2 (d+1) \rceil) T_X$, where $T_A$ and $T_X$ denote the delays of a 2-input AND gate and a 2-input XOR gate, respectively. For the above two examples, the critical paths of the structures in Fig. 5 and Fig. 6 are
Also, the number of clock cycles is 4 for the case $d = 3$, $n = 3$ and 5 for the case $d = 2$, $n = 4$. The proposed digit-serial GNB multiplier requires . AND gates and ≤ . +( -1)( -1) XOR gates. The number of XOR gates is lower than that of other digit-serial structures. More details of the hardware and time complexity of this work and other related works are presented in the next section.
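A behavioural model of a digit-serial schedule helps check the derivation. The sketch below is an assumption about the dataflow, not a transcription of Fig. 2: $B$ is padded with zeros to $nd$ bits and consumed one $d$-bit word per iteration, most significant word first, with the Horner update $R \leftarrow R^{2^d} + C_j$, so that for $n = 3$ it yields $C = ((C_1^{2^d} + C_2)^{2^d} + C_3)$. Each term $A_j\beta^{2^e}$ is formed by the three-step rule $(A_j^{2^{-e}}\beta)^{2^e}$, so multiplication by $\beta$ is the only operation that is not a cyclic shift. The reference `gnb_mul` and all other names are hypothetical helpers.

```python
from typing import List


def pow2k(a: List[int], k: int) -> List[int]:
    """Coordinates of A^(2^k): a cyclic shift (k may be negative)."""
    m = len(a)
    return [a[(j - k) % m] for j in range(m)]


def f_table(m: int, t: int) -> List[int]:
    """F[n] = i for n = 2^i * u^j mod p, u of order t modulo p."""
    p = t * m + 1
    u = next(x for x in range(1, p)
             if pow(x, t, p) == 1 and all(pow(x, s, p) != 1 for s in range(1, t)))
    F = [-1] * p
    for i in range(m):
        for j in range(t):
            F[pow(2, i, p) * pow(u, j, p) % p] = i
    return F


def gnb_mul(a: List[int], b: List[int], t: int) -> List[int]:
    """Reference product for a GNB of even type t (c_0 formula plus shifts)."""
    m, p = len(a), t * len(a) + 1
    F = f_table(m, t)
    out = []
    for i in range(m):
        ai, bi = pow2k(a, -i), pow2k(b, -i)
        s = 0
        for k in range(1, p - 1):
            s ^= ai[F[k + 1]] & bi[F[p - k]]
        out.append(s)
    return out


def times_beta(a: List[int], t: int) -> List[int]:
    """The only non-wiring block: multiplication by the normal element beta."""
    beta = [1] + [0] * (len(a) - 1)
    return gnb_mul(a, beta, t)


def digit_serial_mul(a: List[int], b: List[int], t: int, d: int) -> List[int]:
    """Hypothetical digit-serial schedule with n = ceil(m/d) Horner iterations."""
    m = len(a)
    n = -(-m // d)                     # ceil(m / d)
    bits = b + [0] * (n * d - m)       # zero padding b_m, ..., b_{nd-1}
    r = [0] * m
    for j in range(1, n + 1):          # word j holds b_{d(n-j)}, ..., b_{d(n-j)+d-1}
        a_j = pow2k(a, -d * (n - j))   # A^(2^{-d(n-j)}): a fixed rotation
        c_j = [0] * m
        for e in range(d - 1, -1, -1):  # e = d - i for i = 1, ..., d
            if bits[d * (n - j) + e]:
                # three steps: shift right by e, multiply by beta, shift left by e
                term = pow2k(times_beta(pow2k(a_j, -e), t), e)
                c_j = [x ^ y for x, y in zip(c_j, term)]
        r = [x ^ y for x, y in zip(pow2k(r, d), c_j)]  # R <- R^(2^d) + C_j
    return r


if __name__ == "__main__":
    A = [1, 1, 0, 1, 0, 0, 1]
    B = [0, 1, 1, 0, 1, 1, 1]
    for d in (2, 3):                   # the two GF(2^7) cases discussed above
        assert digit_serial_mul(A, B, 4, d) == gnb_mul(A, B, 4)
```

In this model the loop runs $n = \lceil m/d \rceil$ times; how those iterations map onto clock cycles and registers in the actual circuit is not modelled here.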
Comparison Results
This section compares the proposed multiplier with other structures of GNB and ONB multipliers in terms of hardware resources, critical path delay and number of clock cycles. The hardware implementation of the proposed architecture targets the two fields GF($2^{163}$) and GF($2^{233}$) recommended by NIST for ECC applications. The proposed digit-serial GNB multiplier structure has been successfully verified and implemented using Xilinx ISE 11 on a Virtex-4 XC4VLX100-ff1148 FPGA. In addition, the ASIC results are obtained with the Synopsys Design Vision tool using a standard-cell library in 180nm CMOS technology. Table 5 presents the hardware utilization, including the numbers of 2-input XOR gates, 2-input AND gates, D flip-flops and 2-to-1 multiplexers, for the proposed structure and for previous related works, together with the critical path delay and latency of the multipliers.
As seen in Table 5, the hardware utilization of the proposed structures is comparable with that of other digit-serial GNB multipliers.
The number of XOR gates is the lowest among the digit-serial structures compared. The hardware resources of the recent work [20] are greater than those of the present design; however, the latency of [20] is better. Table 6 shows the FPGA implementation results of the proposed architecture and of [7] on a Virtex-4 XC4VLX100-ff1148 for GF($2^{233}$) and GF($2^{163}$). The table reports hardware utilization, maximum frequency and execution time for different digit sizes. Note that in [7] the D flip-flops of the output register in the DL-PISO structure and of the serial input (input A) in the DL-SIPO structure were not counted in the implementation. According to Table 6, the proposed work has better timing results than [7] on the same FPGA device, Virtex-4 XC4VLX100-ff1148. For example, in GF($2^{163}$) the execution times of the proposed structures are 20.74 ns and 53.788 ns for digit sizes 41 and 11, respectively, which are better than those of [7] for similar digit sizes. Table 7 shows the area, critical path delay and execution time of the proposed structure in 180nm CMOS technology, obtained with the Synopsys Design Vision tool. The results show a suitable trade-off between area and execution time for elliptic curve cryptography systems.
Conclusions
This paper presents an FPGA and ASIC implementation of an efficient hardware structure of the digit-serial Gaussian normal basis multiplier over GF($2^m$). By reviewing the multiplication equation in the normal basis, a regular structure for the Gaussian normal basis multiplier is obtained. The multiplier is based on exponentiation by powers of 2 and multiplication by a normal element of GF($2^m$). As a result, the proposed architecture has low hardware complexity and low critical path delay. It is suitable for high-speed hardware implementation of finite field multiplication and inversion over GF($2^m$) for elliptic curve cryptography.
