Abstract-This paper presents the design and implementation of a GF(2^16) multiplier using composite field arithmetic. We introduce an irreducible polynomial X^2 + X + ξ, which is required to transform the Galois field GF(2^16) into the composite field GF((((2^2)^2)^2)^2). Our estimate of ξ, and the composite field arithmetic derived from it, yields a high-speed GF(2^16) multiplier. The design is purely combinational and therefore clock free. We achieved a critical path delay of 11.5 ns from inputs to output. We used ψ = {10}_2 and λ = {1100}_2. With these values of ψ, λ and ξ we achieved the fastest implementation, at the cost of a few extra gates. The design methodology includes implementation and verification on FPGA using Xilinx ISE, followed by physical layout as an ASIC using 90 nm CMOS standard cell libraries. Our implementation results show that, without pipelining, the hardware core achieves a throughput of 5.39 Mbps on FPGA and 5.43 Gbps on the 90 nm ASIC.
I. Introduction
Multiplication is an elementary mathematical operation that is extremely important in signal processing applications. To keep pace with technology, high-speed applications require faster methods of multiplication. Multipliers are key components of many high-performance systems such as FIR filters, microprocessors and digital signal processors. The computational performance of a DSP system is limited by its multiplication performance, and since multiplication dominates the execution time of most DSP algorithms, a high-speed multiplier is much desired. Currently, multiplication time is still the dominant factor in determining the instruction cycle time of a DSP chip. Hence, optimizing the speed and area of the multiplier is a major design issue. The three important considerations for VLSI design are power, area and delay.
There are a number of techniques for performing binary multiplication. In general, the choice is based on factors such as latency, throughput, area and design complexity. The Galois field multiplier, array multiplier, Booth multiplier and Wallace tree multiplier are some of the standard approaches to hardware multiplication that are suitable for VLSI implementation at the CMOS level. A Galois field multiplier is a fixed-bit-width multiplier (the product has the same width as the operands), while the others are not. Galois field multipliers offer high performance because of their carry-free property. Because the Galois field is decomposed into a composite field, the complexity is lower than that of the array, Booth and Wallace tree multipliers. In this paper, a high-speed GF(2^16) multiplier is implemented using tower field decomposition, employing the lowest resources.
The work in [1] presents the design and implementation of the substitute byte processing element required in AES encryption. The authors used composite field arithmetic to compute the multiplicative inverse. The conversion of GF(2^8) to GF(2^4) and subsequently to GF(2) reduced the complexity. Isomorphic mapping and inverse isomorphic mapping are used to map from the Galois field to the composite field and back [2]. For the mapping of GF(2^16) to GF(2^8), an irreducible polynomial containing a constant μ is used. The multiplication was performed with an assumed value for the constant μ [3]. The literature published to date presents design methodologies for Galois field multipliers and theory based on pipelining. Designing a Galois field multiplier over a composite field includes designing the lower-order Galois field multipliers. To implement GF(2^2), GF((2^2)^2), GF(((2^2)^2)^2) and GF((((2^2)^2)^2)^2) we use irreducible polynomials; those for the three extension steps above GF(2^2) contain the constants ψ, λ and ξ respectively.
The main contribution of this paper is to estimate the value of ξ for the GF(2^16) to GF((((2^2)^2)^2)^2) tower field conversion, and to implement the multiplier on FPGA as well as in 90 nm CMOS technology such that the design consumes low power and area and achieves a high speed of operation. ξ is the 8-bit constant required in the irreducible polynomial X^2 + X + ξ.
The rest of the paper is organised as follows. Section II explains the fundamentals of Galois fields, Section III elaborates our implementation of the multiplier, and Section IV concludes the paper with our results and comparison. The complete multiplication operation can be realized using only AND and XOR gates.
II. Galois field
The multiplicand and multiplier are expressed in GF, wherein any number is expressed in polynomial form. A polynomial f(x) is a mathematical expression of the form a_n x^n + a_{n-1} x^{n-1} + ... + a_0. The highest exponent of x is the degree of the polynomial. For example, the degree of x^5 + 3x^3 + 4 is 5. The values a_n, a_{n-1}, ..., a_0 are called coefficients. If the coefficients a_n, a_{n-1}, ..., a_1 are all 0, that is, the polynomial has the form a_0, we call the polynomial a constant. Polynomials are added or subtracted by combining the terms with the same powers.
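Let f(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_0 and g(x) = b_m x^m + b_{m-1} x^{m-1} + ... + b_0, with b_m ≠ 0, be two polynomials over a field F. Then there is a unique polynomial r(x) of degree smaller than m and another unique polynomial h(x) such that f(x) = h(x) g(x) + r(x). The polynomial r(x) is called the remainder of f(x) modulo g(x). For polynomials a(x), b(x) and g(x) which are over the same field, we say a(x) is congruent to b(x) modulo g(x) if a(x) and b(x) leave the same remainder modulo g(x).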
Let A = (11) = x+1
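The remainder operation described in this section can be sketched in Python for polynomials over GF(2), where every coefficient is a single bit and addition is XOR. The function names `poly_mul` and `poly_mod` are hypothetical, and the packing of a polynomial into an integer (bit i holds the coefficient of x^i) is an assumption made for illustration. The example squares A = (11)_2 = x + 1 modulo x^2 + x + 1, which is the only irreducible polynomial of degree 2 over GF(2); the choice of squaring A is illustrative only.

```python
def poly_mul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials packed into integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # adding a shifted copy of a is XOR over GF(2)
        a <<= 1
        b >>= 1
    return result


def poly_mod(f: int, g: int) -> int:
    """Remainder r(x) of f(x) modulo g(x), with deg r < deg g."""
    if g == 0:
        raise ZeroDivisionError("modulus polynomial must be non-zero")
    deg_g = g.bit_length() - 1
    while f.bit_length() - 1 >= deg_g:
        shift = (f.bit_length() - 1) - deg_g
        f ^= g << shift          # cancel the current leading term of f
    return f


A = 0b11                          # x + 1
G = 0b111                         # x^2 + x + 1
square = poly_mod(poly_mul(A, A), G)
print(bin(square))                # 0b10, i.e. (x + 1)^2 = x in GF(2^2)
```

Because no carries propagate between bit positions, both routines reduce to shifts and XORs, which is the carry-free property mentioned in the introduction.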
III. GF multiplier Implementation
The GF (2 16 
To decompose the more complex GF(2^16) into the lower-order fields GF(((2^2)^2)^2), GF((2^2)^2), GF(2^2) and GF(2), the irreducible polynomials of (1), (2) and (3) are used. The value of ξ may take many combinations. As Fig. 1 indicates, composite field multiplication cannot be applied directly to GF(2^16) elements; they must first be mapped into the composite field. For that purpose the isomorphic function δ is used. After the multiplication, the result has to be mapped back from the composite field, for which the inverse isomorphic function δ^-1 is used. Both δ and δ^-1 can be represented as 16×16 binary matrices. Let q be an element of GF(2^16); then the isomorphic mapping and its inverse can be written as δ·q and δ^-1·q.
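The products δ·q and δ^-1·q are matrix-vector multiplications over GF(2), so each output bit is the XOR of a subset of input bits. The following Python sketch illustrates this under stated assumptions: the helper names are hypothetical, each matrix row is packed into an integer bitmask, and `DELTA_DEMO` is a made-up invertible matrix used only to exercise the code. It is not the δ of this paper.

```python
def mat_vec_gf2(rows, q):
    """Multiply a binary matrix by the bit vector q over GF(2).

    rows[i] is a bitmask selecting the bits of q that are XORed
    together to form output bit i.
    """
    out = 0
    for i, row in enumerate(rows):
        parity = bin(row & q).count("1") & 1   # XOR of the selected bits
        out |= parity << i
    return out


def mat_inv_gf2(rows, n):
    """Invert an n x n binary matrix by Gauss-Jordan elimination over GF(2)."""
    # Augment row i with the identity row e_i stored in bits n .. 2n-1.
    aug = [rows[i] | (1 << (n + i)) for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if (aug[r] >> col) & 1), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and (aug[r] >> col) & 1:
                aug[r] ^= aug[col]             # row addition is XOR
    return [row >> n for row in aug]


# Hypothetical 16 x 16 matrix (unit upper bidiagonal, hence invertible).
DELTA_DEMO = [(1 << i) | (1 << (i + 1)) for i in range(15)] + [1 << 15]
DELTA_DEMO_INV = mat_inv_gf2(DELTA_DEMO, 16)

q = 0xBEEF
mapped = mat_vec_gf2(DELTA_DEMO, q)
assert mat_vec_gf2(DELTA_DEMO_INV, mapped) == q   # round trip recovers q
```

In hardware, each row of such a matrix corresponds to one XOR tree, which is why the mapping blocks need no AND gates.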
[6]
[7] To construct a GF(2^16) multiplier, a GF(2^8) multiplier implementation is used. To multiply two 16-bit binary numbers in the Galois field, two 16-bit inputs are supplied and a 16-bit output is produced. As shown in Fig. 3, the delta block transforms its 16-bit binary input from the finite field to the composite field [3]. This isomorphic transformation must be performed on each input, multiplicand and multiplier, before it is applied to the block shown in Fig. 2. The block in Fig. 4 represents the 8-bit Galois field multiplier, which has two 8-bit inputs and an 8-bit output. This block can be implemented using combinational gates [2]. (2 2 ) 2 ) 2 ) 2 to GF (( (2) 2 ) 2 ) 2 .
Fig.6. Inverse Isomorphic transformation block
The 16-bit inverse isomorphic transformation shown in Fig. 6 maps polynomials represented in composite field arithmetic back to the GF(2^16) finite field format. It is a 16×16 binary matrix which can be implemented using XOR gates [3].
In the proposed GF(2^16) multiplier of Fig. 2, the coefficient ξ can take any one of 128 values. The choice of ξ may change the number of gates required to implement the GF(2^16) multiplier. Estimating the constant ξ is complex and time consuming, so we followed a methodology for selecting ξ that is explained in the following paragraphs.
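One plausible reading of the figure of 128 is that it counts the 8-bit values of ξ for which X^2 + X + ξ is irreducible over GF(2^8). This interpretation is an assumption and is not stated explicitly in the text. Under it, the count follows from a short argument that does not depend on how GF(2^8) is represented:

```latex
\[
\begin{aligned}
&T:\mathrm{GF}(2^8)\to\mathrm{GF}(2^8),\quad T(t)=t^2+t
  \quad\text{is linear over } \mathrm{GF}(2),\\
&\ker T=\{\,t : t^2=t\,\}=\{0,1\}
  \;\Longrightarrow\; |\operatorname{im} T| = 2^8/2 = 128,\\
&X^2+X+\xi \text{ has a root in } \mathrm{GF}(2^8)
  \iff \xi\in\operatorname{im} T,\\
&\#\{\,\xi : X^2+X+\xi \text{ irreducible over } \mathrm{GF}(2^8)\,\}
  = 256-128 = 128 .
\end{aligned}
\]
```

A quadratic is reducible exactly when it has a root in the field, so the 128 values outside the image of T are the admissible candidates for ξ.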
Multiplication with normal bases [3] was considered for the 16-bit multiplication, and each internal block of that design was analysed with known inputs and their output products so that the value of ξ could be estimated. GF(2^2) is constructed using the irreducible polynomial k(X) over GF(2). Similarly, GF((2^2)^2) is constructed using the irreducible polynomial l(X), GF(((2^2)^2)^2) using m(X), and GF((((2^2)^2)^2)^2) using n(X).
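The layered construction can be illustrated with a recursive multiplier in Python. This is a minimal sketch under several assumptions that are not taken from the paper: it uses a polynomial basis rather than the normal bases of [3], it assumes every level is built with a polynomial of the form X^2 + X + c (with c = 1 for GF(2^2), then ψ, λ and ξ), and it assumes the upper half of each word holds the coefficient of X. The names `tower_mul`, `tower_pow` and `LEVEL_CONST` are hypothetical. The constant values are the ones quoted in this paper.

```python
# Constants quoted in the paper; bit layout and polynomial basis are assumptions.
PSI = 0b10            # 2-bit constant used to build GF((2^2)^2)
LAMBDA = 0b1100       # 4-bit constant used to build GF(((2^2)^2)^2)
XI = 0b11100011       # 8-bit constant used to build GF((((2^2)^2)^2)^2)

# Constant term c of the assumed polynomial X^2 + X + c at each field width.
LEVEL_CONST = {2: 0b1, 4: PSI, 8: LAMBDA, 16: XI}


def tower_mul(a: int, b: int, bits: int = 16) -> int:
    """Multiply a and b in the tower field whose elements are `bits` wide."""
    if bits == 1:
        return a & b                      # a GF(2) product is a single AND
    half = bits // 2
    mask = (1 << half) - 1
    a1, a0 = a >> half, a & mask          # a = a1*X + a0
    b1, b0 = b >> half, b & mask          # b = b1*X + b0
    c = LEVEL_CONST[bits]
    a1b1 = tower_mul(a1, b1, half)
    # X^2 = X + c, so a*b = (a1b1 + a1b0 + a0b1) X + (a0b0 + a1b1*c)
    hi = a1b1 ^ tower_mul(a1, b0, half) ^ tower_mul(a0, b1, half)
    lo = tower_mul(a0, b0, half) ^ tower_mul(a1b1, c, half)
    return (hi << half) | lo


def tower_pow(a: int, e: int, bits: int = 16) -> int:
    """Square-and-multiply exponentiation built on tower_mul."""
    result = 1
    while e:
        if e & 1:
            result = tower_mul(result, a, bits)
        a = tower_mul(a, a, bits)
        e >>= 1
    return result


product = tower_mul(0x1234, 0xBEEF)       # 16-bit composite field product
```

Every operation in the recursion is an AND at the GF(2) leaves or an XOR when combining partial products, so the structure maps onto purely combinational logic. Checking that `tower_pow(a, 2**16 - 1)` returns 1 for non-zero `a` is a quick sanity test that the assumed constants define a field under this representation.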
The composite field construction for the multiplication-with-ξ block can be elaborated only after unfolding its constituent blocks, all of which are expressed in normal basis.
Implementation of Mψ and
... q(6) + q(5) + q(4) + q(1)
... q(7) + q(5) + q(3) + q(1) + q(0)
k_6 = q(7) + q(5) + q(5) + q(4) + q(2)
k_7 = q(7) + q(6) + q(4) + q(3)    (15)
k = {k_7, k_6
IV. Our Result and Comparison
The implementation resulted in a combinational logic circuit consisting of AND and XOR gates. Our ξ value is {11100011}_2. The design was implemented on a Virtex-4 FPGA using the Xilinx ISE tool. In [2] the authors suggested ψ = {10}_2 and λ = {1100}_2. For our implementation, we took ψ, λ and ξ as {10}_2, {1100}_2 and {11100011}_2 respectively. We achieved a critical path delay of 11.5 ns.
Our implementation results show that, without pipelining, we achieved a throughput of 5.39 Mbps on FPGA and 5.43 Gbps on the 90 nm ASIC. For synthesis and layout design we used TSMC 90 nm standard cell libraries, with Cadence RTL Compiler and Encounter as the synthesis and physical layout tools. The final layout of our implementation is shown in Fig. 14. The performance of our FPGA implementation is better than that of the other two designs listed in Table III. Our implementation on FPGA as well as ASIC consumes very low area and power without pipelining the architecture.
