I. Introduction
A finite field $GF(2^m)$ is a field that contains finitely many elements. It is especially useful for representing computer data, which is stored in binary form. Finite fields have wide applications in cryptography and error control coding [1], [2]. The finite field multiplier is the key arithmetic unit in many systems based on finite field computation, because complex operations such as division and inversion can be broken down into successive multiplications. Multiplication is therefore the most common arithmetic operation, and efficient multipliers are essential [3].
Both hardware and software architectures have been studied for computing multiplications over finite fields [4]. The most commonly used bases for finite fields are the polynomial basis (PB), normal basis (NB), triangular basis (TB), and redundant basis (RB) [5]. A basis is a set of vectors that, in linear combination, can represent every vector in a given vector space. The redundant basis is attractive because it offers free squaring and modular reduction for multiplication [7]. A redundant representation is derived from the minimal cyclotomic ring, and the arithmetic operations can be performed in the ring by embedding the field in it [9].
A number of structures have been designed for efficient multiplication over finite fields based on RB. A semi-systolic Montgomery multiplier is presented in [4]. A super-systolic multiplier has been reported by Pramod Kumar Mehar. Bit-serial/parallel multipliers [8] and comb-style architectures have been presented previously, and several other RB multipliers have been designed for hardware efficiency and throughput [6].
In this contribution, efficient high-throughput digit-serial/parallel multiplier designs over finite fields based on RB are presented. A novel recursive decomposition scheme is presented, from which parallel algorithms are obtained for high-throughput digit-serial multiplication. By mapping the parallel algorithm to a regular two-dimensional signal-flow graph (SFG) array, followed by projection of the SFG onto a one-dimensional processor-space flow graph (PSFG), the algorithm is mapped to three multiplier architectures. In this work, 10-bit digit-serial RB multipliers are implemented to obtain high throughput.
The paper is organized as follows. The mathematical representation is presented in Section II. High-throughput structures for digit-serial RB multipliers are derived from the proposed algorithm in Section III. Implementation and simulation results are presented in Section IV. Conclusions are presented in Section V.
II. Mathematical Representation
Let $x$ be a primitive $n$th root of unity. Elements of the finite field $GF(2^m)$ can then be represented in the form:
$$A = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \qquad (1)$$
where $a_i \in GF(2)$ for $0 \le i \le n-1$, and the set $\{1, x, x^2, \cdots, x^{n-1}\}$ is defined as the RB for the finite field elements, where $n$ is a positive integer not less than $m$.
When $(m + 1)$ is prime and 2 is a primitive root modulo $(m + 1)$, there exists a type I optimal normal basis (ONB) for the finite field.
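The type I ONB condition can be checked numerically. The following is a minimal sketch, not part of the original design flow; the function names `is_prime`, `multiplicative_order`, and `has_type1_onb` are hypothetical helpers introduced only for illustration.

```python
def is_prime(k: int) -> bool:
    """Trial-division primality test, adequate for small field sizes."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def multiplicative_order(a: int, p: int) -> int:
    """Smallest t >= 1 with a**t == 1 (mod p); assumes gcd(a, p) == 1."""
    t, value = 1, a % p
    while value != 1:
        value = (value * a) % p
        t += 1
    return t


def has_type1_onb(m: int) -> bool:
    """True when (m + 1) is prime and 2 is a primitive root modulo (m + 1)."""
    p = m + 1
    if m < 2 or not is_prime(p):
        return False
    # 2 is a primitive root modulo p exactly when its order is p - 1 = m.
    return multiplicative_order(2, p) == m
```

For example, `has_type1_onb(10)` holds because 11 is prime and 2 has order 10 modulo 11, whereas `has_type1_onb(6)` does not, since 2 has order 3 modulo 7.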
where $(i - j)_n$ denotes reduction modulo $n$. Define $C = \sum_{i=0}^{n-1} c_i x^i$, where $c_i \in GF(2)$; we have:
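As a hedged illustration of the product coefficients, the sketch below assumes only that $x^n = 1$, so that exponents wrap modulo $n$ and each coefficient is the cyclic convolution $c_i = \sum_j a_j b_{(i-j)_n}$ over $GF(2)$. This is the standard form for such a representation and is stated here as an assumption; the helper names `rb_multiply` and `bits` are hypothetical.

```python
def rb_multiply(a, b):
    """Multiply two RB elements given as coefficient lists over GF(2).

    a[i] and b[i] are the coefficients of x**i; both lists have length n.
    Assumes x**n = 1, so exponents wrap modulo n (cyclic convolution).
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length n")
    c = [0] * n
    for i in range(n):
        for j in range(n):
            # c_i accumulates a_j * b_{(i - j) mod n}; addition in GF(2) is XOR.
            c[i] ^= a[j] & b[(i - j) % n]
    return c


def bits(value, n):
    """Little-endian coefficient list of an integer (bit i is the x**i coefficient)."""
    return [(value >> i) & 1 for i in range(n)]
```

With $n = 10$, `rb_multiply(bits(0b101, 10), bits(0b110, 10))` equals `bits(0b11110, 10)`, and multiplying $x^9$ by $x$ wraps around to 1.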
Alternatively, (8) can be written in bit-level matrix-vector form as:
From (9), the shifted forms of the input operand B can be defined as follows:
where
The recursions in (13) can be extended further to give:
where the shift index lies between 1 and $n - 1$. Let $Q$ and $P$ be two integers such that $n = QP + r$, where $0 \le r < P$. For simplicity, assume $r = 0$, and decompose the input operand $A$ into $Q$ bit-vectors $A_u$ for $u = 0, 1, \cdots, Q-1$, as follows:
Similarly, we can produce $Q$ shifted vector operands $B_u$ for $u = 0, 1, \cdots, Q-1$, as follows:
The product $C = AB$ obtained from (6) is decomposed into $Q$ products of the vectors $A_u$ and $B_u$ for $u = 0, 1, \cdots, Q-1$ as:
Note that $A_u$, for $u = 0, 1, \cdots, Q-1$, is a $P$-point bit-vector, and $B_u$, for $u = 0, 1, \cdots, Q-1$, consists of $P$ bit-shifted forms of the operand $B$. Based on (21) and (22), the proposed digit-serial algorithm is described.
Algorithm for proposed digit-serial RB multiplication 
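The decomposition can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the algorithm listing itself: it assumes $n = QP$ ($r = 0$), that $A_u$ collects the bits $a_u, a_{u+Q}, \cdots, a_{u+(P-1)Q}$ (a layout inferred from the one-position and $Q$-position shifts used in Section III), and that the shifted forms of $B$ are circular rotations. All function names are hypothetical.

```python
def to_bits(value, n):
    """Little-endian coefficient list (bit i is the x**i coefficient)."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(coeffs):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(coeffs))


def rotate(coeffs, s):
    """Circular shift by s positions, i.e. multiplication by x**s when x**n = 1."""
    n = len(coeffs)
    return [coeffs[(i - s) % n] for i in range(n)]


def decompose(a, P, Q):
    """Split A into Q vectors of P bits: A_u = [a_u, a_{u+Q}, ...] (assumed layout)."""
    return [[a[u + j * Q] for j in range(P)] for u in range(Q)]


def digit_serial_rb_multiply(a, b, P, Q):
    """Accumulate the Q partial products of A_u with the shifted forms of B."""
    n = len(a)
    if n != P * Q or len(b) != n:
        raise ValueError("this sketch assumes n = Q * P (r = 0)")
    c = [0] * n
    for u, a_u in enumerate(decompose(a, P, Q)):  # one P-bit digit per step
        for j, bit in enumerate(a_u):
            if bit:
                shifted = rotate(b, u + j * Q)    # B shifted by u + jQ positions
                c = [ci ^ si for ci, si in zip(c, shifted)]
    return c
```

For instance, with $n = 10$, $P = 5$, and $Q = 2$ (illustrative values), `from_bits(digit_serial_rb_multiply(to_bits(5, 10), to_bits(6, 10), 5, 2))` gives 30, matching the direct cyclic convolution of the two operands.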
III. High-Throughput Structures For Digit-Serial RB Multipliers
3.1 Structure-I for Digit-Serial RB Multiplier
The proposed digit-serial RB multiplier is derived from the SFG of the proposed algorithm. From (21) and (22), RB multiplication can be represented by the two-dimensional SFG shown in Fig. 1. The SFG consists of Q parallel arrays, each with (P-1) bit-shifting nodes (S nodes). The S nodes are of two types, S-I and S-II: an S-I node performs a circular bit-shift by one position, and an S-II node performs a circular bit-shift by Q positions. Each array also contains P multiplication nodes (M nodes) and addition nodes (A nodes), whose functions are described in Fig. 2(b) and 2(c). Each M node performs the AND operation of a serial-input bit of A with the bit-shifted form of the input B, and each A node performs an XOR operation. The output of each array is obtained by bit-by-bit XOR of the operands, as shown in Fig. 1, and the required product word is obtained by adding the outputs of the Q parallel arrays. To obtain the PSFG (Fig. 3), the SFG is projected along the j-direction for digit-serial multiplication. In the PSFG, P input bits are fed in parallel to the multiplication nodes during each clock cycle. The functionality of the PSFG is the same as that of the SFG in Fig. 1, except that it contains an extra add-accumulation (AA) node, which performs the accumulation operation to produce the final result.
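The node functions described above can be modelled behaviourally. The sketch below is an assumption-laden software model, not the Verilog implementation: it treats each word as an $n$-bit integer with bit $i$ holding the coefficient of $x^i$, fixes $n = 10$ for illustration, and uses hypothetical names (`s_node`, `m_node`, `a_node`, `psfg_multiply`).

```python
N = 10                 # word length n (assumed, matching the 10-bit designs)
MASK = (1 << N) - 1


def s_node(word, positions):
    """Circular left shift of an N-bit word (S-I: one position, S-II: Q positions)."""
    positions %= N
    return ((word << positions) | (word >> (N - positions))) & MASK


def m_node(a_bit, word):
    """AND one serial-input bit of A with every bit of the shifted operand."""
    return word if a_bit else 0


def a_node(x, y):
    """Bit-by-bit XOR, i.e. addition in GF(2)."""
    return x ^ y


def psfg_multiply(a, b, P, Q):
    """Model of the PSFG: P bits of A per clock cycle, Q cycles per product."""
    acc = 0                                # register of the AA node
    row = b                                # B shifted by u positions in cycle u
    for u in range(Q):
        word, partial = row, 0
        for j in range(P):                 # the P bits enter in parallel
            a_bit = (a >> (u + j * Q)) & 1
            partial = a_node(partial, m_node(a_bit, word))
            word = s_node(word, Q)         # S-II between neighbouring nodes
        acc = a_node(acc, partial)         # AA node accumulates the digit result
        row = s_node(row, 1)               # S-I between successive cycles
    return acc
```

With the illustrative choice $P = 5$, $Q = 2$, `psfg_multiply(0b101, 0b110, 5, 2)` returns `0b11110`.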
Fig. 2. Node functions: bit-shifting, (Xin) → yout; Xin1 · Xin2 → yout.
The digit-serial RB multiplier shown in Fig. 4 is referred to as Structure-I. Structure-I consists of three blocks: the bit-permutation module (BPM), the partial product generation module (PPGM), and the finite field accumulator module. The BPM rewires the bits of input B, and its output is fed to the PPGM. The PPGM consists of AND, XOR, and register cells, which carry out the function of the M node. The finite field accumulator consists of n-bit parallel accumulation units: the newly received input is added to the previously accumulated result, and the sum is retained in the register cell for use in the next cycle. Successive outputs are obtained in this way. Fig. 7 shows the structure of the PPGM, which consists of XOR, AND, and register cells with n parallel input bits and n parallel output bits.
3.2 Modification of Structure-I for Digit-Serial RB Multiplier
For any integer $P$, we can write $P = kd + l$, where $0 \le l < d$ and $d < P$. For simplicity, we assume $l = 0$; the result can easily be extended to cases where $l \ne 0$. Define an index $h$ with $0 \le h \le k - 1$ and a second index ranging from 0 to $d - 1$, such that (22) can be rewritten as:
Implementation of High-Throughput Digit-Serial Redundant Basis Multipliers over Finite Field
DOI: 10.9790/4200-0604013545 www.iosrjournals.org
Based on (23), the PSFG is modified to obtain a suitable digit-serial multiplier structure (Fig. 6): a set of shifting nodes, a set of multiplication nodes, and a set of addition nodes of the PSFG are combined to form a merged node. These merged nodes are implemented by new PPGUs, giving a PPGM of P/2 PPGUs. Accordingly, two PPGUs of the structure in Fig. 4 are merged into one new PPGU, which consists of two AND cells and two XOR cells; when d = 2, the first PPGU of Structure-I needs only one XOR cell. The functions of the AND, XOR, and register cells are the same as in Structure-I (Fig. 4).
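The regrouping of P single-bit units into merged units of d bits can be illustrated in software. The following is a rough sketch assuming $l = 0$ (so $P = kd$) and circular shifts by $Q$ positions between successive bits; the names `rotl`, `merged_ppgu`, and `one_cycle` are hypothetical, and register placement is not modelled.

```python
def rotl(word, s, n):
    """Circular left shift of an n-bit word by s positions."""
    s %= n
    return ((word << s) | (word >> (n - s))) & ((1 << n) - 1)


def merged_ppgu(a_bits, word, Q, n):
    """One merged unit: d AND steps and d XOR steps on consecutive shifted operands."""
    partial = 0
    for a_bit in a_bits:
        if a_bit:
            partial ^= word          # AND with the bit, then XOR into the partial sum
        word = rotl(word, Q, n)      # shift by Q before the next bit
    return partial, word


def one_cycle(a_digit, word, d, Q, n):
    """Process one P-bit digit with P // d merged units; returns (sum, unit count)."""
    P = len(a_digit)
    if P % d:
        raise ValueError("this sketch assumes P = k * d (l = 0)")
    total, units = 0, 0
    for h in range(P // d):
        partial, word = merged_ppgu(a_digit[h * d:(h + 1) * d], word, Q, n)
        total ^= partial
        units += 1
    return total, units
```

For example, with $n = 10$, $P = 10$, $Q = 1$, and $d = 2$ (illustrative values), the digit `[1, 0, 1, 0, 0, 0, 0, 0, 0, 0]` and operand 6 give the result 30 using 5 merged units, the same result that 10 single-bit units would produce.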
3.3 Structure-II for Digit-Serial RB Multiplier
Structure-II for the digit-serial RB multiplier is shown in Fig. 9. The (P-1) serially connected A nodes of the PSFG are merged into a pipelined form of (P-2) A nodes, constructed using a pipelined XOR tree. Because the AND cells are arranged in parallel, no '0' padding is needed at the input to meet the timing requirement. The function is the same as that of Structure-I.
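The idea of replacing a serial chain of XOR operations by a tree can be sketched as follows. This is an illustrative model that assumes a balanced binary tree and ignores pipeline registers; `xor_tree` is a hypothetical helper, not a cell of the actual design.

```python
def xor_tree(words):
    """Reduce a list of words with a balanced XOR tree; returns (result, levels)."""
    if not words:
        raise ValueError("need at least one word")
    level, depth = list(words), 0
    while len(level) > 1:
        # Pair up neighbouring words; each pair is one XOR cell in this tree level.
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])    # an unpaired word passes through to the next level
        level, depth = nxt, depth + 1
    return level[0], depth
```

Five partial products, for instance `[6, 0, 24, 0, 0]`, are reduced to 30 in three tree levels, compared with four XOR operations in sequence for a serial chain.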
3.4 Structure-III for Digit-Serial RB Multiplier
In this structure, bit-addition and bit-multiplication are carried out concurrently, which increases the throughput. Structure-III for the digit-serial RB multiplier is shown in Fig. 10. It contains (P+1) PPGUs, each with a single AND cell, a single XOR cell, and two register cells. The first output of Structure-III is obtained after (P+Q+1) cycles, and successive outputs are obtained every Q cycles. Fig. 5 shows the structure of the bit-permutation module, and Fig. 7(a), 7(b), and 7(c) show the structures of the AND cell, XOR cell, and register cell of the PPGM. The inputs are applied in parallel to the AND cell and its outputs are obtained in parallel; the XOR cell and register cell operate in the same way, each with n parallel inputs and n parallel outputs. Fig. 8 shows the structure of the finite field accumulator, which also consists of XOR cells and register cells with parallel inputs and parallel outputs.
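The timing stated for Structure-III, a first output after (P+Q+1) cycles and one further output every Q cycles, can be tabulated with a small helper. This is simple arithmetic on the figures quoted in the text; the function name and the example values $P = 5$, $Q = 2$ are assumptions for illustration only.

```python
def structure3_output_cycle(k, P, Q):
    """Cycle at which the k-th product (k = 0, 1, 2, ...) becomes available."""
    if k < 0 or P < 1 or Q < 1:
        raise ValueError("k must be non-negative and P, Q positive")
    # Initial latency of (P + Q + 1) cycles, then one product every Q cycles.
    return (P + Q + 1) + k * Q
```

With $P = 5$ and $Q = 2$, the first three products would appear at cycles 8, 10, and 12 under these assumptions.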
IV. Implementation And Simulation Results
The proposed structures (Case 1, Case 2, and Case 3) are written in Verilog HDL, and synthesized and simulated using Xilinx 12.2. The simulation results and RTL schematics of the 10-bit signal-flow graph (SFG), processor-space flow graph (PSFG), and proposed structures (Case 1, Case 2, and Case 3) are shown below. The simulation result of the 10-bit PSFG is shown in Fig. 13. The inputs are a=0000000110 and b=0000001000, and the output obtained is c=00000000000000011000.
Case 1:
The simulation result of the 10-bit Structure-I for the digit-serial RB multiplier is shown in Fig. 15. The inputs are a=00000000100 and b=0000000011, and the output obtained by performing the shifting, multiplication, and addition operations is c=0000001100. The detailed RTL schematic of the 10-bit Structure-I is shown in Fig. 16; it has 10-bit input operands a and b, with clock and reset, and produces the 10-bit output c. The simulation result of the 10-bit Structure-I when d=2 is shown in Fig. 17. The input operands are a=0000000101 and b=0000000110, with clock=1 and reset=0, and the output obtained is c=0000011110. The detailed RTL schematic of the 10-bit Structure-I when d=2 is shown in Fig. 18; it has 10-bit operands a and b, with clock and reset, and produces the 10-bit output c.
Case 2:
The simulation result of the 10-bit Structure-II for the digit-serial RB multiplier is shown in Fig. 19. The input operands are a=0000000011 and b=0000000101, with clock=1 and reset=0, and the output obtained is c=0000001111.
Fig. 19. Simulation result of 10-bit Structure-II for digit-serial RB multiplier
The detailed RTL schematic of the 10-bit Structure-II is shown in Fig. 20; it has 10-bit input operands a and b, with clock and reset, and produces the 10-bit output c.
Case 3:
The simulation result of the 10-bit Structure-III for the digit-serial RB multiplier is shown in Fig. 21. The input operands are a=0000000011 and b=0000001000, with clock=1 and reset=0, and the output obtained is c=0000011000. The detailed RTL schematic of the 10-bit Structure-III is shown in Fig. 22; it has 10-bit input operands a and b, with clock and reset, and produces the 10-bit output c.
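The operand and result values quoted for the three structures can be cross-checked in software. The sketch below assumes that each binary string is written most-significant bit first, that bit $i$ holds the coefficient of $x^i$, and that the product is a cyclic (wrap-around) carry-less multiplication with $n = 10$; the names `clmul_cyclic`, `REPORTED`, and `check_reported` are hypothetical.

```python
def clmul_cyclic(a, b, n):
    """Carry-less product of a and b with exponents wrapped modulo n (x**n = 1)."""
    mask = (1 << n) - 1
    c = 0
    for k in range(n):
        if (a >> k) & 1:
            # XOR in b circularly shifted left by k positions.
            c ^= ((b << k) | (b >> (n - k))) & mask
    return c


# (label, a, b, c) exactly as quoted in the text for Cases 1 to 3.
REPORTED = [
    ("Structure-I", "00000000100", "0000000011", "0000001100"),
    ("Structure-I, d=2", "0000000101", "0000000110", "0000011110"),
    ("Structure-II", "0000000011", "0000000101", "0000001111"),
    ("Structure-III", "0000000011", "0000001000", "0000011000"),
]


def check_reported(n=10):
    """Map each label to whether the quoted c equals the computed product."""
    return {
        name: clmul_cyclic(int(a, 2), int(b, 2), n) == int(c, 2)
        for name, a, b, c in REPORTED
    }
```

Under these assumptions, `check_reported()` reports agreement for all four quoted results.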
V. Conclusion
The proposed structures (Case 1, Case 2, and Case 3) for digit-serial RB multipliers are implemented in Verilog HDL using a novel recursive decomposition algorithm. The 10-bit proposed structures are synthesized and simulated using Xilinx 12.2. The structures are obtained by projecting the signal-flow graph onto the processor-space flow graph in order to achieve high throughput. These multipliers can be selected according to application requirements, mostly in cryptographic applications. The detailed RTL schematics of the proposed structures are also obtained.
