I. Introduction
Cryptography covers the encryption of data and is essential for secure data communication. It is now used in a wide variety of applications. A secure cryptoprocessor is a dedicated microprocessor chip for carrying out cryptographic operations. Every application has its own design criteria and places special requirements on the hardware design. The most fundamental decision for a future hardware design is whether the cryptoprocessor should be based on a binary extension field or a prime field. Only a few papers have compared binary and prime fields in hardware. Johannes Wolkerstorfer designed a dual-field arithmetic unit for GF(p) and GF(2^m) [1], and Jun-Hong Chen et al. designed a high-performance unified-field reconfigurable cryptoprocessor [2].
Cryptographic operations on integer values are carried out in the prime field GF(p). In the binary extension fields GF(2^m), elements are represented as polynomials instead of integers. Binary fields GF(2^m) are considered advantageous for hardware solutions because addition and modular reduction of polynomials are somewhat easier than those of integers. We present a dual-field arithmetic unit that can perform the operations in both types of fields, GF(p) and GF(2^m). The results for a reconfigurable multicore cryptoprocessor for multi-channel communication systems [3] show an increase in execution speed. To increase execution speed, this paper designs a quad-core cryptoprocessor whose cores execute instructions concurrently. Parallelization of hardware designs is a significant ongoing research topic. At the same time, to reduce power consumption, the design is implemented on an FPGA.
Santosh Ghosh et al. designed a secure dual-core cryptoprocessor using prime field instructions [4]. This paper concentrates on designing a quad-core cryptoprocessor that executes both prime field and binary extension field instructions.
II. Implementing Prime Field And Binary Extension Field In FPGA
The prime field and binary extension field units each consist of several hardware blocks. This section discusses the hardware design of both units.
Prime field unit architecture
Blakley introduced an algorithm to perform modular multiplication of two integers A and B modulo an integer M [5]. It is an iterative binary double-and-add algorithm. Its main idea is to keep the intermediate result below the modulus after each iteration, which avoids a final division. In this paper the modulus M corresponds to p, and we call the operation Fp multiplication. All arithmetic in Fp is performed in the two's complement number system, which avoids the input and output conversions needed by existing implementations [6], [7].
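The double-and-add iteration described above can be sketched in software as follows. This is a minimal Python illustration, not the hardware datapath of this paper; the function name blakley_modmul is hypothetical, and the sketch assumes both operands are already reduced (0 <= a, b < m), so that two conditional subtractions per iteration keep the running result below the modulus.

```python
def blakley_modmul(a: int, b: int, m: int) -> int:
    """Interleaved (double-and-add) modular multiplication: a * b mod m.

    Assumes 0 <= a, b < m, so two conditional subtractions per
    iteration are enough and no final division is needed.
    """
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must satisfy 0 <= a, b < m")
    r = 0
    # Scan the bits of a from most significant to least significant.
    for i in reversed(range(a.bit_length())):
        r = 2 * r                  # "double": left shift of the running result
        if (a >> i) & 1:
            r += b                 # "add": accumulate b when bit i of a is set
        # Here r <= 3m - 3, so at most two subtractions bring it below m.
        if r >= m:
            r -= m
        if r >= m:
            r -= m
    return r
```

Because the reduction is interleaved with the accumulation, the intermediate value never grows beyond roughly three times the modulus, which is what keeps the operand width bounded.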
The main difficulty of the Blakley algorithm is the addition of large operands. Modified Blakley algorithms for large operands are presented in [8] and [9]. Using a carry-save adder (CSA) speeds up the repeated additions on large operands. However, these modified versions require at least one final addition over a long carry chain. They also use some pre-computed values, which require additional time and storage area. The architecture for the prime field consists of several independent blocks that operate in parallel to accelerate the respective operations. The adder unit performs the additions and subtractions involved in the algorithm. Multiplication is performed with the help of the left shifter. The instruction is decoded first, and the algorithm is selected according to the instruction. The control signals used to execute the instruction are then generated according to the selected algorithm.
Binary extension field unit block diagram

Fig.3. Binary extension Unit Block Diagram
The binary extension unit is designed for the polynomial p(x) = x^4 + x + 1. Therefore, out of the 256 input bits, 4 bits are taken by each binary extension unit block. The binary extension field unit provides GF addition, multiplication and double. Binary field addition is implemented with XOR gates.
Each level of the hierarchy adds one additional MUX to the critical path. Thus the latency of a 256-bit adder is 1 FCC + 3 MUX delays, which is 9.9 ns on a Virtex-4 FPGA, whereas the latency of a 256-bit carry look-ahead adder on the same platform is 16.7 ns, about 1.7 times slower. We develop a programmable Fp primitive based on the above 256-bit high-speed adder circuit. The prime field operations carried out are addition, subtraction, and multiplication. Fig. 1 depicts the overall architecture of the resulting Fp adder/subtractor/multiplier unit, including the internal dataflow of the A256 blocks.
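A latency of one carry chain plus three MUX delays is what a three-level carry-select hierarchy would give. The Python sketch below models such a hierarchy under stated assumptions: the carry-select structure itself, the 32-bit base block width, and the function name carry_select_add are all assumptions made for illustration and are not taken from the text. It returns the MUX depth alongside the sum so the "one MUX per level" count can be checked.

```python
def carry_select_add(a: int, b: int, cin: int, width: int, base: int = 32):
    """Hierarchical carry-select addition (illustrative model only).

    Returns (sum, carry_out, mux_levels). The base width of 32 bits is
    an assumption; blocks of that width stand in for one carry chain.
    Assumes 0 <= a, b < 2**width and cin in (0, 1).
    """
    if width <= base:
        total = a + b + cin
        return total & ((1 << width) - 1), total >> width, 0
    half = width // 2
    mask = (1 << half) - 1
    # Lower half uses the real carry-in.
    lo_sum, lo_carry, lo_depth = carry_select_add(a & mask, b & mask, cin, half, base)
    # Upper half is computed for both possible carry-ins ...
    hi0 = carry_select_add(a >> half, b >> half, 0, half, base)
    hi1 = carry_select_add(a >> half, b >> half, 1, half, base)
    # ... and a MUX picks the right one once the lower carry is known.
    hi_sum, hi_carry, hi_depth = hi1 if lo_carry else hi0
    return (hi_sum << half) | lo_sum, hi_carry, max(lo_depth, hi_depth) + 1
```

With these assumptions a 256-bit addition passes through three selection levels (32 to 64 to 128 to 256 bits), matching the three MUX delays quoted above.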
The multiplier unit is designed using a look-up table, so this block consists of memory blocks that store all possible outputs. The GF double unit consists of a simple addition unit.
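The XOR-based addition and the table-driven multiplication for p(x) = x^4 + x + 1 can be sketched in Python as follows. This is a software illustration only: the names gf_add, gf_mul and MUL_TABLE are hypothetical, and filling the table with a shift-and-XOR routine is an assumption about how such a table could be generated, not a description of the memory contents used in the design.

```python
POLY = 0b10011  # p(x) = x^4 + x + 1
M = 4           # field is GF(2^4), elements are 4-bit values

def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^m) is a bitwise XOR of the coefficient vectors."""
    return a ^ b

def _gf_mul_shift_xor(a: int, b: int) -> int:
    """Polynomial multiplication reduced modulo p(x), bit by bit."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a            # add the current multiple of a
        b >>= 1
        a <<= 1               # multiply a by x
        if a & (1 << M):
            a ^= POLY         # reduce when degree reaches 4
    return r

# 16 x 16 table holding every possible product, as a look-up memory would.
MUL_TABLE = [[_gf_mul_shift_xor(a, b) for b in range(1 << M)]
             for a in range(1 << M)]

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^4) by table look-up."""
    return MUL_TABLE[a][b]
```

For a = 5h and b = 4h this sketch gives a product of 7h, the same value as the binary extension field multiplication result reported in Section V.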
III. Instructions Implementation For Prime Field And Binary Extension Field
Instruction 2 - Computation of Fp addition
The Fp addition is based on the following algorithm 
Instruction 4 - Binary extension field multiplication
Quad Core Dual Field Cryptoprocessor On Fpga Platform
Instruction 6 - Binary extension field GF(Double)
The same point value is added to itself, so the result is double the value of the present point in the graph.
Result = a + a

Table 1: Instruction table
Sl. no. | Instruction | Operation | Field
1 | Interleaved multiplication based on Montgomery | Result = a.b mod p | GF(p)
2 | Addition in prime field | Result = a + b mod p | GF(p)
3 | Subtraction in prime field | Result = a - b mod p | GF(p)
4 | GF_multiplication | Result = a * b | GF(2^m)
  | GF_Double | Result = (r3,r4) + (r3,r4) | GF(2^m)
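The prime field addition and subtraction entries of Table 1 can be illustrated with a short Python sketch. It assumes the operands are already reduced (0 <= a, b < p), so a single conditional correction replaces a general modulo operation; the function names fp_add and fp_sub are hypothetical and the sketch does not model the two's complement datapath of the hardware.

```python
def fp_add(a: int, b: int, p: int) -> int:
    """a + b mod p for reduced operands, using one conditional subtraction."""
    s = a + b
    return s - p if s >= p else s

def fp_sub(a: int, b: int, p: int) -> int:
    """a - b mod p for reduced operands, using one conditional addition."""
    d = a - b
    return d + p if d < 0 else d
```

Since a + b is below 2p and a - b is above -p for reduced operands, one correction step is always sufficient.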
IV. Execution Of Parallel Instructions.
Fig. 5 shows the block diagram of the quad-core execution unit. The mechanism and regularity of data access for computing all instructions are fairly simple. Instructions are supplied by the instruction memory, and data are supplied by the data memory. The microcode sequence unit checks whether the incoming instruction is valid and enables the execution unit only if it is. The execution unit is called the configurable arithmetic unit (CAU). Each CAU contains both the prime field execution unit and the binary extension field execution unit. The data accesses and instruction sequences are hard-coded into the sequence control of the architecture, which avoids additional software development costs. The quad core contains four CAUs, each of which executes either a prime field or a binary extension field instruction and produces the result.
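The flow described above (validity check, then dispatch to one of four CAUs) can be modelled in Python as a rough behavioural sketch. Everything named here is an assumption for illustration: the opcode numbers simply follow the serial numbers of Table 1, the tuple layout (opcode, a, b, p) is invented, and Python threads only model the dispatch structure, not true hardware concurrency.

```python
from concurrent.futures import ThreadPoolExecutor

def gf16_mul(a: int, b: int, poly: int = 0b10011) -> int:
    """Multiplication in GF(2^4) modulo x^4 + x + 1 (shift-and-XOR)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= poly
    return r

# Hypothetical opcode map; numbers follow the serial numbers of Table 1.
OPS = {
    1: lambda a, b, p: (a * b) % p,   # prime field multiplication
    2: lambda a, b, p: (a + b) % p,   # prime field addition
    3: lambda a, b, p: (a - b) % p,   # prime field subtraction
    4: lambda a, b, p: gf16_mul(a, b) # binary extension field multiplication
}

def cau_execute(opcode: int, a: int, b: int, p: int) -> int:
    """One configurable arithmetic unit: runs a single valid instruction."""
    return OPS[opcode](a, b, p)

def run_quad_core(instructions):
    """Dispatch up to four (opcode, a, b, p) tuples, one per CAU.

    Invalid opcodes are never passed to a CAU and yield None,
    mirroring the validity check of the microcode sequence unit.
    """
    if len(instructions) > 4:
        raise ValueError("the quad core accepts at most four instructions")
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(cau_execute, *ins) if ins[0] in OPS else None
            for ins in instructions
        ]
        return [f.result() if f is not None else None for f in futures]
```

A call such as run_quad_core([(2, 5, 4, 7), (3, 5, 4, 7), (4, 5, 4, 7)]) issues a prime field addition, a prime field subtraction and a binary extension field multiplication in the same step, one per core.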
V. Implementation Results
The whole design was written in Verilog with the Xilinx ISE design suite, targeting Spartan-3E, Virtex-4 and Virtex-5 devices. Performance is compared between the architectures. The synthesis output of Xilinx ISE is shown in Fig. 6.
Fig. 6: Synthesis output of the design
Various combinations of instructions executing in parallel were verified. The instruction selection and the loading of the input values a and b are built into the code. As an example, prime field addition, prime field subtraction, binary extension field multiplication and binary extension field double were executed for the same input values a=5h, b=4h and p=7h, and the simulation output is shown in Fig. 7. The output is 102h for prime field addition, 1h for subtraction, 7h for binary extension field multiplication, and 1h for binary extension field double. This simulation output is shown in the 
VI. Conclusion
The quad-core structure executes instructions from both fields in parallel. Four instructions execute at the same time, and this parallelism increases the speed. This architecture is an example cryptoprocessor design for executing prime field and binary field operations. This work has been further developed toward a superscalar architecture with a larger number of instructions.
