Abstract - This paper presents an optimized, combinational-logic-based Rijndael S-box implementation for the SubByte transformation in the Advanced Encryption Standard (AES) algorithm on FPGA. The S-box dominates the hardware complexity of the AES cryptographic module, so we implement its mathematical equations with optimized combinational logic circuits in order to reduce dynamic power consumption. The complete data path of the S-box is simulated as a netlist of AND, OR, NOT, and XOR logic gates, and a 4-stage pipeline is used to increase speed and the maximum operating frequency. The proposed combinational-logic-based S-box has been synthesized and implemented using Xilinx ISE V7.1 on a Virtex-4 FPGA, target device Xc4vfx100. Power was analyzed with the Xilinx XPower analyzer; the power consumption is 29 mW at a clock frequency of 100 MHz. The Place and Route report indicates a maximum clock frequency of 209.617 MHz.
I. INTRODUCTION
Cryptography is the science of information and communication security. It is the science of secret codes, enabling confidential communication over an insecure channel. It protects information against unauthorized parties by preventing unauthorized alteration or use. A cryptographic system transforms a plaintext into a ciphertext, in most cases using a key [1]. Byte substitution and inverse byte substitution are the most complex steps in the AES encryption and decryption processes. In these steps, each byte of the state array is replaced with its equivalent byte in the S-box or the inverse S-box. Because the AES algorithm uses elements of $GF(2^8)$, each element of the state array is a byte with a value between 00H and FFH. The S-box has a fixed size of 256 bytes, represented as a 16×16 byte matrix [2]. In this paper we propose an optimized and pipelined architecture, based on combinational logic, for the S-box block in AES. The proposed design uses a minimum number of logic gates.
In recent years, a number of FPGA implementations of the S-box have been proposed [3]-[17]. Some of them are summarized below. In [3], a software method for producing the multiplicative inverse values that generate the S-box values is discussed, together with the possibilities for implementing it in hardware. The method uses log and antilog values and is modified to create a memory-less value generator for hardware-based AES implementations. In [4], an improved masked AND gate is proposed in which the relationship between the masked input values and the masks is nonlinear. Usually, when S-box operations are converted from $GF(2^8)$ to $GF(((2^2)^2)^2)$, all the necessary computations become XOR and AND operations.
Therefore, fully masking the AES S-box amounts to replacing the unmasked XOR and AND operations with the proposed masked AND gate and a protected XOR gate. In [5], a general method for sharing common subexpressions derived from algebraic finite fields is proposed, together with a randomly configurable architecture for protecting the S-box transformation. The work in [6] presents a compact implementation of the S-box of the Pomaranch stream cipher using composite field arithmetic in $GF((2^3)^3)$. It describes a systematic exploration of different choices for the irreducible polynomials that generate the extension fields, and it examines all possible transformation matrices that map one field representation to another. In [7], countermeasure techniques for AES with S-box hiding are proposed, using four different composite-field implementations of the S-box. The work in [8] employs a combinational logic design of the S-box implemented in FPGA. The architecture applies Boolean simplification to the truth table of the logic function with the aim of reducing the delay, and the S-box is built from basic gates such as AND, NOT, and OR gates and multiplexers. The work in [9] presents an FPGA implementation and overhead evaluation of an algorithmic Differential Power Attack (DPA) countermeasure for AES. The work in [10] presents a new efficient method for implementing the AES byte substitution function. It is aimed at AES implementations in non-volatile FPGAs featuring volatile embedded RAM blocks. The method uses a pair of linear feedback shift registers to generate substitution tables into the embedded RAMs. That solution requires less space and is faster than one implementing whole S-boxes in the logic area, and it is especially suited to a power-aware AES implementation.
The work in [11] investigates a new compact digital hardware implementation of the AES structure with integrated S-box and inverse S-box transformations, which minimizes the computation cost of the relevant arithmetic in the finite field $GF(2^8)$, including the cost of the mapping. This approach has advantages over a straightforward implementation using read-only memories for table lookups. The resulting S-box design with subfield operations in $GF(((2^2)^2)^2)$ reduces the reconfigurable logic gate count by 81% compared with a look-up table (LUT) implementation, and is 23% better in area and 3% faster than a design using $GF((2^4)^2)$. A high-speed architecture for the composite field arithmetic based S-box transformation used in AES is presented in [12]. In [13], two instructions for S-box access are designed by constructing a novel flexible on-chip parallel substitution box unit that consists of multiple LUTs and a postprocessing module. The unit is integrated into the 32-bit configurable Leon2 processor, whose configuration is also presented. Implementing this extended processor core on FPGA shows that the parallel substitution box unit uses a very small amount of hardware resources. In another work, the proposed architecture is derived by extending the precomputation technique suggested by Liu and Parhi [14] to the AES S-box architecture of Rashmi, Mohan and Anami [15]. Elsewhere, to reduce implementation overhead, the masked compact S-box proposed by Canright [16] was chosen to implement a DPA countermeasure on an SRAM FPGA.
This paper is organized as follows. Section II describes the SubByte transformation, the proposed method, and the proposed architecture. Section III compares the hardware implementation and chip utilization reported by Xilinx ISE, which verifies the performance of the proposed work. Section IV is the conclusion.
II. THE SUBBYTE TRANSFORMATION
This paper presents a combinational-logic-based Rijndael S-box implementation for the SubByte transformation in the AES algorithm on FPGA. Our S-box implementation follows [17], [18]. Implementing the S-box in combinational logic gives small area occupancy and high throughput compared with the typical ROM-based LUT implementation, whose access time is fixed and cannot be broken into shorter pipeline stages. The SubByte transformation is computed by taking the multiplicative inverse in $GF(2^8)$ followed by an affine transformation [17].
SubByte:
1- Multiplicative inversion in $GF(2^8)$
2- Affine transformation
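The affine transformation can be represented in matrix form as shown below, where $a = q^{-1}$ is the multiplicative inverse of the input byte $q$, $a_7$ is the most significant bit, and $a_0$ is the least significant bit:
$$
AT(a) =
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \end{bmatrix}
\oplus
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
$$
Here AT is the affine transformation. The SubByte transformation therefore involves a multiplicative inversion operation. This section illustrates the steps involved in constructing the multiplicative inverse module for the S-box using composite field arithmetic. The multiplicative inverse computation is covered first, and the affine transformation then follows to complete the methodology for constructing the S-box for the SubByte operation. The individual bits of a byte representing a $GF(2^8)$ element can be viewed as the coefficients of the power terms in the $GF(2^8)$ polynomial. For instance, $\{10001011\}_2$ represents the polynomial $q^7 + q^3 + q + 1$ in $GF(2^8)$.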
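As a cross-check on the two SubByte steps, the following minimal Python sketch computes the transformation directly in $GF(2^8)$, without the composite-field decomposition developed in the rest of this section. It assumes the standard AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$ and the affine constant 63H from the AES specification; the function names are illustrative and are not part of the proposed design.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # reduce as soon as the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return result


def gf256_inv(a: int) -> int:
    """Multiplicative inverse computed as a^254; 0 is mapped to 0."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result


def affine(a: int) -> int:
    """Affine transformation: bit i is a_i ^ a_(i+4) ^ a_(i+5) ^ a_(i+6) ^ a_(i+7) ^ c_i."""
    out = 0
    for i in range(8):
        bit = a >> i
        for shift in (4, 5, 6, 7):
            bit ^= a >> ((i + shift) % 8)
        bit ^= 0x63 >> i   # constant vector c = 63H
        out |= (bit & 1) << i
    return out


def sub_byte(q: int) -> int:
    """SubByte = multiplicative inversion followed by the affine transformation."""
    return affine(gf256_inv(q))


if __name__ == "__main__":
    print(f"{sub_byte(0x53):02X}")
```

A table generated this way is useful as a golden reference when simulating a gate-level S-box, since every one of the 256 inputs can be compared exhaustively.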
According to [18], any arbitrary polynomial can be represented as $bx + c$, given an irreducible polynomial $x^2 + Ax + B$. Thus, an element of $GF(2^8)$ may be represented as $bx + c$, where $b$ is the most significant nibble and $c$ is the least significant nibble. The multiplicative inverse can then be computed using the equation below [18].
$$
(bx + c)^{-1} = b\,(b^2 B + bcA + c^2)^{-1}\,x + (c + bA)\,(b^2 B + bcA + c^2)^{-1}
$$
In [17], the irreducible polynomial selected is $x^2 + x + \lambda$. Since $A = 1$ and $B = \lambda$, the equation simplifies to the form shown below [17].
$$
(bx + c)^{-1} = b\,(b^2 \lambda + c(b + c))^{-1}\,x + (c + b)\,(b^2 \lambda + c(b + c))^{-1}
$$
The equation above involves multiplication, addition, squaring, and multiplicative inversion in $GF(2^4)$, since $b$ and $c$ are nibbles. The legends for the blocks within the multiplicative inversion module are given in Table I.
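The simplified inversion formula can be checked numerically. The sketch below is a minimal Python model, not the proposed hardware: it assumes the tower $GF(2) \rightarrow GF(2^2) \rightarrow GF((2^2)^2) \rightarrow GF(((2^2)^2)^2)$ with the polynomials $x^2 + x + 1$, $x^2 + x + \varphi$, and $x^2 + x + \lambda$, with $\varphi = \{10\}_2$ and $\lambda = \{1100\}_2$ as given later in this section. The $GF(2^4)$ inverse is obtained here by exponentiation ($q^{14}$) purely for convenience, and all function names are illustrative.

```python
PHI = 0b10        # φ, constant term of x^2 + x + φ over GF(2^2)
LAMBDA = 0b1100   # λ, constant term of x^2 + x + λ over GF(2^4)


def gf4_mul(q: int, w: int) -> int:
    """Multiply in GF(2^2) modulo x^2 + x + 1."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    k1 = (q1 & w1) ^ (q1 & w0) ^ (q0 & w1)
    k0 = (q1 & w1) ^ (q0 & w0)
    return (k1 << 1) | k0


def gf16_mul(q: int, w: int) -> int:
    """Multiply in GF((2^2)^2) modulo x^2 + x + φ."""
    qh, ql, wh, wl = q >> 2, q & 3, w >> 2, w & 3
    hh = gf4_mul(qh, wh)
    kh = hh ^ gf4_mul(qh, wl) ^ gf4_mul(ql, wh)
    kl = gf4_mul(hh, PHI) ^ gf4_mul(ql, wl)
    return (kh << 2) | kl


def gf16_inv(q: int) -> int:
    """Inverse in the 16-element field as q^14; 0 is mapped to 0."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, q)
    return r


def composite_mul(p: int, q: int) -> int:
    """Multiply (bx + c)(dx + e) modulo x^2 + x + λ."""
    b, c, d, e = p >> 4, p & 0xF, q >> 4, q & 0xF
    bd = gf16_mul(b, d)
    hi = bd ^ gf16_mul(b, e) ^ gf16_mul(c, d)
    lo = gf16_mul(bd, LAMBDA) ^ gf16_mul(c, e)
    return (hi << 4) | lo


def composite_inv(p: int) -> int:
    """(bx + c)^-1 = b*D^-1 x + (c + b)*D^-1 with D = b^2*λ + c(b + c)."""
    b, c = p >> 4, p & 0xF
    d = gf16_mul(gf16_mul(b, b), LAMBDA) ^ gf16_mul(c, b ^ c)
    d_inv = gf16_inv(d)
    return (gf16_mul(b, d_inv) << 4) | gf16_mul(b ^ c, d_inv)


if __name__ == "__main__":
    ok = all(composite_mul(p, composite_inv(p)) == 1 for p in range(1, 256))
    print("formula inverts every nonzero element:", ok)
```

The check multiplies each element by its computed inverse, so it exercises every block of the formula (squaring, multiplication by λ, addition, multiplication, and the $GF(2^4)$ inverse) at once.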
2) Isomorphic Mapping and Inverse Isomorphic Mapping
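The multiplicative inverse is computed by decomposing the more complex $GF(2^8)$ into the lower-order fields $GF(2^1)$, $GF(2^2)$, and $GF((2^2)^2)$. To accomplish this, the following irreducible polynomials are used [14].
$$
\begin{aligned}
GF(2^2) \rightarrow GF(2) &: \quad x^2 + x + 1 \\
GF((2^2)^2) \rightarrow GF(2^2) &: \quad x^2 + x + \varphi \\
GF(((2^2)^2)^2) \rightarrow GF((2^2)^2) &: \quad x^2 + x + \lambda
\end{aligned}
$$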
Here, $\varphi = \{10\}_2$ and $\lambda = \{1100\}_2$. The multiplicative inverse computation in composite fields cannot be applied directly to an element of $GF(2^8)$. The element first has to be mapped to its composite field representation via an isomorphic function, δ. Likewise, after the multiplicative inversion, the result has to be mapped back from its composite field representation to its equivalent in $GF(2^8)$ via the inverse isomorphic function, $\delta^{-1}$. Both δ and $\delta^{-1}$ can be represented as 8×8 matrices. Let $q$ be an element of $GF(2^8)$; then the isomorphic mapping and its inverse can be written as $\delta \times q$ and $\delta^{-1} \times q$, which are matrix multiplications, where $q_7$ is the most significant bit and $q_0$ is the least significant bit [17]. The proposed implementation of the affine transformation is shown in Fig. 3. Each matrix multiplication translates to logical XOR operations on the bits of $q$.
Because the mapping is a matrix multiplication over $GF(2)$, this block is implemented with XOR gates only. We use the minimum number of XOR gates for this block so that the proposed design is optimized. The other blocks in the S-box are likewise designed as combinational logic with a minimum number of logic gates. The proposed implementation of $\delta \times q$ is shown in Fig. 3, and the proposed implementation of $\delta^{-1} \times q$ is shown in Fig. 4. From [18] and [19], any arbitrary polynomial can be represented as $bx + c$, where $b$ is the upper half term and $c$ is the lower half term. A binary number $q$ in a Galois field can therefore be split into $q_H x + q_L$. For instance, if $q = \{1011\}_2$, it can be represented as $\{10\}_2 x + \{11\}_2$, where $q_H = \{10\}_2$ and $q_L = \{11\}_2$. $q_H$ and $q_L$ can be further decomposed into $\{1\}_2 x + \{0\}_2$ and $\{1\}_2 x + \{1\}_2$ respectively. Using this idea, the logical equations for addition, squaring, multiplication, and inversion can be derived.
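One way to obtain a valid isomorphic mapping δ is to find an element β of the composite field that is a root of the AES polynomial and to map $x^i$ to $\beta^i$. The Python sketch below does this by exhaustive search. It is an illustration under stated assumptions: the AES polynomial $x^8 + x^4 + x^3 + x + 1$, the constants $\varphi = \{10\}_2$ and $\lambda = \{1100\}_2$, and hypothetical function names. There are eight such roots, so the mapping found here is not necessarily the matrix δ used in [17].

```python
PHI, LAMBDA = 0b10, 0b1100   # φ and λ of the composite-field polynomials


def gf256_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf4_mul(q: int, w: int) -> int:
    """Multiply in GF(2^2) modulo x^2 + x + 1."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    return (((q1 & w1) ^ (q1 & w0) ^ (q0 & w1)) << 1) | ((q1 & w1) ^ (q0 & w0))


def gf16_mul(q: int, w: int) -> int:
    """Multiply in GF((2^2)^2) modulo x^2 + x + φ."""
    qh, ql, wh, wl = q >> 2, q & 3, w >> 2, w & 3
    hh = gf4_mul(qh, wh)
    kh = hh ^ gf4_mul(qh, wl) ^ gf4_mul(ql, wh)
    return (kh << 2) | (gf4_mul(hh, PHI) ^ gf4_mul(ql, wl))


def composite_mul(p: int, q: int) -> int:
    """Multiply in GF(((2^2)^2)^2) modulo x^2 + x + λ."""
    b, c, d, e = p >> 4, p & 0xF, q >> 4, q & 0xF
    bd = gf16_mul(b, d)
    hi = bd ^ gf16_mul(b, e) ^ gf16_mul(c, d)
    return (hi << 4) | (gf16_mul(bd, LAMBDA) ^ gf16_mul(c, e))


def composite_pow(a: int, n: int) -> int:
    r = 1
    for _ in range(n):
        r = composite_mul(r, a)
    return r


def is_root_of_aes_poly(beta: int) -> bool:
    """Evaluate x^8 + x^4 + x^3 + x + 1 at beta using composite-field arithmetic."""
    value = (composite_pow(beta, 8) ^ composite_pow(beta, 4)
             ^ composite_pow(beta, 3) ^ beta ^ 1)
    return value == 0


# Smallest root found by search; any of the eight roots gives an isomorphism.
beta = next(b for b in range(2, 256) if is_root_of_aes_poly(b))
basis = [composite_pow(beta, i) for i in range(8)]   # images of x^0 .. x^7


def delta(q: int) -> int:
    """Map q from GF(2^8) to the composite field: XOR of basis[i] for each set bit i."""
    out = 0
    for i in range(8):
        if (q >> i) & 1:
            out ^= basis[i]
    return out
```

Each entry of `basis` corresponds to one column of an 8×8 binary matrix, which is why the mapping reduces to a network of XOR gates in hardware.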
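3) Squaring in $GF(2^4)$
Let $k = q^2$, where $k$ and $q$ are elements of $GF(2^4)$ represented by the binary numbers $\{k_3 k_2 k_1 k_0\}_2$ and $\{q_3 q_2 q_1 q_0\}_2$ respectively. Writing both in terms of their upper and lower halves and substituting $x^2 = x + \varphi$ gives
$$
\begin{aligned}
k = k_H x + k_L &= (q_H x + q_L)^2 = q_H^2 x^2 + q_L^2 \\
&= q_H^2 x + (q_H^2 \varphi + q_L^2)
\end{aligned}
$$
The expression above is now decomposed to $GF(2^2)$.
$$
k_H = q_H^2, \qquad k_L = q_H^2 \varphi + q_L^2
$$
Decomposing $k_H$ and $k_L$ further to $GF(2)$, with $x^2 = x + 1$ and $\varphi = \{10\}_2$, yields the formula for the squaring operation in $GF(2^4)$.
$$
\begin{aligned}
k_H = k_3 x + k_2 &= (q_3 x + q_2)^2 = q_3 x^2 + q_2 = q_3 (x + 1) + q_2 \\
&= q_3 x + (q_3 \oplus q_2) \qquad (2) \\
k_L = k_1 x + k_0 &= (q_3 x + q_2)^2 \{10\}_2 + (q_1 x + q_0)^2 \\
&= (q_3 x^2 + q_2)\,x + (q_1 x^2 + q_0) \\
&= (q_2 \oplus q_1) x + (q_3 \oplus q_1 \oplus q_0) \qquad (3)
\end{aligned}
$$
From equations (2) and (3), the formula for computing the squaring operation in $GF(2^4)$ is obtained as shown below.
$$
\begin{aligned}
k_3 &= q_3 \\
k_2 &= q_3 \oplus q_2 \\
k_1 &= q_2 \oplus q_1 \\
k_0 &= q_3 \oplus q_1 \oplus q_0
\end{aligned}
$$
The proposed implementation of the above equations is shown in Fig. 5.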
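The XOR-only squaring network can be compared against a general $GF(2^4)$ multiplier for all 16 inputs. The following Python sketch assumes the field polynomials $x^2 + x + 1$ and $x^2 + x + \varphi$ with $\varphi = \{10\}_2$; the bit equations are $k_3 = q_3$, $k_2 = q_3 \oplus q_2$, $k_1 = q_2 \oplus q_1$, $k_0 = q_3 \oplus q_1 \oplus q_0$, and the function names are illustrative.

```python
PHI = 0b10   # φ, constant term of x^2 + x + φ


def gf4_mul(q: int, w: int) -> int:
    """Reference multiplier in GF(2^2) modulo x^2 + x + 1."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    return (((q1 & w1) ^ (q1 & w0) ^ (q0 & w1)) << 1) | ((q1 & w1) ^ (q0 & w0))


def gf16_mul(q: int, w: int) -> int:
    """Reference multiplier in GF((2^2)^2) modulo x^2 + x + φ."""
    qh, ql, wh, wl = q >> 2, q & 3, w >> 2, w & 3
    hh = gf4_mul(qh, wh)
    kh = hh ^ gf4_mul(qh, wl) ^ gf4_mul(ql, wh)
    return (kh << 2) | (gf4_mul(hh, PHI) ^ gf4_mul(ql, wl))


def gf16_square_gates(q: int) -> int:
    """Squaring as four XOR expressions on the bits of q."""
    q3, q2, q1, q0 = (q >> 3) & 1, (q >> 2) & 1, (q >> 1) & 1, q & 1
    k3 = q3
    k2 = q3 ^ q2
    k1 = q2 ^ q1
    k0 = q3 ^ q1 ^ q0
    return (k3 << 3) | (k2 << 2) | (k1 << 1) | k0


# Inputs for which the gate network and the reference multiplier disagree.
mismatches = [q for q in range(16) if gf16_square_gates(q) != gf16_mul(q, q)]
```

The gate version needs only four XOR gates and no AND gates, which is why squaring is treated as a separate block instead of reusing the multiplier.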
4) Multiplication with constant, λ
Let $k = q\lambda$, where $k = \{k_3 k_2 k_1 k_0\}_2$, $q = \{q_3 q_2 q_1 q_0\}_2$, and $\lambda = \{1100\}_2$ are elements of $GF(2^4)$, so that $\lambda_H = \{11\}_2$ and $\lambda_L = \{00\}_2$.
$$
k = k_H x + k_L = (q_H x + q_L)(\lambda_H x + \lambda_L) = q_H \lambda_H x^2 + q_L \lambda_H x
$$
Substituting $x^2 = x + \varphi$ gives
$$
k = \lambda_H (q_H + q_L)\,x + q_H \lambda_H \varphi
$$
Decomposing further to $GF(2)$, with $\lambda_H = \{11\}_2 = x + 1$, $\varphi = \{10\}_2 = x$, and $x^2 = x + 1$ (so that $\lambda_H \varphi = x^2 + x = 1$):
$$
\begin{aligned}
k_H = k_3 x + k_2 &= (x + 1)\big[(q_3 \oplus q_1) x + (q_2 \oplus q_0)\big] \\
&= (q_2 \oplus q_0) x + (q_3 \oplus q_2 \oplus q_1 \oplus q_0) \qquad (4) \\
k_L = k_1 x + k_0 &= (q_3 x + q_2)(x + 1)\,x = q_3 x + q_2 \qquad (5)
\end{aligned}
$$
From equations (4) and (5) combined, the formula for computing multiplication with the constant λ is shown below.
$$
\begin{aligned}
k_3 &= q_2 \oplus q_0 \\
k_2 &= q_3 \oplus q_2 \oplus q_1 \oplus q_0 \\
k_1 &= q_3 \\
k_0 &= q_2
\end{aligned}
$$
The proposed implementation of multiplication with the constant λ is shown in Fig. 6.
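The constant multiplier can be verified the same way as the squarer, by comparing it with a general $GF(2^4)$ multiplier whose second operand is fixed to $\lambda = \{1100\}_2$. This Python sketch assumes $\varphi = \{10\}_2$ and the bit equations $k_3 = q_2 \oplus q_0$, $k_2 = q_3 \oplus q_2 \oplus q_1 \oplus q_0$, $k_1 = q_3$, $k_0 = q_2$; the function names are illustrative.

```python
PHI = 0b10        # φ, constant term of x^2 + x + φ
LAMBDA = 0b1100   # λ, the constant being multiplied in


def gf4_mul(q: int, w: int) -> int:
    """Reference multiplier in GF(2^2) modulo x^2 + x + 1."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    return (((q1 & w1) ^ (q1 & w0) ^ (q0 & w1)) << 1) | ((q1 & w1) ^ (q0 & w0))


def gf16_mul(q: int, w: int) -> int:
    """Reference multiplier in GF((2^2)^2) modulo x^2 + x + φ."""
    qh, ql, wh, wl = q >> 2, q & 3, w >> 2, w & 3
    hh = gf4_mul(qh, wh)
    kh = hh ^ gf4_mul(qh, wl) ^ gf4_mul(ql, wh)
    return (kh << 2) | (gf4_mul(hh, PHI) ^ gf4_mul(ql, wl))


def gf16_mul_lambda_gates(q: int) -> int:
    """Multiplication by λ as XOR expressions; k1 and k0 are plain wires."""
    q3, q2, q1, q0 = (q >> 3) & 1, (q >> 2) & 1, (q >> 1) & 1, q & 1
    k3 = q2 ^ q0
    k2 = q3 ^ q2 ^ q1 ^ q0
    k1 = q3
    k0 = q2
    return (k3 << 3) | (k2 << 2) | (k1 << 1) | k0


# Inputs for which the gate network and the reference multiplier disagree.
mismatches = [q for q in range(16)
              if gf16_mul_lambda_gates(q) != gf16_mul(q, LAMBDA)]
```

Since $k_2 = k_3 \oplus q_3 \oplus q_1$, the term $q_2 \oplus q_0$ can be shared between the two upper output bits, which is one possible way to keep the XOR count low.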
5) $GF(2^4)$ Multiplication
Let $k = qw$, where $k = \{k_3 k_2 k_1 k_0\}_2$, $q = \{q_3 q_2 q_1 q_0\}_2$, and $w = \{w_3 w_2 w_1 w_0\}_2$ are elements of $GF(2^4)$.
$$
k = k_H x + k_L = (q_H x + q_L)(w_H x + w_L) = q_H w_H x^2 + (q_H w_L + q_L w_H)\,x + q_L w_L
$$
Substituting the $x^2$ term with $x^2 = x + \varphi$ yields the following.
$$
k = (q_H w_H + q_H w_L + q_L w_H)\,x + (q_H w_H \varphi + q_L w_L) \qquad (7)
$$
Equation (7) is expressed in $GF(2^2)$. It contains addition and multiplication operations in $GF(2^2)$. Addition in $GF(2^2)$ is simply a bitwise XOR operation. Multiplication in $GF(2^2)$, on the other hand, requires decomposition to $GF(2)$ before it can be implemented in hardware. The expression would be too complex if equation (7) were broken down to $GF(2)$ in one step, so the formulas for multiplication in $GF(2^2)$ and for multiplication with the constant φ are derived instead. Fig. 7 shows the hardware implementation of multiplication in $GF(2^4)$.
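The structure of the $GF(2^4)$ multiplier, built from $GF(2^2)$ multipliers, XOR adders, and one multiplication by φ, can be modelled in a few lines. The Python sketch below is an illustration, not the circuit of Fig. 7: it assumes the product form $k_H = q_H w_H + q_H w_L + q_L w_H$ and $k_L = q_H w_H \varphi + q_L w_L$ with $\varphi = \{10\}_2$, and then checks that the result behaves as a field multiplication.

```python
PHI = 0b10   # φ, constant term of x^2 + x + φ


def gf4_mul(q: int, w: int) -> int:
    """Multiply in GF(2^2) modulo x^2 + x + 1."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    return (((q1 & w1) ^ (q1 & w0) ^ (q0 & w1)) << 1) | ((q1 & w1) ^ (q0 & w0))


def gf16_mul(q: int, w: int) -> int:
    """Multiply in GF((2^2)^2): upper and lower halves combined as described above."""
    qh, ql = q >> 2, q & 3          # q = qh*x + ql
    wh, wl = w >> 2, w & 3          # w = wh*x + wl
    hh = gf4_mul(qh, wh)            # shared product qh*wh
    kh = hh ^ gf4_mul(qh, wl) ^ gf4_mul(ql, wh)
    kl = gf4_mul(hh, PHI) ^ gf4_mul(ql, wl)
    return (kh << 2) | kl


# For every nonzero q, count how many w satisfy q*w = 1 (should be exactly one).
inverse_counts = [sum(gf16_mul(q, w) == 1 for w in range(16))
                  for q in range(1, 16)]

# Multiplication must not depend on operand order.
commutative = all(gf16_mul(q, w) == gf16_mul(w, q)
                  for q in range(16) for w in range(16))
```

The product $q_H w_H$ is computed once and reused in both halves, so four $GF(2^2)$ multiplications plus the constant multiplier suffice in this arrangement.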
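6) $GF(2^2)$ Multiplication
Let $k = qw$, where $k = \{k_1 k_0\}_2$, $q = \{q_1 q_0\}_2$, and $w = \{w_1 w_0\}_2$ are elements of $GF(2^2)$.
$$
\begin{aligned}
k = k_1 x + k_0 &= (q_1 x + q_0)(w_1 x + w_0) = q_1 w_1 x^2 + (q_1 w_0 + q_0 w_1)\,x + q_0 w_0 \\
&= (q_1 w_1 \oplus q_1 w_0 \oplus q_0 w_1)\,x + (q_1 w_1 \oplus q_0 w_0)
\end{aligned}
$$
where the second line uses $x^2 = x + 1$. The equation above can now be implemented in hardware, because multiplication in $GF(2)$ involves only AND gates, and we use AND gates for its implementation.
The formula for computing multiplication in $GF(2^2)$ is as follows.
$$
\begin{aligned}
k_1 &= q_1 w_1 \oplus q_1 w_0 \oplus q_0 w_1 \\
k_0 &= q_1 w_1 \oplus q_0 w_0
\end{aligned}
\qquad (9)
$$
The hardware implementation differs from (9) in how $k_1$ is computed. It can be shown that the implemented expression for $k_1$ reduces to the expression in (9).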
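One common way for a hardware $k_1$ to differ from the direct sum of products while remaining equivalent is to factor it as $k_1 = (q_1 \oplus q_0)(w_1 \oplus w_0) \oplus q_0 w_0$, which reuses the product $q_0 w_0$ already needed for $k_0$. Whether the proposed circuit uses exactly this factoring is an assumption; the Python sketch below only shows that the two forms agree for all 16 input pairs.

```python
def gf4_mul_direct(q: int, w: int) -> int:
    """GF(2^2) product as a direct sum of products: four AND gates."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    k1 = (q1 & w1) ^ (q1 & w0) ^ (q0 & w1)
    k0 = (q1 & w1) ^ (q0 & w0)
    return (k1 << 1) | k0


def gf4_mul_shared(q: int, w: int) -> int:
    """Assumed factored form: three AND gates, with q0*w0 shared by k1 and k0."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    low = q0 & w0
    k1 = ((q1 ^ q0) & (w1 ^ w0)) ^ low
    k0 = (q1 & w1) ^ low
    return (k1 << 1) | k0


# Input pairs for which the two forms disagree.
mismatches = [(q, w) for q in range(4) for w in range(4)
              if gf4_mul_direct(q, w) != gf4_mul_shared(q, w)]
```

Expanding $(q_1 \oplus q_0)(w_1 \oplus w_0)$ gives $q_1 w_1 \oplus q_1 w_0 \oplus q_0 w_1 \oplus q_0 w_0$, and the extra $q_0 w_0$ cancels against the added term, which is the algebraic reason the two forms match.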
7) Multiplication with constant φ
Let $k = q\varphi$, where $k = \{k_1 k_0\}_2$, $q = \{q_1 q_0\}_2$, and $\varphi = \{10\}_2$ are elements of $GF(2^2)$.
$$
k = k_1 x + k_0 = (q_1 x + q_0)\,x = q_1 x^2 + q_0 x
$$
Substituting the $x^2$ term with $x^2 = x + 1$ yields the expression below.
$$
k = (q_1 \oplus q_0)\,x + q_1 \qquad (10)
$$
From (10), the formula for computing multiplication with φ is derived as shown below.
$$
\begin{aligned}
k_1 &= q_1 \oplus q_0 \\
k_0 &= q_1
\end{aligned}
$$
The hardware implementation of multiplication with φ is shown in Fig. 9.
Fig. 9: Hardware implementation of multiplication with constant φ.
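Multiplication by φ costs a single XOR gate, which can be confirmed by comparing it with a general $GF(2^2)$ multiplier. The Python sketch below assumes the field polynomial $x^2 + x + 1$, $\varphi = \{10\}_2$, and the bit equations $k_1 = q_1 \oplus q_0$, $k_0 = q_1$; the function names are illustrative.

```python
PHI = 0b10   # φ = {10}_2


def gf4_mul(q: int, w: int) -> int:
    """Reference multiplier in GF(2^2) modulo x^2 + x + 1."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    k1 = (q1 & w1) ^ (q1 & w0) ^ (q0 & w1)
    k0 = (q1 & w1) ^ (q0 & w0)
    return (k1 << 1) | k0


def gf4_mul_phi_gates(q: int) -> int:
    """Multiplication by φ: one XOR gate for k1, a wire for k0."""
    q1, q0 = q >> 1, q & 1
    k1 = q1 ^ q0
    k0 = q1
    return (k1 << 1) | k0


# Compare the gate form with the reference multiplier for all four inputs.
table = [(q, gf4_mul_phi_gates(q), gf4_mul(q, PHI)) for q in range(4)]
```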
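8) Multiplicative Inversion in $GF(2^4)$
A formula is derived in [19] to compute the multiplicative inverse of $q$ (where $q$ is an element of $GF(2^4)$) such that $q^{-1} = \{q_3^{-1}, q_2^{-1}, q_1^{-1}, q_0^{-1}\}$. The inverses of the individual bits can be computed from the equations below [19].
$$
\begin{aligned}
q_3^{-1} &= q_3 \oplus q_3 q_2 q_1 \oplus q_3 q_0 \oplus q_2 \\
q_2^{-1} &= q_3 q_2 q_1 \oplus q_3 q_2 q_0 \oplus q_3 q_0 \oplus q_2 \oplus q_2 q_1 \\
q_1^{-1} &= q_3 \oplus q_3 q_2 q_1 \oplus q_3 q_1 q_0 \oplus q_2 \oplus q_2 q_0 \oplus q_1 \\
q_0^{-1} &= q_3 q_2 q_1 \oplus q_3 q_2 q_0 \oplus q_3 q_1 \oplus q_3 q_1 q_0 \oplus q_3 q_0 \oplus q_2 \oplus q_2 q_1 \oplus q_2 q_1 q_0 \oplus q_1 \oplus q_0
\end{aligned}
$$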
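Bit-level inversion in $GF(2^4)$ can be validated by multiplying each element by its computed inverse. The Python sketch below evaluates the inverse bits as XORs of AND terms, written as the algebraic normal form of inversion for the field representation with $\varphi = \{10\}_2$; the grouping of terms in the gate network of the proposed design may differ, and the function names are illustrative.

```python
PHI = 0b10   # φ, constant term of x^2 + x + φ


def gf4_mul(q: int, w: int) -> int:
    """Reference multiplier in GF(2^2) modulo x^2 + x + 1."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    return (((q1 & w1) ^ (q1 & w0) ^ (q0 & w1)) << 1) | ((q1 & w1) ^ (q0 & w0))


def gf16_mul(q: int, w: int) -> int:
    """Reference multiplier in GF((2^2)^2) modulo x^2 + x + φ."""
    qh, ql, wh, wl = q >> 2, q & 3, w >> 2, w & 3
    hh = gf4_mul(qh, wh)
    kh = hh ^ gf4_mul(qh, wl) ^ gf4_mul(ql, wh)
    return (kh << 2) | (gf4_mul(hh, PHI) ^ gf4_mul(ql, wl))


def gf16_inv_gates(q: int) -> int:
    """Inverse bits as XORs of AND terms; the input 0 produces 0."""
    q3, q2, q1, q0 = (q >> 3) & 1, (q >> 2) & 1, (q >> 1) & 1, q & 1
    t321 = q3 & q2 & q1   # term shared by all four output bits
    i3 = q3 ^ t321 ^ (q3 & q0) ^ q2
    i2 = t321 ^ (q3 & q2 & q0) ^ (q3 & q0) ^ q2 ^ (q2 & q1)
    i1 = q3 ^ t321 ^ (q3 & q1 & q0) ^ q2 ^ (q2 & q0) ^ q1
    i0 = (t321 ^ (q3 & q2 & q0) ^ (q3 & q1) ^ (q3 & q1 & q0) ^ (q3 & q0)
          ^ q2 ^ (q2 & q1) ^ (q2 & q1 & q0) ^ q1 ^ q0)
    return (i3 << 3) | (i2 << 2) | (i1 << 1) | i0


# Nonzero inputs whose computed inverse does not multiply back to 1.
failures = [q for q in range(1, 16) if gf16_mul(q, gf16_inv_gates(q)) != 1]
```

Several AND terms, such as $q_3 q_2 q_1$ and $q_3 q_0$, appear in more than one output bit, so sharing them is a natural way to reduce the gate count.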
The proposed implementation of these equations is shown in Fig. 10. As explained, the proposed S-box implementation is pipelined in order to increase performance and speed. Fig. 11 shows the proposed pipelined S-box.
IV. COMPARISON
We designed an FPGA implementation of the S-box algorithm based on combinational logic. The proposed method was written in the VHDL hardware description language. To obtain actual figures for hardware usage, the design was synthesized and implemented using Xilinx ISE 7.1 on a Virtex-4 FPGA, target device Xc4vfx100, and power was analyzed using the Xilinx XPower analyzer. Table II compares the hardware utilization and performance of different works and of the proposed method for the S-box, and Table III shows the power consumption of the proposed method.
V. CONCLUSION
The aim of this paper is the design and implementation of an optimized combinational-logic-based Rijndael S-box on FPGA. Because the proposed method is based on combinational logic, its power consumption is low and its number of logic gates is very small. Performance is increased by pipelining: a 4-stage pipeline is used in the S-box design. The proposed architecture is built only from XOR, AND, NOT, and OR logic gates. The method achieves higher speed and lower power than the other works compared.
