ABSTRACT
INTRODUCTION
Cryptographic algorithms have become the main means of protecting highly sensitive data. Confidentiality [1] [2] [3] is the security objective addressed by their hardware implementation and by their integration into present-day communication systems.
A number of encryption/decryption algorithms have been developed [2] [3] [4]. As security technology has matured, hackers, electronic eavesdroppers, electronic fraud and viruses have kept pace, entering the field with new and improved techniques for attacking security mechanisms [17], [19]. To protect valuable information sources and their transmission against such attacks, the Advanced Encryption Standard (AES, or Rijndael) was approved as a Federal Information Processing Standard by the National Institute of Standards and Technology (NIST) [4], [7], [8], [11]. However, AES with a 128-bit key has 10 rounds of complex algebraic and matrix operations, which demand high processing power and introduce delay into the encryption and decryption process. For this reason, speed is treated as a major issue from the outset of this work, and we concentrate on a hardware-based implementation. A Field Programmable Gate Array (FPGA) implementation is chosen because an FPGA offers lower cost, more flexibility and reasonable performance compared with an ASIC (Application Specific Integrated Circuit) implementation. AES processors previously proposed for FPGA hardware sacrificed a few security features, because the FPGAs available on the market at the time had low capacity.
More recently, the development of an AES processor in VHDL and its implementation on a Xilinx FPGA, without sacrificing any security feature of the algorithm, has been reported [22]. Xilinx offers many high-capacity FPGAs in various families. The literature [10], [12], [13], [18], [20] [21] describes the design and implementation of AES processors on the FPGA platform. This paper presents a hardware implementation of the AES (Advanced Encryption Standard) symmetric cryptographic algorithm in the VHDL language, using different architectures for MixColumns, together with a hardware simulation of the resulting ciphering and deciphering module.

Figure 1. The basic AES-128 cryptographic architecture

The AES algorithm is a FIPS standard and a symmetric-key cipher [5], [9], in which the sender and the recipient use a single key for both encryption and decryption. The data block length is fixed at 128 bits (Nb = 4 words), while the length of the cipher key can be 128, 192 or 256 bits, represented by Nk = 4, 6 or 8 words respectively. Moreover, the AES algorithm is iterative. The iterations are called rounds, and the total number of rounds, Nr, is 10, 12 or 14 when the key length is 128, 192 or 256 bits, respectively. The 128-bit plaintext block is divided into 16 bytes. These bytes are mapped to a 4 x 4 array called the State, and all the internal operations of the AES algorithm are performed on the State. Each byte in the State is denoted by $S_{i,j}$ ($0 \le i, j < 4$) and is considered an element of the Galois field $GF(2^8)$. The irreducible polynomial used in the AES algorithm to construct $GF(2^8)$ is $m(x) = x^8 + x^4 + x^3 + x + 1$.
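As a small illustration of the parameters just described, the following Python sketch derives Nr from the key length and maps a 16-byte block to the 4 x 4 State. It is a software illustration only, not the paper's VHDL design, and the function names are assumptions introduced here. The column-by-column byte ordering follows the AES standard.

```python
# Illustrative sketch; function names are hypothetical, not from the paper.

def rounds_for_key(key_bits):
    """Return the number of rounds Nr for a 128-, 192- or 256-bit key."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32      # key length in 32-bit words (Nk)
    return nk + 6            # Nr = Nk + 6 gives 10, 12 or 14


def bytes_to_state(block):
    """Map 16 input bytes to the 4 x 4 State, filling column by column."""
    if len(block) != 16:
        raise ValueError("an AES block is exactly 16 bytes")
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


def state_to_bytes(state):
    """Inverse mapping: read the State back out column by column."""
    return bytes(state[row][col] for col in range(4) for row in range(4))
```

With this layout, byte 1 of the input lands in row 1 of column 0, and byte 4 starts column 1.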
THE BASIC CRYPTOGRAPHIC ARCHITECTURE
Figure 1 presents the AES encryption process. In AES encryption, each round except the final round consists of four transformations: SubBytes(), ShiftRows(), MixColumns() and AddRoundKey(), while the final round does not have the MixColumns() transformation.
The AES algorithm can be divided into three blocks:
Initial round: this is the first and simplest of the stages. It contains only one operation: AddRoundKey.
Rounds: each round applies SubBytes, ShiftRows, MixColumns and AddRoundKey.
Final round: the same as the other rounds, but without MixColumns.
Remark: The AddRoundKey operation is its own inverse.
AddRoundKey
In this transformation, a round key is added to the State by a simple bitwise XOR operation (that is, an addition in the Galois field). Each round key consists of four words from the key schedule procedure.
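The XOR-based key addition can be sketched in a few lines of Python. This is an illustration only, with an assumed function name and the State and round key both represented as 4 x 4 lists of byte values. Because XOR is its own inverse, applying the same round key twice restores the original State.

```python
# Illustrative sketch; the name add_round_key is an assumption.

def add_round_key(state, round_key):
    """XOR each State byte with the round-key byte at the same position."""
    return [
        [s ^ k for s, k in zip(state_row, key_row)]
        for state_row, key_row in zip(state, round_key)
    ]
```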
SubBytes
This transformation is a non-linear byte substitution that operates independently on each byte of the State using a substitution table (S-box) [3]. This S-box, which is invertible, is constructed by composing two transformations [9]:
1. Taking the multiplicative inverse in the finite field $GF(2^8)$ with
$$m(x) = x^8 + x^4 + x^3 + x + 1$$
as irreducible polynomial; the element {00} is mapped onto itself.
2. Applying an affine transformation (over $GF(2)$ [1], [8]) defined by:
$$b_i' = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$$
for $0 \le i < 8$, where $b_i$ is the ith bit of the byte and $c_i$ is the ith bit of a constant byte c with the value {63}.
Figure 3. SubBytes block
• Remark: The inverse function, named Inv_SubBytes, applies the same procedure but uses Inv_SBox, the inverse table of SBox.
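The two-step S-box construction (multiplicative inverse followed by the affine transformation with constant {63}) can be reproduced with the Python sketch below. It is a software illustration, not the paper's hardware block. The helper names are assumptions, and the inverse is computed by exponentiation (a^254), which is one of several possible methods.

```python
# Illustrative sketch; helper names are hypothetical.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B            # reduce by the irreducible polynomial
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); {00} is mapped onto itself."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):         # a^254 equals a^-1 for non-zero a
        result = gf_mul(result, a)
    return result


def affine(b, c=0x63):
    """Bitwise affine transformation over GF(2) with constant byte c."""
    out = 0
    for i in range(8):
        bit = (
            (b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
            ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (c >> i)
        ) & 1
        out |= bit << i
    return out


SBOX = [affine(gf_inv(x)) for x in range(256)]

# Inv_SBox is simply the table that undoes SBox.
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value
```

Both tables are permutations of the 256 byte values, which is why Inv_SubBytes can be a plain lookup in the inverse table.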
ShiftRows
In this transformation, the bytes in the last three rows of the State are cyclically shifted over different numbers of bytes (offsets). The first row, row 0, is not shifted. Row 1 of the State is left shifted by 1 byte position; row 2 is left shifted by 2 byte positions; row 3 is left shifted by 3 byte positions.
• Remark: The inverse function, Inv_ShiftRows, replaces each left shift with a right shift by the same number of byte positions.
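The row rotations and their inverse can be illustrated with the following Python sketch, which assumes the State is a list of four rows of four bytes. The function names are assumptions made for this example.

```python
# Illustrative sketch; function names are hypothetical.

def shift_rows(state):
    """Rotate row r of the State left by r byte positions (row 0 unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r of the State right by r byte positions."""
    return [row[len(row) - r:] + row[:len(row) - r] for r, row in enumerate(state)]
```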
MixColumns
This transformation [9] operates on the State column by column, treating each column as a four-term polynomial over $GF(2^8)$. These polynomials are multiplied modulo $(x^4 + 1)$ with a fixed polynomial $a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$, specified in the standard.
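Multiplying a column by a(x) modulo $x^4 + 1$ is equivalent to a matrix product whose coefficients are {02}, {03} and {01}. The Python sketch below shows this for a single column. It is a software illustration with assumed function names, not the paper's VHDL architecture.

```python
# Illustrative sketch; function names are hypothetical.

def xtime(b):
    """Multiply a byte by {02} in GF(2^8)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B               # reduce by x^8 + x^4 + x^3 + x + 1
    return b


def mix_column(col):
    """Apply MixColumns to one column [a0, a1, a2, a3] of the State."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,   # {02}a0 + {03}a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,   # a0 + {02}a1 + {03}a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),   # a0 + a1 + {02}a2 + {03}a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),   # {03}a0 + a1 + a2 + {02}a3
    ]
```

Multiplication by {03} is obtained as {02} times the byte, XORed with the byte itself, so only one primitive (xtime) is needed for encryption.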
Previous work
Many hardware implementations of the AES algorithm have been presented in the literature, including that of Elbirt et al. in 2001 [6] [32]. Unfortunately, most of those implementations are too costly for practical applications.
In this paper, we have developed, implemented and compared three different MixColumns architectures, with the aim of achieving high throughput in a small area:
1. Methods for calculating the products in $GF(2^8)$ (architecture 1).
2. Galois multiplication lookup tables (architecture 2).
3. Properties of the binary calculation (architecture 3).
The rest of the paper is organized as follows. Section 3 describes the implementation of AES (ciphering and deciphering) on FPGA. Section 4 presents the simulation and its interpretation. Section 5 concludes the paper.
Methods for calculating the products in $GF(2^8)$
a) Mathematical application (architecture 1) [14]

$$
\begin{aligned}
\{0D\} \cdot M(x) ={}& (m_0 + m_5 + m_6) + (m_1 + m_5 + m_7)x + (m_0 + m_2 + m_6)x^2 \\
&+ (m_0 + m_1 + m_3 + m_5 + m_6 + m_7)x^3 + (m_1 + m_2 + m_4 + m_5 + m_7)x^4 \\
&+ (m_2 + m_3 + m_5 + m_6)x^5 + (m_3 + m_4 + m_6 + m_7)x^6 + (m_4 + m_5 + m_7)x^7
\end{aligned}
$$

$$
\begin{aligned}
\{0E\} \cdot M(x) ={}& (m_5 + m_6 + m_7) + (m_0 + m_5)x + (m_0 + m_1 + m_6)x^2 \\
&+ (m_0 + m_1 + m_2 + m_5 + m_6)x^3 + (m_1 + m_2 + m_3 + m_5)x^4 \\
&+ (m_2 + m_3 + m_4 + m_6)x^5 + (m_3 + m_4 + m_5 + m_7)x^6 + (m_4 + m_5 + m_6)x^7
\end{aligned}
$$

$$
\begin{aligned}
\{0B\} \cdot M(x) ={}& (m_0 + m_5 + m_7) + (m_0 + m_1 + m_5 + m_6 + m_7)x + (m_1 + m_2 + m_6 + m_7)x^2 \\
&+ (m_0 + m_2 + m_3 + m_5)x^3 + (m_1 + m_3 + m_4 + m_5 + m_6 + m_7)x^4 \\
&+ (m_2 + m_4 + m_5 + m_6 + m_7)x^5 + (m_3 + m_5 + m_6 + m_7)x^6 + (m_4 + m_6 + m_7)x^7
\end{aligned}
$$

$$
\begin{aligned}
\{09\} \cdot M(x) ={}& (m_0 + m_5) + (m_1 + m_5 + m_6)x + (m_2 + m_6 + m_7)x^2 \\
&+ (m_0 + m_3 + m_5 + m_7)x^3 + (m_1 + m_4 + m_5 + m_6)x^4 \\
&+ (m_2 + m_5 + m_6 + m_7)x^5 + (m_3 + m_6 + m_7)x^6 + (m_4 + m_7)x^7
\end{aligned}
$$

$$
\begin{aligned}
\{03\} \cdot M(x) ={}& (m_0 + m_7) + (m_0 + m_1 + m_7)x + (m_1 + m_2)x^2 + (m_2 + m_3 + m_7)x^3 \\
&+ (m_3 + m_4 + m_7)x^4 + (m_4 + m_5)x^5 + (m_5 + m_6)x^6 + (m_6 + m_7)x^7
\end{aligned}
$$

$$
\begin{aligned}
\{02\} \cdot M(x) ={}& m_7 + (m_0 + m_7)x + m_1 x^2 + (m_2 + m_7)x^3 \\
&+ (m_3 + m_7)x^4 + m_4 x^5 + m_5 x^6 + m_6 x^7
\end{aligned}
$$
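The six expressions of architecture 1 can be checked mechanically. The Python sketch below encodes, for each constant, the input bits m_i that are XORed together to form each output coefficient, and compares the result with a generic shift-and-reduce multiplication in $GF(2^8)$. The table layout and function names are assumptions made for this illustration, and each "+" is taken as addition in GF(2), that is, XOR.

```python
# Illustrative sketch; XOR_TERMS and the function names are hypothetical.
# XOR_TERMS[c][k] lists the input bits m_i XORed to form the x^k coefficient.
XOR_TERMS = {
    0x02: [(7,), (0, 7), (1,), (2, 7), (3, 7), (4,), (5,), (6,)],
    0x03: [(0, 7), (0, 1, 7), (1, 2), (2, 3, 7), (3, 4, 7), (4, 5), (5, 6), (6, 7)],
    0x09: [(0, 5), (1, 5, 6), (2, 6, 7), (0, 3, 5, 7), (1, 4, 5, 6),
           (2, 5, 6, 7), (3, 6, 7), (4, 7)],
    0x0B: [(0, 5, 7), (0, 1, 5, 6, 7), (1, 2, 6, 7), (0, 2, 3, 5),
           (1, 3, 4, 5, 6, 7), (2, 4, 5, 6, 7), (3, 5, 6, 7), (4, 6, 7)],
    0x0D: [(0, 5, 6), (1, 5, 7), (0, 2, 6), (0, 1, 3, 5, 6, 7),
           (1, 2, 4, 5, 7), (2, 3, 5, 6), (3, 4, 6, 7), (4, 5, 7)],
    0x0E: [(5, 6, 7), (0, 5), (0, 1, 6), (0, 1, 2, 5, 6), (1, 2, 3, 5),
           (2, 3, 4, 6), (3, 4, 5, 7), (4, 5, 6)],
}


def mul_by_equations(constant, byte):
    """Multiply a byte by a fixed constant using only XORs of its bits."""
    bits = [(byte >> i) & 1 for i in range(8)]
    result = 0
    for power, terms in enumerate(XOR_TERMS[constant]):
        coefficient = 0
        for index in terms:
            coefficient ^= bits[index]
        result |= coefficient << power
    return result


def gf_mul(a, b):
    """Reference multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result
```

In hardware, each output bit of this method corresponds to a small XOR network with no table storage.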
b) Galois Multiplication lookup tables (architecture 2)
Commonly, rather than implementing Galois multiplication directly, Rijndael implementations simply use pre-calculated lookup tables to perform the byte multiplication by 02, 03, 09, 0B, 0D and 0E [15]. These lookup tables are as follows:
Multiplication by 02 
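The lookup tables of architecture 2 do not have to be typed in by hand. The Python sketch below generates the six 256-entry tables and uses them for an inverse MixColumns on one column. The names MUL_TABLES and inv_mix_column are assumptions introduced for this illustration. The inverse matrix coefficients {0E}, {0B}, {0D}, {09} are those of the AES standard.

```python
# Illustrative sketch; MUL_TABLES and inv_mix_column are hypothetical names.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


# One 256-entry table per constant used by MixColumns and its inverse.
MUL_TABLES = {
    c: [gf_mul(x, c) for x in range(256)]
    for c in (0x02, 0x03, 0x09, 0x0B, 0x0D, 0x0E)
}


def inv_mix_column(col):
    """Inverse MixColumns on one column using table lookups only."""
    m9, mb = MUL_TABLES[0x09], MUL_TABLES[0x0B]
    md, me = MUL_TABLES[0x0D], MUL_TABLES[0x0E]
    a0, a1, a2, a3 = col
    return [
        me[a0] ^ mb[a1] ^ md[a2] ^ m9[a3],
        m9[a0] ^ me[a1] ^ mb[a2] ^ md[a3],
        md[a0] ^ m9[a1] ^ me[a2] ^ mb[a3],
        mb[a0] ^ md[a1] ^ m9[a2] ^ me[a3],
    ]
```

The trade-off of this approach is memory: six tables of 256 bytes each are stored, in exchange for replacing all field arithmetic by lookups.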
c) Properties of the binary calculation (architecture 3)
Building on the two previous architectures, we derived another method based on the properties of the binary calculation, whose goal is to make this operation easier to implement in hardware. The MixColumns multiplication is calculated as follows: the MSB is taken as a mask, and the operations temp1, temp2 and temp3 are each computed as a left shift followed by the addition (XOR) of the value {1B}, applied through the mask, in order to compensate for the loss of the MSB caused by the shift.
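A minimal Python sketch of this idea follows. The names temp1, temp2 and temp3 come from the text; how they are combined into the six constants is an assumption made here ({02}, {04} and {08} times the byte, XORed as needed), since the paper's VHDL is not reproduced in this section.

```python
# Illustrative sketch; the combination of temp1..temp3 is an assumption.

def xtime(b):
    """Shift left by one bit; XOR {1B} when the MSB was set (mask form)."""
    mask = -(b >> 7) & 0x1B      # 0x1B if the MSB is 1, otherwise 0x00
    return ((b << 1) & 0xFF) ^ mask


def mul_constants(b):
    """Return the products of byte b with the six MixColumns constants."""
    temp1 = xtime(b)             # {02} * b
    temp2 = xtime(temp1)         # {04} * b
    temp3 = xtime(temp2)         # {08} * b
    return {
        0x02: temp1,
        0x03: temp1 ^ b,
        0x09: temp3 ^ b,
        0x0B: temp3 ^ temp1 ^ b,
        0x0D: temp3 ^ temp2 ^ b,
        0x0E: temp3 ^ temp2 ^ temp1,
    }
```

Because the mask is applied unconditionally, every byte follows the same sequence of operations regardless of its value, and no lookup table is required.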
Implementation of AES (Ciphering & Deciphering) in FPGA:
To establish that our modifications give better results in terms of area and speed than the previous work, we compare the encryption/decryption codes (original and modified) based on the three MixColumns models. The comparison considers two criteria: chip speed and area utilization. The design was implemented on a Cyclone III (EP3C80F780C6) device. Both designs were synthesized using the Quartus II v9.1 tool. As shown in Table 1, our design achieves an increase in speed of about 20% and a reduction in area of about 12%. In addition, since we use the properties of the binary calculation to perform the MixColumns transformation, the other encryption designs (models 1 and 2) need more memory (as shown in Table 1). The implementation uses the VHDL language, which is now well established for FPGA design [16]. Quartus II is used for both design entry and simulation.
The encryption block is represented in Figure 9, where the main signals used by the implementation are shown.

Table 1. Comparison of the different blocks that constitute the AES algorithm

We consider the different architectures (models 1, 2 and 3) of the MixColumns block treated above, described in VHDL and targeting the Cyclone III (EP3C80F780C6) device, which has limited resources: 81264 logic elements; 430 I/O pins; 488 embedded 9-bit multiplier elements; 2810880 bits of memory; 4 PLLs. The main signals are: the system clock (CLK), the system reset (RST), the load signal, which loads the key and the data, and the signal that selects whether the data are encrypted or decrypted. A summary of the occupied resources is presented in the comparative table (Table 1).
• According to the comparative table, the implementation of the first AES-128 architecture (based on model 1 of MixColumns) occupies more than 75840 logic elements of the device, whereas the second (based on model 3 of MixColumns) needs roughly 75147 of the total capacity of the device, and the last AES-128 architecture (based on model 2 of MixColumns) needs 80829.
The conclusion is that the implementation based on model 3 is more efficient than those based on models 1 and 2 in terms of the device resources occupied.
SIMULATION & INTERPRETATION
Each round has four operations, and the algorithm is iterative in nature: the output of the first round is fed to the second round as input data, and the second round performs the same operations with another round key. This process continues until the last round is reached. In the last round, there is no MixColumns operation. The State array obtained after the last round is the required ciphertext for transmission.

Figures 10 and 11 show the simulation results for data encryption and decryption. The inputs are a 1-bit signal e_d that controls the mode (e_d = 1 for ciphering, otherwise deciphering), the 128-bit plaintext (the data to be encrypted) and the 128-bit key (also used to generate the other round keys). The clock (clk) synchronizes the various blocks, and rst resets the data when it equals 1. The result is the 128-bit output (ciphertext). We use the Quartus II simulator for our simulation.
The detailed simulation diagrams for the AES implementation are presented in Figures 10 and 11. The total duration of the ciphering process is (124s) and that of the deciphering process is (277s).

Figure 11. Simulation of the decoding of AES-128
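The round iteration can be summarized by the compact software model below, written in Python purely as an illustration. It is not the paper's VHDL code. All function names are assumptions, the State is held as a flat list of 16 bytes in column order, and the key schedule shown is the standard AES-128 expansion, which this section of the paper does not detail.

```python
# Illustrative AES-128 encryption model; all names are hypothetical.

def xtime(b):
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    sbox = []
    for x in range(256):
        # Multiplicative inverse by exhaustive search; {00} maps to itself.
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1) if x else 0
        s = inv
        for shift in (1, 2, 3, 4):       # affine transformation as rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def expand_key(key, sbox):
    """Standard AES-128 key schedule: 44 words, grouped into 11 round keys."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [sbox[b] for b in temp[1:] + temp[:1]]  # RotWord, SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def encrypt_block(plaintext, key):
    sbox = build_sbox()
    round_keys = expand_key(key, sbox)
    state = [p ^ k for p, k in zip(plaintext, round_keys[0])]   # initial round
    for rnd in range(1, 11):
        state = [sbox[b] for b in state]                              # SubBytes
        state = [state[(i + 4 * (i % 4)) % 16] for i in range(16)]   # ShiftRows
        if rnd != 10:                          # no MixColumns in the last round
            mixed = []
            for c in range(4):
                a = state[4 * c:4 * c + 4]
                mixed += [
                    xtime(a[r]) ^ xtime(a[(r + 1) % 4]) ^ a[(r + 1) % 4]
                    ^ a[(r + 2) % 4] ^ a[(r + 3) % 4]
                    for r in range(4)
                ]
            state = mixed
        state = [s ^ k for s, k in zip(state, round_keys[rnd])]   # AddRoundKey
    return bytes(state)
```

A model of this kind can serve as a reference when checking simulation waveforms against the sample vectors published with the AES standard.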
During our implementation, we met several difficulties, among which we mention:
- The difference in execution time between ciphering and deciphering, caused by the structure of some functions; we were able to remedy this problem by optimizing the code.
- Stack overflow problems caused by the operating system; we think it is preferable to work with high-performance machines (RAM, clock speed, cache memory).
- Binary calculation in the Galois field, notably multiplication and matrix inversion.
- The sometimes obscure description of the Rijndael algorithm, especially in the deciphering phase.
This encryption system is used in the industrial, commercial and financial domains and on an increasing number of PCs. Its dominance of the market increases daily.
CONCLUSIONS
In this paper, we have presented a novel FPGA implementation of the AES encryption algorithm in VHDL, built on a high-performance MixColumns/InvMixColumns block that uses the properties of the binary calculation.
The results show that FPGA implementations allow us to increase flexibility, lower costs and reduce the time to release enhanced cryptographic equipment, while providing a satisfactory level of security for communication applications or other electronic data transfer processes where security is needed.
The modification (AES using architecture 3 of MixColumns) gives a 12% reduction in area and a 20% increase in speed compared with the original design (AES using architectures 1 and 2 of MixColumns). Our design gives the highest throughput and the best area utilization among the iterative-looping-based FPGA implementations. The decryption algorithm is also implemented and gives better results than the designs in previous work.
This compact design can help in implementing AES for smart cards, RFID tags and wireless sensors. The design prevents timing attacks on MixColumns, because the resulting columns take the same time to compute regardless of the multiplicand. Our optimized and synthesizable VHDL code implements both the encryption and the decryption process. Each program was tested with some of the sample vectors provided by NIST, and the outputs match the expected results with minimal delay. Therefore, AES can indeed be implemented with reasonable efficiency on an FPGA, with encryption and decryption taking an average of 124 and 277(s) respectively (for every 128 bits). The time varies from chip to chip, and the calculated delay can only be regarded as approximate.
