Abstract-A cryptographic algorithm is a tool used to secure data transmitted over a network. The current standard algorithm, the Advanced Encryption Standard (AES), is used to maintain the security and reliability of encrypted data, whether the data are stored on a computer or in transit. AES can be implemented in either hardware or software; however, a hardware implementation is more suitable for high-speed applications. In this paper, the AES-128 algorithm is implemented in hardware in order to achieve high-speed data processing. It is implemented on an FPGA platform using a High Level Language (HLL) and Xilinx ISE software. The design is optimized and synthesizable with high accuracy using the conventional blocks of Xilinx System Generator. The implementation results show improved performance in terms of resource utilization, speed and power consumption compared with other related works. The circuit operates at a maximum frequency of 800 MHz, which offers a high throughput of 102.4 Gbps on a Virtex-6 xc6vlx130t-3ff1156, and it occupies only 2,405 slices.
implementation of the Advanced Encryption Standard on XC2v6000-6 of Xilinx using three hardware languages (VHDL, Handel-C and JBits) with a throughput of 24.92 Gbps. Dong Chen 2010 [8] implemented the AES algorithm on a Xilinx Virtex-4 xc4vlx100 device using the composite field algorithm to realize the SubByte operation. A sub-pipelined architecture was designed to improve the frequency and achieve higher throughput. It operates at a maximum frequency of 645.703 MHz with a throughput of 82.65 Gbps. Pravin B. G. 2010 [9] used the Very High Speed Integrated Circuit Hardware Description Language (VHDL) and a Virtex XCV600 FPGA to implement the AES algorithm. In order to increase the throughput of the design, a pipelined architecture was proposed for implementing the encryption and decryption algorithms. The throughput for both encryption and decryption is 352 MHz. A. Arshad 2014 [10] proposed a hardware implementation of the Advanced Encryption Standard algorithm on a Virtex-6 xc6vsx315t-3ff1156 FPGA using a High Level Language (HLL) approach. A pipelined architecture was implemented to increase the overall frequency and minimize the critical resources of the design. It operates at a maximum frequency of 288.19 MHz and offers a high throughput of 36.864 Gbps. S. M. Umar Talha 2016 [11] presented an efficient reconfigurable hardware implementation of AES using a High Level Language (HLL) on a Virtex-6 xc6vsx315t with the help of Xilinx System Generator. The implementation uses a pipelined architecture to reach a maximum frequency of 254.453 MHz. Kirat Pal Singh 2016 [12] proposed an efficient hardware architecture design and implementation of the AES algorithm using VHDL on the XC6vlx240t of the Xilinx Virtex family. The design was based on a pipelined architecture to increase the frequency. It operates at a maximum frequency of 515.38 MHz.
III. AES ALGORITHM
In November 2001, the Advanced Encryption Standard (AES) was officially announced by the National Institute of Standards and Technology (NIST) [3]. The algorithm is a block cipher based on a substitution-permutation network (SP-network); it has a block size of 128 bits and a key length of 128, 192, or 256 bits. It is referred to as AES-128, AES-192, or AES-256, depending on the key length. The AES-128 algorithm consists of ten rounds, and each round consists of a sequence of four different transformations, called steps: SubByte, ShiftRow, MixColumn and AddRoundKey. These steps are identical in all rounds except the last, which is applied without the MixColumn transformation. In addition, the value of the round key differs from round to round and from the user-supplied key.
• SubByte Transformation
SubByte is a non-linear substitution of bytes, where each byte of the state is substituted with another using a lookup table (S-box). The S-box is constructed by composing two transformations: first, taking the multiplicative inverse of the element in the finite field GF(2^8), where the element {00} is mapped to itself; then, applying a fixed affine transformation over GF(2).
• ShiftRow Transformation
The elements in the last three rows of the state are cyclically shifted to the left by different offsets, while the first row remains unchanged. The second row is shifted 1 byte to the left, the third row 2 bytes to the left, and the last row 3 bytes to the left.
• AddRoundKey
In this step, the 16 bytes of the state are XORed with the 16 bytes of the round key.
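The transformations described above can be illustrated in software. The following Python is a minimal sketch, not the hardware design of this paper. The function names (gf_mul, build_sbox, shift_rows, add_round_key) are hypothetical, the reduction constant 0x1B and the affine constant 0x63 are taken from the AES standard (FIPS-197), and the state is assumed to be a list of 16 bytes in column-major order (index = row + 4 * column).

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # low byte of the irreducible polynomial (FIPS-197)
        b >>= 1
    return result


def build_sbox():
    """Build the S-box: multiplicative inverse, then affine transformation."""
    sbox = []
    for x in range(256):
        inv = 0  # {00} is mapped to itself
        if x:
            inv = 1
            for _ in range(254):  # x^254 equals the inverse of x
                inv = gf_mul(inv, x)
        y = inv
        for k in range(1, 5):  # affine map: XOR with four left rotations
            y ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(y ^ 0x63)  # affine constant (FIPS-197)
    return sbox


SBOX = build_sbox()


def sub_bytes(state):
    """Substitute every byte of the state through the S-box."""
    return [SBOX[b] for b in state]


def shift_rows(state):
    """Rotate row r of the column-major state left by r bytes."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def add_round_key(state, round_key):
    """XOR the 16 state bytes with the 16 round-key bytes."""
    return [s ^ k for s, k in zip(state, round_key)]
```

With this layout, byte 0 of the state stays in place under shift_rows, while the bytes of rows 1 to 3 move one, two and three columns to the left, respectively.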
IV. AES HARDWARE DESIGN AND IMPLEMENTATION
The AES algorithm is designed on an FPGA [13] using Xilinx System Generator [14]. The design of the AES-128 is shown in Fig. 1. It is a loop-unrolling architecture in which each round is implemented separately. In this architecture, the hardware required to implement a round is replicated once per round, i.e., 10 times. Thus, the throughput is increased at the cost of the area used. Accordingly, the design is suitable for applications that require high-speed implementation. The SubByte transformation is implemented using a lookup table. A Read-Only Memory (ROM) is used to store the 256 values of the S-box in the form of decimal numbers, as shown in Fig. 2. This ROM is duplicated 16 times so that the 16 bytes of the block are processed in parallel.
• ShiftRow
The ShiftRow transformation is implemented by reconnecting the wires according to the shift operation explained in Section III, without involving any logic.
• MixColumn
XORing the bytes of the word after multiplying them with the mentioned array. Fig. 3 shows the architecture of the MixColumn for the first 32-bit word. One of the bytes of the word is multiplied by 2. Using the Xilinx multiplier costs a latency of three cycles; therefore, Multiplier 2 is designed using the circuit shown in Fig. 4, which costs nothing in hardware. The circuit works as follows: if the value of the input byte is greater than or equal to 128, i.e., the most significant bit of the byte is one, the output byte is computed by shifting the input byte 1 bit to the left (discarding the bit shifted out) and XORing the result with the constant 27 (0x1B, the decimal value of the low-order byte of the irreducible polynomial x^8 + x^4 + x^3 + x + 1). Otherwise, the XOR operation is skipped. From this circuit, Multiplier 3 is designed by XORing the output of Multiplier 2 with the input data, as shown in Fig. 5.
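The multiplier circuits described above can be mirrored in a few lines of Python. This is a sketch under assumptions: the names mul2, mul3 and mix_column are hypothetical, and because the coefficient array is not reproduced in this section, the standard AES MixColumn matrix from FIPS-197, with rows (2 3 1 1), (1 2 3 1), (1 1 2 3) and (3 1 1 2), is assumed.

```python
def mul2(byte):
    """Multiply by 2 in GF(2^8): shift left, then XOR with 27 if the MSB was set."""
    shifted = (byte << 1) & 0xFF      # keep 8 bits, dropping the shifted-out bit
    if byte >= 128:                   # most significant bit of the input is one
        shifted ^= 27                 # 27 = 0x1B
    return shifted


def mul3(byte):
    """Multiply by 3 in GF(2^8): (2 * byte) XOR byte."""
    return mul2(byte) ^ byte


def mix_column(word):
    """Mix one 4-byte column using the assumed standard AES matrix."""
    a0, a1, a2, a3 = word
    return [
        mul2(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ mul2(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ mul2(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ mul2(a3),
    ]
```

Because both multipliers reduce to a shift and XOR operations, no general-purpose multiplier is needed, which is the motivation given above for avoiding the Xilinx multiplier block.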
Iraqi Journal of Information and Communications Technology(IJICT)
In the AddRoundKey transformation, the 16 output bytes of the MixColumn operation are XORed with the 16 bytes of the round sub-key.
• Design of Round Key Generation
AES-128 requires generating sub-keys totalling 44 words (32 bits each): a 4-word sub-key for each round, in addition to the 4 words of the initial key addition, which are the user-provided key itself. Each word is generated by XORing the immediately preceding word with the word located four positions before the new one; for example, Word 5 is generated by XORing Word 4 with Word 1. This rule is applied to three out of the four words of each round. For the first word of each round, whose index is a multiple of 4, a different rule is used: the word immediately preceding the word to be generated is first passed through the G function. This function, shown in Fig. 6, consists of three steps. The first step maps the values of the 4 bytes by passing them through the substitution box. This layer is implemented as in the cipher, using an S-box whose values are calculated in advance and stored in a ROM. The next step is the rotate word, which rotates the 4 bytes one byte to the left. This layer is realized by rearranging the output wires according to the algorithm, without involving any logic. The last step is to XOR the result of the previous step with the round constant. This constant is used to break the symmetry, as its value changes at each round. It starts with a value of one and is multiplied by 2 every round, noting that this multiplication is a polynomial multiplication in GF(2^8). The output of the G function is then XORed with the column that is three columns behind the input of G, i.e., four positions before the word being generated. The design is shown in Fig. 7.
The cipher has been implemented on the target device, a Xilinx xc6vlx130t-3ff1156 FPGA, which was chosen to meet the requirements of the proposed design. The algorithm is designed using Xilinx System Generator, which generates the necessary netlist file; this is synthesized and simulated using Xilinx ISE Foundation 14.7 [15] and ModelSim [16]. A block of data is processed in only one cycle, owing to the selected architecture and the efficient design of the layers.
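The round-key generation described above can be sketched in Python as follows. This is an illustration only, not the System Generator design: the names (xtime, build_sbox, g_function, expand_key) are hypothetical, words are indexed from 0 so that the user key occupies Words 0 to 3, and the S-box is computed on the fly rather than read from a ROM.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8) (polynomial multiplication)."""
    return ((b << 1) & 0xFF) ^ (0x1B if b & 0x80 else 0)


def gf_mul(a, b):
    """General GF(2^8) product, used here only to build the S-box."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = xtime(a), b >> 1
    return r


def build_sbox():
    """S-box values: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):
                inv = gf_mul(inv, x)
        y = inv
        for k in range(1, 5):
            y ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()


def g_function(word, rcon):
    """Substitute the 4 bytes, rotate left by one byte, XOR the round constant."""
    out = [SBOX[b] for b in word]   # step 1: substitution
    out = out[1:] + out[:1]         # step 2: rotate word
    out[0] ^= rcon                  # step 3: round constant on the first byte
    return out


def expand_key(key):
    """Expand a 16-byte key into 44 words of 4 bytes each."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1                        # starts at one, doubled every round
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:              # first word of each round
            temp = g_function(temp, rcon)
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words
```

Since the substitution acts on each byte independently, applying it before or after the rotation gives the same result; the order used here follows the description of the G function above.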
The results show that the AES-128 loop-unrolling architecture operates at a maximum frequency of 800 MHz and offers a high throughput of 102.4 Gbps. The design occupies 2,405 slices out of 20,000 (12%). The total power consumption of the circuit is 3.477 W. The simulation result obtained with the ModelSim simulator is shown in Fig. 8. The values of the input key and plaintext, as well as the generated ciphertext, are:
Input Key: 2b7e151628aed2a6abf7158809cf4f3c
Plain Text: 3243f6a8885a308d313198a2e0370734
Cipher Text: 3925841d02dc09fbdc118597196a0b32
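These values can be cross-checked with a software reference model. The Python below is a minimal sketch of AES-128 encryption written for illustration, not the hardware design itself. All function names are hypothetical, the state is assumed to be stored column by column, and the standard MixColumn matrix and S-box constants from FIPS-197 are assumed.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8)."""
    return ((b << 1) & 0xFF) ^ (0x1B if b & 0x80 else 0)


def gf_mul(a, b):
    """General GF(2^8) product, used only to build the S-box."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = xtime(a), b >> 1
    return r


def build_sbox():
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):            # x^254 is the inverse of x
                inv = gf_mul(inv, x)
        y = inv
        for k in range(1, 5):               # affine transformation
            y ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """Return the 11 round keys (16 bytes each) of AES-128."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp]  # substitute
            temp = temp[1:] + temp[:1]      # rotate
            temp[0] ^= rcon                 # round constant
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def shift_rows(s):
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        for r in range(4):                  # 2*a[r] ^ 3*a[r+1] ^ a[r+2] ^ a[r+3]
            nxt = a[(r + 1) % 4]
            out.append(xtime(a[r]) ^ xtime(nxt) ^ nxt
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def encrypt_block(plaintext, key):
    """Encrypt one 16-byte block with a 16-byte key (ten rounds)."""
    round_keys = expand_key(key)
    state = [p ^ k for p, k in zip(plaintext, round_keys[0])]
    for rnd in range(1, 11):
        state = shift_rows([SBOX[b] for b in state])
        if rnd != 10:                       # last round skips MixColumn
            state = mix_columns(state)
        state = [s ^ k for s, k in zip(state, round_keys[rnd])]
    return bytes(state)


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
plaintext = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
print(encrypt_block(plaintext, key).hex())
```

The key, plaintext and ciphertext listed above coincide with the example vector given in Appendix B of FIPS-197, so the printed value can be compared directly against the Cipher Text line.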
The performance of the algorithm is clearly enhanced as shown in Table I, which presents a comparison between the obtained results and other related works.

Work            Device                  Max. frequency (MHz)   Throughput (Gbps)
[...]           Virtex-4 xc4vlx100      645.7                  82.6
Reference [6]   Virtex XCV600           140.390                -
Reference [7]   Virtex-6 xc6vsx315t     288.19                 36.8
Reference [8]   Virtex-6 xc6vsx315t     254.45                 -
Reference [9]   Virtex-6 XC6vlx240t     515.38                 -
Proposed work   Virtex-6 xc6vlx130t-3   800                    102.4
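The relation between the reported frequencies and throughputs can be checked with simple arithmetic. The sketch below assumes that throughput equals the block size multiplied by the clock frequency and divided by the number of cycles per block, with one 128-bit block completed per cycle as stated for the proposed design. The function name throughput_gbps and its parameters are hypothetical.

```python
def throughput_gbps(f_max_mhz, block_bits=128, cycles_per_block=1):
    """Throughput in Gbps for a design that emits one block every
    `cycles_per_block` clock cycles at `f_max_mhz` MHz (assumed model)."""
    bits_per_second = f_max_mhz * 1e6 * block_bits / cycles_per_block
    return bits_per_second / 1e9


# Proposed design: 800 MHz, one 128-bit block per cycle
print(round(throughput_gbps(800), 1))
# Design reported at 645.703 MHz
print(round(throughput_gbps(645.703), 2))
```

Under this assumption, 800 MHz corresponds to 102.4 Gbps and 645.703 MHz to about 82.65 Gbps, in agreement with the values quoted in the text.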
VI. CONCLUSIONS
In this paper, an efficient implementation of AES in terms of throughput, area, and power consumption is presented using Xilinx System Generator. The chosen methodology was to optimize the implementation speed, thus loop unrolling
