Design of AES Architecture With Area and Speed Tradeoff  by Shaji, Neenu & Bonifus, P.L.
 Procedia Technology  24 ( 2016 )  1135 – 1140 
2212-0173 © 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license 
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of ICETEST – 2015
doi: 10.1016/j.protcy.2016.05.066 
International Conference on Emerging Trends in Engineering, Science and Technology
(ICETEST - 2015)
Design of AES architecture with area and speed tradeoff
Neenu Shaji (a), Bonifus P.L. (b)
(a) Student, Department of Electronics Communication Engineering, Rajagiri School of Engineering and Technology, Kochi, Kerala, India
(b) Asst. Professor, Department of Electronics Communication Engineering, Rajagiri School of Engineering and Technology, Kerala, India
Abstract
AES is a well-accepted cryptographic algorithm that can be used to secure electronic information, since it has
proven resistant to most known attacks. In this work, we present an AES-128 encryption and decryption circuit that
uses an area-optimized iterative architecture. To obtain a lower area-delay product, the transformations are mapped
to 8-bit hardware while the datapath is kept at 128 bits. A control unit keeps track of the transformations. The
proposed architecture has been implemented on a Xilinx Spartan FPGA, and its area and delay are compared with those
of previous works; the comparison is reported to show lower area and delay for the proposed technique.
Keywords: Security; Cryptography; AES; Encryption; Decryption; Field Programmable Gate Array (FPGA); RTL.
1. Introduction
Assuring security in networks and storage devices has become essential. As crime rates rise, data must be
protected efficiently against all possible attacks. Data security requires a secure communication channel, a strong
data encryption technique and a trusted third party to maintain the database. Messages need to be secured against
malicious attacks. Encipherment is one mechanism for protecting data from unauthorized access. Cryptography enables
us to store sensitive information or transmit it across insecure networks such as the Internet so that no one other
than the intended recipient can read it [4].
Both software-based and hardware-based security solutions can encrypt data. However, a malicious program or a
hacker could corrupt the data to make it unrecoverable, making the system unusable. Hardware-based security
solutions can provide better protection than software solutions because they are less exposed to hackers and virus
attacks. They also provide protection against unauthorized access.
∗ Corresponding author. Tel.: +91-9562828805
E-mail address: neenuks@gmail.com, bonifus@rajagiritech.ac.in
To implement the algorithm in hardware, its area must be optimized to fit the device. Low area also reduces cost
and power consumption. Since the AES algorithm is computationally extensive and its architecture is area consuming,
it is important to optimize it so that it fits low-power embedded devices. Area optimization of the AES
architecture is obtained by using an iterative structure and narrower datapaths wherever possible, with little
degradation in throughput. The design is implemented in Verilog HDL and synthesized for a Xilinx Spartan device
(encryption) using the Xilinx ISE tool [1].
2. Preliminaries
The AES algorithm is a symmetric-key (private key) block cipher. It encrypts data in blocks of 128 bits. It can use
any of three key sizes, 128, 192 or 256 bits, and the number of rounds (loop iterations) is 10, 12 or 14
respectively, depending on the key length. Each round has 4 transformations: SubBytes, ShiftRows, MixColumns and
AddRoundKey. AddRoundKey is also executed once for initialization, as shown in Fig. 1, and the final round excludes
the MixColumns transformation. ShiftRows cyclically shifts each of the four 4-byte rows of the State, with offsets
of 0 to 3. The SubBytes transformation is a non-linear byte substitution that operates independently on each byte
of the State using a substitution table (S-box). This S-box is constructed by composing two transformations: the
multiplicative inverse in the finite field GF(2^8) and an affine transformation [6]. MixColumns treats the 4 bytes
in each column as coefficients of a 4-term polynomial and multiplies it modulo x^4 + 1 by a fixed polynomial.
AddRoundKey is a simple bit-wise XOR of the 128-bit round key and the data [5]. The above encryption scheme can be
inverted to obtain the decryption structure.
Fig. 1. AES operations.
The round keys are generated by the key generation unit. For AES-128 it generates 11 round keys of 128 bits each
from one 128-bit secret key. In general, this unit generates 176, 208 or 240 bytes of round keys depending on the
size of the key used. The AES algorithm operates over the finite field GF(2^8), in which each data byte is
represented as a polynomial. The decryption process involves the inverse steps: Inverse S-box for byte
substitution, Inverse ShiftRows, AddRoundKey and Inverse MixColumns.
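The round counts and round-key sizes quoted above follow from simple arithmetic: one 128-bit round key is needed per round, plus one for the initial AddRoundKey. The short Python sketch below reproduces the figures; the function names are illustrative and do not come from the paper.

```python
# Sketch: derive the round count and round-key storage from the key length.
BLOCK_BYTES = 16  # the AES block size is fixed at 128 bits


def rounds_for_key(key_bits):
    """Return the number of AES rounds for a 128-, 192- or 256-bit key."""
    return {128: 10, 192: 12, 256: 14}[key_bits]


def round_key_bytes(key_bits):
    """One 16-byte round key per round, plus one for the initial AddRoundKey."""
    return (rounds_for_key(key_bits) + 1) * BLOCK_BYTES


for bits in (128, 192, 256):
    print(bits, rounds_for_key(bits), round_key_bytes(bits))
```

For a 128-bit key this gives 11 round keys, i.e. 176 bytes or 44 32-bit words.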
3. Area optimization of AES encryptor and decryptor
The AES architecture of this design adopts an iterative round-looping structure in which sub-modules are mapped to
narrower datapaths of 8 and 32 bits. Internally, the AES transformations are performed on a two-dimensional array
of bytes called the State [6]. Here the hardware units are 8 bits wide while the overall datapath remains 128 bits.
128-bit registers are placed between the hardware units to improve speed at very little area cost. A counter counts
the number of rounds executed. A mux is used to skip the MixColumns operation in the last round. The key is
expanded on the fly to avoid storing the expanded key of 44 words.
3.1. ShiftRow Implementation
ShiftRows() is a cyclic shift of the bytes of the State. In the algorithm, ShiftRows comes after the SubBytes
transformation. However, swapping these two steps produces the same output: their order is not significant because
SubBytes operates on single bytes, and ShiftRows reorders bytes without altering them [6]. Hence the area of a
shifter can be saved by directly port mapping each required byte and routing it to the S-box.
Fig. 2. Shift row by port mapping
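The commutation argument can be checked with a small Python sketch. It assumes the State is stored as 16 bytes in column-major order (index = row + 4 * column), and it uses a toy byte substitution as a stand-in for the S-box, since the property holds for any byte-wise map. Both choices are assumptions made for illustration, not details taken from the paper.

```python
# Sketch: ShiftRows is a fixed byte permutation, so it commutes with any
# byte-wise substitution. toy_sub is a hypothetical stand-in, NOT the AES S-box.

def toy_sub(b):
    """Toy byte substitution (7 is odd, so this permutes 0..255)."""
    return (b * 7 + 3) % 256


def sub_bytes(state):
    """Apply the substitution to every byte independently."""
    return [toy_sub(b) for b in state]


def shift_rows(state):
    """Rotate row r left by r positions; state index = row + 4 * column."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


state = list(range(16))

# With the identity state, the output lists which input byte drives each
# output position: this is the "wiring" that port mapping hard-codes.
print(shift_rows(state))

# Both orders give the same result.
print(sub_bytes(shift_rows(state)) == shift_rows(sub_bytes(state)))
```

Because the permutation is fixed, it costs only routing in hardware, which is the saving the text describes.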
3.2. Sbox Implementation
The S-box is usually implemented as a LUT, which is highly area consuming. Here, combinational logic has been
created that maps the Galois field to lower-order composite fields so that the implementation uses only logic
elements. This reduces circuit complexity, so the S-box can be implemented with a very low gate count. The input
receives 8-bit data and maps the element of GF(2^8) to its composite field using an isomorphism function. It then
computes the multiplicative inverse over the composite field and, in the last stage, re-maps the result back to
GF(2^8) using the inverse isomorphism function. The isomorphic mapping function and its inverse are needed to map
the representation of an element in GF(2^8) to its composite field and vice versa [7]. The InvSubBytes
transformation can be implemented as the inverse of this [2] [5] [6] [9].
Fig. 3. Sbox implementation using gates
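As a reference for what the gate-level S-box must compute, the Python sketch below derives each S-box value from its definition in FIPS 197: a multiplicative inverse in GF(2^8) followed by the affine transformation. Note that it computes the inverse by plain exponentiation (a^254) directly in GF(2^8); it does not model the composite-field mapping used in the design, and the function names are illustrative.

```python
# Sketch: compute AES S-box values from the definition (no lookup table).
# The inverse is found by exponentiation in GF(2^8), NOT by the
# composite-field decomposition described in the text.

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254; maps 0 to 0, as AES requires."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x):
    """Inverse in GF(2^8) followed by the AES affine transformation."""
    inv = gf_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


def inv_sbox_table():
    """Build the inverse S-box by inverting the forward mapping."""
    table = [0] * 256
    for x in range(256):
        table[sbox(x)] = x
    return table


print(hex(sbox(0x00)), hex(sbox(0x01)))  # 0x63 0x7c
```

A model like this can serve as a golden reference when checking a combinational S-box in simulation.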
3.3. MixColumn Implementation
In this module, one column of the State is processed at a time in four clock cycles, as shown in Fig. 4. In each
clock cycle a new byte is fed to the unit, and the four registers store the intermediate results of the MixColumns
calculation.
Fig. 4. Mix column operation using registers
Every four cycles, the 32-bit output is fed to the output registers. The MixColumns multiplier performs a complete
MixColumns operation in 16 cycles, in parallel with the rest of the operations of the AES core [7].
Since the MixColumns operation takes its data one column at a time, a control unit is required to supply the data
to the architecture byte by byte and to produce the output only after 4 clock cycles. The purpose of the control
unit is to provide enable signals to the registers and the MixColumns unit. The activity is controlled using a
3-bit counter. Each 32-bit register is enabled once every 4 clock cycles, and after 16 clock cycles the data is
stored in the 128-bit register.
Fig. 5. Control unit to carryout mix column operation
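The byte-serial behaviour described above can be modelled in a few lines of Python. This is a behavioural sketch only: the class name, the register reset policy and the way the coefficient is selected are assumptions, and the enable signals of the paper's control unit are not reproduced. Each call to clock() stands for one clock cycle in which one input byte is accumulated into four registers.

```python
# Sketch: behavioural model of a byte-serial MixColumns unit.
# One byte enters per cycle; a 4-byte column leaves every fourth cycle.

def xtime(b):
    """Multiply by x (i.e. by 2) in GF(2^8) modulo 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def scale(b, k):
    """Multiply byte b by the MixColumns coefficient k (1, 2 or 3)."""
    if k == 1:
        return b
    if k == 2:
        return xtime(b)
    return xtime(b) ^ b  # k == 3


COEFFS = (2, 3, 1, 1)  # first row of the circulant MixColumns matrix


class SerialMixColumns:
    def __init__(self):
        self.regs = [0, 0, 0, 0]  # four accumulator registers
        self.count = 0            # position of the incoming byte in its column

    def clock(self, byte_in):
        """Feed one byte; return the 4 output bytes on every fourth call, else None."""
        j = self.count
        for i in range(4):
            # Matrix entry for output row i and input position j.
            self.regs[i] ^= scale(byte_in, COEFFS[(j - i) % 4])
        self.count = (j + 1) % 4
        if self.count == 0:
            out, self.regs = self.regs, [0, 0, 0, 0]
            return out
        return None


unit = SerialMixColumns()
for byte in (0xD4, 0xBF, 0x5D, 0x30):
    column = unit.clock(byte)
print([hex(b) for b in column])  # ['0x4', '0x66', '0x81', '0xe5']
```

Feeding all 16 bytes of a State through the same unit takes 16 calls, matching the 16-cycle figure given in the text.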
3.4. Add round key
AddRoundKey is a self-inverting transformation. In the first iteration of the algorithm, it transforms the input
data by XORing the 128 bits of plaintext with 128 bits of the expanded cipher key. In subsequent iterations, the
partially processed data is XORed with the expanded cipher key. These round keys can be prepared on the fly, in
parallel with the encryption process [1] [10].
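The following Python sketch illustrates on-the-fly key expansion for AES-128 and the self-inverting nature of AddRoundKey. It follows the key schedule defined in FIPS 197 and keeps only the most recent round key, which is the idea behind avoiding storage of all 44 words. The function names and the generator-based structure are assumptions for illustration and do not describe the authors' RTL; the S-box is computed from its definition to keep the block self-contained.

```python
# Sketch: AES-128 round keys derived one at a time from the previous key.

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox(x):
    """AES S-box from its definition: inverse (x^254) then affine map."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    s = inv
    for n in range(1, 5):
        s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
    return s ^ 0x63


SBOX = [sbox(x) for x in range(256)]


def next_round_key(prev, rcon):
    """Derive the next 16-byte round key from the previous one."""
    temp = prev[13:16] + prev[12:13]   # RotWord on the last word
    temp = [SBOX[b] for b in temp]     # SubWord
    temp[0] ^= rcon                    # round constant
    out = []
    for i in range(16):
        feed = temp[i] if i < 4 else out[i - 4]
        out.append(prev[i] ^ feed)
    return out


def round_keys(key):
    """Yield the 11 AES-128 round keys, holding only the latest in memory."""
    rk, rcon = list(key), 0x01
    yield rk
    for _ in range(10):
        rk = next_round_key(rk, rcon)
        rcon = gf_mul(rcon, 2)
        yield rk


def add_round_key(state, rk):
    """Bit-wise XOR of state and round key; applying it twice restores the state."""
    return [s ^ k for s, k in zip(state, rk)]


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS 197 example key
for index, rk in enumerate(round_keys(key)):
    print(index, bytes(rk).hex())
```

Applying add_round_key twice with the same key returns the original state, which is why the same XOR hardware serves both encryption and decryption.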
Fig. 6 shows the iterative design of the AES architecture. All computational units are 8 bits wide while the
overall datapath remains 128 bits. Registers of 32 and 128 bits are used to achieve this. Hence, with a small delay
in producing the output, the AES design can fit into a low-area FPGA even without using BRAM.
Fig. 6. AES iterative structure.
4. Experimental Results
The proposed AES design and its sub-modules are implemented in the Verilog hardware description language using
Xilinx ISE 14.6. The device utilization of the complete encryptor design was obtained for the selected device,
Spartan xc3s100e-5vq100.
Fig. 7. AES encrypted output.
The simulation result of the proposed architecture is shown in Fig. 7. In comparison with previous works, the
results indicate that the area-delay product is optimized. Our work is compared with other FPGA implementations of
AES in Table 1. Only the area consumption of 128-bit AES implementations can be compared directly. Most of the
works that achieved low area used either an 8-bit or a 32-bit implementation, or else used BRAM. The delay was
observed to be 3.06 ns.
Table 1. Comparison with other works
Design          No. of slices   BRAM used   Maximum frequency (MHz)   Throughput (Mbps)
Rouvroy [9]     146             3           123                       358
Chodowiec [6]   222             3           60                        166
T. Good [8]     124             2           72.3                      -
Our design      930             -           81.74                     -
5. Conclusion
This work addresses the area optimization of AES by mapping the transformations to narrower-datapath hardware and
using an iterative loop architecture. The design achieves a tradeoff between speed and area even without using
BRAM. The datapath is maintained at 128 bits and the hardware units are either 8 or 32 bits wide. The whole design
was developed and synthesized with the Xilinx ISE tools. Simulation and synthesis target the Spartan 3 device. From
the obtained performance, we conclude that the proposed AES architecture is suitable for use in
resource-constrained systems.
References
[1] National Institute of Standards and Technology, NIST FIPS PUB 197, Advanced Encryption Standard, 2001.
[2] X. Zhang and K. Parhi, High-Speed VLSI Architectures for the AES Algorithm, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 9, September 2004.
[3] M. Bellare and P. Rogaway, Introduction to Modern Cryptography, p. 10, 21 September 2005.
[4] P. Hämäläinen, T. Alho, M. Hännikäinen, and D. Hämäläinen, Design and Implementation of Low-area and Low-power AES Encryption Hardware
Core, Proceedings of the 9th EUROMICRO Conference on Digital System Design, 2006.
[5] S. Morioka and A. Satoh, An Optimized S-Box Circuit Architecture for Low Power AES Design, in Proc. ASIACRYPT, 2003, pp. 172-186.
[6] K. Gaj and P. Chodowiec, Very Compact FPGA Implementation of the AES Algorithm, in Proceedings of CHES 2003, Lecture Notes in
Computer Science, vol. 2779, pp. 319-333, Springer-Verlag.
[7] S. Kaur and R. Vig, Efficient Implementation of AES Algorithm in FPGA Device, in Proceedings of the International Conference on Computational
Intelligence and Multimedia Applications, 2007.
[8] T. Good and M. Benaissa, AES on FPGA from the fastest to the smallest, UK Engineering and Physical Sciences Research Council (EPSRC).
[9] G. Rouvroy, F. X. Standaert, J. J. Quisquater, and J. D. Legat, Compact and efficient encryption/decryption module for FPGA implementation
of the AES Rijndael very well suited for small embedded applications, in Proc. ITCC04, Apr. 2004, vol. 2, pp. 583-587.
[10] A. Satoh, S. Morioka et al., "A Compact Rijndael Hardware Architecture with S-Box Optimization," in ASIACRYPT 2001, LNCS 2248, pp.
239-254, Springer-Verlag Berlin Heidelberg, 2001.
