Performance Evaluation, Comparison and Improvement of the Hardware Implementations of the Advanced Encryption Standard S-box

Abstract

The Advanced Encryption Standard (AES) is the most widely used algorithm in symmetric-key cryptography, and its efficient computation is essential for many computing platforms. The S-box is the only nonlinear transformation step of the AES algorithm, so its efficient implementation is crucial for AES hardware. The AES S-box can be implemented either with a look-up table or with finite field arithmetic. The finite field arithmetic approach requires fewer hardware resources. The main objective of this thesis is to evaluate, compare and improve the hardware implementations of the forward, inverse and combined AES S-boxes in terms of area and/or delay. Both the composite field GF((2^4)^2) and the tower field GF(((2^2)^2)^2) are considered.

Our first improvement optimizes the input and output linear mappings of the S-box to obtain a more compact circuit. Our second improvement modifies the architecture of the S-box to achieve a higher speed. We multiplied the S-box input by an 8-bit binary field element to optimize the input and output transformation matrices of the S-box, and then conducted a Matlab® search to find more compact linear mappings. We also modified the fast S-box architecture of Reyhani et al., and optimized and searched its extended linear input mappings to improve its speed. To the best of our knowledge, the improved fast S-box, Fast 3, is the fastest and most efficient (measured by area × delay) AES S-box available in the literature. We also reduced the area and delay of the inversion circuit of the lightweight and fast S-boxes in [69] by slightly modifying the exponentiation block and designing a new subfield inverter block. The improved inversion circuit leads to a more compact and faster lightweight S-box, and to a lower-area fast S-box.

Moreover, we show that the “tech. XORs” concept proposed by Maximov et al. [54] to estimate the delay of the S-box is not accurate. We show how to use the logical effort method [74] instead to estimate and compare the delays of previous and improved S-boxes, regardless of the CMOS technology library used for the implementation. We verified all designs at the RTL level with Mentor Graphics Modelsim® by comparing their outputs against the correct S-box outputs. We used VHDL as the design entry method to Synopsys Design Compiler® and synthesized the designs with the STM 65nm CMOS standard cell library. The synthesis results confirm the lower area and/or delay of the improved S-box designs and match our space and timing analyses.
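As an illustration of the finite field arithmetic approach, the sketch below computes the forward and inverse AES S-boxes in software as a multiplicative inverse in GF(2^8) combined with the AES affine transformation. It assumes the standard AES field polynomial x^8 + x^4 + x^3 + x + 1 and works directly in the polynomial basis of GF(2^8), not in the composite field GF((2^4)^2) or the tower field GF(((2^2)^2)^2) targeted by the hardware designs. The function names (`gf_mul`, `gf_inv`, `sbox`, `inv_sbox`) are hypothetical. A model like this can serve as a golden reference of the kind used in the RTL verification step, or to generate the 256-entry look-up table.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES field polynomial


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (shift-and-add, reduced mod AES_POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # degree-8 term appeared: reduce
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse via a^254 (Fermat); yields 0 for input 0, as AES requires."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, k: int) -> int:
    """Rotate an 8-bit value left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox(x: int) -> int:
    """Forward S-box: GF(2^8) inversion followed by the affine map (+0x63)."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def inv_sbox(y: int) -> int:
    """Inverse S-box: inverse affine map (+0x05) followed by GF(2^8) inversion."""
    b = rotl8(y, 1) ^ rotl8(y, 3) ^ rotl8(y, 6) ^ 0x05
    return gf_inv(b)


if __name__ == "__main__":
    table = [sbox(x) for x in range(256)]
    print(hex(table[0x00]), hex(table[0x53]))
```

Running the script prints `0x63 0xed`, which matches the FIPS-197 S-box entries for inputs 0x00 and 0x53. A hardware implementation replaces `gf_inv` with a subfield-based inverter and merges the field-isomorphism matrices into the input and output linear mappings. Those merged mappings are what the optimizations described above target.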
