



OPTIMIZATION OF ADVANCED ENCRYPTION STANDARD (AES) IN FPGA IMPLEMENTATION
USING S-BOX INTEGRATION
by




Lee Yi Lin (1520)
Dissertation submitted in partial fulfillment of
The requirement for the
Bachelor of Engineering (Hons)






Final Year Project Dissertation
CERTIFICATE OF APPROVAL
Optimization of Advanced Encryption Standard (AES) in FPGA
Implementation using S-Box Integration
by
Lee Yi Lin
Dissertation submitted to the
Electrical and Electronics Engineering Programme
Universiti Teknologi PETRONAS
in partial fulfillment of the requirement for the
BACHELOR OF ENGINEERING (Hon)
(ELECTRICAL AND ELECTRONICS ENGINEERING)
Approved by:
(NOOHUL BASHEER ZAIN ALI)
UNIVERSITI TEKNOLOGI PETRONAS
TRONOH, PERAK
June 2004
CERTIFICATE OF ORIGINALITY
This is to certify that I am responsible for the work submitted in this project, that the
original work is my own except as specified in the references and
acknowledgements, and that the original work contained herein has not been




ACKNOWLEDGEMENT
This project would not have been possible without the help of a number of people,
and the author would like to express his heartfelt gratitude to all of them.
First of all, the author would like to express his foremost gratitude to his project
supervisor, Mr. Noohul Basheer Zain Ali, for his endless guidance and continuous
monitoring throughout the two semesters of the final year project. His comments,
suggestions, and advice were given serious consideration and were invaluable in
determining the final output of this project.
The author is greatly indebted to his parents, Lee Sin Fatt and Chew Lai Chan, for
their never-ending support and encouragement. The author would also like to thank
his lovely family members, for their love and care which continue to support him
through the difficult times.
Thanks to Mr. Rudolf Usselmann from asics.ws, for sharing his hard work on the
AES IP cores on the OpenCores webpage.
Last but not least, the author would like to thank his friends and everyone who
provided useful assistance and support to help him complete his project
successfully.
ABSTRACT
Cryptography has a significant role in the security of data transmission. The
algorithm of Rijndael was selected and adopted by National Institute of Standards
and Technology (NIST) U.S. as Advanced Encryption Standard (AES) in October
2000, in order to replace the old Data Encryption Standard (DES).
As compared to software, hardware implementations provide more physical security
as well as faster speed. Thus, in this project, the AES cryptograph was simulated
with FPGA, by using Verilog HDL. The main objectives are the architectural and
algorithmic optimizations of the AES implementation, which would in turn benefit
applications that are both speed and area critical.
The optimization in this project was achieved using S-Box integration. The S-Box,
used by SubBytes, and the Inverse S-Box, used by InvSubBytes, are each a 256-byte
substitution table. A high-speed, fully pipelined AES implementation typically
requires 24 S-Box tables and 16 Inverse S-Box tables at any one time.
Nonetheless, the mathematical formulas show that the S-Box and the Inverse S-Box
can be obtained from only $g$, $f$ and $f^{-1}$. The multiplicative inverse, $g$, is a
256-byte look-up table. The affine transformation, $f$, and its inverse, $f^{-1}$, can be
implemented with a limited number of XOR gates. Accordingly, the number of
substitution tables required can be reduced by half.
Consequently, the new implementation still obtains identical S-Box and
Inverse S-Box values, but from a single look-up table and some simple logic
gates. The new design delivers a throughput of 203 Mbit/sec with a hardware
gate count of 78,977. Hardware complexity is reduced to 69% of the original,
while the core process still completes in only 12 cycles.
TABLE OF CONTENTS
CERTIFICATE OF APPROVAL
CERTIFICATE OF ORIGINALITY
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF APPENDICES
ABBREVIATIONS AND NOMENCLATURES
1. INTRODUCTION
1.1 Background of Study
1.2 Problem Statement
1.3 Objectives and Scope of Study
2. LITERATURE REVIEW AND THEORY
2.1 The AES Algorithm
2.2 Algorithm Specification of AES
2.2.1 Cipher
2.2.2 Key Expansion
2.2.3 Inverse Cipher
2.3 Dedicated Hardware
2.3.1 Decomposition of Srd
3. METHODOLOGY / PROJECT WORK
3.1 Procedure Identification
3.1.1 AES Algorithmic Review and Analysis
3.1.2 Verilog and FPGA Tools Familiarization
3.1.3 AES Core Analysis and Top Layer Implementation
3.1.4 Design Verification of Combined S-Box with MATLAB
3.1.5 Module Validation of Combined S-Box with Test Bench
3.1.6 Integrated Crypto Module for Unified S-Box Module Instantiation
3.1.7 Top Layer Implementation and FPGA Synthesis
3.1.8 Performance Comparison
3.2 Tools Required
4. RESULTS AND DISCUSSION
4.1 Optimization Via S-Box/Inverse S-Box Integration
4.1.1 S-Box Integration Design Verification with MATLAB
4.1.2 Integrated S-Box Verilog Modules Construction
4.1.3 S-Box Integration Module Validation with Test Bench
4.2 Combination of Cipher and Inverse Cipher
4.3 Top Layer Implementation
4.3.1 Results Comparison of Various Data Path Size
4.3.2 64-bit Top Layer Data Path Implementation
4.4 Performance Comparison
5. CONCLUSION
5.1 Review and Conclusions
5.2 Suggested Future Work for Expansion & Continuation
LIST OF FIGURES
Figure 1 State array input and output
Figure 2 Pseudo Code for the Cipher
Figure 3 SubBytes () applies the S-box to each byte of the State
Figure 4 ShiftRows () cyclically shifts the last three rows in the State
Figure 5 MixColumns () operates on the State column-by-column
Figure 6 AddRoundKey () XORs each column of the State with a word from the key schedule
Figure 7 Pseudo Code for Key Expansion
Figure 8 Pseudo Code for the Equivalent Inverse Cipher
Figure 9 InvShiftRows () cyclically shifts the last three rows in the State
Figure 10 Methodology Flow Chart for the FYP Project Execution
Figure 11 Cipher Core Architecture Overview
Figure 12 Inverse Cipher Core Architecture Overview
Figure 13 Relationships of Original AES Modules
Figure 14 Relationships of Modified AES Modules
Figure 15 Integrated Crypto Module for Unified S-Box Module Instantiation
Figure 16 Total Number of S-Box and Inverse S-Box Instantiation
Figure 17 MATLAB M-file Codes for Design Verification
Figure 18 S-box ($S_{RD}$)
Figure 19 Inverse S-box ($S_{RD}^{-1}$)
Figure 20 Equivalence of Modules after the Integration
Figure 21 Partial Verilog codes of aes_sbox_inv.v
Figure 22 Partial Verilog codes of aes_mul_inv.v
Figure 23 Test Bench Simulation Output
Figure 24 Total Number of S-Box and Inverse S-Box Instantiation
Figure 25 Partial Verilog codes of aes_crypto_top.v
Figure 26 Physical Layout of Top Layer Data Path Implementation
Figure 27 Key and Data Input at the start of Encryption/Decryption
Figure 28 Output Text from an Encryption/Decryption Operation Being Outputted
LIST OF TABLES
Table 1 Key-Block-Round Combinations
Table 2 Comparison of total clock cycle
Table 3 Comparison of slice utilization percentage
Table 4 Comparison of the number of 4-input LUT
Table 5 Comparison of the number of bonded IOB
Table 6 Comparison of total equivalent gate count
Table 7 Pin Description of Top Layer Data Path Implementation
Table 8 Gate Count of S-Box, Inverse S-Box and Integrated S-Box
Table 9 Performance Comparison of Area Utilization
Table 10 Performance Comparison of Speed
LIST OF APPENDICES
Appendix 1 Final Year Project Gantt chart of First Semester
Appendix 2 Final Year Project Gantt chart of Second Semester
Appendix 3 S-box: substitution values for the byte xy
Appendix 4 Inverse S-box: substitution values for the byte xy
Appendix 5 Multiplicative Inverse, g(xy)
Appendix 6 Round constants for the key generation
Appendix 7 Original Design - aes_ori64_top.v
Appendix 8 Original Design - aes_cipher_top.v
Appendix 9 Original Design - aes_inv_cipher_top.v
Appendix 10 Original Design - aes_sbox.v
Appendix 11 Original Design - aes_inv_sbox.v
Appendix 12 Original Design & New Design - aes_key_expand_128.v
Appendix 13 Original Design & New Design - aes_rcon.v
Appendix 14 New Design - aes_new64_top.v
Appendix 15 New Design - aes_crypto_top.v
Appendix 16 New Design - aes_sbox_inv.v
Appendix 17 New Design - aes_mul_inv.v
Appendix 18 test_bench_top.v
Appendix 19 Synthesis Report of Original Design
Appendix 20 Map Report of Original Design
Appendix 21 Synthesis Report of New Design
Appendix 22 Map Report of New Design
ABBREVIATIONS AND NOMENCLATURES
1. AES - Advanced Encryption Standard
2. Cipher - Series of transformations that converts plaintext to ciphertext using
the Cipher Key
3. Cipher Key - Secret, cryptographic key that is used by the Key Expansion
routine to generate a set of Round Keys; can be pictured as a rectangular
array of bytes, having four rows and Nk columns
4. Ciphertext - Data output from the Cipher or input to the Inverse Cipher
5. DES - Data Encryption Standard
6. FPGA - Field Programmable Gate Array
7. Inverse Cipher - Series of transformations that converts ciphertext to
plaintext using the Cipher Key
8. IOB - Input Output Blocks
9. Key Expansion - Routine used to generate a series of Round Keys from the
Cipher Key
10. LUT - Look Up Tables
11. MARS - AES finalist block cipher proposed by IBM
12. NIST - National Institute of Standards and Technology
13. Plaintext - Data input to the Cipher or output from the Inverse Cipher
14. RC6 - AES finalist block cipher proposed by RSA
15. Rijndael - 128 bit block cipher, named after its creators, Rijmen & Daemen,
selected as AES algorithm in year 2000 by NIST
16. Round Key - Round keys are values derived from the Cipher Key using the
Key Expansion routine; they are applied to the State in the Cipher and
Inverse Cipher
17. Serpent - AES finalist block cipher proposed by Ross Anderson, Eli Biham
and Lars Knudsen
18. State - Intermediate Cipher result that can be pictured as a rectangular array
of bytes, having four rows and Nb columns.
19. S-box - Non-linear substitution table used in several byte substitution
transformation and in the KeyExpansion routine to perform a one-for-one
substitution of a byte value
20. Twofish - AES finalist block cipher proposed by Counterpane Lab
21. Word - A group of 32 bits that is treated either as a single entity or as an
array of 4 bytes
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND OF STUDY
Cryptography plays an important role in the security of data transmission. In 1997,
the RSA Security Company issued a challenge to break the US government's Data
Encryption Standard (DES) algorithm. In June of that year, the challenge was solved
by the DESCHALL team who successfully recovered the 56-bit DES key. In
response, the National Institute of Standards and Technology (NIST) requested
candidates for a new Advanced Encryption Standard (AES) algorithm to replace
DES, realizing that the algorithm's 56-bit key was no longer sufficient to provide the
necessary security in many applications [1].
As an interim measure they adopted and standardized Triple-DES, which uses three
passes of the DES algorithm and a 112 or 168-bit key. The AES requirements were
for a block cipher capable of supporting a data block size of 128-bits and keys of
128, 192 and 256-bit in length. They wanted an algorithm whose security is at least
as good as Triple-DES, but with significantly improved efficiency [1].
Among the 15 preliminary candidates, MARS, RC6, Rijndael, Serpent and Twofish
were announced as the finalist candidates on August 9, 1999 for further evaluation.
After studying all available information and public comments on these finalist
candidates, NIST announced in October 2000 that Rijndael was selected as the AES
algorithm [2]. Rijndael proved a secure and efficient algorithm when implemented in
both hardware and software across a wide range of platforms.
1.2 PROBLEM STATEMENT
The project will address the efficient hardware implementation approaches for the
AES algorithm. The hardware implementation would be FPGA-based. Compared to
software implementations, hardware implementations provide more physical security
as well as higher speed [2].
Different applications of the AES algorithm may require different speed/area
trade-offs. Some applications, such as smart cards and cellular phones, require small area.
Other applications, such as WWW servers and ATMs, are speed critical. Some other
applications, such as digital video recorders, require an optimization of the speed/area
ratio. Various optimizations for implementation are developed to suit the different
demands of applications. Architectural optimizations make use of duplicating the
round units, while algorithmic optimizations explore algorithm simplification inside
each encryption/decryption round unit [2].
This project looks into the feasibility of achieving both architectural and
algorithmic optimizations of the AES implementation. Doing so would optimize the
area utilization of applications that are area-critical, such as smart cards and
mobile phones, while still providing them with the high speed they need.
1.3 OBJECTIVES AND SCOPE OF STUDY
1. Implement the cryptographic algorithm of AES, a symmetric block cipher
that can process data blocks of 128 bits, using cipher keys with lengths of
128 bits.
2. Implement the AES with FPGA technology, using Verilog Hardware
Description Language.
3. Investigate the feasibility of both architectural optimizations and algorithmic
optimizations of AES with FPGA implementation, in order to benefit
applications that are both speed and area crucial, such as smart cards and
mobile phones.
4. Integrate the S-Box and the Inverse S-Box into a unified module, by utilizing
a Multiplicative Inverse look-up table and simple logic gates.
5. Instantiate the combined S-Box/Inverse S-Box module with a
modified structure of Cipher and Inverse Cipher, and subsequently
implement the design with a top layer data path. The performance in
comparison with the implementation of an ordinary design under the same
data path would be studied.
6. Reduce the utilization of 256-byte S-Box/Inverse S-Box tables from 40 to 20,
which would eventually reduce the total area utilized.
Please refer to Appendix 1 for the Final Year Project Gantt chart of the first semester and
Appendix 2 for the Final Year Project Gantt chart of the second semester.
CHAPTER 2
LITERATURE REVIEW AND THEORY
2.1 THE AES ALGORITHM
The AES algorithm is a symmetric-key block cipher in which both the sender and
receiver use a single key to encrypt and decrypt the information. Although the block
length of Rijndael can be 128, 192, or 256 bits, the AES algorithm only adopted the
block length of 128 bits. Meanwhile, the key length can be 128, 192, or 256 bits [2].
The AES algorithm is a substitution linear transformation cipher based on S-boxes
and operations in the Galois Fields. Implementation of the encryption round of AES
requires realization of four component operations: Substitution, ShiftRow,
MixColumn, and KeyAddition. Implementation of the decryption round of AES
requires four inverse operations: InvSubstitution, InvShiftRow, InvMixColumn, and
KeyAddition [3].
Substitution is composed of sixteen identical S-boxes working in parallel.
InvSubstitution is composed of the same number of inverse S-boxes. Each of these
S-boxes can be implemented independently using a 256x8 bit look-up table [3].
ShiftRow and InvShiftRow change the order of bytes within a 16 byte (128 bit) word.
Both transformations involve only changing the order of signals, and therefore they
can be implemented using routing only, and do not require any logic resources, such
as Configurable Logic Blocks (CLBs) or dedicated RAM [3].
The MixColumn transformation as well as InvMixColumn can be expressed as a
matrix multiplication in the Galois Field GF($2^8$). The InvMixColumn transformation
has a longer critical path compared to the MixColumn transformation, and therefore
the entire decryption is more time consuming than encryption [3].
KeyAddition is a bitwise XOR of two 128 bit words.
2.2 ALGORITHM SPECIFICATIONS OF AES
Since the AES algorithm may be used with the three different key lengths (i.e. 128,
192 and 256 bits), the variants are therefore referred to as "AES-128", "AES-192"
and "AES-256" [7].
Internally, the AES algorithm's operations are performed on a two-dimensional array
of bytes called the State. The State consists of four rows of bytes, each containing Nb
bytes, where Nb = 4 in this standard. At the start of the Cipher and Inverse Cipher,
the input - the array of bytes in0, in1, ..., in15 - is copied into the State array as
illustrated in Figure 1. The Cipher or Inverse Cipher operations are then conducted
on this State array, after which its final value is copied to the output - the array of
bytes out0, out1, ..., out15.
Input bytes                  State array                    Output bytes
in0  in4  in8   in12         s0,0  s0,1  s0,2  s0,3         out0  out4  out8   out12
in1  in5  in9   in13   -->   s1,0  s1,1  s1,2  s1,3   -->   out1  out5  out9   out13
in2  in6  in10  in14         s2,0  s2,1  s2,2  s2,3         out2  out6  out10  out14
in3  in7  in11  in15         s3,0  s3,1  s3,2  s3,3         out3  out7  out11  out15
Figure 1: State array input and output
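The column-by-column copy shown in Figure 1 can be sketched in a few lines of Python. This is only an illustration of the mapping s[r, c] = in[r + 4c] defined in FIPS PUB 197 [7]; the function names `bytes_to_state` and `state_to_bytes` are hypothetical and do not come from the project's Verilog sources.

```python
NB = 4  # number of State columns (Nb = 4 in AES)

def bytes_to_state(block):
    """Copy 16 input bytes into a 4 x Nb State, column by column."""
    assert len(block) == 4 * NB
    # State element s[r][c] takes input byte in[r + 4c].
    return [[block[r + 4 * c] for c in range(NB)] for r in range(4)]

def state_to_bytes(state):
    """Copy the State back to a flat list of 16 output bytes."""
    # Output byte out[r + 4c] takes State element s[r][c].
    return [state[r][c] for c in range(NB) for r in range(4)]

state = bytes_to_state(list(range(16)))
print(state[0])  # first row holds in0, in4, in8, in12
```

Because the State is filled column by column, each 32-bit word of the input ends up as one column, which is why MixColumns and AddRoundKey are described per column.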
For AES algorithm, the length of the Cipher Key, K, is 128, 192, or 256 bits. The
key length is represented by Nk = 4, 6, or 8, which reflects the number of 32-bit
words in the Cipher Key. Meanwhile, the number of rounds to be performed during
the execution of the algorithm is dependent on the key size. The number of rounds is
represented by Nr, where Nr = 10 when Nk = 4, Nr = 12 when Nk = 6, and Nr = 14
when Nk = 8.
Table 1: Key-Block-Round Combinations
            Key Length (Nk words)    Block Size (Nb words)    Number of Rounds (Nr)
AES-128     4                        4                        10
AES-192     6                        4                        12
AES-256     8                        4                        14
For both its Cipher and Inverse Cipher, the AES algorithm uses a round function that
is composed of four different byte-oriented transformations: 1) byte substitution
using a substitution table (S-box), 2) shifting rows of the State array by different
offsets, 3) mixing the data within each column of the State array, and 4) adding a
Round Key to the State.
2.2.1 Cipher
The Cipher is described in the following pseudo code. Notice that all Nr rounds are
identical with the exception of the final round, which does not include the
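MixColumns () transformation.

Cipher(byte in[4*Nb], byte out[4*Nb], word w[Nb*(Nr+1)])
begin
   byte state[4,Nb]
   state = in
   AddRoundKey(state, w[0, Nb-1])
   for round = 1 step 1 to Nr-1
      SubBytes(state)
      ShiftRows(state)
      MixColumns(state)
      AddRoundKey(state, w[round*Nb, (round+1)*Nb-1])
   end for
   SubBytes(state)
   ShiftRows(state)
   AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])
   out = state
end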
Figure 2: Pseudo Code for the Cipher
i) SubBytes () Transformation
The SubBytes () transformation is a non-linear byte substitution that operates
independently on each byte of the State using a substitution table (S-box).
Figure 3: SubBytes () applies the S-box to each byte of the State
The S-box used in the SubBytes () transformation is presented in Appendix 3.
ii) ShiftRows () Transformation
In the ShiftRows () transformation, the bytes in the last three rows of the State are
cyclically shifted over different numbers of bytes (offsets). The first row, r = 0, is
not shifted.
Figure 4: ShiftRows () cyclically shifts the last three rows in the State
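As a minimal sketch of the row shifts described above, assuming the State is held as a list of four rows, row r can be rotated left by r byte positions. The function name `shift_rows` is illustrative only. In hardware the same permutation costs no logic, since it is pure routing.

```python
def shift_rows(state):
    """Rotate row r of the 4 x 4 State left by r byte positions (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

# Label each element so the movement of bytes is visible.
state = [[f"s{r},{c}" for c in range(4)] for r in range(4)]
for row in shift_rows(state):
    print(row)
```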
iii) MixColumns () Transformation
The MixColumns ( ) transformation operates on the State column-by-column,
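treating each column as a four-term polynomial. The columns are considered as
polynomials over GF($2^8$) and multiplied modulo $x^4 + 1$ with a fixed polynomial $a(x)$.
This can be written as a matrix multiplication [7]. Let
$$s'(x) = a(x) \otimes s(x):$$
$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} =
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}
\quad \text{for } 0 \le c < Nb.$$
Figure 5: MixColumns () operates on the State column-by-column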
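The matrix form above can be checked numerically with a short Python sketch. It assumes the AES field polynomial $x^8 + x^4 + x^3 + x + 1$ from FIPS PUB 197 [7]; the helper names `xtime`, `gmul` and `mix_column` are hypothetical and are not taken from the project's Verilog code.

```python
def xtime(a):
    """Multiply a by {02} in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product

# Coefficient matrix of MixColumns, one list per output row.
MIX = [[2, 3, 1, 1],
       [1, 2, 3, 1],
       [1, 1, 2, 3],
       [3, 1, 1, 2]]

def mix_column(col):
    """Apply the MixColumns matrix to one 4-byte State column."""
    out = []
    for row in MIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gmul(coeff, byte)  # addition in GF(2^8) is XOR
        out.append(acc)
    return out

print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Since the only coefficients are 01, 02 and 03, each output byte needs at most one `xtime` step per term, which keeps the forward transformation shallow in hardware.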
iv) AddRoundKey () Transformation
In the AddRoundKey () transformation, a Round Key is added to the State by a
simple bitwise XOR operation. In the Cipher, the initial Round Key addition occurs
when round = 0, prior to the first application of the round function. The application
of the AddRoundKey ( ) transformation to the Nr rounds of the Cipher occurs when
$1 \le round \le Nr$.
Figure 6: AddRoundKey () XORs each column of the State with a word from the key
schedule
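A minimal sketch of this transformation, assuming both the State and the Round Key are held as 4 x 4 lists of bytes (the name `add_round_key` is illustrative only):

```python
def add_round_key(state, round_key):
    """XOR each State byte with the Round Key byte in the same position."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]

state = [[0x32, 0x88, 0x31, 0xE0]] * 4   # arbitrary example bytes
key   = [[0x2B, 0x28, 0xAB, 0x09]] * 4   # arbitrary example bytes
mixed = add_round_key(state, key)
# Applying the same Round Key a second time restores the State.
print(add_round_key(mixed, key) == state)
```

Because XOR is its own inverse, the same function serves both the Cipher and the Inverse Cipher.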
2.2.2 Key Expansion
The AES algorithm takes the Cipher Key, K, and performs a Key Expansion routine
to generate a key schedule. The Key Expansion generates a total of Nb(Nr + 1)
words. The resulting key schedule consists of a linear array of 4-byte words, denoted
[$w_i$], with $i$ in the range $0 \le i < Nb(Nr + 1)$ [7].
The expansion of the input key into the key schedule proceeds according to the
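pseudo code below:

KeyExpansion(byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
begin
   word temp
   i = 0
   while (i < Nk)
      w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
      i = i+1
   end while
   i = Nk
   while (i < Nb * (Nr+1))
      temp = w[i-1]
      if (i mod Nk = 0)
         temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
      else if (Nk > 6 and i mod Nk = 4)
         temp = SubWord(temp)
      end if
      w[i] = w[i-Nk] xor temp
      i = i + 1
   end while
end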
Figure 7: Pseudo Code for Key Expansion
SubWord () is a function that takes a four-byte input word and applies the S-box to
each of the four bytes to produce an output word. The function RotWord () takes a
word [a0,a1,a2,a3] as input, performs a cyclic permutation, and returns the word
[a1,a2,a3,a0]. The round constant word array, Rcon[i], is presented in Appendix 6.
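The Key Expansion routine can be sketched in Python as below. To stay self-contained, the sketch builds the S-box from its mathematical definition (multiplicative inverse followed by the affine map) instead of copying the table in Appendix 3, and it generates the round constants by repeated doubling in GF($2^8$) instead of reading Appendix 6. All function names are hypothetical; this is not the aes_key_expand_128.v module.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product

def rotl8(v, n):
    """Rotate an 8-bit value left by n bits."""
    return ((v << n) | (v >> (8 - n))) & 0xFF

def sbox_byte(a):
    """S-box entry: multiplicative inverse (0 maps to 0), then the affine map."""
    inv = next((x for x in range(1, 256) if gmul(a, x) == 1), 0)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63

SBOX = [sbox_byte(a) for a in range(256)]

def sub_word(word):
    """Apply the S-box to each of the four bytes of a word."""
    return [SBOX[b] for b in word]

def rot_word(word):
    """Cyclic permutation [a0,a1,a2,a3] -> [a1,a2,a3,a0]."""
    return word[1:] + word[:1]

def key_expansion(key, nk=4, nb=4):
    """Expand a 4*Nk-byte Cipher Key into Nb*(Nr+1) four-byte words."""
    nr = nk + 6
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01  # first byte of Rcon[1]; doubled in GF(2^8) after each use
    for i in range(nk, nb * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp))
            temp[0] ^= rcon
            rcon = gmul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w

key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 Appendix A.1 key
words = key_expansion(key)
print(len(words), bytes(words[4]).hex())
```

For a 128-bit key the routine yields 44 words, that is, eleven 128-bit Round Keys. Four S-box look-ups are needed per `sub_word` call, which is where the four S-box instances in the key expansion module come from.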
2.2.3 Inverse Cipher
The Cipher transformations can be inverted and then implemented in reverse order to
produce a straightforward Inverse Cipher for the AES algorithm [7].
Pseudo code for the Inverse Cipher is as follows:
















Figure 8: Pseudo Code for the Equivalent Inverse Cipher
i) InvShiftRows () Transformation
InvShiftRows () is the inverse of the ShiftRows () transformation. The bytes in the
last three rows of the State are cyclically shifted over different numbers of bytes
(offsets). The first row, r = 0, is not shifted [7].
Figure 9: InvShiftRows () cyclically shifts the last three rows in the State
ii) InvSubBytes () Transformation
InvSubBytes () is the inverse of the byte substitution transformation, in which the
inverse S-box is applied to each byte of the State [7]. The inverse S-box used in the
InvSubBytes () transformation is presented in Appendix 4.
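iii) InvMixColumns () Transformation
InvMixColumns ( ) is the inverse of the MixColumns ( ) transformation.
InvMixColumns () operates on the State column-by-column, treating each column as
a four-term polynomial. The columns are considered as polynomials over GF($2^8$) and
multiplied modulo $x^4 + 1$ with a fixed polynomial $a^{-1}(x)$. This can be written as a
matrix multiplication [7]. Let
$$s'(x) = a^{-1}(x) \otimes s(x):$$
$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} =
\begin{bmatrix} 0e & 0b & 0d & 09 \\ 09 & 0e & 0b & 0d \\ 0d & 09 & 0e & 0b \\ 0b & 0d & 09 & 0e \end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}
\quad \text{for } 0 \le c < Nb.$$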
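The inverse matrix can be exercised with a short Python sketch, again assuming the AES field polynomial $x^8 + x^4 + x^3 + x + 1$; the names `gmul` and `inv_mix_column` are hypothetical. The input column used here is the result of applying the MixColumns matrix to the bytes db 13 53 45, so the output should return those bytes.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product

# Coefficient matrix of InvMixColumns, one list per output row.
INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09],
           [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B],
           [0x0B, 0x0D, 0x09, 0x0E]]

def inv_mix_column(col):
    """Apply the InvMixColumns matrix to one 4-byte State column."""
    out = []
    for row in INV_MIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gmul(coeff, byte)  # addition in GF(2^8) is XOR
        out.append(acc)
    return out

print([hex(b) for b in inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])])
```

The coefficients 09, 0b, 0d and 0e each need several doubling steps, compared with at most one for 01, 02 and 03, which is consistent with the longer critical path of InvMixColumn noted in Section 2.1.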
iv) Inverse of the AddRoundKey () Transformation
AddRoundKey (), which was described previously, is its own inverse, since it only
involves an application of the XOR operation [7].
2.3 DEDICATED HARDWARE IN AES
Rijndael (AES) is well suited to implementation in dedicated hardware, where several
trade-offs between chip area and speed are possible. In dedicated hardware, the SubBytes
step is the most critical part of the implementation, for two reasons [8]:
1. In order to achieve the highest performance, Srd (S-Box) needs to be
instantiated 16 times (disregarding the key schedule). A straightforward
implementation with 16 256-byte tables is likely to dominate the chip area
requirements or the consumption of logic blocks.
2. Since Rijndael encryption and decryption are different transformations, a
circuit that implements Rijndael encryption does not automatically support
decryption.
However, when building dedicated hardware for supporting both encryption and
decryption, we can limit the required chip area by using parts of the circuit for both
transformations [8].
2.3.1 Decomposition of Srd
The Rijndael S-Box Srd is constructed from two transformations [8]:
$$S_{RD}[a] = f(g(a)),$$
where $g(a)$ is the transformation
$$a \rightarrow a^{-1} \text{ in } GF(2^8),$$
and $f(a)$ is an affine transformation. The transformation $g(a)$ is self-inverse and
hence
$$S_{RD}^{-1}[a] = g^{-1}(f^{-1}(a)) = g(f^{-1}(a)).$$
Therefore, when we want both $S_{RD}$ and $S_{RD}^{-1}$, we need to implement only $g$, $f$ and $f^{-1}$.
Since both $f$ and $f^{-1}$ can be implemented with a limited number of XOR gates, the
extra hardware can be reduced significantly compared to having to hardwire both
$S_{RD}$ and $S_{RD}^{-1}$ [8].
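The decomposition can be illustrated with a small Python sketch in which a single shared table for $g$ serves both substitutions. The affine constants 0x63 and 0x05 and the rotation form of $f$ and $f^{-1}$ follow the AES specification [7]; the names `G_TABLE`, `srd` and `srd_inv` are hypothetical and the code is not the project's Verilog or MATLAB implementation.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product

def g(a):
    """Multiplicative inverse in GF(2^8); by convention 0 maps to 0."""
    return next((x for x in range(1, 256) if gmul(a, x) == 1), 0)

def rotl8(v, n):
    """Rotate an 8-bit value left by n bits."""
    return ((v << n) | (v >> (8 - n))) & 0xFF

def f(a):
    """Affine transformation of the S-box (XOR of rotated copies plus 0x63)."""
    return a ^ rotl8(a, 1) ^ rotl8(a, 2) ^ rotl8(a, 3) ^ rotl8(a, 4) ^ 0x63

def f_inv(a):
    """Inverse affine transformation (XOR of rotated copies plus 0x05)."""
    return rotl8(a, 1) ^ rotl8(a, 3) ^ rotl8(a, 6) ^ 0x05

# The single 256-byte look-up table shared by both directions.
G_TABLE = [g(a) for a in range(256)]

def srd(a):
    """S-box: look up g, then apply f."""
    return f(G_TABLE[a])

def srd_inv(a):
    """Inverse S-box: apply f^-1, then look up the same g table."""
    return G_TABLE[f_inv(a)]

print(hex(srd(0x53)), hex(srd_inv(0xED)))
```

Only `G_TABLE` consumes table storage; `f` and `f_inv` are fixed XOR networks, which is the source of the area saving described above.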
CHAPTER 3
METHODOLOGY / PROJECT WORK
3.1 PROCEDURE IDENTIFICATION
The flowchart below is the methodology conducted throughout the entire project
execution:
1. AES algorithmic review and analysis
2. Verilog & FPGA tools familiarization
3. AES core analysis and top layer implementation
4. Design verification of combined S-Box/Inverse S-Box with MATLAB
5. Module validation of combined S-Box/Inverse S-Box with test bench
6. Integration of Cipher & Inverse Cipher for unified S-Box module instantiation
7. Top layer implementation and FPGA synthesis
8. Performance comparison between original implementation and new design
Figure 10: Methodology Flow Chart for the FYP Project Execution
3.1.1 AES Algorithmic Review and Analysis
The project commenced with a review and analysis of the AES-Rijndael
algorithm. Both authoritative resources on the algorithm were investigated
thoroughly: FIPS PUB 197, released by NIST as the specification of AES
[reference 7], and The Design of Rijndael (AES - Advanced Encryption Standard),
written by its creators Joan Daemen and Vincent Rijmen [reference 8].
Next, the mathematical preliminaries in the AES design as well as the algorithmic
specifications were studied. All the standard specifications in the Cipher, Inverse
Cipher and Key Expansion were analyzed in order to identify the definite algorithm
implementations.
Furthermore, instances of AES implementation by other researchers and academia
were investigated in order to gain supplementary understanding on the subject
studied. The step served as a foundation for the architectural optimization and
algorithmic optimization of the cryptography.
3.1.2 Verilog and FPGA Tools Familiarization
Next, Verilog Hardware Description Language was introduced
via the interactive tutorial in Aldec Active-HDL. The tutorial materials offered
great insight into this increasingly significant hardware description language being
used in the industry. Moreover, crucial information on Verilog was obtained from
reference books as stated in the Reference section.
On the other hand, it was learned from the FPGA tool familiarization that the FPGA
designs described in Verilog would be verified by a simulator (such as Active-HDL,
Aldec). The validated Verilog codes would then become an input to the FPGA
synthesis software (such as WEB Pack ISE), which performs the logic synthesis,
mapping, placing and routing. Eventually, reports describing the area and the speed
of the implementation, a net list used for timing simulation, and a bit stream to be
used to program the FPGA device will be generated by the tools.
3.1.3 AES Core Analysis and Top Layer Implementation
A simple yet efficient AES/Rijndael IP Core [reference 10] developed by Rudolf
Usselmann from ASICS.ws was studied and investigated. Nonetheless, it is an AES
implementation with only a 128-bit key expansion (excluding 192- and 256-bit keys).
All the Verilog modules developed by the mentioned author, i.e. aes_cipher_top.v,
aes_inv_cipher_top.v, aes_sbox.v, aes_inv_sbox.v, aes_key_expand_128.v and
aes_rcon.v, were analyzed thoroughly. Please refer to Appendices 8, 9, 10, 11, 12 and 13.
The AES core consists of two blocks, i.e. the AES Cipher block which performs
encryption and the AES Inverse Cipher block which performs decryption. Both

































Figure 12: Inverse Cipher Core Architecture Overview
During the first phase of the research project, the cryptograph was implemented with
different data path sizes. The data input, key input and data output of the cryptograph
were implemented with bit sizes of 64, 32 and 16 bits.
The top layer of the implementation was written in behavioral-style Verilog,
in order to find out the effect of different data path sizes on the
implementation. The device arbitrarily chosen for the implementation (FPGA area
simulation) is the Xilinx VirtexE - PQ240 -6. Despite the variation in the size of the data
input, key input and data output, all the implementations have the same I/O
configuration and internal structure.
The performances in terms of area utilization and speed for each of the distinct
implementation were compared and analyzed. The investigation outcome obtained
was incorporated into the design and project execution of the second phase.
3.1.4 Design Verification of Combined S-Box with MATLAB
From the discussion in section 2.3, it was realized that both S-Box and Inverse S-Box
could actually be derived from a common multiplicative inverse table, with merely
some additional logic gates. In other words, the area utilized in the FPGA for the two
tables could actually be reduced to that of only one table.
The algorithm simplification and the design rationale are discussed further in
Chapter 4. Basically, the design steps involved translating the affine
transformation matrix into its equivalent mathematical formula. The formula is
then used together with the multiplicative inverse to obtain the S-Box and the
Inverse S-Box respectively.
The mathematical formula derived, together with the multiplicative inverse table,
were further translated into MATLAB codes for MATLAB simulation. Two output
arrays, which are the S-Box and the Inverse S-box, were compared against the
standard tables under AES specifications. The simulation result would be presented
in the next chapter. Nonetheless, the simulation step verified the feasibility of the
combination design.
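As an illustration of what translating the affine matrix into a formula can look like, the sketch below writes the affine transformation and its inverse as per-bit XOR equations, following the bit-level definition in the AES specification [7]. Each output bit is the XOR of a few input bits and one constant bit, which is what maps onto XOR gates in hardware. This is a Python sketch with hypothetical names (`affine`, `inv_affine`), not the author's MATLAB M-file.

```python
C = 0x63  # constant of the affine transformation
D = 0x05  # constant of the inverse affine transformation

def bit(value, i):
    """Return bit (i mod 8) of an 8-bit value."""
    return (value >> (i % 8)) & 1

def affine(b):
    """b'_i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ c_i, indices mod 8."""
    out = 0
    for i in range(8):
        t = (bit(b, i) ^ bit(b, i + 4) ^ bit(b, i + 5) ^
             bit(b, i + 6) ^ bit(b, i + 7) ^ bit(C, i))
        out |= t << i
    return out

def inv_affine(b):
    """b'_i = b_(i+2) ^ b_(i+5) ^ b_(i+7) ^ d_i, indices mod 8."""
    out = 0
    for i in range(8):
        t = bit(b, i + 2) ^ bit(b, i + 5) ^ bit(b, i + 7) ^ bit(D, i)
        out |= t << i
    return out

# The two formulas should undo each other for every byte value.
print(all(inv_affine(affine(x)) == x for x in range(256)))
```

A check of this kind, run over all 256 byte values and compared against the standard tables, is the sort of exhaustive verification that a software simulation makes cheap before any Verilog is written.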
3.1.5 Module Validation of Combined S-Box with Test Bench
Upon the verification test of the combined S-Box design, the algorithms in
MATLAB codes were translated into two Verilog modules, namely aes_sbox_inv.v











Figure 14: Relationships of Modified AES Modules
The aes_mul_inv.v module contains the multiplicative inverse table,
while the aes_sbox_inv.v module holds the logic gates for both the S-Box
and the Inverse S-Box computations.
As can be seen from Figure 13 and Figure 14, both the aes_sbox.v and the
aes_inv_sbox.v modules were replaced by the same combined pair of modules,
aes_sbox_inv.v and aes_mul_inv.v.
Nevertheless, before it could be shown that the new design reduces the gate
count, the new modules had to be proven to deliver the same functionality
and specification as the original aes_sbox.v and aes_inv_sbox.v. Validation of
the new design was therefore crucial.
Therefore, a test bench written by Rudolf Usselmann from ASICS.ws was employed
for validation. As can be seen from Appendix 18, the test bench
consists of 283 official test vectors provided by NIST. While the
original implementation (Figure 13) developed by Usselmann showed an error-free test
output, the newly designed modules replacing both the S-Box and the Inverse S-Box, as
shown in Figure 14, were subjected to the same test.
The test output of the modified AES modules is discussed in Chapter 4.
3.1.6 Integrated Crypto Module for Unified S-Box Module Instantiation
After the combined S-Box concept had been verified and the new modules validated,
a test was conducted to review the area utilization of the new implementation.
Referring to Figure 13, if the module structure were implemented as a parallel,
high-speed architecture, aes_sbox.v would be instantiated 16 times under
aes_cipher_top.v, 4 times under aes_key_expand_128.v (under Cipher), and another
4 times under aes_key_expand_128.v (under Inverse Cipher). Meanwhile,
aes_inv_sbox.v would be instantiated 16 times under aes_inv_cipher_top.v. With
this configuration, a total of forty (40) 256-byte tables would be hard-wired into the
FPGA. Even with the configuration in Figure 14, the number of tables required
would remain the same. This issue is discussed further in the subsequent chapter of
results and discussion.
In order to benefit from the combined S-Box and Inverse S-Box, we would













Figure 15: Integrated Crypto Module for Unified S-Box Module Instantiation
The components of both the Cipher and the Inverse Cipher reside within the
aes_crypto_top.v module. With this configuration, both Cipher and Inverse Cipher
can instantiate the aes_sbox_inv.v module, which returns the S-Box and Inverse
S-Box values by further instantiating the aes_mul_inv.v module and performing
some simple computation.
Hence, the total number of 256-byte tables required for implementing the SubBytes
function is reduced by half.
3.1.7 Top Layer Implementation and FPGA Synthesis
Both the original AES implementation (as in Figure 13) and the integrated AES
crypto (as in Figure 15) were implemented with a top layer data path and
input/output buffers.
Due to the reasons that would be discussed in the next chapter, the data path size was
chosen to be 64-bit for both of the configurations. Besides, research from phase 1
had shown that the bit size of the top layer data path would largely affect the total
gate count. Therefore, the data path implementation had to be the same for the two
cases so that we could conduct an unbiased area utilization performance and arrive at
a justifiable result.
The physical layout, operation and timing of the top layer implementation would be
presented in Chapter 4.
Next, both of the Verilog projects (original and modified) were synthesized for
syntax checking. Subsequently, they were translated and mapped (with an arbitrary
device) under Xilinx Web Pack, in order to obtain the final gate count for
performance comparisons.
3.1.8 Performance Comparison
The execution of this project was limited to the simulation and verification of the
RTL code.
The performance of each implementation in terms of area and speed was compared
and analyzed. The main objective of this research project was to determine the area
utilization of the combined S-Box AES design in comparison with the ordinary AES
design. The space saving achieved by the new approach was determined and the
objectives were met.
3.2 TOOLS REQUIRED
1. Verilog Hardware Description Language programming tool: Active-HDL by Aldec
2. FPGA software: Xilinx WebPACK ISE
CHAPTER 4
RESULTS AND DISCUSSION
4.1 OPTIMIZATION VIA S-BOX / INVERSE S-BOX INTEGRATION
There are two large substitution tables in AES. One is for the SubBytes operation
[Appendix 3], and the other is for the InvSubBytes operation [Appendix 4].
The first table, the S-Box table, is used in two functions, i.e. SubBytes and
KeyExpansion. The Inverse S-Box table is used in the function InvSubBytes. These
two tables are not the same, so two different 256-byte ROMs must be built in order
to store the tables that realize the S-Box and the Inverse S-Box.
















Figure 16: Total Number of S-Box and Inverse S-Box Instantiation
Referring to the figure above, a parallel architecture of AES design usually needs
several tables. For example, a high-speed design of 128-bit AES [reference 10,
which is the IP Core referred to in this project] needs a total of 24 S-Box modules
and 16 Inverse S-Box modules. 16 S-Box modules are called in the implementation
of the SubBytes function, another 4 S-Box modules are called in each of the two
implementations of the KeyExpansion function (under Cipher and under Inverse
Cipher), and 16 Inverse S-Box modules are called in the implementation of the
InvSubBytes function.
In this case, a substantial amount of hardware resource will be occupied if SubBytes
and InvSubBytes utilize their own tables in encryption (Cipher) and decryption
(Inverse Cipher). It is hence desirable to obtain a simplified way so as to reduce the
hardware complexity.
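As described in Section 2.3.1, the operation of the S-Box can be expressed as
$$S_{RD}[a] = f(g(a)) \qquad (1)$$
where $g(a)$ is the multiplicative inverse of $a$ in GF($2^8$) and $f(a)$ is the affine
transformation
$$
f(a):\quad
\begin{bmatrix} b_7 \\ b_6 \\ b_5 \\ b_4 \\ b_3 \\ b_2 \\ b_1 \\ b_0 \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} a_7 \\ a_6 \\ a_5 \\ a_4 \\ a_3 \\ a_2 \\ a_1 \\ a_0 \end{bmatrix}
\oplus
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
$$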
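On the other hand, the operation of the Inverse S-Box can be expressed as:
$$S_{RD}^{-1}[a] = g^{-1}(f^{-1}(a)) \qquad (2)$$
Since the transformation $g(a)$ is self-inverse,
$$S_{RD}^{-1}[a] = g^{-1}(f^{-1}(a)) = g(f^{-1}(a)) \qquad (3)$$
where $f^{-1}(a)$ is the inverse affine transformation
$$
f^{-1}(a):\quad
\begin{bmatrix} b_7 \\ b_6 \\ b_5 \\ b_4 \\ b_3 \\ b_2 \\ b_1 \\ b_0 \end{bmatrix}
=
\begin{bmatrix}
0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0
\end{bmatrix}
\left(
\begin{bmatrix} a_7 \\ a_6 \\ a_5 \\ a_4 \\ a_3 \\ a_2 \\ a_1 \\ a_0 \end{bmatrix}
\oplus
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
\right)
$$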
Therefore, when both $S_{RD}$ and $S_{RD}^{-1}$ are required, only $g$, $f$ and $f^{-1}$ need to be
implemented. Since both $f$ and $f^{-1}$ can be implemented with a limited number of
XOR gates, the extra hardware is reduced significantly compared with hard-wiring
both $S_{RD}$ and $S_{RD}^{-1}$ [8].
By examining Eq. 1 and Eq. 3, a common look-up table, i.e. the multiplicative
inverse g(xy), can be shared so that the S-Box and the Inverse S-Box are integrated,
reducing the hardware requirement for SubBytes and InvSubBytes. Refer to
Appendix 5 for the look-up table of g(xy).
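As a cross-check of Eq. 1 and Eq. 3, the following Python sketch derives both substitution tables from a single multiplicative inverse table. It is not part of the original project, which used MATLAB. The names `gf_mul`, `rotl8`, `affine` and `inv_affine` are illustrative assumptions; the affine maps are written in their equivalent byte-rotation form, and g(0) is taken as 0 as in the AES specification.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


# g: multiplicative inverse table found by brute force, with g(0) = 0.
G = [0] * 256
for x in range(1, 256):
    for y in range(1, 256):
        if gf_mul(x, y) == 1:
            G[x] = y
            break


def rotl8(value, n):
    """Rotate an 8-bit value left by n positions."""
    return ((value << n) | (value >> (8 - n))) & 0xFF


def affine(a):
    """f(a): the affine transformation of Eq. 1 (constant 0x63)."""
    return a ^ rotl8(a, 1) ^ rotl8(a, 2) ^ rotl8(a, 3) ^ rotl8(a, 4) ^ 0x63


def inv_affine(a):
    """f^-1(a): the inverse affine transformation used in Eq. 3."""
    return rotl8(a, 1) ^ rotl8(a, 3) ^ rotl8(a, 6) ^ 0x05


# Both tables are derived from the single shared table G.
SBOX = [affine(G[a]) for a in range(256)]          # Eq. 1: f(g(a))
INV_SBOX = [G[inv_affine(a)] for a in range(256)]  # Eq. 3: g(f^-1(a))

print(hex(SBOX[0x53]), hex(INV_SBOX[0xED]))
```

Only `G` needs 256 bytes of storage here; `affine` and `inv_affine` are pure XOR networks, which is the saving the text argues for.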
Subsequently, the function f(a) could be achieved by:
b7' = b7 xor b6 xor b5 xor b4 xor b3 xor 0
b6' = b6 xor b5 xor b4 xor b3 xor b2 xor 1
b5' = b5 xor b4 xor b3 xor b2 xor b1 xor 1
b4' = b4 xor b3 xor b2 xor b1 xor b0 xor 0
b3' = b7 xor b3 xor b2 xor b1 xor b0 xor 0
b2' = b7 xor b6 xor b2 xor b1 xor b0 xor 0
b1' = b7 xor b6 xor b5 xor b1 xor b0 xor 1
b0' = b7 xor b6 xor b5 xor b4 xor b0 xor 1
On the other hand, the function f^-1(a) could be achieved by:
b7' = (b6 xor 1) xor (b4 xor 0) xor (b1 xor 1)
b6' = (b5 xor 1) xor (b3 xor 0) xor (b0 xor 1)
b5' = (b7 xor 0) xor (b4 xor 0) xor (b2 xor 0)
b4' = (b6 xor 1) xor (b3 xor 0) xor (b1 xor 1)
b3' = (b5 xor 1) xor (b2 xor 0) xor (b0 xor 1)
b2' = (b7 xor 0) xor (b4 xor 0) xor (b1 xor 1)
b1' = (b6 xor 1) xor (b3 xor 0) xor (b0 xor 1)
b0' = (b7 xor 0) xor (b5 xor 1) xor (b2 xor 0)
Thus, 40 XOR gates are required by each of the two processes in the above
implementation (five XOR operations for each of the eight output bits), and one
look-up table of 16x16 bytes is eliminated entirely.
Nevertheless, further optimization showed that the structure of f^-1(a) could be
simplified. The three constant bits XORed within each equation can be folded into
a single constant bit. The optimized structure is as follows.
The function f^-1(a) could be achieved by:
b7' = b6 xor b4 xor b1 xor 0
b6' = b5 xor b3 xor b0 xor 0
b5' = b7 xor b4 xor b2 xor 0
b4' = b6 xor b3 xor b1 xor 0
b3' = b5 xor b2 xor b0 xor 0
b2' = b7 xor b4 xor b1 xor 1
b1' = b6 xor b3 xor b0 xor 0
b0' = b7 xor b5 xor b2 xor 1
Hence, only 24 XOR gates are required, compared with the 40 XOR gates of the
previous implementation, a 40% reduction in gate count for f^-1(a).
Therefore, by utilizing only one look-up table, the multiplicative inverse table, it
was estimated that the amount of hardware for this implementation would decrease
by about 50% compared with the original hardware requirement without the
functional integration.
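The folding of the three constant bits into one can be checked exhaustively. The Python sketch below is an illustration only, not the project's MATLAB code. It evaluates both forms of the inverse affine transformation for all 256 inputs and counts XOR operations the way the text does, treating each XOR with a constant as one gate. The function names are hypothetical.

```python
def bits(a):
    """Return the bits of byte a as [b0, b1, ..., b7]."""
    return [(a >> i) & 1 for i in range(8)]


def from_bits(b):
    """Pack [b0, b1, ..., b7] back into a byte."""
    return sum(bit << i for i, bit in enumerate(b))


C = bits(0x63)  # constant XORed with each input bit in the first form
D = bits(0x05)  # single folded constant per output bit in the optimized form


def inv_affine_unoptimized(a):
    """Each output bit: three (input xor constant) terms combined by two XORs."""
    x = [bit ^ c for bit, c in zip(bits(a), C)]
    return from_bits([x[(i + 2) % 8] ^ x[(i + 5) % 8] ^ x[(i + 7) % 8]
                      for i in range(8)])


def inv_affine_optimized(a):
    """Each output bit: two XORs between inputs plus one XOR with a constant."""
    b = bits(a)
    return from_bits([b[(i + 2) % 8] ^ b[(i + 5) % 8] ^ b[(i + 7) % 8] ^ D[i]
                      for i in range(8)])


XORS_UNOPTIMIZED = 8 * (3 + 2)  # 40
XORS_OPTIMIZED = 8 * (2 + 1)    # 24
saving = 1 - XORS_OPTIMIZED / XORS_UNOPTIMIZED

equivalent = all(inv_affine_unoptimized(a) == inv_affine_optimized(a)
                 for a in range(256))
print(equivalent, XORS_UNOPTIMIZED, XORS_OPTIMIZED, saving)
```

The index pattern `(i + 2, i + 5, i + 7) mod 8` reproduces the eight bit equations listed above, for example b7' from b1, b4 and b6.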























b(1:4) = dec2bin(base2dec(h(2*n-1),16),4);
b(5:8) = dec2bin(base2dec(h(2*n),16),4);
for m = 1:8







































S(1) = h(temp*2 + 1);
S(2) = h(temp*2 + 2);











Figure 17: MATLAB M-file Codes for Design Verification
Please note that, in the MATLAB code above, the subscripts used for f(a) and
f^-1(a) appear in reverse order. This is due to the way MATLAB indexes matrix
elements. The structure, however, is identical.
From the simulation result, it was found out that:
[Figure: MATLAB Array Editor screenshot of the computed S-Box array]
























































































































[Figure: MATLAB Array Editor screenshot of the computed InvSbox array]






















































































































Figure 19: Inverse S-Box (S_RD^-1)
The simulation results verified that both the S-Box and the Inverse S-Box can be
obtained with the suggested concept. The logic gates that derive the two look-up
tables share a common g(xy) table. The design therefore reduces the requirement
from two look-up tables to one, while ensuring that the contents of the two original
tables remain available.
4.1.2 Integrated S-Box Verilog Modules Construction
In the ordinary implementation, Cipher invokes the S-Box while Inverse Cipher
invokes the Inverse S-Box separately. In addition, the Key Expand module invokes
the S-Box in the same way as Cipher does. Please refer to Figure 13 in Section 3.1.5
for the illustration.
By combining both the S-Box and the Inverse S-Box under a unified module, the
multiplicative inverse table resides in a lower-hierarchy module, and the combined
S-Box/Inverse S-Box module incorporates some simple logic gates in order to
realize the required output. An additional input is passed into the module in order
to select the output of EITHER the S-Box OR the Inverse S-Box.






Figure 20: Equivalence of Modules after the Integration
The Verilog module that realizes the S-Box and Inverse S-Box computational gates
is aes_sbox_inv.v. A Boolean value, b, where 1 selects encryption and 0 selects
decryption, is passed into the module to invoke the intended operation. The
following is the partial Verilog code of the module.
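The behaviour described for the combined module can be modelled in software as a single function with a select input. The Python sketch below is a behavioural model under stated assumptions, not the Verilog of aes_sbox_inv.v: the port names b, a and d follow the text, while `gf_inverse` (computed here as a^254 instead of a table look-up) and the other helper names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a):
    """Stand-in for the shared multiplicative inverse table: a^254, with 0 -> 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(value, n):
    return ((value << n) | (value >> (8 - n))) & 0xFF


def affine(a):
    return a ^ rotl8(a, 1) ^ rotl8(a, 2) ^ rotl8(a, 3) ^ rotl8(a, 4) ^ 0x63


def inv_affine(a):
    return rotl8(a, 1) ^ rotl8(a, 3) ^ rotl8(a, 6) ^ 0x05


def sbox_inv(b, a):
    """Return output d for input byte a; b = 1 selects S-Box, b = 0 Inverse S-Box."""
    if b:
        return affine(gf_inverse(a))   # encryption path: f(g(a))
    return gf_inverse(inv_affine(a))   # decryption path: g(f^-1(a))


d_enc = sbox_inv(1, 0x53)
d_dec = sbox_inv(0, d_enc)
print(hex(d_enc), hex(d_dec))
```

Both branches call the same `gf_inverse`, mirroring how one multiplicative inverse table serves both directions; only the position of the XOR network relative to the table changes.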



















































Figure 21: Partial Verilog codes of aes_sbox_inv.v
Upon selection of the desired operation, the module aes_mul_inv.v, which contains
the multiplicative inverse table, is instantiated in order to load the g(xy) values for
the S-Box or Inverse S-Box computation. The following is the partial Verilog code of
the module.

























Figure 22: Partial Verilog codes of aes_mul_inv.v
4.1.3 S-Box Integration Module Validation with Test Bench
As stated in Section 3.1.5, the newly designed Verilog modules for the integrated
S-Box were subjected to a validation test in order to verify their functionality.
With the Verilog module hierarchy shown in Figure 14 (in Section 3.1.5), in which
the S-Box module and the Inverse S-Box module were replaced by the module pair of
S-Box-Inverse and multiplicative inverse, the design was subjected to a simulation
test of 80 µs. The simulation output displayed in Figure 23 showed that the
integrated S-Box module was capable of producing identical, error-free S-Box and
Inverse S-Box values, without compromising speed and without modifying the
algorithm in the upper-hierarchy modules.
Hence, the design was shown to be a drop-in substitute for the original S-Box and
Inverse S-Box modules.
asim test
# ELBREAD: Elaboration process.
# ELBREAD: Elaboration time 0.0 [s].
# KERNEL: Main thread initiated.
# KERNEL: Kernel process initialization phase.
# KERNEL: Time resolution set to 10ps.
# ELAB2: Elaboration final pass...
# ELAB2: Create instances ...
# ELAB2: Create instances complete.
# ELAB2: Elaboration final pass complete - time: 1.0 [s].
# KERNEL: Kernel process initialization done.
# Allocation: Simulator allocated 1385 kB (elbread=250 elab2=857 kernel=278 sdf=0)
# 2:47 PM, Saturday, April 10, 2004
# Simulation has been initialized










# : Started random test...
#:
#:
# : Test Done. Found 0 Errors.
#:
#:
# RUNTIME: RUNTIME_0068 test_bench_top.v (437): $finish called.
# KERNEL: stopped at time: 77015 ns
endsim
Figure 23: Test Bench Simulation Output
4.2 COMBINATION OF CIPHER AND INVERSE CIPHER
In contrast to Figure 16 in Section 4.1, Figure 24 shows that only half of the total
number of look-up tables is required. The number decreases from
(16 + 4 + 16 + 4 = 40) to (16 + 4 = 20).
Nonetheless, this decrease cannot be achieved if the Cipher module and the Inverse
Cipher module are kept separate. The two need to be combined so that they can
instantiate the same set of integrated S-Box modules. With both the Cipher and the
Inverse Cipher residing in a unified crypto module, all the repetitions of modules in
the lower hierarchy can be eliminated.
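The table counts quoted above follow from simple bookkeeping, sketched here in Python. The per-function counts (16 for SubBytes or InvSubBytes, 4 for KeyExpansion) and the 256-byte table size are taken from the text; the variable names are illustrative.

```python
SUBBYTES_INSTANCES = 16      # one table per state byte (SubBytes / InvSubBytes)
KEY_EXPANSION_INSTANCES = 4  # one table per byte of a key word
TABLE_BYTES = 256

# Separate Cipher and Inverse Cipher: each has its own round tables and
# its own key expansion tables.
separate_tables = 2 * (SUBBYTES_INSTANCES + KEY_EXPANSION_INSTANCES)

# Unified crypto module: one shared set of integrated S-Box modules.
unified_tables = SUBBYTES_INSTANCES + KEY_EXPANSION_INSTANCES

separate_bytes = separate_tables * TABLE_BYTES
unified_bytes = unified_tables * TABLE_BYTES

print(separate_tables, unified_tables, separate_bytes, unified_bytes)
```

This counts look-up tables only; the XOR networks added to each integrated module are not included, which is why the measured gate saving reported later is smaller than 50%.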
Figure 24: Total Number of S-Box and Inverse S-Box Instantiation
As can be seen from this new relationship, key expansion requires 4 sbox_inv
instances (inclusive of mul_inv) at a time for the pipelined implementation. In
addition, Cipher and Inverse Cipher share 16 sbox_inv instances (inclusive of
mul_inv) for SubBytes and InvSubBytes respectively, instead of having 16 aes_sbox
and 16 aes_inv_sbox instances. They can share the combined module because they
are two mutually exclusive processes, and only one process is invoked at a time.
The combined S-Box/Inverse S-Box module is instantiated within the unified crypto
module as follows:
aes_sbox_inv identifier ( .b( 1 or 0 ), .a( input value ), .d( output value ) );
in which
identifier is the instance name, which differs for each of the 16 copies of the module
used in the pipelined implementation
b is a Boolean value in which 1 is for encryption and 0 is for decryption
The following is the partial Verilog code of the unified crypto module; please refer
to Appendix 15 for the complete module constructed.








aes_sbox_inv us00( .b( ciph_opt ), .a( in_s00 ), .d( out_s00 ));
aes_sbox_inv us01( .b( ciph_opt ), .a( in_s01 ), .d( out_s01 ));
aes_sbox_inv us02( .b( ciph_opt ), .a( in_s02 ), .d( out_s02 ));
aes_sbox_inv us03( .b( ciph_opt ), .a( in_s03 ), .d( out_s03 ));
aes_sbox_inv us10( .b( ciph_opt ), .a( in_s10 ), .d( out_s10 ));
aes_sbox_inv us11( .b( ciph_opt ), .a( in_s11 ), .d( out_s11 ));
aes_sbox_inv us12( .b( ciph_opt ), .a( in_s12 ), .d( out_s12 ));
aes_sbox_inv us13( .b( ciph_opt ), .a( in_s13 ), .d( out_s13 ));
aes_sbox_inv us20( .b( ciph_opt ), .a( in_s20 ), .d( out_s20 ));
aes_sbox_inv us21( .b( ciph_opt ), .a( in_s21 ), .d( out_s21 ));
aes_sbox_inv us22( .b( ciph_opt ), .a( in_s22 ), .d( out_s22 ));
aes_sbox_inv us23( .b( ciph_opt ), .a( in_s23 ), .d( out_s23 ));
aes_sbox_inv us30( .b( ciph_opt ), .a( in_s30 ), .d( out_s30 ));
aes_sbox_inv us31( .b( ciph_opt ), .a( in_s31 ), .d( out_s31 ));
aes_sbox_inv us32( .b( ciph_opt ), .a( in_s32 ), .d( out_s32 ));
aes_sbox_inv us33( .b( ciph_opt ), .a( in_s33 ), .d( out_s33 ));
Figure 25: Partial Verilog codes of aes_crypto_top.v
4.3 TOP LAYER IMPLEMENTATION
In order to simulate and synthesize both the original design and the new design, a
top layer data path for the input/output is necessary. It serves as an implementation
wrapper for the AES core.
From the findings of the first phase, for a constant total input/output size, the area
utilization of the top layer data path decreases as the per-cycle input/output width
increases. In other words, to achieve the 128-bit AES input/output, more gates are
required if 32 bits per I/O cycle are used than if 64 bits per I/O cycle are used.
Nevertheless, one has to consider the total I/O available on the targeted FPGA chip
before deciding on a wider data path for the sake of area saving.
Secondly, besides requiring more gates, a narrower data path also requires more
clock cycles to complete the data input/output.
During the first phase of the research project, the Cipher of the AES core was
implemented with data path sizes of 64, 32 and 16 bits. The arbitrary device chosen
for the FPGA area simulation was Xilinx VirtexE - PQ240-6. The simulation results
illustrate the effects of the data path size on the overall implementation.
4.3.1 Results Comparison of Various Data Path Size
Table 2: Comparison of total clock cycle
                 16-bit Data Path    32-bit Data Path    64-bit Data Path
Clock Cycle             28                  20                  16
It was observed that the more the 128-bit data block is divided into smaller clusters,
the more clock cycles are required to complete the operation.
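The three values in Table 2 are consistent with a simple model, sketched below in Python: the core takes 12 cycles, and the input and the output each take 128/w cycles for a w-bit data path. The 12 + 2 + 2 breakdown for the 64-bit case is stated in Table 10; applying the same model to the 16-bit and 32-bit cases is an assumption that happens to reproduce the tabulated figures.

```python
BLOCK_BITS = 128
CORE_CYCLES = 12  # core processing cycles, as listed in Table 10


def total_cycles(width_bits):
    """Cycles to load a block, process it, and unload it over a width_bits bus."""
    io_cycles = BLOCK_BITS // width_bits
    return io_cycles + CORE_CYCLES + io_cycles


cycles = {w: total_cycles(w) for w in (16, 32, 64)}
print(cycles)
```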
Table 3: Comparison of slice utilization percentage
                       16-bit Data Path    32-bit Data Path    64-bit Data Path
Slice Utilization %           96                  85                  58
It was noticed that more slices are required for a relatively small data path.
The implementation of the 16-bit data path almost reaches the full limit of area
utilization.
Table 4: Comparison of the number of 4-input LUT
                 16-bit Data Path    32-bit Data Path    64-bit Data Path
4-input LUT %           78                  68                  42
Again, it was found that more 4-input LUTs were required for a relatively small
data path.
Table 5: Comparison of the number of bonded IOB
                                         16-bit Data Path    32-bit Data Path    64-bit Data Path
Bonded IOB %                                    32                  62                 123
Additional JTAG gate count for IOB            2,496               4,800               9,408
Nevertheless, the 64-bit data path implementation exceeds the physical limit of
IOBs (for the particular Xilinx VirtexE device chosen for the simulation). Therefore,
the IOB count is a major factor in determining the data path size, despite the fact
that a larger data path size requires less area.
Moreover, as listed in the table, the additional JTAG gate count required for the
IOBs grew linearly with the data path size: the JTAG gate count for the 32-bit data
path (4,800) was roughly double that for the 16-bit data path (2,496), while that for
the 64-bit data path (9,408) was roughly quadruple.
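The JTAG figures in Table 5 can be examined numerically. The Python sketch below computes the ratios and also tests one possible fit, 48 gates for each of 3w + 4 pins (three w-bit buses plus four single-bit signals). That fit is an observation made here from the three data points, not a formula given in the source, so the interpretation should be treated as an assumption.

```python
jtag_gates = {16: 2496, 32: 4800, 64: 9408}  # values from Table 5

# Ratios relative to the 16-bit data path.
ratios = {w: g / jtag_gates[16] for w, g in jtag_gates.items()}

# Hypothetical fit: 48 gates per pin, 3*w + 4 pins for a w-bit data path.
GATES_PER_PIN = 48
fit = {w: GATES_PER_PIN * (3 * w + 4) for w in jtag_gates}

print(ratios)
print(fit == jtag_gates)
```

Because of the constant term, doubling the width gives slightly less than double the JTAG gate count, which is why the ratios are close to, but not exactly, 2 and 4.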
Table 6: Comparison of total equivalent gate count




In conclusion, a greater total equivalent gate count was required for a smaller data
path size.
4.3.2 64-bit Top Layer Data Path Implementation
With the goals of:
1) keeping the clock cycle count as small as possible
2) not exceeding the bonded IOB limit
3) keeping the equivalent gate count low
4) utilizing fewer slices
5) minimizing the number of 4-input LUTs
the top layers for both the original design and the integrated S-Box design were
constructed with a data path size of 64 bits. The arbitrary device chosen, Xilinx
VirtexE - BG352-8, supported this data path size. Please refer to Appendix 7 -
aes_ori64_top.v for the top layer of the original design and Appendix 14 -
aes_new64_top.v for the top layer of the integrated S-Box design.
The common physical layout and the pin description are as follows.


















Figure 26: Physical Layout of Top Layer Data Path Implementation
Table 7: Pin Description of Top Layer Data Path Implementation
Name              Type     Description
rst               Input    Core reset, active low
clk               Input    Core clock signal
start             Input    When HIGH, a cryptographic operation is started
option            Input    HIGH for encryption; LOW for decryption
key_in[63:0]      Input    Input key
text_in[63:0]     Input    Input data
text_out[63:0]    Output   Output data
get_output        Output   Output data valid
A rising input on the 'start' port triggers the beginning of a cryptographic operation
on the data 'text_in', using 'key_in' as the key. The data block (from 'text_in') and
the key (from 'key_in') are then fed into the core serially, 64 bits at a time. All the
input blocks are captured at the rising edge of the externally supplied clock 'clk'.
With the proper selection of 'option' for either encryption or decryption, and upon
receipt of the complete 128-bit data block, the core starts to process the state
according to the AES algorithm.
The timing diagram below shows how the data is fed to the core at the start.









Figure 27: Key and Data Input at the start of Encryption/Decryption
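The serial loading described above, in which a 128-bit block arrives as two 64-bit transfers, can be modelled as follows. This Python sketch is a behavioural illustration only. It assumes that the first transfer fills bits [63:0] of the buffer and the second fills bits [127:64], which is how the surviving fragments of Appendix 7 appear to read; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def load_block(first_word, second_word):
    """Assemble a 128-bit buffer from two 64-bit transfers.

    Assumption: the first transfer lands in bits [63:0] and the second in
    bits [127:64].
    """
    return (first_word & MASK64) | ((second_word & MASK64) << 64)


def unload_block(block):
    """Split a 128-bit value into two 64-bit transfers, lower half first."""
    return block & MASK64, (block >> 64) & MASK64


block = load_block(0x0011223344556677, 0x8899AABBCCDDEEFF)
print(hex(block))
print([hex(word) for word in unload_block(block)])
```

With a 64-bit data path, two such transfers are needed in each direction, which matches the 2 input and 2 output cycles counted in the speed comparison.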
When all the rounds were completed, the 'get_output' signal would be raised and the







Figure 28: Output Text from an Encryption/Decryption Operation Being Outputted
4.4 PERFORMANCE COMPARISON
Before drawing a conclusion on the overall performance, the gate counts of the
individual modules of S-Box, Inverse S-Box and Integrated S-Box would be
evaluated.






Therefore, the percentage reduction in gate count for the S-Box modules is
[(1836 + 1815) - 1974] / (1836 + 1815) x 100% = 45.9%
The Integrated S-Box module performs the functions of both the S-Box and the
Inverse S-Box with only one look-up table, so that the amount of hardware for
implementing SubBytes and InvSubBytes decreases by 46% compared with the
original hardware implementation without the functional integration.
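The percentage above can be reproduced with a few lines of Python. The three gate counts are the ones quoted in the calculation; the variable names are illustrative.

```python
sbox_gates = 1836
inv_sbox_gates = 1815
integrated_gates = 1974

separate_gates = sbox_gates + inv_sbox_gates
reduction_pct = (separate_gates - integrated_gates) / separate_gates * 100

print(separate_gates, round(reduction_pct, 1))
```

The saving is below 50% because the integrated module keeps one table and adds the XOR networks for f and its inverse plus the selection logic.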
Subsequently, the overall performance in terms of area utilization would be
compared. Please refer to Appendix 20 for the Map report of the original design and
Appendix 22 for the Map report of the Integrated S-Box design.
Table 9: Performance Comparison of Area Utilization
Aspect Original Design Integrated S-Box Design














Hence, the overall gate count of the new design as a percentage of the original is
78977 / 114785 x 100% = 68.8%
In other words, the new design uses just 69% of the area of the original design.
Nevertheless, the unified crypto module of the new design merely combines the
Cipher and the Inverse Cipher modules without further architectural optimization.
The area utilization could be improved further if the unified crypto module were
optimized and enhanced beforehand.
Next, the overall performance in terms of speed would be evaluated. Please refer to
Appendix 19 for the Synthesis report of the original design and Appendix 21 for the
Synthesis report of the Integrated S-Box design.
Table 10: Performance Comparison of Speed
Aspect                       Original Design               Integrated S-Box Design
Critical Path (minimum)      23.042 ns                     39.333 ns
Clock Frequency (maximum)    43.399 MHz                    25.424 MHz
Clock Cycle                  12 (core)                     12 (core)
                             + 2 (input data path)         + 2 (input data path)
                             + 2 (output data path)        + 2 (output data path)
                             = 16 cycles                   = 16 cycles
Throughput                   128-bit / (16 x 23.042 ns)    128-bit / (16 x 39.333 ns)
                             = 347 Mbit/sec                = 203 Mbit/sec
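The frequency and throughput entries of Table 10 follow directly from the critical path and the cycle count. The Python sketch below recomputes them; the function names are illustrative and the only inputs are the figures in the table.

```python
BLOCK_BITS = 128
CYCLES_PER_BLOCK = 16  # 12 core + 2 input + 2 output


def max_clock_mhz(critical_path_ns):
    """Maximum clock frequency in MHz for a given critical path in ns."""
    return 1000.0 / critical_path_ns


def throughput_mbit_per_s(critical_path_ns):
    """Block throughput in Mbit/s when each block takes CYCLES_PER_BLOCK cycles."""
    block_time_ns = CYCLES_PER_BLOCK * critical_path_ns
    return BLOCK_BITS / block_time_ns * 1000.0


for name, path_ns in (("original", 23.042), ("integrated", 39.333)):
    print(name, round(max_clock_mhz(path_ns), 3),
          round(throughput_mbit_per_s(path_ns)))
```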
Given that the design was synthesized with area as the optimization goal and with
high optimization effort (grade 2), the decrease in clock frequency, and
consequently in throughput, was to be expected.
Nonetheless, the data delivery remains in the range of high data rates. With further
architectural optimization of the unified crypto module, considerable improvement
should be achievable.
CHAPTER 5
CONCLUSION
5.1 REVIEW AND CONCLUSIONS
The complete execution of the AES cipher requires two large substitution tables,
the S-Box and the Inverse S-Box. The S-Box table is used in two functions: SubBytes
in the encryption process and KeyExpansion in both the encryption and decryption
processes. The Inverse S-Box is used in the function InvSubBytes in the decryption
process. These two tables are not the same, thus requiring two different 256-byte
ROMs to store the tables for the operations mentioned above.
For a high-speed, fully pipelined and parallel architecture of AES implementation,
the look-up table requirement can reach 24 S-Box tables and 16 Inverse S-Box tables
at any one time. Under these circumstances, a substantial amount of hardware
resource is consumed if SubBytes and InvSubBytes use their own tables in
encryption (Cipher) and decryption (Inverse Cipher).
Nevertheless, the hardware complexity can be reduced substantially by integrating
both the S-Box table and the Inverse S-Box table under a combined module. The
mathematical formulas used to derive the two tables show that when both the S-Box
and the Inverse S-Box are required, only g, f and f^-1 need to be implemented, in
which g is a 256-byte table called the multiplicative inverse and both f and f^-1 can
be implemented with a limited number of XOR gates. In other words, with only one
look-up table and some simple logic gates, both the S-Box and the Inverse S-Box
can still be obtained.
The design concept was first proven with MATLAB, followed by the construction of
the aes_sbox_inv.v and aes_mul_inv.v modules, which were validated by a test
bench with official AES test vectors. The aes_sbox_inv.v module contains the logic
gates for realizing the f and f^-1 functions, while the aes_mul_inv.v module is itself
the multiplicative inverse look-up table.
Implemented with a 64-bit top layer data path, and then synthesized and mapped to
an arbitrary device in Xilinx Web Pack, the new design delivers a throughput of
203 Mbit/sec with a hardware cost of 78,977 equivalent gates. Hardware complexity
is reduced to 69% of the original, while the core process still takes only 12 cycles.
In conclusion, the new design optimizes the area utilization of AES for
area-critical applications, such as smart card readers and mobile phones, while still
allowing them to run at the high speed they require. The objectives of the research
project were met through the architectural and algorithmic optimizations of the
substitution tables.
5.2 SUGGESTED FUTURE WORK FOR EXPANSION &
CONTINUATION
In terms of area utilization, the design can be further improved by performing the
following:
(1) Optimize the architectural structure of the unified crypto module by
removing the duplicated data registers of the Cipher and the Inverse
Cipher.
(2) Integrate MixColumns and InvMixColumns and subsequently share
the xtime() function and its derivatives (higher orders of the function).
(3) Fully utilize the key buffer for the KeyExpansion function so that
continual encryption or decryption processes can retrieve their round
keys from the buffer without re-generating them.
(4) Implement the design with its full range of key sizes, i.e. 128-bit,
192-bit and 256-bit keys.
(5) Implement pipelining for the top layer data path so that the
cipher can continuously input / output data while the core is
executing the encryption / decryption.
REFERENCES
[1] Maire McLoone, John V. McCanny, "Rijndael FPGA Implementation
Utilizing Look-up Tables", Signal Processing Systems, IEEE (2001)
[2] Xinmiao Zhang, Keshab K. Parhi, "Implementation Approaches for the
Advanced Encryption Standard Algorithm", Circuits and Systems Magazine,
IEEE (2002)
[3] Christian Chitu, David Chien, Charles Chien, Ingrid Verbauwhede, Frank
Chang, "A Hardware Implementation in FPGA of the Rijndael Algorithm",
Circuits and Systems, The 2002 45th Midwest Symposium (2002)
[4] Donald E. Thomas, Philip R. Moorby, "The Verilog Hardware Description
Language", 4th Edition, Kluwer Academic Publishers (1998)
[5] Bruce Schneier, "Applied Cryptography: Protocols, Algorithms, and Source
Code in C", 2nd Edition, John Wiley and Sons, Inc (1996)
[6] Joan Daemen and Vincent Rijmen, "AES Submission Document on Rijndael,
Version 2", September 1999.
http://csrc.nist.gov/CryptoToolkit/aes/rijndael/Rijndael.pdf
[7] FIPS PUB 197, "Advanced Encryption Standard (AES)", National Institute
of Standards and Technology, U.S. Department of Commerce, November
2001.
http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[8] Joan Daemen and Vincent Rijmen, "The Design of Rijndael, AES - The
Advanced Encryption Standard", Springer (2001)
[9] Verilog HDL Language Training, "Active-HDL, Version 5.1", Aldec (2001)
[10] Rudolf Usselmann, "AES (Rijndael) IP Cores", Rev 1.1, ASICS.ws,
November 2002.
http://www.opencores.org/projects/aes_core/
[11] Chih-Chung Lu and Shau-Yin Tseng, "Integrated Design of AES (Advanced





















































































































































































































































































































































































































































































































































































































































































































































































































































































































































Appendix 3 S-Box: substitution values for the byte xy (in hexadecimal format)
       y
       0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
x  0   63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
   1   ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
   2   b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
   3   04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
   4   09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
   5   53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
   6   d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
   7   51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
   8   cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
   9   60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
   a   e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
   b   e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
   c   ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
   d   70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
   e   e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
   f   8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16
Appendix 4 Inverse S-Box: substitution values for the byte xy (in hexadecimal format)
       y
       0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
x  0   52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb
   1   7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb
   2   54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e
   3   08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25
   4   72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92
   5   6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84
   6   90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06
   7   d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b
   8   3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73
   9   96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e
   a   47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b
   b   fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4
   c   1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f
   d   60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef
   e   a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61
   f   17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d
Appendix 5 Multiplicative Inverse, g(xy) (in hexadecimal format)
   y
   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
x
0  00 01 8D F6 CB 52 7B D1 E8 4F 29 C0 B0 E1 E5 C7
1  74 B4 AA 4B 99 2B 60 5F 58 3F FD CC FF 40 EE B2
2  3A 6E 5A F1 55 4D A8 C9 C1 0A 98 15 30 44 A2 C2
3  2C 45 92 6C F3 39 66 42 F2 35 20 6F 77 BB 59 19
4  1D FE 37 67 2D 31 F5 69 A7 64 AB 13 54 25 E9 09
5  ED 5C 05 CA 4C 24 87 BF 18 3E 22 F0 51 EC 61 17
6  16 5E AF D3 49 A6 36 43 F4 47 91 DF 33 93 21 3B
7  79 B7 97 85 10 B5 BA 3C B6 70 D0 06 A1 FA 81 82
8  83 7E 7F 80 96 73 BE 56 9B 9E 95 D9 F7 02 B9 A4
9  DE 6A 32 6D D8 8A 84 72 2A 14 9F 88 F9 DC 89 9A
A  FB 7C 2E C3 8F B8 65 48 26 C8 12 4A CE E7 D2 62
B  0C E0 1F EF 11 75 78 71 A5 8E 76 3D BD BC 86 57
C  0B 28 2F A3 DA D4 E4 0F A9 27 53 04 1B FC AC E6
D  7A 07 AE 63 C5 DB E2 EA 94 8B C4 D5 9D F8 90 6B
E  B1 0D D6 EB C6 0E CF AD 08 4E D7 E3 5D 50 1E B3
F  5B 23 38 34 68 46 03 8C DD 9C 7D A0 CD 1A 41 1C
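Each entry g(xy) is the multiplicative inverse of the byte xy in GF(2^8) under the AES polynomial x^8 + x^4 + x^3 + x + 1, with 00 mapped to 00. A minimal Python sketch for regenerating or spot-checking the table follows; it uses plain exponentiation (a^254) for clarity rather than speed, and the names gf_mul and gf_inv are illustrative assumptions.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return product


def gf_inv(a: int) -> int:
    """Return g(a): the multiplicative inverse of a, with g(0) = 0."""
    result = 1
    for _ in range(254):  # a^254 equals a^-1 for every non-zero a
        result = gf_mul(result, a)
    return result if a else 0


# Full table, indexed by the byte xy (row x, column y).
G_TABLE = [gf_inv(a) for a in range(256)]
```

Since inversion is its own inverse, the table is symmetric in the sense that g(g(xy)) = xy, which gives a quick consistency check on any transcribed entry.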
Appendix 6 Round Constants for the key generation
i      0  1  2  3  4  5  6  7
RC[i]  00 01 02 04 08 10 20 40
i      8  9  10 11 12 13 14 15
RC[i]  80 1B 36 6C D8 AB 4D 9A
i      16 17 18 19 20 21 22 23
RC[i]  2F 5E BC 63 C6 97 35 6A
i      24 25 26 27 28 29 30 31
RC[i]  D4 B3 7D FA EF C5 91 39
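The round constants follow a simple recurrence: RC[1] = 01 and each later value is the previous one multiplied by x (02) in GF(2^8), reduced by the AES polynomial. The Python sketch below regenerates the table under that rule; RC[0] = 00 is taken from the table as a placeholder entry, and the names xtime and round_constants are illustrative assumptions.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x (0x02) in GF(2^8) modulo 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def round_constants(count: int) -> list:
    """Return RC[0..count-1], with RC[0] = 0x00 and RC[1] = 0x01."""
    rc = [0x00]
    value = 0x01
    for _ in range(1, count):
        rc.append(value)
        value = xtime(value)
    return rc


RC = round_constants(32)
```

Only RC[1] to RC[10] are needed for a 128-bit key, since AES-128 uses ten rounds.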
Appendix 7 Original Design - aes_ori64_top.v
/////////////////////////////////////////////////////////////////////
////                                                             ////
////  AES Top Layer for Original AES Core developed              ////
////  by Rudolf Usselmann (rudi@asics.ws)                        ////
////  (for serial input of 64 bit per cycle)                     ////
////                                                             ////









// $Date: 2004/4/01 $
// $Revision: 1.0 $







`include "c:\aes_modules\ori\timescale.v"






































counter <= #1 8'h00;
counter <= #1 8'b00;
counter <= #1 8'h01;
counter <= #1 counter + 8'h01;
run <= #1 1'b0;
run <= #1 1'b1;
run <= #1 1'b0;
option_buf <= #1 option;
always
always @(posedge clk)
    if(!(|counter) & start) text_in_buf[063:000] <= #1 text_in;
    else
    if(counter == 8'h01)    text_in_buf[127:064] <= #1 text_in;
always @(posedge clk)
    if(!(|counter) & start) key_buf[063:000] <= #1 key_in;
    else
    if(counter == 8'h01)








    if(start)               load_ciph <= #1 1'b0;
    else
    if(counter == 8'h01)    load_ciph <= #1 1'b1;
else







load_key <= #1 1'b0;
8'h01) load_key <= #1 1'b1;
8'h02) load_key <= #1 1'b0;




load_inv <= #1
cnt_temp <= #1 cnt_temp - 8'h01;
|cnt_temp[7:1]) & cnt_temp[0];
// encryption or decryption process takes place
always @(posedge clk)
    if(done1 || done2)  assign cnt_temp2 = counter;
always @(posedge clk)
    if(done1 || done2)  text_out <= #1 option_buf ? ciphertext_out[063:000] : plaintext_out[063:000];
    else
    if((counter - cnt_temp2) == 8'h01)  text_out <= #1 option_buf ? ciphertext_out[127:064] : plaintext_out[127:064];
always @(posedge clk)
    if(done1 || done2 || ((counter - cnt_temp2) == 8'h01))  get_output <= #1 1'b1;
always @(posedge clk)
    if((counter - cnt_temp2) == 8'h01)  complete <= #1 1'b1;
    else
    if((counter - cnt_temp2) == 8'h02)  complete <= #1 1'b0;
aes_cipher_top u0(
    .clk(       clk         ),
    .rst(       rst         ),
    .ld(        load_ciph   ),
    .done(      done1       ),
    .key(       key_buf     ),

aes_inv_cipher_top u1(
    .clk(       clk         ),
    .rst(       rst         ),
    .kld(       load_key    ),
    .ld(        load_inv    ),
    .done(      done2       ),
    .key(       key_buf     ),
    .text_in(   text_in_buf ),
Appendix 8 Original Design - aes_cipher_top.v
/////////////////////////////////////////////////////////////////////
////                                                             ////
////  AES Cipher Top Level                                       ////
//// ////
//// ////




//// Downloaded from: http://www.opencores.org/cores/aes_core/ ////
//// ////
/////////////////////////////////////////////////////////////////////
//// ////




//// This source file may be used and distributed without ////
//// restriction provided that this copyright statement is not ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
//// ////
//// THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY ////
//// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ////
//// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS ////
//// FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR ////
//// OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, ////
//// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ////
//// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE ////
//// GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR ////
//// BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF ////
//// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ////
//// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ////
//// OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE ////





// $Id: aes_cipher_top.v,v 1.1.1.1 2002/11/09 11:22:48 rudi Exp $
//
// $Date: 2002/11/09 11:22:48 $
// $Revision: 1.1.1.1 $
// $Author: rudi $
// $Locker: $
// $State: Exp $
//
// Change History:
// $Log: aes_cipher_top.v,v $














input [127:0] text_in;
output [127:0] text_out;




w0, w1, w2, w3;
text_in_r;
text_out;
sa00, sa01, sa02, sa03;
sa10, sa11, sa12, sa13;
sa20, sa21, sa22, sa23;
sa30, sa31, sa32, sa33;
sa00_next, sa01_next, sa02_next, sa03_next;
sa10_next, sa11_next, sa12_next, sa13_next;
sa20_next, sa21_next, sa22_next, sa23_next;
sa30_next, sa31_next, sa32_next, sa33_next;
sa00_sub, sa01_sub, sa02_sub, sa03_sub;
sa10_sub, sa11_sub, sa12_sub, sa13_sub;
sa20_sub, sa21_sub, sa22_sub, sa23_sub;
sa30_sub, sa31_sub, sa32_sub, sa33_sub;
sa00_sr, sa01_sr, sa02_sr, sa03_sr;
sa10_sr, sa11_sr, sa12_sr, sa13_sr;
sa20_sr, sa21_sr, sa22_sr, sa23_sr;
sa30_sr, sa31_sr, sa32_sr, sa33_sr;
sa00_mc, sa01_mc, sa02_mc, sa03_mc;
sa10_mc, sa11_mc, sa12_mc, sa13_mc;
sa20_mc, sa21_mc, sa22_mc, sa23_mc;
































    if(!rst)    dcnt <= #1 4'h0;
    else
    if(ld)      dcnt <= #1 4'hb;
    else
    if(|dcnt)   dcnt <= #1 dcnt - 4'h1;
always @(posedge clk) done <= #1 !(|dcnt[3:1]) & dcnt[0] & !ld;
always @(posedge clk) if(ld) text_in_r <= #1 text_in;
always @(posedge clk) ld_r <= #1 ld;
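The round counter above determines when done is asserted: dcnt is loaded with 4'hb on ld, counts down while non-zero, and done is registered high when dcnt equals 1 and ld is low. The following Python sketch is a cycle-level model of just these two registers, assuming rst stays inactive and ld is pulsed for a single clock edge; the function name simulate_done and the trace format are illustrative assumptions.

```python
def simulate_done(cycles: int = 16):
    """Model dcnt and done over a number of clock edges; ld is high at edge 0 only."""
    dcnt, done = 0, 0
    trace = []
    for edge in range(cycles):
        ld = 1 if edge == 0 else 0
        # Non-blocking assignments: every right-hand side uses the old register values.
        next_done = int((dcnt >> 1) == 0 and (dcnt & 1) == 1 and not ld)
        if ld:
            next_dcnt = 0xB
        elif dcnt != 0:
            next_dcnt = dcnt - 1
        else:
            next_dcnt = dcnt
        dcnt, done = next_dcnt, next_done
        trace.append((edge, dcnt, done))
    return trace


# Clock edges (counted from the ld edge) after which done is high.
DONE_EDGES = [edge for edge, _, done in simulate_done() if done]
```

In this model done is high for a single cycle, eleven clock edges after the edge that samples ld.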
/////////////////////////////////////////////////////////////////////
//
// Initial Permutation (AddRoundKey)
//
always @(posedge clk) sa33 <= #1 ld_r ? text_in_r[007:000] ^ w3[07:00] : sa33_next;
always @(posedge clk) sa23 <= #1 ld_r ? text_in_r[015:008] ^ w3[15:08] : sa23_next;
always @(posedge clk) sa13 <= #1 ld_r ? text_in_r[023:016] ^ w3[23:16] : sa13_next;
always @(posedge clk) sa03 <= #1 ld_r ? text_in_r[031:024] ^ w3[31:24] : sa03_next;
always @(posedge clk) sa32 <= #1 ld_r ? text_in_r[039:032] ^ w2[07:00] : sa32_next;
always @(posedge clk) sa22 <= #1 ld_r ? text_in_r[047:040] ^ w2[15:08] : sa22_next;
always @(posedge clk) sa12 <= #1 ld_r ? text_in_r[055:048] ^ w2[23:16] : sa12_next;
always @(posedge clk) sa02 <= #1 ld_r ? text_in_r[063:056] ^ w2[31:24] : sa02_next;
always @(posedge clk) sa31 <= #1 ld_r ? text_in_r[071:064] ^ w1[07:00] : sa31_next;
always @(posedge clk) sa21 <= #1 ld_r ? text_in_r[079:072] ^ w1[15:08] : sa21_next;
always @(posedge clk) sa11 <= #1 ld_r ? text_in_r[087:080] ^ w1[23:16] : sa11_next;
always @(posedge clk) sa01 <= #1 ld_r ? text_in_r[095:088] ^ w1[31:24] : sa01_next;
always @(posedge clk) sa30 <= #1 ld_r ? text_in_r[103:096] ^ w0[07:00] : sa30_next;
always @(posedge clk) sa20 <= #1 ld_r ? text_in_r[111:104] ^ w0[15:08] : sa20_next;
always @(posedge clk) sa10 <= #1 ld_r ? text_in_r[119:112] ^ w0[23:16] : sa10_next;
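The bit slices above place the 128-bit input into the 4x4 state column by column: the most significant byte becomes row 0 of column 0 and the least significant byte becomes row 3 of column 3. The Python sketch below shows only this byte-to-state mapping (the XOR with the round-key words is omitted); the function name block_to_state is an illustrative assumption.

```python
def block_to_state(block: int):
    """Split a 128-bit integer into state[r][c], filled column by column.

    Byte number 4*c + r, counted from the most significant byte, goes to
    row r, column c; for example bits [7:0] end up in state[3][3].
    """
    return [
        [(block >> (8 * (15 - (4 * c + r)))) & 0xFF for c in range(4)]
        for r in range(4)
    ]


# Example: a block whose bytes are 0x00, 0x01, ..., 0x0F from MSB to LSB.
EXAMPLE_STATE = block_to_state(int.from_bytes(bytes(range(16)), "big"))
```

With this example block, state[r][c] simply holds the value 4*c + r, which makes the column-major ordering easy to see.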





























































































sa10_mc, sa20_mc, sa30_mc} = mix_col(sa00_sr,sa10_sr,sa20_sr,sa30_sr);
sa11_mc, sa21_mc, sa31_mc} = mix_col(sa01_sr,sa11_sr,sa21_sr,sa31_sr);
sa12_mc, sa22_mc, sa32_mc} = mix_col(sa02_sr,sa12_sr,sa22_sr,sa32_sr);
sa13_mc, sa23_mc, sa33_mc} = mix_col(sa03_sr,sa13_sr,sa23_sr,sa33_sr);
= sa00_mc ^ w0[31:24]
= sa01_mc ^ w1[31:24]































// Final text output
//
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text
always @(posedge clk) text

























































































































, .d( sa00_sub
, .d( sa01_sub
, .d( sa02_sub
, .d( sa03_sub
, .d( sa10_sub
, .d( sa11_sub
, .d( sa12_sub
, .d( sa13_sub
, .d( sa20_sub
, .d( sa21_sub
, .d( sa22_sub
, .d( sa23_sub
, .d( sa30_sub
, .d( sa31_sub
, .d( sa32_sub
, .d( sa33_sub
Appendix 9 Original Design - aes_inv_cipher_top.v
/////////////////////////////////////////////////////////////////////
////                                                             ////
////  AES Inverse Cipher Top Level                               ////
//// ////
//// ////












//// This source file may be used and distributed without ////
//// restriction provided that this copyright statement is not ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
//// ////
//// THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY ////
//// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ////
//// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS ////
//// FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR ////
//// OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, ////
//// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ////
//// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE ////
//// GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR ////
//// BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF ////
//// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ////
//// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ////
//// OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE ////





// $Id: aes_inv_cipher_top.v,v 1.1.1.1 2002/11/09 11:22:53 rudi Exp $
//
// $Date: 2002/11/09 11:22:53 $
// $Revision: 1.1.1.1 $
// $Author: rudi $
// $Locker: $
// $State: Exp $
//
// Change History:
// $Log: aes_inv_cipher_top.v,v $














input [127:0] text_in;





wk0, wk1, wk2, wk3;
w0, w1, w2, w3;
text_in_r;
text_out;
sa00, sa01, sa02, sa03;
sa10, sa11, sa12, sa13;
sa20, sa21, sa22, sa23;
sa30, sa31, sa32, sa33;
sa00_next, sa01_next, sa02_next, sa03_next;
sa10_next, sa11_next, sa12_next, sa13_next;
sa20_next, sa21_next, sa22_next, sa23_next;
sa30_next, sa31_next, sa32_next, sa33_next;
sa00_sub, sa01_sub, sa02_sub, sa03_sub;
sa10_sub, sa11_sub, sa12_sub, sa13_sub;
sa20_sub, sa21_sub, sa22_sub, sa23_sub;
sa30_sub, sa31_sub, sa32_sub, sa33_sub;
sa00_sr, sa01_sr, sa02_sr, sa03_sr;
sa10_sr, sa11_sr, sa12_sr, sa13_sr;
sa20_sr, sa21_sr, sa22_sr, sa23_sr;
sa30_sr, sa31_sr, sa32_sr, sa33_sr;
sa00_ark, sa01_ark, sa02_ark, sa03_ark;
sa10_ark, sa11_ark, sa12_ark, sa13_ark;
sa20_ark, sa21_ark, sa22_ark, sa23_ark;
sa30_ark, sa31_ark, sa32_ark, sa33_ark;
reg ld_r, go, done;
reg [3:0] dcnt;






























    if(!rst)    dcnt <= #1 4'h0;
    else
    if(done)    dcnt <= #1 4'h0;
    else
    if(ld)      dcnt <= #1 4'h1;
    else
    if(go)      dcnt <= #1 dcnt + 4'h1;
always @(posedge clk) done <= #1 (dcnt==4'hb) & !ld;
always @(posedge clk)
    if(!rst)    go <= #1 1'b0;
    else
    if(ld)      go <= #1 1'b1;
    else
    if(done)    go <= #1 1'b0;
always @(posedge clk) if(ld) text_in_r <= #1 text_in;
always @(posedge clk) ld_r <= #1 ld;




always @(posedge clk) sa33 <= #1 ld_r ? text_in_r[007:000] ^ w3[07:00] : sa33_next;
always @(posedge clk) sa23 <= #1 ld_r ? text_in_r[015:008] ^ w3[15:08] : sa23_next;
always @(posedge clk) sa13 <= #1 ld_r ? text_in_r[023:016] ^ w3[23:16] : sa13_next;
always @(posedge clk) sa03 <= #1 ld_r ? text_in_r[031:024] ^ w3[31:24] : sa03_next;
always @(posedge clk) sa32 <= #1 ld_r ? text_in_r[039:032] ^ w2[07:00] : sa32_next;
always @(posedge clk) sa22 <= #1 ld_r ? text_in_r[047:040] ^ w2[15:08] : sa22_next;
always @(posedge clk) sa12 <= #1 ld_r ? text_in_r[055:048] ^ w2[23:16] : sa12_next;
always @(posedge clk) sa02 <= #1 ld_r ? text_in_r[063:056] ^ w2[31:24] : sa02_next;
always @(posedge clk) sa31 <= #1 ld_r ? text_in_r[071:064] ^ w1[07:00] : sa31_next;
always @(posedge clk) sa21 <= #1 ld_r ? text_in_r[079:072] ^ w1[15:08] : sa21_next;
always @(posedge clk) sa11 <= #1 ld_r ? text_in_r[087:080] ^ w1[23:16] : sa11_next;
always @(posedge clk) sa01 <= #1 ld_r ? text_in_r[095:088] ^ w1[31:24] : sa01_next;
always @(posedge clk) sa30 <= #1 ld_r ? text_in_r[103:096] ^ w0[07:00] : sa30_next;
always @(posedge clk) sa20 <= #1 ld_r ? text_in_r[111:104] ^ w0[15:08] : sa20_next;
always @(posedge clk) sa10 <= #1 ld_r ? text_in_r[119:112] ^ w0[23:16] : sa10_next;





assign sa00_sr = sa00;
assign sa01_sr = sa01;
assign sa02_sr = sa02;
assign sa03_sr = sa03;
assign sa10_sr = sa13;
assign sa11_sr = sa10;
assign sa12_sr = sa11;
assign sa13_sr = sa12;
assign sa20_sr = sa22;
assign sa21_sr = sa23;
assign sa22_sr = sa20;
assign sa23_sr = sa21;
assign sa30_sr = sa31;
assign sa31_sr = sa32;
assign sa32_sr = sa33;
assign sa33_sr = sa30;
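The sixteen assignments above implement the inverse ShiftRows step purely by wiring: row r of the state is rotated to the right by r byte positions. A minimal Python sketch of the same permutation is given below as a software reference; the function name inv_shift_rows and the string labels are illustrative assumptions.

```python
def inv_shift_rows(state):
    """Rotate row r of a 4x4 state to the right by r positions.

    out[r][c] takes its value from state[r][(c - r) % 4], which mirrors
    assignments such as sa10_sr = sa13 and sa30_sr = sa31.
    """
    return [[state[r][(c - r) % 4] for c in range(4)] for r in range(4)]


# Label each cell with its register name to make the wiring visible.
LABELS = [["sa%d%d" % (r, c) for c in range(4)] for r in range(4)]
SHIFTED = inv_shift_rows(LABELS)
```

Row k of SHIFTED lists, in column order, the source registers that feed sak0_sr to sak3_sr.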
assign sa00_ark = sa00_sub
assign sa01_ark = sa01_sub
assign sa02_ark = sa02_sub
assign sa03_ark = sa03_sub
assign sa10_ark = sa10_sub
assign sa11_ark = sa11_sub
assign sa12_ark = sa12_sub
assign sa13_ark = sa13_sub ^ w3[23:16]
assign sa20_ark = sa20_sub ^ w0[15:08]
assign sa21_ark = sa21_sub
assign sa22_ark = sa22_sub
assign sa23_ark = sa23_sub
assign sa30_ark = sa30_sub
assign sa31_ark = sa31_sub
assign sa32_ark = sa32_sub
assign sa33_ark = sa33_sub
assign {sa00_next, sa10_next, sa20_next, sa30_next} = inv_mix_col(sa00_ark,sa10_ark,sa20_ark,sa30_ark);
assign {sa01_next, sa11_next, sa21_next, sa31_next} = inv_mix_col(sa01_ark,sa11_ark,sa21_ark,sa31_ark);
assign {sa02_next, sa12_next, sa22_next, sa32_next} = inv_mix_col(sa02_ark,sa12_ark,sa22_ark,sa32_ark);
assign {sa03_next, sa13_next, sa23_next, sa33_next} = inv_mix_col(sa03_ark,sa13_ark,sa23_ark,sa33_ark);
/////////////////////////////////////////////////////////////////////
//
// Final Text Output
//
always @(posedge clk) text_out[127:120] <= #1 sa00_ark;
always @(posedge clk) text_out[095:088] <= #1 sa01_ark;
always @(posedge clk) text_out[063:056] <= #1 sa02_ark;
always @(posedge clk) text_out[031:024] <= #1 sa03_ark;
always @(posedge clk) text_out[119:112] <= #1 sa10_ark;
always @(posedge clk) text_out[087:080] <= #1 sa11_ark;
always @(posedge clk) text_out[055:048] <= #1 sa12_ark;
always @(posedge clk) text_out[023:016] <= #1 sa13_ark;
always @(posedge clk) text_out[111:104] <= #1 sa20_ark;
always @(posedge clk) text_out[079:072] <= #1 sa21_ark;
always @(posedge clk) text_out[047:040] <= #1 sa22_ark;
always @(posedge clk) text_out[015:008] <= #1 sa23_ark;
always @(posedge clk) text_out[103:096] <= #1 sa30_ark;
always @(posedge clk) text_out[071:064] <= #1 sa31_ark;
always @(posedge clk) text_out[039:032] <= #1 sa32_ark;
always @(posedge clk) text_out[007:000] <= #1 sa33_ark;





















































always @(posedge clk)
    if(!rst)        kcnt <= #1 4'ha;
    else
    if(kld)         kcnt <= #1 4'ha;
    else
    if(kb_ld)       kcnt <= #1 kcnt - 4'h1;
always @(posedge clk)
    if(!rst)        kb_ld <= #1 1'b0;
    else
    if(kld)         kb_ld <= #1 1'b1;
    else
    if(kcnt==4'h0)  kb_ld <= #1 1'b0;
always @(posedge clk) kdone <= #1 (kcnt==4'h0) & !kld;
always @(posedge clk) if(kb_ld) kb[kcnt] <= #1 {wk3, wk2, wk1, wk0};










    .wo_1( wk1 ),
    .wo_2( wk2 ),
aes_inv_sbox us00( .a( sa00_sr
aes_inv_sbox us01( .a( sa01_sr
aes_inv_sbox us02( .a( sa02_sr
aes_inv_sbox us03( .a( sa03_sr
aes_inv_sbox us10( .a( sa10_sr
aes_inv_sbox us11( .a( sa11_sr
aes_inv_sbox us12( .a( sa12_sr
aes_inv_sbox us13( .a( sa13_sr
aes_inv_sbox us20( .a( sa20_sr
aes_inv_sbox us21( .a( sa21_sr
aes_inv_sbox us22( .a( sa22_sr
aes_inv_sbox us23( .a( sa23_sr
aes_inv_sbox us30( .a( sa30_sr
aes_inv_sbox us31( .a( sa31_sr
aes_inv_sbox us32( .a( sa32_sr
aes_inv_sbox us33( .a( sa33_sr
endmodule
















































//// This source file may be used and distributed without ////
//// restriction provided that this copyright statement is not ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
//// ////
//// THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY ////
//// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ////
//// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS ////
//// FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR ////
//// OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, ////
//// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ////
//// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE ////
//// GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR ////
//// BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF ////
//// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ////
//// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ////
//// OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE ////




$Id: aes_sbox.v,v 1.1.1.1 2002/11/09 11:22:38 rudi Exp $


















































// synopsys full_case parallel_case




































































































































































































































































































































































Downloaded from: http://www.opencores.org/cores/aes_core/ ////
////
/////////////////////////////////////////////////////////////////////
//// ////




//// This source file may be used and distributed without ////
//// restriction provided that this copyright statement is not ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
//// ////
//// THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY ////
//// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ////
//// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS ////
//// FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR ////
//// OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, ////
//// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ////
//// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE ////
//// GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR ////
//// BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF ////
//// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ////
//// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ////
//// OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE ////
//// POSSIBILITY OF SUCH DAMAGE. ////
//// ////
/////////////////////////////////////////////////////////////////////
CVS Log
$Id: aes_inv_sbox.v,v 1.1.1.1 2002/11/09 11:22:55 rudi Exp $


















































// synopsys full_case parallel_case


















































































































































































































































































































































































































































Original Design - aes_inv_sbox.v





















Downloaded from: http://www.opencores.org/cores/aes_core/ ////
////
/////////////////////////////////////////////////////////////////////
//// ////




//// This source file may be used and distributed without ////
//// restriction provided that this copyright statement is not ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
//// ////
//// THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY ////
//// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ////
//// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS ////
//// FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR ////
//// OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, ////
//// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ////
//// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE ////
//// GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR ////
//// BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF ////
//// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ////
//// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ////
//// OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE ////
//// POSSIBILITY OF SUCH DAMAGE. ////
//// ////










1.1.1.1 2002/11/09 11:22:38 rudi Exp $
Change History:
$Log: aes_key_expand_128.v,v $












kld, key, wo_0, wo_1, wo_2, wo_3);
assign wo_0 = w[0];
assign wo_1 = w[1];
assign wo_2 = w[2];
assign wo_3 = w[3];
always @(posedge clk) w[0] <= #1 kld ? key[127:096]
always @(posedge clk) w[1] <= #1 kld ? key[095:064]
always @(posedge clk) w[2] <= #1 kld ? key[063:032]
always @(posedge clk)














Appendix 13 Original Design & New Design - aes_rcon.v
/////////////////////////////////////////////////////////////////////
////                                                             ////
////  AES RCON Block                                             ////
//// ////
//// ////












//// This source file may be used and distributed without ////
//// restriction provided that this copyright statement is not ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
//// ////
//// THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY ////
//// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ////
//// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS ////
//// FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR ////
//// OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, ////
//// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ////
//// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE ////
//// GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR ////
//// BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF ////
//// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ////
//// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ////
//// OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE ////





// $Id: aes_rcon.v,v 1.1.1.1 2002/11/09 11:22:38 rudi Exp $
//
// $Date: 2002/11/09 11:22:38 $
// $Revision: 1.1.1.1 $
// $Author: rudi $
// $Locker: $
// $State: Exp $
//
// Change History:
// $Log: aes_rcon.v,v $




















out <= #1 32'h01_00_00_00;
out <= #1 frcon(rcnt_next);
assign rcnt_next = rcnt + 4'h1;
always @(posedge clk)
    if(kld) rcnt <= #1 4'h0;
    else    rcnt <= #1 rcnt_next;
function [31:0] frcon;
input [3:0] i;















Appendix 14 New Design - aes_new64_top.v
/////////////////////////////////////////////////////////////////////
////                                                             ////
////  AES Top Layer for Combined Sbox Design                     ////
////  (for serial input of 64 bit per cycle)                     ////
//// ////




//// Composed for Final Year Project of EE Faculty, UTP ////
//// ////
/////////////////////////////////////////////////////////////////////
//
//
// $Date: 2004/4/01 $
// SRevision: 1.0 $




























reg [7:0] cnt_temp, cnt_temp2;
always @(posedge clk)
    if(!rst)        counter <= #1 8'h00;
    else
    if(complete)    counter <= #1 8'b00;
    else
    if(start)       counter <= #1 8'h01;
    else
    if(run)         counter <= #1 counter + 8'h01;
always @(posedge clk)
    if(!rst)        run <= #1 1'b0;
    else
    if(start)       run <= #1 1'b1;
    else
    if(complete)    run <= #1 1'b0;
always @(posedge clk)
    if(start)       option_buf <= #1 option;
always @(posedge clk)
    if(!(|counter) & start) text_in_buf[063:000] <= #1 text_in;
    else
    if(counter == 8'h01)    text_in_buf[127:064] <= #1 text_in;
always @(posedge clk)
    if(!(|counter) & start) key_buf[063:000] <= #1 key_in;
    else




    if(start)               load_crypto <= #1 1'b0;
    else
    if(counter == 8'h01)    load_crypto <= #1 1'b1;
    else
    if(counter == 8'h02)    load_crypto <= #1 1'b0;





    if(start)               load_key <= #1 1'b0;
    else
    if(counter == 8'h01)    load_key <= #1 1'b1;
    else
    if(counter == 8'h02)    load_key <= #1 1'b0;
    if(load_key)    cnt_temp <= #1 8'h0b;
    else
    if(|cnt_temp)   cnt_temp <= #1 cnt_temp - 8'h01;
    load_crypto <= #1 !(|cnt_temp[7:1]) & cnt_temp[0];
end
// encryption or decryption process takes place
always @(posedge clk)
    if(done)    assign cnt_temp2 = counter;
always @(posedge clk)
    if(done)    text_out <= #1 text_out_buf[063:000];
    else
    if((counter - cnt_temp2) == 8'h01)  text_out <= #1 text_out_buf[127:064];
always @(posedge clk)
    if(done || ((counter - cnt_temp2) == 8'h01))    get_output <= #1 1'b1;
always @(posedge clk)
    if((counter - cnt_temp2) == 8'h01)  complete <= #1 1'b1;
    else
    if((counter - cnt_temp2) == 8'h02)  complete <= #1 1'b0;
aes_crypto_top u0(
    .ciph_opt(  option_buf  ),
    .clk(       clk         ),
    .rst(       rst         ),
    .kld(       load_key    ),





















AES Cipher Top Level
(combined with)
AES Inverse Cipher Top Level
Author: Rudolf Usselmann
rudi@asics.ws






















//// This source file may be used and distributed without ////
//// restriction provided that this copyright statement is not ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
//// ////
//// THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY ////
//// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ////
//// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS ////
//// FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR ////
//// OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, ////
//// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ////
//// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE ////
//// GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR ////
//// BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF ////
//// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ////
//// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ////
//// OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE ////















// $Log: aes_cipher_top.v,v $










$Id: aes_cipher_top.v,v 1.1.1.1 2002/11/09 11:22:48 rudi Exp $
$Id: aes_inv_cipher_top.v,v 1.1.1.1 2002/11/09 11:22:53 rudi Exp $






New Design - aes_crypto_top.v
Both Cipher and Inverse Cipher were merged for combined-Sbox
`include "c:\aes_modules\new\timescale.v"


































] in_s00, in_s01, in_s02, in_s03;
] in_s10, in_s11, in_s12, in_s13;
] in_s20, in_s21, in_s22, in_s23;
] in_s30, in_s31, in_s32, in_s33;
] out_s00, out_s01, out_s02, out_s03;
] out_s10, out_s11, out_s12, out_s13;
] out_s20, out_s21, out_s22, out_s23;
] out_s30, out_s31, out_s32, out_s33;
0] w0, w1, w2, w3;
/////////////////////////////////////////////////////////////////////
//
// Local Wires for Cipher
//
wire [31:0] Cw0, Cw1, Cw2, Cw3;
reg [127:0] Ctext_in_r;
reg [127:0] Ctext_out;
reg [7:0] Csa00, Csa01, Csa02, Csa03;
reg [7:0] Csa10, Csa11, Csa12, Csa13;
reg [7:0] Csa20, Csa21, Csa22, Csa23;
reg [7:0] Csa30, Csa31, Csa32, Csa33;
wire [7:0] Csa00_next, Csa01_next, Csa02_next, Csa03_next;
wire [7:0] Csa10_next, Csa11_next, Csa12_next, Csa13_next;

















[7:0] Csa20_next, Csa21_next, Csa22_next, Csa23_next;
[7:0] Csa30_next, Csa31_next, Csa32_next, Csa33_next;
[7:0] Csa00_sub, Csa01_sub, Csa02_sub, Csa03_sub;
[7:0] Csa10_sub, Csa11_sub, Csa12_sub, Csa13_sub;
[7:0] Csa20_sub, Csa21_sub, Csa22_sub, Csa23_sub;
[7:0] Csa30_sub, Csa31_sub, Csa32_sub, Csa33_sub;
[7:0] Csa00_sr, Csa01_sr, Csa02_sr, Csa03_sr;
[7:0] Csa10_sr, Csa11_sr, Csa12_sr, Csa13_sr;
[7:0] Csa20_sr, Csa21_sr, Csa22_sr, Csa23_sr;
[7:0] Csa30_sr, Csa31_sr, Csa32_sr, Csa33_sr;
[7:0] Csa00_mc, Csa01_mc, Csa02_mc, Csa03_mc;
[7:0] Csa10_mc, Csa11_mc, Csa12_mc, Csa13_mc;
[7:0] Csa20_mc, Csa21_mc, Csa22_mc, Csa23_mc;
[7:0] Csa30_mc, Csa31_mc, Csa32_mc, Csa33_mc;
Cdone, Cld_r;
[3:0] Cdcnt;
/////////////////////////////////////////////////////////////////////
//










next, Isa01_next, Isa02_next, Isa03_next;
next, Isa11_next, Isa12_next, Isa13_next;
next, Isa21_next, Isa22_next, Isa23_next;
_next, Isa31_next, Isa32_next, Isa33_next;
sub, Isa01_sub, Isa02_sub, Isa03_sub;
sub, Isa11_sub, Isa12_sub, Isa13_sub;
_sub, Isa21_sub, Isa22_sub, Isa23_sub;
_sub, Isa31_sub, Isa32_sub, Isa33_sub;
sr, Isa01_sr, Isa02_sr, Isa03_sr;
sr, Isa11_sr, Isa12_sr, Isa13_sr;
sr, Isa21_sr, Isa22_sr, Isa23_sr;
sr, Isa31_sr, Isa32_sr, Isa33_sr;
_ark, Isa01_ark, Isa02_ark, Isa03_ark;
ark, Isa11_ark, Isa12_ark, Isa13_ark;
ark, Isa21_ark, Isa22_ark, Isa23_ark;
_ark, Isa31_ark, Isa32_ark, Isa33_ark;
, Igo, Idone;
wire [31:0] Iwk0,

























/////////////////////////////////////////////////////////////////////
//
// Misc Logic for Cipher
//
always @(posedge clk)
    if(!rst)    Cdcnt <= #1 4'h0;
    else
    if(ld)      Cdcnt <= #1 4'hb;
    else
    if(|Cdcnt)  Cdcnt <= #1 Cdcnt - 4'h1;
always @(posedge clk) Cdone <= #1 !(|Cdcnt[3:1]) & Cdcnt[0] & !ld;
always @(posedge clk) if(ld) Ctext_in_r <= #1 text_in;
always @(posedge clk) Cld_r <= #1 ld;
/////////////////////////////////////////////////////////////////////
//
// Initial Permutation (AddRoundKey) for Cipher
//
always @(posedge clk) Csa33 <= #1 Cld_r ? Ctext_in_r[007:000] ^ Cw3[07:00] : Csa33_next;
always @(posedge clk) Csa23 <= #1 Cld_r ? Ctext_in_r[015:008] ^ Cw3[15:08] : Csa23_next;
always @(posedge clk) Csa13 <= #1 Cld_r ? Ctext_in_r[023:016] ^ Cw3[23:16] : Csa13_next;
always @(posedge clk) Csa03 <= #1 Cld_r ? Ctext_in_r[031:024] ^ Cw3[31:24] : Csa03_next;
always @(posedge clk) Csa32 <= #1 Cld_r ? Ctext_in_r[039:032] ^ Cw2[07:00] : Csa32_next;
always @(posedge clk) Csa22 <= #1 Cld_r ? Ctext_in_r[047:040] ^ Cw2[15:08] : Csa22_next;
always @(posedge clk) Csa12 <= #1 Cld_r ? Ctext_in_r[055:048] ^ Cw2[23:16] : Csa12_next;
always @(posedge clk) Csa02 <= #1 Cld_r ? Ctext_in_r[063:056] ^ Cw2[31:24] : Csa02_next;
always @(posedge clk) Csa31 <= #1 Cld_r ? Ctext_in_r[071:064] ^ Cw1[07:00] : Csa31_next;
always @(posedge clk) Csa21 <= #1 Cld_r ? Ctext_in_r[079:072] ^ Cw1[15:08] : Csa21_next;
always @(posedge clk) Csa11 <= #1 Cld_r ? Ctext_in_r[087:080] ^ Cw1[23:16] : Csa11_next;
always @(posedge clk) Csa01 <= #1 Cld_r ? Ctext_in_r[095:088] ^ Cw1[31:24] : Csa01_next;
always @(posedge clk) Csa30 <= #1 Cld_r ? Ctext_in_r[103:096] ^ Cw0[07:00] : Csa30_next;
always @(posedge clk) Csa20 <= #1 Cld_r ? Ctext_in_r[111:104] ^ Cw0[15:08] : Csa20_next;
always @(posedge clk) Csa10 <= #1 Cld_r ? Ctext_in_r[119:112] ^ Cw0[23:16] : Csa10_next;
always @(posedge clk) Csa00 <= #1 Cld_r ? Ctext_in_r[127:120] ^ Cw0[31:24] : Csa00_next;
/////////////////////////////////////////////////////////////////////
//



















































































Csa10_mc, Csa20_mc, Csa30_mc} = mix_col(Csa00_sr,Csa10_sr,Csa20_sr,Csa30_sr);
Csa11_mc, Csa21_mc, Csa31_mc} = mix_col(Csa01_sr,Csa11_sr,Csa21_sr,Csa31_sr);
Csa12_mc, Csa22_mc, Csa32_mc} = mix_col(Csa02_sr,Csa12_sr,Csa22_sr,Csa32_sr);
Csa13_mc, Csa23_mc, Csa33_mc} = mix_col(Csa03_sr,Csa13_sr,Csa23_sr,Csa33_sr);































/////////////////////////////////////////////////////////////////////
//
// Final text output for Cipher
//
always @(posedge clk) Ctext_out[127:120] <= #1 Csa00_sr ^ Cw0[31:24];
always @(posedge clk) Ctext_out[095:088] <= #1 Csa01_sr ^ Cw1[31:24];
always @(posedge clk) Ctext_out[063:056] <= #1 Csa02_sr ^ Cw2[31:24];
always @(posedge clk) Ctext_out[031:024] <= #1 Csa03_sr ^ Cw3[31:24];
always @(posedge clk) Ctext_out[119:112] <= #1 Csa10_sr ^ Cw0[23:16];
always @(posedge clk) Ctext_out[087:080] <= #1 Csa11_sr ^ Cw1[23:16];
always @(posedge clk) Ctext_out[055:048] <= #1 Csa12_sr ^ Cw2[23:16];
always @(posedge clk) Ctext_out[023:016] <= #1 Csa13_sr ^ Cw3[23:16];
always @(posedge clk) Ctext_out[111:104] <= #1 Csa20_sr ^ Cw0[15:08];
always @(posedge clk) Ctext_out[079:072] <= #1 Csa21_sr ^ Cw1[15:08];
always @(posedge clk) Ctext_out[047:040] <= #1 Csa22_sr ^ Cw2[15:08];
always @(posedge clk) Ctext_out[015:008] <= #1 Csa23_sr ^ Cw3[15:08];
always @(posedge clk) Ctext_out[103:096] <= #1 Csa30_sr ^ Cw0[07:00];
always @(posedge clk) Ctext_out[071:064] <= #1 Csa31_sr ^ Cw1[07:00];
always @(posedge clk) Ctext_out[039:032] <= #1 Csa32_sr ^ Cw2[07:00];
always @(posedge clk) Ctext_out[007:000] <= #1 Csa33_sr ^ Cw3[07:00];
////////////////////////////////////////////////////////////////////
//
// Misc Logic for Inverse Cipher
//
always @(posedge clk)
if(!rst) Idcnt <= #1 4'h0;
else
if(Idone) Idcnt <= #1 4'h0;
else
if(ld) Idcnt <= #1 4'h1;
else
if(Igo) Idcnt <= #1 Idcnt + 4'h1;
always @(posedge clk) Idone <= #1 (Idcnt==4'hb) & !ld;
always @(posedge clk)
if(!rst) Igo <= #1 1'b0;
else
if(ld) Igo <= #1 1'b1;
else
if(Idone) Igo <= #1 1'b0;
always @(posedge clk) if(ld) Itext_in_r <= #1 text_in;
always @(posedge clk) Ild_r <= #1 ld;
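The interaction of the round counter, the go flag, and the done flag is easier to follow when stepped cycle by cycle. The Python sketch below models the three registers with non-blocking semantics (every next value is computed from the old values). It is an illustrative model only; `step` and `run_block` are hypothetical helper names, and the `#1` delays are ignored.

```python
def step(state, rst, ld):
    """Advance (dcnt, go, done) by one rising clock edge.

    rst is active low, as in the listing. All next values are derived
    from the current state, mimicking non-blocking assignments.
    """
    dcnt, go, done = state

    if not rst:
        n_dcnt = 0
    elif done:
        n_dcnt = 0
    elif ld:
        n_dcnt = 1
    elif go:
        n_dcnt = (dcnt + 1) & 0xF  # 4-bit counter
    else:
        n_dcnt = dcnt

    n_done = (dcnt == 0xB) and not ld  # done has no reset term in the listing

    if not rst:
        n_go = False
    elif ld:
        n_go = True
    elif done:
        n_go = False
    else:
        n_go = go

    return (n_dcnt, n_go, n_done)


def run_block():
    """Reset, pulse ld for one cycle, then count edges until done is set."""
    state = step((0, False, False), rst=False, ld=False)
    state = step(state, rst=True, ld=True)
    cycles = 0
    while not state[2]:
        state = step(state, rst=True, ld=False)
        cycles += 1
    return cycles, state


if __name__ == "__main__":
    print(run_block())
```

In this model the done flag is set for a single cycle; on the following edge the counter returns to zero and the go flag is cleared.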
////////////////////////////////////////////////////////////////////
//
// Initial Permutation Inverse Cipher
//
always @(posedge clk) Isa33 <= #1 Ild_r ? Itext_in_r[007:000] ^ Iw3[07:00] : Isa33_next;
always @(posedge clk) Isa23 <= #1 Ild_r ? Itext_in_r[015:008] ^ Iw3[15:08] : Isa23_next;
always @(posedge clk) Isa13 <= #1 Ild_r ? Itext_in_r[023:016] ^ Iw3[23:16] : Isa13_next;
always @(posedge clk) Isa03 <= #1 Ild_r ? Itext_in_r[031:024] ^ Iw3[31:24] : Isa03_next;
always @(posedge clk) Isa32 <= #1 Ild_r ? Itext_in_r[039:032] ^ Iw2[07:00] : Isa32_next;
always @(posedge clk) Isa22 <= #1 Ild_r ? Itext_in_r[047:040] ^ Iw2[15:08] : Isa22_next;
always @(posedge clk) Isa12 <= #1 Ild_r ? Itext_in_r[055:048] ^ Iw2[23:16] : Isa12_next;
always @(posedge clk) Isa02 <= #1 Ild_r ? Itext_in_r[063:056] ^ Iw2[31:24] : Isa02_next;
always @(posedge clk) Isa31 <= #1 Ild_r ? Itext_in_r[071:064] ^ Iw1[07:00] : Isa31_next;
always @(posedge clk) Isa21 <= #1 Ild_r ? Itext_in_r[079:072] ^ Iw1[15:08] : Isa21_next;
always @(posedge clk) Isa11 <= #1 Ild_r ? Itext_in_r[087:080] ^ Iw1[23:16] : Isa11_next;
always @(posedge clk) Isa01 <= #1 Ild_r ? Itext_in_r[095:088] ^ Iw1[31:24] : Isa01_next;
always @(posedge clk) Isa30 <= #1 Ild_r ? Itext_in_r[103:096] ^ Iw0[07:00] : Isa30_next;
always @(posedge clk) Isa20 <= #1 Ild_r ? Itext_in_r[111:104] ^ Iw0[15:08] : Isa20_next;
always @(posedge clk) Isa10 <= #1 Ild_r ? Itext_in_r[119:112] ^ Iw0[23:16] : Isa10_next;






















_ark = Isa00_sub ^ Iw0[31:24];
_ark = Isa01_sub ^ Iw1[31:24];
_ark = Isa02_sub ^ Iw2[31:24];
_ark = Isa03_sub ^ Iw3[31:24];
_ark = Isa10_sub ^ Iw0[23:16];
_ark = Isa11_sub ^ Iw1[23:16];
_ark = Isa12_sub ^ Iw2[23:16];
_ark = Isa13_sub ^ Iw3[23:16];
_ark = Isa20_sub ^ Iw0[15:08];
_ark = Isa21_sub ^ Iw1[15:08];
_ark = Isa22_sub ^ Iw2[15:08];
_ark = Isa23_sub ^ Iw3[15:08];
_ark = Isa30_sub ^ Iw0[07:00];
_ark = Isa31_sub ^ Iw1[07:00];
_ark = Isa32_sub ^ Iw2[07:00];
_ark = Isa33_sub ^ Iw3[07:00];
0_next, Isa10_next, Isa20_next, Isa30_next} = inv_mix_col(Isa00_ark,Isa10_ark,Isa20_ark,
1_next, Isa11_next, Isa21_next, Isa31_next} = inv_mix_col(Isa01_ark,Isa11_ark,Isa21_ark,
2_next, Isa12_next, Isa22_next, Isa32_next} = inv_mix_col(Isa02_ark,Isa12_ark,Isa22_ark,





































////////////////////////////////////////////////////////////////////
//
// Final Text Output Inverse Cipher
//
always @(posedge clk) Itext_out[127:120] <= #1 Isa00_ark;
always @(posedge clk) Itext_out[095:088] <= #1 Isa01_ark;
always @(posedge clk) Itext_out[063:056] <= #1 Isa02_ark;
always @(posedge clk) Itext_out[031:024] <= #1 Isa03_ark;
always @(posedge clk) Itext_out[119:112] <= #1 Isa10_ark;
always @(posedge clk) Itext_out[087:080] <= #1 Isa11_ark;
always @(posedge clk) Itext_out[055:048] <= #1 Isa12_ark;
always @(posedge clk) Itext_out[023:016] <= #1 Isa13_ark;
always @(posedge clk) Itext_out[111:104] <= #1 Isa20_ark;
always @(posedge clk) Itext_out[079:072] <= #1 Isa21_ark;
always @(posedge clk) Itext_out[047:040] <= #1 Isa22_ark;
always @(posedge clk) Itext_out[015:008] <= #1 Isa23_ark;
always @(posedge clk) Itext_out[103:096] <= #1 Isa30_ark;
always @(posedge clk) Itext_out[071:064] <= #1 Isa31_ark;
always @(posedge clk) Itext_out[039:032] <= #1 Isa32_ark;
always @(posedge clk) Itext_out[007:000] <= #1 Isa33_ark;
always @(posedge clk) text_out <= #1 ciph_opt ? Ctext_out : Itext_out;
always @(posedge clk) done <= #1 ciph_opt ? Cdone : Idone;
////////////////////////////////////////////////////////////////////
//














s0_o,s1_o,s2_o,s3_o;
function [7:0] xtime;



















Appendix 15 New Design - aes_crypto_top.v
endfunction





































always @(posedge clk)
if(!rst) kcnt <= #1 4'ha;
else
if(kld) kcnt <= #1 4'
else
if(kb_ld) kcnt <= #1 kcnt - 4'h1;
always @(posedge clk)
if(!rst) kb_ld <= #1 1'b0;
else
if(kld) kb_ld <= #1 1'b1;
else
if(kcnt==4'h0) kb_ld <= #1 1'b0;
always @(posedge clk) kdone <= #1 (kcnt==4'h0) & !kld;
always @(posedge clk) if(kb_ld) kb[kcnt] <= #1 {Iwk3, Iwk2, Iwk1, Iwk0};
always @(posedge clk) {Iw3, Iw2, Iw1, Iw0} <= #1 kb[Idcnt];
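The key buffer logic above stores the expanded round keys at a down-counting address, so that the inverse cipher can later read them in the opposite order through its own counter. The Python sketch below illustrates this under two stated assumptions: the counter is reloaded with 4'ha when kld is asserted (the reload value is truncated in the listing, so the reset value is assumed here), and the key expansion presents one round key per clock in the order 0 to 10. `load_keys` is a hypothetical helper name.

```python
def load_keys(round_keys):
    """Model of the kb[] write sequence driven by kcnt and kb_ld.

    round_keys is an iterable giving one round key per clock after kld.
    Returns the 11-entry buffer as written by the counter logic.
    """
    kb = [None] * 11
    kcnt, kb_ld = 0xA, True  # state after the edge on which kld was high (assumed reload 4'ha)
    keys = iter(round_keys)
    while kb_ld:
        wk = next(keys)
        kb[kcnt] = wk                 # write uses the old counter value
        if kcnt == 0:
            kb_ld = False             # last write; loading stops
        kcnt = (kcnt - 1) & 0xF       # 4-bit down counter
    return kb


if __name__ == "__main__":
    print(load_keys(["rk%d" % i for i in range(11)]))
```

Under these assumptions round key 0 ends up at address 10 and round key 10 at address 0, which is the reversed order that decryption needs.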
////////////////////////////////////////////////////////////////////
//
// Modules for both cipher and Inverse Cipher
//
assign Cw0 = w0;
assign Cw1 = w1;
assign Cw2 = w2;
assign Cw3 = w3;
assign Csa00_sub = out_s00;
assign Csa01_sub = out_s01;
assign Csa02_sub = out_s02;
assign Csa03_sub = out_s03;
assign Csa10_sub = out_s10;
assign Csa11_sub = out_s11;
assign Csa12_sub = out_s12;
assign Csa13_sub = out_s13;
assign Csa20_sub = out_s20;
assign Csa21_sub = out_s21;
assign Csa22_sub = out_s22;
assign Csa23_sub = out_s23;
assign Csa30_sub = out_s30;
assign Csa31_sub = out_s31;
assign Csa32_sub = out_s32;
assign Csa33_sub = out_s33;
assign Iwk0 = w0;
assign Iwk1 = w1;
assign Iwk2 = w2;
assign Iwk3 = w3;
assign Isa00_sub = out_s00;
assign Isa01_sub = out_s01;
assign Isa02_sub = out_s02;
assign Isa03_sub = out_s03;
assign Isa10_sub = out_s10;
assign Isa11_sub = out_s11;










































































































































































































































































Appendix 16 New Design - aes_sbox_inv.v
/////////////////////////////////////////////////////////////////////
//// ////








//// Composed for Final Year Project of EE Faculty, UTP ////
//// ////





















reg [7:0] d;
reg [7:0] a_in;




a_in = a;
d[7] = a_sub[7]^a_sub[6]^a_sub[5]^a_sub[4]^a_sub[3]^1'b0;
d[6] = a_sub[6]^a_sub[5]^a_sub[4]^a_sub[3]^a_sub[2]^1'b1;
d[5] = a_sub[5]^a_sub[4]^a_sub[3]^a_sub[2]^a_sub[1]^1'b1;
d[4] = a_sub[4]^a_sub[3]^a_sub[2]^a_sub[1]^a_sub[0]^1'b0;
d[3] = a_sub[7]^a_sub[3]^a_sub[2]^a_sub[1]^a_sub[0]^1'b0;
d[2] = a_sub[7]^a_sub[6]^a_sub[2]^a_sub[1]^a_sub[0]^1'b0;
d[1] = a_sub[7]^a_sub[6]^a_sub[5]^a_sub[1]^a_sub[0]^1'b1;




a_in[7] = a[6]^a[4]^a[1]^1'b0;
a_in[6] = a[5]^a[3]^a[0]^1'b0;
a_in[5] = a[7]^a[4]^a[2]^1'b0;
a_in[4] = a[6]^a[3]^a[1]^1'b0;
a_in[3] = a[5]^a[2]^a[0]^1'b0;
a_in[2] = a[7]^a[4]^a[1]^1'b1;




aes_mul_inv m0( .b( b ), .m( a_in ), .g( a_sub ));
endmodule
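The bit equations in this module are the AES affine transformation (applied to a_sub after the multiplicative inverse) and its inverse (applied to a before the multiplicative inverse), so one inverter can serve both the S-box and the inverse S-box. The Python sketch below is a software model of that idea, not the author's implementation: the inverse is computed by brute-force exponentiation rather than by the logic in aes_mul_inv, and the constants 0x63 and 0x05 are the standard AES values, which agree with the 1'b0/1'b1 terms visible in the listing.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), with 0 mapped to 0 (computed as a^254)."""
    if a == 0:
        return 0
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def bit(x, i):
    return (x >> (i % 8)) & 1


def affine(s):
    """Forward affine transformation: the d[i] equations in the listing."""
    out = 0
    for i in range(8):
        b = bit(s, i) ^ bit(s, i + 4) ^ bit(s, i + 5) ^ bit(s, i + 6) ^ bit(s, i + 7) ^ bit(0x63, i)
        out |= b << i
    return out


def inv_affine(a):
    """Inverse affine transformation: the a_in[i] equations in the listing."""
    out = 0
    for i in range(8):
        b = bit(a, i + 2) ^ bit(a, i + 5) ^ bit(a, i + 7) ^ bit(0x05, i)
        out |= b << i
    return out


def sbox(x):
    return affine(gf_inv(x))       # encrypt path: inverse, then affine


def inv_sbox(y):
    return gf_inv(inv_affine(y))   # decrypt path: inverse affine, then inverse


if __name__ == "__main__":
    print(hex(sbox(0x53)), hex(inv_sbox(sbox(0x53))))
```

Both paths call the same `gf_inv`, which is the software counterpart of sharing one multiplicative inverse block between the two substitutions.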
Appendix 17
/////////////////////////////////////////////////////////////////////
//// ////
//// AES Multiplicative Inverse Block ////
//// (to be instantiated by aes_sbox_inv for building ////
//// both Sbox and Inverse Sbox) ////
//// ////
//// Author: Lee Yi Lin ////
//// leeyilin@yahoo.com ////
//// ////
//// Composed for Final Year Project of EE Faculty, UTP ////
//// ////
/////////////////////////////////////////////////////////////////////
//
// $Date: 2004/4/01 $
// $Revision: 1.0 $





















































































































































































































































































































































































































































































//// AES Test Bench ////
//// ////
//// ////












//// This source file may be used and distributed without ////
//// restriction provided that this copyright statement is not ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
//// ////
//// THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY ////
//// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ////
//// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS ////
//// FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR ////
//// OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, ////
//// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ////
//// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE ////
//// GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR ////
//// BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF ////
//// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ////
//// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ////
//// OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE ////





// $Id: test_bench_top.v,v 1.2 2002/11/12 16:10:12 rudi Exp $
//
// $Date: 2002/11/12 16:10:12 $
// $Revision: 1.2 $
// $Author: rudi $
// $Locker: $
// $State: Exp $
//
// Change History:
// $Log: test_bench_top.v,v $
// Revision 1.2 2002/11/12 16:10:12 rudi
//
// Improved test bench, added missing timescale file.
//

















wire [127:0] key, plain,
wire [127:0] text_in;









$display("* AES Test bench ...");
























































































































































































































































































































@(posedge clk);
while(!done) @(posedge clk);
//$display("INFO: (a) Vector %0d: Expected %x, Got %x %t", n, ciph, text_out, $time);




$display("ERROR: (a) Vector %0d mismatch. Expected %x, Got %x",
n, ciph, text_out);
error_cnt = error_cnt + 1;
while(!done2) @(posedge clk);
//$display("INFO: (b) Vector %0d: Expected %x, Got %x", n, plain, text_out2);
if(text_out2 != plain | (|text_out2)==1'bx)
begin
$display("ERROR: (b) Vector %0d mismatch. Expected %x, Got %x",
n, plain, text_out2);











assign tmp = tv[n];
assign key = kld ? tmp[383:256] : 128'hx;
assign text_in = kld ? tmp[255:128] : 128'hx;
assign plain = tmp[255:128];
assign ciph = tmp[127:0];
always #5 clk = ~clk;
aes_cipher_top u0(
.clk( clk ),






















Appendix 19 Synthesis Report of Original Design
Final Results
Top Level Output File Name : aes_ori64_top.edn
Output Format : EDIF
crit : Area
Users Target Library File Name : Virtex
Keep Hierarchy : NO
Macro Generator : Macro+
Macro Statistics
# RAMs : 1
1408-bit dual-port RAM : 1
# Registers : 99
4-bit register : 2
1-bit register : 97
# Multiplexers : 1
64-bit 4-to-1 multiplexer : 1
# Adders/Subtractors : 8




























# Clock Buffers : 1
# BUFGP : 1




NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.




Minimum period: 23.042ns (Maximum Frequency: 43.399MHz)
Minimum input arrival time before clock: 9.007ns
Maximum output required time before clock: 6.887ns
Maximum combinational path delay: No path found
Timing Detail:
All values displayed in nanoseconds (ns)
Path from Clock 'clk' rising to Clock 'clk' rising : 23.042ns
(Slack: -23.042ns)
                       Gate    Net
Cell:in->out   fanout  Delay   Delay  Logical Name
FD:C->Q            59  1.065   4.635  u1_sa20_5_1
LUT3:I2->O          6  0.573   1.665  u1_us22_I_SF69
LUT4:I3->O          1  0.573   0.000  u1_us22_I_203_LUT_11_F
MUXF5:I0->O         1  0.436   1.035  u1_us22_I_203_LUT_11
LUT4:I3->O          1  0.573   1.035  u1_us22_I_199_LUT_136
LUT3:I2->O          1  0.573   0.000  u1_us22_I_d_6_F
MUXF5:I0->O         1  0.436   1.035  u1_us22_I_d_6
LUT2:I0->O         21  0.573   2.925  u1_I_sa22_ark_6
LUT2:I1->O          2  0.573   1.206  u1_I_n0224_4
LUT3:I1->O          1  0.573   1.035  u1_I351_I_Result_4
LUT4:I0->O          1  0.573   1.035  u1_I380_I_Result
LUT4:I3->O          2  0.573   0.000  u1_I_n0008_4
FD:D                   0.342          u1_sa32_4
Total                  23.042ns
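The reported minimum period is the sum of the gate and net delays along this register-to-register path, and the maximum frequency is its reciprocal. The short Python check below reproduces both figures from the delay values listed in the table; the variable names are illustrative only.

```python
# (gate delay, net delay) in ns for each cell on the critical path,
# in the order listed in the timing detail; the final FD:D entry has
# a setup time only, so its net delay is taken as 0.
PATH_NS = [
    (1.065, 4.635), (0.573, 1.665), (0.573, 0.000), (0.436, 1.035),
    (0.573, 1.035), (0.573, 0.000), (0.436, 1.035), (0.573, 2.925),
    (0.573, 1.206), (0.573, 1.035), (0.573, 1.035), (0.573, 0.000),
    (0.342, 0.000),
]


def min_period_ns(path):
    """Sum of all gate and net delays, rounded to the report's precision."""
    return round(sum(gate + net for gate, net in path), 3)


def max_frequency_mhz(period_ns):
    """Reciprocal of the period, converted from ns to MHz."""
    return round(1000.0 / period_ns, 3)


if __name__ == "__main__":
    period = min_period_ns(PATH_NS)
    print(period, "ns ->", max_frequency_mhz(period), "MHz")
```

The net delays account for more than half of the total, which is worth keeping in mind because these are pre-layout estimates.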
Path from Port 'start' to Clock 'clk' rising : 9.007ns
(Slack: -9.007ns)
Gate Net














Path from Clock 'clk' rising to Port 'text_out_0' : 6.887ns
(Slack: -6.887ns)
Gate Net








text_out_0_OBUF
Appendix 20 Map Report of Original Design
Xilinx Mapping Report File for Design 'aes_ori64_top'
Copyright (c) 1995-2000 Xilinx, Inc. All rights reserved.
Design Information
Command Line : map -p V300E-BG352-8 -cm area -gm exact -k 4 -c 100 -tx off
aes_ori64_top.ngd
Target Device : xv300e
Target Package : bg352
Target Speed : -8
Mapper Version : virtexe -- D.22
Mapped Date : Wed Apr 07 11:08:27 2004
Design Summary
Number of errors:                   1
Number of warnings:                 2
Number of Slices:                   6,979 out of 3,072   227%
Number of Slices containing
   unrelated logic:                    65 out of 6,979     1%
Total Number Slice Registers:       1,760 out of 6,144    28%
   Number used as Flip Flops:       1,752
   Number used as Latches:              8
Total Number 4 input LUTs:         13,415 out of 6,144   218%
   Number used as LUTs:            13,156
   Number used as a route-thru:         3
   Number used for Dual Port RAMs:    256
   (Two LUTs used per Dual Port RAM)
Number of bonded IOBs:                196 out of   260    75%
Number of GCLKs:                        1 out of     4    25%
Number of GCLKIOBs:                     1 out of     4    25%
Total equivalent gate count for design: 114,785
Additional JTAG gate count for IOBs: 9,456
Appendix 21
Final Results
Top Level Output File Name : aes_new64_top.edn
Output Format : EDIF
crit : Area




































































Synthesis Report of New Design
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.




Minimum period: 39.333ns (Maximum Frequency: 25.424MHz)
Minimum input arrival time before clock: 9.727ns
Maximum output required time before clock: 6.887ns
Maximum combinational path delay: No path found
Timing Detail:
All values displayed in nanoseconds (ns)
Path from Clock 'clk' rising to Clock 'clk' rising : 39.333ns
(Slack: -39.333ns)
                       Gate    Net
Cell:in->out   fanout  Delay   Delay  Logical Name
FDE:C->Q           99  1.065   6.435  option_buf_2
LUT3:I1->O          5  0.573   1.566  u0_I_in_s03_6
LUT4:I2->O          2  0.573   1.206  u0_us03_I_5_LUT_210
LUT3:I2->O         77  0.573   5.445  u0_us03_I_a_in_4
LUT4:I0->O          1  0.573   1.035  u0_us03_m0_I_157_LUT_218
LUT2:I1->O          1  0.573   1.035  u0_us03_m0_I_156_LUT_6
LUT4:I1->O          1  0.573   1.035  u0_us03_m0_I_155_LUT_109
LUT4:I3->O          1  0.573   1.035  u0_us03_m0_I_146_LUT_109
LUT4:I3->O          4  0.573   1.440  u0_us03_m0_I_g_4
LUT4:I3->O          2  0.573   1.206  u0_us03_I_n0035
LUT3:I2->O          9  0.573   1.908  u0_us03_I_d_7
LUT2:I0->O         13  0.573   2.250  u0_I_Isa03_ark_7
LUT2:I1->O          6  0.573   1.665  u0_I_inv_mix_col_4_pmul_e_13_xtime_177_xtime_1
LUT2:I1->O          1  0.573   1.035  u0_I728_I_Xol0
LUT4:I2->O          1  0.573   1.035  u0_I760_I_Result
LUT4:I0->O          1  0.573   0.000  u0_I_n0155_2
FD:D                   0.342          u0_Isa03_2
Total                  39.333ns
Path from Port 'start' to Clock 'clk' rising : 9.727ns
(Slack: -9.727ns)
                       Gate    Net
Cell:in->out   fanout  Delay   Delay  Logical Name
IBUF:I->O          19  0.768   2.790  start_IBUF
LUT4:I2->O         64  0.573   4.860  I_n0003
FDE:CE                 0.736          text_in_buf_59
Total                  9.727ns
Path from Clock 'clk' rising to Port 'text_out_0' : 6.887ns
(Slack: -6.887ns)
Gate Net
Cell:in->out fanout Delay Delay Logical Name




Appendix 22 Map Report of New Design
Xilinx Mapping Report File for Design 'aes_new64_top'
Copyright (c) 1995-2000 Xilinx, Inc. All rights reserved.
Design Information











Wed Apr 07 11:43:50 2004
Design Summary
Number of errors:                   1
Number of warnings:                 1
Number of Slices:                   4,382 out of 3,072   142%
Number of Slices containing
   unrelated logic:                    68 out of 4,382     1%
Total Number Slice Registers:       1,445 out of 6,144    23%
   Number used as Flip Flops:       1,437
   Number used as Latches:              8
Total Number 4 input LUTs:          8,341 out of 6,144   135%
   Number used as LUTs:             8,082
   Number used as a route-thru:         3
   Number used for Dual Port RAMs:    256
   (Two LUTs used per Dual Port RAM)
Number of bonded IOBs:                196 out of   260    75%
Number of GCLKs:                        1 out of     4    25%
Number of GCLKIOBs:                     1 out of     4    25%
Total equivalent gate count for design: 78,977
Additional JTAG gate count for IOBs: 9,456
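The figures in the two map reports and the two synthesis timing estimates can be compared directly. The Python sketch below computes the relative change for each resource using only numbers taken from Appendices 19 to 22; the dictionary keys and the helper name `reduction_pct` are illustrative.

```python
# Values copied from the map and synthesis reports of the two designs.
ORIGINAL = {"slices": 6979, "luts": 13415, "registers": 1760,
            "gates": 114785, "period_ns": 23.042}
NEW = {"slices": 4382, "luts": 8341, "registers": 1445,
       "gates": 78977, "period_ns": 39.333}
DEVICE_SLICES = 3072  # slices available in the target device


def reduction_pct(original, new):
    """Percentage by which `new` is smaller than `original` (one decimal place)."""
    return round(100.0 * (original - new) / original, 1)


if __name__ == "__main__":
    for key in ("slices", "luts", "registers", "gates"):
        print(key, reduction_pct(ORIGINAL[key], NEW[key]), "% smaller")
    slowdown = round(100.0 * (NEW["period_ns"] - ORIGINAL["period_ns"])
                     / ORIGINAL["period_ns"], 1)
    print("minimum period", slowdown, "% longer")
    print("fits device:", NEW["slices"] <= DEVICE_SLICES)
```

By these reports the new design trades a longer estimated clock period for a smaller area, although both mappings still exceed the slice count of the target device.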
