The hardware implementation of private-key block ciphers by Riaz, Mohsin




The Hardware Implementation of Private-key Block Ciphers
by
© Mohsin Riaz
A thesis submitted to the
School of Graduate Studies
in partial fulfillment of the requirements for
the degree of Master of Engineering
Faculty of Engineering and Applied Science
Memorial University of Newfoundland
July, 1999
St. John's Newfoundland Canada
Dedication
To my mom, Qaiser, whose selfless love and tenderness for me always make me strive for
better in life.
Abstract
The National Institute of Standards and Technology (NIST) in the U.S. has initiated
a process to develop a Federal Information Processing Standard (FIPS) for an Advanced
Encryption Standard (AES) [1], to become the standard for private-key block encryption.
The new encryption algorithm will be based on a 128-bit block size and the key size can be
128, 192, or 256 bits. AES will be a replacement for the Data Encryption Standard (DES) [2],
which is based on a 64-bit block size and has a 56-bit key. In this regard, the agency has
accepted candidate algorithm nominations for AES.
One of the important evaluation criteria concerns the efficiency of the private-key block
cipher from the hardware implementation perspective. RC6 [3] and CAST-256 [4] are among
the fifteen candidate algorithms that have been accepted in the first round of the AES
development phase. This thesis investigates the efficiency of these two AES candidates from
the hardware implementation perspective with Field Programmable Gate Arrays (FPGAs)
as the target technology.
Our analysis and synthesis studies of both the ciphers suggest it would be desirable for
FPGA implementations to have a simpler cipher design that makes use of simpler operations
that not only possess good cryptographic properties, but also make the overall cipher design
efficient from the hardware implementation perspective. As a result, the thesis also proposes
a new private-key block cipher design that is not only very efficient as far as its
implementation in FPGAs is concerned, but at the same time is secure against the two most potent
attacks that have been applied to block ciphers, namely, differential and linear cryptanalysis.
Acknowledgements
I wish to express my gratitude to my mentor and supervisor, Dr. Howard Heys, for his
valuable time, guidance, and financial support throughout the course of my research. His
constant encouragement, especially during hard times, is not only appreciated, but will always
be remembered.
A special thank you to my family and friends, who have always been there for me. To
Bala and Sidhu - better friends I could not ask for - a heart-felt appreciation for reminding
me what else there is to life.
I also appreciate the timely support provided to me by Mr. Michael Rendell while I was
battling along with the CAD tools.
Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Symbols and Abbreviations
1 Introduction
1.1 Motivation for the Research
1.2 Outline of Thesis
2 Review of Previous Research
2.1 Private Key Block Ciphers
2.1.1 Architectures
2.1.2 Popular Private-key Block Ciphers
2.1.3 Advanced Encryption Standard (AES)
2.2 Cryptographic Properties of a Block Cipher
2.2.1 Nonlinearity
2.2.2 Avalanche
2.2.3 Completeness
2.2.4 Strict Avalanche Criterion
2.2.5 Information Theory
2.2.6 Invertibility
2.3 Cryptanalysis of Private-key Block Ciphers
2.3.1 Brute Force Attack
2.3.2 Differential Cryptanalysis
2.3.3 Linear Cryptanalysis
2.3.4 Timing Attack
2.4 Security of RC6 and CAST-256 Encryption Algorithms
2.5 Conclusion
3 Hardware Environments for Cryptographic Applications
3.1 Hardware Encryption vs. Software Encryption
3.2 Field Programmable Gate Arrays (FPGAs)
3.2.1 Advantages of FPGAs over MPGAs
3.2.2 Disadvantages of FPGAs over MPGAs
3.3 SRAM-based FPGAs
3.3.1 Xilinx XC4000 Structure
3.3.1.1 XC4000 Structure
3.3.1.2 Programming Technology
3.3.1.3 Interconnections
3.3.1.4 Xilinx Logic
3.4 Cryptographic Algorithms: FPGAs vs. ASICs
3.5 Conclusion
4 Design of RC6 and CAST-256
4.1 The RC6 Cipher
4.2 The CAST-256 Cipher
4.3 Hardware Development Environment
4.4 Design of RC6
4.4.1 RC6 Datapath
4.4.1.1 Design of 32-bit Barrel Shifter
4.4.1.2 Design of 32-bit Adder
4.4.1.3 Design of 32-bit XOR
4.4.1.4 Design of 32 x 32 "Partial" Integer Multiplier
4.4.2 RC6 Control Path Design
4.4.2.1 RC6 Global State Machine
4.4.2.2 RC6 Data Flow Controller
4.4.3 Key Storage for RC6
4.4.4 Simulation and Synthesis Results
4.5 Design of CAST-256
4.5.1 CAST-256 Datapath
4.5.1.1 Generic Round Function
4.5.1.2 The S-Box Design
4.5.2 CAST-256 Control Path Design
4.5.2.1 CAST-256 Global State Machine
4.5.2.2 CAST-256 Data Flow Controller
4.5.3 The Key Storage Unit
4.5.4 Simulation and Synthesis Results
4.6 Comparison of RC6 and CAST-256 Ciphers
4.6.1 Some Recent Modifications
4.7 Conclusion
5 A New Private-key Block Cipher Design
5.1 The Proposed Cipher
5.2 FPGA Implementation of the Proposed Cipher
5.2.1 Datapath
5.2.2 Control Path Design
5.2.3 The Key Storage Unit
5.2.4 Simulation and Synthesis Results
5.3 Security Analysis
5.3.1 Selecting Nonlinear Round Functions
5.3.2 Linear Cryptanalysis
5.3.3 Differential Cryptanalysis
5.4 Conclusion
6 Conclusions and Future Work
6.1 Summary of the Thesis
6.2 Suggestions for Future Work
Bibliography
A A VHDL Description of RC6 Global State Machine
B Gate-Level Simulation of RC6 Cipher
C Gate-Level Simulation of CAST-256 Cipher
D Gate-Level Simulation of Fast Hardware Cipher (FHC)
List of Figures
2.1 A General Cryptosystem
2.2 Private and Public Key Cryptosystems
2.3 A Basic Substitution-Permutation Network
2.4 A Basic Feistel Structure
3.1 A Simple FPGA Taxonomy
3.2 A Xilinx XC4000 Structure
3.3 Pass Transistor Control Technique
3.4 Multiplexer Control Technique
3.5 A Lookup Table Implementation
3.6 A Xilinx XC4000 Switch Matrix
3.7 Simplified Logic Schematic of a Xilinx CLB
3.8 Simplified Block Diagram of Xilinx IOB
4.1 Encryption with RC6-w/r/b
4.2 Encryption with CAST-256
4.3 Realization of RC6 Encryption in Hardware
4.4 Functional Representation of RC6 Datapath
4.5 Configurable Logic Block Schematic of the 32-bit Carry Ripple Adder
4.6 A High Level Organization of Wallace Tree Multiplier
4.7 A Component Interface of RC6 Encryptor
4.8 A Component Interface Representation of RC6 Global State Machine
4.9 CAST-256 Encryption in Hardware
4.10 The Generic Round Function Module
4.11 CAST-256 State Machine Unit
4.12 CAST-256 Data Flow Controller
5.1 Encryption with Fast Hardware Cipher (FHC)
5.2 Realization of Fast Hardware Cipher in Hardware
B.1 Gate-Level Simulation of RC6 Cipher Design
B.2 Gate-Level Simulation of RC6 Cipher Design Cont'd
B.3 Gate-Level Simulation of RC6 Cipher Design Cont'd
B.4 Gate-Level Simulation of RC6 Cipher Design Cont'd
B.5 Gate-Level Simulation of RC6 Cipher Design Cont'd
B.6 Gate-Level Simulation of RC6 Cipher Design Cont'd
B.7 Gate-Level Simulation of RC6 Cipher Design Cont'd
B.8 Gate-Level Simulation of RC6 Cipher Design Cont'd
B.9 Gate-Level Simulation of RC6 Cipher Design Cont'd
C.1 Gate-Level Simulation of CAST-256 Cipher Design
C.2 Gate-Level Simulation of CAST-256 Cipher Design Cont'd
C.3 Gate-Level Simulation of CAST-256 Cipher Design Cont'd
C.4 Gate-Level Simulation of CAST-256 Cipher Design Cont'd
C.5 Gate-Level Simulation of CAST-256 Cipher Design Cont'd
C.6 Gate-Level Simulation of CAST-256 Cipher Design Cont'd
C.7 Gate-Level Simulation of CAST-256 Cipher Design Cont'd
C.8 Gate-Level Simulation of CAST-256 Cipher Design Cont'd
D.1 Gate-Level Simulation of FHC Design
D.2 Gate-Level Simulation of FHC Design Cont'd
D.3 Gate-Level Simulation of FHC Design Cont'd
D.4 Gate-Level Simulation of FHC Design Cont'd
D.5 Gate-Level Simulation of FHC Design Cont'd
D.6 Gate-Level Simulation of FHC Design Cont'd
D.7 Gate-Level Simulation of FHC Design Cont'd
D.8 Gate-Level Simulation of FHC Design Cont'd
List of Tables
5.1 Probabilities of selecting k nonlinear S-boxes
5.2 Linear Cryptanalysis Results for Different Values of NL
List of Symbols and Abbreviations
M Original plaintext message
C Encrypted ciphertext message
f Round function
R Total number of rounds of encryption
m Number of inputs to an S-box
n Number of outputs of an S-box
Number of S-boxes in a round function
Kr_i 5-bit round subkey for the ith round
Km_i 32-bit masking key for the ith round
N(f) Nonlinearity of an m-bit boolean function
N(S) Nonlinearity of an S-box
K Primary key of encryption in bytes
NL Nonlinearity of S-box function
Nt Total number of known plaintexts needed for linear cryptanalysis to be successful
Ne Total number of chosen plaintexts needed for differential cryptanalysis to be successful
AES Advanced Encryption Standard
DES Data Encryption Standard
NIST National Institute of Standards and Technology
FIPS Federal Information Processing Standard
FPGA Field Programmable Gate Arrays
MPGA Mask Programmable Gate Array
VPN Virtual Private Network
LAN Local Area Network
WAN Wide Area Network
NBS National Bureau of Standards
SPN Substitution-Permutation Network
SAC Strict Avalanche Criterion
XOR Exclusive-OR
LUT Lookup Table
SRAM Static RAM
ASIC Application Specific Integrated Circuits
CLB Configurable Logic Block
IOB Input/Output Block
FHC Fast Hardware Cipher
CMC Canadian Microelectronics Corporation
CLA Carry Lookahead Adder
BCLA Block Carry Lookahead Adder
CRA Carry Ripple Adder
CSEA Carry Select Adder
CSA Carry Save Adder
CPA Carry Propagate Adder
T""" Total delay for the 32-bit carry propagate adder
T_enc Time to achieve one encryption
T_clk One clock period in nanoseconds
One logic level delay in nanoseconds
po Probability of best r-round iterative characteristic
p_i Probability of the output XOR, given the input XOR in round i
piJ Probability that the linear approximation holds true
p_lin Probability of one linear or affine function
p_k Probability that k out of a total of eight S-boxes are nonlinear
PI Probability of the best linear expression for an r-round cipher
Chapter 1
Introduction
Civilization is the progress toward a society of privacy. The savage's whole existence
is public, ruled by the laws of his tribe. Civilization is the process of setting
man free from men. Ayn Rand, The Fountainhead (1943)
In recent years, there has been a great need for much improved techniques of securely
transmitting and storing information. From electronic mail to cellular communications,
secure web access to smart cards and electronic commerce, wireless LAN and WAN computer
networks to virtual private networks (VPNs) - these and other new information-based
applications will have far reaching consequences, affecting the way business is done as well as
private communication and social interaction. As this happens, security aspects of
communication systems are of growing commercial and public interest. Unfortunately, these
aspects have been widely underestimated or ignored in the past. Today, however, there is
high demand for expertise and high-quality products in the field of information security and
cryptography.
Until recently, encryption products were generally in the form of specialized hardware.
These encryption/decryption devices plugged into the communications line and encrypted
all the data going across the line. Although software encryption is becoming more prevalent
today, hardware is still the embodiment of choice for many military and commercial
applications. As an industry trend, many companies in North America as well as Europe
are developing cryptographic hardware for applications such as secure voice, fax and data
networks, secure VPN cryptographic accelerators, protocol sensitive encryption for wide area
networks, and DSP voice ciphering.
Speed and security are also important issues that play in the favour of the hardware
implementation of encryption devices. Encryption algorithms involve many complex
operations on the message or plaintext bits. Often these are not the type of operations that
are incorporated into our typical desktop computers. The most widely accepted private-key
block cipher, the Data Encryption Standard (DES) [2], introduced in 1977, runs inefficiently
on general purpose processors. Although some cryptographers have tried to shape their
algorithms to suit software implementations, specialized hardware such as an encryption
chip will likely emerge as the winner in efficiency. Another key factor that favours the
hardware implementation of a block cipher is security. An encryption algorithm being run on a
generalized computing machine has no physical protection. On the other hand, hardware
encryption devices can be securely encapsulated to prevent physical tampering. Other factors that suggest
a hardware implementation include cost, ease of installation, and lower power consumption.
The National Institute of Standards and Technology (NIST) in the U.S. has initiated
a process to develop a Federal Information Processing Standard (FIPS) for an Advanced
Encryption Standard (AES) [1]. The new encryption standard is based on a 128-bit block
size and a 128, 192, or 256-bit key size. This standard will be a replacement for DES. This
thesis examines the hardware implementation of two private-key block ciphers, RC6 [3] and
CAST-256 [4], in Field Programmable Gate Arrays (FPGAs). Both RC6 and CAST-256 are
among the fifteen candidate algorithms that have been accepted in the first round of the AES
development phase. The thesis also proposes a simpler private-key block cipher design that
is very efficient in terms of hardware implementation in FPGAs.
1.1 Motivation for the Research
The Data Encryption Standard (DES) [2], a private-key block cipher, is the most widely
used cryptosystem in the world. DES was developed by IBM, as a modification of an earlier
cryptosystem known as LUCIFER [5]. DES was first published in the Federal Register in
1975. DES was adopted as a standard for "unclassified" applications in 1977 by the National
Bureau of Standards (NBS).
DES has been a target of criticism since its inception in 1977. One objection to DES
concerns the mystery surrounding the design of its S-boxes, which, being the only nonlinear
component of the cryptosystem, are vital to its security. However, the most pertinent
criticism of DES is that the size of the key, 56 bits, is too small to be really secure. After
twenty-two years, DES is nearing its demise and is theoretically breakable by two powerful
cryptanalytic attacks, differential and linear cryptanalysis [6, 7].
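To give a feel for why the key size matters, the following is a minimal sketch of the arithmetic behind an exhaustive key search. The search rate of 10^9 keys per second is purely a hypothetical assumption chosen for illustration, not a figure from this thesis; on average, half of the key space must be searched before the correct key is found.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def average_search_years(key_bits, keys_per_second):
    """Expected years to find a key by exhaustive search.

    On average half of the 2**key_bits candidate keys are tried.
    keys_per_second is a hypothetical attacker speed (an assumption).
    """
    average_trials = 2 ** (key_bits - 1)
    return average_trials / keys_per_second / SECONDS_PER_YEAR

if __name__ == "__main__":
    rate = 1e9  # assumed, illustrative only
    for bits in (56, 128, 192, 256):
        print(f"{bits:3d}-bit key: about {average_search_years(bits, rate):.3g} years")
```

Under this assumed rate, a 56-bit key falls in roughly a year, whereas each additional key bit doubles the expected effort, which is why the 128-bit minimum key size of AES puts exhaustive search far out of reach.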
The National Institute of Standards and Technology (NIST) has initiated a process to
develop a Federal Information Processing Standard (FIPS) for an Advanced Encryption
Standard (AES) [1] specifying an encryption algorithm for the twenty-first century as a
replacement of DES. In this regard, the agency has announced a request for candidate
algorithm nominations of AES. One of the important evaluation criteria concerns the efficiency
of the private-key block cipher from the hardware implementation perspective. RC6 [3] and
CAST-256 [4] are among the fifteen candidate algorithms that have been presented to the
first round of the AES development phase. Both ciphers are modifications of earlier
generation ciphers (RC5 [8] and CAST-128 [9]) based on smaller (64-bit) block sizes. Like most
proposed private-key block ciphers, RC6 and CAST-256 are clearly designed for efficient
implementation in software.
This thesis discusses the issues that affect the hardware implementation of the two AES
candidates, RC6 and CAST-256, in FPGAs. The two major aspects of speed and hardware
complexity associated with the two ciphers are explored and a comparative analysis of the
two ciphers in terms of implementation in FPGAs is presented.
As the result of our study of these two ciphers, we also propose a new private-key block
cipher, specifically targeted for hardware implementation. This cipher is based on simpler
operations that not only possess good cryptographic properties, but also make the overall
cipher design efficient for implementation in custom architectures such as FPGAs.
1.2 Outline of Thesis
The thesis is organized as follows:
• Chapter 2 presents a literature survey of the previous research that is relevant to our work.
• Chapter 3 examines the different issues pertaining to hardware implementation of
cryptographic algorithms in FPGAs.
• Chapter 4 examines the design of RC6 and CAST-256 encryptions and their
implementation in target FPGA devices.
• Chapter 5 presents the design of a new private-key block cipher based on simpler
cryptographic operations and its implementation in FPGAs. The security of the proposed cipher
against linear and differential cryptanalysis is also examined in this chapter.
• Chapter 6 summarizes the results of the thesis and presents certain suggestions for
future work.
Chapter 2
Review of Previous Research
Security of information stems from the need for private transmission of both military and
public messages. This need is as old as civilization itself. The ancient Spartans, for
instance, enciphered their military messages. The first secure communication channels were
very simple and their reliability depended on the physical security of messengers. Due to
the invention of computer systems and the pervasive intrusion of computer networks, the
spectrum of protection issues has been stretched. Many protection issues of modern day
computer systems and networks are strictly related to the protection of communication channels.
Due to the natural characteristics of any channel, we have a communication medium that
is accessible to eavesdroppers, so physical security is meaningless. The only way to enforce
security in communication channels is by the application of cryptography.
The term cryptology originates from Greek roots meaning "hidden" and "word" and is
the umbrella term used to describe the entire field of secret communications. Cryptology
further branches into two: cryptography and cryptanalysis. Cryptography is the art and
science of transforming information into an intermediate form which secures that information
while in storage or in transit. As opposed to steganography, which seeks to hide the
existence of any message, cryptography seeks to render a message unintelligible even when the
message is completely exposed. Cryptanalysis, on the other hand, is the aspect of cryptology
which concerns the strength analysis of a cryptographic system or cryptosystem, and the
penetration or breaking of a cryptosystem.
A cryptosystem is any system that employs methods of cryptography to encrypt a
message. Encryption is a process that transforms the original message or information, referred to
as the plaintext, into an encrypted message known as the ciphertext. This ciphertext is then
transmitted over an insecure channel. When this ciphertext reaches the receiver, a reverse
transformation process, referred to as decryption, is performed to recover the original
plaintext from the corresponding ciphertext. This encryption/decryption scheme is also referred
to as a cipher [10]. Figure 2.1 shows the encryption/decryption process in the context of an
insecure communications channel. A block cipher is a function that maps N-bit plaintext
blocks to N-bit ciphertext blocks, where N is the block length, which is 64 bits in the case
of DES and 128 bits in the case of AES.
In 1948 Shannon [11] proposed two principles that present a sound theoretical basis
for cryptosystems with good security, namely confusion and diffusion. Confusion employs
substitution to hide the plaintext and the key. Diffusion spreads the confusion effect across
the entire ciphertext, thereby masking any statistical properties of the plaintext.
The field of cryptography is divided into two main branches: private-key cryptography
and public-key cryptography. The two types of cryptosystems are shown in Figure 2.2. In
private-key cryptosystems, the same secret key is used both for encryption and decryption.
Assuming the algorithm to be secure enough, the security of the cryptosystem is based on
keeping the key secret. In contrast, every user in a public-key cryptosystem possesses two
keys. One key is public and is known to everyone. The other is private and is only known to
the person possessing it and no one else. Public-key cryptography has the advantage that
a secure channel is not required to exchange keys. However, its disadvantage is that it is
orders of magnitude slower to encrypt as compared to private-key cryptosystems. Most
cryptosystems use a combination of public and private key cryptography. In a typical scenario,
a public key scheme is first used to exchange the secret key that is then used for encrypting
or decrypting the messages using a private key encryption algorithm.
Figure 2.1: A General Cryptosystem
2.1 Private Key Block Ciphers
The security of a private key block cipher depends on the communicating parties sharing
the same common secret key. If this key is compromised, then the encrypted messages will
be easily decrypted using the known key. The cipher is called a block cipher because the
plaintext is broken into fixed length blocks before being encrypted.
Figure 2.2: Private and Public Key Cryptosystems
2.1.1 Architectures
The first practical private key block cipher designs based on Shannon's principles of
confusion and diffusion were laid down by Feistel [5] and Feistel, Notz and Smith [12]. These
design frameworks are referred to as Substitution-Permutation Networks (SPNs) and Feistel
Networks or Feistel ciphers.
The SPN cryptographic network consists of a number of stages or rounds of
substitution-permutation layers. Each substitution-permutation layer (SP layer) is made up of several
smaller sub-block substitutions (known as S-boxes) followed by a large bit position
permutation operation (known as P-box). The former has the effect of Shannon's confusion, while
the latter operation implements Shannon's concept of diffusion. A primary key is used to
generate all subkeys implemented in each substitution-permutation layer according to a key
schedule scheme. An m x n S-box has a nonlinear mapping from an m bit input to an n bit
output pattern. The S-boxes for SPNs, however, must have the same number of inputs and
outputs. Such S-boxes are also known as symmetric S-boxes. Moreover, these mappings should
be bijective, meaning that they are a one-to-one mapping and are invertible. Invertibility is
needed for the purpose of decryption. Two stages of S-boxes in an SP structure based on
.. x .. S-boxes are shown in Figure 2.3.
Figure 2.3: A Basic Substitution-Permutation Network
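The structure of a single SP layer can be sketched in a few lines of code. This is a minimal illustration under stated assumptions rather than a cipher from this thesis: the 16-bit block, the 4 x 4 S-box table, the transposition-style bit permutation, and the XOR mixing of the subkey are all choices made only for the example. It shows why bijective S-boxes matter: the layer can be undone by applying the inverse permutation and the inverse S-box.

```python
# Illustrative 4 x 4 bijective S-box (an assumption for this sketch).
SBOX = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]
# The inverse table exists only because SBOX is a bijection.
INV_SBOX = [SBOX.index(value) for value in range(16)]

def substitute(block, table):
    """Apply a 4 x 4 S-box to each of the four nibbles of a 16-bit block."""
    out = 0
    for j in range(4):
        nibble = (block >> (4 * j)) & 0xF
        out |= table[nibble] << (4 * j)
    return out

def permute(block):
    """Move bit i of S-box j to bit j of S-box i (this P-box is its own inverse)."""
    out = 0
    for position in range(16):
        j, i = divmod(position, 4)
        bit = (block >> position) & 1
        out |= bit << (4 * i + j)
    return out

def sp_layer(block, subkey):
    """One round: key mixing (assumed XOR), substitution, then permutation."""
    return permute(substitute(block ^ subkey, SBOX))

def inverse_sp_layer(block, subkey):
    """Undo sp_layer: inverse permutation, inverse substitution, key removal."""
    return substitute(permute(block), INV_SBOX) ^ subkey
```

For instance, `inverse_sp_layer(sp_layer(0x1234, 0xBEEF), 0xBEEF)` returns `0x1234`. A non-bijective S-box would make `INV_SBOX` impossible to build, which is the invertibility requirement stated above.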
Another type of private key block cipher design is based on a Feistel network architecture
proposed by Feistel, Notz, and Smith [12]. In a Feistel architecture, as shown in Figure 2.4,
Shannon's mixing transformation can be achieved using S-boxes and permutations inside a
round function f. But these operations are performed on only half the block at a time. For
each round, the right half is fed into a round function f whose output is bitwise XORed with
the left half. This is followed by an immediate swapping of the two halves. After a total of
R rounds, the two halves are concatenated to constitute the ciphertext block. The complete
encryption process can be visualized as an iteration of the following operation:
\[ R_{i+1} = f(R_i, K_i) \oplus L_i \tag{2.1} \]
where $R_i$ and $L_i$ are the right and left halves, respectively, of the block for the ith round.
Also, $K_i$ represents the ith round subkey.
Decryption is similar to encryption, with the only exception that the subkeys are used
in reverse order. The round function is the most critical component of the cipher design as
it introduces an element of randomness to the plaintext. It is basically the structure of the
round function that distinguishes between different Feistel ciphers. Feistel ciphers, unlike
SPNs, can have asymmetric S-boxes, i.e., m ≠ n.
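The iteration in equation (2.1) and the claim that decryption only reverses the subkey order can be checked with a small sketch. The round function `toy_round_function`, the 64-bit block split into two 32-bit halves, and the subkey values are hypothetical placeholders chosen for illustration; they are not the round function of any cipher studied here. The sketch also assumes that the swap after the last round is undone, which is the usual convention that lets one routine serve for both directions.

```python
MASK32 = 0xFFFFFFFF

def toy_round_function(right, subkey):
    """Hypothetical stand-in for f; it does not need to be invertible."""
    x = (right + subkey) & MASK32
    x = ((x << 5) | (x >> 27)) & MASK32  # rotate left by 5 bits
    return x ^ (x >> 3)

def feistel(block, subkeys, f=toy_round_function):
    """Run the Feistel iteration over a 64-bit block with the given subkeys."""
    left = (block >> 32) & MASK32
    right = block & MASK32
    for subkey in subkeys:
        # New right half is f(R_i, K_i) XOR L_i; new left half is the old right half.
        left, right = right, f(right, subkey) ^ left
    # Undo the final swap so the same routine can decrypt (assumed convention).
    return (right << 32) | left

def encrypt(block, subkeys):
    return feistel(block, subkeys)

def decrypt(block, subkeys):
    # Same structure, subkeys applied in reverse order.
    return feistel(block, list(reversed(subkeys)))
```

With any list of subkeys, `decrypt(encrypt(p, keys), keys)` returns `p`, even though `toy_round_function` itself is not invertible. This is the property that frees Feistel ciphers from the bijective S-box requirement of SPNs.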
Most of the existing commercial cryptographic implementations use DES for their private
key algorithms. The Data Encryption Standard, first introduced in 1977 as an encryption
standard for unclassified applications, is based on a 64-bit block size. The key size is 56
bits. The round function expands the 32-bit input into a 48-bit block using an expansion
table, followed by an XOR operation involving a 48-bit subkey generated by the key schedule
scheme and the 48-bit expanded block. The resultant 48 bits are then fed into eight 6 x 4
S-boxes. The output of the eight S-boxes goes through a final 32-bit permutation giving the
final 32-bit output of the round function.
The ciphers are typically keyed by applying subkey bits (derived for each round by the
key schedule) to the S-boxes employing either:
(i) selection keying: Here the key bits select the desired mapping for a particular S-box.
(ii) XOR keying: Here the key bits are XORed with the input bits before feeding into the
S-box.
Figure 2.4: A Basic Feistel Structure
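The two keying styles can be contrasted with a minimal sketch. The two 4 x 4 tables below and the function names are hypothetical, chosen only to make the difference visible; real ciphers use their own S-box definitions and key schedules.

```python
# Two illustrative 4 x 4 mappings (assumptions for this sketch).
SBOX_A = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]
SBOX_B = [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8]

def selection_keyed(x, key_bit):
    """Selection keying: the key bit chooses which mapping is applied."""
    table = SBOX_B if key_bit else SBOX_A
    return table[x & 0xF]

def xor_keyed(x, key_nibble):
    """XOR keying: the key is XORed with the input before the fixed S-box."""
    return SBOX_A[(x ^ key_nibble) & 0xF]
```

For the input 0, `selection_keyed(0, 0)` gives 14 and `selection_keyed(0, 1)` gives 0, because a different table is consulted. With XOR keying the table never changes: `xor_keyed(3, 3)` looks up entry 0 of `SBOX_A` and also gives 14.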
2.1.2 Popular Private-key Block Ciphers
Over the years, many private-key block ciphers have been proposed as potential replacements
for DES. The structure of these block ciphers may or may not be a Feistel network. An
introduction to some of the popular private-key block ciphers is presented here.
Blowfish [13] is an algorithm developed by Bruce Schneier. It is a block cipher with a
64-bit block size and variable length keys (up to 448 bits). It has gained a fair amount of
acceptance in a number of applications. No successful attacks are known against it. Blowfish
is used in a number of popular software packages, including Nautilus and PGPfone.
FEAL [14] is a 64-bit block cipher with a 64-bit key. It is basically a Feistel structure.
The 8 x 8 S-boxes in the round function execute XOR additions and byte rotations. The
algorithm is well suited for 8-bit microprocessors. However, the downside of this cipher
has to do with its resistance against differential cryptanalysis. It has been shown that the
algorithm with less than 8 rounds can be easily broken using differential cryptanalysis [15].
The cipher is resistant to this kind of attack only if the number of rounds exceeds 32 [16].
IDEA (International Data Encryption Algorithm) is an algorithm developed at ETH
Zurich in Switzerland [17]. It uses a 128 bit key, and it is generally considered to be very
secure. The block size is again 64 bits. The algorithm uses a mix of three different groups
of operations - bitwise XOR, integer additions and integer multiplications. It has already
been around for several years, and no practical attacks on it have been published despite the
number of attempts to analyze it. IDEA is patented in the United States and in most of
the European countries. The patent is held by Ascom-Tech. Non-commercial use of IDEA
is free.
RC5 [8] is a very efficient word-oriented secret-key block cipher. It is a parameterized
family of symmetric ciphers. It uses a variable word size, a variable-length secret key and
a variable number of encryption rounds. The architecture of this novel symmetric block
cipher does not fall into the realms of a typical SPN or Feistel cipher. This algorithm makes
use of data-dependent rotations. It also makes use of integer additions, subtractions and
bitwise XORs. RC5 has been shown to be very resistant against both linear and differential
cryptanalysis [18], although potentially susceptible to timing attacks [19].
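The role of data-dependent rotations can be illustrated with a short sketch of the RC5 round structure for a 32-bit word size, following the publicly documented RC5 description rather than anything specified in this chapter. The expanded key table passed as `s_table` is a hypothetical placeholder: the real RC5 key schedule is not reproduced here, so the sketch shows only how XOR, rotation by a data-dependent amount, and modular addition combine, and how subtraction and right rotation undo them.

```python
W = 32                  # word size in bits (RC5 allows other sizes)
MASK = (1 << W) - 1

def rotl(x, s):
    """Rotate a W-bit word left; only the low bits of s matter."""
    s %= W
    return ((x << s) | (x >> (W - s))) & MASK

def rotr(x, s):
    """Rotate a W-bit word right; only the low bits of s matter."""
    s %= W
    return ((x >> s) | (x << (W - s))) & MASK

def rc5_encrypt(a, b, s_table):
    """Encrypt the word pair (a, b); s_table holds 2 * (rounds + 1) subkey words."""
    rounds = len(s_table) // 2 - 1
    a = (a + s_table[0]) & MASK
    b = (b + s_table[1]) & MASK
    for i in range(1, rounds + 1):
        # The rotation amount is taken from the other data word.
        a = (rotl(a ^ b, b) + s_table[2 * i]) & MASK
        b = (rotl(b ^ a, a) + s_table[2 * i + 1]) & MASK
    return a, b

def rc5_decrypt(a, b, s_table):
    """Invert rc5_encrypt using subtraction and right rotation."""
    rounds = len(s_table) // 2 - 1
    for i in range(rounds, 0, -1):
        b = rotr((b - s_table[2 * i + 1]) & MASK, a) ^ a
        a = rotr((a - s_table[2 * i]) & MASK, b) ^ b
    b = (b - s_table[1]) & MASK
    a = (a - s_table[0]) & MASK
    return a, b

# Placeholder subkeys for 12 rounds; NOT produced by the RC5 key schedule.
DEMO_SUBKEYS = [(0x9E3779B9 * (i + 1)) & MASK for i in range(2 * (12 + 1))]
```

Because the rotation amount depends on the data itself, the time taken by a rotation may vary on hardware without a constant-time barrel shifter, which is one intuition for the timing-attack concern mentioned above.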
CAST-128 [9] is another private-key block cipher that is based on a 64-bit block size.
It uses a 128-bit primary encryption key. The algorithm uses six 8 x 32 S-boxes. The
strength of this algorithm has been shown to lie in the large size of the S-boxes [9]. Lee,
Heys, and Tavares [20] showed that the algorithm is resistant to both linear and differential
cryptanalysis.
2.1.3 Advanced Encryption Standard (AES)
As mentioned earlier, the National Institute of Standards and Technology (NIST) has
unveiled a process to develop a Federal Information Processing Standard (FIPS) for an
Advanced Encryption Standard (AES) [1]. The AES represents a specification for a private-key
block cipher as a replacement for DES. As a part of the AES process, a number of minimum
acceptability requirements have been drafted. These candidate algorithm evaluation criteria
include:
• AES shall be a symmetric private-key block cipher.
• The adopted standard shall be publicly defined.
• AES should be suitable both for hardware and software implementations.
• The key length for the AES may be increased as needed.
• Candidate algorithms that meet the above requirements will be judged on the basis of
the following factors:
1. Computational efficiency
2. Hardware complexity
3. Encryption speed
4. Software suitability
5. Memory requirements
6. Flexibility
7. Licensing requirements
8. Simplicity
Fifteen candidate algorithms have been presented to the first round of the AES development
phase. Information on AES candidates can be found in [1]. CAST-256 and RC6 are two
of the fifteen AES submissions and are investigated in this thesis. The emphasis of this
investigation is on the hardware implementation of these ciphers in FPGAs. Both ciphers
are strong candidates because they are modifications of their earlier versions (CAST-128 [9]
and RC5 [8]). Like most of these proposed ciphers, CAST-256 and RC6 are designed for
efficient implementations in software. But as one of the implementation requirements for an
AES candidate, the hardware efficiency of these algorithms has to be thoroughly investigated.
Detailed architectures of the CAST-256 and RC6 ciphers are presented in Chapter 4.
2.2 Cryptographic Properties of a Block Cipher
Since its adoption as a standard, DES has been the focus of most of the research in
private-key cryptography. Much of the effort had been directed towards cryptanalyzing DES or
investigating properties that might improve the overall security of the cipher. In this section,
different cryptographic properties that are vital to the security of a block cipher are presented.
2.2 .1 Nonlinearity
Nonlinearity is the most crucial feature in the design of private-key block ciphers. For
instance, if there exists a linear relationship (on a per bit or per block basis) between the
ciphertext output and the plaintext input, the cipher can be easily broken by reducing the
cipher to a system of linear equations. These linear equations can then be solved using a
small number of known plaintext-ciphertext pairs. Typically, an S-box is the only nonlinear
component of an SPN or a Feistel cipher. As such, the need to design highly nonlinear
S-boxes makes the difference between a more or a less secure cipher.
An m-bit affine boolean function g is defined [21] as
\[ g(X) = a_0 \oplus a_1 X_1 \oplus \cdots \oplus a_m X_m \tag{2.2} \]
where $X = [X_1, \ldots, X_m]$ is the m-bit binary input, $\oplus$ is the bitwise exclusive-or, and
$a_i \in \{0, 1\}$, $0 \le i \le m$. The Hamming distance between two m-bit boolean functions, f(X) and
g(X), is defined to be
\[ d(f, g) = \#\{X \in \{0, 1\}^m \mid f(X) \oplus g(X) = 1\} \tag{2.3} \]
where # denotes the number of m-bit binary inputs in the set.
The nonlinearity of an m-bit boolean function f is defined as
\[ NL(f) = \min_{g \in A} d(f, g) \tag{2.4} \]
where A is the set of all m-bit affine boolean functions. Since an m x n S-box has n output
bits, each of which is an m-bit boolean function, the nonlinearity of the S-box S is defined as
the minimum nonlinearity over all non-zero combinations of output bit boolean functions:
\[ NL(S) = \min_{c \ne 0} NL\Big( \bigoplus_{i=1}^{n} c_i f_i \Big), \quad c = [c_1, \ldots, c_n], \; c_i \in \{0, 1\} \tag{2.5} \]
where $f_i$ is the m-bit boolean function of the ith output bit of the S-box, and
$(c_i f_i)(X) = c_i (f_i(X))$ for all X.
Heys and Tavares [21] used random search and filtering against known weaknesses to find
highly nonlinear large S-boxes. They have used this technique in constructing S-boxes for
SPNs.
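For small S-boxes the nonlinearity definitions above can be evaluated by brute force. The
following Python sketch is illustrative only and is not the search procedure of [21]; the
function names are assumptions introduced here. It measures the distance from a boolean
function to every affine function, and then takes the minimum over all non-zero
combinations of the S-box output bits.

```python
def parity(x: int) -> int:
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def boolean_nonlinearity(truth_table, m):
    """Minimum Hamming distance from f (given as a truth table) to any m-bit affine function."""
    best = 2 ** m
    for a0 in (0, 1):                      # constant term a_0
        for mask in range(2 ** m):         # mask encodes the coefficients a_1..a_m
            dist = sum(truth_table[x] ^ a0 ^ parity(mask & x) for x in range(2 ** m))
            best = min(best, dist)
    return best


def sbox_nonlinearity(sbox, m, n):
    """Minimum nonlinearity over all non-zero XOR combinations of the n output bits."""
    best = 2 ** m
    for c in range(1, 2 ** n):             # c selects which output bits are combined
        table = [parity(c & sbox[x]) for x in range(2 ** m)]
        best = min(best, boolean_nonlinearity(table, m))
    return best
```

The cost grows as 2^(2m+n), so this exhaustive form is only practical for small m and n;
it is meant to make the definition concrete rather than to serve as a design tool.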
2.2.2 Avalanche
Feistel, Notz, and Smith [12] first described the concept of avalanche as an important
cryptographic property in the design of a block cipher. The avalanche property is satisfied only
when, on average, half the output block bits vary when one input bit changes.
2.2.3 Completeness
Completeness was a concept introduced by Kam and Davida [22]. The completeness criterion
is satisfied if all output bits depend on all input bits. Kam and Davida proposed a class
of permutations in a basic SPN which ensures the completeness of an SPN, provided each
S-box is complete. Brown and Seberry [23] found that DES is complete after four to five
rounds with a high probability.
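The completeness criterion can be tested exhaustively for a small mapping. The sketch below
is a minimal illustration under the assumption that the mapping is available as a Python
callable on integers; the name is_complete is hypothetical. It records, for every input bit,
which output bits change at least once when that input bit is inverted.

```python
def is_complete(func, n_in, n_out):
    """Return True if every output bit depends on every input bit."""
    depends = [[False] * n_in for _ in range(n_out)]
    for x in range(2 ** n_in):
        y = func(x)
        for i in range(n_in):
            diff = y ^ func(x ^ (1 << i))   # output bits that change when input bit i flips
            for j in range(n_out):
                if (diff >> j) & 1:
                    depends[j][i] = True
    return all(all(row) for row in depends)
```

An identity mapping fails the test, since each output bit depends on a single input bit
only, whereas a mapping whose output bits all equal the parity of the input passes it.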
2.2.4 Strict Avalanche Criterion
Webster and Tavares [24] used the concepts of completeness and avalanche to come up with a
new cryptographic property that concerns not only the individual S-boxes but also complete
cryptosystems. This property is known as the Strict Avalanche Criterion or SAC. This
property states that for every input bit, inverting the bit causes each output bit to vary with
a probability of one half over all possible input vectors. Higher order properties of SAC have
been presented by Forré [25], Adams [26] and Preneel et al. [27].
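The SAC can be checked directly from its definition for a small S-box. The following sketch
is illustrative and uses hypothetical function names; it computes, for each input bit i and
output bit j, the fraction of input vectors for which inverting bit i changes output bit j.

```python
def sac_matrix(sbox, m, n):
    """p[i][j] = probability that output bit j changes when input bit i is inverted."""
    size = 2 ** m
    return [
        [sum(((sbox[x] ^ sbox[x ^ (1 << i)]) >> j) & 1 for x in range(size)) / size
         for j in range(n)]
        for i in range(m)
    ]


def satisfies_sac(sbox, m, n):
    """True only if every entry of the SAC matrix equals exactly one half."""
    return all(p == 0.5 for row in sac_matrix(sbox, m, n) for p in row)
```

For example, the single-output function x0 AND x1 satisfies the criterion, while the
identity mapping does not, because each of its output bits changes with probability one
or zero.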
2.2.5 Information Theory
Many of the contributions to the field of cryptography come from the information theory
concepts introduced by Shannon [11]. In a cipher that has perfect secrecy the plaintext is
statistically independent of the ciphertext. This means that even with unlimited time
and computational resources at our disposal, we cannot guess the plaintext, given knowledge
of the ciphertext. For a private-key cipher to be perfectly secure, the uncertainty in the key
must be at least as large as that of the plaintext.
Dawson and Tavares [28] further investigated the work of Forré [29] in using information
theory to design S-boxes. They proposed minimizing the mutual information between a
subset of output bits and any subset of input and/or output bits in the design of S-boxes
for SPNs and Feistel ciphers.
2.2.6 Invertibility
An n x n S-box is said to be invertible if it is a bijective mapping. Adams and Tavares [30]
proposed a method of constructing S-boxes that satisfy (i) bijection, (ii) minimum
nonlinearity, (iii) SAC, and (iv) output bit independence by combining 0-1 balanced
boolean functions. However, O'Connor [31] was of the opinion that this technique becomes
impractical as n increases.
2.3 Cryptanalysis of Private-key Block Ciphers
The purpose of cryptanalysis is to recover a secret primary key used in a particular
cryptosystem. There are three types of general attacks that can be applied against any particular
block cipher. These include ciphertext only, known plaintext, and chosen plaintext. In the case of
a ciphertext only attack, the cryptanalyst possesses the ciphertexts only. A known plaintext
attack uses the knowledge of both plaintexts and their corresponding ciphertexts. In the case
of a chosen plaintext attack, the cryptanalyst can select particular plaintexts and produce
the corresponding ciphertexts. This is possible only because he or she has temporary access
to the encryption machinery.
The most fundamental way to break a cipher is exhaustive key search, referred to as the
brute force attack. Two recent and powerful methods that have demonstrated the ability
to break modern day block ciphers are differential and linear cryptanalysis. This section
describes these two widely known attacks against private-key block ciphers as well. At the
end of this section a new type of attack, called the timing attack, is presented.
2.3.1 Brute Force Attack
A brute force attack, also known as exhaustive key search, is a known plaintext attack. In
this kind of attack, the cryptanalyst gets hold of a few ciphertexts and their corresponding
plaintexts. The next step is to exhaustively search all possible keys by encrypting a known
plaintext with each of these keys. When one of the keys generates the correct ciphertext, we
very likely have the correct key. We can use a few more ciphertext-plaintext pairs to verify
the correctness of the key.
The best line of defense against this type of attack is to increase the key size such that
the attack becomes infeasible. Theoretically speaking, a cipher is broken if the time and
memory resources required by any cryptanalytic attack are less than what is needed for a
brute force attack.
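The exhaustive key search described in this section can be sketched on a deliberately weak
example. The 16-bit cipher below (toy_encrypt) is a hypothetical construction invented for
this illustration and has no cryptographic value; only the search loop reflects the attack.
The first known pair filters the key space and the remaining pairs verify each surviving
candidate.

```python
MASK = 0xFFFF  # 16-bit block and 16-bit key, small enough to search completely


def rotl16(x, r):
    """Rotate a 16-bit word left by r positions."""
    r %= 16
    return ((x << r) | (x >> (16 - r))) & MASK


def toy_encrypt(block, key):
    """Hypothetical 4-round toy cipher used only to demonstrate key search."""
    x = block & MASK
    for rnd in range(4):
        x = (x + key + rnd) & MASK
        x = rotl16(x, 5) ^ (key >> 3)
    return x


def exhaustive_key_search(pairs):
    """Return every key consistent with all known (plaintext, ciphertext) pairs."""
    first_pt, first_ct = pairs[0]
    candidates = []
    for key in range(2 ** 16):
        if toy_encrypt(first_pt, key) != first_ct:
            continue                       # wrong key for the first pair
        if all(toy_encrypt(pt, key) == ct for pt, ct in pairs[1:]):
            candidates.append(key)         # survives the verification pairs
    return candidates
```

Each additional bit of key doubles the number of iterations of the outer loop, which is why
increasing the key size is the natural defense.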
2.3.2 Differential Cryptanalysis
Differential cryptanalysis, developed by Biham and Shamir [6], is one of the most potent
techniques used to cryptanalyze many private-key block ciphers. SPNs and Feistel ciphers
belong to the class of iterated product ciphers and this attack is very much applicable to
them.
Differential cryptanalysis is essentially a chosen plaintext attack. Biham and Shamir have
successfully attacked DES using this technique and have found it to be more efficient than
a brute force attack. This method takes into account ciphertext pairs whose corresponding
plaintexts have a particular difference. In other words, it looks at the XOR difference of
two plaintexts and considers the corresponding ciphertext pair. In a particular S-box,
knowing the input XOR of a pair does not ensure knowledge of its output XOR.
However, there exists a probabilistic relation between the output XORs and every input
XOR. Differential cryptanalysis makes use of the highly probable occurrences of sequences
of output XOR differences at each round given a particular plaintext XOR difference.
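The probabilistic relation between input and output XORs of an S-box is usually tabulated
as a difference distribution table. The sketch below is illustrative; the 4-bit permutation
EXAMPLE_SBOX is an arbitrary choice made for this example and the function names are
assumptions. Entry [dx][dy] counts the inputs x for which the pair (x, x XOR dx) produces
the output XOR dy.

```python
# Arbitrary 4-bit permutation chosen only for illustration.
EXAMPLE_SBOX = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]


def difference_distribution_table(sbox, m, n):
    """ddt[dx][dy] = number of inputs x with sbox[x] ^ sbox[x ^ dx] == dy."""
    ddt = [[0] * (2 ** n) for _ in range(2 ** m)]
    for x in range(2 ** m):
        for dx in range(2 ** m):
            dy = sbox[x] ^ sbox[x ^ dx]
            ddt[dx][dy] += 1
    return ddt


def most_probable_differential(ddt):
    """Return (count, dx, dy) for the largest entry with a non-zero input XOR."""
    return max((count, dx, dy)
               for dx, row in enumerate(ddt) if dx != 0
               for dy, count in enumerate(row))
```

A large entry for some non-zero dx marks a highly probable differential; dividing the count
by 2^m gives its probability.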
Several methods have been proposed to ensure the immunity of a round function against
this type of attack by reducing these highly probable occurrences of output XORs in
relation to input XORs. For example, this can be done by increasing the number of output
bits of the S-box to some reasonable value [32]. A second approach uses a modular
multiplication to mask the input of the S-boxes as a way to replace the XOR operation in
the round function that involves the subkey [33].
2.3.3 Linear Cryptanalysis
Linear cryptanalysis, a known plaintext attack invented by Matsui [7], uses linear
expressions to approximate the action of a block cipher. This attack exploits the statistical linear
relations between plaintext, ciphertext and subkey bits. This implies that if we XOR some
plaintext bits together, XOR some ciphertext bits together and then finally XOR the results,
we end up getting a single bit that is equal to the XOR of some of the key bits with a
probability that is significantly different from one half. This defines a linear approximation,
which holds with a certain probability. If this probability is different from one half, we can
use this fact to construct a linear approximation of the entire algorithm. This is done by
concatenating linear approximations of different rounds. Matsui, in his paper, presented two
algorithms used to derive the subkey bits using a linear approximation. Algorithm 1 is used
to recover one subkey bit that is the XOR sum of a subset of subkey bits. The second
algorithm, an extension of the first, determines a number of the subkey bits at one time.
DES is highly susceptible to this kind of attack as the S-boxes of DES are not optimized
against this attack. When this attack is mounted against 16-round DES, the cipher is
broken with $2^{47}$ known plaintexts. As the attack greatly relies on the structure of S-boxes,
the best way to increase the immunity of SPNs against linear cryptanalysis is to select highly
nonlinear S-boxes. Alternate approaches to thwart linear cryptanalysis involve the use of
key-dependent rotations [8] and modular additions and subtractions [13].
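The statistical linear relations exploited by this attack can be tabulated for a single
S-box as a linear approximation table. The sketch below is an illustration with
hypothetical names and an arbitrarily chosen 4-bit permutation. Entry [a][b] is the number
of inputs for which the XOR of the input bits selected by a equals the XOR of the output
bits selected by b, minus half the number of inputs; an entry far from zero marks an
approximation whose probability differs markedly from one half.

```python
# Arbitrary 4-bit permutation chosen only for illustration.
EXAMPLE_SBOX = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def linear_approximation_table(sbox, m, n):
    """lat[a][b] = #{x : parity(a & x) == parity(b & sbox[x])} - 2**(m - 1)."""
    size = 2 ** m
    return [
        [sum(parity(a & x) == parity(b & sbox[x]) for x in range(size)) - size // 2
         for b in range(2 ** n)]
        for a in range(2 ** m)
    ]


def approximation_probability(lat, a, b, m):
    """Probability that the approximation selected by masks (a, b) holds."""
    return (lat[a][b] + 2 ** (m - 1)) / 2 ** m
```

A perfectly linear S-box such as the identity mapping has entries of maximum magnitude,
which is the situation that highly nonlinear S-boxes are chosen to avoid.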
2.3.4 Timing Attack
Another type of attack aimed at breaking a private-key block cipher is the timing attack.
Based on the assumption that accurate timing measurements are available for individual
encryptions, this attack employs the methodology of deriving the key bits using timing
information from a set of ciphertexts. The timing attack on RC5, as outlined in [19], exploits
the fact that a naive implementation of the cipher could result in data-dependent rotations
taking a time that is a function of the data. This implies that it is important for
designers to be aware of different cryptographic issues when implementing ciphers like RC5.
However, the timing attack can be prevented if a digital hardware implementation ensures that
the rotations take constant time. A barrel shifter is one piece of digital hardware that can
execute rotations of any size in one clock cycle.
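The difference between a data-dependent rotation and a barrel shifter can be modelled in
software. The sketch below is a behavioural model only; it assumes a word size that is a
power of two, the function names are hypothetical, and Python itself gives no timing
guarantees. The naive version performs one single-bit rotation per iteration, so its step
count reveals the rotation amount. The barrel shifter model always evaluates the same
log2(w) multiplexer stages, each either passing its input through or rotating it by a fixed
power of two.

```python
def rotl_naive(x, r, w=32):
    """Rotate left one bit at a time; the number of steps depends on r."""
    mask = (1 << w) - 1
    steps = 0
    for _ in range(r % w):
        x = ((x << 1) | (x >> (w - 1))) & mask
        steps += 1
    return x, steps


def rotl_barrel(x, r, w=32):
    """Model of a barrel shifter: log2(w) stages are evaluated for every r."""
    mask = (1 << w) - 1
    r %= w
    stages = 0
    shift = 1
    while shift < w:
        rotated = ((x << shift) | (x >> (w - shift))) & mask
        x = rotated if (r & shift) else x   # 2-to-1 multiplexer controlled by one bit of r
        shift <<= 1
        stages += 1
    return x, stages
```

Both functions return the same rotated word, but only the second reports a stage count that
is independent of the rotation amount, which is the property that defeats the timing attack.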
2.4 Security of RC6 and CAST-256 Encryption Algorithms
As mentioned earlier, CAST-256 and RC6 are among the fifteen candidate algorithms that
have been presented to the first round of the AES development phase. Although CAST-256
and RC6 are neither SPNs nor Feistel ciphers, their architectures are extensions of the basic
Feistel cipher. This section briefly discusses the security of these two AES submissions
against linear and differential cryptanalysis as well as against brute force attack.
In the AES submission for RC6 [3], several modifications have been made, such as the use
of four working registers instead of two as in RC5 [8], and the introduction of a quadratic
function that uses the primitive operation of integer multiplication. The use of the
multiplication operation enhances the diffusion effect, thereby increasing the overall security
of the cipher.
It has been conjectured in [3] that the best approach to attack the RC6 block cipher is to
adopt a brute force attack. This is achieved by carrying out an exhaustive search for the
user-supplied encryption key. Rivest, Robshaw, Sidney, and Yin [3] have concluded that the work
load needed to exhaustively search for the b-byte encryption key or the expanded forty-four
32-bit subkeys (as a part of the AES submission) is $\min\{2^{8b}, 2^{1408}\}$ operations. As far as the
linear and differential cryptanalytic attacks on this cipher are concerned, the data
requirements to execute these attacks on RC6 exceed the available data. For instance,
an 8-round version of RC6 would require more than $2^{78}$ chosen plaintext pairs for
successfully mounting differential cryptanalysis, while it needs more than 200 known plaintexts to
apply linear cryptanalysis successfully. Clearly, application of these attacks to the 20-round
version of RC6, as presented for the AES submission, makes these attacks impractical.
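Taking the key search bound as min{2^(8b), 2^1408}, where 1408 = 44 x 32 is the number of
bits in the forty-four expanded 32-bit subkeys, a short calculation shows which term
dominates for a given key length. The helper name below is hypothetical and the sketch only
restates the arithmetic.

```python
def rc6_search_exponent(key_bytes, subkeys=44, word_bits=32):
    """Return log2 of the exhaustive search work: min(8b, subkeys * word_bits)."""
    return min(8 * key_bytes, subkeys * word_bits)


# For the three AES key sizes the user key is the cheaper target.
AES_KEY_EXPONENTS = {bits: rc6_search_exponent(bits // 8) for bits in (128, 192, 256)}
```

For the AES key sizes of 128, 192 and 256 bits the first term always applies, so the work
is 2^128, 2^192 and 2^256 operations respectively; the subkey term would only matter for
keys longer than 176 bytes.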
The security analysis of CAST-256 [4, 34] reveals that the cipher is resistant to both linear
and differential cryptanalysis. The total number of known plaintexts needed for a 48-round
linear approximation of CAST-256 is approximately $2^{122}$, which is almost equal to the total
number of plaintexts available ($2^{128}$). This implies that linear cryptanalysis is impractical
against CAST-256. In the case of differential cryptanalysis of CAST-256, we need more than
$2^{140}$ chosen plaintexts, which is much greater than the number of plaintexts available for a
128-bit block size. It therefore appears that CAST-256 is immune to differential cryptanalysis
too.
2.5 Conclusion
We have introduced the fundamentals of cryptography in the beginning of this chapter,
followed by a detailed investigation of private-key block ciphers. Here, we have presented
two main architectures of block ciphers. Many popular private-key block ciphers have been
described too. In addition, various cryptographic properties that are vital to the design and
analysis of S-boxes and ciphers have been discussed. Next, different cryptanalysis techniques
as applied to ciphers have been presented. Finally, we have discussed the security of two
block ciphers, RC6 and CAST-256, against the different attacks presented.
Chapter 3
Hardware Environments for
Cryptographic Applications
Two important communication revolutions have catalyzed the generation of an entire new
information-based industry. The first revolution was the interconnection of data networks
around the globe culminating in the Internet. The second is the recent availability of
inexpensive high speed connections that link users at home or in the office to these networks.
This growing trend of internetworking has led to the commercialization of on-line services
and electronic commerce. This has resulted in a critical need for data security.
Most modern day security applications make use of cryptographic hardware as a potent
weapon against different security breaches and intrusions. With the rapid growth of
the Internet, the need for privacy, authenticity and anonymity has encouraged cryptography
to surface as a viable means of achieving security. There are now many engineering
design companies that are shipping cryptographic hardware for applications as diverse
as electronic commerce and banking, secure wireless solutions, smart cards, PCMCIA card
security, certification authorities, digital signatures, etc. One of the most recent applications
of cryptographic hardware has to do with the security of virtual private networks, or VPNs
as they are popularly known. These are cryptographic accelerator modules that act as
fast coprocessors providing cryptographic processing at wireline speeds, freeing the router
or firewall to perform other critical tasks while eliminating congestion in virtual private
networks. Other examples of cryptographic hardware include LAN/WAN encryptors.
3.1 Hardware Encryption vs. Software Encryption
Any encryption algorithm can be implemented in software. But there are several
disadvantages inherent to software implementations. These include lower speed, higher cost, and less
security. The speed of encryption is basically restricted by the maximum clock frequency of
the computing platform, whereas in the case of a hardware solution, we can go for an extremely
fast implementation such as a full custom ASIC implementation. Moreover, a software-only
solution is vulnerable to viruses, inadvertent erasing, complications from system failures,
and hackers.
In contrast, hardware encryption has many advantages over software solutions.
Encryption in hardware is faster. A hardware solution is impervious to system failures such as
viruses. Hardware implementations can protect against internal and external intruders
using two-factor authentication: both the hardware device and a password are necessary to
access the root or primary encryption key.
Internal key management and distribution is better taken care of using a hardware
encryption solution. Hardware solutions can control access to the root keys so that one can
distribute access codes across several individuals who must cooperate to gain access to the
root keys. Moreover, hardware solutions offer scalable security.
3.2 Field Programmable Gate Arrays (FPGAs)
A Field Programmable Gate Array (FPGA) is a general-purpose, multi-level programmable
logic device that is customized in the package by the end users. FPGAs are devices whose
cores are populated with an array of logic structures of varying granularity and
programmable interconnect used to connect them in several different ways. For instance, the
logic blocks can be SRAM-based lookup tables (LUTs) or even multiplexers with or without
registers, while special purpose routing switch boxes or segmented channels can make up the
programmable interconnect.
The structure, size and number of blocks of logic as well as the amount of glue logic or the
connectivity of the interconnection differ largely among FPGA architectures. This variation
in FPGA architectures is dictated by different programming technologies and different target
applications of the parts. This implies that an architectural arrangement that works well
with a particular programming technology does not necessarily work with another.
Based on different programming technologies and architectural styles, FPGAs fall into
four groups:
• Island-style SRAM-programmed devices.
• Cellular SRAM-programmed devices.
• Channeled, antifuse-programmed devices.
• Array-style EPROM or EEPROM-programmed devices.
SRAM-based island-style FPGAs include the Xilinx LCA families. The Xilinx FPGA uses
a fairly large logic block with table lookup functionality and two D flip-flops. Xilinx arrays
have specialized routing blocks. This enables interconnection of a subset of inputs to one
another. The AT&T ORCA and Altera Flex, as well as UTFPGA1 [35], also belong to this
type of FPGA architecture. Toshiba, Plessey's ERA, Atmel's CLi family, the Algotronix
CAL, as well as Triptych [36] FPGAs belong to the cellular-style architecture. Algotronix
and CAL reuse some of the logic cells themselves to act as routing resources.
Antifuse-based channeled FPGAs include Actel's ACT1 and ACT2, Quicklogic's pASIC
and Crosspoint's CP20K Series FPGA. Actel logic blocks are very small and multiplexer
based [37]. Actel arrays use segmented channel resources. EPROM-programmed devices
include Altera's MAX 5000 and MAX 7000, AMD's Mach and Xilinx's EPLD, as well as a
few others. Altera logic blocks directly support multi-level combinational logic. A simple
taxonomy of FPGAs is illustrated in Figure 3.1.
Figure 3.1: A Simple FPGA Taxonomy
3.2.1 Advantages of FPGAs over MPGAs
In this section, we discuss the advantages of using FPGAs as a hardware implementation
solution versus a mask programmed gate array (MPGA) implementation.
Low Tooling Costs: Every design to be implemented in a mask programmed gate array
(MPGA) requires custom masks to construct the custom wiring patterns. The cost of
each mask is several thousand dollars and this cost is then amortized over the total
number of units being manufactured. As a consequence, masking charges for designs
based on MPGAs are tremendous. In contrast, there is no custom tooling needed for
FPGA designs, making FPGAs cost-effective for most logic designs.
Rapid Turnaround: From the completion of the design to the delivery of the finished
products, the manufacturing process takes several weeks in the case of MPGAs. An FPGA,
on the other hand, can be programmed in minutes by the end user. Faster design
turnaround leads to faster product development and shorter time-to-market for new
FPGA products. In [38], Reinertsen found that in a design environment that caters
to the needs of a high-tech industry, a delay of six months in product delivery reduces
the lifetime profits of a product by thirty-three percent.
Reduced Risk: The advantages of low initial non-recurring engineering charges and rapid
turnaround design time imply that a redesign due to an error incurs low expenses
and small delays. This encourages rapid prototyping and more aggressive logic design.
Efficient Design Verification: MPGA users have to verify their designs by extensive and
elaborate simulation before manufacture, mainly because of huge non-recurring
engineering costs and long manufacturing delays. An MPGA design may include errors due
to inaccuracies or oversimplifications made in the simulation model. This is because
there is a need for long simulations. FPGAs do not suffer from this dilemma. FPGA
users may choose to use in-circuit verification instead of simulating for large amounts of
time.
Low Testing Expenses: There are three types of costs associated with testing MPGA
parts: on-chip logic for testing, generating the test program, and final parts testing when
the manufacturing is done. The manufacturer's test program verifies that every FPGA
will be functional for all possible designs that may be implemented on it. FPGA users
do not need to write design-specific tests for their designs. This eliminates the
need to build testability into the design. Moreover, since the test program for FPGAs
is the same for all designs, as opposed to MPGAs, it is reasonable to invest more effort
in improving it. This achieves excellent test coverage, providing high quality ICs.
3.2.2 Disadvantages of FPGAs over MPGAs
Field programmable gate arrays (FPGAs) also have some disadvantages. These drawbacks
are mainly due to the inherent nature of the technology itself. To begin with, FPGAs suffer
from on-chip programming overhead circuitry which is responsible for the programming of
a given part. The area occupied by programming overhead cannot be utilized by the end
user. This results in low gate density for the FPGA. The programmable switch matrices
and interconnects in FPGAs are larger than their mask-programmed counterparts in
MPGAs.
The programmable switches also increase signal delay by adding resistance and
capacitance to interconnect paths. As a consequence, FPGAs are larger and slower than equivalent
MPGAs. FPGAs also exhibit some design limitations in relation to implementing
configurable computing applications in them. For instance, FPGAs are well suited to algorithms
composed of bit-level operations, such as integer arithmetic, but they are not efficient
enough for implementing numeric operations, such as high-precision multiplication.
In fact, dedicated multiplier circuits such as those used in microprocessor and DSP
chips can be optimized to work more efficiently than those developed for configurable logic
blocks available in FPGAs.
The on-chip memory provided by FPGAs for storing intermediate computational results
is too small. This implies that most configurable computing applications need some sort of
additional external memory. This will slow down the computations. However, researchers
and industry people are developing newer and more advanced FPGA architectures that
incorporate enough on-chip memory, very fast and efficient arithmetic processing and some
special-purpose functional units. A very recent example of such FPGAs is Xilinx's Virtex
family, which not only possesses great gate densities, but also attains very high speeds. The
Virtex family FPGAs have broken density and performance barriers while offering
unprecedented system level integration, achieving clock speeds in excess of 150 MHz.
3.3 SRAM-based FPGAs
In this section, we shall focus our discussion on the issues surrounding SRAM-based
FPGAs. This is because we have used these devices as our target technology for realizing
the cipher designs in hardware.
SRAM-based FPGAs are the most popular. This is mainly due to their ability to be
reconfigured. Several researchers [39, 40, 41, 42, 43, 44, 36, 35, 45, 46] have investigated
this class of FPGAs. An SRAM-based FPGA is programmed by downloading configuration
memory from an external source. The configuration memory cells control the logic and the
interconnect that execute the application function inside an FPGA. The RAM memory is
not centralized, but rather distributed among the configurable logic blocks. This type of
programming approach has many advantages as well as disadvantages attached to it. One
obvious drawback is the volatile nature of programming. This means that when power is
switched off, the device loses its programming and as such the FPGA has to be reprogrammed
every time it is turned on. However, the disadvantage of volatility furnishes the benefit of
reprogrammability. This ability makes SRAM-based FPGAs ideal for rapid prototyping.
This results in very high quality devices. Since the same CMOS process as used in ASICs
is employed to build these types of FPGAs, SRAM-based FPGAs benefit from process
improvements driven by the semiconductor industry. Finally, because these FPGAs implement
logic using static gates, SRAM-based FPGAs have very low power consumption.
3.3.1 Xilinx XC4000 Structure
Xilinx FPGAs possess an array based structure, each chip comprising a two-dimensional
array of logic blocks interconnected by horizontal and vertical routing channels. Xilinx
introduced the first FPGA series, the XC2000, in 1985, and now offers three more generations:
XC3000, XC4000 and XC5000 devices. A very recent device family is Xilinx's Virtex
series, which is still undergoing field tests. Of all the device families introduced so far, the Xilinx
XC4000 is the one that has most proven its worth over the years. This is the most widely used
FPGA family in industry. More information can be obtained from the Xilinx data books [47].
A detailed description of this device family is presented below.
3.3.1.1 XC4000 Structure
The Xilinx XC4000 FPGA structure is illustrated in Figure 3.2. The logic inside the FPGA
is implemented in an array of programmable blocks of logic called configurable logic blocks
(CLBs). Input to and output from the array are taken care of by the input/output blocks
(IOBs) along the edges of the array. The CLBs and the IOBs are interconnected using
several types of programmable interconnect architectures. Connections to and from CLBs
and IOBs can be programmed and wire segments can be interconnected to form paths that
extend from one box to another using an array of programmable connection blocks called
the switch matrices.
Figure 3.2: A Xilinx XC4000 structure
3.3.1.2 Programming Technology
As mentioned earlier, Xilinx uses SRAM technology to store the programming information.
After the power is applied to the circuit, the program data defining the logic configuration
must be loaded into the SRAM. The FPGA itself contains the logic needed to load the
information into itself from a PROM. Once the programming information has been
loaded, the device switches from programming mode to operational mode in which the logic
is available. This logic is maintained as long as the device is powered up. As soon as the
device is powered down, it loses its configuration. The ability to reprogram the FPGA
permits a new hardware design on the fly.
The SRAM bits control the logic that is implemented in a Xilinx XC4000 device. This
is done using three techniques, namely pass transistor control, multiplexer control, and table
lookup implementation [48]. Figure 3.3 shows an SRAM cell driving the gate terminal of an
n-channel MOS transistor. When such a transistor is used to make or break a bidirectional
connection for passing a signal between two wiring segments, it is called a pass transistor.
When the SRAM cell contains a zero, the transistor is OFF, the path between the two
wiring segments is OPEN, and as such no signal can be passed. On the other hand,
when the SRAM cell contains a one, the transistor is ON, the path between the two wiring
segments is CLOSED, permitting the signal to be passed. An XC4000 series FPGA contains
tens of thousands of such pass transistors in the interconnection structure.
Figure 3.4 shows an SRAM cell connected to the select input of a 2 x 1 multiplexer.
When the SRAM cell contains a 0, the value on the zero input line is passed to the multiplexer
output. The structure is used to make selections between two signals.
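The two control techniques described so far can be captured in a small behavioural model.
The sketch below is illustrative, with hypothetical function names; an open path is
represented by None, and the multiplexer is assumed to pass its one input when the SRAM
cell holds a 1.

```python
def pass_transistor(sram_bit, signal):
    """Return the signal when the SRAM cell holds 1 (path CLOSED), else None (path OPEN)."""
    return signal if sram_bit == 1 else None


def mux2(sram_bit, in0, in1):
    """2 x 1 multiplexer whose select input is driven by an SRAM cell."""
    return in1 if sram_bit else in0
```

In both cases the SRAM bit is written once at configuration time, so the behaviour of the
structure is fixed while the device operates.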
In Figure 3.5, we have a lookup table (LUT) built using SRAM cells. A LUT for
a three variable function F(A, B, C) is illustrated. The SRAM cells in the LUT store the
actual truth table for the logic function F. This implies that each cell houses the value of
the function F for the corresponding minterm [49].
Figure 3.3: Pass Transistor Control Technique
Figure 3.4: Multiplexer Control Technique
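A lookup table of this kind is straightforward to model: configuration fills one cell per
minterm and evaluation is a read at the address formed by the inputs. The class below is a
sketch with hypothetical names; it assumes the first input is the most significant address
bit.

```python
from itertools import product


class LookupTable:
    """Behavioural model of an SRAM-based LUT for an n-input boolean function."""

    def __init__(self, n_inputs, func):
        self.n_inputs = n_inputs
        # One SRAM cell per minterm, filled in truth table order (000, 001, ...).
        self.cells = [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]

    def evaluate(self, *inputs):
        """Read the cell addressed by the input values."""
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.cells[address]


# Example: a three-input majority function F(A, B, C).
majority = LookupTable(3, lambda a, b, c: (a & b) | (b & c) | (a & c))
```

Because the table simply stores the truth table, any function of three variables costs the
same eight cells regardless of its algebraic complexity.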
3.3.1.3 Interconnections
Connections between the CLBs as well as between CLBs and IOBs are established using
wiring segments. These wiring segments extend in both the horizontal and vertical directions
in channels lying between the various blocks. Some of the segments are very long, spanning
the entire length or width of the array. Such segments are known as long lines. They are
basically intended for high fan-out, time-critical signal nets, or the ones that are distributed
over long distances [47]. Other segments are long enough to span a single CLB. These can
be interconnected using the switch matrices. These are called single-length lines. Some of
the XC4000 family devices even have double-length lines or quad lines. The benefit of using
longer wire segments is that a signal passes through less series resistance in traversing the
same distance compared to single-length lines.
Figure 3.5: A Lookup Table Implementation
The discussion of Xilinx interconnections is incomplete without describing its switch
matrices. An example of a switch matrix is shown in Figure 3.6. Here we have four segments
meeting at a point. At this particular crosspoint, there are six pass transistors: one vertical,
one horizontal and four on the diagonals. The horizontal and vertical pass transistors are
shown as a plus sign as they intersect each other. The remaining four pass transistors,
represented by black bold lines, surround these two pass transistors. The connection between
two segments is CLOSED for a one stored in the SRAM cell driving the transistor gate.
Similarly, the connection is OPEN for a zero stored in the SRAM cell. Thus, the
SRAM-controlled pass transistors lie at selected interconnections between the wiring segments in
the routing channels [50].
Figure 3.6: A Xilinx XC4000 Switch Matrix
3.3.1.4 Xilinx Logic
Most of the logic in a Xilinx XC4000 device lies within the CLBs and the IOBs. The structure
of the CLB as well as of the IOB is internally programmable. A simplified representation of
a Xilinx XC4000 CLB is shown in Figure 3.7 [47].
Here there are thirteen inputs to the CLB, including the clock input. Two flip-flops
and their associated logic are also represented by broken lines. The remaining part of the
CLB is used to implement the combinational logic. There are three LUTs that implement
combinational functions. Two four-input tables implement two functions labeled F and G.
A third function generator, H, can implement any boolean function of its three inputs. Two
of these inputs can optionally be the F and G outputs of the two function generators. The
third input essentially comes from outside the CLB.
Figure 3.7: Simplified Logic Schematic of a Xilinx CLB
A CLB can implement any of the following functions:
• Any function of up to four variables, any additional second function of up to four
unrelated variables, plus any third function of up to three unrelated variables.
• Any single function of five variables.
• Any function of four variables along with some functions of six variables.
• Some functions of up to nine variables.
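As an illustration of the last item, the sketch below chains two four-input tables (F and G) into a three-input table (H) to compute the parity of nine inputs. It is a behavioural model only; the helper names `lut`, `read`, and `clb_parity9` are hypothetical, and nine-input parity is just one example of a function that happens to decompose this way.

```python
from itertools import product


def lut(func, n):
    """Truth table of an n-input function, indexed by the packed input bits."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n)]


def read(table, bits):
    """Address the table with the input bits (first bit is most significant)."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]


# F and G: four-input function generators; H: three-input function generator.
F = lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
G = lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
H = lut(lambda f, g, h1: f ^ g ^ h1, 3)


def clb_parity9(inputs):
    """Nine-input parity: H combines the F and G outputs with one external input."""
    f_out = read(F, inputs[0:4])
    g_out = read(G, inputs[4:8])
    return read(H, (f_out, g_out, inputs[8]))
```

Only functions that factor through the F, G, and H tables in this manner fit in one CLB, which is why the list says "some" rather than "any" function of nine variables.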
Figure 3.8: Simplified Block Diagram of Xilinx IOB
User-programmable input/output blocks (IOBs) supply the interface between external
package pins and the internal logic. A simplified block diagram representation of a Xilinx
XC4000 IOB is given in Figure 3.8.
To simplify things, the IOB is divided into two sections: an input part and
an output part. The output part of the IOB passes the output data from the interior of the
FPGA to the I/O pin. Alternatively, it can provide the stored value of the output data
from a flip-flop. A tri-state driver on the output allows the I/O pin to be used as an input,
an output, or an input/output. In the input part, the signal at the I/O pin enters the input
buffer. The signal can be fed directly to input data 1 and input data 2, the two input
lines to the interior of the FPGA. Alternatively, one or both of these inputs can be fed by a
stored value of the I/O pin signal or its complement.
3.4 Cryptographic Algorithms: FPGAs vs. ASICs
A new development in integrated circuits that offers a hardware implementation choice
that is much more flexible than Application Specific Integrated Circuits (ASICs) is the
remarkable entry of reconfigurable custom hardware; these large, fast reconfigurable gate
arrays are FPGAs. In contrast, ASICs provide only the functionality needed for a specific task.
A well-designed ASIC chip will support a particular application for which it is designed, but
not a slightly modified version of the same application introduced after the ASIC design is
completed. Furthermore, even if a modified ASIC can be developed, the original hardware is
too highly customized to be reused in successive generations. In contrast, the configuration
of an FPGA can be easily reprogrammed to accommodate a design modification.
Field Programmable Gate Arrays (FPGAs) have been chosen as the target technology
for realizing the RC6 and CAST-256 cryptographic algorithms in hardware for a
number of reasons. Replacing one cryptographic algorithm with another is a trivial matter in
software, but it is not the same in hardware. At the same time, hardware solutions
can offer improved performance in terms of speed, security, and cost. Reconfigurable hardware
resolves this tension, and FPGAs are the answer. As a matter of fact,
FPGAs can be used to build algorithm-agile applications [51]. The term algorithm agility
refers to the fact that the same FPGA can be reprogrammed at run time to support different
algorithms. Other key factors that favour the use of FPGAs for hardware implementation
of ciphers include faster design turnaround time, scalable security, and variable architectural
parameters.
3.5 Conclusion
In this chapter, we have touched on different issues that relate to the hardware implementation
of cryptographic algorithms in FPGAs. We have compared hardware encryption to
encryption in software. Next, we introduced FPGAs as a viable custom architecture
implementation choice. Here we have described different FPGA architectures and presented several
key advantages and disadvantages of using FPGAs as a target technology. We have focused
on SRAM-based FPGAs in general and the Xilinx XC4000 family in particular. We have closed
our discussion by reasoning why FPGAs have been chosen over ASICs for this particular
application.
Chapter 4
Design of RC6 and CAST-256
In this chapter, we focus on the FPGA implementation of two strong AES
candidates, RC6 and CAST-256. We will investigate issues relating to the efficiency of the
two ciphers from the hardware implementation perspective.
4.1 The RC6 Cipher
RC6 is a symmetric (private-key) block cipher submitted to NIST for consideration as the
new AES. RC6 is an evolutionary improvement of RC5 [8]. Modifications have been made
to meet the AES requirements, to increase security, and to enhance performance.
RC6-w/r/b, a general version of the RC6 cipher [3], operates on units of four w-bit words,
with the encryption consisting of a nonnegative number of rounds r, and b representing the
length of the encryption key in bytes. The user supplies a primary key of b bytes, where
$0 \le b \le 255$ and, from this key, the key schedule scheme of the RC6-w/r/b algorithm derives
$2r + 4$ subkeys, where each subkey is a w-bit word. These $2r + 4$ subkeys are then stored in
the array $S[0, \ldots, 2r + 3]$. This array of subkeys is used in both encryption and decryption.
Encryption with the RC6 algorithm is described below. RC6 works with four w-bit
registers A, B, C, and D which contain the initial input plaintext as well as the output
ciphertext at the end of the encryption. The standard little-endian convention is used for
packing the data bytes into the input/output blocks. The encryption block involves the
following basic operations on two w-bit words a and b:
$a \oplus b$ : bitwise exclusive-or of w-bit words
$a + b$ : integer addition modulo $2^w$
$a \times b$ : integer multiplication modulo $2^w$
$a \lll b$ : rotate the w-bit word a to the left by the amount
given by the least significant $\log_2 w$ bits of b
For the AES implementation of RC6, w = 32 and r = 20. Each of the four 32-bit registers A,
B, C, and D is updated after each round of encryption. The output of the 20-round encryption
is also stored in the four registers as the ciphertext. The entire process of encryption in the
RC6 algorithm is illustrated in Figure 4.1.
Decryption is similar, but involves reversing the order of the subkeys, replacing left
rotations by right rotations and replacing addition with subtraction. Since the AES submission
requires the cipher to operate on 32-bit words and there should be twenty rounds of
encryption/decryption, the issues related to hardware implementation of the RC6-32/20/b version will
be discussed in the coming sections. Note that for AES, $b \le 64$ bytes (i.e. up to 512 bits of
key) are allowed as the primary key. For details on the key scheduling scheme refer to [3].
$B = B + S[0]$
$D = D + S[1]$
for ($i = 1$; $i \le r$; $i$++)
    $t = (B \times (2B + 1)) \lll \log_2 w$
    $u = (D \times (2D + 1)) \lll \log_2 w$
    $A = ((A \oplus t) \lll u) + S[2i]$
    $C = ((C \oplus u) \lll t) + S[2i + 1]$
    $(A, B, C, D) = (B, C, D, A)$
$A = A + S[2r + 2]$
$C = C + S[2r + 3]$
Figure 4.1: Encryption with RC6-w/r/b
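The procedure in Figure 4.1 can be sketched in software as follows. This is a minimal Python model for w = 32 and r = 20, assuming the caller supplies the 2r + 4 subkeys in a list `S`; the key schedule of [3] is deliberately left out. The function names are hypothetical, and the decryption routine is included only to check that the round structure inverts as described in the text.

```python
W = 32                      # word size in bits (AES setting)
R = 20                      # number of rounds (AES setting)
LGW = 5                     # log2(W)
MASK = (1 << W) - 1


def rotl(x, n):
    """Rotate a W-bit word left by the low log2(W) bits of n."""
    n %= W
    return ((x << n) | (x >> (W - n))) & MASK


def rotr(x, n):
    """Rotate a W-bit word right by the low log2(W) bits of n."""
    n %= W
    return ((x >> n) | (x << (W - n))) & MASK


def rc6_encrypt(block, S, r=R):
    A, B, C, D = block
    B = (B + S[0]) & MASK
    D = (D + S[1]) & MASK
    for i in range(1, r + 1):
        t = rotl((B * (2 * B + 1)) & MASK, LGW)
        u = rotl((D * (2 * D + 1)) & MASK, LGW)
        A = (rotl(A ^ t, u) + S[2 * i]) & MASK
        C = (rotl(C ^ u, t) + S[2 * i + 1]) & MASK
        A, B, C, D = B, C, D, A
    A = (A + S[2 * r + 2]) & MASK
    C = (C + S[2 * r + 3]) & MASK
    return A, B, C, D


def rc6_decrypt(block, S, r=R):
    """Inverse of rc6_encrypt: subkeys in reverse, right rotations, subtraction."""
    A, B, C, D = block
    C = (C - S[2 * r + 3]) & MASK
    A = (A - S[2 * r + 2]) & MASK
    for i in range(r, 0, -1):
        A, B, C, D = D, A, B, C
        u = rotl((D * (2 * D + 1)) & MASK, LGW)
        t = rotl((B * (2 * B + 1)) & MASK, LGW)
        C = rotr((C - S[2 * i + 1]) & MASK, t) ^ u
        A = rotr((A - S[2 * i]) & MASK, u) ^ t
    D = (D - S[1]) & MASK
    B = (B - S[0]) & MASK
    return A, B, C, D
```

Each loop iteration corresponds to one pass of the data through the single-round hardware core discussed later in this chapter.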
4.2 The CAST-256 Cipher
CAST-256 [4] is a private-key block cipher that is a generalization of the basic Feistel network
[12]. The CAST-256 algorithm uses a 128-bit block size and a 256-bit (or less) primary key that is
used in the algorithm's key schedule scheme to generate two sets of subkeys, with one subkey
from each set used per round: a 5-bit subkey $K_{r_i}$ is used as a "rotation key" for round i and a
32-bit subkey $K_{m_i}$ is used as a "masking key" for round i. There are a total of 48 rounds in encryption.
Three different 32-bit round functions are used in CAST-256. Using the same notation
as for RC6, these functions are defined as follows:
• Round Function $f_1$
$I = ((K_{m_i} + D) \lll K_{r_i})$
$O = ((S_1[I_a] \oplus S_2[I_b]) - S_3[I_c]) + S_4[I_d]$
• Round Function $f_2$
$I = ((K_{m_i} \oplus D) \lll K_{r_i})$
$O = ((S_1[I_a] - S_2[I_b]) + S_3[I_c]) \oplus S_4[I_d]$
• Round Function $f_3$
$I = ((K_{m_i} - D) \lll K_{r_i})$
$O = ((S_1[I_a] + S_2[I_b]) \oplus S_3[I_c]) - S_4[I_d]$
Here D is the 32-bit data input to the round function, $I_a$ to $I_d$ are the most significant
byte through the least significant byte of I, respectively, $S_i$ is the i-th substitution box or
S-box, and O is the 32-bit output of the round function. Each S-box is a nonlinear mapping
of an 8-bit input to a 32-bit output [4]. Moreover, "+" and "-" are addition and subtraction
modulo $2^{32}$ operations, "$\oplus$" is the bitwise exclusive-OR operation, and, finally, "$u \lll v$" is the
rotation of u to the left by the value indicated by v. The CAST-256 encryption algorithm is
illustrated in Figure 4.2.
The plaintext is stored in four 32-bit input registers A, B, C, and D. There are 48 rounds
in encryption. The four 32-bit registers are updated after each round of encryption. The
output of the 48-round encryption is also stored in the four 32-bit registers A, B, C, and D
as the ciphertext. Decryption is identical to encryption except that the masking and rotation
keys derived from the primary key are used in the reverse order. Note that the 256-bit
primary key can be generated from smaller user keys as outlined in the CAST-256 algorithm
specifications [4]. Details of the key scheduling scheme for CAST-256 are also outlined in [4].
for ($i = 0$; $i < 6$; $i$++)
    $C = C \oplus f_1(D, K_{r_{4i+1}}, K_{m_{4i+1}})$
    $B = B \oplus f_2(C, K_{r_{4i+2}}, K_{m_{4i+2}})$
    $A = A \oplus f_3(B, K_{r_{4i+3}}, K_{m_{4i+3}})$
    $D = D \oplus f_1(A, K_{r_{4i+4}}, K_{m_{4i+4}})$
for ($i = 6$; $i < 12$; $i$++)
    $D = D \oplus f_1(A, K_{r_{4i+1}}, K_{m_{4i+1}})$
    $A = A \oplus f_3(B, K_{r_{4i+2}}, K_{m_{4i+2}})$
    $B = B \oplus f_2(C, K_{r_{4i+3}}, K_{m_{4i+3}})$
    $C = C \oplus f_1(D, K_{r_{4i+4}}, K_{m_{4i+4}})$
Figure 4.2: Encryption with CAST-256
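The round functions and the 48-round structure can be modelled in Python as sketched below. Two points are assumptions rather than facts from the source: the S-boxes are deterministic placeholders and not the real CAST-256 tables of [4], and the subkeys are indexed by round number (list entry j holds the keys of round j + 1). The sketch is therefore useful for checking the structure, in particular that running the same routine with the keys reversed undoes the encryption, but it does not produce genuine CAST-256 ciphertext.

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by the low five bits of n."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


# Placeholder S-boxes (NOT the real CAST-256 tables): 8-bit in, 32-bit out.
SBOX = [[((x + 1) * c) & MASK for x in range(256)]
        for c in (0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F)]


def sbox_words(I):
    """Look up S1[Ia], S2[Ib], S3[Ic], S4[Id]; Ia is the most significant byte."""
    return (SBOX[0][(I >> 24) & 0xFF], SBOX[1][(I >> 16) & 0xFF],
            SBOX[2][(I >> 8) & 0xFF], SBOX[3][I & 0xFF])


def f1(D, kr, km):
    s1, s2, s3, s4 = sbox_words(rotl((km + D) & MASK, kr))
    return (((s1 ^ s2) - s3) + s4) & MASK


def f2(D, kr, km):
    s1, s2, s3, s4 = sbox_words(rotl(km ^ D, kr))
    return (((s1 - s2) + s3) & MASK) ^ s4


def f3(D, kr, km):
    s1, s2, s3, s4 = sbox_words(rotl((km - D) & MASK, kr))
    return ((((s1 + s2) & MASK) ^ s3) - s4) & MASK


def cast256_rounds(block, Kr, Km):
    """Apply the 48 rounds; Kr[j] and Km[j] are the keys of round j + 1."""
    A, B, C, D = block
    for i in range(6):                       # first 24 rounds
        k = 4 * i
        C ^= f1(D, Kr[k], Km[k])
        B ^= f2(C, Kr[k + 1], Km[k + 1])
        A ^= f3(B, Kr[k + 2], Km[k + 2])
        D ^= f1(A, Kr[k + 3], Km[k + 3])
    for i in range(6, 12):                   # last 24 rounds, mirrored order
        k = 4 * i
        D ^= f1(A, Kr[k], Km[k])
        A ^= f3(B, Kr[k + 1], Km[k + 1])
        B ^= f2(C, Kr[k + 2], Km[k + 2])
        C ^= f1(D, Kr[k + 3], Km[k + 3])
    return A, B, C, D
```

Decryption in this model is `cast256_rounds(ciphertext, Kr[::-1], Km[::-1])`, which mirrors the statement that the keys are simply used in reverse order.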
4.3 Hardware Development Environment
The design cycle and CAD tools used for the hardware implementation of the RC6 and CAST-256
algorithms have been provided by the Canadian Microelectronics Corporation (CMC) [52].
The entire design process can be divided into the following three stages:
• Generating the VHDL (IEEE 1076) descriptions of the cipher design, employing different
architectural options. The functional VHDL simulation of the design is carried
out using the Synopsys VSS simulator version 1998.02 to verify the correct operation
of the cryptographic algorithm.
• Gate-level synthesis and logic optimization of the design utilizing Synopsys Design
Compiler version 1998.02 to produce a functionally equivalent schematic in hardware.
• Place and route for a specific FPGA device followed by a final verification of the design.
The timing simulation data is generated during this design stage and is then used to
carry out timing simulation for the final verification of the design.
We chose Xilinx as the FPGA vendor and the XC4000 device family. In particular, we
used the XC40200XV-9-BG560 as our target device. This particular FPGA has a total of 7056
configurable logic blocks (CLBs), which gives us a baseline with which we can measure FPGA
resource consumption. Xilinx Alliance Series version 1.5 is used for place and route.
4.4 Design of RC6
The block diagram representation of the RC6 encryption algorithm realized in hardware is
shown in Figure 4.3. The RC6 core basically consists of four components: a 32-bit adder,
a 32 x 32 "partial" integer multiplier (i.e. the product is modulo $2^{32}$), a 32-bit bitwise
exclusive-or (XOR) and a 32-bit barrel shifter. The control path of the RC6 encryptor
consists of a state machine/controller unit that controls the various modes of operation of
the cipher. Other major components of the design that implement the glue logic include
shift registers, multiplexers/demultiplexers, serial-in parallel-out (SIPO), parallel-in
serial-out (PISO) and parallel-in parallel-out (PIPO) registers. As an example, the SIPO register
takes a stream of 32-bit words and converts every four consecutive words to four parallel
output words.
Figure 4.3: Realization of RC6 Encryption in Hardware
The implementation of the RC6 encryption algorithm is based on a single-stage iterative
architecture. This particular architectural option involves the hardware for one round of
encryption. The control path of the encryptor is designed such that data flows through the
RC6 core for a total of 20 rounds as required for the AES submission.
4.4.1 RC6 Datapath
The datapath for the RC6 encryption, as illustrated in Figure 4.4, consists of two parallel
functional pipes. First there are two initial modulo $2^{32}$ additions of two of the 32-bit data
blocks with the two corresponding subkeys. These additions are executed in parallel. The
other two 32-bit data blocks are pushed into the two identical functional pipes without any
modifications. Each of the two functional pipes, as indicated by the operations within the
two dotted traces in Figure 4.4, comprises a number of operations. The 32-bit data
is first fed into a quadratic function $f(x) = 2x^2 + x$ used in the cipher for enhancing the rate
of diffusion, thereby improving the security of the cipher. Since $f(x) = x(2x + 1)$, the quadratic
function can be implemented using an addition followed by a multiplication operation. Here it
should be noted that all these are modulo $2^{32}$ operations. The 32-bit output of the quadratic function
goes through a left circular shift or rotation of 5 bits. This is followed by a 32-bit XOR
operation, another left rotation, and another modulo $2^{32}$ addition with the 32-bit subkey for
this particular round. The amount of rotation is determined by the least significant five bits
coming from the other path. Note that the two functional pipes are not independent of each
other. Before the end of the first round, the four 32-bit modified words are swapped. This is
done to increase the nonlinearity of the scheme. Finally, the four 32-bit outputs of the two
functional pipes are stored back into the four 32-bit registers. The same process is repeated
for a total of twenty rounds. Upon completion of twenty iterations, two of the four 32-bit
output blocks undergo final 32-bit additions with the last two subkeys. The other two 32-bit
output blocks are passed out without these final additions.
Figure 4.4: Functional Representation of RC6 Datapath
4.4.1.1 Design of 32-bit Barrel Shifter
One of the major concerns in the design of the RC6 core has to do with the data-dependent
rotations. We have to look for an implementation that would take constant time for these
rotations, irrespective of the size of the rotation. The need for constant time rotations
stems from the fact that the RC6 algorithm is vulnerable to the timing attack [19] that
may ultimately lead to breaking the cipher. This attack exploits the fact that it takes a
variable amount of time to encrypt different plaintexts. This vulnerability occurs if the
data-dependent rotations take a time that is a function of the data.
The solution to this problem has to do with the way we implement these data-dependent
rotations. A barrel shifter is a device that can shift any number of bits in one clock cycle.
We have designed a 32-bit barrel shifter at the behavioral level and our implementations in
FPGAs reveal the following synthesis results:
• Maximum delay for the data to be shifted = 4.88 ns
• Total number of CLBs used = 369 (5.2 % of the total available CLBs)
Although the total number of configurable logic blocks is much greater than for a normal serial
shifter, the barrel shifter is much faster and takes only one CLB propagation delay time for
any size rotation.
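One common way to obtain a rotation whose cost does not depend on the rotation amount is a logarithmic network of multiplexer stages, sketched below in Python. This structure is an assumption made for illustration: the design above was written at the behavioral level, and the source does not state which structure the synthesis tool produced. The names `rotl_fixed` and `barrel_rotl` are hypothetical.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def rotl_fixed(x, k):
    """Rotate left by a fixed amount k (pure wiring in hardware)."""
    k %= WIDTH
    return ((x << k) | (x >> (WIDTH - k))) & MASK


def barrel_rotl(x, amount):
    """Rotate left through five mux stages of 1, 2, 4, 8 and 16 positions."""
    for stage in range(5):
        rotated = rotl_fixed(x, 1 << stage)
        # Two-way multiplexer controlled by one bit of the rotation amount.
        x = rotated if (amount >> stage) & 1 else x
    return x
```

Every input passes through the same five stages whatever the value of `amount`, so the data path length is independent of the data, which is the property needed to resist the timing attack.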
4.4.1.2 Design of 32-bit Adder
The implementation of a fast, low complexity 32-bit adder involves the consideration of a
number of design choices.
We first explored a 32-bit carry ripple adder (CRA) implementation in an FPGA. A
32-bit CRA is made up of 32 stages of full adders, with the carry out of the preceding stage
feeding in as the carry in to the following one. When implemented in an FPGA, the synthesis
results are as follows:
• Maximum delay = 173.21 ns
• Total number of CLBs used = 32 (0.45 % of available FPGA resources)
The reason for this large delay is the way the 32-bit CRA is constructed as shown in
Figure 4.5. It involves 32 CLB delays because the full adder at stage i has to wait for a possible
carry from stage i - 1, which in turn has to wait for a possible carry from stage i - 2, and
so on.
A carry lookahead adder (CLA) is the fastest of all adders as it has a maximum delay of
four logic levels, irrespective of the adder width [53]. The components of this timing delay
are one logic level for the carry propagate, carry generate, and partial sum signals, two delays
for the carry signals, and one more delay for assimilating the carries and the partial sums.
Figure 4.5: Configurable Logic Block Schematic of the 32-bit Carry Ripple Adder
However, it is not practically feasible to implement a CLA that adds numbers greater than
eight bits because of the very high complexity and the limitations that arise from potentially
high fan-in and fan-out requirements. As a consequence, implementing a pure 32-bit CLA is
not a likely option.
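The difference between the two carry schemes can be seen in the following bit-level Python sketch. `ripple_add` walks the carry through every stage in turn, whereas `cla4` computes all carries of a 4-bit block directly from the generate and propagate signals. Both functions are illustrative models with hypothetical names, not the synthesized designs.

```python
def ripple_add(a, b, width=32, carry_in=0):
    """Carry ripple adder: stage i waits for the carry out of stage i - 1."""
    carry, total = carry_in, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def cla4(a, b, carry_in=0):
    """4-bit carry lookahead: every carry is a flat function of g, p, carry_in."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]    # carry generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # carry propagate
    c0 = carry_in
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))   # partial sum XOR carry
    return total, c4
```

The carry expressions already have up to five product terms at four bits; they keep growing with the width, which is the fan-in problem noted above for wide pure CLAs.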
Another way to design a practical CLA is to alter the basic design principle of the
ripple carry lookahead adder (RCLA), which uses 4-bit CLA blocks with an inter-block ripple.
However, our synthesis studies in relation to implementations in FPGAs reveal that a much
more efficient implementation is to design a hierarchical carry lookahead, also known as block
carry lookahead adder (BCLA) [54]. In this particular design approach, we ripple the carries
within the 4-bit adders, and generate carries between the 4-bit adder blocks using CLAs.
This particular adder implementation results in the following numbers:
• Maximum delay = 70 ns
• Total number of CLBs used = 60 (0.85 % of the available CLBs)
The maximum delay associated with this implementation is reduced by a factor of 2.45 over the
32-bit CRA, but at the same time the hardware complexity is increased by a factor of 1.873.
Other 32-bit adder design choices that have been investigated include a pure 32-bit carry
select adder (CSEA) as well as a 32-bit CSEA that incorporates a 4-bit CRA as the basic
unit. A carry select adder is an adder organization that introduces redundant hardware to
make the carry calculations go faster [55].
The adder finally selected for our design is a hybrid of CLA and CSEA. Our synthesis
studies using FPGAs reveal that such a design is preferred on the basis of its speed. This
particular design uses a modular architecture, wherein a 4-bit pure CLA is used to form
an 8-bit CSEA and an 8-bit CSEA is then used to develop a 16-bit CSEA and so on. Our
FPGA implementation of this design yields the following results:
• Maximum delay = 39.32 ns
• Total number of CLBs used = 191 (2.19 % of the available FPGA resources)
These results suggest that the hybrid approach is very fast (an improvement by a factor of
4.4 over the CRA implementation). But since there is always a trade-off between speed
and hardware complexity, this design achieves this high speed at the expense of increasing
hardware complexity.
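The modular construction described above can be sketched recursively in Python. In this model the 4-bit block is represented behaviorally (it stands in for the pure 4-bit CLA), and each wider adder computes its upper half twice, once for each possible incoming carry, then selects the correct result. The names `cla4`, `csea`, and `add32` are hypothetical.

```python
def cla4(a, b, cin):
    """Stand-in for the 4-bit carry lookahead block (modeled behaviorally)."""
    s = a + b + cin
    return s & 0xF, s >> 4


def csea(a, b, cin, width):
    """Carry select adder of `width` bits built from two half-width adders."""
    if width == 4:
        return cla4(a, b, cin)
    half = width // 2
    lo_mask = (1 << half) - 1
    lo_sum, lo_carry = csea(a & lo_mask, b & lo_mask, cin, half)
    # Redundant hardware: the upper half is evaluated for both carry values.
    hi0 = csea(a >> half, b >> half, 0, half)
    hi1 = csea(a >> half, b >> half, 1, half)
    hi_sum, carry_out = hi1 if lo_carry else hi0    # multiplexer on the low carry
    return (hi_sum << half) | lo_sum, carry_out


def add32(a, b):
    """32-bit addition modulo 2**32; the final carry is dropped."""
    return csea(a & 0xFFFFFFFF, b & 0xFFFFFFFF, 0, 32)[0]
```

The duplicated upper halves are the source of the extra CLBs: the adder trades area for a shorter carry path, consistent with the speed and complexity figures reported above.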
4.4.1.3 Design of 32-bit XOR
A digital hardware component is needed to perform the 32-bit bitwise exclusive-or operation.
The timing and FPGA resource reports for the synthesized 32-bit XOR unit are as follows:
• Maximum delay = 4.88 ns
• Total number of CLBs used = 16 (0.2 % of FPGA CLB resources)
These results suggest the cost effectiveness of this simple operation in the context of
implementing an encryption algorithm in an FPGA.
4.4.1.4 Design of 32 x 32 "Partial" Integer Multiplier
The design of an efficient multiplier has been the cornerstone of the RC6 core implementation
in FPGAs. In particular, we need a 32 x 32 "partial" integer multiplier to compute integer
multiplication modulo $2^{32}$. This implies that we need to obtain only the least significant 32
bits of the 64-bit product. Our initial behavioral level implementation of the multiplier was
very discouraging. The synthesis results for the implementation of the partial multiplier in
the target FPGA furnished the following numbers:
• Maximum delay = 294 ns
• Total number of CLBs used = 551 (7.8 % of the available CLBs)
This low speed, less efficient FPGA implementation of the multiplier forced us to
investigate a structural design option for the multiplier. In this respect, we have considered
a number of multiplier designs and their implementation in FPGAs. The different
multiplier architectures considered from the FPGA implementation perspective include a serial
multiplier [56], ripple carry array multipliers [57], row adder tree designs [58], parallel array
multipliers [59], lookup table (LUT) multipliers [60], and Wallace tree multipliers [61].
The serial multiplier is the one that uses a serial adder for computing the partial sums and
therefore produces the product at a rate of one bit per multiplier cycle. The operational time
for this type of multiplier is $3n\tau$ per cycle and $3n^2\tau$ for the entire multiplication process,
where $\tau$ represents the delay of one logic level and n is the operand size in bits. Clearly,
implementation of this sort of multiplier design is extremely slow in FPGAs.
A ripple carry array multiplier (also known as row ripple architecture) is an unrolled
embodiment of the classic shift-add multiplication algorithm. This sort of design has a
maximum delay of the order $2n\tau$, if one ignores the routing delays. Implementations in
FPGAs of this particular structure suggest that it does not make efficient use of the logic
available inside the target FPGA and is found to be slower than many other implementations.
A variant of the ripple carry array multiplier is the row adder tree multiplier that uses
an optimized form of row ripple. Basically, the gate count is the same as for the row ripple one, but
it improves on the delay by arranging the row adders in a tree. But it has been found that
routing such a design in an FPGA is very cumbersome and the design seems to be workable
in certain FPGAs only [62].
In principle, we can evaluate any finite function by using a lookup table (LUT) that is
addressed with the arguments for the evaluation and whose output is the result of the
evaluation. In theory, this furnishes the fastest possible implementation, as no actual arithmetic
is needed. In the case of multiplication, however, the use of a single lookup table is not
practical for any but the smallest operands. This is because the table size grows exponentially.
LUT multipliers are simply a block of memory containing a complete multiplication table of
all possible input combinations. The large table sizes needed for even modest input widths
make these impractical for implementations in FPGAs, especially with their limited on-chip
memory. For instance, a single table for 32-bit x 32-bit integer multiplication would have a
size of $2^{64}$ words x 64 bits, which is simply out of the question.
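The exponential growth is easy to quantify. The short Python sketch below computes the storage of a full n x n multiplication table with 2n-bit entries; the function name `lut_multiplier_bits` is hypothetical.

```python
def lut_multiplier_bits(n):
    """Storage, in bits, of a full n x n multiplication table with 2n-bit entries."""
    words = 2 ** (2 * n)        # one entry per pair of n-bit operands
    return words * 2 * n


bits_4 = lut_multiplier_bits(4)      # 256 words x 8 bits
bits_32 = lut_multiplier_bits(32)    # 2**64 words x 64 bits
```

A 4-bit table needs 2048 bits, whereas the 32-bit table needs $2^{70}$ bits, which is why only very small operands are handled this way.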
A better multiplier design approach is the use of parallel array multipliers. Here the term
parallel multiplier refers to any multiplier that employs two or more adders in the adder
section. This implies that we can have multiple additions in one cycle. In this particular
design, we first have $n^2$ AND gates operating in parallel, generating the logical AND of bit i of
the multiplier and bit j of the multiplicand, called the multiplicand-multiples. This is followed
by a number of logic layers of carry save adders [63] to add different multiplicand-multiples
along with different carries that are generated. The last stage uses an n-bit carry propagate
adder (CPA) to sum the bits coming out from the penultimate stage. The operational time
is made up of $\tau$, where $\tau$ represents one logic level delay, for generating the $n^2$
multiplicand-multiples, $3\tau$ for each carry save adder, and the delay associated with the n-bit CPA ($T_{CPA}$).
This final n-bit adder can be any of the n-bit modulo $2^{32}$ adders discussed earlier. Here it
should be noted that each carry save adder stage is implemented using full adders as well as
half adders.
Varieties of parallel-array multipliers, for operand sizes ranging from four bits to sixteen
bits, have been commercially available for many years. However, for larger operands, high
fan-out requirements and costs are generally prohibitive. One of the major contributors to
the total delay associated with the parallel-array multipliers has to do with the way the
carry save adders are arranged in each row of the array. Taken to its logical conclusion, the
approach of having as much concurrency as possible in the operation of the carry save adders
yields a class of multipliers known as Wallace tree multipliers or simultaneous multipliers [61].
The basic design principle for this kind of multiplier is explained as follows. First, all
n multiplicand-multiples for n-bit x n-bit multiplication are generated concurrently as $n^2$
ANDs. Then a number of addition steps follow. Assume $n = 3l_0 + m_0$, where $0 \le m_0 \le 2$.
In the first addition step, each of the $l_0$ triplets is reduced in a carry save adder to two
outputs. At this point we have $2l_0 + m_0 = 3l_1 + m_1$ outputs, where $0 \le m_1 \le 2$. In the
second addition step, each of the $l_1$ triplets is once again reduced in a carry save adder to two
outputs. This process is repeated until only two outputs are left and these are then fed into
a carry propagate adder (CPA). As each step reduces the number of outputs by a factor of
3/2, the complete process of reduction to two outputs takes approximately $\lceil \log_{1.5}(n/2) \rceil$ steps.
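The reduction schedule can be counted exactly by following the recurrence above. The Python sketch below applies $n \to 2l + m$ repeatedly until two outputs remain; the function name `wallace_levels` is hypothetical.

```python
def wallace_levels(n):
    """Count the carry save reduction steps needed to go from n operands to two."""
    levels = 0
    while n > 2:
        triplets, leftover = divmod(n, 3)
        n = 2 * triplets + leftover     # each triplet becomes two outputs
        levels += 1
    return levels
```

For n = 32 the operand count falls as 32, 22, 15, 10, 7, 5, 4, 3, 2, which is eight steps, in line with the eight full adder delays quoted for the multiplier later in this section.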
A Wallace tree is an implementation choice that is designed for minimum propagation
delay. A Wallace tree approach rearranges the wiring so that the partial product bits with the
longest delays are wired closer to the root of the tree. This changes the delay characteristics
from $O(n^2)$ to $O(n \log n)$ without increasing the hardware in comparison to the parallel array
approach. So, finally, we find this approach to be quite suitable for implementation in FPGAs.
An important consideration at this point is the requirement of producing only the least
significant 32 bits of the 64-bit product for our partial multiplier. As a consequence, we
could easily exploit the concurrency (i.e. the parallel adder columns) that comes with the
Wallace tree design. A high-level organization of the Wallace tree multiplier for our 32 x 32
partial integer multiplier is shown in Figure 4.6. Here it should be noted that the adder
blocks labelled as CSA in Figure 4.6 represent the carry save adders. Basically, the carry
save adders can be implemented using full adders with carry inputs. A carry save adder
essentially takes in three inputs and generates two outputs, the carry-out C and the sum bit
S. In other words, after each stage of addition, the carry save adder reduces the number of
inputs going into the next stage by one.
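A word-level Python model of this scheme is sketched below. `carry_save_add` compresses three 32-bit rows into a sum row and a carry row without propagating carries, and `partial_multiply` applies it level by level to the 32 multiplicand-multiples, truncating everything to 32 bits since only the low half of the product is needed. The grouping of rows into triplets is a simplification of a true Wallace tree, and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def carry_save_add(x, y, z):
    """3:2 compression: one full adder per bit position, no carry propagation."""
    s = x ^ y ^ z                                    # sum bits S
    c = ((x & y) | (x & z) | (y & z)) << 1           # carry bits C, weighted by two
    return s & MASK, c & MASK


def partial_multiply(a, b):
    """Low 32 bits of a * b via carry save reduction of the multiplicand-multiples."""
    # One row per multiplier bit: the multiplicand ANDed with that bit, shifted.
    rows = [((a << i) & MASK) if (b >> i) & 1 else 0 for i in range(32)]
    while len(rows) > 2:
        reduced = []
        for k in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[k], rows[k + 1], rows[k + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])   # leftovers pass through
        rows = reduced
    return sum(rows) & MASK      # final carry propagate adder
```

Masking to 32 bits at every level corresponds to leaving out the hardware for the upper half of the product.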
Figure 4.6: A High Level Organization of Wallace Tree Multiplier
The extra hardware needed to compute the most significant 32 bits of the 64-bit product
is removed. In other words, we are using only the hardware that is required to produce the
partial products used in generating the least significant 32 bits of the 64-bit product. The
total delay consists of one delay associated with the hardware producing the
multiplicand-multiples, eight full adder delays, plus a final delay for the 32-bit CPA. The final 32-bit CPA
is implemented using the hybrid adder. Our synthesis of this multiplier design in the target
FPGA furnished the following numbers:
• Maximum delay = 79 ns (an improvement by a factor of 3.8 over the original synthesized
behavioral description)
• Total number of CLBs used = 930 (13 % of the available CLBs)
An interesting observation has to do with the maximum delay of 79 ns for the
FPGA implementation of the multiplier. Almost half of the total delay (i.e. 39 ns) is
contributed by the final stage 32-bit adder.
4.4.2 RC6 Control Path Design
The control path for the RC6 encryptor consists of two major functional units besides some
glue logic. These are a global state machine unit and a data flow controller used for ensuring
the proper operation of various components during the encryption process.
4.4.2.1 RC6 Global State Machine
The design of a synchronous finite state machine is vital to the proper operation of the RC6
encryptor, as the cipher cycles through a number of modes of operation. These include
reset mode, key-download mode, plaintext data-download mode, idle mode, and the
data-encrypt mode. A component interface representation of the RC6 encryptor is illustrated
in Figure 4.7. Here it should be noted that the RC6 encryptor design essentially encrypts
the data only. However, decryption can be accomplished with minor modifications to the
original design. The cipher has three asynchronous inputs (RESET-CHIP, KO, and DO), a
32-bit data-in, a 32-bit data-out, a clock input and finally a status flag output. The status
flag is used to indicate the mode of operation of the cipher.
Figure 4.7: A Component Interface of RC6 Encryptor
When the cipher is powered up, it is in the reset mode, with all the registers and flip-flops
being initialized. Now, in order to start the normal operation of the cipher, the reset-chip
input is disabled, and the cipher enters into the key-download mode. During the
key-download mode, the forty-four 32-bit subkeys are downloaded into the cipher key storage
unit. Once all forty-four subkeys are downloaded into the encryptor (indicated by the status
flag), then depending upon the choice of the user-select inputs KO and DO, the cipher
can either proceed forward to start downloading the plaintext or it can go into the idle
mode. The idle mode is a feature incorporated into the design of the state machine so that
we can have enough flexibility among the key-download, data-download, and data-encrypt
stages. During the normal operation of the cipher, once the keys have been downloaded,
the cipher enters into the plaintext data-download mode. Once the 128-bit plaintext is
downloaded, the cipher enters the data-encrypt mode. During the data-encrypt mode, the
plaintext undergoes twenty rounds of encryption, finally coming out of the cipher as the
128-bit ciphertext (i.e. as 128-bit encrypted data). From then onwards, the cipher can
synchronously download the 128-bit data and encrypt it. The component interface of the
RC6 global state machine is shown in Figure 4.8. The design of the RC6 state machine
is based on a synchronous finite state machine with asynchronous inputs. The RC6 state
machine has a number of synchronous inputs such as DONE-4, DONE-44, DONE-OUT, and
COUNT-20. These inputs to the state machine come from different counters used during
the execution of different modes of operation of the cipher. As well, we have a number of reset
and enable outputs that feed into the datapath of the cipher and make sure that different
registers and multiplexers/demultiplexers are enabled or disabled at the appropriate time.
A sample VHDL code for the RC6 state machine design is included in the Appendix A.
Essentially, the RC6 state machine is based on the Moore finite state machine approach [64].
This finite state machine model is characterized by the fact that the outputs are identified
solely with the present state of the device. Our global state machine has been designed keeping
in mind that our main intent is to investigate the efficiency of the encryption algorithms
from the hardware implementation perspective. As a result, the state machine design cannot
be regarded as a very robust one. This is because the design does not fully account for
error conditions. Obviously, one can spend more resources to develop a more robust and
sophisticated state machine for the design, but that has not been the focus of our attention.
Figure 4.8: A Component Interface Representation of RC6 Global State Machine
4.4.2.2 RC6 Data Flow Controller
A simple controller has been incorporated in the design of the RC6 encryptor. The controller
is needed to enable and disable a number of registers and multiplexers during the
data-encrypt mode of the cipher. In other words, it is used to regulate the flow of data
through the RC6 core and update the various registers after each round of encryption,
thereby allowing a feedback connection from the output of the RC6 core to its input. As
soon as the global state machine control unit forces the cipher to go into data-encrypt mode,
the flow controller starts operating and continues until the cipher exits this mode, once the
data has been encrypted.
4.4.3 Key Storage for RC6
One of the major concerns in the design of the RC6 cipher has to do with storage of the
forty-four 32-bit subkeys that are to be used during the data-encrypt mode of the cipher.
Our design of the RC6 cipher assumes that the subkeys for encryption are being generated
outside the FPGA using the key schedule scheme for the cipher. The details of the RC6 cipher
key schedule scheme are outlined in [3].
During the key-download mode of the RC6 cipher, the forty-four 32-bit subkeys are being
downloaded into the key storage unit inside the cipher. Thus the forty-four clock cycles
needed to store the subkeys inside the cipher constitute the key-setup time. Our implementation
of the key storage unit involves using a combination of serial-in parallel-out and
parallel-in parallel-out registers. The former is used during the storage of the forty-four 32-bit
words. During the data-encrypt mode, the forty-four subkeys are fed into the RC6
core using the parallel-in parallel-out register. This implementation of the key storage unit
consumes a total of 704 configurable logic blocks (CLBs). This implies that the key storage
unit takes up 10% of the available FPGA resources.
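The 704-CLB figure is consistent with holding the subkeys in CLB flip-flops. The following is a minimal sketch of that arithmetic, not the synthesis flow itself. Two values are assumptions that this section does not state: `FLIP_FLOPS_PER_CLB` (two storage elements per CLB, as in the XC4000 family) and `DEVICE_CLBS` (7056, taken here as the capacity of the target device).

```python
# Assumption: each CLB of the target family provides two flip-flops.
FLIP_FLOPS_PER_CLB = 2
# Assumption: total CLB count of the target device; not stated in this section.
DEVICE_CLBS = 7056


def register_clbs(num_words: int, word_bits: int) -> int:
    """CLBs needed to hold num_words x word_bits in CLB flip-flops (rounded up)."""
    total_bits = num_words * word_bits
    return -(-total_bits // FLIP_FLOPS_PER_CLB)  # ceiling division


rc6_key_clbs = register_clbs(44, 32)
print(rc6_key_clbs)                             # 704
print(round(100 * rc6_key_clbs / DEVICE_CLBS))  # 10 (percent of the device)
```

Under the same assumption, a 128-bit input register needs `register_clbs(4, 32)` = 64 CLBs, which matches the figure quoted for the input data storage.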
4.4.4 Simulation and Synthesis Results
This section presents the findings of our simulation and synthesis studies for the complete
design of the RC6 cipher. As explained earlier, the cipher operates in five different modes
- reset mode, key-download mode, idle mode, data-download mode, and finally the
data-encrypt mode.
As the simulation runs for the design are very extensive, only the portions that concern
each mode of global state machine operation are illustrated in Appendix B. When the cipher
is powered up, it is in the reset mode as indicated by the status flag ('7', decimal equivalent
of "111"). Next, when the RESET-CHIP input to the cipher is disabled, the cipher moves
into the key-download mode. During this mode of operation (status flag = '1', decimal
equivalent of "001"), the forty-four 32-bit subkeys are being downloaded. Once the forty-four
subkeys are downloaded, the status flag changes to a new value ('2', decimal equivalent
of "010") and then the cipher enters the idle mode as indicated by the status flag ('0',
decimal equivalent of "000"). Now depending upon the user-controlled KD and DD inputs,
the cipher can move into any other mode of operation. As seen from the simulation figure,
the cipher then enters the data-download mode (KD = '0', DD = '1' and status flag =
'3'). During this mode of operation, the 128-bit data is downloaded into the cipher in four
clock cycles as the cipher has 32-bit I/O buses. Once the 128-bit plaintext is downloaded
(status flag = '4'), the cipher then enters the data-encrypt mode (status flag = '5'). After
20 rounds of encryption, the ciphertext data is latched out in four clock cycles (status flag =
'6'). Unless directed otherwise, the cipher then goes back to download the data and encrypt
it in a synchronous fashion.
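The mode sequence above can be summarised as a small Moore-style model in which the status flag is simply the encoded present state. This is a behavioural sketch only; the actual VHDL is in Appendix A. The state names `KEYS_LOADED`, `DATA_LOADED` and `DATA_OUT`, the function `next_mode`, and the idle-mode handling of KD are hypothetical labels and assumptions introduced for illustration; the numeric codes are the status-flag values quoted in the text.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Status-flag codes quoted in the simulation walk-through."""
    IDLE = 0
    KEY_DOWNLOAD = 1
    KEYS_LOADED = 2      # hypothetical name for status flag '2'
    DATA_DOWNLOAD = 3
    DATA_LOADED = 4      # hypothetical name for status flag '4'
    DATA_ENCRYPT = 5
    DATA_OUT = 6         # hypothetical name for status flag '6'
    RESET = 7


def next_mode(mode, reset_chip, kd, dd, done_44, done_4, count_20, done_out):
    """Assumed next-state function; the output (status flag) depends on state only."""
    if reset_chip:
        return Mode.RESET
    if mode == Mode.RESET:
        return Mode.KEY_DOWNLOAD
    if mode == Mode.KEY_DOWNLOAD:
        return Mode.KEYS_LOADED if done_44 else mode
    if mode == Mode.KEYS_LOADED:
        return Mode.IDLE
    if mode == Mode.IDLE:
        if kd:                      # assumption: KD requests a fresh key download
            return Mode.KEY_DOWNLOAD
        return Mode.DATA_DOWNLOAD if dd else mode
    if mode == Mode.DATA_DOWNLOAD:
        return Mode.DATA_LOADED if done_4 else mode
    if mode == Mode.DATA_LOADED:
        return Mode.DATA_ENCRYPT
    if mode == Mode.DATA_ENCRYPT:
        return Mode.DATA_OUT if count_20 else mode
    # DATA_OUT: go back to downloading data once the ciphertext is latched out
    return Mode.DATA_DOWNLOAD if done_out else mode


mode, trace = Mode.RESET, [Mode.RESET]
for _ in range(8):
    mode = next_mode(mode, reset_chip=False, kd=False, dd=True,
                     done_44=True, done_4=True, count_20=True, done_out=True)
    trace.append(mode)
print([int(m) for m in trace])  # [7, 1, 2, 0, 3, 4, 5, 6, 3]
```

With every counter flag asserted, the trace reproduces the status-flag sequence described in the walk-through and then loops back to the data-download mode.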
As a verification of the design functionality, we adopt a bottom-up approach by testing
each individual component and subcomponent thoroughly until we verify the correct
operation of the entire design. Each component is also synthesized and a postsynthesis
simulation is carried out to make sure that we generate the right hardware. In addition,
we use test vectors such as previously encrypted plaintext-ciphertext pairs and
encryption subkeys to verify the correct operation of the cipher. For instance, a plaintext
818C555EB18C555E2AA502AB2AA502AB (a 128-bit plaintext block represented in hexadecimal
form) is loaded and encrypted with the forty-four 32-bit subkeys (FFEFFFFF...)
stored inside the cipher. The resulting ciphertext is C749B1640A9DB0EA12579B32F94CC59D.
These simulation results have been obtained after carrying out both functional simulation
and timing simulation after the actual place and route of the RC6 design in the target
Xilinx XC40200 FPGA device. Here it should be noted that we are assuming that the
subkeys for encryption are being generated outside the FPGA using the key schedule scheme
for the cipher and that they are being downloaded into the key storage unit. Neglecting the
key-setup time, a single encryption time $T_{encr}$ is given as follows:
$T_{encr} = (4 + 20 + 4)\,T_{clk} = 28\,T_{clk}$    (4.1)
Here it should be noted that $T_{clk}$ represents the minimum clock period and is set by the
maximum combinational path delay of the design. Moreover, since there are 32-bit I/O
buses, it will take four clock cycles to download the 128-bit plaintext data and an
equal number of clock cycles to latch out the data onto the 32-bit output bus; together with
the twenty encryption rounds, this accounts for the 28 clock cycles. Our synthesis
process results in the following numbers:
• Minimum clock period = 146 ns
• Maximum clock frequency = 6.85 MHz
• Time for one encryption = $T_{encr}$ = 4088 ns
• Rate of encryption = 31.3 Mbps
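The figures in this list follow directly from the clock period and the cycle count. A minimal sketch of the arithmetic is given below; the function names `encryption_time_ns` and `rate_mbps` are hypothetical helpers introduced for illustration.

```python
def encryption_time_ns(rounds: int, t_clk_ns: float, io_cycles: int = 4) -> float:
    """Cycles to load the block, run the rounds, and latch out, times the clock period."""
    cycles = io_cycles + rounds + io_cycles
    return cycles * t_clk_ns


def rate_mbps(block_bits: int, time_ns: float) -> float:
    """Throughput in Mbit/s for one block processed in time_ns nanoseconds."""
    return block_bits / time_ns * 1000.0


T_CLK_NS = 146                       # minimum clock period reported above
t_encr = encryption_time_ns(20, T_CLK_NS)
print(t_encr)                        # 4088
print(round(1000 / T_CLK_NS, 2))     # 6.85 (MHz)
print(round(rate_mbps(128, t_encr), 1))  # 31.3 (Mbps)
```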
The hardware needed for the RC6 encryption is 4944 CLBs for the RC6 core plus 704
CLBs for the key storage unit and another 64 CLBs for storing the 128-bit input data. Besides
this, there is also some data flow and control logic overhead. Thus the total FPGA resources
required is about 6450 CLBs, which implies that the design takes up 91% of the available
CLBs in the target device.
4.5 Design of CAST-256
CAST-256 is the other AES candidate that has been implemented in FPGAs in order to
investigate its efficiency from the hardware implementation perspective. The cipher as realized
in hardware is illustrated in Figure 4.9.
Figure 4.9: CAST-256 Encryption in Hardware
The design of the CAST-256 cipher is based on a single-stage iterative architecture. As
mentioned earlier, this design approach involves generating the hardware for one round of
encryption and then designing a control path that allows a feedback connection from the
output of the single-round hardware to its input. As a consequence, we cycle the original
plaintext input through the single-stage hardware for the required number of rounds (or
iterations), finally latching the encrypted data out at the end of the required number of
encryption rounds.
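The single-stage iterative idea can be sketched in a few lines: one round function is reused, and a feedback register carries its output back to its input. The names `run_single_stage` and `toy_round` are hypothetical, and the toy round function is only a stand-in for the real round hardware.

```python
def run_single_stage(round_fn, plaintext, round_keys):
    """Model of one round of hardware reused for every round via a feedback register."""
    feedback_reg = plaintext                 # initial load through the feedback multiplexer
    for key in round_keys:                   # one pass through the round logic per round
        feedback_reg = round_fn(feedback_reg, key)
    return feedback_reg                      # value latched into the output register


def toy_round(state, key):
    """Stand-in round function (not a cipher): XOR the key into a 32-bit state."""
    return (state ^ key) & 0xFFFFFFFF


print(run_single_stage(toy_round, 0, [1, 2, 4]))  # 7
```

The trade-off is the usual one for this architecture: hardware for only one round is needed, at the cost of one clock cycle per round.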
4.5.1 CAST-256 Datapath
The CAST-256 datapath basically consists of a generic round function, a number of
multiplexers, demultiplexers, a feedback register, a feedback multiplexer and a final output
register. The generic round function module realizes any of the three round functions f1, f2,
or f3 depending on the particular round in progress.
4.5.1.1 Generic Round Function
The generic round function module consists of four 32-bit add/subtract/exclusive-or units, a
separate 32-bit XOR module, a 32-bit barrel shifter and four 8 x 32 S-boxes, namely, S1, S2,
S3, and S4. The generic round function module is shown in Figure 4.10.
The generic round function module receives two 32-bit inputs from the two 4 x 1
multiplexers, in addition to inputs from the masking key storage and rotation key storage units.
The generic round function module also receives control inputs from the control unit for the
cipher. The generic round function has only one 32-bit output, because only one of the four
32-bit blocks A, B, C, or D is modified at the end of each round of encryption. The 32-bit
input which is to be modified by the generic round function module, as well as the additional
32-bit input that is fed into the separate XOR unit as a final process inside the module,
are selected by the two 4 x 1 multiplexers outside the round function module. These
selections are dictated by the control signals that emanate from the control unit of the cipher.
Figure 4.10: The Generic Round Function Module
4.5.1.2 The S-Box Design
As illustrated in the block diagram representation of the generic round function module in
Figure 4.10, the 32-bit output coming out of the 32-bit barrel shifter is split into four 8-bit
vectors used as four sets of 8-bit address lines for the four 8 x 32 S-boxes. The S-boxes are
implemented as lookup tables (LUTs). The hardware complexity of the cipher is primarily
due to the complex structures of the S-boxes. The two hundred and fifty-six 32-bit values
stored in each of these S-boxes are tabulated in [4].
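To make the datapath concrete, the following sketch models the generic round function in software: a key-mixing unit, the barrel shifter, the byte split into four S-box addresses, and the three combining operations. It assumes the round-function definitions f1, f2 and f3 of the published CAST-256 specification [4]. The S-box contents are random placeholders, not the tabulated CAST-256 values, and all function names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF
_rng = random.Random(0)
# Placeholder tables only: the real CAST-256 S-box values are tabulated in [4].
SBOX = [[_rng.getrandbits(32) for _ in range(256)] for _ in range(4)]


def rotl32(x: int, r: int) -> int:
    """32-bit left rotation, the job of the barrel shifter."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & MASK32


def split_bytes(x: int):
    """Split a 32-bit word into four 8-bit S-box addresses, most significant first."""
    return (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF


def generic_round(kind: int, d: int, km: int, kr: int) -> int:
    """kind selects f1, f2 or f3; km is the masking key, kr the 5-bit rotation key."""
    s1, s2, s3, s4 = SBOX
    if kind == 1:
        i = rotl32((km + d) & MASK32, kr)
    elif kind == 2:
        i = rotl32(km ^ d, kr)
    else:
        i = rotl32((km - d) & MASK32, kr)
    a, b, c, e = split_bytes(i)
    if kind == 1:
        out = ((s1[a] ^ s2[b]) - s3[c]) + s4[e]
    elif kind == 2:
        out = ((s1[a] - s2[b]) + s3[c]) ^ s4[e]
    else:
        out = ((s1[a] + s2[b]) ^ s3[c]) - s4[e]
    # A single final mask suffices: +, - and XOR all act consistently modulo 2**32.
    return out & MASK32
```

For example, `generic_round(1, 0x01234567, 0xFFEFFFFF, 7)` returns a 32-bit word; its value depends on the placeholder tables and is therefore not a CAST-256 test vector.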
Our synthesis results for the 8 x 32 S-boxes are presented here:
• Total number of CLBs used for each S-box = 411
• Maximum delay of each S-box = 62 ns
The sheer size of the four S-boxes and the associated delay contribute to the hardware
complexity and low speed of the generic round function module, which requires a total of
3037 CLBs (43% of the available FPGA resources), and has a maximum combinational path
delay of 202 ns.
4.5.2 CAST-256 Control Path Design
The CAST-256 control path consists of a global state machine unit with asynchronous inputs,
a data flow controller and some glue logic as is needed to integrate the control logic with the
cipher datapath.
4.5.2.1 CAST-256 Global State Machine
As in the case of RC6, here too the state machine unit is based on a Moore finite state
machine model [64]. The component interface of the CAST-256 state machine is presented in
Figure 4.11. The state machine unit design has a number of asynchronous inputs (RESET-CHIP,
KD, and DD) in addition to other inputs. The state machine is a synchronous one.
The CAST-256 state machine is different from the RC6 counterpart in the sense that the
former incorporates one more key-download state. As mentioned earlier, in the case of
CAST-256 we have two sets of keys - forty-eight 5-bit rotation keys and forty-eight
32-bit masking keys. Hence, we are concerned with an additional rotation-key-download state.
The six cipher modes are reset, masking-key-download, rotation-key-download, idle, plaintext
data-download, and data-encrypt.
Figure 4.11: CAST-256 State Machine Unit
4.5.2.2 CAST-256 Data Flow Controller
The design of the data flow controller is of prime importance to the proper operation of the
cipher, as it is responsible for regulating the flow of data through the CAST-256 core. A
component interface for the controller is shown in Figure 4.12. The controller receives the
counter output that is needed to stimulate certain control signals emanating from the control
path and feeding into the datapath. The controller provides synchronous control over the
generic round function module. It also provides control inputs for the feedback multiplexer
as well as for the two 4 x 1 multiplexers. The feedback multiplexer is used to loop back
the data after each round of encryption. The other two multiplexers decide which 32-bit
input data is to be modified in each round. The controller also supplies four control inputs
to the generic round function that decide which operation - addition, subtraction, XOR, or
rotation - is to be executed for operations a, b, c, and d in the round function.
Figure 4.12: CAST-256 Data Flow Controller
4.5.3 The Key Storage Unit
The key storage unit for the CAST-256 cipher comprises a masking keys storage part
and a rotation keys storage part. There are forty-eight 32-bit masking keys stored inside
the cipher to be fed to the generic round function. This is accomplished by designing a
serial-in parallel-out (SIPO) register, in conjunction with a parallel-in parallel-out (PIPO)
register. We have a similar arrangement for storing the forty-eight 5-bit rotation keys. Here,
it should be noted that the subkeys for encryption are assumed to be generated outside the
FPGA using the key schedule scheme for CAST-256 [4]. The keys are then downloaded into
the key storage units during the masking-key-download and rotation-key-download modes. This
implementation of storing the keys inside the FPGA uses around 1000 CLBs.
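A behavioural model may help picture the SIPO/PIPO arrangement: one word is shifted in per clock during key download, and the full set is then held for parallel read-out. The class `KeyStore` and its methods are hypothetical, and the exact wiring between the two registers is an assumption, since this section does not detail it.

```python
class KeyStore:
    """Assumed model: serial-in load into a SIPO register, then a parallel latch into a PIPO."""

    def __init__(self, depth: int, width: int):
        self.mask = (1 << width) - 1
        self.sipo = [0] * depth   # serial-in parallel-out shift register
        self.pipo = [0] * depth   # parallel-in parallel-out holding register

    def shift_in(self, word: int) -> None:
        """One clock of key download: every stage moves along by one word."""
        self.sipo = self.sipo[1:] + [word & self.mask]

    def latch(self) -> None:
        """Copy all stages into the PIPO register in a single clock."""
        self.pipo = list(self.sipo)


masking = KeyStore(depth=48, width=32)
for k in range(48):               # forty-eight clock cycles of download
    masking.shift_in(0xFFEF0000 + k)
masking.latch()
print(hex(masking.pipo[0]), hex(masking.pipo[47]))  # 0xffef0000 0xffef002f
```

After exactly `depth` shifts the first word downloaded sits in stage 0, so the round logic can index the subkeys in download order.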
4.5.4 Simulation and Synthesis Results
This section presents the findings of our simulation and synthesis studies for the complete
design of the CAST-256 cipher. As opposed to RC6, CAST-256 operates in six different
modes - reset mode, masking-key-download mode, rotation-key-download mode, idle mode,
data-download mode, and finally the data-encrypt mode.
As the simulation runs for the design are very extensive, only the portions that concern
each mode of global state machine operation are illustrated in Appendix C. When the cipher
is powered up, it is in the reset mode as indicated by the status flag ('15', decimal equivalent
of "1111"). Next, when the RESET-CHIP input to the cipher is disabled, the cipher moves
into the masking-key-download mode. During this mode of operation (status flag = '1',
decimal equivalent of "0001"), the forty-eight 32-bit masking subkeys are being downloaded.
Once the forty-eight subkeys are downloaded, the status flag changes to a new value ('2',
decimal equivalent of "0010") and then the cipher enters the rotation-key-download
mode as indicated by the status flag ('3', decimal equivalent of "0011"). Once the forty-eight
5-bit rotation keys are downloaded (status flag = '4'), the cipher then enters the idle mode as
indicated by the status flag ('0').
Now depending upon the user-controlled KD and DD inputs, the cipher can move into
any other mode of operation. As seen from the simulation figure, the cipher then enters
the data-download mode (KD = '0', DD = '1' and status flag = '5'). During this mode of
operation, the 128-bit data is downloaded into the cipher in four clock cycles as the cipher
has 32-bit I/O buses. Once the 128-bit plaintext is downloaded (status flag = '6'), the
cipher then enters the data-encrypt mode (status flag = '7'). After 48 rounds of encryption,
the ciphertext data is latched out in four clock cycles (status flag = '8'). Unless directed
otherwise, the cipher then goes back to download the data and encrypt it in a synchronous
fashion.
As a verification of the design functionality, we first encrypt a plaintext with a certain key
and then later on use the resulting ciphertext as input to the cipher and recover the original
plaintext. For the correct operation of the CAST-256 cipher, the encryption and the following
decryption should result in the original plaintext as long as the subkeys are fed in the
reverse order. For instance, a plaintext E2AAFC11E2AAFC11E2AAFC11E2AAFC11 (a 128-bit
plaintext block represented in hexadecimal form) is loaded and encrypted with the forty-eight
32-bit masking subkeys (FFEFFFFF...) and forty-eight 5-bit rotation subkeys stored
inside the cipher. The resulting ciphertext is 3BA95C83135B95DFC54D1C13297FC027. At
a later time, the ciphertext 3BA95C83135B95DFC54D1C13297FC027 is fed in as input to
the cipher with the subkeys applied in the reverse order to recover the original plaintext
E2AAFC11E2AAFC11E2A... . We also carry out presynthesis and postsynthesis testing of
each individual component in a bottom-up manner to verify the correct operation of our
design. Simulation results have been obtained after carrying out both functional simulation
and timing simulation after the actual place and route of the CAST-256 design in the
target Xilinx XC40200 FPGA device. Here it should be noted that we are assuming that the
subkeys for encryption are being generated outside the FPGA using the key schedule scheme
for the cipher and that they are being downloaded into the key storage unit. Neglecting the
key-setup time, a single encryption time $T_{encr}$ is given as follows:
$T_{encr} = (4 + 48 + 4)\,T_{clk} = 56\,T_{clk}$    (4.2)
Here it should be noted that $T_{clk}$ represents the minimum clock period and is set by the
maximum combinational path delay of the design. Moreover, since there are 32-bit I/O
buses, it will take four clock cycles to download the 128-bit plaintext data and an
equal number of clock cycles to latch out the data onto the 32-bit output bus. Our synthesis
process results in the following numbers:
• Minimum clock period = 198 ns
• Maximum clock frequency = 5.05 MHz
• Time for one encryption = $T_{encr}$ = 11088 ns
• Rate of encryption = 11.54 Mbps
The time for one encryption and the corresponding encrypted data rates apply to a
48-round CAST-256 cipher. However, if we had half the number of rounds for CAST-256,
i.e. twenty-four rounds in all, the encryption speed of the cipher would almost double, as the
constant 48 in the expression for $T_{encr}$ changes to 24, and this leads to a data rate of 20.2
Mbps. There are no known cryptanalytic attacks that have been applied to a 24-round
version of the cipher.
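The effect of halving the round count can be checked with a short calculation. It shows why the speed-up is "almost" rather than exactly two: the eight I/O cycles stay fixed. The helper `cast256_rate_mbps` is a hypothetical name introduced for illustration.

```python
T_CLK_NS = 198        # minimum clock period reported above
IO_CYCLES = 4 + 4     # data download plus latch-out over the 32-bit buses


def cast256_rate_mbps(rounds: int) -> float:
    """Data rate for one 128-bit block with the given number of rounds."""
    time_ns = (rounds + IO_CYCLES) * T_CLK_NS
    return 128 / time_ns * 1000.0


print(round(cast256_rate_mbps(48), 2))   # 11.54
print(round(cast256_rate_mbps(24), 1))   # 20.2
print(round(cast256_rate_mbps(24) / cast256_rate_mbps(48), 2))  # 1.75
```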
The hardware needed for the CAST-256 encryption is 3037 CLBs for the generic round
function plus 768 CLBs for masking keys storage and another 120 CLBs for storing the
rotation keys. Besides this, another 64 CLBs are required for storing the 128-bit input data.
The data flow and control logic overhead amounts to around 1000 CLBs. Thus the total
FPGA resources required is about 5050 CLBs, which implies that the design takes up 72%
of the available CLBs in the target device.
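The resource totals for both ciphers can be tallied as follows. This is a bookkeeping sketch under one assumption: `DEVICE_CLBS` = 7056 is taken as the capacity of the target device, a figure this section does not state. The component numbers are the ones quoted in the text.

```python
DEVICE_CLBS = 7056  # assumption: total CLBs in the target device

rc6 = {"core": 4944, "key storage": 704, "input register": 64}
cast256 = {
    "round function": 3037,
    "masking keys": 768,
    "rotation keys": 120,
    "input register": 64,
    "data flow and control (approx.)": 1000,
}


def utilisation_percent(total_clbs: int) -> int:
    """Share of the device taken by a design, rounded to a whole percent."""
    return round(100 * total_clbs / DEVICE_CLBS)


print(sum(rc6.values()))       # 5712, before data flow and control overhead
print(sum(cast256.values()))   # 4989, close to the quoted "about 5050"
print(utilisation_percent(6450), utilisation_percent(5050))  # 91 72
```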
4.6 Comparison of RC6 and CAST-256 Ciphers
Simulation and synthesis studies for the two ciphers suggest that neither RC6 nor CAST-256
is well suited for implementation in the targeted Xilinx FPGA. The hardware complexity
is high and the encryption speed is low, particularly compared to similar implementations
of DES [51].
Our simulation and synthesis studies reveal that multiplication in particular and addition
to some extent are major bottlenecks as far as speed of encryption in the RC6 cipher is
concerned. This also has to do with the custom architecture of the FPGAs. However, a faster
implementation of the RC6 cipher can only be achieved at the expense of increasingly large
hardware complexity, which implies the use of a high-end FPGA device. Moreover,
implementation of RC6 in the targeted FPGA using pipelining is found to be
impractical from a hardware complexity viewpoint. This is because of the large number of
CLBs (in excess of 6000 CLBs) that are needed for implementing just one round of encryption
hardware. So, if we want to pipeline even two rounds, we will need twice the
hardware and will need to consider very high density devices.
CAST-256 encryption in FPGAs is found to be slower than what we can achieve with
the RC6 cipher. At the same time, the hardware complexity associated with the CAST-256
cipher is roughly of the same order as RC6. This is because the advantage of not having a
multiplication operation is offset by the use of four large S-boxes.
4.6.1 Some Recent Modifications
Lately in our research, we have made a number of improvements in the design of some of the
key cipher components that have improved the overall speed of these ciphers to some extent,
as well as reduced some of the hardware that was previously in use. However, these modifications
still do not mark a significant improvement in the speed and hardware complexity of these
ciphers.
One major modification has to do with the storage of the encryption key, and the
implementation of the S-boxes. We have now replaced the LUT implementation of the S-boxes
with much faster and lower complexity RAM units. Another significant modification
involves improving the speed of the multiplier by replacing the current 32-bit adder designs
with a new kind of synthesized adder provided by the LogiBLOX toolbox available with
the new FPGA design tools. These are called "Relationally Placed Macros" or RPMs.
These adders not only make use of the fast carry logic available within the Xilinx CLBs, but
at the same time most of the logic is aligned in parallel to reduce the delays. This 32-bit
adder implementation reduces the hardware by a factor of ten over the previously employed
adder implementations. The delay of an RPM-based 32-bit adder is around 20 ns as
opposed to 39 ns for the hybrid adder design. However, it should be noted that these adder
results apply only to the SRAM-based Xilinx FPGAs, and for a technology-independent
implementation, the hybrid design appears to be the best approach.
Some recent modifications in the design of the control path for both the ciphers have
resulted in removing the $T_{data-in} = 4T_{clk}$ overhead, so that during the last stages of encryption,
data for the next encryption is already available, thus saving us four clock cycles. This has
increased the encryption data rate for our RC6 cipher implementation from 31 Mbps to around
37 Mbps. Similarly, the encryption data rate for CAST-256 has been improved to a value of
12.5 Mbps for a 48-round implementation and 24 Mbps for a 24-round implementation.
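The gain from hiding the four data-in cycles can be estimated as below. This is a rough sketch that assumes the clock periods are unchanged from the earlier synthesis results (146 ns and 198 ns), which this paragraph does not confirm; the function name and its `overlap_input` parameter are hypothetical. The estimates land slightly below the rounded rates quoted above.

```python
def rate_mbps(rounds: int, t_clk_ns: float, overlap_input: bool) -> float:
    """128-bit block rate; with overlap, only the four latch-out cycles remain as I/O."""
    io_cycles = 4 if overlap_input else 8
    return 128 / ((rounds + io_cycles) * t_clk_ns) * 1000.0


designs = (("RC6", 20, 146), ("CAST-256 (48 rounds)", 48, 198), ("CAST-256 (24 rounds)", 24, 198))
for name, rounds, t_clk in designs:
    before = round(rate_mbps(rounds, t_clk, overlap_input=False), 1)
    after = round(rate_mbps(rounds, t_clk, overlap_input=True), 1)
    print(name, before, after)
# RC6 31.3 36.5
# CAST-256 (48 rounds) 11.5 12.4
# CAST-256 (24 rounds) 20.2 23.1
```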
If we compare the encryption speeds of these hardware implementations with the
corresponding implementations in software on 200 MHz Pentium and Pentium Pro platforms,
we find that the software rate of encryption for RC6 is around 100 Mbps and that for
CAST-256 is 38.8 Mbps [65]. However, in order to attain very high speeds in hardware, one
can go for a full custom ASIC implementation. These results also point out the fact that
most of these AES candidate algorithms have an element of bias for software implementation.
As such there is a need to look for a private-key block cipher that is very efficient in terms
of hardware implementation.
4.7 Conclusion
In this chapter, we have presented the FPGA implementation of two AES candidates -
the RC6 and CAST-256 encryption algorithms. We have first explained the two encryption
algorithms in detail. Next we have described the hardware development
environment that has been used to implement the two ciphers. This is followed by a detailed
investigation of the design of the RC6 cipher. Here, the design of the RC6 datapath as well as
its control path is presented. The FPGA implementation of the cipher is carried out and
simulation and synthesis results are presented. A similar investigation of the CAST-256 cipher
design is presented, followed by simulation and synthesis results. Finally, the hardware
complexities of the two ciphers are compared and certain conclusions are derived.
Chapter 5
A New Private-key Block Cipher
Design
5.1 The Proposed Cipher
As mentioned earlier, the FPGA industry has become one of the fastest growing segments
of today's industrial world. With the continuous enhancements in the FPGA technology
in terms of increasing gate density and faster clock speeds, applications like reconfigurable
computing, rapid prototyping, as well as algorithm-agile applications are ideal for FPGA
implementations.
However, the FPGA implementation of the RC6 and CAST-256 encryption algorithms has
brought forward a very important aspect of implementing private-key block ciphers
in hardware: the hardware complexities associated with these cipher designs are significant
enough to discourage the possibility of going for programmable logic devices such as FPGAs.
As a consequence, we propose a much simpler cipher design that makes use of simpler
operations that not only possess good cryptographic properties, but also make the overall
cipher design efficient from the hardware implementation perspective. This approach may
also encourage us to effectively pipeline multiple rounds of encryption and thereby increase
the cipher speed for FPGA implementations in particular and hardware implementations in
general.
The proposed cipher, which we shall refer to as Fast Hardware Cipher or FHC, is a private-key
128-bit block cipher that is a generalization of the basic Feistel network, and it incorporates
the same data flow scheme as used in CAST-256. However, there are a number of key
differences between the proposed cipher and CAST-256. The proposed cipher has a much
simpler generic round function than the one used for CAST-256. Complex operations such
as additions, subtractions, and data-dependent rotations are removed. These have been
replaced by simple XOR operations. As well, the sheer size and structure of the S-boxes
used in the CAST-256 cipher was a major contributor towards the overall hardware complexity
and low speed of the cipher. The round function in the proposed cipher makes use of eight
4 x 32 S-boxes instead of four 8 x 32 S-boxes. Hence, the size of each S-box is reduced from
256 x 32 bits to 16 x 32 bits.
The "penalty" for these simplifications is that the total number of encryption rounds is
increased from 48 in CAST-256 to 96 in FHC. The reason for doing so has to do with the
security of the cipher, as will be explained later in this chapter. A key schedule scheme must
be used to generate the 32-bit masking subkeys $K_m$, one of which is used in each round. There
are no rotation subkeys.
The round function used in FHC is defined as follows:
• Round Function f
    for (i = 0; i < 12; i++)
        $C = C \oplus f(D, K_{m_0}^{(i)})$
        $B = B \oplus f(C, K_{m_1}^{(i)})$
        $A = A \oplus f(B, K_{m_2}^{(i)})$
        $D = D \oplus f(A, K_{m_3}^{(i)})$
    for (i = 12; i < 24; i++)
        $D = D \oplus f(A, K_{m_3}^{(i)})$
        $A = A \oplus f(B, K_{m_2}^{(i)})$
        $B = B \oplus f(C, K_{m_1}^{(i)})$
        $C = C \oplus f(D, K_{m_0}^{(i)})$
Figure 5.1: Encryption with Fast Hardware Cipher (FHC)
$I = K_m \oplus D$
$O = S_1[I_a] \oplus S_2[I_b] \oplus S_3[I_c] \oplus S_4[I_d] \oplus S_5[I_e] \oplus S_6[I_f] \oplus S_7[I_g] \oplus S_8[I_h]$
where $D$ is the 32-bit data input to the round function, $I_a$ through $I_h$ are the most
significant nibble through the least significant nibble of $I$, respectively, $S_i$ is the i-th
substitution box, and $O$ is the 32-bit output of the round function. Each S-box is a nonlinear
mapping of a 4-bit input to a 32-bit output. In our implementation, these S-boxes have been
randomly generated. Moreover, "$\oplus$" is the bitwise exclusive-OR operation.
The proposed encryption algorithm is illustrated in Figure 5.1. The plaintext is stored in
four 32-bit registers, A, B, C, and D. In each round of encryption, a 32-bit masking key is
used. The output of the 96-round encryption is also stored in the four 32-bit registers. The
design of a suitable key scheduling algorithm is not addressed in this thesis. As in the case of
the previous two cipher implementations, here too, the subkeys for encryption are generated
outside the FPGA and downloaded into the cipher during the key-download mode.
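A software sketch of the round function and the 96-round structure is given below. The round function follows the definition above: XOR with the masking key, then XOR of eight nibble-indexed S-box outputs. The rest is assumption, labelled as such in the code: the S-boxes are random placeholders rather than the tables used in our implementation, subkey number 4q + j is taken for the j-th key of quad-round q, and the update order within each quad-round is assumed to mirror the CAST-256 data flow. The function `fhc_decrypt` is included only to check that the structure is invertible.

```python
import random

MASK32 = 0xFFFFFFFF
_rng = random.Random(1999)
# Placeholder S-boxes: eight random 4-bit -> 32-bit tables (not the thesis tables).
SBOXES = [[_rng.getrandbits(32) for _ in range(16)] for _ in range(8)]


def fhc_round_function(d: int, k: int) -> int:
    """f(D, K): XOR the key in, then XOR together eight nibble-indexed S-box outputs."""
    i = (k ^ d) & MASK32
    out = 0
    for n, sbox in enumerate(SBOXES):        # n = 0 selects the most significant nibble
        out ^= sbox[(i >> (28 - 4 * n)) & 0xF]
    return out


def round_schedule():
    """(target, source, subkey index) for all 96 rounds; key order is an assumption."""
    forward = [("C", "D", 0), ("B", "C", 1), ("A", "B", 2), ("D", "A", 3)]
    reverse = [("D", "A", 3), ("A", "B", 2), ("B", "C", 1), ("C", "D", 0)]
    rounds = []
    for q in range(24):                      # 12 forward then 12 reverse quad-rounds
        quad = forward if q < 12 else reverse
        rounds += [(t, s, 4 * q + j) for t, s, j in quad]
    return rounds


def _apply(block, keys, schedule):
    regs = dict(zip("ABCD", block))
    for target, source, idx in schedule:
        regs[target] ^= fhc_round_function(regs[source], keys[idx])
    return tuple(regs[r] for r in "ABCD")


def fhc_encrypt(block, keys):
    return _apply(block, keys, round_schedule())


def fhc_decrypt(block, keys):
    # Each round is its own inverse, so running the rounds backwards undoes encryption.
    return _apply(block, keys, list(reversed(round_schedule())))


keys = [_rng.getrandbits(32) for _ in range(96)]   # stand-in for a key schedule
pt = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
print(fhc_decrypt(fhc_encrypt(pt, keys), keys) == pt)  # True
```

Because every round only XORs a function of one register into another, no inverse S-boxes are needed for decryption, which is part of what keeps the hardware small.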
5.2 FPGA Implementation of the Proposed Cipher
A block diagram of the proposed cipher as realized in digital hardware is presented in
Figure 5.2. The cipher is constructed as a single-stage iterative structure. This implies that we
have the hardware developed for one round of encryption and the control part of the cipher
is responsible for iteratively passing the data through the encryption core.
5.2.1 Datapath
As seen from Figure 5.2, the datapath for the proposed cipher encapsulates the encryption
core implemented as a round function unit. A round function consists of multiple exclusive-OR
operations and S-box substitutions. When the 32-bit data enters the round function unit,
it first goes through a bitwise XOR with the 32-bit masking key for that particular round.
The output of this operation is then divided into eight 4-bit vectors used as address lines for
the eight 4 x 32 S-boxes. The 32-bit outputs of all eight S-boxes are then XORed to yield a
32-bit result. These final 32 bits are then XORed with the appropriate 32-bit value of A, B, C,
or D. This 32-bit output of the round function unit is then swapped along with the other
three 32-bit values and then looped back to the input of the round function. This continues
for a total of 96 rounds. The swapping of the 32-bit words and the feedback connection is
achieved using a number of multiplexers, demultiplexers and a feedback register.
Figure 5.2: Realization of Fast Hardware Cipher in Hardware
The simple structure of the round function lends itself well to an FPGA implementation.
This is mainly because of employing simpler cryptographic operations such as bitwise
XOR. The size of the S-boxes used in this design is reduced from the ones that were used
to implement the CAST-256 cipher. This has resulted in saving a lot of hardware as well
as reducing the critical path delay considerably. Synthesis of the round function for the
proposed cipher yields a maximum combinational path delay of just 30 ns as opposed to 202
ns for the CAST-256 generic round function implementation. The total number of CLBs
used for implementing this round function is just 192, which is a considerable improvement
over the CAST-256 generic round function that requires 3037 CLBs. This is mainly because
now the total number of CLBs needed for the eight 4 x 32 S-boxes is 8 x 16 = 128 as opposed
to about 4 x 411 = 1644. This considerable reduction in the hardware is also because the
S-boxes for the proposed cipher have been implemented as 16 x 32 RAMs, instead of lookup
table (LUT) implementations based on using the register bits in the CLBs. As such the
RAM implementation is still a LUT - just more efficient storage.
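The figure of 16 CLBs per S-box is what one would expect if each CLB is configured as a 16 x 2 bit RAM, a mode offered by the XC4000 family. That configuration is an assumption here, not something this section states. The short calculation below reproduces the S-box totals and the resulting improvement ratios; `ram_sbox_clbs` is a hypothetical helper.

```python
# Assumption: one CLB can be configured as a 16-entry RAM that is 2 bits wide.
RAM_BITS_WIDE_PER_CLB = 2


def ram_sbox_clbs(output_bits: int = 32) -> int:
    """CLBs for one 16-entry S-box held in CLB RAM."""
    return output_bits // RAM_BITS_WIDE_PER_CLB


fhc_sbox_clbs = 8 * ram_sbox_clbs()      # eight 4 x 32 S-boxes
cast_sbox_clbs = 4 * 411                 # four 8 x 32 S-boxes, synthesis result above
print(fhc_sbox_clbs, cast_sbox_clbs)                 # 128 1644
print(round(cast_sbox_clbs / fhc_sbox_clbs, 1))      # 12.8 (S-box area ratio)
print(round(3037 / 192, 1), round(202 / 30, 1))      # 15.8 6.7 (round function area, delay)
```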
5.2.2 Control Path Design
The FHC control path is not much different from the one employed for CAST-256, except
for the total number of encryption rounds. The control path comprises a synchronous
state machine unit with asynchronous inputs, a data flow controller and some glue logic.
The state machine for the proposed cipher is no different from the one used for the FPGA
implementation of CAST-256, except that in the case of FHC, we have only one key-download
state. The data flow controller is also based on the same design as used for CAST-256 (see
section 4.5.2.2).
5.2.3 The Key Storage Unit
One of the major enhancements to the implementation of the proposed cipher has to do
with the way the encryption keys are stored inside the target FPGA. Previously, key
storage module implementations for RC6 and CAST-256 employed the lookup table (LUT)
approach, but our synthesis studies for FPGA implementations suggest that table lookup
implementations do not scale efficiently in custom architecture FPGAs, mainly because of
the restricted nature of the CLBs.
On the other hand, implementing the key storage unit as RAMs proves to be very efficient,
as the target device makes full use of the on-chip RAM available to it. For the proposed
cipher, we need to store ninety-six 32-bit encryption subkeys. These subkeys are to be read
later on during the data-encrypt mode. Our implementation of the key storage unit for the
proposed cipher gives the following numbers:
• Maximum combinational path delay = 26.593 ns
• Total number of CLBs used = 99
5.2.4 Simulation and Synthesis Results
The design of the complete proposed cipher has been simulated, synthesized, and placed
and routed in a particular FPGA. The relatively small hardware complexity associated with
this design makes it possible for us to use a medium size FPGA device. We have used
the XILINX XC4036EX FPGA for implementing this design. The target device has been used
in the field for quite some time now and has proven to be very popular. The XC4036 has a
total of 1296 CLBs.
As the simulation runs for the design are very extensive, only portions of simulation that
concern each of the five cipher modes are shown (see Appendix D). These include the reset
mode, key-download mode, idle mode, data-download mode, and the data-encrypt mode.
The key-download mode differs from the one for RC6 because the former takes 96 clock
cycles as opposed to 44 clock cycles for the latter. The manner in which the cipher cycles
through different modes of operation is the same as described in section 4.4.4.
Neglecting the key setup time, the time for a single encryption, $T_{encr}$, is given as follows:
$$T_{encr} = 96\,T_{clk} + T_{data\_out} \qquad (5.1)$$
Here it should be noted that $T_{clk}$ represents the minimum clock period, which is set by the
maximum combinational path delay of the design. Moreover, since there are 32-bit I/O
buses, it will take four clock cycles to download the 128-bit plaintext data and an
equal number of clock cycles to latch out the data onto the 32-bit output bus. However, in
this particular implementation, we have made a modification so that during the time data
is being encrypted, the next 128-bit plaintext is already available. This has saved us four
clock cycles for downloading the input data. Our synthesis process results in the following
numbers:
• Minimum clock period = 30 ns
• Maximum clock frequency = 33.33 MHz
• Time for one encryption = $T_{encr}$ = 3000 ns
• Rate of encryption = 42.67 Mbps
The time for one encryption and the corresponding encrypted data rate apply to a
96-round FHC implementation.
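The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes the output phase costs four clock cycles (128 bits over a 32-bit bus, as stated in the text); the function and parameter names are illustrative only.

```python
def encryption_metrics(clock_period_ns=30.0, rounds=96, output_cycles=4, block_bits=128):
    """Return (time per encryption in ns, max clock in MHz, data rate in Mbps)."""
    t_encr_ns = (rounds + output_cycles) * clock_period_ns  # 96 rounds + data-out cycles
    f_max_mhz = 1e3 / clock_period_ns                       # 1 / period, ns -> MHz
    rate_mbps = block_bits / t_encr_ns * 1e3                # bits per ns -> Mbit/s
    return t_encr_ns, f_max_mhz, rate_mbps


t_encr, f_max, rate = encryption_metrics()
print(f"T_encr = {t_encr:.0f} ns, f_max = {f_max:.2f} MHz, rate = {rate:.2f} Mbps")
```

Changing `clock_period_ns` or `rounds` shows directly how a shorter critical path or a reduced-round variant would trade off against throughput.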
The hardware needed for the FHC encryption is 192 CLBs for the generic round function
plus 99 CLBs for "masking" key storage. Besides this, another 64 CLBs are needed for storing
the 128-bit input data and 16 CLBs for each of the eight S-boxes. The data flow and control
logic overhead amounts to under 300 CLBs. Thus the total FPGA resources required are
about 750 CLBs, which implies that the design takes up 57% of the available CLBs in the
target device, but only 11% of the available CLBs in the device targeted for the original
RC6 and CAST-256 designs!
5.3 Security Analysis
In proposing any new cipher, one of the key design elements concerns the security of the
proposed cipher. A successful cipher should resist all proposed cryptanalysis techniques, such
as linear and differential cryptanalysis, and at the same time exhibit the potential of surviving
brute force attacks in future when computing power is increased. Moreover, there should
exist clear mathematical techniques to analyze and determine the cryptographic strength of
the cipher in question.
In this section, we analyze the security of our proposed cipher against two very potent
cryptanalytical attacks, namely linear and differential cryptanalysis. An m x n S-box is a
$2^m$ x n lookup table. In SPNs and most Feistel ciphers, S-boxes are critically important to
security, since they are the key components of nonlinearity in the algorithm. As the size of
the lookup table increases exponentially with increase of the value m, m should be chosen to
be small. On the other hand, the value of n can be selected to be large as the size of the lookup
table increases linearly with n.
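To make the scaling concrete, the storage of an m x n S-box held as a plain table can be computed directly. The helper name below is illustrative; the 4 x 32 and 8 x 32 sizes are the ones the thesis compares.

```python
def sbox_table_bits(m, n):
    """Bits needed to store an m x n S-box as a table of 2**m words, n bits each."""
    return (2 ** m) * n


for m, n in [(4, 32), (8, 32), (4, 64)]:
    print(f"{m} x {n} S-box: {sbox_table_bits(m, n)} bits")
```

Doubling n doubles the table, while each extra input bit doubles it again, so going from m = 4 to m = 8 costs a factor of 16 per S-box.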
5.3.1 Selecting Nonlinear Round Functions
For the proposed cipher, we have randomly selected eight 4 x 32 S-boxes. These S-boxes have
been constructed using random number generators. It is therefore important to consider
the nonlinearity contributed to the cipher by randomly selected S-boxes. The concepts of
nonlinearity and m-bit affine functions have been defined in Section 2.2.1. It follows that
out of all $2^{2^m}$ possible m-bit functions, there are $2^{m+1}$ affine functions. Since there are
thirty-two 4-bit linear or affine functions, the probability of randomly generating a 4-bit linear or
affine function is given as $p_{lin} = 2^5/2^{16} = 2^{-11}$.
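The count of affine functions can be confirmed by brute force. The sketch below enumerates every function of the form f(x) = a.x XOR c on 4-bit inputs and compares the count with the number of all 4-bit Boolean functions; the helper names are illustrative.

```python
from itertools import product


def truth_table(coeffs, const, m):
    """Truth table of f(x) = (a . x) XOR c over all 2**m inputs."""
    table = []
    for x in range(2 ** m):
        bit = const
        for i in range(m):
            bit ^= coeffs[i] & (x >> i) & 1
        table.append(bit)
    return tuple(table)


def affine_functions(m):
    """Set of distinct truth tables of all m-bit affine Boolean functions."""
    return {truth_table(a, c, m) for a in product((0, 1), repeat=m) for c in (0, 1)}


m = 4
count = len(affine_functions(m))   # number of linear or affine functions
total = 2 ** (2 ** m)              # number of all m-bit Boolean functions
p_lin = count / total
print(count, total, p_lin)
```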
Next consider the problem of selecting k = 0, 1, 2, ..., 8 perfectly linear S-box approximations,
wherein we consider the approximation that is an XOR of some subset of output bits
of a round function and is the XOR of corresponding bits in the outputs of S-boxes. Now
the probability of k out of a total of eight S-boxes being nonlinear for a particular subset of
output bits, $P_k$, is given as a binomial distribution:
$$P_k = \binom{8}{k} (1 - p_{lin})^{k} (p_{lin})^{8-k} \qquad (5.2)$$
Table 5.1 lists values of $P_k$ for k = 0, 1, 2, ..., 8. This implies, for example, that the probability
of randomly selecting all eight S-box contributions to the round approximation as being
linear is $2^{-88}$ for a particular subset of round function output bits. We shall use these results
in our following discussion of the security of the cipher with respect to linear cryptanalysis.
Table 5.1: Probabilities of selecting k nonlinear S-boxes
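The binomial distribution of equation 5.2 is easy to evaluate numerically. The sketch below assumes the per-function probability of 2^-11 from the preceding paragraph and takes k to be the number of nonlinear S-boxes, following the caption of Table 5.1; it prints base-2 logarithms so the values can be read as exponents. The function name is illustrative.

```python
from math import comb, log2

P_LIN = 2.0 ** -11  # probability that a random 4-bit function is linear or affine


def prob_k_nonlinear(k, n_sboxes=8, p_lin=P_LIN):
    """Binomial probability that exactly k of n_sboxes contributions are nonlinear."""
    return comb(n_sboxes, k) * (1 - p_lin) ** k * p_lin ** (n_sboxes - k)


for k in range(9):
    print(f"k = {k}: log2(P_k) = {log2(prob_k_nonlinear(k)):.1f}")
```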
5.3.2 Linear Cryptanalysis
Linear cryptanalysis [7] attempts to find a linear approximation of a cipher derived only
from plaintext, ciphertext, and key terms. A general linear approximation of a cipher is
derived by combining a number of linear approximations of the S-boxes of different rounds
so that the intermediate terms are canceled. Basically, the attack makes use of any high
probability occurrences of linear expressions of input, output, and round keys in the round
function of an iterated cipher structure. As such the basic principle of linear cryptanalysis
is to determine a linear approximation of the type:
$$P_{i_1} \oplus P_{i_2} \oplus \cdots \oplus P_{i_a} \oplus C_{j_1} \oplus C_{j_2} \oplus \cdots \oplus C_{j_b} = K_{k_1} \oplus K_{k_2} \oplus \cdots \oplus K_{k_c} \qquad (5.3)$$
where $i_1, i_2, \ldots, i_a$, $j_1, j_2, \ldots, j_b$, and $k_1, k_2, \ldots, k_c$ denote bit positions of the plaintext P,
ciphertext C, and key K, respectively. In [34], the probability of satisfying the best linear
expression for an r-round cipher is bounded as follows:
$$\left| p_L - \tfrac{1}{2} \right| \le 2^{\alpha - 1} \left| p_\beta - \tfrac{1}{2} \right|^{\alpha} \qquad (5.4)$$
where $p_L$ represents the probability that the linear expression 5.3 holds and $p_\beta$ represents the
probability of the best linear approximation of any S-box. Also, $\alpha$ is the number of S-boxes
involved in the linear approximation. Ideally $p_\beta = 0.5 \Rightarrow p_L = 0.5$.
It has been shown in [34] that an S-box linear approximation has a probability $p_\beta$ satisfying
$$\left| p_\beta - \tfrac{1}{2} \right| \le \frac{2^{m-1} - NL}{2^{m}} \qquad (5.5)$$
where m represents the total number of input bits to the S-box, and NL is the nonlinearity of
the S-box function used in the linear approximation. Here NL = 0 implies a linear function.
A linear cryptanalytical attack typically employs a number of linear approximations of the
rounds to develop an overall linear expression involving subsets of plaintext and ciphertext
bits. This makes it possible then to derive one key bit, which in turn is given as the
exclusive-or of a number of round key bits as given in equation 5.3. As a result, it has
been shown in [7] that the number of known plaintexts required, with a success rate of 97.7%,
is approximately
$$N_L = \left| p_L - \tfrac{1}{2} \right|^{-2} \qquad (5.6)$$
In order to analyze the strength of the proposed cipher against this attack, we adopt
a very pessimistic approach by making worst-case assumptions. From Table 5.1, it is clear
that the probability of selecting all eight S-boxes to be linear ($2^{-88}$) for a particular set of
output bits is highly remote. As such we can very safely reject the possibility of all eight
S-boxes being perfectly linear in the linear approximation of a round.
Let us consider the scenario where we have seven linear S-boxes and one nonlinear S-box.
Once again adopting a very conservative approach, assume NL = 1. Substituting this value
for NL in equation 5.5, we get $|p_\beta - \frac{1}{2}| = \frac{7}{16}$. What follows next is an example of how to
construct the best linear approximation for the round function and then extend it to the
entire cipher.
Let us assume that the best linear approximation for the round function in round 1 is as
follows:
Round 1 : X 31
x"
K" (5.7)
where equation 5.7 holds with $|p_\beta - \frac{1}{2}| = \frac{7}{16}$. Here $X_{i,j}$ and $Y_{i,j}$ represent the i-th input
bit to the j-th S-box and the i-th output bit of S-box $S_j$, respectively, and $K_{i,j}$ represents the
i-th bit of the j-th round key. $Z_k$ is the k-th output bit of the round function. Similarly,
scenarios for the best linear approximations for the second, third, and fourth rounds are given
below. All these best linear approximations are assumed to hold with $|p_\beta - \frac{1}{2}| = \frac{7}{16}$.
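Round 2:
$$Z_2 \oplus Z_3 \oplus K_{2,2} \oplus K_{3,2} = X_{2,1} \oplus X_{3,1}$$
$$X_{2,1} \oplus X_{3,1} = Y_{12,1} \oplus Y_{10,1}$$
$$Z_2 \oplus Z_3 \oplus K_{2,2} \oplus K_{3,2} = Z_{12} \oplus Z_{10} \qquad (5.8)$$
Round 3:
$$Z_{10} \oplus Z_{12} \oplus K_{10,3} \oplus K_{12,3} = X_{10,3} \oplus X_{12,3}$$
$$X_{10,3} \oplus X_{12,3} = Y_{2,3}$$
$$Y_{2,3} = Z_2$$
$$Z_{10} \oplus Z_{12} \oplus K_{10,3} \oplus K_{12,3} = Z_2 \qquad (5.9)$$
Round 4:
$$X_{2,1} = Z_2 \oplus K_{2,4}$$
$$X_{2,1} = Y_{3,1} \oplus Y_{5,1} \oplus Y_{7,1}$$
$$Y_{3,1} \oplus Y_{5,1} \oplus Y_{7,1} = Z_3 \oplus Z_5 \oplus Z_7$$
$$Z_3 \oplus Z_5 \oplus Z_7 = C_{99} \oplus C_{101} \oplus C_{103}$$
$$Z_2 \oplus K_{2,4} = C_{99} \oplus C_{101} \oplus C_{103} \qquad (5.10)$$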
Finally, the best linear approximation for a 4-round cipher is as follows:
Now substituting the values for a 4-round linear approximation in equations 5.4 and 5.6,
we find that the total number of known plaintexts needed to guess the right hand side of
equation 5.3 with a 97.7% success rate is only 12. However, if we extend this attack to the
complete 96 rounds of the cipher, then we need at least $2^{39}$ known plaintexts.
Although this number is too small to claim that 96 rounds of the cipher are secure,
the very low probability of selecting seven linear S-boxes (see Table 5.1) implies that being
able to represent seven out of eight S-boxes in each round as linear in a linear approximation
of a round function is extremely unlikely.
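As a check on the arithmetic, the worst case NL = 1 can be worked through by hand. This assumes that equations 5.4 to 5.6 take the usual piling-up form (an assumption here, though it does reproduce the quoted 4-round figure of 12) and that one S-box is active per round, so that the number of S-boxes involved equals the number of rounds r:

```latex
\begin{align*}
\left|p_\beta - \tfrac12\right| &= \frac{2^{3} - 1}{2^{4}} = \frac{7}{16} \\
r = 4:\quad \left|p_L - \tfrac12\right| &\le 2^{3}\left(\tfrac{7}{16}\right)^{4} \approx 0.293,
  & N_L &\approx 0.293^{-2} \approx 11.6 \;\Rightarrow\; 12 \\
r = 96:\quad \left|p_L - \tfrac12\right| &\le 2^{95}\left(\tfrac{7}{16}\right)^{96}
  = \tfrac12\left(\tfrac78\right)^{96} \approx 2^{-19.5},
  & N_L &\approx 2^{39}
\end{align*}
```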
Table 5.2 lists the results for 4-round, 48-round, and 96-round linear approximations
for different values of NL. We find that for all S-box functions used in the approximation
with nonlinearities greater than 4, the attack becomes impractical for a 48-round cipher. The
values listed in Table 5.2 also imply that for a 96-round cipher, the attack is not successful
for NL > 2. In fact, for a value of NL = 3, the number of known plaintexts required simply
exceeds the total number of plaintexts (i.e. $2^{128}$).
NL_min    N_L (r = 4)    N_L (r = 48)    N_L (r = 96)
1         > 2^3          2^20            > 2^38
2         > 2^5          2^42            > 2^81
3         > 2^7          2^67            > 2^132
4         2^10           2^98            > 2^194
5         > 2^13         2^138           > 2^273
6         2^18           2^194           -
7         2^26           2^290           -
Table 5.2: Linear Cryptanalysis Results for Different Values of NL
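The entries of such a table can be regenerated programmatically. The sketch below is not the thesis's own procedure: it assumes the piling-up bound and the S-box bias bound are taken with equality, that the number of known plaintexts is the inverse square of the overall bias, and that exactly one 4-bit-input S-box is active per round. It prints the base-2 logarithm of the plaintext requirement for each nonlinearity value.

```python
from math import log2


def log2_known_plaintexts(nl, alpha, m=4):
    """log2 of |p_L - 1/2|**-2 for alpha active S-boxes of nonlinearity nl."""
    bias_sbox = (2 ** (m - 1) - nl) / 2 ** m              # S-box bias bound, with equality
    log2_bias = (alpha - 1) + alpha * log2(bias_sbox)     # piling-up bound, with equality
    return -2 * log2_bias


for nl in range(1, 8):
    row = [round(log2_known_plaintexts(nl, r), 1) for r in (4, 48, 96)]
    print(f"NL = {nl}: log2 N_L for r = 4, 48, 96 -> {row}")
```

Any row whose exponent exceeds 128 corresponds to an attack that needs more known plaintexts than a 128-bit block cipher can supply.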
We have so far focused on the upper region of Table 5.1, and have found that even by
adopting very conservative values of NL = 1, the cipher seems to be pretty secure against the
linear attack. Next, we concentrate on the lower region of Table 5.1, wherein the probabilities
of selecting 3 or 4 linear S-boxes are relatively not that remote.
Our analysis so far involves only one nonlinear S-box approximation per round function; the
probability of this occurring is extremely small, $P_1 = 2^{-74}$. Now we relax our assumptions
a bit to assume that there are four out of eight S-box functions which are perfectly linear
in our linear approximation of a round function, the probability of this happening being
still very small, $P_4 = 2^{-38}$, for randomly selected S-boxes. This implies that we are to
construct a linear approximation per round that takes into account four nonlinear S-box
approximations and then concatenate the linear approximation for each round to form a
4-round linear approximation. Finally, we extend it to 48 and 96 rounds.
Assume NL = 1 for four S-box functions, say $S_1$, $S_2$, $S_3$, and $S_4$. This implies
$|p_\beta - \frac{1}{2}| = \frac{7}{16}$. Our analysis for this scenario yields the following results:
• Total number of known plaintexts required for a 4-round attack is greater than $2^{8}$
• Total number of known plaintexts required for a 48-round attack is approximately $2^{76}$
• Total number of known plaintexts required for a 96-round attack is approximately $2^{150}$
Similarly, we apply the same attack on the proposed cipher with the condition that there are
three perfectly linear S-box approximations selected with a greater probability of $P_5 \approx 2^{-27}$.
This implies that we have to consider the effect of five nonlinear S-box approximations while
constructing the linear approximation per round. Once again assuming the lowest value of
nonlinearity (NL = 1), which corresponds to a weakly nonlinear approximation, we get the
following results:
• Total number of known plaintexts required for a 4-round attack is greater than $2^{9}$
• Total number of known plaintexts required for a 48-round attack is approximately $2^{94}$
• Total number of known plaintexts required for a 96-round attack is approximately $2^{187}$
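The two sets of results follow one pattern. As a sketch, assume that s nonlinear S-boxes with NL = 1 are active in every round, so that the number of S-boxes in the overall approximation is s times r, and that the piling-up bound is taken with equality. The requirement then grows linearly in the exponent:

```latex
\begin{align*}
\log_2 N_L &= -2\log_2\!\left[2^{\,sr-1}\left(\tfrac{7}{16}\right)^{sr}\right]
            = 2 + 2\,s\,r\,\log_2\tfrac{8}{7} \approx 2 + 0.3853\,s\,r \\
s = 4,\ r = 96:&\quad \log_2 N_L \approx 2 + 0.3853 \times 384 \approx 150 \\
s = 5,\ r = 48:&\quad \log_2 N_L \approx 2 + 0.3853 \times 240 \approx 94 \\
s = 5,\ r = 96:&\quad \log_2 N_L \approx 2 + 0.3853 \times 480 \approx 187
\end{align*}
```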
Our analysis of the proposed cipher against linear cryptanalysis under some very pessimistic
assumptions suggests:
• The probability of a linear approximation involving five or more linear approximations
of S-box functions is negligibly small.
• For a 96-round cipher with three or four linear S-boxes, the total number of known
plaintexts required exceeds the total number of available plaintexts. This renders
the linear attack unsuccessful against the cipher.
• Finally, the known plaintexts requirement makes the attack increasingly impractical
if we select S-boxes in a way such that fewer than three S-boxes are perfectly linear
among them, although such cases are likely to occur.
Here it should be noted that, for our analysis, we took a very conservative approach and used
worst-case bounds on the values of nonlinearity, because the probabilities of lower values of
NL are low. It is conceivable that a linear approximation can be constructed that involves
only one round function every four rounds, but this can be avoided if the S-boxes are selected
so that the S-box functions are balanced (i.e. there are an equal number of ones and zeroes).
We can safely conclude that the 96-round proposed cipher implementation is secure against
linear cryptanalysis.
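The balance condition mentioned above can be checked, or enforced during S-box generation, with a short script. The sketch below is one possible reading, not the thesis's selection procedure: it treats each single output bit of a 4 x 32 S-box as a 4-bit Boolean function and calls it balanced when it outputs eight ones and eight zeroes. All function names are hypothetical.

```python
import random


def random_sbox(m=4, n=32, seed=None):
    """Unconstrained random m x n S-box: 2**m words of n bits."""
    rng = random.Random(seed)
    return [rng.getrandbits(n) for _ in range(2 ** m)]


def output_bit_is_balanced(sbox, bit):
    """True if the given output bit is 1 for exactly half of the inputs."""
    ones = sum((word >> bit) & 1 for word in sbox)
    return ones * 2 == len(sbox)


def balanced_sbox(m=4, n=32, seed=0):
    """Random S-box in which every single output bit is a balanced function."""
    rng = random.Random(seed)
    half = 2 ** (m - 1)
    columns = []
    for _ in range(n):
        col = [1] * half + [0] * half  # exactly half ones
        rng.shuffle(col)
        columns.append(col)
    return [sum(columns[b][x] << b for b in range(n)) for x in range(2 ** m)]


plain = random_sbox(seed=1)
print(sum(output_bit_is_balanced(plain, b) for b in range(32)), "of 32 bits balanced")
```

An unconstrained random S-box will usually have only a minority of balanced output bits, which is why balance has to be imposed at construction time if it is wanted.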
5.3.3 Differential Cryptanalysis
In this section, we briefly present some of the studies and results in relation to the proposed
cipher's resistance to differential cryptanalysis.
Differential cryptanalysis [6] is basically a chosen plaintext attack. This method takes
into account ciphertext pairs whose corresponding plaintexts have a particular difference.
In other words, it looks at the XOR difference of two plaintexts and compares that to the
corresponding ciphertext pair. In a particular S-box, if we know the input XOR of a pair,
it does not ensure the knowledge of its output XOR. However, there exists a probabilistic
relation between the output XORs and every input XOR. Differential cryptanalysis makes
use of the highly probable occurrences of sequences of output XOR differences at each round
given a particular plaintext XOR difference.
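The probabilistic relation between input and output XORs can be tabulated for a single S-box. The sketch below uses a randomly generated 4 x 32 S-box as a stand-in (the thesis's actual S-box contents are not reproduced here) and reports the most likely output XOR over all nonzero input XORs; the function name is illustrative.

```python
import random
from collections import Counter


def best_differential(sbox, m=4):
    """Return (count, input_xor, output_xor) of the most likely nonzero-input differential."""
    best = (0, None, None)
    for dx in range(1, 2 ** m):
        # Tally the output XOR produced by every input pair (x, x ^ dx).
        counts = Counter(sbox[x] ^ sbox[x ^ dx] for x in range(2 ** m))
        dy, c = counts.most_common(1)[0]
        if c > best[0]:
            best = (c, dx, dy)
    return best


rng = random.Random(1)
sbox = [rng.getrandbits(32) for _ in range(16)]  # hypothetical S-box contents
count, dx, dy = best_differential(sbox)
print(f"input XOR {dx:#x} -> output XOR {dy:#010x} with probability {count}/16")
```

Because the pairs (x, x ^ dx) and (x ^ dx, x) give the same output XOR, every count is even, so no differential of a 4-bit-input S-box can have probability below 2/16.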
A block cipher can be proved to resist differential cryptanalysis if it can be shown that
high probability differentials do not exist. In a secure cipher, this probability should
approach $2^{-N}$, where N represents the block size. In the case of the proposed cipher, N = 128.
In actual practice, it is very hard to derive the probability of any practical differential. As a
consequence, we search for highly probable r-round iterative characteristics. The probabilities
of the most likely characteristics can be estimated and used as a measure of the cipher's
resistance to differential cryptanalysis.
In [66], it has been shown that the best 4-round iterative characteristic for a round
function having 4 x 32 S-boxes as used in the design of the proposed cipher has a differential
probability of $2^{-12}$. In [34], the probability of the best r-round iterated characteristic is given as
follows:
$$p_{\Omega_r} = \prod_{i=1}^{r} p_i \qquad (5.12)$$
where $p_i$ is the probability of the output XOR given the input XOR in round i. It has been
shown in [34] that the number of chosen plaintexts is $N_c = 1/p_{\Omega_r}$ for an appropriate value of
r (usually less than the number of rounds). Applying a similar approach for our proposed
cipher, $p_{\Omega_r}$ is given as follows:
$$p_{\Omega_r} \le \left(2^{-12}\right)^{r/4} \qquad (5.13)$$
In particular, an 88-round characteristic (used to mount a potential differential attack against
the 96-round cipher) has a probability less than or equal to $2^{-264}$. As a consequence, the
number of chosen plaintexts needed for this attack would be in excess of $2^{264}$ for a 96-round
implementation. These results suggest the proposed cipher appears to be quite immune to
this kind of attack.
5.4 Conclusion
In this chapter, we propose a new private-key block cipher. We have discussed the design
and implementation of the proposed cipher in FPGAs and outlined our results. We have
also investigated the security of the cipher against two very popular and effective
cryptanalytical attacks, namely, linear and differential cryptanalysis. Our analysis suggests that the
proposed cipher appears to be quite secure against both the attacks.
Chapter 6
Conclusions and Future Work
The AES process, which was commenced in 1997 by the National Institute of Standards and
Technology (NIST), is a major unfolding in the field of private-key cryptography. As a
consequence of this concerted effort, a new block cipher will be selected as an eventual replacement
for DES, which is nearing the end of its useful life. Interestingly, this event has come at the
advent of the new millennium. CAST-256 and RC6 are among the stronger candidates to
qualify as AES.
In this thesis, we have presented the hardware implementation of these two candidate
encryption algorithms, bringing forth some very interesting observations and results. These
conclusions constitute a framework for designing private-key block ciphers that are targeted
for a hardware environment such as FPGAs. As a consequence of our research, we
have also proposed a new private-key block cipher which is very conducive to hardware
implementations, especially for custom architectures such as Field Programmable Gate
Arrays (FPGAs).
6.1 Summary of the Thesis
In this thesis, we have investigated the issues relating to the hardware implementation of
private-key block ciphers. In particular, we have selected field programmable gate arrays
(FPGAs) as our target environment for implementing these ciphers in hardware. Two key
factors have motivated us to go for FPGA implementations. Firstly, FPGAs possess the
very attractive feature of reprogrammability that has forced many applications to migrate
from ASICs to the domain of FPGAs. The other key factors that play in favour of
FPGAs include rapid prototyping, which leads to faster design turnaround times, scalable
architectures, and variable architectural parameters.
We have first presented the design of the RC6 cipher and its implementation in FPGAs.
The cipher design is based on a single-stage hardware iterative architecture. This design
approach requires the hardware for only one round of encryption, and the control machinery
ensures that the data cycles through the hardware after each round of encryption. Our
simulation and synthesis studies suggest that the inclusion of the multiplication operation in the
quadratic function of the RC6 core is a major bottleneck as far as the cipher encryption
speed is concerned. However, a faster implementation of RC6 can only be achieved at the
expense of increasingly large hardware complexity, which implies the use of a high end FPGA
device. Implementation of RC6 in the target FPGA device using pipelining is impractical
from a hardware complexity viewpoint.
CAST-256 encryption in FPGAs is found to be even slower than what we can achieve
with RC6. At the same time, the hardware complexity of CAST-256 is roughly of the same
order as RC6. This is because the advantage of not having a multiplication operation is
offset by using four 8 x 32 S-boxes.
As a consequence of investigating the FPGA implementations of these two private-key
block ciphers, we have proposed a much simpler cipher design that makes use of simpler
cryptographic operations that not only possess good cryptographic properties, but also make
the overall cipher design efficient from the hardware implementation perspective. The new
proposed cipher uses smaller S-boxes and does not incorporate any arithmetic operations
such as addition or multiplication. This cipher design uses only 750 CLBs as opposed to
5050 CLBs for CAST-256, which is a reduction in hardware complexity by a factor of around
7. Also, the speed of the proposed cipher is improved by a factor of 3.5 over the CAST-256
implementation, if we are looking for a 96-round implementation of the proposed cipher.
One key observation has been made with regard to the hardware complexity of the
proposed cipher. We have found that although we have been successful in bringing down the
hardware complexity to a mere 192 CLBs for the round function, the hardware associated
with the control circuitry as well as the key storage unit proves to be the overhead. This
puts a lower bound on the hardware complexity that can be achieved. Another performance
limiting factor is the maximum clock frequency of the target device. Most of the current
FPGAs have a frequency ceiling of 50 MHz. However, with the introduction of the one
million gate, 200 MHz FPGA chips, one can achieve very high data rates in the near future.
In this thesis, we have also investigated the security of the proposed cipher against linear
and differential cryptanalysis. Our analysis suggests that the new cipher appears to be
resistant to both kinds of attacks.
6.2 Suggestions for Future Work
Our FPGA implementation of the proposed cipher suggests that since the hardware
associated with its design is very small as compared to what we have come to know about RC6
and CAST-256, we can pipeline the new cipher to further increase the data encryption rate.
It would be possible to pipeline four stages of the proposed cipher, giving us a speed up of
about four over the present implementation. This implies that we can be looking for data
encryption rates in excess of 150 Mbps for a 96-round implementation. This will be achieved
at the expense of under 3000 CLBs and will therefore comfortably fit in the FPGA targeted
for the RC6 and CAST-256 ciphers.
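A back-of-the-envelope version of this estimate is sketched below. It assumes ideal scaling, in which both throughput and area multiply by the number of pipeline stages; in practice the shared control logic would not be replicated, which is consistent with the area staying under the product. The function name and defaults are illustrative, with the 42.67 Mbps and 750 CLB figures taken from the present implementation.

```python
def pipelined_estimate(base_rate_mbps=42.67, base_clbs=750, stages=4):
    """Idealized scaling: throughput and area both multiply by the stage count."""
    return base_rate_mbps * stages, base_clbs * stages


rate, clbs = pipelined_estimate()
print(f"about {rate:.1f} Mbps using at most {clbs} CLBs")
```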
Throughout the course of this research, we have not considered the design of a key schedule
scheme for any of the ciphers in question. This issue can also be investigated in the future.
Generating the encryption keys on the fly is another approach that can be explored. A
secure and efficient key schedule algorithm should be proposed for the new cipher as well.
Yet another future direction would be to investigate the hardware implementation of other
AES candidates such as MARS, RIJNDAEL, and TWOFISH [1]. Further research in this
dynamic area is strongly encouraged.
Bibliography
[1] "http://csrc.nist.gov/encryption/aes/aes-home.htm." NIST Advanced Encryption
Standard (AES) Development Effort Web Site.
[2] "National Bureau of Standards - Data Encryption Standard." FIPS Publication 46,
1977.
[3] R. L. Rivest, M. J. B. Robshaw, R. Sidney, and Y. L. Yin, "The RC6 Block Cipher."
available at web site, "http://theory.lcs.mit.edu/rivest/rc6.pdf".
[4] C. Adams, "The CAST-256 Encryption Algorithm." available at web site,
"http://www.entrust.com/resources/pdf/cast256.pdf".
[5] H. Feistel, "Cryptography and Computer Privacy," Scientific American, vol. 228(5),
pp. 15-23, May 1973.
[6] E. Biham and A. Shamir, "Differential Cryptanalysis of DES-like Cryptosystems,"
Journal of Cryptology, vol. 4, no. 1, pp. 3-72, 1991.
[7] M. Matsui, "Linear Cryptanalysis Method for DES Cipher," in Proceedings of
EUROCRYPT'93, pp. 386-397, Springer-Verlag, 1993.
[8] R. L. Rivest, "The RC5 Encryption Algorithm," in Proceedings of Fast Software
Encryption - 2nd International Workshop, (Leuven, Belgium), pp. 86-96, Springer-Verlag,
1995.
[9] C. Adams, "Constructing Symmetric Ciphers Using the CAST Design Procedure,"
Designs, Codes, and Cryptography, vol. 12, no. 3, pp. 283-316, 1997.
[10] A. Menezes, P. C. V. Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography.
CRC Press, 1997.
[11] C. E. Shannon, "Communication Theory of Secrecy Systems," Bell Systems Technical
Journal, vol. 28, pp. 656-715, 1949.
[12] W. A. Notz, H. Feistel, and J. L. Smith, "Some Cryptographic Techniques for
Machine-to-Machine Data Communications," in Proceedings of the IEEE, vol. 63(11),
pp. 1545-1554, November 1975.
[13] B. Schneier, "The Blowfish Encryption Algorithm," in Proceedings of the Cambridge
Security Workshop on Fast Software Encryption, pp. 191-204, December 1993.
[14] A. Shimizu and S. Miyaguchi, "Fast Data Encipherment Algorithm FEAL," in
Proceedings of EUROCRYPT'87, pp. 267-278, Springer-Verlag, 1987.
[15] E. Biham and A. Shamir, "Differential Cryptanalysis of FEAL and N-Hash," in
Proceedings of EUROCRYPT'91, pp. 1-16, Springer-Verlag, 1991.
[16] K. Ohta and K. Aoki, "Linear Cryptanalysis of Fast Data Encipherment Algorithm,"
in Proceedings of CRYPTO'94, pp. 12-16, Springer-Verlag, 1994.
[17] X. Lai, J. L. Massey, and S. Murphy, "Markov Ciphers and Differential Cryptanalysis,"
in Proceedings of EUROCRYPT'91, pp. 17-38, 1991.
[18] B. S. Kaliski Jr. and Y. L. Yin, "On Differential and Linear Cryptanalysis of the RC5
Encryption Algorithm," in Proceedings of CRYPTO'95, pp. 171-184, Springer-Verlag,
1995.
[19] H. M. Heys, "A Timing Attack on RC5," in Proceedings of SAC'98, (Kingston, Ont.),
August 1998.
[20] J. Lee, H. M. Heys, and S. E. Tavares, "Resistance of a CAST-Like Encryption
Algorithm to Linear and Differential Cryptanalysis," Designs, Codes, and Cryptography,
vol. 12, no. 3, pp. 267-282, 1997.
[21] H. M. Heys and S. E. Tavares, "Substitution-Permutation Networks Resistant to
Differential and Linear Cryptanalysis," Journal of Cryptology, vol. 9, pp. 1-19, 1996.
[22] J. B. Kam and G. I. Davida, "A Structured Design of Substitution-Permutation
Encryption Networks," IEEE Transactions on Computers, vol. 28, no. 10, pp. 747-753,
1979.
[23] L. Brown and J. R. Seberry, "On the Design of Permutation P in DES Type
Cryptosystems," in Proceedings of EUROCRYPT'89, pp. 696-705, 1989.
[24] A. F. Webster and S. E. Tavares, "On the Design of S-Boxes," in Proceedings of
CRYPTO'85, pp. 523-534, Springer-Verlag, 1985.
[25] R. Forré, "The Strict Avalanche Criterion: Spectral Properties of Boolean Functions and
an Extended Definition," in Proceedings of CRYPTO'88, pp. 450-468, Springer-Verlag,
1990.
[26] C. M. Adams, "A Formal and Practical Design Procedure for Substitution-Permutation
Network Cryptosystems." PhD thesis, Queen's University at Kingston, Kingston, Ont.,
1990.
[27] B. Preneel, W. V. Leekwijck, L. V. Linden, R. Govaerts, and J. Vandewalle,
"Propagation of Boolean Functions," in Proceedings of EUROCRYPT'90, pp. 161-173,
Springer-Verlag, 1991.
[28] M. H. Dawson and S. E. Tavares, "An Expanded Set of S-Box Design Criteria Based
on Information Theory and its Relation to Differential-Like Attacks," in Proceedings of
EUROCRYPT'91, pp. 352-367, Springer-Verlag, 1991.
[29] R. Forré, "Methods and Instruments for Designing S-Boxes," Journal of Cryptology,
vol. 2, no. 3, pp. 115-130, 1990.
[30] C. Adams and S. E. Tavares, "The Structured Design of Cryptographically Good
S-Boxes," Journal of Cryptology, vol. 3, no. 1, pp. 27-41, 1990.
[31] L. O'Connor, "An Analysis of Product Ciphers Based on the Properties of Boolean
Functions." PhD thesis, University of Waterloo, Waterloo, Ont., 1992.
[32] C. M. Adams and S. E. Tavares, "Designing S-Boxes for Ciphers Resistant to Differential
Cryptanalysis," in Proceedings of the 3rd Symposium on State and Progress of Research
in Cryptography, (Rome, Italy), pp. 181-190, 1993.
[33] C. M. Adams, "Designing DES-Like Ciphers with Guaranteed Resistance to Differential
and Linear Attacks," in Proceedings of SAC'95, (Carleton University, Ottawa, Ont.),
pp. 133-144, 1995.
[34] C. Adams, H. M. Heys, S. E. Tavares, and M. Wiener, "An Analysis of CAST-256
Cipher," in Proceedings of CCECE'99, pp. 361-366, May 1999.
[35] P. Chow, S. O. Seo, D. Au, T. Choy, B. Fallah, D. Lewis, C. Li, and J. Rose, "A
1.2 µm CMOS FPGA using Cascaded Logic Blocks," in Proceedings of the Oxford 1991
International Workshop on Field Programmable Logic and Applications, 1991.
[36] C. Ebeling, G. Borriello, S. A. Hauck, D. Song, and E. A. Walkup, "TRYPTYCH: A
New FPGA Architecture," in Proceedings of the Oxford 1991 International Workshop
on Field Programmable Logic and Applications, 1991.
[37] "http://www.actel.com.", Actel web site.
[38] D. G. Reinertsen, "Whodunit? The Search for the New-product Killers," Electronic
Business, July 1983.
[39] W. Carter, K. Duong, R. H. Freeman, H. C. Hsieh, J. Y. Ja, J. E. Mahoney, L. T.
Ngo, and S. L. Sze, "A User Programmable Reconfigurable Gate Array," in IEEE 1986
Custom Integrated Circuits Conference, 1986.
[40] H. C. Hsieh, K. Duong, J. Y. Ja, R. Kanazawa, L. T. Ngo, L. G. Tinkey, W. S. Carter,
and R. H. Freeman, "A Second Generation User Programmable Gate Array," in IEEE
1987 Custom Integrated Circuits Conference, 1987.
[41] H. C. Hsieh, K. Duong, J. Y. Ja, R. Kanazawa, L. T. Ngo, L. G. Tinkey, W. S. Carter,
and R. H. Freeman, "A 9000-Gate User Programmable Gate Array," in IEEE 1988
Custom Integrated Circuits Conference, 1988.
[42] T. Kean, "Configurable Logic: A Dynamically Programmable Cellular Architecture and
its VLSI Implementation." PhD thesis, University of Edinburgh, 1989.
[43] F. Furtek, G. Stone, and I. Jones, "Labyrinth: A Homogeneous Computational
Medium," in IEEE 1990 Custom Integrated Circuits Conference, 1990.
[44] K. Kawana, H. Keida, M. Sakamoto, K. Shibata, and I. Moriyama, "An Efficient Logic
Block Interconnect Architecture for User Reprogrammable Gate Array," in IEEE 1990
Custom Integrated Circuits Conference, 1990.
[45] S. Hauck, G. Borriello, S. Burns, and C. Ebeling, "Montage: An FPGA for Synchronous
and Asynchronous Circuits," in Proceedings of the 2nd International Workshop on Field
Programmable Logic and Applications, 1992.
[46] R. Cliff, B. Ahanin, L. T. Cope, F. Heile, R. Ho, J. Huang, C. Lytle, S. Mashruwala,
B. Pedersen, R. Raman, S. Reddy, V. Singhal, C. K. Sung, K. Veenstra, and A. Gupta,
"A Dual Granularity and Globally Interconnected Architecture for a Programmable
Logic Device," in IEEE 1993 Custom Integrated Circuits Conference, 1993.
[47] "http://www.xilinx.com", Xilinx web site.
[48] J. Rose, E. Gamal, and A. Sangiovanni-Vincentelli, "Architecture of Field
Programmable Gate Arrays," in Proceedings of the IEEE, vol. 81(7), pp. 1013-1029, July
1993.
[49] S. Singh et al., "The Effect of Logic Block Architecture on FPGA Performance," IEEE
J. Solid-State Circuits, vol. 27, no. 3, pp. 281-287, 1990.
[50] J. S. Rose and S. Brown, "Flexibility of Interconnection Structures for Field-Programmable
Gate Arrays," IEEE J. Solid-State Circuits, vol. 26, no. 3, pp. 227-282,
1991.
[51] J.-P. Kaps and C. Paar, "Fast DES Implementation for FPGAs and its Application to
a Universal Key-Search Machine," in Proceedings of SAC'98, (Kingston, Ont.), August
1998.
[52] "http://www.cmc.ca.", CMC web site.
[53] Garcia, O. N., H. Glass, and S. C. Haimes, "An Approximate and Empirical Study of
the Distribution of the Adder Inputs and Maximum Carry Length Propagation," in
Proceedings of 4th IEEE Symposium on Computer Arithmetic, pp. 97-103, 1978.
[54] Doran and R. W., "Variants on an Improved Carry Lookahead Adder," IEEE Transactions
on Computers, vol. 37, no. 9, pp. 1110-1113, 1988.
[55] O. Bedrij, "Carry Select Adder," IRE Transactions on Electronic Computers, vol. EC-11,
pp. 340-346, 1960.
[56] L. Dadda, "Some Schemes for Fast Serial Input Multipliers," in Proceedings of 6th IEEE
Symposium on Computer Arithmetic, pp. 52-59, 1983.
[57] D. P. Agarwal, "Optimum Array-Like Structures for High-Speed Arithmetic," in Proceedings
of 3rd IEEE Symposium on Computer Arithmetic, pp. 208-219, 1975.
[58] L. Ciminiera and A. Serra, "Fast Iterative Multiplying Array," in Proceedings of 5th
IEEE Symposium on Computer Arithmetic, pp. 60-66, 1983.
[59] Baugh, C. R., and B. A. Wooley, "A Two's Complement Parallel Array Multiplier,"
IEEE Transactions on Computers, vol. C-22, no. 12, pp. 1045-1047, 1973.
[60] Brubaker, T. A., and J. C. Becker, "Multiplication Using Logarithms Implemented with
ROM," IEEE Transactions on Computers, vol. C-24, no. 8, pp. 761-765, 1975.
[61] Doran and R. W., "A Suggestion for a Fast Multiplier," IEEE Transactions on Computers,
vol. EC-13, pp. 14-17, 1964.
[62] "http://users.ids.net/randraka/multipli.htm.", Multiplication in FPGAs.
[63] J. B. Gosling, "Design of Large High-Speed Binary Multiplier Units," in Proceedings of
the IEE, vol. 118(3), pp. 499-505, 1971.
[64] S. H. Unger, Asynchronous Sequential Switching Circuits. Wiley (Interscience Division),
1969.
[65] B. Schneier, J. Kelsey, D. Whiting, D. Wagner, C. Hall, and N. Ferguson,
"Performance Comparison of the AES Submissions" available at web site,
"http://csrc.nist.gov/encryption/aes/round1/conf2/aes2conf.htm".
[66] X. Zhu, "A New Class of Unbalanced CAST Ciphers and Its Security Analysis" M.Eng.
thesis, Memorial University of Newfoundland, St. John's, NF, Canada, 1997.
Appendix A
A VHDL Description of RC6 Global
State Machine
-- This is a VHDL description of the Global state machine for the RC6 encryptor
-- at the behavioral level. The state machine is based on the Moore FSM model.
-- The global state machine is a synchronous one with asynchronous inputs.
library IEEE;
use IEEE.std_logic_1164.all;
use work.RC6_types.all;
use IEEE.std_logic_arith.all;
-- Entity Declaration
entity RC6_STATE_MACHINE is
  port(RESET_CHIP, DONE_44, DONE_4, DONE_OUT, KD, DD, CLK : in std_logic;
       COUNT_20 : in std_logic_vector(4 downto 0);
       COUNT_OUT : in std_logic_vector(1 downto 0);
       EN1, SEL_DEMUX, RESET_44, RESET_4, RESET_20 : out std_logic;
       EN_OUTPUT_REG, RESET_MUX : out std_logic;
       STATUS_FLAG : out std_logic_vector(2 downto 0));
end RC6_STATE_MACHINE;
-- Behavioral architecture for the entity
architecture BEHAVIORAL of RC6_STATE_MACHINE is
  type STATE is (RESET, KEY_DOWNLOAD, IDLE, DATA_DOWNLOAD, DATA_ENCRYPT);
  -- Signal declarations
  signal CURRENT : STATE := RESET;
  signal COUNT_OUT_INT : INTEGER range 0 to 3;
begin
  COUNT_OUT_INT <= conv_integer(unsigned(COUNT_OUT));
  process
  begin
    case CURRENT is
      -- The encryptor is in RESET state
      when RESET =>
        EN1 <= '1';
        RESET_20 <= '1';
        RESET_MUX <= '1';
        EN_OUTPUT_REG <= '1';
        STATUS_FLAG <= "111";
        if (RESET_CHIP = '0') then
          CURRENT <= KEY_DOWNLOAD;
        elsif (RESET_CHIP = '1') then
          CURRENT <= RESET;
        end if;
      -- The encryptor is downloading the key
      when KEY_DOWNLOAD =>
        EN1 <= '0'; SEL_DEMUX <= '0'; RESET_44 <= '0';
        RESET_4 <= '1'; RESET_20 <= '1';
        RESET_MUX <= '1';
        EN_OUTPUT_REG <= '1'; STATUS_FLAG <= "001";
        if (DONE_44 = '0') then
          STATUS_FLAG <= "010";
          CURRENT <= IDLE;
          RESET_44 <= '1';
        elsif (RESET_CHIP = '1') then
          CURRENT <= RESET;
        end if;
      -- The encryptor is in the idle state.
      when IDLE =>
        EN1 <= '1';
        RESET_44 <= '1';
        RESET_MUX <= '1';
        EN_OUTPUT_REG <= '1';
        STATUS_FLAG <= "000";
        if (KD = '0' and DD = '1' and RESET_CHIP = '0') then
          -- This assignment is missing from the source text; KEY_DOWNLOAD is
          -- inferred from the mirrored KD/DD test of the next branch.
          CURRENT <= KEY_DOWNLOAD;
        elsif (KD = '1' and DD = '0' and RESET_CHIP = '0') then
          CURRENT <= DATA_DOWNLOAD;
        elsif ((KD = DD) and RESET_CHIP = '0') then
          CURRENT <= IDLE;
        elsif (RESET_CHIP = '1') then
          CURRENT <= RESET;
        end if;
      -- The encryptor is downloading the data
      when DATA_DOWNLOAD =>
        EN_OUTPUT_REG <= '1'; STATUS_FLAG <= "011";
        if (DONE_4 = '0') then
          STATUS_FLAG <= "100";
          CURRENT <= DATA_ENCRYPT;
        elsif ((KD = DD) and RESET_CHIP = '0') then
          CURRENT <= IDLE;
        elsif (RESET_CHIP = '1') then
          CURRENT <= RESET;
        end if;
      -- The encryptor is encrypting the data
      when DATA_ENCRYPT =>
        EN1 <= '1'; RESET_44 <= '1'; RESET_4 <= '1';
        RESET_MUX <= '0'; RESET_20 <= '0';
        STATUS_FLAG <= "101";
        if (COUNT_20 = "10101") then
          EN_OUTPUT_REG <= '0';
          if COUNT_OUT_INT >= 0 then
            wait until CLK = '1';
            wait until CLK = '1';
            wait until CLK = '1';
            STATUS_FLAG <= "110";
            wait until CLK = '1';
            STATUS_FLAG <= "110";
          end if;
          if (DONE_OUT = '0') then
            EN_OUTPUT_REG <= '1';
          end if;
          if (DONE_OUT = '0' and (KD = DD) and RESET_CHIP = '0') then
            CURRENT <= IDLE;
          elsif (RESET_CHIP = '1') then
            CURRENT <= RESET;
          elsif (DONE_OUT = '0' and KD = '1' and DD = '0' and RESET_CHIP = '0') then
            -- This assignment is missing from the source text; DATA_DOWNLOAD is
            -- inferred from the same KD/DD test in the IDLE state.
            CURRENT <= DATA_DOWNLOAD;
          end if;
        elsif (RESET_CHIP = '1') then
          CURRENT <= RESET;
        end if;
    end case;
    wait until CLK = '1';
  end process;
end BEHAVIORAL;
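The listing above can be exercised with a small testbench. The following is only a sketch under stated assumptions: the testbench name RC6_STATE_MACHINE_TB, the 20 ns clock period, and all stimulus times are hypothetical and do not come from the thesis, and the control inputs are assumed to be inactive at '1', as the transition tests in the listing suggest. It walks the machine from RESET through KEY_DOWNLOAD, IDLE, and DATA_DOWNLOAD into DATA_ENCRYPT.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical testbench: the entity name, the clock period, and every
-- stimulus time below are illustrative assumptions, not thesis material.
entity RC6_STATE_MACHINE_TB is
end RC6_STATE_MACHINE_TB;

architecture SIM of RC6_STATE_MACHINE_TB is
  -- Inputs; control inputs start at '1' (assumed inactive level)
  signal RESET_CHIP : std_logic := '1';
  signal DONE_44    : std_logic := '1';
  signal DONE_4     : std_logic := '1';
  signal DONE_OUT   : std_logic := '1';
  signal KD, DD     : std_logic := '1';
  signal CLK        : std_logic := '0';
  signal COUNT_20   : std_logic_vector(4 downto 0) := (others => '0');
  signal COUNT_OUT  : std_logic_vector(1 downto 0) := (others => '0');
  -- Outputs of the state machine
  signal EN1, SEL_DEMUX, RESET_44, RESET_4, RESET_20 : std_logic;
  signal EN_OUTPUT_REG, RESET_MUX : std_logic;
  signal STATUS_FLAG : std_logic_vector(2 downto 0);
  -- Set to true at the end of the stimulus so that the clock stops
  signal FINISHED : boolean := false;
begin
  -- Free-running clock with a 20 ns period
  CLK <= not CLK after 10 ns when not FINISHED else '0';

  DUT : entity work.RC6_STATE_MACHINE
    port map (
      RESET_CHIP => RESET_CHIP, DONE_44 => DONE_44, DONE_4 => DONE_4,
      DONE_OUT => DONE_OUT, KD => KD, DD => DD, CLK => CLK,
      COUNT_20 => COUNT_20, COUNT_OUT => COUNT_OUT,
      EN1 => EN1, SEL_DEMUX => SEL_DEMUX, RESET_44 => RESET_44,
      RESET_4 => RESET_4, RESET_20 => RESET_20,
      EN_OUTPUT_REG => EN_OUTPUT_REG, RESET_MUX => RESET_MUX,
      STATUS_FLAG => STATUS_FLAG);

  STIMULUS : process
  begin
    wait for 45 ns;
    RESET_CHIP <= '0';  -- release reset: RESET -> KEY_DOWNLOAD
    wait for 100 ns;
    DONE_44 <= '0';     -- key download reported complete: -> IDLE
    wait for 40 ns;
    DONE_44 <= '1';
    DD <= '0';          -- KD = '1', DD = '0': IDLE -> DATA_DOWNLOAD
    wait for 100 ns;
    DONE_4 <= '0';      -- data download reported complete: -> DATA_ENCRYPT
    wait for 40 ns;
    FINISHED <= true;   -- stop the clock so the simulation can end
    wait;
  end process;
end SIM;
```

Observing STATUS_FLAG in a waveform viewer while this runs gives a quick check of the state sequence; a fuller testbench would also drive COUNT_20, COUNT_OUT, and DONE_OUT to cover the end of the encryption phase.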
Appendix B
Gate-Level Simulation of RC6 Cipher
This appendix shows the gate-level simulation results for the design of the RC6 cipher. The
entire simulation is divided into nine segments, with each segment illustrating a particular
mode of operation of the cipher. The simulation figures illustrate all the five modes: data-
download mode, keys-download mode, reset mode, idle mode, and the data-encrypt mode.
The simulation also shows the state of the three asynchronous signals, namely, RESET_CHIP,
KD, and DD.
During the reset mode, all the control inputs to the datapath are disabled and as such no
ciphertext appears at the output of the cipher, as illustrated in the simulation figures. During
the key-download mode, forty-four 32-bit subkeys are downloaded into the cipher. These
subkeys are to be used for encryption during the data-encrypt mode. During the data-
download mode, the 128-bit plaintext block is downloaded into the cipher. Finally, when
the global state machine is in the data-encrypt mode, the 128-bit plaintext is encrypted
synchronously until, at the end of the required number of rounds of encryption, the
128-bit ciphertext is available at the 32-bit output bus of the cipher. The idle mode is used
to provide more flexibility to the RC6 global state machine.
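Since the simulated design exposes 32-bit data buses while the plaintext block is 128 bits wide, a block has to be transferred as four 32-bit words. The sketch below shows one way such a word-serial load could be written. It is not taken from the thesis: the entity BLOCK_LOADER, the LOAD enable, and the word order (first word ends up in the most significant position) are all assumptions made for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical helper, not part of the thesis design: assembles a 128-bit
-- block from four 32-bit words presented on consecutive clock cycles.
entity BLOCK_LOADER is
  port (CLK       : in  std_logic;
        LOAD      : in  std_logic;  -- assumed active-high word enable
        DATA_IN   : in  std_logic_vector(31 downto 0);
        BLOCK_OUT : out std_logic_vector(127 downto 0));
end BLOCK_LOADER;

architecture RTL of BLOCK_LOADER is
  signal SHIFT_REG : std_logic_vector(127 downto 0) := (others => '0');
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if LOAD = '1' then
        -- Shift the stored words up by 32 bits and place the new word in
        -- the low 32 bits; after four loads the first word is in bits
        -- 127 downto 96 (assumed ordering).
        SHIFT_REG <= SHIFT_REG(95 downto 0) & DATA_IN;
      end if;
    end if;
  end process;

  BLOCK_OUT <= SHIFT_REG;
end RTL;
```

A counter that asserts a completion flag after the fourth word would play the role that the DONE_4 input plays for the global state machine in Appendix A.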
[Waveform plot of /ENCRYPTOR_RC6_TEST: DATA_IN(31:0), DATA_OUT(31:0), RESET_CHIP, KD, DD, CLK, STATUS_FLAG(2:0), PLAINTEXT(127:0), ENCR_SUBKEYS(0)(31:0) to ENCR_SUBKEYS(4)(31:0)]
Figure B.1: Gate-Level Simulation of RC6 Cipher Design
[Waveform plot of /ENCRYPTOR_RC6_TEST; same signals as Figure B.1]
Figure B.2: Gate-Level Simulation of RC6 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_RC6_TEST; same signals as Figure B.1]
Figure B.3: Gate-Level Simulation of RC6 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_RC6_TEST; same signals as Figure B.1]
Figure B.4: Gate-Level Simulation of RC6 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_RC6_TEST; same signals as Figure B.1]
Figure B.5: Gate-Level Simulation of RC6 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_RC6_TEST; same signals as Figure B.1]
Figure B.6: Gate-Level Simulation of RC6 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_RC6_TEST; same signals as Figure B.1]
Figure B.7: Gate-Level Simulation of RC6 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_RC6_TEST; same signals as Figure B.1]
Figure B.8: Gate-Level Simulation of RC6 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_RC6_TEST; same signals as Figure B.1]
Figure B.9: Gate-Level Simulation of RC6 Cipher Design Cont'd
Appendix C
Gate-Level Simulation of CAST-256
Cipher
This appendix shows the gate-level simulation results for the design of the CAST-256 cipher. The
entire simulation is divided into eight segments, with each segment illustrating a particular
mode of operation of the cipher. The simulation figures illustrate all the six modes.
During the reset mode, all the control inputs to the datapath are disabled and as such
no ciphertext appears at the output of the cipher, as illustrated in the simulation figures.
During the masking-key-download mode, forty-eight 32-bit subkeys are downloaded into the
cipher, whereas during the rotation-key-download mode, the same number of 5-bit subkeys are
downloaded. When the global state machine is in the data-download mode, the 128-bit
plaintext block is downloaded into the cipher. Finally, when the global state machine is in
the data-encrypt mode, the 128-bit plaintext is encrypted. The idle mode is used to provide
more flexibility to the CAST-256 global state machine.
[Waveform plot of /ENCRYPTOR_CAST_256_TEST: DATA_IN(31:0), ENCRYPTED_DATA_OUT(31:0), RESET_CHIP, KD, DD, CLK, STATUS_FLAG(3:0), PLAINTEXT(127:0), MASKING_SUBKEYS(0)(31:0) to MASKING_SUBKEYS(4)(31:0), ROUND_SUBKEYS(0)(4:0) to ROUND_SUBKEYS(5)(4:0)]
Figure C.1: Gate-Level Simulation of CAST-256 Cipher Design
[Waveform plot of /ENCRYPTOR_CAST_256_TEST; same signals as Figure C.1]
Figure C.2: Gate-Level Simulation of CAST-256 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_CAST_256_TEST; same signals as Figure C.1]
Figure C.3: Gate-Level Simulation of CAST-256 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_CAST_256_TEST; same signals as Figure C.1]
Figure C.4: Gate-Level Simulation of CAST-256 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_CAST_256_TEST; same signals as Figure C.1]
Figure C.5: Gate-Level Simulation of CAST-256 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_CAST_256_TEST; same signals as Figure C.1]
Figure C.6: Gate-Level Simulation of CAST-256 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_CAST_256_TEST; same signals as Figure C.1]
Figure C.7: Gate-Level Simulation of CAST-256 Cipher Design Cont'd
[Waveform plot of /ENCRYPTOR_CAST_256_TEST; same signals as Figure C.1]
Figure C.8: Gate-Level Simulation of CAST-256 Cipher Design Cont'd
Appendix D
Gate-Level Simulation of Fast
Hardware Cipher (FHC)
This appendix shows the gate-level simulation results for the design of the proposed cipher,
referred to as Fast Hardware Cipher or FHC. The entire simulation is divided into eight
segments, with each segment illustrating a particular mode of operation of the cipher. The
simulation figures illustrate all the five modes: reset mode (7), key-download mode (1, 2),
data-download mode (3, 4), idle mode (0), and data-encrypt mode (5, 6).
During the reset mode, all the control inputs to the datapath are disabled and as such
no ciphertext appears at the output of the cipher, as illustrated in the simulation figures.
During the key-download mode, ninety-six 32-bit subkeys are downloaded into the cipher,
whereas during the data-download mode, the 128-bit plaintext block is downloaded into
the cipher. Finally, when the global state machine is in the data-encrypt mode, the 128-bit
plaintext is encrypted synchronously until, at the end of the required number of rounds
of encryption, the 128-bit ciphertext is available at the 32-bit output bus of the cipher.
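The numbers given in parentheses after each mode appear to be the values shown on the 3-bit STATUS_FLAG signal in the figures; this reading is an assumption, although it matches the STATUS_FLAG assignments in the RC6 state machine of Appendix A. Under that assumption, a waveform or testbench could translate the flag into a mode name with a small decode function such as the sketch below. The package name FHC_STATUS_PKG, the type FHC_MODE, and the function DECODE_STATUS are hypothetical names introduced here.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical package, not part of the thesis design: decodes the 3-bit
-- status flag into the five modes listed in the text above.
package FHC_STATUS_PKG is
  type FHC_MODE is (IDLE_MODE, KEY_DOWNLOAD_MODE, DATA_DOWNLOAD_MODE,
                    DATA_ENCRYPT_MODE, RESET_MODE, UNKNOWN_MODE);
  function DECODE_STATUS (FLAG : std_logic_vector(2 downto 0))
    return FHC_MODE;
end FHC_STATUS_PKG;

package body FHC_STATUS_PKG is
  function DECODE_STATUS (FLAG : std_logic_vector(2 downto 0))
    return FHC_MODE is
  begin
    case FLAG is
      when "000"         => return IDLE_MODE;           -- 0
      when "001" | "010" => return KEY_DOWNLOAD_MODE;   -- 1, 2
      when "011" | "100" => return DATA_DOWNLOAD_MODE;  -- 3, 4
      when "101" | "110" => return DATA_ENCRYPT_MODE;   -- 5, 6
      when "111"         => return RESET_MODE;          -- 7
      when others        => return UNKNOWN_MODE;        -- 'U', 'X', etc.
    end case;
  end DECODE_STATUS;
end FHC_STATUS_PKG;
```

The extra UNKNOWN_MODE value covers uninitialized flag values, which the figures show as U at the start of simulation.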
[Waveform plot of /FAST_ENCRYPTOR_TEST: PLAINTEXT_IN(31:0), CIPHERTEXT_OUT(31:0), RESET_CHIP, KD, DD, CLK, STATUS_FLAG(2:0), FAE/DA3(31:0), FAE/DA4(127:0), FAE/FEC/MASKING_KEYS(31:0)]
Figure D.1: Gate-Level Simulation of FHC Design
[Waveform plot of /FAST_ENCRYPTOR_TEST; same signals as Figure D.1]
Figure D.2: Gate-Level Simulation of FHC Design Cont'd
[Waveform plot of /FAST_ENCRYPTOR_TEST; same signals as Figure D.1]
Figure D.3: Gate-Level Simulation of FHC Design Cont'd
[Waveform plot of /FAST_ENCRYPTOR_TEST; same signals as Figure D.1]
Figure D.4: Gate-Level Simulation of FHC Design Cont'd
[Waveform plot of /FAST_ENCRYPTOR_TEST; same signals as Figure D.1]
Figure D.5: Gate-Level Simulation of FHC Design Cont'd
[Waveform plot of /FAST_ENCRYPTOR_TEST; same signals as Figure D.1]
Figure D.6: Gate-Level Simulation of FHC Design Cont'd
[Waveform plot of /FAST_ENCRYPTOR_TEST; same signals as Figure D.1]
Figure D.7: Gate-Level Simulation of FHC Design Cont'd
[Waveform plot of /FAST_ENCRYPTOR_TEST; same signals as Figure D.1]
Figure D.8: Gate-Level Simulation of FHC Design Cont'd




