New methods for the implementation of statistical cipher feedback mode by Zhang, Liang




New Methods for the Implementation of Statistical Cipher Feedback Mode
by
© Liang Zhang
Master of Engineering
A thesis submitted to the
School of Graduate Studies
in partial fulfillment of the
requirements for the degree of
Master of Engineering.
Department of Electrical and Computer Engineering
Memorial University of Newfoundland
May 8, 2008
St. John's, Newfoundland
Contents
Abstract
Acknowledgements
List of Tables
List of Figures
Notation and List of Abbreviations
1 Introduction
1.1 Symmetric-Key Ciphers
1.2 Public-Key Ciphers
1.3 Motivation
1.4 Objective of the Thesis
1.5 Thesis Outline
2 Background
2.1 Block Ciphers
2.2 Stream Ciphers
2.3 Block Cipher Modes of Operation
2.4 Advanced Encryption Standard (AES)
2.4.1 Implementation of AES S-box
2.4.2 Hardware Analysis of AES S-box
2.4.3 Shift Row and Inverse Shift Row
2.4.4 Mix Column and Inverse Mix Column
2.4.5 Add Round Key
2.4.6 Key Scheduling
2.5 Statistical Cipher Feedback Mode
2.5.1 Implementation Structure of SCFB System
2.5.2 Discussion on Queuing System
2.5.3 Serial Transfer vs. Parallel Transfer
2.5.4 Relationships of clocks
2.5.5 Synchronization Cycle
2.5.6 SCFB with CTR mode
2.5.7 Previous SCFB Implementations
2.5.8 Performance Analysis of SCFB Mode
2.6 Conclusion
3 SCFB Mode Using Serial Transfer
3.1 AES Implementation
3.2 SCFB Mode Hardware Implementation Details
3.2.1 Registers
3.2.2 System Controller
3.2.3 Plaintext Queue and Ciphertext Queue
3.3 Synthesis Results, Analysis and Comments on the Design
3.4 Conclusion
4 SCFB Mode Using Parallel Transfer
4.1 Hardware Implementation Details
4.1.1 Shift Register
4.1.2 IV_Shift_Register
4.1.3 Plaintext Queue and Ciphertext Queue
4.2 Synthesis Results, Analysis and Comments on the Design
4.3 Conclusion
4.4 Conclusion
5 Pipelined SCFB Mode Using Parallel Transfer
5.1 SCFB Based on Pipelined Counter mode (CTR)
5.2 Hardware Implementation Details
5.2.1 Implementation of Counter Mode (CTR)
5.2.2 Advanced Encryption Standard (AES)
5.2.3 System Controller
5.2.4 IV Shift Register for Parallel Transfer Mode
5.2.5 Shift Registers
5.2.6 Plaintext Queue and Ciphertext Queue
5.3 Synthesis Results, Analysis and Comments on the Design
5.4 Conclusion
6 Analysis of SRD and EPF
6.1 Error Propagation Factor
6.1.1 EPF of the Pipelined SCFB Mode Versus Various Blackout Period Lengths
6.1.2 EPF of Pipelined SCFB Mode Versus Various Sync Pattern Sizes
6.2 Sync Recovery Delay
6.2.1 SRD Versus Various Blackout Period
6.2.2 SRD Versus Various Sync Pattern Sizes
6.3 Conclusion
7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
Appendix A
A Partial VHDL Codes for SCFB Systems
A.1 SCFB System Controller using Serial Transfer
A.2 SCFB System Controller using Parallel Transfer
A.3 Pipelined SCFB System Controller
A.4 Top Level RTL of Pipelined SCFB System
Abstract 
In this thesis, we investigate a recently proposed mode of operation for block ciphers,
referred to as statistical cipher feedback (SCFB) mode. SCFB mode is designed for
high-speed stream-oriented transmission where it is necessary to recover from any
number of bit slips or insertions in the communication channel; that is, SCFB mode
is capable of self-synchronization. SCFB mode is a hybrid of CFB mode and OFB mode:
it has a higher throughput than CFB mode and can self-synchronize, which OFB mode
cannot. As a result, SCFB mode can be applied to physical layer security for
applications such as SONET/SDH.
In this thesis, SCFB mode using both serial transfer and parallel transfer is
implemented in hardware. Additionally, we have implemented pipelined SCFB mode
based on parallel transfer in hardware. The hardware implementation of these
SCFB structures is thoroughly investigated. Throughout this research, VHDL and
ModelSim SE 6.0 are used for hardware design and verification. Further, the
implemented SCFB modes are synthesized using Synopsys tools (versions 2002 and
2004), targeting ASICs in 0.18 micron CMOS technology based on the TSMC (Taiwan
Semiconductor Manufacturing Company) process supported by Canadian
Microelectronics Corporation (CMC).
As an outcome of this work, we have created a modified version of SCFB mode, which
we refer to as pipelined SCFB mode. Pipelined SCFB mode uses a block cipher with a
pipelined architecture and replaces the OFB mode used in conventional SCFB mode
with counter (CTR) mode.
Based on the synthesis results, the throughput of SCFB using serial transfer and
parallel transfer (block transfer size equal to 4 bits) can reach 100 Mbps and
222 Mbps, respectively. The total gate counts of these two SCFB systems are
41600 and 43697, respectively. For the pipelined SCFB mode, the throughput and
area complexity are 333 Mbps and 189963 gates.
The performance of pipelined SCFB mode is also analyzed with respect to
characteristics such as synchronization recovery delay (SRD) and error propagation
factor (EPF). Moreover, an analysis of the system queues is provided, covering the
number of bits in the plaintext queue, the queue size requirements and the
probability of queue overflow.
Among these implementations, the pipelined SCFB mode based on parallel transfer
obtains the highest throughput, and the SCFB mode using serial transfer has the
lowest area complexity. Hence, the pipelined SCFB mode using parallel transfer is
more suitable for high-speed physical layer security.
Acknowledgements 
I am very grateful to my supervisor, Dr. Howard Heys, for his constant guidance,
feedback and encouragement, and for keeping me focused on my research. During the
past two years, Dr. Heys has given me consistent trust and support, which greatly
encouraged me to improve and finish my work.
This is also a chance to thank all the members of the Computer Engineering Research
Laboratory (CERL) at Memorial University of Newfoundland during these years.
Special thanks to Ling Wu, Reza Shahidi, Pu Wang, Tianqi Wang, Shenqiu Zhang,
Jonathan Anderson and Shi Chen for their precious friendship and generous support.
I am also indebted to my parents. Thank you for your continuous support and
love during the two years of my Master's study.
I would like to thank my wife, Yanan Ma, for her selfless support and for
encouraging me to pursue this degree. Thank you for saving me from all the
depressions I have gone through. Without my wife's encouragement, I would not have
finished the degree.
List of Tables 
2.1 Area Complexity of AES S-Box Using 0.18µm CMOS Standard Cell Technology [15]
2.2 Timing Delay of AES S-Box Using 0.18µm CMOS Standard Cell Technology [15]
2.3 Synthesis Result Using 0.18 Micron CMOS From [19]
3.1 Synthesis Result Using 0.18 Micron CMOS
4.1 Synthesis Result Using 0.18 Micron CMOS (Block Transfer Size = 4 Bits)
5.1 Boundary Positions Where the Sync Pattern is Recognized
5.2 Synthesis Result Using 0.18 Micron CMOS (Block Transfer Size = 8 bits)
List of Figures 
2.1 Electronic Codebook (ECB) Mode
2.2 Cipher Block Chaining (CBC) Mode
2.3 m-bit Cipher Feedback (CFB) Mode
2.4 m-bit Output Cipher Feedback (OFB) Mode
2.5 Counter (CTR) Mode
2.6 AES
2.7 S-Box: Substitution Values (in Hexadecimal Format)
2.8 Inverse S-Box: Substitution Values (in Hexadecimal Format)
2.9 Block Diagram of the LR Implementation of S-Box [13]
2.10 Schematic Representation of Multiplicative Inverse [12]
2.11 Shift Rows Transformation [6]
2.12 Inverse Shift Rows Transformation [6]
2.13 Mix Column Operation
2.14 Xtimes Block Diagram [16]
2.15 Inverse Mix Column Operation
2.16 Joint Implementation of MixColumns and InvMixColumns Transformations [17]
2.17 Key Scheduling
2.18 SCFB System Compared to CFB and OFB
2.19 Synchronization Cycle for Serial Transfer Mode SCFB
2.20 SCFB with CTR Mode
3.1 Block Diagram of the AES Controller
3.2 FSM of AES Controller
3.3 Hardware Implementation of SCFB Using Serial Transfer
3.4 Shift Register
3.5 IV Shift Register
3.6 Block Diagram of the System Controller
3.7 FSM of System Controller
3.8 Probability Distribution of # Bits in the Plaintext Queue
4.1 Hardware Implementation of SCFB Using Parallel Transfer (N=4)
4.2 Shift Register for Parallel Transfer (N=4)
4.3 IV Shift Register Using Parallel Transfer (N=4)
4.4 Sync Pattern Recognition for Parallel Transfer (N=4)
4.5 Process of New IV Collecting for Parallel Transfer (N=4)
4.6 Plaintext Queue for Parallel Transfer (N=4)
4.7 Plaintext Queue Output Buffer for Parallel Transfer (N=4)
4.8 Ciphertext Queue for Parallel Transfer (N=4)
4.9 Ciphertext Queue Input Buffer for Parallel Transfer (N=4)
4.10 Probability Distribution of # Bits in the Plaintext Queue (Block Transfer Size=4 Bits)
5.1 Synchronization Cycle for L-Stage Pipelined SCFB
5.2 Hardware Implementation of Pipelined SCFB Using Parallel Transfer
5.3 Block Diagram of Linear Feedback Shift Register (LFSR)
5.4 Block Diagram of Ports Specification of the LFSR
5.5 11-Stage Pipelined AES Using Key-Scheduling
5.6 Block Diagram of the AES Controller for Pipelined SCFB
5.7 FSM of AES Controller for Pipelined SCFB
5.8 Port Specification of System Controller for Pipelined SCFB
5.9 Finite State Machine of SCFB System Controller for Pipelined SCFB
5.10 IV Shift Register for Pipelined SCFB Using Parallel Transfer (N=8)
5.11 Sync Pattern Recognition for Pipelined SCFB Using Parallel Transfer (N=8)
5.12 Boundary Adjustment for Resynchronization in Pipelined SCFB Using CTR Mode
5.13 Block Diagram of Shift Registers for Pipelined SCFB Using Parallel Transfer (N=8)
5.14 Data Flow of Shift Registers for Pipelined SCFB Using Parallel Transfer (N=8)
5.15 Plaintext Queue for Pipelined SCFB Mode Based on Parallel Transfer (N=8)
5.16 Ciphertext Queue for Pipelined SCFB Mode Based on Parallel Transfer (N=8)
6.1 Synchronization Cycle for Pipelined SCFB with Various Blackout Period
6.2 EPF of the Pipelined CTR mode vs. various Blackout Period
6.3 EPF of Pipelined CTR mode SCFB vs. various Sync Pattern Size
6.4 SRD vs. various Blackout Period
6.5 SRD vs. various Sync Pattern size
Notation and List of Abbreviations 
n       The length of sync pattern
k       The length of data bit between the previous sync pattern and the next sync pattern in ciphertext data
B       The length of a block
M       The size of queue
η       The theoretical efficiency
R       The rate of incoming data of plaintext queue and outgoing data of ciphertext queue
R'      The rate of outgoing data of plaintext queue and incoming data of ciphertext queue
m       The number of data less than or equal to the length of a block
N       Block transfer size
L       The number of pipeline stages
k       Average length of CTR mode block
SCFB    Statistical Cipher Feedback
NIST    National Institute of Standards and Technology
ECB     Electronic Code Book
CBC     Cipher Block Chaining
CFB     Cipher Feedback
OFB     Output Feedback
CTR     Counter mode
LUT     Lookup Table
LR      Linear Redundancy
EDA     Electronic Design Automation
FSM     Finite State Machine
FIFO    First In First Out
WFSM    Write Finite State Machine
SR      Shift Register
ASIC    Application-Specific Integrated Circuit
DES     Data Encryption Standard
AES     Advanced Encryption Standard
CAD     Computer Aided Design
XOR     Exclusive-or
IV      Initialization Vector
HDL     Hardware Description Language
VHDL    VHSIC Hardware Description Language
IC      Integrated Circuit
VLSI    Very Large Scale Integration
CMC     Canadian Microelectronics Corporation
CMOS    Complementary Metal-Oxide-Semiconductor
TSMC    Taiwan Semiconductor Manufacturing Company
FPGA    Field Programmable Gate Arrays
LFSR    Linear Feedback Shift Register
ATM     Asynchronous Transfer Mode
SRD     Synchronization Recovery Delay
EPF     Error Propagation Factor
Chapter 1 
Introduction 
Cryptography is the practice and study of hiding information. We can also define
cryptography as the science of encrypting and decrypting data by using mathematics
[1]. Today the world is pervaded by electronic connectivity and, with it, electronic
fraud, viruses and hackers, so network security is becoming more and more important.
The interconnection of computer systems via networks is growing fast; hence, people
are increasingly dependent on the information communicated through these systems.
The discipline of cryptography has led to the development of practical applications
to enforce network security [1]. With cryptography, a sender is able to hide
sensitive information or transmit it across insecure networks so that it cannot be
read except by the intended recipient. Cryptanalysis is the study of methods of
analyzing and breaking secure communication. Cryptanalysts are also called attackers.
The areas of cryptography and cryptanalysis together are called cryptology.
There are three main requirements in information security, namely, confidentiality,
integrity, and authentication [1]. Confidentiality is the protection of data from
unauthorized disclosure: only authorized access to the information is allowed.
Integrity is the assurance that data received is exactly as sent by an authorized
entity, without modification, insertion or deletion. Authentication is the assurance
that the communicating entity is the one that it claims to be and that the
information has not been processed by another party during transmission [1].
A cryptographic system normally involves an encryption system and a decryption
system. Before we define encryption and decryption, we should define plaintext and
ciphertext. Plaintext is data that can be read and understood without any special
measures. Ciphertext is information that has been encrypted into seemingly
meaningless code. Encryption is the process of transforming plaintext, using an
algorithm and a key, to make it unreadable to anyone except the intended recipient.
Decryption is the process of reverting ciphertext to its original plaintext. There
are two types of encryption, symmetric-key encryption and public-key encryption. We
discuss them in the following sections.
1.1 Symmetric-Key Ciphers 
In a symmetric-key cryptosystem, encryption and decryption use the same key. The
Data Encryption Standard (DES) [2] is an example of a symmetric-key cryptosystem
that has been widely deployed by the U.S. Government and the banking industry.
Nowadays, DES is being replaced by the Advanced Encryption Standard (AES) [3]. A
symmetric encryption scheme has five ingredients: plaintext, ciphertext, encryption
algorithm, decryption algorithm and secret key. Normally, the encryption and
decryption algorithms are published, but the key is kept secret. In a symmetric-key
cipher, maintaining the secrecy of the key is the principal security problem.
For a symmetric-key cipher to be secure, there are two requirements [2].
1. The encryption/decryption algorithm must be strong. Even if the attacker
knows the ciphertext and the encryption/decryption algorithm, he/she must not be
able to obtain the secret key or decrypt the ciphertext.
2. The secret key must be kept secure by both sender and receiver. If the attacker
obtains the secret key and knows the encryption algorithm, all the ciphertext
passing through the communication channel can be deciphered and read.
Substitution and transposition are two basic operations used in symmetric-key 
encryption. In the substitution operation, the symbols of plaintext are substituted 
by other symbols. In the transposition technique, the plaintext symbol positions are 
permuted. 
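
As a minimal sketch of these two operations, the following example applies a Caesar-style shift (substitution) and a columnar read-out (transposition) to the same message. The shift amount, the column count and the function names are illustrative assumptions and are not taken from any real cipher.

```python
import string

ALPHABET = string.ascii_uppercase

def substitute(text, shift=3):
    """Substitution: replace each letter by the one `shift` places later."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 26] for ch in text)

def transpose(text, columns=4):
    """Transposition: write the text row by row, read it column by column."""
    return "".join(text[i::columns] for i in range(columns))

plaintext = "ATTACKATDAWN"
print(substitute(plaintext))  # DWWDFNDWGDZQ (symbols change, positions stay)
print(transpose(plaintext))   # ACDTKATAWATN (symbols stay, positions change)
```

Practical symmetric-key ciphers combine many rounds of both operations rather than using either one alone.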
1.2 Public-Key Ciphers 
Public-key cryptography, also known as asymmetric cryptography, utilizes two
different keys, a public key and a private key, for encryption and decryption. The
public key may be widely distributed, but the private key is kept secret by the
intended recipient. The keys are related mathematically, but the private key cannot
practically be derived from the public key within a reasonable time. Normally, at
the transmitter, the plaintext is encrypted with the public key. At the receiver,
the ciphertext can be deciphered only with the corresponding private key. In some
algorithms, such as RSA, the plaintext can be encrypted with either the public key
or the private key depending on the nature of the application. For a public-key
cryptosystem, the essential steps are as follows.
1. Suppose there are several users, USER_1, USER_2, ..., and each of them
generates a public key and a private key and places the former in a public register.
Each user collects the public keys of the others.
2. If USER_1 needs to send a secret message to USER_2, USER_1 encrypts this
message with USER_2's public key.
3. When USER_2 receives the encrypted message from USER_1, USER_2 deciphers
it using his/her own private key. Only USER_2 can decrypt the message from
USER_1 because only USER_2 holds USER_2's private key.
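
The steps above can be sketched with textbook RSA. The primes, exponent and message below are deliberately tiny, illustrative assumptions; practical systems use much longer keys together with a padding scheme, which this sketch omits.

```python
# Toy RSA key pair for USER_2 (tiny, insecure parameters chosen for illustration).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

public_key, private_key = (n, e), (n, d)

def encrypt(message, key):
    modulus, exponent = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, key):
    modulus, exponent = key
    return pow(ciphertext, exponent, modulus)

# USER_1 encrypts with USER_2's public key; USER_2 decrypts with the private key.
message = 65
ciphertext = encrypt(message, public_key)
print(ciphertext)                        # 2790
print(decrypt(ciphertext, private_key))  # 65
```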
Public-key cryptography is normally based on mathematical functions rather than
on the substitution and transposition used in symmetric-key cryptography. Although
public-key cryptography is a great revolution in the history of cryptography, it is
not inherently more secure against cryptanalysis than symmetric encryption, because
the security of any encryption/decryption scheme is basically determined by the
length of the key and the computational complexity of the algorithm.
RSA was the first public-key encryption algorithm to come into widespread use. It
is widely used in electronic commerce protocols. If the key is long enough
(currently the typical key size is 1024 bits), RSA is believed to be secure.
1.3 Motivation 
Today, more and more commercial activities, transactions and services are offered
over high-speed communication networks. In order to take advantage of the large
bandwidth of high-speed networks while keeping the data secure, modes of operation
are becoming more and more important. This thesis studies a recently proposed mode
of operation, statistical cipher feedback (SCFB) mode [4] [5]. SCFB mode, like
cipher feedback (CFB) mode, has the ability to self-synchronize in order to overcome
slips and insertions, but can be implemented in digital hardware with a higher
throughput than CFB mode.
1.4 Objective of the Thesis 
The main focus of the thesis is the digital hardware implementation of SCFB mode.
The detailed hardware design characteristics, including the Advanced Encryption
Standard and the SCFB system hardware structure, are discussed. We also investigate
the hardware characteristics with respect to the relationship between the plaintext
queue and the ciphertext queue, queue overflow, the relationship between clock
domains, serial transfer mode versus parallel transfer mode, and implementation
throughput and efficiency. We perform functional simulations for three
implementation structures:
1. SCFB mode using serial transfer. 
2. SCFB mode using parallel transfer. 
3. Pipelined SCFB mode. 
The secondary objective of the thesis is to analyze the error propagation factor,
the synchronization recovery delay and the probability distribution of the number
of bits in the plaintext queue.
The research compares the hardware structure and performance of serial transfer
mode, parallel transfer mode and pipelined SCFB mode. As a result, we draw
conclusions regarding which mode is suitable for low-area implementation and which
mode is suitable for high-speed networks.
1.5 Thesis Outline
This thesis has seven chapters. Chapter 1 is the introduction. Chapter 2
provides the background of statistical cipher feedback (SCFB) mode and considers
previous related research. Specifically, several block cipher modes of operation,
the Advanced Encryption Standard (AES) algorithm [6] and the SCFB mode of operation
are discussed. In addition, we implement the AES S-box using three different methods
and compare them with respect to timing delay and area complexity. The structure
and performance analysis of SCFB mode are briefly introduced.
Chapter 3 provides a hardware implementation of SCFB mode using serial transfer.
In this chapter, we present an implementation of AES in which the S-boxes are
constructed to perform inversion in GF(2^8) using a composite field based on
GF(2^4) [7]. The hardware implementation of SCFB mode using serial transfer is
described in detail. At the end of this chapter, hardware characteristics such as
the area complexity and timing analysis are discussed. The queuing system is also
analyzed.
Chapter 4 provides a hardware implementation of SCFB mode using parallel transfer.
In this chapter, we present an implementation of AES in which the S-boxes use a
simple Boolean function implementation in order to obtain high speed. The detailed
hardware implementation of SCFB mode using parallel transfer with a block transfer
size of 4 bits (N=4) is investigated. Hardware characteristics such as the area
complexity and timing analysis are discussed. The queuing system, characterized by
the number of bits in the plaintext queue, is also analyzed in this chapter.
Chapter 5 provides a hardware implementation of pipelined SCFB mode using parallel
transfer (N=8). In this chapter, we present an implementation of AES with 11
pipeline stages in which the S-boxes use a simple Boolean function implementation
in order to obtain high speed. The detailed hardware implementation of pipelined
SCFB mode based on parallel transfer is discussed. Further, hardware characteristics
such as the area complexity and timing analysis are compared with those of the
non-pipelined SCFB mode.
Chapter 6 provides the performance analysis of SCFB mode with respect to
synchronization recovery delay (SRD) and error propagation factor (EPF) [8]. In this
chapter, we investigate the EPF and SRD of the pipelined SCFB mode versus the number
of pipeline stages and the sync pattern size.
Chapter 7 draws conclusions for this thesis and provides directions for future
work.
Chapter 2 
Background 
This chapter introduces the background on block cipher modes and provides some
preliminary implementation results of the Advanced Encryption Standard (AES) [1]
[6]. This chapter also provides some results of previous work on SCFB mode, which
can be compared with our work.
Normally, an encryption/decryption system is realized by using a mode of operation.
Security and efficiency are two important aspects of a cipher system implementation,
and the mode of operation chosen for an application has a great influence on both.
Thus, it is important to study the modes of operation. We introduce five different
block cipher modes of operation in this chapter.
2.1 Block Ciphers 
A block cipher is one in which a block of plaintext is treated as a whole and used
to produce a block of ciphertext with the same length as the plaintext. Usually, a
block size of 64 or 128 bits is applied. In general, block ciphers have a broader
range of applications than stream ciphers, which encrypt a digital data stream one
bit or one symbol at a time. Nowadays, the majority of network-based symmetric-key
cryptographic applications make use of block ciphers. In recent years, the Advanced
Encryption Standard (AES) [1] has become a widely applied block cipher. Later
in this chapter, we will discuss AES in detail.
2.2 Stream Ciphers 
A stream cipher is an important method of encryption in which the plaintext is
encrypted bit-by-bit or symbol-by-symbol to produce the corresponding ciphertext
[9]. A stream cipher can be constructed by using the output of a block cipher as a
pseudo-random keystream, which is exclusive-ored (XORed) with the plaintext to
produce the ciphertext at the transmitter. At the receiver, the plaintext is
recovered by generating the identical keystream, which is then XORed with the
ciphertext. Stream ciphers can be used for high-speed networks at the physical layer
of a communication system.
In a typical stream cipher configuration, a single bit of ciphertext error results
in only a single bit of recovered plaintext error. However, for such stream ciphers,
the rest of the recovered plaintext becomes complete nonsense if bit slips or
insertions happen in the communication channel. Hence, it is important to keep the
keystreams of the transmitter and receiver synchronized. Output feedback (OFB) mode
and cipher feedback (CFB) mode are two conventional modes of operation of block
ciphers that allow their use as stream ciphers. However, they both have
disadvantages. In this work, we are concerned with statistical cipher feedback
(SCFB) mode, proposed in [4] and investigated in [8], which is a hybrid of CFB and
OFB mode. SCFB mode configures block ciphers, such as the Advanced Encryption
Standard (AES) [6], as stream ciphers capable of self-synchronization. SCFB mode has
been proposed to provide physical layer security for a SONET/SDH environment and is
suitable for many other applications as well.
2.3 Block Cipher Modes of Operation 
The National Institute of Standards and Technology (NIST) has expanded the list of
"modes of operation" to five in Special Publication 800-38A [10]. Electronic codebook
(ECB) mode [1], as shown in Figure 2.1, is the simplest block cipher mode. In
this and the following figures, B is used to represent the block size. In ECB mode,
the plaintext is encrypted in blocks of B bits using the same key each time. The
term codebook is used because, for a given key, there is a unique ciphertext for
every B-bit block of plaintext, much as a paper codebook would have been used in
early ciphers [1]. For short messages, ECB mode is ideal. However, for a large
amount of data, ECB mode may not be secure: the same block of plaintext always
produces the same ciphertext whenever it appears in the message more than once. If a
lengthy message is highly structured, a cryptanalyst may have a chance to exploit
these regularities.
Figure 2.1: Electronic Codebook (ECB) Mode
Cipher block chaining (CBC) mode [1], as shown in Figure 2.2, is used to overcome
the security defects of ECB. CBC mode utilizes a technique in which the same B-bit
plaintext block, if repeated, produces different ciphertext blocks. In CBC mode, the
input to the encryption block cipher is the exclusive-or (XOR) of the current
plaintext block and the preceding ciphertext block. Therefore, the input to the
block cipher bears no fixed relationship to the plaintext block, even though the
same key is used for each block. For decryption, each B-bit ciphertext block is
passed through the decryption algorithm. The result from the decryption block cipher
is XORed with the preceding ciphertext block to produce the corresponding B-bit
plaintext block. An initialization vector (IV) is used to produce the first block of
ciphertext/plaintext on encryption/decryption. The IV should be unique for every
sequence [1].
Figure 2.2: Cipher Block Chaining (CBC) Mode
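
The difference between ECB and CBC can be illustrated with a short sketch. The 4-round Feistel construction built on SHA-256 below is only a hypothetical stand-in for a real block cipher such as AES, the key, IV and message are arbitrary, and messages are assumed to be a whole number of blocks.

```python
import hashlib

B = 16  # block size in bytes (128 bits)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key, i, half):
    # Round function of the toy Feistel cipher (an assumption, not AES).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:B // 2]

def encrypt_block(key, block):
    left, right = block[:B // 2], block[B // 2:]
    for i in range(4):
        left, right = right, xor(left, _f(key, i, right))
    return left + right

def decrypt_block(key, block):
    left, right = block[:B // 2], block[B // 2:]
    for i in reversed(range(4)):
        left, right = xor(right, _f(key, i, left)), left
    return left + right

def blocks(data):
    return [data[i:i + B] for i in range(0, len(data), B)]

def ecb_encrypt(key, plaintext):
    return b"".join(encrypt_block(key, p) for p in blocks(plaintext))

def cbc_encrypt(key, iv, plaintext):
    out, prev = [], iv
    for p in blocks(plaintext):
        prev = encrypt_block(key, xor(p, prev))  # chain on previous ciphertext
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key, iv, ciphertext):
    out, prev = [], iv
    for c in blocks(ciphertext):
        out.append(xor(decrypt_block(key, c), prev))
        prev = c
    return b"".join(out)

key, iv = b"demo key", bytes(B)
plaintext = b"A" * B + b"B" * B + b"A" * B   # first and third blocks repeat
ecb = blocks(ecb_encrypt(key, plaintext))
cbc = blocks(cbc_encrypt(key, iv, plaintext))
print(ecb[0] == ecb[2])  # True: ECB reveals the repeated block
print(cbc[0] == cbc[2])  # False: chaining hides the repetition
```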
Cipher feedback (CFB) mode [1], as shown in Figure 2.3, utilizes an m-bit
pseudo-random keystream, generated by a block cipher, which is XORed with m bits of
plaintext at the transmitter. In this figure, m is used to represent the feedback
size. For encryption, CFB mode feeds back m bits of ciphertext into the shift
register at the input of the block cipher in order to produce the next B bits of
output. For decryption, the same scheme is applied, except that the received
ciphertext unit is XORed with the keystream from the block cipher to produce the
plaintext unit. One should notice that it is the encryption function of the block
cipher that is used at the receiver, not the decryption function. CFB mode can be
considered to fall into the class of stream ciphers. However, for this mode, a
single bit error in the communication channel (i.e., an error in a ciphertext bit)
will cause the recovered plaintext bit to be in error and the next whole block of B
recovered plaintext bits to be corrupted while the corrupted bit works its way
through the shift register of the receiver. In Figure 2.3, when m > 1 and a single
bit slip occurs (that is, one bit is deleted from the ciphertext stream), the input
to the block cipher at the receiver will become misaligned and resynchronization
will not occur. When m = 1, CFB mode has the ability to resynchronize after a slip
or insertion of any number of bits. However, because each bit then requires a
complete encryption by the block cipher, the throughput is much lower than that of
straightforward block encryption, and CFB mode with m = 1 is very inefficient.
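
The self-synchronizing behaviour of CFB mode with m = 1 can be checked with a small simulation. This is a sketch under stated assumptions: a 32-bit shift register stands in for the B-bit register, and a SHA-256 based function (hypothetical, not AES) stands in for the block cipher's encryption function.

```python
import hashlib

B = 32  # toy shift-register length in bits (AES would use B = 128)

def keystream_bit(key, register):
    # Stand-in for "encrypt the register and keep the leftmost output bit".
    return hashlib.sha256(key + bytes(register)).digest()[0] >> 7

def cfb1_encrypt(key, iv, plain_bits):
    register, cipher_bits = list(iv), []
    for p in plain_bits:
        c = p ^ keystream_bit(key, register)
        cipher_bits.append(c)
        register = register[1:] + [c]      # shift the ciphertext bit in
    return cipher_bits

def cfb1_decrypt(key, iv, cipher_bits):
    register, plain_bits = list(iv), []
    for c in cipher_bits:
        plain_bits.append(c ^ keystream_bit(key, register))
        register = register[1:] + [c]      # same feedback as the transmitter
    return plain_bits

key, iv = b"demo key", [0] * B
plain = [(i * i + i // 3) % 2 for i in range(200)]
cipher = cfb1_encrypt(key, iv, plain)

# Bit slip: ciphertext bit 50 is lost in the channel.
slipped = cipher[:50] + cipher[51:]
recovered = cfb1_decrypt(key, iv, slipped)

# Once B correctly aligned bits have refilled the register, output is correct again.
print(recovered[:50] == plain[:50])            # True
print(recovered[50 + B:] == plain[51 + B:])    # True
```

Each call to `keystream_bit` models one full block cipher operation, which is why this configuration costs one block encryption per transmitted bit.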
Output feedback (OFB) mode [1], as shown in Figure 2.4, takes the previous 
output of the block cipher as the next input to the block cipher to produce the next 
keystream block at the transmitter. OFB mode is also a stream cipher configuration. 
Of all the operational modes, OFB mode offers minimal error propagation. A bit 
error in ciphertext will merely cause one bit error in the recovered plaintext because 
the keystream generation only depends on the output of the block cipher rather than 
the ciphertext. That is, errors from the communication channel are not multiplied 
through the decryption process. High throughput can be achieved in this mode by 
performing the XOR of the plaintext with the keystream in blocks of m = B bits. 
However, OFB mode does not have the ability to resynchronize. OFB needs an extra
signaling channel to periodically transfer an IV from the transmitter to the receiver
in order to recover from any synchronization loss that may occur due to bit slips or
insertions.
Figure 2.3: m-bit Cipher Feedback (CFB) Mode
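As a rough illustration of the OFB properties discussed above, the following sketch generates the keystream by repeatedly encrypting the previous cipher output, with m = B = 128. The function `E` is again a hypothetical SHA-256 based stand-in for the block cipher (not AES), and `ofb` is an assumed name.

```python
import hashlib

B_BYTES = 16  # m = B = 128 bits


def E(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for the block cipher encryption function (NOT AES)."""
    return hashlib.sha256(key + block).digest()[:B_BYTES]


def ofb(key: bytes, iv: bytes, data: bytes) -> bytes:
    """OFB mode: the same routine encrypts and decrypts."""
    out = bytearray()
    block = iv
    for i in range(0, len(data), B_BYTES):
        block = E(key, block)                # depends only on the previous output
        chunk = data[i:i + B_BYTES]
        out.extend(x ^ k for x, k in zip(chunk, block))
    return bytes(out)
```

Because the keystream never depends on the ciphertext, a single flipped ciphertext bit yields exactly one flipped plaintext bit. For the same reason, a slipped bit leaves the receiver permanently misaligned unless a new IV is delivered.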
Counter (CTR) mode was first proposed in [11]. Interest in CTR mode has recently
increased with applications to ATM (asynchronous transfer mode) network security
and IPSec (IP security). CTR mode [1], as shown in Figure 2.5, is a stream cipher
mode that uses a counter whose width equals the plaintext block size. The counter
is initialized to some value and then incremented for each subsequent block
(modulo 2^B, where B is the block size). For encryption, each counter value is
passed through the block cipher and the resulting encrypted count is XORed with
the corresponding block of plaintext. For decryption, the same sequence of counter
values is encrypted and each result is XORed with a ciphertext block to recover the
corresponding plaintext block. In both directions, the block cipher uses its
encryption function rather than its decryption function.
Figure 2.4: m-bit Output Cipher Feedback (OFB) Mode
CTR mode has several advantages compared to the three chaining modes (i.e.,
CBC, CFB and OFB). For hardware efficiency, CTR mode can perform the encryption (or
decryption) in parallel on multiple blocks of plaintext or ciphertext, while the three
chaining modes cannot. For software efficiency, parallel features, such as aggressive
pipelining, are supportable because of the parallel execution in CTR mode. Also, it
can be shown that CTR is at least as secure as the other modes.
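The independence of the blocks in CTR mode can be sketched as follows. The block cipher is replaced by a hypothetical SHA-256 based stand-in `E` (not AES), and the helper names `keystream_block`, `ctr_block` and `ctr` are assumptions for illustration.

```python
import hashlib

B_BITS = 128
B_BYTES = B_BITS // 8


def E(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for the block cipher encryption function (NOT AES)."""
    return hashlib.sha256(key + block).digest()[:B_BYTES]


def keystream_block(key: bytes, counter0: int, i: int) -> bytes:
    """Encrypt the i-th counter value; the counter wraps modulo 2^B."""
    ctr_value = (counter0 + i) % (1 << B_BITS)
    return E(key, ctr_value.to_bytes(B_BYTES, "big"))


def ctr_block(key: bytes, counter0: int, i: int, block: bytes) -> bytes:
    """Process block i on its own; no other block is needed."""
    return bytes(x ^ k for x, k in zip(block, keystream_block(key, counter0, i)))


def ctr(key: bytes, counter0: int, data: bytes) -> bytes:
    """CTR mode: the same routine encrypts and decrypts."""
    blocks = [data[j:j + B_BYTES] for j in range(0, len(data), B_BYTES)]
    return b"".join(ctr_block(key, counter0, i, blk) for i, blk in enumerate(blocks))
```

Since `ctr_block` needs only the block index, the blocks can be processed in any order or concurrently, which is the property that makes pipelining of the block cipher possible.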
A new block cipher mode, referred to as statistical cipher feedback (SCFB) [4] and
not standardized by NIST, is examined in this thesis and will be introduced in detail 
in Section 2.5. 
2.4 Advanced Encryption Standard (AES) 
The AES algorithm [6] is a symmetric key block cipher that processes data blocks
of 128 bits using a cipher key of 128, 192, or 256 bits. It was selected and
standardized by NIST to replace DES and protect sensitive government information
well into the twenty-first century [6]. In this work, AES is adopted as the block
cipher that generates the keystream block.
Figure 2.5: Counter (CTR) Mode
Our design uses only the 128-bit key length. In AES, the input
data is a 4 x 4 array of bytes, i.e., 4 x 4 x 8 = 128 bits. The AES algorithm repeats a
series of operations for 10 rounds. Figure 2.6 shows the steps of the AES algorithm.
In each round, except for the last round, there are four operations: Substitute Bytes,
Shift Rows, Mix Column and Add Round Key. In the last round, there is no Mix
Column phase. The round function is performed iteratively 10 times, and the data
path is shared by the different rounds of the algorithm. Among the four operations, Byte
Substitution is the most critical part of this algorithm in terms of performance for
hardware designs, while the other three operations are implemented using only
simple linear operations such as rotations and XORs.
Figure 2.6: AES
2.4.1 Implementation of AES S-box
The forward substitute byte transformation is conceptually a simple lookup table
(LUT). The SubByte operation is a nonlinear byte substitution that operates
independently on each byte of the state (i.e., a 4 x 4 array of bytes) using a
substitution table (i.e., S-box), which is shown in Figure 2.7 and Figure 2.8. The
AES S-box is a 256-entry table composed of two transformations: first, each input
byte is replaced with its multiplicative inverse in GF(2^8), with the element 00 being
mapped onto itself; an affine transformation is then applied. For decryption, the inverse
S-box is obtained by applying the inverse affine transformation followed by multiplicative
inversion in GF(2^8). The SubByte operation has to be applied in every round, so
it becomes the most critical part of the AES algorithm. The
AES S-box can be implemented by different methods, such as a simple boolean
function implementation [12], a linear redundancy (LR) implementation [13], a composite
field GF(2^4) implementation [12], a memory (e.g., RAM and ROM) implementation
and a Fourier transform based implementation [6]. However, some of these
methods are not suitable for hardware implementation.
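The two-step construction of the S-box (multiplicative inverse in GF(2^8) followed by an affine transformation) can be checked with a short sketch. The reduction polynomial x^8 + x^4 + x^3 + x + 1 and the affine constant 63 are those of the AES standard; the helper names `gf_mul`, `gf_inv`, `rotl8` and `sbox_entry` are assumptions for illustration, and the brute-force inversion is chosen for clarity rather than speed.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254; the element 00 maps onto itself."""
    if a == 0:
        return 0
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def rotl8(x: int, k: int) -> int:
    """Rotate an 8-bit value left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox_entry(x: int) -> int:
    """Inverse in GF(2^8) followed by the AES affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x      # the inverse table simply reverses the mapping
```

The resulting tables can be compared entry by entry with Figure 2.7 and Figure 2.8; for example, `SBOX[0x53]` is `0xED`.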
     0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0   63 7C 77 7B F2 6B 6F C5 30 01 67 2B FE D7 AB 76
1   CA 82 C9 7D FA 59 47 F0 AD D4 A2 AF 9C A4 72 C0
2   B7 FD 93 26 36 3F F7 CC 34 A5 E5 F1 71 D8 31 15
3   04 C7 23 C3 18 96 05 9A 07 12 80 E2 EB 27 B2 75
4   09 83 2C 1A 1B 6E 5A A0 52 3B D6 B3 29 E3 2F 84
5   53 D1 00 ED 20 FC B1 5B 6A CB BE 39 4A 4C 58 CF
6   D0 EF AA FB 43 4D 33 85 45 F9 02 7F 50 3C 9F A8
7   51 A3 40 8F 92 9D 38 F5 BC B6 DA 21 10 FF F3 D2
8   CD 0C 13 EC 5F 97 44 17 C4 A7 7E 3D 64 5D 19 73
9   60 81 4F DC 22 2A 90 88 46 EE B8 14 DE 5E 0B DB
A   E0 32 3A 0A 49 06 24 5C C2 D3 AC 62 91 95 E4 79
B   E7 C8 37 6D 8D D5 4E A9 6C 56 F4 EA 65 7A AE 08
C   BA 78 25 2E 1C A6 B4 C6 E8 DD 74 1F 4B BD 8B 8A
D   70 3E B5 66 48 03 F6 0E 61 35 57 B9 86 C1 1D 9E
E   E1 F8 98 11 69 D9 8E 94 9B 1E 87 E9 CE 55 28 DF
F   8C A1 89 0D BF E6 42 68 41 99 2D 0F B0 54 BB 16
Figure 2.7: S-Box: Substitution Values (in Hexadecimal Format)
     0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0   52 09 6A D5 30 36 A5 38 BF 40 A3 9E 81 F3 D7 FB
1   7C E3 39 82 9B 2F FF 87 34 8E 43 44 C4 DE E9 CB
2   54 7B 94 32 A6 C2 23 3D EE 4C 95 0B 42 FA C3 4E
3   08 2E A1 66 28 D9 24 B2 76 5B A2 49 6D 8B D1 25
4   72 F8 F6 64 86 68 98 16 D4 A4 5C CC 5D 65 B6 92
5   6C 70 48 50 FD ED B9 DA 5E 15 46 57 A7 8D 9D 84
6   90 D8 AB 00 8C BC D3 0A F7 E4 58 05 B8 B3 45 06
7   D0 2C 1E 8F CA 3F 0F 02 C1 AF BD 03 01 13 8A 6B
8   3A 91 11 41 4F 67 DC EA 97 F2 CF CE F0 B4 E6 73
9   96 AC 74 22 E7 AD 35 85 E2 F9 37 E8 1C 75 DF 6E
A   47 F1 1A 71 1D 29 C5 89 6F B7 62 0E AA 18 BE 1B
B   FC 56 3E 4B C6 D2 79 20 9A DB C0 FE 78 CD 5A F4
C   1F DD A8 33 88 07 C7 31 B1 12 10 59 27 80 EC 5F
D   60 51 7F A9 19 B5 4A 0D 2D E5 7A 9F 93 C9 9C EF
E   A0 E0 3B 4D AE 2A F5 B0 C8 EB BB 3C 83 53 99 61
F   17 2B 04 7E BA 77 D6 26 E1 69 14 63 55 21 0C 7D
Figure 2.8: Inverse S-Box: Substitution Values (in Hexadecimal Format)
LR implementation of S-box 
The linear redundancy in the AES S-box was discovered by J. Fuller and W. Millan
[13]. In order to gain high nonlinearity, the AES S-box uses finite field arithmetic.
However, the relationship between the S-box output functions still remains linear
because of the inherent characteristics of the finite field multiplicative inverse. Fuller
and Millan discovered a new efficient algorithm to determine equivalence between
functions [13]. As noted in [13], letting $b_j(x)$ indicate the output Boolean function,
$c$ represent a binary constant and $D$ represent a binary matrix, the output Boolean
function $b_j(x)$ ($0 \le j \le 7$) can be represented in the form
$b_j(x) = b_i(D_{ij} x) \oplus c_j$, where $0 \le i \le 7$ and $i \ne j$. In the LR
implementation, the output Boolean functions $b_j$ (the first 7 bits of the 8-bit S-box
output) can be represented by $b_j(x) = b_0(D_{0j} x) \oplus c_j$,
where $b_0$ is the least significant bit of the 8-bit S-box output. In the hardware
implementation, we only need the D matrix block and the $b_0$ logic. Figure 2.9 illustrates
the block diagram of the LR hardware implementation of the AES S-box [14].
Simple Boolean Function 
Compared with the LR implementation of an S-box, the simple boolean function
implementation has a smaller area and higher speed. It is the most straightforward
way to implement the AES S-box, and high speed (i.e., low latency) can be obtained
with this method. In the byte substitution phase for the tables of Figures 2.7 and 2.8,
an individual byte is mapped into a new byte in the following way: the leftmost 4 bits of
the byte are used as a row value and the rightmost 4 bits are used as a column value.
The row and column values serve as indices that select an 8-bit S-box output value.
The 8-bit S-box lookup table can be input to EDA (electronic design automation)
tools (Synopsys Design Analyzer is applied in our design), which generate the
corresponding combinational logic after the analysis and elaboration operations, so
that each output bit of the S-box is derived from a boolean function of the 8 input bits.
Figure 2.9: Block Diagram of the LR Implementation of S-Box [13]
Composite Field in GF(2^4)
The implementation of the AES S-box using a composite field based on GF(2^4) has
smaller hardware complexity but lower speed than the simple boolean function
implementation of the S-box [12]. Normally the SubByte transformation for the AES
algorithm is implemented by using the simple boolean functions. In the composite field
approach, the S-boxes are based on inversion in the finite field GF(2^8) using a composite
field over GF(2^4) [12]. Compared with arithmetic in GF(2^8), arithmetic
in GF(2^4) is suitable for a hardware implementation using combinational logic
based on 4-bit operations. Every element of GF(2^8) can be represented as a linear
polynomial with coefficients in GF(2^4) (i.e., $bx + c$). We represent the irreducible
polynomial as $x^2 + Ax + B$, and the multiplicative inverse of an arbitrary polynomial
$bx + c$ is given by
$(bx + c)^{-1} = b(b^2 B + bcA + c^2)^{-1} x + (c + bA)(b^2 B + bcA + c^2)^{-1}$ [12].
The problem of calculating the inverse in GF(2^8) is thus translated into calculating
an inverse in GF(2^4) together with some multiplications, squarings and additions over
GF(2^4). Figure 2.10 gives a schematic representation of the multiplicative inverse calculation.
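The inversion formula can be verified numerically with a small sketch. The choices below are assumptions made only for this check and are not taken from [12]: GF(2^4) is built with the modulus x^4 + x + 1, A is set to 1, and B is found by searching for a value that makes x^2 + Ax + B irreducible. The isomorphism between this composite representation and the AES representation of GF(2^8) is omitted.

```python
def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed modulus)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf16_inv(a: int) -> int:
    """a^14 equals the inverse of a for nonzero a; returns 0 for a = 0."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r


A = 1  # assumed coefficient of x
# x^2 + A x + B is irreducible when it has no root t in GF(2^4).
B = next(v for v in range(1, 16)
         if all(gf16_mul(t, t) ^ gf16_mul(A, t) ^ v for t in range(16)))


def mul(p, q):
    """Multiply (b1 x + c1)(b2 x + c2) modulo x^2 + A x + B."""
    (b1, c1), (b2, c2) = p, q
    hi = gf16_mul(b1, b2)                      # coefficient of x^2
    return (gf16_mul(hi, A) ^ gf16_mul(b1, c2) ^ gf16_mul(b2, c1),
            gf16_mul(hi, B) ^ gf16_mul(c1, c2))


def inv(p):
    """Inverse of b x + c using the formula from the text."""
    b, c = p
    d = gf16_mul(gf16_mul(b, b), B) ^ gf16_mul(gf16_mul(b, c), A) ^ gf16_mul(c, c)
    d_inv = gf16_inv(d)                        # the only GF(2^4) inversion needed
    return (gf16_mul(b, d_inv), gf16_mul(c ^ gf16_mul(b, A), d_inv))
```

Each inversion in the composite field costs one inversion in GF(2^4) plus a handful of 4-bit multiplications, squarings and additions.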
Figure 2.10: Schematic Representation of Multiplicative Inverse [12]
2.4.2 Hardware Analysis of AES S-box 
In the analysis of AES S-box implementations in [15], the Synopsys Design Analyzer
standard cell library based on the 0.18 micron CMOS TSMC (Taiwan Semiconductor
Manufacturing Company) process, version 2002, provided by the Canadian Microelectronic
Corporation (CMC), is used to synthesize the S-box implementations. Applying
Synopsys synthesis tools to the implementations using the simple boolean function,
linear redundancy and composite field arithmetic based on GF(2^4), we compare the
implementations in terms of area complexity and timing delay.
Area Complexity 
To examine the area complexity, we use the number of equivalent 2-input NAND gates
as a metric of circuit size. The area of a 2-input NAND gate is about 12.197 μm^2.
To determine the number of gates in the synthesized circuit, we divide the total area
by 12.197 μm^2. From Table 2.1, we can see that the area of the LR implementation
is the largest, and the area of the implementation using arithmetic in GF(2^4) is the smallest.
Table 2.1: Area Complexity of AES S-Box Using 0.18 μm CMOS Standard Cell Technology [15]
Implementation                           Area complexity (number of gates)
LR Implementation                        908
Simple Boolean Function                  677
Using arithmetic operation in GF(2^4)    336
Timing Delay 
The timing delay refers to the latency of the critical data path of the circuit under
worst-case conditions. The maximum system clock frequency is determined by the
critical data path delay. In [15], the timing delay of the various S-box implementations
is examined by applying the synthesis tools. Table 2.2 lists the timing delay
for each implementation method mentioned previously. From Table 2.2, we can see
that the simple boolean function implementation is the fastest of the three
implementations. However, its area complexity is larger than that of the GF(2^4)
implementation. The implementation using arithmetic in GF(2^4) has the longest
timing delay of all the implementations, but it has the smallest area complexity.
Table 2.2: Timing Delay of AES S-Box Using 0.18 μm CMOS Standard Cell Technology [15]
Implementation                           Timing delay (ns)
Simple Boolean Function                  4.70
LR Implementation                        7.80
Using arithmetic operation in GF(2^4)    17.02
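The area and delay figures of Tables 2.1 and 2.2 can be combined in a short calculation. The sketch below assumes, purely for illustration, that the S-box alone forms the critical path, so that the reciprocal of its delay bounds the clock frequency; in a complete design the critical path contains additional logic. The names `RESULTS`, `max_clock_mhz` and `area_um2` are hypothetical.

```python
NAND2_AREA_UM2 = 12.197  # area of one 2-input NAND gate, from the text

# (number of equivalent gates, timing delay in ns) from Tables 2.1 and 2.2
RESULTS = {
    "Simple Boolean Function": (677, 4.70),
    "LR Implementation": (908, 7.80),
    "GF(2^4) arithmetic": (336, 17.02),
}


def max_clock_mhz(delay_ns: float) -> float:
    """Upper bound on clock frequency if the S-box were the whole critical path."""
    return 1000.0 / delay_ns


def area_um2(gates: int) -> float:
    """Convert an equivalent gate count back to silicon area."""
    return gates * NAND2_AREA_UM2


if __name__ == "__main__":
    for name, (gates, delay) in RESULTS.items():
        print(f"{name}: {area_um2(gates):.1f} um^2, <= {max_clock_mhz(delay):.1f} MHz")
```

The calculation makes the trade-off explicit: the GF(2^4) design is the smallest of the three, while the simple boolean function design allows the highest clock rate.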
2.4.3 Shift Row and Inverse Shift Row 
The Shift Row operation is a cyclic shift operation where each row is rotated cyclically 
to the left using 0, 1, 2 and 3-byte offset for encryption, while for decryption, the 
circular shifts are performed in the opposite direction for each of the last three rows, 
with 1, 2 and 3 byte right shift for the 2nd, 3rd and 4th rows. Figure 2.11 and 
Figure 2.12 illustrate the forward Shift Row and Inverse Shift Row transformations, 
respectively. 
s0,0 s0,1 s0,2 s0,3         s0,0 s0,1 s0,2 s0,3
s1,0 s1,1 s1,2 s1,3   -->   s1,1 s1,2 s1,3 s1,0
s2,0 s2,1 s2,2 s2,3         s2,2 s2,3 s2,0 s2,1
s3,0 s3,1 s3,2 s3,3         s3,3 s3,0 s3,1 s3,2
Figure 2.11: Shift Rows Transformation [6]
s0,0 s0,1 s0,2 s0,3         s0,0 s0,1 s0,2 s0,3
s1,0 s1,1 s1,2 s1,3   -->   s1,3 s1,0 s1,1 s1,2
s2,0 s2,1 s2,2 s2,3         s2,2 s2,3 s2,0 s2,1
s3,0 s3,1 s3,2 s3,3         s3,1 s3,2 s3,3 s3,0
Figure 2.12: Inverse Shift Rows Transformation [6]
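The two transformations amount to cyclic rotations of the rows of the state, as the following minimal sketch shows. The state is assumed to be stored as a list of four rows, and the function names are illustrative.

```python
def shift_rows(state):
    """Rotate row r cyclically to the left by r bytes (r = 0, 1, 2, 3)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r cyclically to the right by r bytes, undoing shift_rows."""
    return [row[4 - r:] + row[:4 - r] for r, row in enumerate(state)]
```

Since both operations only reorder bytes, they reduce to wiring in hardware.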
2.4.4 Mix Column and Inverse Mix Column 
For the Mix Column operation, each column of the state (i.e., a 4 x 4
array of bytes) is treated as a polynomial over GF(2^8) and multiplied by the fixed
polynomial $C(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$ modulo $x^4 + 1$. The Mix Column
operation is given in Figure 2.13. In GF(2^8), addition is the bitwise XOR operation.
Multiplication of a value by 01 is equal to the value itself. Multiplication of a value
by 02 can be implemented as a one-bit left shift followed by a conditional bitwise
XOR with (00011011) if the leftmost bit of the original value is 1. This operation is
often called Xtimes, which is shown in Figure 2.14 [16].
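\[
\begin{bmatrix} b_{0c} \\ b_{1c} \\ b_{2c} \\ b_{3c} \end{bmatrix}
=
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\begin{bmatrix} a_{0c} \\ a_{1c} \\ a_{2c} \\ a_{3c} \end{bmatrix}
\]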
Figure 2.13: Mix Column Operation 
The Inverse Mix Column operation is defined by the matrix multiplication shown
in Figure 2.15. For example, we can express $x \cdot 0E$ as $(x \cdot 08) + (x \cdot 04) + (x \cdot 02)$
for any x in GF(2^8). The only difference between the forward MixColumn and the Inverse
MixColumn is that the latter requires extra multiplications by 04 and 08. These can be
computed as $04 \cdot X = 02 \cdot (02 \cdot X)$ and $08 \cdot X = 02 \cdot (02 \cdot (02 \cdot X))$. The block
diagram of the joint Mix Column and Inverse Mix Column implementation is shown
in Figure 2.16 [17] [18]. This figure only illustrates a single byte output, and we
applied 16 joint Mix Column and Inverse Mix Column blocks in parallel to process
128 bits of data in our design. In Figure 2.16, the four inputs "a", "b", "c" and "d"
represent the four bytes in a column of the state. The variables "mix" and "invmix" are
the two outputs obtained by applying Mix Columns and Inv Mix Columns, respectively.
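Figure 2.14: Xtimes Block Diagram [16]
\[
\begin{bmatrix} b_{0c} \\ b_{1c} \\ b_{2c} \\ b_{3c} \end{bmatrix}
=
\begin{bmatrix} 0E & 0B & 0D & 09 \\ 09 & 0E & 0B & 0D \\ 0D & 09 & 0E & 0B \\ 0B & 0D & 09 & 0E \end{bmatrix}
\begin{bmatrix} a_{0c} \\ a_{1c} \\ a_{2c} \\ a_{3c} \end{bmatrix}
\]
\[
\begin{bmatrix} 0E & 0B & 0D & 09 \\ 09 & 0E & 0B & 0D \\ 0D & 09 & 0E & 0B \\ 0B & 0D & 09 & 0E \end{bmatrix}
=
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
+
\begin{bmatrix} 08 & 08 & 08 & 08 \\ 08 & 08 & 08 & 08 \\ 08 & 08 & 08 & 08 \\ 08 & 08 & 08 & 08 \end{bmatrix}
+
\begin{bmatrix} 04 & 00 & 04 & 00 \\ 00 & 04 & 00 & 04 \\ 04 & 00 & 04 & 00 \\ 00 & 04 & 00 & 04 \end{bmatrix}
\]
Figure 2.15: Inverse Mix Column Operation
Figure 2.16: Joint Implementation of MixColumns and InvMixColumns Transformations [17]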
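The Xtimes operation and the two column transformations can be sketched in software as follows. Every constant multiplication is built from repeated Xtimes steps and XORs, mirroring the decomposition $04 \cdot X = 02 \cdot (02 \cdot X)$ described above. The helper names `xtime`, `mul_const` and `apply_matrix` are assumptions for illustration and do not model the joint hardware block of Figure 2.16.

```python
def xtime(v: int) -> int:
    """Multiply by 02 in GF(2^8): left shift, then conditional XOR with 00011011."""
    v <<= 1
    return (v ^ 0x11B) & 0xFF if v & 0x100 else v


def mul_const(v: int, k: int) -> int:
    """Multiply v by the constant k using only xtime and XOR."""
    r = 0
    while k:
        if k & 1:
            r ^= v
        v = xtime(v)
        k >>= 1
    return r


MIX = [[0x02, 0x03, 0x01, 0x01],
       [0x01, 0x02, 0x03, 0x01],
       [0x01, 0x01, 0x02, 0x03],
       [0x03, 0x01, 0x01, 0x02]]

INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09],
           [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B],
           [0x0B, 0x0D, 0x09, 0x0E]]


def apply_matrix(matrix, column):
    """Multiply one 4-byte state column by a 4 x 4 matrix over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for k, a in zip(row, column):
            acc ^= mul_const(a, k)   # addition in GF(2^8) is XOR
        out.append(acc)
    return out
```

For example, the column (DB, 13, 53, 45) is mapped to (8E, 4D, A1, BC) by `MIX`, and `INV_MIX` maps it back.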
2.4.5 Add Round Key 
The Add Round Key operation is a bit-wise exclusive OR operation of the whole 
block and the corresponding round key. Before the first round is performed, there is 
one key addition operation for pre-whitening. 
2.4.6 Key Scheduling 
The key scheduling is an important part of the AES algorithm. It can take an initial 
key of length of 128 bits, 192 bits or 256 bits. In the design of this thesis, the key 
scheduling takes 128-bit initial key as 4 words (i.e., 16 bytes) input, and it generates 
40 words to provide each of the 10 rounds with a 4-word round key. Each of the 
round keys depends on the key of the previous round.
There are two typical methods used to implement the AES key expander. One
method is to compute the round key on-the-fly in each round of the data processing.
The other is to precompute all the round keys beforehand and store them in
memory. Saving area is the advantage of the first method because it does not need
any extra memory to store all the keys, and it can change initial keys quickly with
little or no delay. The precomputation scheme incurs no extra delay when supplying the
decryption keys, but it takes more area in order to store all the round keys. In this
thesis, we will use the on-the-fly computation scheme for most designs. However, for the
pipelined SCFB design using parallel transfer (Chapter 5), we need the block cipher to
generate the keystream as fast as possible and, hence, use the precomputation scheme.
The 128-bit initial key is XORed with the plaintext as pre-whitening before
the first round of operations. Subsequently, round keys are derived and applied at each
round. In general, the current round key is represented as
$[w_{4i}, w_{4i+1}, w_{4i+2}, w_{4i+3}]$, where i indicates the round number. The next
round key $[w_{4(i+1)}, w_{4(i+1)+1}, w_{4(i+1)+2}, w_{4(i+1)+3}]$ is generated as
illustrated in Figure 2.17 [1], where F represents a
complex three-step function. The F function includes three operations: a one-byte
circular left shift, a byte substitution and an XOR of the leftmost byte
with the round constant Rcon[i]. The round constant is defined by Rcon[i+1] = 02 x Rcon[i],
with Rcon[i] initialized to 01 for the first round. All the multiplications in the
key scheduling are defined in the finite field GF(2^8). The round-dependent constant
Rcon[i] eliminates the symmetry or similarity in the round keys [1].
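The key scheduling for a 128-bit key can be sketched as a precomputation of all 44 words. This is a software illustration under the standard AES definitions, not the on-the-fly hardware key expander used in the thesis; the S-box is regenerated inside the block so that the example is self-contained, and the names `gf_mul`, `SBOX` and `expand_key_128` are assumptions.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def _sbox_entry(x: int) -> int:
    """Inverse in GF(2^8) (computed as x^254) followed by the affine transformation."""
    b = 1 if x else 0
    for _ in range(254 if x else 0):
        b = gf_mul(b, x)
    rot = lambda v, k: ((v << k) | (v >> (8 - k))) & 0xFF
    return b ^ rot(b, 1) ^ rot(b, 2) ^ rot(b, 3) ^ rot(b, 4) ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]


def expand_key_128(key: bytes):
    """Return the 44 words w[0..43]; round key i is w[4i..4i+3]."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01                                  # Rcon for the first round
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:                           # the F function of Figure 2.17
            t = t[1:] + t[:1]                    # one-byte circular left shift
            t = [SBOX[b] for b in t]             # byte substitution
            t[0] ^= rcon                         # XOR leftmost byte with Rcon[i]
            rcon = gf_mul(rcon, 0x02)            # Rcon[i+1] = 02 x Rcon[i]
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return w
```

The words w[0..3] are the initial key used for pre-whitening, and the remaining 40 words supply a 4-word round key to each of the 10 rounds.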
When the key scheduling is applied to both encryption and decryption in AES,
the key scheduling processes are different. For the encryption process, the round keys
are applied to the datapath in the forward order. For decryption, however, the
round keys are calculated in the backward direction starting from the last round key.
The decryption key scheduling first has to compute the round keys in the forward
direction to obtain the last round key, and then compute in the backward direction
to get the corresponding round key for each round. In this case, the setup time is
longer than that of encryption.
Figure 2.17: Key Scheduling 
2.5 Statistical Cipher Feedback Mode 
We mentioned the statistical cipher feedback (SCFB) mode in Section 1.2. In this
section, we will further investigate the SCFB mode. The algorithm of SCFB was first
described in [4]. The name derives from the fact that the cipher feedback works
in a statistical way, resynchronizing based on the recognition of a sync pattern. SCFB
works as a stream cipher by utilizing a block cipher to produce a keystream
which is XORed with the plaintext data to produce the ciphertext data. Unlike other
conventional block cipher modes, SCFB mode can achieve self-synchronization when
bit slips occur in the communication channel, while still being implemented with
high efficiency to operate at high speeds. Additionally, the latency and buffer sizes
used to implement the system are reasonable.
Figure 2.18: SCFB System Compared to CFB and OFB
2.5.1 Implementation Structure of SCFB System 
In earlier sections of this chapter, we discussed CFB mode with m = 1, which
is an inefficient mode with the property of self-synchronization. Hence, our research
direction is to improve the system efficiency while retaining self-synchronization.
To save communications bandwidth, a sync pattern in the ciphertext data is used
to control the synchronization of the encryption system and the decryption system,
since both systems observe the same ciphertext. A sketch of SCFB mode is shown in
Figure 2.18, where E represents the block cipher and the input register stores the input
data of the block cipher. The Sync Pattern Recognition Block scans
the ciphertext to find a sync pattern and then collects the next B bits
after the sync pattern as the new IV. The sync pattern is a short fixed sequence.
For example, a sync pattern size of 8 and a sync pattern of 10000000 could be used [4].
If the sync pattern is not found in the ciphertext, the input of the block cipher comes
from the previous output of the block cipher and, hence, in this case SCFB mode can
be thought of as OFB mode with m = B. When the sync pattern occurs and the
collection of the new IV is completed, the new IV is loaded into the input register as
the input to the block cipher, and SCFB mode can be thought of as momentarily operating in
CFB mode. Thus, SCFB mode is a combination of CFB and OFB modes. SCFB mode
therefore provides the capacity for self-synchronization, which overcomes the
deficiency of OFB mode. As well, compared with conventional CFB, the efficiency
of SCFB mode is improved dramatically since SCFB mode works as OFB mode with
m = B most of the time. As shown in Figure 2.18, the decryption system has the same structure
as the encryption system with the roles of plaintext and ciphertext reversed.
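The behaviour described above can be sketched at the bit level. The following is a simplified software model under several assumptions and is not the hardware design of this thesis: the block cipher is replaced by a hypothetical SHA-256 based stand-in `E` (not AES), no queues or clocks are modelled, sync pattern scanning is disabled while the IV is collected, and the scan window is cleared after each IV. The names `scfb`, `SYNC` and the helper functions are illustrative.

```python
import hashlib

B = 128                              # block size in bits
SYNC = [1, 0, 0, 0, 0, 0, 0, 0]      # sync pattern 10000000 (n = 8)


def bits_to_bytes(bits):
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))


def bytes_to_bits(data):
    return [(byte >> (7 - j)) & 1 for byte in data for j in range(8)]


def E(key, block_bits):
    """Hypothetical stand-in for the block cipher encryption function (NOT AES)."""
    digest = hashlib.sha256(key + bits_to_bytes(block_bits)).digest()
    return bytes_to_bits(digest[:B // 8])


def scfb(key, iv_bits, bits_in, decrypt=False):
    """Encrypt or decrypt a list of bits with a simplified SCFB model."""
    ks, pos = E(key, iv_bits), 0     # current keystream block and position
    window, new_iv, out = [], None, []
    for bit in bits_in:
        if pos == B:                 # OFB step: previous output is the next input
            ks, pos = E(key, ks), 0
        out_bit = bit ^ ks[pos]
        pos += 1
        out.append(out_bit)
        c = bit if decrypt else out_bit          # both ends watch the ciphertext
        if new_iv is not None:                   # collecting the B-bit IV
            new_iv.append(c)
            if len(new_iv) == B:                 # IV complete: restart keystream
                ks, pos = E(key, new_iv), 0
                new_iv, window = None, []
        else:                                    # scanning for the sync pattern
            window = (window + [c])[-len(SYNC):]
            if window == SYNC:
                new_iv = []
    return out
```

In this model, deleting one ciphertext bit garbles the recovered plaintext only until the next sync pattern is recognized by both ends and the following B ciphertext bits have been loaded as the new IV.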
2.5.2 Discussion on Queuing System 
For SCFB mode, a queuing system consisting of two queues (a plaintext queue and a
ciphertext queue) is needed [8]. The plaintext queue stores the incoming
bits and transfers them out, bit by bit, to be XORed with the keystream. The ciphertext
queue stores the ciphertext bits and sends them out of the SCFB system bit
by bit. The queuing system provides the elasticity necessary to accommodate periods
during which the keystream is not available due to resynchronization. A previous
implementation of SCFB mode transferred data between queues in blocks of 128 bits
[19]. However, the resulting design required a large amount of hardware. The plaintext
queue is initially empty and the ciphertext queue is initially full of
'1's. Plaintext data is sent to the plaintext queue at a fixed rate, and the ciphertext
queue sends data out of the system at the same fixed rate. Because these two rates
are equal, the ciphertext queue becomes empty when the plaintext queue fills up.
Conversely, the plaintext queue empties and the ciphertext queue fills up because the
plaintext queue sends data to be XORed with the keystream, and the resulting
ciphertext is sent to the ciphertext queue, at a higher rate than
the incoming rate of the plaintext or the outgoing rate of the ciphertext queue. When
resynchronization occurs, data transfer out of the plaintext queue is stalled until the
new keystream is produced based on the new IV. During such a period, since data
arrives continuously at the input, the amount of data in the plaintext queue increases.
The higher rate of data transfer out ensures that the plaintext queue recovers its
stability during periods in which SCFB mode is not resynchronizing. This process
represents the elastic property of the queues [19].
The plaintext queue will overflow if resynchronization occurs too frequently. To
avoid overflow, the queuing system has to be large enough to make
the probability of overflow as small as possible [19].
Let M represent the size of the plaintext queue and of the ciphertext queue, and let k
represent the current number of bits in the plaintext queue. The ciphertext queue then holds
(M - k) bits, because the incoming rate of the plaintext queue is identical to the
outgoing rate of the ciphertext queue when resynchronization does not occur.
The delay through the system is defined as k + (M - k) = M bits [8]. The buffer size
M therefore determines the delay experienced by data passing through the system. In order to
minimize the delay, the buffer size M should be as small as possible. However, when
the block cipher is delayed and the queues are held due to resynchronization, the
buffer size M has to be large enough to collect the incoming plaintext. If B represents
the block size of the block cipher, the buffer size M should be greater than or equal to B,
because after the sync pattern is recognized the system collects all B bits of the new IV,
and the plaintext queue continues to collect incoming data without releasing data
until the new keystream has been produced by the block cipher from the new IV. The
last bit of the new IV can fall anywhere within a block of
ciphertext, so there is a scenario in which part of the block must be XORed
with some delay, since all bits following the last bit of the new IV cannot be encrypted
until the new block of keystream is ready. If the last bit of the IV falls close to
the beginning of the block of ciphertext, the buffer size M must
be at least equal to B to make sure that overflow does not happen in the plaintext queue.
Hence, M should be greater than or equal to B so that the plaintext queue has enough
space to store the data without overflowing [8]. An appropriate value for
M will depend on the ratio of the plaintext queue outgoing rate to the incoming rate,
the speed at which a new block is produced, and the requirements for the probability
of error [8].
2.5.3 Serial Transfer vs. Parallel Transfer 
Serial transfer and parallel transfer are different methods for the transfer of data from
the plaintext queue to the ciphertext queue. In parallel transfer, the incoming data
stored in the plaintext queue are removed from the queue and XORed
with the keystream in units of the block transfer size N, which is more than one bit.
The resulting N bits of ciphertext are placed into the ciphertext queue at the output
of the system. When SCFB mode is working in OFB mode and the sync pattern
is not recognized, the plaintext queue sends N bits of data to be XORed with N bits of
keystream at a time.
In serial transfer mode, the plaintext queue sends plaintext data bit by bit to be
XORed with the keystream to produce the corresponding ciphertext data, and the ciphertext
queue receives the ciphertext data bit by bit as well. Serial transfer generally requires
a simpler circuit than parallel transfer.
In this thesis, we will investigate parallel transfer sizes N varying
from 2 to 8. Both serial transfer and parallel transfer are subject to a clock limitation
which constrains the system efficiency. The clock limitation is discussed in Section 2.5.4.
2.5.4 Relationships of clocks 
In SCFB mode, there are three clocks, clk1, clk2 and clk3, which control the running
speeds of the data transfer and the block cipher: clk1 clocks the transfer
of data out of the plaintext queue and into the ciphertext queue, clk2
clocks data into and out of the SCFB system, and clk3 clocks a round of the
block cipher. The clk1 frequency is designed to be faster than clk2. This ensures
that the plaintext queue does not back up due to periods during which outgoing bits are
stalled because of resynchronization. This relationship of clocks becomes the clock
limitation which constrains the system efficiency. For simplicity of design, the clk1
frequency is set to twice the clk2 frequency, and as a result underflow
happens frequently in the plaintext queue. Overflow happens infrequently in the plaintext
queue, except when the buffer size is too small or clk3 is too slow. Because the
total number of bits in the plaintext queue and ciphertext queue is fixed, underflow may
happen in the ciphertext queue when overflow happens in the plaintext queue. Overflow will
never happen in the ciphertext queue because of the complementary relationship of
the number of bits in the queues. When underflow happens in the plaintext queue,
the plaintext queue spends 2 clk1 cycles to shift out 1 valid data bit, so the
actual rate of data entering the ciphertext queue equals the rate of clk2.
This balances the rates of incoming and outgoing data in the
ciphertext queue, which prevents overflow in the ciphertext queue.
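The elastic behaviour of the plaintext queue under the clock relationship clk1 = 2 x clk2 can be illustrated with a toy simulation. The model below is an assumption made for illustration only: time advances in clk2 periods, one bit arrives per period, up to two bits leave per period unless the output is stalled, and the stall lengths passed in `stalls` are hypothetical values rather than measured resynchronization times.

```python
def simulate_plaintext_queue(ticks, stalls, M):
    """Return (peak occupancy, overflow flag) for a plaintext queue of size M.

    ticks  : number of clk2 periods to simulate
    stalls : dict mapping a clk2 tick to a stall length in clk2 periods
             (hypothetical resynchronization delays)
    """
    occupancy, peak, stall_left, overflow = 0, 0, 0, False
    for t in range(ticks):
        stall_left = max(stall_left, stalls.get(t, 0))
        occupancy += 1                      # clk2: one plaintext bit arrives
        if occupancy > M:
            overflow, occupancy = True, M   # the arriving bit is lost
        peak = max(peak, occupancy)
        if stall_left:
            stall_left -= 1                 # keystream unavailable, output stalled
        else:
            occupancy -= min(2, occupancy)  # clk1 = 2 x clk2: up to two bits leave
    return peak, overflow
```

Without stalls the queue empties in every period, which corresponds to the frequent underflow noted above. After a stall the backlog drains at a net rate of one bit per clk2 period.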
2.5.5 Synchronization Cycle 
For SCFB mode, we assume that the ciphertext bits transmitted in the communication
channel can be categorized as illustrated in Figure 2.19. In this figure,
n represents the length (in bits) of the sync pattern, B represents the length
(in bits) of the subsequent IV, and k represents the length of the remaining bits, 
which is labelled as OFB block. These k bits of data occur between the end of the 
IV and the beginning of the next sync pattern. The variable k is a random variable 
depending on the placement of the next sync pattern in the ciphertext. The system 
works in CFB mode from when the sync pattern is recognized until the end of the 
new IV. Correspondingly, the system works in OFB mode from when the new IV is 
all collected until the next sync pattern is found. Hence, a synchronization cycle can 
be defined as the set of bits from the beginning of the sync pattern to the beginning 
of the next sync pattern. A synchronization cycle consists of n + B + k bits. 
... | sync (n bits) | IV (B bits) | OFB block (k bits) | sync (n bits) | IV (B bits) | ...
Figure 2.19: Synchronization Cycle for Serial Transfer Mode SCFB
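The division of a ciphertext stream into synchronization cycles can be sketched with a small state machine that reports the length k of each OFB block. The function name and the short pattern and IV length used in the example are hypothetical, and the sketch assumes that scanning is disabled during IV collection and restarts with an empty window afterwards.

```python
def ofb_block_lengths(bits, pattern, B):
    """Return the length k of every completed OFB block in a ciphertext bit list."""
    n = len(pattern)
    lengths = []
    window = []          # most recent bits seen while scanning
    scanned = 0          # bits seen since scanning (re)started
    iv_left = 0          # IV bits still to be skipped
    seen_sync = False    # True once the first sync pattern has been found
    for bit in bits:
        if iv_left > 0:              # inside the IV: no scanning
            iv_left -= 1
            continue
        window = (window + [bit])[-n:]
        scanned += 1
        if window == pattern:
            if seen_sync:
                lengths.append(scanned - n)   # k bits preceded this sync pattern
            seen_sync = True
            iv_left = B
            window, scanned = [], 0
    return lengths
```

For instance, with the pattern 100 and B = 4, the stream 01 100 1001 11 100 0000 0 contains one complete synchronization cycle of n + B + k = 3 + 4 + 2 = 9 bits.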
2.5.6 SCFB with CTR mode 
Counter (CTR) mode is an important operation mode in this thesis because encryption
(or decryption) in CTR mode can be done in parallel on multiple blocks of
plaintext or ciphertext, and this property makes it possible to pipeline the block
cipher in order to improve the throughput. That is, the CTR function can provide
pseudo random data to the block cipher as the input at a higher speed than OFB
mode because, unlike OFB mode, it does not depend on the previous output of the
block cipher.
Figure 2.20: SCFB with CTR Mode
As mentioned earlier in this chapter, counter mode is a stream cipher mode that
uses a counter which is initialized to some value and then incremented
for each subsequent block (modulo 2^B, where B is the block size). For encryption,
the counter passes through the block cipher and each block of plaintext is XORed
with the encrypted counter (i.e., the keystream). The general block diagram for SCFB
with CTR mode is shown in Figure 2.20. In this figure, B represents the size of the
block cipher or counter function output (i.e., a counter block), and N indicates the
block transfer size. The variable E indicates the block cipher in this figure,
and in our work we adopt and implement the AES algorithm with 128-bit block length
as the block cipher. For encryption, while the plaintext data is being collected, a
counter block (B bits) generated by the counter function is encrypted by the block
cipher to produce the keystream block (B bits). The counter function keeps
supplying pseudo-random counter blocks to the block cipher, typically by using a
linear feedback shift register (LFSR), which is a sub-module of the counter function.
The input signal "Init_CTR_Block(B-1:0)" is used to initialize the counter function.
While no resynchronization occurs, the counter function does not need any
input, but when the sync pattern is recognized the new IV is sent to the counter
function as the new initial block (B bits). When a block of keystream is ready and
the sync pattern has not been recognized, the keystream is sent to the output
register and XORed with the plaintext data in units of N bits to produce the same
length of ciphertext data, which is then stored in the ciphertext queue. For
decryption, the structure is similar to the encryption system except for the position
at which the sync pattern is checked, which is on the incoming ciphertext side. The
same sequence of counter values is encrypted, and the result is XORed with each
ciphertext block to recover the corresponding plaintext block. The block cipher
therefore uses only its encryption function and does not need the decryption function.
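The counter mode behaviour described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: block_encrypt is a stand-in
built from SHA-256 and is not AES, the counter is a plain incrementing integer rather
than an LFSR, and all function names are hypothetical.

```python
import hashlib

B = 128  # block size in bits

def block_encrypt(key: bytes, block: int) -> int:
    """Stand-in for the block cipher E (NOT AES): SHA-256(key || block), B bits."""
    digest = hashlib.sha256(key + block.to_bytes(B // 8, "big")).digest()
    return int.from_bytes(digest[: B // 8], "big")

def ctr_keystream(key: bytes, init_ctr: int, num_blocks: int):
    """Yield E(ctr), E(ctr + 1), ... with the counter taken modulo 2^B."""
    for i in range(num_blocks):
        yield block_encrypt(key, (init_ctr + i) % (1 << B))

def ctr_xor(key: bytes, init_ctr: int, blocks: list[int]) -> list[int]:
    """Encrypt or decrypt: XOR each B-bit block with its keystream block."""
    keystream = ctr_keystream(key, init_ctr, len(blocks))
    return [b ^ k for b, k in zip(blocks, keystream)]
```

Because each keystream block depends only on the counter value, the calls to
block_encrypt are independent of one another, which is the property that allows
pipelining. Decryption calls the same ctr_xor function, so only the encryption
direction of the block cipher is needed.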
2.5.7 Previous SCFB Implementations 
In [19], Yang investigated an SCFB system in full parallel transfer mode
(i.e., 128 bits transferred from the plaintext queue to the ciphertext queue at once).
In [19], the hardware implementation of SCFB mode uses the Design Analyzer
with 0.18 μm CMOS technology to perform the front-end synthesis. The hardware
complexity is shown in Table 2.3, as reported by the Design Analyzer of
the Synopsys tools with a system clock constraint of 10 ns. The number of
equivalent 2-input NAND gates is used as the metric of circuit size.
According to the synthesis results, the total number of gates of
the encryption system is 1255644, of which about 50% are the result of the SCFB mode
configuration.
Table 2.3: Synthesis Result Using 0.18 Micron CMOS From [19]

Component               Total Area (# gates)
Plaintext Subsystem     190788
Ciphertext Subsystem    313856
AES                     612834
SCFB System             1255644
2.5.8 Performance Analysis of SCFB Mode 
In this section, we will introduce some concepts of basic metrics of performance 
analysis. These concepts include the theoretical efficiency, synchronization recovery 
delay and error propagation factor. 
Theoretical Efficiency 
Compared with conventional CFB, SCFB has the advantage that the efficiency of 
the implementation can approach that of straight block encryption, depending on the 
sync pattern size. The theoretical efficiency can be defined as [8]: 
\[
\eta = \lim_{D \to \infty} \frac{D/B}{E\{\#\,\text{block cipher operations for } D \text{ bits}\}} \tag{2.1}
\]
In Eq. (2.1), D represents the number of bits transmitted. The numerator represents
the number of blocks corresponding to the encryption of D bits. The denominator
represents the expected number of block cipher operations required in SCFB mode.
The theoretical efficiency is a measure of the rate at which the stream cipher can
encrypt compared with the rate of the block cipher. For OFB mode, η can be 1 when
all B bits are used in the XOR operation. For conventional CFB mode, η can be 1
with m = B. However, if CFB must be guaranteed to resynchronize from individual bit
slips, it must operate with m = 1, so that η = 1/B << 1. In this case, CFB mode is
very inefficient. This is one of the reasons for our interest in SCFB mode.
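As a small worked example of the efficiency measure for a finite number of bits, the
sketch below evaluates the ratio inside the limit for the two cases just discussed.
The function name efficiency and the choice of D are illustrative assumptions only.

```python
def efficiency(bits_sent: int, block_ops: int, B: int = 128) -> float:
    """(D / B) divided by the number of block cipher operations used for D bits."""
    return (bits_sent / B) / block_ops

D = 128 * 1000                       # number of bits transmitted (arbitrary)
eta_ofb = efficiency(D, D // 128)    # OFB using all B bits: one operation per block
eta_cfb1 = efficiency(D, D)          # CFB with m = 1: one operation per bit
print(eta_ofb, eta_cfb1)             # 1.0 and 1/128
```

For B = 128, single-bit CFB therefore needs 128 times as many block cipher operations
as OFB for the same amount of data.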
Synchronization Recovery Delay 
The synchronization recovery delay (SRD) is defined as the expected number of
bits following a sync loss due to a slip before synchronization is regained. We will
investigate the SRD for a parallel transfer implementation of SCFB and pipelined 
SCFB mode in Chapter 6. It should be noted that SRD does not include the lost 
bits directly due to the slip and no explicit assumptions are made about the number 
of bits lost in the slip [8]. 
Error propagation factor 
Error propagation factor (EPF) is the bit error rate at the output of the decryption
divided by the probability of a bit error in the communication channel (i.e., in the
ciphertext). That is, the EPF measures the average number of bit errors at the
output of the decryption when a bit error occurs in the channel. We will discuss the
EPF for the parallel transfer implementation of SCFB and pipelined SCFB mode in
Chapter 6.
2.6 Conclusion 
This chapter introduces the concepts of block ciphers, stream ciphers and block cipher
modes of operation. The structures of hardware implementations of AES and the SCFB
system are also described. For the hardware implementation of SCFB mode, the
parallel transfer mode and the serial transfer mode are both discussed. For SCFB
mode, we have investigated the nature of the plaintext queue and the ciphertext
queue, the relationship between the different clocks, the relationship between queue
sizes, and the data delay during transmission from the plaintext queue to the
ciphertext queue. Performance analysis metrics, namely the theoretical efficiency,
SRD and EPF, are introduced in this chapter as well.
Chapter 3 
SCFB Mode Using Serial Transfer 
In this chapter, the hardware implementation of Statistical Cipher Feedback (SCFB)
using serial transfer from the plaintext queue to the ciphertext queue is investigated.
An iterative implementation of the Advanced Encryption Standard (AES) is adopted
as the block cipher in this SCFB system. The S-box of AES uses a composite field
implementation based on GF(2^4), which minimizes the hardware complexity.
Although the hardware complexity is low and the throughput of the block cipher is
high, the throughput of the plaintext queue can only reach 100 Mbps, which limits
the throughput of the SCFB system to 100 Mbps. Based on functional simulations with
different buffer sizes, we select a buffer size of 64 bits, which shows no queue
overflow in our simulations. We also investigate how the sync pattern size affects the
probability distribution of the number of bits in the plaintext queue and the average
number of bits in the plaintext queue.
3.1 AES Implementation 
In our SCFB mode, we adopt S-boxes which are constructed to perform inversion
in GF(2^8) using a composite field based on GF(2^4) [7]. Compared with a
straightforward implementation in GF(2^8), the implementation in GF(2^4) is well
suited to hardware using combinational logic, because all Boolean equations depend
on only 4 input bits. The resulting circuit area is significantly reduced.
The AES controller is needed to control the block cipher. Its block
diagram is shown in Figure 3.1. On the input side, the "hold_on" signal comes from
the sync pattern checking module, which we will discuss in the next section. The
"Init_Data_Load" signal comes from the input port of AES, and it indicates that the
initial input text data is loaded into AES. The "Reg_Load" signal comes from the SCFB
system controller, which will be introduced in the next section. On the output side,
the "load_data_reg" signal triggers the register in the first round of AES to load
the input text data. The "load_key_reg" signal triggers the corresponding register in
the key scheduling block to load the proper initial round key or sub-round key
into the key register. The "key_reg_mux_sel" signal goes to the multiplexer in the key
scheduling block, where it acts as a select signal to choose either the initial key or
the round key. The "done" signal indicates whether the keystream is ready in the last
round of AES. The "data_reg_mux_sel" signal is used to select the proper round data
to go through the 128-bit register. We re-use the 128-bit register in order to decrease
the complexity of AES. The "round_const" signal is needed in the F function of key
scheduling, which was introduced in the previous chapter.
The finite state machine (FSM) of the AES controller is illustrated in Figure
3.2. In any state, if "reset" is high, the next state will be Init. From any of the
states Round0 to Round10, or from the hold state, the state will change to
LoadInput on the next clk3 cycle if "init_data_ctl", "hold_on" or "Reg_Load"
is high. From state Round0 to Round10, the output "round_const" varies. From
state Round0 to Round9, the outputs are the same except for "round_const". The
output "key_reg_mux_sel" is high so that the round key is generated by the Key
Scheduling block. The outputs "load_key_reg" and "load_data_reg" are also high for
these ten states in order to load the round keys and data into the corresponding
registers. The output "data_reg_mux_sel" is set to "01" for these ten states,
indicating that the input data to the reused register will be the output of Round1 to
Round9, respectively. When the state is LoadInput, "key_reg_mux_sel" is low, which
indicates that the multiplexer in the Key Scheduling block will select the initial
keys for the first round. The output "data_reg_mux_sel" is set to "11", so the input
to the register will be the input data to the block cipher, i.e., "aes_data_in". If
the current state is Round10, the only output that differs from the previous state is
"load_key_reg", which is set to low, indicating that no round key will be provided
for the next state. When the current state is hold, "load_key_reg" and
"load_data_reg" will both be set to low because there are no new round keys or data
to be processed. Because the Shift Register takes more time to shift out a block of
keystream than the block cipher takes to generate one, the block cipher may not be
able to begin the next encryption until the Shift Register finishes. For this reason,
a hold state is included in the AES controller design.
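The state sequence described above can be modelled with a short Python sketch that
counts how many clk3 cycles one keystream block occupies. The sketch makes several
assumptions that the text does not spell out: Init moves to LoadInput on a load
request, LoadInput is always followed by Round0, Round10 is followed by the hold state
when no load is requested, and the three load-triggering inputs are merged into a
single hypothetical flag load_request.

```python
def next_state(state: str, reset: bool, load_request: bool) -> str:
    """One clk3 step of a simplified AES controller FSM (illustrative only)."""
    if reset:
        return "Init"
    if state == "Init":
        return "LoadInput" if load_request else "Init"    # assumption
    if state == "LoadInput":
        return "Round0"                                    # assumption
    if load_request:                 # from Round0..Round10 or Hold
        return "LoadInput"
    if state in ("Round10", "Hold"):
        return "Hold"                                      # assumption
    return f"Round{int(state[5:]) + 1}"                    # Round0 .. Round9

def cycles_per_block() -> int:
    """Count the states visited from LoadInput until the hold state is reached."""
    state, cycles = "LoadInput", 0
    while state != "Hold":
        state = next_state(state, reset=False, load_request=False)
        cycles += 1
    return cycles

print(cycles_per_block())  # LoadInput plus Round0..Round10
```

Under these assumptions one block occupies 12 clk3 cycles (LoadInput followed by eleven
round states), which matches the factor of 12 used later in this chapter for the block
cipher throughput.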
[Block diagram: inputs clk3, reset, hold_on, Init_Data_Load and Reg_Load; outputs
load_data_reg, load_key_reg, key_reg_mux_sel, done, data_reg_mux_sel(1:0) and
round_const(7:0).]
Figure 3.1: Block Diagram of the AES Controller
[State diagram: each state is annotated with its assignments to key_reg_mux_sel,
load_key_reg, data_reg_mux_sel, load_data_reg and round_const. The round_const values
shown for the round states are "00000001", "00000010", "00000100", "00001000",
"00010000", "00100000", "01000000", "10000000", "00011011" and "00110110".]
Figure 3.2: FSM of AES Controller 
3.2 SCFB Mode Hardware Implementation Details 
The hardware implementation of SCFB mode using serial transfer from the plaintext
queue to the ciphertext queue is illustrated in Figure 3.3. In this section we explore
an implementation that transfers bits serially and, as a result, keeps the circuit
area small. In the serial design, there are three clocks, clk1, clk2 and clk3, which
control the speeds of data transfer and of the block cipher: clk1 clocks the transfer
of data out of the plaintext queue and into the ciphertext queue, clk2 clocks
data into and out of the SCFB system, and clk3 clocks one round of
the block cipher.

[Block diagram: block cipher, Block Register, Shift Register, IV_Shift_Register,
controller, plaintext queue and ciphertext queue, with clocks clk1, clk2 and clk3.]
Figure 3.3: Hardware Implementation of SCFB Using Serial Transfer

The plaintext queue and the ciphertext queue are initialized to be
empty and full, respectively. While the plaintext data is being collected bit by bit in
the plaintext queue, a keystream block of 128 bits is generated by the block cipher.
If a block of keystream is ready and the sync pattern is not recognized, the 128-bit
keystream will be loaded into Block Register. The same keystream will also be loaded
into the block cipher as the new input data. Then, Shift Register (SR) will load
this block of keystream if it is empty and begin to shift bits out one by one. At
the same time, the plaintext queue will shift out its data bit by bit to be XORed with
the keystream coming from Shift Register. When the sync pattern is recognized, the
system will continue working in OFB mode for at least 128 clk1 cycles to collect
the complete IV. While IV_Shift_Register is in the middle of collecting the 128 bits
of the new IV, sync pattern scanning is turned off so that any 8 bits matching
the sync pattern are ignored until the IV collection phase is complete. When the
128 bits of IV are ready in IV_Shift_Register, Shift Register, the plaintext queue and
the ciphertext queue will be held. That is, Shift Register and the plaintext queue
will not shift out any more bits, and the ciphertext queue will not receive any
incoming data until the new IV has been used to create a new keystream block. However,
the plaintext queue will continue to accept incoming data and the ciphertext queue
will continue to transmit outgoing data. The new IV block is sent into the block
cipher as the new "data_in", and the next block of keystream will be generated by the
block cipher. After this new keystream is ready, the controller will provide it to
Shift Register and simultaneously unhold Shift Register, the plaintext queue, the
ciphertext queue and IV_Shift_Register. In the following, we describe some basic
components of this system.
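Leaving the queues and clocks aside, the bit-level behaviour described in this section
can be sketched as a small software model. This is an illustration and not the hardware
design: block_fn is a stand-in built from SHA-256 rather than AES, the first keystream
block is assumed to be the encryption of an agreed initial block, and the sync window
is assumed to start empty after each IV has been collected.

```python
import hashlib

B = 128                               # block size in bits
SYNC = [1, 0, 0, 0, 0, 0, 0, 0]       # n = 8 sync pattern "10000000"

def block_fn(key: bytes, bits: list[int]) -> list[int]:
    """Stand-in for AES encryption (NOT AES): SHA-256 truncated to B bits."""
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, B, 8))
    digest = hashlib.sha256(key + data).digest()[: B // 8]
    return [(byte >> (7 - j)) & 1 for byte in digest for j in range(8)]

def scfb(key: bytes, init_block: list[int], in_bits: list[int], decrypt: bool):
    """Serial SCFB model: OFB keystream, re-keyed from the IV after each sync."""
    ks, pos = block_fn(key, init_block), 0   # current keystream block and index
    window, iv, out = [], None, []           # iv is None while scanning for sync
    for b in in_bits:
        if pos == B:                         # OFB: feed the keystream block back
            ks, pos = block_fn(key, ks), 0
        o = b ^ ks[pos]
        pos += 1
        out.append(o)
        c = b if decrypt else o              # both ends watch the ciphertext
        if iv is None:
            window = (window + [c])[-len(SYNC):]
            if window == SYNC:               # sync found: start collecting the IV
                iv, window = [], []
        else:
            iv.append(c)                     # scanning is off during collection
            if len(iv) == B:                 # IV complete: new keystream from it
                ks, pos, iv = block_fn(key, iv), 0, None
    return out
```

Because encryption and decryption both derive their state from the same ciphertext
bits, calling scfb with decrypt=True on the output of the encryption call returns the
original plaintext bits.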
3.2.1 Registers
[Block diagram: Shift_Register with inputs clk1, reset, SR_Load, New_IV_Done,
Unhold_on, CQ_Full and Key_Stream_Out(127:0); outputs KeystreamOut and SR_Valid.]
Figure 3.4: Shift Register
The component Block Register is used to capture the output of the block cipher
prior to its transfer into Shift Register. Shift Register is used to shift keystream
bits into the XOR operation with the plaintext. The block diagram of Shift Register
is shown in Figure 3.4. Shift Register will be held when "New_IV_Done" is high, and
will continue shifting when it is released, i.e., when "Unhold_on" is high.
The "SR_Valid" signal determines whether the plaintext queue can shift out data
or not, and it is controlled by both the "New_IV_Done" signal and the "CQ_Full"
signal. The controller decides when Block Register and Shift Register can load a new
keystream. IV_Shift_Register, shown in Figure 3.5, keeps checking for the n-bit
sync pattern at all times, except for the period from when the new IV is ready until
"Unhold_on" is high. When the 128-bit new IV is ready, IV_Shift_Register will provide
this new IV to the block cipher as the new input and, at the same time, set
the signal "New_IV_Done" high to hold Shift Register, the plaintext queue and the
ciphertext queue.
[Block diagram: IV_Shift_Register containing an 8-bit comparator, a counter and a
D flip-flop; the signals shown include IV_In, PQ_Valid, Sync_Pattern(7:0), Unhold_on,
chose_New_IV, reset, clk1, IV_Out and New_IV_Done.]
Figure 3.5: IV Shift Register
3.2.2 System Controller
The controller is needed to control the whole SCFB system. Its block
diagram is shown in Figure 3.6. The finite state machine of the system controller is
shown in Figure 3.7.

[Block diagram: Controller with signals clk1, reset, RD_Done, New_IV_Done, SR_Done,
Cipher_Done, Reg_Load and SR_Load.]
Figure 3.6: Block Diagram of the System Controller

Figure 3.7: FSM of System Controller

At any time, if "Reset" is high, the system will be in the On_Rst
state and "ChoMUX" will be set to low, which means that the block
cipher will load the initial data as its data input. When the system is in the Gen_Key
state, the block cipher is in the process of generating the keystream. If "Cipher_Done"
is low or "RD_Done" is low, the system remains in the Gen_Key state. The system
will not enter the Reg_Taking_Key state until both "Cipher_Done" and "RD_Done" are
high. "Reg_Load" will be set to high when the system is in the Reg_Taking_Key state,
which means that Block Register is in the process of loading the 128-bit keystream
from the block cipher. The state will change to Reg_Occupied if the input "SR_Done" is
low. When the system is in the Reg_Occupied state, which indicates that Block Register
has been occupied by the new keystream and has not yet transferred it out, the output
"Reg_Load" will be set to low and "ChoMUX" will be set to high to get ready to load
the new data from the shift register into the block cipher. When the input "SR_Done"
is high, the system state will change to SR_Loading_Key from the Reg_Taking_Key or
Reg_Occupied state. When the system is in the SR_Loading_Key state, the output
"Reg_Load" is set to low and "SR_Load", "Unhold_on" and "ChoMUX" are set to
high. If "SR_Done" is low, the outputs "Unhold_on" and "SR_Load" will be set to low
after one clk1 cycle and the system will be in Wait_State. After a further clk1 cycle,
the state will change from Wait_State to Gen_Key. At any time, if the 128-bit
new IV is ready in IV_Shift_Register, the system will be in the New_IV_Found
state in the next clk1 cycle. When the system is in New_IV_Found, which indicates
that the system is in the process of generating the new keystream from the new IV
in IV_Shift_Register, the output signals "Reg_Load", "SR_Load" and "Unhold_on"
will be set to low and "ChoMUX" will be set to high. The system state will not
change from New_IV_Found to Reg_Taking_Key until "Cipher_Done" is
high. The VHDL code of the SCFB system controller is shown in Appendix A.
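The transitions described in this section can be summarized in a short Python sketch of
the next-state function. It is not the VHDL of Appendix A. Three points are
assumptions made only for illustration: On_Rst is taken to move to Gen_Key once reset
is released, SR_Loading_Key is taken to wait while "SR_Done" stays high, and the
new-IV event is modelled as a one-cycle pulse (new_iv_pulse) rather than as a level.

```python
def controller_next_state(state, reset, new_iv_pulse, cipher_done, rd_done, sr_done):
    """One clk1 step of a simplified SCFB system controller (illustrative only)."""
    if reset:
        return "On_Rst"
    if new_iv_pulse:                       # a new 128-bit IV has just become ready
        return "New_IV_Found"
    if state == "On_Rst":
        return "Gen_Key"                                        # assumption
    if state == "Gen_Key":
        return "Reg_Taking_Key" if (cipher_done and rd_done) else "Gen_Key"
    if state in ("Reg_Taking_Key", "Reg_Occupied"):
        return "SR_Loading_Key" if sr_done else "Reg_Occupied"
    if state == "SR_Loading_Key":
        return "SR_Loading_Key" if sr_done else "Wait_State"   # staying: assumption
    if state == "Wait_State":
        return "Gen_Key"
    if state == "New_IV_Found":
        return "Reg_Taking_Key" if cipher_done else "New_IV_Found"
    raise ValueError(f"unknown state: {state}")

# Output values stated in the text; outputs not mentioned for a state are omitted.
OUTPUTS = {
    "On_Rst":         {"ChoMUX": 0},
    "Reg_Taking_Key": {"Reg_Load": 1},
    "Reg_Occupied":   {"Reg_Load": 0, "ChoMUX": 1},
    "SR_Loading_Key": {"Reg_Load": 0, "SR_Load": 1, "Unhold_on": 1, "ChoMUX": 1},
    "New_IV_Found":   {"Reg_Load": 0, "SR_Load": 0, "Unhold_on": 0, "ChoMUX": 1},
}
```

Stepping this function through a normal keystream block gives the sequence Gen_Key,
Reg_Taking_Key, Reg_Occupied (if Shift Register is still busy), SR_Loading_Key,
Wait_State and back to Gen_Key.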
3.2.3 Plaintext Queue and Ciphertext Queue 
The architectures of the plaintext queue and the ciphertext queue are similar, except
for their initialization mechanisms. The plaintext queue is initialized to be empty,
and the ciphertext queue is initialized to be full, i.e., all '1's. The clock clk1 is
designed to be faster than clk2. This ensures that the plaintext queue does not
overflow due to periods during which outgoing bits are stalled because of
resynchronization. For simplicity of design, the clk1 frequency is set to twice the
clk2 frequency, and as a result underflow happens frequently in the plaintext queue.
We have therefore designed a special scheme to handle this issue and avoid any loss of
data in the queue. Overflow happens infrequently in the plaintext queue (ideally
never), except when the queue size is too small or the clk3 cycle is too long. Because
the total number of bits in the plaintext queue and the ciphertext queue is fixed,
underflow may happen in the ciphertext queue when overflow happens in the plaintext
queue. Overflow will never happen in the ciphertext queue, because of the
complementary relationship between the numbers of bits in the two queues. When
underflow happens in the plaintext queue, the plaintext queue will spend 2 clk1 cycles
to shift out 1 valid data bit. So, the actual rate of incoming data to the ciphertext
queue will be equal to the rate of clk2. This balances the rates of the incoming and
outgoing data in the ciphertext queue, which leads to no overflow in the ciphertext
queue.
3.3 Synthesis Results, Analysis and Comments on the Design
As mentioned before, there are three clock domains in this system. Among these
clocks, clk1 is the fastest and can serve as the base system clock in the
implementation. The clocks clk2 and clk3 can be derived from clk1. As shown in Figure
3.3, the rate R of incoming plaintext data to the plaintext queue is equal to the
frequency of clk2, since the data collection of the plaintext queue is based on clk2.
The system efficiency can be controlled by adjusting these three clock frequencies.
The plaintext queue collects incoming data at the rate R (clk2) and outputs the data
at the rate of clk1. The ciphertext queue has the reverse situation. The interfaces
(Block Register, Shift Register, etc.) of the block cipher also use clk1 to keep
pace with the two queues. The block cipher, which is clocked at a per-round rate
of clk3, has to run as fast as possible in order to reduce the idle time during which
the queue bit transfer is stalled while the keystream is generated after
resynchronization.
We undertook functional simulations for plaintext queue buffer sizes from 48 to
256 bits. Overflow was found to happen only when the
queue size is 48 bits. From the simulations, a buffer size of 64 bits, which
results in no queue overflow, is selected. The simulation parameters are as
follows:
1. The sync pattern size, n, is equal to 8.
2. The sync pattern format is "10000000".
3. The size of the block cipher, B, is equal to 128.
4. clk1, clk2 and clk3 are set to have periods of 5 ns, 10 ns and 25 ns, respectively.
These values are selected to give the minimum possible periods given the critical
path timing.
Figure 3.8 illustrates the probability distribution of number of bits in the plaintext 
queue for varying sync pattern sizes. This curve is derived from the simulation results. 
The simulation parameters are adopted as follows: 
1. The sync pattern size, n, varies from 4 to 8. 
2. The sync pattern format is "10...00".
3. The size of the block cipher, B, is equal to 128.
4. clk1, clk2 and clk3 are set to have periods of 5 ns, 10 ns and 25 ns, respectively.
We take the values after 1000 periods of clk2 in the simulation when the system is 
working in stable state. In general, with high probability there will be fewer than 6 bits 
in the queue. At times, with non-zero probability, as many as 45 bits were found in the 
queue. This results from the resynchronization of the SCFB system. The number of 
stored bits continuously increases without any outgoing bits for the plaintext queue 
when the new IV is used to generate a keystream block. Since resynchronization 
happens more frequently for the smaller size of sync pattern, the queue would have 
more chances to be filled with incoming bits without any outgoing bits during the 
resynchronization for the smaller size of sync pattern. The same queue would have 
less time for the normal operation where the resynchronization does not happen. This 
is why the peak for the smaller size sync pattern is lower than that for the larger size 
sync pattern. 
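A rough back-of-the-envelope estimate of how many bits arrive during one such stall can
be obtained from the clock periods listed above. The sketch assumes that the stall lasts
exactly one block cipher run of 12 clk3 cycles (the figure used in the throughput
expression of this chapter) and ignores any controller overhead, so it should be read
as an approximate lower bound rather than a measured value.

```python
CLK2_NS = 10       # one plaintext bit enters the queue per clk2 period
CLK3_NS = 25       # one block cipher state per clk3 period
AES_CYCLES = 12    # clk3 cycles assumed for one keystream block

stall_ns = AES_CYCLES * CLK3_NS            # time with no outgoing bits
bits_during_stall = stall_ns // CLK2_NS    # bits that arrive in that time
print(stall_ns, bits_during_stall)         # 300 ns, 30 bits
```

Under these assumptions about 30 bits accumulate per resynchronization, on top of
whatever is already in the queue when the stall begins.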
[Plot: "Probability distribution of # bits in the queue" (PQ-size = 64 bits;
clk3 = 25 ns; sync-pattern-size = 4, 6, 8, respectively; running time = 1000000 ns).
Horizontal axis: # bits in the Plaintext Queue (up to 64); vertical axis: percentage
scale from 5% to 50%; one curve each for sync-size = 4, 6 and 8.]
Figure 3.8: Probability Distribution of # Bits in the Plaintext Queue

We performed an ASIC synthesis with 0.18 micron CMOS TSMC (Taiwan Semiconductor
Manufacturing Company) standard cell technology using Synopsys 2002 tools
supported by the Canadian Microelectronics Corporation (CMC) [20]. When the circuit is
synthesized, we obtain a report indicating the number of different gates, the timing
and the total overall area. We use the number of equivalent 2-input NAND gates for
the total area as a metric of circuit size. The synthesis results of the block cipher,
the plaintext queue and the ciphertext queue are shown in Table 3.1.
Compared to the results of [19], which uses full block parallel transfer, the
complexity of the SCFB system is much reduced. The complexity of the hardware
implementation of [19] is also shown in Table 3.1. The constraint on the system clock
(i.e., clk1) was 10 ns and the total number of gates of the encryption system is
1255644 according to the synthesis result in [19]. In our design, the synthesis
results are improved significantly, both for the AES design and for the SCFB mode
circuitry.
In our SCFB system, based on the critical path timing information derived through
synthesis, the speed of the block cipher is 128/(12 × 25 ns) ≈ 426.67 Mbps, with
clk3 set to a period of 25 ns, which is 5 times the 5 ns period of clk1. The
throughput of the SCFB system is 1/(10 ns) = 100 Mbps, since clk2 runs at half the
frequency of clk1 and hence has a period of 10 ns. Hence, the efficiency is
100/426.67 ≈ 23.4%.
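The arithmetic behind these figures can be checked with a few lines of Python. The
variable names are illustrative, and the values are the ones quoted above.

```python
B = 128              # block size in bits
ROUND_CYCLES = 12    # clk3 cycles per keystream block
CLK3_NS = 25         # clk3 period in ns
CLK2_NS = 10         # clk2 period in ns (one bit in and out per period)

cipher_mbps = B * 1000 / (ROUND_CYCLES * CLK3_NS)   # bits per ns scaled to Mbps
system_mbps = 1000 / CLK2_NS
efficiency = system_mbps / cipher_mbps

print(round(cipher_mbps, 2), system_mbps, round(100 * efficiency, 1))
```

This gives about 426.67 Mbps for the block cipher, 100 Mbps for the serial SCFB system
and an efficiency of about 23.4%.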
Thus, the throughput of the plaintext queue becomes the bottleneck of the system. 
To improve the throughput of the system we can change the serial-in and serial-out 
mode into parallel-in and parallel-out mode for the transfer of data from the plaintext 
queue to the ciphertext queue. This will be investigated in the next chapters. 
Table 3.1: Synthesis Result Using 0.18 Micron CMOS

                      Total Area (# gates)
Component             Serial Transfer Mode    Full Block Parallel
                      (This thesis)           Transfer [19]
plaintext queue       1232                    190788
ciphertext queue      2291                    313856
PQ_CQ_Integrated      3525                    -
AES                   16919                   612834
SCFB System           25361                   1255644
3.4 Conclusion 
This chapter investigates the hardware structure of statistical cipher feedback mode
using serial transfer. The S-box of AES uses a composite field implementation based on
GF(2^4) in order to minimize the hardware complexity. From the
ASIC synthesis with 0.18 micron CMOS standard cell technology, the
throughput of SCFB using serial transfer can reach 100 Mbps and the overall
complexity of the system is equivalent to about 42k gates. The efficiency of SCFB
using serial transfer is about 23.4%. Compared to the results of [19], which applies
full block parallel transfer, the hardware complexity of the SCFB system based on
serial transfer mode is much reduced.
Chapter 4 
SCFB Mode Using Parallel Transfer
In this chapter, the hardware implementation of statistical cipher feedback (SCFB)
using parallel transfer from the plaintext queue to the ciphertext queue is investigated.
We studied SCFB using serial transfer in Chapter 3, where we found that the
throughput of the plaintext queue is the bottleneck of the system. In order
to solve this problem, we improve the design by enlarging the transfer size of the
queuing system. By changing the serial transfer to parallel transfer in the queues,
a higher throughput of the plaintext queue is obtained compared with that of the
serial transfer mode SCFB system discussed in the last chapter. For SCFB
mode using parallel transfer, the input and output of the system become N bits in
parallel, where N is the number of bits transferred in parallel between queues.
Compared with the serial transfer mode SCFB, SCFB using parallel transfer
has a more complex architecture for handling the data transfer among the plaintext
queue, the ciphertext queue and IV_Shift_Register. The external signals also
change. For example, the "plaintext" input port and the "ciphertext" output
port become multiple-bit signals. For the block cipher, we still use the iterative
implementation of the Advanced Encryption Standard (AES) in this
parallel SCFB system. We discuss the detailed hardware implementation of the parallel
transfer mode and compare it to the serial transfer mode investigated in Chapter 3.
We also carry out the analysis and synthesis of the parallel
transfer mode in this chapter.
Throughout this chapter, the ideal throughput of the block cipher is 128 bits/(12 ×
clk3 cycle), where clk3 cycle represents the clock period of the block cipher. However,
because of the resynchronizations, for SCFB mode, the throughput is reduced to
about 50% to 60% of the ideal value [8]. On the other hand, the input throughput
of the plaintext queue is N/clk2 cycle, where N is the block transfer size and clk2
cycle represents the clock period for the transfer of data into and out of the system.
In the last chapter, we investigated the serial transfer mode SCFB, where the low
throughput of the plaintext queue limited the throughput of the system.
In our investigation of parallel transfer mode in this chapter, we set the block
transfer size to 4 bits and investigate how this change may improve the throughputs of
both the plaintext queue and the system. By doing this, we can make the throughput of
the block cipher as high as possible.
4.1 Hardware Implementation Details 
In our implementation of SCFB using parallel transfer, AES still uses the key-on-fly
scheme. However, in this chapter, we adopt the simple Boolean function in
the S-box of AES. Figure 4.1 illustrates the hardware implementation of SCFB mode
using parallel transfer (N = 4 bits) from the plaintext queue to the ciphertext queue.
Figure 4.1: Hardware Implementation of SCFB Using Parallel Transfer (N= 4)
Compared with the serial transfer mode, the parallel transfer mode has more complex
structures for the shift register, IV_Shift_Register, plaintext queue and ciphertext
queue. Only a small modification to the system controller is made because the
behavior of the system does not change much, except that the transfer of
data and the keystream generation in the AES become faster than in the serial transfer
mode. When the 128-bit keystream is generated in the block cipher, it is loaded
into the block register, which contains a 4 x 32-bit register. This keystream
is then moved to the shift register when the system controller gives the proper control
signal to the shift register. The shift register also has a 4 x 32-bit register,
whose last 4-bit block has a more complex architecture. After the
keystream is successfully loaded into the shift register, this 128-bit keystream is
transferred out of the shift register in units of 4 bits to be XORed with the plaintext
from the plaintext queue to generate the corresponding ciphertext. The ciphertext
data is transferred to both the ciphertext queue and the IV_Shift_Register in units
of 4 bits. Since the sync pattern can occur anywhere while the system is
working in OFB mode, the last transfer block of the new IV might need fewer than
4 bits depending on where the sync pattern is recognized. In this case, the shift
register moves data out in a unit of 4 bits, which may contain 1 - 4 valid bits and,
correspondingly, 3 - 0 invalid bits. The same thing happens in the plaintext queue,
ciphertext queue, and the IV_Shift_Register. In the upcoming sections, we discuss the
details of the hardware design for the parallel transfer mode, especially for the shift
register, plaintext queue, ciphertext queue and IV_Shift_Register. The VHDL code of
the SCFB system controller is shown in Appendix A.
Figure 4.2: Shift Register for Parallel Transfer (N=4)
4.1.1 Shift Register 
Figure 4.2 illustrates the block diagram of the shift register in parallel transfer. Compared
with the shift register in the serial transfer implementation, the shift register in the
parallel transfer mode has a 4-bit keystream output signal and an extra input signal;
the latter indicates the length of the last block in the new IV. The shift register
transfers out a 4-bit keystream block every clk1 cycle. The input signal (i.e.,
"Last_IV_Length") represents the number of valid bits needed in the
next keystream transfer block. For example, if "Last_IV_Length" = "001", the next
keystream block will contain only 1 valid bit.
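The interface just described can be summarised with a small behavioural sketch. It is written in Python rather than the VHDL used for the actual design, and it is only an illustration: the function name, the use of None to mark invalid positions, and the choice to place the valid bits first in the block are all assumptions made here, not details taken from the implementation.

```python
N = 4  # block transfer size in bits, as used in this chapter


def next_keystream_block(keystream_bits, last_iv_length=N):
    """Pop one transfer block from the front of `keystream_bits`.

    `last_iv_length` plays the role of the Last_IV_Length input: the number
    of valid bits wanted in this block (1..N). Returns (block, valid), where
    `block` always has N entries and invalid positions are None (assumed
    convention: valid bits come first).
    """
    if not 1 <= last_iv_length <= N:
        raise ValueError("last_iv_length must be between 1 and N")
    valid = min(last_iv_length, len(keystream_bits))
    taken = [keystream_bits.pop(0) for _ in range(valid)]
    return taken + [None] * (N - valid), valid


# A normal block carries 4 valid bits; a block requested with length 1
# ("001") carries a single valid bit.
ks = [1, 0, 1, 1, 0, 0, 1, 0]
full_block, full_valid = next_keystream_block(ks)
short_block, short_valid = next_keystream_block(ks, 1)
```

The sketch does not model the reload of the shift register with fresh keystream after a resynchronization.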
4.1.2 IV_Shift_Register
The IV_Shift_Register component is illustrated in Figure 4.3. Unlike in serial transfer
mode, in parallel transfer mode with a block transfer size of 4 bits, up to 4 valid bits
of ciphertext enter the IV_Shift_Register every clk1 cycle. However,
for the last block of the IV while collecting a new IV, there may be fewer than 4 valid
bits of data transferred to the IV_Shift_Register. The number of valid bits in the
last block of the IV depends on where the sync pattern is recognized. In every clk1 cycle
there are at most 4 comparisons in the IV_Shift_Register in order to recognize
the 8-bit sync pattern.
Figure 4.3: IV Shift Register Using Parallel Transfer (N=4)
The process of sync pattern recognition is described in Figure 4.4.
Figure 4.4: Sync Pattern Recognition for Parallel Transfer (N=4)
[Annotation in figure: 4th comparison, if recognized, "Last_IV_length" = 4]
At the 1st moment, the first two blocks of ciphertext data (i.e., {IV0(0) ... IV0(3)}
and {IV1(0) ... IV1(3)}) have already been loaded into the first 8 bit positions of the
IV_Shift_Register using 2 clk1 cycles, where clk1 clocks the transfer of
data into the IV_Shift_Register. The 2nd moment shows that at most 4 comparisons
are completed in every clk1 cycle. For example, if the sync pattern is recognized
in the 2nd comparison, the IV_Shift_Register will begin to collect the 128-bit new IV,
and the first bit of the new IV will be IV2(1). In this case, "Last_IV_Length"
is set to 2, which indicates that both the shift register and the plaintext queue will
transfer only 2 bits in their transfer blocks after 31 clk1 cycles (i.e., 31 clk1 cycles
are needed to collect the 128-bit new IV when the block transfer size is
4 bits). In fact, every time a block of ciphertext data is transferred into the
IV_Shift_Register, except during IV collection, four comparisons must
be done in order to recognize the sync pattern.
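To make the four comparisons per block concrete, the following Python sketch models them as a bit-by-bit scan of an 8-bit window. It is a behavioural illustration only: in hardware the four comparisons take place within one clk1 cycle, whereas the loop below performs them one after another. The function name, the bit ordering within a block, and the way comparisons are numbered are assumptions made here and may not match the indexing used in Figure 4.4.

```python
SYNC = [1, 0, 0, 0, 0, 0, 0, 0]  # 8-bit sync pattern "10000000"
N = 4                            # block transfer size in bits


def scan_block(window, block):
    """Shift one N-bit ciphertext block into the 8-bit window, bit by bit.

    Returns (new_window, k), where k is the 1-based index of the comparison
    that matched the sync pattern, or None if none of the N comparisons
    matched. Scanning stops at the first match, since the bits that follow
    belong to the new IV.
    """
    window = list(window)
    for k, bit in enumerate(block, start=1):
        window = window[1:] + [bit]   # oldest bit leaves, newest bit enters
        if window == SYNC:
            return window, k
    return window, None


# Pattern completed by the second bit of the incoming block: k = 2, which in
# the example of the text corresponds to Last_IV_Length = 2.
_, k = scan_block([1, 1, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0])
```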
For a block transfer size of 4 bits, after the sync pattern is
recognized, the IV_Shift_Register spends 32 clk1 cycles collecting the 128-bit new IV.
When the new IV is ready, the IV_Shift_Register provides it to the block
cipher. Figure 4.5 shows how the 128-bit new IV block transfers to the IV_Shift_Register
when the sync pattern is recognized.
Figure 4.5: Process of New IV Collecting for Parallel Transfer (N=4)
In the first block of Figure 4.5, we assume
that the sync pattern is recognized in the 1st comparison (shown in Figure 4.4) of the
IV_Shift_Register. Then the first 3 bits of ciphertext are collected in the first 3
positions of the new IV. The second and third blocks of Figure 4.5 represent the
following 31 clk1 cycles, in each of which 4 valid
ciphertext bits are transferred to the IV_Shift_Register. Simultaneously, the bits in the
IV_Shift_Register are shifted 4 bits to the right per clk1 cycle. In the last block of
Figure 4.5, the plaintext queue sends only 1 valid bit, which is XORed with 1 bit of
keystream from the shift register, and this 1 bit of ciphertext is then transferred to
the IV_Shift_Register. Simultaneously, the bits in the IV_Shift_Register are shifted 1
bit to the right. If the length of the last IV block is 4 bits, the PQ will transfer a
block with 4 valid bits to be XORed with 4 bits of keystream, and the result is
transferred to the IV_Shift_Register. If the length of the last IV block is 2 bits, the
PQ will transfer a block of data which contains only 2 valid bits to be XORed with 2
bits of keystream, and these 2 ciphertext bits enter the IV_Shift_Register to complete
the 128-bit new IV collection.
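The bit counts in this example can be checked with a few lines of Python. The sketch below lists how many valid bits each transfer block contributes to the new IV when the sync pattern is recognized in the k-th comparison. The function name is hypothetical, and the rule that the first block contributes N - k bits is inferred from the k = 1 case described above (3 bits first, 1 bit last); it is an assumption rather than a statement taken from the design.

```python
B = 128  # IV length in bits
N = 4    # block transfer size in bits


def iv_collection_schedule(k):
    """Valid-bit counts of the blocks that make up a new IV when the sync
    pattern is recognized in the k-th comparison (1..N)."""
    if not 1 <= k <= N:
        raise ValueError("k must be between 1 and N")
    head = N - k                      # bits left in the block where the pattern ended
    tail = k                          # value carried by Last_IV_Length
    full = (B - head - tail) // N     # number of full N-bit blocks in between
    schedule = ([head] if head else []) + [N] * full + [tail]
    if sum(schedule) != B:
        raise AssertionError("schedule does not add up to one IV")
    return schedule


# k = 1 reproduces the case of Figure 4.5: 3 bits, then 31 blocks of 4 bits,
# then a last block with 1 valid bit.
schedule_k1 = iv_collection_schedule(1)
```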
4.1.3 Plaintext Queue and Ciphertext Queue 
The structure of the plaintext queue for the parallel transfer mode is similar to that 
of the serial transfer mode except for the output pipeline design, which becomes more 
complex. Figure 4.6 illustrates the structure of the plaintext queue in parallel transfer 
mode for a block transfer size of 4 bits.
Figure 4.6: Plaintext Queue for Parallel Transfer (N=4)
[Note in figure: all the components are synchronized reset.]
The input signals, "IDATA", "IVALID", "RST" and "WCLK", are connected
to the external ports of the system. The "IDATA" signal represents the plaintext
data, which is loaded into the input pipeline, composed of several 4-bit
registers. The plaintext data is then stored in the proper positions in the FIFO
and read out of the FIFO when the control signals, "wport_meb", "wp_enab", "renab"
and "rport_meb", are asserted properly. The "Last_IV_length" signal comes from the
IV_Shift_Register and represents the number of valid bits that the read pipeline should
transfer. The "PQ_pipe_hold" signal also comes from the IV_Shift_Register. It is used
to freeze the read pipeline when resynchronization happens. The "SR_Valid" signal,
which comes from the shift register, is used to synchronize the output data from
the shift register and the plaintext queue. In Figure 4.6, WFSM (i.e., write finite
state machine) controls the behavior of the write port of the plaintext
queue. The block Writer Pointer provides the writing address to the FIFO. The
FIFO is actually a 2-port RAM which is used to store and read the data through the
write port and read port, respectively. RFSM (i.e., read finite state machine)
controls the behavior of the read port of the plaintext queue. The block
Reader Pointer provides the reading address to the FIFO.
Figure 4.7 shows how the plaintext queue adjusts the boundary of each 4-bit
transfer block. We use three 4-bit registers to handle the boundary of the
transfer block when the last block of the new IV is smaller than 4 bits. In Figure 4.7,
Reg1 transfers the block of data, which contains 1 to 4 valid data bits, out of the
plaintext queue. Reg2 stores the intermediate data, which may contain data
from two successive blocks of plaintext. Reg3 receives the data directly from
the read pipeline of the plaintext queue. When Reg2 is not full, the next incoming
block of data first fills Reg2 and the remaining data goes to Reg3. In the first
block of Figure 4.7, we assume "Last_IV_length" is equal to 3. Thus, when the
plaintext queue receives this signal, Reg2 transfers its first 3 bits of data to Reg1
in the next clk1 cycle. Simultaneously, the new 4-bit incoming data is split
into two parts: the first 3 bits go to Reg2 and the last
bit goes to Reg3.
In the second block of Figure 4.7, we assume "Last_IV_length" is equal to 4
after the previous block. Therefore, when the plaintext queue receives this signal, Reg2
transfers all 4 bits of data to Reg1 in the next clk1 cycle. At the same time, the
1 bit of data previously in Reg3 is transferred to the first position of Reg2, and
the new 4-bit incoming data is split into two parts: the first 3 bits go
to Reg2 and the last bit goes to Reg3.
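Functionally, the Reg3 to Reg2 to Reg1 realignment behaves like a small bit-granular FIFO: full 4-bit blocks go in and blocks with 1 to 4 valid bits come out. The Python sketch below captures only this behaviour and not the register-level structure. The class and method names are hypothetical, and the use of None for invalid positions is an assumption made for illustration.

```python
from collections import deque

N = 4  # block transfer size in bits


class OutputBuffer:
    """Bit-level model of the plaintext queue output buffer."""

    def __init__(self):
        self.pending = deque()  # bits held in Reg2/Reg3, oldest first

    def push_block(self, bits):
        """Accept one full N-bit block from the read pipeline."""
        if len(bits) != N:
            raise ValueError("the read pipeline delivers full N-bit blocks")
        if len(self.pending) > N:
            raise RuntimeError("Reg2 and Reg3 together hold at most 2*N bits")
        self.pending.extend(bits)

    def pop_block(self, length=N):
        """Emit one transfer block (the content of Reg1) with `length` valid bits."""
        if not 1 <= length <= N or length > len(self.pending):
            raise ValueError("invalid number of valid bits requested")
        valid = [self.pending.popleft() for _ in range(length)]
        return valid + [None] * (N - length)


# The two moments of Figure 4.7: a 3-bit block, then a 4-bit block that
# straddles two successive plaintext blocks.
buf = OutputBuffer()
buf.push_block(["P0(0)", "P0(1)", "P0(2)", "P0(3)"])
buf.push_block(["P1(0)", "P1(1)", "P1(2)", "P1(3)"])
first = buf.pop_block(3)
second = buf.pop_block(4)
```

After these two steps a single bit, P1(3), remains buffered, which corresponds to the one bit held in Reg3 in the description above.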
Figure 4.7: Plaintext Queue Output Buffer for Parallel Transfer (N=4)
The structure of the ciphertext queue for the parallel transfer mode is similar
to that of the serial transfer mode except for the input pipeline design. Figure 4.8
illustrates the structure of the ciphertext queue in parallel transfer mode for a block
transfer size of 4 bits. The output signals, "ODATA", "OVALID" and
"CQ_Full", are connected to the external output ports of the system. The "IDATA"
signal represents the ciphertext data, which is loaded into the input pipeline,
composed of several 4-bit registers. The ciphertext data is then stored
in the proper positions in the FIFO and read out of the FIFO when the control
signals, "wport_meb", "wp_enab", "renab" and "rport_meb", are asserted properly.
The "IVALID" signal, which comes from the plaintext queue, indicates whether the
input data is valid. In Figure 4.8, WFSM (i.e., write finite state machine)
controls the behavior of the write port of the ciphertext queue. The
Writer Pointer provides the writing address to the FIFO. The FIFO is actually a
2-port RAM which is used to store and read the data through the write port and read
port, respectively. RFSM (i.e., read finite state machine) controls the
behavior of the read side of the ciphertext queue. The block Reader
Pointer provides the reading address to the FIFO.
Figure 4.8: Ciphertext Queue for Parallel Transfer (N=4)
[Note in figure: all the components are synchronized reset.]
Figure 4.9 shows how the ciphertext queue adjusts the boundary of each 4-bit
ciphertext block when the number of valid bits in the ciphertext block is smaller
than 4. The data in the darker colour represents the valid data in the new incoming
ciphertext block. The first three blocks (i.e., blocks a, b and c) in Figure 4.9 describe
the behaviour of the ciphertext queue input pipeline at the very beginning (i.e., the
initialization process in the queue). The remaining parts of the figure show the behavior
when the ciphertext queue is working normally. Blocks II and II' are two separate
cases that follow block I. Blocks a-b-c represent a successive process, each step of
which takes one clk1 cycle. Blocks I, II or I, II' represent a successive process, each
step of which also takes one clk1 cycle.
In block a of Figure 4.9, we assume the number of valid bits in the ciphertext block
is 4 at the very beginning. The first ciphertext block of data is represented as
C0(0) ... C0(3). These 4 bits of data are transferred to Reg1 directly for the
initialization. In the next clk1 cycle they are output to the ciphertext queue.
In block b of Figure 4.9, we assume the number of valid bits in the 2nd ciphertext
block (i.e., C1(0)) is 1. This 1 bit of data is transferred to Reg1 after the first
ciphertext block has been transferred out of Reg1. After this 1 bit of ciphertext data
is transferred, the block cipher encrypts the new IV to generate the corresponding
new keystream and the ciphertext queue is held. This 1 bit of data is not
output until the next 4-bit ciphertext block (i.e., the 3rd ciphertext block, which
contains C2(0) ... C2(3)) comes in and the number of bits in Reg1 reaches 4 when the
ciphertext queue is released. This process is shown in block c of Figure 4.9.
Figure 4.9: Ciphertext Queue Input Buffer for Parallel Transfer (N=4)
Block I in Figure 4.9 shows that the next incoming ciphertext block (i.e., C2(0) ... C2(3);
that is, we assume the number of valid bits in this ciphertext block is 4) is
transferred to Reg3 when neither Reg1 nor Reg2 is empty.
The reason we add Reg3 to our design is to detect the number of valid bits in the
incoming ciphertext block. The new incoming ciphertext block is transferred
into Reg3 in every clk1 cycle when neither Reg1 nor Reg2 is empty.
Blocks II and II' in Figure 4.9 show two different situations when the last block of the
IV is transferred and the number of valid bits in the incoming ciphertext block
varies from 1 to 4. Block II illustrates the case when the sum of the numbers of valid
bits in Reg2 and Reg3 is less than or equal to 4 (assuming the number of valid bits in
the incoming ciphertext block is 2 in block II). Block II' illustrates the case when
the sum of the numbers of valid bits in Reg2 and Reg3 is greater than 4 (assuming the
number of valid bits in the incoming ciphertext block is 4 in block II'). After this
last block of ciphertext data is transferred, in which the number of valid bits varies
from 1 to 4, the block cipher encrypts the new IV to generate the corresponding
new keystream and the ciphertext queue is frozen. The 4 bits of data in Reg1
are not transferred out until the next 4-bit ciphertext block comes in when the
ciphertext queue is released.
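Seen from outside, the input buffer of the ciphertext queue packs blocks with a variable number of valid bits back into full 4-bit words. The following Python sketch models that packing behaviour only. It does not reproduce the Reg1/Reg2/Reg3 structure or the hold and release of the queue during resynchronization. The class name is hypothetical, and the convention that the valid bits occupy the first positions of a block is an assumption.

```python
N = 4  # block transfer size in bits


class InputPacker:
    """Behavioural model of the ciphertext queue input buffer."""

    def __init__(self):
        self.partial = []  # valid bits waiting to complete a full word

    def push_block(self, block, valid):
        """Accept a transfer block whose first `valid` entries are valid.

        Returns the list of full N-bit words completed by this block
        (empty, or a single word, because fewer than N bits are ever
        left waiting).
        """
        if len(block) != N or not 0 <= valid <= N:
            raise ValueError("block must have N entries and 0..N valid bits")
        self.partial.extend(block[:valid])
        words = []
        while len(self.partial) >= N:
            words.append(self.partial[:N])
            self.partial = self.partial[N:]
        return words


# Blocks a, b and c of Figure 4.9: a full block, a block with one valid bit,
# then a full block that completes the pending word.
packer = InputPacker()
out_a = packer.push_block(["C0(0)", "C0(1)", "C0(2)", "C0(3)"], 4)
out_b = packer.push_block(["C1(0)", None, None, None], 1)
out_c = packer.push_block(["C2(0)", "C2(1)", "C2(2)", "C2(3)"], 4)
```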
4.2 Synthesis Results, Analysis and Comments on 
the Design 
We performed functional simulations for a block transfer size of 4 bits. From the
simulations, a queue size of 80 x 4 bits was found to produce
no queue overflow for this block transfer size. We also
simulated a queue size of 64 x 4 bits; in this case, queue overflow
happened frequently. In this chapter, we investigate the probability distribution of
the current number of bits in the plaintext queue.
The simulation results are shown in Figure 4.10. In the simulation we ignore the
values before 70 ns because at the very beginning the queue is empty
and the incoming bits continuously fill up the plaintext queue until the first
block of keystream is finished. Hence, we consider only the data after the system has
reached a steady state in normal operation. In Figure 4.10, clk1 is the fastest clock
and it can be the base system clock. The clk2 rate clocks the transfer
of data into the plaintext queue. The clk3 rate is the per-round rate for the block
cipher. The simulation parameters are as follows:
1. The sync pattern size, n, is adopted as 8. 
2. The sync pattern format is "10 ... 00". 
3. The size of the block cipher, B , is equal to 128. 
4. The simulation is run for 2 ms, that is, over 10^5 blocks of plaintext data going
into the plaintext queue.
5. clk1, clk2 and clk3 are set to have periods of 9 ns, 18 ns and 18 ns, respectively. 
These values are selected as the minimum possible given critical path timing. 
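As a quick check of the run length against the clock settings listed above, the number of plaintext blocks entering the queue can be estimated in Python. The sketch assumes that one 4-bit block enters the plaintext queue in every clk2 cycle without interruption; the function name is hypothetical.

```python
def blocks_entering_queue(sim_time_ns, clk2_period_ns):
    """Number of whole clk2 cycles, hence input blocks, in the simulated time."""
    return sim_time_ns // clk2_period_ns


# 2 ms = 2,000,000 ns of simulated time with an 18 ns clk2 period.
blocks = blocks_entering_queue(2_000_000, 18)
```

Under this assumption the run covers a little more than 1.1 x 10^5 blocks.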
The minimum number of bits in the plaintext queue is found to be 20. This
situation appears only about 200 times in the total simulation run of 2 ms after the
system has reached stable operation. In the plaintext queue, these 20 bits
are composed as follows: 3 x 4 = 12 bits in the input
pipeline, 4 bits in the FIFO, and 4 bits in the output pipeline. In this case, recalling
the structure of the output pipeline of the plaintext queue, there are no valid bits in
Reg3.
In Figure 4.10, four values of the current number of bits in the plaintext queue
occur with high probability: 21, 22, 23 and 24.
These four numbers indicate that there are 3 x 4 = 12 bits in the
input pipeline, 4 bits in the FIFO, and 5, 6, 7 or 8 bits in the output pipeline,
respectively. Reg3 is filled with 1, 2, 3 or 4 bits of plaintext data in these
four situations, respectively.
Figure 4.10: Probability Distribution of # Bits in the Plaintext Queue (Block Transfer
Size=4 Bits)
[Plot: horizontal axis, # bits in the plaintext queue; vertical axis in percent.
Parameters: queue size = 80x4; clk3 = 18 ns; clk2 = 18 ns; clk1 = 9 ns;
sync pattern = "10000000"; running time = 2 ms.]
We performed ASIC synthesis with 0.18 micron CMOS TSMC (Taiwan Semiconductor
Manufacturing Company) standard cell technology using Synopsys 2002 tools
supported by the Canadian Microelectronics Corporation (CMC). We use the number
of equivalent 2-input NAND gates for the total area as a metric of circuit size. The
synthesis results of the block cipher, plaintext queue and ciphertext queue in this
parallel transfer (4 bits) mode are shown in Table 4.1. The complexity of the SCFB
system is 43697 gates vs. 41600 gates for the serial transfer design. The
queuing system of the SCFB system using parallel transfer consumes more area
than that of the serial transfer design. As we have mentioned, the clock clk1 is
designed to be faster than clk2. This ensures that the plaintext queue does not back up
due to periods during which outgoing bits are stalled because of resynchronization.
For simplicity of design, the clk1 frequency is set to twice the clk2
frequency. Based on the synthesis results, we adopt 18 ns as the clk3 period, 18 ns as
the clk2 period and 9 ns as the clk1 period. These clocks are slower than those of the
last chapter (e.g., the clk1 period is 9 ns vs. 5 ns for the serial transfer design)
because the output pipeline in the plaintext queue and the input pipeline in the
ciphertext queue have become more complicated than before. These changes have
increased the delay in the critical path. The throughput of the block cipher in SCFB
mode is reduced compared to the potential block cipher throughput because of the
resynchronizations. The ideal throughput of the block cipher is
128 bits / (12 x 18 ns) ≈ 592 Mbps. On the other
hand, the input throughput of the plaintext queue is N / 18 ns ≈ 222 Mbps for N = 4
bits. Thus, the throughput of the SCFB in parallel transfer (4 bits) mode can reach
222 Mbps. The efficiency of the system is 222/592 ≈ 38%. Although the throughput
of the queuing system can be enhanced by increasing the block transfer size, the
throughput of the block cipher can only reach 500 Mbps - 600 Mbps, which becomes
the bottleneck of the system efficiency and throughput. In the next chapter, we will
apply a pipeline architecture to the block cipher and increase the block transfer size
of the queuing system in order to increase the throughput of the system.
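The throughput and efficiency figures quoted above follow directly from the clock periods, and the arithmetic can be reproduced with a short Python sketch. The helper name is hypothetical; the input values are the ones stated in the text.

```python
def mbps(bits, period_ns):
    """Throughput in Mbps when `bits` are delivered every `period_ns` nanoseconds."""
    return bits * 1000.0 / period_ns


block_cipher = mbps(128, 12 * 18)        # 128 bits per 12 clk3 cycles of 18 ns
queue_input = mbps(4, 18)                # N = 4 bits per clk2 cycle of 18 ns
efficiency = queue_input / block_cipher  # fraction of the ideal cipher throughput
```

With unrounded values the efficiency is 0.375, which is quoted in the text as approximately 38%.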
Table 4.1: Synthesis Result Using 0.18 Micron CMOS (Block Transfer Size = 4 Bits)
Component            Total Area (# gates)
Plaintext Queue      7211
Ciphertext Queue     7424
Shift_Register       2375
AES                  27180
IV_Shift_Register    1138
SCFB System          43697
4.3 Conclusion 
This chapter investigated the hardware structure of statistical cipher feedback mode
using parallel transfer. Compared with SCFB using serial transfer, which was studied in
the last chapter, parallel transfer applied to the hardware implementation of SCFB
improves the throughput of the SCFB system. In ASIC
synthesis with 0.18 micron CMOS standard cell technology, the throughput of the
SCFB using parallel transfer (block transfer size equal to 4 bits) can reach 222 Mbps,
which is about twice that of the SCFB using serial transfer in Chapter
3. The complexity of the SCFB using parallel transfer is 43697 gates, which is larger
than that of the SCFB using serial transfer. The efficiency of SCFB using parallel
transfer is about 38%, which is much higher than that of SCFB using serial transfer,
where the efficiency can only reach about 23%.
Chapter 5
Pipelined SCFB Mode Using
Parallel Transfer
In this chapter, the hardware implementation of pipelined statistical cipher feedback
(SCFB) using parallel transfer from the plaintext queue to the ciphertext queue is
investigated. As we saw in Chapter 4, the throughput of the SCFB system
can only reach 222 Mbits/s. This results from two factors: the limited throughput
of the block cipher operation (592 Mbits/s) and the necessity of keeping the SCFB
system throughput at less than about 50% of the block cipher throughput to avoid
buffer overflow in the plaintext queue. For this reason, in this chapter we investigate
pipelining the block cipher and increasing the block transfer size of the queuing system.
By making these changes to our system, we can increase the throughput of both the
block cipher and the plaintext queue so that the throughput of the whole system is
improved significantly.
In the SCFB mode using serial transfer or parallel transfer, which we have investigated
before, the input data to AES comes from the previous output of AES if the
sync pattern is not recognized. That is, the block cipher works in OFB mode most
of the time.
However, OFB is not a suitable choice if we are trying to improve the throughput 
of the block cipher by using pipelining. Counter (CTR) mode operation for the 
block cipher is a better choice for the purpose of pipelining AES. The reason is that 
encryption (or decryption) in CTR mode can be done in parallel on multiple blocks of 
plaintext or ciphertext. This property makes it possible to pipeline the block cipher. 
That is, the CTR function can provide pseudo random data to the block cipher as 
the input in a way that does not depend on the previous output of the block cipher 
while OFB mode does. By pipelining CTR mode, we are able to produce a block of 
keystream in only 1 clk3 cycle. Hence, pipelined CTR mode operation for the block 
cipher overcomes the throughput deficiencies of non-pipelined OFB mode and allows 
us to dramatically increase the throughput of the SCFB mode system. However, as 
we discuss in the next section, it will be necessary to modify SCFB mode in order for 
it to operate with pipelined CTR mode. 
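The difference in data dependency between the two modes can be illustrated with a short Python sketch. It is not the implementation used in this thesis: the block cipher is replaced by a keyed hash purely as a stand-in for AES, and the counter is a plain incrementing integer, whereas the counter function of the next section is based on an LFSR. All function names are hypothetical.

```python
import hashlib


def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES: a keyed hash truncated to 16 bytes (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:16]


def ofb_keystream(key, iv, n_blocks):
    """OFB: each cipher input is the previous cipher output, so the blocks
    must be produced strictly one after another."""
    out, state = [], iv
    for _ in range(n_blocks):
        state = toy_block_cipher(key, state)
        out.append(state)
    return out


def ctr_inputs(iv, n_blocks):
    """CTR: all cipher inputs are known in advance (assumed simple increment)."""
    start = int.from_bytes(iv, "big")
    return [((start + i) % (1 << 128)).to_bytes(16, "big") for i in range(n_blocks)]


def ctr_keystream(key, iv, n_blocks):
    """Each block depends only on its own counter value, so successive
    blocks can occupy successive pipeline stages."""
    return [toy_block_cipher(key, c) for c in ctr_inputs(iv, n_blocks)]


key, iv = b"k" * 16, bytes(16)
ofb = ofb_keystream(key, iv, 3)
ctr = ctr_keystream(key, iv, 3)
```

In the OFB loop the second call cannot start before the first has finished, while in the CTR version the list of inputs exists before any encryption is performed.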
5.1 SCFB Based on Pipelined Counter mode (CTR) 
SCFB mode based on pipelined CTR mode utilizes the block cipher which has pipeline 
architecture in order to increase the throughput of the block cipher. Compared with 
the conventional SCFB mode, the pipelined SCFB mode applies a pipelined CTR 
mode instead of OFB mode when the synchronization does not happen. The input 
data of the block cipher is only provided by the counter function. The counter function 
utilizes a linear feedback shift register (LFSR) to produce a pseudo random count 
which is sent to the block cipher as the input data every time. When synchronization 
happens, the new IV will be sent to the counter function for re-initialization. 
Figure 5.1 illustrates the nature of ciphertext data for pipelined SCFB mode.
Figure 5.1: Synchronization Cycle for L-Stage Pipelined SCFB
[Segments in the figure, in order: sync (n bits), IV (B bits), Blackout (L x B bits),
CTR (k bits), followed by the next sync and IV.]
In the figure, n represents the number of bits in the sync pattern, B is the length of
the subsequent IV and k indicates the duration of CTR mode in bits. The pattern
of data is similar in nature to SCFB mode, except for the added "Blackout" period.
CTR mode occurs between the end of the blackout and the beginning of the next
sync pattern. For the blackout period, L x B indicates that there are L pipeline stages
(working in parallel on B bits of data each) before the ciphertext block produced from
the new IV appears at the output of the block cipher (that is, there is a pipeline
latency of L stages, each stage producing B bits). The system does not begin to check
for the sync pattern until CTR mode begins, and the ciphertext data in the blackout
period is still produced using the previous IV, as it results from flushing out the data
caught in the pipeline when the sync pattern is detected. Following the blackout period,
the new IV has propagated through the block cipher and the ciphertext data for the CTR
mode period is produced using the new IV. Hence, a synchronization cycle consists
of n + B + L x B + k bits, which includes the set of bits from the beginning of the
sync pattern to the beginning of the next sync pattern. Note that a non-pipelined
SCFB mode using CTR mode can be considered the scenario L = 0.
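The length of a synchronization cycle can be written as a one-line Python function. The function name and the example values of L and k are hypothetical and chosen only to illustrate the formula; n = 8 and B = 128 are the values used elsewhere in this thesis.

```python
def sync_cycle_bits(n, B, L, k):
    """Bits from the start of one sync pattern to the start of the next:
    sync pattern (n) + IV (B) + blackout (L x B) + CTR mode period (k)."""
    return n + B + L * B + k


non_pipelined = sync_cycle_bits(n=8, B=128, L=0, k=100)   # L = 0: no blackout
pipelined = sync_cycle_bits(n=8, B=128, L=10, k=100)      # hypothetical 10-stage pipeline
```

For the same k, each additional pipeline stage lengthens the cycle by B bits of blackout.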
5.2 Hardware Implementation Details 
Figure 5.2 illustrates the hardware implementation of pipelined SCFB mode using 
parallel transfer (where N represents the block transfer size). In the implementation 
of this chapter, we shall assume N = 8.
Figure 5.2: Hardware Implementation of Pipelined SCFB Using Parallel Transfer
Compared with the SCFB using parallel
transfer mode (N = 4 bits), the SCFB using a pipeline architecture has more complex
structures for the shift register, IV_Shift_Register, plaintext queue and ciphertext
queue. We also modified the system controller in order to control
the behaviour of the counter function, the pipelined AES and the two shift registers, which
are quite different from the previous SCFB mode implementations of Chapters 3 and
4. When a 128-bit keystream block is generated by the block cipher, it is loaded into
shift_register_1 or shift_register_2, depending on which is activated by the system
controller. The selected shift register then transfers the keystream out to be XORed
with the plaintext to generate the corresponding ciphertext. The ciphertext data is
transferred to both the ciphertext queue and the IV_Shift_Register in units of N = 8
bits. Since the sync pattern can be recognized anywhere in the ciphertext data while
the system is working in CTR mode, the last transfer block of the new IV might need
fewer than 8 bits, depending on where the sync pattern is recognized. In this case,
shift_register_1 or shift_register_2 moves out data which may contain 1 to 8 valid bits
in a keystream block. The same thing happens in the plaintext queue, ciphertext
queue, and the IV_Shift_Register. In the upcoming sections, we discuss the details
of the hardware design for pipelined SCFB using parallel transfer mode, focusing
on the shift register, plaintext queue, ciphertext queue and IV_Shift_Register. The
VHDL code of the SCFB system controller and the top-level RTL is shown in
Appendix A.
5.2.1 Implementation of Counter Mode (CTR) 
Linear feedback shift registers (LFSRs) [21] are widely used in many of the keystream
generators that have been proposed in the literature. Compared with other generators,
LFSRs are suitable for hardware implementation. They can produce sequences
of large period and good statistical properties. In our implementation based on AES,
we apply the whole 128-bit block as the incrementing function. Thus, the period of
the incrementing function should be n ≈ 2^{128}. We can select the primitive polynomial
C(D), which is used to construct the LFSR, by using Table 4.8 in [21]. This primitive
polynomial C(D) is shown in Eq.(5.1).
C(D) = 1 + D^{2} + D^{27} + D^{128}    (5.1)
The hardware implementation corresponding to Eq.(5.1) is illustrated in Figure 5.3.
Figure 5.3: Block Diagram of Linear Feedback Shift Register (LFSR)
In Figure 5.3, there are 128 stages numbered stage 0, stage 1, ..., stage 127,
each capable of storing one bit and having one input and one
output. The clock clk3 controls the movement of data; clk3 is also used to clock
the block cipher at a per-round rate. The output sequence from this polynomial C(D)
strictly has a period of 2^{128} - 1. During each unit of time, the following operations are
performed.
1. The contents of the stages are output to the input of the block cipher,
2. The content of stage i is moved to stage i - 1, where 1 ≤ i ≤ 127,
3. The new content of stage 127 is calculated by adding together, modulo 2, the previous
contents of the stages selected by Eq.(5.1).
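The three operations above can be sketched in software as follows. This is a minimal illustration, not the thesis's VHDL: the function names are hypothetical, and the mapping of a term D^e of C(D) to the tapped stage (stage length - e) is an assumption that follows the usual LFSR convention of [21]. To keep the example checkable, it runs a 4-stage LFSR with the well-known primitive polynomial 1 + D + D^4; the same step function would accept a 128-entry state and the exponents of Eq.(5.1).

```python
def lfsr_step(state, exponents):
    """Advance the LFSR by one time unit.

    state[i] is the bit held in stage i; exponents lists the non-zero powers
    of D in the connection polynomial C(D), whose degree equals len(state).
    """
    length = len(state)
    feedback = 0
    for e in exponents:
        # Assumed tap convention: the term D^e taps stage (length - e).
        feedback ^= state[length - e]
    # Stage i moves to stage i - 1; the feedback bit enters the top stage.
    return state[1:] + [feedback]


def period(initial, exponents):
    """Number of steps until the register returns to its initial state."""
    state, steps = lfsr_step(initial, exponents), 1
    while state != initial:
        state = lfsr_step(state, exponents)
        steps += 1
    return steps


# Small demonstration with C(D) = 1 + D + D^4.
print(period([0, 1, 1, 0], [1, 4]))
```

For a primitive polynomial of degree m, every non-zero starting state gives the maximal period 2^m - 1, which is the property the design relies on for the 128-bit counter function.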
Figure 5.4: Block Diagram of Ports Specification of the LFSR 
The port specification of the LFSR is illustrated in Figure 5.4. The "New_IV"
vector (128 bits) is provided by the IV_Shift_Register right after the sync pattern is
recognized. The "hold_on" signal is also set by the IV_Shift_Register; it indicates
whether the new IV is complete. The "CTR_Func_Enab" signal is set by the system
controller. When "CTR_Func_Enab" is high, the LFSR loads "Initial_CTR_Block"
as the initial counter block at the very beginning of operation. Then the LFSR performs the
increment operation. For every clk3 cycle, if the AES is not frozen, the LFSR
generates a block of "AES_Input_data", which acts as the input to the block cipher.
At any time, when "hold_on" is high, the LFSR loads "New_IV" as the initial
counter block and then continues with the increment operation.
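The load and increment behaviour described above can be summarised by a small behavioural model. This is only a sketch under stated assumptions: the class and method names are hypothetical, the priority of "hold_on" over the other controls is assumed rather than taken from the VHDL, and the increment is passed in as an arbitrary callable (in the design it would be one LFSR step).

```python
class CtrFunction:
    """Behavioural model of the counter block; call tick() once per clk3 cycle."""

    def __init__(self, initial_ctr_block, increment):
        self.initial_ctr_block = initial_ctr_block
        self.increment = increment   # stand-in for one LFSR step
        self.value = None            # nothing loaded yet

    def tick(self, ctr_func_enab, hold_on, new_iv, aes_frozen):
        if hold_on:
            # New IV complete: reload from the IV_Shift_Register (assumed priority).
            self.value = new_iv
        elif self.value is None:
            if ctr_func_enab:
                self.value = self.initial_ctr_block   # first load after start-up
        elif not aes_frozen:
            self.value = self.increment(self.value)   # normal counting
        return self.value                             # drives AES_Input_data


# Usage with a plain +1 increment as a stand-in for the LFSR.
ctr = CtrFunction(initial_ctr_block=5, increment=lambda v: v + 1)
trace = [
    ctr.tick(True, False, 0, False),    # loads Initial_CTR_Block
    ctr.tick(True, False, 0, False),    # increments
    ctr.tick(True, False, 0, True),     # AES frozen: value is held
    ctr.tick(True, True, 100, False),   # hold_on: loads New_IV
    ctr.tick(True, False, 0, False),    # increments from New_IV
]
print(trace)
```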
5.2.2 Advanced Encryption Standard (AES) 
In our implementation of pipelined SCFB using parallel transfer, the AES implementation
has all the round keys precomputed and stored in memory. This differs from
our implementations for the serial and parallel transfer modes in Chapters 3 and 4,
where we computed the round key on-the-fly in each round of the data processing.
This precomputation scheme adds no extra delay when supplying the subkeys, but it takes
more area in order to store all the subkeys. We cannot compute the keys on-the-fly in
each round for every encryption because each of the 11 round stages needs its round
key simultaneously in the pipelined architecture and the key scheduling hardware
can only generate one round key block per clk3 cycle. After all the subkeys have
been calculated and stored in the 11 individual 128-bit registers, the key scheduling
can provide the subkeys to each round stage in AES for the following encryptions.
For the S-box implementation, we still adopt the simple Boolean function.
Figure 5.5 shows the block diagram of the 11-stage pipelined AES with key scheduling.
Figure 5.5: 11-Stage Pipelined AES Using Key-Scheduling
We perform outer-round pipelining of the AES algorithm. That is, we need 11
128-bit registers, each of which is inserted right after a round operation. Therefore,
every round is performed in one clk3 cycle, and this pipelined implementation has 11
pipeline stages. The four transformations (Substitute Byte, Shift
Rows, Mix Columns and Add Round Key) of a round operation together form the
critical path in AES.
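The benefit of outer-round pipelining can be illustrated with a simple cycle count. The sketch below assumes an ideal pipeline that is never frozen and an iterative (non-pipelined) core that needs 11 clk3 cycles per block; both function names and the no-stall assumption are illustrative, not measurements from the design.

```python
def pipelined_cycles(blocks: int, stages: int = 11) -> int:
    """clk3 cycles until the last of `blocks` keystream blocks leaves the pipeline."""
    if blocks == 0:
        return 0
    # The first block needs `stages` cycles; each later block follows one cycle behind.
    return stages + blocks - 1


def iterative_cycles(blocks: int, rounds: int = 11) -> int:
    """clk3 cycles for a core that processes one block at a time (assumed 11 cycles each)."""
    return rounds * blocks


for m in (1, 16, 176):
    print(m, pipelined_cycles(m), iterative_cycles(m))
```

In steady state the pipeline delivers one 128-bit block per clk3 cycle, which is why the shift registers described later can become the bottleneck and the block cipher occasionally has to be frozen.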
Figure 5.6 illustrates the ports of the AES controller. The finite state machine
of the AES controller for the pipelined SCFB is shown in Figure 5.7.
Figure 5.6: Block Diagram of the AES Controller for Pipelined SCFB
The "Init_Data_Load" signal indicates, when it is high, that the initial input text data should be
loaded into AES. The "load_key_reg" signal triggers the corresponding register
in the key scheduling module to load the proper initial key/subkey into the
key register. The "round_const" signal is needed in the F function of key scheduling,
which we introduced in Chapter 2. Compared with the AES controller in
the serial transfer mode, we made the following modifications to the new AES controller of the
pipelined SCFB system using parallel transfer:
1. A new signal "reg_select(3:0)" is introduced, which selects among the 11
128-bit registers that store the subkeys for each round, either at
the initialization of key scheduling or when the user changes the initial key.
For example, when "reg_select" = "0010", only the second subkey register
can load the subkey resulting from the key scheduling, while the first
and the other subkey registers are held. When "reg_select" = "1011" and the
current state is "hold", all the subkey registers are complete because all 10
blocks of subkeys have already been stored in the corresponding subkey registers.
We do not need to regenerate the subkeys even when resynchronization
happens, unless the user wants to change the initial key. Also, note
that AES should be frozen while the shift registers are filled or
in the middle of transferring keystream out, as we have mentioned before. We
will further discuss this point in the system controller.
2. The system does not need the "hold_on" signal as input any more because,
when the new IV is ready, the AES does not need to be triggered to count 11
clk3 cycles in order to provide the "unhold_on" signal to the shift registers and
queuing system. Actually, there is no "unhold_on" signal in the pipelined SCFB
using parallel transfer mode because, after the new IV is ready, the system
stays in the blackout period, during which sync pattern recognition is ignored,
until the new keystream produced by the new IV is ready.
3. The signal "load_data_reg" is removed from the original AES controller because
the key scheduling does not need to share the register with the AES round
operation. The key scheduling now has its own registers to store the sub-round
keys.
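The register selection described in item 1 can be pictured with a small model of the subkey register bank. This is a hedged sketch: the class name is hypothetical, and the encoding (a "reg_select" value of k enabling the k-th register, with any other value selecting none) is an assumption inferred only from the "0010" example in the text.

```python
class SubkeyRegisterBank:
    """Eleven 128-bit subkey registers, one of which may load per clk3 cycle."""

    def __init__(self):
        self.regs = [None] * 11

    def clock(self, load_key_reg, reg_select, key_schedule_out):
        # Assumed encoding: reg_select = k (1..11) enables the k-th register;
        # all other registers hold their contents.
        if load_key_reg and 1 <= reg_select <= 11:
            self.regs[reg_select - 1] = key_schedule_out

    def complete(self):
        """True once every register holds a subkey."""
        return all(r is not None for r in self.regs)


bank = SubkeyRegisterBank()
bank.clock(load_key_reg=True, reg_select=0b0010, key_schedule_out=0xAB)
print(bank.regs[1] == 0xAB, bank.regs[0] is None)   # only the second register loaded
```

Because the registers are written only while "load_key_reg" is high, a resynchronization leaves the stored subkeys untouched, in line with the remark that they are regenerated only when the initial key changes.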
In Figure 5.7, if "reset" is high in any state, the next state will be Init
immediately (i.e., asynchronous reset). From state Round0 to Round9, the outputs are
the same except for "round_const" and "reg_select", which vary from state to state.
The output "key_reg_mux_sel" is high so that
the round key is generated by the Key Scheduling block. The output "load_key_reg" is also high in
these ten states in order to load the round keys into the corresponding registers. When
the state is Load Input, "key_reg_mux_sel" is low, which indicates that the multiplexer
in the Key Scheduling block selects the initial key for the first round. If the current
state is Round10, "load_key_reg" is set low, which indicates that all the sub-round keys
have already been calculated and stored in the 11 corresponding registers. When the
current state is hold, "load_key_reg" remains low because there are no new
round keys to be processed. All these 13 states will be experienced again only when
the initial key is changed by the user because we apply the key scheme where all the
round keys are precomputed and stored in memory.
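The varying "round_const" output corresponds to the round constants of the AES key schedule. As an illustration, the sketch below generates the ten standard AES-128 round constants by repeated doubling in GF(2^8); the function name is hypothetical, and the assumption that the constants are issued in this order over states Round0 to Round9 is ours, not confirmed by the figure.

```python
def aes_round_constants(count: int = 10):
    """Return the first `count` AES round constants as integers."""
    constants = []
    rc = 0x01
    for _ in range(count):
        constants.append(rc)
        rc <<= 1                 # multiply by x in GF(2^8)
        if rc & 0x100:
            rc ^= 0x11B          # reduce modulo x^8 + x^4 + x^3 + x + 1
    return constants


print([format(rc, "08b") for rc in aes_round_constants()])
```

The first and sixth values printed are "00000001" and "00100000", which are 8-bit patterns of the kind carried on "round_const(7:0)".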
Figure 5.7: FSM of AES Controller for Pipelined SCFB
5.2.3 System Controller
The system controller takes control of the whole SCFB system. Compared
with the controllers of the SCFB based on the serial transfer mode and the non-pipelined
parallel transfer mode, the controller in the pipelined SCFB using parallel
transfer is much more complicated. The port specification of the system controller
is shown in Figure 5.8. On the input side, the port specification is as follows.
1. The signal "clk1" is the base system clock in the implementation. The signal
"clk3" is used to control the running speed of the block cipher.
2. "Cipher_Done1" and "Cipher_Done2" indicate the completion of the first and
second keystream blocks from the block cipher at the very beginning.
3. "SR1_Fini" and "SR2_Fini" indicate whether shift_register_1 or shift_register_2
has finished transferring out its keystream.
4. "SR1_Special_Case" and "SR2_Special_Case" indicate that shift_register_1
or shift_register_2 will be stalled for two clk3 cycles when certain special cases
happen, which will be discussed later in Section 5.2.5.
5. "Blackout_Period(7:0)" indicates the number of bits left in the blackout mode,
which has been discussed earlier in Figure 5.1.
Figure 5.8: Port Specification of System Controller for Pipelined SCFB 
On the output side, the port specification is as follows.
1. "CTR_Func_Enab" is needed to trigger the LFSR to load "Initial_CTR_Block"
as the initial counter block at the very beginning of operation. Then the LFSR
performs the increment operation.
2. "SR1_Load" and "SR2_Load" are used to trigger shift_register_1 or shift_register_2,
respectively, to load the 128-bit keystream block.
3. "Flag_SR1" and "Flag_SR2" indicate whether shift_register_1 or shift_register_2
is in the middle of transferring keystream data.
4. "AES_Frozen" is used to stall the block cipher while the shift registers are filled
or in the middle of transferring keystream. We have to freeze the block cipher
at times because the period during which a block of keystream (128 bits) is
XORed with plaintext bits is longer than the period during which a block of keystream
is generated in the block cipher. We already mentioned this point earlier in this
chapter. If "AES_Frozen" is low, the block cipher performs the encryption.
5. "Queue_Stall" is used to stall the shift registers for one clk3 cycle in order
to allow the block cipher to provide one block of keystream (128 bits) to
shift_register_1 or shift_register_2.
Note: "AES_Frozen" = '1' is synchronized to the clk3 domain and indicates that the block cipher is frozen.
"AES_Frozen" = '0' is synchronized to the clk3 domain, allows the block cipher to do the encryption
for one round, and goes high again in the next clk3 cycle.
Figure 5.9: Finite State Machine of SCFB System Controller for Pipelined SCFB
The finite state machine of the system controller is shown in Figure 5.9. At
any time, if "Reset" is high, the system will be in the On_Rst state. The system
stays in the Gen_Key state until "Cipher_Done1" is high (i.e., the first block of
keystream (128 bits) is ready in the last pipeline register of the block cipher). Then
the system goes to the SR1_LoadKey state, where "SR1_Load" is set high to
trigger shift_register_1 to load the first block of keystream (128 bits) from the
block cipher. When "Cipher_Done2" is high (i.e., the second block of keystream
(128 bits) is ready in the last pipeline register of the block cipher), the system
switches to the SR2_LoadKey state on the next rising edge of clk1. Also, the output signal
"AES_Frozen" is set high to stall the block cipher for one clk3 cycle because the
two shift registers are both occupied. Then the system moves to the Wait_Init
state and stays there until shift_register_1 has finished its data transfer or resynchronization
happens.
State SR1_Load_Norm indicates that shift_register_1 has finished its data
shifting and will load a new block of keystream (128 bits). It should be noted
that if the signal "AES_Frozen" is low, it should go high on the rising edge of the
next clk3. Besides, the signal "AES_Frozen" is synchronized to the clk3 domain. So, in
the state SR1_Load_Norm, the signal "AES_Frozen" is set low to allow the block
cipher to do the encryption for one block of keystream. State Wait1_Norm indicates
that shift_register_2 is in the middle of shifting keystream data and shift_register_1 is
held. State SR2_Load_Norm indicates that shift_register_2 has finished its data
shifting and will load a new block of keystream (128 bits). State Wait2_Norm indicates
that shift_register_1 is in the middle of shifting keystream data and shift_register_2
is held.
States Queue_Stalled3 and Queue_Stalled4 (which will be explained in the
section on the shift registers) represent the situation where the next block of keystream
(128 bits) is not ready when either shift_register_1 or shift_register_2 runs out of data;
both shift registers and the plaintext queue are then held. When signal "AES_Frozen" is high, the
system state will transfer to Resync1 or Resync2, where a new block of keystream
is loaded into the shift register that was empty in the previous state. Queue_Stalled1
and Queue_Stalled2 (which will be explained in the section on the shift registers) represent
two special cases when resynchronization happens. If the system is in either
of these two states, both shift_register_1 and shift_register_2 will be held. When
signal "AES_Frozen" is high, the system state will transfer to SR1_Load_Norm or
SR2_Load_Norm, where a new block of keystream is loaded into the empty shift register
and another block of keystream is stored in the last pipeline register of the block
cipher. State Resync1_Contd or Resync2_Contd is an intermediate state after
state Resync1 or Resync2 when "SR1_Special_Case" or "SR2_Special_Case" is
low. The reason we add the state Resync1_Contd or Resync2_Contd is to set
"SR1_Load" or "SR2_Load" low and to wait until shift_register_2 or shift_register_1
finishes its data transfer (i.e., "SR2_Fini" or "SR1_Fini" is high).
5.2.4 IV Shift Register for Parallel Transfer Mode 
The IV_Shift_Register block diagram is shown in Figure 5.10. In the IV_Shift_Register
design for the pipelined SCFB using parallel transfer, both the "Unhold_on" and
"Chose_New_IV" signals become internal signals, in contrast to our designs for
the non-pipelined SCFB. The reason is that the shift registers, plaintext queue and
ciphertext queue are no longer held when "New_IV_Done" is high, so these
modules do not need "Unhold_on". In this pipelined SCFB design,
the "Unhold_on" signal is only useful inside the IV_Shift_Register. The same
applies to the "Chose_New_IV" signal because the CTR function takes care of
the new IV selection instead of the MUX module which we applied in the non-pipelined
SCFB designs. The IV_Shift_Register keeps checking for the 8-bit sync pattern
all the time, except during the blackout period which has been mentioned in Figure 5.1.
When the 128-bit new IV is ready, the IV_Shift_Register provides this new IV to the
CTR function module and, at the same time, sets the signal "New_IV_Done"
high to trigger the CTR function to load this new IV as its new initial value.
Figure 5.10: IV Shift Register for Pipelined SCFB Using Parallel Transfer (N = 8)
Figure 5.11: Sync Pattern Recognition for Pipelined SCFB Using Parallel Transfer (N = 8)
Unlike the non-pipelined SCFB designs, the pipelined SCFB design with a block
transfer size of 8 bits has up to 8 valid bits of ciphertext
coming into the IV_Shift_Register in every clk1 cycle. After the sync pattern is recognized,
the 128-bit new IV is collected in the IV_Shift_Register and, simultaneously, the system
is in the blackout period (shown in Figure 5.1). When the new IV is ready, it is
transferred to the CTR function as the new initial value. The number of bits
in the transfer block right before the new boundary (i.e., the last transfer block of
the blackout period) and the number of bits in the first transfer block of the new
keystream produced by the new IV depend on where the sync pattern is recognized.
Figure 5.11 describes the process of sync pattern recognition. In Figure 5.11, for every
clk1 cycle there are at most 8 comparisons occurring in the IV_Shift_Register in order to
check for the 8-bit sync pattern. The 1st moment shows that the first two transfer blocks
of ciphertext data (i.e., {IV_0(0) ... IV_0(7)} and {IV_1(0) ... IV_1(7)}) have already been
loaded into the first 16 bit positions of the IV_Shift_Register using 2 clk1 cycles,
where clk1 clocks the transfer of data into the IV_Shift_Register. The 2nd
moment shows that at most 8 comparisons are completed in every clk1 cycle. For
example, if the sync pattern is recognized in the 6th comparison, the IV_Shift_Register
will begin to collect the 128-bit new IV, and the first bit of the new IV will be IV_2(1).
In this case, the number of bits in the new keystream's first transfer block is set to
2, and the number of bits in the transfer block right before the new boundary (i.e.,
the last transfer block in the blackout period) should be 8 - 2 = 6. These two parts,
2 bits and 6 bits, will be combined to form an 8-bit transfer block of keystream and
then XORed with a plaintext transfer block to produce a ciphertext transfer block.
That is, the input transfer block to both the ciphertext queue and the IV_Shift_Register
always contains 8 valid bits. In contrast to the pattern recognition in non-pipelined
SCFB, there are always 8 bits of data coming into the
IV_Shift_Register in every clk1 cycle.
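The idea of performing up to 8 comparisons per clk1 cycle can be sketched as a sliding-window search over the 16 most recent ciphertext bits. This is an illustrative model only: the function name is hypothetical, and the numbering of the comparisons (comparison c tests the 8-bit window that ends at the c-th bit of the newly arrived transfer block) is an assumption, since the exact alignment is defined by Figure 5.11.

```python
def find_sync(prev_block, new_block, pattern):
    """Return the comparison index (1..8) at which `pattern` is found, else None.

    prev_block and new_block are the two most recent 8-bit transfer blocks,
    given as lists of bits in arrival order.
    """
    window = prev_block + new_block          # 16 most recent ciphertext bits
    for c in range(1, 9):                    # up to 8 comparisons per clk1 cycle
        if window[c:c + 8] == pattern:       # window ending at bit c of new_block
            return c
    return None


pattern = [1, 0, 0, 0, 0, 0, 0, 1]           # hypothetical 8-bit sync pattern
print(find_sync([0] * 8, pattern, pattern))                                   # aligned with the block
print(find_sync([0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0], pattern))  # straddles two blocks
```

In hardware the eight window comparisons would be evaluated in parallel within one clk1 cycle; the loop here only mirrors their ordering.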
[Figure 5.12: rows (I) to (IV); the first new ciphertext block produced by "New_IV" contains 8, 1, 2 and 7 bits, respectively.]
In order to recognize the sync pattern, the IV_Shift_Register has to do up to 8
comparisons in every clk1 cycle, except during the blackout period. In order to deal
with the number of bits in the new keystream's first transfer block and the number
of bits in the transfer block right before the new boundary (i.e., the last transfer
block in the blackout period), an internal signal, "Sync_Ref_Vector", is introduced into
the design. Table 5.1 illustrates how the pattern recognition is related to the signal
"Sync_Ref_Vector" and the other two parameters which we just mentioned. In this
table, the first column indicates the comparison in which the sync pattern
is recognized in the IV_Shift_Register. The second column, "Sync_Ref_Vector", is an
internal signal depending on the "Xth comparison". The third column, the number of
bits in the new keystream's 1st transfer block, indicates how many bits the first transfer
block of the new keystream should contain after the new boundary, depending on
the first column (i.e., where the sync pattern is recognized). The fourth column
represents the number of bits in the transfer block which is right before the new
boundary (i.e., the last transfer block of the blackout period), depending on the first
column. The value in the fourth column is equal to 8 minus the value
in the third column, except for the case where the sync pattern is recognized in the 8th
comparison in the IV_Shift_Register. This is because the bits in the new keystream's 1st
transfer block and the bits in the transfer block right before the new boundary are combined
to form an 8-bit transfer block of keystream, which is then XORed with a plaintext transfer
block to produce a ciphertext transfer block. The exception is the 8th comparison, where
both the new keystream's 1st transfer block and the transfer block right before the new
boundary contain 8 bits.
For a block transfer size of 8 bits in the pipelined SCFB, after the sync
pattern is recognized, the IV_Shift_Register spends 16 clk1 cycles collecting the 128-bit
new IV. When the new IV is ready, the IV_Shift_Register provides this new IV to the
CTR function to re-initialize the LFSR. Then the system stays in the blackout period.
Table 5.1: Boundary Positions Where the Sync Pattern is Recognized
Xth          Sync_Ref_Vector   # bits in new keystream's   # bits in the transfer block
comparison                     1st transfer block          right before the new boundary
8th          7                 8                           8
7th          8                 1                           7
6th          9                 2                           6
5th          10                3                           5
4th          11                4                           4
3rd          12                5                           3
2nd          13                6                           2
1st          14                7                           1
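The regularity in Table 5.1 can be captured in a few lines. The sketch below reproduces the table from the comparison index x; the function name is hypothetical, and the closed forms (Sync_Ref_Vector = 15 - x, and so on) are simply read off the eight rows rather than taken from the VHDL.

```python
def boundary_split(x: int):
    """Return (Sync_Ref_Vector, bits in 1st new block, bits before the boundary)."""
    if not 1 <= x <= 8:
        raise ValueError("comparison index must be between 1 and 8")
    sync_ref_vector = 15 - x
    first_new_bits = 8 if x == 8 else 8 - x     # the 8th comparison is the exception
    bits_before_boundary = x
    return sync_ref_vector, first_new_bits, bits_before_boundary


for x in range(8, 0, -1):
    print(x, boundary_split(x))
```

For x from 1 to 7 the last two values sum to 8, reflecting that both parts are merged into one 8-bit keystream transfer block.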
Figure 5.12 illustrates how to deal with the new boundary of the new keystream.
There are five rows of data flow in Figure 5.12. In this figure, {IV_0(0) ... IV_0(7)}
and {IV_1(0) ... IV_1(7)} represent the first and the second transfer blocks of ciphertext
going into the IV_Shift_Register. Rows I to IV illustrate
examples of where the new IV should start and how many bits the first transfer
block of the ciphertext produced by the new keystream should provide, depending
on where the sync pattern is recognized in the IV_Shift_Register; each of the four rows
shows a different situation for the sync pattern recognition.
In the first row of Figure 5.12, where T represents the blackout period after the
sync pattern is recognized, each "Ciphertext Block" indicates a 128-bit block of ciphertext
produced by a 128-bit block of keystream generated by the block cipher.
The data in the arrow diagram indicates the first transfer block of the new ciphertext
produced by the new keystream. The data in dark colour represents the new IV (row
a) or the first transfer block of the new IV (rows I to IV). The "New Boundary" indicates
where the blackout period ends and the new ciphertext produced by the new
keystream starts. Row I indicates that if the sync pattern is recognized in the 8th
comparison (shown in Figure 5.11), the first 8 bits of the new IV should be the transfer
block of ciphertext {IV_2(0) ... IV_2(7)}, which is not shown in row I. The left side of
Row I illustrates that signal "Sync_Ref_Vector" is equal to 7 (i.e., the sync pattern is
recognized in the 8th comparison) and that the first transfer block of new
ciphertext produced by the new IV should contain 8 bits (i.e., {IV_177(0) ... IV_177(7)}).
The index of this first transfer block of new ciphertext is 177 because the blackout
period is 11 x 16 = 176 transfer blocks, which corresponds to 11 pipeline stages of
ciphertext blocks (each ciphertext block contains 128 bits, equal to 16 transfer
blocks) in the blackout period before the ciphertext block produced by the new IV appears.
Row II indicates that if the sync pattern is recognized in the 7th comparison (shown in
Figure 5.11), the new IV should start with IV_1(0). The left side of Row II illustrates
that signal "Sync_Ref_Vector" is equal to 8 (i.e., the sync pattern is recognized in the
7th comparison) and that the first transfer block of new ciphertext produced
by the new IV should contain only 1 bit (i.e., {IV_176(0)}). Row III indicates that
if the sync pattern is recognized in the 6th comparison (shown in Figure 5.11), the
new IV should start with {IV_1(0), IV_1(1)}. The left side of Row III illustrates that
signal "Sync_Ref_Vector" is equal to 9 (i.e., the sync pattern is recognized in the 6th
comparison) and that the first transfer block of new ciphertext produced
by the new IV should contain only 2 bits (i.e., {IV_176(0), IV_176(1)}). Row IV indicates that
if the sync pattern is recognized in the 1st comparison (shown in Figure 5.11), the new
IV should start with {IV_1(0) ... IV_1(6)}. The left side of Row IV illustrates that
signal "Sync_Ref_Vector" is equal to 14 (i.e., the sync pattern is recognized in the 1st
comparison) and that the first transfer block of new ciphertext produced by
the new IV should contain only 7 bits (i.e., {IV_176(0) ... IV_176(6)}).
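The transfer-block indices quoted above follow from simple arithmetic, sketched below. The function name is hypothetical; the index 177 for the 8th comparison and 176 for the 7th, 6th and 1st comparisons come from the text, while applying 176 to the remaining comparisons (2nd to 5th) is an extrapolation on our part.

```python
def first_new_block_index(x: int, stages: int = 11, block_bits: int = 128, n: int = 8) -> int:
    """Index of the first transfer block carrying ciphertext produced by the new IV."""
    blackout_blocks = stages * (block_bits // n)   # 11 x 16 = 176 transfer blocks
    # When the pattern is found in the 8th comparison, the new keystream starts in
    # its own transfer block; otherwise it shares block 176 with the blackout tail.
    return blackout_blocks + 1 if x == 8 else blackout_blocks


print(first_new_block_index(8), first_new_block_index(7), first_new_block_index(1))
```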
5.2.5 Shift Registers 
Figure 5.13: Block Diagram of Shift Registers for Pipelined SCFB Using Parallel Transfer (N = 8)
Compared to the shift register in the serial/parallel transfer mode, the shift register
in pipelined SCFB has a more complex hardware implementation. As we have
mentioned, the block cipher produces keystream blocks faster than one keystream
block (i.e., 128 bits) can be XORed with plaintext. If we still used one shift register, this
would result in a large delay because the keystream block which is ready in the last
stage of AES cannot be transferred to the shift register until it is empty. To resolve
this, we have decided to apply two 128-bit shift registers. Figure 5.13 illustrates the
block diagram of the shift registers for the pipelined SCFB. The "Syn_Ref(4:0)" signal
is provided by the IV_Shift_Register. It is needed to identify the number of valid
bits in the next transfer block. The "Blackout_Period(7:0)" signal is also provided by the
IV_Shift_Register. It indicates how many bits are left in the blackout mode, which has
been discussed earlier in Figure 5.1. On the input side, the "KeyStream_In(127:0)" vector
is one block of the keystream produced by the block cipher. The "SR1_Load"
and "SR2_Load" signals indicate whether shift_register_1 or shift_register_2 should
load the 128-bit keystream block. These two signals are provided by the system
controller, and they do not go high simultaneously. The "Queue_Stall" signal is
triggered by the system controller. When "Queue_Stall" is high, the shift registers
are stalled for one clk3 cycle in order to allow the block cipher to provide one block
of keystream (128 bits). The "SR1_Flag" and "SR2_Flag" signals represent whether
shift_register_1 or shift_register_2 is in the middle of transferring keystream data out to
be XORed with plaintext from the plaintext queue. On the output side, "SR1_Fini"
and "SR2_Fini", which cannot go high simultaneously, indicate whether shift_register_1
or shift_register_2 has finished transferring out its keystream. These two signals go to the
system controller. The "SR1_Special_Case" or "SR2_Special_Case" signal indicates
that shift_register_1 or shift_register_2 will be stalled for two clk3 cycles when
a special case happens on the boundary of the new keystream which is produced
by the new IV. The "KeyStream_Out(7:0)" signal is the output of the shift registers.
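The role of the two shift registers can be illustrated with a simplified software model. This is a sketch under several assumptions: the class and method names are hypothetical, the registers are modelled as bit lists drained alternately, and the special cases, flags and clock-domain details of the real design are ignored. It only shows how a partly filled transfer block from one register is completed with bits from the other, and when a stall is needed.

```python
class DualShiftRegister:
    """Two keystream registers drained alternately, N bits per clk1 cycle."""

    def __init__(self, n: int = 8):
        self.n = n
        self.regs = [[], []]   # pending bits of shift_register_1 and shift_register_2
        self.active = 0        # register currently being drained

    def load(self, which: int, keystream_bits):
        """Model SR1_Load / SR2_Load: only an empty register may be loaded."""
        if self.regs[which]:
            raise RuntimeError("register still holds keystream")
        self.regs[which] = list(keystream_bits)

    def shift_out(self):
        """Return N keystream bits, or None when the queues would have to stall."""
        if len(self.regs[0]) + len(self.regs[1]) < self.n:
            return None                       # next keystream block is not ready
        out = []
        while len(out) < self.n:
            if not self.regs[self.active]:
                self.active ^= 1              # current register finished: switch over
            out.append(self.regs[self.active].pop(0))
        return out


sr = DualShiftRegister()
sr.load(0, [1] * 5)        # 5 valid bits left before the new boundary
sr.load(1, [0] * 128)      # new keystream block waiting in the other register
print(sr.shift_out())      # 5 old bits followed by 3 new bits
```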
Figure 5.14: Data Flow of Shift Registers for Pipelined SCFB Using Parallel Transfer
(N=8)
Figure 5.14 shows how the shift registers handle the boundary of the
new keystream produced by the new IV, which we have mentioned before. Block
(a-1) in Figure 5.14 shows how the two shift registers deal with the new boundary
(shown in Figure 5.12) of the new keystream produced by the new IV when
resynchronization happens. When "Blackout_Period" = 1 and "Sync_Ref_Vector" = 10,
the new keystream should provide the first block, which contains 3 bits of data, to
combine with 5 bits in the last transfer block of blackout, which is located right before
the new boundary, when the next clk1 event happens (the new boundary and the signals
"Blackout_Period" and "Sync_Ref_Vector" have been explained in Section 5.2.4). All
the data in the diagonal line area is ignored because the blackout period has ended. If
the next block of keystream is not ready after the last 5 bits of data have been
transferred out of shift_register_1, both shift registers and the plaintext queue will be held
until shift_register_1 successfully loads a new block of keystream. In this case, the
system will be in state Queue_Stalled4 (or Queue_Stalled3 in the reverse situation),
which has been shown in Figure 5.9. Block (a-2) shows that after 15 clk1 cycles
the last transfer block of the new keystream in block (a-1) contains only 5 bits. In
the next clk1 cycle, this 5-bit transfer block will be combined with the first 3 bits in
shift_register_1 to fill up the 8-bit register.
Blocks (b-1) and (b-2) represent a special case in determining the new
boundary of the new keystream. In block (b-1), when "Blackout_Period" = 1 and
"Sync_Ref_Vector" = 6, the new keystream should provide the first block, which
contains only 2 bits of data, to combine with 6 bits in the last transfer block of blackout,
which are located right before the new boundary. These 6 bits are composed of two
parts: one is the last 5 bits in shift_register_2 (as shown in block (b-1)), the other is
the first bit in shift_register_1 (also shown in block (b-1)). After the new keystream
(128 bits) produced by the new IV is loaded into shift_register_2 at the next clk1 event
(shown in block (b-2)), the first 2 bits of this new keystream will be combined with
the previous 6 bits, and the remaining 126 bits of the new keystream are stored in
shift_register_2. Then, neither shift_register_1 nor shift_register_2 will transfer any
data out (represented by states Queue_Stalled1 and Queue_Stalled2 of the system
controller) until two blocks of keystream are ready in the block cipher, of which
one block is transferred into shift_register_1 and the other block
is stored in the last pipeline stage of the block cipher.
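The bit counts quoted for blocks (a-1)/(a-2) and (b-1)/(b-2) can be checked with a short sketch. The function name and its parameters are illustrative assumptions, not signals of the design. The sketch only counts bits and does not model the stall states or the clocks.

```python
def split_keystream_block(head_bits, block_bits=128, transfer_bits=8):
    """Count how a fresh keystream block is consumed in N-bit transfers.

    head_bits: bits taken from the front of the new block to complete the
               partially filled transfer block at the new boundary.
    Returns (full_transfers, tail_bits), where tail_bits is the size of the
    final, partial transfer block left in the shift register.
    """
    if not 0 <= head_bits < transfer_bits:
        raise ValueError("head_bits must be in [0, transfer_bits)")
    remaining = block_bits - head_bits
    return divmod(remaining, transfer_bits)


# Block (a-1)/(a-2): 3 bits complete the boundary transfer block.
print(split_keystream_block(3))  # (15, 5)
# Block (b-1)/(b-2): 2 bits complete the boundary transfer block.
print(split_keystream_block(2))  # (15, 6)
```

For the (a) case this reproduces the 15 full clk1 transfers followed by a 5-bit transfer block, and for the (b) case the 126 remaining bits split into 15 full transfers and a 6-bit remainder.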
5.2.6 Plaintext Queue and Ciphertext Queue 
The structure of the plaintext queue and ciphertext queue for pipelined SCFB mode
using parallel transfer is simpler than that for the non-pipelined SCFB mode because
of the following factors:
1. The queuing system does not have to be stalled when resynchronization
happens. This feature simplifies the hardware design.
2. The block transfer size of the queuing system is fixed at all times, even for
the last transfer block of the new keystream of a new IV. The queuing system in
the non-pipelined SCFB mode based on the parallel transfer mode has to handle
varying block transfer sizes, which complicates the hardware
implementation.
Although the structure is simple, this does not imply that the area complexity is small.
The first-in first-out (FIFO) module in the queuing system has to
be mapped to an 8-bit register per transfer block, whereas a 4-bit or 2-bit register per unit
is used for the non-pipelined SCFB mode based on the parallel transfer mode. This
leads to a larger area complexity in the synthesis results.
Figure 5.15: Plaintext Queue for Pipelined SCFB Mode Based on Parallel Transfer
(N=8)
Figure 5.15 illustrates the structure of the plaintext queue in pipelined SCFB
mode based on parallel transfer mode for a block transfer size of 8 bits.
The input signals, "IDATA", "IVALID", "RST", "clk2" and "clk1", are connected
to the external ports of the system. The signal "clk1" is the base system clock
used for the transfer of data out of the plaintext queue and into the ciphertext queue. The
signal "clk2" is needed to clock the transfer of data into and out of the system. The
"IDATA" signal represents the plaintext data, which is loaded into the input
pipeline composed of several 8-bit registers. The plaintext data is then
stored in the proper positions in the FIFO and read out of the FIFO when the control
signals, "wport_meb", "wp_enab", "renab" and "rport_meb", are asserted properly.
The "SR_Valid" signal, which comes from the shift registers, is used to synchronize
the output data from the shift register and the plaintext queue. In Figure 5.15, a
write finite state machine controls the behavior of the write part of the
plaintext queue. The block Writer Pointer provides the writing address to the FIFO.
A read finite state machine controls the behavior of the read part of the
plaintext queue. The block Reader Pointer provides the reading address to the FIFO.
Figure 5.16 illustrates the structure of the ciphertext queue in pipelined SCFB
mode based on parallel transfer mode for a block transfer size of 8 bits.
The output signals, "ODATA", "OVALID" and "CQ_Full", are connected to the
external output ports of the system. The "IDATA" signal represents the ciphertext
data, which is loaded into the input pipeline composed of several 8-bit
registers. The ciphertext data is then stored in the proper positions in the FIFO
and read out of the FIFO when the control signals, "wport_meb", "wp_enab", "renab"
and "rport_meb", are asserted properly. The "IVALID" signal, which comes from the
plaintext queue, indicates whether the input data is valid. In Figure 5.16,
a write finite state machine controls the behavior of the write part of the
ciphertext queue. The Writer Pointer provides the writing address to the FIFO. The
FIFO is actually a 2-port RAM which is used to store and read the data through the write
port and read port, respectively. A read finite state machine controls the
behavior of the system on the read side of the ciphertext queue. The block Reader
Pointer provides the reading address to the FIFO.
Figure 5.16: Ciphertext Queue for Pipelined SCFB Mode Based on Parallel Transfer
(N=8)
5.3 Synthesis Results, Analysis and Comments on 
the Design 
We performed an ASIC synthesis with 0.18 micron CMOS standard cell technology from
TSMC (Taiwan Semiconductor Manufacturing Company) using Synopsys 2002 tools
supported by the Canadian Microelectronics Corporation (CMC). The synthesis results
of the block cipher, plaintext queue and ciphertext queue for the pipelined SCFB
mode based on the parallel transfer mode (block transfer size equal to 8 bits) are
shown in Table 5.2.
Table 5.2: Synthesis Result Using 0.18 Micron CMOS (Block Transfer Size = 8 bits)
Component             Total Area (# gates)
Plaintext Queue       14128
Ciphertext Queue      11935
Shift_Register        7883
AES                   150674
IV_Shift_Register     4974
SCFB System           189963
We adopt the same clock design as in Chapter 4, that is, the clk1 frequency
is set to twice the clk2 frequency. Although the queuing system is
not held due to resynchronization in this pipelined SCFB mode, the plaintext
queue and the ciphertext queue may still be held when the shift registers deal with
the new boundary of the new keystream, as mentioned earlier in this
chapter. For simplicity of design, we set clk1 to be two times faster than clk2, which
ensures that the plaintext queue does not back up for these reasons. We ran
functional simulations for queue sizes varying from 64 x 8 to 128 x 8 bits. In the
simulation, clk1 is the fastest clock and serves as the base system clock. The signal
clk2 clocks the transfer of data into the plaintext queue and clk3 is the
per-round rate for the block cipher. The sync pattern is 1 followed by
seven 0s. From the simulations, a queue size of 128 x 8
bits was found to produce no queue overflow for a block transfer size of
8 bits. Because the total number of bits in the plaintext queue and ciphertext queue is
fixed, underflow may happen in the ciphertext queue when overflow happens
in the plaintext queue. In our system with a queue size of 128 x 8 bits, overflow
never happens in the ciphertext queue, because of the complementary relationship of the
number of bits in the queues. When underflow happens frequently in the plaintext
queue, the plaintext queue spends 2 clk1 cycles to send out one block of plaintext data (8
bits). Thus, the actual rate of the incoming data of the ciphertext queue is equal to
the rate of clk2. This results in a balance between the rates of the incoming and
outgoing data in the ciphertext queue, which leads to no overflow in the ciphertext queue.
In the functional simulation, underflow actually happens all the time in the plaintext
queue when the system is working in a stable state.
According to the timing delay from the synthesis results, clk3 (i.e., the clock period
of the block cipher) is equal to 24 ns, clk2 (i.e., the clock period of the transfer of data
into and out of the system) is equal to 24 ns and clk1 (i.e., the basic system clock
period) is equal to 12 ns. These clocks are slower than those of the SCFB mode based
on the parallel transfer mode presented in the last chapter. Although the
hardware implementation of the output pipeline in the plaintext queue and the input
pipeline in the ciphertext queue has become much simpler than for the serial transfer,
the structures of the shift registers and the IV_Shift_Register have become more complex
than for the serial transfer, which leads to a slow clk1. The ideal throughput of the block
cipher is 128 bits / 24 ns ≈ 5.333 Gbits/s. On the other hand, the input throughput of the
plaintext queue is N / 24 ns ≈ 333 Mbits/s for N = 8 bits. Thus, the throughput of
the pipelined SCFB using parallel transfer of 8 bits can reach 333 Mbps. The efficiency
of the system is 333/5333 ≈ 6.24%.
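The throughput and efficiency figures above follow directly from the clock periods. The short sketch below recomputes them; the function and constant names are illustrative assumptions, and only the 24 ns periods, B = 128 and N = 8 come from the text.

```python
def throughput_bps(bits_per_cycle, period_ns):
    """Bits per second when bits_per_cycle bits move once per clock period."""
    return bits_per_cycle / (period_ns * 1e-9)


CLK_PERIOD_NS = 24.0  # clk2 and clk3 period reported from synthesis
BLOCK_BITS = 128      # AES block size B
TRANSFER_BITS = 8     # block transfer size N

cipher_bps = throughput_bps(BLOCK_BITS, CLK_PERIOD_NS)
queue_bps = throughput_bps(TRANSFER_BITS, CLK_PERIOD_NS)
efficiency = queue_bps / cipher_bps

print(f"block cipher: {cipher_bps / 1e9:.3f} Gbit/s")
print(f"queue input:  {queue_bps / 1e6:.1f} Mbit/s")
print(f"efficiency:   {efficiency:.2%}")
```

Because clk2 and clk3 have the same period here, the efficiency reduces to N/B = 8/128 = 6.25%. The 6.24% quoted in the text comes from dividing the rounded values 333 and 5333.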
Although the throughput of the block cipher can be enhanced by using a pipelined
architecture, the throughput of the queuing system can only reach 333 Mbps, which
becomes the bottleneck of the system efficiency and throughput. The throughput of
the queuing system can be improved by increasing the block transfer size (e.g., 16 bits
or 32 bits or more). However, the hardware complexity of the queuing system
increases with the block transfer size. The plaintext queue or ciphertext
queue includes a write state machine, read state machine, write counter, read counter
and a FIFO. Only the area of the FIFO increases dramatically when the block
transfer size N increases. For example, for the pipelined SCFB mode based on
parallel transfer (N=8 bits), the FIFO in the queuing system is composed of
128 memory units, each of which is an 8-bit register. If we increase N
to B (B=128), the hardware complexity of the FIFO increases by at least 16
times. From the synthesis results, the area of the pipelined SCFB is around 7 times
larger than that of the serial transfer mode, but the throughput is only 1.5 times larger.
From Table 5.2, the area of AES accounts for about 80% of the cost of SCFB.
Thus, we conjecture that increasing the block transfer size to N=128 would
result in a throughput of up to 5 Gbps with a modest increase in the area of the pipelined
SCFB, because the hardware complexity of AES does not increase when N increases. The
incremental portion mainly comes from the larger FIFOs in the plaintext queue and
the ciphertext queue.
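The area shares behind the "about 80%" statement can be recomputed from Table 5.2. The dictionary layout and the helper name are illustrative assumptions; the gate counts are copied from the table.

```python
# Gate counts copied from Table 5.2 (0.18 micron CMOS, N = 8).
AREA_GATES = {
    "Plaintext Queue": 14128,
    "Ciphertext Queue": 11935,
    "Shift_Register": 7883,
    "AES": 150674,
    "IV_Shift_Register": 4974,
}
TOTAL_GATES = 189963  # "SCFB System" row of Table 5.2


def area_share(component):
    """Fraction of the total SCFB system area used by one component."""
    return AREA_GATES[component] / TOTAL_GATES


for name in AREA_GATES:
    print(f"{name:18s} {area_share(name):6.1%}")

queues = AREA_GATES["Plaintext Queue"] + AREA_GATES["Ciphertext Queue"]
print(f"{'both queues':18s} {queues / TOTAL_GATES:6.1%}")
```

With these numbers AES takes roughly 79% of the system area and the two queues together roughly 14%, which is the part that would grow with a larger block transfer size.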
5.4 Conclusion 
This chapter investigates the hardware structure of the pipelined SCFB mode based
on parallel transfer mode. Compared with the non-pipelined SCFB system based on
parallel transfer mode studied in the last chapter, the pipelined SCFB mode
has a higher system throughput. In ASIC synthesis with 0.18
micron CMOS standard cell technology, the throughput of the pipelined SCFB mode
based on parallel transfer (block transfer size equal to 8 bits) can reach 333 Mbps,
which is about 1.5 times that of the non-pipelined SCFB mode based on parallel
transfer mode in Chapter 4. The penalty is area: the area complexity is
over 7 times larger than that of SCFB mode based on the serial transfer mode. The
major cause of the increased area is the pipelined implementation of AES, because of the
11 128-bit registers inserted among the 10 rounds and another 10 128-bit registers to
store the subkeys. We conjecture that increasing the transfer block size to 32 or 64 bits
should increase the throughput by a factor of about 4 or 8 with only a modest increase
in hardware complexity, because the area complexity of AES does not increase when
N increases.
Chapter 6 
Analysis of SRD and EPF 
In this chapter, we investigate the error characteristics and the resynchronization
properties of the pipelined SCFB mode based on CTR mode at the output of the
decryption. In particular, we study how various blackout periods and sync pattern sizes
affect the error characteristics and resynchronization characteristics of the pipelined
SCFB.
6.1 Error Propagation Factor 
The error propagation factor (EPF) [8] is the bit error rate at the output of the
decryption divided by the probability of a bit error in the communication channel.
We consider the EPF of the pipelined SCFB versus different blackout periods
and different sync pattern sizes. The length of the blackout period in blocks (i.e., the
blackout period ranges from 1 to 13) represents the number of pipeline stages in the block
cipher. The 11-stage pipelined SCFB used in our implementation based on AES is
adopted when the EPF versus different sync pattern sizes is investigated.
6.1.1 EPF of the Pipelined SCFB Mode Versus Various Blackout Period Lengths
In the simulations relating to EPF, the bit errors are generated at a constant distance
in order to avoid bit error interactions at the receiver. For a larger probability of
error (Pe), it is possible that the effect of one bit error at the output of decryption
interferes with the effect of another error in the channel. That is, when an
error already occurs in the sync pattern/IV, or a false sync pattern is generated, and
another error occurs in the following CTR mode, the second error will not increase
the EPF. In this case, the two error bits have interacted.
...... | sync (n) | IV (B) | Blackout (L x B bits) | CTR (k) | sync (n) | IV (B) | ......
Figure 6.1: Synchronization Cycle for Pipelined SCFB with Various Blackout Period
Figure 6.1 illustrates the ciphertext bits transmitted in the communication channel
for the pipelined SCFB mode (based on CTR mode). In this figure, n represents
the number of bits in the sync pattern, B is the cipher block size and the length of the
subsequent IV, L represents the number of pipeline stages in the block cipher (e.g.,
typically the number of rounds in the block cipher), and the remaining bits, which
we refer to as the CTR block and which has a size of k, occur between the end of the
blackout and the beginning of the next sync pattern. A synchronization cycle consists
of n + B + L x B + k bits, which includes the set of bits from the beginning of the
sync pattern to the beginning of the next sync pattern.
In general, for the pipelined SCFB mode (based on CTR mode), the expected number of
error bits at the receiver can be roughly approximated for two cases as follows.
In the first case, consider the occurrence of an error in the sync pattern or IV
block. The resulting bound on $EPF_{sync/IV}$ is
\[ EPF_{sync/IV} \approx \frac{1}{2} \left( \bar{k} + n + B + L \times B \right) \tag{6.1} \]
where $\bar{k}$ is the average length of the CTR mode block. Assuming that all CTR mode
blocks have the length of the average CTR mode block, we have $\bar{k} \approx 2^n$, where n
is the number of bits in the sync pattern. For n = 8, B = 128, L = 11, $EPF_{sync/IV}$ is
approximately equal to 900. In Eq.(6.1), $(L \times B + \bar{k} + n + B)$ is the expected
number of bits at the receiver from the end of the current sync cycle until
resynchronization is re-achieved. The coefficient 1/2 in Eq.(6.1) indicates that on average
half of these bits will be in error before resynchronization.
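As a quick numerical check of Eq.(6.1), the following sketch evaluates the sync cycle length and the resulting value for the parameters quoted above. The function names are illustrative assumptions, and the default k = 2^n is the average-length assumption stated in the text.

```python
def sync_cycle_bits(n, B, L, k=None):
    """Length of one synchronization cycle: n + B + L*B + k bits.

    k defaults to 2**n, the assumed average CTR block length.
    """
    if k is None:
        k = 2 ** n
    return n + B + L * B + k


def epf_sync_iv(n, B, L, k=None):
    """Eq. (6.1): half of the bits of one sync cycle are expected in error."""
    return 0.5 * sync_cycle_bits(n, B, L, k)


print(sync_cycle_bits(8, 128, 11))  # 1800
print(epf_sync_iv(8, 128, 11))      # 900.0
```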
In the second case, consider the occurrence of an error during the blackout or
CTR mode blocks. The resulting $EPF_{BO/CTR}$ is shown in Eq.(6.2).
\[ EPF_{BO/CTR} \geq 1 \tag{6.2} \]
Eq.(6.2) corresponds to a bit error which occurs in the blackout/CTR mode and
causes one bit error at the receiver such that it does not cause a false sync pattern.
Eq.(6.2) does not account for the circumstance that a bit error causes a false sync
pattern to occur resulting in the receiver improperly assuming a resynchronization.
So overall, weighting each case by its probability of occurrence, the lower bound on
EPF can be approximated by
\begin{align}
EPF &\gtrsim Prob(sync/IV) \times EPF_{sync/IV} + Prob(BO/CTR) \times EPF_{BO/CTR} \nonumber \\
    &\gtrsim \frac{n + B}{L \times B + \bar{k} + n + B} \times EPF_{sync/IV}
       + \frac{L \times B + \bar{k}}{L \times B + \bar{k} + n + B} \tag{6.3}
\end{align}
where Prob(sync/IV) represents the probability that a bit error occurs
in the sync pattern or IV, Prob(BO/CTR) represents the probability that
a bit error occurs in the blackout or CTR mode, and $\bar{k}$ is the average length of
the CTR mode block. For n = 8, B = 128, L = 11,
EPF is greater than or approximately equal to 69. For L equal to 1 to 13, EPF is
plotted in Figure 6.2. In Eq.(6.3), it should be noted that the probabilities are very
rough approximations based on the assumption that all CTR mode blocks are exactly
the length of the average CTR mode block.
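The lower bound of Eq.(6.3) can be tabulated for L = 1 to 13 with a few lines of code. This is a minimal sketch under the same assumptions as the text (average CTR block length k = 2^n, and the blackout/CTR term replaced by its bound of 1); the function name is illustrative.

```python
def epf_lower_bound(n, B, L):
    """Eq. (6.3) with k = 2**n and EPF_BO/CTR replaced by its bound of 1."""
    k = 2 ** n
    cycle = n + B + L * B + k
    p_sync_iv = (n + B) / cycle      # error lands in the sync pattern or IV
    p_bo_ctr = (L * B + k) / cycle   # error lands in the blackout or CTR part
    epf_sync_iv = 0.5 * cycle        # Eq. (6.1)
    return p_sync_iv * epf_sync_iv + p_bo_ctr * 1.0


for L in range(1, 14):
    print(L, round(epf_lower_bound(8, 128, L), 2))
```

Under these assumptions the first term simplifies to (n + B)/2 = 68 for every L, so the bound only moves between about 68.7 (L = 1) and 68.9 (L = 13). Any stronger growth with L in a simulation therefore has to come from effects outside the bound, such as false synchronizations.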
[Plot: "EPF for the Pipelined CTR mode SCFB vs. various Blackout periods"; x-axis: L (number of pipeline stages); curves include the lower bound EPF for pipelined CTR mode SCFB.]
Figure 6.2: EPF of the Pipelined CTR mode vs. various Blackout Period
Figure 6.2 shows results of a simulation examining EPF versus L (i.e., pipeline
stages). The simulation parameters are adopted as follows:
1. The sync pattern size, n, is equal to 8.
2. The sync pattern format is "10000000".
3. The size of the block cipher, B, is equal to 128.
4. The number of pipeline stages, L, varies from 1 to 13.
5. The bit errors are inserted into the channel with a distance equal to 10^5.
6. The simulation length (i.e., the number of plaintext bits) is equal to 10^10.
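Inserting bit errors at a constant distance, as in item 5, can be sketched as follows. This is not the simulator used for the results; the function names, the list-of-bits representation and the position of the first error are assumptions made for illustration.

```python
def error_positions(num_bits, distance):
    """Indices of the bits to flip: one error every `distance` bits.

    Placing the first error at index distance - 1 is an assumption.
    """
    return range(distance - 1, num_bits, distance)


def inject_errors(bits, distance):
    """Return a copy of `bits` (a list of 0/1) with evenly spaced bit errors."""
    corrupted = list(bits)
    for i in error_positions(len(bits), distance):
        corrupted[i] ^= 1
    return corrupted


# Small demonstration: 1000 channel bits, one error every 100 bits.
channel = [0] * 1000
received = inject_errors(channel, 100)
print(sum(received))  # 10

# With the parameters listed above, the number of injected errors would be:
print(len(error_positions(10**10, 10**5)))  # 100000
```

Spacing the errors this widely keeps consecutive errors from interacting, which is the reason given in the text for using a constant distance.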
The results in Figure 6.2 illustrate that the EPF trends upwards slowly when
the number of pipeline stages increases. The lower bound on EPF resulting from
Eq.(6.3) is also illustrated in this figure. The trend on the graph is the result of the
effects of false synchronizations. A false sync results in a loss of synchronization up
until the end of the next blackout. That is, much of a sync cycle will be unsynchronized
between transmitter and receiver. Since the size of the sync cycle depends
on L, a larger L implies a greater EPF when false synchronization occurs at the receiver.
Hence, as L increases in the graph, the effects of false synchronizations become more
evident and the EPF increases. False synchronizations are not incorporated into the
lower bound on EPF.
6.1.2 EPF of Pipelined SCFB Mode Versus Various Sync 
Pattern Sizes 
We also investigated the EPF versus different values of the sync pattern size n by
running simulations. Figure 6.3 illustrates results of a simulation examining EPF
versus n (i.e., the size of the sync pattern) for both the 11-stage pipelined SCFB based
on CTR mode and the conventional SCFB mode. The simulation parameters are
adopted as follows:
1. The sync pattern size, n, varies from 4 to 12.
2. The sync pattern format is "10...00".
3. The number of pipeline stages, L, is 11.
4. The size of the block cipher, B, is equal to 128.
5. The bit errors are inserted into the channel with a distance equal to 10^5.
6. The simulation length (i.e., the number of plaintext bits) is equal to 10^9.
[Plot: "EPF for Different Sync Pattern Sizes vs. Pipelined SCFB Mode / Conventional SCFB Mode" (simulation length = 1.00e09, error distance = 1.00e05); x-axis: Sync Pattern Sizes; curves: Pipelined SCFB mode, Conventional SCFB mode.]
Figure 6.3: EPF of Pipelined CTR mode SCFB vs various Sync Pattern Size
In Figure 6.3, the results for pipelined SCFB mode illustrate that the EPF
decreases significantly when the size of the sync pattern increases. For small n, a false
sync pattern may take several sync cycles to clear up and, hence, the EPF is dramatically
higher for smaller n. The conventional SCFB mode, for which the number of pipeline
stages, L, can be considered as 0, has a shorter sync cycle than the pipelined SCFB. So even
when false sync patterns are frequently found for smaller n, it does not take as long
before resynchronization. This is why, for smaller n, the EPF is not as high
as that for the pipelined SCFB mode.
6.2 Sync Recovery Delay 
Synchronization Recovery Delay (SRD) is the expected number of bits following a
sync loss due to a slip before synchronization is regained [8]. SRD does not include
the bits that are lost directly due to the slip.
We consider the SRD of pipelined SCFB versus different blackout periods
and also investigate the SRD versus different sync pattern sizes. The number of
blocks in a blackout period ranges from 1 to 13. The standard CTR mode SCFB is
adopted when the SRD for varying values of n is investigated in order to compare
the simulation results to the conventional SCFB in [8].
6.2.1 SRD Versus Various Blackout Period
In general, for the pipelined SCFB based on CTR mode, the expected
synchronization recovery delay at the receiver can be roughly approximated for two cases as
follows.
In the first case, consider the occurrence of a slip in the sync pattern or IV block.
The resulting $SRD_{sync/IV}$ is
\[ SRD_{sync/IV} \approx \frac{n + B}{2} + L \times B + \bar{k} + n + B + L \times B \tag{6.4} \]
where $\bar{k}$ is the average length of the CTR mode block, B is the length of the
subsequent IV, L is the number of pipeline stages in the block cipher, and n represents the
number of bits in the sync pattern. We assume that all CTR mode blocks are exactly the
length of the average CTR mode block (i.e., $\bar{k} \approx 2^n$, where n is the number of bits
in the sync pattern). For n = 8, B = 128, L = 11, $SRD_{sync/IV}$ is approximately equal
to 3276. In Eq.(6.4), the right side indicates the expected number of bits following a
sync loss due to a slip before synchronization is regained at the receiver.
In the second case, consider the occurrence of a slip during the blackout or CTR
mode. The resulting $SRD_{BO/CTR}$ is
\[ SRD_{BO/CTR} \approx \frac{L \times B + \bar{k}}{2} + n + B + L \times B \tag{6.5} \]
We assume that all CTR mode blocks are exactly the length $\bar{k}$ of the average CTR
mode block (i.e., $\bar{k} \approx 2^n$, where n is the number of bits in the sync pattern). For
n = 8, B = 128, L = 11, $SRD_{BO/CTR}$ is approximately equal to 2376. Eq.(6.5)
indicates that if a bit slip occurs in the blackout/CTR part, sync loss will last until
the end of the next blackout period at the receiver.
So overall, weighting each case by its probability of occurrence, the SRD can be
approximated by
\begin{align}
SRD &\approx Prob(sync/IV) \times SRD_{sync/IV} + Prob(BO/CTR) \times SRD_{BO/CTR} \nonumber \\
    &\approx \frac{n + B}{n + B + L \times B + \bar{k}}
       \times \left( \frac{n + B}{2} + L \times B + \bar{k} + n + B + L \times B \right) \nonumber \\
    &\quad + \frac{L \times B + \bar{k}}{L \times B + \bar{k} + n + B}
       \times \left( \frac{L \times B + \bar{k}}{2} + n + B + L \times B \right) \tag{6.6}
\end{align}
where Prob(sync/IV) represents the probability that a bit slip occurs
in the sync pattern or IV, Prob(BO/CTR) represents the probability that
a bit slip occurs in the blackout or CTR mode, and $\bar{k}$ is the average length of
the CTR mode block. For n = 8, B = 128, L = 11,
SRD is approximately equal to 2444. The resulting approximations for SRD for
various values of L are plotted in Figure 6.4. In Eq.(6.6), it should be noted that the
probabilities are very rough approximations based on the assumption that all CTR mode
blocks are exactly the length of the average CTR mode block. This analysis does not
account for false synchronizations at the receiver caused by slips.
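The three SRD approximations can be evaluated together to confirm the quoted values. The sketch below is a direct transcription of Eqs.(6.4) to (6.6) under the stated assumption that the average CTR block length is 2^n; the function name is illustrative.

```python
from math import log2


def srd_estimate(n, B, L):
    """Eqs. (6.4)-(6.6) with the average CTR block length k = 2**n."""
    k = 2 ** n
    cycle = n + B + L * B + k
    srd_sync_iv = (n + B) / 2 + L * B + k + n + B + L * B   # Eq. (6.4)
    srd_bo_ctr = (L * B + k) / 2 + n + B + L * B            # Eq. (6.5)
    p_sync_iv = (n + B) / cycle
    p_bo_ctr = (L * B + k) / cycle
    srd = p_sync_iv * srd_sync_iv + p_bo_ctr * srd_bo_ctr   # Eq. (6.6)
    return srd_sync_iv, srd_bo_ctr, srd


sync_iv, bo_ctr, srd = srd_estimate(8, 128, 11)
print(sync_iv, bo_ctr, round(srd, 1))
print(round(log2(srd), 2))  # SRD on a base-2 logarithmic scale
```

For n = 8, B = 128 and L = 11 this gives 3276 and 2376 for the two cases and 2444 for the weighted result, matching the values in the text.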
For a larger slip rate (i.e., how often a bit slip occurs in the communication
channel), bit slip overlap may happen in the channel. Bit slip overlap refers to
the following situation: after a bit slip has occurred in the channel, another bit slip
occurs before the new synchronization is achieved, so that the two bit slips overlap.
[Plot: "Sync Recovery Delay for Pipelined CTR mode SCFB vs. L (# pipeline stages)"; x-axis: L (the number of pipeline stages); curves: simulation results, approximations derived from Eq. 6.6.]
Figure 6.4: SRD vs. various Blackout Period
Figure 6.4 shows results of a simulation examining SRD versus the number of
pipeline stages from 1 up to 13 for the pipelined SCFB based on CTR mode. The
resulting approximations for SRD for various values of L derived from Eq.(6.6) are
also plotted in this graph. The simulation parameters are chosen as follows:
1. The sync pattern size, n, is equal to 8.
2. The sync pattern format is "10...00".
3. The number of pipeline stages, L, varies from 1 to 13.
4. The size of the block cipher, B, is equal to 128.
5. The bit slips are inserted into the channel with a distance equal to 10^4.
6. The simulation length (i.e., the number of plaintext bits) is equal to 10^8.
In order to avoid bit slip overlap, we have to choose a proper value of the bit
slip rate. We ran simulations for various values of the bit slip rate and eventually
an upper bound on the slip rate equal to 10^-4 for up to 13 pipeline stages
was found to produce no bit slip overlap at the receiver. Hence, 10^-4 is adopted
as the bit slip rate for our simulation examining SRD versus various L.
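As a rough consistency check on this choice (not the procedure used in the thesis, which relied on simulation), one can compare the slip spacing with the larger of the two expected recovery delays from Eq.(6.4) and Eq.(6.5). The function names are illustrative assumptions, and the check uses expected values only, so an individual recovery can still take longer.

```python
def worst_case_recovery_bits(n, B, L):
    """Larger of Eq. (6.4) and Eq. (6.5), using k = 2**n."""
    k = 2 ** n
    srd_sync_iv = (n + B) / 2 + L * B + k + n + B + L * B
    srd_bo_ctr = (L * B + k) / 2 + n + B + L * B
    return max(srd_sync_iv, srd_bo_ctr)


def slips_can_overlap(slip_distance, n, B, L):
    """True if the next slip may arrive before recovery is expected to finish."""
    return slip_distance <= worst_case_recovery_bits(n, B, L)


print(worst_case_recovery_bits(8, 128, 13))  # 3788.0
print(slips_can_overlap(10**4, 8, 128, 13))  # False
```

For L = 13 the expected recovery delay stays below 4000 bits, comfortably under the spacing of 10^4 bits between slips.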
The simulation results in Figure 6.4 show that the logarithm of SRD increases
when the number of pipeline stages increases. These results are comparable to the
approximations of Eq.(6.4), Eq.(6.5) and Eq.(6.6). The trends of the SRD from the
simulations are quite close to the approximations in Figure 6.4.
6.2.2 SRD Versus Various Sync Pattern Sizes
We have also investigated the SRD versus different values of n (i.e., the size of the sync
pattern). Figure 6.5 shows results of a simulation examining SRD versus various sizes
of sync pattern from 4 up to 12 for the pipelined SCFB (L = 11). The simulation
parameters are chosen as follows:
1. The sync pattern size, n, varies from 4 to 12.
2. The sync pattern format is "10...00".
3. The size of the block cipher, B, is equal to 128.
4. The bit slips are inserted into the channel with a distance equal to 10^5.
5. The simulation length (i.e., the number of plaintext bits) is equal to 10^9.
[Plot: "Sync Recovery Delay vs. Sync Pattern Size (B = 128 bits, Bit Slip Rate = 1/100000)"; x-axis: Sync Pattern Size; curves: Conventional SCFB, Pipelined SCFB (L = 11).]
Figure 6.5: SRD vs. various Sync Pattern size
In Figure 6.5, the curve with circle symbols represents the conventional SCFB
mode, which is the hybrid of CFB mode and OFB mode as we have discussed in
Chapter 2. The curve with triangle symbols represents the CTR mode SCFB. For
convenience, the graph presents a plot of the base-2 logarithm of the SRD.
In [8], the lower bound and upper bound on SRD are discussed. In the SRD
simulation results from [8], the lower bound and upper bound converge as n gets
larger. In our simulation, the SRD simulation results of the CTR mode SCFB and
the conventional SCFB also converge as n gets larger. As discussed in [8], SRD
increases in an exponential manner when n gets larger.
6.3 Conclusion
This chapter investigates the error characteristics and the resynchronization
properties of the pipelined SCFB mode. In the study of EPF, we run simulations
examining EPF versus L. We also provide a lower bound on EPF versus L that does
not incorporate false synchronizations. By running simulations, we also
investigate the EPF versus different values of the sync pattern size. In the study of
SRD, we run simulations examining SRD versus L, and we provide equations
which approximate SRD versus L. We also run simulations to investigate the
SRD versus various sync pattern sizes in this chapter.
Chapter 7 
Conclusions and Future Work 
7.1 Conclusions 
This thesis investigates the statistical cipher feedback (SCFB) mode based on serial
transfer and parallel transfer. In addition, we propose and analyze a pipelined
SCFB mode designed for high speed implementations. SCFB mode can configure a
block cipher to operate as a stream cipher by sending in the plaintext and sending out
the ciphertext symbol by symbol or bit by bit. So, SCFB mode can be categorized
as a self-synchronizing stream cipher.
In order to overcome CFB mode's poor efficiency and OFB mode's lack of
resynchronization, SCFB mode combines CFB mode and OFB mode. It improves the
efficiency by working in OFB mode most of the time, and it obtains
self-synchronization by searching for the sync pattern in the ciphertext and
working in CFB mode to obtain a new IV after the sync pattern is recognized.
The hardware design and implementation are performed using ModelSim SE 6.0, and
the synthesis is performed using Synopsys tools with 0.18 micron CMOS technology
from TSMC (Taiwan Semiconductor Manufacturing Company), supported by Canadian
Microelectronics Corporation (CMC), to study the timing delay and the area
complexity. We also ran functional simulations of the SCFB mode in software to
analyze the error propagation factor (EPF) and the synchronization recovery
delay (SRD). The AES implementation adopts both a composite field
implementation, to decrease the hardware complexity, and a simple Boolean
function implementation, to improve the throughput of the block cipher. The
former is used in SCFB mode using serial transfer and the latter is applied for
the parallel transfer mode.
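The switching between OFB operation and IV collection summarized at the start of this paragraph can be illustrated with a small bit-level functional model. The Python sketch below is not the thesis hardware and not AES. It assumes a toy 16-bit "block cipher" built from SHA-256, a hypothetical 4-bit sync pattern 1000, and one plausible reading of the mode in which scanning is disabled while the IV bits are collected (see [8] for the precise definition). All names in it (toy_block, ScfbState, scfb) are illustrative.

```python
import hashlib

B = 16                 # toy block size in bits (assumption; AES uses 128)
SYNC = (1, 0, 0, 0)    # hypothetical 4-bit sync pattern


def toy_block(key: bytes, bits):
    """Stand-in for the block cipher: maps B input bits to B pseudo-random bits."""
    digest = hashlib.sha256(key + bytes(bits)).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(B)]


class ScfbState:
    def __init__(self, key: bytes, iv):
        self.key = key
        self.block = toy_block(key, iv)  # current keystream block
        self.pos = 0                     # next unused keystream bit
        self.window = []                 # recent ciphertext bits while scanning
        self.collecting = None           # IV bits gathered so far, or None

    def next_key_bit(self):
        if self.pos == B:                # OFB: feed the last output block back
            self.block = toy_block(self.key, self.block)
            self.pos = 0
        bit = self.block[self.pos]
        self.pos += 1
        return bit

    def observe(self, c):
        """Track ciphertext bit c: scan for SYNC, then collect B bits as new IV."""
        if self.collecting is not None:
            self.collecting.append(c)
            if len(self.collecting) == B:        # new IV complete: resynchronize
                self.block = toy_block(self.key, self.collecting)
                self.pos = 0
                self.collecting = None
                self.window = []
        else:
            self.window = (self.window + [c])[-len(SYNC):]
            if tuple(self.window) == SYNC:
                self.collecting = []


def scfb(key: bytes, iv, bits, decrypt=False):
    """Encrypt or decrypt a list of bits; both ends track the ciphertext."""
    state = ScfbState(key, iv)
    out = []
    for b in bits:
        o = b ^ state.next_key_bit()
        out.append(o)
        state.observe(b if decrypt else o)
    return out
```

Because both ends derive their state only from the key and the ciphertext, a receiver that loses a ciphertext bit realigns once it and the transmitter recognize the same sync pattern and collect the same IV.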
We have implemented the SCFB mode using serial transfer, the SCFB mode using
parallel transfer with a block transfer size of N = 4, and the pipelined SCFB
mode based on parallel transfer. The throughput of the pipelined SCFB system
can reach up to 333 Mbps, which is approximately 1.5 times that of the parallel
transfer mode (N = 4), but the efficiency is only approximately 6.24%. The
plaintext queue is in underflow most of the time because of the high speed of
keystream generation in the pipelined block cipher. The area complexity of the
pipelined SCFB system is approximately 7 times that of the serial transfer
mode.
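As a back-of-envelope reading of these figures, the Python sketch below derives two implied quantities from the numbers quoted above. It assumes that efficiency means SCFB throughput divided by the rate at which the block cipher can produce keystream. That definition and the function name implied_figures are assumptions made for this illustration, and the outputs are implied values, not synthesis results.

```python
def implied_figures(pipelined_mbps=333.0, speedup=1.5, efficiency=0.0624):
    """Return (implied parallel-mode throughput, implied cipher keystream rate) in Mbps.

    Assumes efficiency = SCFB throughput / block cipher keystream rate.
    """
    parallel_mbps = pipelined_mbps / speedup
    cipher_capacity_mbps = pipelined_mbps / efficiency
    return parallel_mbps, cipher_capacity_mbps
```

Under these assumptions the parallel transfer mode (N = 4) would run at roughly 222 Mbps, and the pipelined block cipher could generate keystream at roughly 5.3 Gbps, which is consistent with the plaintext queue being in underflow most of the time.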
The probability distribution of the number of bits in the plaintext queue is
investigated for both the serial transfer mode and the parallel transfer mode
for varying sync pattern sizes. This analysis reveals that resynchronization
happens more frequently for smaller sync pattern sizes, so the queue has more
chances to be filled with incoming plaintext bits without any outgoing bits
during the resynchronization. From the functional simulations for different
buffer sizes, a buffer size of 64 bits (64 × N bits for the parallel transfer
mode SCFB), which results in no queue overflow, is selected for SCFB mode using
serial transfer and parallel transfer (N = 4). The buffer size for the
pipelined SCFB mode based on parallel transfer (N = 8 bits) is set to 128 × N,
for which no queue overflow is found. This results from the high speed of
keystream generation in the AES block cipher, which has an 11-stage pipeline
architecture.
From the synthesis results, the pipelined SCFB system based on parallel
transfer has the most complicated hardware implementation and the most complex
timing issues, which constrain the efficiency but still allow higher speed. The
SCFB system based on serial transfer has the simplest hardware implementation
and the smallest timing delay on the critical path, but its throughput is
constrained by the plaintext queue timing. The SCFB system based on parallel
transfer (N = 4) has an area complexity twice that of the serial transfer mode,
but its timing delay is one half of that of the serial transfer mode.
7.2 Future Work 
Compared with the SCFB mode using parallel transfer (N = 4), the area
complexity of the pipelined SCFB system (N = 8) increases dramatically, while
the throughput increases only by 1.5 times. Two possible directions can be
taken to solve this problem.
1. Simplify the hardware structures of the two shift registers, which are among
the most complex modules in the pipelined SCFB mode, in order to reduce the
area complexity.
2. Increase the block transfer size (N) in order to improve the throughput of the 
SCFB system as well as the efficiency of the SCFB system. In the extreme, 
the design could have N = B (that is, transfer block size equal to cipher block 
size). 
SCFB mode can be implemented in field-programmable gate arrays (FPGAs), which
allow for re-programmable debugging and lower non-recurring engineering costs
compared with ASICs. Although FPGAs are normally slower than ASICs and draw
more power, a successful FPGA implementation would let us test the SCFB system
on a real chip. We may also compare the SCFB mode to other modes which are
widely used today in the physical layer of high speed networks.
References 
[1] William Stallings, Cryptography and Network Security: Principles and Practice,
3rd ed. Prentice Hall, 2003.
[2] M. Bellare, A. Desai, E. Jokipii and P. Rogaway, "A concrete security treatment
of symmetric encryption: Analysis of the DES modes of operation," Proceedings
of 38th Annual Symposium on Foundations of Computer Science, IEEE, pp.
394-403, 1997.
[3] W. Stallings, "The advanced encryption standard," Cryptologia, vol. XXVI,
no. 3, July 2002.
[4] O. Jung and C. Ruland, "Encryption with statistical self-synchronization in
synchronous broadband networks," Cryptographic Hardware and Embedded Systems
- CHES '99, Lecture Notes in Computer Science, vol. 1717, pp. 340-352, 1999.
[5] A. Alkassar, A. Geraldy, B. Pfitzmann and A.-R. Sadeghi, "Optimized
self-synchronizing mode of operation," Fast Software Encryption Workshop - FSE
2001, Yokohama, Japan, Apr 2001.
[6] National Institute of Standards and Technology. [Online]. Available: AES web
site, http://www.csrc.nist.gov/encryption/aes
[7] J. Wolkerstorfer, E. Oswald, M. Lamberger, "An ASIC implementation of the
AES sboxes," The Cryptographer's Track at the RSA Conference (CT-RSA
2002), Lecture Notes in Computer Science, vol. 2271, Feb 2002.
[8] Howard M. Heys, "Analysis of the statistical cipher feedback mode of block
ciphers," IEEE Transactions on Computers, vol. 52, Issue 1, pp. 77-92, Jan
2003.
[9] U.M. Maurer, "New approaches to the design of self-synchronizing stream
ciphers," Advances in Cryptology - EUROCRYPT '91, pp. 458-471, 1991.
[10] M. Dworkin, Recommendation for Block Cipher Modes of Operation. NIST
Special Publication 800-38A, 2001.
[11] W. Diffie and M. Hellman, "Privacy and authentication: An introduction to
cryptography," Proceedings of the IEEE, vol. 67, pp. 397-427, March 1979.
[12] V. Rijmen. Efficient implementation of the Rijndael S-box. [Online]. Available:
http://www.esat.kuleuven.ac.be/~rijmen/rijndael/sbox.pdf
[13] J. Fuller, W. Millan, "Linear redundancy in S-boxes," FSE 2003, vol. 2887, 2003.
[14] N. Yu, "Compact hardware implementation of AES with concurrent error
detection," Master's Thesis, Memorial University of Newfoundland, 2005.
[15] L. Zhang, "Fully pipelined implementation of advanced encryption standard,"
Project Report, Memorial University of Newfoundland, 2005.
[16] Hua Li and Z. Friggstad, "An efficient architecture for the AES mix columns
operation," Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium
on, vol. 5, pp. 4637-4640, May 2005.
[17] Xinmiao Zhang and Keshab K. Parhi, "Implementation approaches for the
advanced encryption standard algorithm," IEEE Circuits and Systems Magazine,
pp. 24-26, 2002.
[18] Xinmiao Zhang and K.K. Parhi, "High-speed VLSI architectures for the AES
algorithm," IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 12, Issue 9, pp. 957-967, Sep 2004.
[19] F. Yang, "Analysis and implementation of statistical cipher feedback mode and
optimized cipher feedback mode," Master's Thesis, Memorial University of
Newfoundland, 2004.
[20] Canadian Microelectronics Corporation, "Tutorial on CMC's digital IC design
flow," Document ICI-096, 2002.
[21] A. Menezes, P. van Oorschot and S. Vanstone, Handbook of Applied
Cryptography, 1st ed. CRC Press, 1997.
Appendix A 
Partial VHDL Codes for SCFB 
Systems 
A.1 SCFB System Controller using Serial Transfer
-- Author: Liang Zhang
-- Modification Date: 20th, Aug. 2006
-- SCFB system Controller
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use work.all;
entity Controller_SCFB is
  port( clk1 : in std_logic;
        reset : in std_logic;
        New_IV_Done : in std_logic;
        Cipher_Done : in std_logic;
        RD_Done : in std_logic;
        SR_Done : in std_logic;
        Cho_Mux : out std_logic;
        SR_Load : out std_logic;
        Reg_Load : out std_logic;
        Unhold_on : out std_logic);
end Controller_SCFB;
architecture structural of Controller_SCFB is
  type state_type is (On_Rst, Gen_Key, Reg_Taking_Key, Reg_Occupied, SR_Loading_Key,
                      Wait_State, New_IV_Found);
  signal state, next_state : state_type;
begin
  -- Next State Decoding:
  Next_State_Decoding: process (state, New_IV_Done, Cipher_Done, RD_Done, SR_Done)
  begin
    case state is
      when On_Rst =>
        next_state <= Gen_Key;
      when Gen_Key =>
        if (New_IV_Done='1') then
          next_state <= New_IV_Found;
        elsif (Cipher_Done='1' and RD_Done='1') then
          next_state <= Reg_Taking_Key;
        elsif (Cipher_Done='0' or RD_Done='0') then
          next_state <= Gen_Key;
        end if;
      when Reg_Taking_Key =>
        if (New_IV_Done='1') then
          next_state <= New_IV_Found;
        elsif (SR_Done='0') then
          next_state <= Reg_Occupied;
        elsif (SR_Done='1') then
          next_state <= SR_Loading_Key;
        end if;
      when Reg_Occupied =>
        if (New_IV_Done='1') then
          next_state <= New_IV_Found;
        elsif (SR_Done='0') then
          next_state <= Reg_Occupied;
        elsif (SR_Done='1') then
          next_state <= SR_Loading_Key;
        end if;
      when SR_Loading_Key =>
        if (New_IV_Done='1') then
          next_state <= New_IV_Found;
        elsif (New_IV_Done='0' and SR_Done='0') then
          next_state <= Wait_State;
        end if;
      when Wait_State =>
        if (New_IV_Done='1') then
          next_state <= New_IV_Found;
        else
          next_state <= Gen_Key;
        end if;
      when others =>
        next_state <= Gen_Key;
    end case;
  end process Next_State_Decoding;
  -- Clock the State Machine:
  clock_state_machine: process (clk1, reset)
  begin
    if (reset='1') then
      state <= On_Rst;
    elsif (clk1'event and clk1='0') then
      state <= next_state;
    end if;
  end process clock_state_machine;
  -- Generation of the Combinatorial Control Signals:
  combinational_logic: process (state, next_state)
  begin
    if (state = On_Rst) then
      Cho_Mux <= '0';
      SR_Load <= '0';
      Unhold_on <= '0';
      Reg_Load <= '0';
    elsif (state = Reg_Taking_Key) then
      Reg_Load <= '1';
    elsif (state = Reg_Occupied) then
      Reg_Load <= '0';
      Cho_Mux <= '1';
    elsif (state = SR_Loading_Key) then
      SR_Load <= '1';
      Unhold_on <= '1';
      Reg_Load <= '0';
      Cho_Mux <= '1';
    elsif (state = Wait_State) then
      Unhold_on <= '0';
      SR_Load <= '0';
    elsif (state = New_IV_Found) then
      SR_Load <= '0';
      Unhold_on <= '0';
      Reg_Load <= '0';
      Cho_Mux <= '1';
    end if;
    if (next_state = Wait_State) then
      SR_Load <= '0';
      Unhold_on <= '0';
    end if;
  end process combinational_logic;
end structural;
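For readers who want to trace the controller without a VHDL simulator, the next-state decoding of Section A.1 can be mirrored in a few lines of Python. This is a sketch for illustration and is not part of the thesis code. The names Inputs, next_state and run are hypothetical. The one case the VHDL leaves unassigned (SR_Loading_Key with New_IV_Done = '0' and SR_Done = '1') is modelled here as holding the current state, on the assumption that the unassigned signal keeps its previous value.

```python
from typing import NamedTuple


class Inputs(NamedTuple):
    new_iv_done: int = 0
    cipher_done: int = 0
    rd_done: int = 0
    sr_done: int = 0


def next_state(state: str, i: Inputs) -> str:
    """Mirror of the Next_State_Decoding process in Section A.1."""
    if state == "On_Rst":
        return "Gen_Key"
    if state == "Gen_Key":
        if i.new_iv_done:
            return "New_IV_Found"
        if i.cipher_done and i.rd_done:
            return "Reg_Taking_Key"
        return "Gen_Key"
    if state in ("Reg_Taking_Key", "Reg_Occupied"):
        if i.new_iv_done:
            return "New_IV_Found"
        return "SR_Loading_Key" if i.sr_done else "Reg_Occupied"
    if state == "SR_Loading_Key":
        if i.new_iv_done:
            return "New_IV_Found"
        if not i.sr_done:
            return "Wait_State"
        return state  # assumption: the unassigned case holds the state
    if state == "Wait_State":
        return "New_IV_Found" if i.new_iv_done else "Gen_Key"
    return "Gen_Key"  # "when others", which includes New_IV_Found


def run(inputs, state="On_Rst"):
    """Apply one Inputs tuple per clock edge and return the visited states."""
    trace = [state]
    for i in inputs:
        state = next_state(state, i)
        trace.append(state)
    return trace
```

Feeding run a sequence of Inputs tuples, one per falling edge of clk1, gives the list of states visited, which can be compared against a simulation waveform.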
A.2 SCFB System Controller using Parallel Transfer
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use work.all;
entity Controller_SCFB is
  port( clk1 : in std_logic;
        reset : in std_logic;
        New_IV_Done : in std_logic;
        Cipher_Done : in std_logic;
        RD_Done : in std_logic;
        SR_Done : in std_logic;
        Cho_Mux : out std_logic;
        SR_Load : inout std_logic;
        Reg_Load : out std_logic;
        Unhold_on : out std_logic);
end Controller_SCFB;
architecture structural of Controller_SCFB is
  type state_type is (On_Rst, Gen_Key, Reg_Taking_Key, Reg_Occupied,
                      SR_Loading_Key, Wait_State, New_IV_Found);
  signal state, next_state : state_type;
  signal tmp : std_logic;
begin
  process (clk1, reset, state)
  begin
    if (reset='1') then
      tmp <= '0';
    elsif (clk1'event and clk1='0' and SR_Load = '1') then
      tmp <= '1';
    elsif (state /= SR_Loading_Key) then
      tmp <= '0';
    end if;
  end process;
  -- Next State Decoding:
  Next_State_Decoding: process (state, New_IV_Done, Cipher_Done,
                                RD_Done, SR_Done)
  begin
    case state is
      when On_Rst =>
        next_state <= Gen_Key;
      when Gen_Key =>
        if (New_IV_Done='1') then
          next_state <= New_IV_Found;
        elsif (Cipher_Done='1' and RD_Done='1') then
          next_state <= Reg_Taking_Key;
        elsif (Cipher_Done='0' or RD_Done='0') then
          next_state <= Gen_Key;
        end if;
      when Reg_Taking_Key =>
        if (New_IV_Done='1') then
          next_state <= New_IV_Found;
        elsif (SR_Done='0') then
          next_state <= Reg_Occupied;
        elsif (SR_Done='1') then
          next_state <= SR_Loading_Key;
        end if;
      when Reg_Occupied =>
        if (New_IV_Done='1') then
          next_state <= New_IV_Found;
        elsif (SR_Done='0') then
          next_state <= Reg_Occupied;
        elsif (SR_Done='1') then
          next_state <= SR_Loading_Key;
        end if;
      when SR_Loading_Key =>
        if (New_IV_Done='1' and
            next_state /= Wait_State) then
          next_state <= New_IV_Found;
        elsif (New_IV_Done='0' and SR_Done='0') then
          next_state <= Wait_State;
        end if;
      when Wait_State =>
        if (New_IV_Done='1') then
          next_state <= New_IV_Found;
        else
          next_state <= Gen_Key;
        end if;
      when others =>
        next_state <= Gen_Key;
    end case;
  end process Next_State_Decoding;
  -- Clock the State Machine:
  clock_state_machine: process (clk1, reset)
  begin
    if (reset='1') then
      state <= On_Rst;
    elsif (clk1'event and clk1='0') then
      state <= next_state;
    end if;
  end process clock_state_machine;
  -- Generation of the Combinatorial Control Signals:
  combinational_logic: process (state, next_state, tmp)
  begin
    if (state = On_Rst) then
      Cho_Mux <= '0';
      SR_Load <= '0';
      Unhold_on <= '0';
      Reg_Load <= '0';
    elsif (state = Reg_Taking_Key) then
      Reg_Load <= '1';
    elsif (state = Reg_Occupied) then
      Reg_Load <= '0';
      Cho_Mux <= '1';
    elsif (state = SR_Loading_Key) then
      SR_Load <= '1';
      Unhold_on <= '1';
      Reg_Load <= '0';
      Cho_Mux <= '1';
    elsif (state = Wait_State) then
      Unhold_on <= '0';
      SR_Load <= '0';
    elsif (state = New_IV_Found) then
      SR_Load <= '0';
      Unhold_on <= '0';
      Reg_Load <= '0';
      Cho_Mux <= '1';
    end if;
    if (next_state = Wait_State) then
      Unhold_on <= '0';
    end if;
    if (tmp = '1') then
      SR_Load <= '0';
    end if;
  end process combinational_logic;
end structural;
A.3 Pipelined SCFB System Controller
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use work.all;
entity Controller_CTR_SCFB is
  port( clk1 : in std_logic;
        clk3 : in std_logic;
        reset : in std_logic;
        Cipher_Done1 : in std_logic;
        Cipher_Done2 : in std_logic;
        SR1_Fini : in std_logic;
        SR2_Fini : in std_logic;
        Blackout_Period : in integer range 0 to 191;
        SR1_Special_Case : in std_logic;
        SR2_Special_Case : in std_logic;
        CTR_Func_Enab : inout std_logic;
        SR1_Load : out std_logic;
        SR2_Load : out std_logic;
        Flag_SR1 : out std_logic;
        Flag_SR2 : out std_logic;
        AES_Frozen : inout std_logic;
        Queue_Stall : out std_logic );
end Controller_CTR_SCFB;
architecture structural of Controller_CTR_SCFB is
  type state_type is (On_Rst, Gen_Key, SR1_LoadKey, SR2_LoadKey,
                      Wait_Init, SR1_Load_Norm, SR2_Load_Norm,
                      Wait1_Norm, Wait2_Norm, Resync1, Resync1_Contd,
                      Resync2, Resync2_Contd,
                      Queue_Stalled1, Queue_Stalled2,
                      Queue_Stalled3, Queue_Stalled4);
  signal current_state, next_state : state_type;
begin
  -- Next State Decoding:
  Next_State_Decoding: process (current_state, Cipher_Done1,
                                Cipher_Done2, SR1_Fini,
                                SR2_Fini, Blackout_Period, AES_Frozen,
                                SR2_Special_Case)
  begin
    case current_state is
      when On_Rst =>
        next_state <= Gen_Key;
      when Gen_Key =>
        if (Cipher_Done1='0') then
          next_state <= Gen_Key;
        elsif (Cipher_Done1='1') then
          next_state <= SR1_LoadKey;
        end if;
      when SR1_LoadKey =>
        if (Cipher_Done2='0') then
          next_state <= SR1_LoadKey;
        elsif (Cipher_Done2='1') then
          next_state <= SR2_LoadKey;
        end if;
      when SR2_LoadKey =>
        next_state <= Wait_Init;
      when Wait_Init =>
        if (SR1_Fini = '1') then
          next_state <= SR1_Load_Norm;
        elsif (SR1_Fini = '0') then
          next_state <= Wait_Init;
        end if;
      when SR1_Load_Norm =>
        next_state <= Wait1_Norm;
      when Wait1_Norm =>
        if (SR2_Fini = '1') then
          next_state <= SR2_Load_Norm;
        elsif ((Blackout_Period = 1) and AES_Frozen = '1') then
          next_state <= Resync2;
        else next_state <= Wait1_Norm;
        end if;
      when SR2_Load_Norm =>
        next_state <= Wait2_Norm;
      when Wait2_Norm =>
        if (SR1_Fini = '1') then
          next_state <= SR1_Load_Norm;
        elsif ((Blackout_Period = 1) and AES_Frozen = '1') then
          next_state <= Resync1;
        else next_state <= Wait2_Norm;
        end if;
      when Queue_Stalled3 =>
        if (AES_Frozen = '1') then
          next_state <= Resync1;
        else next_state <= Queue_Stalled3;
        end if;
      when Resync1 =>
        if (SR2_Special_Case = '0') then
          next_state <= Resync1_Contd;
        elsif (SR1_Special_Case = '1') then
          next_state <= Queue_Stalled1;
        end if;
      when Queue_Stalled4 =>
        if (AES_Frozen = '1') then
          next_state <= Resync2;
        else next_state <= Queue_Stalled4;
        end if;
      when Resync2 =>
        if (SR1_Special_Case = '0') then
          next_state <= Resync2_Contd;
        elsif (SR2_Special_Case = '1') then
          next_state <= Queue_Stalled2;
        end if;
      when Resync1_Contd =>
        if (SR1_Special_Case = '0' and SR1_Fini = '1') then
          next_state <= SR2_Load_Norm;
        elsif (SR1_Special_Case = '1') then
          next_state <= Queue_Stalled1;
        else next_state <= Resync1_Contd;
        end if;
      when Resync2_Contd =>
        if (SR2_Special_Case = '0' and SR1_Fini = '1') then
          next_state <= SR1_Load_Norm;
        elsif (SR2_Special_Case = '1') then
          next_state <= Queue_Stalled2;
        else next_state <= Resync2_Contd;
        end if;
      when Queue_Stalled1 =>
        if (AES_Frozen = '1') then
          next_state <= SR2_Load_Norm;
        else next_state <= Queue_Stalled1;
        end if;
      when Queue_Stalled2 =>
        if (AES_Frozen = '1') then
          next_state <= SR1_Load_Norm;
        else next_state <= Queue_Stalled2;
        end if;
      when others =>
        next_state <= Gen_Key;
    end case;
  end process Next_State_Decoding;
  -- Clock the State Machine:
  clock_state_machine: process (clk1, reset)
  begin
    if (reset='1') then
      current_state <= On_Rst;
    elsif (clk1'event and clk1='0') then
      current_state <= next_state;
    end if;
  end process clock_state_machine;
  -- Generation of the Combinatorial Control Signals:
  combinational_logic: process (current_state, clk3, next_state)
  begin
    if (current_state = On_Rst) then
      SR1_Load <= '0';
      SR2_Load <= '0';
      Flag_SR1 <= '0';
      Flag_SR2 <= '0';
      AES_Frozen <= '0';
      Queue_Stall <= '0';
    elsif (current_state = SR1_LoadKey) then
      SR1_Load <= '1';
    elsif (current_state = SR2_LoadKey) then
      SR2_Load <= '1';
      SR1_Load <= '0';
      AES_Frozen <= '1';
      Flag_SR1 <= '1';
    elsif (current_state = Wait_Init) then
      SR1_Load <= '0';
      SR2_Load <= '0';
    elsif (current_state = SR1_Load_Norm) then
      SR1_Load <= '1';
      Queue_Stall <= '0';
    elsif (current_state = Wait1_Norm) then
      SR1_Load <= '0';
      SR2_Load <= '0';
      Flag_SR1 <= '0';
      Flag_SR2 <= '1';
    elsif (current_state = SR2_Load_Norm) then
      SR2_Load <= '1';
      Queue_Stall <= '0';
    elsif (current_state = Wait2_Norm) then
      SR1_Load <= '0';
      SR2_Load <= '0';
      Flag_SR1 <= '1';
      Flag_SR2 <= '0';
    elsif (current_state = Resync1) then
      AES_Frozen <= '0';
      SR1_Load <= '1';
      Queue_Stall <= '0';
      Flag_SR1 <= '0';
      Flag_SR2 <= '1';
    elsif (current_state = Resync2) then
      AES_Frozen <= '0';
      SR2_Load <= '1';
      Queue_Stall <= '0';
      Flag_SR1 <= '1';
      Flag_SR2 <= '0';
    elsif (current_state = Resync2_Contd) then
      SR2_Load <= '0';
    elsif (current_state = Resync1_Contd) then
      SR1_Load <= '0';
    elsif (current_state = Queue_Stalled1) then
      Queue_Stall <= '1';
      SR1_Load <= '0';
      Flag_SR1 <= '1';
      Flag_SR2 <= '0';
    elsif (current_state = Queue_Stalled2) then
      Queue_Stall <= '1';
      SR2_Load <= '0';
      Flag_SR1 <= '0';
      Flag_SR2 <= '1';
    elsif (current_state = Queue_Stalled3) then
      Queue_Stall <= '1';
    elsif (current_state = Queue_Stalled4) then
      Queue_Stall <= '1';
    end if;
    -- To make the block cipher generate the key
    -- stream 1 clk1 cycle earlier
    if (next_state = SR2_Load_Norm or
        next_state = SR1_Load_Norm) then
      AES_Frozen <= '0';
    end if;
    -- Constrain the Block Cipher to generate
    -- only one block of new Keystream per clk3
    if (clk3'event and clk3 = '1') then
      if (AES_Frozen = '0' and
          current_state /= On_Rst and
          current_state /= Gen_Key and
          current_state /= SR1_LoadKey) then
        AES_Frozen <= '1';
      end if;
    end if;
  end process combinational_logic;
  -- To handle the "CTR_Func_Enab"
  process (reset, current_state, clk3)
  begin
    if (reset = '1') then
      CTR_Func_Enab <= '0';
    elsif (current_state = On_Rst) then
      CTR_Func_Enab <= '1';
    end if;
    if (clk3'event and clk3 = '1' and CTR_Func_Enab = '1') then
      CTR_Func_Enab <= '0';
    end if;
  end process;
end structural;
A.4 Top Level RTL of Pipelined SCFB System
-- Top-level design of the Pipelined SCFB
-- Author : Liang Zhang
-- July 23rd, 2007
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.numeric_std.ALL;
use work.mypackage.all;
use ieee.std_logic_unsigned.all;
use work.all;
entity SCFB is
  port ( clk3 : in std_logic;
         clk1 : in std_logic;
         clk2 : in std_logic;
         reset : in std_logic;
         aes_init_data_load : in std_logic;
         ivalid : IN std_logic;
         Plaintext_in : IN std_logic_vector(7 downto 0);
         syn_pattern : in std_logic_vector(7 downto 0);
         Num_PQ_Overflow_Bits : out std_logic_vector(11 DOWNTO 0);
         Num_CQ_Underflow_Bits : out std_logic_vector(11 DOWNTO 0);
         aver_Num_bit_in_PQ : out std_logic_vector(32 downto 0);
         PQ_Full : inout std_logic;
         CipherText : inout std_logic_vector(7 downto 0);
         ovalid : OUT std_logic );
end SCFB;
architecture STRUCTURAL of SCFB is
  -- ===== Component Definition =====
  component AES_en_fly_onKey_withCTR is
    PORT(
      clk3 : IN STD_LOGIC;
      rst : IN STD_LOGIC;
      aes_init_data_load : IN STD_LOGIC;
      hold_on : in std_logic;
      AES_Frozen : IN std_logic;
      CTR_Func_Enab : IN std_logic;
      new_IV : IN std_logic_vector(127 downto 0);
      ciphertext : OUT data_type;
      done1 : OUT STD_LOGIC;
      done2 : OUT STD_LOGIC
    );
  end component;
  component Shift_Register_CTR_SCFB is
    port( clk1, reset : in std_logic;
          CQ_Full : in std_logic;
          SR1_Load : in std_logic;
          SR2_Load : in std_logic;
          flag_SR1 : in std_logic;
          flag_SR2 : in std_logic;
          Queue_Stall : in std_logic;
          Key_Stream_In : in data_type;
          Sync_Ref : in std_logic_vector(4 downto 0);
          Blackout_Period : in integer range 0 to 191;
          SR1_Special_Case : inout std_logic;
          SR2_Special_Case : inout std_logic;
          SR1_Fini : inout std_logic;
          SR2_Fini : inout std_logic;
          SR_Valid : out std_logic;
          Key_Stream_Out : inout std_logic_vector(7 downto 0)
    );
  end component;
  component Controller_CTR_SCFB is
    port( clk1 : in std_logic;
          clk3 : in std_logic;
          reset : in std_logic;
          Cipher_Done1 : in std_logic;
          Cipher_Done2 : in std_logic;
          SR1_Fini : in std_logic;
          SR2_Fini : in std_logic;
          Blackout_Period : in integer range 0 to 191;
          SR1_Special_Case : in std_logic;
          SR2_Special_Case : in std_logic;
          CTR_Func_Enab : inout std_logic;
          SR1_Load : out std_logic;
          SR2_Load : out std_logic;
          Flag_SR1 : out std_logic;
          Flag_SR2 : out std_logic;
          AES_Frozen : inout std_logic;
          Queue_Stall : out std_logic
    );
  end component;
  component FIFO_PQ is
    PORT (
      wclk : IN std_logic;
      rclk : IN std_logic;
      rst : IN std_logic;
      ivalid : IN std_logic;
      idata : IN std_logic_vector(7 downto 0);
      SR_Valid : in std_logic;
      PQ_Full : inout std_logic;
      odata : inout std_logic_vector(7 downto 0);
      ovalid : OUT std_logic;
      aver_Num_bit_in_PQ : out std_logic_vector(32 downto 0)
    );
  END component;
  component FIFO_CQ is
    PORT (
      wclk : IN std_logic;
      rclk : IN std_logic;
      rst : IN std_logic;
      ivalid : IN std_logic;
      idata : IN std_logic_vector(7 downto 0);
      odata : OUT std_logic_vector(7 downto 0);
      ovalid : OUT std_logic;
      CQ_Full : inout std_logic
    );
  END component;
  component IV_Queue is
    port ( clk1 : in std_logic;
           reset : in std_logic;
           PQ_Valid : in std_logic;
           IV_in : in std_logic_vector(7 downto 0);
           syn_pattern : in std_logic_vector(7 downto 0);
           SR_pointer : in natural range 0 to 128;
           Last_IV_length_contd : in natural range 0 to 100;
           SR_Valid : in std_logic;
           SR_Load : in std_logic;
           new_IV_done : inout std_logic;
           hold_on_for_PQ_SR : inout std_logic;
           last_IV_notice : inout std_logic;
           IV_out : out std_logic_vector(127 downto 0);
           IV_SR_counter : inout std_logic_vector(4 downto 0);
           Blackout_Period : out integer range 0 to 191;
           sync_ref : out std_logic_vector(4 downto 0)
    );
  end component;
  component PQ_Overflow_Counter is
    PORT(
      clk2 : IN std_logic;
      rst : IN std_logic;
      PQ_Full : IN std_logic;
      Num_PQ_Overflow_Bits : out std_logic_vector(11 DOWNTO 0)
    );
  end component;
  component CQ_Underflow_Counter is
    PORT(
      clk2 : IN std_logic;
      rst : IN std_logic;
      CipherText : IN std_logic_vector(7 downto 0);
      Num_CQ_Underflow_Bits : out std_logic_vector(11 DOWNTO 0)
    );
  end component;
  -- ========= Signal Definition =====
  signal New_IV_Done : std_logic;
  signal AES_Frozen : std_logic;
  signal CTR_Func_Enab : std_logic;
  signal New_IV : std_logic_vector(127 downto 0);
  signal Key_Stream_In : data_type;
  signal Cipher_Done2 : std_logic;
  signal Cipher_Done1 : std_logic;
  signal SR1_Fini : std_logic;
  signal SR2_Fini : std_logic;
  signal Blackout_Period : integer range 0 to 191;
  signal SR1_Special_Case : std_logic;
  signal SR2_Special_Case : std_logic;
  signal SR1_Load : std_logic;
  signal SR2_Load : std_logic;
  signal Flag_SR1 : std_logic;
  signal Flag_SR2 : std_logic;
  signal Queue_Stall : std_logic;
  signal CQ_Full : std_logic;
  signal sync_ref : std_logic_vector(4 downto 0);
  signal SR_Valid : std_logic;
  signal Key_Stream_Out : std_logic_vector(7 downto 0);
  signal Plaintext_out : std_logic_vector(7 downto 0);
  signal CipherKey_PQ_out : std_logic_vector(7 downto 0);
  signal ovalid_PQ : std_logic;
  signal SR_pointer : natural range 0 to 128;
  signal Last_IV_length_contd : natural range 0 to 100;
  signal SR_Load : std_logic;
  signal hold_on_for_PQ_SR : std_logic;
  signal last_IV_notice : std_logic;
  signal IV_SR_counter : std_logic_vector(4 downto 0);
  for all: AES_en_fly_onKey_withCTR use entity work.AES_en_fly_onKey_withCTR;
  for all: Shift_Register_CTR_SCFB use entity work.Shift_Register_CTR_SCFB;
  for all: Controller_CTR_SCFB use entity work.Controller_CTR_SCFB;
  for all: FIFO_PQ use entity work.FIFO_PQ;
  for all: FIFO_CQ use entity work.FIFO_CQ;
  for all: IV_Queue use entity work.IV_Queue;
  for all: PQ_Overflow_Counter use entity work.PQ_Overflow_Counter;
  for all: CQ_Underflow_Counter use entity work.CQ_Underflow_Counter;
begin
  AES_core : AES_en_fly_onKey_withCTR port map
    ( clk3,
      reset,
      aes_init_data_load,
      New_IV_Done,
      AES_Frozen,
      CTR_Func_Enab,
      New_IV,
      Key_Stream_In,
      Cipher_Done1,
      Cipher_Done2 );
  State_Machine_SCFB: Controller_CTR_SCFB port map
    ( clk1,
      clk3,
      reset,
      Cipher_Done1,
      Cipher_Done2,
      SR1_Fini,
      SR2_Fini,
      Blackout_Period,
      SR1_Special_Case,
      SR2_Special_Case,
      CTR_Func_Enab,
      SR1_Load,
      SR2_Load,
      Flag_SR1,
      Flag_SR2,
      AES_Frozen,
      Queue_Stall );
  Shift_Register_SCFB: Shift_Register_CTR_SCFB port map
    ( clk1,
      reset,
      CQ_Full,
      SR1_Load,
      SR2_Load,
      Flag_SR1,
      Flag_SR2,
      Queue_Stall,
      Key_Stream_In,
      sync_ref,
      Blackout_Period,
      SR1_Special_Case,
      SR2_Special_Case,
      SR1_Fini,
      SR2_Fini,
      SR_Valid,
      Key_Stream_Out );
  FIFO_PQ_component: FIFO_PQ port map
    ( clk2,
      clk1,
      reset,
      ivalid,
      Plaintext_in,
      SR_Valid,
      PQ_Full,
      Plaintext_out,
      ovalid_PQ,
      aver_Num_bit_in_PQ );
  FIFO_CQ_component: FIFO_CQ port map
    ( clk1,
      clk2,
      reset,
      ovalid_PQ,
      CipherKey_PQ_out,
      CipherText,
      ovalid,
      CQ_Full );
  IV_ShiftR: IV_Queue port map
    ( clk1,
      reset,
      ovalid_PQ,
      CipherKey_PQ_out,
      syn_pattern,
      SR_pointer,
      Last_IV_length_contd,
      SR_Valid,
      SR_Load,
      New_IV_Done,
      hold_on_for_PQ_SR,
      last_IV_notice,
      New_IV,
      IV_SR_counter,
      Blackout_Period,
      sync_ref );
  PQOverflow: PQ_Overflow_Counter port map
    ( clk2,
      reset,
      PQ_Full,
      Num_PQ_Overflow_Bits );
  CQUnderflow: CQ_Underflow_Counter port map
    ( clk2,
      reset,
      CipherText,
      Num_CQ_Underflow_Bits );
  CipherKey_PQ_out <= Plaintext_out XOR Key_Stream_Out;
end STRUCTURAL;




