A hardware implementation of Rivest-Shamir-Adleman co-processor for resource constrained embedded systems by Paniandi, Arul
  
CHAPTER 1 
 
 
 
 
INTRODUCTION 
 
 
 
 
This thesis proposes the design and implementation of an RSA cryptographic 
co-processor on an FPGA. The design applies System-on-Chip (SoC) technology to 
produce an RSA cryptosystem that performs encryption, decryption, and key 
generation. The aim is to produce an RSA co-processor that strikes a balance 
between speed and area so that it is both compact and fast enough for commercial 
implementation. This first chapter covers the research background, problem 
statement, research objectives, scope of work, significance and contribution of the 
research, and finally the thesis organization. 
 
 
 
 
1.1 Background 
 
 
The use of mobile electronic devices such as smart cards, wireless handsets, 
PDAs, PCs, and network equipment has become more prevalent since the turn of 
the new millennium. Their applications cover almost every aspect of human 
life, including important fields such as commerce and personal identification. 
These embedded systems are ubiquitously used to capture, store, manipulate, and 
exchange sensitive information over insecure media, and consequently they are 
subject to increasing security concerns.  
 
 
 
 
  
This concern can be addressed effectively by applying cryptographic 
algorithms in these devices. Security mechanisms use cryptographic algorithms 
(public-key ciphers, symmetric encryption, hashing functions, etc.) as building 
blocks in a suitable scheme to achieve the desired security services. The 
fundamental security requirements include confidentiality, authentication, data 
integrity, and non-repudiation. To provide such security services, systems 
normally use public key cryptography. Among the various public key cryptography 
algorithms, the RSA cryptosystem [Rivest et al, 1978] is the best known and most 
widely used today. It is named after Ron Rivest, Adi Shamir and Len Adleman, 
who invented it in 1977.  
 
 
Since RSA is the current de facto public key crypto algorithm, numerous 
implementations of RSA have been developed throughout the world. Two main 
approaches are pursued: software implementations and hardware 
implementations. Software solutions are slower than hardware implementations 
because they run on processors that are not dedicated to the RSA operation. To 
achieve optimal system performance while maintaining physical security, it is 
desirable to implement the RSA algorithm in hardware. Hardware implementations 
can also be made tamper-resistant and clone-free.  
 
 
 
 
1.2 Problem Statement 
 
 
Public key cryptosystems have proved to be essential to the security of 
electronic transactions, especially with the sudden boom in electronic commerce and 
the transmission of secure personal data. Since their invention in 1976 by Whitfield 
Diffie and Martin Hellman [1976] to solve the key management problem in 
symmetric key cryptography, various public key cryptosystems such as RSA, 
ElGamal and ECC have been proposed. Public key cryptography can be used not only 
for privacy (encryption), but for authentication as well. Its drawback 
is that it is much slower than symmetric key cryptography. 
 
 
  
As the RSA algorithm provides high security and is easy to implement, it 
quickly became the most widely used public key cryptosystem. Its advantage is that 
it is able to provide privacy, confidentiality and digital signatures using the same key 
pair and the same mathematical operation. However, due to its underlying 
complex wide-operand modular arithmetic, the RSA operation requires a long 
computation time. Software implementations of RSA are about 100 times slower 
than DES, while hardware implementations of RSA are about 1000 times slower than 
DES (Schneier, 1996). 
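
As an illustration of the "same mathematical operation" mentioned above, the 
following minimal sketch shows modular exponentiation by the left-to-right binary 
(square-and-multiply) method, applied to both encryption/decryption and 
signing/verification with one key pair. The function name mod_exp and the toy 
parameters (p = 61, q = 53, e = 17, d = 2753) are assumptions chosen for 
illustration only; they are far too small for real use and are not taken from the 
design presented in this thesis. 

```python
def mod_exp(base, exponent, modulus):
    """Left-to-right binary (square-and-multiply) modular exponentiation."""
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    # Scan the exponent bits from most significant to least significant.
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # square on every bit
        if bit == "1":
            result = (result * base) % modulus    # multiply when the bit is set
    return result


# Toy key pair (illustrative assumption, not a secure key size).
p, q = 61, 53
n = p * q            # public modulus
e = 17               # public exponent
d = 2753             # private exponent, e * d = 1 mod (p - 1)(q - 1)

message = 65

# Encryption and decryption use the same operation with different exponents.
ciphertext = mod_exp(message, e, n)
recovered = mod_exp(ciphertext, d, n)

# Signing and verification reuse the same key pair and the same operation.
signature = mod_exp(message, d, n)
verified = mod_exp(signature, e, n)

print(recovered == message, verified == message)
```

For a 1024-bit key the operands in every squaring and multiplication are 1024 bits 
wide, which is the source of the long computation time noted above. 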
 
 
Due to increasing data rates and the complexity of security protocols, software 
solutions are not sufficient to keep up with the computational demands of crypto 
processing. Thus, hardware implementation presents a viable way to implement an 
RSA cryptosystem. Unfortunately, due to its underlying complex wide-operand 
modular arithmetic, the implementation of RSA in hardware poses a design 
challenge in itself. Coupled with the high speed requirement, the design 
challenge increases dramatically when the resource constraints 
of mobile electronic devices are also taken into account. 
 
 
Although a plethora of RSA cryptosystems in hardware exists, most of them 
are tailored to high-speed applications and thus do not offer a suitable compromise 
between speed and utilized hardware resources. As hardware resources are 
cost-critical factors in devices like smart cards and hardware tokens, current 
implementations of RSA cores are unsuitable for them.  
 
 
Therefore, a compact yet reasonably fast RSA co-processor core is much 
needed to facilitate the adoption of cryptographic functions in mobile devices. The 
RSA co-processor core design should strike a good balance between speed 
and resource utilization. The design should also be parameterized so that it can be 
scaled up or down from 1024 bits, giving either a more compact implementation with 
some compromise to the level of security, or a larger design with higher security. 
 
 
  
This flexibility in design cannot be provided by full custom and semi 
custom ASIC solutions. However, reconfigurable logic such as FPGAs and CPLDs can 
provide it. In hardware implementation, the FPGA has become the 
chosen platform for proof-of-concept designs before they are committed to an 
ASIC (Application-Specific Integrated Circuit) or VLSI implementation. In addition, 
FPGAs allow for rapid prototyping, which makes them suitable for 
implementations of crypto hardware on embedded systems. 
 
 
 
 
1.3 Objectives 
 
 
From the discussion in the previous sections, the objectives of the work 
presented in this thesis are as follows: 
 
 
1) To design and implement a 1024-bit RSA core that is able to perform RSA 
encryption and decryption within stipulated area and speed constraints. The 
design also has to be parameterizable so that it can be reconfigured for 
different key lengths.  
 
 
2) To design an embedded RSA cryptosystem that integrates the RSA core with 
an embedded processor on a System-on-Programmable Chip (SoPC) 
platform. 
 
 
3) To develop a prototype that demonstrates real-world RSA cryptography and 
serves as a verification system in a PC environment through the use of a 
Graphical User Interface (GUI). A simple file encryption system is developed 
as the demonstration application prototype. 
 
 
 
 
 
 
 
  
1.4 Scope of Work 
 
 
Based on the outlined objectives above, available hardware and software 
resources, and the time frame allocated, this research project is narrowed down to the 
following scope of work. 
 
 
1) As specified by the research objectives, a hardware implementation of 
1024-bit RSA must consist of approximately 50,000 gates and must be able to 
perform the RSA encryption and decryption operation in less than 100 ms. 
Similarly, a 2048-bit RSA implementation must consist of approximately 
100,000 gates and must be able to perform the RSA encryption and 
decryption operation in less than 400 ms (MyMS, 2004). 
 
2) The RSA co-processor, henceforth known as UTM-RSA_CoProcessor, is 
designed using VHDL. The design must be parameterizable so that the co-
processor can be reconfigured to other key sizes, based on the security level 
and the hardware resources required by targeted applications. 
 
3) The UTM-RSA_CoProcessor is integrated with the Nios II embedded 
processor to form the RSA Processor. The proposed RSA Processor is to fit 
into an Altera Stratix EP1S40F780C5 FPGA chip (which contains 41,250 LEs 
(Logic Elements) or an equivalent of 14 x 10^6 system gates). The running 
frequency of the proposed cryptosystem with the RSA Processor is limited to 
40 MHz.  
 
4) The proposed RSA cryptosystem must be able to generate the RSA key pairs 
on chip, which means the RSA keys do not need to leave the embedded 
system. However, the issue of secure storage of the keys generated or used in 
the cryptosystem will not be addressed. (In actual applications like the Public 
Key Infrastructure, the public key is generated by a Certification Authority.) 
 
 
 
  
5) The test and validation methodologies are carried out to verify the functional 
operations of the RSA Processor. Cryptanalysis techniques to measure the 
security level of the embedded system will not be covered in this work. 
 
6) A simple file encryption system is developed to validate the RSA 
cryptosystem. The current version is able to encrypt/decrypt a file of 
no more than 4 GB. A file larger than this needs to be 
split into multiple smaller files.  
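
To give a feel for how tight the timing targets in item 1 are at the clock limit in 
item 3, the following rough sketch converts each time limit into a clock-cycle 
budget. The per-multiplication figure rests on an assumption that is not stated in 
this chapter: a plain binary exponentiation with a full-length exponent, which in 
the worst case needs two modular multiplications per exponent bit. The names 
CLOCK_HZ, TARGETS_MS, cycle_budget and per_multiplication_budget are 
hypothetical. 

```python
CLOCK_HZ = 40_000_000  # upper limit on running frequency (scope item 3)

# Key size in bits -> time limit in milliseconds (scope item 1).
TARGETS_MS = {1024: 100, 2048: 400}


def cycle_budget(clock_hz, limit_ms):
    """Clock cycles available within the time limit."""
    return clock_hz * limit_ms // 1000


def per_multiplication_budget(key_bits, total_cycles):
    """Cycles per modular multiplication.

    Assumption: binary exponentiation with a key_bits-long exponent,
    worst case of one squaring plus one multiplication per bit.
    """
    return total_cycles // (2 * key_bits)


for key_bits, limit_ms in TARGETS_MS.items():
    total = cycle_budget(CLOCK_HZ, limit_ms)
    per_mult = per_multiplication_budget(key_bits, total)
    print(f"{key_bits}-bit: {total} cycles total, about {per_mult} per multiplication")
```

Under these assumptions the 1024-bit target leaves 4,000,000 cycles in total, or 
roughly 1,950 cycles for each 1024-bit modular multiplication. A lower clock 
frequency or a different exponentiation method would change these figures. 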
 
 
 
 
1.5 Research Strategies 
 
 
 The following research strategies have been applied during the course of 
the research to ensure that it is complete and of good quality.  
 
 
1. The speed and area constraints are set based on the problems and stringent 
requirements demanded by industries in the commercial environment, which 
in turn increases the design challenge many times. 
 
2. The established RSA algorithms are studied and the necessary algorithmic 
modifications (without changing the actual algorithm itself) are determined 
for efficient mapping of the algorithm onto hardware. 
 
3. The designed RSA co-processor (UTM-RSA_CoProcessor) is integrated with 
a general-purpose embedded processor to obtain a complete RSA Processor 
on a System-on-Programmable Chip (SoPC) platform. 
 
4. An application demonstration prototype is developed as the means to perform 
the RSA cryptosystem’s verification on real-world test patterns. 
 
 
 
 
 
 
  
1.6 Research Contribution and Project Delivery 
 
 
1) A comprehensive technique for the design of an RSA core limited by 
computation speed and design area constraints, for application in resource 
constrained embedded systems. 
 
 
2) Design of a complete embedded RSA cryptosystem that incorporates the 32-bit 
RISC general-purpose embedded Nios II processor. Besides performing 
encryption and decryption, it is also able to perform on-chip RSA key 
generation. 
 
3) An application demonstration prototype performing a real-world application 
that incorporates the UTM-RSA_CoProcessor and the Nios II processor to 
form the RSA Processor, and communicating with the standard PC to form 
the RSA Cryptosystem. Figure 1.1 below shows the system architecture of 
the proposed RSA cryptosystem. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 1.1 : System Architecture of Proposed RSA Cryptosystem 
 
 
 
 
 
 
  
1.7 Thesis Organization 
 
 
The work in this thesis is conveniently organized into eight chapters. The first 
chapter presents the motivation and research objectives and follows through with 
research scope and research contribution before concluding with thesis organization. 
 
 
 The second chapter provides brief summaries of the literature reviewed prior 
to engaging the mentioned scope of work. Several topics related to this research are 
reviewed to give an overall picture of the background knowledge involved. A summary 
of the literature review is given to clarify the research rationale. 
 
 
 Chapter three presents the design methodologies that are employed. 
 
 
 Chapter four focuses on the discussion of the implemented RSA algorithm, 
specifically the modular exponentiation and modular multiplication algorithms. This 
is followed by an outline of the algorithmic modifications necessary for better 
hardware implementation. 
 
 
 Chapter five gives a detailed description of the design of the RSA core 
based on the modified algorithms. First, a top-level view of the RSA cryptosystem is 
given before the design of each module is presented using both top-down and 
bottom-up approaches. 
 
 
 Chapter six explains the design of the RSA cryptosystem. First the design of 
the interface module for the RSA core is presented, followed by the development of 
the device drivers and embedded subroutines, the APIs and finally the RSA File 
Encryption Cryptosystem.  
 
 
 Chapter seven presents the tests that are carried out to verify the RSA 
cryptosystem. First, the hardware simulations of individual modules are presented. 
These are followed by tests on the cryptosystem using embedded software.  
  
 In the final chapter of the thesis, the research work is summarized and 
the deliverables of the research are stated. Suggestions for potential extensions and 
improvements to the design are also given. 
 
 
 
 
1.8 Summary 
 
 
In this chapter, an introduction was given to the background and motivation 
of the project. The need for a compact yet fast hardware implementation of the RSA 
algorithm was pointed out. Based on this need, several objectives were identified and 
the scope of the project was set to achieve the desired implementation. The 
UTM-RSA_CoProcessor was proposed to perform RSA computations on resource 
constrained embedded systems. The following chapter will discuss the literature 
relevant to the research and look into previous work on the 
design of RSA hardware.  
 
 
 
 
