Bit-parallel word-serial polynomial basis finite field multiplier in GF(2(233)). by Tang, Wenkai
University of Windsor 
Scholarship at UWindsor 
Electronic Theses and Dissertations Theses, Dissertations, and Major Papers 
2004 
Recommended Citation 
Tang, Wenkai, "Bit-parallel word-serial polynomial basis finite field multiplier in GF(2(233))." (2004). 
Electronic Theses and Dissertations. 1310. 
https://scholar.uwindsor.ca/etd/1310 
This online database contains the full-text of PhD dissertations and Masters’ theses of University of Windsor 
students from 1954 forward. These documents are made available for personal study and research purposes only, 
in accordance with the Canadian Copyright Act and the Creative Commons license—CC BY-NC-ND (Attribution, 
Non-Commercial, No Derivative Works). Under this license, works must always be attributed to the copyright holder 
(original author), cannot be used for any commercial purposes, and may not be altered. Any other use would 
require the permission of the copyright holder. Students may inquire about withdrawing their dissertation and/or 
thesis from this database. For additional inquiries, please contact the repository administrator via email 
(scholarship@uwindsor.ca) or by telephone at 519-253-3000 ext. 3208.
Bit-Parallel Word-Serial Polynomial Basis Finite Field Multiplier in GF(2^233)




Submitted to the Faculty of Graduate Studies and Research through the 
Department of Electrical and Computer Engineering in Partial Fulfillment 
of the Requirements for the Degree of Master of Applied Science at the
University of Windsor
Windsor, Ontario, Canada 
May, 2004






ISBN: 0-612-92454-8 
The author has granted a non-exclusive licence allowing the National Library of Canada to
reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial
extracts from it may be printed or otherwise reproduced without the author's permission.
In compliance with the Canadian Privacy Act some supporting forms may have been removed from
this dissertation. While these forms may be included in the document page count, their removal
does not represent any loss of content from the dissertation.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
©  May, 2004 Wenkai Tang
All Rights Reserved. No part of this document may be reproduced, stored or otherwise
retained in a retrieval system or transmitted in any form, on any medium by
any means without prior written permission of the author.
Abstract
Smart cards are widely used as cryptographic hardware in everyday security applications.
The characteristics of a smart card require that the cryptographic hardware inside it
balance area against speed.
There are two main public key cryptosystems: the RSA cryptosystem and the elliptic
curve (EC) cryptosystem. EC has several advantages over RSA, such as a shorter key
length and better suitability for VLSI implementation. These advantages make EC an
ideal candidate for smart cards.
The finite field multiplier is the key component in EC hardware. In this thesis, bit-parallel
word-serial (BPWS) polynomial basis (PB) finite field multipliers are designed. Such
architectures trade off area against speed and are very useful for smart cards.
An ASIC chip that performs finite field multiplication and finite field squaring
using the BPWS PB finite field multiplier is designed in this thesis. The proposed
circuit has been implemented using TSMC 0.18 µm CMOS technology.
A novel 8 × 233 bit-parallel partial product generator is also designed. This new
partial product generator has low circuit complexity. The design algorithm can be
easily extended to a w × m bit-parallel partial product generator for GF(2^m).
To my parents for their constant encouragement, my wife for her support and my 
daughter Xina.
Acknowledgements
I would like to express my sincere gratitude to my graduate supervisors Dr. Huapeng 
Wu and Dr. Majid Ahmadi for their constant support, guidance and motivation. I 
am also grateful to the committee members Dr. Kemal E. Tepe and Dr. Jessica Chen 
for providing valuable feedback at all times.
I would like to thank Till Kuendiger for helpful support on the utilization of VLSI 
CAD tools. I would also like to thank Minyi Fu, Bijan Ansari, Zheng Li and Zhong 
Zheng for helpful discussions and suggestions.





Contents
List of Figures
List of Tables
Abbreviations
1 Introduction
1.1 Research motivations
1.1.1 Smart card and its applications
1.1.2 Cryptography and cryptosystems
1.1.3 Elliptic curve cryptography (ECC)
1.2 Research goals
1.3 Thesis organization
2 Arithmetic over Finite Field
2.1 Group, ring, field and finite field
2.1.1 Group
2.1.2 Ring
2.1.3 Field
2.1.4 Finite field
2.2 Finite field element representations
2.2.1 Polynomial basis
2.3 Finite field operation
2.3.1 Addition
2.3.2 Multiplication
2.3.3 Comparisons among the multiplications with different basis
2.4 Galois type linear feedback shift register (LFSR)
2.5 Elliptic curve
2.6 Polynomial basis (PB) finite field multipliers
2.6.1 Bit-parallel PB finite field multipliers
2.6.2 Bit-serial PB finite field multipliers
2.6.3 Bit-parallel PB finite field squarer
2.7 Summary
3 Design of Bit-Parallel Word-Serial PB Finite Field Multipliers
3.1 NIST recommendations
3.2 Design of BPWS PB finite field multiplier
3.2.1 Multiplication algorithm
3.2.2 Bit-parallel word-serial multiplier architecture
3.2.3 M3: Constant finite field multiplier Z = x^8 Y
3.2.4 M1: 8 × 233 bit-parallel partial product generator
3.3 Alternative BPWS PB finite field multiplier
3.4 General BPWS PB finite field multipliers
3.5 Comparisons
3.6 Summary
4 Hardware Design
4.1 Hardware architecture
4.2 Hardware specifications
4.3 VLSI implementation technology and design flow
4.4 Front-end design
4.4.1 Stimuli files
4.4.2 Hardware modeling
4.4.3 Logical synthesis
4.4.4 DFT synthesis
4.5 Back-end design
4.5.1 Floorplanning and placement
4.5.2 Clock tree synthesis
4.5.3 Golden netlist
4.5.4 Routing
4.6 Physical verification and modification
4.6.1 Layout versus schematic (LVS)
4.6.2 Design rule checking (DRC)
4.7 Chip layout
4.8 Comparisons
4.9 Summary
5 Summaries of Contributions
A Program 1
List of Figures
1.1 Smart card
2.1 Galois type LFSR for a trinomial F(x)
2.2 MSB first bit-serial PB finite field multiplier for a trinomial F(x)
2.3 LSB first bit-serial PB finite field multiplier for a trinomial F(x)
3.1 Proposed hybrid finite field multiplier
3.2 M3: The constant finite field multiplier Z = x^8 Y
3.3 8 × 233 bit-parallel partial product generator
3.4 The architecture of the general constant finite field multiplier
3.5 The architecture of AND network
3.6 The architecture of XOR network
3.7 The architecture of sub XOR network
3.8 Alternative BPWS PB finite field multiplier over GF(2^233)
3.9 The BPWS PB finite field multiplier in GF(2^m)
3.10 The alternative BPWS PB finite field multiplier in GF(2^m)
4.1 The schematic of the hardware
4.2 CMC digital design flow
4.3 Illustration of placement
4.4 Functional multiplication test
4.5 Waveform of functional multiplication test
4.6 Functional squaring test
4.7 Waveform of functional squaring test
4.8 Timing limit checking
4.9 The result of LVS
4.10 The result of phantom level DRC
4.11 The result of standard DRC from CMC
4.12 The result of antenna DRC from CMC
4.13 The layout of the chip
List of Tables
2.1 Addition rule for GF(2)
2.2 Multiplication rule for GF(2)
2.3 Close form representation of the product coefficient c_i
2.4 The summary for MSB and LSB bit-serial finite field multipliers
2.5 Close form representation for the squaring coefficient c_i
3.1 NIST recommendations
3.2 The output and intermediate results upon each clock cycle
3.3 Circuit and timing complexities of the 8 × 233 partial product generator
3.4 Circuit and timing complexities of the BPWS PB finite field multiplier
3.5 The values of output and other modules on each clock cycle
3.6 The circuit and timing complexities of alternative BPWS PB finite field multiplier
3.7 The comparisons among bit-parallel, bit-serial and BPWS finite field multipliers
4.1 Specifications
4.2 Results of logic synthesis
4.3 The hardware parameters
4.4 Comparisons among VLSI implementation of finite field multipliers
Abbreviations
AOP All One Polynomial
ASIC Application-Specific Integrated Circuit
ATPG Automatic Test Pattern Generation
BPWS Bit-Parallel Word-Serial
CMC Canadian Microelectronics Corporation
CMOS Complementary Metal Oxide Semiconductor
DFT Design For Testability
DRC Design Rule Check
DSS Digital Signature Standard
EC Elliptic Curve
ECC Elliptic Curve Cryptography
ECDLP Elliptic Curve Discrete Logarithm Problem
GF Galois Field (Finite Field)
HDL Hardware Description Language
LFSR Linear Feedback Shift Register
LSB Least Significant Bit
LSW Least Significant Word
LVS Layout Versus Schematic
MSB Most Significant Bit
MSW Most Significant Word
NIST National Institute of Standards and Technology
PB Polynomial Basis
RC Resistance-Capacitance
RSA Rivest, Shamir, and Adleman (public key encryption technology)
RSPF Reduced Standard Parasitic Format
RTL Register Transfer Level
SOC System On Chip
TSMC Taiwan Semiconductor Manufacturing Company
Chapter 1
Introduction
This chapter first introduces the research motivations, then states our research goals
and outlines the thesis organization.
1.1 Research motivations
Our research originates from smart card applications.
1.1.1 Smart card and its applications
A smart card is a credit-card-sized plastic card embedded with an integrated circuit
(IC) chip. It provides not only memory capacity but also computational capability.
The self-containment of a smart card makes it resistant to attack, as it does
not need to depend upon potentially vulnerable external resources. Because of this
characteristic, smart cards are often used in security applications that
require strong security protection and authentication.
The success of smart cards in Europe began in the early eighties: between 1982
and 1984, Carte Bancaire (the French Bank Card Group) ran the first smart card
pilot [9]. Together with Bull (a French company), Philips (an international
company) and Schlumberger (an international company), Carte Bancaire launched
trials in the French cities of Blois, Caen, and Lyon. The trials were a great success.
Following these trials, French banks launched the use of smart cards for banking.
This was the first mass rollout of smart cards in the banking industry.
Today, smart cards are used for many different purposes in daily life. A smart
card can be a phone card, used to make local or long-distance calls from
a phone booth. It can act as an identification card that proves the identity
of the card holder, for example a campus access card; in Finland, smart cards
are used as the Finnish National Electronic Identity (FINEID) cards. It can be
a medical card that stores the medical history of a person. Furthermore, it can
be used as a credit/debit bank card that allows off-line transactions. In the
near future, the traditional magnetic stripe cards will be replaced and
integrated into a single multi-application smart card, known in the smart card
industry as an electronic purse or wallet. All of these applications require
sensitive data to be stored in the card, such as biometric information about
the card owner, personal medical history, and cryptographic keys for
authentication.
A smart card [9], shown in Figure 1.1, consists of a microprocessor, ROM (Read
Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory),
and RAM (Random Access Memory).
Today's smart cards have approximately the same computing power as the first
IBM PC [9]. At present, most smart cards have an inexpensive 8-bit microprocessor,
but high-end cards can have a 16-bit or 32-bit processor. An optional cryptographic
coprocessor (security processor) increases the performance of cryptographic
operations. The working frequency of a smart card is normally 5 MHz. The RAM size
of most smart cards is 256 bytes to 1 kilobyte. The chip size is at most 25 mm^2.
A smart card also has a card operating system and may hold some applications [9].
Figure 1.1: Smart card (legible labels: 5 MHz, 5 V; optional coprocessor; RAM, 256 bytes to 1 KB; EEPROM, 1 KB to 16 KB, file system)
Since the working frequency is relatively low, the memory inside the smart card
is very limited, and the card operating system is not designed for security
purposes, a software implementation of any security application in a smart card
is normally very slow and considered insecure. The high load of cryptographic
computations is usually handled by the cryptographic coprocessor. Because of
these features and the area constraint, which prevents the chip from being made
very large, the coprocessor hardware inside the smart card should trade off
area against speed.
1.1.2 Cryptography and cryptosystems
Cryptography and cryptosystems are used extensively in all kinds of security applications.
Cryptography is the study of mathematical techniques related to aspects of
information security such as confidentiality, data integrity, and data origin
authentication. Cryptosystems can normally be classified into two groups: symmetric
cryptosystems [1] and asymmetric cryptosystems [1] (also called public key
cryptosystems).
Symmetric cryptosystems use the same key to encrypt and decrypt information. 
Implementations of symmetric key encryption/decryption can be highly efficient, so 
that users do not experience any significant time delay as a result of the encryption 
and decryption. However, symmetric cryptosystems have a key security problem:
it is generally very difficult to transmit the secret key from the sender
to the recipient securely and in a tamperproof fashion. If anyone else discovers the 
key, it affects both confidentiality and authentication. A person with an unauthorized 
symmetric key not only can decrypt messages sent with that key, but can also encrypt 
new messages and send them as if they came from one of the two parties who were 
originally using the key. Frequently, trusted couriers are used as a solution to this 
problem. A more efficient and reliable solution is a public key cryptosystem.
Public key cryptosystems involve a pair of keys (a public key and a private 
key) which are associated with an entity that needs to authenticate its identity or 
to sign or to encrypt data. Such public key cryptosystems have the abilities to 
perform the functions of key exchange, digital signature, encryption and decryption. 
Nowadays there are two main public key cryptosystems: RSA cryptosystems
and elliptic curve (EC) cryptosystems.
1.1.3 Elliptic curve cryptography (ECC)
In 1985, N. Koblitz [11] and V.S. Miller [16] independently proposed elliptic curves
(EC) for public key cryptosystems. Their proposal, however, was not considered
a new cryptographic algorithm with elliptic curves over finite fields, as they
implemented existing algorithms, like Diffie-Hellman, using elliptic curves [23].
Over the past two decades, elliptic curves have been well researched by many
scholars. These cryptosystems need a much shorter key than RSA cryptosystems to provide
the same security strength. It appears that an elliptic curve cryptosystem
implemented over the 160-bit field GF(2^160) currently offers roughly the same resistance to
side channel attack as would a 1024-bit RSA [20], and an elliptic curve cryptosystem
over a 136-bit field GF(2^136) gives roughly the same security as 768-bit RSA [22].
The basic operations in RSA cryptosystems are integer modular operations, while
in EC cryptosystems the basic operations are finite field operations. When the elliptic
curve is over the finite field GF(2^m), the implementation of EC cryptosystems saves
more hardware resources than RSA cryptosystems, since the field elements in GF(2^m)
can be represented by m-bit binary numbers, and binary numbers are natively
supported by computer arithmetic. All these advantages make EC an ideal candidate
for smart card applications.
The finite field operations in EC cryptosystems can be broken down into finite field
additions, multiplications, squarings and inversions. Finite field addition can be
implemented simply with XOR gates and is normally considered almost free. These finite
field adders are carry-free, and thus faster than ordinary ripple-carry adders. Finite
field inversion can be further broken down into finite field multiplications and finite
field squarings [26, 4], and finite field squaring is a special case of finite field
multiplication. Thus, the finite field multiplier becomes the key component in EC hardware.
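As a minimal sketch of the two observations above (addition is a carry-free XOR, and inversion reduces to multiplications and squarings), the following Python fragment works in the toy field GF(2^4) generated by x^4 + x + 1. The field size, the polynomial and all function names are assumptions chosen for illustration, not the hardware design of this thesis. The inversion uses Fermat's little theorem, a^(-1) = a^(2^m - 2), which is only one of several known methods.

```python
M = 4            # toy field degree (assumption; the thesis targets m = 233)
F = 0b10011      # x^4 + x + 1, an irreducible polynomial over GF(2)


def gf_add(a, b):
    """Addition in GF(2^m): coefficient-wise XOR, no carry propagation."""
    return a ^ b


def gf_mul(a, b):
    """Shift-and-add multiplication with reduction modulo F(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add the current multiple of a
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> M:           # degree reached m: reduce by XORing F(x)
            a ^= F
    return result


def gf_square(a):
    """Squaring treated as a special case of multiplication."""
    return gf_mul(a, a)


def gf_inv(a):
    """Inverse as a^(2^m - 2), using only squarings and multiplications."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    result = 1
    power = gf_square(a)             # a^2
    for _ in range(M - 1):           # a^2 * a^4 * ... * a^(2^(m-1))
        result = gf_mul(result, power)
        power = gf_square(power)
    return result
```

The exponent 2^m - 2 equals 2 + 4 + ... + 2^(m-1), which is why the loop multiplies together m - 1 successive squarings of a.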
A number of finite field multiplier architectures have been proposed with different
emphases for various security applications. Full bit-parallel finite field multipliers [28,
12, 13, 14, 15, 19, 25, 27] yield high throughput, while bit-serial finite field multipliers
[2, 8, 24, 27] need only a small area. These finite field multiplier architectures can satisfy
nearly all security applications. However, full bit-parallel finite field multipliers
are still too large for smart cards because of the chip area constraint. On
the other hand, a bit-serial finite field multiplier is too slow, since the smart card
clock frequency is low and too many clock cycles are needed to perform one
multiplication. A hybrid bit-parallel word-serial (BPWS) finite field
multiplier architecture may therefore be needed to balance the trade-off between area and speed.
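The cycle-count side of this trade-off can be illustrated in software. The sketch below is a generic Horner-style model, not the architecture proposed in this thesis: it compares an MSB-first bit-serial multiplication (m steps) with a word-serial one that consumes w bits of the multiplier per step (ceil(m/w) steps). The reduction polynomial x^233 + x^74 + 1 is the NIST trinomial for GF(2^233) and is assumed here; the word size w = 8 and all function names are likewise illustrative assumptions.

```python
M = 233
F = (1 << 233) | (1 << 74) | 1   # x^233 + x^74 + 1 (assumed reduction polynomial)
W = 8                            # word size in bits (assumption)


def reduce_mod_f(v):
    """Reduce a polynomial over GF(2), stored as an int, modulo F(x)."""
    while v.bit_length() > M:
        v ^= F << (v.bit_length() - 1 - M)
    return v


def clmul(a, b):
    """Carry-less (polynomial) product of a and b over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def mul_bit_serial(a, b):
    """MSB-first bit-serial model: one bit of b per step, m steps."""
    c = 0
    for i in reversed(range(M)):
        c = reduce_mod_f(c << 1)           # C = x * C
        if (b >> i) & 1:
            c ^= a                         # C = C + A
    return c


def mul_word_serial(a, b, w=W):
    """MSW-first word-serial model: w bits of b per step, ceil(m/w) steps."""
    steps = -(-M // w)
    c = 0
    for k in reversed(range(steps)):
        word = (b >> (k * w)) & ((1 << w) - 1)
        partial = reduce_mod_f(clmul(a, word))   # w x m partial product
        c = reduce_mod_f(c << w) ^ partial       # C = x^w * C + A * B_k
    return c
```

With m = 233 and w = 8 the word-serial loop runs 30 times instead of 233, at the cost of a wider partial product computed in each step.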
1.2 Research goals
One of our research goals is to design a new hybrid BPWS finite field multiplier
architecture for smart cards; such a finite field multiplier should trade off area against
speed. The final goal is to design an ASIC (Application-Specific Integrated Circuit)
chip that performs finite field multiplication using this BPWS architecture and
finite field squaring using a bit-parallel finite field squarer.
1.3 Thesis organization
Chapter 2 introduces the basic concepts of finite fields, finite field element
representations, finite field operations and a few state-of-the-art polynomial basis
finite field multiplier architectures.
Chapter 3 designs a BPWS finite field multiplier architecture and develops a novel
8 × 233 bit-parallel partial product generator. An alternative BPWS
finite field multiplier architecture and the general architectures are also introduced in
this chapter.
Chapter 4 presents the design of an ASIC chip that combines the BPWS finite field
multiplier with a full bit-parallel squarer. The VLSI implementation
technology is TSMC 0.18 µm CMOS technology, and the design flow is the CMC digital
design flow. The results at each design stage are also shown in this chapter.
Chapter 5 presents a summary and conclusions of this research and suggests future
work.
Chapter 2
Arithmetic over Finite Field
In this chapter, the concepts of field, irreducible polynomial, finite field and finite
field element representations are introduced. Furthermore, the elliptic curve (EC) is
introduced, and the reason the finite field multiplier is so important for EC is explained.
Using this background knowledge, several state-of-the-art polynomial basis finite field
multiplier architectures are discussed. At the end of this chapter, a bit-parallel finite
field squarer is introduced.
2.1 Group, ring, field and finite field
2.1.1 Group
A group [31] (G, *) is defined as a set G together with a binary operation * :
G × G → G. We write "a * b" for the result of applying the operation * to the
two elements a and b of G. To have a group, * must satisfy the following axioms:
• Associativity: For all a, b and c in G, (a * b) * c = a * (b * c).
• Identity element: There is an element e in G such that for all a in G, e * a =
a = a * e.
• Inverse element: For all a in G, there is an element b in G such that a * b = e =
b * a, where e is the identity element from the previous axiom.
• Closure: For all a and b in G, a * b belongs to G.
An abelian group is a group (G, *) that is commutative, i.e., a * b = b * a holds for
all elements a and b in G.
Examples
1. The set of integers under addition forms a group and also an abelian group.
2. The set of nonzero rational numbers under multiplication is a group and also 
an abelian group.
3. The set of integers under multiplication is NOT a group, since, for example, 2 has
no multiplicative inverse among the integers.
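The axioms above can be checked by brute force on small finite sets. The following sketch is an illustration only: the function names are hypothetical, and the finite sets of residues modulo n stand in for the infinite examples listed above. The residues modulo 6 under addition form an abelian group, while the same set under multiplication fails the inverse axiom, mirroring example 3.

```python
from itertools import product


def is_group(elements, op):
    """Brute-force check of closure, associativity, identity and inverses."""
    elements = list(elements)
    members = set(elements)
    # Closure: a * b must stay inside the set.
    if any(op(a, b) not in members for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a * b) * c == a * (b * c).
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity element: e * a == a == a * e for all a.
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverse element: every a needs some b with a * b == e == b * a.
    return all(any(op(a, b) == e == op(b, a) for b in elements)
               for a in elements)


def is_abelian(elements, op):
    """A group that additionally satisfies a * b == b * a."""
    elements = list(elements)
    return is_group(elements, op) and all(
        op(a, b) == op(b, a) for a, b in product(elements, repeat=2))


def add_mod6(a, b):
    return (a + b) % 6


def mul_mod6(a, b):
    return (a * b) % 6


def mul_mod7(a, b):
    return (a * b) % 7
```

For instance, `is_abelian(range(6), add_mod6)` holds, whereas `is_group(range(6), mul_mod6)` does not, because 0, 2, 3 and 4 have no inverse modulo 6.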
2.1.2 Ring
A ring [32] is an abelian group (R, +), together with a second binary operation * such
that for all a, b, and c in R,
a * (b * c) = (a * b) * c
a * (b + c) = (a * b) + (a * c)
(a + b) * c = (a * c) + (b * c)
and such that there exists a multiplicative identity, or unity, that is, an element 1 such
that for all a in R,
a * 1 = 1 * a = a
The identity element with respect to + is called the zero element of the ring and is
written as 0.
A commutative ring is a ring in which the multiplication operation obeys the
commutative law, i.e., if a and b are any elements of the ring, then a * b = b * a.
Examples
The integers, rational numbers, real numbers and complex numbers under addition and
multiplication are all examples of rings.
2.1.3 Field
A field [33] is a commutative ring (F, +, *) such that the additive identity element 0 does
not equal the multiplicative identity 1 and all elements of F except 0 have a multiplicative
inverse. Besides the above axioms of group and ring, a field also obeys the following
rules:
• Existence of an additive identity
There exists an element 0 in F such that for all a belonging to F, a + 0 = a.
• Existence of a multiplicative identity
There exists an element 1 in F different from 0 such that for all a belonging to
F, a * 1 = a.
• Existence of multiplicative inverses
For every a ≠ 0 belonging to F, there exists an element a^(-1) in F such that
a * a^(-1) = 1.
Examples
Some examples of fields are listed below:
• The rational numbers Q = {a/b | a, b in Z, b ≠ 0}, where Z is the set of
integers.
• The real numbers R.
• The complex numbers C.
• The smallest field has only two elements: 0 and 1. It is sometimes denoted by
F_2 or GF(2). It has important uses in cryptography and coding theory.
2.1.4 Finite field
A finite field, also called a Galois field (named in honor of Evariste Galois), is
a field that contains only a finite number of elements.
All finite fields have prime characteristic. The number of elements (the order)
of a finite field is always a prime or a power of a prime [3].
• If p is a prime, the integers modulo p form a field with p elements, denoted by
GF(p). Every other field with p elements is isomorphic to this one.
• If q = p^m is a prime power, then there exists up to isomorphism exactly one
field with q elements, written as GF(q) or GF(p^m).
The finite field used in this thesis is GF(2^m). When we say finite field in this
thesis, we refer to GF(2^m).
The finite field $GF(2^m)$ can be defined (or generated) by an irreducible polynomial F(x)
of degree m with its coefficients in GF(2),
$$F(x) = x^m + f_{m-1}x^{m-1} + \dots + f_1 x + 1,$$
where $f_i \in GF(2)$, for i = 1, 2, ..., m - 1.
The elements of this finite field can be treated as polynomials of degree n (0 ≤ n < m)
with coefficients in GF(2), or as m-bit binary numbers.
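The finite field GF(2) consists of only two elements, 0 and 1, and satisfies the addition
and multiplication rules shown in Table 2.1 and Table 2.2.
+   0   1
0   0   1
1   1   0
Table 2.1: Addition rule for GF(2)
*   0   1
0   0   0
1   0   1
Table 2.2: Multiplication rule for GF(2)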
2.2 Finite field element representations
Just as a vector in linear algebra can be expressed with respect to different bases of its
vector space, a field element can be represented with respect to different bases. Three
main bases are used to represent the elements of $GF(2^m)$: the polynomial basis, the
normal basis and the dual bases.
2.2.1 Polynomial basis
Let x be a root of the irreducible polynomial F(x) which generates the finite field
$GF(2^m)$; then $\{1, x, x^2, \dots, x^{m-1}\}$ forms a polynomial basis. Any element A in
the finite field can be represented as
$$A = \sum_{i=0}^{m-1} a_i x^i = (a_0, a_1, a_2, \dots, a_{m-1}), \quad \text{where } a_i \in GF(2).$$
The normal basis and the dual bases are the other two main bases. A detailed discussion
can be found in [2].
2.3 Finite field operation
Given a finite field $GF(2^m)$ which is generated by an irreducible polynomial
$F(x) = x^m + f_{m-1}x^{m-1} + \dots + f_1 x + 1$, where $f_i \in GF(2)$ for i = 1, 2, ..., m - 1, let
A and B be any two elements in $GF(2^m)$ and $\{1, x, x^2, \dots, x^{m-1}\}$ be the polynomial
basis. A and B can be expressed as
$$A = \sum_{i=0}^{m-1} a_i x^i, \qquad B = \sum_{i=0}^{m-1} b_i x^i,$$
where $a_i, b_i \in GF(2)$ for i = 0, 1, ..., m - 1.
2.3.1 Addition
Let S be the sum of A and B and S be expressed as
$$S = \sum_{i=0}^{m-1} s_i x^i,$$
where $s_i \in GF(2)$, for i = 0, 1, ..., m - 1, then
$$S = A + B = \sum_{i=0}^{m-1} a_i x^i + \sum_{i=0}^{m-1} b_i x^i = \sum_{i=0}^{m-1} (a_i + b_i) x^i.$$
Thus, we can get
$$s_i = a_i + b_i, \qquad (2.1)$$
where $s_i, a_i, b_i \in GF(2)$, for i = 0, 1, ..., m - 1.
The addition expressed in Formula 2.1 obeys the addition rule for GF(2) which is
described in Table 2.1 and can be implemented by an XOR gate. Hence, the addition
in $GF(2^m)$ can be implemented by m XOR gates.
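As a small software illustration of Formula 2.1, the sketch below stores a field element as a Python integer whose bit i holds the coefficient of $x^i$. This encoding and the function name `gf2m_add` are assumptions made for the example only; they do not come from the thesis.

```python
def gf2m_add(a: int, b: int) -> int:
    """Add two GF(2^m) elements; bit i of each integer is the coefficient of x^i."""
    # Coefficient-wise addition in GF(2) is XOR (one XOR gate per coefficient).
    return a ^ b


a = 0b1011  # x^3 + x + 1
b = 0b0110  # x^2 + x
print(bin(gf2m_add(a, b)))  # 0b1101, i.e. x^3 + x^2 + 1
```

Because every element is its own additive inverse, subtraction is the same operation as addition in this representation.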
2.3.2 Multiplication
Let C be the product of A and B and C be expressed as
$$C = \sum_{i=0}^{m-1} c_i x^i,$$
where $c_i \in GF(2)$, for i = 0, 1, ..., m - 1, then
$$C = AB = \sum_{i=0}^{m-1} \sum_{j=0}^{m-1} a_i b_j x^{i+j} \bmod F(x), \qquad (2.2)$$
where i, j = 0, 1, ..., m - 1. Formula 2.2 involves two operations. One is polynomial
multiplication, which is straightforward; the other is the reduction modulo the
irreducible polynomial F(x). When the irreducible polynomial is a trinomial, the
coefficients $c_i$ have a closed-form expression in terms of $\{a_i\}$ and $\{b_i\}$ [28] which will be
introduced later.
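The two steps of Formula 2.2 can be sketched in software as follows. This is a minimal illustration, assuming field elements are stored as Python integers (bit i is the coefficient of $x^i$) and using the small field $GF(2^4)$ with $F(x) = x^4 + x + 1$ purely as an example; the function name is hypothetical.

```python
def gf2m_mul(a: int, b: int, f: int) -> int:
    """Multiply a and b in GF(2^m); f encodes F(x), bit i = coefficient of x^i."""
    m = f.bit_length() - 1  # degree of F(x)
    # Step 1: polynomial multiplication over GF(2); partial products are XORed.
    p = 0
    for j in range(m):
        if (b >> j) & 1:
            p ^= a << j
    # Step 2: reduction modulo F(x), clearing x^(2m-2) down to x^m.
    for i in range(2 * m - 2, m - 1, -1):
        if (p >> i) & 1:
            p ^= f << (i - m)
    return p


F = 0b10011  # x^4 + x + 1, irreducible over GF(2)
print(bin(gf2m_mul(0b0010, 0b1000, F)))  # x * x^3 = x^4 = x + 1 -> 0b11
```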
Finite field squaring is a special case of finite field multiplication. Let C be the
square of A and C be expressed as
$$C = \sum_{i=0}^{m-1} c_i x^i,$$
where $c_i \in GF(2)$, for i = 0, 1, ..., m - 1, then
$$C = \sum_{i=0}^{m-1} c_i x^i = A^2 \bmod F(x) = \sum_{i=0}^{m-1} a_i x^{2i} \bmod F(x) = \sum_{i=0}^{2m-2} a'_i x^i \bmod F(x),$$
where $a'_i$ is given by
$$a'_i = \begin{cases} a_{i/2} & \text{if } i \text{ is even;} \\ 0 & \text{if } i \text{ is odd.} \end{cases} \qquad (2.3)$$
When F(x) is an irreducible trinomial, the coefficients $c_i$ have closed-form
representations and the architecture of the finite field squarer is much simpler than that
of the finite field multiplier. We will discuss this in detail later.
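Formula 2.3 says that squaring only spreads the bits of A to the even positions before the reduction. A minimal sketch, assuming the integer encoding (bit i is the coefficient of $x^i$) and the example field $GF(2^4)$ with $F(x) = x^4 + x + 1$, neither of which is taken from the thesis:

```python
def gf2m_square(a: int, f: int) -> int:
    """Square a in GF(2^m) by bit spreading (Formula 2.3) and reduction."""
    m = f.bit_length() - 1
    # a'_i = a_{i/2} for even i and 0 for odd i: move bit i of a to bit 2i.
    s = 0
    for i in range(m):
        if (a >> i) & 1:
            s |= 1 << (2 * i)
    # Reduce the degree-(2m-2) polynomial modulo F(x).
    for i in range(2 * m - 2, m - 1, -1):
        if (s >> i) & 1:
            s ^= f << (i - m)
    return s


F = 0b10011  # x^4 + x + 1
print(bin(gf2m_square(0b1011, F)))  # (x^3 + x + 1)^2 = x^3 + 1 -> 0b1001
```

No AND gates are needed because there are no cross terms: in characteristic 2 the products $a_i a_j x^{i+j}$ with i ≠ j appear twice and cancel.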
Detailed discussions of finite field multiplication based on the normal basis and the
dual bases can be found in [2, 10, 27].
2.3.3 Comparisons among the multiplications with different bases
Three finite field multipliers, based on the dual basis, the normal basis and the
polynomial basis respectively, were compared by I.S. Hsu et al. in [10].
The dual basis multiplier occupies the smallest chip area in a VLSI implementation
if the basis conversion is not included. The area of the normal basis multiplier,
however, grows dramatically as the order of the field goes up. The polynomial
basis multiplier does not require basis conversion and is readily matched to any input
or output system; its design and its expansion to higher order finite fields are easier to
realize than those of the dual or normal basis multipliers.
2.4 Galois type linear feedback shift register (LFSR)
Galois type LFSRs are widely used in bit-serial finite field multiplier architectures.
A Galois type LFSR is simple, and its architecture can be easily obtained from the
irreducible polynomial F(x) which generates $GF(2^m)$. Figure 2.1 shows the Galois type
LFSR architecture when the irreducible polynomial is $F(x) = x^m + x^k + 1$.
A Galois type LFSR serves as a constant multiplier, i.e., if the current value of the Galois
type LFSR is A, the value of this Galois type LFSR during the next clock cycle will
be xA.
Figure 2.1: Galois type LFSR when $F(x) = x^m + x^k + 1$
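The behaviour of such an LFSR over one clock cycle can be modelled in a few lines. The sketch below assumes a trinomial $F(x) = x^m + x^k + 1$ and stores the register contents as an integer (bit i is the coefficient of $x^i$); the function name `lfsr_step` is hypothetical.

```python
def lfsr_step(state: int, m: int, k: int) -> int:
    """One clock of a Galois type LFSR for F(x) = x^m + x^k + 1: A -> xA mod F(x)."""
    msb = (state >> (m - 1)) & 1             # bit that is shifted out
    state = (state << 1) & ((1 << m) - 1)    # shift every bit one position up
    if msb:
        state ^= (1 << k) | 1                # feedback taps: x^m = x^k + 1
    return state


# In GF(2^4) with F(x) = x^4 + x + 1: x^3 -> x^4 = x + 1
print(bin(lfsr_step(0b1000, 4, 1)))  # 0b11
```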
2.5 Elliptic curve
Elliptic curve cryptography (ECC) was proposed by Victor Miller and Neal Koblitz
in the mid 1980s [11, ?]. An elliptic curve (EC) over $GF(2^m)$ has the following form:
$$E : y^2 + xy = x^3 + a_2 x^2 + a_0 \qquad (2.4)$$
where $a_0$ and $a_2$ are elements of the finite field $GF(2^m)$ and E represents the elliptic
curve.
The elliptic curve is the set of points (x, y) which are the solutions to Formula 2.4
together with an extra point O which is called the point at infinity. The coordinate
values x and y of a point are also elements of $GF(2^m)$. The number of such
points is finite.
The set of points on an EC forms a group under a certain addition rule (also
called the addition law), which is written using the notation +. The point O is the
identity element of the group.
Given a point P = (x, y) and a positive integer t, we define [t]P = P + P + ... +
P (t times). The order of a point P = (x, y) is the smallest positive integer n such
that [n]P = O.
We denote by <P> the group generated by P, i.e.
<P> = {O, P, P + P, P + P + P, ..., P + P + ... + P (n - 1 times)}
The security of ECC relies on the elliptic curve discrete logarithm problem (ECDLP):
Let E be an elliptic curve over $GF(2^m)$, let P be a point on the elliptic curve, and let Q
be a point in <P>. Finding an integer l such that Q = [l]P is the ECDLP.
It is widely believed that the ECDLP is computationally hard to solve when the
point P has large prime order.
Point operations on an EC conform to the addition law which is defined below.
Assume $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2) \in E$ and $P_1 + P_2 = P_3 = (x_3, y_3)$. When $P_1 \neq P_2$, we define
$$x_3 = \left(\frac{y_1 + y_2}{x_1 + x_2}\right)^2 + \frac{y_1 + y_2}{x_1 + x_2} + x_1 + x_2 + a_2, \qquad y_3 = \left(\frac{y_1 + y_2}{x_1 + x_2}\right)(x_1 + x_3) + x_3 + y_1, \qquad (2.5)$$
and when $P_1 = P_2$,
$$x_3 = x_1^2 + \frac{a_0}{x_1^2}, \qquad y_3 = x_1^2 + \left(x_1 + \frac{y_1}{x_1}\right) x_3 + x_3. \qquad (2.6)$$
When $P_1 \neq P_2$, we can find the point $P_3$ using Formula 2.5; this is called point
addition. When $P_1 = P_2$, we can obtain $P_3$ using Formula 2.6; this is called
point doubling. All the arithmetic operations in these two formulae are finite field
operations.
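To make the addition law concrete, the following sketch evaluates point addition and point doubling over the toy field $GF(2^4)$ with $F(x) = x^4 + x + 1$. The field, the curve constants (`A2`, `A0`, with `A0` denoting the constant term of the curve equation), the helper names and the brute-force inversion are all assumptions chosen for illustration; the sketch also ignores the cases that produce the point at infinity ($x_1 = x_2$ in addition, $x_1 = 0$ in doubling).

```python
F, M = 0b10011, 4  # toy field GF(2^4), F(x) = x^4 + x + 1 (illustration only)


def mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^4) with interleaved reduction."""
    r = 0
    for i in range(M - 1, -1, -1):
        r <<= 1
        if (r >> M) & 1:
            r ^= F
        if (b >> i) & 1:
            r ^= a
    return r


def inv(a: int) -> int:
    """Brute-force inverse; adequate for a 16-element field."""
    return next(b for b in range(1, 1 << M) if mul(a, b) == 1)


def on_curve(P, a2, a0) -> bool:
    x, y = P
    return (mul(y, y) ^ mul(x, y)) == (mul(mul(x, x), x) ^ mul(a2, mul(x, x)) ^ a0)


def point_add(P1, P2, a2):
    """Point addition for P1 != P2, assuming x1 != x2."""
    (x1, y1), (x2, y2) = P1, P2
    lam = mul(y1 ^ y2, inv(x1 ^ x2))
    x3 = mul(lam, lam) ^ lam ^ x1 ^ x2 ^ a2
    y3 = mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)


def point_double(P1, a0):
    """Point doubling, assuming x1 != 0."""
    x1, y1 = P1
    x1sq = mul(x1, x1)
    x3 = x1sq ^ mul(a0, inv(x1sq))
    y3 = x1sq ^ mul(x1 ^ mul(y1, inv(x1)), x3) ^ x3
    return (x3, y3)


A2, A0 = 1, 1  # hypothetical curve constants
points = [(x, y) for x in range(16) for y in range(16) if on_curve((x, y), A2, A0)]
print(len(points), "affine points on the toy curve")
```

Every operation inside `point_add` and `point_double` is a field addition (XOR), multiplication, squaring or inversion, which is the decomposition the text relies on.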
From Formula 2.5 and Formula 2.6 we can see that point operations can always
be broken into finite field multiplications, finite field squarings, finite
field inversions and finite field additions. Finite field addition in $GF(2^m)$ can
be implemented by m XOR gates, so it is considered almost
free. We usually calculate the finite field inversion by means of the extended Euclidean
algorithm or Fermat's theorem [26, 4]. From Fermat's theorem, the inverse of any nonzero
field element A can be obtained by the following formula
$$A^{-1} = A^{2^m - 2}. \qquad (2.7)$$
Equation 2.7 can be further broken into finite field multiplications and finite field
squarings. Thus, the finite field inversion can be obtained from finite field
multiplication and finite field squaring. Since finite field squaring is a special case of
finite field multiplication and, in addition, the finite field squarer is much simpler than
the finite field multiplier (as will be seen later), the finite field multiplier is our focus.
As discussed in Section 2.3.3, the polynomial basis multiplier has advantages
over the other basis multipliers, so the polynomial basis (PB) finite field multiplier is the
focus in this thesis.
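Equation 2.7 can be unrolled using $2^m - 2 = 2 + 4 + \dots + 2^{m-1}$, so that $A^{-1}$ is the product of $A^{2}, A^{4}, \dots, A^{2^{m-1}}$. The sketch below is one straightforward way to do this (m - 1 squarings and m - 2 non-trivial multiplications); it is an illustration under the integer encoding assumed here, not necessarily the schedule used in [26, 4], and the function names are hypothetical.

```python
def gf2m_mul(a: int, b: int, f: int) -> int:
    """Shift-and-add multiplication in GF(2^m) with interleaved reduction."""
    m = f.bit_length() - 1
    r = 0
    for i in range(m - 1, -1, -1):
        r <<= 1
        if (r >> m) & 1:
            r ^= f
        if (b >> i) & 1:
            r ^= a
    return r


def gf2m_inv(a: int, f: int) -> int:
    """Inverse of a nonzero element via A^(2^m - 2) = A^2 * A^4 * ... * A^(2^(m-1))."""
    m = f.bit_length() - 1
    result, power = 1, a
    for _ in range(m - 1):
        power = gf2m_mul(power, power, f)    # squaring: A^(2^i)
        result = gf2m_mul(result, power, f)  # accumulate the product
    return result


F = 0b10011  # x^4 + x + 1
print(bin(gf2m_inv(0b1011, F)))  # inverse of x^3 + x + 1 is x^2 + 1 -> 0b101
```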
2.6 Polynomial basis (PB) finite field multipliers
A number of PB finite field multipliers have been proposed [2, 6, 8, 13, 14, 25, 21, 
28, 24]. Two typical kinds of PB finite field multipliers are bit-parallel PB finite field 
multipliers and bit-serial PB finite field multipliers.
2.6.1 Bit-parallel PB finite field multipliers
Many bit-parallel finite field multipliers have been proposed so far.
A bit-parallel systolic multiplier has been proposed in [13] for $GF(2^m)$ using the
polynomial basis representation. The finite field is generated by the irreducible
trinomial $x^m + x^n + 1$ of degree m. The permutation polynomial and Horner's algorithm
are applied to create a low complexity systolic multiplier. The circuit includes
2-input AND gates, $m^2 + m - 1$ 2-input XOR gates and $3m^2 + 2m - 2$ 1-bit latches.
The latency of the systolic multiplier over $GF(2^m)$ is only 2m - 1 clock cycles with
a throughput rate of one result per clock cycle.
In [14], a bit-parallel systolic AOP-based (All One Polynomial based) multiplier for
$GF(2^m)$ has been presented. The architectures of the two AOP-based multipliers can
also be adopted to implement ESP-based (Equal Space Polynomial based) multipliers.
In [25], an architecture based on a new formulation of the multiplication matrix
is described and circuit complexities are analyzed when the finite field is generated
by the trinomial $x^m + x^n + 1$.
In [21], a new bit-parallel structure for a multiplier with low complexity in Galois
fields is introduced. The multiplier operates over composite fields $GF((2^n)^m)$, with
k = nm. It is shown that this operation has complexity of order $O(k^{\log_2 3})$ under
certain constraints regarding k.
A bit-parallel finite field multiplier based on polynomial basis is discussed in [28].
Let A and B be any two field elements represented in the polynomial basis as follows:
$$A = \sum_{i=0}^{m-1} a_i x^i, \qquad B = \sum_{i=0}^{m-1} b_i x^i,$$
and let $s_i$, i = 0, 1, ..., 2m - 2, denote the coefficients of their polynomial product
before reduction,
$$S = \sum_{i=0}^{2m-2} s_i x^i.$$
When F(x) is a trinomial, i.e. $F(x) = x^m + x^k + 1$, the coefficients $c_i$ of the product
C = AB mod F(x) have the closed-form representations shown in Table 2.3.
$F(x) = x^m + x + 1$:
    $c_0 = s_0 + s_m$
    $c_i = s_i + s_{m+i-1} + s_{m+i}$,   i = 1, 2, ..., m - 2
    $c_{m-1} = s_{m-1} + s_{2m-2}$
$F(x) = x^m + x^k + 1$, 1 < k < m/2:
    $c_i = s_i + s_{m+i} + s_{2m-k+i}$,   i = 0, 1, ..., k - 2
    $c_{k-1} = s_{k-1} + s_{m+k-1}$
    $c_i = s_i + s_{m+i} + s_{m-k+i} + s_{2m-2k+i}$,   i = k, ..., 2k - 2
    $c_i = s_i + s_{m+i} + s_{m-k+i}$,   i = 2k - 1, ..., m - 2
    $c_{m-1} = s_{m-1} + s_{2m-k-1}$
$F(x) = x^m + x^{m/2} + 1$:
    $c_i = s_i + s_{m+i} + s_{3m/2+i}$,   i = 0, 1, ..., m/2 - 2
    $c_{m/2-1} = s_{m/2-1} + s_{3m/2-1}$
    $c_i = s_i + s_{m/2+i}$,   i = m/2, ..., m - 2
    $c_{m-1} = s_{m-1} + s_{3m/2-1}$
Table 2.3: Closed-form representation of the product coefficient $c_i$
2.6.2 Bit-serial PB finite field multipliers
Thomas Beth et al. presented two basic architectures for PB bit-serial finite field
multipliers in [2]. Leilei Song et al. have a similar design in [24], and Johann Großschädl
applies the same idea with a low-power implementation in [8]. All the bit-serial PB finite
field multipliers use Galois type LFSRs.
Most significant bit (MSB) first bit-serial PB finite field multiplier
The architecture of the MSB first bit-serial finite field multiplier [2] is very simple.
Figure 2.2 shows the architecture when the irreducible polynomial is $F(x) = x^6 + x + 1$.
A Galois type LFSR is used in Figure 2.2. The initial value of this Galois
type LFSR is 0. One operand B is input in parallel. The other operand A is input in
serial: upon each clock cycle, one bit $a_i$ of A is fed into the circuit, most significant
bit (MSB) first. The final result can be obtained from
the outputs $c_i$ after 6 clock cycles. It takes 6 clock cycles to perform one finite field
multiplication using this MSB first bit-serial finite field multiplier.
Figure 2.2: MSB first bit-serial PB finite field multiplier when $F(x) = x^6 + x + 1$
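A behavioural model of the MSB first scheme may help: on every clock the LFSR contents are multiplied by x, and the operand B is XORed in when the incoming bit of A is 1. The sketch assumes $F(x) = x^6 + x + 1$ (so six clock cycles) and the integer encoding with bit i as the coefficient of $x^i$; the function name is hypothetical and the model abstracts away the gate-level wiring.

```python
def msb_first_multiply(a: int, b: int, m: int = 6, f: int = 0b1000011) -> int:
    """Behavioural model of an MSB first bit-serial multiplier, F(x) = x^6 + x + 1."""
    c = 0                               # the LFSR is initialised to 0
    for i in range(m - 1, -1, -1):      # one clock cycle per bit of A, MSB first
        c <<= 1                         # LFSR step: c <- x * c mod F(x)
        if (c >> m) & 1:
            c ^= f
        if (a >> i) & 1:                # AND gates form a_i * B, XORed into the LFSR
            c ^= b
    return c


print(bin(msb_first_multiply(0b100000, 0b000010)))  # x^5 * x = x^6 = x + 1 -> 0b11
```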
Least significant bit (LSB) first bit-serial PB finite field multiplier
The above MSB first PB finite field multiplier has an alternative form, the LSB
first PB finite field multiplier [2]. When the irreducible polynomial is still
$F(x) = x^6 + x + 1$, the architecture of the LSB first bit-serial PB multiplier is shown in
Figure 2.3.
There is a Galois type LFSR in this architecture and it is initially set to one of
the operands. The other operand A is input in serial, least significant bit (LSB) first.
There is one additional 6-bit register which is initially set
to 0. The final product can be obtained from the outputs $c_i$ after 6 clock cycles.
Like the MSB first bit-serial finite field multiplier, 6 clock cycles are needed in
order to perform one finite field multiplication using this LSB architecture.
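The LSB first variant can be modelled in the same spirit: the LFSR holds $B, xB, x^2B, \dots$ on successive clocks while a separate register accumulates the terms selected by the bits of A. As before, $F(x) = x^6 + x + 1$, the integer encoding and the function name are assumptions of this sketch.

```python
def lsb_first_multiply(a: int, b: int, m: int = 6, f: int = 0b1000011) -> int:
    """Behavioural model of an LSB first bit-serial multiplier, F(x) = x^6 + x + 1."""
    lfsr = b                   # LFSR initialised with operand B
    acc = 0                    # additional register, initially 0
    for i in range(m):         # one clock cycle per bit of A, LSB first
        if (a >> i) & 1:       # accumulate a_i * (x^i * B)
            acc ^= lfsr
        lfsr <<= 1             # LFSR step: lfsr <- x * lfsr mod F(x)
        if (lfsr >> m) & 1:
            lfsr ^= f
    return acc


print(bin(lsb_first_multiply(0b100000, 0b000010)))  # x^5 * x = x + 1 -> 0b11
```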
Figure 2.3: LSB first bit-serial PB finite field multiplier when $F(x) = x^6 + x + 1$
Assume all the AND gates and XOR gates have two inputs, the delay of the
AND gate is $T_A$ and the delay of the XOR gate is $T_X$. When $GF(2^m)$ is generated by the
irreducible trinomial $F(x) = x^m + x^k + 1$, we can summarize the circuit complexity and
speed to perform one multiplication for the above bit-serial PB finite field multipliers
in Table 2.4.
Multiplier Speed (clock cycle) Circuit complexity Critical path
MSB first m
m  AND gates 




m  AND gates 
m  + 1 XOR gates 
2 m-bit registers
$T_A + T_X$
Table 2.4: The summary for MSB and LSB bit-serial finite field multipliers
2.6.3 Bit-parallel PB finite field squarer
Finite field squaring is a special case of finite field multiplication. The architecture of
a bit-parallel PB finite field squarer is much simpler than that of a bit-parallel PB finite
field multiplier. A trinomial based bit-parallel PB finite field squarer is introduced in
[28].
Suppose that $GF(2^m)$ is generated by the irreducible polynomial F(x) over GF(2);
an arbitrary field element A can be expressed in the polynomial basis as
$$A = \sum_{i=0}^{m-1} a_i x^i.$$
Let C be the square of A; we have
$$C = \sum_{i=0}^{m-1} c_i x^i = A^2 \bmod F(x) = \sum_{i=0}^{m-1} a_i x^{2i} \bmod F(x) = \sum_{i=0}^{2m-2} a'_i x^i \bmod F(x),$$
where $a'_i$ is given by
$$a'_i = \begin{cases} a_{i/2} & \text{if } i \text{ is even;} \\ 0 & \text{if } i \text{ is odd.} \end{cases} \qquad (2.8)$$
When F(x) is an irreducible trinomial, i.e., $F(x) = x^m + x^k + 1$, where $1 \le k \le m/2$,
the coefficient $c_i$ has closed-form representations which are summarized in Table 2.5.
$F(x) = x^m + x + 1$, m even:
    $c_i = a'_i + a'_{m+i}$,   i = 0, 2, ..., m - 2
    $c_i = a'_{m+i-1}$,   i = 1, 3, ..., m - 1
$F(x) = x^m + x + 1$, m odd:
    $c_0 = a'_0$
    $c_i = a'_{m+i}$,   i = 1, 3, ..., m - 2
    $c_i = a'_i + a'_{m+i-1}$,   i = 2, 4, ..., m - 1
$F(x) = x^m + x^k + 1$, k even, m odd:
    $c_i = a'_i + a'_{2m-k+i}$,   i = 0, 2, ..., k - 2
    $c_i = a'_{m+i}$,   i = 1, 3, ..., k - 1
    $c_i = a'_i + a'_{2m-2k+i}$,   i = k, k + 2, ..., 2k - 2
    $c_i = a'_{m+i} + a'_{m-k+i}$,   i = k + 1, k + 3, ..., m - 2
    $c_i = a'_i$,   i = 2k, 2k + 2, ..., m - 1
$F(x) = x^m + x^k + 1$, k odd, m odd:
    $c_i = a'_i$,   i = 0, 2, ..., k - 1
    $c_i = a'_{m+i} + a'_{2m-k+i}$,   i = 1, 3, ..., k - 2
    $c_i = a'_i + a'_{m-k+i} + a'_{2m-2k+i}$,   i = k + 1, k + 3, ..., 2k - 2
    $c_i = a'_{m+i}$,   i = k, k + 2, ..., m - 2
    $c_i = a'_i + a'_{m-k+i}$,   i = 2k, 2k + 2, ..., m - 1
$F(x) = x^m + x^k + 1$, k odd, m even:
    $c_i = a'_i + a'_{m+i}$,   i = 0, 2, ..., k - 1
    $c_i = a'_{2m-k+i}$,   i = 1, 3, ..., k - 2
    $c_i = a'_i + a'_{m+i} + a'_{2m-2k+i}$,   i = k + 1, k + 3, ..., 2k - 2
    $c_i = a'_{m-k+i}$,   i = k, k + 2, ..., m - 1
    $c_i = a'_i + a'_{m+i}$,   i = 2k, 2k + 2, ..., m - 2
$F(x) = x^m + x^{m/2} + 1$:
    $c_i = a'_i + a'_{m+i}$,   i = 0, 2, ..., m/2 - 1
    $c_i = a'_{3m/2+i}$,   i = 1, 3, ..., m/2 - 2
    $c_i = a'_i$,   i = m/2 + 1, m/2 + 3, ..., m - 2
    $c_i = a'_{m/2+i}$,   i = m/2, m/2 + 2, ..., m - 1
Table 2.5: Closed-form representation for the squaring coefficient $c_i$
2.7 Summary
Some basic concepts about finite fields, field element representation and finite field
arithmetic are introduced in the first three sections. In Section 2.5, elliptic curve
cryptography is briefly touched upon and we see that the finite field multiplier is the
basic key component in elliptic curve hardware. Two typical architectures of finite
field multipliers are introduced in Section 2.6. Finally, a bit-parallel
PB finite field squarer is described.
In the next chapter, we will design bit-parallel word-serial PB finite field multiplier
architectures which offer a trade-off between gate count (area) and speed.
Chapter 3 
Design of Bit-Parallel 
Word-Serial PB Finite Field 
Multipliers
In this chapter, the accepted standard for EC cryptosystems is introduced first.
Next, the bit-parallel word-serial (BPWS) PB finite field multiplier is designed.
An alternative form of the BPWS PB finite field multiplier architecture is introduced
in the following section. At the end of this chapter, general forms of BPWS PB
multiplier architectures and comparisons are presented.
It is known that ECC devices require less storage, less power, less memory, and
less bandwidth than other systems [29, 22, 34]. This allows implementation of
cryptography on constrained platforms, such as wireless devices, handheld
computers and smart cards. Several organizations such as NIST (National Institute of
Standards and Technology), ANSI and IEEE have standardized ECC [34]. NIST issues
standards that are mandatory for US Federal Government agencies to follow. NIST
recommends not only key establishment schemes and key management in Special
Publications 800-56 and 800-57 for EC cryptosystems, but also the digital signature
standard (DSS) in Federal Information Processing Standards (FIPS) 186-2 [29] with elliptic
curve domain parameters. We follow the NIST recommendations in this thesis, as they
are widely adopted.
3.1 NIST recommendations
The National Institute of Standards and Technology (NIST) recommends five finite fields,
generated by five irreducible polynomials, for EC cryptosystems [29]. These
five polynomials are shown in Table 3.1.
Degree   Irreducible polynomial F(x)
163      $F(x) = x^{163} + x^7 + x^6 + x^3 + 1$
233      $F(x) = x^{233} + x^{74} + 1$
283      $F(x) = x^{283} + x^{12} + x^7 + x^5 + 1$
409      $F(x) = x^{409} + x^{87} + 1$
571      $F(x) = x^{571} + x^{10} + x^5 + x^2 + 1$
Table 3.1: NIST recommendations
In this thesis, $GF(2^{233})$, which is generated by the irreducible trinomial
$F(x) = x^{233} + x^{74} + 1$, is our choice, since $GF(2^{233})$ can satisfy the security
requirements for smart card applications and the irreducible trinomial can significantly
reduce the circuit complexity.
3.2 Design of BPWS PB finite field Multiplier
3.2.1 Multiplication algorithm
Let the irreducible polynomial be
$$F(x) = x^{233} + x^{74} + 1,$$
then the polynomial basis for $GF(2^{233})$ can be given as $\{1, x, \dots, x^{232}\}$.
Let $A, B \in GF(2^{233})$ be any two field elements and C be their product. We can write
A and B as
$$A = \sum_{i=0}^{232} a_i x^i \quad \text{and} \quad B = \sum_{i=0}^{232} b_i x^i.$$
Then the product C is
$$C = AB \bmod F(x). \qquad (3.1)$$
We can divide the operand A into 30 groups (words), starting from the least significant
bit of A, and let each word contain 8 bits. In the 30th word, we append seven "0"s as the
most significant seven bits. This can be shown as follows,
$$A = (\underbrace{0, 0, 0, 0, 0, 0, 0, a_{232}}_{A_{29}}, \underbrace{a_{231}, a_{230}, \dots, a_{224}}_{A_{28}}, \dots, \underbrace{a_{15}, a_{14}, \dots, a_8}_{A_1}, \underbrace{a_7, a_6, \dots, a_0}_{A_0}),$$
where $A_j$ is the j-th word, for j = 0, 1, ..., 29.
Let us denote $A_j$ as
$$A_j = a_{8j+7}x^7 + a_{8j+6}x^6 + \dots + a_{8j+1}x + a_{8j}, \qquad (3.2)$$
for j = 0, 1, 2, ..., 29, where $a_i = 0$ for i = 233, 234, ..., 239.
Then A can be expressed as
$$A = (\dots(A_{29}x^8 + A_{28})x^8 + \dots + A_1)x^8 + A_0. \qquad (3.3)$$
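Thus, the product C can be expressed in the following formula,
$$\begin{aligned} C &= AB \bmod F(x) \\ &= ((\dots(A_{29}x^8 + A_{28})x^8 + \dots + A_1)x^8 + A_0)B \bmod F(x) \\ &= (\dots(A_{29}Bx^8 + A_{28}B)x^8 + \dots + A_1B)x^8 + A_0B \bmod F(x) \end{aligned} \qquad (3.4)$$
Let
$$C_j = C_{j-1}x^8 + D_j, \quad \text{for } j = 0, 1, \dots, 29, \qquad (3.5)$$
where
$$D_j = A_{29-j}B, \quad \text{for } j = 0, 1, \dots, 29. \qquad (3.6)$$
If we assume $C_{-1} = 0$, with all products reduced modulo F(x), then it can be seen from
(3.4), (3.5) and (3.6) that
$$C = C_{29}. \qquad (3.7)$$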
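The recurrence (3.5)-(3.7) can be checked in software with a short behavioural model. The sketch assumes field elements are stored as Python integers (bit i is the coefficient of $x^i$); the helper names `clmul`, `reduce_mod_f` and `bpws_mul` are hypothetical and the model says nothing about the gate-level structure.

```python
M, K, W = 233, 74, 8                 # F(x) = x^233 + x^74 + 1, 8-bit words
F = (1 << M) | (1 << K) | 1


def clmul(a: int, b: int) -> int:
    """Polynomial (carry-less) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_mod_f(p: int) -> int:
    """Reduce a polynomial of any degree modulo F(x)."""
    for i in range(p.bit_length() - 1, M - 1, -1):
        if (p >> i) & 1:
            p ^= F << (i - M)
    return p


def bpws_mul(a: int, b: int) -> int:
    """Word-serial multiplication following (3.5)-(3.7), most significant word first."""
    c = 0                                         # C_{-1} = 0
    for j in range(30):
        word = (a >> (W * (29 - j))) & 0xFF       # A_{29-j}
        d = clmul(word, b)                        # D_j = A_{29-j} * B
        c = reduce_mod_f((c << W) ^ d)            # C_j = C_{j-1} * x^8 + D_j
    return c                                      # C = C_29


print(bpws_mul(1 << 232, 2) == (1 << 74) | 1)     # x^232 * x = x^74 + 1 -> True
```

Each loop iteration corresponds to one clock cycle, so the 30 iterations mirror the 30 cycles needed for one multiplication.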
3.2.2 Bit-parallel word-serial multiplier architecture
The above iterative processes (3.5) and (3.6) can be mapped into the architecture
shown in Figure 3.1.
The multiplier architecture has two input ports I1, I2 and one output port. Input
port I2 is 8 bits wide and is used to serially input the words $A_{29-j}$, j = 0, 1, ..., 29.
The other input port I1 is 233 bits wide and is used to input the other operand B. The
output port is 233 bits wide and is used to output the product.
There are four modules in this architecture. These are:
Module M1 : Module M1 is an 8 × 233 bit-parallel partial product generator. M1
implements equation (3.6) by taking as inputs the 8-bit word $A_{29-j}$ and the 233-bit
B and yielding the partial product $A_{29-j} \times B$ after the j-th clock cycle. The
detailed architecture of M1 will be discussed later in Section 3.2.4.
Module M2 : Module M2 is a bit-parallel finite field adder which realizes
equation (3.5) by taking inputs $D_j$ from M1 and $C_{j-1}x^8$ from M4. This adder
can be easily implemented by 233 XOR gates. The output of M2 is
the output of the proposed BPWS PB finite field multiplier after 30 clock cycles.
Module M3 : Module M3 is a bit-parallel finite field constant multiplier. The two
operands are the constant $x^8$ and the output of module M2, which is $C_j$.
The output of module M3 is $C_j x^8$.
Module M4 : Module M4 is a 233-bit register which is used to store the
intermediate results. The output of module M4 is $C_{j-1}x^8$.
Figure 3.1: Proposed hybrid finite field multiplier (M1: 8 × 233 Partial product generator; M2: 233-bit Adder; M3: Constant multiplier; M4: 233-bit Register)
Let the content of M4 be initialized as 0. Assume the operand B is available at I1
from the beginning until the product C is generated at Output.
After the first clock cycle (Clock 0), the input word at I2 is $A_{29}$ and the output of M1
is $D_0 = A_{29}B$. Since the output of M4 is 0, the output of M2 is $C_0 = D_0$ and the
output of M3 is $C_0 x^8$.
After the second clock cycle (Clock 1), the input word at I2 is $A_{28}$ and the output of
M1 is given by $D_1 = A_{28}B$. The adder M2 takes the inputs $D_1$ and $C_0 x^8$, where $C_0 x^8$
was stored in the register M4 in the previous clock cycle, and yields $C_1 = C_0 x^8 + D_1$.
This process is continued until the 30th clock cycle (Clock 29), when the input word at I2
is $A_0$ and the output of M1 is given by $D_{29} = A_0 B$. The adder M2 takes the inputs
$D_{29}$ and $C_{28}x^8$, where $C_{28}x^8$ was stored in the register M4 in the previous clock cycle,
and yields $C_{29} = C_{28}x^8 + D_{29}$. This is the exact product C from equation (3.7).
Thus one finite field multiplication in $GF(2^{233})$ needs 30 clock cycles in this BPWS
PB finite field multiplier.
Assume that B is available at I1 throughout the multiplication. Table 3.2 shows the
main intermediate results after each clock cycle.
Clock   Input at I2   Output of M1          Output of M4    Output
0       $A_{29}$      $D_0 = A_{29}B$       0               $C_0 = D_0$
1       $A_{28}$      $D_1 = A_{28}B$       $C_0 x^8$       $C_1 = C_0 x^8 + D_1$
2       $A_{27}$      $D_2 = A_{27}B$       $C_1 x^8$       $C_2 = C_1 x^8 + D_2$
...
29      $A_0$         $D_{29} = A_0 B$      $C_{28} x^8$    $C_{29} = C_{28} x^8 + D_{29}$
Table 3.2: The output and intermediate results upon each clock cycle
3.2.3 M3: Constant Finite Field Multiplier $Z = x^8 Y$
In general, we call a finite field multiplier a constant finite field multiplier when one
of two operands is a constant field element. The constant finite field multiplier has
a much simpler architecture than a regular multiplier since it removes all the AND
gates and significantly reduces the number of XOR gates.
The module M3 in Figure 3.1 is a constant finite field multiplier which performs the
multiplication $Z = x^8 Y$ in $GF(2^{233})$, where $x^8$ is a constant and Y is any element
in $GF(2^{233})$. This constant finite field multiplier can be implemented by only
8 XOR gates.
If we express Y and Z as
$$Y = \sum_{i=0}^{232} y_i x^i \quad \text{and} \quad Z = \sum_{i=0}^{232} z_i x^i,$$
the coefficients $z_i$ of the product Z can be expressed as
$$z_i = \begin{cases} y_{225+i} & i = 0, 1, \dots, 7 \\ y_{i-8} & i = 8, 9, \dots, 73 \\ y_{i-8} + y_{151+i} & i = 74, 75, \dots, 81 \\ y_{i-8} & i = 82, 83, \dots, 232 \end{cases} \qquad (3.8)$$
The architecture of the constant finite field multiplier M3 is shown in Figure 3.2.
Figure 3.2: M3: The constant finite field multiplier $Z = x^8 Y$
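Formula 3.8 is pure wiring plus eight XOR gates, which can be confirmed with a direct transcription. The sketch below assumes the integer encoding with bit i as the coefficient $y_i$; the function name `x8_times` is hypothetical.

```python
M = 233  # field degree, F(x) = x^233 + x^74 + 1


def x8_times(y: int) -> int:
    """Compute Z = x^8 * Y in GF(2^233) coefficient by coefficient, as in (3.8)."""
    def bit(v: int, i: int) -> int:
        return (v >> i) & 1

    z = 0
    for i in range(M):
        if i <= 7:
            zi = bit(y, 225 + i)                     # wrapped-around top bits
        elif 74 <= i <= 81:
            zi = bit(y, i - 8) ^ bit(y, 151 + i)     # the only 8 XOR gates
        else:
            zi = bit(y, i - 8)                       # plain shift by 8
        z |= zi << i
    return z


print(x8_times(1 << 225) == (1 << 74) | 1)  # x^8 * x^225 = x^233 = x^74 + 1 -> True
```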
3.2.4 M1: 8 × 233 Bit-parallel partial product generator
Module M1 in the proposed BPWS PB finite field multiplier is an 8 × 233 partial product
generator which computes $A_j B$, where B is the 233-bit operand and
$A_j$ is the input word of 8 bits. Note that $A_j$ can be viewed as a field element in
$GF(2^{233})$ with the most significant 225 bits being 0s, i.e.,
$$A_j = a_7 x^7 + a_6 x^6 + \dots + a_0, \qquad (3.9)$$
where $a_7, a_6, \dots, a_0 \in GF(2)$. Thus,
$$A_j B = (a_7 x^7 + a_6 x^6 + \dots + a_0)B = a_7 x^7 B + a_6 x^6 B + \dots + a_0 B. \qquad (3.10)$$
In this expression, $x^7 B, x^6 B, \dots, xB$ are computed by seven constant finite field
multipliers. Each result from the seven constant finite field multipliers is multiplied by
the corresponding coefficient $a_i$; this step can be done by an AND network which is
introduced later. Finally, the accumulation of the eight results can be obtained from an
XOR network.
This 8 × 233 partial product generator can be implemented by the full
bit-parallel architecture shown in Figure 3.3.
The seven constant finite field multipliers have architectures similar to the
module M3 in the proposed BPWS finite field multiplier in Figure 3.1.
Let $Z = x^w Y$ be the constant multiplier, where w = 1, 2, ..., 7; such a constant
multiplier needs w XOR gates. The architecture of the constant finite field multiplier
$Z = x^w Y$ is shown in Figure 3.4.
In the 8 × 233 bit-parallel partial product generator architecture shown in Figure
3.3, the AND networks are used to multiply a 233-bit field element by the coefficient
$a_i$, and each can be implemented by 233 AND gates. The architecture is shown in Figure
3.5.
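The data flow of (3.10) can be mimicked in software as follows. This is a behavioural sketch under the integer encoding (bit i is the coefficient of $x^i$): the hardware forms $xB, \dots, x^7B$ in parallel with separate constant multipliers, whereas the loop below derives them one after another. The names `times_x` and `partial_product` are hypothetical.

```python
M, K = 233, 74
F = (1 << M) | (1 << K) | 1          # F(x) = x^233 + x^74 + 1


def times_x(y: int) -> int:
    """Constant multiplier Z = x * Y in GF(2^233)."""
    y <<= 1
    if y >> M:                       # x^233 appeared: replace it by x^74 + 1
        y ^= F
    return y


def partial_product(word: int, b: int) -> int:
    """A_j * B for an 8-bit word A_j and a 233-bit operand B, as in (3.10)."""
    acc = 0
    shifted = b                      # x^0 * B
    for i in range(8):
        if (word >> i) & 1:          # AND network: a_i * (x^i * B)
            acc ^= shifted           # XOR network: accumulate the selected terms
        shifted = times_x(shifted)   # next constant multiple x^(i+1) * B
    return acc


# x^7 * x^232 = x^239 = x^6 * (x^74 + 1) = x^80 + x^6
print(partial_product(0x80, 1 << 232) == (1 << 80) | (1 << 6))  # True
```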
Figure 3.3: 8 × 233 bit-parallel partial product generator
Figure 3.4: The architecture of the general constant finite field multiplier $Z = x^w Y$
Figure 3.5: The architecture of AND network
The outputs from the eight AND networks are accumulated by an XOR network as
shown in Figure 3.6.
Figure 3.6: The architecture of XOR network
The XOR network is built from sub XOR networks, each of which can be implemented by 233
XOR gates. The architecture is shown in Figure 3.7.
Assume all AND and XOR gates have only two inputs, the delay of an AND gate is
$T_A$ and the delay of an XOR gate is $T_X$. The circuit complexity and timing complexity of
the 8 × 233 bit-parallel partial product generator are summarized in Table 3.3.
The circuit complexity and timing complexity of the BPWS PB finite field multiplier
in $GF(2^{233})$ are summarized in Table 3.4.
Figure 3.7: The architecture of sub XOR network
# of AND gates    8 × 233
# of XOR gates    7 × 233 + (1 + 7) × 7/2
Critical path     $T_A + 4T_X$
Table 3.3: Circuit and timing complexities of the 8 × 233 partial product generator
# of AND gates           8 × 233
# of XOR gates           8 × 233 + (1 + 8) × 8/2
# of 233-bit registers   1
Critical path            $T_A + 6T_X$
Table 3.4: Circuit and timing complexities of the BPWS PB finite field multiplier
3.3 Alternative BPWS PB finite field multiplier
There is an alternative architecture for the BPWS PB finite field multiplier described above. As in Section 3.2, we divide one operand $A$ into 30 words of 8 bits each, starting from the least significant bit of $A$. In the 30th word, seven "0"s are appended as the seven most significant bits.
Let $A_j$ denote each word. $A_j$ can be expressed as
$$A_j = a_{8j} + a_{8j+1}x + a_{8j+2}x^2 + \dots + a_{8j+7}x^7 ,$$
where $j = 0, 1, 2, \dots, 29$ and $a_i = 0$ for $i = 233, 234, \dots, 239$.
Then $A$ can be rewritten as
$$A = A_0 + A_1 x^8 + A_2 (x^8)^2 + \dots + A_{29} (x^8)^{29} . \tag{3.11}$$
Thus the product $C = AB \bmod F(x)$ can be expressed as follows:
$$\begin{aligned} C &= AB \bmod F(x) \\ &= (A_0 + A_1 x^8 + A_2 (x^8)^2 + \dots + A_{29} (x^8)^{29}) B \bmod F(x) \\ &= A_0 B + A_1 B x^8 + A_2 B (x^8)^2 + \dots + A_{29} B (x^8)^{29} \bmod F(x) \end{aligned} \tag{3.12}$$
We can further let
$$D_j = D_{j-1} x^8 , \tag{3.13}$$
for $j = 1, 2, \dots, 29$, and $D_0 = B$;
$$C_j = C_{j-1} + A_j D_j , \tag{3.14}$$
for $j = 0, 1, \dots, 29$, and $C_{-1} = 0$.
From Equations 3.13, 3.14 and 3.12, we find that the product $C = AB \bmod F(x)$ is
$$C = C_{29} . \tag{3.15}$$
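The recurrences in Equations 3.13 to 3.15 can be checked in software. The following Python sketch is only an illustration under stated assumptions: polynomials are stored as Python integers (bit $i$ holds the coefficient of $x^i$), $F(x)$ is assumed to be the NIST trinomial $x^{233} + x^{74} + 1$, every intermediate value is reduced modulo $F(x)$, and the function names are hypothetical rather than taken from the thesis.

```python
M = 233
# Assumption: the field polynomial is the NIST trinomial x^233 + x^74 + 1.
F = (1 << 233) | (1 << 74) | 1


def reduce_mod_f(p):
    """Reduce polynomial p modulo F(x) by cancelling the leading term repeatedly."""
    while p.bit_length() > M:
        p ^= F << (p.bit_length() - 1 - M)
    return p


def clmul(a, b):
    """Carry-less (GF(2)) polynomial product of a and b, without reduction."""
    r = 0
    while a:
        if a & 1:
            r ^= b
        a >>= 1
        b <<= 1
    return r


def mul_lsw_first(a, b, w=8):
    """Word-serial product of a and b modulo F(x), least significant word first."""
    n_words = -(-M // w)                       # ceil(233 / 8) = 30 words
    d = b                                      # plays the role of D_0 = B
    c = 0                                      # plays the role of C_{-1} = 0
    for j in range(n_words):
        a_j = (a >> (w * j)) & ((1 << w) - 1)  # word A_j
        c ^= reduce_mod_f(clmul(a_j, d))       # C_j = C_{j-1} + A_j D_j  (3.14)
        d = reduce_mod_f(d << w)               # D_{j+1} = D_j x^w        (3.13)
    return c                                   # C = C_29                 (3.15)
```

Comparing `mul_lsw_first(a, b)` with `reduce_mod_f(clmul(a, b))` for random 233-bit inputs is a quick way to confirm that the word-serial recurrence and the direct product agree.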
The above alternative BPWS PB finite field multiplier can be implemented by the 
architecture shown in Figure 3.8.






M1: Constant multiplier
M2: 233-bit register
M3: 8 × 233 partial product generator
M4: 233-bit adder
M5: 233-bit register
Figure 3.8: Alternative BPWS PB finite field multiplier over $GF(2^{233})$
One input port is 8 bits wide and is used to serially input the word $A_j$, which is part of the operand $A$; the other input port is 233 bits wide and is used to input the other operand $B$ in parallel. The output port is 233 bits wide and is used for the output product. There are five modules in this alternative multiplier architecture, which are described below.
Module M1: Module M1 is a constant finite field multiplier which performs the same function as Module M3 described in Section 3.2.
Module M2: Module M2 is a 233-bit register which has the initial value $B(x)$ and stores the intermediate results. Let the output of M2 be $D_j$; the function that modules M1 and M2 perform is then $D_j = D_{j-1} x^8$.
Module M3: Module M3 is an 8 × 233 bit-parallel PB finite field multiplier (the partial product generator), which is the same as Module M1 described in Section 3.2. One input, $A_j$, is the word from the operand $A$, the other input is $D_j$, and the output of module M3 is $A_j D_j$.
Module M4: Module M4 is an adder which has the same function as Module M2 described in Section 3.2. The output of module M4 is also the output of the alternative BPWS PB finite field multiplier.
Module M5: Module M5 is another 233-bit register which is used to keep the intermediate results. Let the output of module M4 be $C_j$; the function that modules M4 and M5 perform is then $C_j = C_{j-1} + A_j D_j$.
Let the initial value of M2 be $B$ and the initial value of M5 be 0. The words are input least significant word (LSW) first.
During the first clock cycle (Clock 0), the input word is $A_0$, the output of module M2 is $B$, which is $D_0$, and the output of module M3 is $A_0 B$, which is $A_0 D_0$. The output of module M4 is $A_0 D_0$, which is $C_0$, since the initial value of M5 is 0, corresponding to $C_{-1} = 0$.
During the second clock cycle (Clock 1), the output of module M2 is $B x^8$, which is $D_1$, the input word is $A_1$, the output of module M3 is $A_1 D_1$, and the output of module M5 is $C_0$; therefore the output of module M4 is $C_0 + A_1 D_1$, which is $C_1$.
During the third clock cycle (Clock 2), the output of module M2 is $B (x^8)^2$, which is $D_2 = D_1 x^8$, the input word is $A_2$, the output of module M3 is $A_2 D_2$, and the output of module M5 is $C_1$; therefore the output of module M4 is $C_1 + A_2 D_2$, which is $C_2$.
During the 30th clock cycle (Clock 29), the output of module M2 is $D_{29} = D_{28} x^8$, the input word is $A_{29}$, the output of module M3 is $A_{29} D_{29}$, and the output of module M5 is $C_{28}$; therefore the output of M4 is $C_{28} + A_{29} D_{29}$, which is $C_{29}$.
The product $C = AB \bmod F(x)$ can be obtained from the output of module M4 during the 30th clock cycle. Table 3.5 shows the values of the input words, M2, M5 and the output of the alternative BPWS PB finite field multiplier on each clock cycle.

| Clock | Input | M2 | M5 | Output |
|---|---|---|---|---|
| 0 | $A_0$ | $D_0 = B$ | 0 | $C_0 = A_0 B$ |
| 1 | $A_1$ | $D_1 = D_0 x^8$ | $C_0$ | $C_1 = C_0 + A_1 D_1$ |
| 2 | $A_2$ | $D_2 = D_1 x^8$ | $C_1$ | $C_2 = C_1 + A_2 D_2$ |
| ... | ... | ... | ... | ... |
| 29 | $A_{29}$ | $D_{29} = D_{28} x^8$ | $C_{28}$ | $C_{29} = C_{28} + A_{29} D_{29}$ |
Table 3.5: The values of output and other modules on each clock cycle
The 8 × 233 bit-parallel partial product generator, the constant multiplier, the adder and the registers are the same as those described in Section 3.2. With this architecture, 30 clock cycles are needed to perform one multiplication. Under the same assumptions as in Section 3.2, the circuit and timing complexities of this alternative BPWS PB finite field multiplier are summarized in Table 3.6.
| Parameter | Value |
|---|---|
| # of AND gates | 8 * 233 |
| # of XOR gates | 8 * 233 + (1 + 8) * 8/2 |
| # of 233-bit registers | 2 |
| Critical path | $T_A + 5T_X$ |
Table 3.6: The circuit and timing complexities of alternative BPWS PB finite field multiplier
3.4 General BPWS PB finite field multipliers
General BPWS PB finite field multipliers in $GF(2^m)$ can be derived by extending the proposed BPWS PB finite field multipliers described in Section 3.2 and Section 3.3. Assume the finite field is $GF(2^m)$ and the size of the input word $A_j$ is $w$; the corresponding architectures are then shown in Figure 3.9 and Figure 3.10.







M1: w × m partial product generator
M2: m-bit adder
M3: Constant multiplier
M4: m-bit register
Figure 3.9: The BPWS PB finite field multiplier in $GF(2^m)$
described in Section 3.2 and Section 3.3. It takes $\lceil m/w \rceil$ clock cycles to perform one multiplication in $GF(2^m)$ using these general architectures.
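As a software illustration of the general MSW-first form, the sketch below processes one $w$-bit word per loop iteration and counts the iterations. It assumes a Horner-style recurrence $C \leftarrow C x^w + A_j B \bmod F(x)$, which is one natural reading of the module list in Figure 3.9 (partial product generator, adder, constant multiplier, register); the function names and the integer encoding of polynomials are hypothetical.

```python
def poly_mod(p, f):
    """Reduce polynomial p modulo f over GF(2); bit i is the coefficient of x^i."""
    m = f.bit_length() - 1
    while p.bit_length() > m:
        p ^= f << (p.bit_length() - 1 - m)
    return p


def poly_mul(a, b):
    """Carry-less polynomial product over GF(2), without reduction."""
    r = 0
    while a:
        if a & 1:
            r ^= b
        a >>= 1
        b <<= 1
    return r


def mul_msw_first(a, b, f, w):
    """Return (a*b mod f, cycles) for an MSW-first word-serial multiplication."""
    m = f.bit_length() - 1
    n_words = -(-m // w)                        # ceil(m / w) words, one per cycle
    c = 0                                       # accumulator register starts at 0
    cycles = 0
    for j in reversed(range(n_words)):          # most significant word first
        a_j = (a >> (w * j)) & ((1 << w) - 1)   # word A_j
        c = poly_mod((c << w) ^ poly_mul(a_j, b), f)
        cycles += 1
    return c, cycles
```

With $m = 233$ the loop runs 30 times for $w = 8$, 233 times for $w = 1$ and once for $w = 233$, in line with the $\lceil m/w \rceil$ cycle count stated above.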
M1: Constant multiplier
M2: Register
M3: w × m partial product generator
M4: Adder
M5: Register
Figure 3.10: The alternative BPWS PB finite field multiplier in $GF(2^m)$
3.5 Comparisons
Assume that all AND gates and XOR gates have only two inputs and that the delays of an AND gate and an XOR gate are $T_A$ and $T_X$ respectively. When the irreducible polynomial $F(x)$ that generates the finite field $GF(2^m)$ is a trinomial, i.e. $F(x) = x^m + x^k + 1$ with $1 < k < m/2$, the circuit and timing complexities of the bit-parallel, bit-serial and general BPWS multipliers are as shown in Table 3.7.
| Multiplier | Speed (clock cycles) | Circuit complexity | Critical path |
|---|---|---|---|
| Bit-parallel [28] | 1 | $m^2$ AND gates; $m^2 - 1$ XOR gates | $T_A + (\lceil \log_2(m-1) \rceil + 2)T_X$ |
| Bit-serial (MSB) [2] | $m$ | $m$ AND gates; $m + 1$ XOR gates; one $m$-bit register | $T_A + T_X$ |
| Bit-serial (LSB) [2] | $m$ | $m$ AND gates; $m + 1$ XOR gates; two $m$-bit registers | $T_A + T_X$ |
| Proposed BPWS, MSW first | $\lceil m/w \rceil$ | $w * m$ AND gates; $w * m + (1 + w) * w/2$ XOR gates; one $m$-bit register | $T_A + (\lceil \log_2 w \rceil + 3)T_X$ |
| Proposed BPWS, LSW first | $\lceil m/w \rceil$ | $w * m$ AND gates; $w * m + (1 + w) * w/2$ XOR gates; two $m$-bit registers | $T_A + (\lceil \log_2 w \rceil + 2)T_X$ |
Table 3.7: The comparisons among bit-parallel, bit-serial and BPWS finite field multipliers

From Table 3.7 we can see that the numbers of AND gates and XOR gates in the proposed BPWS PB finite field multipliers lie between those of the bit-parallel PB finite field multiplier and those of the bit-serial PB finite field multiplier. No sequential element is needed in the bit-parallel finite field multiplier. The number of registers in the proposed BPWS PB finite field multipliers is the same as that in the bit-serial PB finite field multipliers. The critical path of the proposed BPWS PB finite field multipliers also lies between those of the bit-parallel and bit-serial PB finite field multipliers. The number of clock cycles needed to perform one finite field multiplication in the proposed BPWS PB finite field multiplier is likewise larger than that of the bit-parallel PB finite field multiplier and smaller than that of the bit-serial PB finite field multipliers. The proposed BPWS PB finite field multiplier is thus a trade-off between the bit-parallel finite field multiplier and the bit-serial finite field multiplier.
When $m$ is much larger than $w$, there is a rough relation among these multipliers between the number of AND gates, the number of XOR gates and the number of clock cycles needed to perform one multiplication: the products of speed (in clock cycles) and circuit complexity are approximately the same. For AND gates:
$$1 * m^2 \ \text{(bit-parallel)} = m * m \ \text{(bit-serial)} \approx \lceil m/w \rceil * w * m , \tag{3.16}$$
and for XOR gates:
$$1 * (m^2 - 1) \ \text{(bit-parallel)} \approx m * (m + 1) \ \text{(bit-serial)} \approx \lceil m/w \rceil * (w * m + (1 + w) * w/2) . \tag{3.17}$$
By carefully choosing the word size $w$, the proposed BPWS PB finite field multipliers can achieve the desired trade-off between area and speed.
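The trade-off can be tabulated directly from the expressions in Table 3.7. The following Python sketch is a convenience calculator only; the function name and the dictionary layout are hypothetical, and the formulas are copied from the table rather than derived independently.

```python
def bpws_cost(m, w, lsw_first=False):
    """Gate counts and timing of the general BPWS multiplier, per Table 3.7."""
    log2_w = (w - 1).bit_length()              # ceil(log2(w)) for w >= 1
    return {
        "cycles": -(-m // w),                  # ceil(m / w)
        "and_gates": w * m,
        "xor_gates": w * m + (1 + w) * w // 2,
        "registers": 2 if lsw_first else 1,
        # Critical path is T_A plus this many XOR delays T_X.
        "xor_levels": log2_w + (2 if lsw_first else 3),
    }


if __name__ == "__main__":
    m = 233
    for w in (1, 4, 8, 16, 32):
        cost = bpws_cost(m, w)
        area_time = cost["cycles"] * cost["and_gates"]
        print(w, cost, "cycles * AND gates =", area_time, "vs m^2 =", m * m)
```

For $m = 233$ and $w = 8$ this gives 30 cycles, 1864 AND gates and 1900 XOR gates, so the AND-gate area-time product is $30 * 1864 = 55920$, close to $m^2 = 54289$ as Equation 3.16 suggests. The XOR levels evaluate to 6 (MSW first) and 5 (LSW first), matching the critical paths in Tables 3.4 and 3.6.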
Two special cases of the word size $w$ are worth noting:
$w = m$: this is a fully bit-parallel architecture, which can be simplified to the bit-parallel PB finite field multiplier [28] by removing all modules except the $w \times m$ finite field multiplier.
$w = 1$: this directly becomes a bit-serial multiplier as reported in [2].
Thus, our design algorithm can be treated as a general design algorithm for finite field multipliers.
3.6 Summary
In this chapter, we first introduced the finite fields recommended by NIST for EC cryptosystems. After choosing the finite field $GF(2^{233})$, which is generated by the irreducible trinomial $F(x) = x^{233} + x^{74} + 1$, we designed and analyzed the BPWS PB finite field multiplier and the alternative BPWS PB finite field multiplier in Section 3.2 and Section 3.3 respectively. At the end of this chapter, we designed a general form of the BPWS PB finite field multiplier, which is an MSW-first multiplier, and its alternative form, which is an LSW-first multiplier. We also compared the bit-parallel PB finite field multiplier in [28], the bit-serial PB finite field multiplier in [2] and our general BPWS finite field multipliers. The proposed architecture suits applications that must balance speed against area, and it is particularly useful for smart card applications.
In the next chapter, we will design an ASIC chip capable of performing finite field multiplication and squaring using the proposed BPWS PB finite field multiplier described in Section 3.2 and the bit-parallel PB finite field squarer described in [28].
Chapter 4
Hardware Design
The final aim of this thesis is to design an application specific integrated circuit (ASIC) chip which can perform multiplication and squaring in $GF(2^{233})$. In this chapter, the detailed design methodology of such a chip is introduced, and issues that arise during the design are addressed. Multiplication is implemented with the proposed BPWS PB finite field multiplier described in Section 3.2. Squaring is implemented with the bit-parallel PB finite field squarer described in [28].
4.1 Hardware architecture
The proposed hardware architecture is shown in Figure 4.1.
Figure 4.1: The schematic of the hardware
In this figure, the module Multiplier is the BPWS PB finite field multiplier described in Section 3.2, and the module Squarer is the bit-parallel PB finite field squarer described in [28]. The width of the data bus (which is also the word size of the BPWS PB finite field multiplier in Section 3.2) is 8, and the width of the address is 5. The signal clk is the clock, and rst, sel and w are control signals. The module Mux selects either finite field multiplication or finite field squaring, the Registers modules store the input operands or the result, and the modules codec and codecout decode the address to write the input data into, or read the data out of, the registers. Except for the modules Multiplier and Squarer, all others can be modeled in the control part of the data path in a processor.
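To make the bus widths concrete: a 233-bit operand occupies 30 words of 8 bits, so a 5-bit address (32 locations) is enough to select any word. The Python sketch below illustrates this split. The word-to-address mapping (word 0, the least significant, at address 0) and the function names are assumptions made for illustration and are not taken from the chip's Verilog.

```python
def to_words(value, m=233, w=8):
    """Split an m-bit value into ceil(m/w) words of w bits, least significant first."""
    n_words = -(-m // w)
    mask = (1 << w) - 1
    return [(value >> (w * addr)) & mask for addr in range(n_words)]


def from_words(words, w=8):
    """Reassemble a value from its w-bit words (index = assumed word address)."""
    value = 0
    for addr, word in enumerate(words):
        value |= word << (w * addr)
    return value


if __name__ == "__main__":
    operand = (1 << 233) - 1           # all 233 coefficient bits set
    words = to_words(operand)
    print(len(words), "words;", "top word =", words[-1])
    assert from_words(words) == operand
```

Only one bit of the 30th word carries data, since $233 = 29 * 8 + 1$; the remaining seven bits are the zero padding described in Section 3.3.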
4.2 Hardware specifications
The specifications for the ASIC chip are summarized in Table 4.1.




Power consumption 20 mW
Table 4.1: Specifications
4.3 VLSI implementation technology and design flow
CMC (Canadian Microelectronics Corporation) supports all Canadian universities with industry-level VLSI design tools and technical support. CMC also provides several design flows for different kinds of ASIC designs. The design flow followed in this thesis is the CMC digital design flow, which is shown in Figure 4.2. The VLSI implementation technology used in this project is TSMC (Taiwan Semiconductor Manufacturing Company) 0.18 µm CMOS technology. Compared with TSMC 0.35 µm technology, 0.18 µm technology has the advantages of smaller area and lower power consumption.
Digital chip design can be partitioned into front-end design, back-end design, and post-design verification and modification.
4.4 Front-end design
Front-end design of an ASIC chip includes the tasks of hardware modeling, testbench and stimuli file creation, logic synthesis, and design-for-testability (DFT) synthesis.













Figure 4.2: CMC digital design flow (© Canadian Microelectronics Corporation, copy with permission only)
4.4.1 Stimuli files
In ASIC digital design, we need stimuli files to verify the logic function of the modeled hardware circuits. In this project, a software program was developed using Borland C++ Builder to create the stimuli files. This program is itself a finite field multiplier over $GF(2^{233})$ and can create two kinds of stimuli files, to test finite field multiplication and finite field squaring respectively. Each stimuli file contains a number of vectors. In the stimuli file for the finite field multiplication test, each vector contains three 233-bit binary numbers: the first two are the input operands and the last is the expected result, which is compared with the output of the modeled circuit. In the stimuli file for the finite field squaring test, each vector contains two 233-bit binary numbers: the first is the input operand and the second is its square. The program generates 1000 vectors.
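A stimuli generator of this kind can be sketched in a few lines. The Python version below is not the Borland C++ Builder program used in this project; it is a minimal illustration that assumes the field polynomial $x^{233} + x^{74} + 1$ and a hypothetical file layout of one vector per line with space-separated 233-character binary strings.

```python
import random

M = 233
# Assumption: the field polynomial is the NIST trinomial x^233 + x^74 + 1.
F = (1 << 233) | (1 << 74) | 1


def gf_mul(a, b):
    """Reference product of a and b in GF(2^233): shift-and-XOR, then reduce."""
    r = 0
    for i in range(M):
        if (a >> i) & 1:
            r ^= b << i
    while r.bit_length() > M:
        r ^= F << (r.bit_length() - 1 - M)
    return r


def to_bits(value):
    """Format a field element as a fixed-width 233-character binary string."""
    return format(value, "0233b")


def make_vectors(n, squaring=False, seed=0):
    """Return n stimuli lines: 'A B A*B' for multiplication, 'A A^2' for squaring."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n):
        a = rng.getrandbits(M)
        if squaring:
            lines.append(f"{to_bits(a)} {to_bits(gf_mul(a, a))}")
        else:
            b = rng.getrandbits(M)
            lines.append(f"{to_bits(a)} {to_bits(b)} {to_bits(gf_mul(a, b))}")
    return lines
```

Calling `make_vectors(1000)` and `make_vectors(1000, squaring=True)` and joining the lines with newlines would yield two files with the vector counts described above; the exact file format expected by the testbenches is not specified here and would need to match the Verilog in the appendices.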
4.4.2 Hardware modeling
The circuit shown in Figure 4.1 and its modules are modeled in Verilog, an industry-level hardware description language (HDL). The circuit is modeled at the register transfer level (RTL). To verify that the circuit performs the desired logic, two testbenches written in Verilog are also needed, one for the functional finite field multiplication test and one for the functional finite field squaring test. The detailed Verilog files are listed in Appendix A and Appendix B. The Verilog simulation tool used in this thesis is Verilog-XL. In the testbenches, the stimuli files created by the software are used to exercise the circuit. No functional error occurred during simulation, so the modeled circuit performs the desired logic.
4.4.3 Logical synthesis
The Verilog files that we modeled at the RTL level to describe the behavior of the hardware circuit are also called RTL netlists, while the physical layout design needs gate-level netlists. The task of logic synthesis is to convert the RTL netlists into gate netlists. The cells in the gate netlists come from cell libraries (target libraries) provided by ASIC vendors. A logic synthesizer is a software tool that performs this task. The logic synthesizer used in this project is Design Compiler from Synopsys, and the target libraries are the TSMC 0.18 micron CMOS technology libraries. Table 4.2 summarizes the results of logic synthesis.
| Circuit/module | # of cells | Cell area (µm²) | # of equivalent gates |
|---|---|---|---|
| BPWS multiplier core | 3029 | 124936.062500 | 4893 |
| Squarer core | 293 | 4248.527832 | 293 |
| Whole circuit | 4539 | 495452.468750 | 12113 |
Table 4.2: Results of logic synthesis
4.4.4 DFT synthesis
There are two types of test in our project. One is the functional test, which verifies that the circuit performs the expected logic. The other is the manufacturing test, which verifies that the circuit has no manufacturing defects by focusing on circuit structure rather than functional behavior. Manufacturing defects might remain undetected by functional testing yet cause undesirable behavior during circuit operation.
Design for testability (DFT)
DFT is a manufacturing test technique that we can adopt to test our integrated circuit thoroughly. A detailed introduction to DFT can be found in [30] and other relevant books.
In this project, the DFT design technique is internal full scan. The tool used for DFT synthesis is DFT Compiler from Synopsys, and the ATPG tool used to create test patterns and to perform fault simulation is TetraMax from Synopsys. In our design, all sequential cells are valid and no violation exists. The number of test patterns is 154 and the fault coverage is 100%.
4.5 Back-end design
In this section, we start the physical IC layout design. Back-end design includes the 
tasks of floorplanning, placement, clock synthesis and routing.
4.5.1 Floorplanning and Placement
Floorplanning
The objectives of floorplanning are to minimize the chip area and to minimize delay. The input to a floorplanning tool is the gate netlist that describes the modeled circuit. The gate netlist in our design is the output of logic synthesis and DFT synthesis, and it is a logical description of the ASIC. The floorplan is a physical description of an ASIC. Floorplanning is thus a mapping between the logical description (the netlist) and the physical description (the floorplan).
The tasks of floorplanning are to
• arrange the blocks on a chip,
• decide the location of the I/O pads,
• decide the location and number of the power pads, and
• decide the type of power distribution.
In our design, the gate netlist does not contain any blocks. In TSMC 0.18 µm technology, the power supplies for the I/O ring and the core cells are different. Four pairs of power pads are added in our design: two pairs are used for the I/O ring power supply and two pairs are used for the core power supply. The aspect ratio is set to 1, which means the chip is square. A pair of power rings is placed around the core, which contains all the standard cells, and three pairs of vertical power strips are placed across the core.
Placement
After completing a floorplan, we can begin placement of the logic cells. The objectives of placement are to
• guarantee the router can complete the routing step,
• minimize all the critical net delays,
• make the chip as dense as possible,
• minimize the total estimated interconnect length,
• meet the timing requirements for the critical nets,
• minimize the interconnect congestion.
Compared with floorplanning, placement is more suitable for automation.
Our design is a row-based ASIC design. All logic cells are placed in rows, which are defined in the floorplanning and placement tool. Rows are separated by channels, which are used for horizontal routing. Figure 4.3 illustrates how the logic cells are placed into rows. By carefully selecting the channel offset, we can avoid later design rule checking (DRC) problems. In our design, the row utilization is set to 90%, and the channel offset of 4*0.56 micron was obtained from many experiments and other designers' experience.
Figure 4.3: Illustration of placement (rows separated by channels)
In our design, the tool used to perform floorplanning and placement is Physical Design Planner (also called AreaPdp), which is a Cadence tool. Placement is done by the timing-driven Qplace tool, which is integrated in AreaPdp. The timing constraint files obtained during logic synthesis are fed into AreaPdp as constraint files. In our design, the timing requirements are met and the geometry report shows no congestion.
4.5.2 Clock tree synthesis
The major task of clock tree synthesis is to develop the interconnect geometry that connects the clock to all the cells on the chip that use a clock. These cells consist of latches, flip-flops, and other logic elements that need to be synchronized with the system clock. In this thesis, clock tree synthesis is done in CTGen, which is integrated with First Encounter (FE) Ultra, a Cadence tool. The generated clock tree meets the timing requirements, and 23 buffers are added to balance the clock tree in our design.
4.5.3 Golden netlist
The gate netlist generated by Design Compiler has been modified at two stages: by DFT Compiler, which replaces all sequential cells with the corresponding scan-enabled sequential cells, and by clock tree synthesis, which adds buffers. This modified gate netlist should perform the same logic as the RTL netlist. Before routing, we run the functional test again to verify that the modified netlist performs the desired logic. The test tool is still Verilog-XL. Leaving the timing requirements aside, the functional test at this step is the last opportunity to verify that the circuit performs the desired logic. Any failure in the functional test results in iterations of floorplanning, placement and clock tree synthesis. The results of the functional multiplication test are shown in Figure 4.4 and Figure 4.5. The functional squaring test results are shown in Figure 4.6 and Figure 4.7.
No functional error occurred during simulation. This modified gate netlist is called the golden netlist, and it can also be used in later LVS checking.
Figure 4.4: Functional multiplication test
Figure 4.5: Waveform of functional multiplication test
Figure 4.6: Functional squaring test
Figure 4.7: Waveform of functional squaring test
4.5.4 Routing
After the chip is floorplanned and the logic cells have been placed, it is time to make the connections by routing the chip. Routing is usually split into global routing followed by detailed routing. In this project, the router is Silicon Ensemble, which is a Cadence tool. After detailed routing is complete, the exact length and position of each interconnect for every net is known, and the parasitic capacitance and resistance associated with each interconnect, via, and contact can be calculated by an RC extraction tool. Interconnect delay and load due to parasitic resistance and capacitance are written in reduced standard parasitic format (RSPF), and static timing analysis is done with perl scripts in our design. The timing analysis at this step is the last opportunity to verify timing. Any failure to meet the timing requirements results in iterations of floorplanning, placement and routing until the timing requirements are fully met. The result of timing limit checking is shown in Figure 4.8.
Figure 4.8: Timing limit checking (Pearl log reporting "No timing constraints were triggered" and "No limit violations were found")
Our design meets the timing requirements, and there are no timing violations.
4.6 Physical verification and modification
After detailed routing is complete and timing analysis shows that the design meets the timing requirements, we can perform physical verification and, where needed, modification before the chip is fabricated. There are two major kinds of checking: layout versus schematic (LVS) and design rule checking (DRC).
4.6.1 Layout versus schematic (LVS)
The timing analysis performed after routing shows only whether the design meets the timing requirements. The netlist might be modified during routing, so one of our concerns is whether the physical layout after routing performs the same logic as the golden netlist. LVS essentially compares the physical netlist (the netlist after routing) with the golden reference (the golden netlist) to ensure that what is about to be committed to silicon is what is really wanted.
In our design, the tool used to perform LVS is Diva LVS from Cadence, which is integrated in Cadence Design Framework II (dfII). Diva LVS in dfII is used to compare
1. the final layout, in the form of a DEF (Design Exchange Format) file created by Silicon Ensemble after routing, with
2. the golden netlist
to verify that the physical (placed and routed) version of the design contains the same instances, nets, and connectivity as the verified "golden" netlist. The LVS result is shown in Figure 4.9.
In our design, the layout and the schematic match each other. Since the physical layout also meets the timing requirements, we now know that the physical layout can perform the desired logic.
Figure 4.9: The result of LVS (Diva report: "The net-lists match."; the comparison program completed successfully)
4.6.2 Design rule checking (DRC)
DRC ensures that nothing has gone wrong in the process of placing the logic cells and routing.
The DRC may be performed at two levels. Since the detailed router normally works with logic-cell phantoms, the first level of DRC is a phantom-level DRC, which checks for shorts, spacing violations, or other design-rule problems between logic cells. This is principally a check of the detailed router. In our design, Dracula DRC, which is a Cadence tool, performs the phantom-level DRC. The result from Dracula is shown in Figure 4.10.
Figure 4.10: The result of phantom level DRC
No DRC violation was found during phantom-level design rule checking.
If we have access to the real library-cell layouts (sometimes called hard layout), we can instantiate the phantom cells and perform a second-level DRC at the transistor level. This is principally a check of correctness after replacing the library cells with detailed cell layouts. Since we do not have the detailed library-cell layouts due to the confidentiality of TSMC technology, CMC performs this check as a type of incoming inspection. The results of the second-level DRC from CMC are shown in Figure 4.11 and Figure 4.12.
There is no DRC violation in this design. Our design has passed the CMC inspection and is ready for fabrication. The design name is ICFWRWTK and the fabrication run code is 0402CF. The chip is expected to return in October 2004.
4.7 Chip Layout
The layout of the chip is shown in Figure 4.13.
The total die size of the chip is 2533190.5 µm² including pads. There are 39 pads in the chip. The hardware parameters are summarized in Table 4.3. From this table, we
Figure 4.11: The result of standard DRC from CMC (Calibre report: 341 DRC rule checks executed, 0 results generated)
2k h t f p : / A w w v . * : n u ; . r a / p r o d _ s e r v / c l i <  / r . f / i ( . l  Wit  I W K / p r i n J o i i 1 ,a n t e ,  i n f o  - Mit;i o s o f t  I n l e t  n e t  I x p lo r t i t
EHe B».v Fayortes. -look tifOp'- ■   49?
“i t  lihMeda ^  rn LJ ®  JSj
|i(||[|| h t tp ; //»vvvw.cmc .c a / y o d :.^e rv /d fc /cf/lCF\W TW K/pr{nto>jl:.a n te .in fo_________________   ^  Q i  <So
ANTENNA DRC RESULTS fo r  BCFWRTWK. s tra
Execut ion  Dat e/Time: 
C alib re  Versiori:
Hon J a n  12 14:08:15 2004
V8.8_16.2 Wed Nov 28 22:46:31 PST 2001
— RUNTIKE WARNINGS
  ROLECHECK RESULTS STATISTICS
  SUMMARY
lOTM.. CPU Tim e: 87
TOTAL REAL Time: 87
TOTAL DRC Rul'eCh.ecks E x e c u te d :  -24
TOTAL ;DR£ R e s u l t s  G e n e ra te d :  0 <0)
ijgjPOT̂
Figure 4.12; The result of anntenna DRC from CMC
61
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
4. HARDWARE DESIGN
V: S4£»  1SSSJ0 (FlSrivcbO
Tuols Deagn Window EcMi :Veniy usmecuvity Optioiis Route CMC StutL
R leHiEdbLtSisplayt^tionsO
Figure 4.13: The layout of the chip
62
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
4. HARDWARE DESIGN
can see that our designed chip meets the design specifications.
Specification            BPWS core       Squarer core    Whole circuit
# of cells               3029            293             4570
# of equivalent gates    4893            293             12154
Area (μm²)               189297.06439    6437.15746      2533190.5
Power consumption (mW)   < 9.2178                        31.7421
Frequency (MHz)          50 (max. 130)
Table 4.3: The hardware parameters
4.8 Comparisons
The finite field multiplier is a key component in an EC security processor. The
designed chip is compared with other VLSI implementations of finite field
multipliers in Table 4.4.
Although the frequency is set to 50 MHz during the chip design, the maximum
frequency at which the chip can operate is 130 MHz. Our designed chip needs only
one clock cycle to perform one finite field squaring, whereas for all the other
multipliers the number of clock cycles needed for one finite field squaring is the
same as that for one finite field multiplication. From Table 4.4 we can see that our
design saves hardware resources.
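The clock-cycle argument above can be made concrete with a short calculation. The
sketch below is written in Python and rests on an assumption that is not stated in
this section: that the word-serial multiplier consumes one 8-bit word of the
multiplier operand per clock cycle, so that a 233-bit operand needs ceil(233/8) words.
Control and I/O overhead are ignored, and the function names are illustrative only.

```python
import math

M = 233     # field degree, GF(2^233)
WORD = 8    # word size of the word-serial multiplier (bits per clock, assumed)

def multiply_cycles(m=M, word=WORD):
    """Clock cycles for one word-serial multiplication (assumes one word per clock)."""
    return math.ceil(m / word)

def latency_ns(cycles, freq_mhz):
    """Latency in nanoseconds of `cycles` clock cycles at `freq_mhz` MHz."""
    return cycles * 1000.0 / freq_mhz

if __name__ == "__main__":
    mul = multiply_cycles()   # 30 cycles under the stated assumption
    sqr = 1                   # one cycle per squaring, as stated in the text
    for f in (50, 130):       # design frequency and reported maximum frequency
        print(f"{f} MHz: multiply {latency_ns(mul, f):.1f} ns, "
              f"square {latency_ns(sqr, f):.1f} ns")
```

Under these assumptions a squaring is roughly thirty times cheaper than a
multiplication, which is the advantage claimed over multipliers that square by
multiplying an operand with itself.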
[Table 4.4 is only partly recoverable; the column headings and the remaining rows are
lost. Surviving entries:]
77      2233      37296 LUTs, 37552 FFs      528427    Xilinx FPGA XC2V6000-ff1517-4
66.4    < 2^56    14797 LUTs, 2948 FFs       136064    Xilinx FPGA
8 x 288 [18]    3    < 2^[illegible]    2 * 8 * 288 ANDs, 2 * 8 * 288 XORs, 3 * (8 + 288) FFs    14544    ALTERA FPGA EPF10K250-AGC5992
Table 4.4: Comparisons among VLSI implementations of finite field multipliers
4.9 Summary
The full hardware design methodology is introduced in this chapter. The CMC
digital design flow is followed, and the VLSI technology adopted is TSMC 0.18 μm CMOS
technology. There are no errors in the final layout. The design name is
ICFWRWTK, and the fabrication run code is 0402CF. The chip is expected to return in
Oct. 2004. The designed chip is ready for fabrication.
In the next chapter, my contributions and the expected future work are summarized.
Chapter 5
Summaries of Contributions
My contributions are summarized as follows:
• Two BPWS PB finite field multipliers in GF(2^233) are designed in this thesis,
and the proposed BPWS PB finite field multipliers offer a trade-off between
area and speed.
• The maximum frequency at which the designed chip can operate is 130 MHz. The
area of the BPWS multiplier is 189297.06439 μm², and the power consumption is
less than 9.2178 mW. These results meet the design specifications.
• Compared with the other finite field multipliers in Table 4.4, the proposed BPWS
finite field multiplier saves hardware resources.
• A novel 8 x 233 bit-parallel partial product generator is designed in this thesis.
The expected future work is to design an EC security processor for smart cards
using the proposed BPWS finite field multiplier.
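The interplay between the 8 x 233 partial product generator and the word-serial
accumulation can be illustrated with a small software reference model. The Python
sketch below is not the thesis implementation; it assumes the reduction trinomial
x^233 + x^74 + 1 (the constants 233 and 74 that appear in the Verilog of Appendix B)
and an MSB-first schedule in which the accumulator is multiplied by x^8 and added to
one partial product per step. All function names are hypothetical.

```python
import random

M, K = 233, 74                    # assumed trinomial x^233 + x^74 + 1
F = (1 << M) | (1 << K) | 1

def reduce_mod_f(v):
    """Reduce polynomial v (bit i = coefficient of x^i) modulo F."""
    while v.bit_length() > M:
        v ^= F << (v.bit_length() - 1 - M)
    return v

def mul_x(b, j):
    """Return b * x^j mod F (the role played by a constant multiplier)."""
    return reduce_mod_f(b << j)

def partial_product(a_word, b):
    """8 x 233 partial product: XOR of a_word[j] * (x^j * b mod F) for j = 0..7."""
    acc = 0
    for j in range(8):
        if (a_word >> j) & 1:
            acc ^= mul_x(b, j)
    return acc

def bpws_multiply(a, b, word=8):
    """MSB-first word-serial product a * b mod F, one word of `a` per step."""
    n_words = -(-M // word)       # ceil(233 / 8) = 30
    c = 0
    for w in reversed(range(n_words)):
        a_word = (a >> (word * w)) & ((1 << word) - 1)
        c = mul_x(c, word) ^ partial_product(a_word, b)
    return c

def reference_multiply(a, b):
    """Bit-serial schoolbook product followed by reduction (cross-check only)."""
    prod = 0
    for i in range(M):
        if (a >> i) & 1:
            prod ^= b << i
    return reduce_mod_f(prod)

if __name__ == "__main__":
    rng = random.Random(0)
    a, b = rng.getrandbits(M), rng.getrandbits(M)
    print(bpws_multiply(a, b) == reference_multiply(a, b))
```

A model of this kind is mainly useful as a golden reference when simulating the
hardware: the same operands can be applied to both, and the results compared bit
by bit.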
Appendix A
Program 1
/**********************************************************************
*** top_wrapper.v ***
*** Wenkai Tang ***
*** Aug. 16 2003 ***
**********************************************************************/
// The circuit hierarchy includes two levels. The top_wrapper.v is
// the top level description; it defines the I/O pad cells.
`timescale 1ns/10ps
module top_wrapper (data, addr, clk, write, reset, select, data_out,
                    wclk, rstclk, test_se, test_in, test_in1, test_in2);
input [7:0] data;
input [4:0] addr;
input test_se, test_in, test_in1, test_in2;
input clk, write, reset, select, wclk, rstclk;
output [7:0] data_out;
wire [7:0] data_top, data_out_top;
wire [4:0] addr_top;
wire clk_top, write_top, reset_top, select_top, wclk_top,
     rstclk_top, test_se_top, test_in_top, test_in1_top,
     test_in2_top;
top_sys top_sys (data_top, addr_top, clk_top, write_top, reset_top,
                 select_top, data_out_top, wclk_top, rstclk_top,
                 test_se_top, test_in_top, test_in1_top, test_in2_top);
PDO08CDG pdata_out00 (.PAD(data_out[0]), .I(data_out_top[0]) );
PDO08CDG pdata_out01 (.PAD(data_out[1]), .I(data_out_top[1]) );
PDO08CDG pdata_out02 (.PAD(data_out[2]), .I(data_out_top[2]) );
PDO08CDG pdata_out03 (.PAD(data_out[3]), .I(data_out_top[3]) );
PDO08CDG pdata_out04 (.PAD(data_out[4]), .I(data_out_top[4]) );
PDO08CDG pdata_out05 (.PAD(data_out[5]), .I(data_out_top[5]) );
PDO08CDG pdata_out06 (.PAD(data_out[6]), .I(data_out_top[6]) );
PDO08CDG pdata_out07 (.PAD(data_out[7]), .I(data_out_top[7]) );
PDIDGZ pdata00 (.C(data_top[0]), .PAD(data[0]) );
PDIDGZ pdata01 (.C(data_top[1]), .PAD(data[1]) );
PDIDGZ pdata02 (.C(data_top[2]), .PAD(data[2]) );
PDIDGZ pdata03 (.C(data_top[3]), .PAD(data[3]) );
PDIDGZ pdata04 (.C(data_top[4]), .PAD(data[4]) );
PDIDGZ pdata05 (.C(data_top[5]), .PAD(data[5]) );
PDIDGZ pdata06 (.C(data_top[6]), .PAD(data[6]) );
PDIDGZ pdata07 (.C(data_top[7]), .PAD(data[7]) );
PDIDGZ paddr00 (.C(addr_top[0]), .PAD(addr[0]) );
PDIDGZ paddr01 (.C(addr_top[1]), .PAD(addr[1]) );
PDIDGZ paddr02 (.C(addr_top[2]), .PAD(addr[2]) );
PDIDGZ paddr03 (.C(addr_top[3]), .PAD(addr[3]) );
PDIDGZ paddr04 (.C(addr_top[4]), .PAD(addr[4]) );
PDIDGZ pclk (.C(clk_top), .PAD(clk) );
PDIDGZ pwrite (.C(write_top), .PAD(write) );
PDIDGZ pwclk (.C(wclk_top), .PAD(wclk) );
PDIDGZ prstclk (.C(rstclk_top), .PAD(rstclk) );
PDIDGZ pselect (.C(select_top), .PAD(select) );
PDIDGZ preset (.C(reset_top), .PAD(reset) );
PDIDGZ ptest_se (.C(test_se_top), .PAD(test_se) );
PDIDGZ ptest_in (.C(test_in_top), .PAD(test_in) );
PDIDGZ ptest_in1 (.C(test_in1_top), .PAD(test_in1) );
PDIDGZ ptest_in2 (.C(test_in2_top), .PAD(test_in2) );
endmodule
Appendix B 
Program 2
/**********************************************************************
*** topall.v ***
*** Wenkai Tang ***
*** Aug. 16 2003 ***
**********************************************************************/
// The circuit hierarchy includes two levels. The topall.v is
// the second level description; it models the circuit and still
// contains a hierarchy.
`timescale 1ns/10ps
module top_sys(a, addr, clk, w, rst, sel, c, wclk, rstclk, test_se,
               test_in, test_in1, test_in2);
input [7:0] a;
input [4:0] addr;
input clk, wclk, rstclk;
input w;
input rst;
input sel;
input test_in, test_se, test_in1, test_in2;
output [7:0] c;
wire [255:0] w1, w2;
wire [7:0] w81;
wire [232:0] w11, w12, w13, w14, w15;
codec c1(a, addr, clk, w, w1);
assign w11 = w1[240:8];
assign w81 = w1[7:0];
BPWSMSBFFM b1(w81, w11, clk, rst, rstclk, w12);
fbp_squarer f1(w11, w13);
multiplixer2to1 m1(w12, w13, sel, w14);
latch233 l1(w14, wclk, 1'b0, w15);
assign w2[232:0] = w15;
assign w2[255:233] = 'b0;
codecout c2(w2, addr, clk, w, c);
endmodule
// Module codec is a component in module top_sys; it decodes the address
// and writes the data into a part of the register.
module codec(data, addr, clk, w, data_out);
input [7:0] data;
input [4:0] addr;
input clk;
input w;
output [255:0] data_out;
reg [255:0] data_out;
always @(posedge clk)
begin
  if (w)
    case (addr)
      5'b00000: data_out[7:0] = data;
      5'b00001: data_out[15:8] = data;
      5'b00010: data_out[23:16] = data;
      5'b00011: data_out[31:24] = data;
      5'b00100: data_out[39:32] = data;
      5'b00101: data_out[47:40] = data;
      5'b00110: data_out[55:48] = data;
      5'b00111: data_out[63:56] = data;
      5'b01000: data_out[71:64] = data;
      5'b01001: data_out[79:72] = data;
      5'b01010: data_out[87:80] = data;
      5'b01011: data_out[95:88] = data;
      5'b01100: data_out[103:96] = data;
      5'b01101: data_out[111:104] = data;
      5'b01110: data_out[119:112] = data;
      5'b01111: data_out[127:120] = data;
      5'b10000: data_out[135:128] = data;
      5'b10001: data_out[143:136] = data;
      5'b10010: data_out[151:144] = data;
      5'b10011: data_out[159:152] = data;
      5'b10100: data_out[167:160] = data;
      5'b10101: data_out[175:168] = data;
      5'b10110: data_out[183:176] = data;
      5'b10111: data_out[191:184] = data;
      5'b11000: data_out[199:192] = data;
      5'b11001: data_out[207:200] = data;
      5'b11010: data_out[215:208] = data;
      5'b11011: data_out[223:216] = data;
      5'b11100: data_out[231:224] = data;
      5'b11101: data_out[239:232] = data;
      5'b11110: data_out[247:240] = data;
      5'b11111: data_out[255:248] = data;
    endcase
end
endmodule
//Module codecout is a component in module top_sys; it decodes
//the address and reads the data out of the register.
module codecout(data, addr, clk, w, data_out);
input [255:0] data;
input [4:0] addr;
input w;
input clk;
output [7:0] data_out;
reg [7:0] data_out;
always @(posedge clk)
begin
  if (!w)
    case (addr)
      5'b00000: data_out = data[7:0];
      5'b00001: data_out = data[15:8];
      5'b00010: data_out = data[23:16];
      5'b00011: data_out = data[31:24];
      5'b00100: data_out = data[39:32];
      5'b00101: data_out = data[47:40];
      5'b00110: data_out = data[55:48];
      5'b00111: data_out = data[63:56];
      5'b01000: data_out = data[71:64];
      5'b01001: data_out = data[79:72];
      5'b01010: data_out = data[87:80];
      5'b01011: data_out = data[95:88];
      5'b01100: data_out = data[103:96];
      5'b01101: data_out = data[111:104];
      5'b01110: data_out = data[119:112];
      5'b01111: data_out = data[127:120];
      5'b10000: data_out = data[135:128];
      5'b10001: data_out = data[143:136];
      5'b10010: data_out = data[151:144];
      5'b10011: data_out = data[159:152];
      5'b10100: data_out = data[167:160];
      5'b10101: data_out = data[175:168];
      5'b10110: data_out = data[183:176];
      5'b10111: data_out = data[191:184];
      5'b11000: data_out = data[199:192];
      5'b11001: data_out = data[207:200];
      5'b11010: data_out = data[215:208];
      5'b11011: data_out = data[223:216];
      5'b11100: data_out = data[231:224];
      5'b11101: data_out = data[239:232];
      5'b11110: data_out = data[247:240];
      5'b11111: data_out = data[255:248];
    endcase
end
endmodule
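The codec and codecout modules above both treat the 256-bit register as 32 byte
lanes, where the 5-bit address n selects bits [8n+7 : 8n]. The following Python sketch
models only that addressing; it does not model the clocking, the write-enable
polarity, or any handshake of the chip, and the helper names are hypothetical.

```python
def byte_lane(addr):
    """Return (msb, lsb) of the register slice selected by a 5-bit address."""
    if not 0 <= addr < 32:
        raise ValueError("addr must fit in 5 bits")
    return 8 * addr + 7, 8 * addr

def write_byte(reg, addr, data):
    """Model of the codec case statement: overwrite one byte lane of `reg`."""
    _, lsb = byte_lane(addr)
    return (reg & ~(0xFF << lsb)) | ((data & 0xFF) << lsb)

def read_byte(reg, addr):
    """Model of the codecout case statement: read one byte lane of `reg`."""
    _, lsb = byte_lane(addr)
    return (reg >> lsb) & 0xFF

if __name__ == "__main__":
    reg = write_byte(0, 3, 0xAB)
    print(byte_lane(3), hex(read_byte(reg, 3)))
```

With this mapping, the slices used in top_sys can be read off directly: w1[7:0] is
lane 0, and w1[240:8] covers lanes 1 to 29 in full plus the least significant bit of
lane 30.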
//Module BPWSMSBFFM is a component in module top_sys. It is
//the proposed BPWS PB finite field multiplier.
module BPWSMSBFFM(a, b, clk, rst, rstclk, c);
input [7:0] a;
input [232:0] b;
input clk;
input rst, rstclk;
output [232:0] c;
wire [232:0] w1, w2, w3, w4, w5, w6, w7;
wire w0;
assign w0 = 1'b0;
latch233 l1 (b, rstclk, w0, w1); //clk
multiplier8x233 m8x233 (a, w1, w2);
xor_network xn1 (w2, w5, w3);
assign c = w3;
const_multiplier cm8 (w3, w4);
latch233 l2 (w4, clk, rst, w5);
endmodule
//Module fbp_squarer is a component in module top_sys. It is
//the full bit parallel PB squarer.
module fbp_squarer(a, b);
input [232:0] a;
output [232:0] b;
reg [232:0] b;
integer k;
integer m;
integer i;
always @(a)
begin
  k = 74;
  m = 233;
  for (i = 0; i < k; i = i + 2)
    b[i] = a[i/2] ^ a[m-k/2+i/2];
  for (i = 1; i < k; i = i + 2)
    b[i] = a[(m+i)/2];
  for (i = k; i < 2*k; i = i + 2)
    b[i] = a[i/2] ^ a[m-k+i/2];
  for (i = k+1; i < m; i = i + 2)
    b[i] = a[(m+i)/2] ^ a[(m-k+i)/2];
  for (i = 2*k; i < m; i = i + 2)
    b[i] = a[i/2];
end
endmodule
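The index expressions in the five loops of fbp_squarer can be cross-checked in
software. The Python sketch below mirrors the loops (using integer division, as
Verilog does for integer operands) and compares the result with a generic squaring
followed by reduction. It assumes that the constants m = 233 and k = 74 stand for
the reduction trinomial x^233 + x^74 + 1; the function names are hypothetical.

```python
import random

M, K = 233, 74                    # assumed trinomial x^233 + x^74 + 1
F = (1 << M) | (1 << K) | 1

def reduce_mod_f(v):
    """Reduce polynomial v (bit i = coefficient of x^i) modulo F."""
    while v.bit_length() > M:
        v ^= F << (v.bit_length() - 1 - M)
    return v

def square_reference(a):
    """Generic squaring: move coefficient a_i to x^(2i), then reduce modulo F."""
    s = 0
    for i in range(M):
        if (a >> i) & 1:
            s |= 1 << (2 * i)
    return reduce_mod_f(s)

def square_wired(a):
    """Mirror of the five for-loops in fbp_squarer."""
    def bit(i):
        return (a >> i) & 1
    b = [0] * M
    for i in range(0, K, 2):
        b[i] = bit(i // 2) ^ bit(M - K // 2 + i // 2)
    for i in range(1, K, 2):
        b[i] = bit((M + i) // 2)
    for i in range(K, 2 * K, 2):
        b[i] = bit(i // 2) ^ bit(M - K + i // 2)
    for i in range(K + 1, M, 2):
        b[i] = bit((M + i) // 2) ^ bit((M - K + i) // 2)
    for i in range(2 * K, M, 2):
        b[i] = bit(i // 2)
    return sum(v << i for i, v in enumerate(b))

if __name__ == "__main__":
    rng = random.Random(0)
    a = rng.getrandbits(M)
    print(square_wired(a) == square_reference(a))
```

Because every output bit is an XOR of at most two input bits, the squarer is purely
combinational, which is consistent with the single clock cycle quoted for squaring.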
//Module latch233 is a component in both module top_sys and
//module BPWSMSBFFM. It serves as a 233-bit register.
module latch233(a, clk, rst, q);
input [232:0] a;
input clk;
input rst;
output [232:0] q;
reg [232:0] q;
integer k;
always @(posedge clk) //or posedge rst
  if (rst)
    q = 233'b0;
  else
    q = a;
endmodule
//Module multiplixer2to1 is a component in module top_sys.
//It is the multiplexer used to select the output from either the
//multiplier or the squarer.
module multiplixer2to1(a, b, sel, c);
input [232:0] a;
input [232:0] b;
input sel;
output [232:0] c;
reg [232:0] c;
always @(a or b or sel)
begin
  if (sel)
    c = a;
  else
    c = b;
end
endmodule
//Module multiplier8x233 is a component in module BPWSMSBFFM.
//It is the 8x233 partial product generator.
module multiplier8x233(a, b, c);
input [7:0] a;
input [232:0] b;
output [232:0] c;
wire [232:0] x1, x2, x3, x4, x5, x6, m1, m2, m3, m4, m5, m6, m7,
             a0, a1, a2, a3, a4, a5, a6, a7;
const_m1 cm1 (b, m1);
const_m2 cm2 (b, m2);
const_m3 cm3 (b, m3);
const_m4 cm4 (b, m4);
const_m5 cm5 (b, m5);
const_m6 cm6 (b, m6);
const_m7 cm7 (b, m7);
and_network am0 (a[0], b, a0);
and_network am1 (a[1], m1, a1);
and_network am2 (a[2], m2, a2);
and_network am3 (a[3], m3, a3);
and_network am4 (a[4], m4, a4);
and_network am5 (a[5], m5, a5);
and_network am6 (a[6], m6, a6);
and_network am7 (a[7], m7, a7);
xor_network xa1 (a0, a1, x1);
xor_network xa2 (a2, a3, x2);
xor_network xa3 (a4, a5, x3);
xor_network xa4 (a6, a7, x4);
xor_network xa5 (x1, x2, x5);
xor_network xa6 (x3, x4, x6);
xor_network xa7 (x5, x6, c);
endmodule
//Module xor_network is a component in both module BPWSMSBFFM
//and module multiplier8x233. It is the 233-bit adder.
module xor_network(a, b, c);
input [232:0] a;
input [232:0] b;
output [232:0] c;
reg [232:0] c;
integer k;
always @(a or b)
  for (k = 0; k < 233; k = k + 1)
    c[k] = a[k] ^ b[k];
endmodule
//Module const_multiplier is a component in module BPWSMSBFFM.
//It is the constant multiplier.
module const_multiplier(a, b);
input [232:0] a;
output [232:0] b;
reg [232:0] b;
integer k;
always @(a)
begin
  b[0] = a[225];
  b[1] = a[226];
  b[2] = a[227];
  b[3] = a[228];
  b[4] = a[229];
  b[5] = a[230];
  b[6] = a[231];
  b[7] = a[232];
  for (k = 8; k < 74; k = k + 1)
    b[k] = a[k-8];
  b[74] = a[225] ^ a[66];
  b[75] = a[226] ^ a[67];
  b[76] = a[227] ^ a[68];
  b[77] = a[228] ^ a[69];
  b[78] = a[229] ^ a[70];
  b[79] = a[230] ^ a[71];
  b[80] = a[231] ^ a[72];
  b[81] = a[232] ^ a[73];
  for (k = 82; k < 233; k = k + 1)
    b[k] = a[k-8];
end
endmodule
//Module const_m1 is a component in module multiplier8x233.
//It is one constant multiplier.
module const_m1(a, b);
input [232:0] a;
output [232:0] b;
reg [232:0] b;
integer k;
always @(a)
begin
  b[0] = a[232];
  for (k = 1; k < 74; k = k + 1)
    b[k] = a[k-1];
  b[74] = a[232] ^ a[73];
  for (k = 75; k < 233; k = k + 1)
    b[k] = a[k-1];
end
endmodule
//Module const_m2 is a component in module multiplier8x233.
//It is one constant multiplier.
module const_m2(a, b);
input [232:0] a;
output [232:0] b;
reg [232:0] b;
integer k;
always @(a)
begin
  b[0] = a[231];
  b[1] = a[232];
  for (k = 2; k < 74; k = k + 1)
    b[k] = a[k-2];
  b[74] = a[231] ^ a[72];
  b[75] = a[232] ^ a[73];
  for (k = 76; k < 233; k = k + 1)
    b[k] = a[k-2];
end
endmodule
//Module const_m3 is a component in module multiplier8x233.
//It is one constant multiplier.
module const_m3(a, b);
input [232:0] a;
output [232:0] b;
reg [232:0] b;
integer k;
always @(a)
begin
  b[0] = a[230];
  b[1] = a[231];
  b[2] = a[232];
  for (k = 3; k < 74; k = k + 1)
    b[k] = a[k-3];
  b[74] = a[230] ^ a[71];
  b[75] = a[231] ^ a[72];
  b[76] = a[232] ^ a[73];
  for (k = 77; k < 233; k = k + 1)
    b[k] = a[k-3];
end
endmodule
//Module const_m4 is a component in module multiplier8x233.
//It is one constant multiplier.
module const_m4(a, b);
input [232:0] a;
output [232:0] b;
reg [232:0] b;
integer k;
always @(a)
begin
  b[0] = a[229];
  b[1] = a[230];
  b[2] = a[231];
  b[3] = a[232];
  for (k = 4; k < 74; k = k + 1)
    b[k] = a[k-4];
  b[74] = a[229] ^ a[70];
  b[75] = a[230] ^ a[71];
  b[76] = a[231] ^ a[72];
  b[77] = a[232] ^ a[73];
  for (k = 78; k < 233; k = k + 1)
    b[k] = a[k-4];
end
endmodule
//Module const_m5 is a component in module multiplier8x233.
//It is one constant multiplier.
module const_m5(a, b);
input [232:0] a;
output [232:0] b;
reg [232:0] b;
integer k;
always @(a)
begin
  b[0] = a[228];
  b[1] = a[229];
  b[2] = a[230];
  b[3] = a[231];
  b[4] = a[232];
  for (k = 5; k < 74; k = k + 1)
    b[k] = a[k-5];
  b[74] = a[228] ^ a[69];
  b[75] = a[229] ^ a[70];
  b[76] = a[230] ^ a[71];
  b[77] = a[231] ^ a[72];
  b[78] = a[232] ^ a[73];
  for (k = 79; k < 233; k = k + 1)
    b[k] = a[k-5];
end
endmodule
//Module const_m6 is a component in module multiplier8x233.
//It is one constant multiplier.
module const_m6(a, b);
input [232:0] a;
output [232:0] b;
reg [232:0] b;
integer k;
always @(a)
begin
  b[0] = a[227];
  b[1] = a[228];
  b[2] = a[229];
  b[3] = a[230];
  b[4] = a[231];
  b[5] = a[232];
  for (k = 6; k < 74; k = k + 1)
    b[k] = a[k-6];
  b[74] = a[227] ^ a[68];
  b[75] = a[228] ^ a[69];
  b[76] = a[229] ^ a[70];
  b[77] = a[230] ^ a[71];
  b[78] = a[231] ^ a[72];
  b[79] = a[232] ^ a[73];
  for (k = 80; k < 233; k = k + 1)
    b[k] = a[k-6];
end
endmodule
//Module const_m7 is a component in module multiplier8x233.
//It is one constant multiplier.
module const_m7(a, b);
input [232:0] a;
output [232:0] b;
reg [232:0] b;
integer k;
always @(a)
begin
  b[0] = a[226];
  b[1] = a[227];
  b[2] = a[228];
  b[3] = a[229];
  b[4] = a[230];
  b[5] = a[231];
  b[6] = a[232];
  for (k = 7; k < 74; k = k + 1)
    b[k] = a[k-7];
  b[74] = a[226] ^ a[67];
  b[75] = a[227] ^ a[68];
  b[76] = a[228] ^ a[69];
  b[77] = a[229] ^ a[70];
  b[78] = a[230] ^ a[71];
  b[79] = a[231] ^ a[72];
  b[80] = a[232] ^ a[73];
  for (k = 81; k < 233; k = k + 1)
    b[k] = a[k-7];
end
endmodule
//Module and_network is a component in module BPWSMSBFFM.
//It is the AND network used to multiply an element by a
//coefficient.
module and_network(a, b, c);
input a;
input [232:0] b;
output [232:0] c;
reg [232:0] c;
integer k;
always @(a or b)
begin
  for (k = 0; k < 233; k = k + 1)
    c[k] = a & b[k];
end
endmodule
References
[1] M. Benantar, "Introduction to the Public Key Infrastructure for the
Internet", Prentice Hall PTR, 2002.
[2] T. Beth, D. Gollmann, "Algorithm Engineering for Public Key Algorithms",
IEEE Journal on Selected Areas in Communications, VOL. 7,
NO. 4, May 1989.
[3] G. Birkhoff, S. Mac Lane, "A Survey of Modern Algebra", 5th ed. New
York: Macmillan, p. 413, 1996.
[4] H. Brunner, A. Curiger, M. Hofstetter, "On Computing Multiplicative
Inverses in GF(2^m)", IEEE Trans. Computers, VOL. 42, NO. 8, August
1993.
[5] H. Eberle, S. Chang, N. Gura, S. Gupta, D. Finchelstein, E. Goupy, D.
Stebila, "An End-to-End Systems Approach to Elliptic Curve Cryptography",
Sun Microsystems Laboratories, 2002-2003.
[6] D. Gollmann, "Equally Spaced Polynomials, Dual Bases, and Multiplication
in F2m", IEEE Trans. Computers, VOL. 51, NO. 5, May 2002.
[7] C. Grabbe, M. Bednara, J. Teich, J. von zur Gathen, J. Shokrollahi,
"FPGA designs of parallel high performance GF(2^233) multipliers",
Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International
Symposium on, VOL. 2, 25-28 May 2003.
[8] J. Großschädl, "A LOW-POWER BIT-SERIAL MULTIPLIER FOR
FINITE FIELDS GF(2^m)", IEEE International Symposium on Circuits
and Systems ISCAS 2001, Sydney, Australia, May 6-9, 2001.
[9] U. Hansmann, M. S. Nicklous, T. Schack, F. Seliger, "Smart Card
Application Development Using Java", Springer, first edition, 2000.
[10] I.S. Hsu, T.K. Truong, L.J. Deutsch, I.S. Reed, "A comparison of VLSI
architecture of finite field multipliers using dual, normal, or standard
bases", IEEE Trans. Computers, VOL. 37, Issue 6, June 1988.
[11] N. Koblitz, "Elliptic curve cryptosystems", Mathematics of Computation,
American Mathematical Society, 48(177):203-209, 1987.
[12] C.K. Koc, B. Sunar, "Low-Complexity Bit-Parallel Canonical and Normal
Basis Multipliers for a Class of Finite Fields", IEEE Trans. Computers,
VOL. 47, NO. 3, March 1998.
[13] C.Y. Lee, "Low complexity bit-parallel systolic multiplier over GF(2^m)
using irreducible trinomial", IEE Proc. Comput. Digit. Tech., Vol. 150,
No. 1, January 2003.
[14] C.Y. Lee, E.H. Lu, J.Y. Lee, "New Bit-Parallel systolic multipliers for
a class of GF(2^m)", Computer Arithmetic, 2001. Proceedings. 15th
IEEE Symposium on, 11-13 June 2001, Pages: 51-58.
[15] E.D. Mastrovito, "VLSI Architectures for Multiplication Over Finite
Field GF(2^m)", Applied Algebra, Algebraic Algorithms, and Error-Correcting
Codes, Proc. Sixth Int'l Conf., AAECC-6, T. Mora, ed.,
pp. 297-309, Rome, July 1988. New York: Springer-Verlag.
[16] V.S. Miller, "Use of elliptic curves in cryptography", CRYPTO '85,
Proceedings of Crypto, pages 417-426, Springer, 1985.
[17] T. Nagell, "Irreducibility of the Cyclotomic Polynomial", §47 in
Introduction to Number Theory. New York: Wiley, pp. 160-164, 1951.
[18] S. Okada, N. Torii, K. Itoh, M. Takenaka, "Implementation of Elliptic
Curve Cryptographic Coprocessor over GF(2^m) on an FPGA", C.K.
Koc and C. Paar (Eds.): CHES 2000, LNCS 1965, pp. 25-40, 2000.
Springer-Verlag Berlin Heidelberg 2000.
[19] A. Reyhani-Masoleh, M. Anwar Hasan, "A New Construction of
Massey-Omura Parallel Multiplier over GF(2^m)", IEEE Trans. Computers,
VOL. 51, NO. 5, May 2002.
[20] A. Menezes, "Elliptic Curve Cryptosystems", CryptoBytes, Vol. 1,
No. 2, Summer 1995.
[21] C. Paar, "A new architecture for a parallel finite field multiplier with
low complexity based on composite fields", IEEE Trans. Computers,
VOL. 45, NO. 7, July 1996.
[22] M.J.B. Robshaw, Y.L. Yin, "Elliptic Curve Cryptosystems", An RSA
Laboratories Technical Note, Revised June 27, 1997.
[23] B. Schneier, "Applied Cryptography", John Wiley and Sons, Inc., 1994.
[24] L. Song, K. K. Parhi, "Efficient Finite Field Serial/Parallel Multiplication",
Application Specific Systems, Architectures and Processors,
1996. ASAP 96. Proceedings of International Conference on, 19-21
Aug. 1996, Pages: 72-82.
[25] B. Sunar, C.K. Koc, "Mastrovito Multiplier for All Trinomials", IEEE
Trans. Computers, VOL. 48, NO. 5, May 1999.
[26] N. Takagi, J. Yoshiki, K. Takagi, "A Fast Algorithm for Multiplicative
Inversion in GF(2^m) Using Normal Basis", IEEE Trans. Computers,
VOL. 50, NO. 5, May 2001.
[27] C.C. Wang, T.K. Truong, H.M. Shao, L.J. Deutsch, J.K. Omura, and
I.S. Reed, "VLSI Architectures for computing multiplications and Inverses
in GF(2^m)", IEEE Trans. Computers, VOL. 46, NO. 2, Feb.
1997.
[28] H. Wu, "Bit-Parallel Finite Field Multiplier and Squarer Using Polynomial
Basis", IEEE Trans. Computers, VOL. 51, NO. 7, July 2002.
[29] National Institute of Standards and Technology, FIPS PUB 186-2, Jan
2000.






VITA AUCTORIS
Wenkai Tang was born in 1969 in P.R. China. He received his Bachelor's Degree
in Optoelectronics from the Electronic Engineering Department at Tsinghua University
in 1992. He is currently a candidate for the Master of Applied Science Degree in the
Department of Electrical and Computer Engineering at the University of Windsor
and hopes to graduate in Winter 2004.
