Hardware Implementations of ECC over a Binary Edwards Curve by Kocabaş, Ünal

ISTANBUL TECHNICAL UNIVERSITY ★ INSTITUTE OF SCIENCE AND TECHNOLOGY
HARDWARE IMPLEMENTATIONS OF ECC
OVER A BINARY EDWARDS CURVE
M.Sc. Thesis by
Ünal KOCABAŞ, M.Sc.
Department : Electronics and Communication Engineering
Programme : Electronics Engineering
JULY 2009
ISTANBUL TECHNICAL UNIVERSITY ★ INSTITUTE OF SCIENCE AND TECHNOLOGY
HARDWARE IMPLEMENTATIONS OF ECC
OVER A BINARY EDWARDS CURVE
M.Sc. Thesis by
Ünal KOCABAŞ, M.Sc.
(504071224)
Date of Submission : 4 May 2009
Date of Examination : 22 July 2009
Supervisor : Ass. Prof. Dr. Sıddıka Berna ÖRS YALÇIN
Members of the Examining Committee : Ass. Prof. Dr. Devrim Yılmaz AKSIN
Ass. Prof. Dr. Osman Kaan EROL
JULY 2009
ACKNOWLEDGEMENT
Firstly, I would like to thank my supervisor Ass. Prof. Dr. S. Berna Örs Yalçın,
who supported me throughout my master's studies, introduced me to cryptography and
encouraged me to come to Katholieke Universiteit Leuven for my master's thesis.
I would also like to thank my supervisors Dr. Lejla Batina and Prof. Ingrid
Verbauwhede for their guidance and support during the thesis and for their assistance
with administrative matters.
I would like to thank Miroslav Knežević and Vladimir Rožić for their endless
support, friendship and advice during my implementation.
I am very grateful to Kerem Varıcı and Özgül Küçük for their sincere friendship,
precious favors and corrections.
I am thankful to my family, to whom I dedicate this thesis and who are the foundation
of my accomplishments. Their support has encouraged me to work hard and reach for the
best in my life.
Finally, I would like to thank my love Emanuela Zaraj for her love and support and for
the happiness she brings to my life. It is a great feeling that she is always with me
whenever I need her.
July 2009 Ünal KOCABAŞ
TABLE OF CONTENTS
ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
SUMMARY
ÖZET
1. INTRODUCTION
1.1. Motivation
1.2. Organization of Thesis
2. CRYPTOSYSTEMS
2.1. Symmetric-key Cryptosystems
2.2. Asymmetric-key Cryptosystems
2.2.1. Key Generation
2.2.2. Public-key Encryption
2.2.3. Digital Signature
2.2.4. Diffie-Hellman Key Management
3. ESSENTIAL CONCEPTS
3.1. Integers
3.1.1. The Integers modulo n
3.2. Groups
3.3. Rings
3.3.1. Polynomial Rings
3.4. Fields
3.4.1. Finite Fields
3.4.1.1. Addition and Subtraction
3.4.1.2. Multiplication
4. ELLIPTIC CURVE CRYPTOSYSTEMS
4.1. Discrete Logarithm Problem
4.1.1. Diffie-Hellman Key Agreement Protocol
4.1.2. The El-Gamal Cryptosystem
4.1.3. Elliptic Curve Discrete Logarithm Problem
4.2. Introduction to Elliptic Curves
4.2.1. Weierstrass Equation
4.2.2. Point Addition over Finite Fields
4.3. Point Multiplication
4.4. Projective Coordinates
4.4.1. Standard Projective Coordinates
4.4.2. Jacobian Coordinates
5. EDWARDS CURVES
5.1. Binary Edwards Curves
5.1.1. Introduction to Binary Edwards Curves
5.1.2. Binary Edwards Curves Addition Law
5.1.3. Complete Binary Edwards Curves
5.1.4. Explicit Addition Formulas
5.1.4.1. Affine Addition
5.1.4.2. Mixed Addition
5.1.4.3. Projective Addition
5.1.5. Doubling
5.1.5.1. Affine Doubling
5.1.5.2. Projective Doubling
5.1.6. Differential Addition
6. BINARY EDWARDS CURVES IMPLEMENTATION
6.1. Implementation of Binary Edwards Curves
6.1.1. Processor
6.1.1.1. MALU (Modular Arithmetic Logic Unit) Design
6.1.1.2. Register File Design
6.1.1.3. Shifter Design
6.1.1.4. Control Block
6.1.2. Bus Manager
6.1.3. Memory Units
6.1.3.1. ROM
6.1.3.2. RAM
6.2. Algorithms for Implementation of Binary Edwards Curves
7. RESULTS
7.1. Power Estimation Methodology
7.2. Area, Power Consumption in Different Frequencies
7.3. Trade-off
7.4. Clock-gating
8. CONCLUSION
REFERENCES
APPENDIX
A. Control Sequence of Control Block
B. Control Bits of Assign Operation
C. Codes for Cell and MALU
D. Comparison Between Normal Clocking and Clock-gating
E. Simulation Examples
CURRICULUM VITAE
ABBREVIATIONS
AES : Advanced Encryption Standard
BEC : Binary Edwards Curves
CDHP : Computational Diffie-Hellman Problem
DLP : Discrete Logarithm Problem
DPA : Differential Power Analysis
EC : Elliptic Curve
ECC : Elliptic Curve Cryptosystem
ECDLP : Elliptic Curve Discrete Logarithm Problem
FSM : Finite State Machine
GPG : GNU Privacy Guard
LSB : Least Significant Bit
LUT : Look-Up Table
MALU : Modular Arithmetic Logic Unit
MSB : Most Significant Bit
NESSIE : New European Schemes for Signatures, Integrity and Encryption
NIST : National Institute of Standards and Technology
PGP : Pretty Good Privacy
PKC : Public-key Cryptography
RAM : Random-access Memory
RFID : Radio-Frequency Identification
ROM : Read-Only Memory
RSA : Rivest-Shamir-Adleman
SPA : Simple Power Analysis
SSL : Secure Sockets Layer
LIST OF TABLES
Table 3.1 Addition in Finite Fields
Table 4.1 NESSIE Recommendations
Table 5.1 The Speed of Binary Edwards Curves Differential Addition
Table 6.1 Example of inversion in F_{2^163}
Table 7.1 Results of implementation in 100 kHz in 0.13 µm technology
Table 7.2 Results of implementation in 400 kHz in 0.13 µm technology
Table 7.3 Results of implementation in 1 MHz in 0.13 µm technology
Table 7.4 Results of implementation in 5 MHz in 0.13 µm technology
Table 7.5 Results of implementation in 20 MHz in 0.13 µm technology
Table 7.6 Results of implementation in 50 MHz in 0.13 µm technology
Table 7.7 Our example design for d = 4 and 400 kHz
Table 7.8 Results of implementation in 400 kHz in 0.13 µm technology after clock-gating
Table 7.9 Results of implementation in 5 MHz in 0.13 µm technology after clock-gating
LIST OF FIGURES
Figure 2.1 : Symmetric-key Cryptosystems
Figure 2.2 : Key Generation
Figure 2.3 : Public-key Encryption
Figure 2.4 : Public-key Digital Signature Diagram
Figure 2.5 : Diffie-Hellman Key Management Scheme
Figure 3.1 : Modulo Operation over Rijndael Finite Field
Figure 4.1 : Diffie-Hellman Key Agreement Protocol
Figure 4.2 : El-Gamal Cryptosystem
Figure 4.3 : Point Addition of a Point in Elliptic Curve Equation y^2 = x^3 − 50x + 100
Figure 4.4 : Doubling of a Point in Elliptic Curve Equation y^2 = x^3 − 50x + 100
Figure 4.5 : Hierarchy of Elliptic Curve Cryptosystems
Figure 6.1 : The architecture of Binary Edwards Curves Processor (BEC Processor)
Figure 6.2 : BEC Processor’s MALU Architecture
Figure 6.3 : Control Scheme of Cell and MALU with d = 4
Figure 6.4 : Register File Architecture
Figure 6.5 : Shifting Operation in regB and Data Assigning, Taking Process in regD
Figure 6.6 : The Architecture of Shifter Block
Figure 6.7 : The Finite State Machine Diagram of Shifter Component
Figure 6.8 : The Processor of Binary Edwards Curves
Figure 6.9 : The Map of Address Control
Figure 6.10: Architecture of Bus Manager
Figure 6.11: RAM Blocks and Storing Values
Figure 6.12: Required Operations for Binary Edwards Curves Implementation
Figure 7.1 : Power estimation flow
Figure 7.2 : Area Consumption vs. Time
Figure 7.3 : Throughput vs. Power Consumption in d = 4
Figure 7.4 : Power Consumption in 5 MHz with different digit sizes
Figure 7.5 : Process of Clock gating
Figure A.1 : The Finite State Machine of Control Block
Figure B.1 : Assign Operation Control Map
Figure C.1 : Codes for Cell and MALU
Figure D.1 : Area Consumption vs. Time
Figure D.2 : Frequency vs. Power Consumption in d = 4
Figure D.3 : Power Consumption in 5 MHz with different digit sizes
Figure E.1 : Assigning Key Value in Modelsim
Figure E.2 : Simulation in GEZEL for First Four Projective Addition
Figure E.3 : Figure of Simulation in Modelsim for First X3 Value
Figure E.4 : Figure of Simulation in Modelsim for First Y3 Value
Figure E.5 : Figure of Simulation in Modelsim for First Z3 Value
Figure E.6 : Simulation in GEZEL for Final Points After Inversion
Figure E.7 : Figure of Simulation in Modelsim for Final Points
LIST OF SYMBOLS
C : Complex numbers
Q : Rational numbers
R : Real numbers
Z : Integers
Z_p : Integers modulo p
G : Group
α : Generator of a group
p : Prime number
e_k : Encryption with key ‘k’
d_k : Decryption with key ‘k’
F_q : Finite field with q elements
E : An elliptic curve
E(K) : An elliptic curve over the field ‘K’
λ : Point addition constant
E_{B,d1,d2} : Binary Edwards curve with constants d1 and d2
X : The X coordinate of a point
Y : The Y coordinate of a point
Z : The Z coordinate of a point
HARDWARE IMPLEMENTATIONS OF ECC OVER A BINARY EDWARDS
CURVE
SUMMARY
Technological progress and its increasing influence on daily life have made
cryptology more visible. Everyday applications such as ATM and ID cards, computer
passwords, secure e-mail, online banking and online commerce rely on cryptographic
protocols. In the past, designs were specified mainly in terms of high speed and
performance. Today, low-power and small-die-area implementations have become equally
important because of the widespread use of devices with limited power and area
budgets.
In this study, a binary Edwards curve, proposed in April 2008 and whose addition law
is defined for all points of the curve (“completeness”), is implemented with minimum
area as the design goal. The study describes the implementation steps, the
optimization of the number of registers and the parallelism introduced into the design
to gain speed. The implementation is designed to be secure against simple power
analysis (SPA), although this slows down the computation. Finally, power, area and
speed are compared.
HARDWARE IMPLEMENTATIONS OF A BINARY EDWARDS CURVE
ÖZET
With the advance of technology and its growing effect on our lives, the effects of
cryptology on daily life have also become clearly visible. Everyday uses such as ATM
and ID card PINs, computer passwords, secure e-mail, online banking transactions and
online commerce are carried out over specific cryptographic protocols. While early
efforts aimed at designing circuits with high speed and high processing capacity,
today, with the spread of applications constrained in energy and area, power- and
area-efficient implementations have gained great importance.
In this study, Edwards curves, which were presented in April 2008 and are defined for
all points of the elliptic curve (“completeness”), are implemented in minimum area.
The implementation steps, the optimization of the number of registers and the speed-up
of the loop through parallel operations are described. The design of the
implementation to be resistant to SPA (Simple Power Analysis) is explained; it is
designed to withstand side-channel attacks even though this slows down the operation.
Finally, power, area and speed comparisons are given.
1. INTRODUCTION
1.1 Motivation
Cryptography is one of the oldest fields of technical study, going back thousands of
years. The word derives from the Greek “kryptós”, meaning “hidden, secret”, and
“gráfo”, meaning “writing”. Until recent decades it was understood as a collection of
encryption methods carried out with pen and paper. More formally, cryptography today
is the art of encoding data so that only the intended recipient can decode it and can
be assured that the message is authentic and unchanged [1].
Since early ages, empires and governments have used cryptography to send messages
securely. The earliest known example is found in non-standard hieroglyphs carved into
monuments from Egypt’s Old Kingdom, which are more than 4500 years old. Cryptography
was also used by the Spartans in the 5th century BC. This warrior society developed a
cryptographic device to send and receive secret messages. The device, a cylinder
called a “Scytale”, was in the possession of both the sender and the recipient of the
message. To prepare the message, a narrow strip of leather was wound around the
Scytale and the message was written across it. To read the message, the strip was
re-wound onto a Scytale of exactly the same diameter. Finally, Caesar’s method, which
relies on shifting the letters by three, is another example of classical cryptography.
In those times, cryptographic methods relied only on the secrecy of the algorithm,
which is called security by obscurity. Today, these ciphers are of historical interest
only and are not adequate for real-world use.
Modern cryptography is based on mathematical algorithms whose security rests on a
hard mathematical problem. Before the 1960s, when computers were not yet a central
part of daily life, cryptography was used mainly for military purposes, as a tool to
protect national secrets and strategies. Cryptography affected the tactics of World
Wars I and II, and new boundaries were drawn depending on developments in
cryptography. After the 1960s, the widespread use of computers and communication
systems brought a demand from the public for means to protect information in digital
form. The growing need for security services led to the design of DES (Data Encryption
Standard) by IBM, which was adopted in 1977 as a U.S. Federal Information Processing
Standard for encrypting unclassified information [2]. Later, the rapid progress of
computers created the need for a new standard, AES (Advanced Encryption Standard),
which was designed by Rijmen and Daemen in 1998. Today, it is among the most widely
used encryption algorithms in the world.
Nowadays, cryptography is used in many technologically advanced applications, such
as ATM cards, smart cards, identification cards, secure e-mail, computer passwords,
biometrics and electronic commerce.
All modern algorithms use a key to control encryption and decryption. Cryptographic
methods can be classified into two main branches: symmetric (private-key)
cryptosystems and asymmetric (public-key) cryptosystems.
1.2 Organization of Thesis
In this thesis, the first hardware implementation of binary Edwards curves is
presented; it offers the completeness property and is suitable for RFID tags.
Chapter 2 gives a brief classification of cryptosystems and explains the idea behind
public-key cryptosystems. Chapter 3 briefly introduces the essential mathematical
concepts. The discrete logarithm problem, an introduction to elliptic curves with
point addition and point multiplication, projective coordinates and the elliptic curve
discrete logarithm problem are summarized in chapter 4. Chapter 5 presents Edwards
curves and binary Edwards curves with explicit formulas for addition and doubling.
Chapter 6 explains the implementation details and working principles of the binary
Edwards curve design and discusses the efficiency of the circuit. Chapter 7 gives the
final results and trade-offs. Finally, chapter 8 draws conclusions on the
implementation of binary Edwards curves and its efficiency.
2. CRYPTOSYSTEMS
2.1 Symmetric-key Cryptosystems
Symmetric algorithms, also called secret-key algorithms, use the same key for both
encryption and decryption. A sender and a receiver share a secret key “K”; the sender
encrypts the plain-text with “K” and an encryption rule. After receiving the
cipher-text, the receiver uses the same secret key and a decryption rule to recover
the plain-text. The security of the system depends on the secrecy of the key;
therefore the key must not leak, should be changed often and must be sufficiently
random. A symmetric-key cryptosystem is illustrated in Figure 2.1.
Figure 2.1: Symmetric-key Cryptosystems
In Figure 2.1, an entity A encrypts a message with “K” and sends it through an
unsecured channel to an entity B. Another entity E, called the eavesdropper, listens
to the unsecured channel, but obtains only the cipher-text and cannot infer a
meaningful message. When B receives the cipher-text and decrypts it using the same
“K”, the communication is completed securely. One drawback of this scheme is that the
shared key K must be distributed before the communication occurs.
Symmetric-key algorithms can be further divided into two categories: stream ciphers
and block ciphers. Stream ciphers generate an arbitrarily long keystream, and
encryption is performed by combining it with the plain-text bit by bit. In contrast,
block ciphers take a block (some fixed number of bits) of plain-text and a key, and
output a block of cipher-text of the same size. Different ciphers use keys of
different lengths, and for a given cipher a longer key usually means higher security.
The most popular example of a block cipher is the Advanced Encryption Standard (AES),
which was approved by NIST in December 2001 and uses 128-bit blocks with 128-, 192- or
256-bit keys [3].
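The bit-by-bit combination performed by a stream cipher can be illustrated with a toy sketch. The keystream generator below (SHA-256 applied to the key and a counter) and the names `keystream` and `xor_stream` are assumptions made for illustration only; this is not a vetted cipher and is not part of the design described in this thesis.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || counter) blocks (illustrative only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_stream(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    ks = keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key = b"shared secret K"            # hypothetical shared key
plaintext = b"attack at dawn"
ciphertext = xor_stream(key, plaintext)
recovered = xor_stream(key, ciphertext)
```

Because XOR is its own inverse, applying the same keystream twice returns the plain-text, which is why sender and receiver need exactly the same key.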
Symmetric-key algorithms are generally much less computationally intensive than
asymmetric ones and use shorter keys. According to [4], public-key encryption
algorithms are two to three orders of magnitude slower than symmetric algorithms, both
in software and in hardware. For example, a 1024-bit exponentiation with a 32-bit
exponent takes 360 µs on a 1 GHz Pentium III, which corresponds to 2800 cycles/byte; a
decryption with a 1024-bit exponent takes 9.8 ms, or 76000 cycles/byte. These figures
should be compared to 15 cycles/byte for AES. Although symmetric-key algorithms have
advantages such as high efficiency, a problem arises when the parties have to share
the same secret key through an insecure channel.
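The cycles/byte figures quoted from [4] can be reproduced with a short calculation. The sketch below assumes that one 1024-bit operation processes one 128-byte block; the constant and function names are ours.

```python
CLOCK_HZ = 1_000_000_000       # 1 GHz Pentium III, as quoted from [4]
BLOCK_BYTES = 1024 // 8        # one 1024-bit operand is 128 bytes

def cycles_per_byte(seconds: float) -> float:
    """Convert the time for one 1024-bit operation into cycles per byte."""
    return seconds * CLOCK_HZ / BLOCK_BYTES

public_op = cycles_per_byte(360e-6)     # about 2800 cycles/byte
private_op = cycles_per_byte(9.8e-3)    # about 76000 cycles/byte
AES_CYCLES_PER_BYTE = 15

slowdown_public = public_op / AES_CYCLES_PER_BYTE     # roughly two orders of magnitude
slowdown_private = private_op / AES_CYCLES_PER_BYTE   # more than three orders of magnitude
```

The two ratios bracket the "two to three orders of magnitude" stated above.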
The process of selecting, distributing and storing keys is known as key management,
and it is difficult to carry out securely [5]. At this point, an asymmetric-key
cryptosystem is used to establish a secret key, which is then used in a symmetric-key
cryptosystem.
2.2 Asymmetric-key Cryptosystems
In contrast to a symmetric cryptosystem, an asymmetric cryptosystem uses a pair of
keys: a public key and a private key. Such a system is also called a public-key
cryptosystem. The public key can be used for encryption and for verification of a
signature, and the private key is used for decryption and for creation of a signature.
Every entity publishes its public key and keeps its private key secret.
The idea of public-key cryptography was introduced in the mid-1970s by Diffie and
Hellman [6]. In a public-key cryptosystem, two different keys are generated, which are
related to each other by a trapdoor function. Trapdoor functions are easy to compute
in one direction but extremely difficult to invert without additional secret
information (the trapdoor) [7]. Public-key cryptosystems are used in key establishment
protocols, for data integrity and entity authentication, and for encrypting small data
items such as credit card numbers and PINs [2]. Key generation, encryption and
signature schemes are discussed in more detail below.
2.2.1 Key Generation
First of all, each entity creates its own private and public key. The public-key
generation function uses an unpredictable (typically a large randomly chosen) number
to generate a valid key pair. The process is shown in Figure 2.2.
[Figure 2.2 diagram: Big random number → Key Generation Function → Public Key, Private Key]
Figure 2.2: Key Generation
After key generation is executed, these keys can be used for the public-key encryption.
2.2.2 Public-key Encryption
In public-key encryption, every entity has a public key “e” and a corresponding
private key “d”. As stated before, the public key is used for encryption (E_e(m)) and
the private key is used for decryption (D_d(c)). This is illustrated in Figure 2.3. In
a secure system, computing “d” from a given “e” is computationally infeasible [2].
In Figure 2.3, an entity B wants to send a message “m” to A and obtains an authentic
copy of A’s public key “e”. B then uses the encryption transformation to obtain the
cipher-text “c = E_e(m)” and transmits “c” to A over an unsecured channel. Anyone can
see this cipher-text, but only A, who holds the corresponding private key, can decrypt
it. To decrypt “c”, A applies the decryption transformation and obtains the original
message “m = D_d(c)”. In public-key encryption, security relies only on the secrecy of
the private key [2].
Figure 2.3: Public-key Encryption
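The transformations E_e and D_d are described abstractly above. As a minimal sketch, textbook RSA with deliberately tiny primes can stand in for them; the choice of RSA, the parameter values and the function names are our assumptions for illustration, and such small, unpadded parameters offer no security.

```python
# Toy textbook RSA, used only as a familiar instance of E_e / D_d (assumption).
p, q = 61, 53                  # tiny primes, for illustration only
n = p * q                      # public modulus, 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+), 2753

def encrypt(m: int) -> int:
    """E_e(m): anyone holding the public key (e, n) can compute this."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """D_d(c): only the holder of the private key d can compute this."""
    return pow(c, d, n)

m = 65                         # message encoded as an integer smaller than n
c = encrypt(m)
recovered = decrypt(c)
```

Here B would compute `c` with A's public pair (e, n), and only A, who knows `d`, recovers `m`.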
2.2.3 Digital Signature
The main objective of public-key encryption is to provide secrecy or confidentiality.
Since A’s encryption transformation is public knowledge, public-key encryption alone
does not provide data authentication or data integrity [2]. Anybody can send a
cipher-text to A, and A has no reason to believe that a message was sent by the
claimed sender unless a digital signature is used. The public-key setting also allows
digital signatures to be applied. The generation of a signature in this setting is
shown in Figure 2.4.
Figure 2.4: Public-key Digital Signature Diagram
In the protocol shown in Figure 2.4, an entity A signs a message with its private key
“d” and sends it to an entity B. To verify the signature, B looks up A’s public key
“e” and checks that “E_e(D_d(sign)) = sign”. Anybody can see and verify A’s signature,
but nobody else can produce A’s signature on a new message and send it to B as A.
Since the signature can only be created with A’s private key, its validity depends on
the secrecy of that key.
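The check E_e(D_d(sign)) = sign can be sketched with the same kind of toy RSA parameters. Hashing the message first and reducing the digest modulo n are our simplifying assumptions; the parameter values and function names are hypothetical and far too small for real use.

```python
import hashlib

# Toy RSA parameters (assumption, illustration only).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))      # A's private exponent (Python 3.8+)

def digest(message: bytes) -> int:
    """SHA-256 digest reduced modulo n so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """A applies D_d (private key) to the digest."""
    return pow(digest(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """B applies E_e (A's public key) and compares with the digest."""
    return pow(signature, e, n) == digest(message)

msg = b"pay 10 EUR to B"
sig = sign(msg)
valid = verify(msg, sig)
forged_valid = verify(msg, (sig + 1) % n)   # a different signature value is rejected
```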
Finally, public-key cryptosystems can also be used to establish a shared secret key
without a secure (confidential) channel, provided that the exchanged public keys are
authentic.
2.2.4 Diffie-Hellman Key Management
In 1976, Whitfield Diffie and Martin Hellman published a key management scheme [6].
In this scheme, each entity generates its own public and private key pair and
distributes its public key. After obtaining an authentic copy of each other’s public
keys, A and B can each compute, offline, a shared key for a symmetric cipher, as shown
in Figure 2.5.
[Figure 2.5 diagram: A and B exchange “Public Key of A” and “Public Key of B” over the unsecured channel observed by E; each side combines its own Private Key with the other side’s Public Key to obtain the Secret Shared Key]
Figure 2.5: Diffie-Hellman Key Management Scheme
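The scheme of Figure 2.5 can be sketched numerically in the multiplicative group modulo a prime. The parameters p = 23 and g = 5 and the private values below are toy assumptions chosen so that the arithmetic can be followed by hand; real systems use far larger groups.

```python
p, g = 23, 5                 # toy public parameters (assumption)

a_private = 4                # A's private key, never transmitted
b_private = 3                # B's private key, never transmitted
a_public = pow(g, a_private, p)    # Public Key of A, sent over the unsecured channel
b_public = pow(g, b_private, p)    # Public Key of B, sent over the unsecured channel

shared_a = pow(b_public, a_private, p)   # computed offline by A
shared_b = pow(a_public, b_private, p)   # computed offline by B
# Both sides obtain the same value (18), which E cannot compute directly
# from a_public and b_public alone.
```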
Public-key cryptosystems can be compared in terms of their security, key lengths,
speed and implementation issues. In terms of security, the hardness of the underlying
mathematical problem determines the intractability of the system [7].
In short, modern cryptosystems take advantage of both asymmetric and symmetric
algorithms. Asymmetric algorithms are used in the first stages to provide an
authenticated channel and key distribution, and symmetric-key algorithms are then used
for encryption. This type of hybrid approach is used, for instance, in SSL, PGP and
GPG [2].
3. ESSENTIAL CONCEPTS
In this chapter, we present the mathematical background of this study and introduce a
few basic definitions that support the ideas of the later chapters. We first give the
definitions and properties of the integers and then build the other subsections on
this basis.
3.1 Integers
The set of integers {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} is denoted by the symbol Z.
For a given finite set A, the number of elements of A is denoted by ♯A. The following
definitions cover the basic properties of integers that are used to define operations
in the next sections.
Definition 3.1 : (Division algorithm for integers) If a and b are integers with b ≥ 1,
then ordinary long division of a by b yields integers q (the quotient) and r (the
remainder) such that
a = q·b + r, where 0 ≤ r < b (3.1)
The remainder of the division is denoted a mod b, and the quotient is denoted a div b.
Equivalently, a div b = ⌊a/b⌋ and a mod b = a − b·⌊a/b⌋.
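Definition 3.1 maps directly onto Python's built-in `divmod`, whose floor division yields a remainder in the range 0 ≤ r < b for b ≥ 1, including for negative a. The wrapper name `div_mod` is ours.

```python
def div_mod(a: int, b: int):
    """Return (q, r) with a = q*b + r and 0 <= r < b, for b >= 1."""
    if b < 1:
        raise ValueError("b must be at least 1")
    return divmod(a, b)      # q = floor(a/b), r = a - b*floor(a/b)

q, r = div_mod(73, 17)            # 73 = 4*17 + 5
q_neg, r_neg = div_mod(-73, 17)   # -73 = (-5)*17 + 12
```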
Definition 3.2 : (Greatest common divisor) An integer c is a common divisor of a and
b if c | a and c | b. Moreover, a non-negative integer d is the greatest common divisor
of integers a and b, denoted d = gcd(a,b), if
(i) d is a common divisor of a and b,
(ii) whenever c | a and c | b, then c | d.
Equivalently, gcd(a,b) is the largest positive integer that divides both a and b, with the
exception that gcd(0,0) = 0. Two integers a and b are called relatively prime if and
only if gcd(a,b) = 1.
Definition 3.3 : (Least common multiple) A non-negative integer d is the least common
multiple of integers a and b, denoted d = lcm(a,b), if
(i) a | d and b | d,
(ii) whenever a | c and b | c, then d | c.
Equivalently, for nonzero a and b, lcm(a,b) is the smallest positive integer divisible
by both a and b. If a and b are positive integers, then lcm(a,b) = a·b/gcd(a,b).
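Definitions 3.2 and 3.3 can be checked on small numbers with the standard library. The helper name `lcm` and the sample values are ours; the formula is applied to positive integers only.

```python
from math import gcd

def lcm(a: int, b: int) -> int:
    """Least common multiple of positive integers via a*b / gcd(a, b)."""
    return a * b // gcd(a, b)

g = gcd(12, 18)                       # 6: largest integer dividing both
m = lcm(12, 18)                       # 36: smallest positive common multiple
relatively_prime = gcd(8, 15) == 1    # 8 and 15 share no common factor
```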
3.1.1 The Integers modulo n
The following definitions concern the integers modulo n. Let n be a positive integer.
The integers modulo n, denoted Z_n, is the set of integers {0,1,2, . . . ,n−1}.
Addition, subtraction and multiplication in Z_n are performed modulo n.
Definition 3.4 : (Congruence) If a and b are integers, then a is said to be congruent
to b modulo n, written a ≡ b (mod n), if n divides (a − b). The integer n is called
the modulus of the congruence. The following properties of congruences hold for all
a, a_1, b, b_1, c ∈ Z.
1. a ≡ b (mod n) if and only if a and b leave the same remainder when divided by n.
2. (reflexivity) a ≡ a (mod n)
3. (symmetry) If a ≡ b (mod n) then b ≡ a (mod n)
4. (transitivity) If a ≡ b (mod n) and b ≡ c (mod n), then a ≡ c (mod n)
5. If a ≡ a_1 (mod n) and b ≡ b_1 (mod n), then a + b ≡ a_1 + b_1 (mod n) and
a·b ≡ a_1·b_1 (mod n).
Definition 3.5 : (Multiplicative inverse) Let a ∈ Z_n. The multiplicative inverse of a
modulo n is an integer x ∈ Z_n such that a·x ≡ 1 (mod n). If such an x exists, then it
is unique, a is said to be invertible, and the inverse of a is denoted by a^{-1}. An
inverse exists if and only if gcd(a,n) = 1.
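Definition 3.5 can be tried out with Python's three-argument `pow`, which accepts the exponent −1 from Python 3.8 onward. The wrapper `mod_inverse` and the sample values are ours.

```python
from math import gcd

def mod_inverse(a: int, n: int) -> int:
    """Return x in Z_n with a*x ≡ 1 (mod n); it exists iff gcd(a, n) = 1."""
    if gcd(a, n) != 1:
        raise ValueError(f"{a} is not invertible modulo {n}")
    return pow(a, -1, n)     # built-in modular inverse, Python 3.8+

inv = mod_inverse(3, 7)      # 5, since 3*5 = 15 ≡ 1 (mod 7)
```

For example, `mod_inverse(4, 8)` raises `ValueError`, because gcd(4, 8) = 4 ≠ 1.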
The multiplicative inverse operation in our implementation is based on Fermat’s
theorem. To state Fermat’s theorem, the Euler phi function and the multiplicative
group of Z_n are defined below.
Definition 3.6 : (Euler phi function) For n ≥ 1, let φ(n) denote the number of integers
in the interval [1,n] which are relatively prime to n. The function φ is called the
Euler phi function. It has the following properties:
(i) If p is a prime, then φ(p) = p − 1.
(ii) The Euler phi function is multiplicative. That is, if gcd(m,n) = 1, then
φ(mn) = φ(m)·φ(n).
(iii) If $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of n, then
$$\phi(n) = n\left(1-\frac{1}{p_1}\right)\left(1-\frac{1}{p_2}\right)\cdots\left(1-\frac{1}{p_k}\right). \qquad (3.2)$$
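Equation (3.2) can be compared against the counting definition of φ. The sketch below uses trial division, which is adequate only for small n; both function names are ours.

```python
from math import gcd

def phi_by_definition(n: int) -> int:
    """Count the integers in [1, n] that are relatively prime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def phi_by_formula(n: int) -> int:
    """Apply (3.2): multiply n by (1 - 1/p) for each distinct prime p dividing n."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p      # result * (1 - 1/p), kept in integers
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                          # one prime factor larger than sqrt(n) remains
        result -= result // m
    return result

phi_36 = phi_by_formula(36)    # 36 = 2^2 * 3^2, so 36 * (1/2) * (2/3) = 12
```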
Definition 3.7 : The multiplicative group of Z_n is Z_n^* = {a ∈ Z_n | gcd(a,n) = 1}.
In particular, if n is a prime, then Z_n^* = {a | 1 ≤ a ≤ n−1}. The order of Z_n^* is
defined to be the number of elements in Z_n^*, namely |Z_n^*| [2].
It follows from the definition of the Euler phi function that |Z_n^*| = φ(n). Note
also that if a ∈ Z_n^* and b ∈ Z_n^*, then a·b ∈ Z_n^*, and so Z_n^* is closed under
multiplication.
Fact Let n ≥ 2 be an integer.
(i) (Euler’s theorem) If a ∈ Z_n^*, then a^{φ(n)} ≡ 1 (mod n).
(ii) If n is a product of distinct primes, and if r ≡ s (mod φ(n)), then a^r ≡ a^s
(mod n) for all integers a. In other words, when working modulo such an n, exponents
can be reduced modulo φ(n).
A special case of Euler’s theorem is Fermat’s (little) theorem. Let p be a prime.
(i) (Fermat’s theorem) If gcd(a, p) = 1, then a^{p−1} ≡ 1 (mod p).
(ii) If r ≡ s (mod p−1), then a^r ≡ a^s (mod p) for all integers a. In other words,
when working modulo a prime p, exponents can be reduced modulo p−1.
(iii) In particular, a^p ≡ a (mod p) for all integers a.
These properties are used in our design to compute the inverse of the Z coordinate.
The method of finding the inverse of the Z coordinate is given in Table 6.1.
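Fermat’s theorem gives an inversion method using exponentiation alone: since a^{p−1} ≡ 1 (mod p), the inverse of a is a^{p−2} mod p. The sketch below works in a small prime field Z_p chosen for illustration, not in the binary field of the design; there the analogous identity for a nonzero element of F_{2^m} is a^{-1} = a^{2^m − 2}. The names and the values p = 101 and n = 15 are ours.

```python
from math import gcd

p = 101     # a small prime, chosen for illustration

# Fermat: a^(p-1) ≡ 1 (mod p) whenever gcd(a, p) = 1
fermat_holds = all(pow(a, p - 1, p) == 1 for a in range(1, p))

def fermat_inverse(a: int, prime: int) -> int:
    """Inverse of a modulo a prime, computed as a^(prime-2) mod prime."""
    return pow(a, prime - 2, prime)

inv = fermat_inverse(7, p)      # 29, since 7*29 = 203 = 2*101 + 1

# Euler: a^phi(n) ≡ 1 (mod n) for a in Z_n^*, here with n = 15 and phi(15) = 8
euler_holds = all(pow(a, 8, 15) == 1 for a in range(1, 15) if gcd(a, 15) == 1)
```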
Definition 3.8 : Let α ∈ Z_n^*. If the order of α is φ(n), then α is said to be a
generator or a primitive element of Z_n^*. If Z_n^* has a generator, then Z_n^* is
said to be cyclic [2].
The generators of Z_n^* have the following properties:
(i) Z_n^* has a generator if and only if n = 2, 4, p^k or 2p^k, where p is an odd
prime and k ≥ 1. In particular, if p is a prime, then Z_p^* has a generator.
(ii) If α is a generator of Z_n^*, then Z_n^* = {α^i mod n | 0 ≤ i ≤ φ(n)−1}.
(iii) Suppose that α is a generator of Z_n^*. Then b = α^i mod n is also a generator
of Z_n^* if and only if gcd(i,φ(n)) = 1. It follows that if Z_n^* is cyclic, then the
number of generators is φ(φ(n)).
(iv) α ∈ Z_n^* is a generator of Z_n^* if and only if α^{φ(n)/p} ≢ 1 (mod n) for each
prime divisor p of φ(n).
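The properties of generators can be observed by brute force for small moduli. The sketch below is suitable for illustration only, and the function names are ours.

```python
from math import gcd

def phi(n: int) -> int:
    """Euler phi function by direct counting."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def order(a: int, n: int) -> int:
    """Multiplicative order of a in Z_n^* (requires gcd(a, n) = 1)."""
    t, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        t += 1
    return t

def generators(n: int) -> list:
    """All elements of Z_n^* whose order equals phi(n)."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    return [a for a in units if order(a, n) == phi(n)]

gens_7 = generators(7)     # [3, 5]: Z_7^* is cyclic
gens_8 = generators(8)     # []: 8 is not of the form 2, 4, p^k or 2p^k
```

For n = 11, the number of generators found equals φ(φ(11)) = φ(10) = 4, in agreement with property (iii).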
3.2 Groups
Definition 3.9 : A group (G,∗) consists of a set G with a binary operation ∗ on G
satisfying the following three axioms [2].
1. The group operation is associative. That is, a∗(b∗c) = (a∗b)∗c for all a,b,c ∈ G.
2. There is an element 1 ∈ G, called the identity element, such that a∗1 = 1∗a = a for
all a ∈ G.
3. For each a ∈ G there exists an element a^{-1} ∈ G, called the inverse of a, such that
a∗a^{-1} = a^{-1}∗a = 1.
A group G is abelian (or commutative) if, furthermore, a∗b = b∗a for all a,b ∈ G.
The notation (G,∗) is used for a multiplicative group; the identity element is
represented by 1 and the inverse of a is denoted a^{-1}. If the group operation is
addition, with the notation (G,+), then the group is said to be an additive group, the
identity element is denoted by 0, and the inverse of a is denoted −a.
If G is a finite group, then the number of elements of G is called the order of G and
is denoted |G|. Let a ∈ G be an element of the group. The order of a is defined to be
the least positive integer t such that a^t = 1, provided that such an integer exists.
If such a t does not exist, then the order of a is defined to be ∞.
Definition 3.10 : A group G is cyclic if there is an element α ∈ G such that for each
b ∈ G, there is an integer i with b = α^i. Such an element α is called a generator of G.
For example, the set Z_n = {0,1,2, . . . ,n−1} is a cyclic group of order n under
addition modulo n, i.e. a + b ≡ r (mod n), where 0 ≤ r < n is the remainder when a + b
is divided by n [2].
3.3 Rings
Definition 3.11 : A ring (R,+,×) consists of a set R with two binary operations
denoted + (addition) and × (multiplication) on R, satisfying the following axioms.
1. (R,+) is an abelian group with identity denoted 0.
2. The operation × is associative. That is, a× (b× c) = (a×b)× c for all a,b,c ∈ R.
3. There is a multiplicative identity denoted 1, with $1 \neq 0$, such that $1 \times a = a \times 1 = a$ for all $a \in R$.
4. The operation × is distributive over +. That is, a× (b+ c) = (a×b)+(a× c) and
(b+ c)×a= (b×a)+(c×a) for all a,b,c ∈ R.
The ring is a commutative ring if a×b= b×a for all a,b ∈ R. From the third axiom,
if R has an identity element, then it is said to be a unitary ring or a ring with unity
element [2].
3.3.1 Polynomial Rings
Definition 3.12 : If $R$ is a commutative ring, then a polynomial in the indeterminate $x$ over the ring $R$ is an expression of the form
\[ f(x) = a_n x^n + \cdots + a_2 x^2 + a_1 x + a_0 \tag{3.3} \]
where each $a_i \in R$ and $n$ is a non-negative integer. Here, the element $a_i$ is called the coefficient of $x^i$ in $f(x)$. The largest integer $m$ for which $a_m \neq 0$ is called the degree of $f(x)$, denoted $\deg f(x)$; $a_m$ is called the leading coefficient of $f(x)$ [2].
Definition 3.13 : The polynomial ring R[x] is formed by the set of all polynomials in
the indeterminate x having coefficients from R. The standard polynomial addition and
multiplication operations are performed with coefficient arithmetic in the ring R [2].
Given two polynomials
\[ f(x) = \sum_{i=0}^{n} a_i x^i \quad \text{and} \quad g(x) = \sum_{i=0}^{n} b_i x^i, \]
we define the sum of $f(x)$ and $g(x)$ as
\[ f(x) + g(x) = \sum_{i=0}^{n} (a_i + b_i) x^i. \tag{3.4} \]
Given two polynomials
\[ f(x) = \sum_{i=0}^{n} a_i x^i \quad \text{and} \quad g(x) = \sum_{j=0}^{m} b_j x^j, \]
we define the product of $f(x)$ and $g(x)$ as
\[ f(x)g(x) = \sum_{k=0}^{n+m} c_k x^k, \quad \text{where} \quad c_k = \sum_{i+j=k} a_i b_j. \tag{3.5} \]
We give an example to show the addition and multiplication operations on the polynomial ring $\mathbb{Z}_2[x]$, where coefficients are reduced modulo 2. Let $f(x) = x^3 + x + 1$ and $g(x) = x^2 + x$ be elements of this polynomial ring. The sum of the two elements is
\[ f(x) + g(x) = x^3 + x^2 + 1 \tag{3.6} \]
and their product is
\[ f(x) \times g(x) = x^5 + x^4 + x^3 + x. \tag{3.7} \]
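Equations 3.4 to 3.7 can be checked mechanically. The sketch below stores a polynomial as a list of coefficients, lowest degree first, and reduces coefficients modulo p; this representation and the helper names `trim`, `poly_add` and `poly_mul` are assumptions made for illustration only.

```python
def trim(poly):
    """Drop trailing zero coefficients (lists are ordered lowest degree first)."""
    poly = list(poly)
    while poly and poly[-1] == 0:
        poly.pop()
    return poly

def poly_add(f, g, p):
    """Coefficient-wise sum modulo p, as in Equation 3.4."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([(a + b) % p for a, b in zip(f, g)])

def poly_mul(f, g, p):
    """Product modulo p: c_k is the sum of a_i * b_j over i + j = k (Equation 3.5)."""
    if not f or not g:
        return []
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] = (c[i + j] + a * b) % p
    return trim(c)

f = [1, 1, 0, 1]   # x^3 + x + 1
g = [0, 1, 1]      # x^2 + x
```

With p = 2, `poly_add(f, g, 2)` corresponds to x^3 + x^2 + 1 and `poly_mul(f, g, 2)` to x^5 + x^4 + x^3 + x.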
Definition 3.14 : (Division algorithm for $F[x]$) Let $f(x), g(x) \in F[x]$, with $g(x) \neq 0$. Then there exist unique polynomials $q(x), r(x) \in F[x]$ such that
\[ f(x) = q(x)g(x) + r(x) \tag{3.8} \]
where the degree of $r(x)$ is less than the degree of $g(x)$. The polynomial $q(x)$ is called the quotient, while $r(x)$ is called the remainder. If $r(x)$ is the zero polynomial (i.e. $r(x) = 0$), then $g(x)$ is said to be a divisor of $f(x)$. A non-constant polynomial $f(x)$ is said to be irreducible over $F$ if it has no non-constant divisor of lower degree than $f(x)$ in $F[x]$ [2].
Again, we give an example to practice polynomial division on the polynomial ring. Let $f(x) = x^6 + x^5 + x^3 + x^2 + x + 1$ and $g(x) = x^4 + x^3 + 1$ in $\mathbb{Z}_2[x]$. Polynomial long division of $f(x)$ by $g(x)$ yields $f(x) = x^2 \cdot g(x) + (x^3 + x + 1)$.
Hence $f(x) \bmod g(x) = x^3 + x + 1$ and $f(x) \operatorname{div} g(x) = x^2$.
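The division algorithm of Definition 3.14 can be sketched as schoolbook long division over the prime field Z_p. The coefficient-list representation (lowest degree first) and the names `trim` and `poly_divmod` are illustrative assumptions; the modular inverse via `pow(x, -1, p)` needs Python 3.8 or later.

```python
def trim(poly):
    """Drop trailing zero coefficients (lists are ordered lowest degree first)."""
    poly = list(poly)
    while poly and poly[-1] == 0:
        poly.pop()
    return poly

def poly_divmod(f, g, p):
    """Return (q, r) with f = q*g + r and deg r < deg g, coefficients in Z_p (p prime)."""
    r = trim([c % p for c in f])
    g = trim([c % p for c in g])
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    inv_lead = pow(g[-1], -1, p)              # inverse of the leading coefficient of g
    q = [0] * max(len(r) - len(g) + 1, 1)
    for shift in range(len(r) - len(g), -1, -1):
        coef = (r[shift + len(g) - 1] * inv_lead) % p
        q[shift] = coef
        for i, gc in enumerate(g):            # subtract coef * x^shift * g(x)
            r[shift + i] = (r[shift + i] - coef * gc) % p
    return trim(q), trim(r)

f = [1, 1, 1, 1, 0, 1, 1]   # x^6 + x^5 + x^3 + x^2 + x + 1
g = [1, 0, 0, 1, 1]         # x^4 + x^3 + 1
```

Calling `poly_divmod(f, g, 2)` reproduces the quotient x^2 and the remainder x^3 + x + 1 from the example.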
3.4 Fields
Definition 3.15 : A field is a commutative ring in which all non-zero elements have
multiplicative inverses [2].
The characteristic of a field is 0 if $\underbrace{1 + 1 + \cdots + 1}_{m \text{ times}} \neq 0$ for every $m \geq 1$. Otherwise, the characteristic of the field is the least positive integer $m$ such that $\sum_{i=1}^{m} 1 = 0$.
Moreover, $\mathbb{Z}_p$ is a field under the usual operations of addition and multiplication modulo $p$ if and only if $p$ is a prime number. In that case $\mathbb{Z}_p$ has characteristic $p$ [2].
3.4.1 Finite Fields
Definition 3.16 : A finite field is a field $F$ which contains a finite number of elements. The order of $F$ is the number of elements in $F$. Finite fields have the following properties [2].
(i) If $F$ is a finite field, then $F$ contains $p^m$ elements for some prime $p$ and integer $m \geq 1$.
(ii) For every prime power order $p^m$, there is a unique finite field of order $p^m$ (up to isomorphism). This field is denoted by $\mathbb{F}_{p^m}$, or $GF(p^m)$. The characteristic of $\mathbb{F}_{p^m}$ is $p$.
Definition 3.17 : Let $\mathbb{F}_q$ be a finite field of order $q = p^m$, where $p$ is a prime. The non-zero elements of $\mathbb{F}_q$ form a group under multiplication, called the multiplicative group of $\mathbb{F}_q$ and denoted by $\mathbb{F}_q^*$ [2].
$\mathbb{F}_q^*$ is a cyclic group of order $q-1$. Hence $a^q = a$ for all $a \in \mathbb{F}_q$.
A polynomial basis representation is commonly used to represent the elements of a finite field $GF(q)$, where $q = p^m$. If $f(x)$ is an irreducible polynomial of degree $m$ over $GF(p)$, then each element of the finite field $GF(p^m)$ can be written in the form
\[ g(x) = a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \ldots + a_1 x + a_0 \tag{3.9} \]
where $0 \leq a_{m-1}, a_{m-2}, \ldots, a_1, a_0 \leq p-1$, so that every element has degree at most $m-1$.
Addition: Addition in $GF(p^m)$ is performed by adding the coefficients of equal degree modulo $p$.
Multiplication: If $g(x), h(x) \in GF(p^m)$, then the product $g(x)h(x)$ is formed by first multiplying $g(x)$ and $h(x)$ as polynomials in the ordinary way, with coefficients reduced modulo $p$, and then taking the remainder after polynomial division by $f(x)$.
Multiplicative inversion: In $GF(p^m)$, the inverse can be computed by using Fermat's little theorem, which is stated in Definition 3.7; in its finite-field form it gives $a^{-1} = a^{p^m-2}$ for $a \neq 0$.
In our design, we use $GF(2^{163})$. The order of the finite field is thus of the form $p^m$ with characteristic $p = 2$ and $m = 163$.
The polynomial representation clarifies how operations over $GF(2^{163})$ are carried out. A particular case of $GF(p)$ is $GF(2)$, where addition is exclusive OR (XOR) and multiplication is AND. Elements of $GF(2^{163})$ may be represented as polynomials of degree less than 163 over $GF(2)$. Operations are then performed modulo $R(x)$, where $R(x)$ is an irreducible polynomial of degree 163 over $GF(2)$. The addition of two polynomials $P$ and $Q$ is done as stated before; multiplication is done by computing $W = P \cdot Q$ and then taking the remainder modulo $R(x)$. In our design, the irreducible polynomial is set to $x^{163} + x^7 + x^6 + x^3 + 1$. Elements of $GF(2^{163})$ can be expressed as binary numbers, with each coefficient of the polynomial represented by one bit of the corresponding element's binary expression.
3.4.1.1 Addition and Subtraction
Addition and subtraction are performed by adding or subtracting two of these polynomials and reducing the coefficients of the result modulo the characteristic. In a finite field of characteristic 2, as ours, addition and subtraction are identical and are accomplished using the XOR operation. Table 3.1 gives some examples over $GF(2^{163})$. Notice that under regular addition of polynomials the sums contain terms such as $2x^3$ or $2x^2$, but these terms become $0x^3$ and $0x^2$ and are dropped when the result is reduced modulo 2.
3.4.1.2 Multiplication
Multiplication in a finite field is multiplication modulo an irreducible reducing
polynomial used to define the finite field. We give an example of multiplication over
Rijndael’s finite field.
Table 3.1: Addition in Finite Fields
p1             p2            p1 + p2 (normal algebra)    p1 + p2 in GF(2^163)
x^3 + x + 1    x^3 + x^2     2x^3 + x^2 + x + 1          x^2 + x + 1
x^4 + x^2      x^6 + x^2     x^6 + x^4 + 2x^2            x^6 + x^4
x + 1          x^2 + 1       x^2 + x + 2                 x^2 + x
x^3 + x        x^2 + 1       x^3 + x^2 + x + 1           x^3 + x^2 + x + 1
x^2 + x        x^2 + x       2x^2 + 2x                   0
Rijndael uses a characteristic 2 finite field with 256 elements (polynomials with 8 binary coefficients), which is also called $GF(2^8)$. The following reducing polynomial is used for multiplication:
$x^8 + x^4 + x^3 + x + 1$.
For example, $\{53\} \cdot \{CA\} = \{01\}$ in Rijndael's field, which can be calculated in the following steps:
\[
\begin{aligned}
&(x^6 + x^4 + x + 1)(x^7 + x^6 + x^3 + x) \\
&= x^{13} + x^{12} + x^9 + x^7 + x^{11} + x^{10} + x^7 + x^5 + x^8 + x^7 + x^4 + x^2 + x^7 + x^6 + x^3 + x \\
&= x^{13} + x^{12} + x^{11} + x^{10} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + x^2 + x \bmod (x^8 + x^4 + x^3 + x + 1) \\
&= 1
\end{aligned}
\tag{3.10}
\]
The modulo operation can be demonstrated through long division, where the remainder gives the result. Notice that XOR is applied in the example and not arithmetic subtraction.
Figure 3.1: Modulo Operation over Rijndael Finite Field
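The binary-field operations described in this section map naturally onto integers used as bit vectors, with bit i holding the coefficient of x^i. The following sketch assumes that encoding; the function names are illustrative, and the shift-and-XOR multiplier and Fermat-based inversion are a simple software model, not the hardware architecture of this thesis.

```python
def gf2m_add(a, b):
    """Addition (and subtraction) in GF(2^m) is a bitwise XOR."""
    return a ^ b

def gf2m_mul(a, b, poly, m):
    """Shift-and-XOR multiplication modulo the degree-m polynomial `poly` (a, b < 2^m)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:         # degree reached m: reduce
            a ^= poly
    return result

def gf2m_inv(a, poly, m):
    """Inverse via Fermat's little theorem: a^(2^m - 2), by square-and-multiply."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    result, base, e = 1, a, (1 << m) - 2
    while e:
        if e & 1:
            result = gf2m_mul(result, base, poly, m)
        base = gf2m_mul(base, base, poly, m)
        e >>= 1
    return result

AES_POLY = 0x11B                                          # x^8 + x^4 + x^3 + x + 1
R163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1    # x^163 + x^7 + x^6 + x^3 + 1
```

For instance, `gf2m_mul(0x53, 0xCA, AES_POLY, 8)` reproduces the product {53}.{CA} = {01}, and the same functions work unchanged with `R163` and m = 163.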
4. ELLIPTIC CURVE CRYPTOSYSTEMS
The widespread use of public-key cryptosystems in communication triggered the invention of new mathematical algorithms. Elliptic curves were first proposed for use in public-key cryptography (PKC) by Koblitz in [8] and Miller in [9]. In order to introduce a public-key cryptosystem based on elliptic curves, we first describe the discrete logarithm problem, the Diffie-Hellman problem, the Diffie-Hellman key agreement and the El-Gamal cryptosystem, which prepares the discussion of the ECDLP. The properties of elliptic curves are discussed afterwards.
4.1 Discrete Logarithm Problem
The discrete logarithm is the inverse of exponentiation in a finite cyclic group [10]. For a given cyclic group $G$ with group operation $*$ and generator $\alpha$, exponentiation in $G$ is defined by
\[ \alpha^x = \underbrace{\alpha * \alpha * \cdots * \alpha}_{x \text{ times}}. \tag{4.1} \]
Suppose that $\beta = \alpha^x$. Then the discrete logarithm of $\beta$ to the base $\alpha$ is $x$, written
\[ \log_\alpha \beta = x. \tag{4.2} \]
Strictly speaking, the discrete logarithm of $\beta$ is not unique, as it is only determined modulo the order of $\alpha$ in $G$. If $\alpha$ is a generator as specified above, the logarithm is determined modulo the order of the group,
\[ \log_\alpha \beta = x \pmod{p}, \tag{4.3} \]
where $p$ is the group order.
Definition 4.1 : (Discrete logarithm problem) Given a prime $p$, a generator $\alpha$ of $\mathbb{Z}_p^*$ and an element $\beta \in \mathbb{Z}_p^*$, find the integer $x$, $0 \leq x \leq p-2$, such that $\beta = \alpha^x \bmod p$ [2].
The DLP in $\mathbb{Z}_p$ is considered to be difficult or intractable if $p$ has at least 150 digits and $p-1$ has at least one large prime factor (as close to $p$ as possible) [11]. These criteria for $p$ are safeguards against the known attacks on the DLP.
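For a toy prime, Definition 4.1 can be solved by simply trying every exponent, which also makes clear why the size of p matters: the loop below takes up to p - 1 steps. This is only an illustrative sketch; the function name `discrete_log` and the parameters p = 23, alpha = 5 are assumptions chosen for the example.

```python
def discrete_log(alpha, beta, p):
    """Smallest x in [0, p-2] with alpha^x = beta (mod p), or None if there is none."""
    value = 1                       # alpha^0
    for x in range(p - 1):
        if value == beta % p:
            return x
        value = (value * alpha) % p
    return None
```

With alpha = 5, which generates the multiplicative group modulo 23, `discrete_log(5, 8, 23)` recovers the exponent 6.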
4.1.1 Diffie-Hellman Key Agreement Protocol
The problem of computing discrete logarithms was just a mathematical curiosity until 1976, when Diffie and Hellman described a method of exchanging cryptographic keys which relies on the DLP [6]. The Diffie-Hellman key agreement protocol lets two parties, A and B, establish a shared secret key over an insecure channel. It is shown in Figure 4.1 and works as follows:
1. A and B agree on a group $G$ and a generator $\alpha$. These choices can be public.
2. A chooses an exponent $x$ ($0 \leq x \leq p-2$) randomly, computes $\alpha^x$, and sends this value to B. The exponent $x$ must be kept private.
3. B chooses an exponent $y$ ($0 \leq y \leq p-2$) randomly, computes $\alpha^y$, and sends this value to A. The exponent $y$ must be kept private. B then computes, using the value $\alpha^x$ received from A, $K_b = (\alpha^x)^y$.
4. When A receives $\alpha^y$ from B, A computes $K_a = (\alpha^y)^x$.
Figure 4.1: Diffie-Hellman Key Agreement Protocol
A and B now share the common secret key $\alpha^{xy}$. If a third party does not know either of the random exponents, then the DLP keeps the secret key $\alpha^{xy}$ secure. An attacker could recover the shared key, and hence decrypt A's messages, if B's random secret exponent $y$ could be computed from $\beta \equiv \alpha^y \pmod{p}$ and $\alpha$, which are publicly known [12].
In Figure 4.1, a third party E can eavesdrop on $\alpha^x$ and $\alpha^y \pmod{p}$. The security of this protocol is based on the assumption that computing the common shared secret key $\alpha^{xy}$ from these public values is as hard as obtaining the value $y$ from $\beta \equiv \alpha^y \pmod{p}$ in the DLP. In brief, under this assumption the protocol is secure as long as the DLP is intractable.
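The four protocol steps can be traced with small numbers. The sketch below works in the multiplicative group modulo a toy prime; the helper names and the parameters p = 23, alpha = 5 are illustrative assumptions and far too small for real use.

```python
import secrets

def random_exponent(p):
    """A random private exponent in the range 0..p-2."""
    return secrets.randbelow(p - 1)

def dh_public(alpha, secret, p):
    """Value sent over the insecure channel: alpha^secret mod p."""
    return pow(alpha, secret, p)

def dh_shared(received, secret, p):
    """Shared key computed from the other party's public value."""
    return pow(received, secret, p)

p, alpha = 23, 5                   # public parameters (toy values)
x, y = 6, 15                       # private exponents of A and B
A_pub = dh_public(alpha, x, p)     # A -> B
B_pub = dh_public(alpha, y, p)     # B -> A
K_a = dh_shared(B_pub, x, p)       # computed by A
K_b = dh_shared(A_pub, y, p)       # computed by B
```

Both parties end with the same value alpha^(xy) mod p, while an eavesdropper only sees `A_pub` and `B_pub`.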
4.1.2 The El-Gamal Cryptosystem
The El-Gamal cryptosystem in $\mathbb{Z}_p^*$, which also relies on the discrete logarithm, is described by the following equations and illustrated in Figure 4.2 [13].
Let $p$ be a prime such that the DLP in $\mathbb{Z}_p$ is intractable, and let $\alpha \in \mathbb{Z}_p^*$ be a primitive element, where $p$ and $\alpha$ are publicly known. Each user creates a private key, $x$ or $y$, and calculates $\alpha^x$ or $\alpha^y$. After the calculation is completed, the results are published as
\[ \beta_x \equiv \alpha^x \pmod{p}, \quad \beta_y \equiv \alpha^y \pmod{p}, \tag{4.4} \]
where $\beta$ is the published value of the corresponding user.
Before sending a message $m \in \mathbb{Z}_p^*$ to the user with published value $\beta_y$, the sender chooses a random number $k \in \mathbb{Z}_{p-1}$ and sends
\[ (s_1, s_2) = (\alpha^k \bmod p,\; m\beta_y^k \bmod p). \tag{4.5} \]
After receiving the message, the recipient decrypts it as follows:
\[ s_2 (s_1^y)^{-1} \equiv m\beta_y^k(\alpha^{ky})^{-1} \equiv m\alpha^{yk}(\alpha^{ky})^{-1} \equiv m \bmod p, \tag{4.6} \]
where $y$ is the recipient's secret key.
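Equations 4.5 and 4.6 translate into two short functions. The sketch below uses toy parameters (p = 23, alpha = 5) and hypothetical function names; the random number k is passed in explicitly so that the result is reproducible, whereas a real sender would draw a fresh random k for every message. The modular inverse via `pow(x, -1, p)` needs Python 3.8 or later.

```python
def elgamal_encrypt(m, k, alpha, beta, p):
    """Equation 4.5: (s1, s2) = (alpha^k mod p, m * beta^k mod p)."""
    s1 = pow(alpha, k, p)
    s2 = (m * pow(beta, k, p)) % p
    return s1, s2

def elgamal_decrypt(s1, s2, y, p):
    """Equation 4.6: m = s2 * (s1^y)^(-1) mod p, with y the recipient's secret key."""
    mask = pow(s1, y, p)
    return (s2 * pow(mask, -1, p)) % p

p, alpha = 23, 5               # public parameters (toy values)
y = 15                         # recipient's secret key
beta_y = pow(alpha, y, p)      # recipient's published value
```

Encrypting m = 10 with k = 6 and decrypting with y returns the original message.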
4.1.3 Elliptic Curve Discrete Logarithm Problem
The hardness of the elliptic curve discrete logarithm problem (ECDLP) is essential for the security of all elliptic curve cryptographic systems.
Definition 4.2 : Given an elliptic curve $E$ defined over a finite field $\mathbb{F}_q$, a point $P \in E(\mathbb{F}_q)$ of order $n$, and a point $Q \in \langle P \rangle$, find the integer $k \in [0, n-1]$ such that $Q = kP$. The integer $k$ is called the discrete logarithm of $Q$ to the base $P$. The ECDLP is defined to be the problem of finding the logarithm $k$ for given $P$ and $Q$.
Figure 4.2: El-Gamal Cryptosystem
The number of rational points on a curve $E$ over a finite field $\mathbb{F}_q$ is denoted by $\#E(\mathbb{F}_q)$. The ECDLP is really hard unless $\#E(\mathbb{F}_q)$ is "smooth", i.e., a product of small primes. This number is bounded by the following theorem.
Theorem 4.1 : (Hasse) Let $E$ be an elliptic curve over $\mathbb{F}_q$. Then
\[ q + 1 - 2\sqrt{q} \leq \#E(\mathbb{F}_q) \leq q + 1 + 2\sqrt{q}. \tag{4.7} \]
The quantity $t$, defined by $\#E(\mathbb{F}_q) = q + 1 - t$, is called the trace of Frobenius [14]. Hasse's theorem implies $|t| \leq 2\sqrt{q}$ [12].
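Hasse's bound can be observed on a small example by counting points directly. The sketch below assumes a curve in the short Weierstrass form y^2 = x^3 + ax + b over a prime field F_p with p odd; the function names and the example curve y^2 = x^3 + 2x + 2 over F_17 are illustrative choices.

```python
def count_points(a, b, p):
    """#E(F_p) for y^2 = x^3 + a*x + b over F_p (p an odd prime), by brute force."""
    roots = {}                                  # value -> number of y with y^2 = value
    for y in range(p):
        roots[y * y % p] = roots.get(y * y % p, 0) + 1
    total = 1                                   # the point at infinity
    for x in range(p):
        total += roots.get((x ** 3 + a * x + b) % p, 0)
    return total

def trace_of_frobenius(n_points, q):
    """t defined by #E(F_q) = q + 1 - t."""
    return q + 1 - n_points

def satisfies_hasse(n_points, q):
    """|t| <= 2*sqrt(q), checked in integers as t^2 <= 4q."""
    t = trace_of_frobenius(n_points, q)
    return t * t <= 4 * q
```

For the example curve the count is 19, so t = -1, well inside the interval allowed by Equation 4.7.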
The elliptic curve parameters should be carefully selected in order to resist all known attacks on the ECDLP. If the estimated time needed to find $k$ exceeds the time for which the protected information is worth keeping secret, an attack is not worthwhile.
Some known attacks, their running times and the corresponding precautions are as follows. Firstly, the most naive algorithm to solve the ECDLP is exhaustive search, which computes $P, 2P, 3P, \ldots$ one by one until $Q$ is reached. The running time is approximately $n$ steps in the worst case and $n/2$ steps on average. Therefore, this method can be circumvented by selecting elliptic curve parameters large enough to represent an infeasible amount of computation, such as $n \geq 2^{80}$. Secondly, there are many algorithms to attack the ECDLP, but the most general known one is the combination of the Pohlig-Hellman algorithm and Pollard's rho algorithm [12], which has an exponential running time of $O(\sqrt{p})$, where $p$ is the largest prime divisor of $n$. If the elliptic curve parameters are chosen so that $n$ is divisible by a sufficiently large prime number $p$ (e.g., $p > 2^{160}$), then this represents an infeasible amount of computation, so the ECDLP resists this kind of attack [12]. Finally, the important issue is to choose the parameters of the elliptic curve very carefully, so that the ECDLP resists all known attack methods.
In conclusion, as a comparison of the security of the cryptosystems, the NESSIE consortium in [15] recommends, for sufficient security over the next 5-10 years, the use of 1536-bit keys for RSA and DL based public key schemes, and 160-bit keys for elliptic curve discrete logarithms. This recommendation is based on an assumed equivalence between 512-bit RSA keys and 56-bit symmetric keys; an extrapolation of that equivalence is given in Table 4.1.
Table 4.1: NESSIE Recommendations
Equivalent symmetric key size    56     64     80     112    128    160
Elliptic curve key size          112    128    160    224    256    320
Modulus length (pq)              512    768    1536   4096   6000   10000
Modulus length (p^2 q)           570    800    1536   4096   6000   10000
4.2 Introduction to Elliptic Curves
Elliptic curve cryptography is a public-key cryptosystem whose security rests on the believed intractability of finding discrete logarithms in the finite group formed by the points of an elliptic curve. In other public-key cryptosystems, for example the RSA algorithm, the product of two large prime numbers is used as the puzzle: a user picks two large random primes as the private key and publishes their product as the public key. While finding large primes and multiplying them is easy, the inverse process, factoring, is believed to be hard. However, improvements in technology demand ever longer keys to maintain intractability. It is generally recommended that RSA public keys be at least 1024 bits in length to render integer factoring algorithms infeasible [16]. On the other hand, for given $P$ and $Q$, finding $k$ such that $kP = Q$ on an elliptic curve needs fewer bits to provide intractability. The size of the group determines the difficulty of the problem, and it is believed that a smaller group can be used to obtain the same level of security as RSA-based systems. Compared with RSA, ECC needs shorter parameters and signatures, is faster on some platforms and has lower power consumption [17].
We now discuss the properties of elliptic curves, starting from Equation 4.8.
4.2.1 Weierstrass Equation
Let $K$ be a field. For example, $K$ can be the finite field $\mathbb{F}_q$, the prime field $\mathbb{Z}_p$, the field $\mathbb{R}$ of real numbers, the field $\mathbb{Q}$ of rational numbers, or the field $\mathbb{C}$ of complex numbers [13].
An elliptic curve over a field $K$ is defined by the Weierstrass equation
\[ y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6 \tag{4.8} \]
over this field, together with the point $O$ at infinity, where $a_1, a_2, a_3, a_4, a_6 \in K$. The elliptic curve $E$ over $K$ is denoted $E(K)$.
All the solutions of the above equation together with a point at infinity form an Abelian
group, with the point at infinity as identity element. If the coordinates x and y are
chosen from a finite field, the solutions form a finite Abelian group.
For fields of various characteristics, the Weierstrass equation can be transformed into different forms by a linear change of variables [13]. For instance,
Characteristic $\neq 2, 3$ : Let $K$ be a field of characteristic $\neq 2, 3$, and let $x^3 + ax + b$ (where $a, b \in K$) be a cubic polynomial with the condition that $4a^3 + 27b^2 \neq 0$, which ensures that the polynomial has no multiple roots. An elliptic curve over $K$ is the set of points $(x,y)$ with $x, y \in K$ that satisfy the equation
\[ y^2 = x^3 + ax + b \tag{4.9} \]
together with an element denoted by $O$, called the point at infinity.
Characteristic 2 : If $K$ is a field of characteristic 2, then there are two types of elliptic curves:
An elliptic curve of zero j-invariant is the set of points satisfying
\[ y^2 + a_3y = x^3 + a_4x + a_6 \tag{4.10} \]
(where $a_3, a_4, a_6 \in \mathbb{F}_q$, $a_3 \neq 0$) and $O$, the point at infinity.
The j-invariant of $E$ over $K$ is an element of $K$ determined by $a_1, a_2, a_3, a_4$ and $a_6$ [18].
An elliptic curve of nonzero j-invariant is the set of points satisfying
\[ y^2 + xy = x^3 + a_2x^2 + a_6 \tag{4.11} \]
(where $a_2, a_6 \in \mathbb{F}_q$, $a_6 \neq 0$) and $O$, the point at infinity.
4.2.2 Point Addition over Finite Fields
Let $P_1$ and $P_2$ be two points on an elliptic curve $E$. We define a third point $P_1 + P_2$ so that $E(K)$ forms an abelian group under this addition operation. If $P_1 \neq P_2$, then the line through $P_1$ and $P_2$ intersects the curve in a third point $Q$. If $P_1 = P_2$, then the tangent to $E(K)$ at $P_1$ intersects the curve in a second point $Q$. Taking $Q$ itself as the result would not define a group structure, because every group must have a neutral element with respect to the group operation. Therefore, the sum is defined as the third point in which the curve meets the line connecting $Q$ and the point at infinity (the neutral element); we call this point $P_1 + P_2$, or $2P_1$ when $P_1 = P_2$. This line is the vertical line through the point $Q$. A vertical line intersects $E(K)$ in 3 points: $(x,y)$, $(x,-y)$ and $O$. Hence, the point at infinity $O$ serves as the additive identity element, and the other two points are additive inverses of each other. Thus $P_1 + P_2 + Q = O$, or $P_1 + P_2 = -Q$, the inverse of $Q$. Figures 4.3 and 4.4 illustrate the addition of two different points and the doubling of a point, respectively. These elliptic curves are drawn over the real numbers with the equation $y^2 = x^3 - 50x + 100$.
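Given two points $P_1 = (x_1,y_1)$ and $P_2 = (x_2,y_2)$ with $P_1 \neq -P_2$, the sum $P_3 = P_1 + P_2 = (x_3,y_3)$ can be computed as
\[
\lambda =
\begin{cases}
\dfrac{y_1 - y_2}{x_1 - x_2}, & P_1 \neq P_2 \\[2ex]
\dfrac{3x_1^2 + 2a_2x_1 + a_4 - a_1y_1}{2y_1 + a_1x_1 + a_3}, & P_1 = P_2
\end{cases}
\tag{4.12}
\]
\[ x_3 = \lambda^2 + a_1\lambda - a_2 - x_1 - x_2, \quad y_3 = (x_1 - x_3)\lambda - y_1 - a_1x_3 - a_3 \tag{4.13} \]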
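As an illustration of Equations 4.12 and 4.13, the sketch below specialises them to the short Weierstrass form of Equation 4.9 (a1 = a2 = a3 = 0, a4 = a) over a prime field F_p. The function name `ec_add`, the use of `None` for the point at infinity and the example curve y^2 = x^3 + 2x + 2 over F_17 are assumptions for this example; `pow(x, -1, p)` needs Python 3.8 or later.

```python
def ec_add(P, Q, a, p):
    """Add two points on y^2 = x^3 + a*x + b over F_p; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                       # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p   # tangent slope
    else:
        lam = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = ((x1 - x3) * lam - y1) % p
    return (x3, y3)

a, p = 2, 17
P = (5, 1)                     # a point on y^2 = x^3 + 2x + 2 over F_17
```

Doubling P gives (6, 3), and adding P once more gives (10, 6); both results can be checked against the curve equation.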
In general, the basic operation for ECC algorithms is point or scalar multiplication, written $Q = kP$, where $k$ is an integer and $P$ and $Q$ are EC points. The efficiency of point multiplication is mainly determined by the implementation of the finite field arithmetic. Point multiplication can be calculated in many different ways, for example by using either of two double-and-add algorithms or the Montgomery ladder algorithm, all of which are executed by point addition and doubling. The algorithms are given in Section 4.3. The lowest hierarchical level is composed of finite field operations: addition, subtraction, multiplication and inversion.
Figure 4.3: Point Addition on the Elliptic Curve $y^2 = x^3 - 50x + 100$
There are many types of coordinates in which an elliptic curve can be represented. In the above equations affine coordinates are used, but so-called projective coordinates have some implementation advantages. The main conclusion is that point addition can be done in projective coordinates using only field multiplications, with no inversions required. Inversion is then needed only once, at the end of the point multiplication operation, to convert back to affine coordinates.
4.3 Point Multiplication
In this section, we consider methods of computing $kP$, where $k$ is an integer and $P$ is a point on an elliptic curve $E$ defined over $\mathbb{F}_q$. This operation is called point multiplication, and it consumes almost all of the execution time of elliptic curve cryptographic protocols. Conceptually, $Q = kP$ is obtained by adding $k$ copies of the point $P$. Algorithms 1 and 2 are the basic repeated double-and-add methods, which process the bits of $k$ from right to left and from left to right, respectively. Algorithm 3 is the Montgomery ladder, which is computationally balanced: it performs one addition and one doubling per bit regardless of the value of $k_i$, and is thus more secure against simple power analysis (SPA). It will be discussed in detail in Section 6.1.1.4.
Figure 4.4: Doubling of a Point on the Elliptic Curve $y^2 = x^3 - 50x + 100$
After the method of point multiplication is chosen, the next lower level of the hierarchy is the selection of the point addition and point doubling algorithms. These algorithms use finite field arithmetic (addition, subtraction, multiplication and inversion) under the control of the top level. The hierarchy of a basic elliptic curve cryptosystem is illustrated in Figure 4.5.
Figure 4.5: Hierarchy of Elliptic Curve Cryptosystems (the finite field operations ADDITION, SUBTRACTION, INVERSION and MULTIPLICATION form the lowest level)
Algorithm 1 : Right-to-left Binary Method for point multiplication [12]
Require: EC point P = (x,y), integer k, 0 < k < M,
         k = (k_{t−1}, k_{t−2}, ..., k_0)_2, P ∈ E(Fq)
Ensure: Q = [k]P
  Q ← ∞
  for i from 0 to t−1 do
    if k_i = 1 then
      Q ← Q + P
    end if
    P ← 2P
  end for
  return (Q)
Algorithm 2 : Left-to-right Binary Method for point multiplication [12]
Require: EC point P = (x,y), integer k, 0 < k < M,
         k = (k_{t−1}, k_{t−2}, ..., k_0)_2, P ∈ E(Fq)
Ensure: Q = [k]P
  Q ← ∞
  for i from t−1 downto 0 do
    Q ← 2Q
    if k_i = 1 then
      Q ← Q + P
    end if
  end for
  return (Q)
Algorithm 3 : Montgomery Ladder for point multiplication [12]
Require: EC point P = (x,y), integer k, 0 < k < M,
         k = (k_{t−1}, k_{t−2}, ..., k_0)_2, k_{t−1} = 1, P ∈ E(Fq)
Ensure: Q = [k]P
  P1 ← P, P2 ← 2P
  for i from t−2 downto 0 do
    if k_i = 1 then
      P1 ← P1 + P2, P2 ← 2P2
    else
      P2 ← P1 + P2, P1 ← 2P1
    end if
  end for
  return (P1)
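The three algorithms only need a group with an addition and a doubling operation, so they can be sketched generically. In the sketch below the group operations are passed in as functions; the function names and the stand-in group used in the usage lines (integers modulo 101 under addition, where kP is simply k*P mod 101) are assumptions for illustration and replace the elliptic curve point operations.

```python
def rtl_binary(k, P, add, double, identity):
    """Algorithm 1: scan the bits of k from right to left."""
    Q = identity
    while k:
        if k & 1:
            Q = add(Q, P)
        P = double(P)
        k >>= 1
    return Q

def ltr_binary(k, P, add, double, identity):
    """Algorithm 2: scan the bits of k from left to right."""
    Q = identity
    for i in range(k.bit_length() - 1, -1, -1):
        Q = double(Q)
        if (k >> i) & 1:
            Q = add(Q, P)
    return Q

def montgomery_ladder(k, P, add, double):
    """Algorithm 3: one addition and one doubling per bit, whatever the bit value."""
    if k < 1:
        raise ValueError("k must be positive so that its top bit is 1")
    P1, P2 = P, double(P)
    for i in range(k.bit_length() - 2, -1, -1):
        if (k >> i) & 1:
            P1, P2 = add(P1, P2), double(P2)
        else:
            P1, P2 = double(P1), add(P1, P2)
    return P1

# Stand-in group for the usage example: integers modulo 101 under addition.
N = 101
add = lambda a, b: (a + b) % N
double = lambda a: (2 * a) % N
```

All three functions return the same result, for example `montgomery_ladder(13, 7, add, double)` equals 13 * 7 mod 101. The ladder keeps the invariant P2 = P1 + P throughout, which is what makes its operation sequence independent of the key bits.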
In brief, an elliptic curve cryptosystem can be made either faster or more secure depending on the choice of methods suited to the application. Moreover, the choice of finite field matters: in characteristic 2, as in our case, addition and subtraction are identical and are accomplished using the XOR operation. Furthermore, inversion can be avoided during point multiplication by using projective coordinates, so that it is needed only once, at the end of the point multiplication. Specific work on elliptic curve algorithms can be found in [19].
4.4 Projective Coordinates
Formulas for adding two points on an elliptic curve were presented in Section 4.2. For all curves defined there, the formulas for point addition and point doubling require inversions, multiplications and additions. If inversion in $K$ consumes much more time and power than multiplication, then using a projective coordinate representation, which reduces the number of inversions to one, can be advantageous. The following sections briefly consider two different projective coordinate systems.
Let $K$ be a field, and let $c$ and $d$ be positive integers. One can define an equivalence relation $\sim$ on the set $K^3 \setminus \{(0,0,0)\}$ of nonzero triples over $K$ by $(X_1,Y_1,Z_1) \sim (X_2,Y_2,Z_2)$ if $X_1 = \lambda^c X_2$, $Y_1 = \lambda^d Y_2$, $Z_1 = \lambda Z_2$ for some $\lambda \in K^*$.
The equivalence class containing $(X,Y,Z) \in K^3 \setminus \{(0,0,0)\}$ is
\[ (X : Y : Z) = \{(\lambda^c X, \lambda^d Y, \lambda Z) : \lambda \in K^*\}. \tag{4.14} \]
$(X : Y : Z)$ is called a projective point, and $(X,Y,Z)$ is called a representative of $(X : Y : Z)$. The projective form of the Weierstrass equation (4.8) of an elliptic curve $E$ defined over $K$ is obtained by replacing $x$ by $X/Z^c$ and $y$ by $Y/Z^d$, and clearing denominators [12].
4.4.1 Standard Projective Coordinates
Let $c = 1$ and $d = 1$. Then the projective form of the Weierstrass equation
\[ E : y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6 \tag{4.15} \]
defined over $K$ is
\[ Y^2Z + a_1XYZ + a_3YZ^2 = X^3 + a_2X^2Z + a_4XZ^2 + a_6Z^3. \tag{4.16} \]
The only point on the line at infinity that lies on $E$ is $(0 : 1 : 0)$ [12]. This projective point corresponds to the point $O$ in Equation 4.8.
4.4.2 Jacobian Coordinates
Let $c = 2$ and $d = 3$. The projective point $(X : Y : Z)$, $Z \neq 0$, corresponds to the affine point $(X/Z^2, Y/Z^3)$ [12]. The projective form of the Weierstrass equation
\[ E : y^2 = x^3 + ax + b \tag{4.17} \]
defined over $K$ is
\[ Y^2 = X^3 + aXZ^4 + bZ^6. \tag{4.18} \]
The point at infinity $O$ corresponds to $(1 : 1 : 0)$, while the negative of $(X : Y : Z)$ is $(X : -Y : Z)$ [12].
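The step from Equation 4.17 to Equation 4.18 can be written out explicitly. The following short derivation only uses the substitution $x = X/Z^2$, $y = Y/Z^3$ stated above and is meant as a worked check of the projective form:

```latex
\begin{align*}
\left(\frac{Y}{Z^3}\right)^2 &= \left(\frac{X}{Z^2}\right)^3 + a\,\frac{X}{Z^2} + b \\
\frac{Y^2}{Z^6} &= \frac{X^3}{Z^6} + \frac{aX}{Z^2} + b \\
Y^2 &= X^3 + aXZ^4 + bZ^6 \qquad \text{(multiplying both sides by } Z^6\text{)}
\end{align*}
```

Setting $Z = 0$ in the last line leaves $Y^2 = X^3$, which is satisfied by the representative $(1, 1, 0)$ of the point at infinity.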
5. EDWARDS CURVES
The main operations in elliptic curve cryptography are single-scalar multiplication ($k, P \to kP$) and double-scalar multiplication ($m, n, P, Q \to mP + nQ$). For instance, Miller proposed carrying these points in Jacobian coordinates, so that each point is represented by three values $(x,y,z)$ which correspond to $(x/z^2, y/z^3)$ on a curve $y^2 = x^3 + a_4x + a_6$ [9]. Up to now, the fastest algorithm for point addition uses 16 field multiplications, specifically $11M + 5S$, where $M$ denotes a field multiplication and $S$ a field squaring. Studies on faster addition and doubling on elliptic curves are ongoing [20].
A new form for elliptic curves was added to the mathematical literature with Edwards curves. Edwards showed in [21] that all elliptic curves over number fields can be transformed to $x^2 + y^2 = c^2(1 + x^2y^2)$, with $(0,c)$ as the neutral element and with a simple and symmetric addition law:
\[ (x_1,y_1), (x_2,y_2) \to \left( \frac{x_1y_2 + y_1x_2}{c(1 + x_1x_2y_1y_2)}, \; \frac{y_1y_2 - x_1x_2}{c(1 - x_1x_2y_1y_2)} \right). \tag{5.1} \]
Similarly, elliptic curves over finite fields can be converted to Edwards form. Some of them require field extensions for the transformation, but in most cases the transformation is defined over the original number field or finite field. Moreover, in [20] the notion of Edwards form is expanded to include all curves $x^2 + y^2 = c^2(1 + dx^2y^2)$, where $cd(1 - dc^4) \neq 0$, so that a larger class of elliptic curves over the original field is captured.
In brief, Edwards form breaks the Jacobian speed barrier stated before and is the new speed leader for multi-scalar multiplication. In addition, Edwards curves have the extra feature that the addition formulas are complete. This means that the formulas work for all pairs of points on the curve, with no exceptions for doubling, the neutral element, negatives, etc. [20]. The following section discusses the completeness of Edwards curves over fields of characteristic 2, which are called binary Edwards curves. With these curves, the advantages of binary fields for hardware implementations become available. In Section 6, the implementation of binary Edwards curves for RFID tags will be shown.
5.1 Binary Edwards Curves
This section contains complete addition formulas for binary elliptic curves, i.e.,
addition formulas that work for all input pairs, with no exceptional cases. First, the
need for Edwards curves is explained, and then the theorems and formulas will be
shown in order.
5.1.1 Introduction to Binary Edwards Curves
The points on a Weierstrass-form elliptic curve
\[ y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6 \tag{5.2} \]
include not only the affine points $(x_1,y_1)$, but also an extra point at infinity serving as the neutral element. The standard formulas for computing a sum $P_1 + P_2$ on an elliptic curve fail if $P_1$, $P_2$, or $P_1 + P_2$ is at infinity, or if $P_1$ is equal to $P_2$. Each of these possibilities must be tested separately in any elliptic curve cryptosystem. A complete addition algorithm is then produced by combining several incomplete addition formulas.
In [10], a new curve shape for ordinary elliptic curves over fields of characteristic 2 is introduced, and it is shown that the affine points are non-singular. Binary Edwards curves are defined in Definition 5.1.
Definition 5.1 : (Binary Edwards Curve) Let $k$ be a field with $\operatorname{char}(k) = 2$. Let $d_1, d_2$ be elements of $k$ with $d_1 \neq 0$ and $d_2 \neq d_1^2 + d_1$. Then the binary Edwards curve with coefficients $d_1$ and $d_2$ is the affine curve [22]:
\[ E_{B,d_1,d_2} : d_1(x + y) + d_2(x^2 + y^2) = xy + xy(x + y) + x^2y^2. \tag{5.3} \]
This curve is symmetric in $x$ and $y$ and thus has the property that if $(x_1,y_1)$ is a point on the curve then so is $(y_1,x_1)$. The point $(0,0)$ will be the neutral element of the addition law, while $(1,1)$ will have order 2 [22].
The non-singularity of each binary Edwards curve is proven in Theorem 5.1.
Theorem 5.1 (Non-singularity). Each binary Edwards curve is non-singular [22].
Proof. By definition, the curve $E_{B,d_1,d_2}$ has $d_1 \neq 0$ and $d_2 \neq d_1^2 + d_1$. The partial derivatives of the curve equation are $d_1 + y + y^2$ and $d_1 + x + x^2$. A singular point $(x_1,y_1)$ must have $d_1 + y_1 + y_1^2 = 0$ and $d_1 + x_1 + x_1^2 = 0$ and therefore $(x_1 + y_1)^2 = x_1 + y_1$, implying $x_1 = y_1$ or $x_1 = y_1 + 1$.
The case $x_1 = y_1$ implies $0 = x_1^2 + x_1^4$ by the curve equation and therefore $d_1^2 = x_1^2 + x_1^4 = 0$, contradicting the hypothesis that $d_1 \neq 0$.
The case $x_1 = y_1 + 1$ implies $d_1 + d_2 = y_1^2 + y_1^4$ by the curve equation and therefore $d_1^2 = y_1^2 + y_1^4 = d_1 + d_2$, which contradicts the hypothesis that $d_2 \neq d_1^2 + d_1$.
5.1.2 Binary Edwards Curves Addition Law
The addition law on the binary Edwards curve $E_{B,d_1,d_2}$ is given below, and it has been proven that this addition law corresponds to the addition law on the elliptic curve in Weierstrass form. The same formulas can be used for doubling by taking two identical inputs. The sum of two points $(x_1,y_1)$, $(x_2,y_2)$ on $E_{B,d_1,d_2}$ is the point $(x_3,y_3)$ defined as follows:
\[ x_3 = \frac{d_1(x_1 + x_2) + d_2(x_1 + y_1)(x_2 + y_2) + (x_1 + x_1^2)(x_2(y_1 + y_2 + 1) + y_1y_2)}{d_1 + (x_1 + x_1^2)(x_2 + y_2)}, \tag{5.4} \]
\[ y_3 = \frac{d_1(y_1 + y_2) + d_2(x_1 + y_1)(x_2 + y_2) + (y_1 + y_1^2)(y_2(x_1 + x_2 + 1) + x_1x_2)}{d_1 + (y_1 + y_1^2)(x_2 + y_2)}. \tag{5.5} \]
If the denominators $d_1 + (x_1 + x_1^2)(x_2 + y_2)$ and $d_1 + (y_1 + y_1^2)(x_2 + y_2)$ are non-zero, then the sum $(x_3,y_3)$ is a point on $E_{B,d_1,d_2}$, i.e., $d_1(x_3 + y_3) + d_2(x_3^2 + y_3^2) = x_3y_3 + x_3y_3(x_3 + y_3) + x_3^2y_3^2$ [22].
Substituting $(x_2,y_2) = (0,0)$ into the addition law shows that $(0,0)$ is the neutral element. Similarly, $(x_1,y_1) + (1,1) = (x_1 + 1, y_1 + 1)$; in particular $(1,1) + (1,1) = (0,0)$. Furthermore $(x_1,y_1) + (y_1,x_1) = (0,0)$, so $-(x_1,y_1) = (y_1,x_1)$ [22].
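The properties listed above can be checked exhaustively on a very small field. The sketch below evaluates Equations 5.4 and 5.5 over the toy field GF(2^3) = GF(2)[x]/(x^3 + x + 1) with d1 = d2 = 1; the field, the coefficients and all function names are assumptions made for illustration and have nothing to do with the GF(2^163) parameters of the design. Field elements are integers 0..7 used as bit vectors, and addition is XOR.

```python
M, POLY = 3, 0b1011          # toy field GF(2^3), reduction polynomial x^3 + x + 1
D1, D2 = 1, 1                # toy curve coefficients (d1 != 0, d2 != d1^2 + d1)

def mul(a, b):
    """Multiplication in GF(2^M) by shift-and-XOR with reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def inv(a):
    """Inverse as a^(2^M - 2); a must be non-zero."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r = 1
    for _ in range((1 << M) - 2):
        r = mul(r, a)
    return r

def on_curve(x, y):
    """Curve equation 5.3."""
    xy = mul(x, y)
    lhs = mul(D1, x ^ y) ^ mul(D2, mul(x, x) ^ mul(y, y))
    rhs = xy ^ mul(xy, x ^ y) ^ mul(xy, xy)
    return lhs == rhs

def add(P, Q):
    """Affine addition law, Equations 5.4 and 5.5."""
    (x1, y1), (x2, y2) = P, Q
    A = x1 ^ mul(x1, x1)
    B = y1 ^ mul(y1, y1)
    C = mul(D2, mul(x1 ^ y1, x2 ^ y2))
    w2 = x2 ^ y2
    num_x = mul(D1, x1 ^ x2) ^ C ^ mul(A, mul(x2, y1 ^ y2 ^ 1) ^ mul(y1, y2))
    num_y = mul(D1, y1 ^ y2) ^ C ^ mul(B, mul(y2, x1 ^ x2 ^ 1) ^ mul(x1, x2))
    return (mul(num_x, inv(D1 ^ mul(A, w2))), mul(num_y, inv(D1 ^ mul(B, w2))))

POINTS = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_curve(x, y)]
```

Because x^2 + x + 1 has no root in GF(2^3), the choice d2 = 1 gives a curve on which the denominators never vanish, so `add` can be applied to every pair in `POINTS`. For example, `add((1, 0), (1, 0))` yields the order-2 point (1, 1).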
5.1.3 Complete Binary Edwards Curves
The conditions and requirements for complete binary Edwards curves are given in Definition 5.2.
Definition 5.2 : Let $k$ be a field with $\operatorname{char}(k) = 2$. Let $d_1$, $d_2$ be elements of $k$ with $d_1 \neq 0$. Assume that no element $t \in k$ satisfies $t^2 + t + d_2 = 0$. Then the addition law on the binary Edwards curve $E_{B,d_1,d_2}(k)$ is complete, and the complete binary Edwards curve with coefficients $d_1$ and $d_2$ is the affine curve [22]
\[ E_{B,d_1,d_2} : d_1(x + y) + d_2(x^2 + y^2) = xy + xy(x + y) + x^2y^2. \tag{5.6} \]
There is no conflict in notation or terminology with the binary Edwards curve $E_{B,d_1,d_2}$ of Definition 5.1, and the curve equation is the same. The complete case has the extra requirement that $t^2 + t + d_2 \neq 0$ for all $t \in k$, not just for $t = d_1$. If $k$ is the finite field $\mathbb{F}_{2^n}$, then an equivalent requirement is that $\operatorname{Tr}(d_2) = 1$, where $\operatorname{Tr}$ is the absolute trace of $\mathbb{F}_{2^n}$ over $\mathbb{F}_2$. In [10], more information is given about the generality of $E_{B,d_1,d_2}$.
5.1.4 Explicit Addition Formulas
In this section, we present explicit formulas for affine addition, projective addition and mixed addition on binary Edwards curves. The formulas are not as fast as those for curves in Weierstrass form; on the other hand, binary Edwards curves have the advantage that the formulas are unified and, for suitable $d_2$ (those with $\operatorname{Tr}(d_2) = 1$), complete [22].
5.1.4.1 Affine Addition
The following formulas, given (x1,y1) and (x2,y2) on the binary Edwards curve
EB,d1,d2 , compute the sum (x3,y3) = (x1,y1)+(x2,y2) if it is defined:
Algorithm 4 : Affine Addition
w1 = x1+ y1,
w2 = x2+ y2,
A= x21+ x1,
B= y21+ y1,
C = d2w1w2,
D= x2y2,
x3 = y1+(C+d1(w1+ x2)+A(D+ x2))/(d1+Aw2),
y3 = x1+(C+d1(w1+ y2)+B(D+ y2))/(d1+Bw2).
These formulas use 2I + 8M+ 2S+ 3D, where I is the cost of inversion, M is the
cost of multiplication, S is the cost of squaring, D is the cost of a multiplication by a
curve parameter. The 3D here are two multiplications by d1 and one multiplication by
d2 [22].
For complete binary Edwards curves the denominators $d_1 + A w_2 = d_1 + (x_1^2 + x_1)(x_2+y_2)$
and $d_1 + B w_2 = d_1 + (y_1^2 + y_1)(x_2+y_2)$ cannot be zero.
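As a sanity check of Algorithm 4, the following Python sketch implements the affine addition in a toy field and verifies the properties stated in Section 5.1.2 for every point of a small curve. It is a minimal illustration under stated assumptions, not the hardware design: the field F_{2^5} with polynomial x^5 + x^2 + 1, the constants d1 = x and d2 = 1 (so that Tr(d2) = 1), and all function names are chosen only for this example.

```python
M, POLY = 5, 0b100101        # toy field F_{2^5}, x^5 + x^2 + 1 (assumption)
D1, D2 = 0b00010, 0b00001    # example constants; Tr(D2) = 1 in F_{2^5}


def gf_mul(a, b):
    """Shift-and-add multiplication in F_{2^M}."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r


def gf_inv(a):
    """Inverse as a^(2^M - 2), by square-and-multiply."""
    result, base, e = 1, a, (1 << M) - 2
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def on_curve(x, y):
    """Check d1(x+y) + d2(x^2+y^2) = xy + xy(x+y) + x^2 y^2."""
    lhs = gf_mul(D1, x ^ y) ^ gf_mul(D2, gf_mul(x, x) ^ gf_mul(y, y))
    xy = gf_mul(x, y)
    return lhs == (xy ^ gf_mul(xy, x ^ y) ^ gf_mul(xy, xy))


def affine_add(p, q):
    """Algorithm 4: affine addition, 2I + 8M + 2S + 3D."""
    (x1, y1), (x2, y2) = p, q
    w1, w2 = x1 ^ y1, x2 ^ y2
    a = gf_mul(x1, x1) ^ x1
    b = gf_mul(y1, y1) ^ y1
    c = gf_mul(D2, gf_mul(w1, w2))
    d = gf_mul(x2, y2)
    nx = c ^ gf_mul(D1, w1 ^ x2) ^ gf_mul(a, d ^ x2)
    ny = c ^ gf_mul(D1, w1 ^ y2) ^ gf_mul(b, d ^ y2)
    x3 = y1 ^ gf_mul(nx, gf_inv(D1 ^ gf_mul(a, w2)))
    y3 = x1 ^ gf_mul(ny, gf_inv(D1 ^ gf_mul(b, w2)))
    return x3, y3


# All affine points of the toy curve, found by exhaustive search.
POINTS = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_curve(x, y)]
```

Because Tr(d2) = 1, the toy curve is complete, so `affine_add` never divides by zero for any pair taken from `POINTS`.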
5.1.4.2 Mixed Addition
Given (X1 : Y1 : Z1) and (x2,y2) on the binary Edwards curve EB,d1,d2 , the following
formulas compute the sum (X3 : Y3 : Z3) = (X1 : Y1 : Z1)+(x2,y2) if it is defined:
Algorithm 5 : Mixed Addition
W1 = X1+Y1,
w2 = x2+y2,
A = x2^2 + x2,
B = y2^2 + y2,
D = W1.Z1,
E = d1.Z1^2,
H = (E+d2.D).w2,
I = d1.Z1,
U = E+A.D,
V = E+B.D,
Z3 = U.V,
X3 = Z3.y2+(H+X1(I+A(Y1+Z1))).V,
Y3 = Z3.x2+(H+Y1(I+B(X1+Z1))).U.
Note that these formulas use 13M + 3S + 3D. For complete binary Edwards curves the
product $Z_3 = Z_1^4(d_1+(x_2^2+x_2)(x_1+y_1))(d_1+(y_2^2+y_2)(x_1+y_1))$ cannot be zero [22].
5.1.4.3 Projective Addition
The following formulas, given (X1 : Y1 : Z1) and (X2 : Y2 : Z2) on the binary Edwards
curve EB,d1,d2 , compute the sum (X3 : Y3 : Z3) = (X1 : Y1 : Z1)+(X2 : Y2 : Z2).
Algorithm 6 : Projective Addition I
W1 = X1+Y1,
W2 = X2+Y2,
A= X1.(X1+Z1),
B= Y1.(Y1+Z1),
C = Z1.Z2,
D=W2.Z2,
E = d1.C.C,
H = (d1Z2+d2W2).W1.C,
I = d1.C.Z1,
U = E+A.D,
V = E+B.D,
S=U.V,
X3 = S.Y1+(H+X2(I+A(Y2+Z2))).V.Z1,
Y3 = S.X1+(H+Y2(I+B(X2+Z2))).U.Z1,
Z3 = S.Z1.
These formulas use 21M + 1S + 4D. The 4D are three multiplications by d1 and one
multiplication by d2. For complete binary Edwards curves
$Z_3 = Z_1^5 Z_2^4(d_1+(x_1^2+x_1)(x_2+y_2))(d_1+(y_1^2+y_1)(x_2+y_2))$ cannot be zero.
These formulas are used to implement binary Edwards curves in projective coordinates
and are discussed in detail in Section 6. They allow more general constant values than
the projective addition formulas that follow, which is why we used them in our design [22].
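The projective formulas of Algorithm 6 can be exercised in software in the same way. The sketch below is an illustration under assumptions (toy field F_{2^5} with x^5 + x^2 + 1, d1 = x, d2 = 1, hypothetical helper names); it transcribes Algorithm 6 line by line and converts the result back to affine coordinates only to check that it lies on the curve.

```python
M, POLY = 5, 0b100101        # toy field F_{2^5}, x^5 + x^2 + 1 (assumption)
D1, D2 = 0b00010, 0b00001    # example constants; Tr(D2) = 1 in F_{2^5}


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r


def gf_inv(a):
    result, base, e = 1, a, (1 << M) - 2      # a^(2^M - 2)
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def on_curve(x, y):
    lhs = gf_mul(D1, x ^ y) ^ gf_mul(D2, gf_mul(x, x) ^ gf_mul(y, y))
    xy = gf_mul(x, y)
    return lhs == (xy ^ gf_mul(xy, x ^ y) ^ gf_mul(xy, xy))


def proj_add(p, q):
    """Algorithm 6: projective addition, no inversion."""
    (X1, Y1, Z1), (X2, Y2, Z2) = p, q
    W1, W2 = X1 ^ Y1, X2 ^ Y2
    A = gf_mul(X1, X1 ^ Z1)
    B = gf_mul(Y1, Y1 ^ Z1)
    C = gf_mul(Z1, Z2)
    D = gf_mul(W2, Z2)
    E = gf_mul(D1, gf_mul(C, C))
    H = gf_mul(gf_mul(gf_mul(D1, Z2) ^ gf_mul(D2, W2), W1), C)
    I = gf_mul(gf_mul(D1, C), Z1)
    U = E ^ gf_mul(A, D)
    V = E ^ gf_mul(B, D)
    S = gf_mul(U, V)
    X3 = gf_mul(S, Y1) ^ gf_mul(gf_mul(H ^ gf_mul(X2, I ^ gf_mul(A, Y2 ^ Z2)), V), Z1)
    Y3 = gf_mul(S, X1) ^ gf_mul(gf_mul(H ^ gf_mul(Y2, I ^ gf_mul(B, X2 ^ Z2)), U), Z1)
    Z3 = gf_mul(S, Z1)
    return X3, Y3, Z3


def to_affine(p):
    """One inversion at the very end: (X : Y : Z) -> (X/Z, Y/Z)."""
    X, Y, Z = p
    zi = gf_inv(Z)
    return gf_mul(X, zi), gf_mul(Y, zi)


POINTS = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_curve(x, y)]
```

The single call to `gf_inv` in `to_affine` mirrors the design decision discussed in Section 6: inversion is deferred to the end of the point multiplication.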
The following formulas are intended for small values of d1 and d2, for which they are
faster than the previous ones:
Algorithm 7 : Projective Addition II
A= X1.X2,
B= Y1.Y2,
C = Z1.Z2,
D= d1.C,
E = C^2,
F = d1^2.E,
G = (X1+Z1).(X2+Z2),
H = (Y1+Z1).(Y2+Z2),
I = A+G,
J = B+H,
K = (X1+Y1).(X2+Y2),
U = C.(F+d1.K.(K+I+J+C)),
V = U+D.F+K.(d2(d1.E+G.H+A.B)+(d2+d1).I.J),
X3 = V+D.(A+D).(G+D),
Y3 = V+D.(B+D).(H+D),
Z3 = U+(d2+d1).C.K^2.
These formulas use 18M+2S+7D. One can alternatively compute F as D2, replacing
1D with 1S. For the complete binary Edwards curves the denominator Z3 cannot be
zero [22].
The formulas become simpler in the case d1 = d2; they are listed in Algorithm 8
(Projective Addition III).
These formulas use 16M+1S+4D. As stated above, one can replace 1D with 1S. For
complete binary Edwards curves the denominator Z3 cannot be zero.
5.1.5 Doubling
Fast doubling formulas on the binary Edwards curve EB,d1,d2 are presented in this section,
first in affine coordinates and then in inversion-free projective coordinates. The
formulas are complete if the curve is complete. The literature on doubling formulas
for binary curves is also reviewed, and the speeds of the two doubling forms are
compared.
Algorithm 8 : Projective Addition III
A= X1.X2,
B= Y1.Y2,
C = Z1.Z2,
D= d1.C,
E = C^2,
F = d1^2.E,
G= (X1+Z1).(X2+Z2),
H = (Y1+Z1).(Y2+Z2),
I = A+G,
J = B+H,
K = (X1+Y1).(X2+Y2),
L= d1.K,
U =C.(F+L.(K+ I+ J+C)),
V =U+D.F+L.(d1E+G.H+A.B),
X3 =V +D.(A+D).(G+D),
Y3 =V +D.(B+D).(H+D),
Z3 =U.
5.1.5.1 Affine Doubling
Let (x1,y1) be a point on EB,d1,d2 , and assume that the sum (x1,y1)+(x1,y1) is defined.
Computing (x3,y3) = (x1,y1)+(x1,y1) we obtain
$$
\begin{aligned}
x_3 &= \frac{d_2(x_1+y_1)^2+(x_1+x_1^2)(x_1+y_1^2)}{d_1+(x_1+y_1)(x_1+x_1^2)} \\
    &= \frac{d_1(x_1+y_1)+x_1y_1+x_1^2(1+x_1+y_1)}{d_1+x_1y_1+x_1^2(1+x_1+y_1)} \\
    &= 1+\frac{d_1(1+x_1+y_1)}{d_1+x_1y_1+x_1^2(1+x_1+y_1)},
\end{aligned}
\tag{5.7}
$$
where the second line uses that
$d_2(x_1+y_1)^2+x_1^2y_1^2+x_1y_1^2 = d_1(x_1+y_1)+x_1y_1+x_1^2y_1$
for all points on EB,d1,d2 [22]. Likewise we have
$$y_3 = 1+\frac{d_1(1+x_1+y_1)}{d_1+x_1y_1+y_1^2(1+x_1+y_1)}. \tag{5.8}$$
Note that the affine formulas can be computed with one inversion, since the product of the
denominators of x3 and y3 is
$$
\begin{aligned}
&(d_1+x_1y_1+x_1^2(1+x_1+y_1))(d_1+x_1y_1+y_1^2(1+x_1+y_1)) \\
&\quad= d_1^2+(x_1^2+y_1^2)(d_1(1+x_1+y_1)+x_1y_1(1+x_1+y_1)+x_1^2y_1^2) \\
&\quad= d_1^2+(x_1^2+y_1^2)(d_1+d_2(x_1^2+y_1^2)) \\
&\quad= d_1(d_1+x_1^2+y_1^2+(d_2/d_1)(x_1^4+y_1^4)),
\end{aligned}
\tag{5.9}
$$
where the curve equation is used again. This leads to the doubling formulas
$$x_3 = 1+\frac{d_1+d_2(x_1^2+y_1^2)+y_1^2+y_1^4}{d_1+x_1^2+y_1^2+(d_2/d_1)(x_1^4+y_1^4)}, \tag{5.10}$$
$$y_3 = 1+\frac{d_1+d_2(x_1^2+y_1^2)+x_1^2+x_1^4}{d_1+x_1^2+y_1^2+(d_2/d_1)(x_1^4+y_1^4)}, \tag{5.11}$$
which need 1I + 2M + 4S + 2D. For complete binary Edwards curves all denominators
here are nonzero [22].
If d1 = d2 some multiplications can be grouped as follows:
A = x1^2, B = A^2, C = y1^2, D = C^2, E = A+C,
F = 1/(d1+E+B+D), x3 = (d1.E+A+B).F, y3 = x3+1+d1.F.
These formulas use only 1I + 1M + 4S + 2D.
5.1.5.2 Projective Doubling
In this sub-section, explicit formulas for projective doubling are given to compute
2(X1 : Y1 : Z1) = (X3 : Y3 : Z3):
A = X1^2, B = A^2, C = Y1^2, D = C^2, E = Z1^2,
F = d1.E^2, G = (d2/d1).(B+D), H = A.E,
I = C.E, J = H+I, K = G+d2.J,
X3 = K+H+D, Y3 = K+I+B, Z3 = F+J+G.
These formulas use 2M + 6S + 3D. The 3D are multiplications by d1, d2/d1 and d2.
For complete binary Edwards curves the denominator Z3 is nonzero.
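The projective doubling formulas translate directly into code. The following Python sketch is illustrative only; the toy field F_{2^5} with x^5 + x^2 + 1, the constants d1 = x and d2 = 1, and the helper names are assumptions made for the example. The constant d2/d1 is precomputed once, as it would be for a fixed curve.

```python
M, POLY = 5, 0b100101        # toy field F_{2^5}, x^5 + x^2 + 1 (assumption)
D1, D2 = 0b00010, 0b00001    # example constants; Tr(D2) = 1 in F_{2^5}


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r


def gf_inv(a):
    result, base, e = 1, a, (1 << M) - 2      # a^(2^M - 2)
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def on_curve(x, y):
    lhs = gf_mul(D1, x ^ y) ^ gf_mul(D2, gf_mul(x, x) ^ gf_mul(y, y))
    xy = gf_mul(x, y)
    return lhs == (xy ^ gf_mul(xy, x ^ y) ^ gf_mul(xy, xy))


D2_OVER_D1 = gf_mul(D2, gf_inv(D1))   # curve constant d2/d1, computed once


def proj_double(p):
    """Projective doubling, 2M + 6S + 3D."""
    X1, Y1, Z1 = p
    A = gf_mul(X1, X1)
    B = gf_mul(A, A)
    C = gf_mul(Y1, Y1)
    D = gf_mul(C, C)
    E = gf_mul(Z1, Z1)
    F = gf_mul(D1, gf_mul(E, E))
    G = gf_mul(D2_OVER_D1, B ^ D)
    H = gf_mul(A, E)
    I = gf_mul(C, E)
    J = H ^ I
    K = G ^ gf_mul(D2, J)
    return K ^ H ^ D, K ^ I ^ B, F ^ J ^ G


def to_affine(p):
    X, Y, Z = p
    zi = gf_inv(Z)
    return gf_mul(X, zi), gf_mul(Y, zi)


POINTS = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_curve(x, y)]
```

For example, doubling (1 : 1 : 1) yields (0 : 0 : d1), i.e. the neutral element (0,0), in agreement with (1,1)+(1,1) = (0,0).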
If d1 = d2 the doubling can be computed as follows:
W1 = X1+Y1, E = (W1.(W1+Z1))^2,
X3 = ((√d1.W1+X1).Z1+X1^2)^2, Y3 = X3+E, Z3 = E+d1.(Z1^2)^2.
These formulas use 2M + 5S + 2D. For complete binary Edwards curves the
denominator Z3 is nonzero [22].
Comparison with the literature:
The doubling formulas for complete binary Edwards curves are the first complete doubling
formulas in the literature; all other doubling formulas in the literature have exceptional
cases. Moreover, [22] presents two improvements to the doubling formulas in
Lopez-Dahab coordinates for binary curves in Weierstrass form. Improvements over
the formulas presented by Kim and Kim are also given; see [23] and [22].
5.1.6 Differential Addition
“Differential addition” means computing Q+P given Q, P and Q − P; e.g., computing
(2m+1)P given (m+1)P, mP and P. The analysis in [22] shows that the formulas
in affine coordinates are costly when inversion is expensive. Thus, unless
storage space is limited, it is better to represent the points in projective form (i.e.,
each coordinate as a ratio of two elements). Table 5.1 lists the speeds achieved in [22].
Table 5.1: The Speed of Binary Edwards Curves Differential Addition
Operation                             General case          d2 = d1
Affine diff addition                  1I + 3M + 1S + 1D     1I + 1M + 2S + 1D
Affine diff addition + doubling       2I + 4M + 3S + 2D     2I + 1M + 3S + 2D
Mixed diff addition                   6M + 1S + 2D          5M + 1S + 1D
Mixed diff addition + doubling        6M + 4S + 4D          5M + 4S + 2D
Projective diff addition              8M + 1S + 2D          7M + 1S + 1D
Projective diff addition + doubling   8M + 4S + 4D          7M + 4S + 2D
Differential addition is of interest because of Montgomery’s fast formulas for
u-coordinate differential addition on non-binary elliptic curves of the form
$v^2 = u^3 + a_2u^2 + u$ [24]. As an application, the Montgomery ladder was
suggested to compute u(mP), u((m+1)P) given u(P). As mentioned in Section 4,
the Montgomery ladder has several advantages: it is fast; its controller fits
into extremely small hardware; and its uniform double-and-add structure makes it secure
against simple side-channel attacks. Therefore, it is used in our design to protect
against SPA, with the projective addition formulas used to implement EC point
addition. More details on differential addition can be found in [22].
6. BINARY EDWARDS CURVES IMPLEMENTATION
The previous sections show that secure communication over unsecured channels is an
extremely hard problem. Symmetric-key algorithms can be used to build highly
secure and fast systems. As mentioned previously, the drawback of symmetric
cryptography is that keys must be managed and distributed before a secure channel
exists. Public-key cryptography was proposed to solve this key management
problem. If public-key and symmetric-key cryptography are used together,
fast, secure and efficient systems can be provided. Elliptic curves are
one of the more recent methods for building key management protocols, and
several algorithms have been proposed to improve their features. Recently,
Edwards curves were proposed [21], and it was shown that the addition law is valid for
every point on the curve. Afterwards, binary Edwards curves were proposed.
In this work, a binary Edwards curve implementation has been designed. This is
the first implementation of a binary Edwards curve in hardware, and the proposed
design is compact enough to be used in RFID tags. In elliptic curve
algorithms most of the computation is concentrated in point multiplication (Q = k.P),
on top of which different protocols can be implemented easily. The proposed design was
written in the GEZEL hardware design language [25], and its results were tested with
Synopsys Design Vision [26]. Projective coordinates are used to avoid
inversion during the finite field computations, and point addition is used in place
of a separate doubling. Therefore, the design is more secure against Simple Power Analysis
(SPA) [27]. The number of registers in the design is reduced from 8 to 5 to consume less area,
and several hardware tricks are used to finish the calculations in fewer clock cycles.
6.1 Implementation of Binary Edwards Curves
The first implementation step was to settle the design strategy. The control
block and the finite state machine of the point multiplication are the critical parts of
the design. Therefore, the steps of the projective addition algorithm were rearranged,
as given in Algorithm 9, to compute the results efficiently. The point multiplication was
tested against test vectors for k = 5 (binary 101) to confirm that the algorithm works.
After a reference finite state machine with 8 registers in the register file was obtained,
further finite state machines were written to reduce the number of registers. The final
register file has five registers.
Algorithm 9 : Projective Addition
Original order                                  ⟹  Rearranged order
W1 = X1+Y1,                                     ⟹  W2 = X2+Y2,
W2 = X2+Y2,                                     ⟹  D = W2.Z2,
A = X1.(X1+Z1),                                 ⟹  C = Z1.Z2,
B = Y1.(Y1+Z1),                                 ⟹  E = d1.C.C,
C = Z1.Z2,                                      ⟹  I = d1.C.Z1,
D = W2.Z2,                                      ⟹  A = X1.(X1+Z1),
E = d1.C.C,                                     ⟹  W1 = X1+Y1,
H = (d1.Z2+d2.W2).W1.C,                         ⟹  H = (d1.Z2+d2.W2).W1.C,
I = d1.C.Z1,                                    ⟹  U = E+A.D,
U = E+A.D,                                      ⟹  B = Y1.(Y1+Z1),
V = E+B.D,                                      ⟹  V = E+B.D,
S = U.V,                                        ⟹  S = U.V,
X3 = S.Y1+(H+X2(I+A(Y2+Z2))).V.Z1,              ⟹  Z3,
Y3 = S.X1+(H+Y2(I+B(X2+Z2))).U.Z1,              ⟹  Y3,
Z3 = S.Z1.                                      ⟹  X3.
The main architecture of the binary Edwards curves processor design is shown in
Figure 6.1. It consists of a processor, a bus manager, a ROM and RAM blocks. The
starting point (PX , PY ), the key (k) and the curve constants (d1,d2) are stored in the ROM.
Intermediate values and the result point (X3, Y3, Z3) of the projective addition are kept in RAM0.
RAM1 and RAM2 store the P1 and P2 values of the Montgomery ladder, respectively.
The bus manager controls the connections ROM-processor, RAMs-processor,
processor-RAMs and RAMs-RAMs according to the address and assign bits.
The processor part of the design consists of a control block, a register file, a modular
arithmetic logic unit (MALU) and a shifter. The control block contains the finite state
machine that arranges the inputs and outputs and connects the components
as required by the additions and multiplications. The final inversion is
also controlled by the same control block; it is based on Fermat's little theorem [7] and on
a sequence of multiplications given in Table 6.1. The following subsections describe the
components in detail in three groups: processor, bus manager and memory units.
[Block diagram omitted. Signal labels recovered from the figure: A_ctrl, B_ctrl, C_ctrl,
D_ctrl, E_ctrl, Mul_start, Op, Mul_last, shift, msb_k, first, count, stop, load, data_out,
address, assign, data_in, RegD.]
Figure 6.1: The architecture of Binary Edwards Curves Processor (BEC Processor)
6.1.1 Processor
The processor is the main part of the design. It loads the necessary inputs from storage
into the registers and controls the addition and multiplication operations in the finite field.
Afterwards, following the double-and-add structure of the Montgomery ladder given
in Algorithm 10, the processor sends the intermediate values to the storage units over
the bus. The next point is calculated according to the key value (k), while a
counter counts the shift operations on the key. Finally, the counter gives the finish
signal and the calculation stops; Q = k.P has been calculated.
6.1.1.1 MALU (Modular Arithmetic Logic Unit) Design
The MALU architecture was first proposed in [28]. It can perform both
addition and multiplication over finite fields, as shown in Figure 6.2.
The operations are given by Equation 6.1 [28]:
$$
A(x) = \begin{cases}
B(x)\cdot C(x) \bmod P(x) & \text{if } cmd = 1 \\
A(x)+C(x) \bmod P(x) & \text{if } cmd = 0
\end{cases}
\tag{6.1}
$$
where $A(x)=\sum a_i x^i$, $B(x)=\sum b_i x^i$, $C(x)=\sum c_i x^i$ and $P(x) = x^{163}+x^7+x^6+x^3+1$.
In the MALU, a field multiplication costs ⌈163/d⌉ clock cycles for digit size d, and a
field addition costs one clock cycle. In every cell, multiplication and addition
share the same XOR array. The MALU can be scaled easily to different digit sizes by
chaining cells in series.
Figure 6.2: BEC Processor’s MALU Architecture
The MALU does not contain internal registers; the computation proceeds sequentially. As
in Equation 6.1 [29], the register file keeps the intermediate value (RetA) and the
MALU performs the calculations. When the MALU performs a multiplication, one digit
of the multiplier must be provided to the MALU in every cycle. That is, in every cycle regB must
be shifted left by d bits, with the most significant digit wrapping around to the least significant
bits as a circular shift. Note that the shift must be circular and the last
shift must be by the remainder of 163/d, so that regB returns to its initial value at the
end of the multiplication.
In Figure 6.2, the cmd signal selects multiplication or addition as in
Equation 6.1 [29]. The position of the XOR gates in the latter array depends on the
irreducible polynomial; in this case, the polynomial $P(x) = x^{163}+x^7+x^6+x^3+1$ is
used. In a finite field multiplication, the reduction needs to be done if the MSB
of A is “1”. For a finite field addition, the cmd signal ensures that the reduction is not
performed [28].
The data path of the MALU is an MSB-first serial $F_{2^n}$ multiplier with digit size d. The
MALU sums three types of inputs, $B_i \cdot C$, $A_{MSB} \cdot P(x)$ and A, and
gives the intermediate result $RetA = A + B_i \cdot C + A_{MSB} \cdot P(x)$.
A multiplication is obtained by feeding RetA back as the next input A and
repeating this computation n times. Addition is obtained on the
same hardware with some additional tricks. The input value A is shifted 1 bit to the left inside
the ‘CELL’, as shown in Figure 6.3. If A is shifted to the right before entering
‘cell_3’, where the XOR performs the binary-field addition, then C[0]⊕A[0] can be
concatenated to A, as illustrated for the ‘MALU’ in Figure 6.3.
Thus, addition is done with the same hardware in one clock cycle.
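The behaviour of the MALU can be modelled in software. The following Python sketch is a bit-level model under assumptions, not the GEZEL source of Appendix C: `cell` performs one step RetA = (A << 1) + B_i·C + A_MSB·P(x) with the reduction constant 0xC9 (the low part x^7 + x^6 + x^3 + 1 of P(x)), `malu_mul` chains d such steps per clock cycle, and `reference_mul` is an independent shift-and-add multiplier used only as a cross-check. All function names are hypothetical.

```python
M = 163
MASK = (1 << M) - 1
LOW_POLY = 0xC9          # x^7 + x^6 + x^3 + 1, the low part of P(x)


def cell(a, b_bit, c):
    """One bit-level step: shift A left, reduce if the MSB was set, add B_i * C."""
    msb = (a >> (M - 1)) & 1
    a = (a << 1) & MASK
    if msb:
        a ^= LOW_POLY
    if b_bit:
        a ^= c
    return a


def malu_mul(b, c, d=4):
    """MSB-first digit-serial product b*c mod P(x); returns (result, clock cycles)."""
    a, cycles, pos = 0, 0, M          # regA is cleared when a multiplication starts
    while pos > 0:
        for _ in range(min(d, pos)):  # the last digit may hold fewer than d bits
            pos -= 1
            a = cell(a, (b >> pos) & 1, c)
        cycles += 1
    return a, cycles


def malu_add(a, c):
    """Field addition is a bit-wise XOR and takes a single cycle."""
    return a ^ c


def reference_mul(a, b):
    """Independent LSB-first shift-and-add multiplier, for cross-checking only."""
    poly = (1 << M) | LOW_POLY
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= poly
    return r
```

With d = 4 the model needs 41 cycles per multiplication (40 full digits and one 3-bit digit); with d = 1 and d = 7 it needs 163 and 24 cycles.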
[Block diagram omitted. Labels recovered from the figure: CELL (inputs A, C, MSB_B[0], Op;
output RetA; bit-wise XOR; shift <<1; reduction constant 0xc9) and MALU (CELL_0, CELL_1,
CELL_2, CELL_3; A>>1; C[0]; A[0]; CELL_3[0]).]
Figure 6.3: Control Scheme of Cell and MALU with d = 4
In our design, a digit size of 4 was chosen as the optimum according to the trade-off between
area consumption and processing time. Figure 6.3 therefore illustrates the MALU
for d = 4.
Note that the shifting is done as 40 shifts of 4 bits and one shift of 3 bits. The
Mult_last signal controls the multiplexer that chooses between the outputs of cell_3 and cell_2;
it selects cell_2 in the last round of the multiplication. Also, the mul_start signal controls
the regA value and sets it to “0” when a multiplication starts. The GEZEL code of the Cell
and MALU is given in Appendix C.
6.1.1.2 Register File Design
The register file is the memory part of the processor; the temporary values of the projective
addition are stored in it. The MALU uses three registers for its operands and its result.
The architecture of the register file is shown in Figure 6.4. A circular shift register is
used to reduce the complexity of the multiplexers. In a randomly accessible register file,
the area complexity grows with the square of the number of registers,
since every register needs an input from every register; in a circular shift register,
by contrast, the area complexity of the multiplexer is constant.
[Block diagram omitted. Labels recovered from the figure: RegA, RegB, RegC, RegD, RegE;
RegB<<4; RegD<<8 # Data_in; RetA; A_ctrl, B_ctrl, C_ctrl, D_ctrl, E_ctrl; Data_in; MSB_4B.]
Figure 6.4: Register File Architecture
Although the register file is a circular shift register file, as shown in Figure 6.4, each
register is controlled independently for efficient management. In our first design,
regA was used both for assigning new values and as the accumulator of operations.
During the design it was noticed that assignment can proceed simultaneously
with a multiplication if another register is used for assigning values: a
multiplication takes from 24 to 163 clock cycles for digit sizes
from 7 down to 1, while assigning data from outside into the register file takes
only 21 clock cycles. Consequently regD, a spare register, is used for assigning
data, which reduces the total processing time.
As stated before, our sample design needs forty 4-bit shifts and one 3-bit shift.
If one more bit is added to regB, the 3-bit shift can be dropped and regular
4-bit shifting suffices, as shown in Figure 6.5. In the first shift, the MSB_4B digit of
regB, B[162],B[161],B[160],B[159], moves to B[2],B[1],B[0],B[163] respectively; in the
second shift, B[2],B[1],B[0],B[163] move to B[6],B[5],B[4],B[3], and so on.
After 41 cycles, regB holds its initial value again. So the fifth input
of the multiplexer (RegB << 3) is dropped and the multiplexer can be controlled with only
2 bits. The same can be done for other digit sizes; the only difference is
the number of bits remaining after the regular shifts. RegC and regE do not need any
multiplexers. These registers are connected only to their own value or that of the previous
register, which can be controlled by clock gating with an enable signal generated from the
“C_ctrl” and “E_ctrl” signals.
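The effect of the extra bit in regB can be reproduced with a short Python model. It is a sketch under assumptions (the register is modelled as a 164-bit integer whose spare bit 163 is initially 0, and the function names are hypothetical): 41 circular 4-bit shifts deliver the 163 operand bits MSB-first to the MALU and restore the original register content.

```python
WIDTH = 164                     # 163 operand bits plus one spare bit (bit 163)
REG_MASK = (1 << WIDTH) - 1


def rotl(value, shift):
    """Circular left shift inside a WIDTH-bit register."""
    return ((value << shift) | (value >> (WIDTH - shift))) & REG_MASK


def digits_msb_first(b):
    """Return the 41 digits MSB_4B = B[162:159] seen by the MALU and the final regB."""
    reg = b                     # operand in bits [162:0], spare bit 163 is 0
    digits = []
    for _ in range(41):
        digits.append((reg >> 159) & 0xF)
        reg = rotl(reg, 4)
    return digits, reg
```

The last digit contains only three operand bits followed by the spare bit, which corresponds to the final round in which the output of cell_2 is selected instead of cell_3.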
[Diagram omitted. Labels recovered from the figure: RegB with bit positions 162, 161, 160, 159
(MSB_4B, to MALU) and 163, 2, 1, 0, Shift(<<4); RegD with LSB(8), MSB(3)/(8), Shift(<<8),
Data_in, Data_out, t==0.]
Figure 6.5: Shifting Operation in regB and Data Assigning, Taking Process in regD
In our final design, the only register file input from external memory
(ROM or RAM) goes to regD. Since the data is assigned in 8-bit words, as shown in
Figure 6.5, regD performs 8-bit shifts to keep the previously loaded data. The
inputs of regD are: one from itself, one from regC as part of the circular shift register,
one from regA as a shortcut to the outside, one from outside, and the constant
‘1’, used as the initial value of the projective Z-coordinate, as shown in Figure 6.4.
Finally, shifting is done as shown in Figure 6.5. The data is assigned to the least
significant bits, and shift and concatenate operations then fill regD. The output is taken from
the most significant bits of regD in the following steps:
1. For t = 0 (counter), the first output is the most significant 3 bits.
2. For t = 1, one 8-bit shift is omitted and the multiplexer chooses the bits
regD[159:152].
3. For 2 ≤ t ≤ 21, in every cycle the multiplexer chooses the bits regD[159:152]
and regD is shifted 8 bits to the left.
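The transfer of a 163-bit value over the 8-bit data port can be sketched as follows. This Python model only illustrates the chunking (one 3-bit chunk followed by twenty 8-bit chunks, MSB first, 21 transfers in total); it does not reproduce the exact counter values of the finite state machine, and the function names are hypothetical.

```python
FIELD_BITS = 163
REG_MASK = (1 << FIELD_BITS) - 1


def to_chunks(value):
    """Split a 163-bit value MSB-first into one 3-bit chunk and twenty bytes."""
    chunks = [(value >> 160) & 0x7]
    for i in range(19, -1, -1):
        chunks.append((value >> (8 * i)) & 0xFF)
    return chunks


def load_regd(chunks):
    """Rebuild the value as regD does: shift left by 8 and place each chunk in the LSBs."""
    reg = 0
    for chunk in chunks:
        reg = ((reg << 8) | chunk) & REG_MASK
    return reg
```

Since 163 = 3 + 20 × 8, every value crosses the bus in 21 chunks, which matches the 21 clock cycles quoted for one assignment.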
6.1.1.3 Shifter Design
In our processor, the key value (k) is stored in an internal register inside a shifter component.
This component has its own finite state machine and is operated by the controller through the
load, shift and stop signals. The same 8-bit shifting operation is used to load k, which is
assigned to the processor only once, when the point multiplication starts. Figure 6.6
illustrates the control of the shifter.
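[Block diagram omitted. Labels recovered from the figure: SHIFTER with RegK, inputs load,
shift, stop and k (8 bits), outputs msb_k, first and count; multiplexer inputs
RegK<<8 # k and K<<1.]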
Figure 6.6: The Architecture of Shifter Block
In Figure 6.6 the load signal controls the multiplexer to choose the 8-bit shift and
concatenation on regK. Since it is just a signal, it must be driven to
‘0’ or ‘1’ in every cycle. The shift signal controls the multiplexer to choose the value
shifted by 1 bit for regK. The key is not shifted after every projective addition,
because each bit is used twice in the Montgomery ladder, to calculate the new P1 and P2 values.
The stop signal stops the shifting operation. Moreover, the Montgomery ladder
requires the most significant bit to equal ‘1’ at the start of the operation. Because the MSB
of the key may be zero, the first step after the assignment is finished is to shift
left until the MSB equals ‘1’. The finite state machine of the shifter is
illustrated in Figure 6.7. The MSB_k value is checked: if it is ‘0’, a left shift by
one bit is performed and the output flag ‘count’ is raised; otherwise no shift is performed
and the output ‘first’ is raised. That is, once the MSB of k equals ‘1’, the point
multiplication can start. Note that the count flag simply increments the counter inside
the control block whenever a shift is performed.
Figure 6.7: The Finite State Machine Diagram of Shifter Component
6.1.1.4 Control Block
In our design, the control block acts as the brain that makes all decisions. It has two different
finite state machines: one to manage the projective addition and one to convert the result back
to affine coordinates in the final step (inversion). It controls not only the processor but
also the components outside the processor, as shown in Figure 6.8.
Inside the processor, the control block controls the register file with only 8 bits: it keeps
the values, performs a circular shift, assigns a new value, shifts regB during a multiplication
while keeping the return value in regA, or assigns values while a multiplication runs in
parallel. Three bits control the MALU: they define the operation type and manage
the multiplication with respect to the counter (register t). The control block also manages
the finite state machine of the shifter block with 3 bits.
Outside the processor, the control block controls the interfaces from the memory units
to the bus and from the bus to the processor through the address and assign signals. It receives
values on the ‘data_in’ input port and sends values to the memory through the ‘data_out’
output port.
There are several kinds of registers in the control block, as shown in Figure 6.8.
The first kind are counters: the 6-bit register ‘t’ counts the multiplication steps, the 7-bit
register ‘t2’ counts the inversion steps given in Table 6.1, and the 8-bit register ‘counter’
counts the key shifts inside the shifter and manages the shifting process. One-bit registers
keep values that are tested by conditional statements in the finite state machine.
A 10-bit register keeps the address in 8 bits and the read or write operation in 2 bits.
Finally, a 4-bit register keeps the assign control, which determines whether the processor
reads from ROM or RAM, writes to RAM, or copies from RAM0 to RAM1 or RAM2, and which
selects reading from P1 or P2 depending on the key.
[Block diagram omitted. Labels recovered from the figure: PROCESSOR containing SHIFTER
(k = 163), MALU (d = 4), Register File (5 * 163) and CONTROL BLOCK with registers t, t2 and
counter; signals A_ctrl, B_ctrl, C_ctrl, D_ctrl, E_ctrl, RegA, MSB_4B, RegC, RegD, RetA,
Mul_start, Op, Mul_last, shift, msb_k, first, count, stop, load, data_out, address, assign,
data_in.]
Figure 6.8: The Processor of Binary Edwards Curves
In the control block, a sequential finite state machine repeats one
projective addition; it is given in Figure A.1 in Appendix A and in Algorithm 9.
Operations are initiated by the start signal, which is generated internally in our design but
can also be triggered from outside. The control block then manages the
bus and the ROM to transfer the k value from the ROM to the shifter block until ‘t’ counts 21.
When the MSB signal is received from the shifter block, the control block starts to fetch the
initial values from the ROM according to the Montgomery ladder given in Algorithm 10 [24].
Initially, in the “firsttime” period, the value P2 ← 2P is calculated and the assign signal
sets up the connection over the buses to put the result into RAM2. Recall that RAM1 stores
the P1 values and RAM2 stores the P2 values. The shifter shifts ‘k’ to the left by 1 bit and
checks whether the MSB equals ‘0’ or ‘1’. The next point addition is then performed according
to the MSB and the ‘assign’ signal, which is explained below. After all bits of k have been
processed by the Montgomery ladder, the control block manages the inversion of Z1 to
convert the projective coordinates to affine ones.
Algorithm 10 : Montgomery Ladder for Point Multiplication
Require: EC point P = (x,y), P ∈ E(Fq), integer k, 0 < k < M,
    k = (k_{t−1}, k_{t−2}, ..., k_0)_2, k_{t−1} = 1
Ensure: Q = [k]P
    P1 ← P, P2 ← 2P
    for i from t−2 downto 0 do
        if k_i = 1 then
            P1 ← P1+P2, P2 ← 2P2
        else
            P2 ← P1+P2, P1 ← 2P1
        end if
    end for
    return (P1)
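A software rendering of Algorithm 10 makes the control flow explicit. The Python sketch below is generic and illustrative: the group operation is passed in as a function `add` (a hypothetical parameter), and doubling is performed as `add(P, P)`, in the same way that our design reuses point addition for doubling. In the example, ordinary integer addition stands in for the elliptic curve group so that the result can be compared with k·P directly.

```python
def montgomery_ladder(k, P, add):
    """Compute [k]P with the Montgomery ladder; add(A, B) is the group operation."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    bits = bin(k)[2:]                 # k_{t-1} ... k_0 with k_{t-1} = 1
    P1, P2 = P, add(P, P)             # P1 <- P, P2 <- 2P
    for bit in bits[1:]:              # i from t-2 down to 0
        if bit == "1":
            P1, P2 = add(P1, P2), add(P2, P2)
        else:
            P1, P2 = add(P1, P1), add(P1, P2)
    return P1


# Integer addition as a stand-in group: [5]7 = 35.
example = montgomery_ladder(5, 7, lambda a, b: a + b)
```

Both branches perform exactly one addition and one doubling, and the invariant P2 − P1 = P holds after every iteration; this regularity is what protects against simple power analysis.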
An important point is the design of the assign control. The main difficulty is that
the same finite state machine must serve different
calculations. The arrangement of the inputs and outputs is controlled by
a four-bit control word, assign, given in Equation 6.2:
assign[3 : 0] = read # online # ctrlstate # msbk (6.2)
where # denotes bit concatenation. The read bit controls which value from which RAM is
assigned, and the online bit controls carrying values from RAM0 to RAM1 or RAM2 over the
bus while the FSM of the control block is running; the details are given in the subsection on
RAM. The ctrlstate bit indicates the current state (first or second state of the Montgomery
ladder operations). Finally, the msbk bit controls the operations of the Montgomery ladder.
The assign control map is given in detail in Table B.1 in Appendix B.
6.1.2 Bus Manager
The bus manager connects the processor to the memory units and to the outside
of the chip. It receives the 10-bit address and the 4-bit assign control and decides
which ports are connected. The four most significant bits of the address are the
read and write controls of the memory units: the two most significant bits select
read ROM, read RAM or write RAM, and the other two bits indicate which RAM is
read or written. The map of these four bits is given in Figure 6.9.
Figure 6.9: The Map of Address Control
The bus manager also has a register that stores the 6 address bits, so that it can work in
parallel: it reads from RAM0 and writes to the RAM selected by the assign control one cycle
later. This is why the RAM blocks are designed separately. In the Montgomery
ladder, either P1 or P2 is selected according to the step. When
the results are obtained, they must be carried to the corresponding RAM for reuse in the
following step, which separate RAMs with the same addresses make possible. The register
simply serves as a buffer so that the same address bits from the control block can be used.
Figure 6.10 shows the architecture of the bus manager.
[Block diagram omitted. Labels recovered from the figure: BUS MANAGER connected to ROM, RAM0,
RAM1, RAM2 and CONTROLLER; memory ports rd, wr, address (6 bits), idata and odata (8 bits);
controller ports data_out (8), address (10), assign (4), data_in (8); address control bits,
6-bit address register, assigning control bits.]
Figure 6.10: Architecture of Bus Manager
6.1.3 Memory Units
6.1.3.1 ROM
In the ROM design, look-up tables (LUT) are used to store the base point coordinates (X, Y),
the constants of the binary Edwards curve equation (d1,d2) and the key value (k). The ROM is
addressed with 8 bits. It is assumed that the connection between the ROM and the processor is
secure against eavesdropping.
6.1.3.2 RAM
The ‘ipblock’ feature of GEZEL [25] is used to design the RAM blocks. The reason for using
three separate RAM blocks is explained in Section 6.1.2. In the ‘ipblock’ feature, the
word length and the size of the RAM are defined as shown in Algorithm 11.
Algorithm 11 : Ipblock for RAM design
ipblock RAMEC0(in address : ns(8);
               in wr, rd : ns(1);
               in idata : ns(8);
               out odata : ns(8)) {
  iptype "ram";
  ipparm "wl=8";
  ipparm "size=128";
}
RAM0 is used for intermediate values and stores four 163-bit values. After every
projective point addition, the new values appear in the order X, Y,
Z and an intermediate value. The first three are carried to the corresponding RAM for reuse
in the following steps of the Montgomery ladder, as shown in Figure 6.11.
Address   RAM0 (“wl=8”, “size=128”)   RAM1 (“wl=8”, “size=64”)   RAM2 (“wl=8”, “size=64”)
0x00      Intermediate values         PX1                        PX2
0x15      Intermediate values         PY1                        PY2
0x2A      Intermediate values         PZ1                        PZ2
0x3F      Intermediate values
Figure 6.11: RAM Blocks and Storing Values
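The base addresses in Figure 6.11 follow from the word length. As a small Python illustration (the names are hypothetical), a 163-bit value occupies ⌈163/8⌉ = 21 words of 8 bits, so consecutive values start 0x15 addresses apart:

```python
WORD_BITS = 8
FIELD_BITS = 163
WORDS_PER_VALUE = -(-FIELD_BITS // WORD_BITS)   # ceiling division: 21 words


def base_address(slot):
    """Start address of the slot-th 163-bit value inside a RAM block."""
    return slot * WORDS_PER_VALUE


RAM0_BASES = [base_address(i) for i in range(4)]     # four intermediate values
RAM12_BASES = [base_address(i) for i in range(3)]    # X, Y, Z of P1 or P2
```

Four values need 84 words, which fits in the 128-word RAM0, and three values need 63 words, which fits in the 64-word RAM1 and RAM2.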
6.2 Algorithms for Implementation of Binary Edwards Curves
The main operation on binary Edwards curves is the scalar multiplication. The
hierarchy of operations required to implement BEC is given in
Figure 6.12. The scalar multiplication is at the top level, for which we use the Montgomery
ladder given in Algorithm 10. The next lower level is composed
of projective addition and inversion operations; as stated before, we use projective
coordinates to avoid an inversion in every step of the point addition, so
inversion is used only at the end of the process to obtain the affine coordinates. Point
addition is implemented with the binary Edwards curve projective addition algorithm for
general constants d1 and d2, given in Algorithm 6. The lowest level
consists of the finite field arithmetic operations, addition and multiplication.
Finally, as stated in Section 3 (Essential Concepts), the modular inversion is performed
using Fermat’s little theorem, given in Theorem 6.2 and Theorem 6.3 [7].
Theorem 6.2 : If p is prime, then
$a^p \equiv a \pmod{p}$.
Theorem 6.3 (Fermat’s little theorem) : If p is prime and $p \nmid a$, then
$a^{p-1} \equiv 1 \pmod{p}$.
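[Diagram omitted. Hierarchy recovered from the figure: SCALAR MULTIPLICATION (Montgomery
Ladder) at the top; PROJECTIVE ADDITION and INVERSION (Fermat’s Little Theorem) below it;
ADDITION and MULTIPLICATION at the lowest level.]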
Figure 6.12: Required Operations for Binary Edwards Curves Implementation
In our design the role of the prime p in Fermat’s little theorem is played by the field size
$2^{163}$: every nonzero $a \in F_{2^{163}}$ satisfies $a^{2^{163}-1} = 1$. Therefore, in
order to calculate the inverse of the Z coordinate, we must calculate
$a^{2^{163}-2} = a^{-1}$. This can be performed by multiplications and squarings. Note that we
did not implement a separate squaring operation in our design; instead, squaring is performed
by multiplication with identical operands. In [30], Itoh and Tsujii proposed an efficient
technique to calculate the inverse.
From Fermat’s little theorem, we have $a^{-1} = a^{2^n-2} = (a^{2^{n-1}-1})^2$ for all
nonzero $a \in F_{2^n}$ [7]. In our design n equals 163, so n−1 is an even number. Then we can write:
$$a^{2^{n-1}-1} = a^{(2^{\frac{n-1}{2}}-1)(2^{\frac{n-1}{2}}+1)} = \left(a^{2^{\frac{n-1}{2}}-1}\right)^{2^{\frac{n-1}{2}}} a^{2^{\frac{n-1}{2}}-1}. \tag{6.3}$$
In our case, for $F_{2^{163}}$, we get:
$$a^{-1} = a^{2^{163}-2} = \left(a^{2^{162}-1}\right)^2 = \left(\left(a^{2^{81}-1}\right)^{2^{81}} a^{2^{81}-1}\right)^2, \tag{6.4}$$
which means that we need the formula for $a^{2^{n-1}-1}$ again, but now n−1 is an odd number [7].
In this case:
$$a^{2^{n-1}-1} = a \cdot a^{2^{n-1}-2} = a\left(a^{2^{n-2}-1}\right)^2. \tag{6.5}$$
By repeating this process, the inversion can be computed with 171 multiplications.
The whole process is given in Table 6.1.
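Table 6.1: Example of inversion in $\mathbb{F}_{2^{163}}$
Calculation                                                          Number of multiplications
$a^{2^{2}-1} = a \cdot a^{2}$                                        2M
$a^{2^{4}-1} = (a^{2^{2}-1})^{2^{2}} \cdot a^{2^{2}-1}$              3M
$a^{2^{5}-1} = a \cdot (a^{2^{4}-1})^{2}$                            2M
$a^{2^{10}-1} = (a^{2^{5}-1})^{2^{5}} \cdot a^{2^{5}-1}$             6M
$a^{2^{20}-1} = (a^{2^{10}-1})^{2^{10}} \cdot a^{2^{10}-1}$          11M
$a^{2^{40}-1} = (a^{2^{20}-1})^{2^{20}} \cdot a^{2^{20}-1}$          21M
$a^{2^{80}-1} = (a^{2^{40}-1})^{2^{40}} \cdot a^{2^{40}-1}$          41M
$a^{2^{81}-1} = a \cdot (a^{2^{80}-1})^{2}$                          2M
$a^{2^{162}-1} = (a^{2^{81}-1})^{2^{81}} \cdot a^{2^{81}-1}$         82M
$a^{-1} = a^{2^{163}-2} = (a^{2^{162}-1})^{2}$                       1M
Total                                                                171M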
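The chain of Table 6.1 can be checked with a short software model. The Python sketch below is only an illustration and not the GEZEL implementation: it assumes the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 (suggested by the constant 0xc9 in the Cell code of Appendix C), the helper names gf_mul and inverse are hypothetical, and every squaring is counted as one multiplication, as in the table.

```python
N = 163
MASK = (1 << N) - 1
POLY_LOW = 0xC9  # assumed low part of the reduction polynomial: x^7 + x^6 + x^3 + 1


def gf_mul(a, b):
    """MSB-first shift-and-add multiplication of two field elements (ints < 2^163)."""
    r = 0
    for i in reversed(range(N)):
        r <<= 1                      # r = r * x
        if r >> N:                   # reduce when the degree reaches 163
            r = (r & MASK) ^ POLY_LOW
        if (b >> i) & 1:
            r ^= a
    return r


def inverse(a):
    """Return (a^(2^163 - 2), number of multiplications) following Table 6.1."""
    count = 0

    def mul(x, y):
        nonlocal count
        count += 1
        return gf_mul(x, y)

    def double_exponent(t, k):
        # t = a^(2^k - 1)  ->  a^(2^(2k) - 1), cost k + 1 multiplications (Eq. 6.3)
        s = t
        for _ in range(k):
            s = mul(s, s)
        return mul(s, t)

    def increment_exponent(t):
        # t = a^(2^k - 1)  ->  a^(2^(k+1) - 1), cost 2 multiplications (Eq. 6.5)
        return mul(a, mul(t, t))

    t = increment_exponent(a)        # a^(2^2 - 1),   2M
    t = double_exponent(t, 2)        # a^(2^4 - 1),   3M
    t = increment_exponent(t)        # a^(2^5 - 1),   2M
    t = double_exponent(t, 5)        # a^(2^10 - 1),  6M
    t = double_exponent(t, 10)       # a^(2^20 - 1),  11M
    t = double_exponent(t, 20)       # a^(2^40 - 1),  21M
    t = double_exponent(t, 40)       # a^(2^80 - 1),  41M
    t = increment_exponent(t)        # a^(2^81 - 1),  2M
    t = double_exponent(t, 81)       # a^(2^162 - 1), 82M
    t = mul(t, t)                    # a^(2^163 - 2), 1M
    return t, count


a = 0x1234567890ABCDEF1234567890ABCDEF12345678  # arbitrary nonzero test element
inv, multiplications = inverse(a)
print(multiplications, gf_mul(a, inv) == 1)
```

Under these assumptions the model uses 171 multiplications and the product of an element with its computed inverse is 1.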
7. RESULTS
The implementation of binary Edwards curves is written in the GEZEL [25] hardware
description language and is optimized for low area consumption. We attempted to
make Algorithm 9 run as fast as possible. Assignment operations are hidden within
the multiplication operation: during one multiplication, the processor not only performs
an assignment but also stores the value of regD to the memory unit, provided that the
digit size is small enough for both to finish in time. For instance, a multiplication takes 41
clock cycles for d = 4, while a single assignment takes 21 clock cycles at 8 bits
per cycle. Therefore, if the digit size is below four, the assignment
and the store can be performed in parallel with the multiplication. A 0.13 µm standard
cell library is used as the technology library. The VHDL code generated
automatically by GEZEL was used for synthesis. The design was synthesized
with Synopsys Design Vision [26] at several clock frequencies; comparisons are given
mainly at 400 kHz. RTL-level and gate-level simulations were done with the GEZEL
fdlsim tool and ModelSim; example simulations are included in Appendix E. All
power consumption estimates were obtained with the Synopsys Power Compiler
tool, using the switching activity
captured during the gate-level simulations in ModelSim.
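The cycle counts quoted above are consistent with a simple ceiling-division model. The sketch below is a hedged illustration: it assumes that a multiplication takes ceil(163/d) cycles, that an 8-bit-per-cycle transfer takes ceil(163/8) cycles, and that the assignment and the store each cost one such transfer. These assumptions reproduce the 41 and 21 cycles stated in the text, but they are not taken from the GEZEL source, and the function names are hypothetical.

```python
import math

FIELD_BITS = 163
BUS_BITS = 8  # bits moved per cycle by an assignment, as stated in the text


def mult_cycles(d):
    """Assumed cycle count of one digit-serial multiplication with digit size d."""
    return math.ceil(FIELD_BITS / d)


def transfer_cycles():
    """Cycle count of moving one 163-bit value over the 8-bit path."""
    return math.ceil(FIELD_BITS / BUS_BITS)


def assign_and_store_hidden(d):
    """True if one assignment plus one store fit inside one multiplication (assumption)."""
    return 2 * transfer_cycles() <= mult_cycles(d)


for d in range(1, 7):
    print(d, mult_cycles(d), assign_and_store_hidden(d))
```

With this model, the assignment and the store both fit inside a multiplication for d = 1, 2 and 3, but not for d = 4, which matches the statement that the digit size must be below four.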
7.1 Power Estimation Methodology
In our first estimation method, the power consumption was estimated by Synopsys
Design Vision alone, without any switching activity. With this method, the results were
nearly the same for all digit sizes, and the power consumption appeared to depend only
on the clock frequency. In order to estimate the power consumption of the designs as accurately
as possible, power simulations were then done at the gate level using the switching
activity. The power estimation flow is shown in Figure 7.1.
Figure 7.1: Power estimation flow
After compiling the design with Synopsys, area and timing reports are obtained. In
Figure 7.1, the dashed arrow shows the first estimation type, without switching activity,
and the red arrows show how the files are generated. First, a Verilog netlist and a standard delay
format (SDF) file are produced for use in the simulations. The SDF file
contains the timing information of the cells in the netlist. After this step, the netlist
is simulated in ModelSim and a value change dump (VCD) file is created by the
simulator. The VCD file records the switching activity of the signals in the netlist.
Then, this VCD file is converted to a switching activity interchange format (SAIF) file,
which is used by the Synopsys Power Compiler. As the final step, Power Compiler
generates the power estimation reports using the netlist information and the SAIF file.
With this measurement setup, the power estimates do vary with the digit size.
7.2 Area, Power Consumption in Different Frequencies
Six different implementations of binary Edwards curves with different digit sizes were
synthesized in order to find the best solution. First, the design was implemented with
digit size d = 3. Afterwards, the digit size was changed and the design was resynthesized. The area, the
power consumption, the critical path delay and the number of clock cycles are given
for 100 kHz, 400 kHz, 1 MHz, 5 MHz, 20 MHz and 50 MHz in Tables 7.1, 7.2, 7.3, 7.4,
7.5 and 7.6, respectively. The equivalent gate counts of the designs are calculated by
dividing the total area of the circuit by the area of one 2-input NAND gate.
In Table 7.2, it can be observed that the combinational area increases
with the digit size while the non-combinational area stays the same. The non-combinational area
does not change because all of these designs are based on the same five-register architecture. The
designs differ only in the number of CELL instances, each of which consists of AND
and XOR gates, so no additional register is introduced by a CELL. Moreover, the
power consumption increases with both the digit size and the frequency.
The critical path of the design depends on the compiler, but in Table 7.2 the values are
around 14-15 ns. For instance, if we add an extra cell in series to our MALU,
we expect the signal path to become longer. Slack is the part of the
clock period that remains after the critical path delay is subtracted. Our example design (d = 4) has a
delay of around 14 ns, which means that it can run at up to about 70 MHz. The area consumption
of our example design is given in detail in Table 7.7.
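The relations between clock period, critical path, slack and maximum frequency used above can be written out as a small worked example. This Python sketch uses only values for d = 4 from Tables 7.2 and 7.6; the function names are hypothetical, and the running time per scalar multiplication is a derived quantity under the assumption that it equals the cycle count divided by the clock frequency.

```python
def period_ns(freq_hz):
    """Clock period in nanoseconds."""
    return 1e9 / freq_hz


def slack_ns(freq_hz, critical_path_ns):
    """Part of the clock period left after the critical path delay."""
    return period_ns(freq_hz) - critical_path_ns


def max_frequency_mhz(critical_path_ns):
    """Highest clock frequency for which the slack is still non-negative."""
    return 1e3 / critical_path_ns


def runtime_s(cycles, freq_hz):
    """Time for one scalar multiplication, assuming cycles / frequency."""
    return cycles / freq_hz


# d = 4 at 400 kHz (Table 7.2): critical path 13.83 ns, 505975 cycles
print(round(slack_ns(400e3, 13.83), 2))      # close to the tabulated slack of 2486 ns
print(round(runtime_s(505975, 400e3), 3))    # about 1.265 s per scalar multiplication
# d = 4 at 50 MHz (Table 7.6): critical path 13.97 ns
print(round(slack_ns(50e6, 13.97), 2))       # matches the tabulated slack of 6.03 ns
print(round(max_frequency_mhz(14.0), 1))     # about 71.4 MHz for a 14 ns delay
```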
Table 7.1: Results of implementation at 100 kHz in 0.13 µm technology
        Area Est. (kG)             Power Est. (µW)   Time Est.
digit   Total   Comb.   Non-comb.  TPdyn    Pcell    Critical path (ns)  Slack (ns)  Cycles
d=1     13.08   6.81    6.27       2.27     1.84     15.08               2484        1499716
d=2     14.23   7.96    6.27       2.76     2.10     15.63               2484        835678
d=3     14.75   8.49    6.26       3.17     2.34     15.21               2484        614333
d=4     15.03   8.76    6.26       3.28     2.41     15.11               2484        505975
d=5     15.90   9.64    6.26       3.43     2.45     16.62               2483        463496
d=6     16.43   10.16   6.26       3.56     2.5      17.51               2482        436945
Table 7.2: Results of implementation at 400 kHz in 0.13 µm technology
        Area Est. (kG)             Power Est. (µW)   Time Est.
digit   Total   Comb.   Non-comb.  TPdyn    Pcell    Critical path (ns)  Slack (ns)  Cycles
d=1     13.08   6.8     6.27       10.49    8.82     15.15               2485        1499716
d=2     14.23   7.96    6.27       12.50    9.82     15.31               2485        835678
d=3     14.74   8.48    6.26       14.43    10.84    13.98               2486        614333
d=4     15.04   8.77    6.27       14.42    10.77    13.83               2486        505975
d=5     15.90   9.63    6.27       15.81    11.46    14.09               2486        463496
d=6     16.42   10.15   6.27       15.86    11.42    14.70               2485        436945
Table 7.3: Results of implementation at 1 MHz in 0.13 µm technology
        Area Est. (kG)             Power Est. (µW)   Time Est.
digit   Total   Comb.   Non-comb.  TPdyn    Pcell    Critical path (ns)  Slack (ns)  Cycles
d=1     13.07   6.8     6.27       25.6     20.7     14.92               985         1499716
d=2     14.23   7.96    6.27       29.79    22.47    16.87               983         835678
d=3     14.75   8.49    6.26       31.83    23.43    16.72               983         614333
d=4     15.03   8.77    6.26       32.75    24.07    14.61               985         505975
d=5     15.90   9.63    6.26       34.36    24.59    17.53               982         463496
d=6     16.42   10.15   6.26       35.73    25.11    19.32               981         436945
Table 7.4: Results of implementation at 5 MHz in 0.13 µm technology
        Area Est. (kG)             Power Est. (µW)   Time Est.
digit   Total   Comb.   Non-comb.  TPdyn    Pcell    Critical path (ns)  Slack (ns)  Cycles
d=1     13.06   6.78    6.28       108.99   91.19    15.57               184         1499716
d=2     14.20   7.93    6.27       148.44   112.04   16.83               183         835678
d=3     14.72   8.46    6.27       157.63   116.33   15.66               184         614333
d=4     15      8.74    6.26       162.76   119.97   14.52               185         505975
d=5     15.87   9.6     6.26       170.14   122.01   15.84               184         463496
d=6     16.38   10.12   6.26       177.43   124.99   18.25               182         436945
Table 7.5: Results of implementation at 20 MHz in 0.13 µm technology
        Area Est. (kG)             Power Est. (µW)   Time Est.
digit   Total   Comb.   Non-comb.  TPdyn    Pcell    Critical path (ns)  Slack (ns)  Cycles
d=1     13.04   6.77    6.27       425.1    344.1    14.56               35          1499716
d=2     14.18   7.91    6.27       492.09   372.66   15.29               35          835678
d=3     14.72   8.45    6.26       526.73   388.46   15.33               35          614333
d=4     15      8.73    6.27       545.33   401.2    13.93               36          505975
d=5     15.85   9.59    6.26       572.78   410.54   15.54               34          463496
d=6     16.37   10.11   6.26       595.43   419.53   16.89               33          436945
Table 7.6: Results of implementation at 50 MHz in 0.13 µm technology
        Area Est. (kG)             Power Est. (µW)   Time Est.
digit   Total   Comb.   Non-comb.  TPdyn    Pcell    Critical path (ns)  Slack (ns)  Cycles
d=1     13.037  6.76    6.276      1390     1100     14.51               5.49        1499716
d=2     14.18   7.91    6.268      1630     1210     15.40               5.6         835678
d=3     14.715  8.45    6.264      1930     1350     15.27               4.73        614333
d=4     14.992  8.725   6.267      1860     1330     13.97               6.03        505975
d=5     15.85   9.586   6.263      2170     1460     15.77               4.23        463496
d=6     16.365  10.10   6.263      2250     1490     17.64               2.36        436945
Table 7.7: Area estimation (kG) of our example design for d = 4 at 400 kHz
Block     Total   Comb.   Non-comb.
TOP       15.04   8.77    6.27
FSM       2.74    2.47    0.27
MALU      3.1     3.1     0
CELL      0.54    0.54    0
Reg.F     7.06    2.09    4.97
Shifter   1.94    0.93    1.01
BUS       0.2     0.17    0.03
ROM       0.36    0.31    0.05
7.3 Trade-off
The trade-off between the six implementations with different digit sizes is shown
in Figure 7.2. Between d = 1 and d = 2, the design becomes dramatically
faster. Then, the design gets faster roughly in proportion to the
increase in area, and finally, between d = 5 and d = 6, the area
increases sharply relative to the time saved. Moreover, Figure 7.3 shows the power consumption
of the design (d = 4) against throughput. The power consumption increases not
only with the throughput but also with the digit size. Figure 7.4
shows the power consumption for different digit sizes.
[Figure: area (kG, axis from 10.00 to 18.00) plotted against time (clock cycles, axis from 0 to 2000000) for the digit sizes d = 1 to d = 6]
Figure 7.2: Area Consumption vs. Time
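The trade-off plotted in Figure 7.2 can also be read off numerically. The Python sketch below takes the total area and cycle counts from Table 7.2 (400 kHz) and computes, for each increase of the digit size, how many clock cycles are saved per additional kG. This marginal-gain metric and the function name are illustrative assumptions, not a measure used in the original analysis.

```python
# Total area (kG) and clock cycles per digit size, copied from Table 7.2
AREA_KG = {1: 13.08, 2: 14.23, 3: 14.74, 4: 15.04, 5: 15.90, 6: 16.42}
CYCLES = {1: 1499716, 2: 835678, 3: 614333, 4: 505975, 5: 463496, 6: 436945}


def marginal_gain(d):
    """Return (cycles saved, extra area in kG, cycles saved per kG) for d-1 -> d."""
    saved = CYCLES[d - 1] - CYCLES[d]
    extra = AREA_KG[d] - AREA_KG[d - 1]
    return saved, extra, saved / extra


for d in range(2, 7):
    saved, extra, ratio = marginal_gain(d)
    print(f"d={d - 1} -> d={d}: {saved} cycles saved for {extra:.2f} kG "
          f"({ratio:,.0f} cycles per kG)")
```

With these figures, each step up to d = 4 saves several hundred thousand cycles per kG, while the steps to d = 5 and d = 6 save roughly fifty thousand cycles per kG, so the return on additional area drops sharply beyond d = 4.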
Figure 7.3: Throughput vs. Power Consumption in d = 4
Figure 7.4: Power Consumption in 5MHz with different digit sizes
7.4 Clock-gating
Clock gating is one of the power-saving techniques used in many synchronous circuits.
Clock gating adds logic to a circuit to prune the clock tree. Thus,
flip-flops in the disabled parts of the design do not change state: their switching power
consumption goes to zero, and only leakage currents are incurred. In our design,
applying clock gating to the register file reduces the area consumption by about 7%
and the power consumption by about 22% on average. The results are given in
Table 7.8 and Table 7.9 for 400 kHz and 5 MHz, respectively.
RTL clock gating works by identifying groups of flip-flops that share a common
enable control signal. It uses this enable signal to control a clock-gating
circuit, which is connected to the clock ports of all the flip-flops with the common enable
term. These flip-flops consume zero dynamic power as long as the enable signal is
false. An example logic circuit illustrating the clock-gating method is shown
in Figure 7.5.
[Figure: clock-gating example for the register file; labels shown: control signals A_ctrl[0] and A_ctrl[1], enable En, clocks CLK and CLK_A, data signal temp_a, registers RegA, RegB and RegE]
Figure 7.5: Process of Clock gating
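The behaviour described for Figure 7.5 can be mimicked with a very small behavioural model. The Python sketch below is an assumption-laden illustration, not a description of the synthesized circuit: it models one register whose gated clock CLK_A is taken to be CLK AND En, counts how many clock edges actually reach the flip-flops, and uses the hypothetical names GatedRegister and simulate.

```python
class GatedRegister:
    """Register whose gated clock CLK_A toggles only while En is true (assumed model)."""

    def __init__(self, width=163):
        self.mask = (1 << width) - 1
        self.q = 0               # stored value (Q outputs of the flip-flops)
        self.clock_edges = 0     # rising edges of CLK_A seen by the flip-flops

    def tick(self, d, en):
        """Apply one rising edge of CLK with data input d and enable en."""
        if en:                   # CLK_A = CLK AND En, so the edge passes only if En = 1
            self.clock_edges += 1
            self.q = d & self.mask
        return self.q


def simulate(stimulus):
    """Run a list of (d, en) pairs through a fresh register and return it."""
    reg = GatedRegister()
    for d, en in stimulus:
        reg.tick(d, en)
    return reg


# Five clock cycles, but the register is enabled in only two of them
reg = simulate([(0xA, True), (0xB, False), (0xC, False), (0xD, True), (0xE, False)])
print(hex(reg.q), reg.clock_edges)
```

In this run the register holds the value written in the last enabled cycle and sees two clock edges instead of five, which is the effect that removes the dynamic power of idle flip-flops.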
Table 7.8: Results of implementation at 400 kHz in 0.13 µm technology after clock gating
        Area Est. (kG)             Power Est. (µW)   Time Est.
digit   Total   Comb.   Non-comb.  TPdyn    Pcell    Critical path (ns)  Slack (ns)  Cycles
d=1     12.11   6.308   5.804      8.12     6.18     11.54               2488.46     1499716
d=2     12.92   7.113   5.807      9.55     6.92     12.06               2487.94     835678
d=3     13.47   7.665   5.805      10.32    7.3      12.04               2487.96     614333
d=4     14.074  8.276   5.798      11.36    7.82     10.45               2489.55     505975
d=5     14.622  8.817   5.805      11.63    7.88     10.55               2489.45     463496
d=6     15.184  9.363   5.821      12.24    8.12     10.93               2489.07     436945
Table 7.9: Results of implementation at 5 MHz in 0.13 µm technology after clock gating
        Area Est. (kG)             Power Est. (µW)   Time Est.
digit   Total   Comb.   Non-comb.  TPdyn    Pcell    Critical path (ns)  Slack (ns)  Cycles
d=1     12.07   6.27    5.804      106.57   80.62    10.78               189.22      1499716
d=2     12.9    7.1     5.805      124.57   89.55    11.75               188.25      835678
d=3     13.43   7.62    5.804      135.76   95.43    10.92               189.08      614333
d=4     14.03   8.23    5.798      145      99.97    10.29               189.71      505975
d=5     14.58   8.77    5.806      148.7    100.96   10.38               189.62      463496
d=6     15.15   9.33    5.821      155.86   103.39   10.79               189.21      436945
8. CONCLUSION
In this dissertation, the first hardware implementation of binary Edwards curves
is presented. The design steps of the implementation are given in detail. The
optimization of the number of registers and clock cycles is described. The design results
for six different digit sizes are given. The power consumption depends
mostly on the frequency, so the first step is to choose the clock frequency;
the trade-off between area and time can then be decided. In our example design,
the best digit size is d = 4, since it needs 108358 fewer clock cycles than d = 3
while its area is only slightly larger.
As future work, some further improvements are possible. It could be possible
to drop the Y coordinate of the projective coordinates and to use only the X coordinate
in the projective addition algorithm. Moreover, a common-Z coordinate representation could be applied
to this implementation. The complexity of the calculations would then be reduced, and
point multiplication would take fewer clock cycles. Furthermore, the resistance
of the implementation against Simple Power Analysis (SPA) and
Differential Power Analysis (DPA) attacks must be checked.
REFERENCES
[1] Frösen, J., 1995, Practical Cryptosystems and their Strength, Tik-110.501
Seminar on Network Security.
[2] Menezes, A.J., van Oorschot, P.C. and Vanstone, S.A.,
2001. Handbook of Applied Cryptography, CRC Press,
http://www.cacr.math.uwaterloo.ca/hac/.
[3] NIST, November 2001, Announcing the ADVANCED ENCRYPTION
STANDARD (AES), FIPS PUB 197.
[4] Preneel, B., 2007. An Introduction to Modern Cryptology, J. Bergstra
and K. de Leeuw, editors, The History of Information Security. A
Comprehensive Handbook, Elsevier, pp. 565–592.
[5] Barker, E., Barker, W., Burr, W., Polk, W. and Smid, M. NIST SP800-57:
Recommendation for Key Management Part 1: General (Revised),
Technical report.
[6] Diffie, W. and Hellman, M., 1976. New directions in cryptography, IEEE
Transactions on Information Theory.
[7] Batina, L., December 2005. Arithmetic and Architectures for Secure Hardware
Implementations of Public-Key Cryptography, Ph.D. thesis, Katholieke
Universiteit Leuven, Belgium.
[8] Koblitz, N., January 1987. Elliptic Curve Cryptosystems, Mathematics of
Computation, 48(177), 203–209.
[9] Miller, V.S., 1985. Use of Elliptic Curves in Cryptography, H.C. Williams, editor,
CRYPTO, volume 218 of Lecture Notes in Computer Science, Springer,
pp. 417–426.
[10] Studholme, C., 2002, The Discrete Logarithm Problem, Research paper at
University of Toronto.
[11] Örs Yalçın, S.B., February 2005. Hardware Design of Elliptic Curve
Cryptosystems and Side-Channel Attacks, Ph.D. thesis, Katholieke
Universiteit Leuven, Belgium.
[12] Hankerson, D., Menezes, A. and Vanstone, S., 2004. Guide to elliptic curve
cryptography, Springer, New York.
[13] Saeki, M., February 1997. Elliptic Curve Cryptosystems, Ph.D. thesis, McGill
University Montreal, Canada.
[14] Blake, I., Seroussi, G. and Smart, N., 2000. Elliptic Curves in Cryptography,
Cambridge University Press.
[15] Consortium, N., 2003, NESSIE security report, Technical report,
https://www.cosic.esat.kuleuven.be/nessie/deliverables/D20-v2.pdf.
[16] NIST, November 2008, Digital Signature Standard (DSS), FIPS PUB 186-3.
[17] Batina, L., March 2009, Elliptic Curve Cryptography, Lecture Handout.
[18] Silverman, J., 1986. The Arithmetic of Elliptic Curves, volume 106, New York:
Springer-Verlag.
[19] Morain, F. and Olivos, J., 1990. Speeding Up The Computations On An Elliptic
Curve Using Addition-Subtraction Chains, Theoretical Informatics and
Applications, 24, 531–543.
[20] Bernstein, D.J. and Lange, T., 2007. Faster Addition and Doubling on Elliptic
Curves, In Asiacrypt 2007, pp. 29–50.
[21] Edwards, H.M., 2007. A Normal Form for Elliptic Curves, Bulletin of the
American Mathematical Society, 44(3), 393–422.
[22] Bernstein, D.J., Lange, T. and Farashahi, R.R., 2008, Binary Edwards Curves,
Cryptology ePrint Archive, Report 2008/171, http://eprint.iacr.org/.
[23] Kim, K.H. and Kim, S.I., 2007, A New Method for Speeding Up Arithmetic
on Elliptic Curves over Binary Fields, Cryptology ePrint Archive, Report
2007/181, http://eprint.iacr.org/.
[24] Montgomery, P.L., 1987. Speeding the Pollard and elliptic curve methods of
factorization, Mathematics of Computation, 48, 243–264.
[25] GEZEL, http://rijndael.ece.vt.edu/gezel2/index.php.
[26] Synopsys, I., 2006, Design Compiler Tutorial Using Design Vision.
[27] Oswald, E., 2002. Enhancing Simple Power-Analysis Attacks on Elliptic Curve
Cryptosystems, B.S. Kaliski Jr., Ç.K. Koç and C. Paar, editors,
CHES, volume 2523 of Lecture Notes in Computer Science, Springer,
pp. 82–97.
[28] Sakiyama, K., Batina, L., Mentens, N., Preneel, B. and Verbauwhede,
I., 2006. Small-footprint ALU for Public-Key Processors for Pervasive
Security, Proc. Workshop RFID Security (RFIDSec ’06).
[29] Lee, Y.K., Sakiyama, K., Batina, L. and Verbauwhede, I., 2008.
Elliptic-Curve-Based Security Processor for RFID, IEEE Transactions
on Computers, 57(11), 1514–1527.
[30] Itoh, T. and Tsujii, S., 1988. Effective recursive algorithm for computing multiplicative
inverses in GF(2^m), Electronics Letters, 24(6), pp. 334–335.
A. Control Sequence of Control Block
[Figure: state diagram of the control block. Labels recoverable from the figure: IDLE (start = 0, start = 1); Load key k (t != 21, t == 21, firsttime = 1); First time ROM (r_rom = 1; PX, PY, d1, d2); ROM (r_rom = 1; d1, d2); RAM0 (intermediate values; r_ram = 1, w_ram = 1); Regular; RAM1 and RAM2 (read P1 coordinates, read P2 coordinates; write P1 <- P, P2 <- 2P; write P2 <- 2P2, P2 <- P1+P2; write P1 <- 2P1, P1 <- P1+P2); counter != 162, counter == 162; Fermat Z^-1 with RAM0 (intermediate values); RAM1 (PX1, PY1; r_ram = 1); affine coordinates PX1, PY1; returns P1 and goes back to IDLE]
Figure A.1: The Finite State Machine of Control Block
B. Control Bits of Assign Operation
Figure B.1: Assign Operation Control Map
C. Codes for Cell and MALU
dp Cell (in MSB_B : ns(1); in A, C : ns(163); out Ret_A : ns(163)) { // Definition of the input and output
  sig Shift_A : ns(163);  // Defining wires
  sig MSB_A : ns(1);
  sfg Mult {
    Shift_A = A << 1;
    MSB_A = A[162];
    Ret_A = ((MSB_B) ? C ^ Shift_A : Shift_A) ^ ((MSB_A) ? 0xc9 : 0);
  }
}
hardwired h_Cell(Cell) { Mult; }

// MALU_d4
dp Cell_00 : Cell  // Using same Cell 4 times with different component name
dp Cell_01 : Cell
dp Cell_02 : Cell
dp Cell_03 : Cell

dp MALU (in Op, Mul_start, Mul_last : ns(1); in MSB_B : ns(4); in A, C : ns(163); out Ret_A : ns(163)) { // Word Size = 4
  sig Re_A_00, Re_A_01, Re_A_02, Re_A_03, In_A_00, In_A_01, In_A_02, In_A_03 : ns(163);
  sig MSB_B_00, MSB_B_01, MSB_B_02, MSB_B_03 : ns(1);
  use Cell_00(MSB_B_00, In_A_00, C, Re_A_00); use Cell_01(MSB_B_01, In_A_01, C, Re_A_01);  // Port map of Cells
  use Cell_02(MSB_B_02, In_A_02, C, Re_A_02); use Cell_03(MSB_B_03, In_A_03, C, Re_A_03);
  sfg Malu {
    Ret_A = (Mul_last) ? Re_A_02 : Re_A_03[1:162] # ((Op) ? Re_A_03[0] : (C[0]^A[0]));  // Word Size = 4
    In_A_00 = (Mul_start) ? 0x0 : A;
    In_A_01 = Re_A_00;
    In_A_02 = Re_A_01;
    In_A_03 = (Op) ? Re_A_02 : A>>1;
    MSB_B_00 = MSB_B[3];
    MSB_B_01 = MSB_B[2];
    MSB_B_02 = MSB_B[1];
    MSB_B_03 = MSB_B[0] | (~Op);
  }
}
hardwired h_MALU(MALU) { Malu; }
Figure C.1: Codes for Cell and MALU
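To make the role of the Cell easier to follow, its datapath can be mirrored in a few lines of software. The Python sketch below is a simplified model under stated assumptions: the shift is truncated to 163 bits, the constant 0xc9 is interpreted as the low part of the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1, and d cells are chained per cycle with the multiplier padded by leading zero bits. It does not reproduce the Op, Mul_start and Mul_last handling of the MALU, and the names cell, digit_serial_mul and reference_mul are hypothetical.

```python
N = 163
MASK = (1 << N) - 1
REDUCTION = 0xC9  # constant from the Cell code, read as x^7 + x^6 + x^3 + 1


def cell(msb_b, a, c):
    """One Cell step: Ret_A = (A * x mod f) xor (C if MSB_B else 0)."""
    shift_a = (a << 1) & MASK              # Shift_A, truncated to 163 bits (assumption)
    msb_a = (a >> (N - 1)) & 1             # MSB_A = A[162]
    ret = (c ^ shift_a) if msb_b else shift_a
    return ret ^ (REDUCTION if msb_a else 0)


def digit_serial_mul(b, c, d=4):
    """MSB-first multiplication b * c using d chained cells per clock cycle."""
    cycles = -(-N // d)                    # ceil(163 / d)
    total_bits = cycles * d                # b is padded with leading zero bits
    acc = 0
    for cycle in range(cycles):
        for j in range(d):
            bit_index = total_bits - 1 - (cycle * d + j)
            acc = cell((b >> bit_index) & 1, acc, c)
    return acc, cycles


def reference_mul(b, c):
    """Schoolbook carry-less multiplication followed by polynomial reduction."""
    prod = 0
    for i in range(N):
        if (b >> i) & 1:
            prod ^= c << i
    for i in range(2 * N - 2, N - 1, -1):  # clear bits 324 down to 163
        if (prod >> i) & 1:
            prod ^= (1 << i) ^ (REDUCTION << (i - N))
    return prod


b = 0x3F0EBA16286A2D57EA0991168D4994637E8343E36
c = 0x0D51FBC6C71A0094FA2CDD545B11C5C0C797324F1
product, cycles = digit_serial_mul(b, c, d=4)
print(cycles, product == reference_mul(b, c))
```

For d = 4 the model needs 41 cycles, in line with the cycle count quoted in Section 7, and the chained-cell result agrees with the reference multiplication for every digit size from 1 to 6.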
D. Comparison Between Normal Clocking and Clock-gating
Figure D.1: Area Consumption vs. Time
Figure D.2: Frequency vs. Power Consumption in d = 4
Figure D.3: Power Consumption in 5MHz with different digit sizes
E. Simulation Examples
Figure E.1: Assigning Key Value in Modelsim
Figure E.2: Simulation in GEZEL for First Four Projective Addition
Figure E.3: Figure of Simulation in Modelsim for First X3 Value
Figure E.4: Figure of Simulation in Modelsim for First Y3 Value
Figure E.5: Figure of Simulation in Modelsim for First Z3 Value
Figure E.6: Simulation in GEZEL for Final Points After Inversion
Figure E.7: Figure of Simulation in Modelsim for Final Points
CURRICULUM VITAE
Ünal Kocabaş was born in 1986 in Bursa, Turkey. He completed his primary, secondary
and high school education in Bursa. He received his bachelor's degree in Electronics
Engineering from Istanbul Technical University in 2007. During his bachelor's
studies, he also worked part-time for 9 months as an assistant in the ASIC
measurement department at the Mikroelekronik R&D Design Center in Istanbul,
Turkey. Afterwards, he started his M.Sc. in Electronics Engineering at ITU in
September 2007. In September 2008, he started his master's thesis in the COSIC group
(Computer Security and Industrial Cryptography) in the Department of Electrical
Engineering (ESAT) of the Katholieke Universiteit Leuven, as an exchange student.
